IoT: Big Data on Steroids
If you think that Big Data projects are challenging, wait until you are asked to do an Internet of Things project. In my previous blog I argued that the volume aspect of IoT projects can dwarf a Big Data project. But there is another V relevant to IoT that can truly trump a Big Data project – and that is velocity. Why?
When you hear “expanding velocity demands” in relation to a Big Data project, it is often in the remit of data storage and the ability to keep up with the influx of data. Whole architectures and technologies like Hadoop have risen to the occasion to make real-time storage of large data volumes possible.
In IoT, however, we have to go a step further; there is not only a real-time storage requirement but also the need for real-time analysis and decision making.
The velocity and volume of IoT data will make current Big Data examples pale in comparison. For example, Twitter is often mentioned as a source of Big Data, as the number of tweets per day can reach hundreds of millions. I believe the actual rate is between 6,000 and 8,500 tweets per second.
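In contrast, for IoT I have already seen one company that needs to be able to ingest 250,000 events per second from its devices - and another that is architecting for a staggering 1,000,000 events per second. At a sustained 250,000 events per second, that first company would collect about 21.6 billion events per day.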
These customers are looking for examples and use cases in the area of predictive maintenance and superior servitization. This means they are aiming to architect for real-time predictive analytics and the ability to trigger the processes within seconds after certain critical patterns have been detected.
One big difference between Big Data and IoT projects is time. While in Big Data projects it is perfectly normal for data to rest before it is used in any kind of analysis, in any IoT project time is of the absolute essence.
This is why I like to refer to IoT projects as Fast Data projects. I am not alone in this; IDC researcher John Gantz indicated that the IoT solutions you are likely to build will demand a decision from you within one minute after detecting the situations you designed the system to look for.
In order to make it even more complicated, there are a number of considerations when it comes to velocity. The first issue is that data coming from the devices is often in a raw and simple format, but in order to be of any use in the more analytical decision models the data needs to be 1) organized, 2) transformed and 3) enriched.
- Organized refers to the issue that the data might arrive out of order for analysis purposes, meaning that the data has to be re-shuffled on the fly.
- Transformation points to the fact that the analytical models your decisions rely on often don’t need the raw data from the device but rather derived values. Let’s say you want to calculate the normal range for a certain time series. To do that, you might use a Bollinger band calculation. However, the Bollinger band might in turn need an Exponentially Weighted Time-Based Moving Average (EWMA). This EWMA is itself a derived value stream, constantly recalculated with the arrival of every new event.
- Enrichment is necessary if your decision models need not just data from the devices but also data from your enterprise sources - for example, which service level was agreed upon for this device? When was the last known maintenance? What historical reference data can be applied? Etc. Constantly retrieving this information might bring most enterprise applications to their knees.
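To make the transformation step a little more concrete, here is a minimal Python sketch of a derived value stream: a time-based EWMA that updates itself on every event, with a band of k standard deviations around it. It is an illustration under assumptions, not a reference implementation. The class name `EwmaBand`, the time constant `tau` and the exponentially weighted variance are my own choices here; a classic Bollinger band uses a simple moving average over a fixed window instead.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class EwmaBand:
    """Time-based EWMA plus an exponentially weighted variance (illustrative)."""
    tau: float                      # decay time constant in seconds (assumed)
    k: float = 2.0                  # band width in standard deviations
    mean: Optional[float] = None
    var: float = 0.0
    last_ts: Optional[float] = None

    def update(self, ts: float, value: float) -> Tuple[float, float, float]:
        """Fold one event into the state and return (lower, mean, upper)."""
        if self.mean is None:
            # First event: the average starts at the observed value.
            self.mean, self.last_ts = value, ts
        else:
            # The weight of the new value grows with the time since the last event.
            dt = max(ts - self.last_ts, 0.0)
            alpha = 1.0 - math.exp(-dt / self.tau)
            delta = value - self.mean
            self.mean += alpha * delta
            self.var = (1.0 - alpha) * (self.var + alpha * delta * delta)
            self.last_ts = max(ts, self.last_ts)
        spread = self.k * math.sqrt(self.var)
        return self.mean - spread, self.mean, self.mean + spread


band = EwmaBand(tau=60.0)
# Hypothetical (timestamp in seconds, temperature) readings from one device.
for ts, temp in [(0, 20.0), (5, 20.4), (12, 19.8), (13, 35.0)]:
    lower, mean, upper = band.update(ts, temp)
```

Because the state is only a handful of numbers per device, it can be kept in memory and refreshed per event rather than recomputed from stored history.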
In order to support these three capabilities in real time you need advanced integration, analysis and in-memory caching capabilities. Forrester calls this technology domain streaming analytics.
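The organizing and enrichment steps can be sketched in the same spirit. The Python below is a rough illustration, not how any particular streaming analytics product works: `ReorderBuffer`, `max_lateness`, `device_context` and the `_DEVICE_MASTER` dictionary are all hypothetical names, with the dictionary standing in for an enterprise system. Events are held briefly in a heap so they can be released in time order, and enterprise lookups go through an in-memory cache so the backend is hit once per device rather than once per event.

```python
import heapq
from functools import lru_cache
from itertools import count


class ReorderBuffer:
    """Holds events briefly so they can be released in timestamp order."""

    def __init__(self, max_lateness: float):
        self.max_lateness = max_lateness     # how long we wait for stragglers
        self._heap = []
        self._seq = count()                  # tie-breaker; events are never compared
        self._watermark = float("-inf")

    def push(self, ts, event):
        """Add one event; return the events that are now safe to emit."""
        heapq.heappush(self._heap, (ts, next(self._seq), event))
        self._watermark = max(self._watermark, ts - self.max_lateness)
        ready = []
        # Anything at or below the watermark is released. An event arriving
        # later than max_lateness is released immediately, out of order;
        # a real system would need an explicit policy for that case.
        while self._heap and self._heap[0][0] <= self._watermark:
            ready.append(heapq.heappop(self._heap)[2])
        return ready

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        ready = [item[2] for item in sorted(self._heap)]
        self._heap.clear()
        return ready


# Hypothetical stand-in for an enterprise source (service contracts, maintenance).
_DEVICE_MASTER = {"pump-7": {"sla": "gold", "last_maintenance": "2015-03-01"}}


@lru_cache(maxsize=10_000)
def device_context(device_id: str) -> dict:
    """Cached lookup; entries never expire in this sketch."""
    return dict(_DEVICE_MASTER.get(device_id, {}))


def enrich(event: dict) -> dict:
    """Return a copy of the event merged with its cached enterprise context."""
    return {**event, **device_context(event["device_id"])}
```

The cache here trades freshness for speed: if a service level changes in the enterprise system, this sketch would keep serving the old value, so a real deployment would need some form of expiry or invalidation.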
To conclude, it is fair to say that if, from a volume aspect, IoT might look like Big Data in disguise, then from a velocity viewpoint IoT is Big Data on steroids.
Hi Derick, I like your remark about that new V: value. Pity differentiation starts with a D. Can't have it all.
Great commentary, Bart. Data => Analytics for insightful decision-making is where the most important V for Data-in-IoT [& Industry 4.0] is Value and differentiation.