Instrumenting for Deep Learning Training Data Acquisition and Inference
Deep Learning Training and Inference Data Flow

Abraham Lincoln said, “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” In the context of deep learning systems, we will modify this to “Give me one month to design a deep learning system and I will spend the first twenty days on data preparation.” Data preparation (acquisition and filtering) is a key prerequisite for Deep Learning systems, yet it is often the less discussed aspect of Deep Learning system design.

Use of existing datasets: Many deep learning system architects assume that a suitable dataset is available. In fact, an architect is fortunate if a dataset already exists that can be used for the deep learning project at hand. Most of the time, such a tailored dataset is not available. Even for transfer learning (where a smaller dataset is needed), the user needs to acquire data specific to the project.

Creating a new dataset: Usually, a new dataset is created by instrumenting the system for which the deep learning model is to be developed. If the model needs images, a camera is set up to capture them. Similarly, if the model needs an acoustic signature, an acoustic sensor is installed to capture sound. Both images and sound are saved to a data storage device after capture. The size of the data storage device is determined from the sampling time (more on this later) and the total duration of the process cycles to be recorded. Data captured from sensors is saved to an Edge storage device on the factory shop floor, or to an environmentally hardened data logger installed at a remote location.
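
The sizing rule above can be sketched as a small calculation. This is only a rough estimate under stated assumptions: every sample has the same size, no compression or file system overhead is counted, and the function name and all numbers are illustrative rather than taken from a real installation.

```python
def storage_bytes(sample_bytes: int, sampling_time_s: float,
                  cycle_duration_s: float, num_cycles: int) -> int:
    """Estimate raw storage needed to log every sample over several process cycles."""
    if sampling_time_s <= 0:
        raise ValueError("sampling_time_s must be positive")
    # Samples are taken at t = 0, T, 2T, ... within one cycle, so count the one at t = 0 too.
    samples_per_cycle = int(cycle_duration_s // sampling_time_s) + 1
    return sample_bytes * samples_per_cycle * num_cycles


# Illustrative numbers only: 200 kB per image, one image every 5 s,
# a 1-hour process cycle, 24 cycles to be recorded.
total = storage_bytes(200_000, 5.0, 3600.0, 24)
print(f"{total / 1e9:.2f} GB")  # prints "3.46 GB"
```

In practice the storage device would be chosen with generous headroom over such an estimate.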

If the model uses multiple features, the data acquisition system has to capture all of them (image, sound, temperature, humidity, vibration and so on) at each sampling instant. From a data scientist’s perspective, this is one row of a table, and each subsequent data sample yields another row. If the sampling time is 5 seconds, samples are taken at multiples of 5 seconds. Before the data samples can be used for model creation, the data is carefully labelled and inspected for missing or erroneous values.
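
As a minimal sketch of this row-per-sample view, the snippet below builds one row per sampling instant and flags cells that are missing or not finite. The feature names and readings are hypothetical. Images and sound are left out for brevity; they would typically be stored as file references in the same row, which is an assumption and not something the text prescribes.

```python
import math

FEATURES = ["temperature_c", "humidity_pct", "vibration_g"]  # hypothetical feature names
SAMPLING_TIME_S = 5


def make_row(index, readings):
    """Build one table row for the sampling instant index * SAMPLING_TIME_S."""
    row = {"t_s": index * SAMPLING_TIME_S}
    for name in FEATURES:
        row[name] = readings.get(name)  # None marks a missing reading
    return row


def find_bad_cells(rows):
    """Return (t_s, feature) pairs whose value is missing or not a finite number."""
    bad = []
    for row in rows:
        for name in FEATURES:
            value = row[name]
            if value is None or not math.isfinite(value):
                bad.append((row["t_s"], name))
    return bad


readings = [
    {"temperature_c": 21.5, "humidity_pct": 40.0, "vibration_g": 0.02},
    {"temperature_c": 21.6, "vibration_g": 0.03},  # humidity reading missing
    {"temperature_c": float("nan"), "humidity_pct": 41.0, "vibration_g": 0.02},
]
rows = [make_row(i, r) for i, r in enumerate(readings)]
print(find_bad_cells(rows))  # [(5, 'humidity_pct'), (10, 'temperature_c')]
```

A check like this only finds missing and non-numeric values; out-of-range readings would need limits specific to each sensor.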

Sampling time: Requirements for a data acquisition system vary with the type of application. One key specification is sampling time, which is related to the system’s dynamic behavior. A system that changes slowly can have a longer sampling time, while a fast-moving system needs to be sampled at shorter intervals. Features that change slowly, such as a license plate image from a parked car or the temperature in a room, can be acquired using a single board computer (like a Raspberry Pi). Signals that change quickly, such as vibrations and acoustic signals, require fast and precise sampling. A customized real-time data acquisition system is needed for systems that change rapidly.
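
One common way to put a number on "fast enough" is the Nyquist criterion, which the text does not name but which says the sampling rate must exceed twice the highest frequency of interest in the signal. The sketch below turns that into a longest acceptable sampling time. The margin of 2.5 and the signal frequencies are illustrative assumptions, not measured values.

```python
def max_sampling_time_s(highest_freq_hz: float, margin: float = 2.5) -> float:
    """Longest sampling interval that still samples at `margin` times the highest frequency.

    The Nyquist criterion requires margin > 2; 2.5 is an illustrative engineering margin.
    """
    if highest_freq_hz <= 0:
        raise ValueError("highest_freq_hz must be positive")
    if margin <= 2:
        raise ValueError("margin must exceed 2 to satisfy the Nyquist criterion")
    return 1.0 / (margin * highest_freq_hz)


# Assumed highest frequencies of interest, for illustration only.
signals_hz = {
    "room temperature": 0.01,
    "machine vibration": 5_000.0,
    "audible sound": 20_000.0,
}
for name, freq in signals_hz.items():
    print(f"{name}: sample at least every {max_sampling_time_s(freq):.6g} s")
```

With these assumed numbers, room temperature tolerates a sampling time of tens of seconds, while sound needs tens of microseconds, which is why the two cases call for very different hardware.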

Data size and features: When developing a new model, it is difficult to estimate a suitable dataset size and number of features in advance. Based on the desired and achieved accuracy of the model, the designer may request more data from the system. The system architect may hit a wall when increasing the dataset size no longer increases the accuracy of the model.
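
One way to notice this wall is to record validation accuracy each time the dataset grows and check whether the recent gains have become negligible. The sketch below is a simple heuristic, not a standard algorithm; the thresholds and the accuracy values are hypothetical.

```python
def has_plateaued(accuracies, min_gain=0.005, patience=2):
    """True if each of the last `patience` dataset-size increases gained less than `min_gain`."""
    if len(accuracies) < patience + 1:
        return False  # not enough points to judge
    recent = accuracies[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


# Hypothetical validation accuracies after training on 1k, 2k, 4k, 8k and 16k samples.
curve = [0.71, 0.80, 0.86, 0.862, 0.863]
print(has_plateaued(curve[:3]))  # False: accuracy is still climbing
print(has_plateaued(curve))      # True: the last two increases added almost nothing
```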

At this point, the architect may consider adding more features to improve the model’s accuracy. More features can mean installing additional sensors or extracting information from a combination of existing sensors. Most architects try to minimize the number of features during model development, because every feature used to create the model must also be available at inference time.
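
The constraint that training features must exist at inference time can be checked before deployment. A minimal sketch, with hypothetical feature and sensor names:

```python
def missing_at_inference(training_features, inference_features):
    """Return training features that the inference site cannot supply, sorted by name."""
    return sorted(set(training_features) - set(inference_features))


# Hypothetical example: the model was trained with a humidity feature
# that the deployed machine has no sensor for.
trained_on = ["image", "sound", "temperature", "humidity"]
available = ["image", "sound", "temperature", "vibration"]
print(missing_at_inference(trained_on, available))  # ['humidity']
```

An empty result means every feature the model expects can be supplied. Extra sensors at the inference site, such as vibration here, are harmless because the model simply does not use them.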

Use of existing data sources: Closed loop control systems already have feedback (sensor) data coming to controllers such as single loop controllers or programmable logic controllers (PLCs). These controllers usually support the Modbus protocol, which can provide access to the sensor data. A data acquisition system can read and log sensor data from the controller using Modbus or other protocols. This approach uses existing sensors, thereby eliminating the need for downtime in the factory.
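
To make the Modbus access concrete, the sketch below builds a Modbus TCP "Read Holding Registers" request (function code 0x03) and decodes a response, using only the Python standard library. The register address, unit id and the sample response bytes are assumptions for illustration; which registers hold which sensor value, and how they are scaled, depends on the controller. A real logger would send the request over a TCP socket to the controller (Modbus TCP normally uses port 502) and would more likely use a maintained Modbus library than hand-built frames.

```python
import struct

READ_HOLDING_REGISTERS = 0x03


def build_read_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'Read Holding Registers' request frame."""
    pdu = struct.pack(">BHH", READ_HOLDING_REGISTERS, start_address, quantity)
    # MBAP header: transaction id, protocol id (0 = Modbus),
    # length of the bytes that follow (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_response(frame):
    """Extract 16-bit register values from a 'Read Holding Registers' response frame."""
    function_code, byte_count = struct.unpack(">BB", frame[7:9])
    if function_code != READ_HOLDING_REGISTERS:
        raise ValueError(f"unexpected or exception response: function code {function_code:#04x}")
    return list(struct.unpack(f">{byte_count // 2}H", frame[9:9 + byte_count]))


request = build_read_request(transaction_id=1, unit_id=1, start_address=0, quantity=2)
print(request.hex())  # 000100000006010300000002

# Hand-written example response carrying two registers (0x00EB and 0x01C2).
response = bytes.fromhex("00010000000701030400eb01c2")
print(parse_read_response(response))  # [235, 450]
```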

If installation of new sensors is necessary, a factory supervisor would appreciate the use of non-contact or easily installable sensors. Examples include vibration sensors mounted on machinery, or non-contact image and acoustic sensors.

Further use of data acquisition system: It is perfectly valid to ask what happens to the data acquisition system after the training data has been acquired. There are two possibilities for the data storage unit. The first is that it stays in place to look for and capture rare events, which helps strengthen the model. For example, if a model is needed for many similar co-located wind turbines, one wind turbine can be permanently fitted with a data storage unit. The second is that it is moved to another process or machine where training data is needed for a new model.

What happens to the sensors? They stay in place. Few people realize that installing and calibrating sensors can be a painstaking task. Now, instead of feeding a data storage unit, the sensor data is fed to an inference compute unit. The trained model performs better (with less pre-processing) when the inference input data is sourced from the same or similar sensors as the training data.

IoT Gateway for data acquisition and inferencing: IoT systems use Gateways to capture data from sensors and send it to a central server. These gateways support a variety of protocols (such as Bluetooth, MQTT, Modbus and Zigbee) and offer connectivity to a server over Ethernet, WiFi or cellular. Depending on the type of Gateway, it is possible to add data acquisition hardware to it. This way, the Gateway collects data samples before sending them to an on-premise or cloud-based server for model training. A single board computer based IoT Gateway can also perform inference using the trained model and sensor data.

Do such systems exist commercially? Yes. AWS IoT SiteWise, Azure Stack Edge and the Intel AIoT Developer Kit are a few examples of data acquisition systems at the edge. These systems need to be configured to match the requirements of the application.

By Rahul Dubey, PhD
