Machine learning application in analyzing crash simulation data
The huge amount of data generated by crash simulations poses significant challenges for analyzing it effectively and quickly. The output of a single simulation may not be a big concern, but when several iterations are performed to optimize design parameters (thickness, material variations), results have to be compared for a very large number of nodes, sometimes more than a million, and over several time steps. Each crash simulation produces a very high dimensional data set on the order of roughly 100 million values (time steps × number of nodes), and the degrees of freedom associated with each node add yet another dimension. An engineer typically compares different simulation results with the help of animation tools and graph plots, but describing how a change in a few design parameters affects the crash behavior of a vehicle structure this way is a very time consuming task.
Can machine learning algorithms significantly reduce the high dimensional space of crash data to some lower dimension? Can we use this low dimensional data to reveal characteristics of the crash behavior automatically or semi-automatically and visualize these behaviors? These are the two key questions around the application of machine learning in post-processing of crash simulation data. Clustering and dimensionality reduction algorithms are used very successfully in various fields of engineering and statistics, and their use in the analysis of crash simulation data has also been demonstrated. Dimensionality reduction techniques map high dimensional data into a lower dimensional space while preserving the inherent characteristics of the original data. A clustering algorithm, on the other hand, divides a data set into subsets based on their inherent similarities.
Clustering
Clustering techniques are used in many fields, including pattern recognition, image analysis, bioinformatics, and data compression. The clustering problem is the problem of dividing a given data set {x1, . . . , xN} of N points into several homogeneous, non-overlapping groups. Each such group, or cluster, should contain similar data items, while data items from different groups should not be similar. The data shown in the figure below is two dimensional and, on visualization, shows 3 clusters of similar data.
It is easy to see these similarities when the data has only 2-3 dimensions, but high dimensional data cannot be visualized this way, and finding similar homogeneous groups can then only be done by algorithms. One of the most popular clustering algorithms is k-means.
A basic k-means algorithm works in only a few steps. The first step is to randomly select k centroids, where k is the chosen number of clusters. Centroids are data points representing the center of a cluster. The second step assigns each data point to its nearest centroid. In the next step, the mean of all data points assigned to each cluster is computed and becomes the new centroid. The last two steps are repeated until the centroids no longer change.
A few lines of Python code are enough to cluster the data set and compute its centroids.
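A minimal sketch of what those few lines could look like, assuming scikit-learn's KMeans and a synthetic two dimensional data set in place of real simulation output:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2D data set with 3 well separated groups (stand-in for real data)
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(100, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(100, 2)),
])

# Fit k-means with k = 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("Cluster centroids:\n", kmeans.cluster_centers_)
print("Cluster index of the first 10 points:", kmeans.labels_[:10])
```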
There are different possibilities for using clustering on crash simulation data, or on large CAE data in general. Clustering can be applied to the entire set of simulation data consisting of several parts, or to one individual part at a time. When applied to an individual component, the interest could be to identify, compare and visualize locations in the part with high deformation across several design iterations. Let's take the example of B-pillar deformation in a side impact crash simulation. We would be highly interested in knowing which design parameters cause certain deformation patterns. Similarities in deformation patterns can be identified and visualized with clustering algorithms.
How do we decide how many such clusters of similar behavior are present in the component? Clustering algorithms also provide criteria for choosing the number of clusters, for example the "elbow" in the within-cluster sum of squares as k is increased (see the sketch below).
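A hedged sketch of such a criterion, the elbow method on scikit-learn's inertia value, reusing the synthetic array X from the k-means example above:

```python
# Elbow method: run k-means for increasing k and watch the
# within-cluster sum of squares (inertia_) flatten out.
from sklearn.cluster import KMeans

inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

for k, inertia in enumerate(inertias, start=1):
    print(f"k = {k}: within-cluster sum of squares = {inertia:.1f}")
# The k at which the curve bends (the 'elbow') is a reasonable cluster count.
```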
Dimensionality reduction (DR)
Dimensionality reduction maps a high dimensional space to a low dimensional space in such a way that the converted data still describes the intrinsic properties of the original data; ideally, a conversion without loss of information. Sometimes the objective is to find a low dimensional space that is a subset of the original dimensions, which is called feature selection. If the low dimensional space is an entirely new set of dimensions, it is referred to as feature extraction.
For a given data set with two dimensions x1 and x2, the dimensionality reduction task would be to find a dimension z that describes the original data with only one dimension. In the figure below, all blue dots in the x1-x2 plane are projected (red dots) onto the vector z.
The z dimension is now a new representation of the data, and it keeps the inherent properties such that it is able to describe, to a great extent, the variance of the original data.
DR may look the same as linear regression, but they are two completely different methods with different objectives. Linear regression computes a function relating x1 and x2, whereas in DR a new vector is computed and all original data is projected onto this new dimension to reduce the dimensional space.
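A small sketch of the projection idea, assuming a hand-picked unit vector z along the diagonal (in practice this direction would be computed, for example by PCA):

```python
import numpy as np

# Synthetic 2D data roughly spread along the diagonal x2 ≈ x1
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.2 * rng.normal(size=200)
X = np.column_stack([x1, x2])

# Hand-picked unit vector z along the diagonal (PCA would find this direction automatically)
z = np.array([1.0, 1.0]) / np.sqrt(2.0)

# One dimensional representation: signed coordinate of each point along z
X_1d = X @ z                        # shape (200,)
# Back-projection into the original plane (the red dots in the figure)
X_proj = np.outer(X_1d, z)          # shape (200, 2)

print("Share of total variance kept along z:", X_1d.var() / X.var(axis=0).sum())
```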
There are several linear and non-linear DR algorithms available; Principal Component Analysis (PCA) is one of the most widely used. It is a linear algorithm that minimizes the mapping error between the original data and its low dimensional representation, measured by the squared Euclidean distance.
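A minimal PCA sketch with scikit-learn, reusing the two dimensional array X from the projection sketch above (for real crash data, each row of X would be the flattened nodal results of one simulation run):

```python
from sklearn.decomposition import PCA

# Reduce the two dimensional data X to one principal component
pca = PCA(n_components=1)
X_low = pca.fit_transform(X)           # low dimensional representation, shape (200, 1)
X_rec = pca.inverse_transform(X_low)   # back-projection into the original space

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Mean squared reconstruction error:", ((X - X_rec) ** 2).mean())
```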
One interesting example of the application of DR can be seen in Diez, C. [2]. The author follows a two step approach to generate a low dimensional representation of a bumper part. In the first step, a low dimensional parametric Bezier regression function u(t) on the range t in [0, 1] is computed to represent the undeformed geometry. In the second step, element data is projected onto the Bezier regression, followed by a Gaussian kernel smoothing.
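A very rough sketch of this two step idea, not the author's implementation: it assumes a quadratic Bezier curve, chord-length parametrisation, synthetic stand-in data, and uses the parameter t itself as the projection coordinate.

```python
import numpy as np

# --- Step 1: fit a quadratic Bezier curve to the (undeformed) part geometry ---
# Hypothetical 2D element-centre coordinates along a bumper-like part (stand-in data)
rng = np.random.default_rng(2)
t_true = np.sort(rng.uniform(0, 1, 300))
coords = np.column_stack([t_true, 0.5 * np.sin(np.pi * t_true)])
coords += 0.01 * rng.normal(size=coords.shape)

# Chord-length parametrisation: a value t in [0, 1] for every element centre
seg = np.linalg.norm(np.diff(coords, axis=0), axis=1)
t = np.concatenate([[0.0], np.cumsum(seg)])
t /= t[-1]

# Quadratic Bezier basis and least-squares fit of the three control points
B = np.column_stack([(1 - t) ** 2, 2 * (1 - t) * t, t ** 2])
ctrl_pts, *_ = np.linalg.lstsq(B, coords, rcond=None)
print("Fitted control points:\n", ctrl_pts)

# --- Step 2: map an element result onto the curve and smooth it ---
values = coords[:, 1] + 0.05 * rng.normal(size=len(t))  # stand-in element result (e.g. plastic strain)

def gaussian_smooth(t_query, t_data, v_data, h=0.05):
    """Gaussian kernel smoothing of v_data (located at t_data), evaluated at t_query."""
    w = np.exp(-((t_query[:, None] - t_data[None, :]) ** 2) / (2.0 * h ** 2))
    return (w @ v_data) / w.sum(axis=1)

t_grid = np.linspace(0.0, 1.0, 50)                 # fixed low dimensional grid along the part
profile = gaussian_smooth(t_grid, t, values)       # 50-value representation of the element result
print("Low dimensional profile shape:", profile.shape)
```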
A different dimensionality reduction method by Kracker, D. [1] uses, as a first step, a mapping of nodal data onto an enclosing spherical surface represented by a fixed number of discretized points along the two polar angles. This mapping specifically addresses the challenge of representing a meshed part with an array of fixed size: when design iterations are carried out, the node/element count of a part changes, which changes the dimensionality of the component, yet machine learning algorithms need the data of all iterations to be of the same dimension. A big requirement! Mapping the nodal data onto an enclosing spherical surface of fixed size overcomes this challenge. In the next step, Kracker, D. [1] carries out a dimensionality reduction using two different algorithms, PCA and t-SNE.
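Again only a rough sketch of the idea rather than the authors' implementation: nodal values are binned onto a fixed grid over the two spherical angles around the part centre, so every design iteration yields a feature vector of the same length, which can then be passed to PCA or t-SNE.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

N_THETA, N_PHI = 18, 36   # fixed angular grid -> 18 * 36 = 648 values per simulation

def spherical_feature(nodes, values):
    """Bin nodal values onto a fixed (theta, phi) grid around the part centre.

    nodes  : (n, 3) nodal coordinates; n may differ between design iterations
    values : (n,)   nodal result, e.g. resultant displacement
    """
    rel = nodes - nodes.mean(axis=0)
    r = np.linalg.norm(rel, axis=1) + 1e-12
    theta = np.arccos(np.clip(rel[:, 2] / r, -1.0, 1.0))   # polar angle in [0, pi]
    phi = np.arctan2(rel[:, 1], rel[:, 0]) + np.pi         # azimuth in [0, 2*pi)

    i = np.minimum((theta / np.pi * N_THETA).astype(int), N_THETA - 1)
    j = np.minimum((phi / (2.0 * np.pi) * N_PHI).astype(int), N_PHI - 1)

    grid = np.zeros((N_THETA, N_PHI))
    count = np.zeros((N_THETA, N_PHI))
    np.add.at(grid, (i, j), values)
    np.add.at(count, (i, j), 1)
    return (grid / np.maximum(count, 1)).ravel()           # fixed-length feature vector

# Hypothetical set of design iterations, each with a different node count
rng = np.random.default_rng(3)
features = []
for _ in range(20):
    n = int(rng.integers(800, 1200))        # node count varies from iteration to iteration
    nodes = rng.normal(size=(n, 3))
    displacement = rng.random(n)            # stand-in for a simulation result
    features.append(spherical_feature(nodes, displacement))
features = np.array(features)               # shape (20, 648): same dimension for every run

# Dimensionality reduction on the fixed-size feature vectors
embedding_pca = PCA(n_components=2).fit_transform(features)
embedding_tsne = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(features)
print(embedding_pca.shape, embedding_tsne.shape)
```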
Summary
The use of machine learning for the analysis of crash simulation data is a relatively new subject. The application of two methods, clustering and dimensionality reduction, has been successfully demonstrated by several researchers. Clustering techniques provide a compact representation of the data by mapping each data point to a discrete cluster index, which in turn can be used to classify simulation data according to the physical behavior inherently grouped by the algorithm. In contrast, dimensionality reduction techniques (DRTs) find a compact representation of the data by mapping each point to a lower dimensional array. Such low dimensional data can then be processed further by other machine learning algorithms to derive useful interpretations.
References
- [1] Kracker, D., Garcke, J., Schumacher, A., "Automatic analysis of crash simulations with dimensionality reduction algorithms such as PCA and t-SNE", 16th International LS-DYNA® Users Conference, 2020.
- Diez, C., "Machine learning process to analyze big-data from crash simulations", 7th BETA CAE International Conference, 2017.
- Diez, C., Wieser, C., Harzheim, L., & Schumacher, A., "Automated Generation of Robustness Knowledge for selected Crash Structures", 14th LS-DYNA Forum, 2016.
- Bohn, B., Garcke, J., Iza-Teran, R., Paprotny, A., Peherstorfer, B., Schepsmeier, U., & Thole, C. A., "Analysis of car crash simulation data with nonlinear machine learning methods", Procedia Computer Science, 2013.