Neural Network Based Forecasting
Machine Learning Based Forecasting
ANNs (Artificial Neural Networks) have powerful pattern classification and pattern recognition capabilities. Inspired by biological systems, particularly by research into the human brain, ANNs are able to learn from experience and generalize from it. Currently, ANNs are being used for a wide variety of tasks in many different fields of business, industry, and science.
One major application area of ANNs is forecasting. Several distinguishing features of ANNs make them valuable and attractive for forecasting tasks. ANNs are data-driven, self-adaptive methods: they learn from examples and capture subtle functional relationships among the data even if the underlying relationships are unknown or hard to describe. After learning from the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population even if the sample data contain noisy information.
The traditional approaches to time-series prediction, such as the Box-Jenkins or ARIMA method, assume that the time series under study is generated from a linear process. Linear models have advantages in that they can be understood and analyzed in great detail, and they are easy to explain and implement. However, they may be totally inappropriate if the underlying mechanism is nonlinear. ANNs, by contrast, are capable of performing nonlinear modeling without a priori knowledge about the relationships between input and output variables. Thus they are a more general and flexible modeling tool for forecasting.
Artificial neural networks, originally developed to mimic basic biological neural systems, particularly the human brain, are composed of a number of interconnected simple processing elements called neurons or nodes. Each node receives an input signal, which is the total "information" from other nodes or external stimuli, processes it locally through an activation or transfer function, and produces a transformed output signal to other nodes or external outputs. Although each individual neuron implements its function rather slowly and imperfectly, collectively a network can perform a surprising number of tasks quite efficiently. This information-processing characteristic makes ANNs a powerful computational device, able to learn from examples and then to generalize to examples never seen before.
Technique: Neural Networks (Feed-Forward Neural Networks)
Tools: RStudio 3
R-Packages: “nnet” and “devtools”
Theoretical Approach to Neural Networks
The linear models for regression and classification are based on linear combinations of fixed nonlinear basis functions φj(x) and take the form

y(x, w) = f( Σj=1..M wj φj(x) )

where f(·) is a nonlinear activation function in the case of classification and is the identity in the case of regression. Our goal is to extend this model by making the basis functions φj(x) depend on parameters and then to allow these parameters to be adjusted, along with the coefficients {wj}, during training. There are, of course, many ways to construct parametric nonlinear basis functions. Neural networks use basis functions that follow this same form, so that each basis function is itself a nonlinear function of a linear combination of the inputs, where the coefficients in the linear combination are adaptive parameters.
This leads to the basic neural network model, which can be described as a series of functional transformations. First we construct M linear combinations of the input variables x1, . . . , xD in the form

aj = Σi=1..D w(1)ji xi + w(1)j0

where j = 1, . . . , M, and the superscript (1) indicates that the corresponding parameters are in the first 'layer' of the network. We shall refer to the parameters w(1)ji as weights and the parameters w(1)j0 as biases. The quantities aj are known as activations. Each of them is then transformed using a differentiable, nonlinear activation function h(·) to give
zj = h(aj)
These quantities correspond to the outputs of the basis functions and, in the context of neural networks, are called hidden units. The nonlinear functions h(·) are generally chosen to be sigmoidal functions such as the logistic sigmoid or the 'tanh' function.
These values are then again linearly combined to give the output unit activations

ak = Σj=1..M w(2)kj zj + w(2)k0

where k = 1, . . . , K, and K is the total number of outputs. This transformation corresponds to the second layer of the network, and again the w(2)k0 are bias parameters. Finally, the output unit activations are transformed using an appropriate activation function to give a set of network outputs yk.
In the network diagram for the corresponding two-layer neural network, the input, hidden, and output variables are represented by nodes, and the weight parameters are represented by links between the nodes; the bias parameters are denoted by links coming from additional input and hidden variables x0 and z0. Arrows denote the direction of information flow through the network during forward propagation.
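The forward propagation just described can be sketched in a few lines of pure Python. This is only an illustration of the equations above: the network sizes and weight values are arbitrary placeholders, not anything from the article's experiments (which used R).

```python
import math

def forward(x, W1, b1, W2, b2):
    # First layer: activations a_j = sum_i w(1)_ji * x_i + w(1)_j0
    a = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    # Hidden units: z_j = h(a_j) with h = tanh
    z = [math.tanh(aj) for aj in a]
    # Second layer: output activations (identity output activation, as in regression)
    return [sum(w * zj for w, zj in zip(row, z)) + b for row, b in zip(W2, b2)]

# Toy network: D = 2 inputs, M = 3 hidden units, K = 1 output (arbitrary weights)
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]
b2 = [0.05]
print(forward([1.0, 2.0], W1, b1, W2, b2))
```

Each list in W1 holds the weights feeding one hidden unit, matching the index convention w(1)ji used above.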
Our Experimentation with Forecasting:
- Moving average and ARIMA Modeling
- Linear regression and segment based forecasting
- Neural networks.
Data: Our objective was to forecast monthly sales for a direct sales retailer. The sales data from Jan 1998 to Mar 2002 were used to build the forecasting models; data from Apr 2002 to Sep 2002 were held out for validation, and the values for Apr 2002 to Sep 2002 were forecasted.
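A chronological hold-out split like the one above can be sketched as follows; the series here is a placeholder, not the retailer's data (Jan 1998 to Mar 2002 is 51 months of training data, Apr to Sep 2002 is 6 months of validation).

```python
def time_split(series, n_valid):
    # Chronological hold-out: the last n_valid points form the validation set,
    # everything before them is used for model building.
    return series[:-n_valid], series[-n_valid:]

sales = list(range(57))              # placeholder for 57 monthly sales values
train, valid = time_split(sales, 6)  # 51 training months, 6 validation months
```

For time series, the split must be chronological; a random split would leak future information into the training set.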
Moving Average and ARIMA Modeling
- The sales series showed an increasing trend but was very noisy, with no seasonality pattern. Predicting such a series accurately was a significant challenge.
- As a first step, the series was smoothed using moving averages of up to 6 periods and exponential smoothing with alpha from 0.1 to 0.9. The series was forecasted using both the moving average and the exponential smoothing method, but the error rate from both techniques was too high, i.e. 47%.
- As a second step, we used ARIMA modeling for forecasting. The results from the ARIMA models were not satisfactory, as the average error rate was very high, i.e. 32%, with a range of 0% to 75%.
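The two smoothing methods above, and the percentage error rate used to compare all the models in this article, can be sketched in pure Python. The sales values below are illustrative, not the retailer's data; the error rate is assumed to be a mean absolute percentage error (MAPE), which matches how the 47%, 32%, etc. figures read.

```python
def moving_average(series, window):
    # Forecast for the next period: mean of the last `window` observations
    return sum(series[-window:]) / window

def exp_smooth(series, alpha):
    # Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1};
    # the final smoothed value serves as the one-step-ahead forecast
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s

def mape(actual, forecast):
    # Mean absolute percentage error, in percent
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

sales = [100, 120, 90, 130, 110, 140]          # illustrative monthly values
print(moving_average(sales, 3))                 # ≈ 126.67
print(exp_smooth(sales, 0.3))                   # ≈ 118.92
print(mape([100, 110, 120], [90, 121, 108]))    # each error is 10%, so MAPE = 10.0
```

Sweeping `window` over 1..6 and `alpha` over 0.1..0.9 and scoring each with `mape` on the hold-out months reproduces the kind of search described above.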
Linear Regression and Segment Based Forecasting
- As a third step, we segmented agents into various segments using CART analysis, based on various key performance indicators, demographics, and other external variables.
- After identifying the key segments, we forecasted the sales of each segment, but the average error rate was still substantial, i.e. 27%, with a range of 2% to 65%.
- To minimize the error rate, we built a linear regression model for forecasting each agent's sales using up to 3 lagged values of sales, the error component obtained from ARIMA modeling (i.e. the difference between the model fit and the actual values), key performance indicators, and monthly promotion and offer variables. Quantitative (impact on sales) and qualitative (nature of offer) analyses of the monthly promotions were done to create variables as inputs to the regression model. With this we were able to bring the average error rate down to 20%, with an error range of 4% to 35%, but this was still on the higher side.
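Building the lagged-value predictors used in the regression (and later, with 12 lags, in the neural network) amounts to sliding a window over the series. A minimal sketch, with illustrative data rather than the article's:

```python
def lagged_matrix(series, max_lag):
    # Build regression rows [x_{t-1}, ..., x_{t-max_lag}] with target x_t,
    # for every t that has a full history of max_lag values behind it.
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in range(1, max_lag + 1)])
        y.append(series[t])
    return X, y

sales = [100, 120, 90, 130, 110, 140]    # illustrative monthly values
X, y = lagged_matrix(sales, 3)
# First row: sales at t-1, t-2, t-3 = [90, 120, 100], target 130
```

The ARIMA residuals and promotion variables described above would be appended to each row as additional columns before fitting the regression.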
Machine Learning for Forecasting: Neural Networks
- After applying all the above techniques, we moved on to feed-forward neural networks with a single hidden layer (the "nnet" package in R). Various numbers of neurons in the hidden layer were tried when training the network; we got the optimal output with 24 neurons.
- To train the neural network model, we used up to 12 lagged values of sales plus a bias variable with a value equal to 1. After training the network with various numbers of neurons in the hidden layer, we achieved considerable success in forecasting the sales series: the average error rate was brought down to 10%, with a range of 0% to 18%.
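The article's model was fitted with R's nnet; as a rough, hypothetical illustration of what training a single-hidden-layer network looks like, here is a minimal pure-Python version with tanh hidden units, a linear output, and stochastic gradient descent on squared error. The data are a toy set scaled to [0, 1]; in the real model each input row would be 12 lagged sales values plus the bias variable.

```python
import math
import random

def train_net(X, y, hidden=3, epochs=500, lr=0.1, seed=1):
    # One hidden layer (tanh), one linear output, trained by SGD on squared error.
    rng = random.Random(seed)
    D = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def predict(x):
        z = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return sum(w * zj for w, zj in zip(W2, z)) + b2, z

    for _ in range(epochs):
        for x, t in zip(X, y):
            out, z = predict(x)
            err = out - t                            # dE/d_out for squared error
            for j in range(hidden):
                dz = err * W2[j] * (1 - z[j] ** 2)   # backprop through tanh
                W2[j] -= lr * err * z[j]
                b1[j] -= lr * dz
                for i in range(D):
                    W1[j][i] -= lr * dz * x[i]
            b2 -= lr * err
    return lambda x: predict(x)[0]

# Toy inputs and targets (targets roughly track the mean of the inputs)
X = [[0.1, 0.2], [0.4, 0.3], [0.8, 0.6], [0.5, 0.9]]
y = [0.15, 0.35, 0.70, 0.70]
model = train_net(X, y)
print([round(model(x), 2) for x in X])
```

In practice the scaling of the inputs, the number of hidden neurons (24 in the article), and a weight-decay penalty (as nnet offers) all have to be tuned on the validation months.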
Pros and Cons of Neural Networks:
- ANN has the ability to implicitly detect complex nonlinear relationships between dependent and independent variables.
- They also have the ability to detect all possible interactions between predictor variables.
- Neural networks are largely a black box: it is hard to determine how they are solving a problem, as their internal workings are opaque.