The Four Classical Blunders of Time Series Forecasting
Could you be falling victim to one of these classic modeling blunders?
These four blunders aren’t quite as well-known as going all-in against a Sicilian, but they are commonly observed when validating DFAST, CCAR, CECL, and similar stress testing models. In many cases, modelers are conditioned to make these blunders by their previous experience, creating issues that common sense suggests should be avoidable. For example, in school, teachers give students well-defined problems. Students are provided with sufficient data to solve those problems, and when they finish, they can ask their teacher “Did I do it correctly?”. None of these things happen in real life – problems are ill-defined, data is messy, and if anyone knew the right answer there would be no need to build a model in the first place. Certain types of errors are especially common with regression models used for forecasting and stress testing.
These include:
Classical Blunder 1. Solving the wrong problem
The first classic blunder of time series forecasting is solving the wrong problem. Everyone knows that it is possible to solve a math problem correctly and yet not answer the right question. In many cases, this blunder occurs because solving the wrong problem is easier than solving the right problem. For example, modelers might try several approaches, get a passing test result, and stop looking into things. A commercial real-estate modeler might want to forecast which neighborhoods will see the highest rent growth, with the goal of investing in rental properties where rents will rise fastest. As a result, they create a regression equation and use a machine-learning technique to solve for A and B in an equation like Eq. 1:
Rent(t) = A*Rent(t-1) + B*Predictive_Factor(t-1) + Noise() (Eq. 1)
Unfortunately, this formula doesn’t actually predict relative changes (one neighborhood versus another). It is also set up so that a non-predictive coefficient (A) will be much larger than any predictive coefficient (B). As any renter can tell you, rents typically go up. On average, rents tend to rise 2% to 5% a year, so the “A” coefficient will explain about 97% of the next year’s rent. There are a couple of problems with this analysis:
This creates the perfect preconditions for our first classical blunder. First, a non-meaningful analysis is easy to create; second, the important predictive factors are obscured; third, a really good test result is associated with the non-meaningful analysis. Even when shown the analysis, a model reviewer might get distracted by the good test metric and not catch that the forecast doesn’t actually answer the right question.
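To make the mechanic concrete, here is a minimal sketch in Python (synthetic data and illustrative names, not the article’s actual study). The lagged rent term soaks up essentially all of the R2 even though the “predictive” factor is, by construction, pure noise:

```python
# Minimal sketch of the Eq. 1 setup on synthetic data (all numbers illustrative).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120  # assume ten years of monthly observations

# Synthetic rent level: drifts up roughly 3% a year, consistent with the 2%-5% range above.
rent = 1000 * np.cumprod(1 + rng.normal(0.0025, 0.002, n))
factor = rng.normal(0, 1, n)  # a "predictive" factor that is pure noise by construction

df = pd.DataFrame({"rent": rent, "factor": factor})
df["rent_lag"] = df["rent"].shift(1)
df["factor_lag"] = df["factor"].shift(1)

# Eq. 1: Rent(t) = A*Rent(t-1) + B*Predictive_Factor(t-1) + Noise()
fit = sm.OLS(df["rent"], sm.add_constant(df[["rent_lag", "factor_lag"]]), missing="drop").fit()
print(fit.rsquared)  # close to 1.0 - nearly all of it comes from the lagged rent term
print(fit.params)    # A dwarfs B, and B is statistically meaningless here
```

The high R2 says nothing about whether the factor helps pick one neighborhood over another, which was the actual question.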
Of course, there is a well-known solution that minimizes the chance of setting up forecasting models incorrectly, called differencing. Differencing compares changes in outputs to changes in inputs. It won’t give you a 0.97 R2 test result, but it will give a meaningful analysis for most types of forecasts. Differencing allows modelers to set up the problem to look at changes in rent as a function of changes in predictive factors (See Eq. 2).
Rent_Change(t) = A*Factor1_Change(t-1) + B*Factor2_Change(t-1) + Noise() (Eq. 2)
Setting up the equation as changes in values has several advantages. First, it doesn’t use up a degree of freedom to estimate the average historical change, which allows a second predictive factor to be included in the model. Second, it allows predictive factors, like inflation, to be estimated; previously, inflation would have been obscured since it is usually in the 2% to 5% range (the same as the average change in rents). Third, it allows location-specific factors to be included in a way that their effects actually show up in the test results (like R2).
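The differenced setup of Eq. 2 can be expressed in a few lines of pandas. This is again a sketch on synthetic data with made-up coefficients; because the output is a change rather than a level, the fit now measures the predictive factors instead of the trend:

```python
# Minimal sketch of the Eq. 2 setup on synthetic data (all names and numbers illustrative).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({
    "factor1": np.cumsum(rng.normal(0, 1, n)),  # hypothetical factor levels (random walks)
    "factor2": np.cumsum(rng.normal(0, 1, n)),
})

# Build a synthetic rent series whose *changes* respond to last period's factor changes.
drift, a, b = 0.003, 0.004, 0.002
rent_chg = (drift
            + a * df["factor1"].diff().shift(1).fillna(0)
            + b * df["factor2"].diff().shift(1).fillna(0)
            + rng.normal(0, 0.003, n))
df["rent"] = 1000 * (1 + rent_chg).cumprod()

# Eq. 2: regress changes in rent on lagged changes in the factors.
y = df["rent"].pct_change()
X = sm.add_constant(df[["factor1", "factor2"]].diff().shift(1))
fit = sm.OLS(y, X, missing="drop").fit()
print(fit.params)    # intercept near the 0.3% drift; slopes recover the true 0.004 and 0.002
print(fit.rsquared)  # far below 0.97, but it now measures what we actually care about
```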
Many new forecasters wonder whether differencing is really a necessary or useful step. The answer is an unequivocal “Yes!”. Proper differencing will improve almost any type of forecasting study. Even if a different approach is ultimately taken, a differenced equation should be the starting point for most forecasts.
There are a couple of reasons why differencing is so useful. First, there are substantial mathematical benefits when the regression works with values centered around zero: the residuals are much more likely to be stationary. This is a huge deal, although a full discussion is outside the scope of this article. Second, forecasts that analyze differences (changes in values) align more closely with cause and effect (causation) than correlational studies (which compare levels of one variable to another). Causation is better at forecasting than correlation.
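The stationarity point can be illustrated with a quick check using the augmented Dickey-Fuller test from statsmodels (a sketch on synthetic data; real series should of course be tested directly):

```python
# Sketch: a drifting level series usually fails an ADF stationarity test, while its
# first differences usually pass. Synthetic data; the numbers are illustrative.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
rent = 1000 + np.cumsum(rng.normal(0.5, 4.0, 240))  # synthetic rent level with a slow drift

print("ADF p-value, levels:     ", round(adfuller(rent)[1], 3))           # typically large
print("ADF p-value, differences:", round(adfuller(np.diff(rent))[1], 3))  # typically tiny
```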
Correlation studies (which compare levels of output to levels of input) aren’t always bad studies. However, at best, they typically can only hint at the right answer and need additional analysis. Since that additional analysis is typically a differenced analysis, it often makes sense to start with the differenced analysis. This is such a common problem that it’s become a meme in popular media:
But to measure cause and effect... you must ensure that a simple correlation, however tempting it may be, is not mistaken for a cause. In the 1990s the stork population of Germany increased, and the German at-home birth rate rose as well. Shall we credit storks for airlifting the babies? – Neil deGrasse Tyson
Classical Blunder 2. Building a more complex model than the data will support.
The second classic forecasting blunder occurs when a model includes more variables than the data can support. There is a mathematical term for the number of variables used by a regression model – degrees of freedom. This may be abbreviated “df” (which can get confusing if you are differentiating) or “dof” (which sounds a little silly). When a model uses too many degrees of freedom, it is called “overfitting”. Overfit models test well, but don’t work well on out-of-sample data.
In school, teachers provide their students with enough data to solve the problems assigned to them. However, once the classroom is left behind, data becomes a huge problem. Sometimes it is just not available. In other cases, it might be full of errors. It is not unusual for data scientists to spend 50% to 90% of their modeling efforts getting clean datasets.
The simplest regression formula, a straight line like Y = A*X + B + noise(), uses two degrees of freedom: one for the slope (A) and one for the intercept (B). A more complicated formula, Y = A1*X1 + A2*X2 + B + noise(), would use three degrees of freedom. A degree of freedom is used for each explanatory variable (X1 and X2) plus an additional one for the intercept.
To be statistically reliable, a simple regression (one with a single explanatory variable) needs about 30 pieces of data. Inverting that statement, unless you have more than 30 observations, your model should only have 1 explanatory variable. After that, every 10 additional data points will allow the model to incorporate another explanatory factor (See Figure – Allowable Explanatory Variables).
For example, a 10-year study on quarterly data would have about 40 data points. The most complicated model built to analyze this data should include 2 explanatory variables. If the data needs to be partitioned into a fitting period (30 data points) and an out-of-sample testing period (10 data points), the model might need to be simplified so that it only has a single explanatory variable.
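That rule of thumb is easy to encode as a sanity check. The helper below is hypothetical; its name and cutoffs simply restate the guideline above (about 30 observations for the first explanatory variable and roughly 10 more for each additional one):

```python
# Hypothetical helper encoding the article's rule of thumb for model complexity.
def max_explanatory_variables(n_observations: int) -> int:
    """Rule-of-thumb cap on explanatory variables for a regression forecast."""
    if n_observations < 30:
        return 0  # not enough data for even one reliable explanatory variable
    return 1 + (n_observations - 30) // 10

# The 10-year quarterly example above:
print(max_explanatory_variables(40))  # 2 - full history of 40 quarterly points
print(max_explanatory_variables(30))  # 1 - a 30-point fitting window after partitioning
```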
Additional explanatory variables should increase the accuracy of a model – but only up to a point. After that point, the model will be “overfit”. An overfit model is one that has learned the training data too well, capturing noise and random fluctuations rather than the underlying patterns. This results in a model that performs superbly on the training data but fails to give good predictions out of sample. To avoid overfitting, techniques such as regularization, cross-validation, and pruning can be employed. Finally, many machine learning tools can automatically screen variables for potential inclusion in a regression forecasting model. If these tools are used, it is critical to constrain the number of variables that can be included. If there is no constraint, add one. If a constraint exists, don’t turn it off.
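As one possible way to keep that constraint in place, here is a sketch using scikit-learn (an assumed toolset; the data and variable names are synthetic). The selector is capped at the number of variables the data can support, and cross-validation provides the out-of-sample check:

```python
# Sketch: automatic variable screening with an explicit cap on selected features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
n_obs = 40                                  # e.g., ten years of quarterly data
X = rng.normal(size=(n_obs, 8))             # eight candidate factors (synthetic)
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=n_obs)

k = 2                                       # cap from the ~40-observation rule of thumb
model = make_pipeline(SelectKBest(f_regression, k=k), LinearRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(scores.mean())                        # out-of-sample R2 with the constraint in place
```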
Classical Blunder 3. Defining the wrong success criteria
The third classic forecasting blunder is to over-fixate on a specific testing metric. In school, teachers will have scoring rubrics. For example, a teacher might say 90% or above is an A, 80% or above is a B, and so on. Models can’t be graded the same way. A good model is one that accurately describes reality – not one that scores well on a particular test.
For example, a bank might need to stress test various loans. These tests examine whether a loan issued by the bank will get riskier if market conditions change (things like US Treasury rates, credit spreads, and so on). Not all loans will be equally sensitive to these risks; some may not be sensitive at all. As a result, one potential conclusion – a perfectly reasonable conclusion – may be that certain loans are not sensitive to changes in market conditions. This would produce a really bad R2 result. However, in many ways, this is a good result: it means the loan’s risks are not correlated with the risks affecting other loans. That’s great for diversification.
If risk is unrelated to market conditions, that is important for the business to understand. For example, if an R2 test indicates that only 10% of the variation is due to the explanatory variables, that doesn’t mean the analysis was done incorrectly. The loan might simply not be sensitive to the risk factors. Similarly, modelers should not attempt to add dummy variables or otherwise modify the model just to give the forecast a higher R2. If the risk is actually low, that’s what needs to be reported.
It should be noted that while a low R2 test result may be acceptable, it may require additional analysis. Understanding the actual risk drivers is usually better than leaving things as an unknown. It’s a waterfall – an explanatory forecast is better than no forecast, and no forecast is better than a misleading forecast.
Also, modelers should keep some rules of thumb in mind when interpreting test results. Extremely high results are often as bad as low results. Test results can be too good to be believed (See Figure – R2 Rubric). If the results look too good, this might be due to modelers trying to solve the wrong problem (the first blunder) or overfitting the model (the second blunder).
Classical Blunder 4. Interpolating with Future Data
At some point in every modeling career, there comes a sudden realization that real-world data is messy. The transition from being the one provided with clean data sets to being the one responsible for preparing those data sets is a hard one, and it is often a traumatic point early in a modeler’s career.
In the real world, a very large share of the effort, often more than 50% of the total time spent on a project, goes into getting the data ready to analyze. Failing to spend enough time on this is a good way to ensure a failed model. This is very much a garbage-in, garbage-out situation. For example, financial data is often missing on holidays when financial markets are closed (See Figure – Missing Holiday Data).
Cleaning the data includes handling missing or corrupted data points, outliers, and unwanted seasonal patterns. Sometimes this involves filling gaps: techniques like forward fill and linear interpolation are used to estimate missing data points. For daily financial data, it is common sense that filling in missing data using the prior good data point is safer than trying to interpolate using the surrounding data points.
With a fill-forward approach, the prior data point is copied forward (See Figure – Using a Fill-Forward Approach). This is safe because the previous value was already known on the date of the missing data. Interpolation is not safe because it requires both end points to be known, and the value for the following date was not yet available on the missing date.
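A small pandas sketch makes the difference concrete (dates and prices are made up). Forward fill uses only information available on the holiday, while linear interpolation quietly borrows the next trading day’s value:

```python
# Sketch: filling a market-holiday gap with ffill (safe) versus interpolation (lookahead).
import pandas as pd

prices = pd.Series(
    [101.2, 102.0, None, 102.8, 103.1],   # None marks the July 4 market holiday
    index=pd.to_datetime(["2024-07-02", "2024-07-03", "2024-07-04",
                          "2024-07-05", "2024-07-08"]),
)

safe = prices.ffill()              # 2024-07-04 gets 102.0, the last value known that day
lookahead = prices.interpolate()   # 2024-07-04 gets 102.4, which uses the 07-05 value

print(safe.loc["2024-07-04"], lookahead.loc["2024-07-04"])
```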