The Four Classical Blunders of Time Series Forecasting
Scene from The Princess Bride (1987 Film)


Could you be falling victim to one of these classic modeling blunders?

These four blunders aren’t quite as well-known as going all-in against a Sicilian, but they are commonly observed when validating DFAST, CCAR, CECL, and similar stress testing models. In many cases, modelers are conditioned to make these blunders by their previous experience – creating issues that common sense suggests should be avoidable. In school, teachers give students well-defined problems, provide enough data to solve them, and are there afterward to answer “Did I do it correctly?” None of these things happen in real life – problems are ill-defined, data is messy, and if anyone knew the right answer there would be no need to build a model in the first place. Certain types of errors are especially common with regression models used for forecasting and stress testing.

These include:

Figure - The Four Classical Blunders

Classical Blunder 1. Solving the wrong problem

The first classic blunder of time series forecasting is solving the wrong problem. It is entirely possible to solve a math problem correctly and still not answer the right question. In many cases, this blunder occurs because solving the wrong problem is easier than solving the right problem – modelers try several approaches, get a passing test result, and stop looking. For example, a commercial real-estate modeler might want to forecast which neighborhoods will see the highest rent growth, with the goal of investing in rental properties where rents will rise fastest. They set up a regression equation and use a machine-learning technique to solve for A and B (see Eq. 1):

Rent(t) = A*Rent(t-1) + B*Predictive_Factor(t-1) + Noise()            (Eq. 1)

Unfortunately, this formula doesn’t actually predict relative changes (one neighborhood versus another). It is also set up so that the coefficient on a non-predictive term (A) will be much larger than the coefficient on the predictive factor (B). As any renter can tell you, rents typically go up – on average 2% to 5% a year – so last year’s rent, through the “A” coefficient, will explain about 97% of next year’s rent. There are several problems with this analysis:

  1. Random Walk. This equation is set up to look a lot like a random walk. Random walks follow equations like X(t) = X(t-1) + Noise(), and this formula behaves like one because A >> B (A explains about 97% of the current rent). The historical behavior of a random walk (or its historical correlation with other factors) isn’t predictive of its future behavior.
  2. Predictive factors get ignored. Because last year’s rent makes up such a large portion of next year’s rent, it hides the other predictive factors.
  3. Test results look great. The prior year’s rent makes up about 97% of the next year’s rent, so the regression will produce an R2 of approximately 0.97 – a level where many modelers would put down their pencil and claim great success, when they should be rechecking their work.

This creates the perfect preconditions for our first classical blunder: a non-meaningful analysis that is easy to create, important predictive factors that are obscured, and a really good test result attached to the non-meaningful analysis. Even a model reviewer who is shown the analysis might get distracted by the strong test metric and not catch that the forecast doesn’t answer the right question.
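To make the point concrete, here is a minimal simulation sketch (the rent series, the factor, and all numbers below are made up for illustration, not taken from the article): rents drift upward on their own while the “predictive” factor is pure noise, yet a levels regression like Eq. 1 still reports a near-perfect R2.

```python
# Hedged illustration: a levels regression like Eq. 1 looks great even when
# the predictive factor is pure noise. All values are simulated.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
factor = rng.normal(size=n)            # no predictive power at all
rent = np.empty(n)
rent[0] = 1000.0
for t in range(1, n):
    # rents drift up roughly 3% per period plus noise, ignoring the factor
    rent[t] = rent[t - 1] * (1.03 + rng.normal(scale=0.02))

X = np.column_stack([rent[:-1], factor[:-1]])   # Rent(t-1), Factor(t-1)
y = rent[1:]                                    # Rent(t)
fit = LinearRegression().fit(X, y)
print(f"R2 on levels: {fit.score(X, y):.3f}")   # typically well above 0.97
```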

Of course, there is a well-known technique that minimizes the chance of setting up forecasting models incorrectly: differencing. Differencing compares changes in outputs to changes in inputs. It won’t give you a 0.97 R2 test result, but it will give a meaningful analysis for most types of forecasts. Differencing lets modelers set up the problem so that changes in rent are a function of changes in predictive factors (see Eq. 2).

Rent_Change(t) = A*Factor1_Change(t-1) + B*Factor2_Change(t-1) + Noise()            (Eq. 2)

Setting up the equation in terms of changes has several advantages. First, it doesn’t use up a degree of freedom estimating the average historical change, which frees room for a second predictive factor in the model. Second, it allows predictive factors like inflation to be estimated; previously, inflation would have been obscured because it typically runs in the same 2% to 5% range as the average change in rents. Third, it allows location-specific factors to be included in a way that actually shows up in the test results (like R2).
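As a sketch of what Eq. 2 can look like in code, the snippet below differences a simulated rent series and regresses the changes on lagged changes in two factors. The factor names, the use of percentage changes for rent, and the simulated values are illustrative assumptions, not data or code from the article.

```python
# A hedged sketch of the differenced setup in Eq. 2, using simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 80
factors = pd.DataFrame({
    "inflation": 0.03 + np.cumsum(rng.normal(0, 0.002, n)),
    "jobs": np.cumsum(rng.normal(0, 0.01, n)),
})
# simulate rents whose growth responds to the prior period's factor changes
growth = (
    0.03
    + (0.8 * factors["inflation"].diff() + 0.4 * factors["jobs"].diff()).shift(1).fillna(0)
    + rng.normal(0, 0.005, n)
)
df = factors.assign(rent=1000 * (1 + growth).cumprod())

# Eq. 2: regress changes in rent (here, percentage changes) on lagged factor changes
y = df["rent"].pct_change()
X = sm.add_constant(df[["inflation", "jobs"]].diff().shift(1))
fit = sm.OLS(y, X, missing="drop").fit()
print(fit.params)                     # coefficients on changes, not levels
print(f"R2: {fit.rsquared:.2f}")      # far below 0.97, but the analysis is meaningful
```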

Many new forecasters wonder whether differencing is really necessary or useful for improving regression forecasts. The answer is an unequivocal “Yes!” Proper differencing will improve almost any type of forecasting study. Even if a different approach is ultimately taken, a differenced equation should be the starting point for most forecasts.

  • Key Point 1: Regression-based forecasts should almost always examine differences in value, comparing changes in outputs to changes in inputs, not absolute levels.

There are a couple of reasons why differencing is so useful. First, there are substantial mathematical benefits when the variables are centered around zero – in particular, the regression residuals are much more likely to be stationary. This is a huge deal, although a full discussion is outside the scope of this article. Second, forecasts that analyze differences (changes in values) align more closely with cause and effect than correlational studies that compare levels of one variable to another. Causation is better at forecasting than correlation.
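One quick way to see the stationarity benefit is to run an Augmented Dickey-Fuller test on the levels of a simulated random walk and on its first differences. This is a hedged illustration using statsmodels and made-up data:

```python
# Hedged illustration of the stationarity point: levels of a random walk fail
# an Augmented Dickey-Fuller test, while their first differences pass it.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
levels = np.cumsum(rng.normal(0.0, 1.0, 300))    # random walk: non-stationary
changes = np.diff(levels)                        # first differences: stationary

print(f"ADF p-value, levels:  {adfuller(levels)[1]:.3f}")   # large: cannot reject a unit root
print(f"ADF p-value, changes: {adfuller(changes)[1]:.3f}")  # small: stationary
```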

Correlation studies (which compare levels of output to levels of input) aren’t always bad studies. However, at best they can typically only hint at the right answer, and they need additional analysis. Since that additional analysis is typically a differenced analysis, it often makes sense to start with the differenced analysis. Mistaking correlation for causation is such a common problem that it has become a meme in popular media:

But to measure cause and effect... you must ensure that a simple correlation, however tempting it may be, is not mistaken for a cause. In the 1990s the stork population of Germany increased, and the German at-home birth rate rose as well. Shall we credit storks for airlifting the babies? – Neil deGrasse Tyson

Classical Blunder 2. Building a more complex model than the data will support

The second classic forecasting blunder occurs when a model includes more variables than the data can support. There is a mathematical term for the number of parameters used by a regression model: degrees of freedom. This may be abbreviated “df” (which can get confusing if you are also differentiating) or “dof” (which sounds a little silly). When a model uses too many, it’s called “overfitting”. Overfit models test well but don’t work well on out-of-sample data.

In school, teachers provide their students with enough data to solve the problems assigned to them. However, once the classroom is left behind, data becomes a huge problem. Sometimes it is just not available. In other cases, it might be full of errors. It is not unusual for data scientists to spend 50% to 90% of their modeling efforts getting clean datasets.

  • Don’t trust that the data set is accurate.
  • Don’t use too many variables.
  • Don’t use dummy variables for error correction.

The simplest regression formula, a straight line like Y = A*X + B + noise(), uses two degrees of freedom: one for the slope (A) and one for the intercept (B). A more complicated formula, Y = A1*X1 + A2*X2 + B + noise(), would use three degrees of freedom. A degree of freedom is used for each explanatory variable (X1 and X2), plus an additional one for the intercept.

To be statistically reliable, a simple regression (one with a single explanatory variable) needs about 30 data points. Inverting that statement: unless you have more than 30 observations, your model should have only one explanatory variable. After that, every 10 additional data points allow the model to incorporate another explanatory factor (see Figure – Allowable Explanatory Variables).

Figure - Allowable Explanatory Variables

For example, a 10-year study on quarterly data would have about 40 data points. The most complicated model built to analyze this data should include 2 explanatory variables. If the data needs to be partitioned into a fitting period (30 data points) and an out-of-sample testing period (10 data points), the model might need to be simplified so that it only has a single explanatory variable.
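A small helper can make this rule of thumb explicit. The function below is an illustrative sketch of the figures above (roughly 30 observations for the first explanatory variable, plus 10 more for each additional one); treating fewer than 30 observations as supporting no reliable regression is an assumption, not a statement from the article.

```python
# Sketch of the rule of thumb for model complexity; the cutoff of zero
# variables below 30 observations is an assumption for illustration.
def max_explanatory_variables(n_observations: int) -> int:
    """Rule-of-thumb cap on explanatory variables for a regression forecast."""
    if n_observations < 30:
        return 0          # too little data for a statistically reliable regression
    return 1 + (n_observations - 30) // 10

print(max_explanatory_variables(40))  # 10 years of quarterly data -> 2
print(max_explanatory_variables(30))  # bare minimum -> 1
```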

  • Key Point 2: The complexity of a model should be limited by the amount of available data. If automated tools are used to select explanatory variables for a model, there should be a constraint limiting the number of explanatory variables that can be selected.

Additional explanatory variables should increase the accuracy of a model – but only up to a point. After that point, the model becomes overfit. Overfitting occurs when a model learns the training data too well, capturing noise and random fluctuations rather than the underlying patterns. The result is a model that performs superbly on the training data but fails to give good predictions out of sample. To avoid overfitting, techniques such as regularization, cross-validation, and pruning can be employed. Finally, many machine learning tools can automatically screen variables for potential inclusion in a regression forecasting model. If these tools are used, it is critical to constrain the number of variables that can be included. If there is no constraint, add one. If a constraint exists, don’t turn it off.
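As one possible way to enforce such a constraint, the sketch below uses scikit-learn’s cross-validated forward selection capped at two explanatory variables (matching the 40-observation example above). The candidate factors are simulated purely for illustration.

```python
# Hedged example of constraining an automated variable search: forward
# selection with cross-validation, capped at two explanatory variables.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 8))                  # 40 observations, 8 candidate factors
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=40)

selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=2,    # the constraint: never exceed what the data supports
    direction="forward",
    cv=5,
)
selector.fit(X, y)
print(selector.get_support(indices=True))     # indices of the selected factors
```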

Classical Blunder 3. Defining the wrong success criteria

The third classic forecasting blunder is to over-fixate on a specific testing metric. In school, teachers will have scoring rubrics. For example, a teacher might say 90% or above is an A, 80% or above is a B, and so on. Models can’t be graded the same way. A good model is one that accurately describes reality – not one that scores well on a particular test.

For example, a bank might need to stress test various loans. These tests examine whether a loan issued by the bank will get riskier if market conditions change (things like US Treasury rates, credit spreads, and so on). Not all loans will be equally sensitive to these risks; some may not be sensitive at all. As a result, one potential conclusion – a reasonable conclusion – may be that certain loans are not sensitive to changes in market conditions at all. This would produce a really bad R2 result. However, in many ways this is a good result: it means the loan’s risks are not correlated with the same risks affecting other loans. That’s great for diversification.

If risk is unrelated to market conditions, that is important for the business to understand. For example, if an R2 test indicates that only 10% of the variation is due to the explanatory variables, that doesn’t mean the analysis was done incorrectly – the loan might simply not be sensitive to the risk factors. Modelers should not attempt to add dummy variables or modify the model just to give the forecast a higher R2. If the risk is actually low, that’s what needs to be reported.

  • Key Point 3: Modelers should focus on accurately describing reality and not get tunnel vision trying to maximize R2.

It should be noted that while a low R2 test result may be acceptable, it may require additional analysis. Understanding the actual risk drivers is usually better than leaving things as an unknown. It’s a waterfall: an explanatory forecast is better than no forecast, and no forecast is better than a misleading forecast.

Also, modelers should keep some rules of thumb in mind when interpreting test results. Extremely high results are often as bad as low results – test results can be too good to be believed (see Figure – R2 Rubric). If the results look too good, this might be due to solving the wrong problem (the first blunder) or overfitting the model (the second blunder).

Figure - R2 Rubric


Classical Blunder 4. Interpolating with Future Data

At some point in every modeling career comes the sudden realization that real-world data is messy. The transition from being the one provided with clean data sets to being the one responsible for preparing those data sets is a hard one, and it is often a traumatic point early in a modeler’s career.

In the real world, a very large share of the time – often more than 50% of the total time spent on a project – goes into getting the data ready to analyze. Failing to spend enough time on this is a good way to ensure a failed model; it is very much a garbage in, garbage out situation. For example, financial data is often missing on holidays when financial markets are closed (see Figure – Missing Holiday Data).


Figure - Missing Holiday Data

Cleaning the data includes handling missing or corrupted data points, outliers, and unwanted seasonal patterns. Sometimes this involves estimating missing values, using techniques like fill-forward and linear interpolation. For daily financial data, filling in a missing value using the prior good data point is safer than interpolating between the surrounding data points.

  • Fill-Forward. This approach carries the prior data point forward. It only requires knowledge of the prior data point.
  • Linear Interpolation. Interpolation estimates unknown values within a range of known values, so it requires both end points to be known at the time of the estimate. A missing data point is replaced by averaging the surrounding data points.

With a fill-forward approach, the prior value is copied forward (see Figure - Using a Fill-Forward Approach). This is safe because the previous value was already known on the date of the missing data. Interpolation is not safe for this purpose, because it requires the following date’s value – data that was not available on the missing date.

Figure - Using a Fill-Forward Approach
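A minimal pandas sketch makes the contrast concrete. The dates and prices are made up, with July 4 standing in for a market holiday:

```python
# Hedged example: fill-forward uses only past data, while linear interpolation
# peeks at the next (future) observation to fill the holiday gap.
import pandas as pd

prices = pd.Series(
    [101.0, 102.5, None, 104.0],
    index=pd.to_datetime(["2024-07-02", "2024-07-03", "2024-07-04", "2024-07-05"]),
)

filled = prices.ffill()                              # July 4 takes the July 3 value
interpolated = prices.interpolate(method="linear")   # July 4 also uses the July 5 value

print(filled)         # July 4 -> 102.50, known on the missing date
print(interpolated)   # July 4 -> 103.25, requires knowledge of July 5
```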


  • Key Point. Filling forward data prevents tainting historical data with knowledge of the future.
  • Key Point. Interpolation has a lot of uses. It can be used to de-seasonalize data (assuming the seasonality stays constant year-over-year). On a time series, it should only be used if both end points would have been known on the date of the missing data point.
