PREDICTIVE MODELING

What is predictive modeling?

Predictive modeling is a mathematical process used to predict future events or outcomes by analyzing patterns in a given set of input data. It is a crucial component of predictive analytics, a type of data analytics which uses current and historical data to forecast activity, behavior and trends.

Examples of predictive modeling include estimating the quality of a sales lead, the likelihood of spam or the probability someone will click a link or buy a product. These capabilities are often baked into various business applications, so it is worth understanding the mechanics of predictive modeling to troubleshoot and improve performance.

Although predictive modeling implies a focus on forecasting the future, it can also predict outcomes (e.g., the probability a transaction is fraudulent). In this case, the event has already happened (fraud committed). The goal here is to predict whether future analysis will find the transaction is fraudulent. Predictive modeling can also forecast future requirements or facilitate what-if analysis.

"Predictive modeling is a form of data mining that analyzes historical data with the goal of identifying trends or patterns and then using those insights to predict future outcomes," explained Donncha Carroll, a partner in the revenue growth practice of Axiom Consulting Partners. "Essentially, it asks the question, 'have I seen this before?' followed by, 'what typically comes after this pattern?'"

Top types of predictive models

There are many ways of classifying predictive models, and in practice multiple types of models are often combined for best results. The most salient distinction is between supervised and unsupervised models.

  • Supervised models are trained on data that has already been labeled with the outcome of interest, using techniques ranging from logistic regression and decision trees to neural networks.
  • Unsupervised models work on unlabeled data, using techniques such as clustering and anomaly detection to find structure in the data directly.

The biggest difference between these approaches is that supervised models require the extra upfront work of properly labeling the training data sets.

"The application of different types of models tends to be more domain-specific than industry-specific," said Scott Buchholz, government and public services CTO and emerging technology research director at Deloitte Consulting.

In certain cases, for example, standard statistical regression analysis may provide the best predictive power. In other cases, more sophisticated models are the right approach. For example, in a hospital, classic statistical techniques may be enough to identify key constraints for scheduling, but neural networks, a type of deep learning, may be required to optimize patient assignment to doctors.

Once data scientists have gathered sample data, they must select the right model. Linear regressions are among the simplest types of predictive models. Linear models take two correlated variables -- one independent and one dependent -- plotting the independent variable on the x-axis and the dependent variable on the y-axis. The model applies a best-fit line to the resulting data points, which data scientists can then use to predict future values of the dependent variable.
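As a minimal sketch of that idea, the best-fit line can be computed with ordinary least squares; the ad-spend and sales figures below are invented for illustration.

```python
# Simple linear regression fit by ordinary least squares.
# All data here is made up for illustration.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares best-fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (independent) vs. units sold (dependent)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 19.0, 31.0, 42.0, 48.0]

slope, intercept = fit_line(ad_spend, units_sold)
# Predicting the dependent variable for a new independent value is then
# a single multiply-and-add along the fitted line.
predicted = slope * 6.0 + intercept
```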

Some of the most popular methods include the following:

  • Decision trees. Decision tree algorithms take data (mined, open source, internal) and graph it out in branches to display the possible outcomes of various decisions. Decision trees classify or predict response variables based on past decisions, can be used with incomplete data sets, and are easily explained and accessible for novice data scientists.
  • Time series analysis. This technique analyzes data points collected in sequence over time. You can predict future events by identifying past trends and extrapolating from them.
  • Logistic regression. This statistical method estimates the probability that an observation belongs to a particular category, making it well suited to classification tasks such as yes/no predictions. As more labeled data is brought in, the model's ability to classify new observations improves.
  • Neural networks. This technique reviews large volumes of labeled data in search of correlations between variables in the data. Neural networks form the basis of many of today's examples of artificial intelligence (AI), including image recognition, smart assistants and natural language generation.
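To make the decision tree idea concrete, the hand-written rules below score a hypothetical sales lead. The feature names and thresholds are invented for this sketch; a real decision tree is learned from historical data rather than coded by hand.

```python
# Toy two-level decision tree for scoring a sales lead.
# Features and thresholds are hypothetical.

def classify_lead(lead):
    """Walk the tree's branches and return a lead-quality label."""
    if lead["visits"] > 5:
        if lead["opened_pricing_page"]:
            return "hot"
        return "warm"
    if lead["email_engaged"]:
        return "warm"
    return "cold"

label = classify_lead(
    {"visits": 8, "opened_pricing_page": True, "email_engaged": False}
)
```

Each `if` corresponds to one branch point in the tree, which is why tree predictions are easy to explain: the path taken is the explanation.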

The most complex area of predictive modeling is the neural network. This type of machine learning model independently reviews large volumes of labeled data in search of correlations between variables in the data. It can detect even subtle correlations that only emerge after reviewing millions of data points. The algorithm can then make inferences about new, unlabeled data that is similar in type to the data set it trained on.
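A minimal sketch of what a neural network computes: one forward pass through a tiny network with two inputs, two hidden units, and one output. The weights below are hand-picked for illustration; in a real network they are learned from labeled data during training.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Compute the network's output for input vector x."""
    # Each hidden unit takes a weighted sum of the inputs, plus a bias,
    # passed through a nonlinear activation (sigmoid here).
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # The output unit does the same over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

score = forward(
    [1.0, 0.0],                           # input features (made up)
    w_hidden=[[2.0, -1.0], [-1.5, 2.5]],  # hidden-layer weights
    b_hidden=[0.0, 0.5],                  # hidden-layer biases
    w_out=[1.0, -1.0],                    # output-layer weights
    b_out=0.0,
)
```

Training consists of nudging those weights so the output matches the labels; the forward pass itself stays this simple.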

What are the uses of predictive modeling?

Predictive modeling is often associated with meteorology and weather forecasting, but predictive models have many applications in business. Today's predictive analytics techniques can discover patterns in the data to identify upcoming risks and opportunities for an organization.

"Almost anywhere a smart human is regularly making a prediction in a historically data rich environment is a good use case for predictive analytics," Buchholz said. "After all, the model has no ego and won't get bored."

One of the most common uses of predictive modeling is in online advertising and marketing. Modelers use web surfers' historical data to determine what kinds of products users might be interested in and what they are likely to click on.

Bayesian spam filters use predictive modeling to identify the probability that a given message is spam.
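A minimal sketch of the Bayesian idea: combine per-word spam and non-spam frequencies with a prior using Bayes' rule. The word probabilities and prior below are made up; a real filter estimates them from a large labeled corpus and also handles words it has never seen.

```python
import math

# Hypothetical per-word likelihoods: P(word | spam) and P(word | ham)
p_word_spam = {"free": 0.30, "winner": 0.20, "meeting": 0.01}
p_word_ham = {"free": 0.02, "winner": 0.01, "meeting": 0.15}
P_SPAM = 0.4  # assumed prior probability that any message is spam

def spam_probability(words):
    """Combine per-word likelihoods with the prior via Bayes' rule.

    Working in log space avoids underflow on long messages.
    """
    log_spam = math.log(P_SPAM)
    log_ham = math.log(1 - P_SPAM)
    for w in words:
        if w in p_word_spam:  # skip words we have no estimates for
            log_spam += math.log(p_word_spam[w])
            log_ham += math.log(p_word_ham[w])
    spam, ham = math.exp(log_spam), math.exp(log_ham)
    return spam / (spam + ham)

p = spam_probability(["free", "winner"])
```

Because "free" and "winner" are far more common in spam than in legitimate mail under these assumed frequencies, the combined probability lands close to 1.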

In fraud detection, predictive modeling is used to identify outliers in a data set that point toward fraudulent activity. In customer relationship management, predictive modeling is used to target messaging to customers who are most likely to make a purchase.
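A crude stand-in for that kind of outlier detection is a z-score rule: flag any transaction amount that sits far from the historical mean. The amounts and threshold below are illustrative only; production fraud systems use far richer features and models.

```python
import statistics

def flag_outliers(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations
    from the mean of the series."""
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)
    return [a for a in amounts if abs(a - mean) / stdev > threshold]

# Hypothetical transaction history with one anomalous amount
history = [25.0, 30.0, 28.0, 27.0, 31.0, 26.0, 29.0, 900.0]
flagged = flag_outliers(history)
```

One known weakness of this simple rule is that an extreme outlier inflates the mean and standard deviation themselves, which is part of why real systems use more robust methods.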

Carroll said that predictive modeling is widely used in predictive maintenance, which has become a huge industry generating billions of dollars in revenue. One of the more notable examples can be found in the airline industry where engineers use IoT devices to remotely monitor performance of aircraft components like fuel pumps or jet engines.

These tools enable preemptive deployment of maintenance resources to increase equipment utilization and limit unexpected downtime. "These actions can meaningfully improve operational efficiency in a world that runs just in time where surprises can be very expensive," Carroll said.

Other areas where predictive models are used include the following:

  • capacity planning
  • change management
  • disaster recovery
  • engineering
  • physical and digital security management
  • city planning

Benefits of predictive modeling

Phil Cooper, group VP of products at Clari, a RevOps software startup, said some of the top benefits of predictive modeling in business include the following:

  • Prioritizing resources. Predictive modeling is used to identify sales lead conversion and send the best leads to inside sales teams; predict whether a customer service case will be escalated and triage and route it appropriately; and predict whether a customer will pay their invoice on time and optimize accounts receivable workflows.
  • Improving profit margins. Predictive modeling is used to forecast inventory, create pricing strategies, predict the number of customers and configure store layouts to maximize sales.
  • Optimizing marketing campaigns. Predictive modeling is used to unearth new customer insights and predict behaviors based on inputs, allowing organizations to tailor marketing strategies, retain valuable customers and take advantage of cross-sell opportunities.
  • Reducing risk. Predictive analytics can detect activities that are out of the ordinary such as fraudulent transactions, corporate spying or cyber attacks to reduce reaction time and negative consequences.

The techniques used in predictive modeling are probabilistic as opposed to deterministic. This means models generate probabilities of an outcome and include some uncertainty.
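A small sketch of what "probabilistic" means in practice: a logistic model returns a probability rather than a verdict, and any yes/no decision comes from applying a cutoff to that probability. The feature, weight, and bias here are invented for illustration.

```python
import math

def churn_probability(months_inactive, weight=0.8, bias=-2.0):
    """Logistic model: map a feature to a probability in (0, 1).

    The weight and bias are hypothetical; a real model would learn
    them from historical customer data.
    """
    return 1.0 / (1.0 + math.exp(-(weight * months_inactive + bias)))

p = churn_probability(4)   # a probability, not a yes/no answer
decision = p > 0.5         # the cutoff is a business choice
```

Surfacing `p` alongside `decision` is one way to communicate the uncertainty to users rather than hiding it behind a binary answer.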

"This is a fundamental and inherent difference between data modeling of historical facts versus predicting future events [based on historical data] and has implications for how this information is communicated to users," Cooper said. Understanding this difference is critical for transparency and explainability in how a prediction or recommendation is generated.



More articles by Rijika Roy
