How to improve the predictions of MBO neural network models?


Market by Order (MBO) data, once overshadowed by the prominence of limit order books (LOBs), has emerged as a potent tool for forecasting price movements. While LOBs have traditionally garnered the limelight, recent advancements highlight the value of incorporating MBO data into neural network (NN) models.

Exchanges typically provide high-frequency microstructure data in three tiers: Level 1 (L1), Level 2 (L2), and Level 3 (L3). L1 offers basic information like the last executed trade price and real-time best bid and ask of an order book. L2 delves deeper, showing bids and asks at various levels of the order book, commonly known as LOB data. L3, or MBO data, provides even more detailed insights, displaying non-aggregated bids and asks placed by individual traders.

MBO data operates on a message-based feed, enabling observation of individual actions of market participants. It includes crucial components such as timestamps, unique order IDs, order types (limit or market), side (buy or sell), and actions (updating, adding, or canceling orders). This level of granularity offers unparalleled insights into individual behavior and order book dynamics, enhancing transparency without compromising customer confidentiality.
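The components above can be sketched as a message record. This is an illustrative schema only; actual field names, encodings, and timestamp conventions vary by exchange and data vendor.

```python
from dataclasses import dataclass
from enum import Enum

class Side(Enum):
    BUY = "B"
    SELL = "S"

class Action(Enum):
    ADD = "A"     # new order placed
    MODIFY = "M"  # existing order updated
    CANCEL = "C"  # order removed
    TRADE = "T"   # order executed

@dataclass(frozen=True)
class MboMessage:
    """One event on a message-based MBO feed (illustrative schema)."""
    ts_ns: int     # exchange timestamp, nanoseconds since epoch
    order_id: int  # unique, non-aggregated order identifier
    side: Side
    action: Action
    price: float
    size: int

msg = MboMessage(ts_ns=1_700_000_000_000_000_000, order_id=42,
                 side=Side.BUY, action=Action.ADD,
                 price=101.25, size=300)
```

Because every message carries a unique order ID rather than an aggregated level, the full life cycle of a single order (add, modify, cancel, or trade) can be reconstructed from the stream.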

Integrating MBO data alongside LOB data in NN models marks a substantial advance in high-frequency trading. By harnessing the strengths of both data sources, traders can unlock new avenues for alpha generation and gain a competitive edge in dynamic markets, ultimately making more informed decisions.

Utilizing MBO data to train neural network models for market direction forecasting within a short forecasting horizon is a promising approach. By analyzing sequences of MBO data and their corresponding market movements, NNs can discern patterns and relationships between order actions and price movements. For instance, identifying that a surge in market orders on the buy side often precedes an upward price movement or recognizing that a high volume of canceled limit orders may signal potential market indecision or reversal.
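The two example signals above, a surge in buy-side market orders and a high cancellation rate, can be turned into simple window features for an NN input. This is a minimal sketch with made-up field names and thresholds, not a production feature pipeline.

```python
def window_features(messages):
    """Summarize one window of MBO events into toy NN input features.
    Each message is a dict with illustrative keys: type, side, action."""
    n = len(messages)
    buy_markets = sum(1 for m in messages
                      if m["type"] == "market" and m["side"] == "buy")
    cancels = sum(1 for m in messages if m["action"] == "cancel")
    return {
        "buy_market_ratio": buy_markets / n,  # buy-side market-order surge
        "cancel_ratio": cancels / n,          # possible indecision/reversal
    }

window = [
    {"type": "market", "side": "buy",  "action": "trade"},
    {"type": "limit",  "side": "sell", "action": "add"},
    {"type": "limit",  "side": "buy",  "action": "cancel"},
    {"type": "market", "side": "buy",  "action": "trade"},
]
feats = window_features(window)
# feats == {"buy_market_ratio": 0.5, "cancel_ratio": 0.25}
```

A sequence of such per-window feature vectors, paired with the subsequent price move as the label, is one common way to frame the supervised learning problem.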

However, in real-world production environments, the structure of MBO data may evolve over time, reflecting new strategies or trading behaviors not captured during model training. This evolution introduces epistemic uncertainty, stemming from limitations in model capacity and encountering unseen data. Adapting models to these unforeseen changes is crucial for maintaining robustness and adaptability.

As uncertainty* remains a critical aspect of machine learning model deployment, ongoing research efforts aim to develop reliable strategies for detecting and addressing out-of-model-scope inputs. By acknowledging and navigating epistemic uncertainty, traders can enhance the resilience of their machine learning models, ultimately improving decision-making and performance in dynamic market environments. Two approaches can be employed by quants to handle uncertainty effectively.

Out-of-Model-Scope Detection

One strategy to address uncertainty is out-of-model-scope detection, aimed at identifying inputs that may lead to erroneous model predictions due to unseen data or model limitations. One simple approach involves setting a threshold on the maximum softmax** value over the predicted classes. Inputs whose maximum softmax value falls below this threshold are flagged as out of model scope, indicating a lack of confidence in the prediction.
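A minimal sketch of this thresholding rule follows; the threshold of 0.7 and the probability vectors are illustrative, and in practice the threshold would be tuned on held-out data.

```python
def flag_out_of_scope(probs, threshold=0.7):
    """Flag a prediction as out of model scope when the maximum
    softmax probability falls below the confidence threshold."""
    return max(probs) < threshold

# Softmax outputs for three classes (down / flat / up), made-up values.
confident = [0.05, 0.10, 0.85]
uncertain = [0.30, 0.36, 0.34]

keep = flag_out_of_scope(confident)   # False: confident, keep prediction
drop = flag_out_of_scope(uncertain)   # True: low confidence, flag input
```

Note that max-softmax confidence is a heuristic: a model can be confidently wrong, which is why the evaluation discussed next matters.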

To evaluate the effectiveness of this strategy, we treat out-of-model-scope detection as a binary classification problem and measure performance using metrics such as true positive rate and false positive rate. By varying the threshold, we can plot a receiver operating characteristic (ROC) curve and compute the area under the curve (AUC) to assess the quality of the binary classifier. However, it's essential to recognize that the choice of threshold and performance evaluation may vary depending on the specific dataset and model characteristics.
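The ROC/AUC evaluation above can be computed directly. Below is a self-contained sketch using `1 - max softmax` as the out-of-scope score, with made-up scores and labels; ties between scores are not handled specially here.

```python
def roc_auc(scores, labels):
    """AUC for a binary detector where HIGHER score means more likely
    out of scope (e.g. 1 - max softmax). labels: 1 = truly out of scope."""
    pairs = sorted(zip(scores, labels), reverse=True)
    pos = sum(labels)
    neg = len(labels) - pos
    tpr, fpr, tp, fp = [0.0], [0.0], 0, 0
    for _score, y in pairs:  # sweep the threshold from high to low
        if y == 1:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
    # trapezoidal area under the (fpr, tpr) curve
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
               for i in range(len(fpr) - 1))

scores = [0.8, 0.7, 0.3, 0.2]  # out-of-scope scores, illustrative
labels = [1,   1,   0,   0]    # ground-truth out-of-scope indicators
auc = roc_auc(scores, labels)  # 1.0 here: perfect separation
```

An AUC near 0.5 would indicate the confidence score carries no signal for detecting out-of-scope inputs on that dataset.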

Bayesian Neural Networks

Another approach involves using Bayesian neural networks to quantify uncertainty in model predictions. Bayesian neural networks incorporate uncertainty by representing model weights as probability distributions, allowing for the estimation of epistemic uncertainty, which arises from limited model capacity and unseen data.

One method to approximate Bayesian neural networks is through deep ensembles, where multiple models with different initializations or architectures are trained and their outputs combined to measure uncertainty. Another approach is Monte Carlo dropout, which leverages dropout regularization during both training and inference to estimate uncertainty by sampling multiple predictions with varying neuron activations.
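The Monte Carlo dropout idea can be sketched with a toy linear "network": keep dropout active at inference, run several stochastic forward passes, and use the spread of the outputs as an epistemic-uncertainty estimate. The features, weights, and dropout rate below are all illustrative; a real implementation would apply dropout inside a trained deep network.

```python
import math
import random

def softmax(z):
    e = [math.exp(v - max(z)) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mc_dropout_predict(x, w, n_samples=200, p_drop=0.5, seed=0):
    """Sample n_samples stochastic forward passes with dropout on the
    inputs; return per-class mean prediction and variance (uncertainty)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        # Bernoulli dropout mask, rescaled to keep expectations unchanged
        mask = [1.0 / (1.0 - p_drop) if rng.random() > p_drop else 0.0
                for _ in x]
        xd = [xi * mi for xi, mi in zip(x, mask)]
        logits = [sum(wi * xi for wi, xi in zip(wc, xd)) for wc in w]
        preds.append(softmax(logits))
    n_classes = len(w)
    mean = [sum(p[c] for p in preds) / n_samples for c in range(n_classes)]
    var = [sum((p[c] - mean[c]) ** 2 for p in preds) / n_samples
           for c in range(n_classes)]
    return mean, var

x = [0.4, -0.2, 1.0]                      # toy MBO-derived features
w = [[0.5, 0.1, -0.3], [-0.2, 0.4, 0.6]]  # toy weights for 2 classes
mean, var = mc_dropout_predict(x, w)
```

A deep ensemble works analogously, except the spread is taken across independently trained models rather than across dropout samples of one model.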

Addressing uncertainty in NN models for market prediction requires thoughtful consideration of model limitations and unseen data. By employing strategies such as out-of-model-scope detection and Bayesian neural networks, traders and quants can enhance the robustness and reliability of their models, ultimately improving decision-making in dynamic market environments.

*In machine learning model deployment, confidence in predictions is crucial for making informed and reliable decisions. Two types of uncertainty, aleatoric and epistemic, play pivotal roles in understanding the reliability of model predictions. Aleatoric uncertainty stems from inherent data variability, representing known unknowns, while epistemic uncertainty arises from the lack of data or model limitations, representing unknown unknowns.

Aleatoric uncertainty is observable in regions where data is present, reflecting the spread in outputs for a given input. In contrast, epistemic uncertainty manifests in areas devoid of data, where predicting outcomes becomes challenging due to insufficient information. Distinguishing between these uncertainties is essential for ensuring the robustness of machine learning models in real-world scenarios.

**In neural networks, the softmax function is a type of activation function used in the output layer. It takes a vector of arbitrary real values as input and transforms them into a vector of probabilities. This transformation allows the neural network to output probabilities for each class in a classification problem.
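The transformation can be written in a few lines; subtracting the maximum logit before exponentiating is the standard numerical-stability trick.

```python
import math

def softmax(logits):
    """Map arbitrary real-valued logits to class probabilities
    that are positive and sum to 1."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# probs sum to 1, and the largest logit gets the largest probability
```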
