Can We Beat the Financial Market Using Semantic Signals in the Transfer Learning Approach?

Bohdan Pavlyshenko

Senior Data Scientist | NLP (PhD in Physics, D.Sc. in AI, Kaggle Master)

Published May 13, 2021

According to the efficient market hypothesis, we cannot beat the financial market using publicly available information. A more modern view is that the market is efficient, but not perfectly efficient. So the question is how to find these inefficiencies to gain an advantage on the market and to find alphas: trading rules that can give a temporary edge. One modern approach is based on alternative data, e.g. Twitter and news streams, and extracts sentiment signals from the news about different companies and business processes. We considered some of these approaches in the following posts: Extracting Predictive Features from Opinion Trends of Users' Communities in Social Networks; Coronavirus Affects Stock Market; Can We Predict Stock Price Movements?

It is important to consolidate semi-structured data from different sources into one multilevel ensemble of predictive models. Some such approaches are considered in my Dr.Sc. thesis "Methods of intellectual analysis of consolidated data for decision-making support" (you can download it here; the English abstract starts on page 8). New features for predictive models can be found using a graph representation of users' connections and communities in social networks (see our post Using Semantic Structures of Tweets in Predictive Analytics for Stock Market). It is also important to calculate the uncertainty of predictive models in order to assess risk, which can be done by combining machine learning and Bayesian inference (see, e.g., our posts Using Bayesian Regression for Stacking Time Series Predictive Models and The Analysis of Coronavirus Impact on the Stock Market Using Regression).

Here I would like to share my point of view on another approach which can also be useful. One of the biggest problems in using machine learning to predict the stock market is that the data are highly noisy and non-stationary. A pattern which worked yesterday may not work tomorrow. At the same time, we can see that effective investors and hedge funds exist. Our point of view is that the opinions of highly qualified experts should be included in the decision-making process of investing and portfolio formation to get more accurate predictions. Such experts have both knowledge and vision which are not present in the training data of predictive models. That is why they can sometimes effectively extrapolate and predict how the economy and financial markets will evolve. In the post Bitcoin Price Predictive Modeling we considered one possible way to include quantitative features based on expert opinion.

The most informative signals for decision-making support can be semantic signals expressed in natural language. Such signals can carry rich semantic information, including information about semantic uncertainty with respect to the processes under study. For example, such semantic signals are found in experts' consultations and in their answers to related questions. When communicating with informed experts, potential investors can receive insightful semantic signals for decision-making support. So the question is how to gain access to many high-level experts in order to get those signals. The transfer learning approach makes it possible to transfer the knowledge of experts to an AI model. For this, we can use transformers, which in their original form consist of an encoder and a decoder built on a multi-head attention mechanism. They show state-of-the-art results in many natural language problems such as translation, question answering, summarization, etc. One of the first powerful transformer models was BERT from Google.
Nowadays, a wide variety of transformer models for different NLP tasks is available, e.g. from Hugging Face. The main idea is that experts' opinions can be transferred into transformer models. To get insightful semantic signals, investors can ask related questions in natural language, even through a voice interface if a speech-to-text transformer is used. T5 is one of the powerful transformer models which can support different Seq2Seq tasks, such as summarization, classification and Q&A, in one model. Classification results can be used as features for algorithmic predictive models. Summarization and Q&A results can be used as semantic signals for decision-making support. Experts can provide their analytical opinions and thoughts concerning the market, which can then be used as training datasets for transformers.

Preliminary research results show that customers can pose questions to transformer models like the following: 'What tickers will go up next month?', 'Why will AAPL go up next month?', 'What's going on with Bitcoin?'. Transformers trained on expert opinions and knowledge will be able to answer such questions in natural language. At the same time, the answer to the question 'What is the trend for Bitcoin for the next month?' can be binary: '1' for a bullish trend or '0' for a bearish trend. Such results can be used as a feature for a machine learning predictive model. We can also incorporate the opinions of different expert groups, even conflicting ones, into one model and then return separate answers for each group, which enables investors to assess the semantic uncertainty concerning the company or process under investigation.
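The binary trend answer and the comparison of expert groups can be illustrated with a minimal sketch in plain Python. It does not call any transformer; it only assumes that a fine-tuned model has already returned short text answers. All names here (`to_training_pair`, `answer_to_feature`, `semantic_uncertainty`, the task prefix layout and the expert group labels) are hypothetical and chosen for illustration, and using the binary entropy of the bullish share as an uncertainty measure is one possible choice, not a result from this research.

```python
import math


def to_training_pair(question, expert_answer, task="trend"):
    """Format one expert opinion as a text-to-text example.

    The "<task> question: ..." input layout is an assumed convention,
    in the spirit of the task prefixes used by T5-style models.
    """
    return {"input": f"{task} question: {question}", "target": expert_answer}


def answer_to_feature(answer):
    """Map a model answer to a binary feature.

    Returns 1 for bullish, 0 for bearish, None if the answer cannot be parsed.
    """
    text = answer.strip().lower()
    if text in {"1", "bullish", "up"}:
        return 1
    if text in {"0", "bearish", "down"}:
        return 0
    return None


def semantic_uncertainty(answers_by_group):
    """Return (bullish_share, entropy_in_bits) over parseable group answers.

    Entropy is 0 when all groups agree and 1 bit when they split evenly.
    """
    features = (answer_to_feature(a) for a in answers_by_group.values())
    votes = [f for f in features if f is not None]
    if not votes:
        return None, None
    p = sum(votes) / len(votes)
    if p in (0.0, 1.0):
        return p, 0.0
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return p, entropy


# Hypothetical answers to 'What is the trend for Bitcoin for the next month?'
answers = {
    "macro_analysts": "1",
    "crypto_traders": "1",
    "risk_managers": "0",
    "quant_desk": "bullish",
}
share, entropy = semantic_uncertainty(answers)
print(f"bullish share = {share:.2f}, entropy = {entropy:.3f} bits")
```

The bullish share could serve as a numeric feature for a predictive model, while the entropy gives a rough indication of how strongly the expert groups disagree.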

Of course, the question in the title of this post remains open. We assume that transformers which can produce semantic signals can take the search for alphas to a new level, but a lot of research is still needed to build a practically working system which combines ML and experts' opinions using transfer learning.
