What does a quant need from alternative data?
Quants like data, in the same way I like burgers. Quants consume data to come up with ways of forecasting the market. The most important dataset is usually market data. However, there are all sorts of other interesting datasets that can help forecast market prices. In recent years, the area of alternative data has exploded, and many datasets are now available from alternative data vendors.

Obviously a quant wants a dataset which can make them money! But before that stage, the question is: if you're a quant, which datasets are worth investigating for alpha? We need a way of pruning our massive list of alternative datasets before we go anywhere near a Python script! There simply isn't enough time to investigate every dataset thoroughly for alpha. We can run fairly standardised tests, such as finding the correlation with price action, but these won't necessarily tell us whether there's a specific profitable trading rule we can apply. And for alternative data vendors, what do they need to do to make their datasets usable by quants?

Hence, there are many questions we need to ask about a dataset before we even investigate whether it can give us any insight into forecasting markets. Below is a list of questions a quant might want to ask about a dataset they are considering using to find alpha.
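As a rough illustration of the kind of standardised test mentioned above, here is a minimal sketch that correlates a signal with the following period's returns. The function names and the numbers are made up for the example, and a high correlation here is only a first screen, not evidence of a profitable trading rule.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, returns, lag=1):
    """Correlate signal[t] with returns[t + lag], so the signal leads the return."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, returns)
    return pearson(signal[:-lag], returns[lag:])


# Hypothetical daily values, purely for illustration.
example_signal = [0.2, -0.1, 0.4, 0.0, -0.3, 0.1]
example_returns = [0.001, 0.003, -0.002, 0.004, 0.000, -0.003]
print(lagged_correlation(example_signal, example_returns, lag=1))
```

Using a positive lag matters: correlating the signal with same-day returns can flatter a dataset that would not have been available in time to trade.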
Historical data for a dataset
One of the most important questions is the length of the dataset's history. If we want our dataset to forecast prices on a daily basis, and we only have a month of data, it isn't really going to be that useful. One month is simply not sufficient to gauge the robustness of a dataset. Ideally, we'd like several years of data to do a historical backtest. There is a balance, however: if the dataset does turn out to be valuable, do we really want to wait 10 years until there's enough historical data to use it? This is tricky!
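To put a rough number on why a month of daily data is too short, here is a small sketch based on the Fisher z-transform, where the standard error of the transformed correlation is approximately 1/sqrt(n - 3). It assumes independent, roughly normal observations, which market data often violates, so treat it as a rule of thumb. The function name and the day counts (21 trading days in a month, 252 in a year) are assumptions for the example.

```python
from math import sqrt, tanh


def min_detectable_correlation(n_obs, z_crit=1.96):
    """Smallest |correlation| distinguishable from zero at roughly the 5% level.

    Uses the Fisher z approximation: atanh(r) has standard error
    of about 1 / sqrt(n_obs - 3) when the true correlation is zero.
    """
    if n_obs <= 3:
        raise ValueError("need more than 3 observations")
    return tanh(z_crit / sqrt(n_obs - 3))


# Assumed trading-day counts: about 21 per month, about 252 per year.
for label, n_obs in [("1 month", 21), ("1 year", 252), ("5 years", 5 * 252)]:
    print(label, round(min_detectable_correlation(n_obs), 3))
```

With only a month of daily observations, the correlation would need to be implausibly large before we could tell it apart from noise, whereas several years of history brings the threshold down to the small values typical of real signals.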
Timeliness and point-in-time
When is the data available? If it is released with a very large lag, say many weeks after the events it describes, it's likely to be more difficult to trade off it. Furthermore, is the dataset properly timestamped, so we know the time it was released to users, not just when it was collected? This is crucial for trading....
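The difference between release time and collection time can be sketched with a simple point-in-time lookup. This is a minimal illustration under the assumption that each observation carries a release timestamp; the function name, dates and values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime


def latest_known_value(release_times, values, as_of):
    """Return the most recent value released at or before `as_of`.

    `release_times` must be sorted ascending and aligned with `values`.
    Returns None if nothing had been released yet.
    """
    idx = bisect_right(release_times, as_of)
    return values[idx - 1] if idx else None


# Hypothetical dataset: each value is collected at month end
# but only released to users about three weeks later.
release_times = [datetime(2024, 1, 20), datetime(2024, 2, 20)]
values = [50.1, 52.3]

# On 10 Feb the January figure has been collected but not yet released,
# so a backtest should still see the older value.
print(latest_known_value(release_times, values, datetime(2024, 2, 10)))
```

Keying the lookup on collection time instead would let the backtest peek at data that no user could have seen yet, which inflates historical performance.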
The rest of the article is available on the Cuemacro website.
Strategy Consultant | Valuation Expert | Author | Speaker
6y: I agree, Johan Vanderlugt, that alternative datasets are also relevant to the non-quant population. We have developed the first alternative dataset of integrated metrics focused on long-term wealth creation achieved in ways that enhance the wellbeing of the wider community. Surely this is relevant to all institutional investors and fiduciaries that invest on behalf of pensioners and retirees, let alone fuel sustainable wealth creation in the real economy!
Superintendente Geral, Produtos de Tesouraria
6y: Saeed Amen, very well written, thanks for the article! In the name of the stratsphera community, can I kindly ask your permission to repost a translated version on our blog? All credit given and a link to the original post. Many thanks!
Researcher | Nature Strategy & Finance | Sustainable Investing
6y: Also relevant for the non-quant population out there!
CEO, Founder, Technology Speaker and Data Management Specialist
6y: I have a unique data set which reflects risk emerging in company ecosystems (e.g. supply chains, parent companies, customers, etc.). I've quantified the severity of the language identified and would like to speak to traders about alpha in my RISC Scores. Would love to discuss further!
Equity Analyst at Fidelity
6y: Mattia Giammarusto, "Quants like data in the same way I like burgers" 😂