How to Build a Data Product
Hundreds of companies (both private and public) are currently sitting on valuable data that has the potential to be monetized. While the concept is simple on paper, data monetization is complex. If done right, however, a market-ready product can pay off in the long run, particularly if it is sold to the Wall Street crowd.
What persuades an investor to purchase an alternative dataset? It comes down to whether the productized data helps traders and portfolio managers make better investment decisions. The two most important pieces of this puzzle are: 1) how clean and accurate is your data; and 2) does your data have the ‘alpha’ value that the investment audience needs?
How Clean and Accurate is the Data?
If the data you are offering is not high-quality, the analysis derived from it will fail during the proof-of-concept (POC) or trial phase and never make it to actual monetization. Below are the factors that make up high-quality data:
- uniformity and consistency – what is the underlying methodology of the data? Is it consistent across all sectors, companies, products, reviews, etc.? Does the methodology or third-party source change over time?
- data dictionary – all fields must be documented and properly defined. The data dictionary or data spec sheet is crucial here – it helps provide a road map for investment teams
- ticker mappings – investors need to know how the data ties into a public company ticker – without that link, they cannot connect a data point to a tradable security
- historical time stamps – investors need to know when each data point occurred so they can test the historical performance of the dataset
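Several of these checks can be automated before the data ever reaches a trial. The sketch below is one possible way to do it, not a standard: the field names (`company`, `observed_at`, `foot_traffic`), the data dictionary, the ticker map, and the sample rows are all hypothetical and exist only for illustration. It flags undocumented fields, missing values, companies with no ticker mapping, and time stamps that are not in ISO 8601 format.

```python
from datetime import datetime

# Hypothetical data dictionary: field name -> definition.
DATA_DICTIONARY = {
    "company": "Company name as reported by the source",
    "observed_at": "ISO 8601 time stamp of when the data point occurred",
    "foot_traffic": "Daily store visits (count)",
}

# Hypothetical mapping from company name to public ticker.
TICKER_MAP = {"Acme Corp": "ACME", "Globex": "GBX"}

# Made-up sample rows; the second one is deliberately flawed.
SAMPLE_RECORDS = [
    {"company": "Acme Corp", "observed_at": "2023-01-05T09:30:00", "foot_traffic": 1200},
    {"company": "Initech", "observed_at": "05/01/2023", "foot_traffic": 900, "notes": "x"},
]


def find_quality_issues(records, data_dictionary, ticker_map):
    """Return a list of (row_index, problem) tuples for a list of dict records."""
    issues = []
    for i, row in enumerate(records):
        # Data dictionary: every field that appears must be documented.
        for field in row:
            if field not in data_dictionary:
                issues.append((i, f"undocumented field: {field}"))
        # Uniformity: every documented field should be populated in every row.
        for field in data_dictionary:
            if row.get(field) is None:
                issues.append((i, f"missing value: {field}"))
        # Ticker mapping: each company must tie to a public ticker.
        company = row.get("company")
        if company is not None and company not in ticker_map:
            issues.append((i, f"no ticker mapping: {company}"))
        # Time stamps: must parse as ISO 8601 to be usable in a backtest.
        stamp = row.get("observed_at")
        if stamp is not None:
            try:
                datetime.fromisoformat(stamp)
            except (TypeError, ValueError):
                issues.append((i, f"bad time stamp: {stamp}"))
    return issues


if __name__ == "__main__":
    for row_index, problem in find_quality_issues(SAMPLE_RECORDS, DATA_DICTIONARY, TICKER_MAP):
        print(row_index, problem)
```

A report like this will not prove that a methodology is sound, but it catches the mechanical problems that tend to sink a dataset early in an investor's evaluation.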
Does the Data Produce Alpha?
In the investment world, alpha is key. It describes the power or strategic ability to beat the market and create excess return on an investment. The following factors determine whether a dataset has that predictive power:
- correlation – does the data relate to a stock price, an economic indicator, or another market variable? Is there a statistical correlation that can be measured through historical testing?
- predictors – does the data predict market behavior? The signal must be evident ahead of a market action in order for an investor to have an edge trading the stock
- history for backtesting – a dataset with a long history is worth more than one with little or none. The more historical data there is, the more rigorously the user can statistically test the power of the data
- coverage universe – valuable datasets cover as much of a relevant universe as possible – whether a particular sector, retailer, product, asset class, etc. It is always better to have more data predicting performance across a wide range of categories
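The correlation, predictor, and coverage questions above can each be given a rough first answer with a few lines of code. The following is a minimal sketch under simplifying assumptions: the function names are hypothetical, the inputs are plain lists aligned by period, and a Pearson correlation between the signal in one period and the return `lag` periods later stands in for a real backtest. A high value at a positive lag hints that the signal leads the market; it does not by itself establish alpha.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, returns, lag):
    """Correlate signal[t] with returns[t + lag], for lag >= 0.

    Both series are assumed to be aligned by period and equally long.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, returns)
    # Drop the last `lag` signal points and the first `lag` returns so that
    # each signal value is paired with the return that follows it.
    return pearson(signal[:-lag], returns[lag:])


def coverage_ratio(covered_tickers, universe):
    """Share of the target universe for which the dataset has any data."""
    universe = set(universe)
    if not universe:
        raise ValueError("universe must not be empty")
    return len(universe & set(covered_tickers)) / len(universe)
```

As a usage example, `lagged_correlation(signal, returns, 1)` measures how well this period's signal tracks next period's return, and `coverage_ratio(dataset_tickers, sector_tickers)` gives the fraction of a sector the dataset covers. The longer the history behind `signal` and `returns`, the more weight the resulting number can bear.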