
Sell Data to Wall Street

Companies have become increasingly data-focused businesses. Why? Data runs the world. Everything is algorithm-based and digitally recorded as a footprint, all feeding into a big-picture understanding of the world. Every industry uses data differently. For example, marketing firms use data to understand an audience and run targeted advertising, while investment firms use data to identify signals and trends that feed into a mosaic model and help analysts decide whether to buy or sell stocks. Regardless of industry, the end goal is the same: make more money.

Hundreds of companies (both private and public) are currently sitting on valuable data that has the potential to be monetized. While the concept is simple on paper, data monetization is complex. If done right, however, a market-ready product can pay off in the long run, particularly if it is sold to the Wall Street crowd.

Data monetization, as defined by Wikipedia, refers to “the act of generating measurable economic benefits from available data sources (analytics)…data monetization leverages data generated through business operations, available exogenous data or content, as well as data associated with individual actors such as that collected via electronic devices and sensors participating in the internet of things.” Long story short, it’s the process of creating an additional revenue source for your company simply by using the data you already have in-house and collect through your everyday service and product offerings to the industry.

The vast majority of companies have data that may be valuable to investors – anything from healthcare claims data, geolocation data, rewards/loyalty memberships, point of sale/transaction data, etc. Currently there are two types of entities that hedge funds purchase data from: 1) alternative data firms that sell off-the-shelf type reports/data feeds to dozens of investors and potentially sell-side firms and 2) companies that have nothing to do with selling data and provide non-financial services to clients outside of the investment management community.

Option 1 is the easy route: the data is broadly distributed (safety in numbers), there is typically less concern that the vendor lacks an internal compliance framework, and everyone on the street ingests it. BUT the overall value and uniqueness drop significantly. There’s also the issue of the product being ‘packaged’ versus a raw data feed. The packaged product leaves little analysis for the investment team: the signal is already translated for you (and everyone else). When it’s right, cool. When it’s wrong, you drown.

It’s kind of like being with somebody that’s been with everybody.

Option 2 takes more work on both the hedge fund and the company side: the data is exclusive and almost never distributed to any other firm on the street, the potential to lock the data in for an exclusive proof-of-concept (POC) is high, and the hedge fund can mold the data offering into what it wants it to be. The deliverable is raw, completely uninterpreted data. BUT this data is riskier to consume. The company is usually in early-stage data monetization conversations, has no internal legal or compliance framework, has little sense of the privacy or third-party rights ramifications, and will require a lot of hand-holding to get things ‘right’. The effort is well worth it in the end, though, if that data is generating alpha and beating the market.

Small and mid-sized companies can start this second option by: identifying the data they’ve been sitting on; parsing data into fields and aggregating content that may be converted directly into actionable insights; thinking through data vetting, storage, delivery options; performing analytics on the data internally, etc.
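To make the “parsing data into fields and aggregating” step a little more concrete, here’s a minimal Python sketch. It assumes a hypothetical raw export where each transaction is a pipe-delimited line (`date|merchant|amount`); the format, the merchant names and the helper functions are illustrative, not a standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    day: date
    merchant: str
    amount: float


def parse_line(line: str) -> Transaction:
    """Split one raw pipe-delimited record (hypothetical format) into typed fields."""
    raw_day, merchant, raw_amount = line.strip().split("|")
    # Normalize merchant names so "Acme Coffee" and "acme coffee" aggregate together.
    return Transaction(date.fromisoformat(raw_day), merchant.strip().upper(), float(raw_amount))


def daily_spend(lines):
    """Aggregate parsed records into total spend per (merchant, day)."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the raw export
        tx = parse_line(line)
        totals[(tx.merchant, tx.day)] += tx.amount
    return dict(totals)


raw = [
    "2023-01-02|Acme Coffee|4.50",
    "2023-01-02|acme coffee|3.25",
    "2023-01-03|Bolt Grocery|52.10",
]
print(daily_spend(raw))
```

An aggregated, field-level table like this is usually what gets vetted and shown to prospective buyers, rather than the raw log itself.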

Who is the Target Audience?

The first step in any data monetization process is determining whether your company even has data that may be valuable to the investment community (i.e., hedge funds, specifically). More often than not, companies are sitting on a ton of insights that relate to their strategic partners, customers, supply chains, competitors, etc. All of this adds color to an investment thesis. Data types that hedge funds gobble up include transactional/point-of-sale, geolocation, satellite imagery, loyalty/rewards card insights, email receipts, health care claims, shipment data, etc.

Before we talk about data value, let’s review the different types of hedge fund strategies you need to take into consideration:

Discretionary, Stock-Picking – Fundamental shops (multi-manager, multi-strategy platform funds)

These platform funds are made up of PM teams who share centralized resources and purchase data that is highly correlated with a public company KPI (“Key Performance Indicator”), is unique in nature, and can be backtested. Typically, these funds have an internal market intelligence/data sourcing unit made up of professionals who find and onboard data for investment teams based on need.

Long/Short hedge funds also live in this category. They pick stocks, typically have the largest AUM, and trade frequently. The appetite and need for data is highest at these types of shops. Because their positions are large, Long/Short funds want to be right on every trade. The budget for alternative data is significant and, often, close to limitless.

Quant, Systematic, Algorithmic – Quantitative Investing shops

The strategy at a quant shop is based on algorithms and, at some shops, high-frequency trading. These firms are looking for a long history of data that can be backtested, applies to thousands of securities and is published frequently. The data here is typically market data, but they have an interest in unique alternative datasets as well. Quant funds are a little more difficult to sell data to; they aren’t looking for just anything.
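A quant buyer will usually size up breadth (how many tickers), depth (how many years) and cadence (how often the data updates) before anything else. The sketch below is one way a vendor could pre-check that in Python, assuming a hypothetical `{ticker: [observation dates]}` layout; the 5-year threshold is an illustrative assumption, not an industry rule.

```python
from datetime import date
from statistics import median


def coverage_report(history, min_years=5.0):
    """Summarize breadth, depth and cadence of a dataset.

    `history` maps each ticker to the dates on which the dataset has an observation.
    """
    qualifying = []
    gaps = []
    for ticker, days in history.items():
        ordered = sorted(days)
        if not ordered:
            continue  # ticker is mapped but has no observations yet
        # Depth: span between the first and last observation, in years.
        if (ordered[-1] - ordered[0]).days / 365.25 >= min_years:
            qualifying.append(ticker)
        # Cadence: days between consecutive observations.
        gaps.extend((b - a).days for a, b in zip(ordered, ordered[1:]))
    return {
        "tickers_total": len(history),
        "tickers_with_min_history": len(qualifying),
        "qualifying": sorted(qualifying),
        "median_gap_days": median(gaps) if gaps else None,
    }


sample = {
    "AAA": [date(2015, 1, 2), date(2015, 1, 9), date(2023, 6, 30)],
    "BBB": [date(2021, 3, 1), date(2021, 3, 8), date(2023, 6, 30)],
    "CCC": [],
}
print(coverage_report(sample))
```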

In addition to traditional discretionary or quant funds, there are other types of hedge funds and institutional investors, like macro-driven shops (want broader, global trends like inflation, weather, interest rates, global events), credit funds (invest in debt), pension funds/sovereign wealth funds (manage money on behalf of retirees, countries and endowments), or private equity/venture capital (invest mostly in private rather than publicly traded companies).

Is Your Data Valuable?

What helps an investor make the decision to purchase an alternative data set? Well, does the productized data help the trader and portfolio manager make better investment decisions? The two most important pieces of this puzzle are: 1) how clean and accurate is your data; and 2) does your data have the ‘alpha’ value that the investment audience needs?

How Clean and Accurate is the Data?

If the data you are offering is not high-quality, the analysis derived from it will fail during a POC/trial phase and not make it to actual monetization. Below are factors that make up high-quality data:

  • accuracy – are there errors, outliers, improperly tagged fields? Is there information that should not be included in the data product per your third party obligations, privacy policy, terms of use, etc?
  • uniformity and consistency – what is the underlying methodology of the data? Is it consistent across all sectors, companies, products, reviews, etc.? Does the methodology or third-party source change over time?
  • data dictionary – all fields must be documented and properly defined. The data dictionary or data spec sheet is crucial here – it helps provide a road map for investment teams
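  • ticker mappings – investors need to know how each data point ties to a public company ticker (e.g., which merchant or brand rolls up to which listed parent); without this mapping, the data cannot be linked to a tradable security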
  • historical time stamps – investors need to know when the data point happened to test the historical performance of the dataset
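A vendor can run many of these checks before a POC ever starts. Here is a minimal, standard-library Python sketch that assumes hypothetical record fields (`timestamp`, `merchant_id`, `sales`), a hypothetical data dictionary and a hypothetical merchant-to-ticker map. For outliers it swaps in the modified z-score (median absolute deviation) with the commonly used 3.5 cutoff; that is one reasonable choice, not the only one.

```python
from datetime import datetime
from statistics import median

# Hypothetical data dictionary: each required field and the type it should hold.
DATA_DICTIONARY = {"timestamp": str, "merchant_id": str, "sales": float}


def quality_report(rows, ticker_map, outlier_cutoff=3.5):
    """Count common data-quality problems in a list of dict records."""
    issues = {"schema_errors": 0, "bad_timestamps": 0, "unmapped": 0,
              "duplicates": 0, "outliers": 0}
    seen_keys = set()
    valid_sales = []
    for row in rows:
        # Data dictionary: every documented field present, with the documented type.
        if any(not isinstance(row.get(f), t) for f, t in DATA_DICTIONARY.items()):
            issues["schema_errors"] += 1
            continue
        # Historical time stamps must parse, or the record cannot be backtested.
        try:
            datetime.fromisoformat(row["timestamp"])
        except ValueError:
            issues["bad_timestamps"] += 1
        # Ticker mapping: every merchant should tie back to a public-company ticker.
        if row["merchant_id"] not in ticker_map:
            issues["unmapped"] += 1
        # Repeated (timestamp, merchant) keys usually signal a pipeline bug.
        key = (row["timestamp"], row["merchant_id"])
        if key in seen_keys:
            issues["duplicates"] += 1
        seen_keys.add(key)
        valid_sales.append(row["sales"])
    # Accuracy: flag outliers with the modified z-score, which is robust on small samples.
    if valid_sales:
        med = median(valid_sales)
        mad = median(abs(x - med) for x in valid_sales)
        if mad > 0:
            issues["outliers"] = sum(
                1 for x in valid_sales if 0.6745 * abs(x - med) / mad > outlier_cutoff
            )
    return issues


rows = [
    {"timestamp": "2023-01-02", "merchant_id": "m1", "sales": 100.0},
    {"timestamp": "2023-01-03", "merchant_id": "m1", "sales": 102.0},
    {"timestamp": "2023-01-04", "merchant_id": "m1", "sales": 98.0},
    {"timestamp": "2023-01-05", "merchant_id": "m2", "sales": 101.0},
    {"timestamp": "2023-01-05", "merchant_id": "m2", "sales": 99.0},    # duplicate key
    {"timestamp": "not-a-date", "merchant_id": "m3", "sales": 5000.0},  # bad time stamp, unmapped, outlier
    {"timestamp": "2023-01-06", "merchant_id": "m1"},                   # missing field
]
ticker_map = {"m1": "AAA", "m2": "BBB"}  # hypothetical merchant -> ticker mapping
report = quality_report(rows, ticker_map)
print(report)
```

Checks like these don’t cover the third-party, privacy policy or terms-of-use questions in the accuracy bullet; those still need a human review.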

Does the Data Produce Alpha?

In the investment world, alpha is key. It describes the ability to beat the market, i.e. to earn a return on an investment in excess of a benchmark. The following factors determine whether a dataset has that predictive power:

  • correlation – does the data relate to a stock price, an economic indicator, another market variable? Is there a statistical correlation that can be measured through historical testing?
  • predictors – does the data predict market behavior? The signal must be evident ahead of a market action in order for an investor to have an edge trading the stock
  • history for backtesting – a dataset with a long history beats one with little or no history. More historical data allows the user to statistically test the predictive power of the data
  • coverage universe – valuable datasets cover as much of a relevant universe as possible, whether a particular sector, retailer, product, asset class, etc. It is generally better to have data predicting performance across a wide range of categories
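The correlation and “predictors” questions can be checked with a simple lead-lag test: correlate the signal today with the target one or more periods later. This is a minimal Python sketch assuming two equal-length, already-aligned series; in practice you would use growth rates rather than raw levels, since two trending series correlate spuriously. A high in-sample correlation is a starting point for a backtest, not proof of alpha.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lead_lag_correlations(signal, target, max_lag=3):
    """Correlate signal[t] with target[t + lag] for lag = 0..max_lag.

    A peak at lag > 0 means the signal moves before the target does.
    """
    if len(signal) != len(target):
        raise ValueError("series must be aligned and of equal length")
    return {
        lag: pearson(signal[: len(signal) - lag], target[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical example: the target repeats the signal one period later.
signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
target = [0.0] + signal[:-1]
corrs = lead_lag_correlations(signal, target)
best_lag = max(corrs, key=corrs.get)
print(best_lag, round(corrs[best_lag], 3))
```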

What are Big Data Use Cases?

Knowing the potential use cases for your company’s data will not only help narrow down the target audience but also greatly assist with marketing/sales efforts. We touched upon the differences between discretionary and quant funds in the target audience discussion above.

High-Level Cheat Sheet:

  • Discretionary funds – looking for data on specific publicly traded companies; best to start with several case studies on companies dominating the space/sector your company is focused on; focus on correlation to KPIs (revenue, gross profit, etc.); choose a company whose stock price is driven by a key metric and that has a large market cap and high volatility. Backtesting requirements are typically lighter than at quant funds.
  • Quant funds – these shops want a long time series covering hundreds to thousands of companies, indices, markets, etc.; they need to backtest historical data and fully understand the data’s correlation with share prices to identify value before moving forward.
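For a discretionary case study, the usual move is to roll the data up to the company’s reporting periods and compare its growth with reported KPI growth. This is a minimal Python sketch assuming a hypothetical `{date: spend}` panel for one company, calendar quarters (many companies report on fiscal quarters instead) and made-up numbers. A real panel would also need to be normalized for users joining and leaving it.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d):
    """Calendar quarter key, e.g. date(2023, 2, 15) -> (2023, 1)."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_totals(daily_spend):
    """Roll a {date: spend} panel up into {(year, quarter): spend}."""
    totals = defaultdict(float)
    for day, spend in daily_spend.items():
        totals[quarter_of(day)] += spend
    return dict(totals)


def yoy_growth(totals):
    """Year-over-year growth for every quarter that has a prior-year comparison."""
    return {
        (year, q): totals[(year, q)] / totals[(year - 1, q)] - 1
        for (year, q) in totals
        if totals.get((year - 1, q), 0) > 0
    }


panel = {
    date(2022, 1, 15): 100.0, date(2022, 2, 15): 100.0,
    date(2023, 1, 15): 110.0, date(2023, 2, 15): 112.0,
}
estimates = yoy_growth(quarterly_totals(panel))
reported = {(2023, 1): 0.10}  # hypothetical company-reported revenue growth
for key, est in estimates.items():
    if key in reported:
        print(key, f"panel {est:.1%} vs reported {reported[key]:.1%}")
```

Repeating this over many quarters, and tracking how close the panel estimate lands to the reported figure, is what turns a single case study into a track record.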

Examples of top applications for alternative data vendors:

  • Advertising – tracks corporate advertising spend across platforms and by campaign. The data focuses on consumer interests based on internet browsing habits and can be used to track certain categories like mortgages, automobiles, luxury goods, etc.
  • App Usage and Web Traffic – wildly popular data in the hedge fund space because it can be used to estimate company revenues. App usage, app reviews and purchase tracking can all indicate how successful a product is among consumers. This includes mobile banking, streaming media and food delivery apps. It is also possible for investors to track services embedded in the apps, like payment providers and advertising services.
  • Business to Business (B2B)/Supply Chain – provides a read into the supply chain (typically private companies) which then adds color to a public company analysis and investment model. This data identifies sales, marketing, business development or contracts for a range of industries like industrial materials, oil contracts/drilling concessions, or B2B trade indices.
  • Consumer Transactions/Payment Processing – this data tracks merchant-level transaction data (e.g. retailer or service provider), product-level purchase data (e.g. food, electronics), and high-level macro data (e.g. trends). These datasets are valuable because they are used to estimate quarterly revenue growth before companies report earnings. They also offer insight into consumer purchasing behavior, like rate of adoption, trends, how well promotions and discounts fare, and consumer demographics. Payment processing data from PayPal and Square is also picked up in this category.
  • Environmental, Social and Governance (ESG) – this area has recently taken flight as investors increasingly focus on “investing with a conscience” and “investing in companies that care about the same issues” as they do. Valuable ESG data is company-specific and can be tracked via social media data, open and public data, consumer surveys and satellite imagery. Data that monitors consumer reviews, hiring trends, business complaints, compensation, etc. will be valuable in this category.
  • Geo-location – also popular with funds, this data provides a read into visitation trends/foot traffic, identifies the impact of promotions, and helps explain the influence of weather events. Top industry applications are retailers, restaurants, hotels and travel. This data is usually gathered from third-party sources like mobile applications, satellites, sensors and Bluetooth beacons.
  • Internet of Things (IoT) – this data provides a better understanding of consumer and business activity by tracking the digital footprint of IoT devices. Again, this correlates with product adoption and overall market growth.
  • Natural Language Processing (NLP) – using NLP, data vendors pick up topic and sentiment trends among experts and key industry leaders in almost any industry or field of expertise, drawing on niche blogs and forums.
  • Open Source Data – this is publicly available data and includes government data, trade organizations, market data, industry data, weather data, free APIs, etc. Vendors often collect this data and then repackage it for hedge fund use.
  • Ratings Data – this data comes from online and app consumer reviews (positive and negative). Brand and company reputation is tracked here through consumer and B2B opinions. This data is collected via webscraping if publicly available or through a third-party vendor, then aggregated and analyzed for trends.
  • Satellite Imagery – satellite data and intelligence provides insight into economic activity, construction, oil and gas, shipping/tankers, retail parking lot monitoring, trucks/vehicles, stockpiles of raw materials on factory sites, etc. of publicly traded companies.
  • Social Media – includes insights from social media posts that help analyze consumer trends, adoption of product launches, how popular a new product or brand is, how satisfied a customer is, what the promotions look like, corporate/customer engagement, etc. All of this reads into sales momentum and revenue.
  • Webscraped/Web-crawled – almost any type of data may be webscraped off a web page, provided nothing in the site’s Terms of Use/Terms of Service or robots.txt bars the activity. Data picked up could be e-commerce activity (pricing, listing descriptions, product info, reviews) as well as commentary/posts, press releases, IR websites and government filings. Hedge funds with internal data science teams may scrape internally, but at large volume, external data vendors are the preferred source for this data.
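Python’s standard library ships a robots.txt parser (`urllib.robotparser`), which makes the robots.txt half of that check easy to automate. The sketch below parses an inline example file rather than fetching one (`RobotFileParser.set_url()` plus `read()` would fetch a live file); the user agent and URLs are hypothetical, and nothing here evaluates a site’s Terms of Use, which still needs a legal review.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example shop.
ROBOTS = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""


def allowed_urls(robots_txt, urls, user_agent="example-research-bot"):
    """Return only the URLs that robots.txt permits this user agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


urls = [
    "https://shop.example.com/products/widget",
    "https://shop.example.com/checkout/cart",
]
print(allowed_urls(ROBOTS, urls))
```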

Start with Early Adopters with a Limited Distribution

Once you have determined a target audience and created a valuable data offering, the key to commercializing it is to offer the data on a semi-exclusive basis to early adopters. This usually means quant funds and platform funds that want access to a new, limited-distribution dataset at a discounted price. Typically, this arrangement is 40-50% off the sticker price for a 3-6 month period with heavy customer/analyst interaction. Not only is the product placed into the stream of commerce, but the close interaction with hedge fund users will help the vendor update, customize, build out and refine the product based on their feedback.
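To put that discount in dollars, here’s a quick Python sketch. It assumes the trial is priced as a prorated slice of an annual list price (how proration works isn’t specified here, so treat that as an assumption) and uses a hypothetical $200,000/year sticker price.

```python
def trial_cost(annual_list_price, months, discount):
    """Cost of a discounted early-adopter trial, prorated from an annual list price."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return annual_list_price * months / 12 * (1 - discount)


# Hypothetical $200,000/year sticker price across the 3-6 month, 40-50% off range.
for months in (3, 6):
    for discount in (0.40, 0.50):
        print(f"{months} months at {discount:.0%} off: ${trial_cost(200_000, months, discount):,.0f}")
```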

How Much Money Can I Charge for the Data?

While there are datasets that can fetch millions of dollars, the typical dataset, once productized, falls within the $100,000 – $250,000/year range per subscribing firm. Many new players in the alternative data space have unrealistic expectations of the value of their data products. The gap between expectation and reality comes down to a few factors:

  • The value of the dataset will depend heavily on details such as accuracy, length of the time series, compliance and release schedule
  • It will also depend on factors specific to the target company such as the existence of competitor datasets, the precision of sell side consensus, key investor questions, or the existence of legal/macro/regulatory overhangs
  • Investors will pay a large premium for the #1 dataset in a category. Are you #1? Are you unique? Who are your competitors?
  • Rumors of datasets commanding enormous premiums are more viral than ones about datasets nobody wants

What Should Data Vendors Never Provide?

Data vendors should NEVER provide material non-public information (MNPI) in violation of securities laws, personally identifiable information (PII), or misleading, doctored or “data-mined” historical correlations. Vendors should also never conceal significant data outages or other issues that may affect the data’s accuracy or integrity.
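On the outage point, a vendor can scan its own delivery history for missing periods so they can be disclosed rather than buried. This is a minimal Python sketch assuming a feed that is expected every calendar day; a business-day feed would need a trading calendar instead, or every weekend will show up as a gap.

```python
from datetime import date, timedelta


def find_gaps(delivery_dates, expected_step=timedelta(days=1)):
    """Return (first_missing, last_missing) ranges absent from a delivery history."""
    ordered = sorted(set(delivery_dates))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > expected_step:
            gaps.append((prev + expected_step, curr - expected_step))
    return gaps


# Hypothetical delivery log with two missed days (March 3 and 4).
delivered = [date(2023, 3, 1), date(2023, 3, 2), date(2023, 3, 5), date(2023, 3, 6)]
print(find_gaps(delivered))
```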


A reasonable go-to-market strategy for selling data to hedge funds is:

  • Start with platform funds and quant funds
  • Set up 1-year contracts with a limited number (5-10) of buyers as soon as possible (early adopters with discounted cost)
  • Spend time productizing the data and learning about its use cases
  • Determine whether or not you want to expand the size of your distribution
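To sanity-check the economics of that plan, here’s a back-of-the-envelope Python sketch combining the 5-10 early buyers above with the typical $100,000 – $250,000/year per-firm range. Applying one uniform discount to every seat is an illustrative assumption.

```python
def revenue_range(buyers, price_low=100_000, price_high=250_000, discount=0.0):
    """Annual revenue band for a (min, max) number of subscribing firms.

    `discount` models an early-adopter price cut applied to every seat (an assumption).
    """
    min_buyers, max_buyers = buyers
    factor = 1 - discount
    return min_buyers * price_low * factor, max_buyers * price_high * factor


print(revenue_range((5, 10)))                # full price
print(revenue_range((5, 10), discount=0.5))  # early adopters at half price
```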