Alternative data (finance) – Wikipedia
Alternative data (in finance) refers to data used to obtain insight into the investment process.  These data sets are often used by hedge fund managers and other institutional investment professionals within an investment company.  Alternative data sets are information about a particular company that is published by sources outside of the company, which can provide unique and timely insights into investment opportunities. 
Alternative data sets are often categorized as big data,  which means that they may be very large and complex and often cannot be handled by software traditionally used for storing or handling data, such as Microsoft Excel. An alternative data set can be compiled from various sources such as financial transactions, sensors, mobile devices, satellites, public records, and the internet.  Alternative data can be compared with data that is traditionally used by investment companies such as investor presentations, SEC filings, and press releases.  These examples of “traditional data” are produced directly by the company itself.
Since alternative data sets originate as a product of a company’s operations, these data sets are often less readily accessible and less structured than traditional sources of data. Alternative data is also known as “exhaust data.” The company that produces alternative data generally overlooks the value of the data to institutional investors. During the last decade, many data brokers, aggregators, and other intermediaries began specializing in providing alternative data to investors and analysts.
Examples of alternative data include:
Geolocation (foot traffic)
Credit card transactions
Web site usage
Mobile App or App Store analytics
Obscure city hall records
Social media posts
Online browsing activity
Shipping container receipts
Internet activity and quality data
Example of sentiment analysis against stock price (S&P 500)
Alternative data is being used by fundamental and quantitative institutional investors to create innovative sources of alpha. The field is still in the early phases of development, yet depending on the resources and risk tolerance of a fund, multiple approaches abound to participate in this new paradigm. 
The process to extract benefits from alternative data can be extremely challenging. The analytics, systems, and technologies for processing such data are relatively new and most institutional investors do not have capabilities to integrate alternative data into their investment decision process.  However, with the right tools and strategy, a fund can mitigate costs while creating an enduring competitive advantage. 
Most alternative data research projects are lengthy and resource intensive; therefore, due diligence is required before working with a data set. The due diligence should include approval from the compliance team, validation of the processes that create and deliver the data set, and identification of investment insights that can be additive to the investment process.
However, the use of alternative data is not restricted to investing; it is also used successfully in economics and politics, as well as in retail and e-commerce. It is possible to predict geopolitical risk through in-depth analysis of alternative data, while social media sites provide a wealth of data for consumer sentiment analysis.
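As a rough illustration of consumer sentiment analysis on social media text, the sketch below scores posts with a tiny word list. The word lists, the `sentiment_score` function, and the sample posts are all hypothetical; production systems typically rely on far richer language models.

```python
# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"love", "great", "excellent", "good"}
NEGATIVE = {"hate", "bad", "terrible", "poor"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words.

    Posts with no matched words score 0.0 (neutral).
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up posts about a consumer product.
posts = [
    "I love this phone, great battery!",
    "Terrible service, bad app.",
    "Arrived on Tuesday.",
]
scores = [sentiment_score(p) for p in posts]
average = sum(scores) / len(scores)
print(scores, average)
```

Averaging such scores per day or per brand gives a simple time series that can then be compared with sales or price data.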
Alternative data can be accessed via:
Web scraping (or web harvesting), in which programmers design an algorithm that searches websites for specific data on a desired topic
Acquisition of raw data
In finance, alternative data is often analysed in the following ways:
Scarcity: the data Information overload within financial markets
Granularity: the level of detail and aggregation of data (including time)
History: the trajectory of data
Structure: the form of the data (CSV, JSON, etc.)
Coverage: the stocks or geographical locations that data can be linked with
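Some of the dimensions above can be measured directly from a data set. The sketch below is one possible way to do so, assuming the data arrives as hypothetical `(ticker, observation_date, value)` rows; the `profile_dataset` function and the sample rows are illustrative and not part of any standard.

```python
from datetime import date
from statistics import median


def profile_dataset(rows):
    """Summarise a data set of (ticker, observation_date, value) rows.

    history_days    : span between the first and last observation (History)
    median_gap_days : median gap between distinct dates (Granularity)
    coverage        : number of distinct tickers (Coverage)
    """
    if not rows:
        raise ValueError("rows must not be empty")
    dates = sorted({d for _, d, _ in rows})
    tickers = {t for t, _, _ in rows}
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return {
        "history_days": (dates[-1] - dates[0]).days,
        "median_gap_days": median(gaps) if gaps else None,
        "coverage": len(tickers),
    }


# Hypothetical sample: weekly observations for two tickers.
sample = [
    ("AAA", date(2024, 1, 1), 10.0),
    ("AAA", date(2024, 1, 8), 11.0),
    ("BBB", date(2024, 1, 8), 5.0),
    ("BBB", date(2024, 1, 15), 6.0),
]
profile = profile_dataset(sample)
print(profile)
```

A profile like this makes it easier to compare candidate data sets before committing research time to any one of them.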
While compliance and internal regulation are widely practiced in the alternative data field, there exists a need for an industry-wide best practices standard. Such a standard should address personally identifiable information (PII) obfuscation and access scheme requirements, among other issues. Compliance professionals and decision makers can benefit from proactively creating internal guidelines for data operations. Publications such as NIST 800-122 provide guidelines for protecting PII and are useful when developing internal best practices. The Investment Data Standards Organization (IDSO) was established to develop, maintain, and promote industry-wide standards and best practices for the alternative data industry.
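One common way to obfuscate PII is to replace identifiers with keyed hashes before the data reaches analysts. The sketch below shows that idea using Python's standard `hmac` and `hashlib` modules. It is an illustration under stated assumptions, not a procedure taken from NIST 800-122: the `pseudonymize` and `obfuscate_record` helpers, the field names, and the key are hypothetical, and keyed hashing is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 hex digest.

    The same identifier and key always give the same token, so records
    can still be joined, but the raw value is not stored.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def obfuscate_record(record: dict, pii_fields: set, secret_key: bytes) -> dict:
    """Return a copy of record with the listed PII fields pseudonymized."""
    return {
        key: pseudonymize(str(value), secret_key) if key in pii_fields else value
        for key, value in record.items()
    }


# Hypothetical transaction record and key; a real key would be kept
# in a secrets manager, never in source code.
key = b"example-key-for-illustration-only"
raw = {"email": "Alice@Example.com", "merchant": "Store 42", "amount": 19.99}
safe = obfuscate_record(raw, {"email"}, key)
print(safe["merchant"], safe["amount"], len(safe["email"]))
```

Using a keyed hash rather than a plain hash means that someone without the key cannot rebuild the mapping by hashing a list of known email addresses.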
Legal aspects surrounding web scraping of alternative data have yet to be defined. Current best practices address the following issues when determining legal compliance of web crawling operations:
Review of the terms and conditions associated with the websites crawled
Control over the potential interference with crawled websites
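One technical piece of limiting interference is to honour a site's robots.txt rules and space out requests. The sketch below uses Python's standard `urllib.robotparser` for this. The robots.txt content, the bot name, and the URLs are made up, and a robots.txt check does not replace a review of a site's terms and conditions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def crawl_schedule(parser, user_agent, urls, default_delay=1.0):
    """Return (url, seconds_from_start) pairs for permitted URLs only.

    Requests are spaced by the site's Crawl-delay when one is declared,
    otherwise by default_delay.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    permitted = [u for u in urls if parser.can_fetch(user_agent, u)]
    return [(url, i * delay) for i, url in enumerate(permitted)]


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/jobs",
    "https://example.com/private/data",
    "https://example.com/products",
]
schedule = crawl_schedule(parser, "example-research-bot", urls)
print(schedule)
```

Here the disallowed path is dropped from the schedule and the remaining requests are spaced by the declared delay.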
Web scraped data refers to data harvested from public websites. With 4 billion webpages and 1.2 million terabytes of data on the internet, there is a mountain of information that can be valuable to investors when analyzing corporate performance.
Companies that specialize in this type of data collection, like Thinknum Alternative Data, write programs that access targeted websites and collect and store the scraped information on a periodic basis. In some cases web scraping relies on public APIs to access the data within those pages directly, without visiting the actual website.
Types of web scraped data include:
Job listings: A company that is increasing hiring and headcount is likely experiencing growth.
Company ratings: Sites like Glassdoor allow employees to rate their company; increasing ratings, especially in conjunction with increasing job listings, can be another growth indicator.
Online retail data: High product rankings on online retailers suggest strong sales for those product manufacturers. On the flip side, heavy discounting of products suggests weak sales.
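To make the job listings indicator concrete, the sketch below counts listings in stored page snapshots and computes the change between periods. The HTML markup (`<li class="job">`), the snapshot contents, and the helper names are hypothetical; real career pages differ in structure and usually need site-specific parsing.

```python
from html.parser import HTMLParser


class JobCounter(HTMLParser):
    """Count <li> elements carrying the (hypothetical) class 'job'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "job" in classes:
            self.count += 1


def count_jobs(html: str) -> int:
    """Return the number of job listings found in one page snapshot."""
    counter = JobCounter()
    counter.feed(html)
    return counter.count


def pct_change(previous: int, current: int):
    """Percentage change between two counts, or None if previous is 0."""
    if previous == 0:
        return None
    return 100.0 * (current - previous) / previous


# Made-up snapshots of a careers page, collected on a periodic basis.
snapshots = {
    "2024-01": '<ul><li class="job">Analyst</li><li class="job">Engineer</li></ul>',
    "2024-02": '<ul><li class="job">Analyst</li><li class="job">Engineer</li>'
               '<li class="job featured">Designer</li><li class="ad">Promo</li></ul>',
}
counts = {period: count_jobs(html) for period, html in snapshots.items()}
growth = pct_change(counts["2024-01"], counts["2024-02"])
print(counts, growth)
```

The same pattern of snapshotting, counting, and comparing applies to ratings or product rankings, with a different parser for each source.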
The Standards Board for Alternative Investments (SBAI) is the global standard-setting agency for the alternative investment industry and guardian of the Alternative Investment Standards. The agency is supported by approximately 200 alternative investment managers and institutional investors, who collectively manage $3.5 trillion. The SBAI has published the Standardised Trial Data License Agreement, which addresses issues investment managers face when trialling new data, such as alternative data and big data. Thomas Deinet, Executive Director of the SBAI, said: “This Trial Data Licence Agreement template highlights a number of very important issues, including personal data protection, which has become a hot topic in light of the overhaul of data protection regulation in many jurisdictions. It also includes key protections for managers in areas such as prevention of insider trading and ‘right to use data’. It is crucial that managers and data vendors fully understand all risks when selling and using new data.”
Alexander Denev and Saeed Amen, The Book of Alternative Data: A Guide for Investors, Traders and Risk Managers (Wiley 2020)
Marko Kolanovic and Rajesh T. Krishnamachari, Big Data & AI Strategies: Machine Learning and Alternative Data Approach to Investing (JP Morgan 2018)
What is Alternative Data? – Examples of Alternative … – AMPLYFI
Alternative data is auxiliary financial information useful for making investment decisions away from official or corporate sources. Used together with information from traditional data sources, alternative data give investors a full picture of an investment opportunity. Examples of alternative data include unstructured data emerging from a company or a person’s activity, public records such as non-farm payroll, mobile device data, Internet of Things (IoT) sensors, credit card transactions, point of sale transactions, website data, online browsing activity, product reviews, internet activity, app store analytics, ESG data, satellite imagery data and social media sentiment data.
It is widely recognised that big data is another essential factor of production in the modern world, much as land, labour, and capital were when early political economist Adam Smith was defining his classical theory of economic growth. Some argue that data is replacing labour and that technological advances are diminishing the importance of land in the process of creating wealth. In finance, alternative data, a subset of big data, is used to provide competitive insights unavailable in traditional sources of investment information such as SEC filings, financial reports or market data.
The examples of alternative data mentioned above show that data is wide in breadth and provides a better predictor for trends and asset performance. Insights drawn from the data are profound, with implications spanning multiple assets and industries. Analysis and consumption of alternative data are made possible by AI technologies that collect, organise, and process raw unstructured data and use it for query or building applications that make evidence-based recommendations for users.
But there are several hurdles that institutions must overcome before they can benefit from alternative data. First is the vastness of it. Collectively, the world produces an estimated minimum of 2.5 quintillion bytes of data daily. In a corporate environment, only 2% of this data is used, while another 95% is stored in a non-uniform unstructured format. This means that institutions have to work through the maze to find the data relevant to their industry and the unique insights it will yield to help maintain their competitive edge. After identifying what’s appropriate, functional and unique, institutions must organise the data to speak to the nature of the assets and companies of interest. Lastly, after collating the data, entities must query it and determine whether it will reveal alpha opportunities.
The second challenge concerns investors who seek to determine the sustainability of investments by leveraging alternative data in the form of ESG data. With 85% of S&P companies publishing sustainability reports in 2017, ESG is gaining considerable traction. However, sifting through the noise and building a data sourcing process with an inbuilt mechanism for eliminating greenwashing companies may pose a significant challenge for investors. The problem is compounded by the absence of a global standard for reporting ESG metrics, meaning companies can choose what and how to report. ESG data is published across multiple publications, including company websites, annual reports and emission disclosures. Therefore, investors run the risk of missing or misinterpreting the sustainability of an investment.
However, investors can leverage technology, specifically tools developed by artificial intelligence experts like AMPLYFI. AMPLYFI has expertise in building intelligent machine learning tools that use unstructured data to enable users to make evidence-based decisions to either move forward or change with conviction. Our DeepInsight tool structures sourced data and applies machine learning algorithms to extract information and generate unique insights. DeepResearch, on the other hand, searches over 400 web, paid and internal sources in one click, summarising results using machine learning, saving institutions up to half of the secondary research time. AI-driven tools such as these can be instrumental in helping organisations create value from the vast amounts of data available.
Get Started – AlternativeData.org
What is Alternative Data?
Alternative data refers to data used by investors to evaluate a company or investment that is not within their traditional data sources (financial statements, SEC filings, management presentations, press releases, etc.). Alternative data helps investors get more accurate, faster, or more granular insights and metrics into company performance than traditional data sources. Over the last 10 years, increases in computing power and personal device usage created massive growth in data generation. As a direct outcome, a large number of companies emerged to collect, clean, analyze, and interpret data and provide it as a product that could inform investment decisions (“Alternative Data Providers”). See growth in alternative data providers selling to institutional investors in Figure 1.
Alternative Data Provider Stats
Alternative Data Providers: 445
Alternative Data Use Growth
For funds to make use of these datasets for investment decisions, they have had to build out their data teams.
The number of alternative data full-time employees (FTEs) at funds has grown ~450% in last 5 years.
Most alternative data FTEs have 11+ years experience and do not have graduate degrees.
Tech, Academia, and Data Providers are quickly becoming main channels for sourcing alternative data FTEs.
Cost of an alternative data team starts at $1.5–$2.5m.
Figure 2. Growth in funds with alternative data teams and full-time alternative data employees.
See our original Buy-side Alternative Data Employee Analysis for a detailed breakdown of growth in alternative data team building on the buy-side. Note: Updated methodology on the analysis cited above led to a new estimate of 1,190 Data FTEs in 2017.
For the most recent alternative data-related job postings at funds and providers, see the Jobs Page.
As funds have found use cases and applications for the increasing number of alternative datasets, their spend on alternative data has increased accordingly. (See Figure 3 – note: this includes spend on both datasets and infrastructure).
Figure 3. Buy-side spend on alternative datasets and infrastructure.
After thousands of conversations with investors, vendors, and experts, we have compiled the stack of top alternative data providers in the institutional investment space. The stack focuses on the top 100 data providers used by fundamental investors. It excludes market data, economic/macro data, and market news/industry publications.
Each provider’s position is intended to reflect the firm’s product positioning relative to institutional investors. Data providers in the clusters towards the top are focused on data analysis and extracting insights from alternative data. Clusters positioned toward the bottom are more focused on data collection and quality assurance, and tend not to be consumed directly by fundamental analysts and PMs, but rather go through data brokers, the sell-side, or internal data teams for analysis.
For major players in alternative data broken out by data source and sector coverage, read on.
Major Types of Alternative Data
How is alternative data generated?
Individuals: Social/Sentiment, Web Traffic, App Usage, Survey
Business Processes: Credit/Debit Card, Web Data, Public Data, Email/Consumer Receipts
Sensors: Geo-location, Satellite, Weather
Data from business processes are typically more structured than data from individuals or sensors.
Data cost: typically Business Processes > Sensors > Individuals.
What are the different categories of alternative data?
App Usage – Data on app engagement and reviews. The level of data accuracy and usefulness depends on the app panel size, functions and features collected, and the level of user engagement. Popular use cases: gaming, food delivery, streaming services.
Credit/Debit Card – Transaction data generated from credit and debit cards. This data is considered highly accurate when the transaction panel is large and covers a consistent user sample. Usually panels over 3 million consumers are considered large enough to be useful. These panels are some of the more expensive data licenses on the market. Popular use cases: Retail revenue tracking.
Email/Consumer Receipts – Transaction data generated from email receipts. This data is accurate, but panels are typically smaller than credit/debit card panels and can be biased depending on the nature of the email receipt collection (often via an opt-in email or rewards app). Popular use cases: Retail revenue tracking.
Geo-location – Foot traffic data available from WiFi signals (limited granularity and accuracy) or bluetooth beacons (higher accuracy, more expensive, less coverage). Popular use cases: Geography-specific retail foot traffic tracking.
Public Data – Data from public resources. In its original form, this data is often difficult to access, not clean, and not in a usable format (e.g. PDF). The value add of public data providers is the work of collecting, aggregating, and making the data actionable. Examples include SEC filings, patent data, government contracts, import/export data, etc. Popular use cases: patent data for tech companies; supply chain imports for manufacturing; government contracts for construction companies.
Satellite – Data collected from satellites or (increasingly common) low-level drones. This data is expensive and of variable quality. Image processing is as important as data collection (raw data is not valuable to most investment teams). Satellite data on parking lots is only useful if a more direct measurement of store activity (geo-location data) or spend (credit card, email receipt) data is not available or beyond price range. Popular use cases: supply chain disruption tracking; agriculture yields tracking; construction tracking; oil & gas production/storage.
Sell-side – Alternative data teams within large sell-side institutions. Combine new data and processing techniques with traditional sell-side research.
Social/Sentiment – Data obtained from text processing of social media, news, management communications, and other sources. Sentiment data is relevant for some companies (think younger, more trading volume, more volatile) more than large, established corporations. The data is often more relevant to shorter-term traders as it does not always reflect fundamental business aspects. On the lower end of cost spectrum. Popular use cases: Event-driven sentiment tracking; Brand Virality/Advertising success.
Survey – Data collected from surveys. This requires opt-in, and panel diversity varies with the quality of the provider. This is a direct line into consumer sentiment, rather than collecting it from text processing as in social/sentiment data. Popular use cases: brand preference; consumer behavior.
Weather – Data on weather patterns collected from sensors. Popular use cases: agriculture and commodities.
Web Data – Data scraped from public websites. This data comes in a wide range, from highly accurate and expensive to extremely raw and relatively inexpensive. This data is applicable where KPIs can be tracked by aggregating and analyzing large amounts of public-facing information, such as companies that publicize quantity sold and prices on each item page. This data can be extremely granular. Popular use cases: e-commerce; auto sales; airlines bookings; travel bookings; job postings.
Web Traffic – Data on quantity, demographics, and history (clickstream) of users visiting a certain website. This is popular for tracking e-commerce efforts. Popular use cases: travel bookings; e-commerce.
Other – There are many other popular datasets, including point-of-sale data, ad spend data, pricing data, and much more. These are not yet broad enough to capture a full section.
Which are the most popular datasets for investors?
Data source with the greatest number of providers: Social/Sentiment
Highest grossing data source: Credit/Debit Card
Most utilized datasets: Web Data, Credit/Debit Card
Most insightful datasets: Credit/Debit Card, Web Data
Least insightful datasets: Geo-location, Satellite
Major Players in Alternative Data
Frequently Asked Questions about alternative data finance
What is alternative market data?
More than 400 companies are engaged in selling alternative data to hedge funds, thereby contributing significantly to market revenue. Alt-data refers to undiscovered data that is not within the traditional data sources, such as SEC filings, financial statements, press releases, and management presentations.