AI-ready ESG Data
Corporate financial and sustainability reporting datasets. Reliable, tagged and machine-readable.
Centralised library of over 6,400 international companies, 212,500 documents and 149 million sentences.
How We Can Help You
Machine-Readable Data
LLM training data incorporating sustainability disclosures from European and US companies since 2019.
Streamlined Delivery
Access through multiple interfaces, making data acquisition and data science simple.
Time-Sensitive Updates
The latest company reports, updated regularly across the dataset.
Trustworthy Sources
Transparent and complete data you can trust.
Tagged, pre-trained sentence-level data
Publicly available corporate documents are collected and processed by an automated curation pipeline. For transparency, the database holds both the original document (PDF) and the extracted machine-readable text, split into sentences. Each sentence is tagged with metadata attributing it to a company, region, country, industry, reporting year and report type, among other fields.
Natural Language Processing (NLP) is used to classify each sentence against 14 sustainability-related topics, using a pre-trained model for each specific use. These BERT-based models are trained by subject matter experts on issues such as climate change and human rights.
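As an illustration of the sentence-level tagging described above, a single record might be modelled as follows. This is a minimal sketch only: the class name, field names, topic label and sample values are all hypothetical placeholders, not the actual schema of the dataset.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SentenceRecord:
    """One extracted sentence plus its metadata (all field names are hypothetical)."""
    text: str
    company: str
    region: str
    country: str
    industry: str
    reporting_year: int
    report_type: str
    source_document: str  # reference to the original PDF the sentence came from
    topics: list = field(default_factory=list)  # labels assigned by the topic classifiers


def to_json_line(record: SentenceRecord) -> str:
    """Serialise a record as one line of JSON."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values for illustration; not real data.
example = SentenceRecord(
    text="Placeholder sentence about emissions targets.",
    company="Example Co",
    region="Europe",
    country="United Kingdom",
    industry="Energy",
    reporting_year=2023,
    report_type="Annual Report",
    source_document="example-co-annual-report-2023.pdf",
    topics=["climate_change"],
)

print(to_json_line(example))
```

Keeping the source document reference on every record means each sentence can be traced back to the original PDF.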
Guided set-up, simple access
Our dataset is provided using AWS cloud infrastructure, with ease of deployment and user access front of mind, and is available in JSON, CSV or plain text. We also provide data via an API service that enables on-demand analytics on our disclosure data. The dataset can also be accessed through our Research portal, which is designed for analytics, benchmarking, interrogation and export of the data.
Data can be curated according to client-specific requirements such as sector, region and topic, with topics identified by our NLP classifiers or by keywords.
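The kind of curation described above can be pictured as a filter over sentence records. The sketch below assumes records are plain dictionaries with hypothetical keys ("industry", "region", "topics", "text"); the function name, the sample rows and the choice to require every supplied criterion to match are illustrative assumptions, not the actual API or delivery format.

```python
def select_sentences(records, *, sector=None, region=None, topics=None, keywords=None):
    """Yield records matching every criterion that was supplied.

    All key names are hypothetical. Criteria left as None are ignored.
    """
    wanted_topics = set(topics or [])
    lowered_keywords = [k.lower() for k in (keywords or [])]
    for rec in records:
        if sector is not None and rec.get("industry") != sector:
            continue
        if region is not None and rec.get("region") != region:
            continue
        # Keep the record only if it carries at least one requested topic label.
        if wanted_topics and not (wanted_topics & set(rec.get("topics", []))):
            continue
        # Keep the record only if at least one keyword occurs in the sentence text.
        text = rec.get("text", "").lower()
        if lowered_keywords and not any(k in text for k in lowered_keywords):
            continue
        yield rec


# Placeholder rows for illustration; not real data.
SAMPLE = [
    {"text": "Placeholder sentence on emissions.", "industry": "Energy",
     "region": "Europe", "topics": ["climate_change"]},
    {"text": "Placeholder sentence on supplier audits.", "industry": "Retail",
     "region": "Europe", "topics": ["human_rights"]},
    {"text": "Placeholder sentence on board pay.", "industry": "Energy",
     "region": "North America", "topics": []},
]

energy_climate = list(select_sentences(SAMPLE, sector="Energy", topics=["climate_change"]))
print(len(energy_climate))
```

A keyword-only selection works the same way, for example `select_sentences(SAMPLE, keywords=["audits"])`.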
Most recent reports available
Regularly reviewing company websites is an important part of ensuring we hold all documents a company has made public. This up-to-date coverage lets users track industry trends in corporate financial and sustainability reporting and language.
Our scalable document collection pipeline also allows us to efficiently gather documents that clients require but that are not yet in the document library, i.e. document collection on demand.
Data you can rely on
For each document collected, we record the source of the information at the time of collection, so our dataset of sentences can be trusted as authentic and appropriate for LLM use. While our process is largely automated, our team takes the time to manually tag, review and validate the results. This ensures that you can trust what we deliver and that the quality meets your expectations.
Related Insights
27 June 2024
TDI Global reveals 59% of world’s largest companies are failing to meet basic reporting expectations, while the UK takes a clear lead
8 April 2024
Using AI to help financial regulators detect greenwashing. Presentation with ImpactScope at World AI Cannes Festival 2024
31 January 2024
Harnessing specialised large language models for corporate sustainability reporting
20 November 2023
Transparency & Disclosure Index reveals stark differences in reporting patterns across UK's largest companies
4 April 2023
Only 5% of FTSE100 have credible climate transition plans according to EY: Insig AI's response
9 March 2023