Preprints surface within days of a discovery, years before the patents, the funding rounds, and the analyst coverage. The Finch Innovation Index captures that signal systematically across 73 themes, every month.
Drawing on 1.01M research abstracts processed to date, the index surfaces each month the technical concepts surging in frequency: the building blocks of tomorrow's products, before they have ticker symbols.
Looking back across six years of preprint data, the precursor signals for every major technology wave were visible at the research stage, years before capital arrived.
Preprint velocity in GLP-1 receptor agonists surged 180% before any major pharma coverage. The signal was unambiguous two years before Ozempic became a household name.
Research output at the intersection of deep learning and molecular biology began compounding well before the sector attracted major venture capital attention.
Preprint publication rates for transformer-based models began an exponential climb in 2018–2019. The GPT revolution and its commercial consequences followed.
Delivery mechanism research for mRNA therapeutics was compounding quietly in 2018. Moderna and BioNTech were building on a preprint signal that had been accumulating for years.
CRISPR preprint volume doubled across two consecutive years before therapeutic applications entered clinical development and attracted institutional capital.
Publications on perovskite photovoltaic efficiency doubled over two years, well ahead of the wave of commercial investment into the sector.
Structured CSV and Parquet files updated on the 1st of each month. Every dataset covers all 73 themes across 66 months of history.

Momentum Rankings (sample)

| theme | momentum_score | paper_count | mom_change | rank | sector |
|---|---|---|---|---|---|
| Speech & Audio AI | 81 | 322 | +3.2 | 1 | AI |
| AI Agents & Reasoning | 78 | 2,052 | +1.8 | 2 | AI |
| LLMs & NLP | 72 | 3,759 | +0.4 | 3 | AI |
| AR/VR & Immersive | 66 | 824 | -0.6 | 4 | HW |
| Checkpoint Inhibitors | 64 | 420 | -1.2 | 5 | LS |
| Next-Gen Vaccines | 62 | 677 | -0.3 | 6 | LS |
| Quantum Error Correction | 61 | 318 | +0.9 | 7 | HW |
| Federated Learning | 59 | 589 | -1.4 | 8 | AI |

Geographic Intelligence (sample)

| theme | country | paper_count | inst_count | share_pct | rank |
|---|---|---|---|---|---|
| AI Agents | 🇺🇸 USA | 8,420 | 1,011 | 24.1% | 1 |
| LLMs & NLP | 🇨🇳 China | 5,620 | 674 | 18.3% | 2 |
| Perovskite Solar | 🇨🇳 China | 3,840 | 461 | 33.4% | 3 |
| Next-Gen Vaccines | 🇬🇧 UK | 2,220 | 266 | 14.8% | 4 |
| Checkpoint Inhibitors | 🇺🇸 USA | 4,068 | 488 | 52.1% | 5 |
| Remote Monitoring | 🇬🇧 UK | 2,897 | 348 | 29.3% | 6 |
| AI Agents | 🇨🇦 Canada | 1,320 | 158 | 3.8% | 7 |
| LLMs & NLP | 🇮🇳 India | 2,860 | 343 | 9.3% | 8 |
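The sample rows are consistent with share_pct being a country's share of all papers in that theme: 8,420 USA papers at 24.1% and 1,320 Canadian papers at 3.8% both imply roughly 35,000 AI Agents papers in total. A minimal sketch under that assumption follows; `country_shares` and the "All other countries" remainder row are hypothetical.

```python
from collections import defaultdict

def country_shares(rows):
    """Map (theme, country) -> share of that theme's papers, in percent.

    Assumes share_pct = country paper_count / theme total * 100, an
    interpretation of the sample table rather than a documented formula.
    """
    totals = defaultdict(int)
    for theme, _country, count in rows:
        totals[theme] += count
    return {
        (theme, country): round(100.0 * count / totals[theme], 1)
        for theme, country, count in rows
    }

rows = [
    ("AI Agents", "USA", 8420),
    ("AI Agents", "Canada", 1320),
    ("AI Agents", "All other countries", 25160),  # hypothetical remainder
]
print(country_shares(rows))
```

With a hypothetical remainder of 25,160 papers, this reproduces the 24.1% and 3.8% shown for the USA and Canada.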

Rising Keywords (sample)

| keyword_bigram | growth_mult | tag | primary_theme | paper_count | prior_12m_avg |
|---|---|---|---|---|---|
| latent actions | ×24.3 | novel | AI Agents & Reasoning | 214 | 8.8 |
| attention sink | ×23.1 | novel | LLMs & NLP | 187 | 8.1 |
| computational budgets | ×22.4 | novel | Chip Architecture | 176 | 7.9 |
| gated attention | ×21.0 | surging | LLMs & NLP | 310 | 14.8 |
| coded caching | ×35.2 | novel | Federated Learning | 141 | 4.0 |
| test-time scaling | ×19.7 | surging | AI Agents & Reasoning | 289 | 14.7 |
| reward shaping | ×18.4 | novel | AI Agents & Reasoning | 198 | 10.8 |
| vector quantization | ×17.1 | surging | Speech & Audio AI | 162 | 9.5 |
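The sample rows suggest growth_mult is the month's paper_count divided by prior_12m_avg (214 / 8.8 ≈ 24.3, 289 / 14.7 ≈ 19.7); a few rows differ in the last digit, possibly because prior_12m_avg is displayed rounded. The sketch below assumes that ratio. The function names are hypothetical, and the novel/surging tag is not modelled because the sample does not reveal its rule.

```python
def growth_multiple(paper_count: float, prior_12m_avg: float) -> float:
    """This month's paper count relative to the trailing 12-month average."""
    if prior_12m_avg <= 0:
        return float("inf")  # bigram effectively absent over the prior year
    return paper_count / prior_12m_avg

def rank_bigrams(rows):
    """Sort (bigram, paper_count, prior_12m_avg) rows by growth multiple, highest first."""
    scored = [(bigram, round(growth_multiple(count, avg), 1)) for bigram, count, avg in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Three rows taken from the sample table above.
sample = [
    ("latent actions", 214, 8.8),
    ("test-time scaling", 289, 14.7),
    ("coded caching", 141, 4.0),
]
for bigram, mult in rank_bigrams(sample):
    print(f"{bigram}: x{mult}")
```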

Theme Emergence (sample)

| theme | emerged | months | cum_growth | accel_3mo | sig_score | momentum |
|---|---|---|---|---|---|---|
| AI Agents & Reasoning | Jan 2020 | 73 | +337% | +12.4% | 78 | 81 |
| mRNA Therapeutics | Apr 2020 | 70 | +512% | +8.1% | 58 | 55 |
| Speech & Audio AI | Jun 2019 | 80 | +621% | +15.2% | 81 | 84 |
| Solid-State Batteries | Mar 2021 | 59 | +218% | +6.3% | 66 | 57 |
| Federated Learning | Feb 2021 | 60 | +284% | +4.9% | 59 | 52 |
| Perovskite Solar | Sep 2020 | 65 | +193% | +3.7% | 54 | 51 |
| Checkpoint Inhibitors | Nov 2019 | 75 | +141% | +2.1% | 64 | 48 |
| Quantum Error Correction | Aug 2021 | 54 | +256% | +7.8% | 61 | 59 |
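The months column in the sample rows matches the number of calendar months from each theme's emergence to February 2026 (for example, Jan 2020 to Feb 2026 is 73 months). The sketch below computes that count, plus a cum_growth-style percentage under the assumption that it compares current monthly volume with volume at emergence. Both helpers, the AS_OF date, and the baseline counts are hypothetical.

```python
from datetime import date

def months_since(emerged: date, as_of: date) -> int:
    """Whole calendar months from the emergence month to the as-of month."""
    return (as_of.year - emerged.year) * 12 + (as_of.month - emerged.month)

def cumulative_growth_pct(count_at_emergence: float, count_now: float) -> float:
    """Percent change in monthly paper count since emergence (hypothetical definition)."""
    return (count_now / count_at_emergence - 1.0) * 100.0

AS_OF = date(2026, 2, 1)  # assumption: reproduces the sample's months column

print(months_since(date(2020, 1, 1), AS_OF))  # AI Agents & Reasoning → 73
print(months_since(date(2019, 6, 1), AS_OF))  # Speech & Audio AI → 80
# Hypothetical counts: 100 papers/month at emergence, 437 now, i.e. about +337%.
print(round(cumulative_growth_pct(100, 437)))
```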
A systematic, four-stage pipeline converts raw preprint publications into structured investment signals — no manual curation, no subjective scoring.
Monthly ingestion of academic preprints from the largest open-access repositories for frontier research. Over 1M papers processed since January 2019.
Each abstract is classified into one or more of 73 investable technology themes using a proprietary taxonomy validated against expert panels. 99.3% classification accuracy.
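The taxonomy and classifier are proprietary, so the following is only a stand-in: a keyword-matching sketch showing what multi-label assignment means in practice, where one abstract can land in several themes. The TAXONOMY entries, phrases, and `min_hits` threshold are hypothetical, and this simple matcher is not the validated model described above.

```python
import re

# Hypothetical mini-taxonomy: theme -> indicative phrases (illustrative only).
TAXONOMY = {
    "LLMs & NLP": ["language model", "attention sink", "tokenizer"],
    "AI Agents & Reasoning": ["tool use", "test-time scaling", "planning agent"],
    "Quantum Error Correction": ["surface code", "logical qubit", "syndrome decoding"],
}

def classify(abstract: str, taxonomy=TAXONOMY, min_hits: int = 1) -> list[str]:
    """Return every theme whose phrases occur at least `min_hits` times (multi-label)."""
    text = abstract.lower()
    themes = []
    for theme, phrases in taxonomy.items():
        hits = sum(len(re.findall(re.escape(p), text)) for p in phrases)
        if hits >= min_hits:
            themes.append(theme)
    return themes

abstract = ("We study test-time scaling for a large language model "
            "acting as a planning agent with tool use.")
print(classify(abstract))  # → ['LLMs & NLP', 'AI Agents & Reasoning']
```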
Monthly publication velocity is normalised into a 0–100 momentum score per theme. Scores account for baseline volume, growth rate, and 3-month acceleration to separate signal from noise.
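The exact scoring formula is not published. One plausible construction, sketched here purely as an assumption, computes the three named inputs from a theme's monthly paper counts, percentile-ranks each across themes, and blends them with fixed weights that sum to 1 so the result falls on a 0–100 scale. WEIGHTS, the 12-month baseline window, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical weights for (baseline volume, growth rate, 3-month acceleration).
WEIGHTS = (0.2, 0.5, 0.3)

def features(counts: list[int]) -> tuple[float, float, float]:
    """Baseline volume, growth rate and 3-month acceleration from monthly counts.

    `counts` is oldest-first and must cover at least 15 months.
    """
    baseline = mean(counts[-15:-3])   # the 12 months before the latest quarter
    recent = mean(counts[-3:])        # the latest 3 months
    previous = mean(counts[-6:-3])    # the 3 months before that
    growth = recent / max(baseline, 1e-9) - 1.0
    accel = recent / max(previous, 1e-9) - 1.0
    return baseline, growth, accel

def percentile_ranks(values: list[float]) -> list[float]:
    """Position of each value in sorted order, scaled to [0, 1]; ties share the lower rank."""
    if len(values) == 1:
        return [0.5]
    ordered = sorted(values)
    return [ordered.index(v) / (len(values) - 1) for v in values]

def momentum_scores(series: dict[str, list[int]]) -> dict[str, int]:
    """Blend cross-theme percentile ranks of the three features into a 0-100 score."""
    names = list(series)
    columns = list(zip(*(features(series[n]) for n in names)))  # one tuple per feature
    ranks = [percentile_ranks(list(col)) for col in columns]
    return {
        name: round(100 * sum(w * ranks[k][i] for k, w in enumerate(WEIGHTS)))
        for i, name in enumerate(names)
    }

series = {
    "Theme A": [10] * 15,                  # flat, low volume
    "Theme B": [10] * 12 + [20, 30, 40],   # low volume, accelerating
    "Theme C": [100] * 15,                 # flat, high volume
}
print(momentum_scores(series))  # → {'Theme A': 0, 'Theme B': 80, 'Theme C': 20}
```

Ranking across themes, rather than scaling raw ratios, keeps one outlier theme from compressing everyone else's score; whether the index does this is not stated.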
Four structured datasets produced on the 1st of each month: Momentum Rankings, Geographic Intelligence across 19 countries, Rising Keywords via bigram analysis, and Theme Emergence tracking.
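For Rising Keywords, the sample's paper_count column suggests bigrams are counted per paper rather than per occurrence. A minimal sketch of that counting step, assuming simple lowercase tokenisation and a small hypothetical stopword list:

```python
import re
from collections import Counter

# Small illustrative stopword list; a production list would be much larger.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "we", "with", "is", "are"}

def bigrams(text: str) -> set[tuple[str, str]]:
    """Distinct adjacent word pairs in an abstract, skipping pairs that touch a stopword."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {
        (a, b) for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    }

def paper_counts(abstracts: list[str]) -> Counter:
    """Number of papers mentioning each bigram (each paper counts once per bigram)."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(bigrams(abstract))
    return counts

month = [
    "We analyse the attention sink in long-context language models.",
    "Gated attention mitigates the attention sink and attention sink tokens.",
    "Reward shaping for latent actions.",
]
counts = paper_counts(month)
print(counts[("attention", "sink")])  # → 2
```

Keeping these monthly counts and averaging the previous 12 would give a prior_12m_avg-style baseline for a growth multiple.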
The index is used by investors, strategists, and advisors who need a quantitative foundation for technology thesis work — not a narrative, a dataset.
Identify emerging categories 2–4 years before deal flow appears. Build data-backed thesis documents before the sector has a name.
Scan adjacent technical domains for threats and opportunities. Build R&D pipeline maps grounded in publication velocity, not analyst opinion.
Validate target company positioning against underlying research trends. Identify sectors approaching peak publication velocity before valuation follows.
Systematic, monthly signals across 73 themes. Integrate preprint momentum into quantitative models as a leading factor for technology-sector positioning.
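Using momentum as a leading factor mainly requires aligning each month's score with a later period's outcome, so a model never sees a score before it would have been published. The sketch below assumes the score for month M is available by the start of month M+1 (the files ship on the 1st) and pairs it with the next month's return. The return figures, the one-month lag, and the helper names are hypothetical.

```python
from math import sqrt

def shift_month(month: str, k: int) -> str:
    """Move a 'YYYY-MM' key forward by k months."""
    year, mon = map(int, month.split("-"))
    total = year * 12 + (mon - 1) + k
    return f"{total // 12:04d}-{total % 12 + 1:02d}"

def lagged_pairs(scores: dict[str, float], returns: dict[str, float], lag: int = 1):
    """Pair each month's score with the return observed `lag` months later."""
    pairs = []
    for month in sorted(scores):
        target = shift_month(month, lag)
        if target in returns:
            pairs.append((scores[month], returns[target]))
    return pairs

def pearson(pairs) -> float:
    """Pearson correlation of (x, y) pairs; raises if either side is constant."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)

# Hypothetical monthly momentum for one theme and a hypothetical basket return.
scores = {"2024-01": 60.0, "2024-02": 66.0, "2024-03": 63.0, "2024-04": 71.0}
returns = {"2024-02": 0.010, "2024-03": 0.025, "2024-04": 0.015, "2024-05": 0.030}
pairs = lagged_pairs(scores, returns, lag=1)
print(pairs)
print(round(pearson(pairs), 3))
```

A correlation over four points says nothing about predictive power; the example only shows the alignment step.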
Deliver technology landscape assessments backed by publication data, not keyword searches. Differentiate strategy reports with proprietary signal intelligence.
License structured monthly feeds to enrich alternative data products. Four clean datasets, consistent schema, CSV and Parquet on a monthly cadence.
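A minimal loader for the monthly CSV, assuming the column names shown in the momentum sample table above. The in-memory text stands in for a real file (for example a hypothetical momentum.csv opened with `open(path, newline="")`); a Parquet version could be read with `pandas.read_parquet` instead.

```python
import csv
import io

def load_momentum(handle) -> list[dict]:
    """Parse a momentum CSV into typed rows, assuming the sample table's column names."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "theme": rec["theme"],
            "momentum_score": int(rec["momentum_score"]),
            "paper_count": int(rec["paper_count"].replace(",", "")),  # tolerate "2,052"
            "mom_change": float(rec["mom_change"]),                   # "+3.2" parses fine
            "rank": int(rec["rank"]),
            "sector": rec["sector"],
        })
    return rows

# In-memory stand-in for a real file.
sample = io.StringIO(
    "theme,momentum_score,paper_count,mom_change,rank,sector\n"
    "Speech & Audio AI,81,322,+3.2,1,AI\n"
    '"AI Agents & Reasoning",78,"2,052",+1.8,2,AI\n'
    "AR/VR & Immersive,66,824,-0.6,4,HW\n"
)
rows = load_momentum(sample)
rising_ai = [r["theme"] for r in rows if r["sector"] == "AI" and r["mom_change"] > 0]
print(rising_ai)  # → ['Speech & Audio AI', 'AI Agents & Reasoning']
```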
Over one million papers classified. 73 investable themes. Rising Keywords, Geographic Intelligence, and Theme Emergence — updated every month.