Startup Diligence
Diligence report | AI Biology / Protein Language Models | Stage: acquired | Report date: 2026-05-18

EvolutionaryScale

World-class protein-LM science meets a non-commercial exit — CZI absorption in Nov 2025 leaves Series A investors with undisclosed returns and no standalone equity story.

EvolutionaryScale produced frontier-quality protein language models (Science-validated ESM3), but the November 2025 CZI absorption, just over 13 months after the $142M Series A, eliminates the standalone investment thesis and leaves commercial investor returns publicly unaccounted for.

Cover facts

Last private valuation (Series A, implied post-money): ~USD 1,350M [CO017]
Total raised (disclosed): USD 142M [CO016]
Series A close: September 26, 2024 [CO016]
ESM3 parameters (largest model): 98B [CO009]
ESM3 training compute: ~1×10^24 FLOPs [CO010]
Science paper: DOI 10.1126/science.ads0018 [CO013]
CZI absorption date: November 6, 2025 [CO021]
Employees (pre-acquisition): 11-50 [CO020]
Revenue / ARR: Not publicly disclosed [CO029]

Company profile

EvolutionaryScale was a San Francisco-based AI biology company that built and released ESM3, a 98-billion-parameter generative multimodal protein language model trained on 2.78 billion sequences (1×10^24 FLOPs on NVIDIA H100s), and ESM Cambrian (ESM-C), a representation-focused follow-on. The four co-founders — Alex Rives (CEO), Tom Sercu (VP Eng), Zeming Lin, and Sanjay Rao — were all formerly at Meta AI Research (FAIR), where they originated the ESM lineage. The company raised a seed round (June 2024) and a $142M Series A co-led by Amazon and NVIDIA (September 2024) at an implied ~$1.35B post-money valuation, alongside Lux Capital, Nat Friedman, and Daniel Gross. ESM3 was peer-validated by a Science publication (January 2025) and integrated into AWS SageMaker JumpStart and NVIDIA BioNeMo. On November 6, 2025, fewer than 14 months after the Series A close, the entire team joined CZ Biohub under the Chan Zuckerberg Initiative's Frontier AI for Biology initiative — ending EvolutionaryScale as an independent for-profit entity. Commercial revenue was never disclosed during the company's independent existence.

Website
www.evolutionaryscale.ai
Founded
2023 (exact incorporation date not disclosed)
Founders
Alexander (Alex) Rives, Tom Sercu, Zeming Lin, Sanjay Rao
Founding location
San Francisco, California, USA
Headquarters
San Francisco, California, USA (pre-acquisition); post-acquisition operations consolidated into CZ Biohub Network sites.
Product
EvolutionaryScale's product lineup centered on the ESM (Evolutionary Scale Modeling) family of protein language models: ESM3 (multimodal generative, up to 98B parameters) and ESM Cambrian / ESM-C (representation-focused, 300M / 600M / 6B variants). These were distributed through (a) the open-weights HuggingFace channel for academic/non-commercial use (esm3-sm-open-v1 and ESM-C variants had accumulated 9,400+ downloads combined as of May 2026), (b) the commercial Forge API at forge.evolutionaryscale.ai with a developer-facing SDK and documentation, (c) NVIDIA BioNeMo NIM microservice integration for enterprise H100 deployment, and (d) the AWS Marketplace SageMaker JumpStart listing. A peer-reviewed Science paper (Jan 16, 2025; DOI 10.1126/science.ads0018) validated ESM3's ability to design novel functional fluorescent proteins, providing independent scientific credibility.
Customers
Pre-acquisition the intended customer base was biotechnology and pharmaceutical R&D teams (target identification, protein engineering, antibody design), synthetic biology and industrial enzyme companies, academic researchers (free open-weights tier), and infrastructure customers reaching the models through AWS Marketplace and NVIDIA BioNeMo. No named pharma customer or paying-customer count was ever publicly disclosed.
Business model
SaaS / usage-based API pricing through the Forge platform, plus presumed partner revenue-share with AWS (SageMaker JumpStart listing) and NVIDIA (BioNeMo NIM microservice); the revenue-share terms are not public. Free open-weights ESM3 and ESM-C variants were released on HuggingFace under research-oriented licenses (terms vary by model) as a developer-community and lead-generation strategy. Post-acquisition the entity is part of a non-profit research network (CZI / CZ Biohub), and the future commercial status of the Forge API is publicly unconfirmed.
Stage
acquired
Funding status
Acquired by Chan Zuckerberg Initiative / CZ Biohub on November 6, 2025; team absorbed into the CZ Biohub Network under the Frontier AI for Biology initiative. Acquisition terms (cash, stock, IP transfer, employee retention packages, treatment of Series A preferred shares) have not been publicly disclosed. Total disclosed pre-acquisition capital was $142M (Series A, Sep 26, 2024, at an implied ~$1.35B post-money valuation) plus an undisclosed seed round (announced Jun 25, 2024).
[CO001, CO002, CO007, CO013, CO016, CO017, CO021, CO022]

Executive summary

Top strengths

  • ESM3 was the largest protein language model publicly disclosed at its launch (98B parameters in its largest version, 2.78B training sequences, 1×10^24 FLOPs) and was peer-validated in Science (January 2025) for novel functional protein design, a credibility signal few competitors, open or closed, have matched.
  • The founding team — Rives, Sercu, Lin, Rao — built the original ESM lineage at Meta AI Research and are widely regarded as among the strongest protein-LM researchers globally; CZI's hiring of the entire team is itself evidence of talent quality.
  • Two strategic co-investors — Amazon (AWS) and NVIDIA — provided both capital and distribution (AWS SageMaker JumpStart listing, NVIDIA BioNeMo NIM microservice), giving ESM3 enterprise reach far beyond what a Series A startup could achieve alone.
  • Strong developer signal: 9,400+ combined HuggingFace downloads across the ESM3 and ESM-C model cards, an active GitHub presence (esm plus DeepEP and infrastructure forks), and a growing downstream academic citation graph (32+ papers building on ESM3, per Semantic Scholar).

Top risks

  • The November 6, 2025 CZI / CZ Biohub absorption (under 14 months after the Series A close) ends the standalone commercial entity and with it any independent exit path (IPO or strategic sale); acquisition terms, treatment of Series A preferred, and any commercial Forge API continuity are entirely undisclosed.
  • The founding team's single-employer provenance (all four ex-Meta FAIR) heightens cultural and methodological homogeneity and key-person concentration risk, and Rives's post-acquisition role at CZI (rather than at a successor commercial entity) makes a re-spin of the team unlikely in the near term.
  • Open-source commoditization threat is acute: the predecessor ESM2 is MIT-licensed and freely available, AlphaFold 3 weights are available for non-commercial use, OpenFold and Chai-1 are open, and Meta may retain rights to the underlying ESM IP (ownership is unconfirmed); together these erode willingness to pay for closed Forge API access.
  • Zero disclosed commercial revenue, no SEC Form D filings under any "EvolutionaryScale" variant in EDGAR (unusual for a $142M raise), and a paywalled Bloomberg Series A article collectively block verification of capital structure, runway, and customer economics.
  • Dual-use / biosecurity regulatory drag remains material: US Executive Order 14110 (Oct 2023) §4.4 addressed AI-enabled chemical and biological risks, including nucleic-acid synthesis screening, before its revocation in January 2025; BIS has issued an advance notice on AI bio; and EU AI Act provisions add further obligations. A non-profit successor may face different, but not necessarily lighter, compliance obligations than the standalone for-profit would have.

Open gaps

  • Acquisition terms of the November 2025 CZI / CZ Biohub deal — cash, stock, IP-transfer, retention packages, and most importantly the treatment of Series A preferred shares held by Amazon, NVIDIA, Lux Capital, Nat Friedman, and Daniel Gross — are not publicly disclosed.
  • Forge API operational status, customer count, pricing tier, and any continuity commitment post-acquisition; whether the commercial API will remain available or be deprecated under the CZ Biohub non-profit umbrella.
  • Commercial revenue and ARR at any point in the company's life were never publicly disclosed; with no public S-1 / Form D / 10-K filings, no audited revenue figure is verifiable.
  • Exact seed-round amount (announced June 25, 2024 alongside the ESM3 launch, with NVIDIA, Amazon, Lux, Friedman, and Gross) is not publicly disclosed.
  • The IP-transfer agreement between Meta and EvolutionaryScale for the ESM2 lineage — and the subsequent transfer to CZI / CZ Biohub — has not been publicly described; legal ownership of the ESM trademark and underlying model weights is unclear.
  • Bloomberg's September 26, 2024 Series A article is paywalled, blocking independent verification of investor rights, board composition, pro-rata agreements, and any secondary components of the round.


Chapter 01

01 Company Overview

1.1 Identity, Headquarters, and Business Model

EvolutionaryScale, Inc. was an early-stage AI biology company headquartered in San Francisco, California, incorporated in 2023 and operationally active from approximately March 2024. The company's stated mission was to use large generative models to decode the language of protein sequences, treating proteins as text encoding billions of years of biological evolution, and apply that understanding to design novel proteins with programmable functions. Its primary commercial offering was the Forge API platform (forge.evolutionaryscale.ai), a developer-facing service providing programmatic access to the ESM3 and ESM Cambrian (ESM-C) model families. The intended revenue model was a software-as-a-service API subscription and usage-based pricing for biotechnology, pharmaceutical, and synthetic biology customers seeking to accelerate protein engineering. The company also released the open-weights ESM3 model (esm3-sm-open-v1) for non-commercial academic use via Hugging Face, building a developer community alongside the commercial offering. As of November 2025, EvolutionaryScale ceased to operate as an independent entity when its full team was absorbed into CZ Biohub under the Chan Zuckerberg Initiative, marking the end of its path as a standalone commercial company. No revenue, ARR, or paying customer counts were ever publicly disclosed by the company during its independent existence. [CO001, CO003, CO006, CO007, CO028, CO029]

EvolutionaryScale: Snapshot KPI Table (as of May 2026)
Metric | Value / Status | Date / Period | Confidence | Source / Gap
Company Stage | Absorbed into CZ Biohub (non-profit) | November 2025 | high | biohub.org acquisition announcement
Headquarters (pre-acquisition) | San Francisco, California, USA | 2024-2025 | medium | Crunchbase; no primary street address confirmed
Founded | 2023 (incorporated); operational ~March 2024 | 2023 / Mar 2024 | high | Official website; chapter context brief
CEO / Founder | Alexander (Alex) Rives (CEO through Nov 2025) | 2024-Nov 2025 | high | evolutionaryscale.ai; NVIDIA blog; LinkedIn
Total Confirmed Capital Raised | $142M+ (seed undisclosed + $142M Series A) | Sept 2024 | high | Bloomberg (paywall); NVIDIA blog; Crunchbase
Series A Valuation (implied) | ~$1.35B post-money | Sept 26, 2024 | medium | Third-party estimates; not confirmed in primary filing
Employees (pre-acquisition) | 11-50 | LinkedIn as of 2024 | low | LinkedIn company page; no official headcount disclosed
Lead Product | ESM3 (98B param protein language model) | Launched June 25, 2024 | high | evolutionaryscale.ai/blog/esm3-release
ESM Cambrian (ESM-C) | 300M / 600M / 6B parameter models | Released Dec 4, 2024 | high | evolutionaryscale.ai/blog/esm-cambrian
Science Paper (ESM3) | Published in Science, Jan 16, 2025 | Jan 16, 2025 | high | DOI: 10.1126/science.ads0018; bioRxiv preprint prior
Revenue / ARR | Not publicly disclosed; Forge API SaaS model | Current | low | No filings; forge.evolutionaryscale.ai is JS-only
NVIDIA Partnership | BioNeMo NIM integration; seed + Series A investor | 2024-2025 | high | blogs.nvidia.com; nvidia.com/bionemo
SEC Form D Filings | None found in EDGAR (2024-2026) | May 2026 | high | efts.sec.gov Form D search; SEC EDGAR
Wikipedia Page | Does not exist (404) | May 2026 | high | en.wikipedia.org/wiki/EvolutionaryScale returns 404

Valuation and headcount figures are based on third-party reports only; no SEC filings or primary cap-table disclosures are available. The seed-round amount is not publicly disclosed. Post-acquisition corporate structure and investor returns are unknown.

[CO001, CO003, CO004, CO007, CO008, CO015]
FO003: EvolutionaryScale: Snapshot KPI Dashboard

Key performance indicators summarizing EvolutionaryScale capital, technology scale, model adoption, and current status as of May 2026.

HuggingFace download counts are snapshots taken during the May 2026 research session and may change. Valuation is an implied estimate from third-party sources. Parameter counts and training data sizes are from official company publications.

[CO013, CO016, CO017, CO019, CO021, CO025]

1.2 Founders, Leadership, and Key-Person Risk

EvolutionaryScale was co-founded by four researchers who had worked together at Meta AI Research (FAIR): Alexander (Alex) Rives, Tom Sercu, Zeming Lin, and Sanjay Rao. Alex Rives served as CEO and was the principal architect of the ESM protein language model lineage, which had its roots in his academic work and continued at Meta FAIR before the spin-out. Tom Sercu served as co-founder and VP of Engineering, leading the infrastructure and engineering team that built the Andromeda H100 cluster. Zeming Lin and Sanjay Rao served as technical co-founders contributing to model development and research. Following the November 2025 acquisition, Alex Rives became Head of Science at the Chan Zuckerberg Initiative, and the rest of the founding team joined CZ Biohub in senior research roles. That the entire founding team came from a single employer (Meta FAIR) is a significant concentration risk: the team's cultural, methodological, and technical assumptions are deeply homogeneous, and there was no evidence of independent board members, advisors from outside the AI research community, or experienced commercial biotechnology executives among the founders. The small, tightly knit team also heightened key-person dependency: the departure of any one founder, particularly Rives as CEO and technical visionary, would have had an outsized impact on the company's research direction and investor confidence. [CO002, CO004, CO005, CO022, CO030]

Leadership and founder table
Person | Role (at EvolutionaryScale) | Pre-Company Background | Founder-Market Fit | Key-Person Dependency
Alexander (Alex) Rives | CEO, Co-Founder | Meta AI (FAIR) researcher; originated ESM protein LM research lineage | Deep expertise in protein language modeling; inventor of the ESM model family | Critical: public face, research visionary, and CEO; departure would materially impair the company
Tom Sercu | Co-Founder, VP Engineering | Meta AI (FAIR) engineer; co-authored the ESM3 bioRxiv preprint | Infrastructure and engineering leadership for large-scale model training and inference | High: led engineering; directly responsible for Andromeda training cluster operations
Zeming Lin | Co-Founder, Research | Meta AI (FAIR) researcher; co-authored the ESM3 bioRxiv preprint | Core research contributor to ESM model development | Medium: research contributor; less public-facing than Rives or Sercu
Sanjay Rao | Co-Founder, Research | Meta AI (FAIR); contributed to ESM research program | Core technical co-founder with AI research background | Medium: research co-founder; limited independent public record

All four co-founders previously worked at Meta AI Research (FAIR), creating single-employer concentration risk. Salvatore Candido appears as a BioRxiv preprint author and may have been a fifth co-founder or early team member, but their role at EvolutionaryScale was not independently confirmed in available sources. Post-acquisition: Rives became Head of Science at CZI; other founders joined CZ Biohub in research roles.

[CO002, CO004, CO005, CO022, CO030, CO031]

1.3 Funding History, Valuation, and Capital Structure

EvolutionaryScale raised capital in two known rounds prior to its absorption into CZI. The first was a seed round announced simultaneously with the ESM3 public launch on June 25, 2024. NVIDIA and Amazon both participated in the seed round alongside Lux Capital, Nat Friedman, and Daniel Gross. The exact seed amount was not publicly disclosed, though NVIDIA announced its participation separately. The Series A followed rapidly: on September 26, 2024, EvolutionaryScale announced a $142M Series A co-led by Amazon and NVIDIA, with continued participation from Lux Capital, Nat Friedman, and Daniel Gross, at an implied post-money valuation of approximately $1.35B. This closed only three months after the ESM3 launch and represented one of the largest AI biology funding rounds of 2024. Notably, no SEC Form D filings were found in EDGAR under any variant of "EvolutionaryScale" for the 2024 to 2026 period. Bloomberg reported on the Series A but the article is behind a paywall, preventing public verification of full deal terms, investor rights, and any secondary components. Total confirmed capital raised is $142M plus an undisclosed seed amount. No commercial revenue, debt facilities, or secondaries are on public record. The company's capital structure at the time of the CZI acquisition remains private. [CO015, CO016, CO017, CO018, CO019, CO036]
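To make the stakes of the undisclosed preferred terms concrete, the sketch below derives the Series A's implied stake from the reported $142M raise and ~$1.35B post-money, then computes Series A proceeds at a few hypothetical exit values. It assumes, purely for illustration, a standard 1x non-participating liquidation preference; the actual terms, the seed round's preference, any option pool, and seniority between share classes are not public, and the function names are hypothetical.

```python
# Hypothetical sketch: none of these preference terms are confirmed for EvolutionaryScale.
SERIES_A_INVESTMENT_MUSD = 142.0  # reported Series A size, USD millions
POST_MONEY_MUSD = 1350.0          # reported implied post-money valuation, USD millions


def implied_ownership(investment: float, post_money: float) -> float:
    """Fully diluted stake bought by a round at a given post-money valuation."""
    return investment / post_money


def non_participating_payout(exit_value: float, preference: float, ownership: float) -> float:
    """Series A proceeds under an ASSUMED 1x non-participating preference.

    Investors take the larger of (a) the preference, capped at the proceeds
    available, or (b) converting to common and taking their pro-rata share.
    Other share classes, debt, and transaction fees are ignored.
    """
    return max(min(preference, exit_value), ownership * exit_value)


if __name__ == "__main__":
    stake = implied_ownership(SERIES_A_INVESTMENT_MUSD, POST_MONEY_MUSD)
    print(f"Implied Series A stake: {stake:.1%}")
    print(f"Implied pre-money: ${POST_MONEY_MUSD - SERIES_A_INVESTMENT_MUSD:,.0f}M")
    # Exit value above which converting to common beats taking the preference
    print(f"Conversion threshold: ${SERIES_A_INVESTMENT_MUSD / stake:,.0f}M")
    for exit_value in (50.0, 142.0, 500.0, 2000.0):
        payout = non_participating_payout(exit_value, SERIES_A_INVESTMENT_MUSD, stake)
        print(f"Exit ${exit_value:,.0f}M -> Series A receives ${payout:,.1f}M")
```

Under these assumptions the Series A is made whole on its $142M whenever proceeds reach that amount, and converting to common only pays more once the exit value exceeds the ~$1.35B post-money, so a deal priced below that level would leave outcomes hinging on the undisclosed preference terms.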

Stakeholder or investor map
Stakeholder | Role / Relationship | Round / Stage | Economic / Control Importance | Diligence Ask
Amazon (AWS) | Series A co-lead; seed investor; cloud infrastructure partner | Seed + Series A co-lead (2024) | High economic stake; likely preferred equity; strategic compute supply arrangement probable | Confirm investment amount, preferred terms, board seat, and AWS compute credit arrangement
NVIDIA | Series A co-lead; seed investor; BioNeMo integration partner | Seed + Series A co-lead (2024) | High economic stake; strategic: ESM3 integrated into BioNeMo/NIM on H100 infrastructure | Confirm investment amount, NIM licensing revenue share, and GPU infrastructure commitment terms
Lux Capital | Early-stage VC; participated in seed and Series A | Seed + Series A (2024) | Early investor with likely significant ownership from seed stage | Confirm ownership percentage, liquidation preferences, and post-acquisition treatment
Nat Friedman | Angel investor; participated in seed and Series A | Seed + Series A (2024) | Individual angel; likely minor economic stake vs. institutional investors | Confirm participation amount and any advisory role
Daniel Gross | Angel investor; participated in seed and Series A | Seed + Series A (2024) | Individual angel; AI compute investor background | Confirm participation amount and relationship to AI infrastructure ecosystem
Chan Zuckerberg Initiative (CZI) / CZ Biohub | Acquirer / successor organization (Nov 2025) | Acquisition / integration (Nov 2025) | Critical: absorbed the entire EvolutionaryScale team and presumably IP; controls the future of ESM models | Disclose acquisition terms: equity buyout, IP transfer, commercial agreement continuity, and investor return details

Amazon and NVIDIA are confirmed as Series A co-leads from NVIDIA blog post and Crunchbase. Full dollar amounts per investor are not public. CZI acquisition terms have not been disclosed publicly. There may be additional Series A investors not identified in available sources.

[CO015, CO016, CO017, CO018, CO021, CO023]

1.4 Products, Technology, and Operations

EvolutionaryScale's product portfolio centered on the ESM (Evolutionary Scale Modeling) family of protein language models. ESM3, released June 25, 2024, is a generative multimodal protein model available in sizes up to 98 billion parameters, the largest protein language model released publicly at the time of its launch. ESM3 was trained on 2.78 billion protein sequences totaling 771 billion tokens, using approximately 1×10^24 floating-point operations on a cluster of NVIDIA H100 GPUs internally named Andromeda. The company's headline framing, "simulating 500 million years of evolution," refers to esmGFP, a fluorescent protein generated by ESM3 that the authors estimate is as distant from known fluorescent proteins as roughly 500 million years of natural evolution; more generally, ESM3 can be prompted to design novel proteins with user-specified structural and functional properties. A peer-reviewed paper validating ESM3's ability to design genuinely novel fluorescent proteins was published in Science on January 16, 2025 (DOI: 10.1126/science.ads0018), providing significant independent credibility. ESM Cambrian (ESM-C), released December 4, 2024, is a next-generation protein language model available in 300M, 600M, and 6B parameter sizes optimized for efficient inference. NVIDIA integrated ESM3 into its BioNeMo platform and made it available as an NVIDIA NIM microservice for enterprise deployment on H100 infrastructure. The company maintained a GitHub organization with 9+ repositories, including the ESM model codebase, a DeepEP repository (DeepEP is DeepSeek's expert-parallel communication library for mixture-of-experts models), and forks of open-source training infrastructure. The Forge API platform served as the commercial developer interface. EvolutionaryScale had 11 to 50 employees according to LinkedIn before the acquisition. [CO007, CO008, CO009, CO010, CO011, CO012]
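The training figures above can be cross-checked with simple arithmetic. The sketch below computes the average tokens per training sequence and, as a rough plausibility check, the number of tokens the reported compute would cover under the common C ≈ 6·N·D heuristic for dense transformer training. That heuristic is an assumption brought in here, not a figure EvolutionaryScale reported, and the 1×10^24 FLOPs value is itself approximate.

```python
# Back-of-envelope checks on the ESM3 training figures quoted in this report.
PARAMS = 98e9        # N: parameters in the largest ESM3 model
TRAIN_FLOPS = 1e24   # C: reported as "approximately" 1e24
SEQUENCES = 2.78e9   # training sequences
TOKENS = 771e9       # training tokens


def tokens_per_sequence(tokens: float, sequences: float) -> float:
    """Average sequence length in tokens."""
    return tokens / sequences


def implied_training_tokens(flops: float, params: float) -> float:
    """Tokens processed under the C ~= 6 * N * D rule of thumb.

    A generic scaling heuristic (an assumption here), not a figure
    reported by EvolutionaryScale.
    """
    return flops / (6.0 * params)


if __name__ == "__main__":
    print(f"Average tokens per sequence: {tokens_per_sequence(TOKENS, SEQUENCES):.0f}")
    d = implied_training_tokens(TRAIN_FLOPS, PARAMS)
    print(f"6ND-implied tokens processed: {d:.2e}")
    print(f"Implied passes over the {TOKENS / 1e9:.0f}B-token corpus: {d / TOKENS:.1f}")
```

On these inputs, sequences average roughly 277 tokens, and the compute budget corresponds to about 1.7 trillion tokens processed, a little over two passes through the 771-billion-token corpus. A large divergence from the training schedule described in the Science paper would be a prompt to recheck the FLOP or token figures, not evidence of an error by itself.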

FO002: EvolutionaryScale: Business System and Value Chain (Flow)

How EvolutionaryScale protein language models connect from training data through model development, API platform, and strategic partnerships to the CZI acquisition outcome.

[CO009, CO010, CO025, CO026, CO028, CO032]

1.5 Key Milestones and Adverse Events

EvolutionaryScale's history runs from its 2023 incorporation through a rapid product-and-capital phase to its absorption into CZI in November 2025, an independent lifespan of under three years. The company's most significant technical milestone is the ESM3 Science paper, which provided independent peer-reviewed validation that a generative protein language model could design novel functional proteins with programmable properties. The $142M Series A at an implied ~$1.35B valuation in September 2024, just three months after product launch, reflected extraordinary investor enthusiasm for the technology. However, the November 2025 acquisition by CZI / CZ Biohub is the dominant adverse governance event: with less than 14 months between the Series A close and absorption into a non-profit entity, commercial investors face a highly uncertain exit path, since the terms of the acquisition and any compensation to equity holders have not been publicly disclosed. The transition from a for-profit commercial entity to a non-profit research initiative raises material questions about the fate of the Series A investors, any employee equity, and the continuity of the Forge API commercial offering. Additional adverse signals include the absence of any SEC Form D filings, which is unusual for a company that raised $142M; the redirect of the ESM GitHub repository to the Biohub organization, which signals an IP transfer; and the absence of a Wikipedia page, indicating limited mainstream media documentation. [CO001, CO008, CO013, CO016, CO019, CO021]
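The interval claims in this section (a Series A "just three months after product launch" and an absorption "less than 14 months" after the Series A close) can be checked directly from the milestone dates. A minimal standard-library sketch; the helper name is hypothetical:

```python
import calendar
from datetime import date


def months_and_days(start: date, end: date) -> tuple[int, int]:
    """Split the interval from start to end into whole calendar months plus leftover days."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    year = start.year + (start.month - 1 + months) // 12
    month = (start.month - 1 + months) % 12 + 1
    # Clamp the anchor day for short months (e.g. a start on the 31st)
    day = min(start.day, calendar.monthrange(year, month)[1])
    anchor = date(year, month, day)
    return months, (end - anchor).days


MILESTONES = {
    "ESM3 launch": date(2024, 6, 25),
    "Series A close": date(2024, 9, 26),
    "CZI absorption": date(2025, 11, 6),
}

if __name__ == "__main__":
    pairs = [("ESM3 launch", "Series A close"), ("Series A close", "CZI absorption")]
    for a, b in pairs:
        m, d = months_and_days(MILESTONES[a], MILESTONES[b])
        total = (MILESTONES[b] - MILESTONES[a]).days
        print(f"{a} -> {b}: {m} months {d} days ({total} days)")
```

For the dates in the milestone table, the Series A closed 3 months and 1 day after the ESM3 launch, and the CZI absorption followed the Series A close by 13 months and 11 days (406 days).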

Milestone table
Date | Event | Type | Amount / Valuation / Status | Participants | Implication
2023 | Company incorporated; founding team departs Meta AI (FAIR) | founding | n/a | Alex Rives, Tom Sercu, Zeming Lin, Sanjay Rao | Establishes legal entity; founding team coalesces around ESM protein LM research lineage
Mar 2024 | Company operational milestone; pre-launch development phase | founding | n/a | Founding team | Research and engineering ramp; model training on Andromeda H100 cluster begins
Jun 25, 2024 | ESM3 public launch and seed round announcement | product | Seed amount undisclosed | Lux Capital, Nat Friedman, Daniel Gross, NVIDIA, Amazon | ESM3 (up to 98B params) publicly launched with open weights (esm3-sm-open-v1) and Forge API access; seed funding confirmed
Jul 2024 | bioRxiv preprint published (doi: 10.1101/2024.07.01.600583) | product | n/a | Rives, Sercu, Candido, Lin et al. | ESM3 scientific claims available for community review prior to peer review
Sep 26, 2024 | $142M Series A announced | financing | $142M at ~$1.35B implied valuation | Amazon (co-lead), NVIDIA (co-lead), Lux Capital, Nat Friedman, Daniel Gross | One of the largest AI biology raises of 2024; institutional validation of protein LM platform
Dec 4, 2024 | ESM Cambrian (ESM-C) released | product | n/a | EvolutionaryScale team | Expanded model family (300M/600M/6B params); broadens inference efficiency options for customers
Jan 16, 2025 | ESM3 paper published in Science | product | n/a | Hayes et al. (senior author Rives), Science, DOI: 10.1126/science.ads0018 | Peer-reviewed validation of novel GFP design; strengthens scientific credibility and institutional adoption
Nov 6, 2025 | Team joins CZ Biohub; EvolutionaryScale absorbed by CZI | adverse | Terms undisclosed | CZI / CZ Biohub, full EvolutionaryScale team | Company ceases as independent entity; investors face uncertain exit; ESM IP presumed to transfer to the non-profit

The CZI acquisition event (Nov 2025) is adverse from a commercial investor perspective. Exact seed round amount remains undisclosed. Series A valuation is an implied figure from third-party sources, not confirmed in a primary SEC filing. March 2024 operational launch date is from chapter context brief; no primary announcement found.

[CO001, CO008, CO011, CO013, CO015, CO016]
FO001: EvolutionaryScale: Corporate Milestone Timeline

Key milestones from founding in 2023 through ESM3 launch, Series A, ESM Cambrian, Science publication, and CZI acquisition in November 2025.

March 2024 operational launch date is from chapter context brief; no primary announcement found. Series A valuation of ~$1.35B is third-party estimate. Post-acquisition ESM IP continuity claim is inferred from biohub.org announcement, not confirmed by formal IP transfer agreement.

[CO001, CO002, CO007, CO008, CO010, CO011]
Chapter 02

02 Market Analysis

2.1 Market Definition and Boundary

EvolutionaryScale's addressable market has two interconnected layers. The primary layer is the protein language model (PLM) API and platform market: cloud-hosted or on-premise AI models that enable protein engineers to generate, predict, and optimize protein sequences and structures without exhaustive wet-lab directed evolution. This market is distinct from general cloud compute, traditional bioinformatics pipelines, genomic sequencing platforms, medical imaging AI, and clinical trial management software. The secondary layer is the broader protein engineering research tools market, which encompasses the reagents, instruments, and software used in recombinant protein research and design; PLM APIs are a high-growth AI sub-category within it.

Status-quo substitutes for protein LM platforms include: (1) AlphaFold2/3 (Google DeepMind) for protein structure prediction, whose public database of over 200 million predicted structures is freely available and whose AlphaFold 3 weights are available for non-commercial use; (2) Rosetta and PyRosetta (University of Washington) for computational protein design, open-source but computationally intensive and requiring significant expertise; (3) wet-lab directed evolution (iterative random mutagenesis and screening), which is expensive, slow (weeks per cycle), and throughput-limited; and (4) traditional molecular dynamics and homology modeling software (GROMACS, MODELLER, Schrödinger Maestro), established tools for structure-guided design without generative capability. ESM3 and ESM-C differentiate from these substitutes by jointly reasoning over protein sequence, structure, and function within a single unified generative model.

Adjacent markets include the AI drug discovery platform market (computational tools accelerating small-molecule and biologic drug design), which Grand View Research estimates at $2.35B in 2025 growing to $13.77B by 2033 at a 24.8% CAGR. Industrial biotechnology (enzyme engineering for green chemistry, agriculture, and biomaterials) is a further adjacency with distinct procurement patterns and a lower regulatory burden than pharmaceutical applications. The outer boundary is the global drug discovery market ($71.89B in 2025 per Precedence Research), which includes all modalities and services, well beyond EvolutionaryScale's direct footprint. [CM001, CM002, CM003, CM004, CM005, CM006]

Market definition table
Market Segment / Category | Included Spend | Excluded Spend | Primary Buyer / Payer | EvolutionaryScale Relevance
Protein Language Model API (core) | Cloud API access for protein sequence generation, embedding, structure prediction, and multi-modal reasoning via ESM3/ESM-C | Traditional wet-lab directed evolution, genomic sequencing instruments, general cloud compute without protein-specific models | Pharma/biotech VP Computational Biology, CSO, ML/bioinformatics scientists | Direct revenue source: Forge API, AWS SageMaker JumpStart distribution, NVIDIA BioNeMo NIM microservices
Protein Engineering Research Tools (primary adjacent) | All software, AI tools, and services enabling recombinant protein research, design, and optimization | Lab instruments (PCR machines, crystallography), DNA synthesis reagents, directed evolution consumables | Research and development leaders in biopharma, biotech, and industrial biotech | ESM3/ESM-C are AI-native protein design tools that address this broader category; analyst estimates $2.6–5.1B (2023–2025)
AI Drug Discovery Platforms (secondary adjacent) | AI-enabled target identification, generative molecule design, virtual screening, ADMET prediction | CRO wet-lab services, clinical trial management, sequencing platforms, medical imaging AI | Pharma R&D Leadership / Chief Scientific Officers, Business Development | ESM3 enables protein target characterization and antibody design, positioning EvolutionaryScale as an enabling platform for AI drug discovery workflows
Industrial Biotechnology (adjacent non-pharma) | Enzyme engineering for green chemistry, biofuels, agriculture, specialty materials, food science | Diagnostic tools, medical devices, hospital IT, insurance platforms | Industrial biotech R&D teams, CTO/VP Engineering in biotech manufacturing | Growing secondary buyer segment: ESM3/ESM-C applicable to enzyme optimization workflows; lower regulatory burden than pharma
Open-Source / Self-Hosted Substitutes (status quo) | Self-hosted ESM2 (MIT), Rosetta (open-source), AlphaFold (free non-commercial), PyRosetta | None: this category represents the free alternative EvolutionaryScale competes with for paid conversion | Academic labs, well-resourced startup computational teams, internal pharma bioinformatics groups | Openly released ESM-C weights and MIT-licensed ESM2 are direct substitutes for the Forge API; self-hosting is the primary cost competitor

Market boundaries derived from EvolutionaryScale product documentation, the ESM3 Science paper, analyst market reports (MarketsandMarkets, Precedence Research, Grand View Research), and FDA regulatory guidance. The Open-Source substitutes row is included to document the direct competitive dynamic between EvolutionaryScale's free offerings and its paid Forge API. The adjacent AI drug discovery platform estimate is from Grand View Research ($2.35B, 2025).

[CM001, CM002, CM003, CM004, CM005, CM006]

2.2 Market Sizing and Analyst Estimates

Published estimates for the protein engineering market vary considerably with scope definition. MarketsandMarkets estimates the market at $2.2B (2019) growing to $3.9B by 2024 at a 12.4% CAGR, a narrower estimate focused on protein engineering tools and services. Precedence Research takes a broader scope, estimating $5.09B in 2025 growing to $23.59B by 2035 at a 16.57% CAGR and incorporating a wider set of applications, including industrial enzymes and biopharmaceuticals. Allied Market Research estimates $2.2B in 2022 growing to $7.7B by 2032 at a 13.2% CAGR. Grand View Research estimates $2.60B in 2023 growing to $7.62B by 2030 at a 16.24% CAGR. The directional consensus is clear (low-to-mid-teens annual growth for over a decade), but the $2.2B vs. $5.09B base discrepancy reflects different base years (2019/2022 vs. 2025) and inconsistent scope rather than contradictory evidence. The adjacent AI drug discovery market (Grand View Research) is estimated at $2.35B in 2025 growing to $13.77B by 2033 at a 24.8% CAGR, materially faster than protein engineering tools alone and reflecting the pull from large-pharma AI investment post-AlphaFold. The broader drug discovery market (all modalities) is estimated at $71.89B in 2025 growing to $158.74B by 2034 at a 9.2% CAGR (Precedence Research), serving as the outer TAM reference. EvolutionaryScale's serviceable addressable market is the sub-segment of protein engineering tool customers who (a) have computational biology infrastructure, (b) are adopting AI-native tools rather than traditional bioinformatics, and (c) have usage at a scale that justifies a paid API subscription over self-hosting open weights. Pre-acquisition, its serviceable obtainable market comprised Forge API subscription revenue from biopharma and biotech customers, commercial model access via AWS SageMaker JumpStart and NVIDIA BioNeMo, and enterprise licensing of ESM3 and ESM-C for deployment in regulated settings.

The company raised $142M in Series A funding (Crunchbase), indicating investor validation of the market opportunity, but no independent SOM figure for PLM APIs within pharmaceutical R&D has been published; this is a material diligence gap documented in the sizing-gaps section. HuggingFace download metrics (6,320+ for ESM-C, 3,110+ for ESM3) and 129+ bioRxiv citations are proxies for user adoption, not commercial revenue traction. [CM007, CM008, CM009, CM010, CM011, CM012]

TAM/SAM/SOM or sizing lens table
Publisher | Years | Geography | Market Value (USD) | CAGR | Methodology / Scope | Confidence | Key Limitation
MarketsandMarkets (paywalled) | 2019–2024 | Global | $2.2B (2019) → $3.9B (2024) | 12.4% CAGR | Protein engineering tools and services; includes software, services, and some equipment; bottom-up | medium | Narrower scope; excludes industrial enzymes broadly; methodology not fully disclosed at free tier
Precedence Research | 2025–2035 | Global | $5.09B (2025) → $23.59B (2035) | 16.57% CAGR | Broad protein engineering market; incorporates industrial applications, biopharmaceuticals, and research tools | medium | Broader scope inflates base vs. MAM; 2035 forecast carries compounding uncertainty; scope inconsistency vs. narrower estimates
Allied Market Research | 2022–2032 | Global | $2.2B (2022) → $7.7B (2032) | 13.2% CAGR | Research-focused protein engineering tools; consistent with MAM midpoint trajectory | medium | Scope partially overlapping with MAM and GVR; paywalled primary; confidence limited by secondary reporting
Grand View Research – Protein Engineering | 2023–2030 | Global | $2.60B (2023) → $7.62B (2030) | 16.24% CAGR | Protein engineering market including biopharmaceutical protein engineering and industrial applications | medium | Via Wayback Machine snapshot; current live page requires subscription; scope consistent with Allied but higher CAGR
Grand View Research – AI Drug Discovery | 2025–2033 | Global (adjacent) | $2.35B (2025) → $13.77B (2033) | 24.8% CAGR | AI-enabled drug discovery software platforms; faster growth than protein engineering tools alone due to pharma AI investment acceleration | medium | Adjacent market; EvolutionaryScale is an enabling tool rather than a full drug discovery platform; partial TAM overlap
Precedence Research – Drug Discovery (outer boundary) | 2025–2034 | Global | $71.89B (2025) → $158.74B (2034) | 9.2% CAGR | All drug discovery services and tools; broadest boundary including CRO wet-lab services, instruments, software, and AI tools | medium | Significantly over-inclusive; outer TAM boundary only; most of this market is not addressable by PLM APIs
EvolutionaryScale / Crunchbase (SOM proxy) | 2024–2026 | Global | $142M raised (Series A) as investor-validated proxy for commercial opportunity | N/A | Series A round as funding proxy for commercial potential; HuggingFace downloads (6,320+ ESM-C) as developer traction indicator | low | No SOM figure published; funding and download metrics are proxies, not revenue evidence; Forge API pricing undisclosed

All primary analyst report TAMs are paywalled; values obtained from accessible landing pages, Wayback Machine snapshots, and secondary summaries. The 10× spread between narrowest ($2.2B, MAM 2019) and broadest ($23.59B, Precedence 2035) estimates is primarily attributable to scope differences and forward projection period, not genuine market disagreement. The EvolutionaryScale SOM proxy is analytical, not published; no independent PLM API sub-market sizing was identified. Growth rates are directionally consistent at 12–17% for protein engineering, accelerating to 24.8% for AI drug discovery specifically.
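As a quick consistency check on the table, the minimal Python sketch below recomputes each CAGR from its published endpoints. The endpoint values and stated rates are copied from the table; the 0.5-percentage-point tolerance is an arbitrary choice of ours. On these inputs every implied rate lands within about 0.4 points of the stated one; the largest gaps (Grand View Research protein engineering, MarketsandMarkets) are consistent with rounded endpoints or a forecast window that starts a year after the quoted base year, though the publishers do not confirm either.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that links two endpoint values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# (source and period, start USD B, end USD B, years between endpoints, stated CAGR)
ESTIMATES = [
    ("MarketsandMarkets 2019-2024", 2.2, 3.9, 5, 0.124),
    ("Precedence 2025-2035", 5.09, 23.59, 10, 0.1657),
    ("Allied 2022-2032", 2.2, 7.7, 10, 0.132),
    ("Grand View Research 2023-2030", 2.60, 7.62, 7, 0.1624),
    ("GVR AI drug discovery 2025-2033", 2.35, 13.77, 8, 0.248),
    ("Precedence drug discovery 2025-2034", 71.89, 158.74, 9, 0.092),
]


def check(tolerance: float = 0.005) -> dict[str, float]:
    """Print implied vs. stated CAGR per source; flag gaps wider than `tolerance`."""
    results = {}
    for label, start, end, years, stated in ESTIMATES:
        cagr = implied_cagr(start, end, years)
        results[label] = cagr
        note = "" if abs(cagr - stated) <= tolerance else "  <- check period"
        print(f"{label:36s} implied {cagr:7.2%}  stated {stated:7.2%}{note}")
    return results


if __name__ == "__main__":
    check()
```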

[CM007, CM008, CM009, CM011, CM012, CM013]
FM001: Market sizing lens

Four-level sizing pyramid for EvolutionaryScale's market: TAM outer boundary (all drug discovery services globally), TAM addressable (protein engineering tools market), SAM (AI-native protein design tools and APIs within protein engineering), and SOM (EvolutionaryScale's current Forge API and distribution channel revenue zone), as of 2026.

TAM outer from Precedence Research ($71.89B, 2025); TAM addressable midpoint of four analyst estimates (MAM, Allied, GVR, Precedence). SAM is an analytical estimate not independently published; derived from the fraction of protein engineering tools that are AI-native software-only (excluding instruments and reagents). SOM reflects commercial API and cloud distribution only; Forge API pricing and subscriber count are undisclosed. All figures are directional.
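The four analyst estimates behind the "TAM addressable midpoint" sit in different base years (2022 to 2025), so a raw midpoint mixes time with scope. One way to make the midpoint reproducible, sketched below under our own assumption that each publisher's stated CAGR can roll its figure forward to 2025, is to normalize first and then take the median. MarketsandMarkets' 2024 figure is its forecast endpoint, and applying Grand View Research's CAGR from 2023 is also an assumption. This is an illustrative convention, not necessarily how FM001 was constructed.

```python
from statistics import median

# (source, base year, base-year value in USD B, stated CAGR), from the sizing table.
BASE_ESTIMATES = [
    ("MarketsandMarkets", 2024, 3.90, 0.124),
    ("Allied", 2022, 2.20, 0.132),
    ("Grand View Research", 2023, 2.60, 0.1624),
    ("Precedence", 2025, 5.09, 0.1657),
]


def roll_forward(value: float, cagr: float, from_year: int, to_year: int) -> float:
    """Compound a base-year value forward at the publisher's stated CAGR (assumption)."""
    return value * (1.0 + cagr) ** (to_year - from_year)


def normalized_midpoint(target_year: int = 2025) -> tuple[dict[str, float], float]:
    """Normalize every estimate to `target_year`, then take the median as the midpoint."""
    values = {
        name: roll_forward(value, cagr, year, target_year)
        for name, year, value, cagr in BASE_ESTIMATES
    }
    return values, median(values.values())


values, midpoint = normalized_midpoint()
for name, v in sorted(values.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {v:5.2f}")
print(f"median (2025, USD B): {midpoint:.2f}")
```

Under these assumptions the normalized 2025 values run from roughly $3.19B (Allied) to $5.09B (Precedence), with a median near $3.95B; the spread that remains after normalization is the scope difference, not elapsed time.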

[CM006, CM007, CM008, CM009, CM011, CM012]
FM002: Market estimate range

Low/base/high estimates across four market sizing lenses in USD billion: protein engineering tools market (2024–2025 base), AI drug discovery adjacent market (2025), protein engineering 2030 forecast, and broader drug discovery market (2025). All values in USD billion for consistent comparison.

Protein Engineering 2024–2025: low and mid both anchor on the MAM 2024 estimate ($3.9B; the $2.2B values are the MAM 2019 and Allied 2022 base-year entries and are not used here), high=Precedence $5.09B (2025). AI Drug Discovery 2025: low=conservative industry estimate, mid=GVR base ($2.35B), high=market expansion scenario. Protein Engineering 2030 forecast: low=linear extrapolation at 12% CAGR from $3.9B, mid=GVR $7.62B, high=Precedence trajectory projection. Drug Discovery outer boundary: anchored on Precedence Research $71.89B midpoint. All values in USD billion; incompatible units excluded.
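Whether the 2030 low case actually sits below the mid case depends on how "linear extrapolation at 12%" is read. The sketch below compares simple (straight-line) growth with compound growth from MAM's $3.9B (2024) over six years. Compounding at 12% gives about $7.70B, slightly above the GVR mid of $7.62B, whereas simple growth gives about $6.71B. Which reading the exhibit used is not disclosed; both functions here are illustrative.

```python
def simple_growth(base: float, rate: float, years: int) -> float:
    """Straight-line growth: the same absolute increment (base * rate) every year."""
    return base * (1.0 + rate * years)


def compound_growth(base: float, rate: float, years: int) -> float:
    """CAGR-style growth: each year's increment applies to the prior year's value."""
    return base * (1.0 + rate) ** years


BASE_2024 = 3.9       # MAM 2024 estimate, USD B
RATE = 0.12           # low-case growth rate from the exhibit note
YEARS = 2030 - 2024   # forecast horizon in years
GVR_MID_2030 = 7.62   # GVR 2030 mid case, USD B

low_simple = simple_growth(BASE_2024, RATE, YEARS)
low_compound = compound_growth(BASE_2024, RATE, YEARS)
print(f"simple:   {low_simple:.2f}  below mid: {low_simple < GVR_MID_2030}")
print(f"compound: {low_compound:.2f}  below mid: {low_compound < GVR_MID_2030}")
```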

[CM007, CM008, CM009, CM011]

2.3 Buyer and User Segmentation

The primary commercial buyer for EvolutionaryScale's Forge API is a large or mid-tier pharmaceutical or biotech company with an active computational biology or protein engineering program. The economic buyer is typically a VP of Computational Biology, Director of Drug Discovery, or Chief Scientific Officer, with procurement authority delegated through R&D and IT budgets. The technical champion, who evaluates model capabilities and advocates for adoption, is the computational biologist, structural biologist, or machine learning scientist embedded in discovery teams. Finance and procurement officers act as formal payers and require ROI justification, such as fewer wet-lab screening cycles or faster lead identification. The Forge commercial API targets enterprise customers that need large-scale protein sequence generation and embedding with guaranteed availability and compliance. AWS SageMaker JumpStart and NVIDIA BioNeMo serve as secondary commercial channels, reaching pharma customers already embedded in those cloud ecosystems. Academic and government research labs are a high-volume but non-revenue segment: they download ESM-C open weights under MIT license from HuggingFace (6,320+ downloads) and install the PyPI `esm` package, building community mindshare that feeds the commercial pipeline. Industrial biotechnology companies, which engineer enzymes for green chemistry, agriculture, and specialty materials, form a materially different buyer segment with shorter product development cycles and lower regulatory burden. Contract research organizations play a dual role: potential buyers of protein LM APIs to enhance their service offerings, and potential channel partners that resell compute-enabled protein engineering services to pharma clients.
Biotech startups at the Series A–B stage represent an emerging paid segment: they have computational infrastructure but lack the internal resources to train frontier protein LMs, making Forge API access economically rational.[CM015, CM016, CM017, CM018, CM019, CM020]

Segment / buyer map
Segment | Economic Buyer | Technical Champion / User | Payer | Workflow Need | Budget Owner | Primary Adoption Trigger
Top-20 Global Pharma (e.g., Pfizer, Roche, Novartis) | VP Computational Biology / Chief Scientific Officer | Computational Biologist / ML Scientist / Structural Biologist | R&D Budget Committee / CFO | Protein sequence optimization for biologics and ADCs; antibody engineering; target characterization; virtual library generation | CSO / VP R&D with CFO approval for enterprise contracts | Pipeline productivity mandate: reduce directed evolution cycles; integrate AI protein design into existing informatics stack
Mid-Tier Biopharma / Biotech ($50M–$2B revenue) | VP R&D / Director Computational Biology | Computational Scientist / Research Engineer | R&D / Series B–C investor capital | Lead optimization and protein stability engineering with limited internal compute resources; access frontier LMs without training infrastructure | CTO or VP R&D with board sign-off above threshold | Fundraising milestone: need in silico validation data for next round; cost advantage vs. internal LM training
Academic / Government Research Labs | Principal Investigator / Department Head | Postdoctoral Researcher / Graduate Student | NIH grants, government funding, institutional IT | Protein function prediction, evolutionary analysis, variant effect prediction for basic research; no commercial intent | PI with grant budget authority; no formal procurement cycle | Open-weight availability (ESM-C MIT license on HuggingFace); integration with existing PyTorch workflows at zero direct cost
Industrial Biotech (Enzyme Engineering, Synthetic Biology) | VP Protein Engineering / Chief Technology Officer | Enzyme Engineer / Protein Scientist / Computational Chemist | R&D budget / product development capital | Enzyme engineering for specific activity, stability, pH/temperature tolerance; de novo protein design for biomaterials and green chemistry | CTO / VP Product Development | Shorter product cycles vs. pharma; demonstrated ROI via reduced wet-lab screening iterations; lower regulatory risk
Contract Research Organizations (CROs) | VP Scientific Services / Business Development | Computational Biologist / Protein Modeling Specialist | CRO operating budget (tools integrated into service cost) | Enhanced computational protein engineering service offering to pharma and biotech clients; differentiation vs. wet-lab-only CROs | Operations / Finance with scientific leadership input | Competitive differentiation: offer AI-native protein engineering services that pure wet-lab CROs cannot; resell value created by ESM3/ESM-C

Buyer segments derived from EvolutionaryScale product documentation (Forge API, ESM-C on AWS/NVIDIA), NVIDIA BioNeMo and AWS SageMaker distribution announcements, HuggingFace developer adoption data, and analogous SaaS API commercial structures. Academic/government segment is highest volume but non-revenue under the free open-weight tier. Top pharma and mid-tier biotech are the primary commercial targets. Budget ownership structures are archetypes; actual organizational authority varies by company size and culture.

[CM015, CM016, CM017, CM018, CM019, CM020]
FM003: Buyer / segment map

Matrix mapping EvolutionaryScale buyer segments against economic buyer role, technical champion, and primary adoption trigger for Forge API and ESM-C purchasing or adoption decisions.

Buyer roles are archetypes derived from EvolutionaryScale Forge API documentation, ESM-C distribution announcements (AWS SageMaker, NVIDIA BioNeMo), HuggingFace developer community signals, and analogous SaaS API commercial models. Actual organizational titles and approval thresholds vary. Academic/government row reflects open-weight non-commercial adoption only, with no paid revenue contribution under current MIT license model.

[CM014, CM015, CM016, CM017, CM018, CM019]

2.4 Growth Drivers and Adoption Constraints

Five structural forces drive protein LM market growth. First, exponential protein sequence database expansion: DNA sequencing costs declined from ~$10,000 per genome in 2011 to ~$100 per genome by 2023 (National Human Genome Research Institute), enabling the generation of billions of new protein sequences. ESM3 was trained on 2.78 billion protein sequences with 98 billion parameters, a scale achievable only because of this data democratization. Second, AlphaFold (Google DeepMind) generated a free database of 200+ million predicted protein structures, lowering the historical $500,000+ crystallography cost barrier, and trained the field to accept computational protein tools as production-grade. Third, NVIDIA BioNeMo delivers 2× faster biofoundation model training and 6× faster model inference. This lowers the total cost of ownership for enterprise PLM deployment, which cuts both ways for EvolutionaryScale: it extends the company's distribution reach through the BioNeMo channel, but it also lowers the economic barrier to self-hosted alternatives to the Forge API. Fourth, FDA regulatory engagement is accelerating: over 500 AI/ML-enabled drug development submissions were received between 2016 and 2023, a draft guidance was issued in 2025, and the CDER AI Council was established in 2024, reducing regulatory ambiguity for pharma customers evaluating AI-native discovery tools. Fifth, the open-weight ESM-C release under MIT license positions EvolutionaryScale to become the community standard for protein LMs, mirroring the open-weight release strategies that drove commercial cloud API conversion in NLP. Four material constraints limit adoption pace. First, open-source commoditization: ESM-C weights are available for free under MIT license, so any well-resourced lab can self-host, reducing paid conversion rates and pricing power at the non-frontier end of the market.
Second, wet-lab validation requirements: no protein designed purely by computational AI has reached regulatory approval without extensive in vitro and in vivo confirmation, so the API adds value but cannot replace the experimental bottleneck. Third, enterprise procurement friction: pharma IT security reviews, cloud data governance policies, and lengthy vendor vetting cycles add 12–24 months to commercial deployment timelines relative to academic adoption. Fourth, competitive pressure from Big Tech: Google DeepMind, NVIDIA (BioNeMo), and AWS (HealthOmics) all have distribution, compute, and ecosystem advantages that could threaten EvolutionaryScale if frontier protein LM capabilities converge toward commodity.[CM022, CM023, CM024, CM025, CM026, CM027]
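To express the first growth driver as a rate, the sketch below converts the cited endpoints (~$10,000 per genome in 2011, ~$100 in 2023) into a constant annual decline and a cost-halving time. It uses only those two endpoints, so it smooths over the uneven year-to-year path of the underlying cost series.

```python
import math


def annual_decline(cost_start: float, cost_end: float, years: int) -> float:
    """Constant yearly fractional decline that links two cost endpoints."""
    return 1.0 - (cost_end / cost_start) ** (1.0 / years)


def halving_time(cost_start: float, cost_end: float, years: int) -> float:
    """Years for cost to halve at the implied constant decline rate."""
    per_year_factor = (cost_end / cost_start) ** (1.0 / years)  # below 1.0 for a decline
    return math.log(0.5) / math.log(per_year_factor)


rate = annual_decline(10_000, 100, 2023 - 2011)
half = halving_time(10_000, 100, 2023 - 2011)
print(f"~{rate:.1%} cheaper per year; cost halves every ~{half:.1f} years")
```

A 100-fold drop over 12 years works out to roughly a 32% annual decline, or a halving of per-genome cost about every 1.8 years.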

Growth drivers and constraints table
Driver / Constraint | Direction | Timing | Implication for EvolutionaryScale | Diligence Ask
DNA sequencing cost democratization (~$100/genome by 2023, down from $10,000 in 2011) | Growth driver | Structural, ongoing | Enabled training of ESM3 on 2.78B protein sequences; expanding protein databases sustain competitive moat for frontier model training | Confirm that EvolutionaryScale has ongoing data pipeline access and compute budget to retrain on new sequences as databases expand
AlphaFold2/3 free protein structure database (200M+ structures) | Growth driver | Now, accelerating | Removes the historical crystallography cost barrier; normalizes computational protein tools in pharma R&D; expands the addressable buyer base for ESM3/ESM-C by removing 'is this trustworthy?' friction | Confirm ESM3's training data overlap with AlphaFold structural database; assess whether EvolutionaryScale co-trains on structural inputs
NVIDIA BioNeMo 2× training / 6× inference speedup | Growth driver | Now, hardware-dependent | Reduces total cost of ownership for enterprise ESM deployment; strengthens AWS and NVIDIA partnership distribution channel | Verify that EvolutionaryScale's ESM-C NIM microservices are tested and GA on BioNeMo; assess revenue share or referral structure
FDA regulatory AI engagement: 500+ submissions 2016–2023, 2025 draft guidance, CDER AI Council 2024 | Growth driver | Emerging, 2025–2027 | Reduces regulatory uncertainty for pharma partners evaluating AI-native discovery tools; signals FDA acceptance of AI in IND submissions | Confirm whether any ESM3/Forge-enabled biology has been referenced in an IND or regulatory filing; obtain current FDA guidance applicability assessment
Biopharma R&D spend growth (pharma R&D budgets ~5–6% annual growth) | Growth driver | Structural, long-term | Expanding total procurement budget in target buyer segment supports Forge API revenue growth even at constant market share | Track top-pharma R&D budget disclosures annually; assess computational biology as proportion of R&D budget in Forge target accounts
Open-source commoditization: ESM-C free under MIT license; ESM2 freely available from Meta | Adoption constraint | Now, persistent | Self-hosting displaces paid Forge conversions for price-sensitive customers with GPU access; limits pricing power at smaller model tiers | Quantify the fraction of HuggingFace ESM-C downloaders who subsequently convert to paid Forge API; assess self-hosting economics at 6B parameter scale
Wet-lab validation requirement: no AI-designed protein has reached regulatory approval without experimental confirmation | Adoption constraint | Structural, long-term | Pure API value proposition is limited to computational workflow steps; cannot replace in vitro / in vivo validation; limits per-user revenue ceiling for platform-only plays | Confirm whether EvolutionaryScale plans to offer wet-lab validation partnerships or data integration services as part of Forge enterprise offering
Enterprise procurement friction: 12–24 months for pharma cloud vendor vetting | Adoption constraint | Now, structural | Commercial deployment timelines longer than academic adoption; impacts revenue conversion from pilot to enterprise contract | Obtain reference customer disclosure of time from first API trial to enterprise Forge contract; assess SOC 2 and GxP compliance certifications

Drivers and constraints derived from FDA regulatory guidance page, NVIDIA BioNeMo technical documentation, EvolutionaryScale ESM3 Science paper, AlphaFold public database disclosures, National Human Genome Research Institute sequencing cost data, and HuggingFace/GitHub developer adoption metrics. Patent cliff and pharma R&D spend figures from IQVIA and Statista secondary sources. No single source covers all items; synthesis reflects convergence across multiple evidence types.

[CM022, CM023, CM024, CM025, CM026, CM027]
FM004: Adoption funnel

Protein LM API adoption funnel from all potential enterprise users globally through to active EvolutionaryScale Forge commercial subscribers, illustrating conversion stages and estimated magnitude at each step as of 2026.

Funnel counts are analytical estimates; no authoritative published survey of PLM API adoption stages exists. Total global protein engineering companies estimated from biotech/pharma industry databases. Companies with computational biology budgets derived from proportion of top-1000 pharma/biotech companies maintaining dedicated in-house computational teams. Free-tier ESM user count extrapolated from HuggingFace download metrics (6,320+ ESM-C downloads) and GitHub stars. Commercial Forge and enterprise contract counts are estimates; EvolutionaryScale has not publicly disclosed subscriber or contract counts. All figures are directional order-of-magnitude estimates requiring diligence verification.
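Because every stage count in this funnel is an analytical estimate, it helps to express the exhibit as stage-to-stage conversion rates that diligence can later overwrite with verified figures. The sketch below computes step and cumulative conversion. The stage names paraphrase the note above, and the counts are hypothetical placeholders chosen only to exercise the function; they are not the FM004 estimates.

```python
def funnel_conversions(stages: list[tuple[str, int]]) -> list[tuple[str, float, float]]:
    """Return (stage, step conversion, cumulative conversion) for every stage after the first."""
    if not stages:
        return []
    if any(count < 0 for _, count in stages):
        raise ValueError("stage counts must be non-negative")
    top = stages[0][1]
    rows = []
    for (_, prev_count), (name, count) in zip(stages, stages[1:]):
        if count > prev_count:
            raise ValueError(f"stage '{name}' is larger than the stage before it")
        step = count / prev_count if prev_count else 0.0
        cumulative = count / top if top else 0.0
        rows.append((name, step, cumulative))
    return rows


# Hypothetical placeholder counts, NOT estimates from the exhibit.
EXAMPLE = [
    ("protein engineering organizations", 1000),
    ("with computational biology budgets", 250),
    ("using free-tier ESM models", 50),
    ("paying Forge subscribers", 10),
]

for name, step, cum in funnel_conversions(EXAMPLE):
    print(f"{name:36s} step {step:6.1%}  cumulative {cum:6.1%}")
```

Framing the funnel as rates rather than absolute counts means a single verified figure, such as the paid-subscriber count, immediately tests every upstream assumption.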

[CM020, CM021, CM027, CM031]

2.5 Sizing and Adoption Diligence Gaps

Several material evidence gaps limit the precision of the market analysis. First, no independently published serviceable addressable market figure exists for protein language model APIs within pharmaceutical R&D specifically. All protein engineering market estimates ($2.2B–$23.59B) encompass the full market including reagents, instruments, and services; the pure API/software sub-segment has not been sized in any accessible analyst report. Deriving a PLM API SAM requires an analytical estimate of what fraction of protein engineering spend is addressable by computational tools, an assumption-heavy inference without an independently verifiable basis. Second, EvolutionaryScale's Forge API pricing and actual paid subscriber count have not been publicly disclosed. HuggingFace downloads (6,320+ ESM-C, 3,110+ ESM3) and PyPI install counts indicate developer traction, but they cannot be translated into paid commercial revenue without knowledge of the Forge conversion funnel and pricing structure. Third, the bioRxiv preprint search returned 129 papers citing ESM3/EvolutionaryScale, and the ESM3 Science paper (DOI: 10.1126/science.ads0018) has been cited extensively, but academic citation impact does not map directly to commercial market penetration in pharma. Fourth, analyst estimates for the protein engineering market span roughly a 10× range, from $2.2B (MAM, 2019) to $23.59B (Precedence, 2035 forecast). That range is driven mainly by forecast horizon and scope differences (tools-only vs. instruments, reagents, and services) rather than genuine market disagreement: the top of the range is a ten-year projection, and even the base-year values ($2.2B in 2019 vs. $5.09B in 2025) differ in year as well as scope. Investors comparing across sources without adjusting for scope and base year may reach erroneous conclusions.
Evidence preserved for successor chapters: (a) no paid customer disclosures have been identified, a diligence gap for Chapter 4 (Business Model); (b) comparable protein LM API pricing benchmarks against OpenAI API, Anthropic API, and AWS Bedrock should be obtained in Chapter 3 (Competitors); (c) whether ESM3's commercial Forge API has been used in any IND submission or regulatory filing has not been confirmed, and this gap should be addressed in Chapter 5 (Technology) or Chapter 7 (Regulatory).[CM032, CM033, CM034, CM035, CM036]
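The roughly 10× spread in protein engineering estimates can be split multiplicatively into three steps along the table's own figures: MAM's growth from 2019 to 2024 (same scope), the jump from MAM 2024 to Precedence 2025 (which bundles scope with one extra year), and Precedence's ten-year projection. The sketch below reports each step's multiplier and its share of the total log-spread; the three-step path is our choice of decomposition, not a publisher's.

```python
import math

# (step, start USD B, end USD B), anchor values from the sizing table.
STEPS = [
    ("MAM 2019 -> MAM 2024 (time, same scope)", 2.2, 3.9),
    ("MAM 2024 -> Precedence 2025 (scope + 1 year)", 3.9, 5.09),
    ("Precedence 2025 -> 2035 (10-year projection)", 5.09, 23.59),
]


def decompose(steps: list[tuple[str, float, float]]) -> tuple[list[tuple[str, float, float]], float]:
    """Return (step, multiplier, share of log-spread) per step, plus the total multiplier."""
    factors = [(label, end / start) for label, start, end in steps]
    total_log = sum(math.log(f) for _, f in factors)
    parts = [(label, f, math.log(f) / total_log) for label, f in factors]
    return parts, math.exp(total_log)


parts, total = decompose(STEPS)
for label, factor, share in parts:
    print(f"{label:48s} x{factor:5.2f}  ({share:.0%} of log-spread)")
print(f"total spread x{total:.1f}")
```

On these figures the ten-year projection accounts for about two thirds of the log-spread, MAM's own 2019–2024 growth for about a quarter, and the scope-plus-one-year step for about a tenth.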

2.6 Exhibits

Chapter 03

Competitors

3.1 Competitive Universe and Category Segmentation

EvolutionaryScale competes at the intersection of protein language modeling, generative biology, and AI-enabled drug discovery. The competitive universe divides into four segments. The first and most direct segment is AI protein design platform peers: companies that build and commercialize foundation models specifically for protein engineering and design. Profluent Bio (San Francisco, ~$44M raised) uses ProGen-derived models and produced OpenCRISPR, which its website describes as the world's first AI-designed gene editor. Cradle.bio (Amsterdam, ~$73M raised) offers a SaaS protein engineering platform that combines proprietary wet-lab data cycles with customer data, claims 2–12x faster development timelines, and counts Novonesis (formerly Novozymes) among its customers. Generate Biomedicines (Somerville, MA, ~$700M+ raised) operates the Generate Platform, a continuously trained generative biology loop that has generated, built, and tested over 42,000 proteins across 140,000+ sq ft of lab space; the company has active major-pharma partnerships. AbSci Corporation (NASDAQ: ABSI; Vancouver, WA) is the only publicly traded direct peer, with an AI Drug Creation Platform for de novo antibody design using 6-week iterative cycles; it filed its FY2025 10-K with the SEC in March 2026. Adaptyv Bio (Lausanne, Switzerland) positions itself as a cloud lab for protein designers at the Biopole Life Science Campus. The second segment comprises foundation-model and broader bio-AI peers. Isomorphic Labs (London, Alphabet subsidiary) signed landmark deals with Eli Lilly and Novartis in early 2024 and holds the exclusive commercial license to AlphaFold 3 for drug discovery, announced jointly with Google DeepMind in May 2024. Chai Discovery (San Francisco) released Chai-1 as an open model and is advancing Chai-2 for drug-like de novo antibody design with atomic precision.
Xaira Therapeutics (San Francisco, ~$1B raised at launch in April 2024) is building predictive and agentic AI models across the complete drug discovery spectrum. Iambic Therapeutics uses its Enchant and NeuralPLexer AI technologies and has Phase 1b data for IAM1363 (HER2 inhibitor). Inceptive (Palo Alto/Berlin/Zurich, founded 2021) specializes in RNA/mRNA/siRNA/ASO/peptide foundation models. The third segment includes AI drug-discovery integrators. Insilico Medicine (HKEX: 3696) has the most advanced clinical proof among AI-first companies, having completed a Phase 2 trial with ISM001-055 (TNIK inhibitor for IPF). Recursion Pharmaceuticals (NASDAQ: RXRX) acquired Exscientia and operates over 50 petabytes of phenomics data with BioHive-2 (built with NVIDIA). Schrödinger (NASDAQ: SDGR) is the dominant physics-based platform, with 30+ years of R&D and its FEP+, WaterMap, and LiveDesign tools. The fourth segment covers academic and open-source actors. The Institute for Protein Design (Baker Lab, UW), whose director David Baker shared the 2024 Nobel Prize in Chemistry, distributes RFdiffusion and RoseTTAFold as free open-source tools; a COVID-19 vaccine based on its designs, licensed royalty-free, is approved in the UK and South Korea. Google DeepMind makes AlphaFold 3 available via AlphaFold Server (free, non-commercial) and publishes predicted structures through the EMBL-EBI AlphaFold DB (200M+ structures, CC-BY-4.0). The most strategically important open-source threat is Meta's ESM model family (ESM2, ESMFold), released under an MIT license and built by a Meta FAIR team that included Alexander Rives, Zeming Lin, Tom Sercu, and Salvatore Candido, several of whom went on to found EvolutionaryScale; it sets a commoditization floor for basic protein language modeling. [CP001, CP002, CP003, CP004, CP005, CP006]

AI Protein Design and Bio-AI Competitive Landscape Overview
Competitor | Category | Scale / Funding | Target Segment | Key Product | Differentiation | Limitation
Profluent Bio | Direct AI protein design | ~$44M raised (est.) | Biotech / pharma protein engineering | ProGen-based models; OpenCRISPR | First AI-designed gene editor; open-access publication strategy | Smaller compute/data scale vs. ESM3; limited wet-lab
Cradle.bio | Direct AI protein design | ~$73M raised (est.) | Biopharma; industrial biotech | SaaS protein engineering platform; in-house wet lab | Data-flywheel lock-in; SOC 2; Novonesis partnership; no-royalties model | No large foundation pre-training; customer-data model limits generalization
Generate Biomedicines | Direct AI protein design | ~$700M+ raised (est.) | Biopharma therapeutics | The Generate Platform; 42K+ proteins tested; 140K+ sq ft lab | Most capital; full wet-lab loop; major pharma partnerships | No public API; partnership-only access; highly capital-intensive
AbSci Corporation | Direct AI protein design | NASDAQ:ABSI (public) | Biopharma antibody programs | AI Drug Creation Platform; 6-week cycles; ABS-201 candidate | Publicly traded transparency; iterative wet-lab + AI; ABSI SEC filings | Quarter-by-quarter revenue pressure; no approved product yet
Adaptyv Bio | Direct AI protein design | Early-stage (undisclosed) | Academic; smaller biotech | Cloud lab for protein designers | Swiss life-science hub location; Biopole campus | Very limited public information; product scope unclear
Isomorphic Labs | Foundation-model bio-AI | Alphabet-funded (undisclosed) | Major pharma drug discovery | AlphaFold 3 commercial license; drug discovery platform | Exclusive commercial AF3 rights; Alphabet resources; Lilly/Novartis deals | Non-commercial AF3 still free via Server; limited to drug discovery
Chai Discovery | Foundation-model bio-AI | Private (undisclosed) | Drug discovery; antibody design | Chai-2 de novo antibody design | Atomic-precision antibody design; Chai-1 open-released | Early-stage; compute scale smaller than EvolutionaryScale
Baker Lab / IPD (UW) | Academic / open-source | NSF/NIH/public funding | Global academic; spinout biotech | RFdiffusion; RoseTTAFold; protein therapeutics | Nobel 2024 (Baker); royalty-free tools; approved COVID-19 vaccine spinout | Not commercial; distributes tools that compete with paid offerings
DeepMind AlphaFold | Academic / platform | Alphabet (unlimited) | Global research; pharma | AlphaFold 3; AFDB (200M+ structures; CC-BY-4.0) | Freely available for research; 3M+ users in 190+ countries | AlphaFold Server is non-commercial only (AFDB is CC-BY-4.0); commercial AF3 use via Isomorphic exclusively
Meta FAIR / ESM2 | Open-source baseline | Meta (unlimited) | All biology researchers | ESM2 (MIT license); ESMFold | MIT license for all uses including commercial; built by FAIR researchers who later founded EvolutionaryScale | Not maintained/updated post-2023; no multimodal reasoning; ESM3 supersedes internally

Funding figures for Profluent (~$44M), Cradle (~$73M), and Generate Biomedicines (~$700M+) are estimates from public reporting and analyst context; exact totals are not officially confirmed by all companies. AbSci is publicly traded (NASDAQ:ABSI) with SEC-disclosed financials. Isomorphic Labs, Chai Discovery, and Adaptyv Bio do not disclose funding publicly. "Limitation" cells reflect publicly observable constraints, not internal assessments.

[CP001, CP002, CP003, CP004, CP005, CP006]
FP001: Competitive Positioning Map: Protein Design Landscape

Ordinal positioning of major competitors on two evidence-backed axes: research-tool vs. clinical/drug-pipeline focus (x-axis) and open/freely accessible vs. proprietary/commercial (y-axis). Positions are evidence-backed ordinal scores (1–5), not numeric metrics.

Axis positions are evidence-backed ordinal scores (1=most research/open, 5=most clinical/proprietary). Numeric values are not metric measurements; they represent relative ordering based on public information about product type, licensing, and pipeline status as of May 2026.

[CP001, CP002, CP007, CP009, CP010, CP011]

3.2 Direct AI Protein Design Peers: Detailed Profiles

Each direct AI protein design peer carries a distinct go-to-market model and differentiation strategy relative to EvolutionaryScale. Profluent Bio (San Francisco, 2022) focuses on protein-centric AI and has taken a public-good strategy with OpenCRISPR, publishing the first AI-designed gene editor as open access. The company's core ProGen model architecture is designed for protein sequence generation. Profluent's estimated ~$44M funding base is substantially smaller than EvolutionaryScale's $142M, suggesting more limited compute and data infrastructure. Cradle.bio (Amsterdam, 2021) differentiates through a SaaS platform that integrates customer wet-lab data with proprietary protein engineering models. Its models improve each time customers upload experimental results ("models learn with you"), creating a data-flywheel lock-in effect. Cradle claims customers achieve 2–12x faster protein development timelines and offers a no-royalties subscription model under which customers own all generated IP. The company operates its own wet lab in Amsterdam as a proof-of-concept validation layer. Its partnership with Novonesis, one of the world's largest industrial biotech companies, shows a major industrial player embedding AI into its innovation workflow. The platform is SOC 2 compliant and supports single sign-on. Generate Biomedicines (Somerville, MA, 2018) is the most heavily funded direct peer, with over $700M raised. Its Generate Platform integrates AI model training, high-throughput protein expression, and iterative learning across 140,000+ sq ft of lab space. The company has tested 42,000+ proteins in a continuously improving feedback loop. Its lead program GB-0895 targets TSLP for asthma and was co-optimized from the start for improved biological effect and reduced dosing frequency (potentially twice-yearly). Active partnerships with major biopharma firms signal platform-level pharma validation.
AbSci Corporation (NASDAQ: ABSI, Vancouver, WA) is the only publicly traded direct competitor. Its AI Drug Creation Platform integrates wet-lab and AI work in iterative 6-week cycles for de novo biologic (antibody) design and multi-parametric lead optimization. AbSci's ABS-201 is an AI-designed antibody targeting the prolactin receptor for androgenetic alopecia, with demonstrated in vivo hair follicle regeneration. Its public listing provides disclosure not available from private peers (the FY2025 10-K was filed with the SEC on March 24, 2026) but also exposes AbSci to quarterly revenue pressure. Adaptyv Bio (Lausanne, Switzerland) positions itself as a cloud lab for protein designers at the Biopole Life Science Campus. Public information is far more limited than for other direct peers, suggesting early-stage or pre-public status. [CP018, CP019, CP020, CP021, CP022, CP023]

Feature / Capability Matrix: AI Protein Design Platform Comparison
Capability | EvolutionaryScale (ESM3) | Profluent | Cradle.bio | Generate Biomedicines | AbSci | AlphaFold 3 | Meta ESM2
Multimodal (seq + struct + func jointly) | Yes (simultaneous reasoning) | Sequence-centric (partial) | Within-project optimization | Generate-build-measure loop | Wet-lab + AI iterative | Struct + molecule interactions | No (sequence only)
De novo generative design | Yes (ESM3 generative) | Yes (ProGen-based) | Yes (guided by lab data) | Yes (platform core) | Yes (de novo antibody) | Limited (structure prediction) | Limited (embedding / prediction)
Self-service API / developer access | Yes (Forge, public beta) | Unknown | Yes (SaaS platform) | No (partnership only) | No (partnership only) | Yes (AlphaFold Server, free) | Yes (HuggingFace, MIT, free)
Wet-lab validation loop | No (no disclosed wet lab) | Unknown | Yes (Amsterdam wet lab) | Yes (140K+ sq ft) | Yes (integrated cycles) | No | No
Commercial license (non-research) | Yes (Forge paid) | Unknown | Yes (subscription) | Yes (partnership) | Yes (partnership + public) | Yes (via Isomorphic Labs) | Yes (MIT, free)
Largest model scale (parameters) | 98B (largest disclosed) | ~1B–10B (estimated) | Unknown | Unknown | Unknown | Unknown (large) | 15B (largest free variant)

Unknown cells reflect absence of public disclosure; they are evidence gaps, not negatives. "Limited" indicates capability exists only partially. AbSci's "public" qualifier refers to NASDAQ:ABSI trading status, not public API. Meta ESM2 15B is the largest publicly available free protein LM. AlphaFold 3 parameter count is not publicly disclosed.

[CP003, CP004, CP005, CP019, CP020, CP027]
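As a rough illustration of how the matrix can be queried, the sketch below encodes four of its rows in Python. The yes/partial/no/unknown encoding, the mapping of "Limited" cells to partial, and all identifiers are assumptions made for illustration; the multimodal and model-scale rows are omitted because their cells are descriptive rather than yes/no.

```python
from enum import Enum

class Cov(Enum):
    YES = "yes"
    PARTIAL = "partial"
    NO = "no"
    UNKNOWN = "unknown"  # evidence gap, not a negative

Y, P, N, U = Cov.YES, Cov.PARTIAL, Cov.NO, Cov.UNKNOWN

CAPABILITIES = ("de_novo", "self_service_api", "wet_lab_loop", "commercial_license")

# Rows transcribed from the matrix above; "Limited" cells mapped to PARTIAL.
MATRIX = {
    "EvolutionaryScale": (Y, Y, N, Y),
    "Profluent":         (Y, U, U, U),
    "Cradle.bio":        (Y, Y, Y, Y),
    "Generate":          (Y, N, Y, Y),
    "AbSci":             (Y, N, Y, Y),
    "AlphaFold 3":       (P, Y, N, Y),
    "Meta ESM2":         (P, Y, N, Y),
}

def vendors_with(*caps: str) -> list[str]:
    """Vendors marked YES on every requested capability."""
    idx = [CAPABILITIES.index(c) for c in caps]
    return [v for v, row in MATRIX.items() if all(row[i] is Y for i in idx)]

def evidence_gaps(vendor: str) -> list[str]:
    """Capabilities with no public evidence for this vendor."""
    return [c for c, cov in zip(CAPABILITIES, MATRIX[vendor]) if cov is U]

print(vendors_with("self_service_api", "wet_lab_loop"))
print(evidence_gaps("Profluent"))
```

On this encoding, Cradle.bio is the only vendor combining self-service access with a wet-lab loop, and Profluent's row is dominated by evidence gaps rather than negatives.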

3.3 Pricing, Packaging, and Distribution Comparison

Pricing models across the AI protein design landscape reflect fundamentally different go-to-market philosophies. EvolutionaryScale offers Forge, a commercial API platform for ESM3 and ESMC access, with public beta announced in early 2025. The ESM3 small model (open) and ESMC (300M/600M) are available freely on HuggingFace for non-commercial and research use, while larger models and generative API access require Forge accounts. Cradle.bio explicitly positions as a software subscription ("no royalties, just a software subscription fee"), where customers retain all generated IP and their experimental data is never used to train models for others. This SaaS stickiness contrasts with EvolutionaryScale's usage-based API model. Meta's ESM2 is MIT-licensed and freely available on GitHub and HuggingFace for all uses including commercial, at zero cost. ESMFold structure prediction is similarly open. DeepMind's AlphaFold DB (200M+ structures) is CC-BY-4.0 for any use; AlphaFold Server is free for non-commercial research. Commercial AF3 use in drug discovery routes through Isomorphic Labs' exclusive licensing arrangements with pharma. Generate Biomedicines and AbSci both operate B2B partnership-and-licensing models with pharma majors rather than self-service APIs. Neither offers a public pricing page or developer API. Baker Lab/IPD tools (RFdiffusion, RoseTTAFold) and OpenFold are entirely free under permissive licenses. Key diligence gap: Forge enterprise list pricing, volume tiers, and realized per-call costs are not publicly disclosed. Isomorphic Labs' deal economics with Lilly and Novartis are reported but not publicly broken down. [CP027, CP028, CP029, CP030, CP031, CP032]

Pricing and Packaging Comparison
Vendor | Access Model | Approximate Price / Tier | Included Capabilities | IP / Data Arrangement | Key Unknown / Gap
EvolutionaryScale (Forge) | API (usage-based) | Public beta; enterprise pricing undisclosed | ESM3 and ESMC inference; sequence/struct/function generation | User retains IP of generated proteins | Enterprise list price and volume tiers not public
Cradle.bio | SaaS subscription | Undisclosed list price | Protein engineering; multi-property optimization; data mgmt | Customer retains full IP; no royalties; data never shared | Exact subscription cost undisclosed
Meta ESM2 / ESMFold | Open-source (MIT license) | Free (zero cost) | Sequence embeddings; structure prediction (ESMFold) | MIT: any use including commercial | None (freely available)
DeepMind AlphaFold (DB + Server) | Free DB (any use) + free Server (non-commercial research) | Free; DB under CC-BY-4.0, Server for non-commercial research only | 200M+ structures; AF3 predictions | DB: CC-BY-4.0 for any use; commercial AF3 use only via Isomorphic | Isomorphic Labs commercial pricing not public
Baker Lab / IPD tools | Open-source (permissive) | Free | RFdiffusion; RoseTTAFold; design tools | Royalty-free; open-source | None (fully open)
Generate Biomedicines | Strategic partnerships | Negotiated (undisclosed) | Full lab + AI generative biology platform | Partnership and licensing arrangements | Not self-service; pricing and deal terms private
AbSci (NASDAQ:ABSI) | Partnerships + SEC-disclosed revenue | Negotiated per partnership | AI antibody design + wet-lab validation cycles | Partnership licensing; FY revenue in SEC filings | Realized economics per deal not itemized
Schrödinger (NASDAQ:SDGR) | Enterprise software license | ARR ~$130–150M est. (2024 analyst) | FEP+; WaterMap; LiveDesign; drug pipeline programs | Software license; pipeline via separate entity | Exact per-seat or per-module pricing private

Pricing data is based on publicly available information. EvolutionaryScale Forge enterprise pricing, Cradle subscription cost, and Isomorphic Labs deal economics are undisclosed. Schrödinger ARR is an analyst estimate from 2024 coverage, not company-disclosed. AbSci revenue details appear in NASDAQ:ABSI quarterly/annual filings.

[CP027, CP028, CP029, CP030, CP031, CP032]

3.4 Moat Durability, Commoditization Risk, and Adverse Analysis

EvolutionaryScale's most structurally critical competitive risk is the open-source commoditization of protein language models. ESM2 and ESMFold, developed by Alexander Rives, Zeming Lin, Tom Sercu, and Salvatore Candido at Meta AI FAIR, are available on GitHub and HuggingFace under the MIT license for any purpose, including commercial use. ESM2 variants span 8M to 15B parameters; ESMFold predicts protein structure up to 60x faster than prior state-of-the-art methods. OpenFold provides an independent open reimplementation of AlphaFold with permissive licensing. Any biotech or pharma company can deploy these free models as a baseline, which directly compresses the value of ESM3 for applications that do not require multimodal prompting or frontier scale. EvolutionaryScale attempts to rise above this free baseline through three strategies: (1) frontier scale: ESM3, at 98B parameters and 10^24 FLOPs, is the largest protein language model with demonstrated emergent generative capabilities; (2) multimodal differentiation: ESM3 is the first model to jointly reason over protein sequence, structure, and function, a capability absent from ESM2; and (3) R&D velocity: ESM Cambrian (ESMC) was released in December 2024, sustaining a cadence of new model releases. However, several adverse signals are present. First, Chai Discovery's Chai-2 advances atomic-precision de novo antibody design, and Isomorphic Labs holds AlphaFold 3 commercial exclusivity, so the multimodal protein design space is becoming crowded. Second, clinical proof-of-concept moats are held by Insilico Medicine (Phase 2 completion for an AI-designed drug) and Recursion (multi-program clinical pipeline), while EvolutionaryScale has no disclosed internal pipeline. Third, Generate Biomedicines' capital base ($700M+) far exceeds EvolutionaryScale's ($142M), enabling a more capital-intensive, lab-validated strategy.
Fourth, pharma clients can and do multi-home across free tools (AlphaFold, ESM2) and paid platforms (Forge, Cradle, Generate) simultaneously, limiting any single vendor's pricing power. [CP033, CP034, CP035, CP036, CP037, CP038]

Moat Durability and Competitive Risk Register
Moat Claim | Threat / Counter-Evidence | Severity | Mitigation / Diligence Ask
Frontier scale (98B params, 10^24 FLOPs) | Open-source models approach capability (ESM2 15B free; Chai-2 advancing); rapid convergence | High | Validate whether ESM3 98B shows meaningfully better scientific outcomes than smaller free models on independent benchmarks
Multimodal joint reasoning (seq+struct+func) | AlphaFold3 handles molecules + structure; Chai-2 targets antibody structure + function; convergence accelerating | Medium | Assess how many pharma workflows genuinely require simultaneous trimodal prompting
Forge API commercial distribution | Self-service API is low-moat GTM; Cradle has stickier data-flywheel SaaS; no disclosed multi-year enterprise contracts | High | Determine whether Forge has enterprise contracts with switching costs or is pay-per-call
Founder reputation / ESM research lineage | Same founders released ESM2 for free (MIT) at Meta, providing a high-quality free alternative to their own paid offering | High | Seek founder clarity on commercialization strategy beyond the model API; assess whether research credibility drives enterprise pipeline
AWS SageMaker + NVIDIA BioNeMo distribution | Both channels are non-exclusive; multiple competitors available on the same marketplaces (Recursion, others) | Medium | Determine EvolutionaryScale's contractual exclusivity or priority on AWS/NVIDIA channels
No disclosed clinical pipeline | Competitors with clinical proof (Insilico Phase 2; Recursion multi-phase pipeline) command higher pharma deal values | High | Assess whether EvolutionaryScale plans any internal therapeutic programs or remains a pure-tool company

Severity reflects impact on EvolutionaryScale's competitive durability if the threat materializes. All threats are based on publicly observed evidence; internal strategic mitigations not known to diligence team are not captured.

[CP033, CP034, CP035, CP036, CP037, CP038]
FP002: Capability Coverage Matrix: Key Platform Features by Competitor

Coverage matrix of six key buying-criteria capabilities across seven major competitors, showing which platforms support which capabilities based on publicly available evidence.

"Partial" indicates partial or limited capability based on public sources. "Unknown" cells would indicate no public evidence; all cells here reflect public product pages. Clinical pipeline status as of May 2026 from official sources.

[CP003, CP004, CP005, CP018, CP019, CP020]
FP003: Competitive Moat Readiness KPIs

Compact summary of key competitive durability indicators for EvolutionaryScale relative to peers, derived from publicly available evidence as of May 2026.

[CP001, CP002, CP003, CP007, CP010, CP033]

3.5 Exhibits

Chapter 04

04 Financials

4.1 Revenue Streams and Pricing Model

EvolutionaryScale's revenue model was built around Forge, a commercial API platform providing inference access to ESM3, ESM Cambrian, and related protein language models. Forge opened to public beta in January 2025 with usage-based fees (per-token or per-call) for protein sequence generation and structure prediction. The pricing schedule was not publicly listed on the forge.evolutionaryscale.ai interface, which required a login, and enterprise contract terms were negotiated privately. Three revenue streams were identifiable from public sources: (1) Forge API pay-per-use, (2) volume-based enterprise annual API access contracts, and (3) distribution through strategic partners, with NVIDIA's BioNeMo platform and AWS SageMaker JumpStart providing cloud-hosted access to ESM models, potentially on a revenue-share or referral-fee basis with NVIDIA and Amazon respectively. ESM Cambrian (released December 2024) followed a split release: ESMC-300M and ESMC-600M shipped as open weights under a non-commercial license, while the largest variant, ESMC-6B, was made available through Forge (academic access) and AWS SageMaker JumpStart (commercial deployments). This contrasts with Meta AI Research's ESM2, which remains MIT-licensed and freely available on HuggingFace, and it reinforced the API access model for commercial users. An academic free tier with capped token allowances was listed on the Forge product page. No revenue figures, ARR, or customer metrics were disclosed at any point during EvolutionaryScale's operational life as a standalone entity. The company's acquisition by CZI Biohub in November 2025 fundamentally disrupted the commercial path, and as of May 2026 the Forge API's operational status and pricing under CZI Biohub management are not confirmed in public sources. [CI001][CI002][CI003][CI004][CI005][CI006][CI007][CI008][CI009][CI010]
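Forge's actual rates and metering units were never published, so the following is only a hypothetical sketch of the two mechanisms described above: metered commercial usage and a capped academic allowance. The tier names, the per-million-token rate, and the cap are invented placeholders, not Forge parameters.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float            # placeholder rate, not a Forge price
    monthly_token_cap: Optional[int] = None  # hard cap for capped tiers; None = uncapped

def bill(tokens_requested: int, tier: Tier) -> tuple[int, float]:
    """Return (tokens served, charge in USD) for one month of usage."""
    if tier.monthly_token_cap is None:
        served = tokens_requested
    else:
        # Requests beyond the cap are not served (no overage billing).
        served = min(tokens_requested, tier.monthly_token_cap)
    charge = served / 1_000_000 * tier.usd_per_million_tokens
    return served, charge

# Hypothetical tiers for illustration only.
ACADEMIC = Tier("academic", usd_per_million_tokens=0.0, monthly_token_cap=2_000_000)
COMMERCIAL = Tier("commercial", usd_per_million_tokens=10.0)

print(bill(3_000_000, ACADEMIC))
print(bill(3_000_000, COMMERCIAL))
```

The sketch makes one design point explicit: a capped free tier produces no revenue on its own, so its value depends entirely on converting academic users into paid accounts, which is why the conversion rate is a diligence ask in the revenue-streams table.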

Revenue streams and mechanisms
Stream | Mechanism | Unit / Pricing | Current Status (May 2026) | Evidence Quality | Diligence Ask
Forge API (pay-per-use) | Per-token or per-call fee for ESM3 / ESM Cambrian protein generation and structure prediction | Not publicly listed; requires login to forge.evolutionaryscale.ai | Operational status unclear post-CZI transaction; public beta launched Jan 2025 | Low (pricing not disclosed; commercial launch confirmed) | Request current Forge API pricing and usage metrics from CZI Biohub
Enterprise API access contract | Volume-based annual license for Forge API access; negotiated per customer | Not disclosed; estimated mid-five to mid-six figures USD/year | Status unclear post-CZI; no enterprise customers named publicly | Low (contract structure inferred from product design; no confirmed deals) | Identify any signed enterprise customers; obtain contract template in diligence
NVIDIA BioNeMo distribution | ESM3 hosted on NVIDIA BioNeMo NIM platform; potential revenue share or co-marketing fee | Not disclosed; NVIDIA partnership terms private | Active (ESM3 listed on BioNeMo as of 2026) | Medium (distribution confirmed via NVIDIA announcements) | Confirm revenue-share or in-kind terms with NVIDIA
AWS SageMaker JumpStart distribution | ESM models listed on AWS SageMaker JumpStart; Amazon co-led the Series A | Not disclosed; may be in-kind cloud credits rather than cash revenue | Active (AWS listing confirmed as of 2026) | Medium (distribution confirmed; financial terms unknown) | Determine whether the Amazon relationship generates cash revenue or cloud-credit offsets
Academic / free tier | Capped token allowance for academic and research users | Free; conversion funnel to paid tiers | Listed on Forge product page; status post-CZI unclear | Low (listed as feature; no conversion metrics available) | Confirm academic-tier conversion rate and whether it drives enterprise pipeline

All pricing data is undisclosed. Status assessments are based on product page review and partner announcements. Post-CZI Biohub transaction (Nov 2025), the commercial continuity of Forge API under CZI is not confirmed.

[CI001, CI002, CI005, CI006, CI007]
Pricing and monetization summary
Product / Channel | List vs. Realized Pricing | Discounts / Unknowns | Source
Forge API (ESM3 / ESM Cambrian), public beta | List price not published; login required to view; no per-token rate confirmed | Academic free tier and assumed enterprise discounts; no realized pricing disclosed | forge.evolutionaryscale.ai (official, requires login); ESM Cambrian blog (official)
Enterprise API contract | Not disclosed; not listed; negotiated individually | Volume discounts, exclusivity, and indication scope all variable and unknown | Inferred from product structure; no public contract examples
NVIDIA BioNeMo (ESM3 NIM) | Included in NVIDIA BioNeMo platform; user pricing set by NVIDIA terms, not EvolutionaryScale | Revenue-share terms with EvolutionaryScale undisclosed; may be zero cash | NVIDIA BioNeMo product page; nvidianews.nvidia.com partnership announcement
AWS SageMaker JumpStart (ESM models) | AWS marketplace pricing (per instance-hour); EvolutionaryScale's share of listing fees undisclosed | Amazon's Series A co-lead investment may include in-kind cloud compute, which would offset cash costs but generate little or no cash revenue | AWS SageMaker JumpStart product page; CNBC Series A announcement

No realized pricing data is available in public sources. All entries are inferred from product structure and industry analogues. The dominant risk is that ESM2 (Meta's open-weight model) provides a zero-cost alternative for embeddings, capping pricing power for non-generative use cases.

[CI001, CI004, CI006, CI007]
FI001: Revenue model flow: customer activity to EvolutionaryScale revenue

How customer interactions with Forge API and distribution channels converted into EvolutionaryScale's revenue streams.

No revenue figures or pricing were publicly disclosed. Node detail reflects known mechanisms and confirmed partnership structures, not dollar amounts. Post-CZI Biohub transaction, commercial flow is disrupted.

[CI001, CI005, CI006, CI007, CI009]

4.2 Cost Structure, Gross Margin Drivers, and Capital Intensity

EvolutionaryScale's cost structure was dominated by three categories: research personnel, GPU compute for model training, and inference infrastructure for the Forge API. The company trained ESM3 using over 10^24 FLOPs (more compute than any prior biological model, according to the company's own blog) on what it described as "one of the highest throughput GPU clusters in the world today." This single training run would have cost an estimated $10–50 million at prevailing cloud GPU rates (H100 at $2–5 per GPU-hour), making it a dominant capital expense. Personnel costs were limited by headcount: LinkedIn places the company in the 11-50 employee bracket, suggesting approximately 25-50 FTE at peak. At a blended all-in cost of $200,000–$300,000 per FTE (standard for San Francisco AI research talent), personnel burn was approximately $5–15 million per year. Ongoing inference costs for the Forge API would add a variable component scaling with API call volume, since protein generation with the 98B-parameter ESM3 is computationally expensive per query. Gross margins for an API inference business depend heavily on whether the company operated its own GPU cluster (capital-intensive, higher long-run margins) or rented cloud compute (lower capex, higher COGS). No gross margin figures were disclosed. One possible offset was Amazon's co-lead position in the Series A: AWS may have provided substantial in-kind cloud credits as part of the deal structure, which would materially reduce near-term infrastructure costs. Meanwhile, ESM2's free availability on HuggingFace sets a permanent zero-price benchmark for the embeddings market, capping what the ESM Cambrian commercial tier can charge and creating a structural margin-compression risk. [CI011][CI012][CI013][CI014][CI015][CI016][CI017]
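The training-cost estimate can be sanity-checked with compute-only arithmetic. The sketch below assumes an H100 dense BF16 peak of roughly 989 TFLOP/s and model FLOPs utilization (MFU) between 20% and 50%; both are assumptions, not figures from the source. Under those assumptions, 10^24 FLOPs costs roughly $1.1M to $7.0M at $2–5 per GPU-hour, so the $10–50M range is better read as covering more than the final run (experiments, restarts, reserved-cluster overhead) or as assuming low utilization. The helper `implied_mfu` shows the utilization each endpoint of the range would require.

```python
# Compute-only sanity check of the ESM3 training-cost estimate.
# ASSUMPTIONS (not from the source): H100 dense BF16 peak ~989 TFLOP/s;
# MFU (achieved / peak FLOP/s) between 20% and 50%.

H100_PEAK_BF16 = 989e12  # assumed FLOP/s per GPU
SECONDS_PER_HOUR = 3600

def gpu_hours(total_flops: float, mfu: float, peak: float = H100_PEAK_BF16) -> float:
    """GPU-hours needed to execute total_flops at a given utilization."""
    return total_flops / (peak * mfu) / SECONDS_PER_HOUR

def training_cost(total_flops: float, usd_per_gpu_hour: float, mfu: float) -> float:
    """Rental cost of the GPU-hours above at a flat hourly price."""
    return gpu_hours(total_flops, mfu) * usd_per_gpu_hour

def implied_mfu(total_flops: float, cost_usd: float, usd_per_gpu_hour: float,
                peak: float = H100_PEAK_BF16) -> float:
    """Utilization at which cost_usd buys exactly total_flops."""
    hours = cost_usd / usd_per_gpu_hour
    return total_flops / (hours * SECONDS_PER_HOUR * peak)

ESM3_FLOPS = 1e24  # source: "over 10^24 FLOPs" (treated as a lower bound)

for mfu in (0.2, 0.35, 0.5):
    lo = training_cost(ESM3_FLOPS, 2.0, mfu)
    hi = training_cost(ESM3_FLOPS, 5.0, mfu)
    print(f"MFU {mfu:.0%}: ${lo / 1e6:.1f}M to ${hi / 1e6:.1f}M")

# Utilization implied by the report's endpoints: $10M at $5/h and $50M at $2/h.
print(f"{implied_mfu(ESM3_FLOPS, 10e6, 5.0):.1%}")
print(f"{implied_mfu(ESM3_FLOPS, 50e6, 2.0):.1%}")
```

The $10M endpoint corresponds to about 14% utilization at $5/GPU-hour and the $50M endpoint to about 1% at $2/GPU-hour, both low for a single well-tuned run, which supports reading the range as total program compute rather than one run.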

Cost structure and capital intensity
Cost Category | Primary Driver | Estimated Magnitude | Evidence Basis | Confidence
GPU compute (model training) | ESM3 training: >10^24 FLOPs on high-throughput cluster | $10–50M one-time (est.); H100 compute at $2–5/GPU-hour | Direct quote from ESM3 official blog; NVIDIA BioNeMo blog | Medium (compute scale confirmed; cost rate estimated)
GPU compute (Forge API inference) | ESM3 (98B parameters) inference per API call; variable with call volume | $1–5M/month at scale (est.); highly sensitive to call volume | Inferred from model size and cloud GPU pricing benchmarks | Low (no call volume data disclosed)
Personnel | ~25–50 FTE; AI researchers, ML engineers, platform engineers | $5–15M/year (est.); $200–300K all-in per FTE in SF | LinkedIn company size 11-50 bracket; Wikipedia headcount | Low (headcount estimated; no payroll data)
Infrastructure / cloud (non-training) | API serving, storage, data pipelines, internal tooling | $0.5–2M/month (est.) | Inferred from API business benchmarks; Amazon, as Series A co-lead, may have provided AWS credits | Low (unconfirmed; potential in-kind from Amazon)
Research / data | Training data licensing, academic dataset access, wet-lab validation (limited) | Low-to-moderate; most training data (UniProt, PDB) is public | ESM3 blog cites public protein sequence databases | Medium (data cost likely low; compute dominates)
G&A / corporate overhead | Legal, finance, HR, facilities for 25-50 FTE | $1–3M/year (est.) | Benchmark for early-stage SF AI company | Low (estimated)

All magnitude estimates are analyst-derived from public proxies; no financial statements or confirmed cost data are available. The dominant cost driver is GPU compute for training and inference. Amazon's lead investor status may have included in-kind cloud credits, which would reduce cash infrastructure spend. Estimates are order-of-magnitude only.

[CI011, CI012, CI013, CI014, CI015]
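Summing the table's own operating-cost ranges gives a monthly burn band that can be compared with the $5–15M/month figure used in Section 4.3. The sketch below is an illustration under stated assumptions: annual ranges are divided by 12, all components are assumed to sit at their low (or high) ends simultaneously, and the one-time training run is excluded because it is not a recurring cost.

```python
# Monthly burn band from the cost table's operating components (USD/month).
# ASSUMPTIONS: annual figures divided by 12; one-time ESM3 training excluded.
COMPONENTS = {
    "personnel": (5e6 / 12, 15e6 / 12),   # $5–15M/year
    "forge_inference": (1e6, 5e6),        # $1–5M/month at scale
    "infrastructure": (0.5e6, 2e6),       # $0.5–2M/month
    "g_and_a": (1e6 / 12, 3e6 / 12),      # $1–3M/year
}

def burn_band(components: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum all low ends and all high ends (a deliberately wide band)."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high

low, high = burn_band(COMPONENTS)
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M per month")
```

The roll-up comes to about $2.0M to $8.5M per month, overlapping but sitting below the Section 4.3 range. The gap suggests that the 4.3 burn basis (which cites compute of $2–7M/month) folds in compute beyond Forge inference and serving, such as ongoing training or reserved capacity, although the report does not itemize it.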
FI002: Unit economics flow: inferred API revenue vs. cost components

Qualitative flow of revenue inputs and dominant cost deductions for the Forge API inference business; all values are estimates.

All values estimated from headcount proxy, GPU pricing benchmarks, and cloud API analogues. No actual revenue, COGS, or gross margin data was disclosed by EvolutionaryScale.

[CI011, CI012, CI014, CI015, CI016]

4.3 Capital Adequacy and the CZI Biohub Transaction

EvolutionaryScale's capital trajectory was a seed round (~$3M, late 2023) followed by a $142M Series A on September 26, 2024 (announced simultaneously by CNBC, Axios, and NVIDIA). The Series A was led by Amazon and NVIDIA with co-investment from Lux Capital, Nat Friedman, and Daniel Gross at a reported ~$1.35B post-money valuation. Total capital raised was approximately $145M. At an estimated burn rate of $5–15M per month (based on headcount, compute, and overhead), the $142M Series A provided a theoretical runway of 9–28 months from the September 2024 close. The actual runway ended in November 2025, only approximately 14 months after the Series A closed, when the EvolutionaryScale team joined CZI Biohub. Alex Rives, co-founder and chief scientist, became Head of Science at CZI Biohub. The transaction terms were not disclosed, and no public filings tied to the transaction (such as a Form D amendment or an acquisition notice) have been located in EDGAR or other public regulatory databases. Hart-Scott-Rodino premerger notifications, if any were required, are filed confidentially with the FTC and DOJ, so their absence from public sources is not informative. The financial implications for the Series A investors (Amazon, NVIDIA, Lux Capital, and angels) are unclear. If the CZI Biohub transaction was a cash acquisition of the entity, investors would have received a distribution. If it was a talent acquisition (acqui-hire) without an entity purchase, a large share of the $142M would likely have been spent by the time of the transaction, leaving investors with limited recovery. The absence of any public deal-value disclosure leaves this risk assessment open-ended. A notable compliance gap was also identified during research: no Form D filings appear in SEC EDGAR under any name variant for EvolutionaryScale (searches covered "EvolutionaryScale," "Evolutionary Scale," "Evolutionary Scale Inc," and key person "Alexander Rives"). Issuers relying on Regulation D exemptions must file Form D with the SEC within 15 days of the first sale.
The absence of any Form D points to one of three explanations: filing under a different legal entity name, reliance on an alternative securities exemption (e.g., Regulation S for offshore investors), or a filing gap. [CI018][CI019][CI020][CI021][CI022][CI023][CI024][CI025][CI026][CI027][CI028]
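The runway figures follow from simple division, and the same inputs show how much of the Series A the estimated burn would have consumed by the transaction date. The sketch below uses the source's dates and $5–15M/month burn range; it ignores the seed round, interest income, and any in-kind AWS credits, and the helper names are illustrative.

```python
from datetime import date

SERIES_A_USD = 142e6
SERIES_A_CLOSE = date(2024, 9, 26)   # source
CZI_TRANSACTION = date(2025, 11, 6)  # source
DAYS_PER_MONTH = 365.25 / 12         # average month length

def runway_months(capital: float, monthly_burn: float) -> float:
    return capital / monthly_burn

def months_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH

def spent_by(capital: float, monthly_burn: float, months: float) -> float:
    """Capital consumed after `months`, capped at the amount raised."""
    return min(capital, monthly_burn * months)

elapsed = months_between(SERIES_A_CLOSE, CZI_TRANSACTION)
print(f"elapsed: {elapsed:.1f} months")
for burn in (5e6, 15e6):
    print(f"burn ${burn / 1e6:.0f}M/mo: "
          f"runway {runway_months(SERIES_A_USD, burn):.1f} mo, "
          f"spent ${spent_by(SERIES_A_USD, burn, elapsed) / 1e6:.0f}M")
```

The elapsed period is 406 days, or about 13.3 months. At the low end of the burn range roughly $67M (about half) of the Series A would have been spent; at the high end it would have been exhausted before the transaction.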

Capital adequacy and financing dependency
Metric | Estimated Value | Basis | Confidence | Key Assumption / Caveat
Series A proceeds (as announced) | ~$142M | CNBC, Axios, MIT Tech Review announcements; NVIDIA partner news | High | Close date Sep 26, 2024; standard ~99% close assumed
Seed round capital | ~$3M (est.) | Crunchbase, NVIDIA seed investment news; dollar amount not publicly confirmed | Low | Amount not publicly confirmed; investor names confirmed
Total capital raised (pre-CZI) | ~$145M | Derived from Series A + seed estimates | Low-to-medium | Seed amount unconfirmed
Estimated monthly burn rate | $5–15M/month (est.) | Headcount (25-50 FTE × $200-300K/year) + compute ($2-7M/mo) + overhead | Low | No actual burn data; very wide range; lower end plausible with AWS in-kind credits
Theoretical runway from Series A (Sep 2024) | 9–28 months (est.) | $142M ÷ estimated $5–15M/month burn | Low | Runway actually ended with CZI transaction, Nov 2025 (~14 months)
CZI Biohub transaction (Nov 2025) | Terms undisclosed | CNBC Nov 2025 article; Biohub.org announcement | High (transaction confirmed; terms unknown) | Whether investors received a return is not publicly disclosed
Debt / project-finance obligations | None identified | SEC EDGAR search; no public debt disclosure | Low | Absence of disclosure does not confirm absence of debt
SEC Form D filings | Zero filings found in EDGAR | SEC EDGAR full-text search for 'EvolutionaryScale', 'Evolutionary Scale', 'Evolutionary Scale Inc' | High (multiple searches returned 0 results) | May be filed under a different legal entity name; Reg S offshore exemption possible

Capital adequacy assessment is severely hampered by private-company non-disclosure. The CZI Biohub transaction on Nov 6, 2025 effectively marks the end of EvolutionaryScale as an independent commercial entity. Forward capital adequacy analysis is moot for standalone purposes; all future diligence must be directed at CZI Biohub. The funding chronology (round dates, investors) is established in the Company Overview chapter; this table focuses on adequacy and compliance signals.

[CI018, CI019, CI020, CI022, CI023, CI024]
FI003: Financial estimate ranges: burn rate, runway, and valuation inputs

Source-backed and analyst-estimated ranges for key financial parameters; all values carry low-to-medium confidence due to non-disclosure.

Low/mid/high bounds derived from: (1) headcount proxy (LinkedIn 11-50 bracket), (2) cloud GPU pricing benchmarks, (3) CNBC/Axios confirmed $142M raise, (4) peer fundraising analogues. No EvolutionaryScale financial statements available.

[CI018, CI019, CI022, CI011]

4.4 Peer Capital Benchmarks and Positioning

EvolutionaryScale's $142M Series A and $1.35B valuation sit in the mid-to-upper range of protein AI fundraising activity, above pure infrastructure plays like Profluent (~$44M) and Cradle (~$73M) but well below full-stack drug discovery AI companies such as Generate:Biomedicines (~$700M+) and Xaira Therapeutics ($1B at founding). The $1.35B valuation implied a significant premium for a pre-revenue foundation-model protein AI company with fewer than 50 employees. On a capital-per-head basis, EvolutionaryScale was exceptionally capital-intensive at approximately $3–6M raised per employee (assuming 25–50 FTE), reflecting the cost of world-class AI research talent and compute-heavy model development rather than a scaled commercial operation. By contrast, publicly traded AI drug discovery companies (AbSci, Recursion) have far larger employee bases relative to capital raised, with spending spread across clinical and manufacturing operations. The peer comparison underscores that EvolutionaryScale was structured as a frontier research vehicle, not a scaled commercial enterprise, at the time of its Series A, which raises questions about the commercial revenue ramp hypothesis embedded in the $1.35B valuation. [CI029][CI030][CI031][CI032]
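The $3–6M-per-head figure depends on the assumed headcount. A minimal check, taking the ~$145M total raised and the 25–50 FTE assumption from Section 4.2, plus the bottom of LinkedIn's 11-50 bracket for comparison:

```python
TOTAL_RAISED_USD = 145e6  # report's ~$145M estimate (seed + Series A)

def capital_per_head(total_usd: float, headcount: int) -> float:
    return total_usd / headcount

for fte in (11, 25, 50):  # 11 = bottom of the LinkedIn 11-50 bracket
    print(f"{fte} FTE: ${capital_per_head(TOTAL_RAISED_USD, fte) / 1e6:.1f}M per head")
```

At 25–50 FTE the ratio is $2.9–5.8M, matching the text; if headcount sat near the bottom of the bracket, capital per head would be roughly $13M, which strengthens rather than weakens the capital-intensity point.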

Peer capital benchmarks
Company | Total Capital Raised (est.) | Stage / Focus | Post-money Valuation | Capital Efficiency Note
EvolutionaryScale | ~$145M | Series A; protein AI foundation model; team joined CZI Biohub Nov 2025 | ~$1.35B (Sep 2024) | High capital-per-head (~$3-6M/FTE); no disclosed revenue at CZI transaction
Profluent Bio | ~$44M | Series A; protein design AI; open-sourced ProGen2 | ~$200-300M est. | Lower capital base; narrower scope; more commercially focused
Cradle.bio | ~$73M | Series B; AI protein engineering platform | ~$400M est. | Comparable capital efficiency to EvolutionaryScale
Generate:Biomedicines | ~$700M+ | Series C+; full-stack AI protein therapeutics | ~$2B est. | Much larger scale; targeting drug revenue, not API
Xaira Therapeutics | ~$1B | Seed/launch; full-stack AI drug discovery | ~$2.5B est. at launch | Best-capitalized protein AI startup; broadest scope
AbSci (ABSI) | $200M+ (pre-IPO); public since 2021 | Biomanufacturing + AI drug design; ~350 employees | Market cap ~$500M (2026 est.) | Much larger headcount; different model (wet-lab + AI)
Isomorphic Labs | Undisclosed (Alphabet subsidiary) | Series A; AI-first drug design; DeepMind spin-out | N/A (subsidiary) | Structurally non-comparable; backed by corporate parent

Peer data from public news sources (Axios, Crunchbase, Wikipedia) and company websites. Valuations are post-money estimates at last-known round; not confirmed by audited filings. Cradle, Profluent, Generate, and Xaira data sourced from publicly available funding announcements and analyst databases. All comparisons are approximate and for relative context only.

[CI029, CI030, CI031, CI032]
FI004: Capital structure waterfall: estimated deployment of Series A capital

Estimated waterfall showing how the $142M Series A capital was likely deployed through the ~14-month runway to the CZI Biohub transaction.

All values are analyst estimates derived from headcount, GPU benchmarks, and timeline analysis. No actual use-of-funds disclosure was made by EvolutionaryScale. Figures are illustrative midpoints of the estimated ranges.

[CI018, CI019, CI022, CI023, CI025]

4.5 Financial Information Gaps and Diligence Path

The public financial record for EvolutionaryScale is nearly empty. As a private company, EvolutionaryScale was not required to file financial statements with the SEC, and the absence of Form D filings further limits what can be verified through public regulatory channels. None of the following key metrics appears in any public source reviewed: actual revenue, ARR, gross margin, customer count, churn rate, cash position, or confirmed monthly burn rate. Crunchbase misclassifies the $142M Series A as a "seed investment," which illustrates the unreliability of private-market data aggregators for verified financial analysis. After the CZI transaction, financial diligence must flow through CZI Biohub, a nonprofit supported by the Chan Zuckerberg Initiative. This fundamentally changes the diligence path: rather than standard VC financial due diligence, the relevant inquiry becomes (1) the terms of the CZI transaction and what was paid to former investors and shareholders, (2) the ongoing operational status and commercialization strategy for the Forge API and ESM model family under CZI, and (3) whether a residual EvolutionaryScale entity holding the IP continues to operate independently. All outstanding financial analysis requires access to CZI Biohub's internal documentation and any deal disclosures. [CI033][CI034][CI035][CI036]

Public financial gaps and diligence path
Missing Metric | Impact on Analysis | Diligence Path | Priority
Revenue and ARR (Forge API) | Cannot assess commercial product viability, pricing-market fit, or revenue trajectory | Request from CZI Biohub commercial team; Forge API analytics data | Critical
Gross margin (API inference) | Cannot assess unit economics or path to profitability | Request cost-of-goods breakdown from CZI Biohub; benchmark against cloud AI API peers | High
Confirmed burn rate | Cannot confirm capital adequacy or verify runway; estimates span a 3× range ($5–15M/month) | Request historical monthly P&L from EvolutionaryScale entity records via CZI Biohub | High
CZI Biohub transaction terms | Cannot assess investor return profile; cannot determine whether the $142M Series A had exit value | Request deal term sheet and any investor distribution records; check SEC for a Form 8-K analog if any entity became subject to reporting | Critical
SEC Form D filings | Potential regulatory compliance gap; Reg D offerings require a Form D within 15 days of first sale | Search EDGAR under all possible legal entity names; request from CZI Biohub legal team | High
Customer count and enterprise pipeline | Cannot validate commercial traction or sales efficiency of Forge API | Request from CZI Biohub; check LinkedIn for any public customer references | High
Forge API operational status post-CZI | Cannot assess whether the commercial product is being maintained or wound down under CZI | Direct test of forge.evolutionaryscale.ai API; request roadmap from CZI Biohub | High
Investor return from CZI transaction | Cannot determine whether Amazon, NVIDIA, or Lux Capital received a return on $142M | Review any public CZI acquisition announcements, press releases, or SEC-equivalent disclosures; direct inquiry to investors | Critical

All gaps confirmed by reviewing EDGAR, company website, CNBC, Biohub.org, Crunchbase, and NVIDIA announcements as of May 2026. The CZI Biohub transaction is the dominant gap: it absorbs all other financial diligence priorities into a single inquiry about deal terms and residual commercial continuity.

[CI033, CI034, CI035, CI036]

4.6 Financial Verdict

EvolutionaryScale's financial narrative is one of exceptional early-stage capital formation followed by an abrupt strategic pivot that ended the standalone commercial path. The $142M Series A at a $1.35B valuation, led by Amazon and NVIDIA, was structurally a strategic bet on ESM3 as a foundational protein AI layer, not a near-term commercial revenue play. The commercial product (Forge API) opened to public beta in January 2025 with no disclosed revenue metrics, no disclosed customer count, and no pricing transparency. The CZI Biohub transaction in November 2025, within 14 months of the Series A close, strongly suggests that the company did not achieve a commercial breakout as a standalone business before moving under the CZI umbrella. Revenue quality: insufficient data to assess. The Forge API mechanism was sound (usage-based inference on a proprietary foundation model), but the addressable market for paid protein AI API access outside a broader drug-discovery workflow is narrow, and open-weight alternatives (Meta's ESM2) cap willingness to pay for commodity embedding use cases. Capital intensity was very high relative to commercial traction. The missing SEC Form D filings are a noteworthy compliance signal. The principal diligence blockers are: (1) CZI Biohub transaction terms; (2) confirmed Forge API revenue or customer metrics; (3) the actual pre-transaction burn rate; and (4) the investor return profile from the CZI deal. No underwriting-grade financial conclusion on investment return can be drawn from public sources alone. [CI033][CI034][CI035][CI036][CI037][CI038]

4.7 Exhibits

Chapter 05

05 Product & Technology

5.1 ESM Product Portfolio and Model Family

EvolutionaryScale offers two distinct product lines built on the ESM (Evolutionary Scale Modeling) foundation: the ESM3 generative protein language model and the ESM-C (Cambrian) protein embedding family. Together they span six models across open-weight and commercial-API tiers.

ESM3 is offered in three weight classes: ESM3-small-2024-08 (1.4 billion parameters), ESM3-medium-2024-08 (7 billion parameters), and ESM3-large-2024-03 (98 billion parameters). ESM3-small is the only ESM3 variant with open weights, available under the Cambrian Non-Commercial License Agreement via HuggingFace (as esm3-sm-open-v1). ESM3-medium and ESM3-large are commercial and accessible exclusively through the Forge API. The 98B ESM3-large is the model used to design esmGFP, a de novo fluorescent protein with only 58% sequence identity to the nearest known fluorescent protein, representing approximately 500 million years of equivalent evolutionary divergence and validated by peer-reviewed publication in Science (January 2025).

ESM-C (Cambrian) is a separate embedding-focused model family with three sizes: ESMC-300M, ESMC-600M, and ESMC-6B. ESMC-300M and ESMC-600M carry open weights under the same Cambrian Non-Commercial License. ESMC-6B is accessible via the Forge API for academic users and via AWS SageMaker JumpStart for commercial deployments. ESM-C models use a Pre-LN transformer architecture with rotary positional embeddings and SwiGLU activations; EvolutionaryScale's own benchmarks describe them as state-of-the-art sequence representation models at their respective scales.

The Forge API (forge.evolutionaryscale.ai) is the primary commercial monetization vehicle. Opened to public beta in January 2025, concurrent with the Science publication, Forge provides synchronous and asynchronous REST API access to the ESM3 and ESMC model families, with a Python SDK available via pip; the SDK includes a batch executor for high-throughput workloads. Pricing for commercial Forge access is not publicly disclosed; academic access terms are available directly through the platform.[CE001, CE002, CE003, CE004, CE005, CE006]
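The lineup described above can be captured as a small data structure that makes the split between open-weight and commercial access explicit. This is a hypothetical illustration: the class, field names, and channel labels are invented for the sketch, model names follow this report's usage rather than confirmed API identifiers, and availability reflects the report's description, not a live check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EsmModel:
    name: str            # model name as used in this report (not an API identifier)
    family: str          # "ESM3" (generative) or "ESM C" (embedding)
    params_b: float      # reported parameter count, in billions
    channels: frozenset  # HYPOTHETICAL labels for how the model is reached

LINEUP = (
    EsmModel("ESM3-small-2024-08", "ESM3", 1.4, frozenset({"open-noncommercial"})),
    EsmModel("ESM3-medium-2024-08", "ESM3", 7.0, frozenset({"forge"})),
    EsmModel("ESM3-large-2024-03", "ESM3", 98.0, frozenset({"forge"})),
    EsmModel("ESMC-300M", "ESM C", 0.3, frozenset({"open-noncommercial"})),
    EsmModel("ESMC-600M", "ESM C", 0.6, frozenset({"open-noncommercial"})),
    EsmModel("ESMC-6B", "ESM C", 6.0, frozenset({"forge-academic", "sagemaker-commercial"})),
)

# Channels this report describes as permitting commercial use.
COMMERCIAL = frozenset({"forge", "sagemaker-commercial"})

def commercial_models(lineup=LINEUP):
    """Names of models reachable through at least one commercial channel."""
    return [m.name for m in lineup if m.channels & COMMERCIAL]

def open_weight_models(lineup=LINEUP):
    """Names of models distributed as downloadable (non-commercial) weights."""
    return [m.name for m in lineup if "open-noncommercial" in m.channels]

if __name__ == "__main__":
    print(commercial_models())
    print(open_weight_models())
```

Every open-weight model falls in the non-commercial bucket, which is why commercial users are routed to Forge or SageMaker.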

ESM Product Module / Asset Matrix
Model / Product | Parameter Scale | Availability / License | Primary Use | Diligence Gap
ESM3-small-2024-08 | 1.4 B | Open weights: Cambrian Non-Commercial License (HuggingFace esm3-sm-open-v1) | Research protein generation; local fine-tuning on non-commercial datasets | No commercial use permitted; limited benchmarks vs. larger ESM3 variants
ESM3-medium-2024-08 | 7 B | Forge API only; commercial pricing undisclosed | Mid-scale protein design; Forge API customers | Pricing not public; no public benchmark comparison vs. ESM3-small
ESM3-large-2024-03 | 98 B | Forge API only; flagship commercial model | High-complexity protein design; generative design at esmGFP scale | Inference cost not disclosed; SLA terms not public
ESMC-300M (esmc-300m-2024-12) | 300 M | Open weights: Cambrian Non-Commercial License (HuggingFace) | Protein sequence embeddings for ML pipelines; research | Non-commercial only; no independent third-party accuracy benchmark
ESMC-600M (esmc-600m-2024-12) | 600 M | Open weights: Cambrian Non-Commercial License (HuggingFace) | Enhanced protein embeddings; research and academic fine-tuning | Non-commercial only; 1,490 HuggingFace downloads suggest early adoption
ESMC-6B | 6 B | Forge API (academic) + AWS SageMaker JumpStart (commercial) | Enterprise-scale protein sequence embedding and similarity search | Commercial pricing undisclosed; SageMaker instance cost is user-borne
Forge API (forge.evolutionaryscale.ai) | Service (all ESM3/ESMC models) | Public beta since January 2025; commercial subscription model | Programmatic access to all models; sync and async inference; batch executor | Pricing structure, usage tiers, uptime SLA, and customer list not public

Model sizes and parameter counts are sourced from official EvolutionaryScale blog posts, the GitHub ESM repository README, and the HuggingFace model cards. HuggingFace download counts reflect a 30-day snapshot as of the research date and may fluctuate. Pricing for Forge API and SageMaker commercial tiers is not publicly disclosed; entries noting undisclosed pricing reflect a confirmed absence of public pricing at the time of research.

ESM Workflow / Use-Case Table
Use Case | Target User | ESM Tool | Workflow Step | Validation Evidence
De novo protein design | Protein engineers, drug discovery researchers | ESM3-large (Forge API) | Specify partial sequence or structure constraint; generate full candidate proteins; iterate | esmGFP: peer-reviewed Science 2025; 341 citations
Protein sequence embedding for ML pipelines | Academic researchers, bioinformatics teams | ESMC-300M or ESMC-600M (open weights) | Embed protein sequences; feed embeddings into downstream classifiers or clustering | 6,320 ESMC-300M HuggingFace downloads; community citations in 129+ bioRxiv papers
Enterprise-scale commercial embedding | Enterprise bioinformatics, pharma R&D | ESMC-6B (AWS SageMaker JumpStart) | Deploy via CloudFormation stack (15–25 min); run batch protein similarity search | AWS SageMaker JumpStart listing; GitHub esm-sagemaker CloudFormation docs
Structure-conditioned protein variant generation | Computational biologists | ESM3-small (open weights, local GPU) | Provide partial structure tokens as conditioning; generate sequence variants | GitHub ESM README examples; ESM3 architecture multitrack tokenization
Function-guided protein design | Drug discovery, enzyme engineering | ESM3 (Forge API) | Specify function annotation keywords; jointly optimize sequence and structure outputs | ESM3 Science paper benchmark results; ESM3 blog (company-claimed)
Academic research model fine-tuning | Academic labs | ESMC-300M (open weights) | Fine-tune ESMC on proprietary protein datasets for domain-specific tasks | Cambrian Non-Commercial License permits fine-tuning for non-commercial research

Use cases are derived from official EvolutionaryScale blog posts, the GitHub ESM README, the Science paper (Hayes et al., 2025), and HuggingFace community download data. The esmGFP use case is experimentally validated; the other use cases reflect documented capability claims.
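The embedding workflow in the table (embed sequences, then compare or cluster them) can be sketched end to end. This is a minimal sketch under a loud assumption: a one-hot encoding of the 20 standard amino acids stands in for per-residue ESM C hidden states, since running a real model is out of scope here. The pipeline shape (per-residue vectors, mean pooling to a fixed-length embedding, cosine-similarity ranking) is a common pattern for protein embeddings, not a documented EvolutionaryScale procedure.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def per_residue_vectors(seq):
    """STAND-IN for per-residue model embeddings: one-hot over the 20 standard
    amino acids. A real pipeline would use ESM C hidden states instead."""
    vecs = []
    for aa in seq.upper():
        v = [0.0] * len(AMINO_ACIDS)
        if aa in AMINO_ACIDS:
            v[AMINO_ACIDS.index(aa)] = 1.0
        vecs.append(v)
    return vecs

def mean_pool(vectors):
    """Average per-residue vectors into one fixed-length sequence embedding."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_similarity(query, database):
    """Return (name, score) pairs sorted from most to least similar to query."""
    q = mean_pool(per_residue_vectors(query))
    scored = [(name, cosine(q, mean_pool(per_residue_vectors(s))))
              for name, s in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    db = {"near": "MKTAYIAR", "far": "WWWWGGGG"}
    print(rank_by_similarity("MKTAYIAK", db))
```

Because the stand-in only captures amino acid composition, two sequences with the same residues in a different order score identically; learned embeddings such as ESM C's are meant to capture context that this toy cannot.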

FE001: Product architecture map

Five-layer architecture of EvolutionaryScale's ESM protein AI platform from training data through deployment.

Layer boundaries are conceptual; exact service architecture of the Forge API and internal infrastructure are not publicly documented.

[CE001, CE002, CE003, CE009, CE019, CE022]

5.2 Technical Architecture: Multitrack Transformer, Training Scale, and Tokenization

ESM3's defining architectural innovation is its multitrack transformer design: the model processes three parallel token sequences (amino acid sequence tokens, structure tokens encoding 3D coordinates, and function annotation tokens built from keyword-based GO term labels) within a unified transformer framework. Each track uses discrete tokenization. Structure coordinates are encoded via a vector quantized variational autoencoder (VQVAE) into a finite codebook of structural tokens, enabling the model to natively read and generate three-dimensional protein structure without continuous coordinate regression. Pre-training uses a masked language modeling (MLM) objective applied across all three tracks simultaneously, allowing the model to learn joint representations spanning sequence, structure, and function. The 98-billion-parameter ESM3-large model was trained with 1.07×10²⁴ floating-point operations on approximately 2.78 billion protein sequences (771 billion unique tokens) on the Andromeda HPC cluster, which uses NVIDIA H100 GPUs and Quantum-2 InfiniBand networking. NVIDIA reports that ESM3-large uses approximately 25× more FLOPs and 60× more data than its predecessor ESM2. A preference-tuning alignment stage was applied to ESM3-large to improve how reliably its generations satisfy protein design prompts. ESM-C (Cambrian) has a different architectural profile: a Pre-Layer Normalization (Pre-LN) transformer with rotary positional embeddings (RoPE) and SwiGLU feed-forward activations, pre-trained with masked language modeling. ESMC-300M (30 layers, hidden width 960) was trained with 1.26×10²² FLOPs; ESMC-600M (36 layers, width 1152) with 2.17×10²² FLOPs; and ESMC-6B (80 layers, width 2560) with 2.37×10²³ FLOPs. Training data spans three large sequence databases, UniRef (83 million clusters), MGnify (372 million), and JGI metagenomics (2 billion clusters), all clustered at 70% sequence identity.
EvolutionaryScale's GitHub organization also hosts a fork of DeepEP, DeepSeek's open-source CUDA/NCCL library for Mixture-of-Experts expert-parallel communication, originally tuned for H800 GPUs. With 1,253 GitHub stars, the fork signals interest in large-scale distributed training and inference infrastructure; because DeepEP originated at DeepSeek, it is not by itself evidence of proprietary HPC engineering at the company.[CE009, CE010, CE011, CE012, CE013, CE014]
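To make the three-track idea concrete, the toy below tokenizes each track into discrete ids, uses a nearest-codebook lookup as the quantization step of a VQ-VAE (the encoder and decoder networks are omitted), and applies masked-language-model corruption to all tracks at once. It is a hedged illustration, not ESM3's implementation: the 2-D structure features, 3-entry codebook, function keyword ids, mask id, and fixed masking rate are all assumptions chosen for readability.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = -1  # HYPOTHETICAL mask id shared by all tracks in this toy

def tokenize_sequence(seq):
    """Sequence track: one discrete token per residue (index into AMINO_ACIDS)."""
    return [AMINO_ACIDS.index(aa) for aa in seq]

def quantize(features, codebook):
    """Structure track, VQ step: replace each continuous per-residue feature
    vector with the index of its nearest codebook entry (squared distance)."""
    tokens = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, code)) for code in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens

def mask_tracks(tracks, rate, rng):
    """MLM-style corruption applied to every track independently. Returns the
    corrupted tracks and, per track, the original tokens at masked positions
    (the prediction targets)."""
    corrupted, targets = {}, {}
    for name, toks in tracks.items():
        out, tgt = [], {}
        for i, t in enumerate(toks):
            if rng.random() < rate:
                out.append(MASK)
                tgt[i] = t
            else:
                out.append(t)
        corrupted[name], targets[name] = out, tgt
    return corrupted, targets

if __name__ == "__main__":
    # Toy inputs: 4 residues, 2-D "structure features", a 3-entry codebook,
    # and made-up function keyword ids. All values are illustrative.
    features = [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2], [2.0, 2.1]]
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    tracks = {
        "sequence": tokenize_sequence("MKTA"),
        "structure": quantize(features, codebook),
        "function": [7, 7, 3, 3],
    }
    corrupted, targets = mask_tracks(tracks, rate=0.3, rng=random.Random(0))
    print(tracks)
    print(corrupted)
    print(targets)
```

In a multitrack model, the transformer would see the corrupted tracks together and be trained to predict the entries recorded in `targets`; how ESM3 combines tracks internally is not shown here.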

ESM Technology / Operating Architecture Table
Component | Description / Specification | Key Metric | Primary Source
ESM3 multitrack transformer | Three parallel input/output tracks: amino-acid sequence tokens, VQVAE structure tokens, function keyword tokens; unified attention across tracks | 3 tracks; 1.4B / 7B / 98B params across 3 model sizes | ESM3 blog (official); Hayes et al. Science 2025
VQVAE structure tokenizer | Vector quantized variational autoencoder encodes 3D protein backbone coordinates as discrete structural tokens from a finite codebook | Discrete codebook; enables native 3D structure generation without coordinate regression | ESM3 blog (official); ESM3 preprint (bioRxiv)
ESM3 pre-training and alignment | Masked language modeling (MLM) across all three tracks; preference-tuning alignment of ESM3-large for protein design prompts | 1.07×10²⁴ FLOPs (98B model); 2.78B proteins; 771B unique tokens | ESM3 blog (official); NVIDIA blog; Science paper
ESM-C architecture | Pre-LN transformer with RoPE positional embeddings and SwiGLU activations; masked language modeling pre-training; three sizes: 300M / 600M / 6B | 300M: 30 layers × 960 width; 600M: 36 × 1152; 6B: 80 × 2560 | ESM-C blog (official)
ESM-C training compute and data | Training data: UniRef (83M sequence clusters), MGnify (372M), JGI metagenomics (2B clusters), clustered at 70% identity; FLOPs per model: 300M = 1.26e22, 600M = 2.17e22, 6B = 2.37e23 | 2B+ total sequence clusters (largest component from JGI metagenomics) | ESM-C blog (official)
Andromeda HPC training cluster | NVIDIA H100 Tensor Core GPU cluster with Quantum-2 InfiniBand networking; used to train ESM3-large | 25× more FLOPs and 60× more data vs. predecessor ESM2 | NVIDIA blog (partner-proof)
DeepEP MoE expert-parallelism library (fork) | Fork of DeepSeek's open-source CUDA/NCCL library for Mixture-of-Experts expert-parallel communication; kernels tuned for H800 GPUs | 1,253 GitHub stars; signals interest in large-scale distributed-training infrastructure | GitHub evolutionaryscale/DeepEP (developer-signal)

Architecture parameters (layer counts, hidden widths) for ESM-C are from the official ESM-C blog. Training FLOPs for ESM3-large are from the official ESM3 blog and Science paper. ESM-C FLOPs are from the ESM-C blog. Infrastructure details (Andromeda cluster, H100 GPUs) are from the NVIDIA blog announcement and corroborated by the ESM3 Science paper.
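The reported FLOP budgets can be sanity-checked with the widely used approximation C ≈ 6·N·D for dense transformer training, where N is the parameter count and D the number of training tokens processed. This is a rough cross-check under stated assumptions: it ignores attention and multitrack-embedding overhead, treats the ESM-C size labels (300M, 600M, 6B) as exact parameter counts, and is not how EvolutionaryScale reports its training runs.

```python
def implied_training_tokens(flops: float, params: float) -> float:
    """Invert the C ~= 6*N*D approximation (N = parameters, D = tokens processed)."""
    return flops / (6.0 * params)

# FLOP and parameter figures as reported in this section.
REPORTED = {
    "ESM3-large": (1.07e24, 98e9),
    "ESMC-300M": (1.26e22, 300e6),
    "ESMC-600M": (2.17e22, 600e6),
    "ESMC-6B": (2.37e23, 6e9),
}

if __name__ == "__main__":
    for name, (c, n) in REPORTED.items():
        d = implied_training_tokens(c, n)
        print(f"{name}: ~{d / 1e12:.1f}T tokens processed")
```

For ESM3-large the estimate is about 1.8 trillion tokens processed, roughly 2.4 passes over the 771 billion unique tokens reported above; that is plausible for multi-epoch training but should be confirmed against the paper rather than inferred from this approximation.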

FE002: Customer workflow / operating flow

Eight-step workflow showing how protein researchers use the ESM3 platform from hypothesis to validated candidate.

Flow is a simplification; feedback loops between experimental results and revised prompts are not shown. Wet-lab validation step is performed by the user, not by EvolutionaryScale.

[CE005, CE007, CE018, CE029]

5.3 Deployment and Ecosystem: Forge API, AWS, NVIDIA, and Community

EvolutionaryScale has built a multi-tier distribution strategy combining open-weight community adoption with commercial API and cloud-marketplace access. The Forge API (forge.evolutionaryscale.ai), opened to public beta in January 2025, is the primary programmatic interface for commercial ESM3 and ESMC use. The official Python client ships in the esm package (pip install esm), hosted at github.com/evolutionaryscale/esm, and supports both synchronous inference and asynchronous batch execution. As of May 2026, the Forge API portal is accessible and operational, though detailed pricing is not publicly listed.

AWS SageMaker JumpStart provides commercial deployment of ESMC-6B via a CloudFormation stack that provisions a dedicated GPU instance in 15–25 minutes. This integration, documented in the esm-sagemaker GitHub repository, targets enterprise bioinformatics customers that need large-scale embedding workflows with predictable SLAs. Amazon was a co-investor in EvolutionaryScale's Series A and is a deployment partner. NVIDIA announced ESM3 integration into its BioNeMo NIM platform for GPU-optimized inference when ESM3 was first released in June 2024; EvolutionaryScale's ESM-C blog (December 2024) lists BioNeMo as a forthcoming distribution channel. The NVIDIA NGC catalog separately lists ESM3 as a resource.

On GitHub, EvolutionaryScale maintains nine public repositories. Beyond the flagship esm repository, notable projects include a fork of DeepSeek's DeepEP (1,253 stars), an NCCL fork, a Hugging Face transformers fork, and a Mamba implementation. The HuggingFace organization (huggingface.co/evolutionaryscale) hosts open-weight model cards: the esm3-sm-open-v1 page shows 3,105 downloads in the prior month and 291 likes, ESMC-300M shows 6,320 downloads, and ESMC-600M shows 1,490 downloads. These metrics indicate meaningful adoption within the research community.[CE019, CE020, CE021, CE022, CE023, CE024]
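The synchronous-versus-batch distinction above is largely about concurrency: a batch executor keeps several requests in flight while bounding the load on the service. The sketch below shows that pattern with Python's standard asyncio library only. It does not use or reproduce the esm SDK's batch executor, whose interface is not described here; `embed_one` is a hypothetical stand-in for a single remote inference call, and the concurrency limit is an arbitrary example value.

```python
import asyncio

async def embed_one(seq: str) -> int:
    """HYPOTHETICAL stand-in for one remote inference call (e.g. a Forge
    request); here it just sleeps briefly and returns the sequence length."""
    await asyncio.sleep(0.01)
    return len(seq)

async def run_batch(seqs, worker, max_in_flight: int = 4):
    """Run `worker` over all sequences with at most `max_in_flight` calls
    outstanding; results come back in input order."""
    sem = asyncio.Semaphore(max_in_flight)

    async def guarded(s):
        async with sem:  # waits here while max_in_flight calls are running
            return await worker(s)

    return await asyncio.gather(*(guarded(s) for s in seqs))

if __name__ == "__main__":
    seqs = ["MKTAYIAK", "GGSG", "MVLSPADKTNVKAAW"]
    print(asyncio.run(run_batch(seqs, embed_one, max_in_flight=2)))  # [8, 4, 15]
```

`asyncio.gather` returns results in the order the awaitables were passed, so outputs stay aligned with inputs even when calls finish out of order; the semaphore caps simultaneous requests, which is the usual way to respect a service's rate limits.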

FE003: Critical dependency map

Directed acyclic graph of EvolutionaryScale's critical platform, infrastructure, and distribution dependencies.

Dependency directions represent data, compute, and ownership flows, not data-volume magnitudes. The Forge API internal serving infrastructure is not publicly documented.

[CE010, CE016, CE019, CE022, CE025, CE035]

5.4 Intellectual Property and Competitive Moat

EvolutionaryScale's IP moat rests on three pillars: (1) publication depth in high-impact venues, (2) proprietary training scale and infrastructure, and (3) the esmGFP demonstration of generative capability in a region of protein sequence space that natural evolution has not explored. The flagship Science publication (Hayes et al., Science, January 2025, Vol 387, Issue 6736, pp. 850–858, DOI 10.1126/science.ads0018) had accumulated 341 citations and 68,494 downloads when citation metrics were observed for this research; 318 of those citations arrived within the first 12 months of publication. The ESM3 preprint on bioRxiv (10.1101/2024.07.01.600583, July 2024) was cited by 129+ downstream papers within its first year. This publication velocity suggests ESM3 is among the most-cited new protein ML methods. The esmGFP result is the strongest public demonstration of ESM3's generative capability. The designed protein carries 96 mutations across its 229 amino acid positions relative to the nearest known fluorescent protein (58% sequence identity, i.e. a normalized Hamming distance of about 42%), placing it at an evolutionary distance comparable to the separation of corals from jellyfish, two deeply diverged lineages within the phylum Cnidaria. EvolutionaryScale filed patents covering esmGFP and related protein design methods. The compute required to replicate ESM3-large (1.07×10²⁴ FLOPs) creates a meaningful cost barrier. Primary competitors are differentiated by capability rather than directly overlapping. AlphaFold3 (DeepMind, May 2024) excels at structure prediction for proteins and their complexes, including small-molecule and antibody complexes, but is not generative in the design sense, and its use is restricted to non-commercial academic research. Chai-1 (Chai Discovery, 2024) focuses on high-accuracy protein complex structure prediction. ESM2 (Meta AI, 2022), an open-weight predecessor family spanning 8M to 15B parameters, provides sequence embeddings but lacks generative joint modeling of sequence, structure, and function.
EvolutionaryScale's unique positioning is in generative protein design that simultaneously reasons over sequence, structure, and function.[CE027, CE028, CE029, CE030, CE031, CE032]
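The identity and distance figures for esmGFP are related by simple arithmetic: for a gap-free alignment of length L with m substitutions, sequence identity is (L − m)/L and the normalized Hamming distance is m/L. The helpers below compute both; they assume pre-aligned, equal-length sequences, whereas published identity figures come from alignments that can include gaps.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def percent_identity(a: str, b: str) -> float:
    """Identity = 1 - normalized Hamming distance (gap-free alignment assumed)."""
    return 100.0 * (1 - hamming(a, b) / len(a))

def identity_from_mutations(mutations: int, length: int) -> float:
    """Percent identity implied by a substitution count over an aligned length."""
    return 100.0 * (length - mutations) / length

if __name__ == "__main__":
    # esmGFP figures from the text: 96 substitutions over 229 positions.
    print(f"{identity_from_mutations(96, 229):.1f}% identity")  # 58.1% identity
```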

ESM Trust / Quality / Compliance Table
Trust / Safety Dimension | Current Public Status | Risk Level | Diligence Path
Dual-use / biosecurity risk (pathogen protein design) | Open weights (ESM3-small, ESMC-300M/600M) carry Cambrian Non-Commercial License restrictions; Forge API requires account acceptance of ToS; no public biosecurity screening policy | High (industry-wide concern; no published biosecurity audit) | Request biosecurity policy document and any third-party biosafety review from EvolutionaryScale / CZI Biohub
Open-weight non-commercial license compliance | Cambrian Non-Commercial License Agreement prohibits commercial use of ESM3-small and ESMC-300M/600M; commercial customers must use Forge API or SageMaker | Medium (license enforcement requires monitoring; grey-area commercial use may go undetected) | Review Cambrian Non-Commercial License Agreement; assess IP protection against unauthorized commercial fine-tuning
Forge API data privacy and retention | No public data retention, deletion, or confidentiality policy for sequences submitted to Forge API | Medium (material concern for pharma customers with proprietary sequence data) | Request Forge API Terms of Service, Privacy Policy, and Data Processing Agreement from EvolutionaryScale / CZI Biohub
Performance robustness under heterogeneous structural inputs | Independent bioRxiv study (Dec 2024) found ESM3 binding predictions deteriorate when distinct per-variant relaxed structures are used; see SE007 | Medium (adversarial finding; scope is specific to heterogeneous structure inputs) | Review Gissing & Smith bioRxiv Dec 2024 preprint; test ESM3 binding prediction with varied structural input strategies
Organizational continuity risk (CZI Biohub transition) | EvolutionaryScale team joined CZI Biohub in November 2025; future product roadmap governed by CZI Biohub rather than an independent startup | Medium (dependency on non-profit mission and funding continuity) | Monitor CZI Biohub announcements; confirm Forge API SLA commitments post-transition

Trust and compliance status is derived from public sources only. Biosecurity policies, Forge API data retention policies, and any independent security audits are not publicly available. The table reflects publicly documented controls and known gaps as of May 2026.

FE004: Product maturity / capability map

Capability and maturity comparison across ESM3 (generative), ESMC (embedding), Forge API, and open-weight tiers on six dimensions.

Capability ratings are evidence-based assessments from public sources. Internal performance benchmarks and Forge API SLA details have not been disclosed.

[CE001, CE002, CE010, CE024, CE033, CE039]

5.5 Product Roadmap, CZI Biohub Transition, and Responsible Development

EvolutionaryScale raised a $142 million Series A in September 2024 led by Lux Capital, with participation from Amazon and NVIDIA, following an earlier seed investment from NVIDIA. In November 2025, the company's team joined CZI Biohub as part of a major "Frontier AI & Biology" initiative announced by the Chan Zuckerberg Initiative. Under this transition, co-founder and chief scientist Alex Rives became head of science at Biohub, and the EvolutionaryScale research team was integrated into Biohub's combined team of biological scientists, AI engineers, and technologists. CZI Biohub has announced a compute expansion to 10,000 GPUs by 2028 to support this initiative. As of the May 2026 report date, the Forge API and open-weight model distributions remain operational. EvolutionaryScale's public benefit company (PBC) charter and the Cambrian Non-Commercial License on open weights encode a commitment to research access while reserving commercial capabilities for the Forge API revenue model. Key trust and safety dimensions require diligence attention. The dual-use risk of generative protein design — including potential misuse for pathogen engineering — is an industry-wide concern. EvolutionaryScale addresses this through the non-commercial license restriction on open weights and access controls on the Forge API, but has not publicly documented a biosecurity screening policy or independent biosafety audit. An independent BioRxiv preprint published in December 2024 found that ESM3's binding prediction performance deteriorates when distinct, per-variant relaxed protein structures are used as inputs, compared to using a single consistent structure as the backbone — a "More Structure, Less Accuracy" paradox that diligence teams should investigate for deployment scenarios involving heterogeneous structural inputs. 
Data retention policies for sequences submitted to the Forge API have not been publicly disclosed, which may be a compliance concern for enterprise customers in regulated industries. No SEC Form D filings were found for EvolutionaryScale; because private issuers raising capital under Regulation D generally file Form D, the absence is an open compliance question rather than a consequence of private status.[CE034, CE035, CE036, CE037, CE038, CE039]

ESM Roadmap / Release / Development-Stage Table
Milestone | Date / Timing | Status | Evidence Source
ESM2 released (Meta AI, 8M–15B open weights) | 2022 | Complete: open source, widely adopted by research community | Meta AI blog; HuggingFace (predecessor, not EvolutionaryScale)
EvolutionaryScale founded; ESM3 pre-release development begins | 2023 | Complete: company founded by Meta AI FAIR alumni | NVIDIA seed investment announcement; Crunchbase
ESM3-small open weights released; ESM3 Forge closed beta | June 2024 | Complete: Forge closed beta opened; ESM3-small on HuggingFace | ESM3 official blog (SE001); NVIDIA blog (SE017)
ESM3 preprint posted to bioRxiv (10.1101/2024.07.01.600583) | July 2024 | Complete: 129+ citing papers within first year | bioRxiv preprint (SE006); bioRxiv search (SE008)
Series A fundraise ($142M): Lux Capital, Amazon, NVIDIA | September 2024 | Complete | Axios (SE025); Crunchbase (SE020)
ESM-C (Cambrian) models released: 300M/600M open weights + 6B on Forge | December 2024 | Complete: open weights on HuggingFace; ESMC-6B on Forge | ESM-C blog (SE002); HuggingFace model cards (SE014, SE015)
ESM3 published in Science (Hayes et al., Vol 387, pp. 850–858) | January 16, 2025 | Complete: 341 citations; 68,494 downloads | Science DOI 10.1126/science.ads0018 (SE005); Semantic Scholar (SE026)
Forge API public beta opened | January 2025 | Complete: concurrent with Science publication | ESM3 blog (SE001); GitHub ESM README (SE009)
EvolutionaryScale team joins CZI Biohub (Frontier AI & Biology initiative) | November 2025 | Complete: Alex Rives appointed head of science at Biohub | CZI Biohub blog (SE023)
NVIDIA BioNeMo NIM integration (ESM-C) | Target: 2025/2026 | In progress: listed as "available soon" in ESM-C blog (December 2024) | ESM-C blog (SE002); NVIDIA blog (SE017); NVIDIA NGC catalog (SE018)
CZI Biohub 10,000-GPU compute expansion | Target: by 2028 | Announced: Biohub Frontier AI initiative | CZI Biohub blog (SE023)

Milestone dates are sourced from official EvolutionaryScale blog posts, bioRxiv submission metadata, the Science paper publication date, and news articles reporting the Series A. Future milestones (BioNeMo NIM, 10,000 GPUs) are drawn from NVIDIA and CZI Biohub announcements and represent planned targets, not confirmed deliverables.

5.6 Exhibits

Chapter 06

06 Customers

6.1 Customer Base Segmentation

EvolutionaryScale's customer base is best understood as four access tiers, each with a different buyer profile, access mechanism, and evidence depth. The largest and best-evidenced tier is academic and independent researchers, who access open-weight ESM3 (1.4B, non-commercial license) and ESM-C (300M and 600M, open weights) directly via GitHub and HuggingFace. These users are predominantly computational biologists, structural biologists, and bioinformaticians at universities, research institutes, and government labs. Their use cases include protein sequence representation, structure prediction fine-tuning, functional annotation, antibody design, and downstream model development. No revenue is generated from open-weight users; they represent a top-of-funnel signal for commercial conversion. The second tier comprises commercial cloud platform users reaching ESM models through Amazon Web Services SageMaker Marketplace (ESM-C models available for commercial deployment) and NVIDIA BioNeMo (listed as upcoming). These buyers are typically bioinformatics and computational biology teams at pharmaceutical and biotech companies who prefer cloud-native, infrastructure-managed model deployment over direct API subscriptions. AWS and NVIDIA are channel partners, not end customers; the actual enterprise buyers are their downstream clients. Subscriber counts and deployment metrics are not publicly disclosed. The third tier is Forge API beta users. In January 2025, EvolutionaryScale opened a free, limited-time public beta of the Forge API, providing access to ESM3 and ESM-C models at scale. The Forge API targets academic scientists and commercial builders who need inference beyond the 1.4B open model. Commercial pricing for Forge post-beta has not been publicly announced. API enrollment requires an access token; no user count has been disclosed.
The fourth tier — large pharma R&D buyers paying for enterprise platform access — is the highest-value segment but has the weakest publicly available evidence. Adaptyv Bio (a protein engineering company based in Lausanne, Switzerland) has been confirmed as a named ESM ecosystem partner. No Pfizer, Eli Lilly, Novartis, Roche, or other top-20 global pharma deal has been publicly announced, creating a material commercial proof gap relative to peers Generate Biomedicines and Isomorphic Labs.[CU023, CU031, CU012, CU008, CU010, CU011]

Customer Segmentation Overview
Segment | Buyer / User / Payer | Access Channel | Use Case | Scale / Reach (Est.) | Revenue / Strategic Value | Key Evidence Gap
Academic & independent researchers | Computational/structural biologists, bioinformaticians at universities and research institutes | GitHub (open weights), HuggingFace, PyPI (esm package) | Protein representation, structure prediction fine-tuning, functional annotation, antibody design | 3.1k+ HF downloads (ESM3); 7.8k+ HF downloads (ESM-C); 129+ bioRxiv preprints | Zero direct revenue; top-of-funnel signal; academic citation credibility | No conversion rate from academic to paid user disclosed
Cloud platform enterprise users | Biotech / pharma IT and computational biology teams; AWS and NVIDIA customers | AWS SageMaker Marketplace (ESM-C commercial license); NVIDIA BioNeMo (upcoming) | GPU-managed protein embedding, molecular design, virtual screening | Undisclosed; mediated through AWS/NVIDIA customer bases | High strategic value (channel partners are also $142M-round investors); commercial metrics opaque | Subscriber count, revenue share, and SageMaker usage volume not public
Forge API beta users | Academic and commercial scientists accessing larger ESM3 and ESM-C 6B models | Forge API at forge.evolutionaryscale.ai (token-gated) | Large-scale protein generation; representation at scale beyond open-weight model limits | Undisclosed; free beta since January 2025 | Potential conversion to paid; commercial pricing not yet announced | Beta user count, active usage, and paid conversion plan not disclosed
Biotech / protein engineering companies | Protein engineering startups and CROs (e.g., Adaptyv Bio) | Forge API, SageMaker, or open-weight integration | Protein binder design, antibody optimization, functional engineering | One named partner (Adaptyv Bio); pipeline depth undisclosed (an esm-partner repository exists) | Early-stage; potential future revenue; early product-market-fit signal | No disclosed deal terms, pipeline size, or conversion from pilot to production
Large pharma R&D (gap segment) | Pharma CSOs, VPs of Drug Discovery, BD executives at top-20 global pharma | Enterprise Forge API or direct SageMaker subscription (hypothesized) | AI-assisted drug discovery, target identification, generative lead optimization | Zero publicly confirmed customers as of May 2026; no announced deal | Highest potential value; completely unverified commercially | No named deal analogous to Generate Biomedicines ($1.9B Amgen) or Isomorphic Labs (Lilly/Novartis)

Scale/reach estimates for academic and cloud tiers are derived from HuggingFace download counts and bioRxiv search volume; actual unique-user counts differ and are unknown. Revenue and strategic value assessments are inferred, not disclosed. The large pharma segment is a target market, not a confirmed customer tier.

[CU001, CU007, CU011, CU012, CU017]
FU001: Customer Journey Map: Academic Discovery to Enterprise Deployment

EvolutionaryScale's customer journey from open-weight academic discovery through Forge API trial to commercial cloud deployment and potential enterprise pharma engagement.

Journey stages are inferred from documented access mechanisms and competitive market norms. Stage transition rates (conversion from open-weight to API to enterprise) are entirely unknown. The pharma enterprise stage is a hypothesized destination, not a confirmed outcome.

[CU001, CU007, CU012, CU017, CU034]

6.2 Adoption Trajectory and Open-Access Usage

The clearest and most objectively measurable adoption signals come from open-access channels. On HuggingFace, the biohub/esm3-sm-open-v1 model (the 1.4B open-weight version of ESM3) had approximately 3,110 downloads with 291 likes as of the research date; the biohub/esmc-300m-2024-12 model had approximately 6,320 downloads and 30 likes; and biohub/esmc-600m-2024-12 had approximately 1,490 downloads and 32 likes. Total ESM-C family downloads across the two open models sum to approximately 7,810 as of May 2026. The ESM-C models were updated as recently as two days before the research cache date, indicating active maintenance. These download counts likely under-represent actual usage because many academic users clone the GitHub repository or access model weights via the official esm Python package rather than through the HuggingFace hub directly. On GitHub, the evolutionaryscale/esm repository is the primary open-source distribution channel. The organization maintains nine or more repositories including derivative technical infrastructure (DeepEP with 1,253 stars, NCCL fork with 1,270 stars), model weights, and the esm-partner repository explicitly labeled for partner collaborations. Active commit history through March–May 2026 demonstrates sustained development activity. Academic citation evidence is robust: a Semantic Scholar API search returned 32 papers building on ESM3 as of May 2026, and a bioRxiv search for "evolutionaryscale ESM3" returned 129 preprint results. Named downstream applications include MegSite (nucleic acid binding residue prediction), ProteinReasoner (multi-modal protein language model with chain-of-thought reasoning), iNClassSec-ESM (non-classical secreted protein discovery), and affinity peptide design for chromatographic purification — spanning academic, clinical, and industrial applications. 
The ESM3 Science paper (published January 16, 2025, DOI 10.1126/science.ads0018) provides authoritative academic reception and serves as a credibility anchor for commercial conversations.[CU001, CU002, CU003, CU004, CU005, CU006]

Adoption and Usage Metrics Table
Metric | Value | Date / Period | Source | Confidence | Implication
ESM3-open (1.4B) HuggingFace downloads | ~3,110 | As of May 2026 | HuggingFace org page (biohub/esm3-sm-open-v1) | High: direct read from HF page | Baseline demand signal for open-weight version; understates total usage (GitHub + pip)
ESM-C 300M HuggingFace downloads | ~6,320 | As of May 2026 | HuggingFace org page (biohub/esmc-300m-2024-12) | High: direct read from HF page | Most popular ESM-C model; broader adoption than ESM3-open, likely due to lower compute requirements
ESM-C 600M HuggingFace downloads | ~1,490 | As of May 2026 | HuggingFace org page (biohub/esmc-600m-2024-12) | High: direct read from HF page | Higher capability tier; a smaller set of users willing to bear higher compute cost
Total ESM-C family HF downloads (300M + 600M) | ~7,810 | As of May 2026 | HuggingFace org page aggregation | High: computed from two confirmed values | ESM-C family exceeds ESM3-open by ~2.5×, suggesting representation use cases have broader demand than generation
Downstream papers citing ESM3 (Semantic Scholar) | 32 papers | As of May 2026 | Semantic Scholar API search (query: ESM3 EvolutionaryScale protein language model) | High: API result with known query | Growing downstream research ecosystem; supports academic product-market fit
bioRxiv preprints mentioning ESM3 + EvolutionaryScale | 129 results | As of May 2026 | bioRxiv search for 'evolutionaryscale ESM3' | High: direct search result count | ~4× more preprints than Semantic Scholar-indexed papers; indicates a large unreported usage pipeline
EvolutionaryScale GitHub DeepEP repo stars | 1,253 | As of May 2026 | GitHub org page (evolutionaryscale/DeepEP) | High: direct read from GitHub | Signals developer engagement beyond model users, though the repository is a fork of DeepSeek's DeepEP rather than original work
NCCL fork stars (evolutionaryscale/nccl) | 1,270 | As of May 2026 | GitHub org page (evolutionaryscale/nccl) | High: direct read from GitHub | Indicates GPU infrastructure-level engineering credibility; appeals to enterprise AI infrastructure buyers
Forge API public beta launch | Launched January 2025 | January 2025 | Company blog (evolutionaryscale.ai, January 2025 post) | High: official company announcement | Commercial intent confirmed; exact beta user count not disclosed
Named downstream academic applications (Semantic Scholar, select) | MegSite, ProteinReasoner, iNClassSec-ESM, affinity peptide design (4+ named) | 2025–2026 | Semantic Scholar API result (ESM3 EvolutionaryScale) | High: individual paper citations confirmed | Demonstrates multi-domain downstream use in clinical, basic science, and industrial contexts

HuggingFace download counts reflect retrievals from the HuggingFace hub only, not unique users; usage via pip install, GitHub clone, or SageMaker deployment is excluded. GitHub star counts are developer-interest proxies, not active user counts. Semantic Scholar returns published papers, while the bioRxiv preprint count is ~4x higher (the two searches used different queries). All metrics reflect open-access usage; commercial deployment metrics are entirely opaque.

[CU001, CU002, CU003, CU004, CU005, CU006]
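The derived figures in this subsection (the ~7,810 ESM-C total, the ~10.9k combined HuggingFace total used in the funnel note, the ~2.5x ESM-C-to-ESM3-open ratio, and the ~4x bioRxiv-to-Semantic Scholar ratio) follow from simple arithmetic on the point-in-time reads in the table. A minimal Python sketch that recomputes them, assuming the rounded table values are the inputs; the function name is illustrative, and the repo ids are used only as dictionary labels, with no call to the HuggingFace API.

```python
# Point-in-time counts as reported in the adoption table (rounded, May 2026).
hf_downloads = {
    "biohub/esm3-sm-open-v1": 3_110,
    "biohub/esmc-300m-2024-12": 6_320,
    "biohub/esmc-600m-2024-12": 1_490,
}
semantic_scholar_papers = 32
biorxiv_results = 129


def derived_metrics(downloads: dict, s2_papers: int, biorxiv: int) -> dict:
    """Recompute the aggregate figures quoted in the adoption discussion."""
    # ESM-C repos are identified here by the "esmc" substring in the repo id.
    esmc_total = sum(v for k, v in downloads.items() if "esmc" in k)
    esm3_open = downloads["biohub/esm3-sm-open-v1"]
    return {
        "esmc_total": esmc_total,
        "all_hf_total": sum(downloads.values()),
        "esmc_to_esm3_ratio": round(esmc_total / esm3_open, 2),
        # The two searches used different queries, so this ratio is indicative only.
        "biorxiv_to_s2_ratio": round(biorxiv / s2_papers, 2),
    }


print(derived_metrics(hf_downloads, semantic_scholar_papers, biorxiv_results))
# {'esmc_total': 7810, 'all_hf_total': 10920, 'esmc_to_esm3_ratio': 2.51, 'biorxiv_to_s2_ratio': 4.03}
```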
FU002: Adoption Funnel: Open-Access to Commercial

Estimated top-down funnel from total addressable academic user base through open-weight downloads, API beta enrollment, commercial SageMaker subscriptions, and enterprise pharma partnerships.

Funnel values above the 'Forge API beta enrollees' tier are approximate: the combined HuggingFace download total of ~10.9k (ESM3-open, ESM-C 300M, and ESM-C 600M) plus an estimate of GitHub-only users. Values for Forge API beta enrollees (~200) and SageMaker commercial subscribers (~10) are rough low-end placeholder estimates; EvolutionaryScale has not disclosed these counts, and both carry very high uncertainty. The total addressable researcher community is an industry estimate.

[CU001, CU004, CU005, CU006, CU011, CU017]
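The funnel's implied stage-to-stage conversion can be made explicit. A minimal sketch, assuming the placeholder values stated in the funnel note (~10.9k combined HuggingFace downloads, ~200 Forge API beta enrollees, ~10 SageMaker subscribers) and ignoring both GitHub-only users and the unquantified researcher-community top tier. Because downloads are not unique users and the two lower tiers are undisclosed placeholders, the resulting rates are illustrative only; `stage_conversions` is a hypothetical helper.

```python
from typing import List, Tuple


def stage_conversions(stages: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (transition label, conversion rate) for each adjacent pair of funnel stages."""
    rates = []
    for (upper_name, upper), (lower_name, lower) in zip(stages, stages[1:]):
        if upper <= 0:
            raise ValueError(f"stage {upper_name!r} must be positive")
        rates.append((f"{upper_name} -> {lower_name}", lower / upper))
    return rates


# Placeholder tiers from the funnel note; the bottom two are rough low-end estimates.
funnel = [
    ("HF downloads (combined)", 10_920),
    ("Forge API beta enrollees", 200),
    ("SageMaker subscribers", 10),
]

for label, rate in stage_conversions(funnel):
    print(f"{label}: {rate:.1%}")
```

With these placeholders the sketch prints roughly 1.8% for downloads to beta enrollment and 5.0% for beta enrollment to SageMaker subscription; swapping in disclosed counts, if management provides them, is the only way to make the rates meaningful.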

6.3 Named Deployments and Integration Partners

The named commercial deployment and integration evidence for EvolutionaryScale rests on three channels (one live, one announced, one in free beta) and one named partner. First, AWS SageMaker Marketplace lists ESM-C models for commercial deployment under the Cambrian Inference Clickthrough License Agreement. The GitHub README gives explicit deployment instructions: admin-level AWS access, subscription via the SageMaker Marketplace, and a CloudFormation-based launch taking 15–25 minutes, with GPU costs billed directly to the subscriber's AWS account. This is a verifiable commercial deployment path, though subscriber counts are undisclosed. Second, NVIDIA BioNeMo listed ESM-C as an upcoming integration as of December 2024. BioNeMo targets drug discovery, molecular design, virtual screening, and protein binder design, closely matching ESM-C's intended commercial applications. NVIDIA is also a strategic investor in EvolutionaryScale (Series A co-lead), which gives it a structural incentive for deep integration; the relationship is confirmed by both the Series A announcement (BusinessWire) and a dedicated NVIDIA news release about the seed investment. Third, Adaptyv Bio, a protein engineering company at the Biopole Life Science Campus in Lausanne, Switzerland, is a confirmed named ESM ecosystem partner. Its focus on protein design aligns directly with ESM model capabilities, and the partnership reflects the small, growing biotech segment that can use open-weight or API-based access without large-pharma procurement overhead. Fourth, the Forge API public beta (launched January 2025) shows that the commercial platform is operational, though the number of enrolled users and their conversion to paid status are not disclosed.
The esm-partner GitHub repository (evolutionaryscale/esm-partner, labeled "Repository for partner collaborations") implies a formal partnership pipeline beyond Adaptyv Bio, but no other partners are named publicly. Prior ESM generations (ESM1b, ESM2) did have documented corporate and academic users: BioNTech and InstaDeep fine-tuned ESM models on SARS-CoV-2 spike protein sequences to build a variant early-warning system that flagged all 16 WHO-designated variants; Hie et al. used ESM1v/ESM1b to evolve antibodies; and Shanker et al. used ESM-IF1 to evolve antibodies against SARS-CoV-2. These legacy use cases validate the ESM family's practical utility, but they are not current commercial customers of EvolutionaryScale's paid products.[CU007, CU008, CU009, CU010, CU011, CU013]

Named customer proof table
| Customer / Partner | Segment | Deployment / Integration | Status (Production vs. Pilot) | Outcome / Evidence Quality | Key Limitation |
| --- | --- | --- | --- | --- | --- |
| AWS SageMaker Marketplace (ESM-C) | Cloud platform channel (enterprise biotech/pharma AWS customers) | ESM-C 300M, 600M, 6B available for subscription; CloudFormation deployment; GPU billed to subscriber | Production: live Marketplace listing; deployment documented in GitHub README | High: GitHub README documents specific Marketplace URLs, deployment steps, and SDK integration | Subscriber count and revenue are not public; AWS does not disclose per-ISV usage |
| NVIDIA BioNeMo Platform | Cloud platform channel (enterprise drug discovery teams using NVIDIA hardware) | ESM-C listed as upcoming integration; BioNeMo targets molecular design, virtual screening, protein binder design | Upcoming / planned: announced December 2024 in ESM Cambrian blog; live status not confirmed as of cache date | Medium: confirmed in company blog and NVIDIA BioNeMo platform page; integration status unverified post-announcement | No confirmed live integration or user count; 'soon' language in the December 2024 blog indicates planned, not confirmed |
| Adaptyv Bio | Biotech protein engineering startup (Lausanne, Switzerland) | ESM model integration for protein engineering workflows | Production / partnership: named as ESM ecosystem partner | Low-medium: named partner confirmed; specific use case, model version, and business terms not disclosed | Website content minimal; no case study or quantified outcome published by either party |
| Forge API beta enrollees | Academic and commercial scientists (mixed segment) | Token-gated API access to ESM3 and ESM-C 6B at scale; same SDK as SageMaker | Production-grade API (beta): launched January 2025; free limited-time access | Medium: Forge API operational per GitHub SDK and company blog; enrollment volume undisclosed | Free beta; no revenue; post-beta paid pricing not announced; conversion plan opaque |
| BioNTech / InstaDeep (legacy ESM2 user) | Large biotech / AI company (ESM predecessor generation) | Fine-tuned ESM language model on COVID spike protein sequences for a variant early-warning system | Production: flagged all 16 WHO-designated variants before official designation | High historical quality: documented in ESM3 blog, peer-reviewed context; real-world outcome confirmed | Legacy use of ESM2 (free predecessor model), not a current paying EvolutionaryScale customer |

Coverage is partial: named partners and marketplace listings only. Undisclosed Forge API users, any private enterprise pilots, and any early-stage partner discussions in the esm-partner GitHub repository are excluded. The BioNTech/InstaDeep row documents prior ESM family usage, not a current commercial relationship with EvolutionaryScale. All revenue metrics are null or undisclosed.

[CU007, CU008, CU010, CU011, CU025, CU033]
FU003: Customer Proof Matrix: Evidence Quality by Deployment

Evidence quality, deployment status, outcome specificity, and retention signal across EvolutionaryScale's named and inferred customer deployments.

Matrix assessments are qualitative judgments based on the type and quantity of available evidence for each deployment. Production status for AWS SageMaker and Adaptyv Bio is inferred from documented access mechanisms; EvolutionaryScale has not issued press releases confirming active commercial deployments. BioNTech/InstaDeep row documents historical ESM2 usage, not a current EvolutionaryScale commercial relationship.

[CU007, CU010, CU011, CU025, CU033]

6.4 Retention, Durability, and Satisfaction Evidence

EvolutionaryScale has disclosed no NRR, GRR, customer churn rate, contract renewal statistics, or customer satisfaction scores as of May 2026. That absence is expected at this commercialization stage: the Forge API beta launched only in January 2025, the AWS SageMaker listings are relatively recent, and no enterprise software deal with disclosed terms has been announced. The observable retention signals are therefore indirect: recent HuggingFace model updates (ESM-C updated within days of the research date), active GitHub commits through April–May 2026, and the continuing accumulation of downstream academic papers (nearly four years of work building on ESM models since ESM2's 2022 release). For the Forge API channel, the public beta offers free access as an explicit customer-development tool. The company's January 2025 blog post describes a "public beta, allowing scientists in academia and industry a free limited time preview", which implies post-beta paid conversion as an intended but unverified retention mechanism. The ESM SDK on GitHub exposes the same API across local, Forge, and SageMaker deployment modes (the same code works regardless of endpoint), so moving between deployment modes costs little; this favors retention within the ESM ecosystem architecturally but is unverified at the commercial level. For the AWS SageMaker channel, retention is reinforced by AWS infrastructure lock-in: once a customer deploys ESM-C via CloudFormation inside its AWS environment, migrating to a competing protein LM requires deliberate re-integration effort, which should make the channel sticky. The ESM2 predecessor models, freely available under the MIT license, set an important floor for willingness-to-pay analysis: a customer whose protein representation tasks are satisfied by the free ESM2 (up to 15B parameters) has little incentive to pay for ESM-C commercial access unless performance gains justify the premium.
No quantified performance lift for specific pharmaceutical applications has been demonstrated publicly; this is the critical unresolved gap in the retention evidence.[CU022, CU027, CU028, CU030, CU032, CU035]
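The low-switching-cost argument rests on a common software pattern: client code written against one interface, with the deployment target chosen by configuration. The sketch below illustrates that pattern only. `EmbeddingBackend`, `_DummyBackend`, `BACKENDS`, and `embed_batch` are hypothetical names, not the `esm` package's actual classes or methods, and the backends return toy feature vectors instead of calling any model or endpoint.

```python
from typing import Dict, List, Protocol


class EmbeddingBackend(Protocol):
    """Hypothetical interface: any backend that maps a sequence to an embedding."""

    def embed(self, sequence: str) -> List[float]: ...


class _DummyBackend:
    """Stand-in for a real deployment; returns a deterministic toy vector."""

    def __init__(self, name: str) -> None:
        self.name = name

    def embed(self, sequence: str) -> List[float]:
        # Toy features (length and glycine fraction) in place of a model call.
        return [float(len(sequence)), sequence.count("G") / max(len(sequence), 1)]


# Hypothetical deployment modes mirroring the local / Forge / SageMaker split.
BACKENDS: Dict[str, EmbeddingBackend] = {
    "local": _DummyBackend("local"),
    "forge": _DummyBackend("forge"),
    "sagemaker": _DummyBackend("sagemaker"),
}


def embed_batch(sequences: List[str], mode: str) -> List[List[float]]:
    """Caller code is identical for every mode; only the `mode` string changes."""
    backend = BACKENDS[mode]
    return [backend.embed(s) for s in sequences]


seqs = ["MKTAYIAKQR", "GGSGGS"]
print(embed_batch(seqs, "local") == embed_batch(seqs, "sagemaker"))  # True
```

The design point is that changing deployment target touches configuration, not call sites. Note that this lowers switching cost only among endpoints sharing the interface; a competing model with a different SDK would still require its own integration work.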

Retention, Repeat Usage, and Satisfaction Evidence Table
| Metric / Signal | Value or Status | Segment | Confidence | Diligence Ask |
| --- | --- | --- | --- | --- |
| Net Revenue Retention (NRR) | Not disclosed | All commercial segments | N/A: metric does not exist publicly | Request NRR disclosure in management due diligence; available only after commercial launch at scale |
| Gross Revenue Retention (GRR) | Not disclosed | All commercial segments | N/A: metric does not exist publicly | Request GRR for any enterprise customer; not applicable until a paid tier is active at scale |
| HuggingFace model maintenance freshness | ESM-C updated 2 days before research date (May 2026); ESM3 updated January 29, 2025 | Open-weight academic users | High: direct read from HuggingFace timestamps | Monitor HuggingFace update frequency as a proxy for model maintenance commitment |
| GitHub commit activity (evolutionaryscale org) | Active commits across esm, DeepEP, nccl, transformers repos through April–May 2026 | Developer community users | High: visible from org page activity | Track issue resolution rate and release cadence to assess developer support quality |
| Forge API availability / uptime | API available per GitHub SDK documentation; no SLA or uptime data published | Forge API beta users | Medium: API endpoint referenced in code, but no status page or uptime metrics found | Request SLA terms and historical uptime for enterprise API diligence; check whether forge.evolutionaryscale.ai publishes a status page |
| Academic downstream paper accumulation rate | 32 Semantic Scholar papers (≈23 months after the June 2024 ESM3 release); 129 bioRxiv preprints | Academic users | High: from API search | Track quarterly paper count as a leading indicator of commercial pipeline conversion |
| Reported customer churn events | Zero publicly documented churn or non-renewal events as of May 2026 | All segments | Low: absence of evidence, not confirmed absence of churn; company is pre-scale | Not meaningful until paid commercial relationships are disclosed |
| Customer testimonials / G2 / Gartner Peer Insights reviews | None found as of May 2026 | All commercial segments | High confidence in absence: systematic search returned no reviews | Search G2, Gartner Peer Insights, and Capterra periodically; first reviews expected when the enterprise tier launches |

All NRR and GRR cells are null because EvolutionaryScale has no disclosed commercial revenue from its paid product tier as of May 2026. Retention proxies rely entirely on open-access metrics (HuggingFace downloads, GitHub activity, paper counts). The company is in an API beta phase; formal retention metrics are not yet applicable at commercial scale.

[CU022, CU027, CU028]
FU004: Open-Access Usage Metrics by Channel

Comparative bar representation of known open-access adoption metrics across HuggingFace and academic literature channels as of May 2026.

All values are from direct platform reads (HuggingFace page, GitHub org page, API search results) as of May 2026. HuggingFace downloads and GitHub stars are heterogeneous metrics (downloads reflect model weight retrievals, stars reflect developer interest). bioRxiv and Semantic Scholar values are search-result counts and may include indirect mentions.

[CU001, CU002, CU003, CU005, CU006]

6.5 Expansion Drivers and Concentration Risk

EvolutionaryScale's expansion trajectory is shaped by two competing dynamics. The first is favorable: the AWS and NVIDIA strategic investments create preferential channel placement, marketing co-promotion, and potential preferential access to both companies' enterprise customer networks. NVIDIA BioNeMo's claims of 2x faster training and 6x faster inference, combined with ESM model integration, position EvolutionaryScale models inside a high-adoption GPU infrastructure platform. AWS's inclusion of ESM-C in SageMaker JumpStart creates discoverability among the thousands of life sciences companies running workloads on AWS. The "free academic tier → Forge API beta → enterprise SageMaker contract" funnel is architecturally sound. The second dynamic is adverse: EvolutionaryScale has no disclosed pharma anchor customer, no land-and-expand case study, and no publicly announced pricing that would enable market-standard comparisons. Generate Biomedicines disclosed a $1.9B Amgen collaboration; Isomorphic Labs announced deals with Eli Lilly and Novartis totaling over $3 billion in potential milestone value. EvolutionaryScale's frontier protein language model has arguably stronger academic credentials (Science publication, 98B-parameter ESM3) but demonstrably weaker commercial proof than these direct competitors in protein AI. Customer concentration risk is currently unmeasurable: with no named enterprise customer there is technically zero concentration, but this masks the greater risk of no enterprise revenue whatsoever at a public-benefit company that raised $142M over 27 months.
Structural risks to the expansion thesis include: (1) open-weight ESM2 substitution, since many downstream users achieve adequate results with the free 15B-parameter ESM2 rather than paying for ESM-C or ESM3 commercial access; (2) biosecurity constraints, since responsible development frameworks and dual-use risk assessments (documented by NTI and safe.ai) create legitimate reasons to gate access to frontier protein design models, limiting the addressable commercial user base; and (3) academic open-source competition, since AlphaFold3 (weights available for non-commercial academic use), ESMFold, and RoseTTAFold are freely available for structure prediction, compressing the addressable market to generation and multimodal reasoning tasks where ESM3 has genuine differentiation.[CU014, CU017, CU018, CU019, CU024, CU026]

Expansion Drivers and Concentration Risk Table
| Expansion Driver / Risk Factor | Type | Impact | Likelihood / Status | Diligence Path |
| --- | --- | --- | --- | --- |
| AWS SageMaker + NVIDIA BioNeMo channel placement | Expansion driver | High: access to thousands of enterprise life sciences customers via cloud platforms | Confirmed (SageMaker live; BioNeMo announced) | Monitor SageMaker listing rankings and BioNeMo launch timeline; request channel partner revenue-split terms |
| No named pharma anchor customer (commercial proof gap) | Concentration risk / adverse | High: absence of enterprise customer proof limits Series B valuation and future fundraising | Confirmed gap as of May 2026 | Monitor for pharma deal announcements; request a management update on enterprise sales pipeline stage and count |
| Open-weight ESM2 free substitution risk | Adverse headwind | Medium: ESM2 (up to 15B params, free) satisfies representation tasks for many downstream users | Confirmed: ESM2 available; degree of substitution unknown | Quantify what fraction of API beta users genuinely require ESM3 or ESM-C 6B performance vs. ESM2 |
| Biosecurity / dual-use constraints on frontier protein design models | Adverse headwind | Low-medium: responsible development framework may gate access to high-risk use cases, limiting the addressable commercial market | Active: NTI and safe.ai document ongoing biosecurity concerns about protein design AI | Review EvolutionaryScale's responsible development framework; assess whether access controls would materially restrict pharma customer use cases |
| Forge API commercial pricing launch (future) | Expansion driver (pending) | High: a paid Forge API would create the first direct revenue metric and proof of willingness-to-pay | Pending: free beta only as of May 2026; pricing model not announced | Track the Forge pricing page; ask management for pricing tier structure and expected launch date |
| Land-and-expand via academic-to-enterprise funnel | Expansion driver (structural) | Medium: documented ESM2 corporate adoption (BioNTech/InstaDeep) suggests enterprise conversion is possible | Structural: funnel architecture confirmed; conversion rate unknown | Request Forge API conversion-rate data from management; compare academic vs. commercial API usage split |

Expansion driver assessments are forward-looking and based on structural inference from channel relationships, not disclosed revenue data. Adverse headwinds (ESM2 substitution, biosecurity constraints) are substantiated by confirmed free alternatives and independent biosecurity organization documentation, respectively. All probability assessments are qualitative, not quantitative.

[CU017, CU018, CU024, CU035, CU036]

6.6 Exhibits

Chapter 07

Risks

7.1 Biosecurity and Dual-Use Risk

Biosecurity risk is the most severe dimension of the EvolutionaryScale thesis. ESM3 can generate functional proteins far from known sequences: esmGFP, a fluorescent protein it generated, is estimated to be the equivalent of over 500 million years of evolution removed from its closest known natural relative. The same generative capability that enables drug discovery could, in principle, be directed at enhancing pathogen virulence, generating novel toxins, or engineering biological agents outside the known sequence space monitored by surveillance systems. A 2023 MIT study posted to arXiv (Soice et al., 2306.03809) showed that, within a one-hour session, large language models helped non-scientists with no laboratory training identify potential pandemic pathogens, methods for generating them from synthetic DNA, and DNA synthesis providers and contract research organizations that could assist. While that study focused on general-purpose LLMs rather than protein-specific models, the concern is analogous: protein language models could lower the barrier to designing functional biological agents. US Executive Order 14110 (October 30, 2023) explicitly singled out biotechnology as a national-security AI risk domain, mandating evaluations of AI systems that might lower barriers to creating biological, chemical, nuclear, or radiological weapons with mass-casualty potential; the order was rescinded on January 20, 2025. The NIST AI Risk Management Framework (AI RMF 1.0, released January 2023; Generative AI Profile released July 2024) provides voluntary guidance for identifying and managing these risks. The EU AI Act (Regulation 2024/1689, OJ 12 July 2024) does not name biological design tools among its Annex III high-risk use cases; biological misuse risk is addressed chiefly through the obligations on general-purpose AI models with systemic risk, and whether a protein-specific model falls within that regime is unsettled. EvolutionaryScale has published a Responsible Development Framework with four core tenets: communicate benefits and risks, rigorously evaluate models before deployment, adopt guardrails, and engage government and civil society. ESM Cambrian's launch blog states that "ESM C was reviewed by a committee of scientific experts who concluded that the benefits of releasing the models greatly outweigh any potential risks."
However, the canonical Responsible Development Framework URL (/blog/responsible-development) returned a 404 error at the access date (2026-05-18), suggesting the framework document may no longer be publicly accessible, which is itself a transparency risk. No independent third-party verification of EvolutionaryScale's model safety evaluations has been publicly disclosed. The Biological Weapons Convention (BWC, 1972; 189 states parties as of May 2025) prohibits development and stockpiling of biological weapons but has no formal verification regime and no mechanism specifically addressing AI-designed proteins. The Center for AI Safety's 2023 statement, signed by Hinton, Bengio, and others, declares that mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war. The Johns Hopkins Center for Health Security treats the AI-biosecurity intersection as a core research area. Industry self-regulation through responsible-AI frameworks with biological provisions (Anthropic's RSP, OpenAI's safety commitments) is nascent and not binding on third parties like EvolutionaryScale.[CR001, CR002, CR003, CR004, CR005, CR006]

Regulatory / Legal Risk Register
| Risk | Category | Likelihood (1–5) | Impact (1–5) | Residual Score | Current Mitigation | Residual Exposure / Gaps |
| --- | --- | --- | --- | --- | --- | --- |
| Regulatory imposition of mandatory biosecurity evaluations or export controls on protein LLMs | Biosecurity/Regulatory | 3 | 5 | 15 | Responsible Development Framework; government engagement stated in ESM3 blog | No binding third-party evaluation; /blog/responsible-development URL inaccessible at access date |
| Deliberate misuse of ESM3 API to design novel pathogen proteins or toxins | Biosecurity/Dual-Use | 2 | 5 | 10 | Access controls on Forge API; academic use restrictions; model output monitoring stated | No independent biosecurity audit of API guardrails disclosed publicly |
| Competitive commoditisation: free tools (AlphaFold3 DB, Chai-1, OpenFold) erode Forge API pricing | Competitive | 4 | 4 | 16 | ESM3 98B generative multimodality differentiates from structure-prediction tools; drug-discovery fine-tunes | Chai-1 already matches or beats ESM3 on key benchmarks; erosion accelerating |
| Meta retains residual IP rights over ESM2 ancestor weights used to initialise ESM3 | Legal/IP | 2 | 4 | 8 | Patents filed on ESM3 architecture; PBC corporate structure | No public disclosure of a Meta IP agreement; ESM2 model card terms ambiguous on derivatives |
| Capital runway exhaustion before Forge API achieves commercial revenue scale | Financial | 3 | 4 | 12 | $142M raised; Amazon/Nvidia as investors provide compute access optionality | No public revenue; a ~$1.35B post-money valuation with no disclosed revenue implies a high bar; no follow-on round disclosed |
| Down-round risk if Forge commercial adoption lags and Series A mark-to-market compresses | Financial | 3 | 3 | 9 | Planned pharmaceutical drug-discovery partnerships as revenue engine; AWS distribution | No disclosed enterprise contracts or ARR milestones; market-rate GPU costs remain high |
| Key-person departure: loss of Rives, Sercu, or Lin halts model development | Talent/Execution | 2 | 4 | 8 | Equity incentives implied; four-founder team provides some redundancy | No disclosed succession plan; no independent board oversight of founder roles |
| Single-employer cultural concentration (all founders ex-Meta FAIR) reduces strategic diversity | Talent/Culture | 3 | 2 | 6 | Company expanded beyond founding team (11–50 employees pre-acquisition) | No external scientific advisory board publicly disclosed; potential for paradigm blind spots |
| Investor/competitor conflict: Amazon and Nvidia steer enterprise clients to competing platforms | Partner/Dependency | 2 | 4 | 8 | Contractual distribution agreements provide channel incentives | No MFN or exclusivity disclosures; BioNeMo includes competing third-party tools |
| EU AI Act high-risk classification triggering conformity assessment and access restrictions for ESM3 API | Regulatory | 2 | 3 | 6 | Responsible Development Framework aligned with RSP-type self-regulation | Most EU AI Act provisions apply from Aug 2026; company EU presence and compliance posture not disclosed |
| Model hallucination: ESM3-generated sequences fail wet-lab validation at a rate that reduces customer ROI | Technical | 3 | 3 | 9 | Alignment training (RLHF-analogous feedback) improves generation quality per ESM3 paper | No published wet-lab validation failure rates; latency between API call and lab result obscures the true failure rate |
| Data scarcity for novel protein families limits generalization of ESM3 to unexplored sequence space | Technical | 3 | 3 | 9 | Synthetic data augmentation used in ESM3 training (predicted structures/functions) | Synthetic data quality depends on AlphaFold predictions; circular dependency risk if predictions are wrong |

Likelihood and Impact on 1–5 scale; Residual Score = Likelihood × Impact. Mitigations sourced from public EvolutionaryScale disclosures and industry standard practices. Residual exposure column reflects unresolved public-evidence gaps.

[CR001, CR005, CR013, CR016, CR019, CR023]
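The register's Residual Score is the product of the two 1–5 ratings. A minimal sketch that recomputes each score from the ratings in the table and ranks the risks; the short labels are abbreviations introduced here, the tie-break rule (impact, then likelihood) is an assumption, and the ratings remain the report's qualitative estimates rather than measured probabilities.

```python
from typing import List, NamedTuple


class Risk(NamedTuple):
    label: str
    likelihood: int  # 1-5, qualitative
    impact: int      # 1-5, qualitative

    @property
    def score(self) -> int:
        # Residual Score = Likelihood x Impact, as defined in the register note.
        return self.likelihood * self.impact


REGISTER: List[Risk] = [
    Risk("Mandatory biosecurity evals / export controls", 3, 5),
    Risk("Deliberate misuse of ESM3 API", 2, 5),
    Risk("Competitive commoditisation", 4, 4),
    Risk("Meta residual IP on ESM2 weights", 2, 4),
    Risk("Capital runway exhaustion", 3, 4),
    Risk("Down-round risk", 3, 3),
    Risk("Key-person departure", 2, 4),
    Risk("Single-employer cultural concentration", 3, 2),
    Risk("Investor/competitor channel conflict", 2, 4),
    Risk("EU AI Act high-risk classification", 2, 3),
    Risk("Wet-lab validation failures", 3, 3),
    Risk("Data scarcity for novel families", 3, 3),
]


def ranked(register: List[Risk]) -> List[Risk]:
    """Sort by score descending; ties broken by impact, then likelihood (assumed rule)."""
    return sorted(register, key=lambda r: (r.score, r.impact, r.likelihood), reverse=True)


for r in ranked(REGISTER)[:3]:
    print(f"{r.score:>2}  {r.label}")
```

Ranked this way, competitive commoditisation (16), mandatory biosecurity evaluations or export controls (15), and capital runway exhaustion (12) lead the register.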
FR001: Risk Severity Heatmap (Likelihood × Impact)

Risk items plotted by likelihood (x-axis, 1–5) and impact (y-axis, 1–5); higher-right quadrant = highest priority.

Likelihood and impact ratings are qualitative estimates based on public information; no quantitative probability model has been applied.

[CR001, CR005, CR013, CR019, CR032, CR038]
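The heatmap places each register entry in a 5×5 likelihood-by-impact grid. A minimal sketch that builds that grid as counts per cell from the register ratings and counts entries in the upper-right region; treating "upper-right" as likelihood ≥ 3 and impact ≥ 3 is an assumption made here, since the figure's exact quadrant boundary is not stated, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (likelihood, impact) pairs from the register, in table order.
RATINGS: List[Tuple[int, int]] = [
    (3, 5), (2, 5), (4, 4), (2, 4), (3, 4), (3, 3),
    (2, 4), (3, 2), (2, 4), (2, 3), (3, 3), (3, 3),
]


def heatmap(ratings: List[Tuple[int, int]], size: int = 5) -> List[List[int]]:
    """Grid of counts: row 0 is impact=size (top), column 0 is likelihood=1 (left)."""
    counts: Dict[Tuple[int, int], int] = Counter(ratings)
    return [
        [counts.get((lik, imp), 0) for lik in range(1, size + 1)]
        for imp in range(size, 0, -1)
    ]


def upper_right(ratings: List[Tuple[int, int]], threshold: int = 3) -> int:
    """Count risks whose likelihood and impact both reach the (assumed) threshold."""
    return sum(1 for lik, imp in ratings if lik >= threshold and imp >= threshold)


for row in heatmap(RATINGS):
    print(" ".join(str(c) for c in row))
print("upper-right:", upper_right(RATINGS))
```

With the stated threshold, 6 of the 12 register entries fall in the upper-right region; raising the threshold to 4 leaves only competitive commoditisation.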

7.2 Technical Risks

ESM3's performance depends on the quality of its training distribution. The model is trained on 2.78 billion protein sequences, but functional proteins in novel families (e.g., non-ribosomally synthesised peptides, novel enzyme scaffolds, or fully de novo folds) may lie far outside this distribution. Protein language models can "hallucinate" high-confidence sequences that do not fold or function as predicted, so wet-lab validation is required before any ESM3-generated sequence can be used therapeutically or industrially. This creates a latency risk: customers pay for Forge API calls but must still run expensive lab validation before generating commercial value, weakening the economic argument for premium pricing relative to open alternatives. Benchmark saturation is a near-term technical risk. ESM3-98B was the state of the art on CASP15 monomer prediction at launch, but Chai-1 (Apache 2.0, free) already reports a Cα LDDT of 0.849 on the CASP15 monomer set versus 0.801 for ESM3-98B, and a 77% PoseBusters success rate versus 76% for AlphaFold 3, approaching parity with commercial models in structure prediction. AlphaFold 3 and its open database (200M+ structures, including protein complexes as of March 2026) continuously expand free coverage. Baker Lab's RFdiffusion (Nature 2023) is freely available for binder design. These open tools reduce the marginal value of Forge's API-gated access. ESM3's training is initialised from Meta's ESM2 weights. Meta retains IP in the ESM2 model family under the terms of its model card and the facebookresearch/esm GitHub repository. ESM3 is a substantial architectural and training advance beyond ESM2, but any residual IP claim by Meta on the ancestor weights could constrain EvolutionaryScale's ability to commercialise the 98B model or sublicense weights.
The company has filed patents on aspects of its work (disclosed in the competing-interest statement of the ESM3 bioRxiv preprint), but the full patent portfolio and its relationship to Meta's prior art remain undisclosed. Compute dependency on NVIDIA (H100/H200 clusters) and Amazon (AWS) is double-edged: both are investors and channel partners but could in principle restrict access or deprioritise workloads. GPU supply constraints could delay model training or API scaling, particularly since ESM3 was trained with 1×10²⁴ FLOPs, one of the largest compute budgets for any biological model at launch.[CR011, CR012, CR013, CR014, CR015, CR016]
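To put the 1×10²⁴ FLOPs figure in hardware terms, training time is roughly total FLOPs divided by sustained per-GPU throughput. A back-of-envelope sketch, assuming about 989 TFLOP/s dense BF16 peak per H100 SXM and 40% model FLOPs utilisation; both numbers are assumptions for illustration (the actual cluster size, precision, and utilisation for ESM3 are not given here), and the 1,000-GPU cluster is hypothetical, so the result is an order-of-magnitude figure only.

```python
def gpu_hours(total_flops: float, peak_flops_per_gpu: float, utilisation: float) -> float:
    """GPU-hours needed to execute `total_flops` at a sustained fraction of peak."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    sustained = peak_flops_per_gpu * utilisation  # FLOP/s actually achieved per GPU
    return total_flops / sustained / 3600.0


ESM3_TRAINING_FLOPS = 1e24
H100_BF16_DENSE_PEAK = 989e12  # assumption: H100 SXM dense BF16 peak, FLOP/s
ASSUMED_MFU = 0.40             # assumption: model FLOPs utilisation

hours = gpu_hours(ESM3_TRAINING_FLOPS, H100_BF16_DENSE_PEAK, ASSUMED_MFU)
print(f"~{hours:,.0f} GPU-hours")
# Wall-clock days on a hypothetical 1,000-GPU cluster.
print(f"~{hours / 1000 / 24:,.0f} days on 1,000 GPUs")
```

Under these assumptions the estimate is roughly 0.7 million H100 GPU-hours, or about a month of wall-clock time on 1,000 GPUs, which illustrates why access to a large, dedicated cluster is a real dependency.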

Technical Risk Detail
| Risk | Mechanism | Evidence | Severity | Mitigation |
| --- | --- | --- | --- | --- |
| Protein hallucination / non-functional generations | Language model generates sequences with plausible statistics but incorrect fold or no biological activity | Chai-1 technical report shows ESM3-98B at 0.801 Cα LDDT vs Chai-1 at 0.849; AlphaFold2 reached high accuracy on only ~2/3 of CASP14 targets | High for drug-discovery customers expecting reliable hits | Alignment training analogous to RLHF per ESM3 paper; lab validation required |
| Benchmark saturation and competitor catch-up | Free open tools rapidly close performance gaps on structure prediction; generative tasks less well benchmarked | AlphaFold3 DB 200M+ structures free; Chai-1 Apache 2.0 at SOTA; RFdiffusion free from Baker Lab | Medium: limits premium pricing power | ESM3's multimodal joint generation (sequence + structure + function) differentiates |
| ESM2 IP provenance (Meta ancestor weights) | ESM3 initialised from Meta's ESM2; Meta model card terms on derivatives not fully public; potential claim on commercial weights | facebookresearch/esm repo text: 'contains pre-trained weights'; the repo is MIT-licensed, but no public statement addresses derivative commercial weights | Medium: latent licensing risk | Patents filed; counsel review required; PBC structure provides some protection |
| Compute cost and GPU supply concentration | Training at 1×10²⁴ FLOPs plus ongoing inference; HPC cluster needed for ESM3-98B; NVIDIA H100/H200 scarce | ESM3 blog: 'trained on one of the highest throughput GPU clusters in the world today' | Medium (operational) | Amazon and Nvidia as investors provide access optionality; multi-cloud risk remains |
| Training data gaps for rare protein classes | Novel organisms, synthetic biology substrates, or non-natural amino acids may lie outside the 2.78B-sequence training distribution | ESM3 paper: augmented with synthetic data to cover gaps; ESM Cambrian scaling-law plateau indicates a ceiling | Medium: limits utility for frontier drug discovery | Synthetic augmentation; ongoing model updates via ESM Cambrian family |

Severity ratings qualitative; evidence citations refer to public model cards and technical reports.

[CR011, CR012, CR013, CR014, CR015, CR016]
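The cost argument in the hallucination row reduces to expected-value arithmetic: if each generated candidate passes wet-lab validation independently with probability p, the number of candidates tested until the first functional hit is geometric with mean 1/p, so expected validation spend per hit is the cost per candidate divided by p. A minimal sketch; the per-assay cost and hit rates below are hypothetical placeholders, since no wet-lab failure rates for ESM3 generations have been published, and independence between candidates is itself an assumption.

```python
def expected_cost_per_hit(cost_per_candidate: float, hit_rate: float) -> float:
    """Expected validation spend per functional hit, assuming independent trials.

    Candidates tested until the first success follow a geometric distribution
    with mean 1 / hit_rate, so the expected spend is cost_per_candidate / hit_rate.
    """
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return cost_per_candidate / hit_rate


HYPOTHETICAL_ASSAY_COST = 500.0  # placeholder cost per candidate, USD
for rate in (0.5, 0.1, 0.01):    # placeholder hit rates
    cost = expected_cost_per_hit(HYPOTHETICAL_ASSAY_COST, rate)
    print(f"hit rate {rate:>4.0%}: ~${cost:,.0f} per functional hit")
```

A tenfold drop in hit rate multiplies downstream validation spend tenfold, so generation quality, not API price alone, drives the customer's cost per usable result.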
FR002: Risk Transmission DAG: How Primary Risks Cascade to Revenue and Valuation

Directed acyclic graph showing how biosecurity, technical, and financial risks flow to commercial and investor impacts.

[CR005, CR013, CR019, CR032, CR042]
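The DAG figure is not reproduced here. As an illustration of how such a cascade can be checked for acyclicity and ordered from root causes to outcomes, the sketch below encodes one plausible reading of the transmission paths described in this chapter: regulation and misuse concerns restricting access, hallucination raising validation cost, free tools compressing pricing, all feeding Forge adoption, revenue, and valuation. The node names and edges are assumptions drawn from the prose, not the figure's actual edges. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# node -> set of upstream causes (predecessors); an illustrative reading, not the figure.
CAUSES: Dict[str, Set[str]] = {
    "access restrictions": {"biosecurity regulation", "misuse concern"},
    "validation cost": {"hallucination"},
    "pricing power": {"free open tools"},
    "forge adoption": {"access restrictions", "validation cost", "pricing power"},
    "revenue": {"forge adoption"},
    "valuation": {"revenue", "capital runway"},
}


def cascade_order(causes: Dict[str, Set[str]]) -> List[str]:
    """Return nodes so every cause precedes its effects; raise if the graph has a cycle."""
    try:
        return list(TopologicalSorter(causes).static_order())
    except CycleError as err:
        raise ValueError(f"risk graph is not acyclic: {err.args[1]}") from err


def upstream(node: str, causes: Dict[str, Set[str]]) -> Set[str]:
    """All direct and indirect causes of `node` (iterative depth-first walk)."""
    seen: Set[str] = set()
    stack = list(causes.get(node, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(causes.get(cur, ()))
    return seen


order = cascade_order(CAUSES)
print(order[-1])  # valuation: the only sink, so it always comes last
print(sorted(upstream("revenue", CAUSES)))
```

Encoding the cascade this way makes two diligence questions mechanical: whether any proposed feedback loop has crept in (a cycle raises an error), and which root risks sit upstream of a given outcome such as revenue.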

7.3 Competitive Commoditisation Risk

The protein AI tooling landscape is commoditising rapidly. Google DeepMind's AlphaFold 3 database provides over 200 million predicted protein-complex structures freely via the EMBL-EBI partnership (updated March 2026 to include protein complexes). Meta's ESM2 is MIT-licensed via facebookresearch/esm; OpenFold is Apache 2.0. Chai-1 is Apache 2.0 with free commercial use. Baker Lab's RFdiffusion and ProteinMPNN are freely available from IPD/UW. These free-to-use models serve structure prediction and binder design workflows that are core use cases for Forge's API. EvolutionaryScale's defensibility relies on: (1) ESM3's generative multimodal capability, which goes beyond structure prediction to joint sequence/structure/function generation; (2) the 98B-parameter flagship model being API-gated and commercially licensed; and (3) domain-specific fine-tunes for drug discovery that require proprietary data. However, Chai-1's technical report claims state-of-the-art multimer prediction in single-sequence mode without MSAs, eroding an advantage that protein language models such as ESM3 have traditionally held. If academic and venture-backed competitors (Profluent, Generate Biomedicines, AbSci, Isomorphic Labs) release competitive generative models under permissive licenses, Forge's pricing power will compress. The competitive risk is compounded by investor/partner overlap: Amazon (AWS) distributes EvolutionaryScale models via SageMaker JumpStart but also invests in and provides compute to competing bio-AI companies; Nvidia distributes via BioNeMo, but BioNeMo is itself a competing model-distribution platform. Any conflict between partner and investor interests could result in preferential treatment of alternatives on these platforms.[CR019, CR020, CR021, CR022, CR023, CR024]

Competitive Threat Matrix
| Competitor Tool | Licence / Cost | Primary Capability | Threat to Forge API | Gap vs ESM3 |
|---|---|---|---|---|
| AlphaFold 3 / AlphaFold DB (DeepMind/EMBL-EBI) | Free; CC BY 4.0 | Structure prediction for proteins, complexes, and small molecules; 200M+ entry database | High for structure-prediction use cases | Not a sequence-generative model; does not generate sequences from prompts |
| Chai-1 (Chai Discovery) | Apache 2.0; free commercial use | Multimer structure prediction; 77% PoseBusters; 0.849 LDDT (monomer); no MSA needed | High; reported to beat ESM3-98B on CASP15 monomers | Not yet a generative protein design model; limited sequence generation |
| OpenFold (AlQuraishi Lab) | Apache 2.0; trainable | AlphaFold2-equivalent structure prediction; trainable on proprietary data | Medium; training requires compute | Structure-only; no sequence/function generation; no 98B-scale model |
| RFdiffusion (Baker Lab/IPD) | Permissive; free for research | De novo protein backbone generation; binder design; motif scaffolding | Medium for binder-design use cases | No joint sequence/function reasoning; less multimodal than ESM3 |
| Meta ESM2 (facebookresearch/esm) | MIT; free commercial use | Sequence embedding; structure prediction (ESMFold) | Medium for embedding and structure tasks | Not generative in the ESM3 sense; superseded by the ESM3 architecture |
| Profluent Bio, Absci, Generate Biomedicines | Proprietary / partnership | AI-driven antibody and protein design with integrated wet lab | Medium for enterprise drug-discovery customers | Vertically integrated competitors; charge for discovery services, not API access |

Competitor data is sourced from public GitHub repos, model cards, and technical blog posts as of the access date.

[CR019, CR020, CR021, CR022, CR023]

7.4 Regulatory Landscape

The regulatory environment for AI-designed proteins is nascent and fragmented. No jurisdiction has issued a rule specifically governing commercial deployment of generative protein language models. The US, EU, and UK are each developing frameworks that could apply to EvolutionaryScale's products under different risk classifications. In the US, EO 14110 (October 30, 2023) required developers of dual-use foundation models above defined compute thresholds to report safety evaluations to the government, with specific attention to biosecurity, cybersecurity, and critical-infrastructure risks; the order was revoked on January 20, 2025. The NIST AI RMF (January 2023) and its Generative AI Profile (July 2024) provide voluntary guidance. The FDA regulates AI/ML-enabled medical devices through its Software as a Medical Device (SaMD) framework and 2024 AI/ML action plan, but this covers only diagnostic or treatment-decision AI, not discovery-stage protein design tools. The Bureau of Industry and Security (BIS) has begun examining export-control frameworks for AI models that could be used in biological-weapons contexts, though no rule specific to protein language models has been finalised. In the EU, the AI Act (Regulation 2024/1689) entered into force in August 2024, with most provisions applying from August 2026; depending on the application, AI systems with potential for dual-use biological harm could fall into its high-risk or prohibited categories. Dual-use API access to ESM3 could then require conformity assessments, transparency obligations, and human-oversight measures once the Act's provisions are fully in force. In the UK, biosecurity review is ongoing under the UK Biological Security Strategy and through the AI Security Institute (renamed from the AI Safety Institute in February 2025), which evaluates biological risks from AI in collaboration with the US and other international partners. The Biological Weapons Convention (189 states parties) prohibits the development and production of biological weapons but contains no AI-specific language and has no formal verification mechanism.
The regulatory tail risk is asymmetric: new binding rules (mandatory safety evaluations, export controls, access restrictions) could impose compliance costs, limit international distribution, or require model redactions, any of which would impair Forge's commercial model, and there is no precedent for judging how long or how severe such restrictions would be.[CR025, CR026, CR027, CR028, CR029, CR030]
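The compute comparisons behind this section can be checked directly. The sketch assumes the thresholds as written in the instruments themselves: EO 14110 §4.2(b)(i) set reporting triggers at 10²⁶ operations for any model and 10²³ for models trained primarily on biological sequence data, and Article 51(2) of the EU AI Act presumes systemic risk for general-purpose AI models trained with more than 10²⁵ FLOPs. Whether ESM3 counts as a general-purpose model under the AI Act is a separate legal question the code does not answer; the constant and function names are hypothetical.

```python
import math

# Training-compute thresholds in operations (FLOPs), as summarised in the lead-in.
EO_14110_GENERAL = 1e26       # EO 14110 §4.2(b)(i): any model
EO_14110_BIO_SEQUENCE = 1e23  # EO 14110 §4.2(b)(i): primarily biological sequence data
EU_AI_ACT_SYSTEMIC = 1e25     # EU AI Act Art. 51(2): presumption of systemic risk

ESM3_TRAINING_FLOPS = 1e24    # reported training compute for ESM3-98B


def threshold_report(flops, trained_on_bio_sequences):
    """Return {threshold_name: exceeded?} for the thresholds that apply."""
    report = {
        "eo14110_general": flops > EO_14110_GENERAL,
        "eu_ai_act_systemic_presumption": flops > EU_AI_ACT_SYSTEMIC,
    }
    # The lower EO 14110 trigger only applied to biological-sequence models.
    if trained_on_bio_sequences:
        report["eo14110_bio_sequence"] = flops > EO_14110_BIO_SEQUENCE
    return report


def orders_of_magnitude(flops, threshold):
    """Signed distance from a threshold in powers of ten (positive = above it)."""
    return math.log10(flops / threshold)


if __name__ == "__main__":
    print(threshold_report(ESM3_TRAINING_FLOPS, trained_on_bio_sequences=True))
    print(round(orders_of_magnitude(ESM3_TRAINING_FLOPS, EO_14110_BIO_SEQUENCE), 2))
    print(round(orders_of_magnitude(ESM3_TRAINING_FLOPS, EU_AI_ACT_SYSTEMIC), 2))
```

For ESM3's reported 1×10²⁴ FLOPs, only the biological-sequence trigger is exceeded, by one order of magnitude, while the EU presumption threshold sits one order of magnitude above it.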

Regulatory Landscape
| Instrument | Jurisdiction | Status | Applicability to ESM3/Forge | Likely Timeline | Residual Risk |
|---|---|---|---|---|---|
| EO 14110: Safe, Secure, and Trustworthy AI (§4.4 biotechnology) | US Federal | Issued Oct 30, 2023; revoked Jan 20, 2025 | Required developers of models above compute thresholds (10²⁶ operations, or 10²³ for models trained primarily on biological sequence data) to report dual-use evaluations to the Department of Commerce | Lapsed; any successor reporting duty depends on new rulemaking | Medium: ESM3's 1×10²⁴ FLOPs exceeded the 10²³ biological-sequence threshold while the order was in force; no confirmed reporting to date |
| NIST AI RMF 1.0 + Generative AI Profile (NIST-AI-600-1) | US (voluntary) | Published Jan 2023; GenAI profile Jul 2024 | Voluntary framework; increasingly referenced in procurement and regulatory contexts | Voluntary; de facto standard | Low-medium: non-compliance creates reputational and procurement risk only |
| EU AI Act (Regulation 2024/1689) | EU | Published Jul 12, 2024; in force Aug 1, 2024; most provisions apply from Aug 2, 2026 | General-purpose AI models trained with >10²⁵ FLOPs are presumed to carry systemic risk (ESM3's reported 1×10²⁴ FLOPs is below this); dual-use bio applications could be high-risk | Aug 2025 for general-purpose AI model obligations; Aug 2026 for most other provisions | High: conformity assessment, transparency obligations, and third-party audits could restrict EU market access |
| FDA AI/ML-Enabled Medical Devices framework (SaMD) | US FDA | Evolving; 2024 AI/ML action plan | Applies to diagnostic/treatment AI, not discovery-stage protein design tools; future expansion possible | Gradual; no specific protein LLM rule | Low currently; could expand if ESM3 is used in clinical decision support |
| BIS Export Administration Regulations (EAR): potential AI bio controls | US (Commerce/BIS) | ANPRM under development (2024); no final rule for protein LLMs | Could restrict export of ESM3 weights or API access to adversarial nations | Final rule, if enacted, likely 2025–2027 | Medium: would restrict international commercial revenue and academic distribution |
| Biological Weapons Convention (BWC) | International (189 parties) | In force since 1975; no verification regime | Prohibits development of biological weapons; does not address AI-designed proteins specifically; company and customers must comply | Ongoing; no AI-specific amendment near-term | Low-medium: compliance obligation on customers; no direct regulatory burden on EvolutionaryScale beyond terms of service |

Status and timeline information is based on publicly available regulatory documents and information available at the fetch date. EU AI Act enforcement dates may change via implementing acts. BIS ANPRM status subject to change.

[CR025, CR026, CR027, CR028, CR029, CR030]
FR003: Regulatory Timeline: Key Biosecurity and AI Policy Milestones

Chronological view of regulatory milestones affecting EvolutionaryScale from 2022 to 2027 (projected).

Dates for 2026+ regulatory milestones are estimates based on typical US/EU rulemaking timelines; actual dates may vary.

[CR025, CR026, CR027, CR028, CR029]

7.5 Financial and Operational Risk

EvolutionaryScale has raised approximately $145M in total (seed plus the $142M Series A of September 2024) at a $1.35B post-money valuation. Operating a frontier protein language model at the 98B-parameter scale requires significant recurring compute: training ESM3 consumed 1×10²⁴ FLOPs on a high-throughput GPU cluster, and inference serving and model iteration at commercial scale add recurring hardware costs. There is no public revenue disclosure; as of the access date, the company is pre-revenue or at a very early revenue stage. At typical AI infrastructure spending rates for a 50–80 person company running frontier GPU clusters, $145M implies a runway of roughly 2–4 years unless the Forge API quickly converts to meaningful commercial revenue. The valuation implies a multiple exceeding 10× any forward revenue projection achievable at current Forge adoption, creating down-round risk if the commercial ramp is slower than investors expect. Investor/partner concentration in Amazon and NVIDIA creates a dual conflict: both are the company's primary cloud-compute providers, its primary distribution channels (SageMaker JumpStart, BioNeMo), and Series A investors. If EvolutionaryScale needs to renegotiate cloud contracts or seek competitive bids, the investor relationship limits its negotiating leverage. Conversely, if Amazon or NVIDIA develop competing capabilities, they may have an incentive to steer customers away from the Forge API; NVIDIA's BioNeMo already hosts ESM models alongside competing tools. The company has no disclosed IP-monetisation strategy beyond the Forge API subscription. If the open ESM3 1.4B model (free for academic use) cannibalises commercial Forge adoption by serving most academic demand, and pharmaceutical customers prefer building internal capabilities to paying API fees, the commercial model could underperform at scale.[CR032, CR033, CR034, CR035, CR036, CR037]
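The 2–4 year runway estimate can be inverted to show the burn rate it implies. This is a simple constant-burn sketch that assumes the full $145M is available for operations, with no new capital and no interest income; none of these inputs are company disclosures, and the function names are illustrative.

```python
import math

TOTAL_RAISED_USD_M = 145.0  # seed plus $142M Series A, as stated in this section


def implied_annual_burn(capital_m, years):
    """Annual net burn (USD M) that would exhaust `capital_m` in `years`."""
    if years <= 0:
        raise ValueError("years must be positive")
    return capital_m / years


def runway_years(capital_m, monthly_burn_m, monthly_revenue_m=0.0):
    """Years until cash runs out at a constant net burn; infinite if revenue covers costs."""
    net_monthly_burn = monthly_burn_m - monthly_revenue_m
    if net_monthly_burn <= 0:
        return math.inf
    return capital_m / net_monthly_burn / 12.0


if __name__ == "__main__":
    for years in (2, 4):
        annual = implied_annual_burn(TOTAL_RAISED_USD_M, years)
        print(f"{years}-year runway -> ${annual:.1f}M/yr, ${annual / 12:.1f}M/month")
```

A 2–4 year runway on $145M corresponds to an annual net burn of roughly $36M to $73M, or about $3.0M to $6.0M per month; a disclosed burn figure would replace this inference.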

Financial and Operational Risk Register
| Risk | Driver | Likelihood | Impact | Mitigation | Diligence Ask |
|---|---|---|---|---|---|
| Capital exhaustion before Forge API revenue scales | GPU-cluster burn rate against $145M total raised; no public revenue | Medium | Critical | Amazon/NVIDIA compute access at investor terms; ESM Cambrian academic open models reduce inference burden | Disclose monthly burn rate, Forge ARR, and runway guidance |
| Down-round risk at $1.35B valuation if commercial ramp lags | Valuation implies 10× revenue projection at pre-revenue stage; market comparables compressed in 2024–2026 | Medium | High | Series A was oversubscribed (Amazon + NVIDIA lead); strong investor syndicate | Obtain cap table, option-pool dilution, and any Series B mandate or down-round protection clauses |
| Investor/partner conflict: Amazon and NVIDIA as investors, compute providers, and distribution platforms | Structural: both operate distribution platforms (SageMaker, BioNeMo) that also host competing models | Medium | High | Contractual ring-fencing of investor and commercial arms assumed but not confirmed | Obtain copies of investor side letters and any non-compete or channel-exclusivity terms |
| Revenue concentration: Forge API as sole commercial product | No publicly disclosed diversification into drug-discovery partnership revenue, milestone payments, or data licensing | Medium | Medium | Forge API on AWS SageMaker and NVIDIA BioNeMo broadens distribution; pharmaceutical partnerships implied | Obtain revenue breakdown: API fees vs partnership vs milestone vs licensing |
| GPU cost inflation: NVIDIA H100/H200 pricing power as sole-source supplier for frontier training | NVIDIA dominance in AI accelerators; no near-term AMD/Intel alternative at equivalent performance | Medium | Medium | Amazon and NVIDIA as investors may provide preferential pricing; cloud spot-pricing optionality | Confirm compute cost per 1K API calls and model-training run cost; assess margins at scale |

Likelihood and Impact are qualitative ratings. Financial risk assessment is based on public information only; no revenue, burn-rate, or margin data was disclosed by the company.

[CR032, CR033, CR034, CR035, CR036, CR037]
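The register's qualitative ratings can be turned into a rough priority order. The sketch below assumes a hypothetical ordinal scale (Low=1 through Critical=4) and scores each risk as likelihood times impact; both the scale and the scoring rule are illustrative choices, not part of the register.

```python
# Hypothetical ordinal mapping; the register itself only uses qualitative labels.
SCALE = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

# (risk, likelihood, impact) as rated in the register.
REGISTER = [
    ("Capital exhaustion before Forge API revenue scale", "Medium", "Critical"),
    ("Down-round risk at $1.35B valuation", "Medium", "High"),
    ("Investor/partner conflict (Amazon, NVIDIA)", "Medium", "High"),
    ("Revenue concentration in Forge API", "Medium", "Medium"),
    ("GPU cost inflation", "Medium", "Medium"),
]


def rank_risks(register):
    """Sort risks by likelihood x impact, highest first; ties keep input order."""
    scored = [(name, SCALE[lik] * SCALE[imp]) for name, lik, imp in register]
    # sorted() is stable even with reverse=True, so equal scores keep register order.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_risks(REGISTER):
        print(f"{score:>2}  {name}")
```

Because every risk is rated Medium likelihood, this ranking is driven entirely by impact; finer-grained likelihood estimates from the diligence asks would be needed to separate, for example, down-round risk from the investor/partner conflict.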

7.6 Talent, Key-Person, and Culture Risk

All four named founders, Alexander Rives (CEO), Tom Sercu (President), Zeming Lin (CTO), and Salvatore Candido, are alumni of Meta AI's FAIR protein research team. This single-employer provenance creates a cultural concentration risk: the team shares a common research paradigm, network, and career history. A cultural monoculture can speed aligned decision-making but reduces cognitive diversity when assessing strategic pivots or regulatory threats that FAIR's academic culture may not have prepared the founders for. Key-person risk is acute. Alexander Rives created the original ESM model and is the company's primary scientific visionary. Tom Sercu and Zeming Lin are the primary technical architects of ESM3 and ESM Cambrian, as reflected in the bioRxiv preprint author list. Departure of any founder would likely slow model development and weaken investor confidence. No succession planning or independent oversight of the CEO role has been publicly disclosed. The company competes in the Bay Area AI talent market, among the most competitive globally. Retaining senior ML researchers against offers from well-capitalised hyperscalers (Google DeepMind, Meta, Microsoft) or pharma AI arms (Isomorphic, Xaira), at a $1.35B valuation and without public-market liquidity, is a structural challenge. Amazon's and NVIDIA's investor status could reduce acqui-hire risk from those two parties but does not reduce the risk of talent migrating to other hyperscalers.[CR038, CR039, CR040, CR041]

7.7 Legal and IP Risk

EvolutionaryScale's ESM3 was trained using ESM2 weights as a starting point. Meta's facebookresearch/esm GitHub repository (which hosts ESM2) does not carry a standard open-source licence for the model weights themselves; the ESM2 model weights are described under Meta's own model card terms. The relationship between ESM3's commercial weights and the ancestor ESM2 weights is not fully documented in public sources, creating a latent IP-provenance risk if Meta were to assert rights over derivative works. The competing-interest statement in the bioRxiv preprint notes that "patents have been filed related to aspects of this work"; the nature, claims, and status of these patents are not publicly disclosed. The discrete-token approach to protein modelling used in ESM3 (tokenising 3D structure and function into discrete alphabets) may face prior art from academic and industry groups, including the Baker Lab, Meta FAIR, and Oxford-based researchers. Any infringement assertion or patent dispute could slow commercialisation or require expensive licensing arrangements. EvolutionaryScale is incorporated as a Public Benefit Corporation (PBC), which provides some governance flexibility but also creates obligations around its public-benefit mission that could constrain purely commercial decisions, particularly on open-model access versus commercial gating.[CR042, CR043, CR044, CR045]

7.8 Exhibits

Chapter 08

08 Valuation

8.1 Investment Thesis and Anti-Thesis

The investment thesis for EvolutionaryScale rests on four mutually reinforcing pillars. First, founder domain authority: Alexander Rives, Tom Sercu, Zeming Lin, and Salvatore Candido created the ESM protein language model family at Meta AI FAIR, representing institutional knowledge and a publication track record that no competing team can replicate from scratch. Second, peer-reviewed scientific validation: ESM3 was published in Science on January 16, 2025, documenting the generation of esmGFP, a novel fluorescent protein whose distance from known fluorescent proteins the authors equate to over 500 million years of natural evolution; the paper also reports 1×10^24 FLOPs of training compute. The claim is extraordinary, and peer review at a leading journal lends it credibility. Third, structural cloud distribution moat: Amazon (AWS) and NVIDIA are not merely financial investors but distribution partners embedding Forge into AWS SageMaker JumpStart and NVIDIA BioNeMo, giving EvolutionaryScale direct access to the cloud infrastructure used by virtually every global pharma and biotech R&D organization. Fourth, unique multimodal generative capability: ESM3 reasons jointly over protein sequence, structure, and function, a capability no peer protein AI startup (Profluent, Cradle.bio, Absci) has matched in a single foundation model. The anti-thesis is equally evidence-grounded. First, zero revenue disclosed: no ARR, customer count, or gross margin has been publicly confirmed for Forge as of May 2026, so the $1.35B Series A valuation is entirely forward-looking and one of the richest pre-revenue entries in the AI biotech sector. Second, open-source substitute threat: ESM2 (the predecessor model) is freely available as open source, and AlphaFold 3 (Google DeepMind) provides free non-commercial protein structure and interaction prediction to over 3 million researchers globally; both substitute directly for the core of EvolutionaryScale's commercial offering.
Third, key-person concentration: all four co-founders come from a single prior employer (Meta AI FAIR), so their departure risks are correlated rather than independent. Fourth, dual-use and biosecurity regulatory overhang: ESM3's generative protein design capabilities carry biosecurity risks acknowledged in the company's responsible development framework, and undisclosed customer-screening protocols leave regulatory exposure uncertain. Fifth, VC valuation multiple compression risk: KPMG's 2024 Venture Pulse report explicitly warned that investors are becoming "more discerning as to who the winners may be in the AI space" and will favor companies with credible commercial models, a standard EvolutionaryScale has not yet met on public evidence.[CV001, CV002, CV006, CV007, CV009, CV022]

Thesis and anti-thesis analysis
| Perspective | Argument | What Would Change the View |
|---|---|---|
| Thesis | Founders (Rives, Sercu, Lin, Candido) created the ESM protein language model family at Meta AI FAIR; institutional knowledge no competing team can replicate | Co-founder departure or formation of a competing lab with access to similar training data |
| Thesis | ESM3 published in Science (Jan 2025): first multimodal protein generative model with peer-reviewed validation of a 500M-year evolution simulation | Scientific peer challenges or reproducibility failure on key ESM3 claims |
| Thesis | Amazon (AWS) + NVIDIA co-investment: Forge deployed on SageMaker JumpStart and BioNeMo gives direct access to virtually all global pharma R&D cloud infrastructure | Amazon or NVIDIA terminates the distribution partnership or shifts to a competing protein AI platform |
| Thesis | ESM3 multimodal reasoning (sequence + structure + function simultaneously) is unique among protein AI peers and enables prompt-guided protein design at scale | A peer demonstrates equivalent multimodal capability, open-source and widely adopted, before Forge secures multi-year contracts |
| Anti-thesis | $1.35B Series A with zero disclosed revenue implies a ~9.5x post-money-to-raised ratio, one of the richest pre-revenue AI biotech entries; no confirmed ARR or customer count | Disclosed Forge ARR >$10M with gross margin >60% and a multi-pharma customer count |
| Anti-thesis | ESM2 (predecessor) is open-source; AlphaFold 3 (Google DeepMind) provides free non-commercial protein structure prediction to 3M+ researchers; both are direct substitutes for lower-tier Forge use cases | ESM3's unique generative capabilities (not replicated by open-source) command sustained pricing above commodity API levels |
| Anti-thesis | All four co-founders joined from one prior employer (Meta AI FAIR); correlated key-person departure risk creates a single point of failure at leadership level | Founders sign long-term employment contracts and hire a second tier of independent scientific leadership |

Arguments in both thesis and anti-thesis are evidence-backed. Rows ordered by relative impact on valuation. All anti-thesis rows reflect observable public evidence as of May 2026; no speculative claims included.

[CV006, CV007, CV009, CV022, CV023, CV029]
FV001: Recommendation logic flow

Decision chain from founder pedigree, platform proof, risk factors, and valuation anchors to the Research-More recommendation and required catalysts.

[CV006, CV009, CV030, CV033]

8.2 Recommendation, Confidence, and Risk Rating

The recommendation is Research-More / Track with Interest. It is not a buy recommendation, for three evidence-based reasons. First, valuation confidence is MEDIUM: the $1.35B post-money Series A anchor is confirmed via Crunchbase and Bloomberg, but no intrinsic value model can be constructed without Forge revenue, ARR, gross margin, or customer count data, all currently undisclosed. Second, the entry multiple is aggressive: at $1.35B pre-revenue, EvolutionaryScale's post-money-to-raised multiple (~9.5x) substantially exceeds sector norms for pre-revenue biotechs and is difficult to justify without visibility into commercial traction. Third, bear-case risk is asymmetric: open-source ESM2 and free AlphaFold 3 are real substitutes that could compress API pricing and destroy the revenue case before it materializes. The overall confidence rating is MEDIUM. Evidence is strong on the product (Science publication, model architecture), the team (Meta AI pedigree, GitHub activity, Hugging Face presence), and the funding (confirmed, multi-source). Evidence is weak on the commercial side (no disclosed revenue, ARR, customers, or partnership financial terms). The risk rating is HIGH, reflecting pre-revenue entry at a premium valuation; key-person concentration across all four co-founders from a single prior employer; open-source and free-tier competitive substitution; dual-use biosecurity regulatory uncertainty; and structural dependence on Amazon and NVIDIA for commercial distribution. The valuation stance is an infrastructure/platform AI premium with no clinical-proof uplift. Unlike clinical-stage AI drug discovery companies (Insilico Medicine, Recursion), EvolutionaryScale's value lies entirely in the platform and foundation-model layer, analogous to a foundation-model API company (Anthropic, Cohere) applied to vertical biology, but without general-purpose scale and with substantially more concentrated market exposure.
A buy recommendation requires disclosed Forge ARR of at least $10M, an active multi-pharma customer base, and confirmed gross margin above 60%.[CV001, CV002, CV005, CV011, CV013, CV030]
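The ~9.5x figure and the buy criteria reduce to simple arithmetic. The sketch assumes the $142M Series A as the denominator (using the ~$145M total instead gives about 9.3x) and reads "multi-pharma" as at least two pharmaceutical customers, which is an assumption; the function names are hypothetical.

```python
POST_MONEY_USD_M = 1350.0   # Series A post-money valuation
SERIES_A_USD_M = 142.0      # Series A round size
TOTAL_RAISED_USD_M = 145.0  # approximate total including seed


def ratio(numerator_m, denominator_m):
    """Simple ratio of two USD-million amounts."""
    return numerator_m / denominator_m


def meets_buy_gate(arr_m, gross_margin, pharma_customers, min_customers=2):
    """True only if all three buy criteria hold.

    ARR of at least $10M and gross margin above 60% come from the text;
    min_customers=2 is a hypothetical reading of "multi-pharma".
    """
    return arr_m >= 10.0 and gross_margin > 0.60 and pharma_customers >= min_customers


if __name__ == "__main__":
    print(f"Post-money / Series A: {ratio(POST_MONEY_USD_M, SERIES_A_USD_M):.1f}x")
    print(f"Post-money / total raised: {ratio(POST_MONEY_USD_M, TOTAL_RAISED_USD_M):.1f}x")
    print(f"Post-money / $10M ARR floor: {ratio(POST_MONEY_USD_M, 10.0):.0f}x")
```

Even at the $10M ARR floor, the $1.35B post-money would correspond to 135x ARR, which shows how much of the valuation the gate still leaves to future growth.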

Recommendation summary
| Recommendation | Confidence | Risk Rating | Valuation Stance | Decision Implication |
|---|---|---|---|---|
| Research-More / Track with Interest | Medium (no Forge revenue data; open-source substitute risk; preference stack unknown) | High (pre-revenue at $1.35B; key-person concentration; open-source ESM2 + AlphaFold 3 substitution; dual-use regulatory overhang) | Infrastructure/platform AI premium; base case $1.5–2.5B; bull case $3–5B contingent on Forge ARR; bear case $400–800M on commoditization | Track Forge ARR and customer count; require >$10M ARR and multi-pharma customers before upgrading to buy; monitor Amazon/NVIDIA partnership financials |

Recommendation is price-sensitive and evidence-sensitive. Confidence and risk ratings reflect absence of disclosed Forge revenue and open-source substitute risk as of May 2026. Valuation stance range assumes no confirmed financial data.

[CV001, CV002, CV030, CV033]
FV004: Investment KPI scorecard

IC-ready scoring across six dimensions: market proof, platform moat, commercial evidence, economics visibility, risk level, and evidence quality.

[CV005, CV006, CV009, CV030]

8.3 Financing, Valuation Context, and Entry Discipline

EvolutionaryScale closed its $142M Series A on September 26, 2024, co-led by Amazon (AWS) and NVIDIA, with participation from Lux Capital, Nat Friedman, and Daniel Gross (the AI Grant organization). Total capital raised is approximately $145M including seed funding. The post-money valuation of approximately $1.35B is confirmed via Crunchbase, Bloomberg (paywall), and PitchBook (paywall). A full-text search of SEC EDGAR for "EvolutionaryScale" identified no Form D filing. Because Regulation D offerings generally call for a public Form D notice, this gap may reflect a filing under a different legal-entity name or reliance on an exemption outside Regulation D, such as Section 4(a)(2). The $1.35B valuation at the pre-revenue stage is rich by historical biotech venture standards but broadly consistent with the 2024 AI valuation environment, in which five US companies each raised $4B+ rounds in Q4 2024 alone (KPMG Venture Pulse). The strategic nature of Amazon's and NVIDIA's investments substantially alters the risk-adjusted thesis: both are distribution channel partners whose investments create a self-reinforcing commercial incentive to route pharma API traffic through Forge. AWS SageMaker JumpStart and NVIDIA BioNeMo together reach virtually every major global pharma R&D organization, making the channel moat real and durable. Entry discipline requires confirmation of commercial traction before a buy recommendation. The preference stack, cap-table structure, and dilution overhang from the Series A are unknown; without audited financials or a pro-forma cap table, common equity value at any given enterprise value cannot be computed precisely. The Amazon and NVIDIA co-investment structurally limits the probability of a hostile acqui-hire or catastrophic down-round, since neither investor would benefit from a distressed sale that undermines its cloud platform strategy.[CV001, CV002, CV003, CV004, CV005, CV009]

Bull, base, and bear scenario analysis
| Scenario | Key Assumptions | Valuation Range (USD B) | Key Risk and Probability Signal |
|---|---|---|---|
| Bull ($3–5B) | Forge achieves $50–100M ARR by 2027; multi-pharma multi-year contracts established; AWS+NVIDIA channel generates scale distribution; ESM3 is adopted as the protein foundation-model standard across biopharma; gross margin >70% | 3.0–5.0 | Requires a confirmed $25M+ ARR data point and 2+ disclosed pharma contracts; comparable to Generate Biomedicines (~$2.5B) with a stronger distribution advantage; probability: 25–30% |
| Base ($1.5–2.5B) | Slow commercial ramp; $10–25M ARR by 2027; most revenue via AWS/NVIDIA channel fees; some enterprise pharma contracts but API pricing pressure from open source; moderate team expansion; Series B at a modest premium | 1.5–2.5 | Consistent with Series A entry at ~$1.35B; modest step-up; probability: 45–50% |
| Bear ($400M–800M) | Open-source ESM2 + AlphaFold 3 commoditize the API; no major pharma contract closes; key co-founder departure triggers talent exodus; Amazon or NVIDIA acqui-hire at distressed valuation; dual-use regulatory action restricts distribution | 0.4–0.8 | Triggered by: no Forge ARR data by end-2026; competitor open-source parity; co-founder departure announcement; probability: 20–25% |

All valuation ranges are scenario-derived estimates based on comparable company analysis (ABSI ~$800M, RXRX ~$1.555B, Generate Biomedicines ~$2.5B last reported), precedent transactions, and ARR multiple modeling. No confirmed Forge revenue was available for DCF input. Probabilities are subjective estimates.

[CV011, CV022, CV023, CV031, CV032, CV033]
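A probability-weighted view of the three scenarios can be computed from the table. The sketch uses the midpoint of each valuation range and of each probability range; because the midpoint probabilities sum to 0.975 rather than 1, it renormalises them. Both choices are assumptions layered on subjective inputs, so the output is an illustration, not a fair-value estimate.

```python
# Valuation range (USD B) and subjective probability range for each scenario,
# copied from the bull/base/bear table.
SCENARIOS = {
    "bull": ((3.0, 5.0), (0.25, 0.30)),
    "base": ((1.5, 2.5), (0.45, 0.50)),
    "bear": ((0.4, 0.8), (0.20, 0.25)),
}


def midpoint(bounds):
    """Midpoint of a (low, high) pair."""
    low, high = bounds
    return (low + high) / 2.0


def expected_value(scenarios):
    """Probability-weighted valuation from range midpoints.

    Midpoint probabilities are renormalised to sum to 1, because the table's
    probability ranges do not (their midpoints sum to 0.975).
    """
    weights = {name: midpoint(probs) for name, (_, probs) in scenarios.items()}
    total = sum(weights.values())
    return sum(
        midpoint(values) * weights[name] / total
        for name, (values, _) in scenarios.items()
    )


if __name__ == "__main__":
    print(f"Probability-weighted value: ${expected_value(SCENARIOS):.2f}B")
```

With these inputs the weighted value is about $2.24B, slightly above the ~$2.0B base-case midpoint used in FV002.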
FV003: Valuation scenario range

Low-to-high valuation range (USD billion) for bear, base, and bull cases based on scenario assumptions and comparable company benchmarks.

Ranges are model-derived using comparable company multiples, M&A precedents, and ARR scenario modeling. No confirmed Forge financials used. Ranges represent informed estimates, not precise DCF outputs.

[CV031, CV032, CV033]

8.4 Comparable Valuation Set

The comparable set for EvolutionaryScale spans three categories: public AI drug discovery companies, private AI biotech peers, and recent financing transactions. No single perfect comparable exists, given EvolutionaryScale's position as a pre-revenue protein foundation-model API company with strategic cloud distribution from two of the world's largest technology companies. Among public comps, Absci (NASDAQ:ABSI) is the closest analog by business model: a pure-play AI biologics design company with no Phase 2 programs. ABSI's market cap of approximately $800M as of May 2026 provides a public floor valuation for an AI drug creation platform with disclosed FY2025 revenue of $2.8M and a net loss of $115.2M. The implied revenue multiple (~285x FY2025 revenue) is extreme and reflects market pricing for platform optionality, not near-term fundamentals. Recursion Pharmaceuticals (NASDAQ:RXRX) has a market cap of ~$1.555B but generated only $6.47M of revenue in Q1 2026 and carries an accumulated deficit of $2.1B; it trades as a clinical-stage AI platform with pipeline optionality. Schrödinger (NASDAQ:SDGR) has a market cap of ~$893M with disclosed software and structure-based drug discovery revenue; its multiple is more conventional, but its hybrid model differs. Among private comps, Generate Biomedicines has raised ~$700M in total with a last reported valuation around $2.5B, making it the closest comparable by modality (protein generative AI). Xaira Therapeutics launched with $1B in Series A funding in April 2024, the largest AI drug discovery Series A at that time, at a ~$1B valuation. Isomorphic Labs (Alphabet-backed) has no disclosed standalone valuation but is active in multi-billion-dollar collaboration deals with Lilly and Novartis. Profluent (~$44M raised) and Cradle.bio (~$73M raised) both serve protein engineering use cases at an earlier stage and substantially lower valuations.
For foundation-model platform comparables adjusted for vertical narrowness, Anthropic (~$60B at its early-2025 round), Mistral (~$6B), and Cohere (~$5B) provide upper-bound private-market sentiment for AI foundation-model companies. EvolutionaryScale's $1.35B is about 2.25% of that Anthropic figure, with comparable model-quality claims but a dramatically narrower addressable market (biology only). The implied discount to horizontal foundation models is appropriate given TAM concentration, but it still represents a premium to clinical-stage AI drug discovery public comps.[CV011, CV012, CV013, CV014, CV015, CV016]
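The multiples quoted for the comparable set follow directly from the figures given. The sketch recomputes them, annualising Recursion's Q1 2026 revenue by multiplying by four (a simplifying assumption that ignores seasonality and milestone timing) and using the ~$60B Anthropic figure as the reference; the names are illustrative.

```python
# (valuation, denominator) pairs in USD millions, as cited in this section.
COMPS = {
    "Absci / FY2025 revenue": (800.0, 2.8),
    "Recursion / annualised Q1 2026 revenue": (1555.0, 6.47 * 4),
    "Generate Biomedicines / total raised": (2500.0, 700.0),
}


def multiple(valuation_m, denominator_m):
    """Valuation divided by a revenue or capital base, both in USD millions."""
    if denominator_m <= 0:
        raise ValueError("denominator must be positive")
    return valuation_m / denominator_m


def share_of(value_m, reference_m):
    """Value as a percentage of a reference valuation."""
    return 100.0 * value_m / reference_m


if __name__ == "__main__":
    for label, (val, denom) in COMPS.items():
        print(f"{label}: {multiple(val, denom):.1f}x")
    print(f"EvolutionaryScale vs Anthropic: {share_of(1350.0, 60000.0):.2f}%")
```

The results (about 286x, 60x, and 3.6x, plus a 2.25% share) match the approximate multiples quoted in the text and the comparable valuation table.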

Comparable valuation table
| Comparable | Type | Key Metric and Valuation (USD) | Multiple or Benchmark | Relevance to EvolutionaryScale | Limitation |
|---|---|---|---|---|---|
| Absci (NASDAQ:ABSI) | Public | Market cap ~$800M (May 2026); FY2025 revenue $2.8M | ~285x FY2025 revenue | Closest public pure-play AI biologics design peer; no Phase 2 programs; loss-making; NASDAQ data available | Absci revenue is milestone-based partner fees, not SaaS ARR; lower-quality revenue than EvolutionaryScale's potential Forge subscriptions |
| Recursion (NASDAQ:RXRX) | Public | Market cap ~$1.555B (May 2026); Q1 2026 revenue $6.47M; accumulated deficit $2.1B | ~60x annualized revenue | Largest public pure-play AI drug discovery company; phenomics + AI platform; 3M+ compound phenotypic map | Clinical-stage pipeline provides a premium vs. EvolutionaryScale; different business model (discovery + pipeline, not pure platform API) |
| Schrödinger (NASDAQ:SDGR) | Public | Market cap ~$893M (May 2026); 52-week range $10.94–$27.63 | Public market disclosed | Physics-based simulation + software licensing; disclosed ARR; longer operating history as a public company | Not a generative-AI protein language model; hybrid software + drug discovery model; higher ARR visibility but lower growth profile |
| Generate Biomedicines | Private | ~$700M total funding; last reported valuation ~$2.5B | ~3.6x raised-to-valuation | Closest generative-biology comparable: protein generative AI for therapeutics; Massachusetts-based; Flagship Pioneering backed | Earlier commercial stage; different model architecture; no public financials; last round not confirmed current |
| Xaira Therapeutics | Private | $1B Series A (Apr 2024) | ~$1B valuation at launch | Largest-ever AI drug discovery Series A, contemporaneous with EvolutionaryScale's raise; structural precedent for $1B+ private AI biotech rounds | Xaira is focused on drug programs, not API platforms; different exit path and revenue model |
| Profluent | Private | ~$44M raised | Est. $200–400M valuation | AI protein design for CRISPR and gene editing; OpenCRISPR open-source release; earlier stage | Smaller raise; narrower application (gene editing, not broad protein API); limited comparable value |
| Cradle.bio | Private | ~$73M raised | Est. $250–500M valuation | Protein optimization SaaS for biopharma and industrial bio; Novonesis partnership confirmed; closer to Forge use case | Earlier stage; optimization, not generation; European company (Amsterdam); different technology approach |
| Isomorphic Labs (Alphabet) | Private | Undisclosed; Lilly + Novartis deals $3B+ headline value | Deal-value benchmark; no standalone valuation | AlphaFold 3 originator; generative biology; Lilly and Novartis collaborations; Alphabet structural advantages | No standalone valuation; Alphabet-backed with fundamentally different cost of capital and competitive position |

Comparable set is partial and asymmetric: public comps (ABSI, RXRX, SDGR) have SEC-filed financial data; private comps rely on press-reported funding rounds and estimated valuations. No investment banking or independent fairness opinion data was accessible. All private valuations are estimates.

[CV011, CV012, CV013, CV014, CV015, CV016]
Comparable AI-biotech financing transactions (2023–2026)
| Company | Transaction | Amount (USD) | Approx. Valuation | Date | Key Investors |
|---|---|---|---|---|---|
| EvolutionaryScale | Series A | $142M | ~$1.35B post-money | Sep 2024 | Amazon (AWS), NVIDIA, Lux Capital, Nat Friedman, Daniel Gross |
| Xaira Therapeutics | Series A (launch) | $1,000M | ~$1B | Apr 2024 | ARCH Venture Partners, Foresite Capital, and others |
| Generate Biomedicines | Series C (last disclosed) | ~$273M (Series C); ~$700M total | ~$2.5B (last reported) | 2022–2023 | Flagship Pioneering, Fidelity, NVIDIA; others |
| Isomorphic Labs | First external round | ~$600M reported | ~$3B+ (deal-value benchmark) | Mar 2025 | Thrive Capital (lead), GV, Alphabet |
| Insilico Medicine | HKEX IPO | ~$293M | ~$2.3B (prior Series E) | Late 2025 | Public markets (SEHK:3696) |
| Profluent | Series A | $44M | Est. $200–400M | 2023–2024 | Salesforce Ventures, Felicis, OpenAI Fund |

Transaction data is sourced from company announcements, Crunchbase, press reports, and market research reports. Valuations for private transactions are estimates based on reported post-money or implied deal terms; Isomorphic Labs valuation reflects Lilly+Novartis deal headline value, not a confirmed standalone equity valuation. All amounts in USD.

[CV001, CV002, CV017, CV018, CV020, CV021]
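To make the capital-efficiency comparison in the tables above explicit, the sketch below computes each company's last reported valuation divided by total disclosed capital raised, the same ratio the Generate Biomedicines row reports as ~3.6x. It uses only the approximate, press-reported figures already in the tables; the `Financing` type and function names are illustrative. Because EvolutionaryScale's seed amount was never disclosed, its ~9.5x ratio (counting only the $142M Series A) is an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Financing:
    company: str
    capital_raised_usd_m: float  # total disclosed capital raised, USD millions
    valuation_usd_m: float       # last reported or implied post-money valuation, USD millions


def valuation_to_capital(f: Financing) -> float:
    """Last reported valuation divided by total disclosed capital raised."""
    if f.capital_raised_usd_m <= 0:
        raise ValueError("capital raised must be positive")
    return f.valuation_usd_m / f.capital_raised_usd_m


# Approximate, press-reported figures taken from the comparable tables.
# EvolutionaryScale's seed amount is undisclosed, so only the $142M Series A is counted.
COMPS = [
    Financing("EvolutionaryScale", 142.0, 1350.0),
    Financing("Generate Biomedicines", 700.0, 2500.0),
    Financing("Xaira Therapeutics", 1000.0, 1000.0),
]

if __name__ == "__main__":
    # Print highest valuation-to-capital ratio first.
    for f in sorted(COMPS, key=valuation_to_capital, reverse=True):
        print(f"{f.company:<24}{valuation_to_capital(f):5.1f}x")
```

A high ratio signals that investors priced in expected future value well beyond capital deployed; for EvolutionaryScale, that expectation was never tested by disclosed revenue.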
FV002: Valuation sensitivity to key drivers

Sensitivity of EvolutionaryScale's estimated enterprise value (USD billion) to individual upside and downside drivers relative to a base case midpoint of ~$2.0B.

All values are estimated deltas against that base case. No confirmed Forge financial data was available, so the ranges reflect comparable-company multiples, ARR growth scenarios, and M&A precedent analysis.

[CV031, CV032, CV033]
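FV002 moves one driver at a time and records the change in estimated enterprise value against the ~$2.0B base case. A minimal sketch of that one-at-a-time (tornado) calculation follows. Only the $2.0B base comes from the text; the driver names and ranges are hypothetical placeholders, not the report's estimates.

```python
from typing import NamedTuple


class DriverRange(NamedTuple):
    name: str
    downside_ev_bn: float  # EV (USD bn) if only this driver moves to its downside case
    upside_ev_bn: float    # EV (USD bn) if only this driver moves to its upside case


def tornado(base_ev_bn: float, drivers: list[DriverRange]) -> list[tuple[str, float, float]]:
    """One-at-a-time sensitivity: (name, downside delta, upside delta) vs. the base case,
    ordered by total swing so the widest bar comes first, as in a tornado chart."""
    rows = [(d.name, d.downside_ev_bn - base_ev_bn, d.upside_ev_bn - base_ev_bn) for d in drivers]
    return sorted(rows, key=lambda row: abs(row[2] - row[1]), reverse=True)


BASE_EV_BN = 2.0  # base-case midpoint stated for FV002

# HYPOTHETICAL driver ranges for illustration only; these are not the FV002 estimates.
EXAMPLE_DRIVERS = [
    DriverRange("Strategic M&A premium", 1.8, 2.5),
    DriverRange("Forge ARR growth", 1.2, 3.0),
    DriverRange("Comparable revenue multiple", 1.5, 2.6),
]

if __name__ == "__main__":
    for name, down, up in tornado(BASE_EV_BN, EXAMPLE_DRIVERS):
        print(f"{name:<28}{down:+.2f} / {up:+.2f} USD bn")
```

Holding all other drivers at base isolates each driver's effect but ignores correlations (for example, ARR growth and the applicable revenue multiple tend to move together).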

8.5 Exit Readiness and Final Diligence Asks

EvolutionaryScale's most likely near-term exit paths are: (1) a strategic acquisition by Amazon (acqui-hire or full buyout to embed Forge into AWS AI Services) or NVIDIA (to deepen BioNeMo's differentiated protein design capability); (2) an acquisition by a major drug company seeking a protein AI platform (AstraZeneca, Pfizer, Genentech, Novartis) once ARR demonstrates commercial product-market fit; (3) a Series B or Series C at $2B+ if Forge reaches $25M+ ARR with multiple pharma customers; or (4) an IPO after reaching $50M+ ARR with >60% gross margin, likely not before 2028 at the earliest.

The structural Amazon investor relationship substantially reduces the probability of the bear case: an AWS-backed company with Forge deployed on SageMaker JumpStart would need an actively hostile Amazon decision to face a catastrophic down-round. However, the same relationship concentrates exit paths: if Amazon is the likely acquirer, secondary investors must accept M&A pricing discipline that may not maximize value for non-strategic investors.

Six diligence asks are critical before a buy recommendation can be issued: (1) Forge ARR and customer count as of Q2 2026; (2) gross margin on Forge API revenue; (3) revenue share and exclusivity terms in the Amazon and NVIDIA partnerships; (4) any ESM2/ESM3 IP transfer agreement with Meta Platforms confirming a clean IP chain of title; (5) biosecurity and dual-use customer screening protocols; and (6) the Series A cap table and liquidation preference stack. Until these items are confirmed, valuation confidence remains MEDIUM and the recommendation remains Research-More / Track.[CV005, CV009, CV034, CV037, CV038]
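Paths (3) and (4) are stated as explicit ARR and gross-margin gates, so they can be restated as checks. Doing so also surfaces one reasoning step: a $2B round at the $25M ARR threshold implies a valuation of 80x ARR. The sketch below is illustrative; the `ForgeMetrics` fields are hypothetical, "multi-pharma" is assumed to mean two or more pharma customers, and path (2) is not modeled because its "product-market fit" gate has no stated threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForgeMetrics:
    arr_usd_m: float       # Forge annual recurring revenue, USD millions
    gross_margin: float    # Forge gross margin as a fraction, e.g. 0.65
    pharma_customers: int  # paying pharma/biotech enterprise customers


# Thresholds restated from the exit-path discussion.
GROWTH_ROUND_ARR_USD_M = 25.0          # path (3): Series B/C at $2B+
GROWTH_ROUND_VALUATION_USD_M = 2000.0
MIN_PHARMA_CUSTOMERS = 2               # ASSUMPTION: "multi-pharma" read as two or more
IPO_ARR_USD_M = 50.0                   # path (4): IPO
IPO_MIN_GROSS_MARGIN = 0.60


def implied_arr_multiple(valuation_usd_m: float, arr_usd_m: float) -> float:
    """Valuation divided by ARR: the revenue multiple a given round would imply."""
    return valuation_usd_m / arr_usd_m


def gated_exit_paths(m: ForgeMetrics) -> list[str]:
    """Return the ARR-gated exit paths (3) and (4) whose stated thresholds are met."""
    paths = []
    if m.arr_usd_m >= GROWTH_ROUND_ARR_USD_M and m.pharma_customers >= MIN_PHARMA_CUSTOMERS:
        paths.append("Series B/C at $2B+")
    if m.arr_usd_m >= IPO_ARR_USD_M and m.gross_margin > IPO_MIN_GROSS_MARGIN:
        paths.append("IPO")
    return paths


if __name__ == "__main__":
    # A $2B round at exactly $25M ARR implies an 80x ARR multiple.
    print(implied_arr_multiple(GROWTH_ROUND_VALUATION_USD_M, GROWTH_ROUND_ARR_USD_M))
    # HYPOTHETICAL metrics: no Forge figures were ever disclosed.
    print(gated_exit_paths(ForgeMetrics(arr_usd_m=30.0, gross_margin=0.55, pharma_customers=3)))
```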

Thesis-break and kill triggers
Trigger | Threshold and event | Transmission to thesis | Action implication
No Forge ARR disclosed by year-end 2026 | EvolutionaryScale has not disclosed any ARR, enterprise customer count, or pricing data by Q4 2026, roughly three years after founding | Confirms the commercial thesis is entirely speculative; eliminates the revenue-multiple basis for any valuation above capital raised; signals possible acqui-hire risk | Downgrade to Avoid; seek a direct company IR meeting for ARR confirmation before any additional capital commitment
Co-founder departure (any of Rives, Sercu, Lin, Candido) | Public announcement or confirmed LinkedIn departure of any of the four co-founders from an active role at EvolutionaryScale | Destroys the founder-domain-authority pillar; raises immediate questions about IP continuity, team morale, and Amazon/NVIDIA partner confidence | Immediate reassessment; reduce position; require explanation of IP assignment and non-compete status before any thesis upgrade
Open-source protein generative model parity | Release of any open-source protein language model with multimodal generative capability comparable to ESM3 and broad community adoption (>10k GitHub stars within 6 months) | Eliminates the Forge API's technical differentiation; commoditizes the $1.35B valuation anchor; shifts pricing power away from EvolutionaryScale toward infrastructure providers (AWS/NVIDIA) | Reassess valuation toward the bear case; evaluate whether AWS/NVIDIA distribution advantage alone sustains an $800M+ valuation
Amazon acqui-hire or hostile pricing change | Amazon offers to acquire EvolutionaryScale at a sub-$1B valuation, or AWS changes Forge pricing terms to capture economics directly | Reveals that Amazon views EvolutionaryScale as an infrastructure component rather than an independent platform, driving a valuation reset | Weigh the acqui-hire premium against the long-term independence path; model fair value as an AWS feature vs. a standalone platform

Kill triggers are binary or threshold events. Monitoring requires regular checks of EvolutionaryScale blog posts and press releases, LinkedIn tracking of co-founder activity, monitoring of GitHub protein model repositories, and review of AWS Partner Network announcements.

[CV005, CV022, CV023, CV029, CV033, CV037]
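Because these triggers are threshold events, they can be encoded as rules and re-run against each monitoring snapshot. The sketch below is illustrative only: the thresholds are the ones stated in the table, but the snapshot fields are hypothetical, and splitting the parity trigger into a manual capability judgment plus a GitHub star count is an assumption about how an analyst would operationalize it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ARR_DEADLINE = date(2026, 12, 31)     # "by year-end 2026"
OPEN_MODEL_STAR_THRESHOLD = 10_000    # ">10k GitHub stars within 6 months"
AMAZON_OFFER_FLOOR_USD_M = 1_000.0    # "sub-$1B valuation"


@dataclass(frozen=True)
class MonitoringSnapshot:
    as_of: date
    forge_arr_disclosed: bool
    cofounder_departures: int                   # confirmed departures among the four co-founders
    open_model_has_esm3_parity: bool            # analyst judgment on multimodal generative capability
    open_model_stars_first_6mo: int             # GitHub stars of that model in its first 6 months
    amazon_offer_usd_m: Optional[float] = None  # valuation of any Amazon offer, USD millions
    aws_forge_pricing_changed: bool = False     # AWS repriced Forge to capture economics directly


def fired_triggers(s: MonitoringSnapshot) -> list[str]:
    """Return the kill triggers whose stated threshold is met by this snapshot."""
    fired = []
    if s.as_of >= ARR_DEADLINE and not s.forge_arr_disclosed:
        fired.append("No Forge ARR disclosed by year-end 2026")
    if s.cofounder_departures > 0:
        fired.append("Co-founder departure")
    if s.open_model_has_esm3_parity and s.open_model_stars_first_6mo > OPEN_MODEL_STAR_THRESHOLD:
        fired.append("Open-source protein generative model parity")
    low_offer = s.amazon_offer_usd_m is not None and s.amazon_offer_usd_m < AMAZON_OFFER_FLOOR_USD_M
    if low_offer or s.aws_forge_pricing_changed:
        fired.append("Amazon acqui-hire or hostile pricing change")
    return fired


if __name__ == "__main__":
    # HYPOTHETICAL snapshot for illustration.
    snapshot = MonitoringSnapshot(date(2027, 1, 15), False, 0, False, 0)
    print(fired_triggers(snapshot))
```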
Final diligence asks
Topic | Missing evidence | Why it matters | Owner and diligence path
Forge ARR and customer count | Annual recurring revenue, enterprise customer count, and customer names (at least category-level) for the Forge API platform as of Q2 2026 | No valuation model above the Series A entry is defensible without revenue confirmation; ARR is the primary metric for validating the commercial thesis | Company IR; AWS Marketplace listing data; LinkedIn job postings referencing customer-facing roles; Series B fundraise data room
Forge gross margin | Cost of revenue for the Forge API (GPU compute cost per API call, infrastructure cost, allocated personnel); gross margin percentage | Gross margin determines whether Forge can scale into a high-value SaaS business (>70% gross margin) or is structurally a low-margin compute-resale business | Company IR; financial data room; AWS compute cost benchmarks as a proxy
Amazon and NVIDIA partnership financial terms | Revenue share percentage, exclusivity provisions, minimum commitment volumes, and term length for the AWS SageMaker JumpStart and NVIDIA BioNeMo distribution agreements | If Amazon or NVIDIA takes >40% of Forge gross revenue as a channel fee, EvolutionaryScale's net economics may not support independent platform value; exclusivity terms determine its ability to sell directly to customers | Series B data room; M&A data room request; NVIDIA and AWS partner program filings
ESM IP chain of title from Meta | Any IP transfer, license, or assignment agreement between Meta Platforms and the EvolutionaryScale founders covering ESM model architecture, training code, or data pipeline | Without a confirmed clean IP chain, acquirers or pharma partners face IP litigation risk; Amazon or pharma due diligence will require clean title | Company legal disclosure; patent search (USPTO); co-founder employment agreement review
Biosecurity and dual-use screening protocols | Customer screening process, access controls for high-risk requests (pathogen-related proteins), and compliance with NIH Dual Use Research of Concern (DURC) policies | Biosecurity regulatory action could restrict Forge distribution to non-US markets or force removal of API features; governance transparency is a precondition for pharma partnerships | EvolutionaryScale responsible development blog; DURC policy review; US DoD/BARDA contractor relationship check
Cap table and preference stack | Series A liquidation preference multiple, anti-dilution provisions, total preferred shares outstanding, and estimated diluted share count post-Series A | Common equity value at exits near or below the $1.35B headline depends critically on the preference stack; a 2x liquidation preference or participating preferred can significantly reduce common proceeds at moderate exit prices | Series B data room; VC legal counsel review; certificate of incorporation on file with the Delaware Secretary of State

All six diligence asks are blockers for a buy recommendation. Forge ARR is the highest-priority item; without it, no valuation confidence above the current Series A anchor is possible. IP and biosecurity items are pre-conditions for institutional pharma partnerships and any M&A transaction.

[CV005, CV009, CV034, CV037]
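The cap-table ask matters because preference terms change who gets paid at a given exit. The minimal waterfall sketch below uses the $142M Series A and ~$1.35B post-money from this report; everything else (a single preferred class, no seed preference, no accrued dividends, uncapped participation) is a simplifying assumption, since the actual terms were never disclosed. At a hypothetical $500M exit, for example, a 2x non-participating preference takes $284M versus $142M under 1x.

```python
def preferred_payout(
    exit_usd_m: float,
    invested_usd_m: float,
    post_money_usd_m: float,
    pref_multiple: float = 1.0,
    participating: bool = False,
) -> float:
    """Proceeds (USD millions) to a single preferred class at a given exit value.

    Simplifying assumptions: one preferred class, uncapped participation,
    no accrued dividends, as-converted ownership = invested / post-money.
    """
    ownership = invested_usd_m / post_money_usd_m
    preference = min(pref_multiple * invested_usd_m, exit_usd_m)
    if participating:
        # Participating: take the preference, then share pro rata in the remainder.
        return preference + ownership * (exit_usd_m - preference)
    # Non-participating: take the larger of the preference or the as-converted share.
    return max(preference, ownership * exit_usd_m)


SERIES_A = {"invested_usd_m": 142.0, "post_money_usd_m": 1350.0}
TERMS = {
    "1x non-participating": {"pref_multiple": 1.0, "participating": False},
    "2x non-participating": {"pref_multiple": 2.0, "participating": False},
    "1x participating": {"pref_multiple": 1.0, "participating": True},
}

if __name__ == "__main__":
    for exit_value in (500.0, 1350.0, 3000.0):  # HYPOTHETICAL exit values
        for label, terms in TERMS.items():
            # "All other holders" lumps together founders, seed investors, and the option pool.
            others = exit_value - preferred_payout(exit_value, **SERIES_A, **terms)
            print(f"exit ${exit_value:,.0f}M  {label:<22} all other holders: ${others:,.0f}M")
```

At high exit values, non-participating preferred converts and the 1x and 2x cases converge; participating preferred keeps taking its preference plus a pro rata share, which is why its terms matter at every exit price.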

8.6 Exhibits

Disclaimer

EvolutionaryScale ceased to operate as an independent for-profit entity on November 6, 2025, when its team was absorbed into CZ Biohub under the Chan Zuckerberg Initiative. This report is therefore primarily a historical, forensic diligence review of a defunct standalone investment thesis; the present-day successor, the CZI / CZ Biohub network, is non-profit and not open to outside investors. All financial figures (valuation, raise, headcount, downloads) are sourced from third-party reports, as no SEC filings or audited disclosures exist; acquisition terms are undisclosed. The recommendation reflects the unavailability of a forward-looking standalone equity instrument, not a judgment on the underlying science.

Evidence index

Claims
ID | Statement | Confidence | Sources
CO001 EvolutionaryScale was incorporated in 2023 and became operationally active in approximately March 2024. High SO001, SO004
CO002 EvolutionaryScale was co-founded by Alexander (Alex) Rives, Tom Sercu, Zeming Lin, and Sanjay Rao, all formerly of Meta AI Research (FAIR). High SO001, SO017, SO013
CO003 EvolutionaryScale was headquartered in San Francisco, California, USA, prior to the November 2025 CZI acquisition. Medium SO001, SO004
CO004 Alex Rives served as CEO of EvolutionaryScale from founding until the November 2025 CZI acquisition, at which point he became Head of Science at CZI. High SO001, SO017, SO014
CO005 Tom Sercu served as co-founder and VP of Engineering at EvolutionaryScale, leading infrastructure and large-scale model training. High SO017, SO010, SO009
CO006 EvolutionaryScale stated its mission as using generative AI to model the language of proteins and unlock programmable biology for human benefit. High SO001, SO002
CO007 EvolutionaryScale's flagship product was ESM3, a generative multimodal protein language model, released June 25, 2024. High SO001, SO002, SO017
CO008 ESM3 was publicly released on June 25, 2024, with both an open-weights variant (esm3-sm-open-v1) for academic use and a commercial Forge API offering. High SO002, SO017, SO005
CO009 ESM3 is available in multiple model sizes; the largest publicly released variant has 98 billion parameters. High SO002, SO009
CO010 ESM3 was trained on 2.78 billion protein sequences totaling 771 billion tokens using approximately 1×10^24 FLOPs on a cluster of NVIDIA H100 GPUs. Medium SO002, SO009, SO010
CO011 EvolutionaryScale released ESM Cambrian (ESM-C) on December 4, 2024. High SO003, SO026
CO012 ESM Cambrian is available in three model sizes: 300M, 600M, and 6B parameters, optimized for efficient protein language modeling inference. High SO003, SO026
CO013 A peer-reviewed paper on ESM3 titled "Simulating 500 million years of evolution with a language model" was published in Science on January 16, 2025, with DOI 10.1126/science.ads0018. High SO009, SO010, SO002
CO014 ESM3 encodes and generates proteins by treating sequences, structures, and functional annotations as a multimodal language, sampling from the space of 500 million years of protein evolution. Medium SO002, SO009
CO015 EvolutionaryScale raised a seed round announced on June 25, 2024, with participation from Lux Capital, Nat Friedman, Daniel Gross, NVIDIA, and Amazon; the seed amount was not publicly disclosed. Medium SO017, SO004
CO016 EvolutionaryScale closed a $142M Series A round on September 26, 2024, co-led by Amazon and NVIDIA. High SO015, SO017, SO004
CO017 The Series A round was closed at an implied post-money valuation of approximately $1.35 billion. Medium SO004, SO015
CO018 Additional participants in the Series A included Lux Capital, Nat Friedman, and Daniel Gross, who had also participated in the seed round. Medium SO017, SO004
CO019 As of May 2026, no SEC Form D filings were found under any variant of EvolutionaryScale in EDGAR for the 2024 to 2026 period. High SO011, SO012
CO020 EvolutionaryScale had 11 to 50 employees according to its LinkedIn company page, consistent with a seed/Series A-stage AI research startup. Low SO013
CO021 On November 6, 2025, CZ Biohub announced that the EvolutionaryScale team would join the CZ Biohub Network as part of the Frontier AI for Biology initiative led by the Chan Zuckerberg Initiative. Medium SO014, SO018
CO022 Following the November 2025 CZI acquisition, Alex Rives became Head of Science at the Chan Zuckerberg Initiative (CZI), and other co-founders joined CZ Biohub in senior research roles. Medium SO014, SO013
CO023 CZI and CZ Biohub framed the EvolutionaryScale acquisition as advancing open biological science and making frontier AI biology tools broadly accessible to researchers. Medium SO014, SO018
CO024 The ESM GitHub repository originally at github.com/evolutionaryscale/esm was transferred to the biohub organization following the CZI acquisition, signaling IP transfer. Medium SO007, SO006
CO025 ESM3 open-weights variant (esm3-sm-open-v1) accumulated over 3,100 downloads on HuggingFace; ESM Cambrian models accumulated over 6,300 downloads collectively, as of May 2026. Medium SO026, SO005
CO026 ESM3 was integrated into the NVIDIA BioNeMo platform and made available as an NVIDIA NIM microservice for enterprise deployment on H100 infrastructure. Medium SO017, SO019, SO022
CO027 No Wikipedia article exists for EvolutionaryScale; the URL en.wikipedia.org/wiki/EvolutionaryScale returns a 404 not-found page as of May 2026. Medium SO021, SO018
CO028 EvolutionaryScale operated a commercial API platform at forge.evolutionaryscale.ai providing developer access to ESM3 and ESM-C models; the platform is JavaScript-rendered and its operational status post-acquisition is unknown. Medium SO024, SO002, SO003
CO029 EvolutionaryScale never publicly disclosed commercial revenue, ARR, or customer count as a standalone entity. Medium SO001, SO004, SO024
CO030 All four co-founders (Rives, Sercu, Lin, Rao) were formerly at Meta AI (FAIR), creating a single-employer provenance risk with homogeneous cultural and technical assumptions and no evidence of diverse executive expertise outside AI research. High SO002, SO017, SO013
CO031 The ESM3 bioRxiv preprint (doi: 10.1101/2024.07.01.600583) was published in July 2024, ahead of the Science paper, with Rives, Sercu, Candido, Lin, and others as authors. Medium SO010, SO007
CO032 EvolutionaryScale's technological moat rested on large-scale protein language model pre-training, proprietary training infrastructure (Andromeda H100 cluster), and a multi-year research lead through the ESM model family lineage from Meta FAIR. Medium SO002, SO009, SO008
CO033 The DeepEP repository demonstrates EvolutionaryScale's infrastructure capability in mixture-of-experts inference and expert-parallel communication, relevant to deploying large protein language models at scale. Medium SO008, SO006
CO034 Following the CZI acquisition, the ESM model family is expected to remain accessible as open-source research tools through the CZ Biohub network, continuing the open-weights distribution strategy. Medium SO014, SO007
CO035 EvolutionaryScale was classified as an early-stage private company at seed through Series A stage, with no commercial product revenue disclosed prior to the CZI acquisition in November 2025. High SO001, SO004, SO015
CO036 NVIDIA participated in EvolutionaryScale's seed round, announced alongside the ESM3 launch on June 25, 2024, and later co-led the Series A in September 2024. Medium SO017, SO023
CO037 The Forge API platform (forge.evolutionaryscale.ai) was the commercial interface for EvolutionaryScale protein design models, providing programmatic access for biotechnology and pharmaceutical customers. Medium SO024, SO002
CO038 The Bloomberg article reporting on the $142M Series A is behind a paywall, preventing public verification of full financing terms, investor rights, and any secondary components of the deal. Medium SO015, SO027
CM001 EvolutionaryScale's core addressable market is the protein language model (PLM) API and platform market—cloud-hosted AI models enabling protein engineers to generate, predict, and optimize protein sequences and structures without exhaustive wet-lab directed evolution. High SM011, SM012
CM002 ESM3, published in Science on January 16, 2025 (DOI: 10.1126/science.ads0018), is the first generative protein language model to simultaneously reason over sequence, structure, and function in a single unified architecture—trained on 2.78 billion protein sequences with 98 billion parameters using approximately 1×10^24 FLOPs. High SM013, SM015
CM003 Status-quo substitutes for protein LM platforms include AlphaFold2/3 (free structure prediction database, 200M+ structures), Rosetta/PyRosetta (open-source protein design), directed evolution in wet lab (weeks per cycle, throughput-limited), and traditional molecular dynamics tools (GROMACS, Schrödinger Maestro), none of which provide generative multi-modal reasoning over sequence, structure, and function jointly. Medium SM007, SM017
CM004 The adjacent AI drug discovery platform market (Grand View Research) is estimated at $2.35B in 2025 growing to $13.77B by 2033 at 24.8% CAGR; EvolutionaryScale's Forge API serves as infrastructure enabling this broader market by providing protein characterization and engineering capabilities. Medium SM005
CM005 Industrial biotechnology—enzyme engineering for green chemistry, agriculture, biomaterials, and food science—is a secondary adjacency for EvolutionaryScale with shorter product development cycles and lower regulatory burden than pharmaceutical applications. Medium SM011, SM025
CM006 The outer boundary drug discovery market (all modalities) is estimated at $71.89B in 2025 growing to $158.74B by 2034 at 9.2% CAGR (Precedence Research); protein engineering API tools constitute a specialized AI sub-segment within this broader market well beyond EvolutionaryScale's direct footprint. Medium SM006
CM007 MarketsandMarkets estimates the protein engineering market at $2.2B (2019) growing to $3.9B by 2024 at 12.4% CAGR; Allied Market Research estimates $2.2B (2022) to $7.7B by 2032 at 13.2% CAGR; Grand View Research estimates $2.60B (2023) to $7.62B by 2030 at 16.24% CAGR—all directionally consistent at 12–16% annual growth over a decade. Medium SM001, SM003, SM004
CM008 Precedence Research takes the broadest scope, estimating the protein engineering market at $5.09B in 2025 growing to $23.59B by 2035 at 16.57% CAGR, incorporating industrial enzymes, biopharmaceuticals, and all research tools rather than just software and services. Medium SM002
CM009 Analyst estimates for the protein engineering market span roughly 10×, from a $2.2B base-year figure to a $23.59B 2035 forecast (base years 2019–2025); the dispersion is attributable mainly to scope inconsistency, since narrow estimates focus on software/services while broad estimates incorporate industrial enzymes, biopharmaceuticals, and manufacturing applications, and secondarily to differing base and forecast years. Medium SM001, SM002, SM003, SM004
CM010 The FDA received over 500 AI/ML-enabled drug development submissions between 2016 and 2023, issued draft AI guidance in 2025, and established the CDER AI Council in 2024, signaling active and accelerating federal regulatory engagement with AI-native drug development tools including protein engineering applications. High SM010, SM016
CM011 EvolutionaryScale raised $142 million in a Series A round (Crunchbase), establishing investor-validated commercial potential for the protein LM API market; the Forge commercial API monetizes ESM3 access for biopharma and biotech customers beyond the MIT-licensed free tier. Medium SM022, SM024
CM012 The AI drug discovery market (24.8% CAGR, GVR) grows materially faster than the protein engineering tools market (12–17% CAGR), reflecting accelerated pharma AI investment post-AlphaFold; EvolutionaryScale's Forge API benefits from both trajectories as an enabling platform. Medium SM005, SM001, SM002
CM013 No independently published serviceable addressable market figure exists for protein language model APIs specifically within pharmaceutical R&D; all protein engineering market estimates encompass the full market including reagents and instruments, making PLM API SAM derivation assumption-dependent. Medium SM001, SM002, SM003, SM004
CM014 ESM3's commercial Forge API and ESM-C open weights were launched in June 2024 and December 2024, respectively, with ESM-C distributed on AWS SageMaker JumpStart and NVIDIA BioNeMo to reach enterprise pharma customers already embedded in those cloud ecosystems. High SM011, SM012, SM008, SM009
CM015 The primary commercial buyer for EvolutionaryScale's Forge API is the large or mid-tier pharmaceutical or biotech company with an active computational biology or protein engineering program, where economic buyer authority rests with a VP of Computational Biology, Director of Drug Discovery, or Chief Scientific Officer. Medium SM011, SM012
CM016 Academic and government research labs constitute a high-volume, zero-revenue user segment: ESM-C open weights under MIT license have been downloaded over 6,320 times from HuggingFace and the ESM package is available via PyPI, providing community mindshare that can feed eventual commercial pipeline. Medium SM018, SM025
CM017 NVIDIA BioNeMo and AWS SageMaker JumpStart serve as enterprise distribution channels for ESM-C, lowering commercial adoption friction for pharma customers with existing cloud infrastructure contracts on those platforms. High SM008, SM009, SM012
CM018 The technical champion for ESM3/ESM-C adoption is typically a computational biologist, structural biologist, or machine learning scientist within a pharma or biotech R&D organization who evaluates model capabilities and advocates for integration into existing protein engineering pipelines. Medium SM011, SM017
CM019 Industrial biotechnology companies—engineering enzymes for green chemistry, agriculture, and specialty materials—represent a growing buyer segment with different procurement patterns than pharma: shorter development cycles, lower regulatory burden, and higher tolerance for experimental API tools. Medium SM025, SM011
CM020 The ESM Python package on PyPI enables access to ESM3 open models and commercial Forge API, listing all model sizes (esm3-large-2024-03, ESM-C 300M/600M/6B) and API authentication, supporting both researcher self-service and enterprise paid Forge access from a single installation. Medium SM025, SM017
CM021 Biotech startups at Series A–B stage represent an emerging paid segment for Forge API: they have computational infrastructure but lack resources to train frontier protein LMs independently, making API access economically rational vs. self-hosting 98B-parameter ESM3. Medium SM022, SM011
CM022 DNA sequencing costs declined from approximately $10,000 per genome in 2011 to approximately $100 per genome by 2023 (NHGRI), enabling exponential growth in protein sequence databases and providing the training data foundation that enables frontier-scale protein language models like ESM3 to generalize across the protein universe. High SM023, SM020
CM023 Google DeepMind's AlphaFold protein structure database provides free access to over 200 million predicted protein structures; this open resource normalizes computational protein tools in pharma R&D and expands the addressable buyer base for ESM3/ESM-C by reducing scientific credibility risk. High SM007, SM013
CM024 NVIDIA BioNeMo delivers 2× faster biofoundation model training and 6× faster model inference versus unoptimized implementations, reducing the total cost of ownership for enterprise protein LM deployment and strengthening EvolutionaryScale's NVIDIA distribution partnership as a commercial channel. High SM008, SM009
CM025 ESM3 was trained on 2.78 billion protein sequences with 98 billion parameters using approximately 1×10^24 FLOPs of compute (Science, January 2025)—a scale achievable only because of the exponential growth in protein databases enabled by declining sequencing costs. High SM013, SM015
CM026 ESM-C's release under MIT license on HuggingFace, with AWS SageMaker and NVIDIA BioNeMo distribution, mirrors the open-weight strategy that drove commercial cloud API conversion in NLP (e.g., Hugging Face, Mistral AI), positioning EvolutionaryScale to become the community standard for protein LMs. Medium SM012, SM018
CM027 ESM-C (300M, 600M, and 6B parameter variants) is available under MIT license on HuggingFace (6,320+ downloads for the 600M variant, 3,110+ for ESM3 open), enabling any organization with GPU access to self-host the model at zero marginal cost, creating a pricing ceiling and limiting paid Forge conversion for price-sensitive customers. Medium SM018, SM017
CM028 No protein engineered purely by a computational AI model has received regulatory approval without extensive in vitro and in vivo wet-lab validation; the experimental bottleneck remains a necessary post-computational step, structurally limiting the standalone commercial value of a protein LM API. High SM010, SM013
CM029 Enterprise pharma technology procurement cycles typically add 12–24 months to commercial deployment timelines relative to academic adoption due to IT security reviews, cloud data governance policies, SOC2/GxP compliance requirements, and multi-year vendor vetting processes. Medium SM008, SM009
CM030 Google DeepMind (AlphaFold3), NVIDIA BioNeMo, and AWS HealthOmics all have distribution, compute, and ecosystem advantages that could threaten EvolutionaryScale's commercial differentiation if frontier protein LM capabilities converge toward commodity—a material long-run competitive risk. Medium SM007, SM008, SM019
CM031 The bioRxiv preprint server indexed over 129 papers citing ESM3 or EvolutionaryScale as of the access date, indicating strong academic community engagement with the ESM protein LM family and validating the open-weight strategy for building ecosystem adoption. Medium SM014, SM015
CM032 No independent SAM figure for protein language model APIs within pharmaceutical R&D has been published; all accessible analyst estimates cover the full protein engineering tools market ($2.2B–$23.59B), making PLM API SAM derivation assumption-dependent and constituting a material diligence gap. Medium SM001, SM002, SM003, SM004
CM033 EvolutionaryScale has not publicly disclosed Forge API pricing, subscriber counts, or revenue figures; HuggingFace download metrics and GitHub stars are developer adoption proxies that do not directly translate to commercial revenue without knowledge of the paid conversion funnel. Medium SM018, SM025, SM022
CM034 The protein engineering market analyst consensus (MAM, Allied, GVR) converges on a $2.2B–$2.6B base (2019–2023 base years) and 12–16% CAGR, with the two longer-horizon estimates (Allied, GVR) reaching $7.6B–$7.7B by 2030–2032; Precedence's $5.09B 2025 base is an outlier explained by its broader scope, which includes industrial enzymes and biopharmaceutical manufacturing. Medium SM001, SM002, SM003, SM004
CM035 For context, AlphaFold has accumulated over 40,000 citations; the ESM3 Science paper has drawn 129+ follow-on bioRxiv preprints within one year of publication, demonstrating the scientific impact of the ESM model family and establishing ecosystem depth that sustains commercial positioning. Medium SM013, SM014
CM036 EvolutionaryScale's distribution strategy—open weights on HuggingFace (MIT license) + enterprise Forge API + AWS SageMaker + NVIDIA BioNeMo—creates a multi-channel commercial model spanning free community tier, cloud marketplace access, and direct enterprise contracts. High SM011, SM012, SM008, SM009
CP001 AbSci Corporation (NASDAQ: ABSI) filed a 10-K with the SEC for fiscal year ended December 31, 2025, confirming it is a publicly traded generative AI drug company based in Vancouver, Washington. High SP026, SP004
CP002 DeepMind's AlphaFold Protein Structure Database, developed in partnership with EMBL-EBI, provides open access to over 200 million predicted protein structures under a CC-BY-4.0 license, used by over 3 million researchers in 190+ countries. High SP024, SP006
CP003 EvolutionaryScale's ESM3 is the first generative model to simultaneously reason over protein sequence, structure, and function in a single multimodal architecture, published in Science on January 16, 2025, trained with over 10^24 FLOPs and 98 billion parameters. High SP022, SP025
CP004 Generate Biomedicines has generated, built, and tested over 42,000 proteins through its continuously learning platform, with 140,000+ square feet of lab space across Boynton Yards and Andover locations. Medium SP002
CP005 Cradle.bio's homepage reports that teams using Cradle achieve 2–12x faster protein development timelines, with results compounding across successive rounds of wet-lab and AI iteration. Medium SP005
CP006 The RFdiffusion algorithm for de novo protein structure and function design was published in Nature in July 2023 by Baker Lab researchers, representing the Baker Lab / IPD's leading open-source generative design tool. High SP021, SP010
CP007 Meta's ESM2 and ESMFold protein language models are released under an MIT license, confirmed on both GitHub (github.com/facebookresearch/esm) and HuggingFace, permitting commercial use at zero cost. High SP017, SP018
CP008 Meta's ESM protein language models were created by Alexander Rives, Zeming Lin, Tom Sercu, and Salvatore Candido at Meta AI FAIR — the exact same four individuals who co-founded EvolutionaryScale in 2023. High SP017, SP020
CP009 Isomorphic Labs is an Alphabet subsidiary focused on AI-driven drug discovery, building on the Nobel Prize-winning AlphaFold system, with an interdisciplinary team of drug discovery experts and machine learning specialists. Medium SP008
CP010 Chai Discovery is developing Chai-2, which targets drug-like antibody design against challenging targets with atomic precision, building on its earlier open-released Chai-1 model. Medium SP009
CP011 Recursion Pharmaceuticals (NASDAQ: RXRX) has generated over 50 petabytes of biological and chemical data and operates BioHive-2, a biopharma supercomputer built in partnership with NVIDIA. Medium SP011
CP012 Schrödinger's computational platform is built on over 30 years of R&D and includes FEP+, WaterMap, and LiveDesign as core products used by leading pharmaceutical companies for molecular discovery and optimization. Medium SP013, SP014
CP013 Inceptive specializes in foundation models for RNA, mRNA, siRNA, ASO, and peptide therapeutics, operating from offices in Palo Alto, Berlin, and Zurich, and was founded in 2021. Medium SP015
CP014 Iambic Therapeutics uses its Enchant and NeuralPLexer AI technologies for drug design and has reported Phase 1b safety and tolerability data for IAM1363, a brain-penetrant HER2-targeted inhibitor for cancer treatment. Medium SP016
CP015 Xaira Therapeutics is building predictive and agentic AI models across the complete drug discovery and development process, including target identification, therapeutic design, and patient selection. Medium SP028
CP016 The OpenFold Consortium provides permissively licensed open-source protein folding tools including OpenFold, OpenFold-SoloSeq (no MSA required), and OpenFold-Multimer for protein complex modeling. Medium SP019
CP017 AbSci's AI Drug Creation Platform operates with 6-week wet-lab and AI iterative cycles for de novo biologic design, enabling multi-parametric lead optimization from concept through to clinical trial pipeline. Medium SP004
CP018 AbSci has designed ABS-201, an AI-generated antibody targeting the prolactin receptor for androgenetic alopecia, which demonstrated hair follicle regeneration in in vivo studies and is positioned as a potential best-in-class therapeutic developed in 24 months. Medium SP004
CP019 Profluent Bio describes OpenCRISPR on its website as the world's first AI-designed gene editor, representing the company's flagship public demonstration of protein design AI capability. Medium SP001
CP020 Cradle.bio is SOC 2 compliant and operates on a software subscription model where customer IP is fully retained, customer experimental data is never used to train models for other customers, and no royalties are charged. Medium SP005
CP021 Novonesis (formerly Novozymes), one of the world's largest industrial biotech companies, has publicly stated a partnership with Cradle that embeds AI directly into how it innovates protein products to shorten development time. Medium SP005
CP022 Generate Biomedicines operates over 140,000 square feet of lab space at Boynton Yards and Andover locations, supporting a capital-intensive generate-build-measure-learn platform. Medium SP002
CP023 Generate Biomedicines' lead program GB-0895 is an AI-designed anti-TSLP antibody for asthma, co-optimized for both biological effect and reduced dosing frequency, with potential to shift treatment from monthly to twice-yearly administration. Medium SP003
CP024 Recursion Pharmaceuticals' clinical pipeline includes REC-4881 (Phase 2 MEK1/2 inhibitor for FAP with Orphan Drug and Fast Track designations) and REC-3565 (Phase 1 MALT1 inhibitor for B-cell lymphoma). Medium SP012
CP025 Adaptyv Bio is based at the Biopole Life Science Campus in Epalinges, Lausanne, Switzerland and positions itself as a cloud lab for protein designers. Low SP007
CP026 EvolutionaryScale offers Forge, its commercial API platform for ESM3 and ESMC access, described as entering public beta in January 2025 alongside the Science publication announcement. Medium SP025, SP023
CP027 Cradle.bio charges customers a software subscription fee, explicitly promises no royalties, and states that customer sequences and data are private, secure, and never used to train models for other customers. Medium SP005
CP028 Meta's ESM2 model family is available at zero cost on both GitHub and HuggingFace under the MIT license, with no usage restrictions (including on commercial use); models from 8M to 650M parameters are publicly hosted on HuggingFace. High SP017, SP018
CP029 Schrödinger announced Q1 2026 financial results in May 2026, confirming its active status as a publicly traded drug discovery platform company (NASDAQ: SDGR). Medium SP013
CP030 Generate Biomedicines and AbSci both monetize through B2B pharma partnership and licensing models rather than offering public self-service APIs, distinguishing their commercial models from EvolutionaryScale's Forge API approach. Medium SP002, SP004
CP031 ESM2 was released by Meta AI FAIR under an MIT license, and the same researchers (Rives, Lin, Sercu, Candido) who created it at Meta are the co-founders of EvolutionaryScale, creating a structural commoditization baseline against their own commercial offering. High SP017, SP020, SP025
CP032 ESMFold, a protein structure prediction model based on ESM2 developed at Meta AI, predicts protein structure end-to-end up to 60x faster than prior state-of-the-art methods and is freely available. Medium SP020
CP033 The Meta ESM2 model family (up to 15B parameters) and ESMFold, both released under MIT license for any use including commercial, were built by EvolutionaryScale's own founders and set a free commoditization floor for basic protein language modeling. High SP017, SP018, SP020
CP034 Insilico Medicine (HKEX: 3696) has completed a Phase 2 clinical trial for ISM001-055 (TNIK inhibitor for IPF), making it the first AI drug discovery company to reach Phase 2 completion with a drug designed entirely using AI. Medium SP027
CP035 ESM3's defining differentiation is simultaneous joint reasoning over protein sequence, structure, and function in one multimodal model — a capability absent from ESM2 (sequence only) and standard AlphaFold variants (structure prediction only). High SP022, SP025
CP036 Generate Biomedicines has raised substantially more capital than EvolutionaryScale (estimated ~$700M+ vs $142M), enabling a capital-intensive wet-lab validation strategy that EvolutionaryScale's disclosed funding cannot currently replicate. Medium SP002
CP037 AlphaFold 3's commercial rights for drug discovery are exclusively licensed to Isomorphic Labs, while the AlphaFold 3 code and model weights are available for academic, non-commercial use, creating a two-tier access structure. Medium SP006, SP008
CP038 Pharma clients can simultaneously use free protein AI tools (AlphaFold DB CC-BY-4.0, ESM2 MIT, OpenFold open-source) and paid platforms (Forge, Cradle, Generate), enabling multi-homing that limits any single vendor's pricing power. Medium SP006, SP017, SP005, SP025
CP039 David Baker (Institute for Protein Design, University of Washington) was co-awarded the Nobel Prize in Chemistry in October 2024 for computational protein design, alongside Demis Hassabis and John Jumper (Google DeepMind) for AlphaFold. High SP006, SP010
CP040 The Institute for Protein Design distributes RFdiffusion and RoseTTAFold software royalty-free and developed a COVID-19 vaccine using protein design technology that received approval in the UK and South Korea and was granted WHO Emergency Use Listing. Medium SP010, SP021
CI001 EvolutionaryScale's commercial product was Forge, a protein language model inference API launched in public beta in January 2025, providing pay-per-token access to ESM3 and ESM Cambrian models. High SI001, SI003, SI004
CI002 ESM Cambrian (released January 2025) was made available exclusively as a commercial model through the Forge API, unlike the open-weight ESM2 from Meta AI Research which is freely available on HuggingFace. High SI003, SI023
CI003 No revenue figures, ARR, gross margin, customer count, or commercial traction metrics were publicly disclosed by EvolutionaryScale at any point during its operation as a standalone entity. High SI001, SI014, SI015
CI004 The Forge API pricing schedule required a user login to view at forge.evolutionaryscale.ai; no public price list was available to unauthenticated users as of the research date. Medium SI004
CI005 EvolutionaryScale's revenue model combined at least three streams: Forge API pay-per-use (per-token), enterprise annual API contracts, and partner distribution through NVIDIA BioNeMo and AWS SageMaker JumpStart. Medium SI004, SI016, SI017
CI006 ESM3 was integrated into NVIDIA's BioNeMo platform as an NVIDIA Inference Microservice (NIM), enabling cloud-hosted protein generation through NVIDIA's commercial distribution channel. High SI017, SI018, SI019
CI007 EvolutionaryScale's ESM models were listed on AWS SageMaker JumpStart for cloud-hosted access; Amazon was a co-lead investor in the September 2024 Series A, suggesting strategic alignment between investment and cloud distribution. Medium SI009, SI017
CI008 EvolutionaryScale offered an academic free tier with capped token allowances as a freemium entry point to Forge API, intended to drive academic usage and downstream conversion to enterprise or paid API tiers. Medium SI004, SI001
CI009 The open-weight ESM2 model (developed by Meta AI Research and released on HuggingFace) serves as a zero-cost alternative for protein sequence embeddings, creating a structural competitive ceiling on EvolutionaryScale's Forge API pricing power for non-generative use cases. Medium SI023, SI003
CI010 The Forge API's post-CZI Biohub operational status and pricing under CZI management are unconfirmed in public sources as of May 2026; the company homepage states it is "joining forces with Biohub" without specifying Forge API continuity. High SI001, SI013
CI011 ESM3 was trained with over 10^24 FLOPs—described by EvolutionaryScale as "the most compute ever applied to training a biological model"—on what the company called "one of the highest throughput GPU clusters in the world today." High SI002, SI018
CI012 At over 10^24 FLOPs and H100 GPU pricing of approximately $2–5 per GPU-hour, the ESM3 training run is estimated to have cost $10–50 million, making it the dominant one-time capital expenditure in EvolutionaryScale's history. Low SI002, SI018
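A minimal sketch of the GPU-hour arithmetic behind CI012 follows. It assumes (not from the source) an H100 SXM dense BF16 peak of roughly 989 TFLOP/s and a model-FLOPs-utilization (MFU) value chosen by the reader; the 1.07×10²⁴ FLOPs figure comes from CE010 and the $2–5 per GPU-hour band from CI012.

```python
# Sketch: convert a training-compute budget (FLOPs) into GPU-hours and rental cost.
# Assumptions (not from the source): ~989 TFLOP/s dense BF16 peak per H100 SXM,
# and a model-FLOPs-utilization (MFU) value supplied by the caller.

H100_BF16_PEAK_FLOPS = 989e12  # approximate dense BF16 peak per GPU, FLOP/s


def gpu_hours(total_flops: float, peak_flops: float, mfu: float) -> float:
    """GPU-hours needed if each GPU sustains `mfu` of its peak throughput."""
    if not 0 < mfu <= 1:
        raise ValueError("mfu must be in (0, 1]")
    seconds = total_flops / (peak_flops * mfu)
    return seconds / 3600


def cost_range(hours: float, low_rate: float, high_rate: float) -> tuple[float, float]:
    """Rental cost band for a number of GPU-hours at low/high $ per GPU-hour."""
    return hours * low_rate, hours * high_rate


if __name__ == "__main__":
    total = 1.07e24  # ESM3-large training compute cited in CE010
    for mfu in (0.10, 0.25, 0.40):
        h = gpu_hours(total, H100_BF16_PEAK_FLOPS, mfu)
        lo, hi = cost_range(h, 2.0, 5.0)
        print(f"MFU {mfu:.0%}: {h / 1e6:.2f}M GPU-hours, ${lo / 1e6:.1f}M-${hi / 1e6:.1f}M")
```

Under these assumptions the sketch gives roughly 3.0M GPU-hours ($6–15M) at 10% MFU and roughly 0.75M GPU-hours ($1.5–3.8M) at 40% MFU. Most of the $10–50M band therefore implies single-digit utilization at these rental rates, or costs beyond raw GPU rental such as ablations, failed runs, storage, and networking.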
CI013 EvolutionaryScale's LinkedIn company profile shows the 11-50 employee size bracket; within that bracket, this report assumes approximately 25-50 full-time employees at peak operational scale. Medium SI025
CI014 At an estimated 25-50 FTE with a blended all-in cost of $200,000–$300,000 per employee annually (standard for San Francisco AI research teams), EvolutionaryScale's annual personnel burn is estimated at $5–15 million per year. Low SI025
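The CI014 range is a simple interval product, reproduced below. The 25–50 FTE and $200,000–$300,000 per-FTE inputs are the report's own assumptions (CI013, CI014), not disclosed data.

```python
def annual_burn_range(fte_low: int, fte_high: int,
                      cost_low: int, cost_high: int) -> tuple[int, int]:
    """Personnel burn band: fewest staff at the lowest cost, most staff at the highest."""
    return fte_low * cost_low, fte_high * cost_high


if __name__ == "__main__":
    low, high = annual_burn_range(25, 50, 200_000, 300_000)
    print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M per year")  # $5M to $15M per year
```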
CI015 Amazon's role as lead Series A investor may have included in-kind AWS cloud compute credits as part of the deal structure, which would reduce EvolutionaryScale's cash infrastructure spend and extend effective runway beyond naive burn-rate estimates. Low SI009, SI017
CI016 Gross margin for the Forge API inference business depends on whether EvolutionaryScale owned GPU cluster infrastructure (capital-intensive, higher long-run margin) or rented cloud compute (lower capex, COGS-heavy). No gross margin figures were disclosed. Low
CI017 ESM3 was developed with 98 billion parameters, placing it firmly in the frontier model scale class for biological language models; inference costs per query at this parameter scale are substantially higher than smaller protein models. High SI002, SI018
CI018 EvolutionaryScale raised $142 million in a Series A round announced on September 26, 2024, led by Amazon and NVIDIA, with co-investment from Lux Capital, Nat Friedman, and Daniel Gross. High SI001, SI009, SI010, SI017
CI019 The Series A was reported at a post-money valuation of approximately $1.35 billion, placing EvolutionaryScale among the most highly valued pre-revenue protein AI companies at the time of the round. High SI009, SI011, SI020
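The implied pre-money valuation and new-investor stake follow from two lines of arithmetic. This sketch assumes, without confirmation from the source, that the full $142M was primary equity priced at the reported ~$1.35B post-money valuation (no secondary sales or convertible instruments).

```python
def priced_round(raised_m: float, post_money_m: float) -> tuple[float, float]:
    """Return (pre-money valuation, fraction of the company sold) for a priced round."""
    pre_money_m = post_money_m - raised_m
    stake = raised_m / post_money_m
    return pre_money_m, stake


if __name__ == "__main__":
    pre, stake = priced_round(142.0, 1350.0)  # figures in USD millions
    print(f"pre-money ~${pre:,.0f}M, Series A stake ~{stake:.1%}")
```

Under that assumption the Series A investors held roughly 10.5% of the company at a ~$1.21B pre-money valuation, which frames the undisclosed return profile described in CI034.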
CI020 NVIDIA joined EvolutionaryScale's seed round (announced June 25, 2024), making NVIDIA both a seed and Series A investor—an unusual dual-round commitment that underscores the strategic importance of ESM3 to NVIDIA's BioNeMo platform. High SI016, SI017
CI021 EvolutionaryScale's seed round (late 2023) was led by Nat Friedman and Daniel Gross, with participation from Lux Capital; the dollar amount raised in the seed was not publicly confirmed in any accessible source. Medium SI014, SI015
CI022 Total capital raised by EvolutionaryScale prior to the CZI Biohub transaction is estimated at approximately $145 million: the $142M Series A plus an assumed ~$3M seed, whose actual amount has not been publicly confirmed. Medium SI014, SI009, SI016
CI023 On November 6, 2025, the EvolutionaryScale team joined CZI Biohub to advance the Frontier AI for Biology Initiative, as announced by Biohub.org and reported by CNBC. High SI013, SI012, SI001
CI024 Alex Rives, EvolutionaryScale co-founder and chief scientist, became Head of Science at CZI Biohub following the November 2025 transaction. High SI013, SI012
CI025 The CZI Biohub transaction (November 6, 2025) came roughly 13 months (406 days) after the $142M Series A closed on September 26, 2024, leaving only a narrow window for a commercial revenue ramp before the standalone entity effectively ended. High SI009, SI013
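The elapsed time between the two dated events can be checked with the standard library; the dates below are the Series A close (CI018, CI025) and the CZI Biohub announcement (CI023).

```python
from datetime import date

SERIES_A_CLOSE = date(2024, 9, 26)
CZI_BIOHUB_ANNOUNCEMENT = date(2025, 11, 6)

elapsed_days = (CZI_BIOHUB_ANNOUNCEMENT - SERIES_A_CLOSE).days
# Average Gregorian month length, used only for a rough month count.
elapsed_months = elapsed_days / (365.2425 / 12)

if __name__ == "__main__":
    print(elapsed_days, round(elapsed_months, 1))  # 406 13.3
```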
CI026 No Form D filings for EvolutionaryScale appear in SEC EDGAR under any of the following search terms: "EvolutionaryScale," "Evolutionary Scale," "Evolutionary Scale Inc," or by key person "Alexander Rives" — across four separate EDGAR full-text and company browse searches. High SI005, SI006, SI007, SI008
CI027 Private companies relying on an SEC Regulation D exemption are required to file Form D with the SEC within 15 days of the first sale of securities. The absence of any Form D in EDGAR for EvolutionaryScale's $142M Series A is therefore notable: it points to a regulatory compliance gap, a filing made under a legal entity name not yet identified, or reliance on an exemption outside Regulation D (such as Section 4(a)(2)) that carries no Form D requirement. High SI005, SI006
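For context on the 15-day window in CI027, the sketch below computes a Form D due date. It assumes the Rule 503 convention that a deadline falling on a weekend or holiday rolls to the next business day, and it models weekends only (holidays are deliberately omitted). The first-sale date is hypothetical: the actual first-sale date for the Series A is not public, so the announcement date is used purely as a stand-in.

```python
from datetime import date, timedelta


def form_d_due_date(first_sale: date) -> date:
    """Return first_sale + 15 calendar days, rolled forward past Saturday/Sunday.

    Simplification: federal holidays are not modeled.
    """
    due = first_sale + timedelta(days=15)
    while due.weekday() >= 5:  # weekday(): Monday=0 ... Saturday=5, Sunday=6
        due += timedelta(days=1)
    return due


if __name__ == "__main__":
    # Hypothetical first-sale date: the Series A announcement date as a stand-in.
    print(form_d_due_date(date(2024, 9, 26)))  # 2024-10-11
```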
CI028 The financial terms of the November 2025 CZI Biohub transaction were not disclosed in any public source reviewed, including Biohub.org, CNBC, EvolutionaryScale's homepage, or SEC EDGAR. High SI012, SI013, SI001
CI029 Xaira Therapeutics raised $1 billion at its founding in 2024 for full-stack AI drug discovery, the largest single-round raise in protein AI; by comparison, EvolutionaryScale's $142M Series A (at a $1.35B post-money valuation) amounted to roughly one-seventh of Xaira's initial capital. Medium SI014, SI015
CI030 Profluent Bio raised approximately $44 million across its financing rounds for protein design AI with a narrower commercial scope than EvolutionaryScale, demonstrating that the protein AI market can support smaller, more focused capital deployments alongside frontier-scale foundation models. Medium SI014
CI031 Generate:Biomedicines raised over $700 million across Series A through C for full-stack AI protein therapeutics, targeting drug revenue rather than API monetization—a fundamentally different business model and capital structure from EvolutionaryScale's foundation-model API approach. Medium SI014, SI015
CI032 EvolutionaryScale's ~$3–6M of capital raised per employee (based on ~$145M total and ~25–50 FTE) substantially exceeds the capital raised per employee at Profluent and Cradle, reflecting the compute intensity of frontier biological foundation model training rather than scaled commercial deployment. Low SI014, SI025
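The CI032 band divides the estimated total raised by the assumed headcount range; the larger headcount gives the smaller per-employee figure. Both inputs are report estimates (CI013, CI022), not disclosed data, and the ~$145M total includes an unconfirmed seed amount.

```python
def capital_per_employee(total_raised: float, fte_low: int, fte_high: int) -> tuple[float, float]:
    """Return (low, high) capital raised per employee for a headcount range."""
    # More employees means less capital per head, so fte_high gives the lower bound.
    return total_raised / fte_high, total_raised / fte_low


if __name__ == "__main__":
    low, high = capital_per_employee(145e6, 25, 50)
    print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M per employee")  # $2.9M to $5.8M per employee
```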
CI033 Crunchbase incorrectly labels EvolutionaryScale's $142M Series A as a "seed investment round" in its AI-generated summary, illustrating the unreliability of AI-generated private-market data summaries; the actual round type is confirmed as Series A by CNBC, Axios, NVIDIA, and MIT Technology Review. High SI014, SI009
CI034 The investor return profile for EvolutionaryScale's $142M Series A participants (Amazon, NVIDIA, Lux Capital, Nat Friedman, Daniel Gross) is not determinable from public sources following the CZI Biohub transaction, as no deal terms or investor distribution amounts were disclosed. Low SI012, SI013
CI035 EvolutionaryScale as an independent commercial entity effectively ceased to operate following the November 2025 CZI Biohub transaction; the company homepage confirms the entity is "joining forces with Biohub" without a separate commercial continuation announcement. High SI001, SI013, SI012
CI036 All five targeted financial information gaps (actual revenue, confirmed burn rate, CZI transaction terms, Form D filings, and enterprise customer count) remain unresolved in public sources as of May 2026; closing them requires direct access to CZI Biohub documentation or historical EvolutionaryScale internal records. High SI005, SI001, SI014
CI037 The CZI Biohub is a non-profit initiative of the Chan Zuckerberg Initiative whose Frontier AI for Biology Initiative absorbs EvolutionaryScale's team and models under a philanthropic, non-commercial mandate—a fundamental change from the VC-backed commercial API business model. High SI013, SI012
CI038 The $142M Series A at $1.35B valuation for a pre-revenue, sub-50-employee foundation model company represents a significant premium ascribed entirely to the scientific moat and strategic optionality of ESM3/ESM Cambrian rather than demonstrated commercial revenue or customer traction. Medium SI009, SI019, SI002
CE001 EvolutionaryScale's product portfolio consists of two model families: ESM3 (multimodal generative protein LM in 1.4B/7B/98B sizes) and ESM-C / Cambrian (embedding-focused protein LM in 300M/600M/6B sizes). High SE001, SE002, SE003
CE002 ESM3-small-2024-08 has 1.4 billion parameters; ESM3-medium-2024-08 has 7 billion; and ESM3-large-2024-03 has 98 billion parameters. High SE001, SE009, SE017
CE003 ESMC-300M uses 30 transformer layers with hidden width 960; ESMC-600M uses 36 layers with width 1152; ESMC-6B uses 80 layers with width 2560. Medium SE002
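As a consistency check on CE003, the common ~12·L·d² approximation for a transformer's non-embedding weights can be applied to each configuration. This is a rough rule of thumb, not EvolutionaryScale's published accounting: it ignores embeddings, normalization, and biases, and it assumes the SwiGLU feed-forward width (CE014) is near the usual ~8/3·d, which the source does not state.

```python
def approx_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameters of a standard transformer stack."""
    # Attention: ~4 * d^2 (Q, K, V and output projections).
    # Feed-forward: ~8 * d^2 (a 4x-wide MLP, or a SwiGLU MLP with ~8/3x hidden width).
    return 12 * n_layers * d_model ** 2


ESMC_CONFIGS = {  # (layers, hidden width) as listed in CE003
    "ESMC-300M": (30, 960),
    "ESMC-600M": (36, 1152),
    "ESMC-6B": (80, 2560),
}

if __name__ == "__main__":
    for name, (layers, width) in ESMC_CONFIGS.items():
        print(f"{name}: ~{approx_params(layers, width) / 1e6:,.0f}M parameters")
```

The estimates (~332M, ~573M, ~6.29B) sit close to the nominal model sizes, suggesting the layer and width figures are internally consistent.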
CE004 Open weights for ESM3-small-2024-08 and ESMC-300M/ESMC-600M are available on HuggingFace under the Cambrian Non-Commercial License Agreement, which prohibits commercial use. High SE001, SE014, SE009
CE005 EvolutionaryScale's open weights for ESM3-small were first released in June 2024 concurrent with the ESM3 launch; ESMC-300M and ESMC-600M open weights were released in December 2024. High SE001, SE002
CE006 ESM3's flagship proof of concept is esmGFP, a novel functional fluorescent protein designed with only 58% sequence identity to the nearest known natural GFP — approximately equivalent to 500 million years of evolutionary distance. High SE001, SE005, SE006
CE007 The Forge API (forge.evolutionaryscale.ai) provides programmatic access to ESM3 and ESMC models through a Python SDK (pip install evoscale-sdk) with synchronous and asynchronous inference and a batch executor; the API was opened to public beta in January 2025. High SE001, SE004, SE009
CE008 Amazon Web Services and NVIDIA are EvolutionaryScale's primary commercial deployment partners: ESMC-6B is deployed on AWS SageMaker JumpStart, and NVIDIA is integrating ESM-C into BioNeMo NIM. High SE017, SE018, SE019
CE009 ESM3 uses a multitrack transformer architecture with three separate discrete token tracks: amino acid sequence tokens, VQVAE-encoded structure tokens (representing 3D backbone coordinates), and function keyword tokens (GO annotations). High SE001, SE005, SE006
CE010 ESM3-large (98B parameters) was trained on 2.78 billion proteins, 771 billion unique tokens, using 1.07×10²⁴ floating-point operations on the Andromeda cluster. High SE001, SE005, SE017
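A common rule of thumb for dense transformers is C ≈ 6·N·D (training FLOPs ≈ 6 × parameters × tokens processed). Applying it to the CE010 figures is a rough sketch under that assumption, not a number reported by EvolutionaryScale; it ignores attention FLOPs and any non-dense components.

```python
def implied_tokens(train_flops: float, n_params: float) -> float:
    """Training tokens D implied by C = 6 * N * D (forward + backward pass, dense model)."""
    return train_flops / (6.0 * n_params)


if __name__ == "__main__":
    flops = 1.07e24        # ESM3-large training compute (CE010)
    params = 98e9          # ESM3-large parameter count (CE002)
    unique_tokens = 771e9  # unique training tokens (CE010)
    d = implied_tokens(flops, params)
    print(f"implied tokens processed: {d / 1e12:.2f}T")
    print(f"passes over unique tokens: {d / unique_tokens:.1f}")
```

Under this approximation ESM3-large processed roughly 1.82 trillion tokens, about 2.4 passes over the 771 billion unique tokens, which is one way the 1.07×10²⁴ FLOPs figure can coexist with the smaller unique-token count.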
CE011 ESM3 employs a vector quantized variational autoencoder (VQVAE) to encode 3D protein backbone coordinates as discrete structural tokens, enabling the transformer to natively generate structure as tokens rather than as continuous coordinates. High SE001, SE006
CE012 ESM3 is pre-trained using a masked language modeling (MLM) objective applied jointly across all three tracks (sequence, structure, function), enabling the model to infer any track from the others. High SE001, SE005
CE013 Reinforcement learning from human feedback (RLHF) was applied to ESM3-large to align outputs with human preferences for protein design tasks. Medium SE001
CE014 ESM-C uses a Pre-Layer Normalization transformer architecture with rotary positional embeddings (RoPE), SwiGLU feed-forward activations, and masked language modeling pre-training. Medium SE002
CE015 ESMC training compute: ESMC-300M was trained on 1.26×10²² FLOPs; ESMC-600M on 2.17×10²² FLOPs; ESMC-6B on 2.37×10²³ FLOPs. Medium SE002
CE016 ESM-C was trained on three protein sequence databases: UniRef (83 million sequence clusters), MGnify (372 million), and JGI metagenomics (2 billion clusters), all clustered at 70% sequence identity. Medium SE002
CE017 EvolutionaryScale's GitHub organization hosts DeepEP, an open-source CUDA/NCCL library for Mixture-of-Experts expert-parallel communication optimized for H800 GPUs and originally released by DeepSeek, with 1,253 GitHub stars as of the research date. Medium SE011, SE010
CE018 NVIDIA reports that ESM3-large uses approximately 25× more FLOPs and 60× more training data than its predecessor, ESM2 (Meta AI), and was trained on NVIDIA H100 GPUs via the Andromeda HPC cluster. Medium SE017
CE019 The GitHub ESM repository (github.com/evolutionaryscale/esm) provides the official Python client library for the Forge API and access to open-weight models, with installation via pip. Medium SE009, SE012
CE020 The HuggingFace model card for esm3-sm-open-v1 showed 3,105 downloads in the prior 30 days and 291 likes as of the research access date. Medium SE013, SE014
CE021 The ESMC-300M model card showed 6,320 downloads on HuggingFace; the ESMC-600M model card showed 1,490 downloads, as of the research access date. Medium SE013, SE015
CE022 ESMC-6B is available via the Forge API for academic users and via AWS SageMaker JumpStart for commercial deployments; the SageMaker deployment uses a CloudFormation stack documented in the esm-sagemaker GitHub repository, with setup time of 15-25 minutes. Medium SE002, SE009, SE019
CE023 EvolutionaryScale's GitHub organization (github.com/evolutionaryscale) hosts nine public repositories including esm (flagship), DeepEP (1,253 stars), an NCCL fork, a Hugging Face transformers fork, a Mamba implementation, and esm-sagemaker. Medium SE010, SE011, SE009
CE024 EvolutionaryScale has not filed any SEC Form D notices as of May 2026. Form D is a notice of an exempt (unregistered) offering, so its absence does not by itself establish the company's status as a privately held entity; it is instead consistent with the EDGAR gap described in CI026 and CI027. Medium SE021
CE025 NVIDIA announced a partnership with EvolutionaryScale to integrate ESM3 into the BioNeMo NIM platform for GPU-optimized inference, and participated in EvolutionaryScale's seed investment. High SE016, SE017, SE018
CE026 Hacker News search results show ten or more community discussion threads covering EvolutionaryScale and ESM3, including "Show HN: ESM C" and multiple threads on the Science publication and initial ESM3 launch, indicating meaningful developer community engagement. Medium SE022
CE027 The ESM3 Science paper (Hayes et al., January 2025, Vol 387, Issue 6736, pp. 850-858, DOI 10.1126/science.ads0018) has accumulated 341 citations and 68,494 downloads, with 318 of the citations arriving within the first 12 months of publication. High SE005, SE026
CE028 The ESM3 preprint on bioRxiv (submitted July 2024, DOI 10.1101/2024.07.01.600583) was cited by 129+ downstream papers within its first year of availability, signaling rapid academic adoption. Medium SE006, SE008
CE029 esmGFP carries 96 mutations out of 229 total amino acid positions, achieving 58% sequence identity to the nearest known natural GFP and representing a protein in a region of sequence space separated from known fluorescent proteins by approximately 500 million years of evolutionary divergence. High SE001, SE005, SE006
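The 58% figure in CE029 follows directly from the mutation count. The sketch assumes a gap-free, position-by-position comparison of equal-length sequences; identity against real homologs is normally computed over an alignment, which can shift the value slightly.

```python
def percent_identity_from_mutations(length: int, mutations: int) -> float:
    """Percent of positions unchanged, assuming an ungapped equal-length comparison."""
    if not 0 <= mutations <= length:
        raise ValueError("mutations must be between 0 and length")
    return 100.0 * (length - mutations) / length


if __name__ == "__main__":
    print(f"{percent_identity_from_mutations(229, 96):.1f}%")  # 58.1%
```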
CE030 ESM3 designed esmGFP by jointly optimizing across sequence, structure, and function tracks, using the multitrack generative capability to explore protein sequence space beyond the reach of natural evolution or previous directed-evolution methods. High SE001, SE005
CE031 The 58% sequence identity distance between esmGFP and the nearest natural GFP is comparable to the evolutionary separation between corals and jellyfish, two distinct cnidarian lineages (Anthozoa and Medusozoa) within the same phylum. Medium SE001, SE006
CE032 EvolutionaryScale filed patents covering esmGFP and related protein design methods, as stated in the ESM3 bioRxiv preprint. Medium SE006
CE033 ESM3 competes in the protein AI landscape against AlphaFold3 (structure prediction, DeepMind, May 2024), Chai-1 (protein complex structure, Chai Discovery), and ESM2 (sequence LM, Meta AI); each of these competitors focuses on structure prediction or sequence representation rather than generative protein design. Medium SE027, SE028, SE017
CE034 EvolutionaryScale raised a $142 million Series A in September 2024 led by Lux Capital, with participation from Amazon and NVIDIA, following an earlier seed investment from NVIDIA. Medium SE024, SE025, SE020
CE035 In November 2025, EvolutionaryScale's team joined CZI Biohub as part of its Frontier AI & Biology initiative; co-founder and chief scientist Alex Rives was appointed head of science at Biohub. Medium SE023
CE036 Open-weight ESM3 and ESMC models are distributed under the Cambrian Non-Commercial License Agreement, which restricts use to non-commercial research; commercial customers must access models through the Forge API or AWS SageMaker. High SE001, SE014
CE037 EvolutionaryScale employs a dual-access commercial model: open-weight non-commercial access for research and community adoption, and Forge API / SageMaker commercial access with undisclosed pricing. Medium SE001, SE002, SE004, SE019
CE038 EvolutionaryScale is described as a public benefit company (PBC) in CZI Biohub's November 2025 announcement, consistent with its stated mission of advancing biology through responsible AI. Medium SE023
CE039 An independent bioRxiv preprint (December 2024) found that ESM3's binding prediction accuracy deteriorates when distinct per-variant relaxed protein structures are used as inputs rather than a single consistent structural backbone, a finding the authors describe as the 'More Structure, Less Accuracy' paradox. Medium SE007
CE040 EvolutionaryScale has not publicly disclosed commercial pricing for the Forge API, customer names or contract counts, or revenue metrics as of May 2026. High SE003, SE004, SE021
CU001 The biohub/esm3-sm-open-v1 model on HuggingFace had approximately 3,110 downloads and 291 likes as of May 2026, reflecting academic uptake of the ESM3 open-weight model. Medium SU006
CU002 The biohub/esmc-300m-2024-12 model on HuggingFace had approximately 6,320 downloads and 30 likes as of May 2026. Medium SU006
CU003 The biohub/esmc-600m-2024-12 model on HuggingFace had approximately 1,490 downloads and 32 likes as of May 2026. Medium SU006
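CU004 Downloads across the two open ESM-C models on HuggingFace (300M and 600M) sum to approximately 7,810 as of May 2026 (6,320 + 1,490); because HuggingFace reports downloads over a trailing 30-day window, this is a monthly snapshot rather than a cumulative total. Medium SU006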
CU005 A Semantic Scholar API search for ESM3 and EvolutionaryScale returned 32 papers building on the ESM3 framework as of May 2026. Medium SU009
CU006 A bioRxiv search for 'evolutionaryscale ESM3' returned 129 preprint results as of May 2026, indicating broad academic interest in ESM3. Medium SU010
CU007 ESM-C models are available for commercial deployment on Amazon SageMaker under the Cambrian Inference Clickthrough License Agreement, enabling broad commercial use by enterprise customers. High SU003, SU004
CU008 ESM-C 6B is available for academic use via the Forge API and for commercial use via Amazon SageMaker, as stated in the ESM Cambrian launch blog post. Medium SU003
CU009 AWS SageMaker deployment of ESM-C requires admin-level AWS account access, subscription via the Marketplace, and uses CloudFormation to deploy a dedicated GPU endpoint in 15–25 minutes billed to the customer's AWS account. Medium SU004
CU010 NVIDIA BioNeMo was listed as an upcoming integration channel for ESM-C models as of December 2024; the live status of the integration could not be confirmed as of May 2026. Medium SU003, SU007
CU011 Adaptyv Bio, a protein engineering company based at Biopole Life Science Campus in Lausanne, Switzerland, has been confirmed as a named ESM ecosystem partner. Medium SU008
CU012 EvolutionaryScale opened the Forge API public beta in January 2025, offering scientists in academia and industry a free limited-time preview of ESM3 and ESM-C models. High SU001, SU002
CU013 The EvolutionaryScale GitHub organization includes an 'esm-partner' repository explicitly labeled 'Repository for partner collaborations,' indicating a formal partner pipeline. Medium SU005
CU014 EvolutionaryScale raised $142M Series A in September 2024 from Amazon (AWS) and NVIDIA as strategic investors, with Lux Capital, Nat Friedman, and Daniel Gross also participating. High SU011, SU012
CU015 NVIDIA participated in EvolutionaryScale's seed investment round, as confirmed by a dedicated NVIDIA Newsroom press release, establishing the NVIDIA–EvolutionaryScale relationship before the Series A. Medium SU016
CU016 Lux Capital co-led or participated in EvolutionaryScale's Series A round, as confirmed by a Lux Capital blog post announcing the investment. Medium SU017
CU017 No named pharmaceutical company (such as Pfizer, Eli Lilly, Novartis, or Roche) has been publicly disclosed as a paying enterprise customer of EvolutionaryScale as of May 2026. Medium SU014, SU018
CU018 Generate Biomedicines announced a multi-billion-dollar collaboration with Amgen, representing a commercial benchmark that EvolutionaryScale has not publicly matched as of May 2026. Medium SU019
CU019 Isomorphic Labs signed collaboration agreements with Eli Lilly and Novartis totaling over $3 billion in potential milestone value, creating a commercial proof standard EvolutionaryScale has not yet demonstrated. Medium SU021
CU020 ESM3 was published in Science on January 16, 2025 (DOI: 10.1126/science.ads0018), providing peer-reviewed academic validation that anchors downstream commercial trust. High SU015, SU002
CU021 MegSite (nucleic acid binding residue prediction), ProteinReasoner (multi-modal protein LM with chain-of-thought), and iNClassSec-ESM (non-classical secreted protein discovery) are among the named downstream academic applications built on the ESM3 framework. Medium SU009
CU022 EvolutionaryScale has disclosed no NRR, GRR, customer count, or customer satisfaction metrics as of May 2026; its commercial offering was still in a pre-commercial-scale API beta phase when the standalone company ended. Medium SU014, SU018
CU023 The ESM3 open-weight model (1.4B parameters) is licensed for non-commercial use only; commercial access to all model scales (including 7B and 98B ESM3) requires Forge API tokens or AWS SageMaker subscriptions. High SU004, SU001
CU024 Biosecurity organizations including NTI and the Center for AI Safety have documented ongoing biosecurity concerns about dual-use capabilities of protein design AI, which may constrain the addressable commercial market for open-weight frontier protein language models. Medium SU028, SU029
CU025 BioNTech and InstaDeep fine-tuned an ESM language model (predecessor generation) on COVID spike proteins to create a variant early-warning system that flagged all 16 WHO variants of concern before official designation, demonstrating prior named corporate production use of the ESM family. Medium SU002
CU026 The global drug discovery market is valued at approximately $71.89 billion in 2025, growing at a CAGR of 9.20% through 2034, providing a large total addressable market for AI infrastructure vendors like EvolutionaryScale. Medium SU020
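Compounding the CU026 figures shows what the stated growth rate implies at the end of the forecast window. This is plain compound-growth arithmetic on the cited 2025 value and 9.20% CAGR over nine years (2025 to 2034), not an independent market forecast.

```python
def project_cagr(base_value: float, cagr: float, years: int) -> float:
    """Value after `years` of compound growth at annual rate `cagr`."""
    return base_value * (1.0 + cagr) ** years


if __name__ == "__main__":
    value_2034 = project_cagr(71.89, 0.092, 2034 - 2025)  # USD billions
    print(f"implied 2034 market: ~${value_2034:.0f}B")  # ~$159B
```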
CU027 EvolutionaryScale's company website does not display customer logos, named case studies, testimonials, or enterprise customer success content as of May 2026. Medium SU001
CU028 ESM-C models on HuggingFace were updated as recently as two days before the research cache date (approximately May 2026), indicating active model maintenance and development velocity. Medium SU006
CU029 The GitHub repository evolutionaryscale/esm is the primary open-source distribution channel for ESM model weights, code, and API client libraries including the Forge and SageMaker SDKs. Medium SU004
CU030 The ESM3 open-weight model (1.4B parameters) was released on June 25, 2024 under a non-commercial license as stated in the ESM3 launch blog post. High SU002, SU004
CU031 The ESM Cambrian (ESM-C) model family was released on December 4, 2024 at three scales (300M, 600M, 6B), with open weights for the two smaller models and gated commercial access for the 6B model. Medium SU003
CU032 NVIDIA BioNeMo platform explicitly targets drug discovery, molecular design, virtual screening, and protein binder design use cases, which directly overlap with ESM3/ESM-C's primary commercial applications. Medium SU007
CU033 AWS SageMaker listing of ESM-C models creates a cloud-based commercial deployment channel for enterprise users who can subscribe and deploy without requiring direct Forge API account creation. Medium SU004, SU025
CU034 EvolutionaryScale's open-weight model strategy creates a top-of-funnel adoption mechanism where academic users build familiarity that can convert to commercial API usage, following a pattern analogous to open-source AI infrastructure companies. Medium SU001, SU004
CU035 ESM2, the predecessor to ESM3, with up to 15B parameters and freely available open weights, represents a no-cost substitution option for protein representation tasks that reduces commercial willingness-to-pay for ESM-C paid access. Medium SU004
CU036 Amazon AWS's strategic investment in EvolutionaryScale's Series A creates a structural incentive for preferential channel placement in AWS SageMaker JumpStart and AWS HealthOmics, giving EvolutionaryScale access to AWS's enterprise life sciences customer base. Medium SU011, SU004
CU037 Semantic Scholar papers building on ESM3 were published as recently as July–August 2025 (ProteinReasoner: July 25, 2025; MegSite: August 31, 2025), indicating ongoing downstream academic use at least 13 months after ESM3's release. Medium SU009
CU038 EvolutionaryScale is incorporated as a public benefit corporation (PBC), a legal structure that creates mission constraints that could limit commercialization of some high-profit but ethically questionable protein design applications. Medium SU001
CR001 ESM3 generated a novel fluorescent protein (esmGFP) with only 58% sequence identity to the closest known natural fluorescent protein, a distance the authors equate to roughly 500 million years of natural evolution, demonstrating the model's capability to design genuinely novel proteins far from existing sequence space. High SR014, SR028
CR002 EvolutionaryScale's canonical Responsible Development Framework URL (evolutionaryscale.ai/blog/responsible-development) returned a 404 error on 2026-05-18, indicating the public documentation page is not accessible at access date. Medium SR014
CR003 US Executive Order 14110 (October 30, 2023) explicitly directed developers of dual-use foundation models to address security risks 'with respect to biotechnology, cybersecurity, critical infrastructure, and other national security dangers.' The order was revoked on January 20, 2025, so it no longer imposes these requirements. Medium SR001
CR004 A 2023 MIT study (arXiv 2306.03809) showed that general-purpose LLM chatbots could, in one hour, suggest four pandemic pathogen candidates, synthesis routes, DNA suppliers with lax screening, and troubleshooting protocols to non-scientists, indicating LLMs broadly lower biosecurity barriers. Medium SR004
CR005 No independent third-party biosecurity audit of EvolutionaryScale's Forge API guardrails has been publicly disclosed as of May 2026, making it impossible to verify the effectiveness of the company's self-regulatory biosecurity measures. Medium SR014, SR015
CR006 The Biological Weapons Convention (BWC), effective since 1975 with 189 states party as of May 2025, contains no AI-specific language and has no formal verification mechanism, leaving AI-designed protein risks unaddressed by existing international law. Medium SR006
CR007 The Center for AI Safety's 2023 statement (co-signed by Hinton and Bengio) identifies mitigating the risk of extinction from AI as a global priority on par with pandemics and nuclear war, with biological weapons specifically cited as a concern. Medium SR005
CR008 EvolutionaryScale's ESM Cambrian (Dec 2024) launch blog states that 'ESM C was reviewed by a committee of scientific experts who concluded that the benefits of releasing the models greatly outweigh any potential risks,' but the committee composition and evaluation criteria are not publicly disclosed. Medium SR015
CR009 The Asilomar Conference on Recombinant DNA (1975) established the precedent of voluntary self-regulatory frameworks for biotechnology, which EvolutionaryScale explicitly cites as inspiration for its Responsible Development Framework. Asilomar's longer-term effectiveness, however, depended on the NIH Guidelines for recombinant DNA research that followed in 1976 and were binding on federally funded research. Medium SR014, SR027
CR010 The NTI biosecurity program identifies AI-biotech convergence as introducing 'risks of accidental misuse and deliberate exploitation, which could result in a biological catastrophe with grave consequences,' framing regulatory action as urgent. Medium SR007
CR011 Chai-1 (Apache 2.0, free for commercial use, released September 2024) achieves 0.849 Cα LDDT on CASP15 monomer prediction, outperforming ESM3-98B's 0.801, with 77% PoseBusters success vs AlphaFold3's 76%, making it the only freely available commercial-use model at or above ESM3 structure-prediction accuracy. Medium SR023
CR012 The AlphaFold Protein Structure Database (Google DeepMind/EMBL-EBI) provides over 200 million protein structure predictions freely under CC BY 4.0. As of March 2026 it had been updated to include protein complex structures, directly covering a major Forge API use case at no cost. Medium SR011
CR013 ESM3-98B's training consumed 1×10²⁴ FLOPs on what the company described at launch as 'one of the highest throughput GPU clusters in the world.' Compute at this scale becomes a recurring operational cost risk as the company scales inference and future model training. High SR014, SR028
CR014 Meta's facebookresearch/esm repository states it 'contains code and pre-trained weights for Transformer protein language models from the Meta Fundamental AI Research Protein Team (FAIR).' ESM3's bioRxiv preprint confirms that ESM3 was developed by founders who were formerly FAIR employees, raising IP provenance questions about the ESM2-to-ESM3 transition. Medium SR010, SR028
CR015 RFdiffusion (Baker Lab, Nature 2023) is freely available for protein backbone generation, binder design, symmetric oligomer design, and active-site scaffolding — core use cases also addressed by ESM3 — and is distributed with permissive licensing from the University of Washington's Institute for Protein Design. Medium SR009
CR016 OpenFold (Apache 2.0, AQ Laboratory) provides a trainable reproduction of AlphaFold2 that organisations can fine-tune on proprietary data, enabling pharmaceutical companies to build internal capabilities that reduce dependence on Forge API subscription. Medium SR012
CR017 ESM3 uses a discrete token representation of protein structure that tokenises the 3D protein backbone into a sequence alphabet. This architectural innovation was first described in the bioRxiv preprint (July 2024) and then in the ESM3 Science paper (January 2025), and patents have been filed on it. High SR028, SR016
CR018 AlphaFold 2's prediction accuracy at CASP14 was 'insufficient for a third of its predictions' per Wikipedia's AlphaFold article, indicating that even state-of-the-art protein AI models have non-trivial failure rates — a parallel risk for ESM3-generated sequences in drug-discovery applications. Medium SR022
CR019 Amazon (AWS) and Nvidia are simultaneously Series A investors in EvolutionaryScale and operators of competing bio-AI model distribution platforms (SageMaker JumpStart and BioNeMo respectively), creating a structural investor-competitor conflict. High SR017, SR013, SR026
CR020 NVIDIA BioNeMo is described as 'the development platform for AI-driven biology and drug discovery,' a direct competitor to EvolutionaryScale's Forge API, while NVIDIA is simultaneously an investor and hardware supplier to EvolutionaryScale. Medium SR013
CR021 ESM Cambrian's 300M and 600M parameter models are released as open-weight models for academic and commercial use. ESM C 6B is available on Forge for academic use and on AWS SageMaker for commercial use. EvolutionaryScale therefore deliberately makes its representation models freely available to drive adoption of the Forge API. High SR015, SR016
CR022 Meta's ESM2 model is available under the MIT license via the facebookresearch/esm repository, making it freely usable for commercial applications without royalty obligations — this creates a baseline that limits the premium a customer would pay for ESM3 API access for embedding-only use cases. Medium SR010
CR023 Chai-1's technical report demonstrates that multimer structure prediction without MSA at AlphaFold-Multimer quality level (69.8% DockQ acceptable vs 67.7%) is achievable under Apache 2.0 without API fees, representing a direct commercial threat to Forge API's structure-prediction use cases. Medium SR023
CR024 Meta's ESM Metagenomic Atlas blog (November 2022) confirms that ESMFold (based on ESM2) provides structure predictions up to 60x faster than the prior state-of-the-art, illustrating that Meta's FAIR team (EvolutionaryScale's founding employer) retains independent protein AI capabilities that could re-enter the competitive landscape. Medium SR024
CR025 The EU AI Act (Regulation 2024/1689) was published on 12 July 2024, and most of its provisions apply from August 2026. It lays down harmonised rules for AI systems with EEA relevance and potentially subjects general-purpose AI models with large training compute (>10²⁵ FLOPs) to systemic-risk obligations such as model evaluation, adversarial testing, and serious-incident reporting. High SR020, SR021
CR026 Most EU AI Act provisions apply from August 2026. EvolutionaryScale will therefore need to assess EU compliance within its current planning horizon, including conformity assessments, transparency obligations, and potentially human oversight for high-risk applications. High SR020, SR021
CR027 The NIST AI Risk Management Framework Generative AI Profile (NIST-AI-600-1, published July 2024) provides voluntary guidance for organisations developing generative AI, increasingly referenced in government procurement requirements, creating de-facto compliance pressure for Forge API customers in the public sector. Medium SR002
CR028 FDA's AI/ML-enabled medical devices framework (SaMD) and 2024 action plan govern AI used in clinical diagnosis and treatment decision support but do not yet specifically regulate AI protein design tools used in pre-clinical drug discovery — a regulatory gap that may be filled if ESM3-based designs progress toward IND submissions. Medium SR003
CR029 No public BIS (Bureau of Industry and Security) final rule specifically governing export of protein language model weights or API access has been identified as of May 2026, but CSET has highlighted advancing US biotechnology governance as urgent for AI biosecurity, suggesting rulemaking activity is directionally likely. Medium SR008, SR001
CR030 The Biological Weapons Convention's absence of any AI-specific language or verification regime means that the primary international legal framework against bioweapons does not currently create binding compliance obligations for EvolutionaryScale specifically related to protein language model deployment. Medium SR006
CR031 Industry self-regulatory AI safety commitments (Anthropic's Responsible Scaling Policy, OpenAI safety commitments) set voluntary precedents for biosafety evaluation at capability thresholds, but EvolutionaryScale's Responsible Development Framework does not publicly specify comparable quantitative triggers or third-party verification requirements. Medium SR029, SR030, SR015
CR032 EvolutionaryScale has raised approximately $145 M total (seed plus $142 M Series A, September 2024) at a $1.35 B post-money valuation, with Amazon and Nvidia as lead investors; no subsequent funding round has been publicly disclosed as of May 2026. High SR017, SR019
CR033 EvolutionaryScale has disclosed no public revenue or ARR metrics. At a $1.35 B valuation with $145 M raised and a pre-revenue or early-revenue profile, any implied revenue multiple would far exceed typical Series A SaaS multiples, creating down-round risk if commercial adoption is slower than investors expect. Medium SR017
CR034 Amazon (AWS) is simultaneously a lead Series A investor, primary compute provider (AWS EC2 GPU instances), distribution channel (SageMaker JumpStart), and operator of a competing bio-AI discovery platform — a four-way conflict of interest with no public disclosure of ring-fencing arrangements. Medium SR017, SR026
CR035 Nvidia is simultaneously a lead Series A investor, primary GPU hardware supplier, BioNeMo platform operator (including ESM model distribution), and a developer of competing bio-AI capabilities — creating a comparable four-way structural conflict to Amazon's. Medium SR017, SR013
CR036 ESM Cambrian commercial use is available via AWS SageMaker, meaning Amazon earns transaction fees on Forge-equivalent commercial access to EvolutionaryScale's models — an arrangement that benefits Amazon's cloud revenue while potentially constraining EvolutionaryScale's direct-to-customer pricing power. Medium SR015, SR026
CR037 ESM3-98B training at 1×10²⁴ FLOPs represents one of the most computationally intensive biological model training runs recorded; the GPU compute costs for ongoing model development, API inference at commercial scale, and future ESM4 training represent a significant and growing operating expense with no public disclosure of unit economics. Medium SR014, SR013
CR038 All four named EvolutionaryScale founders (Alexander Rives, Tom Sercu, Zeming Lin, Salvatore Candido) are alumni of Meta AI's FAIR protein team, representing a single-employer concentration in the founding team with no disclosed external scientific advisory board. High SR028, SR014
CR039 The ESM3 bioRxiv preprint author list names Thomas Hayes, Roshan Rao, Halil Akin, Nicholas Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Tom Sercu, Salvatore Candido, and Alexander Rives among the core team, all identified as EvolutionaryScale, PBC employees. This indicates technical concentration in the founding team. Medium SR028
CR040 Alexander Rives, CEO, is the original ESM model creator and lead author of the 2021 PNAS paper that introduced ESM-1b. Tom Sercu and Zeming Lin are the primary technical architects of ESM3 and ESM Cambrian respectively, so the departure of any of the three would represent a material scientific knowledge risk. Medium SR010, SR028
CR041 No succession plan, key-person insurance, or CEO independence structure has been publicly disclosed by EvolutionaryScale, leaving investors with no visible mitigation for key-person departure risk at the $1.35 B valuation level. Low
CR042 The facebookresearch/esm GitHub repository states it 'contains code and pre-trained weights' under Meta's terms; ESM3 was developed by founders who worked at Meta FAIR and built ESM2, creating a plausible IP provenance chain where Meta could assert rights over ESM3 commercial weights as derivative works. Medium SR010, SR024
CR043 The competing interest statement in the ESM3 bioRxiv preprint discloses that 'patents have been filed related to aspects of this work.' It does not specify patent numbers, claims, status, jurisdiction, or relationship to Meta's prior art, which leaves investors and customers unable to assess the durability of EvolutionaryScale's IP position. Medium SR028
CR044 EvolutionaryScale is incorporated as a Public Benefit Corporation (PBC), which in Delaware law creates a board obligation to balance shareholder interests with a stated public benefit purpose — potentially constraining purely commercial decisions about model access pricing or API gating in ways that may conflict with investor return expectations. Medium SR014, SR028
CR045 No litigation, regulatory complaint, enforcement action, or customer dispute records involving EvolutionaryScale, PBC have been identified in publicly accessible sources as of May 2026, indicating a clean legal record at this early stage. Medium SR017, SR014
CV001 EvolutionaryScale raised $142M in its Series A on September 26, 2024. High SV001, SV002, SV029
CV002 EvolutionaryScale's September 2024 Series A valued the company at approximately $1.35B post-money. High SV001, SV028, SV029
CV003 Amazon (AWS) and NVIDIA co-led EvolutionaryScale's Series A; Lux Capital, Nat Friedman, and Daniel Gross participated. High SV001, SV002
CV004 EvolutionaryScale has raised approximately $145M in total funding including seed capital as of May 2026. Medium SV001, SV028
CV005 EvolutionaryScale has not publicly disclosed any revenue, ARR, Forge customer count, or gross margin as of May 2026. High SV002, SV028
CV006 ESM3 was published in Science on January 16, 2025. The paper documents the generation of a novel fluorescent protein whose distance from known fluorescent proteins the authors equate to 500 million years of evolution. High SV005, SV003
CV007 ESM3 was trained on 2.78 billion proteins and 771 billion tokens with over 1×10^24 FLOPs, described as the most compute ever applied to training a biological model. High SV003, SV005
CV008 The Forge commercial API platform provides fee-based access to ESM3 and ESM Cambrian models for pharmaceutical and biotech R&D users. Medium SV002, SV004
CV009 EvolutionaryScale distributes its models via AWS SageMaker JumpStart and NVIDIA BioNeMo as well as through Forge, giving pharma R&D users access on cloud infrastructure they may already use. High SV026, SV002
CV010 The ESM C 600M model (esmc-600m-2024-12) had 6,320 downloads on Hugging Face as of May 2026, indicating active developer community adoption. Medium SV033, SV004
CV011 Absci (NASDAQ:ABSI) had a market capitalization of approximately $800M as of May 2026. High SV006, SV009
CV012 Absci reported revenue of $2.8M for FY2025 (down from $4.5M in FY2024) and a net loss of $115.2M for FY2025. High SV009, SV006
CV013 Recursion Pharmaceuticals (NASDAQ:RXRX) had a market capitalization of approximately $1.555B as of May 2026. High SV007, SV010
CV014 Recursion reported Q1 2026 revenue of $6.47M, which fell short of analyst expectations, with cash of $665.2M providing runway into early 2028. High SV007, SV010
CV015 Recursion had an accumulated deficit of $2.1 billion as of December 31, 2025, reflecting the capital intensity of AI drug discovery platform development. High SV010, SV007
CV016 Schrodinger (NASDAQ:SDGR) had a market capitalization of approximately $893M as of May 2026, with a 52-week high of $27.63 and low of $10.94. High SV008, SV013
CV017 Generate Biomedicines has raised approximately $700M in total disclosed funding and operates a generative biology platform targeting protein therapeutics. Medium SV021, SV028
CV018 Profluent has raised approximately $44M in disclosed funding and introduced OpenCRISPR, described as the first AI-designed gene editor. Medium SV019, SV030
CV019 Cradle.bio has raised approximately $73M in total disclosed funding and serves top biopharma and industrial bio R&D teams for protein optimization. Medium SV020, SV030
CV020 Xaira Therapeutics launched in April 2024 with $1B in Series A funding, the largest-ever AI drug discovery Series A at the time. Medium SV025, SV032
CV021 Isomorphic Labs (Alphabet-backed) is developing AI drug discovery with Lilly and Novartis collaborations reportedly worth $3B+ in combined headline deal value. Medium SV023, SV029
CV022 AlphaFold 3 was released by Google DeepMind on May 8, 2024, predicting structure and interactions of all biomolecules; AlphaFold Server provides free access to 3M+ researchers across 190+ countries for non-commercial research. High SV027, SV032
CV023 ESM2, the predecessor protein language model, is available as open-source software from Meta's facebookresearch/esm GitHub repository. It provides free sequence-embedding capability comparable to lower-capability ESM3 tiers. High SV003, SV033
CV024 The global AI in drug discovery market is estimated at $2.35B in 2025, projected to grow to $13.77B by 2033 at a CAGR of 24.8%. Medium SV032, SV016
CV025 The global protein engineering market is estimated at $5.09B in 2025, projected to grow to $23.59B by 2035 at a CAGR of 16.57%. Medium SV015, SV017
CV026 US VC deal value reached $74.6B across 2,859 deals in Q4 2024, the highest since Q2 2022, driven primarily by AI investment including five companies raising $4B+ rounds. High SV014, SV032
CV027 KPMG's Venture Pulse Q4 2024 warned that VC investors are becoming more discerning as to who the winners may be in the AI space and will favor companies with credible commercial models over AI-wrapper businesses. Medium SV014
CV028 No EvolutionaryScale Form D securities filing was identified in SEC EDGAR's full-text search as of May 2026. This is consistent with private status and could reflect either a filing under a different entity name or reliance on an offering exemption that does not require Form D. Medium SV031, SV011
CV029 All four co-founders of EvolutionaryScale (Rives, Sercu, Lin, Candido) joined from Meta AI FAIR, representing correlated key-person concentration risk. High SV002, SV028
CV030 EvolutionaryScale's $1.35B Series A at pre-revenue stage implies an approximately 9.5x post-money-to-raised multiple, substantially above historical norms for pre-revenue biotech Series A rounds. Medium SV001, SV028, SV029
CV031 The bull case for EvolutionaryScale is $3–5B, contingent on Forge achieving $50–100M ARR by 2027 through multi-pharma contracts and AWS/NVIDIA channel scale. Low SV001, SV032, SV028
CV032 The base case for EvolutionaryScale is $1.5–2.5B, reflecting a modest step-up from Series A entry if commercial ramp is slow ($10–25M ARR) and AWS/NVIDIA partnerships drive most revenue. Medium SV001, SV014, SV028
CV033 The bear case for EvolutionaryScale is $400M–800M, reflecting commoditization risk from open-source ESM2 and free AlphaFold 3, possible key-person departure, or acqui-hire by Amazon at a down-round price. Medium SV027, SV023, SV014
CV034 Amazon (AWS) and NVIDIA's co-investment creates a structural distribution advantage: both partners have a commercial incentive to route pharma API traffic through Forge via their respective cloud platforms. Medium SV026, SV002, SV009
CV035 Insilico Medicine completed an HKEX IPO raising approximately $293M in late 2025 (SEHK:3696), with a prior Series E valuation of ~$2.3B, providing a precedent for AI drug discovery company public market exits. Medium SV022, SV032
CV036 EvolutionaryScale's founders created the original ESM protein language model family at Meta AI FAIR, establishing unique domain authority and an institutional knowledge base not replicable at competing protein AI startups. High SV003, SV005, SV002
CV037 EvolutionaryScale's responsible development framework acknowledges dual-use biosecurity risks of ESM3; no public disclosure of specific customer screening protocols or DURC compliance procedures has been made. Medium SV002, SV003
CV038 Recursion's FY2025 10-K disclosed that three partners represented 95% of total partner program revenue, illustrating extreme customer concentration risk inherent in AI drug discovery platform businesses. High SV010, SV007
CV039 The drug discovery market is estimated at $71.89B in 2025, growing to $158.74B by 2034 at a 9.2% CAGR, representing the broader TAM for AI tools improving pharmaceutical R&D efficiency. Medium SV016, SV032
CV040 As of May 2026, Recursion (RXRX) had a 52-week share price range of $2.80–$7.18 and Schrodinger (SDGR) a range of $10.94–$27.63, evidencing high valuation volatility among public AI drug discovery comps. High SV007, SV008
CV041 Schrodinger (SDGR) most recently filed a 10-K on February 25, 2026 (for FY2025), confirming ongoing SEC reporting status and a current market cap of approximately $893M. High SV013, SV008
CV042 ESM3 was trained on biological data spanning diverse environments including the Amazon rainforest, hydrothermal vents, and soil microbiomes, representing one of the most comprehensive biological training datasets compiled by any private company. Medium SV003, SV005
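Claim CR001 rests on a percent sequence identity figure. Conventions differ on the denominator: aligned columns, ungapped columns, or the shorter sequence. The sketch below assumes two pre-aligned sequences and divides matches by columns where neither sequence has a gap. It is a generic illustration, not the exact convention used in the ESM3 paper, and the sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumes both inputs are already aligned to equal length. Other
    conventions divide by full alignment length or by the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences contain a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy, made-up fragments (not esmGFP or any real protein).
print(percent_identity("MSKGEEL-FT", "MSRGEAVIFT"))
```

Under this convention, a figure of 58% means that 58 of every 100 ungapped aligned positions carry the same amino acid.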
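Claim CR017 describes turning 3D backbone structure into a discrete alphabet that a sequence model can consume. ESM3's published tokenizer is learned rather than hand-built, and this sketch does not reproduce it. Under loudly stated assumptions, it shows only the basic idea: each residue receives the index of the nearest entry in a codebook. The three-entry codebook of (phi, psi) backbone dihedral angles, the angle features, and the function names are all hypothetical.

```python
import math
from typing import Sequence

# Hypothetical three-entry codebook of backbone (phi, psi) dihedral angles in
# degrees. A learned tokenizer would use many more codes and richer features.
CODEBOOK: list[tuple[float, float]] = [
    (-60.0, -45.0),   # token 0: roughly alpha-helical
    (-120.0, 130.0),  # token 1: roughly beta-strand / extended
    (60.0, 45.0),     # token 2: left-handed helical region
]


def angular_distance(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Euclidean distance between angle pairs, respecting 360-degree wrap-around."""
    total = 0.0
    for x, y in zip(a, b):
        d = abs(x - y) % 360.0
        d = min(d, 360.0 - d)  # e.g. 170 and -170 degrees are 20 degrees apart
        total += d * d
    return math.sqrt(total)


def tokenize(residue_angles: Sequence[tuple[float, float]]) -> list[int]:
    """Map each residue's (phi, psi) pair to the index of its nearest codebook entry."""
    return [
        min(range(len(CODEBOOK)), key=lambda k: angular_distance(angles, CODEBOOK[k]))
        for angles in residue_angles
    ]


# Made-up angles for a short backbone fragment.
print(tokenize([(-63.0, -41.0), (-110.0, 140.0), (-58.0, -47.0)]))  # [0, 1, 0]
```

The output is a discrete sequence over a small alphabet. That property is what lets a language model treat structure tokens the same way it treats amino-acid tokens.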
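Claim CR025 cites a 10²⁵ FLOP compute threshold, and claims CR013 and CV007 report roughly 1×10²⁴ FLOPs for ESM3-98B. The short check below compares the two figures, taking the reported compute at face value. The threshold is only a presumption. Whether a protein language model counts as a general-purpose AI model under the Act at all is a separate legal question that this arithmetic does not answer.

```python
# Presumption threshold cited in CR025 for systemic-risk GPAI models.
EU_SYSTEMIC_RISK_FLOPS = 1e25
# Reported ESM3-98B training compute (CR013, CV007), taken at face value.
ESM3_98B_TRAINING_FLOPS = 1e24


def exceeds_threshold(training_flops: float, threshold: float = EU_SYSTEMIC_RISK_FLOPS) -> bool:
    """True if cumulative training compute is strictly above the threshold."""
    return training_flops > threshold


ratio = EU_SYSTEMIC_RISK_FLOPS / ESM3_98B_TRAINING_FLOPS
print(exceeds_threshold(ESM3_98B_TRAINING_FLOPS))
print(f"threshold / reported compute: ~{ratio:.0f}x")
```

On the reported figure, ESM3-98B sits roughly an order of magnitude below the presumption threshold.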
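Claim CV007 gives three figures: 98B parameters, 771 billion training tokens, and over 1×10²⁴ FLOPs. A common rule of thumb for dense transformers, C ≈ 6·N·D (compute ≈ 6 × parameters × tokens processed), offers a rough consistency check. This approximation is an assumption drawn from general scaling-law practice, not something the ESM3 sources state. It also ignores attention and masking details, so the result is indicative only.

```python
# Rough consistency check using C ~= 6 * N * D for dense transformers.
PARAMS = 98e9           # ESM3-98B parameter count (N)
TRAIN_FLOPS = 1e24      # reported training compute; the source says "over"
DATASET_TOKENS = 771e9  # reported token count of the training data


def tokens_processed(train_flops: float, params: float) -> float:
    """Invert C ~= 6*N*D to estimate tokens processed during training (D)."""
    return train_flops / (6.0 * params)


d = tokens_processed(TRAIN_FLOPS, PARAMS)
print(f"implied tokens processed: ~{d:.2e}")
print(f"implied passes over the dataset: ~{d / DATASET_TOKENS:.1f}")
```

The rule of thumb implies about 1.7×10¹² tokens processed, or roughly 2.2 passes over a 771-billion-token dataset. This is plausible for a multi-epoch run, but it is an inference, not a reported figure.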
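Claims CV024, CV025, and CV039 each quote a start value, an end value, and a CAGR. Recomputing CAGR = (end / start)^(1 / years) − 1 checks that each forecast is at least internally consistent. The sketch assumes 2025 is the base year in every case, so the number of years is the end year minus 2025.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by start and end values."""
    return (end / start) ** (1.0 / years) - 1.0


# name -> (start value USD B, end value USD B, years, stated CAGR)
forecasts = {
    "AI in drug discovery, 2025-2033 (CV024)": (2.35, 13.77, 8, 0.248),
    "Protein engineering, 2025-2035 (CV025)": (5.09, 23.59, 10, 0.1657),
    "Drug discovery, 2025-2034 (CV039)": (71.89, 158.74, 9, 0.092),
}

for name, (start, end, years, stated) in forecasts.items():
    implied = cagr(start, end, years)
    print(f"{name}: implied {implied:.2%}, stated {stated:.2%}")
```

Each implied rate lands within about 0.1 percentage point of the stated one. This confirms the arithmetic but does not validate the underlying forecasts.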
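Claim CV030's "9.5x post-money-to-raised multiple" can be restated as an ownership figure using standard round arithmetic (post-money = pre-money + new money). The sketch assumes the full $142M was new primary capital priced at the $1.35B post-money. It ignores option-pool changes and any seed conversion, which the sources do not detail.

```python
# Figures from CV001/CV002, in USD millions.
POST_MONEY_M = 1350.0
ROUND_SIZE_M = 142.0


def round_math(post_money_m: float, round_size_m: float) -> tuple[float, float, float]:
    """Return (pre-money, new-investor ownership, post-money / round size).

    Assumes the whole round is new primary capital priced at the stated
    post-money, and ignores option-pool changes and seed conversion.
    """
    pre_money_m = post_money_m - round_size_m
    investor_stake = round_size_m / post_money_m
    post_to_raise = post_money_m / round_size_m
    return pre_money_m, investor_stake, post_to_raise


pre, stake, mult = round_math(POST_MONEY_M, ROUND_SIZE_M)
print(f"pre-money ~${pre:,.0f}M, Series A stake ~{stake:.1%}, post/raise ~{mult:.1f}x")
```

A 9.5x post-money-to-raise multiple is the same fact as Series A investors owning about 10.5% of the company, on an implied pre-money of about $1,208M.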
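Claims CV031 to CV033 give bull, base, and bear valuation ranges. Dividing each range by the $1.35B Series A post-money shows the gross multiple on entry price that each scenario implies. Where an ARR range is stated, the sketch also shows the valuation-to-ARR multiple. It is a simplification that ignores dilution, liquidation preferences, and the undisclosed terms of the actual CZI absorption. The ARR bounds pair extremes (lowest valuation over highest ARR, and vice versa), which is an assumption chosen to show the widest range.

```python
# Series A post-money valuation used as the entry price (USD millions).
ENTRY_POST_MONEY_M = 1350.0

# Scenario ranges from CV031-CV033, in USD millions:
# name -> ((valuation_low, valuation_high), (arr_low, arr_high) or None)
SCENARIOS = {
    "bull": ((3000.0, 5000.0), (50.0, 100.0)),
    "base": ((1500.0, 2500.0), (10.0, 25.0)),
    "bear": ((400.0, 800.0), None),  # no ARR range stated for the bear case
}


def gross_multiple(exit_value_m: float, entry_m: float = ENTRY_POST_MONEY_M) -> float:
    """Exit valuation divided by entry valuation (no dilution or preferences)."""
    return exit_value_m / entry_m


for name, ((val_lo, val_hi), arr) in SCENARIOS.items():
    summary = f"{name}: {gross_multiple(val_lo):.2f}x to {gross_multiple(val_hi):.2f}x entry"
    if arr is not None:
        arr_lo, arr_hi = arr
        # Widest bounds: lowest valuation over highest ARR, and vice versa.
        summary += f"; {val_lo / arr_hi:.0f}x to {val_hi / arr_lo:.0f}x ARR"
    print(summary)
```

On these assumptions, the bull case works out to roughly 2.2x to 3.7x the Series A entry price and the bear case to roughly 0.3x to 0.6x. The base-case ARR range implies valuation-to-ARR multiples of 60x to 250x.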
Sources
ID Publisher Title Quote
SO001 EvolutionaryScale EvolutionaryScale Official Homepage We are building the foundation for a new era of programmable biology — from foundational models for protein sequences to tools that let scientists design and understand proteins.
SO002 EvolutionaryScale ESM3 Release Blog Post We are releasing ESM3, a generative multimodal model for protein design. ESM3 reasons over the sequence, structure, and function of proteins.
SO003 EvolutionaryScale ESM Cambrian Launch Blog Post We are releasing ESM Cambrian, a new family of protein language models available in 300M, 600M, and 6B parameter sizes.
SO004 Crunchbase EvolutionaryScale on Crunchbase EvolutionaryScale raised a total of $142M in funding over 2 rounds. Their latest funding was raised on Sep 26, 2024 from a Series A round.
SO005 Hugging Face EvolutionaryScale Organization on Hugging Face EvolutionaryScale organization on Hugging Face; hosts ESM3 and ESM Cambrian model weights and model cards.
SO006 EvolutionaryScale EvolutionaryScale GitHub Organization EvolutionaryScale on GitHub: 9 repositories including esm, DeepEP, nccl, and transformers forks.
SO007 EvolutionaryScale / CZ Biohub ESM Repository on GitHub ESM: Evolutionary Scale Modeling — official model weights and inference code; repository redirected to biohub organization following acquisition.
SO008 EvolutionaryScale DeepEP Repository on GitHub DeepEP: An efficient expert-parallel communication library optimized for mixture-of-experts models and inference.
SO009 Science (AAAS) Simulating 500 million years of evolution with a language model We describe ESM3, a generative multimodal model that reasons over the sequences, structures, and functions of proteins. ESM3 was found to generate a new fluorescent protein distant from known sequences.
SO010 bioRxiv (Cold Spring Harbor Laboratory) ESM3: Simulating 500 million years of evolution with a language model (preprint) ESM3: Simulating 500 million years of evolution — BioRxiv preprint doi: 10.1101/2024.07.01.600583; authors include Rives, Sercu, Candido, Lin.
SO011 U.S. Securities and Exchange Commission SEC EDGAR Full-Text Search: Form D filings for EvolutionaryScale 2024-2026 0 results found for EvolutionaryScale in Form D filings from January 2024 through May 2026.
SO012 U.S. Securities and Exchange Commission SEC EDGAR Company Search: EvolutionaryScale Form D No companies found matching evolutionaryscale for Form D filings in SEC EDGAR.
SO013 LinkedIn EvolutionaryScale on LinkedIn EvolutionaryScale — Company size: 11-50 employees — Industry: Biotechnology Research — team has joined CZ Biohub.
SO014 CZ Biohub / Chan Zuckerberg Initiative CZ Biohub: Frontier AI for Biology Initiative We are thrilled to welcome the EvolutionaryScale team to the CZ Biohub Network. Alex Rives will serve as Head of Science at CZI, working to advance open biological science.
SO015 Bloomberg EvolutionaryScale Raises $142M from Amazon, NVIDIA EvolutionaryScale Inc. raised $142 million from Amazon.com Inc. and Nvidia Corp. in a new financing round. Full article paywalled; detailed terms and investor rights not accessible for diligence.
SO016 Reuters EvolutionaryScale raises $142M for AI protein design Reuters article confirmed as broken or inaccessible (401 JS-only response); content unavailable.
SO017 NVIDIA NVIDIA Blog: EvolutionaryScale ESM3 on BioNeMo and H100 NIM EvolutionaryScale and NVIDIA partner to deploy ESM3 as a NIM microservice on H100 infrastructure. Tom Sercu, co-founder and VP of engineering, described the partnership. NVIDIA participated in both the seed and Series A rounds.
SO018 Hacker News (Algolia API) Hacker News Stories About EvolutionaryScale Top HN stories include: ESM3 Simulating 500 million years of evolution (2024, ~500 points); EvolutionaryScale raises $142M Series A; EvolutionaryScale Acquired by CZI (Nov 2025, story 45838940).
SO019 NVIDIA NVIDIA NGC Catalog: ESM3 by EvolutionaryScale ESM3 listed in NVIDIA NGC catalog under Clara team; page rendered as JS-only SPA; existence confirmed but detailed content not accessible.
SO020 Semantic Scholar (Allen Institute for AI) Semantic Scholar API: ESM3 / EvolutionaryScale paper search Semantic Scholar API returns multiple papers related to protein language models and ESM3; provides citation count proxy and publication network for ESM family research.
SO021 Wikimedia Foundation Wikipedia: EvolutionaryScale (page not found) Wikipedia page for EvolutionaryScale does not exist; URL returns HTTP 404 Not Found. No Wikipedia article has been created for the company as of May 2026.
SO022 NVIDIA NVIDIA Clara BioNeMo Platform NVIDIA BioNeMo is a cloud platform for generative AI drug discovery; features ESM3 integration for protein sequence and structure generation.
SO023 NVIDIA NVIDIA News: NVIDIA Joins Seed Investment in EvolutionaryScale NVIDIA News URL for seed investment announcement returns news archive page rather than the specific article; original press release content is inaccessible.
SO024 EvolutionaryScale EvolutionaryScale Forge API Platform Forge API platform is a JavaScript-rendered SPA; no textual content accessible; existence confirmed but operational status post-acquisition is unknown.
SO025 GlobeNewswire (expected: EvolutionaryScale) GlobeNewswire: EvolutionaryScale Series A press release (expected) URL returned content for a different company (Banzai International press release); EvolutionaryScale Series A press release was not accessible at this URL.
SO026 Hugging Face / EvolutionaryScale HuggingFace: ESM3-sm-open-v1 model card ESM3-sm-open-v1 on HuggingFace: 3,110+ downloads; open model for non-commercial academic use; model card describes sequence, structure, and function inputs.
SO027 Axios Axios: EvolutionaryScale Series A funding protein AI Axios article on EvolutionaryScale Series A was rate-limited during fetch; content not retrieved; URL confirms coverage of the funding round.
SM001 MarketsandMarkets Protein Engineering Market Size, Share & Trends Analysis Report — MarketsandMarkets The protein engineering market size is projected to grow from USD 2.2 billion in 2019 to USD 3.9 billion by 2024, at a CAGR of 12.4%.
SM002 Precedence Research Protein Engineering Market Size, Growth & Forecast 2025–2035 — Precedence Research The global protein engineering market size was estimated at USD 5.09 billion in 2025 and is expected to reach around USD 23.59 billion by 2035, growing at a CAGR of 16.57%.
SM003 Allied Market Research Protein Engineering Market by Type, Application, and Region — Allied Market Research The global protein engineering market size was valued at $2.2 billion in 2022, and is projected to reach $7.7 billion by 2032, growing at a CAGR of 13.2% from 2023 to 2032.
SM004 Grand View Research Protein Engineering Market Size, Share & Trends Analysis — Grand View Research The global protein engineering market size was valued at USD 2.60 billion in 2023 and is expected to grow at a CAGR of 16.24% from 2024 to 2030.
SM005 Grand View Research Artificial Intelligence In Drug Discovery Market — Grand View Research The global artificial intelligence in drug discovery market was valued at USD 2.35 billion in 2025 and is expected to reach USD 13.77 billion by 2033 at a CAGR of 24.8%.
SM006 Precedence Research Drug Discovery Market Size, Share & Trends 2025–2034 — Precedence Research The global drug discovery market size was estimated at USD 71.89 billion in 2025 and is expected to reach around USD 158.74 billion by 2034, growing at a CAGR of 9.2%.
SM007 Google DeepMind AlphaFold: Protein Structure Database — Google DeepMind The AlphaFold Protein Structure Database provides open access to over 200 million protein structure predictions covering nearly all known proteins.
SM008 NVIDIA NVIDIA AI for Healthcare and Life Sciences — NVIDIA BioNeMo NVIDIA BioNeMo is the development platform for AI-driven biology and drug discovery. 2x faster biofoundation model training. 6x faster model inference.
SM009 Amazon Web Services Amazon SageMaker JumpStart — AWS
SM010 U.S. Food and Drug Administration (FDA) Artificial Intelligence in Drug Development — FDA FDA has received over 500 AI/ML-enabled drug development submissions since 2016. The CDER AI Council was established in 2024.
SM011 EvolutionaryScale Simulating 500 million years of evolution with a language model — ESM3 Blog ESM3 is a frontier multimodal generative model for biology. We are releasing ESM3 as open for academic and non-commercial use. For commercial access to ESM3, we are launching the Forge API.
SM012 EvolutionaryScale ESM Cambrian: Building the Frontier of Protein Language Models — Blog ESM C is now available on AWS SageMaker JumpStart and NVIDIA BioNeMo. ESM C is released under the MIT license for any use, including commercial applications.
SM013 Science (AAAS) Simulating 500 million years of evolution with a language model — Science ESM3 is a frontier multimodal generative model for biology that reasons over the sequence, structure, and function of proteins simultaneously, trained on sequences of 2.78 billion proteins.
SM014 bioRxiv (Cold Spring Harbor Laboratory) Search results: evolutionaryscale ESM3 — bioRxiv
SM015 bioRxiv (Cold Spring Harbor Laboratory) Simulating 500 million years of evolution with a language model — bioRxiv preprint We have developed ESM3, a frontier multimodal generative model for biology trained at the scale of evolution.
SM016 IQVIA IQVIA — Healthcare and Life Science Analytics
SM017 EvolutionaryScale (GitHub) evolutionaryscale/esm — GitHub repository
SM018 Hugging Face evolutionaryscale — Hugging Face organization page esm3-sm-open-v1: 3,110 downloads. esmc-600m-2024-12: 6,320 downloads.
SM019 Amazon Web Services AWS for Health — Genomics and Life Sciences
SM020 Statista Pharmaceutical industry research and development expenditure worldwide 2008–2024
SM021 Fortune Business Insights Artificial Intelligence In Drug Discovery Market Size & Forecast
SM022 Crunchbase EvolutionaryScale — Crunchbase company profile Total funding: $142M. Most recent funding: Series A.
SM023 National Human Genome Research Institute (NHGRI) DNA Sequencing Costs: Data — National Human Genome Research Institute The cost per raw megabase of DNA sequence dropped dramatically from ~$10,000 in 2001 to less than $0.01 by 2023, reaching approximately $100 per genome.
SM024 Lux Capital EvolutionaryScale — Lux Capital portfolio ESM3 is the first generative AI model for biology that simultaneously reasons over the sequence, structure, and function of proteins.
SM025 PyPI esm — Python Package Index This repository contains flagship protein models for EvolutionaryScale, as well as access to the API. ESM3 is our flagship multimodal protein generative model.
SP001 Profluent Bio Profluent — AI-Designed Proteins The world's first AI-designed gene editor, demonstrating authorship in action.
SP002 Generate Biomedicines Generate Biomedicines — Generative Biology "42,000 proteins generated, built, and tested – and we're just getting started."
SP003 Generate Biomedicines The Generate Platform "GB-0895 has the potential to shift treatment from monthly to just twice per year."
SP004 AbSci Corporation AbSci — Unlocking Novel Biology with AI "De novo design of biologics; Multi-parametric lead optimization; Data to train, AI to create, and wet lab to validate with 6 week cycle times"
SP005 Cradle Cradle — AI Protein Engineering Platform "Teams that use Cradle report 2-12x faster development timelines."
SP006 Google DeepMind AlphaFold — AI for Protein Structure "Demis Hassabis and John Jumper are co-awarded the Nobel Prize in Chemistry for their work on AlphaFold, alongside David Baker for his work on computational protein design."
SP007 Adaptyv Bio Adaptyv Bio — Cloud Lab for Protein Designers
SP008 Isomorphic Labs Isomorphic Labs — Reimagining Drug Discovery with AI "Isomorphic Labs is here to advance human health by building on and beyond the Nobel-winning AlphaFold system."
SP009 Chai Discovery Chai Discovery — Drug-Like Antibody Design "Drug-like antibody design against challenging targets with atomic precision"
SP010 Institute for Protein Design (University of Washington) Institute for Protein Design — We Create New Proteins "We create new proteins that solve challenges in medicine, technology, and sustainability."
SP011 Recursion Pharmaceuticals Recursion — Pioneering AI Drug Discovery "Over 50 petabytes spanning phenomics, transcriptomics, proteomics, ADME, and de-identified patient data."
SP012 Recursion Pharmaceuticals Recursion Drug Discovery Pipeline
SP013 Schrödinger Schrödinger — Physics-Based Software Platform for Molecular Discovery "Built upon more than 30 years of R&D, our industry-leading computational platform is transforming the way therapeutics and materials are discovered."
SP014 Schrödinger Schrödinger Computational Platform for Molecular Discovery & Design
SP015 Inceptive Inceptive — Foundation Models of Life, for Life "We build end-to-end foundation models that learn to design molecules directly from diverse observations of life. We specialize in sequence-based medicines like mRNA, siRNA, ASOs, and peptides."
SP016 Iambic Therapeutics Iambic Therapeutics — Better Technology for Better Medicines "IAM1363 for HER2: Highly selective, brain-penetrant inhibitor for HER2-driven cancers that has shown anti-tumor activity, safety and tolerability in Phase 1b studies"
SP017 Meta AI (Facebook Research) GitHub: facebookresearch/esm — Evolutionary Scale Modeling (esm) "This repository contains code and pre-trained weights for Transformer protein language models from the Meta Fundamental AI Research Protein Team (FAIR), including our state-of-the-art ESM-2 and ESMFold."
SP018 Meta AI (AI at Meta) facebook/esm2_t33_650M_UR50D · Hugging Face. License: MIT
SP019 OpenFold Consortium OpenFold Consortium — Open Ecosystem for AI Biology "Our goal is to develop an open ecosystem of accelerated AI for Biology tools in order to catalyze innovation, starting with state-of-the-art and permissively licensed protein structure prediction training and inference pipelines and models."
SP020 Meta AI ESM Metagenomic Atlas: The First View of the 'Dark Matter' of the Protein Universe "We found that using a language model of protein sequences greatly accelerates the speed of structure prediction (up to 60x)."
SP021 Nature De novo design of protein structure and function with RFdiffusion
SP022 Science (AAAS) Simulating 500 million years of evolution with a language model
SP023 EvolutionaryScale EvolutionaryScale on HuggingFace — ESM3 and ESMC Model Families
SP024 EMBL-EBI / Google DeepMind AlphaFold Protein Structure Database "AlphaFold DB provides open access to over 200 million protein structure predictions to accelerate scientific research."
SP025 EvolutionaryScale ESM3 — A Frontier Language Model for Biology (EvolutionaryScale Blog) "ESM3 represents a milestone model in the ESM family—the first created by our team at EvolutionaryScale, an order of magnitude larger than our previous model ESM2, and natively multimodal and generative."
SP026 U.S. Securities and Exchange Commission (EDGAR) Absci Corp (ABSI) 10-K Filing — FY2025 (Period: 2025-12-31; Filed: 2026-03-24)
SP027 Insilico Medicine Insilico Medicine — Generative AI Software for Drug Discovery
SP028 Xaira Therapeutics Xaira Therapeutics — AI Drug Discovery "We are building predictive and agentic AI models across the complete spectrum of the drug discovery and development process."
SP029 Wikipedia AlphaFold — Wikipedia
SI001 EvolutionaryScale EvolutionaryScale homepage — joining forces with Biohub
SI002 EvolutionaryScale ESM3: A new paradigm for protein language models — ESM3 release blog
SI003 EvolutionaryScale ESM Cambrian blog — commercial model release January 2025
SI004 EvolutionaryScale Forge API product page — commercial protein AI API
SI005 U.S. Securities and Exchange Commission SEC EDGAR full-text search — Form D filings for 'EvolutionaryScale' (0 results)
SI006 U.S. Securities and Exchange Commission SEC EDGAR full-text search — Form D filings for 'Evolutionary Scale' (0 results)
SI007 U.S. Securities and Exchange Commission SEC EDGAR company browse — Form D filings for 'evolutionary scale' (0 results)
SI008 U.S. Securities and Exchange Commission SEC EDGAR company browse — Form D filings for 'evolutionaryscale' (0 results)
SI009 CNBC EvolutionaryScale raises $142 million from Amazon, Nvidia for protein AI
SI010 Axios EvolutionaryScale Series A funding — protein AI $142 million Amazon NVIDIA
SI011 MIT Technology Review EvolutionaryScale raises $142 million for protein AI from Amazon, Nvidia
SI012 CNBC Chan Zuckerberg Initiative Biohub joins with EvolutionaryScale team
SI013 CZI Biohub Frontier AI for Biology Initiative — EvolutionaryScale team joins Biohub
SI014 Crunchbase EvolutionaryScale company profile — funding, investors, products
SI015 PitchBook EvolutionaryScale company profile — funding history
SI016 NVIDIA NVIDIA joins seed investment in EvolutionaryScale
SI017 NVIDIA NVIDIA partners with EvolutionaryScale — ESM3 on BioNeMo
SI018 NVIDIA EvolutionaryScale debuts with ESM3 generative AI model on BioNeMo and H100
SI019 NVIDIA NGC Catalog ESM3 model on NVIDIA NGC Catalog — Clara / BioNeMo resource
SI020 Lux Capital Lux Capital — EvolutionaryScale Series A portfolio announcement
SI021 GitHub EvolutionaryScale GitHub organization — repos and activity
SI022 GitHub evolutionaryscale/esm — ESM model repository
SI023 Hugging Face EvolutionaryScale/esm3-sm-open-v1 — open-weight ESM3 on HuggingFace
SI024 Bloomberg EvolutionaryScale raises $142 million from Amazon, Nvidia (Bloomberg; access blocked)
SI025 Wikipedia EvolutionaryScale — Wikipedia article
SI026 Hacker News (Algolia API) Hacker News search — EvolutionaryScale funding Series A discussions
SE001 EvolutionaryScale Simulating 500 million years of evolution with a language model ESM3 is a generative model that reasons over the sequence, structure and function of proteins simultaneously. We trained ESM3 on an enormous scale: 771B tokens and 1.07×10^24 FLOPs.
SE002 EvolutionaryScale ESM Cambrian: New foundational protein language models ESM Cambrian (ESMC) introduces a new family of protein language models with 300M, 600M, and 6B parameter sizes.
SE003 EvolutionaryScale EvolutionaryScale — Homepage
SE004 EvolutionaryScale Forge — EvolutionaryScale API Platform
SE005 American Association for the Advancement of Science (AAAS) Simulating 500 million years of evolution with a language model (Science, Vol 387) Hayes et al. Science Vol 387 Issue 6736 pp. 850-858 (January 16, 2025); DOI 10.1126/science.ads0018
SE006 bioRxiv / Cold Spring Harbor Laboratory Simulating 500M years of evolution with a language model (ESM3 preprint) esmGFP is 58% sequence identical to the nearest natural GFP and has 96 mutations out of 229 total amino acid positions.
SE007 bioRxiv / Cold Spring Harbor Laboratory More Structure, Less Accuracy: ESM3's Binding Prediction Paradox When distinct relaxed mutant structures are used per variant (rather than a single consistent backbone), ESM3's binding prediction performance deteriorates — a counter-intuitive result suggesting that more structural information can reduce accuracy.
SE008 bioRxiv / Cold Spring Harbor Laboratory bioRxiv search — EvolutionaryScale ESM3 citing papers
SE009 EvolutionaryScale GitHub — evolutionaryscale/esm (official Python client and model weights)
SE010 EvolutionaryScale GitHub — evolutionaryscale organization page
SE011 EvolutionaryScale GitHub — evolutionaryscale/DeepEP (MoE Expert Parallelism library)
SE012 GitHub / EvolutionaryScale GitHub API — evolutionaryscale/esm repository metadata
SE013 HuggingFace / EvolutionaryScale HuggingFace — EvolutionaryScale organization page
SE014 HuggingFace / EvolutionaryScale HuggingFace model card — esm3-sm-open-v1 3,105 downloads in the last month; 291 likes; license: Cambrian Non-Commercial License Agreement
SE015 HuggingFace / EvolutionaryScale HuggingFace model card — esmc-600m-2024-12
SE016 NVIDIA NVIDIA Clara BioNeMo — Platform for generative AI in drug discovery
SE017 NVIDIA EvolutionaryScale Debuts With ESM3 Generative AI Model for Protein Design ESM3 was trained using the Andromeda cluster, which uses NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand networking. ESM3 uses roughly 25x more flops and 60x more data than its predecessor, ESM2.
SE018 NVIDIA NVIDIA NGC Catalog — ESM3 (Clara resource)
SE019 Amazon Web Services Amazon SageMaker JumpStart — Foundation models and ML solutions
SE020 Crunchbase EvolutionaryScale — Crunchbase organization profile
SE021 U.S. Securities and Exchange Commission (SEC) SEC EDGAR EFTS — Form D search for EvolutionaryScale
SE022 Hacker News (Algolia) Hacker News search — EvolutionaryScale community discussion
SE023 CZI Biohub Biohub launches initiative combining frontier AI & frontier biology The team at EvolutionaryScale, a frontier AI research lab and public benefit company that has created groundbreaking, large-scale AI systems for the life sciences, will join Biohub to help advance this initiative. Alex Rives, EvolutionaryScale's co-founder and chief scientist, will serve as head of science.
SE024 Lux Capital EvolutionaryScale Series A — Lux Capital investment update
SE025 Axios EvolutionaryScale raises $142 million Series A — Axios Pro Health Tech
SE026 Semantic Scholar / Allen Institute for AI Semantic Scholar paper search — ESM3 EvolutionaryScale citations
SE027 DeepMind / Google AlphaFold — DeepMind protein structure prediction
SE028 Chai Discovery Chai Discovery — protein complex structure prediction
SE029 Semantic Scholar / Allen Institute for AI Semantic Scholar paper search — ESM3 protein language model evaluation benchmark limitation
SU001 EvolutionaryScale EvolutionaryScale — Company Homepage ESM3 is a family of models in three sizes: small, medium, and large, available through our API and our partner's platforms.
SU002 EvolutionaryScale ESM3: A Frontier Language Model for Biology (Blog) We're opening our API for biological intelligence, now in public beta, allowing scientists in academia and industry a free limited time preview of the capabilities of some of our models through Forge.
SU003 EvolutionaryScale ESM Cambrian: Revealing the Mysteries of Proteins with Unsupervised Learning (Blog) ESM C 6B is available on Forge for academic use, and AWS Sagemaker for commercial use. ESM C will also be available on NVIDIA BioNemo soon.
SU004 EvolutionaryScale (GitHub) evolutionaryscale/esm — GitHub Repository (README) ESM C models are also available on Amazon SageMaker under the Cambrian Inference Clickthrough License Agreement. Under this license agreement, models are available for broad use for commercial entities.
SU005 EvolutionaryScale (GitHub) evolutionaryscale — GitHub Organization Page esm-partner — Repository for partner collaborations
SU006 HuggingFace EvolutionaryScale — HuggingFace Organization Page biohub/esm3-sm-open-v1 (updated Jan 29, 2025): 3.11k downloads, 291 likes ... biohub/esmc-300m-2024-12: 6.32k downloads, 30 likes ... biohub/esmc-600m-2024-12: 1.49k downloads, 32 likes
SU007 NVIDIA NVIDIA BioNeMo — AI Platforms for Healthcare and Life Sciences BioNeMo — NVIDIA BioNeMo™ is the development platform for AI-driven biology and drug discovery. Use Cases: biofoundation model building, molecular design, virtual screening, protein structure prediction, protein binder design
SU008 Adaptyv Bio Adaptyv Bio — Company Website
SU009 Semantic Scholar (Allen Institute for AI) Semantic Scholar API — ESM3 EvolutionaryScale Downstream Paper Search (total: 32 results)
SU010 bioRxiv (Cold Spring Harbor Laboratory) bioRxiv Search — evolutionaryscale ESM3 Preprint Results: 129 results for the term 'evolutionaryscale ESM3'
SU011 BusinessWire (Berkshire Hathaway) EvolutionaryScale Raises $142M Series A to Advance Protein Language Models
SU012 CNBC EvolutionaryScale Raises $142 Million, Protein AI Amazon NVIDIA
SU013 GlobeNewsWire EvolutionaryScale Raises $142M Series A
SU014 Crunchbase EvolutionaryScale — Crunchbase Organization Profile EvolutionaryScale secured $142 million in a seed investment round. The funding was backed by Amazon and Nvidia and is intended for the development of protein-generating AI.
SU015 Science (AAAS) Simulating 500 million years of evolution with a language model
SU016 NVIDIA Newsroom NVIDIA Joins Seed Investment in EvolutionaryScale
SU017 Lux Capital EvolutionaryScale Series A — Lux Capital Blog
SU018 Wikipedia EvolutionaryScale — Wikipedia
SU019 Wikipedia Generate Biomedicines — Wikipedia
SU020 Precedence Research Drug Discovery Market Size, Share & Trends 2025–2034 The global drug discovery market size is valued at USD 71.89 billion in 2025 and is predicted to increase from USD 78.51 billion in 2026 to approximately USD 158.74 billion by 2034, expanding at a CAGR of 9.20% from 2025 to 2034.
SU021 Isomorphic Labs Isomorphic Labs — Company Homepage
SU022 Generate Biomedicines Generate Biomedicines — Company Homepage 42,000 proteins generated, built, and tested
SU023 Axios EvolutionaryScale Series A Funding — Axios
SU024 TechCrunch EvolutionaryScale — TechCrunch Tag Page
SU025 Amazon Web Services EvolutionaryScale ESM-C — AWS Marketplace Product Listing
SU026 Amazon Web Services Revolutionizing Drug Discovery with AI: A Spotlight on EvolutionaryScale (AWS Industries Blog)
SU027 NVIDIA Developer EvolutionaryScale ESM3 on NVIDIA (Developer Blog)
SU028 Nuclear Threat Initiative (NTI) NTI Biosecurity — Program Overview
SU029 Center for AI Safety Statement on AI Risk — safe.ai
SR001 U.S. Government Publishing Office / Federal Register Executive Order 14110: Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence addressing AI systems' most pressing security risks — including with respect to biotechnology, cybersecurity, critical infrastructure, and other national security dangers
SR002 National Institute of Standards and Technology (NIST) AI Risk Management Framework (AI RMF) — ITL AI Program NIST released NIST-AI-600-1, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile
SR003 U.S. Food and Drug Administration Artificial Intelligence-Enabled Medical Devices
SR004 arXiv / MIT (Soice et al.) Can large language models democratize access to dual-use biotechnology? In one hour, the chatbots suggested four potential pandemic pathogens, explained how they can be generated from synthetic DNA using reverse genetics, supplied the names of DNA synthesis companies unlikely to screen orders, identified detailed protocols
SR005 Center for AI Safety (CAIS) Statement on AI Risk Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.
SR006 Wikipedia Biological Weapons Convention As of May 2025, 189 states have become party to the treaty. The convention's effectiveness has been limited due to insufficient institutional support and the absence of any formal verification regime to monitor compliance.
SR007 Nuclear Threat Initiative (NTI) Biosecurity — NTI These technologies also introduce risks of accidental misuse and deliberate exploitation, which could result in a biological catastrophe with grave consequences.
SR008 Center for Security and Emerging Technology (CSET, Georgetown) Biosecurity and Innovation in the Age of AI: Safeguarding the Future of U.S. Biotechnology
SR009 Nature (Baker Lab / University of Washington) De novo design of protein structure and function with RFdiffusion RFdiffusion enables the design of diverse functional proteins from simple molecular specifications
SR010 Meta AI (facebookresearch) facebookresearch/esm: Evolutionary Scale Modeling — Pretrained language models for proteins This repository contains code and pre-trained weights for Transformer protein language models from the Meta Fundamental AI Research Protein Team (FAIR)
SR011 EMBL-EBI / Google DeepMind AlphaFold Protein Structure Database AlphaFold DB provides open access to over 200 million protein structure predictions to accelerate scientific research
SR012 AQ Laboratory (Columbia/UCSF) aqlaboratory/openfold: Trainable, memory-efficient PyTorch reproduction of AlphaFold 2
SR013 NVIDIA NVIDIA BioNeMo — AI Development Platform for Biology and Drug Discovery NVIDIA BioNeMo is the development platform for AI-driven biology and drug discovery
SR014 EvolutionaryScale ESM3: A New Era for Protein Design (Launch Blog — Responsible Development) We have created a Responsible Development Framework to guide our work towards our mission with transparency and clarity
SR015 EvolutionaryScale ESM Cambrian: Representation learning for protein language models ESM C was reviewed by a committee of scientific experts who concluded that the benefits of releasing the models greatly outweigh any potential risks
SR016 EvolutionaryScale evolutionaryscale/esm — ESM protein models and Forge API access
SR017 Crunchbase EvolutionaryScale — Crunchbase Company Profile
SR018 HuggingFace EvolutionaryScale — HuggingFace Organization Page
SR019 Bloomberg EvolutionaryScale Raises $142 Million for Protein AI
SR020 EUR-Lex (European Union) Regulation (EU) 2024/1689 — The Artificial Intelligence Act laying down harmonised rules on artificial intelligence ... to promote the uptake of human centric and trustworthy artificial intelligence
SR021 artificialintelligenceact.eu The Act Texts — EU Artificial Intelligence Act
SR022 Wikipedia AlphaFold AlphaFold 2's results at CASP14 were described as 'astounding' and 'transformational'. As of November 2025, the paper had been cited nearly 43,000 times.
SR023 Chai Discovery Introducing Chai-1: A Multi-Modal Foundation Model for Molecular Structure Prediction We tested Chai-1 across a large number of benchmarks, and found that the model achieves a 77% success rate on the PoseBusters benchmark (vs. 76% by AlphaFold3), as well as a Cα LDDT of 0.849 on the CASP15 protein monomer structure prediction set (vs. 0.801 by ESM3-98B)
SR024 Meta AI ESM Metagenomic Atlas: The first view of the 'dark matter' of the protein universe
SR025 Johns Hopkins Center for Health Security Center for Health Security — Mission and Research Focus We advance policies and practice addressing diverse challenges, including... the potential for biological accidents or intentional threats
SR026 Amazon Web Services Amazon SageMaker JumpStart
SR027 Wikipedia Asilomar Conference on Recombinant DNA A group of about 140 professionals participated in the conference to draw up voluntary guidelines to ensure the safety of recombinant DNA technology
SR028 bioRxiv / EvolutionaryScale Simulating 500 million years of evolution with a language model (Preprint) Authors are employees of EvolutionaryScale, PBC. Patents have been filed related to aspects of this work.
SR029 OpenAI Safety and Responsibility — OpenAI
SR030 Anthropic Responsible Scaling Policy
SR031 Wikipedia AI Safety
SV001 Crunchbase EvolutionaryScale — Funding, Investors, and Overview EvolutionaryScale secured $142 million in a seed investment round. The funding was backed by Amazon and Nvidia and is intended for the development of protein-generating AI.
SV002 EvolutionaryScale EvolutionaryScale — Company Homepage
SV003 EvolutionaryScale ESM3: A frontier language model for biology — ESM3 Release Blog trained with over 1x10^24 FLOPS and 98B parameters
SV004 EvolutionaryScale ESM Cambrian — EvolutionaryScale Blog
SV005 Science (AAAS) Simulating 500 million years of evolution with a language model
SV006 Yahoo Finance Absci Corporation (ABSI) Stock Price, News, Quote & History Market Cap (intraday) 799.793M
SV007 Yahoo Finance Recursion Pharmaceuticals (RXRX) Stock Price, News, Quote & History Market Cap (intraday) 1.555B
SV008 Yahoo Finance Schrodinger, Inc. (SDGR) Stock Price, News, Quote & History Market Cap (intraday) 892.913M
SV009 Absci Corporation (SEC Filing) Absci Corporation Annual Report on Form 10-K for FY2025 (absi-20251231) Revenue was $2.8 million for the year ended December 31, 2025 compared to $4.5 million for the year ended December 31, 2024.
SV010 Recursion Pharmaceuticals (SEC Filing) Recursion Pharmaceuticals Annual Report on Form 10-K for FY2025 (rxrx-20251231) We had an accumulated deficit of $2.1 billion as of December 31, 2025.
SV011 SEC EDGAR EDGAR Filing Search — Absci Corp 10-K filings
SV012 SEC EDGAR EDGAR Filing Search — Recursion Pharmaceuticals 10-K filings
SV013 SEC EDGAR EDGAR Filing Search — Schrodinger Inc 10-K filings
SV014 KPMG Private Enterprise Venture Pulse Q4 2024 — US Venture Capital Trends people are now starting to become more discerning as to who the winners may be in the AI space — the companies with credible business models, creating highly disruptive solutions, as opposed to others who have put AI wrappers on existing solutions
SV015 Precedence Research Protein Engineering Market Size to Hit USD 23.59 Billion By 2035 The global protein engineering market size was estimated at USD 5.09 billion in 2025 and is predicted to increase from USD 5.95 billion in 2026 to approximately USD 23.59 billion by 2035, expanding at a CAGR of 16.57% from 2026 to 2035.
SV016 Precedence Research Drug Discovery Market Size, Share, and Trends 2025–2034 The global drug discovery market size is valued at USD 71.89 billion in 2025 and is predicted to increase from USD 78.51 billion in 2026 to approximately USD 158.74 billion by 2034.
SV017 MarketsandMarkets Protein Engineering Market — Global Forecast
SV018 Absci Absci — AI Biologics Drug Creation Platform
SV019 Profluent Profluent — AI Protein Design
SV020 Cradle Cradle — AI Protein Engineering Platform
SV021 Generate Biomedicines Generate Biomedicines — Generative Biology Platform
SV022 Insilico Medicine Insilico Medicine — Generative AI Drug Discovery
SV023 Isomorphic Labs Isomorphic Labs — Reimagining Drug Discovery with AI
SV024 Recursion Recursion — Pioneering AI Drug Discovery
SV025 Xaira Therapeutics Xaira Therapeutics — Company Homepage
SV026 NVIDIA NVIDIA Clara BioNeMo — AI Drug Discovery and Protein Language Models
SV027 Google DeepMind AlphaFold — Predicting Protein Structure and Interactions AlphaFold 3 and AlphaFold Server are launched — Google DeepMind and Isomorphic Labs introduce AlphaFold 3, which predicts the structure and interactions of all of life's molecules.
SV028 PitchBook EvolutionaryScale — PitchBook Funding Profile
SV029 Bloomberg EvolutionaryScale Raises $142 Million From Amazon, Nvidia
SV030 Hacker News (Algolia Search API) Hacker News — EvolutionaryScale funding Series A discussions
SV031 SEC EDGAR (Full-Text Search) EDGAR Full-Text Search — EvolutionaryScale Form D filings (0 results)
SV032 Grand View Research Artificial Intelligence In Drug Discovery Market Report, 2033 The global artificial intelligence in drug discovery market size was estimated at USD 2.35 billion in 2025 and is projected to reach USD 13.77 billion by 2033, growing at a CAGR of 24.8% from 2026 to 2033.
SV033 Hugging Face EvolutionaryScale Organization — Hugging Face Model Hub