Diligence report AI inference infrastructure / developer tools Series C (private) 2026-06-14

Fireworks AI

Inference cloud for open models, priced for perfection

A top-tier AI-inference asset with elite founders and hypergrowth, priced for perfection against ~50% margins and structural commoditization risk.

Cover facts

Valuation 01

4 USD billion (Series C, Oct 2025) [CO011]

Total raised 02

327 USD million+ [CO014]

Annualized revenue 03

800 USD million (Sacra est., May 2026) [CI018]

Customers 04

10000 companies+ [CO018]

Tokens/day 05

15 trillion (early 2026) [CO022]

Gross margin 06

50 percent (est.) [CO024]

Company profile

Fireworks AI is a Redwood City-based AI inference cloud founded in 2022 by Lin Qiao and a team of former Meta PyTorch engineers. It lets enterprises run, fine-tune and scale hundreds of open-source LLM, image, audio and multimodal models in production via an OpenAI-compatible API, differentiating on proprietary inference optimization (FireAttention, FireOptimizer), best-in-class function calling and category-leading reliability. The company raised a $250M Series C at a $4B valuation in October 2025, reports rapid revenue growth to a third-party-estimated ~$800M annualized, and serves 10,000-plus companies including Cursor, Notion, DoorDash and Samsung.

Website: fireworks.ai
Founded: 2022-01-01
Founders: Lin Qiao, Dmytro Dzhulgakov, Dmytro Ivchenko
Founding location: Redwood City, California, USA
Headquarters: Redwood City, California, USA
Product: Usage-based AI inference platform: per-token serverless inference for open models, LoRA and reinforcement fine-tuning, dedicated and reserved GPU deployments, a function-calling model family (FireFunction), and a voice-agent platform, all on a proprietary optimized inference engine.
Customers: AI-native startups, digital-native enterprises and select Fortune 500 buyers building production generative-AI applications that need fast, cost-efficient, controllable open-model inference.
Business model: B2B usage-based monetization across serverless (per token), fine-tuning (per training token), reinforcement fine-tuning (per GPU-hour) and dedicated/reserved deployments, with bottoms-up developer entry expanding into negotiated enterprise contracts.
Stage: Series C (private, venture-backed)
Funding status: $250M Series C at $4B valuation (Oct 2025), >$327M total raised; reportedly in talks for a ~$15B round co-led by Index Ventures as of May 2026 (unconfirmed).

[CO001, CO011, CO014, CO018]

Executive summary

Top strengths

Rare founder-market fit - the team that built PyTorch at Meta now leads inference systems.
Hypergrowth to a reported ~$800M annualized revenue across 10,000+ customers and 15T tokens/day.
Blue-chip production references (Cursor, Notion, Sourcegraph, Upwork) with quantified outcomes.
Engineering-led differentiation (FireAttention, FireOptimizer), best-in-class function calling and 99.8% uptime.

Top risks

Inference commoditization and ~50% gross margins versus 70%+ software norms.
Hyperscaler bundling (Bedrock, Azure, Vertex) and NVIDIA acting as supplier, investor and competitor.
Low switching costs and multi-homing cap retention and pricing power.
Aggressive valuation ramp ($552M to $4B, $15B in talks) embeds flawless execution.

Open gaps

No audited financials or single reconciled, dated revenue figure (estimates span 6x within a year).
Gross margin, net revenue retention, churn, burn and runway are undisclosed.
Top-customer revenue concentration and GPU-supply contract terms are not public.
Preference stack and dilution structure of the next round are undisclosed.

Chapter 01

01Company Overview

1.1 Identity and Business Model

Fireworks AI is an American artificial-intelligence infrastructure company headquartered in Redwood City, California, founded in late 2022 by a team that left Meta's PyTorch organization. The company operates what it calls an "AI Cloud" for enterprise developer teams: a managed inference platform that runs, fine-tunes, and scales open-source large language, vision, audio, and multimodal models with low-latency serving. Its core thesis is "one-size-fits-one" inference, the belief that the highest-value AI is built on smaller, customizable open models tuned on enterprise-specific data rather than a handful of generic closed foundation models. Monetization is usage-based across the customer lifecycle: serverless inference billed per token, fine-tuning billed per training token, reinforcement fine-tuning billed per GPU-hour, and on-demand or reserved dedicated deployments billed per GPU-second or GPU-hour. The platform offers hundreds of models plus an OpenAI-compatible API, function calling, and enterprise security controls, positioning Fireworks between commodity GPU rental and closed-model APIs.[CO001, CO002, CO003, CO004, CO005, CO031]

FO002: Company snapshot logic

How identity, product, customers, capital and dependencies connect.

[CO001, CO004, CO018, CO024, CO028]

1.2 Founders and Leadership

Fireworks AI was co-founded by chief executive Lin Qiao alongside six colleagues, the majority of whom worked together on PyTorch at Meta. Qiao previously served as Senior Director of Engineering and Head of PyTorch at Meta, where she led an organization of more than 300 engineers, and earlier held roles at LinkedIn, IBM and other large systems companies; she holds a Ph.D. in Computer Science from UC Santa Barbara. Co-founders include Dmytro Dzhulgakov, a former core PyTorch maintainer who joined Facebook in 2011, and Dmytro Ivchenko, a Kyiv Polytechnic graduate who worked on PyTorch ranking at Meta, both originally from Ukraine. The remaining founders, James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, bring experience from Meta's PyTorch compiler, ads infrastructure and core ML teams as well as Google Vertex AI. The founding team's deep inference-systems pedigree is repeatedly cited by investors as the company's core advantage and is also a key-person dependency concentrated in Qiao.[CO006, CO007, CO008, CO009, CO010, CO032]

Leadership and founder table
Person	Role	Background	Founder-market fit	Key-person dependency
Lin Qiao	CEO & co-founder	Head of PyTorch at Meta (300+ eng); LinkedIn, IBM; PhD UC Santa Barbara	Deep inference-systems and OSS leadership	High - public face, vision and fundraising lead
Dmytro Dzhulgakov	Co-founder (CTO-level)	Core PyTorch maintainer at Meta since 2011; from Kharkiv, Ukraine	Core inference engineering	High - principal technical architect
Dmytro Ivchenko	Co-founder	PyTorch ranking at Meta; LinkedIn; Kyiv Polytechnic	Large-scale ML systems	Medium
James Reed	Co-founder	PyTorch compiler team at Meta	Compiler / kernel optimization	Medium
Benny Chen	Co-founder	Meta ads infrastructure lead	Production infra strategy	Medium
Chenyu Zhao	Co-founder	Led Google Vertex AI	Cloud AI platform GTM	Medium
Pawel Garbacki	Co-founder	Core ML for Meta Newsfeed	ML systems and ranking	Medium

Founder list and backgrounds compiled from Index Ventures, Sequoia, scroll.media and executive directory sources; roles beyond CEO are not all formally titled publicly.

[CO006, CO007, CO008, CO009, CO010]

1.3 Funding and Capitalization

Fireworks has raised more than $327 million across a seed round and three priced rounds. A $25 million Series A led by Benchmark closed in March 2024, with Sequoia Capital, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang. A $52 million Series B led by Sequoia followed in July 2024 at a $552 million valuation, adding NVIDIA, AMD and MongoDB Ventures and bringing cumulative capital to $77 million. In October 2025 the company announced a $250 million Series C at a $4 billion valuation, co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia; Sacra reports the round comprised roughly $230 million of primary capital plus a $20 million secondary. Strategic participants across rounds include NVIDIA, AMD, MongoDB and Databricks, tying the cap table to the hardware and data-platform ecosystems Fireworks depends on. As of May 2026 Sacra reports the company is in talks to raise again at a $15 billion post-money valuation, with Index set to co-lead, though terms are unconfirmed.[CO011, CO012, CO013, CO014, CO015, CO016]

Stakeholder or investor map
Stakeholder	Role / round	Control or economic importance	Diligence ask
Benchmark	Lead - Series A (Mar 2024)	Early lead investor; likely board seat	Confirm board composition and ownership %
Sequoia Capital	Lead - Series B; continued Series C	Multi-round backer; GP Sonya Huang	Confirm board seat and pro-rata stakes
Lightspeed Venture Partners	Co-lead - Series C (Oct 2025)	Late-stage lead at $4B	Confirm governance rights
Index Ventures	Co-lead - Series C; potential co-lead next round	Repeat backer (Sahir Azam); thesis investor	Confirm allocation in rumored $15B round
Evantic	Co-lead - Series C	New late-stage lead	Confirm fund profile and stake
NVIDIA	Strategic - Series B/C	Hardware supplier and investor	Assess GPU-allocation conflicts/benefits
AMD	Strategic - Series B/C	Alternative silicon supplier/investor	Assess MI-series adoption
MongoDB / Databricks	Strategic - Series B/C	Data-platform partners/investors	Confirm co-sell and partnership depth

Lead and strategic investors only; individual angels (Slootman, Sandberg, Liu, Wang) and seed backers are not enumerated. Board composition and ownership percentages are not public.

[CO011, CO012, CO013, CO015, CO016]

1.4 Scale and Traction Metrics

Fireworks reports rapid commercial scaling. At the Series C the company said it powers over 10,000 companies, a roughly tenfold increase from the Series B, serves hundreds of thousands of developers, and processes more than 10 trillion tokens per day; third-party profiles cite 15 trillion tokens per day by early 2026. The developer base grew from about 12,000 in February 2024 to 23,000 by the end of that year. Reported revenue figures vary by source and vintage and should be treated with care: the company stated annualized revenue had surpassed $280 million at the October 2025 Series C, while Sacra estimates roughly $305 million at year-end 2025 rising to about $800 million annualized by May 2026, and earlier 2025 coverage cited $130 million ARR with claims of profitability and 20x year-over-year growth. Gross margin is estimated near 50 percent, below the 70 percent-plus typical of software, because GPU costs sit in cost of goods sold; management has told investors it targets 60 percent.[CO018, CO019, CO020, CO021, CO022, CO023]

Snapshot KPI table
Metric	Value / Status	As of	Confidence	Gap or note
Valuation	$4.0B post-money (Series C)	Oct 2025	High	Reported $15B round in talks May 2026 (unconfirmed)
Total raised	>$327M	Oct 2025	High	Includes ~$20M secondary in Series C
Annualized revenue (company)	>$280M	Oct 2025	Medium	Company statement; not audited
Annualized revenue (Sacra est.)	~$800M	May 2026	Low	Third-party estimate; conflicts with company vintage
Customers	10,000+ companies	Oct 2025	Medium	~10x increase vs Series B
Developers	Hundreds of thousands	Oct 2025	Medium	23,000 cited end-2024
Tokens/day	10T+ (15T early 2026)	Oct 2025	Medium	Throughput metric, not revenue
Gross margin	~50% (targeting 60%)	2026	Low	Sacra estimate; GPU COGS heavy
Headcount	Not disclosed	2026	Low	No reliable public figure

Values compiled from company announcements and third-party analyst profiles; revenue and margin are estimates with conflicting vintages and are not audited financials.

[CO011, CO014, CO018, CO021, CO022, CO024]

FO003: Investability indicators

Traction, revenue trajectory and key-person signals beyond the headline KPI snapshot.

Revenue and growth are estimates across differing vintages; key-person concentration is a qualitative judgment.

[CO018, CO022, CO023, CO019, CO033]

1.5 Milestones and Adverse Signals

The company chronology runs from leaving PyTorch in 2022 through three financings, a string of platform launches (FireAttention, FireFunction V2, FireOptimizer, supervised and reinforcement fine-tuning), a Dev Day in 2025, a March 2026 launch on Microsoft Foundry, and the acquisition of Hathora to deepen real-time compute orchestration. Alongside the growth story sit genuine adverse signals that later chapters examine in depth. Independent reviewers note that Fireworks is "just the engine," requiring meaningful developer sophistication, and flag thin documentation and the absence of an ongoing free tier. Analysts highlight three structural risks: inference commoditization as open-source serving frameworks such as vLLM and SGLang improve, hyperscaler bundling by AWS Bedrock, Azure and Vertex, and hardware concentration given Fireworks does not own its GPU fleet while NVIDIA has entered inference directly through its Lepton acquisition. These pressures sit against an unusually strong founding team and fast revenue ramp.[CO025, CO026, CO027, CO028, CO029, CO030]

Milestone table
Date	Event	Type	Amount / valuation / status	Implication
2022	Team leaves Meta PyTorch; Fireworks founded in Redwood City	founding	n/a	Origin of inference-systems pedigree
Feb 2024	Reaches ~12,000 developers	scale	12,000 devs	Early bottoms-up traction
Mar 2024	Series A led by Benchmark	financing	$25M	First institutional lead
Jul 2024	Series B led by Sequoia	financing	$52M @ $552M	Compound-AI positioning
2024	FireFunction V2 and FireAttention V2 launched	product	released	Function calling and long-context speed
Dec 2024	Developer base reaches ~23,000	scale	23,000 devs	Roughly doubled in 10 months
Jun 2025	Supervised Fine-Tuning V2 released	product	released	Broader model + QAT support
2025	Reinforcement fine-tuning and Dev Day 2025	product	released	Agentic tuning wedge
Oct 2025	Series C co-led by Lightspeed, Index, Evantic	financing	$250M @ $4B	10x customer growth vs Series B
Early 2026	Scales to ~15T tokens/day	scale	15T tokens/day	Throughput leadership claim
Mar 2026	Launch on Microsoft Foundry (Azure)	partnership	live	Hyperscaler distribution
2026	Acquires Hathora for real-time compute orchestration	governance	acquisition	Vertical integration up the stack
May 2026	Reported talks for new round at $15B	financing	$15B (rumored)	Potential ~4x step-up in <1 year

Chronology compiled from Fireworks blogs, funding announcements and analyst profiles; dates for some product launches are approximate to the announcement month.

[CO011, CO014, CO019, CO025, CO026, CO027]

FO001: Company milestone timeline

Dated milestones across founding, financing, product, scale and partnerships.

Some launch dates approximate to announcement month; the $15B round is unconfirmed.

[CO011, CO014, CO019, CO025, CO026, CO017]

1.6 Exhibits

Chapter 02

02Market Analysis

2.1 Market Boundary and Definition

Fireworks operates in the managed AI inference market: the serving, fine-tuning and dedicated deployment of open-weight large language, vision, audio and multimodal models for production applications. The relevant included spend is what enterprises pay third parties to run models in production rather than what they spend training foundation models or renting bare GPUs. Excluded from the core boundary are foundation-model training compute consumed by frontier labs, raw GPU infrastructure-as-a-service from providers such as CoreWeave and Lambda, and closed-model APIs from OpenAI and Anthropic, although closed APIs are the most important status-quo substitute. Adjacent budget pools that Fireworks is expanding into include voice agents, retrieval-augmented generation with vector databases, and reinforcement-learning training for agents. The most direct substitutes for Fireworks are self-hosting open models on vLLM or SGLang, hyperscaler bundles like AWS Bedrock and Azure Foundry, and continued reliance on closed APIs. Defining this boundary first is essential because headline "AI inference" market figures conflate hardware, hyperscaler and independent-provider spend.[CM001, CM002, CM003, CM004, CM005]

Market definition table
Segment / category	Included spend	Excluded spend	Buyer / payer	Relevance to Fireworks
Managed open-weight inference	Per-token serverless serving of open models	Closed-model API usage	Eng/platform budget	Core market
Fine-tuning & adaptation	LoRA / SFT / RFT training spend	Foundation-model pretraining	ML/eng budget	Core adjacency
Dedicated / reserved GPU serving	Managed dedicated deployments	Bare-metal GPU IaaS rental	Platform/procurement	Core market
Voice & multimodal agents	Streaming STT+LLM+TTS stacks	Telephony hardware	Product budget	Expansion adjacency
RAG / embeddings	Embedding + reranking inference	Vector DB licenses	Eng budget	Expansion adjacency
Closed-model APIs (substitute)	n/a (excluded)	OpenAI/Anthropic API spend	Eng budget	Primary substitute

Boundary defines what Fireworks can capture as an independent inference provider; closed APIs and raw GPU IaaS are excluded but listed as substitutes.

[CM001, CM002, CM003, CM004]

2.2 Market Sizing Across Multiple Lenses

No single number captures Fireworks' opportunity, so we triangulate three lenses. The broadest top-down lens, the global AI inference market, is estimated by MarketsandMarkets at $106.15 billion in 2025 growing to $254.98 billion by 2030, a 19.2% CAGR; other research houses place 2026 between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion. This lens, however, is dominated by semiconductor and hyperscaler spend and overstates Fireworks' reachable market. A narrower lens is generative-AI model spend, which Gartner (cited by Index Ventures) projects will nearly triple from $14 billion in 2025 to $39 billion by 2028, with much of the growth in specialized and fine-tuned models that favor Fireworks. The most relevant serviceable lens is the independent open-weight inference-serving niche, which has consolidated around roughly seven providers; with Together AI near $1 billion annualized revenue, Fireworks in the $280-800 million range and Groq valued at $6.9 billion, the independent-provider revenue pool is a few billion dollars today but expanding quickly. Fireworks' own ~$280 million-plus revenue represents an early single-digit share of that niche.[CM006, CM007, CM008, CM009, CM010, CM011]

TAM/SAM/SOM or sizing lens table
Lens	Publisher	Year	Value	CAGR / note	Confidence	Limitation
Top-down AI inference (TAM)	MarketsandMarkets	2025-2030	$106.15B -> $254.98B	19.2% CAGR	Medium	Dominated by chips & hyperscalers
Top-down AI inference (alt)	Fortune/Polaris/R&M	2026 / 2034	~$118-126B / $312-536B	13-19% CAGR	Low	Wide spread across houses
GenAI model spend (lens)	Gartner (via Index)	2025-2028	$14B -> $39B	~40%/yr	Medium	Includes closed-model spend
Independent inference niche (SAM)	Sacra / triangulated	2026	Low single-digit $B	Fast-growing	Low	No standard analyst measure
Fireworks revenue (SOM)	Fireworks / Sacra	2025-2026	$280M -> ~$800M	High growth	Low	Conflicting vintages

Three-lens triangulation; the top-down TAM overstates Fireworks' reachable market, so SAM/SOM rely on company-level estimates with low confidence.

[CM006, CM007, CM008, CM009, CM010, CM011]

FM001: Market sizing lens

TAM/SAM/SOM layers for the AI inference opportunity.

Layers use different vintages; SAM is a triangulated estimate, not an analyst measure.

[CM006, CM009, CM010, CM012]

FM002: Market estimate range

Low/base/high estimates of the AI inference market by forecast year, in USD billions.

Ranges span MarketsandMarkets, Polaris, Fortune, Research and Markets and Gartner estimates; units are USD billions.

[CM006, CM007, CM008]

2.3 Buyer and Segment Map

Demand for Fireworks spans three buyer segments with different adoption paths. AI-native startups (for example Cursor, Perplexity and Liner) adopt bottoms-up: individual developers start with self-serve API keys and pay-as-you-go billing, and the economic buyer is an engineering or platform lead. Digital-native enterprises (DoorDash, Notion, Shopify, Upwork, Quora) move features from pilot to production and expand into dedicated deployments and fine-tuning, with budget owned by product-engineering organizations. Traditional and regulated enterprises (Samsung and, increasingly, healthcare and financial-services buyers) adopt top-down through negotiated contracts, requiring SSO, audit logs, data residency and HIPAA or SOC2 posture, with budgets owned by platform and procurement functions. Across all three, the user is a developer, the payer is an engineering budget, and the dominant adoption trigger is the cost, latency or control limitations of closed-model APIs at production scale. Fireworks' AWS Strategic Collaboration Agreement and Microsoft Foundry availability let it reach these buyers inside existing cloud procurement channels rather than as a standalone vendor.[CM013, CM014, CM015, CM016, CM017]

Segment / buyer map
Segment	Buyer	User	Payer	Adoption trigger
AI-native startups	Eng/platform lead	Developers	Eng budget	Closed-API cost/latency at scale
Digital-native enterprises	Product-eng org	Developers	Eng budget	Pilot-to-production scaling
Regulated/Fortune 500	Platform + procurement	Internal devs	Procurement budget	Data control & compliance
Voice/agent builders	Product owner	App users	Product budget	Sub-500ms latency need
RAG/search teams	Eng lead	Developers	Eng budget	Retrieval latency & cost

Across segments the user is a developer and the payer an engineering or procurement budget; adoption triggers differ by maturity and regulation.

[CM013, CM014, CM015, CM016]

FM003: Buyer / segment map

Buyer-user-payer relationships and the adoption path into Fireworks.

[CM013, CM014, CM017]

FM004: Adoption funnel or value-chain map

Purchase and deployment stages from awareness to enterprise standardization.

Stages synthesized from Fireworks go-to-market descriptions; values are illustrative relative weights, not disclosed conversion rates.

[CM015, CM016, CM017]

2.4 Growth Drivers and Adoption Constraints

Several drivers expand Fireworks' market. Open-source model quality is converging on closed counterparts, agentic and compound AI systems multiply inference calls per task, fine-tuning on proprietary data is becoming a competitive necessity, and enterprises increasingly want to own their AI rather than depend on a few closed labs. Cost pressure also helps: open-weight inference can run materially cheaper than closed APIs at scale. Working against these are powerful constraints. Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks and triggering a price race in which Fireworks' Llama 70B price sits within roughly 2% of Together's. Hyperscaler bundling lets AWS, Azure and Google fold inference into existing security, billing and governance relationships. GPU supply is concentrated and Fireworks does not own its fleet. Regulation such as the EU AI Act adds compliance overhead, and the OpenAI-compatible API that lowers switching-in cost also lowers switching-out cost, capping durable lock-in.[CM018, CM019, CM020, CM021, CM022, CM023]

Growth drivers and constraints table
Driver / constraint	Direction	Timing	Implication	Diligence ask
Open-model quality convergence	Driver	Now	Expands addressable workloads	Track OSS vs closed quality gap
Agentic / compound AI	Driver	1-2 yrs	More inference calls per task	Measure tokens-per-workflow growth
Fine-tuning on proprietary data	Driver	Now	Higher-value, stickier spend	Assess RFT/SFT attach rates
Enterprise data ownership	Driver	1-3 yrs	Open-model preference	Survey buyer build-vs-buy
Inference commoditization	Constraint	Now	Margin/price compression	Monitor vLLM/SGLang parity
Hyperscaler bundling	Constraint	Now	Channel capture risk	Assess Bedrock/Azure overlap
GPU supply concentration	Constraint	Ongoing	Capacity/cost exposure	Review GPU contracts
Regulation (EU AI Act)	Constraint	1-3 yrs	Compliance overhead	Map obligations by tier

Drivers expand the market while constraints compress margins or capture the channel; timing indicates when each materially affects adoption.

[CM018, CM019, CM020, CM021, CM022, CM023]

2.5 Sizing Gaps and Contradictory Estimates

Several gaps limit confidence in market sizing. Published "AI inference" totals vary widely and bundle incompatible categories (chips, hyperscaler services and independent software), so the top-down TAM cannot be cleanly mapped to Fireworks' reachable revenue. The independent inference-provider revenue pool is not measured by any standard analyst; it must be assembled from individual company estimates of uneven vintage and reliability. Forecast CAGRs range from roughly 13% to 19% across houses, and 2034 estimates differ by more than $200 billion. Within this, Fireworks' own revenue figures are themselves contested across sources. These gaps mean the market is clearly large and growing fast, but the serviceable and obtainable shares relevant to valuation remain estimates rather than measured facts, and any sizing should be treated as directional. We preserve the failed precision rather than assert a single SAM.[CM025, CM026, CM027, CM028]

2.6 Exhibits

Chapter 03

03Competitors

3.1 Competitive Landscape

The inference market has segmented into four distinct competitive layers, and Fireworks faces pressure from each. Managed open-model platforms, principally Together AI, Baseten and Replicate, are the closest direct peers, competing on model breadth, developer experience and per-token price. Vertically integrated silicon players, Groq, Cerebras and SambaNova, attack latency and cost from custom hardware rather than software optimization on commodity GPUs. Hyperscaler bundles, AWS Bedrock, Google Vertex AI, Microsoft Azure Foundry and Databricks Model Serving, are the most structurally threatening because they collapse model access, infrastructure, governance and contracting into one platform. Finally, open-source serving frameworks such as vLLM and SGLang, plus packaging layers like NVIDIA NIM and routers like OpenRouter, commoditize the proprietary advantage embedded in Fireworks' own stack. Status-quo alternatives include continued use of closed APIs and internal self-hosting. The most likely new entrant pressure comes from NVIDIA itself, which entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, turning a key supplier into a rival.[CP001, CP002, CP003, CP004, CP005, CP006]

FP001: Competitive positioning map

Providers plotted by price-competitiveness (x) and enterprise/breadth depth (y).

Axis positions are qualitative author judgments synthesizing pricing and capability evidence.

[CP001, CP014, CP015]

3.2 Competitor Profiles

Together AI is Fireworks' closest direct competitor: founded in 2021 by Percy Liang, Chris Ré and Vipul Ved Prakash, it raised a $305 million Series B in February 2025 at a $3.3 billion valuation, reportedly reached about $1 billion annualized revenue by early 2026, and spans serverless inference, dedicated clusters, fine-tuning, voice and reinforcement learning. Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment; it raised a $300 million round in January 2026 at a $5 billion valuation, led by IVP and CapitalG with a reported $150 million from NVIDIA, lifting total funding to roughly $585 million. Groq competes from custom LPU silicon, raising $750 million in September 2025 at a $6.9 billion valuation and advertising 750-plus tokens per second on Llama models, with a Meta partnership powering the official Llama API. Cerebras and SambaNova extend the hardware-led attack at the premium latency end, while Replicate, Modal and Anyscale compete for developer mindshare. Against these, Fireworks holds a $4 billion valuation and $280 million-plus revenue with category-leading reliability and function calling.[CP007, CP008, CP009, CP010, CP011, CP012]

Competitor profile table
Competitor	Layer	Funding / valuation	Target customer	Product scope	Indicative price (Llama 70B)	Strategic direction
Fireworks AI	Managed open-model	$327M raised / $4.0B	AI-native + enterprise devs	Serverless, fine-tuning, RFT, dedicated, voice	$0.90/M	Up the stack: tuning, agents, governance
Together AI	Managed open-model	$533.5M / $3.3B (talks $7.5B)	Startups to enterprise	Serverless, clusters, fine-tuning, voice, RL	$0.88/M	Owned GPU clusters + breadth
Baseten	Managed open-model	~$585M / $5.0B (talks $11B)	Compliance-heavy enterprise	Custom models, VPC/self-host runtime	Quote-based	Enterprise inference engineering
Replicate	Managed open-model	Private / undisclosed	Developers / experimentation	Broad model catalog, run-by-API	Per-run	Developer mindshare top of funnel
Groq	Vertical silicon	$750M+ / $6.9B	Latency-sensitive workloads	LPU inference API	$0.59/M	Custom silicon + Meta Llama API
Cerebras / SambaNova	Vertical silicon	Private / multi-$B	Performance-sensitive	Wafer-scale / RDU inference	Quote-based	Hardware-led latency leadership
AWS Bedrock / Azure / Vertex	Hyperscaler bundle	Public mega-caps	Existing cloud enterprises	Bundled model access + governance	Bundled	Vendor consolidation
Databricks / NVIDIA NIM	Hyperscaler / packaging	Public / private	Data-platform & infra buyers	Model serving / NIM packaging	Bundled	Absorb inference into platform

Funding and valuation from company announcements and Sacra; prices are indicative Llama 70B serverless rates and vary by tier and date.

[CP007, CP008, CP009, CP010, CP011, CP014]

3.3 Capability, Pricing and GTM Comparisons

On capability, Fireworks differentiates through reliability and structured output: independent monitoring put its Q1 2026 uptime at 99.8%, the highest among specialized providers, and its FireFunction models hit roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o. On price, the field is razor-thin: Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq, and the same model spreads roughly sixfold across the seven-provider field. On raw speed Groq's LPU dominates at 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency and stability under load. On go-to-market, Together and Baseten match Fireworks' bottoms-up developer motion, but hyperscalers win distribution through existing procurement, security and billing relationships. On trust and regulation, Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance-heavy deployments.[CP014, CP015, CP016, CP017, CP018, CP019]

Feature / capability matrix
Capability	Fireworks	Together AI	Baseten	Groq
Serverless open-model API	Yes	Yes	Yes	Yes
Model catalog size	50+	200+	Custom-focused	15-20
LoRA fine-tuning	Yes	Yes + full FT	Yes	No
Function calling quality	Best-in-class (~92%)	Good	Good	Basic
Custom silicon	No	No	No	Yes (LPU)
VPC / self-hosted	EKS airgapped	Dedicated	Yes (core strength)	Limited
Voice agent platform	Yes	Yes	Partner	No
Reinforcement fine-tuning	Yes	Yes	Partial	No

Compiled from provider docs, TokenMix and Sacra; 'Best-in-class' reflects independent FireFunction benchmark results.

[CP014, CP015, CP016, CP017]

Pricing / packaging comparison
Metric	Fireworks	Together AI	Groq	Note
Llama 3.3 70B ($/1M)	$0.90	$0.88	$0.59	Fireworks ~2% over Together, 66% under Bedrock
Llama 3.3 8B ($/1M)	$0.20	$0.18	$0.05	Groq cheapest
Q1 2026 uptime	99.8%	99.7%	99.4%	Fireworks highest
Throughput (tok/sec)	145	95	420	Groq fastest
TTFT P50	150ms	220ms	65ms	Groq lowest latency
Fine-tuning	LoRA $16/M	LoRA+full $14/M	None	Together cheapest/broadest
Batch API	Not yet	Yes (30-50% off)	No	Together advantage

Prices and benchmarks from TokenMix April 2026 and DeployBase; figures are indicative and change frequently.

[CP014, CP015, CP018]

FP002: Feature breadth / capability map

Capability coverage across the four direct and silicon competitors.

Capability cells summarized from provider documentation and benchmarks.

[CP016, CP017]

3.4 Switching Costs, Lock-in and Distribution Power

Switching costs in inference are structurally low. Most providers, including Fireworks, Together, Groq and Baseten, expose OpenAI-compatible APIs, so migration between them can take minutes, and routing aggregators such as OpenRouter and TokenMix actively encourage multi-homing and automatic failover across providers. This caps durable lock-in for everyone and means share is defended by performance, tuning and enterprise integration rather than contracts. Distribution power is increasingly decisive: hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on. Fireworks' counter is to plug into those channels through its AWS Strategic Collaboration Agreement and Microsoft Foundry availability, while moving up the stack into fine-tuning, reinforcement learning, voice and enterprise governance to create stickier, higher-value relationships. Baseten's VPC and self-hosted footprint and Together's owned data-center and GPU-cluster strategy are alternative answers to the same distribution and supply problem.[CP020, CP021, CP022, CP023, CP024]

3.5 Moat Durability and Adverse Evidence

Fireworks' moat is real but narrow. Its proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance-and-price advantage, and its reliability and function-calling lead are genuine. But the moat faces clear erosion vectors. Open-source serving frameworks like vLLM and SGLang keep closing the performance gap, and Baseten openly builds on them; NVIDIA pushes NIM as a packaging layer; Snowflake released Arctic Inference as an open vLLM plugin. Better-capitalized rivals raise the stakes: Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion all have more balance-sheet room for GPU commitments and enterprise go-to-market. Hardware concentration is an adverse signal too, since Fireworks does not own GPUs while NVIDIA, a supplier and investor, now competes directly. The durable question is whether Fireworks can keep extending its stack into tuning, agents and governance faster than the ecosystem commoditizes the serving layer.[CP025, CP026, CP027, CP028, CP029, CP030]

Moat durability / competitive risk register
Risk	Mechanism	Severity	Evidence
OSS serving parity	vLLM/SGLang close performance gap	High	Baseten builds on SGLang/vLLM/TGI
NIM packaging	NVIDIA standardizes enterprise inference	Medium	NVIDIA pushes NIM distribution
Supplier-as-competitor	NVIDIA enters inference via Lepton	High	NVIDIA GPU-cloud marketplace
Hyperscaler bundling	Bedrock/Azure absorb inference	High	Bedrock custom model import (Qwen)
Capital asymmetry	Rivals raise larger rounds	Medium	Groq $6.9B, Baseten $5B
Price commoditization	Razor-thin per-token spreads	High	Fireworks within 2% of Together
Low switching cost	OpenAI-compatible APIs + routers	Medium	OpenRouter multi-homing
Hardware concentration	No owned GPU fleet	Medium	Sources NVIDIA/AMD third-party

Risk register synthesizing Sacra analysis and pricing/benchmark sources; severity is the author's qualitative judgment.

[CP025, CP026, CP027, CP028, CP029, CP030]

FP003: Moat / readiness KPIs

Indicators of Fireworks' competitive standing.

KPIs synthesize benchmark and funding evidence; speed ratio is Fireworks throughput over Groq.

[CP014, CP015, CP028]

3.6 Exhibits

Chapter 04

04Financials

4.1 Revenue Streams and Pricing Model

Fireworks operates a usage-based B2B model layered across several product surfaces that map to the customer lifecycle. Serverless inference is billed per token, fine-tuning is billed per training token, reinforcement fine-tuning is billed per GPU-hour, and on-demand dedicated deployments are billed per GPU-second or GPU-hour, while reserved capacity is contracted separately on longer commitments at negotiated pricing. This lets Fireworks capture revenue at nearly every stage of a customer's AI workflow, from experimentation through scaled production. Published serverless rates illustrate the model: roughly $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3, with image generation from about $0.013 to $0.04 per image and reserved capacity near $4.80 per hour per replica. Revenue mix is not disclosed, but analysts expect a shift toward higher-value dedicated deployments, fine-tuning and enterprise contracts rather than commodity serverless token volume, which would improve both margin and revenue durability over time.[CI001, CI002, CI003, CI004, CI005]

Revenue streams table
Stream	Billing basis	Lifecycle stage	Margin profile	Disclosure
Serverless inference	Per token	Experimentation & production	Lower (commodity)	Rates public
Fine-tuning (LoRA/SFT)	Per training token	Adaptation	Higher	Rates public
Reinforcement fine-tuning	Per GPU-hour	Adaptation/agents	Higher	Rates public
On-demand dedicated	Per GPU-second/hour	Production scaling	Higher	Rates public
Reserved capacity	Contracted commitment	Scaled enterprise	Highest (negotiated)	Not public
Voice / multimodal	Usage-based	Expansion	Mixed	Partially public

Streams and billing bases from Sacra and Fireworks pricing; revenue mix across streams is not disclosed and margin profile is qualitative.

[CI001, CI002, CI003]

Pricing / monetization table
Item	Price	Unit	Note
Llama 3.3 70B	$0.90	per 1M tokens	~2% over Together, 66% under Bedrock
Llama 3.3 8B	$0.20	per 1M tokens	Entry workloads
DeepSeek V3	$0.50	per 1M tokens	Frontier open model
Flux 1.1 Pro	$0.04	per image	Up to 1024x1024
SDXL 1.0	$0.013	per image	Lower-cost image gen
Reserved capacity	$4.80	per hour per replica	~50 concurrent requests
LoRA fine-tune (70B)	$16	per 1M training tokens	$2/M over Together
Free credits	$1	one-time	No ongoing free tier

Indicative April 2026 serverless rates from TokenMix and DeployBase; prices change frequently and exclude negotiated enterprise terms.

[CI004, CI005, CI007]

FI001: Revenue model bridge

How usage-based streams build toward total revenue across the customer lifecycle.

Stream shares are illustrative; Fireworks does not disclose revenue mix.

[CI001, CI002, CI003]

4.2 Go-to-Market and Sales Efficiency

Fireworks' go-to-market is bottoms-up at entry and top-down at expansion. Developers start immediately with self-serve API keys and pay-as-you-go billing, supported by $1 of free credits rather than an ongoing free tier, and a standard rate limit near 600 requests per minute. Larger customers graduate into negotiated enterprise relationships with higher rate limits, reserved capacity, account management, custom optimization and private deployment. Layered on top is a field and partner sales motion, anchored by an AWS Strategic Collaboration Agreement that funds proofs-of-concept and a startup acceleration program, giving Fireworks access to enterprise buyers through existing procurement channels rather than requiring a standalone vendor evaluation. Sales-efficiency metrics such as CAC, payback and net revenue retention are not disclosed, but the land-and-expand structure, in which a single serverless feature can grow into dedicated, fine-tuning, voice and reserved-capacity spend, is the principal efficiency lever, with blended annualized revenue per company estimated near $28,000 across a base skewed toward a smaller number of large production deployments.[CI006, CI007, CI008, CI009, CI010]

Unit economics table
Metric	Value / status	Driver	Confidence
Gross margin	~50%	GPU COGS heavy	Medium
Target gross margin	60%	Utilization + Blackwell + mix	Low
Blended ARPA	~$28K/yr	10,000+ companies	Low
Revenue concentration	Skewed to large deployments	Production whales	Low
Multi-LoRA utilization	Many variants per base model	Lower cost/variant	Medium
CAC / payback	Not disclosed	Bottoms-up + partner sales	Low
Net revenue retention	Not disclosed	Land-and-expand	Low

Unit-economics figures are Sacra estimates or qualitative; CAC, payback and NRR are not public.

[CI008, CI009, CI011, CI012, CI013]

4.3 Cost Structure and Gross Margin Drivers

Fireworks is not a pure software business: GPU procurement, capacity planning and regional infrastructure are real cost inputs embedded in cost of goods sold, which is why Sacra estimates gross margin near 50%, well below the 70%-plus typical of subscription software. Management has told investors it targets 60% through better GPU utilization, hardware efficiency gains on newer architectures such as NVIDIA Blackwell, and a revenue-mix shift toward dedicated and enterprise workloads. The core economic logic is that proprietary inference optimization, FireAttention and FireOptimizer, translates engineering into pricing power: if Fireworks serves a model faster and at higher throughput than a customer could self-host, it can charge a premium while undercutting the alternative's total cost. Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant. The cost environment is shaped by NVIDIA and AMD data-center GPU economics, both of which report rapidly growing AI accelerator revenue, underscoring that Fireworks' input costs sit inside a supplier-driven, capacity-constrained market.[CI011, CI012, CI013, CI014, CI015, CI016]

FI002: Unit economics bridge

How GPU cost becomes gross margin via proprietary optimization and pricing power.

[CI011, CI012, CI013, CI014]

4.4 Public Traction Versus Private-Metric Gaps

Public traction signals are strong but inconsistently dated. Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C; Sacra estimates roughly $305 million at year-end 2025 rising to about $800 million annualized by May 2026; a third-party profile cites $315 million-plus by early 2026; and earlier 2025 coverage reported $130 million ARR with claims of profitability and roughly 20x year-over-year growth. The platform processes more than 10 trillion tokens per day (15 trillion by early 2026) across 10,000-plus companies and hundreds of thousands of developers. These are largely company-stated or estimated figures; audited financials, revenue mix, net revenue retention, churn and headcount are not public. The wide revenue spread, from $130 million to roughly $800 million annualized within twelve months, reflects both genuine hypergrowth and inconsistent measurement, and any single number should be treated as directional rather than verified.[CI017, CI018, CI019, CI020, CI021]

Public financial gaps table
Metric	Public status	What is missing	Diligence path
Revenue / ARR	Conflicting estimates	Single reconciled dated figure	Management-confirmed ARR
Gross margin	Analyst estimate ~50%	Audited margin	Confirmatory financials
Net revenue retention	Not disclosed	Expansion / churn data	Cohort retention pack
Headcount	Not disclosed	Employee count	HR / LinkedIn estimate
Burn & runway	Not disclosed	Cash flow statement	Bank balances + burn
Revenue mix	Not disclosed	Stream-level split	Product revenue breakdown

All listed metrics are private; the table frames the diligence asks needed to verify financial quality.

[CI017, CI018, CI020, CI028, CI029]

FI003: Financial estimate range

Annualized revenue estimates for Fireworks by source and vintage, in USD millions.

Estimates span company statements and third-party analysts of differing vintage; ranges approximate stated point figures.

[CI017, CI018, CI019]

4.5 Capital Adequacy and Financing Dependency

Fireworks has raised more than $327 million across seed, Series A, B and C rounds; the October 2025 Series C alone provided $250 million, comprising roughly $230 million of primary capital and a $20 million secondary, at a $4 billion valuation. That primary injection, combined with reported profitability in 2025 and a high-growth revenue base, suggests comfortable near-term capital adequacy, though cash on hand, burn rate and runway are not disclosed. The company has signaled it will grow its compute footprint three-to-four-fold over the next year, a capital-intensive plan that increases dependence on GPU access and could become the next-round trigger; Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026. The principal financing dependency is GPU supply: Fireworks does not own its fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation constraints and to NVIDIA's own entry into inference. No public debt or project-finance obligations are disclosed.[CI022, CI023, CI024, CI025, CI026]

Capital adequacy table
Item	Value / status	As of	Note
Total raised	>$327M	Oct 2025	Seed through Series C
Series C size	$250M	Oct 2025	$230M primary + $20M secondary
Valuation	$4.0B	Oct 2025	Post-money
Profitability	Reported profitable	Mid-2025	Per scroll.media; unverified
Cash / burn / runway	Not disclosed	2026	Diligence blocker
Planned use of funds	3-4x compute expansion	Next year	Capital-intensive
Next-round signal	$15B talks	May 2026	Per Sacra; unconfirmed
Debt / project finance	None disclosed	2026	No public obligations

Capital figures from company and Sacra; cash, burn and runway are not public, limiting capital-adequacy assessment.

[CI022, CI023, CI024, CI025]

FI004: Capital intensity / cash-flow map

How capital flows into compute and infrastructure and back into revenue and margin.

Flow synthesizes stated use of funds and analyst estimates; cash and burn are not disclosed.

[CI022, CI024, CI025, CI026]

4.6 Financial Verdict

On revenue quality, Fireworks shows credible hypergrowth and a usage-based model that captures spend across the customer lifecycle, but the absence of audited figures, disclosed revenue mix and retention metrics caps confidence. On margin, the roughly 50% gross margin is the central financial weakness: it is structurally below software norms because of GPU costs, and the path to the stated 60% target depends on utilization gains and a mix shift that are plausible but unproven. On capital intensity, the three-to-four-fold compute expansion and lack of owned GPUs make the model more capital-hungry and supply-dependent than a typical SaaS company. The main diligence blockers are a single reconciled revenue figure, gross-margin and unit-economics verification, burn and runway, and net revenue retention. The picture is a fast-scaling, well-funded business with real but supplier-exposed economics rather than a proven high-margin software compounder.[CI027, CI028, CI029, CI030]

4.7 Exhibits

Chapter 05

05Product & Technology

5.1 Product Definition in Customer Workflow Terms

In customer terms, Fireworks is the layer that takes an open-source model and makes it run in production fast, cheaply and reliably without the customer managing GPUs. A developer signs up, points an OpenAI-compatible API at a model such as Llama 4, DeepSeek or Qwen, and gets low-latency inference with function calling, JSON-mode structured output and streaming. As usage grows, the same customer can fine-tune a model on proprietary data, move to dedicated or reserved GPU capacity for guaranteed throughput, add retrieval and embeddings for RAG, and deploy voice agents. The platform spans text, image (Flux, SDXL), audio and multimodal formats across hundreds of models with day-zero support for major new releases. The core job it does for customers is to collapse the gap between a model that works in a notebook and one that serves millions of users in production, which Fireworks positions as the difference between experimentation and shipping. This is why its customers describe it as an inference engine rather than an application: it supplies speed, cost and control, while the customer builds the product.[CE001, CE002, CE003, CE004, CE005]

Workflow / use-case table
Use case	Customer example	Result	Source type
Code generation	Cursor	~1,000 tokens/sec Fast Apply	Customer story
Productivity AI	Notion	Latency 2s -> 350ms	Customer story
Code assistance	Sourcegraph	30% lower latency, 2.5x acceptance	Customer/AWS
Proposal drafting	Upwork (Uma)	Real-time tailored proposals	Customer story
Conversational search	Quora (Poe)	Tripled response speed	Reported
Email assistant	Superhuman	Ask AI compound system	Customer story
Enterprise search	Hebbia	Fast access to new open models	Analyst

Use cases and outcomes drawn from Fireworks customer stories, an AWS case study and analyst coverage; results are vendor- or customer-reported.

[CE002, CE018, CE019, CE020]

FE002: Customer workflow / operating flow

Developer journey from API call through speculative decoding to response.

[CE001, CE013, CE015]

5.2 Product Module and Asset Map

Fireworks' product surface decomposes into several modules. Serverless inference is the entry product: pay-per-token access to 50-plus actively served models (hundreds across the catalog), including Llama 4 Scout and Maverick, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4, with image generation via Flux and SDXL and vision models. FireFunction is the proprietary function-calling model family for tool use and structured output. The customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks, all exposed through a Build SDK and an Experiment Platform. The deployment modules span serverless, on-demand dedicated and reserved capacity, plus multi-LoRA hosting that packs many fine-tuned adapters onto one base deployment. Newer surfaces include the Voice Agent Platform, which co-locates transcription, language models and tool calling for sub-500ms response, and BYOB secure training that lets enterprises train from their own AWS S3 buckets. Together these modules let a single customer relationship expand from one serverless feature into a full production AI runtime.[CE006, CE007, CE008, CE009, CE010]

Product module / asset matrix
Module	What it does	Billing	Maturity
Serverless inference	Per-token access to 50+ served models	Per token	GA
FireFunction	Function calling / structured output	Per token	GA
LoRA fine-tuning / SFT V2	Customize models with QAT	Per training token	GA
Reinforcement fine-tuning	Train agents to surpass closed models	Per GPU-hour	GA
Dedicated / reserved deployments	Guaranteed throughput on dedicated GPUs	Per GPU-hour	GA
Multi-LoRA hosting	Many adapters on one base model	Per token	GA
Voice Agent Platform	STT + LLM + tool calling, sub-500ms	Usage-based	Newer
Build SDK / Experiment Platform	Programmatic build, tune, evaluate	Included	Newer

Module list compiled from Fireworks blog and docs; maturity is qualitative (GA = generally available, Newer = recently launched).

[CE006, CE007, CE008, CE009]

5.3 Architecture and Operating Model

Fireworks runs a proprietary, multi-layer inference stack on commodity NVIDIA GPUs. At the kernel layer, FireAttention is a custom CUDA attention implementation that Fireworks reports as substantially faster than vLLM and TensorRT-LLM, extended across versions to support long context and architectures like Llama 4's chunked local attention. Above it, FireOptimizer performs adaptive speculative execution, personalizing speculative decoding, draft-model selection and caching to each workload, with reported latency reductions up to roughly 3x in production and native FP4 support on NVIDIA Blackwell B200 hardware. The serving topology combines a stateless request router, draft and target GPU pods for speculative decoding, a distributed KV cache, continuous batching and disaggregated serving, scaling to documented tests around 50,000 requests per minute. Multi-LoRA consolidates many fine-tuned variants onto a single base model. The operating model is open-model neutral: Fireworks bets on running whichever open model is winning at a given moment rather than on any single model, which makes day-zero support for new releases a core engineering discipline.[CE011, CE012, CE013, CE014, CE015, CE016]

Technology / operating architecture table
Layer	Component	Function	Differentiation
API	OpenAI-compatible API	Model access, streaming, JSON mode	Low switching-in cost
Orchestration	Stateless request router	Route requests across pods	Scale to ~50K RPM
Optimization	FireOptimizer	Adaptive speculative execution	Up to ~3x lower latency
Speculation	Draft + target pods	Speculative decoding	Parallel token generation
Kernel	FireAttention	Custom CUDA attention	Faster than vLLM/TensorRT-LLM
Memory	Distributed KV cache	Reuse context, cut prefill	Lower latency on long context
Adaptation	Multi-LoRA	Many adapters per base model	Higher GPU utilization
Hardware	NVIDIA/AMD GPUs (incl. B200)	Compute substrate, FP4	Day-zero on new silicon

Architecture compiled from Fireworks blog/docs and independent technical write-ups; performance claims are vendor- or analyst-reported.

[CE011, CE012, CE013, CE014, CE015]

FE001: Product architecture map

The layered Fireworks inference stack from API down to GPU hardware.

Layering synthesized from Fireworks blog/docs and independent architecture write-ups.

[CE011, CE012, CE013, CE014]

5.4 Deployment, Reliability, Integration and Roadmap

Fireworks supports serverless, on-demand dedicated and reserved deployments across a global multi-region fleet with documented locations including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions, enabling latency and data-residency requirements. Integration is eased by an OpenAI-compatible API plus SDKs and connectors for frameworks such as LangChain and LlamaIndex, so migration from closed APIs can take minutes. Reliability is a headline claim: independent monitoring placed Q1 2026 uptime at 99.8%, the highest among specialized providers, with strong stability under load. Documented production results include Cursor reaching about 1,000 tokens per second for code generation, Notion cutting AI response latency from roughly 2 seconds to 350 milliseconds, and Sourcegraph seeing a 30% latency reduction and a 2.5x increase in completion acceptance. The roadmap, funded by the Series C, targets deeper research in tuning and inference alignment, an end-to-end model-lifecycle toolchain, and a three-to-four-fold expansion of global compute, alongside the Hathora acquisition to deepen real-time orchestration.[CE017, CE018, CE019, CE020, CE021, CE022]

Roadmap / release / development-stage table
Item	Stage	Timing	Implication
FireAttention (v2+)	Shipped	2024+	Long-context speed
FireFunction V2	Shipped	2024	Function calling
FireOptimizer	Shipped	2024	Adaptive optimization
Supervised Fine-Tuning V2	Shipped	Jun 2025	QAT, more models
Reinforcement fine-tuning	Shipped	2025	Agentic tuning
Voice Agent Platform	Shipped	2025-2026	New budget category
Microsoft Foundry launch	Shipped	Mar 2026	Azure distribution
Model-lifecycle toolchain	Planned	2026+	End-to-end creation
3-4x compute expansion	Planned	2026	Capacity scale

Release timeline from Fireworks blog, docs changelog and analyst coverage; planned items are company-stated roadmap intent.

[CE008, CE021, CE022]

FE004: Product maturity / capability map

Maturity of each module across capability dimensions.

Maturity cells are author judgments synthesizing product and compliance evidence.

[CE006, CE008, CE029, CE031]

5.5 Differentiation, IP and Data

Fireworks' differentiation is engineering-led. Its core intellectual property is the proprietary inference engine, especially FireAttention's custom kernels and FireOptimizer's adaptive optimization, which convert systems expertise from the founders' PyTorch background into measurable speed and cost advantages; no public patents are listed, so the moat is know-how rather than registered IP. A second source of differentiation is product-model co-design: a data feedback loop in which customer interactions continuously improve fine-tuned models, which Fireworks frames as how enterprises build a competitive moat with AI. A third is breadth and freshness: day-zero support across hundreds of open models and modalities, so the platform benefits from model turnover rather than being threatened by it. The principal vulnerability is that the optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so the differentiation must be continuously re-earned. Supply access to leading-edge NVIDIA and AMD GPUs is a further enabling, but not exclusive, advantage.[CE023, CE024, CE025, CE026, CE027]

FE003: Critical dependency map

Upstream dependencies the Fireworks platform relies on.

Dependency graph synthesized from technical sources; edge direction shows upstream-to-platform reliance.

[CE015, CE024, CE026, CE027]

5.6 Trust, Safety, Security and Compliance

Fireworks' enterprise posture is built for regulated buyers. The platform offers zero data retention by default, single sign-on, audit logs, and data-residency controls, and its AWS-based inference solution is HIPAA and SOC2 Type II compliant. For the most sensitive workloads it supports airgapped EKS deployments and bring-your-own-bucket secure training that keeps training data in the customer's own AWS S3. Structured output controls such as JSON mode and grammar-constrained decoding improve reliability and reduce malformed responses in agentic workflows, and FireFunction's high schema-compliance rate supports dependable tool use. These capabilities open regulated verticals including healthcare, financial services and government-adjacent workloads that were previously inaccessible to a standalone inference vendor. Quality is reinforced by continuous evaluation and reinforcement learning in the product-model co-design loop. Gaps remain: Fireworks does not publish a formal standard-tier SLA, enterprise SLAs are negotiated case by case, and independent reviewers note thin documentation in places, which are diligence items for security-sensitive buyers.[CE028, CE029, CE030, CE031, CE032]

Trust / quality / compliance table
Control	Status	Scope	Note
SOC2 Type II	Compliant	AWS-based inference	Per AWS case study
HIPAA	Compliant	AWS-based inference	Enables healthcare
Zero data retention	Default	Enterprise	Privacy posture
SSO / audit logs	Available	Enterprise	Governance
Data residency	Available	Multi-region	Frankfurt/Iceland/Tokyo
Airgapped EKS	Available	Sensitive workloads	Isolation
BYOB secure training	Available	SFT/RFT	Customer AWS S3
Standard-tier SLA	Not published	Serverless	Negotiated for enterprise

Compliance posture from AWS case study and Sacra; the absence of a published standard SLA is a diligence item.

[CE028, CE029, CE030, CE032]

5.7 Exhibits

Chapter 06

06Customers

6.1 Customer Base Segmentation

Fireworks' customer base spans three broad segments distinguished by buyer, user, payer and adoption path. AI-native startups, including Cursor, Perplexity, Liner and Cresta, adopt bottoms-up: individual developers start with self-serve API keys, and the economic buyer is an engineering or platform lead. Digital-native enterprises, such as DoorDash, Notion, Shopify, Upwork and Quora, move features from pilot into production and expand into dedicated deployments and fine-tuning, with budget owned by product-engineering organizations. Traditional and larger enterprises, exemplified by Samsung and Uber, and increasingly regulated buyers in healthcare and financial services, adopt top-down through negotiated contracts requiring compliance and data-residency controls. Across all three, the user is a developer, the payer is an engineering or procurement budget, and use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice. Geographically the base skews North American and European with global API access, and verticals span software, e-commerce, marketplaces, customer service and legal tech.[CU001, CU002, CU003, CU004, CU005]

Customer segmentation table
Segment	Example customers	Buyer / payer	Use case	Adoption path
AI-native startups	Cursor, Perplexity, Liner, Cresta	Eng lead / eng budget	Code, search, conversational AI	Bottoms-up self-serve
Digital-native enterprises	DoorDash, Notion, Shopify, Upwork, Quora	Product-eng / eng budget	Production AI features	Pilot to production
Large / regulated enterprises	Samsung, Uber	Platform + procurement	Enterprise AI roadmaps	Top-down contract
Enterprise search / agents	Sourcegraph, Hebbia	Eng lead / eng budget	Code + enterprise search	Land-and-expand
Communication / productivity	Superhuman	Product owner	Compound AI assistants	Feature-led

Segments and example customers from Fireworks blogs, Sacra and AI Market Watch; segment boundaries are analytical and some customers span multiple.

[CU001, CU002, CU003, CU004]

FU001: Customer journey map

Stages a customer moves through from discovery to enterprise standardization.

Journey synthesized from Fireworks go-to-market descriptions; not all customers traverse every stage.

[CU002, CU009, CU022]

6.2 Adoption Trajectory

Adoption has scaled steeply. Fireworks reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from about 1,000 at the Series B, alongside hundreds of thousands of developers. The developer base grew from about 12,000 in February 2024 to 23,000 by the end of that year. Usage intensity is high: the platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026, indicating that many accounts run production rather than experimental workloads. Customers progress along a land-and-expand path, beginning with serverless inference for a single feature and expanding into dedicated deployments, fine-tuning, reinforcement fine-tuning, embeddings for retrieval and voice agents. Analyst commentary on Hebbia illustrates how a single inference relationship, anchored on fast access to new open models and high-concurrency latency guarantees, can grow into a broader infrastructure dependency. The trajectory is strong on breadth and usage, though account-level retention and cohort expansion data are not disclosed.[CU006, CU007, CU008, CU009, CU010]

Customer growth / adoption trajectory table
Metric	Value	As of	Source basis
Companies served	~1,000	Series B (2024)	Company-stated
Companies served	10,000+	Oct 2025	Company-stated
Developers	~12,000	Feb 2024	Reported
Developers	~23,000	Dec 2024	Reported
Developers	Hundreds of thousands	Oct 2025	Company-stated
Tokens/day (Oct 2025)	10T+	Oct 2025	Company-stated
Tokens/day (early 2026)	~15T	Early 2026	Third-party profile

Trajectory figures are company-stated or third-party; growth is rapid but account-level retention is not disclosed.

[CU006, CU007, CU008]

FU002: Adoption / deployment funnel

Relative narrowing from developer signups to standardized enterprise accounts.

Funnel values are illustrative relative weights; Fireworks does not disclose conversion rates.

[CU006, CU007, CU009]

6.3 Named Customer Proof

Fireworks has unusually strong named, production-grade proof points for a company of its age. Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply code generation, with an AI researcher publicly stating Fireworks is "way more performant than the open source engines" and used in production. Notion reduced AI response latency from about two seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering. Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x, and Upwork's "Uma" assistant drafts real-time proposals on Fireworks. Quora's Poe chatbot tripled response speed, and Superhuman built its Ask AI compound system on the platform. These are mostly production deployments with named executives and quantified outcomes, giving the reference base high quality and reasonable freshness, though several case studies date to 2024 and a few logos appear only in aggregate marketing lists without standalone case studies.[CU011, CU012, CU013, CU014, CU015, CU016]

Named customer proof table
Customer	Deployment	Outcome	Reference quality	Freshness
Cursor	Production	~1,000 tok/sec Fast Apply; named researcher quote	High (quote + metric)	2024-2025
Notion	Production	Latency 2s -> 350ms; named exec quote	High (quote + metric)	2025
Sourcegraph	Production	30% lower latency, 2.5x acceptance	High (AWS + story)	2024
Upwork	Production	Uma real-time proposals; named exec	High (quote)	2025
Quora (Poe)	Production	Tripled response speed	Medium (reported)	2024
Superhuman	Production	Ask AI compound system	Medium (story)	2024
Samsung	Enterprise	AI roadmap acceleration	Medium (investor cited)	2025
DoorDash	Production	High-throughput AI features	Medium (logo + AWS)	2025

Named, mostly production references with quantified outcomes; some date to 2024 and a few logos appear only in aggregate lists, hence partial coverage.

[CU011, CU012, CU013, CU014, CU015]

FU003: Customer proof matrix

Reference quality across deployment status, quantified outcome and named attribution.

Cells synthesize evidence quality from customer stories and the AWS case study.

[CU011, CU016, CU031]

6.4 Retention and Durability

Retention is the weakest-evidenced dimension of the customer story. Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths, so durability must be inferred from structural signals rather than measured. The positive signals are real: the platform's land-and-expand design, multi-product surface and enterprise controls encourage expansion, blue-chip logos run production workloads, and the OpenAI-compatible API plus reliability lead reduce reasons to leave once integrated. The negative signals are equally real: the same OpenAI-compatible API and the rise of routing aggregators make multi-homing and switching trivial, inference is commoditizing, and razor-thin price differentiation versus Together limits pricing-based stickiness. Independent reviewers explicitly note alternatives and switching paths. The net assessment is that durability is plausibly supported by product depth and integration but is not yet evidenced by disclosed retention metrics, which is a material diligence gap.[CU017, CU018, CU019, CU020, CU021]

Retention / repeat usage / satisfaction table
Dimension	Status	Signal	Confidence
Net revenue retention	Not disclosed	Land-and-expand structure	Low
Gross retention / churn	Not disclosed	No public data	Low
Contract length	Not disclosed	Enterprise negotiated	Low
Repeat usage	High (implied)	10T+ tokens/day production	Medium
Satisfaction	Positive (anecdotal)	Named exec testimonials	Medium
Switching risk	Elevated	OpenAI-compatible API + routers	Medium

Retention metrics are undisclosed; positive signals are structural/anecdotal while switching risk is elevated by low lock-in.

[CU017, CU018, CU019, CU020]

FU004: Retention / repeat cohort

Qualitative retention signal by customer segment (disclosed metrics absent).

Cohort cells are qualitative author judgments; Fireworks discloses no quantitative cohort retention.

[CU017, CU019, CU021]

6.5 Expansion and Concentration Risk

Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend, supported by an AWS Strategic Collaboration Agreement that reaches buyers through existing procurement channels. The principal concentration risks are twofold. First, revenue is likely skewed toward a smaller number of large production deployments, so blended annualized revenue per company near $28,000 understates a probable long tail beneath a few large accounts; the identity and share of top customers are not disclosed, creating top-customer risk that cannot be quantified. Second, distribution and partner dependence is real: the AWS alliance and Microsoft Foundry availability are growth accelerants but also channel dependencies, and several marquee customers (for example DoorDash and Shopify) are themselves sophisticated buyers capable of multi-homing or in-housing. Procurement friction is lower than for closed APIs because of cloud-marketplace availability, but enterprise sales cycles and compliance reviews still gate the largest deals.[CU022, CU023, CU024, CU025, CU026]

Expansion and concentration risk table
Factor	Direction	Detail	Diligence ask
Land-and-expand	Positive	Serverless -> dedicated/tuning/voice	Measure expansion revenue %
Blended ARPA	Neutral	~$28K/yr across base	Get ARPA distribution
Top-customer concentration	Risk	Revenue skewed to large deployments	Disclose top-10 revenue share
Channel dependence	Risk	AWS + Microsoft Foundry channels	Assess direct vs partner-sourced mix
Customer multi-homing	Risk	Sophisticated buyers can multi-home	Check single-vendor commitments
Procurement friction	Neutral	Lower via cloud marketplaces	Map enterprise sales-cycle length

Concentration and channel risks are inferred from analyst commentary and the AWS/Azure partnerships; top-customer share is undisclosed.

[CU022, CU023, CU024, CU025]

6.6 Exhibits

Chapter 07

07Risks

7.1 Severity-Ranked Risk Overview

Fireworks is a fast-scaling, well-funded business whose principal risks are commercial and structural rather than acute legal or operational failures. The highest-severity risks are inference commoditization and gross-margin compression, hyperscaler bundling that could capture the inference layer, and hardware-supply dependence on NVIDIA, which is simultaneously a supplier, an investor and, via its Lepton acquisition and NIM packaging, a competitor. Medium-severity risks include capital intensity from a planned three-to-four-fold compute expansion, key-person concentration in CEO Lin Qiao, low switching costs that cap retention, and an aggressive valuation ramp from $552 million to $4 billion and a rumored $15 billion. Lower but non-trivial risks include regulatory overhead from the EU AI Act and GDPR, open-model licensing constraints, the absence of registered patents, undisclosed burn and runway, and reliance on AWS and Microsoft distribution channels. The mitigation thesis is consistent across categories: move up the stack into tuning, agents, voice and enterprise governance faster than the serving layer commoditizes, while diversifying silicon and plugging into incumbent procurement channels. Residual exposure remains meaningful because several mitigations are unproven and several key metrics are undisclosed.[CR001, CR002, CR003, CR004, CR005, CR006]

Risk heatmap summary
Risk	Likelihood	Impact	Mitigation maturity	Residual exposure
Inference commoditization / margin	High	High	Medium	High
Hyperscaler bundling	Medium	High	Medium	High
NVIDIA supplier-competitor	Medium	High	Low	High
Capital intensity / burn	Medium	Medium	Low	Medium
Key-person concentration	Low	High	Low	Medium
Low switching cost / churn	High	Medium	Low	Medium
Valuation ramp	Medium	Medium	Low	Medium
Regulatory (EU AI Act/GDPR)	Medium	Low	Medium	Low

Severity ratings are the author's qualitative synthesis of analyst, review and filing evidence; residual exposure reflects mitigation maturity.

[CR001, CR002, CR003, CR004, CR025]

FR001: Risk heatmap

Likelihood versus impact and residual exposure across major risk categories.

Cells are qualitative author judgments synthesizing analyst, review and filing evidence.

[CR001, CR002, CR003, CR007]

7.2 Regulatory and Legal Risk

Fireworks' regulatory and legal exposure is real but currently manageable. The most material regime is the EU AI Act, which imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose and foundation-model providers and their deployers; Fireworks' role as an inference and fine-tuning platform places it in the compliance chain for EU customers. GDPR and data-residency requirements drive the company's zero-data-retention, data-residency and regional-deployment features, and any lapse carries fines and reputational cost. Open-model licensing is a subtler legal risk: models such as Llama carry acceptable-use and license terms, and unresolved industry questions about training-data copyright could flow through to platforms that serve those models. Intellectual-property exposure runs the other way too: Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how that are harder to defend if key engineers leave. No material litigation or enforcement action against Fireworks is publicly known, and its Series C was executed with top-tier legal counsel, but the regulatory surface will expand as the company sells into healthcare, financial services and government-adjacent verticals.[CR007, CR008, CR009, CR010, CR011, CR012]

Regulatory / legal risk register
Risk	Regime / source	Likelihood	Impact	Mitigation
EU AI Act obligations	EU AI Act (GPAI/deployer duties)	Medium	Medium	Compliance + documentation
Data privacy / GDPR	GDPR / data residency	Medium	Medium	Zero retention, EU regions
Open-model licensing	Llama / model licenses	Low	Medium	License compliance, model neutrality
Training-data copyright spillover	Industry IP uncertainty	Low	Medium	Serves third-party models
IP defensibility	No registered patents	Medium	Medium	Trade-secret protection
Sector compliance expansion	HIPAA / financial / gov	Medium	Low	SOC2/HIPAA posture
Litigation / enforcement	None known publicly	Low	Medium	Top-tier legal counsel

Regulatory register; no material litigation against Fireworks is publicly known, and several items are sector- and jurisdiction-dependent, hence partial coverage.

[CR007, CR008, CR009, CR010, CR011]

7.3 Operational, Quality and Security Risk

Operationally, Fireworks' defining exposure is GPU supply. The company does not own its fleet and sources NVIDIA and AMD capacity from third parties, leaving it exposed to allocation constraints, supply bottlenecks and hardware-transition timing as it scales compute three-to-four-fold. Reliability is a strength on observed data, with independently monitored Q1 2026 uptime of 99.8%, but Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case and incident history is opaque. Running a global multi-region fleet across Frankfurt, Iceland, Tokyo and US, EU and APAC regions adds operational complexity and cost. Security and compliance posture is comparatively strong, with SOC2 Type II and HIPAA on AWS-based inference, zero data retention, airgapped EKS and bring-your-own-bucket training, and no public breach is known; nonetheless, a single serious outage or data incident would be especially damaging given the production, latency-sensitive nature of customer workloads. Reviewers also flag thin documentation and potential support strain as the company scales, which are quality risks rather than safety risks.[CR013, CR014, CR015, CR016, CR017, CR018]

7.4 Partner and Dependency Risk

Fireworks sits inside a dense web of dependencies. The most acute is NVIDIA, which supplies the leading-edge GPUs Fireworks' performance and margin claims rely on, holds an investment stake, and now competes directly through its Lepton acquisition, a GPU-cloud marketplace and NIM packaging. AWS and Microsoft are both partners and threats: their Strategic Collaboration Agreement and Foundry availability provide distribution, but Bedrock, Vertex and Azure can bundle inference into existing security, billing and governance relationships and absorb the category. Fireworks also depends on the continued release and permissive licensing of open models from Meta, DeepSeek, Alibaba and others; a slowdown in open-model quality or a shift to restrictive licenses would undercut its open-model-neutral thesis. Cloud-platform dependence, capital-provider concentration among a handful of late-stage funds, and key-customer concentration among sophisticated buyers capable of multi-homing round out the dependency map. The common thread is that Fireworks' enabling partners are also its most credible competitors, so partnership depth and supplier diversification are central to the risk picture.[CR019, CR020, CR021, CR022, CR023, CR024]

Partner / dependency risk register
Dependency	Role	Risk	Severity
NVIDIA	GPU supplier + investor + competitor	Allocation, supplier-as-rival	High
AMD	Alternative silicon supplier	Smaller ecosystem maturity	Medium
AWS	Cloud + channel partner	Bundling via Bedrock	High
Microsoft	Foundry distribution	Bundling via Azure	Medium
Open-model labs	Meta / DeepSeek / Alibaba	Model supply & licensing	Medium
Late-stage investors	Capital providers	Financing concentration	Low
Key customers	Sophisticated buyers	Multi-homing / in-housing	Medium

Dependency register; the recurring theme is that Fireworks' enabling partners are also its most credible competitors.

[CR019, CR020, CR021, CR022, CR023]

FR003: Dependency map

Critical external dependencies and their failure paths.

Dependency edges show upstream reliance; NVIDIA, AWS and Azure are simultaneously partners and competitors.

[CR019, CR020, CR021, CR022]

7.5 Financial, Model and Execution Risk

Financially, the central risk is margin compression. Gross margin near 50% is structurally below software norms because GPU costs sit in cost of goods sold, and a razor-thin price gap versus Together plus improving open-source serving frameworks create persistent downward pressure; the stated path to 60% depends on unproven utilization gains and a revenue-mix shift. Capital intensity compounds this: the three-to-four-fold compute expansion requires recurring capacity spend, and burn, runway and net revenue retention are undisclosed, so capital adequacy is asserted rather than verified. The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations that a slowdown or margin disappointment would punish. On execution and people, the founding team's PyTorch pedigree is a strength but concentrates key-person risk in CEO Lin Qiao, and retaining elite inference engineers in a hot market is a continuing challenge. The mitigation logic across all of these is the same up-the-stack diversification, but its success is the core open question of the investment.[CR025, CR026, CR027, CR028, CR029, CR030]

People / execution risk register
Risk	Detail	Likelihood	Impact
Key-person concentration	CEO Lin Qiao leads vision and fundraising	Low	High
Founder/engineer retention	Elite inference talent in hot market	Medium	Medium
Org scaling	Rapid headcount and GTM build-out	Medium	Medium
Execution on roadmap	Up-the-stack expansion unproven	Medium	High
Governance opacity	Board composition undisclosed	Low	Low

People/execution risks are inferred from founder concentration and roadmap ambition; headcount and board details are undisclosed.

[CR029, CR030, CR033]

7.6 Mitigations, Monitoring and Thesis-Break Triggers

Fireworks' mitigations are coherent: extend the stack into fine-tuning, reinforcement learning, voice and enterprise governance to escape commodity serving; diversify silicon across NVIDIA and AMD and pursue Blackwell efficiency; maintain day-zero open-model support so model turnover is a tailwind; harden enterprise compliance to win regulated verticals; and plug into AWS and Azure procurement rather than fighting them. The monitoring indicators that matter are gross-margin trajectory toward 60%, the revenue mix shifting to dedicated and enterprise, net revenue retention once disclosed, GPU-cost and allocation terms, and the competitive gap versus vLLM and SGLang. The clearest thesis-break triggers are gross margin failing to rise off ~50% or compressing further, a hyperscaler or NVIDIA capturing the inference layer and relegating Fireworks to an optimization add-on, a key-person departure, or growth stalling below the pace implied by the $4 billion-plus valuation. The priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration. The overall residual exposure is moderate-to-high, concentrated in commoditization and dependency risk rather than legal or operational failure.[CR031, CR032, CR033, CR034, CR035, CR036]

Mitigation and kill criteria table
Risk	Mitigation	Monitoring indicator	Thesis-break trigger
Commoditization	Move up stack to tuning/agents/voice	Revenue mix shift	Margin stuck/declining at ~50%
Hyperscaler bundling	Plug into AWS/Azure channels	Direct vs partner mix	Inference absorbed by Bedrock/Azure
NVIDIA dependence	Diversify to AMD, Blackwell efficiency	GPU cost/allocation terms	NVIDIA undercuts on price/access
Margin compression	Utilization + enterprise mix	Gross margin toward 60%	Margin compresses below 50%
Key-person risk	Deepen leadership bench	Exec retention	Lin Qiao departure
Growth durability	Land-and-expand + NRR	NRR, logo growth	Growth stalls vs valuation

Mitigations and kill criteria synthesize analyst commentary and company strategy; triggers are the author's thesis-break thresholds.

[CR031, CR032, CR034, CR035, CR036]

FR002: Risk transmission map

How commoditization and dependency risks transmit into financial outcomes.

Transmission edges synthesize analyst risk analysis; direction shows risk propagation.

[CR001, CR002, CR025, CR026]

7.7 Exhibits

Chapter 08

08Valuation

8.1 Investment Thesis and Anti-Thesis

The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure, the runtime layer for open-model inference, at the exact moment enterprises shift from closed-API experimentation to owning customized models in production. It pairs a rare founding team that built PyTorch with genuine product advantages (FireAttention, FireOptimizer, best-in-class function calling, 99.8% uptime), blue-chip production references (Cursor, Notion, Sourcegraph, Upwork), and hypergrowth from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers. If managed inference is priced as durable infrastructure rather than a commodity, the valuation can compound. The anti-thesis is that inference is structurally commoditizing: gross margins sit near 50% because GPU costs dominate COGS, per-token prices sit within ~2% of Together, open-source serving frameworks keep closing the gap, switching costs are near zero, and the most powerful players, AWS, Azure and NVIDIA, are simultaneously partners and competitors capable of repricing the category. On this view, Fireworks risks becoming an optimization add-on on ~50% margins, and a valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, already prices in flawless execution.[CV001, CV002, CV003, CV004, CV005, CV006]

Thesis / anti-thesis table
Dimension	Bull thesis	Bear anti-thesis
Market	Inference is the new runtime, huge TAM	Reachable SAM small vs hyperscalers
Product	FireAttention/FireOptimizer edge + reliability	OSS frameworks close the gap
Customers	Blue-chip production references	Low switching, multi-homing
Financials	Hypergrowth to ~$800M	~50% margins, price race
Competition	Best reliability + function calling	Sandwiched on price & speed
Dependency	Strategic NVIDIA/AWS/Azure support	Same players can reprice category
Valuation	Infrastructure multiple justified	Prices in flawless execution

Symmetric thesis/anti-thesis framing; the deciding variables are margin trajectory and retention, both undisclosed.

[CV001, CV002, CV003, CV004, CV005]

FV001: Recommendation logic

How thesis factors combine into the track recommendation.

Logic flow summarizing the recommendation drivers; weights are qualitative.

[CV007, CV008, CV001, CV002]

8.2 Recommendation, Confidence and Stance

We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance, with an overall score of 6.5 out of 10. The business quality justifies close engagement and a position at the right entry, but the current and rumored prices demand conviction on two unproven variables: that gross margin can climb meaningfully off ~50% toward the stated 60% target, and that growth is durable rather than a commodity land-grab vulnerable to hyperscaler capture. At the October 2025 Series C, the $4 billion valuation implied roughly 14 times the company-stated $280 million annualized revenue; on Sacra's ~$800 million May 2026 estimate the same $4 billion would be about 5 times, but the rumored $15 billion round implies roughly 19 times that higher base. The wide range reflects genuine uncertainty about the right revenue figure and the right multiple for a sub-software-margin, fast-commoditizing category. The recommendation is therefore to track closely, underwrite to the base case, insist on entry discipline below the rumored mark, and require margin and retention disclosure before committing at a premium. Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR, and a reconciled revenue figure.[CV007, CV008, CV009, CV010, CV011]

Recommendation summary table
Dimension	Assessment	Basis
Recommendation	Track	High quality, demanding price
Confidence	Medium	Unaudited financials, undisclosed NRR
Risk rating	High	Commoditization + dependency
Valuation stance	Stretched	$15B talk on ~50% margins
Overall score	6.5 / 10	Strong business, rich price
Entry discipline	Below rumored $15B	Underwrite to base case

Recommendation synthesizes thesis, financials, customers, competition and risk chapters; score is the author's composite judgment.

[CV007, CV008, CV009]

8.3 Financing Context and Entry Discipline

Fireworks has raised over $327 million across seed, Series A ($25M, 2024), Series B ($52M at $552M, July 2024) and Series C ($250M at $4B, October 2025), the last comprising roughly $230 million primary and a $20 million secondary. As of May 2026 it is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures, a near-quadrupling in about seven months. For a private late-stage entry, the key disciplines are the revenue base used to strike the multiple, the preference stack and any liquidation overhang, and dilution from continued raising into a capital-intensive compute build-out. Public evidence supports the growth and customer story but not the financial quality: revenue figures are unaudited and conflict across sources, gross margin is an analyst estimate, and burn and runway are undisclosed. The presence of strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table is double-edged, adding ecosystem support but also concentrating supplier and partner influence. Entry discipline should anchor on the base-case valuation, treat the $15 billion mark as a stretch that requires margin proof, and account for preference and dilution that are not publicly disclosed.[CV012, CV013, CV014, CV015, CV016]

8.4 Bull, Base and Bear Cases

Our base case (about 45% weight) assumes Fireworks reaches roughly $700-900 million annualized revenue in 2026 and continues growing while gross margin improves only modestly into the low 50s; share is held but commoditization caps the multiple, implying a fair enterprise value around $5-8 billion, roughly in line with or modestly above the $4 billion Series C and below the rumored $15 billion. The bull case (about 30%) assumes the up-the-stack strategy works: fine-tuning, reinforcement learning, voice and governance lift margins toward 58-60%, revenue compounds past $1.5 billion by 2027, and Fireworks becomes a platform-of-record, justifying a $15-20 billion valuation. The bear case (about 25%) assumes commoditization and hyperscaler capture: margins stay near 50% or compress, growth decelerates sharply as buyers multi-home or shift to Bedrock and Azure, and the multiple compresses to a $2-3 billion range or a down round. The dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller, and the deciding evidence, margin trajectory and retention, is not yet disclosed.[CV017, CV018, CV019, CV020, CV021]

Bull / base / bear scenario table
Scenario	Probability	Key assumptions	2026-27 revenue	Margin	Implied value
Bull	~30%	Up-the-stack works, platform-of-record	>$1.5B by 2027	58-60%	$15-20B
Base	~45%	Holds share, modest margin gain	$700-900M (2026)	low 50s	$5-8B
Bear	~25%	Commoditization + hyperscaler capture	Growth halves	~50% or lower	$2-3B / down round

Scenario probabilities and ranges are the author's estimates; revenue uses company and Sacra figures and is unaudited.

[CV017, CV018, CV019, CV020]

FV003: Valuation / return range

Implied enterprise value by scenario, USD billions.

Scenario value ranges are author estimates anchored to comparable multiples and the disclosed marks.

[CV017, CV018, CV019]

8.5 Comparable Set

Private comparables anchor the analysis. Together AI, the closest peer, was valued at $3.3 billion on about $618 million annualized revenue in early 2025 (roughly 5x) and is reportedly in talks near $7.5 billion on about $1 billion (roughly 7-8x). Baseten raised at a $5 billion valuation in January 2026 and is reportedly discussing $11 billion, while Groq reached $6.9 billion as a hardware-led player on a different model, and Fal is cited around $4.5 billion. Against these, Fireworks at $4 billion on ~$280 million (Series C vintage) looks rich versus Together's multiple but is on a smaller, faster-growing base; on the ~$800 million May 2026 estimate it looks comparatively cheap, and the $15 billion talk re-stretches it. Public infrastructure-software comparables, Datadog, Snowflake, Confluent, Cloudflare, MongoDB and DigitalOcean, frame the multiple ceiling: high-growth public infra trades in a broad band but has compressed from peak, and lower-margin infrastructure businesses like DigitalOcean trade at clear discounts to pure software. Because Fireworks carries ~50% gross margins, a discount to pure-SaaS multiples is warranted, and hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat. The comparable set supports a wide, scenario-dependent value rather than a single point.[CV022, CV023, CV024, CV025, CV026, CV027]

Comparable valuation table
Company	Type	Valuation	Revenue (annualized)	Implied multiple	Note
Fireworks AI	Private round	$4.0B (Oct 2025)	~$280M	~14x	Series C vintage
Fireworks AI	Private (rumored)	$15B (2026)	~$800M	~19x	Talks, unconfirmed
Together AI	Private round	$3.3B (Feb 2025)	~$618M	~5x	Closest peer
Together AI	Private (rumored)	$7.5B (2026)	~$1.0B	~7-8x	In talks
Baseten	Private round	$5.0B (Jan 2026)	Undisclosed	n/a	Talks of $11B
Groq	Private round	$6.9B (Sep 2025)	Hardware model	n/a	Different model
Public infra SaaS	Public comps	Datadog/Snowflake/Cloudflare	Multi-$B	~8-20x EV/rev	Margin >70%
DigitalOcean	Public comp	Lower multiple	~$0.8B	Low single-digit	Infra-heavy discount

Private rounds from company and Sacra; public comps framed qualitatively from filings. Coverage is partial: not every peer's revenue is disclosed.

[CV022, CV023, CV024, CV025, CV026]

FV002: Valuation sensitivity

Implied valuation at different revenue and multiple assumptions, USD billions.

Sensitivity grid using company and Sacra revenue figures times illustrative multiples; not a forecast.

[CV009, CV022, CV023]

8.6 Exit Readiness and Final Diligence

Exit optionality is strong in direction but unproven in timing. Plausible paths include an IPO if Fireworks sustains hypergrowth and lifts margins toward software-like levels, or strategic acquisition by a hyperscaler or data-platform investor (AWS, Microsoft, Databricks, MongoDB, NVIDIA) seeking to own the inference layer, though several of those are also competitors. The principal thesis-break triggers are gross margin failing to rise off ~50%, hyperscaler or NVIDIA capture of the inference layer, a key-person departure, or growth stalling below the pace implied by the valuation. The priority final diligence asks are a single reconciled and dated revenue figure, audited or management-confirmed gross margin and the path to 60%, net revenue retention and churn cohorts, burn and runway against the compute build-out, GPU-supply contract terms, top-customer concentration, and the preference and dilution structure of the next round. Until those are answered, the right posture is to track the company closely, build conviction on the base case, and reserve a premium entry for confirmation that margin and retention support the infrastructure thesis rather than the commodity one.[CV028, CV029, CV030, CV031, CV032]

Thesis-break and kill triggers table
Trigger	Signal	Action
Margin stagnation	Gross margin stuck at ~50% or falling	Exit / avoid premium
Hyperscaler capture	Inference absorbed by Bedrock/Azure	Reassess durability
NVIDIA repricing	Supplier undercuts on price/access	Cut exposure
Growth stall	Revenue decelerates vs valuation	Down-round risk
Key-person loss	Lin Qiao departs	Re-underwrite
Retention shortfall	NRR below ~110% once disclosed	Lower multiple

Kill triggers map the conditions that would invalidate the infrastructure thesis; thresholds are the author's.

[CV028, CV029, CV030]

Final diligence asks table
Ask	Why it matters	Owner
Reconciled dated revenue	Sets the multiple denominator	Company / finance
Audited gross margin + 60% path	Tests the premium thesis	Company / finance
NRR and churn cohorts	Durability of revenue	Company / RevOps
Burn and runway	Financing risk vs compute plan	Company / finance
GPU-supply contracts	Margin and supply exposure	Company / infra
Top-customer concentration	Revenue concentration risk	Company / sales
Preference & dilution	Entry economics	Company / legal

Diligence asks are the gating items before committing at a premium valuation.

[CV031, CV032]

FV004: Investment KPIs

Headline investability indicators.

KPIs synthesize the recommendation and valuation analysis; multiples use unaudited revenue.

[CV007, CV009, CV010]

8.7 Exhibits

Disclaimer

This report is for informational purposes only, is based on public sources as of 2026-06-14, and is not investment advice. Financial figures are largely unaudited company statements or third-party estimates and should be independently verified before any decision.

Evidence index

Claims
ID	Statement	Confidence	Sources
CO001	Fireworks AI is an AI inference-cloud company headquartered in Redwood City, California.	High	SO018, SO020, SO025
CO002	Fireworks AI was founded in late 2022 by a team that left Meta's PyTorch organization.	High	SO002, SO004, SO014
CO003	Fireworks operates an "AI Cloud" platform that runs, fine-tunes and scales open-source LLM, vision, audio and multimodal models with low-latency inference.	Medium	SO002, SO013, SO001
CO004	Fireworks monetizes via usage-based pricing including per-token serverless inference, per-training-token fine-tuning, per-GPU-hour reinforcement fine-tuning and dedicated deployments.	Medium	SO013
CO005	Fireworks positions itself on a "one-size-fits-one" thesis favoring smaller customizable open models over generic closed foundation models.	Medium	SO002, SO005
CO006	Lin Qiao is CEO and co-founder of Fireworks AI and previously led the PyTorch team at Meta.	High	SO004, SO016, SO018
CO007	Fireworks AI was co-founded by seven people, most of whom worked together on PyTorch at Meta.	Medium	SO004, SO014, SO023
CO008	Co-founders Dmytro Dzhulgakov and Dmytro Ivchenko are Ukrainian former Meta PyTorch engineers.	Medium	SO014, SO004
CO009	Lin Qiao holds a Ph.D. in Computer Science from UC Santa Barbara and previously worked at LinkedIn and IBM.	Medium	SO018, SO016
CO010	Other co-founders include James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, with backgrounds at Meta PyTorch, ads and ML teams and Google Vertex AI.	Medium	SO004, SO023
CO011	Fireworks AI raised a $250 million Series C in October 2025 at a $4 billion valuation.	High	SO002, SO019, SO020
CO012	The Series C was co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia Capital.	High	SO002, SO021, SO022
CO013	A $52 million Series B led by Sequoia closed in July 2024 at a $552 million valuation with NVIDIA, AMD and MongoDB Ventures participating.	High	SO003, SO008, SO009
CO014	Fireworks AI has raised more than $327 million in total funding as of October 2025.	High	SO002, SO013
CO015	A $25 million Series A led by Benchmark closed in March 2024 with Sequoia, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang.	Medium	SO014, SO003
CO016	The Series B brought Fireworks AI's cumulative capital raised to $77 million.	Medium	SO003
CO017	As of May 2026 Sacra reports Fireworks is in talks to raise at a $15 billion post-money valuation with Index set to co-lead, on unconfirmed terms.	Low	SO013
CO018	Fireworks AI reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from its Series B.	Medium	SO002, SO013
CO019	Fireworks reported annualized revenue surpassing $280 million at the time of the October 2025 Series C.	Medium	SO002
CO020	The Series C round comprised roughly $230 million of primary funding and a $20 million secondary transaction per Sacra.	Medium	SO013
CO021	Fireworks AI's developer base grew from about 12,000 in February 2024 to 23,000 by the end of 2024.	Medium	SO014
CO022	The Fireworks platform processes more than 10 trillion tokens per day as of October 2025, rising to about 15 trillion per day by early 2026 per third-party profiles.	Medium	SO002, SO018
CO023	Earlier 2025 coverage cited Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year.	Low	SO014
CO024	Sacra estimates Fireworks AI's gross margin near 50 percent, below software norms, with management targeting 60 percent through GPU optimization.	Medium	SO013
CO025	Fireworks launched Microsoft Foundry (Azure) availability in March 2026, extending open-model inference to Azure customers.	Medium	SO018
CO026	Fireworks shipped FireFunction V2, FireAttention V2, FireOptimizer, supervised fine-tuning V2 and reinforcement fine-tuning between 2024 and 2026.	Medium	SO003, SO013
CO027	Fireworks AI acquired Hathora to deepen real-time and global compute orchestration.	Medium	SO013
CO028	Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties.	Medium	SO013
CO029	Analysts cite inference commoditization, hyperscaler bundling and hardware concentration as the main structural risks to Fireworks.	Medium	SO013
CO030	Independent reviewers describe Fireworks as "just the engine," requiring developer sophistication, with thin documentation and no ongoing free tier.	Medium	SO026
CO031	Fireworks offers an OpenAI-compatible API plus function calling, fine-tuning and enterprise security controls across hundreds of models.	Medium	SO001, SO002
CO032	Investors at Index Ventures and Sequoia cite the founding team's PyTorch and inference-systems pedigree as the core reason for backing Fireworks.	Medium	SO004, SO005
CO033	CEO Lin Qiao concentrates fundraising, vision and public representation, creating a meaningful key-person dependency.	Low	SO004, SO015
CO034	NVIDIA has entered the inference market directly via its Lepton acquisition and a competing GPU cloud marketplace, raising supplier-as-competitor risk for Fireworks.	Medium	SO013
CO035	Company-stated revenue figures and third-party estimates for Fireworks differ materially across vintages, from $130M ARR in mid-2025 to ~$800M annualized by May 2026.	Low	SO002, SO013, SO014
CM001	Fireworks AI competes in the managed AI inference market for serving and tuning open-weight models in production.	Medium	SM010, SM013
CM002	The core included spend is third-party production model serving, fine-tuning and dedicated deployment, not foundation-model training.	Medium	SM010, SM009
CM003	Closed-model APIs from OpenAI and Anthropic are excluded from the core market but are the primary status-quo substitute.	Medium	SM009, SM025
CM004	Self-hosting on vLLM or SGLang and hyperscaler bundles such as Bedrock and Azure Foundry are direct substitutes for Fireworks.	Medium	SM010, SM015
CM005	Adjacent expansion pools include voice agents, RAG/embeddings and reinforcement-learning training for agents.	Medium	SM010
CM006	MarketsandMarkets estimates the AI inference market at $106.15 billion in 2025 growing to $254.98 billion by 2030 at a 19.2% CAGR.	High	SM001, SM003
CM007	Other research houses place the 2026 AI inference market between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion.	Low	SM002, SM003, SM005
CM008	Gartner projects generative-AI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028.	Medium	SM009
CM009	The independent open-weight inference-serving market has consolidated around roughly seven providers as of Q2 2026.	Medium	SM006
CM010	With Together AI near $1 billion annualized revenue and Fireworks in the $280-800 million range, the independent-provider revenue pool is a few billion dollars in 2026.	Low	SM011, SM010
CM011	Fireworks' $280 million-plus revenue represents an early single-digit share of the independent inference niche.	Low	SM010, SM013
CM012	The most relevant lens for valuing Fireworks is the independent inference niche, not the headline AI inference TAM.	Medium	SM006, SM010
CM013	AI-native startups adopt Fireworks bottoms-up via self-serve API keys with an engineering lead as economic buyer.	Medium	SM010
CM014	Digital-native enterprises such as DoorDash, Notion, Shopify and Upwork move features from pilot to production on Fireworks.	Medium	SM013, SM010
CM015	Regulated and Fortune 500 buyers require SSO, audit logs, data residency and HIPAA/SOC2 posture and adopt top-down via procurement.	Medium	SM010
CM016	Across segments the user is a developer and the payer is an engineering or procurement budget.	Medium	SM010, SM013
CM017	Fireworks reaches buyers through cloud procurement channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability.	Medium	SM010, SM015
CM018	Open-source model quality convergence and agentic compound AI are primary drivers expanding inference demand.	Medium	SM009, SM013
CM019	Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks.	Medium	SM010, SM025
CM020	Hyperscaler bundling by AWS, Azure and Google folds inference into existing security, billing and governance relationships.	Medium	SM010, SM015
CM021	Fireworks' Llama 70B price sits within roughly 2% of Together AI's, illustrating razor-thin price differentiation.	Medium	SM023, SM006
CM022	GPU supply is concentrated and Fireworks does not own its fleet, creating capacity and cost exposure.	Medium	SM010
CM023	The EU AI Act imposes tiered obligations that add compliance overhead for AI deployment in Europe.	Medium	SM026
CM024	The OpenAI-compatible API lowers both switching-in and switching-out costs, capping durable lock-in.	Medium	SM023, SM010
CM025	Published AI inference TAM figures bundle chips, hyperscaler services and independent software, so they overstate Fireworks' reachable market.	Medium	SM001, SM006
CM026	The independent inference-provider revenue pool is not measured by any standard analyst and must be assembled from uneven company estimates.	Low	SM010, SM011, SM012
CM027	Forecast CAGRs for AI inference range from roughly 13% to 19% and 2034 estimates differ by more than $200 billion across houses.	Low	SM001, SM002, SM003
CM028	Despite wide estimate spreads, the AI inference market is clearly large and growing double digits, with directional rather than precise SAM.	Medium	SM001, SM004
CM029	There is no public evidence of near-term saturation in the AI inference market; growth drivers remain intact through the forecast window.	Low	SM002, SM004
CM030	Fine-tuned and specialized models are projected to capture much of the generative-AI model-spend growth, favoring Fireworks' tuning products.	Medium	SM009
CM031	The serverless open-weight inference field shows roughly 6x price spread and 5-7x latency spread across providers on the same model.	Medium	SM006
CM032	Together AI, Groq, Baseten, Cerebras, Replicate, Anyscale and OctoAI are the other named providers in the consolidated inference field.	Medium	SM006, SM016, SM019
CM033	Voice agents targeting sub-500ms latency expand Fireworks into contact-center and telephony budget categories larger than API inference alone.	Medium	SM010
CM034	Demand differs by maturity: startups optimize cost-per-token while Fortune 500 buyers prioritize control, compliance and vendor consolidation.	Medium	SM010, SM015
CM035	A defensible 2026 AI inference market figure is roughly $118-126 billion, between the 2025 base and the 2030 forecast.	Low	SM001, SM003
CP001	The inference market has segmented into managed open-model platforms, vertically integrated silicon, hyperscaler bundles and open-source serving frameworks.	High	SP009, SP010
CP002	Together AI, Baseten and Replicate are Fireworks' closest managed open-model competitors.	Medium	SP009, SP010
CP003	Groq, Cerebras and SambaNova attack inference from custom silicon rather than software optimization on commodity GPUs.	Medium	SP009, SP005
CP004	AWS Bedrock, Google Vertex, Azure Foundry and Databricks Model Serving collapse model access, infrastructure and governance into one platform.	Medium	SP009, SP016
CP005	Open-source serving frameworks vLLM and SGLang plus NVIDIA NIM and routers like OpenRouter commoditize proprietary inference advantage.	Medium	SP009
CP006	NVIDIA entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, becoming a supplier-turned-rival.	Medium	SP009
CP007	Together AI raised a $305 million Series B in February 2025 at a $3.3 billion valuation and reached about $1 billion annualized revenue by early 2026.	High	SP002, SP018
CP008	Together AI was founded in 2021 by Percy Liang, Chris Re and Vipul Ved Prakash and spans serverless, clusters, fine-tuning, voice and RL.	Medium	SP002, SP018
CP009	Baseten raised $300 million in January 2026 at a $5 billion valuation led by IVP and CapitalG with a reported $150 million from NVIDIA.	High	SP004, SP007
CP010	Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment built on TensorRT, SGLang, vLLM and TGI.	Medium	SP003, SP015
CP011	Groq raised $750 million in September 2025 at a $6.9 billion valuation and advertises 750-plus tokens per second on Llama models from custom LPU silicon.	High	SP005, SP006, SP017
CP012	Groq's partnership with Meta to power the official Llama API gives it strong distribution and first-party open-model credibility.	Medium	SP009
CP013	Replicate, Modal and Anyscale compete for developer mindshare at the top of the adoption funnel.	Medium	SP012, SP013, SP014
CP014	Fireworks' Q1 2026 uptime of 99.8% is the highest among specialized inference providers per independent monitoring.	Medium	SP001
CP015	Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq.	Medium	SP001, SP010
CP016	FireFunction achieves roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o.	Medium	SP001
CP017	Together offers a 200-plus model catalog with full fine-tuning while Groq offers 15-20 models and no fine-tuning.	Medium	SP001
CP018	Groq's LPU delivers 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency under load.	Medium	SP001, SP010
CP019	Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance.	Medium	SP003, SP009
CP020	Most inference providers expose OpenAI-compatible APIs, making migration between them a matter of minutes.	Medium	SP001, SP020
CP021	Routing aggregators such as OpenRouter and TokenMix encourage multi-homing and automatic failover across providers.	Medium	SP001, SP009
CP022	Hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on.	Medium	SP009, SP016
CP023	Fireworks plugs into incumbent channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability.	Medium	SP009, SP016
CP024	Fireworks does not own GPUs and sources NVIDIA and AMD capacity from third parties, unlike Together's owned data-center strategy.	Medium	SP009, SP002
CP025	Fireworks' proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance and price advantage.	Medium	SP009
CP026	Open-source serving frameworks keep closing the performance gap, and Baseten openly builds on vLLM and SGLang.	Medium	SP009, SP003
CP027	NVIDIA pushes NIM as a packaging layer and Snowflake released Arctic Inference as an open vLLM plugin, compressing proprietary advantage.	Medium	SP009
CP028	Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion are better capitalized than Fireworks at $4 billion.	Medium	SP005, SP004, SP002
CP029	Independent reviewers describe Fireworks as "just the engine," an adverse signal about its application-level differentiation versus full-stack rivals.	Medium	SP023
CP030	Fireworks' durability depends on extending into tuning, agents and governance faster than the ecosystem commoditizes the serving layer.	Medium	SP009
CP031	Fireworks' most defensible differentiation is reliability plus best-in-class function calling rather than price or raw speed.	Medium	SP001
CP032	The same Llama model spreads roughly sixfold in price and 5-7x in latency across the seven-provider field.	Medium	SP010
CP033	Together AI has raised $533.5 million in total funding from investors including General Catalyst, Prosperity7, NVIDIA, Salesforce and Kleiner Perkins.	Medium	SP002
CP034	Baseten's valuation roughly doubled from $2.15 billion in September 2025 to $5 billion in January 2026, with talks of an $11 billion round by May 2026.	Medium	SP003, SP004
CP035	Hyperscaler bundling is plausibly the single biggest structural threat to Fireworks because it removes the need for a standalone inference vendor.	Low	SP009, SP016
CI001	Fireworks bills serverless inference per token, fine-tuning per training token, reinforcement fine-tuning per GPU-hour and dedicated deployments per GPU-second or GPU-hour.	High	SI002, SI003
CI002	Fireworks' usage-based pricing maps to the customer lifecycle, capturing revenue across experimentation, production, adaptation and scaled deployment.	Medium	SI002
CI003	Reserved capacity is contracted separately on longer commitments at negotiated pricing and is the highest-margin stream.	Medium	SI002
CI004	Fireworks publishes serverless rates of about $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3.	Medium	SI004, SI005
CI005	Image generation runs from about $0.013 (SDXL) to $0.04 (Flux 1.1 Pro) per image and reserved capacity near $4.80 per hour per replica.	Medium	SI004
CI006	Fireworks' go-to-market is bottoms-up at entry via self-serve API keys and top-down at expansion via negotiated enterprise relationships.	Medium	SI002
CI007	Fireworks offers $1 of free credits rather than an ongoing free tier and a standard rate limit near 600 requests per minute.	Medium	SI004
CI008	Fireworks runs a field and partner sales motion anchored by an AWS Strategic Collaboration Agreement with funded proofs-of-concept and a startup acceleration program.	Medium	SI002, SI007
CI009	Blended annualized revenue per company is estimated near $28,000 across Fireworks' 10,000-plus customer base.	Low	SI002
CI010	Fireworks revenue is likely concentrated among a smaller number of large production deployments rather than evenly across the base.	Low	SI002
CI011	Sacra estimates Fireworks' gross margin near 50%, below the 70%-plus typical of subscription software, because GPU costs sit in cost of goods sold.	Medium	SI002
CI012	Management targets a 60% gross margin through better GPU utilization, Blackwell-generation efficiency and a mix shift toward dedicated and enterprise workloads.	Medium	SI002
CI013	Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant.	Medium	SI002, SI018
CI014	Proprietary optimization via FireAttention and FireOptimizer lets Fireworks charge a premium over self-hosting while undercutting the alternative's total cost.	Medium	SI002, SI016
CI015	NVIDIA reports rapidly growing data-center GPU revenue, evidencing the supplier-driven, capacity-constrained input market Fireworks operates within.	Medium	SI012
CI016	AMD's data-center accelerator business is also scaling, offering Fireworks an alternative silicon supplier to NVIDIA.	Medium	SI013
CI017	Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C.	High	SI001, SI006
CI018	Sacra estimates Fireworks at roughly $305 million annualized at year-end 2025 rising to about $800 million by May 2026.	Low	SI002
CI019	Earlier 2025 coverage reported Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year.	Low	SI009
CI020	Fireworks' audited financials, revenue mix, net revenue retention, churn and headcount are not public.	Medium	SI002, SI010
CI021	Fireworks processes more than 10 trillion tokens per day, rising to 15 trillion by early 2026.	Medium	SI001, SI010
CI022	Fireworks has raised more than $327 million across seed, Series A, B and C rounds.	High	SI001, SI002
CI023	The October 2025 Series C provided $250 million, roughly $230 million primary and $20 million secondary, at a $4 billion valuation.	High	SI002, SI001
CI024	Fireworks plans to grow its compute footprint three-to-four-fold over the next year, a capital-intensive expansion.	Medium	SI001
CI025	Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026, which could be the next-round trigger.	Low	SI002
CI026	Fireworks' principal financing dependency is GPU supply, since it does not own its fleet and sources NVIDIA and AMD capacity from third parties.	Medium	SI002, SI012
CI027	Fireworks shows credible hypergrowth and a lifecycle-spanning usage model, but the absence of audited figures caps revenue-quality confidence.	Medium	SI002, SI001
CI028	The main financial diligence blockers are a reconciled revenue figure, gross-margin verification, burn and runway, and net revenue retention.	Medium	SI002, SI010
CI029	Fireworks' revenue figures span $130 million to roughly $800 million annualized within twelve months, reflecting both hypergrowth and inconsistent measurement.	Low	SI001, SI002, SI009
CI030	No public debt or project-finance obligations are disclosed for Fireworks AI.	Low	SI002, SI021
CI031	An AWS case study reports a Fireworks customer cut total costs four-fold and supported three times higher traffic per instance on EC2 P5.	Medium	SI007
CI032	Reported 2025 profitability, if accurate, would make Fireworks unusually capital-efficient for a hypergrowth infrastructure startup.	Low	SI009
CI033	Downward inference price pressure threatens Fireworks' margins absent continued differentiation, per critical reviewers.	Medium	SI020
CI034	MongoDB, a public infrastructure peer and Fireworks investor, illustrates the higher gross margins of pure-software comparables versus inference providers.	Low	SI014
CI035	Fireworks' capital intensity exceeds a typical SaaS company because compute scaling and the lack of owned GPUs require recurring capacity spend.	Medium	SI002, SI001
CE001	Fireworks lets a developer point an OpenAI-compatible API at an open model and get low-latency production inference without managing GPUs.	High	SE010, SE013, SE017
CE002	Customers describe Fireworks as an inference engine that supplies speed, cost and control while they build the product.	Medium	SE014, SE025, SE026
CE003	The platform spans text, image, audio and multimodal formats across hundreds of models with day-zero support for major releases.	Medium	SE010, SE006
CE004	Fireworks provides function calling, JSON-mode structured output and streaming through its API.	Medium	SE010, SE013
CE005	A single customer can expand from serverless inference into fine-tuning, dedicated capacity, RAG and voice agents.	Medium	SE017, SE023
CE006	Serverless inference is the entry product, offering pay-per-token access to 50-plus served models including Llama 4, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4.	Medium	SE013, SE010
CE007	FireFunction is Fireworks' proprietary function-calling model family for tool use and structured output.	Medium	SE013
CE008	Customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks.	High	SE005, SE003, SE004
CE009	Deployment modules span serverless, on-demand dedicated and reserved capacity plus multi-LoRA hosting of many adapters on one base deployment.	Medium	SE021, SE020
CE010	Newer surfaces include a Voice Agent Platform with sub-500ms response and BYOB secure training from customer AWS S3 buckets.	Medium	SE017, SE019
CE011	Fireworks runs a proprietary multi-layer inference stack on commodity NVIDIA GPUs with a stateless router, draft and target pods, distributed KV cache and continuous batching.	Medium	SE001
CE012	FireAttention is a custom CUDA attention implementation Fireworks reports as faster than vLLM and TensorRT-LLM, extended for long context and Llama 4 chunked local attention.	Medium	SE006, SE001
CE013	FireOptimizer performs adaptive speculative execution with reported latency reductions up to roughly 3x and native FP4 support on NVIDIA Blackwell B200.	Medium	SE002, SE009
CE014	The serving topology scales to documented tests around 50,000 requests per minute.	Low	SE001
CE015	Speculative decoding pairs a fast draft model with a full target model to generate and verify tokens in parallel, configurable per workload.	Medium	SE008, SE001
CE016	Fireworks' operating model is open-model neutral, betting on running whichever open model is winning rather than any single model.	Medium	SE017
CE017	Fireworks operates a global multi-region fleet including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions for latency and data residency.	Medium	SE017
CE018	Independent monitoring placed Fireworks' Q1 2026 uptime at 99.8%, the highest among specialized inference providers.	Medium	SE013
CE019	Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks.	Medium	SE015
CE020	Cursor reached about 1,000 tokens per second for code generation and Sourcegraph saw a 30% latency reduction and 2.5x acceptance increase on Fireworks.	Medium	SE014, SE016
CE021	The Series C-funded roadmap targets deeper tuning and inference-alignment research and an end-to-end model-lifecycle creation toolchain.	Medium	SE022, SE019
CE022	Fireworks plans a three-to-four-fold expansion of global compute and has acquired Hathora to deepen real-time orchestration.	Medium	SE022, SE017
CE023	Fireworks' core IP is the proprietary inference engine, especially FireAttention kernels and FireOptimizer, rather than registered patents.	Medium	SE002, SE017
CE024	No public patents are listed for Fireworks; its moat is engineering know-how.	Low	SE017
CE025	Product-model co-design uses a customer data feedback loop with continuous evaluation and reinforcement learning to improve fine-tuned models over time.	Medium	SE022, SE003
CE026	Fireworks' optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so differentiation must be continuously re-earned.	Medium	SE017, SE009
CE027	The platform depends on leading-edge NVIDIA and AMD GPUs, CUDA, cloud regions and upstream open models.	Medium	SE001, SE017
CE028	Fireworks offers zero data retention by default, SSO, audit logs and data-residency controls for enterprise buyers.	Medium	SE017
CE029	Fireworks' AWS-based inference solution is HIPAA and SOC2 Type II compliant.	High	SE007, SE017
CE030	For sensitive workloads Fireworks supports airgapped EKS deployments and bring-your-own-bucket secure training.	Medium	SE017
CE031	Structured-output controls such as JSON mode and grammar-constrained decoding plus high schema compliance support dependable agentic tool use.	Medium	SE013, SE010
CE032	Fireworks does not publish a formal standard-tier SLA, and reviewers note thin documentation in places, both diligence items for security-sensitive buyers.	Medium	SE013, SE025
CE033	FireFunction achieves roughly 92% multi-tool function-calling accuracy and 99.1% JSON schema compliance in independent benchmarks.	Medium	SE013, SE027
CE034	Fireworks maintains day-zero support for new models such as Llama 4, DeepSeek and Qwen as a core engineering discipline.	Medium	SE006, SE011, SE012
CE035	Fireworks publishes open benchmark tooling via its GitHub organization, a developer-signal of technical openness.	Low	SE018
CU001	Fireworks' customer base spans AI-native startups, digital-native enterprises and large or regulated enterprises with distinct adoption paths.	Medium	SU009, SU007
CU002	AI-native startups such as Cursor, Perplexity, Liner and Cresta adopt Fireworks bottoms-up via self-serve API keys.	Medium	SU009, SU011
CU003	Digital-native enterprises including DoorDash, Notion, Shopify, Upwork and Quora run production AI features on Fireworks.	High	SU011, SU007
CU004	Use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice across software, e-commerce and customer-service verticals.	Medium	SU009, SU025
CU005	Fireworks' customer geography skews North American and European with global API access.	Low	SU025
CU006	Fireworks reported powering over 10,000 companies at its October 2025 Series C, about a tenfold increase from roughly 1,000 at the Series B.	High	SU006, SU009
CU007	Fireworks serves hundreds of thousands of developers, up from 12,000 in February 2024 to 23,000 by the end of 2024.	Medium	SU006, SU010
CU008	The platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026.	Medium	SU006, SU007
CU009	Customers follow a land-and-expand path from serverless inference into dedicated deployments, fine-tuning, RFT, embeddings and voice.	Medium	SU009, SU017
CU010	Analyst commentary on Hebbia shows how a single inference relationship can grow into a broader infrastructure dependency.	Medium	SU017
CU011	Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply, with a named researcher endorsing production use.	Medium	SU001, SU013
CU012	Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering.	Medium	SU002
CU013	Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x on Fireworks, corroborated by an AWS case study.	High	SU003, SU012
CU014	Upwork's Uma assistant drafts real-time proposals on Fireworks per a named executive.	Medium	SU004
CU015	Quora's Poe chatbot tripled response speed and Superhuman built its Ask AI compound system on Fireworks.	Medium	SU013, SU007
CU016	Fireworks' named references are mostly production deployments with quantified outcomes and executive attribution, giving the reference base high quality.	Medium	SU001, SU002, SU012
CU017	Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths.	Medium	SU009, SU017
CU018	Customer durability must be inferred from structural signals such as land-and-expand design and production usage rather than disclosed metrics.	Medium	SU017, SU009
CU019	High daily token volume and named executive testimonials indicate strong repeat usage and satisfaction anecdotally.	Low	SU006, SU002
CU020	The OpenAI-compatible API and routing aggregators make multi-homing and switching trivial, elevating churn risk.	Medium	SU018, SU021
CU021	Independent reviewers explicitly document Fireworks alternatives and switching paths, an adverse durability signal.	Medium	SU018
CU022	Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend.	Medium	SU009, SU017
CU023	Blended annualized revenue per company is roughly $28,000, likely understating a long tail beneath a few large accounts.	Low	SU022
CU024	The identity and revenue share of Fireworks' top customers are not disclosed, creating unquantifiable top-customer concentration risk.	Medium	SU009, SU022
CU025	The AWS Strategic Collaboration Agreement and Microsoft Foundry availability are growth accelerants but also channel dependencies.	Medium	SU009, SU024
CU026	Procurement friction is lower than for closed APIs via cloud marketplaces, but enterprise sales cycles and compliance reviews still gate the largest deals.	Low	SU009, SU024
CU027	Several marquee logos such as DoorDash and Shopify appear in aggregate marketing lists without standalone case studies.	Low	SU007, SU020
CU028	Sophisticated public customers like GitLab disclose AI-vendor dependence in their filings, illustrating buyer-side multi-homing and substitution capacity.	Low	SU016
CU029	WorkingAgents and other third parties corroborate Fireworks' compound-inference customer use cases for agentic workflows.	Low	SU015
CU030	Samsung is cited by investors as an enterprise customer accelerating its AI roadmap on Fireworks.	Medium	SU011
CU031	The named reference base is high quality but partly dated to 2024, a freshness caveat for diligence.	Medium	SU003, SU012
CU032	Fireworks' customer logos are concentrated in technology, e-commerce, customer service and legal-tech verticals.	Low	SU025
CU033	Production usage intensity is implied by 10-15 trillion tokens per day across the customer base.	Medium	SU006, SU007
CU034	Customer satisfaction evidence is positive but anecdotal, resting on named testimonials rather than survey or NPS data.	Low	SU002, SU004
CU035	Retention is the weakest-evidenced dimension of Fireworks' customer story, a material diligence gap.	Medium	SU017, SU018
CR001	Inference commoditization and gross-margin compression are Fireworks' highest-severity risks.	High	SR001, SR011
CR002	Hyperscaler bundling by AWS, Azure and Google could capture the inference layer and relegate Fireworks to an optimization add-on.	Medium	SR001
CR003	NVIDIA is simultaneously Fireworks' GPU supplier, an investor and a competitor via Lepton and NIM.	Medium	SR001, SR008
CR004	Capital intensity from a planned three-to-four-fold compute expansion is a medium-severity risk.	Medium	SR021, SR001
CR005	Fireworks' mitigation thesis is to move up the stack faster than the serving layer commoditizes.	Medium	SR001
CR006	Residual risk exposure remains meaningful because several mitigations are unproven and key metrics are undisclosed.	Medium	SR001, SR012
CR007	The EU AI Act imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose AI providers and deployers.	High	SR004, SR005
CR008	GDPR and data-residency requirements drive Fireworks' zero-data-retention and regional-deployment features.	Medium	SR006, SR001
CR009	Open models such as Llama carry acceptable-use and license terms that flow through to platforms serving them.	Low	SR019, SR007
CR010	Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how.	Medium	SR013, SR001
CR011	No material litigation or enforcement action against Fireworks is publicly known, and its Series C used top-tier legal counsel.	Medium	SR018, SR019
CR012	The NIST AI Risk Management Framework provides a voluntary governance baseline Fireworks and its customers can adopt.	Low	SR020
CR013	Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation and supply risk.	Medium	SR001, SR008
CR014	Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case.	Medium	SR012
CR015	Independently monitored Q1 2026 uptime of 99.8% is a reliability strength despite the absence of a published SLA.	Medium	SR012
CR016	Operating a global multi-region fleet adds operational complexity and cost for Fireworks.	Low	SR001
CR017	Fireworks' SOC2 Type II, HIPAA, zero-retention and airgapped controls mitigate operational and security risk, with no public breach known.	Medium	SR001
CR018	A single serious outage or data incident would be especially damaging given customers' production, latency-sensitive workloads.	Medium	SR012, SR001
CR019	NVIDIA is the most acute dependency, supplying leading-edge GPUs while holding a stake and competing through Lepton, a GPU marketplace and NIM.	Medium	SR001, SR008
CR020	AMD provides an alternative silicon supplier, partly diversifying Fireworks' NVIDIA dependence.	Medium	SR025
CR021	AWS and Microsoft are both distribution partners and bundling threats via Bedrock, Vertex and Azure Foundry.	Medium	SR001
CR022	Fireworks depends on continued release and permissive licensing of open models from Meta, DeepSeek and Alibaba.	Medium	SR001, SR009
CR023	Capital-provider concentration among a handful of late-stage funds and key-customer multi-homing add dependency risk.	Low	SR022, SR028
CR024	Fireworks' enabling partners NVIDIA, AWS and Microsoft are also its most credible competitors.	Medium	SR001
CR025	Gross margin near 50% is structurally below software norms and faces persistent downward price pressure.	High	SR001, SR011
CR026	The path to a 60% gross margin depends on unproven utilization gains and a revenue-mix shift.	Medium	SR001
CR027	Burn, runway and net revenue retention are undisclosed, so Fireworks' capital adequacy is asserted rather than verified.	Medium	SR001, SR021
CR028	The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations.	Medium	SR022, SR023
CR029	Key-person risk is concentrated in CEO Lin Qiao, who leads vision and fundraising.	Medium	SR024
CR030	Retaining elite inference engineers in a hot talent market is a continuing execution challenge.	Low	SR024, SR001
CR031	Fireworks' mitigations include moving up the stack, diversifying silicon, maintaining day-zero model support and hardening compliance.	Medium	SR001
CR032	Plugging into AWS and Azure procurement is a defensive mitigation against hyperscaler bundling.	Medium	SR001
CR033	Execution risk centers on whether the unproven up-the-stack expansion outruns commoditization.	Medium	SR001
CR034	Gross-margin trajectory toward 60% is the single best monitoring indicator of Fireworks' risk profile.	Medium	SR001
CR035	The clearest thesis-break triggers are margin stuck at ~50%, hyperscaler/NVIDIA capture, a key-person departure, or growth stalling versus the valuation.	Medium	SR001, SR022
CR036	Priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration.	Medium	SR001, SR012
CR037	Public infrastructure peers such as Datadog, Snowflake, Confluent and Cloudflare disclose AI-competition and margin risk factors that contextualize Fireworks' exposures.	Medium	SR014, SR015, SR016, SR017
CR038	DigitalOcean's filings illustrate the lower-margin reality of infrastructure-heavy businesses relative to pure software.	Low	SR030
CR039	Better-capitalized rivals such as Baseten raise the competitive stakes for Fireworks' enterprise go-to-market.	Medium	SR028, SR027
CR040	Low switching costs from OpenAI-compatible APIs and routers cap retention and amplify commoditization risk.	Medium	SR003, SR013
CR041	US export controls and supply constraints on advanced GPUs are an indirect risk transmitted through Fireworks' NVIDIA dependence.	Low	SR008, SR009
CR042	Fireworks' terms of service allocate liability and usage restrictions that are standard but warrant review for enterprise indemnification.	Low	SR019
CV001	The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure as enterprises shift from closed-API experimentation to owning customized open models in production.	Medium	SV026, SV008
CV002	The anti-thesis is that inference is structurally commoditizing, with ~50% margins, near-zero switching costs, and hyperscaler and NVIDIA repricing risk.	Medium	SV001, SV016
CV003	Fireworks pairs a PyTorch-pedigree founding team with FireAttention, FireOptimizer, best-in-class function calling and 99.8% uptime.	Medium	SV026, SV001
CV004	Fireworks grew from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers.	Medium	SV001, SV029
CV005	Fireworks' per-token prices sit within ~2% of Together and open-source serving frameworks keep closing the performance gap, supporting the commoditization anti-thesis.	Medium	SV016, SV001
CV006	A valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, prices in flawless execution.	Medium	SV001, SV008, SV029
CV007	We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance.	Medium	SV001, SV016
CV008	We assign an overall score of 6.5 out of 10, reflecting a strong business at a demanding price.	Low	SV001, SV026
CV009	The $4 billion Series C implied roughly 14 times the company-stated $280 million annualized revenue.	Medium	SV008, SV001
CV010	The rumored $15 billion round implies roughly 19 times Sacra's ~$800 million May 2026 revenue estimate.	Low	SV001, SV004
CV011	Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR and a reconciled revenue figure.	Medium	SV001, SV016
CV012	Fireworks has raised over $327 million across seed, a $25M Series A, a $52M Series B at $552M and a $250M Series C at $4B.	High	SV008, SV001
CV013	The Series C comprised roughly $230 million primary and a $20 million secondary.	Medium	SV001
CV014	As of May 2026 Fireworks is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures.	Medium	SV001, SV004, SV002
CV015	Public evidence supports Fireworks' growth and customer story but not its financial quality, since revenue is unaudited, margin is estimated, and burn is undisclosed.	Medium	SV001, SV016
CV016	Strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table add ecosystem support but concentrate supplier and partner influence.	Medium	SV008, SV027
CV017	The base case (~45%) assumes ~$700-900 million 2026 revenue and low-50s margins, implying a fair value around $5-8 billion.	Low	SV001, SV005
CV018	The bull case (~30%) assumes margins toward 58-60% and revenue past $1.5 billion by 2027, justifying $15-20 billion.	Low	SV001, SV015
CV019	The bear case (~25%) assumes commoditization and hyperscaler capture compressing the multiple to a $2-3 billion range or a down round.	Low	SV016, SV001
CV020	The valuation dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller.	Medium	SV001, SV015
CV021	The deciding evidence between scenarios, gross-margin trajectory and retention, is not yet disclosed.	Medium	SV001
CV022	Together AI was valued at $3.3 billion on about $618 million annualized revenue in early 2025, roughly 5x, and is reportedly near $7.5 billion on about $1 billion.	Medium	SV005
CV023	Baseten raised at a $5 billion valuation in January 2026 with talks of $11 billion, and Groq reached $6.9 billion as a hardware-led player, while Fal is cited around $4.5 billion.	Medium	SV006, SV007, SV002
CV024	Public infrastructure-software comparables such as Datadog, Snowflake, Cloudflare and Confluent frame a broad, compressed multiple band with 70%-plus gross margins.	Medium	SV011, SV012, SV013, SV020
CV025	DigitalOcean illustrates that lower-margin infrastructure businesses trade at clear discounts to pure software, supporting a discount for Fireworks' ~50% margins.	Medium	SV014
CV026	Hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat to Fireworks' valuation.	Medium	SV017, SV018, SV019
CV027	At $4 billion on ~$280 million Fireworks looks rich versus Together's multiple but is on a smaller, faster-growing base; on ~$800 million it looks comparatively cheap.	Medium	SV001, SV005
CV028	Plausible exit paths include an IPO on sustained hypergrowth or strategic acquisition by a hyperscaler or data-platform investor that is also a competitor.	Low	SV017, SV018
CV029	The principal thesis-break triggers are margin failing to rise off ~50%, hyperscaler or NVIDIA capture, a key-person departure, or growth stalling versus the valuation.	Medium	SV001, SV016
CV030	A net revenue retention below roughly 110% once disclosed would warrant a lower multiple.	Low	SV001
CV031	Priority diligence asks are a reconciled dated revenue figure, audited gross margin and the path to 60%, NRR and churn, burn and runway, GPU-supply terms, top-customer concentration and preference and dilution structure.	Medium	SV001, SV016
CV032	Until margin and retention are confirmed, the right posture is to track closely, underwrite to the base case, and reserve premium entry for confirmation of the infrastructure thesis.	Medium	SV001, SV015
CV033	Together's prior round at $1.25 billion on $130 million 2024 revenue traded at 9.6x, a useful inference-peer multiple benchmark.	Medium	SV005
CV034	Fireworks' ~50% gross margin warrants a discount to the 70%-plus-margin public-software multiples because GPU costs sit in COGS.	Medium	SV014, SV001
CV035	The $15 billion valuation talk is corroborated by Sacra and multiple news outlets as of late May 2026 but remains unconfirmed.	Medium	SV001, SV002, SV003, SV024
CV036	The large AI inference TAM growing near 19% annually supports a premium for category leaders but does not by itself justify any single multiple.	Medium	SV030, SV015
CV037	A premium entry would become attractive if Fireworks demonstrates a credible path to 60% margins and net revenue retention above 120%.	Low	SV001
CV038	Usage-based comparables like Twilio and AI-software names like C3.ai bound the multiple range for consumption- and AI-exposed businesses.	Low	SV021, SV023
CV039	Preference stack and liquidation overhang are not publicly disclosed and must be diligenced before a late-stage entry.	Low	SV001, SV010
CV040	Salesforce and other large software comps illustrate mature-growth multiple compression that a maturing Fireworks would eventually face.	Low	SV022

Sources
ID	Publisher	Title	Quote
SO001	Fireworks AI	Fireworks AI - Fastest Inference for Generative AI
SO002	Fireworks AI	Fireworks AI Raises $250M Series C to Power the Future of Enterprise AI	Today, we're announcing a $250 million Series C at a $4 billion valuation ... brings our total funding to over $327 million
SO003	Fireworks AI	Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems	We're thrilled to announce our $52M Series B funding round led by Sequoia Capital, raising our valuation to $552M.
SO004	Index Ventures	Inference is the New Runtime: Our Investment in Fireworks	Alongside co-founders Dmytro Dzhulgakov, Dmytro Ivchenko, and James Reed ... as well as Benny Chen, Chenyu Zhao, and Pawel Garbacki
SO005	Sequoia Capital	Fireworks Founder Lin Qiao on Fast Inference and Small Models
SO006	The AI Insider	Fireworks AI Closes $250M Series C to Lead the AI Inference Market
SO007	The AI Insider	Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
SO008	PYMNTS	Fireworks AI Valued at $552 Million After New Funding Round
SO009	Tech Funding News	NVIDIA, Sequoia invest in GenAI startup Fireworks AI's $52M round
SO010	The SaaS News	Fireworks AI Raises $52 Million in Series B
SO011	AI Curator	Fireworks AI Closes $250M Round, Eyes AI Inference Lead
SO012	AIM Media House	Fireworks AI raises $250 million for enterprise AI infrastructure
SO013	Sacra	Fireworks AI revenue, valuation & funding	Sacra estimates that Fireworks AI hit $800M in annualized revenue in May 2026, up from about $305M at the end of 2025.
SO014	Scroll.media	Fireworks AI has a valuation of $552 million. Ukrainians among the founders.	the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SO015	The Stack	Fireworks AI's Lin Qiao: The future is compound AI
SO016	TWIML AI	Lin Qiao profile
SO017	Crunchbase	Fireworks AI - Company Profile
SO018	AI Market Watch	Fireworks AI - AI Startup Profile	Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SO019	SiliconANGLE	Fireworks AI raises $250M at $4B valuation to help enterprises with AI inference workloads
SO020	Business Wire	Fireworks AI Raises $250M Series C to Lead the AI Inference Market
SO021	Orrick	Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SO022	Tech Funding News	PyTorch engineers' brainchild Fireworks AI closes $250M at $4B valuation
SO023	Exa	Meet the Executive Team at Fireworks AI
SO024	GitHub	Fireworks AI (fw-ai) GitHub organization
SO025	Fireworks AI	Fireworks AI Careers
SO026	eesel AI	An honest Fireworks AI review (2025): The good, the bad, and the ugly	Fireworks excels at performance and model selection, but it is 'just the engine' - developers and businesses still need technical sophistication to build deployable solutions.
SM001	MarketsandMarkets	AI Inference Market - Global Forecast to 2030	the AI inference market is expected to grow from USD 106.15 billion in 2025 to USD 254.98 billion by 2030, at a CAGR of 19.2%
SM002	Polaris Market Research	AI Inference Market Size & Trends, Industry Report 2034
SM003	Research and Markets	AI Inference Market Outlook 2026-2034
SM004	Vention	State of AI 2026 - AI Market Size, Investment, and Industry Data
SM005	Precedence Research	AI Inference Market Size and Forecast
SM006	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix	By Q2 2026 the serverless inference market has consolidated around seven providers - Together, Fireworks, Anyscale, Groq, Cerebras, Replicate, and OctoAI.
SM007	Alatirok	AI Inference Providers in 2026: 5-Way Comparison
SM008	Jimmy Research	Fireworks AI - entity profile
SM009	Index Ventures	Inference is the New Runtime: Our Investment in Fireworks	Gartner projects GenAI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028
SM010	Sacra	Fireworks AI revenue, valuation & funding
SM011	Sacra	Together AI revenue, valuation & funding	Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025.
SM012	Sacra	Baseten revenue, valuation & funding
SM013	Fireworks AI	Fireworks AI Raises $250M Series C
SM014	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SM015	Microsoft Azure	Introducing Fireworks AI on Microsoft Foundry
SM016	Together AI	Together AI - The AI Acceleration Cloud
SM017	Together AI	Together AI Pricing
SM018	Baseten	Baseten - Inference Platform
SM019	Groq	Groq - Fast, low cost inference
SM020	Modal	Modal - High-performance AI infrastructure
SM021	Replicate	Replicate - Run AI with an API
SM022	Anyscale	Anyscale - Scalable compute for AI
SM023	TokenMix	Fireworks AI Review 2026
SM024	DeployBase	Fireworks AI Pricing Breakdown
SM025	eesel AI	An honest Fireworks AI review (2025)	the industry expects this downward pricing pressure to intensify by 2025-2026, making it difficult for any single provider to maintain high profit margins
SM026	EU AI Act (artificialintelligenceact.eu)	High-level summary of the AI Act
SP001	TokenMix	Fireworks AI Review 2026: 99.8% Uptime vs Together and Groq	Fireworks: 99.8% uptime + best function calling, 50+ models, $0.90/M. Together: 200+ models + cheap fine-tuning, $0.88/M. Groq: ultra-low latency, $0.59/M but lowest uptime (99.4%).
SP002	Sacra	Together AI revenue, valuation & funding	Together AI raised a $305M Series B in February 2025 led by General Catalyst ... valuing the company at $3.3B
SP003	Sacra	Baseten revenue, valuation & funding
SP004	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP005	DatacenterDynamics	AI chip company Groq raises $750m at $6.9bn valuation
SP006	Dataconomy	AI chip startup Groq raises $750 million at a $6.9 billion valuation
SP007	The AI World	Baseten raises $300M to scale AI inference
SP008	TechBuzz	Groq Raises $750M at $6.9B Valuation to Challenge Nvidia's AI Dominance
SP009	Sacra	Fireworks AI revenue, valuation & funding (competition section)	Together AI is Fireworks' closest direct competitor ... Baseten raised a $300M Series E at a $5 billion valuation
SP010	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix
SP011	DeployBase	Fireworks AI Pricing Breakdown vs competitors
SP012	Modal	Modal - High-performance AI infrastructure
SP013	Replicate	Replicate - Run AI with an API
SP014	Anyscale	Anyscale - Scalable compute for AI
SP015	Baseten	Baseten Pricing
SP016	Microsoft Foundry	Fireworks models on Microsoft Foundry
SP017	Groq	Groq - Fast, low cost inference
SP018	Together AI	Together AI - The AI Acceleration Cloud
SP019	Alatirok	AI Inference Providers in 2026: 5-Way Comparison
SP020	Walturn	What is Fireworks AI? Features, Pricing, and Use Cases
SP021	createaiagent.net	Fireworks AI: Optimized Inference Solutions
SP022	Fireworks AI	Fireworks AI Raises $250M Series C
SP023	eesel AI	An honest Fireworks AI review (2025)	Critics note that, while Fireworks excels at performance and model selection, it is 'just the engine'.
SP024	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SP025	Together AI	Together AI Pricing
SI001	Fireworks AI	Fireworks AI Raises $250M Series C	our annualized revenue has surpassed $280 million ... Growing our computation footprint 3-4x over the next year
SI002	Sacra	Fireworks AI revenue, valuation & funding	The company's gross margin sits at approximately 50% ... Fireworks has told investors it is targeting 60% gross margins
SI003	Fireworks AI	Fireworks AI Pricing
SI004	TokenMix	Fireworks AI Review 2026 - pricing breakdown	Llama 70B $0.90/M, Llama 8B $0.20/M, DeepSeek V3 $0.50/M ... Reserved capacity ... approximately $4.80/hour
SI005	DeployBase	Fireworks AI Pricing Breakdown: Cost Per Token
SI006	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SI007	Amazon Web Services	Fireworks.ai Case Study	the customer cut total costs by four times ... HIPAA and SOC2 Type II compliant
SI008	Markaicode	Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM
SI009	Scroll.media	Fireworks AI valuation and ARR	the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SI010	AI Market Watch	Fireworks AI - AI Startup Profile	Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SI011	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix
SI012	U.S. Securities and Exchange Commission (NVIDIA)	NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SI013	U.S. Securities and Exchange Commission (AMD)	Advanced Micro Devices Form 10-K (FY ended December 27, 2025)
SI014	U.S. Securities and Exchange Commission (MongoDB)	MongoDB, Inc. Form 10-K (FY ended January 31, 2026)
SI015	Fireworks AI	Fireworks AI Docs - Concepts
SI016	Fireworks AI	FireOptimizer: Customizing latency and quality
SI017	Index Ventures	Inference is the New Runtime
SI018	Fireworks AI	Multi-LoRA: Personalize AI at scale
SI019	Sanjay Says	Fireworks AI and Adaptive Speculative Execution
SI020	eesel AI	An honest Fireworks AI review (2025)	there is pressure for all inference providers to cut prices ... making it difficult for any single provider to maintain high profit margins
SI021	Crunchbase	Fireworks AI - Company Profile
SI022	Business Wire	Fireworks AI Raises $250M Series C
SI023	Tech Funding News	Fireworks AI closes $250M at $4B valuation
SI024	Fireworks AI	Fireworks AI Docs - Deploying LoRAs
SI025	Fireworks AI	Fireworks AI Docs - Changelog
SE001	Markaicode	Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM	Stateless request router ... Draft GPU pods running a small fast model ... Target GPU pods ... Distributed KV cache ... above 85 tokens/sec per GPU
SE002	Fireworks AI	FireOptimizer: Customizing latency and quality for production
SE003	Fireworks AI	Reinforcement Fine Tuning: Train expert open models to surpass closed
SE004	Fireworks AI	Fireworks RFT: Build AI agents with fine-tuned open models
SE005	Fireworks AI	Introducing Supervised Fine Tuning V2
SE006	Fireworks AI	Optimizing Llama 4 Maverick on Fireworks AI	Llama 4 Maverick became available day one on Fireworks with support for 1-million-token context ... custom attention via FireAttention
SE007	Amazon Web Services	Fireworks.ai Case Study (HIPAA / SOC2)	the Fireworks.ai inference solution built on AWS is HIPAA and SOC2 Type II compliant
SE008	Fireworks AI	Speculative Decoding - Fireworks AI Docs
SE009	Sanjay Says	Fireworks AI and Adaptive Speculative Execution
SE010	Fireworks AI	Fireworks AI Docs - Introduction
SE011	Fireworks AI	DeepSeek V3.1 now on Fireworks AI
SE012	Fireworks AI	Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
SE013	TokenMix	Fireworks AI Review 2026: uptime and function calling benchmarks	FireFunction 92.1% multi-tool accuracy ... 99.8% uptime, highest in inference market
SE014	Fireworks AI	How Cursor built Fast Apply using the Speculative Decoding API	Cursor ... achieve 1000 tokens/sec for code generation use cases such as instant apply
SE015	Fireworks AI	How Notion fine-tuned with Fireworks	we reduced latency from about 2 seconds to 350 milliseconds
SE016	Fireworks AI	How Sourcegraph scaled real-time code assistance with Fireworks
SE017	Sacra	Fireworks AI - enterprise security posture	zero data retention by default, SSO, audit logs, data residency controls, HIPAA and SOC2 compliance posture, and airgapped EKS deployments
SE018	GitHub	Fireworks AI (fw-ai) GitHub organization and benchmarks
SE019	Fireworks AI	Fireworks AI Dev Day 2025 Wrapped
SE020	Fireworks AI	Multi-LoRA: Personalize AI at scale
SE021	Fireworks AI	Fireworks AI Docs - Concepts
SE022	Fireworks AI	Fireworks AI Raises $250M Series C (roadmap)	Expand Our Product into a Comprehensive AI Creation Toolchain ... Growing our computation footprint 3-4x
SE023	Fireworks AI	Fireworks AI - AI-native
SE024	Fireworks AI	Fireworks AI Docs - Deploying LoRAs
SE025	eesel AI	An honest Fireworks AI review (2025): documentation gaps	Some reviews point to limited transparency around free usage, sporadic documentation, and potential support slowdowns
SE026	Walturn	What is Fireworks AI? Features, Pricing, and Use Cases
SE027	DeployBase	Fireworks AI Pricing and capabilities breakdown
SU001	Fireworks AI	How Cursor built Fast Apply using the Speculative Decoding API	Fireworks is way more performant than the open source engines and is what we use in production.
SU002	Fireworks AI	How Notion fine-tuned models with Fireworks	we reduced latency from about 2 seconds to 350 milliseconds
SU003	Fireworks AI	Real-time code assistance: How Sourcegraph scaled with Fireworks
SU004	Fireworks AI	How Upwork and Fireworks deliver faster proposals (Uma)
SU005	Fireworks AI	Accelerating Code Completion with Fireworks Fast LLM Inference
SU006	Fireworks AI	Fireworks AI Raises $250M Series C (customer scale)	Fireworks now powers over 10,000 companies (a 10x increase from our Series B)
SU007	AI Market Watch	Fireworks AI - notable customers and growth metrics	Notable customers: Quora, DoorDash, Upwork, Cresta, Cursor, Liner, Superhuman, Sourcegraph, Tome, Samsung, Uber, Notion, Shopify
SU008	Fireworks AI	Fireworks AI - Customers
SU009	Sacra	Fireworks AI - customer base and expansion	The customer base grew from roughly 1,000 companies at the time of the Series B to more than 10,000 companies by October 2025.
SU010	Scroll.media	Fireworks AI developer growth 2024	The number of developers using Fireworks AI jumped from 12,000 in February 2024 to 23,000 by year's end.
SU011	Index Ventures	Inference is the New Runtime (customer references)	high-throughput, latency-sensitive applications at companies like Uber, DoorDash, Notion, Quora, and Upwork ... enterprise leaders like Samsung
SU012	Amazon Web Services	Fireworks.ai Case Study (Sourcegraph / Cody)	Cody doubled its completion acceptance rate ... Cody's backend latency accelerated by more than two times.
SU013	Fireworks AI	Fireworks AI Series B (Cursor, Quora, Upwork, Superhuman)	Superhuman ... used Fireworks to create Ask AI, a compound AI system
SU014	Fireworks AI	Fireworks AI - AI-native customers
SU015	WorkingAgents	Fireworks AI: The Compound Inference Engine
SU016	GitLab Inc. (SEC EDGAR)	GitLab Inc. Form 10-K (FY ended January 31, 2026)
SU017	Sacra	Fireworks AI - retention and expansion dynamics	a single inference relationship can anchor a broader infrastructure dependency over time
SU018	eesel AI	Fireworks AI alternatives and switching considerations
SU019	eesel AI	An honest Fireworks AI review (2025)
SU020	Fireworks AI	Fireworks AI homepage (customer logos)
SU021	TokenMix	Fireworks AI Review 2026 - production usage
SU022	Sacra	Fireworks AI - business model and ARPA	Blended annualized revenue per company works out to roughly $28,000 across the full base
SU023	Fireworks AI	Fireworks AI Blog index
SU024	Fireworks AI	Fireworks AI at AWS re:Invent 2025
SU025	AI Market Watch	Fireworks AI - geographic focus and industries
SR001	Sacra	Fireworks AI - risks section	the proprietary performance advantage in FireAttention and FireOptimizer is likely to compress ... Hyperscaler capture ... Hardware concentration
SR002	eesel AI	An honest Fireworks AI review (2025): risks
SR003	eesel AI	Fireworks AI alternatives (switching risk)
SR004	EU AI Act (artificialintelligenceact.eu)	High-level summary of the AI Act
SR005	EU AI Act (artificialintelligenceact.eu)	Article 53: Obligations for providers of general-purpose AI models
SR006	GDPR.eu	What is GDPR, the EU's data protection law?
SR007	European Commission	Regulatory framework for AI
SR008	U.S. Securities and Exchange Commission (NVIDIA)	NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SR009	DatacenterDynamics	Groq raises $750m at $6.9bn valuation (silicon competition)
SR010	Dataconomy	Groq raises $750 million (NVIDIA challenge)
SR011	Digital Applied	AI Inference Providers Pricing Matrix Q2 2026 (price pressure)
SR012	TokenMix	Fireworks AI Review 2026 (SLA and pricing risk)	Fireworks AI does not publish a formal SLA for its standard tier
SR013	Walturn	What is Fireworks AI? (risks and lock-in)
SR014	U.S. Securities and Exchange Commission (Datadog)	Datadog, Inc. Form 10-K (FY ended December 31, 2025)
SR015	U.S. Securities and Exchange Commission (Snowflake)	Snowflake Inc. Form 10-K (FY ended January 31, 2026)
SR016	U.S. Securities and Exchange Commission (Confluent)	Confluent, Inc. Form 10-K (FY ended December 31, 2025)
SR017	U.S. Securities and Exchange Commission (Cloudflare)	Cloudflare, Inc. Form 10-K (FY ended December 31, 2025)
SR018	Orrick	Fireworks AI Series C legal counsel
SR019	Fireworks AI	Fireworks AI Terms of Service
SR020	NIST	AI Risk Management Framework
SR021	Fireworks AI	Fireworks AI Raises $250M Series C (use of funds / capital intensity)
SR022	SiliconANGLE	Fireworks AI raises $250M at $4B valuation (valuation ramp)
SR023	Scroll.media	Fireworks AI valuation ramp 552M to 4B
SR024	Index Ventures	Inference is the New Runtime (founder dependency)
SR025	Advanced Micro Devices (SEC EDGAR)	AMD Form 10-K (alternative silicon supply)
SR026	DeployBase	Fireworks AI Pricing (margin pressure)
SR027	Alatirok	AI Inference Providers 2026 (competitive risk)
SR028	Business Wire	Baseten Raises $300M (capital asymmetry)
SR029	GitLab Inc. (SEC EDGAR)	GitLab Form 10-K (AI vendor risk-factor comparable)
SR030	DigitalOcean (SEC EDGAR)	DigitalOcean Form 10-K (infrastructure margin comparable)
SV001	Sacra	Fireworks AI revenue, valuation & funding	Fireworks AI is in talks to raise a new funding round at a $15 billion post-money valuation, with Index Ventures set to co-lead.
SV002	AI Weekly	Fireworks AI Targets $15B Valuation in New Round
SV003	StartupNews.fyi	Fireworks AI Seeks $15B Funding, Quadrupling Valuation
SV004	Yahoo Finance	Fireworks AI Eyes $15 Billion Valuation In New Funding Talks
SV005	Sacra	Together AI revenue, valuation & funding	Based on 2024 revenue of $130M and a $1.25B valuation, the company traded at a 9.6x revenue multiple at its prior round.
SV006	Sacra	Baseten revenue, valuation & funding
SV007	DatacenterDynamics	Groq raises $750m at $6.9bn valuation
SV008	Fireworks AI	Fireworks AI Raises $250M Series C at $4B valuation
SV009	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SV010	Orrick	Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SV011	U.S. Securities and Exchange Commission (Datadog)	Datadog, Inc. Form 10-K (FY 2025)
SV012	U.S. Securities and Exchange Commission (Snowflake)	Snowflake Inc. Form 10-K (FY 2026)
SV013	U.S. Securities and Exchange Commission (Cloudflare)	Cloudflare, Inc. Form 10-K (FY 2025)
SV014	U.S. Securities and Exchange Commission (DigitalOcean)	DigitalOcean Holdings Form 10-K (FY 2025)
SV015	a16z	AI Inference Economics
SV016	eesel AI	An honest Fireworks AI review (2025): margin and commoditization
SV017	U.S. Securities and Exchange Commission (Amazon)	Amazon.com, Inc. Form 10-K (FY 2025)
SV018	U.S. Securities and Exchange Commission (Microsoft)	Microsoft Corporation Form 10-K (FY 2025)
SV019	U.S. Securities and Exchange Commission (Oracle)	Oracle Corporation Form 10-K (FY 2025)
SV020	U.S. Securities and Exchange Commission (Confluent)	Confluent, Inc. Form 10-K (FY 2025)
SV021	U.S. Securities and Exchange Commission (Twilio)	Twilio Inc. Form 10-K (FY 2025)
SV022	U.S. Securities and Exchange Commission (Salesforce)	Salesforce, Inc. Form 10-K (FY 2026)
SV023	U.S. Securities and Exchange Commission (C3.ai)	C3.ai, Inc. Form 10-K (FY 2025)
SV024	CryptoBriefing	Fireworks AI reportedly seeks funding at $15 billion valuation
SV025	Briefs.co	Fireworks AI Eyes $15B Valuation In New Funding Round
SV026	Index Ventures	Inference is the New Runtime (thesis)
SV027	Tech Funding News	Fireworks AI closes $250M at $4B valuation
SV028	AI Market Watch	Fireworks AI - revenue and valuation profile
SV029	Scroll.media	Fireworks AI valuation ramp 552M to 4B
SV030	MarketsandMarkets	AI Inference Market - Global Forecast to 2030