Startup Diligence
Diligence report AI inference infrastructure / developer tools Series C (private) 2026-06-14

Fireworks AI

Inference cloud for open models, priced for perfection

A top-tier AI-inference asset with elite founders and hypergrowth, priced for perfection against ~50% margins and structural commoditization risk.

Cover facts

Valuation 01
4 USD billion (Series C, Oct 2025) [CO011]
Total raised 02
327 USD million+ [CO014]
Annualized revenue 03
800 USD million (Sacra est., May 2026) [CI018]
Customers 04
10000 companies+ [CO018]
Tokens/day 05
15 trillion (early 2026) [CO022]
Gross margin 06
50 percent (est.) [CO024]

Company profile

Fireworks AI is a Redwood City-based AI inference cloud founded in 2022 by Lin Qiao and a team of former Meta PyTorch engineers. It lets enterprises run, fine-tune and scale hundreds of open-source LLM, image, audio and multimodal models in production via an OpenAI-compatible API, differentiating on proprietary inference optimization (FireAttention, FireOptimizer), best-in-class function calling and category-leading reliability. The company raised a $250M Series C at a $4B valuation in October 2025, reports rapid revenue growth to a third-party-estimated ~$800M annualized, and serves 10,000-plus companies including Cursor, Notion, DoorDash and Samsung.

Website
fireworks.ai
Founded
2022-01-01
Founders
Lin Qiao, Dmytro Dzhulgakov, Dmytro Ivchenko
Founding location
Redwood City, California, USA
Headquarters
Redwood City, California, USA
Product
Usage-based AI inference platform: per-token serverless inference for open models, LoRA and reinforcement fine-tuning, dedicated and reserved GPU deployments, a function-calling model family (FireFunction), and a voice-agent platform, all on a proprietary optimized inference engine.
Customers
AI-native startups, digital-native enterprises and select Fortune 500 buyers building production generative-AI applications that need fast, cost-efficient, controllable open-model inference.
Business model
B2B usage-based monetization across serverless (per token), fine-tuning (per training token), reinforcement fine-tuning (per GPU-hour) and dedicated/reserved deployments, with bottoms-up developer entry expanding into negotiated enterprise contracts.
Stage
Series C (private, venture-backed)
Funding status
$250M Series C at $4B valuation (Oct 2025), >$327M total raised; reportedly in talks for a ~$15B round co-led by Index Ventures as of May 2026 (unconfirmed).
[CO001, CO011, CO014, CO018]

Executive summary

Top strengths

  • Rare founder-market fit - the team that built PyTorch at Meta now leads inference systems.
  • Hypergrowth to a reported ~$800M annualized revenue across 10,000+ customers and 15T tokens/day.
  • Blue-chip production references (Cursor, Notion, Sourcegraph, Upwork) with quantified outcomes.
  • Engineering-led differentiation (FireAttention, FireOptimizer), best-in-class function calling and 99.8% uptime.

Top risks

  • Inference commoditization and ~50% gross margins versus 70%+ software norms.
  • Hyperscaler bundling (Bedrock, Azure, Vertex) and NVIDIA acting as supplier, investor and competitor.
  • Low switching costs and multi-homing cap retention and pricing power.
  • Aggressive valuation ramp ($552M to $4B, $15B in talks) embeds flawless execution.

Open gaps

  • No audited financials or single reconciled, dated revenue figure (estimates span 6x within a year).
  • Gross margin, net revenue retention, churn, burn and runway are undisclosed.
  • Top-customer revenue concentration and GPU-supply contract terms are not public.
  • Preference stack and dilution structure of the next round are undisclosed.

Contents

Chapter 01

01Company Overview

1.1 Identity and Business Model

Fireworks AI is an American artificial-intelligence infrastructure company headquartered in Redwood City, California, founded in late 2022 by a team that left Meta's PyTorch organization. The company operates what it calls an "AI Cloud" for enterprise developer teams: a managed inference platform that runs, fine-tunes, and scales open-source large language, vision, audio, and multimodal models with low-latency serving. Its core thesis is "one-size-fits-one" inference, the belief that the highest-value AI is built on smaller, customizable open models tuned on enterprise-specific data rather than a handful of generic closed foundation models. Monetization is usage-based across the customer lifecycle: serverless inference billed per token, fine-tuning billed per training token, reinforcement fine-tuning billed per GPU-hour, and on-demand or reserved dedicated deployments billed per GPU-second or GPU-hour. The platform offers hundreds of models plus an OpenAI-compatible API, function calling, and enterprise security controls, positioning Fireworks between commodity GPU rental and closed-model APIs.[CO001, CO002, CO003, CO004, CO005, CO031]

FO002: Company snapshot logic

How identity, product, customers, capital and dependencies connect.

[CO001, CO004, CO018, CO024, CO028]

1.2 Founders and Leadership

Fireworks AI was co-founded by chief executive Lin Qiao alongside six colleagues, the majority of whom worked together on PyTorch at Meta. Qiao previously served as Senior Director of Engineering and Head of PyTorch at Meta, where she led an organization of more than 300 engineers, and earlier held roles at LinkedIn, IBM and other large systems companies; she holds a Ph.D. in Computer Science from UC Santa Barbara. Co-founders include Dmytro Dzhulgakov, a former core PyTorch maintainer who joined Facebook in 2011, and Dmytro Ivchenko, a Kyiv Polytechnic graduate who worked on PyTorch ranking at Meta, both originally from Ukraine. The remaining founders, James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, bring experience from Meta's PyTorch compiler, ads infrastructure and core ML teams as well as Google Vertex AI. The founding team's deep inference-systems pedigree is repeatedly cited by investors as the company's core advantage and is also a key-person dependency concentrated in Qiao.[CO006, CO007, CO008, CO009, CO010, CO032]

Leadership and founder table
PersonRoleBackgroundFounder-market fitKey-person dependency
Lin QiaoCEO & co-founderHead of PyTorch at Meta (300+ eng); LinkedIn, IBM; PhD UC Santa BarbaraDeep inference-systems and OSS leadershipHigh - public face, vision and fundraising lead
Dmytro DzhulgakovCo-founder (CTO-level)Core PyTorch maintainer at Meta since 2011; from Kharkiv, UkraineCore inference engineeringHigh - principal technical architect
Dmytro IvchenkoCo-founderPyTorch ranking at Meta; LinkedIn; Kyiv PolytechnicLarge-scale ML systemsMedium
James ReedCo-founderPyTorch compiler team at MetaCompiler / kernel optimizationMedium
Benny ChenCo-founderMeta ads infrastructure leadProduction infra strategyMedium
Chenyu ZhaoCo-founderLed Google Vertex AICloud AI platform GTMMedium
Pawel GarbackiCo-founderCore ML for Meta NewsfeedML systems and rankingMedium

Founder list and backgrounds compiled from Index Ventures, Sequoia, scroll.media and executive directory sources; roles beyond CEO are not all formally titled publicly.

[CO006, CO007, CO008, CO009, CO010]

1.3 Funding and Capitalization

Fireworks has raised more than $327 million across a seed round and three priced rounds. A $25 million Series A led by Benchmark closed in March 2024, with Sequoia Capital, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang. A $52 million Series B led by Sequoia followed in July 2024 at a $552 million valuation, adding NVIDIA, AMD and MongoDB Ventures and bringing cumulative capital to $77 million. In October 2025 the company announced a $250 million Series C at a $4 billion valuation, co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia; Sacra reports the round comprised roughly $230 million of primary capital plus a $20 million secondary. Strategic participants across rounds include NVIDIA, AMD, MongoDB and Databricks, tying the cap table to the hardware and data-platform ecosystems Fireworks depends on. As of May 2026 Sacra reports the company is in talks to raise again at a $15 billion post-money valuation, with Index set to co-lead, though terms are unconfirmed.[CO011, CO012, CO013, CO014, CO015, CO016]

Stakeholder or investor map
StakeholderRole / roundControl or economic importanceDiligence ask
BenchmarkLead - Series A (Mar 2024)Early lead investor; likely board seatConfirm board composition and ownership %
Sequoia CapitalLead - Series B; continued Series CMulti-round backer; GP Sonya HuangConfirm board seat and pro-rata stakes
Lightspeed Venture PartnersCo-lead - Series C (Oct 2025)Late-stage lead at $4BConfirm governance rights
Index VenturesCo-lead - Series C; potential co-lead next roundRepeat backer (Sahir Azam); thesis investorConfirm allocation in rumored $15B round
EvanticCo-lead - Series CNew late-stage leadConfirm fund profile and stake
NVIDIAStrategic - Series B/CHardware supplier and investorAssess GPU-allocation conflicts/benefits
AMDStrategic - Series B/CAlternative silicon supplier/investorAssess MI-series adoption
MongoDB / DatabricksStrategic - Series B/CData-platform partners/investorsConfirm co-sell and partnership depth

Lead and strategic investors only; individual angels (Slootman, Sandberg, Liu, Wang) and seed backers are not enumerated. Board composition and ownership percentages are not public.

[CO011, CO012, CO013, CO015, CO016]

1.4 Scale and Traction Metrics

Fireworks reports rapid commercial scaling. At the Series C the company said it powers over 10,000 companies, a roughly tenfold increase from the Series B, serves hundreds of thousands of developers, and processes more than 10 trillion tokens per day; third-party profiles cite 15 trillion tokens per day by early 2026. The developer base grew from about 12,000 in February 2024 to 23,000 by the end of that year. Reported revenue figures vary by source and vintage and should be treated with care: the company stated annualized revenue had surpassed $280 million at the October 2025 Series C, while Sacra estimates roughly $305 million at year-end 2025 rising to about $800 million annualized by May 2026, and earlier 2025 coverage cited $130 million ARR with claims of profitability and 20x year-over-year growth. Gross margin is estimated near 50 percent, below the 70 percent-plus typical of software, because GPU costs sit in cost of goods sold; management has told investors it targets 60 percent.[CO018, CO019, CO020, CO021, CO022, CO023]

Snapshot KPI table
MetricValue / StatusAs ofConfidenceGap or note
Valuation$4.0B post-money (Series C)Oct 2025HighReported $15B round in talks May 2026 (unconfirmed)
Total raised>$327MOct 2025HighIncludes ~$20M secondary in Series C
Annualized revenue (company)>$280MOct 2025MediumCompany statement; not audited
Annualized revenue (Sacra est.)~$800MMay 2026LowThird-party estimate; conflicts with company vintage
Customers10,000+ companiesOct 2025Medium~10x increase vs Series B
DevelopersHundreds of thousandsOct 2025Medium23,000 cited end-2024
Tokens/day10T+ (15T early 2026)Oct 2025MediumThroughput metric, not revenue
Gross margin~50% (targeting 60%)2026LowSacra estimate; GPU COGS heavy
HeadcountNot disclosed2026LowNo reliable public figure

Values compiled from company announcements and third-party analyst profiles; revenue and margin are estimates with conflicting vintages and are not audited financials.

[CO011, CO014, CO018, CO021, CO022, CO024]
FO003: Investability indicators

Traction, revenue trajectory and key-person signals beyond the headline KPI snapshot.

Revenue and growth are estimates across differing vintages; key-person concentration is a qualitative judgment.

[CO018, CO022, CO023, CO019, CO033]

1.5 Milestones and Adverse Signals

The company chronology runs from leaving PyTorch in 2022 through three financings, a string of platform launches (FireAttention, FireFunction V2, FireOptimizer, supervised and reinforcement fine-tuning), a Dev Day in 2025, a March 2026 launch on Microsoft Foundry, and the acquisition of Hathora to deepen real-time compute orchestration. Alongside the growth story sit genuine adverse signals that later chapters examine in depth. Independent reviewers note that Fireworks is "just the engine," requiring meaningful developer sophistication, and flag thin documentation and the absence of an ongoing free tier. Analysts highlight three structural risks: inference commoditization as open-source serving frameworks such as vLLM and SGLang improve, hyperscaler bundling by AWS Bedrock, Azure and Vertex, and hardware concentration given Fireworks does not own its GPU fleet while NVIDIA has entered inference directly through its Lepton acquisition. These pressures sit against an unusually strong founding team and fast revenue ramp.[CO025, CO026, CO027, CO028, CO029, CO030]

Milestone table
DateEventTypeAmount / valuation / statusImplication
2022Team leaves Meta PyTorch; Fireworks founded in Redwood Cityfoundingn/aOrigin of inference-systems pedigree
Feb 2024Reaches ~12,000 developersscale12,000 devsEarly bottoms-up traction
Mar 2024Series A led by Benchmarkfinancing$25MFirst institutional lead
Jul 2024Series B led by Sequoiafinancing$52M @ $552MCompound-AI positioning
2024FireFunction V2 and FireAttention V2 launchedproductreleasedFunction calling and long-context speed
Dec 2024Developer base reaches ~23,000scale23,000 devsRoughly doubled in 10 months
Jun 2025Supervised Fine-Tuning V2 releasedproductreleasedBroader model + QAT support
2025Reinforcement fine-tuning and Dev Day 2025productreleasedAgentic tuning wedge
Oct 2025Series C co-led by Lightspeed, Index, Evanticfinancing$250M @ $4B10x customer growth vs Series B
Early 2026Scales to ~15T tokens/dayscale15T tokens/dayThroughput leadership claim
Mar 2026Launch on Microsoft Foundry (Azure)partnershipliveHyperscaler distribution
2026Acquires Hathora for real-time compute orchestrationgovernanceacquisitionVertical integration up the stack
May 2026Reported talks for new round at $15Bfinancing$15B (rumored)Potential ~4x step-up in <1 year

Chronology compiled from Fireworks blogs, funding announcements and analyst profiles; dates for some product launches are approximate to the announcement month.

[CO011, CO014, CO019, CO025, CO026, CO027]
FO001: Company milestone timeline

Dated milestones across founding, financing, product, scale and partnerships.

Some launch dates approximate to announcement month; the $15B round is unconfirmed.

[CO011, CO014, CO019, CO025, CO026, CO017]

1.6 Exhibits

Chapter 02

02Market Analysis

2.1 Market Boundary and Definition

Fireworks operates in the managed AI inference market: the serving, fine-tuning and dedicated deployment of open-weight large language, vision, audio and multimodal models for production applications. The relevant included spend is what enterprises pay third parties to run models in production rather than what they spend training foundation models or renting bare GPUs. Excluded from the core boundary are foundation-model training compute consumed by frontier labs, raw GPU infrastructure-as-a-service from providers such as CoreWeave and Lambda, and closed-model APIs from OpenAI and Anthropic, although closed APIs are the most important status-quo substitute. Adjacent budget pools that Fireworks is expanding into include voice agents, retrieval-augmented generation with vector databases, and reinforcement-learning training for agents. The most direct substitutes for Fireworks are self-hosting open models on vLLM or SGLang, hyperscaler bundles like AWS Bedrock and Azure Foundry, and continued reliance on closed APIs. Defining this boundary first is essential because headline "AI inference" market figures conflate hardware, hyperscaler and independent-provider spend.[CM001, CM002, CM003, CM004, CM005]

Market definition table
Segment / categoryIncluded spendExcluded spendBuyer / payerRelevance to Fireworks
Managed open-weight inferencePer-token serverless serving of open modelsClosed-model API usageEng/platform budgetCore market
Fine-tuning & adaptationLoRA / SFT / RFT training spendFoundation-model pretrainingML/eng budgetCore adjacency
Dedicated / reserved GPU servingManaged dedicated deploymentsBare-metal GPU IaaS rentalPlatform/procurementCore market
Voice & multimodal agentsStreaming STT+LLM+TTS stacksTelephony hardwareProduct budgetExpansion adjacency
RAG / embeddingsEmbedding + reranking inferenceVector DB licensesEng budgetExpansion adjacency
Closed-model APIs (substitute)n/a (excluded)OpenAI/Anthropic API spendEng budgetPrimary substitute

Boundary defines what Fireworks can capture as an independent inference provider; closed APIs and raw GPU IaaS are excluded but listed as substitutes.

[CM001, CM002, CM003, CM004]

2.2 Market Sizing Across Multiple Lenses

No single number captures Fireworks' opportunity, so we triangulate three lenses. The broadest top-down lens, the global AI inference market, is estimated by MarketsandMarkets at $106.15 billion in 2025 growing to $254.98 billion by 2030, a 19.2% CAGR; other research houses place 2026 between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion. This lens, however, is dominated by semiconductor and hyperscaler spend and overstates Fireworks' reachable market. A narrower lens is generative-AI model spend, which Gartner (cited by Index Ventures) projects will nearly triple from $14 billion in 2025 to $39 billion by 2028, with much of the growth in specialized and fine-tuned models that favor Fireworks. The most relevant serviceable lens is the independent open-weight inference-serving niche, which has consolidated around roughly seven providers; with Together AI near $1 billion annualized revenue, Fireworks in the $280-800 million range and Groq valued at $6.9 billion, the independent-provider revenue pool is a few billion dollars today but expanding quickly. Fireworks' own ~$280 million-plus revenue represents an early single-digit share of that niche.[CM006, CM007, CM008, CM009, CM010, CM011]

TAM/SAM/SOM or sizing lens table
LensPublisherYearValueCAGR / noteConfidenceLimitation
Top-down AI inference (TAM)MarketsandMarkets2025-2030$106.15B -> $254.98B19.2% CAGRMediumDominated by chips & hyperscalers
Top-down AI inference (alt)Fortune/Polaris/R&M2026 / 2034~$118-126B / $312-536B13-19% CAGRLowWide spread across houses
GenAI model spend (lens)Gartner (via Index)2025-2028$14B -> $39B~40%/yrMediumIncludes closed-model spend
Independent inference niche (SAM)Sacra / triangulated2026Low single-digit $BFast-growingLowNo standard analyst measure
Fireworks revenue (SOM)Fireworks / Sacra2025-2026$280M -> ~$800MHigh growthLowConflicting vintages

Three-lens triangulation; the top-down TAM overstates Fireworks' reachable market, so SAM/SOM rely on company-level estimates with low confidence.

[CM006, CM007, CM008, CM009, CM010, CM011]
FM001: Market sizing lens

TAM/SAM/SOM layers for the AI inference opportunity.

Layers use different vintages; SAM is a triangulated estimate, not an analyst measure.

[CM006, CM009, CM010, CM012]
FM002: Market estimate range

Low/base/high estimates of the AI inference market by forecast year, in USD billions.

Ranges span MarketsandMarkets, Polaris, Fortune, Research and Markets and Gartner estimates; units are USD billions.

[CM006, CM007, CM008]

2.3 Buyer and Segment Map

Demand for Fireworks spans three buyer segments with different adoption paths. AI-native startups (for example Cursor, Perplexity and Liner) adopt bottoms-up: individual developers start with self-serve API keys and pay-as-you-go billing, and the economic buyer is an engineering or platform lead. Digital-native enterprises (DoorDash, Notion, Shopify, Upwork, Quora) move features from pilot to production and expand into dedicated deployments and fine-tuning, with budget owned by product-engineering organizations. Traditional and regulated enterprises (Samsung and, increasingly, healthcare and financial-services buyers) adopt top-down through negotiated contracts, requiring SSO, audit logs, data residency and HIPAA or SOC2 posture, with budgets owned by platform and procurement functions. Across all three, the user is a developer, the payer is an engineering budget, and the dominant adoption trigger is the cost, latency or control limitations of closed-model APIs at production scale. Fireworks' AWS Strategic Collaboration Agreement and Microsoft Foundry availability let it reach these buyers inside existing cloud procurement channels rather than as a standalone vendor.[CM013, CM014, CM015, CM016, CM017]

Segment / buyer map
SegmentBuyerUserPayerAdoption trigger
AI-native startupsEng/platform leadDevelopersEng budgetClosed-API cost/latency at scale
Digital-native enterprisesProduct-eng orgDevelopersEng budgetPilot-to-production scaling
Regulated/Fortune 500Platform + procurementInternal devsProcurement budgetData control & compliance
Voice/agent buildersProduct ownerApp usersProduct budgetSub-500ms latency need
RAG/search teamsEng leadDevelopersEng budgetRetrieval latency & cost

Across segments the user is a developer and the payer an engineering or procurement budget; adoption triggers differ by maturity and regulation.

[CM013, CM014, CM015, CM016]
FM003: Buyer / segment map

Buyer-user-payer relationships and the adoption path into Fireworks.

[CM013, CM014, CM017]
FM004: Adoption funnel or value-chain map

Purchase and deployment stages from awareness to enterprise standardization.

Stages synthesized from Fireworks go-to-market descriptions; values are illustrative relative weights, not disclosed conversion rates.

[CM015, CM016, CM017]

2.4 Growth Drivers and Adoption Constraints

Several drivers expand Fireworks' market. Open-source model quality is converging on closed counterparts, agentic and compound AI systems multiply inference calls per task, fine-tuning on proprietary data is becoming a competitive necessity, and enterprises increasingly want to own their AI rather than depend on a few closed labs. Cost pressure also helps: open-weight inference can run materially cheaper than closed APIs at scale. Working against these are powerful constraints. Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks and triggering a price race in which Fireworks' Llama 70B price sits within roughly 2% of Together's. Hyperscaler bundling lets AWS, Azure and Google fold inference into existing security, billing and governance relationships. GPU supply is concentrated and Fireworks does not own its fleet. Regulation such as the EU AI Act adds compliance overhead, and the OpenAI-compatible API that lowers switching-in cost also lowers switching-out cost, capping durable lock-in.[CM018, CM019, CM020, CM021, CM022, CM023]

Growth drivers and constraints table
Driver / constraintDirectionTimingImplicationDiligence ask
Open-model quality convergenceDriverNowExpands addressable workloadsTrack OSS vs closed quality gap
Agentic / compound AIDriver1-2 yrsMore inference calls per taskMeasure tokens-per-workflow growth
Fine-tuning on proprietary dataDriverNowHigher-value, stickier spendAssess RFT/SFT attach rates
Enterprise data ownershipDriver1-3 yrsOpen-model preferenceSurvey buyer build-vs-buy
Inference commoditizationConstraintNowMargin/price compressionMonitor vLLM/SGLang parity
Hyperscaler bundlingConstraintNowChannel capture riskAssess Bedrock/Azure overlap
GPU supply concentrationConstraintOngoingCapacity/cost exposureReview GPU contracts
Regulation (EU AI Act)Constraint1-3 yrsCompliance overheadMap obligations by tier

Drivers expand the market while constraints compress margins or capture the channel; timing indicates when each materially affects adoption.

[CM018, CM019, CM020, CM021, CM022, CM023]

2.5 Sizing Gaps and Contradictory Estimates

Several gaps limit confidence in market sizing. Published "AI inference" totals vary widely and bundle incompatible categories (chips, hyperscaler services and independent software), so the top-down TAM cannot be cleanly mapped to Fireworks' reachable revenue. The independent inference-provider revenue pool is not measured by any standard analyst; it must be assembled from individual company estimates of uneven vintage and reliability. Forecast CAGRs range from roughly 13% to 19% across houses, and 2034 estimates differ by more than $200 billion. Within this, Fireworks' own revenue figures are themselves contested across sources. These gaps mean the market is clearly large and growing fast, but the serviceable and obtainable shares relevant to valuation remain estimates rather than measured facts, and any sizing should be treated as directional. We preserve the failed precision rather than assert a single SAM.[CM025, CM026, CM027, CM028]

2.6 Exhibits

Chapter 03

03Competitors

3.1 Competitive Landscape

The inference market has segmented into four distinct competitive layers, and Fireworks faces pressure from each. Managed open-model platforms, principally Together AI, Baseten and Replicate, are the closest direct peers, competing on model breadth, developer experience and per-token price. Vertically integrated silicon players, Groq, Cerebras and SambaNova, attack latency and cost from custom hardware rather than software optimization on commodity GPUs. Hyperscaler bundles, AWS Bedrock, Google Vertex AI, Microsoft Azure Foundry and Databricks Model Serving, are the most structurally threatening because they collapse model access, infrastructure, governance and contracting into one platform. Finally, open-source serving frameworks such as vLLM and SGLang, plus packaging layers like NVIDIA NIM and routers like OpenRouter, commoditize the proprietary advantage embedded in Fireworks' own stack. Status-quo alternatives include continued use of closed APIs and internal self-hosting. The most likely new entrant pressure comes from NVIDIA itself, which entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, turning a key supplier into a rival.[CP001, CP002, CP003, CP004, CP005, CP006]

FP001: Competitive positioning map

Providers plotted by price-competitiveness (x) and enterprise/breadth depth (y).

Axis positions are qualitative author judgments synthesizing pricing and capability evidence.

[CP001, CP014, CP015]

3.2 Competitor Profiles

Together AI is Fireworks' closest direct competitor: founded in 2021 by Percy Liang, Chris Ré and Vipul Ved Prakash, it raised a $305 million Series B in February 2025 at a $3.3 billion valuation, reportedly reached about $1 billion annualized revenue by early 2026, and spans serverless inference, dedicated clusters, fine-tuning, voice and reinforcement learning. Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment; it raised a $300 million round in January 2026 at a $5 billion valuation, led by IVP and CapitalG with a reported $150 million from NVIDIA, lifting total funding to roughly $585 million. Groq competes from custom LPU silicon, raising $750 million in September 2025 at a $6.9 billion valuation and advertising 750-plus tokens per second on Llama models, with a Meta partnership powering the official Llama API. Cerebras and SambaNova extend the hardware-led attack at the premium latency end, while Replicate, Modal and Anyscale compete for developer mindshare. Against these, Fireworks holds a $4 billion valuation and $280 million-plus revenue with category-leading reliability and function calling.[CP007, CP008, CP009, CP010, CP011, CP012]

Competitor profile table
CompetitorLayerFunding / valuationTarget customerProduct scopeIndicative price (Llama 70B)Strategic direction
Fireworks AIManaged open-model$327M raised / $4.0BAI-native + enterprise devsServerless, fine-tuning, RFT, dedicated, voice$0.90/MUp the stack: tuning, agents, governance
Together AIManaged open-model$533.5M / $3.3B (talks $7.5B)Startups to enterpriseServerless, clusters, fine-tuning, voice, RL$0.88/MOwned GPU clusters + breadth
BasetenManaged open-model~$585M / $5.0B (talks $11B)Compliance-heavy enterpriseCustom models, VPC/self-host runtimeQuote-basedEnterprise inference engineering
ReplicateManaged open-modelPrivate / undisclosedDevelopers / experimentationBroad model catalog, run-by-APIPer-runDeveloper mindshare top of funnel
GroqVertical silicon$750M+ / $6.9BLatency-sensitive workloadsLPU inference API$0.59/MCustom silicon + Meta Llama API
Cerebras / SambaNovaVertical siliconPrivate / multi-$BPerformance-sensitiveWafer-scale / RDU inferenceQuote-basedHardware-led latency leadership
AWS Bedrock / Azure / VertexHyperscaler bundlePublic mega-capsExisting cloud enterprisesBundled model access + governanceBundledVendor consolidation
Databricks / NVIDIA NIMHyperscaler / packagingPublic / privateData-platform & infra buyersModel serving / NIM packagingBundledAbsorb inference into platform

Funding and valuation from company announcements and Sacra; prices are indicative Llama 70B serverless rates and vary by tier and date.

[CP007, CP008, CP009, CP010, CP011, CP014]

3.3 Capability, Pricing and GTM Comparisons

On capability, Fireworks differentiates through reliability and structured output: independent monitoring put its Q1 2026 uptime at 99.8%, the highest among specialized providers, and its FireFunction models hit roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o. On price, the field is razor-thin: Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq, and the same model spreads roughly sixfold across the seven-provider field. On raw speed Groq's LPU dominates at 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency and stability under load. On go-to-market, Together and Baseten match Fireworks' bottoms-up developer motion, but hyperscalers win distribution through existing procurement, security and billing relationships. On trust and regulation, Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance-heavy deployments.[CP014, CP015, CP016, CP017, CP018, CP019]

Feature / capability matrix
CapabilityFireworksTogether AIBasetenGroq
Serverless open-model APIYesYesYesYes
Model catalog size50+200+Custom-focused15-20
LoRA fine-tuningYesYes + full FTYesNo
Function calling qualityBest-in-class (~92%)GoodGoodBasic
Custom siliconNoNoNoYes (LPU)
VPC / self-hostedEKS airgappedDedicatedYes (core strength)Limited
Voice agent platformYesYesPartnerNo
Reinforcement fine-tuningYesYesPartialNo

Compiled from provider docs, TokenMix and Sacra; 'Best-in-class' reflects independent FireFunction benchmark results.

[CP014, CP015, CP016, CP017]
Pricing / packaging comparison
MetricFireworksTogether AIGroqNote
Llama 3.3 70B ($/1M)$0.90$0.88$0.59Fireworks ~2% over Together, 66% under Bedrock
Llama 3.3 8B ($/1M)$0.20$0.18$0.05Groq cheapest
Q1 2026 uptime99.8%99.7%99.4%Fireworks highest
Throughput (tok/sec)14595420Groq fastest
TTFT P50150ms220ms65msGroq lowest latency
Fine-tuningLoRA $16/MLoRA+full $14/MNoneTogether cheapest/broadest
Batch APINot yetYes (30-50% off)NoTogether advantage

Prices and benchmarks from TokenMix April 2026 and DeployBase; figures are indicative and change frequently.

[CP014, CP015, CP018]
FP002: Feature breadth / capability map

Capability coverage across the four direct and silicon competitors.

Capability cells summarized from provider documentation and benchmarks.

[CP016, CP017]

3.4 Switching Costs, Lock-in and Distribution Power

Switching costs in inference are structurally low. Most providers, including Fireworks, Together, Groq and Baseten, expose OpenAI-compatible APIs, so migration between them can take minutes, and routing aggregators such as OpenRouter and TokenMix actively encourage multi-homing and automatic failover across providers. This caps durable lock-in for everyone and means share is defended by performance, tuning and enterprise integration rather than contracts. Distribution power is increasingly decisive: hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on. Fireworks' counter is to plug into those channels through its AWS Strategic Collaboration Agreement and Microsoft Foundry availability, while moving up the stack into fine-tuning, reinforcement learning, voice and enterprise governance to create stickier, higher-value relationships. Baseten's VPC and self-hosted footprint and Together's owned data-center and GPU-cluster strategy are alternative answers to the same distribution and supply problem.[CP020, CP021, CP022, CP023, CP024]

3.5 Moat Durability and Adverse Evidence

Fireworks' moat is real but narrow. Its proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance-and-price advantage, and its reliability and function-calling lead are genuine. But the moat faces clear erosion vectors. Open-source serving frameworks like vLLM and SGLang keep closing the performance gap, and Baseten openly builds on them; NVIDIA pushes NIM as a packaging layer; Snowflake released Arctic Inference as an open vLLM plugin. Better-capitalized rivals raise the stakes: Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion all have more balance-sheet room for GPU commitments and enterprise go-to-market. Hardware concentration is an adverse signal too, since Fireworks does not own GPUs while NVIDIA, a supplier and investor, now competes directly. The durable question is whether Fireworks can keep extending its stack into tuning, agents and governance faster than the ecosystem commoditizes the serving layer.[CP025, CP026, CP027, CP028, CP029, CP030]

Moat durability / competitive risk register
RiskMechanismSeverityEvidence
OSS serving parityvLLM/SGLang close performance gapHighBaseten builds on SGLang/vLLM/TGI
NIM packagingNVIDIA standardizes enterprise inferenceMediumNVIDIA pushes NIM distribution
Supplier-as-competitorNVIDIA enters inference via LeptonHighNVIDIA GPU-cloud marketplace
Hyperscaler bundlingBedrock/Azure absorb inferenceHighBedrock custom model import (Qwen)
Capital asymmetryRivals raise larger roundsMediumGroq $6.9B, Baseten $5B
Price commoditizationRazor-thin per-token spreadsHighFireworks within 2% of Together
Low switching costOpenAI-compatible APIs + routersMediumOpenRouter multi-homing
Hardware concentrationNo owned GPU fleetMediumSources NVIDIA/AMD third-party

Risk register synthesizing Sacra analysis and pricing/benchmark sources; severity is the author's qualitative judgment.

[CP025, CP026, CP027, CP028, CP029, CP030]
FP003: Moat / readiness KPIs

Indicators of Fireworks' competitive standing.

KPIs synthesize benchmark and funding evidence; speed ratio is Fireworks throughput over Groq.

[CP014, CP015, CP028]

3.6 Exhibits

Chapter 04

04Financials

4.1 Revenue Streams and Pricing Model

Fireworks operates a usage-based B2B model layered across several product surfaces that map to the customer lifecycle. Serverless inference is billed per token, fine-tuning is billed per training token, reinforcement fine-tuning is billed per GPU-hour, and on-demand dedicated deployments are billed per GPU-second or GPU-hour, while reserved capacity is contracted separately on longer commitments at negotiated pricing. This lets Fireworks capture revenue at nearly every stage of a customer's AI workflow, from experimentation through scaled production. Published serverless rates illustrate the model: roughly $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3, with image generation from about $0.013 to $0.04 per image and reserved capacity near $4.80 per hour per replica. Revenue mix is not disclosed, but analysts expect a shift toward higher-value dedicated deployments, fine-tuning and enterprise contracts rather than commodity serverless token volume, which would improve both margin and revenue durability over time.[CI001, CI002, CI003, CI004, CI005]

Revenue streams table
StreamBilling basisLifecycle stageMargin profileDisclosure
Serverless inferencePer tokenExperimentation & productionLower (commodity)Rates public
Fine-tuning (LoRA/SFT)Per training tokenAdaptationHigherRates public
Reinforcement fine-tuningPer GPU-hourAdaptation/agentsHigherRates public
On-demand dedicatedPer GPU-second/hourProduction scalingHigherRates public
Reserved capacityContracted commitmentScaled enterpriseHighest (negotiated)Not public
Voice / multimodalUsage-basedExpansionMixedPartially public

Streams and billing bases from Sacra and Fireworks pricing; revenue mix across streams is not disclosed and margin profile is qualitative.

[CI001, CI002, CI003]
Pricing / monetization table
ItemPriceUnitNote
Llama 3.3 70B$0.90per 1M tokens~2% over Together, 66% under Bedrock
Llama 3.3 8B$0.20per 1M tokensEntry workloads
DeepSeek V3$0.50per 1M tokensFrontier open model
Flux 1.1 Pro$0.04per imageUp to 1024x1024
SDXL 1.0$0.013per imageLower-cost image gen
Reserved capacity$4.80per hour per replica~50 concurrent requests
LoRA fine-tune (70B)$16per 1M training tokens$2/M over Together
Free credits$1one-timeNo ongoing free tier

Indicative April 2026 serverless rates from TokenMix and DeployBase; prices change frequently and exclude negotiated enterprise terms.

[CI004, CI005, CI007]
FI001: Revenue model bridge

How usage-based streams build toward total revenue across the customer lifecycle.

Stream shares are illustrative; Fireworks does not disclose revenue mix.

[CI001, CI002, CI003]

4.2 Go-to-Market and Sales Efficiency

Fireworks' go-to-market is bottoms-up at entry and top-down at expansion. Developers start immediately with self-serve API keys and pay-as-you-go billing, supported by $1 of free credits rather than an ongoing free tier, and a standard rate limit near 600 requests per minute. Larger customers graduate into negotiated enterprise relationships with higher rate limits, reserved capacity, account management, custom optimization and private deployment. Layered on top is a field and partner sales motion, anchored by an AWS Strategic Collaboration Agreement that funds proofs-of-concept and a startup acceleration program, giving Fireworks access to enterprise buyers through existing procurement channels rather than requiring a standalone vendor evaluation. Sales-efficiency metrics such as CAC, payback and net revenue retention are not disclosed, but the land-and-expand structure, in which a single serverless feature can grow into dedicated, fine-tuning, voice and reserved-capacity spend, is the principal efficiency lever, with blended annualized revenue per company estimated near $28,000 across a base skewed toward a smaller number of large production deployments.[CI006, CI007, CI008, CI009, CI010]

Unit economics table
MetricValue / statusDriverConfidence
Gross margin~50%GPU COGS heavyMedium
Target gross margin60%Utilization + Blackwell + mixLow
Blended ARPA~$28K/yr10,000+ companiesLow
Revenue concentrationSkewed to large deploymentsProduction whalesLow
Multi-LoRA utilizationMany variants per base modelLower cost/variantMedium
CAC / paybackNot disclosedBottoms-up + partner salesLow
Net revenue retentionNot disclosedLand-and-expandLow

Unit-economics figures are Sacra estimates or qualitative; CAC, payback and NRR are not public.

[CI008, CI009, CI011, CI012, CI013]

4.3 Cost Structure and Gross Margin Drivers

Fireworks is not a pure software business: GPU procurement, capacity planning and regional infrastructure are real cost inputs embedded in cost of goods sold, which is why Sacra estimates gross margin near 50%, well below the 70%-plus typical of subscription software. Management has told investors it targets 60% through better GPU utilization, hardware efficiency gains on newer architectures such as NVIDIA Blackwell, and a revenue-mix shift toward dedicated and enterprise workloads. The core economic logic is that proprietary inference optimization, FireAttention and FireOptimizer, translates engineering into pricing power: if Fireworks serves a model faster and at higher throughput than a customer could self-host, it can charge a premium while undercutting the alternative's total cost. Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant. The cost environment is shaped by NVIDIA and AMD data-center GPU economics, both of which report rapidly growing AI accelerator revenue, underscoring that Fireworks' input costs sit inside a supplier-driven, capacity-constrained market.[CI011, CI012, CI013, CI014, CI015, CI016]

FI002: Unit economics bridge

How GPU cost becomes gross margin via proprietary optimization and pricing power.

[CI011, CI012, CI013, CI014]

4.4 Public Traction Versus Private-Metric Gaps

Public traction signals are strong but inconsistently dated. Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C; Sacra estimates roughly $305 million at year-end 2025 rising to about $800 million annualized by May 2026; a third-party profile cites $315 million-plus by early 2026; and earlier 2025 coverage reported $130 million ARR with claims of profitability and roughly 20x year-over-year growth. The platform processes more than 10 trillion tokens per day (15 trillion by early 2026) across 10,000-plus companies and hundreds of thousands of developers. These are largely company-stated or estimated figures; audited financials, revenue mix, net revenue retention, churn and headcount are not public. The wide revenue spread, from $130 million to roughly $800 million annualized within twelve months, reflects both genuine hypergrowth and inconsistent measurement, and any single number should be treated as directional rather than verified.[CI017, CI018, CI019, CI020, CI021]

Public financial gaps table
MetricPublic statusWhat is missingDiligence path
Revenue / ARRConflicting estimatesSingle reconciled dated figureManagement-confirmed ARR
Gross marginAnalyst estimate ~50%Audited marginConfirmatory financials
Net revenue retentionNot disclosedExpansion / churn dataCohort retention pack
HeadcountNot disclosedEmployee countHR / LinkedIn estimate
Burn & runwayNot disclosedCash flow statementBank balances + burn
Revenue mixNot disclosedStream-level splitProduct revenue breakdown

All listed metrics are private; the table frames the diligence asks needed to verify financial quality.

[CI017, CI018, CI020, CI028, CI029]
FI003: Financial estimate range

Annualized revenue estimates for Fireworks by source and vintage, in USD millions.

Estimates span company statements and third-party analysts of differing vintage; ranges approximate stated point figures.

[CI017, CI018, CI019]

4.5 Capital Adequacy and Financing Dependency

Fireworks has raised more than $327 million across seed, Series A, B and C rounds; the October 2025 Series C alone provided $250 million, comprising roughly $230 million of primary capital and a $20 million secondary, at a $4 billion valuation. That primary injection, combined with reported profitability in 2025 and a high-growth revenue base, suggests comfortable near-term capital adequacy, though cash on hand, burn rate and runway are not disclosed. The company has signaled it will grow its compute footprint three-to-four-fold over the next year, a capital-intensive plan that increases dependence on GPU access and could become the next-round trigger; Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026. The principal financing dependency is GPU supply: Fireworks does not own its fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation constraints and to NVIDIA's own entry into inference. No public debt or project-finance obligations are disclosed.[CI022, CI023, CI024, CI025, CI026]

Capital adequacy table
ItemValue / statusAs ofNote
Total raised>$327MOct 2025Seed through Series C
Series C size$250MOct 2025$230M primary + $20M secondary
Valuation$4.0BOct 2025Post-money
ProfitabilityReported profitableMid-2025Per scroll.media; unverified
Cash / burn / runwayNot disclosed2026Diligence blocker
Planned use of funds3-4x compute expansionNext yearCapital-intensive
Next-round signal$15B talksMay 2026Per Sacra; unconfirmed
Debt / project financeNone disclosed2026No public obligations

Capital figures from company and Sacra; cash, burn and runway are not public, limiting capital-adequacy assessment.

[CI022, CI023, CI024, CI025]
FI004: Capital intensity / cash-flow map

How capital flows into compute and infrastructure and back into revenue and margin.

Flow synthesizes stated use of funds and analyst estimates; cash and burn are not disclosed.

[CI022, CI024, CI025, CI026]

4.6 Financial Verdict

On revenue quality, Fireworks shows credible hypergrowth and a usage-based model that captures spend across the customer lifecycle, but the absence of audited figures, disclosed revenue mix and retention metrics caps confidence. On margin, the roughly 50% gross margin is the central financial weakness: it is structurally below software norms because of GPU costs, and the path to the stated 60% target depends on utilization gains and a mix shift that are plausible but unproven. On capital intensity, the three-to-four-fold compute expansion and lack of owned GPUs make the model more capital-hungry and supply-dependent than a typical SaaS company. The main diligence blockers are a single reconciled revenue figure, gross-margin and unit-economics verification, burn and runway, and net revenue retention. The picture is a fast-scaling, well-funded business with real but supplier-exposed economics rather than a proven high-margin software compounder.[CI027, CI028, CI029, CI030]

4.7 Exhibits

Chapter 05

05Product & Technology

5.1 Product Definition in Customer Workflow Terms

In customer terms, Fireworks is the layer that takes an open-source model and makes it run in production fast, cheaply and reliably without the customer managing GPUs. A developer signs up, points an OpenAI-compatible API at a model such as Llama 4, DeepSeek or Qwen, and gets low-latency inference with function calling, JSON-mode structured output and streaming. As usage grows, the same customer can fine-tune a model on proprietary data, move to dedicated or reserved GPU capacity for guaranteed throughput, add retrieval and embeddings for RAG, and deploy voice agents. The platform spans text, image (Flux, SDXL), audio and multimodal formats across hundreds of models with day-zero support for major new releases. The core job it does for customers is to collapse the gap between a model that works in a notebook and one that serves millions of users in production, which Fireworks positions as the difference between experimentation and shipping. This is why its customers describe it as an inference engine rather than an application: it supplies speed, cost and control, while the customer builds the product.[CE001, CE002, CE003, CE004, CE005]

Workflow / use-case table
Use caseCustomer exampleResultSource type
Code generationCursor~1,000 tokens/sec Fast ApplyCustomer story
Productivity AINotionLatency 2s -> 350msCustomer story
Code assistanceSourcegraph30% lower latency, 2.5x acceptanceCustomer/AWS
Proposal draftingUpwork (Uma)Real-time tailored proposalsCustomer story
Conversational searchQuora (Poe)Tripled response speedReported
Email assistantSuperhumanAsk AI compound systemCustomer story
Enterprise searchHebbiaFast access to new open modelsAnalyst

Use cases and outcomes drawn from Fireworks customer stories, an AWS case study and analyst coverage; results are vendor- or customer-reported.

[CE002, CE018, CE019, CE020]
FE002: Customer workflow / operating flow

Developer journey from API call through speculative decoding to response.

[CE001, CE013, CE015]

5.2 Product Module and Asset Map

Fireworks' product surface decomposes into several modules. Serverless inference is the entry product: pay-per-token access to 50-plus actively served models (hundreds across the catalog), including Llama 4 Scout and Maverick, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4, with image generation via Flux and SDXL and vision models. FireFunction is the proprietary function-calling model family for tool use and structured output. The customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks, all exposed through a Build SDK and an Experiment Platform. The deployment modules span serverless, on-demand dedicated and reserved capacity, plus multi-LoRA hosting that packs many fine-tuned adapters onto one base deployment. Newer surfaces include the Voice Agent Platform, which co-locates transcription, language models and tool calling for sub-500ms response, and BYOB secure training that lets enterprises train from their own AWS S3 buckets. Together these modules let a single customer relationship expand from one serverless feature into a full production AI runtime.[CE006, CE007, CE008, CE009, CE010]

Product module / asset matrix
ModuleWhat it doesBillingMaturity
Serverless inferencePer-token access to 50+ served modelsPer tokenGA
FireFunctionFunction calling / structured outputPer tokenGA
LoRA fine-tuning / SFT V2Customize models with QATPer training tokenGA
Reinforcement fine-tuningTrain agents to surpass closed modelsPer GPU-hourGA
Dedicated / reserved deploymentsGuaranteed throughput on dedicated GPUsPer GPU-hourGA
Multi-LoRA hostingMany adapters on one base modelPer tokenGA
Voice Agent PlatformSTT + LLM + tool calling, sub-500msUsage-basedNewer
Build SDK / Experiment PlatformProgrammatic build, tune, evaluateIncludedNewer

Module list compiled from Fireworks blog and docs; maturity is qualitative (GA = generally available, Newer = recently launched).

[CE006, CE007, CE008, CE009]

5.3 Architecture and Operating Model

Fireworks runs a proprietary, multi-layer inference stack on commodity NVIDIA GPUs. At the kernel layer, FireAttention is a custom CUDA attention implementation that Fireworks reports as substantially faster than vLLM and TensorRT-LLM, extended across versions to support long context and architectures like Llama 4's chunked local attention. Above it, FireOptimizer performs adaptive speculative execution, personalizing speculative decoding, draft-model selection and caching to each workload, with reported latency reductions up to roughly 3x in production and native FP4 support on NVIDIA Blackwell B200 hardware. The serving topology combines a stateless request router, draft and target GPU pods for speculative decoding, a distributed KV cache, continuous batching and disaggregated serving, scaling to documented tests around 50,000 requests per minute. Multi-LoRA consolidates many fine-tuned variants onto a single base model. The operating model is open-model neutral: Fireworks bets on running whichever open model is winning at a given moment rather than on any single model, which makes day-zero support for new releases a core engineering discipline.[CE011, CE012, CE013, CE014, CE015, CE016]

Technology / operating architecture table
LayerComponentFunctionDifferentiation
APIOpenAI-compatible APIModel access, streaming, JSON modeLow switching-in cost
OrchestrationStateless request routerRoute requests across podsScale to ~50K RPM
OptimizationFireOptimizerAdaptive speculative executionUp to ~3x lower latency
SpeculationDraft + target podsSpeculative decodingParallel token generation
KernelFireAttentionCustom CUDA attentionFaster than vLLM/TensorRT-LLM
MemoryDistributed KV cacheReuse context, cut prefillLower latency on long context
AdaptationMulti-LoRAMany adapters per base modelHigher GPU utilization
HardwareNVIDIA/AMD GPUs (incl. B200)Compute substrate, FP4Day-zero on new silicon

Architecture compiled from Fireworks blog/docs and independent technical write-ups; performance claims are vendor- or analyst-reported.

[CE011, CE012, CE013, CE014, CE015]
FE001: Product architecture map

The layered Fireworks inference stack from API down to GPU hardware.

Layering synthesized from Fireworks blog/docs and independent architecture write-ups.

[CE011, CE012, CE013, CE014]

5.4 Deployment, Reliability, Integration and Roadmap

Fireworks supports serverless, on-demand dedicated and reserved deployments across a global multi-region fleet with documented locations including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions, enabling latency and data-residency requirements. Integration is eased by an OpenAI-compatible API plus SDKs and connectors for frameworks such as LangChain and LlamaIndex, so migration from closed APIs can take minutes. Reliability is a headline claim: independent monitoring placed Q1 2026 uptime at 99.8%, the highest among specialized providers, with strong stability under load. Documented production results include Cursor reaching about 1,000 tokens per second for code generation, Notion cutting AI response latency from roughly 2 seconds to 350 milliseconds, and Sourcegraph seeing a 30% latency reduction and a 2.5x increase in completion acceptance. The roadmap, funded by the Series C, targets deeper research in tuning and inference alignment, an end-to-end model-lifecycle toolchain, and a three-to-four-fold expansion of global compute, alongside the Hathora acquisition to deepen real-time orchestration.[CE017, CE018, CE019, CE020, CE021, CE022]

Roadmap / release / development-stage table
ItemStageTimingImplication
FireAttention (v2+)Shipped2024+Long-context speed
FireFunction V2Shipped2024Function calling
FireOptimizerShipped2024Adaptive optimization
Supervised Fine-Tuning V2ShippedJun 2025QAT, more models
Reinforcement fine-tuningShipped2025Agentic tuning
Voice Agent PlatformShipped2025-2026New budget category
Microsoft Foundry launchShippedMar 2026Azure distribution
Model-lifecycle toolchainPlanned2026+End-to-end creation
3-4x compute expansionPlanned2026Capacity scale

Release timeline from Fireworks blog, docs changelog and analyst coverage; planned items are company-stated roadmap intent.

[CE008, CE021, CE022]
FE004: Product maturity / capability map

Maturity of each module across capability dimensions.

Maturity cells are author judgments synthesizing product and compliance evidence.

[CE006, CE008, CE029, CE031]

5.5 Differentiation, IP and Data

Fireworks' differentiation is engineering-led. Its core intellectual property is the proprietary inference engine, especially FireAttention's custom kernels and FireOptimizer's adaptive optimization, which convert systems expertise from the founders' PyTorch background into measurable speed and cost advantages; no public patents are listed, so the moat is know-how rather than registered IP. A second source of differentiation is product-model co-design: a data feedback loop in which customer interactions continuously improve fine-tuned models, which Fireworks frames as how enterprises build a competitive moat with AI. A third is breadth and freshness: day-zero support across hundreds of open models and modalities, so the platform benefits from model turnover rather than being threatened by it. The principal vulnerability is that the optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so the differentiation must be continuously re-earned. Supply access to leading-edge NVIDIA and AMD GPUs is a further enabling, but not exclusive, advantage.[CE023, CE024, CE025, CE026, CE027]

FE003: Critical dependency map

Upstream dependencies the Fireworks platform relies on.

Dependency graph synthesized from technical sources; edge direction shows upstream-to-platform reliance.

[CE015, CE024, CE026, CE027]

5.6 Trust, Safety, Security and Compliance

Fireworks' enterprise posture is built for regulated buyers. The platform offers zero data retention by default, single sign-on, audit logs, and data-residency controls, and its AWS-based inference solution is HIPAA and SOC2 Type II compliant. For the most sensitive workloads it supports airgapped EKS deployments and bring-your-own-bucket secure training that keeps training data in the customer's own AWS S3. Structured output controls such as JSON mode and grammar-constrained decoding improve reliability and reduce malformed responses in agentic workflows, and FireFunction's high schema-compliance rate supports dependable tool use. These capabilities open regulated verticals including healthcare, financial services and government-adjacent workloads that were previously inaccessible to a standalone inference vendor. Quality is reinforced by continuous evaluation and reinforcement learning in the product-model co-design loop. Gaps remain: Fireworks does not publish a formal standard-tier SLA, enterprise SLAs are negotiated case by case, and independent reviewers note thin documentation in places, which are diligence items for security-sensitive buyers.[CE028, CE029, CE030, CE031, CE032]

Trust / quality / compliance table
ControlStatusScopeNote
SOC2 Type IICompliantAWS-based inferencePer AWS case study
HIPAACompliantAWS-based inferenceEnables healthcare
Zero data retentionDefaultEnterprisePrivacy posture
SSO / audit logsAvailableEnterpriseGovernance
Data residencyAvailableMulti-regionFrankfurt/Iceland/Tokyo
Airgapped EKSAvailableSensitive workloadsIsolation
BYOB secure trainingAvailableSFT/RFTCustomer AWS S3
Standard-tier SLANot publishedServerlessNegotiated for enterprise

Compliance posture from AWS case study and Sacra; the absence of a published standard SLA is a diligence item.

[CE028, CE029, CE030, CE032]

5.7 Exhibits

Chapter 06

06Customers

6.1 Customer Base Segmentation

Fireworks' customer base spans three broad segments distinguished by buyer, user, payer and adoption path. AI-native startups, including Cursor, Perplexity, Liner and Cresta, adopt bottoms-up: individual developers start with self-serve API keys, and the economic buyer is an engineering or platform lead. Digital-native enterprises, such as DoorDash, Notion, Shopify, Upwork and Quora, move features from pilot into production and expand into dedicated deployments and fine-tuning, with budget owned by product-engineering organizations. Traditional and larger enterprises, exemplified by Samsung and Uber, and increasingly regulated buyers in healthcare and financial services, adopt top-down through negotiated contracts requiring compliance and data-residency controls. Across all three, the user is a developer, the payer is an engineering or procurement budget, and use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice. Geographically the base skews North American and European with global API access, and verticals span software, e-commerce, marketplaces, customer service and legal tech.[CU001, CU002, CU003, CU004, CU005]

Customer segmentation table
SegmentExample customersBuyer / payerUse caseAdoption path
AI-native startupsCursor, Perplexity, Liner, CrestaEng lead / eng budgetCode, search, conversational AIBottoms-up self-serve
Digital-native enterprisesDoorDash, Notion, Shopify, Upwork, QuoraProduct-eng / eng budgetProduction AI featuresPilot to production
Large / regulated enterprisesSamsung, UberPlatform + procurementEnterprise AI roadmapsTop-down contract
Enterprise search / agentsSourcegraph, HebbiaEng lead / eng budgetCode + enterprise searchLand-and-expand
Communication / productivitySuperhumanProduct ownerCompound AI assistantsFeature-led

Segments and example customers from Fireworks blogs, Sacra and AI Market Watch; segment boundaries are analytical and some customers span multiple.

[CU001, CU002, CU003, CU004]
FU001: Customer journey map

Stages a customer moves through from discovery to enterprise standardization.

Journey synthesized from Fireworks go-to-market descriptions; not all customers traverse every stage.

[CU002, CU009, CU022]

6.2 Adoption Trajectory

Adoption has scaled steeply. Fireworks reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from about 1,000 at the Series B, alongside hundreds of thousands of developers. The developer base grew from about 12,000 in February 2024 to 23,000 by the end of that year. Usage intensity is high: the platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026, indicating that many accounts run production rather than experimental workloads. Customers progress along a land-and-expand path, beginning with serverless inference for a single feature and expanding into dedicated deployments, fine-tuning, reinforcement fine-tuning, embeddings for retrieval and voice agents. Analyst commentary on Hebbia illustrates how a single inference relationship, anchored on fast access to new open models and high-concurrency latency guarantees, can grow into a broader infrastructure dependency. The trajectory is strong on breadth and usage, though account-level retention and cohort expansion data are not disclosed.[CU006, CU007, CU008, CU009, CU010]

Customer growth / adoption trajectory table
MetricValueAs ofSource basis
Companies served~1,000Series B (2024)Company-stated
Companies served10,000+Oct 2025Company-stated
Developers~12,000Feb 2024Reported
Developers~23,000Dec 2024Reported
DevelopersHundreds of thousandsOct 2025Company-stated
Tokens/day (Oct 2025)10T+Oct 2025Company-stated
Tokens/day (early 2026)~15TEarly 2026Third-party profile

Trajectory figures are company-stated or third-party; growth is rapid but account-level retention is not disclosed.

[CU006, CU007, CU008]
FU002: Adoption / deployment funnel

Relative narrowing from developer signups to standardized enterprise accounts.

Funnel values are illustrative relative weights; Fireworks does not disclose conversion rates.

[CU006, CU007, CU009]

6.3 Named Customer Proof

Fireworks has unusually strong named, production-grade proof points for a company of its age. Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply code generation, with an AI researcher publicly stating Fireworks is "way more performant than the open source engines" and used in production. Notion reduced AI response latency from about two seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering. Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x, and Upwork's "Uma" assistant drafts real-time proposals on Fireworks. Quora's Poe chatbot tripled response speed, and Superhuman built its Ask AI compound system on the platform. These are mostly production deployments with named executives and quantified outcomes, giving the reference base high quality and reasonable freshness, though several case studies date to 2024 and a few logos appear only in aggregate marketing lists without standalone case studies.[CU011, CU012, CU013, CU014, CU015, CU016]

Named customer proof table
CustomerDeploymentOutcomeReference qualityFreshness
CursorProduction~1,000 tok/sec Fast Apply; named researcher quoteHigh (quote + metric)2024-2025
NotionProductionLatency 2s -> 350ms; named exec quoteHigh (quote + metric)2025
SourcegraphProduction30% lower latency, 2.5x acceptanceHigh (AWS + story)2024
UpworkProductionUma real-time proposals; named execHigh (quote)2025
Quora (Poe)ProductionTripled response speedMedium (reported)2024
SuperhumanProductionAsk AI compound systemMedium (story)2024
SamsungEnterpriseAI roadmap accelerationMedium (investor cited)2025
DoorDashProductionHigh-throughput AI featuresMedium (logo + AWS)2025

Named, mostly production references with quantified outcomes; some date to 2024 and a few logos appear only in aggregate lists, hence partial coverage.

[CU011, CU012, CU013, CU014, CU015]
FU003: Customer proof matrix

Reference quality across deployment status, quantified outcome and named attribution.

Cells synthesize evidence quality from customer stories and the AWS case study.

[CU011, CU016, CU031]

6.4 Retention and Durability

Retention is the weakest-evidenced dimension of the customer story. Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths, so durability must be inferred from structural signals rather than measured. The positive signals are real: the platform's land-and-expand design, multi-product surface and enterprise controls encourage expansion, blue-chip logos run production workloads, and the OpenAI-compatible API plus reliability lead reduce reasons to leave once integrated. The negative signals are equally real: the same OpenAI-compatible API and the rise of routing aggregators make multi-homing and switching trivial, inference is commoditizing, and razor-thin price differentiation versus Together limits pricing-based stickiness. Independent reviewers explicitly note alternatives and switching paths. The net assessment is that durability is plausibly supported by product depth and integration but is not yet evidenced by disclosed retention metrics, which is a material diligence gap.[CU017, CU018, CU019, CU020, CU021]

Retention / repeat usage / satisfaction table
DimensionStatusSignalConfidence
Net revenue retentionNot disclosedLand-and-expand structureLow
Gross retention / churnNot disclosedNo public dataLow
Contract lengthNot disclosedEnterprise negotiatedLow
Repeat usageHigh (implied)10T+ tokens/day productionMedium
SatisfactionPositive (anecdotal)Named exec testimonialsMedium
Switching riskElevatedOpenAI-compatible API + routersMedium

Retention metrics are undisclosed; positive signals are structural/anecdotal while switching risk is elevated by low lock-in.

[CU017, CU018, CU019, CU020]
FU004: Retention / repeat cohort

Qualitative retention signal by customer segment (disclosed metrics absent).

Cohort cells are qualitative author judgments; Fireworks discloses no quantitative cohort retention.

[CU017, CU019, CU021]

6.5 Expansion and Concentration Risk

Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend, supported by an AWS Strategic Collaboration Agreement that reaches buyers through existing procurement channels. The principal concentration risks are twofold. First, revenue is likely skewed toward a smaller number of large production deployments, so blended annualized revenue per company near $28,000 understates a probable long tail beneath a few large accounts; the identity and share of top customers are not disclosed, creating top-customer risk that cannot be quantified. Second, distribution and partner dependence is real: the AWS alliance and Microsoft Foundry availability are growth accelerants but also channel dependencies, and several marquee customers (for example DoorDash and Shopify) are themselves sophisticated buyers capable of multi-homing or in-housing. Procurement friction is lower than for closed APIs because of cloud-marketplace availability, but enterprise sales cycles and compliance reviews still gate the largest deals.[CU022, CU023, CU024, CU025, CU026]

Expansion and concentration risk table
FactorDirectionDetailDiligence ask
Land-and-expandPositiveServerless -> dedicated/tuning/voiceMeasure expansion revenue %
Blended ARPANeutral~$28K/yr across baseGet ARPA distribution
Top-customer concentrationRiskRevenue skewed to large deploymentsDisclose top-10 revenue share
Channel dependenceRiskAWS + Microsoft Foundry channelsAssess direct vs partner-sourced mix
Customer multi-homingRiskSophisticated buyers can multi-homeCheck single-vendor commitments
Procurement frictionNeutralLower via cloud marketplacesMap enterprise sales-cycle length

Concentration and channel risks are inferred from analyst commentary and the AWS/Azure partnerships; top-customer share is undisclosed.

[CU022, CU023, CU024, CU025]

6.6 Exhibits

Chapter 07

07Risks

7.1 Severity-Ranked Risk Overview

Fireworks is a fast-scaling, well-funded business whose principal risks are commercial and structural rather than acute legal or operational failures. The highest-severity risks are inference commoditization and gross-margin compression, hyperscaler bundling that could capture the inference layer, and hardware-supply dependence on NVIDIA, which is simultaneously a supplier, an investor and, via its Lepton acquisition and NIM packaging, a competitor. Medium-severity risks include capital intensity from a planned three-to-four-fold compute expansion, key-person concentration in CEO Lin Qiao, low switching costs that cap retention, and an aggressive valuation ramp from $552 million to $4 billion and a rumored $15 billion. Lower but non-trivial risks include regulatory overhead from the EU AI Act and GDPR, open-model licensing constraints, the absence of registered patents, undisclosed burn and runway, and reliance on AWS and Microsoft distribution channels. The mitigation thesis is consistent across categories: move up the stack into tuning, agents, voice and enterprise governance faster than the serving layer commoditizes, while diversifying silicon and plugging into incumbent procurement channels. Residual exposure remains meaningful because several mitigations are unproven and several key metrics are undisclosed.[CR001, CR002, CR003, CR004, CR005, CR006]

Risk heatmap summary
RiskLikelihoodImpactMitigation maturityResidual exposure
Inference commoditization / marginHighHighMediumHigh
Hyperscaler bundlingMediumHighMediumHigh
NVIDIA supplier-competitorMediumHighLowHigh
Capital intensity / burnMediumMediumLowMedium
Key-person concentrationLowHighLowMedium
Low switching cost / churnHighMediumLowMedium
Valuation rampMediumMediumLowMedium
Regulatory (EU AI Act/GDPR)MediumLowMediumLow

Severity ratings are the author's qualitative synthesis of analyst, review and filing evidence; residual exposure reflects mitigation maturity.

[CR001, CR002, CR003, CR004, CR025]
FR001: Risk heatmap

Likelihood versus impact and residual exposure across major risk categories.

Cells are qualitative author judgments synthesizing analyst, review and filing evidence.

[CR001, CR002, CR003, CR007]

7.2 Regulatory and Legal Risk

Fireworks' regulatory and legal exposure is real but currently manageable. The most material regime is the EU AI Act, which imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose and foundation-model providers and their deployers; Fireworks' role as an inference and fine-tuning platform places it in the compliance chain for EU customers. GDPR and data-residency requirements drive the company's zero-data-retention, data-residency and regional-deployment features, and any lapse carries fines and reputational cost. Open-model licensing is a subtler legal risk: models such as Llama carry acceptable-use and license terms, and unresolved industry questions about training-data copyright could flow through to platforms that serve those models. Intellectual-property exposure runs the other way too: Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how that are harder to defend if key engineers leave. No material litigation or enforcement action against Fireworks is publicly known, and its Series C was executed with top-tier legal counsel, but the regulatory surface will expand as the company sells into healthcare, financial services and government-adjacent verticals.[CR007, CR008, CR009, CR010, CR011, CR012]

Regulatory / legal risk register
RiskRegime / sourceLikelihoodImpactMitigation
EU AI Act obligationsEU AI Act (GPAI/deployer duties)MediumMediumCompliance + documentation
Data privacy / GDPRGDPR / data residencyMediumMediumZero retention, EU regions
Open-model licensingLlama / model licensesLowMediumLicense compliance, model neutrality
Training-data copyright spilloverIndustry IP uncertaintyLowMediumServes third-party models
IP defensibilityNo registered patentsMediumMediumTrade-secret protection
Sector compliance expansionHIPAA / financial / govMediumLowSOC2/HIPAA posture
Litigation / enforcementNone known publiclyLowMediumTop-tier legal counsel

Regulatory register; no material litigation against Fireworks is publicly known, and several items are sector- and jurisdiction-dependent, hence partial coverage.

[CR007, CR008, CR009, CR010, CR011]

7.3 Operational, Quality and Security Risk

Operationally, Fireworks' defining exposure is GPU supply. The company does not own its fleet and sources NVIDIA and AMD capacity from third parties, leaving it exposed to allocation constraints, supply bottlenecks and hardware-transition timing as it scales compute three-to-four-fold. Reliability is a strength on observed data, with independently monitored Q1 2026 uptime of 99.8%, but Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case and incident history is opaque. Running a global multi-region fleet across Frankfurt, Iceland, Tokyo and US, EU and APAC regions adds operational complexity and cost. Security and compliance posture is comparatively strong, with SOC2 Type II and HIPAA on AWS-based inference, zero data retention, airgapped EKS and bring-your-own-bucket training, and no public breach is known; nonetheless, a single serious outage or data incident would be especially damaging given the production, latency-sensitive nature of customer workloads. Reviewers also flag thin documentation and potential support strain as the company scales, which are quality risks rather than safety risks.[CR013, CR014, CR015, CR016, CR017, CR018]

7.4 Partner and Dependency Risk

Fireworks sits inside a dense web of dependencies. The most acute is NVIDIA, which supplies the leading-edge GPUs Fireworks' performance and margin claims rely on, holds an investment stake, and now competes directly through its Lepton acquisition, a GPU-cloud marketplace and NIM packaging. AWS and Microsoft are both partners and threats: their Strategic Collaboration Agreement and Foundry availability provide distribution, but Bedrock, Vertex and Azure can bundle inference into existing security, billing and governance relationships and absorb the category. Fireworks also depends on the continued release and permissive licensing of open models from Meta, DeepSeek, Alibaba and others; a slowdown in open-model quality or a shift to restrictive licenses would undercut its open-model-neutral thesis. Cloud-platform dependence, capital-provider concentration among a handful of late-stage funds, and key-customer concentration among sophisticated buyers capable of multi-homing round out the dependency map. The common thread is that Fireworks' enabling partners are also its most credible competitors, so partnership depth and supplier diversification are central to the risk picture.[CR019, CR020, CR021, CR022, CR023, CR024]

Partner / dependency risk register
DependencyRoleRiskSeverity
NVIDIAGPU supplier + investor + competitorAllocation, supplier-as-rivalHigh
AMDAlternative silicon supplierSmaller ecosystem maturityMedium
AWSCloud + channel partnerBundling via BedrockHigh
MicrosoftFoundry distributionBundling via AzureMedium
Open-model labsMeta / DeepSeek / AlibabaModel supply & licensingMedium
Late-stage investorsCapital providersFinancing concentrationLow
Key customersSophisticated buyersMulti-homing / in-housingMedium

Dependency register; the recurring theme is that Fireworks' enabling partners are also its most credible competitors.

[CR019, CR020, CR021, CR022, CR023]
FR003: Dependency map

Critical external dependencies and their failure paths.

Dependency edges show upstream reliance; NVIDIA, AWS and Azure are simultaneously partners and competitors.

[CR019, CR020, CR021, CR022]

7.5 Financial, Model and Execution Risk

Financially, the central risk is margin compression. Gross margin near 50% is structurally below software norms because GPU costs sit in cost of goods sold, and a razor-thin price gap versus Together plus improving open-source serving frameworks create persistent downward pressure; the stated path to 60% depends on unproven utilization gains and a revenue-mix shift. Capital intensity compounds this: the three-to-four-fold compute expansion requires recurring capacity spend, and burn, runway and net revenue retention are undisclosed, so capital adequacy is asserted rather than verified. The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations that a slowdown or margin disappointment would punish. On execution and people, the founding team's PyTorch pedigree is a strength but concentrates key-person risk in CEO Lin Qiao, and retaining elite inference engineers in a hot market is a continuing challenge. The mitigation logic across all of these is the same up-the-stack diversification, but its success is the core open question of the investment.[CR025, CR026, CR027, CR028, CR029, CR030]

People / execution risk register
RiskDetailLikelihoodImpact
Key-person concentrationCEO Lin Qiao leads vision and fundraisingLowHigh
Founder/engineer retentionElite inference talent in hot marketMediumMedium
Org scalingRapid headcount and GTM build-outMediumMedium
Execution on roadmapUp-the-stack expansion unprovenMediumHigh
Governance opacityBoard composition undisclosedLowLow

People/execution risks are inferred from founder concentration and roadmap ambition; headcount and board details are undisclosed.

[CR029, CR030, CR033]

7.6 Mitigations, Monitoring and Thesis-Break Triggers

Fireworks' mitigations are coherent: extend the stack into fine-tuning, reinforcement learning, voice and enterprise governance to escape commodity serving; diversify silicon across NVIDIA and AMD and pursue Blackwell efficiency; maintain day-zero open-model support so model turnover is a tailwind; harden enterprise compliance to win regulated verticals; and plug into AWS and Azure procurement rather than fighting them. The monitoring indicators that matter are gross-margin trajectory toward 60%, the revenue mix shifting to dedicated and enterprise, net revenue retention once disclosed, GPU-cost and allocation terms, and the competitive gap versus vLLM and SGLang. The clearest thesis-break triggers are gross margin failing to rise off ~50% or compressing further, a hyperscaler or NVIDIA capturing the inference layer and relegating Fireworks to an optimization add-on, a key-person departure, or growth stalling below the pace implied by the $4 billion-plus valuation. The priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration. The overall residual exposure is moderate-to-high, concentrated in commoditization and dependency risk rather than legal or operational failure.[CR031, CR032, CR033, CR034, CR035, CR036]

Mitigation and kill criteria table
RiskMitigationMonitoring indicatorThesis-break trigger
CommoditizationMove up stack to tuning/agents/voiceRevenue mix shiftMargin stuck/declining at ~50%
Hyperscaler bundlingPlug into AWS/Azure channelsDirect vs partner mixInference absorbed by Bedrock/Azure
NVIDIA dependenceDiversify to AMD, Blackwell efficiencyGPU cost/allocation termsNVIDIA undercuts on price/access
Margin compressionUtilization + enterprise mixGross margin toward 60%Margin compresses below 50%
Key-person riskDeepen leadership benchExec retentionLin Qiao departure
Growth durabilityLand-and-expand + NRRNRR, logo growthGrowth stalls vs valuation

Mitigations and kill criteria synthesize analyst commentary and company strategy; triggers are the author's thesis-break thresholds.

[CR031, CR032, CR034, CR035, CR036]
FR002: Risk transmission map

How commoditization and dependency risks transmit into financial outcomes.

Transmission edges synthesize analyst risk analysis; direction shows risk propagation.

[CR001, CR002, CR025, CR026]

7.7 Exhibits

Chapter 08

08Valuation

8.1 Investment Thesis and Anti-Thesis

The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure, the runtime layer for open-model inference, at the exact moment enterprises shift from closed-API experimentation to owning customized models in production. It pairs a rare founding team that built PyTorch with genuine product advantages (FireAttention, FireOptimizer, best-in-class function calling, 99.8% uptime), blue-chip production references (Cursor, Notion, Sourcegraph, Upwork), and hypergrowth from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers. If managed inference is priced as durable infrastructure rather than a commodity, the valuation can compound. The anti-thesis is that inference is structurally commoditizing: gross margins sit near 50% because GPU costs dominate COGS, per-token prices sit within ~2% of Together, open-source serving frameworks keep closing the gap, switching costs are near zero, and the most powerful players, AWS, Azure and NVIDIA, are simultaneously partners and competitors capable of repricing the category. On this view, Fireworks risks becoming an optimization add-on on ~50% margins, and a valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, already prices in flawless execution.[CV001, CV002, CV003, CV004, CV005, CV006]

Thesis / anti-thesis table
DimensionBull thesisBear anti-thesis
MarketInference is the new runtime, huge TAMReachable SAM small vs hyperscalers
ProductFireAttention/FireOptimizer edge + reliabilityOSS frameworks close the gap
CustomersBlue-chip production referencesLow switching, multi-homing
FinancialsHypergrowth to ~$800M~50% margins, price race
CompetitionBest reliability + function callingSandwiched on price & speed
DependencyStrategic NVIDIA/AWS/Azure supportSame players can reprice category
ValuationInfrastructure multiple justifiedPrices in flawless execution

Symmetric thesis/anti-thesis framing; the deciding variables are margin trajectory and retention, both undisclosed.

[CV001, CV002, CV003, CV004, CV005]
FV001: Recommendation logic

How thesis factors combine into the track recommendation.

Logic flow summarizing the recommendation drivers; weights are qualitative.

[CV007, CV008, CV001, CV002]

8.2 Recommendation, Confidence and Stance

We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance, with an overall score of 6.5 out of 10. The business quality justifies close engagement and a position at the right entry, but the current and rumored prices demand conviction on two unproven variables: that gross margin can climb meaningfully off ~50% toward the stated 60% target, and that growth is durable rather than a commodity land-grab vulnerable to hyperscaler capture. At the October 2025 Series C, the $4 billion valuation implied roughly 14 times the company-stated $280 million annualized revenue; on Sacra's ~$800 million May 2026 estimate the same $4 billion would be about 5 times, but the rumored $15 billion round implies roughly 19 times that higher base. The wide range reflects genuine uncertainty about the right revenue figure and the right multiple for a sub-software-margin, fast-commoditizing category. The recommendation is therefore to track closely, underwrite to the base case, insist on entry discipline below the rumored mark, and require margin and retention disclosure before committing at a premium. Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR, and a reconciled revenue figure.[CV007, CV008, CV009, CV010, CV011]

Recommendation summary table
DimensionAssessmentBasis
RecommendationTrackHigh quality, demanding price
ConfidenceMediumUnaudited financials, undisclosed NRR
Risk ratingHighCommoditization + dependency
Valuation stanceStretched$15B talk on ~50% margins
Overall score6.5 / 10Strong business, rich price
Entry disciplineBelow rumored $15BUnderwrite to base case

Recommendation synthesizes thesis, financials, customers, competition and risk chapters; score is the author's composite judgment.

[CV007, CV008, CV009]

8.3 Financing Context and Entry Discipline

Fireworks has raised over $327 million across seed, Series A ($25M, 2024), Series B ($52M at $552M, July 2024) and Series C ($250M at $4B, October 2025), the last comprising roughly $230 million primary and a $20 million secondary. As of May 2026 it is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures, a near-quadrupling in about seven months. For a private late-stage entry, the key disciplines are the revenue base used to strike the multiple, the preference stack and any liquidation overhang, and dilution from continued raising into a capital-intensive compute build-out. Public evidence supports the growth and customer story but not the financial quality: revenue figures are unaudited and conflict across sources, gross margin is an analyst estimate, and burn and runway are undisclosed. The presence of strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table is double-edged, adding ecosystem support but also concentrating supplier and partner influence. Entry discipline should anchor on the base-case valuation, treat the $15 billion mark as a stretch that requires margin proof, and account for preference and dilution that are not publicly disclosed.[CV012, CV013, CV014, CV015, CV016]

8.4 Bull, Base and Bear Cases

Our base case (about 45% weight) assumes Fireworks reaches roughly $700-900 million annualized revenue in 2026 and continues growing while gross margin improves only modestly into the low 50s; share is held but commoditization caps the multiple, implying a fair enterprise value around $5-8 billion, roughly in line with or modestly above the $4 billion Series C and below the rumored $15 billion. The bull case (about 30%) assumes the up-the-stack strategy works: fine-tuning, reinforcement learning, voice and governance lift margins toward 58-60%, revenue compounds past $1.5 billion by 2027, and Fireworks becomes a platform-of-record, justifying a $15-20 billion valuation. The bear case (about 25%) assumes commoditization and hyperscaler capture: margins stay near 50% or compress, growth decelerates sharply as buyers multi-home or shift to Bedrock and Azure, and the multiple compresses to a $2-3 billion range or a down round. The dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller, and the deciding evidence, margin trajectory and retention, is not yet disclosed.[CV017, CV018, CV019, CV020, CV021]

Bull / base / bear scenario table
ScenarioProbabilityKey assumptions2026-27 revenueMarginImplied value
Bull~30%Up-the-stack works, platform-of-record>$1.5B by 202758-60%$15-20B
Base~45%Holds share, modest margin gain$700-900M (2026)low 50s$5-8B
Bear~25%Commoditization + hyperscaler captureGrowth halves~50% or lower$2-3B / down round

Scenario probabilities and ranges are the author's estimates; revenue uses company and Sacra figures and is unaudited.

[CV017, CV018, CV019, CV020]
FV003: Valuation / return range

Implied enterprise value by scenario, USD billions.

Scenario value ranges are author estimates anchored to comparable multiples and the disclosed marks.

[CV017, CV018, CV019]

8.5 Comparable Set

Private comparables anchor the analysis. Together AI, the closest peer, was valued at $3.3 billion on about $618 million annualized revenue in early 2025 (roughly 5x) and is reportedly in talks near $7.5 billion on about $1 billion (roughly 7-8x). Baseten raised at a $5 billion valuation in January 2026 and is reportedly discussing $11 billion, while Groq reached $6.9 billion as a hardware-led player on a different model, and Fal is cited around $4.5 billion. Against these, Fireworks at $4 billion on ~$280 million (Series C vintage) looks rich versus Together's multiple but is on a smaller, faster-growing base; on the ~$800 million May 2026 estimate it looks comparatively cheap, and the $15 billion talk re-stretches it. Public infrastructure-software comparables, Datadog, Snowflake, Confluent, Cloudflare, MongoDB and DigitalOcean, frame the multiple ceiling: high-growth public infra trades in a broad band but has compressed from peak, and lower-margin infrastructure businesses like DigitalOcean trade at clear discounts to pure software. Because Fireworks carries ~50% gross margins, a discount to pure-SaaS multiples is warranted, and hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat. The comparable set supports a wide, scenario-dependent value rather than a single point.[CV022, CV023, CV024, CV025, CV026, CV027]

Comparable valuation table
CompanyTypeValuationRevenue (annualized)Implied multipleNote
Fireworks AIPrivate round$4.0B (Oct 2025)~$280M~14xSeries C vintage
Fireworks AIPrivate (rumored)$15B (2026)~$800M~19xTalks, unconfirmed
Together AIPrivate round$3.3B (Feb 2025)~$618M~5xClosest peer
Together AIPrivate (rumored)$7.5B (2026)~$1.0B~7-8xIn talks
BasetenPrivate round$5.0B (Jan 2026)Undisclosedn/aTalks of $11B
GroqPrivate round$6.9B (Sep 2025)Hardware modeln/aDifferent model
Public infra SaaSPublic compsDatadog/Snowflake/CloudflareMulti-$B~8-20x EV/revMargin >70%
DigitalOceanPublic compLower multiple~$0.8BLow single-digitInfra-heavy discount

Private rounds from company and Sacra; public comps framed qualitatively from filings. Coverage is partial: not every peer's revenue is disclosed.

[CV022, CV023, CV024, CV025, CV026]
FV002: Valuation sensitivity

Implied valuation at different revenue and multiple assumptions, USD billions.

Sensitivity grid using company and Sacra revenue figures times illustrative multiples; not a forecast.

[CV009, CV022, CV023]

8.6 Exit Readiness and Final Diligence

Exit optionality is strong in direction but unproven in timing. Plausible paths include an IPO if Fireworks sustains hypergrowth and lifts margins toward software-like levels, or strategic acquisition by a hyperscaler or data-platform investor (AWS, Microsoft, Databricks, MongoDB, NVIDIA) seeking to own the inference layer, though several of those are also competitors. The principal thesis-break triggers are gross margin failing to rise off ~50%, hyperscaler or NVIDIA capture of the inference layer, a key-person departure, or growth stalling below the pace implied by the valuation. The priority final diligence asks are a single reconciled and dated revenue figure, audited or management-confirmed gross margin and the path to 60%, net revenue retention and churn cohorts, burn and runway against the compute build-out, GPU-supply contract terms, top-customer concentration, and the preference and dilution structure of the next round. Until those are answered, the right posture is to track the company closely, build conviction on the base case, and reserve a premium entry for confirmation that margin and retention support the infrastructure thesis rather than the commodity one.[CV028, CV029, CV030, CV031, CV032]

Thesis-break and kill triggers table
TriggerSignalAction
Margin stagnationGross margin stuck at ~50% or fallingExit / avoid premium
Hyperscaler captureInference absorbed by Bedrock/AzureReassess durability
NVIDIA repricingSupplier undercuts on price/accessCut exposure
Growth stallRevenue decelerates vs valuationDown-round risk
Key-person lossLin Qiao departsRe-underwrite
Retention shortfallNRR below ~110% once disclosedLower multiple

Kill triggers map the conditions that would invalidate the infrastructure thesis; thresholds are the author's.

[CV028, CV029, CV030]
Final diligence asks table
AskWhy it mattersOwner
Reconciled dated revenueSets the multiple denominatorCompany / finance
Audited gross margin + 60% pathTests the premium thesisCompany / finance
NRR and churn cohortsDurability of revenueCompany / RevOps
Burn and runwayFinancing risk vs compute planCompany / finance
GPU-supply contractsMargin and supply exposureCompany / infra
Top-customer concentrationRevenue concentration riskCompany / sales
Preference & dilutionEntry economicsCompany / legal

Diligence asks are the gating items before committing at a premium valuation.

[CV031, CV032]
FV004: Investment KPIs

Headline investability indicators.

KPIs synthesize the recommendation and valuation analysis; multiples use unaudited revenue.

[CV007, CV009, CV010]

8.7 Exhibits

Disclaimer

This report is for informational purposes only, is based on public sources as of 2026-06-14, and is not investment advice. Financial figures are largely unaudited company statements or third-party estimates and should be independently verified before any decision.

Evidence index

Claims
IDStatementConfidenceSources
CO001 Fireworks AI is an AI inference-cloud company headquartered in Redwood City, California. High SO018, SO020, SO025
CO002 Fireworks AI was founded in late 2022 by a team that left Meta's PyTorch organization. High SO002, SO004, SO014
CO003 Fireworks operates an "AI Cloud" platform that runs, fine-tunes and scales open-source LLM, vision, audio and multimodal models with low-latency inference. Medium SO002, SO013, SO001
CO004 Fireworks monetizes via usage-based pricing including per-token serverless inference, per-training-token fine-tuning, per-GPU-hour reinforcement fine-tuning and dedicated deployments. Medium SO013
CO005 Fireworks positions itself on a "one-size-fits-one" thesis favoring smaller customizable open models over generic closed foundation models. Medium SO002, SO005
CO006 Lin Qiao is CEO and co-founder of Fireworks AI and previously led the PyTorch team at Meta. High SO004, SO016, SO018
CO007 Fireworks AI was co-founded by seven people, most of whom worked together on PyTorch at Meta. Medium SO004, SO014, SO023
CO008 Co-founders Dmytro Dzhulgakov and Dmytro Ivchenko are Ukrainian former Meta PyTorch engineers. Medium SO014, SO004
CO009 Lin Qiao holds a Ph.D. in Computer Science from UC Santa Barbara and previously worked at LinkedIn and IBM. Medium SO018, SO016
CO010 Other co-founders include James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, with backgrounds at Meta PyTorch, ads and ML teams and Google Vertex AI. Medium SO004, SO023
CO011 Fireworks AI raised a $250 million Series C in October 2025 at a $4 billion valuation. High SO002, SO019, SO020
CO012 The Series C was co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia Capital. High SO002, SO021, SO022
CO013 A $52 million Series B led by Sequoia closed in July 2024 at a $552 million valuation with NVIDIA, AMD and MongoDB Ventures participating. High SO003, SO008, SO009
CO014 Fireworks AI has raised more than $327 million in total funding as of October 2025. High SO002, SO013
CO015 A $25 million Series A led by Benchmark closed in March 2024 with Sequoia, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang. Medium SO014, SO003
CO016 The Series B brought Fireworks AI's cumulative capital raised to $77 million. Medium SO003
CO017 As of May 2026 Sacra reports Fireworks is in talks to raise at a $15 billion post-money valuation with Index set to co-lead, on unconfirmed terms. Low SO013
CO018 Fireworks AI reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from its Series B. Medium SO002, SO013
CO019 Fireworks reported annualized revenue surpassing $280 million at the time of the October 2025 Series C. Medium SO002
CO020 The Series C round comprised roughly $230 million of primary funding and a $20 million secondary transaction per Sacra. Medium SO013
CO021 Fireworks AI's developer base grew from about 12,000 in February 2024 to 23,000 by the end of 2024. Medium SO014
CO022 The Fireworks platform processes more than 10 trillion tokens per day as of October 2025, rising to about 15 trillion per day by early 2026 per third-party profiles. Medium SO002, SO018
CO023 Earlier 2025 coverage cited Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year. Low SO014
CO024 Sacra estimates Fireworks AI's gross margin near 50 percent, below software norms, with management targeting 60 percent through GPU optimization. Medium SO013
CO025 Fireworks launched Microsoft Foundry (Azure) availability in March 2026, extending open-model inference to Azure customers. Medium SO018
CO026 Fireworks shipped FireFunction V2, FireAttention V2, FireOptimizer, supervised fine-tuning V2 and reinforcement fine-tuning between 2024 and 2026. Medium SO003, SO013
CO027 Fireworks AI acquired Hathora to deepen real-time and global compute orchestration. Medium SO013
CO028 Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties. Medium SO013
CO029 Analysts cite inference commoditization, hyperscaler bundling and hardware concentration as the main structural risks to Fireworks. Medium SO013
CO030 Independent reviewers describe Fireworks as "just the engine," requiring developer sophistication, with thin documentation and no ongoing free tier. Medium SO026
CO031 Fireworks offers an OpenAI-compatible API plus function calling, fine-tuning and enterprise security controls across hundreds of models. Medium SO001, SO002
CO032 Investors at Index Ventures and Sequoia cite the founding team's PyTorch and inference-systems pedigree as the core reason for backing Fireworks. Medium SO004, SO005
CO033 CEO Lin Qiao concentrates fundraising, vision and public representation, creating a meaningful key-person dependency. Low SO004, SO015
CO034 NVIDIA has entered the inference market directly via its Lepton acquisition and a competing GPU cloud marketplace, raising supplier-as-competitor risk for Fireworks. Medium SO013
CO035 Company-stated revenue figures and third-party estimates for Fireworks differ materially across vintages, from $130M ARR in mid-2025 to ~$800M annualized by May 2026. Low SO002, SO013, SO014
CM001 Fireworks AI competes in the managed AI inference market for serving and tuning open-weight models in production. Medium SM010, SM013
CM002 The core included spend is third-party production model serving, fine-tuning and dedicated deployment, not foundation-model training. Medium SM010, SM009
CM003 Closed-model APIs from OpenAI and Anthropic are excluded from the core market but are the primary status-quo substitute. Medium SM009, SM025
CM004 Self-hosting on vLLM or SGLang and hyperscaler bundles such as Bedrock and Azure Foundry are direct substitutes for Fireworks. Medium SM010, SM015
CM005 Adjacent expansion pools include voice agents, RAG/embeddings and reinforcement-learning training for agents. Medium SM010
CM006 MarketsandMarkets estimates the AI inference market at $106.15 billion in 2025 growing to $254.98 billion by 2030 at a 19.2% CAGR. High SM001, SM003
CM007 Other research houses place the 2026 AI inference market between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion. Low SM002, SM003, SM005
CM008 Gartner projects generative-AI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028. Medium SM009
CM009 The independent open-weight inference-serving market has consolidated around roughly seven providers as of Q2 2026. Medium SM006
CM010 With Together AI near $1 billion annualized revenue and Fireworks in the $280-800 million range, the independent-provider revenue pool is a few billion dollars in 2026. Low SM011, SM010
CM011 Fireworks' $280 million-plus revenue represents an early single-digit share of the independent inference niche. Low SM010, SM013
CM012 The most relevant lens for valuing Fireworks is the independent inference niche, not the headline AI inference TAM. Medium SM006, SM010
CM013 AI-native startups adopt Fireworks bottoms-up via self-serve API keys with an engineering lead as economic buyer. Medium SM010
CM014 Digital-native enterprises such as DoorDash, Notion, Shopify and Upwork move features from pilot to production on Fireworks. Medium SM013, SM010
CM015 Regulated and Fortune 500 buyers require SSO, audit logs, data residency and HIPAA/SOC2 posture and adopt top-down via procurement. Medium SM010
CM016 Across segments the user is a developer and the payer is an engineering or procurement budget. Medium SM010, SM013
CM017 Fireworks reaches buyers through cloud procurement channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability. Medium SM010, SM015
CM018 Open-source model quality convergence and agentic compound AI are primary drivers expanding inference demand. Medium SM009, SM013
CM019 Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks. Medium SM010, SM025
CM020 Hyperscaler bundling by AWS, Azure and Google folds inference into existing security, billing and governance relationships. Medium SM010, SM015
CM021 Fireworks' Llama 70B price sits within roughly 2% of Together AI's, illustrating razor-thin price differentiation. Medium SM023, SM006
CM022 GPU supply is concentrated and Fireworks does not own its fleet, creating capacity and cost exposure. Medium SM010
CM023 The EU AI Act imposes tiered obligations that add compliance overhead for AI deployment in Europe. Medium SM026
CM024 The OpenAI-compatible API lowers both switching-in and switching-out costs, capping durable lock-in. Medium SM023, SM010
CM025 Published AI inference TAM figures bundle chips, hyperscaler services and independent software, so they overstate Fireworks' reachable market. Medium SM001, SM006
CM026 The independent inference-provider revenue pool is not measured by any standard analyst and must be assembled from uneven company estimates. Low SM010, SM011, SM012
CM027 Forecast CAGRs for AI inference range from roughly 13% to 19% and 2034 estimates differ by more than $200 billion across houses. Low SM001, SM002, SM003
CM028 Despite wide estimate spreads, the AI inference market is clearly large and growing double digits, with directional rather than precise SAM. Medium SM001, SM004
CM029 There is no public evidence of near-term saturation in the AI inference market; growth drivers remain intact through the forecast window. Low SM002, SM004
CM030 Fine-tuned and specialized models are projected to capture much of the generative-AI model-spend growth, favoring Fireworks' tuning products. Medium SM009
CM031 The serverless open-weight inference field shows roughly 6x price spread and 5-7x latency spread across providers on the same model. Medium SM006
CM032 Together AI, Groq, Baseten, Cerebras, Replicate, Anyscale and OctoAI are the other named providers in the consolidated inference field. Medium SM006, SM016, SM019
CM033 Voice agents targeting sub-500ms latency expand Fireworks into contact-center and telephony budget categories larger than API inference alone. Medium SM010
CM034 Demand differs by maturity: startups optimize cost-per-token while Fortune 500 buyers prioritize control, compliance and vendor consolidation. Medium SM010, SM015
CM035 A defensible 2026 AI inference market figure is roughly $118-126 billion, between the 2025 base and the 2030 forecast. Low SM001, SM003
CP001 The inference market has segmented into managed open-model platforms, vertically integrated silicon, hyperscaler bundles and open-source serving frameworks. High SP009, SP010
CP002 Together AI, Baseten and Replicate are Fireworks' closest managed open-model competitors. Medium SP009, SP010
CP003 Groq, Cerebras and SambaNova attack inference from custom silicon rather than software optimization on commodity GPUs. Medium SP009, SP005
CP004 AWS Bedrock, Google Vertex, Azure Foundry and Databricks Model Serving collapse model access, infrastructure and governance into one platform. Medium SP009, SP016
CP005 Open-source serving frameworks vLLM and SGLang plus NVIDIA NIM and routers like OpenRouter commoditize proprietary inference advantage. Medium SP009
CP006 NVIDIA entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, becoming a supplier-turned-rival. Medium SP009
CP007 Together AI raised a $305 million Series B in February 2025 at a $3.3 billion valuation and reached about $1 billion annualized revenue by early 2026. High SP002, SP018
CP008 Together AI was founded in 2021 by Percy Liang, Chris Re and Vipul Ved Prakash and spans serverless, clusters, fine-tuning, voice and RL. Medium SP002, SP018
CP009 Baseten raised $300 million in January 2026 at a $5 billion valuation led by IVP and CapitalG with a reported $150 million from NVIDIA. High SP004, SP007
CP010 Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment built on TensorRT, SGLang, vLLM and TGI. Medium SP003, SP015
CP011 Groq raised $750 million in September 2025 at a $6.9 billion valuation and advertises 750-plus tokens per second on Llama models from custom LPU silicon. High SP005, SP006, SP017
CP012 Groq's partnership with Meta to power the official Llama API gives it strong distribution and first-party open-model credibility. Medium SP009
CP013 Replicate, Modal and Anyscale compete for developer mindshare at the top of the adoption funnel. Medium SP012, SP013, SP014
CP014 Fireworks' Q1 2026 uptime of 99.8% is the highest among specialized inference providers per independent monitoring. Medium SP001
CP015 Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq. Medium SP001, SP010
CP016 FireFunction achieves roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o. Medium SP001
CP017 Together offers a 200-plus model catalog with full fine-tuning while Groq offers 15-20 models and no fine-tuning. Medium SP001
CP018 Groq's LPU delivers 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency under load. Medium SP001, SP010
CP019 Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance. Medium SP003, SP009
CP020 Most inference providers expose OpenAI-compatible APIs, making migration between them a matter of minutes. Medium SP001, SP020
CP021 Routing aggregators such as OpenRouter and TokenMix encourage multi-homing and automatic failover across providers. Medium SP001, SP009
CP022 Hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on. Medium SP009, SP016
CP023 Fireworks plugs into incumbent channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability. Medium SP009, SP016
CP024 Fireworks does not own GPUs and sources NVIDIA and AMD capacity from third parties, unlike Together's owned data-center strategy. Medium SP009, SP002
CP025 Fireworks' proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance and price advantage. Medium SP009
CP026 Open-source serving frameworks keep closing the performance gap, and Baseten openly builds on vLLM and SGLang. Medium SP009, SP003
CP027 NVIDIA pushes NIM as a packaging layer and Snowflake released Arctic Inference as an open vLLM plugin, compressing proprietary advantage. Medium SP009
CP028 Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion are better capitalized than Fireworks at $4 billion. Medium SP005, SP004, SP002
CP029 Independent reviewers describe Fireworks as "just the engine," an adverse signal about its application-level differentiation versus full-stack rivals. Medium SP023
CP030 Fireworks' durability depends on extending into tuning, agents and governance faster than the ecosystem commoditizes the serving layer. Medium SP009
CP031 Fireworks' most defensible differentiation is reliability plus best-in-class function calling rather than price or raw speed. Medium SP001
CP032 The same Llama model spreads roughly sixfold in price and 5-7x in latency across the seven-provider field. Medium SP010
CP033 Together AI has raised $533.5 million in total funding from investors including General Catalyst, Prosperity7, NVIDIA, Salesforce and Kleiner Perkins. Medium SP002
CP034 Baseten's valuation roughly doubled from $2.15 billion in September 2025 to $5 billion in January 2026, with talks of an $11 billion round by May 2026. Medium SP003, SP004
CP035 Hyperscaler bundling is plausibly the single biggest structural threat to Fireworks because it removes the need for a standalone inference vendor. Low SP009, SP016
CI001 Fireworks bills serverless inference per token, fine-tuning per training token, reinforcement fine-tuning per GPU-hour and dedicated deployments per GPU-second or GPU-hour. High SI002, SI003
CI002 Fireworks' usage-based pricing maps to the customer lifecycle, capturing revenue across experimentation, production, adaptation and scaled deployment. Medium SI002
CI003 Reserved capacity is contracted separately on longer commitments at negotiated pricing and is the highest-margin stream. Medium SI002
CI004 Fireworks publishes serverless rates of about $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3. Medium SI004, SI005
CI005 Image generation runs from about $0.013 (SDXL) to $0.04 (Flux 1.1 Pro) per image and reserved capacity near $4.80 per hour per replica. Medium SI004
CI006 Fireworks' go-to-market is bottoms-up at entry via self-serve API keys and top-down at expansion via negotiated enterprise relationships. Medium SI002
CI007 Fireworks offers $1 of free credits rather than an ongoing free tier and a standard rate limit near 600 requests per minute. Medium SI004
CI008 Fireworks runs a field and partner sales motion anchored by an AWS Strategic Collaboration Agreement with funded proofs-of-concept and a startup acceleration program. Medium SI002, SI007
CI009 Blended annualized revenue per company is estimated near $28,000 across Fireworks' 10,000-plus customer base. Low SI002
CI010 Fireworks revenue is likely concentrated among a smaller number of large production deployments rather than evenly across the base. Low SI002
CI011 Sacra estimates Fireworks' gross margin near 50%, below the 70%-plus typical of subscription software, because GPU costs sit in cost of goods sold. Medium SI002
CI012 Management targets a 60% gross margin through better GPU utilization, Blackwell-generation efficiency and a mix shift toward dedicated and enterprise workloads. Medium SI002
CI013 Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant. Medium SI002, SI018
CI014 Proprietary optimization via FireAttention and FireOptimizer lets Fireworks charge a premium over self-hosting while undercutting the alternative's total cost. Medium SI002, SI016
CI015 NVIDIA reports rapidly growing data-center GPU revenue, evidencing the supplier-driven, capacity-constrained input market Fireworks operates within. Medium SI012
CI016 AMD's data-center accelerator business is also scaling, offering Fireworks an alternative silicon supplier to NVIDIA. Medium SI013
CI017 Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C. High SI001, SI006
CI018 Sacra estimates Fireworks at roughly $305 million annualized at year-end 2025 rising to about $800 million by May 2026. Low SI002
CI019 Earlier 2025 coverage reported Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year. Low SI009
CI020 Fireworks' audited financials, revenue mix, net revenue retention, churn and headcount are not public. Medium SI002, SI010
CI021 Fireworks processes more than 10 trillion tokens per day, rising to 15 trillion by early 2026. Medium SI001, SI010
CI022 Fireworks has raised more than $327 million across seed, Series A, B and C rounds. High SI001, SI002
CI023 The October 2025 Series C provided $250 million, roughly $230 million primary and $20 million secondary, at a $4 billion valuation. High SI002, SI001
CI024 Fireworks plans to grow its compute footprint three-to-four-fold over the next year, a capital-intensive expansion. Medium SI001
CI025 Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026, which could be the next-round trigger. Low SI002
CI026 Fireworks' principal financing dependency is GPU supply, since it does not own its fleet and sources NVIDIA and AMD capacity from third parties. Medium SI002, SI012
CI027 Fireworks shows credible hypergrowth and a lifecycle-spanning usage model, but the absence of audited figures caps revenue-quality confidence. Medium SI002, SI001
CI028 The main financial diligence blockers are a reconciled revenue figure, gross-margin verification, burn and runway, and net revenue retention. Medium SI002, SI010
CI029 Fireworks' revenue figures span $130 million to roughly $800 million annualized within twelve months, reflecting both hypergrowth and inconsistent measurement. Low SI001, SI002, SI009
CI030 No public debt or project-finance obligations are disclosed for Fireworks AI. Low SI002, SI021
CI031 An AWS case study reports a Fireworks customer cut total costs four-fold and supported three times higher traffic per instance on EC2 P5. Medium SI007
CI032 Reported 2025 profitability, if accurate, would make Fireworks unusually capital-efficient for a hypergrowth infrastructure startup. Low SI009
CI033 Downward inference price pressure threatens Fireworks' margins absent continued differentiation, per critical reviewers. Medium SI020
CI034 MongoDB, a public infrastructure peer and Fireworks investor, illustrates the higher gross margins of pure-software comparables versus inference providers. Low SI014
CI035 Fireworks' capital intensity exceeds a typical SaaS company because compute scaling and the lack of owned GPUs require recurring capacity spend. Medium SI002, SI001
CE001 Fireworks lets a developer point an OpenAI-compatible API at an open model and get low-latency production inference without managing GPUs. High SE010, SE013, SE017
CE002 Customers describe Fireworks as an inference engine that supplies speed, cost and control while they build the product. Medium SE014, SE025, SE026
CE003 The platform spans text, image, audio and multimodal formats across hundreds of models with day-zero support for major releases. Medium SE010, SE006
CE004 Fireworks provides function calling, JSON-mode structured output and streaming through its API. Medium SE010, SE013
CE005 A single customer can expand from serverless inference into fine-tuning, dedicated capacity, RAG and voice agents. Medium SE017, SE023
CE006 Serverless inference is the entry product, offering pay-per-token access to 50-plus served models including Llama 4, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4. Medium SE013, SE010
CE007 FireFunction is Fireworks' proprietary function-calling model family for tool use and structured output. Medium SE013
CE008 Customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks. High SE005, SE003, SE004
CE009 Deployment modules span serverless, on-demand dedicated and reserved capacity plus multi-LoRA hosting of many adapters on one base deployment. Medium SE021, SE020
CE010 Newer surfaces include a Voice Agent Platform with sub-500ms response and BYOB secure training from customer AWS S3 buckets. Medium SE017, SE019
CE011 Fireworks runs a proprietary multi-layer inference stack on commodity NVIDIA GPUs with a stateless router, draft and target pods, distributed KV cache and continuous batching. Medium SE001
CE012 FireAttention is a custom CUDA attention implementation Fireworks reports as faster than vLLM and TensorRT-LLM, extended for long context and Llama 4 chunked local attention. Medium SE006, SE001
CE013 FireOptimizer performs adaptive speculative execution with reported latency reductions up to roughly 3x and native FP4 support on NVIDIA Blackwell B200. Medium SE002, SE009
CE014 The serving topology scales to documented tests around 50,000 requests per minute. Low SE001
CE015 Speculative decoding pairs a fast draft model with a full target model to generate and verify tokens in parallel, configurable per workload. Medium SE008, SE001
CE016 Fireworks' operating model is open-model neutral, betting on running whichever open model is winning rather than any single model. Medium SE017
CE017 Fireworks operates a global multi-region fleet including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions for latency and data residency. Medium SE017
CE018 Independent monitoring placed Fireworks' Q1 2026 uptime at 99.8%, the highest among specialized inference providers. Medium SE013
CE019 Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks. Medium SE015
CE020 Cursor reached about 1,000 tokens per second for code generation and Sourcegraph saw a 30% latency reduction and 2.5x acceptance increase on Fireworks. Medium SE014, SE016
CE021 The Series C-funded roadmap targets deeper tuning and inference-alignment research and an end-to-end model-lifecycle creation toolchain. Medium SE022, SE019
CE022 Fireworks plans a three-to-four-fold expansion of global compute and has acquired Hathora to deepen real-time orchestration. Medium SE022, SE017
CE023 Fireworks' core IP is the proprietary inference engine, especially FireAttention kernels and FireOptimizer, rather than registered patents. Medium SE002, SE017
CE024 No public patents are listed for Fireworks; its moat is engineering know-how. Low SE017
CE025 Product-model co-design uses a customer data feedback loop with continuous evaluation and reinforcement learning to improve fine-tuned models over time. Medium SE022, SE003
CE026 Fireworks' optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so differentiation must be continuously re-earned. Medium SE017, SE009
CE027 The platform depends on leading-edge NVIDIA and AMD GPUs, CUDA, cloud regions and upstream open models. Medium SE001, SE017
CE028 Fireworks offers zero data retention by default, SSO, audit logs and data-residency controls for enterprise buyers. Medium SE017
CE029 Fireworks' AWS-based inference solution is HIPAA and SOC2 Type II compliant. High SE007, SE017
CE030 For sensitive workloads Fireworks supports airgapped EKS deployments and bring-your-own-bucket secure training. Medium SE017
CE031 Structured-output controls such as JSON mode and grammar-constrained decoding plus high schema compliance support dependable agentic tool use. Medium SE013, SE010
CE032 Fireworks does not publish a formal standard-tier SLA, and reviewers note thin documentation in places, both diligence items for security-sensitive buyers. Medium SE013, SE025
CE033 FireFunction achieves roughly 92% multi-tool function-calling accuracy and 99.1% JSON schema compliance in independent benchmarks. Medium SE013, SE027
CE034 Fireworks maintains day-zero support for new models such as Llama 4, DeepSeek and Qwen as a core engineering discipline. Medium SE006, SE011, SE012
CE035 Fireworks publishes open benchmark tooling via its GitHub organization, a developer-signal of technical openness. Low SE018
CU001 Fireworks' customer base spans AI-native startups, digital-native enterprises and large or regulated enterprises with distinct adoption paths. Medium SU009, SU007
CU002 AI-native startups such as Cursor, Perplexity, Liner and Cresta adopt Fireworks bottoms-up via self-serve API keys. Medium SU009, SU011
CU003 Digital-native enterprises including DoorDash, Notion, Shopify, Upwork and Quora run production AI features on Fireworks. High SU011, SU007
CU004 Use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice across software, e-commerce and customer-service verticals. Medium SU009, SU025
CU005 Fireworks' customer geography skews North American and European with global API access. Low SU025
CU006 Fireworks reported powering over 10,000 companies at its October 2025 Series C, about a tenfold increase from roughly 1,000 at the Series B. High SU006, SU009
CU007 Fireworks serves hundreds of thousands of developers, up from 12,000 in February 2024 to 23,000 by the end of 2024. Medium SU006, SU010
CU008 The platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026. Medium SU006, SU007
CU009 Customers follow a land-and-expand path from serverless inference into dedicated deployments, fine-tuning, RFT, embeddings and voice. Medium SU009, SU017
CU010 Analyst commentary on Hebbia shows how a single inference relationship can grow into a broader infrastructure dependency. Medium SU017
CU011 Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply, with a named researcher endorsing production use. Medium SU001, SU013
CU012 Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering. Medium SU002
CU013 Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x on Fireworks, corroborated by an AWS case study. High SU003, SU012
CU014 Upwork's Uma assistant drafts real-time proposals on Fireworks per a named executive. Medium SU004
CU015 Quora's Poe chatbot tripled response speed and Superhuman built its Ask AI compound system on Fireworks. Medium SU013, SU007
CU016 Fireworks' named references are mostly production deployments with quantified outcomes and executive attribution, giving the reference base high quality. Medium SU001, SU002, SU012
CU017 Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths. Medium SU009, SU017
CU018 Customer durability must be inferred from structural signals such as land-and-expand design and production usage rather than disclosed metrics. Medium SU017, SU009
CU019 High daily token volume and named executive testimonials indicate strong repeat usage and satisfaction anecdotally. Low SU006, SU002
CU020 The OpenAI-compatible API and routing aggregators make multi-homing and switching trivial, elevating churn risk. Medium SU018, SU021
CU021 Independent reviewers explicitly document Fireworks alternatives and switching paths, an adverse durability signal. Medium SU018
CU022 Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend. Medium SU009, SU017
CU023 Blended annualized revenue per company is roughly $28,000, likely understating a long tail beneath a few large accounts. Low SU022
CU024 The identity and revenue share of Fireworks' top customers are not disclosed, creating unquantifiable top-customer concentration risk. Medium SU009, SU022
CU025 The AWS Strategic Collaboration Agreement and Microsoft Foundry availability are growth accelerants but also channel dependencies. Medium SU009, SU024
CU026 Procurement friction is lower than for closed APIs via cloud marketplaces, but enterprise sales cycles and compliance reviews still gate the largest deals. Low SU009, SU024
CU027 Several marquee logos such as DoorDash and Shopify appear in aggregate marketing lists without standalone case studies. Low SU007, SU020
CU028 Sophisticated public customers like GitLab disclose AI-vendor dependence in their filings, illustrating buyer-side multi-homing and substitution capacity. Low SU016
CU029 WorkingAgents and other third parties corroborate Fireworks' compound-inference customer use cases for agentic workflows. Low SU015
CU030 Samsung is cited by investors as an enterprise customer accelerating its AI roadmap on Fireworks. Medium SU011
CU031 The named reference base is high quality but partly dated to 2024, a freshness caveat for diligence. Medium SU003, SU012
CU032 Fireworks' customer logos are concentrated in technology, e-commerce, customer service and legal-tech verticals. Low SU025
CU033 Production usage intensity is implied by 10-15 trillion tokens per day across the customer base. Medium SU006, SU007
CU034 Customer satisfaction evidence is positive but anecdotal, resting on named testimonials rather than survey or NPS data. Low SU002, SU004
CU035 Retention is the weakest-evidenced dimension of Fireworks' customer story, a material diligence gap. Medium SU017, SU018
CR001 Inference commoditization and gross-margin compression are Fireworks' highest-severity risks. High SR001, SR011
CR002 Hyperscaler bundling by AWS, Azure and Google could capture the inference layer and relegate Fireworks to an optimization add-on. Medium SR001
CR003 NVIDIA is simultaneously Fireworks' GPU supplier, an investor and a competitor via Lepton and NIM. Medium SR001, SR008
CR004 Capital intensity from a planned three-to-four-fold compute expansion is a medium-severity risk. Medium SR021, SR001
CR005 Fireworks' mitigation thesis is to move up the stack faster than the serving layer commoditizes. Medium SR001
CR006 Residual risk exposure remains meaningful because several mitigations are unproven and key metrics are undisclosed. Medium SR001, SR012
CR007 The EU AI Act imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose AI providers and deployers. High SR004, SR005
CR008 GDPR and data-residency requirements drive Fireworks' zero-data-retention and regional-deployment features. Medium SR006, SR001
CR009 Open models such as Llama carry acceptable-use and license terms that flow through to platforms serving them. Low SR019, SR007
CR010 Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how. Medium SR013, SR001
CR011 No material litigation or enforcement action against Fireworks is publicly known, and its Series C used top-tier legal counsel. Medium SR018, SR019
CR012 The NIST AI Risk Management Framework provides a voluntary governance baseline Fireworks and its customers can adopt. Low SR020
CR013 Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation and supply risk. Medium SR001, SR008
CR014 Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case. Medium SR012
CR015 Independently monitored Q1 2026 uptime of 99.8% is a reliability strength despite the absence of a published SLA. Medium SR012
CR016 Operating a global multi-region fleet adds operational complexity and cost for Fireworks. Low SR001
CR017 Fireworks' SOC2 Type II, HIPAA, zero-retention and airgapped controls mitigate operational and security risk, with no public breach known. Medium SR001
CR018 A single serious outage or data incident would be especially damaging given customers' production, latency-sensitive workloads. Medium SR012, SR001
CR019 NVIDIA is the most acute dependency, supplying leading-edge GPUs while holding a stake and competing through Lepton, a GPU marketplace and NIM. Medium SR001, SR008
CR020 AMD provides an alternative silicon supplier, partly diversifying Fireworks' NVIDIA dependence. Medium SR025
CR021 AWS and Microsoft are both distribution partners and bundling threats via Bedrock, Vertex and Azure Foundry. Medium SR001
CR022 Fireworks depends on continued release and permissive licensing of open models from Meta, DeepSeek and Alibaba. Medium SR001, SR009
CR023 Capital-provider concentration among a handful of late-stage funds and key-customer multi-homing add dependency risk. Low SR022, SR028
CR024 Fireworks' enabling partners NVIDIA, AWS and Microsoft are also its most credible competitors. Medium SR001
CR025 Gross margin near 50% is structurally below software norms and faces persistent downward price pressure. High SR001, SR011
CR026 The path to a 60% gross margin depends on unproven utilization gains and a revenue-mix shift. Medium SR001
CR027 Burn, runway and net revenue retention are undisclosed, so Fireworks' capital adequacy is asserted rather than verified. Medium SR001, SR021
CR028 The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations. Medium SR022, SR023
CR029 Key-person risk is concentrated in CEO Lin Qiao, who leads vision and fundraising. Medium SR024
CR030 Retaining elite inference engineers in a hot talent market is a continuing execution challenge. Low SR024, SR001
CR031 Fireworks' mitigations include moving up the stack, diversifying silicon, maintaining day-zero model support and hardening compliance. Medium SR001
CR032 Plugging into AWS and Azure procurement is a defensive mitigation against hyperscaler bundling. Medium SR001
CR033 Execution risk centers on whether the unproven up-the-stack expansion outruns commoditization. Medium SR001
CR034 Gross-margin trajectory toward 60% is the single best monitoring indicator of Fireworks' risk profile. Medium SR001
CR035 The clearest thesis-break triggers are margin stuck at ~50%, hyperscaler/NVIDIA capture, a key-person departure, or growth stalling versus the valuation. Medium SR001, SR022
CR036 Priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration. Medium SR001, SR012
CR037 Public infrastructure peers such as Datadog, Snowflake, Confluent and Cloudflare disclose AI-competition and margin risk factors that contextualize Fireworks' exposures. Medium SR014, SR015, SR016, SR017
CR038 DigitalOcean's filings illustrate the lower-margin reality of infrastructure-heavy businesses relative to pure software. Low SR030
CR039 Better-capitalized rivals such as Baseten raise the competitive stakes for Fireworks' enterprise go-to-market. Medium SR028, SR027
CR040 Low switching costs from OpenAI-compatible APIs and routers cap retention and amplify commoditization risk. Medium SR003, SR013
CR041 US export controls and supply constraints on advanced GPUs are an indirect risk transmitted through Fireworks' NVIDIA dependence. Low SR008, SR009
CR042 Fireworks' terms of service allocate liability and usage restrictions that are standard but warrant review for enterprise indemnification. Low SR019
CV001 The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure as enterprises shift from closed-API experimentation to owning customized open models in production. Medium SV026, SV008
CV002 The anti-thesis is that inference is structurally commoditizing, with ~50% margins, near-zero switching costs, and hyperscaler and NVIDIA repricing risk. Medium SV001, SV016
CV003 Fireworks pairs a PyTorch-pedigree founding team with FireAttention, FireOptimizer, best-in-class function calling and 99.8% uptime. Medium SV026, SV001
CV004 Fireworks grew from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers. Medium SV001, SV029
CV005 Fireworks' per-token prices sit within ~2% of Together and open-source serving frameworks keep closing the performance gap, supporting the commoditization anti-thesis. Medium SV016, SV001
CV006 A valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, prices in flawless execution. Medium SV001, SV008, SV029
CV007 We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance. Medium SV001, SV016
CV008 We assign an overall score of 6.5 out of 10, reflecting a strong business at a demanding price. Low SV001, SV026
CV009 The $4 billion Series C implied roughly 14 times the company-stated $280 million annualized revenue. Medium SV008, SV001
CV010 The rumored $15 billion round implies roughly 19 times Sacra's ~$800 million May 2026 revenue estimate. Low SV001, SV004
CV011 Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR and a reconciled revenue figure. Medium SV001, SV016
CV012 Fireworks has raised over $327 million across seed, a $25M Series A, a $52M Series B at $552M and a $250M Series C at $4B. High SV008, SV001
CV013 The Series C comprised roughly $230 million primary and a $20 million secondary. Medium SV001
CV014 As of May 2026 Fireworks is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures. Medium SV001, SV004, SV002
CV015 Public evidence supports Fireworks' growth and customer story but not its financial quality, since revenue is unaudited, margin is estimated, and burn is undisclosed. Medium SV001, SV016
CV016 Strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table add ecosystem support but concentrate supplier and partner influence. Medium SV008, SV027
CV017 The base case (~45%) assumes ~$700-900 million 2026 revenue and low-50s margins, implying a fair value around $5-8 billion. Low SV001, SV005
CV018 The bull case (~30%) assumes margins toward 58-60% and revenue past $1.5 billion by 2027, justifying $15-20 billion. Low SV001, SV015
CV019 The bear case (~25%) assumes commoditization and hyperscaler capture compressing the multiple to a $2-3 billion range or a down round. Low SV016, SV001
CV020 The valuation dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller. Medium SV001, SV015
CV021 The deciding evidence between scenarios, gross-margin trajectory and retention, is not yet disclosed. Medium SV001
CV022 Together AI was valued at $3.3 billion on about $618 million annualized revenue in early 2025, roughly 5x, and is reportedly near $7.5 billion on about $1 billion. Medium SV005
CV023 Baseten raised at a $5 billion valuation in January 2026 with talks of $11 billion, and Groq reached $6.9 billion as a hardware-led player, while Fal is cited around $4.5 billion. Medium SV006, SV007, SV002
CV024 Public infrastructure-software comparables such as Datadog, Snowflake, Cloudflare and Confluent frame a broad, compressed multiple band with 70%-plus gross margins. Medium SV011, SV012, SV013, SV020
CV025 DigitalOcean illustrates that lower-margin infrastructure businesses trade at clear discounts to pure software, supporting a discount for Fireworks' ~50% margins. Medium SV014
CV026 Hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat to Fireworks' valuation. Medium SV017, SV018, SV019
CV027 At $4 billion on ~$280 million Fireworks looks rich versus Together's multiple but is on a smaller, faster-growing base; on ~$800 million it looks comparatively cheap. Medium SV001, SV005
CV028 Plausible exit paths include an IPO on sustained hypergrowth or strategic acquisition by a hyperscaler or data-platform investor that is also a competitor. Low SV017, SV018
CV029 The principal thesis-break triggers are margin failing to rise off ~50%, hyperscaler or NVIDIA capture, a key-person departure, or growth stalling versus the valuation. Medium SV001, SV016
CV030 A net revenue retention below roughly 110% once disclosed would warrant a lower multiple. Low SV001
CV031 Priority diligence asks are a reconciled dated revenue figure, audited gross margin and the path to 60%, NRR and churn, burn and runway, GPU-supply terms, top-customer concentration and preference and dilution structure. Medium SV001, SV016
CV032 Until margin and retention are confirmed, the right posture is to track closely, underwrite to the base case, and reserve premium entry for confirmation of the infrastructure thesis. Medium SV001, SV015
CV033 Together's prior round at $1.25 billion on $130 million 2024 revenue traded at 9.6x, a useful inference-peer multiple benchmark. Medium SV005
CV034 Fireworks' ~50% gross margin warrants a discount to the 70%-plus-margin public-software multiples because GPU costs sit in COGS. Medium SV014, SV001
CV035 The $15 billion valuation talk is corroborated by Sacra and multiple news outlets as of late May 2026 but remains unconfirmed. Medium SV001, SV002, SV003, SV024
CV036 The large AI inference TAM growing near 19% annually supports a premium for category leaders but does not by itself justify any single multiple. Medium SV030, SV015
CV037 A premium entry would become attractive if Fireworks demonstrates a credible path to 60% margins and net revenue retention above 120%. Low SV001
CV038 Usage-based comparables like Twilio and AI-software names like C3.ai bound the multiple range for consumption- and AI-exposed businesses. Low SV021, SV023
CV039 Preference stack and liquidation overhang are not publicly disclosed and must be diligenced before a late-stage entry. Low SV001, SV010
CV040 Salesforce and other large software comps illustrate mature-growth multiple compression that a maturing Fireworks would eventually face. Low SV022
Sources
IDPublisherTitleQuote
SO001 Fireworks AI Fireworks AI - Fastest Inference for Generative AI
SO002 Fireworks AI Fireworks AI Raises $250M Series C to Power the Future of Enterprise AI Today, we're announcing a $250 million Series C at a $4 billion valuation ... brings our total funding to over $327 million
SO003 Fireworks AI Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems We're thrilled to announce our $52M Series B funding round led by Sequoia Capital, raising our valuation to $552M.
SO004 Index Ventures Inference is the New Runtime: Our Investment in Fireworks Alongside co-founders Dmytro Dzhulgakov, Dmytro Ivchenko, and James Reed ... as well as Benny Chen, Chenyu Zhao, and Pawel Garbacki
SO005 Sequoia Capital Fireworks Founder Lin Qiao on Fast Inference and Small Models
SO006 The AI Insider Fireworks AI Closes $250M Series C to Lead the AI Inference Market
SO007 The AI Insider Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
SO008 PYMNTS Fireworks AI Valued at $552 Million After New Funding Round
SO009 Tech Funding News NVIDIA, Sequoia invest in GenAI startup Fireworks AI's $52M round
SO010 The SaaS News Fireworks AI Raises $52 Million in Series B
SO011 AI Curator Fireworks AI Closes $250M Round, Eyes AI Inference Lead
SO012 AIM Media House Fireworks AI raises $250 million for enterprise AI infrastructure
SO013 Sacra Fireworks AI revenue, valuation & funding Sacra estimates that Fireworks AI hit $800M in annualized revenue in May 2026, up from about $305M at the end of 2025.
SO014 Scroll.media Fireworks AI has a valuation of $552 million. Ukrainians among the founders. the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SO015 The Stack Fireworks AI's Lin Qiao: The future is compound AI
SO016 TWIML AI Lin Qiao profile
SO017 Crunchbase Fireworks AI - Company Profile
SO018 AI Market Watch Fireworks AI - AI Startup Profile Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SO019 SiliconANGLE Fireworks AI raises $250M at $4B valuation to help enterprises with AI inference workloads
SO020 Business Wire Fireworks AI Raises $250M Series C to Lead the AI Inference Market
SO021 Orrick Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SO022 Tech Funding News PyTorch engineers' brainchild Fireworks AI closes $250M at $4B valuation
SO023 Exa Meet the Executive Team at Fireworks AI
SO024 GitHub Fireworks AI (fw-ai) GitHub organization
SO025 Fireworks AI Fireworks AI Careers
SO026 eesel AI An honest Fireworks AI review (2025): The good, the bad, and the ugly Fireworks excels at performance and model selection, but it is 'just the engine' - developers and businesses still need technical sophistication to build deployable solutions.
SM001 MarketsandMarkets AI Inference Market - Global Forecast to 2030 the AI inference market is expected to grow from USD 106.15 billion in 2025 to USD 254.98 billion by 2030, at a CAGR of 19.2%
SM002 Polaris Market Research AI Inference Market Size & Trends, Industry Report 2034
SM003 Research and Markets AI Inference Market Outlook 2026-2034
SM004 Vention State of AI 2026 - AI Market Size, Investment, and Industry Data
SM005 Precedence Research AI Inference Market Size and Forecast
SM006 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix By Q2 2026 the serverless inference market has consolidated around seven providers - Together, Fireworks, Anyscale, Groq, Cerebras, Replicate, and OctoAI.
SM007 Alatirok AI Inference Providers in 2026: 5-Way Comparison
SM008 Jimmy Research Fireworks AI - entity profile
SM009 Index Ventures Inference is the New Runtime: Our Investment in Fireworks Gartner projects GenAI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028
SM010 Sacra Fireworks AI revenue, valuation & funding
SM011 Sacra Together AI revenue, valuation & funding Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025.
SM012 Sacra Baseten revenue, valuation & funding
SM013 Fireworks AI Fireworks AI Raises $250M Series C
SM014 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SM015 Microsoft Azure Introducing Fireworks AI on Microsoft Foundry
SM016 Together AI Together AI - The AI Acceleration Cloud
SM017 Together AI Together AI Pricing
SM018 Baseten Baseten - Inference Platform
SM019 Groq Groq - Fast, low cost inference
SM020 Modal Modal - High-performance AI infrastructure
SM021 Replicate Replicate - Run AI with an API
SM022 Anyscale Anyscale - Scalable compute for AI
SM023 TokenMix Fireworks AI Review 2026
SM024 DeployBase Fireworks AI Pricing Breakdown
SM025 eesel AI An honest Fireworks AI review (2025) the industry expects this downward pricing pressure to intensify by 2025-2026, making it difficult for any single provider to maintain high profit margins
SM026 EU AI Act (artificialintelligenceact.eu) High-level summary of the AI Act
SP001 TokenMix Fireworks AI Review 2026: 99.8% Uptime vs Together and Groq Fireworks: 99.8% uptime + best function calling, 50+ models, $0.90/M. Together: 200+ models + cheap fine-tuning, $0.88/M. Groq: ultra-low latency, $0.59/M but lowest uptime (99.4%).
SP002 Sacra Together AI revenue, valuation & funding Together AI raised a $305M Series B in February 2025 led by General Catalyst ... valuing the company at $3.3B
SP003 Sacra Baseten revenue, valuation & funding
SP004 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP005 DatacenterDynamics AI chip company Groq raises $750m at $6.9bn valuation
SP006 Dataconomy AI chip startup Groq raises $750 million at a $6.9 billion valuation
SP007 The AI World Baseten raises $300M to scale AI inference
SP008 TechBuzz Groq Raises $750M at $6.9B Valuation to Challenge Nvidia's AI Dominance
SP009 Sacra Fireworks AI revenue, valuation & funding (competition section) Together AI is Fireworks' closest direct competitor ... Baseten raised a $300M Series E at a $5 billion valuation
SP010 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix
SP011 DeployBase Fireworks AI Pricing Breakdown vs competitors
SP012 Modal Modal - High-performance AI infrastructure
SP013 Replicate Replicate - Run AI with an API
SP014 Anyscale Anyscale - Scalable compute for AI
SP015 Baseten Baseten Pricing
SP016 Microsoft Foundry Fireworks models on Microsoft Foundry
SP017 Groq Groq - Fast, low cost inference
SP018 Together AI Together AI - The AI Acceleration Cloud
SP019 Alatirok AI Inference Providers in 2026: 5-Way Comparison
SP020 Walturn What is Fireworks AI? Features, Pricing, and Use Cases
SP021 createaiagent.net Fireworks AI: Optimized Inference Solutions
SP022 Fireworks AI Fireworks AI Raises $250M Series C
SP023 eesel AI An honest Fireworks AI review (2025) Critics note that, while Fireworks excels at performance and model selection, it is 'just the engine'.
SP024 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SP025 Together AI Together AI Pricing
SI001 Fireworks AI Fireworks AI Raises $250M Series C our annualized revenue has surpassed $280 million ... Growing our computation footprint 3-4x over the next year
SI002 Sacra Fireworks AI revenue, valuation & funding The company's gross margin sits at approximately 50% ... Fireworks has told investors it is targeting 60% gross margins
SI003 Fireworks AI Fireworks AI Pricing
SI004 TokenMix Fireworks AI Review 2026 - pricing breakdown Llama 70B $0.90/M, Llama 8B $0.20/M, DeepSeek V3 $0.50/M ... Reserved capacity ... approximately $4.80/hour
SI005 DeployBase Fireworks AI Pricing Breakdown: Cost Per Token
SI006 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SI007 Amazon Web Services Fireworks.ai Case Study the customer cut total costs by four times ... HIPAA and SOC2 Type II compliant
SI008 Markaicode Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM
SI009 Scroll.media Fireworks AI valuation and ARR the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SI010 AI Market Watch Fireworks AI - AI Startup Profile Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SI011 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix
SI012 U.S. Securities and Exchange Commission (NVIDIA) NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SI013 U.S. Securities and Exchange Commission (AMD) Advanced Micro Devices Form 10-K (FY ended December 27, 2025)
SI014 U.S. Securities and Exchange Commission (MongoDB) MongoDB, Inc. Form 10-K (FY ended January 31, 2026)
SI015 Fireworks AI Fireworks AI Docs - Concepts
SI016 Fireworks AI FireOptimizer: Customizing latency and quality
SI017 Index Ventures Inference is the New Runtime
SI018 Fireworks AI Multi-LoRA: Personalize AI at scale
SI019 Sanjay Says Fireworks AI and Adaptive Speculative Execution
SI020 eesel AI An honest Fireworks AI review (2025) there is pressure for all inference providers to cut prices ... making it difficult for any single provider to maintain high profit margins
SI021 Crunchbase Fireworks AI - Company Profile
SI022 Business Wire Fireworks AI Raises $250M Series C
SI023 Tech Funding News Fireworks AI closes $250M at $4B valuation
SI024 Fireworks AI Fireworks AI Docs - Deploying LoRAs
SI025 Fireworks AI Fireworks AI Docs - Changelog
SE001 Markaicode Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM Stateless request router ... Draft GPU pods running a small fast model ... Target GPU pods ... Distributed KV cache ... above 85 tokens/sec per GPU
SE002 Fireworks AI FireOptimizer: Customizing latency and quality for production
SE003 Fireworks AI Reinforcement Fine Tuning: Train expert open models to surpass closed
SE004 Fireworks AI Fireworks RFT: Build AI agents with fine-tuned open models
SE005 Fireworks AI Introducing Supervised Fine Tuning V2
SE006 Fireworks AI Optimizing Llama 4 Maverick on Fireworks AI Llama 4 Maverick became available day one on Fireworks with support for 1-million-token context ... custom attention via FireAttention
SE007 Amazon Web Services Fireworks.ai Case Study (HIPAA / SOC2) the Fireworks.ai inference solution built on AWS is HIPAA and SOC2 Type II compliant
SE008 Fireworks AI Speculative Decoding - Fireworks AI Docs
SE009 Sanjay Says Fireworks AI and Adaptive Speculative Execution
SE010 Fireworks AI Fireworks AI Docs - Introduction
SE011 Fireworks AI DeepSeek V3.1 now on Fireworks AI
SE012 Fireworks AI Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
SE013 TokenMix Fireworks AI Review 2026: uptime and function calling benchmarks FireFunction 92.1% multi-tool accuracy ... 99.8% uptime, highest in inference market
SE014 Fireworks AI How Cursor built Fast Apply using the Speculative Decoding API Cursor ... achieve 1000 tokens/sec for code generation use cases such as instant apply
SE015 Fireworks AI How Notion fine-tuned with Fireworks we reduced latency from about 2 seconds to 350 milliseconds
SE016 Fireworks AI How Sourcegraph scaled real-time code assistance with Fireworks
SE017 Sacra Fireworks AI - enterprise security posture zero data retention by default, SSO, audit logs, data residency controls, HIPAA and SOC2 compliance posture, and airgapped EKS deployments
SE018 GitHub Fireworks AI (fw-ai) GitHub organization and benchmarks
SE019 Fireworks AI Fireworks AI Dev Day 2025 Wrapped
SE020 Fireworks AI Multi-LoRA: Personalize AI at scale
SE021 Fireworks AI Fireworks AI Docs - Concepts
SE022 Fireworks AI Fireworks AI Raises $250M Series C (roadmap) Expand Our Product into a Comprehensive AI Creation Toolchain ... Growing our computation footprint 3-4x
SE023 Fireworks AI Fireworks AI - AI-native
SE024 Fireworks AI Fireworks AI Docs - Deploying LoRAs
SE025 eesel AI An honest Fireworks AI review (2025): documentation gaps Some reviews point to limited transparency around free usage, sporadic documentation, and potential support slowdowns
SE026 Walturn What is Fireworks AI? Features, Pricing, and Use Cases
SE027 DeployBase Fireworks AI Pricing and capabilities breakdown
SU001 Fireworks AI How Cursor built Fast Apply using the Speculative Decoding API Fireworks is way more performant than the open source engines and is what we use in production.
SU002 Fireworks AI How Notion fine-tuned models with Fireworks we reduced latency from about 2 seconds to 350 milliseconds
SU003 Fireworks AI Real-time code assistance: How Sourcegraph scaled with Fireworks
SU004 Fireworks AI How Upwork and Fireworks deliver faster proposals (Uma)
SU005 Fireworks AI Accelerating Code Completion with Fireworks Fast LLM Inference
SU006 Fireworks AI Fireworks AI Raises $250M Series C (customer scale) Fireworks now powers over 10,000 companies (a 10x increase from our Series B)
SU007 AI Market Watch Fireworks AI - notable customers and growth metrics Notable customers: Quora, DoorDash, Upwork, Cresta, Cursor, Liner, Superhuman, Sourcegraph, Tome, Samsung, Uber, Notion, Shopify
SU008 Fireworks AI Fireworks AI - Customers
SU009 Sacra Fireworks AI - customer base and expansion The customer base grew from roughly 1,000 companies at the time of the Series B to more than 10,000 companies by October 2025.
SU010 Scroll.media Fireworks AI developer growth 2024 The number of developers using Fireworks AI jumped from 12,000 in February 2024 to 23,000 by year's end.
SU011 Index Ventures Inference is the New Runtime (customer references) high-throughput, latency-sensitive applications at companies like Uber, DoorDash, Notion, Quora, and Upwork ... enterprise leaders like Samsung
SU012 Amazon Web Services Fireworks.ai Case Study (Sourcegraph / Cody) Cody doubled its completion acceptance rate ... Cody's backend latency accelerated by more than two times.
SU013 Fireworks AI Fireworks AI Series B (Cursor, Quora, Upwork, Superhuman) Superhuman ... used Fireworks to create Ask AI, a compound AI system
SU014 Fireworks AI Fireworks AI - AI-native customers
SU015 WorkingAgents Fireworks AI: The Compound Inference Engine
SU016 GitLab Inc. (SEC EDGAR) GitLab Inc. Form 10-K (FY ended January 31, 2026)
SU017 Sacra Fireworks AI - retention and expansion dynamics a single inference relationship can anchor a broader infrastructure dependency over time
SU018 eesel AI Fireworks AI alternatives and switching considerations
SU019 eesel AI An honest Fireworks AI review (2025)
SU020 Fireworks AI Fireworks AI homepage (customer logos)
SU021 TokenMix Fireworks AI Review 2026 - production usage
SU022 Sacra Fireworks AI - business model and ARPA Blended annualized revenue per company works out to roughly $28,000 across the full base
SU023 Fireworks AI Fireworks AI Blog index
SU024 Fireworks AI Fireworks AI at AWS re:Invent 2025
SU025 AI Market Watch Fireworks AI - geographic focus and industries
SR001 Sacra Fireworks AI - risks section the proprietary performance advantage in FireAttention and FireOptimizer is likely to compress ... Hyperscaler capture ... Hardware concentration
SR002 eesel AI An honest Fireworks AI review (2025): risks
SR003 eesel AI Fireworks AI alternatives (switching risk)
SR004 EU AI Act (artificialintelligenceact.eu) High-level summary of the AI Act
SR005 EU AI Act (artificialintelligenceact.eu) Article 53: Obligations for providers of general-purpose AI models
SR006 GDPR.eu What is GDPR, the EU's data protection law?
SR007 European Commission Regulatory framework for AI
SR008 U.S. Securities and Exchange Commission (NVIDIA) NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SR009 DatacenterDynamics Groq raises $750m at $6.9bn valuation (silicon competition)
SR010 Dataconomy Groq raises $750 million (NVIDIA challenge)
SR011 Digital Applied AI Inference Providers Pricing Matrix Q2 2026 (price pressure)
SR012 TokenMix Fireworks AI Review 2026 (SLA and pricing risk) Fireworks AI does not publish a formal SLA for its standard tier
SR013 Walturn What is Fireworks AI? (risks and lock-in)
SR014 U.S. Securities and Exchange Commission (Datadog) Datadog, Inc. Form 10-K (FY ended December 31, 2025)
SR015 U.S. Securities and Exchange Commission (Snowflake) Snowflake Inc. Form 10-K (FY ended January 31, 2026)
SR016 U.S. Securities and Exchange Commission (Confluent) Confluent, Inc. Form 10-K (FY ended December 31, 2025)
SR017 U.S. Securities and Exchange Commission (Cloudflare) Cloudflare, Inc. Form 10-K (FY ended December 31, 2025)
SR018 Orrick Fireworks AI Series C legal counsel
SR019 Fireworks AI Fireworks AI Terms of Service
SR020 NIST AI Risk Management Framework
SR021 Fireworks AI Fireworks AI Raises $250M Series C (use of funds / capital intensity)
SR022 SiliconANGLE Fireworks AI raises $250M at $4B valuation (valuation ramp)
SR023 Scroll.media Fireworks AI valuation ramp 552M to 4B
SR024 Index Ventures Inference is the New Runtime (founder dependency)
SR025 Advanced Micro Devices (SEC EDGAR) AMD Form 10-K (alternative silicon supply)
SR026 DeployBase Fireworks AI Pricing (margin pressure)
SR027 Alatirok AI Inference Providers 2026 (competitive risk)
SR028 Business Wire Baseten Raises $300M (capital asymmetry)
SR029 GitLab Inc. (SEC EDGAR) GitLab Form 10-K (AI vendor risk-factor comparable)
SR030 DigitalOcean (SEC EDGAR) DigitalOcean Form 10-K (infrastructure margin comparable)
SV001 Sacra Fireworks AI revenue, valuation & funding Fireworks AI is in talks to raise a new funding round at a $15 billion post-money valuation, with Index Ventures set to co-lead.
SV002 AI Weekly Fireworks AI Targets $15B Valuation in New Round
SV003 StartupNews.fyi Fireworks AI Seeks $15B Funding, Quadrupling Valuation
SV004 Yahoo Finance Fireworks AI Eyes $15 Billion Valuation In New Funding Talks
SV005 Sacra Together AI revenue, valuation & funding Based on 2024 revenue of $130M and a $1.25B valuation, the company traded at a 9.6x revenue multiple at its prior round.
SV006 Sacra Baseten revenue, valuation & funding
SV007 DatacenterDynamics Groq raises $750m at $6.9bn valuation
SV008 Fireworks AI Fireworks AI Raises $250M Series C at $4B valuation
SV009 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SV010 Orrick Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SV011 U.S. Securities and Exchange Commission (Datadog) Datadog, Inc. Form 10-K (FY 2025)
SV012 U.S. Securities and Exchange Commission (Snowflake) Snowflake Inc. Form 10-K (FY 2026)
SV013 U.S. Securities and Exchange Commission (Cloudflare) Cloudflare, Inc. Form 10-K (FY 2025)
SV014 U.S. Securities and Exchange Commission (DigitalOcean) DigitalOcean Holdings Form 10-K (FY 2025)
SV015 a16z AI Inference Economics
SV016 eesel AI An honest Fireworks AI review (2025): margin and commoditization
SV017 U.S. Securities and Exchange Commission (Amazon) Amazon.com, Inc. Form 10-K (FY 2025)
SV018 U.S. Securities and Exchange Commission (Microsoft) Microsoft Corporation Form 10-K (FY 2025)
SV019 U.S. Securities and Exchange Commission (Oracle) Oracle Corporation Form 10-K (FY 2025)
SV020 U.S. Securities and Exchange Commission (Confluent) Confluent, Inc. Form 10-K (FY 2025)
SV021 U.S. Securities and Exchange Commission (Twilio) Twilio Inc. Form 10-K (FY 2025)
SV022 U.S. Securities and Exchange Commission (Salesforce) Salesforce, Inc. Form 10-K (FY 2026)
SV023 U.S. Securities and Exchange Commission (C3.ai) C3.ai, Inc. Form 10-K (FY 2025)
SV024 CryptoBriefing Fireworks AI reportedly seeks funding at $15 billion valuation
SV025 Briefs.co Fireworks AI Eyes $15B Valuation In New Funding Round
SV026 Index Ventures Inference is the New Runtime (thesis)
SV027 Tech Funding News Fireworks AI closes $250M at $4B valuation
SV028 AI Market Watch Fireworks AI - revenue and valuation profile
SV029 Scroll.media Fireworks AI valuation ramp 552M to 4B
SV030 MarketsandMarkets AI Inference Market - Global Forecast to 2030