尽调报告 AI inference infrastructure / developer tools Series C (private) 2026-06-14

Fireworks AI

面向开放模型的推理云，定价容不得失误

Fireworks AI 是顶级 AI 推理资产，创始团队一线、增长极快；但约 50% 毛利率和结构性商品化风险之下，估值已按完美执行定价。

封面要素

估值 01

4 USD billion (Series C, Oct 2025) [CO011]

累计融资 02

327 USD million+ [CO014]

年化收入 03

800 USD million (Sacra est., May 2026) [CI018]

客户 04

10000 companies+ [CO018]

每日 tokens 05

15 trillion (early 2026) [CO022]

毛利率 06

50 percent (est.) [CO024]

公司概况

Fireworks AI 是一家位于 Redwood City 的 AI 推理云公司，由 Lin Qiao 和一批前 Meta PyTorch 工程师于 2022 年创立。它让企业通过 OpenAI 兼容 API，在生产环境运行、微调并扩展数百个开源 LLM、图像、音频和多模态模型；差异化来自自研推理优化（FireAttention、FireOptimizer）、同类最佳的函数调用能力和领先的可靠性。公司在 2025 年 10 月以 $4B 估值完成 $250M Series C，收入高速增长，第三方估算年化收入约 $800M；客户超过 10,000 家，包括 Cursor、Notion、DoorDash 和 Samsung。

官网: fireworks.ai
成立时间: 2022-01-01
创始人: Lin Qiao, Dmytro Dzhulgakov, Dmytro Ivchenko
创立地点: Redwood City, California, USA
总部: Redwood City, California, USA
产品: 按使用量计费的 AI 推理平台：面向开放模型的按 token 无服务器推理、LoRA 与强化微调、专用和预留 GPU 部署、函数调用模型族（FireFunction）以及语音智能体平台，全部跑在自研优化推理引擎上。
客户: AI 原生初创公司、数字原生企业，以及部分构建生产级生成式 AI 应用的 Fortune 500 买家；这些客户需要快速、低成本、可控的开放模型推理。
商业模式: B2B 按量计费覆盖无服务器（按 token）、微调（按训练 token）、强化微调（按 GPU-hour）以及专用 / 预留部署；开发者自下而上进入，再扩展为议价企业合同。
阶段: Series C (private, venture-backed)
融资情况: 2025 年 10 月以 $4B 估值完成 $250M Series C，累计融资超过 $327M；据称截至 2026 年 5 月，公司正洽谈一轮由 Index Ventures 共同领投、估值约 $15B 的融资（未确认）。

[CO001, CO011, CO014, CO018]

执行摘要

主要优势

稀缺的创始人市场契合：当年在 Meta 打造 PyTorch 的团队，如今在做推理系统。
增长极快：据称年化收入约 $800M，覆盖 10,000+ 客户，每日处理 15T tokens。
生产级客户背书优质：Cursor、Notion、Sourcegraph、Upwork 均给出可量化结果。
工程驱动差异化明显：FireAttention、FireOptimizer、一流 function calling，以及 99.8% uptime。

主要风险

推理正在商品化；约 50% 毛利率低于软件公司 70%+ 的常态。
Hyperscaler 捆绑销售压力大（Bedrock、Azure、Vertex），NVIDIA 同时是供应商、投资方和竞争者。
切换成本低、多平台并用常见，压住留存和定价权。
估值爬升激进（$552M 到 $4B，洽谈中达 $15B），已经押注零失误执行。

未决问题

没有审计财务，也没有一个可对账、带日期的收入数字；一年内外部估算相差 6 倍。
毛利率、净收入留存、流失、烧钱和 runway 均未披露。
头部客户收入集中度和 GPU 供应合同条款未公开。
下一轮的优先权栈和稀释结构未披露。

01公司概览

1.1 身份与商业模式

Fireworks AI 是一家美国人工智能基础设施公司，总部位于 California Redwood City，由一支离开 Meta PyTorch 组织的团队于 2022 年末创立。公司运营其所谓的「AI Cloud」，面向企业开发团队：托管推理平台以低延迟服务运行、微调并扩展开源大语言、视觉、音频和多模态模型。它的核心判断是 “one-size-fits-one”（一客一策）推理：最高价值的 AI 不会来自少数通用封闭基础模型，而会建立在更小、可定制、并用企业专有数据调优的开放模型上。商业化按客户生命周期计费：无服务器推理按 token 收费，微调按训练 token 收费，强化微调按 GPU-hour 收费，按需或预留专用部署按 GPU-second 或 GPU-hour 收费。平台提供数百个模型、OpenAI 兼容 API、函数调用和企业安全控制，把 Fireworks 放在商品化 GPU 租赁与封闭模型 API 之间。[CO001, CO002, CO003, CO004, CO005, CO031]

FO002: 公司快照逻辑

身份、产品、客户、资本和依赖如何串起来。

[CO001, CO004, CO018, CO024, CO028]

1.2 创始人与领导层

Fireworks AI 由 CEO Lin Qiao 与六位同事共同创立，多数人曾在 Meta 共同参与 PyTorch。Qiao 此前是 Meta 工程高级总监兼 PyTorch 负责人，带过 300 多名工程师；更早在 LinkedIn、IBM 及其他大型系统公司任职；拥有 UC Santa Barbara 计算机科学博士学位。联合创始人包括 Dmytro Dzhulgakov，他是前核心 PyTorch 维护者，2011 年加入 Facebook；以及 Dmytro Ivchenko，毕业于 Kyiv Polytechnic，曾在 Meta 负责 PyTorch 排序，两人均来自乌克兰。其余创始人 James Reed、Benny Chen、Chenyu Zhao 和 Pawel Garbacki 来自 Meta PyTorch 编译器、广告基础设施、核心 ML 团队以及 Google Vertex AI。投资人反复把创始团队深厚的推理系统背景视为公司核心优势；这也形成集中在 Qiao 身上的关键人依赖。[CO006, CO007, CO008, CO009, CO010, CO032]

领导层和创始人表
人物	角色	背景	创始人-市场匹配	关键人依赖
Lin Qiao	CEO 兼联合创始人	Meta PyTorch 负责人（300+ 工程师）；LinkedIn、IBM；UC Santa Barbara 博士	深厚推理系统和 OSS 领导力	高 - 公众门面、愿景和融资负责人
Dmytro Dzhulgakov	联合创始人（CTO 级）	2011 年起任 Meta 核心 PyTorch 维护者；来自 Ukraine Kharkiv	核心推理工程能力	高 - 主要技术架构师
Dmytro Ivchenko	联合创始人	Meta PyTorch 排序；LinkedIn；Kyiv Polytechnic	大规模 ML 系统	中
James Reed	联合创始人	Meta PyTorch 编译器团队	编译器 / kernel 优化	中
Benny Chen	联合创始人	Meta 广告基础设施负责人	生产基础设施战略	中
Chenyu Zhao	联合创始人	曾领导 Google Vertex AI	云 AI 平台 GTM	中
Pawel Garbacki	联合创始人	Meta Newsfeed 核心 ML	ML 系统和排序	中

创始人名单和背景汇总自 Index Ventures、Sequoia、scroll.media 和高管目录来源；除 CEO 外，其他角色并非全部都有公开正式头衔。

[CO006, CO007, CO008, CO009, CO010]

1.3 融资与资本结构

Fireworks 已通过种子轮和三轮定价融资募得超过 $327M。2024 年 3 月，Benchmark 领投 $25M Series A，Sequoia Capital、Databricks Ventures 以及 Frank Slootman、Sheryl Sandberg、Howie Liu、Alexandr Wang 等天使参投。2024 年 7 月，Sequoia 领投 $52M Series B，估值 $552M，引入 NVIDIA、AMD 和 MongoDB Ventures，使累计资本达到 $77M。2025 年 10 月，公司宣布以 $4B 估值完成 $250M Series C，由 Lightspeed Venture Partners、Index Ventures 和 Evantic 共同领投，Sequoia 继续支持；Sacra 称该轮约含 $230M 一级资本和 $20M 二级交易。NVIDIA、AMD、MongoDB 和 Databricks 等战略参投方贯穿多轮融资，把股东名单与 Fireworks 所依赖的硬件和数据平台生态绑在一起。截至 2026 年 5 月，Sacra 称公司正洽谈以 $15B 投后估值再次融资，Index 拟共同领投，但条款未确认。[CO011, CO012, CO013, CO014, CO015, CO016]

利益相关方或投资者图谱
利益相关方	角色 / 轮次	控制权或经济重要性	尽调问题
Benchmark	领投 - Series A（2024 年 3 月）	早期领投方；可能有董事席位	确认董事会构成和持股 %
Sequoia Capital	领投 - Series B；继续参与 Series C	多轮支持方；GP Sonya Huang	确认董事席位和 pro-rata 权益
Lightspeed Venture Partners	共同领投 - Series C（2025 年 10 月）	$4B 估值的后期领投方	确认治理权利
Index Ventures	共同领投 - Series C；潜在下一轮共同领投方	复投方（Sahir Azam）；论点型投资者	确认传闻 $15B 轮次中的配额
Evantic	共同领投 - Series C	新后期领投方	确认基金画像和持股
NVIDIA	战略 - Series B/C	硬件供应商和投资者	评估 GPU 配额冲突 / 收益
AMD	战略 - Series B/C	替代芯片供应商 / 投资者	评估 MI 系列采用情况
MongoDB / Databricks	战略 - Series B/C	数据平台伙伴 / 投资者	确认联合销售和伙伴深度

仅列领投方和战略投资者；未列个人天使（Slootman、Sandberg、Liu、Wang）和种子投资方。董事会构成和持股比例未公开。

[CO011, CO012, CO013, CO015, CO016]

1.4 规模与牵引指标

Fireworks 报告商业规模快速扩张。Series C 时，公司称已服务超过 10,000 家公司，较 Series B 约增长 10 倍，覆盖数十万开发者，并且每天处理超过 10 万亿 tokens；第三方画像称到 2026 年初每日 tokens 达 15 万亿。开发者基数从 2024 年 2 月约 12,000 人增至当年年底 23,000 人。收入数据因来源和时间口径不同而差异很大，需要谨慎看待：公司称 2025 年 10 月 Series C 时年化收入已超过 $280M；Sacra 估算 2025 年底约 $305M，到 2026 年 5 月年化约 $800M；而 2025 年更早报道提到 $130M ARR，并称公司盈利且同比增长 20 倍。毛利率估计接近 50%，低于软件常见的 70% 以上，因为 GPU 成本计入 COGS；管理层向投资人表示目标是 60%。[CO018, CO019, CO020, CO021, CO022, CO023]

快照 KPI 表
指标	数值 / 状态	截至	置信度	缺口或备注
估值	$4.0B 投后（Series C）	2025 年 10 月	高	据报 2026 年 5 月讨论 $15B 轮次（未确认）
总融资额	>$327M	2025 年 10 月	高	包括 Series C 中约 ~$20M 二级转让
年化收入（公司）	>$280M	2025 年 10 月	中	公司陈述；未经审计
年化收入（Sacra 估计）	~$800M	2026 年 5 月	低	第三方估计；与公司时点冲突
客户	10,000+ 家公司	2025 年 10 月	中	较 Series B 增加约 ~10x
开发者	数十万	2025 年 10 月	中	2024 年底引用为 23,000
Tokens/day	10T+（2026 年初 15T）	2025 年 10 月	中	吞吐指标，不是收入
毛利率	~50%（目标 60%）	2026	低	Sacra 估计；GPU COGS 占比高
员工人数	未披露	2026	低	无可靠公开数字

数值汇总自公司公告和第三方分析师画像；收入和毛利率为估计值，时点冲突，且不是经审计财务数据。

[CO011, CO014, CO018, CO021, CO022, CO024]

FO003: 可投性指标

头部 KPI 快照之外的牵引力、收入轨迹和关键人物信号。

收入和增长为不同时间口径下的估算；关键人物集中度为定性判断。

[CO018, CO022, CO023, CO019, CO033]

1.5 里程碑与负面信号

公司时间线从 2022 年离开 PyTorch 开始，随后完成三轮融资，连续推出平台能力（FireAttention、FireFunction V2、FireOptimizer、监督微调和强化微调），2025 年举办 Dev Day，2026 年 3 月上线 Microsoft Foundry，并收购 Hathora 以加深实时计算编排能力。增长叙事之外，也有后文会深入讨论的真实负面信号。独立评测者指出 Fireworks「只是引擎」，对开发者成熟度要求不低，并标记文档薄弱、没有持续免费层。分析师强调三类结构性风险：vLLM、SGLang 等开源服务框架进步带来的推理商品化；AWS Bedrock、Azure 和 Vertex 的 hyperscaler 捆绑；以及硬件集中度风险，因为 Fireworks 不拥有自己的 GPU 机群，而 NVIDIA 已通过收购 Lepton 直接进入推理市场。这些压力与异常强的创始团队和快速收入爬坡同时存在。[CO025, CO026, CO027, CO028, CO029, CO030]

里程碑表
日期	事件	类型	金额 / 估值 / 状态	含义
2022	团队离开 Meta PyTorch；Fireworks 在 Redwood City 创立	创立	n/a	推理系统背景的起点
2024 年 2 月	达到 ~12,000 名开发者	规模	12,000 devs	早期自下而上牵引力
2024 年 3 月	Benchmark 领投 Series A	融资	$25M	首个机构领投方
2024 年 7 月	Sequoia 领投 Series B	融资	$52M @ $552M	Compound-AI 定位
2024	发布 FireFunction V2 和 FireAttention V2	产品	已发布	函数调用和长上下文速度
2024 年 12 月	开发者基础达到 ~23,000	规模	23,000 devs	约 10 个月翻倍
2025 年 6 月	发布 Supervised Fine-Tuning V2	产品	已发布	更广模型 + QAT 支持
2025	强化微调和 Dev Day 2025	产品	已发布	Agentic 调优切入口
2025 年 10 月	Lightspeed、Index、Evantic 共同领投 Series C	融资	$250M @ $4B	客户较 Series B 增长 10x
2026 年初	扩展到 ~15T tokens/day	规模	15T tokens/day（每日处理量）	吞吐领先主张
2026 年 3 月	在 Microsoft Foundry（Azure）上线	伙伴关系	已上线	超大云厂商分发
2026	收购 Hathora，用于实时计算编排	治理	收购	沿堆栈向上垂直整合
2026 年 5 月	据报正讨论以 $15B 融资	融资	$15B（传闻）	不到 1 年潜在约 ~4x 跳升

时间线汇总自 Fireworks 博客、融资公告和分析师画像；部分产品发布日期按公告月份近似处理。

[CO011, CO014, CO019, CO025, CO026, CO027]

FO001: 公司里程碑时间线

按日期梳理创立、融资、产品、规模和合作伙伴里程碑。

部分发布日期近似到公告月份；$15B 融资轮未确认。

[CO011, CO014, CO019, CO025, CO026, CO017]

1.6 图表

Chapter 02

02市场分析

2.1 市场边界与定义

Fireworks 所在的是托管 AI 推理市场：为生产应用服务、微调并专用部署开放权重大语言、视觉、音频和多模态模型。相关支出是企业付给第三方、让模型在生产环境运行的钱，而不是训练基础模型或租裸 GPU 的开销。核心边界之外包括前沿实验室消耗的基础模型训练算力、CoreWeave 和 Lambda 等提供商的原始 GPU IaaS，以及 OpenAI 和 Anthropic 的封闭模型 API；但封闭 API 是最重要的现状替代方案。Fireworks 正在扩展进入的相邻预算池包括语音智能体、结合向量数据库的检索增强生成，以及面向智能体的强化学习训练。Fireworks 最直接的替代品，是在 vLLM 或 SGLang 上自托管开放模型、AWS Bedrock 和 Azure Foundry 等 hyperscaler 套件，以及继续依赖封闭 API。必须先界定边界，因为标题里的 “AI inference” 市场数字把硬件、hyperscaler 和独立提供商支出混在了一起。[CM001, CM002, CM003, CM004, CM005]

市场定义表
细分 / 类别	纳入支出	排除支出	买方 / 付款方	与 Fireworks 的相关性
托管开放权重推理	开放模型按 token serverless serving	封闭模型 API 使用	工程 / 平台预算	核心市场
微调与适配	LoRA / SFT / RFT 训练支出	Foundation-model 预训练	ML / 工程预算	核心邻近
专用 / 预留 GPU serving	托管专用部署	Bare-metal GPU IaaS 租赁	平台 / 采购	核心市场
语音与多模态 agents	Streaming STT+LLM+TTS 堆栈	电话硬件	产品预算	扩张邻近
RAG / embeddings	Embedding + reranking 推理	Vector DB 许可证	工程预算	扩张邻近
封闭模型 APIs（替代品）	n/a（排除）	OpenAI / Anthropic API 支出	工程预算	主要替代品

边界定义 Fireworks 作为独立推理提供商可捕获的支出；封闭 API 和原始 GPU IaaS 被排除，但列为替代品。

[CM001, CM002, CM003, CM004]

2.2 多视角市场规模测算

Fireworks 的机会不能用单一数字概括，因此我们用三种视角交叉测算。最宽的自上而下视角是全球 AI 推理市场：MarketsandMarkets 估计 2025 年为 $106.15B，到 2030 年增至 $254.98B，CAGR 为 19.2%；其他研究机构把 2026 年规模放在约 $118B 至 $126B，把 2034 年放在 $312B 至 $536B。这个视角被半导体和 hyperscaler 支出主导，会高估 Fireworks 可触达市场。更窄一层是生成式 AI 模型支出，Gartner（Index Ventures 引用）预计从 2025 年 $14B 到 2028 年 $39B，接近翻三倍，增长大多来自有利于 Fireworks 的专业化和微调模型。最相关的可服务视角，是独立开放权重推理服务细分市场；该市场已围绕约七家提供商整合。Together AI 年化收入接近 $1B，Fireworks 在 $280M–800M 区间，Groq 估值 $6.9B，因此独立提供商收入池今天只有数十亿美元，但扩张很快。Fireworks 自身约 $280M 以上收入，意味着它在该细分市场早期占个位数百分点份额。[CM006, CM007, CM008, CM009, CM010, CM011]

TAM/SAM/SOM 或规模测算视角表
视角	发布方	年份	数值	CAGR / 备注	置信度	局限
自上而下 AI 推理（TAM）	MarketsandMarkets	2025-2030	$106.15B -> $254.98B	19.2% CAGR	中	由芯片和超大云厂商主导
自上而下 AI 推理（替代）	Fortune / Polaris / R&M	2026 / 2034	~$118-126B / $312-536B	13-19% CAGR	低	各机构区间很宽
GenAI 模型支出（视角）	Gartner（via Index）	2025-2028	$14B -> $39B	~40%/yr	中	包含封闭模型支出
独立推理利基（SAM）	Sacra / 三角测算	2026	低个位数 $B	快速增长	低	无标准分析师指标
Fireworks 收入（SOM）	Fireworks / Sacra	2025-2026	$280M -> ~$800M	高增长	低	口径年份冲突

三种口径交叉校验；自上而下的 TAM 会高估 Fireworks 能触达的市场，因此 SAM/SOM 依赖公司层面的估算，置信度低。

[CM006, CM007, CM008, CM009, CM010, CM011]

FM001: 市场规模测算框架

AI 推理机会的 TAM/SAM/SOM 分层。

各层采用不同时间口径；SAM 是交叉估算，不是分析师测算值。

[CM006, CM009, CM010, CM012]

FM002: 市场估算区间

按预测年份给出 AI 推理市场的低 / 基准 / 高估算，单位为十亿美元。

区间覆盖 MarketsandMarkets、Polaris、Fortune、Research and Markets 和 Gartner 估算；单位为十亿美元。

[CM006, CM007, CM008]

2.3 买家与细分版图

Fireworks 的需求横跨三个买家细分，采用路径不同。AI 原生初创公司（如 Cursor、Perplexity 和 Liner）自下而上采用：个体开发者从自助 API key 和按量付费开始，经济买家是工程或平台负责人。数字原生企业（DoorDash、Notion、Shopify、Upwork、Quora）把功能从试点推向生产，再扩展到专用部署和微调，预算由产品工程组织掌握。传统和受监管企业（Samsung，以及越来越多的医疗和金融服务买家）通过议价合同自上而下采用，需要 SSO、审计日志、数据驻留以及 HIPAA 或 SOC2 姿态，预算由平台和采购职能掌握。三类客户中，用户都是开发者，付款来自工程预算，主导触发因素是封闭模型 API 在生产规模下的成本、延迟或控制限制。Fireworks 的 AWS Strategic Collaboration Agreement 和 Microsoft Foundry 可用性，让它能在现有云采购渠道里触达这些买家，而不是以独立供应商身份另起评估流程。[CM013, CM014, CM015, CM016, CM017]

细分市场 / 买方地图
细分市场	买方	用户	付款方	采用触发因素
AI 原生初创公司	工程 / 平台负责人	开发者	工程预算	规模化后的闭源 API 成本 / 延迟
数字原生企业	产品工程组织	开发者	工程预算	从试点扩到生产
受监管企业 / Fortune 500	平台 + 采购	内部开发者	采购预算	数据控制与合规
语音 / 智能体构建者	产品负责人	应用用户	产品预算	低于 500ms 的延迟需求
RAG / 搜索团队	工程负责人	开发者	工程预算	检索延迟与成本

各细分市场里，用户通常是开发者，付款方来自工程或采购预算；采用触发因素随成熟度和监管要求而变。

[CM013, CM014, CM015, CM016]

FM003: 买方 / 客群地图

买方、用户、付款方关系，以及进入 Fireworks 的采用路径。

[CM013, CM014, CM017]

FM004: 采用漏斗或价值链地图

从认知到企业标准化的购买与部署阶段。

阶段综合自 Fireworks 的 GTM 描述；数值是示意性的相对权重，不是已披露转化率。

[CM015, CM016, CM017]

2.4 增长驱动与采用约束

几个因素在扩大 Fireworks 的市场。开源模型质量正在向封闭模型靠拢，智能体和复合 AI 系统会放大每个任务的推理调用次数，基于专有数据的微调正成为竞争必需品，企业也越来越希望掌控自己的 AI，而不是依赖少数封闭实验室。成本压力同样有利：规模化时，开放权重推理可以比封闭 API 便宜很多。反向约束也很强。随着 vLLM 和 SGLang 进步，推理正在商品化，优化栈的专有优势被压缩，并引发价格竞争；Fireworks 的 Llama 70B 价格与 Together 相差约 2% 以内。Hyperscaler 捆绑让 AWS、Azure 和 Google 把推理折进既有安全、计费和治理关系。GPU 供给集中，Fireworks 也不拥有自己的机群。EU AI Act 等监管增加合规开销；降低迁入成本的 OpenAI 兼容 API 同样降低迁出成本，限制长期锁定。[CM018, CM019, CM020, CM021, CM022, CM023]

增长驱动因素与约束表
驱动因素 / 约束	方向	时间	影响	尽调问题
开源模型质量收敛	驱动因素	当前	扩大可服务工作负载	跟踪开源与闭源模型质量差距
智能体式 / 复合 AI	驱动因素	1-2 年	单任务推理调用增加	衡量每条工作流的 token 增长
基于专有数据微调	驱动因素	当前	支出价值更高、黏性更强	评估 RFT/SFT 挂载率
企业数据所有权	驱动因素	1-3 年	偏好开源模型	调研买方自建与采购取舍
推理商品化	约束	当前	利润率 / 价格受压	监测 vLLM/SGLang 能力持平
超大云厂商捆绑	约束	当前	渠道被截留风险	评估 Bedrock/Azure 重叠
GPU 供应集中	约束	持续	产能 / 成本暴露	审查 GPU 合同
监管（EU AI Act）	约束	1-3 年	合规开销	按层级梳理义务

驱动因素放大市场，约束则压缩利润率或截留渠道；时间栏说明每项因素何时会实质影响采用。

[CM018, CM019, CM020, CM021, CM022, CM023]

2.5 规模测算缺口与相互矛盾的估计

几个缺口限制了市场规模测算的可信度。公开的 “AI inference” 总量差异很大，并打包了不兼容类别（芯片、hyperscaler 服务和独立软件），因此自上而下的 TAM 不能干净映射到 Fireworks 可触达收入。没有标准分析机构衡量独立推理提供商收入池；只能从单家公司估计拼出，时间口径和可靠性参差不齐。不同机构预测的 CAGR 约在 13% 至 19% 之间，2034 年估计相差超过 $200B。在其中，Fireworks 自身收入数字也被多个来源争议。这些缺口说明市场显然很大且高速增长，但与估值相关的可服务和可获得份额仍是估计，不是实测事实；任何规模测算都应视为方向性判断。我们保留这种精度失败，而不是断言一个单一 SAM。[CM025, CM026, CM027, CM028]

2.6 图表

Chapter 03

03竞争对手

3.1 竞争格局

推理市场已经分成四个清晰的竞争层，Fireworks 每一层都承压。托管开放模型平台，主要是 Together AI、Baseten 和 Replicate，是最接近的直接同业，围绕模型广度、开发者体验和按 token 价格竞争。垂直整合的芯片玩家 Groq、Cerebras 和 SambaNova，不靠通用 GPU 上的软件优化，而是用定制硬件攻击延迟和成本。Hyperscaler 套件——AWS Bedrock、Google Vertex AI、Microsoft Azure Foundry 和 Databricks Model Serving——结构性威胁最大，因为它们把模型访问、基础设施、治理和合同压进一个平台。最后，vLLM 和 SGLang 等开源服务框架，加上 NVIDIA NIM 这样的打包层和 OpenRouter 这样的路由器，正在商品化 Fireworks 自身栈里的专有优势。现状替代方案包括继续使用封闭 API 和内部自托管。最可能的新进入者压力来自 NVIDIA 本身：它通过收购 Lepton 和推出竞争性 GPU 云市场直接进入推理，把关键供应商变成对手。[CP001, CP002, CP003, CP004, CP005, CP006]

FP001: 竞争定位图

按价格竞争力（x）与企业 / 广度深度（y）定位供应商。

轴位置为作者定性判断，综合了定价和能力证据。

[CP001, CP014, CP015]

3.2 竞争对手画像

Together AI 是 Fireworks 最接近的直接竞争对手：它由 Percy Liang、Chris Ré 和 Vipul Ved Prakash 于 2021 年创立，2025 年 2 月以 $3.3B 估值完成 $305M Series B，据称到 2026 年初年化收入约 $1B，覆盖无服务器推理、专用集群、微调、语音和强化学习。Baseten 定位为企业推理工程平台，提供自托管和混合 VPC 部署；它在 2026 年 1 月以 $5B 估值完成 $300M 融资，由 IVP 和 CapitalG 领投，据称 NVIDIA 投入 $150M，使总融资升至约 $585M。Groq 以定制 LPU 芯片竞争，2025 年 9 月以 $6.9B 估值融资 $750M，并宣传 Llama 模型上每秒 750 多 tokens，且与 Meta 合作为官方 Llama API 提供动力。Cerebras 和 SambaNova 在高端低延迟市场延续硬件主导攻击，Replicate、Modal 和 Anyscale 争夺开发者心智。相比之下，Fireworks 拥有 $4B 估值和 $280M 以上收入，并在可靠性和函数调用上处于品类领先。[CP007, CP008, CP009, CP010, CP011, CP012]

竞品画像表
竞品	层级	融资 / 估值	目标客户	产品范围	指示性价格（Llama 70B）	战略方向
Fireworks AI	托管开源模型	已融资 $327M / $4.0B	AI 原生 + 企业开发者	无服务器、微调、RFT、专用实例、语音	$0.90/M	向上走：调优、智能体、治理
Together AI	托管开源模型	$533.5M / $3.3B（洽谈 $7.5B）	从初创公司到企业	无服务器、集群、微调、语音、RL	$0.88/M	自有 GPU 集群 + 产品广度
Baseten	托管开源模型	~$585M / $5.0B（洽谈 $11B）	合规要求重的企业	定制模型、VPC / 自托管运行时	按报价	企业推理工程
Replicate	托管开源模型	私有 / 未披露	开发者 / 实验	广泛模型目录、API 调用运行	按运行次数	漏斗顶部开发者心智
Groq	垂直芯片	$750M+ / $6.9B	延迟敏感型工作负载	LPU 推理 API	$0.59/M	定制芯片 + Meta Llama API
Cerebras / SambaNova	垂直芯片	私有 / 数十亿美元级	性能敏感型	晶圆级 / RDU 推理	按报价	硬件驱动的延迟领先
AWS Bedrock / Azure / Vertex	超大云厂商捆绑	上市巨头	既有云企业客户	捆绑模型访问 + 治理	捆绑	供应商整合
Databricks / NVIDIA NIM	超大云厂商 / 打包	上市 / 私有	数据平台与基础设施买方	模型服务 / NIM 打包	捆绑	将推理吸收入平台

融资和估值来自公司公告与 Sacra；价格为 Llama 70B 无服务器指示性费率，会随层级和日期变化。

[CP007, CP008, CP009, CP010, CP011, CP014]

3.3 能力、定价与 GTM 对比

能力上，Fireworks 通过可靠性和结构化输出做出差异：独立监测显示其 2026 年 Q1 可用性为 99.8%，在专业提供商中最高；FireFunction 模型多工具函数调用准确率约 92%，领先 Together 和 Groq，距离 GPT-4o 只差几个百分点。价格上，竞争极薄：Llama 3.3 70B 在 Fireworks 每百万 tokens 约 $0.90，Together 为 $0.88，Groq 为 $0.59；同一模型在七家提供商之间价差约六倍。原始速度上，Groq 的 LPU 以每秒 400–750 tokens 领先，而 Fireworks 约为 145；但 Fireworks 赢在延迟一致性和负载下稳定性。GTM 上，Together 和 Baseten 与 Fireworks 一样走自下而上的开发者路径，但 hyperscaler 通过既有采购、安全和计费关系赢得分发。信任与监管方面，Fireworks、Together 和 Baseten 均提供 SOC2/HIPAA 姿态以及 VPC 或数据驻留选项，Baseten 最偏向自托管、合规重的部署。[CP014, CP015, CP016, CP017, CP018, CP019]

功能 / 能力矩阵
能力	Fireworks	Together AI	Baseten	Groq
无服务器开源模型 API	是	是	是	是
模型目录规模	50+	200+	聚焦定制	15-20
LoRA 微调	是	是 + 完整微调	是	否
函数调用质量	同类最佳（~92%）	好	好	基础
定制芯片	否	否	否	是（LPU）
VPC / 自托管	EKS 隔离部署	专用	是（核心强项）	有限
语音智能体平台	是	是	合作伙伴	否
强化微调	是	是	部分支持	否

根据供应商文档、TokenMix 和 Sacra 整理；「同类最佳」反映独立 FireFunction 基准测试结果。

[CP014, CP015, CP016, CP017]

定价 / 包装对比
指标	Fireworks	Together AI	Groq	备注
Llama 3.3 70B ($/1M)	$0.90	$0.88	$0.59	Fireworks 比 Together 高 ~2%，比 Bedrock 低 66%
Llama 3.3 8B ($/1M)	$0.20	$0.18	$0.05	Groq 最便宜
2026 年 Q1 正常运行时间	99.8%	99.7%	99.4%	Fireworks 最高
吞吐量（tok/sec）	145	95	420	Groq 最快
TTFT P50	150ms	220ms	65ms	Groq 延迟最低
微调	LoRA $16/M	LoRA+完整微调 $14/M	None	Together 最便宜 / 覆盖最广
批处理 API	尚未	是（优惠 30-50%）	否	Together 有优势

价格和基准来自 TokenMix 2026 年 4 月数据与 DeployBase；数字为指示性，变化频繁。

[CP014, CP015, CP018]

FP002: 功能广度 / 能力图谱

四家直接竞争对手和芯片竞争对手的能力覆盖情况。

能力单元格由服务商文档和基准测试汇总得出。

[CP016, CP017]

3.4 切换成本、锁定与分发能力

推理的切换成本结构性偏低。包括 Fireworks、Together、Groq 和 Baseten 在内的大多数提供商都暴露 OpenAI 兼容 API，供应商之间迁移可在数分钟内完成；OpenRouter 和 TokenMix 等路由聚合器还主动鼓励跨提供商多栖和自动故障转移。这限制了所有人的长期锁定，意味着份额要靠性能、调优和企业集成守住，而不是靠合同。分发能力越来越关键：hyperscaler 和 NVIDIA 控制独立提供商依赖的 GPU 供给、安全姿态和采购关系。Fireworks 的反制，是通过 AWS Strategic Collaboration Agreement 和 Microsoft Foundry 可用性接入这些渠道，同时向微调、强化学习、语音和企业治理上移，创造更粘、更高价值的关系。Baseten 的 VPC 与自托管足迹、Together 的自有数据中心和 GPU 集群策略，是对同一分发与供给问题的替代答案。[CP020, CP021, CP022, CP023, CP024]

3.5 护城河耐久性与负面证据

Fireworks 的护城河真实但狭窄。自研 FireAttention 和 FireOptimizer 栈把推理系统工程转化为性能与价格优势，可靠性和函数调用领先也是真实优势。但护城河面临清晰侵蚀路径。vLLM 和 SGLang 等开源服务框架持续缩小性能差距，Baseten 也公开基于它们构建；NVIDIA 把 NIM 推成打包层；Snowflake 发布 Arctic Inference，作为开放 vLLM 插件。资本更厚的竞争对手抬高门槛：Groq 估值 $6.9B，Baseten 估值 $5B，Together 传闻接近 $7.5B，都有更多资产负债表空间承诺 GPU 和企业 GTM。硬件集中也是负面信号，因为 Fireworks 不拥有 GPU，而供应商兼投资人 NVIDIA 现在直接竞争。长期问题在于，Fireworks 能否比生态系统商品化服务层更快，把栈延伸到调优、智能体和治理。[CP025, CP026, CP027, CP028, CP029, CP030]

护城河耐久度 / 竞争风险台账
风险	机制	严重性	证据
开源服务平价	vLLM/SGLang 缩小性能差距	高	Baseten 基于 SGLang/vLLM/TGI 构建
NIM 打包	NVIDIA 标准化企业推理	中	NVIDIA 推动 NIM 分发
供应商变竞品	NVIDIA 通过 Lepton 进入推理	高	NVIDIA GPU 云市场
超大云厂商捆绑	Bedrock/Azure 吸收推理	高	Bedrock 自定义模型导入（Qwen）
资本不对称	竞争对手融到更大轮次	中	Groq $6.9B，Baseten $5B
价格商品化	单 token 价差薄如刀片	高	Fireworks 与 Together 相差 2% 以内
低切换成本	OpenAI 兼容 API + 路由器	中	OpenRouter 多归属
硬件集中	无自有 GPU 集群	中	从第三方采购 NVIDIA/AMD

风险台账综合 Sacra 分析与定价 / 基准来源；严重性为作者的定性判断。

[CP025, CP026, CP027, CP028, CP029, CP030]

FP003: 护城河 / 就绪度 KPI

Fireworks 竞争位置的指标。

KPI 综合了基准测试和融资证据；速度比为 Fireworks 吞吐量除以 Groq 吞吐量。

[CP014, CP015, CP028]

3.6 图表

Chapter 04

04财务

4.1 收入流与定价模型

Fireworks 采用按使用量计费的 B2B 模式，叠加在映射客户生命周期的多个产品表面上。无服务器推理按 token 计费，微调按训练 token 计费，强化微调按 GPU-hour 计费，按需专用部署按 GPU-second 或 GPU-hour 计费；预留容量则按更长期承诺单独签约、议价定价。这让 Fireworks 几乎能在客户 AI 工作流的每个阶段捕获收入，从实验到规模化生产。公开无服务器费率展示了模型：Llama 3.3 70B 每百万 tokens 约 $0.90，8B 模型 $0.20，DeepSeek V3 $0.50；图像生成每张约 $0.013 至 $0.04，预留容量每 replica 每小时约 $4.80。收入结构未披露，但分析师预计收入会从商品化无服务器 token 量，转向价值更高的专用部署、微调和企业合同；这将随时间改善毛利率和收入耐久性。[CI001, CI002, CI003, CI004, CI005]

收入流表
收入流	计费基础	生命周期阶段	利润率特征	披露
无服务器推理	按 token	实验与生产	较低（商品化）	费率公开
微调（LoRA/SFT）	按训练 token	适配	较高	费率公开
强化微调	按 GPU 小时	适配 / 智能体	较高	费率公开
按需专用实例	按 GPU 秒 / 小时	生产扩容	较高	费率公开
预留容量	合同承诺	规模化企业	最高（协商）	未公开
语音 / 多模态	按用量	扩张	混合	部分公开

收入流和计费口径来自 Sacra 与 Fireworks 定价；各收入流占比未披露，利润率画像只能定性判断。

[CI001, CI002, CI003]

定价 / 变现表
项目	价格	单位	备注
Llama 3.3 70B	$0.90	每 1M tokens	比 Together 高约 2%，比 Bedrock 低 66%
Llama 3.3 8B	$0.20	每 1M tokens	入门工作负载
DeepSeek V3	$0.50	每 1M tokens	前沿开放模型
Flux 1.1 Pro	$0.04	每张图像	最高 1024x1024
SDXL 1.0	$0.013	每张图像	成本更低的图像生成
预留容量	$4.80	每小时每副本	约 50 个并发请求
LoRA 微调（70B）	$16	每 1M 训练 tokens	比 Together 高 $2/M
免费额度	$1	一次性	无持续免费层

TokenMix 与 DeployBase 给出的 2026 年 4 月无服务器指示性费率；价格变动频繁，且不含协商后的企业条款。

[CI004, CI005, CI007]

FI001: 收入模型桥

基于用量的收入流如何沿客户生命周期汇聚为总收入。

各收入流占比仅为示意；Fireworks 未披露收入结构。

[CI001, CI002, CI003]

4.2 市场进入与销售效率

Fireworks 的 GTM 入口自下而上，扩张自上而下。开发者用自助 API key 和按量付费即可立即开始；支持来自 $1 免费额度，而非持续免费层，标准速率限制约为每分钟 600 次请求。更大客户会升级为议价企业关系，获得更高速率限制、预留容量、客户管理、定制优化和私有部署。其上叠加现场与合作伙伴销售动作，由 AWS Strategic Collaboration Agreement 锚定：该协议资助概念验证和初创加速项目，让 Fireworks 通过既有采购渠道接触企业买家，而不是要求客户做独立供应商评估。CAC、回本期和净收入留存等销售效率指标未披露；但先落地、再扩张的结构是主要效率杠杆，一个无服务器功能可增长为专用、微调、语音和预留容量支出。按 10,000 家以上公司粗算，混合年化每家公司收入估计接近 $28,000，但基数偏向少数大型生产部署。[CI006, CI007, CI008, CI009, CI010]

单位经济表
指标	数值 / 状态	驱动因素	置信度
毛利率	~50%	GPU COGS 占比高	中
目标毛利率	60%	利用率 + Blackwell + 组合	低
混合 ARPA	~$28K/yr	10,000+ 家公司	低
收入集中度	向大型部署倾斜	生产级巨鲸客户	低
Multi-LoRA 利用率	每个基础模型挂载多种变体	单变体成本更低	中
CAC / 回本期	未披露	自下而上 + 伙伴销售	低
净收入留存	未披露	先落地、再扩张	低

单位经济数据来自 Sacra 估算或定性判断；CAC、回本期和 NRR 均未公开。

[CI008, CI009, CI011, CI012, CI013]

4.3 成本结构与毛利率驱动

Fireworks 不是纯软件业务：GPU 采购、容量规划和区域基础设施都是计入 COGS 的真实成本输入，因此 Sacra 估计毛利率接近 50%，远低于订阅软件常见的 70% 以上。管理层告诉投资人，公司目标是通过更高 GPU 利用率、NVIDIA Blackwell 等新架构带来的硬件效率提升，以及收入结构向专用和企业工作负载转移，把毛利率推到 60%。核心经济逻辑是，自研推理优化 FireAttention 和 FireOptimizer 能把工程能力转成定价权：如果 Fireworks 比客户自托管更快、更高吞吐地服务模型，就可以在低于替代方案总成本的同时收取溢价。Multi-LoRA 把许多微调变体整合到单个基础模型部署上，降低每个变体的计算成本。成本环境受 NVIDIA 和 AMD 数据中心 GPU 经济性塑造；两家公司都报告 AI 加速器收入快速增长，说明 Fireworks 的投入成本处在供应商驱动、容量受限的市场里。[CI011, CI012, CI013, CI014, CI015, CI016]

FI002: 单位经济桥

GPU 成本如何借助专有优化和定价权转化为毛利率。

[CI011, CI012, CI013, CI014]

4.4 公开牵引与私有指标缺口

公开牵引信号很强，但时间口径不一致。Fireworks 称其 2025 年 10 月 Series C 时年化收入超过 $280M；Sacra 估算 2025 年底约 $305M，到 2026 年 5 月年化约 $800M；第三方画像称 2026 年初超过 $315M；2025 年更早报道则称 $130M ARR，并称公司盈利且同比增长约 20 倍。平台每天处理超过 10 万亿 tokens（2026 年初为 15 万亿），覆盖 10,000 多家公司和数十万开发者。这些大多是公司披露或估算数字；经审计财务、收入结构、净收入留存、流失率和员工数均未公开。十二个月内收入区间从 $130M 到约 $800M 年化，既反映真实超高速增长，也反映测量口径不一致；任何单一数字都应视为方向性而非已验证。[CI017, CI018, CI019, CI020, CI021]

公开财务缺口表
指标	公开状态	缺什么	尽调路径
收入 / ARR	估算相互冲突	单一、可对账且带日期的口径	管理层确认 ARR
毛利率	分析师估算约 50%	经审计毛利率	确认性财务资料
净收入留存	未披露	扩张 / 流失数据	Cohort 留存包
员工数	未披露	员工数量	HR / LinkedIn 估算
烧钱与跑道	未披露	现金流量表	银行余额 + 烧钱
收入结构	未披露	按收入流拆分	产品收入拆分

所列指标均为私有信息；本表界定了验证财务质量所需的尽调问题。

[CI017, CI018, CI020, CI028, CI029]

FI003: 财务估算区间

不同来源和时间点对 Fireworks 年化收入的估算，单位为 USD millions。

估算覆盖公司表述和不同时间点的第三方分析师数据；区间近似对应其给出的点估计。

[CI017, CI018, CI019]

4.5 资本充足性与融资依赖

Fireworks 已通过种子轮、Series A、B、C 融资超过 $327M；仅 2025 年 10 月的 Series C 就提供 $250M，其中约 $230M 为一级资本、$20M 为二级交易，估值 $4B。这笔一级注资叠加 2025 年据称盈利和高增长收入基础，说明近期资本充足性较舒服；但现金余额、烧钱速度和跑道未披露。公司已表示未来一年会将计算足迹扩大三到四倍，这是资本密集计划，会提高对 GPU 获取的依赖，也可能成为下一轮融资触发因素；Sacra 称截至 2026 年 5 月，Fireworks 正洽谈以 $15B 估值再次融资。主要融资依赖是 GPU 供给：Fireworks 不拥有机群，而是从第三方采购 NVIDIA 和 AMD 容量，暴露于配额约束以及 NVIDIA 自身进入推理的风险。未披露公开债务或项目融资义务。[CI022, CI023, CI024, CI025, CI026]

资本充足性表
项目	数值 / 状态	截至	备注
累计融资	>$327M	Oct 2025	从种子轮到 Series C
Series C 规模	$250M	Oct 2025	$230M 主融资 + $20M 二级交易
估值	$4.0B	Oct 2025	投后
盈利能力	据称已盈利	Mid-2025	据 scroll.media；未验证
现金 / 烧钱 / 跑道	未披露	2026	尽调阻塞项
计划资金用途	算力扩张 3-4x	明年	资本密集
下一轮信号	$15B 融资洽谈	May 2026	据 Sacra；未确认
债务 / 项目融资	未披露	2026	无公开义务

资本数据来自公司与 Sacra；现金、烧钱和跑道未公开，资本充足性评估因此受限。

[CI022, CI023, CI024, CI025]

FI004: 资本强度 / 现金流图谱

资本如何流入算力和基础设施，再回流为收入和利润率。

流向综合了披露的资金用途和分析师估算；现金和烧钱速度未披露。

[CI022, CI024, CI025, CI026]

4.6 财务结论

收入质量上，Fireworks 展现可信的超高速增长，按使用量计费的模型可覆盖客户生命周期支出；但缺少审计数据、收入结构和留存指标，限制了信心。毛利率上，约 50% 的毛利率是核心财务弱点：GPU 成本使其结构性低于软件常态，通向公司所称 60% 目标的路径依赖利用率提升和结构转移，合理但尚未证实。资本强度上，三到四倍计算扩张加上缺少自有 GPU，使这个模式比典型 SaaS 更吃资本、更依赖供给。主要尽调阻塞点包括一套可调和的收入数字、毛利率和单位经济验证、烧钱与跑道，以及净收入留存。整体图景是一家快速扩张、资金充足、有真实但受供应商暴露影响的经济性的公司，而不是已经证明的高毛利软件复利机器。[CI027, CI028, CI029, CI030]

4.7 图表

Chapter 05

05产品与技术

5.1 以客户工作流定义产品

从客户角度看，Fireworks 是把开源模型带入生产的那一层：跑得快、便宜、可靠，客户无需管理 GPU。开发者注册后，把 OpenAI 兼容 API 指向 Llama 4、DeepSeek 或 Qwen 等模型，即可获得低延迟推理、函数调用、JSON 模式结构化输出和流式传输。随着用量增长，同一客户可以用专有数据微调模型，迁移到专用或预留 GPU 容量以获得有保障的吞吐，加入用于 RAG 的检索和 embeddings，并部署语音智能体。平台横跨文本、图像（Flux、SDXL）、音频和多模态格式，覆盖数百个模型，并对主要新版本提供首日支持。它为客户完成的核心工作，是把「能在 notebook 里跑」的模型，与「能在生产环境服务数百万用户」的模型之间的缺口压平；Fireworks 将此定位为实验与发货的区别。这也是客户把它称为推理引擎而非应用的原因：它提供速度、成本和控制，产品由客户自己构建。[CE001, CE002, CE003, CE004, CE005]

工作流 / 用例表
用例	客户案例	结果	来源类型
代码生成	Cursor	Fast Apply 约 1,000 tokens/sec	客户故事
生产力 AI	Notion	延迟从 2s 降至 350ms	客户故事
代码辅助	Sourcegraph	延迟降低 30%，采纳率提升 2.5x	客户 / AWS
提案起草	Upwork（Uma）	实时生成定制提案	客户故事
对话式搜索	Quora（Poe）	响应速度提升至 3 倍	报道
邮件助手	Superhuman	Ask AI 复合系统	客户故事
企业搜索	Hebbia	快速接入新的开放模型	分析师

用例和效果来自 Fireworks 客户故事、AWS 案例研究和分析师报道；结果由厂商或客户报告。

[CE002, CE018, CE019, CE020]

FE002: 客户工作流 / 运行流

开发者从 API 调用，经推测解码，到获得响应的路径。

[CE001, CE013, CE015]

5.2 产品模块与资产版图

Fireworks 的产品表面可拆成几个模块。无服务器推理是入口产品：按 token 付费访问 50 多个活跃服务模型（目录中有数百个），包括 Llama 4 Scout 和 Maverick、DeepSeek V3、Qwen 3、Mixtral、Gemma 3 和 Phi-4，并通过 Flux、SDXL 和视觉模型生成图像。FireFunction 是自研函数调用模型族，用于工具使用和结构化输出。定制模块包括 LoRA 微调、带量化感知训练的 Supervised Fine-Tuning V2，以及面向智能体任务的 Reinforcement Fine-Tuning，均通过 Build SDK 和 Experiment Platform 暴露。部署模块覆盖无服务器、按需专用和预留容量，并提供 multi-LoRA 托管，把许多微调 adapter 打包到一个基础部署上。更新的表面包括 Voice Agent Platform，它将转写、语言模型和工具调用共址，以实现低于 500ms 响应；以及 BYOB 安全训练，让企业从自己的 AWS S3 bucket 训练。合在一起，这些模块让单个客户关系能从一个无服务器功能扩展为完整生产 AI 运行时。[CE006, CE007, CE008, CE009, CE010]

产品模块 / 资产矩阵
模块	功能	计费	成熟度
无服务器推理	按 token 调用 50+ 个托管模型	按 token	正式可用
FireFunction	函数调用 / 结构化输出	按 token	正式可用
LoRA 微调 / SFT V2	用 QAT 定制模型	按训练 token	正式可用
强化微调	训练智能体以超过闭源模型	按 GPU-hour	正式可用
专用 / 预留部署	在专用 GPU 上保障吞吐	按 GPU-hour	正式可用
Multi-LoRA 托管	一个基础模型挂载多个 adapter	按 token	正式可用
Voice Agent Platform 产品线	STT + LLM + 工具调用，低于 500ms	按用量	较新
Build SDK / Experiment Platform 开发工具	以编程方式构建、调优、评估	已包含	较新

模块清单汇总自 Fireworks 博客和文档；成熟度为定性判断（正式可用 = 已普遍开放，较新 = 近期推出）。

[CE006, CE007, CE008, CE009]

5.3 架构与运营模式

Fireworks 在通用 NVIDIA GPU 上运行自研多层推理栈。内核层，FireAttention 是定制 CUDA attention 实现；Fireworks 称其显著快于 vLLM 和 TensorRT-LLM，并在多个版本中扩展，以支持长上下文和 Llama 4 chunked local attention 等架构。其上，FireOptimizer 执行自适应 speculative execution，针对每个工作负载个性化 speculative decoding、draft-model 选择和缓存；公司称生产中延迟最高可降低约 3 倍，并在 NVIDIA Blackwell B200 硬件上原生支持 FP4。服务拓扑结合无状态请求路由器、用于 speculative decoding 的 draft 和 target GPU pods、分布式 KV cache、连续 batching 和 disaggregated serving，扩展到文档记录的约每分钟 50,000 次请求测试。Multi-LoRA 将许多微调变体整合到单个基础模型上。运营模式对开放模型中立：Fireworks 赌的是运行任一时点胜出的开放模型，而不是押注单个模型；因此对新版本首日支持成为核心工程纪律。[CE011, CE012, CE013, CE014, CE015, CE016]

技术 / 运营架构表
层级	组件	功能	差异化
API	OpenAI 兼容 API	模型访问、流式输出、JSON mode	切换成本低
编排	无状态请求路由器	在 pods 间路由请求	扩展至约 50K RPM
优化	FireOptimizer	自适应推测执行	延迟最高降低约 3x
推测	草稿 + 目标 pods	推测解码	并行生成 token
内核	FireAttention	自研 CUDA attention	快于 vLLM / TensorRT-LLM
内存	分布式 KV cache	复用上下文，压低 prefill	长上下文延迟更低
适配	Multi-LoRA	每个基础模型挂载多个 adapter	提高 GPU 利用率
硬件	NVIDIA / AMD GPUs（含 B200）	算力底座，FP4	新硅片首日支持

架构汇总自 Fireworks 博客 / 文档和独立技术文章；性能主张由厂商或分析师报告。

[CE011, CE012, CE013, CE014, CE015]

FE001: 产品架构图

Fireworks 推理栈的分层结构，从 API 一直到 GPU 硬件。

分层综合自 Fireworks 博客 / 文档和独立架构文章。

[CE011, CE012, CE013, CE014]

5.4 部署、可靠性、集成与路线图

Fireworks 支持跨全球多区域机群的无服务器、按需专用和预留部署，文档位置包括 Frankfurt、Iceland、Tokyo，以及美国、欧洲和 APAC 区域，可满足延迟和数据驻留需求。OpenAI 兼容 API 加上 LangChain、LlamaIndex 等框架的 SDK 和连接器降低了集成难度，从封闭 API 迁移可在数分钟内完成。可靠性是核心主张：独立监测显示，2026 年 Q1 可用性为 99.8%，在专业提供商中最高，且负载下稳定性强。记录在案的生产结果包括：Cursor 在代码生成中达到约每秒 1,000 tokens；Notion 将 AI 响应延迟从约 2 秒降至 350 毫秒；Sourcegraph 延迟降低 30%，completion acceptance 提高 2.5 倍。Series C 资助的路线图瞄准更深的调优与推理对齐研究、端到端模型生命周期工具链，以及全球计算扩大三到四倍；收购 Hathora 则用于加深实时编排。[CE017, CE018, CE019, CE020, CE021, CE022]

路线图 / 发布 / 开发阶段表
项目	阶段	时间	含义
FireAttention（v2+）	已发布	2024+	长上下文速度
FireFunction V2	已发布	2024	函数调用
FireOptimizer	已发布	2024	自适应优化
Supervised Fine-Tuning V2	已发布	Jun 2025	QAT，更多模型
强化微调	已发布	2025	智能体调优
Voice Agent Platform 产品线	已发布	2025-2026	新预算类别
Microsoft Foundry 发布	已发布	Mar 2026	Azure 分发
模型生命周期工具链	计划中	2026+	端到端创建
3-4x 算力扩张	计划中	2026	容量扩容

发布时间线来自 Fireworks 博客、文档更新日志和分析师报道；计划中项目是公司披露的路线图意图。

[CE008, CE021, CE022]

FE004: 产品成熟度 / 能力图谱

各模块在能力维度上的成熟度。

成熟度单元格为作者定性判断，综合了产品和合规证据。

[CE006, CE008, CE029, CE031]

5.5 差异化、IP 与数据

Fireworks 的差异化由工程驱动。核心知识产权是自研推理引擎，尤其是 FireAttention 的定制 kernels 和 FireOptimizer 的自适应优化，把创始团队 PyTorch 背景中的系统能力转化为可衡量的速度和成本优势；没有公开专利列出，因此护城河是 know-how，而不是注册 IP。第二个差异化来源是产品—模型协同设计：客户交互形成数据反馈环，持续改进微调模型；Fireworks 将其描述为企业用 AI 构建竞争护城河的方式。第三是广度和新鲜度：数百个开放模型和模态的首日支持，让平台受益于模型更替，而不是被它威胁。主要脆弱点在于，优化优势建立在 vLLM 和 SGLang 等持续进步的开源框架之上，因此差异化必须不断重新赚回来。获取前沿 NVIDIA 和 AMD GPU 的供给也是助推优势，但并非独占优势。[CE023, CE024, CE025, CE026, CE027]

FE003: 关键依赖图

Fireworks 平台依赖的上游要素。

依赖图综合自技术来源；边的方向表示从上游到平台的依赖。

[CE015, CE024, CE026, CE027]

5.6 信任、安全、安保与合规

Fireworks 的企业姿态为受监管买家而建。平台默认零数据留存，提供单点登录、审计日志和数据驻留控制；其基于 AWS 的推理解决方案符合 HIPAA 和 SOC2 Type II。最敏感的工作负载可使用 airgapped EKS 部署，以及 bring-your-own-bucket 安全训练，把训练数据留在客户自己的 AWS S3 中。JSON mode 和 grammar-constrained decoding 等结构化输出控制提升可靠性，减少智能体工作流中的畸形响应；FireFunction 的高 schema-compliance 率支持可靠工具使用。这些能力打开了医疗、金融服务和政府相邻工作负载等受监管垂直，此前这些领域很难由独立推理供应商触达。产品—模型协同设计循环中的持续评估和强化学习进一步强化质量。缺口仍在：Fireworks 未发布正式标准层 SLA，企业 SLA 逐案议价；独立评测者指出部分文档薄弱，这些都是安全敏感买家的尽调事项。[CE028, CE029, CE030, CE031, CE032]

信任 / 质量 / 合规表
控制项	状态	范围	备注
SOC2 Type II	合规	基于 AWS 的推理	据 AWS 案例研究
HIPAA	合规	基于 AWS 的推理	支持医疗场景
零数据留存	默认	企业	隐私姿态
SSO / 审计日志	可用	企业	治理
数据驻留	可用	多区域	Frankfurt / Iceland / Tokyo
Airgapped EKS	可用	敏感工作负载	隔离
BYOB 安全训练	可用	SFT / RFT	客户 AWS S3
标准层 SLA	未发布	无服务器	企业客户协商

合规姿态来自 AWS 案例研究和 Sacra；未发布标准 SLA 是一个尽调项。

[CE028, CE029, CE030, CE032]

5.7 图表

Chapter 06

06客户

6.1 客户基数分层

Fireworks 的客户基数横跨三大类，区别在于买家、用户、付款方和采用路径。AI 原生初创公司，包括 Cursor、Perplexity、Liner 和 Cresta，自下而上采用：个体开发者从自助 API key 开始，经济买家是工程或平台负责人。数字原生企业，如 DoorDash、Notion、Shopify、Upwork 和 Quora，把功能从试点推入生产，并扩展到专用部署和微调，预算由产品工程组织掌握。Samsung 和 Uber 代表的传统与更大型企业，以及医疗和金融服务中越来越多的受监管买家，通过需要合规和数据驻留控制的议价合同自上而下采用。三类客户中，用户是开发者，付款来自工程或采购预算，用例集中在代码辅助、对话式 AI、企业搜索、智能体工作流和语音。地域上，客户基数偏北美和欧洲，但 API 全球可用；垂直覆盖软件、电商、市场平台、客服和法律科技。[CU001, CU002, CU003, CU004, CU005]

客户分层表
客群	示例客户	购买方 / 付款方	用例	采用路径
AI 原生初创公司	Cursor, Perplexity, Liner, Cresta	工程负责人 / 工程预算	代码、搜索、对话式 AI	自下而上自助采用
数字原生企业	DoorDash、Notion、Shopify、Upwork、Quora 等客户	产品工程 / 工程预算	生产级 AI 功能	从试点到生产
大型 / 受监管企业	Samsung, Uber	平台 + 采购	企业 AI 路线图	自上而下签约
企业搜索 / 智能体	Sourcegraph, Hebbia	工程负责人 / 工程预算	代码 + 企业搜索	先落地再扩张
通信 / 生产力	Superhuman	产品负责人	复合式 AI 助手	功能牵引

客群和示例客户来自 Fireworks 博客、Sacra 和 AI Market Watch；分层边界是分析判断，部分客户横跨多个客群。

[CU001, CU002, CU003, CU004]

FU001: 客户旅程图

客户从发现产品到企业标准化部署会经过的阶段。

旅程综合自 Fireworks 的 go-to-market 描述；并非所有客户都会走完每个阶段。

[CU002, CU009, CU022]

6.2 采用轨迹

采用曲线陡峭上行。Fireworks 在 2025 年 10 月 Series C 时称服务超过 10,000 家公司，较 Series B 时约 1,000 家增长约 10 倍，并覆盖数十万开发者。开发者基数从 2024 年 2 月约 12,000 人增至当年年底 23,000 人。使用强度高：平台每天处理超过 10 万亿 tokens，到 2026 年初升至约 15 万亿，说明许多账户跑的是生产负载，而不是实验负载。客户沿 land-and-expand 路径推进：先用无服务器推理服务单个功能，再扩展到专用部署、微调、强化微调、用于检索的 embeddings 和语音智能体。Hebbia 的分析师评论说明，单个推理关系只要锚定新开放模型的快速访问和高并发延迟保证，就可能增长为更广的基础设施依赖。这条轨迹在广度和用量上很强，但账户级留存和 cohort 扩张数据未披露。[CU006, CU007, CU008, CU009, CU010]

客户增长 / 采用轨迹表
指标	数值	截至	来源依据
服务公司数	~1,000	Series B 轮（2024）	公司披露
服务公司数	10,000+	Oct 2025	公司披露
开发者	~12,000	Feb 2024	报道
开发者	~23,000	Dec 2024	报道
开发者	数十万	Oct 2025	公司披露
每日 tokens（Oct 2025）	10T+	Oct 2025	公司披露
每日 tokens（2026 年初）	~15T	Early 2026	第三方资料

轨迹数据来自公司披露或第三方资料；增长很快，但账户级留存没有披露。

[CU006, CU007, CU008]

FU002: 采用 / 部署漏斗

从开发者注册到标准化企业账户的相对收窄。

漏斗数值为示意性相对权重；Fireworks 未披露转化率。

[CU006, CU007, CU009]

6.3 具名客户证明

Fireworks 对一家如此年轻的公司而言，拥有异常强的具名、生产级证明点。Cursor 使用 Fireworks 的 speculative-decoding API，在 Fast Apply 代码生成中达到约每秒 1,000 tokens；一位 AI 研究员公开表示 Fireworks「性能远强于开源引擎」，并已用于生产。Notion 通过 Fireworks 微调，将 AI 响应延迟从约 2 秒降至 350 毫秒，这一结果由其 AI 工程负责人归因。Sourcegraph 将延迟降低 30%，completion acceptance 提高 2.5 倍；Upwork 的 “Uma” assistant 在 Fireworks 上实时起草提案。Quora 的 Poe 聊天机器人响应速度提高三倍，Superhuman 在该平台上构建 Ask AI 复合系统。这些大多是生产部署，带有具名高管和量化结果，因此客户背书基底质量高、新鲜度也合理；不过几项案例研究来自 2024 年，少数标识只出现在汇总营销列表中，没有独立案例研究。[CU011, CU012, CU013, CU014, CU015, CU016]

具名客户验证表
客户	部署	结果	引用质量	新鲜度
Cursor	生产环境	Fast Apply 约 1,000 tok/sec；具名研究员引用	高（引用 + 指标）	2024-2025
Notion	生产环境	延迟从 2s 降至 350ms；具名高管引用	高（引用 + 指标）	2025
Sourcegraph	生产环境	延迟降低 30%，采纳率提高 2.5x	高（AWS + 案例）	2024
Upwork	生产环境	Uma 实时报价；具名高管	高（引用）	2025
Quora (Poe)	生产环境	响应速度提高三倍	中（报道）	2024
Superhuman	生产环境	Ask AI 复合系统	中（案例）	2024
Samsung	企业	AI 路线图提速	中（投资者引用）	2025
DoorDash	生产环境	高吞吐 AI 功能	中（标识 + AWS）	2025

具名案例多为生产部署，并带量化结果；部分案例停留在 2024 年，少数标识只出现在汇总名单里，因此覆盖不完整。

[CU011, CU012, CU013, CU014, CU015]

FU003: 客户证明矩阵

从部署状态、量化成效和具名归因看客户引用质量。

单元格综合了客户案例和 AWS case study 中的证据质量。

[CU011, CU016, CU031]

6.4 留存与耐久性

留存是客户叙事里证据最弱的一环。Fireworks 未披露净收入留存、总留存、流失率、续约率或合同期限，耐久性只能靠结构性信号推断，不能直接度量。正面信号确实存在：平台的 land-and-expand 设计、多产品界面和企业控制能力会鼓励扩张，蓝筹客户已经在跑生产工作负载，OpenAI 兼容 API 加上可靠性领先，也会降低接入后的离开理由。负面信号同样真实：同一个 OpenAI 兼容 API，加上路由聚合器兴起，让多栖和切换变得很容易；推理正在商品化，较 Together 近乎贴身的价格差也限制了靠价格建立黏性。独立评测者明确指出了替代方案和切换路径。综合判断是，产品深度和集成可能支撑耐久性，但公开留存指标尚未给出证据，这是重要尽调缺口。[CU017, CU018, CU019, CU020, CU021]

留存 / 重复使用 / 满意度表
维度	状态	信号	置信度
净收入留存	未披露	先落地再扩张结构	低
总留存 / 流失	未披露	无公开数据	低
合同期限	未披露	企业谈判	低
重复使用	高（隐含）	生产环境 10T+ tokens/day	中
满意度	正向（轶事证据）	具名高管证言	中
切换风险	偏高	OpenAI 兼容 API + 路由器	中

留存指标未披露；正面信号来自结构和轶事证据，而低锁定度抬高了切换风险。

[CU017, CU018, CU019, CU020]

FU004: 留存 / 重复 cohort

按客户细分看定性留存信号（未披露量化指标）。

Cohort 单元格为作者定性判断；Fireworks 未披露量化 cohort 留存。

[CU017, CU019, CU021]

6.5 扩张与集中度风险

Fireworks 的增长引擎是 land-and-expand：一个 serverless 功能可以扩展成专用部署、微调、语音和预留容量支出；AWS Strategic Collaboration Agreement 又借现有采购渠道触达买家。主要集中度风险有两类。第一，收入很可能偏向少数大型生产部署，因此按公司数折算、约 $28,000 的年化收入会低估少数大客户之下可能存在的长尾；头部客户身份和收入占比未披露，头部客户风险无法量化。第二，分发和伙伴依赖真实存在：AWS 联盟和 Microsoft Foundry 上架会加速增长，但也是渠道依赖；若干标杆客户（如 DoorDash 和 Shopify）本身也是成熟买家，有能力多栖或自建。云市场上架让采购摩擦低于封闭 API，但企业销售周期和合规审查仍然卡住最大交易。[CU022, CU023, CU024, CU025, CU026]

扩张与集中风险表
因素	方向	细节	尽调问题
先落地再扩张	正向	Serverless -> 专用部署 / 调优 / 语音	衡量扩张收入占比
混合 ARPA	中性	全客群约 ~$28K/yr	获取 ARPA 分布
头部客户集中度	风险	收入偏向大型部署	披露前 10 大客户收入占比
渠道依赖	风险	AWS + Microsoft Foundry 渠道	评估直销与伙伴来源组合
客户多供应商部署	风险	成熟买家可多供应商部署	核查单一供应商承诺
采购摩擦	中性	借助云市场降低	梳理企业销售周期长度

集中度和渠道风险来自分析师评论以及 AWS/Azure 合作关系推断；头部客户收入占比未披露。

[CU022, CU023, CU024, CU025]

6.6 展示材料

Chapter 07

07风险

7.1 按严重程度排序的风险概览

Fireworks 是一家高速扩张、资金充足的公司，主要风险来自商业和结构，而不是迫在眉睫的法律或运营失误。最高严重度风险包括推理商品化和毛利率压缩、可能拿走推理层的超大云厂商捆绑，以及对 NVIDIA 硬件供给的依赖；NVIDIA 同时是供应商、投资方，并且通过收购 Lepton 和 NIM 打包成为竞争者。中等严重度风险包括计划把算力扩张 3–4 倍带来的资本强度、CEO Lin Qiao 的关键人集中、压制留存的低切换成本，以及估值从 $552 million 冲到 $4 billion、传闻再到 $15 billion 的激进爬坡。较低但不可忽视的风险包括 EU AI Act 和 GDPR 带来的监管成本、开放模型许可约束、无注册专利、未披露 burn 和 runway，以及对 AWS 和 Microsoft 分发渠道的依赖。缓释逻辑在各类风险中一致：在 serving 层商品化前，更快向调优、agent、语音和企业治理上移，同时分散芯片来源，并接入既有采购渠道。剩余敞口仍然不小，因为多项缓释尚未被证明，若干关键指标也未披露。[CR001, CR002, CR003, CR004, CR005, CR006]

风险热力图摘要
风险	可能性	影响	缓释成熟度	剩余暴露
推理商品化 / 利润率	高	高	中	高
超大云厂商捆绑	中	高	中	高
NVIDIA 供应商兼竞争者	中	高	低	高
资本密集度 / 烧钱	中	中	低	中
关键人集中	低	高	低	中
低切换成本 / 流失	高	中	低	中
估值上行	中	中	低	中
监管（EU AI Act/GDPR）	中	低	中	低

严重度评级是作者综合分析师、评价和申报材料后的定性判断；剩余暴露反映缓释成熟度。

[CR001, CR002, CR003, CR004, CR025]

FR001: 风险热力图

主要风险类别的发生概率、影响和剩余敞口。

单元格为作者定性判断，综合了分析师、评价和备案证据。

[CR001, CR002, CR003, CR007]

7.2 监管和法律风险

Fireworks 的监管和法律敞口真实存在，但目前仍可管理。最重要的制度是 EU AI Act，它对通用和 foundation-model 提供商及其部署方施加分层、基于风险的义务，包括透明度和文档要求；Fireworks 作为推理和微调平台，位于欧盟客户的合规链条上。GDPR 和数据驻留要求推动公司提供零数据留存、数据驻留和区域部署功能，任何失误都会带来罚款和声誉成本。开放模型许可是更隐蔽的法律风险：Llama 等模型带有可接受使用和许可条款，行业关于训练数据版权的未决问题也可能传导到服务这些模型的平台。知识产权敞口也存在反向问题：Fireworks 没有列出公开专利，因此 FireAttention 和 FireOptimizer 的优势依赖商业秘密和 know-how；关键工程师离开后更难防守。公开信息中没有针对 Fireworks 的重大诉讼或执法行动，其 Series C 也由顶级法律顾问执行，但公司卖向医疗、金融服务和准政府行业后，监管表面积会扩大。[CR007, CR008, CR009, CR010, CR011, CR012]

监管 / 法律风险台账
风险	制度 / 来源	可能性	影响	缓释措施
EU AI Act 义务	EU AI Act（GPAI / 部署方义务）	中	中	合规 + 文档
数据隐私 / GDPR	GDPR / 数据驻留	中	中	零留存、欧盟区域
开放模型许可	Llama / 模型许可	低	中	许可合规、模型中立
训练数据版权外溢	行业 IP 不确定性	低	中	服务第三方模型
IP 防御力	无注册专利	中	中	商业秘密保护
行业合规扩张	HIPAA / 金融 / 政府	中	低	SOC2/HIPAA 状态
诉讼 / 执法	公开未见	低	中	顶级法律顾问

监管台账；公开信息中未见 Fireworks 面临重大诉讼，若干事项取决于行业和司法辖区，因此覆盖不完整。

[CR007, CR008, CR009, CR010, CR011]

7.3 运营、质量和安全风险

运营上，Fireworks 最核心的敞口是 GPU 供给。公司不拥有自己的集群，而是从第三方采购 NVIDIA 和 AMD 容量；随着算力扩张 3–4 倍，它会暴露在配额约束、供应瓶颈和硬件换代时点风险中。实测数据上，可靠性是强项，2026 年 Q1 独立监测 uptime 为 99.8%；但 Fireworks 未发布正式的标准层 SLA，因此合同可靠性承诺逐案谈判，事故历史也不透明。跨 Frankfurt、Iceland、Tokyo 以及 US、EU、APAC 区域运营全球多区域集群，会增加运营复杂度和成本。安全和合规姿态相对较强：基于 AWS 的推理具备 SOC2 Type II 和 HIPAA，支持零数据留存、airgapped EKS 和 bring-your-own-bucket 训练，且公开信息中没有已知数据泄露；即便如此，客户工作负载是生产级且延迟敏感，单次严重宕机或数据事件都会尤其伤。评测者还指出文档偏薄、公司扩张时支持可能吃紧；这些是质量风险，不是安全风险。[CR013, CR014, CR015, CR016, CR017, CR018]

7.4 合作伙伴和依赖风险

Fireworks 处在一张密集的依赖网络里。最尖锐的是 NVIDIA：Fireworks 的性能和毛利率主张依赖 NVIDIA 供应的领先 GPU，NVIDIA 持有投资份额，如今又通过收购 Lepton、GPU-cloud marketplace 和 NIM 打包直接竞争。AWS 和 Microsoft 既是伙伴也是威胁：Strategic Collaboration Agreement 和 Foundry 可用性提供分发，但 Bedrock、Vertex 和 Azure 可以把推理捆进既有安全、计费和治理关系，吸收这一品类。Fireworks 还依赖 Meta、DeepSeek、Alibaba 等持续发布开放模型并保持许可宽松；如果开放模型质量放缓，或许可转向限制性条款，开放模型中立的论点会被削弱。云平台依赖、少数 late-stage 基金带来的资本提供方集中，以及成熟买家可多栖造成的关键客户集中，共同补全依赖图谱。共同主线是：Fireworks 的赋能伙伴也是最可信的竞争者，因此伙伴深度和供应商多元化是风险图景的核心。[CR019, CR020, CR021, CR022, CR023, CR024]

伙伴 / 依赖风险台账
依赖方	角色	风险	严重度
NVIDIA	GPU 供应商 + 投资者 + 竞争者	分配、供应商兼对手	高
AMD	替代芯片供应商	生态成熟度较低	中
AWS	云 + 渠道伙伴	借 Bedrock 捆绑	高
Microsoft	Foundry 分发	借 Azure 捆绑	中
开放模型实验室	Meta / DeepSeek / Alibaba	模型供给与许可	中
后期投资者	资本提供方	融资集中	低
关键客户	成熟买家	多供应商部署 / 内部自建	中

依赖台账；反复出现的主线是，支撑 Fireworks 的伙伴也是它最可信的竞争者。

[CR019, CR020, CR021, CR022, CR023]

FR003: 依赖图

关键外部依赖及其失效路径。

依赖边表示上游依赖；NVIDIA、AWS 和 Azure 同时是合作伙伴和竞争者。

[CR019, CR020, CR021, CR022]

7.5 财务、模型和执行风险

财务上，核心风险是毛利率压缩。约 50% 的毛利率在结构上低于软件常态，因为 GPU 成本进入 COGS；相较 Together 几乎贴身的价格差，加上开源 serving 框架持续改进，都会形成长期下行压力。公司声称走向 60% 的路径，取决于尚未被证明的利用率提升和收入结构变化。资本强度进一步放大问题：3–4 倍算力扩张需要反复投入容量，burn、runway 和净收入留存未披露，因此资本充足性更多是声称而非验证。从 $552 million 到 $4 billion、约十五个月内的估值爬坡，再加上 $15 billion 传闻，嵌入了激进增长预期；任何放缓或毛利率失望都会受到惩罚。执行和人员方面，创始团队的 PyTorch 背景是优势，但关键人风险集中在 CEO Lin Qiao；在火热市场里留住顶尖推理工程师仍是持续挑战。所有这些风险的缓释逻辑都是同一个向上游堆栈多元化，但能否成功正是投资的核心未解题。[CR025, CR026, CR027, CR028, CR029, CR030]

人员 / 执行风险台账
风险	细节	可能性	影响
关键人集中	CEO Lin Qiao 主导愿景和融资	低	高
创始人与工程师留存	推理人才抢手，顶尖人才更稀缺	中	中
组织扩张	人员和 GTM 快速搭建	中	中
路线图执行	向上游应用层扩张仍未验证	中	高
治理不透明	董事会构成未披露	低	低

人员与执行风险来自创始人集中度和路线图野心；员工数和董事会细节未披露。

[CR029, CR030, CR033]

7.6 缓释、监控与论点失效触发器

Fireworks 的缓释路径是连贯的：把堆栈延伸到微调、强化学习、语音和企业治理，逃离商品化 serving；在 NVIDIA 和 AMD 之间分散芯片，并追求 Blackwell 效率；维持 day-zero 开放模型支持，让模型更替成为顺风；强化企业合规，拿下受监管行业；接入 AWS 和 Azure 采购，而不是正面硬拼。真正该监控的指标是毛利率向 60% 的轨迹、收入结构向专用和企业迁移、披露后的净收入留存、GPU 成本和配额条款，以及相较 vLLM 和 SGLang 的竞争差距。最清晰的论点失效触发器包括：毛利率无法从 ~50% 抬升或进一步压缩；超大云厂商或 NVIDIA 拿走推理层，把 Fireworks 降格为优化 add-on；关键人离开；或增长低于 $4 billion-plus 估值隐含的速度。优先尽调问题是对齐后的收入和毛利率数字、NRR 和 burn、GPU 供给合同，以及头部客户集中度。总体剩余敞口中高，主要集中在商品化和依赖风险，而非法律或运营失败。[CR031, CR032, CR033, CR034, CR035, CR036]

缓释措施与否决标准表
风险	缓释措施	监测指标	论点失效触发项
商品化	向调优、Agent、语音上移	收入结构变化	利润率卡在或跌破 ~50%
超大云厂商捆绑	接入 AWS/Azure 渠道	直销与伙伴占比	推理被 Bedrock/Azure 吸收
NVIDIA 依赖	分散到 AMD，吃到 Blackwell 效率	GPU 成本与配额条款	NVIDIA 用价格 / 供给压价
利润率压缩	提高利用率 + 企业客户占比	毛利率向 60% 靠拢	利润率压到 50% 以下
关键人物风险	加厚领导梯队	高管留存	Lin Qiao 离职
增长韧性	先落地再扩张 + NRR	NRR、客户数增长	增长相对估值失速

缓释措施和否决标准综合了分析师评论和公司策略；触发项是作者设定的论点失效阈值。

[CR031, CR032, CR034, CR035, CR036]

FR002: 风险传导图

商品化和依赖风险如何传导到财务结果。

传导边综合了分析师风险分析；方向表示风险传播。

[CR001, CR002, CR025, CR026]

7.7 展示材料

Chapter 08

08估值

8.1 投资论点与反论点

牛市论点是，Fireworks 正在成为关键企业 AI 基础设施，也就是开放模型推理的 runtime 层；与此同时，企业正从封闭 API 试验转向在生产中拥有定制模型。它把几项稀缺要素拼在一起：打造 PyTorch 的创始团队，真实产品优势（FireAttention、FireOptimizer、同类最佳函数调用、99.8% uptime），蓝筹生产参考（Cursor、Notion、Sourcegraph、Upwork），以及从 2025 年中约 $130 million ARR 到 2026 年 5 月据称覆盖 10,000 多客户、年化约 ~$800 million 的超高速增长。如果托管推理按耐久基础设施定价，而不是按商品定价，估值还能复合。反论点是，推理在结构上正在商品化：GPU 成本主导 COGS，使毛利率停在约 50%；按 token 价格与 Together 相差约 ~2%；开源 serving 框架不断缩小差距；切换成本接近零；最强玩家 AWS、Azure 和 NVIDIA 同时是伙伴和竞争者，有能力重定价这一品类。按这种看法，Fireworks 可能变成一个毛利率约 ~50% 的优化 add-on；估值十五个月内从 $552 million 跑到 $4 billion、并传出 $15 billion 谈判，已经把完美执行计入价格。[CV001, CV002, CV003, CV004, CV005, CV006]

正反论点表
维度	看多论点	看空反论点
市场	推理成为新运行时，TAM 巨大	相比超大云厂商，可触达 SAM 偏小
产品	FireAttention/FireOptimizer 有技术边际 + 可靠性	OSS 框架缩小差距
客户	蓝筹客户生产环境背书	切换成本低，多家并用
财务	高速增长到 ~$800M	~50% 利润率，价格战
竞争	可靠性和函数调用领先	价格和速度两头受挤
依赖	NVIDIA/AWS/Azure 战略支持	同一批玩家也能重定价这个品类
估值	基础设施倍数有支撑	价格已经计入完美执行

正反论点对称展开；决定变量是利润率轨迹和留存，二者都未披露。

[CV001, CV002, CV003, CV004, CV005]

FV001: 推荐逻辑

各 thesis 因素如何共同形成 track 建议。

逻辑流概括推荐驱动因素；权重为定性判断。

[CV007, CV008, CV001, CV002]

8.2 建议、置信度和立场

我们将 Fireworks AI 评为跟踪，置信度中等，风险评级高，估值立场偏拉伸，整体得分 6.5/10。业务质量值得密切接触，并在合适进入价建立仓位；但当前和传闻价格要求投资人相信两个尚未证明的变量：毛利率能从 ~50% 明显爬向公司声称的 60% 目标，且增长是耐久的，不是容易被超大云厂商拿走的商品化抢地盘。2025 年 10 月 Series C 时，$4 billion 估值约等于公司声称 $280 million 年化收入的 14 倍；按 Sacra 2026 年 5 月 ~$800 million 估计，同一 $4 billion 约为 5 倍，但传闻 $15 billion 轮次约等于这一更高基数的 19 倍。区间很宽，反映出对正确收入数字以及这个低于软件毛利、快速商品化品类应给多少倍数的真实不确定。建议因此是密切跟踪，按 base case 承保，坚持低于传闻标记的进入纪律，并在以溢价投入前要求披露毛利率和留存。置信度主要被三项缺失压在中等：经审计财务、披露的 NRR、以及对齐后的收入数字。[CV007, CV008, CV009, CV010, CV011]

投资建议摘要表
维度	评估	依据
建议	跟踪观察	资产质量高，但价格要求高
信心	中	财务未经审计，NRR 未披露
风险评级	高	商品化 + 依赖
估值判断	偏高	以 ~50% 利润率讨论 $15B
综合评分	6.5 / 10	业务强，价格贵
入场纪律	低于传闻中的 $15B	按基准情形承销

建议综合投资论点、财务、客户、竞争和风险章节；评分是作者的综合判断。

[CV007, CV008, CV009]

8.3 融资背景与进入纪律

Fireworks 已在种子轮、Series A（$25M，2024）、Series B（$52M，估值 $552M，2024 年 7 月）和 Series C（$250M，估值 $4B，2025 年 10 月）累计融资超过 $327 million；最后一轮约 $230 million 为一级发行，$20 million 为二级转让。到 2026 年 5 月，据称公司正讨论以 $15 billion 投后估值融资，由 Index Ventures 共同领投，约七个月内接近翻四倍。对私有后期进入而言，关键纪律包括：用哪个收入基数打倍数、优先权栈和任何清算压力，以及在资本密集型算力建设中继续融资带来的稀释。公开证据支持增长和客户叙事，但不支持财务质量：收入数字未经审计且来源冲突，毛利率是分析师估计，burn 和 runway 未披露。NVIDIA、AMD、MongoDB 和 Databricks 等战略投资者在股权结构表上是双刃剑：增加生态支持，也集中供应商和伙伴影响力。进入纪律应锚定基准情景估值，把 $15 billion 标记视为需要毛利率证明的拉伸情形，并计入未公开披露的优先权和稀释。[CV012, CV013, CV014, CV015, CV016]

8.4 牛市、基准和熊市情景

我们的基准情景（约 45% 权重）假设 Fireworks 在 2026 年达到约 $700-900 million 年化收入，并继续增长；毛利率只温和改善到 50% 出头。公司守住份额，但商品化压住倍数，对应公允企业价值约 $5-8 billion，大致接近或略高于 $4 billion Series C，低于传闻 $15 billion。牛市情景（约 30%）假设向上游堆栈策略奏效：微调、强化学习、语音和治理把毛利率抬向 58-60%，收入到 2027 年复合超过 $1.5 billion，Fireworks 成为 platform-of-record，支撑 $15-20 billion 估值。熊市情景（约 25%）假设商品化和超大云厂商捕获：毛利率停在 50% 附近或被压缩，买家多栖或转向 Bedrock 和 Azure 后增长急剧放缓，倍数压缩到 $2-3 billion 区间或 down round。离散度异常大，因为同一家公司既可被解读为耐久基础设施，也可被解读为商品化 reseller；决定性证据——毛利率轨迹和留存——尚未披露。[CV017, CV018, CV019, CV020, CV021]

牛市 / 基准 / 熊市情景表
情景	概率	关键假设	2026-27 收入	利润率	隐含价值
牛市	~30%	向上游应用层推进奏效，成为记录平台	到 2027 年 >$1.5B	58-60%	$15-20B
基准	~45%	守住份额，利润率小幅改善	$700-900M (2026)	50% 出头	$5-8B
熊市	~25%	商品化 + 超大云厂商捕获价值	增速减半	~50% 或更低	$2-3B / 下轮降估值

情景概率和区间为作者估算；收入采用公司和 Sacra 数字，未经审计。

[CV017, CV018, CV019, CV020]

FV003: 估值 / 回报区间

各情景隐含企业价值，单位为十亿美元。

情景价值区间为作者估算，以可比倍数和已披露估值标记为锚。

[CV017, CV018, CV019]

8.5 可比公司组

私有可比公司锚定本次分析。最接近的同行 Together AI 在 2025 年初以约 $618 million 年化收入获得 $3.3 billion 估值（约 5x），据称正以约 $1 billion 收入讨论接近 $7.5 billion 的估值（约 7-8x）。Baseten 2026 年 1 月以 $5 billion 估值融资，据称正在讨论 $11 billion；Groq 作为硬件驱动玩家、商业模式不同，估值达到 $6.9 billion；Fal 被引用在约 $4.5 billion。相较这些公司，Fireworks 按 ~$280 million（Series C 时点）和 $4 billion 估值看，较 Together 倍数偏贵，但收入基数更小、增长更快；按 2026 年 5 月 ~$800 million 估计看则相对便宜，而 $15 billion 讨论又把估值重新拉伸。公开基础设施软件可比公司——Datadog、Snowflake、Confluent、Cloudflare、MongoDB 和 DigitalOcean——给出倍数天花板：高增长公共基础设施公司的交易区间很宽，但已从峰值压缩；DigitalOcean 这类低毛利基础设施业务相对纯软件有明显折价。Fireworks 毛利率约 ~50%，因此应较纯 SaaS 倍数折价；超大云厂商 Amazon、Microsoft 和 Oracle 既是规模参照，也是竞争威胁。可比公司组支持的是一个情景依赖、区间很宽的价值，而不是单点估值。[CV022, CV023, CV024, CV025, CV026, CV027]

可比估值表
公司	类型	估值	收入（年化）	隐含倍数	备注
Fireworks AI	私募轮	$4.0B (Oct 2025)	~$280M	~14x	Series C 时点
Fireworks AI	私募（传闻）	$15B (2026)	~$800M	~19x	洽谈中，未确认
Together AI	私募轮	$3.3B (Feb 2025)	~$618M	~5x	最接近的同业
Together AI	私募（传闻）	$7.5B (2026)	~$1.0B	~7-8x	洽谈中
Baseten	私募轮	$5.0B (Jan 2026)	未披露	n/a	传闻讨论 $11B
Groq	私募轮	$6.9B (Sep 2025)	硬件模式	n/a	模式不同
上市基础设施 SaaS	上市可比公司	Datadog/Snowflake/Cloudflare	数十亿美元级	~8-20x EV/rev（收入倍数）	利润率 >70%
DigitalOcean	上市可比公司	倍数更低	~$0.8B	低个位数	重基础设施折价

私募轮来自公司和 Sacra；上市可比公司按公开文件做定性对照。覆盖不完整：并非所有同业都披露收入。

[CV022, CV023, CV024, CV025, CV026]

FV002: 估值敏感性

不同收入和倍数假设下的隐含估值，单位为 USD billions。

敏感性网格用公司和 Sacra 的收入数字乘以示例倍数；不是预测。

[CV009, CV022, CV023]

8.6 退出准备度与最终尽调

退出可选项方向上较强，但时点尚未证明。可行路径包括：若 Fireworks 维持超高速增长并把毛利率抬向软件水平，走 IPO；或被希望拥有推理层的超大云厂商或数据平台投资方（AWS、Microsoft、Databricks、MongoDB、NVIDIA）战略收购，尽管其中若干方也是竞争者。主要论点失效触发器包括：毛利率无法从 ~50% 抬升、超大云厂商或 NVIDIA 拿走推理层、关键人离开，或增长低于估值隐含速度。最终优先尽调问题包括：一个统一、标明日期的收入数字，经审计或管理层确认的毛利率及通向 60% 的路径，净收入留存和流失 cohort，算力建设对应的 burn 和 runway，GPU 供给合同条款，头部客户集中度，以及下一轮优先权和稀释结构。在这些问题得到回答前，正确姿态是密切跟踪公司，在基准情景上建立信念，并且只有在毛利率和留存支持基础设施论点、而非商品论点后，才为溢价进入预留空间。[CV028, CV029, CV030, CV031, CV032]

论点失效与否决触发项表
触发项	信号	动作
利润率停滞	毛利率卡在 ~50% 或下滑	退出 / 避免溢价
超大云厂商捕获价值	推理被 Bedrock/Azure 吸收	重新评估韧性
NVIDIA 重新定价	供应商在价格 / 供给上压价	降低敞口
增长停滞	收入相对估值减速	下轮降估值风险
关键人物流失	Lin Qiao 离职	重新承销
留存不及预期	NRR 披露后低于 ~110%	下调倍数

否决触发项对应会推翻基础设施论点的条件；阈值由作者设定。

[CV028, CV029, CV030]

最终尽调问题表
问题	重要性	负责人
已核对且带日期的收入	确定倍数分母	公司 / 财务
经审计毛利率 + 60% 路径	检验溢价论点	公司 / 财务
NRR 和流失分组	收入韧性	公司 / RevOps
烧钱和现金跑道	融资风险与算力计划的关系	公司 / 财务
GPU 供给合同	利润率和供给敞口	公司 / 基础设施
头部客户集中度	收入集中风险	公司 / 销售
优先权与稀释	入场经济性	公司 / 法务

尽调问题是以溢价估值投资前的门槛项。

[CV031, CV032]

FV004: 投资 KPI

核心可投性指标。

KPI 综合投资建议和估值分析；倍数使用未经审计的收入。

[CV007, CV009, CV010]

8.7 展示材料

免责声明

本报告仅供参考，基于截至 2026-06-14 的公开来源，不构成投资建议。财务数字大多来自未经审计的公司声明或第三方估算，做出任何决策前均应独立核验。

证据索引

结论
编号	陈述	可信度	来源
CO001	Fireworks AI is an AI inference-cloud company headquartered in Redwood City, California.	高	SO018, SO020, SO025
CO002	Fireworks AI was founded in late 2022 by a team that left Meta's PyTorch organization.	高	SO002, SO004, SO014
CO003	Fireworks operates an "AI Cloud" platform that runs, fine-tunes and scales open-source LLM, vision, audio and multimodal models with low-latency inference.	中	SO002, SO013, SO001
CO004	Fireworks monetizes via usage-based pricing including per-token serverless inference, per-training-token fine-tuning, per-GPU-hour reinforcement fine-tuning and dedicated deployments.	中	SO013
CO005	Fireworks positions itself on a "one-size-fits-one" thesis favoring smaller customizable open models over generic closed foundation models.	中	SO002, SO005
CO006	Lin Qiao is CEO and co-founder of Fireworks AI and previously led the PyTorch team at Meta.	高	SO004, SO016, SO018
CO007	Fireworks AI was co-founded by seven people, most of whom worked together on PyTorch at Meta.	中	SO004, SO014, SO023
CO008	Co-founders Dmytro Dzhulgakov and Dmytro Ivchenko are Ukrainian former Meta PyTorch engineers.	中	SO014, SO004
CO009	Lin Qiao holds a Ph.D. in Computer Science from UC Santa Barbara and previously worked at LinkedIn and IBM.	中	SO018, SO016
CO010	Other co-founders include James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, with backgrounds at Meta PyTorch, ads and ML teams and Google Vertex AI.	中	SO004, SO023
CO011	Fireworks AI raised a $250 million Series C in October 2025 at a $4 billion valuation.	高	SO002, SO019, SO020
CO012	The Series C was co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia Capital.	高	SO002, SO021, SO022
CO013	A $52 million Series B led by Sequoia closed in July 2024 at a $552 million valuation with NVIDIA, AMD and MongoDB Ventures participating.	高	SO003, SO008, SO009
CO014	Fireworks AI has raised more than $327 million in total funding as of October 2025.	高	SO002, SO013
CO015	A $25 million Series A led by Benchmark closed in March 2024 with Sequoia, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang.	中	SO014, SO003
CO016	The Series B brought Fireworks AI's cumulative capital raised to $77 million.	中	SO003
CO017	As of May 2026 Sacra reports Fireworks is in talks to raise at a $15 billion post-money valuation with Index set to co-lead, on unconfirmed terms.	低	SO013
CO018	Fireworks AI reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from its Series B.	中	SO002, SO013
CO019	Fireworks reported annualized revenue surpassing $280 million at the time of the October 2025 Series C.	中	SO002
CO020	The Series C round comprised roughly $230 million of primary funding and a $20 million secondary transaction per Sacra.	中	SO013
CO021	Fireworks AI's developer base grew from about 12,000 in February 2024 to 23,000 by the end of 2024.	中	SO014
CO022	The Fireworks platform processes more than 10 trillion tokens per day as of October 2025, rising to about 15 trillion per day by early 2026 per third-party profiles.	中	SO002, SO018
CO023	Earlier 2025 coverage cited Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year.	低	SO014
CO024	Sacra estimates Fireworks AI's gross margin near 50 percent, below software norms, with management targeting 60 percent through GPU optimization.	中	SO013
CO025	Fireworks launched Microsoft Foundry (Azure) availability in March 2026, extending open-model inference to Azure customers.	中	SO018
CO026	Fireworks shipped FireFunction V2, FireAttention V2, FireOptimizer, supervised fine-tuning V2 and reinforcement fine-tuning between 2024 and 2026.	中	SO003, SO013
CO027	Fireworks AI acquired Hathora to deepen real-time and global compute orchestration.	中	SO013
CO028	Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties.	中	SO013
CO029	Analysts cite inference commoditization, hyperscaler bundling and hardware concentration as the main structural risks to Fireworks.	中	SO013
CO030	Independent reviewers describe Fireworks as "just the engine," requiring developer sophistication, with thin documentation and no ongoing free tier.	中	SO026
CO031	Fireworks offers an OpenAI-compatible API plus function calling, fine-tuning and enterprise security controls across hundreds of models.	中	SO001, SO002
CO032	Investors at Index Ventures and Sequoia cite the founding team's PyTorch and inference-systems pedigree as the core reason for backing Fireworks.	中	SO004, SO005
CO033	CEO Lin Qiao concentrates fundraising, vision and public representation, creating a meaningful key-person dependency.	低	SO004, SO015
CO034	NVIDIA has entered the inference market directly via its Lepton acquisition and a competing GPU cloud marketplace, raising supplier-as-competitor risk for Fireworks.	中	SO013
CO035	Company-stated revenue figures and third-party estimates for Fireworks differ materially across vintages, from $130M ARR in mid-2025 to ~$800M annualized by May 2026.	低	SO002, SO013, SO014
CM001	Fireworks AI competes in the managed AI inference market for serving and tuning open-weight models in production.	中	SM010, SM013
CM002	The core included spend is third-party production model serving, fine-tuning and dedicated deployment, not foundation-model training.	中	SM010, SM009
CM003	Closed-model APIs from OpenAI and Anthropic are excluded from the core market but are the primary status-quo substitute.	中	SM009, SM025
CM004	Self-hosting on vLLM or SGLang and hyperscaler bundles such as Bedrock and Azure Foundry are direct substitutes for Fireworks.	中	SM010, SM015
CM005	Adjacent expansion pools include voice agents, RAG/embeddings and reinforcement-learning training for agents.	中	SM010
CM006	MarketsandMarkets estimates the AI inference market at $106.15 billion in 2025 growing to $254.98 billion by 2030 at a 19.2% CAGR.	高	SM001, SM003
CM007	Other research houses place the 2026 AI inference market between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion.	低	SM002, SM003, SM005
CM008	Gartner projects generative-AI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028.	中	SM009
CM009	The independent open-weight inference-serving market has consolidated around roughly seven providers as of Q2 2026.	中	SM006
CM010	With Together AI near $1 billion annualized revenue and Fireworks in the $280-800 million range, the independent-provider revenue pool is a few billion dollars in 2026.	低	SM011, SM010
CM011	Fireworks' $280 million-plus revenue represents an early single-digit share of the independent inference niche.	低	SM010, SM013
CM012	The most relevant lens for valuing Fireworks is the independent inference niche, not the headline AI inference TAM.	中	SM006, SM010
CM013	AI-native startups adopt Fireworks bottoms-up via self-serve API keys with an engineering lead as economic buyer.	中	SM010
CM014	Digital-native enterprises such as DoorDash, Notion, Shopify and Upwork move features from pilot to production on Fireworks.	中	SM013, SM010
CM015	Regulated and Fortune 500 buyers require SSO, audit logs, data residency and HIPAA/SOC2 posture and adopt top-down via procurement.	中	SM010
CM016	Across segments the user is a developer and the payer is an engineering or procurement budget.	中	SM010, SM013
CM017	Fireworks reaches buyers through cloud procurement channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability.	中	SM010, SM015
CM018	Open-source model quality convergence and agentic compound AI are primary drivers expanding inference demand.	中	SM009, SM013
CM019	Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks.	中	SM010, SM025
CM020	Hyperscaler bundling by AWS, Azure and Google folds inference into existing security, billing and governance relationships.	中	SM010, SM015
CM021	Fireworks' Llama 70B price sits within roughly 2% of Together AI's, illustrating razor-thin price differentiation.	中	SM023, SM006
CM022	GPU supply is concentrated and Fireworks does not own its fleet, creating capacity and cost exposure.	中	SM010
CM023	The EU AI Act imposes tiered obligations that add compliance overhead for AI deployment in Europe.	中	SM026
CM024	The OpenAI-compatible API lowers both switching-in and switching-out costs, capping durable lock-in.	中	SM023, SM010
CM025	Published AI inference TAM figures bundle chips, hyperscaler services and independent software, so they overstate Fireworks' reachable market.	中	SM001, SM006
CM026	The independent inference-provider revenue pool is not measured by any standard analyst and must be assembled from uneven company estimates.	低	SM010, SM011, SM012
CM027	Forecast CAGRs for AI inference range from roughly 13% to 19% and 2034 estimates differ by more than $200 billion across houses.	低	SM001, SM002, SM003
CM028	Despite wide estimate spreads, the AI inference market is clearly large and growing double digits, with directional rather than precise SAM.	中	SM001, SM004
CM029	There is no public evidence of near-term saturation in the AI inference market; growth drivers remain intact through the forecast window.	低	SM002, SM004
CM030	Fine-tuned and specialized models are projected to capture much of the generative-AI model-spend growth, favoring Fireworks' tuning products.	中	SM009
CM031	The serverless open-weight inference field shows roughly 6x price spread and 5-7x latency spread across providers on the same model.	中	SM006
CM032	Together AI, Groq, Baseten, Cerebras, Replicate, Anyscale and OctoAI are the other named providers in the consolidated inference field.	中	SM006, SM016, SM019
CM033	Voice agents targeting sub-500ms latency expand Fireworks into contact-center and telephony budget categories larger than API inference alone.	中	SM010
CM034	Demand differs by maturity: startups optimize cost-per-token while Fortune 500 buyers prioritize control, compliance and vendor consolidation.	中	SM010, SM015
CM035	A defensible 2026 AI inference market figure is roughly $118-126 billion, between the 2025 base and the 2030 forecast.	低	SM001, SM003
CP001	The inference market has segmented into managed open-model platforms, vertically integrated silicon, hyperscaler bundles and open-source serving frameworks.	高	SP009, SP010
CP002	Together AI, Baseten and Replicate are Fireworks' closest managed open-model competitors.	中	SP009, SP010
CP003	Groq, Cerebras and SambaNova attack inference from custom silicon rather than software optimization on commodity GPUs.	中	SP009, SP005
CP004	AWS Bedrock, Google Vertex, Azure Foundry and Databricks Model Serving collapse model access, infrastructure and governance into one platform.	中	SP009, SP016
CP005	Open-source serving frameworks vLLM and SGLang plus NVIDIA NIM and routers like OpenRouter commoditize proprietary inference advantage.	中	SP009
CP006	NVIDIA entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, becoming a supplier-turned-rival.	中	SP009
CP007	Together AI raised a $305 million Series B in February 2025 at a $3.3 billion valuation and reached about $1 billion annualized revenue by early 2026.	高	SP002, SP018
CP008	Together AI was founded in 2021 by Percy Liang, Chris Re and Vipul Ved Prakash and spans serverless, clusters, fine-tuning, voice and RL.	中	SP002, SP018
CP009	Baseten raised $300 million in January 2026 at a $5 billion valuation led by IVP and CapitalG with a reported $150 million from NVIDIA.	高	SP004, SP007
CP010	Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment built on TensorRT, SGLang, vLLM and TGI.	中	SP003, SP015
CP011	Groq raised $750 million in September 2025 at a $6.9 billion valuation and advertises 750-plus tokens per second on Llama models from custom LPU silicon.	高	SP005, SP006, SP017
CP012	Groq's partnership with Meta to power the official Llama API gives it strong distribution and first-party open-model credibility.	中	SP009
CP013	Replicate, Modal and Anyscale compete for developer mindshare at the top of the adoption funnel.	中	SP012, SP013, SP014
CP014	Fireworks' Q1 2026 uptime of 99.8% is the highest among specialized inference providers per independent monitoring.	中	SP001
CP015	Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq.	中	SP001, SP010
CP016	FireFunction achieves roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o.	中	SP001
CP017	Together offers a 200-plus model catalog with full fine-tuning while Groq offers 15-20 models and no fine-tuning.	中	SP001
CP018	Groq's LPU delivers 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency under load.	中	SP001, SP010
CP019	Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance.	中	SP003, SP009
CP020	Most inference providers expose OpenAI-compatible APIs, making migration between them a matter of minutes.	中	SP001, SP020
CP021	Routing aggregators such as OpenRouter and TokenMix encourage multi-homing and automatic failover across providers.	中	SP001, SP009
CP022	Hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on.	中	SP009, SP016
CP023	Fireworks plugs into incumbent channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability.	中	SP009, SP016
CP024	Fireworks does not own GPUs and sources NVIDIA and AMD capacity from third parties, unlike Together's owned data-center strategy.	中	SP009, SP002
CP025	Fireworks' proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance and price advantage.	中	SP009
CP026	Open-source serving frameworks keep closing the performance gap, and Baseten openly builds on vLLM and SGLang.	中	SP009, SP003
CP027	NVIDIA pushes NIM as a packaging layer and Snowflake released Arctic Inference as an open vLLM plugin, compressing proprietary advantage.	中	SP009
CP028	Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion are better capitalized than Fireworks at $4 billion.	中	SP005, SP004, SP002
CP029	Independent reviewers describe Fireworks as "just the engine," an adverse signal about its application-level differentiation versus full-stack rivals.	中	SP023
CP030	Fireworks' durability depends on extending into tuning, agents and governance faster than the ecosystem commoditizes the serving layer.	中	SP009
CP031	Fireworks' most defensible differentiation is reliability plus best-in-class function calling rather than price or raw speed.	中	SP001
CP032	The same Llama model spreads roughly sixfold in price and 5-7x in latency across the seven-provider field.	中	SP010
CP033	Together AI has raised $533.5 million in total funding from investors including General Catalyst, Prosperity7, NVIDIA, Salesforce and Kleiner Perkins.	中	SP002
CP034	Baseten's valuation roughly doubled from $2.15 billion in September 2025 to $5 billion in January 2026, with talks of an $11 billion round by May 2026.	中	SP003, SP004
CP035	Hyperscaler bundling is plausibly the single biggest structural threat to Fireworks because it removes the need for a standalone inference vendor.	低	SP009, SP016
CI001	Fireworks bills serverless inference per token, fine-tuning per training token, reinforcement fine-tuning per GPU-hour and dedicated deployments per GPU-second or GPU-hour.	高	SI002, SI003
CI002	Fireworks' usage-based pricing maps to the customer lifecycle, capturing revenue across experimentation, production, adaptation and scaled deployment.	中	SI002
CI003	Reserved capacity is contracted separately on longer commitments at negotiated pricing and is the highest-margin stream.	中	SI002
CI004	Fireworks publishes serverless rates of about $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3.	中	SI004, SI005
CI005	Image generation runs from about $0.013 (SDXL) to $0.04 (Flux 1.1 Pro) per image and reserved capacity near $4.80 per hour per replica.	中	SI004
CI006	Fireworks' go-to-market is bottoms-up at entry via self-serve API keys and top-down at expansion via negotiated enterprise relationships.	中	SI002
CI007	Fireworks offers $1 of free credits rather than an ongoing free tier and a standard rate limit near 600 requests per minute.	中	SI004
CI008	Fireworks runs a field and partner sales motion anchored by an AWS Strategic Collaboration Agreement with funded proofs-of-concept and a startup acceleration program.	中	SI002, SI007
CI009	Blended annualized revenue per company is estimated near $28,000 across Fireworks' 10,000-plus customer base.	低	SI002
CI010	Fireworks revenue is likely concentrated among a smaller number of large production deployments rather than evenly across the base.	低	SI002
CI011	Sacra estimates Fireworks' gross margin near 50%, below the 70%-plus typical of subscription software, because GPU costs sit in cost of goods sold.	中	SI002
CI012	Management targets a 60% gross margin through better GPU utilization, Blackwell-generation efficiency and a mix shift toward dedicated and enterprise workloads.	中	SI002
CI013	Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant.	中	SI002, SI018
CI014	Proprietary optimization via FireAttention and FireOptimizer lets Fireworks charge a premium over self-hosting while undercutting the alternative's total cost.	中	SI002, SI016
CI015	NVIDIA reports rapidly growing data-center GPU revenue, evidencing the supplier-driven, capacity-constrained input market Fireworks operates within.	中	SI012
CI016	AMD's data-center accelerator business is also scaling, offering Fireworks an alternative silicon supplier to NVIDIA.	中	SI013
CI017	Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C.	高	SI001, SI006
CI018	Sacra estimates Fireworks at roughly $305 million annualized at year-end 2025 rising to about $800 million by May 2026.	低	SI002
CI019	Earlier 2025 coverage reported Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year.	低	SI009
CI020	Fireworks' audited financials, revenue mix, net revenue retention, churn and headcount are not public.	中	SI002, SI010
CI021	Fireworks processes more than 10 trillion tokens per day, rising to 15 trillion by early 2026.	中	SI001, SI010
CI022	Fireworks has raised more than $327 million across seed, Series A, B and C rounds.	高	SI001, SI002
CI023	The October 2025 Series C provided $250 million, roughly $230 million primary and $20 million secondary, at a $4 billion valuation.	高	SI002, SI001
CI024	Fireworks plans to grow its compute footprint three-to-four-fold over the next year, a capital-intensive expansion.	中	SI001
CI025	Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026, which could be the next-round trigger.	低	SI002
CI026	Fireworks' principal financing dependency is GPU supply, since it does not own its fleet and sources NVIDIA and AMD capacity from third parties.	中	SI002, SI012
CI027	Fireworks shows credible hypergrowth and a lifecycle-spanning usage model, but the absence of audited figures caps revenue-quality confidence.	中	SI002, SI001
CI028	The main financial diligence blockers are a reconciled revenue figure, gross-margin verification, burn and runway, and net revenue retention.	中	SI002, SI010
CI029	Fireworks' revenue figures span $130 million to roughly $800 million annualized within twelve months, reflecting both hypergrowth and inconsistent measurement.	低	SI001, SI002, SI009
CI030	No public debt or project-finance obligations are disclosed for Fireworks AI.	低	SI002, SI021
CI031	An AWS case study reports a Fireworks customer cut total costs four-fold and supported three times higher traffic per instance on EC2 P5.	中	SI007
CI032	Reported 2025 profitability, if accurate, would make Fireworks unusually capital-efficient for a hypergrowth infrastructure startup.	低	SI009
CI033	Downward inference price pressure threatens Fireworks' margins absent continued differentiation, per critical reviewers.	中	SI020
CI034	MongoDB, a public infrastructure peer and Fireworks investor, illustrates the higher gross margins of pure-software comparables versus inference providers.	低	SI014
CI035	Fireworks' capital intensity exceeds a typical SaaS company because compute scaling and the lack of owned GPUs require recurring capacity spend.	中	SI002, SI001
CE001	Fireworks lets a developer point an OpenAI-compatible API at an open model and get low-latency production inference without managing GPUs.	高	SE010, SE013, SE017
CE002	Customers describe Fireworks as an inference engine that supplies speed, cost and control while they build the product.	中	SE014, SE025, SE026
CE003	The platform spans text, image, audio and multimodal formats across hundreds of models with day-zero support for major releases.	中	SE010, SE006
CE004	Fireworks provides function calling, JSON-mode structured output and streaming through its API.	中	SE010, SE013
CE005	A single customer can expand from serverless inference into fine-tuning, dedicated capacity, RAG and voice agents.	中	SE017, SE023
CE006	Serverless inference is the entry product, offering pay-per-token access to 50-plus served models including Llama 4, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4.	中	SE013, SE010
CE007	FireFunction is Fireworks' proprietary function-calling model family for tool use and structured output.	中	SE013
CE008	Customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks.	高	SE005, SE003, SE004
CE009	Deployment modules span serverless, on-demand dedicated and reserved capacity plus multi-LoRA hosting of many adapters on one base deployment.	中	SE021, SE020
CE010	Newer surfaces include a Voice Agent Platform with sub-500ms response and BYOB secure training from customer AWS S3 buckets.	中	SE017, SE019
CE011	Fireworks runs a proprietary multi-layer inference stack on commodity NVIDIA GPUs with a stateless router, draft and target pods, distributed KV cache and continuous batching.	中	SE001
CE012	FireAttention is a custom CUDA attention implementation Fireworks reports as faster than vLLM and TensorRT-LLM, extended for long context and Llama 4 chunked local attention.	中	SE006, SE001
CE013	FireOptimizer performs adaptive speculative execution with reported latency reductions up to roughly 3x and native FP4 support on NVIDIA Blackwell B200.	中	SE002, SE009
CE014	The serving topology scales to documented tests around 50,000 requests per minute.	低	SE001
CE015	Speculative decoding pairs a fast draft model with a full target model to generate and verify tokens in parallel, configurable per workload.	中	SE008, SE001
CE016	Fireworks' operating model is open-model neutral, betting on running whichever open model is winning rather than any single model.	中	SE017
CE017	Fireworks operates a global multi-region fleet including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions for latency and data residency.	中	SE017
CE018	Independent monitoring placed Fireworks' Q1 2026 uptime at 99.8%, the highest among specialized inference providers.	中	SE013
CE019	Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks.	中	SE015
CE020	Cursor reached about 1,000 tokens per second for code generation and Sourcegraph saw a 30% latency reduction and 2.5x acceptance increase on Fireworks.	中	SE014, SE016
CE021	The Series C-funded roadmap targets deeper tuning and inference-alignment research and an end-to-end model-lifecycle creation toolchain.	中	SE022, SE019
CE022	Fireworks plans a three-to-four-fold expansion of global compute and has acquired Hathora to deepen real-time orchestration.	中	SE022, SE017
CE023	Fireworks' core IP is the proprietary inference engine, especially FireAttention kernels and FireOptimizer, rather than registered patents.	中	SE002, SE017
CE024	No public patents are listed for Fireworks; its moat is engineering know-how.	低	SE017
CE025	Product-model co-design uses a customer data feedback loop with continuous evaluation and reinforcement learning to improve fine-tuned models over time.	中	SE022, SE003
CE026	Fireworks' optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so differentiation must be continuously re-earned.	中	SE017, SE009
CE027	The platform depends on leading-edge NVIDIA and AMD GPUs, CUDA, cloud regions and upstream open models.	中	SE001, SE017
CE028	Fireworks offers zero data retention by default, SSO, audit logs and data-residency controls for enterprise buyers.	中	SE017
CE029	Fireworks' AWS-based inference solution is HIPAA and SOC2 Type II compliant.	高	SE007, SE017
CE030	For sensitive workloads Fireworks supports airgapped EKS deployments and bring-your-own-bucket secure training.	中	SE017
CE031	Structured-output controls such as JSON mode and grammar-constrained decoding plus high schema compliance support dependable agentic tool use.	中	SE013, SE010
CE032	Fireworks does not publish a formal standard-tier SLA, and reviewers note thin documentation in places, both diligence items for security-sensitive buyers.	中	SE013, SE025
CE033	FireFunction achieves roughly 92% multi-tool function-calling accuracy and 99.1% JSON schema compliance in independent benchmarks.	中	SE013, SE027
CE034	Fireworks maintains day-zero support for new models such as Llama 4, DeepSeek and Qwen as a core engineering discipline.	中	SE006, SE011, SE012
CE035	Fireworks publishes open benchmark tooling via its GitHub organization, a developer-signal of technical openness.	低	SE018
CU001	Fireworks' customer base spans AI-native startups, digital-native enterprises and large or regulated enterprises with distinct adoption paths.	中	SU009, SU007
CU002	AI-native startups such as Cursor, Perplexity, Liner and Cresta adopt Fireworks bottoms-up via self-serve API keys.	中	SU009, SU011
CU003	Digital-native enterprises including DoorDash, Notion, Shopify, Upwork and Quora run production AI features on Fireworks.	高	SU011, SU007
CU004	Use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice across software, e-commerce and customer-service verticals.	中	SU009, SU025
CU005	Fireworks' customer geography skews North American and European with global API access.	低	SU025
CU006	Fireworks reported powering over 10,000 companies at its October 2025 Series C, about a tenfold increase from roughly 1,000 at the Series B.	高	SU006, SU009
CU007	Fireworks serves hundreds of thousands of developers, up from 12,000 in February 2024 to 23,000 by the end of 2024.	中	SU006, SU010
CU008	The platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026.	中	SU006, SU007
CU009	Customers follow a land-and-expand path from serverless inference into dedicated deployments, fine-tuning, RFT, embeddings and voice.	中	SU009, SU017
CU010	Analyst commentary on Hebbia shows how a single inference relationship can grow into a broader infrastructure dependency.	中	SU017
CU011	Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply, with a named researcher endorsing production use.	中	SU001, SU013
CU012	Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering.	中	SU002
CU013	Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x on Fireworks, corroborated by an AWS case study.	高	SU003, SU012
CU014	Upwork's Uma assistant drafts real-time proposals on Fireworks per a named executive.	中	SU004
CU015	Quora's Poe chatbot tripled response speed and Superhuman built its Ask AI compound system on Fireworks.	中	SU013, SU007
CU016	Fireworks' named references are mostly production deployments with quantified outcomes and executive attribution, giving the reference base high quality.	中	SU001, SU002, SU012
CU017	Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths.	中	SU009, SU017
CU018	Customer durability must be inferred from structural signals such as land-and-expand design and production usage rather than disclosed metrics.	中	SU017, SU009
CU019	High daily token volume and named executive testimonials indicate strong repeat usage and satisfaction anecdotally.	低	SU006, SU002
CU020	The OpenAI-compatible API and routing aggregators make multi-homing and switching trivial, elevating churn risk.	中	SU018, SU021
CU021	Independent reviewers explicitly document Fireworks alternatives and switching paths, an adverse durability signal.	中	SU018
CU022	Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend.	中	SU009, SU017
CU023	Blended annualized revenue per company is roughly $28,000, likely understating a long tail beneath a few large accounts.	低	SU022
CU024	The identity and revenue share of Fireworks' top customers are not disclosed, creating unquantifiable top-customer concentration risk.	中	SU009, SU022
CU025	The AWS Strategic Collaboration Agreement and Microsoft Foundry availability are growth accelerants but also channel dependencies.	中	SU009, SU024
CU026	Procurement friction is lower than for closed APIs via cloud marketplaces, but enterprise sales cycles and compliance reviews still gate the largest deals.	低	SU009, SU024
CU027	Several marquee logos such as DoorDash and Shopify appear in aggregate marketing lists without standalone case studies.	低	SU007, SU020
CU028	Sophisticated public customers like GitLab disclose AI-vendor dependence in their filings, illustrating buyer-side multi-homing and substitution capacity.	低	SU016
CU029	WorkingAgents and other third parties corroborate Fireworks' compound-inference customer use cases for agentic workflows.	低	SU015
CU030	Samsung is cited by investors as an enterprise customer accelerating its AI roadmap on Fireworks.	中	SU011
CU031	The named reference base is high quality but partly dated to 2024, a freshness caveat for diligence.	中	SU003, SU012
CU032	Fireworks' customer logos are concentrated in technology, e-commerce, customer service and legal-tech verticals.	低	SU025
CU033	Production usage intensity is implied by 10-15 trillion tokens per day across the customer base.	中	SU006, SU007
CU034	Customer satisfaction evidence is positive but anecdotal, resting on named testimonials rather than survey or NPS data.	低	SU002, SU004
CU035	Retention is the weakest-evidenced dimension of Fireworks' customer story, a material diligence gap.	中	SU017, SU018
CR001	Inference commoditization and gross-margin compression are Fireworks' highest-severity risks.	高	SR001, SR011
CR002	Hyperscaler bundling by AWS, Azure and Google could capture the inference layer and relegate Fireworks to an optimization add-on.	中	SR001
CR003	NVIDIA is simultaneously Fireworks' GPU supplier, an investor and a competitor via Lepton and NIM.	中	SR001, SR008
CR004	Capital intensity from a planned three-to-four-fold compute expansion is a medium-severity risk.	中	SR021, SR001
CR005	Fireworks' mitigation thesis is to move up the stack faster than the serving layer commoditizes.	中	SR001
CR006	Residual risk exposure remains meaningful because several mitigations are unproven and key metrics are undisclosed.	中	SR001, SR012
CR007	The EU AI Act imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose AI providers and deployers.	高	SR004, SR005
CR008	GDPR and data-residency requirements drive Fireworks' zero-data-retention and regional-deployment features.	中	SR006, SR001
CR009	Open models such as Llama carry acceptable-use and license terms that flow through to platforms serving them.	低	SR019, SR007
CR010	Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how.	中	SR013, SR001
CR011	No material litigation or enforcement action against Fireworks is publicly known, and its Series C used top-tier legal counsel.	中	SR018, SR019
CR012	The NIST AI Risk Management Framework provides a voluntary governance baseline Fireworks and its customers can adopt.	低	SR020
CR013	Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation and supply risk.	中	SR001, SR008
CR014	Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case.	中	SR012
CR015	Independently monitored Q1 2026 uptime of 99.8% is a reliability strength despite the absence of a published SLA.	中	SR012
CR016	Operating a global multi-region fleet adds operational complexity and cost for Fireworks.	低	SR001
CR017	Fireworks' SOC2 Type II, HIPAA, zero-retention and airgapped controls mitigate operational and security risk, with no public breach known.	中	SR001
CR018	A single serious outage or data incident would be especially damaging given customers' production, latency-sensitive workloads.	中	SR012, SR001
CR019	NVIDIA is the most acute dependency, supplying leading-edge GPUs while holding a stake and competing through Lepton, a GPU marketplace and NIM.	中	SR001, SR008
CR020	AMD provides an alternative silicon supplier, partly diversifying Fireworks' NVIDIA dependence.	中	SR025
CR021	AWS and Microsoft are both distribution partners and bundling threats via Bedrock, Vertex and Azure Foundry.	中	SR001
CR022	Fireworks depends on continued release and permissive licensing of open models from Meta, DeepSeek and Alibaba.	中	SR001, SR009
CR023	Capital-provider concentration among a handful of late-stage funds and key-customer multi-homing add dependency risk.	低	SR022, SR028
CR024	Fireworks' enabling partners NVIDIA, AWS and Microsoft are also its most credible competitors.	中	SR001
CR025	Gross margin near 50% is structurally below software norms and faces persistent downward price pressure.	高	SR001, SR011
CR026	The path to a 60% gross margin depends on unproven utilization gains and a revenue-mix shift.	中	SR001
CR027	Burn, runway and net revenue retention are undisclosed, so Fireworks' capital adequacy is asserted rather than verified.	中	SR001, SR021
CR028	The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations.	中	SR022, SR023
CR029	Key-person risk is concentrated in CEO Lin Qiao, who leads vision and fundraising.	中	SR024
CR030	Retaining elite inference engineers in a hot talent market is a continuing execution challenge.	低	SR024, SR001
CR031	Fireworks' mitigations include moving up the stack, diversifying silicon, maintaining day-zero model support and hardening compliance.	中	SR001
CR032	Plugging into AWS and Azure procurement is a defensive mitigation against hyperscaler bundling.	中	SR001
CR033	Execution risk centers on whether the unproven up-the-stack expansion outruns commoditization.	中	SR001
CR034	Gross-margin trajectory toward 60% is the single best monitoring indicator of Fireworks' risk profile.	中	SR001
CR035	The clearest thesis-break triggers are margin stuck at ~50%, hyperscaler/NVIDIA capture, a key-person departure, or growth stalling versus the valuation.	中	SR001, SR022
CR036	Priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration.	中	SR001, SR012
CR037	Public infrastructure peers such as Datadog, Snowflake, Confluent and Cloudflare disclose AI-competition and margin risk factors that contextualize Fireworks' exposures.	中	SR014, SR015, SR016, SR017
CR038	DigitalOcean's filings illustrate the lower-margin reality of infrastructure-heavy businesses relative to pure software.	低	SR030
CR039	Better-capitalized rivals such as Baseten raise the competitive stakes for Fireworks' enterprise go-to-market.	中	SR028, SR027
CR040	Low switching costs from OpenAI-compatible APIs and routers cap retention and amplify commoditization risk.	中	SR003, SR013
CR041	US export controls and supply constraints on advanced GPUs are an indirect risk transmitted through Fireworks' NVIDIA dependence.	低	SR008, SR009
CR042	Fireworks' terms of service allocate liability and usage restrictions that are standard but warrant review for enterprise indemnification.	低	SR019
CV001	The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure as enterprises shift from closed-API experimentation to owning customized open models in production.	中	SV026, SV008
CV002	The anti-thesis is that inference is structurally commoditizing, with ~50% margins, near-zero switching costs, and hyperscaler and NVIDIA repricing risk.	中	SV001, SV016
CV003	Fireworks pairs a PyTorch-pedigree founding team with FireAttention, FireOptimizer, best-in-class function calling and 99.8% uptime.	中	SV026, SV001
CV004	Fireworks grew from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers.	中	SV001, SV029
CV005	Fireworks' per-token prices sit within ~2% of Together and open-source serving frameworks keep closing the performance gap, supporting the commoditization anti-thesis.	中	SV016, SV001
CV006	A valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, prices in flawless execution.	中	SV001, SV008, SV029
CV007	We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance.	中	SV001, SV016
CV008	We assign an overall score of 6.5 out of 10, reflecting a strong business at a demanding price.	低	SV001, SV026
CV009	The $4 billion Series C implied roughly 14 times the company-stated $280 million annualized revenue.	中	SV008, SV001
CV010	The rumored $15 billion round implies roughly 19 times Sacra's ~$800 million May 2026 revenue estimate.	低	SV001, SV004
CV011	Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR and a reconciled revenue figure.	中	SV001, SV016
CV012	Fireworks has raised over $327 million across seed, a $25M Series A, a $52M Series B at $552M and a $250M Series C at $4B.	高	SV008, SV001
CV013	The Series C comprised roughly $230 million primary and a $20 million secondary.	中	SV001
CV014	As of May 2026 Fireworks is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures.	中	SV001, SV004, SV002
CV015	Public evidence supports Fireworks' growth and customer story but not its financial quality, since revenue is unaudited, margin is estimated, and burn is undisclosed.	中	SV001, SV016
CV016	Strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table add ecosystem support but concentrate supplier and partner influence.	中	SV008, SV027
CV017	The base case (~45%) assumes ~$700-900 million 2026 revenue and low-50s margins, implying a fair value around $5-8 billion.	低	SV001, SV005
CV018	The bull case (~30%) assumes margins toward 58-60% and revenue past $1.5 billion by 2027, justifying $15-20 billion.	低	SV001, SV015
CV019	The bear case (~25%) assumes commoditization and hyperscaler capture compressing the multiple to a $2-3 billion range or a down round.	低	SV016, SV001
CV020	The valuation dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller.	中	SV001, SV015
CV021	The deciding evidence between scenarios, gross-margin trajectory and retention, is not yet disclosed.	中	SV001
CV022	Together AI was valued at $3.3 billion on about $618 million annualized revenue in early 2025, roughly 5x, and is reportedly near $7.5 billion on about $1 billion.	中	SV005
CV023	Baseten raised at a $5 billion valuation in January 2026 with talks of $11 billion, and Groq reached $6.9 billion as a hardware-led player, while Fal is cited around $4.5 billion.	中	SV006, SV007, SV002
CV024	Public infrastructure-software comparables such as Datadog, Snowflake, Cloudflare and Confluent frame a broad, compressed multiple band with 70%-plus gross margins.	中	SV011, SV012, SV013, SV020
CV025	DigitalOcean illustrates that lower-margin infrastructure businesses trade at clear discounts to pure software, supporting a discount for Fireworks' ~50% margins.	中	SV014
CV026	Hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat to Fireworks' valuation.	中	SV017, SV018, SV019
CV027	At $4 billion on ~$280 million Fireworks looks rich versus Together's multiple but is on a smaller, faster-growing base; on ~$800 million it looks comparatively cheap.	中	SV001, SV005
CV028	Plausible exit paths include an IPO on sustained hypergrowth or strategic acquisition by a hyperscaler or data-platform investor that is also a competitor.	低	SV017, SV018
CV029	The principal thesis-break triggers are margin failing to rise off ~50%, hyperscaler or NVIDIA capture, a key-person departure, or growth stalling versus the valuation.	中	SV001, SV016
CV030	A net revenue retention below roughly 110% once disclosed would warrant a lower multiple.	低	SV001
CV031	Priority diligence asks are a reconciled dated revenue figure, audited gross margin and the path to 60%, NRR and churn, burn and runway, GPU-supply terms, top-customer concentration and preference and dilution structure.	中	SV001, SV016
CV032	Until margin and retention are confirmed, the right posture is to track closely, underwrite to the base case, and reserve premium entry for confirmation of the infrastructure thesis.	中	SV001, SV015
CV033	Together's prior round at $1.25 billion on $130 million 2024 revenue traded at 9.6x, a useful inference-peer multiple benchmark.	中	SV005
CV034	Fireworks' ~50% gross margin warrants a discount to the 70%-plus-margin public-software multiples because GPU costs sit in COGS.	中	SV014, SV001
CV035	The $15 billion valuation talk is corroborated by Sacra and multiple news outlets as of late May 2026 but remains unconfirmed.	中	SV001, SV002, SV003, SV024
CV036	The large AI inference TAM growing near 19% annually supports a premium for category leaders but does not by itself justify any single multiple.	中	SV030, SV015
CV037	A premium entry would become attractive if Fireworks demonstrates a credible path to 60% margins and net revenue retention above 120%.	低	SV001
CV038	Usage-based comparables like Twilio and AI-software names like C3.ai bound the multiple range for consumption- and AI-exposed businesses.	低	SV021, SV023
CV039	Preference stack and liquidation overhang are not publicly disclosed and must be diligenced before a late-stage entry.	低	SV001, SV010
CV040	Salesforce and other large software comps illustrate mature-growth multiple compression that a maturing Fireworks would eventually face.	低	SV022

来源
编号	出版方	标题	引文
SO001	Fireworks AI	Fireworks AI - Fastest Inference for Generative AI
SO002	Fireworks AI	Fireworks AI Raises $250M Series C to Power the Future of Enterprise AI	Today, we're announcing a $250 million Series C at a $4 billion valuation ... brings our total funding to over $327 million
SO003	Fireworks AI	Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems	We're thrilled to announce our $52M Series B funding round led by Sequoia Capital, raising our valuation to $552M.
SO004	Index Ventures	Inference is the New Runtime: Our Investment in Fireworks	Alongside co-founders Dmytro Dzhulgakov, Dmytro Ivchenko, and James Reed ... as well as Benny Chen, Chenyu Zhao, and Pawel Garbacki
SO005	Sequoia Capital	Fireworks Founder Lin Qiao on Fast Inference and Small Models
SO006	The AI Insider	Fireworks AI Closes $250M Series C to Lead the AI Inference Market
SO007	The AI Insider	Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
SO008	PYMNTS	Fireworks AI Valued at $552 Million After New Funding Round
SO009	Tech Funding News	NVIDIA, Sequoia invest in GenAI startup Fireworks AI's $52M round
SO010	The SaaS News	Fireworks AI Raises $52 Million in Series B
SO011	AI Curator	Fireworks AI Closes $250M Round, Eyes AI Inference Lead
SO012	AIM Media House	Fireworks AI raises $250 million for enterprise AI infrastructure
SO013	Sacra	Fireworks AI revenue, valuation & funding	Sacra estimates that Fireworks AI hit $800M in annualized revenue in May 2026, up from about $305M at the end of 2025.
SO014	Scroll.media	Fireworks AI has a valuation of $552 million. Ukrainians among the founders.	the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SO015	The Stack	Fireworks AI's Lin Qiao: The future is compound AI
SO016	TWIML AI	Lin Qiao profile
SO017	Crunchbase	Fireworks AI - Company Profile
SO018	AI Market Watch	Fireworks AI - AI Startup Profile	Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SO019	SiliconANGLE	Fireworks AI raises $250M at $4B valuation to help enterprises with AI inference workloads
SO020	Business Wire	Fireworks AI Raises $250M Series C to Lead the AI Inference Market
SO021	Orrick	Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SO022	Tech Funding News	PyTorch engineers' brainchild Fireworks AI closes $250M at $4B valuation
SO023	Exa	Meet the Executive Team at Fireworks AI
SO024	GitHub	Fireworks AI (fw-ai) GitHub organization
SO025	Fireworks AI	Fireworks AI Careers
SO026	eesel AI	An honest Fireworks AI review (2025): The good, the bad, and the ugly	Fireworks excels at performance and model selection, but it is 'just the engine' - developers and businesses still need technical sophistication to build deployable solutions.
SM001	MarketsandMarkets	AI Inference Market - Global Forecast to 2030	the AI inference market is expected to grow from USD 106.15 billion in 2025 to USD 254.98 billion by 2030, at a CAGR of 19.2%
SM002	Polaris Market Research	AI Inference Market Size & Trends, Industry Report 2034
SM003	Research and Markets	AI Inference Market Outlook 2026-2034
SM004	Vention	State of AI 2026 - AI Market Size, Investment, and Industry Data
SM005	Precedence Research	AI Inference Market Size and Forecast
SM006	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix	By Q2 2026 the serverless inference market has consolidated around seven providers - Together, Fireworks, Anyscale, Groq, Cerebras, Replicate, and OctoAI.
SM007	Alatirok	AI Inference Providers in 2026: 5-Way Comparison
SM008	Jimmy Research	Fireworks AI - entity profile
SM009	Index Ventures	Inference is the New Runtime: Our Investment in Fireworks	Gartner projects GenAI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028
SM010	Sacra	Fireworks AI revenue, valuation & funding
SM011	Sacra	Together AI revenue, valuation & funding	Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025.
SM012	Sacra	Baseten revenue, valuation & funding
SM013	Fireworks AI	Fireworks AI Raises $250M Series C
SM014	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SM015	Microsoft Azure	Introducing Fireworks AI on Microsoft Foundry
SM016	Together AI	Together AI - The AI Acceleration Cloud
SM017	Together AI	Together AI Pricing
SM018	Baseten	Baseten - Inference Platform
SM019	Groq	Groq - Fast, low cost inference
SM020	Modal	Modal - High-performance AI infrastructure
SM021	Replicate	Replicate - Run AI with an API
SM022	Anyscale	Anyscale - Scalable compute for AI
SM023	TokenMix	Fireworks AI Review 2026
SM024	DeployBase	Fireworks AI Pricing Breakdown
SM025	eesel AI	An honest Fireworks AI review (2025)	the industry expects this downward pricing pressure to intensify by 2025-2026, making it difficult for any single provider to maintain high profit margins
SM026	EU AI Act (artificialintelligenceact.eu)	High-level summary of the AI Act
SP001	TokenMix	Fireworks AI Review 2026: 99.8% Uptime vs Together and Groq	Fireworks: 99.8% uptime + best function calling, 50+ models, $0.90/M. Together: 200+ models + cheap fine-tuning, $0.88/M. Groq: ultra-low latency, $0.59/M but lowest uptime (99.4%).
SP002	Sacra	Together AI revenue, valuation & funding	Together AI raised a $305M Series B in February 2025 led by General Catalyst ... valuing the company at $3.3B
SP003	Sacra	Baseten revenue, valuation & funding
SP004	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP005	DatacenterDynamics	AI chip company Groq raises $750m at $6.9bn valuation
SP006	Dataconomy	AI chip startup Groq raises $750 million at a $6.9 billion valuation
SP007	The AI World	Baseten raises $300M to scale AI inference
SP008	TechBuzz	Groq Raises $750M at $6.9B Valuation to Challenge Nvidia's AI Dominance
SP009	Sacra	Fireworks AI revenue, valuation & funding (competition section)	Together AI is Fireworks' closest direct competitor ... Baseten raised a $300M Series E at a $5 billion valuation
SP010	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix
SP011	DeployBase	Fireworks AI Pricing Breakdown vs competitors
SP012	Modal	Modal - High-performance AI infrastructure
SP013	Replicate	Replicate - Run AI with an API
SP014	Anyscale	Anyscale - Scalable compute for AI
SP015	Baseten	Baseten Pricing
SP016	Microsoft Foundry	Fireworks models on Microsoft Foundry
SP017	Groq	Groq - Fast, low cost inference
SP018	Together AI	Together AI - The AI Acceleration Cloud
SP019	Alatirok	AI Inference Providers in 2026: 5-Way Comparison
SP020	Walturn	What is Fireworks AI? Features, Pricing, and Use Cases
SP021	createaiagent.net	Fireworks AI: Optimized Inference Solutions
SP022	Fireworks AI	Fireworks AI Raises $250M Series C
SP023	eesel AI	An honest Fireworks AI review (2025)	Critics note that, while Fireworks excels at performance and model selection, it is 'just the engine'.
SP024	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SP025	Together AI	Together AI Pricing
SI001	Fireworks AI	Fireworks AI Raises $250M Series C	our annualized revenue has surpassed $280 million ... Growing our computation footprint 3-4x over the next year
SI002	Sacra	Fireworks AI revenue, valuation & funding	The company's gross margin sits at approximately 50% ... Fireworks has told investors it is targeting 60% gross margins
SI003	Fireworks AI	Fireworks AI Pricing
SI004	TokenMix	Fireworks AI Review 2026 - pricing breakdown	Llama 70B $0.90/M, Llama 8B $0.20/M, DeepSeek V3 $0.50/M ... Reserved capacity ... approximately $4.80/hour
SI005	DeployBase	Fireworks AI Pricing Breakdown: Cost Per Token
SI006	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SI007	Amazon Web Services	Fireworks.ai Case Study	the customer cut total costs by four times ... HIPAA and SOC2 Type II compliant
SI008	Markaicode	Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM
SI009	Scroll.media	Fireworks AI valuation and ARR	the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SI010	AI Market Watch	Fireworks AI - AI Startup Profile	Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SI011	Digital Applied	AI Inference Providers Compared: Q2 2026 Pricing Matrix
SI012	U.S. Securities and Exchange Commission (NVIDIA)	NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SI013	U.S. Securities and Exchange Commission (AMD)	Advanced Micro Devices Form 10-K (FY ended December 27, 2025)
SI014	U.S. Securities and Exchange Commission (MongoDB)	MongoDB, Inc. Form 10-K (FY ended January 31, 2026)
SI015	Fireworks AI	Fireworks AI Docs - Concepts
SI016	Fireworks AI	FireOptimizer: Customizing latency and quality
SI017	Index Ventures	Inference is the New Runtime
SI018	Fireworks AI	Multi-LoRA: Personalize AI at scale
SI019	Sanjay Says	Fireworks AI and Adaptive Speculative Execution
SI020	eesel AI	An honest Fireworks AI review (2025)	there is pressure for all inference providers to cut prices ... making it difficult for any single provider to maintain high profit margins
SI021	Crunchbase	Fireworks AI - Company Profile
SI022	Business Wire	Fireworks AI Raises $250M Series C
SI023	Tech Funding News	Fireworks AI closes $250M at $4B valuation
SI024	Fireworks AI	Fireworks AI Docs - Deploying LoRAs
SI025	Fireworks AI	Fireworks AI Docs - Changelog
SE001	Markaicode	Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM	Stateless request router ... Draft GPU pods running a small fast model ... Target GPU pods ... Distributed KV cache ... above 85 tokens/sec per GPU
SE002	Fireworks AI	FireOptimizer: Customizing latency and quality for production
SE003	Fireworks AI	Reinforcement Fine Tuning: Train expert open models to surpass closed
SE004	Fireworks AI	Fireworks RFT: Build AI agents with fine-tuned open models
SE005	Fireworks AI	Introducing Supervised Fine Tuning V2
SE006	Fireworks AI	Optimizing Llama 4 Maverick on Fireworks AI	Llama 4 Maverick became available day one on Fireworks with support for 1-million-token context ... custom attention via FireAttention
SE007	Amazon Web Services	Fireworks.ai Case Study (HIPAA / SOC2)	the Fireworks.ai inference solution built on AWS is HIPAA and SOC2 Type II compliant
SE008	Fireworks AI	Speculative Decoding - Fireworks AI Docs
SE009	Sanjay Says	Fireworks AI and Adaptive Speculative Execution
SE010	Fireworks AI	Fireworks AI Docs - Introduction
SE011	Fireworks AI	DeepSeek V3.1 now on Fireworks AI
SE012	Fireworks AI	Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
SE013	TokenMix	Fireworks AI Review 2026: uptime and function calling benchmarks	FireFunction 92.1% multi-tool accuracy ... 99.8% uptime, highest in inference market
SE014	Fireworks AI	How Cursor built Fast Apply using the Speculative Decoding API	Cursor ... achieve 1000 tokens/sec for code generation use cases such as instant apply
SE015	Fireworks AI	How Notion fine-tuned with Fireworks	we reduced latency from about 2 seconds to 350 milliseconds
SE016	Fireworks AI	How Sourcegraph scaled real-time code assistance with Fireworks
SE017	Sacra	Fireworks AI - enterprise security posture	zero data retention by default, SSO, audit logs, data residency controls, HIPAA and SOC2 compliance posture, and airgapped EKS deployments
SE018	GitHub	Fireworks AI (fw-ai) GitHub organization and benchmarks
SE019	Fireworks AI	Fireworks AI Dev Day 2025 Wrapped
SE020	Fireworks AI	Multi-LoRA: Personalize AI at scale
SE021	Fireworks AI	Fireworks AI Docs - Concepts
SE022	Fireworks AI	Fireworks AI Raises $250M Series C (roadmap)	Expand Our Product into a Comprehensive AI Creation Toolchain ... Growing our computation footprint 3-4x
SE023	Fireworks AI	Fireworks AI - AI-native
SE024	Fireworks AI	Fireworks AI Docs - Deploying LoRAs
SE025	eesel AI	An honest Fireworks AI review (2025): documentation gaps	Some reviews point to limited transparency around free usage, sporadic documentation, and potential support slowdowns
SE026	Walturn	What is Fireworks AI? Features, Pricing, and Use Cases
SE027	DeployBase	Fireworks AI Pricing and capabilities breakdown
SU001	Fireworks AI	How Cursor built Fast Apply using the Speculative Decoding API	Fireworks is way more performant than the open source engines and is what we use in production.
SU002	Fireworks AI	How Notion fine-tuned models with Fireworks	we reduced latency from about 2 seconds to 350 milliseconds
SU003	Fireworks AI	Real-time code assistance: How Sourcegraph scaled with Fireworks
SU004	Fireworks AI	How Upwork and Fireworks deliver faster proposals (Uma)
SU005	Fireworks AI	Accelerating Code Completion with Fireworks Fast LLM Inference
SU006	Fireworks AI	Fireworks AI Raises $250M Series C (customer scale)	Fireworks now powers over 10,000 companies (a 10x increase from our Series B)
SU007	AI Market Watch	Fireworks AI - notable customers and growth metrics	Notable customers: Quora, DoorDash, Upwork, Cresta, Cursor, Liner, Superhuman, Sourcegraph, Tome, Samsung, Uber, Notion, Shopify
SU008	Fireworks AI	Fireworks AI - Customers
SU009	Sacra	Fireworks AI - customer base and expansion	The customer base grew from roughly 1,000 companies at the time of the Series B to more than 10,000 companies by October 2025.
SU010	Scroll.media	Fireworks AI developer growth 2024	The number of developers using Fireworks AI jumped from 12,000 in February 2024 to 23,000 by year's end.
SU011	Index Ventures	Inference is the New Runtime (customer references)	high-throughput, latency-sensitive applications at companies like Uber, DoorDash, Notion, Quora, and Upwork ... enterprise leaders like Samsung
SU012	Amazon Web Services	Fireworks.ai Case Study (Sourcegraph / Cody)	Cody doubled its completion acceptance rate ... Cody's backend latency accelerated by more than two times.
SU013	Fireworks AI	Fireworks AI Series B (Cursor, Quora, Upwork, Superhuman)	Superhuman ... used Fireworks to create Ask AI, a compound AI system
SU014	Fireworks AI	Fireworks AI - AI-native customers
SU015	WorkingAgents	Fireworks AI: The Compound Inference Engine
SU016	GitLab Inc. (SEC EDGAR)	GitLab Inc. Form 10-K (FY ended January 31, 2026)
SU017	Sacra	Fireworks AI - retention and expansion dynamics	a single inference relationship can anchor a broader infrastructure dependency over time
SU018	eesel AI	Fireworks AI alternatives and switching considerations
SU019	eesel AI	An honest Fireworks AI review (2025)
SU020	Fireworks AI	Fireworks AI homepage (customer logos)
SU021	TokenMix	Fireworks AI Review 2026 - production usage
SU022	Sacra	Fireworks AI - business model and ARPA	Blended annualized revenue per company works out to roughly $28,000 across the full base
SU023	Fireworks AI	Fireworks AI Blog index
SU024	Fireworks AI	Fireworks AI at AWS re:Invent 2025
SU025	AI Market Watch	Fireworks AI - geographic focus and industries
SR001	Sacra	Fireworks AI - risks section	the proprietary performance advantage in FireAttention and FireOptimizer is likely to compress ... Hyperscaler capture ... Hardware concentration
SR002	eesel AI	An honest Fireworks AI review (2025): risks
SR003	eesel AI	Fireworks AI alternatives (switching risk)
SR004	EU AI Act (artificialintelligenceact.eu)	High-level summary of the AI Act
SR005	EU AI Act (artificialintelligenceact.eu)	Article 53: Obligations for providers of general-purpose AI models
SR006	GDPR.eu	What is GDPR, the EU's data protection law?
SR007	European Commission	Regulatory framework for AI
SR008	U.S. Securities and Exchange Commission (NVIDIA)	NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SR009	DatacenterDynamics	Groq raises $750m at $6.9bn valuation (silicon competition)
SR010	Dataconomy	Groq raises $750 million (NVIDIA challenge)
SR011	Digital Applied	AI Inference Providers Pricing Matrix Q2 2026 (price pressure)
SR012	TokenMix	Fireworks AI Review 2026 (SLA and pricing risk)	Fireworks AI does not publish a formal SLA for its standard tier
SR013	Walturn	What is Fireworks AI? (risks and lock-in)
SR014	U.S. Securities and Exchange Commission (Datadog)	Datadog, Inc. Form 10-K (FY ended December 31, 2025)
SR015	U.S. Securities and Exchange Commission (Snowflake)	Snowflake Inc. Form 10-K (FY ended January 31, 2026)
SR016	U.S. Securities and Exchange Commission (Confluent)	Confluent, Inc. Form 10-K (FY ended December 31, 2025)
SR017	U.S. Securities and Exchange Commission (Cloudflare)	Cloudflare, Inc. Form 10-K (FY ended December 31, 2025)
SR018	Orrick	Fireworks AI Series C legal counsel
SR019	Fireworks AI	Fireworks AI Terms of Service
SR020	NIST	AI Risk Management Framework
SR021	Fireworks AI	Fireworks AI Raises $250M Series C (use of funds / capital intensity)
SR022	SiliconANGLE	Fireworks AI raises $250M at $4B valuation (valuation ramp)
SR023	Scroll.media	Fireworks AI valuation ramp 552M to 4B
SR024	Index Ventures	Inference is the New Runtime (founder dependency)
SR025	Advanced Micro Devices (SEC EDGAR)	AMD Form 10-K (alternative silicon supply)
SR026	DeployBase	Fireworks AI Pricing (margin pressure)
SR027	Alatirok	AI Inference Providers 2026 (competitive risk)
SR028	Business Wire	Baseten Raises $300M (capital asymmetry)
SR029	GitLab Inc. (SEC EDGAR)	GitLab Form 10-K (AI vendor risk-factor comparable)
SR030	DigitalOcean (SEC EDGAR)	DigitalOcean Form 10-K (infrastructure margin comparable)
SV001	Sacra	Fireworks AI revenue, valuation & funding	Fireworks AI is in talks to raise a new funding round at a $15 billion post-money valuation, with Index Ventures set to co-lead.
SV002	AI Weekly	Fireworks AI Targets $15B Valuation in New Round
SV003	StartupNews.fyi	Fireworks AI Seeks $15B Funding, Quadrupling Valuation
SV004	Yahoo Finance	Fireworks AI Eyes $15 Billion Valuation In New Funding Talks
SV005	Sacra	Together AI revenue, valuation & funding	Based on 2024 revenue of $130M and a $1.25B valuation, the company traded at a 9.6x revenue multiple at its prior round.
SV006	Sacra	Baseten revenue, valuation & funding
SV007	DatacenterDynamics	Groq raises $750m at $6.9bn valuation
SV008	Fireworks AI	Fireworks AI Raises $250M Series C at $4B valuation
SV009	SiliconANGLE	Fireworks AI raises $250M at $4B valuation
SV010	Orrick	Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SV011	U.S. Securities and Exchange Commission (Datadog)	Datadog, Inc. Form 10-K (FY 2025)
SV012	U.S. Securities and Exchange Commission (Snowflake)	Snowflake Inc. Form 10-K (FY 2026)
SV013	U.S. Securities and Exchange Commission (Cloudflare)	Cloudflare, Inc. Form 10-K (FY 2025)
SV014	U.S. Securities and Exchange Commission (DigitalOcean)	DigitalOcean Holdings Form 10-K (FY 2025)
SV015	a16z	AI Inference Economics
SV016	eesel AI	An honest Fireworks AI review (2025): margin and commoditization
SV017	U.S. Securities and Exchange Commission (Amazon)	Amazon.com, Inc. Form 10-K (FY 2025)
SV018	U.S. Securities and Exchange Commission (Microsoft)	Microsoft Corporation Form 10-K (FY 2025)
SV019	U.S. Securities and Exchange Commission (Oracle)	Oracle Corporation Form 10-K (FY 2025)
SV020	U.S. Securities and Exchange Commission (Confluent)	Confluent, Inc. Form 10-K (FY 2025)
SV021	U.S. Securities and Exchange Commission (Twilio)	Twilio Inc. Form 10-K (FY 2025)
SV022	U.S. Securities and Exchange Commission (Salesforce)	Salesforce, Inc. Form 10-K (FY 2026)
SV023	U.S. Securities and Exchange Commission (C3.ai)	C3.ai, Inc. Form 10-K (FY 2025)
SV024	CryptoBriefing	Fireworks AI reportedly seeks funding at $15 billion valuation
SV025	Briefs.co	Fireworks AI Eyes $15B Valuation In New Funding Round
SV026	Index Ventures	Inference is the New Runtime (thesis)
SV027	Tech Funding News	Fireworks AI closes $250M at $4B valuation
SV028	AI Market Watch	Fireworks AI - revenue and valuation profile
SV029	Scroll.media	Fireworks AI valuation ramp 552M to 4B
SV030	MarketsandMarkets	AI Inference Market - Global Forecast to 2030