初创公司尽调
尽调报告 AI inference infrastructure / developer tools Series C (private) 2026-06-14

Fireworks AI

面向开放模型的推理云,定价容不得失误

Fireworks AI 是顶级 AI 推理资产,创始团队一线、增长极快;但约 50% 毛利率和结构性商品化风险之下,估值已按完美执行定价。

封面要素

估值 01
4 USD billion (Series C, Oct 2025) [CO011]
累计融资 02
327 USD million+ [CO014]
年化收入 03
800 USD million (Sacra est., May 2026) [CI018]
客户 04
10000 companies+ [CO018]
每日 tokens 05
15 trillion (early 2026) [CO022]
毛利率 06
50 percent (est.) [CO024]

公司概况

Fireworks AI 是一家位于 Redwood City 的 AI 推理云公司,由 Lin Qiao 和一批前 Meta PyTorch 工程师于 2022 年创立。它让企业通过 OpenAI 兼容 API,在生产环境运行、微调并扩展数百个开源 LLM、图像、音频和多模态模型;差异化来自自研推理优化 (FireAttention、FireOptimizer)、同类最佳的函数调用能力和领先的可靠性。公司在 2025 年 10 月以 $4B 估值完成 $250M Series C,收入高速增长,第三方估算年化收入约 $800M;客户超过 10,000 家,包括 Cursor、Notion、DoorDash 和 Samsung。

官网
fireworks.ai
成立时间
2022-01-01
创始人
Lin Qiao, Dmytro Dzhulgakov, Dmytro Ivchenko
创立地点
Redwood City, California, USA
总部
Redwood City, California, USA
产品
按使用量计费的 AI 推理平台:面向开放模型的按 token 无服务器推理、LoRA 与强化微调、专用和预留 GPU 部署、函数调用模型族 (FireFunction)以及语音智能体平台,全部跑在自研优化推理引擎上。
客户
AI 原生初创公司、数字原生企业,以及部分构建生产级生成式 AI 应用的 Fortune 500 买家;这些客户需要快速、低成本、可控的开放模型推理。
商业模式
B2B 按量计费覆盖无服务器(按 token)、微调(按训练 token)、强化微调(按 GPU-hour)以及专用 / 预留部署;开发者自下而上进入,再扩展为议价企业合同。
阶段
Series C (private, venture-backed)
融资情况
2025 年 10 月以 $4B 估值完成 $250M Series C,累计融资超过 $327M;据称截至 2026 年 5 月,公司正洽谈一轮由 Index Ventures 共同领投、估值约 $15B 的融资(未确认)。
[CO001, CO011, CO014, CO018]

执行摘要

主要优势

  • 稀缺的创始人市场契合:当年在 Meta 打造 PyTorch 的团队,如今在做推理系统。
  • 增长极快:据称年化收入约 $800M,覆盖 10,000+ 客户,每日处理 15T tokens。
  • 生产级客户背书优质:Cursor、Notion、Sourcegraph、Upwork 均给出可量化结果。
  • 工程驱动差异化明显:FireAttention、FireOptimizer、一流 function calling,以及 99.8% uptime。

主要风险

  • 推理正在商品化;约 50% 毛利率低于软件公司 70%+ 的常态。
  • Hyperscaler 捆绑销售压力大(Bedrock、Azure、Vertex),NVIDIA 同时是供应商、投资方和竞争者。
  • 切换成本低、多平台并用常见,压住留存和定价权。
  • 估值爬升激进($552M 到 $4B,洽谈中达 $15B),已经押注零失误执行。

未决问题

  • 没有审计财务,也没有一个可对账、带日期的收入数字;一年内外部估算相差 6 倍。
  • 毛利率、净收入留存、流失、烧钱和 runway 均未披露。
  • 头部客户收入集中度和 GPU 供应合同条款未公开。
  • 下一轮的优先权栈和稀释结构未披露。

目录

Chapter 01

01公司概览

1.1 身份与商业模式

Fireworks AI 是一家美国人工智能基础设施公司,总部位于 California Redwood City,由一支离开 Meta PyTorch 组织的团队于 2022 年末创立。公司运营其所谓的「AI Cloud」,面向企业开发团队:托管推理平台以低延迟服务运行、微调并扩展开源大语言、视觉、音频和多模态模型。它的核心判断是 “one-size-fits-one”(一客一策)推理:最高价值的 AI 不会来自少数通用封闭基础模型,而会建立在更小、可定制、并用企业专有数据调优的开放模型上。商业化按客户生命周期计费:无服务器推理按 token 收费,微调按训练 token 收费,强化微调按 GPU-hour 收费,按需或预留专用部署按 GPU-second 或 GPU-hour 收费。平台提供数百个模型、OpenAI 兼容 API、函数调用和企业安全控制,把 Fireworks 放在商品化 GPU 租赁与封闭模型 API 之间。[CO001, CO002, CO003, CO004, CO005, CO031]

FO002: 公司快照逻辑

身份、产品、客户、资本和依赖如何串起来。

[CO001, CO004, CO018, CO024, CO028]

1.2 创始人与领导层

Fireworks AI 由 CEO Lin Qiao 与六位同事共同创立,多数人曾在 Meta 共同参与 PyTorch。Qiao 此前是 Meta 工程高级总监兼 PyTorch 负责人,带过 300 多名工程师;更早在 LinkedIn、IBM 及其他大型系统公司任职;拥有 UC Santa Barbara 计算机科学博士学位。联合创始人包括 Dmytro Dzhulgakov,他是前核心 PyTorch 维护者,2011 年加入 Facebook;以及 Dmytro Ivchenko,毕业于 Kyiv Polytechnic,曾在 Meta 负责 PyTorch 排序,两人均来自乌克兰。其余创始人 James Reed、Benny Chen、Chenyu Zhao 和 Pawel Garbacki 来自 Meta PyTorch 编译器、广告基础设施、核心 ML 团队以及 Google Vertex AI。投资人反复把创始团队深厚的推理系统背景视为公司核心优势;这也形成集中在 Qiao 身上的关键人依赖。[CO006, CO007, CO008, CO009, CO010, CO032]

领导层和创始人表
人物角色背景创始人-市场匹配关键人依赖
Lin QiaoCEO 兼联合创始人Meta PyTorch 负责人(300+ 工程师);LinkedIn、IBM;UC Santa Barbara 博士深厚推理系统和 OSS 领导力高 - 公众门面、愿景和融资负责人
Dmytro Dzhulgakov联合创始人(CTO 级)2011 年起任 Meta 核心 PyTorch 维护者;来自 Ukraine Kharkiv核心推理工程能力高 - 主要技术架构师
Dmytro Ivchenko联合创始人Meta PyTorch 排序;LinkedIn;Kyiv Polytechnic大规模 ML 系统
James Reed联合创始人Meta PyTorch 编译器团队编译器 / kernel 优化
Benny Chen联合创始人Meta 广告基础设施负责人生产基础设施战略
Chenyu Zhao联合创始人曾领导 Google Vertex AI云 AI 平台 GTM
Pawel Garbacki联合创始人Meta Newsfeed 核心 MLML 系统和排序

创始人名单和背景汇总自 Index Ventures、Sequoia、scroll.media 和高管目录来源;除 CEO 外,其他角色并非全部都有公开正式头衔。

[CO006, CO007, CO008, CO009, CO010]

1.3 融资与资本结构

Fireworks 已通过种子轮和三轮定价融资募得超过 $327M。2024 年 3 月,Benchmark 领投 $25M Series A,Sequoia Capital、Databricks Ventures 以及 Frank Slootman、Sheryl Sandberg、Howie Liu、Alexandr Wang 等天使参投。2024 年 7 月,Sequoia 领投 $52M Series B,估值 $552M,引入 NVIDIA、AMD 和 MongoDB Ventures,使累计资本达到 $77M。2025 年 10 月,公司宣布以 $4B 估值完成 $250M Series C,由 Lightspeed Venture Partners、Index Ventures 和 Evantic 共同领投,Sequoia 继续支持;Sacra 称该轮约含 $230M 一级资本和 $20M 二级交易。NVIDIA、AMD、MongoDB 和 Databricks 等战略参投方贯穿多轮融资,把股东名单与 Fireworks 所依赖的硬件和数据平台生态绑在一起。截至 2026 年 5 月,Sacra 称公司正洽谈以 $15B 投后估值再次融资,Index 拟共同领投,但条款未确认。[CO011, CO012, CO013, CO014, CO015, CO016]

利益相关方或投资者图谱
利益相关方角色 / 轮次控制权或经济重要性尽调问题
Benchmark领投 - Series A(2024 年 3 月)早期领投方;可能有董事席位确认董事会构成和持股 %
Sequoia Capital领投 - Series B;继续参与 Series C多轮支持方;GP Sonya Huang确认董事席位和 pro-rata 权益
Lightspeed Venture Partners共同领投 - Series C(2025 年 10 月)$4B 估值的后期领投方确认治理权利
Index Ventures共同领投 - Series C;潜在下一轮共同领投方复投方(Sahir Azam);论点型投资者确认传闻 $15B 轮次中的配额
Evantic共同领投 - Series C新后期领投方确认基金画像和持股
NVIDIA战略 - Series B/C硬件供应商和投资者评估 GPU 配额冲突 / 收益
AMD战略 - Series B/C替代芯片供应商 / 投资者评估 MI 系列采用情况
MongoDB / Databricks战略 - Series B/C数据平台伙伴 / 投资者确认联合销售和伙伴深度

仅列领投方和战略投资者;未列个人天使(Slootman、Sandberg、Liu、Wang)和种子投资方。董事会构成和持股比例未公开。

[CO011, CO012, CO013, CO015, CO016]

1.4 规模与牵引指标

Fireworks 报告商业规模快速扩张。Series C 时,公司称已服务超过 10,000 家公司,较 Series B 约增长 10 倍,覆盖数十万开发者,并且每天处理超过 10 万亿 tokens;第三方画像称到 2026 年初每日 tokens 达 15 万亿。开发者基数从 2024 年 2 月约 12,000 人增至当年年底 23,000 人。收入数据因来源和时间口径不同而差异很大,需要谨慎看待:公司称 2025 年 10 月 Series C 时年化收入已超过 $280M;Sacra 估算 2025 年底约 $305M,到 2026 年 5 月年化约 $800M;而 2025 年更早报道提到 $130M ARR,并称公司盈利且同比增长 20 倍。毛利率估计接近 50%,低于软件常见的 70% 以上,因为 GPU 成本计入 COGS;管理层向投资人表示目标是 60%。[CO018, CO019, CO020, CO021, CO022, CO023]

快照 KPI 表
指标数值 / 状态截至置信度缺口或备注
估值$4.0B 投后(Series C)2025 年 10 月据报 2026 年 5 月讨论 $15B 轮次(未确认)
总融资额>$327M2025 年 10 月包括 Series C 中约 ~$20M 二级转让
年化收入(公司)>$280M2025 年 10 月公司陈述;未经审计
年化收入(Sacra 估计)~$800M2026 年 5 月第三方估计;与公司时点冲突
客户10,000+ 家公司2025 年 10 月较 Series B 增加约 ~10x
开发者数十万2025 年 10 月2024 年底引用为 23,000
Tokens/day10T+(2026 年初 15T)2025 年 10 月吞吐指标,不是收入
毛利率~50%(目标 60%)2026Sacra 估计;GPU COGS 占比高
员工人数未披露2026无可靠公开数字

数值汇总自公司公告和第三方分析师画像;收入和毛利率为估计值,时点冲突,且不是经审计财务数据。

[CO011, CO014, CO018, CO021, CO022, CO024]
FO003: 可投性指标

头部 KPI 快照之外的牵引力、收入轨迹和关键人物信号。

收入和增长为不同时间口径下的估算;关键人物集中度为定性判断。

[CO018, CO022, CO023, CO019, CO033]

1.5 里程碑与负面信号

公司时间线从 2022 年离开 PyTorch 开始,随后完成三轮融资,连续推出平台能力(FireAttention、FireFunction V2、FireOptimizer、监督微调和强化微调),2025 年举办 Dev Day,2026 年 3 月上线 Microsoft Foundry,并收购 Hathora 以加深实时计算编排能力。增长叙事之外,也有后文会深入讨论的真实负面信号。独立评测者指出 Fireworks「只是引擎」,对开发者成熟度要求不低,并标记文档薄弱、没有持续免费层。分析师强调三类结构性风险:vLLM、SGLang 等开源服务框架进步带来的推理商品化;AWS Bedrock、Azure 和 Vertex 的 hyperscaler 捆绑;以及硬件集中度风险,因为 Fireworks 不拥有自己的 GPU 机群,而 NVIDIA 已通过收购 Lepton 直接进入推理市场。这些压力与异常强的创始团队和快速收入爬坡同时存在。[CO025, CO026, CO027, CO028, CO029, CO030]

里程碑表
日期事件类型金额 / 估值 / 状态含义
2022团队离开 Meta PyTorch;Fireworks 在 Redwood City 创立创立n/a推理系统背景的起点
2024 年 2 月达到 ~12,000 名开发者规模12,000 devs早期自下而上牵引力
2024 年 3 月Benchmark 领投 Series A融资$25M首个机构领投方
2024 年 7 月Sequoia 领投 Series B融资$52M @ $552MCompound-AI 定位
2024发布 FireFunction V2 和 FireAttention V2产品已发布函数调用和长上下文速度
2024 年 12 月开发者基础达到 ~23,000规模23,000 devs约 10 个月翻倍
2025 年 6 月发布 Supervised Fine-Tuning V2产品已发布更广模型 + QAT 支持
2025强化微调和 Dev Day 2025产品已发布Agentic 调优切入口
2025 年 10 月Lightspeed、Index、Evantic 共同领投 Series C融资$250M @ $4B客户较 Series B 增长 10x
2026 年初扩展到 ~15T tokens/day规模15T tokens/day(每日处理量)吞吐领先主张
2026 年 3 月在 Microsoft Foundry(Azure)上线伙伴关系已上线超大云厂商分发
2026收购 Hathora,用于实时计算编排治理收购沿堆栈向上垂直整合
2026 年 5 月据报正讨论以 $15B 融资融资$15B(传闻)不到 1 年潜在约 ~4x 跳升

时间线汇总自 Fireworks 博客、融资公告和分析师画像;部分产品发布日期按公告月份近似处理。

[CO011, CO014, CO019, CO025, CO026, CO027]
FO001: 公司里程碑时间线

按日期梳理创立、融资、产品、规模和合作伙伴里程碑。

部分发布日期近似到公告月份;$15B 融资轮未确认。

[CO011, CO014, CO019, CO025, CO026, CO017]

1.6 图表

Chapter 02

02市场分析

2.1 市场边界与定义

Fireworks 所在的是托管 AI 推理市场:为生产应用服务、微调并专用部署开放权重大语言、视觉、音频和多模态模型。相关支出是企业付给第三方、让模型在生产环境运行的钱,而不是训练基础模型或租裸 GPU 的开销。核心边界之外包括前沿实验室消耗的基础模型训练算力、CoreWeave 和 Lambda 等提供商的原始 GPU IaaS,以及 OpenAI 和 Anthropic 的封闭模型 API;但封闭 API 是最重要的现状替代方案。Fireworks 正在扩展进入的相邻预算池包括语音智能体、结合向量数据库的检索增强生成,以及面向智能体的强化学习训练。Fireworks 最直接的替代品,是在 vLLM 或 SGLang 上自托管开放模型、AWS Bedrock 和 Azure Foundry 等 hyperscaler 套件,以及继续依赖封闭 API。必须先界定边界,因为标题里的 “AI inference” 市场数字把硬件、hyperscaler 和独立提供商支出混在了一起。[CM001, CM002, CM003, CM004, CM005]

市场定义表
细分 / 类别纳入支出排除支出买方 / 付款方与 Fireworks 的相关性
托管开放权重推理开放模型按 token serverless serving封闭模型 API 使用工程 / 平台预算核心市场
微调与适配LoRA / SFT / RFT 训练支出Foundation-model 预训练ML / 工程预算核心邻近
专用 / 预留 GPU serving托管专用部署Bare-metal GPU IaaS 租赁平台 / 采购核心市场
语音与多模态 agentsStreaming STT+LLM+TTS 堆栈电话硬件产品预算扩张邻近
RAG / embeddingsEmbedding + reranking 推理Vector DB 许可证工程预算扩张邻近
封闭模型 APIs(替代品)n/a(排除)OpenAI / Anthropic API 支出工程预算主要替代品

边界定义 Fireworks 作为独立推理提供商可捕获的支出;封闭 API 和原始 GPU IaaS 被排除,但列为替代品。

[CM001, CM002, CM003, CM004]

2.2 多视角市场规模测算

Fireworks 的机会不能用单一数字概括,因此我们用三种视角交叉测算。最宽的自上而下视角是全球 AI 推理市场:MarketsandMarkets 估计 2025 年为 $106.15B,到 2030 年增至 $254.98B,CAGR 为 19.2%;其他研究机构把 2026 年规模放在约 $118B 至 $126B,把 2034 年放在 $312B 至 $536B。这个视角被半导体和 hyperscaler 支出主导,会高估 Fireworks 可触达市场。更窄一层是生成式 AI 模型支出,Gartner(Index Ventures 引用)预计从 2025 年 $14B 到 2028 年 $39B,接近翻三倍,增长大多来自有利于 Fireworks 的专业化和微调模型。最相关的可服务视角,是独立开放权重推理服务细分市场;该市场已围绕约七家提供商整合。Together AI 年化收入接近 $1B,Fireworks 在 $280M–800M 区间,Groq 估值 $6.9B,因此独立提供商收入池今天只有数十亿美元,但扩张很快。Fireworks 自身约 $280M 以上收入,意味着它在该细分市场早期占个位数百分点份额。[CM006, CM007, CM008, CM009, CM010, CM011]

TAM/SAM/SOM 或规模测算视角表
视角发布方年份数值CAGR / 备注置信度局限
自上而下 AI 推理(TAM)MarketsandMarkets2025-2030$106.15B -> $254.98B19.2% CAGR由芯片和超大云厂商主导
自上而下 AI 推理(替代)Fortune / Polaris / R&M2026 / 2034~$118-126B / $312-536B13-19% CAGR各机构区间很宽
GenAI 模型支出(视角)Gartner(via Index)2025-2028$14B -> $39B~40%/yr包含封闭模型支出
独立推理利基(SAM)Sacra / 三角测算2026低个位数 $B快速增长无标准分析师指标
Fireworks 收入(SOM)Fireworks / Sacra2025-2026$280M -> ~$800M高增长口径年份冲突

三种口径交叉校验;自上而下的 TAM 会高估 Fireworks 能触达的市场,因此 SAM/SOM 依赖公司层面的估算,置信度低。

[CM006, CM007, CM008, CM009, CM010, CM011]
FM001: 市场规模测算框架

AI 推理机会的 TAM/SAM/SOM 分层。

各层采用不同时间口径;SAM 是交叉估算,不是分析师测算值。

[CM006, CM009, CM010, CM012]
FM002: 市场估算区间

按预测年份给出 AI 推理市场的低 / 基准 / 高估算,单位为十亿美元。

区间覆盖 MarketsandMarkets、Polaris、Fortune、Research and Markets 和 Gartner 估算;单位为十亿美元。

[CM006, CM007, CM008]

2.3 买家与细分版图

Fireworks 的需求横跨三个买家细分,采用路径不同。AI 原生初创公司(如 Cursor、Perplexity 和 Liner)自下而上采用:个体开发者从自助 API key 和按量付费开始,经济买家是工程或平台负责人。数字原生企业(DoorDash、Notion、Shopify、Upwork、Quora)把功能从试点推向生产,再扩展到专用部署和微调,预算由产品工程组织掌握。传统和受监管企业(Samsung,以及越来越多的医疗和金融服务买家)通过议价合同自上而下采用,需要 SSO、审计日志、数据驻留以及 HIPAA 或 SOC2 姿态,预算由平台和采购职能掌握。三类客户中,用户都是开发者,付款来自工程预算,主导触发因素是封闭模型 API 在生产规模下的成本、延迟或控制限制。Fireworks 的 AWS Strategic Collaboration Agreement 和 Microsoft Foundry 可用性,让它能在现有云采购渠道里触达这些买家,而不是以独立供应商身份另起评估流程。[CM013, CM014, CM015, CM016, CM017]

细分市场 / 买方地图
细分市场买方用户付款方采用触发因素
AI 原生初创公司工程 / 平台负责人开发者工程预算规模化后的闭源 API 成本 / 延迟
数字原生企业产品工程组织开发者工程预算从试点扩到生产
受监管企业 / Fortune 500平台 + 采购内部开发者采购预算数据控制与合规
语音 / 智能体构建者产品负责人应用用户产品预算低于 500ms 的延迟需求
RAG / 搜索团队工程负责人开发者工程预算检索延迟与成本

各细分市场里,用户通常是开发者,付款方来自工程或采购预算;采用触发因素随成熟度和监管要求而变。

[CM013, CM014, CM015, CM016]
FM003: 买方 / 客群地图

买方、用户、付款方关系,以及进入 Fireworks 的采用路径。

[CM013, CM014, CM017]
FM004: 采用漏斗或价值链地图

从认知到企业标准化的购买与部署阶段。

阶段综合自 Fireworks 的 GTM 描述;数值是示意性的相对权重,不是已披露转化率。

[CM015, CM016, CM017]

2.4 增长驱动与采用约束

几个因素在扩大 Fireworks 的市场。开源模型质量正在向封闭模型靠拢,智能体和复合 AI 系统会放大每个任务的推理调用次数,基于专有数据的微调正成为竞争必需品,企业也越来越希望掌控自己的 AI,而不是依赖少数封闭实验室。成本压力同样有利:规模化时,开放权重推理可以比封闭 API 便宜很多。反向约束也很强。随着 vLLM 和 SGLang 进步,推理正在商品化,优化栈的专有优势被压缩,并引发价格竞争;Fireworks 的 Llama 70B 价格与 Together 相差约 2% 以内。Hyperscaler 捆绑让 AWS、Azure 和 Google 把推理折进既有安全、计费和治理关系。GPU 供给集中,Fireworks 也不拥有自己的机群。EU AI Act 等监管增加合规开销;降低迁入成本的 OpenAI 兼容 API 同样降低迁出成本,限制长期锁定。[CM018, CM019, CM020, CM021, CM022, CM023]

增长驱动因素与约束表
驱动因素 / 约束方向时间影响尽调问题
开源模型质量收敛驱动因素当前扩大可服务工作负载跟踪开源与闭源模型质量差距
智能体式 / 复合 AI驱动因素1-2 年单任务推理调用增加衡量每条工作流的 token 增长
基于专有数据微调驱动因素当前支出价值更高、黏性更强评估 RFT/SFT 挂载率
企业数据所有权驱动因素1-3 年偏好开源模型调研买方自建与采购取舍
推理商品化约束当前利润率 / 价格受压监测 vLLM/SGLang 能力持平
超大云厂商捆绑约束当前渠道被截留风险评估 Bedrock/Azure 重叠
GPU 供应集中约束持续产能 / 成本暴露审查 GPU 合同
监管(EU AI Act)约束1-3 年合规开销按层级梳理义务

驱动因素放大市场,约束则压缩利润率或截留渠道;时间栏说明每项因素何时会实质影响采用。

[CM018, CM019, CM020, CM021, CM022, CM023]

2.5 规模测算缺口与相互矛盾的估计

几个缺口限制了市场规模测算的可信度。公开的 “AI inference” 总量差异很大,并打包了不兼容类别(芯片、hyperscaler 服务和独立软件),因此自上而下的 TAM 不能干净映射到 Fireworks 可触达收入。没有标准分析机构衡量独立推理提供商收入池;只能从单家公司估计拼出,时间口径和可靠性参差不齐。不同机构预测的 CAGR 约在 13% 至 19% 之间,2034 年估计相差超过 $200B。在其中,Fireworks 自身收入数字也被多个来源争议。这些缺口说明市场显然很大且高速增长,但与估值相关的可服务和可获得份额仍是估计,不是实测事实;任何规模测算都应视为方向性判断。我们保留这种精度失败,而不是断言一个单一 SAM。[CM025, CM026, CM027, CM028]

2.6 图表

Chapter 03

03竞争对手

3.1 竞争格局

推理市场已经分成四个清晰的竞争层,Fireworks 每一层都承压。托管开放模型平台,主要是 Together AI、Baseten 和 Replicate,是最接近的直接同业,围绕模型广度、开发者体验和按 token 价格竞争。垂直整合的芯片玩家 Groq、Cerebras 和 SambaNova,不靠通用 GPU 上的软件优化,而是用定制硬件攻击延迟和成本。Hyperscaler 套件——AWS Bedrock、Google Vertex AI、Microsoft Azure Foundry 和 Databricks Model Serving——结构性威胁最大,因为它们把模型访问、基础设施、治理和合同压进一个平台。最后,vLLM 和 SGLang 等开源服务框架,加上 NVIDIA NIM 这样的打包层和 OpenRouter 这样的路由器,正在商品化 Fireworks 自身栈里的专有优势。现状替代方案包括继续使用封闭 API 和内部自托管。最可能的新进入者压力来自 NVIDIA 本身:它通过收购 Lepton 和推出竞争性 GPU 云市场直接进入推理,把关键供应商变成对手。[CP001, CP002, CP003, CP004, CP005, CP006]

FP001: 竞争定位图

按价格竞争力(x)与企业 / 广度深度(y)定位供应商。

轴位置为作者定性判断,综合了定价和能力证据。

[CP001, CP014, CP015]

3.2 竞争对手画像

Together AI 是 Fireworks 最接近的直接竞争对手:它由 Percy Liang、Chris Ré 和 Vipul Ved Prakash 于 2021 年创立,2025 年 2 月以 $3.3B 估值完成 $305M Series B,据称到 2026 年初年化收入约 $1B,覆盖无服务器推理、专用集群、微调、语音和强化学习。Baseten 定位为企业推理工程平台,提供自托管和混合 VPC 部署;它在 2026 年 1 月以 $5B 估值完成 $300M 融资,由 IVP 和 CapitalG 领投,据称 NVIDIA 投入 $150M,使总融资升至约 $585M。Groq 以定制 LPU 芯片竞争,2025 年 9 月以 $6.9B 估值融资 $750M,并宣传 Llama 模型上每秒 750 多 tokens,且与 Meta 合作为官方 Llama API 提供动力。Cerebras 和 SambaNova 在高端低延迟市场延续硬件主导攻击,Replicate、Modal 和 Anyscale 争夺开发者心智。相比之下,Fireworks 拥有 $4B 估值和 $280M 以上收入,并在可靠性和函数调用上处于品类领先。[CP007, CP008, CP009, CP010, CP011, CP012]

竞品画像表
竞品层级融资 / 估值目标客户产品范围指示性价格(Llama 70B)战略方向
Fireworks AI托管开源模型已融资 $327M / $4.0BAI 原生 + 企业开发者无服务器、微调、RFT、专用实例、语音$0.90/M向上走:调优、智能体、治理
Together AI托管开源模型$533.5M / $3.3B(洽谈 $7.5B)从初创公司到企业无服务器、集群、微调、语音、RL$0.88/M自有 GPU 集群 + 产品广度
Baseten托管开源模型~$585M / $5.0B(洽谈 $11B)合规要求重的企业定制模型、VPC / 自托管运行时按报价企业推理工程
Replicate托管开源模型私有 / 未披露开发者 / 实验广泛模型目录、API 调用运行按运行次数漏斗顶部开发者心智
Groq垂直芯片$750M+ / $6.9B延迟敏感型工作负载LPU 推理 API$0.59/M定制芯片 + Meta Llama API
Cerebras / SambaNova垂直芯片私有 / 数十亿美元级性能敏感型晶圆级 / RDU 推理按报价硬件驱动的延迟领先
AWS Bedrock / Azure / Vertex超大云厂商捆绑上市巨头既有云企业客户捆绑模型访问 + 治理捆绑供应商整合
Databricks / NVIDIA NIM超大云厂商 / 打包上市 / 私有数据平台与基础设施买方模型服务 / NIM 打包捆绑将推理吸收入平台

融资和估值来自公司公告与 Sacra;价格为 Llama 70B 无服务器指示性费率,会随层级和日期变化。

[CP007, CP008, CP009, CP010, CP011, CP014]

3.3 能力、定价与 GTM 对比

能力上,Fireworks 通过可靠性和结构化输出做出差异:独立监测显示其 2026 年 Q1 可用性为 99.8%,在专业提供商中最高;FireFunction 模型多工具函数调用准确率约 92%,领先 Together 和 Groq,距离 GPT-4o 只差几个百分点。价格上,竞争极薄:Llama 3.3 70B 在 Fireworks 每百万 tokens 约 $0.90,Together 为 $0.88,Groq 为 $0.59;同一模型在七家提供商之间价差约六倍。原始速度上,Groq 的 LPU 以每秒 400–750 tokens 领先,而 Fireworks 约为 145;但 Fireworks 赢在延迟一致性和负载下稳定性。GTM 上,Together 和 Baseten 与 Fireworks 一样走自下而上的开发者路径,但 hyperscaler 通过既有采购、安全和计费关系赢得分发。信任与监管方面,Fireworks、Together 和 Baseten 均提供 SOC2/HIPAA 姿态以及 VPC 或数据驻留选项,Baseten 最偏向自托管、合规重的部署。[CP014, CP015, CP016, CP017, CP018, CP019]

功能 / 能力矩阵
能力FireworksTogether AIBasetenGroq
无服务器开源模型 API
模型目录规模50+200+聚焦定制15-20
LoRA 微调是 + 完整微调
函数调用质量同类最佳(~92%)基础
定制芯片是(LPU)
VPC / 自托管EKS 隔离部署专用是(核心强项)有限
语音智能体平台合作伙伴
强化微调部分支持

根据供应商文档、TokenMix 和 Sacra 整理;「同类最佳」反映独立 FireFunction 基准测试结果。

[CP014, CP015, CP016, CP017]
定价 / 包装对比
指标FireworksTogether AIGroq备注
Llama 3.3 70B ($/1M)$0.90$0.88$0.59Fireworks 比 Together 高 ~2%,比 Bedrock 低 66%
Llama 3.3 8B ($/1M)$0.20$0.18$0.05Groq 最便宜
2026 年 Q1 正常运行时间99.8%99.7%99.4%Fireworks 最高
吞吐量(tok/sec)14595420Groq 最快
TTFT P50150ms220ms65msGroq 延迟最低
微调LoRA $16/MLoRA+完整微调 $14/MNoneTogether 最便宜 / 覆盖最广
批处理 API尚未是(优惠 30-50%)Together 有优势

价格和基准来自 TokenMix 2026 年 4 月数据与 DeployBase;数字为指示性,变化频繁。

[CP014, CP015, CP018]
FP002: 功能广度 / 能力图谱

四家直接竞争对手和芯片竞争对手的能力覆盖情况。

能力单元格由服务商文档和基准测试汇总得出。

[CP016, CP017]

3.4 切换成本、锁定与分发能力

推理的切换成本结构性偏低。包括 Fireworks、Together、Groq 和 Baseten 在内的大多数提供商都暴露 OpenAI 兼容 API,供应商之间迁移可在数分钟内完成;OpenRouter 和 TokenMix 等路由聚合器还主动鼓励跨提供商多栖和自动故障转移。这限制了所有人的长期锁定,意味着份额要靠性能、调优和企业集成守住,而不是靠合同。分发能力越来越关键:hyperscaler 和 NVIDIA 控制独立提供商依赖的 GPU 供给、安全姿态和采购关系。Fireworks 的反制,是通过 AWS Strategic Collaboration Agreement 和 Microsoft Foundry 可用性接入这些渠道,同时向微调、强化学习、语音和企业治理上移,创造更粘、更高价值的关系。Baseten 的 VPC 与自托管足迹、Together 的自有数据中心和 GPU 集群策略,是对同一分发与供给问题的替代答案。[CP020, CP021, CP022, CP023, CP024]

3.5 护城河耐久性与负面证据

Fireworks 的护城河真实但狭窄。自研 FireAttention 和 FireOptimizer 栈把推理系统工程转化为性能与价格优势,可靠性和函数调用领先也是真实优势。但护城河面临清晰侵蚀路径。vLLM 和 SGLang 等开源服务框架持续缩小性能差距,Baseten 也公开基于它们构建;NVIDIA 把 NIM 推成打包层;Snowflake 发布 Arctic Inference,作为开放 vLLM 插件。资本更厚的竞争对手抬高门槛:Groq 估值 $6.9B,Baseten 估值 $5B,Together 传闻接近 $7.5B,都有更多资产负债表空间承诺 GPU 和企业 GTM。硬件集中也是负面信号,因为 Fireworks 不拥有 GPU,而供应商兼投资人 NVIDIA 现在直接竞争。长期问题在于,Fireworks 能否比生态系统商品化服务层更快,把栈延伸到调优、智能体和治理。[CP025, CP026, CP027, CP028, CP029, CP030]

护城河耐久度 / 竞争风险台账
风险机制严重性证据
开源服务平价vLLM/SGLang 缩小性能差距Baseten 基于 SGLang/vLLM/TGI 构建
NIM 打包NVIDIA 标准化企业推理NVIDIA 推动 NIM 分发
供应商变竞品NVIDIA 通过 Lepton 进入推理NVIDIA GPU 云市场
超大云厂商捆绑Bedrock/Azure 吸收推理Bedrock 自定义模型导入(Qwen)
资本不对称竞争对手融到更大轮次Groq $6.9B,Baseten $5B
价格商品化单 token 价差薄如刀片Fireworks 与 Together 相差 2% 以内
低切换成本OpenAI 兼容 API + 路由器OpenRouter 多归属
硬件集中无自有 GPU 集群从第三方采购 NVIDIA/AMD

风险台账综合 Sacra 分析与定价 / 基准来源;严重性为作者的定性判断。

[CP025, CP026, CP027, CP028, CP029, CP030]
FP003: 护城河 / 就绪度 KPI

Fireworks 竞争位置的指标。

KPI 综合了基准测试和融资证据;速度比为 Fireworks 吞吐量除以 Groq 吞吐量。

[CP014, CP015, CP028]

3.6 图表

Chapter 04

04财务

4.1 收入流与定价模型

Fireworks 采用按使用量计费的 B2B 模式,叠加在映射客户生命周期的多个产品表面上。无服务器推理按 token 计费,微调按训练 token 计费,强化微调按 GPU-hour 计费,按需专用部署按 GPU-second 或 GPU-hour 计费;预留容量则按更长期承诺单独签约、议价定价。这让 Fireworks 几乎能在客户 AI 工作流的每个阶段捕获收入,从实验到规模化生产。公开无服务器费率展示了模型:Llama 3.3 70B 每百万 tokens 约 $0.90,8B 模型 $0.20,DeepSeek V3 $0.50;图像生成每张约 $0.013 至 $0.04,预留容量每 replica 每小时约 $4.80。收入结构未披露,但分析师预计收入会从商品化无服务器 token 量,转向价值更高的专用部署、微调和企业合同;这将随时间改善毛利率和收入耐久性。[CI001, CI002, CI003, CI004, CI005]

收入流表
收入流计费基础生命周期阶段利润率特征披露
无服务器推理按 token实验与生产较低(商品化)费率公开
微调(LoRA/SFT)按训练 token适配较高费率公开
强化微调按 GPU 小时适配 / 智能体较高费率公开
按需专用实例按 GPU 秒 / 小时生产扩容较高费率公开
预留容量合同承诺规模化企业最高(协商)未公开
语音 / 多模态按用量扩张混合部分公开

收入流和计费口径来自 Sacra 与 Fireworks 定价;各收入流占比未披露,利润率画像只能定性判断。

[CI001, CI002, CI003]
定价 / 变现表
项目价格单位备注
Llama 3.3 70B$0.90每 1M tokens比 Together 高约 2%,比 Bedrock 低 66%
Llama 3.3 8B$0.20每 1M tokens入门工作负载
DeepSeek V3$0.50每 1M tokens前沿开放模型
Flux 1.1 Pro$0.04每张图像最高 1024x1024
SDXL 1.0$0.013每张图像成本更低的图像生成
预留容量$4.80每小时每副本约 50 个并发请求
LoRA 微调(70B)$16每 1M 训练 tokens比 Together 高 $2/M
免费额度$1一次性无持续免费层

TokenMix 与 DeployBase 给出的 2026 年 4 月无服务器指示性费率;价格变动频繁,且不含协商后的企业条款。

[CI004, CI005, CI007]
FI001: 收入模型桥

基于用量的收入流如何沿客户生命周期汇聚为总收入。

各收入流占比仅为示意;Fireworks 未披露收入结构。

[CI001, CI002, CI003]

4.2 市场进入与销售效率

Fireworks 的 GTM 入口自下而上,扩张自上而下。开发者用自助 API key 和按量付费即可立即开始;支持来自 $1 免费额度,而非持续免费层,标准速率限制约为每分钟 600 次请求。更大客户会升级为议价企业关系,获得更高速率限制、预留容量、客户管理、定制优化和私有部署。其上叠加现场与合作伙伴销售动作,由 AWS Strategic Collaboration Agreement 锚定:该协议资助概念验证和初创加速项目,让 Fireworks 通过既有采购渠道接触企业买家,而不是要求客户做独立供应商评估。CAC、回本期和净收入留存等销售效率指标未披露;但 先落地、再扩张的结构是主要效率杠杆,一个无服务器功能可增长为专用、微调、语音和预留容量支出。按 10,000 家以上公司粗算,混合年化每家公司收入估计接近 $28,000,但基数偏向少数大型生产部署。[CI006, CI007, CI008, CI009, CI010]

单位经济表
指标数值 / 状态驱动因素置信度
毛利率~50%GPU COGS 占比高
目标毛利率60%利用率 + Blackwell + 组合
混合 ARPA~$28K/yr10,000+ 家公司
收入集中度向大型部署倾斜生产级巨鲸客户
Multi-LoRA 利用率每个基础模型挂载多种变体单变体成本更低
CAC / 回本期未披露自下而上 + 伙伴销售
净收入留存未披露先落地、再扩张

单位经济数据来自 Sacra 估算或定性判断;CAC、回本期和 NRR 均未公开。

[CI008, CI009, CI011, CI012, CI013]

4.3 成本结构与毛利率驱动

Fireworks 不是纯软件业务:GPU 采购、容量规划和区域基础设施都是计入 COGS 的真实成本输入,因此 Sacra 估计毛利率接近 50%,远低于订阅软件常见的 70% 以上。管理层告诉投资人,公司目标是通过更高 GPU 利用率、NVIDIA Blackwell 等新架构带来的硬件效率提升,以及收入结构向专用和企业工作负载转移,把毛利率推到 60%。核心经济逻辑是,自研推理优化 FireAttention 和 FireOptimizer 能把工程能力转成定价权:如果 Fireworks 比客户自托管更快、更高吞吐地服务模型,就可以在低于替代方案总成本的同时收取溢价。Multi-LoRA 把许多微调变体整合到单个基础模型部署上,降低每个变体的计算成本。成本环境受 NVIDIA 和 AMD 数据中心 GPU 经济性塑造;两家公司都报告 AI 加速器收入快速增长,说明 Fireworks 的投入成本处在供应商驱动、容量受限的市场里。[CI011, CI012, CI013, CI014, CI015, CI016]

FI002: 单位经济桥

GPU 成本如何借助专有优化和定价权转化为毛利率。

[CI011, CI012, CI013, CI014]

4.4 公开牵引与私有指标缺口

公开牵引信号很强,但时间口径不一致。Fireworks 称其 2025 年 10 月 Series C 时年化收入超过 $280M;Sacra 估算 2025 年底约 $305M,到 2026 年 5 月年化约 $800M;第三方画像称 2026 年初超过 $315M;2025 年更早报道则称 $130M ARR,并称公司盈利且同比增长约 20 倍。平台每天处理超过 10 万亿 tokens(2026 年初为 15 万亿),覆盖 10,000 多家公司和数十万开发者。这些大多是公司披露或估算数字;经审计财务、收入结构、净收入留存、流失率和员工数均未公开。十二个月内收入区间从 $130M 到约 $800M 年化,既反映真实超高速增长,也反映测量口径不一致;任何单一数字都应视为方向性而非已验证。[CI017, CI018, CI019, CI020, CI021]

公开财务缺口表
指标公开状态缺什么尽调路径
收入 / ARR估算相互冲突单一、可对账且带日期的口径管理层确认 ARR
毛利率分析师估算约 50%经审计毛利率确认性财务资料
净收入留存未披露扩张 / 流失数据Cohort 留存包
员工数未披露员工数量HR / LinkedIn 估算
烧钱与跑道未披露现金流量表银行余额 + 烧钱
收入结构未披露按收入流拆分产品收入拆分

所列指标均为私有信息;本表界定了验证财务质量所需的尽调问题。

[CI017, CI018, CI020, CI028, CI029]
FI003: 财务估算区间

不同来源和时间点对 Fireworks 年化收入的估算,单位为 USD millions。

估算覆盖公司表述和不同时间点的第三方分析师数据;区间近似对应其给出的点估计。

[CI017, CI018, CI019]

4.5 资本充足性与融资依赖

Fireworks 已通过种子轮、Series A、B、C 融资超过 $327M;仅 2025 年 10 月的 Series C 就提供 $250M,其中约 $230M 为一级资本、$20M 为二级交易,估值 $4B。这笔一级注资叠加 2025 年据称盈利和高增长收入基础,说明近期资本充足性较舒服;但现金余额、烧钱速度和跑道未披露。公司已表示未来一年会将计算足迹扩大三到四倍,这是资本密集计划,会提高对 GPU 获取的依赖,也可能成为下一轮融资触发因素;Sacra 称截至 2026 年 5 月,Fireworks 正洽谈以 $15B 估值再次融资。主要融资依赖是 GPU 供给:Fireworks 不拥有机群,而是从第三方采购 NVIDIA 和 AMD 容量,暴露于配额约束以及 NVIDIA 自身进入推理的风险。未披露公开债务或项目融资义务。[CI022, CI023, CI024, CI025, CI026]

资本充足性表
项目数值 / 状态截至备注
累计融资>$327MOct 2025从种子轮到 Series C
Series C 规模$250MOct 2025$230M 主融资 + $20M 二级交易
估值$4.0BOct 2025投后
盈利能力据称已盈利Mid-2025据 scroll.media;未验证
现金 / 烧钱 / 跑道未披露2026尽调阻塞项
计划资金用途算力扩张 3-4x明年资本密集
下一轮信号$15B 融资洽谈May 2026据 Sacra;未确认
债务 / 项目融资未披露2026无公开义务

资本数据来自公司与 Sacra;现金、烧钱和跑道未公开,资本充足性评估因此受限。

[CI022, CI023, CI024, CI025]
FI004: 资本强度 / 现金流图谱

资本如何流入算力和基础设施,再回流为收入和利润率。

流向综合了披露的资金用途和分析师估算;现金和烧钱速度未披露。

[CI022, CI024, CI025, CI026]

4.6 财务结论

收入质量上,Fireworks 展现可信的超高速增长,按使用量计费的模型可覆盖客户生命周期支出;但缺少审计数据、收入结构和留存指标,限制了信心。毛利率上,约 50% 的毛利率是核心财务弱点:GPU 成本使其结构性低于软件常态,通向公司所称 60% 目标的路径依赖利用率提升和结构转移,合理但尚未证实。资本强度上,三到四倍计算扩张加上缺少自有 GPU,使这个模式比典型 SaaS 更吃资本、更依赖供给。主要尽调阻塞点包括一套可调和的收入数字、毛利率和单位经济验证、烧钱与跑道,以及净收入留存。整体图景是一家快速扩张、资金充足、有真实但受供应商暴露影响的经济性的公司,而不是已经证明的高毛利软件复利机器。[CI027, CI028, CI029, CI030]

4.7 图表

Chapter 05

05产品与技术

5.1 以客户工作流定义产品

从客户角度看,Fireworks 是把开源模型带入生产的那一层:跑得快、便宜、可靠,客户无需管理 GPU。开发者注册后,把 OpenAI 兼容 API 指向 Llama 4、DeepSeek 或 Qwen 等模型,即可获得低延迟推理、函数调用、JSON 模式结构化输出和流式传输。随着用量增长,同一客户可以用专有数据微调模型,迁移到专用或预留 GPU 容量以获得有保障的吞吐,加入用于 RAG 的检索和 embeddings,并部署语音智能体。平台横跨文本、图像(Flux、SDXL)、音频和多模态格式,覆盖数百个模型,并对主要新版本提供首日支持。它为客户完成的核心工作,是把「能在 notebook 里跑」的模型,与「能在生产环境服务数百万用户」的模型之间的缺口压平;Fireworks 将此定位为实验与发货的区别。这也是客户把它称为推理引擎而非应用的原因:它提供速度、成本和控制,产品由客户自己构建。[CE001, CE002, CE003, CE004, CE005]

工作流 / 用例表
用例客户案例结果来源类型
代码生成CursorFast Apply 约 1,000 tokens/sec客户故事
生产力 AINotion延迟从 2s 降至 350ms客户故事
代码辅助Sourcegraph延迟降低 30%,采纳率提升 2.5x客户 / AWS
提案起草Upwork(Uma)实时生成定制提案客户故事
对话式搜索Quora(Poe)响应速度提升至 3 倍报道
邮件助手SuperhumanAsk AI 复合系统客户故事
企业搜索Hebbia快速接入新的开放模型分析师

用例和效果来自 Fireworks 客户故事、AWS 案例研究和分析师报道;结果由厂商或客户报告。

[CE002, CE018, CE019, CE020]
FE002: 客户工作流 / 运行流

开发者从 API 调用,经推测解码,到获得响应的路径。

[CE001, CE013, CE015]

5.2 产品模块与资产版图

Fireworks 的产品表面可拆成几个模块。无服务器推理是入口产品:按 token 付费访问 50 多个活跃服务模型(目录中有数百个),包括 Llama 4 Scout 和 Maverick、DeepSeek V3、Qwen 3、Mixtral、Gemma 3 和 Phi-4,并通过 Flux、SDXL 和视觉模型生成图像。FireFunction 是自研函数调用模型族,用于工具使用和结构化输出。定制模块包括 LoRA 微调、带量化感知训练的 Supervised Fine-Tuning V2,以及面向智能体任务的 Reinforcement Fine-Tuning,均通过 Build SDK 和 Experiment Platform 暴露。部署模块覆盖无服务器、按需专用和预留容量,并提供 multi-LoRA 托管,把许多微调 adapter 打包到一个基础部署上。更新的表面包括 Voice Agent Platform,它将转写、语言模型和工具调用共址,以实现低于 500ms 响应;以及 BYOB 安全训练,让企业从自己的 AWS S3 bucket 训练。合在一起,这些模块让单个客户关系能从一个无服务器功能扩展为完整生产 AI 运行时。[CE006, CE007, CE008, CE009, CE010]

产品模块 / 资产矩阵
模块功能计费成熟度
无服务器推理按 token 调用 50+ 个托管模型按 token正式可用
FireFunction函数调用 / 结构化输出按 token正式可用
LoRA 微调 / SFT V2用 QAT 定制模型按训练 token正式可用
强化微调训练智能体以超过闭源模型按 GPU-hour正式可用
专用 / 预留部署在专用 GPU 上保障吞吐按 GPU-hour正式可用
Multi-LoRA 托管一个基础模型挂载多个 adapter按 token正式可用
Voice Agent Platform 产品线STT + LLM + 工具调用,低于 500ms按用量较新
Build SDK / Experiment Platform 开发工具以编程方式构建、调优、评估已包含较新

模块清单汇总自 Fireworks 博客和文档;成熟度为定性判断(正式可用 = 已普遍开放,较新 = 近期推出)。

[CE006, CE007, CE008, CE009]

5.3 架构与运营模式

Fireworks 在通用 NVIDIA GPU 上运行自研多层推理栈。内核层,FireAttention 是定制 CUDA attention 实现;Fireworks 称其显著快于 vLLM 和 TensorRT-LLM,并在多个版本中扩展,以支持长上下文和 Llama 4 chunked local attention 等架构。其上,FireOptimizer 执行自适应 speculative execution,针对每个工作负载个性化 speculative decoding、draft-model 选择和缓存;公司称生产中延迟最高可降低约 3 倍,并在 NVIDIA Blackwell B200 硬件上原生支持 FP4。服务拓扑结合无状态请求路由器、用于 speculative decoding 的 draft 和 target GPU pods、分布式 KV cache、连续 batching 和 disaggregated serving,扩展到文档记录的约每分钟 50,000 次请求测试。Multi-LoRA 将许多微调变体整合到单个基础模型上。运营模式对开放模型中立:Fireworks 赌的是运行任一时点胜出的开放模型,而不是押注单个模型;因此对新版本首日支持成为核心工程纪律。[CE011, CE012, CE013, CE014, CE015, CE016]

技术 / 运营架构表
层级组件功能差异化
APIOpenAI 兼容 API模型访问、流式输出、JSON mode切换成本低
编排无状态请求路由器在 pods 间路由请求扩展至约 50K RPM
优化FireOptimizer自适应推测执行延迟最高降低约 3x
推测草稿 + 目标 pods推测解码并行生成 token
内核FireAttention自研 CUDA attention快于 vLLM / TensorRT-LLM
内存分布式 KV cache复用上下文,压低 prefill长上下文延迟更低
适配Multi-LoRA每个基础模型挂载多个 adapter提高 GPU 利用率
硬件NVIDIA / AMD GPUs(含 B200)算力底座,FP4新硅片首日支持

架构汇总自 Fireworks 博客 / 文档和独立技术文章;性能主张由厂商或分析师报告。

[CE011, CE012, CE013, CE014, CE015]
FE001: 产品架构图

Fireworks 推理栈的分层结构,从 API 一直到 GPU 硬件。

分层综合自 Fireworks 博客 / 文档和独立架构文章。

[CE011, CE012, CE013, CE014]

5.4 部署、可靠性、集成与路线图

Fireworks 支持跨全球多区域机群的无服务器、按需专用和预留部署,文档位置包括 Frankfurt、Iceland、Tokyo,以及美国、欧洲和 APAC 区域,可满足延迟和数据驻留需求。OpenAI 兼容 API 加上 LangChain、LlamaIndex 等框架的 SDK 和连接器降低了集成难度,从封闭 API 迁移可在数分钟内完成。可靠性是核心主张:独立监测显示,2026 年 Q1 可用性为 99.8%,在专业提供商中最高,且负载下稳定性强。记录在案的生产结果包括:Cursor 在代码生成中达到约每秒 1,000 tokens;Notion 将 AI 响应延迟从约 2 秒降至 350 毫秒;Sourcegraph 延迟降低 30%,completion acceptance 提高 2.5 倍。Series C 资助的路线图瞄准更深的调优与推理对齐研究、端到端模型生命周期工具链,以及全球计算扩大三到四倍;收购 Hathora 则用于加深实时编排。[CE017, CE018, CE019, CE020, CE021, CE022]

路线图 / 发布 / 开发阶段表
项目阶段时间含义
FireAttention(v2+)已发布2024+长上下文速度
FireFunction V2已发布2024函数调用
FireOptimizer已发布2024自适应优化
Supervised Fine-Tuning V2已发布Jun 2025QAT,更多模型
强化微调已发布2025智能体调优
Voice Agent Platform 产品线已发布2025-2026新预算类别
Microsoft Foundry 发布已发布Mar 2026Azure 分发
模型生命周期工具链计划中2026+端到端创建
3-4x 算力扩张计划中2026容量扩容

发布时间线来自 Fireworks 博客、文档更新日志和分析师报道;计划中项目是公司披露的 路线图意图。

[CE008, CE021, CE022]
FE004: 产品成熟度 / 能力图谱

各模块在能力维度上的成熟度。

成熟度单元格为作者定性判断,综合了产品和合规证据。

[CE006, CE008, CE029, CE031]

5.5 差异化、IP 与数据

Fireworks 的差异化由工程驱动。核心知识产权是自研推理引擎,尤其是 FireAttention 的定制 kernels 和 FireOptimizer 的自适应优化,把创始团队 PyTorch 背景中的系统能力转化为可衡量的速度和成本优势;没有公开专利列出,因此护城河是 know-how,而不是注册 IP。第二个差异化来源是产品—模型协同设计:客户交互形成数据反馈环,持续改进微调模型;Fireworks 将其描述为企业用 AI 构建竞争护城河的方式。第三是广度和新鲜度:数百个开放模型和模态的首日支持,让平台受益于模型更替,而不是被它威胁。主要脆弱点在于,优化优势建立在 vLLM 和 SGLang 等持续进步的开源框架之上,因此差异化必须不断重新赚回来。获取前沿 NVIDIA 和 AMD GPU 的供给也是助推优势,但并非独占优势。[CE023, CE024, CE025, CE026, CE027]

FE003: 关键依赖图

Fireworks 平台依赖的上游要素。

依赖图综合自技术来源;边的方向表示从上游到平台的依赖。

[CE015, CE024, CE026, CE027]

5.6 信任、安全、安保与合规

Fireworks 的企业姿态为受监管买家而建。平台默认零数据留存,提供单点登录、审计日志和数据驻留控制;其基于 AWS 的推理解决方案符合 HIPAA 和 SOC2 Type II。最敏感的工作负载可使用 airgapped EKS 部署,以及 bring-your-own-bucket 安全训练,把训练数据留在客户自己的 AWS S3 中。JSON mode 和 grammar-constrained decoding 等结构化输出控制提升可靠性,减少智能体工作流中的畸形响应;FireFunction 的高 schema-compliance 率支持可靠工具使用。这些能力打开了医疗、金融服务和政府相邻工作负载等受监管垂直,此前这些领域很难由独立推理供应商触达。产品—模型协同设计循环中的持续评估和强化学习进一步强化质量。缺口仍在:Fireworks 未发布正式标准层 SLA,企业 SLA 逐案议价;独立评测者指出部分文档薄弱,这些都是安全敏感买家的尽调事项。[CE028, CE029, CE030, CE031, CE032]

信任 / 质量 / 合规表
控制项状态范围备注
SOC2 Type II合规基于 AWS 的推理据 AWS 案例研究
HIPAA合规基于 AWS 的推理支持医疗场景
零数据留存默认企业隐私姿态
SSO / 审计日志可用企业治理
数据驻留可用多区域Frankfurt / Iceland / Tokyo
Airgapped EKS可用敏感工作负载隔离
BYOB 安全训练可用SFT / RFT客户 AWS S3
标准层 SLA未发布无服务器企业客户协商

合规姿态来自 AWS 案例研究和 Sacra;未发布标准 SLA 是一个尽调项。

[CE028, CE029, CE030, CE032]

5.7 图表

Chapter 06

06客户

6.1 客户基数分层

Fireworks 的客户基数横跨三大类,区别在于买家、用户、付款方和采用路径。AI 原生初创公司,包括 Cursor、Perplexity、Liner 和 Cresta,自下而上采用:个体开发者从自助 API key 开始,经济买家是工程或平台负责人。数字原生企业,如 DoorDash、Notion、Shopify、Upwork 和 Quora,把功能从试点推入生产,并扩展到专用部署和微调,预算由产品工程组织掌握。Samsung 和 Uber 代表的传统与更大型企业,以及医疗和金融服务中越来越多的受监管买家,通过需要合规和数据驻留控制的议价合同自上而下采用。三类客户中,用户是开发者,付款来自工程或采购预算,用例集中在代码辅助、对话式 AI、企业搜索、智能体工作流和语音。地域上,客户基数偏北美和欧洲,但 API 全球可用;垂直覆盖软件、电商、市场平台、客服和法律科技。[CU001, CU002, CU003, CU004, CU005]

客户分层表
客群示例客户购买方 / 付款方用例采用路径
AI 原生初创公司Cursor, Perplexity, Liner, Cresta工程负责人 / 工程预算代码、搜索、对话式 AI自下而上自助采用
数字原生企业DoorDash、Notion、Shopify、Upwork、Quora 等客户产品工程 / 工程预算生产级 AI 功能从试点到生产
大型 / 受监管企业Samsung, Uber平台 + 采购企业 AI 路线图自上而下签约
企业搜索 / 智能体Sourcegraph, Hebbia工程负责人 / 工程预算代码 + 企业搜索先落地再扩张
通信 / 生产力Superhuman产品负责人复合式 AI 助手功能牵引

客群和示例客户来自 Fireworks 博客、Sacra 和 AI Market Watch;分层边界是分析判断, 部分客户横跨多个客群。

[CU001, CU002, CU003, CU004]
FU001: 客户旅程图

客户从发现产品到企业标准化部署会经过的阶段。

旅程综合自 Fireworks 的 go-to-market 描述;并非所有客户都会走完每个阶段。

[CU002, CU009, CU022]

6.2 采用轨迹

采用曲线陡峭上行。Fireworks 在 2025 年 10 月 Series C 时称服务超过 10,000 家公司,较 Series B 时约 1,000 家增长约 10 倍,并覆盖数十万开发者。开发者基数从 2024 年 2 月约 12,000 人增至当年年底 23,000 人。使用强度高:平台每天处理超过 10 万亿 tokens,到 2026 年初升至约 15 万亿,说明许多账户跑的是生产负载,而不是实验负载。客户沿 land-and-expand 路径推进:先用无服务器推理服务单个功能,再扩展到专用部署、微调、强化微调、用于检索的 embeddings 和语音智能体。Hebbia 的分析师评论说明,单个推理关系只要锚定新开放模型的快速访问和高并发延迟保证,就可能增长为更广的基础设施依赖。这条轨迹在广度和用量上很强,但账户级留存和 cohort 扩张数据未披露。[CU006, CU007, CU008, CU009, CU010]

客户增长 / 采用轨迹表
指标数值截至来源依据
服务公司数~1,000Series B 轮(2024)公司披露
服务公司数10,000+Oct 2025公司披露
开发者~12,000Feb 2024报道
开发者~23,000Dec 2024报道
开发者数十万Oct 2025公司披露
每日 tokens(Oct 2025)10T+Oct 2025公司披露
每日 tokens(2026 年初)~15TEarly 2026第三方资料

轨迹数据来自公司披露或第三方资料;增长很快,但账户级留存没有披露。

[CU006, CU007, CU008]
FU002: 采用 / 部署漏斗

从开发者注册到标准化企业账户的相对收窄。

漏斗数值为示意性相对权重;Fireworks 未披露转化率。

[CU006, CU007, CU009]

6.3 具名客户证明

Fireworks 对一家如此年轻的公司而言,拥有异常强的具名、生产级证明点。Cursor 使用 Fireworks 的 speculative-decoding API,在 Fast Apply 代码生成中达到约每秒 1,000 tokens;一位 AI 研究员公开表示 Fireworks「性能远强于开源引擎」,并已用于生产。Notion 通过 Fireworks 微调,将 AI 响应延迟从约 2 秒降至 350 毫秒,这一结果由其 AI 工程负责人归因。Sourcegraph 将延迟降低 30%,completion acceptance 提高 2.5 倍;Upwork 的 “Uma” assistant 在 Fireworks 上实时起草提案。Quora 的 Poe 聊天机器人响应速度提高三倍,Superhuman 在该平台上构建 Ask AI 复合系统。这些大多是生产部署,带有具名高管和量化结果,因此客户背书基底质量高、新鲜度也合理;不过几项案例研究来自 2024 年,少数标识只出现在汇总营销列表中,没有独立案例研究。[CU011, CU012, CU013, CU014, CU015, CU016]

具名客户验证表
客户部署结果引用质量新鲜度
Cursor生产环境Fast Apply 约 1,000 tok/sec;具名研究员引用高(引用 + 指标)2024-2025
Notion生产环境延迟从 2s 降至 350ms;具名高管引用高(引用 + 指标)2025
Sourcegraph生产环境延迟降低 30%,采纳率提高 2.5x高(AWS + 案例)2024
Upwork生产环境Uma 实时报价;具名高管高(引用)2025
Quora (Poe)生产环境响应速度提高三倍中(报道)2024
Superhuman生产环境Ask AI 复合系统中(案例)2024
Samsung企业AI 路线图提速中(投资者引用)2025
DoorDash生产环境高吞吐 AI 功能中(标识 + AWS)2025

具名案例多为生产部署,并带量化结果;部分案例停留在 2024 年,少数标识只出现在汇总名单里, 因此覆盖不完整。

[CU011, CU012, CU013, CU014, CU015]
FU003: 客户证明矩阵

从部署状态、量化成效和具名归因看客户引用质量。

单元格综合了客户案例和 AWS case study 中的证据质量。

[CU011, CU016, CU031]

6.4 留存与耐久性

留存是客户叙事里证据最弱的一环。Fireworks 未披露净收入留存、总留存、流失率、续约率或合同期限,耐久性只能靠结构性信号推断,不能直接度量。正面信号确实存在:平台的 land-and-expand 设计、多产品界面和企业控制能力会鼓励扩张,蓝筹客户已经在跑生产工作负载,OpenAI 兼容 API 加上可靠性领先,也会降低接入后的离开理由。负面信号同样真实:同一个 OpenAI 兼容 API,加上路由聚合器兴起,让多栖和切换变得很容易;推理正在商品化,较 Together 近乎贴身的价格差也限制了靠价格建立黏性。独立评测者明确指出了替代方案和切换路径。综合判断是,产品深度和集成可能支撑耐久性,但公开留存指标尚未给出证据,这是重要尽调缺口。[CU017, CU018, CU019, CU020, CU021]

留存 / 重复使用 / 满意度表
维度状态信号置信度
净收入留存未披露先落地再扩张结构
总留存 / 流失未披露无公开数据
合同期限未披露企业谈判
重复使用高(隐含)生产环境 10T+ tokens/day
满意度正向(轶事证据)具名高管证言
切换风险偏高OpenAI 兼容 API + 路由器

留存指标未披露;正面信号来自结构和轶事证据,而低锁定度抬高了切换风险。

[CU017, CU018, CU019, CU020]
FU004: 留存 / 重复 cohort

按客户细分看定性留存信号(未披露量化指标)。

Cohort 单元格为作者定性判断;Fireworks 未披露量化 cohort 留存。

[CU017, CU019, CU021]

6.5 扩张与集中度风险

Fireworks 的增长引擎是 land-and-expand:一个 serverless 功能可以扩展成专用部署、微调、语音和预留容量支出;AWS Strategic Collaboration Agreement 又借现有采购渠道触达买家。主要集中度风险有两类。第一,收入很可能偏向少数大型生产部署,因此按公司数折算、约 $28,000 的年化收入会低估少数大客户之下可能存在的长尾;头部客户身份和收入占比未披露,头部客户风险无法量化。第二,分发和伙伴依赖真实存在:AWS 联盟和 Microsoft Foundry 上架会加速增长,但也是渠道依赖;若干标杆客户(如 DoorDash 和 Shopify)本身也是成熟买家,有能力多栖或自建。云市场上架让采购摩擦低于封闭 API,但企业销售周期和合规审查仍然卡住最大交易。[CU022, CU023, CU024, CU025, CU026]

扩张与集中风险表
因素方向细节尽调问题
先落地再扩张正向Serverless -> 专用部署 / 调优 / 语音衡量扩张收入占比
混合 ARPA中性全客群约 ~$28K/yr获取 ARPA 分布
头部客户集中度风险收入偏向大型部署披露前 10 大客户收入占比
渠道依赖风险AWS + Microsoft Foundry 渠道评估直销与伙伴来源组合
客户多供应商部署风险成熟买家可多供应商部署核查单一供应商承诺
采购摩擦中性借助云市场降低梳理企业销售周期长度

集中度和渠道风险来自分析师评论以及 AWS/Azure 合作关系推断;头部客户收入占比未披露。

[CU022, CU023, CU024, CU025]

6.6 展示材料

Chapter 07

07风险

7.1 按严重程度排序的风险概览

Fireworks 是一家高速扩张、资金充足的公司,主要风险来自商业和结构,而不是迫在眉睫的法律或运营失误。最高严重度风险包括推理商品化和毛利率压缩、可能拿走推理层的超大云厂商捆绑,以及对 NVIDIA 硬件供给的依赖;NVIDIA 同时是供应商、投资方,并且通过收购 Lepton 和 NIM 打包成为竞争者。中等严重度风险包括计划把算力扩张 3–4 倍带来的资本强度、CEO Lin Qiao 的关键人集中、压制留存的低切换成本,以及估值从 $552 million 冲到 $4 billion、传闻再到 $15 billion 的激进爬坡。较低但不可忽视的风险包括 EU AI Act 和 GDPR 带来的监管成本、开放模型许可约束、无注册专利、未披露 burn 和 runway,以及对 AWS 和 Microsoft 分发渠道的依赖。缓释逻辑在各类风险中一致:在 serving 层商品化前,更快向调优、agent、语音和企业治理上移,同时分散芯片来源,并接入既有采购渠道。剩余敞口仍然不小,因为多项缓释尚未被证明,若干关键指标也未披露。[CR001, CR002, CR003, CR004, CR005, CR006]

风险热力图摘要
风险可能性影响缓释成熟度剩余暴露
推理商品化 / 利润率
超大云厂商捆绑
NVIDIA 供应商兼竞争者
资本密集度 / 烧钱
关键人集中
低切换成本 / 流失
估值上行
监管(EU AI Act/GDPR)

严重度评级是作者综合分析师、评价和申报材料后的定性判断;剩余暴露反映缓释成熟度。

[CR001, CR002, CR003, CR004, CR025]
FR001: 风险热力图

主要风险类别的发生概率、影响和剩余敞口。

单元格为作者定性判断,综合了分析师、评价和备案证据。

[CR001, CR002, CR003, CR007]

7.2 监管和法律风险

Fireworks 的监管和法律敞口真实存在,但目前仍可管理。最重要的制度是 EU AI Act,它对通用和 foundation-model 提供商及其部署方施加分层、基于风险的义务,包括透明度和文档要求;Fireworks 作为推理和微调平台,位于欧盟客户的合规链条上。GDPR 和数据驻留要求推动公司提供零数据留存、数据驻留和区域部署功能,任何失误都会带来罚款和声誉成本。开放模型许可是更隐蔽的法律风险:Llama 等模型带有可接受使用和许可条款,行业关于训练数据版权的未决问题也可能传导到服务这些模型的平台。知识产权敞口也存在反向问题:Fireworks 没有列出公开专利,因此 FireAttention 和 FireOptimizer 的优势依赖商业秘密和 know-how;关键工程师离开后更难防守。公开信息中没有针对 Fireworks 的重大诉讼或执法行动,其 Series C 也由顶级法律顾问执行,但公司卖向医疗、金融服务和准政府行业后,监管表面积会扩大。[CR007, CR008, CR009, CR010, CR011, CR012]

监管 / 法律风险台账
风险制度 / 来源可能性影响缓释措施
EU AI Act 义务EU AI Act(GPAI / 部署方义务)合规 + 文档
数据隐私 / GDPRGDPR / 数据驻留零留存、欧盟区域
开放模型许可Llama / 模型许可许可合规、模型中立
训练数据版权外溢行业 IP 不确定性服务第三方模型
IP 防御力无注册专利商业秘密保护
行业合规扩张HIPAA / 金融 / 政府SOC2/HIPAA 状态
诉讼 / 执法公开未见顶级法律顾问

监管台账;公开信息中未见 Fireworks 面临重大诉讼,若干事项取决于行业和司法辖区, 因此覆盖不完整。

[CR007, CR008, CR009, CR010, CR011]

7.3 运营、质量和安全风险

运营上,Fireworks 最核心的敞口是 GPU 供给。公司不拥有自己的集群,而是从第三方采购 NVIDIA 和 AMD 容量;随着算力扩张 3–4 倍,它会暴露在配额约束、供应瓶颈和硬件换代时点风险中。实测数据上,可靠性是强项,2026 年 Q1 独立监测 uptime 为 99.8%;但 Fireworks 未发布正式的标准层 SLA,因此合同可靠性承诺逐案谈判,事故历史也不透明。跨 Frankfurt、Iceland、Tokyo 以及 US、EU、APAC 区域运营全球多区域集群,会增加运营复杂度和成本。安全和合规姿态相对较强:基于 AWS 的推理具备 SOC2 Type II 和 HIPAA,支持零数据留存、airgapped EKS 和 bring-your-own-bucket 训练,且公开信息中没有已知数据泄露;即便如此,客户工作负载是生产级且延迟敏感,单次严重宕机或数据事件都会尤其伤。评测者还指出文档偏薄、公司扩张时支持可能吃紧;这些是质量风险,不是安全风险。[CR013, CR014, CR015, CR016, CR017, CR018]

7.4 合作伙伴和依赖风险

Fireworks 处在一张密集的依赖网络里。最尖锐的是 NVIDIA:Fireworks 的性能和毛利率主张依赖 NVIDIA 供应的领先 GPU,NVIDIA 持有投资份额,如今又通过收购 Lepton、GPU-cloud marketplace 和 NIM 打包直接竞争。AWS 和 Microsoft 既是伙伴也是威胁:Strategic Collaboration Agreement 和 Foundry 可用性提供分发,但 Bedrock、Vertex 和 Azure 可以把推理捆进既有安全、计费和治理关系,吸收这一品类。Fireworks 还依赖 Meta、DeepSeek、Alibaba 等持续发布开放模型并保持许可宽松;如果开放模型质量放缓,或许可转向限制性条款,开放模型中立的论点会被削弱。云平台依赖、少数 late-stage 基金带来的资本提供方集中,以及成熟买家可多栖造成的关键客户集中,共同补全依赖图谱。共同主线是:Fireworks 的赋能伙伴也是最可信的竞争者,因此伙伴深度和供应商多元化是风险图景的核心。[CR019, CR020, CR021, CR022, CR023, CR024]

伙伴 / 依赖风险台账
依赖方角色风险严重度
NVIDIAGPU 供应商 + 投资者 + 竞争者分配、供应商兼对手
AMD替代芯片供应商生态成熟度较低
AWS云 + 渠道伙伴借 Bedrock 捆绑
MicrosoftFoundry 分发借 Azure 捆绑
开放模型实验室Meta / DeepSeek / Alibaba模型供给与许可
后期投资者资本提供方融资集中
关键客户成熟买家多供应商部署 / 内部自建

依赖台账;反复出现的主线是,支撑 Fireworks 的伙伴也是它最可信的竞争者。

[CR019, CR020, CR021, CR022, CR023]
FR003: 依赖图

关键外部依赖及其失效路径。

依赖边表示上游依赖;NVIDIA、AWS 和 Azure 同时是合作伙伴和竞争者。

[CR019, CR020, CR021, CR022]

7.5 财务、模型和执行风险

财务上,核心风险是毛利率压缩。约 50% 的毛利率在结构上低于软件常态,因为 GPU 成本进入 COGS;相较 Together 几乎贴身的价格差,加上开源 serving 框架持续改进,都会形成长期下行压力。公司声称走向 60% 的路径,取决于尚未被证明的利用率提升和收入结构变化。资本强度进一步放大问题:3–4 倍算力扩张需要反复投入容量,burn、runway 和净收入留存未披露,因此资本充足性更多是声称而非验证。从 $552 million 到 $4 billion、约十五个月内的估值爬坡,再加上 $15 billion 传闻,嵌入了激进增长预期;任何放缓或毛利率失望都会受到惩罚。执行和人员方面,创始团队的 PyTorch 背景是优势,但关键人风险集中在 CEO Lin Qiao;在火热市场里留住顶尖推理工程师仍是持续挑战。所有这些风险的缓释逻辑都是同一个向上游堆栈多元化,但能否成功正是投资的核心未解题。[CR025, CR026, CR027, CR028, CR029, CR030]

人员 / 执行风险台账
风险细节可能性影响
关键人集中CEO Lin Qiao 主导愿景和融资
创始人与工程师留存推理人才抢手,顶尖人才更稀缺
组织扩张人员和 GTM 快速搭建
路线图执行向上游应用层扩张仍未验证
治理不透明董事会构成未披露

人员与执行风险来自创始人集中度和路线图野心;员工数和董事会细节未披露。

[CR029, CR030, CR033]

7.6 缓释、监控与论点失效触发器

Fireworks 的缓释路径是连贯的:把堆栈延伸到微调、强化学习、语音和企业治理,逃离商品化 serving;在 NVIDIA 和 AMD 之间分散芯片,并追求 Blackwell 效率;维持 day-zero 开放模型支持,让模型更替成为顺风;强化企业合规,拿下受监管行业;接入 AWS 和 Azure 采购,而不是正面硬拼。真正该监控的指标是毛利率向 60% 的轨迹、收入结构向专用和企业迁移、披露后的净收入留存、GPU 成本和配额条款,以及相较 vLLM 和 SGLang 的竞争差距。最清晰的论点失效触发器包括:毛利率无法从 ~50% 抬升或进一步压缩;超大云厂商或 NVIDIA 拿走推理层,把 Fireworks 降格为优化 add-on;关键人离开;或增长低于 $4 billion-plus 估值隐含的速度。优先尽调问题是对齐后的收入和毛利率数字、NRR 和 burn、GPU 供给合同,以及头部客户集中度。总体剩余敞口中高,主要集中在商品化和依赖风险,而非法律或运营失败。[CR031, CR032, CR033, CR034, CR035, CR036]

缓释措施与否决标准表
风险缓释措施监测指标论点失效触发项
商品化向调优、Agent、语音上移收入结构变化利润率卡在或跌破 ~50%
超大云厂商捆绑接入 AWS/Azure 渠道直销与伙伴占比推理被 Bedrock/Azure 吸收
NVIDIA 依赖分散到 AMD,吃到 Blackwell 效率GPU 成本与配额条款NVIDIA 用价格 / 供给压价
利润率压缩提高利用率 + 企业客户占比毛利率向 60% 靠拢利润率压到 50% 以下
关键人物风险加厚领导梯队高管留存Lin Qiao 离职
增长韧性先落地再扩张 + NRRNRR、客户数增长增长相对估值失速

缓释措施和否决标准综合了分析师评论和公司策略;触发项是作者设定的论点失效阈值。

[CR031, CR032, CR034, CR035, CR036]
FR002: 风险传导图

商品化和依赖风险如何传导到财务结果。

传导边综合了分析师风险分析;方向表示风险传播。

[CR001, CR002, CR025, CR026]

7.7 展示材料

Chapter 08

08估值

8.1 投资论点与反论点

牛市论点是,Fireworks 正在成为关键企业 AI 基础设施,也就是开放模型推理的 runtime 层;与此同时,企业正从封闭 API 试验转向在生产中拥有定制模型。它把几项稀缺要素拼在一起:打造 PyTorch 的创始团队,真实产品优势(FireAttention、FireOptimizer、同类最佳函数调用、99.8% uptime),蓝筹生产参考(Cursor、Notion、Sourcegraph、Upwork),以及从 2025 年中约 $130 million ARR 到 2026 年 5 月据称覆盖 10,000 多客户、年化约 ~$800 million 的超高速增长。如果托管推理按耐久基础设施定价,而不是按商品定价,估值还能复合。反论点是,推理在结构上正在商品化:GPU 成本主导 COGS,使毛利率停在约 50%;按 token 价格与 Together 相差约 ~2%;开源 serving 框架不断缩小差距;切换成本接近零;最强玩家 AWS、Azure 和 NVIDIA 同时是伙伴和竞争者,有能力重定价这一品类。按这种看法,Fireworks 可能变成一个毛利率约 ~50% 的优化 add-on;估值十五个月内从 $552 million 跑到 $4 billion、并传出 $15 billion 谈判,已经把完美执行计入价格。[CV001, CV002, CV003, CV004, CV005, CV006]

正反论点表
维度看多论点看空反论点
市场推理成为新运行时,TAM 巨大相比超大云厂商,可触达 SAM 偏小
产品FireAttention/FireOptimizer 有技术边际 + 可靠性OSS 框架缩小差距
客户蓝筹客户生产环境背书切换成本低,多家并用
财务高速增长到 ~$800M~50% 利润率,价格战
竞争可靠性和函数调用领先价格和速度两头受挤
依赖NVIDIA/AWS/Azure 战略支持同一批玩家也能重定价这个品类
估值基础设施倍数有支撑价格已经计入完美执行

正反论点对称展开;决定变量是利润率轨迹和留存,二者都未披露。

[CV001, CV002, CV003, CV004, CV005]
FV001: 推荐逻辑

各 thesis 因素如何共同形成 track 建议。

逻辑流概括推荐驱动因素;权重为定性判断。

[CV007, CV008, CV001, CV002]

8.2 建议、置信度和立场

我们将 Fireworks AI 评为跟踪,置信度中等,风险评级高,估值立场偏拉伸,整体得分 6.5/10。业务质量值得密切接触,并在合适进入价建立仓位;但当前和传闻价格要求投资人相信两个尚未证明的变量:毛利率能从 ~50% 明显爬向公司声称的 60% 目标,且增长是耐久的,不是容易被超大云厂商拿走的商品化抢地盘。2025 年 10 月 Series C 时,$4 billion 估值约等于公司声称 $280 million 年化收入的 14 倍;按 Sacra 2026 年 5 月 ~$800 million 估计,同一 $4 billion 约为 5 倍,但传闻 $15 billion 轮次约等于这一更高基数的 19 倍。区间很宽,反映出对正确收入数字以及这个低于软件毛利、快速商品化品类应给多少倍数的真实不确定。建议因此是密切跟踪,按 base case 承保,坚持低于传闻标记的进入纪律,并在以溢价投入前要求披露毛利率和留存。置信度主要被三项缺失压在中等:经审计财务、披露的 NRR、以及对齐后的收入数字。[CV007, CV008, CV009, CV010, CV011]

投资建议摘要表
维度评估依据
建议跟踪观察资产质量高,但价格要求高
信心财务未经审计,NRR 未披露
风险评级商品化 + 依赖
估值判断偏高以 ~50% 利润率讨论 $15B
综合评分6.5 / 10业务强,价格贵
入场纪律低于传闻中的 $15B按基准情形承销

建议综合投资论点、财务、客户、竞争和风险章节;评分是作者的综合判断。

[CV007, CV008, CV009]

8.3 融资背景与进入纪律

Fireworks 已在种子轮、Series A($25M,2024)、Series B($52M,估值 $552M,2024 年 7 月)和 Series C($250M,估值 $4B,2025 年 10 月)累计融资超过 $327 million;最后一轮约 $230 million 为一级发行,$20 million 为二级转让。到 2026 年 5 月,据称公司正讨论以 $15 billion 投后估值融资,由 Index Ventures 共同领投,约七个月内接近翻四倍。对私有后期进入而言,关键纪律包括:用哪个收入基数打倍数、优先权栈和任何清算压力,以及在资本密集型算力建设中继续融资带来的稀释。公开证据支持增长和客户叙事,但不支持财务质量:收入数字未经审计且来源冲突,毛利率是分析师估计,burn 和 runway 未披露。NVIDIA、AMD、MongoDB 和 Databricks 等战略投资者在股权结构表上是双刃剑:增加生态支持,也集中供应商和伙伴影响力。进入纪律应锚定基准情景估值,把 $15 billion 标记视为需要毛利率证明的拉伸情形,并计入未公开披露的优先权和稀释。[CV012, CV013, CV014, CV015, CV016]

8.4 牛市、基准和熊市情景

我们的基准情景(约 45% 权重)假设 Fireworks 在 2026 年达到约 $700-900 million 年化收入,并继续增长;毛利率只温和改善到 50% 出头。公司守住份额,但商品化压住倍数,对应公允企业价值约 $5-8 billion,大致接近或略高于 $4 billion Series C,低于传闻 $15 billion。牛市情景(约 30%)假设向上游堆栈策略奏效:微调、强化学习、语音和治理把毛利率抬向 58-60%,收入到 2027 年复合超过 $1.5 billion,Fireworks 成为 platform-of-record,支撑 $15-20 billion 估值。熊市情景(约 25%)假设商品化和超大云厂商捕获:毛利率停在 50% 附近或被压缩,买家多栖或转向 Bedrock 和 Azure 后增长急剧放缓,倍数压缩到 $2-3 billion 区间或 down round。离散度异常大,因为同一家公司既可被解读为耐久基础设施,也可被解读为商品化 reseller;决定性证据——毛利率轨迹和留存——尚未披露。[CV017, CV018, CV019, CV020, CV021]

牛市 / 基准 / 熊市情景表
情景概率关键假设2026-27 收入利润率隐含价值
牛市~30%向上游应用层推进奏效,成为记录平台到 2027 年 >$1.5B58-60%$15-20B
基准~45%守住份额,利润率小幅改善$700-900M (2026)50% 出头$5-8B
熊市~25%商品化 + 超大云厂商捕获价值增速减半~50% 或更低$2-3B / 下轮降估值

情景概率和区间为作者估算;收入采用公司和 Sacra 数字,未经审计。

[CV017, CV018, CV019, CV020]
FV003: 估值 / 回报区间

各情景隐含企业价值,单位为十亿美元。

情景价值区间为作者估算,以可比倍数和已披露估值标记为锚。

[CV017, CV018, CV019]

8.5 可比公司组

私有可比公司锚定本次分析。最接近的同行 Together AI 在 2025 年初以约 $618 million 年化收入获得 $3.3 billion 估值(约 5x),据称正以约 $1 billion 收入讨论接近 $7.5 billion 的估值(约 7-8x)。Baseten 2026 年 1 月以 $5 billion 估值融资,据称正在讨论 $11 billion;Groq 作为硬件驱动玩家、商业模式不同,估值达到 $6.9 billion;Fal 被引用在约 $4.5 billion。相较这些公司,Fireworks 按 ~$280 million(Series C 时点)和 $4 billion 估值看,较 Together 倍数偏贵,但收入基数更小、增长更快;按 2026 年 5 月 ~$800 million 估计看则相对便宜,而 $15 billion 讨论又把估值重新拉伸。公开基础设施软件可比公司——Datadog、Snowflake、Confluent、Cloudflare、MongoDB 和 DigitalOcean——给出倍数天花板:高增长公共基础设施公司的交易区间很宽,但已从峰值压缩;DigitalOcean 这类低毛利基础设施业务相对纯软件有明显折价。Fireworks 毛利率约 ~50%,因此应较纯 SaaS 倍数折价;超大云厂商 Amazon、Microsoft 和 Oracle 既是规模参照,也是竞争威胁。可比公司组支持的是一个情景依赖、区间很宽的价值,而不是单点估值。[CV022, CV023, CV024, CV025, CV026, CV027]

可比估值表
公司类型估值收入(年化)隐含倍数备注
Fireworks AI私募轮$4.0B (Oct 2025)~$280M~14xSeries C 时点
Fireworks AI私募(传闻)$15B (2026)~$800M~19x洽谈中,未确认
Together AI私募轮$3.3B (Feb 2025)~$618M~5x最接近的同业
Together AI私募(传闻)$7.5B (2026)~$1.0B~7-8x洽谈中
Baseten私募轮$5.0B (Jan 2026)未披露n/a传闻讨论 $11B
Groq私募轮$6.9B (Sep 2025)硬件模式n/a模式不同
上市基础设施 SaaS上市可比公司Datadog/Snowflake/Cloudflare数十亿美元级~8-20x EV/rev(收入倍数)利润率 >70%
DigitalOcean上市可比公司倍数更低~$0.8B低个位数重基础设施折价

私募轮来自公司和 Sacra;上市可比公司按公开文件做定性对照。覆盖不完整:并非所有同业都披露收入。

[CV022, CV023, CV024, CV025, CV026]
FV002: 估值敏感性

不同收入和倍数假设下的隐含估值,单位为 USD billions。

敏感性网格用公司和 Sacra 的收入数字乘以示例倍数;不是预测。

[CV009, CV022, CV023]

8.6 退出准备度与最终尽调

退出可选项方向上较强,但时点尚未证明。可行路径包括:若 Fireworks 维持超高速增长并把毛利率抬向软件水平,走 IPO;或被希望拥有推理层的超大云厂商或数据平台投资方(AWS、Microsoft、Databricks、MongoDB、NVIDIA)战略收购,尽管其中若干方也是竞争者。主要论点失效触发器包括:毛利率无法从 ~50% 抬升、超大云厂商或 NVIDIA 拿走推理层、关键人离开,或增长低于估值隐含速度。最终优先尽调问题包括:一个统一、标明日期的收入数字,经审计或管理层确认的毛利率及通向 60% 的路径,净收入留存和流失 cohort,算力建设对应的 burn 和 runway,GPU 供给合同条款,头部客户集中度,以及下一轮优先权和稀释结构。在这些问题得到回答前,正确姿态是密切跟踪公司,在基准情景上建立信念,并且只有在毛利率和留存支持基础设施论点、而非商品论点后,才为溢价进入预留空间。[CV028, CV029, CV030, CV031, CV032]

论点失效与否决触发项表
触发项信号动作
利润率停滞毛利率卡在 ~50% 或下滑退出 / 避免溢价
超大云厂商捕获价值推理被 Bedrock/Azure 吸收重新评估韧性
NVIDIA 重新定价供应商在价格 / 供给上压价降低敞口
增长停滞收入相对估值减速下轮降估值风险
关键人物流失Lin Qiao 离职重新承销
留存不及预期NRR 披露后低于 ~110%下调倍数

否决触发项对应会推翻基础设施论点的条件;阈值由作者设定。

[CV028, CV029, CV030]
最终尽调问题表
问题重要性负责人
已核对且带日期的收入确定倍数分母公司 / 财务
经审计毛利率 + 60% 路径检验溢价论点公司 / 财务
NRR 和流失分组收入韧性公司 / RevOps
烧钱和现金跑道融资风险与算力计划的关系公司 / 财务
GPU 供给合同利润率和供给敞口公司 / 基础设施
头部客户集中度收入集中风险公司 / 销售
优先权与稀释入场经济性公司 / 法务

尽调问题是以溢价估值投资前的门槛项。

[CV031, CV032]
FV004: 投资 KPI

核心可投性指标。

KPI 综合投资建议和估值分析;倍数使用未经审计的收入。

[CV007, CV009, CV010]

8.7 展示材料

免责声明

本报告仅供参考,基于截至 2026-06-14 的公开来源,不构成投资建议。财务数字大多来自未经审计的公司声明或第三方估算, 做出任何决策前均应独立核验。

证据索引

结论
编号陈述可信度来源
CO001 Fireworks AI is an AI inference-cloud company headquartered in Redwood City, California. SO018, SO020, SO025
CO002 Fireworks AI was founded in late 2022 by a team that left Meta's PyTorch organization. SO002, SO004, SO014
CO003 Fireworks operates an "AI Cloud" platform that runs, fine-tunes and scales open-source LLM, vision, audio and multimodal models with low-latency inference. SO002, SO013, SO001
CO004 Fireworks monetizes via usage-based pricing including per-token serverless inference, per-training-token fine-tuning, per-GPU-hour reinforcement fine-tuning and dedicated deployments. SO013
CO005 Fireworks positions itself on a "one-size-fits-one" thesis favoring smaller customizable open models over generic closed foundation models. SO002, SO005
CO006 Lin Qiao is CEO and co-founder of Fireworks AI and previously led the PyTorch team at Meta. SO004, SO016, SO018
CO007 Fireworks AI was co-founded by seven people, most of whom worked together on PyTorch at Meta. SO004, SO014, SO023
CO008 Co-founders Dmytro Dzhulgakov and Dmytro Ivchenko are Ukrainian former Meta PyTorch engineers. SO014, SO004
CO009 Lin Qiao holds a Ph.D. in Computer Science from UC Santa Barbara and previously worked at LinkedIn and IBM. SO018, SO016
CO010 Other co-founders include James Reed, Benny Chen, Chenyu Zhao and Pawel Garbacki, with backgrounds at Meta PyTorch, ads and ML teams and Google Vertex AI. SO004, SO023
CO011 Fireworks AI raised a $250 million Series C in October 2025 at a $4 billion valuation. SO002, SO019, SO020
CO012 The Series C was co-led by Lightspeed Venture Partners, Index Ventures and Evantic with continued support from Sequoia Capital. SO002, SO021, SO022
CO013 A $52 million Series B led by Sequoia closed in July 2024 at a $552 million valuation with NVIDIA, AMD and MongoDB Ventures participating. SO003, SO008, SO009
CO014 Fireworks AI has raised more than $327 million in total funding as of October 2025. SO002, SO013
CO015 A $25 million Series A led by Benchmark closed in March 2024 with Sequoia, Databricks Ventures and angels including Frank Slootman, Sheryl Sandberg, Howie Liu and Alexandr Wang. SO014, SO003
CO016 The Series B brought Fireworks AI's cumulative capital raised to $77 million. SO003
CO017 As of May 2026 Sacra reports Fireworks is in talks to raise at a $15 billion post-money valuation with Index set to co-lead, on unconfirmed terms. SO013
CO018 Fireworks AI reported powering over 10,000 companies at its October 2025 Series C, roughly a tenfold increase from its Series B. SO002, SO013
CO019 Fireworks reported annualized revenue surpassing $280 million at the time of the October 2025 Series C. SO002
CO020 The Series C round comprised roughly $230 million of primary funding and a $20 million secondary transaction per Sacra. SO013
CO021 Fireworks AI's developer base grew from about 12,000 in February 2024 to 23,000 by the end of 2024. SO014
CO022 The Fireworks platform processes more than 10 trillion tokens per day as of October 2025, rising to about 15 trillion per day by early 2026 per third-party profiles. SO002, SO018
CO023 Earlier 2025 coverage cited Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year. SO014
CO024 Sacra estimates Fireworks AI's gross margin near 50 percent, below software norms, with management targeting 60 percent through GPU optimization. SO013
CO025 Fireworks launched Microsoft Foundry (Azure) availability in March 2026, extending open-model inference to Azure customers. SO018
CO026 Fireworks shipped FireFunction V2, FireAttention V2, FireOptimizer, supervised fine-tuning V2 and reinforcement fine-tuning between 2024 and 2026. SO003, SO013
CO027 Fireworks AI acquired Hathora to deepen real-time and global compute orchestration. SO013
CO028 Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties. SO013
CO029 Analysts cite inference commoditization, hyperscaler bundling and hardware concentration as the main structural risks to Fireworks. SO013
CO030 Independent reviewers describe Fireworks as "just the engine," requiring developer sophistication, with thin documentation and no ongoing free tier. SO026
CO031 Fireworks offers an OpenAI-compatible API plus function calling, fine-tuning and enterprise security controls across hundreds of models. SO001, SO002
CO032 Investors at Index Ventures and Sequoia cite the founding team's PyTorch and inference-systems pedigree as the core reason for backing Fireworks. SO004, SO005
CO033 CEO Lin Qiao concentrates fundraising, vision and public representation, creating a meaningful key-person dependency. SO004, SO015
CO034 NVIDIA has entered the inference market directly via its Lepton acquisition and a competing GPU cloud marketplace, raising supplier-as-competitor risk for Fireworks. SO013
CO035 Company-stated revenue figures and third-party estimates for Fireworks differ materially across vintages, from $130M ARR in mid-2025 to ~$800M annualized by May 2026. SO002, SO013, SO014
CM001 Fireworks AI competes in the managed AI inference market for serving and tuning open-weight models in production. SM010, SM013
CM002 The core included spend is third-party production model serving, fine-tuning and dedicated deployment, not foundation-model training. SM010, SM009
CM003 Closed-model APIs from OpenAI and Anthropic are excluded from the core market but are the primary status-quo substitute. SM009, SM025
CM004 Self-hosting on vLLM or SGLang and hyperscaler bundles such as Bedrock and Azure Foundry are direct substitutes for Fireworks. SM010, SM015
CM005 Adjacent expansion pools include voice agents, RAG/embeddings and reinforcement-learning training for agents. SM010
CM006 MarketsandMarkets estimates the AI inference market at $106.15 billion in 2025 growing to $254.98 billion by 2030 at a 19.2% CAGR. SM001, SM003
CM007 Other research houses place the 2026 AI inference market between roughly $118 billion and $126 billion and 2034 between $312 billion and $536 billion. SM002, SM003, SM005
CM008 Gartner projects generative-AI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028. SM009
CM009 The independent open-weight inference-serving market has consolidated around roughly seven providers as of Q2 2026. SM006
CM010 With Together AI near $1 billion annualized revenue and Fireworks in the $280-800 million range, the independent-provider revenue pool is a few billion dollars in 2026. SM011, SM010
CM011 Fireworks' $280 million-plus revenue represents an early single-digit share of the independent inference niche. SM010, SM013
CM012 The most relevant lens for valuing Fireworks is the independent inference niche, not the headline AI inference TAM. SM006, SM010
CM013 AI-native startups adopt Fireworks bottoms-up via self-serve API keys with an engineering lead as economic buyer. SM010
CM014 Digital-native enterprises such as DoorDash, Notion, Shopify and Upwork move features from pilot to production on Fireworks. SM013, SM010
CM015 Regulated and Fortune 500 buyers require SSO, audit logs, data residency and HIPAA/SOC2 posture and adopt top-down via procurement. SM010
CM016 Across segments the user is a developer and the payer is an engineering or procurement budget. SM010, SM013
CM017 Fireworks reaches buyers through cloud procurement channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability. SM010, SM015
CM018 Open-source model quality convergence and agentic compound AI are primary drivers expanding inference demand. SM009, SM013
CM019 Inference is commoditizing as vLLM and SGLang improve, compressing the proprietary advantage of optimized stacks. SM010, SM025
CM020 Hyperscaler bundling by AWS, Azure and Google folds inference into existing security, billing and governance relationships. SM010, SM015
CM021 Fireworks' Llama 70B price sits within roughly 2% of Together AI's, illustrating razor-thin price differentiation. SM023, SM006
CM022 GPU supply is concentrated and Fireworks does not own its fleet, creating capacity and cost exposure. SM010
CM023 The EU AI Act imposes tiered obligations that add compliance overhead for AI deployment in Europe. SM026
CM024 The OpenAI-compatible API lowers both switching-in and switching-out costs, capping durable lock-in. SM023, SM010
CM025 Published AI inference TAM figures bundle chips, hyperscaler services and independent software, so they overstate Fireworks' reachable market. SM001, SM006
CM026 The independent inference-provider revenue pool is not measured by any standard analyst and must be assembled from uneven company estimates. SM010, SM011, SM012
CM027 Forecast CAGRs for AI inference range from roughly 13% to 19% and 2034 estimates differ by more than $200 billion across houses. SM001, SM002, SM003
CM028 Despite wide estimate spreads, the AI inference market is clearly large and growing double digits, with directional rather than precise SAM. SM001, SM004
CM029 There is no public evidence of near-term saturation in the AI inference market; growth drivers remain intact through the forecast window. SM002, SM004
CM030 Fine-tuned and specialized models are projected to capture much of the generative-AI model-spend growth, favoring Fireworks' tuning products. SM009
CM031 The serverless open-weight inference field shows roughly 6x price spread and 5-7x latency spread across providers on the same model. SM006
CM032 Together AI, Groq, Baseten, Cerebras, Replicate, Anyscale and OctoAI are the other named providers in the consolidated inference field. SM006, SM016, SM019
CM033 Voice agents targeting sub-500ms latency expand Fireworks into contact-center and telephony budget categories larger than API inference alone. SM010
CM034 Demand differs by maturity: startups optimize cost-per-token while Fortune 500 buyers prioritize control, compliance and vendor consolidation. SM010, SM015
CM035 A defensible 2026 AI inference market figure is roughly $118-126 billion, between the 2025 base and the 2030 forecast. SM001, SM003
CP001 The inference market has segmented into managed open-model platforms, vertically integrated silicon, hyperscaler bundles and open-source serving frameworks. SP009, SP010
CP002 Together AI, Baseten and Replicate are Fireworks' closest managed open-model competitors. SP009, SP010
CP003 Groq, Cerebras and SambaNova attack inference from custom silicon rather than software optimization on commodity GPUs. SP009, SP005
CP004 AWS Bedrock, Google Vertex, Azure Foundry and Databricks Model Serving collapse model access, infrastructure and governance into one platform. SP009, SP016
CP005 Open-source serving frameworks vLLM and SGLang plus NVIDIA NIM and routers like OpenRouter commoditize proprietary inference advantage. SP009
CP006 NVIDIA entered inference directly via its Lepton acquisition and a competing GPU-cloud marketplace, becoming a supplier-turned-rival. SP009
CP007 Together AI raised a $305 million Series B in February 2025 at a $3.3 billion valuation and reached about $1 billion annualized revenue by early 2026. SP002, SP018
CP008 Together AI was founded in 2021 by Percy Liang, Chris Re and Vipul Ved Prakash and spans serverless, clusters, fine-tuning, voice and RL. SP002, SP018
CP009 Baseten raised $300 million in January 2026 at a $5 billion valuation led by IVP and CapitalG with a reported $150 million from NVIDIA. SP004, SP007
CP010 Baseten positions as an enterprise inference-engineering platform with self-hosted and hybrid VPC deployment built on TensorRT, SGLang, vLLM and TGI. SP003, SP015
CP011 Groq raised $750 million in September 2025 at a $6.9 billion valuation and advertises 750-plus tokens per second on Llama models from custom LPU silicon. SP005, SP006, SP017
CP012 Groq's partnership with Meta to power the official Llama API gives it strong distribution and first-party open-model credibility. SP009
CP013 Replicate, Modal and Anyscale compete for developer mindshare at the top of the adoption funnel. SP012, SP013, SP014
CP014 Fireworks' Q1 2026 uptime of 99.8% is the highest among specialized inference providers per independent monitoring. SP001
CP015 Llama 3.3 70B runs about $0.90 per million tokens on Fireworks versus $0.88 on Together and $0.59 on Groq. SP001, SP010
CP016 FireFunction achieves roughly 92% multi-tool function-calling accuracy, ahead of Together and Groq and within a few points of GPT-4o. SP001
CP017 Together offers a 200-plus model catalog with full fine-tuning while Groq offers 15-20 models and no fine-tuning. SP001
CP018 Groq's LPU delivers 400-750 tokens per second versus Fireworks' ~145, but Fireworks wins latency consistency under load. SP001, SP010
CP019 Fireworks, Together and Baseten all offer SOC2/HIPAA postures and VPC or data-residency options, with Baseten leaning hardest into self-hosted compliance. SP003, SP009
CP020 Most inference providers expose OpenAI-compatible APIs, making migration between them a matter of minutes. SP001, SP020
CP021 Routing aggregators such as OpenRouter and TokenMix encourage multi-homing and automatic failover across providers. SP001, SP009
CP022 Hyperscalers and NVIDIA control the GPU supply, security posture and procurement relationships that independents depend on. SP009, SP016
CP023 Fireworks plugs into incumbent channels via an AWS Strategic Collaboration Agreement and Microsoft Foundry availability. SP009, SP016
CP024 Fireworks does not own GPUs and sources NVIDIA and AMD capacity from third parties, unlike Together's owned data-center strategy. SP009, SP002
CP025 Fireworks' proprietary FireAttention and FireOptimizer stack converts inference-systems engineering into a performance and price advantage. SP009
CP026 Open-source serving frameworks keep closing the performance gap, and Baseten openly builds on vLLM and SGLang. SP009, SP003
CP027 NVIDIA pushes NIM as a packaging layer and Snowflake released Arctic Inference as an open vLLM plugin, compressing proprietary advantage. SP009
CP028 Groq at $6.9 billion, Baseten at $5 billion and Together near a rumored $7.5 billion are better capitalized than Fireworks at $4 billion. SP005, SP004, SP002
CP029 Independent reviewers describe Fireworks as "just the engine," an adverse signal about its application-level differentiation versus full-stack rivals. SP023
CP030 Fireworks' durability depends on extending into tuning, agents and governance faster than the ecosystem commoditizes the serving layer. SP009
CP031 Fireworks' most defensible differentiation is reliability plus best-in-class function calling rather than price or raw speed. SP001
CP032 The same Llama model spreads roughly sixfold in price and 5-7x in latency across the seven-provider field. SP010
CP033 Together AI has raised $533.5 million in total funding from investors including General Catalyst, Prosperity7, NVIDIA, Salesforce and Kleiner Perkins. SP002
CP034 Baseten's valuation roughly doubled from $2.15 billion in September 2025 to $5 billion in January 2026, with talks of an $11 billion round by May 2026. SP003, SP004
CP035 Hyperscaler bundling is plausibly the single biggest structural threat to Fireworks because it removes the need for a standalone inference vendor. SP009, SP016
CI001 Fireworks bills serverless inference per token, fine-tuning per training token, reinforcement fine-tuning per GPU-hour and dedicated deployments per GPU-second or GPU-hour. SI002, SI003
CI002 Fireworks' usage-based pricing maps to the customer lifecycle, capturing revenue across experimentation, production, adaptation and scaled deployment. SI002
CI003 Reserved capacity is contracted separately on longer commitments at negotiated pricing and is the highest-margin stream. SI002
CI004 Fireworks publishes serverless rates of about $0.90 per million tokens for Llama 3.3 70B, $0.20 for the 8B model and $0.50 for DeepSeek V3. SI004, SI005
CI005 Image generation runs from about $0.013 (SDXL) to $0.04 (Flux 1.1 Pro) per image and reserved capacity near $4.80 per hour per replica. SI004
CI006 Fireworks' go-to-market is bottoms-up at entry via self-serve API keys and top-down at expansion via negotiated enterprise relationships. SI002
CI007 Fireworks offers $1 of free credits rather than an ongoing free tier and a standard rate limit near 600 requests per minute. SI004
CI008 Fireworks runs a field and partner sales motion anchored by an AWS Strategic Collaboration Agreement with funded proofs-of-concept and a startup acceleration program. SI002, SI007
CI009 Blended annualized revenue per company is estimated near $28,000 across Fireworks' 10,000-plus customer base. SI002
CI010 Fireworks revenue is likely concentrated among a smaller number of large production deployments rather than evenly across the base. SI002
CI011 Sacra estimates Fireworks' gross margin near 50%, below the 70%-plus typical of subscription software, because GPU costs sit in cost of goods sold. SI002
CI012 Management targets a 60% gross margin through better GPU utilization, Blackwell-generation efficiency and a mix shift toward dedicated and enterprise workloads. SI002
CI013 Multi-LoRA improves utilization by consolidating many fine-tuned variants onto a single base-model deployment, lowering compute cost per variant. SI002, SI018
CI014 Proprietary optimization via FireAttention and FireOptimizer lets Fireworks charge a premium over self-hosting while undercutting the alternative's total cost. SI002, SI016
CI015 NVIDIA reports rapidly growing data-center GPU revenue, evidencing the supplier-driven, capacity-constrained input market Fireworks operates within. SI012
CI016 AMD's data-center accelerator business is also scaling, offering Fireworks an alternative silicon supplier to NVIDIA. SI013
CI017 Fireworks stated annualized revenue surpassing $280 million at its October 2025 Series C. SI001, SI006
CI018 Sacra estimates Fireworks at roughly $305 million annualized at year-end 2025 rising to about $800 million by May 2026. SI002
CI019 Earlier 2025 coverage reported Fireworks at $130 million ARR, profitable, and growing roughly 20x year over year. SI009
CI020 Fireworks' audited financials, revenue mix, net revenue retention, churn and headcount are not public. SI002, SI010
CI021 Fireworks processes more than 10 trillion tokens per day, rising to 15 trillion by early 2026. SI001, SI010
CI022 Fireworks has raised more than $327 million across seed, Series A, B and C rounds. SI001, SI002
CI023 The October 2025 Series C provided $250 million, roughly $230 million primary and $20 million secondary, at a $4 billion valuation. SI002, SI001
CI024 Fireworks plans to grow its compute footprint three-to-four-fold over the next year, a capital-intensive expansion. SI001
CI025 Sacra reports Fireworks is in talks to raise again at a $15 billion valuation as of May 2026, which could be the next-round trigger. SI002
CI026 Fireworks' principal financing dependency is GPU supply, since it does not own its fleet and sources NVIDIA and AMD capacity from third parties. SI002, SI012
CI027 Fireworks shows credible hypergrowth and a lifecycle-spanning usage model, but the absence of audited figures caps revenue-quality confidence. SI002, SI001
CI028 The main financial diligence blockers are a reconciled revenue figure, gross-margin verification, burn and runway, and net revenue retention. SI002, SI010
CI029 Fireworks' revenue figures span $130 million to roughly $800 million annualized within twelve months, reflecting both hypergrowth and inconsistent measurement. SI001, SI002, SI009
CI030 No public debt or project-finance obligations are disclosed for Fireworks AI. SI002, SI021
CI031 An AWS case study reports a Fireworks customer cut total costs four-fold and supported three times higher traffic per instance on EC2 P5. SI007
CI032 Reported 2025 profitability, if accurate, would make Fireworks unusually capital-efficient for a hypergrowth infrastructure startup. SI009
CI033 Downward inference price pressure threatens Fireworks' margins absent continued differentiation, per critical reviewers. SI020
CI034 MongoDB, a public infrastructure peer and Fireworks investor, illustrates the higher gross margins of pure-software comparables versus inference providers. SI014
CI035 Fireworks' capital intensity exceeds a typical SaaS company because compute scaling and the lack of owned GPUs require recurring capacity spend. SI002, SI001
CE001 Fireworks lets a developer point an OpenAI-compatible API at an open model and get low-latency production inference without managing GPUs. SE010, SE013, SE017
CE002 Customers describe Fireworks as an inference engine that supplies speed, cost and control while they build the product. SE014, SE025, SE026
CE003 The platform spans text, image, audio and multimodal formats across hundreds of models with day-zero support for major releases. SE010, SE006
CE004 Fireworks provides function calling, JSON-mode structured output and streaming through its API. SE010, SE013
CE005 A single customer can expand from serverless inference into fine-tuning, dedicated capacity, RAG and voice agents. SE017, SE023
CE006 Serverless inference is the entry product, offering pay-per-token access to 50-plus served models including Llama 4, DeepSeek V3, Qwen 3, Mixtral, Gemma 3 and Phi-4. SE013, SE010
CE007 FireFunction is Fireworks' proprietary function-calling model family for tool use and structured output. SE013
CE008 Customization modules include LoRA fine-tuning, Supervised Fine-Tuning V2 with quantization-aware training, and Reinforcement Fine-Tuning for agentic tasks. SE005, SE003, SE004
CE009 Deployment modules span serverless, on-demand dedicated and reserved capacity plus multi-LoRA hosting of many adapters on one base deployment. SE021, SE020
CE010 Newer surfaces include a Voice Agent Platform with sub-500ms response and BYOB secure training from customer AWS S3 buckets. SE017, SE019
CE011 Fireworks runs a proprietary multi-layer inference stack on commodity NVIDIA GPUs with a stateless router, draft and target pods, distributed KV cache and continuous batching. SE001
CE012 FireAttention is a custom CUDA attention implementation Fireworks reports as faster than vLLM and TensorRT-LLM, extended for long context and Llama 4 chunked local attention. SE006, SE001
CE013 FireOptimizer performs adaptive speculative execution with reported latency reductions up to roughly 3x and native FP4 support on NVIDIA Blackwell B200. SE002, SE009
CE014 The serving topology scales to documented tests around 50,000 requests per minute. SE001
CE015 Speculative decoding pairs a fast draft model with a full target model to generate and verify tokens in parallel, configurable per workload. SE008, SE001
CE016 Fireworks' operating model is open-model neutral, betting on running whichever open model is winning rather than any single model. SE017
CE017 Fireworks operates a global multi-region fleet including Frankfurt, Iceland and Tokyo plus US, Europe and APAC regions for latency and data residency. SE017
CE018 Independent monitoring placed Fireworks' Q1 2026 uptime at 99.8%, the highest among specialized inference providers. SE013
CE019 Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks. SE015
CE020 Cursor reached about 1,000 tokens per second for code generation and Sourcegraph saw a 30% latency reduction and 2.5x acceptance increase on Fireworks. SE014, SE016
CE021 The Series C-funded roadmap targets deeper tuning and inference-alignment research and an end-to-end model-lifecycle creation toolchain. SE022, SE019
CE022 Fireworks plans a three-to-four-fold expansion of global compute and has acquired Hathora to deepen real-time orchestration. SE022, SE017
CE023 Fireworks' core IP is the proprietary inference engine, especially FireAttention kernels and FireOptimizer, rather than registered patents. SE002, SE017
CE024 No public patents are listed for Fireworks; its moat is engineering know-how. SE017
CE025 Product-model co-design uses a customer data feedback loop with continuous evaluation and reinforcement learning to improve fine-tuned models over time. SE022, SE003
CE026 Fireworks' optimization advantage sits atop open-source frameworks like vLLM and SGLang that keep improving, so differentiation must be continuously re-earned. SE017, SE009
CE027 The platform depends on leading-edge NVIDIA and AMD GPUs, CUDA, cloud regions and upstream open models. SE001, SE017
CE028 Fireworks offers zero data retention by default, SSO, audit logs and data-residency controls for enterprise buyers. SE017
CE029 Fireworks' AWS-based inference solution is HIPAA and SOC2 Type II compliant. SE007, SE017
CE030 For sensitive workloads Fireworks supports airgapped EKS deployments and bring-your-own-bucket secure training. SE017
CE031 Structured-output controls such as JSON mode and grammar-constrained decoding plus high schema compliance support dependable agentic tool use. SE013, SE010
CE032 Fireworks does not publish a formal standard-tier SLA, and reviewers note thin documentation in places, both diligence items for security-sensitive buyers. SE013, SE025
CE033 FireFunction achieves roughly 92% multi-tool function-calling accuracy and 99.1% JSON schema compliance in independent benchmarks. SE013, SE027
CE034 Fireworks maintains day-zero support for new models such as Llama 4, DeepSeek and Qwen as a core engineering discipline. SE006, SE011, SE012
CE035 Fireworks publishes open benchmark tooling via its GitHub organization, a developer-signal of technical openness. SE018
CU001 Fireworks' customer base spans AI-native startups, digital-native enterprises and large or regulated enterprises with distinct adoption paths. SU009, SU007
CU002 AI-native startups such as Cursor, Perplexity, Liner and Cresta adopt Fireworks bottoms-up via self-serve API keys. SU009, SU011
CU003 Digital-native enterprises including DoorDash, Notion, Shopify, Upwork and Quora run production AI features on Fireworks. SU011, SU007
CU004 Use cases cluster around code assistance, conversational AI, enterprise search, agentic workflows and voice across software, e-commerce and customer-service verticals. SU009, SU025
CU005 Fireworks' customer geography skews North American and European with global API access. SU025
CU006 Fireworks reported powering over 10,000 companies at its October 2025 Series C, about a tenfold increase from roughly 1,000 at the Series B. SU006, SU009
CU007 Fireworks serves hundreds of thousands of developers, up from 12,000 in February 2024 to 23,000 by the end of 2024. SU006, SU010
CU008 The platform processes more than 10 trillion tokens per day, rising to about 15 trillion by early 2026. SU006, SU007
CU009 Customers follow a land-and-expand path from serverless inference into dedicated deployments, fine-tuning, RFT, embeddings and voice. SU009, SU017
CU010 Analyst commentary on Hebbia shows how a single inference relationship can grow into a broader infrastructure dependency. SU017
CU011 Cursor used Fireworks' speculative-decoding API to reach roughly 1,000 tokens per second for Fast Apply, with a named researcher endorsing production use. SU001, SU013
CU012 Notion reduced AI response latency from about 2 seconds to 350 milliseconds by fine-tuning with Fireworks, attributed by its head of AI engineering. SU002
CU013 Sourcegraph cut latency by 30% and lifted completion acceptance 2.5x on Fireworks, corroborated by an AWS case study. SU003, SU012
CU014 Upwork's Uma assistant drafts real-time proposals on Fireworks per a named executive. SU004
CU015 Quora's Poe chatbot tripled response speed and Superhuman built its Ask AI compound system on Fireworks. SU013, SU007
CU016 Fireworks' named references are mostly production deployments with quantified outcomes and executive attribution, giving the reference base high quality. SU001, SU002, SU012
CU017 Fireworks does not disclose net revenue retention, gross retention, churn, renewal rates or contract lengths. SU009, SU017
CU018 Customer durability must be inferred from structural signals such as land-and-expand design and production usage rather than disclosed metrics. SU017, SU009
CU019 High daily token volume and named executive testimonials indicate strong repeat usage and satisfaction anecdotally. SU006, SU002
CU020 The OpenAI-compatible API and routing aggregators make multi-homing and switching trivial, elevating churn risk. SU018, SU021
CU021 Independent reviewers explicitly document Fireworks alternatives and switching paths, an adverse durability signal. SU018
CU022 Fireworks' growth engine is land-and-expand, with a single serverless feature able to grow into dedicated, fine-tuning, voice and reserved-capacity spend. SU009, SU017
CU023 Blended annualized revenue per company is roughly $28,000, likely understating a long tail beneath a few large accounts. SU022
CU024 The identity and revenue share of Fireworks' top customers are not disclosed, creating unquantifiable top-customer concentration risk. SU009, SU022
CU025 The AWS Strategic Collaboration Agreement and Microsoft Foundry availability are growth accelerants but also channel dependencies. SU009, SU024
CU026 Procurement friction is lower than for closed APIs via cloud marketplaces, but enterprise sales cycles and compliance reviews still gate the largest deals. SU009, SU024
CU027 Several marquee logos such as DoorDash and Shopify appear in aggregate marketing lists without standalone case studies. SU007, SU020
CU028 Sophisticated public customers like GitLab disclose AI-vendor dependence in their filings, illustrating buyer-side multi-homing and substitution capacity. SU016
CU029 WorkingAgents and other third parties corroborate Fireworks' compound-inference customer use cases for agentic workflows. SU015
CU030 Samsung is cited by investors as an enterprise customer accelerating its AI roadmap on Fireworks. SU011
CU031 The named reference base is high quality but partly dated to 2024, a freshness caveat for diligence. SU003, SU012
CU032 Fireworks' customer logos are concentrated in technology, e-commerce, customer service and legal-tech verticals. SU025
CU033 Production usage intensity is implied by 10-15 trillion tokens per day across the customer base. SU006, SU007
CU034 Customer satisfaction evidence is positive but anecdotal, resting on named testimonials rather than survey or NPS data. SU002, SU004
CU035 Retention is the weakest-evidenced dimension of Fireworks' customer story, a material diligence gap. SU017, SU018
CR001 Inference commoditization and gross-margin compression are Fireworks' highest-severity risks. SR001, SR011
CR002 Hyperscaler bundling by AWS, Azure and Google could capture the inference layer and relegate Fireworks to an optimization add-on. SR001
CR003 NVIDIA is simultaneously Fireworks' GPU supplier, an investor and a competitor via Lepton and NIM. SR001, SR008
CR004 Capital intensity from a planned three-to-four-fold compute expansion is a medium-severity risk. SR021, SR001
CR005 Fireworks' mitigation thesis is to move up the stack faster than the serving layer commoditizes. SR001
CR006 Residual risk exposure remains meaningful because several mitigations are unproven and key metrics are undisclosed. SR001, SR012
CR007 The EU AI Act imposes tiered, risk-based obligations including transparency and documentation duties on general-purpose AI providers and deployers. SR004, SR005
CR008 GDPR and data-residency requirements drive Fireworks' zero-data-retention and regional-deployment features. SR006, SR001
CR009 Open models such as Llama carry acceptable-use and license terms that flow through to platforms serving them. SR019, SR007
CR010 Fireworks lists no public patents, so its FireAttention and FireOptimizer advantages rest on trade secrets and know-how. SR013, SR001
CR011 No material litigation or enforcement action against Fireworks is publicly known, and its Series C used top-tier legal counsel. SR018, SR019
CR012 The NIST AI Risk Management Framework provides a voluntary governance baseline Fireworks and its customers can adopt. SR020
CR013 Fireworks does not own its GPU fleet and sources NVIDIA and AMD capacity from third parties, exposing it to allocation and supply risk. SR001, SR008
CR014 Fireworks does not publish a formal standard-tier SLA, so contractual reliability commitments are negotiated case by case. SR012
CR015 Independently monitored Q1 2026 uptime of 99.8% is a reliability strength despite the absence of a published SLA. SR012
CR016 Operating a global multi-region fleet adds operational complexity and cost for Fireworks. SR001
CR017 Fireworks' SOC2 Type II, HIPAA, zero-retention and airgapped controls mitigate operational and security risk, with no public breach known. SR001
CR018 A single serious outage or data incident would be especially damaging given customers' production, latency-sensitive workloads. SR012, SR001
CR019 NVIDIA is the most acute dependency, supplying leading-edge GPUs while holding a stake and competing through Lepton, a GPU marketplace and NIM. SR001, SR008
CR020 AMD provides an alternative silicon supplier, partly diversifying Fireworks' NVIDIA dependence. SR025
CR021 AWS and Microsoft are both distribution partners and bundling threats via Bedrock, Vertex and Azure Foundry. SR001
CR022 Fireworks depends on continued release and permissive licensing of open models from Meta, DeepSeek and Alibaba. SR001, SR009
CR023 Capital-provider concentration among a handful of late-stage funds and key-customer multi-homing add dependency risk. SR022, SR028
CR024 Fireworks' enabling partners NVIDIA, AWS and Microsoft are also its most credible competitors. SR001
CR025 Gross margin near 50% is structurally below software norms and faces persistent downward price pressure. SR001, SR011
CR026 The path to a 60% gross margin depends on unproven utilization gains and a revenue-mix shift. SR001
CR027 Burn, runway and net revenue retention are undisclosed, so Fireworks' capital adequacy is asserted rather than verified. SR001, SR021
CR028 The valuation ramp from $552 million to $4 billion within roughly fifteen months, with talk of $15 billion, embeds aggressive growth expectations. SR022, SR023
CR029 Key-person risk is concentrated in CEO Lin Qiao, who leads vision and fundraising. SR024
CR030 Retaining elite inference engineers in a hot talent market is a continuing execution challenge. SR024, SR001
CR031 Fireworks' mitigations include moving up the stack, diversifying silicon, maintaining day-zero model support and hardening compliance. SR001
CR032 Plugging into AWS and Azure procurement is a defensive mitigation against hyperscaler bundling. SR001
CR033 Execution risk centers on whether the unproven up-the-stack expansion outruns commoditization. SR001
CR034 Gross-margin trajectory toward 60% is the single best monitoring indicator of Fireworks' risk profile. SR001
CR035 The clearest thesis-break triggers are margin stuck at ~50%, hyperscaler/NVIDIA capture, a key-person departure, or growth stalling versus the valuation. SR001, SR022
CR036 Priority diligence asks are a reconciled revenue and margin figure, NRR and burn, GPU-supply contracts, and top-customer concentration. SR001, SR012
CR037 Public infrastructure peers such as Datadog, Snowflake, Confluent and Cloudflare disclose AI-competition and margin risk factors that contextualize Fireworks' exposures. SR014, SR015, SR016, SR017
CR038 DigitalOcean's filings illustrate the lower-margin reality of infrastructure-heavy businesses relative to pure software. SR030
CR039 Better-capitalized rivals such as Baseten raise the competitive stakes for Fireworks' enterprise go-to-market. SR028, SR027
CR040 Low switching costs from OpenAI-compatible APIs and routers cap retention and amplify commoditization risk. SR003, SR013
CR041 US export controls and supply constraints on advanced GPUs are an indirect risk transmitted through Fireworks' NVIDIA dependence. SR008, SR009
CR042 Fireworks' terms of service allocate liability and usage restrictions that are standard but warrant review for enterprise indemnification. SR019
CV001 The bull thesis is that Fireworks is becoming critical enterprise AI infrastructure as enterprises shift from closed-API experimentation to owning customized open models in production. SV026, SV008
CV002 The anti-thesis is that inference is structurally commoditizing, with ~50% margins, near-zero switching costs, and hyperscaler and NVIDIA repricing risk. SV001, SV016
CV003 Fireworks pairs a PyTorch-pedigree founding team with FireAttention, FireOptimizer, best-in-class function calling and 99.8% uptime. SV026, SV001
CV004 Fireworks grew from roughly $130 million ARR in mid-2025 to a reported ~$800 million annualized by May 2026 across 10,000-plus customers. SV001, SV029
CV005 Fireworks' per-token prices sit within ~2% of Together and open-source serving frameworks keep closing the performance gap, supporting the commoditization anti-thesis. SV016, SV001
CV006 A valuation that ran from $552 million to $4 billion in fifteen months, with $15 billion in talks, prices in flawless execution. SV001, SV008, SV029
CV007 We rate Fireworks AI track with medium confidence, a high risk rating and a stretched valuation stance. SV001, SV016
CV008 We assign an overall score of 6.5 out of 10, reflecting a strong business at a demanding price. SV001, SV026
CV009 The $4 billion Series C implied roughly 14 times the company-stated $280 million annualized revenue. SV008, SV001
CV010 The rumored $15 billion round implies roughly 19 times Sacra's ~$800 million May 2026 revenue estimate. SV001, SV004
CV011 Confidence is capped at medium primarily by the absence of audited financials, disclosed NRR and a reconciled revenue figure. SV001, SV016
CV012 Fireworks has raised over $327 million across seed, a $25M Series A, a $52M Series B at $552M and a $250M Series C at $4B. SV008, SV001
CV013 The Series C comprised roughly $230 million primary and a $20 million secondary. SV001
CV014 As of May 2026 Fireworks is reportedly in talks to raise at a $15 billion post-money valuation co-led by Index Ventures. SV001, SV004, SV002
CV015 Public evidence supports Fireworks' growth and customer story but not its financial quality, since revenue is unaudited, margin is estimated, and burn is undisclosed. SV001, SV016
CV016 Strategic investors NVIDIA, AMD, MongoDB and Databricks on the cap table add ecosystem support but concentrate supplier and partner influence. SV008, SV027
CV017 The base case (~45%) assumes ~$700-900 million 2026 revenue and low-50s margins, implying a fair value around $5-8 billion. SV001, SV005
CV018 The bull case (~30%) assumes margins toward 58-60% and revenue past $1.5 billion by 2027, justifying $15-20 billion. SV001, SV015
CV019 The bear case (~25%) assumes commoditization and hyperscaler capture compressing the multiple to a $2-3 billion range or a down round. SV016, SV001
CV020 The valuation dispersion is unusually wide because the same company can be read as durable infrastructure or a commoditizing reseller. SV001, SV015
CV021 The deciding evidence between scenarios, gross-margin trajectory and retention, is not yet disclosed. SV001
CV022 Together AI was valued at $3.3 billion on about $618 million annualized revenue in early 2025, roughly 5x, and is reportedly near $7.5 billion on about $1 billion. SV005
CV023 Baseten raised at a $5 billion valuation in January 2026 with talks of $11 billion, and Groq reached $6.9 billion as a hardware-led player, while Fal is cited around $4.5 billion. SV006, SV007, SV002
CV024 Public infrastructure-software comparables such as Datadog, Snowflake, Cloudflare and Confluent frame a broad, compressed multiple band with 70%-plus gross margins. SV011, SV012, SV013, SV020
CV025 DigitalOcean illustrates that lower-margin infrastructure businesses trade at clear discounts to pure software, supporting a discount for Fireworks' ~50% margins. SV014
CV026 Hyperscalers Amazon, Microsoft and Oracle are both the scale reference and the competitive threat to Fireworks' valuation. SV017, SV018, SV019
CV027 At $4 billion on ~$280 million Fireworks looks rich versus Together's multiple but is on a smaller, faster-growing base; on ~$800 million it looks comparatively cheap. SV001, SV005
CV028 Plausible exit paths include an IPO on sustained hypergrowth or strategic acquisition by a hyperscaler or data-platform investor that is also a competitor. SV017, SV018
CV029 The principal thesis-break triggers are margin failing to rise off ~50%, hyperscaler or NVIDIA capture, a key-person departure, or growth stalling versus the valuation. SV001, SV016
CV030 A net revenue retention below roughly 110% once disclosed would warrant a lower multiple. SV001
CV031 Priority diligence asks are a reconciled dated revenue figure, audited gross margin and the path to 60%, NRR and churn, burn and runway, GPU-supply terms, top-customer concentration and preference and dilution structure. SV001, SV016
CV032 Until margin and retention are confirmed, the right posture is to track closely, underwrite to the base case, and reserve premium entry for confirmation of the infrastructure thesis. SV001, SV015
CV033 Together's prior round at $1.25 billion on $130 million 2024 revenue traded at 9.6x, a useful inference-peer multiple benchmark. SV005
CV034 Fireworks' ~50% gross margin warrants a discount to the 70%-plus-margin public-software multiples because GPU costs sit in COGS. SV014, SV001
CV035 The $15 billion valuation talk is corroborated by Sacra and multiple news outlets as of late May 2026 but remains unconfirmed. SV001, SV002, SV003, SV024
CV036 The large AI inference TAM growing near 19% annually supports a premium for category leaders but does not by itself justify any single multiple. SV030, SV015
CV037 A premium entry would become attractive if Fireworks demonstrates a credible path to 60% margins and net revenue retention above 120%. SV001
CV038 Usage-based comparables like Twilio and AI-software names like C3.ai bound the multiple range for consumption- and AI-exposed businesses. SV021, SV023
CV039 Preference stack and liquidation overhang are not publicly disclosed and must be diligenced before a late-stage entry. SV001, SV010
CV040 Salesforce and other large software comps illustrate mature-growth multiple compression that a maturing Fireworks would eventually face. SV022
来源
编号出版方标题引文
SO001 Fireworks AI Fireworks AI - Fastest Inference for Generative AI
SO002 Fireworks AI Fireworks AI Raises $250M Series C to Power the Future of Enterprise AI Today, we're announcing a $250 million Series C at a $4 billion valuation ... brings our total funding to over $327 million
SO003 Fireworks AI Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems We're thrilled to announce our $52M Series B funding round led by Sequoia Capital, raising our valuation to $552M.
SO004 Index Ventures Inference is the New Runtime: Our Investment in Fireworks Alongside co-founders Dmytro Dzhulgakov, Dmytro Ivchenko, and James Reed ... as well as Benny Chen, Chenyu Zhao, and Pawel Garbacki
SO005 Sequoia Capital Fireworks Founder Lin Qiao on Fast Inference and Small Models
SO006 The AI Insider Fireworks AI Closes $250M Series C to Lead the AI Inference Market
SO007 The AI Insider Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems
SO008 PYMNTS Fireworks AI Valued at $552 Million After New Funding Round
SO009 Tech Funding News NVIDIA, Sequoia invest in GenAI startup Fireworks AI's $52M round
SO010 The SaaS News Fireworks AI Raises $52 Million in Series B
SO011 AI Curator Fireworks AI Closes $250M Round, Eyes AI Inference Lead
SO012 AIM Media House Fireworks AI raises $250 million for enterprise AI infrastructure
SO013 Sacra Fireworks AI revenue, valuation & funding Sacra estimates that Fireworks AI hit $800M in annualized revenue in May 2026, up from about $305M at the end of 2025.
SO014 Scroll.media Fireworks AI has a valuation of $552 million. Ukrainians among the founders. the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SO015 The Stack Fireworks AI's Lin Qiao: The future is compound AI
SO016 TWIML AI Lin Qiao profile
SO017 Crunchbase Fireworks AI - Company Profile
SO018 AI Market Watch Fireworks AI - AI Startup Profile Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SO019 SiliconANGLE Fireworks AI raises $250M at $4B valuation to help enterprises with AI inference workloads
SO020 Business Wire Fireworks AI Raises $250M Series C to Lead the AI Inference Market
SO021 Orrick Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SO022 Tech Funding News PyTorch engineers' brainchild Fireworks AI closes $250M at $4B valuation
SO023 Exa Meet the Executive Team at Fireworks AI
SO024 GitHub Fireworks AI (fw-ai) GitHub organization
SO025 Fireworks AI Fireworks AI Careers
SO026 eesel AI An honest Fireworks AI review (2025): The good, the bad, and the ugly Fireworks excels at performance and model selection, but it is 'just the engine' - developers and businesses still need technical sophistication to build deployable solutions.
SM001 MarketsandMarkets AI Inference Market - Global Forecast to 2030 the AI inference market is expected to grow from USD 106.15 billion in 2025 to USD 254.98 billion by 2030, at a CAGR of 19.2%
SM002 Polaris Market Research AI Inference Market Size & Trends, Industry Report 2034
SM003 Research and Markets AI Inference Market Outlook 2026-2034
SM004 Vention State of AI 2026 - AI Market Size, Investment, and Industry Data
SM005 Precedence Research AI Inference Market Size and Forecast
SM006 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix By Q2 2026 the serverless inference market has consolidated around seven providers - Together, Fireworks, Anyscale, Groq, Cerebras, Replicate, and OctoAI.
SM007 Alatirok AI Inference Providers in 2026: 5-Way Comparison
SM008 Jimmy Research Fireworks AI - entity profile
SM009 Index Ventures Inference is the New Runtime: Our Investment in Fireworks Gartner projects GenAI model spend to nearly triple from $14 billion in 2025 to $39 billion by 2028
SM010 Sacra Fireworks AI revenue, valuation & funding
SM011 Sacra Together AI revenue, valuation & funding Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025.
SM012 Sacra Baseten revenue, valuation & funding
SM013 Fireworks AI Fireworks AI Raises $250M Series C
SM014 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SM015 Microsoft Azure Introducing Fireworks AI on Microsoft Foundry
SM016 Together AI Together AI - The AI Acceleration Cloud
SM017 Together AI Together AI Pricing
SM018 Baseten Baseten - Inference Platform
SM019 Groq Groq - Fast, low cost inference
SM020 Modal Modal - High-performance AI infrastructure
SM021 Replicate Replicate - Run AI with an API
SM022 Anyscale Anyscale - Scalable compute for AI
SM023 TokenMix Fireworks AI Review 2026
SM024 DeployBase Fireworks AI Pricing Breakdown
SM025 eesel AI An honest Fireworks AI review (2025) the industry expects this downward pricing pressure to intensify by 2025-2026, making it difficult for any single provider to maintain high profit margins
SM026 EU AI Act (artificialintelligenceact.eu) High-level summary of the AI Act
SP001 TokenMix Fireworks AI Review 2026: 99.8% Uptime vs Together and Groq Fireworks: 99.8% uptime + best function calling, 50+ models, $0.90/M. Together: 200+ models + cheap fine-tuning, $0.88/M. Groq: ultra-low latency, $0.59/M but lowest uptime (99.4%).
SP002 Sacra Together AI revenue, valuation & funding Together AI raised a $305M Series B in February 2025 led by General Catalyst ... valuing the company at $3.3B
SP003 Sacra Baseten revenue, valuation & funding
SP004 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP005 DatacenterDynamics AI chip company Groq raises $750m at $6.9bn valuation
SP006 Dataconomy AI chip startup Groq raises $750 million at a $6.9 billion valuation
SP007 The AI World Baseten raises $300M to scale AI inference
SP008 TechBuzz Groq Raises $750M at $6.9B Valuation to Challenge Nvidia's AI Dominance
SP009 Sacra Fireworks AI revenue, valuation & funding (competition section) Together AI is Fireworks' closest direct competitor ... Baseten raised a $300M Series E at a $5 billion valuation
SP010 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix
SP011 DeployBase Fireworks AI Pricing Breakdown vs competitors
SP012 Modal Modal - High-performance AI infrastructure
SP013 Replicate Replicate - Run AI with an API
SP014 Anyscale Anyscale - Scalable compute for AI
SP015 Baseten Baseten Pricing
SP016 Microsoft Foundry Fireworks models on Microsoft Foundry
SP017 Groq Groq - Fast, low cost inference
SP018 Together AI Together AI - The AI Acceleration Cloud
SP019 Alatirok AI Inference Providers in 2026: 5-Way Comparison
SP020 Walturn What is Fireworks AI? Features, Pricing, and Use Cases
SP021 createaiagent.net Fireworks AI: Optimized Inference Solutions
SP022 Fireworks AI Fireworks AI Raises $250M Series C
SP023 eesel AI An honest Fireworks AI review (2025) Critics note that, while Fireworks excels at performance and model selection, it is 'just the engine'.
SP024 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SP025 Together AI Together AI Pricing
SI001 Fireworks AI Fireworks AI Raises $250M Series C our annualized revenue has surpassed $280 million ... Growing our computation footprint 3-4x over the next year
SI002 Sacra Fireworks AI revenue, valuation & funding The company's gross margin sits at approximately 50% ... Fireworks has told investors it is targeting 60% gross margins
SI003 Fireworks AI Fireworks AI Pricing
SI004 TokenMix Fireworks AI Review 2026 - pricing breakdown Llama 70B $0.90/M, Llama 8B $0.20/M, DeepSeek V3 $0.50/M ... Reserved capacity ... approximately $4.80/hour
SI005 DeployBase Fireworks AI Pricing Breakdown: Cost Per Token
SI006 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SI007 Amazon Web Services Fireworks.ai Case Study the customer cut total costs by four times ... HIPAA and SOC2 Type II compliant
SI008 Markaicode Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM
SI009 Scroll.media Fireworks AI valuation and ARR the company is already profitable, growing at an astonishing x20 year-over-year, with $130 million in annual recurring revenue (ARR).
SI010 AI Market Watch Fireworks AI - AI Startup Profile Scaled to 15 trillion tokens processed daily and $315M+ annualized revenue by early 2026.
SI011 Digital Applied AI Inference Providers Compared: Q2 2026 Pricing Matrix
SI012 U.S. Securities and Exchange Commission (NVIDIA) NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SI013 U.S. Securities and Exchange Commission (AMD) Advanced Micro Devices Form 10-K (FY ended December 27, 2025)
SI014 U.S. Securities and Exchange Commission (MongoDB) MongoDB, Inc. Form 10-K (FY ended January 31, 2026)
SI015 Fireworks AI Fireworks AI Docs - Concepts
SI016 Fireworks AI FireOptimizer: Customizing latency and quality
SI017 Index Ventures Inference is the New Runtime
SI018 Fireworks AI Multi-LoRA: Personalize AI at scale
SI019 Sanjay Says Fireworks AI and Adaptive Speculative Execution
SI020 eesel AI An honest Fireworks AI review (2025) there is pressure for all inference providers to cut prices ... making it difficult for any single provider to maintain high profit margins
SI021 Crunchbase Fireworks AI - Company Profile
SI022 Business Wire Fireworks AI Raises $250M Series C
SI023 Tech Funding News Fireworks AI closes $250M at $4B valuation
SI024 Fireworks AI Fireworks AI Docs - Deploying LoRAs
SI025 Fireworks AI Fireworks AI Docs - Changelog
SE001 Markaicode Fireworks AI Architecture: Multi-Layer Inference Stack for 50K RPM Stateless request router ... Draft GPU pods running a small fast model ... Target GPU pods ... Distributed KV cache ... above 85 tokens/sec per GPU
SE002 Fireworks AI FireOptimizer: Customizing latency and quality for production
SE003 Fireworks AI Reinforcement Fine Tuning: Train expert open models to surpass closed
SE004 Fireworks AI Fireworks RFT: Build AI agents with fine-tuned open models
SE005 Fireworks AI Introducing Supervised Fine Tuning V2
SE006 Fireworks AI Optimizing Llama 4 Maverick on Fireworks AI Llama 4 Maverick became available day one on Fireworks with support for 1-million-token context ... custom attention via FireAttention
SE007 Amazon Web Services Fireworks.ai Case Study (HIPAA / SOC2) the Fireworks.ai inference solution built on AWS is HIPAA and SOC2 Type II compliant
SE008 Fireworks AI Speculative Decoding - Fireworks AI Docs
SE009 Sanjay Says Fireworks AI and Adaptive Speculative Execution
SE010 Fireworks AI Fireworks AI Docs - Introduction
SE011 Fireworks AI DeepSeek V3.1 now on Fireworks AI
SE012 Fireworks AI Fine-Tuning DeepSeek v3 & R1 to optimize quality, latency, & cost
SE013 TokenMix Fireworks AI Review 2026: uptime and function calling benchmarks FireFunction 92.1% multi-tool accuracy ... 99.8% uptime, highest in inference market
SE014 Fireworks AI How Cursor built Fast Apply using the Speculative Decoding API Cursor ... achieve 1000 tokens/sec for code generation use cases such as instant apply
SE015 Fireworks AI How Notion fine-tuned with Fireworks we reduced latency from about 2 seconds to 350 milliseconds
SE016 Fireworks AI How Sourcegraph scaled real-time code assistance with Fireworks
SE017 Sacra Fireworks AI - enterprise security posture zero data retention by default, SSO, audit logs, data residency controls, HIPAA and SOC2 compliance posture, and airgapped EKS deployments
SE018 GitHub Fireworks AI (fw-ai) GitHub organization and benchmarks
SE019 Fireworks AI Fireworks AI Dev Day 2025 Wrapped
SE020 Fireworks AI Multi-LoRA: Personalize AI at scale
SE021 Fireworks AI Fireworks AI Docs - Concepts
SE022 Fireworks AI Fireworks AI Raises $250M Series C (roadmap) Expand Our Product into a Comprehensive AI Creation Toolchain ... Growing our computation footprint 3-4x
SE023 Fireworks AI Fireworks AI - AI-native
SE024 Fireworks AI Fireworks AI Docs - Deploying LoRAs
SE025 eesel AI An honest Fireworks AI review (2025): documentation gaps Some reviews point to limited transparency around free usage, sporadic documentation, and potential support slowdowns
SE026 Walturn What is Fireworks AI? Features, Pricing, and Use Cases
SE027 DeployBase Fireworks AI Pricing and capabilities breakdown
SU001 Fireworks AI How Cursor built Fast Apply using the Speculative Decoding API Fireworks is way more performant than the open source engines and is what we use in production.
SU002 Fireworks AI How Notion fine-tuned models with Fireworks we reduced latency from about 2 seconds to 350 milliseconds
SU003 Fireworks AI Real-time code assistance: How Sourcegraph scaled with Fireworks
SU004 Fireworks AI How Upwork and Fireworks deliver faster proposals (Uma)
SU005 Fireworks AI Accelerating Code Completion with Fireworks Fast LLM Inference
SU006 Fireworks AI Fireworks AI Raises $250M Series C (customer scale) Fireworks now powers over 10,000 companies (a 10x increase from our Series B)
SU007 AI Market Watch Fireworks AI - notable customers and growth metrics Notable customers: Quora, DoorDash, Upwork, Cresta, Cursor, Liner, Superhuman, Sourcegraph, Tome, Samsung, Uber, Notion, Shopify
SU008 Fireworks AI Fireworks AI - Customers
SU009 Sacra Fireworks AI - customer base and expansion The customer base grew from roughly 1,000 companies at the time of the Series B to more than 10,000 companies by October 2025.
SU010 Scroll.media Fireworks AI developer growth 2024 The number of developers using Fireworks AI jumped from 12,000 in February 2024 to 23,000 by year's end.
SU011 Index Ventures Inference is the New Runtime (customer references) high-throughput, latency-sensitive applications at companies like Uber, DoorDash, Notion, Quora, and Upwork ... enterprise leaders like Samsung
SU012 Amazon Web Services Fireworks.ai Case Study (Sourcegraph / Cody) Cody doubled its completion acceptance rate ... Cody's backend latency accelerated by more than two times.
SU013 Fireworks AI Fireworks AI Series B (Cursor, Quora, Upwork, Superhuman) Superhuman ... used Fireworks to create Ask AI, a compound AI system
SU014 Fireworks AI Fireworks AI - AI-native customers
SU015 WorkingAgents Fireworks AI: The Compound Inference Engine
SU016 GitLab Inc. (SEC EDGAR) GitLab Inc. Form 10-K (FY ended January 31, 2026)
SU017 Sacra Fireworks AI - retention and expansion dynamics a single inference relationship can anchor a broader infrastructure dependency over time
SU018 eesel AI Fireworks AI alternatives and switching considerations
SU019 eesel AI An honest Fireworks AI review (2025)
SU020 Fireworks AI Fireworks AI homepage (customer logos)
SU021 TokenMix Fireworks AI Review 2026 - production usage
SU022 Sacra Fireworks AI - business model and ARPA Blended annualized revenue per company works out to roughly $28,000 across the full base
SU023 Fireworks AI Fireworks AI Blog index
SU024 Fireworks AI Fireworks AI at AWS re:Invent 2025
SU025 AI Market Watch Fireworks AI - geographic focus and industries
SR001 Sacra Fireworks AI - risks section the proprietary performance advantage in FireAttention and FireOptimizer is likely to compress ... Hyperscaler capture ... Hardware concentration
SR002 eesel AI An honest Fireworks AI review (2025): risks
SR003 eesel AI Fireworks AI alternatives (switching risk)
SR004 EU AI Act (artificialintelligenceact.eu) High-level summary of the AI Act
SR005 EU AI Act (artificialintelligenceact.eu) Article 53: Obligations for providers of general-purpose AI models
SR006 GDPR.eu What is GDPR, the EU's data protection law?
SR007 European Commission Regulatory framework for AI
SR008 U.S. Securities and Exchange Commission (NVIDIA) NVIDIA Corporation Form 10-K (FY ended January 25, 2026)
SR009 DatacenterDynamics Groq raises $750m at $6.9bn valuation (silicon competition)
SR010 Dataconomy Groq raises $750 million (NVIDIA challenge)
SR011 Digital Applied AI Inference Providers Pricing Matrix Q2 2026 (price pressure)
SR012 TokenMix Fireworks AI Review 2026 (SLA and pricing risk) Fireworks AI does not publish a formal SLA for its standard tier
SR013 Walturn What is Fireworks AI? (risks and lock-in)
SR014 U.S. Securities and Exchange Commission (Datadog) Datadog, Inc. Form 10-K (FY ended December 31, 2025)
SR015 U.S. Securities and Exchange Commission (Snowflake) Snowflake Inc. Form 10-K (FY ended January 31, 2026)
SR016 U.S. Securities and Exchange Commission (Confluent) Confluent, Inc. Form 10-K (FY ended December 31, 2025)
SR017 U.S. Securities and Exchange Commission (Cloudflare) Cloudflare, Inc. Form 10-K (FY ended December 31, 2025)
SR018 Orrick Fireworks AI Series C legal counsel
SR019 Fireworks AI Fireworks AI Terms of Service
SR020 NIST AI Risk Management Framework
SR021 Fireworks AI Fireworks AI Raises $250M Series C (use of funds / capital intensity)
SR022 SiliconANGLE Fireworks AI raises $250M at $4B valuation (valuation ramp)
SR023 Scroll.media Fireworks AI valuation ramp 552M to 4B
SR024 Index Ventures Inference is the New Runtime (founder dependency)
SR025 Advanced Micro Devices (SEC EDGAR) AMD Form 10-K (alternative silicon supply)
SR026 DeployBase Fireworks AI Pricing (margin pressure)
SR027 Alatirok AI Inference Providers 2026 (competitive risk)
SR028 Business Wire Baseten Raises $300M (capital asymmetry)
SR029 GitLab Inc. (SEC EDGAR) GitLab Form 10-K (AI vendor risk-factor comparable)
SR030 DigitalOcean (SEC EDGAR) DigitalOcean Form 10-K (infrastructure margin comparable)
SV001 Sacra Fireworks AI revenue, valuation & funding Fireworks AI is in talks to raise a new funding round at a $15 billion post-money valuation, with Index Ventures set to co-lead.
SV002 AI Weekly Fireworks AI Targets $15B Valuation in New Round
SV003 StartupNews.fyi Fireworks AI Seeks $15B Funding, Quadrupling Valuation
SV004 Yahoo Finance Fireworks AI Eyes $15 Billion Valuation In New Funding Talks
SV005 Sacra Together AI revenue, valuation & funding Based on 2024 revenue of $130M and a $1.25B valuation, the company traded at a 9.6x revenue multiple at its prior round.
SV006 Sacra Baseten revenue, valuation & funding
SV007 DatacenterDynamics Groq raises $750m at $6.9bn valuation
SV008 Fireworks AI Fireworks AI Raises $250M Series C at $4B valuation
SV009 SiliconANGLE Fireworks AI raises $250M at $4B valuation
SV010 Orrick Fireworks AI Raises $250 Million Series C at $4 Billion Valuation
SV011 U.S. Securities and Exchange Commission (Datadog) Datadog, Inc. Form 10-K (FY 2025)
SV012 U.S. Securities and Exchange Commission (Snowflake) Snowflake Inc. Form 10-K (FY 2026)
SV013 U.S. Securities and Exchange Commission (Cloudflare) Cloudflare, Inc. Form 10-K (FY 2025)
SV014 U.S. Securities and Exchange Commission (DigitalOcean) DigitalOcean Holdings Form 10-K (FY 2025)
SV015 a16z AI Inference Economics
SV016 eesel AI An honest Fireworks AI review (2025): margin and commoditization
SV017 U.S. Securities and Exchange Commission (Amazon) Amazon.com, Inc. Form 10-K (FY 2025)
SV018 U.S. Securities and Exchange Commission (Microsoft) Microsoft Corporation Form 10-K (FY 2025)
SV019 U.S. Securities and Exchange Commission (Oracle) Oracle Corporation Form 10-K (FY 2025)
SV020 U.S. Securities and Exchange Commission (Confluent) Confluent, Inc. Form 10-K (FY 2025)
SV021 U.S. Securities and Exchange Commission (Twilio) Twilio Inc. Form 10-K (FY 2025)
SV022 U.S. Securities and Exchange Commission (Salesforce) Salesforce, Inc. Form 10-K (FY 2026)
SV023 U.S. Securities and Exchange Commission (C3.ai) C3.ai, Inc. Form 10-K (FY 2025)
SV024 CryptoBriefing Fireworks AI reportedly seeks funding at $15 billion valuation
SV025 Briefs.co Fireworks AI Eyes $15B Valuation In New Funding Round
SV026 Index Ventures Inference is the New Runtime (thesis)
SV027 Tech Funding News Fireworks AI closes $250M at $4B valuation
SV028 AI Market Watch Fireworks AI - revenue and valuation profile
SV029 Scroll.media Fireworks AI valuation ramp 552M to 4B
SV030 MarketsandMarkets AI Inference Market - Global Forecast to 2030