Baseten
面向生产 AI 工作负载的高端推理基础设施
Baseten 是一家高质量 AI 推理基础设施公司,企业牵引力真实、品类位置也强;但公开财务披露太薄,还不足以把动量定价当成高确信买入。
封面要素
公司概况
Baseten 是一家总部位于 San Francisco 的推理基础设施公司,2019 年由 Tuhin Srivastava、Amir Haghighat、Phil Howes 和 Pankaj Gupta 创立。公司把自己定位为运行生产 AI 工作负载的软件层,可在自有云或客户环境中组合模型 API、专用推理、训练和企业级部署控制。公开客户证据覆盖 AI 原生产品和受监管工作负载,包括 Cursor、Clay、OpenEvidence、Abridge、Gamma、Patreon、Speechify、Writer、Hebbia、Wispr Flow 等。Baseten 在 2026 年 1 月以 $5B 估值融资 $300M,披露融资总额增至约 $585M,也强化了它作为后期私营基础设施公司的定位。
- 创始人
- Tuhin Srivastava, Amir Haghighat, Phil Howes, Pankaj Gupta
- 创立地点
- San Francisco, CA, USA
- 总部
- San Francisco, CA, USA
- 产品
- Baseten 销售面向自定义模型、模型 API、专用推理、训练工作流和复合 AI 编排的生产推理平台。Truss 框架是开发者入口;企业功能则强调多云部署、自托管、区域控制、合规范围,以及面向延迟敏感工作负载的性能调优。
- 客户
- 企业 AI 团队、AI 原生应用构建者,以及需要低延迟、高可靠模型服务并要求安全、混合部署或性能工程支持的受监管工作负载。公开客户证据在医疗 AI、开发者工具、GTM 自动化、语音和生产力软件中最强。
- 商业模式
- 基于用量变现:模型 API 按 token 计价,部署按 GPU 和 CPU 计算分钟计费,专用容量、支持以及自托管或对数据驻留敏感的环境通过议价的 Pro 或 Enterprise 合同销售。平台似乎还通过训练、Chains 和更重的企业工程支持扩张。
- 阶段
- Late-stage private (Series E)
- 融资情况
- 公开融资历史包括种子轮和 Series A 合计略高于 $20M,2024 年 Series B $40M,2025 年 2 月 Series C $75M,2025 年 9 月 Series D $150M,以及 2026 年 1 月以 $5B 估值完成的 Series E $300M。Business Wire 称披露融资总额约 $585M。
执行摘要
主要优势
- 站在 AI 栈高增长环节的高端位置:面向开源、自定义和混合负载的生产推理
- 近期融资和投资人阵容强,IVP、CapitalG、NVIDIA、Greylock、Spark 等机构都已下注
- Truss、专用推理、自托管和多云部署控制,给开发者和企业切入点都很清晰
- 医疗、代码、语音、GTM 和生产力负载里的公开客户证据,说明需求不局限于单一垂直
- 客户案例反复提到,相比此前推理方案,延迟、吞吐和成本都有明显改善
主要风险
- 公开收入、利润率、烧钱速度、客户集中度和留存数据仍未披露,投资人只能依赖估算而非申报财务
- 高端定价会被更便宜的 GPU 云压制,也会被超大规模云厂商把推理捆进更广云关系的做法挤压
- 可靠性和合规承销仍要看谈判条款、BAA 和定制 SLA,不能只看官网叙述
- 从 $5B 已完成估值跳到传闻中的 $11B 后续定价,若没有更好披露,很难支撑
- Baseten 似乎跑的是支持重、性能工程密集的模式,扩张难度可能高于纯软件叙事暗示
未决问题
- 经审计收入、毛利率、烧钱速度、现金跑道和客户集中度数据未公开
- 没有公开证据能解决 NRR、流失率或合同期限等留存指标
- 公开治理能见度有限;当前完整董事会、委员会和创始人持股未在已抓取材料中披露
- 医疗和受监管场景的承销仍需要生产账户的准确 BAA、DPA 和责任共担条款
- 任何传闻中 Series E 之后融资的经济性和条款均未公开
目录
01公司概况
1.1 身份、历史与领导层
Baseten 最容易理解为一家创始人主导的推理基础设施公司,而不是通用 MLOps 工具箱。它自己的历史把起点追溯到 2019 年底,当时四位创始人因为亲身经历过模型部署痛点而创办公司。当前法律页面把公司主体固定为 San Francisco 的 Baseten Labs, Inc.;官网、企业版和定价页面也一致围绕高性能推理、托管 API、训练或部署工作流来描述产品,而不是宽泛的横向软件。这个身份会影响本报告后续判断:Baseten 正把自己定位为软件层,在自有云或客户环境中运行生产 AI 工作负载,并用合规和数据控制功能服务更敏感的工作负载。对一家后期私营公司而言,领导层连续性也异常可见。公开页面显示 Tuhin Srivastava 是 CEO 兼联合创始人,Amir Haghighat 是 CTO 兼联合创始人;作者页仍把 Pankaj Gupta 和 Phil Howes 标为联合创始人。Series E 博文由四位创始人共同署名,进一步说明公司仍在讲一个以创始人为中心的领导故事。治理透明度是需要保留的风险点:公开语料清楚显示 2025 年新增一名董事,但没有给出完整现任董事名单、委员会结构或创始人持股图。[CO001, CO002, CO003, CO004, CO005, CO006]
| 指标 | 数值 / 状态 | 截至 | 置信度 | 备注 / 缺口 |
|---|---|---|---|---|
| 成立 | 2019 | 2019-01-01 | 高 | 抓取材料未呈现准确公开日期,因此年份是可靠锚点。 |
| 总部 | San Francisco, California(加州) | 2026-05-30 | 高 | 隐私政策给出了具体 San Francisco 地址。 |
| 法律实体 | Baseten Labs, Inc. 法律实体 | 2026-05-30 | 高 | 条款和隐私政策使用同一个法律实体名称。 |
| 当前阶段 | 私营,Series E | 2026-05-30 | 高 | Tracxn 和 2026 年 1 月轮次后的归档 PitchBook 资料提供支持。 |
| 最新融资 | $300M Series E,估值 $5B | 2026-01-20 | 高 | 由 IVP 和 CapitalG 领投,NVIDIA 和既有投资人也参与。 |
| 累计融资 | $585M | 2026-01-23 | 高 | BusinessWire 和市场数据来源对累计融资口径一致。 |
| 商业模式 | 按用量计费的 API token,加按分钟计费的算力 | 2026-05-30 | 中 | 公开定价清楚;企业合同折扣和最低消费不清楚。 |
| 部署模式 | Baseten Cloud、自托管和区域感知企业选项 | 2026-05-30 | 高 | 官方企业和医疗页面强调自托管与数据控制功能。 |
| 具名客户组合 | 具名客户:Abridge、Cursor、Clay、OpenEvidence、Notion、Speechify、Gamma | 2026-05-30 | 高 | 这些客户出现在招聘、客户中心、客户故事和 Series E 新闻材料中。 |
| 公开员工数 | 2026-05-30 | 低 | PitchBook 和 Tracxn 口径冲突,因此当前员工数应视为未解决。 |
快照行混合了稳定身份事实,以及当前运营和融资标记;null 表示抓取的公开材料不足以可靠支撑单一数字。
[CO001, CO002, CO003, CO004, CO006, CO007]| 人物 | 当前角色 / 公开头衔 | 公开适配度或覆盖证据 | 当前能见度 | 尽调含义 |
|---|---|---|---|---|
| Tuhin Srivastava(创始人) | CEO,联合创始人 | 他是融资和公司逻辑的公开发言人;作者页和融资报道都把他标识为 CEO。 | 高 | 创始人主导叙事仍是优势,但也带来 CEO 关键人依赖。 |
| Amir Haghighat(创始人) | CTO,联合创始人 | 作者页标识其为技术负责人;Series E 签名仍把他放在可见创始人组合中。 | 高 | 技术和产品可信度仍绑定创始高管。 |
| Phil Howes(创始人) | 联合创始人;独立报道称其为首席科学家 | 官方作者页和 Tech Funding News 显示,其围绕模型性能和研究仍保持创始人能见度。 | 中 | 即使具体组织架构不公开,科学领导力看起来仍根植于创始团队。 |
| Pankaj Gupta(创始人) | 联合创始人 | 官方作者页和 Series E 签名确认其连续性,但抓取材料没有呈现当前运营头衔。 | 中 | 职能覆盖比 CEO 和 CTO 更不透明。 |
| Jay Simons(董事) | Series D 后董事会成员 | Series D 明确称,他作为 BOND 领投融资的一部分加入董事会。 | 低 | 2025 年治理能见度有所改善,但完整董事会和委员会地图仍不完整。 |
各行覆盖创始人,以及抓取材料中明确的一名新增董事;已审阅公开材料没有披露完整高管名单或详细委员会结构。
[CO010, CO011, CO012, CO013, CO014, CO015]Baseten 当前叙事把创始人连续性、部署控制、客户验证过的性能和反复拿到资本,串成一条推理平台投资逻辑。
[CO004, CO005, CO006, CO008, CO010, CO011]1.2 融资、阶段与投资者基础
Baseten 的资本历史现在是最清晰的外部信号,说明公司已经进入后期 AI 基础设施。抓取语料支持一条路径:从资金不多的早期公司走到 Series E。Series A 博文称 Baseten 种子轮和 Series A 合计融资略高于 $20 million;Series B 公告新增 $40 million;Series C 新增 $75 million;Series D 新增 $150 million;Series E 又以 $5 billion 估值新增 $300 million。独立市场数据来源佐证了这些轮次的时间,并把 2025 年 9 月 Series D 估值放在约 $2.15 billion,显示公司在 2026 年 1 月轮前重估速度很快。投资者名单也在加深,而不是频繁换血。Greylock 和 South Park Commons 很早出现,IVP 和 Spark 在成长期变得可见,后续轮次又加入 BOND、CapitalG、Conviction、01A、BoxGroup 和 NVIDIA。这个模式重要,因为它同时意味着内部投资者反复支持,以及 AI 专业和平台型投资者范围扩大。但公开披露距离完整 cap table 仍差很远:抓取语料没有披露当前持股比例、投资者控制权、清算优先权,或足够可靠的董事会观察员图谱。[CO016, CO017, CO018, CO019, CO020, CO021]
| 投资人 / 利益相关方 | 材料中首次明确轮次 | 当前相关性 | 重要性 | 尽调请求 |
|---|---|---|---|---|
| 投资方:Greylock | Series A 轮 | 抓取的官方材料中最早明确具名的机构支持方 | 锚定公司早期形成,并在后续增长轮叙事中继续可见。 | 确认 Series E 后的当前持股和按比例跟投情况。 |
| 投资方:South Park Commons | Series A 轮 | 早期网络型支持方,仍出现在后续公司历史中 | 代表创始人网络支持,而不只是后期资本。 | 确认多轮估值上调后,SPC 是否仍持有有意义股权。 |
| 投资方:IVP | Series B 轮 | 反复领投的增长期投资人,也领投或锚定后续轮次 | 在 B、C、E 轮反复出现,是最清晰的长期财务赞助方之一。 | 确认 Series E 的董事会权利、预留资金行为和集中度。 |
| 投资方:Spark Capital | Series B 轮 | 2024–2025 轮次中可见的早期增长投资人 | 帮助说明从生成式 AI 扩张期到后续融资动量的连续性。 | 确认当前持股,以及 Spark 在 Series E 后是否仍参与。 |
| 投资方:01A | Series C 轮 | 后期投资人,仍在后续融资数据中具名 | 把 Baseten 连接到 Adam Bain 和 Dick Costolo 网络中的运营者投资人支持。 | 确认 01A 是否有治理权,还是只有经济敞口。 |
| 投资方:BOND | Series D 轮 | Series D 领投方,并参与后续 Series E | 是 2025 年估值上台阶和董事会演进的重要标记。 | 确认 BOND 在 D 到 E 过渡中是否加入特殊条款。 |
| 投资方:CapitalG | Series D 轮 | D 轮加入,并共同领投 E 轮 | 围绕 Google 生态分发和基础设施可信度,可能提供有价值战略网络。 | 澄清除纯股权支持外,是否存在商业合作或渠道重叠。 |
| 投资方:NVIDIA | Series E 轮 | 最新轮次战略投资人 | 可能影响硬件获取、性能协作和 AI 基础设施圈层信号。 | 确认关系是否包含商业承诺或优先硬件访问。 |
| 投资方:Conviction | Series C 轮 | C、D、E 时代披露中均可见 | 增加 AI 专业投资人支持,并公开倡导推理层逻辑。 | 确认当前持股和董事会或观察员权利。 |
| 投资方:BoxGroup | Series D 轮 | 后续轮次投资人名单中仍然具名 | 即使股权结构变深,也显示早期网络投资人持续支持。 | 确认后续轮次中持仓规模是实质参与还是象征性参与。 |
这张地图列举了经验证公开融资来源中从 Series A 到 Series E 明确具名的投资人;它不是完整股权结构表,也不披露持股比例或清算优先权。
[CO016, CO017, CO018, CO019, CO020, CO022]1.3 产品范围、规模证据与里程碑
Baseten 的产品和规模故事已经足以解释它为何能在约一年内三次融资,但仍要带着保留读。官方材料如今把公司放进更宽的推理平台叙事:云端和自托管部署选项、企业控制、模型 API,以及按量计费的计算定价。最强外部证据来自客户和合作伙伴。NVIDIA 案例研究称,Baseten 将冷启动从最多 5 分钟降至 5–10 秒,并用 TensorRT-LLM 让一位客户的推理性能翻倍。客户案例称,OpenEvidence 每周在 Baseten 上处理数十亿次请求,Gamma 每天为 70+ million 用户生成约 3 million 张图片,Speechify 将每百万字符成本降低 44%,Patreon 将 GPU 成本降低 70%。这些数字支持一个判断:Baseten 正在医疗、生产力、创作者和 GTM 软件中承载有意义的生产工作负载。不过第一章尽调应保留三面警示旗:不同数据供应商的员工数不一致,融资公告之外的治理细节仍薄,独立可达性监测缺少足够事故细节,无法完整评估可靠性历史。因此,正确结论不是 Baseten 缺少规模,而是这家公司在运营上已经重要,但相对于自身估值仍异常私密。[CO005, CO006, CO007, CO008, CO021, CO028]
| 日期 | 事件 | 类型 | 金额 / 估值 / 状态 | 参与方 | 含义 |
|---|---|---|---|---|---|
| 2019-01-01 | 创始人为解决 ML 部署痛点创办 Baseten | 创立 | 公司成立 | 创始人:Tuhin Srivastava;Amir Haghighat;Phil Howes;Pankaj Gupta | 建立当前推理优先叙事的起点。 |
| 2021-05-01 | 约 18 个月开发后,早期产品低调发布;Series A 文章突出公开 beta | 产品 | 公开 beta 阶段 | Baseten 创始人 | 说明公司在后续资本冲刺前很久,就已从内部构建转向市场测试。 |
| 2022-04-26 | Series A 里程碑正式确认早期投资人支持 | 融资 | > $20M 累计种子轮 + Series A | 投资方:Greylock;South Park Commons;Lachy Groom;Ray Tonsing;天使投资人 | 验证原始模型部署产品愿景的早期需求。 |
| 2024-03-04 | Series B 增加增长期资本 | 融资 | $40M | 投资方:IVP;Spark;Greylock;South Park Commons;Lachy Groom;Base Case | 推动 Baseten 从早期 MLOps 根基转向更广泛的生成式 AI 基础设施扩张。 |
| 2025-02-19 | Series C 把融资与公开规模主张绑定 | 规模 | $75M;工作负载覆盖数千块 GPU;触达数百万终端客户 | 投资方:IVP;Spark;Greylock;Conviction;South Park Commons;Basecase;01A 相关投资人 | 说明在后期估值跳升前,基础设施规模已成为叙事核心。 |
| 2025-09-05 | Series D 融入新增长资本,并新增一名董事 | 治理 | $150M,估值约 $2.15B | 参与方:BOND;CapitalG;Conviction;Jay Simons | 资本形成和治理成熟开始同步推进。 |
| 2026-01-14 | WorkOS 访谈突出创业公司计划,并把语音列为新兴模态 | 产品 | 新 GTM 计划和语音重点 | 参与方:Philip Kiely;WorkOS 访谈者 | 暗示 Baseten 正扩大市场覆盖,并在下一阶段优先推进语音工作负载。 |
| 2026-01-20 | Series E 确立 Baseten 作为后期推理平台公司的位置 | 融资 | $300M,估值 $5B;前一年内第三次融资 | 投资方:IVP;CapitalG;NVIDIA;01A;Altimeter;Battery Ventures;BOND;BoxGroup;Blackbird Ventures;Conviction; Greylock | 确认投资人对相当规模的独立推理基础设施仍有胃口。 |
当抓取的公开来源支持时间但不给精确日期时,仅有年份的日期使用 1 月 1 日,仅有月份的日期使用该月第一天。
[CO001, CO009, CO016, CO017, CO018, CO019]时间线显示,Baseten 从 2019 年创立,进入压缩的 A 到 E 轮融资序列,并在 2026 年初讲出更宽的推理平台故事。
如果抓取来源没有给出精确公开日期,只到年份或月份的里程碑就用 1 月 1 日或所引月份第一天。
[CO001, CO016, CO017, CO018, CO019, CO020]这些 KPI 不是内部财务报表;它们是抓取语料中最清晰的公开规模和客户结果标记。
客户指标来自单个案例研究,应视为证明点,而不是 Baseten 自身的合并运营仪表盘。
[CO025, CO027, CO031, CO032, CO033, CO034]1.4 要点展示
02市场分析
2.1 市场边界、纳入支出与替代方案
Baseten 最适合被理解为生产推理平台,而不是通用云或模型实验室。纳入支出是团队已经有模型或模型端点之后,为打包、部署、运行、监控、计量和保护 AI 工作负载所需的层:模型 API、专用部署、复合 AI 编排、可观测性、计费,以及把延迟和正常运行时间守在生产目标内所需的支持。这个市场边界窄于完整企业 AI 堆栈,因为 Baseten 并不把数据湖、BI、通用应用开发或宽泛的智能体生产力套件作为价值主张的中心。它也窄于前沿模型研发,因为 Baseten 帮助团队把模型运营起来,而不是发明模型。最接近的替代方案是超大规模云厂商 AI 平台、内部 GPU 基础设施,以及 Modal、Replicate、Runpod 等专业 GPU 云。Baseten 的相邻机会位于核心部署层上下各一步:能直接进入推理的训练,以及模型实验室通过白标 API 变现。[CM001, CM002, CM003, CM004, CM005, CM006]
| 细分 / 类别 | 纳入支出 | 排除支出 | 买方 / 付款方 | 与 Baseten 的相关性 |
|---|---|---|---|---|
| 生产推理平台 | 模型服务运行时、自动扩缩容、可观测性、计费、支持、安全控制 | 基础模型研发、通用数据仓库、通用应用开发工具 | AI 产品负责人或平台团队;产品或 IT 预算付款 | Baseten 明确定位的核心市场 |
| Model APIs(模型 API) | 按用量计费的推理端点、token 计量、OpenAI 兼容访问 | 闭源模型所有权或应用层 SaaS 功能支出 | 应用工程师或产品团队;初期由工程预算付款 | 低摩擦入口和评估切入点 |
| 专用 / 自托管推理 | 专用 GPU 容量、自托管、数据驻留、企业支持 | 通用托管机房、通用 Kubernetes 服务、非托管 GPU 预留 | AI 平台负责人、CIO/CTO、安全或采购利益相关方 | 敏感或规模化工作负载的企业扩张路径 |
| 复合 AI 编排 | 多模型链、硬件感知编排、工作流优化 | 通用工作流自动化或 iPaaS 工具 | ML 工程师或应用工程负责人;产品 / 平台预算付款 | 多模态和智能体工作负载的重要扩张层 |
| 模型实验室 API 商业化 | 白标 API 端点、速率限制、API key、计费和计量 | 消费者计费软件或支付处理商 | 模型实验室或前沿模型产品团队;R&D / 平台预算付款 | Frontier Gateway 暴露出的独立细分市场 |
| 训练到推理闭环 | 托管训练任务和检查点晋升到推理 | 前沿研究支出,以及没有部署意图的纯实验 | ML 研究负责人或平台团队;R&D 预算付款 | 加深平台黏性的相邻领域,但不是主要市场镜头 |
边界行混合了 Baseten 核心市场和紧邻支出层。目的在于应用公开市场规模估计前,先界定商业切入点包含哪些支出。
[CM001, CM002, CM003, CM004, CM006, CM008]2.2 多种测算口径,以及为什么不能合并成一个 TAM
公开市场规模证据方向上很强,但品类边界仍然混乱。Technavio 更窄的 AI 推理即服务品类在 2025 年已经达到 USD 85.25 billion,并预计到 2030 年每年增长 22.1%。Fortune Business Insights 使用更宽的 AI 推理口径,认为该品类 2026 年达到 USD 117.80 billion;Mordor Intelligence 则把相邻企业 AI 市场测算为 2026 年 USD 114.87 billion,并显示软件或平台层和云部署主导支出。这些数字不能相加,因为它们部分重叠且定义不同,但它们确实交叉指向同一个结论:Baseten 追逐的不是小众预算项。估值时有用的纪律,是从宽泛 AI 平台支出不断下沉到 Baseten 实际竞争的、更窄的生产推理楔子。这个更窄的楔子以云优先、平台占比高、北美集中为特征,规模已经足够大,不需要假设 Baseten 拿到夸张市场份额才能支撑有意义的机会。公开数据没有提供的是干净的 Baseten 专属 SAM 或 SOM。[CM009, CM010, CM011, CM012, CM013, CM014]
| 镜头 | 发布方 | 基准年 / 预测期 | 地区 | 指标 | 数值 | 局限 |
|---|---|---|---|---|---|---|
| AI inference-as-a-service 市场规模 | 发布方:Technavio | 2025 基准,2026-2030 预测 | 全球 | 市场规模 | 规模 USD 85.25B | 较窄的服务化推理类别;仅摘要页 |
| AI inference-as-a-service 增长 | 发布方:Technavio | 2026-2030 | 全球 | CAGR(年复合增长率) | 22.1% | 预测值,不是当前收入;类别范围不同于更广义推理报告 |
| 更广义 AI 推理市场规模 | 发布方:Fortune Business Insights | 2026 | 全球 | 市场规模 | 规模 USD 117.80B | 更广义执行市场,横跨云、边缘和本地部署 |
| 更广义 AI 推理增长 | 发布方:Fortune Business Insights | 2026-2034 | 全球 | CAGR(年复合增长率) | 12.98% | 预测期比 Technavio 更长 |
| 相邻企业 AI 市场规模 | 发布方:Mordor Intelligence | 2026 | 全球 | 市场规模 | 规模 USD 114.87B | 范围远宽于单独推理 |
| 企业 AI 中平台占比较高的切片 | 发布方:Mordor Intelligence | 2025 | 全球 | 软件 / 平台占比 | 65.89% | 更广义企业 AI 市场占比,不是 Baseten 专属 SAM |
| 云优先部署镜头 | 发布方:Mordor Intelligence | 2025 | 全球 | 云部署占比 | 67.33% | 企业 AI 收入占比,不是仅推理支出 |
| 大型企业买方集中度 | 发布方:Mordor Intelligence | 2025 | 全球 | 大型企业占比 | 71.43% | 对买方集中度有用,不是直接 TAM |
| 受监管医疗增长切入点 | 发布方:Mordor Intelligence | 2026-2031 | 全球 | 医疗 CAGR | 20.77% | 垂直增长镜头,而不是整体市场规模 |
这些行是规模镜头,不是可相加市场总量。公开来源使用的定义彼此重叠,最安全用法是三角校验,而不是算术相加。
[CM009, CM010, CM012, CM013, CM014, CM015]视角从广义企业 AI 支出逐步收窄,落到 Baseten 更能守住的滩头:性能敏感、合规的生产推理。
这个金字塔是一条收窄的逻辑链,不是可相加模型。中间层混合了市场份额和市场规模,因为公开来源没有发布一套干净的 Baseten 专属层级。
[CM005, CM009, CM013, CM014, CM015, CM020]专业供应商公开的 GPU 小时费率区间说明,原始基础设施定价在这个市场已经变得相当透明。
区间来自 HostFleet 2026 年 4 月的供应商公开价格矩阵。它们不是按性能归一化的基准测试结果,也不包含企业协商折扣。
[CM031, CM032, CM043, CM044, CM047]2.3 买方、用户与付款方分层
公开买方证据指向三个尤其相关的细分。第一类是 Gamma 这类 AI 原生产品团队,产品或工程负责人关心上线速度、低延迟,以及不组建专门 ML 基础设施团队也能低成本服务开源模型。第二类是 Writer 这类企业 AI 平台团队和模型构建者,用户是 ML 工程师或数据科学家;一旦工作负载变成专用、多 GPU 或合规敏感,部署决策就会扩大到平台、安全和采购。第三类是 OpenEvidence 这类医疗受监管垂直部署,可靠性、数据处理,以及无需签下大额 GPU 承诺也能扩容,都会成为明确筛选标准。Baseten 的包装以不同方式服务这些细分:基于用量的计划和模型 API 降低实验门槛;Enterprise、自托管、SSO 或 SCIM、合规政策和计费 API 则说明大型买方期待治理、归因和受控推出。因此,预算所有者一部分可观察、一部分需要推断:它可能从产品或工程预算起步,随着部署更具战略性而迁移到中央平台或 IT 预算。[CM016, CM017, CM021, CM022, CM023, CM024]
| 细分 | 买方 | 用户 | 付费方 | 工作流 | 预算负责人 | 采用触发因素 |
|---|---|---|---|---|---|---|
| PLG AI 应用团队 | 工程副总裁或产品负责人 | 应用工程师和 ML 工程师 | 产品或工程预算 | 先用 Model APIs 做原型,再迁到专用容量 | 产品工程 | 上线日延迟、可靠性和更低的开放模型成本 |
| AI 原生初创平台团队 | CTO 或 AI 负责人 | ML 工程师和数据科学家 | 基础设施或平台预算 | 用托管开放模型服务替代对闭源模型的依赖 | 工程 / 平台 | 需要性能,又不想招聘基础设施专门团队 |
| 大型企业 AI 平台团队 | AI 平台负责人、CIO 或 CTO | 平台工程师、ML 工程师、数据科学家 | 中央平台或 IT 预算 | 在多个业务单元部署合规生产推理 | 平台 / IT | 专用容量、SSO/SCIM、合规政策、云用量承诺 |
| 受监管医疗 AI 工作负载 | 工程副总裁、CTO 或临床产品负责人 | ML 工程师或应用工程师 | 平台加安全 / 合规预算 | 医疗搜索、转录或面向患者的助手部署 | 平台加安全 | HIPAA、可用性和数据控制要求 |
| 模型实验室或自研模型厂商 | 研究产品负责人或商业化负责人 | 推理工程师和研究工程师 | 研发或平台预算 | 通过 Frontier Gateway 做白标 API 变现 | 研发 / 平台 | 需要销售推理能力,但不想自建面向客户的控制平面 |
| 复合 AI / 多模态团队 | AI 应用负责人或资深工程师 | 全栈工程师加 ML 工程师 | 产品加平台预算 | 基于 Chains 在多模型、多机器之间编排 | 产品 / 平台 | 单体式部署带来的延迟和 GPU 浪费 |
买方和预算负责人字段把直接披露的产品打包方式与基于客户案例的谨慎推断合并呈现。公开证据对用户和触发因素更强,对实际签字权人的指向较弱。
[CM016, CM021, CM022, CM023, CM024, CM025]定性契合度热力图,展示 Baseten 的合规、云和高端支持主张在哪些细分市场看起来最强。
评分综合了云部署份额、医疗增长、合规要求和可见价格压力等公开证据;它们不是实测赢率数据。
[CM015, CM017, CM029, CM043, CM046, CM047]2.4 部署价值链与采用路径
Baseten 的价值链起点,是一个已经存在、需要变成可靠生产服务的开源模型、自定义模型或专有模型。之后,日常用户通常是模型、平台或应用工程师,负责打包工作负载并评估延迟、吞吐和成本。下一道门槛主要是组织性的,而不是技术性的:当工作负载需要专用容量、数据驻留控制或身份集成时,安全、合规和采购检查会加强。Baseten 随后位于编排层,决定工作负载通过 Model APIs、Dedicated Inference、Chains、Frontier Gateway,还是自托管或混合部署来运行。该层下面是真正的云和 GPU 底座;它在经济上仍然关键,因为容量、价格和区域可用性会直接决定毛利率和可靠性。Baseten 的客户故事显示,公司试图把价值捕获从原始 GPU 租赁上移到性能工程、部署工具和运营支持,因为客户解释为何没有继续自建时,引用的正是这些层。[CM021, CM025, CM027, CM028, CM029, CM030]
Baseten 位于模型创建和终端用户流量之间,试图通过掌握部署、控制和性能运营,捕获高于原始 GPU 供应的价值。
这条价值链综合了产品包装、客户故事和竞争对手文档;它是市场结构图,不是 Baseten 的内部流程图。
[CM021, CM025, CM027, CM028, CM029, CM030]2.5 增长驱动、采用约束与市场纪律
无论看供应商材料还是分析师材料,最强增长驱动都很清楚。开源模型进步足够快,产品团队越来越想要为这些模型优化的基础设施,而不是依赖封闭模型;实时和复合 AI 工作负载让延迟和吞吐在经济上可见;企业买方正在从试点走向生产,尤其是在受监管数据、正常运行时间或模型性能调优重要的场景。Baseten 自己的案例研究用延迟、每秒 token、图片吞吐和维护负担下降等具体主张支持这个判断。但这不是一个容易的市场。硬件供应约束、关税和高加速器价格仍是结构性逆风。人才短缺和遗留系统集成复杂度会拖慢企业买方推出。公开竞争定价也显示,原始 GPU 小时经济性非常残酷:更便宜的专业云和更宽的超大规模云套件都会挤压独立推理供应商。因此,对 Baseten 的正确市场视角不是全部 AI 基础设施,而是那部分值得为高端支持、合规和性能付费、足以抵消更高标价的推理工作负载。公开数据支持这个楔子,但还不支持精确的经济护城河或清晰可测的可服务市场。[CM031, CM032, CM033, CM034, CM035, CM036]
| 因素 | 方向 | 时点 | 含义 | 尽调问题 |
|---|---|---|---|---|
| 开源前沿模型提升性价比 | 驱动因素 | 当前 | 成本敏感型产品会更偏向专业推理平台,而不是闭源模型 API | Baseten 收入中已有多少比例来自开放模型工作负载? |
| 实时与复合 AI 对延迟敏感 | 驱动因素 | 当前 | 提高客户为性能工程、编排和自动扩缩容付费的意愿 | 用量中延迟关键型与离线批处理各占多少? |
| 云优先的企业 AI 部署 | 驱动因素 | 当前 | 许多团队更可能采用托管推理,而不是自建基础设施 | Baseten 需求中云优先账户与自托管账户各占多少? |
| 受监管行业对合规和数据控制的需求 | 驱动因素 | 当前 | HIPAA、区域限制和混合 / 自托管部署由此成为切入点 | 企业管线中有多少比例需要受监管部署边界? |
| GPU 供应约束和关税压力 | 约束因素 | 当前 | 推高销货成本,也可能限制可用容量 | 哪种预留容量策略或云多元化安排能保障供应? |
| 技能缺口和集成复杂度 | 约束因素 | 中期 | 拖慢企业落地,增加实施负担 | 部署工作中有多少已产品化,多少仍偏服务交付? |
| 专业 GPU 云的价格竞争 | 约束因素 | 当前 | 按商品化 GPU 小时对比,Baseten 账面价格会显得偏贵 | 在公开标价更高的情况下,Baseten 在哪些场景能稳定胜出? |
| 超大规模云厂商平台捆绑 | 约束因素 | 中期 | 更完整的原生云套件可能吸走本应流向专业推理厂商的预算 | 哪些工作负载真正需要专业厂商,而不是超大规模云厂商原生栈? |
| 单位经济和支持服务附着率不透明 | 约束因素 | 当前尽调 | 公开材料看不出高端定位能否转化为持久利润率 | 索取产品级毛利率和企业折扣数据。 |
驱动和约束行混合了第三方市场报告、Baseten 产品主张、客户证据和独立定价矩阵。它们用于搭建尽调框架,不是加权评分模型。
[CM031, CM032, CM033, CM034, CM038, CM039]2.6 要点展示
03竞争格局
3.1 竞争格局与任务场景覆盖
Baseten 位于一个拥挤的中间层,一边是低摩擦的无服务器推理同行,另一边是大型云既有厂商。最接近的直接替代品是 Modal、Replicate 和 Runpod:它们都让开发者无需直接拥有基础设施也能把模型放到 GPU 上,但各自压缩堆栈的方式不同。Modal 优化 Python 原生无服务器计算,Replicate 优化社区模型和极低摩擦 API,Runpod 则通过 Pods 和 Serverless 提供便宜的原始容量。再往上是 AWS Bedrock/SageMaker、Google Vertex AI 和 Azure ML,它们较少争夺独立开发者体验,更多靠采购杠杆、治理和既有云承诺竞争。再往下是现状替代方案:基于开放打包标准和租用 GPU 内部自建。Baseten 进一步拓宽战场,因为它销售的不只是部署,还包括训练、多步骤编排,以及面向模型实验室的白标 API 变现。这种宽度意味着公司并不处在双寡头赛道;独立数据集和公司材料都指向一个碎片化、多类别格局,买方会根据速度、控制、信任或成本偏好,在托管推理、原始算力、超大规模云工具或自管堆栈之间替换。[CP001, CP002, CP004, CP005, CP006, CP007]
| 竞争对手 | 类别 | 规模 / 融资 | 目标客群 | 差异化 | 局限 |
|---|---|---|---|---|---|
| Modal | 直接的无服务器同类 | $30/mo Starter 抵扣;Team 方案 $250/mo + 计算费用 | AI 工程师和初创公司 | Python 优先的无服务器 DX、即时自动扩缩容、可观测性 | 无公开自托管选项;企业控制集中在付费层 |
| Replicate | 直接托管 / API 同类 | 数千个社区模型;通过 Cog 自定义部署 | 开发者、原型团队、模型实验者 | 一行 API、模型市场、微调 | 私有模型需为启动和空闲时间付费,公开披露的企业级能力更薄弱 |
| Runpod | 原始 GPU 云 / 无服务器替代 | 750,000+ 开发者;Pods + Serverless + Clusters | 成本敏感的 AI 构建者和基础设施投入重的团队 | 公开原始 GPU 价格最低、SKU 多、扩容快 | 服务栈更偏 DIY,交钥匙式推理生命周期工具更少 |
| AWS Bedrock / SageMaker | 超大规模云厂商既有玩家 | AWS 规模的数据 / AI 平台,带供应商 / 模型菜单 | 已深度绑定 AWS 的企业 | 采购杠杆、治理、广泛生态 | 定价复杂,云锁定更强 |
| Google Vertex AI | 超大规模云厂商既有玩家 | 200+ 个 Google 和第三方模型 / 工具 | GCP 企业和平台团队 | Model Garden、流水线、集成数据 + AI 栈 | 管理费和 GCP 依赖让简单成本对比变复杂 |
| Azure ML | 超大规模云厂商既有玩家 | Azure 原生 ML 平台,99.9% SLA | 以 Azure 为中心的企业和受监管企业 | 中央化 Studio、模型目录、Azure 安全姿态 | Azure 服务单独收费,且没有公开多云叙事 |
| 内部自建(Truss/Cog + 租用 GPU) | 现状方案 / 内部自建 | 在自有或租用基础设施上使用可移植开源打包 | 平台工程能力强的团队 | 控制力最高,软件锁定最低 | 扩展、可靠性和合规运营负担最高 |
| 自建品牌 API 的模型实验室 | 相邻 / 可能进入者 | 直接拥有 API,并有自定义计费和计量界面 | 前沿模型厂商和专业实验室 | 自有品牌、自有客户关系、直接变现 | 没有托管合作伙伴,容量规划和企业运营很难长期维持 |
各行比较买方完成同一部署任务的主要路径,包括直接同类、既有厂商和内部自建替代方案。
[CP004, CP005, CP006, CP007, CP008, CP009]对自助易用性与企业控制 / 可移植性做序位评分。
[CP004, CP006, CP007, CP023, CP025, CP028]3.2 能力与定价对比
当买方想要托管推理平台,而不只是 GPU 租赁或一行演示 API 时,Baseten 的比较最有利。公开材料显示,它的堆栈组合了自定义模型打包、OpenAI 兼容的 Model APIs、训练、Chains 编排、企业部署模式,以及围绕低延迟优化技术构建的运行时。Modal 是最鲜明的开发者体验对照:清晰的 无服务器定价、慷慨的月度额度和明确的 GPU 并发限制,让它对主要需要弹性 Python 计算的团队很有吸引力。Replicate 对原型和模型发现更轻量,但其私有模型经济性包含专用硬件的启动设置和闲置时间。Runpod 是价格地板替代品,公布更便宜的原始小时和按秒 GPU 费率,同时把更多服务生命周期留给客户。超大规模云厂商更难逐项比较,因为 Bedrock、Vertex 和 Azure ML 把模型访问包进更宽的云计费、治理和平台费用。净看,Baseten 的公开标价透明且功能丰富,但它卖的显然是性能、可移植性和支持,而不是商品化算力。只有当客户看重总体生产结果超过最便宜的公开 GPU 小时时,这个楔子才成立。[CP003, CP011, CP012, CP013, CP014, CP015]
| 采购标准 | Baseten | Modal | Replicate | Runpod | Bedrock / SageMaker | Vertex AI | Azure ML |
|---|---|---|---|---|---|---|---|
| 自定义模型打包框架 | Truss | Python 函数 | Cog | 容器 / handler 模式 | 自定义训练 + 部署 | 自定义训练 + 部署 | 模型目录 + 部署 |
| 兼容 OpenAI 的托管开放模型 | 是 | unknown | 部分支持 | unknown | 部分支持 | 部分支持 | 部分支持 |
| 同一平台托管训练 | 是 | unknown | 仅微调 | 是 | 是 | 是 | 是 |
| 自托管 / 客户云选项 | 是 | unknown | unknown | unknown | AWS 以外无公开 BYOC | 无公开多云选项 | 无公开多云选项 |
| 多云 / 云中立路由 | 是 | 是 | unknown | 多区域 / 声称无锁定 | 否 | 否 | 否 |
| 企业信任姿态 | SOC2 + HIPAA + 单租户 | 企业 SSO / 审计 / HIPAA | unknown | SOC2 Type II | 企业治理 | 企业治理 | 99.9% SLA + Azure 控制 |
| 内置多步编排 | Chains | 通用函数 | 仅自定义代码 | 队列 + 无服务器 | 更广的平台服务 | 流水线 + 智能体 | 更完整的 ML 工作台 |
| 公开标价透明度 | 高 | 高 | 中 | 高 | 中 | 中 | 低 |
标为未知的单元格反映公开证据缺失;该矩阵比较采购标准,不评判基准测试赢家。
[CP013, CP014, CP015, CP019, CP020, CP021]| 供应商 | 公开产品 | 合同模式 | 价格信号 | 已含能力 | 含义 |
|---|---|---|---|---|---|
| Baseten Basic | 自定义 + 开源部署 | $0/mo + 用量 | 公开 GPU 和按 token 计价表;声称不收空闲费 | 专用部署、Model APIs、训练 | 生产工作负载的透明入口 |
| Baseten Pro / Enterprise 企业套餐 | 报价制 | 销售主导 / 折扣 | 优先计算、自定义 SLA、自托管、量级折扣 | 专属支持、数据驻留、企业控制 | 增购靠能力广度和支持,而不是更低公开标价 |
| Modal Starter | 无服务器计算 | $0 + 计算费用 | $30/mo 抵扣;10 个 GPU 并发 | 日志、区域选择、无服务器原语 | 很适合原型和小团队起步 |
| Modal Team | 无服务器计算 | $250/mo + 计算费用 | $100/mo 抵扣;50 个 GPU 并发 | 自定义域名、静态 IP、回滚 | 仍以计算为中心,可随初创团队扩展 |
| Replicate 私有模型 | 面向自定义模型的专用硬件 | 按秒计费,包含设置 / 空闲 / 活跃 | 无固定席位方案;实例在线即付费 | 通过 Cog 自定义模型、自动扩缩容 | 常驻自定义部署成本可能变高 |
| Runpod Secure Cloud | 原始 GPU 实例 | 按小时租用 GPU | 示例标价包括 A100 $1.39/hr、H100 PCIe $2.89/hr | 可靠 pods、广泛 GPU 菜单 | 给愿意自管的买方提供成本下限 |
| Runpod Serverless | flex 或 active 工作进程 | 按秒 | H100 计费:flex $0.00116/s,active $0.00093/s | API 端点、队列、快速冷启动 | 适合突发推理和可缩容至零的工作负载 |
| AWS Bedrock | 按供应商 / 模型划分的 API | 按 token + 分层服务 | 批处理标价比按需低 50% | 托管模型访问加 AWS 附加服务 | 既有 API 路径好接入,但账单复杂度更高 |
| Google Vertex AI | 智能体 / 模型平台 | 用量 + 计算 + 费用 | 计算 / 存储加管理费;流水线 $0.03/run | 笔记本、Model Garden、流水线 | 最适合已有 GCP 资产内使用 |
| Azure ML | Azure 原生 ML 平台 | 消耗的 Azure 服务 | 无单独 Azure ML 费用 | Studio、模型目录、部署 | 对 Azure 优先买方有采购优势 |
公开标价只能在标题层面比较;谈判折扣和特定工作负载成本在多数企业交易中仍不公开。
[CP011, CP012, CP016, CP017, CP018, CP019]呈现类别层面的能力强弱,而不是逐个厂商的基准测试说法。
[CP014, CP023, CP024, CP025, CP028, CP031]3.3 分发力、切换成本与信任姿态
Baseten 最强的非性能论点,是客户既能保留控制权,又能避开自建推理平台的运营负担。多云路由、自托管部署和单租户选项,有助于赢下担心把关键工作负载锁进单一超大规模云厂商、或需要更严格数据驻留边界的买方。结构性取舍也很清楚:Baseten 也依赖可移植、开放的打包方式,Cog 加原始 GPU 云等相邻工具仍然可用,因此硬切换成本低于封闭模型 API 或数据平台。信任层面,Baseten 的公开姿态领先于自助式同侪:它把 SOC 2 Type II 和 HIPAA 主张与默认不存储输入或输出的明确声明配在一起。Modal 通过仅面向企业的 SSO、审计日志和 HIPAA 缩小了一部分差距;Replicate 仍最强于采用便利性;Runpod 仍最强于低成本基础设施自由度。最大的分发劣势仍是超大规模云厂商的渠道权力。AWS、GCP 和 Azure 可以把 AI 采购折进既有计费、IAM 和云承诺关系,这意味着 Baseten 必须持续证明,开源模型性能、可移植性和支持值得客户单独选择一个供应商。[CP023, CP024, CP025, CP026, CP027, CP028]
3.4 护城河耐久性与竞争风险
Baseten 的护城河真实存在,但比专有模型或数据网络护城河更软。证据最充分的优势是集成执行:优化运行时、多云容量、受监管部署选项,以及面向认真交付 AI 产品团队的高接触工程支持。Training 和 Frontier Gateway 把产品扩展成更大的平台故事;如果客户从模型开发到品牌化 API 交付都标准化到同一家供应商,账户控制力会增强。反向证据同样重要。HostFleet 的 2026 年 4 月无服务器 GPU 矩阵显示,Baseten 在多个常见 GPU 档位中是公开选项里最贵的;Sacra 也明确警告,超大规模云厂商可以把推理捆进更宽的云承诺,从而压制独立供应商。Baseten 自己的状态页在展示窗口内报告 Model API 正常运行时间为 99.91%,且 2026 年 5 月有多起事故,因此「四个 9」可靠性应被视为销售目标,而不是运营已经无摩擦的证明。资本有帮助,但不能消灭问题:最新融资和投资者名单提高了公司在资本密集型市场中的续航力;核心承销结论仍然是,Baseten 最适合高端、生产级开源模型推理工作负载,而不是在原始 GPU 小时的商品价格战中取胜。[CP031, CP032, CP033, CP034, CP035, CP036]
| 护城河主张 | 威胁 | 严重程度 | 缓解措施 / 尽调问题 |
|---|---|---|---|
| 集成运行时 + 编排栈 | 无服务器同类能复制部分 DX,但覆盖不到完整广度 | 中 | 基准测试应覆盖完整应用工作流,而不只是单个端点延迟数字 |
| 多云 + 自托管可迁移性 | 相比封闭平台,客户更容易多栖部署或迁出 | 中 | 按部署模式衡量留存和扩张,判断可迁移性是否仍能转化为持续支出 |
| 企业信任姿态 | 超大规模云厂商把治理能力打包进现有合同和云承诺 | 高 | 收集受监管行业对阵 AWS、GCP、Azure 的赢单 / 输单记录 |
| 训练 + 网关扩张 | Baseten 现在要和更宽的 AI 平台厂商及模型实验室工具竞争 | 中 | 量化训练和网关产品带来增量推理收入的频率 |
| 开源封装 | Truss 降低锁定效应,压低切换成本 | 高 | 按队列跟踪 Truss 转付费和生产环境留存 |
| 相对原始 GPU 云的价格溢价 | Runpod 等主机商压低了公开基础设施价格 | 高 | 用客户基准证明总拥有成本和可靠性的 ROI |
| 可靠性品牌 | 近期事故削弱了竞标中的四个九叙事 | 中 | 复盘事故频率、MTTR,以及客户多区域故障转移模式 |
| 资本支持 | 超大规模云厂商和其他资金充裕的基础设施厂商仍能比 Baseten 更敢花钱 | 中 | 验证投资方和 GPU 供应关系是否带来真实商业或容量优势 |
严重性反映未来 12–24 个月内,每项威胁压缩定价权或抬高获客摩擦的可能性。
[CP023, CP024, CP025, CP029, CP030, CP031]对 Baseten 竞争耐久性和当前压力点的紧凑读数。
[CP031, CP033, CP035, CP038, CP039, CP040]3.5 要点展示
04财务情况
4.1 收入模式与公开定价
Baseten 的公开财务故事始于一个直接的收入设计:公司按生产推理和相邻基础设施用量收费,而不是按席位收费。定价页暴露了三个商业面——专用部署、Model APIs 和 Training——并把它们包进 Basic 自助层,以及按报价销售的 Pro 和 Enterprise 包装。Model APIs 按每百万 token 计价,专用部署按实际使用的计算资源计费且精确到分钟,Training 则既以托管 Training Jobs 销售,也以较新的 Loops 工作流销售,后者把 checkpoint 直接送入推理。这是一个连贯的生产基础设施变现模型;计费用量 API 也通过按日拆分和已用额度,把支出拆到专用推理、训练和 Model APIs。 细节在于,标价只是这个模型的外壳。公开材料显示,真正的商业楔子在高端支持、优先计算、自托管部署、使用既有云承诺、自定义 SLA,以及高级安全或治理。这些功能意味着 Baseten 试图同时变现用量和更高接触的企业销售动作。Terms 还把客户 Order 设为有约束力的商业工具,这意味着当支持、折扣和最低承诺进入交易后,实际价格可能与公开定价页显著偏离。这一点对收入确认和收益率分析重要,因为公开标价可观察,但标价到净价的经济性不可见。 结果是,Baseten 有可信的收入架构,对计费单位有不错的公开可见度,但对已实现收入质量的可见度很弱。公开证据能显示 Baseten 打算如何收费;它不能显示收入组合、服务附着率,或客户在企业谈判后实际支付多少。[CI001, CI002, CI003, CI004, CI005, CI006]
| 收入流 | 机制 | 计费单位 | 当前值 / 状态 | 质量 | 尽调要求 |
|---|---|---|---|---|---|
| 专用部署 | 按工作负载配置 GPU 计算,并叠加部署控制和支持 | GPU-minute / 合同 | 有公开标价,也有基于报价的 Pro / Enterprise 套餐 | 计费单位证据质量中;实际成交价证据质量低 | 提供客户级实际费率表、折扣,以及按 GPU 系列拆分的毛利率。 |
| Model APIs | 通过兼容 OpenAI 的端点托管模型,按使用量计价 | 1M tokens | 公开标价列出了输入、缓存输入和输出的独立价格栏 | 标价单位证据质量高;实际收益率证据质量低 | 按模型提供 token 量、缓存占比、批处理占比,以及每 token 实际净收入。 |
| Training Jobs / Loops | 托管训练工作负载,可直接接入推理部署 | GPU-minute / 作业 / 合同 | 已有商业化界面,但未披露公开标价 | 低 | 提供 Training Jobs 价格表、贡献毛利,以及挂接到生产推理的附加率。 |
| 支持和工程服务 | 实操工程支持、Slack / Zoom 支持、部署优化和企业协助 | 服务附加 / 合同 | Pro 和 Enterprise 话术中明确存在,但没有独立费率表 | 低 | 提供服务附加率、混合定价,以及支持业务是增厚毛利还是由补贴支撑。 |
| 企业自托管 / 云承诺可迁移性 | Baseten 软件和支持叠加到客户云或混合部署中 | 定制合同 | 对外宣传为关键企业功能集 | 低 | 提供自托管账户的典型年合同额、最低承诺和续约行为。 |
公开证据能支持收入入口和计费单位,但不能支持产品级收入结构或实际成交价格。
[CI001, CI002, CI003, CI004, CI005, CI006]| 价格 / 单位 / 合同 | 标价与实际成交价 | 折扣 / 未知项 | 有来源支撑的含义 |
|---|---|---|---|
| Basic:$0 / 月,按量付费 | 纯标价 | 没有公开转化率、ARPU 或激活数据 | 自助入口拓宽漏斗,但无法说明付费转化。 |
| Pro:基于报价,含优先计算、专用计算、更高 API 速率限制、专属支持 | 标价套餐,实际成交价隐藏 | 提供批量折扣,但折扣深度未披露 | 收入质量可能取决于支持和优先容量有多大比例附着到使用量。 |
| Enterprise:基于报价,含自托管、定制 SLA、云承诺、数据驻留、高级 RBAC | 标价套餐,实际成交价隐藏 | 没有公开最低承诺、续约条款或服务定价 | 企业价值主张是运营控制,而不是透明 SKU 定价。 |
| Model APIs:按每 1M token 定价,输入、缓存输入和输出费率分列 | 标价 | 没有企业费率表、批处理曲线或结构数据 | 可用于基准比较,但 token 标价不是实际收入。 |
| 专用计算:按实际使用计算量计费,精确到分钟 | 标价计费规则 | HostFleet 称,专用部署最低成本和计费运行时长仍然适用 | 缩容到零有帮助,但最低费用会压缩突发型工作负载的节省空间。 |
| 费用和开票:月底计费,除非订单另有约定,30 天内到期 | 合同规则,而非产品标价 | 订单决定实际经济条件 | 收入确认和回款节奏可能随协商后的企业订单而变。 |
本表区分公开标价机制和私下实际经济条件;折扣、最低承诺和企业费率问题仍全部未解。
[CI002, CI003, CI004, CI005, CI006, CI007]Baseten 将专用部署、Model APIs 和 Training 的使用量转成计量支出,再把其中一部分转成价值更高的企业和支持合同。
流程展示商业逻辑,而不是量化瀑布;公开证据没有披露收入组合或实际合同价值。
[CI001, CI005, CI006, CI008, CI011, CI012]4.2 GTM 动作与销售效率代理指标
Baseten 的 GTM 最适合理解为先用用量落地,再靠可靠性、支持和部署控制扩张。Basic 计划和公开 Model APIs 创造低摩擦开发者入口,但变现叙事很快转向 Pro 和 Enterprise 功能,例如专用计算、更高速率限制、高接触工程、自托管,以及云承诺可移植性。这个包装说明 Baseten 不是只想赢一场商品化自助竞赛;它想成为那些足够在意延迟、正常运行时间和控制、愿意为运营帮助付费的团队的生产推理层。 Baseten 是私营公司,CAC、回本周期、企业销售周期长度和净留存率(NRR)都不可得。最好的公开替代指标是客户案例。Writer 认为 Baseten 降低了 70B 级模型的每百万 token 成本,并提高吞吐。OpenEvidence 强调无需多年期预留也能灵活获取计算资源,同时部署和维护时间大幅改善。Speechify 报告称,它得以退役大规模自管 GPU 集群,同时降低每百万字符成本。Superhuman 和 Patreon 把价值主张描述为节省稀缺工程时间,同时显著改善延迟或降低 GPU 成本。这些不是经审计财务数据,但方向上符合一种 GTM 动作:销售的是投产时间和更低总体运营成本,而不只是标价算力。 因此,证据支持一个合理的扩张引擎,但只能以代理形式支持。买方逻辑可见;销售效率数学不可见。没有内部转化、留存和销售支出数据,就无法有把握地承销 Baseten 的 GTM 效率。[CI003, CI004, CI014, CI018, CI019, CI020]
4.3 成本结构与单位经济代理指标
Baseten 的公开材料指向一种轻资产但不一定低价的成本结构。Sacra 称,公司聚合 15+ 云供应商的容量,而不是直接拥有 GPU 基础设施;相较直接购买并融资 GPU 集群的供应商,这应当降低固定资产强度。官方材料从另一个角度强化同一模型:Baseten 持续谈多云容量管理、跨云自动扩缩、缩容到零,以及必要时在客户云中运行。原则上,这应让业务比专用自有集群运营商更灵活地调整供给,并让成本更紧密匹配工作负载形态。 但轻资产不等于便宜。HostFleet 的 2026 年 4 月无服务器 GPU 矩阵显示,在列出的每一个共享 SKU 上,Baseten 定价都高于 Runpod;在共享 L4 和 H100 行上也高于 Modal;在重叠的 A100 行中,只有 Replicate 的 A100 自定义部署价格更高。这是公开记录中最清晰的反向信号:Baseten 在原始计算之上销售高端托管层。公司事实上的反驳,是它自己的性能叙事。Dedicated Inference 声称优化运行时能带来 6x 更好 GPU 利用率和 5-10x 更低成本;Model APIs 声称相较封闭替代品支出降低 5-10x;客户研究报告每单位成本更低、工程师更少,或两者兼有。这些主张符合一个论点:Baseten 通过利用率和支持附着率扩张毛利率,但公开数据仍未证明毛利率。 因此,单位经济仍停留在代理区间。我们能看到计费单位。我们能看到一些客户说总成本下降。我们能看到标价相对原始 GPU 云有溢价。我们看不到的是云成本转嫁、支持人力、议价折扣和留存之间的实际平衡。[CI006, CI007, CI015, CI016, CI017, CI018]
| 指标 | 数值 / 公开代理指标 | 置信度 | 为什么重要 | 尽调要求 |
|---|---|---|---|---|
| 已发布计费单位 | 专用推理按 GPU-minute;Model APIs 按每 1M token | 高 | 说明 Baseten 按使用量而非席位变现。 | 按工作负载类型提供实际计费结构。 |
| 公司声称的利用率杠杆 | Dedicated Inference 的 GPU 利用率提升 6x、成本降低 5-10x | 中 | 若属实,利用率就是核心毛利率杠杆。 | 提供优化前 / 后利用率直方图,以及按优化运行时拆分的毛利率。 |
| 价格地板压力 | HostFleet 显示,在共享 SKU 上 Baseten 相比 Runpod 有溢价,在共享 L4 / H100 行上相比 Modal 有溢价 | 中 | 溢价必须靠更低总成本解释,而不是靠原始 GPU-hour 平价。 | 提供 Baseten 战胜更便宜原始 GPU 替代方案的赢单 / 输单分析。 |
| Writer 代理案例 | 每百万 tokens 成本降低 35%;tokens/sec 提高 60%;TTFT 降低 23% | 中 | 表明性能优化可能抵消标价溢价。 | 提供基准方法,以及可比客户的毛利率影响。 |
| OpenEvidence / Speechify 代理案例 | 延迟降低 78%,部署快 6x,维护量低 8x,每百万字符成本降低 44% | 中 | 通过基础设施节省和平台工程师减少,支撑 TCO 论点。 | 提供迁移后经审计的客户扩张和留存数据。 |
| Patreon / Superhuman 代理案例 | 每年节省 $600k 资源,GPU 成本节省 70%,延迟降低 80%,释放多名工程师 | 中 | 说明经济价值既可能来自人效,也可能来自计算节省。 | 对声称节省人力的客户,提供队列级 NRR 和服务附加情况。 |
| 毛利率 / CAC / NRR | 未公开披露 | 低 | 缺少这些指标,公开材料无法闭合单位经济模型。 | 提供按产品线拆分的毛利率、CAC、回本周期、流失和 NRR。 |
这些行混合了官方标价机制、客户证明代理指标和独立价格地板检查;没有一项能替代披露的毛利率数据。
[CI005, CI006, CI015, CI018, CI019, CI020]公开单位经济性证据从工作负载形态延伸到利用率和支持经济性,但在毛利率前断开,因为实际折扣和 COGS 是私有信息。
这座桥接是方向性的。公开证据以定性方式或通过案例研究代理指标支持这些节点,但不足以给出闭合的利润率方程。
[CI007, CI015, CI018, CI019, CI020, CI021]4.4 资本充足性与融资依赖
Baseten 的资本位置纸面上强,实践中不透明。公开记录支持 2025 年 2 月 $75 million 的 Series C 融资、2025 年 9 月 $150 million 的 Series D 融资,以及 2026 年 1 月以 $5 billion 估值完成的 $300 million Series E。Business Wire、Tracxn 和 CB Insights 都指向约 $585 million 的累计融资,Business Wire 还明确把 2026 年 1 月轮描述为公司此前一年内第三次融资。这个节奏重要:Baseten 显然不是靠慢速现金生成运营;随着推理需求扩大,它在激进融资以支撑增长。 资金用途措辞强化了这个解读。Baseten 自己的 Series E 博文把新资金聚焦在速度、正常运行时间、开发者体验,以及扩展基础设施平台。Tech Funding News 补充了预期招聘、客户支持扩张和更多集成。公开员工数代理也与这个投资故事一致:PitchBook 的 2025 年快照显示 73 名员工,而 Tracxn 到 2026 年 4 月列出 258 名员工。即便这些数据集不完美,方向仍很清楚——Baseten 似乎在伴随产品和基础设施范围扩张,大幅放大运营支出。 公开记录没有显示的是,当前资本基础相对于烧钱速度是否足够。没有披露现金余额、月度烧钱、现金跑道、债务安排或约束性条款包。Sacra 报道的 $200 million 至 $600 million 年化收入估计显示了相当规模;据报道 $11 billion 至 $15 billion 的估值讨论也说明市场可能愿意为下一段融资。但这些都不能替代现金、毛利率和现金跑道披露。唯一硬结论是:Baseten 获得资本的能力很强;它相对实际烧钱是否资本充足,仍然是私有信息。[CI027, CI028, CI029, CI030, CI031, CI032]
| 指标 | 公开数值 / 状态 | 有来源支撑的含义 | 尽调要求 |
|---|---|---|---|
| 累计融资 | 公开报道累计融资 $585M | 融资能力足以支持快速扩张,但剩余现金未知。 | 提供 2026 年 1 月融资后的当前现金余额和非受限现金。 |
| 最新融资 | 2026 年 1 月以 $5B 估值完成 $300M Series E | 新股融资显著提升了进入 2026 年的灵活性。 | 提供交割后现金桥,以及董事会批准的运营计划。 |
| 融资节奏 | Series C $75M(2025 年 2 月)、Series D $150M(2025 年 9 月)、Series E $300M(2026 年 1 月) | 大约一年内完成三轮融资,意味着公司处于激进投入模式,也可能依赖资本市场。 | 提供下一轮目标时间和融资应急计划。 |
| 计划资金用途 | 速度、正常运行时间、开发者体验、团队增长、平台扩张、更多集成和支持 | 支出看起来投向产品、基础设施和人力,而不是收割模式。 | 按职能提供 24 个月 capex / opex 预算。 |
| 现金余额 / 月度烧钱速度 / 现金跑道 | 未公开披露 | 公开证据无法支持资本充足性承销判断。 | 提供月度净烧钱速度、现金余额,以及基准和下行情景下的现金跑道。 |
| 债务 / 项目融资义务 | 未披露公开债务明细或项目融资义务;引用的 EDGAR 页面无法取得 SEC 经营公司文件 | 没有披露不等于没有义务。 | 提供所有债务额度、云承诺负债、预留容量义务和主要供应商条款。 |
融资事实是公开的;充足性不是。本表有意把已知融资历史与未知流动性和义务指标分开。
[CI027, CI028, CI029, CI030, CI031, CI033]公开财务信号跨度很宽,因为它们混合了已关闭融资事实、第三方估计和快照,而不是经审计财务数据。
收入和估值上限来自第三方估计,不是公司披露的经审计数字。员工数跨越两个不同供应商数据集和日期。
[CI027, CI035, CI037, CI038, CI048]Baseten 的现金流逻辑似乎是反复股权融资流向产品、支持和容量编排,但剩余现金头寸和义务仍是私有信息。
该图展示现金使用方向和融资依赖,而不是数值化现金流量表,因为公开资料没有烧钱速度和现金数据。
[CI027, CI028, CI029, CI030, CI033, CI034]4.5 财务结论与披露缺口
财务结论是:业务模式连贯性为正面,可承销披露为负面。正面看,Baseten 清楚地按其产品品类的正确单位变现:GPU 时间、token 用量、训练任务,以及高接触企业功能。客户证据持续强化同一个故事——当生产级推理降低总体运营成本、缩小工程负担,并在真实工作负载下守住延迟或正常运行时间时,它有价值。资本获取能力也异常强,约一年内三轮融资,最终落在 $300 million Series E。 负面看,几乎每一个把好故事变成投资案例的指标都仍是私有信息。公开来源没有显示按产品划分的收入组合、已实现企业价格、毛利率、获客成本(CAC)、净留存率(NRR)、流失、客户集中度、现金余额、月度烧钱或现金跑道。与公司查询相关的 SEC EDGAR 实体落地页没有提供公开运营公司监管文件,因此没有可依赖的经审计财务桥。可靠性证据尚可但并非无瑕:状态页显示近期事故,SLA 目标是 99.9%,不是最强营销表述暗示的完美正常运行时间。 净看,Baseten 像是一个有真实需求和可信降本代理的高端、基于用量的推理平台,但公开记录仍太薄,无法严谨承销收入质量或资本充足性。正确尽调姿态是:在私人财务数据补齐下列缺口之前,把公司视为有前景但披露轻。公开客户证据现在覆盖金融工作流(Hebbia)、编码产品(Zed 和 Posit)、语音界面(Wispr Flow)和世界模型实验(World Labs),这增强了一个判断:Baseten 的基于用量收入机会,分散在多个要求很高的生产工作负载上,而不是单一狭窄细分。[CI024, CI025, CI027, CI028, CI033, CI040]
| 缺失的私有指标 | 对承销判断的影响 | 具体尽调路径 |
|---|---|---|
| Model APIs、Dedicated Inference、Training 和服务之间的收入结构 | 无法判断增长是可持续的软件式扩张,还是支持占比高的服务收入。 | 索取过去 18 个月按产品模块拆分的月度收入,以及按模块拆分的贡献毛利。 |
| 标价到净价、企业最低承诺和折扣表 | 公开标价可能夸大实际收益率和毛利。 | 审阅最近五份企业订单及其折扣审批和使用曲线。 |
| 按产品线拆分的毛利率,以及云 / GPU 采购条款 | 无法评估优化主张是否转化为留存下来的毛利润。 | 提供产品级 COGS、按供应商拆分的主要云支出,以及任何预留容量承诺。 |
| 现金余额、月度烧钱速度和现金跑道 | 即便近期融资,仍无法承销判断资本充足性。 | 提供当前现金瀑布、过去六个月烧钱速度和情景现金跑道模型。 |
| 客户集中度、NRR、流失和队列扩张 | 除了客户故事,无法检验收入质量或耐久性。 | 提供前 20 大客户收入集中度、logo 流失、金额流失和队列 NRR。 |
| 公开申报和审计线索深度 | 缺少 SEC 经营财务数据,投资人只能依赖管理层材料。 | 提供经审计财务报表、董事会材料 KPI,以及任何贷方报告包。 |
每一行都是实质性尽调障碍,而不是锦上添花。公开证据只能建立叙事方向,不能给出可承销判断的私有指标。
[CI047, CI050]05产品与技术
5.1 按客户工作流理解的产品界面
Baseten 现在覆盖现代 AI 部署工作流的大部分,而不是单一托管 SKU。最轻的一端,Model APIs 让团队替换 OpenAI 或 Anthropic base URL 后立即调用共享前沿 / 开放模型,适合原型或不需要专用 GPU 的产品。更重的一端,Truss 把自定义或开源模型打包成可复现部署制品;Dedicated Inference 则加入租户隔离容量、自定义扩缩和对更严格延迟或合规需求的支持。Chains 位于单模型推理之上,用于多步骤 RAG、转写或多模态流程;Frontier Gateway 加入品牌化 URL 以及面向模型实验室 API 变现的计费 / 速率限制控制;Baseten Training/Loops 则试图打通从 checkpoint 创建到生产服务的闭环。按客户工作流看,公司卖的是从即时 API 评估到自定义生产推理的分级路径,而不只是原始 GPU 租赁。[CE001, CE002, CE006, CE007, CE008, CE009]
| 模块 / 资产 | 主要用户 | 状态 / 成熟度 | 核心功能 | 差异化 | 尽调缺口 |
|---|---|---|---|---|---|
| Model APIs | 评估或上线前沿 / 开放模型的应用开发者 | 正式可用(GA)/ 成熟共享服务 | 在 Baseten 托管的共享基础设施上,立即获得兼容 OpenAI 和 Anthropic 的推理 | 摩擦最低的入口;内置缓存、工具调用、结构化输出,并提供迁移到专用部署的路径 | 设计上就是共享基础设施;公开文档未披露租户级争用控制或单模型基准方法 |
| Dedicated Inference | 在生产环境服务自定义、微调或自研模型的团队 | 正式可用(GA)/ 核心企业产品面 | 单租户或客户可控推理,支持定制硬件、扩缩容和部署选项 | 结合托管性能调优、跨云自动扩缩容和企业控制界面 | 唯一公开合同 SLA 是 99.9%;已发布 GPU-hour 定价高于自助式同业 |
| Truss | 打包并迭代自定义部署的 ML 工程师 | 成熟开源 CLI,2026 年 5 月仍有活跃发布 | 打包模型代码、权重、依赖和 GPU 配置;通过 uvx truss push/watch 部署 | 一次编写的封装抽象,支持热重载和多种服务框架 | 开源活跃度健康,但公开文档没有量化 Truss 在企业中的具体采用情况 |
| Chains | 构建 RAG、转录或多模型工作流的团队 | 正式可用(GA)/ 生产工作流层 | 编排 Python chainlet,并为每一步配置硬件、依赖和自动扩缩容 | 让 Baseten 能销售复合 AI 工作流,不必强迫客户采用单体模型部署 | 公开性能主张只是方向性;具体工作流延迟取决于设计和工作负载 |
| Frontier Gateway | 将自有托管模型商业化的 AI 实验室 | 正式可用(GA)/ 专用变现产品面 | 白标推理 API,带密钥管理、计费、计量、速率限制和品牌 URL 路由 | 对想拥有自有 API 品牌和变现层的实验室,Baseten 变成幕后基础设施 | 公开文档未说明客户数量、支持的计费边界情况或结算工作流 |
| Training Jobs / Loops | 部署前训练或后训练模型的研究和基础设施团队 | Training Jobs = 正式可用(GA);Loops = 早期访问 | 托管 GPU 训练,并提供把检查点部署到推理端点的路径 | 尝试在一个平台内闭合训练到推理的循环,而不是交给另一家厂商 | Loops 仍处早期访问,因此相较推理栈成熟度不均衡 |
状态标签反映 Baseten 截至 2026-05-30 的公开措辞。“成熟”指多次描述为正式可用(GA)、且有操作文档的产品面;并不意味着功能质量经过外部审计。
[CE001, CE002, CE006, CE007, CE008, CE009]| 用户任务 | 当前工作流 | Baseten 方案 | 公开收益 | 局限 |
|---|---|---|---|---|
| 快速用前沿 / 开放模型做原型 | 不搭建部署基础设施也能切换供应商 | Model APIs | 将现有 OpenAI 或 Anthropic SDK 指向 Baseten,即可立刻调用受支持模型 | 客户接受受支持模型列表和共享基础设施模式,而不是选择具体硬件 |
| 将开源或自研模型部署到生产环境 | 打包模型、选择硬件 / 引擎、暴露稳定端点 | Truss + Dedicated Inference | 配置驱动的部署路径,支持 TensorRT-LLM 或 custom-server 选项、可观测性和环境提升 | Baseten 没有发布通用基准方法,客户仍需逐个模型验证性能 / 成本权衡 |
| 运行复合 AI 应用 | 将多步骤工作流拆到专用组件中 | Chains | 每个 chainlet 可使用自己的硬件和自动扩缩容,减少单体 GPU 浪费和延迟瓶颈 | 公开性能主张只是方向性;具体工作流延迟取决于设计和工作负载 |
| 将实验室自有模型商业化 | 向第三方客户开放模型,并配置计量和速率限制 | Frontier Gateway | 白标 URL、密钥管理、使用限制和按客户计费,让团队无需从零搭建 API 网关 | 营销表层之外的商业和合同细节未公开 |
| 训练或微调后部署检查点 | 运行训练代码、同步检查点,并推广到推理 | 训练路径:Training Jobs / Loops + deploy_checkpoints | 同一厂商可覆盖托管训练基础设施和下游部署端点 | Loops 仍处早期访问,因此公开材料中的最高级后训练路径尚未完全成熟 |
收益来自公开产品主张和客户证明结果,并不是保证的客户成效。每行描述的是 Baseten 最清晰营销的工作流,而不是所有可能实现变体。
[CE001, CE002, CE005, CE006, CE007, CE008]团队如何从评估或封装走向 Baseten 上的生产推理,并可选择训练和网关路径。
[CE002, CE005, CE006, CE007, CE008, CE009]5.2 部署架构与运营模型
Baseten 公开语料中最清晰的技术差异点,是它比许多 AI 基础设施创业公司更具体地解释部署路径。Truss 抽象打包、依赖和 GPU 配置;Baseten 的构建步骤随后验证并上传包,在选择引擎路径时用 TensorRT-LLM 编译受支持的 LLM,并把生成的容器部署到专用模型子域后面。MCM 控制平面位于训练和推理之下,抽象云供应商差异,并在需要时跨区域或供应商重新路由容量。请求路由会从 URL 路径解析环境名,环境在部署晋升时保持稳定端点,异步请求进入队列以保护实时流量免受后台工作的影响。BDN 通过在多个层级镜像并缓存大型模型权重来处理冷启动瓶颈,让新副本更少依赖外部存储。结果是一个推理优先架构,具备明确的构建、路由、自动扩缩和权重交付原语。[CE003, CE004, CE005, CE010, CE011, CE012]
| 层 / 组件 | 公开机制 | 关键依赖 | 风险 / 局限 |
|---|---|---|---|
| 封装层(Truss) | 从 config.yaml 或 Python 模型代码打包模型定义、依赖、密钥、缓存和 GPU 配置 | Baseten CLI、GitHub / PyPI 分发、用户源代码仓库 | 抽象层很强,但部署成功仍取决于特定模型调优和用户提供的权重 |
| 构建 / 编译路径 | Engine-Builder-LLM 下载权重,用 TensorRT-LLM 编译,应用量化 / 张量并行,并产出服务容器 | Hugging Face 或云存储权重源、兼容 CUDA 的 GPU 目标 | 编译可能需要数分钟,公开文档没有对每个模型 / 硬件组合做基准 |
| 运行时优化层 | 推理运行时暴露 TensorRT、SGLang、vLLM、TGI、TEI、投机解码、结构化输出、KV-cache 优化和拓扑感知并行 | 模型架构支持、Baseten 推理运行时、GPU 内存 / 布局假设 | 优化选项很多,但并非都按工作负载公开验证 |
| MCM 控制平面 | 统一跨云 / 区域 GPU,供给资源、监控健康状态,并在容量紧张或故障时重路由 | 底层云 GPU 供应、网络、Baseten 控制平面 | 跨云抽象降低锁定效应,但引入对 Baseten 自有编排层的依赖 |
| 权重交付 / 冷启动路径 | BDN 将权重镜像到 Baseten 存储,并在镜像源、集群和节点层缓存 | 首次镜像的上游权重仓库、Baseten blob 存储、集群内缓存 | 首次部署仍依赖上游权重可用性;“快 2-3x”的基准方法未公开 |
| 请求路由 / 环境 | 每个模型获得一个子域名;URL 路径解析环境;异步请求排队;提升操作保持端点名称稳定 | Baseten API 网关、环境配置、自动扩缩容组件、队列服务 | 缩容到零带来冷启动权衡,区域保证需要特殊端点 |
| 工作流编排 / 训练 | Chains 协调多步骤工作流;Training Jobs 和 Loops 通过 MCM 供给 GPU,并可将检查点部署到推理 | Python SDK / CLI、MCM、存储 / 检查点同步 | Loops 仍处早期访问,训练产品成熟度落后于核心推理产品面 |
| Enterprise 控制 / 租户隔离 | 单租户、自托管、混合和区域环境,加上 SSO/SCIM 与合规政策边界 | 客户 IdP、Baseten 支持配置;若自托管则用客户云 | 部分控制需要销售 / 支持介入,不能完全自助开通 |
本表混合了文档直接事实与对运营依赖的综合判断。「风险 / 限制」指公开限制说明或尽调事项, 不代表已观察到故障。
[CE003, CE004, CE005, CE006, CE007, CE010]从访问入口到封装 / 编排、运行时和跨云基础设施,分层展示 Baseten 的公开架构。
[CE001, CE002, CE006, CE007, CE009, CE010]5.3 信任、数据处理与可靠性控制
Baseten 的信任姿态按创业公司标准很强,也尤其重要,因为公司想承载敏感推理工作负载。公开安全文档称,Baseten 维持 SOC 2 Type II 和 HIPAA 合规,默认不存储模型输入、输出或权重,从不在用户之间共享 GPU,把客户隔离到专用 Kubernetes namespace,并围绕工作负载隔离使用 Calico、Falco 和 Gatekeeper 等控制。企业版和定价页面也宣传自托管、单租户、混合和区域限制选项;regional-environments 文档解释说,真正的数据驻留需要独立区域端点,而不是默认环境 CNAME。细节在可靠性:Baseten 营销经常使用「四个 9」话术,但唯一公开合同承诺是 Dedicated Inference SLA 的 99.9% 月度可用性。公开状态页还记录了 2026 年 5 月多起事故,因此尽调应把信任 / 合规视为优势,把公开可靠性保证视为更混合。[CE015, CE016, CE017, CE018, CE019, CE020]
| 控制项 / 信号 | 公开状态 | 范围 / 证据 | 含义 | 缺口 |
|---|---|---|---|---|
| SOC 2 Type II + HIPAA | 已公开 | 安全文档、Enterprise 页面和定价页面都提到 SOC 2 Type II 与 HIPAA | 为企业推理工作负载提供扎实的基线信任信号 | 已审阅材料中没有公开证书文件或审计范围细节 |
| 默认不存储提示 / 输出 / 权重 | 已公开但有保留 | 安全文档称,Baseten 默认不存储输入、输出或权重,但临时异步存储和可选缓存除外 | 这对敏感推理的隐私和 IP 定位很关键 | 需要审查合同 / DPA,确认留存边界情形和客户启用缓存的具体行为 |
| GPU 与命名空间隔离 | 已公开 | 安全文档称,Baseten 从不在用户之间共享 GPU,并为每个客户分配专用 Kubernetes 命名空间,配套 Calico/Falco/Gatekeeper 控制 | 支撑租户隔离主张,不止停留在泛泛云营销 | 未审阅到公开渗透测试报告或架构图 |
| 区域环境 / 数据驻留 | 已公开但需支持配置 | 文档说明区域受限副本和特殊区域端点格式 | 对需要路由保证的 GDPR / 数据驻留买家有用 | 配置需要 Baseten 介入,公开文档未说明交付周期或定价 |
| 身份与生命周期控制 | 2026 年扩展 | SSO/SCIM 更新日志在 Enterprise 中加入 SAML 2.0、SCIM 2.0、JIT 开通、账号注销和基于组的角色 | 提升企业管理规范性和采购就绪度 | 未公开映射到具体 IdP 限制或 SCIM 属性 |
| 托管灵活性 | 已公开 | Enterprise、Dedicated Inference 和安全页面描述 Baseten Cloud、自托管、混合、单租户模式 | 买家可在速度、控制权和复用云承诺之间取舍 | Baseten 管理与客户管理责任的具体边界并未完全公开 |
| 合同可用性 | 信号混合 | SLA 合同称 Dedicated Inference 目标为月度 99.9% 可用性;营销页面常使用 99.99 / 四个 9 表述 | 公开采购应以法律 SLA 为准,而不是首页简写 | 未找到 Model APIs、Chains 或更广泛 Web 应用的公开 SLA |
| 运营事故可见性 | 已公开但信号混合 | 公开状态页显示 2026 年 5 月多起事故;第三方可达性跟踪器称详细事故数据不可用 | 可见性存在,但独立正常运行时间佐证偏弱,标题式正常运行时间面板可能掩盖短时事故 | 尽调需获取合同事故响应条款、RCA 访问权和服务积分历史 |
多页官方材料相互印证时,置信度最高;如果公开文档要求联系支持或审查合同,置信度较低。 本表描述公开信息,不代表 Baseten 在私下尽调中可能提供的材料。
[CE015, CE016, CE017, CE018, CE019, CE020]围绕 Baseten 推理栈的关键依赖:上游权重、GPU 云、区域 / 身份控制,以及非核心 SaaS 工具。
[CE010, CE016, CE023, CE024, CE025, CE036]5.4 开发者信号、客户证据与竞争定位
Baseten 的护城河不是最低公开单位价格,而是打包工具、性能工程和托管跨云运营的组合。Truss 给了 Baseten 一个真实开发者界面:GitHub 仓库和 PyPI 包显示,它有一个活跃的开源打包 CLI,2026 年 5 月围绕 Loops 和部署工作流频繁发布。客户证据强于普通 logo 展示页:Writer 称 Baseten 构建的 TensorRT-LLM 引擎提高了每秒 token,降低了首 token 时间和成本;OpenEvidence 则把显著降低的延迟、更快部署和更低维护负担归因于 Baseten 的 MCM、向量嵌入运行时 和工具。取舍在独立定价比较中很明显。HostFleet 的 2026 年 4 月矩阵显示,在可比 GPU 实例上,Baseten 定价高于 Runpod 和 Modal;Runpod 和 Modal 则宣传更激进的零闲置和冷启动定位。面对 AWS、Google 和 Microsoft,Baseten 范围更窄,但更容易被理解为推理专业层,而不是完整超大规模云 AI 平台。[CE026, CE027, CE028, CE029, CE030, CE031]
5.5 路线图与产品成熟度
公开 2026 年路线图信号显示,Baseten 正围绕一个相当稳定的产品架构成熟运营控制,而不是不断添加新产品家族。2026 年值得注意的发布包括 SSO/SCIM、滚动部署、BDN 和计费用量 API——这些功能让平台更易治理、更新更安全、冷启动更快,也更容易做财务度量。这个发布组合说明 Baseten 正从「能不能服务模型?」走向「能不能在企业流程里运行关键任务推理?」同时,堆栈成熟度并不均匀。Training Jobs 已经公开 GA,Loops 仍处于早期访问,因此训练到推理故事在战略上有吸引力,但还没有全面被生产证明。公开材料也很少说明基准方法、区域控制的企业落地 精确周期,或当前披露的 2026 年变更日志之外的产品优先级,留下一些产品技术尽调事项未解。[CE007, CE020, CE021, CE022, CE037, CE038]
| 日期 / 阶段 | 功能 / 里程碑 | 状态 | 含义 | 来源 |
|---|---|---|---|---|
| 2026-03-04 | 计费用量 API | 已发布 | 通过每日 API 拆分,让 Dedicated Inference、Training 和 Model APIs 更容易做财务计量 | Baseten 更新日志 |
| 2026-03-19 | Baseten Delivery Network (BDN) | 已发布 | 说明 Baseten 在缓解冷启动上投入,也在首次镜像后减少对上游权重仓库的依赖 | Baseten 更新日志 + How Baseten Works 文档 |
| 2026-03-30 | 滚动部署 | 已发布 | 为生产发布增加更安全的零停机上线流程,并改善环境生命周期控制 | Baseten 更新日志 |
| 2026-05-14 | SSO 与 SCIM | 已在 Enterprise 发布 | 改善大型客户的身份治理和账号注销 | Baseten 更新日志 |
| 2026 年公开产品状态 | Training Jobs | GA | 说明托管训练产品不再是实验性产品 | Training 产品页面 |
| 2026 年公开产品状态 | Loops | 早期访问 | 说明 Baseten 在投入后训练 / RL 工作流,但公开产品面尚未完全打磨成熟 | Training 产品页面 + Truss 发布 |
这里只列明示披露的公开里程碑。表中没有出现,不应理解为内部路线图没有; 只说明该项未出现在已审阅的公开材料中。
[CE007, CE020, CE021, CE022, CE037, CE038]截至 2026-05-30,按成熟度展示 Baseten 主要产品界面的能力。
成熟度标签是综合判断,不是公司给出的分数。“差异化”指公开材料显示相对优势更清晰或外部证明更强,不代表该能力在所有维度都客观领先。
[CE007, CE026, CE027, CE031, CE032, CE037]5.6 要点展示
06客户情况
6.1 客户分层与买方画像
Baseten 的公开客户证据指向一个买方集合,主要由 AI 原生软件构建者组成;他们自己的终端产品成败取决于推理速度、可靠性和成本。买方通常是 ML、平台或产品工程负责人;一旦工作负载从模型评估走向生产,用户群就会扩大到应用工程师、安全团队和运营负责人。公开具名样本覆盖 Writer 和 Notion 等企业智能体平台,OpenEvidence 和 Abridge 等受监管医疗应用,Speechify 等语音与朗读应用,Superhuman 等生产力软件,Gamma 等创意工具,以及 Clay 和 Cursor 等 GTM 或编码产品。这个宽度重要,因为它说明 Baseten 卖的不只是实验基础设施。同时,披露客户簿仍然压倒性偏向 AI 原生软件,而不是多元化的传统企业集合,因此客户宽度真实存在,但在公开资料中还没有达到机构级广度。[CU001, CU002, CU003, CU004, CU005, CU006]
| 分群 | 买方 / 用户 / 付费方 | 代表性公开客户 | 公开价值信号 | 缺口 |
|---|---|---|---|---|
| 企业智能体与知识平台 | 买方:CIO / AI 平台负责人;用户:应用与运营团队;付费方:企业软件供应商 | Writer, Notion | 关键任务 AI 智能体,需要安全和治理 | 未披露企业客户与创业公司客户的收入结构 |
| 医疗 AI 应用 | 买方:临床 / IT 负责人;用户:临床医生、护理团队、收入周期运营;付费方:医疗 AI 供应商或医疗系统软件预算 | OpenEvidence, Abridge, Ambience | 受监管医疗信息和临床文档工作负载 | 未公开披露合同金额或医疗系统数量 |
| 语音与语音合成应用 | 买方:产品 / ML 平台负责人;用户:终端用户和内容团队;付费方:语音应用供应商 | Speechify | 承受实时延迟压力的消费者级 TTS 与语音基础设施 | 未公开披露音频工作负载内部收入集中度 |
| 生产力与协作应用 | 买方:产品 / 工程负责人;用户:专业人士和知识工作者;付费方:软件供应商 | Superhuman, Notion | 推理直接影响邮件、工作空间和智能体 UX | Superhuman 的生产 KPI 公开,Notion 未公开 |
| 创意与创作者经济平台 | 买方:产品 / 基础设施负责人;用户:创作者和轻专业用户;付费方:软件供应商 | Gamma, Patreon | 大规模图像生成和创作者媒体工作负载 | 消费级客户 logo 本身不能证明长期续约经济性 |
| GTM 与开发者工具 | 买方:工程 / 营收运营负责人;用户:开发者、GTM 运营、招聘人员;付费方:软件供应商 | Clay, Cursor, Mercor | 显示 Baseten 覆盖代码、GTM 和 AI 经济工具 | 多数这类客户只有公开点名,没有更多细节 |
分群由 Baseten 案例研究、融资材料和官方客户页面拼出;Baseten 不按分群公布客户数或 ARR 结构。
[CU001, CU002, CU003, CU004, CU005, CU006]| 客户 | 分群 | 证据类型 | 公开内容 | 验证强度 | 缺口 |
|---|---|---|---|---|---|
| Abridge | 医疗 AI | Business Wire + Abridge 网站 | 被点名为 Baseten 客户;Abridge 向大型医疗系统销售企业临床对话 AI | 中 | 未披露 Baseten 相关部署范围或结果 |
| Cursor | 开发者工具 | WorkOS + Cursor 网站 | 被点名为 Baseten 客户;Cursor 服务 AI 辅助编码,并称超过一半 Fortune 500 信任其产品 | 中 | 未披露 Baseten 相关工作负载细节或经济性 |
| Notion AI | 企业生产力 | Business Wire + Notion AI 页面 | 被列为 Baseten 客户;Notion AI 主打智能体、企业搜索和面向企业的零留存控制 | 中 | 未披露 Baseten 相关性能或支出数据 |
| Clay | GTM 软件 | WorkOS + Clay 网站 | 被列为 Baseten 客户;Clay 服务大量 GTM 团队,提供数据增强和工作流自动化 | 中 | 未披露 Baseten 相关生产指标 |
| Mercor | AI 经济 / 招聘 | Business Wire + Mercor 网站 | 被列为 Baseten 客户;Mercor 以驱动 AI 经济为定位 | 中低 | 有公开客户提及,但未说明用例和对基础设施的依赖 |
这些行把具名客户集扩展到旗舰案例研究之外,但证明力弱于六个详细故事,因为它们通常缺少部署细节。
[CU010, CU011, CU047, CU048, CU049, CU050]Baseten 通常从技术买家评估模型性能切入;当工作负载变成业务关键,运营、安全和采购相关方再加入。
[CU001, CU003, CU036, CU037, CU038, CU039]6.2 采用轨迹与具名生产证据
Baseten 最好的公开采用证据,不是披露的客户数量时间序列,而是一组来自参考客户的工作负载规模和结果披露。公司称推理量过去一年增长 100x,客户故事索引现在覆盖医疗、代码、音频、演示和运营用例。旗舰六个案例研究更像生产级而非试点:OpenEvidence 披露每周数十亿次请求,并覆盖美国每个邮编区域的医生;Speechify 披露每月 161B+ 字符、服务 60M+ 用户;Gamma 披露每天 3M+ 图片、服务 70M+ 用户;Superhuman 称数十个自定义模型进入生产;Patreon 报告在规模化 Whisper 部署上节省大量成本。当 Baseten 和客户共同给出具体延迟、成本、吞吐或工作流指标时,证据质量最强,不过公开样本仍是精选集。[CU012, CU013, CU014, CU015, CU016, CU017]
| 指标 | 数值 | 日期 | 来源 | 置信度 | 含义 / 缺失分母 |
|---|---|---|---|---|---|
| Baseten 平台推理增长 | 过去一年推理量增长 100 倍 | 2026-01 | Baseten Series E 博客 | 中 | 需求信号强,但未按客户或工作负载拆分 |
| 公开客户背书库存 | 13 个案例研究、29 条客户证言、4 个视频、654 条评分 | 2026-05 | FeaturedCustomers | 中 | 公开背书面很大,但聚合器方法论并不完全透明 |
| OpenEvidence 工作负载规模 | 每周数十亿次请求;医生遍布美国每个州和邮编 | 2026 年查看 | Baseten 案例研究 | 中 | 显示全国临床覆盖,但不说明收入或合同扩张 |
| Speechify 工作负载规模 | 每月 161B+ 字符,服务 60M+ 用户 | 2026 年查看 | Baseten 案例研究 | 中 | 消费者级推理负载很大 |
| Gamma 工作负载规模 | 每天 3M+ 张图片,服务 70M+ 用户 | 2026 年查看 | Baseten 案例研究 | 中 | PLG 规模验证强,但未披露 Gamma 流量中有多少跑在 Baseten 上 |
| Superhuman 生产覆盖广度 | 一周项目后,数十个自定义嵌入模型切入生产 | 2026 年查看 | Baseten 案例研究 | 中 | 即使没有量级指标,也说明部署面较宽 |
| 其他具名客户 | Abridge、Cursor、Clay、Notion、Mercor 等被公开点名 | 2026-01 | Business Wire / WorkOS | 中 | 覆盖面超出六个旗舰故事,但大多数缺少量化部署细节 |
Baseten 不公布统一的客户数量序列,因此本表使用工作负载和参考客户代理,而非单一活跃账户 KPI。
[CU012, CU013, CU014, CU015, CU016, CU017]| 客户 | 分群 | 部署 / 用例 | 生产 vs 试点 | 结果 | 限制 |
|---|---|---|---|---|---|
| Writer | 企业 AI 平台 | 在 Baseten 上用 TensorRT-LLM 服务自定义 70B 垂直领域 LLM | 生产 | tokens/sec 提高 60%,TTFT 降低 23%,每百万 token 成本降低 35% | 未披露续约期限或合同金额 |
| OpenEvidence | 医疗 AI | 面向临床医生的医疗搜索和嵌入推理 | 生产 | 延迟从 >700ms 降至 160ms,部署快 6 倍,基础设施维护减少 8x+ | 未公开披露支出或账户扩张 |
| Speechify | 语音 / TTS | 托管 10+ 个生产模型部署,覆盖 TTS、语音转换和解析 | 生产 | 每百万字符成本降低 44%,p99 延迟降低 30-50%,启动快 4.5 倍 | 未披露收入集中度或合同期限 |
| Gamma | 创意 AI 平台 | 以大规模用户量服务开源图像生成模型 | 生产 | 生成速度提升 30%-80%,效率提升 20%,每天 3M+ 张图片 | 未披露留存或每用户支出指标 |
| Superhuman | AI 原生产力 | 为核心产品功能部署数十个自定义嵌入模型 | 生产 | P95 延迟降低 80%,快速迁移且对用户零影响 | 未公开席位数或合同经济性 |
| Patreon | 创作者经济平台 | 服务 Whisper 转写和字幕工作负载 | 生产 | GPU 成本降低 70%,节省 440+ 小时,年节省近 $600k | 未公开续约或扩张指标 |
这是 Baseten 已量化结果部署的部分公开样本,不是完整客户名单。
[CU013, CU014, CU015, CU016, CU017, CU018]公开采用路径始于技术评估,随后进入一个生产工作负载;到更后期才扩展到专属计算、治理和更大的企业承诺。
[CU003, CU036, CU037, CU039, CU040, CU041]Baseten 与客户披露具体性能结果的地方,公开证明最强;所有已披露账户的持久性可见度都最弱。
[CU014, CU016, CU017, CU018, CU019, CU021]6.3 耐久性、满意度与扩张证据
耐久性证据方向上积极,但还不完整。积极的一面是,第三方客户引用聚合器给出了异常强的公开口碑代理指标:FeaturedCustomers 基于 654 个评分给出 4.8/5 的引用分,PeerSpot 强调协作和成本效率,单个客户引语也反复把 Baseten 描述为执行、正常运行时间或自助部署上的赢家。产品包装也显示出可行的落地后扩张路径:从 Basic 按用量付费,升级到 Pro 专用算力,再到 Enterprise 自托管或区域受限部署。医疗和企业页面把这条路径落到具体场景上: HIPAA 敏感工作流、单租户集群、故障切换和工程师深度支持。关键限制是,已审阅公开来源都没有给出 NRR、GRR、 续约队列或合同期限,因此扩张只能从包装和客户引语推断,而不是从可衡量账户经济性验证。[CU031, CU032, CU033, CU034, CU035, CU036]
| 指标 / 代理 | 数值 | 分群 | 置信度 | 尽调问题 |
|---|---|---|---|---|
| 组合 NRR / GRR / logo 流失 | 全部客户 | 低 | 要求提供队列留存、毛收入和净收入留存,以及按分群拆分的流失 | |
| 合同期限 / 续约节奏 | 全部客户 | 低 | 要求按套餐提供合同期限中位数、续约日期和承诺最低用量 | |
| 第三方背书评分 | 654 条参考评分给出 4.8/5 | 跨客户公开背书 | 中 | 核实有多少评分仍然有效,并可归因于 2025 年后的产品面 |
| 定性评论总结 | PeerSpot 强调部署速度、灵活性和成本效益 | 跨行业用户 | 低-中 | 索取原始评论数量和更细颗粒度的情绪分布 |
| OpenEvidence 客户证言 | 供应商筛选最终以 Baseten 明确胜出告终 | 医疗 AI | 中 | 询问迁移后的续约历史和支出增长 |
| Speechify 客户证言 | Speechify 称,合作仍在增长,并实现了其所知推理供应商中最高正常运行时间 | 语音 / TTS | 中 | 询问正常运行时间 SLA、事故频率和合同期限披露 |
公开耐久性证据以客户证言为主,缺少队列指标,因此不能把满意度代理误读为经审计的留存数据。
[CU031, CU032, CU033, CU034, CU035, CU046]6.4 集中度、切换和竞争压力
公开记录里的主要客户风险不是明显流失,而是可见度集中。Baseten 的具名账户不止六个旗舰案例,但可量化证明仍落在一小群 AI 原生软件公司里。这带来两个尽调问题。第一,头部客户收入占比、合同期限和续约率没有公开披露,投资人无法判断账本是否由少数超大工作负载主导。 第二,竞争压力真实存在。HostFleet 2026 年 4 月矩阵显示,在多类常用 GPU 上 Baseten 是列出的最贵选项;Runpod 2026 年对比把 Baseten 排在第五,并称部分竞争者的标称冷启动明显更快。WorkOS 也描述了一个临界点:客户每月支出达到 $10k-$50k 后,会开始考虑控制力更强、成本更低的开源方案。Baseten 用开放运行时和客户模型不锁定来对冲这一风险,但这也意味着可迁移容忍度会保持很高。换句话说,Baseten 可能比自研专有栈更容易采用,但它必须持续靠运营表现和支持重新赢得客户,而不能靠锁定效应。[CU039, CU040, CU041, CU042, CU043, CU044]
| 扩张驱动因素 | 集中度风险 / 约束 | 影响 | 尽调路径 |
|---|---|---|---|
| Basic → Pro → Enterprise 套餐梯度 | 公开套餐梯度暗示上销售空间,但转化率未披露 | 如果客户规模超过自助推理套餐,可支撑先落地再扩张 | 按客户队列索取套餐结构,以及 Basic 升级到 Pro / Enterprise 的扩张率 |
| Enterprise 控制与自托管部署 | 可能让销售偏向更少、更大的技术买家,并加重服务交付动作 | 利好受监管和敏感工作负载,但可能提高头部客户集中度风险 | 索取 ARR、工作负载量和部署模式维度的前 10 大客户 |
| 面向医疗的合规姿态 | 如果医疗成为主导扩张向量,垂直集中度可能加深 | 只要合规和可靠性经得住验证,受监管工作负载可能更有黏性 | 索取医疗收入占比、客户数量和续约历史 |
| 旗舰案例集中度 | 量化公开验证集中在六个故事,且大多是 AI 原生软件 | 投资人不能凭狭窄参考集推断整体组合耐久性 | 索取更广泛客户群的匿名队列统计 |
| 较高公开定价 | 较高公开 GPU 价格和最低部署成本会增加切换压力 | 定价可能拖慢成本敏感型工作负载扩张,或让迁移到其他平台更有吸引力 | 索取价格敏感账户的赢 / 输和流失原因 |
| 开源模型可迁移性 | 随着支出上升,客户越来越能切换模型,并把更多基础设施带回内部 | Baseten 必须继续靠速度、支持和经济性取胜,而不是靠锁定 | 索取客户存续期、自托管转化,以及优化后保留下来的工作负载数据 |
最强扩张信号来自套餐和客户引述;最强风险信号来自公开验证集中度和竞争性价格压力。
[CU036, CU037, CU038, CU039, CU040, CU041]07风险
7.1 法律和监管风险集中在合规范围、客户合同和扩张中的 AI 规则
Baseten 公开呈现的信任姿态表面上很强:公司称已通过 SOC 2 Type II 认证、符合 HIPAA、对齐 GDPR,并能为敏感工作负载运行区域受限或自托管部署。风险在于,一旦完整阅读法律栈,这套营销表述会明显收窄。Baseten 的安全文档称默认不存储模型输入或输出,并可执行合规策略;医疗和企业页面则向关键任务工作负载营销符合 HIPAA 的基础设施。但 Baseten 公开条款中的 DPA 写明,除非另有书面约定,客户不得提交 PHI 和其他受限数据,并把法律基础、通知和多项违规通知责任留给客户。这不证明产品有缺陷;它说明,仅靠公开网站不足以承销受监管使用。 监管压力也正从隐私扩展到 AI 治理和采购。欧盟委员会 AI 政策页面强调 AI Act 落地、行业指引、实践准则,以及帮助企业合规的服务台。Baseten 这类向医疗和其他受监管企业工作负载销售的推理厂商,即使不是最终应用层决策者,也更可能遇到围绕数据驻留、文档、模型治理和责任共担边界的更长尽调周期。因此,最重要的法律风险不是已知执法行动,而是合规范围不清。 [CR001, CR002, CR003, CR008, CR009, CR010]
| 风险 | 证据 | 发生概率 | 严重程度 | 缓释成熟度 | 剩余敞口 | 尽调路径 |
|---|---|---|---|---|---|---|
| 医疗合规范围取决于是否签署对公开 Restricted Data 排除条款的例外 | HIPAA 合规营销与 DPA 条款并存;DPA 禁止处理 PHI,除非另有书面约定 | 中 | 高 | 部分 | 高 | 要求提供已签署的 BAA/HIPAA 附录,以及允许的 PHI 流转清单 |
| EU AI Act 和 GDPR 落地可能拖慢受监管企业销售 | 欧盟委员会指引强调 AI Act 落地支持、操作指南和行业采用 | 中 | 高 | 早期 | 中高 | 审阅欧盟法律备忘录、数据驻留控制、DPIA 模板和审计材料 |
| 子处理方更新的异议窗口很短,公开主要补救手段是终止合同 | Baseten 提前 15 天通知,客户有 5 天反对子处理方变更 | 中 | 中 | 基础 | 中 | 审阅协商后的子处理方通知权和变更控制流程 |
| 客户侧法律依据和数据泄露处置义务可能增加部署摩擦 | DPA 将合法依据、通知和多项通报义务留给客户 | 中 | 中 | 基础 | 中 | 投产前梳理共同责任矩阵 |
| 默认合同语言可能削弱关键任务定位 | 产品页面主打关键任务推理,但条款排除时间敏感或关键任务用途 | 中 | 高 | 低 | 高 | 为关键工作负载协商定制 SLA 和例外条款 |
各行反映截至 2026-05-30 公开可见的法律和监管风险;严重程度按投资相关性排序,不构成法律意见。
[CR003, CR008, CR009, CR010, CR011, CR012]7.2 运营和安全风险由合同范围与可见事故定义,不只看正常运行时间营销
Baseten 对可靠性营销很积极。企业、医疗、专用推理、Frontier Gateway 和 Model API 页面都承诺四个 9 或 99.99% 正常运行时间、主动-主动冗余,或高可靠多云运营。已发布 SLA 更窄:只有 Baseten 作为托管方时的 Dedicated Inference 适用,月可用性目标为 99.9%;服务抵扣上限为月费 40%,且客户必须在 24 小时内提交申请。条款还进一步说明,服务未授权用于时间关键或任务关键功能,也不保证不中断或无错误。这在产品定位和默认合同保护之间留下了真实尽调缺口。 Baseten 状态页也说明这不是理论风险。该服务报告了多起 2026 年 5 月事故,包括持续调查、已识别修复、监控更新,以及 90 天视图中的重大故障标记。第三方监控只能提供部分安慰,因为 Servicealert 称详细事故数据不可用,依赖的是可达性快照,而不是完整根因报告。Baseten 已经推出滚动部署和部署健康工具等有用缓释措施,但可靠性风险仍应放在本章前列,因为平台卖进的是生产工作流,客户对停机和延迟回退高度敏感。 [CR013, CR014, CR015, CR016, CR017, CR018]
| 故障模式 | 发生概率 | 严重程度 | 缓释成熟度 | 剩余敞口 | 未解决缺口 |
|---|---|---|---|---|---|
| 可靠性营销承诺超过默认合同 SLA | 高 | 高 | 部分 | 高 | 需要当前定制 SLA 案例和真实企业合同中的服务抵扣条款 |
| 产品快速扩张期间,控制平面或推理事故可能反复出现 | 中高 | 高 | 部分 | 中高 | 需要公开状态页之外的复盘、Sev1 频率和 MTTR 数据 |
| 停机抵扣操作性偏弱:索赔必须在 24 小时内提交且设有上限 | 中 | 中 | 低 | 中 | 需要证明企业客户能协商更广的补救条款 |
| 合规政策或数据驻留变更需要 Baseten 支持介入 | 中 | 中 | 部分 | 中 | 需要管理员截图和变更控制工作流证据 |
| 即便有新的发布控制,部署回归仍会发生 | 中 | 中 | 中 | 中 | 需要滚动部署采用数据和回滚成功率 |
公开 SLA 和状态数据不披露客户专属补救、复盘或协商后的可靠性条款,因此剩余敞口仍高。
[CR013, CR014, CR015, CR016, CR017, CR018]运营、合同和定价风险会直接传导到信任、销售速度、毛利率和估值。
[CR017, CR019, CR023, CR026, CR027, CR034]7.3 依赖和商业模式风险来自上游容量、供应商链复杂度和高端定价
Baseten 应对基础设施风险的核心战略答案,是多云容量管理、跨云自动扩缩容、单租户选项,以及在客户云中运行的能力。这些都是有意义的缓释措施, 但没有消除对云合作伙伴、GPU 可用性和长尾第三方服务的依赖。Nudge Security 公开资料列出 AWS、Vercel、 Statuspage、SendGrid、Stripe、Segment、Sentry、GitBook 等 Baseten 可见供应链里的供应商。Baseten 自己的产品页面还承诺可访问最新一代 GPU、弹性容量和优先算力。也就是说,无论平台是否把这些包装成单一托管界面,上游容量和定价仍是基础性依赖。 商业风险在于,相比公开同业,Baseten 看起来偏贵。HostFleet 2026 年 4 月矩阵显示,在多类 GPU 上,Baseten 价格高于 Runpod、Modal 和 Fal.ai;Runpod 2026 年对比也给 Baseten 标上高端定价,并显示其冷启动区间慢于部分替代方案。Baseten 仍可能靠企业控制、支持和性能调优证明溢价合理;但如果客户能在明显更便宜的基础设施上复制可接受的延迟和正常运行时间,利润率和赢单率风险会快速上升。 这也是为什么性价比和供应商集中度应与经典供应商风险问题放在一起。 [CR021, CR022, CR023, CR024, CR025, CR026]
| 依赖项 | 对手方或暴露面 | 作用 | 集中度 | 失效情景 | 严重程度 | 缓释措施 | 剩余敞口 |
|---|---|---|---|---|---|---|---|
| 最新一代 GPU 容量 | 云 / GPU 供应商 | 支撑高端推理和自动扩缩容承诺 | 外部未知 | 容量收紧或价格上涨,挤压毛利并拖慢客户上线 | 高 | 多云路由与混合 / 自托管选项 | 高 |
| 可见 SaaS 控制平面供应商 | AWS、Vercel、Statuspage、SendGrid、Stripe、Segment 等 | 支撑网站、计费、监控、消息和运营 | 中 | 单一第三方停机会拖累客户体验或内部运营 | 中 | 供应商多元化和客户云选项 | 中 |
| 高端价格定位 | Runpod / Modal / Fal.ai / Replicate 构成公开对标组 | 影响客户支付意愿 | 高 | 同业以显著更低的入门成本提供可接受性能 | 高 | 主打可观测性、支持、安全和企业控制能力 | 高 |
| 服务占比较重的企业交付 | 前置部署工程师和支持团队 | 定制部署以达到延迟、吞吐和合规目标 | 中 | 支持负载扩张快于产品自动化 | 中高 | 标准化作战手册和自助工具 | 中高 |
| 状态可见性 | Baseten 状态页加不完整的第三方监控 | 可用性事件的主要外部信号 | 高 | 公开信号低估事故深度或复发 | 中 | 要求提供内部可靠性仪表盘和复盘 | 中 |
公司显然比单一云提供商更能缓释集中度,但公开证据仍让供应商组合和预留容量敞口不透明。
[CR021, CR022, CR023, CR024, CR025, CR026]Baseten 的风险面取决于云和 GPU 伙伴、第三方 SaaS 供应商、企业控制和客户特定合同。
[CR021, CR022, CR023, CR029, CR037, CR042]7.4 产品蔓延、快速扩张和估值压力推高执行与财务模型风险
Baseten 已不再是一家边界很窄的模型托管创业公司。公开材料现在覆盖 Model APIs、Dedicated Inference、Frontier Gateway、模型管理、自定义服务器、chains、Training Jobs,以及名为 Loops 的早期访问 RL 产品。与此同时,公司在前一年多轮融资后, 又于 2026 年 1 月以 $5B 估值完成 $300M Series E。资本降低了近期融资风险,但也抬高了执行门槛: 投资人和客户会期待公司把高端基础设施定位转化为可重复的企业增长,同时不牺牲可靠性或毛利率纪律。 人员和市场进入姿态放大了执行负担。Tracxn 显示员工快速增长,而 Baseten 自己的网站反复强调工程师深度支持、前置部署专家、 定制 SLA 和部署定制。这可以成为早期企业扩张的差异化,但也暗示一种服务较重的运营模式;如果产品复杂度、支持需求和客户特定安全要求并行扩张, 干净扩大的难度可能很高。因此,首要人员和执行问题不是 Baseten 是否有人才,而是组织能否在如此快速拓宽范围时,维持产品质量、客户响应和经济纪律。 外部资本市场也强化了这一风险:Modal 称在年化收入超过约 $300 million 后,以 $4.65 billion 估值融资 $355 million; 按 Sacra 估计,Fireworks AI 年化收入已约 $315 million;RunPod 说明资本少得多的竞争者仍能打价格战; CoreWeave 近 $100 billion 积压订单则显示,资金充足的基础设施玩家都在追逐推理机会。[CR030, CR031, CR032, CR033, CR034, CR035]
| 角色或职能 | 依赖或缺口 | 发生概率 | 严重程度 | 缓释措施 | 尽调路径 |
|---|---|---|---|---|---|
| 管理团队和运营节奏 | 在 $5B 估值下,必须把快速融资转化为可持续的企业级执行 | 中 | 高 | 新资金和标杆客户 | 审阅 2026 年计划、招聘目标和产品路线图排序 |
| 产品与工程 | 多条产品线同时扩张,包括一个早期访问的训练层 | 中高 | 高 | 共享推理栈和工具 | 索取 GA 就绪标准、缺陷积压清单和可靠性责任图谱 |
| 客户成功与支持 | 要支撑高端定价,可能需要工程团队深度介入 | 中 | 中高 | 前置部署能力和 Enterprise 方案打包 | 要求提供支持配比、升级 SLA 和客户访谈 |
| 销售与安全采购 | 企业控制功能受套餐限制,可能需要定制合同 | 中 | 中 | SSO/SCIM、自托管、定制 SLA | 审阅平均销售周期、安全审查转化率和医疗行业成交率 |
这张清单强调执行扩展性,而不是评价创始人质量;核心问题是 Baseten 能否扩大范围,而不把每个大客户都变成定制服务项目。
[CR030, CR031, CR032, CR033, CR034, CR035]最高剩余风险来自合规范围模糊、可靠性合同缺口、溢价定价和执行摊子过大。
[CR010, CR019, CR023, CR026, CR034, CR041]7.5 公开缓释措施有意义,但投资逻辑仍应由明确否决标准约束
Baseten 确实有可触达的缓释措施。自托管和混合部署降低锁定与数据驻留风险,Truss 提升可迁移性,滚动部署降低切换风险, SSO/SCIM 强化访问控制,计费用量 API 帮客户更好掌握支出。这些不是表面的官网徽章;它们是具体产品和运营功能,如果在生产账户中充分落地, 可以压低风险。即便是法律和监管风险,只要尽调确认 BAA 路径清晰、子处理方控制可接受、合同条款匹配工作负载关键性,也可以管理。 关键是把承销纪律写清楚。如果 Baseten 无法弥合公开可靠性营销与可执行 SLA 语言之间的缺口,如果性价比在没有量化企业 ROI 抵消的情况下仍远高于同业, 如果供应商集中度比多云叙事暗示的更紧,或者如果重支持销售只是维持账户基础健康所必需,投资逻辑就应失效。因此,正确姿态是有条件的确信: Baseten 有可信缓释措施,但剩余证据缺口足够重要,必须先转化为可监控触发器,才能建立高确信投资观点。 [CR020, CR029, CR037, CR038, CR039, CR040]
| 风险 | 可监控触发因素 | 阈值或事件 | 行动含义 |
|---|---|---|---|
| 医疗合规边界不清 | 已签署 BAA / HIPAA 附录可用性 | 顶级医疗机会没有可执行的 BAA 路径,或 PHI 边界不清 | 在拿到合同证据前,暂停受监管医疗业务承销 |
| 可靠性缺口 | Sev1 事故频率和 SLA 条款 | 90 天内出现两次或更多重大事故,或公开 SLA 之外没有协商补救 | 降低确信度、要求价格留置,或停止流程 |
| 价格竞争力 | 同业 GPU 经济性和客户 ROI | 在目标 GPU 类别上,Baseten 价格仍超过公开同业 2 倍,且没有量化企业 ROI 抵消 | 下调毛利和胜率假设 |
| 执行摊子过大 | GA 就绪度和支持负载 | Training/Loops、网关和专用部署都需要大量定制支持才能稳定运行 | 按服务占比较重而非软件化模型处理 |
| 供应商集中 | 容量预订证据 | 预留容量或供应商多元化明显弱于多云叙事暗示 | 提高依赖折价,并要求应急计划 |
否决标准刻意设计为可监控项,以便绑定尽调输出,而不是泛泛谨慎。
[CR019, CR026, CR027, CR029, CR036, CR037]08估值
8.1 建议:按已成交价格跟踪公司,但不要追逐动量定价
Baseten 看起来是一家高质量公司,所处位置也是 AI 技术栈里最好的部分之一,但投资判断仍受制于公开材料能证明什么、不能证明什么。 最干净的估值锚,是 2026 年 1 月已完成的 Series E:融资 $300 million,估值 $5 billion。这个价格真实、近期, 且由 Baseten 自己公告、Business Wire、Tech Funding News 和 Tracxn 互相印证。更难的问题是,公开证据是否足以把这个价格视为有吸引力、 合理,或已经偏高。 答案是有条件的,不是绝对的。Sacra 对 $600 million 年化收入的估计,使已完成轮次看起来大约是 8.3x 隐含收入,具备一定支撑,尤其是参照 MongoDB 式低双位数公开倍数,以及 Modal 或 Fireworks 低到中双位数的私募倍数。但同一组公开记录也显示论证很脆弱:Baseten 披露的定价偏高,第三方定价矩阵称它纸面上昂贵, 传闻中的 $11 billion 后续融资会在没有相应一手财务披露的情况下,直接跳进高端软件定价区间。这不是一个干净买入的设置。 因此,正确姿态是观察:中等信心、高风险评级、估值偏高。公司值得持续贴近,因为市场真实、产品有差异化;但在承销更高入场价之前, 投资人应坚持用尽调把私人市场热情转化为证据。[CV001, CV002, CV007, CV008, CV009, CV035]
| 建议 | 置信度 | 风险评级 | 估值立场 | 决策含义 |
|---|---|---|---|---|
| 观察 | 中 | 高 | 偏高 | 只有在严格尽调前提下,才继续跟进已完成的 $5B 估值锚点;不要仅凭公开证据承销下一轮升估值融资。 |
结论明确对价格敏感:它把已完成的 Series E 锚点与更热的后续融资叙事切开。
[CV001, CV007, CV035, CV042, CV046]当前判断维持在观察:品类和产品信号强,但收入不透明和估值跃升风险抵消了部分吸引力。
这是推理图,不是加权评分模型。
[CV001, CV030, CV031, CV042, CV046, CV047]8.2 只有收入质量真实且高端定价顶住竞争,价格才站得住
乐观逻辑从速度和时点开始。Baseten 的融资路径从 2025 年 2 月 $75 million Series C,走到 2025 年 9 月以 $2.15 billion 估值完成 $150 million Series D,再到 2026 年 1 月完成 $5 billion 估值的 Series E。 Sacra 估计的 $600 million 年化收入,以及 Baseten 自己声称的 100x 推理量增长,方向上都符合一个判断:推理从原型基础设施转向生产基础设施时, 公司正好踩中陡峭采用曲线。公开市场和私募可比公司也给这段增长提供了上下文:Modal 2026 年 5 月轮次约按 15.5x 年化收入定价,Fireworks 已完成估值约为 12.7x,MongoDB 式公开基础设施待遇仍在约 10x。 反向逻辑同样重要。Baseten 定价页、HostFleet 矩阵和 Runpod 对比都指向一个高端定价服务层,而不是商品化端点供应商。如果溢价买来正常运行时间、 支持、合规和混合灵活性,这可以是优势;如果业务比高端上市软件公司更重支持、利润率更低,也会压住倍数。超大规模云厂商让下行情景更尖锐: AWS、Google 和 Azure 可以把模型访问、算力、治理和云赠金捆进更大的云关系里。换句话说,Baseten 可能相对原始 AI 云值得溢价,但还没有拿出足够披露质量,让投资人有信心支付 Cloudflare 或 Datadog 式溢价。[CV003, CV004, CV005, CV006, CV010, CV011]
| 论点 | 支撑证据 | 何种情况会改变判断 |
|---|---|---|
| 推理正在从实验预算变成生产预算。 | Technavio 和 Mordor 都显示 AI 部署市场规模大且增长快,Baseten 的融资速度也说明资本在追逐这个主题。 | 如果采用放缓、企业转化较弱或模型服务预算收缩,溢价逻辑会被削弱。 |
| Baseten 可能配得上溢价,因为它卖的是性能、混合控制和支持,而不是裸 GPU 时间。 | Baseten 主打定制 SLA、自托管、优先 GPU、跨云扩展和前置部署工程师。 | 如果客户能用更便宜的替代方案复制可接受的延迟和正常运行时间,溢价会从护城河变成负担。 |
| 如果私营收入估计大致正确,已完成的 $5B 融资就说得通。 | Sacra 估计的 $600M 收入运行率意味着约 8.3x 收入倍数,低于许多高端软件可比公司。 | 如果一手财务数据显著低于该估计,这一支撑会很快失效。 |
| $11B 叙事尚未得到公开事实支撑。 | 唯一公开支撑是第三方关于谈判的报道,而不是已完成融资或披露的基本面。 | 如果出现已签署投资条款清单或已完成公告,且条款干净、财务得到佐证,投资逻辑会明显改善。 |
表格区分公司质量和估值质量;两者都成立,才有买入结论。
[CV007, CV008, CV010, CV011, CV015, CV030]要支撑 $5B 估值,所需收入会随投资人锚定的可比倍数而大幅变化。
每个柱形都用 $5B Series E 标记除以选定可比倍数;这些数值是支撑门槛,不是预测。
[CV017, CV020, CV022, CV024, CV027, CV029]8.3 可比公司和情景分析把 $5B 放进基准情景,但容错很小
可比公司组重要,是因为 Baseten 夹在两套估值制度之间。一边是 CoreWeave 这样的 AI 云和基础设施公司,资本密集度可见, 公开市场倍数更低;另一边是 Datadog、Cloudflare 这样的高端基础设施软件公司,披露质量、利润率和平台广度允许更高交易倍数。Baseten 最好的公开收入估计,把已完成的 $5 billion 轮次放在两者之间,而不是明确归入某一边。因此,已完成轮次可以讨论; 但传闻中的 $11 billion 跳升,还没有得到公开证据承销。 情景分析把同一点讲得更具体。悲观情景假设只有 $300 million 到 $400 million 的可持续收入支撑,并给 7x 到 9x 倍数, 指向实质降轮风险。基准情景用 $500 million 到 $650 million 收入和 8x 到 12x 倍数,支撑约 $4 billion 到 $7.8 billion,足以覆盖已完成的 Series E。乐观情景既需要更强收入延续,也需要倍数接近 Modal 式或私募推理基础设施上沿。因此,当前投资争论不是 Baseten 是否有吸引力;而是投资人是否因承担一个可辩护 $5 billion 价格与一个愿景型 $11 billion 叙事之间的不确定性而得到补偿。[CV016, CV017, CV018, CV019, CV020, CV021]
| 情景 | 假设 | 估值 / 回报逻辑 | 关键风险 | 概率信号 |
|---|---|---|---|---|
| 悲观 | 可持续收入支撑只有 $300M-$400M,高端定价被侵蚀,独立供应商经济性更像基础设施而非软件。 | 按 7x-9x 区间为 $2.1B-$3.6B;当前估值容易被重置。 | 价格压力、较低的毛利质量或企业扩张放缓会暴露降估值融资风险。 | 如果尽调显示交付依赖大量支持、收入集中或单位经济性较弱,悲观情景概率会上升。 |
| 基准 | 收入支撑落在约 $500M-$650M,增长仍强,Baseten 相对裸 AI 云资源保持适度溢价。 | 按 8x-12x 区间为 $4.0B-$7.8B;已完成的 $5B Series E 落在这个区间内。 | 判断仍取决于验证收入质量和毛利率,而不只是收入增长。 | 按今天的公开证据,这是最站得住脚的情景。 |
| 乐观 | 收入支撑达到 $700M-$900M,高端经济性守住,投资人继续为 Modal 级或更高的私营推理倍数买单。 | 按 12x-16x 区间为 $8.4B-$14.4B;$11B 升估值变得可能。 | 上行空间取决于持续超高增长,以及公开资料尚未证明的高质量毛利率与披露质量。 | 这一情景需要比当前公开记录更多的证据。 |
这些区间是情景输出,不是点估计;设计目的在于展示当收入支撑或倍数选择变化时,承销结论会多快移动。
[CV035, CV036, CV043, CV044, CV045]| 可比对象 | 估值背景 | 收入背景 | 隐含倍数 | 与 Baseten 的相关性 | 局限性 |
|---|---|---|---|---|---|
| Baseten 已完成 Series E | $5.0B 投后估值(2026 年 1 月) | ~$600M 年化收入估计 | ~8.3x | 当前承销讨论的直接锚点。 | 收入支撑来自第三方估计,不是公司披露。 |
| Modal | $4.65B 投后估值(2026 年 5 月) | ~$300M 年化收入 | ~15.5x | 弹性推理基础设施最接近的高端私营可比对象。 | 企业支持、合规或客户基础组合并不相同。 |
| Fireworks AI | $4.0B 投后估值(2025 年 10 月完成) | 2026 年 2 月年化收入估计约 $315M | ~12.7x | 有明确毛利率讨论的私营推理可比对象。 | 收入和毛利也都是第三方估计,并非经审计披露。 |
| CoreWeave | $59.75B 市值(2026 年 5 月) | 2026 年收入参考,指引中点代理约 $12.5B | ~4.8x | 可作为资本密集型基础设施的纯 AI 云下限参考。 | 规模、债务结构和商业模式都与 Baseten 差异很大。 |
| Datadog | $88.04B 市值(2026 年 5 月) | 2026 年收入指引中点约 $4.32B | ~20.4x | 披露增长质量的高端公开基础设施软件基准。 | 可观测性软件的毛利和披露质量优于 Baseten 的公开记录。 |
| Cloudflare | $85.47B 市值(2026 年 5 月) | 过去 12 个月收入约 $2.33B | ~36.7x | 开发者平台倍数上限参考。 | 品类领导力和上市公司成熟度远强于 Baseten 现在的水平。 |
| MongoDB | $27.01B 市值(2026 年 5 月) | 过去 12 个月收入约 $2.60B | ~10.4x | 中低位公开基础设施软件参考。 | 数据库经济性和安装基础不同于推理基础设施。 |
这是一个不完整但刻意拉宽的样本,覆盖未上市推理同业、AI 云和上市基础设施软件,用来框定市场可能愿意支付的价格区间。
[CV016, CV017, CV019, CV020, CV021, CV022]公开证据显示,已完成的 $5B 轮融资落在基准情景内;传闻中的 $11B 轮融资则需要乐观情景假设。
情景区间把收入支撑范围和可比公司倍数区间合并;传闻中的后续融资只作为外部信号展示,不作为认可的估值锚。
[CV008, CV043, CV044, CV045]Baseten 在市场顺风和产品差异化上得分较高,但信息披露质量和利润率确定性明显偏低。
评分是方向性的投委会式判断,只基于截至运行日保留下来的公开证据。
[CV025, CV030, CV031, CV040, CV041, CV042]8.4 投资逻辑应由条款、利润率证据和集中度把关,而不是只靠热情
最终判断取决于少数几个能迅速移动估值的尽调项。第一,投资人需要管理层级收入证据。如果公司确实接近 $600 million 年化收入运行率, 扩张强、集中度可接受,已完成的 $5 billion 轮次就开始显得合理,下一次估值也从空想变成可讨论。如果真实数字明显更低, 同一套定价和可比公司分析就会从「可辩护溢价」翻成「过度拉伸的后期估值」。第二,投资人需要直接利润率数据。Fireworks 约 50% 毛利率是有用同业提醒:推理业务不是纯软件。只有当利用率、支持负担和预留容量经济性带来比这个类比更好的利润率质量时, Baseten 的高端定价才配得上高端倍数。 第三,投资人需要标题价格下面的条款。优先权悬置、二级交易和客户集中度,可能比投后估值标题更重要。因此,正确否决触发器应务实而非修辞: 如果 Baseten 不能在可接受利润率下守住高端性价比,如果增长跌破基准情景区间,或如果任何新轮次只能靠激进条款完成,投资逻辑就应迅速下调。 在这些问题关闭之前,公司值得积极跟踪和结构化尽调,而不是给出高确信、对价格不敏感的买入决定。[CV023, CV025, CV032, CV033, CV034, CV039]
| 触发因素 | 阈值 / 信号 | 对投资逻辑的传导 | 行动含义 |
|---|---|---|---|
| 收入证据失效 | 管理层口径的收入运行率显著低于 $500M,或增速较公开叙事大幅放缓。 | 已关闭的 $5B 融资落出基准情景区间,开始像后周期估值标记。 | 下调建议,把估值工作重置到悲观情景区间。 |
| 利润率质量不及预期 | 毛利率和利用率更接近基础设施转售经济性,而不是高端软件经济性。 | 高端软件倍数不再适配该商业模式。 | 改用更低的可比公司组,并要求明显更好的进入价格。 |
| 性价比优势削弱 | 客户可以用更便宜的替代方案,或超大规模云厂商捆绑产品,跑出可接受的生产结果。 | Baseten 的溢价定价从护城河变成采用阻力。 | 下调信心,重新审视长期市场份额假设。 |
| 激进融资条款出现 | 新一轮只有在高额清算优先权、老股交易占比较高的结构,或非常规保护条款下才能完成。 | 表面估值不再能清晰对应普通股上行空间。 | 将该估值标记视为结构性更弱,并重做回报预期。 |
| 集中度显现 | 少数 AI 原生账户贡献过高收入占比,但留存证据跟不上。 | 相比表面增速,公司的收入质量和持续性吸引力大幅下降。 | 在集中度和扩张数据厘清前,暂停任何高信心判断。 |
这些触发因素需要可监测,并直接连到估值支撑,而不是泛泛的经营提醒。
[CV012, CV013, CV014, CV039, CV042, CV043]| 主题 | 缺失证据 | 重要性 | 负责人或尽调路径 |
|---|---|---|---|
| 收入桥接 | 截至当前季度的月度和季度收入、ARR 以及队列扩张。 | 这是判断 $5B 是否合理、还是已经偏高的最大变量。 | CFO / 财务资料包和董事会材料。 |
| 毛利率与利用率 | 按产品线拆分的毛利率、GPU 利用率、预留容量组合和支持负担。 | 只有利润率质量真实,估值倍数才配得上向高端软件靠拢。 | 基础设施 + 财务深挖,并拆到产品线。 |
| 股权结构表与条款 | 完全稀释股数、清算优先权、老股交易,以及任何结构化条款。 | 表面投后估值可能夸大普通股上行空间。 | 法务 + 财务审阅最新融资文件。 |
| 客户集中度与留存 | 头部客户敞口、NRR、客户标识留存,以及 AI 原生和企业账户的垂直行业组合。 | 如果支出集中或不可重复,溢价倍数很脆弱。 | 销售运营和客户成功队列复盘。 |
| 上调估值融资证据 | 任何高于 $5B 的估值,都需要已签署投资意向书或已关闭融资证明。 | 传闻估值不应在承销判断中替代已关闭锚点。 | 董事会流程复盘和直接融资确认。 |
这些尽调事项按改变估值支撑的速度排序,而不是按公司总体重要性排序。
[CV025, CV037, CV046, CV047, CV048]免责声明
本报告由自动化研究流程基于截至 2026-05-30 的公开信息生成。报告不构成投资建议。私营公司数据可能不完整、过时或只是估计,投资者在作出任何投资决定前,应补充管理层尽调、合同审查,并直接获取财务材料。
证据索引
| 编号 | 陈述 | 可信度 | 来源 |
|---|---|---|---|
| CO001 | Baseten was founded in 2019. | 高 | SO009, SO016, SO018 |
| CO002 | Baseten is based in San Francisco. | 高 | SO008, SO014, SO016 |
| CO003 | Baseten’s legal entity is Baseten Labs, Inc., and its privacy policy lists 201 Spear St, Suite 1600, San Francisco, CA 94105. | 高 | SO007, SO008 |
| CO004 | Baseten currently presents itself as an inference company built around high-performance production inference. | 高 | SO001, SO003, SO014 |
| CO005 | Official product surfaces show that Baseten combines production inference, model APIs, and training workflows in one platform. | 中 | SO001, SO005 |
| CO006 | Baseten sells cloud, self-hosted, and region-aware deployment options aimed at customers that need control over security or data residency. | 高 | SO003, SO004, SO005 |
| CO007 | Baseten says it is SOC 2 Type II and HIPAA compliant across its hosting options. | 高 | SO003, SO004, SO005 |
| CO008 | Baseten’s careers page, customer hub, and Series E press release name customers such as Abridge, Cursor, OpenEvidence, Speechify, Gamma, Clay, Notion, and Lovable. | 高 | SO006, SO002, SO014 |
| CO009 | The founders say they started Baseten at the end of 2019 to solve model-deployment and ML-infrastructure pain they had experienced themselves. | 中 | SO009 |
| CO010 | Tuhin Srivastava is publicly identified as CEO and co-founder. | 高 | SO031, SO015 |
| CO011 | Amir Haghighat is publicly identified as CTO and co-founder. | 高 | SO032, SO015 |
| CO012 | Phil Howes is publicly identified as a co-founder, and Tech Funding News describes him as chief scientist. | 中 | SO034, SO015 |
| CO013 | Pankaj Gupta is publicly identified as a co-founder. | 中 | SO033, SO015 |
| CO014 | Baseten’s Series E announcement is signed by Amir, Pankaj, Phil, and Tuhin, showing that all four founders still anchor the public leadership narrative. | 中 | SO013 |
| CO015 | Public governance visibility is limited in the fetched corpus, but Series D explicitly says Jay Simons joined Baseten’s board. | 中 | SO012, SO016 |
| CO016 | By the Series A milestone, Baseten said it had raised a little over $20 million across seed and Series A, with Greylock leading the Series A and South Park Commons, Lachy Groom, and Ray Tonsing also involved. | 中 | SO009 |
| CO017 | Tracxn and the archived PitchBook profile both place Baseten’s Series A on 2022-04-26. | 中 | SO016, SO018 |
| CO018 | Baseten’s Series B added $40 million led by IVP and Spark, with Greylock, South Park Commons, Lachy Groom, and Base Case also participating. | 中 | SO010 |
| CO019 | Tracxn records the Series B round date as 2024-03-04. | 中 | SO016 |
| CO020 | Baseten’s Series C raised $75 million on 2025-02-19 and was backed by IVP, Spark, Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, and Dick Costolo. | 高 | SO011, SO016 |
| CO021 | The Series C post says Baseten was already running workloads across thousands of GPUs and serving millions of end customers worldwide by early 2025. | 中 | SO011 |
| CO022 | Baseten’s Series D raised $150 million, was led by BOND, and brought CapitalG and Conviction into the round alongside prior investors. | 高 | SO012, SO016 |
| CO023 | CB Insights and Tracxn both peg Baseten’s September 2025 valuation at about $2.15 billion. | 中 | SO017, SO016 |
| CO024 | Series D linked financing to governance by adding Jay Simons to the board. | 中 | SO012 |
| CO025 | Baseten’s Series E raised $300 million at a $5 billion valuation, led by IVP and CapitalG with NVIDIA and several prior investors participating. | 高 | SO014, SO013, SO015 |
| CO026 | Tracxn and CB Insights both list the Series E closing date as 2026-01-20. | 中 | SO016, SO017 |
| CO027 | BusinessWire says Baseten has raised $585 million to date and that the Series E financing was its third fundraise in the prior year. | 高 | SO014, SO016, SO017 |
| CO028 | BusinessWire describes Baseten as infrastructure behind AI products including Cursor, Mercor, Clay, OpenEvidence, Lovable, and Abridge. | 中 | SO014 |
| CO029 | Official enterprise and healthcare pages market four-nines reliability, multi-cloud autoscaling, and region-restricted or self-hosted deployment options for sensitive workloads. | 高 | SO003, SO004, SO001 |
| CO030 | NVIDIA’s case study says Baseten reduced cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. | 中 | SO027 |
| CO031 | OpenEvidence’s case study says Baseten now serves billions of requests per week for OpenEvidence and reduced end-to-end latency from over 700 milliseconds to 160 milliseconds. | 中 | SO023 |
| CO032 | Gamma says it generates roughly 3 million images per day on Baseten for 70+ million users and more than $100 million of ARR. | 中 | SO024 |
| CO033 | Speechify says Baseten cut its cost per million characters by 44% while supporting a 60M+ user base. | 中 | SO025 |
| CO034 | Patreon says Baseten saved more than 440 developer hours per year and cut GPU costs by 70% for Whisper-based workloads. | 中 | SO026 |
| CO035 | A January 2026 WorkOS interview says Baseten had just launched a startup program for seed and Series A companies and saw voice as an emerging modality. | 中 | SO028 |
| CO036 | Current headcount is not cleanly supportable because PitchBook and Tracxn show conflicting employee counts and entity-level staff figures. | 低 | SO018, SO016 |
| CO037 | ServiceAlert’s 90-day monitor showed May 2026 at 100% uptime and zero days with issues, but it also says it only tracks daily worst-status reachability and lacks detailed incident data. | 中 | SO030 |
| CO038 | Nudge Security frames Baseten as a vendor-risk review target and lists security badges and SSO or MFA features, but it is an external aggregator rather than Baseten’s primary trust documentation. | 中 | SO029, SO007 |
| CO039 | Abridge describes itself as enterprise-grade AI for clinical conversations used by large healthcare systems, consistent with Baseten’s official claim that healthcare AI is a core customer segment. | 中 | SO019, SO006 |
| CO040 | Cursor says it is trusted by over half of the Fortune 500, supporting Baseten’s official claim to serve category-defining AI applications rather than only hobby use cases. | 中 | SO021, SO014 |
| CO041 | Clay says more than 500,000 GTM teams use its platform, and Baseten’s Series E materials identify Clay as a customer. | 中 | SO020, SO014 |
| CO042 | OpenEvidence positions itself as America’s Official Medical Knowledge Platform with major medical-content partners, and Baseten names it as a customer in both careers and Series E materials. | 中 | SO022, SO006, SO014 |
| CO043 | External market-data sources classify Baseten as a private Series E company. | 中 | SO016, SO018 |
| CO044 | Baseten’s pricing page shows a pay-as-you-go model with token-priced model APIs and per-minute compute pricing for GPU and CPU instances. | 中 | SO005 |
| CO045 | Baseten’s terms define the service as a platform for deploying machine learning models and building or operating applications for machine learning through a web interface. | 中 | SO007 |
| CO046 | The Series A post says Baseten quietly announced its first product in May after more than 18 months of building and used the funding moment to launch a public beta. | 中 | SO009 |
| CM001 | Baseten positions itself as a platform for high-performance inference in production rather than as a foundation-model creator. | 中 | SM001, SM003, SM006 |
| CM002 | Baseten's product surface now spans Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training. | 中 | SM005, SM006, SM007, SM015, SM016 |
| CM003 | The included spend in Baseten's core market is model-serving runtime, autoscaling, observability, billing or metering, and associated performance support attached to production inference. | 中 | SM002, SM006, SM007, SM016, SM018 |
| CM004 | The excluded spend is frontier-model R&D and the broader data or analytics stack that hyperscaler AI suites bundle but Baseten does not foreground. | 中 | SM015, SM027, SM028, SM029 |
| CM005 | Baseten competes inside overlapping categories of AI inference-as-a-service, broader AI inference, and enterprise AI platform software rather than a single cleanly defined market. | 中 | SM019, SM020, SM021 |
| CM006 | Status-quo substitutes include hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. | 中 | SM022, SM023, SM024, SM027, SM028, SM029 |
| CM007 | Baseten's positioning emphasizes open-source and custom model deployment rather than ownership of closed frontier models. | 中 | SM005, SM006, SM014 |
| CM008 | Training is an adjacency for Baseten, but the commercial center of gravity remains inference and the train-to-deploy loop that feeds inference endpoints. | 中 | SM001, SM006, SM015 |
| CM009 | Technavio values the AI inference-as-a-service market at USD 85.25 billion in 2025. | 中 | SM019 |
| CM010 | Technavio forecasts a 22.1% CAGR for AI inference-as-a-service during 2026-2030. | 中 | SM019 |
| CM011 | Technavio says the GPU segment accounts for more than 58% of the AI inference-as-a-service market and that cloud deployment dominates the category. | 中 | SM019 |
| CM012 | Technavio says North America contributes 41.1% of forecast growth in AI inference-as-a-service. | 中 | SM019 |
| CM013 | Mordor Intelligence puts the enterprise AI market at USD 114.87 billion in 2026 and projects 18.91% CAGR through 2031. | 中 | SM020 |
| CM014 | Mordor says software and platforms led 65.89% of 2025 enterprise AI revenue. | 中 | SM020 |
| CM015 | Mordor says cloud solutions accounted for 67.33% of 2025 enterprise AI revenue while hybrid and edge configurations are the faster-growing deployment path. | 中 | SM020 |
| CM016 | Large enterprises accounted for 71.43% of 2025 enterprise AI spending in Mordor's market model. | 中 | SM020 |
| CM017 | Healthcare and life sciences are Mordor's fastest-growing enterprise AI vertical at 20.77% CAGR. | 中 | SM020 |
| CM018 | Fortune Business Insights values the broader AI inference market at USD 117.80 billion in 2026, up from USD 103.73 billion in 2025. | 中 | SM021 |
| CM019 | Fortune forecasts 12.98% CAGR to 2034 and says North America held 41.78% of the AI inference market in 2025. | 中 | SM021 |
| CM020 | Across public lenses, Baseten's addressable opportunity is clearly large but scope-sensitive: roughly USD 85 billion for inference-as-a-service today and more than USD 100 billion for adjacent inference or enterprise AI platform categories. | 中 | SM019, SM020, SM021 |
| CM021 | Baseten's best-evidenced buyer groups are performance-sensitive AI product teams, enterprise AI infrastructure teams, and model labs monetizing APIs. | 中 | SM003, SM010, SM016 |
| CM022 | Gamma shows a PLG or self-serve segment that values low latency and open-source model serving without building an internal ML infrastructure team. | 中 | SM012 |
| CM023 | OpenEvidence shows a regulated healthcare segment that wanted reliable performance, redundancy, and flexible compute without multi-year GPU commitments. | 中 | SM011 |
| CM024 | Writer shows enterprise model teams serving 70B models need multi-GPU performance engineering and secure deployments. | 中 | SM013 |
| CM025 | The daily users of Baseten are ML engineers, data scientists, and application engineers, while procurement, security, and IT administrators become stakeholders once deployments require identity, policy, or compliance controls. | 中 | SM009, SM014, SM017 |
| CM026 | Budget ownership appears to begin in product or engineering budgets for usage-based experimentation and shift toward central platform or IT budgets for quoted Pro, Enterprise, or self-hosted deployments. | 中 | SM002, SM003, SM017, SM018 |
| CM027 | Baseten's adoption path commonly starts with Model APIs or simple deployments and expands to Dedicated Inference and Chains as traffic, hardware specialization, or compound workflows grow. | 中 | SM001, SM005, SM006, SM007 |
| CM028 | Frontier Gateway creates a separate buyer motion for labs that need white-labeled APIs, rate limits, token metering, and billing without building their own inference control plane. | 中 | SM016 |
| CM029 | Baseten productizes compliance with HIPAA, SOC 2 Type II, region restrictions, dedicated namespaces, and a no-shared-GPU posture. | 中 | SM003, SM004, SM009 |
| CM030 | Baseten positions self-hosted, hybrid, and cloud deployments as ways to meet data residency, security, and existing cloud-commitment requirements. | 中 | SM002, SM003, SM006, SM008 |
| CM031 | Baseten's Model APIs are OpenAI-compatible and are marketed as 5-10x cheaper than closed alternatives. | 中 | SM005 |
| CM032 | Dedicated Inference is marketed as delivering 6x better GPU utilization and 5-10x lower costs at scale. | 中 | SM006 |
| CM033 | Chains is marketed as giving compound AI teams 6x better GPU usage and roughly half the latency through hardware-aware orchestration. | 中 | SM001, SM007 |
| CM034 | Baseten's value proposition is not just compute rental; customer stories repeatedly emphasize outsourced performance engineering and forward-deployed support. | 中 | SM003, SM011, SM012, SM013 |
| CM035 | OpenEvidence reported 78% lower latency, 6x faster deployments, and 8x-plus lower infrastructure maintenance time after moving to Baseten. | 中 | SM011 |
| CM036 | Gamma reported 30-80% faster image generation, 20% better efficiency, and scaling to 70+ million users and about 3 million images per day on Baseten. | 中 | SM012 |
| CM037 | Writer reported 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. | 中 | SM013 |
| CM038 | Enterprise AI growth is being driven by automation demand, exploding data volumes, cloud AI services, and specialized hardware advances. | 中 | SM020 |
| CM039 | AI inference demand is also being driven by real-time processing needs, generative AI workloads, and edge or IoT expansion. | 中 | SM019, SM021 |
| CM040 | Hardware supply constraints, high accelerator prices, and tariff pressure are material market constraints for both inference providers and buyers. | 中 | SM019, SM021 |
| CM041 | Talent shortages and legacy-system integration complexity remain major barriers to enterprise AI rollout. | 中 | SM020, SM021 |
| CM042 | Multi-cloud capacity management and the ability to avoid long-term GPU commitments address a real buyer pain point around capacity risk and demand spikes. | 中 | SM008, SM011, SM016 |
| CM043 | HostFleet's April 2026 matrix shows Baseten's published hourly GPU prices above Runpod, Modal, and Replicate on like-for-like SKUs such as T4, L4, A100, and H100. | 中 | SM026 |
| CM044 | Runpod's own 2026 comparison ranks Baseten fifth and attributes 8-12 second cold starts to Baseten while highlighting cheaper or faster specialist alternatives. | 中 | SM025 |
| CM045 | Hyperscaler substitutes bundle model deployment with broader data, notebook, governance, and agent tooling rather than pure inference specialization. | 中 | SM027, SM028, SM029 |
| CM046 | Baseten's clearest public beachheads are high-performance consumer AI products, regulated healthcare workloads, and model labs monetizing proprietary models. | 中 | SM011, SM012, SM013, SM016 |
| CM047 | Public pricing and packaging imply Baseten trades a higher headline GPU rate for bundled performance engineering, observability, security, and managed support. | 中 | SM002, SM003, SM016, SM026 |
| CM048 | Public sources do not isolate a clean Baseten-specific SAM or SOM because available estimates mix enterprise AI, inference infrastructure, cloud, edge, and model-serving categories. | 中 | SM019, SM020, SM021 |
| CM049 | Public material does not disclose Baseten's contract sizes, attach rates for support, or revenue mix across Model APIs, Dedicated Inference, Training, and Frontier Gateway. | 中 | SM002, SM016, SM018 |
| CP001 | Baseten's current product surface spans custom-model deployment, Model APIs, training, Chains orchestration, and Frontier Gateway. | 高 | SP003, SP005, SP006, SP007, SP010 |
| CP002 | Baseten supports Baseten Cloud, single-tenant or self-hosted deployments, and multi-cloud capacity or cross-cloud autoscaling. | 高 | SP001, SP003, SP004, SP009 |
| CP003 | Baseten's public pitch centers on speed, uptime, and developer experience instead of lowest-cost raw GPU capacity. | 高 | SP001, SP008, SP009, SP029 |
| CP004 | Modal positions as Python-first serverless AI infrastructure with instant autoscaling to 1000+ GPUs and built-in observability. | 高 | SP014, SP015 |
| CP005 | Replicate positions around one-line APIs, community-published models, fine-tuning, and custom deployment through Cog. | 高 | SP016, SP017 |
| CP006 | Runpod offers Pods, Serverless, and Clusters, emphasizing fast scaling, many GPU SKUs or regions, and low-cost capacity. | 高 | SP018, SP019, SP020 |
| CP007 | AWS SageMaker, Google Vertex AI, and Azure ML each market broader end-to-end ML or AI lifecycle tooling with strong enterprise controls. | 高 | SP021, SP023, SP024 |
| CP008 | Internal build remains a real substitute because Truss packages models portably and can narrow the software gap between local, self-managed, and hosted deployments. | 高 | SP011, SP003 |
| CP009 | Frontier Gateway lets model labs ship white-labeled APIs with per-user keys, rate limits, and metering, widening Baseten's competitor set to lab-facing platforms. | 中 | SP010, SP029 |
| CP010 | PitchBook and Tracxn independently name Modal and Replicate among Baseten's comparable competitors, supporting the direct-peer set beyond vendor marketing. | 高 | SP027, SP026 |
| CP011 | Baseten's public plans split into Basic pay-as-you-go, quote-driven Pro, and Enterprise, with priority compute, dedicated compute, self-host, and custom SLAs above Basic. | 高 | SP002, SP009 |
| CP012 | Baseten says customers do not pay for idle time and only pay while models are deploying, scaling, or processing on the platform. | 中 | SP002 |
| CP013 | Baseten advertises SOC 2 Type II, HIPAA compliance, and no default storage of model inputs or outputs. | 高 | SP002, SP004, SP009 |
| CP014 | Baseten's runtime layers open-source engines such as TensorRT-LLM, SGLang, vLLM, TGI, and TEI with custom optimizations like speculative decoding and KV-cache management. | 高 | SP003, SP008 |
| CP015 | Baseten Model APIs are OpenAI-compatible and can move from shared APIs to dedicated deployments on Baseten-managed hardware. | 高 | SP005, SP003 |
| CP016 | Modal's public pricing offers $30 per month in Starter credits, 10 GPU concurrency on Starter, and 50 GPU concurrency on Team at $250 per month plus compute. | 中 | SP015 |
| CP017 | Replicate private models usually bill for setup, idle, and active time on dedicated hardware, making always-warm custom deployments costlier than pure scale-to-zero billing. | 高 | SP017, SP016 |
| CP018 | Runpod Secure Cloud and Serverless publish materially lower raw GPU list prices than Baseten's public per-GPU pricing for comparable capacity tiers. | 中 | SP019, SP020, SP002, SP025 |
| CP019 | Runpod Serverless bills per second from worker start until full stop, with flex workers scaling to zero and active workers remaining on. | 高 | SP020, SP018 |
| CP020 | AWS Bedrock prices open-model access by provider or model and service tier, and its batch inference option is listed at 50% below on-demand pricing. | 中 | SP022 |
| CP021 | Google Vertex AI prices tools, compute, storage, and management fees separately rather than offering simple public list GPU rates. | 中 | SP023 |
| CP022 | Azure ML charges no standalone platform fee but bills the Azure services consumed around training, deployment, storage, and monitoring. | 中 | SP024 |
| CP023 | Baseten's multi-cloud and self-host options reduce buyer fear of cloud lock-in, but they also make it easier for customers to migrate away from Baseten later. | 高 | SP001, SP009, SP011 |
| CP024 | Baseten's public trust posture is stronger than most self-serve peers because it combines compliance claims with single-tenant and self-host deployment modes. | 高 | SP002, SP004, SP009 |
| CP025 | Hyperscalers retain the strongest distribution power because Bedrock or SageMaker, Vertex AI, and Azure ML sit inside existing identity, billing, and procurement relationships. | 高 | SP021, SP023, SP024 |
| CP026 | Modal narrows the enterprise gap with marketplace transacting, SSO, audit logs, and HIPAA on Enterprise, but its public package is still compute-led rather than inference-specific governance. | 中 | SP014, SP015 |
| CP027 | Replicate minimizes adoption friction for prototypes through community models and simple APIs, but its public materials disclose less enterprise control than Baseten's. | 中 | SP016, SP017, SP009 |
| CP028 | Runpod explicitly markets no lock-in, low cost, and fast scale, making it attractive to cost-sensitive teams comfortable assembling their own serving stack. | 高 | SP018, SP019, SP020 |
| CP029 | Open-source packaging via Truss and Cog plus raw GPU clouds make multi-homing structurally easier in this market than in closed-model or data-platform markets. | 高 | SP011, SP017, SP019 |
| CP030 | Baseten's expansion into training and lab-facing gateway products moves it from pure hosting into a broader AI infrastructure platform category. | 高 | SP006, SP010, SP029 |
| CP031 | Baseten's main moat is the integration of optimized runtimes, multi-cloud capacity, enterprise deployment modes, and hands-on engineering support rather than exclusive model ownership. | 高 | SP008, SP009, SP029 |
| CP032 | Truss can create developer pull and portability at the same time, so it is both a funnel asset and a limiter on hard lock-in. | 高 | SP011, SP003 |
| CP033 | HostFleet's April 2026 matrix shows Baseten as the highest published per-GPU-hour option among the compared serverless hosts on multiple common GPU tiers. | 中 | SP025, SP002, SP015, SP017, SP019 |
| CP034 | The same HostFleet comparison still argues Baseten is attractive for production workloads because Truss, observability, and support are tangible despite higher headline pricing. | 中 | SP025, SP002, SP003 |
| CP035 | Baseten's public status page reports 99.91% uptime for Model APIs over the displayed window and records multiple May 2026 incidents. | 中 | SP012 |
| CP036 | Servicealert's independent outage tracker also shows non-perfect recent availability for Baseten, reinforcing that reliability remains a diligence item. | 中 | SP013, SP012 |
| CP037 | Sacra identifies hyperscaler bundling and below-market pricing as the clearest external threat to independent inference platforms like Baseten. | 中 | SP028, SP021, SP023, SP024 |
| CP038 | Business Wire and TechFundingNews both frame Baseten's current strategic battleground as production inference infrastructure rather than frontier-model training ownership. | 中 | SP029, SP030 |
| CP039 | Business Wire says Baseten has raised $585 million and counts NVIDIA, IVP, and CapitalG among key investors, improving its staying power in a capital-intensive market. | 中 | SP029, SP028, SP026 |
| CP040 | Baseten's best-supported positioning is premium, production-grade open-model inference for teams that value performance, portability, and support more than lowest-cost GPU hours. | 高 | SP003, SP008, SP009, SP025, SP028 |
| CI001 | Baseten's public monetization surfaces span dedicated deployments, Model APIs, and Training. | 高 | SI001, SI003, SI004, SI007 |
| CI002 | Baseten's public plan structure is Basic at $0 per month pay-as-you-go, with Pro and Enterprise sold via quote. | 中 | SI001 |
| CI003 | Pro includes priority access to high-demand GPUs, dedicated compute, higher Model API rate limits, hands-on engineering expertise, dedicated Slack and Zoom support, and volume discounts. | 中 | SI001 |
| CI004 | Enterprise includes custom SLAs, self-host deployments, use of existing cloud commitments, full control over data residency, and advanced RBAC with Teams. | 中 | SI001, SI005 |
| CI005 | Baseten publishes Model API list pricing per 1 million tokens with separate columns for input, cached input, and output. | 高 | SI001, SI003 |
| CI006 | Dedicated deployments are billed only for compute used, down to the minute. | 中 | SI001 |
| CI007 | Baseten says customers do not pay for idle time, but do pay while a model is deploying, scaling up or down, or making predictions. | 中 | SI001 |
| CI008 | Baseten sells Training both as Loops early access and as generally available Training Jobs, with a direct train-to-deploy path into production inference. | 中 | SI004 |
| CI009 | Baseten's Terms state that fees are billed at the end of the month and payable within 30 days unless an Order says otherwise. | 中 | SI013 |
| CI010 | Baseten's Terms make the Order the binding commercial instrument, so enterprise economics can vary contract by contract even though list pricing is public. | 中 | SI013, SI005 |
| CI011 | The billing usage API returns separate dedicated_usage, training_usage, and model_apis_usage blocks with subtotals, credits used, totals, and daily breakdowns. | 中 | SI007 |
| CI012 | The model_apis_usage block reports model name plus input, output, and cached input token counts. | 中 | SI007 |
| CI013 | The dedicated_usage block reports billable resource metadata, minutes, subtotal, and inference request counts. | 中 | SI007 |
| CI014 | Baseten explicitly monetizes support and engineering help through Pro, Enterprise, and enterprise deployment offers. | 高 | SI001, SI002, SI005 |
| CI015 | Dedicated Inference claims Baseten regularly sees 6x better GPU utilization and 5-10x lower costs powered by its inference stack. | 中 | SI002 |
| CI016 | The Model APIs page claims Baseten can spend 5-10x less than closed alternatives when serving optimized frontier open models. | 中 | SI003 |
| CI017 | The Enterprise page frames Baseten's economic advantage as higher output and better GPU utilization from optimized runtimes rather than seat-based software pricing. | 中 | SI005 |
| CI018 | The Healthcare page says per-minute billing and scale-to-zero make GPU costs scale with active inference rather than idle overhead. | 中 | SI006 |
| CI019 | Writer reports 35% lower cost per million tokens, 60% higher tokens per second, and 23% lower time to first token on Baseten. | 中 | SI016 |
| CI020 | OpenEvidence reports 78% lower latency, 6x faster deployment processes, 8x+ lower infrastructure maintenance time, and flexible access to compute without multi-year contracts. | 中 | SI017 |
| CI021 | Speechify reports 44% lower cost per million characters, 30-50% lower p99 latency, and 4.5x faster replica startup after migrating to Baseten. | 中 | SI018 |
| CI022 | Superhuman reports 80% lower P95 latency and says Baseten freed multiple engineers from building and running inference infrastructure in-house. | 中 | SI019 |
| CI023 | Patreon reports 440+ hours of development time saved per year, $600,000 of resources saved per year, and 70% GPU-cost savings on Baseten. | 中 | SI020 |
| CI024 | Taken together, Baseten's customer proofs sell lower total production cost and faster deployment for serious workloads rather than the lowest raw GPU-hour list price. | 中 | SI016, SI017, SI018, SI019, SI020 |
| CI025 | HostFleet's April 2026 matrix shows Baseten priced above Runpod on every shared GPU SKU it lists, above Modal on the shared L4 and H100 rows, and below only Replicate's A100 custom deployment rate among the shared A100 prices shown. | 中 | SI027 |
| CI026 | HostFleet says Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. | 中 | SI027 |
| CI027 | Baseten raised $300 million at a $5 billion valuation in January 2026. | 高 | SI010, SI021, SI024, SI025 |
| CI028 | Business Wire says the January 2026 financing was Baseten's third fundraise in the prior year and brought total capital raised to $585 million. | 高 | SI021, SI024, SI025 |
| CI029 | Baseten's Series D was $150 million in September 2025. | 高 | SI009, SI024 |
| CI030 | Baseten's Series C was $75 million in February 2025. | 高 | SI008, SI024 |
| CI031 | Tracxn and CB Insights also show $585 million total funding and a $300 million Series E on January 20, 2026. | 中 | SI024, SI025 |
| CI032 | Baseten's Series E blog says inference volume grew 100x in the prior year. | 中 | SI010 |
| CI033 | Baseten's Series E materials say the new capital will fund speed, uptime, developer experience, team growth, and a broader infrastructure platform. | 高 | SI010, SI021 |
| CI034 | Tech Funding News says the new funding is expected to support hiring in engineering and customer service plus platform and integration expansion. | 中 | SI022 |
| CI035 | Sacra estimates Baseten reached $200 million annualized revenue in December 2025 and $600 million annualized revenue in March 2026. | 低 | SI023 |
| CI036 | Sacra says Baseten monetizes either API consumption or GPU minutes and hours and uses multi-cloud capacity management across more than 15 cloud providers instead of owning GPU infrastructure. | 中 | SI023 |
| CI037 | PitchBook labeled Baseten as generating revenue by February 2025 and showed 73 employees in its April 2025 snapshot. | 中 | SI026 |
| CI038 | Tracxn lists 258 employees as of April 2026. | 中 | SI024 |
| CI039 | The jump from 73 employees in PitchBook's 2025 snapshot to 258 in Tracxn's April 2026 snapshot implies substantial operating-expense growth, but payroll and burn are undisclosed. | 中 | SI024, SI026 |
| CI040 | Baseten's status page shows Model APIs at 99.91% uptime over the displayed 90-day window and multiple incidents in May 2026, while the Dedicated Inference component shows 100.0% uptime over the same displayed window. | 中 | SI015 |
| CI041 | The Dedicated Inference SLA targets 99.9% monthly availability, caps service credits at 40% of monthly fees, and requires claims within 24 hours of downtime. | 中 | SI014 |
| CI042 | Baseten's privacy policy identifies the contracting entity as BaseTen Labs, Inc. | 中 | SI012 |
| CI043 | The SEC EDGAR entity landing page for CIK 0001850888 says there is no filings data for the organization, so there are no public SEC operating-company financial statements available from that page. | 中 | SI029 |
| CI044 | Mordor says cloud deployments were 67.33% of enterprise AI revenue in 2025 and hybrid and edge deployments are forecast to grow 19.53% CAGR through 2031. | 中 | SI028 |
| CI045 | Mordor says healthcare and life sciences are forecast to grow 20.77% CAGR through 2031. | 中 | SI028 |
| CI046 | Baseten's enterprise and healthcare pages align with that opportunity through self-host, cloud-commitment, data-residency, HIPAA, and SOC 2 positioning. | 中 | SI005, SI006, SI028 |
| CI047 | Baseten's public materials do not disclose cash balance, monthly burn, runway, gross margin, CAC, NRR, customer concentration, or revenue mix by product surface. | 中 | SI001, SI010, SI013, SI021, SI023, SI024, SI025, SI029 |
| CI048 | Sacra reports Baseten is in talks to raise capital at about an $11 billion post-money valuation, with some reported offers as high as $15 billion, but that is not a closed financing. | 低 | SI023 |
| CI049 | Because Baseten appears asset-light on owned GPUs but premium-priced on raw list compute, margin quality likely depends on utilization, enterprise support attachment, and negotiated discounts rather than headline GPU rates alone. | 中 | SI005, SI023, SI027 |
| CI050 | The public evidence supports strong demand, pricing, and capital-access narratives, but a real underwriting decision still depends on private data for realized pricing, retention, gross margin, burn, and concentration. | 中 | SI021, SI023, SI024, SI025, SI027, SI029 |
| CI051 | Baseten's public customer proofs now span financial-services AI, coding copilots, voice dictation, and world-model workloads, indicating that production demand is diversified across several latency-sensitive categories rather than one single end market. | 中 | SI030, SI031, SI032, SI033, SI034 |
| CI052 | Hebbia said Baseten improved tokens per second 2.5x, improved time to first token 4x, and reduced inference cost by more than 10x versus its previous deployment. | 中 | SI030 |
| CI053 | Posit said Baseten delivered sub-200ms latency for its Next Edit Suggestions feature and let the team pay only for compute it actually used. | 中 | SI031 |
| CI054 | Wispr Flow said its end-to-end speech and Llama pipeline ran in under 700 milliseconds at p99 on Baseten and AWS, with scale-to-zero elasticity. | 中 | SI032 |
| CI055 | Zed said Baseten lowered p90 latency by 45% and increased throughput 3.6x versus its previous inference provider, supporting Baseten's claim that performance wins can displace incumbent infrastructure. | 中 | SI033 |
| CE001 | Baseten publicly presents a full-stack product surface spanning Truss-led custom deployment, Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training rather than a single hosting SKU. | 高 | SE001, SE002, SE005, SE007, SE008, SE009, SE010 |
| CE002 | Model APIs run on shared infrastructure with OpenAI and Anthropic API compatibility, while dedicated deployments let customers choose hardware, engines, and scaling for their own models. | 高 | SE006, SE007 |
| CE003 | Truss packages model serving logic, dependencies, weights, and GPU configuration so the same artifact behaves consistently in development and production. | 中 | SE025, SE027 |
| CE004 | Truss publicly supports vLLM, SGLang, TensorRT-LLM, transformers, diffusers, PyTorch, and TensorFlow. | 中 | SE025, SE027 |
| CE005 | For supported architectures, a config-only Truss deployment can compile a model with TensorRT-LLM and expose an OpenAI-compatible endpoint without custom Python model code. | 中 | SE004, SE025 |
| CE006 | Chains deploys Python-defined chainlets where each step can set its own hardware resources, software dependencies, and autoscaling settings. | 高 | SE002, SE009 |
| CE007 | Baseten's training surface has two public tracks: Training Jobs is GA and Loops is early access. | 中 | SE010 |
| CE008 | Loops is positioned as a training SDK whose checkpoints can promote directly into Dedicated Inference, making inference a first-class output of training. | 中 | SE010, SE026 |
| CE009 | Frontier Gateway adds a white-labeled API surface with key management, rate limits, metering, billing, and branded URLs for labs serving their own models to customers. | 高 | SE002, SE008 |
| CE010 | MCM is Baseten's infrastructure control plane for unifying GPUs across cloud providers and regions, provisioning resources, and rerouting workloads during capacity crunches or outages. | 高 | SE004, SE011 |
| CE011 | Baseten gives each deployment a dedicated model subdomain and keeps endpoint names stable across environment promotion. | 中 | SE004 |
| CE012 | Baseten's request-routing model parks requests during scale-to-zero cold starts and offers an async queue that prioritizes synchronous traffic when capacity is tight. | 中 | SE004 |
| CE013 | BDN mirrors model weights into Baseten-controlled storage and uses mirrored-origin, cluster, and node caches to make large-model cold starts faster after the first pull. | 高 | SE004, SE019 |
| CE014 | Baseten publicly documents runtime optimizations including TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured outputs, KV-cache optimization, and topology-aware parallelism. | 高 | SE002, SE013 |
| CE015 | Baseten offers Baseten Cloud, self-hosted, hybrid, single-tenant, and region-restricted deployment options for customers that need different control or residency models. | 高 | SE007, SE011, SE015 |
| CE016 | Regional environments require Baseten configuration and a different regional endpoint format to guarantee inference traffic stays inside the designated geography. | 中 | SE021 |
| CE017 | Baseten publicly claims SOC 2 Type II and HIPAA compliance across its cloud hosting surfaces. | 高 | SE014, SE015, SE016 |
| CE018 | Baseten says it does not store model inputs, outputs, or weights by default, except temporary storage for async inference and optional caching users enable. | 高 | SE014, SE015 |
| CE019 | Baseten's public security docs say the platform never shares GPUs across users, isolates each customer into a dedicated Kubernetes namespace, and uses Calico, Falco, and Gatekeeper around workload security. | 中 | SE014 |
| CE020 | Baseten added Enterprise SSO and SCIM in May 2026 with SAML 2.0 sign-in, SCIM 2.0 sync, just-in-time provisioning, automatic deprovisioning, and group-based role assignment. | 中 | SE017 |
| CE021 | Rolling deployments launched in March 2026 and introduced max_surge_percent and stabilization_time_seconds controls for gradual zero-downtime promotion. | 中 | SE018 |
| CE022 | The billing usage API launched in March 2026 and exposes daily spend breakdowns across Dedicated Inference, Training, and Model APIs. | 中 | SE020 |
| CE023 | The only reviewed public Baseten SLA is for Dedicated Inference at 99.9% monthly availability, while Baseten marketing elsewhere uses four-nines or 99.99 reliability language. | 高 | SE001, SE007, SE015, SE023 |
| CE024 | Baseten's public status page showed incidents on May 15, 16, 18, 19, 26, and 29, 2026 even though its summary cards displayed 100.0% uptime for Dedicated Inference and 99.91% for Model APIs over the visible 90-day window. | 中 | SE022 |
| CE025 | ServiceAlert's third-party reachability page listed May 2026 at 100% uptime but explicitly said detailed incident data is unavailable, limiting independent verification of Baseten outage quality. | 中 | SE030 |
| CE026 | Truss has a visible public developer surface through a GitHub repository, a PyPI package, and active May 2026 release activity. | 高 | SE025, SE026, SE027 |
| CE027 | The May 2026 Truss release stream emphasized Loops CLI features, training checkpoint views, deployment-log links, and inference-call behavior, which indicates active investment in the training-to-inference workflow. | 中 | SE026 |
| CE028 | Writer's Baseten case study says model-specific TensorRT-LLM engines delivered 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. | 中 | SE028 |
| CE029 | OpenEvidence says Baseten reduced end-to-end latency from more than 700 milliseconds to 160 milliseconds and sped deployments 6x. | 中 | SE029 |
| CE030 | OpenEvidence also says Baseten now serves billions of requests per week for its medical workflow and reduced infrastructure maintenance time by more than 8x. | 中 | SE029 |
| CE031 | HostFleet's April 2026 pricing matrix shows Baseten posting higher public GPU-hour rates than Runpod and Modal on comparable L4, A100, and H100 instances. | 中 | SE016, SE031, SE032, SE033 |
| CE032 | Despite the higher published price points, HostFleet characterizes Truss, observability, and support as Baseten's tangible value-adds for startups running production inference. | 中 | SE031 |
| CE033 | Runpod and Modal market more aggressive zero-idle and cold-start language than Baseten, while Baseten emphasizes dedicated compute, managed performance engineering, and control. | 中 | SE005, SE031, SE032, SE033 |
| CE034 | Replicate's public product surface is simpler API-first model serving through Cog, whereas Baseten layers dedicated deployments, Chains, and Frontier Gateway on top of its packaging tool. | 中 | SE008, SE009, SE025, SE034 |
| CE035 | AWS SageMaker, Google Agent Platform, and Azure Machine Learning all span training, deployment, governance, and observability, so Baseten competes by offering a narrower inference-first abstraction rather than full hyperscaler platform breadth. | 中 | SE004, SE035, SE036, SE037 |
| CE036 | A third-party security profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, and other SaaS tools in Baseten's visible operational footprint. | 中 | SE038 |
| CE037 | Baseten's visible 2026 roadmap signal centered on trust and operating controls such as SSO/SCIM, rolling deployments, BDN, and billing instrumentation rather than entirely new product lines. | 中 | SE017, SE018, SE019, SE020 |
| CE038 | Public materials show uneven maturity within the training stack because Training Jobs is GA while Loops is still early access. | 中 | SE010 |
| CE039 | Public Baseten sources still leave unresolved product-tech gaps around benchmark methodology, exact regional-environment setup lead times, and roadmap priorities beyond the currently announced 2026 releases. | 低 | SE004, SE021, SE028 |
| CE040 | Baseten's product-tech moat appears strongest for teams that value performance tuning, cross-cloud capacity, and engineering support more than lowest published unit price or hyperscaler breadth. | 中 | SE007, SE015, SE031, SE032, SE033, SE035, SE036, SE037 |
| CU001 | Baseten markets itself as a high-performance inference platform for teams shipping AI products in production. | 中 | SU001 |
| CU002 | Baseten's enterprise page targets mission-critical enterprise inference with secure, scalable, and controllable deployment options. | 中 | SU003 |
| CU003 | Baseten publicly packages Basic, Pro, and Enterprise plans around progressively heavier buyer needs, from pay-as-you-go deployments to self-hosted regulated environments. | 中 | SU005 |
| CU004 | Writer positions itself as an enterprise AI platform used by world-class enterprises. | 中 | SU008, SU015, SU016 |
| CU005 | OpenEvidence describes itself as a medical knowledge platform for clinicians and physicians. | 中 | SU009, SU014 |
| CU006 | Speechify says more than 55 million people use its voice AI productivity assistant. | 中 | SU010, SU018 |
| CU007 | Gamma says its AI tools create presentations, websites, and social content. | 中 | SU011, SU017 |
| CU008 | Superhuman positions itself as AI-enhanced mail, docs, and workflow software for knowledge workers. | 中 | SU012, SU020 |
| CU009 | Patreon says hundreds of thousands of creators use its platform to build direct fan communities and recurring businesses. | 中 | SU013, SU021 |
| CU010 | Business Wire names OpenEvidence, Abridge, Notion, Clay, and Mercor among Baseten customers. | 中 | SU007 |
| CU011 | WorkOS says Baseten powers AI workloads for Cursor, Notion, Clay, OpenEvidence, and Ambience. | 中 | SU023 |
| CU012 | Baseten says its inference volume grew 100x in the last year. | 中 | SU006 |
| CU013 | Baseten's customer-stories index spans speech, healthcare, coding, pharmaceutical search, and AI operations use cases. | 中 | SU002 |
| CU014 | OpenEvidence says Baseten now serves billions of requests per week for its medical-information product. | 中 | SU009 |
| CU015 | OpenEvidence says its product now works with a doctor in every state and zip code in America. | 中 | SU009 |
| CU016 | Speechify says its platform synthesizes more than 161 billion characters per month for 60M+ users. | 中 | SU010 |
| CU017 | Gamma says it generates more than 3 million images per day for more than 70 million users on Baseten. | 中 | SU011 |
| CU018 | Superhuman says Baseten runs dozens of custom embedding models that power core features in its product. | 中 | SU012 |
| CU019 | Patreon says Baseten saved 440+ hours of developer time and nearly $600k per year on its Whisper deployment. | 中 | SU013 |
| CU020 | FeaturedCustomers lists 13 case studies, 29 testimonials, 4 customer videos, and 654 reference ratings for Baseten. | 中 | SU024 |
| CU021 | Writer reports 60% higher tokens per second on Baseten for its domain-specific LLMs. | 中 | SU008 |
| CU022 | Writer reports 23% lower time to first token and 35% lower cost per million tokens on Baseten. | 中 | SU008 |
| CU023 | OpenEvidence reports latency falling from more than 700 milliseconds to 160 milliseconds on Baseten. | 中 | SU009 |
| CU024 | OpenEvidence reports 6x faster deployments and an 8x+ reduction in infrastructure maintenance time on Baseten. | 中 | SU009 |
| CU025 | Speechify reports a 44% lower cost per million characters on Baseten. | 中 | SU010 |
| CU026 | Speechify reports 30-50% lower p99 inference latency and 4.5x faster replica startup on Baseten. | 中 | SU010 |
| CU027 | Gamma reports 30%-80% faster image generation per model on Baseten. | 中 | SU011 |
| CU028 | Gamma reports 20% efficiency improvement while reducing replica count and supporting billions of generated images. | 中 | SU011 |
| CU029 | Superhuman reports an average 80% reduction in P95 latency across its embedding models on Baseten. | 中 | SU012 |
| CU030 | Patreon reports 70% GPU-cost savings and says Baseten was twice as cheap as the next cheapest solution for its Whisper workload. | 中 | SU013 |
| CU031 | FeaturedCustomers reports a 4.8 out of 5 reference-rating score for Baseten based on 654 ratings. | 中 | SU024 |
| CU032 | OpenEvidence says Baseten was a clear winner after the team spent weeks researching and vetting inference providers. | 中 | SU009 |
| CU033 | Speechify says Baseten delivered the highest uptime of any inference provider it knows. | 中 | SU010 |
| CU034 | Superhuman says it was able to self-serve 95% of what it needed on Baseten. | 中 | SU012 |
| CU035 | PeerSpot's review summary emphasizes Baseten's supportive environment, speed-to-deployment, flexibility, and cost effectiveness. | 中 | SU031 |
| CU036 | Baseten's pricing page shows a self-serve Basic plan, a Pro plan with dedicated compute and hands-on engineering, and an Enterprise plan with self-hosting and custom SLAs. | 中 | SU005 |
| CU037 | Baseten's enterprise page says Baseten Cloud offers single-tenant clusters and the self-hosted product can fail over to Baseten Cloud. | 中 | SU003 |
| CU038 | Baseten's healthcare page says the platform is SOC 2 Type II and HIPAA compliant, supports region-restricted deployments, and highlights OpenEvidence and Latent as healthcare cases. | 中 | SU004 |
| CU039 | WorkOS says customers often start thinking about controlling their own destiny once inference spending reaches roughly $10,000-$50,000 per month. | 中 | SU023 |
| CU040 | WorkOS says open-source models let companies switch to options that are faster, cheaper, more customizable, and more reliable at scale. | 中 | SU023 |
| CU041 | Business Wire says Baseten pitches open runtimes and no lock-in around customer models. | 中 | SU007 |
| CU042 | HostFleet says Baseten is the highest-priced listed provider in its April 2026 comparison for T4, L4, A10G, A100, and H100 where listed, and adds that Baseten has a minimum dedicated deployment cost. | 中 | SU026 |
| CU043 | Runpod ranks Baseten fifth in its 2026 serverless GPU comparison and characterizes it as per-minute, configurable-replica infrastructure with 8-12 second speed. | 中 | SU025 |
| CU044 | NVIDIA says Baseten cut cold starts from up to five minutes to 5-10 seconds using NVIDIA GPUs and TensorRT-LLM. | 中 | SU022 |
| CU045 | Publicly quantified proof is concentrated in six flagship case studies even though fundraising and interview materials name additional accounts. | 中 | SU002, SU007, SU023, SU024 |
| CU046 | Reviewed public customer materials do not disclose NRR, GRR, contract length, or top-customer revenue share. | 中 | SU002, SU003, SU005, SU006, SU007 |
| CU047 | Abridge sells enterprise-grade AI for clinical conversations trusted by the largest healthcare systems. | 中 | SU007, SU027 |
| CU048 | Clay says more than 500,000 GTM teams use its data-enrichment and workflow platform. | 中 | SU023, SU028 |
| CU049 | Cursor says it is trusted by over half of the Fortune 500 for AI-assisted software development. | 中 | SU023, SU029 |
| CU050 | Notion AI markets built-in agents, enterprise search, HIPAA-capable enterprise workflows, and zero-data-retention enterprise controls. | 中 | SU007, SU030 |
| CU051 | Mercor says it is organizing human intelligence to power the AI economy. | 中 | SU007, SU032 |
| CU052 | Publicly named strategic accounts extend Baseten beyond consumer applications into healthcare, GTM, coding, and enterprise productivity. | 中 | SU007, SU023, SU027, SU028, SU029, SU030, SU032 |
| CU053 | Public references skew toward AI-native software companies whose own products depend heavily on inference quality and latency. | 中 | SU002, SU007, SU008, SU009, SU010, SU011, SU012, SU013 |
| CU054 | Baseten's public customer-proof quality is high on outcome specificity for six flagship stories but low on disclosed renewal economics. | 中 | SU008, SU009, SU010, SU011, SU012, SU013, SU024 |
| CU055 | The public record supports land-and-expand potential from model experimentation into dedicated compute, multi-cloud scale, and self-hosted enterprise configurations. | 中 | SU003, SU005, SU009, SU010, SU012 |
| CR001 | Baseten says it maintains SOC 2 Type II certification and HIPAA compliance. | 高 | SR001, SR011, SR012 |
| CR002 | Baseten says it does not store model inputs or outputs by default, except async inputs are temporarily stored until processed. | 高 | SR001, SR011 |
| CR003 | Baseten says compliance policies are read-only for customers and must be changed through Baseten support. | 中 | SR001 |
| CR004 | Baseten offers self-hosted and single-tenant deployment options for sensitive workloads on higher-tier plans. | 高 | SR001, SR008, SR011, SR024 |
| CR005 | Baseten's terms incorporate a DPA and security measures that Baseten may update so long as overall protection is not materially decreased. | 中 | SR003 |
| CR006 | Baseten's DPA lets customers object to a new subprocessor within five calendar days after notice. | 中 | SR003 |
| CR007 | Baseten's DPA says it will notify customers without undue delay after discovering a personal-data breach affecting customer personal data, but customers remain responsible for their own notification obligations. | 中 | SR003 |
| CR008 | Baseten's DPA says customers must not provide PHI and other Restricted Data unless otherwise agreed upon with Baseten in writing. | 高 | SR003, SR029 |
| CR009 | HHS says a covered entity must obtain written satisfactory assurances before disclosing PHI to a business associate. | 高 | SR029, SR003 |
| CR010 | Baseten's healthcare positioning creates a diligence need to verify a signed BAA or similar written override before underwriting PHI workflows. | 高 | SR001, SR003, SR012, SR029 |
| CR011 | The European Commission says the AI Act is being implemented through guidance, codes of practice, and an AI Act Service Desk. | 中 | SR030 |
| CR012 | Because Baseten markets healthcare and regulated enterprise workloads, AI Act and GDPR implementation can lengthen security and legal review cycles even if Baseten is infrastructure rather than the end application. | 中 | SR011, SR012, SR030 |
| CR013 | Baseten's published SLA applies only to Dedicated Inference for which Baseten is the hosting party. | 高 | SR004, SR024 |
| CR014 | Baseten's published Dedicated Inference SLA targets 99.9% monthly availability. | 高 | SR004, SR024 |
| CR015 | Baseten's SLA caps service credits at 40% of monthly fees and requires customers to submit claims within 24 hours of unscheduled downtime. | 中 | SR004 |
| CR016 | Baseten's terms say the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. | 高 | SR003, SR004 |
| CR017 | Baseten's status page shows multiple May 2026 incidents, including investigation, identified-fix, monitoring, and major-outage markers in the 90-day view. | 中 | SR005 |
| CR018 | Servicealert says detailed incident data is not available for Baseten and that its history is based on reachability monitoring. | 中 | SR006 |
| CR019 | Baseten's public product pages market four nines or 99.99% uptime more broadly than the default 99.9% Dedicated Inference SLA. | 高 | SR004, SR011, SR012, SR020, SR023, SR024 |
| CR020 | Baseten shipped rolling deployments with gradual traffic shifting, pause, resume, and cancel controls as a mitigation against deployment-induced outages. | 中 | SR026, SR022 |
| CR021 | Baseten positions its reliability story around multi-cloud, multi-region autoscaling and hybrid deployment options rather than a single-cloud architecture. | 高 | SR010, SR011, SR023, SR024 |
| CR022 | Nudge Security lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other SaaS tools in Baseten's visible supply chain. | 中 | SR007 |
| CR023 | Baseten's frontier, model API, and dedicated inference pages all tie product promises to access to the latest-generation GPUs and elastic capacity. | 高 | SR008, SR020, SR023, SR024 |
| CR024 | Technavio says AI inference-as-a-service providers face hardware supply constraints and high accelerator costs that inflate operating costs and limit scalability. | 中 | SR013 |
| CR025 | Mordor Intelligence says hardware accelerators are the fastest-growing enterprise AI component and that GPU supply constraints and salary inflation are current headwinds. | 中 | SR014 |
| CR026 | HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple comparable GPU classes. | 中 | SR016, SR008 |
| CR027 | Runpod's 2026 comparison lists Baseten with per-minute pricing and an 8-12 second cold-start range while ranking cheaper or faster peers above it on some dimensions. | 中 | SR015, SR008 |
| CR028 | HostFleet says Baseten has a minimum dedicated deployment cost and billed minimum awake times, which raises entry friction for smaller customers. | 中 | SR016, SR008 |
| CR029 | Baseten counters lock-in risk with self-hosting, hybrid deployment, open runtimes, and full ownership of trained weights. | 中 | SR011, SR021, SR028 |
| CR030 | Baseten announced a $300M Series E at a $5B valuation in January 2026 after multiple fundraises within the prior year. | 中 | SR018, SR017 |
| CR031 | Baseten says the financing marked the company's third fundraise in the prior year, increasing pressure to convert capital into durable enterprise growth. | 中 | SR018, SR017 |
| CR032 | Tracxn lists Baseten at 46 employees on December 31, 2024 and 258 employees by April 26, 2026. | 低 | SR017 |
| CR033 | Baseten's careers page says companies such as Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten for mission-critical AI workloads. | 中 | SR019, SR018 |
| CR034 | Baseten is expanding simultaneously across model APIs, dedicated inference, frontier gateway, model management, and training products. | 中 | SR020, SR021, SR022, SR023, SR024 |
| CR035 | Baseten's Loops training product is still early access even as Training Jobs is generally available. | 中 | SR021 |
| CR036 | SSO/SCIM, advanced identity controls, self-hosting, and custom SLAs are tied to higher-tier enterprise packaging rather than the self-serve entry plan. | 中 | SR008, SR011, SR025 |
| CR037 | Baseten added SSO/SCIM with automatic provisioning and deprovisioning plus group-based role assignment as a concrete mitigation for identity risk in larger accounts. | 中 | SR025, SR011 |
| CR038 | Baseten's billing usage API gives customers programmatic daily cost visibility across Dedicated Inference, Training, and Model APIs. | 中 | SR027, SR008 |
| CR039 | Baseten's model-management tooling says customers can monitor deployment health and adjust autoscaling policies to hit performance SLAs. | 中 | SR022, SR010 |
| CR040 | Truss and custom-server packaging reduce some switching-cost risk because Baseten exposes a more portable packaging layer than a fully closed model-hosting service. | 中 | SR028, SR022 |
| CR041 | Baseten's repeated emphasis on hands-on engineering expertise and customized deployments implies a service-heavy go-to-market model that may pressure margins as enterprise accounts scale. | 中 | SR008, SR011, SR024 |
| CR042 | Baseten's public contract stack leaves customers responsible for system configuration, backups, valid legal basis, and parts of incident response, which can slow regulated deployments even when Baseten provides secure infrastructure. | 高 | SR003, SR004, SR029 |
| CR043 | Modal said it raised $355 million in May 2026 after surpassing $300 million in annualized revenue, showing that a close inference-infrastructure rival is scaling quickly with large new capital. | 高 | SR031, SR032 |
| CR044 | Reuters reported that Modal's Series C valued the company at $4.65 billion, close to Baseten's $5 billion January 2026 valuation, which limits room for execution misses if buyers compare the platforms directly. | 中 | SR032 |
| CR045 | Sacra estimated Fireworks AI at roughly $315 million in annualized revenue in 2026 and a $4 billion valuation from its 2025 Series C, indicating that another open-model inference peer is already operating at substantial scale. | 中 | SR033 |
| CR046 | Tracxn says RunPod has raised only $22 million while positioning itself as a cost-effective GPU-infrastructure provider, which suggests cheaper rivals do not need Baseten-like capital intensity to pressure pricing. | 中 | SR034 |
| CR047 | CoreWeave reported nearly $100 billion of revenue backlog in May 2026 and explicitly framed inference as a major growth vector, underscoring that capital-rich infrastructure platforms are racing to absorb the same demand pool Baseten targets. | 中 | SR035 |
| CV001 | Baseten officially announced a $300 million Series E at a $5 billion valuation in January 2026. | 高 | SV001, SV002, SV003, SV005 |
| CV002 | After the Series E, public sources put Baseten’s total disclosed funding at about $585 million. | 中 | SV002, SV004, SV005, SV006 |
| CV003 | Tracxn records Baseten’s financing path as a $75 million Series C in February 2025, a $150 million Series D at a $2.15 billion valuation in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. | 中 | SV005 |
| CV004 | Baseten’s official Series D announcement says the company raised $150 million in a round led by BOND. | 高 | SV008, SV005 |
| CV005 | Baseten’s Series C announcement and PitchBook archive together support that the company’s 2025 Series C was a $75 million round. | 中 | SV009, SV007 |
| CV006 | Baseten said inference volume grew 100x during 2025. | 中 | SV001, SV004 |
| CV007 | Sacra estimates Baseten reached $600 million of annualized revenue in March 2026, up from about $200 million in December 2025. | 中 | SV004 |
| CV008 | Sacra says Baseten was in talks in May 2026 to raise $1 billion at an $11 billion post-money valuation, with reported offers reaching as high as $15 billion. | 中 | SV004 |
| CV009 | The gap between the closed $5 billion round and the mooted $11 billion follow-on means the underwriting question is whether fundamentals have caught up with sentiment, not whether enthusiasm exists. | 中 | SV001, SV002, SV004 |
| CV010 | Baseten’s pricing page shows a free pay-as-you-go Basic tier, while Pro adds priority compute and dedicated support and Enterprise adds custom SLAs and self-hosting. | 中 | SV010 |
| CV011 | Baseten’s homepage pitches cross-cloud scale, forward-deployed engineers, and 99.99% uptime as reasons customers should trust it for production workloads. | 中 | SV011 |
| CV012 | HostFleet’s April 2026 pricing matrix shows Baseten at $4.00 per hour for A100 and $6.50 per hour for H100, above Modal at $2.10 and $3.95 and above Runpod at $2.17 and $3.35 on the same GPU classes. | 中 | SV012 |
| CV013 | HostFleet also notes Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. | 中 | SV012 |
| CV014 | Runpod’s 2026 comparison ranks Baseten below several alternatives on affordability and cites usage-based per-minute billing with 8–12 second cold starts. | 中 | SV013 |
| CV015 | Baseten can still justify premium pricing if observability, support, compliance, and hybrid deployment reduce customers’ total cost of production inference. | 中 | SV010, SV011, SV012 |
| CV016 | Modal disclosed a May 2026 Series C of $355 million at a $4.65 billion post-money valuation after surpassing $300 million in annualized revenue. | 高 | SV015, SV016 |
| CV017 | Modal’s May 2026 round implies an approximate 15.5x annualized-revenue multiple. | 中 | SV015, SV016 |
| CV018 | Modal says it can scale from 0 to 1,000 GPUs in minutes or even seconds, making it a credible direct infrastructure comparable rather than a generic application software company. | 中 | SV015 |
| CV019 | Fireworks AI’s last closed round was a $250 million Series C at a $4 billion post-money valuation, while Sacra estimates roughly $315 million of annualized revenue in February 2026 and gross margin around 50%. | 中 | SV018 |
| CV020 | Fireworks’ closed-round valuation implies an approximate 12.7x annualized-revenue multiple, which is above Baseten’s implied closed-round multiple if the Sacra estimate is right. | 中 | SV018, SV004 |
| CV021 | CoreWeave reported Q1 2026 revenue of $2.078 billion and a $99.4 billion revenue backlog, while CompaniesMarketCap showed a May 2026 market cap of $59.75 billion. | 中 | SV019, SV020 |
| CV022 | Using CoreWeave’s 2026 revenue context, public AI infrastructure is trading at roughly 4.8x market cap to annualized-guide revenue. | 中 | SV019, SV020 |
| CV023 | Datadog guided to $4.30 billion to $4.34 billion of 2026 revenue, and CompaniesMarketCap put its May 2026 market cap at $88.04 billion. | 高 | SV021, SV022 |
| CV024 | Datadog’s implied multiple is about 20.4x forward revenue, showing how the market prices premium infrastructure software with strong growth and disclosure. | 中 | SV021, SV022 |
| CV025 | Datadog’s Form 10-K highlights the disclosure baseline public investors get on risk factors, revenue, and growth that private Baseten investors do not get from public materials. | 高 | SV023, SV021 |
| CV026 | CompaniesMarketCap and Stock Analysis put Cloudflare at about an $85.47 billion market cap and $2.33 billion of trailing revenue in late May 2026. | 中 | SV024, SV025 |
| CV027 | Cloudflare’s implied 36.7x revenue multiple is an upper-bound developer-platform reference that assumes much better disclosure, margin structure, and category leadership than Baseten has shown publicly. | 中 | SV024, SV025 |
| CV028 | CompaniesMarketCap and Stock Analysis put MongoDB at about a $27.01 billion market cap and $2.60 billion of trailing revenue in late May 2026. | 中 | SV026, SV027 |
| CV029 | MongoDB’s implied 10.4x revenue multiple is a lower-middle public infrastructure-software reference for a scaled but less euphoric comp set. | 中 | SV026, SV027 |
| CV030 | Technavio values the AI inference-as-a-service market at $85.25 billion in 2025 and expects 22.1% CAGR through 2030. | 中 | SV028 |
| CV031 | Mordor values the broader enterprise AI market at $114.87 billion in 2026, with cloud deployment accounting for 67.33% of 2025 revenue. | 中 | SV029 |
| CV032 | AWS Bedrock advertises select batch inference at 50% below on-demand pricing, showing hyperscalers can attack the inference layer with bundled economics. | 中 | SV030 |
| CV033 | Google promotes a unified agent platform with 200-plus models and free credits for new customers, increasing the risk that enterprises default to broader cloud bundles. | 中 | SV031 |
| CV034 | Azure Machine Learning publishes a 99.9% SLA and no additional platform charge beyond underlying Azure services, reinforcing the bundling threat to independent vendors. | 中 | SV032 |
| CV035 | If Sacra’s $600 million annualized-revenue estimate is directionally right, Baseten’s closed $5 billion round implies roughly an 8.3x revenue multiple. | 中 | SV004 |
| CV036 | An $8.3x implied multiple would place Baseten above CoreWeave-like AI cloud treatment but below Modal, Fireworks, Datadog, and Cloudflare-style premium software treatment. | 中 | SV004, SV018, SV019, SV020, SV021, SV022, SV024, SV025 |
| CV037 | At the same $600 million run-rate, the mooted $11 billion follow-on would imply roughly an 18.3x multiple, much closer to Datadog-grade public software pricing. | 中 | SV004, SV021, SV022 |
| CV038 | Baseten’s pricing and delivery model suggest revenue quality may be more support-intensive and lower-margin than top public software comps even if growth is exceptional. | 中 | SV010, SV011, SV012, SV013 |
| CV039 | Fireworks’ roughly 50% gross margin and explicit 60% target are a useful reminder that inference platforms are infrastructure businesses first, not pure software businesses. | 中 | SV018 |
| CV040 | The strongest pro-valuation argument is that inference demand is large, cloud-heavy, and moving into production workloads where Baseten offers hybrid deployment and performance differentiation. | 中 | SV028, SV029, SV010, SV011 |
| CV041 | The strongest anti-valuation argument is that premium pricing can be attacked by Runpod and Modal at the edge and by hyperscalers through bundled platform pricing. | 中 | SV012, SV013, SV017, SV030, SV031, SV032 |
| CV042 | The current $5 billion price is supportable only conditionally because it assumes the private revenue estimate is directionally right and that Baseten can defend premium economics despite bundling pressure. | 中 | SV004, SV012, SV013, SV015, SV016, SV030, SV031, SV032 |
| CV043 | A reasonable bear case uses $300 million to $400 million of revenue support and a 7x to 9x multiple, implying roughly $2.1 billion to $3.6 billion of value. | 中 | SV004, SV018, SV026, SV027 |
| CV044 | A reasonable base case uses $500 million to $650 million of revenue support and an 8x to 12x multiple, implying roughly $4.0 billion to $7.8 billion and placing the closed $5 billion round inside the range. | 中 | SV004, SV015, SV016, SV018, SV026, SV027 |
| CV045 | A reasonable bull case uses $700 million to $900 million of revenue support and a 12x to 16x multiple, implying roughly $8.4 billion to $14.4 billion and making an $11 billion step-up possible only if growth and premium perception keep compounding. | 中 | SV004, SV015, SV016, SV018 |
| CV046 | The right investment recommendation is track, not buy, because company quality is high but the public evidence leaves the price only fair-to-stretched rather than clearly attractive. | 中 | SV004, SV012, SV013, SV015, SV016, SV023 |
| CV047 | The highest-leverage diligence question is whether internal revenue, gross margin, and customer-concentration data support the market narrative implied by the $5 billion round. | 中 | SV004, SV018, SV023 |
| CV048 | The thesis should break if Baseten cannot preserve premium price-performance with acceptable margin, if growth normalizes materially below the base-case band, or if any new round clears only with aggressive terms. | 中 | SV004, SV012, SV013, SV018, SV023 |
| 编号 | 出版方 | 标题 | 引文 |
|---|---|---|---|
| SO001 | Baseten | Baseten | Inference is everything | |
| SO002 | Baseten | Baseten customers | |
| SO003 | Baseten | Enterprise | |
| SO004 | Baseten | Healthcare | |
| SO005 | Baseten | Pricing | |
| SO006 | Baseten | Careers at Baseten | Companies like Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten to power mission-critical AI workloads in production. |
| SO007 | Baseten | Baseten Terms and Conditions | BASETEN LABS, INC. (“BASETEN”). |
| SO008 | Baseten | Privacy Policy | Company (referred to as either "the Company", "We", "Us" or "Our" in this Agreement) refers to BaseTen Labs, Inc., 201 Spear St, Suite 1600, San Francisco, CA 94105. |
| SO009 | Baseten | Announcing our Series A | We’ve raised a little over $20 million dollars to date across our seed and Series A rounds. |
| SO010 | Baseten | Announcing our Series B | We’re excited to announce that we’ve raised an additional $40M. |
| SO011 | Baseten | Announcing Baseten’s $75M Series C | Today, we run workloads across thousands of GPUs, serving millions of end customers worldwide while continuously adding new cloud partners. |
| SO012 | Baseten | Announcing Baseten’s $150M Series D | Today, we’re excited to announce our $150M Series D, led by BOND, with Jay Simons joining our Board. |
| SO013 | Baseten | Announcing Baseten's $300M Series E | We’re thrilled to announce that we have raised $300M at a $5B valuation. |
| SO014 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | Founded in 2019 and based in San Francisco, Baseten has raised $585 million to date from investors including IVP, CapitalG, Conviction, Bond, Greylock, and Spark Capital. |
| SO015 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | |
| SO016 | Tracxn | Baseten Technologies | |
| SO017 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | |
| SO018 | PitchBook via Internet Archive | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SO019 | Abridge | Abridge | Intelligence at the point of conversation | |
| SO020 | Clay | Clay | GTM workflows at scale | |
| SO021 | Cursor | Cursor | The new way to build software | |
| SO022 | OpenEvidence | OpenEvidence | America's Official Medical Knowledge Platform | |
| SO023 | Baseten | OpenEvidence delivers instant, accurate medical information with Baseten | Baseten now serves billions of requests per week for OpenEvidence. |
| SO024 | Baseten | How Gamma makes building presentations criminally fun | |
| SO025 | Baseten | Speechify real-time text-to-speech | Because of Baseten’s efficient autoscaling, model performance and infrastructure optimizations, Speechify’s cost per million characters dropped by 44%. |
| SO026 | Baseten | Patreon | |
| SO027 | NVIDIA | Streamlined AI Inference Infrastructure in the Cloud | Baseten’s infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. This is an incredible speedup on cold starts, which previously took up to five minutes. |
| SO028 | WorkOS | A conversation with Philip Kiely from Baseten at AWS re:Invent 2025 | |
| SO029 | Nudge Security | Is Baseten safe? Learn if Baseten Is Legit | Review Baseten security risks. |
| SO030 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | Detailed incident data is not available for this service. |
| SO031 | Baseten | Tuhin Srivastava - CEO, Co-Founder | |
| SO032 | Baseten | Amir Haghighat - CTO, Co-Founder | |
| SO033 | Baseten | Pankaj Gupta - Co-Founder | |
| SO034 | Baseten | Phil Howes - Co-Founder | |
| SM001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | Scale workloads across any region and any cloud (in our cloud or yours), with blazing-fast cold starts and 99.99% uptime out of the box. |
| SM002 | Baseten | Cloud Pricing | Basic: $0 per month, pay as you go. Enterprise adds self-host deployments, cloud commitments, and custom SLAs. |
| SM003 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | The Baseten Inference Stack runs inside your cloud infrastructure, keeping your data fully under your control. |
| SM004 | Baseten | Healthcare | 99.99% uptime and infinite scaling through a unified GPU pool spanning 10+ clouds. |
| SM005 | Baseten | Production-First Model APIs - Baseten Inference Stack | Spend 5-10x less than closed alternatives with our optimized multi-cloud infrastructure and efficient frontier open models. |
| SM006 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | We regularly see 6x better GPU utilization and 5-10x lower costs powered by our Inference Stack. |
| SM007 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | Baseten Chains enables granular hardware and autoscaling for compound AI, powering 6x better GPU usage and cutting latency in half. |
| SM008 | Baseten | Cloud-Native AI Infrastructure | Baseten | Scale in your cloud, ours, or both with Baseten Self-hosted, Cloud, and Hybrid deployment options. |
| SM009 | Baseten | Secure model inference - Baseten | Baseten never shares GPUs across users. |
| SM010 | Baseten | Customer stories | Speechify synthesizes 161B+ characters per month for 60M+ users. With Baseten, Speechify cut costs by 44%, p99 latency by 30-50%, and got 4.5x faster cold starts. |
| SM011 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | OpenEvidence can scale efficiently even in the face of traffic spikes, hardware failure, or capacity constraints... without locking into multi-year commitments with single cloud vendors. |
| SM012 | Baseten | How Gamma makes building presentations criminally fun | We generate millions of images a day on Baseten for our 70+ million users with ultra-low latency and high throughput. |
| SM013 | Baseten | How Writer helps businesses transform with AI | In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens. |
| SM014 | Baseten | Why we built and open-sourced a model serving solution | Truss bridges the gap between model development and model deployment by making it equally straightforward to serve a model on localhost and in prod. |
| SM015 | Baseten | AI Model Training Built for Production Inference | Baseten | Train -> deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command. |
| SM016 | Baseten | Baseten Frontier Gateway | The Baseten Frontier Gateway is the path from weights to a production-ready API. |
| SM017 | Baseten | SSO and SCIM | Available on the Enterprise plan with just-in-time provisioning, automatic deprovisioning, and optional group-gated admin access. |
| SM018 | Baseten | Retrieve billing usage via API | The response includes aggregate totals and a per-resource or per-model breakdown array, with daily granularity on each entry. |
| SM019 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during the forecast period 2026-2030. |
| SM020 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | The Enterprise AI market size stood at USD 114.87 billion in 2026 and is projected to reach USD 273.08 billion by 2031, registering an 18.91% CAGR over 2026-2031. |
| SM021 | Fortune Business Insights | AI Inference Market Size, Share | Global Growth Report [2034] | The global AI inference market size was valued at USD 103.73 billion in 2025 and is projected to grow from USD 117.80 billion in 2026 to USD 312.64 billion by 2034. |
| SM022 | Modal | Modal: High-performance AI infrastructure | Autoscale from 0 to 1000+ GPUs, instantly. |
| SM023 | Replicate | Run AI with an API | We scale up and down to handle demand, and you only pay for the compute that you use. |
| SM024 | Runpod | The AI Developer Cloud | Runpod | One platform to go from AI experiment to production. Pods for building. Serverless for shipping. Clusters for scaling. |
| SM025 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | Baseten: usage-based (per-minute), configurable replicas, T4/A10G/L4/A100/H100, 8-12 sec cold starts. |
| SM026 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | You’re a startup with a production inference workload and a budget -> Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SM027 | Amazon Web Services | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure. |
| SM028 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | Build, scale, govern and optimize enterprise grade AI agents. |
| SM029 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment. |
| SP001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | |
| SP002 | Baseten | Cloud Pricing | |
| SP003 | Baseten Docs | Overview - Baseten | |
| SP004 | Baseten Docs | Secure model inference - Baseten | |
| SP005 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SP006 | Baseten | AI Model Training Built for Production Inference | Baseten | |
| SP007 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | |
| SP008 | Baseten | AI Model Performance - Baseten Inference Runtime | |
| SP009 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SP010 | Baseten | Baseten Frontier Gateway | |
| SP011 | Baseten | Why we built and open-sourced a model serving solution | |
| SP012 | Baseten | Baseten Status | |
| SP013 | Servicealert.ai | Baseten Outage History, Downtime & Incident Records | |
| SP014 | Modal | Modal: High-performance AI infrastructure | |
| SP015 | Modal | Plan Pricing | Modal | |
| SP016 | Replicate | Run AI with an API | |
| SP017 | Replicate | Pricing – Replicate | |
| SP018 | Runpod | The AI Developer Cloud | Runpod | |
| SP019 | Runpod | GPU Cloud Pricing - Runpod | |
| SP020 | Runpod Docs | Serverless pricing | Runpod Docs | |
| SP021 | AWS | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | |
| SP022 | AWS | Amazon Bedrock Pricing – AWS | |
| SP023 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | |
| SP024 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | |
| SP025 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | Baseten: Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SP026 | Tracxn | Baseten Technologies | |
| SP027 | PitchBook | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SP028 | Sacra | Baseten revenue, valuation & funding | AWS, Google, and Microsoft leverage extensive enterprise relationships to bundle AI inference with broader cloud commitments at below-market rates. |
| SP029 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SP030 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference — TFN | |
| SI001 | Baseten | Cloud Pricing | |
| SI002 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | |
| SI003 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SI004 | Baseten | AI Model Training Built for Production Inference | Baseten | |
| SI005 | Baseten | Enterprise | |
| SI006 | Baseten | Healthcare | |
| SI007 | Baseten | Retrieve billing usage via API | |
| SI008 | Baseten | Announcing Baseten’s $75M Series C | |
| SI009 | Baseten | Announcing Baseten’s $150M Series D | |
| SI010 | Baseten | Announcing Baseten’s $300M Series E | |
| SI011 | Baseten | Careers at Baseten | |
| SI012 | Baseten | Privacy Policy | |
| SI013 | Baseten | Baseten Terms and Conditions | |
| SI014 | Baseten | Service Level Agreement | |
| SI015 | Baseten | Baseten Status | |
| SI016 | Baseten | How Writer helps businesses transform with AI | |
| SI017 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | |
| SI018 | Baseten | How Speechify makes audio the default with real-time text-to-speech | |
| SI019 | Baseten | Superhuman achieves 80% faster embedding model inference with Baseten | |
| SI020 | Baseten | Patreon scales Whisper transcription with Baseten | |
| SI021 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SI022 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | |
| SI023 | Sacra | Baseten revenue, valuation & funding | |
| SI024 | Tracxn | Baseten Technologies | |
| SI025 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | |
| SI026 | PitchBook via Wayback | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SI027 | HostFleet | Serverless GPU Pricing Matrix 2026 | |
| SI028 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | |
| SI029 | U.S. Securities and Exchange Commission | EDGAR Entity Landing Page (CIK 0001850888) | |
| SI030 | Baseten | How Hebbia uses Baseten to power AI workflows for the world's leading financial institutions | |
| SI031 | Baseten | Posit launches real-time AI code suggestions with Baseten | |
| SI032 | Baseten | Wispr Flow creates effortless voice dictation with Llama on Baseten | |
| SI033 | Baseten | How Zed is reimagining the code editor from the ground up | |
| SI034 | Baseten | How World Labs is building large world models, pushing the boundaries of 3D | |
| SE001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | Rapidly scale workloads across any cloud provider with global capacity. We offer single-tenant and self-hosted deployments for extra security. |
| SE002 | Baseten | Overview - Baseten | Baseten is a training and inference platform. Bring a model ... and Baseten turns it into a production API endpoint with autoscaling, observability, and optimized serving infrastructure. |
| SE003 | Baseten | Reference documentation - Baseten | |
| SE004 | Baseten | How Baseten works | Behind every GPU workload on Baseten is the Multi-cloud Capacity Management (MCM) system. |
| SE005 | Baseten | Production-First Model APIs - Baseten Inference Stack | Model APIs made for products, not toys. |
| SE006 | Baseten | Model APIs - Baseten | Model APIs provide instant access to high-performance LLMs through endpoints that are compatible with both the OpenAI Chat Completions API and the Anthropic Messages API. |
| SE007 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | Dedicated deployments are single-tenant, can be region-locked, and are HIPAA compliant and SOC 2 Type II certified on Baseten Cloud. |
| SE008 | Baseten | Baseten Frontier Gateway | Baseten Frontier Gateway gives you a production-ready, white-labeled API endpoint. |
| SE009 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | Deploy your Chain to production with each Chainlet specifying its own hardware resources, software dependencies and scaling settings independently. |
| SE010 | Baseten | AI Model Training Built for Production Inference | Baseten | Loops (early access) ... Training Jobs (GA). |
| SE011 | Baseten | Cloud-Native AI Infrastructure | Baseten | We built cross-cloud autoscaling so you can serve users anywhere in the world with low latency and high reliability. |
| SE012 | Baseten | AI Model Management for Production Inference | Baseten | |
| SE013 | Baseten | AI Model Performance - Baseten Inference Runtime | We take the best open-source inference frameworks (TensorRT, SGLang, vLLM, TGI, TEI, and more) and layer in our own optimizations for maximum performance. |
| SE014 | Baseten | Secure model inference - Baseten | Baseten does not store model inputs, outputs, or weights by default. |
| SE015 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | We are SOC-2 Type II certified and HIPAA compliant across all hosting options and support data residency requirements through region-restricted deployments. |
| SE016 | Baseten | Cloud Pricing | Only pay for the compute you use, down to the minute. |
| SE017 | Baseten | SSO and SCIM | Connect Baseten to your identity provider for SAML 2.0 sign-in and SCIM 2.0 directory sync. |
| SE018 | Baseten | Rolling deployments | You can now gradually shift traffic to new deployments instead of swapping all at once. |
| SE019 | Baseten | Introducing the Baseten Delivery Network (BDN) | We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models. |
| SE020 | Baseten | Retrieve billing usage via API | You can now query your billing usage programmatically using the new GET /v1/billing/usage_summary endpoint. |
| SE021 | Baseten | Regional environments | Regional environments route inference traffic for a deployment exclusively to workload planes within a designated geographic region. |
| SE022 | Baseten | Baseten Status | Past Incidents ... May 29, 2026 ... May 26 ... May 19 ... May 18 ... May 16 ... May 15. |
| SE023 | Baseten | Service Level Agreement | Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%) during each calendar month. |
| SE024 | Baseten | Why we built and open-sourced a model serving solution | To address this problem, we built Truss. |
| SE025 | GitHub / basetenlabs | GitHub - basetenlabs/truss: The simplest way to serve AI/ML models in production | Truss is the CLI for deploying and serving ML models on Baseten. |
| SE026 | GitHub / basetenlabs | Releases · basetenlabs/truss | v0.18.3 ... 21 May 16:14 ... feat(loops/cli) ... feat(train) ... feat(truss). |
| SE027 | PyPI | truss | pip install --upgrade truss |
| SE028 | Baseten | How Writer helps businesses transform with AI | In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw: 60% higher tokens per second, 23% lower time to first token, 35% lower cost per million tokens. |
| SE029 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | By using Baseten, OpenEvidence achieved: 78% lower latency ... 6x faster deployment processes ... 8x+ reduction in infrastructure maintenance time overall. |
| SE030 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | Detailed incident data is not available for this service. |
| SE031 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | You’re a startup with a production inference workload and a budget → Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SE032 | Runpod | The AI Developer Cloud | Runpod | Sub-200ms cold starts ... Zero idle cost. |
| SE033 | Modal | Modal: High-performance AI infrastructure | Autoscale from 0 to 1000+ GPUs, instantly. |
| SE034 | Replicate | Run AI with an API | You can deploy your own custom models using Cog, our open-source tool for packaging machine learning models. |
| SE035 | Amazon Web Services | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure. |
| SE036 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | Agent Platform is our open and comprehensive platform ... to build, scale, govern and optimize enterprise-grade agents. |
| SE037 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment. |
| SE038 | Nudge Security | Is Baseten Safe? Learn if Baseten Is Legit | Nudge Security | Baseten Supply Chain ... Amazon Web Services (AWS), Vercel, Statuspage, SendGrid, Stripe, Google Analytics, Segment, Sentry ... |
| SU001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | |
| SU002 | Baseten | Customer stories | |
| SU003 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SU004 | Baseten | Healthcare | |
| SU005 | Baseten | Cloud Pricing | |
| SU006 | Baseten | Announcing Baseten's $300M Series E | |
| SU007 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SU008 | Baseten | How Writer helps businesses transform with AI | |
| SU009 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | |
| SU010 | Baseten | How Speechify makes audio the default with real-time text-to-speech | |
| SU011 | Baseten | How Gamma makes building presentations criminally fun | |
| SU012 | Baseten | Superhuman achieves 80% faster embedding model inference with Baseten | |
| SU013 | Baseten | Patreon saves nearly $600k/year in ML resources with Baseten | |
| SU014 | OpenEvidence | OpenEvidence | |
| SU015 | Writer | WRITER | |
| SU016 | Writer | About WRITER | |
| SU017 | Gamma | About Us – Reinventing Presentations with AI | Gamma.app | |
| SU018 | Speechify | Speechify: Text to Speech & Voice Typing AI Assistant | 55M+ Users | |
| SU019 | Speechify | Voice Over Studio: Request A Free Demo | Speechify | |
| SU020 | Superhuman | Superhuman: Docs, Mail, and AI That Work Everywhere | |
| SU021 | Patreon | Where Creator Communities Thrive — Patreon | |
| SU022 | NVIDIA | Case study:Baseten’s AI Inference Infrastructure | Baseten's infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. |
| SU023 | WorkOS | Baseten is betting big on open source models — WorkOS | companies could switch to models that were faster, less expensive, more customizable, and more reliable at scale |
| SU024 | FeaturedCustomers | 46 Baseten Customer Reviews & References | Customer Rating Review Score based on 654 reference ratings 4.8/5.0 |
| SU025 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | Baseten ... Usage-based (per-minute) ... 8–12 sec |
| SU026 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. |
| SU027 | Abridge | Generative AI for Clinical Conversations | Abridge | |
| SU028 | Clay | Clay | Go to market with unique data—and the ability to act on it | |
| SU029 | Cursor | The best coding agent | |
| SU030 | Notion | Meet your AI team | Notion | |
| SU031 | PeerSpot | Baseten Reviews, Competitors and Pricing | |
| SU032 | Mercor | Mercor | Organizing human intelligence to power the AI economy | |
| SR001 | Baseten | Secure model inference | Baseten does not store model inputs, outputs, or weights by default. |
| SR002 | Baseten | Privacy Policy | |
| SR003 | Baseten | Baseten Terms and Conditions | Customer acknowledges and agrees that the Baseten Products & Services will not be used, and is not licensed for use, in connection with any of Customer’s time-critical or mission-critical functions. |
| SR004 | Baseten | Service Level Agreement | Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%). |
| SR005 | Baseten | Baseten Status | |
| SR006 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | |
| SR007 | Nudge Security | Is Baseten Safe? Learn if Baseten Is Legit | |
| SR008 | Baseten | Cloud Pricing | |
| SR009 | Baseten | Baseten homepage | |
| SR010 | Baseten | Cloud-Native AI Infrastructure | |
| SR011 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SR012 | Baseten | Healthcare | SOC-2 Type II and HIPAA compliant with flexible hosting and data residency with region-restricted cloud deployments. |
| SR013 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | |
| SR014 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | |
| SR015 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | |
| SR016 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) | Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. Scale-to-zero is available but there are billed minimum awake times. |
| SR017 | Tracxn | Baseten Technologies | |
| SR018 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SR019 | Baseten | Careers at Baseten | |
| SR020 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SR021 | Baseten | AI Model Training Built for Production Inference | |
| SR022 | Baseten | AI Model Management for Production Inference | |
| SR023 | Baseten | Baseten Frontier Gateway | |
| SR024 | Baseten | Inference at Scale with Dedicated Deployments | |
| SR025 | Baseten | SSO and SCIM | |
| SR026 | Baseten | Rolling deployments | |
| SR027 | Baseten | Retrieve billing usage via API | |
| SR028 | Baseten | Why we built and open-sourced a model serving solution | |
| SR029 | U.S. Department of Health & Human Services | Business Associates | The satisfactory assurances must be in writing, whether in the form of a contract or other agreement between the covered entity and the business associate. |
| SR030 | European Commission | The EU’s approach to artificial intelligence | |
| SR031 | Modal | Series C announcement | |
| SR032 | Reuters | AI startup Modal raised $355 million in a new round of financing, valuing the company at $4.65 billion | |
| SR033 | Sacra | Fireworks AI revenue, valuation & funding | |
| SR034 | Tracxn | RunPod | |
| SR035 | CoreWeave | Record First Quarter Revenue and Revenue Backlog Highlight Unprecedented Demand for CoreWeave Cloud | |
| SV001 | Baseten | Announcing Baseten’s $300M Series E | We’re thrilled to announce that we have raised $300M at a $5B valuation. |
| SV002 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | This values Baseten at $5 billion and marks the company’s third fundraise in the past year. |
| SV003 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | investors invested $300 million in the company, pushing its valuation to about $5 billion |
| SV004 | Sacra | Baseten revenue, valuation & funding | Sacra estimates that Baseten hit $600M in annualized revenue in March 2026. |
| SV005 | Tracxn | Baseten Technologies | Jan 20, 2026 | $300M | Series E | $5B |
| SV006 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | Baseten has raised $585M over 7 rounds. |
| SV007 | PitchBook | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | Latest Deal Amount $75M |
| SV008 | Baseten | Announcing Baseten’s $150M Series D | Today, we’re excited to announce our $150M Series D, led by BOND. |
| SV009 | Baseten | Announcing Baseten’s $75M Series C | Today, we’re thrilled to announce our Series C fundraise. |
| SV010 | Baseten | Cloud Pricing | Basic: $0 per month, pay as you go. |
| SV011 | Baseten | Baseten homepage | Scale workloads across any region and any cloud ... with ... 99.99% uptime out of the box. |
| SV012 | HostFleet | Serverless GPU pricing matrix 2026 | Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SV013 | Runpod | Top serverless GPU clouds for 2026 AI workloads | Baseten ... Usage-based (per-minute) ... 8–12 sec |
| SV014 | Runpod | Runpod pricing | H100 PCIe $2.89/hr |
| SV015 | Modal | Modal's Series C: Raising $355M at a $4.65B valuation | We’ve raised $355 million ... surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money. |
| SV016 | Reuters / U.S. News | Exclusive-Modal Labs Valued at $4.65 Billion as AI Coding Takes Off | The company’s annualized revenue is about $300 million, up from an annualized rate of $60 million in September. |
| SV017 | Modal | Modal pricing | Get started with $30 / month free credits |
| SV018 | Sacra | Fireworks AI revenue, valuation & funding | Fireworks AI hit $315M in annualized revenue in February 2026 ... gross margin sits at approximately 50%. |
| SV019 | CoreWeave / Business Wire | CoreWeave Reports Strong First Quarter 2026 Results | Revenue backlog was $99.4 billion as of March 31, 2026. |
| SV020 | CompaniesMarketCap | CoreWeave market capitalization | As of May 2026 CoreWeave has a market cap of $59.75 Billion USD. |
| SV021 | Datadog | Datadog Announces First Quarter 2026 Financial Results | Revenue was $1,006 million ... Full Year 2026 Outlook: Revenue between $4.30 billion and $4.34 billion. |
| SV022 | CompaniesMarketCap | Datadog market capitalization | As of May 2026 Datadog has a market cap of $88.04 Billion USD. |
| SV023 | Datadog / SEC filing mirror | Datadog Annual Report 2026 | Form 10-K (NASDAQ:DDOG) ... For the fiscal year ended December 31, 2025 |
| SV024 | CompaniesMarketCap | Cloudflare market capitalization | As of May 2026 Cloudflare has a market cap of $85.47 Billion USD. |
| SV025 | Stock Analysis | Cloudflare revenue 2016-2026 | This brings the company’s revenue in the last twelve months to $2.33B. |
| SV026 | CompaniesMarketCap | MongoDB market capitalization | As of May 2026 MongoDB has a market cap of $27.01 Billion USD. |
| SV027 | Stock Analysis | MongoDB revenue 2017-2026 | This brings the company’s revenue in the last twelve months to $2.60B. |
| SV028 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during 2026-2030. |
| SV029 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | The Enterprise AI market size stood at USD 114.87 billion in 2026. |
| SV030 | Amazon Web Services | Amazon Bedrock Pricing | Amazon Bedrock offers ... batch inference at a 50% lower price compared to on-demand inference pricing. |
| SV031 | Google Cloud | Gemini Enterprise Agent Platform | New customers get up to $300 in free credits. |
| SV032 | Microsoft Azure | Azure Machine Learning | The SLA for Azure Machine Learning is 99.9 percent uptime. There's no additional charge to use Azure Machine Learning. |