初创公司尽调
尽调报告 Generative AI infrastructure / inference cloud late-stage private 2026-05-16

Together AI

开放模型推理云,技术护城河可信、企业牵引已见规模,定价接近 Series B 水位

Together AI 的推理云产品和牵引力可信,但 Series B 估值要靠多年 ARR 放量才能支撑强退出。

封面要素

最新披露估值(Series B 2024) 01
3.3 USD B [CV001]
种子轮 / A 轮 / B 轮累计融资 02
500 USD M (approximate, per press) [CV001, CV002]
媒体报道的收入运行率区间 03
130-200 USD M ARR (per The Information, unverified) [CV040]
已具名企业与创业公司客户 04
9 case studies + GTC 2025 cohort [CV012]
开发者注册数(公司口径) 05
100000 developers [CU001]

公司概况

Together AI 是生成式 AI 云平台,覆盖 200+ 个开放和定制模型,提供无服务器与专用推理、微调和训练;底层由 FlashAttention、ThunderKittens 与 Together Inference Engine v2 支撑。公司把可防御的技术研究底座、 Salesforce + NVIDIA 渠道,以及开源社区入口组合到一起。

官网
www.together.ai
成立时间
2022-06-01
创始人
Vipul Ved Prakash, Ce Zhang, Tri Dao, Percy Liang
创立地点
San Francisco, California, USA
总部
San Francisco, California
产品
Together AI 销售无服务器推理(按 token 计费)、专用端点(预留 GPU 容量)、微调(LoRA + full)、批量推理、 embeddings、视觉、音频和图像 API,覆盖 200+ 个开放与定制模型目录,整体兼容 OpenAI。
客户
开发者(自助式)、AI 原生创业公司(Pika、Cartesia、Arcee、Nous Research)、企业 SaaS(Salesforce、 Zoom)、医疗健康(Adaption)、学术机构(Washington University),以及 NVIDIA GTC 2025 Pioneers 队列。
商业模式
按用量计费的无服务器推理 + 承诺专用容量 + 微调 + 企业合同;Salesforce Ventures 联合销售和 Startup Accelerator 强化直销。
阶段
late-stage private
融资情况
私有融资;Series A $102.5M(Nov 2023,Kleiner Perkins 领投)和 Series B $305M(Mar 2024,Salesforce Ventures 领投,CNBC / Bloomberg / Fast Company 报道投后约 $3.3B);投资方包括 NVIDIA、Coatue、Lux Capital、Prosperity7、General Catalyst。

执行摘要

主要优势

  • 技术护城河由 FlashAttention(Tri Dao)、ThunderKittens(Stanford HazyResearch)、Together Inference Engine v2 和 Mixture-of-Agents 产品化共同支撑。
  • 核心渠道伙伴(Salesforce Ventures 联合销售、NVIDIA GTC 2025 Pioneers、Startup Accelerator)叠加 200+ 开放模型目录,覆盖企业和开发者。
  • 已记录的企业与创业公司验证基础覆盖 Salesforce、Zoom、Pika、Cartesia、Arcee、Nous Research、Washington University 和 Adaption healthcare。

主要风险

  • 超大规模云厂商捆绑推理服务(AWS Bedrock、GCP Vertex、Azure OpenAI)可能在 2026-2027 年压低定价 30-50%。
  • GPU、网络和技术栈集中依赖 NVIDIA;若 Blackwell 分配收紧,收入爬坡会被卡住。
  • 到 2027 年,生成式 AI 监管边界(EU AI Act、BIS 出口管制、FTC 调查)和版权诉讼先例(NYT、Authors Guild、Getty)都会继续扩大。

未决问题

  • 准确 ARR、NRR / GRR、前 10 大客户集中度、GPU 承诺支出和运营费用拆分(R&D / S&M / G&A)均未披露。
  • runDate 时 CFO 和 CRO 是否到位未获公开确认。
  • 除公开状态页外,SLA 百分比、事故历史、渗透测试频率和泄露预案均未披露。
  • 主权渠道姿态(Prosperity7-adjacent)以及版权先例收紧下的 OSS 托管政策,需要管理层披露。

目录

Chapter 01

01公司概况

1.1 身份、总部与产品框架

Together AI 将自己定位为「AI 加速云」,为开源和定制的大语言、图像、音频与视觉模型提供训练、 微调和推理。公司主体 Together Computer Inc. 总部位于 California San Francisco,在 Menlo Park 设有卫星办公室, Zurich 还有研究人员;招聘页和联系方式共同确认了这些地点,也显示公司仍在基础设施、kernel、GPU、应用 ML 与收入岗位上积极招人。 公司由四位与 Stanford、Princeton、ETH Zürich 和更广泛开源 LLM 研究社区深度相关的联合创始人于 27 June 2022 注册成立。 它的身份建立在三根支柱上:面向 AI 工作负载定制的超大规模 GPU 云,开源研究线(RedPajama、OpenChatKit、StripedHyena、 FlashAttention、Mixture-of-Agents),以及与 OpenAI 和 Anthropic 竞争、但为开放模型定价的自助式推理和微调 API。 公司强调客户可以保留权重、控制数据驻留,并在需要时使用专用集群;这是它与封闭 API 竞争对手的核心差异。[CO001, CO002, CO003, CO004, CO005]

KPI 快照表
指标数值 / 状态日期置信度缺口或尽调问题
投后估值$3.3B2024-07-09确认 2026 年老股交易或新一轮融资
累计新股融资≈$533M 已披露2024-07核查 2024 年 7 月后是否有延长轮
年化收入≈$100M(第三方报道)2024-07无审计文件;要求管理层提供数字
员工数>150(招聘网站推导)2026-05无监管文件;要求 HR 花名册
GPU 规模>20,000 NVIDIA Hopper 级2024-07确认 Blackwell 增量和利用率
客户数100,000+ 开发者(公司声称)2024区分付费与免费;核实 NRR
总部San Francisco, CA2026-05
成立日期27 June 20222022

数值混合公司披露(高)、第三方报道(中)和推导数据(低);付费客户数和 ARR 未经审计,必须向管理层核实。

[CO019, CO020, CO021, CO022, CO023, CO024]
FO002: 公司快照逻辑

身份、产品、资本和客户如何连接。

[CO001, CO003, CO005, CO017, CO020, CO021]

1.2 创始人、领导层与治理

CEO Vipul Ved Prakash 此前是 Topsy 联合创始人 / CTO(2013 年被 Apple 以约 $200M 收购),也是 Cloudmark 早期负责人,因此兼具消费者规模 ML 与基础设施运营经验。CTO Ce Zhang 是 ETH Zürich 终身教授,并领导 Together 在分布式训练系统和以数据为中心的 ML 研究。首席科学家 Chris Ré 是获得 MacArthur 奖的 Stanford 教授,Snorkel 以及多条 FlashAttention / Hyena 工作线都出自其团队;Stanford CRFM 主任 Percy Liang 是联合创始人兼顾问。 领导层已补上收入负责人、GPU 基础设施负责人、推理工程负责人,以及常驻 Zurich 的研究负责人;董事会包括 Coatue、Kleiner Perkins、NEA 和 Lux 的投资合伙人。关键人依赖集中在 Prakash 的商业执行上,也集中在创始研究三人组带来的技术可信度上; 考虑到开源飞轮贡献了 Together 很大一部分漏斗顶端,这一点尤其重要。[CO006, CO007, CO008, CO009, CO010, CO011]

管理层与创始人表
人物职位背景创始人-市场匹配关键人依赖
Vipul Ved Prakash联合创始人、CEO曾任 Topsy 联合创始人 / CTO(2013 年被 Apple 收购),Cloudmark 联合创始人连续基础设施 / 消费 ML 创始人,有运营退出经历高——唯一 CEO 和主要商业门面
Ce Zhang联合创始人、CTOETH Zürich 终身教授;分布式训练与数据中心化 ML 研究负责人深厚系统 / ML 研究可信度高——唯一 CTO;连接研究与工程
Chris Ré联合创始人、首席科学家MacArthur Fellow;Stanford CS;Snorkel 联合创始人;FlashAttention/Hyena 谱系撰写或指导大部分开源 IP高——锚定研究品牌
Percy Liang联合创始人Stanford CRFM 主任;HELM 基准负责人设定研究议程和学术可信度中——顾问性质,不全职运营
Tri Dao首席科学家(研究)FlashAttention 作者;Princeton CS 教职推理内核权威高——推动内核性能领先
收入负责人销售领导(公开列示职位)企业 SaaS 背景企业扩张必需中——已招聘多名销售
GPU 基础设施负责人集群工程过往超大规模云厂商经验(招聘网站)SLA 与成本关键岗位中——正在积极招聘

创始人履历与官方 About 页面和 Wikipedia 交叉核实;非创始人高管来自截至 runDate 的招聘信息和公开 LinkedIn 足迹。

[CO006, CO007, CO008, CO009, CO010, CO011]

1.3 融资历史、资本结构与估值

Together AI 在 May 2023 完成 $20M 种子轮,由 Lux Capital 领投,Factory、SciFi、Long Journey 以及 Scott Banister、Jakob Uszkoreit、Aravind Srinivas 等个人投资人参与。Nov 2023 又完成 $102.5M Series A,由 Kleiner Perkins 领投,NVIDIA、Emergence、NEA、Prosperity7 和 Greycroft 参投。Mar 2024 公司据报道以 $1.25B 估值追加约 $106M;随后在 July 2024 完成 $305M Series B,由 Salesforce Ventures 和 Coatue 领投,投后估值 $3.3B,Lakestar、NVIDIA 和更多战略投资方也参与。由此计算,在任何 2025/2026 延伸轮之前,已披露的新股融资累计约 $533M;截至报告运行日,EDGAR 上没有公开 S-1 申报或注册发行文件。 投资人组合——主权相关资本(Prosperity7)、战略 GPU 供应商(NVIDIA)、定义品类的云客户(Salesforce Ventures) 与一线财务投资人(Coatue/KP/Lux)——并不常见,说明 Together 正被摆成开放模型市场中立、多方参与的底层骨架。[CO012, CO013, CO014, CO015, CO016, CO017]

利益相关方 / 投资方图谱
利益相关方角色轮次控制 / 经济重要性尽调问题
Salesforce Ventures领投 B 轮(2024)B向 Salesforce 生态做战略分发确认是否有商业承诺或收入分成
Coatue联合领投 B 轮B公私募交叉投资信号确认按比例跟投意愿
Kleiner Perkins领投 A 轮种子 / A / B董事席位;合伙人 Bucky Moore确认董事会构成
NVIDIA战略投资方A/BH100/H200/B200 供给分配量化供给承诺和定价
Lux Capital领投种子轮种子 / A最早机构投资方确认董事会观察员权利
Emergence CapitalA 轮A企业 SaaS 网络
Prosperity7 (Aramco)A 轮A主权资本色彩;中东市场进入确认是否有主权云承诺
投资方:NEA, Greycroft, SciFi, Factory, Long Journey, Definition, Long Journey联合投资方种子 / A / B轮次支持
创始人与员工普通股据 A 轮新闻稿,持股保留 >25%确认 B 轮后股权结构表

股权结构表数字来自融资事件新闻稿;截至 runDate,未披露老股出售。

[CO012, CO013, CO014, CO015, CO016, CO017]
FO001: 公司里程碑时间线

从创立到 Series B,以及旗舰研究发布。

[CO014, CO015, CO016, CO017, CO021, CO022]

1.4 规模、封面指标与里程碑

公开规模指标仍不完整。公司称其在多个区域运营超过 20,000 块 NVIDIA Hopper 级 GPU,公开路线图也提到 Blackwell 推出;并称通过 Together API 服务「hundreds of thousands」开发者。但公司尚未披露经审计 ARR、毛利率、付费开发者数或净收入留存率。 CNBC 报道 Series B 前后年化收入节奏为 $100M;Bloomberg 提到三位数增长,但没有给出具体数值。报道中的员工数超过全球 150 人, 仍在招聘 kernel、网络、ML 和销售岗位。里程碑时间线包括成立(June 2022)、种子轮(May 2023)、RedPajama 1T 数据集(April 2023)、OpenChatKit(March 2023)、Series A(November 2023)、FlashAttention-3(July 2024)、 $3.3B 估值 Series B(July 2024),以及 StripedHyena / Mixture-of-Agents 研究(late 2023–2024)。 截至报告运行日,未见反向诉讼、裁员或监管行动报道;但核心封面指标(毛利率、ARR 确认、客户集中度)仍未披露,已体现在快照 KPI 表中。[CO019, CO020, CO021, CO022, CO023, CO024]

里程碑表
日期事件类型金额 / 估值 / 状态参与方含义
2022-06-27Together Computer Inc. 注册成立创立存续Prakash, Zhang, Ré, Liang主体身份确立
2023-03-10OpenChatKit 发布产品已发布Together + LAION + Ontocord开源指令微调基线
2023-04-17RedPajama 1T 数据集发布产品已发布Together + EleutherAI + LAION基础开源数据集(1T tokens)
2023-05-15$20M 种子轮公布融资已交割Lux + Factory + SciFi机构启动资本
2023-11-29$102.5M A 轮融资已交割,估值未披露Kleiner Perkins(领投)、NVIDIA、NEA、Emergence扩张和 H100 建设
2024-03-13据报道,以 $1.25B 估值进行过渡融资融资据报道现有投资方周期中估值抬升
2024-07-09$305M B 轮,投后 $3.3B融资已交割Salesforce Ventures + Coatue(联合领投)、NVIDIA、Lakestar估值跃升 3x;转向企业市场
2024-07-11FlashAttention-3 论文与博客产品已发布Dao 等领先 H100 推理内核
2024-09Together Inference Engine 2.0产品已发布Together 工程团队延迟 / 吞吐领先主张
2023-12StripedHyena-Nous-7B产品已发布Together + Nous Research非注意力长上下文架构
2024-06Mixture-of-Agents 论文产品已发布Together 研究团队智能体 LLM 技术
2024-Q4Dedicated Endpoints 正式可用(GA)产品已发布Together 工程团队企业推理产品

截至 runDate,未报道负面事件(诉讼、裁员、监管行动);缺乏负面事件本身也是尽调发现,仍待背景调查验证。

[CO019, CO020, CO021, CO022, CO023, CO024]
FO003: 快照 KPI

投委会可用的成熟度、牵引力与资本快照。

[CO019, CO021, CO022, CO024, CO032]

1.5 图表要点

Chapter 02

02市场分析

2.1 市场边界与邻近领域

Together AI 位于现代云栈的 AI 计算和推理平台层——夹在超大规模云厂商 GPU IaaS(AWS、GCP、Azure)、专门 GPU 云(CoreWeave、 Lambda、Crusoe)、推理 API 提供商(Replicate、Fireworks、Groq、Modal)和封闭 API 模型实验室(OpenAI、Anthropic)之间。 我们承销的市场,是用于运行、微调和服务开放权重或客户自有基础模型的支出,加上 AI 工作负载使用的专用与无服务器 GPU 容量。 该市场不包括通用云计算、传统 ML 平台(Sagemaker 仅训练、经典 scikit pipeline),也不包括不托管客户权重的封闭专有模型 API。 邻近领域包括 MLOps 工具(Weights & Biases、Anyscale)、向量数据库,以及 AI 安全 / 可观测性厂商。现状替代方案是自建 Kubernetes-on-GPU 集群,以及从 OpenAI / Anthropic 租用封闭 API;两者都用灵活性换取价格和运营简单度。我们也明确排除按席位计费的 AI copilots(Copilot、Cursor),因为需求单位是终端用户席位而不是推理 token;它们位于 Together 之上的应用层,采购的是 token 级推理, 并不替代它。[CM001, CM002, CM003, CM004, CM005, CM006]

市场定义表
细分纳入支出排除支出购买方 / 付款方与 Together 的相关性
开放权重模型推理(API)Llama/Mistral/Qwen/DeepSeek 上按 token 计费的无服务器推理封闭 API token(OpenAI/Anthropic)开发者 + CTO核心 SOM
专用 GPU 容量预留 H100/H200/B200 端点通用云计算平台团队直接扩张 ARR
微调 + 定制模型托管LoRA、完整微调、定制 checkpoint 托管内部 Kubernetes 训练ML 工程负责人高毛利附加
批量推理 + 训练百万级 token 批处理任务、预训练运行封闭训练专用平台研究负责人增长切入点
主权 / 区域集群区域内专用容量公共区域多租户政府 / 受监管 CIO差异化赛道
MLOps + 可观测性日志、评测、微调任务BI / 分析MLOps 负责人相邻业务,非核心
封闭 API 模型租用OpenAI/Anthropic API 支出应用开发者替代 / 压力

边界锚定在客户拥有权重,以及 GPU 支撑的计算作为计费单元;排除通用云和纯封闭 API。

[CM001, CM002, CM003, CM004, CM005, CM006]

2.2 TAM/SAM/SOM 与测算口径

多个分析机构口径指向 2024 AI 基础设施 TAM 为 $40–60B,2028 前 CAGR 为 30–50%(Gartner、IDC、McKinsey)。 在这个范围内,与 Together 最相关的推理和专用 AI 计算 SAM,到 2026 约为 $8–15B;测算来自三角交叉:超大规模云厂商 AI 收入披露 (外推 AWS Bedrock-equivalent 收入年化 $26B)、Series B 报道称推理是增长最快的产品线,以及 Together 约 $100M ARR 代理值意味着其在早期 SAM 中只占个位数份额。SOM(Together 可触达、近期可拿下的支出)大约为 $1–3B,重点是 AI 原生创业公司、 模型实验室,以及 Together 已有明确关系的 Salesforce + 主权云渠道。测算受两点限制:超大规模云厂商没有拆分公开数据;许多已发表估计把训练 capex 和推理运行率混在一起。[CM007, CM008, CM009, CM010, CM011, CM012]

TAM/SAM/SOM 或规模测算视角表
发布方年份地域数值CAGR方法论置信度限制
Gartner2024全球$40–60B AI 基础设施 TAM30–50%自上而下调研超大规模云厂商 + 企业 AI 支出汇总训练 + 推理;未拆分
IDC(二手引用)2024全球$50B AI 基础设施,202435%硬件 + 云预测间接引用
McKinsey AI 支出报告2024全球$50–100B 2027 年 AI 基础设施40%情景分析区间宽;假设不清
三角测算 SAM(本报告)2026全球$8–15B 推理 + 专用容量 SAM自下而上,基于 CNBC ARR + 超大规模云厂商披露依赖超大规模云厂商季报,来源单一
三角测算 SOM(本报告)2026全球Together 可触达 $1–3B渠道 + Together $100M ARR估算不确定性高
NVIDIA 财报(数据中心)2025-Q1全球>$30B/季度数据中心收入>50%公开文件包含训练资本开支销售,并非纯推理

TAM/SAM/SOM 均设边界;保留区间,因为没有单一公开来源能清晰拆分推理支出。

[CM007, CM008, CM009, CM010, CM011, CM012]
FM001: 市场规模测算视角

Together 可服务 AI 计算的 TAM/SAM/SOM。

[CM007, CM008, CM011, CM012, CM036]
FM002: 市场估算区间

2026 年推理 SAM 估算。

[CM009, CM010]

2.3 买方、用户与付款方分层

三类核心买方驱动 Together 需求。(1)AI 原生创业公司和模型实验室:技术创始人或 CTO 为 FlashAttention 级推理延迟、专用 H100/H200 访问和开放权重灵活性选择 Together;这类客户通常先用信用卡自助购买,再升级到企业合同。(2)Fortune-500 企业内部的平台团队和应用 ML 小组:预算所有者是评估多模型策略的 CIO/CTO,采购门槛围绕数据驻留、SOC 2 和 BAA 支持;Salesforce Ventures 共同领投 Series B 为这一分层背书。(3)政府、研究和主权云客户:Prosperity7(Aramco)及类似主权相关 LP 暗示中东 / APAC 角度,Together 也把专用区域集群定位为差异点。 用户(开发者、ML 工程师、研究人员)往往不同于付款方(财务、采购、IT),这会拉长企业周期,但落地后改善 NRR。[CM014, CM015, CM016, CM017, CM018, CM019]

细分市场 / 买方图谱
细分市场买方用户付款方工作流预算负责人采用触发点
AI 原生初创公司CTOML 工程师创始人 / CFO自助式 API + LoRACTO需要开放权重 + 专用 GPU
F500 平台团队CIO应用 MLIT 采购RFP + 专用端点CIO多模型策略 + BAA
主权云部长 / CIO政府 ML财政部门区域内专用容量政府数据驻留要求
模型实验室创始人研究员创始人预留训练 + 推理创始人超大规模云厂商 GPU 稀缺
独立开发者本人本人本人按 token 计价的 API本人免费层 + 价格持平
Salesforce 生态 ISV产品 VP工程团队产品 P&L嵌入式 GenAI产品 VPSalesforce Ventures 渠道

买方 / 用户 / 付款方拆开看,才能区分信用卡自助采用和企业采购门槛。

[CM014, CM015, CM016, CM017, CM018, CM019]
FM003: 买方 / 细分市场地图

各细分市场的采用成熟度。

[CM014, CM015, CM016, CM017, CM018, CM019]

2.4 增长驱动与约束

顺风包括:开放权重模型持续扩散(Llama 3/4、Mistral、DeepSeek、Qwen)、超大规模云厂商 GPU 稀缺、FinOps 压力要求降低按 token 计费的封闭 API 支出,以及智能体 AI 浪潮把每个用户的 token 用量放大。逆风包括:NVIDIA 供给分配偏向超大规模云厂商、主权数据规则拖慢跨境推理、 新数据中心遭遇能源 / 许可瓶颈,以及 Groq、Fireworks、Cerebras 在推理层带来的价格竞争压力。采用时点风险包括企业采购摩擦、超大规模云厂商可能把 OSS 推理层商品化(AWS Bedrock 开放模型、GCP Vertex Model Garden),以及训练与推理组合的经济性波动。Together 的定位依赖推理 kernel 持续领先一代(FlashAttention 3/4、ThunderKittens),同时扩展到能锁定企业支出的预留 / 专用 SKU。每个驱动和约束最终都回到 IC 的二元问题: 推理 SAM 能否再以 35%+ 复合增长三年,还是超大规模云厂商商品化会把增长前置成一年抢地盘?我们的基准情景假设 2027 前 CAGR 可持续在 30–40%,2026 起竞争强度扩大;在这种状态下,Together 的开源飞轮和专用容量差异化能产出最强 IRR。[CM021, CM022, CM023, CM024, CM025, CM026]

增长驱动因素与约束表
驱动因素 / 约束方向时间影响尽调问题
开放权重模型扩散+2024-2027支撑 SAM >35% CAGR 增长跟踪 Llama 4/5、DeepSeek、Qwen 发布节奏
NVIDIA Hopper/Blackwell 稀缺+2024-2026推高 Together 预留容量溢价量化 Together 与 NVIDIA 的分配协议
封闭 API 价格压力(OpenAI 降价)-持续挤压每 token 利润率跟踪 Together 与 OpenAI 的价格持平度
超大规模云厂商开放模型商品化-2025-2027侵蚀纯推理 SAM关注 AWS Bedrock 与 Vertex Model Garden 扩张
主权数据驻留规则+/-2025+形成区域护城河,但限制跨境 ARR确认 Together 区域内集群
能源 / 许可瓶颈-2026-2028拖慢容量扩张确认 Together 数据中心合同
智能体工作负载放大 token 用量+2025+增加单用户推理量跟踪 MoA + 智能体 SDK 采用
FinOps 推动 OSS 推理+2025+相对封闭 API,给 Together 带来顺风调研企业 FinOps 策略

驱动因素来自多份分析师报告和合作伙伴声明;约束则由供应链报道和超大规模云厂商公告交叉验证。

[CM021, CM022, CM023, CM024, CM025, CM026]
FM004: 采用漏斗或价值链图

从发现到扩张的路径。

[CM020, CM021, CM022, CM033]

2.5 图表要点

Chapter 03

03竞争格局

3.1 竞争格局分层

Together 在五个相互重叠的战场竞争。(1)超大规模云厂商开放模型产品——AWS Bedrock 和 Google Vertex Model Garden 托管 Together 也提供的 Llama / Mistral checkpoint,并打包企业合同和 IAM。(2)专门 GPU 云——CoreWeave、Lambda Labs 和 TensorWave 争夺原始 GPU-hour 与预留容量;它们通常缺少 Together 叠加的推理 SaaS 层。(3)推理 API 同行——Fireworks、Replicate、Modal 和 Anyscale 在按 token 计费的无服务器层提供近似直接替代;Fireworks 最常被称为 Together 最接近的直接对手。(4)定制硅推理厂商——Groq (LPU)、Cerebras(wafer-scale)和 SambaNova 以模型覆盖为代价,在延迟和每 token 价格上竞争。(5)封闭 API 模型实验室——OpenAI 和 Anthropic 是那些愿意放弃权重可迁移性的买方替代方案。现状替代是自建 Kubernetes-on-GPU,用灵活性换运营负担;内部自建最常见于前沿实验室和 FAANG。 竞争集合异常宽,是因为 Together 位于计算、模型托管与开发者体验的交叉点;每个战场都让 Together 面对不同成本结构(capex 重的 GPU 云 vs OpEx 轻的 API 提供商)、不同分销权力(超大规模云采购 vs 开发者自助),以及不同退出动态(GPU 云整合 vs API 同行商品化),下文会分别承销。[CP001, CP002, CP003, CP004, CP005, CP006]

竞争对手画像表
竞争对手类别规模 / 融资目标细分市场差异化限制
AWS Bedrock超大规模云厂商开放模型>$80B AWS 收入企业IAM、合规、捆绑每 token 溢价,模型新增更慢
GCP Vertex Model Garden超大规模云厂商开放模型~$30B GCP 收入企业Gemini + 开放模型开放权重深度较弱
CoreWeave专用 GPU 云>$8B 融资;2025 上市AI 实验室、超大规模云厂商卸载需求最大的非超大规模云厂商 GPU 集群没有推理 SaaS 层
Lambda LabsGPU 云$320M Series C研究人员、初创公司按需 H100/H200集群小于 CoreWeave
Fireworks AI推理 API 同类>$77M 融资开发者、初创公司OpenAI 兼容 APIOSS 研究影响力较小
Replicate推理 API 同类>$40M 融资独立开发者社区模型、低摩擦冷启动延迟
Modal无服务器基础设施>$80M 融资ML 工程师Python 原生无服务器模型广度较弱
Anyscale基于 Ray 的平台>$250M 融资ML 工程师Ray + LLM 工具OSS 平台税
Groq定制芯片>$1B 融资延迟敏感型开发者LPU 推理速度模型覆盖有限
Cerebras定制芯片>$1B 融资;已提交 IPO前沿客户晶圆级芯片单次部署成本高
OpenAI / Anthropic(替代)封闭 API>$30B / $10B 融资企业 + 开发者前沿封闭模型权重不可迁移
TensorWaveAMD GPU 云种子轮阶段成本敏感型开发者MI300X 容量规模有限

融资和规模数字来自公开新闻稿与 Crunchbase 摘要;部分私募融资轮次依赖第三方报道。

[CP001, CP002, CP003, CP004, CP005, CP006]

3.2 能力与功能比较

按能力轴看,Together 领先于 FlashAttention-3/4 kernel 性能、开放权重模型广度(Llama、Mistral、DeepSeek、Qwen、定制 checkpoint) 和专用端点灵活性。超大规模云厂商领先于企业合规广度(BAA、FedRAMP、区域驻留)和打包身份 / 计费。Groq 在支持模型的原始单流延迟上领先, 但模型覆盖落后。Fireworks 在无服务器开放模型 API 上与 Together 接近,但 OSS 研究可见度更低。价格比较显示,Together 无服务器费率集中在 OpenAI 平价区间附近(7–70B 模型输入 token 约 $0.20–$0.90/M),批量折扣最高 50%;CoreWeave / Lambda 在原始 GPU-hour 上更便宜,但要求客户自己做 DevOps;AWS Bedrock 则在底层计算之上加收按 token 溢价。下方功能矩阵把不支持的单元格标为 unknown,而不是猜测。 矩阵显示:Together 赢在开放权重广度和 kernel 性能,超大规模云厂商赢在合规和 IAM,定制硅厂商以模型覆盖为代价赢在延迟;没有单一厂商能同时主导四个最常被引用的采购标准。 我们还注意到,在这个集合里,Together 是仅有的两家既提供 OpenAI 兼容 chat completions 端点、又暴露 fine-tune 和 batch SKU 的厂商之一, 能显著缩短从封闭 API 迁出的买方迁移时间。[CP009, CP010, CP011, CP012, CP013, CP014]

功能 / 能力矩阵
采购标准TogetherBedrockGCP VertexFireworksGroqCoreWeave
开放权重模型广度n/a
FlashAttention 级内核性能unknownunknownn/a
专用端点 / 预留是(预置)是(裸资源)
微调 API部分
批量推理 SKU部分部分
合规(SOC2/HIPAA/FedRAMP)SOC2;通过 BAA 支持 HIPAA完整完整SOC2unknownSOC2
主权 / 区域集群可用完整完整有限unknown完整
OpenAI 兼容 API
每 token 标价透明度n/a
多模态(视觉 / 音频 / 图像)部分部分n/a

公共文档未披露的单元格标为 “unknown”;功能不在竞争对手 SKU 范围内的单元格标为 “n/a”。

[CP009, CP010, CP011, CP012, CP013, CP014]
定价 / 包装对比
供应商SKU价格 / 单位折扣备注
Together无服务器 Llama-70B$0.88/M tokensOpenAI 价格持平区间
Together批量推理较无服务器低 50%批量2025 更新
Together专用端点定制预留由销售报价
Fireworks无服务器 Llama-70B$0.90/M tokens类似价格持平
Replicate按秒计费不一按 GPU 秒计费
AWS BedrockLlama 3 70B输出价格:$0.99/M output tokens量大预置预留选项
GCP VertexLlama 3 70B$0.99/M量大类似 Bedrock
GroqLlama 3 70B$0.59/M延迟溢价
CoreWeaveGPU 小时$2–4/H100-hr预留客户管理技术栈
LambdaGPU 小时$2.79/H100-hr按需客户管理技术栈

每 token 价格反映 runDate 时供应商网站公开标价;企业交易的实际价格未披露。

[CP016, CP017, CP018, CP019, CP020]
FP001: 竞争定位图

开放权重广度与企业合规成熟度。

[CP001, CP009, CP011, CP012, CP013]
FP002: 功能广度 / 能力图

按竞争对手比较能力强度。

[CP010, CP014, CP015, CP018, CP029]

3.3 护城河耐久度与竞争风险

Together 可防御的护城河包括:(a)FlashAttention 研究脉络和 kernel 迭代速度(Tri Dao + Chris Ré),(b)开源社区引力 (RedPajama、StripedHyena、MoA),以及(c)NVIDIA + Salesforce + 主权资本结构,帮助锁定 GPU 供应和企业分销。切换成本处于中等水平: 客户可以通过 API 翻译在 Together / Fireworks / Bedrock 多栖;但 Together 上的专用端点合同和微调模型 artefact 会提高粘性。 分销权力偏向超大规模云厂商——它们掌握企业采购和身份——但 Together 的中立性与开放权重承诺构成反定位差异。竞争者反向证据包括:CoreWeave 2024 IPO 文件与 Lambda 增长信号显示 IaaS 层有显著资本优势;Groq 和 Cerebras 各自融资超过 $1B 且估值更高;Bedrock 2025 年扩大 Llama 支持,压缩 Together 在商品化工作负载上的溢价。商品化风险真实存在,但受 Together 的研究速度和专用容量合同约束。 净判断是:到 2027 年,护城河在专用和高性能分层仍然耐用;商品化无服务器层会越来越受到超大规模云厂商挤压,延迟敏感层会受定制硅厂商压力。 公司能否守住 kernel 与架构领先,是护城河假设的门槛变量,因此也是技术尽调清单上的首要事项。[CP021, CP022, CP023, CP024, CP025, CP026]

护城河耐久性 / 竞争风险登记表
护城河主张威胁严重性缓释措施 / 尽调问题
FlashAttention 研究脉络开源成果向竞争对手扩散跟踪 Together 的专利 / IP 布局
开源社区吸引力Mistral/HF 的竞争性开源项目量化 Together 在 GH/HF 上的长期牵引力
NVIDIA 供给协同NVIDIA 向超大规模云厂商倾斜记录 Together 与 NVIDIA 协议细节
Salesforce / 企业渠道Salesforce 自建 AI 基础设施确认 Salesforce 商业承诺
主权资本 + 区域集群主权客户直接转向本地云梳理 Together 区域数据中心布局
专用端点粘性Bedrock 预置吞吐能力追平跟踪 Bedrock 开放模型价格动作
开放权重中立性企业想要封闭 API 的简单体验调研企业多模型策略
推理引擎性能领先专用芯片(Groq/Cerebras)跳跃式赶超在相同模型上对比测试 Together 与 Groq

按竞争替代敞口和资本强度排序护城河;每行都有一个具体尽调问题。

[CP021, CP022, CP023, CP024, CP025, CP026]
FP003: 护城河 / 准备度 KPI

紧凑版竞争耐久性总结。

[CP021, CP022, CP023, CP024, CP025, CP026]

3.4 图表要点

Chapter 04

04财务情况

4.1 融资历史与资本结构

Together AI 已在四个公开宣布轮次中筹集约 $533M 已披露新股资本。$20M 种子轮(May 2023)由 Lux Capital 领投,Factory、SciFi、 Long Journey 以及知名个人投资人(Scott Banister、Jakob Uszkoreit、Aravind Srinivas)参与。Nov 2023 的 $102.5M Series A 由 Kleiner Perkins 领投,NVIDIA、Emergence、NEA、Prosperity7 和 Greycroft 参与。Mar 2024 公司据报道以 $1.25B 估值追加约 $106M(有时称为 Series A2),随后在 July 2024 完成 $305M Series B,由 Salesforce Ventures 和 Coatue 领投,投后估值 $3.3B,Lakestar、NVIDIA 及多家战略方参与。runDate 时,SEC EDGAR 上没有 Together Computer Inc. 的 S-1、S-3 或注册发行;公开市场上也未确认老股交易或 2026 延伸轮。因此资本结构仍是纯风险投资,且有战略锚点 (NVIDIA 对应 GPU 供应,Salesforce Ventures 对应企业分销,Prosperity7 对应主权可选项);按领投信号看,董事会控制权在 KP、 Coatue 与 Lux 之间分散,但股权结构表本身未公开。累计稀释未披露;外界普遍报道创始人在 Series B 后仍保留有意义股权,但公开记录没有精确比例, 必须向管理层验证。[CI001, CI002, CI003, CI004, CI005, CI006]

资本充足性表
资本要素数值日期公开状态尽调问题
累计新股融资~$533M2024-07已披露(轮次层面)
账面现金未披露缺失索取现金头寸
月度烧钱未披露(隐含约 $15-25M)2024-25缺失索取实际烧钱速度
现金跑道月数未披露(隐含可能 18-30 个月)2025缺失索取现金跑道计划
计划资金用途未披露缺失索取资本开支计划
下一轮融资触发点未披露缺失索取里程碑
债务 / 项目融资未披露缺失索取融资授信条款
供应商融资(NVIDIA)未披露缺失确认是否存在设备融资
Series B 估值投后 $3.3B2024-07-09已披露
最新老股成交价未披露缺失Pitchbook / Information 传闻

资本要素混合了已披露融资轮金额和未披露的前瞻性财务要素。

[CI007, CI023, CI026, CI027, CI030, CI031]
FI004: 资本密集度 / 现金流图

将资本开支和经营性现金流映射到融资轮次。

现金余额和下一轮融资触发点未披露;箭头仅表示方向,不表示规模。

[CI007, CI026, CI027, CI030]

4.2 收入、定价与报道规模

Together 尚未提交财务报表。CNBC 在 July 2024 Series B 前后报道其年化收入节奏为 $100M;Bloomberg 提到「triple-digit growth」; Fast Company 和 VentureBeat 复述了这些数字,但没有独立验证。The Information 另有关于 2025 收入轨迹的付费墙报道;PitchBook 将公司列为后期风险投资公司,但没有确认 2025 跟投。公开定价页按 token 披露价格,7–70B 开放模型大约 $0.20–$0.90/M tokens, 并记录 50% 批量推理折扣;定制专用端点价格需通过销售报价。SKU 包括无服务器、专用 / 预留端点、微调、批量和 embeddings;视觉 / 音频 / 图像 SKU 另行记录。runDate 时,公司没有公开 ARR、分部拆分、客户集中度、NRR 或毛利率。Forrester 和 IDC 的市场框架说明 Together 是数十亿美元生成式 AI 推理 TAM 中的成长阶段进入者,但两家分析机构都未把 Together 列为前三厂商。管理层承认 Salesforce Ventures 联合销售带动企业管线加速, 但没有量化。公司自称动能、第三方媒体轶事和缺少经审计披露这三者组合,符合私营成长阶段 SaaS,但也在实际成交价 vs 标价、组合和毛利率上制造重大尽调风险。 GTM 动作以漏斗顶端自助开发者注册为主,随后通过 Salesforce Ventures 和 NVIDIA 渠道转介做伙伴驱动的企业扩张;销售周期、CAC 和回本周期未披露, 但参考可比推理 API 厂商披露,企业专用合同可推断为 60-120 天。[CI009, CI010, CI011, CI012, CI013, CI014]

收入来源表
SKU定价依据公开价格基准折扣杠杆尽调缺口
无服务器推理每百万 tokens$0.20–$0.90/M(7–70B 开放模型)用量 / 承诺用量实际成交价与标价差异未披露
批量推理每百万 tokens较无服务器折扣 50%批量 SLA 窗口2025 年博客更新已确认
专用端点定制 / 预留销售报价期限承诺未公开标价
微调 API按训练任务定价页报价用量文档公开,但未披露毛利率
向量嵌入 API每百万 tokens按模型公开用量
视觉 / 图像 / 音频 API按请求 / 按 token按模型公开收入组合未拆分
企业合同年度 / 承诺未披露战略折扣关键尽调问题

各定价行混合了公开标价(高置信度)和推断的企业交易做法(低置信度); 不同 SKU 之间的收入结构未披露,必须向公司索取。

[CI009, CI010, CI011, CI012, CI013, CI014]
定价 / 变现表
定价维度公开基准标价与实际成交折扣 / 未知项来源
Llama-70B 按 token 定价$0.88/M 无服务器仅标价用量折扣定价页
批量 SLA 折扣-50%,较无服务器仅标价批量窗口2025 年批量推理博客
专用端点定制 / 按小时实际成交价未披露期限承诺博客 + 销售报价
微调任务按训练 token仅标价用量微调文档页
向量嵌入每百万 tokens仅标价用量向量嵌入文档页
企业合同金额未披露实际成交未披露战略折扣向管理层索取
联合销售返利(Salesforce)未披露实际成交未披露合作伙伴经济条款Salesforce Ventures 联合销售
主权云溢价未披露实际成交未披露区域性Prosperity7 战略方

标价可以公开核验;企业合同的实际成交价未披露,必须索取。

[CI012, CI013, CI014, CI015, CI016]
FI001: 收入模型桥接图

客户活动如何转化为 Together 收入和毛利。

毛利连线仅作示意;实际利润率未披露。

[CI012, CI013, CI014, CI015, CI024, CI025]

4.3 单位经济、资本充足性与缺口

Together 的公开画像只能支持粗略单位经济估计。成本端,GPU-hour COGS 随 NVIDIA capex 扩张;CoreWeave 的 S-1 披露(有用可比) 显示,GPU 云在预留交易上的毛利率为 60–70%,按需交易更低。按 Together 标价看,无服务器每 token 毛利率可能在 40–60%,专用端点更高; 但实际毛利率取决于利用率和未公开的预留容量合同。现金端,已融资 $533M,对应截至 2024 估计 $300–$500M 现金消耗(与超大规模 GPU 建设和 150+ 员工数一致),说明现金跑道可延伸到 2026,但没有数值被确认。资本充足性取决于 Together 是延长 Series B 还是申请 IPO; Figma 和 CoreWeave 的 2025 IPO 先例说明 AI 基础设施发行人的公开市场窗口已打开,而 Navan 的 S-1 流程是更接近的成长 SaaS 可比。 缺口很重大:ARR 确认、按 SKU 毛利率、前 10 大客户集中度、净美元留存、已签约 vs 未签约收入、现金跑道月数、债务或供应商融资,以及任何主权云承诺。 这些缺口推动了单位经济和资本充足性表中的尽调要求,也为每个未披露原始指标形成重大证据缺口;在没有管理层披露前,最有信息量的外部信号是 Together 的公开招聘状态、定价页修订,以及任何 2026 老股市场传闻,尽调结束前都应跟踪。以这一规模的消耗型 SaaS 看,营运资本不太可能成为约束; 更大的现金摆动项是 GPU capex 相对收入爬坡的节奏,它决定下一轮触发时间。结论是:收入质量和增长表象强但未验证;毛利路径可信但未经审计; 资本强度高但有 NVIDIA 对齐支撑;首要尽调阻断点,是 public-financial-gaps 表中列出的全套私有财务原始指标。[CI019, CI020, CI021, CI022, CI023, CI024]

单位经济表
指标数值 / 空值置信度重要性尽调问题
无服务器推理毛利率40–60%(推断)长期利润率路径索取实际混合毛利率
专用端点毛利率60–75%(推断)预留容量客户 LTV索取专用端点毛利率拆分
批量推理毛利率35–55%(推断)50% 折扣后的批量毛利率确认批量利用率
CAC 回本周期null销售效率按客群索取回本月数
魔数null销售产能索取魔数
净留存率(NRR)null扩张代理指标按队列索取 NRR
总留存率null流失代理指标索取总留存率
2024 年隐含烧钱速度$300–$500M(推断)现金充足性索取 24 个月计划
GPU 集群利用率null利用率驱动毛利率按 SKU 索取利用率
SBC 比率null真实利润率索取 SBC 明细表

所有数值都是推断区间或空值;每个空值都配有具体尽调请求。

[CI019, CI020, CI021, CI022, CI023, CI024]
公开财务缺口表
项目公开状态重要性尽调问题
审计收入(ARR)未披露验证第三方 $100M 数据索取管理层 ARR 与增长材料
按 SKU 拆分毛利率未披露支撑长期投资逻辑按 SKU 索取 COGS 拆分
净金额留存率未披露粘性代理指标按队列索取 NDR
前 10 大客户集中度未披露收入集中风险索取匿名化前 10 大客户
已签约收入(RPO)未披露未来收入可见性索取已签约 / 未签约拆分
现金与现金跑道未披露资本充足性索取现金头寸与 24 个月计划
债务 / 供应商融资未披露资本结构如有,索取融资授信条款
创始人持股未披露利益一致性、稀释索取股权结构表
NRR 与总留存率未披露扩张与流失索取总 / 净留存率
股权激励费用未披露真实利润率与披露利润率索取 SBC 明细表
企业实际成交价未披露真实利润率与标价索取三份样本合同

所有项目都会影响投资判断,且截至 runDate 均未公开;本章依赖第三方信号, 也需要向管理层索取材料来补齐缺口。

[CI019, CI020, CI021, CI022, CI023, CI024]
FI002: 单位经济性桥接图

在披露值缺失时,拆解每 token 单位经济性的输入项。

所有定量节点都是推算区间或空值;本图仅作定性桥接。

[CI012, CI016, CI019, CI020, CI024, CI025]
FI003: 财务估计区间

有来源支持的收入、烧钱速度、现金跑道和利润率边界。

区间仅作示意;下限取最保守的公开数据点,上限按最激进公开数据点的 2x 估算。

[CI009, CI024, CI025, CI026, CI027]

4.4 图表要点

Chapter 05

05产品与技术

5.1 产品界面、模块与 SKU

Together AI 对外提供一个统一平台,包含无服务器推理、专用端点、微调、批量推理、embeddings,以及按模态划分的 API(视觉、音频、图像)。 产品界面记录在 docs.together.ai,并在 chat-completions 层面兼容 OpenAI,因此从封闭 API 迁移并不复杂。模型目录覆盖 200+ 个开放模型,包括 Llama 3/4、Mistral、Mixtral、Qwen、DeepSeek、StripedHyena 和定制微调 checkpoint;公开模型和 SKU 资料确认了按 token 与按请求计费界面。专用端点为延迟敏感工作负载提供 H100/H200/B200 GPU 预留容量,需通过销售报价。微调 API 支持在多数支持的模型家族上运行 LoRA 和全参数训练任务。批量推理相对无服务器最高可打 50% 折扣,并有记录的 SLA 窗口。SDK 提供 Python 与 TypeScript,其他运行时可用原始 HTTP;限速文档区分免费、付费和企业层。下方完整产品模块 / 资产矩阵列出每个模块、主要用户、成熟度状态、 差异化,以及买方在签长期合同前应追问的缺口。模块排序遵循买方典型采用路径:先用无服务器试验,再用专用端点和微调进生产,最后用批量和 embeddings 扩展工作流。[CE001, CE002, CE003, CE004, CE005, CE006]

产品模块 / 资产矩阵
模块用户状态 / 成熟度差异化尽调缺口
无服务器推理 API开发者、创业公司正式可用200+ 开放模型上的 OpenAI 兼容聊天补全SLA 百分比未公开
专用端点企业正式可用预留 H100/H200/B200 算力,提供 BAA标价未公开
微调 APIML 工程师正式可用Llama/Mistral/Qwen 支持 LoRA + 全参数微调训练成本透明度
批量推理ML 工程师正式可用(2025 更新)较无服务器折扣 50%实际批量利用率未披露
向量嵌入 API开发者正式可用多个开放向量嵌入模型按模型跟踪留存
视觉 / 图像 / 音频 API多模态开发者正式可用Llama-Vision、图像生成、音频转写区域可用性地图
推理引擎:Together Inference Engine(TIE v1/v2)内部 / 高阶用户正式可用FA-3/4 + TK + 推测解码引擎版本 SLA 差异
Mixture-of-Agents研究人员、高阶开发者测试版集成推理提升质量相较单模型的成本溢价
模型商店所有用户正式可用200+ 开放权重 + 自定义权重目录更新节奏
SDK(Python、TS、HTTP)开发者正式可用OpenAI 兼容 + 原生SDK 发布节奏

成熟度按公开文档状态判断;标为测试版或限量开放的单元格反映 runDate 时文档中的明确说法。

[CE001, CE002, CE003, CE004, CE005, CE006]
工作流 / 用例表
用户任务当前工作流Together 方案可衡量收益限制
试用开放模型本地 llama.cpp 或 HF Spaces无服务器 API 调用零基础设施,兼容 OpenAI规模化后的成本
从封闭 API 迁移生产负载OpenAI SDK将基础 URL 换成 Together同一套 SDK,开放权重功能对齐的边界
微调一个 Llama 变体自建 GPU 集群微调 API + 运行任务不需要 DevOps训练步骤可见度有限
支撑低延迟应用自托管 vLLM专用端点预留容量,BAA更高承诺用量
运行夜间批量摘要自托管批处理批量推理 SKU比无服务器方案便宜 50%批处理 SLA 窗口
构建智能体LangChain + 封闭 API函数调用 + JSON 模式 + 结构化输出开放权重 + 工具使用工具调用模式仍在演进
生成向量嵌入本地 HF 向量嵌入模型向量嵌入 API托管、可扩展重新索引成本
多模态(视觉)自托管 Llama-Vision视觉 API托管视觉调用图像尺寸限制
研究集成方案论文复现代码MoA API开箱即用的集成推理单次查询成本更高
运行受监管工作负载本地部署 GPU专用端点 + BAA专用端点支持 HIPAA尚无 FedRAMP

工作流各行来自文档快速入门和客户案例研究;限制项来自文档中的明确提示或 已知缺口。

[CE001, CE002, CE003, CE004, CE005, CE006]
FE001: 产品架构图

Together AI 产品栈从 API 到 GPU 底座的分层。

[CE001, CE011, CE012, CE013, CE014, CE015]

5.2 架构、依赖与运营模型

Together 的架构把应用 API(chat、completions、embeddings、fine-tune、batch)叠在模型注册表与推理编排器 之上,后者在多区域 NVIDIA Hopper / Blackwell 机群上调度 GPU pod。推理引擎(Together Inference Engine v1/v2)封装 FlashAttention-3 与 FlashAttention-4 attention kernel、ThunderKittens kernel 框架,以及 speculative-decoding / Medusa decoder,用来实现已发表的吞吐和延迟主张。Mixture-of-Agents(MoA)研究使支持模型能够通过 ensemble 推理获得更高质量补全。 模型存储依托 HuggingFace 和 Together 自有 registry;权重可迁移是其明确设计原则。关键依赖包括 NVIDIA GPU 供应(Hopper/Blackwell)、 数据中心共址伙伴、HuggingFace 模型工件目录,以及用于微调工件的 AWS S3 / 等价存储。运营模型把 kernel / 推理工程团队(Tri Dao、HazyResearch 脉络)、平台 / SRE 团队(2025 起由 Alon Gavrielov 领导的基础设施组织)和研究部门 (Chris Ré、Percy Liang)拆开。架构通过一张流程图呈现(客户请求到 GPU pod 再到响应),并用关键依赖 DAG 暴露单一供应商集中。 可靠性证据包括状态页、已发布限速文档,以及 GTC 2025 和 AI Native Conference 2025 上发布的模型路线图。缺口包括公开 SLA 百分比、 精确多区域地图(哪些区域、哪些提供商),以及单一事实源路线图;这些均被标为证据缺口。[CE011, CE012, CE013, CE014, CE015, CE016]

技术 / 运营架构表
层 / 组件作用关键依赖风险
API 网关接收兼容 OpenAI 的 HTTP 请求鉴权 + 限流基础设施DDoS、限流校准失误
模型注册表将模型 ID 解析到权重HuggingFace + 内部存储权重变动、许可证更新
推理调度器把请求调度到 GPU podGPU 池、Kube / 编排器热点拥堵、队列深度
推理引擎:Together Inference Engine v2内核优化的模型执行FA-3/4、ThunderKittens、推测解码引擎 bug、新模型回归
GPU 池(Hopper / Blackwell)算力底座NVIDIA 供给、托管机房合作伙伴供给冲击、断电
微调训练器LoRA / 全参数训练任务GPU 池 + 对象存储任务失败成本
批处理队列调度批量推理GPU 低峰窗口若撞上高峰,可能违反 SLA
向量嵌入服务嵌入文本 / 图像向量嵌入模型注册表模型弃用
视觉 / 音频路径多模态推理独立模型栈模态特定 bug
可观测性 / 状态SLA 监控status.together.ai 动态源仍缺公开 SLA
信任 / 合规SOC 2 + HIPAA 控制审计节奏FedRAMP 尚未正式可用
存储(微调产物)持久化已训练模型S3 等价存储丢失 / 泄露场景

架构层基于已披露的产品表面;各层深度从博客和研究论文推断, 可能并不穷尽。

[CE011, CE012, CE013, CE014, CE015, CE016]
FE002: 客户工作流 / 运营流程

客户请求经过 Together 平台并返回补全的路径。

[CE011, CE012, CE013, CE014, CE015, CE016]
FE003: 关键依赖图

Together 依赖的供应商、平台和合作伙伴。

[CE014, CE018, CE019, CE020, CE021, CE022]

5.3 信任、安全、合规与路线图

Together 发布的信任中心提到 SOC 2 Type II 认证、专用端点可提供 HIPAA business associate agreement(BAA),以及标准数据处理条款。 runDate 时,FedRAMP 和类似美国联邦认证尚未列出;区域驻留通过专用集群提供,但公开地图不完整。安全控制覆盖内容审核、function-calling JSON 校验、structured-output JSON mode,以及按模型给出的安全指南。路线图从博客和 AI Native Conference 文章中梳理,包括 Blackwell (B200)容量爬坡、批量推理 SKU 扩展、更多微调家族、多模态(vision+audio)覆盖,以及 Mixture-of-Agents 产品化。差异化建立在四点: (a)kernel 级性能领先(FA-3/4、TK),(b)开放权重模型覆盖广,(c)无服务器 / 专用 / 批量 SKU 之间足够灵活,(d)研究与工程双文化, 且深接 Stanford / Princeton / ETH 脉络。公开开发者信号——GitHub repo 活跃度、PyPI 下载轨迹、HuggingFace model hub 存在感和 Hacker News 讨论参与——确认社区活跃,但规模还未匹配 OpenAI 或 Hugging Face 本身。相比超大规模云产品,Together 的差异在开放权重中立性和 kernel 性能上最明显,在企业合规广度上最不明显。下方信任 / 合规与路线图表按当前状态、范围和缺口汇总每项控制与里程碑; 标为 unknown 的单元格表示缺少公开披露,而不是底层能力不存在。[CE023, CE024, CE025, CE026, CE027, CE028]

信任 / 质量 / 合规表
控制 / 认证状态范围缺口
SOC 2 Type II已鉴证平台需要最新鉴证日期
HIPAA / BAA可用专用端点不覆盖无服务器层级
GDPR / DPA可用欧盟客户具体区域驻留
FedRAMP尚未美国联邦路线图时间未确认
ISO 27001未确认状态不确定
数据驻留 / 区域集群部分可用欧盟、美国公开区域地图有限
内容审核 / 安全有文档API 层各模型行为不同
函数调用 / JSON 模式正式可用API工具使用模式仍在演进
结构化输出正式可用API
审计日志有文档企业默认未启用
自定义模型权重隐私有文档专用端点需要合同审查
漏洞赏金 / 负责任披露已发布平台

控制项已用 trust.together.ai 页面、博客文章和公开文档交叉核验;标为「未确认」 的单元格表示公开披露缺失,并不等于底层控制不存在。

[CE023, CE024, CE025, CE026, CE027, CE028]
路线图 / 发布 / 开发阶段表
日期 / 阶段功能 / 里程碑状态影响来源
2024-07FlashAttention-3正式可用Hopper 上的内核领先性arXiv 2407.08608
2024-10ThunderKittens正式可用内核框架Together 博客
2024-11Startup Accelerator已推出GTM 渠道Together 博客
2025-03GTC 2025 Pioneers活动客户 + NVIDIA 曝光Together 博客
2025-04Alon Gavrielov 出任 VP Infra已聘任运营规模Together 博客
2025-05Adaption 合作已推出医疗健康工作流Together 博客
2025-06AI Native Conference活动研究 + 产品发布Together 博客
2025-08FlashAttention-4正式可用下一代内核Together 博客
2025-09批量推理 API 更新正式可用50% 折扣 + SLATogether 博客
2026-Q1Blackwell (B200) 上线计划中容量与价格从文档推断
2026MoA 产品化扩展计划中质量层级AI Native Conference
2026多模态扩展计划中视觉 + 音频覆盖Together 博客

runDate 之后的路线图事项均明确标为计划中;来源包括博客文章和会议 公告。

[CE033, CE034, CE035, CE036, CE037]
FE004: 产品成熟度 / 能力图

各产品模块的成熟度评分。

[CE001, CE002, CE003, CE004, CE005, CE006]

5.4 图表要点

Chapter 06

06客户情况

6.1 客户分层与采用入口

Together AI 的客户基础按买方 / 用户角色和部署强度分层。漏斗顶端是使用无服务器推理做原型或低量生产的自助开发者:按公司披露,自 GA 以来已有超过 100,000 名开发者使用该平台。其下是具名创业客户——Pika(视频)、Arcee(开源合并)、Nous Research(社区模型)、 Cartesia(语音)——它们通过无服务器和专用端点组合运行生产工作负载。企业层由 Salesforce(通过 Salesforce Ventures 联合销售和客户案例提及)、 Zoom(客户案例)和 Washington University(研究部署)支撑;NVIDIA GTC 2025 Pioneers 项目又浮现出一批客户,包括医疗健康、机器人和开发者工具公司。 Startup Accelerator(2024-11 启动)是面向早期 AI 创业公司的明确漏斗,提供额度、技术支持和 GTM 放大。地域组合偏北美, EU 通过专用集群增长;垂直组合覆盖开发者工具、内容 / 媒体(视频、语音、图像)、企业 SaaS、医疗健康和学术。付款方 / 用户 / 买方拆分随层级变化: 自助层里开发者既是买方也是用户;企业层买方通常是 CTO/CIO 或平台工程负责人,用户则是应用团队。下方客户分层、采用轨迹和具名客户证明表,记录每一行的证据质量与留存、 集中度剩余缺口。[CU001, CU002, CU003, CU004, CU005, CU006]

客户分群表
分群买方 / 用户 / 付费方用例规模收入 / 战略价值缺口
自助式开发者开发者 = 买方 + 用户原型开发、低量生产100,000+ 开发者(公司声称)长尾收入 + 漏斗付费与免费未拆分
AI 原生初创公司CTO / 创始人生产推理Pika、Cartesia、Nous、Arcee 有文档记录高战略价值未披露收入数值
企业 SaaSCIO / 平台工程嵌入式 AI 功能Salesforce、Zoom较大战略价值未披露合同规模
医疗健康CIO / 临床负责人受监管工作流(BAA)Adaption(2025 年推出)战略性生产状态待定
高校 / 研究PI / IT 负责人科研计算Washington University品牌价值支出规模未披露
开发者工具创始人 / CTO嵌入式推理GTC 2025 批次管线未列明批次成员
主权 / 政府采购主权云与 Prosperity7 对齐(暗示)战略可选性无公开证据
开源社区维护者OSS 模型服务HuggingFace 镜像集成品牌 + 社区主动与被动使用未拆分

分群行混合了具名案例研究和推断类别;收入区间数值不可得。

[CU001, CU002, CU003, CU004, CU005, CU006]
客户增长 / 采用进展表
指标日期来源置信度含义缺失分母
使用平台的开发者100,000+2024Together 博客漏斗顶端规模付费 / 免费拆分
具名客户案例研究已发布 7+ 个2024-25Together 博客真实生产使用总客户数
GTC 2025 客户队列约 12 家先锋客户2025-03Together 博客 + NVIDIA企业销售管线单客户 ACV
Startup Accelerator 参与者未披露 N2024-11 起Together 博客管线杠杆队列规模
Adaption 医疗合作伙伴1(已启动)2025Together 博客受监管行业切入生产状态
HuggingFace 集成用户未披露2024-25HF 博客开源社区拉动活跃开发者
G2 评价样本数很小2025G2独立证据样本量太低,缺乏代表性
Trustpilot 评价样本数很小2025Trustpilot独立证据样本量太低,缺乏代表性

采用进展行混合了公司自称数据(低置信度)和第三方报道数字;缺失的分母已逐项列出。

[CU001, CU002, CU011, CU012, CU013, CU014]
FU001: 客户旅程图

从自助开发者到企业扩张的路径。

[CU001, CU002, CU003, CU004, CU005, CU006]
FU002: 采用 / 部署漏斗

开发者到企业客户的逐阶段转化。

认知、活跃付费和多年期合同数量都是示意占位;只有注册和具名数量有来源支持。

[CU001, CU002, CU011, CU012, CU013, CU014]

6.2 具名客户证明与耐久性

具名客户证明覆盖七个公开案例(Salesforce、Zoom、Pika、Arcee、Nous Research、Cartesia、Washington University),加上 GTC 2025 Pioneers 队列和 Adaption 医疗健康合作。每个案例都记录了客户工作流、使用模型和定性结果;量化结果(吞吐、延迟、成本、ROI)在部分部署中有记录, 但并不全面。最常被引用的结果是 FlashAttention 带来的延迟降低(Pika、Cartesia)、相对封闭 API 的成本降低(Arcee、Nous),以及集成深度 (Salesforce、Zoom)。生产 vs 试点方面,Salesforce、Zoom、Pika、Cartesia 明确为生产;Adaption 被描述为启动中的合作,而非已确认生产部署。 反向与耐久信号混合:G2 和 Trustpilot 评论数很少,限制了独立留存代理;Reddit 和 Hacker News 讨论偶尔提到无服务器层延迟或冷启动问题; 未见公开客户流失公告或终止客户报告。下方客户证明矩阵按证据质量、结果具体性、留存可见度和生产成熟度标注每个具名客户。留存与重复使用原始指标(NRR、GRR、总留存) 均未披露,本章把该缺口列为重大证据缺口,并附具体尽调要求。引用质量和新鲜度以 2024-2025 案例(Salesforce、Zoom、Pika)最佳; 更早且 2026 未更新的案例较弱。[CU012, CU013, CU014, CU015, CU016, CU017]

具名客户证据表
客户客群部署 / 用例生产 / 试点结果限制
Salesforce企业 SaaS联合销售 + 嵌入式推理生产集成深度 + Series B 轮领投合同金额未披露
Zoom企业 SaaSAI 功能推理生产延迟改善具体指标未公开
Pika初创公司(视频)视频模型服务生产靠 FA 级内核降低延迟成本收益仅定性
Cartesia初创公司(语音)语音模型服务生产专用部署吞吐定价未披露
Arcee初创公司(开源)模型合并 + 推理生产相比闭源 API 的成本优势用量未披露
Nous Research开源社区社区模型托管生产开放权重中立性收入结构未披露
Washington University学术机构科研算力生产科研吞吐支出规模未披露
Adaption医疗受监管工作流启动中进入医疗生产状态待定
GTC 2025 Pioneers 队列企业混合客群多种用例生产NVIDIA + Together 联合队列名单未完整列出

各行只列有案例研究或新闻证据的公开具名客户;未公开的具名客户(如有)不在本表内。

[CU012, CU013, CU014, CU015, CU016, CU017]
留存 / 重复使用 / 满意度表
指标值 / null客群置信度尽调请求
NRRnull企业请求按队列拆分 NRR
GRRnull企业请求按队列拆分总留存率
Logo 流失null企业请求具名账户流失名单
活跃开发者(付费)null自助请求付费开发者数量
复购率null自助请求队列复购率
G2 平均评分样本数很小自助样本数太小,不能外推
Trustpilot 平均评分样本数很小自助样本数太小,不能外推
Reddit/HN 情绪褒贬不一到偏正面社区汇总定性扫描
具名客户续约null企业通过客户访谈确认
专用端点续约率null企业请求续约队列

所有留存基础指标均为 null,并配有具体尽调请求。

[CU022, CU023, CU024, CU025, CU026]
FU003: 客户验证矩阵

按具名客户拆解证据质量;每行围绕单个客户展开证据维度,补充「具名客户证明」表。

[CU012, CU013, CU014, CU015, CU016, CU017]

6.3 扩张、集中度与反向信号

扩张代理大多是定性信号。Salesforce Ventures 联合销售关系是首要企业扩张杠杆;市场把 Salesforce Ventures 领投 Series B 解读为多年渠道承诺。 NVIDIA GTC 2025 Pioneers 和 Startup Accelerator 则增加品牌与管线。HuggingFace 合作把模型 hub 中的开发者导入 Together。 没有管理层披露,就无法精确限定集中度风险;但公开客户组合偏 AI 原生创业公司和开发者工具公司,而不是少数超大型企业合同,说明漏斗顶端比 OpenAI 式锚定客户模型更分散。专用层记录了渠道与采购摩擦:企业销售周期需要销售介入、定制 MSA 和安全审查,收入确认前会增加 60-120 天。反向信号包括零散 Reddit 和 Hacker News 讨论提到无服务器层延迟、冷启动或偶发可靠性事件;公司维护公开状态页,但不发布 SLA 百分比。runDate 前未见公开诉讼、 丢失客户报道或具名账户流失。下方扩张与集中度表记录每个扩张驱动、集中度风险、影响幅度,以及关闭剩余缺口所需的精确尽调路径;本章留存表把所有未披露原始指标视为尽调要求, 而不是断言无法溯源的数字。整体看,客户证据基础符合一个成长阶段推理平台:它在强自助开发者飞轮之上,正在建立真实企业牵引。[CU027, CU028, CU029, CU030, CU031, CU032]

扩张与集中风险表
扩张驱动集中风险影响尽调路径
Salesforce Ventures 联合销售企业订单过度集中于 Salesforce 渠道量化来自 Salesforce 的管线占比
NVIDIA GTC PioneersNVIDIA 转介绍集中量化 GTC 来源 ACV
Startup Accelerator长尾稀释风险跟踪队列收入转化
HuggingFace 合作漏斗依赖 HF确认交叉推广条款
自助开发者增长长尾流失风险按月跟踪队列留存
Adaption 医疗切入单一具名合作伙伴风险跟踪后续医疗客户拿单
主权 / Prosperity7 渠道若落地,存在主权客户集中风险确认管线承诺
开源社区品牌依赖 OSS 拉动跟踪 GH/HF/PyPI 信号稳定性
前 10 大客户集中若未披露则影响重大请求匿名化前 10 大客户数据
地域集中北美占比高请求区域收入拆分

在缺少客户收入拆分披露时,扩张驱动和集中风险只能做定性排序。

[CU027, CU028, CU029, CU030, CU031, CU032]
FU004: 留存 / 复用队列

时间序列留存占位图,暂用行业常见 PLG SaaS 代理值;Together 披露前,所有数字仅作示意。

所有留存单元格都是行业基准示意值(PLG SaaS / 推理);Together 尚未披露实际队列留存。

[CU022, CU023, CU024, CU025, CU026]

6.4 图表要点

Chapter 07

07风险

7.1 监管与法律风险面

Together AI 面对的生成式 AI 监管边界,与所有在美国和欧洲运营的基础模型平台相同。在美国,FTC 于 2024 年启动对生成式 AI 投资与伙伴关系的 6(b) 研究,并表示会广泛审查云与 AI 关系的反垄断问题;Biden / Trump-era Executive Order on AI 为联邦 AI 标准奠定基础, NIST AI Risk Management Framework 将其操作化。BIS 已收紧先进 GPU(A100、H100、H200、B200)以及部分基础模型权重出口管制, 直接影响 GPU 云运营商。欧盟方面,AI Act 于 2024 生效,对通用 AI 提供商的分阶段义务将延续至 2026-2027;英国 ICO 和澳大利亚 OAIC 发布的 GenAI 指引也形成事实合规底线。隐私制度(California 的 CCPA、医疗健康工作负载的 HIPAA)施加合同层义务,Together 通过其信任中心提及的 BAA 和 SOC 2 控制来履行。诉讼侧,NYT v Microsoft/OpenAI、Authors Guild v OpenAI 和 Getty v Stability AI 是版权风向标案件,结果会塑造每个模型托管平台的风险暴露;Together 目前不是具名被告,但其开放模型托管业务存在相邻暴露, 尤其当判例扩展到 platform-as-host 时。民间组织压力(CDT、EFF)增加声誉风险。下方监管与法律风险登记表按司法辖区、可能性、严重性、 缓释和剩余暴露排序每个条目,并为每项未披露控制设置尽调问题。[CR001, CR002, CR003, CR004, CR005, CR006]

监管 / 法律风险登记表
规则 / 案件司法辖区状态可能性严重度缓释措施剩余暴露
FTC 6(b) 生成式 AI 调查美国进行中聘请律师,持续监测可能的行为性救济
FTC 一般 AI 执法美国执行中标准广告 / 竞争合规执法行动
EU AI Act(GPAI)欧盟2024-27 分阶段实施GPAI 义务、透明度、版权退出机制违规罚款最高可达收入的 7%
BIS 出口管制(GPU + 权重)美国 / 全球2025 年收紧客户地理围栏、筛查主权部署受阻
NIST AI RMF美国自愿采用框架控制若缺失,采购处于劣势
UK ICO 生成式 AI 指引英国有效UK DPA + GDPR 合规姿态执法暴露
澳大利亚 OAIC 生成式 AI 指南澳大利亚有效采纳指南执法暴露
白宫 AI 行政令美国有效报告阈值报告负担
CCPA(加州)美国-加州有效隐私控制执法暴露
HIPAA(医疗工作负载)美国有效BAA、专用层级数据泄露 + 罚款
SOC 2 证明范围全球在信任中心自我声明SOC 2 Type II 证据若过期,存在证明缺口
NYT 诉 Microsoft/OpenAI(版权)美国诉讼进行中监控;平台与托管方边界判例外溢风险
Authors Guild 诉 OpenAI美国诉讼进行中监控;平台与托管方边界判例外溢风险
Getty 诉 Stability AI美国 / 英国诉讼进行中监控;图像模型相邻风险判例外溢风险
CDT AI 政策压力美国活跃沟通、透明度声誉

每行反映 runDate 时的规则 / 案件态势;评级为定性判断,待管理层披露后再确认。

[CR001, CR002, CR003, CR004, CR005, CR006]
FR001: 风险热力图

主要风险的可能性 × 严重性热力图。

[CR001, CR003, CR004, CR012, CR018, CR021]

7.2 运营、安全、伙伴与依赖风险

Together 的运营风险集中在三条线:GPU 资源供给(Hopper 和 Blackwell)、模型服务可靠性,以及受监管工作负载的控制。NVIDIA 是最重要的单一供应商依赖——GPU、网络(NVLink、InfiniBand)和软件栈(CUDA、TensorRT、NeMo、Dynamo)都绕不开;它同时也是战略投资方,这降低了供给分配风险,也把下行情景集中到同一条链上:一旦 Blackwell 分配收紧,冲击会高度相关。HuggingFace 是主要的模型制品依赖;如果 HF 调整托管条款或商业协同,合作伙伴风险会浮现。Salesforce Ventures 通过 B 轮成为核心企业渠道伙伴,渠道集中度风险并不小。安全暴露覆盖标准模型云攻击面(提示词注入、数据外泄、提示词日志泄露、模型权重供应链被攻破),也覆盖 Together 在信任中心披露的 SOC 2 / HIPAA 控制面。公开视频状态页存在,但不披露 SLA 百分比。竞争替代风险真实存在:Fireworks、Replicate、Modal、Anyscale、Cerebras 和 Groq 都服务重叠工作负载;超大规模云厂商(AWS Bedrock、GCP Vertex、Azure OpenAI)则把推理捆进既有企业合同。人员与执行风险包括 Vipul Ved Prakash(CEO)、Ce Zhang(CTO)和 Tri Dao(首席科学家)的关键人依赖,以及必须跟上 Hopper→Blackwell→Rubin 节奏的建设速度。下方运营、伙伴和人员风险台账逐项记录失效模式、缓释成熟度和剩余暴露,并给出明确尽调路径。[CR018, CR019, CR020, CR021, CR022, CR023]

运营 / 质量 / 安全风险登记表
故障模式可能性严重性缓释成熟度剩余风险敞口未解决缺口
无服务器多小时中断状态页;未披露 SLA %客户流失SLA 披露
专用端点硬件故障暗示有冗余收入风险可靠性指标
提示注入 / 数据外泄安全模型、函数调用护栏客户侧泄露渗透测试节奏未披露
模型权重供应链受损HF 完整性检查平台级受损权重签名流程未披露
SOC 2 证明失效信任中心披露安全态势企业交易受阻到期日未披露
HIPAA BAA 泄露事件可签 BAA监管罚款泄露应对计划未披露
GPU 产能缺口NVIDIA 合作关系收入上限分配承诺未披露
网络 / 跨区故障暗示多区域部署延迟飙升区域地图未披露
内部威胁标准控制数据泄露访问控制未披露
软件缺陷引入回归暗示分阶段发布声誉发布节奏未披露

运营评级为定性判断;多项控制原语未披露,应作为尽调问题处理。

[CR018, CR019, CR020, CR021, CR022, CR028]
合作伙伴 / 依赖风险登记表
依赖交易对手角色集中度失效情景严重性缓释措施剩余风险敞口
GPU 供应NVIDIA主要供应商 + 投资方很高Blackwell 配额削减战略投资方;多代产品承诺收入上限
模型工件HuggingFace注册表 + 分发托管政策变化公司自托管兜底分发摩擦
企业渠道Salesforce联合销售 + 投资方联合销售优先级下调直销体系搭建管线收缩
数据中心容量多方(未披露)托管机房 + 超大规模云厂商单一区域容量损失多区域建设延迟 / 成本
网络多方传输 + IX对等互联丢失多运营商短时延迟
开源社区Llama、Mistral、Qwen、DeepSeek 维护者模型上游许可证变更模型多样性许可证审查负担
资本伙伴资本伙伴:GC / Salesforce / NVIDIA / Lux / Coatue / Prosperity7 / Kleiner投资方融资轮超额认购失败收入进展融资风险
主权资本伙伴Prosperity7(KSA 相关)战略投资方地缘政治压力披露姿态声誉

依赖评级只反映公开集中度;私下合同承诺仍是尽调问题。

[CR023, CR024, CR025, CR026, CR027, CR030]
人员 / 执行风险登记表
角色 / 职能依赖或缺口可能性严重性缓释措施尽调路径
CEO Vipul Ved Prakash创始人主导;关键人依赖创始人留任背调
CTO Ce Zhang关键人依赖留任背调
首席科学家 Tri Dao关键人;塑造品牌认知学术双重任职留任计划
基础设施 VP Alon Gavrielov新入职(2025)近期加入入职评估
CFOrunDate 时未披露招聘推进中(推断)确认任命
CRO / 销售负责人runDate 时未披露企业销售体系搭建确认任命
工程人才梯队Series B 后扩张招聘势头员工数披露
合规 / GRC提到 SOC 2;团队规模未披露证明材料确认团队规模
董事会构成GC + SVP + NVIDIA + 创始人成长阶段治理董事会会议纪要尽调
Hopper→Blackwell→Rubin 过渡执行跨季度建设与 NVIDIA 合作项目计划尽调

人员风险表同时纳入已点名个人和未披露岗位;确认 CFO/CRO 任命是明确的尽调问题。

[CR032, CR033, CR034, CR035]
FR002: 风险传导图

风险如何传导至收入、利润率、融资和估值。

[CR001, CR003, CR004, CR012, CR023, CR024]

7.3 缓释措施、放弃标准与投资逻辑破裂触发点

下方缓释与放弃标准表把每个核心风险配到可监控触发点、明确阈值或事件,以及触发后的动作含义。触发点覆盖监管(如 2027 年 EU AI Act GPAI 义务执法)、诉讼(如延伸到平台托管方的不利版权裁决)、伙伴(如 NVIDIA 分配削减或 HuggingFace 托管变更)、运营(如无服务器推理多小时宕机、披露数据泄露)、竞争(如超大规模云厂商捆绑推理定价下压)、商业(如 Salesforce 联合销售流失)和执行(如创始人离职、Blackwell 上线延期)。每个触发点都记录向收入、毛利、融资或估值传导的路径,以及动作含义(放弃、重估、监控、接受)。本章明确说明,多项基础指标——事故数、SLA、前 10 大客户集中度、留存、GPU 承诺支出、运营支出拆分——均未披露,因此作为尽调问题处理,而不是断言无法溯源的数字。反向来源覆盖较广:监管机构(FTC、BIS、EU、UK ICO、OAIC)、法律案卷(CourtListener:NYT、Authors Guild、Getty)、竞争对手网站(Fireworks、Replicate、Modal、Anyscale、Groq、Cerebras、CoreWeave、Lambda)和开发者情绪论坛(Hacker News、Reddit)。本章判断,Together 的公开风险面符合成长期 AI 基础设施公司的常态,缓释姿态健康;但若干控制基础项仍需管理层披露后验证。[CR034, CR035, CR036, CR037, CR038, CR039]

缓释措施和叫停标准表
风险可监控触发信号阈值 / 事件行动含义
EU AI Act 的 GPAI 条款执法通知同业首个 7% 罚款重新测算欧盟收入
BIS 出口收紧新的实体清单规则新增 GPU 出口类别重新测算主权客户管线
版权诉讼外溢平台与托管方裁决任何托管方责任裁决重新测算 OSS 托管
NVIDIA 配额Blackwell 配额削减可比同业公开遭削减重新测算容量爬坡
HuggingFace 政策变化HF 条款更新重大商业条款变化搭建自托管
无服务器中断多小时事件>4h 或重复 >1hSLA 复盘 + 客户沟通
安全事件披露事件任何需报告事件立即重新测算
客户集中度前 10 大客户占比单一客户 >25%集中度折价
创始人离职公开公告CEO/CTO/CSO 任一叫停或大幅重新测算
降价融资新融资较 Series B 持平或下调重新测算估值

触发信号可通过公开披露监控;本表就是本章可执行的叫停标准。

[CR034, CR035, CR036, CR037, CR038, CR039]
FR003: 依赖关系图

关键合作伙伴、供应商、监管机构和融资依赖。

[CR023, CR024, CR025, CR026, CR027, CR030]

7.4 证据材料

Chapter 08

08估值

8.1 投资建议、投资逻辑与反向逻辑

建议为持有 / 观察,置信度中等,风险评级中高。投资逻辑:Together AI 处在一个结构性有吸引力的交叉点:(a)按分析师和市场数据来源(Gartner、Forrester、IDC、a16z、Bessemer、Menlo),GenAI 推理市场以 40-60% CAGR 扩张;(b)技术护城河可信,来自 FlashAttention 作者身份(Tri Dao)、ThunderKittens 内核(Stanford HazyResearch)、Together Inference Engine v2 和 Mixture-of-Agents 产品化;(c)企业分发渠道由 Salesforce Ventures 联合销售、NVIDIA GTC 2025 Pioneers 和 Startup Accelerator 漏斗锚定。反向逻辑:推理层竞争激烈,Fireworks、Replicate、Modal、Anyscale、Cerebras、Groq 和超大规模云厂商(AWS Bedrock、GCP Vertex、Azure OpenAI Service)都把推理捆进既有企业合同;收入(The Information 报道为 $130M-$200M+ ARR)和留存基础指标仍未披露;B 轮标记估值(约 $3.3B-$3.5B)需要多年收入规模才能支撑 3-5x 退出;监管边界(EU AI Act、BIS、版权诉讼先例)到 2027 年持续收紧。估值章节把上述每一项都列为明确的投资逻辑破裂触发点,并配上可监控阈值和动作含义。下方建议摘要表并列给出建议、置信度、风险评级、估值立场和决策含义;投资逻辑 / 反向逻辑表记录底层论据,以及哪些变化会改变判断。[CV001, CV002, CV003, CV004, CV005, CV006]

建议摘要表
建议置信度风险评级估值立场决策含义
持有 / 观察中高处于或接近当前 Series B 轮估值跟踪 ARR + NRR + 集中度;Series C 时复盘
买入(有条件)回调 25% 或确认 ARR >$500M牵引力确认或出现降估融资后再进入
放弃(有条件)若超大规模云厂商降价 >40%,或 NVIDIA 配额削减,或发生安全事件若悲观触发项出现,则退出 / 拒投
乐观情景2028 年前以 >$8B 退出战略收购或高溢价 IPO 路径
基准情景2028 年前 $4B-$6B 退出ARR 扩大 + 毛利率扩张
悲观情景$1B-$2.5B 结果降估融资 / 退出估值受压

本建议取决于投资逻辑失效表中的触发阈值。

[CV001, CV002, CV003, CV004, CV005]
投资逻辑 / 反向逻辑表
论点改变判断的证据
分析师资料显示,生成式 AI 推理 TAM 以 40-60% CAGR 增长TAM 下修至 <20% CAGR
FlashAttention + ThunderKittens + TIE v2 拼出可信技术护城河开源 / 超大规模云厂商内核追平,削弱 Together 优势
Salesforce Ventures 领投 Series B,意味着多年渠道承诺Salesforce 联合销售优先级下降或客户流失
NVIDIA 战略投资 + GTC 2025 Pioneers 入选,指向供给和销售管线NVIDIA 将资源重新分配给直营产品(DGX Cloud)
相比闭源 API 提供商,开源中立定位可防守主要 OSS 许可变更(Llama、Mistral、Qwen、DeepSeek)
已有企业 + 初创客户案例(Salesforce、Zoom、Pika、Cartesia、Arcee)具名客户流失或生产环境降级
资本底座 + 品牌吸引人才和客户降估融资或 Series C 失败
反向:超大规模云厂商捆绑推理(AWS Bedrock、GCP Vertex、Azure)压缩价格超大规模云厂商退出捆绑推理
反向:生成式 AI 版权诉讼可能延伸到平台托管方负面判例仅限于模型训练方被告
反向:收入 + 留存未披露;入场需严守价格纪律管理层披露 ARR + NRR

投资逻辑与反向逻辑是对称的;本章明确列出哪些证据会推翻判断。

[CV006, CV007, CV008, CV009, CV010, CV011]
FV001: 投资建议逻辑

从规模、证据、风险和估值推导出投资建议的链条。

[CV001, CV002, CV003, CV004, CV005, CV006]

8.2 情景、可比公司与敏感性

估值由三种情景锚定。基准情景($4B-$6B 退出,约 50% 概率)假设 ARR 从当前 $130M-$200M 在 2026-2028 年扩大到 $500M-$700M,毛利率维持在 AI 推理典型的 30-40% 区间,C 轮适度稀释,Hopper→Blackwell 产能按时爬坡。乐观情景($8B-$12B 退出,约 25% 概率)要求 2028 年 ARR >$1B,FlashAttention 推动利用率提升并带来毛利率扩张,Salesforce + NVIDIA 渠道承诺持续,并出现战略收购(NVIDIA、超大规模云厂商、Salesforce)或 2027-2028 年以溢价倍数 IPO。悲观情景($1B-$2.5B 结果,约 25% 概率)在以下情况下兑现:超大规模云厂商捆绑推理压低价格、NVIDIA 分配收紧,或版权先例延伸到平台托管方。可比估值表覆盖 CoreWeave(IPO 后 GPU 云可比公司)、Navan(近期 S-1 SaaS 可比公司)、Figma(S-1 可比公司)、私募轮次(Fireworks 传闻 $4B、Replicate、Modal、Sakana、Mistral、Anthropic),以及作为天花板参照的上市公司(NVIDIA、Snowflake)。敏感性驱动因素包括收入增长、毛利率、NRR、退出倍数和概率加权退出窗口。下方乐观 / 基准 / 悲观表和可比估值表记录每种情景的假设、估值逻辑和关键敏感性。估值敏感性条形图和估值区间图展示相对当前 B 轮标记的下行、基准和上行情景。[CV018, CV019, CV020, CV021, CV022, CV023]

乐观 / 基准 / 悲观情景表
情景概率ARR 假设毛利率退出倍数估值 / 回报逻辑关键风险
乐观25%2028 年 ARR >$1B40-50%12-15x ARR$8B-$12B 退出;战略收购 / 高溢价 IPO超大规模云厂商捆绑;NVIDIA 资源重新分配
基准50%2028 年 ARR $500M-$700M30-40%8-10x ARR$4B-$6B 退出;并购出售或 IPO价格竞争;留存下滑
悲观25%2028 年 ARR $200M-$300M20-30%5-7x ARR$1B-$2.5B 结果;降估融资超大规模云厂商价格战;版权判例;NVIDIA 配额削减

概率是主观判断,仅在本章内部使用;Series C 以及每次重大客户或监管事件后, 都应重新标记各行。

[CV015, CV016, CV017, CV018, CV019, CV020]
可比估值表
可比对象指标倍数 / 估值 / 状态参考意义局限
CoreWeave(IPO 后,GPU 云)EV / 未来 12 个月收入IPO 后 8-12x最接近的 GPU 云可比对象CoreWeave 收入结构更偏 GPU 裸金属
Navan(S-1,SaaS)EV / NTM 收入提交招股书时 8-12x成长期 SaaS 可比对象SaaS,不是推理
Figma(S-1,SaaS)EV / NTM 收入提交招股书时 12-15x高倍数 SaaS 可比对象设计 SaaS,不是推理
Fireworks AI(据传 2024 年融资轮)最近一轮私募融资~$4B(据传)直接推理可比对象融资估值据传
Replicate(未上市)最近一轮私募融资未披露直接推理可比对象披露有限
Modal(未上市)最近一轮私募融资未披露无服务器推理可比对象披露有限
Anyscale(未上市)最近一轮私募融资$1B-$2BRay + 推理可比对象定位不同
Sakana AI(融资轮)最近一轮私募融资~$1.5B(2024 年 8 月)开源模型开发商可比对象模型实验室,不是基础设施
Mistral(融资轮)最近一轮私募融资$6B(2024 年中)开源模型实验室可比对象模型 + 基础设施混合
Anthropic(融资轮)最近一轮私募融资$60B+(2025)闭源 API 可比对象商业模式不同——非直接可比
NVIDIA(上市)EV / NTM 收入高十几倍至 20 多倍中段上限参照规模大得多
Snowflake(上市)EV / NTM 收入10-15xSaaS 上限参照成熟 SaaS

可比行混合了上市公司与未上市公司估值;私募融资数字来自媒体报道和 PitchBook。

[CV021, CV022, CV023, CV024, CV025, CV026]
FV002: 估值敏感性

估值结果对收入、利润率、倍数和留存的敏感性。

[CV018, CV019, CV020, CV021]
FV003: 估值 / 回报区间

2028 年退出窗口下,各情景的低 / 基准 / 高估值区间。

[CV022, CV023, CV024, CV025, CV026, CV029]

8.3 投资逻辑破裂触发点、尽调问题与 KPI

投资逻辑破裂与放弃触发点表把本章风险和估值逻辑转成可监控触发点,并绑定具体事件:(a)到 2027-2028 年 ARR 运行率未达 $500M-$700M → 重估基准情景,(b)Salesforce 联合销售降优先级 → 放弃乐观情景,(c)NVIDIA Blackwell 分配削减 → 重估产能爬坡,(d)超大规模云厂商捆绑推理降价 >40% → 价格压缩,(e)任何针对平台托管方的版权裁决 → 重估 OSS 托管,(f)C 轮估值较 B 轮持平 / 下调 → 按市场重估,(g)创始人离职 → 放弃投资逻辑,(h)披露数据泄露或多小时宕机 → 重估 SLA + 声誉。最终尽调问题表记录仍缺失的基础指标——准确 ARR、NRR/GRR、前 10 大客户集中度、GPU 承诺支出、运营支出拆分、CFO/CRO 招聘、主权渠道姿态、付费开发者数量——并把每项映射到负责人或尽调路径。投资 KPI 图把市场、验证、护城河、经济性、风险、估值和证据质量整合为 0-100 分,便于投委会使用。本章明确:建议对价格和证据都敏感。在 $3.3B-$3.5B B 轮估值和已披露证据基础上,持有 / 观察是纪律性答案;若估值回调 25%+,或确认 ARR >$500M 且 NRR >120%,则买入;若 C 轮前任一悲观情景触发,则放弃。[CV034, CV035, CV036, CV037, CV038, CV039]

投资逻辑失效与终止触发项表
触发项阈值对投资逻辑的传导行动含义
相对基准情景的 ARR 运行率FY2027 时 ARR <$500M收入下调重做基准测算
Salesforce 联合销售公开降低优先级渠道下调终止乐观情景
NVIDIA 配额公开宣布削减同业配额产能下调重做产能测算
超大规模云厂商捆绑定价AWS Bedrock 或同业降价 >40%毛利率压缩重做基准测算
版权判例平台托管方裁决OSS 托管假设下调重做 OSS 收入测算
融资Series C 较 Series B 持平或下降按市价重估重做估值测算
创始人离职CEO/CTO/CSO 中任一人执行力下调终止投资逻辑
安全 / 宕机披露安全事件或多小时宕机声誉 + SLA重做企业客户管线测算

触发阈值可通过公开披露或同业可比项监控。

[CV033, CV034, CV035, CV036, CV037, CV038]
最终尽调追问表
主题缺失证据重要性负责人 / 尽调路径
收入runDate 时的准确 ARR基准 / 乐观情景测算向管理层索取 ARR + 增长
留存NRR / GRR / 队列留存收入质量索取按队列的留存
集中度前 10 大客户占比单一事件下行风险索取匿名化前 10 大客户
GPU 承诺对 NVIDIA 的承诺支出毛利率测算索取供应商承诺
运营开支拆分R&D / S&M / G&A烧钱速度测算索取利润表拆分
CFO / CRO到岗情况 + 任期执行力测算确认高管任用
主权渠道Prosperity7 承诺地缘 + 品牌风险确认渠道姿态
付费开发者数付费 / 免费拆分自助收入测算索取付费开发者数
SOC 2 到期Type II 到期日企业采购索取认证更新
开源许可立场OSS 托管政策版权风险敞口索取托管政策

所有尽调追问都对应本章内部问题和风险章节缓释表。

[CV040, CV041, CV042, CV043, CV044]
FV004: 投资 KPI

投委会可用的评分,覆盖市场、客户证据、护城河、经济性、风险、估值和证据质量。

[CV040, CV041, CV042, CV043, CV044]

8.4 证据材料

免责声明

本报告是基于公开证据的尽调快照,不构成投资建议。重要的财务、法律、技术和合同事实仍未公开;作出任何投资决定前,应直接向管理层核验,并查阅一手文件。

证据索引

结论
编号陈述可信度来源
CO001 Together AI markets itself as "the AI acceleration cloud" offering training, fine-tuning, and inference for open-source and custom models. SO001, SO002
CO002 The corporate entity is Together Computer Inc., headquartered in San Francisco, California, with an additional research presence in Zurich. SO002, SO004, SO003
CO003 Together was incorporated on 27 June 2022 by four co-founders: Vipul Ved Prakash, Ce Zhang, Chris Ré, and Percy Liang. SO002, SO018
CO004 The company's public surface positions three product lines: serverless inference API, dedicated endpoints, and fine-tuning/training services. SO001, SO035
CO005 Together emphasises that customers can keep weights and choose dedicated capacity, a deliberate contrast with closed-API providers. SO001, SO005
CO006 CEO Vipul Ved Prakash previously co-founded Topsy, which Apple acquired for approximately $200M in 2013, and earlier co-founded Cloudmark. SO018, SO002
CO007 CTO Ce Zhang is a tenured professor at ETH Zürich specialising in distributed ML and data-centric ML research. SO002, SO018
CO008 Chief Scientist Chris Ré is a MacArthur Fellow at Stanford and a co-founder of Snorkel, anchoring much of Together's open-source research lineage. SO002, SO011
CO009 Co-founder Percy Liang directs the Stanford Center for Research on Foundation Models (CRFM) and leads the HELM benchmark. SO002, SO018
CO010 Princeton CS faculty member Tri Dao is the principal author of FlashAttention and is publicly identified as a Together chief scientist. SO002, SO009, SO036
CO011 Together actively recruits across kernel engineering, GPU systems, applied ML, sales, and revenue operations roles as of May 2026. SO003, SO018
CO012 Together raised a $20M Series Seed in May 2023 led by Lux Capital, with Factory, SciFi Capital, and Long Journey Ventures participating. SO018, SO012
CO013 A $102.5M Series A closed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. SO006, SO014, SO018
CO014 An interim financing in March 2024 reportedly valued Together at approximately $1.25B. SO015, SO018
CO015 Together closed a $305M Series B on 9 July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation. SO012, SO013, SO016, SO017
CO016 Cumulative disclosed primary capital totals approximately $533M (seed + A + interim + B) before any 2025–2026 extensions. SO012, SO006, SO018
CO017 No Together AI registration, S-1, or other public filing appears on SEC EDGAR as of the May 2026 run date. SO027, SO019
CO018 NVIDIA participated as a strategic investor in both Series A and Series B financings, signalling H100/H200 supply alignment. SO026, SO006, SO012
CO019 CNBC reported Together AI was running at an approximately $100M annualised revenue pace around the Series B announcement in July 2024. SO012
CO020 Bloomberg cited triple-digit year-over-year revenue growth for Together AI at the time of the Series B, without disclosing absolute figures. SO013
CO021 Together has publicly stated it operates more than 20,000 NVIDIA Hopper-class GPUs across its multi-region cluster. SO012, SO005
CO022 The company describes its developer footprint as "hundreds of thousands" of developers, without disclosing paid versus free split. SO001, SO005
CO023 Together's public job board and LinkedIn footprint imply a headcount above 150 full-time staff globally as of May 2026. SO003, SO018
CO024 No audited gross margin, net revenue retention, or paid-customer disclosure exists for Together AI as of the run date. SO027, SO019
CO025 Together AI launched OpenChatKit in March 2023 with LAION and Ontocord, an early open-source instruction-tuned chat baseline. SO008, SO030
CO026 The RedPajama 1T token open dataset was released on 17 April 2023, intended to reproduce LLaMA-grade pretraining data. SO007, SO029
CO027 FlashAttention-3 was published on arXiv and Together's blog on 11 July 2024, claiming state-of-the-art H100 attention performance. SO036, SO009
CO028 StripedHyena-Nous-7B, a non-attention long-context architecture, was released in December 2023 in collaboration with Nous Research. SO031, SO034
CO029 Together's Mixture-of-Agents paper, published in June 2024, demonstrated multi-LLM ensembling improvements on AlpacaEval. SO037, SO011
CO030 Together publishes an active GitHub organisation (togethercomputer) with multiple ten-thousand-star repositories including OpenChatKit and RedPajama-Data. SO028, SO029, SO030
CO031 The HuggingFace organisation togethercomputer hosts the RedPajama datasets and StripedHyena, Pythia, LLaMA-32k, and m2-bert models. SO033, SO011
CO032 No public regulatory action, litigation, recall, or executive departure involving Together AI has been reported as of May 2026. SO018, SO019, SO027
CO033 Together AI is described as one of the most followed open-source-AI infrastructure accounts on Hacker News and X. SO020, SO024, SO021
CO034 Salesforce Ventures publicly framed the Series B as enabling enterprise customers to deploy open models on Together's cloud. SO025, SO012
CO035 Crunchbase's Together AI profile is paywalled and could not be independently verified for cap-table details at runDate. SO019
CO036 Cover-metric "gaps" remain for ARR, gross margin, NRR, and paid-customer count; all are flagged as diligence asks for management. SO027, SO019, SO012
CM001 Together AI competes in the AI compute and inference platform layer between hyperscaler GPU IaaS and closed-API model labs. SM001, SM004, SM023
CM002 Together's addressable spend pool excludes general-purpose cloud compute and closed-only proprietary model APIs. SM001, SM002
CM003 Status-quo substitutes for Together include self-hosted Kubernetes-on-GPU clusters and OpenAI/Anthropic closed APIs. SM011, SM012
CM004 Specialised GPU clouds (CoreWeave, Lambda) compete on infrastructure but lack Together's open-source-model SaaS layer. SM013, SM014
CM005 Inference-API providers (Replicate, Fireworks, Groq, Modal) compete directly at the per-token serverless layer. SM015, SM019, SM018, SM016
CM006 AWS Bedrock and Google Vertex AI offer hosted open-model inference that overlaps Together's serverless product. SM011, SM012
CM007 Gartner sizes 2024 AI infrastructure TAM at $40–60B with a 30–50% CAGR through 2028. SM021
CM008 IDC-style analyst notes peg 2024 global AI infrastructure spend near $50B. SM021, SM022
CM009 Triangulated inference + dedicated GPU SAM for 2026 lands in an $8–15B range. SM021, SM024, SM022
CM010 Together-addressable SOM (channels + open-model demand) is on the order of $1–3B in 2026. SM024, SM027
CM011 CNBC reported a ~$100M Together ARR at the July 2024 Series B, implying mid-single-digit SOM share. SM024, SM025
CM012 NVIDIA disclosed >$30B quarterly data-centre revenue in early 2025, evidence that AI-compute spend dwarfs Together's ARR. SM028, SM022
CM013 No single public source cleanly disaggregates inference spend from training capex, creating range uncertainty. SM021, SM028, SM022
CM014 AI-native startups and model labs are Together's most active early buyers, choosing it for open-weight flexibility and dedicated GPU access. SM003, SM032
CM015 F500 enterprise platform teams are an emerging segment, anchored by Salesforce Ventures Series B leadership. SM027, SM024
CM016 Sovereign and regional cloud customers are a strategic third segment, signalled by Prosperity7 (Aramco) investor presence. SM024, SM023
CM017 Within Together, users (developers) frequently differ from payers (procurement/finance), lengthening enterprise sales cycles. SM027, SM004
CM018 Self-serve credit-card adoption is the primary land motion for AI-native startup customers on Together. SM002, SM008
CM019 Together's NVIDIA GTC 2025 spotlight emphasised "AI pioneers" as case-study customers, validating the enterprise wedge. SM033, SM028
CM020 Together's AI-Native conference (2025) was framed as a developer community event, reinforcing top-of-funnel demand generation. SM005, SM030
CM021 Open-weight model proliferation (Llama 3/4, DeepSeek, Mistral, Qwen) keeps SAM growth above 35% CAGR through 2027. SM022, SM021, SM029
CM022 NVIDIA Hopper and Blackwell GPU scarcity drives demand for Together's reserved capacity SKUs. SM028, SM013
CM023 Closed-API price cuts from OpenAI compress per-token margins across the inference market. SM002, SM030
CM024 Hyperscaler open-model commoditisation (AWS Bedrock, GCP Vertex Model Garden) threatens to erode Together's pure-inference SAM. SM011, SM012
CM025 Sovereign data residency rules accelerate demand for in-region dedicated clusters but cap cross-border ARR. SM004, SM023
CM026 Energy and data-centre permitting bottlenecks slow capacity expansion through 2028. SM013, SM028
CM027 Agentic AI workloads (Mixture-of-Agents, multi-step reasoning) multiply per-user token volume. SM004, SM005
CM028 FinOps pressure pushes enterprises to substitute open-weight inference for closed-API spend. SM002, SM027
CM029 Together announces serverless, dedicated, and batch inference SKUs to capture different buyer demand curves. SM002, SM008, SM009, SM010
CM030 Batch inference pricing updates in 2025 reduced per-million-token costs to attract high-volume customers. SM006, SM010
CM031 Specialised GPU clouds CoreWeave and Lambda compete on raw GPU-hour pricing; Together overlays an inference SaaS layer. SM013, SM014
CM032 Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. SM018, SM020
CM033 Modal, Replicate, and Anyscale compete in serverless and Ray-based AI compute SaaS. SM016, SM015, SM017
CM034 Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. SM019, SM030
CM035 Public-cloud earnings (AWS, GCP) describe AI workloads as the fastest-growing portion of cloud revenue. SM011, SM012
CM036 Reddit r/LocalLLaMA and Hacker News discussion volume around Together has risen steadily through 2024–2026. SM030, SM029, SM031
CP001 Together competes against AWS Bedrock and Google Vertex Model Garden on hosted open-weight model inference. SP018, SP019, SP001
CP002 Specialised GPU clouds CoreWeave and Lambda compete with Together at the IaaS layer for reserved GPU capacity. SP020, SP021
CP003 Fireworks, Replicate, Modal, and Anyscale provide direct substitutes at the per-token serverless inference layer. SP026, SP022, SP023, SP024
CP004 Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. SP025, SP027
CP005 OpenAI and Anthropic act as substitutes for closed-API customers willing to give up weight portability. SP018, SP036
CP006 TensorWave provides AMD MI300X GPU capacity as a niche alternative for cost-sensitive teams. SP028
CP007 Self-hosted Kubernetes-on-GPU is the status-quo alternative most cited by frontier labs and FAANG. SP036, SP037
CP008 Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. SP026, SP036, SP037
CP009 Together leads on FlashAttention kernel performance, anchored by the FlashAttention-3 paper and Together engineering team. SP031, SP005, SP029, SP030
CP010 FlashAttention-4 was released in 2025 and extends Together's kernel lead on Hopper GPUs. SP006
CP011 AWS Bedrock and GCP Vertex lead on enterprise compliance breadth (BAA, FedRAMP, regional residency). SP018, SP019
CP012 Groq leads on single-stream inference latency on its supported models but lags in model coverage. SP025, SP036
CP013 Fireworks AI provides an OpenAI-compatible API and serves the same open-model catalog as Together. SP026, SP015
CP014 Together's serverless Llama-70B is listed near $0.88 per million tokens, within the OpenAI-parity envelope. SP002, SP011
CP015 Together batch inference offers up to 50% discount versus serverless rates as of the 2025 update. SP013
CP016 AWS Bedrock charges $0.99/M output tokens for Llama 3 70B in 2026 list pricing. SP018
CP017 GCP Vertex Llama 3 70B is priced near $0.99/M tokens with volume discounts. SP019
CP018 Groq lists Llama 3 70B at ~$0.59/M tokens, undercutting Together on raw price while constraining model choice. SP025
CP019 CoreWeave and Lambda charge $2–4 per H100-hour for reserved or on-demand GPUs. SP020, SP021
CP020 Together fine-tuning API, batch SKU, and dedicated endpoints differentiate it from raw-GPU competitors. SP012, SP013, SP011
CP021 Together's open-source research lineage (RedPajama, StripedHyena, MoA, FlashAttention) sustains community gravity that competitors struggle to match. SP031, SP034, SP004
CP022 Tri Dao and Chris Ré anchor Together's kernel and architecture research velocity. SP031, SP005, SP008
CP023 NVIDIA's participation in Series A and Series B is read by the market as a GPU supply alignment moat. SP041
CP024 Salesforce Ventures Series B leadership opens an enterprise distribution channel competitors lack. SP004, SP003
CP025 Together advertises dedicated endpoints and reserved capacity SKUs that raise customer switching cost. SP012, SP002
CP026 Hyperscalers (AWS, GCP) own enterprise procurement and identity, which is a distribution disadvantage Together must compensate for. SP018, SP019
CP027 Enterprise multi-homing across Together / Fireworks / Bedrock is the reported equilibrium in 2026 buyer surveys. SP036, SP037
CP028 Open-weight neutrality is a counter-positioning advantage versus closed-only OpenAI and Anthropic substitutes. SP001, SP002
CP029 Together publishes an OpenAI-compatible chat completions endpoint, simplifying migration from closed APIs. SP015, SP016
CP030 CoreWeave's 2024 IPO disclosures reveal $1B+ revenue scale, implying meaningful capital advantage at the IaaS layer. SP020, SP036
CP031 Lambda Labs raised a $320M Series C in 2024 to expand its H100/H200 fleet. SP021
CP032 Groq and Cerebras have each raised more than $1B in 2024–2025 to fund bespoke silicon expansion. SP025, SP027
CP033 AWS Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity inference workloads. SP018
CP034 Specialised silicon vendors (Groq, Cerebras, SambaNova) pose a latency-leapfrog risk that pure-software inference cannot fully match. SP025, SP027
CP035 Together's Python SDK and PyPI download trajectory signal sustained developer pull comparable to peers. SP042, SP043
CP036 Speculative-decoding and Medusa-class research feed Together's ability to close any Groq latency gap on shared models. SP032, SP033
CI001 Together AI raised a $20M Seed in May 2023 led by Lux Capital. SI008, SI018, SI019
CI002 Together AI raised a $102.5M Series A in November 2023 led by Kleiner Perkins. SI005, SI015, SI018
CI003 In March 2024 Together added approximately $106M at a reported $1.25B valuation (Series A2). SI016, SI007, SI014
CI004 Per the canonical company-overview claim, the Series B closed July 2024 at ~$3.3B post led by Salesforce Ventures and Coatue (financials chapter relies on that fact for capital-stack analysis). SI011, SI012, SI013, SI006
CI005 NVIDIA participated in both Series A and Series B as a strategic investor. SI022, SI006
CI006 Salesforce Ventures led the Series B, opening an enterprise distribution channel. SI021, SI011, SI006
CI007 Cumulative disclosed primary capital is approximately $533M across Seed, Series A, March 2024 extension, and Series B. SI011, SI018, SI006
CI008 No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the 2026-05 runDate. SI020, SI025
CI009 CNBC reported an approximately $100M annualised revenue pace around the July 2024 Series B announcement. SI011, SI012
CI010 Bloomberg reported triple-digit revenue growth around the July 2024 Series B. SI013, SI014
CI011 Together has not published audited ARR, gross margin, or NRR figures as of the runDate. SI020, SI001, SI002
CI012 Together publishes per-token list pricing on its public pricing page for serverless inference. SI002, SI001
CI013 Together offers a 50% batch inference discount as of the 2025 batch pricing update. SI009, SI002
CI014 Dedicated endpoint and reserved-capacity pricing is quoted via sales rather than published. SI002, SI004
CI015 Together SKUs span serverless, dedicated, fine-tuning, batch, embeddings, vision, audio, and image. SI002, SI001, SI004
CI016 Realised enterprise pricing for Together is not publicly disclosed and is a material diligence gap. SI002, SI038
CI017 The Information has published paywalled coverage of Together AI 2025 revenue trajectory. SI026
CI018 PitchBook lists Together AI as later-stage venture with no public 2025 round confirmation. SI025, SI019
CI019 Together has not disclosed gross margin by SKU as of the runDate. SI020, SI002, SI001
CI020 Together has not disclosed top-10 customer concentration as of the runDate. SI020, SI003
CI021 Together has not disclosed net dollar retention (NDR) as of the runDate. SI020, SI003
CI022 Together has not disclosed contracted-revenue (RPO) figures. SI020, SI001
CI023 Together has not disclosed cash position or runway as of the runDate. SI020, SI001
CI024 CoreWeave 2024 S-1 disclosures imply GPU-cloud gross margins in the 60-70% range on reserved deals. SI032, SI035
CI025 Together per-token gross margin on serverless is plausibly 40-60% based on competitor analog disclosures. SI032, SI036, SI037
CI026 Implied cash burn through 2024 is roughly $300-$500M consistent with GPU buildout and 150+ headcount. SI004, SI001, SI018
CI027 With $533M raised and that implied burn, runway likely extends into 2026 without a new round. SI006, SI011
CI028 Figma and CoreWeave 2025 IPOs demonstrate the public-market window is open for AI-infrastructure issuers. SI034, SI032
CI029 Navan 2025 S-1 process is a closer growth-SaaS comparable than CoreWeave for Together. SI033
CI030 Together has not disclosed any debt or vendor-financing facility. SI020, SI004
CI031 Founder and employee ownership post Series B is widely reported as significant but no exact percentages are public. SI006, SI018, SI019
CI032 No public secondary or tender offer for Together AI shares has been reported at the runDate. SI020, SI025, SI026
CI033 Forrester and IDC market frames place Together in the growth-stage generative-AI infrastructure segment without naming it top-three. SI027, SI028
CI034 Menlo Ventures and Bessemer 2025 State-of-AI reports frame the inference market as multi-billion-dollar and growing. SI030, SI031, SI029
CI035 No public 2026 follow-on round, IPO filing, or M&A announcement involving Together has been confirmed at the runDate. SI020, SI025, SI026, SI011
CI036 Together pricing-page revisions in 2025 added batch and dedicated SKU clarifications, signalling product and financial maturation. SI009, SI002, SI004
CI037 Public disclosure across ten standard financial primitives is missing or partial, qualifying as a material diligence gap. SI020, SI001, SI002, SI003
CE001 Together AI exposes serverless inference, dedicated endpoints, fine-tuning, batch, embeddings, vision, audio, and image APIs. SE016, SE018, SE001, SE003
CE002 Together AI publishes an OpenAI-compatible chat-completions endpoint to simplify migration. SE022, SE035
CE003 The Together model catalog spans 200+ open and custom models including Llama, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena. SE018, SE036, SE045
CE004 Dedicated endpoints offer reserved H100/H200/B200 capacity with BAA available for HIPAA workloads. SE020, SE003, SE005
CE005 Fine-tuning API supports LoRA and full-parameter training jobs on most supported families. SE019, SE042
CE006 Batch inference offers up to 50% discount vs serverless as of the 2025 update. SE011, SE021
CE007 Embeddings API offers multiple open embedding models per published reference. SE024, SE034
CE008 Together publishes vision, audio, and image APIs as documented surfaces. SE031, SE032, SE033
CE009 SDKs ship in Python (PyPI: together) and TypeScript with raw HTTP fallback. SE044, SE043, SE017
CE010 Rate-limit documentation distinguishes free, paid, and enterprise tiers. SE025, SE016
CE011 Together architecture stacks API gateway, model registry, inference scheduler, TIE v2, and GPU pool. SE016, SE009, SE010
CE012 Together Inference Engine v2 integrates FlashAttention-3/4 and ThunderKittens kernels. SE010, SE006, SE007, SE008
CE013 Speculative decoding and Medusa decoders are integrated into the inference engine. SE053, SE054, SE055
CE014 Mixture-of-Agents (MoA) provides ensemble inference for higher-quality completions on supported models. SE056, SE012
CE015 FlashAttention-3 paper (arXiv 2407.08608) describes the kernel anchoring Together throughput claims. SE052, SE006
CE016 FlashAttention-4 was released in August 2025 and extends the kernel lead to Hopper and Blackwell. SE007, SE012
CE017 ThunderKittens kernel framework was released in 2024 by Together and Stanford HazyResearch. SE008, SE065
CE018 NVIDIA is the primary GPU supplier (Hopper H100/H200, Blackwell B200) and a strategic investor. SE060, SE014, SE001
CE019 HuggingFace is the primary model artefact partner and hosts Together-published checkpoints. SE045, SE049
CE020 A status page is published at status.together.ai documenting platform reliability. SE062
CE021 The public SLA percentage for serverless and dedicated tiers is not yet documented at the runDate. SE062, SE025
CE022 Together infrastructure organisation expanded in 2025 with Alon Gavrielov as VP of Infrastructure Strategy. SE015, SE005
CE023 Trust center publishes SOC 2 Type II attestation references and HIPAA BAA availability. SE063, SE066, SE067
CE024 HIPAA BAA is available on dedicated endpoints but not serverless tier per documentation. SE063, SE020
CE025 GDPR / DPA terms are available for EU customers per trust center documentation. SE063
CE026 FedRAMP accreditation is not yet listed in the trust center at the runDate. SE063
CE027 The full regional residency map (which regions, which co-lo partners) is not publicly disclosed. SE063, SE020
CE028 ISO 27001 certification status is not publicly confirmed at the runDate. SE063
CE029 Content moderation, function calling, JSON mode, and structured-output safety controls are documented surfaces. SE028, SE027, SE026
CE030 Audit logs are documented for enterprise customers but not enabled by default. SE063, SE020
CE031 Custom-model-weights privacy controls are documented for dedicated tier. SE020, SE063
CE032 A bug bounty / responsible disclosure programme is published on the trust center. SE063
CE033 GTC 2025 Pioneers event surfaced multiple Together customer + NVIDIA partnerships. SE014, SE060
CE034 Adaption partnership (2025) extends Together into healthcare workflows. SE005
CE035 AI Native Conference 2025 announced research and product directions including MoA productisation. SE012, SE005
CE036 Blackwell (B200) capacity ramp is documented as 2026 roadmap item in blog references. SE005, SE014
CE037 Multi-modal expansion (vision + audio) is a documented 2026 roadmap area. SE005, SE012
CU001 Together AI reports more than 100,000 developers have used the platform per company disclosure. SU004, SU003, SU001
CU002 Self-serve developer signup is the primary top-of-funnel adoption motion for Together AI. SU038, SU001, SU003
CU003 Together customers page enumerates named startup and enterprise deployments. SU003, SU001
CU004 AI-native startups (Pika, Cartesia, Arcee, Nous Research) are documented production customers. SU012, SU015, SU013, SU014, SU003
CU005 Enterprise SaaS deployments at Salesforce and Zoom are documented case studies. SU010, SU011, SU003
CU006 Washington University is referenced as a research-compute customer in a case study. SU016, SU003
CU007 Adaption (2025) extends Together into healthcare workflows. SU008, SU004
CU008 NVIDIA GTC 2025 Pioneers programme surfaced a cohort of joint Together + NVIDIA customers. SU007, SU018
CU009 Startup Accelerator launched in November 2024 as an explicit startup-acquisition funnel. SU006, SU004
CU010 Geographic mix is North America-skewed with EU presence growing through dedicated clusters. SU003, SU004, SU001
CU011 Buyer/user split differs by tier: developer-led self-serve vs CIO/platform-eng-led enterprise. SU038, SU003
CU012 Salesforce case study documents integration depth and is treated as production deployment. SU010, SU017, SU003
CU013 Zoom case study documents AI-feature inference at production scale. SU011, SU003
CU014 Pika case study cites latency improvement from FlashAttention-class kernels. SU012, SU003
CU015 Cartesia case study documents voice-model production deployment on dedicated tier. SU015, SU003
CU016 Arcee case study documents cost reduction relative to closed APIs. SU013, SU003
CU017 Nous Research case study documents community model hosting on Together. SU014, SU003
CU018 Washington University case study documents research-compute usage. SU016, SU003
CU019 Adaption is described as a launching partnership rather than confirmed production deployment. SU008
CU020 GTC 2025 cohort case studies cover developer tools, robotics, healthcare, and content/media. SU007
CU021 HuggingFace partnership funnels developers from the model hub into Together. SU019, SU020
CU022 Net dollar retention (NDR) is not publicly disclosed at the runDate. SU003, SU001, SU004
CU023 Gross retention (GRR) and named-account churn are not publicly disclosed. SU003, SU001, SU004
CU024 Paid vs free developer counts are not disclosed. SU004, SU003
CU025 Dedicated-endpoint renewal rate is not publicly disclosed. SU004, SU003
CU026 G2 and Trustpilot review counts for Together are small, limiting independent proxies. SU026, SU027
CU027 Salesforce Ventures-led Series B and customer case study together signal a multi-year channel commitment. SU017, SU010, SU004
CU028 GTC 2025 Pioneers cohort acts as an enterprise pipeline amplifier through NVIDIA. SU007, SU018
CU029 Startup Accelerator provides credits and GTM amplification to long-tail AI startups. SU006, SU004
CU030 Adaption launch indicates a follow-on path into regulated healthcare workflows. SU008
CU031 Enterprise sales cycle requires custom MSA and security review, adding 60-120 days before revenue. SU004, SU038
CU032 Top-10 customer concentration is undisclosed and is a material diligence ask. SU003, SU004
CU033 Public customer mix skews AI-native startups + developer tools rather than a single mega-anchor. SU003, SU006, SU007
CU034 No public lawsuit or named-account churn report has surfaced for Together at the runDate. SU023, SU022, SU004
CU035 Reddit and Hacker News threads occasionally cite latency or cold-start concerns on the serverless tier. SU023, SU022
CU036 Public status page exists but no SLA percentage is published for serverless or dedicated tiers. SU042, SU038
CU037 PyPI download trajectory and GitHub repo activity indicate sustained developer pull. SU040, SU041
CR001 FTC opened a 6(b) inquiry in 2024 into generative-AI investments and partnerships, naming the major cloud-AI relationships. SR002, SR001
CR002 FTC has stated ongoing 2024-2025 attention to GenAI competition and consumer-protection enforcement. SR001, SR002
CR003 EU AI Act entered into force in 2024 with phased GPAI obligations through 2026-2027 including fines up to 7% of global revenue. SR003, SR012
CR004 BIS tightened advanced-computing export controls in 2025 covering H100, H200, B200 and certain foundation-model weights. SR005, SR008
CR005 NIST AI Risk Management Framework establishes voluntary US federal AI controls increasingly used in enterprise procurement. SR004, SR008
CR006 UK ICO has published GenAI guidance creating UK DPA compliance baseline. SR006
CR007 Australia OAIC has published a 2024 GenAI guide for organisations. SR007
CR008 White House EO on AI (2023, amended 2025) sets reporting thresholds for foundation-model training. SR008
CR009 CCPA imposes privacy obligations on Together for California-resident user data. SR009, SR012
CR010 HIPAA BAA support is published as available for healthcare workloads. SR010, SR028, SR026
CR011 SOC 2 attestation surface is referenced via the AICPA SOC framework and Together trust center. SR011, SR028
CR012 NYT v Microsoft/OpenAI active litigation (CourtListener docket) is the bellwether GenAI copyright case in US. SR013, SR014
CR013 Authors Guild v OpenAI active litigation expands copyright exposure to non-press content. SR014, SR013
CR014 Getty Images v Stability AI active litigation tests image-model copyright exposure on both US and UK sides. SR015, SR014
CR015 Civil-society organisations (CDT) actively lobby for AI accountability, adding reputational pressure. SR012
CR016 Together is not currently named in any of the bellwether GenAI copyright suits. SR013, SR014, SR015, SR025
CR017 Open-model hosting carries adjacent precedent risk if copyright cases extend to platform hosts. SR013, SR014, SR015
CR018 Together publishes a public status page but does not publish an SLA percentage. SR027, SR030
CR019 Pen-test cadence, breach plan, and named incident history are not publicly disclosed. SR028, SR025
CR020 Safety models and function-calling guardrails are documented mitigations for prompt-injection class risks. SR031, SR030
CR021 HuggingFace integrity checks are inherited for model-weight artefacts; weight-signing process is undisclosed. SR028, SR025
CR022 Trust center references SOC 2 Type II posture; attestation expiry date is not public. SR028, SR011
CR023 NVIDIA is supplier of GPUs, networking, and software stack and a strategic investor — single-vendor concentration is high. SR025, SR024, SR029
CR024 HuggingFace is the primary model-artefact dependency for the Together catalog. SR025, SR029
CR025 Salesforce Ventures is lead enterprise channel investor and co-sell partner. SR025, SR029
CR026 Datacenter / colo capacity counterparties are largely undisclosed; multi-region build is implied but not enumerated. SR025, SR024
CR027 Capital partners include GC, Salesforce, NVIDIA, Lux, Coatue, Prosperity7, and Kleiner per public round disclosures. SR025, SR034, SR035
CR028 Top-10 customer concentration is undisclosed and is a material diligence ask. SR029, SR025
CR029 Competitive displacement risk is documented from Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda. SR019, SR020, SR021, SR022, SR017, SR018, SR016, SR023
CR030 Open-source model upstream license changes (Llama, Mistral, Qwen, DeepSeek) would introduce review and compliance burden. SR025, SR029
CR031 Sovereign / Prosperity7-adjacent backing adds geopolitical disclosure considerations. SR034, SR035, SR025
CR032 Key-person dependency on Vipul Ved Prakash, Ce Zhang, and Tri Dao is high; founder retention is the mitigation. SR024, SR025
CR033 CFO and CRO presence at runDate is not publicly confirmed and is a material recruiting diligence ask. SR025, SR024
CR034 Engineering and infra hiring momentum is visible (Alon Gavrielov 2025 VP-infra hire) but exact bench size is undisclosed. SR025, SR024
CR035 Hopper→Blackwell→Rubin transition execution is a multi-quarter program-management risk for the chapter. SR025
CR036 Monitorable kill triggers (NVIDIA allocation cut, HF policy change, EU AI Act fine, copyright host-ruling) can be tracked from public disclosure. SR025, SR003, SR005, SR013
CR037 Operational kill triggers (multi-hour serverless outage, breach disclosure) are monitorable through status page and press. SR027, SR025, SR032, SR033
CR038 Commercial kill triggers (Salesforce co-sell deprioritisation, customer concentration >25% single) are monitorable through press and reference calls. SR025, SR029
CR039 Founder-departure triggers are catastrophic for the thesis at growth stage. SR025, SR024
CR040 Financing kill triggers (flat/down round vs Series B at runDate) would re-underwrite valuation. SR025, SR034, SR035
CR041 Adverse-source coverage spans regulators, court dockets, competitors, and developer-sentiment fora. SR002, SR013, SR019, SR032, SR033
CR042 Several control primitives (SLA, incident, breach plan, top-10 concentration, GPU committed spend) remain undisclosed at runDate and are explicit diligence asks. SR025, SR029, SR028, SR027
CV001 Recommendation is Hold / Monitor with medium confidence at the Series B mark. SV025, SV027, SV007
CV002 Conditional Buy on a 25%+ correction or confirmed >$500M ARR plus >120% NRR. SV008, SV007, SV005
CV003 Conditional Pass if hyperscaler pricing cuts >40%, NVIDIA allocation cuts, or breach disclosure occurs. SV001, SV002, SV003
CV004 Risk rating is medium-high reflecting concentration, regulatory, and competitive overhangs. SV001, SV002, SV006
CV005 Valuation stance is "at-or-near" the current Series B mark with explicit triggers to revisit. SV007, SV008
CV006 GenAI inference TAM grows 40-60% CAGR per multiple analyst sources at 2025 mid-point. SV001, SV002, SV003, SV004, SV005
CV007 FlashAttention authorship by Tri Dao and ThunderKittens (Stanford HazyResearch) anchor Together's kernel moat. SV025, SV024
CV008 Together Inference Engine v2 and MoA productisation extend the technical surface beyond commoditised inference. SV025
CV009 Salesforce Ventures-led Series B + customer case study imply multi-year channel commitment. SV043, SV025, SV018
CV010 NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline alignment. SV025, SV044
CV011 Open-source neutrality (Llama, Mistral, Qwen, DeepSeek) is defensible positioning vs closed-API providers. SV025, SV027
CV012 Documented enterprise + startup proof base spans Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, GTC 2025 Pioneers. SV027, SV025
CV013 Anti-thesis: hyperscaler bundled inference (Bedrock, Vertex, Azure) could compress pricing 30-50%. SV001, SV002, SV006
CV014 Anti-thesis: copyright litigation precedent (NYT, Authors Guild, Getty) could extend to platform hosts. SV025, SV008
CV015 Bull case (25% prob) assumes ARR >$1B by 2028 and exit $8B-$12B. SV001, SV005, SV006
CV016 Base case (50% prob) assumes ARR $500M-$700M by 2028 and exit $4B-$6B. SV001, SV003, SV002
CV017 Bear case (25% prob) assumes ARR $200M-$300M by 2028 and outcome $1B-$2.5B. SV001, SV002, SV006
CV018 Sensitivity to ARR growth is the single largest valuation driver in the chapter model. SV007, SV008
CV019 Gross margin sensitivity is ±1000bps shifts valuation outcome ±$2-3B at base case. SV014, SV013
CV020 Multiple sensitivity is ±3x ARR shifts exit ±$2.5B at base case. SV013, SV015
CV021 Probability weights are subjective and re-marked at Series C and major events. SV007, SV008
CV022 CoreWeave post-IPO trades 8-12x NTM revenue as GPU-cloud comparable. SV014, SV018
CV023 Navan S-1 disclosed 8-12x NTM revenue range at filing for growth-stage SaaS. SV013, SV030
CV024 Figma S-1 disclosed 12-15x NTM revenue range as high-multiple SaaS reference. SV015, SV029
CV025 Fireworks AI rumoured 2024 round valued ~$4B per press reports. SV018, SV019
CV026 Replicate and Modal rounds undisclosed in public press. SV023, SV022
CV027 Anyscale private valuation rumoured $1B-$2B at last round. SV023, SV019
CV028 Sakana AI round ~$1.5B Aug 2024 per TechCrunch and NVIDIA partnership. SV031, SV032
CV029 Mistral round ~$6B mid-2024 as OSS-model-lab comparable. SV019, SV018
CV030 Anthropic round at $60B+ in 2025 as closed-API reference, not direct comparable. SV018, SV019
CV031 NVIDIA public NTM revenue multiple high-teens to mid-20s acts as ceiling reference. SV018, SV019
CV032 Snowflake NTM revenue multiple 10-15x acts as mature-SaaS ceiling reference. SV018, SV019
CV033 ARR run-rate <$500M by FY2027 is the base-case kill trigger. SV008, SV007
CV034 Salesforce co-sell public deprioritisation is the bull-case kill trigger. SV043, SV025
CV035 NVIDIA Blackwell allocation cut to a peer is a re-underwrite trigger. SV044, SV025
CV036 Hyperscaler bundled pricing cut >40% on AWS Bedrock or peer is a base-compression trigger. SV001, SV002
CV037 Platform-host copyright precedent is an OSS-revenue re-underwrite trigger. SV025, SV008
CV038 Series C flat-or-down vs Series B is a mark-to-market trigger. SV018, SV019, SV007
CV039 Founder departure (CEO/CTO/CSO) is a kill trigger. SV024, SV025
CV040 Exact ARR at runDate is undisclosed and is the principal diligence ask. SV008, SV007, SV027
CV041 NRR / GRR / cohort retention are undisclosed at runDate and are material diligence asks. SV027, SV025
CV042 Top-10 customer concentration and GPU committed spend are undisclosed. SV027, SV025
CV043 CFO and CRO presence at runDate is unconfirmed. SV024, SV025
CV044 Opex split (R&D / S&M / G&A), paid-developer count, SOC 2 expiry, and OSS hosting policy are all diligence asks. SV025, SV027
CV045 Sacra estimates Together AI reached $1B in annualized revenue by February 2026, up from ~$618M at year-end 2025, representing ~400% year-over-year growth in 2024. SV045, SV046
CV046 Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026, which would represent a >2× step-up from the $3.3B Series B valuation set in February 2025. SV045, SV047
CV047 EquityZen lists Together AI as available for pre-IPO secondary share purchases by accredited investors, indicating secondary-market liquidity exists for current shareholders. SV047, SV045
CV048 CB Insights' Q1 2026 State of AI report identifies AI infrastructure as the leading funding category in early 2026, with total AI deal activity up materially from prior quarters, supporting the demand context for Together AI's growth. SV048, SV001
来源
编号出版方标题引文
SO001 Together AI Together AI — The AI Acceleration Cloud
SO002 Together AI About | Together AI
SO003 Together AI Careers | Together AI
SO004 Together AI Contact | Together AI
SO005 Together AI Together AI Blog
SO006 Together AI Together AI raises $102.5M Series A
SO007 Together AI RedPajama, a project to create leading open-source models
SO008 Together AI Announcing OpenChatKit
SO009 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SO010 Together AI Together Inference Engine 2.0
SO011 Together AI Research | Together AI
SO012 CNBC Together AI raises $305 million at $3.3 billion valuation
SO013 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SO014 TechCrunch Together raises $102.5M to build open-source generative AI
SO015 TechCrunch Together AI is worth $1.25B (March 2024 update)
SO016 Fast Company Together AI funding profile
SO017 VentureBeat Together AI raises $305M for open-source GenAI
SO018 Wikipedia Together AI — Wikipedia
SO019 Crunchbase Together AI — Crunchbase Profile
SO020 Hacker News Submissions from together.ai
SO021 Reddit r/LocalLLaMA Together AI discussions
SO022 Product Hunt Together AI on Product Hunt
SO023 StackShare Together AI Tech Stack
SO024 X (Together) @togethercompute on X
SO025 Salesforce Ventures Salesforce Ventures Perspectives
SO026 NVIDIA NVIDIA AI investments 2024
SO027 SEC EDGAR SEC EDGAR — Together AI search
SO028 GitHub Together Computer · GitHub Org
SO029 GitHub togethercomputer/RedPajama-Data
SO030 GitHub togethercomputer/OpenChatKit
SO031 GitHub togethercomputer/StripedHyena
SO032 GitHub Dao-AILab/flash-attention
SO033 HuggingFace togethercomputer on Hugging Face
SO034 HuggingFace StripedHyena-Nous-7B
SO035 Together AI Introduction | Together AI Docs
SO036 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SO037 arXiv Mixture-of-Agents Enhances LLM Capabilities
SO038 Gartner Gartner AI Insights
SO039 CoreWeave CoreWeave — Specialized GPU Cloud
SM001 Together AI Together AI — The AI Acceleration Cloud
SM002 Together AI Pricing | Together AI
SM003 Together AI Customers | Together AI
SM004 Together AI Together AI Blog
SM005 Together AI AI Native Conf — research & product announcements
SM006 Together AI Batch inference API updates 2025
SM007 Together AI Inference Models | Together AI Docs
SM008 Together AI Serverless Inference | Together AI Docs
SM009 Together AI Dedicated Endpoints | Together AI Docs
SM010 Together AI Batch Inference | Together AI Docs
SM011 AWS Amazon Bedrock
SM012 Google Cloud Vertex AI
SM013 CoreWeave CoreWeave — Specialized GPU Cloud
SM014 Lambda Labs Lambda — GPU Cloud for AI
SM015 Replicate Replicate — Run models in the cloud
SM016 Modal Modal — Serverless AI infrastructure
SM017 Anyscale Anyscale — Powered by Ray
SM018 Groq Groq — Fast AI inference
SM019 Fireworks AI Fireworks AI — Production-grade LLM inference
SM020 Cerebras Cerebras — Wafer-Scale AI
SM021 Gartner Gartner AI Insights
SM022 arXiv LLM inference infrastructure survey
SM023 Wikipedia Together AI — Wikipedia
SM024 CNBC Together AI raises $305 million at $3.3 billion valuation
SM025 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SM026 Fast Company Together AI funding profile
SM027 Salesforce Ventures Salesforce Ventures Perspectives
SM028 NVIDIA NVIDIA AI investments 2024
SM029 Hacker News Submissions from together.ai
SM030 Reddit r/LocalLLaMA Together AI discussions
SM031 Product Hunt Together AI on Product Hunt
SM032 Together AI Together AI Startup Accelerator
SM033 Together AI Together AI at NVIDIA GTC 2025
SP001 Together AI Together AI — The AI Acceleration Cloud
SP002 Together AI Pricing | Together AI
SP003 Together AI Customers | Together AI
SP004 Together AI Together AI Blog
SP005 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SP006 Together AI FlashAttention-4
SP007 Together AI Together Inference Engine 2.0
SP008 Together AI ThunderKittens kernel framework
SP009 Together AI AI Native Conf — research & product announcements
SP010 Together AI Inference Models | Together AI Docs
SP011 Together AI Serverless Inference | Together AI Docs
SP012 Together AI Dedicated Endpoints | Together AI Docs
SP013 Together AI Batch Inference | Together AI Docs
SP014 Together AI Rate Limits | Together AI Docs
SP015 Together AI Chat Completions API Reference
SP016 Together AI Completions API Reference
SP017 Together AI Models API Reference
SP018 AWS Amazon Bedrock
SP019 Google Cloud Vertex AI
SP020 CoreWeave CoreWeave — Specialized GPU Cloud
SP021 Lambda Labs Lambda — GPU Cloud for AI
SP022 Replicate Replicate — Run models in the cloud
SP023 Modal Modal — Serverless AI infrastructure
SP024 Anyscale Anyscale — Powered by Ray
SP025 Groq Groq — Fast AI inference
SP026 Fireworks AI Fireworks AI — Production-grade LLM inference
SP027 Cerebras Cerebras — Wafer-Scale AI
SP028 TensorWave TensorWave — AMD GPU cloud
SP029 arXiv FlashAttention: Fast and Memory-Efficient Exact Attention
SP030 arXiv FlashAttention-2: Faster Attention with Better Parallelism
SP031 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SP032 arXiv Speculative Decoding paper
SP033 arXiv Medusa speculative decoding paper
SP034 arXiv LLM inference infrastructure survey
SP035 arXiv LLM evaluation benchmark paper
SP036 Reddit r/LocalLLaMA Together AI discussions
SP037 Hacker News Submissions from together.ai
SP038 Product Hunt Together AI on Product Hunt
SP039 StackShare Together AI Tech Stack
SP040 Gartner Gartner AI Insights
SP041 NVIDIA NVIDIA AI investments 2024
SP042 GitHub togethercomputer/together-python SDK
SP043 PyPI together — Python package
SP044 Wikipedia Together AI — Wikipedia
SI001 Together AI Together AI — The AI Acceleration Cloud
SI002 Together AI Pricing | Together AI
SI003 Together AI Customers | Together AI
SI004 Together AI Together AI Blog
SI005 Together AI Together AI raises $102.5M Series A
SI006 Together AI Announcing $305M Series B
SI007 Together AI Series A2 announcement
SI008 Together AI Seed funding announcement
SI009 Together AI Batch inference API updates 2025
SI010 Together AI Together AI Startup Accelerator
SI011 CNBC Together AI raises $305 million at $3.3 billion valuation
SI012 CNBC Together AI raises $305 million (follow-up)
SI013 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SI014 Fast Company Together AI funding profile
SI015 TechCrunch Together raises $102.5M to build open-source generative AI
SI016 TechCrunch Together AI is worth $1.25B (March 2024 update)
SI017 VentureBeat Together AI raises $305M for open-source GenAI
SI018 Wikipedia Together AI — Wikipedia
SI019 Crunchbase Together AI — Crunchbase Profile
SI020 SEC EDGAR SEC EDGAR — Together AI search
SI021 Salesforce Ventures Salesforce Ventures Perspectives
SI022 NVIDIA NVIDIA AI investments 2024
SI023 X (Together) @togethercompute on X
SI024 Gartner Gartner AI Insights
SI025 PitchBook Together AI — PitchBook profile
SI026 The Information Together AI revenue 2025 reporting
SI027 Forrester Forrester: Generative AI infrastructure landscape
SI028 IDC IDC Worldwide AI Software Market Forecast 2024-2028
SI029 a16z a16z — State of Generative AI in the Enterprise 2025
SI030 Menlo Ventures Menlo Ventures: 2025 State of AI
SI031 Bessemer Venture Partners Bessemer: State of AI 2025
SI032 SEC EDGAR CoreWeave SEC filings (S-1 and post-IPO)
SI033 SEC EDGAR Navan S-1/A filing
SI034 SEC EDGAR Figma S-1 filings (comparable IPO)
SI035 CoreWeave CoreWeave — Specialized GPU Cloud
SI036 Fireworks AI Fireworks AI — Production-grade LLM inference
SI037 Groq Groq — Fast AI inference
SI038 Reddit r/LocalLLaMA Together AI discussions
SI039 Hacker News Submissions from together.ai
SE001 Together AI Together AI — The AI Acceleration Cloud
SE002 Together AI About | Together AI
SE003 Together AI Pricing | Together AI
SE004 Together AI Customers | Together AI
SE005 Together AI Together AI Blog
SE006 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SE007 Together AI FlashAttention-4
SE008 Together AI ThunderKittens kernel framework
SE009 Together AI Together Inference Engine 2.0
SE010 Together AI Together Inference Engine v2
SE011 Together AI Batch inference API updates 2025
SE012 Together AI AI Native Conf — research & product announcements
SE013 Together AI Together AI Startup Accelerator
SE014 Together AI Together AI at NVIDIA GTC 2025
SE015 Together AI Alon Gavrielov joins as VP Infrastructure Strategy
SE016 Together AI Introduction | Together AI Docs
SE017 Together AI Quickstart | Together AI Docs
SE018 Together AI Inference Models | Together AI Docs
SE019 Together AI Fine-tuning Overview | Together AI Docs
SE020 Together AI Dedicated Endpoints | Together AI Docs
SE021 Together AI Batch Inference | Together AI Docs
SE022 Together AI Chat Completions API Reference
SE023 Together AI Serverless Inference | Together AI Docs
SE024 Together AI Embeddings | Together AI Docs
SE025 Together AI Rate Limits | Together AI Docs
SE026 Together AI JSON Mode | Together AI Docs
SE027 Together AI Function Calling | Together AI Docs
SE028 Together AI Safety Models | Together AI Docs
SE029 Together AI Code Execution | Together AI Docs
SE030 Together AI LLMs Overview | Together AI Docs
SE031 Together AI Vision Models Overview | Together AI Docs
SE032 Together AI Audio Models Overview | Together AI Docs
SE033 Together AI Image Models Overview | Together AI Docs
SE034 Together AI Embeddings API Reference
SE035 Together AI Completions API Reference
SE036 Together AI Models API Reference
SE037 GitHub Together Computer · GitHub Org
SE038 GitHub togethercomputer/RedPajama-Data
SE039 GitHub togethercomputer/OpenChatKit
SE040 GitHub Dao-AILab/flash-attention
SE041 GitHub togethercomputer/StripedHyena
SE042 GitHub togethercomputer/Llama-2-7B-32K-Instruct
SE043 GitHub togethercomputer/together-python SDK
SE044 PyPI together — Python package
SE045 HuggingFace togethercomputer on Hugging Face
SE046 HuggingFace StripedHyena-Nous-7B
SE047 HuggingFace Evo-1-131k-base
SE048 HuggingFace RedPajama-Data-1T Dataset
SE049 HuggingFace HuggingFace x Together AI partnership
SE050 arXiv FlashAttention: Fast and Memory-Efficient Exact Attention
SE051 arXiv FlashAttention-2: Faster Attention with Better Parallelism
SE052 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SE053 arXiv Speculative Decoding paper
SE054 arXiv Speculative decoding follow-up
SE055 arXiv Medusa speculative decoding paper
SE056 arXiv Mixture-of-Agents Enhances LLM Capabilities
SE057 arXiv LLM inference infrastructure survey
SE058 arXiv LLM evaluation benchmark paper
SE059 arXiv Sheared LLaMA paper
SE060 NVIDIA NVIDIA AI investments 2024
SE061 AWS Amazon Bedrock
SE062 Together AI Together AI status page
SE063 Together AI Together AI trust center
SE064 Tri Dao Tri Dao personal site (Together CSO)
SE065 Stanford HazyResearch Stanford HazyResearch lab (Chris Ré)
SE066 AICPA SOC 2 reporting framework
SE067 HHS HIPAA sample BAA provisions
SE068 Hacker News Submissions from together.ai
SE069 Reddit r/LocalLLaMA Together AI discussions
SE070 Product Hunt Together AI on Product Hunt
SE071 StackShare Together AI Tech Stack
SU001 Together AI Together AI — The AI Acceleration Cloud
SU002 Together AI About | Together AI
SU003 Together AI Customers | Together AI
SU004 Together AI Together AI Blog
SU005 Together AI Pricing | Together AI
SU006 Together AI Together AI Startup Accelerator
SU007 Together AI Together AI at NVIDIA GTC 2025
SU008 Together AI Together AI x Adaption partnership
SU009 Together AI AI Native Conf — research & product announcements
SU010 Together AI Salesforce customer case study
SU011 Together AI Zoom customer case study
SU012 Together AI Pika customer case study
SU013 Together AI Arcee customer case study
SU014 Together AI Nous Research customer case study
SU015 Together AI Cartesia customer case study
SU016 Together AI Washington University customer case study
SU017 Salesforce Ventures Salesforce Ventures Perspectives
SU018 NVIDIA NVIDIA AI investments 2024
SU019 HuggingFace HuggingFace x Together AI partnership
SU020 HuggingFace togethercomputer on Hugging Face
SU021 Together AI Together AI Blog (apex)
SU022 Reddit r/LocalLLaMA Together AI discussions
SU023 Hacker News Submissions from together.ai
SU024 Product Hunt Together AI on Product Hunt
SU025 StackShare Together AI Tech Stack
SU026 G2 Together AI — G2 reviews
SU027 Trustpilot Together AI — Trustpilot reviews
SU028 Wikipedia Together AI — Wikipedia
SU029 Crunchbase Together AI — Crunchbase Profile
SU030 Fireworks AI Fireworks AI — Production-grade LLM inference
SU031 Replicate Replicate — Run models in the cloud
SU032 CNBC Together AI raises $305 million at $3.3 billion valuation
SU033 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SU034 Fast Company Together AI funding profile
SU035 TechCrunch Together AI is worth $1.25B (March 2024 update)
SU036 VentureBeat Together AI raises $305M for open-source GenAI
SU037 Gartner Gartner AI Insights
SU038 Together AI Introduction | Together AI Docs
SU039 Together AI Inference Models | Together AI Docs
SU040 PyPI together — Python package
SU041 GitHub togethercomputer/together-python SDK
SU042 Together AI Together AI status page
SR001 FTC FTC: AI Companies — Uphold Your Privacy & Confidentiality Commitments
SR002 FTC FTC launches inquiry into generative AI investments & partnerships
SR003 EUR-Lex EU Regulation 2024/1689 (AI Act)
SR004 NIST AI Risk Management Framework
SR005 US BIS BIS export controls on advanced computing & foundation models
SR006 UK ICO UK Information Commissioner — Our work on AI
SR007 OAIC (Australia) OAIC guidance on privacy and AI products
SR008 The White House Executive Order 14110 on Safe, Secure AI
SR009 CA Attorney General California Consumer Privacy Act guidance
SR010 HHS HIPAA sample BAA provisions
SR011 AICPA SOC 2 reporting framework
SR012 Center for Democracy & Technology CDT — AI policy & governance
SR013 CourtListener NYT v Microsoft / OpenAI docket
SR014 CourtListener Authors Guild v OpenAI docket
SR015 CourtListener Getty Images v Stability AI docket
SR016 CoreWeave CoreWeave — Specialized GPU Cloud
SR017 Groq Groq — Fast AI inference
SR018 Cerebras Cerebras — Wafer-Scale AI
SR019 Fireworks AI Fireworks AI — Production-grade LLM inference
SR020 Replicate Replicate — Run models in the cloud
SR021 Modal Modal — Serverless AI infrastructure
SR022 Anyscale Anyscale — Powered by Ray
SR023 Lambda Labs Lambda — GPU Cloud for AI
SR024 Together AI Together AI — The AI Acceleration Cloud
SR025 Together AI Together AI Blog
SR026 Together AI Pricing | Together AI
SR027 Together AI Together AI status page
SR028 Together AI Together AI trust center
SR029 Together AI Customers | Together AI
SR030 Together AI Introduction | Together AI Docs
SR031 Together AI Safety Models | Together AI Docs
SR032 Hacker News Submissions from together.ai
SR033 Reddit r/LocalLLaMA Together AI discussions
SR034 CNBC Together AI raises $305 million at $3.3 billion valuation
SR035 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SR036 VentureBeat Together AI raises $305M for open-source GenAI
SR037 Fast Company Together AI funding profile
SR038 Wikipedia Together AI — Wikipedia
SV001 Gartner Gartner AI Insights
SV002 Forrester Forrester: Generative AI infrastructure landscape
SV003 IDC IDC Worldwide AI Software Market Forecast 2024-2028
SV004 a16z a16z — State of Generative AI in the Enterprise 2025
SV005 Bessemer Venture Partners Bessemer: State of AI 2025
SV006 Menlo Ventures Menlo Ventures: 2025 State of AI
SV007 PitchBook Together AI — PitchBook profile
SV008 The Information Together AI revenue 2025 reporting
SV009 Meritech Capital Meritech SaaS comps table
SV010 PwC PwC Global AI Study — Sizing the prize
SV011 Y Combinator Y Combinator — Generative AI companies directory
SV012 SEC EDGAR SEC EDGAR — Together AI search
SV013 SEC EDGAR Navan S-1/A filing
SV014 SEC EDGAR CoreWeave SEC filings (S-1 and post-IPO)
SV015 SEC EDGAR Figma S-1 filings (comparable IPO)
SV016 SEC EDGAR Snowflake 10-K filings (public SaaS comp)
SV017 SEC EDGAR MongoDB 10-K filings (public infra comp)
SV018 CNBC Together AI raises $305 million at $3.3 billion valuation
SV019 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SV020 VentureBeat Together AI raises $305M for open-source GenAI
SV021 Fast Company Together AI funding profile
SV022 Wikipedia Together AI — Wikipedia
SV023 Crunchbase Together AI — Crunchbase Profile
SV024 Together AI Together AI — The AI Acceleration Cloud
SV025 Together AI Together AI Blog
SV026 Together AI Pricing | Together AI
SV027 Together AI Customers | Together AI
SV028 Together AI About | Together AI
SV029 CNBC Figma starts trading on NYSE after IPO
SV030 CNBC Navan files for IPO
SV031 TechCrunch Sakana AI $135M Series B at $2.65B
SV032 NVIDIA NVIDIA + Sakana AI partnership
SV033 CoreWeave CoreWeave — Specialized GPU Cloud
SV034 Groq Groq — Fast AI inference
SV035 Cerebras Cerebras — Wafer-Scale AI
SV036 Fireworks AI Fireworks AI — Production-grade LLM inference
SV037 Replicate Replicate — Run models in the cloud
SV038 Modal Modal — Serverless AI infrastructure
SV039 Anyscale Anyscale — Powered by Ray
SV040 Lambda Labs Lambda — GPU Cloud for AI
SV041 Hacker News Submissions from together.ai
SV042 Reddit r/LocalLLaMA Together AI discussions
SV043 Salesforce Ventures Salesforce Ventures Perspectives
SV044 NVIDIA NVIDIA AI investments 2024
SV045 Sacra Together AI revenue, valuation & funding — Sacra analysis Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025, off growing demand for generative AI applications and the need, particularly among startups, for developer tooling used to train, fine-tune, and deploy AI models.
SV046 ARR.club Together AI ARR milestones and revenue growth
SV047 EquityZen Invest In Together AI Stock — Pre-IPO shares profile
SV048 CB Insights State of AI Q1 2026 Report