初创公司尽调
尽调报告 AI infrastructure Series C 2026-06-13

Modular

硬件可移植 AI 推理有真实潜力,但公开经济性仍薄

Modular 确有技术差异化、新融资和早期客户证据;但公开收入、利润率、留存和股权结构披露仍太薄,按最新 $1.6 billion 估值还不足以支撑买入。

封面要素

最新融资轮 01
250 USD M [CV001]
累计融资 02
380 USD M [CV001]
最新估值 03
1600 USD M [CV001]
成立时间 04
2022 [CO001]
总部 05
Los Altos, CA [CO005]
员工数 06
>130 [CR022]
具名生产验证 07
Inworld + Hippocratic AI [CU008, CU012]

公司概况

Modular 是一家湾区私有 AI 基础设施公司,Chris Lattner 和 Tim Davis 于 2022 年创立。公司已经从早期 Mojo 语言叙事,扩展为覆盖 MAX 推理、Mammoth 编排,以及面向硬件可移植 AI serving 的托管或 BYOC 部署界面的一整套栈。最强的公开证据包括 2025 年 $250 million Series C、$1.6 billion 估值,清晰的跨硬件定位,以及 Inworld、Hippocratic AI 等具名生产案例;核心承销分歧在于,这家公司最终会扩成耐久的软件平台,还是更偏服务密集型的优化供应商。

官网
www.modular.com
创始人
Chris Lattner, Tim Davis
创立地点
San Francisco Bay Area, CA, USA
总部
Los Altos, CA, USA
产品
Modular 销售分层 AI 基础设施栈:MAX 负责推理和模型执行,Mammoth 负责跨异构 GPU 集群的 Kubernetes-native 编排,Mojo 负责可移植内核开发,并提供托管或自带云部署选项。
客户
AI-native 应用开发商、企业平台和 ML 基础设施团队、重视合规的 BYOC 买家,以及渠道或云合作方。
商业模式
免费开发者入口导向按 token 计价的共享端点、按分钟计价的专用和 BYOC 部署,以及需要更多人工介入的优化或渠道合作,后者会叠加工程支持。
阶段
Series C
融资情况
2025 年 9 月 Series C 融资引入 $250 million,使累计融资达到 $380 million,并给出 $1.6 billion 估值。
[CO001, CO005, CO011, CO017, CO041, CE001, CE007, CE012]

执行摘要

主要优势

  • 产品栈具备可信的硬件可移植性,覆盖 MAX、Mammoth、Mojo,以及托管或 BYOC 部署界面。
  • 融资支持强,刚完成 $250 million Series C,累计融资达到 $380 million。
  • Inworld 和 Hippocratic AI 的具名生产证据显示,平台能够承载真实的低延迟 AI 工作负载。
  • 免费到企业版漏斗和云渠道动作,为商业采用打开了多条路径。

主要风险

  • 公开资料仍未披露收入、毛利率、现金跑道或各产品界面的经济性。
  • 虽然已有具名参考客户,客户广度、留存、续约行为和集中度仍披露不足。
  • 交付模型看起来部分偏服务,可能限制软件式利润率和扩展性。
  • 即便公司强调可移植性,对伙伴、云和 NVIDIA 相邻生态的依赖仍然不低。
  • 公开股权结构表和清算优先权细节缺席,限制了对普通股结果的测算。

未决问题

  • 按产品界面拆分的当前收入或 ARR,以及软件与服务收入占比。
  • 共享、专属和 BYOC 部署下的毛利率、支持强度和现金跑道证据。
  • 客户数、留存、续约节奏,以及按客户、云伙伴和硬件伙伴划分的集中度。
  • headline $1.6 billion 估值背后的股权结构表、清算优先权和其他融资条款。
  • 还需要证明开源和免费漏斗能转化为广泛、可持续的企业收入,而不只是服务少数具名参考客户。

目录

Chapter 01

01公司概况

1.1 身份、创立背景,以及公司实际销售什么

Modular 把自己描述为一家构建统一 AI 计算层的公司,而不是某个芯片厂商或某个模型家族的点状工具。结合 About 页面、定价界面和 2025 年融资文章,公司始终把核心产品定义为围绕 MAX、Mojo 以及现在的 Mammoth 打造的硬件可移植推理基础设施,部署选项覆盖 Modular 托管云、客户 VPC 和自托管环境。创立故事也一致:Chris Lattner 和 Tim Davis 在 Google 相识,认为割裂的 AI 基础设施拖慢了采用,于是在 2022 年创立 Modular,试图把复杂性抽象掉。公开地址表述在 Silicon Valley、Palo Alto、Los Altos 和更宽泛的 San Francisco Bay Area 之间变化,但重心显然在湾区。务实的商业模式结论是,Modular 已经不只是语言赌注;它在卖一套全栈基础设施层,既有免费开发者入口,也有付费消费端点和企业部署,服务那些希望在 NVIDIA、AMD、CPU 和云环境之间保持可移植性的客户。[CO001, CO002, CO003, CO009, CO010, CO011]

快照 KPI 表
指标数值 / 状态日期置信度缺口 / 注意事项
成立时间20222022 公开记录独立来源和官方来源均指向 2022 年,但没有给出确切注册日期。
创始人组合Chris Lattner 和 Tim Davis2022 公开记录创始人履历证据充分,但确切所有权分配未公开。
主要总部表述旧金山湾区 / Silicon Valley2025-2026 来源公开来源在 Silicon Valley、Palo Alto、Los Altos 和 Bay Area 等标签之间切换。
办公室足迹San Francisco、Los Altos、Boston、Edinburgh 等办公室2026 来源包当前办公室列表公开;各办公室人员结构未公开。
最近一轮融资$250M Series C 轮2025-09-24融资规模、领投方和估值都有充分交叉印证。
累计融资$380M2025-09-24公司、Reuters/Yahoo 和 Sacra 在累计资本上口径一致。
最新估值$1.6B2025-09-24公开估值对应 2025 年融资,仍是当前口径,但没有后续估值标记。
员工数>130 公司口径 / Reuters 相关报道约 1302025-09-24除 2025 年融资报道外,运行日期员工数没有公开刷新。
公开定价姿态免费开发者层级,加用量计费和企业销售2026 定价页详细企业合同经济性未公开。
具名客户 / 合作伙伴证明Inworld、AWS、AMD、NVIDIA、TensorWave、Oracle、SF Compute、Jane Street 等客户 / 伙伴2025-2026 来源具名 logo 不等于已披露的收入集中度或合同期限。
收入已审阅来源包中未找到权威公开收入数字。
客户数已审阅来源包中未找到权威公开活跃客户数。

公开披露无法支撑权威运行日期运营指标时,null 为刻意保留。

[CO001, CO003, CO004, CO010, CO011, CO016]
FO002: 公司快照逻辑

Modular 把硬件可移植基础设施、开发者工具、企业部署和合作伙伴分发连接起来;许可清晰度仍是采用风险。

[CO009, CO010, CO011, CO038, CO043, CO045]

1.2 领导层可见度、运营版图和组织规模

公开领导班子可识别,但治理透明度还没达到后期私有公司尽调理想状态。Modular 的 About 页面列出 Chris Lattner 为联合创始人兼 CEO,Tim Davis 为联合创始人兼总裁,Mostafa Hagog 为工程 VP,Kalor Lewis 为财务 VP,Eric Johnson 为产品负责人,Mike Edwards 为特殊项目负责人。独立和投资方信息增强了创始人-市场匹配信心:GV 强调 Lattner 的 LLVM、Clang、Swift 和 TPU 背景,以及 Davis 的 TensorFlow Lite 和端侧 ML 经验;TechCrunch 和 SDxCentral 也独立称公司位于 Palo Alto。办公版图披露也扩大了。Modular 的 About 页面现在列出 San Francisco、Los Altos、Boston 和 Edinburgh 办公室;办公室扩张文章称 Edinburgh 位于 Bayes Centre,San Francisco 的 Jackson Square 办公室则补充 Los Altos 总部。规模披露仍是方向性而非完整口径:公司称 2025 年 9 月员工已超过 130 人,Reuters 相关报道也称当时约 130 名员工。缺口仍在完整董事会名单、委员会结构,以及创始人之外更清晰的接班梯队。[CO003, CO004, CO005, CO006, CO007, CO008]

领导层和创始人表
人物职位背景创始人-市场匹配或职能覆盖关键人物依赖
Chris Lattner联合创始人兼 CEOLLVM、Clang、Swift、MLIR、Google TPU 背景编译器、系统和 AI 基础设施可信度支撑技术叙事和融资故事
Tim Davis联合创始人兼总裁Google Brain AI 基础设施;创办 TensorFlow Lite把产品和基础设施运营经验与创始人愿景配对
Mostafa Hagog工程副总裁官方领导层页面具名可见的工程高管,但具体组织跨度未公开
Kalor Lewis财务副总裁官方领导层页面具名财务负责人说明运营栈更成熟,但资本规划细节仍为私人信息
Eric Johnson产品负责人官方领导层页面具名显示创始人组合之外已有产品管理能力
Mike Edwards特别项目负责人官方领导层页面具名暗示内部战略或实验项目,但职责范围未公开展开

公开来源显示领导层班底有一定厚度但不完整;董事会构成和更深层继任厚度仍披露不足。

[CO001, CO006, CO007, CO042]
FO003: 快照 KPI

快速指标显示资本支持和开发者触达很强,但核心商业披露仍落后于技术动能。

本图混合了公司说法、独立融资数据和一个抓取的代码库快照;用途是提供方向面板,不能替代完整 KPI 尽调。

[CO017, CO019, CO022, CO032, CO033, CO040]

1.3 融资历史、投资人图谱和商业模型

公开资本历史是 Modular 故事里记录最完整的部分之一。Sacra 报道 2022 年 6 月有 $30 million 种子轮;TechCrunch 和 The SaaS News 均指向 2023 年 8 月由 General Catalyst 领投的 $100 million 融资,使累计融资达到 $130 million。关键跃迁发生在 2025 年 9 月,Modular 和独立媒体称公司完成由 Thomas Tull 的 US Innovative Technology fund 领投的 $250 million Series C,引入 DFJ Growth,并保留 GV、General Catalyst 和 Greylock 等老股东参与。该轮把累计融资推至 $380 million,估值达到 $1.6 billion,接近上一轮隐含估值的三倍。商业上,公司似乎同时在三层变现:MAX 和 Mojo 的免费开发者 / 社区入口,按消费计价的托管端点,以及把软件、工作负载调优和云收入分成打包的企业或伙伴交易。仍未公开的是收入规模、不同部署模式的单位经济性,以及客户基础在云、硬件伙伴和具名企业账户之间的集中度。[CO011, CO013, CO014, CO015, CO016, CO017]

利益相关方或投资者地图
利益相关方角色控制权或经济重要性尽调问题
US Innovative Technology Fund2025 年 Series C 领投方最新一轮中最可见的新领投方,也传递国防或国家利益对齐信号确认董事会权利、清算优先权,以及与 USIT 参与绑定的任何战略权利。
DFJ Growth2025 年融资新投资方为投资团增加一家成长期软件投资者确认出资额、持股比例和任何后续跟投储备策略。
General Catalyst2023 年融资领投方,2025 年现有支持方横跨扩张阶段的核心重复机构赞助方索取当前持股、按比例跟投权,以及任何董事会观察员角色。
GV 和 Greylock早期且重复投资方支撑技术创始人叙事,并提供风投信号梳理确切持股规模、治理权利,以及种子轮、B 轮和 C 轮条款之间的任何差异。
云和基础设施合作伙伴覆盖 AWS、Oracle、TensorWave 及相关渠道的分发和部署对手方可能在企业部署中提供有意义的渠道、托管或联合销售杠杆区分营销合作与合同收入贡献及利润率结构。
具名企业和研究证明点Inworld、SF Compute、Jane Street 及类似参考案例验证可移植性和性能说法可移植性和性能说法的重要证明,但不是已披露客户数索取合同规模、期限、扩张率和参考客户背书意愿。

该地图聚焦经济或战略上重要的公开利益相关方,而不是完整股权结构表或穷尽式客户名单。

[CO013, CO014, CO016, CO017, CO035, CO036]
FO001: 公司里程碑时间线

Modular 从 2022 年创立、2023 年推出 Mojo,走到 2025 年后期融资估值上台阶,并在 2026 年推动 Mojo 1.0 稳定性。

只有年份的里程碑使用当年第一天,以便在公开资料包没有给出准确日期时保持顺序。

[CO001, CO015, CO013, CO016, CO017, CO024]

1.4 里程碑、牵引力主张和主要承销缺口

里程碑曲线显示,公司正从开发者语言发布,成熟为更宽的基础设施平台。Mojo 于 2023 年 5 月 2 日公开发布;到 Modular 宣布本地下载时,公司称已有超过 120,000 名开发者注册,Discord 和 GitHub 上有 19,000 多名活跃用户。到 2025 年 9 月,公司声称平台每月下载量达数万次、GitHub star 超过 24,000、每天服务 trillions of tokens、生态覆盖 100 多个国家,并有 600,000 多行开源代码。路线图也在推进:核心标准库以 Apache 2 with LLVM exceptions 发布,公开 Mojo 网站列出 5 月 7 日稳定版 1.0.0b1 和 6 月 11 日 nightly,26.3 版本称最终 1.0 预计在 2026 年晚些时候发布。产品范围也变宽,Mammoth 面向企业级 serving 推出,围绕 AWS 和 AMD 的合作公告强化了硬件无关论点。最大未解问题不是技术品牌,而是商业证据:公开材料仍不披露收入、准确客户数、完整董事会组成,或开源 Mojo 组件与专有 / 合同约束商业栈之间的长期边界。GitHub 上围绕许可的担忧讨论串并不会打破投资逻辑,但它确实说明开发者信任仍是承销负担的一部分。[CO021, CO022, CO023, CO024, CO025, CO026]

里程碑表
日期事件类型金额 / 估值 / 状态参与方含义
2022Modular 成立,目标是构建统一 AI 基础设施层创立公司成立Chris Lattner、Tim Davis确立公司定位:重写 AI 基础设施,而非单一模型应用。
2022-06完成种子轮融资融资$30M 种子轮已审阅材料中种子轮投资者未完全公开为公开突破前提供初始资本基础。
2023-05-02Mojo 公开发布产品语言发布Modular 开发者社区创造最初切入开发者心智和性能工具的楔子。
2023-08-24宣布 Series B融资$100M;累计融资 $130MGeneral Catalyst、GV、SV Angel、Greylock、Factory 等投资方验证投资者对基础设施逻辑的需求。
2023平台商业化发布扩张公司称发布年份为 2023Modular标志公司从概念公司转向交付平台的供应商。
2025-09-24宣布 Series C融资$250M,估值 $1.6B;累计融资 $380MUSIT、DFJ Growth、GV、General Catalyst、Greylock 等投资方让 Modular 进入后期私有基础设施公司队列。
2025-09-24Mammoth 公开预览和 Platform 25.6 定位公开产品企业级服务与最新平台版本Modular、企业客户、硬件合作伙伴显示公司从语言或运行时扩展到编排和生产级服务。
2026-05-07Mojo 1.0.0b1 在 mojolang.org 上列为稳定版产品完整 1.0 前的 beta 或稳定里程碑Modular、Mojo 社区标志从探索性语言走向更稳定的开发者平台。
2026公开足迹显示四个已披露办公室枢纽扩张San Francisco、Los Altos、Boston、Edinburgh 等办公地点Modular暗示在北美和欧洲有更广的招聘与商业覆盖。
2026开源边界仍是活跃尽调议题反向核心标准库开放;编译器已承诺;商业栈仍由合同约束Modular、外部开发者开发者信任和许可清晰度仍是采用故事的一部分。

已审阅公开来源包没有暴露确切日期时,仅到年份或月份的条目保留时间顺序。

[CO001, CO015, CO021, CO013, CO016, CO017]

1.5 图表

Chapter 02

02市场分析

2.1 市场边界、纳入支出和替代品

不应把 Modular 当作参与所有 AI 软件或所有 GPU 基础设施支出的公司来分析。它自己的产品界面定义了更窄的市场:生产推理基础设施,包括托管共享端点、专用托管端点、自带云部署、自定义模型 serving,以及承诺在 NVIDIA、AMD、CPU 和 Apple Silicon 之间可移植的编译器 / runtime 层。因此,纳入支出是买家为在生产环境中以可接受延迟、可靠性和合规性 serving 模型而分配的预算,再加上调优 kernel、batching 和 routing 所需的工程层。排除支出包括基础模型创建、通用 SaaS copilots、无差异云 IaaS,以及大多数从未进入生产 serving 的一次性实验。替代品集合很宽:专有模型 API、单一厂商 GPU 云、vLLM 或 TensorRT 集成等 wrapper-based 栈、自管 Kubernetes 推理,以及 ONNX Runtime 等可移植 runtime。这个框架重要,因为 Modular 更少是押注某个模型家族,更多是押注可移植性、部署灵活性和推理经济性会成为严肃 AI 运营方的采购标准。[CM001, CM002, CM003, CM004, CM005, CM006]

市场定义表
细分 / 类别纳入支出排除支出买方 / 付款方与 Modular 的相关性
共享推理端点按 token 计价的 API 推理、突发容量,以及面向开放或定制模型的优化支持基础模型研发、通用聊天机器人 SaaS,以及没有服务层的原始云 GPU 预留有用量预算的产品团队或 AI 工程负责人最贴近使用 Modular 托管基础设施、希望快速启动的买方
专用托管推理Modular 托管云中的常驻托管服务、可观测性和定制模型调优与模型服务结果无关的通用云支出有延迟或可靠性预算的平台团队适合从原型走向生产 SLA 的团队
BYOC / 私有推理客户 VPC 内的控制平面、编排和模型服务栈,加工程支持非托管 Kubernetes 人力、无关安全工具,或推理之外的主权云支出使用已承诺云支出的平台、安全或采购负责人与受监管或大型企业买方高度相关
可移植编译器 / 运行时层内核优化、跨加速器可移植性和定制模型编译训练基础设施、模型创建,或一次性本地开发者 notebookML 基础设施或系统工程负责人可证明从 wrapper 型栈切换合理性的差异化层
工作流特定推理面向智能体、语音、代码和多模态服务,围绕延迟、吞吐和硬件组合调优不能归因于服务层的垂直应用收入AI 产品总经理或业务单元负责人重要,因为 Modular 围绕工作流经济性而不是抽象基础设施做市场叙事
现状替代方案以 NVIDIA 为中心的云、自研 API、vLLM/TensorRT wrapper、自管 K8s、基于 ONNX 的可移植栈N/A与上述相同的买方集合这些替代方案争夺同一预算,并定义需求的真实边界。

各行把生产推理基础设施支出,与更广泛的 AI 软件、模型开发和通用云支出分开,避免本章夸大 Modular 的市场。

[CM001, CM002, CM003, CM004, CM005, CM006]
FM001: 市场规模观察框架

从广义推理市场到 Modular 似乎瞄准的更窄生产服务切口,采用三层框架。

金字塔只把相邻已发布市场规模作为外层边界背景;中层和底层是边界判断,并非报告收入数字。

[CM009, CM010, CM011, CM012, CM019, CM041]

2.2 用受证据约束的测算取代单一泛化 TAM

公开资料包支持市场方向,但不支持一个干净、权威的 Modular TAM。第三方发布机构测的是相邻边界。The Business Research Company 估算更宽泛的 AI 基础设施市场在 2026 年为 USD 90.91 billion,Fortune Business Insights 估算 AI 推理市场在 2026 年为 USD 117.80 billion,Technavio 估算 2025 年 AI 推理硬件为 USD 67.80 billion,至 2030 年 CAGR 为 20.8%。这些数字有用,但不能互换,因为它们混合了纯硬件、基础设施层,以及更宽的推理软件加硬件定义。CNCF 和 KubeCon 报道提供了采用视角:Kubernetes 已被广泛用于生产和生成式 AI 推理,说明真实预算正从实验性模型访问,转向生产编排和成本控制。对 Modular 最可辩护的市场测算框架因此是分层的。广义推理和 AI 基础设施估计描述外层 TAM;更近的 SAM 是企业和 AI-native 生产 serving 支出中,买家真正重视硬件可移植性、从专有 API 迁移、BYOC 合规或成本敏感的多模型运营的那一部分。没有内部工作负载、客户或收入分段,公开 SOM 无法支撑。[CM009, CM010, CM011, CM012, CM013, CM014]

TAM/SAM/SOM 或规模测算视角表
发布方 / 视角年份地区数值增长信号方法 / 边界置信度关键限制
The Business Research Company — AI 基础设施2026全球USD 90.91B2025 至 2026 年 CAGR 26.5%广义 AI 基础设施市场,覆盖云、本地和混合环境中的硬件、服务器软件、训练和推理过于宽泛,不能视为 Modular 的直接可服务市场
Fortune Business Insights — AI 推理市场2026全球USD 117.80B至 2034 年 CAGR 12.98%覆盖边缘、云和本地执行已训练 AI/ML 模型的推理市场混合了硬件和软件层,大于纯服务平台切口
Technavio — AI 推理硬件2025 基准 / 2026-2030 预测全球2025 年 USD 67.80B至 2030 年 CAGR 20.8%面向低延迟推理工作负载的专用处理器和部署硬件捕捉的是芯片和硬件支出,多于软件 / 编排支出
CNCF 调研 — 生产基础设施采用2026 发布 / 2025 调研全球受访者基础82% 生产 Kubernetes;66% 的生成式 AI 推理在 K8s 上生产采用已经主流化以编排采用为视角,而非收入采用指标,不是以美元计的 TAM
Forbes KubeCon 报道 — 推理经济视角2026全球 / 企业推理市场预计到 2030 年达 USD 255B;67% 的 AI compute 已流向推理推理占比增速快于训练关注度围绕生产服务经济性的会议 / 报告综合新闻式总结,不是一级市场模型
受约束的 Modular SAM 视角2026 承保视角全球公开信息无法单独拆出取决于生产迁移和可移植性需求企业与 AI 原生服务支出;硬件可移植性、BYOC 控制或 API 迁移在其中重要需要私有客户、工作负载和收入数据才能量化

本表有意保留多个相邻的市场定义,而不是假装存在一个权威的 Modular TAM。

[CM009, CM010, CM011, CM012, CM013, CM015]
FM002: 市场估计区间

2026 年与推理相邻市场规模的低 / 基准 / 高边界视角,同时保留不同发布方衡量层级不同这一事实。

区间是围绕已发布相邻市场定义的示意性括号,不是概率分布,也不是单一调和预测。

[CM009, CM010, CM011, CM015, CM018, CM019]

2.3 买家、用户、付款方和采用路径

Modular 的买家地图比「任何运行模型的人」更细。自助和共享端点界面面向开发者和产品团队,他们想要快速实验、明确 token 经济性,并尽量少做基础设施工作。BYOC 方案不同:它瞄准平台、安全和 ML 基础设施团队,这些团队需要数据留在客户 VPC,想复用云承诺,并且更偏好企业工程支持,而不是内部拼集群。解决方案页面暗示至少三个短期、工作流很重的细分:agent builders、语音团队和 coding-tool 厂商。每个场景里,终端用户体验产品,但经济买家通常是平台负责人、AI 工程经理或采购 / FinOps owner,他们要对延迟、毛利率和供应商风险负责。客户页面进一步拓宽了地图,展示 AWS、AMD、NVIDIA、Inworld 和 Hippocratic AI 等云、硬件和应用伙伴。这个组合说明 Modular 卖的不是通用开发者工具,而是面向有重复推理负载、且高度敏感于基础设施设计的组织的生产 serving 层。[CM008, CM020, CM021, CM022, CM023, CM024]

细分市场 / 买方地图
细分市场买方用户付款方工作流预算负责人采用触发因素
使用共享端点的 AI 原生应用团队AI 产品负责人或工程经理应用开发者和 ML 工程师使用预算 / COGS 负责人快速模型集成、原型开发、突发式生产产品 GM 或工程负责人需要更快上线和可预测的 token 经济性
使用专用托管云的企业平台团队平台工程或 ML 基础设施负责人模型服务和 SRE 团队中央基础设施预算带可观测性和调优能力的常驻生产推理平台或基础设施负责人需要可靠性,但不想自管全栈
受监管或大型企业 BYOC 买方重视安全的平台或采购负责人ML 平台、DevOps 和合规团队已承诺云预算或预留资源在客户 VPC 内推理,并由 Modular 控制平面支持CIO / 平台 VP / 采购数据驻留、合规或云承诺额度利用
语音和实时音频团队AI 产品负责人语音工程师和对延迟敏感的应用团队产品或利润率负责人实时 TTS 和多模态服务产品 GM 或工程总监对延迟敏感,同时希望套利 GPU 成本
编程工具厂商工程领导层推理、IDE 和智能体编排团队基础设施和毛利负责人大规模补全、聊天和智能体循环CTO 或工程 VP巨大的经常性推理负载让硬件灵活性具备经济意义
云或硬件生态伙伴伙伴或平台战略负责人解决方案架构师和伙伴工程团队战略合作预算参考部署、集成和联合销售GM 或联盟负责人需要证明更好的经济性或更广的硬件适配

各行反映 Modular 公开产品和客户页面中最显眼的买方原型;并非覆盖所有未来买方的完整普查。

[CM008, CM020, CM021, CM022, CM023, CM024]
FM003: 买家 / 细分市场地图

矩阵展示 Modular 主要公开细分市场,在预算所有者、用户、证据点和近期就绪度上有何差异。

[CM008, CM021, CM022, CM023, CM024, CM025]

2.4 增长驱动、采用约束和仍缺什么

三个结构性驱动支撑 Modular 所在类别。第一,随着企业把 AI 运营化、云原生团队标准化 Kubernetes、开源 serving 栈把更多工作负载推入生产,推理背景足够大且在增长。第二,可移植工具真实存在:ONNX Runtime、MLIR 和 llm-d 都反映出行业需要能跨多个加速器、部署目标和编排模式的抽象。第三,Modular 自身信息与买家围绕延迟、成本可预测性和合规的痛点一致。约束同样重要。CUDA 的装机基础和生产硬化意味着,许多买家在接受迁移风险之前,会先容忍厂商集中。分析师报告也强调高 capex、集成复杂度、隐私要求和人才短缺。即便 Kubernetes-native 推理,运营成熟度也仍早,每日生产部署比例远低于广泛采用。承销缺口因此不是问题是否存在,而是 Modular 实际能拿下多少市场。公开资料仍不披露客户数、细分组合、共享端点与 BYOC 的量级,独立 benchmark 证据也不足以把公司自报的性能提升转化为干净的 bottom-up SOM。[CM013, CM014, CM017, CM018, CM024, CM025]

增长驱动因素与约束表
驱动因素 / 约束方向时点含义尽调问题
推理市场和基础设施增长增长驱动因素当前 / 多年期生产 AI 支出上升后,庞大的相邻市场给专业化服务层留出空间梳理 Modular 实际能货币化哪些支出,哪些仍属于通用云或模型支出
AI 工作负载的 Kubernetes 标准化增长驱动因素当前生产推理越来越围绕 Kubernetes 原生控制平面和路由来组织测试客户需求有多少真正偏好 K8s 原生技术栈,而不是更简单的托管 API
硬件可移植性和抽象需求增长驱动因素当前 / 多年期ONNX Runtime、MLIR 和 llm-d 都显示,行业想要与加速器中立的服务和编排验证在供应压力逼迫之前,买方是否愿意为可移植性更换供应商
智能体、语音和编程产品中的工作流特定成本压力增长驱动因素当前高调用量和低延迟要求让服务经济性成为战略性预算项要求提供超出伙伴引述的分细分市场毛利率和延迟案例研究
CUDA 锁定和迁移惯性约束当前 / 结构性现有软件栈、库和开发者肌肉记忆会拖慢平台切换量化迁移时间、重新测试负担,以及买方对双栈运营的接受度
GPU 供应稀缺和采购时点约束当前 / 周期性可用算力的获取可能比理论价格性能更重要,从而利好既有厂商判断 Modular 胜出是因为经济性更好、资源获取更好,还是两者兼有
Capex、集成和人才约束约束当前 / 结构性分析师来源认为,前期成本、协同设计复杂度、隐私 / 安全和技能缺口都是真实阻碍评估 Modular 到底降低了多少实施负担,还是只是把负担换了位置
Modular 特定规模的公开证据缺口约束当前没有公开客户数、工作负载组合或 SAM/SOM 披露,承保高度依赖尽调在 NDA 下索取队列、部署模式、留存和基准测试数据

本表有意混合驱动因素和约束,因为同一轮市场扩张既创造需求,也抬高了买方必须跨过的实施和切换门槛。

[CM013, CM014, CM015, CM024, CM025, CM026]
FM004: 采用漏斗或价值链地图

从模型和工作负载需求,流向 Modular 可能的变现点,同时标出主要摩擦点。

[CM017, CM024, CM028, CM029, CM032, CM034]

2.5 图表

Chapter 03

03竞争格局

3.1 版图、直接同行和替代品类别

Modular 不是在和一个单体「推理市场」竞争。它的真实战场分成几类。最直接的运行时同行是 vLLM、SGLang、TensorRT-LLM,以及现在没那么强势的 Hugging Face TGI。这些产品都试图解决同一个即时任务:以较好吞吐、batching 和 API 兼容性 serving 开放权重模型。外围还有 Ray Serve 和 Anyscale 等编排和部署层,买家往往同样在意组合、autoscaling 和 VPC 控制,而不只是 kernel 速度。Together AI 又处在另一类:它卖托管便利、公开定价和 GPU 访问,不要求客户运营运行时。内部自建替代品也重要。ONNX Runtime、llm-d,以及自托管 vLLM 加 Ray 栈,让成熟团队能把架构留在内部。 这种分类影响判断。Modular 的公开材料没有显示一个赢家通吃的引擎市场。它展示的是分层决策树:不同买家可以用开源引擎、托管云、编排平台或自定义栈,解决同一个底层 serving 问题。这让竞争集合比「vLLM 对 MAX」更宽,也抬高了护城河耐久性的门槛,因为 Modular 不仅要打败直接同行,还要打败可接受替代品和既有部署习惯。[CP001, CP006, CP007, CP008, CP009, CP010]

竞争对手画像表
选项类别目标客户产品范围硬件立场分发 / 打包主要限制
Modular MAX / Mammoth直接同类想要可移植性和底层控制的 AI 基础设施团队统一服务框架、内核工具和 Kubernetes 原生控制平面已支持 NVIDIA + AMD 生产环境,并扩展到 Apple 和消费级 GPU开源入口,加上销售驱动的企业 / 云接触公开打包方式和客户规模不如主要托管或既有替代方案标准化
vLLM直接同类自托管广泛开放权重模型集群的团队高吞吐开源服务引擎,覆盖广泛模型和硬件非常广泛的多加速器支持开源自托管,或由另一平台封装托管便利性差异化较弱;客户要承担更多运营
SGLang直接同类处理共享前缀或大型分布式工作负载、对延迟敏感的团队高性能服务框架,带前缀感知和分布式优化覆盖 NVIDIA、AMD、TPU 等的广泛硬件支持开源自托管,并有强生态伙伴公开叙事仍以运行时为中心,而不是开箱即用的企业打包
TensorRT-LLM既有运行时已围绕 NVIDIA 标准化、追求单栈最高吞吐的团队针对 NVIDIA 优化的推理库,集成 Triton 和 Dynamo设计上优先 NVIDIA开源,加上 NVIDIA 深生态的带动NVIDIA 之外的可移植性结构性偏弱
Ray Serve / Anyscale相邻编排器需要组合、自动扩缩容和 BYOC 控制的平台团队框架无关的服务和编排层,可运行其他引擎跨云可移植,而不是跨内核可移植开源 Ray,加上 Anyscale 托管控制选项自身不是最深的内核优化层
Together AI托管替代方案想要立刻获得托管访问和清晰定价的团队无服务器推理、专用端点和 GPU 基础设施托管云抽象,而不是运行时可移植性公开 token 和 GPU 定价,并有专用部署路径买方对底层服务栈的控制较少
TGI传统直接同类已有部署且与 Hugging Face 体系对齐的用户推理工具包,支持批处理、张量并行和 API 兼容已记录多硬件支持开源运行时维护模式状态削弱了未来竞争动能
内部自建(vLLM + Ray / ONNX / llm-d)替代品 / 现状愿意自己拼平台的成熟团队自组装的服务、编排和优化栈取决于所选组件,可能非常可移植除算力和工程时间外,没有额外许可溢价集成负担更高,价值实现更慢

各行聚焦截至 2026-06-13 公开证据中对买方最相关的替代方案,而不是每一个小众推理项目。

[CP006, CP007, CP008, CP009, CP010, CP011]
FP001: 竞争定位图

以面向买家的两个轴——硬件可移植性和运营便利性——对主要选项做序数地图。分数是有证据支撑的方向性判断,不是标准化基准测试。

坐标轴是分析师基于 2026-06-13 的公开文档和套餐证据给出的序位评分。它们表达买方面临的相对取舍,不是标准化基准框架。

[CP008, CP009, CP010, CP011, CP012, CP013]

3.2 能力比较、包装,以及 Modular 真正不同在哪里

从产品实质看,Modular 的论点在可移植性和 kernel 控制重要时最清晰。MAX 被定位为一套可编程栈,覆盖 NVIDIA、AMD 以及现在的 Apple 开发目标上的 serving、模型适配和底层优化。这和 TensorRT-LLM 明显不同,后者明确为 NVIDIA-centric 部署优化;也不同于 Together AI,后者卖的是托管云,而不是可移植运行时。放到熟悉的清单上,它与 vLLM 和 SGLang 的差异没那么大。OpenAI-compatible API、batching、cache 优化和广泛模型 serving 已经是品类标配,而不是 MAX 独有功能。公开第三方证据也收窄了领先主张:Spheron 报告称,在一个 2026 H100 设置里,MAX 能在 dense-model 吞吐上击败 vLLM 和 SGLang;但同一篇评测也说,vLLM 仍是通用生产默认选项,MAX 在 MoE 成熟度、multi-LoRA 和生态集成上仍落后。 包装是另一个真实差距。Together 公开 token 价格、专用端点方案和按小时 GPU 价格。Ray 和 Anyscale 公开了清晰的 BYOC 或 multi-cloud 控制叙事。Modular 的公开界面仍把大买家推向 demo 和企业沟通。这不代表产品弱,但说明面向市场的包装比几个替代方案更不标准、更不透明。对企业买家而言,包装清晰本身就是功能,因为它降低评估摩擦。[CP002, CP003, CP004, CP005, CP016, CP017]

功能 / 能力对比
购买标准ModularvLLMSGLangTensorRT-LLMRay / Anyscale含义
跨供应商加速器可移植性在 NVIDIA 和 AMD 上强,并向 Apple 开发扩展公开资料显示覆盖许多加速器,广度强公开资料显示覆盖许多加速器,广度强NVIDIA 之外偏弱取决于底层运行时,而不是原生内核可移植性是 Modular 最清晰的切入点,但原则上并非独有
广泛模型和生态覆盖在增长,但公开文档中的广度证据较少本组中公开广度最强很强,且在快速扩展在以 NVIDIA 为核心的工作流内强取决于连接的运行时广度优势仍偏向开源既有厂商
OpenAI 兼容 API不是主要公开护城河可作为许多 API 的前端仅有 API 兼容性不能让 Modular 差异化
Adapter / MoE 成熟度公开证据较薄,第三方评测也指出缺口multi-LoRA 和广泛生产支持强multi-LoRA 和大规模部署主张强对 NVIDIA 优化很强,但范围不同交由底层引擎工作负载形态可能把买方推向 vLLM 或 SGLang
组合和多模型编排Mammoth 扩展了叙事,但公开细节有限不是主要价值主张不是主要价值主张不是主要价值主张Ray Serve 和 Anyscale 的核心强项平台团队可能更偏好编排优先工具
托管部署便利性企业和云演示路径通常自托管或由伙伴封装通常自托管或由伙伴封装通常在 NVIDIA 栈内自托管BYOC 控制,不是即开即用的无服务器简便性Together 等类似提供商降低评估摩擦
公开定价透明度没有伙伴封装时低没有伙伴封装时低没有伙伴封装时低企业定价不透明打包透明度是竞争变量,不只是运营细节

单元格总结了 2026-06-13 可获得的最强公开证据;若竞争对手材料无法证明同等能力,对比保持方向性而不是绝对判断。

[CP016, CP017, CP018, CP020, CP027, CP028]
定价 / 打包对比
选项公开定价界面合同模式包含能力未知项 / 切换含义
Modular未找到公开企业目录价开源入口,加上演示 / 企业销售动作MAX 开源框架、托管或企业路径、自定义部署讨论定价不透明增加尽调摩擦,也削弱简单替代销售动作
Together AI 无服务器已公布按 token 定价按量计费的无服务器 API托管模型访问,无需管理基础设施团队快速比较供应商经济性时,容易从这里切入并做基准测试
Together AI 专用基础设施已公布小时目录价,例如 H100 和 B200 报价专用端点或预留 GPU 合同单租户性能和控制,加上托管运营具体目录价让它更容易与内部自建成本模型对比
vLLM 自托管运行时开源,因此没有目录价算力加工程人力覆盖广泛模型和硬件的服务引擎软件层面看起来便宜,但可能隐藏运营负担
SGLang 自托管运行时开源,因此没有目录价算力加工程人力高性能运行时,主打强共享前缀和分布式能力经济取舍取决于内部运营成熟度
TensorRT-LLM 自托管运行时本身没有目录价NVIDIA 栈内的算力加工程人力针对 NVIDIA 优化的服务,并与更广推理工具集成买方已围绕 NVIDIA 标准化时有吸引力
Ray Serve / Anyscale没有简单的公开工作负载价格表开源 Ray 或企业云协议组合、自动扩缩容和 BYOC 控制更适合作为平台支出,而不是按模型计价的服务价格
内部自建除所选组件外没有供应商目录价工程时间加算力从 vLLM、Ray、ONNX Runtime、llm-d 和周边工具拼出的自定义栈可以压低许可支出,但会增加集成和维护负担

在已审阅选项中,只有 Together 公开了丰富的价格界面;大多数其他选项需要内部成本建模或销售接触,因此未知项本身就是竞争叙事的一部分。

[CP019, CP037, CP038, CP041, CP042]
FP002: 功能广度 / 能力图

这张高层能力图按买方关心的维度对比主要选项。单元格只呈现方向性公开证据;未知不等于缺少能力。

这张图把多条主张压缩成方向性强弱标签,方便读者快速看清取舍;详细证据仍放在配套表格和主张引用里。

[CP016, CP017, CP018, CP019, CP020, CP024]

3.3 切换成本、分发力,以及 incumbents 为什么仍强

反驳 Modular 耐久护城河的最强反向证据,不是 MAX 缺少技术价值,而是很多买家不会迁移,除非迁移负担明显值得。CUDA 锁定会靠工具、库、验证工作流,以及先在 NVIDIA 上走「fast path」的实际习惯不断累积。AlphaStreet 2026 年文章引用 NVIDIA 披露的生态规模,强调这种装机基础的深度。NVIDIA 自己的 MGX 材料把故事从软件延伸到伙伴分发、模块化服务器参考设计和全栈系统兼容性。TensorRT-LLM 随后给这套硬件基础配上专用 serving 栈。对保守企业来说,这个 bundle 最好的一面就是无聊:懂它的工程师很多,集成路径熟悉,qualification 负担已经被吸收。 Modular 试图靠可移植性和更好经济性打破这种惯性,但竞争对手生态也能彼此协作。Anyscale 明确表示,用户可以在其平台上扩展 vLLM 和 SGLang。内部自建买家可以在 Ray 下跑 vLLM,或把 llm-d 和 ONNX Runtime 叠进自己的栈。托管买家可以用 Together,而不是运营任何 runtime。这些选项让 multi-homing 现实可行,也降低了 MAX 成为唯一架构默认选项的概率。因此,Modular 的分发挑战至少和技术挑战一样大。[CP020, CP021, CP022, CP030, CP031, CP032]

3.4 护城河耐久性、买家匹配和竞争结论

最可辩护的 Modular 逻辑不是「MAX 到处打败所有人」。更可信的逻辑更窄:某些买家越来越想要一套栈,能快速启动新硬件,保留自定义 kernel 空间,并降低对 CUDA-only 工作流的依赖。对这些客户而言,Modular 的 MAX 加 Mojo 加 Mammoth 一体化故事有差异化,也有实质产品工作支撑。公开材料显示出真实野心和足够第三方验证,可以把这个楔子视为真实存在。但护城河仍是有条件的,而不是已经落定。vLLM 和 SGLang 拿住更多开放推理心智份额。TensorRT-LLM 搭载最深的既有平台。Together 和 Anyscale 简化了那些更看重便利或控制、而非运行时新颖性的买家的采购。内部自建路径仍可信。 实际结果是一个分层市场。当工作负载是 dense-model 推理、买家重视跨厂商可移植性,并愿意采用较新的栈来换取潜在性能或灵活性收益时,MAX 看起来最强。当需求是默认稳妥的 OSS 广度、完全成熟的 MoE 和 adapter 生态、全托管云便利,或严格绑定 NVIDIA 软件和渠道栈时,MAX 看起来较弱。这是有意义但比广泛基础设施赢家叙事更窄的竞争位置;因此,护城河耐久性取决于 Modular 能否在既有厂商吸收更多同类叙事之前,把可移植性楔子转化为可重复客户采用。[CP014, CP015, CP016, CP023, CP024, CP026]

护城河耐久性 / 竞争风险登记表
护城河主张威胁严重性威胁为何真实缓释措施 / 尽调问题
跨供应商可移植性vLLM 和 SGLang 也宣传广泛加速器支持可移植性重要,但竞争运行时已经公开覆盖许多加速器索取真实迁移案例,证明相较开源同类,Modular 上手更快或重新验证负担更低
性能领先第三方胜出取决于工作负载,冷启动取舍仍在Spheron 报告 MAX 在稠密模型上胜出,但也指出首次运行冷启动更慢、MoE 成熟度较弱、生态支持较薄要求在稠密、MoE、延迟敏感和共享前缀工作负载上提供独立、同口径基准测试
集成式全栈控制Ray/Anyscale、Together 和内部自建栈可以把运行时与编排、采购拆开如果能组合出足够可接受的替代方案,许多买方并不需要一家供应商拥有每一层验证 Mammoth 是否真正减少运维人头,还是只是把常见平台功能重新打包
降低供应商锁定CUDA 锁定和 NVIDIA 渠道权力可能压过可移植性的经济收益迁移成本包括验证、工具链,以及拿到稀缺、可投产算力的能力在真实客户工作负载上测试 Modular 能否显著降低切换时间或 TCO
开源可信度vLLM 和 SGLang 目前在开放推理里的声量更高心智份额会带动集成、第三方支持和买方信心不只看 star,还要跟踪贡献速度、合作伙伴封装和具名生产案例
销售驱动的企业切入点托管替代方案公开的价格更清楚,试用入口也更容易打包方式不透明,会拖慢替换托管竞品的交易要求提供标准化价格区间、迁移优惠和投产周期参考

这张清单抓住主要公开护城河主张,以及最可能削弱这些主张的公开证据;由于拿不到私有客户证据,它是方向性判断,不是穷尽列表。

[CP016, CP021, CP023, CP024, CP030, CP033]
FP003: 护城河 / 就绪度 KPI

第 3 章最关乎 Modular 竞争位置的维度,用一张紧凑计分卡呈现。

[CP016, CP023, CP024, CP030, CP033, CP034]

3.5 图表

Chapter 04

04财务情况

4.1 变现界面和公开定价真正说明什么

Modular 的公开商业栈在包装层面异常清晰,尽管已实现经济性仍不透明。公司保留免费的自托管社区版,明显是开发者获客漏斗,而不是直接收入来源。付费变现随后分成三个主要界面:按 token 计价的共享端点、Modular 自有云内按分钟计价的专用端点,以及让推理留在客户环境内、按分钟计价的 BYOC 部署。公司还叠加自定义模型工作、自定义 kernel 和 forward-deployed engineers,因此付费产品不只是「租 GPU」,而是软件加服务模型。真正有用的是,Modular 公开了共享端点的实际 token 标价,也公开了专用和 BYOC 的计费基础。定价界面没揭示的同样重要:公开页面仍不展示分钟费率卡、典型企业折扣、渠道费用或已实现毛利率。因此读者应把定价页视为标价机制,而不是底层收入质量的证据。[CI001, CI002, CI003, CI004, CI005, CI006]

收入来源表
收入流机制计费单位公开证明收入质量判断尽调要求
社区版 / 自托管MAX + Mojo 按社区许可证免费分发免费定价页和 MAX 页面显示不收使用费漏斗证据强,但没有直接收入证据需要免费转付费转化率、激活率和企业交接率
共享端点Modular cloud 托管的开放模型 API$/1M tokens定价页公布模型级标价和缩容至零条款公开价格透明度最好,但实际折扣和毛利率未知需要按模型家族拆分的混合实际 ASP、利用率和毛利率
专用端点Modular cloud 中带工程师支持的预留热容量$/minute专用端点页面说明按分钟计费和预留容量更适合可预测的企业支出,但未公开费率表需要实际分钟费率、最低承诺额和单账户平均预留容量
BYOC / Your Cloud控制平面和工程师叠加在客户自有基础设施上$/minute(已部署)BYOC 页面说明客户云积分和承诺仍然适用可能接近软件收入确认,但净抽成率不透明需要按 BYOC 账户拆分的确认收入与转嫁云支出
定制模型 / 定制 kernel性能工程、专有模型部署和定制 kernel 工作合同 / 项目 + 经常性平台使用Custom Models 和 MAX 页面描述高级技术服务ACV 可能高且粘性强,但经常性收入和项目收入的组合未知需要服务与平台收入拆分,以及经常性部署的附加率
合作伙伴 / marketplace 渠道通过 AWS Marketplace 和云服务商关系采购与部署Marketplace 采购 + 收入分成 / 支持AWS Marketplace 公告和 Reuters 都描述了渠道动作可能加速预订额,但渠道费用会稀释净实现需要 marketplace 费用栈、收入分成比例,以及直销与渠道预订额组合

各行把公开打包方式和隐含经济性拆开。计费机制可见;实际合同费率、渠道费用和收入确认细节仍是私有信息。

[CI001, CI002, CI003, CI004, CI005, CI011]
定价 / 变现表
产品公开标价 / 合同依据包含内容可能变现的内容不透明 / 未知主要来源
自托管社区版永久免费MAX + Mojo、社区支持、自行部署开发者采用和未来企业管线转化率和支持负担定价页
共享端点按 token 标价;样本行中,输入价格从 $0.10 到 $1.74,输出价格从 $0.50 到 $4.30,均按每 1M tokens 计托管 API 访问、自动扩缩、可观测性、Modular 托管基础设施经常性用量收入实际折扣、模型组合和按工作负载拆分的利润率定价页
专用端点预留热容量按分钟计费专用 GPU、支持、前置部署工程师已承诺或经常性企业用量实际分钟费率、最低承诺额和 SLA 定价Dedicated Endpoints + 定价页
BYOC / Your Cloud已部署容量按分钟计费;客户使用自己的云积分 / 承诺控制平面、部署自动化、工程支持、VPC 驻留客户云支出之上的软件 / 平台费加服务费收入确认依据、合作伙伴成本和支持强度Your Cloud + 定价页
用量 / 承诺使用定制承诺使用和用量定价更大规模付费部署可获得折扣更高 ACV,并可能拉长合同折扣表和锁定机制定价 FAQ
AWS Marketplace 渠道Marketplace 采购路径加集中 AWS 账单Marketplace 采购、支持包和云账户购买路径渠道来源预订额和收入分成Marketplace 费用,以及由该渠道贡献的业务占比AWS Marketplace 公告 + AWS 案例研究

这张表刻意只看定价机制,不看实际经济性。公开资料说明产品如何销售,但不能说明扣除折扣、积分或渠道费之后的净有效费率。

[CI006, CI007, CI008, CI009, CI010, CI011]
FI001: 收入模型桥

从免费开发者采用流向付费触点,Modular 可在这些触点上把软件、服务和渠道采购变现。

[CI001, CI002, CI003, CI004, CI005, CI015]

4.2 GTM 动作、渠道证据和牵引力代理指标

Go-to-market 图景比财务披露更可信。Modular 的公开界面暗示典型 land-and-expand 动作:免费 MAX 和社区工具吸引开发者,共享端点降低试用门槛,一旦可靠性、合规或成本控制变重要,专用或 BYOC 部署就成为付费路径。Reuters 增加了一个重要细节:公司计划直接向企业销售,也通过与云提供商的收入分成伙伴关系销售。AWS 合作和 AWS Marketplace 材料强化了这种解读,因为它们显示了通过 AWS 账户集中采购、支持包装,以及不止一个推理端点的至少两个 Marketplace 应用。公开验证混合但真实。Modular 点名 Inworld、AWS、NVIDIA、AMD 和 Hippocratic AI 等客户和伙伴,并称其生态现在覆盖每月数万下载、每天 trillions of tokens,以及 100 多个国家的开发者。这些是有用的牵引力代理指标,但仍只是代理:它们不披露有多少付费客户,bookings 如何在直销和渠道之间拆分,或开发者兴趣能否转化为耐久企业收入。[CI016, CI017, CI018, CI019, CI020, CI021]

FI003: 财务估计区间

公开材料能支撑标价、客户节省主张和资本基础的区间,但不能支撑收入或现金跑道区间。

这张图有意不假装能用公开证据给收入、烧钱速度或现金跑道划区间。能支撑的只有公开标价、公司策划的节省主张和累计融资。

[CI008, CI009, CI010, CI022, CI028, CI029]

4.3 单位经济性、成本结构和公开证据边界

公开证据足以勾勒单位经济性模型的形状,但不足以计算。正向一面,Modular 反复讲同一个经济性故事:跨 NVIDIA 和 AMD 的硬件可移植性让客户追逐更好的 price-performance,BYOC 让客户使用自己的云 credits 和 commitments,MAX 的编译器加 kernel 栈应能提升吞吐,同时降低延迟和冷启动开销。Inworld 引述提供了一个具体但由公司筛选的证据点,声称 time-to-first-audio 大约快 70%,最终价格可能比 vanilla vLLM 路径低约 60%。但这些都没有揭示 Modular 自身的已实现毛利率。Forward-deployed engineers、自定义 kernel、支持和优化都会增加服务成本;按分钟计价的专用或 BYOC 合同,可能只有在利用率保持高位且支持强度受控时才有吸引力。核心尽调结论是,标价和客户轶事显示价值可能存在的位置,而不是证明公司已经以健康毛利率、高效销售回本和耐久留存捕获这些价值。[CI022, CI031, CI032, CI033, CI034, CI035]

单位经济性表
指标公开数值 / 状态置信度重要性可见驱动因素尽调要求
收入 / ARR未公开披露决定牵引力代理指标能否转化为真实商业规模只有下载量、tokens 和具名 logo 等间接代理要求提供最新月收入、ARR 和产品组合
按产品表面拆分的毛利率未公开披露核心问题是可移植性和服务能否拼出有吸引力的软件经济性GPU 成本、利用率、batching、支持和云支出转嫁要求按共享、专用、BYOC、服务和渠道拆分毛利率
实际折扣率未公开披露如果企业折扣很重,标价会高估变现能力已提到承诺使用定价和用量折扣,但未量化要求按细分市场和部署模式拆分平均折扣
支持 / 工程强度明显重要,但未量化前置部署工程师能提高 ACV,也会挤压贡献毛利嵌入式工程师、定制 kernel、高级支持、专业服务要求提供每账户支持小时数和工程师配置
客户 ROI 证明只有选择性正面轶事有助于销售,但不能替代 Modular 自身利润率数据Inworld 引述、AWS 成本 / 性能叙事、可移植性叙事要求提供独立的客户前后利润率和利用率研究
GPU / 云成本杠杆方向上正面,但 Modular 自身未量化可移植性是这条投资判断背后的核心经济楔子NVIDIA/AMD 切换、云积分、运行时效率、batching要求按硬件类别提供利用率和每 token 成本
CAC / 回本期未公开披露需要用它判断 GTM 扩张是否高效只有员工数增长和 GTM 招聘这些间接信号要求提供销售效率看板和按细分市场拆分的回本期
NRR / 流失未公开披露基础设施软件里,经常性质量比一次性试点更重要没有公开 cohort 或续约数据要求按产品表面提供 cohort 留存和 gross/logo churn
客户集中度未公开披露少数大客户或云伙伴可能扭曲早期收入质量具名客户公开,但收入集中度不公开要求提供前 10 大客户收入占比和伙伴依赖度

公开资料不足以支撑可信指标的地方,留空是刻意处理。这张表区分可见经济驱动因素和实际测得的单位经济性。

[CI022, CI031, CI032, CI033, CI034, CI035]
公开财务缺口表
缺失项重要性当前公开状态精确尽调路径严重性
收入 / ARR需要把牵引力代理指标转化为真实商业规模未找到权威公开数字获取按产品表面拆分的月度经常性收入、非经常性收入和 ARR bridge阻断性
现金、burn 和 runway这是判断资金依赖度的核心未找到权威公开数字获取 treasury 余额、burn bridge 和董事会 runway 情景阻断性
按部署模式拆分的毛利率核心问题是软件质量能否压过基础设施拖累未发现公开毛利率披露获取共享、专用、BYOC 和服务的毛利率 waterfall重大
客户集中度和合同期限检验收入耐久性和续约风险具名 logo 公开;集中度不公开获取头部客户集中度、ACV、期限和续约日程重大
Marketplace / 云收入分成经济性如果费用栈很高,渠道增长会稀释净实现Marketplace 动作公开;经济性不公开获取费用表、收入分成条款和伙伴来源预订额拆分重大
销售效率指标需要判断 GTM 扩张是否克制未发现 CAC、回本期或 NRR 披露获取按细分市场拆分的 CAC、回本期、pipeline 转化和 NRR重大
利用率和支持负载决定按分钟和 token 计价的产品表面能否盈利扩张公开资料只有方向性效率主张获取 GPU 利用率、每 token 成本和工程师 / 账户比重大

这张表点名缺失的具体私有证据;拿到这些证据后,本章才能从设计层面的分析变成可承保的财务分析。

[CI031, CI032, CI034, CI035, CI044]
FI002: 单位经济模型桥

虽然公司不披露最终指标,这条定性流程展示了大概率决定 Modular 毛利结果的主要输入。

这条桥是定性的,因为公开来源披露了驱动因素,但没有披露毛利率、CAC 或回本周期等输出指标。

[CI022, CI032, CI033, CI034, CI035, CI045]

4.4 资本充足性、融资依赖和财务结论

Modular 的资本基础是真实的,但公开证据仍无法支撑精确跑道判断。公司在种子轮、Series B 和 Series C 中合计融资约 $380 million,最新一轮估值约 $1.6 billion。公开报道还称,2025 年融资将用于工程和 go-to-market 扩张,同时推动公司从推理进入训练。这很重要,因为一个软件主导的推理平台,在依赖 BYOC、伙伴云和 marketplace 渠道时,可以保持相对 asset-light;但更深进入训练,或更重地持有基础设施,都可能显著提高资本强度。最干净的可比警示来自 CoreWeave 的 S-1/A:AI 基础设施的爆炸式收入增长,可以和大额净亏损、重大债务、可观 capex 需求以及客户集中并存。反向竞争背景也指向同一方向:NVIDIA 的 CUDA 锁定、MGX 生态和一体化平台 bundling 提高迁移摩擦,并可能限制替代栈把兴趣转化为盈利性经常性支出的速度。因此结论是,Modular 作为软件加服务平台在财务上看起来有前景,但作为可承销企业仍受证据限制,因为收入质量、毛利结构和现金跑道仍是私有信息。[CI025, CI026, CI027, CI028, CI029, CI030]

资本充足性表
项目公开证据置信度含义尽调要求
累计融资额种子轮、Series B 和 Series C 合计 $380M软件主导的推理平台有了有分量的资本基础要求提供完全稀释后 cap table 和剩余 primary cash
最新融资2025 年 9 月以约 $1.6B 估值完成 $250M Series C提供融资可信度,也给 2025 年之后的投入留下空间要求提供交割后现金余额和投资人权利
当前规模代理公开报道约 130 名员工 / 超过 130 人暗示已有真实运营规模,但固定成本基数也比早期创业公司更大要求提供部门人头和招聘计划
资金用途扩大工程和 GTM,并从推理推进到训练进入训练可能显著推高算力和人才需求要求提供 24 个月投资计划,以及训练扩张的阶段门
手头现金未公开披露无法直接估算 runway要求提供最新现金和有价证券余额
Burn / runway未公开披露只靠公开数据无法承保下一轮时点和下行情境韧性要求提供 gross burn、net burn,以及基准 / 下行情境 runway
债务 / 项目融资义务在已审阅资料中未找到 Modular 公开债务栈可能是真实优势,也可能只是披露缺口要求提供债务明细、租赁和云承诺负债
战略变化下的资产负债表敏感性如果 Modular 拥有更多基础设施或激进扩展训练,可能会上升路线图选择可能让公司从软件型经济性转向资本更重的经济性要求提供轻资产与较重资产扩张路径的情景分析

历史融资时间线只在判断未来资本充足性所需范围内引用。现金、burn、债务和 runway 这些缺失项,是承保的主要障碍。

[CI025, CI026, CI027, CI028, CI029, CI030]
FI004: 资本强度 / 现金流图

这张矩阵展示当前资产负债表负担落在哪里,以及 Modular 若调整战略姿态,负担可能在哪些地方上升。

方向性标签反映资产负担似乎落在哪里,不是量化的 Modular 损益表。加入可比行,是为了框定战略若转向更重基础设施所有权可能发生什么。

[CI017, CI018, CI030, CI036, CI037, CI038]

4.5 图表

Chapter 05

05产品与技术

5.1 平台地图和客户侧工作流

Modular 面向客户的产品已经不只是「一门编程语言」或「一个推理引擎」。公开界面现在可以拆成四个相互连接的层。第一,MAX 是 serving 和模型执行框架:它暴露 OpenAI-compatible endpoint,可通过 CLI 或 Docker 自托管运行,并给开发者一条类似 PyTorch 的路径来做自定义模型和自定义 ops。第二,Mammoth 是 scale-out 编排层:一个 Kubernetes-native 控制平面,服务那些需要在异构 GPU 集群中放置多个模型,并自动平衡性能和成本的组织。第三,Mojo 是栈底部面向 kernel 的语言。Modular 把它呈现为开发者扩展 MAX、编写 hardware-agnostic GPU kernel,并在 NVIDIA、AMD、Apple 和 CPU 之间保持可移植性的方式。第四,Modular 把软件包在几个部署界面里——自托管端点、托管 serverless 或专用端点,以及把推理流量留在客户 VPC 的 bring-your-own-cloud 选项。 放到客户工作流里,架构很直观,尽管实现很有野心。团队先选择一个受支持模型,或把相邻的 Hugging Face 架构移植到 MAX;随后把模型放在 OpenAI-compatible API 后 serving,再选择把端点留在本地、迁入 Modular 托管云,或采用 VPC-resident 部署。如果工作负载变大、多模型化或异构化,Mammoth 就是下一层,用来协调模型放置和分布式推理。这个顺序很重要,因为它让产品变得可理解:MAX 是执行层,Mammoth 是集群管理层,Mojo 是可扩展性层。最佳证据支持一张真实模块地图,而不是营销伞面,尽管社区 / 开放入口和合同约束商业使用之间的边界仍需尽调。[CE001, CE002, CE003, CE004, CE005, CE007]

产品模块 / 资产矩阵
模块 / 资产主要用户状态 / 成熟度差异化尽调缺口
MAX serving 框架推理工程师和平台团队已公开发布;文档、PyPI 包、GitHub repo 和 release 分支都活跃兼容 OpenAI 的 serving,加上跨供应商可移植性和定制 kernel 可扩展性需要客户级证据,证明生产 uptime 以及从现有 stack 迁移的摩擦
MAX 定制模型工作流调整 Hugging Face checkpoints 的模型开发者已有公开文档,包含参考架构和 weight-adapter 工作流让团队复用现有架构,只覆盖不同的 graph 片段需要证明非平凡架构需要比文档暗示更深重写的频率
Mammoth 编排层在混合 GPU 集群上运行大量模型的企业 AI 基础设施团队公开预览Kubernetes-native 控制平面、多模型编排,以及异构硬件上的解耦推理需要 GA 时间、客户参考和大集群运营的独立证明
托管云希望由 Modular 运营生产推理的团队已公开提供 serverless、专用、定制模型和 batch 模式从 kernel 到 cloud 的优化,加上前置部署工程支持公开 SLA 细节、认证证据和按产品表面拆分的可靠性指标仍偏薄
自带云有既有云承诺的受监管或安全敏感买方已公开提供数据平面留在客户 VPC 内,同时保留 Modular 控制平面工具和 GPU 可移植性控制平面边界、遥测和安全审查负担需要采购尽调
Mojo 语言Kernel 开发者和高级系统程序员1.0 beta;更广路线图仍在推进类 Python 语法,具备编译期元编程、硬件调度和可移植 kernel 编写能力需要确认最终 1.0 时间线,并厘清 beta 之后的编译器治理
社区和渠道表面开发者、评估者和企业买方活跃但仍在成熟GitHub、PyPI、Meetup、Discord、YouTube 和 AWS Marketplace 提供多条获客路径主流故障排查和独立生态广度仍落后于更老的开源对手

各行把执行层产品与编排、部署、语言和开发者获客表面分开,因为 Modular 现在销售的是一个 stack,而不是单一 runtime。

[CE001, CE003, CE007, CE012, CE014, CE024]
工作流 / 用例表
用户任务当前工作流Modular 方案可量化收益局限
快速启动标准开放模型拉取 Hugging Face 模型,搭建端点,接入 OpenAI 客户端max serve 或 Docker 启动兼容 OpenAI 的端点代码改动最少,自托管验证推进快收益在落地速度,不等于证明企业级耐久性
移植自定义或相邻架构手动适配配置字段、checkpoint 名称和自定义层MAX 参考架构,加上 arch.py、model_config.py、model.py 和 weight_adapters.py 工作流复用既有计算图和内核,不必从零搭建服务栈深度创新架构可能仍需要新的图组件
提升重复 prompt 工作负载吞吐服务重复系统 prompt 或长对话时,KV-cache 重复计算冗余Prefix caching 通过 PagedAttention 默认启用prefix 重复时,TTFT 更低,有效吞吐更好唯一 prompt 或解码主导型工作负载收益有限
提高受支持模型的 token 生成效率目标模型逐步运行,每个 token 都承担完整验证成本用 EAGLE、EAGLE3、MTP 或独立 draft model 做 speculative decoding每步可接受多个 token,计算利用率更高启用 speculative decoding 后,不支持 structured output 和 echo
在应用工作流中强制 schema 安全响应在下游 Python 或中间件解析自由格式模型文本借助 llguidance、JSON schema 或 Pydantic 实现 structured output下游系统拿到可预测的输出契约目前仅支持 GPU;仍需仔细测试,因为模型训练本身仍然关键
运行大规模、多模型生产集群手动把模型放到不同 GPU 类型上,并手工处理扩缩容Mammoth 控制平面提供模型放置、自动扩缩容和解耦式推理混合集群里的硬件利用率和多模型编排更好公开证据主要是公司撰写的预览材料,还不是广泛一线验证

各行刻意按真实买家任务组织,而不是按产品品牌组织,让工作流表始终锚定团队想用这套栈完成什么。

[CE002, CE005, CE009, CE010, CE014, CE017]
FE001: 产品架构图

Modular 的公开技术栈从托管或驻留 VPC 的部署触点一路向下,经过 MAX 服务和模型图,落到 Mojo 内核与异构硬件目标。

这套技术栈由产品页、文档和发布说明综合而来,不是从某一张厂商系统图复制。

[CE001, CE002, CE003, CE007, CE012, CE013]
FE002: 客户工作流 / 运营流程

典型 Modular 工作流从选择或适配模型开始,把模型放到 MAX API 后面提供服务,再按工作负载复杂度扩展到托管云、BYOC 或 Mammoth。

这条流程强调客户动作点,而不是内部调度器的每一步。

[CE002, CE003, CE014, CE017, CE020, CE022]

5.2 架构、部署模型和这套栈如何实际运行

Modular 解释 MAX 如何组织模型和 serving 内部结构时,技术故事最强。公开文档显示,MAX 把模型支持视为一组架构包:它们定义计算图、类型化 config、权重 adapter,以及把 Hugging Face checkpoint 映射到 MAX graph 格式所需的任何自定义层。这不只是浅层 wrapper:平台声称提供硬件优化 kernel、生产 batching、KV-cache 管理和多 GPU 分布,而无需用户从零重建 serving 层。Runtime 优化界面也很具体。MAX 文档把 speculative decoding、prefix caching 和 structured output 列为一等 serving 功能,并明确限制,例如 speculative decoding 不兼容 structured output。文档还说明 prefix caching 默认开启,structured output 目前仅支持 GPU。 部署架构同样具体。Modular 托管云提供 serverless、dedicated、custom-model 和 batch-inference 模式。Bring-your-own-cloud 选项把 data plane 留在客户 VPC 内,同时把端点生命周期、扩缩容策略、监控和模型注册交给 Modular 运营的 control plane。这个拆分对有数据驻留要求的团队有吸引力,但也是企业买家必须接受的真实治理边界。Modular 还用 forward-deployed engineering support 和明确承诺强化托管服务姿态:调优吞吐、延迟,甚至自定义 Mojo kernel。换句话说,产品不只是可下载 runtime,而是软件加专家运营的组合,运营模型横跨 graph compilation、kernel specialization、部署策略和人工调优支持。[CE014, CE015, CE016, CE017, CE018, CE019]

技术 / 运营架构表
层 / 组件角色依赖风险
Hugging Face / 模型架构映射提供 checkpoint、配置元数据,以及 MAX 适配的源模型家族依赖 MAX 参考架构和权重适配器持续跟上新架构或快速演进架构可能拉长 bring-up 时间
MAX 图与模型层构建类型化配置、计算图、量化设置和多 GPU 执行计划依赖 arch.py、model.py、model_config.py 等架构包不受支持的图差异可能迫使团队做定制工程
服务运行时暴露兼容 OpenAI 的端点、批处理、KV-cache 管理和运行时功能依赖图编译、缓存格式和端点标志功能组合有明确限制,例如 speculative decoding 与 structured output 不能并用
Mojo 内核层实现可移植 GPU 和 CPU 内核,并支持 custom ops 扩展依赖 Mojo 语言成熟度,以及编译器在不同目标上的行为封闭编译器治理仍是可审计工具链的尽调问题
部署控制平面处理端点生命周期、扩缩容、可观测性;在 Mammoth 场景下还处理工作负载放置即使采用 BYOC 模式,也依赖 Modular 运营的控制服务相比纯自托管,客户控制力下降,受监管买家尤其敏感
人工支持层前置部署工程师为企业部署调优工作负载并编写自定义内核依赖服务产能和 Modular 自身工程带宽经济与运营扩展性可能弱于纯软件毛利率暗示的水平

这张架构表同时列出软件组件和运营模式,因为 Modular 的企业产品交付包含专家服务。

[CE014, CE015, CE017, CE019, CE022, CE025]
FE003: 关键依赖图

Modular 的执行栈依赖外部模型生态、Modular 运营的控制服务和硬件厂商,尽管它试图降低对任何单一加速器栈的依赖。

这张图聚焦运营依赖,而不是所有权或排他合同。

[CE014, CE025, CE026, CE038, CE043, CE046]

5.3 差异化、路线图和开发者界面的强度

Modular 最清晰的差异化主张不只是速度,而是可移植性能。公司反复强调,同一套 MAX 和 Mojo 代码可以在 NVIDIA、AMD 和 Apple 硬件之间迁移,而不继承 CUDA 锁定;公开证据比泛泛的「write once, run anywhere」口号更具体。25.6、AMD 合作和 MI355 bring-up 材料显示,公司围绕快速硬件启用、公开 benchmark scripts,以及一种可专门化组件但无需重写整个 kernel 的 kernel 架构来锚定叙事。Structured-kernels 系列尤其有说明力,因为它把可移植性描述为一种软件架构属性:通用 kernel control flow,加上硬件特定的 TileIO、TilePipeline 和 TileOp 组件。如果实践中成立,这是整套栈里最有意义的产品楔子。 路线图也显得活跃,而不是静态。MAX 的 Python API 在 26.1 中脱离 experimental,加入 eager mode 和面向生产的 model.compile。Mojo 从「未来语言」故事走向真实 1.0 进程:path-to-1.0 文章设定稳定性目标,26.3 宣布 beta、2026 年晚些时候 finalization 目标,以及新的独立 Mojo 网站。开发者界面真实存在,但仍不均衡。GitHub 显示稳定和 nightly 的发布纪律、外部贡献、社区会议和大型开放 repository;PyPI 用标准 Python packaging 分发 modular package;Meetup、Discord 和 YouTube 给项目提供可见社区界面。与此同时,主流故障排查足迹仍早:抓取时 Stack Overflow 的 mojo-lang tag 有零个问题,独立评测仍把 MAX 描述为有前景但生态广度窄于 vLLM。结果是一个可信但仍在成熟中的开发者护城河。[CE028, CE029, CE030, CE031, CE032, CE033]

路线图 / 发布 / 开发阶段表
日期 / 阶段功能 / 里程碑状态含义来源
2025-06通过 Modular 合作实现 AMD GPU 全面可用已发布可移植叙事从「只支持 NVIDIA」的认知推进到真实 AMD 生产支持Modular + AMD 博客
2025-09Modular 25.6 增加 B200、MI355X、Apple Silicon 支持、pip install mojo 和 benchmark scripts已发布强化硬件可移植切入口,并降低开发者设置摩擦25.6 发布博客
2025-12公布 Mojo 1.0 路径已公布释放信号:语言从实验性高速迭代转向兼容性预期Path to Mojo 1.0 博客
2026-01Modular 26.1 让 MAX Python API 和 model.compile() 毕业已发布强化将 PyTorch 训练模型移植到生产 MAX 图的叙事26.1 发布博客
2026-04结构化内核可移植性系列展示跨 NVIDIA 和 AMD 的专门化能力已发布 / 工程证明表明内核可移植性正变成架构纪律,而不是一次性 benchmark 技巧Structured kernels 第 4 篇
2026-05Modular 26.3 推出 Mojo 1.0 beta 和 MAX 视频生成Beta / 已发布混合产品宽度继续扩张,语言稳定性也接近正式 1.0 线26.3 发布博客和 GitHub releases
2026 (forward)Mammoth 走向托管端点;最终 Mojo 1.0 在年内推出路线图 / 预览最重要的成熟度跃迁仍在前方,尤其是编排和编译器治理2025 年度回顾和 26.3 博客

日期基于发布文章和版本产物内嵌的发布时间;前瞻行仍是路线图主张,而不是已交付证明。

[CE028, CE030, CE033, CE035, CE036, CE037]
FE004: 产品成熟度 / 能力图

公开证据对 MAX 服务、可移植性主张和开发者工具的支撑最强;对安全认证、主流生态深度和 Mammoth 实地成熟度的支撑较弱。

这张矩阵只反映已审阅公开来源包能支撑的内容。

[CE017, CE024, CE025, CE034, CE035, CE038]

5.4 信任、治理和仍未解决的产品风险

Modular 确实有可见的信任控制,但公开资料包在政策上强于 attestations。隐私政策描述技术和组织保障,并映射到 GDPR 和 CPRA 风格权利。Report-issue 页面把隐私、安全和 security 关切导向专门 security team。Acceptable Use Policy 明确覆盖 MAX Platform、Modular Cloud 和 AI-powered features,并要求法律、医疗和金融建议用例有人类 review。这些都是有意义的控制。BYOC 模型也一样,它把推理流量留在客户 VPC 内。对于主要想确认公司已经考虑隐私、误用和事件入口的买家,基础项是存在的。 但尽调缺口仍然重大。本次审阅的公开材料没有浮现 SOC 2 报告、ISO 27001 证书、公开 uptime 承诺或详细安全架构白皮书。法律结构也引入治理摩擦。Modular 已经开源 MAX 和 Mojo 的大部分内容,但 Community License 仍受合同约束,允许使用 telemetry,限制逆向工程和 standalone redistribution,并要求自定义硬件在受支持目标之外使用时获得批准。独立评论把更大的风险说清楚:Mojo 标准库可能开源,但 MAX 编译器仍闭源,对一些企业仍是合规和可审计性担忧。产品结论:Modular 看起来技术上有差异化,且方向上具备企业意识;但风险敏感买家仍应把认证、SLA 证据、编译器治理,以及 preview-to-GA 过渡视为开放尽调项,而不是已解决问题。[CE025, CE043, CE044, CE045, CE046, CE047]

信任 / 质量 / 合规表
控制 / 信号状态范围缺口
隐私政策公开且最新覆盖网站和平台数据处理、GDPR/CPRA 权利及安全措施政策层面描述了控制措施,但不是独立认证
安全 / 安全性报告入口公开且最新为安全性、隐私和安全问题提供专门问题报告表单已审阅材料未显示公开披露时间表或 bug bounty 细节
Acceptable AI Use Policy公开且最新约束 MAX Platform、Modular Cloud 和 AI 驱动功能;对敏感建议类用例增加人工审核要求政策文本已经存在,但公开材料未深入描述执行证据
BYOC VPC 数据平面隔离公开文档已说明推理流量留在客户基础设施内,Modular 运行控制服务仍需审查控制平面访问、遥测和运营边界
社区许可证与条款公开且最新定义再分发、自定义硬件审批、遥测和逆向工程限制由合同约束的 SDK 使用限制了部分企业买家需要的开放性
独立合规证明已审阅来源未公开显示通常应包括认证、正常运行时间承诺或外部安全证明来源材料中未发现公开 SOC 2、ISO 27001 或详细安全架构材料

这张表区分「有政策」和「有独立保证」,因为 Modular 已审阅的公开信任界面文档丰富,但证明偏少。

[CE025, CE043, CE044, CE045, CE046, CE047]

5.5 图表

Chapter 06

06客户情况

6.1 客户地图:Modular 先卖给开发者,但通过托管和重视合规的生产买家变现

Modular 没有一个公开客户原型。免费的 Self Hosted 版本和开源 MAX repo,显然是为了吸引想在没有前期支出的情况下测试开放模型推理的开发者和平台工程师。变现从开发者兴趣转化为生产流量时开始:Shared Endpoints 面向实验和可变负载生产,按 token 付费;Dedicated Endpoints 面向 latency-sensitive 生产,使用预留 warm capacity;BYOC 面向重视安全或合规的团队,他们希望推理留在自己的云或 on-prem 环境里。这意味着买家、用户和付款方经常分离。开发者可能启动评估,但在 Dedicated 和 BYOC 界面上,平台、基础设施、安全或财务 owner 才是真正预算持有人。公开记录还显示第二层商业关系:AWS 和 SF Compute 等渠道和生态合作方。即便它们不是最终 end-customer 工作负载 owner,也会塑造采购和部署路径。[CU001, CU002, CU003, CU004, CU005, CU006]

客户分层表
分层买家 / 用户 / 付款方具名证明用例收入 / 战略价值主要缺口
免费自助开发者开发者和平台工程师评估;入口阶段没有单独付款方Self Hosted edition、MAX 仓库、社区会议试用开放模型服务、benchmark、早期集成顶层漏斗采用和未来企业管线免费使用向付费账户转化情况未披露
托管云实验者应用团队和平台工程师使用 Shared Endpoints;预算通常在工程或产品团队Shared Endpoints 页面可变流量原型和早期生产按 token 计价的落地动作,采购摩擦低未公开账户数或转化率
延迟敏感型生产买家基础设施或平台负责人付款;开发者和 ML 团队是用户Dedicated Endpoints 页面面向生产工作负载的暖启动预留推理ACV 更高的托管生产界面未公开分钟费率卡、合同期限或续约历史
合规敏感型企业买家安全、平台或采购团队付款;应用团队和运营人员使用服务BYOC / Your Cloud 页面在客户 VPC 或本地部署中推理,搭配 Modular 控制平面和工程师最适合受监管或数据敏感工作负载未披露具名 BYOC 客户或 Fortune 500 账户
AI 原生工作负载运营方产品和基础设施团队付款;终端用户是应用客户或患者Inworld 和 Hippocratic AI实时语音和大模型推理最强公开终端客户证明,并带有量化结果证明集中在少数具名账户
渠道 / 云交易对手云或 marketplace 交易对手打通采购;终端买家可能是 AWS 客户或批量推理买家AWS 和 SF ComputeMarketplace 采购、渠道包装、批量推理分发扩大触达,不要求 Modular 直接获取每个账户不等于直接客户广度已经多元化

各行拆开开发者采用、直接企业变现和合作伙伴渠道动作,避免把 logo 误当成等价客户证明。

[CU001, CU002, CU003, CU004, CU005, CU006]
公开客户证据质量表
证据类别公开来源显示什么示例承销价值不能证明什么
公司网站上的具名客户案例研究工作负载、部署叙事和结果指标Inworld 或 Hippocratic AI搭配第三方佐证时,是最强的客户证明界面合同价值、续约或集中度
客户撰写的佐证外部客户描述同一部署问题和结果Inworld 博客相比只由公司发布的案例研究,更能提升信任度更广的客户覆盖或留存
伙伴 / 渠道案例研究Marketplace 包装、部署范围和采购路径AWS 案例研究有助于判断 GTM 和渠道设计直接终端客户多元化
发布或版本公告新分发界面或批量推理界面SF Compute 发布或 Platform 25.5显示商业化试验和产品扩张持久支出或重复使用
标识、引语或生态提及具名伙伴或客户出现在引语或宽泛名单中客户页面、Modverse、融资博客是尽调线索单独证明不了生产成熟度、支出或留存

这条阶梯是本章的核心区分:并非所有具名标识都有同等证据权重。

[CU007, CU008, CU016, CU020, CU033]
FU001: 客户旅程图

Modular 公开可见的客户路径从免费开发者采用开始;只有工作负载进入托管或 BYOC 生产环境后,才变成收入质量证明。

这张图概括公开可见的先落地再扩张动作;它不是已披露的内部漏斗。

[CU002, CU003, CU004, CU005, CU006, CU030]

6.2 具名验证:Inworld 和 Hippocratic AI 是最强终端客户信号,AWS 和 SF Compute 更像渠道验证

最强的公开客户证据来自有具体工作负载的 AI-native 应用开发商,而不是宽泛企业 logo 页面。Inworld 是最干净的案例,因为 Modular 和 Inworld 都描述了同一项生产 text-to-speech 合作:联合工程部署,从接触到生产少于八周,time to first audio 大约快 70%,前两秒音频约 200 milliseconds,最终价格比 vanilla vLLM-based 路径低约 60%。Hippocratic AI 是次强证据点。Modular 称 Hippocratic 已经每天联系数万名患者,跨多个框架运行生产部署,并在 400B-plus-parameter 模型上把 MAX 与现有 SGLang 部署 benchmark,结果显示 sub-500 millisecond TTFT,以及更好的平均和 tail latency。相比之下,AWS 和 SF Compute 主要作为包装和分发验证而重要:它们展示采购、部署和伙伴变现界面,但本身不证明广泛、独立的终端客户广度。[CU007, CU008, CU009, CU010, CU011, CU012]

客户增长 / 采用轨迹表
信号公开细节日期 / 阶段来源基础含义缺失分母
免费 / 开源漏斗Free Self Hosted edition 加上 GitHub repo、monthly community meetings 和安装文档当前定价 + GitHub repo + MAX 页面可见的开发者获客界面很强没有免费到付费转化、激活或企业交接率
生态系统汇总牵引力公司称每月下载量达到 10K's,100+ 个国家有 100K's 开发者,每日生产 token 达数万亿2025融资博客暗示使用足迹真实存在,不只是小规模试点未拆分免费使用、测试、付费生产或客户数
Inworld 生产部署共同工程化的 TTS 栈在不到 8 周内从接触推进到生产,延迟和成本更低当前具名证明Modular 案例研究 + Inworld 博客公开材料中最强的直接生产账户未披露合同金额、期限或后续扩张金额
Hippocratic AI 在线栈评估生产环境每天接触数万名患者,并用 400B+ 模型将 MAX 与既有 SGLang 做评估2026-05Hippocratic 案例研究证实适配高风险实时推理公司称关系仍在持续,但缺少续约或收入数据
AWS 采购路径AWS Marketplace 加上两个 Modular 应用,并支持集中式 AWS 账户采购2025-07 onwardAWS 案例研究 + AWS Marketplace 博客显示渠道采购可以缩短企业购买摩擦未披露 AWS 渠道贡献的 bookings 占比
SF Compute 批量渠道在联合大规模 batch API 上提供 20+ 个模型,并向前 100 名新客户提供免费 batch token2025SF Compute 博客 + Platform 25.5显示直接端点销售之外的新分发路线终端客户留存和毛利率未披露

轨迹行跟踪公开采用界面和具名里程碑,不代表内部 CRM 计数或已签约 ARR。

[CU008, CU009, CU010, CU012, CU013, CU014]
具名客户证明表
客户 / 交易对手分层部署 / 用例生产 vs 试点结果 / 证明局限
InworldAI 原生应用客户实时文本转语音推理生产部署Modular 和 Inworld 均称已上线部署,首段音频约快 70%,价格约低 60%未披露合同金额、续约或客户数贡献
Hippocratic AI医疗 AI 应用客户在稠密大模型上做实时患者对话推理生产栈持续协作公开指标包括低于 500ms 的 TTFT,以及相对既有栈更好的平均 / P99 延迟除案例研究叙事外,没有合同期限、支出水平或部署规模证明
AWS渠道 / 云交易对手Marketplace 采购,以及横跨 AWS 服务的广泛部署选项生产渠道证明,不是具名终端用户工作负载证明公开包装显示 15+ 种架构、500+ 个模型、33+ 个区域和 AWS 账户采购单独看 AWS,不能证明 Modular 直接客户已经多元化
SF Compute渠道 / 批量推理伙伴大规模离线推理 API已上线产品发布20+ 个模型、前 100 名客户免费 token,以及降本叙事缺少终端客户名称和重复支出证明

这张表刻意混合终端客户证明和渠道证明,因为二者都会影响谁购买、谁部署,以及收入可能如何流向 Modular。

[CU008, CU009, CU012, CU014, CU016, CU018]
FU002: 采用 / 部署漏斗

公开证据从宽泛的漏斗顶部活动迅速收窄,最终几乎没有硬留存披露。

计数汇总本章保留的证据,不应解读为内部客户总数。

[CU008, CU012, CU016, CU021, CU028, CU032]
FU003: 客户证明矩阵

证明质量在具名工作负载运营方上最强,在续约或集中度可见性上最弱。

评级反映公开证据质量,而不是客户质量。留存可见度低,说明披露缺失,并不代表该账户弱。

[CU008, CU012, CU016, CU021, CU027, CU028]

6.3 耐久性:扩张循环清晰可见,但留存数学仍是私有信息

Modular 客户故事吸引人的地方,是扩张循环很容易看懂。公开页面显示,公司有意搭了一座桥:先从免费自托管使用切入,再进入 Shared Endpoints,随后是 Dedicated 或 BYOC 部署,最后走向定制工程、定制内核,或 AWS Marketplace 采购。每个付费层级也都包含工程师调优工作负载,这说明扩张不只是更多 GPU 消耗,还通过优化工作和迁移帮助更深地打进账户。问题在于,公开材料没有披露判断这个循环是否耐久、高效所需的指标。没有公开客户数,没有 NRR 或 GRR,没有流失率、合同期限、续约节奏,也没有头部客户结构。因此,公开可用的耐久性代理指标只能是更弱的替代品:与 Inworld 和 Hippocratic 反复共同工程的深度、BYOC 上没有具名账户的 Fortune 500 规模说法,以及通过 AWS 做渠道打包。这些都能说明相关性,但不是续约证据。[CU023, CU024, CU027, CU028, CU029, CU030]

留存 / 重复使用 / 满意度表
指标 / 代理指标公开值分层置信度解读尽调要求
客户数未公开披露全部分层无法判断付费采用广度要求按 shared、dedicated、BYOC 和渠道拆分活跃付费账户
NRR / GRR / 流失未公开披露全部分层公开数据无法支撑收入耐久性判断要求按分层提供 cohort 留存、logo 流失和扩张
合同期限 / 续约安排未公开披露Dedicated、BYOC、渠道缺少判断经常性收入质量的基本机制要求平均期限、续约日期和自动续约结构
重复部署代理指标有,但偏定性Inworld、Hippocratic、AWS 渠道共同工程化深度和持续合作措辞暗示技术账户粘性要求每个账户的具体扩张历史和使用增长
满意度 / ROI 证明只有选择性正面轶事Inworld、AWS、SF Compute有助于销售说服力,但经过筛选且不完整要求独立 reference 和账户级前后对比研究
企业级规模证明声称 Fortune 500 规模和数万亿 token,但未具名BYOC 和公司整体动作暗示可能有规模,但不能证明客户经济具备耐久性要求具名企业 reference 或匿名 cohort 统计

公开记录缺乏支撑处的 null 是有意保留;代理指标已和真实留存披露拆开。

[CU015, CU023, CU024, CU027, CU028, CU029]

6.4 风险判断:客户证明集中,合作伙伴依赖仍是故事里的真实部分

实际风险不是 Modular 毫无证明,而是相对于公司整体叙事暗示的规模,证明面仍然偏窄。具名终端客户的工作负载证据集中在少数 AI 原生参考案例,尤其是 Inworld 和 Hippocratic;客户页面其余部分混合了合作伙伴背书、硬件平台引用,以及未具名的企业级规模说法。Reuters 及后续报道也强化了一点:公司的商业动作既直接面向企业,也通过与云厂商的收入分成合作推进。因此,渠道杠杆既是优势,也是依赖。BYOC 降低了买方摩擦,适合希望把数据和云额度留在自身边界内的团队;但这也意味着 Modular 依赖云和硬件生态,而不是掌控完整的全栈经济性。反向背景同样重要:CUDA 锁定、供给稀缺、超大规模云厂商分发,都会抬高迁移摩擦。结论是:Modular 对 AI 推理买方中的真实一部分看起来具备商业相关性,但在客户广度、留存和集中度上仍披露不足。[CU025, CU026, CU032, CU034, CU035, CU036]

扩张与集中度风险表
扩张驱动因素集中度 / 依赖风险影响尽调路径
自托管和开源漏斗的免费到付费转化公开采用可见,但向付费账户转化不透明如果下载主要停留在非商业用途,漏斗质量可能被高估索取免费转共享、共享转专用,以及代码库转演示的转化指标
实时语音参考客户最强的具名证据集中在狭窄的 AI 原生工作负载楔子里客户吸引力可能真实存在,但比更宽泛叙事暗示的更偏垂直索取语音和推理基础设施团队之外,按终端市场拆分的管线和赢率
BYOC / 受监管部署动作Fortune 500 和合规声明没有具名很难判断高端企业动作是广泛铺开,还是定制项目索取具名推荐客户,或已上线 BYOC 租户的匿名数量
AWS Marketplace / 渠道采购渠道包装可能稀释客户所有权,并掩盖直接客户集中度增长可能取决于合作伙伴政策、费用和联合销售支持索取订单额结构、费用栈和伙伴来源续约率
云 / 硬件可移植性叙事客户采用仍取决于买方能否验证从 CUDA 优先栈迁移出去即便经济性有吸引力,迁移摩擦也会拖慢采用索取按目标硬件拆分的竞争赢 / 输数据和迁移周期
具名账户集中度公开证据围绕 Inworld、Hippocratic、AWS 和 SF Compute 展开少数参考客户可能主导可见叙事索取前 10 大客户占比,以及具名参考客户相对长尾的收入

扩张向量真实存在,但每一个仍受制于账户级披露缺失或生态依赖。

[CU025, CU032, CU034, CU036, CU037, CU038]

6.5 图表

Chapter 07

07风险

7.1 风险排序:法律合规漂移和生态依赖,比短期偿付能力更重要

Modular 的风险堆栈不是由单一生死缺陷主导,而是由合规、生态依赖和执行不透明之间的相互作用主导。最强的公开缓释因素是真实存在的:公司称付费产品已通过 SOC 2 Type 2 认证;提供 BYOC/VPC 部署,把推理输入和输出留在客户网络内;2025 年以 $1.6 billion 估值融资 $250 million;并宣传可在 NVIDIA、AMD、Apple 和云环境之间移植。这些因素降低了眼前的数据驻留、融资和单一供应商风险。但风险并未消失。同一组来源也显示,Modular 的 go-to-market 仍重度依赖前置部署工程师、AWS 分发与采购入口,以及对最新加速器路线图的持续支持。公开证据在收入、毛利率、客户集中度、事故历史和管理层继任方面仍然稀薄。因此,剩余严重度最高的风险是法律 / 监管漂移和合作伙伴 / 硬件依赖,其次是运营交付和人员 / 执行风险。近期融资缓释了短期财务风险,但该风险仍然重要,因为外部投资者无法公开验证需求能否转化为耐久的软件经济性。[CR007, CR009, CR019, CR021, CR022, CR043]

FR001: 风险热力图——按类别划分的剩余严重性

法律合规漂移和合作伙伴 / 硬件依赖的剩余严重性最高,因为 Modular 的缓释措施确实存在,但仍靠外部生态和不完整公开披露支撑。

评级只是基于公开证据的定性研究判断。剩余严重性同时反映底层风险,以及公开缓释证据的不完整。

[CR007, CR021, CR028, CR031, CR043, CR048]
FR002: 风险传导图——外部冲击如何打到收入、利润率和估值

合规漂移、硬件短缺和交付瓶颈,最终都会汇聚成部署放慢、利润率承压,以及更弱的估值叙事。

[CR028, CR029, CR035, CR036, CR042, CR048]

7.2 法律、监管、隐私和出口管制风险,正随 AI 合规边界扩张而上升

法律和监管风险并非来自某一起针对 Modular 的已知诉讼,而是来自 AI 基础设施供应商服务企业工作负载时可能承担的义务越来越多。Modular 自己的隐私、条款和问题报告入口显示,公司会收集个人数据;在账户保持开放或业务需要时保留数据;把安全 / 隐私问题转给安全团队;并在条款中排除了大量可用性和责任风险。缓释一侧,其定价和 BYOC 页面宣传 SOC 2 Type 2 认证及客户 VPC 部署。但外部政策来源清楚表明,合规底线正在移动。DOJ 的 Data Security Program 已经生效,并围绕大批量敏感个人数据提出尽调、审计和受限交易要求。BIS 继续收紧先进计算出口管制。NIST 的 Cyber AI Profile 把 AI 系统的网络安全控制框定为不断上升的预期,而不是小众最佳实践。在州层面,NCSL 和 Troutman 都显示,私营部门部署 AI 现在面对的透明度、歧视、来源和行业特定责任拼图正在扩大。对 Modular 而言,关键风险不太是眼下一次违规,而是向受监管企业销售的速度,可能快过公司把这些义务映射进合同、共同责任边界和运营控制的能力。[CR001, CR003, CR004, CR005, CR006, CR007]

监管 / 法律风险登记表
风险 / 规则司法辖区当前状态可能性严重性缓释措施剩余敞口尽调路径
DOJ Data Security Program / 28 CFR Part 202 对受覆盖数据交易的义务美国联邦已生效;尽职调查和受限交易审计义务已启动BYOC 数据本地化设计、合同筛查、企业安全姿态、客户控制的 VPC 选项取得法律顾问备忘录,将 Modular 产品流、分包商和支持模式映射到 DSP 对受限 / 禁止交易的定义
影响私营部门 AI 部署的州级 AI / 隐私 / ADMT 法律拼图美国各州2025-2026 年拼图继续扩大隐私政策、条款、SOC 2 营销声明、受监管环境中的客户专属控制索取州级合规矩阵、产品通知,以及针对受监管行业和高风险用例的合同语言
高级计算支持和分发的出口管制或境外访问限制美国联邦 / 跨境BIS 指引和许可边界已生效中高硬件可移植性和云部署灵活性可以改道部分工作负载中高审查芯片、软件支持、模型访问和受关注国家敞口的出口筛查政策
BYOC 部署中的客户数据驻留或共同责任缺口合同 / 隐私 / 行业特定潜在风险;产品文档声称已有缓释推理输入和输出留在客户 VPC;云积分和数据留在客户侧索取架构图、DPA、子处理方,以及包含控制平面范围的控制边界文档
服务暂停、责任免责声明和可用性与企业预期不匹配合同 / 商业现行条款把有意义的风险放在用户身上企业合同和带 SLA 的报价可能会为付费客户缩窄这一风险对比企业 MSA/SLA 红线与公开条款,判断实际通过合同转回 Modular 的风险有多少
Mojo 与 MAX 的开源 / IP / 路线图边界IP / 许可开源扩张正在推进,但边界仍在变化核心 stdlib 的 Apache 2 发布,以及公司宣称的语义版本目标确认哪些组件仍然闭源或受合同约束,以及未来 Mojo 2.0 的破坏性变更是否会影响企业承诺

各行按剩余严重性排序,而不是只按概率排序。多行属于情景风险,因为审阅材料中没有发现针对 Modular 的公开执法行动。

[CR001, CR003, CR005, CR006, CR007, CR028]

7.3 运营和合作伙伴风险嵌在产品承诺之中:可移植性、性能和支持都依赖外部生态

运营风险与产品叙事异常纠缠,因为 Modular 承诺的不只是一个模型端点;它承诺在共享、专用和 BYOC 环境中实现跨硬件可移植、定制内核优化和企业级可靠性。公开产品页面显示,这个承诺野心很大。Shared endpoints 把 NVIDIA 与 AMD 的选择作为定价杠杆。Dedicated endpoints 卖的是始终预热的容量和前置部署工程师。BYOC 增加客户云内驻留,但控制平面仍留在 VPC 外,并依赖 BentoCloud 架构。Custom-model 页面又加入一套代码库跨 NVIDIA、AMD、Apple Silicon 和 ARM 的可移植性。这些差异点有吸引力,但也扩大了 QA 矩阵,放大了新一代 GPU 上任何回归的后果,并让支持人员配置成为产品的一部分。外部证据进一步强化这一点。AWS 案例研究和合作文章显示,采购、部署和分发越来越多地通过 AWS Marketplace 和 AWS 服务运行。AlphaStreet 说明,即便供应商试图做到硬件无关,CUDA 锁定和供给稀缺仍然重要。NVIDIA 的 MGX 架构显示,生态标准可以多快加深对 NVIDIA 路线图的依赖。结论是:Modular 的可移植性故事是一项缓释因素,但也是一项运营承诺,需要云合作伙伴、芯片路线图、容器兼容性和稀缺工程劳动力同时撑住。[CR008, CR009, CR010, CR011, CR012, CR013]

运营 / 质量 / 安全风险登记表
失败模式可能性严重性缓释成熟度剩余敞口未解决缺口
Modular 继续支持 NVIDIA、AMD 和 Apple 目标时,新一代 GPU 或驱动栈出现回归部分中高没有跨硬件代际的公开发布质量 / 错误率历史
尽管有企业可靠性声明,共享或专用端点仍发生可用性或延迟事故部分中高审阅材料中没有公开事故登记表、正常运行时间历史或范围级 SLA 指标
BYOC 中,Modular 控制平面与客户 VPC 运营之间的共同责任边界混乱部分没有公开控制矩阵或 DPA 展示日志、密钥管理和事件响应的边界细节
面向现场的工程能力成为定制优化工作的交付瓶颈早期没有客户工程项目的公开人员配比、排队时间或利用率数据
Mojo / MAX 路线图变化,给基于新 API 或内核构建的开发者带来迁移摩擦中高部分公开路线图承认未来会有源代码破坏性变更,但没有按客户层级披露迁移负担

运营风险从公司在产品页面公开承诺的内容出发评估,而不是来自已披露的事故历史。

[CR007, CR009, CR011, CR012, CR013, CR018]
伙伴 / 依赖风险登记表
依赖交易对手角色集中度失败情景严重性缓释措施剩余敞口
高级 GPU 供应和软件生态NVIDIA性能锚点、路线图驱动者、生态标准制定者分配延迟、客户对 CUDA 优先的惯性,或路线图分叉,削弱 Modular 的可移植性价值主张AMD 和 Apple 支持、编译器可移植性、客户 VPC 选项
云采购和分发AWS / AWS Marketplace渠道、采购界面、部署场所、marketplace 计费中高Marketplace 或伙伴动作放慢,压低企业管线转化,并拉高 CAC / 销售周期长度直销、跨多云的 BYOC、开源漏斗中高
BYOC 基础设施底座BentoCloud 架构面向客户云部署的预置和生产级 IaC 基础控制平面、自动化或预置依赖变成瓶颈,或成为架构风险单点中高客户自有云账户、Modular 工程支持、多云支持
第二来源加速器定位AMD相比 NVIDIA 的成本和可移植性替代方案AMD 支持落后于客户需求,或无法抵消企业账户对 NVIDIA 的偏好公司营销同栈可移植性和混合供应商部署
参考架构生态NVIDIA MGX / OEM 生态加速系统的服务器设计和部署标准企业部署默认流向 NVIDIA 标准化栈,更难被替代中高可移植性叙事、云抽象、自定义 kernel 差异化中高
公开客户证明集合Inworld / AWS / 有限具名账户企业采用的验证和可被引用性狭窄证明集合夸大多元化,并掩盖集中度或续约风险中高开源漏斗、不止一种部署模式、宽泛生态信息中高

最重要的依赖不只是供应商;还包括分发渠道、生态标准,以及少数公开可见的证明账户。

[CR010, CR019, CR024, CR025, CR026, CR030]
FR003: 依赖图——Modular 产品承诺周围的关键生态交易方

Modular 处在一张合作伙伴网络的中心,网络里有芯片生态、采购渠道、云环境和交付人力。

[CR010, CR024, CR025, CR037, CR040, CR042]

7.4 人员风险和财务不透明眼下可控,但它们定义了本章的关键否决标准

人员和财务风险不太是迫在眉睫的困境,更在于投资者仍无法验证什么。2025 年融资实质性降低了短期资本压力,外部报道也印证了 $250 million 融资、$380 million 累计资本和 $1.6 billion 估值。这是真实缓冲。但公开披露仍未回答核心承销问题:Modular 是在像软件平台一样扩张,还是像高接触度基础设施咨询公司一样扩张。已审阅来源包仍未披露收入、ARR、毛利率、烧钱速度、现金跑道、客户数、续约行为,或按合作伙伴和账户划分的集中度。领导层可见度也不完整。About 页面列出了可信的创始人班底和少数职能负责人,但公开记录没有披露完整董事会名单或继任计划,而产品入口反复强调前置部署工程师是交付引擎。这意味着本章否决标准可以监测,而非纯属假设:受监管部署中出现重大合规失误、GPU 或云合作伙伴访问急剧丧失,或出现人才密度无法支撑承诺性能和支持水平的迹象,都会迫使尽调观点转向更负面。在公开证据填补经济性、事故和继任缺口之前,风险结论仍是高,而不只是中。[CR014, CR015, CR016, CR017, CR018, CR021]

人员 / 执行风险登记表
角色 / 职能依赖或缺口可能性严重性缓释措施尽调路径
创始人 / 产品架构领导Chris Lattner 和 Tim Davis 仍是技术叙事和战略可信度的核心;公开继任细节有限可见的更广领导层梯队,以及用于招聘的新资本索取董事会材料、继任计划,以及按产品线拆分的授权负责人
面向现场的工程客户结果和优化承诺似乎高度绑定稀缺的资深工程人力主动招聘和多办公室布局索取人员配比、部署排队时间和客户升级指标
合规 / 法务运营公开来源看不出 Modular 为 AI、隐私和出口管制合规配置了多少专职内部能力公开隐私、条款和企业安全营销材料已经存在索取组织架构图、具名合规负责人、外部律师覆盖范围和审计节奏
跨职能规模化执行云、BYOC、开源和自定义模型快速扩张,加重协调负担中高超过 130 名员工和多个办公室提供了一定运营厚度索取路线图治理流程、发布 QA 闸口和事后事故复盘流程

这张登记表聚焦公开记录中执行看起来依赖人力的环节;私下的组织设计可能改善,也可能恶化这一图景。

[CR014, CR015, CR016, CR022, CR042, CR045]
缓释和否决标准表
风险可监控触发点阈值 / 事件行动含义
法律 / 合规漂移受监管客户控制失败或接触执法机构任何与隐私、DSP 或州级 AI 控制有关的公开执法行动、重大客户补救或审计失败暂停承销,直到审阅产品控制映射、法律顾问分析和补救证据
硬件 / 供应依赖无法及时获得优先 GPU 产能,或主要供应商路线图滑坡反复无法在预期发布窗口内支持最新目标硬件,或因硬件不可得导致重大客户流失下调可移植性优势,并假设受限供应带来利润率压力
渠道依赖AWS Marketplace / 超大规模云渠道在没有多元化直接赢单证明的情况下成为主导企业订单额很大一部分依赖一个交易市场或一种云伙伴动作将收入质量视为更低,并在模型中计入集中度折价
交付能力瓶颈面向现场的工程利用率或排队时间飙升有意义的积压、延迟事故增加,或无法按时接入 / 定制优化新账户假设规模化更偏服务,并下调软件倍数假设
财务不透明公司继续抬高预期,却不披露基本单位经济到下一轮重大融资或刷新周期时,仍没有可信披露收入质量、烧钱速度或利润率进展维持信心上限,并要求直接尽调访问后再上调观点
人员 / 治理创始人离任、继任者缺失,或董事会 / 控制权疑虑未解决CEO、总裁或主要技术负责人离任,且没有清晰继任和运营连续性计划将投资逻辑转为持有 / 重新承销,直到领导连续性得到证明

否决标准刻意设计为可监控。它们不是预测;而是一旦触及,就应重新审视当前建设性但谨慎风险观点的阈值。

[CR021, CR022, CR028, CR031, CR035, CR036]

7.5 图表

Chapter 08

08估值

8.1 投资逻辑与当前立场

从产品故事看,Modular 不难让人喜欢。公司刚完成 $250 million 新融资;围绕 NVIDIA 和 AMD 硬件有可信的可移植叙事;开源漏斗可见;Inworld 和 Hippocratic AI 的具名客户证明显示,这套栈可能在真实工作负载上带来有意义的延迟和成本结果。独立市场报告也支持一个庞大且仍在增长的 AI 基础设施背景。问题在于,这不等于在最新估值下有一个干净的承销案例。公开来源仍未披露收入、ARR、毛利率、客户集中度或留存,商业模式又反复强调前置部署工程师和定制优化工作。因此,这个逻辑只有在附条件下才可投。仅凭公开证据,正确立场是继续研究:密切跟踪公司,但不要假装现有数据能证明 $1.6 billion 是便宜、合理还是昂贵。[CV001, CV004, CV006, CV008, CV014, CV015]

建议摘要表
维度评估理由什么会改变观点
建议继续研究公开证据显示真实产品需求,但经济性披露不足,无法在今天承销 $1.6B只有入场价格更低,或拿到私下 KPI 证据,才上调
信心融资、客户证明和市场增长都真实存在,但经济性材料缺失如果披露 ARR、利润率和留存,信心会上升
风险评级轻资本软件上行空间存在,但服务结构、集中度和以 NVIDIA 为中心的竞争仍可能压缩价值关注降价轮或集中度信号
估值立场偏高这个标记并非不可能,但公开数据无法说明收入是否接近支撑 6-10x 软件倍数所需的水平敏感性取决于未披露收入和利润率
决策含义不应只凭公开证据给出买入继续跟踪并开启尽调;只有价格更好,或私下指标确认规模后,才更建设性当前标记提供的是可选性,不是承销清晰度

这张表刻意对价格敏感:同样的公司质量,在披露经济性和入场点不同的情况下,可以支持不同判断。

[CV001, CV008, CV032, CV033, CV035, CV044]
投资逻辑 / 反向逻辑表
投资逻辑论点证据反向逻辑什么会改变判断
硬件可移植切口真实存在公司和第三方资料反复把 MAX 定位为横跨 NVIDIA、AMD 和 Apple 目标,并提供兼容 OpenAI 的端点NVIDIA 的一体化栈和 CUDA 使用惯性,仍是许多买家的默认生产路径需要独立、多客户证据,证明可移植性能赢下实质性企业支出
客户证据显示真实经济价值Inworld 和 Hippocratic 都称,在接近生产的场景中获得了有意义的延迟或效率改善具名证据仍然集中,且由公司筛选需要更广泛的独立客户案例,附续约和支出数据
开源漏斗可以带来企业转化GitHub、Apache 2 许可、公开 CI 和社区会议支撑开发者采用庞大的开源社区并不保证企业变现需要从社区到付费产品的转化和留存收入数据
市场增长顺风很强第三方报告显示,AI 基础设施和推理市场仍在快速复合增长市场高速增长会吸引资本更充足的对手,并压缩差异化需要证据证明,标准化和平台捆绑之下 Modular 仍能持续赢单
如果经济性已经很强,当前价格可以成立如果收入足够高、利润率接近软件,$1.6B 相比私有基础设施同业可能合理若没有披露收入和利润率,这一估值可能只是叙事溢价需要私有 KPI 包,显示收入规模、毛利率、NRR 和客户集中度

这些论点刻意绑定证据和反向证据,而不是泛泛赞美产品类别。

[CV014, CV015, CV017, CV020, CV022, CV023]
FV001: 建议逻辑

从市场机会和证明点,推导到当前对证据敏感的建议。

[CV018, CV019, CV014, CV015, CV017, CV035]
FV004: 投资 KPI

以 IC 记分卡方式,呈现今天承销 Modular 时最关键的维度。

[CV001, CV014, CV015, CV018, CV019, CV032]

8.2 估值背景与入场纪律

公开材料中最好的估值锚点,不是一个可直接观察的收入倍数,因为 Modular 不披露收入。更干净的练习,是反推支撑最新估值需要多少收入。在 $1.6 billion 估值下,10x 收入倍数意味着年收入约 $160 million,8x 意味着约 $200 million,6x 意味着约 $267 million。对类别领导者来说,这些门槛并非不合理,但已审阅来源没有告诉我们 Modular 是否已经接近任何一个。同业融资背景正反都有。Together AI、Groq、Lambda 和 Cerebras 都显示,投资者仍愿意以数十亿美元估值资助稀缺 AI 基础设施资产。但其中一些同业要么披露了更多规模信息,要么容量业务更明显,要么处在更稀缺的类别。结论是:价格并非显然荒谬,但在没有私人 KPI 证据或更好入场点之前,它仍然过于不透明,撑不起买入建议。[CV001, CV027, CV028, CV029, CV030, CV031]

可比估值表
可比对象类型指标 / 估值 / 状态倍数 / 门槛与 Modular 的相关性局限
Modular私有 AI 基础设施 / 推理平台$1.6B 估值;累计融资 $380M未披露收入;敏感性测算显示,若按 10x 倍数,需要约 $160M 收入直接标的;在本资料包中可移植性叙事最强收入、利润率和优先股堆叠均未公开
Together AI私有 AI 云 / 开源模型平台2025 年估值 $3.3B;Sacra 估计到 2026 年 2 月年化收入约 $1BSacra 称上一轮隐含约 9.6x 2024 年收入最接近的同业,兼具 token API 和 GPU 云,收入启发式指标更可见收入数字是分析师估计,并非公司申报
Groq私有推理基础设施厂商2025 年 9 月投后估值 $6.9B已披露估值;抓取资料包未披露收入显示投资人愿意为推理赢家支付稀缺性溢价业务组合和硬件策略不同于 Modular
Lambda私有 GPU 云 / AI 基础设施厂商2025 年 Series E 超过 $1.5B;此前报道提到 $4B 估值已披露估值;提及客户规模,但这里收入仍不透明可作为基础设施需求和 GPU 云偏好的参考可比对象相比 Modular 的软件主导叙事,更接近 GPU 云和硬件容量风险敞口
Cerebras私有 AI 硬件 / 系统公司2025 年 9 月估值 $8.1B已披露估值;抓取资料包未披露收入显示前沿 AI 基础设施资本如何给平台稀缺性定价硬件偏重,不能直接与 Modular 比较
CoreWeave已提交监管文件的 AI 基础设施公司S-1/A 显示 2024 年收入 $1.9B,capex 和客户集中度很高规模存在,但资本强度和客户集中度也极端可提醒:基础设施增长再快,也可能带有结构性风险不是软件可移植性平台;资本结构和资产基础大得多

可比集混合了私有轮次、一家已提交监管文件的公司和一个估算收入倍数,因为标的公司本身不披露收入。因此表格方向上有用,但不能机械套用。

[CV001, CV024, CV025, CV027, CV028, CV029]
FV002: 估值敏感性

在不同收入倍数下,Modular 要支撑 $1.6B 估值所需达到的收入门槛。

数值只是用最新披露的 $1.6B 估值标记除以倍数得出的简单计算;它们是门槛检查,不是对 Modular 当前收入的预测。

[CV001, CV028, CV033, CV034]

8.3 情景分析与逻辑破裂点

情景区间很宽,因为开放问题不是 Modular 有没有做出有用的东西;而是公司能否足够快地变成一个耐久的软件平台,在现有厂商和开源替代方案缩小差距之前,支撑溢价倍数。乐观情景需要几件事同时成立:企业转化从少数具名客户外扩,基准测试领先在新一代 GPU 上持续,私人尽调显示有意义收入上的软件式利润率。基准情景承认公开证明仍不完整,但假设公司仍在快速增长市场中复利,并保留足够差异化来守住当前估值。悲观情景不太是产品彻底失败,而是估值被压缩:可移植性不再那么独特,客户广度仍然狭窄,或经济性看起来更像服务密集型而非平台型。这些条件应驱动组合监测。[CV020, CV022, CV023, CV024, CV025, CV026]

乐观 / 基准 / 悲观情景表
情景核心假设估值逻辑概率信号关键风险
乐观收入已经进入或正快速逼近 $200M+ 区间;开源漏斗转化为广泛企业账户;跨 NVIDIA 和 AMD 的可移植性仍有差异化如果投资人奖励已披露规模和软件式利润率,未来 24-36 个月潜在估值区间为 $3.0B-$5.0B低-中执行、集中度和 incumbent 反应仍然重要
基准增长仍强,但经济性披露仍不完整,模式仍是软件和高接触服务的混合潜在估值区间 $1.5B-$2.5B,大致在最新估值附近或略高倍数压缩或转化放缓可能限制上行
悲观差异化收窄,付费转化滞后,或下一轮在经常性经济性公开证据出现前被迫重设估值存在降轮风险、议价能力变弱,潜在估值区间 $0.6B-$1.2B可移植性变成功能同质化,服务负担仍然高

区间是分析师情景,锚定已披露融资背景、同业轮次以及缺少公开收入披露这一事实;并非公司指引。

[CV032, CV039, CV040, CV041, CV044, CV045]
投资逻辑失效与止损触发因素表
触发因素门槛 / 事件对投资逻辑的传导行动含义
下一轮融资低于 2025 年估值重设相比 $1.6B 持平或降轮意味着私募投资人不再支撑既有叙事溢价下调立场,并重审下行情景
客户广度没有超出参考账户没有证据显示付费账户多元化、续约或集中度下降会削弱「Modular 正从窄优化厂商变成广泛平台」这一主张在广度改善前,维持或降低确信度
服务强度持续过高前置部署工程仍是多数赢单的必要条件,且毛利率证据一直不出现会限制倍数扩张,让公司更像高端服务而非可扩展软件增加风险敞口前,要求披露产品利润率和支持比率
可移植优势收窄竞争对手或 incumbent 在不带来类似迁移成本的情况下,匹配实际多硬件收益会压缩支撑溢价定价的核心差异化按更低倍数的软件或基础设施可比对象重新定价
资本强度或集中度开始类似基础设施下行情形出现大额承诺或客户集中,同时没有利润率透明度抵消会提高未来融资重设的概率,并降低战略杠杆视为投资逻辑失效,直到集中度或经济性改善

这些是可监控事件;即便更广泛的 AI 市场仍强,一旦发生也会迫使建议出现实质性重估。

[CV023, CV024, CV025, CV037, CV038, CV041]
FV003: 估值 / 回报区间

基于执行、披露和竞争压力,给出未来 24-36 个月的情景估值区间。

这些区间是分析师情景范围,锚定当前 $1.6B 估值标记、同业融资轮次,以及关于披露和执行的明确假设;它们不是公司指引。

[CV032, CV039, CV040, CV041, CV044, CV045]

8.4 退出准备度与最终尽调问题

公开退出准备度仍然薄弱。没有公开 KPI 包,外部投资者无法像建模一家成熟上市软件公司那样建模 Modular;也没有公开股权结构表或优先股堆叠,能让投资者把强劲的标题估值翻译成实际普通股结果。因此,最终尽调议程比任何漂亮的估值公式都更重要。在承销当前估值之前,投资者需要当前收入和 ARR、按业务界面划分的毛利率、队列留存、集中度、实际定价,以及平台工程与前置部署支持之间的组织结构。还需要融资机制:股份类别、清算优先权,以及任何反稀释条款,因为这些条款可能让未来平轮或下轮比标题估值显示的更具惩罚性。在这些事项清楚之前,Modular 仍是高兴趣跟踪标的,而不是高信念买入。[CV008, CV009, CV011, CV016, CV042, CV043]

最终尽调索取清单
主题缺失证据为什么重要负责人 / 尽调路径
当前收入 / ARR按产品表面拆分的最新月收入、ARR 和增长要判断 $1.6B 是便宜、合理还是昂贵,这是最低限度输入索取董事会材料中的 KPI 页和最新经营复盘
按表面拆分的毛利率共享端点、专用端点、BYOC 和服务的毛利率把软件式经济性和服务偏重的收入质量分开索取按收入表面和支持负担拆分的财务切片
留存和集中度NRR、GRR、logo 留存、前 10 大客户占比和具名续约日历显示客户证据是持久且多元,还是集中索取 cohort 表和集中度明细
股权结构表和优先权股份类别、清算优先权、SAFE、期权池和反稀释条款强劲的表面估值仍可能掩盖普通股结果偏弱索取最新股权结构表和融资文件
组织结构产品或平台工程师,与前置部署或客户工程师的占比测试 Modular 是像软件一样扩展,还是像高接触交付组织一样扩展索取当前组织架构和招聘计划
定价实现实际平均售价、折扣、承诺使用条款和渠道费用公开标价机制不能揭示实际经济性索取客户合同样本和定价瀑布

每一行都指出会实质性改变建议的证据,而不只是补充背景。

[CV008, CV009, CV011, CV016, CV042, CV043]

8.5 图表

免责声明

本报告仅供参考。

证据索引

结论
编号陈述可信度来源
CO001 Modular was founded in 2022 by Chris Lattner and Tim Davis. SO001, SO018, SO020
CO002 The founders say they started Modular to solve fragmented AI infrastructure and make accelerated compute easier to use. SO001, SO018, SO020
CO003 Public sources place Modular in the San Francisco Bay Area even though they alternate among Silicon Valley, Palo Alto, Los Altos, and broader Bay Area labels. SO001, SO002, SO018, SO021
CO004 Modular’s About page lists offices in San Francisco, Los Altos, Boston, and Edinburgh. SO001
CO005 Modular’s office-expansion post says the San Francisco office joins a Los Altos headquarters and that Edinburgh is based in the Bayes Centre. SO003
CO006 The public leadership team named on Modular’s About page includes Chris Lattner, Tim Davis, Mostafa Hagog, Kalor Lewis, Eric Johnson, and Mike Edwards. SO001
CO007 GV presents Chris Lattner as the creator of LLVM, Clang, and Swift and Tim Davis as the founder of TensorFlow Lite and a leader of Google on-device ML. SO020
CO008 Modular’s careers page says new-employee onboarding is conducted onsite at the Los Altos office. SO013
CO009 Modular positions itself as modular and composable infrastructure that simplifies AI development and deployment. SO001
CO010 The pricing page shows three deployment modes: Modular-hosted cloud services, customer-cloud or VPC deployment, and endpoint or custom-model offerings. SO012
CO011 Modular publicly offers a free developer entry point for MAX and Mojo, while also advertising paid consumption endpoints and enterprise engagements. SO012, SO015
CO012 Modular’s terms say access to the platform is contract-governed and that client-side software is licensed under the Modular Community License. SO015, SO016
CO013 TechCrunch and The SaaS News report that Modular raised $100 million in August 2023 and brought total funding to $130 million. SO018, SO019
CO014 The 2023 financing syndicate publicly included General Catalyst, GV, SV Angel, Greylock, and Factory. SO018, SO019
CO015 Sacra says Modular raised a $30 million seed round in June 2022. SO024
CO016 Modular’s September 2025 announcement says it raised $250 million in a third financing round led by USIT, with DFJ Growth joining and existing investors including GV, General Catalyst, and Greylock participating. SO002, SO021, SO023
CO017 Modular’s September 2025 financing set total capital raised at $380 million and valuation at $1.6 billion. SO002, SO023, SO024
CO018 Independent coverage says the 2025 valuation nearly tripled the company’s prior mark from two years earlier. SO021, SO023
CO019 Reuters-linked coverage described Modular as having about 130 employees at the time of the 2025 round. SO023
CO020 Modular’s own 2025 financing post says the company had grown to more than 130 people with a footprint across North America, the United Kingdom, and Europe. SO002
CO021 Modular’s 2025 financing announcement says the platform launched in 2023. SO002
CO022 Modular’s Mojo local-download post says more than 120,000 developers had signed up for the Mojo Playground and more than 19,000 were actively discussing Mojo on Discord and GitHub. SO004
CO023 Modular’s offices post says Mojo is free to use, has hundreds of thousands of lines of open-source code, and a community of more than 50,000 developers. SO003
CO024 The Mojo website lists stable version 1.0.0b1 with a May 7 date and a latest nightly dated June 11. SO017
CO025 Modular’s 26.3 release says Mojo 1.0 is in beta and final 1.0 is planned later in 2026. SO007
CO026 The path-to-1.0 post says Modular expects Mojo to reach 1.0 sometime in 2026 and to open source the Mojo compiler with that milestone. SO006, SO017
CO027 Modular says the core modules of the Mojo standard library were released under Apache 2 with LLVM exceptions. SO005, SO016
CO028 The Mojo website says the standard library is fully open-source on GitHub while the compiler is still planned for open-sourcing in 2026. SO017, SO006
CO029 Mammoth is Modular’s Kubernetes-native platform for enterprise-scale distributed AI serving. SO008, SO002
CO030 Modular’s AWS partnership announcement says MAX on Graviton CPUs can deliver up to 5x higher performance and up to 80% cost savings. SO009
CO031 Modular’s AMD partnership announcement says the platform is generally available across AMD’s GPU portfolio including MI300 and MI325 and reports up to 53% better throughput on prefill-heavy workloads against open-source stacks. SO010
CO032 Modular’s 2025 financing post claims 10-thousands of platform downloads per month, 24,000-plus GitHub stars, trillions of tokens served daily, more than 100 countries represented in its ecosystem, and 600,000-plus lines of open-source code. SO002
CO033 The fetched GitHub repository page showed 26.3 thousand stars at review time. SO016
CO034 Modular’s customer page claims +80% faster performance versus other providers, +70% cost reduction versus vLLM, and 2-5x faster movement from research to production. SO011
CO035 The customer and partner materials publicly name Inworld, AWS, AMD, NVIDIA, and TensorWave as part of Modular’s proof surface. SO011, SO009, SO010
CO036 Modular’s 2025 financing post names an ecosystem that includes Inworld, SF Compute, Jane Street, Oracle, AWS, Lambda Labs, TensorWave, AMD, and NVIDIA. SO002, SO021
CO037 Reuters-linked coverage says Modular serves cloud providers such as Oracle and Amazon as well as chipmakers Nvidia and AMD. SO023
CO038 Sacra and Reuters-linked coverage describe Modular as a B2B infrastructure software business monetizing on a consumption basis with direct enterprise sales and partner channels. SO024, SO023
CO039 Chris Lattner told TechCrunch that the 2023 financing would be used for product expansion, hardware support, and team growth rather than primarily for AI compute. SO018
CO040 No canonical public revenue figure appears in the reviewed official, media, or analyst source pack for Modular. SO001, SO002, SO012, SO018, SO023, SO024
CO041 No canonical public active-customer count appears in the reviewed source pack even though the company cites named partners and customer stories. SO001, SO002, SO011, SO023, SO024
CO042 The public record still lacks a full current board roster and detailed governance structure for Modular. SO001, SO002, SO021, SO023
CO043 An external GitHub issue on Modular’s repository shows developer concern that Mojo might not remain fully open source or free and could create future lock-in. SO025
CO044 Modular’s terms reserve rights and allow service suspension in several scenarios, showing that commercial platform access remains contract-governed even as open-source components expand. SO015
CO045 Across official materials, Modular says its stack runs across NVIDIA, AMD, CPUs, cloud environments, and in some cases Apple Silicon. SO001, SO010, SO012
CO046 Modular consistently frames the company as a unified AI compute layer or AI hypervisor rather than a single-vendor inference stack. SO001, SO002
CO047 The 2025 financing post says demand is already strong from enterprises, clouds, and developers. SO002
CO048 Modular says it is hiring across engineering, infrastructure, and go-to-market roles, including in Edinburgh. SO003, SO002, SO013
CO049 Modular’s About page publicly lists DFJ Growth, Factory, General Catalyst, Google Ventures, Greylock Partners, SV Angel, and USIT Fund among its named backers. SO001
CO050 GV says it led Modular’s first funding round alongside Greylock and Factory. SO020
CO051 The 2025 round added DFJ Growth as a new investor while existing investors re-participated. SO002, SO021, SO023
CO052 The 2025 financing is partly intended to help Modular expand from AI inference into the AI training market. SO023
CO053 Reuters-linked coverage says Modular plans to expand engineering and go-to-market teams with the new capital. SO023
CO054 Reuters-linked coverage says Modular plans to sell software directly to enterprises on a consumption basis and through revenue-sharing partnerships with cloud providers. SO023
CO055 Taken together, the public location signals suggest a Bay Area-centered company with Los Altos as an operating hub and San Francisco as a growing outward-facing office. SO001, SO003, SO013, SO021
CO056 Modular’s mission is to make the AI compute layer more unified, efficient, and accessible beyond closed or vendor-specific platforms. SO001
CM001 Modular describes itself as a unified AI compute layer or hypervisor for AI rather than a single-model application vendor. SM001, SM004
CM002 Modular's public offer is best bounded as production inference infrastructure spanning hosted endpoints, BYOC deployments, and a portability-focused compiler/runtime layer. SM002, SM003, SM004, SM010
CM003 Shared Endpoints are sold on a token-priced basis with no reserved capacity, no minimum spend, scale-to-zero behavior, and burst capacity for variable traffic. SM002
CM004 BYOC is sold as inference running inside the customer VPC with Modular handling the serving stack while customers keep their hardware, data, and cloud credits. SM003
CM005 Modular's managed cloud targets startups, rapid prototyping, cost-sensitive production inference, and migrations away from proprietary APIs. SM004
CM006 The model and solutions pages show Modular supporting LLM, vision, image, audio, and video workloads, implying a broader serving scope than text-only inference. SM006, SM007, SM008
CM007 The real substitute set includes proprietary model APIs, single-vendor GPU clouds, wrapper-based serving stacks, self-managed Kubernetes inference, and portable runtimes such as ONNX Runtime. SM002, SM004, SM017
CM008 Modular's customer page names Inworld, AWS, NVIDIA, AMD, and Hippocratic AI, implying buyer proof across application, cloud, and hardware ecosystem participants. SM009
CM009 The Business Research Company sizes the global AI infrastructure market at USD 90.91 billion in 2026. SM022
CM010 Fortune Business Insights sizes the global AI inference market at USD 117.80 billion in 2026. SM024
CM011 Technavio says the AI inference hardware market was worth USD 67.80 billion in 2025 and is growing at 20.8% CAGR through 2030. SM023
CM012 These public market figures are adjacent rather than interchangeable because they measure hardware-only, broader infrastructure, and full inference-market boundaries. SM022, SM023, SM024
CM013 CNCF reports that 82% of container users run Kubernetes in production and 66% of organizations hosting generative AI models use Kubernetes for some or all inference workloads. SM011, SM014
CM014 llm-d and Google's inference-gateway messaging show the market is investing in Kubernetes-native distributed inference with cache-aware routing, disaggregated serving, and accelerator-neutral design. SM012, SM013, SM019
CM015 Forbes reports that 67% of AI compute already goes toward inference and cites a USD 255 billion inference market by 2030. SM014
CM016 The Business Research Company identifies enterprises, government organizations, and cloud service providers as end-user groups for AI infrastructure. SM022
CM017 Technavio says cloud inference holds the largest revenue share by deployment in AI inference hardware while edge and on-prem remain material segments. SM023
CM018 Fortune Business Insights says edge inference is the leading 2026 deployment segment globally and cloud inference is second-largest, which conflicts with the hardware-market deployment lens. SM024, SM023
CM019 Because public market boundaries and deployment splits conflict, the most defensible SAM lens for Modular is a constrained portability-and-production wedge rather than one top-down headline TAM. SM022, SM023, SM024
CM020 Modular's pricing page presents three commercial entry points: free self-hosted usage, usage-priced managed endpoints, and pay-per-minute BYOC enterprise deployments. SM003, SM010
CM021 Modular publicly lists token pricing for named hosted models, including DeepSeek V4 at USD 1.74 per million input tokens and USD 3.48 per million output tokens. SM010
CM022 BYOC pricing is framed as a single per-minute rate across NVIDIA B200 and AMD MI355X dedicated endpoints, emphasizing cost predictability over per-token variability. SM003
CM023 Shared endpoints are positioned for variable-traffic production and prototyping, while BYOC is positioned for compliance and enterprise control. SM002, SM003, SM004
CM024 Agentic AI is a promising target segment because Modular says agent workflows often involve 10-50 LLM calls per task and latency savings compound across the chain. SM005
CM025 Voice workloads are a promising target segment because Modular positions real-time TTS as bursty, latency-sensitive, and highly sensitive to GPU price-performance. SM006
CM026 Coding-tool workloads are attractive because Modular frames code completion and agentic coding as sustained, high-volume inference where fleet cost dominates economics. SM007
CM027 Across Modular's public packaging, the end user is typically an AI engineering team, but the payer is often a product, platform, procurement, or FinOps owner accountable for serving economics. SM003, SM004, SM010
CM028 ONNX Runtime positions itself as a performant inference layer that runs models from multiple frameworks across cloud servers, edge and mobile devices, and web browsers. SM015, SM016
CM029 ONNX Runtime's execution-provider model spans CUDA, TensorRT, OpenVINO, QNN, CoreML, ROCm, MIGraphX, Azure, and other backends, evidencing strong market demand for backend abstraction. SM017, SM020
CM030 MLIR explicitly aims to reduce software fragmentation and improve compilation for heterogeneous hardware with target-specific operations. SM018, SM021
CM031 Phoronix reports that MLIR-AIE extends MLIR-based compiler tooling into AMD AI Engine devices and Ryzen AI NPUs, showing portability work broadening beyond classic GPU serving. SM021
CM032 llm-d's emphasis on prefix-cache-aware routing, prefill/decode disaggregation, and benchmarked inference scheduling shows the market is moving from simple hosting toward orchestration efficiency. SM012, SM013, SM019
CM033 Modular's product pages align with that market direction by selling compiler-aware scaling, custom kernels, workflow tuning, and hardware portability as core differentiators. SM002, SM003, SM004, SM005, SM006, SM007
CM034 AlphaStreet argues that CUDA lock-in is embedded in compilers, libraries, developer habits, and production toolchains, making migration costs practical as well as technical. SM025
CM035 AlphaStreet also argues that supply scarcity turns time-to-usable Nvidia compute into a procurement variable that can outweigh theoretical cost savings from alternatives. SM025
CM036 Forbes notes that daily production AI use on Kubernetes still lags broad adoption and highlights tooling maturity, GPU multi-tenancy, and cost management as ongoing barriers. SM014, SM011
CM037 Technavio cites high initial capex, hardware/software co-design complexity, and rapid hardware obsolescence risk as constraints on inference-platform adoption. SM023
CM038 Fortune Business Insights cites high hardware cost, integration difficulty, talent shortages, and privacy or security concerns as restraints on AI inference adoption. SM024
CM039 NVIDIA markets MGX as a modular server-design platform for accelerated computing, underscoring that incumbents are also reducing deployment friction around AI infrastructure. SM026
CM040 Modular's differentiation is strongest for buyers that care about cost predictability, compliance, or multi-accelerator flexibility, and weaker for buyers content with proprietary API abstraction alone. SM003, SM004, SM010, SM025
CM041 Public sources do not disclose Modular's customer count, cohort mix, or the split of demand across shared endpoints, managed dedicated endpoints, and BYOC deployments. SM009, SM010
CM042 Public performance claims such as 20-50% gains over vLLM or 60-80% customer cost savings are company- or partner-reported in this pack rather than independently benchmarked end to end. SM001, SM009
CM043 The cleanest underwriting frame is a constrained wedge: cross-accelerator production inference infrastructure for AI-native teams and enterprises trying to lower cost, preserve control, or reduce vendor dependence. SM002, SM003, SM004, SM013, SM015, SM022, SM025
CP001 MAX is publicly positioned as a single GenAI stack that combines model serving, model customization, and kernel programming inside one framework. SP001
CP002 Modular says the same MAX and Mojo code paths now target NVIDIA, AMD, and Apple Silicon hardware. SP001, SP002
CP003 Modular markets MAX as a stack that does not depend on PyTorch, CUDA, or ROCm and frames that design as lower vendor lock-in with smaller containers and faster cold starts. SP001
CP004 Modular's recent releases emphasize fast hardware enablement across Blackwell, MI355X, and Apple or consumer GPUs as a core part of its value proposition. SP002, SP003
CP005 Modular repeatedly says its headline performance claims can be checked with public benchmark scripts rather than only private customer data. SP002, SP004
CP006 vLLM is a direct open-source serving peer that publicly combines PagedAttention, continuous batching, multi-LoRA support, OpenAI-compatible APIs, and support for more than 200 model architectures. SP006, SP007
CP007 SGLang is a direct high-performance serving peer that publicly emphasizes RadixAttention, prefill-decode disaggregation, multi-LoRA batching, and large-scale production deployment. SP008, SP009
CP008 TensorRT-LLM is a CUDA-first incumbent stack that focuses on NVIDIA-only inference optimization through custom kernels, advanced parallelism, and integration with Triton and Dynamo. SP010, SP011
CP009 Ray Serve competes less as a kernel runtime and more as scalable serving infrastructure for composition, autoscaling, and multi-model application assembly. SP012
CP010 Together AI competes as a managed alternative that sells serverless inference, dedicated endpoints, and GPU capacity rather than an open-source runtime. SP014, SP015
CP011 Hugging Face's TGI docs say the project is now in maintenance mode and explicitly recommend vLLM, SGLang, and local compatible engines going forward. SP016, SP017
CP012 ONNX Runtime is a substitute path for internal builders because it offers cross-framework graph optimization and hardware-specific execution providers instead of a full managed inference product. SP024
CP013 llm-d presents another substitute path by packaging Kubernetes-native distributed inference on top of vLLM rather than replacing vLLM with a new serving engine. SP025, SP006
CP014 NVIDIA MGX extends the incumbent threat by giving OEMs and partners a modular reference architecture with multi-generational compatibility and the full NVIDIA software stack. SP023
CP015 For buyers already standardized on NVIDIA fleets, TensorRT-LLM plus MGX and adjacent CUDA tooling offer a deeper incumbent ecosystem than Modular publicly matches. SP010, SP023, SP022
CP016 Modular's cleanest direct wedge is cross-vendor portability across NVIDIA and AMD production hardware with Apple support extending the development story. SP001, SP002, SP004
CP017 Public evidence still shows vLLM ahead of Modular on disclosed ecosystem breadth, model coverage breadth, and adapter maturity. SP006, SP018
CP018 Public evidence still shows SGLang ahead of Modular on shared-prefix optimization emphasis and disclosed deployment scale. SP008, SP018
CP019 Together publishes a packaging model that Modular does not publicly match, including token pricing, dedicated endpoints, on-demand GPU hourly rates, and reserved pricing tiers. SP015
CP020 Ray Serve and Anyscale pitch BYO cloud, multi-cloud execution, and composition control rather than a single integrated inference runtime. SP012, SP013
CP021 Managed alternatives and orchestration layers make multi-homing feasible because customers can wrap or route across runtimes instead of hard-committing to one serving engine. SP012, SP013, SP014, SP021
CP022 Internal-build substitutes are credible because vLLM, Ray Serve, ONNX Runtime, and llm-d each expose composable building blocks without requiring Modular's full integrated stack. SP006, SP012, SP024, SP025
CP023 Spheron's 2026 H100 comparison says MAX led vLLM and SGLang on dense-model throughput in that benchmark but had slower first-run cold start than both. SP018
CP024 Spheron says MAX's current release is weaker for MoE workloads and lacks equivalent multi-LoRA support, so its advantage is workload-specific rather than universal. SP018
CP025 Spheron's decision matrix treats vLLM as the safest broad production default and SGLang as the better choice for shared-prefix workloads. SP018
CP026 Future AGI's 2026 alternatives guide still frames Together as the closest hosted replacement, Anyscale as the VPC-control option, and vLLM as the default OSS self-hosted runtime. SP021
CP027 OpenAI-compatible APIs are not a durable moat for Modular because MAX, vLLM, SGLang, and TGI all expose similar compatibility claims. SP001, SP006, SP008, SP017
CP028 Continuous batching, cache optimization, and high-throughput serving are now table-stakes features across MAX, vLLM, SGLang, and TGI rather than Modular-only differentiation. SP001, SP006, SP008, SP017
CP029 Modular's remaining differentiation is the combination of unified kernel tooling, compiler or runtime control, and cross-vendor enablement from one stack rather than any single serving feature. SP001, SP002, SP004
CP030 CUDA lock-in remains the strongest adverse counterpoint to Modular's portability thesis because real migration costs include validation, debugging, and re-qualification, not just benchmark deltas. SP022
CP031 AlphaStreet cites NVIDIA-reported scale of more than 4 million CUDA developers and over 40,000 organizations using CUDA-accelerated applications. SP022
CP032 NVIDIA supply constraints and bundled platforms can strengthen incumbent pricing power because faster access to production-ready compute is itself a procurement advantage. SP022, SP023
CP033 The combination of CUDA tooling, TensorRT-LLM, MGX reference designs, and partner ecosystems makes incumbent response durable for buyers who prioritize mature production operations over portability. SP010, SP022, SP023
CP034 Modular's public funding and product surface show real ambition, but the public evidence does not yet show distribution power on the level of NVIDIA, Hugging Face, or the vLLM community. SP005, SP006, SP017, SP023
CP035 Hugging Face's own documentation recommending vLLM and SGLang is evidence that open-inference mindshare has consolidated around those ecosystems rather than around a new proprietary standard. SP016, SP017
CP036 Anyscale explicitly says customers can scale vLLM and SGLang on its platform, so those ecosystems can borrow orchestration distribution rather than compete as isolated runtimes. SP013
CP037 Together's public materials appeal to buyers who value immediate managed access and transparent economics more than runtime-level programmability. SP014, SP015
CP038 Modular's MAX page still funnels scale deployments toward demos and managed enterprise engagement instead of a fully standardized public price sheet. SP001
CP039 Modular's competitive set is split across open-source engine peers, NVIDIA-specialized incumbents, orchestration or BYOC platforms, managed clouds, and internal-build substitutes. SP006, SP008, SP010, SP012, SP014, SP021, SP024, SP025
CP040 The most likely buyers to prefer MAX are teams that need cross-vendor performance, custom kernels, or rapid bring-up on nonstandard hardware and are willing to bet on a newer stack. SP001, SP002, SP018
CP041 Together publicly lists 1x H100 80GB dedicated infrastructure at $6.49 per hour and on-demand NVIDIA HGX H100 at $5.49 per hour, which is unusually concrete packaging for this category. SP015
CP042 Modular's public materials do not disclose equivalent list pricing for MAX Enterprise or Mammoth-managed deployments. SP001, SP005
CP043 Multiple 2026 comparison articles center the field on vLLM, SGLang, TensorRT-LLM, and TGI, which shows that Modular must break into an already established evaluator shortlist. SP019, SP020, SP021
CP044 Modular's financing post says Mammoth is a Kubernetes-native control plane with router and substrate features for large-scale distributed serving, expanding the company beyond a point inference engine. SP005
CI001 Modular keeps a free self-hosted community edition as a no-upfront-cost entry point for developers. SI001
CI002 Shared endpoints are billed on a per-token basis, scale to zero when idle, and are positioned for prototyping, dev/test, and variable-traffic production workloads. SI002
CI003 Dedicated endpoints are billed per minute on reserved GPU capacity with warm endpoints and no cold-start penalty. SI003
CI004 BYOC is billed per minute of deployed capacity inside the customer environment rather than as a token-priced API. SI001, SI004
CI005 Every paid surface emphasizes forward-deployed engineers and direct workload tuning, indicating a software-plus-services revenue design rather than infrastructure-only resale. SI001, SI002, SI003, SI004, SI005
CI006 Modular publicly offers committed-use and volume pricing for paid cloud and BYOC offers, but it does not publish the discount schedule. SI001
CI007 The pricing page publishes list pricing for hosted model endpoints in dollars per 1 million tokens, making shared-endpoint pricing the clearest public monetization surface. SI001
CI008 On the pricing page, DeepSeek V4 is listed at $1.74 input, $3.48 output, and $0.145 cache-hit per 1 million tokens. SI001
CI009 On the pricing page, GPT OSS 120B is listed at $0.10 input and $0.50 output per 1 million tokens, showing the low end of Modular's current public price band. SI001
CI010 On the pricing page, Qwen 3.7-Max is listed at $1.25 input, $3.75 output, and $0.13 cache-hit per 1 million tokens, showing that higher-end models still price below many proprietary APIs. SI001
CI011 Dedicated and BYOC product pages disclose the billing basis but not the underlying dollar-per-minute rate, so enterprise contract economics remain publicly opaque even when the pricing logic is visible. SI001, SI003, SI004
CI012 In BYOC, Modular keeps the control plane and engineering layer while inference runs inside the customer VPC, implying that customer cloud spend is not the same thing as Modular revenue. SI004
CI013 BYOC lets customers apply their own cloud credits and reserved commitments, which improves buyer ROI but limits Modular to a software, support, and orchestration take-rate. SI004
CI014 The Our Cloud offer is positioned as managed inference that removes cluster provisioning, orchestration, and optimization work from the customer team. SI005
CI015 The Custom Models and MAX pages position Modular to monetize proprietary-model deployment, custom kernels, and performance engineering, which expands the offer beyond commodity API tokens. SI006, SI014
CI016 MAX is presented as a free self-serve starting point that can later be upgraded into managed enterprise deployment in Modular's cloud or the customer's own cloud. SI001, SI014
CI017 Reuters reported that Modular plans to sell software directly to enterprises on a consumption basis and through revenue-sharing partnerships with cloud providers. SI018
CI018 The AI Agents for AWS Marketplace announcement shows that Modular is using AWS Marketplace as a procurement channel that centralizes purchasing, payments, and access through AWS accounts. SI013
CI019 The AWS case study says Marketplace buyers can access standard support, enterprise premium support, and professional services, reinforcing a mixed software-plus-services monetization path. SI012
CI020 Modular had at least two named AWS Marketplace applications in July 2025—MAX High-Performance GenAI Serving Platform and MAX Code Repo Agent—showing a broader SKU surface than a single inference API. SI013
CI021 Modular publicly shows named proof points across customers and partners including Inworld, AWS, NVIDIA, AMD, and Hippocratic AI. SI007, SI010
CI022 A customer quote from Inworld says Modular improved time-to-first-audio by roughly 70% versus a vanilla vLLM implementation and enabled about a 60% lower eventual API price. SI007
CI023 The AWS case-study surface claims 500+ models, 33+ geographic regions, and 15+ CPU+GPU architectures around the MAX-on-AWS offer. SI012
CI024 Modular claims it is being downloaded tens of thousands of times per month, serves trillions of tokens daily in production, and has developers in more than 100 countries. SI010, SI017
CI025 Modular said in September 2025 that it had grown to more than 130 people. SI010, SI018
CI026 Reuters said the company had about 130 employees and planned to use the new capital to expand both engineering and go-to-market teams. SI018, SI009
CI027 TechCrunch reported in 2023 that Modular intended to spend the $100 million round primarily on product expansion, hardware support, language expansion, and team growth rather than on AI compute itself. SI015
CI028 Public sources align that Modular has raised $380 million in primary equity funding across seed, Series B, and Series C rounds. SI015, SI016, SI017, SI018, SI019, SI020
CI029 Public sources align that the September 2025 round valued Modular at about $1.6 billion. SI017, SI018, SI019, SI020
CI030 Modular said the 2025 capital would help it expand from an inference focus into the AI training market, implying a more capital-demanding roadmap than inference-only software. SI010, SI018
CI031 No reviewed public source provided a canonical Modular revenue, ARR, active-customer count, gross margin, CAC, payback, NRR, burn, or runway figure. SI001, SI010, SI015, SI018, SI020
CI032 Official list pricing is useful for understanding billing mechanics but cannot reveal realized enterprise contract rates, channel fees, or gross margins. SI001, SI003, SI004
CI033 Across shared, dedicated, and BYOC offers, Modular repeatedly presents hardware portability and vendor choice as an economic lever that can reduce total cost of ownership. SI002, SI003, SI004, SI005
CI034 Forward-deployed engineers and premium support are likely to increase service-delivery cost even while they support higher ACVs and better retention. SI002, SI003, SI004, SI012
CI035 Modular's gross-margin path likely depends on GPU utilization, batching efficiency, hardware mix, and whether workloads run in Modular-managed cloud or customer-owned infrastructure. SI002, SI003, SI004, SI005, SI021
CI036 AlphaStreet says more than 4 million developers and over 40,000 organizations already use CUDA-accelerated applications, creating practical switching costs for any alternative inference stack. SI022
CI037 NVIDIA's MGX system strategy and platform bundling reinforce incumbent distribution power around validated hardware, networking, and deployment tooling. SI022, SI023
CI038 CoreWeave's S-1/A shows that scaled AI infrastructure can demand substantial capital expenditures and additional external capital even when revenue is growing very quickly. SI021
CI039 CoreWeave reported 2024 revenue of $1.9 billion, net loss of $863 million, and Microsoft concentration at 62% of revenue, illustrating how AI infra scale can coexist with concentration and profitability risk. SI021
CI040 CoreWeave disclosed $1.361 billion of cash and cash equivalents, $5.458 billion of non-current debt, and total indebtedness of about $8.0 billion as of December 2024, underscoring the balance-sheet intensity of owning more infrastructure. SI021
CI041 Third-party market reports still describe a large and growing AI inference and AI infrastructure market, so demand backdrop is not the weak point in the Modular thesis. SI024, SI025
CI042 The public underwriting case rests more on monetization design, customer proof, and partner channels than on disclosed company financial statements. SI001, SI007, SI010, SI018, SI020
CI043 Today Modular appears less balance-sheet intensive than a GPU owner because BYOC and marketplace channels offload much of the infrastructure asset burden, but a move deeper into training could increase financing dependency. SI004, SI013, SI018, SI021
CI044 Because public sources do not disclose cash on hand, monthly burn, or revenue scale, a credible runway estimate cannot be produced from public evidence alone. SI018, SI020, SI021
CI045 Modular's own positioning frames high costs, complex tools, and closed platforms as the economic pain points its paid products are meant to solve. SI008
CI046 The careers page shows the company is still actively hiring and running structured onboarding, consistent with ongoing people investment after the last financing round. SI009
CE001 Modular publicly describes the platform as a vertically integrated suite for AI development and deployment rather than a single-point inference tool. SE013, SE022
CE002 MAX exposes an OpenAI-compatible serving interface through the CLI, Docker, and REST-oriented client examples. SE001, SE013, SE014
CE003 Modular offers self-hosted endpoints, Modular-managed cloud endpoints, and a bring-your-own-cloud deployment model. SE013, SE015
CE004 MAX publicly claims support for more than 500 models or architectures across its serving surface. SE011, SE013, SE020
CE005 Modular says users can serve supported Hugging Face models, load fine-tuned weights, and extend MAX with custom architectures instead of staying inside a fixed catalog. SE001, SE013, SE016
CE006 Modular’s official product and docs pages frame MAX as hardware-agnostic and free from CUDA lock-in across diverse accelerator targets. SE001, SE013
CE007 Mammoth is presented as a Kubernetes-native public-preview orchestration layer for enterprise-scale GenAI serving. SE002, SE012
CE008 Mammoth’s control plane is described as automatically placing models according to performance needs, cluster state, and hardware capabilities. SE002
CE009 Mammoth publicly claims multi-model and multi-hardware orchestration plus intelligent auto-scaling across heterogeneous GPU fleets. SE002
CE010 Mammoth documents disaggregated inference that separates prompt prefill nodes from decode nodes for distributed optimization. SE002
CE011 Mammoth is marketed as enterprise-grade because it is built on Kubernetes with fault tolerance and observability patterns. SE002
CE012 Mojo is described as a kernel-focused systems language that combines Pythonic syntax with high-performance CPU and GPU programming features. SE013, SE021
CE013 Modular states that MAX’s kernels are written in Mojo and that Mojo can be used to extend MAX models with novel algorithms or custom operations. SE013, SE021, SE022
CE014 MAX’s model bring-up workflow centers on architecture packages that include arch.py, model_config.py, model.py, weight_adapters.py, and optional custom layers. SE016
CE015 MAX docs say many new checkpoints can reuse an existing reference architecture with only config overrides or weight-name remapping. SE016
CE016 The public bring-up docs show support for multiple weight formats including Safetensors and GGUF plus explicit handling for FP8 and FP4 quantized checkpoints. SE016
CE017 MAX documents speculative decoding as a native serving feature with EAGLE, EAGLE3, MTP, and standalone draft-model modes. SE017
CE018 For EAGLE and MTP, MAX reports a unified startup architecture because it compiles the target, draft, and verifier into a single graph. SE017
CE019 Structured output is not supported alongside speculative decoding in MAX, and --enable-echo is also excluded in that mode. SE017
CE020 Prefix caching is enabled by default in MAX and is implemented on top of PagedAttention-based KV-cache management. SE018
CE021 MAX docs say prefix caching works on both CPU and GPU and helps when requests share prefixes by improving TTFT and effective throughput. SE018
CE022 Structured output in MAX uses llguidance and supports either JSON schema or Pydantic-defined response contracts. SE019
CE023 MAX’s structured output feature is documented as GPU-only even though all text-generation models are intended to support it at the pipeline level. SE019
CE024 Modular’s managed cloud publicly offers serverless endpoints, dedicated endpoints, custom-model inference, and batch inference. SE015
CE025 In BYOC mode, Modular says the data plane stays inside the customer VPC while a Modular-operated control plane manages endpoint lifecycle, scaling, monitoring, and model registration. SE015
CE026 Modular’s BYOC docs claim support across AWS, GCP, Azure, and OCI with NVIDIA, AMD, and Apple Silicon targets. SE015
CE027 Modular includes forward-deployed engineers in its public cloud-deployment story for workload profiling, bottleneck analysis, and custom Mojo-kernel work. SE015
CE028 Modular 26.1 graduated the MAX Python API out of experimental with PyTorch-like eager mode and model.compile for production use. SE006, SE022
CE029 Modular 26.1 added compile-time reflection, linear types, typed errors, and better error messages to Mojo. SE006
CE030 Modular 25.6 added Apple Silicon GPU support and pip install mojo with a bundled compiler, LSP server, and debugger. SE007
CE031 MAX 25.2 added multi-GPU H100 and H200 support and promoted a 1.3 GB compressed slim serving container that avoids bundling CUDA. SE008
CE032 Modular 25.6 publicly claimed industry-leading performance on NVIDIA B200 and AMD MI355X with reproducible benchmarking scripts. SE007, SE023
CE033 Modular’s AMD partnership announcement said the platform became generally available across AMD’s MI300 and MI325 GPU portfolio. SE009
CE034 Modular’s MI355 bring-up post says rapid hardware enablement was possible because almost all of the stack is architecture-agnostic and only a small kernel subset needed updating. SE010
CE035 The structured-kernels series argues that Modular can keep a common kernel structure while progressively specializing TileIO, TilePipeline, and TileOp components per hardware target. SE010, SE023
CE036 Modular 26.3 announced a Mojo 1.0 beta, video generation in MAX with Wan 2.2, and a plan to finalize Mojo 1.0 later in 2026. SE005
CE037 Modular’s 2025 year-in-review post says Mammoth is intended to come to managed endpoints in 2026 while MAX kernels and the MAX Python API became open-source milestones in 2025. SE012
CE038 The main GitHub repository advertises nightly and stable release branches, monthly community meetings, and a public bug-report and contribution path. SE022, SE024
CE039 The GitHub repository says that as of May 2025 it included more than 450,000 lines of code from over 6,000 contributors. SE022
CE040 The modular package was distributed through PyPI as version 26.3.0 with a file upload date of May 7, 2026. SE025
CE041 Modular maintains a Meetup group for developers and AI practitioners interested in Mojo and the MAX platform. SE026, SE035, SE036
CE042 The Stack Overflow mojo-lang tag showed zero questions at fetch time, indicating that mainstream external Q-and-A footprint is still very early. SE027
CE043 Modular’s privacy policy says it uses technical, organizational, and administrative security measures but explicitly notes that no method of transmission or storage is completely secure. SE028
CE044 Modular provides a public issue-report workflow for safety, privacy, and security concerns that routes reports to its security team. SE030
CE045 Modular’s Acceptable Use Policy governs the MAX Platform, Modular Cloud, and AI-powered features and requires human review when outputs inform legal, medical, or financial advice. SE031
CE046 Modular’s Community License is contract-governed, permits telemetry usage, and requires approval for custom hardware use beyond supported targets. SE032
CE047 The Community License forbids reverse engineering the SDK and redistributing the SDK as a standalone component. SE032
CE048 Modular’s Terms of Service incorporate the privacy policy, acceptable-use policy, and community license into overall platform use. SE029
CE049 One independent ecosystem review argues that Mojo’s open standard library does not remove the compliance concern created by a still-closed MAX compiler for auditable toolchains. SE034
CE050 An independent 2026 benchmark review says MAX is compelling for dense models and hardware portability but that vLLM still remains the broader general-purpose production default. SE033
CU001 Modular's visible customer set splits across free self-serve developers, managed-cloud experimenters, latency-sensitive production buyers, compliance-sensitive BYOC buyers, AI-native workload operators, and cloud or channel counterparties. SU009, SU010, SU011, SU012, SU013, SU024, SU026
CU002 The Self Hosted edition is a free developer-acquisition funnel rather than public proof of paid customer breadth. SU009, SU016, SU026
CU003 Shared Endpoints are positioned for rapid experimentation and variable-traffic production with pay-per-token billing. SU009, SU011
CU004 Dedicated Endpoints are positioned for latency-sensitive production on reserved warm GPU capacity billed per minute. SU009, SU012
CU005 BYOC runs inference in the customer's VPC or on-prem environment while the customer keeps the hardware, data, and cloud credits. SU009, SU013
CU006 Across the public deployment surfaces, developers often start evaluations but infrastructure, security, or procurement owners become the real budget holders on Dedicated and BYOC deployments. SU009, SU011, SU012, SU013
CU007 Modular's customers page mixes genuine customer proof with partner and hardware-platform signaling, so logos and quotes on that page do not all carry the same evidentiary weight. SU001, SU006, SU007
CU008 Inworld is a real production customer proof point because both Modular and Inworld describe the same live text-to-speech deployment. SU002, SU025
CU009 The Inworld deployment is publicly associated with roughly 70% faster time to first audio, about 200 milliseconds to the first two seconds of audio, and an eventual price roughly 60% lower than a vanilla vLLM-based implementation. SU002, SU025
CU010 Modular says the Inworld engagement moved from start-of-engagement to production in less than eight weeks on NVIDIA Blackwell. SU002
CU011 Inworld's own blog says vLLM was not enough for production and that specialized APIs were needed to make real-time speech synthesis scalable and economical. SU025
CU012 Hippocratic AI is described as a live workload operator because its system contacts tens of thousands of patients daily and already runs production deployments across multiple frameworks. SU003
CU013 Hippocratic AI evaluated MAX against an existing SGLang deployment on 400B-plus-parameter models using NVIDIA B300 GPUs. SU003
CU014 Hippocratic AI's public evaluation metrics include sub-500ms mean TTFT, about 30% faster P99 end-to-end latency, and roughly 22% faster mean end-to-end latency. SU003
CU015 The Hippocratic material implies an ongoing collaboration and future heterogeneous-hardware strategy, which is stronger than a one-off benchmark but weaker than disclosed renewal evidence. SU003
CU016 AWS should be treated primarily as partner and channel proof rather than as direct diversified end-customer proof. SU007, SU014, SU015, SU024
CU017 Modular says MAX is being brought to AWS production services and quotes AWS framing the platform as helpful for millions of AWS customers. SU007
CU018 Modular's AWS case study says the MAX-on-AWS path spans 15-plus architectures, 500-plus models, 33-plus regions, and deployment across ECS, EKS, EC2, and AWS Batch. SU014
CU019 Modular's AWS Marketplace announcement says at least two Modular applications are available through AWS Marketplace with centralized AWS-account purchasing. SU015
CU020 SF Compute is a partner-led commercialization surface rather than direct end-customer proof. SU004, SU005
CU021 The SF Compute launch says the joint batch-inference API supports more than 20 models and offers free tokens to the first 100 new customers. SU004, SU005
CU022 Modular's Platform 25.5 post says Mammoth keeps over 90% cluster utilization in the large-scale batch-inference product, but that metric is a company claim without an external customer denominator. SU005
CU023 Modular's public top-of-funnel proxies include free self-hosted access, monthly community meetings, GitHub activity, and install flows that lower trial friction for developers. SU008, SU016, SU026
CU024 Modular says it has 10K's monthly downloads, 100K's developers in 100-plus countries, trillions of daily production tokens, and up to 70% latency reduction plus 80% cost reduction for partners and customers. SU008
CU025 Reuters says Modular serves cloud providers such as Oracle and Amazon, as well as chipmakers Nvidia and AMD, and plans to sell directly to enterprises and through revenue-sharing partnerships with cloud providers. SU024
CU026 Independent coverage repeatedly frames Inworld and SF Compute as the clearest named enterprise references while listing Oracle, AWS, Lambda Labs, and hardware vendors as ecosystem counterparties. SU019, SU020, SU021
CU027 BYOC is the clearest public enterprise-scale proof because it claims Fortune 500 scale and customer-controlled compliance boundaries, but it does not name the enterprise accounts. SU013
CU028 The reviewed public materials do not disclose customer count, NRR, GRR, churn, contract duration, or renewal schedule. SU001, SU009, SU013
CU029 The best public durability proxies are repeat co-engineering depth at Inworld and Hippocratic plus AWS procurement packaging, not explicit renewal or cohort data. SU002, SU003, SU014, SU025
CU030 The visible expansion loop runs from free self-hosted usage into Shared Endpoints, then Dedicated or BYOC production, and finally into custom engineering or channel procurement. SU009, SU011, SU012, SU013, SU015
CU031 Every paid deployment surface includes engineer involvement or optimization support, implying that account expansion depends partly on services attachment rather than pure self-serve software alone. SU009, SU011, SU012, SU013
CU032 Public customer proof is concentrated in four named reference accounts or channels—Inworld, Hippocratic AI, AWS, and SF Compute—rather than a broad list of independently corroborated end customers. SU001, SU002, SU003, SU004, SU014
CU033 The difference between strong customer proof and weak proof is visible on Modular's own surfaces, where named case studies sit alongside partner quotes and broad ecosystem mentions. SU001, SU007
CU034 Public sources do not disclose top-customer revenue share, partner-sourced bookings mix, or concentration by vertical. SU008, SU024
CU035 The strongest named end-market evidence is AI-native real-time voice and high-performance inference infrastructure, not a broad horizontal enterprise portfolio. SU002, SU003, SU025
CU036 Partner dependence is material because Modular's public customer story repeatedly routes through AWS Marketplace, cloud credits in BYOC, and named cloud-provider relationships. SU013, SU015, SU024
CU037 CUDA lock-in and scarce high-end GPU supply raise switching costs for customers considering alternatives to incumbent AI infrastructure stacks. SU023
CU038 Independent coverage frames the main strategic question as whether Modular can outpace hyperscalers and chip giants, which reinforces the distribution and adoption risk around customer expansion. SU022
CU039 Public mentions of Oracle and Lambda prove ecosystem or cloud-counterparty relationships more clearly than they prove direct paying-customer status. SU006, SU018, SU024
CU040 Inworld and Hippocratic AI are the clearest production-grade proof points, whereas AWS and SF Compute are stronger as channel proof and unnamed enterprise-scale claims remain lower-grade evidence. SU002, SU003, SU004, SU014, SU001
CU041 Modverse and a public YouTube talk show Modular publicly linking Inworld and Oracle around OCI and GPU portability, but without disclosing a direct Oracle contract scope or buyer identity. SU006, SU017
CU042 Fortune 500 scale and trillion-token claims are useful leads for diligence, but without named accounts or denominators they cannot substitute for customer-count or renewal disclosure. SU001, SU008, SU013
CR001 The public privacy policy was updated on 2026-02-04. SR001
CR002 Modular's privacy policy states that it governs the privacy rights attached to its platform, websites, and services. SR001
CR003 Modular says it retains personal data while an account remains open or as otherwise necessary for services and business purposes, and it also states that internet transmission and storage are not completely secure. SR001
CR004 The company directs safety, privacy, and security issues to a security-team intake flow instead of the normal GitHub bug channel. SR003
CR005 The public terms allow service suspension and disclaim liability for losses or damages that result from a suspension. SR002
CR006 The public terms also disclaim responsibility for accuracy, availability, errors, and related consequences of platform use, while requiring user indemnification. SR002
CR007 Modular publicly markets its paid offering as SOC 2 Type 2 certified. SR006, SR008
CR008 The company publicly differentiates commercial risk transfer by billing shared endpoints per token, dedicated endpoints per minute, and BYOC deployments per minute in the customer's cloud. SR006, SR010, SR011, SR008
CR009 BYOC keeps inference inputs and outputs inside the customer network while the control plane stays outside the VPC. SR008
CR010 BYOC relies on BentoCloud-proven infrastructure automation and supports AWS, GCP, Azure, and OCI while using the customer's own cloud credits and reservations. SR008
CR011 Shared endpoints are marketed as a no-minimum, scale-to-zero offering where NVIDIA-versus-AMD choice is positioned as a pricing and availability lever. SR010
CR012 Dedicated endpoints are marketed as always-warm reserved GPU capacity bundled with forward-deployed engineers. SR011
CR013 Modular says custom models can be compiled from one codebase across NVIDIA, AMD, Apple Silicon, and ARM targets. SR012
CR014 The company says Chris Lattner and Tim Davis founded Modular in 2022 to simplify fragmented AI infrastructure. SR004
CR015 The About page lists offices in San Francisco, Los Altos, Boston, and Edinburgh and names leaders across engineering, finance, product, and special projects. SR004
CR016 The careers page shows active hiring and emphasizes distributed computation and low-level GPU kernel work, which supports the view that expert systems talent remains central to execution. SR005
CR017 Core modules from the Mojo standard library were released under an Apache 2 license. SR013
CR018 Modular says Mojo 1.x will use semantic versioning and stable interfaces, but it also warns that future roadmap phases will introduce source-breaking changes on the path to Mojo 2.0. SR014
CR019 Modular's 2026 product materials tie its current value proposition to support for NVIDIA Blackwell, AMD MI355X, and Apple GPU targets. SR015, SR016
CR020 The GTC 2026 post shows Modular publicly demoing Blackwell/B200 workloads and states that its kernel code is open source in the modular/max repository. SR016
CR021 Independent and company sources agree that Modular raised $250 million in 2025, bringing total capital raised to $380 million at a $1.6 billion valuation. SR019, SR032, SR033
CR022 The same funding coverage says Modular had grown to more than 130 people and was seeing strong demand from enterprises and hardware partners. SR019, SR032
CR023 Modular claims that its platform is downloaded 10Ks of times per month, powers trillions of tokens served daily, and has a developer ecosystem spanning 100+ countries. SR019
CR024 Modular and AWS present MAX on AWS as a way to exploit Graviton CPUs with claimed performance and cost benefits, which also deepens the company's AWS distribution tie. SR020
CR025 The AWS case study says Modular packages 15+ CPU/GPU architectures, 500+ models, and 33+ regions across AWS deployment surfaces. SR021
CR026 The AWS case study identifies hardware complexity, vendor lock-in, deployment/scaling friction, and OpenAI-API migration effort as the buyer pain points Modular is trying to solve. SR021
CR027 The AWS Marketplace AI-agents page advertises enterprise-grade SLA-backed support. SR022
CR028 DOJ's Data Security Program became effective on 2025-04-08, and certain due-diligence, audit, annual-report, and rejected-transaction reporting requirements for restricted transactions became effective on 2025-10-05. SR023, SR024
CR029 DOJ says the program prohibits or restricts certain transactions that could give countries of concern or covered persons access to U.S. government-related data or Americans' bulk sensitive personal data. SR023, SR024
CR030 The DOJ compliance guide frames the program as a proactive response to foreign-adversary access to Americans' sensitive data, implying a real compliance burden for data-handling AI infrastructure vendors. SR024
CR031 BIS states that a license is required to export advanced computing items to entities headquartered in Country Group D:5 and Macau. SR025
CR032 NIST's Cyber AI Profile draft provides guidance for managing cybersecurity risk related to AI systems across Secure, Defend, and Thwart focus areas. SR026, SR027
CR033 NCSL's database shows that state AI legislation spans private-sector use, employment, health, responsible use, discrimination, and provenance topics. SR028
CR034 Troutman says its state AI law tracker focuses on laws that directly or indirectly affect private-sector AI development and deployment. SR029
CR035 AlphaStreet argues that NVIDIA's moat in AI accelerators remains anchored in CUDA lock-in that is deeply embedded across development and production workflows. SR030
CR036 The same analysis argues that supply scarcity makes time to usable compute a premium and disadvantages firms that are outside priority supply lists. SR030
CR037 NVIDIA says MGX is an open modular reference architecture that helps OEMs, ODMs, and ecosystem partners build accelerated systems faster with multi-generational compatibility. SR031
CR038 CoreWeave's S-1/A says it works with NVIDIA to deploy the latest GPU technologies at scale, illustrating how AI infrastructure vendors can become tightly coupled to NVIDIA's supplier ecosystem. SR034
CR039 Independent funding coverage corroborates Modular's pitch that the company is building a unified compute layer across heterogeneous hardware rather than a single-vendor point solution. SR032, SR033
CR040 Modular's public customer proof is concentrated in a relatively small set of named references, with Inworld and AWS materially more visible than a broad roster of disclosed enterprise accounts. SR017, SR018, SR021
CR041 The Inworld case study claims roughly 70% faster first audio, about 200ms latency for the first two seconds, and an eventual price roughly 60% lower than a vanilla vLLM path. SR018, SR017
CR042 Across dedicated, shared, and BYOC materials, Modular repeatedly positions forward-deployed engineers as part of the product rather than only as post-sale support. SR008, SR010, SR011
CR043 No reviewed public source in this pack discloses Modular's revenue, ARR, gross margin, burn, or runway. SR019, SR032, SR033, SR006
CR044 No reviewed public source in this pack discloses customer count, renewal behavior, NRR, or concentration by account, hardware partner, or cloud partner. SR017, SR019, SR021
CR045 No reviewed public source in this pack discloses a full board roster, formal succession plan, or named replacement depth for the founder leadership. SR004, SR005, SR019
CR046 No reviewed public source in this pack provides a public incident register, uptime history, or scope-level SOC 2 report for the paid platform. SR003, SR006, SR022
CR047 BYOC materially mitigates data-residency and data-leakage concerns by keeping inference inside the customer cloud, but the external control plane means shared-responsibility boundaries still matter. SR008, SR006, SR024
CR048 State AI-law proliferation plus DOJ Part 202 together create a moving compliance perimeter for AI infrastructure vendors serving regulated workloads. SR023, SR028, SR029, SR032
CR049 Multi-vendor GPU portability reduces but does not eliminate dependence on NVIDIA roadmaps, supply conditions, and ecosystem standards because Modular still markets Blackwell performance and operates inside NVIDIA-linked partner ecosystems. SR015, SR016, SR030, SR031
CR050 AWS Marketplace and cloud-credit procurement reduce buying friction, but they also increase channel dependence on hyperscaler partner programs and marketplace economics. SR020, SR021, SR022, SR008
CR051 Modular's public security posture looks more mature on control marketing than on transparency because the company markets SOC 2 Type 2 and VPC/BYOC controls but does not publish comparable detail on incident history or audit scope. SR006, SR008, SR022, SR003
CR052 Product and platform roadmap risk remains material because Modular is simultaneously expanding open-source Mojo, managed inference, custom kernels, and multi-vendor hardware support. SR013, SR014, SR015, SR016
CR053 Headcount growth helps, but the repeated reliance on forward-deployed engineers implies that talent density can still become the gating factor for enterprise delivery quality. SR005, SR019, SR010, SR011
CR054 Fresh capital mitigates near-term solvency risk, but the absence of public unit-economics disclosure means valuation and execution expectations still outrun what outside investors can verify. SR019, SR032, SR033
CV001 Modular said in September 2025 that it raised $250 million in a third financing round, bringing total capital raised to $380 million at a $1.6 billion valuation. SV001, SV004, SV006
CV002 SDxCentral and the company both described the 2025 round as nearly tripling Modular's prior valuation. SV001, SV004
CV003 TechCrunch and GV documented an earlier $100 million 2023 financing round for Modular. SV002, SV003
CV004 Reuters framed Modular's mission as challenging NVIDIA's software stranglehold by building a unified compute layer across heterogeneous hardware. SV006, SV001
CV005 Modular said it had grown to more than 130 people by the 2025 financing announcement. SV001
CV006 Modular claimed its platform was being downloaded tens of thousands of times per month, serving trillions of tokens daily, and reaching developers in more than 100 countries. SV001, SV004
CV007 Those traction proxies are usage and ecosystem claims rather than disclosed revenue, ARR, or retention metrics. SV001, SV017, SV022
CV008 None of the reviewed public sources disclosed Modular's revenue, ARR, gross margin, burn, NRR, or customer concentration. SV001, SV016, SV017, SV022
CV009 Modular's pricing surfaces reveal billing mechanics but not actual minute-rate cards, realized discounts, or margin data. SV016, SV024, SV025
CV010 Modular's pricing page says managed cloud offers charge per token or per minute and support committed-use or volume pricing. SV016, SV024, SV025
CV011 Every paid tier includes forward-deployed engineers, making services intensity part of the commercial model rather than an edge case. SV016, SV025, SV026
CV012 Modular says BYOC keeps inference inputs and outputs inside the customer VPC while the control plane remains outside that VPC and the customer keeps its cloud credits. SV023, SV016
CV013 Shared Endpoints and related managed surfaces are marketed as OpenAI-compatible, which lowers integration friction but does not itself prove durable retention. SV024, SV016
CV014 Inworld said MAX improved time to first audio by about 70% and enabled an eventual API price roughly 60% lower than its vanilla vLLM-based path. SV018, SV021
CV015 Hippocratic AI said its production system contacts tens of thousands of patients daily and that MAX delivered sub-500ms mean TTFT in evaluation against an existing SGLang deployment on 400B+ models. SV032
CV016 Public customer proof is concentrated in a small number of named reference accounts rather than a disclosed broad enterprise roster. SV017, SV018, SV021, SV032
CV017 Modular's open-source and developer surfaces show Apache 2 licensing, public CI, nightly or stable releases, and scheduled community meetings. SV019, SV020, SV030, SV031
CV018 The Business Research Company estimates the AI infrastructure market at $90.91 billion in 2026 and $226.95 billion by 2030. SV012
CV019 Fortune Business Insights estimates the AI inference market at $117.80 billion in 2026 and $312.64 billion by 2034. SV013
CV020 Independent inference-engine reviews describe vLLM, SGLang, TensorRT-LLM, and related stacks as credible established alternatives, so Modular competes in a crowded benchmark-driven field. SV014, SV015
CV021 Spheron's comparison positions MAX as one engine among several established options rather than an uncontested market standard. SV014
CV022 NVIDIA's MGX program and annual report show how the incumbent can deepen OEM, system, and software lock-in around its own platform stack. SV011, SV009
CV023 AlphaStreet argued that CUDA lock-in and supply scarcity make NVIDIA's AI moat harder to break than it may initially appear. SV010
CV024 CoreWeave's S-1/A shows that explosive AI-infrastructure growth can coexist with substantial capital expenditure needs, leverage, and concentration risk. SV008
CV025 CoreWeave disclosed $1.9 billion of 2024 revenue, $15.1 billion of remaining performance obligations, and Microsoft as 62% of 2024 revenue, illustrating the scale-concentration trade-off in AI infrastructure. SV008
CV026 NVIDIA's 2026 annual report reinforces that AI infrastructure competition is fought against hyperscalers and integrated platform vendors with far larger ecosystems and budgets than Modular. SV009, SV011
CV027 Together AI announced a $305 million Series B in 2025, and Sacra reports that round carried a $3.3 billion valuation. SV033, SV037
CV028 Sacra estimates Together AI reached a $1 billion annualized revenue run-rate in February 2026 and says its prior $1.25 billion valuation represented about 9.6x 2024 revenue. SV037
CV029 Groq announced $750 million of new financing at a $6.9 billion post-money valuation in September 2025. SV034
CV030 Lambda announced over $1.5 billion of Series E funding in November 2025, and Tech Funding News reported a prior $480 million Series D at a $4 billion valuation. SV035, SV036
CV031 Cerebras announced a $1.1 billion Series G at an $8.1 billion valuation in September 2025. SV038
CV032 Relative to scarce-infrastructure peers like Groq, Together AI, Lambda, and Cerebras, Modular's $1.6 billion mark is smaller in absolute terms but still difficult to underwrite because its revenue base is undisclosed. SV001, SV033, SV034, SV035, SV037, SV038
CV033 At a $1.6 billion valuation, Modular would need roughly $160 million of annual revenue to trade at 10x revenue, about $200 million at 8x, and about $267 million at 6x. SV001, SV037
CV034 Public evidence is insufficient to know whether Modular already clears any of those revenue thresholds. SV001, SV016, SV017, SV022
CV035 The price-sensitive public recommendation is therefore research-more rather than buy, because private revenue, margin, retention, and preference data are still missing. SV001, SV016, SV017, SV022, SV037
CV036 The current $1.6 billion mark is only attractive if Modular combines very fast growth with software-like margins and broader enterprise durability than the public sources presently show. SV001, SV018, SV021, SV032, SV037
CV037 Because paid offerings mix token APIs, minute-priced reserved capacity, BYOC control planes, and engineering-heavy optimization work, the gross-margin profile could look either software-like or services-heavy depending on usage mix. SV016, SV023, SV024, SV025, SV026
CV038 The cleanest anti-thesis is that Modular scales like a high-touch optimization vendor rather than a broadly self-serve software platform. SV016, SV025, SV026, SV032
CV039 A credible bull case requires continued benchmark leadership across NVIDIA and AMD, successful enterprise conversion of the open-source funnel, and private disclosure that revenue is already high enough to justify a premium multiple. SV001, SV014, SV018, SV029, SV037
CV040 A credible base case assumes strong market growth and real customer pull, but also continued opacity on revenue quality and some multiple compression across the AI infrastructure category. SV012, SV013, SV016, SV017, SV037
CV041 A credible bear case assumes NVIDIA-centric incumbents and open-source alternatives narrow Modular's differentiation before the company proves software-quality economics. SV010, SV011, SV014, SV015, SV023
CV042 There is no public evidence yet of IPO preparation, audited recurring-metrics disclosure, or a cap-table and preference stack that outside investors can model. SV001, SV022, SV037
CV043 The final diligence agenda should prioritize current revenue or ARR, gross margin by product surface, cohort retention, customer concentration, cap table and preferences, and org mix between product and forward-deployed engineering. SV016, SV017, SV022, SV025
CV044 A more constructive stance would require either a lower entry price or private diligence proving roughly $150-250 million of revenue with durable margins and manageable concentration. SV001, SV037, SV012, SV013
CV045 A more negative stance would be warranted if the next financing is flat or down, if reference customers fail to expand, or if performance portability advantages erode against better-capitalized rivals. SV001, SV010, SV018, SV021, SV029, SV032
CV046 Official competitor rounds and market reports show capital is still pouring into AI infrastructure winners, which creates both upside optionality and valuation risk for investors who buy before economics are disclosed. SV029, SV030, SV031, SV034, SV035, SV038, SV039, SV040, SV012, SV013
来源
编号出版方标题引文
SO001 Modular Modular: About Us Chris Lattner & Tim Davis met at Google. Frustrated by AI’s fragmented infrastructure and determined to accelerate AI’s global impact, they founded Modular, headquartered in Silicon Valley.
SO002 Modular Modular raises $250M to scale AI’s unified compute layer This brings its total capital raised to $380M across three rounds since its founding in 2022 and values Modular at $1.6 billion.
SO003 Modular Modular opens Edinburgh and San Francisco offices We have also opened a new office in San Francisco’s Jackson Square neighborhood, joining our Los Altos headquarters as our second Bay Area location.
SO004 Modular Mojo: local download launch post Since our launch of the Mojo programming language on May 2nd, more than 120K+ developers have signed up to use the Mojo Playground and 19K+ developers actively discuss Mojo on Discord and GitHub.
SO005 Modular The next big step in Mojo open source We are thrilled to announce the release of the core modules from the Mojo standard library under the Apache 2 license!
SO006 Modular The path to Mojo 1.0 We feel confident that Mojo will get to 1.0 sometime in 2026. This will also allow us to open source the Mojo compiler as promised.
SO007 Modular Modular 26.3: Mojo 1.0 beta, MAX video generation, and more Mojo 1.0 is officially in beta.
SO008 Modular Introducing Mammoth Mammoth is a distributed AI serving tool designed for enterprise-scale deployment.
SO009 Modular Modular partners with AWS to bring MAX to AWS services The MAX Platform is able to supercharge Graviton CPUs, executing your AI models with up to 5X higher performance and up to 80% cost savings out-of-the-box.
SO010 Modular Modular x AMD: unleashing AI performance on AMD GPUs Effective immediately, developers can deploy the Modular Platform on AMD’s flagship datacenter accelerators, including the MI300 and MI325 series.
SO011 Modular Modular: Customer Success Stories Enterprise innovation, supercharged by Modular.
SO012 Modular Modular: Editions & Pricing Free Forever. The full power of MAX and Mojo - free for all developers.
SO013 Modular Modular: Careers Our onboarding process for new employees is conducted onsite at our Los Altos, CA office.
SO014 Modular Modular: Privacy Policy
SO015 Modular Modular: Terms of Service Modular hereby grants you a right to access and use the Modular Platform on a non-exclusive, non-transferable, and non-sublicensable basis.
SO016 GitHub GitHub - modular/modular: The Modular Platform (includes MAX & Mojo) Modular raises $250M to scale AI’s unified compute layer, bringing Modular’s total raise to $380M at a $1.6B valuation.
SO017 MojoLang Mojo Stable: 1.0.0b1 (May 7) | Latest nightly Jun 11
SO018 TechCrunch Modular raises $100M for AI dev tools Modular, a startup creating a platform for developing and optimizing AI systems, has raised $100 million in a funding round led by General Catalyst.
SO019 The SaaS News Modular Raises $100 Million in Funding The round was led by General Catalyst, with participation from GV (Google Ventures), SV Angel, Greylock, and Factory.
SO020 GV Why GV invested in Modular We are leading the first round of funding for Modular, investing alongside Greylock and Factory.
SO021 SDxCentral Modular raises $250M for AI’s unified compute layer at $1.6B valuation The Palo Alto, California-based company’s latest round was led by Thomas Tull’s U.S. Innovative Technology fund.
SO022 SiliconANGLE Modular raises $250M to simplify AI deployment across hardware
SO023 Yahoo Finance / Reuters AI startup Modular raises $250 million at $1.6 billion valuation The company, with about 130 employees, plans to use the new capital to expand its engineering and go-to-market team.
SO024 Sacra Modular valuation, funding & news The company previously raised a $100 million Series B in August 2023 at approximately a $600 million valuation. Before that, Modular secured a $30 million seed round in June 2022.
SO025 GitHub Is mojo open source / free? · Issue #25 · modular/modular Reason for asking is to prevent future lock-ins (people migrating away from python and finding themselves with a limited version or having to pay for mojo).
SM001 Modular Modular Raises $250M to scale AI's Unified Compute Layer
SM002 Modular Modular: Shared Endpoints, Our Cloud, Any GPU
SM003 Modular Modular: Your Cloud, Our Engineers, Any GPU
SM004 Modular Modular: Our Cloud
SM005 Modular Faster agentic AI systems on any hardware
SM006 Modular Human-sounding text-to-speech on any hardware
SM007 Modular Faster AI coding infrastructure on any hardware
SM008 Modular AI Model Library, Deploy Open-Source LLMs & Image Models | Modular
SM009 Modular Modular: Customer Success Stories
SM010 Modular Modular: Editions & Pricing
SM011 Cloud Native Computing Foundation Kubernetes Established as the De Facto Operating System for AI as Production Use Hits 82% in 2025 CNCF Annual Cloud Native Survey
SM012 Cloud Native Computing Foundation Welcome llm-d to the CNCF: Evolving Kubernetes into SOTA AI infrastructure
SM013 Google Cloud llm-d officially a CNCF Sandbox project
SM014 Forbes AI Inference Takes Center Stage At KubeCon Europe 2026
SM015 ONNX Runtime ONNX Runtime | Home
SM016 ONNX Runtime ONNX Runtime for Inferencing
SM017 ONNX Runtime Execution Providers | onnxruntime
SM018 LLVM Project LLVM - MLIR
SM019 GitHub llm-d/llm-d repository
SM020 GitHub microsoft/onnxruntime repository
SM021 Phoronix MLIR-AIE 1.3 Released For AMD-Xilinx AI Engines / Ryzen AI NPUs
SM022 The Business Research Company Global AI Infrastructure Market Report 2026
SM023 Technavio AI Inference Hardware Market Industry Analysis
SM024 Fortune Business Insights AI Inference Market
SM025 AlphaStreet Nvidia's CUDA Lock-In and Supply Scarcity Make Its AI Chip Moat Harder to Break Than It Looks
SM026 NVIDIA MGX Platform for Modular Server Design | NVIDIA
SP001 Modular MAX: A high-performance inference framework for AI
SP002 Modular Modular: Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple
SP003 Modular Modular: Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200
SP004 Modular Modular: MAX 25.2: Unleash the power of your H200's–without CUDA!
SP005 Modular Modular: Modular Raises $250M to scale AI's Unified Compute Layer
SP006 vLLM vLLM
SP007 vLLM Project GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
SP008 SGLang Welcome to SGLang - SGLang Documentation
SP009 SGLang Project GitHub - sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
SP010 NVIDIA Welcome to TensorRT LLM’s Documentation! — TensorRT LLM
SP011 NVIDIA GitHub - NVIDIA/TensorRT-LLM: TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
SP012 Ray Scalable and Programmable Serving — Ray 2.55.1
SP013 Anyscale Production-scale AI with Ray | Anyscale
SP014 Together AI Together AI | The AI Native Cloud
SP015 Together AI Pricing | Together AI
SP016 Hugging Face Text Generation Inference · Hugging Face
SP017 Hugging Face GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference
SP018 Spheron Modular MAX and Mojo on GPU Cloud: Deploy an LLM Inference Engine That Outperforms vLLM (2026 Guide) | Spheron Blog
SP019 Yotta Labs Best LLM Inference Engines (2026): vLLM, SGLang & TensorRT-LLM | Yotta Labs
SP020 Kanerika 10 Best vLLM Alternatives for AI Inference in 2026
SP021 Future AGI Best 5 OctoML Alternatives for LLM Inference in 2026
SP022 AlphaStreet Nvidia’s CUDA Lock-In and Supply Scarcity Make Its AI Chip Moat Harder to Break Than It Looks
SP023 NVIDIA NVIDIA MGX Platform
SP024 ONNX Runtime ONNX Runtime
SP025 llm-d llm-d - Kubernetes-Native Distributed LLM Inference with vLLM | llm-d
SI001 Modular Modular: Editions & Pricing
SI002 Modular Modular: Shared Endpoints, Our Cloud, Any GPU
SI003 Modular Modular: Dedicated Endpoints
SI004 Modular Modular: Your Cloud, Our Engineers, Any GPU
SI005 Modular Modular: Our Cloud
SI006 Modular Modular: Custom Models
SI007 Modular Modular: Customer Success Stories Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation... eventually offer the API at a ~60% lower price than would have been possible without using Modular's stack.
SI008 Modular Modular: About Us
SI009 Modular Modular: Careers
SI010 Modular Modular: Modular Raises $250M to scale AI's Unified Compute Layer Its platform is being downloaded 10K’s of times per month... powers trillions of tokens served daily in production... delivered up to 70% latency reduction and 80% cost reductions for their partners and customers.
SI011 Modular Modular: Modular partners with Amazon Web Services (AWS) to bring MAX to AWS services The MAX Platform is able to supercharge Graviton CPUs, executing your AI models with up to 5X higher performance and up to 80% cost savings out-of-the-box when compared with existing AI infrastructure.
SI012 Modular Modular: AWS Case Study Through AWS Marketplace, organizations gain access to standard support for deployment and configuration, enterprise premium support for large-scale implementations, and professional services for custom optimization and integration.
SI013 Modular Modular: AI Agents for AWS Marketplace Customers can now use AWS Marketplace to easily discover, purchase, and deploy AI agent solutions... with centralized purchasing using AWS accounts, customers maintain visibility and control over licensing, payments, and access through AWS.
SI014 Modular Modular: MAX
SI015 TechCrunch Modular raises $100M for AI dev tools
SI016 GV Modular AI
SI017 SDxCentral Modular raises $250M for AI's unified compute layer at $1.6B valuation
SI018 Yahoo Finance / Reuters AI startup Modular raises $250 million at $1.6 billion valuation It plans to sell the software directly to enterprises on a consumption basis and through revenue-sharing partnerships with cloud providers.
SI019 SiliconANGLE Modular raises $250M to simplify AI deployment across hardware
SI020 Sacra Modular
SI021 Securities and Exchange Commission S-1/A
SI022 AlphaStreet Nvidia's CUDA lock-in and supply scarcity make its AI chip moat harder to break than it looks More than 4 million developers have registered for CUDA and over 40,000 organizations use CUDA-accelerated applications.
SI023 NVIDIA NVIDIA MGX
SI024 The Business Research Company AI Infrastructure Market Report 2026
SI025 Fortune Business Insights AI Inference Market
SI026 AWS Marketplace Modular seller profile on AWS Marketplace
SI027 AWS Marketplace Modular Platform: High-Performance GenAI Serving listing
SI028 AWS Marketplace Modular Platform: Code Repo Agent listing
SE001 Modular MAX: A high-performance inference framework for AI MAX doesn't depend on PyTorch, CUDA, or ROCm, so there's nothing to bundle, patch, or keep in sync.
SE002 Modular Modular: Introducing Mammoth: Enterprise-Scale GenAI Deployments Made Simple Mammoth's intelligent control plane sets it apart—it acts as the brain of your AI infrastructure, automatically optimizing model placement based on performance needs, cluster state, and hardware capabilities.
SE003 Modular Modular: The path to Mojo 1.0
SE004 Modular Modular: The Next Big Step in Mojo Open Source
SE005 Modular Modular: Modular 26.3: Mojo 1.0 Beta, MAX Video Gen, and more
SE006 Modular Modular: Modular 26.1: A Big Step Towards More Programmable and Portable AI Infrastructure
SE007 Modular Modular: Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple
SE008 Modular Modular: MAX 25.2: Unleash the power of your H200's–without CUDA!
SE009 Modular Modular: Modular + AMD: Unleashing AI performance on AMD GPUs
SE010 Modular Modular: Achieving State-of-the-Art Performance on AMD MI355 — in Just 14 Days Because 99.9% of the stack is architecture-agnostic, adding support for a new GPU mostly involves updating a few kernels.
SE011 Modular Modular: AI Agents for AWS Marketplace
SE012 Modular Modular: 2025 Year in Review
SE013 Modular Docs What is Modular | Modular
SE014 Modular Docs Quickstart | Modular
SE015 Modular Docs Cloud deployments with Modular | Modular
SE016 Modular Docs Model bring-up workflow | Modular
SE017 Modular Docs Speculative decoding | Modular
SE018 Modular Docs Prefix caching with PagedAttention | Modular
SE019 Modular Docs Structured output | Modular
SE020 Modular Docs Supported models | Modular
SE021 Mojo Mojo
SE022 GitHub GitHub - modular/modular: The Modular Platform (includes MAX & Mojo)
SE023 GitHub Releases · modular/modular
SE024 GitHub Issues · modular/modular
SE025 Python Package Index modular
SE026 Meetup Modular Meetup Group | Meetup
SE027 Stack Overflow Newest 'mojo-lang' Questions
SE028 Modular Modular: Privacy Policy
SE029 Modular Modular: Terms of Service
SE030 Modular Modular: Report Issue
SE031 Modular Modular: Acceptable Use Policy
SE032 Modular Modular: Community License
SE033 Spheron Network Modular MAX and Mojo on GPU Cloud: Deploy an LLM Inference Engine That Outperforms vLLM Use MAX if you serve dense models at high concurrency on NVIDIA or AMD hardware and want kernel-level control without writing CUDA C++.
SE034 krun.pro Mojo Ecosystem 2026: Infrastructure, Libraries, and the MAX Engine The closed compiler is a real compliance consideration — especially for teams with build toolchain auditability requirements.
SE035 YouTube Modular - YouTube
SE036 Discord Modular
SU001 Modular Modular: Customer Success Stories Modular delivers high-speed inference, cross-architecture flexibility, and SLA-backed reliability—so your teams can innovate faster and scale without surprises.
SU002 Modular Modular: Inworld Case Study Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation, at just 200ms for 2 second chunks.
SU003 Modular Modular: Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations MAX achieved approximately 30% faster P99 end-to-end latency in the evaluation for a critical dense production model.
SU004 Modular Modular: SF Compute and Modular Partner to Revolutionize AI Inference Economics At launch, it supports 20+ state-of-the-art models across language, vision, and multimodal domains.
SU005 Modular Modular: Modular Platform 25.5: Introducing Large Scale Batch Inference Mammoth continuously distributes jobs across GPU clusters using an optimized scheduler to maintain over 90% utilization of cluster resources.
SU006 Modular Modular: Modverse #51: Modular x Inworld x Oracle, Modular Meetup Recap and Community Projects Modular x Inworld x Oracle. See how we helped Inworld slash TTS costs by 70% and boosted performance 4x by partnering them and Oracle Cloud.
SU007 Modular Modular: Modular partners with Amazon Web Services (AWS) to bring MAX to AWS services The MAX Platform supercharges this mission for our millions of AWS customers.
SU008 Modular Modular: Modular Raises $250M to scale AI's Unified Compute Layer Its platform is being downloaded 10K’s of times per month ... powers trillions of tokens served daily in production ... and has 100K’s of developers in their ecosystem across more than 100 countries.
SU009 Modular Modular: Editions & Pricing Free ... Per token (shared) Per minute (dedicated) ... Per minute deployed. Use your AWS/GCP/Azure credits and commits.
SU010 Modular Modular: About Us The Modular Platform unifies AI under a single framework, offering text, audio, and image inference - all with the state-of-the-art performance that you can deploy with shared endpoints, dedicated endpoints, in your cloud or ours, and with custom models.
SU011 Modular Modular: Shared Endpoints, Our Cloud, Any GPU Shared endpoints scale to zero when idle and burst to meet demand - no reserved capacity, no minimum spend.
SU012 Modular Modular: Dedicated Endpoints Reserved GPU capacity dedicated to your workloads. Simple per-minute billing that makes cost forecasting straightforward.
SU013 Modular Modular: Your Cloud, Our Engineers, Any GPU Already running at scale for Fortune 500 companies.
SU014 Modular Modular: AWS Case Study 15+ CPU+GPU Architectures ... 500+ Models ... 33+ Geographic Regions.
SU015 Modular Modular: AI Agents for AWS Marketplace Customers can now use AWS Marketplace to easily discover, purchase, and deploy AI agent solutions ... all using their AWS accounts.
SU016 Modular MAX: A high-performance inference framework for AI Build once, deploy anywhere with a single programmable stack for high-performance GenAI on any hardware.
SU017 YouTube Modular x Inworld x Oracle - YouTube Modular x Inworld x Oracle.
SU018 Lambda For Superintelligence | Lambda Purpose-built AI factories for frontier workloads.
SU019 SDxCentral Modular raises $250M for AI's 'unified compute layer' at $1.6B valuation Modular's approach has earned it an array of partners, including enterprises like Inworld and SF Compute, research teams like Jane Street, and cloud giants including Oracle, Amazon Web Services (AWS), Lambda Labs, and TensorWave.
SU020 SiliconANGLE Modular raises $250M to simplify AI deployment across hardware Modular's approach has earned it an array of partners, including enterprises like Inworld and SF Compute, research teams like Jane Street, and cloud giants including Oracle, Amazon Web Services (AWS), Lambda Labs, and TensorWave.
SU021 Verdict Modular secures $250m to expand unified AI platform Its client and partner ecosystem spans enterprises such as Inworld and SF Compute, research teams such as Jane Street, cloud service providers including Oracle, Amazon Web Services, Lambda Labs, and Tensorwave, and hardware manufacturers such as AMD and Nvidia.
SU022 Business-News-Today.com Modular bags $250m to build AI’s “hypervisor” — but can it outpace Institutional sentiment acknowledges the risks — from competing initiatives by hyperscalers to the challenge of sustaining performance leadership.
SU023 AlphaStreet Nvidia’s CUDA Lock-In and Supply Scarcity Make Its AI Chip Moat Harder to Break Than It Looks It is easier for teams to stay on the same stack than to migrate, especially when migration introduces schedule and operational risk.
SU024 Yahoo Finance / Reuters AI startup Modular raises $250 million, seeks to challenge Nvidia dominance It plans to sell the software directly to enterprises on a consumption basis and through revenue-sharing partnerships with cloud providers.
SU025 Inworld TTS at Scale: Why vLLM Wasn't Enough for Production We’ve partnered with Modular to supercharge Inworld TTS, combining our state-of-the-art voice quality with Modular's world-class serving stack to deliver breakthrough speed and affordability for every developer.
SU026 GitHub GitHub - modular/modular: The Modular Platform (includes MAX & Mojo) As of May, 2025, this repo includes over 450,000 lines of code from over 6000 contributors.
SR001 Modular Privacy Policy We retain Personal Data about you for as long as you have an open account with us or as otherwise necessary to provide you with our Services.
SR002 Modular Terms of Service The Modular Parties will not be responsible or liable for the accuracy, availability, occurrence of errors, copyright compliance, legality, or decency of material contained in or accessed through the Platform.
SR003 Modular Report Issue If you instead found an ordinary bug (not a safety/privacy/security issue), please instead report it here on GitHub.
SR004 Modular About Us Chris Lattner and Tim Davis met at Google ... they founded Modular, headquartered in Silicon Valley.
SR005 Modular Careers
SR006 Modular Editions & Pricing Security & Compliance SOC 2 Type 2 certified.
SR007 Modular MAX: A high-performance inference framework for AI
SR008 Modular Your Cloud, Our Engineers, Any GPU Inference inputs and outputs never leave your network.
SR009 Modular Our Cloud
SR010 Modular Shared Endpoints, Our Cloud, Any GPU Choose the GPU that fits your workload's price-performance profile. MAX compiles natively for both NVIDIA and AMD.
SR011 Modular Dedicated Endpoints Reserved GPU capacity dedicated to your workloads. Simple per-minute billing that makes cost forecasting straightforward.
SR012 Modular Custom Models The MLIR compiler handles the rest - generating optimized code for NVIDIA, AMD, Apple Silicon, and ARM CPUs from a single source.
SR013 Modular The Next Big Step in Mojo Open Source We're thrilled to announce the release of the core modules from the Mojo standard library under the Apache 2 license.
SR014 Modular The path to Mojo 1.0 There are some important language features ... that will introduce breaking changes to the language and standard library.
SR015 Modular Modular 25.6: Unifying the latest GPUs from NVIDIA, AMD, and Apple The platform now delivers peak performance on NVIDIA Blackwell (B200) GPUs ... and AMD MI355X GPUs.
SR016 Modular Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200 All kernel code is open source in our modular/max GitHub repository.
SR017 Modular Customer Success Stories Modular delivers high-speed inference, cross-architecture flexibility, and SLA-backed reliability.
SR018 Modular Inworld Case Study Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation ... at a ~60% lower price.
SR019 Modular Modular Raises $250M to scale AI's Unified Compute Layer This brings its total capital raised to $380M across three rounds since its founding in 2022 and values Modular at $1.6 billion.
SR020 Modular Modular partners with Amazon Web Services (AWS) to bring MAX to AWS services The MAX Platform is able to supercharge Graviton CPUs, executing your AI models with up to 5X higher performance and up to 80% cost savings.
SR021 Modular AWS Case Study Traditional AI serving solutions require specific hardware configurations and proprietary software stacks (like CUDA), creating vendor lock-in and limiting deployment flexibility.
SR022 Modular AI Agents for AWS Marketplace Enterprise grade SLA
SR023 U.S. Department of Justice Data Security The Data Security Program went into effect on April 8, 2025.
SR024 U.S. Department of Justice Data Security Program: Compliance Guide The Data Security Program implemented by the National Security Division ... comprehensively and proactively addresses ... access ... to Americans' bulk sensitive personal data.
SR025 Bureau of Industry and Security Homepage | Bureau of Industry and Security A license is required to export advanced computing items to entities headquartered in Country Group D:5 and Macau.
SR026 National Institute of Standards and Technology Cybersecurity Framework Profile for Artificial Intelligence The Cyber AI Profile will provide guidelines for managing cybersecurity risk related to AI systems.
SR027 NIST CSRC NIST releases prelim draft of Cyber AI profile Draft for Public Comment
SR028 National Conference of State Legislatures Artificial Intelligence Legislation Database
SR029 Troutman Privacy & Cyber State AI Law Tracker Map Released The map tracks the AI laws most likely to create compliance obligations for companies developing or deploying AI systems.
SR030 AlphaStreet Nvidia's CUDA Lock-In and Supply Scarcity Make Its AI Chip Moat Harder to Break Than It Looks Nvidia's competitive position in AI accelerators is anchored in CUDA ... deeply embedded across model development and production workflows.
SR031 NVIDIA NVIDIA MGX Platform NVIDIA MGX provides an open modular reference architecture that enables OEMs, ODMs, and ecosystem partners to build accelerated systems faster.
SR032 SDxCentral Modular raises $250M for AI's unified compute layer at $1.6B valuation The Palo Alto, California-based company's latest round was led by Thomas Tull's U.S. Innovative Technology fund.
SR033 SiliconANGLE Modular raises $250M to simplify AI deployment across hardware Modular ... raised $250 million in its third financing round, valuing the company at $1.6 billion.
SR034 U.S. Securities and Exchange Commission / CoreWeave S-1/A We work with NVIDIA to deploy the latest GPU technologies at scale.
SR035 NVIDIA NVIDIA Form 10-K (fiscal year ended Jan. 25, 2026)
SV001 Modular Modular: Modular Raises $250M to scale AI's Unified Compute Layer Modular has raised $250M in its third financing round.
SV002 TechCrunch Modular secures $100M to build tools to optimize and create AI models | TechCrunch
SV003 GV Modular: Unlocking AI and Opportunity
SV004 SDxCentral Modular raises $250M for AI's 'unified compute layer' at $1.6B valuation
SV005 SiliconANGLE Modular raises $250M to simplify AI deployment across hardware - SiliconANGLE
SV006 Yahoo Finance / Reuters AI startup Modular raises $250 million, seeks to challenge Nvidia dominance AI startup Modular said on Wednesday it raised $250 million in a funding round valuing it at $1.6 billion.
SV007 Sacra Modular valuation, funding & news
SV008 Securities and Exchange Commission S-1/A
SV009 Securities and Exchange Commission XBRL Viewer
SV010 AlphaStreet Nvidia’s CUDA Lock-In and Supply Scarcity Make Its AI Chip Moat Harder to Break Than It Looks CUDA lock-in and supply scarcity make its AI chip moat harder to break than it looks.
SV011 NVIDIA NVIDIA MGX Platform
SV012 The Business Research Company The Business Research Company - Market Research & Business Intelligence
SV013 Fortune Business Insights AI Inference Market Size, Share | Global Growth Report [2034]
SV014 Spheron Network Modular MAX and Mojo on GPU Cloud: Deploy an LLM Inference Engine That Outperforms vLLM (2026 Guide) | Spheron Blog
SV015 Kanerika 10 Best vLLM Alternatives for AI Inference in 2026
SV016 Modular Modular: Editions & Pricing Pricing depends on your edition. Our Cloud charges per token or per minute ... Your Cloud (BYOC) is billed per minute of reserved GPU capacity.
SV017 Modular Modular: Customer Success Stories
SV018 Modular Modular: Inworld Case Study Our API now returns the first 2 seconds of synthesized audio on average ~70% faster ... at a ~60% lower price.
SV019 Modular MAX: A high-performance inference framework for AI
SV020 GitHub GitHub - modular/modular: The Modular Platform (includes MAX & Mojo)
SV021 Inworld TTS at Scale: Why vLLM Wasn't Enough for Production By using MAX we achieved a truly remarkable improvement both for the latency and throughput.
SV022 Modular Modular: About Us
SV023 Modular Modular: Your Cloud, Our Engineers, Any GPU Inference inputs and outputs never leave your network.
SV024 Modular Modular: Shared Endpoints, Our Cloud, Any GPU
SV025 Modular Modular: Dedicated Endpoints
SV026 Modular Modular: Custom Models
SV027 Modular Modular: AWS Case Study
SV028 Modular Modular: Modular partners with Amazon Web Services (AWS) to bring MAX to AWS services The MAX Platform is able to supercharge Graviton CPUs, executing your AI models with up to 5X higher performance and up to 80% cost savings.
SV029 Modular Modular: Modular at NVIDIA GTC 2026: MAX on Blackwell, Mojo Kernel Porting, and DeepSeek V3 on B200
SV030 Modular Modular: The Next Big Step in Mojo🔥 Open Source We're thrilled to announce the release of the core modules from the Mojo standard library under the Apache 2 license.
SV031 Modular Modular: The path to Mojo 1.0
SV032 Modular Modular: Hippocratic AI partners with Modular to power flexible, high-quality inference for real-time patient conversations MAX delivers sub-500ms mean time to first token (TTFT) and holds total generation time tight even at high concurrency.
SV033 Together AI Together AI Announces $305M Series B to Scale AI Acceleration Cloud for Open Source and Enterprise AI
SV034 Groq Groq Raises $750 Million as Inference Demand Surges
SV035 Lambda Lambda Raises Over $1.5B from TWG Global, USIT to Build Superintelligence Cloud Infrastructure
SV036 Tech Funding News NVIDIA-backed Lambda lands $480M at $4B valuation to scale its AI cloud — TFN
SV037 Sacra Together AI revenue, valuation & funding Sacra estimates that Together AI hit $1B in annualized revenue in February 2026.
SV038 Cerebras Cerebras Raises $1.1 Billion at $8.1 Billion Valuation
SV039 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SV040 d-Matrix d-Matrix Raises $275 Million to Power the Age of AI Inference - d-Matrix