Modal
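The production cloud for AI: serverless GPU compute, agent sandboxes, and zero infrastructure management
Modal has demonstrated $300M ARR, 5x growth in seven months, a high-quality and diversified customer base, and a technically differentiated serverless platform. Sandbox revenue exceeds one third of total ARR, which is enough to support a track rating. But the 15.5x ARR multiple ($4.65B post-money over $300M ARR) leaves little margin, three major outages in May and June 2026 expose reliability risk, and gross margin and NRR are entirely undisclosed, so the current price does not yet support a buy.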
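Cover Elements
Company Profile
Modal Labs, Inc. is a serverless AI infrastructure company headquartered in New York City, founded around 2021 by Erik Bernhardsson and Akshat Bubna. The company positions itself as the production cloud for AI, offering a Python-first platform that abstracts GPU and CPU compute on top of AWS, GCP, and Oracle Cloud so that customers manage no infrastructure. Core products are Functions (serverless GPU/CPU compute), Sandboxes (isolated containers for agent execution and LLM-generated code), Training (fine-tuning and multi-node jobs), Volumes (high-performance mutable storage), Web Endpoints, and GPU Notebooks. At the close of its Series C in May 2026, Modal disclosed annualized revenue above $300M, up fivefold since its October 2025 Series B; the round raised $355M at a $4.65B post-money valuation, co-led by General Catalyst and Redpoint Ventures. Sandboxes now contribute more than one third of total revenue, extending Modal from pure GPU rental into a platform business. Disclosed customers include Cognition, Physical Intelligence, DoorDash, Suno, Ramp, Quora (Poe), Substack, Lovable, Reducto, and Applied Compute.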
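- Founded: ~2021 (anchored to 2021-01-01; exact date unconfirmed)
- Founders: Erik Bernhardsson, Akshat Bubna
- Founding location: New York City, NY, USA
- Headquarters: New York City, NY, USA
- Product: Modal sells serverless GPU and CPU compute billed per second, with no infrastructure management. There are three commercial tiers (Starter, free; Team, $250 per month; Enterprise, custom), and the Python SDK is the main developer entry point. Its differentiated stack uses GPU memory snapshots to reach sub-second GPU cold starts (cloud buffers, a content-addressed container filesystem, CPU checkpoint/restore, and CUDA checkpoint/restore). The Sandbox product provides isolated containers for running agent-generated code; it already contributes more than one third of total revenue and moves Modal from commodity GPU rental toward agent infrastructure. AWS and GCP marketplace integrations let enterprise customers apply existing cloud commitments to Modal, which lowers adoption friction.
- Customers: AI-native software developers, ML engineering and platform teams, reinforcement learning companies, coding-agent operators, and enterprise AI teams in healthcare, fintech, media, robotics, and computational biology. Entry is developer-led (free Starter tier); concurrency caps, compliance requirements (HIPAA, SOC 2, Okta SSO), and the economics of committed usage push customers up to Team and Enterprise.
- Business model: Pure usage-based billing. Customers pay for GPU and CPU compute seconds, storage (Volumes) in GB per day, and Sandbox execution seconds, with no seat fees and no per-token metering. Revenue comes from the three plans plus Enterprise contracts that carry volume discounts, embedded ML engineering services, and dedicated support. The Startup Program gives early-stage companies free credits as a top-of-funnel acquisition channel.
- Stage: Series C
- Funding: Modal has completed three confirmed institutional rounds: a Series A (2023, led by Redpoint Ventures; size not disclosed in the scraped corpus); a Series B (October 2025, $110M at roughly $1.1B post-money, treated as company-inferred; Sacra estimates $87M with Lux Capital leading, and the discrepancy is unresolved); and a Series C (announced May 21, 2026, $355M at $4.65B post-money, co-led by General Catalyst and Redpoint Ventures, with Menlo Ventures, Bain Capital Ventures, and Accel as new investors). Estimated cumulative funding is about $465M.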
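The usage-based model described above can be sketched as a small calculation. This is a minimal illustration, not Modal's billing logic: the `Plan` type and `monthly_bill` helper are hypothetical, the A10G rate of about $0.000306 per GPU-second is the third-party approximation quoted later in this report, and treating the Team plan as carrying no free credit is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan record; the fields are illustrative, not Modal's schema."""
    name: str
    monthly_fee: float     # flat platform fee in USD
    monthly_credit: float  # free usage credit in USD


# Plan figures as quoted in this report. Enterprise is custom and omitted.
STARTER = Plan("Starter", monthly_fee=0.0, monthly_credit=30.0)
# Assumption: the report does not state a free credit for Team, so use 0.
TEAM = Plan("Team", monthly_fee=250.0, monthly_credit=0.0)

# Approximate A10G rate from a third-party source cited in this report.
A10G_RATE_PER_SECOND = 0.000306


def monthly_bill(plan: Plan, gpu_seconds: float, rate_per_second: float) -> float:
    """Platform fee plus metered usage, net of any free credit."""
    usage = gpu_seconds * rate_per_second
    billable_usage = max(0.0, usage - plan.monthly_credit)
    return round(plan.monthly_fee + billable_usage, 2)


if __name__ == "__main__":
    for plan in (STARTER, TEAM):
        print(plan.name, monthly_bill(plan, 100_000, A10G_RATE_PER_SECOND))
```

Because there is no seat fee, the only fixed cost in this sketch is the plan fee; everything else scales with seconds consumed.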
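Executive Summary
Key Strengths
- $300M ARR with 5x growth in seven months is extremely rare for an AI infrastructure company and validates product-market fit at scale
- Sandbox revenue above one third of total ARR shifts Modal from a premium GPU cloud story to an agentic infrastructure platform story, which supports software-style multiple expansion
- Proprietary snapshotting technology (GPU memory buffers, CUDA checkpoint/restore, Rust runtime) delivers sub-second GPU cold starts and forms a defensible technical moat above commodity GPU clouds
- A top-tier investor roster (General Catalyst, Redpoint, Menlo Ventures, Bain Capital Ventures, Accel) confirms institutional underwriting quality at the $4.65B valuation
- Ten named customers are in deep production deployment (Cognition, Physical Intelligence, DoorDash, Suno, Ramp, Quora, Substack, Lovable, Reducto, Applied Compute), with measurable performance results
- An asset-light, multi-cloud supply model pools AWS, GCP, and Oracle Cloud capacity, avoiding the capital intensity of owning GPUs while supporting elastic scaling to 1,000+ GPUs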
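Key Risks
- Three major operational outages within one month (May 7, May 19, and June 3, 2026), including a control-plane authentication failure, suggest that reliability infrastructure may not have kept pace with 5x revenue growth
- Gross margin, burn rate, NRR, cohort retention, and cap table terms are all undisclosed; without these inputs the 15.5x ARR multiple is hard to view as anything but tight
- The Series B discrepancy is unexplained (company: $110M, Redpoint-led; Sacra: $87M, Lux Capital-led); this is a transparency gap that needs data-room verification
- The asset-light model of sourcing GPUs from hyperscalers creates a gross-margin ceiling; competitive exposure grows if AWS, GCP, or Azure bundle a native serverless GPU product with a comparable developer experience
- Under a two-founder governance structure, no CFO, VP Engineering, VP Sales, or independent director has been publicly named; CEO Erik Bernhardsson is also the only public voice, so key-person risk is concentrated
- The HIPAA BAA scope excludes GPU Memory Snapshots, which are Modal's core cold-start differentiator; even though the company stresses enterprise compliance, the product surface available to regulated healthcare customers is limited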
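Open Questions
- Gross margin by product line (compute vs. Sandboxes vs. storage) is the most important undisclosed data point; for the 15.5x ARR multiple to hold, gross margin needs to be above 35%
- NRR, cohort retention data, and the top-10 customers' share of ARR are entirely undisclosed, so revenue durability cannot be judged
- The Series B discrepancy must be resolved (company: $110M; Sacra estimate: $87M; Redpoint vs. Lux Capital as lead) before the cap table can be treated as accurate
- The capitalization table, liquidation preference amounts, and participation rights across the four rounds including the seed (about $465M in total) are not public
- Even with the $355M Series C just closed, monthly operating burn and the current cash balance cannot be confirmed without private financial statements
- For a company valued at $4.65B, full board composition, committee structure, and investor governance rights remain undisclosed
- Headcount mix (engineering vs. GTM) and unit economics (CAC, payback, ACV by tier) have no public source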
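The valuation arithmetic behind the summary can be checked in a few lines. This is a sketch under stated assumptions: the function names are illustrative, and the implied Series B ARR is a back-of-envelope inference from the company's "fivefold" growth claim, not a disclosed figure.

```python
def arr_multiple(post_money_usd: float, arr_usd: float) -> float:
    """Post-money valuation divided by annualized revenue."""
    return post_money_usd / arr_usd


def step_up(new_valuation_usd: float, prior_valuation_usd: float) -> float:
    """Ratio of the new round's valuation to the prior round's."""
    return new_valuation_usd / prior_valuation_usd


SERIES_C_POST_MONEY = 4.65e9  # company-disclosed
SERIES_B_POST_MONEY = 1.10e9  # company-inferred in this report
CURRENT_ARR = 300e6           # company claim, unaudited
GROWTH_SINCE_SERIES_B = 5.0   # company claim ("fivefold")

# Inference only: ARR at Series B implied by the growth claim.
implied_series_b_arr = CURRENT_ARR / GROWTH_SINCE_SERIES_B

if __name__ == "__main__":
    print(f"Series C multiple: {arr_multiple(SERIES_C_POST_MONEY, CURRENT_ARR):.1f}x ARR")
    print(f"Valuation step-up: {step_up(SERIES_C_POST_MONEY, SERIES_B_POST_MONEY):.2f}x")
    print(f"Implied Series B multiple: "
          f"{arr_multiple(SERIES_B_POST_MONEY, implied_series_b_arr):.1f}x ARR")
```

If the growth claim holds, the multiple paid per dollar of ARR fell between the two rounds even as the headline valuation rose roughly 4.2x, which is worth testing against audited figures in diligence.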
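01 Company Overview
1.1 Identity, Product, and Market Position
Modal Labs, Inc. is an AI production cloud company incorporated in Delaware. The May 2026 SaaS agreement, which applies to all enterprise customers, confirms the legal entity name and the Delaware incorporation. Operating headquarters are in New York City, New York, as confirmed by both the LinkedIn company page (25,318 followers as of June 2026) and the Redpoint Ventures portfolio page. Some secondary-market databases place the company in San Francisco, which conflicts with these sources; this report follows the scraped primary sources. Modal describes its mission as supplying the infrastructure layer that was missing when AI workloads arrived: traditional cloud infrastructure was designed for stateless web applications and was never architected for models that need GPU memory, scale dynamically between zero and thousands of accelerators, and require isolated execution environments for agent-generated code. The company has used the tagline "The production cloud for AI" and a homepage variant that adds "built for speed, at any scale." As of June 2026, core products are Functions (serverless GPU and CPU compute), Sandboxes (isolated containers for agent execution or LLM-generated code), Training (fine-tuning and multi-node training jobs), Volumes (high-performance mutable storage), Web Endpoints (HTTP/ASGI serving), and GPU Notebooks (collaborative notebooks). Pricing has three tiers: Starter ($0 base fee, $30 of free credits per month, 10 concurrent GPUs), Team ($250 per month, 50 concurrent GPUs), and Enterprise (custom). The modal Python SDK (available on PyPI, supporting Python 3.10–3.14) is the primary developer interface; JavaScript/TypeScript and Go are also supported for orchestration. Modal pools capacity across the major clouds and hundreds of data centers worldwide and can autoscale from 0 to 1,000+ GPUs in seconds without reserved capacity. In its May 2026 Series C post the company says it has spent five years building infrastructure, which supports a 2021 founding and matches the user-provided context; the public corpus gives no exact founding date. [CO001, CO002, CO003, CO004, CO005, CO006]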
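| Metric | Value / Status | As of | Confidence | Notes / Gaps |
|---|---|---|---|---|
| Legal entity | Modal Labs, Inc. (Delaware corporation) | 2026-06-14 | High | Confirmed by the modal.com Terms of Service (May 2026 version). |
| Primary headquarters | New York City, New York | 2026-06-14 | High | LinkedIn company page and Redpoint portfolio page both show New York City, NY. |
| Founded | ~2021 | 2022-12-07 | Medium | The founder's December 2022 blog post says he is working on Modal; the Series C post (May 2026) cites five years of deep infrastructure work. No exact founding date in the scraped corpus. |
| Current stage | Private, Series C | 2026-05-21 | High | Series C confirmed by the official Modal blog and the General Catalyst portfolio page. |
| Latest valuation | $4.65B post-money | 2026-05-21 | High | Disclosed in the official Series C post at modal.com/blog/modal-series-c. |
| Series C raise | $355M | 2026-05-21 | High | Disclosed in the official Series C post; co-led by General Catalyst and Redpoint. |
| Annualized revenue | >$300M ARR | 2026-05-21 | Medium | Company claim in the Series C post; no independent third-party verification in the scraped corpus. |
| Revenue growth since Series B | ~5x | 2026-05-21 | Medium | The Series C post says the company has been "growing fivefold since" the Series B; not independently audited. |
| Employees | ~180 | 2026-06-14 | Low | LinkedIn shows "51–200 employees" and the people section shows 180; the company has not confirmed an exact count. |
| Business model | Usage-based (per-second GPU/CPU compute), with plan tiers | 2026-06-14 | High | Pricing page and documentation both confirm per-second serverless billing; pricing page confirms the plan tiers. |
| Main products | Serverless GPU compute, agent sandboxes, training, volumes, web endpoints | 2026-06-14 | High | Confirmed by official modal.com product pages and technical documentation. |
| PyPI downloads / version | SDK on PyPI; supports Python 3.10–3.14 | 2026-06-14 | High | Confirmed by a direct scrape of pypi.org/project/modal/. |
Null values have been replaced with the best available estimate; "~" denotes an approximation. Confidence = High requires at least one primary source (official or legal). ARR and growth figures are company claims and unaudited.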
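[CO001, CO002, CO003, CO005, CO006, CO007]
Modal's competitive position ties founder-led infrastructure innovation, elastic GPU capacity pooled across clouds, a growing base of production AI customers, and rapid capital formation into a single serverless AI cloud thesis.
[CO001, CO003, CO005, CO006, CO011, CO012]
1.2 Founders, Leadership, and Governance
The Redpoint Ventures portfolio page and multiple public references confirm that Modal was founded by Erik Bernhardsson and Akshat Bubna. Erik Bernhardsson is the public-facing CEO and co-founder; his most visible public channel is his personal blog (erikbern.com), where a December 2022 post announced Modal ("Long story short: I'm working on a super cool tool called Modal"). Bernhardsson is known in the machine learning engineering community as the creator of the Annoy approximate-nearest-neighbor library and as an active blogger on software infrastructure and ML systems. This scrape did not independently confirm his prior industry roles through primary sources, so specific former-employer claims are excluded. Akshat Bubna is a co-founder; as of June 2026 the scraped public corpus confirms neither his functional title (CTO or otherwise) nor his background, which is a governance transparency gap. Beyond the two founders, no official or independent source retrieved in this pass names other executives (VP Engineering, VP Sales, CFO, head of revenue, and so on). The board is equally opaque: the scraped sources disclose no board composition, committee structure, or investor control rights. For a late-stage private company at Series C this is not unusual, but it is still notable given the $4.65B valuation and the depth of the investor roster. The structural risk is that the company presents a two-founder, founder-led narrative without publicly disclosing any independent governance oversight. The Series C post is signed in the company's voice rather than by named executives, consistent with a tight founder-led communication style. Key-person risk therefore concentrates on Bernhardsson, the main external voice and technical thought leader. For a company with more than $300M in ARR, the absence of a publicly named sales or revenue leader is also worth noting. [CO014, CO015, CO016, CO017, CO018, CO019]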
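| Person | Role | Background or fit evidence | Public visibility | Key-person / governance implication |
|---|---|---|---|---|
| Erik Bernhardsson | Co-founder, CEO (inferred) | Publicly announced Modal in a December 2022 blog post; runs the personal blog erikbern.com and has notable influence in ML engineering circles. Known to the open-source ML community. | High | Main external voice; technical thought leader for the product narrative. CEO key-person risk if he departs. |
| Akshat Bubna | Co-founder (functional title unconfirmed) | Listed as co-founder on the Redpoint portfolio page. No independent source in the scraped corpus gives a title or background detail. | Low | Co-founder concentration risk; no visible public title or succession arrangement. |
| Board / other executives | Not publicly named | Beyond the two founders, the scraped public corpus shows no board members, independent directors, vice presidents, or C-suite leaders. | None | Governance opacity is material for a company valued at $4.65B. Board composition and investor control rights are undisclosed. |
The scraped public sources confirm only the two co-founders. As of June 2026, board composition and all other executive roles remain undisclosed in the public record.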
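[CO014, CO015, CO016, CO017, CO018, CO019]
1.3 Funding History, Valuation, and Investor Base
Modal has completed three confirmed institutional rounds. The Redpoint Ventures portfolio page states that it first invested in Modal's Series A in 2023. User-provided context indicates that the Series B closed in October 2025 at $110M and a $1.1B post-money valuation, led by Redpoint and Sutter Hill Ventures; the scraped public corpus does not independently confirm that round (no press release or official announcement was retrieved), so this report treats it as company-inferred and partially verified. The most recent and clearly confirmed round is the Series C announced on May 21, 2026: $355M at a $4.65B post-money valuation, co-led by General Catalyst and Redpoint Ventures, with Menlo Ventures, Bain Capital Ventures, and Accel joining as new investors and all major existing investors participating. The Series C announcement states that Modal has grown fivefold since the Series B and that annualized revenue exceeds $300M. Cumulative funding is roughly $465M+ (seed plus an estimated Series A, plus the $110M Series B and the $355M Series C), but the scraped corpus gives no exact amounts for the seed or Series A. The General Catalyst portfolio page confirms the investment, describes Modal as the serverless cloud for the AI era, and lists the GC deal team as Quentin Clark, Max Rimpel, and Katie Keller. Menlo Ventures' participation is confirmed by a Menlo CDN asset (modal.svg) uploaded in May 2026 and by the investor list in the Series C post. Bain Capital Ventures is listed as a new Series C investor, which implies it was not a Series B investor and contradicts the user-provided context; the conflict is recorded as an evidence gap. Modal's valuation rose from $1.1B (Series B) to $4.65B (Series C) in about seven months, an extremely fast pace for AI infrastructure that signals strong investor confidence in the $300M ARR milestone; gross margin, burn rate, and growth-cohort data nonetheless remain undisclosed. [CO021, CO022, CO023, CO024, CO025, CO026]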
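| Investor / Stakeholder | Round | Confirmed or inferred | Significance | Diligence ask |
|---|---|---|---|---|
| Redpoint Ventures | Series A (2023), Series C (2026) | Confirmed (Redpoint portfolio page and Series C post) | Earliest institutional backer; led the Series A and co-led the Series C, showing long-term conviction. GC's key GP involvement likely includes a board seat. | Confirm board seat, reserve behavior, and post-Series C ownership. |
| General Catalyst | Series C (2026, co-lead) | Confirmed (GC portfolio page and Series C post) | New lead in the latest round. GC deal team lists Quentin Clark, Max Rimpel, and Katie Keller. | Confirm board rights, governance role, and the strategic rationale beyond capital. |
| Sutter Hill Ventures | Series B (2025, inferred) | Inferred from user-provided context; not confirmed in the scraped corpus | User-provided context names Sutter Hill as a Series B investor. Not independently verified in this pass. | Verify Series B participation and confirm current ownership. |
| Menlo Ventures | Series C (2026, new) | Confirmed (Series C post; Menlo CDN asset uploaded May 2026) | Joined the Series C as a new investor. Adds AI infrastructure investing expertise. | Confirm economic ownership and any governance rights. |
| Bain Capital Ventures | Series C (2026, new investor) | Confirmed (Series C post explicitly calls BCV a "new investor") | The user listed BCV as a Series B investor, but the Series C post says BCV joined as a new investor in the Series C, implying it was not in the Series B. Conflicts with user-provided context. | Confirm whether BCV had any involvement before the Series C. |
| Accel | Series C (2026, new) | Confirmed (Series C post) | New Series C participant; a large global VC that adds investor diversity. | Confirm economic ownership and whether Accel intends to lead a later round. |
| All major existing investors | Series C (2026, participating) | Confirmed (Series C post says all major existing investors participated) | Signals insider support and willingness to maintain pro-rata in a $4.65B round. | Obtain the full cap table and confirm pro-rata percentages and any ratchets. |
Confirmed means the investor is explicitly named by a successfully scraped source. Inferred means the information comes from user-provided context and was not independently verified by a URL scraped in this pass. The Series A amount and any Series A leads other than Redpoint are not in the scraped corpus.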
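[CO021, CO022, CO023, CO024, CO025, CO026]
Key public metrics show Modal's capital position, revenue scale, and customer proof as of June 2026; all figures are company claims except uptime (status page) and headcount (LinkedIn).
Revenue and growth figures are company-disclosed and unaudited. Headcount is a LinkedIn estimate and may lag reality.
[CO025, CO026, CO027, CO028, CO040, CO041]
1.4 Product Scale, Customer Proof, and Milestones
The growing set of customer case studies in the scraped corpus substantially validates Modal's scale narrative. After Reducto migrated its 30+ model inference pipeline to Modal, P90 latency fell 3x and the company scaled to more than 1,000 GPUs within an hour. Zencastr scaled to 1,500 concurrent GPUs and processed hundreds of years' worth of podcast audio within days. Quora uses Modal Sandboxes for secure code execution in its Poe AI chatbot platform, saving the equivalent of two engineers of ongoing infrastructure work. Substack moved training and deployment of its entire ML portfolio (spam detection, recommendations, transcription, image generation) from AWS SageMaker to Modal. Applied Compute, a reinforcement learning company serving DoorDash, Cognition, and Mercor, says Modal is the only infrastructure option that offers suitable primitives at every layer of the RL loop. The Series C post also names as active customers Physical Intelligence (robotics inference latency of 10–15 ms), Suno (millions of songs generated per day on thousands of GPUs), Cognition (millions of Sandboxes for coding agents), Decagon (342 ms p90 latency for natural customer conversations), and DoorDash (agentic commerce infrastructure). The coding-agents solution page cites Lovable (tens of thousands of simultaneous app-creation sessions) and Ramp (full-context background coding agents). The LLM solution page cites Allen AI, Substack, and Reducto. Taken together, these customers show Modal running production deployments in healthcare AI, robot control, audio, document processing, code generation, agentic commerce, and social platforms. On the technical frontier, Modal published a detailed blog post in May 2026 describing four techniques behind sub-second GPU cold starts: cloud buffers of idle GPUs, an in-house content-addressed container filesystem, CPU-side process checkpoint/restore, and CUDA checkpoint/restore. The company status page shows 90-day uptime of 99.946% for GPU functions and 99.938% for CPU functions as of June 14, 2026. The negative operational signal is a June 3, 2026 Hacker News thread in which community users report three major outages within a month (May 7, May 19, and June 3, 2026), with the June 3 incident described as an internal authentication system failure. Although the status page shows a high aggregate uptime figure, this signal still matters for reliability diligence. [CO031, CO032, CO033, CO034, CO035, CO036]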
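To put the status-page percentages next to the outage reports, it helps to convert uptime into a downtime budget. The helper below is a minimal sketch; `downtime_minutes` is an illustrative name, and the calculation assumes the percentage is measured over a simple 90-day wall-clock window, which the status page's own methodology may not match.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 90) -> float:
    """Minutes of downtime implied by an uptime percentage over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - uptime_percent / 100.0)


# Status-page figures quoted in this report (as of 2026-06-14).
GPU_UPTIME_PERCENT = 99.946
CPU_UPTIME_PERCENT = 99.938

if __name__ == "__main__":
    print(f"GPU functions: {downtime_minutes(GPU_UPTIME_PERCENT):.1f} min over 90 days")
    print(f"CPU functions: {downtime_minutes(CPU_UPTIME_PERCENT):.1f} min over 90 days")
```

Under that assumption the GPU figure corresponds to roughly 70 minutes of downtime over 90 days, so three incidents described as major would each have to be short, or scoped to components the aggregate does not weight heavily, for both sources to be consistent. That is a useful question to put to the company.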
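| Date | Event | Type | Amount / Valuation / Status | Parties | Implication |
|---|---|---|---|---|---|
| 2021-01-01 | Modal founded by Erik Bernhardsson and Akshat Bubna | Founding | Company formed | Erik Bernhardsson; Akshat Bubna | Sets the founding context for the AI infrastructure thesis; exact date unconfirmed, so the start of the year is used as an anchor. |
| 2022-12-07 | Erik Bernhardsson publicly describes Modal on his personal blog | Product | Product concept made public; waitlist opened | Erik Bernhardsson | First confirmed public signal from a primary source of Modal's existence and product vision. |
| 2023-01-01 | Series A closes; led by Redpoint Ventures | Funding | Amount undisclosed | Redpoint Ventures (lead) | Earliest confirmed institutional capital; Redpoint states it first invested in the Series A in 2023. |
| 2024-05-20 | Substack case study published; production ML migration milestone | Product | Case study published | Substack; Modal | Early evidence of production ML workflows moving off AWS SageMaker; validates product maturity. |
| 2025-06-30 | Quora case study: Modal Sandboxes power Poe code execution | Product | Case study published | Quora; Poe; Modal | Shows production adoption of the Sandbox product at a large consumer internet platform (400M monthly active users). |
| 2025-08-28 | Zencastr case study: transcription workload reaches 1,500 concurrent GPUs | Scale | 1,500 concurrent GPUs | Zencastr; Modal | First large-scale GPU concurrency proof point in the scraped corpus; validates elastic scaling. |
| 2025-10-01 | Series B closes at a $1.1B valuation; $110M raised | Funding | $110M at $1.1B post-money | Redpoint Ventures; Sutter Hill Ventures (user-provided context, not verified by scraped sources) | Company reaches unicorn status; sets the baseline for the 5x revenue growth cited at Series C. |
| 2025-11-19 | Reducto case study: P90 latency down 3x; 1,000+ GPU scale test completed within an hour | Scale | 3x latency reduction; >1,000 GPUs in <1 hour | Reducto; Modal | Strong enterprise performance proof; shows peak capacity without advance reservation. |
| 2026-05-12 | "Truly serverless GPUs" technical post: deep dive into four sub-second cold-start techniques | Product | Sub-second cold start; 40x improvement over baseline | Modal engineering team | First consolidated public explanation of Modal's core infrastructure moat (cloud buffers, custom filesystem, CPU C/R, CUDA C/R). |
| 2026-05-20 | Applied Compute case study: RL training on Modal for DoorDash, Cognition, and Mercor | Scale | Production RL infrastructure for enterprise customers | Applied Compute; DoorDash; Cognition; Mercor; Modal (sample) | Validates Modal as the infrastructure backbone for next-generation RL-based agent training; creates a new strategic use case. |
| 2026-05-21 | Series C closes at a $4.65B valuation; $355M raised; $300M ARR milestone disclosed | Funding | $355M at $4.65B post-money; >$300M ARR | General Catalyst; Redpoint; Menlo Ventures; Bain Capital Ventures; Accel; all major existing investors | Company crosses $300M ARR and raises at 4.2x the Series B valuation in about 7 months; positions Modal as the leading independent AI cloud. |
| 2026-06-03 | Major outage: internal authentication system failure; third reported incident in a month | Adverse | Duration not stated; resolved the same day per HN comments | Modal platform; customer base | Adverse reliability event; users report three incidents in one month (May 7, May 19, June 3). Should be investigated against SLA commitments. |
Dates known only to the year are anchored to January 1. Dates known only to the month use the first day of that month. "User-provided context, not verified" means the fact comes from the task prompt and no independently scraped source confirmed it in this pass.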
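[CO001, CO002, CO014, CO015, CO021, CO022]
Modal's timeline moves quickly from its 2021 founding to a $110M Series B and unicorn status in October 2025, then to a $4.65B Series C seven months later, with customer case studies validating the technical scaling narrative along the way.
Dates known only to the year use January 1; where the scraped source gives no exact date, dates known only to the month use the first day of that month.
[CO001, CO014, CO015, CO021, CO022, CO023]
1.5 Charts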
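02 Market Analysis
2.1 Market Boundary, Included Spend, and Alternatives
Modal competes in the serverless AI compute and inference-as-a-service layer: cloud-managed platforms that package, deploy, autoscale, and meter GPU workloads so that customers never provision, maintain, or reserve the underlying hardware. Included spend covers serverless function execution (billed per second of CPU and GPU use), managed inference endpoint fees, Sandbox execution of agent code, Storage Volumes, network egress, and enterprise support contracts. Excluded spend covers raw model-weight costs, training-dataset procurement, application-layer development labor, data-center capex, bare-metal colocation fees, and general-purpose IaaS compute not dedicated to AI workloads.

Prospective Modal customers weigh three classes of status quo alternative. First, self-managed Kubernetes clusters on reserved GPU instances from AWS, GCP, or Azure: this requires DevOps headcount, capacity planning, multi-year financial commitments, and heavy cluster-management overhead. Suno's founders, who explicitly cited wanting to avoid "three-year GPU reservations" and cluster management when choosing Modal, are a typical example. Second, specialist GPU clouds (RunPod, Lambda Labs) rent raw GPUs but provide no managed deployment stack, so customers still build container orchestration, autoscaling logic, and observability themselves. Third, hyperscaler-native managed AI services (AWS Bedrock, Google Vertex AI / Agent Platform, Azure Machine Learning) offer managed inference, but with a weaker Python-first developer experience, stronger proprietary lock-in, and pricing that is usually per token rather than per GPU-second.

Adjacent markets that Modal has clearly entered, but that are not its monetization center, include MLOps experiment tracking, LLM fine-tuning platforms, and developer agent sandboxes. As of June 2026, Modal's GPU lineup runs from T4 and L4 (entry-level inference) through A10, A100 (40GB and 80GB), L40S, H100 (PCIe, SXM, NVL), and H200 to B200 (Blackwell architecture), with an optional B200+ label that also routes to B300 when available. That hardware range lets Modal serve cost-optimized batch workloads (L4, L40S), mid-range production inference (A100, L40S), and frontier model deployment (H100, H200, B200). [CM001, CM002, CM003, CM004, CM005, CM025]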
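| Segment or category | Included spend | Excluded spend | Primary buyer / payer | Relevance to Modal |
|---|---|---|---|---|
| Serverless GPU functions | Per-second GPU compute fees; idle billing under a minimum-containers setting | Reserved GPU capacity; bare-metal rental | ML / product engineers (departmental budget) | Core product; primary revenue line |
| Managed inference endpoints | Endpoint hosting, HTTP/ASGI serving fees, TLS termination | CDN costs, application hosting, API gateway layers on top of Modal | Platform engineers (product or central IT budget) | Web Endpoints product; important enterprise use case |
| Sandbox execution | Isolated container execution fees for agent-generated code | Orchestration platform costs on top of Modal (LangGraph, custom agent frameworks) | AI / coding platform engineering teams | Sandboxes product; fast-growing agentic AI segment |
| Fine-tuning and training | GPU-hour charges for multi-node training and fine-tuning runs | Dataset acquisition, model-weight licensing, labeling | ML research or platform teams (R&D budget) | Training product; adjacent to inference; growing share |
| Storage (Volumes) and data movement | Network-mounted volume storage fees; egress traffic | The cloud provider's underlying object storage (S3, GCS) | Any team using model weights or data on Modal | Supporting revenue line; not a primary revenue driver |
| Enterprise support and compliance tier | Enterprise contract fees, SLA guarantees, dedicated support | Internal compliance tooling, audit services | Procurement and IT (corporate budget) | Enterprise SKU; expands ACV per customer |
Included and excluded scope follows Modal's pricing page and the Series C announcement. Enterprise support tier terms are not publicly disclosed beyond the custom-pricing note.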
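[CM001, CM003, CM005, CM027]
2.2 Multiple Sizing Scopes and Evidence Constraints
No single analyst report defines "serverless GPU cloud" as a standalone market category. Analysts publish estimates at different levels of abstraction, and none maps exactly onto Modal's competitive boundary. The most relevant narrow scope is Technavio's AI inference-as-a-service market: USD 85.25 billion in 2025 with a 22.1% CAGR through 2030; North America contributes 41.1% of incremental growth, and the GPU component alone was USD 42.28 billion in 2024. MarketsandMarkets publishes a broader AI infrastructure scope (compute, memory, networking, storage, and software): USD 135.81 billion in 2024, projected to reach USD 394.46 billion by 2030 at a 19.4% CAGR. A third scope, also from MarketsandMarkets, sizes the cloud AI market separately (infrastructure + ML platforms + MLOps + AIaaS): USD 327.15 billion by 2029 at a 32.4% CAGR. Mordor Intelligence forecasts the cloud AI market at USD 269.02 billion in 2031, an 18.68% CAGR from 2026, with hybrid and multi-cloud architectures expected to grow at a 22.31% CAGR. Finally, MarketsandMarkets' broadest AI estimate (hardware + software + services) puts the overall market at USD 601.93 billion in 2026, growing to USD 3.638 trillion by 2033 at a 29.3% CAGR.

These estimates cannot be summed. They measure overlapping or partly different markets across different definitional boundaries; the MarketsandMarkets infrastructure figure includes hardware capex, while the Technavio figure is narrower but services-only. The usable inference is directional: the serviceable layer of Modal's market (cloud-hosted, serverless AI compute) is conservatively tens of billions to low hundreds of billions of dollars today, depending on the scope adopted, with a documented CAGR range of 19–32%. A bottom-up estimate that applies a 25–30% cloud or serverless hosted share to the MarketsandMarkets $135.81B AI infrastructure figure gives an implied 2024 SAM of USD 34–41 billion, scaling proportionally. Modal's ARR of more than $300 million is roughly 0.35% penetration of Technavio's narrow inference market (USD 85.25B in 2025), which indicates very early penetration of a large and expanding opportunity. At about 15.5x ARR, Modal's $4.65B valuation is in line with premium AI infrastructure peers that also show rapid revenue growth in 2026. [CM006, CM007, CM008, CM009, CM010, CM011]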
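The bottom-up SAM range and the penetration figure reduce to two short calculations. The sketch below reproduces them from the numbers quoted in this section; the 25–30% hosted share is the author's assumption rather than a published figure, and the function names are illustrative.

```python
def implied_sam(base_market_usd_b: float, share_low: float, share_high: float) -> tuple:
    """Apply a low and high hosted-share assumption to a base market size."""
    return (base_market_usd_b * share_low, base_market_usd_b * share_high)


def penetration_percent(arr_usd_b: float, market_usd_b: float) -> float:
    """Company ARR as a percentage of a reference market size."""
    return arr_usd_b / market_usd_b * 100.0


AI_INFRA_2024_USD_B = 135.81     # MarketsandMarkets, full AI infrastructure
INFERENCE_AAS_2025_USD_B = 85.25 # Technavio, AI inference-as-a-service
MODAL_ARR_USD_B = 0.30           # company claim, unaudited

if __name__ == "__main__":
    low, high = implied_sam(AI_INFRA_2024_USD_B, 0.25, 0.30)
    print(f"Implied 2024 SAM: USD {low:.1f}B to {high:.1f}B")
    share = penetration_percent(MODAL_ARR_USD_B, INFERENCE_AAS_2025_USD_B)
    print(f"Penetration of the Technavio scope: {share:.2f}%")
```

The result is only as good as the share assumption: moving the hosted share by five points shifts the SAM by about USD 7 billion, which is why the range should be read as an order of magnitude.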
| Publisher | Year published | Geography | Base value | Forecast value | CAGR | Method notes | Confidence | Limits for sizing Modal |
|---|---|---|---|---|---|---|---|---|
| Technavio | 2026 | Global | USD 85.25B (2025) | USD 146.12B cumulative, 2025–2030 | 22.1% (2026–2030) | AI inference-as-a-service; cloud-hosted inference compute only | Medium | Narrow service layer; excludes on-premises and training |
| MarketsandMarkets | 2024 | Global | USD 135.81B (2024) | USD 394.46B (2030) | 19.4% (2024–2030) | Full AI infrastructure (compute + memory + network + storage + software) | Medium | Includes hardware capex; overstates Modal's serviceable market |
| MarketsandMarkets | 2024 | Global | Not stated | USD 327.15B (2029) | 32.4% (to 2029) | Cloud AI (infrastructure + ML platforms + MLOps + AIaaS + generative AI) | Medium | Broader than pure inference; includes on-premises ML platform spend |
| Mordor Intelligence | 2026 | Global | Not stated | USD 269.02B (2031) | 18.68% (2026–2031) | Cloud AI service layer; includes multi-cloud and hybrid architectures | Medium | Published February 2026; method cannot be publicly verified |
| MarketsandMarkets | 2026 | Global | USD 601.93B (2026) | USD 3,638B (2033) | 29.3% (2026–2033) | Broadest AI scope (hardware + software + services + generative AI) | Low | Too broad; includes NVIDIA chip revenue, model-lab R&D, enterprise software |
| Author bottom-up (SAM estimate) | 2026 | Global | USD 34–41B (2024 est.) | Not forecast | N/A | Applies a 25–30% cloud-hosted share to the MarketsandMarkets $135.81B figure | Low | Author estimate; no public source defines this sub-segment |
| Technavio (GPU component) | 2026 | Global | USD 42.28B (2024) | Not stated | N/A | GPU hardware within the AI inference-as-a-service market | Medium | Hardware sub-component; not a pure services market size |
| Modal ARR (penetration reference) | 2026 | Not disclosed | USD 300M+ ARR (2026) | Not disclosed | N/A | Company-disclosed annualized revenue run-rate milestone | Medium | About 0.35% of Technavio's $85.25B; supports the early-penetration reading |
The scopes use different market definitions and cannot be summed. Each CAGR applies to its publisher's forecast period and does not necessarily hold across regions.
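[CM006, CM007, CM008, CM009, CM011, CM012]
Narrowing layer by layer from the broadest AI market down to the serverless GPU compute beachhead where Modal competes shows the serviceable market space.
This is a narrowing logic chain, not an additive model. The middle layers mix service and infrastructure definitions because no public source defines a clean "serverless GPU cloud" sub-category. The 2031 Mordor figure is linearly interpolated to 2026 for order of magnitude only.
[CM006, CM007, CM009, CM011, CM013, CM041]
RunPod's published hourly GPU rates (spot/cloud pod) show the base price floor that Modal must clear to justify its managed-platform premium at each GPU tier.
Low end = RunPod published spot/cloud-pod price (June 2026). High end = estimated managed-tier premium for comparable GPUs, based on hyperscaler and managed inference market data; no single source publishes a per-GPU managed-tier rate for every type. Modal's own GPU prices were not fully captured in this pass; the range illustrates a structural pricing band and is not a direct Modal vs. RunPod comparison.
[CM016, CM017, CM019, CM020, CM040]
2.3 Buyer, User, and Payer Segmentation
Modal's disclosed customer base and case-study corpus show five distinct buyer profiles. For AI-native product companies (Suno, Decagon, Lovable), the buyer is an engineering or product lead; they start on the self-serve Starter or Team tier, judge only developer experience and scaling behavior, and usually stay on usage-based billing. Agentic coding platform builders (Cognition, Ramp, Lovable) need Modal's Sandbox product for isolated container execution; the buyer is an engineering or platform team, and the workload is inherently bursty and latency-sensitive. Robotics and physical AI research labs (Physical Intelligence) need very low-latency GPU inference (cited at 10–15 ms) and are less price-sensitive; the buyer is usually a research or ML infrastructure lead. Enterprise ML platform teams (DoorDash, Substack) have moved existing ML pipelines off AWS SageMaker or internally managed clusters; the buyer extends from engineering to central platform or IT budgets, and compliance, reliability, and SLA guarantees become selection criteria. RL and research compute teams (Applied Compute, serving DoorDash, Cognition, and Mercor) need the full RL compute stack (environments, policies, rewards, and data) running in parallel at scale; the buyer is a research or applied ML team.

The budget-owner lifecycle usually starts in product or engineering (a developer tries Modal on a personal or team credit card), moves to a departmental budget allocation once production workloads are settled, and migrates to a central platform or IT budget at enterprise scale. Modal's pricing tiers (Starter at $0 with $30 of free GPU credits per month and 10 concurrent GPUs; Team at $250 per month with 50 concurrent GPUs; Enterprise with custom pricing) are designed to support that PLG-to-enterprise funnel and to minimize friction at each stage.

As of June 2026, Modal's 24+ documented examples show the breadth of supported workloads: LLM inference (OpenAI-compatible endpoints), protein folding, coding agents, image generation, batch whisper transcription, video generation, music generation, RAG pipelines, and scientific computing. Scale limits (2,000 pending inputs and 25,000 total inputs per function for standard workloads; up to 1 million pending inputs for async .spawn() jobs) define the operating parameters that enterprise buyers must verify. [CM024, CM025, CM026, CM027, CM028, CM029]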
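| Segment | Buyer | Daily user | Payer | Core workflow | Budget owner | Adoption trigger |
|---|---|---|---|---|---|---|
| AI-native product companies | Engineering or product lead | ML / product engineers | Company (usage-based or Team plan) | Inference serving for consumer AI products | Product or engineering budget | Traffic spikes combined with unpredictable GPU demand; avoiding Kubernetes complexity |
| Agentic coding platforms | Platform or infrastructure engineering lead | AI/ML platform engineers | Company (Team or Enterprise plan) | Sandboxed execution of agent-generated code at scale | Engineering or central platform budget | Need for isolated code execution across thousands of concurrent sessions |
| Robotics / physical AI labs | ML infrastructure or research lead | Research engineers | Company (Enterprise plan) | Low-latency GPU inference for robot policy models | R&D or infrastructure budget | Sub-15 ms latency required at scale; no self-managed alternative |
| Enterprise ML platform teams | VP Engineering or ML platform lead | Data scientists or ML engineers | Enterprise procurement | Migrating multi-model pipelines from SageMaker or K8s | Central platform or IT budget | SageMaker or self-managed setups are costly and operationally heavy; SLA guarantees needed |
| RL and research compute teams | Research or applied ML team lead | Research engineers | Company or grant funding | Distributed RL training, rollouts, and reward computation | R&D budget | RL policy iteration needs elastic bursts to hundreds of GPUs |
Buyer profiles are drawn from Modal's Series C announcement, case studies (Suno, Substack, Applied Compute, and Physical Intelligence as mentioned in the Series C post), and the pricing page tiers. Budget owners at individual and enterprise scale are inferred from the pricing tier structure.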
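[CM024, CM025, CM026, CM027, CM028]
Modal controls the deployment, scaling, and execution orchestration layer, capturing value between model creation and end-user traffic.
[CM002, CM004, CM027, CM030, CM038, CM039]
2.4 Growth Drivers and Adoption Constraints
Five structural forces are driving demand for products like Modal. First, AI model complexity is rising non-linearly: as LLM parameter counts grow from tens of billions to hundreds of billions, inference infrastructure cost and management complexity grow faster than model size, which raises the value of a managed compute platform that abstracts the operational layer. Second, agentic AI architectures need isolated, short-lived execution environments (Sandboxes) that can scale from zero to thousands of containers on sub-second demand; Kubernetes-backed reserved infrastructure is a poor fit for such workloads, which pulls demand toward Modal's cold-start-optimized serverless model. Third, GPU supply shortages, with Mordor Intelligence (February 2026) citing lead times above 12 months for H100 and MI300X, push developers toward pooled managed GPU clouds instead of direct hardware purchases, structurally enlarging the addressable market for elastic compute platforms. Fourth, AI spend is accelerating from training-heavy to inference-heavy: by 2025–2026, inference already accounts for a larger share of total AI compute spend than training for most production AI companies, and inference workloads suit serverless elastic billing better than one-off large training runs. Fifth, North America contributes 41.1% of incremental growth in AI inference-as-a-service (Technavio 2026), which matches Modal's headquarters and current customer concentration.

Five adoption constraints limit Modal's TAM in the medium term. Hyperscaler incumbency is the main ceiling: AWS, GCP, and Azure all bundle AI inference services (Bedrock, Vertex AI, Azure OpenAI) into existing enterprise cloud agreements, discount programs (EDP/CUD), and procurement relationships, so routing AI workloads to an independent vendor is costly for large enterprises. GPU supply constraints also cap scaling promises: while NVIDIA hardware allocations remain tight, even Modal cannot guarantee immediate elastic scaling to thousands of GPUs. Cold-start latency for large model deployments is a deployment trade-off: Modal's container stack starts in about one second, but loading tens of gigabytes of model weights still takes minutes unless warm capacity is configured, and keeping capacity warm raises effective cost. As enterprise buyers in regulated industries demand explicit infrastructure guarantees, data residency, HIPAA, FedRAMP, and GDPR requirements are becoming a new constraint that a multi-tenant serverless cloud must prove it can meet. Finally, bare-metal GPU clouds (RunPod L40S at $0.86/hr in June 2026) put downward price pressure on batch-oriented or cost-sensitive workloads whose owners are willing to absorb operational overhead. [CM015, CM016, CM017, CM018, CM031, CM032]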
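| Driver or constraint | Direction | Timing | Impact on Modal | Diligence question |
|---|---|---|---|---|
| Rising AI model complexity (larger parameter counts → higher inference cost) | Driver | Ongoing; accelerating 2025–2027 | Larger models raise platform value; buyers cannot self-manage at scale | Track NVIDIA's training vs. inference revenue split to confirm inference share growth |
| Growth of agentic AI workloads (Sandboxes, multi-step LLM loops) | Driver | Emerging 2024–2026; high growth | Sandboxes are Modal's differentiated product; hyperscalers have no direct equivalent | Confirm Sandbox revenue as a share of total revenue to weight this segment |
| GPU supply shortage (H100/MI300X lead times of 12+ months) | Driver | Current; expected to ease partly by late 2026 | Pushes buyers from reserved capacity toward pooled managed clouds | Track NVIDIA/AMD availability and lead-time trends quarterly |
| Spend mix shifting from training to inference | Driver | Ongoing; accelerates as model deployment spreads | Inference workloads (steady-state serving) fit Modal's billing model | Request cohort analysis: is the inference share of Modal GPU hours growing? |
| North America as the dominant geography (41.1% of incremental growth) | Driver | Current; matches Modal's New York headquarters and customer base | Geographic fit lowers sales overhead in the current growth phase | Confirm international revenue split and expansion plans |
| Hyperscaler incumbency (AWS Bedrock, Vertex AI, Azure ML bundling) | Constraint | Ongoing; strongest for large enterprise buyers | Customers with existing EDP/CUD cloud commitments shrink the TAM | Quantify the EDP displacement rate in disclosed customer wins |
| GPU supply ceiling limits scaling promises | Constraint | Current through mid-2026; easing | Large burst events may fail if Modal's allocation is insufficient | Request Enterprise tier SLA terms and capacity guarantee documents |
| Compliance / regulatory friction (HIPAA, GDPR, SOC2, FedRAMP) | Constraint | Ongoing; pressure rising in healthcare, finance, and government | Lack of certification evidence blocks expansion into regulated verticals | Confirm a published SOC2 Type II report and HIPAA BAA availability |
Growth drivers come from Technavio (2026), Mordor Intelligence (February 2026), and MarketsandMarkets (November 2024). Constraint rows are inferences from analyst reports, pricing comparisons, and Modal's technical documentation.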
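[CM015, CM031, CM032, CM033, CM034, CM035]
A qualitative fit assessment of each buyer segment across the five dimensions that matter most in serverless GPU compute procurement.
Ratings combine public case studies, pricing tier design, and the Series C announcement narrative. They are not based on win rates or CRM data; Modal has not disclosed a usable revenue breakdown by segment.
[CM024, CM025, CM026, CM027, CM028, CM029]
2.5 Sizing Gaps, Contradictions, and Diligence Asks
Five evidence gaps should be kept open before accepting any specific market size for Modal's addressable opportunity. First, no analyst has published a dedicated "serverless GPU cloud" or "Python-native AI compute platform" market category; every size estimate covers a broader or differently defined category, so the serviceable market figures in this chapter are author constructions, not published research. Second, analyst estimates differ widely in scope and size, from $85.25B (Technavio, narrow inference service layer) to $394.46B (MarketsandMarkets, full AI infrastructure including hardware) to $601.93B (MarketsandMarkets, broadest AI market); this reflects inconsistent definitions rather than forecast disagreement, and the diligence task is to stress-test which definition best matches Modal's actual invoice line items. Third, the GPU fractionalization trend (sub-$2/hr GPU slices cited by Mordor Intelligence in February 2026) cuts both ways: it widens the addressable buyer base (lower entry cost) but also compresses the price floor and may commoditize batch-tolerant inference compute. Fourth, Modal's international go-to-market traction is not publicly disclosed; Asia-Pacific is projected to have the highest CAGR (22.74% per Mordor Intelligence) and represents an unconfirmed expansion opportunity. Fifth, Modal's compliance certification status (SOC2, HIPAA, FedRAMP) is not independently confirmed in the scraped public corpus, which leaves a gap for enterprise and regulated buyers. Investors should ask the company directly for revenue concentration by vertical, geographic mix, and evidence of compliance certification to close these gaps. [CM010, CM014, CM041, CM042, CM043, CM044]
2.6 Charts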
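03 Competitors
3.1 Competitive Landscape and Jobs-to-Be-Done Coverage
The basic job Modal does is shared with at least four overlapping classes of competitor: running GPU-accelerated AI workloads in the cloud without provisioning or maintaining the underlying hardware. The landscape is best read in three tiers. Tier 1 (direct serverless peers): Baseten, Replicate, Beam Cloud, and Banana.dev all offer managed GPU compute with a developer-first deployment model. Baseten focuses on mission-critical inference, with dedicated deployments, custom performance kernels (TensorRT-LLM, vLLM, SGLang), and forward-deployed engineer support. Replicate competes mainly through its community model library (hundreds of public models reachable through a one-line API) and Cog packaging. Beam Cloud explicitly supports multi-cloud routing (AWS, GCP, Azure, Hetzner) and targets agent sandboxes plus GPU inference. Banana.dev charges a flat monthly fee plus at-cost compute (Team: $1,200 per month) with zero markup, for teams that value simplicity over managed features. Tier 2 (raw GPU clouds): RunPod uses its FlashBoot technology to reach sub-200ms cold starts and has 750,000+ developers and $120M ARR (Sacra, January 2026); Lambda AI (formerly Lambda Labs) has pivoted to "The Superintelligence Cloud", with ISO 27001/SOC 2 compliance and dedicated cluster management. CoreWeave positions itself as "the world's #1 AI cloud platform", with Kubernetes-native infrastructure, 96% cluster goodput, and multi-billion-dollar contracts with OpenAI and Meta. Tier 3 (hyperscaler incumbents): AWS SageMaker offers a unified data-analytics-AI studio; Google Cloud Run offers on-demand L4 GPUs with 5-second startup and scale-to-zero; Google's Gemini Enterprise Agent Platform (formerly Vertex AI) offers 200+ models and full MLOps tooling; Azure Container Apps offers serverless AI app hosting, including Sandbox containers for agent code execution. Together AI sits adjacent: it closed a $305M Series B at a $3.3B valuation (Sacra) and competes mainly on per-token inference pricing for foundation-model access rather than custom model hosting. The status quo alternative, Kubernetes clusters on reserved GPU instances from AWS, GCP, or Azure, remains the default for large enterprises; it carries the highest switching friction and is the hardest path for Modal to displace. [CP001, CP002, CP003, CP004, CP005, CP006]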
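| Competitor | Category | Scale / Funding | Target segment | Differentiation | Limits relative to Modal |
|---|---|---|---|---|---|
| Baseten | Direct serverless peer: managed inference | $585M raised (Business Wire); $150M Series D | Enterprise ML teams; production inference | Inference optimization stack (vLLM/TRT/kernels), forward-deployed engineers, self-hosted + multi-cloud options, SOC 2 + HIPAA | Not a Python-native SDK; Truss framework requires YAML; weaker developer-led PLG motion |
| Replicate | Direct serverless peer: community API | 25,000+ paying customers (Sacra); Series B funded | Developer prototyping; model discovery; community ML | One-line API, 10,000+ public models, Cog packaging | Private model billing includes idle time; weaker enterprise control posture; no training on the same platform |
| Beam Cloud | Direct serverless peer: sandboxes + GPU | Early stage; pricing from $0.000192/sec (RTX 4090) | AI agents; multi-cloud compute; Python-first builders | Python-first sandboxes, explicit multi-cloud (AWS/GCP/Azure/Hetzner), Docker-in-Docker, GitHub Actions CI/CD | Smaller scale and customer base; fewer public enterprise case studies than Modal |
| Banana.dev | Direct serverless peer: flat-rate GPU | Early stage; Team $1,200/month + at-cost compute | Small teams that want simple pricing and zero compute markup | Flat monthly fee + zero-markup compute model | Limited feature breadth; no parity on sandboxes / training / volumes; fewer GPU SKUs |
| RunPod | Raw GPU cloud / serverless alternative | 750,000+ developers; $120M ARR (Sacra, January 2026); $22M raised | Cost-sensitive AI builders; training workloads; infrastructure-heavy teams | Sub-200ms cold start (FlashBoot), 30+ GPU SKUs, 31 regions, OpenAI infrastructure partner (announced March 2026) | Service lifecycle is more DIY; Community Cloud quality is inconsistent; weaker Python-native experience |
| Lambda AI (Lambda Labs) | Dedicated GPU cloud | $64M+ raised; ISO 27001/ISO 27017/SOC 2 Type II; hardware + cloud | Large foundation-model training; regulated enterprises; compliance-first buyers | ISO/SOC compliance stack, dedicated cluster management, on-demand / annual H100 instances | Not serverless / autoscaling; poor fit for bursty inference workloads; pricing is not per second |
| CoreWeave | Hyperscale GPU cloud | Multi-billion-dollar contracts with OpenAI/Meta; >32 data centers; 250,000+ GPUs | Foundation-model labs; multi-GPU training clusters; large inference deployments | 96% cluster goodput, Kubernetes-native, H100/H200/B200/GB300 inventory, claims startup 10x faster than hyperscalers | Not serverless; requires reservations / contracts; built for cluster-scale workloads rather than per-function inference |
| Together AI | Adjacent: per-token foundation-model inference | $305M Series B at a $3.3B valuation (Sacra); built on NVIDIA Blackwell | Developers using foundation models through a token API; price-competitive LLM routing | Per-token pricing (for example DeepSeek V4 Pro input at $2.10/1M tokens), managed API, Blackwell GPUs | Does not host custom models; not a GPU serverless platform; different billing unit (token vs. GPU-second) |
| AWS SageMaker / Bedrock | Hyperscaler incumbent | AWS scale; integrated with the full AWS data / analytics platform | AWS-committed enterprises; buyers of unified data + AI workflows | Unified Studio for data + AI, governance, 50% discount on batch inference, enterprise IAM / compliance | Complex pricing; heavier operational burden; weaker Python-first DX; stronger AWS lock-in |
| Google Cloud Run / Vertex AI (Google Cloud offerings) | Hyperscaler incumbent | GCP scale; on-demand L4 GPUs; 200+ models in the Gemini Agent Platform | GCP developers; agentic AI builders; enterprise AI platform teams | 5-second GPU startup, scale-to-zero, Gemini Enterprise Agent Platform with 200+ models and MLOps tooling | GCP-native; weak multi-cloud; complex per-project billing; the Vertex rename to Agent Platform adds confusion |
| Azure Container Apps | Hyperscaler incumbent: serverless | Azure scale; sub-second startup; Sandbox for agentic code | Azure-committed enterprises; agentic AI app builders; regulated industries | Sandbox containers for untrusted code, serverless GPU (pay per second), Express tier for fast deployment | Azure only; no multi-cloud; storage / networking billed as separate Azure services; complex billing model |
| Internal build (K8s + reserved GPUs) | Status quo / in-house build | High capex; heavy DevOps burden; multi-year GPU reservations | Platform engineering teams at large enterprises with existing cloud commitments | Maximum control, existing IAM / compliance integration, no vendor dependency | Highest operational burden; 3-year GPU reservations; significant DevOps labor cost; slow to scale |
Competitor scale data comes from Sacra, company websites, and press releases. Funding and revenue figures labeled as company claims or third-party reports are estimates. The internal-build row summarizes the status quo alternative that Modal's prospects would otherwise maintain.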
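[CP001, CP002, CP003, CP004, CP005, CP006]
Both axes use ordinal scores: developer experience (Python-nativeness, DX simplicity, SDK quality) versus enterprise control (compliance, self-hosting, governance posture, procurement path). Scores are evidence-backed ordinal estimates, not benchmarks; the x-axis is a relative DX assessment, and the y-axis reflects public enterprise control features confirmed by scraped sources.
[CP001, CP004, CP005, CP007, CP008, CP009]
3.2 Competitor Profiles and Capability Comparison
Among direct serverless peers, Modal and Baseten are the closest substitutes for production inference workloads, but their packaging philosophies differ. Modal is a pure Python SDK: a developer wraps a function with the `@app.function()` decorator and calls `.remote()` to run it in the cloud, with container builds and multi-cloud scheduling handled automatically. Baseten relies on the Truss framework (a YAML-based model packaging standard) and offers an explicit inference optimization stack, including custom kernels, speculative decoding, and KV-cache management, capabilities that Modal's general-purpose platform does not include. Baseten also offers forward-deployed engineers (FDEs) as a high-touch support model, a premium differentiator that Modal does not publicly advertise. Replicate differs more fundamentally: its community model library (public models such as Flux and Stable Diffusion) is the main user funnel, and private custom deployment is a secondary use case. Replicate bills private models for setup time, idle time, and active time on dedicated hardware, unlike Modal's scale-to-zero serverless billing. Beam Cloud offers sandboxes (secure containers for agent code execution), GPU inference, and explicit multi-cloud routing in a single platform, with Docker-in-Docker and GitHub Actions deployment integration. Modal's Sandbox product (which likewise runs in gVisor secure containers) competes directly with Beam Cloud's sandbox and with the Azure Container Apps Sandbox for agent code execution workloads. Among raw GPU clouds, RunPod's FlashBoot delivers sub-200ms cold starts (vendor figure), versus a cold start of about one second for Modal's pre-warmed containers. RunPod operates two infrastructure tiers: an enterprise Secure Cloud sourced from data-center partners and a Community Cloud sourced from vetted individual hosts. Lambda AI (formerly Lambda Labs) has repositioned as a full Superintelligence Cloud for large foundation-model training and inference, with ISO 27001, ISO 27017, ISO 27701, ISO 22301, and SOC 2 Type II attestations; that compliance posture currently exceeds Modal's public certifications. CoreWeave offers 96% cluster goodput on the largest clusters (H100/B200/GB200) and claims inference startup 10x faster than hyperscalers. Among hyperscaler-native options, Google Cloud Run's on-demand NVIDIA L4 GPU instances start in 5 seconds and scale to zero, covering a meaningful part of the workload space served by Modal's entry-level GPU offering. Google's Gemini Enterprise Agent Platform (renamed from Vertex AI as of June 2026) offers 200+ models, Agent Studio, custom training, and MLOps tooling: a much broader platform than Modal, but less Python-native for custom model deployment. Azure Container Apps Serverless GPUs offer per-second billing, scale-to-zero, and a Sandbox mode explicitly meant for running AI-generated code, mirroring Modal's Sandbox functionality inside the Azure ecosystem. [CP001, CP016, CP002, CP019, CP020, CP029]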
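The decorator-plus-`.remote()` shape described above can be illustrated without any cloud dependency. The toy below is a hypothetical, in-process stand-in written for this report; `ToyApp` and `ToyFunction` are invented names and nothing here is Modal's implementation. It only shows why the pattern ties application code to the SDK: the decorated name stops being a plain function and becomes a handle with SDK-specific call methods.

```python
from typing import Any, Callable, Dict, Optional


class ToyFunction:
    """Hypothetical handle returned by the decorator (not Modal's class)."""

    def __init__(self, fn: Callable[..., Any], gpu: Optional[str]) -> None:
        self.fn = fn
        self.gpu = gpu
        self.remote_calls = 0

    def local(self, *args: Any, **kwargs: Any) -> Any:
        """Run in the caller's process."""
        return self.fn(*args, **kwargs)

    def remote(self, *args: Any, **kwargs: Any) -> Any:
        """A real platform would build a container, schedule it, and ship
        the arguments; this toy just counts the call and runs in-process."""
        self.remote_calls += 1
        return self.fn(*args, **kwargs)


class ToyApp:
    """Hypothetical registry that mimics the shape of an SDK app object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.registry: Dict[str, ToyFunction] = {}

    def function(self, gpu: Optional[str] = None) -> Callable[[Callable[..., Any]], ToyFunction]:
        def wrap(fn: Callable[..., Any]) -> ToyFunction:
            handle = ToyFunction(fn, gpu)
            self.registry[fn.__name__] = handle
            return handle
        return wrap


app = ToyApp("demo")


@app.function(gpu="A10G")
def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    print(square.remote(7), square.gpu, square.remote_calls)
```

Every call site that uses `.remote()` would need to change when moving to a platform with a different invocation model, which is the workflow-level lock-in discussed in the pricing and switching-cost section.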
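| Buying criterion | Modal | Baseten | Replicate | RunPod Serverless | Beam Cloud | Google Cloud Run | AWS SageMaker | Azure Container Apps |
|---|---|---|---|---|---|---|---|---|
| Python-native SDK (no YAML/Dockerfile) | Yes: @app.function() decorator | Partial: Truss YAML framework | Partial: Cog config file | No: container handler model | Yes: Python SDK | Partial: common runtimes deploy from source | No: notebook + API based | No: YAML/Bicep config |
| Sub-second GPU cold start | Yes: GPU memory snapshots + CUDA checkpoint | Partial: claims fast cold starts, mechanism undisclosed | Unknown | Partial: FlashBoot <200ms worker start (not model load) | Unknown | Partial: 5s GPU instance start (L4 only) | No: container start takes minutes | Partial: sub-second container start, GPU cold start not stated |
| Scale-to-zero (no idle cost) | Yes | Yes | Yes for public models; private models incur idle charges | Yes: Serverless tier | Yes: serverless tier | Yes | Partial: requires configuring min-instance to zero | Yes: default configuration |
| Sandbox / isolated agentic code execution | Yes: Sandboxes (gVisor) | Unknown | No | No | Yes: Sandbox primitives | No: functions only; no explicit sandbox mode | No | Yes: Container Apps Sandbox |
| Multi-cloud GPU pooling (no single-cloud lock-in) | Yes: AWS + GCP + Oracle | Yes: multi-cloud + self-hosted option | Unknown | Partial: 31 regions, single infrastructure model | Yes: AWS/GCP/Azure/Hetzner | No: GCP only | No: AWS only | No: Azure only |
| Managed distributed training on the same platform | Yes: multi-node clusters (Beta) | Yes | Partial: fine-tunes only | Yes | Yes | No | Yes | No |
| Enterprise trust (SOC 2 / HIPAA / certifications) | Partial: HIPAA on Enterprise tier only; SOC 2 not publicly stated | Yes: SOC 2 Type II + HIPAA | Unknown | Partial: SOC 2 in progress per Sacra | Unknown | Yes: inherits GCP SOC 2/ISO/HIPAA eligibility | Yes: AWS compliance portfolio | Yes: Azure compliance portfolio |
| Self-hosted / BYOC deployment option | No: cloud only | Yes: self-host and BYOC | No | No | Partial: deploy to your own cloud account | No | Partial: VPC isolation, no full BYOC | Partial: Dedicated workload profile |
| Developer productivity tooling (notebooks, volumes, observability) | Yes: Notebooks, Volumes, Dicts, Queues, Datadog/OTel integration | Partial: deployment-focused; fewer storage primitives | No: API only | Partial: logs and metrics, no managed storage | Partial: logs and metrics | Partial: Cloud Monitoring integration | Yes: full Studio with notebooks, pipelines, feature store | Partial: Azure Monitor integration |
| Use of existing cloud commitments | Yes: listed on AWS/GCP/Azure marketplaces | Yes: enterprise cloud commitments | Unknown | Unknown | Unknown | Yes: native GCP spend | Yes: native AWS spend | Yes: native Azure spend |
Cells marked "Unknown" mean the capability could not be confirmed from sources scraped in this pass. Do not infer a capability from its absence. The comparison reflects public product surfaces as of June 2026. Modal's Enterprise tier features are not fully public; row notes reflect only capabilities in public documentation.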
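[CP001, CP002, CP003, CP004, CP005, CP010]
Capability strength on five buying criteria, assessed by competitor category. Scores (high / medium / low / unknown) come from public product surfaces scraped in this pass; they reflect documented capability, not performance benchmarks or customer survey data.
[CP003, CP007, CP008, CP010, CP012, CP016]
3.3 Pricing, Distribution, and Switching Costs
Modal bills by usage (per second of GPU/CPU compute) across three plans: Starter ($0 base fee, $30 of free GPU credits per month, 10 concurrent GPUs), Team ($250 per month plus compute, 50 concurrent GPUs), and Enterprise (custom). Beam Cloud's serverless pricing is broadly comparable: $0.000192/second for an RTX 4090, $0.000292/second for an A10G, and $0.0000528/core/second for CPU. Banana.dev charges a flat $1,200 per month for Team plus at-cost compute (claimed zero markup). RunPod's L40S is cited at $0.86/hr on Secure Cloud (Chapter 2 evidence), well below Modal's managed equivalent; this is the main cost-floor pressure point. CoreWeave's H200 NVL72 on-demand price is $42.00/hr (8-GPU configuration), aimed at large model training rather than per-request inference. AWS Bedrock offers batch inference on open models at 50% below on-demand prices, a discount path for enterprises with committed AWS spend. Together AI's per-token pricing (for example $2.10 per 1M input tokens for DeepSeek V4 Pro) targets a different unit-economics layer: token-level billing rather than GPU-second billing. Hyperscalers dominate enterprise distribution through cloud commitment programs (AWS Enterprise Discount Programs, GCP Committed Use Discounts, Azure MACC) that fold AI compute into existing contracts. Modal partly offsets this through marketplace integrations with the major cloud providers, which let enterprises draw on existing commitments and reduce procurement friction; Sacra's analysis confirms this strategy. Switching costs in this market are moderate. Modal's Python SDK decorator pattern creates workflow-level lock-in: migrating a large codebase off `@app.function()` decorators to an alternative takes non-trivial refactoring. However, the underlying model weights, Docker container standards, and inference frameworks (vLLM, TensorRT-LLM) are portable, so customers can multi-home across platforms. RunPod explicitly advertises no lock-in. Baseten's Truss framework creates a different packaging lock-in that requires a format migration. The deepest lock-in sits with the status quo alternative: enterprises that have built Kubernetes GPU infrastructure are often anchored by years of DevOps investment, custom monitoring, IAM integration, and vendor relationships. Modal's best sales motion is to sell against the cost of maintaining that infrastructure rather than to compete head-on on price. [CP001, CP004, CP005, CP006, CP018, CP021]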
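| Vendor | Billing unit | Sample rate | Base / platform fee | Idle cost | Key implication for the Modal comparison |
|---|---|---|---|---|---|
| Modal | Per second (GPU + CPU) | H100 SXM (inferred from the documented GPU list); A10G about $0.000306/sec (approximation from the public rate card) | $0 (Starter); $250/month (Team); Enterprise custom | None: scale to zero | Baseline; developer-friendly; no idle cost; the Team tier sets a $3K/year floor on top of compute |
| Baseten | Per GPU-second + bandwidth (Basic pay-as-you-go; Pro/Enterprise custom) | Per-GPU rates not publicly listed; Pro requires a quote | $0 (Basic pay-as-you-go); custom (Pro/Enterprise) | None on Basic; Pro dedicated compute implies reservation cost | Opaque list pricing; HostFleet (April 2026) ranks Baseten highest per GPU-hour among peers; offsetting performance on production workloads supports the premium |
| Replicate | Per second (private models use dedicated hardware) | GPU-second rates vary by model type; public models billed per prediction | $0 | Yes: idle time on dedicated hardware is billed for private models | Idle billing for custom models is a structural cost disadvantage against Modal on bursty workloads |
| RunPod Serverless | Per second (worker active time only) | RTX 4090 about $0.000069/sec (inferred from a public spot rate of about $0.25/hr) | $0 | None: scale to zero (Flex workers) | Price-floor competitor; L40S quoted at $0.86/hr; well below Modal's managed rates |
| Beam Cloud | Per second (CPU + GPU) + on-demand hourly | RTX 4090 serverless $0.000192/sec; A10G $0.000292/sec; H100 PCIe on-demand $1.74/hr | $0 (serverless); pay-as-you-go at list price | None: serverless tier | Billing model close to Modal's; lower public serverless rates put direct pressure on entry-level GPU SKUs |
| Banana.dev | Fixed monthly fee + at-cost compute (claims zero markup) | At cost (no markup); underlying GPU rates not public | $1,200/month (Team, up to 50 parallel GPUs) | Unknown: not stated on the public site | Unusual pricing structure; attractive for teams with steady load, but a high entry point for variable workloads |
| Lambda AI | Hourly (on-demand or reserved); not serverless | H100 on-demand $2.40/hr (annual reservation), from the Sacra RunPod source | $0 | No lock-in on demand; reservations lock in compute | Not directly comparable to Modal serverless; aimed at dedicated training clusters |
| CoreWeave | Hourly (on-demand or spot); not serverless | H200 NVL72: $42.00/hr on demand; B300 spot: $35.84/hr | $0 | Spot can be preempted; production SLAs require reservations | Built for large-cluster training / inference; much higher minimum spend; different buyer profile |
| AWS Bedrock (open-model batch) | Per 1K tokens (on-demand or batch) | Batch inference on supported models priced 50% below on-demand | $0 (pay-as-you-go); Enterprise Agreement discounts via EDP | No lock-in for batch | Token billing model; differs from GPU-second billing; relevant only to foundation-model inference, not custom model deployment |
| Google Cloud Run (GPU) | Per second (vCPU + memory + GPU) | L4 GPU on demand (a price list exists, but the scraped sources give no public per-second price) | $0 (first 2M requests per month free) | None: scales to zero | GCP-native; L4 starts in 5 seconds; L4 only; narrower GPU SKU range than Modal |
| Azure Container Apps (Serverless GPU) | Per second (vCPU + GiB + GPU add-on) | Not published in the scraped sources (requires the Azure pricing calculator) | $0 (first 180,000 vCPU-seconds per subscription per month free) | Lower idle rate charged when a container is not processing requests | Azure-ecosystem buyers can use existing MACC spend; GPU SKU range unconfirmed |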
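Per-second prices derived from hourly rates are approximations (÷ 3600). Baseten's public list pricing is incompletely disclosed; for Baseten, Chapter 3 cites HostFleet's comparison as of April 2026. All rates are subject to change. Modal's GPU price list is not fully published on its pricing page; the A10G estimate comes from a third-party source. Re-verify against current pricing pages before using these figures for M&A or competitive positioning.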
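The unit conversions behind this table can be reproduced with a short sketch. It is illustrative only: the helper names are our own, not part of any vendor SDK, and the inputs are the public list rates quoted in this section, which may have changed.

```python
SECONDS_PER_HOUR = 3600


def per_second_to_hourly(rate_per_second: float) -> float:
    """Convert a per-second list rate (USD) to an hourly rate."""
    return rate_per_second * SECONDS_PER_HOUR


def hourly_to_per_second(rate_per_hour: float) -> float:
    """Convert an hourly list rate (USD) to a per-second rate."""
    return rate_per_hour / SECONDS_PER_HOUR


# Per-second list rates quoted in the table (USD per second).
per_second_rates = {
    "Modal A10G (third-party estimate)": 0.000306,
    "Beam Cloud RTX 4090": 0.000192,
    "Beam Cloud A10G": 0.000292,
}
hourly_rates = {
    name: per_second_to_hourly(rate) for name, rate in per_second_rates.items()
}

# Hourly list rate quoted in the text, converted the other way.
runpod_l40s_per_second = hourly_to_per_second(0.86)

for name, rate in hourly_rates.items():
    print(f"{name}: ${rate:.4f}/hr")
print(f"RunPod L40S: ${runpod_l40s_per_second:.6f}/sec")
```

Putting every vendor on the same unit before comparing avoids order-of-magnitude slips when mixing hourly and per-second price sheets.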
[CP001, CP005, CP006, CP016, CP017, CP018]
3.4 Moat Durability and Competitive Risk
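Modal's most durable moat is architectural: sub-second GPU cold starts (from GPU memory snapshots, a content-addressed container filesystem, and CUDA checkpoint/restore), Python-native ergonomics (no YAML or Dockerfile for most use cases), and multi-cloud GPU pooling together form a stack that took five years to build and cannot be easily replicated. The $355M Series C in May 2026 gives the company capital to keep pushing hardware partnerships and R&D. A growing enterprise customer list (Physical Intelligence, Suno, Cognition, DoorDash, Substack) provides social proof and case evidence that the platform is production-tested. Sacra notes that Modal's partnership with Oracle Cloud Infrastructure brings pricing flexibility and GPU capacity that no single hyperscaler can offer. Even so, Modal faces material erosion risk. First, hyperscaler convergence: Google Cloud Run's L4 GPU instances (5-second startup, scale-to-zero) and Azure Container Apps Serverless GPUs (per-second billing, sandbox support) both replicate Modal's core serverless GPU proposition inside existing enterprise cloud relationships, on the same procurement path. Second, performance commoditization: RunPod's FlashBoot (cold starts under 200ms) and Baseten's proprietary inference optimization stack both narrow Modal's performance lead on specific workloads. Third, the compliance gap: Lambda AI's ISO 27001/ISO 27017/SOC 2 Type II portfolio and Baseten's SOC 2 Type II + HIPAA certifications give regulated-industry buyers alternatives with stronger credentials on paper; Modal's HIPAA compliance is limited to the Enterprise tier, and its broader compliance roadmap is not publicly disclosed. Fourth, price-floor pressure: RunPod's L40S at $0.86/hr and Beam Cloud's RTX 4090 at about $0.69/hr ($0.000192/sec × 3,600) set a markedly lower price floor for batch workloads, where the developer-experience premium is worth less. A negative signal on Hacker News (June 2026, cited in Chapter 1) reports three major outages within a month (May 7, May 19, and June 3, 2026); in a competitive market where uptime SLAs can differentiate (Baseten claims 99.99%), this is an especially relevant reliability flag for diligence. The net competitive conclusion: Modal's moat is real but softer than a proprietary-model or data-network moat; it rests on accumulated infrastructure investment, developer-experience quality, and platform breadth, and all three need sustained investment to hold as peers close the technical gap. [CP014, CP016, CP025, CP026, CP039, CP010]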
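| Moat claim | Supporting evidence | Threat | Severity | Mitigation / diligence question |
|---|---|---|---|---|
| Sub-second GPU cold starts via memory snapshots | May 2026 blog details a four-layer stack (cloud buffers, content-addressed FS, CPU ckpt, CUDA ckpt); Physical Intelligence confirms it in production (10–15ms latency) | RunPod FlashBoot claims worker startup under 200ms; Google Cloud Run L4 GPU starts in 5 seconds; Azure Container Apps containers start in under a second | Medium: RunPod is closing the gap but lacks GPU-level memory snapshot depth; hyperscalers are limited to L4 | Verify whether RunPod FlashBoot covers a loaded model or only worker startup; benchmark cold starts on Modal, RunPod, and GCR with identical model weights |
| Python-native SDK ergonomics (@app.function decorator) | Suno's CTO says that all you need to know is that you can scale function calls in the cloud with a few lines of Python; also cites zero config files | Beam Cloud offers a Python-first SDK with a similar decorator pattern; hyperscaler DX may also improve | Low-Medium: Beam Cloud is early and smaller; Modal's SDK maturity and documentation depth act as switching costs | Track Beam Cloud SDK usage and HN developer sentiment; assess whether Beam Cloud can gain traction in the AI engineering community in 2026 |
| Multi-cloud GPU pooling (AWS + GCP + Oracle) | Sacra confirms the Oracle Cloud Infrastructure partnership brings pricing flexibility; Modal docs confirm multi-cloud scheduling | Baseten and Beam Cloud both offer multi-cloud or BYOC options; hyperscaler-native options are single-cloud pools by nature | Medium: Baseten's self-hosting and BYOC suit enterprises better than Modal's managed-only multi-cloud model | Confirm Oracle partnership terms and GPU allocation guarantees; assess whether the top 10 enterprise customers need BYOC |
| Enterprise customer lock-in (Python SDK workflow coupling) | Applied Compute, Cognition, and Lovable are listed as deeply integrated users; Sandboxes back millions of coding-agent environments | Model weights, containers, and inference frameworks (vLLM, TRT-LLM) are all portable; the market is structurally easy to multi-home | Medium: workflow-level lock-in exists, but data portability remains intact; mature enterprises will dual-source | Track customer NPS and churn at 12-month renewals; identify accounts already multi-homing with RunPod or Baseten |
| Series C capital ($355M) extends runway and adds resources for GPU partnerships | May 2026 raise confirmed at a $4.65B valuation; investors include General Catalyst, Redpoint, Menlo, Bain, and Accel | CoreWeave holds multibillion-dollar contracts; Baseten has raised $585M; hyperscaler balance sheets are near-unlimited | Low: Modal is well capitalized at this stage; the hyperscalers' financial advantage is structural, not a near-term risk | Review the capital allocation plan: GPU reservation commitments, R&D headcount, enterprise sales capacity |
| $300M+ ARR growth (5x from Series B to Series C) | Sacra estimates April 2026 ARR at $300M; the company cites fivefold growth since Series B | Revenue concentrated in AI-native startups (Suno, Cognition) creates churn risk if those customers slow spending; company-reported ARR is unaudited | Medium: concentration risk is real; no independent revenue verification | Validate ARR against audited revenue or customer-level usage data; assess top-10 customer revenue concentration |
| Compliance gap versus competitors in regulated industries | Lambda AI holds ISO 27001/ISO 27017/ISO 27701/ISO 22301/SOC 2 Type II; Baseten holds SOC 2 + HIPAA across all tiers; Modal's HIPAA is Enterprise-only | Large enterprise and government buyers increasingly require a full compliance stack before procurement; Modal has no FedRAMP authorization | High: a concrete substitution risk in healthcare, finance, and federal markets | Confirm Modal's 2026–2027 compliance roadmap; assess whether FedRAMP or ISO certifications are planned or budgeted |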
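Severity ratings (low / medium / high) are a combined judgment of evidence quality, competitor capability, and the time window to materiality. Diligence questions are forward-looking and require primary-source verification that was not available in this round.
[CP007, CP008, CP010, CP012, CP014, CP015]
Summary of Modal's competitive durability across six dimensions as of June 2026. Ratings reflect only the evidence quality of the sources scraped for this chapter.
[CP008, CP014, CP016, CP018, CP025, CP026]
04 Financials
4.1 Revenue Model and Public Pricing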
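Modal charges only for compute usage; there are no per-seat, per-API-call, or per-token fees. Three plans frame the commercial model: Starter ($0/month) includes $30/month of free compute credit, three workspace seats, 100 containers, and 10 GPU concurrencies; Team ($250/month) adds $100/month of credit, unlimited seats, 1,000 containers, 50 GPU concurrencies, custom domains, a static IP proxy, and deployment rollbacks; Enterprise (custom pricing) adds volume discounts, higher GPU concurrency, embedded ML engineering services, private Slack support, audit logs, Okta SSO, and HIPAA compliance. CPU compute is billed at $0.00003942/core/second (about $0.142/core-hour) and memory at $0.00000672/GiB/second (about $0.024/GiB-hour). Modal's own pricing page contrasts serverless and traditional cost models with a representative example: in the traditional cloud case, 75 GPUs running for 24 hours at $3/GPU-hour cost $5,400; in the Modal serverless case, an average of 50 active GPUs at $3.95/GPU-hour costs $4,740, showing that a modest unit-price premium can be offset by better utilization.
Beyond compute there are three distinct revenue surfaces: Volumes (distributed file storage, billed per GB per day), Sandboxes (isolated execution containers for agent and untrusted-code workloads, billed per second like Functions), and Notebooks (hosted Jupyter environments with serverless pricing and automatic idle shutdown). The Series C blog post discloses that Sandboxes now contribute more than one-third of total revenue, the second-largest revenue line after compute Functions. This is a structurally important signal: it means Modal is a platform rather than a pure GPU rental business, and agent execution infrastructure has independently grown into a nine-figure revenue line less than three years after launch.
AWS and GCP marketplace integrations let enterprise customers apply committed cloud spend to Modal, materially lowering adoption friction for large customers with existing commitments. The Startup program gives early-stage companies free GPU credits. Billing settles monthly, with additional charges for usage spikes; Team and Enterprise plans can access a billing-report API for cost attribution across workspaces. Custom invoicing, international bank transfers, and split invoices are Enterprise-tier features, indicating that Modal already has the operational infrastructure for large-deal mechanics. List prices are only the outer layer; real enterprise economics depend on volume discounts, custom commitments, and support attach rates, none of which are publicly disclosed. [CI001, CI002, CI003, CI004, CI005, CI006]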
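The pricing-page example cited above (75 reserved GPUs versus an average of 50 active GPUs) can be checked with a few lines of arithmetic. This is a sketch using only the example figures from the text; the function and variable names are ours, and the break-even calculation is an added illustration rather than something the pricing page states.

```python
def gpu_spend(avg_gpus: float, hours: float, rate_per_gpu_hour: float) -> float:
    """Total spend in USD for an average GPU count held over a period."""
    return avg_gpus * hours * rate_per_gpu_hour


HOURS = 24
RESERVED_RATE = 3.00    # USD per GPU-hour, traditional cloud example
SERVERLESS_RATE = 3.95  # USD per GPU-hour, serverless example

traditional = gpu_spend(75, HOURS, RESERVED_RATE)   # 75 GPUs reserved all day
serverless = gpu_spend(50, HOURS, SERVERLESS_RATE)  # 50 GPUs active on average

savings = traditional - serverless
# Average active GPU count at which the serverless bill matches the reserved bill.
break_even_active_gpus = traditional / (HOURS * SERVERLESS_RATE)

print(f"traditional: ${traditional:,.0f}")
print(f"serverless:  ${serverless:,.0f}")
print(f"savings:     ${savings:,.0f}")
print(f"break-even average active GPUs: {break_even_active_gpus:.1f}")
```

Under these example rates the serverless option stays cheaper as long as average active GPUs remain below roughly 57, or about 76% of the 75 reserved.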
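| Revenue stream | Mechanism | Unit | Current value / status | Quality | Diligence question |
|---|---|---|---|---|---|
| Compute Functions (CPU + GPU) | All container execution (CPU and GPU) billed per second | CPU: $0.00003942/core/sec; Memory: $0.00000672/GiB/sec; GPU: per-second market rate | Core revenue surface; exact GPU-tier pricing is available on the pricing page (Wayback) | High confidence on billing units; low confidence on realized yield by GPU type | Provide revenue mix by GPU type, average realized price versus list price, and gross margin by GPU family. |
| Sandboxes | Isolated container environments billed per second; compute pricing structure identical to Functions | Per second; same CPU/memory/GPU rates | >1/3 of total revenue, per the Series C blog (May 2026); fastest-growing product line | High confidence on the disclosure; low confidence on margin detail | Provide the Sandbox revenue trajectory, average session length, and whether GPU Sandboxes carry different margins. |
| Storage (Volumes and Buckets) | Volume snapshots billed per GB per day; the pricing page mentions a per-GB rate | Per GB per day | Listed on the pricing page; accessible archives do not disclose the rate | Low | Provide storage revenue as a share of ARR, average GB per customer, and gross margin. |
| Notebooks | Browser-based hosted Jupyter with serverless pricing and automatic idle shutdown | Per second (same compute rates) | Recently launched; product page live; revenue contribution unknown | Low | Provide Notebooks activation and paid conversion, average session length, and revenue contribution. |
| Team plan subscription | Recurring $250/month platform fee, independent of compute usage | $250/month per workspace | List price confirmed on the pricing page; workspace count and paid-plan attach rate unknown | Medium confidence on list price; low confidence on actual mix | Provide the Team-plan workspace count, monthly recurring revenue from subscriptions, and the upgrade rate from Starter. |
| Enterprise plan (custom) | Custom pricing, including volume discounts, embedded engineering, higher concurrency, and compliance features | Custom contract | Publicly marketed; no disclosed contract values, minimum commitments, or ACV data | Low | Provide the Enterprise ACV distribution, minimum compute commitments, support attach rate, and renewal behavior. |
| Startup credits program | Free compute credits for early-stage startups; acquisition channel; converts to paid with growth | Subsidized | Program live; disclosed as an acquisition tool; no conversion data | Low | Provide startup cohort conversion rates and time to first paid invoice. |
Public evidence clearly confirms the billing surfaces and units; product-level revenue mix and realized pricing beyond list price are not publicly disclosed.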
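[CI001, CI002, CI003, CI004, CI005, CI006]
| Price / unit / contract | List vs. realized pricing | Discounts / unknowns | Source-supported implication |
|---|---|---|---|
| Starter: $0/month + compute | Pure list price; includes $30/month of free compute credit | No public conversion data, ARPU, or activation rate | Effectively a free trial with subsidized compute; low-friction funnel entry. |
| Team: $250/month + compute, includes $100/month credit | List price confirmed | Volume discounts not public; upgrade triggers (concurrency limits, custom domains) are clear | A predictable $250 MRR per workspace, plus compute expansion on top; paid-subscription ARR depends on workspace count. |
| Enterprise: custom pricing | Quote-based; volume discounts, embedded engineering, higher GPU concurrency, compliance | Minimum compute commitments, ACV, and renewal terms undisclosed | The Enterprise tier is where revenue yield and margin diverge most from list price; a key diligence target. |
| CPU compute: $0.00003942/core/sec (about $0.142/core-hr) | List price (pricing page, Wayback snapshot from June 2026) | Enterprise negotiated rates unknown | Per-second CPU rate transparency is unusual among cloud providers. |
| Memory: $0.00000672/GiB/sec (about $0.024/GiB-hr) | List price | Enterprise negotiated rates unknown | Memory pricing can be independently verified from the pricing page. |
| GPU example (pricing page): serverless about $3.95/GPU-hr, traditional cloud $3/GPU-hr | Illustrative list price on the pricing page; not a rate table by GPU type | Actual per-GPU-type pricing could not be retrieved from public archives; RunPod lists the H100 SXM at $3.29/hr for comparison | Modal's serverless premium is modest (about 20% over RunPod's H100 SXM) and below pure managed-cloud alternatives. |
| AWS/GCP marketplace integrations | Contract mechanism; Modal transacts through hyperscaler marketplaces | No public take rate or marketplace discount disclosure | Reduces enterprise procurement friction; marketplace fees slightly reduce realized revenue. |
List-price transparency is higher than at most private infrastructure peers; realized enterprise yield, rates by GPU type, and marketplace economics are undisclosed.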
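[CI003, CI004, CI005, CI006, CI007, CI008]
Modal converts developers' compute consumption across Functions, Sandboxes, Volumes, and Notebooks into per-second metered revenue, then upgrades a portion of it into higher-value Team and Enterprise contracts.
This flow depicts the commercial logic and does not quantify revenue mix. Only the Sandbox share of more than one-third of revenue is company-disclosed; no other split is public.
[CI001, CI002, CI003, CI006, CI007, CI008]
4.2 GTM Motion and Sales-Efficiency Proxies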
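Modal's GTM follows a path of developer self-serve entry, usage expansion, and then enterprise upgrade. The free Starter tier and $30 of monthly compute credit open the funnel, letting any Python developer try the product with little friction. The triggers for moving from Starter to Team ($250/month) are clear: a team hits concurrency limits (10 GPU slots on Starter, 50 on Team), needs custom domains and static IPs, or needs programmatic billing reports. The jump from Team to Enterprise is driven mainly by compliance (HIPAA, Okta SSO, audit logs), SLA requirements, dedicated engineering support, or the economics of volume commitments. The Startup Program opens a separate acquisition channel for high-growth companies, offering free GPU credits and a direct line to Modal's engineering team; this early brand affinity may convert to paid usage as the startups scale.
Public case studies, rather than quantified conversion metrics, carry most of the GTM proof. Substack moved its entire ML portfolio to Modal from AWS SageMaker, a major and very sticky AWS product; Quora's Poe uses Modal Sandboxes for secure code execution, and Quora estimates this saved the equivalent of two engineers' ongoing maintenance work. Applied Compute, which provides RL infrastructure for DoorDash, Cognition, and Mercor, says Modal is the only platform that offers the right primitives at every layer of the RL loop. Cognition reports running millions of Sandboxes in parallel, which implies very high sandbox consumption from a single customer. The developer-to-enterprise trajectory these cases imply (entering on the startup tier, running at production scale, and eventually upgrading to Enterprise) fits a PLG-to-enterprise sales motion.
Modal has not published CAC, payback period, enterprise sales-cycle length, NRR, or churn data. The best available proxy for GTM efficiency is revenue growth: per Sacra, ARR rose from about $119M at the end of 2025 to $300M+ in April 2026, a pace that looks too fast to be constrained by acquisition cost. That points to two possibilities: very low CAC in the developer self-serve channel, or very high NRR from expansion within existing accounts. Without cohort data, neither explanation can be confirmed. [CI002, CI003, CI009, CI013, CI014, CI015]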
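4.3 Cost Structure and Unit-Economics Proxies
Modal runs an asset-light supply model: it aggregates GPU capacity from multiple cloud providers, including AWS, GCP, and Oracle Cloud Infrastructure, rather than buying or financing GPU hardware directly. Its cost structure is therefore mostly variable and scales with customer compute consumption. Owning no GPU assets spares Modal heavy capital depreciation and supply-chain risk, but it also imposes a structural ceiling on gross margin: Modal's realized gross profit is the spread between what customers pay and what cloud providers charge Modal for compute. The Series C blog post says Modal pools capacity across hundreds of data centers worldwide to exploit regional capacity differences and reduce idle cost; the procurement discounts Modal has negotiated with each hyperscaler are not disclosed.
The in-house technology layer (a custom Rust container runtime, a content-addressed distributed filesystem, CPU checkpoint/restore, and GPU memory snapshotting) is a structural cost-reduction mechanism. Per the truly-serverless-gpus blog post and the Series C blog post, GPU snapshotting improves cold starts by 40–100x; this means Modal can absorb bursty workloads with fewer idle GPU-seconds than platforms that need 30–60 seconds to cold start. The effect on cost of revenue is concrete: if customer workloads are bursty, Modal can sustain higher overall GPU utilization than platforms that waste more GPU-seconds on warm-up, even while paying the same raw infrastructure rates. This is an efficiency moat that directly supports gross margin, even when list prices are close to competitors'.
On pricing, a comparison of RunPod's public GPU cloud rates with Modal's example pricing shows a modest serverless premium. RunPod lists the H100 SXM at $3.29/hr and the A100 SXM at $1.49/hr; the example on Modal's pricing page implies about $3.95/GPU-hr for its serverless pool. The premium is consistent with the value of autoscaling, sub-second cold starts, and managed infrastructure overhead. AWS EC2 GPU instance list prices (on-demand p4d.24xlarge, 8x A100) are well above raw GPU clouds, which makes Modal competitive at the managed-cloud tier rather than a head-on competitor to raw compute rental.
Modal has not published gross margin, a COGS breakdown, or cloud procurement terms. Independent analyst estimates covering comparable infrastructure-as-a-service companies suggest that asset-light GPU aggregators with proprietary efficiency technology can reach 30–50% gross margins, but that range has not been validated for Modal. Sacra's revenue estimate ($300M ARR in April 2026) and the Series C valuation ($4.65B) imply a 15.5x ARR multiple, in line with high-growth infrastructure companies, but this does not settle the gross-margin question: at a 30% gross margin, 15.5x ARR corresponds to roughly 52x gross profit, a demanding multiple. [CI021, CI022, CI023, CI024, CI025, CI026]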
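The valuation arithmetic above is easy to reproduce across the analyst gross-margin range. The sketch below uses the disclosed valuation and ARR; the 30–50% margins are the unvalidated analyst range from the text, and the function names are ours.

```python
def arr_multiple(valuation_musd: float, arr_musd: float) -> float:
    """Valuation divided by annual recurring revenue."""
    return valuation_musd / arr_musd


def gross_profit_multiple(
    valuation_musd: float, arr_musd: float, gross_margin: float
) -> float:
    """Valuation divided by annualized gross profit (ARR x gross margin)."""
    return valuation_musd / (arr_musd * gross_margin)


VALUATION_MUSD = 4650.0  # Series C post-money, USD millions
ARR_MUSD = 300.0         # disclosed annualized revenue, USD millions

multiple = arr_multiple(VALUATION_MUSD, ARR_MUSD)
# Gross margins below are an analyst range for comparables, not Modal data.
by_margin = {
    gm: gross_profit_multiple(VALUATION_MUSD, ARR_MUSD, gm)
    for gm in (0.30, 0.40, 0.50)
}

print(f"ARR multiple: {multiple:.1f}x")
for gm, mult in by_margin.items():
    print(f"gross margin {gm:.0%}: {mult:.1f}x gross profit")
```

Across the 30–50% range the gross-profit multiple spans roughly 31x to 52x, which is why the undisclosed margin matters more to the valuation case than the headline ARR multiple.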
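| Metric | Value / public proxy | Confidence | Why it matters | Diligence question |
|---|---|---|---|---|
| Published billing units | Per-second compute (CPU, GPU, memory); storage per GB-day; monthly plan fee | High | Shows that Modal monetizes usage at very fine granularity, maximizing revenue capture on bursty workloads. | Provide billing-unit yield by product line and average invoice size by plan tier. |
| Revenue growth (public claims) | 5x since the October 2025 Series B; from about $119M ARR (December 2025) to $300M ARR (April 2026) | Medium: company claim corroborated by Sacra; unaudited | Implies about 150% growth in five months; if sustained, the business is compounding too fast for CAC to be a credible constraint. | Provide monthly ARR cohort data for the past 12 months and the split between new and expansion revenue. |
| Sandbox revenue share | >1/3 of total revenue, per the Series C blog disclosure (May 2026) | Medium: company disclosure; not independently verified | Became the second-largest product line less than three years after launch; shows that platform breadth reduces single-product concentration risk. | Provide the Sandbox revenue trend for the past four quarters. |
| GPU cost vs. list price (proxy) | RunPod H100 SXM: $3.29/hr; Modal pricing-page example: serverless about $3.95/GPU-hr | Medium: public list-price comparison; not Modal's actual COGS | Only about a 20% list-price premium over a low-cost GPU cloud; if procurement discounts exist, some gross-margin room remains. | Provide actual GPU procurement rates by vendor and GPU type, and gross margin by GPU family. |
| Gross margin | Not publicly disclosed; comparable asset-light GPU aggregators are estimated at 30–50% (analyst range, not validated for Modal) | Low: estimate only | Gross margin determines whether $300M ARR translates into meaningful profit contribution. | Provide audited or management-reported gross margin by product line. |
| CAC / payback period | Undisclosed; the PLG model implies low CAC, but there is no public conversion or payback data | Low | CAC efficiency in a developer-led model determines whether growth is capital-efficient. | Provide CAC by acquisition channel, time to revenue conversion by cohort, and payback period by plan tier. |
| NRR / churn | Undisclosed; rapid ARR growth implies strong net retention, but no cohort breakdown exists | Low | NRR above 100% would confirm the expansion-revenue thesis; churn below 5% would validate perceived reliability. | Provide logo and dollar churn, NRR by cohort year, and customer concentration (top 10 as a percentage of ARR). |
| Revenue per employee | ~$300M ARR / ~120–180 employees = ~$1.67M–$2.5M ARR per employee | Medium: both figures are estimates or company claims | ARR per employee ranks near the top among private infrastructure companies; indicates a lean operating model consistent with PLG. | Provide confirmed headcount and the R&D/G&A/S&M split. |
No public source discloses Modal's gross margin, CAC, NRR, or churn; all estimates derive from proxies such as list-price comparisons, ARR disclosures, and analyst estimates.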
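[CI005, CI006, CI011, CI036, CI037, CI038]
Modal's unit-economics path runs from multi-cloud GPU procurement through in-house efficiency technology to customer billing; COGS and realized discounts are not public, so the path cannot be carried through to gross margin.
Gross margin is an analyst range estimate (30–50%) based on comparable asset-light GPU infrastructure companies; Modal has not disclosed gross margin. The efficiency-technology nodes come from the company's technical blog, but their financial effect on margin is not quantified.
[CI021, CI022, CI023, CI024, CI025, CI026]
4.4 Public Traction and Capital Adequacy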
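Modal's public traction narrative is stronger than that of most private infrastructure companies at the Series C stage. The company disclosed annualized revenue above $300M in its May 2026 Series C announcement, a voluntary disclosure most private companies avoid. Sacra corroborates the direction, estimating ARR at $300M in April 2026 versus about $119M at the end of 2025; that implies growth of about 150% in five months, or more than 300% year over year once annualized. The company says revenue has grown 5x since the October 2025 Series B; if ARR was about $60M at the Series B and about $119M in December 2025, this is consistent with Sacra's estimate. The customer list spans robotics (Physical Intelligence), music (Suno, with millions of songs per day on thousands of GPUs), coding agents (Cognition, Lovable), enterprise commerce (DoorDash), document AI (Reducto), social (Substack), and developer productivity (Ramp), showing real platform breadth and reducing concentration risk in any single vertical.
Public materials point to strong capital adequacy but do not allow underwriting-grade confirmation. The Company Overview chapter (see that chapter for the full round timeline) records three institutional rounds, ending with the $355M Series C at $4.65B post-money in May 2026. For this chapter's capital adequacy analysis, the key facts are these. The Series C closed within a year of the Series B and provides substantial operating capital. Total funding supported by public evidence is about $465M counting the Series B and Series C alone, or about $488M if the seed (about $7M) and Series A (about $16M) figures cited here are included (Series B about $110M per company context, against $87M reported by Sacra, which is an evidence gap; Series C $355M). The round was co-led by General Catalyst, with Quentin Clark, Max Rimpel, and Katie Keller of the GC team participating, which means one of the best-capitalized growth-equity firms in the industry provides deep fiduciary oversight.
What public evidence cannot establish: cash on hand, monthly burn rate, runway, whether Modal loses money at the gross or operating level, any debt or credit-facility obligations, and whether GPU capacity commitments to cloud providers constitute off-balance-sheet liabilities. A team of 120–180 people at typical New York / San Francisco AI-infrastructure compensation, plus multi-cloud GPU procurement, could imply substantial monthly cash burn. The $355M raise provides a thick cushion, but without internal financials no runway estimate holds up. The only negative signal in public sources remains the outage pattern: a Hacker News community report dated June 3, 2026 documents three major outages within a month (an AWS overheating event on May 7, an unspecified event on May 19, and an internal authentication system failure on June 3), suggesting that the high growth rate may be temporarily masking operational risk. [CI029, CI030, CI031, CI032, CI033, CI034]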
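The growth figures quoted above can be sanity-checked with a small sketch. The ARR inputs are the Sacra estimates and company claims cited in the text; the monthly-compounding formula used for annualization is our assumption, since the source does not say how the annualized figure was derived.

```python
def period_growth(start: float, end: float) -> float:
    """Fractional growth over the period (1.5 means +150%)."""
    return end / start - 1.0


def annualized_growth(start: float, end: float, months: float) -> float:
    """Growth rate if the period's pace compounded for 12 months (assumption)."""
    return (end / start) ** (12.0 / months) - 1.0


def implied_starting_arr(end_arr: float, growth_multiple: float) -> float:
    """Starting ARR implied by an ending ARR and a stated growth multiple."""
    return end_arr / growth_multiple


ARR_END_2025 = 119.0  # USD millions, Sacra estimate
ARR_APR_2026 = 300.0  # USD millions, Sacra estimate / company disclosure

growth = period_growth(ARR_END_2025, ARR_APR_2026)
annualized = annualized_growth(ARR_END_2025, ARR_APR_2026, months=5)
series_b_arr = implied_starting_arr(ARR_APR_2026, growth_multiple=5.0)

print(f"growth over the period: {growth:.0%}")
print(f"annualized (compounded): {annualized:.0%}")
print(f"implied ARR at Series B: ${series_b_arr:.0f}M")
```

Under monthly compounding the annualized figure comes out well above the 300% quoted in the text, so that number reads as a conservative floor rather than a point estimate.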
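| Metric | Public value / status | Source-supported implication | Diligence question |
|---|---|---|---|
| Total funding | About $465M from the Series B (about $110M per company context) and Series C ($355M); about $488M including the seed (about $7M) and Series A (about $16M) | Ample capital base for a company founded in 2021; provides a cushion for continued GPU procurement and team growth. | Confirm exact seed and Series A amounts; resolve the Sacra / $110M Series B discrepancy. |
| Latest round (Series C) | $355M raised in May 2026 at a $4.65B post-money valuation, co-led by General Catalyst and Redpoint | A large, recent round led by top-tier investors; runway is ample if estimated at the typical burn rate of a 120–180-person infrastructure company. | Provide the post-close cash balance and the board-approved use-of-proceeds plan. |
| Annualized revenue | > $300M ARR as of May 2026 (company disclosure) | If revenue grows at the disclosed pace, the business may approach self-sustainability on a gross-profit basis even before full profitability. | Provide monthly ARR and gross margin to assess the contribution-margin trajectory. |
| Headcount and OpEx proxy | Series C blog says 120+; LinkedIn people page shows about 180 | For a 150-person (midpoint) team at NY/SF market pay, annual cash compensation before benefits and infrastructure is about $25–40M+; total burn could be $50–100M+ per year (estimated range only). | Provide actual headcount by function, total cash compensation, and monthly operating cash burn. |
| Cash balance / monthly burn / runway | Not publicly disclosed | Capital adequacy cannot be underwritten without these figures; the $355M raise suggests runway is likely sufficient, but this cannot be confirmed. | Provide the current unrestricted cash balance, average burn over the past 6 months, and runway under base and downside scenarios. |
| Planned use of proceeds | Scaling low-latency inference; RL / training loops; Sandbox expansion; team growth in NY, SF, and Stockholm | Investment targets product and team rather than hardware capex; consistent with the asset-light model. | Provide the capex/opex budget by function and product for the next 18 months. |
| Debt / project financing / cloud commitment obligations | No public disclosure; GPU capacity is procured from hyperscalers on undisclosed commercial terms | Absence of public disclosure does not mean absence of obligations; cloud committed-use discounts typically require minimum spend commitments. | Provide all debt facilities, cloud vendor minimum spend commitments, reserved-capacity obligations, and material vendor terms. |
Funding history is cited from the Company Overview chapter; this table generates local Financials claims only as inputs to capital adequacy. Cash, burn, runway, and obligation facts remain private.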
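[CI029, CI030, CI031, CI032, CI033, CI034]
Source-qualified ranges for Modal's key financial metrics as of June 2026, split by evidence tier.
ARR and the valuation multiple are company-disclosed or directly derivable from public data. All other estimates are analyst ranges and should not be cited as company data.
[CI029, CI033, CI034, CI035, CI036, CI037]
Modal's capital structure flows from equity financing into asset-light GPU procurement and R&D investment, with no disclosed hardware capex or debt obligations.
All outflow figures are analyst estimates based on headcount proxies and comparable infrastructure companies. Modal has not disclosed financial statements, cash balance, or burn data. The waterfall chart shows the structure of capital flows and is not a P&L.
[CI029, CI030, CI031, CI032, CI033, CI034]
4.5 Financial Conclusions and Disclosure Gaps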
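Compared with most diligence files on infrastructure companies at the same stage, the financial conclusion here is more constructive; without private data, however, the company still cannot be underwritten. On the positive side, Modal did something unusual: it voluntarily disclosed in a public announcement that ARR had passed $300M and had grown 5x since the prior round. That transparency, together with Sacra's independent corroboration, makes the revenue claim more credible than company self-reporting alone. Usage-based billing fits the AI workload category well: as customers deploy more models, add more agents, and grow their end-user base, consumption expands naturally, creating a built-in expansion loop. The Sandbox segment has grown from a 2023 product launch to more than one-third of revenue in 2026, which shows the loop is already visible. The customer list covers multiple use cases and includes named production deployments at meaningful scale.
The asset-light supply model preserves cash that GPU-owning competitors spend on hardware, but it also creates a gross-margin ceiling that outsiders cannot verify. The in-house technology moat (GPU snapshotting, a custom filesystem, multi-cloud pooling) should improve gross margin relative to pure resale operators, but actual gross margin, itemized COGS, and cloud procurement terms are all non-public. Until they are disclosed, the gap between $300M ARR and any path to profitability can be filled only with assumptions, not evidence.
The outage pattern is a material negative signal that weakens the reliability narrative. Three incidents in one month, including an internal authentication failure, point to gaps in infrastructure maturity that are unusual for a cloud infrastructure company at this ARR scale. The overall uptime figure (99.946% for GPU functions) is adequate on its own, but the clustering of incidents in May–June 2026 coincides with the period in which the company was promoting 5x revenue growth, which may indicate that operational scaling is lagging commercial growth.
Capital adequacy is directionally positive ($355M is a large Series C for an infrastructure company) but cannot be confirmed without cash balance and burn disclosure. The 15.5x ARR valuation multiple is in line with mid-2026 market consensus for AI infrastructure multiples, yet high enough that any growth deceleration would trigger a material re-rating. In summary: for a private company, Modal has strong revenue quality, a freshly replenished capital position, and a credible technology moat. The diligence blockers are gross-margin opacity, burn-rate opacity, outage risk, and the governance / disclosure gaps recorded in the Company Overview chapter. Full private financial disclosure is the most important gate before investing. [CI002, CI007, CI011, CI036, CI037, CI038]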
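| Missing private metric | Underwriting impact | Precise diligence path |
|---|---|---|
| Gross margin by product line (Compute, Sandboxes, Storage, Notebooks) | Cannot tell whether $300M ARR carries a 30% or 60% gross margin; the difference affects intrinsic value by billions of dollars. | Request audited product-level P&L for the past four quarters, with COGS split by cloud vendor and GPU family. |
| Cloud vendor procurement terms, committed spend, and reserved-capacity obligations | GPU pass-through cost is the largest COGS item; undisclosed procurement discounts set the gross-margin floor. | Review all cloud vendor agreements (AWS, GCP, Oracle), including committed-use contracts, reserved-instance holdings, and spot-instance mix. |
| Monthly burn rate and cash balance | Capital adequacy is asserted, not proven; runway could range from 24 to 60+ months depending on burn. | Provide current unrestricted cash, net burn over the past 6 months (including infrastructure payments), and a 12-month scenario runway model. |
| Customer concentration (top 10 as a percentage of ARR) and NRR | Revenue quality depends on whether growth is broadly distributed or concentrated in 2–3 hyperscale / agents companies; NRR determines whether the expansion loop is real. | Provide a top-20 customer revenue table, dollar NRR by cohort year, and logo churn for the past four quarters. |
| CAC and payback period by acquisition channel | The PLG model should yield low CAC, but growth efficiency cannot be confirmed without data; startup program economics are unknown. | Provide CAC by channel (PLG self-serve, startup program, outbound, marketplace), time to revenue conversion, and payback period by plan tier. |
| Resolution of the Series B amount and date discrepancy | Sacra reports $87M in September 2025; company context reports $110M in October 2025; the lead investor named also differs; unresolved. | Provide Series B closing documents confirming the exact round size, date, lead investor, and cap table impact. |
| Revenue recognition policy and deferred revenue | Consumption revenue is usually simple to recognize, but startup credits, enterprise minimum commitments, and prepaid compute can create deferred revenue or contra-revenue items. | Provide the revenue recognition policy, deferred revenue balance, and a breakdown of credit liabilities. |
Every row is a material diligence blocker. Public evidence establishes a strong directional narrative but is insufficient to underwrite revenue quality, margins, or capital adequacy.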
[CI036, CI043, CI044, CI047, CI048, CI049]
05 Product and Technology
5.1 Product Surface by Customer Workflow
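Modal positions itself as the "production cloud for AI," built around a single mental model: any Python function plus a decorator becomes an autoscaling, GPU-backed cloud job. Viewed by customer workflow, the product covers four usage patterns. First, interactive and exploratory compute: Notebooks let ML engineers spin up a GPU-backed browser notebook in seconds, and the `modal shell` command attaches a debug shell directly to a running container. Second, batch and scheduled jobs: Functions fan out in parallel across thousands of containers via `map()`, `starmap()`, and `for_each()`, while `modal.Cron`/`modal.Period` handle time-based triggers with no external scheduler. Third, serving and real-time inference: Web Endpoints expose any function as a public HTTPS endpoint through `@modal.fastapi_endpoint` or ASGI or WSGI apps; `@modal.concurrent` provides input concurrency to support continuous batching for LLM serving. Fourth, agent and untrusted-code execution: Sandboxes are ephemeral isolated containers that accept arbitrary code (from an LLM or a user), execute it under gVisor isolation, and return stdout/stderr. Lovable uses them to support tens of thousands of simultaneous app-creation sessions, and Cognition has run millions of Sandboxes for coding agents. Storage is also first-class: Volumes (a high-performance distributed filesystem), Dicts (distributed key-value), and Queues (FIFO, multi-producer / multi-consumer) round out the primitives. The unified SDK means a team can go from a single-function prototype to a production serving cluster to an agent sandbox within one codebase, without changing infrastructure vendors. [CE001, CE002, CE006, CE007, CE008, CE009]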
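| Module / asset | Primary user | Status / maturity | Core function | Differentiation | Diligence gap |
|---|---|---|---|---|---|
| Functions | ML engineers and application developers running GPU/CPU workloads | GA / mature core product | Any Python function becomes an autoscaling cloud job via @app.function or @app.cls; supports GPUs, concurrency, and lifecycle hooks | Defined purely in code; about 1s container cold start; scales from 0 to 1,000+ GPUs without reservations; multi-cloud resource pool | No independently verified cold-start benchmark methodology, and no public SLA for the standard/team tiers |
| Sandboxes | Coding-agent and AI application developers executing LLM-generated code | GA / fast-growing | Isolated gVisor containers launched at runtime with full filesystem / network isolation; support stdin/stdout/stderr, TCP tunnels, volume mounts, and lifecycle events | 50,000+ concurrent Sandboxes (Lovable); 1 billion+ Sandboxes launched cumulatively (May 2026); sub-second startup | Sandbox-specific SLA terms and the maximum count per workspace are not fully public |
| Training | ML engineers fine-tuning or training models on GPU clusters | GA / expanding to multi-node | Managed GPU training jobs; multi-node with RDMA networking (per Sacra); dispatched onto pooled capacity | Same SDK for training and inference removes vendor handoffs; checkpoints can go straight into serving | Dedicated training documentation pages were inaccessible in this round; multi-node / RDMA maturity is not yet fully public |
| Volumes | Engineers storing model weights, datasets, and pipeline outputs | GA (v2, HIPAA scope extended) | Distributed filesystem optimized for write-once, read-many; multi-cloud backing for high availability; up to 2.5 GB/s bandwidth | Distributed by default with no replicas to manage; integrated with Modal Functions and Sandboxes; v2 is HIPAA-compliant | v1 Volumes are outside the HIPAA BAA scope; daily-billed snapshots mean a deletion can take up to 4 days to show up on the bill |
| Web Endpoints | API and application developers serving HTTP traffic from Modal Functions | GA / mature web serving layer | Exposes FastAPI, ASGI, or WSGI apps, or plain Python functions, as public HTTPS endpoints via @modal.fastapi_endpoint or @modal.asgi_app | Platform-managed cold starts with scale to zero; custom domains on the Team plan | No public contractual uptime for web endpoints; 90-day status shows 99.933% |
| Notebooks | ML engineers and researchers doing exploratory / collaborative compute | GA (launched 2025, supports GPU memory snapshots) | Collaborative browser notebooks that can attach any GPU; GPU memory snapshots cut startup time by up to 10x | GPU-backed collaborative notebooks with cold starts close to serverless Functions; works with any ML framework | Memory Snapshots are outside the current HIPAA BAA scope, limiting use in regulated research environments |
| Dicts | Engineers sharing distributed state across Modal Functions or Sandboxes | GA / utility primitive | Distributed key-value store accessible from anywhere; cloudpickle serialization; distributed locks | Accessible from any container or SDK call; composes cleanly with other Modal primitives | 100 MiB/object cap and a 7-day inactivity TTL; durability not guaranteed (recommended for small objects) |
| Queues | Engineers building async pipelines, fan-out workflows, and producer / consumer patterns | GA / utility primitive | Multi-producer, multi-consumer FIFO queues partitioned by string key; sync / async access; 24-hour TTL | Cloud-native alternative to Redis/Celery queues with no infrastructure to manage; pairs with Functions for async fan-out | The 24-hour TTL makes queues unsuitable for durable message retention; 5,000 items per partition |
| Scheduled Functions | Engineers running time-based jobs or pipelines | GA / simple scheduling | Period (interval) and Cron syntax schedules attached to deployed Modal Functions; monitorable from the dashboard | No external Airflow, Prefect, or cron infrastructure needed; the schedule lives alongside the function definition | Schedules cannot be paused; they must be removed and redeployed; Period resets on redeploy |
Status is based on Modal's public documentation and blog posts as of 2026-06-14. "GA" labels are inferred from active public documentation and customer cases; apart from GPU Memory Snapshots (labeled alpha) and Snapshot restores, Modal does not use GA/alpha labels consistently.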
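[CE001, CE002, CE006, CE007, CE008, CE009]
| User task | Current workflow (without Modal) | Modal approach | Publicly quantifiable benefit | Limitation |
|---|---|---|---|---|
| Run LLM inference at scale under fluctuating demand | Reserve GPU instances, configure autoscaling, manage cold starts and model loading by hand | Functions with a GPU type, @modal.concurrent for continuous batching, Memory Snapshots to cut cold starts | Reducto: 3x lower P90 latency, 83% lower cold boot; Physical Intelligence: about 10-15ms network overhead | GPU memory snapshots are incompatible with multi-GPU and non-CUDA GPU code; the limitation is documented |
| Safely execute agent-generated code in production | Build or rent custom container orchestration to isolate untrusted code | Sandboxes provide gVisor isolation, volume mounts, and TCP tunnels; launched with a single API call | Lovable: tens of thousands of concurrent app creation sessions; Cognition: millions of Sandboxes for coding agents | No public SLA for Sandbox availability; status page shows 90-day uptime of 99.861% |
| Run an RL training loop end to end (rollouts, grading, inference) | Stitch together separate training infrastructure, sandbox environments, and inference servers across vendors | One SDK covers Sandboxes (rollouts), Functions (grading fan-out), and Training (model updates) | Applied Compute: used for DoorDash, Cognition, and Mercor RL workloads; described as the only platform with all RL primitives | Multi-node RDMA training maturity is not fully public; training docs were blocked during this research |
| Deploy and iterate on models with fast feedback | Package the model, build a container, push to a registry, write deployment YAML, set up monitoring | modal deploy <filename>; Images defined in Python; modal serve for live reload; modal shell for debugging | Reducto: an equivalent endpoint deployment in "2 lines of code" versus "150 lines of code plus CNS and Cloudflare" | The developer workflow is optimized for Python; non-Python model artifacts need manual wrapping |
| Scale document or media processing to enterprise throughput | Provision cluster capacity, or use a queued batch system that needs complex orchestration | Functions with map() fan-out, parameterized Functions for per-customer isolated pools, region-pinned Functions | Reducto: obtained 1,000+ GPUs in under an hour and completed a 100k pages/minute enterprise load test | Costs more at scale than self-managed RunPod or spot instances; enterprise pricing requires direct negotiation |
Benefits come from publicly reported results in company-published customer cases and do not represent guaranteed outcomes. The limitation column reflects constraints in official documentation or public information.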
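[CE002, CE006, CE007, CE008, CE015, CE016]
How a developer or team moves a local Python function or model to production workloads on Modal, branching into inference, agent execution, and batch processing.
[CE001, CE002, CE006, CE007, CE012, CE022]
5.2 Architecture and Operating Model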
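Modal's architecture is layered around the Python SDK, which abstracts multi-cloud GPU provisioning, container management, and distributed storage into one programming interface. Compute containers are defined through the `modal.Image` Python API (chained calls such as `Image.debian_slim().pip_install(...)`), with no YAML or Dockerfile required; an image builder then validates the image and distributes it to worker nodes. Containers run in gVisor, the kernel sandbox Google uses in Cloud Run and GKE, which isolates workloads more strongly than standard container namespaces. The container runtime is written in Rust for performance and memory safety. Capacity is pooled across hundreds of data centers in AWS, GCP, and Oracle Cloud Infrastructure worldwide, which lets Modal route each GPU request to the cheapest available hardware without users reserving capacity. GPU selection is written as `@app.function(gpu="H100")`; Modal may automatically upgrade a request (H100→H200, A100-40GB→A100-80GB) at no extra charge to maximize pooled utilization. Multi-GPU containers support up to 8 GPUs per container (B200, H200, H100, A100, L4, T4, L40S). `@modal.concurrent` provides input concurrency so that a container can handle multiple requests at once; this is critical for continuous batching in vLLM or SGLang LLM serving. The container lifecycle model (enter/exit hooks set with `@modal.enter` and `@modal.exit`) separates one-time initialization from per-request execution, so model weights load efficiently. Region selection (at narrow or broad granularity) and separate routing regions (us-east, us-west, eu-west, ap-south) let latency-sensitive workloads sit close to databases or robots. Secrets are injected as environment variables through `modal.Secret` and never enter an image build layer. [CE003, CE004, CE005, CE013, CE014, CE030]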
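As a rough illustration of the routing behavior described above, the sketch below picks the cheapest pool that can satisfy a GPU request and allows the upgrade paths named in the text (H100→H200, A100-40GB→A100-80GB). It is not Modal's scheduler: the `Pool` type, the `place` function, and every cost and capacity figure are hypothetical, chosen only to make the selection logic concrete.

```python
from dataclasses import dataclass
from typing import Optional

# Upgrade paths named in the text; a request may be served by any listed type.
ACCEPTABLE_GPUS = {
    "H100": ("H100", "H200"),
    "A100-40GB": ("A100-40GB", "A100-80GB"),
}


@dataclass
class Pool:
    cloud: str
    gpu: str
    cost_per_gpu_hour: float  # hypothetical internal cost, not a real price
    free_gpus: int


def place(requested_gpu: str, count: int, pools: list[Pool]) -> Optional[Pool]:
    """Return the cheapest pool able to serve the request, or None if none can."""
    acceptable = ACCEPTABLE_GPUS.get(requested_gpu, (requested_gpu,))
    candidates = [p for p in pools if p.gpu in acceptable and p.free_gpus >= count]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: p.cost_per_gpu_hour)
    best.free_gpus -= count  # reserve the capacity for this request
    return best


pools = [
    Pool("aws", "H100", 2.90, free_gpus=0),  # cheapest, but no free capacity
    Pool("gcp", "H100", 3.10, free_gpus=4),
    Pool("oci", "H200", 3.00, free_gpus=8),  # upgrade target for H100 requests
]

# An 8-GPU H100 request only fits in the H200 pool, so it is upgraded.
chosen = place("H100", count=8, pools=pools)
print(chosen)
```

A real scheduler would also weigh region, image cache locality, and preemption risk; the point here is only that pooling plus upgrade paths widens the candidate set for each request.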
| Layer / component | Role | Key technical details | Dependencies | Risk |
|---|---|---|---|---|
| Python SDK / decorator layer | Developer interface; turns decorated Python functions into Modal App objects | @app.function, @app.cls, @modal.enter, @modal.exit, @modal.fastapi_endpoint, @modal.concurrent; no YAML required | Python 3.10-3.14; open-source client (modal-labs/modal-client) | Any breaking SDK change forces downstream developers to change code; v1.5.0 as of June 2026 |
| Container image builder | Turns Python Image definitions into container images distributed to workers | Chained calls starting from Image.debian_slim(); pip/uv install; Dockerfile fallback; add_local_dir for local code | Modal-controlled build infrastructure; underlying cloud provider storage | Image build 90-day uptime is 99.863%; image build failures block deployments |
| gVisor container runtime | OS-level isolation for Functions and Sandboxes; the kernel sandbox used by GKE and Cloud Run | Every container runs under gVisor; automated synthetic monitoring checks network/application isolation | Google-maintained gVisor project; NVIDIA CUDA driver compatibility may constrain future GPU features | gVisor compatibility with new CUDA features requires driver certification testing |
| Rust worker runtime | Runs the container lifecycle, handles network I/O, and coordinates with the storage layer | Memory-safe implementation improves security; handles TLS, gRPC, and container IPC | Proprietary Modal-internal component | Core proprietary component; limited external auditability |
| Custom content-addressed container filesystem | Serves image layers from a multi-tier cache (worker memory → cluster → storage); reduces cold starts | Files are managed by content address; popular files (torch and similar) are cached in worker memory; 3-5x faster than uncached | Multi-cloud object storage (AWS S3, GCP GCS, Oracle) | Cache effectiveness depends on the file popularity distribution; new image builds may cold-start more slowly at first |
| CPU Memory Snapshots | Captures container memory state before the first request; restores it on cold start to skip re-initialization | Captures Python imports and JIT compilation results; 3-10x faster cold starts; integrates with @modal.enter(snap=True) | Cloudpickle-compatible serialization; Modal distributed filesystem for snapshot storage | Outside the HIPAA BAA scope; incompatible with stateful I/O during the snapshot phase |
| GPU Memory Snapshots (alpha) | Extends CPU snapshots to capture GPU device memory, CUDA kernels, streams, and memory mappings | Uses the NVIDIA CUDA checkpoint/restore API (driver 570/575 branches); cuCheckpointProcessCheckpoint(); cuts cold starts by up to 10x | NVIDIA driver compatibility requirements; currently alpha | Incompatible with multi-GPU and non-CUDA code; torch.compile interaction needs workarounds |
| Multi-cloud capacity pool | Routes each GPU request to available hardware in AWS, GCP, or Oracle; no user reservations needed | Maintains cloud buffers of idle GPUs per GPU type; automatic upgrade paths (H100→H200, A100→A100-80GB) | AWS, GCP, Oracle Cloud Infrastructure; Sacra mentions the Oracle partnership | Cloud provider outages hit capacity directly (May 7 SEV1: AWS AZ overheating); incident history shows single-AZ failures |
| Secrets management | Injects credentials into containers as environment variables without writing them into images | Create / update / delete via dashboard, CLI, and Python API; multiple Secrets per Function; 32KB key-value limit | Modal-controlled secret storage; Dependabot-audited dependencies | Public docs do not mention HSM or dedicated secret-store integration |
Architecture details come from Modal's official documentation and engineering blog as of 2026-06-14. The Rust runtime and content-addressed filesystem architecture are confirmed by Sacra analyst research and Modal's own technical blog.
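[CE002, CE003, CE004, CE005, CE013, CE014]
A layered view of Modal's public architecture, from the developer interface through container execution to multi-cloud hardware and storage.
[CE001, CE003, CE004, CE005, CE008, CE009]
5.3 Cold-Start Technology and Container Innovation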
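Modal's most distinctive technical capability is cold-start engineering, documented in detail in the May 2026 engineering blog post "Truly Serverless GPUs." Four layers stack to bring GPU replica scaling down from "multiple kiloseconds" to tens of seconds. First, cloud buffers: Modal keeps a pool of healthy, idle GPUs in its network so that most scale-up requests do not wait on hyperscaler instance provisioning. Second, a content-addressed, multi-tier container filesystem: a globally distributed cache holds popular container image files in worker memory, 3–5x faster than an uncached download; large libraries such as torch benefit most because many users share them. Third, CPU Memory Snapshots (GA in January 2025): a container is snapshotted before it accepts requests, and later cold starts restore directly from the frozen memory state, skipping Python imports and JIT compilation, for a practical speedup of 3–10x. Fourth, GPU Memory Snapshots (alpha in July 2025): using the CUDA checkpoint/restore API in NVIDIA driver branches 570/575, Modal captures device memory contents (model weights), CUDA kernels, CUDA objects, streams, and memory mappings; on restore it rebuilds the GPU context without rerunning expensive operations such as `torch.compile`. Public benchmarks show the P0 cold start for vLLM serving Qwen2.5-0.5B-Instruct improving from 45s to 5s, and the P0 for a ViT inference function with `torch.compile` improving from 8.5s to 2.25s. In production, Reducto reports that GPU snapshots cut cold boot time for its document-processing model by 83% (from 70s to 12s). Limitations documented by Modal include that GPU snapshots are generally incompatible with multi-GPU code and non-CUDA GPU work, and they do not speed up loading weights from storage. The overall architecture targets the GPU Allocation Utilization problem: minimizing the gap between paid GPU-hours and GPU-hours actually spent running application code. Modal argues that this utilization sits well below 50% in traditional fixed-allocation cloud deployments. [CE015, CE016, CE017, CE018, CE019, CE020]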
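The second layer, a content-addressed multi-tier cache, can be illustrated with a toy model. The sketch below is a generic version of the technique and not Modal's implementation: the class name, the plain dictionaries standing in for each tier, and the sample file are all hypothetical. The tier order (worker memory, then cluster, then storage) follows the description in the text.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Identify a file by the SHA-256 digest of its bytes, not by its path."""
    return hashlib.sha256(data).hexdigest()


class TieredFileCache:
    """Toy three-tier read path: worker memory, shared cluster cache, storage."""

    def __init__(self, object_store: dict[str, bytes], cluster_cache: dict[str, bytes]):
        self.worker_memory: dict[str, bytes] = {}
        self.cluster_cache = cluster_cache  # shared by all workers in a cluster
        self.object_store = object_store    # slowest tier, always authoritative
        self.served_from: list[str] = []    # records which tier answered each read

    def read(self, digest: str) -> bytes:
        if digest in self.worker_memory:
            self.served_from.append("worker_memory")
            return self.worker_memory[digest]
        if digest in self.cluster_cache:
            self.served_from.append("cluster")
            data = self.cluster_cache[digest]
        else:
            self.served_from.append("storage")
            data = self.object_store[digest]
            self.cluster_cache[digest] = data  # warm the shared tier
        self.worker_memory[digest] = data      # warm the local tier
        return data


# Identical bytes shipped in different images map to one stored object.
shared_library = b"pretend this is a large shared library file"
digest = content_address(shared_library)
object_store = {digest: shared_library}
cluster: dict[str, bytes] = {}

worker_a = TieredFileCache(object_store, cluster)
worker_b = TieredFileCache(object_store, cluster)

worker_a.read(digest)  # first read anywhere falls through to storage
worker_a.read(digest)  # repeat read on the same worker is served from memory
worker_b.read(digest)  # a different worker benefits from the cluster tier
print(worker_a.served_from, worker_b.served_from)
```

Because the key is the content digest, a widely shared library is stored and cached once regardless of how many images include it, which is the property the text credits for the gains on libraries such as torch.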
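Key external dependencies and internal components the Modal platform relies on; highlights single-vendor risk concentrations and compliance scope boundaries.
[CE013, CE016, CE019, CE020, CE027, CE030]
5.4 Trust, Security, and Reliability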
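By late-stage private company standards, Modal's trust posture is strong. The security documentation is specific: the worker runtime and storage infrastructure are written in Rust (a memory-safe language), all container workloads run inside gVisor, all public APIs use TLS 1.3, all user data is encrypted in transit and at rest, and automated synthetic monitoring continuously checks network and application isolation inside the runtime. SOC 2 Type II has been attested with no exceptions found (January 2025 audit), and Modal commits to annual renewal. The Enterprise plan supports HIPAA-compliant workloads under a BAA, but Volumes v1, Images (excluding Filesystem/Directory Snapshots), and Memory Snapshots are currently excluded from BAA scope; Volumes v2 is in scope. A private bug bounty program runs through HackerOne with published severity SLAs (Critical: 24 hours; High: 1 week; Medium: 1 month). Stripe handles payments under PCI Level 1 certification; Modal does not store credit card information. Corporate security controls include an SSO IdP, phishing-resistant MFA, Secureframe MDM, and annual business continuity exercises. The trust portal at trust.modal.com provides access to compliance documents. On reliability, the status page (June 14, 2026) shows 90-day uptime of 99.946% for GPU functions, 99.938% for CPU functions, 99.933% for Web endpoints, and 99.782% for Snapshot restores, all solid figures. However, a Hacker News community thread (June 3, 2026) documents three major operational incidents within a month: May 7 (AWS AZ overheating, SEV 1), May 19 (no public incident report), and June 3 (internal authentication system failure). The aggregate uptime statistics are compatible with short outages of this kind, but three clustered in one month is a negative signal. Modal does not disclose a public contractual SLA for Standard or Team plans; enterprise SLA terms are available only in negotiated contracts. Diligence should request the SLA exhibits. [CE026, CE027, CE028, CE029, CE030, CE031]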
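Uptime percentages are easier to compare with incident reports once they are converted into minutes of downtime. The sketch below does that for the 90-day status-page figures quoted above; the function name is ours, and the calculation assumes the percentages cover the full 90-day window uniformly.

```python
def downtime_minutes(uptime_percent: float, window_days: int = 90) -> float:
    """Minutes of downtime implied by an uptime percentage over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


# 90-day uptime figures from the status page as cited in the text.
status_page = {
    "GPU functions": 99.946,
    "CPU functions": 99.938,
    "Web endpoints": 99.933,
    "Snapshot restores": 99.782,
}
implied_downtime = {
    name: downtime_minutes(pct) for name, pct in status_page.items()
}

# For reference, the 99.99% level that Baseten is reported to claim.
four_nines = downtime_minutes(99.99)

for name, minutes in implied_downtime.items():
    print(f"{name}: about {minutes:.0f} minutes in 90 days")
print(f"99.99% target: about {four_nines:.0f} minutes in 90 days")
```

The GPU functions figure implies about 70 minutes of downtime over 90 days, against about 13 minutes at a 99.99% level, which puts the three reported incidents in proportion.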
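| Control / certification | Status | Scope / details | Gap |
|---|---|---|---|
| SOC 2 Type II | Achieved (no exceptions) | Annual third-party audit; completed January 2025; covers security, availability, and confidentiality; report available on request at trust.modal.com | Audit scope details and control set are not public; the report must be requested from trust.modal.com |
| HIPAA | Available on the Enterprise plan | A BAA is required before submitting PHI; Volumes v2 is in scope; Volumes v1, Images, and Memory Snapshots are out of scope | Memory Snapshots (a core performance feature) are outside BAA scope, a material limitation for regulated healthcare AI teams |
| PCI | Stripe Level 1 | Payments are processed through Stripe, a PCI Service Provider Level 1; Modal does not store credit card data | Modal's own compute services are not PCI-certified; PCI workloads need additional controls |
| Data encryption | Encrypted in transit and at rest | All public APIs use TLS 1.3; the client library verifies TLS certificates; user data is encrypted at rest | Public docs do not separately describe internal-to-worker data paths |
| Container isolation | gVisor (production) | All Functions and Sandboxes run under gVisor; same technology as Google Cloud Run and GKE; synthetic isolation monitoring | gVisor adds syscall overhead relative to native containers; CUDA driver compatibility with gVisor is a known engineering constraint |
| Bug bounty | Active (private) | Private program run through HackerOne; invitations can be requested via security@modal.com; published severity SLAs (Critical 24h, High 1 wk, Medium 1 mo) | The private program limits access for external security researchers; no Hall of Fame or payout history is published |
| Employee access control | Documented | SSO IdP with phishing-resistant MFA; Secureframe MDM on laptops (FileVault2); annual access audits; PR-based code review | Internal penetration test frequency is undisclosed; "external penetration testing firms" are mentioned without a stated cadence |
| Reliability SLA | No public standard/team SLA | Enterprise SLA available by contract; no public SLA for Starter/Team plans; 90-day status: GPU 99.946%, CPU 99.938%, Sandboxes 99.861% | May–June 2026: three major incidents within a month; no public RCA for the May 19 incident; reliability confidence remains an open diligence item |
Compliance status as of 2026-06-14. The HIPAA BAA scope limitation on Memory Snapshots matters for healthcare AI customers because snapshots are central to Modal's cold-start performance value proposition.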
[CE026, CE027, CE028, CE029, CE030, CE031]
5.5 Developer Signals, Differentiation, and Roadmap Direction
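Modal's differentiation sits at the intersection of developer experience and infrastructure depth. On the developer side: no YAML or Dockerfile, containers that start in about 1 second, scaling from 0 to 1,000+ GPUs in seconds, and one SDK covering batch jobs, inference serving, agent sandboxes, and training. The `modal` Python package recorded 1.6M PyPI downloads in a single day (June 2026) and 13.9M downloads in the prior week, a developer adoption signal consistent with the $300M ARR company described in Chapter 4. The GitHub repo (modal-labs/modal-client) is open source and supports Python 3.10–3.14, alongside JS/TypeScript and Go SDKs. The GPU Glossary (gpu-glossary.com, modal.com/gpu-glossary) is an educational resource covering the full GPU software stack and doubles as a community signal and engineering brand asset. On the infrastructure side: the four-pillar cold-start architecture is proprietary R&D that neither hyperscalers nor simpler serverless GPU peers such as RunPod offer. An independent price comparison (HostFleet, April 2026) shows Modal at $0.80/hr for an L4 and $2.10/hr for an A100-80GB: not the cheapest (RunPod L4: $0.43/hr; Together AI A100-80GB: $0.99/hr) but competitive with Baseten (A100-80GB at $4.00/hr). Modal's value proposition is not the lowest unit price but time to first output (sub-second cold starts), on-demand scaling (no reservations), and infrastructure defined in code. Relative to AWS Lambda (SnapStart, Firecracker isolation) and Google Cloud Run (gVisor, scale-to-zero), Modal adds GPU support, multi-cloud pooling, agent sandboxes, and a unified training-to-inference SDK. Product additions in 2025–2026 visible in public sources include Notebooks with GPU memory snapshots (10x faster startup), clustered multi-node RDMA GPU workloads (per Sacra), the B200/B200+ GPU tier, input concurrency, and region routing. The cadence of the engineering blog and the GPU Glossary both show that Modal is still investing in deep technical capability and developer community. Key open diligence items are: (1) cold-start and throughput claims lack an independent third-party benchmark methodology; (2) private enterprise SLA terms; (3) the HIPAA BAA scope limitation excludes Memory Snapshots and Images, both central to performance; (4) the May–June 2026 outage cluster leaves open questions about reliability confidence. [CE025, CE033, CE034, CE035, CE037, CE039]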
| Date / phase | Feature / milestone | Status | Implication | Source |
|---|---|---|---|---|
| January 2025 | CPU Memory Snapshots (GA) | GA | Core cold-start technology; 3-10x faster initialization; foundation for the GPU snapshot work | Modal blog (memory-snapshots docs) |
| July 2025 | GPU Memory Snapshots (alpha) | Alpha | 10x faster cold starts for CUDA-compatible workloads; limited to single-GPU and CUDA-only code | Modal blog (gpu-mem-snapshots) |
| Late 2025 | GPU-enabled Notebooks | GA | GPU-backed collaborative notebooks; GPU memory snapshots cut startup time 10x; converts exploratory workloads into recurring usage | Sacra analyst data; Modal pricing page |
| Late 2025 / 2026 | Clustered multi-node RDMA GPU workloads | GA (Sacra-confirmed) | Supports large-scale distributed training on Modal; closes the training-to-inference gap within a single vendor | Sacra analyst report (April 2026) |
| 2026 | B200 / B200+ GPU tier | GA; B300 opt-in | Supports the Blackwell architecture; B200+ allows opt-in to B300 at B200 pricing; requires CUDA 13.0+ | Modal GPU docs (2026-06-14) |
| 2026 | @modal.concurrent decorator (input concurrency) | GA (v0.73.148+) | Supports per-container continuous batching for LLM inference; reduces scale-up overhead for I/O-bound workloads | Modal docs (concurrent-inputs) |
| 2026 | JavaScript/TypeScript and Go SDKs | GA | Non-Python services can handle orchestration and Sandbox invocation; reduces lock-in to Python monorepos | GitHub modal-labs/modal-client repo |
| 2026 | Region selection and routing regions | GA (pricing multiplier applies) | Latency-sensitive workloads such as robotics can reach sub-10ms network overhead; eu-west and ap-south routing added | Modal docs (region-selection); Physical Intelligence case study |
| Forward roadmap undisclosed | Flash Attention, vLLM, SGLang contributions (Series C blog) | In progress | The inference engineering team contributes to open-source LLM serving engines; performance gains flow back to the community | Modal Series C blog (May 2026) |
Dates are inferred from blog publication times, documentation revision context, and third-party analyst research. Apart from the open-source inference engine contributions, public materials disclose no further roadmap items. "Sacra-confirmed" means corroboration from the Sacra analyst profile; Modal has not independently announced the clustered RDMA feature as a named product.
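[CE015, CE016, CE017, CE033, CE034, CE036]

An assessment of the capability and maturity of Modal's main product modules as of 2026-06-14, based on public documentation, customer case studies, and status data.
[CE006, CE008, CE009, CE010, CE011, CE015]

06 Customers
6.1 Customer segmentation and buyer profile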
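Modal's disclosed customer set covers five clear archetypes. The largest visible cohort is AI-native software builders, companies whose product is itself an AI application; the buyers are ML engineers and platform teams that need elastic GPU compute without managing clusters. Lovable ($75M ARR, AI app generation), Cognition (Devin coding agent), Decagon (voice AI), and Applied Compute (RL agent training for DoorDash and Cognition) fall in this group. The second group is enterprise SaaS and fintech: Ramp (fintech, $10B+ GMV platform), Quora (Poe, 400M monthly unique visitors), and Blend (mortgage technology serving hundreds of bank environments). The third covers media and content platforms (Suno music generation, Runway video characters, Zencastr podcast AI), where GPU demand swings sharply with consumer usage. Computational biology (Chai Discovery drug design) and robotic AI (Physical Intelligence real-time inference) round out the named customer base. Sacra's 2026 analysis estimates that Modal serves thousands of ML teams and lists Meta's Code World Models team as a notable logo. Across all segments, the buyer is typically an ML, platform-engineering, or applied-AI team that values Python-native ease of use and instant scaling over lower-level control. The visible customer base is still dominated by AI-native startups and mid-sized technology companies; traditional enterprise names outside fintech and banking are sparse in the public record. The Runway Characters announcement (which mentions Fortune 10 companies) partly fills that gap but does not close it. [CU001, CU002, CU003, CU004, CU005, CU022]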
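| Segment | Buyer / user / payer | Primary use cases | Scale indicator | Revenue / strategic value | Diligence gap |
|---|---|---|---|---|---|
| AI-native software builders | ML engineers, platform teams | LLM serving, RL training, code sandboxes | Thousands of customers (Sacra); 20K concurrent sandboxes (Lovable) | High; customers growing rapidly alongside Modal, with large workloads | No revenue concentration data disclosed; public customers cluster in the AI-native group |
| Enterprise SaaS / fintech | ML/platform teams, applied-AI teams | AI agents, code execution, ML pipelines | 400M MAU product (Quora/Poe); Fortune 10 mention (Runway Characters) | High; after migration, switching costs come from developer experience | No contract duration or NRR disclosed |
| Media / content platforms | ML infrastructure and content engineering teams | Audio / video / music generation, transcription, batch processing | Zencastr 1,500-GPU burst peak; Suno 1,000-GPU peak | Medium; demand is seasonal / volatile; possible price sensitivity | Churn risk if hyperscaler pricing narrows the gap |
| Computational biology / research | ML researchers, computational scientists | Drug discovery, protein modeling, batch experiments | Chai Discovery uses hundreds of GPUs on demand with TB-scale datasets | Medium; research budgets; potential for academic-to-commercial conversion | Academic vs. commercial conversion rate unknown |
| Robotics / physical AI | Infrastructure engineers, robotics researchers | Real-time remote inference for physical robots | Physical Intelligence: 10-15 ms latency at production scale | High; emerging white space with few public comparables | Pricing model for sub-10ms latency SLAs is not public |
Segment boundaries come from public case studies and the Sacra 2026 report; scale indicators come from individual customers and are not segment-level aggregates. Revenue and strategic value ratings are qualitative. No headcount, contract, or per-segment revenue data is public.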
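[CU001, CU002, CU003, CU005, CU025]

| Use-case category | Subtype | Example customers | Scale evidence | Production maturity |
|---|---|---|---|---|
| LLM inference serving | Self-hosted open-weight models (vLLM/SGLang) | Decagon, Reducto, Quora (Poe) | 1,000 sandboxes/sec; 30+ models in production (Reducto) | Production |
| Sandboxed code execution | Isolation of LLM-generated code (gVisor runtime) | Lovable, Quora, Ramp (Inspect), Cognition and other coding-agent customers | >1B cumulative sandboxes; 20K concurrent at peak | Production |
| RL training infrastructure | Rollouts + scoring + inference loop | Applied Compute, Cognition, AE Studio and other RL customers | 1,000s of parallel rollouts; thousands of parallel environments | Production |
| Custom fine-tuning | SFT, RL fine-tuning, model evaluation | Ramp, Decagon | 79% cost savings vs. LLM APIs (Ramp); custom EAGLE3 draft model (Decagon) | Production |
| Audio / video / image generation | Media generation, transcription, video inference | Suno, Runway, Zencastr | 1,500-GPU burst (Zencastr); 20ms WebRTC latency (Runway/Modal) | Production |
| Computational biology | Protein structure, antibody design, MSA | Chai Discovery | TB-scale datasets; hundreds of GPUs spun up within minutes | Production |
| Batch data processing | Large-scale parallel data augmentation | Substack, Ramp (invoice PII), Reducto | 100K pages/minute demo; 25K invoices in 20 minutes vs. 3 days | Production |
| Robotics real-time inference | Remote inference for physical robots (<15ms) | Physical Intelligence | 10–15 ms latency; <1 s GPU startup; deployed in production | Production |
Categories come from Modal's solutions pages and published case studies. Scale evidence comes from individual customer disclosures, not aggregate metrics. Production maturity means the customer states the workload is in production; it does not mean Modal has verified that claim.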
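[CU002, CU006, CU009, CU010, CU011, CU012]

From free trial to multi-product enterprise usage, covering the acquisition, onboarding, expansion, and retention stages for Modal's main buyer groups.
Journey stages are inferred from case-study narratives; no funnel conversion data or time-in-stage metrics are disclosed.
[CU001, CU003, CU004, CU026, CU027, CU029]

6.2 Named customer proof and adoption trajectory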
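Modal's case-study library now covers ten production deployments across varied workloads with measurable outcomes. The strongest individual data points are Lovable (1 million sandboxes during a 48-hour event, 250,000 apps created, and no engineering pages during the event), Ramp (more than half of all merged pull requests are written by the Inspect coding agent running on Modal Sandboxes), and Reducto (P90 latency reduced 3x and cold-boot times cut 83% after migrating more than 30 model pipelines). Across the ten named deployments, every described use case is production rather than pilot: customers either migrated existing workloads or built new products directly on Modal instead of only evaluating it. The cumulative adoption signal is equally clear. Modal's own May 2026 Series C announcement discloses that the platform has launched more than 1 billion sandboxes since the platform's launch roughly three years ago. The Series C post also states that sandboxes contribute more than one third of total revenue, confirming that the sandbox product line supporting coding agents and RL infrastructure has become Modal's fastest-growing commercial surface. Quora's expansion from general model deployment to Sandbox adoption for the Poe code interpreter shows that existing customers also widen their use-case coverage. Runway went from proof of concept to global production deployment in under 30 days, which points to a short time-to-value that helps customers commit quickly. [CU006, CU007, CU008, CU009, CU010, CU011]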
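| Metric | Value | Date | Source | Confidence | Implication | Missing denominator |
|---|---|---|---|---|---|---|
| Cumulative sandboxes launched | >1 billion | May 2026 | Modal X post + Series C blog | High | Platform velocity; scale of developer usage confirmed | No monthly active users or active customer count disclosed |
| Concurrent sandbox capacity (Lovable event peak) | 20,000 | June 2025 | Lovable case study (Modal blog) | High | Passed an infrastructure stress test; production viability confirmed | A single promotional weekend; not steady state |
| Concurrent GPU scale (Zencastr batch) | 1,500 | 2024 | Zencastr case study (Modal blog) | Medium | A real workload demonstrated elastic GPU scaling | One-off batch job; not sustained concurrency |
| Concurrent GPU scale (Reducto load test) | >1,000 | 2025 | Reducto case study (Modal blog) | Medium | An enterprise-scale proof demo helped win prospective deals | Load test; not representative of steady-state traffic |
| Sandboxes share of revenue | >33% | May 2026 | Modal Series C blog (official) | High | The Sandbox line is Modal's fastest-growing commercial surface | No absolute revenue denominator disclosed |
| Modal Sandbox creation rate (Quora stress test) | 1,000 sandboxes/sec | 2025 | Quora/Poe case study (Modal blog) | High | An enterprise customer validated infrastructure throughput | Single-point benchmark; not sustained throughput data |
Values come from individual customer disclosures or Modal's own blog; as of June 2026, no total customer count, revenue run rate, or cohort metrics are public. Confidence reflects source quality, not statistical significance.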
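[CU006, CU007, CU009, CU010, CU011, CU017]

| Customer | Segment | Deployment / use case | Production vs. pilot | Key outcomes | Evidence limits |
|---|---|---|---|---|---|
| Lovable | AI-native app builder | Every app-generation session uses Modal Sandboxes | Production (all sessions) | 1M sandboxes in 48h; 250K apps created; 97% less code (15K→700 LoC) | Modal-authored blog; not independently verified |
| Ramp | Fintech / enterprise SaaS | Fine-tuning + Inspect coding agent (Sandboxes + Dicts + Queues) | Production (both use cases live) | 50%+ of PRs merged by Inspect; receipt fix rate up 34%; 79% lower cost vs. LLM APIs | Modal blog, corroborated by a Ramp X post (Rahul Sengottuvelu) |
| Decagon | AI-native voice AI | Custom SFT/RL fine-tuning + real-time speculative-decoding inference | Production (Voice 2.0 shipped) | 65% lower latency; p90 342ms; draft-model acceptance length up 38% | Modal blog + Decagon's own Voice 2.0 product page |
| Runway | Media / video AI | Multi-node GPU inference for Runway Characters real-time video agents | Production (launched March 2026) | POC to production in <30 days; Fortune 10 organizations, Hollywood studios, and agencies as downstream users | Modal blog (Wayback) + Runway website corroborates the Characters product |
| Cognition | AI-native (autonomous coding agent) | RL infrastructure + production inference (Devin) | Production | Millions of sandboxes (RL); real-time model serving; CEO quoted in the Series C announcement | Modal blog customer testimonial + Series C quote; Cognition website corroborates the product |
| Quora / Poe | Enterprise SaaS | Poe AI chatbot code execution on Modal Sandboxes (400M MAU) | Production | Stress test reached 1,000 sandboxes/sec; ongoing savings of roughly 2 engineers' effort | Modal blog case study; official source with direct customer quotes |
| Suno | Media / consumer AI | Inference + batch preprocessing scale-out | Production | Scales to 1,000 GPUs; launched 4 months earlier; Microsoft Copilot partnership | Modal blog case study; Suno website corroborates product scale |
| Reducto | Enterprise document intelligence | 30+ model inference pipelines (finance, legal, healthcare, insurance) | Production | P90 latency reduced 3x; cold-start time down 83%; 100K pages/minute demo | Modal blog case study; Reducto website corroborates the enterprise customer base |
| Applied Compute | AI-native RL training (serving DoorDash, Cognition, Mercor) | Full RL training loop for enterprise customers (rollouts, evaluation, inference) | Production | Thousands of parallel rollouts; custom agent for DoorDash merchant onboarding | Modal blog; Applied Compute CEO quoted; DoorDash and Cognition named |
| Chai Discovery | Computational biology / drug discovery | Protein structure, MSA, and antibody design ML pipelines | Production | Hundreds of GPUs spun up within minutes; TB-scale biological datasets handled through Modal Volumes | Modal blog case study; direct quote from an ML researcher |
Ten production deployments from Modal blog case studies (2024–2026); the customer page shows additional logos without outcome detail. Evidence comes mainly from Modal's own content; independent third-party corroboration exists for Ramp (X post), Decagon (product page), Runway (website), and Cognition (CEO quote). No customer contract, pricing, or NRR data is disclosed.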
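[CU007, CU012, CU013, CU014, CU015, CU016]

An estimated developer-to-enterprise funnel from free tier through production and expansion, anchored on disclosed adoption milestones.
Funnel stage values are qualitative descriptions derived from case studies and Sacra analysis; public materials disclose no conversion rates or cohort counts. Stage labels are approximate.
[CU004, CU005, CU006, CU011, CU026, CU027]

6.3 Retention, durability, and expansion signals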
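Retention evidence is directionally positive but structurally incomplete. On the positive side, at least two named accounts (Ramp and Quora) have documented multi-product expansion: Ramp moved from fine-tuning to the full Inspect coding agent platform, and Quora expanded from model deployment infrastructure to full Sandbox adoption for the Poe code interpreter. Lovable's founder explicitly calls Modal a partner they "trust to keep up with growth", language that reads as high-commitment intent rather than short-term evaluation. The platform's structural land-and-expand motion is visible: customers typically start with one workload (a fine-tuning job, a batch pipeline, a single inference endpoint) and add products as they scale (Sandboxes, Volumes, Queues, multi-node clusters). Several case studies show customers migrating off stitched-together AWS or Kubernetes environments and not returning, which suggests that switching costs come from developer experience rather than technical lock-in. The durability gap is that Modal has not disclosed NRR, GRR, contract duration, average revenue per account, cohort retention, or top-customer revenue concentration in any public filing, press release, or interview reviewed here. Expansion signals therefore remain anecdotal and cannot be extrapolated to the full customer book. The reliability risk is real: three separate outages in May–June 2026, documented on Hacker News and confirmed on the status page, raise the question of whether enterprise customers experienced SLA breaches and whether churn followed. [CU026, CU027, CU028, CU029, CU030, CU031]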
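| Metric | Value / status | Segment | Confidence | Diligence question |
|---|---|---|---|---|
| Net revenue retention (NRR) | Not publicly disclosed | All | Low | Request NRR from management; the key gate for any durability judgment |
| Gross revenue retention (GRR) | Not publicly disclosed | All | Low | Request GRR by cohort and annualized churn |
| Contract duration / renewal cadence | Undisclosed; usage-based billing implies monthly churn risk | Enterprise | Low | Ask for average contract duration and the split between annual and monthly ARR |
| Top-customer revenue concentration | Undisclosed | All | Low | Request ARR share of the top 5 and top 10 customers |
| Expansion: Ramp (fine-tuning to coding agent) | Multi-product expansion confirmed within roughly 2 years | Fintech / enterprise SaaS | High | Verify per-account ARR growth and whether expansion is continuing |
| Expansion: Quora (deployment to Sandboxes) | Confirmed; Quora uses Modal for both Poe deployment and code execution | Enterprise SaaS | High | Verify whether expansion continued after Sandbox adoption |
| Satisfaction proxy: customer testimonials | All 10 named case studies are positive; no negative customer quotes found | All | Medium | No independent CSAT, NPS, or review-platform ratings disclosed |
| Reliability satisfaction risk | HN shows three major outages in May–June 2026; 90-day uptime 99.86–99.95% | Enterprise / latency-sensitive | Medium | Whether the outages produced SLA credits or churn; the status page shows the incident record |
The NRR, GRR, contract, and concentration rows are empty because nothing is publicly disclosed. Expansion rows come from specific named accounts and cannot be extrapolated. Reliability data comes from status.modal.com and HN.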
[CU026, CU027, CU029, CU031, CU032, CU033]

6.4 Concentration risk, negative signals, and competitive pressure
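The core concentration risk is invisible in the public record and can only be inferred from what is absent. Modal has not disclosed the revenue share of its top five or top ten customers. Because a handful of high-profile accounts in the case-study library run very large workloads (Lovable's 1 million sandboxes in 48 hours; Suno scaling to roughly 1,000 GPUs), it is plausible that a small cohort of hyperscale customers accounts for a disproportionate share of compute consumption. Usage-based billing means that any large customer reducing its workload, whether through model optimization, a switch to a competitor, or its own business contraction, could cause meaningful revenue volatility. Sacra cautions that hyperscaler competition (AWS, Google, and Azure adding serverless GPU with scale-to-zero billing) may erode Modal's cost and cold-start advantages over time. A May 2026 DoorDash quote describes its use of Claude Managed Agents on Modal as "evaluating", which reads as directional exploration rather than committed production spend and suggests some named accounts are at an earlier stage than the most mature case studies imply. The three outages recorded in May–June 2026 are a negative signal: a Hacker News commenter called the June 3 event "the third major outage in a month", pointing to a reliability trend that could endanger retention of latency-sensitive enterprise workloads. Modal's 90-day uptime of 99.86–99.95% is serviceable but not top tier for mission-critical production systems. On switching costs, Modal benefits from Python-native ease of use and low infrastructure overhead, but its open-model, open-runtime design means customers can take their models and code with them when they leave. [CU031, CU032, CU033, CU034, CU035, CU036]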
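| Expansion driver | Concentration / switching risk | Impact | Diligence path |
|---|---|---|---|
| Multi-product adoption (Sandboxes + Inference + Fine-tuning) | Revenue may be concentrated in a few hyperscale accounts (usage-based billing) | Departure of a large customer causes revenue volatility | Request ARR share of the top 5 customers; ask for churn by spend tier |
| Startup credits → enterprise conversion funnel | Cohort conversion rate and time to paid conversion unknown | Funnel efficiency and CAC are opaque; may distort perceived growth | Request cohort conversion rates and average time from credits to paid |
| Sandbox product line (>1/3 of revenue) | Concentration in a single product category; risk moves with the agent market | A slowdown in AI agent adoption would hit Modal disproportionately | Track agent-market growth; ask for Sandbox vs. Inference revenue trends |
| Python-native ease of use is the main source of stickiness | No hard technical lock-in; open models / runtimes mean code is portable | Customers may churn if competitors close the DX gap or cut prices sharply | Interview churned customers; survey price sensitivity among customers spending $10K+ per month |
| Enterprise sales motion | Sales motion and AE headcount undisclosed; may limit large-deal capacity | If self-serve hits a contract-size ceiling, revenue upside is capped | Request headcount, GTM structure, and large-deal sales-cycle data |
Expansion drivers and risks come from case studies, the Series C blog, and Sacra's 2026 analysis. There is no first-party financial data; all risk ratings are inferred from indirect evidence.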
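[CU028, CU030, CU033, CU034, CU035, CU036]

An assessment of evidence quality and outcome specificity across the ten named Modal customer deployments, by production status, metric specificity, source independence, and expansion visibility.
Independence ratings are qualitative: high = corroborated by an independent third-party source; medium = partial corroboration from a customer website or a quote in a non-Modal source; low = Modal-authored blog only. Expansion visibility reflects whether a second distinct use case is documented.
[CU007, CU012, CU013, CU014, CU015, CU016]

6.5 Platform breadth and use-case taxonomy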
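Modal's customer evidence spans eight distinct use-case categories: LLM inference serving, sandboxed code execution, RL training infrastructure, custom fine-tuning, audio/video/image generation, computational biology, batch processing, and robotic real-time inference. Each has at least one named production deployment as proof. This breadth matters because it reduces Modal's dependence on any single workload type. Per the Series C announcement, sandboxed code execution alone contributes more than one third of revenue, supported by Lovable's AI app generation, Ramp's Inspect coding agent, Quora's Poe code interpreter, and Cognition's RL environment work. LLM inference is the second-largest category, covering Decagon's real-time voice model, Runway Characters' video model, Suno's music generation, and Reducto's document intelligence pipelines. The RL training category emerged quickly in 2025–2026: Applied Compute, Cognition, and AE Studio (theorem proving) all use Modal for highly parallel RL rollouts, and the Series C post explicitly names "RL workloads" as a key growth driver. Computational biology (Chai Discovery) and robotic AI (Physical Intelligence) are smaller but strategically relevant because they show Modal can serve latency-sensitive, domain-specific scientific workloads outside the typical cloud-AI pattern. The solutions pages for LLM serving, image and video, and coding agents confirm that Modal is actively marketing these categories rather than only observing organic adoption. [CU002, CU006, CU011, CU020, CU021, CU023]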
6.6 Exhibits
07 Risks
7.1 Legal and regulatory risk is bounded, but HIPAA scope and the EU AI Act compliance chain need diligence
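Among late-stage private infrastructure companies, Modal's legal and regulatory posture is on the more transparent end. The company embeds a full Data Processing Agreement in its terms of service (effective October 2025), covering the controller-processor relationship under GDPR Article 28, and publishes its subprocessor list at trust.modal.com/subprocessors. The DPA's Technical and Organizational Measures table commits to encryption at rest, access controls, annual SOC 2 Type II renewal, and daily backups of customer data. Critically, though, the DPA places legal-basis, notice, and consent obligations on the customer as data controller, not on Modal. Regulated deployments therefore still need their own GDPR compliance programs on the customer side even if Modal's infrastructure stack is fully compliant. This shared-responsibility split is common in cloud services, but enterprise buyers in healthcare and financial services often underestimate it.

On HIPAA, Modal's security documentation explicitly places Volumes v1, Memory Snapshots, and Images (excluding Filesystem and Directory Snapshots) outside BAA scope. The limit matters: GPU Memory Snapshots are Modal's most differentiated cold-start capability, and the HIPAA exclusion means healthcare customers cannot use the feature that underpins Modal's performance premium without taking on PHI exposure risk. The BAA-eligible surface is therefore narrower than product marketing implies. Before underwriting regulated workloads, diligence must confirm whether custom Enterprise contracts widen BAA scope.

The EU AI Act (Regulation 2024/1689) entered into force on August 1, 2024 and becomes fully applicable on August 2, 2026. The GPAI model governance rules, which require providers of general-purpose AI models to supply technical documentation, training-data transparency, and copyright compliance, have applied since August 2, 2025. Modal is not a GPAI model provider, but enterprise customers that are GPAI providers (fine-tuning open models, serving Llama variants, building downstream products) may need to meet AI Act documentation requirements and pass those requirements upstream to infrastructure vendors. This creates an indirect compliance burden for Modal: enterprise procurement cycles may lengthen as customers ask Modal for documentation, subprocessor lists, and data residency confirmations to satisfy their own AI Act filing requirements. The May 7, 2026 political agreement on the AI omnibus delays some high-risk AI system rules to December 2027 but does not postpone the GPAI obligations already in effect. As of June 14, 2026, no publicly available source shows any active litigation, enforcement action, or regulatory investigation against Modal Labs, Inc. [CR001, CR002, CR003, CR004, CR005, CR006]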
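| Risk / rule | Jurisdiction | Status | Likelihood | Severity | Mitigation | Residual exposure | Diligence path |
|---|---|---|---|---|---|---|---|
| HIPAA BAA scope gap: Memory Snapshots and Volumes v1 are outside BAA coverage | United States (federal) | Active limitation, documented on the public security page | High | High | Enterprise BAA available; BAA covers Volumes v2; Starter/Team users must avoid PHI entirely | Healthcare customers using cold-start optimization (GPU Snapshots) cannot include PHI; custom Enterprise terms may widen scope | Confirm BAA exhibit scope with Modal; request a redlined BAA and a map of permitted PHI data flows by product feature |
| GDPR controller-processor split: customers retain legal-basis and consent obligations under the DPA | EU / EEA | Active, embedded in the public terms of service (October 2025 effective date) | High | Medium | DPA and full TOM table in place; encryption at rest and in transit; SOC 2 Type II corroborates controls | Regulated EU customers must maintain their own GDPR compliance programs; Modal does not assume controller risk | Review DPA Schedules 1–3 in the enterprise contract; verify that the subprocessor list at trust.modal.com/subprocessors is current |
| EU AI Act GPAI governance rules: documentation and transparency obligations have applied to GPAI model providers since August 2025 | EU / EEA | In effect since August 2, 2025; AI Act fully applicable August 2, 2026 | Medium | Medium | Modal is an infrastructure provider, not a GPAI model provider; exposure is indirect through enterprise customers | Customers classified as GPAI providers will request AI Act documentation from infrastructure vendors, which may lengthen enterprise procurement cycles | Confirm Modal's documentation package for customers serving GPAI; request template compliance materials for EU enterprise deployments |
| FTC cloud competition enforcement: tying and bundling risk for compute intermediaries | United States (federal) | No current action against Modal; FTC analysis flags structural risk in the sector | Low | Medium | Modal is not a hyperscaler; risk passes downstream if AWS/GCP/OCI adopt exclusionary pricing toward aggregators | If cloud providers favor their own serverless GPU products, hyperscaler supply channels could be constrained or repriced | Track AWS/GCP/OCI terms and pricing; diligence Modal's contractual protections against discriminatory compute access |
| No known litigation or regulatory enforcement | Global | Confirmed absent: as of June 14, 2026, scraped sources show no enforcement | Low | Low | No mitigation needed; standard corporate governance provides baseline protection | Any Series C company carries inherent IP, employment, and data-privacy litigation risk | Confirm through counsel review of Delaware registration records and PACER/EDGAR searches |
Severity reflects relevance to investment diligence and is not legal advice. As of the run date, no enforcement action or litigation against Modal Labs, Inc. was found.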
[CR001, CR002, CR003, CR004, CR005, CR006]

7.2 Operational and reliability risk is the most important finding in this chapter: three major outages in one month and no public SLA
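Modal's aggregate uptime statistics are solid. The status page (June 14, 2026) shows 90-day uptime of 99.946% for GPU functions, 99.938% for CPU functions, 99.933% for Web endpoints, 99.782% for Snapshot restores, and 99.861% for Sandboxes. These figures are consistent with production-grade infrastructure and should not be dismissed. The shape of the incidents behind those downtime minutes, however, is a substantive diligence signal. A June 3, 2026 Hacker News thread documents three major outages in one month: a SEV 1 on May 7 (overheating in AWS availability zone us1-az4), a May 19 event with no public post-mortem, and the June 3 event, an internal authentication system failure unrelated to GPU hardware or cloud-provider availability. Three incidents clustered within 30 days raise the question of whether Modal's reliability infrastructure has kept pace with revenue growth from roughly $60M ARR to $300M ARR in about seven months.

The June 3 authentication failure is an especially negative signal: it exposes a central control-plane dependency that Modal's multi-cloud GPU pooling does not directly mitigate. The May 7 AWS AZ overheating shows that, even with a multi-cloud architecture, a single-zone failure still propagates to in-flight customer workloads. Taken together, the two failure modes suggest that Modal's redundancy architecture may be better at preventing capacity shortages than at absorbing sudden AZ-level events or control-plane faults.

The SLA gap amplifies the operational risk. Modal publishes no contractual uptime commitment for Starter or Team customers, who make up most of its user base. Enterprise SLA terms are negotiated privately and are not publicly available. Most Modal customers therefore had no contractual remedy for the three May–June 2026 outages. Modal does have substantive mitigations: SOC 2 Type II with no exceptions (January 2025 audit), a private HackerOne bug bounty program, gVisor container isolation, a Rust-based container runtime, TLS 1.3 on all public APIs, and automated synthetic monitoring. These protections are real. But the absence of a public SLA for non-Enterprise customers, combined with the outage cluster, means operational risk should sit at the top of the severity ranking until diligence on incident root causes and post-mortem cadence confirms otherwise. [CR009, CR010, CR011, CR012, CR013, CR014]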
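To make the uptime percentages above easier to compare with incident durations, they can be converted into implied downtime over the 90-day window. The following is a minimal sketch using only the status-page figures quoted in the text; the function and constant names are illustrative.

```python
# 90-day uptime percentages quoted from the status page (June 14, 2026).
UPTIME_90D = {
    "GPU functions": 99.946,
    "CPU functions": 99.938,
    "Web endpoints": 99.933,
    "Snapshot restores": 99.782,
    "Sandboxes": 99.861,
}

WINDOW_MINUTES = 90 * 24 * 60  # minutes in a 90-day window


def downtime_minutes(uptime_pct: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Convert an uptime percentage into implied minutes of downtime."""
    return (1.0 - uptime_pct / 100.0) * window_minutes


if __name__ == "__main__":
    for component, pct in UPTIME_90D.items():
        print(f"{component}: {downtime_minutes(pct):.0f} min down over 90 days")
```

Under these inputs, GPU functions imply roughly 70 minutes of downtime in the window and Sandboxes roughly 180 minutes. The conversion assumes the status page measures uptime uniformly over the full window, which is an assumption about its methodology rather than something the source states.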
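| Failure mode | Likelihood | Severity | Mitigation maturity | Residual exposure | Unresolved gap |
|---|---|---|---|---|---|
| Major outage cluster: 3 SEV 1/major incidents in May–June 2026 (May 7 AWS AZ overheating; May 19 incident with no public post-mortem; June 3 authentication system failure) | High (already occurred; recurrence unconfirmed) | Severe | Partial: multi-cloud pooling handles some AZ failures; no separately published mitigation for the authentication failure | Production workloads on Modal remain exposed to repeated short outages, with no contractual remedy on most plan tiers | No public post-mortem for the May 19 outage; no disclosed architectural fix for the authentication control-plane failure |
| SLA gap: no contractual uptime commitment for Starter or Team customers | High (contractual gap by design) | High | Partial: Enterprise SLA available; Team/Starter terms contain no uptime remedy | Most customers have no SLA-backed remedy for outages, including the May–June incident cluster | Public SLA text for non-Enterprise plans; customer communication on service-credit structure |
| GPU Memory Snapshot alpha instability: incompatible with multi-GPU code and non-CUDA workloads | Medium (alpha feature; limits are documented) | Medium | Partial: CPU Memory Snapshots (GA) provide a fallback; affected workloads can avoid GPU snapshots | Customers using multi-GPU training or non-CUDA GPU inference cannot benefit from cold-start optimization; the HIPAA BAA excludes Memory Snapshots | GA timeline for full multi-GPU support; disclosure of CUDA checkpoint/restore API version dependencies |
| Private bug bounty: invitation-only HackerOne program limits the breadth of security research | Low (no known critical disclosures) | Medium | Partial: SOC 2 Type II and annual penetration tests provide external validation; the private bounty limits community coverage | Fewer independent researchers examine the platform for vulnerabilities than under a public bounty | Consider a public bounty scope once the platform reaches larger enterprise scale; as an interim step, increase transparency on annual penetration tests |
Rows are ordered by severity. Uptime comes from status.modal.com (June 14, 2026, 90-day view). Outage dates come from the Hacker News thread (June 3, 2026).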
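[CR009, CR010, CR011, CR012, CR013, CR014]

A directed acyclic graph shows how Modal's five root-cause risk clusters propagate along operational, competitive, regulatory, and governance paths to affect revenue durability and valuation. Edges denote causal or dependency relationships. Node descriptions are illustrative; directionality is approximate.
[CR009, CR012, CR017, CR024, CR026, CR029]

7.3 Partner and infrastructure dependency risk centers on GPU supply concentration and NVIDIA's evolving role as both supplier and competitor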
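Modal deliberately runs an asset-light model: it owns no GPU hardware and instead aggregates capacity from AWS, GCP, and Oracle Cloud Infrastructure across hundreds of data centers worldwide. This architecture gives structural flexibility (no capital-heavy GPU purchases, no depreciation risk, and the ability to route to the cheapest available hardware), but it also concentrates existential dependency on three commercial counterparties whose pricing, allocation, and strategic priorities are outside Modal's control. The AWS shared responsibility model is instructive: even for abstracted cloud services, the provider controls infrastructure reliability and leaves configuration, patching, and security configuration to the customer. Modal's position relative to AWS, GCP, and OCI is similar. It is a GPU intermediary that must absorb upstream availability risk while marketing its own SLA to downstream customers.

NVIDIA is the deepest single-point dependency in Modal's stack. Modal's GPU Memory Snapshots, the alpha-stage cold-start feature that delivers a 10x speedup, depend on the CUDA checkpoint/restore API in specific NVIDIA driver branches (570/575). Any change to the NVIDIA driver API, whether from a version update, a commercial restriction, or driver maintenance that drops the checkpoint capability, could break the most differentiated feature in Modal's cold-start architecture. Modal's documented incompatibility with multi-GPU code and non-CUDA workloads further narrows the room for mitigation. No publicly disclosed alternative currently mitigates this technical dependency.

NVIDIA's competitive behavior adds a second dimension to the dependency risk. Sacra's Fireworks AI report reads NVIDIA's acquisition of Lepton as a signal that NVIDIA intends to compete directly in the GPU cloud marketplace. If NVIDIA's strategic interest shifts from enabling GPU aggregators to serving customers directly, Modal's supply relationship with the dominant GPU manufacturer turns from symbiotic to adversarial. CoreWeave's situation shows how NVIDIA uses preferential allocation to deepen ties with capital-heavy data center operators: NVIDIA holds a $2B equity stake in CoreWeave and provides a $6.3B take-or-pay GPU backstop, an arrangement that may come at the expense of lightweight aggregation platforms. Modal's reliance on Oracle Cloud Infrastructure (OCI) as its third cloud provider, most likely a product of Oracle's GPU cloud expansion, also adds concentration and counterparty risk from a less mature AI infrastructure provider than AWS or GCP. [CR017, CR018, CR019, CR020, CR021, CR022]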
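| Dependency | Counterparty | Role | Concentration | Failure scenario | Severity | Mitigation | Residual exposure |
|---|---|---|---|---|---|---|---|
| GPU compute supply: no owned hardware; 100% dependent on cloud-provider allocation and pricing | AWS, GCP, Oracle Cloud (OCI; multi-cloud supply) | Primary GPU compute across hundreds of global data centers | High: 3 suppliers; no hardware fallback if all three restrict allocation or raise prices at once | Any major supplier raises prices, restricts capacity, or strategically deprioritizes Modal; a single-AZ failure propagates (May 7 incident) | Severe | Multi-cloud pooling spreads risk; region routing; automatic GPU upgrades (H100→H200) maximize pool utilization | Material: any supplier pricing action or capacity limit directly affects Modal's gross margin and customer availability |
| NVIDIA CUDA checkpoint/restore API: GPU Memory Snapshots depend on the 570/575 driver branches | NVIDIA | Provides the underlying CUDA checkpoint/restore capability for GPU Memory Snapshots (alpha) | Severe: no alternative implementation disclosed; incompatible with multi-GPU code | NVIDIA deprecates or changes the checkpoint/restore API; features break for existing customers relying on sub-second cold-start optimization | High | GPU snapshots are still alpha; CPU Memory Snapshots (GA) provide a fallback; Modal can disable snapshot-dependent workflows | An NVIDIA driver change could eliminate Modal's most differentiated cold-start feature; no mitigation timeline disclosed |
| NVIDIA as a potential competitor: the Lepton acquisition signals GPU cloud marketplace ambitions | NVIDIA | Currently a GPU hardware supplier; becoming a direct GPU cloud platform through Lepton | Medium: NVIDIA's allocation decisions favor capital-intensive partners (CoreWeave $6.3B backstop); Modal is not in that tier | NVIDIA prioritizes GPU allocation to its own marketplace or to capital-intensive partners over aggregation platforms | Medium | Multi-cloud sourcing reduces NVIDIA-specific GPU risk; AMD GPU diversification is a long-term option | Modal is structurally dependent on NVIDIA hardware while NVIDIA builds a competing distribution channel |
| gVisor container runtime: container isolation depends on a Google-maintained open-source project | Google (gVisor) | Provides kernel-level sandbox isolation for all Modal container workloads | Medium: gVisor is open source; Google also uses it in Cloud Run and GKE; low abandonment risk | gVisor maintenance priority declines or the project forks; isolation properties drift from production requirements | Low | Open-source license; Modal can fork or swap in another kernel sandbox (Firecracker, kata containers) | Residual risk is low given active use in Google's own products |
Rows are ordered by severity. OCI = Oracle Cloud Infrastructure.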
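[CR017, CR018, CR019, CR020, CR021, CR022]

A directed graph of Modal's key external dependencies across compute supply, technology, regulatory compliance, and financial infrastructure. Edges show the direction and nature of each dependency. Node criticality is indicated by edge count and severity annotations.
[CR017, CR018, CR019, CR022, CR023]

7.4 Competitive and financial-model risk is elevated by the 15.5x ARR multiple, Sandbox revenue concentration, and accelerating pressure from hyperscalers and well-funded peers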
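Modal's Series C valuation of $4.65B on roughly $300M ARR implies a 15.5x revenue multiple. For reference, mature cloud infrastructure companies at similar ARR scale typically trade at 5-10x revenue. Modal's premium reflects unusually high growth (5x since the October 2025 Series B), but it also prices in continued hypergrowth, margin discipline, and sustained product differentiation. Any ARR deceleration, margin compression from cloud-provider pricing, or competitive displacement by a hyperscaler-native solution would put downward pressure on the multiple. The company discloses no gross margin, burn rate, or customer concentration, so the investment case cannot be fully underwritten without private financial data. Estimated gross margins for asset-light GPU aggregators are 30–50% (in line with comparable infrastructure companies), but at a 15.5x ARR multiple even a 40% gross margin implies roughly 39x gross profit, a high multiple for a business with clear supply-side concentration risk.

Sandbox revenue concentration, with Sandboxes contributing more than one third of Modal's total revenue, creates product-specific risk. Sandboxes serve the AI agent execution market, a high-growth category that AWS, Google, and Anthropic are entering quickly and directly. AWS Bedrock AgentCore, Google Gemini's agent capabilities, and Anthropic's own managed Sandbox-like offerings all target the same use case. If enterprise buyers consolidate AI infrastructure procurement back into existing hyperscaler relationships, Modal's Sandbox revenue could face rapid displacement in a product that represents a $100M+ ARR base.

The competitive environment is also hardening because of well-funded peers. CoreWeave's $99.4B contracted backlog and $31–35B FY2026 capex target the same AI compute demand as Modal, at a scale of raw capacity that Modal cannot match as an asset-light aggregator. Sacra estimates Fireworks AI at roughly $315M ARR, above Modal's disclosed $300M ARR baseline, with differentiation in fine-tuning, agent deployment, and real-time latency optimization. RunPod grew from 100,000 to 400,000+ developers by the end of 2025 on only $22M of funding, showing that price-competitive GPU platforms can scale without Modal-level capital. The FTC's generative AI competition analysis lists cloud platform bundling and tying as a structural risk for independent compute vendors: hyperscalers may steer enterprise customers toward their own GPU products by tying preferred pricing, compliance posture, or enterprise support to exclusive cloud relationships. [CR024, CR025, CR026, CR027, CR028, CR029]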
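The multiple arithmetic in this section can be reproduced in a few lines. The sketch below uses the disclosed $4.65B valuation and $300M ARR; the gross margin values are the estimated 30–50% band from the text, not disclosed figures, and the function names are illustrative.

```python
def revenue_multiple(valuation: float, arr: float) -> float:
    """Valuation divided by annualized recurring revenue."""
    return valuation / arr


def gross_profit_multiple(valuation: float, arr: float, gross_margin: float) -> float:
    """Valuation divided by annualized gross profit (ARR x gross margin)."""
    return valuation / (arr * gross_margin)


if __name__ == "__main__":
    valuation = 4.65e9  # Series C post-money valuation (disclosed)
    arr = 300e6         # annualized revenue (disclosed)

    print(f"Revenue multiple: {revenue_multiple(valuation, arr):.1f}x")
    # Gross margins below are ESTIMATES for asset-light aggregators,
    # not figures Modal has disclosed.
    for margin in (0.30, 0.40, 0.50):
        multiple = gross_profit_multiple(valuation, arr, margin)
        print(f"Gross margin {margin:.0%}: {multiple:.2f}x gross profit")
```

At a 40% margin the result is 38.75x gross profit, and each ten points of margin moves the figure substantially, which is why the undisclosed gross margin dominates the valuation debate.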
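| Role / function | Dependency or gap | Likelihood | Severity | Mitigation | Diligence path |
|---|---|---|---|---|---|
| CEO / co-founder Erik Bernhardsson: the only named external spokesperson; carries technical credibility and developer-community trust | Key-person concentration; the only publicly confirmed leader; company vision and culture are closely tied to Bernhardsson's personal brand | Low (normal operating continuity) | High | Investors including GC, Redpoint, Menlo, BCV, and Accel provide broad board oversight; the engineering team is sizable; the open-source client preserves institutional memory | Request the full executive org chart; confirm names of VP-level leaders; verify succession and business-continuity planning |
| Co-founder Akshat Bubna: title and background undisclosed in all public sources | Governance opacity; his functional role (CTO, CPO, or other) and prior industry experience are unknown | Low (undisclosed does not mean absent) | Medium | Bubna is confirmed as a co-founder; given Bernhardsson's external-facing role, his role presumably involves technical leadership | Confirm title, scope, and engineering oversight responsibilities; review LinkedIn or press records |
| No named C-suite beyond the founders: no public VP Engineering, CRO, CFO, or Head of Revenue | At $300M ARR there is still no visible leadership for sales, finance, or scaled engineering, which raises execution risk | Medium (scaling requires delegation beyond two founders) | Medium | The Series C investor syndicate provides board governance; the startup program and case-study cadence indicate an active BD function | Request the org chart, headcount by function, and hiring plan; confirm whether go-to-market is founder-led or delegated |
| Governance opacity: board composition, committee structure, and investor control rights undisclosed | At a $4.65B valuation, external accountability is only partly visible; institutional governance depends on private investor arrangements | Low (typical for Series C) | Low | GC, Redpoint, Menlo, BCV, and Accel are established institutional investors with standard governance expectations | Request board composition, committee charters, and a summary of protective provisions during term-sheet review |
Rows are ordered by severity.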
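[CR031, CR032, CR033, CR034, CR035, CR037]

A severity-ranked risk matrix as of June 14, 2026, placing Modal's eight material risks on the dimensions of likelihood, impact, mitigation maturity, and residual severity. Rows are sorted by residual severity from high to low. Mitigation maturity: strong = public controls are fully documented; partial = controls exist but gaps remain; weak = limited or no public mitigation.
[CR001, CR004, CR009, CR010, CR012, CR013]

7.5 Key-person and governance risk is meaningful but manageable; explicit kill criteria anchor the investment thesis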
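Modal's governance transparency is typical of a founder-led private company at Series C. The only executive named in public materials is Erik Bernhardsson, who appears in all Series C communications, product blogs, and press coverage. Akshat Bubna is confirmed as a co-founder, but no public source discloses his functional role or prior background. The company website, LinkedIn leadership section, and press coverage list no other executives (CTO, CRO, CFO, VP Engineering, Head of Revenue). The board, committee structure, and investor control terms are entirely opaque at the public level. This is not unusual for a late-stage private company, but with a $4.65B valuation, ARR above $300M, and enterprise customers running production workloads, diligence must focus on it.

Key-person risk is real, but the nature of the product partly cushions it. Modal is an engineering-driven platform with a large developer community (1.6M PyPI downloads in a single day in June 2026), an open-source client, and a deep technical moat in cold-start infrastructure. Those assets would not disappear if Bernhardsson were unavailable for a period. The broader investor group (General Catalyst, Redpoint, Menlo, Bain Capital Ventures, Accel) brings board seats and governance oversight; these arrangements are not publicly visible but are standard for Series C investors. Modal has not publicly referenced alignment with the NIST AI Risk Management Framework or other voluntary AI governance standards, which is an easily closed gap for enterprise accounts with AI procurement policies.

A thesis-breaking framework needs explicit criteria. The Modal investment case fails if: (1) after Q2 2026, major outage frequency stays at three or more per quarter with no public post-mortems showing that root causes are fixed and SLAs improved; (2) ARR growth falls below 50% year over year without a corresponding improvement in gross margin; (3) a named enterprise customer (a large user of Sandboxes or Functions) publicly migrates to a hyperscaler-native solution, signaling price- or compliance-driven displacement; (4) NVIDIA restricts or commercializes the CUDA checkpoint/restore API, breaking GPU Memory Snapshots for existing customers; or (5) regulatory enforcement materially impairs Modal's ability to serve European or healthcare customers. Measured against these criteria, Modal's current capital position ($355M Series C in May 2026), SOC 2 status, and developer adoption all indicate resilience. The outage cluster and the SLA gap still require dedicated verification before the reliability leg of the thesis can be considered closed. [CR031, CR032, CR033, CR034, CR035, CR037]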
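| Risk | Monitorable trigger | Threshold / event | Action implication |
|---|---|---|---|
| Operational reliability: recurrence of the outage cluster | Track monthly incident count and severity on status.modal.com; request a post-mortem report for every SEV 1 event | Three or more major incidents in a single quarter with no published root-cause fix; or any single incident leaving GPU functions unavailable for more than 4 hours | Pause investment; escalate diligence to review infrastructure architecture and the post-mortem library; consider SLA escrow in enterprise terms |
| SLA gap: missing contractual protection for non-Enterprise plans | Monitor whether Starter or Team plans publish an SLA; track any public announcement of SLA policy changes | Non-Enterprise plans still have no published SLA after Series C deployment (expected within 12 months) | Make an enterprise MSA with a custom SLA a precondition for any production deployment; flag as a negative signal for mass-market developer monetization |
| HIPAA / regulated workload compliance: BAA scope expansion | Track BAA scope updates on trust.modal.com and the security documentation pages; request an updated BAA exhibit annually | GPU Memory Snapshots remain excluded from BAA scope more than 24 months after GA; no custom BAA extension available for regulated healthcare customers | Lower the healthcare vertical TAM estimate; flag HIPAA compliance as a risk of marketing running ahead of contracts in regulated enterprise sales |
| ARR growth deceleration: hypergrowth slows | Sacra's quarterly ARR estimates; any public disclosure from Modal; secondary-market valuation signals; new enterprise customer announcements | ARR growth below 50% year over year (against a pace of 5x in 7 months); or Sandbox revenue share falls from one third without offsetting Functions growth | Re-underwrite the financial model; lower the target multiple; request pipeline visibility and customer cohort data in diligence |
| Hyperscaler displacement: loss of a named customer | Monitor customer announcement feeds, press coverage, and launch alerts for AWS Bedrock AgentCore, GCP Vertex AI, and Azure AI Foundry features adjacent to Modal | Any named Modal reference customer (Suno, Cognition, Physical Intelligence, Ramp, Applied Compute) publicly announces migration to a hyperscaler-native serverless GPU or Sandbox equivalent | Thesis-breaking event; stop adding to the position; trigger a full portfolio review of Modal exposure; require an urgent management briefing on competitive response |
Triggers are designed to be observable on a quarterly monitoring cadence. All thresholds assume the investor has confirmed baseline reliability and growth metrics in pre-investment diligence.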
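The quantitative thresholds in the trigger table lend themselves to a simple quarterly check. The sketch below encodes three of them (the outage cluster, ARR deceleration, and reference-customer migration); the `QuarterSnapshot` fields and trigger labels are hypothetical names chosen for illustration, and the inputs would have to come from the investor's own monitoring.

```python
from dataclasses import dataclass


@dataclass
class QuarterSnapshot:
    """Hypothetical quarterly monitoring record; all field names are illustrative."""
    major_incidents: int               # major incidents in the quarter
    root_cause_fix_published: bool     # whether root-cause fixes were published
    longest_gpu_outage_hours: float    # longest single GPU-function outage
    arr_yoy_growth: float              # 1.0 means 100% year-over-year growth
    reference_customer_migrated: bool  # named customer moved to a hyperscaler


def fired_triggers(q: QuarterSnapshot) -> list[str]:
    """Return labels of the table's thresholds that a snapshot breaches."""
    fired = []
    cluster = q.major_incidents >= 3 and not q.root_cause_fix_published
    if cluster or q.longest_gpu_outage_hours > 4:
        fired.append("operational_reliability")
    if q.arr_yoy_growth < 0.50:
        fired.append("arr_deceleration")
    if q.reference_customer_migrated:
        fired.append("hyperscaler_displacement")
    return fired


if __name__ == "__main__":
    # Illustrative inputs only, not observed data.
    snapshot = QuarterSnapshot(3, False, 1.5, 3.0, False)
    print(fired_triggers(snapshot))
```

The SLA and HIPAA rows are left out because their thresholds are time-based policy events rather than quarterly metrics; they could be added as date comparisons under the same pattern.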
[CR004, CR009, CR013, CR024, CR025, CR026]

08 Valuation
8.1 Recommendation: track at the Series C price; do not chase a higher momentum price without diligence evidence
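Modal Labs closed a $355 million Series C on May 21, 2026 at a $4.65 billion post-money valuation. General Catalyst led, with existing investor Redpoint Ventures co-leading and Menlo Ventures, Bain Capital Ventures, and Accel joining as new investors. Ahead of the round, the company disclosed annualized revenue above $300 million, up fivefold from its October 2025 Series B. Sacra independently estimates that Modal's annualized revenue reached $300 million in April 2026, up from roughly $119 million at the end of 2025. That implies about 150% growth in under five months, or well over 300% annualized. The $4.65 billion post-money valuation divided by $300 million of ARR is 15.5x, which sits at the upper edge of private AI infrastructure multiples in mid-2026.

The round is closed, recent, and cross-checked against the company blog, Sacra's Modal Labs research report, the General Catalyst portfolio page, the Bain Capital Ventures portfolio page, and general investor commentary. The $4.65 billion post-money valuation is therefore a clean anchor. The harder question is whether public evidence supports that price as still attractive, merely fair, or already expensive.

The answer is: expensive but conditionally defensible, provided Modal's revenue growth continues at or near its current pace. Private comparables show 15.5x at the upper edge of the distribution. Baseten raised in February 2026 at a $5 billion valuation, about 8.3x on Sacra's $600 million ARR estimate. Together AI was valued at $3.3 billion in February 2025 against a roughly $1 billion run-rate in 2026, implying 3.3x. Fireworks AI's October 2025 Series C was about 5x ARR, and it is reportedly in talks at a higher price. Modal's premium over this peer group holds only if its architectural lead (sub-second cold starts, Rust runtime, CUDA checkpoint) and its Sandbox traction (more than one third of revenue) can support growth above the peer median.

The right posture is therefore to track: medium confidence, high risk rating, expensive valuation. The company deserves close attention because the market is real, the product is differentiated, and growth is exceptionally strong. Before paying for any step-up above the current valuation, investors should insist on completing the diligence listed at the end of this chapter. [CV001, CV002, CV003, CV004, CV005, CV006]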
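The growth figures above mix a period growth rate with an annualized one, and the two are easy to conflate. The sketch below separates them using Sacra's two estimates from the text ($119M at year-end 2025, $300M in April 2026); the five-month elapsed time follows the text's framing and is an assumption about the measurement window.

```python
def period_growth(start: float, end: float) -> float:
    """Simple growth over the period, e.g. 1.5 means +150%."""
    return end / start - 1.0


def annualized_growth(start: float, end: float, months: float) -> float:
    """Compound the period's growth factor out to twelve months."""
    return (end / start) ** (12.0 / months) - 1.0


if __name__ == "__main__":
    start_arr = 119.0  # $M, Sacra estimate for year-end 2025
    end_arr = 300.0    # $M, Sacra estimate for April 2026
    months = 5.0       # ASSUMPTION: elapsed window, following the text

    print(f"Period growth: {period_growth(start_arr, end_arr):.0%}")
    print(f"Annualized (compound): {annualized_growth(start_arr, end_arr, months):.0%}")
```

Compounding produces a much larger annualized figure than the conservative "over 300%" wording, and the result is highly sensitive to the assumed number of months, so it is best read as an upper-bound illustration rather than a forecast.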
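| Dimension | Value | Rationale |
|---|---|---|
| Recommendation | Track | Growth at the $300M ARR stage is exceptional and customer proof is strong, but a 15.5x ARR multiple requires hypergrowth to continue and leaves almost no room for deceleration or a margin miss |
| Confidence | Medium | ARR is cross-checked by company disclosure and Sacra's estimate; gross margin, burn rate, NRR, and cap table terms are all undisclosed |
| Risk rating | High | Three major outages in May–June 2026; two-founder governance with no named board or CFO; unit economics fully opaque; Sacra's Series B data conflicts with the company's |
| Valuation stance | Expensive | 15.5x ARR sits at the upper edge of private AI infrastructure multiples; it holds only if ARR reaches $500M+ by mid-2027 with gross-margin evidence above 35% |
Values reflect judgment based on public evidence as of June 14, 2026. If all four diligence gates in TV006 are met, the recommendation can be upgraded to buy.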
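[CV001, CV002, CV006, CV007, CV008, CV009]

The TRACK conclusion weighs strong revenue and customer evidence against a high multiple and undisclosed unit economics.
This is a reasoning diagram, not a weighted scoring model; edge weights are qualitative.
[CV001, CV002, CV006, CV007, CV008, CV009]

8.2 The price holds only if revenue quality and platform stickiness are real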
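The investment thesis starts with timing and execution. Modal reached $300 million of annualized revenue in about five years, a threshold few infrastructure companies have crossed at similar speed. The step-up from Series B to Series C, from $1.1 billion to $4.65 billion in roughly seven months, is supported by the company's disclosed revenue milestone and corroborated by Sacra's independent third-party estimate. The investor group (General Catalyst, Redpoint, Menlo Ventures, Bain Capital Ventures, Accel) includes several top-tier firms, each of which should have completed its own core diligence before committing capital on these terms.

The product thesis rests on two mutually reinforcing pillars. First, Modal's GPU snapshotting technology persists CUDA memory state and delivers cold starts 40–100x faster than traditional GPU clouds, giving the platform a structural advantage in bursty inference workloads. Second, Sandboxes have become a first-class revenue surface (more than one third of total revenue), showing that Modal is not a pure GPU rental platform but a programmable cloud with agent execution infrastructure that can run independently of the compute layer. Together, the two capabilities form a platform narrative strong enough to support a premium over commoditized GPU access.

The counter-thesis is almost as strong. Modal prices at a clear premium to raw GPU clouds: the HostFleet pricing matrix for April 2026 shows Modal's L4 GPU at about $0.80/hour versus $0.43/hour for RunPod Secure Cloud and $0.63/hour for Baseten, the highest list price in the comparison. The premium is sustainable only if it converts into higher gross margin, and that data point is entirely non-public. The asset-light supply model (Modal aggregates capacity from AWS, GCP, and Oracle rather than owning GPUs) creates a structural gross-margin ceiling: Modal earns the spread between what customers pay and what hyperscalers charge, and hyperscalers can bundle and discount their own compute to squeeze that spread. The three major outages in May and June 2026 (May 7 SEV-1, May 19 undisclosed event, June 3 internal authentication failure) show that infrastructure maturity has not yet caught up with revenue growth. At 15.5x ARR, investors are paying for a premium that first-party financial disclosure has not yet proven. [CV001, CV002, CV003, CV004, CV005, CV006]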
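| Thesis point | Evidence | Counter-evidence / what would change the view |
|---|---|---|
| $300M ARR proves platform scale | Disclosed by the company in the Series C blog (May 2026); Sacra independently estimates $300M ARR in April 2026 | Only one independent estimate; no audited financials; growth may be front-loaded by a few large customers |
| 5x growth in 7 months validates acceleration | The company cites fivefold growth since the October 2025 Series B; Sacra estimates YE2025 ARR at about $119M | Implies roughly 3x annualized year-over-year growth, which is hard to sustain; if Sacra's data lags, the Series B baseline may be below $119M |
| Asset-light model avoids capital-intensity risk | GPU capacity is aggregated from AWS, GCP, and Oracle; no owned hardware or GPU debt | Gross-margin ceiling is set by hyperscaler procurement prices; hyperscalers can compress the spread through bundling |
| Sandbox traction extends the platform beyond compute | Sandboxes disclosed at >1/3 of total revenue; customers have launched 1 billion+ Sandboxes cumulatively | Sandbox margin and churn undisclosed; execution environments can be replicated by hyperscalers and open-source alternatives |
| Top-tier investor syndicate confirms underwriting quality | General Catalyst (new), Redpoint (existing), Menlo Ventures, Bain Capital Ventures, and Accel participated in the Series C | Investor endorsement does not disclose terms; the preference overhang accumulated over four rounds is unknown |
| GPU snapshotting and the Rust runtime form a technical moat | The May 2026 engineering blog documents a 100x cold-start improvement; custom content-addressed filesystem and CUDA checkpoint/restore | Open-source inference runtimes (vLLM, SGLang) are improving fast; with enough engineering investment, snapshotting can be replicated |
Thesis points and counter-evidence are based only on the public sources accessed for this review. Confidence is medium; private financial data would materially shift the balance in either direction.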
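[CV001, CV002, CV003, CV004, CV005, CV006]

At a 5x multiple (CoreWeave-style infrastructure), Modal would need $930M of ARR to support the Series C price; at 15.5x (the current implied multiple), it needs only $300M. The sensitivity shows that the choice of multiple dominates the analysis.
Each bar divides the $4.65B Series C post-money valuation by the selected comparable multiple; values are estimate-based support thresholds, not audited revenue. The proposed Fireworks multiple is based on in-progress financing discussions reported by Sacra and may not close.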
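The sensitivity described above is a single division repeated across multiples. A minimal sketch follows, assuming the $4.65B post-money valuation from the text and using peer multiples that appear in this chapter; the labels attached to each multiple are shorthand, not a classification from the source.

```python
SERIES_C_POST_MONEY = 4.65e9  # USD, disclosed post-money valuation

# Multiples quoted in this chapter; labels are illustrative shorthand.
MULTIPLES = {
    "CoreWeave-style infrastructure": 5.0,
    "Baseten (approx.)": 8.3,
    "Modal implied": 15.5,
    "Fireworks proposed": 18.75,
}


def required_arr(valuation: float, multiple: float) -> float:
    """ARR needed for a valuation to equal `multiple` times ARR."""
    return valuation / multiple


if __name__ == "__main__":
    for label, multiple in MULTIPLES.items():
        arr_musd = required_arr(SERIES_C_POST_MONEY, multiple) / 1e6
        print(f"{label} ({multiple}x): ${arr_musd:,.0f}M ARR required")
```

The spread between the 5x and 15.5x rows is more than threefold, which is the quantitative form of the point that the multiple chosen matters more than any refinement of the ARR estimate.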
[CV001, CV025, CV026, CV027, CV028, CV029]

8.3 Comparable-company analysis supports $4.65B under the base case, but leaves no room for error
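The most useful private comparables for Modal are Fireworks AI and Together AI, both pure-play inference platforms with Sacra revenue estimates. At its October 2025 Series C, Fireworks AI had ARR of roughly $800 million (Sacra estimate) and a $4 billion post-money valuation, implying about 5x ARR, well below Modal's 15.5x. Fireworks is reportedly discussing a raise at a $15 billion valuation; if it closes on $800 million of ARR, that implies about 18.75x, above Modal. Together AI's February 2025 Series B valued it at $3.3 billion against roughly $1 billion of annualized revenue in 2026, implying 3.3x; it is reportedly in talks at $7.5 billion, or 7.5x on $1 billion of ARR. CoreWeave is not a good architectural analogue, since it owns GPU hardware at enormous capital intensity, but its FY2025 revenue of $5.13 billion against a pre-IPO valuation of $23 billion implies about 4.5x trailing revenue, far below Modal's software-like multiple. The CoreWeave 10-K filed in March 2026 is the only first-party financial disclosure in this comparable set.

Three scenario bands summarize the range of outcomes. In the bull case, Sandbox and inference momentum continue, Modal reaches $650 million to $1 billion of ARR by mid-2027, gross margin is shown to be at or above 40%, and investors price the next round at 15–18x ARR, implying a valuation of $9.75 billion to $18 billion. In the base case, revenue grows 100–150% to $450 million to $650 million by mid-2027 and the multiple compresses modestly to 12–15x as the company matures, implying a range of $5.4 billion to $9.75 billion, so the current $4.65 billion is supported with the low end of the range just above it. In the bear case, recurring outages damage customer trust, growth falls below 80%, hyperscalers bundle competing products, and the multiple compresses to 7–10x on $200–330 million of ARR, implying a valuation of $1.4 billion to $3.3 billion, a material mark-to-market loss against the Series C price.

The gap between base and bear is wide enough that the current valuation cannot be called attractive. The case is fundamentally a buyer's bet that execution continues to deliver. The comparables confirm that AI infrastructure multiples can span a very wide range, from CoreWeave's 4.5x to Fireworks' proposed 18.75x, so the precision of any single multiple is low. Modal's most defensible anchor is "a premium developer cloud with proven Sandbox traction", which puts its value closer to a 12–16x range than to the 4–8x range of raw compute. [CV025, CV026, CV027, CV028, CV029, CV030]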
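| Scenario | Probability signal | Core assumptions | Estimated mid-2027 ARR | Implied valuation range | Downside trigger |
|---|---|---|---|---|---|
| Bull | 20–30% | Sandbox momentum continues; gross margin 45%+; outages resolved; no major hyperscaler disruption; NRR 130%+ | $650M–$1.0B | $9.75B–$18B (15–18x range) | Requires disclosed gross margin and NRR data above threshold |
| Base | 50–60% | Growth slows to 100–150% year over year; gross margin 30–45%; moderate outage mitigation; competitive position holds | $450M–$650M | $5.4B–$9.75B (12–15x range) | The closed $4.65B round sits just below the low end of this range |
| Bear | 20–25% | Growth falls below 80% year over year; hyperscalers bundle competing services; recurring outages hurt retention; gross margin below 25% | $200M–$330M | $1.4B–$3.3B (7–10x range) | The current $4.65B mark sits above the bear range, so there is material write-down risk |
Scenario ranges are analyst estimates based on peer multiple ranges and public ARR data. No gross margin or NRR data is available; scenarios are directional only. Probability signals are qualitative judgments, not model outputs.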
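[CV030, CV031, CV032, CV033, CV034, CV035]
| Company | Latest round | Valuation (post-money) | ARR estimate | ARR multiple | Relevance to Modal | Key limitation |
|---|---|---|---|---|---|---|
| Baseten | $300M Series E, February 2026 | $5.0B | ~$600M (Sacra estimate) | ~8.3x | Most direct comparable; enterprise inference platform with developer roots | More focused on enterprise ACV; different pricing model and margin structure |
| Fireworks AI | $250M Series C, October 2025; reportedly in talks at a $15B valuation | $4.0B → $15B (proposed) | ~$800M (Sacra estimate) | 5.0x → ~18.75x (proposed) | Pure-play open-model inference platform; large customer base | Commoditized API pricing implies lower margins; path leans more on hardware optimization |
| Together AI | $305M Series B, February 2025; in talks at a $7.5B valuation | $3.3B → $7.5B (proposed) | ~$1.0B (Sacra estimate, 2026) | 3.3x → ~7.5x (proposed) | Open-source inference with training capability | Endpoint model is more commoditized; lower revenue per customer than Modal |
| CoreWeave (CRWV) | IPO in March 2025; $2B Nvidia placement in January 2026 | $23B (pre-IPO secondary market) | $5.13B FY2025 (SEC 10-K) | ~4.5x FY2025 revenue | The only fully public AI cloud; sets a floor for pure infrastructure multiples | Capital-intensive GPU-owner model; not asset-light; margins are not software-like |
| Groq | $750M in September 2024; $17B Nvidia licensing deal in December 2025 | $6.9B (September 2024) | ~$90M (2024 Sacra estimate) | ~76x (2024 estimate), now distorted by the licensing deal | Custom-silicon inference; shows the market will pay a premium for latency leaders | One-off licensing windfall fundamentally changes comparability; the LPU architecture addresses a different market |
All private ARR figures are third-party estimates from Sacra. Multiples use the latest available round valuation and the latest ARR estimate; because forward projections are unavailable, they do not reflect LTM or NTM multiples. The CoreWeave multiple uses FY2025 revenue as filed with the SEC.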
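[CV025, CV026, CV027, CV028, CV029, CV038]
The $4.65B Series C valuation is supported by the base case, sitting just below the $5.4B low end of its range; any further markup from here requires both revenue and multiple to deliver on bull-case assumptions.
The scenario bands combine ARR growth projections with the multiple ranges derived from the private comparable set in TV004. The bear, base, and bull ARR ranges are $200–$330M, $450–$650M, and $650M–$1.0B; the applicable multiples are 7–10x (bear), 12–15x (base), and 15–18x (bull). All figures are directional analyst estimates.
[CV030, CV031, CV032, CV033, CV034, CV035]
Modal scores well on market tailwinds and product differentiation, but clearly lower on economic transparency and valuation fairness at the current mark.
Scores are directional, IC-style judgments based on public evidence as of June 14, 2026; they reflect relative strength and are not calibrated on an absolute scale.
[CV001, CV006, CV007, CV015, CV021, CV022]
8.4 Four diligence gates decide track versus buy; the thesis can advance only on evidence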
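The recommendation can move from track to buy without any further operating improvement; all it needs is disclosed evidence. Four diligence items matter most. First, gross margin: at a 15.5x ARR multiple, investors are implicitly paying for software-like economics. If Modal's actual gross margin on GPU compute is only 20–30% (similar to a raw cloud aggregator), the multiple is very tight. If it is 40–55% (closer to cloud-delivered software economics, though still below what Cloudflare or Datadog report), the multiple is easier to support. The gap between those two ends is wide enough to flip the conclusion, so this single data point most directly decides whether the name is a buy. RunPod, the lowest-cost serverless GPU provider in the Hostfleet matrix, has gross margins in the mid-60s to high-70s percent according to Sacra, which shows that an asset-light GPU intermediary can reach software-like economics, though at a much smaller revenue scale and with a different business mix.
Second, revenue quality. The company has disclosed $300 million ARR and 5x growth, but has published no cohort data, NRR, or churn. Fivefold growth in seven months could come from a handful of very large deals (concentration risk) or from broad developer-led expansion (NRR risk, if developers churn after initial use). Without NRR, the durability of the $300 million ARR remains undetermined. Third, capital structure and liquidation preferences. The $4.65 billion post-money valuation is the headline, but real investor economics depend on the preference stack accumulated across the seed, Series A, Series B, and Series C rounds, roughly $465 million of primary capital across the four rounds. An investor entering at $4.65 billion has to model the waterfall before calling the entry price attractive. Fourth, the Series B discrepancy: Sacra reports an $87 million Series B led by Lux Capital in September 2025 at a $1.1 billion valuation, whereas Modal's own blog states $110 million and lists Redpoint and Sutter Hill Ventures as lead investors. No publicly available source explains the conflict; it is a transparency gap that must be resolved in a formal data room.
Any follow-on investment above the current valuation should be gated by four thesis-breaking triggers: another major outage within six months; gross margin evidence below 25%; year-over-year revenue growth falling below 80% by Q4 2026; or Erik Bernhardsson stepping down as CEO. The company deserves close tracking because the growth rate is real, the customer list is high quality, and the product is genuinely technically differentiated. But any move from track to buy requires evidence, not extrapolation. [CV038, CV039, CV040, CV041, CV042, CV043]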
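To make the waterfall point concrete, here is a minimal Python sketch of a seniority-ordered liquidation preference payout. Everything about the terms is an assumption for illustration: a 1x non-participating preference per round, strict seniority with the latest round paid first, no conversion to common, and round sizes taken from this report's estimates (Series A $16M, Series B $110M, Series C $355M; the seed round is omitted because its size is not disclosed). Modal's actual terms are not public, so the output says nothing about the real cap table.

```python
# Simplified liquidation preference payout under ASSUMED terms:
# 1x non-participating preferences, strict seniority, no conversion.
# Amounts are USD millions and come from the report's round-size estimates.

ASSUMED_STACK = [
    ("Series C", 355.0),  # assumed most senior
    ("Series B", 110.0),  # company-blog figure; Sacra reports 87.0
    ("Series A", 16.0),
]


def preference_payouts(exit_value, stack):
    """Pay each round its preference in seniority order until cash runs out.

    Returns a dict of payouts per round plus the residual left for
    common holders. Conversion to common is ignored, so the result is
    only meaningful for exits near or below the total preference stack.
    """
    remaining = exit_value
    payouts = {}
    for name, preference in stack:
        paid = min(remaining, preference)
        payouts[name] = paid
        remaining -= paid
    payouts["residual"] = remaining
    return payouts


for exit_value in (400.0, 1_000.0, 4_650.0):
    print(exit_value, preference_payouts(exit_value, ASSUMED_STACK))
```

A real model would add the conversion decision for each series, any participation rights, and the seed round, all of which require the data-room cap table.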
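| Trigger | Threshold | Transmission to the investment thesis | Action implication |
|---|---|---|---|
| Recurring outages | Two or more SEV-1 incidents in any 90-day window | Customer churn accelerates; the multiple takes a reliability discount; NRR deteriorates | Reduce or exit the position; redo reliability diligence before adding exposure |
| Gross margin below threshold | Any credible primary source showing gross margin below 25% | The asset-light premium disappears; the multiple compresses to a CoreWeave-style 4–5x; at that multiple the current mark needs roughly $930M–$1.16B ARR to break even | Downgrade to avoid; the current entry price does not hold up at commodity margins |
| Revenue growth deceleration | ARR growth below 80% year over year in Q4 2026 or Q1 2027 data | The multiple compresses to 8–10x; the $4.65B mark shifts from base case to expensive; down-round risk materializes | Do not add to the position; evaluate exit or hedging |
| Hyperscaler launches a competing serverless GPU product | AWS, GCP, or Azure launches a serverless GPU product with comparable Python DX and cold-start performance | Modal's core differentiation (cold starts, developer experience) erodes; the serviceable market shrinks | Exit immediately or mark down the valuation sharply; the exit timeline compresses to 2–3 years |
| Founding CEO departure | Erik Bernhardsson leaves the CEO role without a transparent succession plan | Technical leadership and product vision come under pressure; customer confidence in the roadmap suffers | Pause; assess the successor and retention of technical leadership before the next capital decision |
Triggers are forward-looking judgments based on public evidence as of June 14, 2026; they describe conditions under which the current valuation thesis would weaken materially, not short-term trading signals.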
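[CV019, CV021, CV022, CV023, CV040, CV041]
| Topic | Missing evidence | Why it matters | Owner / diligence path |
|---|---|---|---|
| Gross margin | COGS broken out by GPU tier, storage, and Sandbox; gross margin percentage by product line | 15.5x ARR holds up only if gross margin is above 35%; below 25%, the premium multiple compresses back into the commodity range | Request data-room financial statements; cross-check hyperscaler GPU pricing against Modal's list prices |
| Revenue quality | NRR, cohort retention, and top-10 customer concentration as a share of ARR | Fivefold growth in seven months may mask a small number of fast-expanding accounts; durability is unknown | Request internal BI dashboards or cohort summaries; benchmark against available RunPod and Fireworks data |
| Burn rate and runway | Monthly operating cash burn and current cash balance | If the burn rate is high, the $355M raise could be depleted quickly; capital adequacy cannot be confirmed without this data | Request CFO-level financial disclosure; triangulate with headcount (undisclosed) and infrastructure costs |
| Cap table and preference stack | Capitalization table, liquidation preference amounts, and participation rights by round | Cumulative preferences across the seed, Series A ($16M), Series B ($87–$110M), and Series C ($355M) rounds could materially impair common-stock economics | Counsel reviews in the data room; compute the waterfall at different exit multiples |
| Series B discrepancy | An explanation for the conflict between $87M (Sacra / Lux Capital lead) and $110M (company blog / Redpoint lead) | An unexplained conflict in funding history is a transparency risk and may also signal cap-table complexity | Request the capitalization table or Series B term sheet; ask the company directly for an explanation |
| Headcount and unit economics | Total headcount, engineering versus GTM split, average contract value by tier, CAC payback period | With $300M ARR and headcount undisclosed, operating leverage cannot be judged and unit economics cannot be assessed | Request internal staffing data; LinkedIn headcount is only a rough proxy |
The diligence questions represent the minimum evidence needed to raise the recommendation from continued tracking to buy; each item can independently move the recommendation.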
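[CV038, CV039, CV040, CV041, CV043, CV044]
8.5 Charts
Disclaimer
This report was generated by an automated research workflow from public information available as of 2026-06-14 and does not constitute investment advice. Data on private companies may be incomplete, stale, or estimated; before making any investment decision, investors should supplement it with management diligence, contract review, and direct access to financial materials.
Evidence index
| ID | Statement | Confidence | Source |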
|---|---|---|---|
| CO001 | Modal Labs, Inc. is a Delaware corporation providing production cloud infrastructure for AI workloads. | Medium | SO009 |
| CO002 | Modal was founded approximately in 2021, as implied by the Series C blog statement that the company had spent "five years going very deep on technology" as of May 2026. | Medium | SO003 |
| CO003 | Modal's primary headquarters is in New York City, New York, as confirmed by both the LinkedIn company page and the Redpoint Ventures portfolio page. | High | SO004, SO007 |
| CO004 | Modal's homepage tagline is "The production cloud for AI." | Medium | SO001 |
| CO005 | Modal's documentation describes the platform as enabling low-latency inference with sub-second cold starts, scaling batch jobs massively in parallel, training and fine-tuning open-weight models, and spinning up isolated Sandboxes for AI-generated code execution. | Medium | SO005 |
| CO006 | Modal provides fully serverless execution and charges customers per second of actual usage, with no infrastructure management required. | High | SO005, SO014 |
| CO007 | Modal pools compute capacity across all major clouds and hundreds of data centers globally, routing workloads dynamically to optimize GPU availability and cost. | High | SO001, SO005 |
| CO008 | Modal's PyPI package supports Python 3.10 through 3.14 and can be installed with pip or uv. | Medium | SO013 |
| CO009 | Modal's GitHub organization (modal-labs) hosts the modal-client SDK (478 stars), modal-examples (1,221 stars), and gpu-glossary (616 stars) repositories as of June 2026. | Medium | SO012 |
| CO010 | Modal's pricing offers a Starter plan ($0 base, $30/month free credits, 10 GPU concurrency), Team plan ($250/month, 50 GPU concurrency), and Enterprise (custom pricing with volume discounts and higher GPU concurrency). | Medium | SO014 |
| CO011 | Modal's product portfolio as of June 2026 includes Functions (serverless GPU/CPU compute), Sandboxes (isolated execution environments), Training (fine-tuning and multi-node jobs), Volumes (mutable storage), Web Endpoints, and GPU Notebooks. | High | SO005, SO001 |
| CO012 | Modal's container infrastructure uses gVisor for enterprise-grade container isolation in Sandbox workloads. | Medium | SO019 |
| CO013 | Modal's Terms of Service (effective May 2026) identifies the contracting entity as Modal Labs, Inc., a Delaware corporation. | Medium | SO009 |
| CO014 | Redpoint Ventures' portfolio page identifies Modal's founders as Erik Bernhardsson and Akshat Bubna. | Medium | SO007 |
| CO015 | Erik Bernhardsson publicly described working on Modal in a personal blog post dated December 7, 2022, identifying it as a tool to run things in the cloud without managing infrastructure. | Medium | SO006 |
| CO016 | LinkedIn's Modal company page (June 2026) shows approximately 180 employees and lists the headquarters as New York City, New York. | Medium | SO004 |
| CO017 | Modal does not publicly disclose its board of directors, committee structure, or investor governance rights in any fetched public source as of June 2026. | High | SO007, SO008 |
| CO018 | Akshat Bubna's functional role (CTO or otherwise) and professional background are not confirmed in any successfully fetched public source as of June 2026. | Low | |
| CO019 | The public corpus does not name any Modal executive beyond the two founders, including VP Engineering, CFO, Head of Revenue, or other C-suite titles. | Medium | SO004, SO007 |
| CO020 | The Series C blog post was written in the company's voice without attributing authorship to a named executive, consistent with a tight founder-led communications style. | Medium | SO003 |
| CO021 | Redpoint Ventures first invested in Modal's Series A in 2023, as stated on the Redpoint portfolio page. | Medium | SO007 |
| CO022 | Modal's Series A amount and the full list of Series A investors are not publicly disclosed in the fetched corpus. | Medium | SO007 |
| CO023 | Modal raised a Series B of approximately $110M in October 2025 at a post-money valuation of approximately $1.1B, according to the task-provided context; this round is not independently confirmed by a press release or official blog post in the fetched corpus. | Medium | SO003 |
| CO024 | Redpoint Ventures and Sutter Hill Ventures are named as Series B investors in the user-provided context; Sutter Hill's participation is not independently confirmed in any fetched source in this run. | Low | SO007 |
| CO025 | Modal raised a Series C of $355M on or around May 21, 2026, as announced on the official Modal blog. | High | SO003, SO008 |
| CO026 | The Series C post-money valuation was $4.65B, representing a roughly 4.2x step up from the Series B valuation of approximately $1.1B in approximately seven months. | Medium | SO003 |
| CO027 | The Series C was co-led by General Catalyst and Redpoint Ventures, with Menlo Ventures, Bain Capital Ventures, and Accel joining as new investors; all existing major investors also participated. | High | SO003, SO008, SO026, SO027 |
| CO028 | Modal's annualized revenue had surpassed $300M at the time of the Series C announcement in May 2026, as stated in the official Series C blog post. | Medium | SO003 |
| CO029 | Modal grew its revenue approximately fivefold between the Series B (October 2025) and Series C (May 2026) rounds, as stated in the official Series C blog post. | Medium | SO003 |
| CO030 | Bain Capital Ventures is explicitly listed as a "new investor" in the Series C, implying BCV was not a Series B investor and contradicting the user-provided context. | Medium | SO003 |
| CO031 | Reducto migrated 30+ inference model workloads to Modal and achieved a 3x reduction in P90 latency, as documented in a November 2025 case study. | Medium | SO017 |
| CO032 | Reducto scaled its ingestion pipeline to over 1,000 GPUs in under an hour on Modal to meet a large enterprise prospect's demand for 100,000 pages per minute throughput. | Medium | SO017 |
| CO033 | Zencastr scaled to 1,500 concurrent GPUs on Modal to process hundreds of years of podcast audio in just a few days, eliminating the need to pre-allocate GPU nodes. | Medium | SO020 |
| CO034 | Quora shipped code execution for its Poe AI chatbot platform on Modal Sandboxes, eliminating the need to build sandbox infrastructure in-house and saving the equivalent of two engineers' ongoing work. | Medium | SO019 |
| CO035 | Substack migrated training and deployment for its entire ML portfolio (spam detection, recommendations, transcription, image generation) from AWS SageMaker to Modal by May 2024. | Medium | SO018 |
| CO036 | Applied Compute (serving DoorDash, Cognition, Mercor with RL-trained AI agents) uses Modal as its core reinforcement learning training and production inference platform. | Medium | SO021 |
| CO037 | Cognition's coding agents run "millions of sandboxes" on Modal for production inference and RL training, per the Series C announcement. | Medium | SO003, SO010 |
| CO038 | The Series C blog cites Physical Intelligence, Suno, DoorDash, and Decagon as additional named Modal customers with specific production workloads. | Medium | SO003, SO010 |
| CO039 | Lovable cited Modal as the only infrastructure provider enabling tens of thousands of simultaneous app creation sessions, per the coding agents solutions page. | Medium | SO023 |
| CO040 | Modal's GPU functions achieved 99.946% uptime over the trailing 90 days as reported by the status page on June 14, 2026. | Medium | SO016 |
| CO041 | A Hacker News community post dated June 3, 2026 cited three major Modal outages in one month, listing a May 7 SEV-1 AWS availability zone overheat, a May 19 incident with no published report, and a June 3 internal authentication system failure. | Medium | SO015 |
| CO042 | The June 3, 2026 outage described in the HN post was characterized as the internal authentication system being down and was noted as resolved the same day. | Medium | SO015 |
| CO043 | Modal's "truly serverless GPUs" blog post (May 2026) describes four technologies: cloud GPU buffers, a custom content-addressed multi-tier container filesystem, CPU-side checkpoint/restore, and CUDA checkpoint/restore. | Medium | SO011 |
| CO044 | Modal's four-technology stack reduces AI inference server replica scaling from multiple kiloseconds (minutes to hours) to tens of seconds, a claimed ~40x improvement. | High | SO011, SO025 |
| CO045 | Modal's status page (June 14, 2026) shows CPU function uptime of 99.938% and Sandbox uptime of 99.861% over the trailing 90 days. | Medium | SO016 |
| CO046 | Modal's status page shows GPU function uptime of 99.946% over the trailing 90 days, while community-reported incidents suggest the aggregate uptime figure may obscure incident frequency. | Medium | SO015, SO016 |
| CO047 | The Hacker News feed from the modal.com domain shows a post about "Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint" earning 91 points, indicating strong developer community engagement. | Medium | SO025 |
| CO048 | Modal Sandboxes (isolated execution environments for AI-generated code) are described on the Modal blog as first-class compute primitives, and over two million have been launched on Modal per the Series C announcement. | Medium | SO003, SO023 |
| CO049 | A community HN post from June 3, 2026 reported a Modal major outage affecting the internal authentication system; this is the third major incident reported in a single month according to the thread. | Medium | SO015 |
| CO050 | Modal's Sandbox product has facilitated over two million launches, per the Series C blog, indicating meaningful scale in the agentic computing use case. | Medium | SO003 |
| CM001 | Modal's addressable market is the cloud-managed serverless AI compute and inference-as-a-service layer: the platform that packages, deploys, auto-scales, and meters GPU workloads without requiring customers to provision or reserve underlying hardware. | Medium | SM017, SM018, SM019 |
| CM002 | Status-quo substitutes for Modal include self-managed Kubernetes clusters with reserved GPU instances on hyperscalers, specialist GPU clouds (RunPod, Lambda Labs) providing raw rental without managed orchestration, and hyperscaler-native managed AI services (AWS Bedrock, Google Vertex AI, Azure ML). | Medium | SM006, SM009, SM010, SM011 |
| CM003 | Adjacent markets explicitly entered by Modal but not central to its monetization include MLOps experiment tracking, fine-tuning platforms, and developer agent sandbox orchestration; Modal's Training, Volumes, and Sandboxes products address these adjacencies. | Medium | SM022, SM023, SM019 |
| CM004 | Modal's GPU type range as of June 2026 spans from T4 and L4 (entry inference) through A10, A100 (40GB and 80GB), L40S, H100 (PCIe/SXM/NVL), H200, and B200 (Blackwell) with an opt-in B200+ flag that also routes to B300 GPUs where available. | Medium | SM012 |
| CM005 | Included spend in Modal's market encompasses serverless GPU-second fees, managed inference endpoint charges, Sandbox execution, Storage Volume fees, and enterprise support; excluded spend includes model weights, training datasets, data center capex, and general-purpose IaaS compute not dedicated to AI workloads. | Medium | SM018, SM019 |
| CM006 | Technavio sizes the AI inference-as-a-service market at USD 85.25 billion in 2025, with a CAGR of 22.1% forecast for 2026–2030; North America accounts for 41.1% of incremental growth, and the GPU hardware component within this market was valued at USD 42.28 billion in 2024. | Medium | SM002 |
| CM007 | MarketsandMarkets (November 2024) estimates the broader AI infrastructure market (compute, memory, network, storage, and software) at USD 135.81 billion in 2024, forecast to reach USD 394.46 billion by 2030 at a CAGR of 19.4%. | Medium | SM001 |
| CM008 | MarketsandMarkets (December 2024) projects the cloud AI market (including infrastructure, ML platforms, MLOps, AIaaS, and generative AI) to reach USD 327.15 billion by 2029 at a CAGR of 32.4% during the forecast period. | Medium | SM004 |
| CM009 | Mordor Intelligence (page last updated February 17, 2026) forecasts the cloud AI market at USD 269.02 billion by 2031 at an 18.68% CAGR from 2026, with hybrid and multi-cloud architectures projected to grow at 22.31% CAGR; Asia-Pacific leads growth at 22.74% CAGR. | Medium | SM003 |
| CM010 | The analyst estimates for Modal's market (ranging from USD 85.25B [Technavio inference service layer] to USD 394.46B [MarketsandMarkets AI infrastructure including hardware]) should not be summed; they reflect different definitional boundaries and different inclusions of on-premises, hardware, and service spending. | Medium | SM001, SM002, SM003, SM004 |
| CM011 | MarketsandMarkets' broadest AI market estimate (hardware + software + services + generative AI) puts the full sector at USD 601.93 billion in 2026, projected to reach USD 3.638 trillion by 2033 at a 29.3% CAGR; Modal is exposed to the software and services sub-layers of this market but not to hardware capex. | Medium | SM005 |
| CM012 | A bottom-up SAM estimate, applying a 25–30% cloud-managed or serverless share to the MarketsandMarkets USD 135.81B AI infrastructure figure for 2024, yields an implied SAM of approximately USD 34–41 billion for the managed cloud compute layer relevant to Modal, growing proportionally with the broader market. | Low | SM001, SM004 |
| CM013 | Modal's >$300 million ARR disclosed in its May 2026 Series C announcement represents approximately 0.35% penetration of the USD 85.25 billion AI inference-as-a-service market (Technavio 2025), confirming very early stage penetration in a large and fast-growing market. | Medium | SM019, SM002 |
| CM014 | No public analyst report segments "serverless GPU cloud" or "Python-native AI compute platform" as a standalone market category; all available sizing estimates cover broader or differently-defined categories, making it impossible to reference a clean published SAM for Modal's specific positioning. | Medium | SM001, SM002, SM003 |
| CM015 | Mordor Intelligence (February 2026) cites persistent shortages of NVIDIA H100 and AMD MI300X GPUs with limited HBM3 supply, stretching hardware lead times past 12 months and constraining new AI training projects. | Medium | SM003 |
| CM016 | GPU fractionalization platforms enable companies to rent one-eighth or one-quarter slices of H100 or MI300X accelerators at costs below USD 2 per hour, creating a structural pricing floor for batch-optimized AI inference workloads and compressing margins for managed platforms. | Medium | SM003 |
| CM017 | RunPod's published GPU cloud pricing as of June 2026 shows H100 PCIe at $2.89/hr, H100 SXM at $3.29/hr, H100 NVL at $3.19/hr, H200 at $4.39/hr, B200 at $5.89/hr, A100 SXM at $1.49/hr, and L40S at $0.86/hr. | Medium | SM006 |
| CM018 | Modal's GPU documentation as of June 2026 explicitly recommends the L40S as the starting point for production inference (excellent cost-to-performance at 48GB GPU RAM) and notes that memory-bound workloads with small batch sizes do not benefit proportionally from higher-arithmetic-throughput Blackwell chips. | Medium | SM012 |
| CM019 | AWS Bedrock uses a per-token API pricing model for foundation model inference (with distinct per-token rates for input and output tokens per model), positioning it as an API-gateway layer rather than a raw compute layer; Bedrock also charges per-image for image generation and per-second for video models. | Medium | SM009 |
| CM020 | Azure Machine Learning pricing is structured as pay-as-you-go (per-second compute capacity), Azure Savings Plan (fixed hourly rate committed for 1–3 years globally), and Azure Reserved VM Instances (one-year or three-year commitments); an ML service surcharge layer is added on top of the base VM price. | Medium | SM010 |
| CM021 | Google Vertex AI (Agent Platform) charges for training at $3.465 per hour and for deployment and online prediction at $1.375–$2.002 per hour, depending on model type; these rates apply to managed AutoML training, not serverless GPU inference on arbitrary user-provided models. | Medium | SM011 |
| CM022 | Together AI's inference API prices range from approximately $5.00 per million tokens for smaller open models to $60.00 per million tokens for the largest frontier-class models as of June 2026; fine-tuning is also priced per token in the training dataset. | Medium | SM008 |
| CM023 | Replicate's pricing model for private models charges customers for all online time including idle waiting time, not only active processing time, except for fast-boot fine-tunes which are billed only for active time; this contrasts structurally with Modal's serverless model where idle time is not billed. | Medium | SM007 |
| CM024 | Modal's Series C announcement and case study corpus reveal five distinct buyer archetypes: AI-native product companies (Suno, Decagon, Lovable), agentic coding platforms (Cognition, Ramp), robotics/physical AI labs (Physical Intelligence), enterprise ML platform teams (DoorDash, Substack), and RL/research compute teams (Applied Compute serving DoorDash, Cognition, Mercor). | Medium | SM019, SM020 |
| CM025 | Suno's co-founders explicitly stated they did not want to manage Kubernetes clusters, commit to three-year GPU reservations, or divert engineering resources to infrastructure when choosing Modal; these stated pain points define the primary adoption trigger for AI-native startups in the serverless compute market. | Medium | SM016 |
| CM026 | Suno's GPU usage on Modal peaks dramatically on holidays (Christmas, Valentine's Day) as users create more songs to share, illustrating that usage-based serverless pricing eliminates the trade-off between over-provisioning for peaks and degraded experience during spikes. | Medium | SM016 |
| CM027 | Modal's pricing tiers as of June 2026 are Starter ($0/month with $30 in free GPU credits and 10 GPU concurrency), Team ($250/month with 50 GPU concurrency), and Enterprise (custom pricing, unlimited concurrency negotiated); these tiers define the PLG land-and-expand funnel. | High | SM018, SM017 |
| CM028 | The budget owner for Modal deployments typically starts in product or engineering (developer self-serve credit card phase), migrates to departmental budget once production workloads are committed, and then transitions to central platform or IT budgets at enterprise scale as compliance and SLA requirements arise. | Medium | SM018, SM019 |
| CM029 | Modal's examples page documents 24 or more distinct use-case templates as of June 2026 spanning LLM inference (OpenAI-compatible endpoints), protein folding (ESMFold2, Boltz-2, Chai-1), coding agent deployment, image generation (Flux), batch audio transcription (Whisper), video generation, music generation (ACE-Step), RAG pipelines, and scientific computing. | High | SM015, SM022 |
| CM030 | Modal enforces per-function scale limits of 2,000 pending inputs and 25,000 total (running + pending) inputs for standard functions; async .spawn() jobs are allowed up to 1 million pending inputs; each .map() invocation can process at most 1,000 inputs concurrently. | Medium | SM014 |
| CM031 | The primary structural driver of the serverless AI compute market is rapid growth in open-source model complexity: as LLM parameter counts scale into the hundreds of billions, inference infrastructure cost and management complexity grow faster than model size, increasing the premium on managed platforms that abstract operational overhead. | Medium | SM001, SM002 |
| CM032 | Agentic AI architectures require isolated, ephemeral execution environments (Sandboxes) that scale from zero to thousands of containers on sub-second demand; this workload class is a major Modal growth driver because Kubernetes-backed reserved infrastructure is poorly suited for its bursty, security-sensitive execution requirements. | Medium | SM023, SM019 |
| CM033 | GPU supply shortages, with H100 and MI300X lead times exceeding 12 months as cited by Mordor Intelligence (February 2026), structurally push AI development teams toward pooled managed GPU clouds rather than direct hardware procurement, expanding the addressable market for elastic compute platforms. | Medium | SM003 |
| CM034 | The mix shift from AI training (large periodic jobs) to AI inference (persistent, latency-sensitive serving) is a structural market driver: by 2025–2026 inference accounts for a growing and larger share of total AI compute spend for most production AI companies, and inference workloads align better with Modal's serverless per-second billing than one-time large training jobs. | Medium | SM001, SM004 |
| CM035 | North America accounts for 41.1% of incremental growth in the AI inference-as-a-service market per Technavio's 2026 forecast, strongly aligning with Modal's New York City headquarters and the geographic concentration of its known customer base including Suno, Cognition, DoorDash, Ramp, and Substack. | Medium | SM002 |
| CM036 | Hyperscaler incumbency (AWS Bedrock, Google Vertex AI, Azure ML) is the primary ceiling constraint on Modal's addressable enterprise market: large enterprises with multi-year cloud discount commitments (EDP, CUD) face meaningful switching friction to route AI workloads to a standalone provider like Modal. | Medium | SM009, SM010, SM011 |
| CM037 | GPU supply constraints create ceiling pressure on Modal's elastic scaling guarantees: when NVIDIA H100/H200/B200 allocation remains constrained through 2026, compute platform providers, including Modal, cannot guarantee unlimited instantaneous scaling, limiting the dependability of the elastic scaling value proposition for large burst events. | Medium | SM003 |
| CM038 | Modal's cold-start documentation (June 2026) states containers boot in approximately one second, but loading large model weights (tens of gigabytes) adds initialization time ranging from seconds to minutes unless models are pre-cached using Modal Volumes, which increases effective GPU-hour spend during warm-up. | Medium | SM013, SM021 |
| CM039 | Data residency, HIPAA, FedRAMP, and GDPR compliance requirements represent an emerging constraint on Modal's enterprise TAM: buyers in healthcare, finance, and EU markets require explicit infrastructure guarantees that a multi-tenant serverless cloud must demonstrate, and Modal's compliance certification posture (SOC2, HIPAA BAA status) was not independently confirmed in the fetched public corpus. | Low | SM003, SM019 |
| CM040 | Bare-metal GPU spot-cloud pricing (RunPod L40S at $0.86/hr, A100 SXM at $1.49/hr in June 2026) creates structural price pressure for cost-sensitive buyers who are willing to accept the operational overhead of managing their own orchestration in exchange for lower per-GPU-hour rates. | Medium | SM006 |
| CM041 | Modal's >$300M ARR in 2026 at approximately 0.35% of the $85.25B inference-as-a-service market (Technavio 2025) implies very low penetration, suggesting the remaining opportunity is over 200x the current run-rate if market share can be sustained. | Medium | SM019, SM002 |
| CM042 | The divergence between analyst estimates, ranging from USD 85.25B (Technavio, narrow inference service layer) to USD 394.46B (MarketsandMarkets, full AI infrastructure including hardware) to USD 601.93B (MarketsandMarkets, broadest AI market), reflects category definition inconsistency and should be treated as directional, not precise. | Medium | SM001, SM002, SM003, SM004, SM005 |
| CM043 | The absence of a dedicated analyst sub-category for "serverless GPU cloud" or "Python-native AI compute platform" is a structural diligence gap: investors cannot reference a published SAM for Modal's specific positioning and must rely on bottom-up constructs or proxy categories. | Low | |
| CM044 | The GPU fractionalization trend, enabling sub-$2/hr slices of H100 or MI300X, creates a structural pricing floor threat for Modal's batch-optimized workload segment: if hyperscalers or specialist providers offer fractional GPU access at commodity prices, Modal must demonstrate that developer experience, reliability, and scaling automation justify a premium. | Medium | SM003, SM006 |
| CM045 | Asia-Pacific is forecast to grow at a 22.74% CAGR by Mordor Intelligence (February 2026), driven by sovereign-AI mandates and large-scale digital infrastructure investments; Modal has not publicly disclosed international go-to-market strategy or Asian customer traction, representing an unconfirmed expansion opportunity. | Medium | SM003 |
| CM046 | Modal's GPU documentation references the pricing page for the latest GPU rates; the pricing page is publicly accessible but does not display specific per-GPU per-hour rates in the fetched version, only compute and storage tiers on the Starter/Team/Enterprise plan structure. | Medium | SM012, SM018 |
| CM047 | Modal's $4.65B Series C valuation at >$300M ARR implies a revenue multiple of approximately 15x ARR; this multiple is consistent with premium AI infrastructure companies showing high growth trajectories in 2026, and is supported by the market's 19–32% CAGR range which implies strong continued revenue expansion. | Medium | SM019, SM002, SM004 |
| CM048 | MarketsandMarkets' June 2026 update for the US AI market projects USD 750.04 billion by 2032, confirming continued enterprise AI investment growth as a baseline assumption for Modal's addressable market trajectory in North America. | Medium | SM005 |
| CP001 | Modal's pricing tiers in 2026 are Starter ($0 base, $30/month in free GPU credits, 10 GPU concurrency), Team ($250/month plus compute, 50 GPU concurrency), and Enterprise (custom pricing). | 高 | SP001, SP024 |
| CP002 | Replicate's platform runs hundreds of public AI models via a one-line API and also supports private model deployment using Cog, its open-source packaging tool. | 高 | SP005, SP007 |
| CP003 | RunPod serves more than 750,000 developers across 31 global regions with 30+ GPU SKUs, and Sacra estimated its ARR at $120M in January 2026 on $22M in total funding. | 中 | SP008, SP025, SP027 |
| CP004 | Baseten's homepage claims 99.99% uptime out of the box, blazing-fast cold starts, and SOC 2 Type II and HIPAA compliance across all tiers, and the company has raised $585M (Business Wire). | 高 | SP011, SP012 |
| CP005 | Beam Cloud is a Python-first compute platform offering sandboxes, GPU inference, durable task queues, and deployment across any AWS, GCP, Azure, or Hetzner account from a single Python SDK. | 高 | SP013, SP014 |
| CP006 | Banana.dev offers GPU inference hosting at a flat monthly rate ($1,200/month for the Team plan with 50 parallel GPUs maximum) plus at-cost compute with zero markup. | 中 | SP015 |
| CP007 | Lambda AI (formerly Lambda Labs) is positioned as "The Superintelligence Cloud" and holds ISO 27001, ISO 27017, ISO 27701, ISO 22301, and SOC 2 Type II certifications. | 中 | SP016 |
| CP008 | CoreWeave describes itself as "The Essential Cloud for AI" and claims 96% cluster goodput, 10x faster inference spin-up compared to hyperscalers, and multi-billion-dollar enterprise contracts. | 中 | SP017 |
| CP009 | AWS SageMaker (rebranded SageMaker Unified Studio) is a comprehensive platform for data, analytics, and AI development, including model training, deployment, governance, and observability under one interface. | 高 | SP019, SP023 |
| CP010 | Google Cloud Run offers on-demand NVIDIA L4 GPU instances that start in 5 seconds and scale to zero, with scale-to-zero as the default configuration. | 高 | SP020, SP021 |
| CP011 | Google's Gemini Enterprise Agent Platform (formerly Vertex AI) provides 200+ Google and third-party models, Agent Studio, custom model training, MLOps pipelines, and feature store as an integrated platform. | 高 | SP021, SP020 |
| CP012 | Azure Container Apps provides a Sandbox mode for executing untrusted AI-generated code and offers Serverless GPUs with pay-per-second billing and scale-to-zero as a default. | 中 | SP022 |
| CP013 | Together AI offers per-token foundation model inference pricing (e.g., $2.10/1M input tokens for DeepSeek V4 Pro) and raised a $305M Series B at a $3.3B valuation per Sacra. | 中 | SP026, SP024 |
| CP014 | Sacra estimates Modal reached $300M in annualized revenue in April 2026, up from ~$119M at the end of 2025, driven by inference, batch jobs, and agent sandboxes. | 中 | SP024 |
| CP015 | RunPod's FlashBoot technology enables sub-200ms cold starts for serverless workers, competing directly with Modal's approximately one-second cold start for pre-warmed containers. | 高 | SP009, SP008 |
| CP016 | Modal's primary developer-facing differentiator is its Python-native SDK with `@app.function()` decorators; Suno's CTO cited "no config files needed" as a key adoption reason. | 高 | SP001, SP002 |
| CP017 | CoreWeave's H200 NVL72 on-demand rate is $42.00/hr for the 8-GPU configuration, and its B300 spot pricing is $35.84/hr, targeting large-cluster training rather than per-function inference. | 高 | SP018, SP017 |
| CP018 | Beam Cloud's serverless GPU pricing starts at $0.000192/second for RTX 4090 and $0.000292/second for A10G; on-demand H100 PCIe is listed from $1.74/hr. | 高 | SP014, SP013 |
| CP019 | Modal Sandboxes run in gVisor-secured containers, the same sandboxing technology used in Google Cloud Run and Google Kubernetes Engine, providing user-space kernel isolation for agentic code. | 高 | SP004, SP003 |
| CP020 | Baseten's forward-deployed engineers (FDEs) work hands-on with customers to build, optimize, and scale models—a differentiated support layer not documented in Modal's public offering. | 高 | SP011, SP012 |
| CP021 | AWS Bedrock offers batch inference at 50% below on-demand pricing for supported open models, creating a discount path for AWS-committed enterprises that competes on economics with Modal. | 高 | SP023, SP019 |
| CP022 | Sacra confirms Modal operates a multi-cloud architecture with AWS, GCP, and Oracle Cloud Infrastructure, and that the Oracle partnership provides pricing flexibility and GPU capacity access. | 中 | SP024 |
| CP023 | Replicate private models bill for setup time, idle time, and active processing time on dedicated hardware; this differs structurally from Modal's scale-to-zero serverless billing. | 高 | SP006, SP005 |
| CP024 | The status-quo alternative to Modal—Kubernetes clusters backed by reserved GPU instances on AWS, GCP, or Azure—demands devops staffing, multi-year financial commitments, and significant cluster management overhead, as explicitly cited by Suno's founders. | 高 | SP024, SP001, SP028 |
| CP025 | Sacra confirms Modal's marketplace integrations with major cloud providers allow enterprises to apply existing committed cloud spend, reducing procurement friction for enterprise sales. | 中 | SP024 |
| CP026 | Sacra's analysis confirms Modal's multi-cloud architecture automatically selects the most cost-effective GPU capacity across providers to optimize costs. | 中 | SP024 |
| CP027 | Azure Container Apps Express tier offers instant provisioning, sub-second startup, and scale-from-zero for serverless AI apps and agents, directly overlapping with Modal's serverless function offering. | 中 | SP022 |
| CP028 | Lambda AI's compliance portfolio (ISO 27001, ISO 27017, ISO 27701, ISO 22301, SOC 2 Type II) exceeds Modal's publicly documented compliance posture, which consists of a SOC 2 Type II report and HIPAA support available only at the Enterprise tier. | 高 | SP016, SP004 |
| CP029 | Modal's Sandbox product uses gVisor, the same sandboxing technology used in Google Cloud Run and GKE, indicating convergence of security primitives between Modal and GCP at the infrastructure layer. | 中 | SP004, SP020 |
| CP030 | RunPod operates two GPU supply tiers: enterprise Secure Cloud (data center partnerships) and Community Cloud (aggregated spare capacity from vetted hosts), with the latter offering lower prices but potential consistency differences. | 高 | SP008, SP025 |
| CP031 | Sacra reports Replicate serves over 25,000 paying customers, primarily through its community model library, indicating a broader but shallower developer funnel compared to Modal's enterprise-focused roster. | 中 | SP024 |
| CP032 | Sacra reports Together AI raised a $305M Series B at a $3.3B valuation to build an AI acceleration cloud on NVIDIA Blackwell GPUs, positioning it as a foundation model inference competitor rather than a custom model hosting competitor. | 中 | SP024 |
| CP033 | Baseten's inference stack integrates open-source engines (TensorRT-LLM, SGLang, vLLM, TGI, TEI) with custom performance optimizations including speculative decoding and KV-cache management, capabilities absent from Modal's generalist serverless compute platform. | 高 | SP011, SP012 |
| CP034 | CoreWeave claims 10x faster inference spin-up times compared to hyperscalers and 96% cluster goodput, positioning it for demanding production AI training and inference at multi-GPU scale. | 中 | SP017 |
| CP035 | RunPod grew from 100,000 developers in May 2024 to over 500,000 by January 2026 according to Sacra, while also announcing an OpenAI partnership as infrastructure provider for the Model Craft Challenge Series in March 2026. | 中 | SP008, SP025 |
| CP036 | Modal's switching cost is primarily workflow-level: migrating a codebase away from `@app.function()` decorators requires non-trivial rearchitecting, but model weights, Docker containers, and inference frameworks (vLLM, TRT-LLM) are fully portable, enabling multi-homing. | 高 | SP003, SP024 |
| CP037 | The deepest switching cost in this market remains the status-quo alternative: enterprises that have built Kubernetes-based GPU infrastructure are anchored by devops investment, custom monitoring, IAM integration, and vendor relationships, making Modal's migration pitch easier than raw competitor displacement. | 高 | SP019, SP020, SP024 |
| CP038 | Hyperscalers (AWS, GCP, Azure) retain the strongest distribution advantage through cloud commitment programs (AWS EDP, GCP CUDs, Azure MACC) that bundle AI compute into existing enterprise contracts, creating a procurement barrier for standalone AI cloud vendors. | 高 | SP019, SP020, SP022 |
| CP039 | Modal's marketplace listings on AWS and GCP enable enterprises to apply existing committed cloud spend toward Modal bills, partially neutralizing the hyperscalers' procurement bundling advantage. | 中 | SP024 |
| CP040 | Beam Cloud explicitly supports deploying GPU workloads in customer-owned AWS, GCP, Azure, and Hetzner accounts, creating a BYOC (bring-your-own-cloud) option that Modal does not currently offer. | 高 | SP013, SP014 |
| CI001 | Modal charges exclusively for compute usage on a per-second basis; the platform has no seat fees, per-API-call charges, or token-metered pricing. | 高 | SI003, SI004 |
| CI002 | Three plan tiers define Modal's commercial packaging — Starter ($0/month), Team ($250/month), and Enterprise (custom pricing) — with compute billed separately under all plans. | 中 | SI003 |
| CI003 | The Starter plan includes $30/month in free compute credits, three workspace seats, 100 concurrent containers, and 10 GPU concurrencies. | 中 | SI003 |
| CI004 | The Team plan ($250/month) includes $100/month in compute credits, unlimited seats, 1,000 containers, 50 GPU concurrencies, custom domains, static IP proxy, and deployment rollbacks. | 中 | SI003 |
| CI005 | Modal's published CPU compute price is $0.00003942 per physical core per second (approximately $0.142/core-hour), with a minimum of 0.125 cores per container; memory is priced at $0.00000672 per GiB per second. | 中 | SI003 |
| CI006 | Modal's pricing page illustrates a serverless-vs-traditional cost comparison where a Modal serverless deployment of an average 50 GPUs over 24 hours at ~$3.95/GPU-hour ($4,740 total) compares favorably to a traditional fixed-fleet approach of 75 GPUs at $3/GPU-hour ($5,400 total), despite a higher per-GPU rate. | 中 | SI003 |
| CI007 | The Enterprise plan includes volume-based discounts, higher GPU concurrency, embedded ML engineering services, private Slack support, audit logs, Okta SSO, and HIPAA compliance; pricing is custom-negotiated. | 中 | SI003 |
| CI008 | All Modal workspaces are billed monthly; incremental usage charges are triggered within a billing cycle when certain thresholds are exceeded; Team and Enterprise plans include a billing-report API for cost attribution. | 中 | SI004 |
| CI009 | Modal transacts through AWS and GCP marketplace, enabling enterprise customers to apply committed hyperscaler spend toward Modal workloads, reducing procurement friction. | 中 | SI003 |
| CI010 | Custom invoicing, international bank-transfer payment, invoice splitting, and similar enterprise billing requirements are available to Enterprise customers with a usage commitment. | 中 | SI004 |
| CI011 | Modal's Series C blog (May 2026) disclosed that Sandboxes—isolated containers for agent and untrusted-code execution—drive more than one-third of total company revenue, making them the second-largest revenue line. | 中 | SI001 |
| CI012 | Modal offers four primary revenue-generating product surfaces beyond compute Functions — Sandboxes, Volumes (distributed storage), Buckets (object storage), and Notebooks (browser-based Jupyter environments with GPU access and idle shutdown) — all billed on consumption. | 高 | SI005, SI006, SI011, SI003 |
| CI013 | Modal operates a startup-credits program offering free GPU compute to early-stage companies, bundled with direct access to Modal's engineering team for technical support and GTM amplification on launches and fundraises. | 中 | SI009 |
| CI014 | Modal's go-to-market is developer-led; the free Starter tier and compute credits create a low-friction trial path for Python developers, with organic upgrade to Team and Enterprise as workloads scale. | 高 | SI001, SI003, SI009 |
| CI015 | AWS and GCP marketplace integrations reduce enterprise sales friction by allowing large accounts to apply existing cloud commitments to Modal spend, enabling procurement without a standalone vendor relationship. | 中 | SI003 |
| CI016 | Applied Compute—which builds RL infrastructure for DoorDash, Cognition, and Mercor—cited Modal as the only platform that provided the right primitives at every layer of the RL loop, from Sandboxes for environment simulation to production inference. | 中 | SI019 |
| CI017 | Substack migrated its entire ML portfolio (spam detection, recommendations, transcription, image generation) from AWS SageMaker to Modal, representing a major sticky workload migration. | 中 | SI021 |
| CI018 | Quora uses Modal Sandboxes for safe code execution in its Poe AI chatbot platform, estimating the platform saves the equivalent of two engineers' ongoing infrastructure maintenance work. | 中 | SI022 |
| CI019 | Cognition reported running millions of Sandboxes in parallel on Modal for coding-agent workflows, a level of consumption that corroborates the disclosed Sandbox revenue share. | 中 | SI001 |
| CI020 | The startup program offers free GPU credits plus direct Modal engineering team access, creating brand affinity and a conversion pipeline from high-growth startups that subsequently scale to paid workloads. | 中 | SI009 |
| CI021 | Modal operates an asset-light supply model, aggregating GPU capacity from multiple cloud providers—confirmed as AWS, GCP, and Oracle Cloud Infrastructure—rather than purchasing or financing its own GPU hardware. | 高 | SI002, SI010 |
| CI022 | Sacra's Modal research report confirms an Oracle Cloud Infrastructure partnership as a GPU capacity source alongside AWS and GCP, providing a third supply channel for cost and availability diversification. | 中 | SI002 |
| CI023 | Modal has built a proprietary technology stack in-house including a custom Rust-based container runtime, a content-addressed container filesystem, CPU process checkpoint/restore, and CUDA/GPU memory checkpoint/restore. | 高 | SI001, SI007 |
| CI024 | GPU memory snapshotting reduces cold-start latency by capturing and restoring GPU memory state, cutting model-loading and initialization overhead to near-zero for restored containers; the Modal docs list GPU memory snapshots as an alpha feature, while CPU memory snapshots are GA. | 中 | SI007 |
| CI025 | Modal's truly-serverless-gpus blog post (in Chapter 1) documented four proprietary cold-start technologies delivering 40–100x improvement over baseline GPU cold starts; this technology layer differentiates Modal's cost structure from a pure GPU-rental pass-through. | 高 | SI001, SI023 |
| CI026 | Modal does not own or directly finance GPU hardware; all compute is procured from hyperscalers, keeping fixed asset intensity low relative to GPU-owning competitors and eliminating depreciation from cost structure. | 高 | SI002, SI001 |
| CI027 | Modal pools GPU capacity across hundreds of data centers globally, enabling cross-region and cross-cloud autoscaling that reduces idle compute costs and improves supply availability without reserved-instance commitments. | 高 | SI001, SI010 |
| CI028 | RunPod's published GPU cloud list prices (June 2026) are H200 $4.39/hr, B200 $5.89/hr, H100 SXM $3.29/hr, A100 SXM $1.49/hr, L40S $0.86/hr—providing a raw-compute price floor for GPU infrastructure comparison. | 中 | SI024 |
| CI029 | Modal's Series C raised $355M at a $4.65B post-money valuation in May 2026, co-led by General Catalyst and Redpoint Ventures, with Menlo Ventures, Bain Capital Ventures, and Accel joining as new investors; all existing major investors participated. | 高 | SI001, SI017, SI018 |
| CI030 | General Catalyst's team for the Modal Series C investment includes Quentin Clark, Max Rimpel, and Katie Keller; the GC portfolio page describes Modal as "a serverless cloud for the AI era." | 中 | SI017 |
| CI031 | Modal's Series B raised approximately $110M (per Company Overview context; Sacra reports $87M in September 2025—discrepancy represents an evidence gap) at a $1.1B post-money valuation, with Redpoint Ventures among lead investors. | 中 | SI002 |
| CI032 | Modal raised a $16M Series A in October 2023 led by Redpoint Ventures and a ~$7M seed round in early 2022 led by Amplify Partners, per Sacra research. | 中 | SI002 |
| CI033 | Modal's total public capital raised is approximately $465M when counting Series B (~$110M) + Series C ($355M); adding the Sacra-reported seed (~$7M) and Series A (~$16M) would bring the total to approximately $488M, and the seed and Series A amounts are not independently confirmed in the fetched corpus. | 中 | SI001, SI002 |
| CI034 | No cash balance, monthly burn rate, or runway figure has been publicly disclosed by Modal or any investor source as of June 2026. | 高 | SI001, SI002 |
| CI035 | Modal's Series C blog states "120+ team across NY, SF and Stockholm"; LinkedIn shows approximately 180 employees in the company people section, representing the public headcount range. | 中 | SI001, SI025 |
| CI036 | Modal disclosed surpassing $300M in annualized revenue in its May 2026 Series C announcement—a voluntary public ARR disclosure uncommon among private infrastructure companies at Series C. | 中 | SI001 |
| CI037 | Modal's Series C blog states revenue has grown "fivefold since" the Series B (closed October 2025), implying a growth multiple of approximately 5x in roughly seven months. | 中 | SI001 |
| CI038 | Sacra estimates Modal's ARR at $300M in April 2026, up from approximately $119M at the end of 2025, representing approximately 150% growth in five months. | 中 | SI002 |
| CI039 | Extrapolating from Sacra's estimates, Modal grew from approximately $119M ARR (December 2025) to $300M ARR (April 2026), a compounded monthly growth rate of approximately 20%, which annualizes to roughly 800%. | 低 | SI002 |
| CI040 | Sacra's report describes Modal's revenue as consumption-based and describes an expansion loop driven by developer adoption and workload breadth, with revenue scaling as customers deploy more workloads and larger GPU jobs. | 中 | SI002 |
| CI041 | Modal's status page (June 2026) shows 90-day uptime figures of 99.946% for GPU Functions, 99.933% for web endpoints, 99.861% for Sandboxes, and 99.782% for Snapshot restores; these figures represent aggregate averages rather than incident-free periods. | 中 | SI020 |
| CI042 | A Hacker News post from June 3, 2026 (user "hunkins") documents three major Modal outages in one month (a SEV1 AWS overheating incident on May 7, an incident on May 19 with no published post-mortem, and an internal authentication system failure on June 3), characterizing them collectively as a concerning operational pattern. | 中 | SI026 |
| CI043 | Modal's implied revenue multiple at Series C is approximately 15.5x ARR ($4.65B valuation / $300M ARR), consistent with premium AI-infrastructure multiples in mid-2026 but demanding against a gross-margin profile that is not publicly known. | 高 | SI001, SI002 |
| CI044 | No gross margin, cost of revenue, COGS breakdown, product-level contribution margin, or cloud-procurement unit cost has been publicly disclosed by Modal or corroborated by an independent source. | 高 | SI002, SI001 |
| CI045 | Analysts covering comparable asset-light GPU aggregator businesses estimate gross margins in the 30–50% range; this estimate is not confirmed for Modal and is an illustrative range only. | 低 | SI002 |
| CI046 | Based on estimated headcount of 120–180 employees and typical New York/San Francisco AI infrastructure compensation and infrastructure costs, Modal's annual cash burn is estimated in the range of $50M–$120M; this estimate is not company-disclosed and should not be cited as a confirmed figure. | 低 | SI025, SI001 |
| CI047 | No CAC, payback period, NRR, logo churn, or dollar churn data have been publicly disclosed by Modal or any investor source as of June 2026. | 高 | SI001, SI002 |
| CI048 | There is a material evidence gap between Sacra's report ($87M Series B, September 2025, led by Lux Capital) and the company-context figure ($110M Series B, October 2025); the exact size, date, and lead investor of the Series B cannot be confirmed from the publicly fetched corpus. | 中 | SI002 |
| CI049 | RunPod lists H100 SXM at $3.29/hr on its public pricing page; Modal's pricing page example implies approximately $3.95/GPU-hr for its serverless pool—a premium of approximately 20% consistent with the value of managed autoscaling and sub-second cold starts. | 中 | SI003, SI024 |
| CI050 | PitchBook records Modal Labs as having completed at least three institutional funding rounds through mid-2026 — a seed, Series B, and Series C — with General Catalyst and Redpoint co-leading the Series C; the company profile is behind a paywall and exact PitchBook-recorded round sizes may differ from public disclosures. | 中 | SI029 |
| CE001 | Modal exposes Functions (GPU/CPU serverless compute), Sandboxes (isolated code execution), Training, Volumes, Web Endpoints, Notebooks, Dicts, and Queues as its core product primitives. | 高 | SE001, SE022 |
| CE002 | Modal's primary developer interface is the Python SDK; developers add @app.function() and @app.cls() decorators to Python functions to define cloud compute jobs, with GPU type, secrets, volumes, and concurrency specified inline. | 高 | SE001, SE030 |
| CE003 | Modal publicly supports the following GPU types: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, and B200+ (opt-in to B300); per-container GPU counts go up to 8 for most high-end SKUs. | 高 | SE006, SE027 |
| CE004 | Modal may automatically upgrade an H100 request to H200 or an A100-40GB request to A100-80GB at no extra charge to the customer, improving pool utilization. | 高 | SE006, SE027 |
| CE005 | The B200+ option allows Modal to run requests on either B200 or B300 hardware billed at B200 pricing; B300 requires CUDA 13.0+; the option widens the effective capacity pool. | 中 | SE006 |
| CE006 | Modal Sandboxes are ephemeral isolated containers launched at runtime via Sandbox.create(); they pass through Created, Scheduled, Started, Ready, and Finished lifecycle states. | 高 | SE003, SE029 |
| CE007 | Sandboxes support TCP tunnels (automatic TLS termination), QUIC-based portals for real-time bidirectional communication (with UDP hole punching), volume mounts, readiness probes, and exec() for arbitrary in-container commands. | 高 | SE003, SE025 |
| CE008 | Modal Volumes are a high-performance distributed filesystem optimized for write-once, read-many ML workloads; they are distributed by default (no replica management needed), backed by multi-cloud storage for high availability, and support up to 2.5 GB/s bandwidth. | 高 | SE007, SE001 |
| CE009 | Modal Dicts are a distributed key-value store with cloudpickle serialization, 100 MiB/object limit, 10,000 entries/update limit, a 7-day inactivity TTL, and a locking primitive for distributed coordination. | 中 | SE008 |
| CE010 | Modal Queues are multi-producer, multi-consumer FIFO queues with up to 100,000 partitions, 5,000 items per partition, 1 MiB item limit, a 24-hour default TTL, and synchronous/async access. | 中 | SE009 |
| CE011 | Modal Web Functions support @modal.fastapi_endpoint (wraps a Python function in FastAPI), @modal.asgi_app, and @modal.wsgi_app; each creates a public internet HTTPS endpoint; containers scale to zero between requests. | 高 | SE002, SE001 |
| CE012 | Modal supports function scheduling via modal.Period (interval between calls) and modal.Cron (cron syntax) attached to deployed functions, with monitoring via the web dashboard; schedules cannot be paused without redeployment. | 中 | SE014 |
| CE013 | Modal containers run inside gVisor, the sandboxing technology used in Google Cloud Run and GKE; the default container environment is Debian Linux with a Python installation; all Functions and Sandboxes use this isolation. | 高 | SE010, SE011 |
| CE014 | Modal Images are defined in Python via method chaining (Image.debian_slim().pip_install(...)); no YAML or Dockerfile is required; uv_pip_install, add_local_dir, add_local_python_source, and a Dockerfile fallback are all supported. | 高 | SE011, SE001 |
| CE015 | CPU Memory Snapshots (GA since January 2025) capture container memory state just before the first request; subsequent cold starts restore directly from the frozen state, skipping Python imports, JIT compilation, and model initialization; practical speedups are 3–10x. | 高 | SE005, SE012 |
| CE016 | GPU Memory Snapshots (alpha) use the NVIDIA CUDA checkpoint/restore API (driver branches 570/575) to checkpoint device memory, CUDA kernels, streams, contexts, and memory mappings; the feature requires cuCheckpointProcessCheckpoint() and cuCheckpointProcessRestore(). | 高 | SE005, SE012 |
| CE017 | Modal published GPU Memory Snapshot benchmarks showing: vLLM serving Qwen2.5-0.5B-Instruct from 45s to 5s P0 cold start; a ViT inference function with torch.compile from 8.5s to 2.25s P0; up to 10x faster cold boot overall. | 中 | SE012 |
| CE018 | Reducto achieved an 83% reduction in cold boot time (from approximately 70s to approximately 12s) for its production document-processing models after adopting GPU memory snapshotting on Modal. | 中 | SE026 |
| CE019 | Modal's four-pillar cold-start architecture comprises: (1) cloud buffers of idle GPUs maintained for each GPU type; (2) a content-addressed multi-tier container filesystem; (3) CPU checkpoint/restore (Memory Snapshots); (4) CUDA GPU checkpoint/restore (GPU Memory Snapshots). | 高 | SE027, SE004 |
| CE020 | Modal's custom content-addressed container filesystem caches popular container image files in worker memory; this yields 3–5x faster file delivery than uncached downloads and benefits all users that import commonly used libraries like torch. | 高 | SE027, SE012 |
| CE021 | Modal documentation states that containers boot in approximately 1 second via its custom container stack; initialization time beyond container boot depends on application code (imports, model loading) and is addressed by Memory Snapshots. | 高 | SE004, SE027 |
| CE022 | Reducto achieved a 3x reduction in P90 latency and scaled to over 1,000 GPUs in under an hour for a 100k-pages-per-minute enterprise load test, using independent per-model autoscaling and per-customer compute pools on Modal. | 中 | SE026 |
| CE023 | Physical Intelligence runs inference for real-time robotic control on Modal with only 10–15ms of network overhead, using a QUIC-based portal over UDP with automatic STUN/NAT traversal, coordinated via Modal Tunnels for rendezvous. | 中 | SE025 |
| CE024 | Applied Compute used Modal Sandboxes, Functions, and Training as a unified RL loop platform (rollouts, grading fan-out, inference) for enterprise RL customers including DoorDash, Cognition, and Mercor; they found Modal was the only platform with appropriate primitives at each layer. | 中 | SE024 |
| CE025 | As of May 2026, over 1 billion Sandboxes have been launched on Modal, per Modal's own X/Twitter post cited in the Series C blog. | 中 | SE039 |
| CE026 | Modal completed a SOC 2 Type II audit with no deviations found (announced January 2, 2025); the audit covers security, availability, and confidentiality; Modal commits to annual renewal; the report is available on request via trust.modal.com. | 高 | SE010, SE019, SE020 |
| CE027 | Modal's security documentation states that the worker runtime and storage infrastructure are written in Rust; all user data is encrypted in transit (TLS 1.3) and at rest; software dependencies are audited by GitHub Dependabot; code reviews use a PR-based workflow. | 高 | SE010, SE019 |
| CE028 | Modal supports HIPAA-compliant workloads on the Enterprise plan under a BAA; Volumes v2 is in BAA scope, but Volumes v1, Images (excluding Filesystem/Directory Snapshots), and Memory Snapshots are currently out of scope. | 高 | SE010, SE019 |
| CE029 | Modal operates a private bug-bounty program via HackerOne; access requires email invitation via security@modal.com; Modal publishes a severity SLA (Critical 24 hours; High 1 week; Medium 1 month; Low/Informational 3 months). | 高 | SE010, SE019 |
| CE030 | Modal uses automated synthetic monitoring test applications that continuously check for network and application isolation within its runtime; employee access is protected by SSO IdP with phishing-resistant MFA and Secureframe MDM. | 高 | SE010, SE019 |
| CE031 | Modal's status page (checked June 14, 2026) shows the following 90-day uptimes: GPU functions 99.946%, CPU functions 99.938%, Web endpoints 99.933%, Snapshot restores (beta) 99.782%, Sandboxes 99.861%, Volumes 99.979%, Image builds 99.863%. | 高 | SE028, SE018 |
| CE032 | A Hacker News community post (June 3, 2026) documented three major outages in one month—May 7 (AWS AZ SEV1 overheating), May 19 (no published incident report), and June 3 (internal authentication system failure)—as an adverse reliability signal. | 中 | SE018 |
| CE033 | The modal PyPI package is at version 1.5.0 as of June 2026, supports Python 3.10–3.14, and had 1,624,766 downloads in a single day and 13,899,772 downloads in the prior week. | 高 | SE017, SE016 |
| CE034 | The modal-client GitHub repository is open source, hosts the Modal Python SDK and JS/TypeScript and Go SDKs, and supports Python 3.10–3.14; community extensions exist (Ruby modal-rb). | 高 | SE016, SE017 |
| CE035 | HostFleet's April 2026 GPU pricing matrix shows Modal at $0.80/hr for L4 and $2.10/hr for A100-80GB, compared with RunPod at $0.43/hr (L4) and $2.17/hr (A100-80GB), and Together AI at $0.99/hr (A100-80GB); Baseten is priced higher than Modal on all comparable SKUs. | 中 | SE032, SE033 |
| CE036 | The @modal.concurrent decorator (added in SDK v0.73.148) allows containers to process multiple inputs simultaneously and enables continuous batching for LLM inference workloads (e.g., vLLM, SGLang); the decorator sets max_inputs and target_inputs. | 中 | SE013 |
| CE037 | Modal pools capacity from AWS, GCP, and Oracle Cloud Infrastructure across hundreds of data centers worldwide; an Oracle partnership cited by Sacra supports access to competitively priced GPU resources. | 中 | SE036, SE001 |
| CE038 | Modal's region selection applies pricing multipliers: broad regions (e.g., us) at 1.5x, narrow regions (e.g., us-west) at 1.75x; routing regions (us-east, us-west, eu-west, ap-south) control where inputs/outputs are processed; this enabled Physical Intelligence to achieve roughly 10–15ms of network overhead. | 高 | SE015, SE025 |
| CE039 | Modal maintains a public GPU Glossary at modal.com/gpu-glossary covering the full GPU software stack from hardware architecture to CUDA programming; the glossary is open-source on GitHub and functions as a developer community asset. | 中 | SE021 |
| CE040 | Modal's May 2026 engineering blog post ("Truly Serverless GPUs") argues that GPU Allocation Utilization in fixed-allocation cloud deployments is commonly below 10–20%, and that Modal's four-pillar cold-start architecture reduces GPU replica scaling from "multiple kiloseconds to tens of seconds." | 中 | SE027 |
| CE041 | Sacra analyst data describes Modal's Rust-based container runtime and custom distributed filesystem as key performance differentiators; Sacra also notes Modal's multi-cloud architecture with automatic hardware selection. | 中 | SE036 |
| CE042 | Sacra analyst data (April 2026) confirms Modal introduced clustered computing for multi-node, RDMA-connected GPU workloads as a late-2025/2026 addition, enabling distributed training at scale on a single vendor. | 中 | SE036 |
| CE043 | Material unresolved product-tech diligence gaps include the absence of independent third-party performance benchmarks for cold-start or throughput claims, undisclosed private enterprise SLA terms, the exclusion of Memory Snapshots (a core performance feature) from HIPAA BAA scope, and unresolved reliability concerns from the May–June 2026 outage cluster. | 中 | SE018, SE028, SE010, SE027 |
| CU001 | Modal's publicly disclosed customer base spans at least six distinct archetypes: AI-native software builders, enterprise SaaS and fintech, media and content platforms, computational biology, robotics and physical AI, and government-adjacent and academic research. | 高 | SU012, SU019 |
| CU002 | Named customer verticals include fintech (Ramp), enterprise SaaS (Quora/Poe, Blend), voice AI (Decagon), media entertainment (Suno, Runway, Zencastr), computational biology (Chai Discovery), document intelligence (Reducto), and robotic control (Physical Intelligence). | 高 | SU012, SU020 |
| CU003 | The primary buyer across all Modal segments is an ML, platform-engineering, or applied-AI team that values Python-native ergonomics and instant auto-scaling over lower-level control of cloud infrastructure. | 中 | SU005, SU006, SU015 |
| CU004 | Modal operates a startup credits program and academic partnerships designed to create a conversion funnel from early-stage developers to paid enterprise accounts. | 中 | SU023, SU021 |
| CU005 | Sacra's 2026 analysis estimates Modal serves thousands of ML teams and specifically cites Meta's Code World Models team as a high-profile named customer alongside AI-native startups. | 中 | SU021 |
| CU006 | Modal announced in May 2026 that over one billion sandboxes have been launched on the platform since founding, approximately three years earlier. | 高 | SU008, SU020 |
| CU007 | During a 48-hour promotional event in June 2025, Lovable ran over 1 million Modal sandboxes at a peak of 20,000 concurrent sandboxes, enabling 250,000 app creations without paging Modal's on-call engineers. | 高 | SU004, SU027, SU008 |
| CU008 | Cognition CEO Scott Wu stated that Modal powers both Cognition's RL infrastructure and its production inference for Devin, with millions of sandboxes running on the RL side and real-time model serving on the inference side. | 高 | SU007, SU025 |
| CU009 | Suno scales its music-generation inference to thousands of GPUs on Modal to handle holiday demand peaks, allowing the platform to avoid purchasing dedicated capacity for variable workloads. | 中 | SU014, SU027 |
| CU010 | Zencastr scaled to 1,500 concurrent GPUs in a single Modal-powered batch job to enrich historical podcast audio with new features, without any additional DevOps work. | Medium | SU017 |
| CU011 | The 1 billion sandbox milestone was achieved roughly three years after founding, with the coding-agent cohort (Lovable, Ramp, Quora, Cognition) as the primary driver of Sandbox volume. | Medium | SU008, SU020 |
| CU012 | Ramp's Inspect coding agent, powered by Modal Sandboxes with Dicts and Queues, now accounts for more than half of all merged pull requests at Ramp across frontend and backend repositories. | Medium | SU005 |
| CU013 | Ramp previously achieved a 34% reduction in receipts requiring manual intervention using a Modal-trained fine-tuned model, at infrastructure cost estimated to be 79% lower than comparable LLM API providers. | Medium | SU006 |
| CU014 | Decagon's Voice 2.0 achieved a 65% reduction in latency and a p90 latency of 342ms for customer-service conversations after Modal's team built a custom EAGLE3 speculative-decoding draft model with 38% higher accept lengths than open-source baselines. | Medium | SU001, SU024 |
| CU015 | Runway moved Runway Characters from proof-of-concept to global production deployment in under 30 days, using Modal's single-line multi-node GPU cluster API with RDMA networking. | High | SU002, SU026 |
| CU016 | Lovable reduced sandbox orchestration code from 15,000 lines to 700 lines (about a 95% reduction) by migrating from its prior distributed cloud VM platform to Modal Sandboxes. | Medium | SU004 |
| CU017 | Quora stress-tested Modal Sandbox creation throughput at 1,000 sandboxes per second and estimates ongoing savings of approximately 2 engineers' worth of infrastructure maintenance time per year. | Medium | SU013 |
| CU018 | Reducto achieved a 3x reduction in P90 latency and an 83% reduction in cold-boot times (from approximately 70 seconds to 12 seconds) after migrating its 30-plus production model inference stack from Kubernetes to Modal. | Medium | SU016, SU028 |
| CU019 | Substack migrated training and deployment pipelines for all major ML workloads (including spam detection, newsletter recommendations, audio transcription, and sentiment analysis) from AWS SageMaker and Airflow to Modal. | Medium | SU015 |
| CU020 | Chai Discovery uses Modal to process terabyte-scale biological datasets via Modal Volumes, spin up hundreds of GPUs in minutes for drug discovery experiments, and chain heterogeneous models including protein embeddings, MSAs, and antibody design pipelines. | Medium | SU003 |
| CU021 | Applied Compute uses Modal to run full RL training loops (rollouts, grading, and inference) for enterprise clients including DoorDash (merchant onboarding model) and Cognition (bug-catching coding agent), executing thousands of parallel environments simultaneously. | High | SU007, SU019 |
| CU022 | DoorDash co-founder and CTO Andy Fang confirmed in May 2026 that DoorDash is running production AI agents for merchants using Modal as part of its AI infrastructure, while also evaluating Claude Managed Agents built on Modal Sandboxes. | High | SU007, SU020 |
| CU023 | Physical Intelligence runs real-time remote robotic inference on Modal at 10–15 ms latency, using Modal's sub-second GPU boot and multi-region routing for production robot control. | Medium | SU018 |
| CU024 | Blend, a mortgage technology company serving hundreds of unique banking environments, uses Modal Sandboxes for agent-assisted software triage workflows that require complex cross-code, cross-configuration reasoning. | Medium | SU007 |
| CU025 | Runway Characters has thousands of early-access users including Fortune 10 technology companies, major Hollywood studios, global advertising agencies, and gaming companies using it for customer support, training, experiential advertising, and game worlds. | High | SU002, SU026 |
| CU026 | Ramp expanded its Modal usage from fine-tuning workloads (circa 2024) to the full Inspect coding agent platform (launched early 2026), demonstrating a documented multi-product, multi-year expansion within a single account. | High | SU005, SU006, SU008 |
| CU027 | Quora expanded its Modal usage from model-deployment infrastructure for Poe bots to adopting Modal Sandboxes for Poe's code execution feature, representing a second product tier within the same account. | Medium | SU013 |
| CU028 | Modal's May 2026 Series C announcement disclosed that Modal Sandboxes already drive more than one-third of total company revenue, confirming that the sandbox product line has reached material commercial scale. | High | SU020, SU008 |
| CU029 | Lovable founder Anton Osika stated in July 2025 that Lovable trusts Modal "to keep up with our growth" long-term after the stress test, signaling a committed partnership intent rather than a short-term evaluation. | Medium | SU004 |
| CU030 | Multiple Modal customers, including Reducto (Kubernetes/Ray), Substack (SageMaker), Lovable (distributed cloud VMs), and Chai Discovery (raw cloud instances), migrated from legacy infrastructure to Modal and did not revert, suggesting high switching cost driven by developer experience rather than technical lock-in. | Medium | SU015, SU016, SU003, SU004 |
| CU031 | A Hacker News user documented three major Modal outages in approximately one month: a SEV-1 AWS heat event on May 7 2026, an incident on May 19 2026 with no published incident report, and an internal auth system failure on June 3 2026. | Medium | SU011 |
| CU032 | Modal's own status page shows 90-day uptime of 99.946% for GPU functions and 99.861% for Sandboxes as of June 2026, indicating non-trivial downtime over the measurement period. | High | SU022, SU011 |
| CU033 | Modal has not publicly disclosed NRR, GRR, contract duration, average revenue per account, cohort retention rates, or top-customer revenue concentration in any reviewed source as of June 2026. | High | SU020, SU021 |
| CU034 | Sacra's 2026 analysis identifies hyperscaler competition (AWS, Google, Azure adding serverless GPU with scale-to-zero billing) as a direct risk to Modal's customer retention, as these platforms can leverage existing enterprise contracts and committed spend programs. | Medium | SU021 |
| CU035 | The public named-customer set is almost entirely AI-native software companies or tech-first enterprises; no traditional industrial, regulated, or government enterprise has been named as a production customer in reviewed public sources. | Medium | SU012, SU021 |
| CU036 | DoorDash's May 2026 quote described its use of Claude Managed Agents on Modal as "evaluating" for the next step, indicating that at least this specific workload is in pre-production evaluation rather than committed production spend. | Medium | SU007 |
| CR001 | Modal's terms of service (effective October 2025) contain an embedded Data Processing Agreement that designates Modal as the "data processor" and customers as "data controllers" under GDPR Article 28, completing the required contractual relationship for EU personal data processing. | High | SR012, SR014 |
| CR002 | The DPA embedded in Modal's terms of service places legal-basis, notice, consent, and data-subject-rights obligations on the customer as data controller, not on Modal, meaning regulated deployments require customer-side GDPR compliance programs even when Modal's infrastructure stack is technically compliant. | High | SR012, SR014 |
| CR003 | The DPA's Technical and Organizational Measures (TOM) schedule commits Modal to encryption at rest, access control policies, annual SOC 2 Type II certification, daily customer-data backups, and annual restoration tests as its security obligations under the DPA. | High | SR012, SR014 |
| CR004 | Modal's HIPAA security documentation explicitly lists Volumes v1, Memory Snapshots, and Images (excluding Filesystem and Directory Snapshots) as out of scope for BAA commitments, meaning healthcare customers cannot submit PHI to those product surfaces. | High | SR013, SR024 |
| CR005 | EU AI Act Regulation 2024/1689 entered into force August 1, 2024 and will be fully applicable August 2, 2026; GPAI model governance rules (requiring technical documentation, training data transparency, and copyright compliance) became applicable August 2, 2025. | High | SR001, SR002 |
| CR006 | An AI omnibus political agreement reached May 7, 2026 extended high-risk AI system rules in certain categories to December 2027 but did not delay GPAI model governance obligations already in force since August 2025. | High | SR001, SR002 |
| CR007 | The FTC's June 2023 generative AI competition analysis flagged that incumbents controlling cloud compute infrastructure could engage in bundling, tying, exclusive dealing, and discriminatory access against specialized AI compute vendors, a risk that applies to Modal's dependence on AWS, GCP, and OCI for GPU capacity. | High | SR009, SR001 |
| CR008 | No active litigation, enforcement actions, or regulatory investigations against Modal Labs, Inc. have been identified in any publicly available source as of June 14, 2026. | Medium | SR012, SR014 |
| CR009 | A Hacker News post (June 3, 2026) documented three major Modal outages in a single month: May 7 (SEV 1, AWS us1-az4 overheating), May 19 (no published incident report), and June 3 (internal authentication system down). | High | SR011, SR010 |
| CR010 | Modal's status page (June 14, 2026) shows 90-day uptime of 99.946% for GPU functions, 99.938% for CPU functions, 99.933% for Web endpoints, 99.782% for Snapshot restores, and 99.861% for Sandboxes; these are solid aggregate statistics that are consistent with brief but frequent incident windows. | High | SR010, SR011 |
| CR011 | The June 3, 2026 outage was caused by an internal authentication system failure rather than a GPU or cloud-provider event, indicating a centralized control-plane dependency not directly mitigated by Modal's multi-cloud GPU pooling architecture. | High | SR011, SR010 |
| CR012 | The May 7, 2026 SEV 1 outage was caused by AWS availability zone us1-az4 overheating, demonstrating that even with multi-cloud pooling, a single AZ failure can propagate to in-flight customer workloads. | High | SR011, SR010 |
| CR013 | Modal publishes no contractual SLA for Starter or Team plan customers; Enterprise SLA terms are negotiated privately and not publicly available, leaving the majority of the customer base without explicit uptime remedies for the May–June 2026 outage cluster. | High | SR024, SR012 |
| CR014 | Modal achieved SOC 2 Type II certification audited January 2025 with no deviations found and commits to annual renewal, providing a verified external audit of its security control posture. | High | SR013, SR015 |
| CR015 | Modal runs a private bug bounty program through HackerOne requiring researchers to email security@modal.com for an invitation, a standard approach for private companies but narrower than a public program that allows broader community vulnerability discovery. | Medium | SR013 |
| CR016 | Modal's GPU Memory Snapshots run in gVisor-isolated containers under Modal's Rust-based container runtime and depend on the NVIDIA CUDA checkpoint/restore API in specific driver branches (570/575); they are documented as generally incompatible with multi-GPU code and non-CUDA GPU workloads. | Medium | SR016, SR025 |
| CR017 | Modal aggregates GPU capacity from AWS, GCP, and Oracle Cloud Infrastructure and does not own GPU hardware, making its compute supply entirely dependent on continued availability and pricing from these three cloud providers. | High | SR017, SR016 |
| CR018 | The AWS shared responsibility model specifies that even for abstracted cloud services, OS patching, configuration management, and application security remain the customer's (in Modal's case, the infrastructure operator's) responsibility; Modal inherits the same model with its own customers. | High | SR005, SR012 |
| CR019 | Sacra's Fireworks AI profile identifies NVIDIA's acquisition of Lepton as a signal of NVIDIA's GPU cloud marketplace ambitions, creating a scenario where Modal's primary GPU hardware supplier becomes a direct product-layer competitor. | Medium | SR007 |
| CR020 | CoreWeave's contracted backlog reached $99.4B as of March 31, 2026, with FY2026 capex guidance of $31–35B; CoreWeave holds a $6.3B NVIDIA take-or-pay GPU capacity backstop, giving it preferential allocation Modal cannot replicate as an asset-light aggregator. | High | SR003, SR022 |
| CR021 | Sacra's Fireworks AI profile identifies hardware concentration as a core risk for asset-light inference platforms: sourcing GPU capacity from third parties creates exposure to allocation constraints and hardware-generation transitions (H100 to H200 to Blackwell B200), a risk that applies directly to Modal's supply model. | Medium | SR007 |
| CR022 | Modal's GPU Memory Snapshot cold-start technology depends on NVIDIA CUDA checkpoint/restore API in driver branches 570/575; any change to NVIDIA's driver API or commercial restrictions on the checkpoint capability could break the feature that provides Modal's most differentiated cold-start advantage. | Medium | SR016, SR025 |
| CR023 | Modal's DPA directs customers to trust.modal.com/subprocessors for the current subprocessor list; this dynamic reference creates an ongoing vendor-chain compliance obligation for enterprise customers who must monitor subprocessor changes for GDPR and procurement purposes. | Medium | SR012, SR014 |
| CR024 | Modal's $4.65B Series C valuation at approximately $300M ARR implies a ~15.5x revenue multiple, a premium that prices in continued hypergrowth and tolerates limited execution misses before triggering material multiple compression. | High | SR017, SR018, SR022 |
| CR025 | Sacra estimated Modal at $300M ARR in April 2026 and roughly 5x growth since the October 2025 Series B; sustaining this growth rate requires simultaneous headcount scaling, product investment, SLA delivery improvement, and competitive differentiation. | High | SR018, SR019, SR017 |
| CR026 | Sandboxes now drive more than one-third of Modal's total revenue (per the Series C blog), creating product-concentration risk in a single workload category whose growth depends on continued AI agent market expansion and resistance to hyperscaler-native substitution. | High | SR017, SR018 |
| CR027 | HostFleet's 2026 GPU pricing comparison shows Modal at $0.80/hr for L4 and $2.10/hr for A100-80GB, above RunPod ($0.43/hr for L4) but below Baseten ($4.00/hr for A100-80GB), positioning Modal in a mid-premium tier that requires sustained cold-start and developer-experience differentiation to defend. | Medium | SR023, SR028 |
| CR028 | Sacra's Fireworks AI profile identifies inference commoditization as a core risk, noting that as vLLM, SGLang, and competing frameworks improve, "proprietary performance advantage is likely to compress"; the same dynamic applies to Modal's cold-start speed and SDK differentiation against lower-cost peers. | Medium | SR007 |
| CR029 | CoreWeave's $99.4B contracted backlog, anchored by hyperscalers and frontier AI labs (Microsoft at 67% of FY2025 revenue, Meta, OpenAI), demonstrates that the largest AI compute buyers are already committed to capital-intensive providers that Modal's asset-light model cannot match on reserved capacity guarantees. | High | SR003, SR022 |
| CR030 | RunPod grew from 100,000 to 400,000+ developers by late 2025 on approximately $22M raised (per Sacra), demonstrating that price-competitive GPU platforms can scale developer adoption aggressively against a well-funded competitor at a fraction of Modal's capital intensity. | Medium | SR020, SR028 |
| CR031 | Modal's public communications name Erik Bernhardsson as the sole executive; no other C-suite leaders (CRO, CPO, CFO, VP Engineering, Head of Revenue) are named in any public source fetched as of June 14, 2026. | High | SR017, SR021 |
| CR032 | Akshat Bubna is confirmed as Modal's co-founder but his functional title, scope, and prior industry background remain undisclosed in all public sources as of June 14, 2026. | Medium | SR017, SR026 |
| CR033 | Modal discloses no board composition, committee structure, or investor control terms in any public source, which is standard for a late-stage private company but notable at a $4.65B valuation with enterprise production workloads and $300M+ ARR. | Medium | SR017, SR026, SR027 |
| CR034 | The NIST AI Risk Management Framework (AI RMF) provides voluntary governance standards for AI trustworthiness that enterprise procurement teams may use as diligence criteria; Modal does not publicly reference alignment with the AI RMF, creating a potential procurement friction point for risk-mature enterprise buyers. | Medium | SR008 |
| CR035 | Modal gates HIPAA BAA, Okta SSO, audit logs, and custom SLAs behind the Enterprise plan, meaning Starter and Team customers operate without explicit contractual compliance, identity, or reliability protections beyond the baseline ToS terms. | High | SR024, SR013 |
| CR036 | Modal's multi-cloud pooling across AWS, GCP, and Oracle Cloud is a structural mitigation against single-cloud failure, but the May 7, 2026 AWS AZ overheating outage still propagated to customers, indicating that pooling does not guarantee instant in-flight workload failover during sudden AZ-level events. | High | SR011, SR017 |
| CR037 | Modal's operational security posture includes SOC 2 Type II (no deviations, January 2025), a private HackerOne bug bounty, gVisor container isolation, a Rust-based container runtime, TLS 1.3 on all public APIs, and automated synthetic monitoring for network and application isolation; together these form a substantive security stack for a late-stage private company. | High | SR013, SR015, SR014 |
| CR038 | Modal raised $355M in its May 2026 Series C, providing estimated multi-year operating capital; the exact cash position and runway are not disclosed, but near-term capital adequacy risk appears low given the recency and size of the raise. | Medium | SR017, SR022 |
| CR039 | CoreWeave's contracted backlog of $99.4B is anchored by Microsoft (67% of FY2025 revenue), OpenAI (~$22.4B implied), and Meta (~$35.2B implied); these are the same hyperscaler and frontier AI customer segments Modal would need to capture for sustained growth at its $4.65B valuation, suggesting CoreWeave has already locked in the largest contracts in the category. | High | SR003, SR022 |
| CR040 | GitHub issues for modal-labs/modal-client show active bug reports across multiple releases (issues in the #4000–4114 range as of June 2026), consistent with a large, active user base; no disclosed critical security vulnerabilities appear in the public repository. | Low | SR006 |
| CR041 | The FTC cloud competition analysis specifically flags cloud providers offering both compute infrastructure and AI products as potential abusers of discriminatory pricing or access controls against specialized compute vendors, a structural risk to Modal's supply-chain access if AWS, GCP, or OCI expand their own serverless GPU offerings. | Medium | SR009, SR005 |
| CR042 | NVIDIA's $2B equity investment in CoreWeave and $6.3B take-or-pay GPU backstop demonstrates that NVIDIA can use preferential allocation to deepen relationships with capital-intensive data center operators, a dynamic that could disadvantage lighter-weight aggregation platforms like Modal in future GPU allocation cycles. | High | SR003, SR022 |
| CR043 | The EU AI Act's GPAI governance rules (applicable since August 2, 2025) require providers of general-purpose AI models to provide technical documentation and engage in training-data transparency; Modal's enterprise customers who are GPAI providers may route compliance documentation requests upstream to Modal, creating an indirect regulatory burden. | Medium | SR001, SR002 |
| CR044 | Modal's data retention policy stores function inputs/outputs for up to 7 days, app and container logs for 1 day (Starter) to 30 days (Team), and audit logs only on Enterprise plans, a retention structure that may be insufficient for regulated industries requiring longer forensic windows under HIPAA or sector compliance rules. | High | SR013, SR024 |
| CR045 | The EU AI Act reaches full applicability on August 2, 2026, within the investment decision window this report informs, meaning EU enterprise customers will face live compliance obligations that may require Modal to provide GPAI documentation, data residency options, and compliance audit artifacts to complete their own AI Act filings. | High | SR001, SR002 |
| CV001 | Modal raised $355 million at a $4.65 billion post-money valuation in a Series C announced on May 21, 2026. | High | SV001, SV002, SV009 |
| CV002 | The Series C was co-led by General Catalyst and Redpoint Ventures, with Menlo Ventures, Bain Capital Ventures, and Accel joining as new investors. | High | SV001, SV002, SV017, SV018 |
| CV003 | Modal disclosed that annualized revenue had surpassed $300 million at the time of the Series C close. | Medium | SV001 |
| CV004 | Sacra independently estimates Modal Labs hit $300 million in annualized revenue in April 2026, up from approximately $119 million at the end of 2025. | Medium | SV005, SV006 |
| CV005 | Sandboxes, Modal's agent execution environment, drive more than one-third of total revenue as of the Series C close in May 2026. | Medium | SV001, SV025 |
| CV006 | The implied ARR multiple at the $4.65 billion Series C valuation divided by $300 million ARR is approximately 15.5x. | Medium | SV001, SV005 |
| CV007 | The valuation step-up from the $1.1 billion Series B to the $4.65 billion Series C in approximately seven months represents approximately a 4.2x increase. | Medium | SV001, SV006 |
| CV008 | Modal stated it grew fivefold in revenue since the October 2025 Series B, implying ARR at Series B was approximately $60 million if the $300 million post-Series C figure is accurate. | Medium | SV001 |
| CV009 | Sacra estimates Modal's ARR was approximately $119 million at end of 2025, consistent with a roughly 150% growth rate to $300 million in five months. | Medium | SV005 |
| CV010 | The Series C investor syndicate includes Quentin Clark, Max Rimpel, and Katie Keller as the General Catalyst deal team, confirmed on the GC portfolio page. | Medium | SV002, SV009 |
| CV011 | Modal's total capital raised through Series C is approximately $465 million counting only Series B ($110M company-disclosed) and Series C ($355M); adding the estimated seed ($7M) and Series A ($16M) brings the total to roughly $488 million. | Medium | SV001, SV006, SV008 |
| CV012 | The Sacra Modal Labs report as of May 2026 shows a $1.1 billion valuation (from Series B) and total funding of $111 million, indicating it was last updated before the Series C close. | Medium | SV005, SV006 |
| CV013 | Sacra reports the Series B as $87 million led by Lux Capital in September 2025, while Modal's own blog post and the company context describe $110 million and Redpoint/Sutter Hill Ventures as leads, an unresolved discrepancy. | Low | SV005, SV006, SV001, SV007 |
| CV014 | Modal's asset-light supply model aggregates GPU capacity from AWS, GCP, and Oracle Cloud Infrastructure rather than owning hardware, limiting capital intensity but also capping gross margin. | Medium | SV001, SV005 |
| CV015 | Modal's GPU memory snapshotting technology achieves 40–100x improvement in cold-start times over conventional GPU containers, per the company's engineering blog. | Medium | SV031 |
| CV016 | The Hostfleet April 2026 pricing matrix shows Modal charges $0.80 per hour for an L4 GPU versus $0.43 per hour on RunPod Secure Cloud, an 86% premium. | Medium | SV021 |
| CV017 | Modal's multi-cloud aggregation model, sourcing from AWS, GCP, and Oracle, means its effective gross margin is the spread between customer rates and hyperscaler procurement costs, which are undisclosed. | Medium | SV001, SV014 |
| CV018 | No gross margin, COGS breakdown, or unit economics data for Modal has been publicly disclosed as of June 14, 2026; the company has not filed with the SEC or published audited financials. | Medium | SV005, SV006 |
| CV019 | A Hacker News community post from June 3, 2026 documented three major operational incidents in a single month: a May 7 SEV-1 involving AWS infrastructure overheat, an undocumented May 19 incident, and a June 3 internal authentication system failure. | Medium | SV020 |
| CV020 | Modal's status page reported 90-day GPU function uptime of 99.946% as of June 14, 2026, which appears to understate the severity of the three incidents reported on Hacker News in May–June 2026. | Medium | SV030, SV020 |
| CV021 | No NRR, customer cohort retention, or churn data has been publicly disclosed by Modal or any independent source as of June 14, 2026. | Medium | SV005, SV006 |
| CV022 | Modal's board composition, CFO identity, VP Sales identity, and governance structure are not disclosed in any publicly available source fetched in this run. | Medium | SV001, SV005 |
| CV023 | Three major outages in May–June 2026, coinciding with the company's Series C fundraising window, represent a material reliability risk signal at a $300M ARR scale that is unusual for infrastructure leaders. | Medium | SV020, SV030 |
| CV024 | Modal's $4.65 billion post-money valuation at 15.5x ARR sits at the upper end of private AI infrastructure multiples observed in 2025–2026, above Baseten (8.3x), Together AI (3.3x closed, 7.5x proposed), and CoreWeave (4.5x public). | Medium | SV005, SV010, SV011, SV013 |
| CV025 | Baseten raised $300 million at a $5 billion post-money valuation in February 2026; Sacra estimates Baseten's ARR at approximately $600 million, implying approximately 8.3x ARR multiple. | Medium | SV010, SV024 |
| CV026 | Fireworks AI raised $250 million at a $4 billion post-money valuation in October 2025; Sacra estimates approximately $800 million in ARR, implying roughly 5x ARR. As of May 2026, Fireworks is reportedly in talks to raise at a $15 billion valuation, implying 18.75x ARR. | Medium | SV010 |
| CV027 | Together AI raised $305 million at a $3.3 billion valuation in February 2025; Sacra estimates $1 billion in ARR in 2026, implying 3.3x ARR on the closed round. Together is reportedly in talks to raise at a $7.5 billion pre-money valuation, implying 7.5x ARR. | Medium | SV011 |
| CV028 | CoreWeave went public in March 2025 at a $23 billion pre-IPO valuation; its FY2025 revenue per the SEC 10-K filed March 2026 was $5.13 billion, implying approximately 4.5x trailing revenue at the pre-IPO mark. | High | SV013, SV014 |
| CV029 | Groq raised $750 million at a $6.9 billion valuation in September 2025 against approximately $90 million in 2024 revenue per Sacra. A December 2025 Nvidia licensing deal worth $17 billion materially altered its comparability to traditional inference platforms. | Medium | SV012 |
| CV030 | In the bull case, Modal grows ARR to $650 million to $1.0 billion by mid-2027 through Sandbox momentum and inference expansion; at 15–18x, this implies a valuation range of $9.75 billion to $18 billion. | Low | SV001, SV005 |
| CV031 | In the base case, Modal grows ARR to $450 million to $650 million by mid-2027 at 100–150% YoY, with multiple compressing to 12–15x; this implies a valuation range of $5.4 billion to $9.75 billion, so the closed $4.65 billion Series C sits just below the base-case floor while remaining inside the overall scenario distribution. | Low | SV001, SV005, SV010, SV011 |
| CV032 | In the bear case, Modal's revenue growth decelerates below 80% YoY due to hyperscaler bundling, outage recurrence, or margin revelation; at 7–10x on $200 million to $330 million ARR, the implied valuation range is $1.4 billion to $3.3 billion, representing a material mark-to-market loss from the Series C. | Low | SV020, SV021, SV013 |
| CV033 | RunPod, the lowest-cost option in the Hostfleet matrix at $0.19 per hour for T4 GPUs, maintains gross margins in the mid-60s to high-70s percent range per Sacra, suggesting that asset-light GPU intermediaries can achieve software-like economics at lower scale. | Medium | SV016, SV021 |
| CV034 | CoreWeave's Q1 2026 revenue of $2.078 billion grew 112% year-over-year with adjusted EBITDA of $1.157 billion (56% margin), providing a public-market reference point for AI cloud economics at scale. | High | SV013, SV014 |
| CV035 | The private AI infrastructure market in mid-2026 shows a wide range of ARR multiples: from 3.3x (Together AI closed round) to a proposed 18.75x (Fireworks discussions), with Modal's 15.5x in the upper quartile. | Medium | SV010, SV011, SV005, SV013 |
| CV036 | The sensitivity analysis shows how much ARR is needed to support the $4.65 billion valuation at alternative multiples: about $1.03 billion at 4.5x, $560 million at 8.3x, and $300 million (the current level) at 15.5x. | Medium | SV005, SV013, SV010 |
| CV037 | Hyperscaler bundling risk is material: AWS, GCP, and Azure can bundle model access, compute, governance, and credit commitments inside existing cloud relationships, creating structural pressure on Modal's pricing premium over raw GPU access. | Medium | SV001, SV014 |
| CV038 | Gross margin evidence is the single most important undisclosed data point for Modal's valuation; the range of 25–65% implies a multiple range of 7x to 30x+ on $300 million ARR, meaning the gross margin question dominates the underwriting. | Medium | SV016, SV021 |
| CV039 | Plausible exit pathways for Modal include a late-stage IPO (2027–2028 at $5B-$15B), strategic acquisition by a hyperscaler (Google, Microsoft, Amazon) or infrastructure company (Databricks, Snowflake), or remaining private for 3–5 years with continued venture backing. | Low | SV001, SV005 |
| CV040 | Another major outage within six months of the June 2026 incidents would constitute a thesis-break trigger, signaling that infrastructure reliability has not kept pace with revenue growth. | Medium | SV020 |
| CV041 | Gross margin evidence below 25% from any credible primary source would represent a thesis-break trigger, as it would imply the current 15.5x ARR multiple prices in software economics that the business does not demonstrate. | Medium | SV016, SV021 |
| CV042 | Revenue growth decelerating below 80% year-over-year by Q4 2026 or Q1 2027 would compress the multiple toward 8–10x and place the current $4.65 billion mark at or above the base case ceiling. | Medium | SV005, SV010 |
| CV043 | Cap table and preference terms for the Series C are not publicly disclosed; accumulated liquidation preferences across four rounds ($465M+ primary capital) could materially impair common equity economics at moderate exit multiples. | Medium | SV001, SV006 |
| CV044 | The combination of (1) gross margin opacity, (2) no NRR data, (3) three recent outages, and (4) the Sacra Series B data conflict together prevent a buy call; the recommendation is track with medium confidence. | Medium | SV005, SV020, SV006 |
| CV045 | Modal's Redpoint Series A in 2023, Sutter Hill Ventures participation in Series B, and new investors General Catalyst, Menlo Ventures, Bain Capital Ventures, and Accel in Series C indicate a high-quality syndicate that performed primary diligence on all disclosed terms. | Medium | SV002, SV008, SV009, SV017, SV018 |
| CV046 | Over 1 billion Sandboxes have been launched on Modal across its customer base, as disclosed in the Series C announcement, validating platform scale beyond pure GPU compute rental. | Medium | SV001, SV025 |
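
The multiple, scenario, and uptime arithmetic quoted in CV006, CV030 to CV032, CV036, and CR010 can be reproduced with a short sketch. The function names are hypothetical and introduced only for illustration. The downtime conversion assumes the status page reports uptime as a simple fraction of wall-clock time over a 90-day window, which the sources do not confirm.

```python
# Illustrative arithmetic only. Inputs are the figures quoted in the claims table;
# all function names are hypothetical helpers, not part of any Modal tooling.

VALUATION_B = 4.65  # Series C post-money valuation in $B (CV001)
ARR_B = 0.30        # disclosed annualized revenue in $B (CV003)


def implied_multiple(valuation_b: float, arr_b: float) -> float:
    """Valuation divided by ARR."""
    return valuation_b / arr_b


def required_arr_m(valuation_b: float, multiple: float) -> float:
    """ARR in $M needed to support a valuation at a given multiple."""
    return valuation_b / multiple * 1000


def scenario_range_b(arr_low_m, arr_high_m, mult_low, mult_high):
    """Low and high implied valuation in $B for an ARR band and a multiple band."""
    return (arr_low_m * mult_low / 1000, arr_high_m * mult_high / 1000)


def downtime_minutes(uptime_pct: float, days: int = 90) -> float:
    """Minutes of downtime implied by an uptime percentage.

    Assumes uptime is measured against total wall-clock minutes in the window.
    """
    return (100 - uptime_pct) / 100 * days * 24 * 60


if __name__ == "__main__":
    print(f"Implied multiple: {implied_multiple(VALUATION_B, ARR_B):.1f}x")
    for multiple in (4.5, 8.3, 15.5):
        print(f"ARR needed at {multiple}x: ${required_arr_m(VALUATION_B, multiple):,.0f}M")
    scenarios = {
        "bear": (200, 330, 7, 10),
        "base": (450, 650, 12, 15),
        "bull": (650, 1000, 15, 18),
    }
    for name, band in scenarios.items():
        low, high = scenario_range_b(*band)
        print(f"{name}: ${low:.2f}B to ${high:.2f}B")
    for surface, pct in (("GPU functions", 99.946), ("Sandboxes", 99.861)):
        print(f"{surface}: about {downtime_minutes(pct):.0f} min down in 90 days")
```

Under these assumptions, 99.946% uptime corresponds to roughly 70 minutes of downtime over 90 days and 99.861% to roughly 180 minutes, which gives a sense of scale for the "non-trivial downtime" noted in CU032.
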

| ID | Publisher | Title | Quote |
|---|---|---|---|
| SO001 | Modal Labs (official) | Modal – The Production Cloud for AI (homepage) | The production cloud for AI. Modal SDK: Your cloud environment, in code. |
| SO002 | Modal Labs (official) | Modal Blog | |
| SO003 | Modal Labs (official) | Modal's Series C: Raising $355M at a $4.65B valuation | We've raised $355 million after growing fivefold since [Series B], surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money in a round led by General Catalyst and Redpoint, with Menlo, Bain Capital Ventures, and Accel joining as new investors. |
| SO004 | | Modal company page | Company size 51-200 employees. Headquarters New York City, New York. |
| SO005 | Modal Labs (official) | Modal Documentation – Introduction and Getting Started | Modal is an AI infrastructure platform that lets you: Run low latency inference with sub-second cold starts... You get full serverless execution and pricing because we host everything and charge per second of usage. |
| SO006 | Erik Bernhardsson (personal blog) | What I have been working on: Modal | Long story short: I'm working on a super cool tool called Modal. Please check it out — it lets you run things in the cloud without having to think about infrastructure. |
| SO007 | Redpoint Ventures | Modal – Redpoint Portfolio | Redpoint first invested in Modal's Series A in 2023. Founders Erik Bernhardsson, Akshat Bubna. Location New York, NY. |
| SO008 | General Catalyst | Modal – General Catalyst Portfolio | AI infrastructure that developers love. Backed since: 2026. Our Investment in Modal: A Serverless Cloud for the AI Era. |
| SO009 | Modal Labs (official) | Modal Terms of Service (SaaS Agreement) | This Software as a Service Agreement (the "Agreement") is between the entity named below ("Customer") and Modal Labs, Inc., a Delaware corporation ("Modal"). |
| SO010 | Modal Labs (official) | Modal Customers page | "Modal powers both our reinforcement learning infrastructure and production inference. Millions of sandboxes on one end, real-time serving on the other." — Scott Wu, CEO, Cognition |
| SO011 | Modal Labs (official) | How we achieved truly serverless GPUs | Together, [cloud buffers, custom filesystem, checkpoint/restore, CUDA checkpoint/restore] take AI inference server replica scaling from multiple kiloseconds to just tens of seconds. |
| SO012 | GitHub (Modal Labs organization) | modal-labs GitHub organization | |
| SO013 | Python Package Index (PyPI) | modal – Python SDK on PyPI | This library requires Python 3.10 – 3.14. |
| SO014 | Modal Labs (official) | Modal Pricing Plans | Starter $0 + compute / month. Team $250 + [compute]. Enterprise Custom. |
| SO015 | Hacker News community | Modal Major Outage – HN discussion thread | This is the third major outage in a month. 5.7.2026 - SEV 1, AWS us1-az4 overheats 5.19.2026 - No published incident report 6.3.2026 - Ongoing, internal auth system down |
| SO016 | Modal Labs (official) | Modal Labs Status Page | GPU functions modal.Function: execute GPU functions 99.946% uptime |
| SO017 | Modal Labs (official) / Reducto (customer) | How Reducto improved enterprise-scale document processing latency by 3x | Reducto achieved massive latency reductions, including a 3x reduction in P90 latency, after migrating inference workloads for their 30+ models to Modal. |
| SO018 | Modal Labs (official) / Substack (customer) | Why Substack moved their AI and ML pipelines to Modal | "Modal lets us deploy new ML models in hours rather than weeks. We use it across spam detection, recommendations, audio transcription, and video pipelines, and it's helped us move faster with far less complexity." — Mike Cohen, Head of AI & ML Engineering |
| SO019 | Modal Labs (official) / Quora (customer) | How Quora uses Modal to run thousands of Python sandboxes simultaneously | "We offloaded this to Modal and are actively saving 2 engineers' worth of ongoing engineering time." — Hwan Seung Yeo, Director of Engineering |
| SO020 | Modal Labs (official) / Zencastr (customer) | How Zencastr transcribed hundreds of years worth of audio in just a few days | "Modal has been a really nice, scalable solution for us. We don't have to worry about pre-allocating GPUs weeks ahead of time – we just spin it up and it works." |
| SO021 | Modal Labs (official) / Applied Compute (customer) | Scaling reinforcement learning at Applied Compute | "Modal was clearly very flexible, structured in a way where we could build these complex environments, and really focused on performance and reliability." — Yash Patil, CEO, Applied Compute |
| SO022 | Modal Labs (official) | Modal LLM solutions page | |
| SO023 | Modal Labs (official) | Modal Coding Agents solutions page | "Modal was the only infrastructure provider that enabled us to reliably run tens of thousands of app creation sessions in an instant." — Anton Osika, CEO & Founder, Lovable |
| SO024 | TechCrunch | Modal Labs \| TechCrunch tag page | |
| SO025 | Hacker News community | Submissions from modal.com – Hacker News developer feed | Cutting inference cold starts by 40x with LP, FUSE, C/R, and CUDA-checkpoint — 91 points |
| SO026 | Menlo Ventures | Menlo Ventures portfolio (Modal listed as Series C investment) | |
| SO027 | Bain Capital Ventures | Bain Capital Ventures portfolio page | |
| SO028 | Modal Labs (official) | Modal jobs site | |
| SM001 | MarketsandMarkets | AI Infrastructure Market by Offerings (Compute, Memory, Network, Storage, Software), Function (Training, Inference), Deployment — Global Forecast to 2030 | The AI Infrastructure market is expected to grow from USD 135.81 billion in 2024 to USD 394.46 billion by 2030, at a compound annual growth rate (CAGR) of 19.4% during the forecast period. |
| SM002 | Technavio | AI Inference-as-a-Service Market Growth Analysis — Size and Forecast 2026–2030 | The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during the forecast period 2026-2030. North America dominated the market and accounted for a 41.1% growth during the forecast period. |
| SM003 | Mordor Intelligence | Cloud AI Market Size and Share Analysis — Growth Trends and Forecasts (2026–2031) | It is forecast to reach USD 269.02 billion, expanding at an 18.68% CAGR from 2026 to 2031. Persistent shortages of H100 and MI300X GPUs and limited HBM3 supply have stretched lead times past 12 months, constraining new training projects. |
| SM004 | MarketsandMarkets | Cloud AI Market by Cloud AI Infrastructure (Compute, Storage, Network), AI & ML Platforms (AutoML), MLOps, AIaaS, Technology — Global Forecast to 2029 | The global cloud AI market is projected to reach USD 327.15 billion by 2029 at a CAGR of 32.4% during the forecast period. |
| SM005 | MarketsandMarkets | Artificial Intelligence (AI) Market by Offering (Hardware, Software, Services), Technology (ML, NLP, Generative AI) — Global Forecast to 2033 | The Artificial intelligence (AI) market was estimated to be worth USD 601.93 billion in 2026 and is projected to reach USD 3,638.08 billion by 2033, at a CAGR of 29.3%. |
| SM006 | RunPod | GPU Cloud Pricing — Per-Second H100, A100, RTX \| RunPod | H200 $4.39/hr, B200 $5.89/hr, H100 NVL $3.19/hr, H100 PCIe $2.89/hr, H100 SXM $3.29/hr, A100 SXM $1.49/hr, L40S $0.86/hr. |
| SM007 | Replicate | Pricing — Replicate | Unlike public models, most private models run on dedicated hardware so you don't have to share a queue with anyone else. This means you pay for all the time instances of the model are online — the time they spend setting up; the time they spend idle, waiting for requests; and the time they spend active, processing your requests. |
| SM008 | Together AI | Together AI Pricing — Inference API | |
| SM009 | Amazon Web Services | Amazon Bedrock Pricing | |
| SM010 | Microsoft Azure | Pricing — Azure Machine Learning | Pay as you go — Pay for compute capacity by the second, with no long-term commitments or upfront payments. Azure savings plan for compute — Save money across select compute services globally by committing to spend a fixed hourly amount for 1 or 3 years. |
| SM011 | Google Cloud | Gemini Enterprise Agent Platform pricing (Vertex AI / Agent Platform) | Training: $3.465 / 1 hour. Deployment and online prediction: $1.375 / 1 hour (classification) or $2.002 / 1 hour (object detection). |
| SM012 | Modal Labs | GPU Acceleration — Modal Documentation | Modal supports B200, B200+ (opt-in to B300), H200, H100, H100!, A100, A100-40GB, A100-80GB, RTX-PRO-6000, L40S, L4, A10, T4. Use gpu="B200+" to allow Modal to run requests on either B200 or B300 GPUs. |
| SM013 | Modal Labs | Cold Start Performance — Modal Documentation | Modal's custom container stack has been heavily optimized to reduce this time. Containers boot in about one second. |
| SM014 | Modal Labs | Scaling and Map — Modal Documentation | Modal enforces the following limits for every function — 2,000 pending inputs (inputs that haven't been assigned to a container yet), 25,000 total inputs (which include both running and pending inputs). For inputs created with .spawn() for async jobs, Modal allows up to 1 million pending inputs. |
| SM015 | Modal Labs | Featured Examples — Modal Documentation | |
| SM016 | Modal Labs | How Suno Auto-Scales to 1000+ GPUs for Holiday Demand Peaks | "What kills you is this peak demand, right? Like you just can't afford to be buying machines for steady demand and then also have two people for six months do nothing other than building inference that can handle scaling down and up from that." — Georg Kucsko, Co-founder and CTO, Suno |
| SM017 | Modal Labs | Modal — The Production Cloud for AI | |
| SM018 | Modal Labs | Modal Pricing | |
| SM019 | Modal Labs | Modal Series C: $355M at $4.65B to build the production cloud for AI | Modal has grown fivefold since its Series B and has surpassed $300M in annualized revenue. |
| SM020 | Modal Labs | Modal Customers | |
| SM021 | Modal Labs | How we built truly serverless GPUs: Cold starts under 300ms | |
| SM022 | Modal Labs | Modal for LLM Inference and Serving | |
| SM023 | Modal Labs | Modal for Coding Agents | |
| SM024 | Modal Labs | Applied Compute — Reinforcement Learning Infrastructure on Modal | |
| SM025 | Modal Labs | Reducto Case Study — 3x P90 Latency Reduction and 1000+ GPU Scale | |
| SM026 | TechCrunch | TechCrunch coverage of Modal Labs | |
| SM027 | Stack Overflow | Stack Overflow Developer Survey 2024 — AI Tools Adoption | Most developers use ChatGPT of all the AI tools, and 74% want to keep using it next year. 41% of ChatGPT users want to use GitHub Copilot next year. |
| SP001 | Modal | Modal Pricing | |
| SP002 | Modal | Modal Solutions — Coding Agents | |
| SP003 | Modal Docs | Sandboxes — Modal Docs | |
| SP004 | Modal | Security and Privacy at Modal | |
| SP005 | Replicate | Replicate — Run AI with an API | |
| SP006 | Replicate | Pricing — Replicate | |
| SP007 | Replicate | Docs — Replicate | |
| SP008 | RunPod | The AI Developer Cloud \| Runpod | |
| SP009 | RunPod | Serverless GPU Inference \| Runpod | |
| SP010 | RunPod | GPU Instance Pricing \| Runpod | |
| SP011 | Baseten | Inference Platform — Deploy AI models in production \| Baseten | |
| SP012 | Baseten | Cloud Pricing — Baseten | |
| SP013 | Beam Cloud | On-Demand AI Compute \| Beam | |
| SP014 | Beam Cloud | Pricing \| Beam | |
| SP015 | Banana.dev | Banana — GPUs For Inference | |
| SP016 | Lambda AI | The Superintelligence Cloud \| Lambda | |
| SP017 | CoreWeave | The Essential Cloud for AI \| CoreWeave | |
| SP018 | CoreWeave | CoreWeave Cloud Pricing \| CoreWeave | |
| SP019 | AWS | Amazon SageMaker — The center for all your data, analytics, and AI | |
| SP020 | Google Cloud | Cloud Run — Build apps on a fully managed platform | |
| SP021 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | |
| SP022 | Microsoft Azure | Azure Container Apps \| Microsoft Azure | |
| SP023 | AWS | Amazon Bedrock Pricing — AWS | |
| SP024 | Sacra | Modal Labs revenue, valuation and funding | |
| SP025 | Sacra | RunPod revenue, funding and news | |
| SP026 | Together AI | Pricing \| Together AI | |
| SP027 | CNBC | AI startup Modal raises $355 million at $4.65 billion valuation | |
| SP028 | Modal | How Suno shaved 4 months off their launch timeline with Modal | |
| SI001 | Modal | Modal's Series C: Raising $355M at a $4.65B Valuation | |
| SI002 | Sacra | Modal Labs revenue, valuation and funding | |
| SI003 | Modal | Plan Pricing | |
| SI004 | Modal | Billing | |
| SI005 | Modal | Sandbox resources and pricing | |
| SI006 | Modal | Volumes | |
| SI007 | Modal | Memory Snapshots | |
| SI008 | Modal | GPU acceleration | |
| SI009 | Modal | Startups on Modal | |
| SI010 | Modal | Region selection | |
| SI011 | Modal | Modal Notebooks | |
| SI012 | Modal | Modal Legal Terms of Service | |
| SI013 | Modal | Modal Customers | |
| SI014 | Modal | Modal LLM Solutions | |
| SI015 | Modal | Coding Agents Solutions | |
| SI016 | Modal | Modal Status | |
| SI017 | General Catalyst | Modal — General Catalyst Portfolio | |
| SI018 | Redpoint Ventures | Modal — Redpoint Portfolio | |
| SI019 | Modal | Applied Compute — Reinforcement Learning Infrastructure Case Study | |
| SI020 | Modal | Modal Labs Status | |
| SI021 | Modal | Substack Case Study | |
| SI022 | Modal | Quora Case Study | |
| SI023 | Bain Capital Ventures | Bain Capital Ventures Portfolio — Modal | |
| SI024 | RunPod | GPU Cloud Pricing — Per-Second H100, A100, RTX | |
| SI025 | LinkedIn | Modal Labs LinkedIn Company Page | |
| SI026 | Hacker News | Modal Major Outage | |
| SI027 | Amazon Web Services | EC2 On-Demand Instance Pricing | |
| SI028 | Amazon Web Services | SageMaker Pricing | |
| SI029 | PitchBook | Modal Labs Company Profile — Funding Rounds and Investors | |
| SE001 | Modal | Modal Documentation — Introduction | Modal is an AI infrastructure platform that lets you: Run low latency inference with sub-second cold starts, Scale out batch jobs to run massively in parallel, Spin up thousands of isolated and secure Sandboxes to execute AI generated code. |
| SE002 | Modal | Modal Web Functions documentation | You can turn any Python function into a Web Function with a single line of code. |
| SE003 | Modal | Modal Sandboxes documentation | Modal has a direct interface for defining containers at runtime and securely running arbitrary code inside them. |
| SE004 | Modal | Modal Cold Start Performance documentation | Containers boot in about one second. |
| SE005 | Modal | Modal Memory Snapshots documentation | Modal Memory Snapshots can dramatically reduce the cold start latency of Modal Functions by skipping initialization work on most container boots. |
| SE006 | Modal | Modal GPU Acceleration documentation | Modal supports the following GPU types: T4, L4, A10, L40S, A100, A100-40GB, A100-80GB, RTX-PRO-6000, H100, H200, B200, B200+. |
| SE007 | Modal | Modal Volumes documentation | Volumes are a high-performance distributed file system for Modal applications. They are optimized for write-once, read-many I/O workloads. |
| SE008 | Modal | Modal Dicts documentation | Modal Dicts provide distributed key-value storage to your Modal Apps. |
| SE009 | Modal | Modal Queues documentation | Modal Queues provide distributed FIFO queues to your Modal Apps. |
| SE010 | Modal | Modal Security and Privacy documentation | We build our software using memory-safe programming languages, including Rust (for our worker runtime and storage infrastructure) and Python (for our API servers and Modal client). |
| SE011 | Modal | Modal Container Images documentation | Modal runs containers using the sandboxed gVisor container runtime. |
| SE012 | Modal | GPU Memory Snapshots: Supercharging Sub-second Startup — Modal Blog | We have observed Functions starting up to 10x times faster than baseline. |
| SE013 | Modal | Modal Input Concurrency documentation | Modal supports these workloads with its input concurrency feature, which allows individual containers to process multiple inputs at the same time. |
| SE014 | Modal | Modal Scheduling (Cron) documentation | Modal facilitates this through function schedules. |
| SE015 | Modal | Modal Region Selection documentation | Modal has a variety of tools to optimize network latency—even down to ~10ms in extreme cases like real-time robotics. |
| SE016 | GitHub | modal-labs/modal-client GitHub repository | The Modal Python SDK provides convenient, on-demand access to serverless cloud compute from Python scripts on your local computer. This library requires Python 3.10 – 3.14. |
| SE017 | PyPI Stats | modal Python package — PyPI Download Stats | Downloads last day: 1,624,766. Downloads last week: 13,899,772. |
| SE018 | Hacker News | Modal Major Outage — Hacker News (June 3, 2026) | This is the third major outage in a month. 5.7.2026 - SEV 1, AWS us1-az4 overheats 5.19.2026 - No published incident report 6.3.2026 - Ongoing, internal auth system down. |
| SE019 | Modal | Modal Labs Trust Center | |
| SE020 | Modal | Modal is SOC 2 Type II Compliant — Modal Blog (January 2025) | We're excited to announce that we've successfully completed our SOC 2 Type II audit. No deviations were found in our audit. |
| SE021 | Modal | Modal GPU Glossary | We wrote this glossary to solve a problem we ran into working with GPUs here at Modal. |
| SE022 | Modal | Modal Pricing Plans | Enterprise: Volume-based discounts; Higher GPU concurrency; Embedded ML engineering services; Audit logs, Okta SSO, and HIPAA. |
| SE023 | Modal | Modal Developing and Debugging documentation | Modal also lets you run interactive commands on your running Containers from the terminal — much like ssh-ing into a traditional machine or cloud VM. |
| SE024 | Modal | Scaling Reinforcement Learning at Applied Compute — Modal Blog (May 2026) | Modal was clearly very flexible, structured in a way where we could build these complex environments, and really focused on performance and reliability. |
| SE025 | Modal | Real-time inference for robots at Physical Intelligence — Modal Blog (April 2026) | Running this compute on Modal simplified operations and enabled rapid experimentation with larger models, while only adding 10-15ms of network overhead. |
| SE026 | Modal | How Reducto improved enterprise-scale document processing latency by 3x — Modal Blog (November 2025) | GPU memory snapshotting for several models. This reduced cold boots by 83%, from ~70s to ~12s. |
| SE027 | Modal | How we achieved truly serverless GPUs — Modal Engineering Blog (May 2026) | Together, they take AI inference server replica scaling from multiple kiloseconds to just tens of seconds. |
| SE028 | Modal | Modal Labs Status Page (June 14, 2026) | GPU functions: 99.946% uptime. CPU functions: 99.938% uptime. |
| SE029 | Modal | Modal Coding Agents Solution Page | Spin up 50,000+ simultaneous code execution sandboxes for production use cases. |
| SE030 | Modal | Modal Container Lifecycle Hooks documentation | @modal.enter for one-time initialization (remote); @modal.exit for one-time cleanup (remote). |
| SE031 | Modal | Modal Secrets documentation | Securely provide credentials and other sensitive information to your Modal Functions with Secrets. |
| SE032 | HostFleet | Every serverless GPU host compared — HostFleet (April 2026) | L4 24GB — Runpod $0.43/hr, Modal $0.80/hr. A100 80GB — Runpod $2.17/hr, Modal $2.10/hr, Baseten $4.00/hr. |
| SE033 | RunPod | RunPod — The AI Developer Cloud | 0 to hundreds of concurrent workers in under 250ms. |
| SE034 | Amazon Web Services | AWS Lambda Features | AWS Lambda SnapStart delivers faster startup performance by up to 10x for Java, and from several seconds to as low as sub-second for Python and .NET. |
| SE035 | Google Cloud | What is Cloud Run — Google Cloud Documentation | Cloud Run lets developers spend their time writing their code, and very little time operating, configuring, and scaling their Cloud Run service. |
| SE036 | Sacra | Modal Labs — Sacra Analyst Research (accessed June 2026) | Modal's custom Rust-based container runtime, image builder, and distributed file system enable the fast startup times that differentiate it from traditional cloud platforms. |
| SE037 | Modal | Modal Labs SaaS Agreement (Terms of Service, effective May 2026) | This Software as a Service Agreement is between the entity named below and Modal Labs, Inc., a Delaware corporation. |
| SE038 | Modal Labs LinkedIn Company Page | Modal — The production cloud for AI. | |
| SE039 | Modal | Modal Series C Announcement Blog (May 2026) | Over 1 billion sandboxes have been launched on Modal. We've spent the last five years going very deep on technology, including building our own storage and compute layer from the ground up. |
| SU001 | Modal | How Decagon shipped real-time voice AI on Modal | "Decagon Voice 2.0 now has a 65% reduction in latency along with significant gains in intent recognition and response quality." |
| SU002 | Modal | Runway Chooses Modal to Power Real-Time Inference for Runway Characters | "The iteration speeds Modal afforded allowed Runway's team to move from proof of concept to production in under 30 days." |
| SU003 | Modal | Seamless Computational Bio at Chai Discovery | "Sometimes we spin up hundreds of GPUs at a time, and the fact it's up in a few minutes without onerous configurations or dashboards is kind of a miracle." |
| SU004 | Modal | How Modal powered 250,000 Lovable app creations in a weekend | "We now trust Modal to keep up with our growth, and we're excited to build together in the long term." — Anton Osika, Founder and CEO, Lovable |
| SU005 | Modal | How Ramp built a full context background coding agent on Modal | "Within a couple of months, roughly half of all merged pull requests across Ramp's frontend and backend repos are started by Inspect." |
| SU006 | Modal | How Ramp fine-tunes models on Modal for receipt classification | "Modal was able to support this workflow: driving down receipts requiring manual intervention by 34% on infrastructure that was an estimated 79% cheaper than other major LLM providers." |
| SU007 | Modal | Introducing Claude Managed Agents with Modal Sandboxes | "Modal powers both our reinforcement learning infrastructure and production inference. Millions of sandboxes on one end, real-time serving on the other." — Scott Wu, CEO, Cognition |
| SU008 | Modal | Over 1 billion sandboxes launched on Modal | "Over 1 billion sandboxes have been launched on Modal. Teams like Lovable, Ramp, Cognition and more are using Modal Sandboxes to power everything from coding agents to RL infrastructure at scale." |
| SU009 | Modal | Modal LLM Serving Solutions | |
| SU010 | Modal | Modal Image and Video Solutions | |
| SU011 | Hacker News | Modal Major Outage | "This is the third major outage in a month. 5.7.2026 - SEV 1, AWS us1-az4 overheats 5.19.2026 - No published incident report 6.3.2026 - Ongoing, internal auth system down" |
| SU012 | Modal | Modal Customers | |
| SU013 | Modal | How Quora uses Modal to run thousands of Python sandboxes simultaneously | "We offloaded this to Modal and are actively saving 2 engineers' worth of ongoing engineering time." — Hwan Seung Yeo, Director of Engineering, Quora |
| SU014 | Modal | How Suno uses Modal to scale music generation to 1000 GPUs | |
| SU015 | Modal | Why Substack moved their AI and ML pipelines to Modal | |
| SU016 | Modal | How Reducto decreased latency 3x by moving inference to Modal | "We were fighting, tearing our hair out trying to use Ray within our Kubernetes cluster, but the tooling was just not working." — Raunak Chowdhuri, Founder, Reducto |
| SU017 | Modal | Zencastr uses Modal for podcast AI and scales to 1500 GPUs | |
| SU018 | Modal | Real-time inference for robots at Physical Intelligence | |
| SU019 | Modal | Scaling reinforcement learning at Applied Compute | "Modal was clearly very flexible, structured in a way where we could build these complex environments, and really focused on performance and reliability." — Yash Patil, CEO, Applied Compute |
| SU020 | Modal | Modal's Series C: Raising $355M at a $4.65B valuation | "Sandboxes already drive more than a third of our revenue, and customers keep pushing us for more." |
| SU021 | Sacra | Modal Labs — Sacra Company Profile 2026 | |
| SU022 | Modal | Modal Status Page | |
| SU023 | Modal | Modal for Startups Program | |
| SU024 | Decagon | Decagon Voice 2.0 — Product Launch Page | |
| SU025 | Cognition | Cognition — Devin AI Software Engineer | "Devin is deployed at some of the largest and most complex institutions in the world." |
| SU026 | Runway | Runway — Runway Characters and GWM-1 World Model | "Thousands of organizations are already using Characters, including Fortune 10 technology companies, major Hollywood studios, global advertising agencies and gaming companies." |
| SU027 | Suno | Suno AI Music Generator | "Featured in Rolling Stone, Billboard, Wired, and Variety, Suno is used by everyone from first-time creators to top producers and songwriters. We're a top 10 music app on iOS and Android." |
| SU028 | Reducto | Reducto — Enterprise Document Intelligence | |
| SU029 | Lovable | Lovable — Build software with AI, together | |
| SR001 | European Parliament and Council of the European Union | Regulation (EU) 2024/1689 — Artificial Intelligence Act | |
| SR002 | European Commission — Digital Strategy | EU AI Act — Regulatory framework and application timeline | |
| SR003 | Sacra | CoreWeave — Sacra Company Profile | |
| SR004 | NVIDIA Corporation | NVIDIA H100 Tensor Core GPU — Data Center | |
| SR005 | Amazon Web Services | Shared Responsibility Model — Amazon Web Services | |
| SR006 | GitHub / modal-labs | modal-labs/modal-client — GitHub Issues | |
| SR007 | Sacra | Fireworks AI — Sacra Company Profile | |
| SR008 | National Institute of Standards and Technology (NIST) | AI Risk Management Framework (AI RMF) — NIST AI Resource Center | |
| SR009 | Federal Trade Commission | Generative AI Raises Competition Concerns — FTC Tech at FTC Blog | |
| SR010 | Modal Labs | Modal Status — Service uptime and incident history | GPU functions 99.946% uptime; CPU functions 99.938% uptime; Snapshot restores 99.782% uptime over 90 days ending June 14, 2026. |
| SR011 | Hacker News (user hunkins) | Modal Major Outage — Hacker News | This is the third major outage in a month. 5.7.2026 — SEV 1, AWS us1-az4 overheats. 5.19.2026 — No published incident report. 6.3.2026 — Ongoing, internal auth system down. |
| SR012 | Modal Labs | Modal Terms of Service (including Data Processing Agreement and TOMs) | Customer data is backed up at least at a daily cadence. Restoration tests are performed annually. |
| SR013 | Modal Labs | Security and Privacy at Modal | At the moment, Volumes v1, Images (excluding Filesystem and Directory Snapshots), Memory Snapshots, and user code are out of scope of the commitments within our BAA. |
| SR014 | Modal Labs | Modal Labs Trust Center | |
| SR015 | Modal Labs | Modal achieves SOC 2 Type II certification with no deviations found | SOC 2 Type II audit completed January 2025 with no deviations found. |
| SR016 | Modal Labs | Truly Serverless GPUs: Sub-Second Cold Starts | GPU Memory Snapshots: generally incompatible with multi-GPU code and non-CUDA GPU work, and do not speed up weight loading from storage. |
| SR017 | Modal Labs | Modal announces $355M Series C at $4.65B valuation | Sandboxes now make up over a third of our revenue. We have surpassed $300M in annualized revenue and grown fivefold since the Series B. |
| SR018 | Sacra | Modal Labs — Sacra Company Profile | |
| SR019 | Sacra | Modal Labs — Sacra 2026 Analysis | |
| SR020 | Sacra | Modal Labs — Sacra Research Report | |
| SR021 | TechCrunch | Modal Labs — TechCrunch coverage | |
| SR022 | CNBC | Modal raises $355 million at $4.65 billion valuation — CNBC | |
| SR023 | HostFleet | Serverless GPU Pricing Matrix 2026 — HostFleet | Modal at $0.80/hr for L4 and $2.10/hr for A100-80GB; Baseten at $4.00/hr for A100-80GB. |
| SR024 | Modal Labs | Modal Pricing | Starter: $0/month, $30 in credits; Team: $250/month; Enterprise: custom pricing with HIPAA compliance and Okta SSO. |
| SR025 | Modal Labs | GPU Memory Snapshots — Alpha Release Blog Post | |
| SR026 | Redpoint Ventures | Modal — Redpoint Ventures Portfolio Page | |
| SR027 | General Catalyst | Modal — General Catalyst Portfolio Page | |
| SR028 | RunPod | RunPod GPU Cloud Pricing | |
| SR029 | Replicate | Replicate Pricing | |
| SR030 | PitchBook | Modal Labs — PitchBook Company Profile | |
| SV001 | Modal Labs | Modal's Series C: Raising $355M at a $4.65B valuation | We've raised $355 million after growing fivefold since September, surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money in a round led by General Catalyst and Redpoint. |
| SV002 | General Catalyst | Modal \| General Catalyst Portfolio | AI infrastructure that developers love. Investors: Quentin Clark, Max Rimpel, Katie Keller |
| SV003 | CNBC | Modal raises $355 million Series C at $4.65 billion valuation | |
| SV004 | TechCrunch | Modal Labs — TechCrunch coverage | |
| SV005 | Sacra | Modal Labs revenue, valuation & funding | Sacra estimates that Modal Labs hit $300M in annualized revenue in April 2026, up from ~$119M at the end of 2025. |
| SV006 | Sacra | Modal Labs revenue, valuation & funding (2026 query) | Modal Labs closed an $87 million Series B in September 2025 led by Lux Capital, valuing the company at $1.1 billion post-money. As of May 2026, Modal is in talks to raise $150–$250M at a $4.5B valuation. |
| SV007 | Axios | Modal raises $110M Series B to build the production cloud for AI | |
| SV008 | Redpoint Ventures | Modal — Redpoint Ventures Portfolio | Redpoint first invested in Modal's Series A in 2023. |
| SV009 | General Catalyst | Modal — General Catalyst Portfolio (individual company page) | A Serverless Cloud for the AI Era. Backed since: 2026. |
| SV010 | Sacra | Fireworks AI revenue, valuation & funding | As of May 2026, Fireworks AI is in talks to raise a new funding round at a $15 billion post-money valuation, with Index Ventures set to co-lead. |
| SV011 | Sacra | Together AI revenue, valuation & funding | Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026. |
| SV012 | Sacra | Groq revenue, valuation & funding | On December 24, 2025, Groq entered a non-exclusive licensing agreement with Nvidia Corp. for its inference technology, structured to deliver $17 billion in cash payments across three installments by the end of 2026. |
| SV013 | Sacra | CoreWeave revenue, valuation & funding | CoreWeave went public on March 28, 2025, trading on Nasdaq under the ticker CRWV. Prior to the IPO, CoreWeave was valued at $23 billion. |
| SV014 | CoreWeave, Inc. | CoreWeave, Inc. Annual Report on Form 10-K for fiscal year ended December 31, 2025 | Annual report [Section 13 and 15(d), not S-K Item 405] for the fiscal year ended December 31, 2025. |
| SV015 | U.S. Securities and Exchange Commission | EDGAR Filing Documents for CoreWeave 10-K — Acc-no 0001769628-26-000104 | |
| SV016 | Sacra | RunPod revenue, valuation & funding | The company maintains gross margins in the mid-60s to high-70s percent range, similar to other data-heavy SaaS platforms. |
| SV017 | Bain Capital Ventures | Bain Capital Ventures Portfolio — Modal | |
| SV018 | Menlo Ventures | Menlo Ventures Portfolio | |
| SV019 | Tracxn | Modal Technologies — Tracxn company profile | |
| SV020 | Hacker News | Modal Major Outage — community report of three incidents in May–June 2026 | This is the third major outage in a month. 5.7.2026 - SEV 1, AWS us1-az4 overheats; 5.19.2026 - No published incident report; 6.3.2026 - Ongoing, internal auth system down. |
| SV021 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) | If you want to run an LLM, a diffusion model, or any custom inference workload and not own the GPU, you are picking between five real options in 2026: Runpod, Modal, Fal.ai, Baseten, and Replicate. |
| SV022 | Modal Labs | Modal pricing page | |
| SV023 | PitchBook | Modal Labs — PitchBook company profile | |
| SV024 | Sacra | Modal Labs research report | |
| SV025 | Modal Labs | Modal's Series C blog — announcing Series C milestones and growth | Sandboxes are one of the most important building blocks for Reinforcement Learning. |
| SV026 | Modal Labs | Modal customer showcase | |
| SV027 | MarketsandMarkets | AI Infrastructure Market — size, share, global forecast to 2030 | |
| SV028 | Technavio | AI Inference as a Service Market Industry Analysis | |
| SV029 | Mordor Intelligence | Cloud AI Market — size and share analysis | |
| SV030 | Modal Labs | Modal status page — 90-day uptime | |
| SV031 | Modal Labs | Truly serverless GPUs — Modal engineering blog on cold-start technology | |
| SV032 | Together AI | Together AI pricing page |