Baseten
Premium inference infrastructure for production AI workloads
Baseten is a high-quality AI inference infrastructure company with real enterprise traction and strong category positioning, but public financial disclosure is too thin to justify treating momentum pricing as a high-conviction buy.
Cover facts
Company profile
Baseten is a San Francisco-based inference infrastructure company founded in 2019 by Tuhin Srivastava, Amir Haghighat, Phil Howes, and Pankaj Gupta. The company positions itself as the software layer for running production AI workloads across its own cloud or a customer's environment, combining model APIs, dedicated inference, training, and enterprise deployment controls. Public customer proof spans AI-native products and regulated workloads including Cursor, Clay, OpenEvidence, Abridge, Gamma, Patreon, Speechify, Writer, Hebbia, Wispr Flow, and others. Baseten raised $300M at a $5B valuation in January 2026, bringing disclosed funding to about $585M and reinforcing its status as a late-stage private infrastructure company.
- Website
- www.baseten.co
- Founders
- Tuhin Srivastava, Amir Haghighat, Phil Howes, Pankaj Gupta
- Founding location
- San Francisco, CA, USA
- Headquarters
- San Francisco, CA, USA
- Product
- Baseten sells a production inference platform for custom models, model APIs, dedicated inference, training workflows, and compound-AI orchestration. Its Truss framework is the developer entry point, while enterprise features emphasize multi-cloud deployment, self-hosting, regional controls, compliance scope, and performance tuning for latency-sensitive workloads.
- Customers
- Enterprise AI teams, AI-native application builders, and regulated workloads that need low-latency, high-reliability model serving with security, hybrid deployment, or performance-engineering support. Public customer evidence is strongest in healthcare AI, developer tooling, GTM automation, voice, and productivity software.
- Business model
- Usage-based monetization: token-priced model APIs, per-minute GPU and CPU compute for deployments, and negotiated Pro or Enterprise contracts for dedicated capacity, support, and self-hosted or data- residency-sensitive environments. The platform also appears to expand via training, chains, and higher-touch enterprise engineering support.
- Stage
- Late-stage private (Series E)
- Funding status
- Public funding history includes a little over $20M across seed and Series A, a $40M Series B in 2024, a $75M Series C in February 2025, a $150M Series D in September 2025, and a $300M Series E at a $5B valuation in January 2026. Business Wire says total disclosed funding is about $585M.
Executive summary
Top strengths
- Premium positioning in a fast-growing part of the AI stack: production inference for open, custom, and hybrid workloads
- Strong recent fundraising and investor roster, with IVP, CapitalG, NVIDIA, Greylock, Spark, and others backing the company
- Clear developer and enterprise wedge via Truss, dedicated inference, self-hosting, and multi-cloud deployment controls
- Public customer proof across healthcare, coding, voice, GTM, and productivity workloads shows relevance beyond one narrow vertical
- Customer case studies repeatedly cite meaningful latency, throughput, and cost improvements versus prior inference setups
Top risks
- Public revenue, margin, burn, concentration, and retention data remain undisclosed, forcing investors to rely on estimated rather than filed financials
- Premium price positioning can be pressured by cheaper GPU clouds and by hyperscalers bundling inference with broader cloud relationships
- Reliability and compliance underwriting still depends on negotiated terms, BAAs, and custom SLAs rather than website messaging alone
- The jump from a $5B closed valuation to mooted $11B follow-on pricing is hard to defend without much better disclosure
- Baseten appears to run a support-heavy, performance-engineering-intensive model that could be harder to scale cleanly than pure software narratives imply
Open gaps
- Audited revenue, gross margin, burn, runway, and customer-concentration data are not public
- No public evidence resolves retention metrics such as NRR, churn, or contract duration
- Public governance visibility is limited; the full current board, committees, and founder ownership are not disclosed in the fetched corpus
- Healthcare and regulated-use underwriting still needs exact BAA, DPA, and shared-responsibility terms for production accounts
- The economics and terms behind any mooted post-Series-E financing are not publicly available
Contents
01Company Overview
1.1 Identity, History, and Leadership
Baseten is easiest to understand as a founder-led inference infrastructure company rather than a generic MLOps toolkit. Its own history traces the origin back to late 2019, when the four founders started the company to solve model-deployment pain they had experienced firsthand. Current legal pages anchor the business as Baseten Labs, Inc. in San Francisco, while the homepage, enterprise, and pricing surfaces consistently frame the product around high-performance inference, managed APIs, and training or deployment workflows rather than broad horizontal software. That identity matters for the rest of the report: Baseten is positioning itself as the software layer that runs production AI workloads across its own cloud or a customer’s environment, with compliance and data-control features aimed at more sensitive workloads. Leadership continuity is also unusually visible for a late-stage private company. Tuhin Srivastava is publicly surfaced as CEO and co-founder, Amir Haghighat as CTO and co-founder, and author pages still identify Pankaj Gupta and Phil Howes as co-founders. The Series E post is signed by all four founders, reinforcing that the company still presents a founder-centric leadership story. The caveat is governance transparency: the public corpus clearly shows one board addition in 2025, but it does not provide a full current board roster, committee structure, or founder ownership map.[CO001, CO002, CO003, CO004, CO005, CO006]
| Metric | Value / status | As of | Confidence | Note / gap |
|---|---|---|---|---|
| Founded | 2019 | 2019-01-01 | High | Exact public day is not surfaced in the fetched corpus, so the year is the reliable anchor. |
| Headquarters | San Francisco, California | 2026-05-30 | High | Privacy policy gives a specific San Francisco address. |
| Legal entity | Baseten Labs, Inc. | 2026-05-30 | High | Terms and privacy policy use the same legal entity name. |
| Current stage | Private, Series E | 2026-05-30 | High | Supported by Tracxn and the archived PitchBook profile after the January 2026 round. |
| Latest financing | $300M Series E at $5B valuation | 2026-01-20 | High | Led by IVP and CapitalG with NVIDIA and prior investors also participating. |
| Lifetime capital raised | $585M | 2026-01-23 | High | BusinessWire and market-data sources align on cumulative funding. |
| Business model | Usage-based API tokens plus per-minute compute | 2026-05-30 | Medium | Public pricing is clear; enterprise contract discounts and minimums are not. |
| Deployment model | Baseten Cloud, self-hosted, and region-aware enterprise options | 2026-05-30 | High | Official enterprise and healthcare pages emphasize self-hosting and data-control features. |
| Named customer set | Abridge, Cursor, Clay, OpenEvidence, Notion, Speechify, Gamma | 2026-05-30 | High | Named across careers, customer hub, customer stories, and Series E press materials. |
| Public headcount | 2026-05-30 | Low | PitchBook and Tracxn conflict, so current employee count should be treated as unresolved. |
Snapshot rows mix stable identity facts with current operating and financing markers; null means the fetched public corpus is not reliable enough to support a single number.
[CO001, CO002, CO003, CO004, CO006, CO007]| Person | Current role / public title | Public evidence of fit or coverage | Visibility today | Diligence implication |
|---|---|---|---|---|
| Tuhin Srivastava | CEO, co-founder | Public spokesperson on financing and company thesis; author page and financing coverage identify him as the chief executive. | High | Founder-led narrative remains a strength but also creates CEO key-person dependence. |
| Amir Haghighat | CTO, co-founder | Author page identifies the technical leader; Series E signoff keeps him in the visible founder set. | High | Technical and product credibility remain tied to a founding executive. |
| Phil Howes | Co-founder; chief scientist in independent coverage | Official author page plus Tech Funding News show ongoing founder visibility tied to model performance and research. | Medium | Science leadership appears founder-rooted even if the exact org chart is private. |
| Pankaj Gupta | Co-founder | Official author page and Series E signoff confirm continuity, but the fetched corpus does not surface a current operating title. | Medium | Functional coverage is less transparent than for the CEO and CTO. |
| Jay Simons | Board member since Series D | Series D explicitly says he joined the board as part of the BOND-led financing. | Low | Governance visibility improved in 2025, but the full board and committee map is still incomplete. |
Rows cover the founders and the one board addition that is explicit in the fetched corpus; the public materials reviewed do not expose a full executive roster or detailed committee structure.
[CO010, CO011, CO012, CO013, CO014, CO015]Baseten’s current story links founder continuity, deployment control, customer-proofed performance, and repeated capital access into one inference-platform thesis.
[CO004, CO005, CO006, CO008, CO010, CO011]1.2 Funding, Stage, and Investor Base
Baseten’s capital history is now the clearest external signal that the company has graduated into late-stage AI infrastructure. The fetched corpus supports a progression from a modestly funded early company to a Series E business: the Series A post says Baseten had raised a little over $20 million across seed and Series A; the Series B announcement adds $40 million; the Series C announcement adds $75 million; Series D adds $150 million; and Series E adds another $300 million at a $5 billion valuation. Independent market-data sources corroborate the timing of those rounds and place the September 2025 Series D valuation at about $2.15 billion, showing just how quickly the company repriced upward before the January 2026 round. The investor roster has also deepened rather than churned. Greylock and South Park Commons appear early, IVP and Spark become visible at growth stage, and later rounds add BOND, CapitalG, Conviction, 01A, BoxGroup, and NVIDIA. That pattern matters because it suggests both repeated insider support and a widening set of AI-specialist and platform investors. At the same time, public disclosure still stops well short of a full cap-table view: the fetched corpus does not expose current ownership percentages, investor control rights, liquidation preferences, or a fully reliable board-observer map.[CO016, CO017, CO018, CO019, CO020, CO021]
| Investor / stakeholder | First explicit round in corpus | Current relevance | Why it matters | Diligence ask |
|---|---|---|---|---|
| Greylock | Series A | Earliest clearly named institutional backer in the fetched official corpus | Anchors early company formation and remained visible through later growth-round narratives. | Confirm current ownership and pro-rata participation after Series E. |
| South Park Commons | Series A | Early network backer that still appears in later company histories | Signals founder-network support rather than pure late-stage capital. | Confirm whether SPC still holds a meaningful stake after multiple step-up rounds. |
| IVP | Series B | Growth-stage repeat lead that also led or anchored later rounds | Appears repeatedly across B, C, and E, making it one of the clearest long-duration financial sponsors. | Confirm board rights, reserve behavior, and concentration at Series E. |
| Spark Capital | Series B | Early growth investor visible across the 2024–2025 rounds | Helps show continuity from the generative-AI scaling phase into later financing momentum. | Confirm current stake and whether Spark still participates after Series E. |
| 01A | Series C | Later-stage investor that remains named in subsequent financing data | Links Baseten to operator-investor sponsorship from Adam Bain and Dick Costolo’s network. | Confirm whether 01A has governance rights or only economic exposure. |
| BOND | Series D | Series D lead and later participant in Series E | Important marker for the 2025 valuation step-up and board evolution. | Confirm whether BOND added special terms at the D-to-E transition. |
| CapitalG | Series D | Joined in D and co-led E | Potentially valuable strategic network around Google ecosystem distribution and infrastructure credibility. | Clarify commercial partnerships or channel overlap beyond pure equity sponsorship. |
| NVIDIA | Series E | Strategic investor in the latest round | Could matter for hardware access, performance collaboration, and signaling inside AI infrastructure. | Confirm whether the relationship includes commercial commitments or preferred hardware access. |
| Conviction | Series C | Visible across C, D, and E-era disclosure | Adds AI-specialist sponsorship and public advocacy for the inference-layer thesis. | Confirm present ownership and board or observer rights. |
| BoxGroup | Series D | Still named in later-round investor rosters | Shows continued support from earlier network investors even as the cap table deepens. | Confirm position size versus symbolic participation in later rounds. |
This map enumerates investors explicitly named across verified public financing sources from Series A through Series E; it is not a full cap table and does not expose ownership percentages or liquidation preferences.
[CO016, CO017, CO018, CO019, CO020, CO022]1.3 Product Scope, Scale Proof, and Milestones
The product and scale story is strong enough to explain why Baseten could raise three times in roughly a year, but it still needs to be read with some caution. Official materials now tie the company to a broad inference platform narrative: cloud and self-hosted deployment options, enterprise controls, model APIs, and pay-as-you-go compute pricing. The strongest external proof comes from customer and partner evidence. NVIDIA’s case study says Baseten cut cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. Customer case studies claim that OpenEvidence runs billions of requests per week on Baseten, Gamma serves roughly 3 million images per day for 70+ million users, Speechify cut cost per million characters by 44%, and Patreon cut GPU cost by 70%. Those numbers support the idea that Baseten is serving meaningful production workloads across healthcare, productivity, creator, and GTM software. Still, chapter-one diligence should preserve three caution flags. Public headcount is inconsistent across data vendors, governance detail remains thin outside financing announcements, and independent reachability monitoring does not provide enough incident detail to fully evaluate reliability history. So the right takeaway is not that Baseten lacks scale; it is that the business looks operationally significant while remaining unusually private relative to its valuation.[CO005, CO006, CO007, CO008, CO021, CO028]
| Date | Event | Type | Amount / valuation / status | Participants | Implication |
|---|---|---|---|---|---|
| 2019-01-01 | Founders start Baseten to solve ML deployment pain | founding | Company formed | Tuhin Srivastava; Amir Haghighat; Phil Howes; Pankaj Gupta | Establishes the origin point for the current inference-first narrative. |
| 2021-05-01 | Early product is quietly announced after roughly 18 months of building and public beta is highlighted in the Series A post | product | Public beta era | Baseten founders | Shows that the company moved from internal build mode into market testing well before the later capital sprint. |
| 2022-04-26 | Series A milestone formalizes early investor backing | financing | > $20M cumulative seed + Series A | Greylock; South Park Commons; Lachy Groom; Ray Tonsing; angel investors | Validates early demand for the original model-deployment product vision. |
| 2024-03-04 | Series B adds growth-stage capital | financing | $40M | IVP; Spark; Greylock; South Park Commons; Lachy Groom; Base Case | Moves Baseten from early MLOps roots into broader generative-AI infrastructure expansion. |
| 2025-02-19 | Series C pairs funding with public scale claims | scale | $75M; workloads across thousands of GPUs; millions of end customers | IVP; Spark; Greylock; Conviction; South Park Commons; Basecase; 01A-linked investors | Demonstrates that infrastructure scale became central to the story before the late-stage jump. |
| 2025-09-05 | Series D raises new growth capital and adds a board member | governance | $150M at about $2.15B valuation | BOND; CapitalG; Conviction; Jay Simons | Capital formation and governance maturation start to move together. |
| 2026-01-14 | WorkOS interview highlights a startup program and voice as an emerging modality | product | New GTM program and voice focus | Philip Kiely; WorkOS interviewers | Suggests Baseten is broadening market coverage and prioritizing voice workloads in the next phase. |
| 2026-01-20 | Series E establishes Baseten as a late-stage inference platform company | financing | $300M at $5B valuation; third fundraise in prior year | IVP; CapitalG; NVIDIA; 01A; Altimeter; Battery Ventures; BOND; BoxGroup; Blackbird Ventures; Conviction; Greylock | Confirms investor appetite for independent inference infrastructure at significant scale. |
Year-only dates use January 1 and month-only dates use the first day of the cited month when the fetched public source supports the timing but not a precise day.
[CO001, CO009, CO016, CO017, CO018, CO019]The chronology shows Baseten moving from a 2019 founding into a compressed A-to-E financing sequence and a broader inference-platform narrative by early 2026.
Year-only or month-only milestones use January 1 or the first day of the cited month when the fetched source does not provide a precise public day.
[CO001, CO016, CO017, CO018, CO019, CO020]These KPIs are not internal financial statements; they are the clearest public scale and customer-outcome markers visible in the fetched corpus.
Customer metrics come from individual case studies and should be read as proof points rather than a consolidated operating dashboard for Baseten itself.
[CO025, CO027, CO031, CO032, CO033, CO034]1.4 Exhibits
02Market Analysis
2.1 Market Boundary, Included Spend, and Substitutes
Baseten is best understood as a production inference platform, not a general-purpose cloud or a model lab. The included spend is the layer required to package, deploy, run, monitor, meter, and secure AI workloads after a team already has a model or a model endpoint in mind: model APIs, dedicated deployments, compound-AI orchestration, observability, billing, and the support needed to keep latency and uptime inside production targets. The market boundary is narrower than the full enterprise-AI stack because Baseten does not market data lakes, BI, generic application development, or broad agent productivity suites as the center of its value proposition. It is also narrower than frontier-model R&D because Baseten helps teams operationalize models rather than invent them. The closest substitutes are hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. Baseten's adjacencies sit one step above and below the core deployment layer: training that promotes directly into inference, and model-lab monetization through white-labeled APIs.[CM001, CM002, CM003, CM004, CM005, CM006]
| Segment / Category | Included Spend | Excluded Spend | Buyer / Payer | Relevance to Baseten |
|---|---|---|---|---|
| Production inference platform | Model-serving runtime, autoscaling, observability, billing, support, security controls | Foundation-model R&D, generic data warehousing, generic app-dev tools | AI product lead or platform team; product or IT budget pays | Core market where Baseten is explicitly positioned |
| Model APIs | Usage-based inference endpoints, token metering, OpenAI-compatible access | Closed-model ownership or application-layer SaaS feature spend | Application engineer or product team; engineering budget pays initially | Low-friction entry point and evaluation wedge |
| Dedicated / self-hosted inference | Dedicated GPU capacity, self-hosting, data residency, enterprise support | General colocation, generic Kubernetes services, unmanaged GPU reservations | Head of AI platform, CIO/CTO, security or procurement stakeholders | Enterprise expansion path for sensitive or scaled workloads |
| Compound AI orchestration | Multi-model chains, hardware-aware orchestration, workflow optimization | Generic workflow automation or iPaaS tools | ML engineer or application engineering lead; product/platform budget pays | Important expansion layer for multimodal and agentic workloads |
| Model-lab API monetization | White-labeled API endpoints, rate limits, API keys, billing and metering | Consumer billing software or payment processors | Model lab or frontier-model product team; R&D/platform budget pays | Distinct segment exposed by Frontier Gateway |
| Training-to-inference loop | Managed training jobs and checkpoint promotion into inference | Frontier research spend and pure experimentation without deployment intent | ML research lead or platform team; R&D budget pays | Adjacency that deepens platform stickiness but is not the primary market lens |
Boundary rows mix Baseten's core market with immediately adjacent spend layers. The point is to define what belongs in the commercial wedge before applying public market-size estimates.
[CM001, CM002, CM003, CM004, CM006, CM008]2.2 Multiple Sizing Lenses and Why They Do Not Collapse into One TAM
Public sizing evidence is directionally strong but category boundaries remain messy. Technavio's narrower AI inference-as-a-service category is already a large market at USD 85.25 billion in 2025 and is forecast to grow 22.1% annually through 2030. Fortune Business Insights uses a broader AI inference lens and puts the category at USD 117.80 billion in 2026, while Mordor Intelligence sizes the adjacent enterprise AI market at USD 114.87 billion in 2026 and shows that software or platform layers and cloud deployment dominate spending. Those numbers should not be added together, because they partially overlap and use different definitions, but they do triangulate the same conclusion: Baseten is not pursuing a niche budget line. The useful valuation discipline is to keep moving down the stack from broad AI platform spend to the narrower production-inference wedge where Baseten actually competes. That narrower wedge is cloud-first, platform-heavy, North-America-concentrated, and already large enough that Baseten does not need heroic market-share assumptions to justify a meaningful opportunity. What public data does not provide is a clean Baseten-specific SAM or SOM.[CM009, CM010, CM011, CM012, CM013, CM014]
| Lens | Publisher | Base year / horizon | Geography | Metric | Value | Limitation |
|---|---|---|---|---|---|---|
| AI inference-as-a-service market size | Technavio | 2025 base, 2026-2030 forecast | Global | Market size | USD 85.25B | Narrower service-based inference category; summary page only |
| AI inference-as-a-service growth | Technavio | 2026-2030 | Global | CAGR | 22.1% | Forecast, not current revenue; category scope differs from broader inference reports |
| Broader AI inference market size | Fortune Business Insights | 2026 | Global | Market size | USD 117.80B | Broader execution market spanning cloud, edge, and on-prem |
| Broader AI inference growth | Fortune Business Insights | 2026-2034 | Global | CAGR | 12.98% | Longer forecast horizon than Technavio |
| Adjacent enterprise AI market size | Mordor Intelligence | 2026 | Global | Market size | USD 114.87B | Much broader than inference alone |
| Platform-heavy slice inside enterprise AI | Mordor Intelligence | 2025 | Global | Software/platform share | 65.89% | Share of broader enterprise AI market, not Baseten-specific SAM |
| Cloud-first deployment lens | Mordor Intelligence | 2025 | Global | Cloud deployment share | 67.33% | Share of enterprise AI revenue, not inference-only spend |
| Large-enterprise buyer concentration | Mordor Intelligence | 2025 | Global | Large-enterprise share | 71.43% | Useful for buyer concentration, not direct TAM |
| Regulated-healthcare growth wedge | Mordor Intelligence | 2026-2031 | Global | Healthcare CAGR | 20.77% | Vertical growth lens rather than whole-market size |
These rows are sizing lenses, not additive market totals. Public sources use overlapping definitions, so the safest use is triangulation rather than arithmetic aggregation.
[CM009, CM010, CM012, CM013, CM014, CM015]A narrowing lens from broad enterprise AI spend toward Baseten's more defensible beachhead in performance-sensitive, compliant production inference.
This pyramid is a narrowing logic chain, not an additive model. The middle layers mix market share and market size because public sources do not publish one clean Baseten-specific hierarchy.
[CM005, CM009, CM013, CM014, CM015, CM020]Published hourly GPU-rate spread across specialist providers illustrates how visible raw infrastructure pricing has become in this market.
Ranges come from HostFleet's April 2026 matrix of vendor-published prices. They are not performance-normalized benchmark results and do not include negotiated enterprise discounts.
[CM031, CM032, CM043, CM044, CM047]2.3 Buyer, User, and Payer Segmentation
The public buyer evidence points to three especially relevant segments. First are AI-native product teams such as Gamma, where product or engineering leaders care about launch speed, low latency, and lower-cost open-model serving without building a dedicated ML-infrastructure team. Second are enterprise AI platform teams and model builders such as Writer, where the user is the ML engineer or data scientist but the deployment decision widens to include platform, security, and procurement once workloads become dedicated, multi-GPU, or compliance-sensitive. Third are regulated vertical deployments such as OpenEvidence in healthcare, where reliability, data handling, and the ability to scale without signing large GPU commitments become explicit selection criteria. Baseten's packaging supports these segments differently: usage-based plans and model APIs lower the barrier for experimentation, while Enterprise, self-hosting, SSO or SCIM, compliance policies, and billing APIs are signals that larger buyers expect governance, attribution, and controlled rollout. The budget owner is therefore partly observed and partly inferred: it likely starts in product or engineering budgets and migrates toward central platform or IT budgets as the deployment becomes more strategic.[CM016, CM017, CM021, CM022, CM023, CM024]
| Segment | Buyer | User | Payer | Workflow | Budget owner | Adoption trigger |
|---|---|---|---|---|---|---|
| PLG AI application team | VP Engineering or product lead | Application engineer and ML engineer | Product or engineering budget | Prototype with Model APIs then move to dedicated capacity | Product engineering | Launch-day latency, reliability, and lower open-model cost |
| AI-native startup platform team | CTO or Head of AI | ML engineer and data scientist | Infrastructure or platform budget | Replace closed-model dependence with managed open-model serving | Engineering / platform | Need performance without hiring an infra-specialist team |
| Large-enterprise AI platform team | Head of AI platform, CIO, or CTO | Platform engineer, ML engineer, data scientist | Central platform or IT budget | Deploy compliant production inference across business units | Platform / IT | Dedicated capacity, SSO/SCIM, compliance policy, cloud commitments |
| Regulated healthcare AI workload | VP Engineering, CTO, or clinical-product leader | ML engineer or application engineer | Platform plus security/compliance budget | Medical search, transcription, or patient-facing assistant deployment | Platform plus security | HIPAA, uptime, and data-control requirements |
| Model lab or proprietary-model vendor | Research-product leader or commercialization lead | Inference engineer and research engineer | R&D or platform budget | White-labeled API monetization through Frontier Gateway | R&D / platform | Need to sell inference without building a customer-facing control plane |
| Compound AI / multimodal team | Head of AI application or staff engineer | Full-stack engineer plus ML engineer | Product plus platform budget | Chains-based orchestration across multiple models and machines | Product / platform | Latency and GPU waste from monolithic deployments |
Buyer and budget-owner fields combine directly stated product packaging with cautious inference from customer stories. Public evidence is stronger on user and trigger than on exact signature authority.
[CM016, CM021, CM022, CM023, CM024, CM025]Qualitative fit heatmap showing where Baseten's compliance, cloud, and premium-support proposition appears strongest by segment.
Ratings synthesize public evidence on cloud deployment share, healthcare growth, compliance requirements, and visible pricing pressure; they are not measured win-rate data.
[CM015, CM017, CM029, CM043, CM046, CM047]2.4 Deployment Value Chain and Adoption Path
Baseten's value chain begins with either an open-source model, a custom model, or a proprietary model that already exists and needs to be turned into a dependable production service. From there the daily user is usually a model, platform, or application engineer who packages the workload and evaluates latency, throughput, and cost. The next gate is organizational rather than technical: security, compliance, and procurement checks intensify when the workload needs dedicated capacity, data residency controls, or identity integration. Baseten then sits in the orchestration layer that decides whether the workload runs through Model APIs, Dedicated Inference, Chains, Frontier Gateway, or a self-hosted or hybrid deployment. Under that layer sits the actual cloud and GPU substrate, which remains economically critical because capacity, price, and regional availability directly determine margin and reliability. Baseten's customer stories suggest that the company tries to move value capture upstream from raw GPU rental into performance engineering, deployment tooling, and operational support, because those are the layers customers cite when they explain why they did not keep building in-house.[CM021, CM025, CM027, CM028, CM029, CM030]
Baseten sits between model creation and end-user traffic, trying to capture value above raw GPU supply by owning deployment, controls, and performance operations.
This value chain is synthesized from product packaging, customer stories, and competitor documentation; it is a market-structure diagram, not an internal process map from Baseten.
[CM021, CM025, CM027, CM028, CM029, CM030]2.5 Growth Drivers, Adoption Constraints, and Market Discipline
The strongest growth drivers are clear in both vendor and analyst material. Open-source models are improving fast enough that product teams increasingly want infrastructure optimized for those models rather than closed-model dependence; real-time and compound-AI workloads make latency and throughput economically visible; and enterprise buyers are moving from pilots into production, especially where regulated data, uptime, or model-performance tuning matter. Baseten's own case studies back that thesis with concrete claims on latency, tokens per second, image throughput, and reduced maintenance burden. But this is not an easy market. Hardware supply constraints, tariffs, and high accelerator prices remain structural headwinds. Talent shortages and legacy-system integration complexity slow rollout for enterprise buyers. Public competitive pricing also shows that raw GPU-hour economics are unforgiving: cheaper specialist clouds and broader hyperscaler suites both put pressure on a standalone inference vendor. That is why the right market view for Baseten is not all AI infrastructure; it is the subset of inference workloads where premium support, compliance, and performance matter enough to offset higher headline price points. Public data supports that wedge, but not yet a precise economic moat or a cleanly measurable serviceable market.[CM031, CM032, CM033, CM034, CM035, CM036]
| Factor | Direction | Timing | Implication | Diligence ask |
|---|---|---|---|---|
| Open-source frontier models improve price/performance | Driver | Now | Makes specialist inference platforms more attractive than closed-model APIs for cost-sensitive products | What percentage of Baseten revenue already comes from open-model workloads? |
| Real-time and compound-AI latency sensitivity | Driver | Now | Raises willingness to pay for performance engineering, orchestration, and autoscaling | How much of usage is latency-critical versus offline batch? |
| Cloud-first enterprise AI deployment | Driver | Now | Supports adoption of managed inference rather than self-built infra for many teams | How much of Baseten demand comes from cloud-first versus self-hosted accounts? |
| Regulated-sector demand for compliance and data control | Driver | Now | Creates a wedge for HIPAA, region restrictions, and hybrid/self-hosted deployment | What share of enterprise pipeline requires regulated deployment boundaries? |
| GPU supply constraints and tariff pressure | Constraint | Now | Raises cost of goods sold and can limit capacity availability | What reserved-capacity strategy or cloud diversification protects supply? |
| Skills gaps and integration complexity | Constraint | Medium term | Slow enterprise rollouts and increase implementation burden | How much deployment work is productized versus services-heavy? |
| Price competition from specialist GPU clouds | Constraint | Now | Commodity GPU-hour comparisons can make Baseten look expensive on paper | Where does Baseten consistently win despite higher list prices? |
| Hyperscaler platform bundling | Constraint | Medium term | Broader native-cloud suites can absorb spend that might otherwise go to a specialist inference vendor | Which workloads truly require a specialist rather than a hyperscaler-native stack? |
| Opaque unit economics and support attachment | Constraint | Diligence now | Public material does not show whether premium positioning translates into durable margin | Request product-level gross margin and enterprise discount data. |
The driver and constraint rows mix third-party market reports, Baseten product claims, customer evidence, and an independent pricing matrix. They are intended as a diligence framework, not a weighted scoring model.
[CM031, CM032, CM033, CM034, CM038, CM039]2.6 Exhibits
03Competitors
3.1 Competitive landscape and job-to-be-done coverage
Baseten sits in a crowded middle layer between low-friction serverless inference peers and large-cloud incumbents. The closest direct substitutes are Modal, Replicate, and Runpod: all give developers a way to get models onto GPUs without owning the infrastructure outright, but each compresses the stack in a different way. Modal optimizes for Python-native serverless compute, Replicate for community models and ultra-low-friction APIs, and Runpod for cheap raw capacity through Pods and Serverless. Above them sit AWS Bedrock/SageMaker, Google Vertex AI, and Azure ML, which compete less on indie-developer ergonomics and more on procurement leverage, governance, and existing cloud commitments. Below them sits the status-quo alternative: internal build on top of open packaging standards and rented GPUs. Baseten broadens the battlefield further because it sells not just deployment, but also training, multi-step orchestration, and white-labeled API monetization for model labs. That breadth means the company is not in a two-vendor race; independent datasets and company materials alike point to a fragmented, multi-class landscape where buyers can substitute across hosted inference, raw compute, hyperscaler tools, or self-managed stacks depending on whether they optimize for speed, control, trust, or cost.[CP001, CP002, CP004, CP005, CP006, CP007]
| Competitor | Category | Scale/funding | Target segment | Differentiation | Limitation |
|---|---|---|---|---|---|
| Modal | Direct serverless peer | $30/mo Starter credit; Team plan at $250/mo + compute | AI engineers and startups | Python-first serverless DX, instant autoscaling, observability | No public self-host option; enterprise controls concentrated in paid tiers |
| Replicate | Direct hosting/API peer | Thousands of community models; custom deployment via Cog | Developers, prototyping teams, model tinkerers | One-line API, model marketplace, fine-tunes | Private models bill setup + idle time and public enterprise posture is thinner |
| Runpod | Raw GPU cloud / serverless substitute | 750,000+ developers; Pods + Serverless + Clusters | Cost-sensitive AI builders and infra-heavy teams | Cheapest published raw GPU rates, many SKUs, fast scale | More DIY serving stack and less turnkey inference lifecycle tooling |
| AWS Bedrock / SageMaker | Hyperscaler incumbent | AWS-scale data/AI platform with provider/model menu | Enterprises already committed to AWS | Procurement leverage, governance, wide ecosystem | Complex pricing and stronger cloud lock-in |
| Google Vertex AI | Hyperscaler incumbent | 200+ Google and third-party models/tools | GCP enterprise and platform teams | Model Garden, pipelines, integrated data + AI stack | Management fees and GCP dependence complicate simple cost comparisons |
| Azure ML | Hyperscaler incumbent | Azure-native ML platform with 99.9% SLA | Azure-centric and regulated enterprises | Centralized studio, model catalog, Azure security posture | Separate Azure service charges and no public multi-cloud story |
| Internal build (Truss/Cog + rented GPUs) | Status quo / internal build | Portable open-source packaging on owned or rented infrastructure | Teams with strong platform engineering capacity | Maximum control and lowest software lock-in | Highest operational burden for scaling, reliability, and compliance |
| Model labs building branded APIs | Adjacent / likely entrant | Direct API ownership with custom billing and metering surfaces | Frontier model vendors and specialized labs | Own brand, own customer relationship, direct monetization | Hard to maintain capacity planning and enterprise operations without a managed partner |
Rows compare the main ways buyers can solve the same deployment job, including direct peers, incumbents, and internal-build substitutes.
[CP004, CP005, CP006, CP007, CP008, CP009]Ordinal scoring of self-serve simplicity versus enterprise control / portability.
[CP004, CP006, CP007, CP023, CP025, CP028]3.2 Capability and pricing comparison
Baseten compares best when the buyer wants a managed inference platform rather than just a GPU rental or a one-line demo API. Public materials show a stack that combines custom-model packaging, OpenAI-compatible Model APIs, training, Chains orchestration, enterprise deployment modes, and a runtime built around low-latency optimization techniques. Modal is the sharpest developer-experience counterpoint: clean serverless pricing, generous monthly credits, and explicit GPU concurrency limits make it compelling for teams that mainly need elastic Python compute. Replicate is even lighter weight for prototypes and model discovery, but its private-model economics include setup and idle time on dedicated hardware. Runpod is the price-floor alternative, publishing cheaper raw hourly and per-second GPU rates while leaving more of the serving lifecycle to the customer. Hyperscalers are harder to compare on a like-for-like basis because Bedrock, Vertex, and Azure ML wrap model access in broader cloud billing, governance, and platform fees. Net: Baseten's public list pricing is transparent and feature-rich, but it clearly sells performance, portability, and support rather than commodity compute. That is a valid wedge only if customers value total production outcomes over the cheapest published GPU-hour.[CP003, CP011, CP012, CP013, CP014, CP015]
| Buying criterion | Baseten | Modal | Replicate | Runpod | Bedrock / SageMaker | Vertex AI | Azure ML |
|---|---|---|---|---|---|---|---|
| Custom-model packaging framework | Truss | Python functions | Cog | Container / handler model | Custom training + deployment | Custom training + deployment | Model catalog + deployment |
| OpenAI-compatible hosted open models | yes | unknown | partial | unknown | partial | partial | partial |
| Managed training on same platform | yes | unknown | fine-tunes only | yes | yes | yes | yes |
| Self-host / customer-cloud option | yes | unknown | unknown | unknown | no public BYOC outside AWS | no public multi-cloud option | no public multi-cloud option |
| Multi-cloud / cloud-agnostic routing | yes | yes | unknown | many regions / no lock-in claim | no | no | no |
| Enterprise trust posture | SOC2 + HIPAA + single-tenant | Enterprise SSO / audit / HIPAA | unknown | SOC2 Type II | enterprise governance | enterprise governance | 99.9% SLA + Azure controls |
| Multi-step orchestration built in | Chains | generic functions | custom code only | queues + serverless | broader platform services | pipelines + agents | broader ML studio |
| Public list pricing transparency | high | high | medium | high | medium | medium | low |
Cells marked unknown reflect missing public evidence; the matrix compares buying criteria, not benchmark winners.
[CP013, CP014, CP015, CP019, CP020, CP021]| Vendor | Public model | Contract model | Price signal | Included capabilities | Implication |
|---|---|---|---|---|---|
| Baseten Basic | Custom + open-source deployment | $0/mo + usage | Public GPU and per-token tables; no idle-charge claim | Dedicated deployments, Model APIs, training | Transparent entry point for production workloads |
| Baseten Pro / Enterprise | Quoted | Sales-led / discounted | Priority compute, custom SLAs, self-host, volume discounts | Dedicated support, data residency, enterprise controls | Upsell is breadth and support, not lower public list price |
| Modal Starter | Serverless compute | $0 + compute | $30/mo credits; 10 GPU concurrency | Logs, region selection, serverless primitives | Excellent prototype and small-team on-ramp |
| Modal Team | Serverless compute | $250/mo + compute | $100/mo credits; 50 GPU concurrency | Custom domains, static IP, rollbacks | Scales with startups while staying compute-centric |
| Replicate private models | Dedicated hardware for custom models | Per-second compute incl. setup / idle / active | No fixed seat plan; pay while instance is online | Custom models via Cog, autoscaling | Warm custom deployments can get expensive |
| Runpod Secure Cloud | Raw GPU instances | Per-hour GPU rental | Example list rates include A100 at $1.39/hr and H100 PCIe at $2.89/hr | Reliable pods, broad GPU menu | Cost floor for buyers willing to self-manage |
| Runpod Serverless | Flex or active workers | Per-second | H100 flex $0.00116/s and active $0.00093/s | API endpoints, queues, fast cold starts | Attractive for bursty inference and scale-to-zero workloads |
| AWS Bedrock | Provider/model-specific APIs | Per-token + tiered service | Batch listed at 50% below on-demand | Managed model access plus AWS add-ons | Easy incumbent API path but bill complexity is higher |
| Google Vertex AI | Agent/model platform | Usage + compute + fees | Compute/storage plus management fees; pipelines at $0.03/run | Notebooks, Model Garden, pipelines | Best fit inside existing GCP estate |
| Azure ML | Azure-native ML platform | Consumed Azure services | No standalone Azure ML fee | Studio, model catalog, deployment | Procurement advantage for Azure-first buyers |
Public list pricing is comparable only at the headline level; negotiated discounts and workload-specific costs remain private across most enterprise deals.
[CP011, CP012, CP016, CP017, CP018, CP019]Class-level capability strengths rather than vendor-by-vendor benchmark claims.
[CP014, CP023, CP024, CP025, CP028, CP031]3.3 Distribution power, switching costs, and trust posture
Baseten's strongest non-performance argument is that customers can keep control while still avoiding the operational burden of a self-built inference platform. Multi-cloud routing, self-hosted deployment, and single-tenant options help win buyers who fear locking critical workloads into one hyperscaler or who need tighter data-residency boundaries. The tradeoff is structural: because Baseten also relies on portable, open packaging and because adjacent tools like Cog plus raw GPU clouds remain available, hard switching costs stay lower than in closed-model APIs or data platforms. On trust, Baseten's public posture is ahead of the self-serve peer set: it pairs SOC 2 Type II and HIPAA claims with explicit statements about not storing inputs or outputs by default. Modal narrows part of that gap through enterprise-only SSO, audit logs, and HIPAA; Replicate stays strongest on ease of adoption; Runpod stays strongest on low-cost infrastructure freedom. The biggest distribution disadvantage remains hyperscaler channel power. AWS, GCP, and Azure can fold AI procurement into pre-existing billing, IAM, and cloud-commitment relationships, which means Baseten must keep proving that open-model performance, portability, and support justify a separate vendor decision.[CP023, CP024, CP025, CP026, CP027, CP028]
3.4 Moat durability and competitive risk
Baseten's moat is real but softer than a proprietary-model or data-network moat. The best-supported edge is integrated execution: optimized runtimes, multi-cloud capacity, regulated deployment options, and hands-on engineering support aimed at teams shipping serious AI products. Training and Frontier Gateway broaden the product into a larger platform story, which can strengthen account control if customers standardize on one vendor from model development through branded API delivery. The countervailing evidence is meaningful. HostFleet's April 2026 serverless GPU matrix shows Baseten as the most expensive published option across multiple common GPU tiers, while Sacra explicitly warns that hyperscalers can pressure independent vendors by bundling inference into broader cloud commitments. Baseten's own status page reports 99.91% Model API uptime over its displayed window and multiple May 2026 incidents, so four-nines reliability should be read as a sales target rather than proof that operations are already frictionless. Capital helps but does not eliminate the problem: the latest financing and investor roster improve staying power in a capital-intensive market, yet the core underwriting conclusion remains that Baseten is best positioned for premium, production-grade open-model inference workloads—not for winning the commodity price war on raw GPU hours.[CP031, CP032, CP033, CP034, CP035, CP036]
| Moat claim | Threat | Severity | Mitigation / diligence ask |
|---|---|---|---|
| Integrated runtime + orchestration stack | Serverless peers replicate parts of the DX without matching full breadth | medium | Benchmark full application workflows, not just one endpoint latency number |
| Multi-cloud + self-host portability | Customers can multi-home or migrate away more easily than in closed platforms | medium | Measure retention and expansion by deployment mode to see if portability still converts into durable spend |
| Enterprise trust posture | Hyperscalers bundle governance into existing contracts and cloud commitments | high | Collect regulated-industry win/loss notes against AWS, GCP, and Azure |
| Training + gateway expansion | Baseten now competes with broader AI platform vendors and model-lab tooling | medium | Quantify how often training and gateway products lead to incremental inference revenue |
| Open-source packaging | Truss lowers lock-in and shrinks switching cost | high | Track Truss-to-paid conversion and production retention by cohort |
| Price premium versus raw GPU clouds | Runpod and similar hosts undercut headline infrastructure price | high | Prove total-cost-of-ownership and reliability ROI with customer benchmarks |
| Reliability brand | Recent incidents weaken the four-nines story in competitive deals | medium | Review incident frequency, MTTR, and customer multi-region failover patterns |
| Capital backing | Hyperscalers and other well-funded infra vendors can still outspend Baseten | medium | Validate whether investor and GPU-supply ties create real commercial or capacity advantage |
Severity reflects the likelihood that each threat compresses pricing power or raises customer acquisition friction over the next 12–24 months.
[CP023, CP024, CP025, CP029, CP030, CP031]Compact readout of Baseten's competitive durability and current pressure points.
[CP031, CP033, CP035, CP038, CP039, CP040]3.5 Exhibits
04Financials
4.1 Revenue model and public pricing
Baseten's public financial story starts with a straightforward revenue design: the company charges for production inference and adjacent infrastructure usage, not for seats. The pricing page exposes three commercial surfaces—dedicated deployments, Model APIs, and Training—and wraps them in a Basic self-serve tier plus quote-based Pro and Enterprise packaging. Model APIs are priced per million tokens, dedicated deployments are billed for compute used down to the minute, and Training is sold as both managed Training Jobs and the newer Loops workflow that feeds checkpoints straight into inference. That is a coherent production-infrastructure monetization model, and the billing-usage API reinforces it by splitting spend across dedicated inference, training, and Model APIs with daily breakdowns and credits used. The nuance is that list pricing is only the outer shell of the model. Public materials show that the real commercial wedge sits in premium support, priority compute, self-host deployment, use of existing cloud commitments, custom SLAs, and advanced security or governance. Those features imply Baseten is trying to monetize both usage and a higher-touch enterprise motion. The Terms also make the customer Order the binding commercial instrument, which means realized pricing can diverge materially from the public price page once support, discounts, and minimum commitments enter the deal. That is important for revenue recognition and yield analysis because public list prices are observable, but list-to-net economics are not. The result is a credible revenue architecture with decent public visibility into billing units and weak visibility into realized revenue quality. Public evidence can show how Baseten intends to charge; it cannot show revenue mix, attach rates for services, or what customers actually pay after enterprise negotiation.[CI001, CI002, CI003, CI004, CI005, CI006]
| stream | mechanism | unit | current value/status | quality | diligence ask |
|---|---|---|---|---|---|
| Dedicated deployments | Per-workload GPU compute plus deployment controls and support | GPU-minute / contract | Public list pricing and quote-based Pro/Enterprise packaging exist | Medium for billing unit; low for realized price | Provide customer-level realized rate cards, discounts, and gross margin by GPU family. |
| Model APIs | Usage-priced hosted models through OpenAI-compatible endpoints | 1M tokens | Public list pricing exposed with separate input, cached-input, and output columns | High for list unit; low for realized yield | Provide token volume by model, cache share, batch share, and realized net revenue per token. |
| Training Jobs / Loops | Managed training workloads that connect directly into inference deployment | GPU-minute / job / contract | Commercial surface exists, but public list pricing is not disclosed | Low | Provide training-job pricing schedule, contribution margin, and attach rate into production inference. |
| Support and engineering services | Hands-on engineering, Slack/Zoom support, deployment optimization, and enterprise assistance | service attachment / contract | Clearly present in Pro and Enterprise messaging, but no standalone rate card | Low | Provide services attach rate, blended pricing, and whether support is margin-accretive or subsidized. |
| Enterprise self-host / cloud-commitment portability | Baseten software and support layered into customer-cloud or hybrid deployments | custom contract | Publicly marketed as a key enterprise feature set | Low | Provide typical annual contract value, minimum commit, and renewal behavior for self-hosted accounts. |
Public evidence supports revenue surfaces and billing units, but not product-level revenue mix or realized pricing.
[CI001, CI002, CI003, CI004, CI005, CI006]| price / unit / contract | list vs realized pricing | discounts / unknowns | source-backed implication |
|---|---|---|---|
| Basic: $0 per month pay-as-you-go | Pure list price | No public conversion, ARPU, or activation data | Self-serve entry point broadens funnel but says nothing about paid conversion. |
| Pro: quote-based with priority compute, dedicated compute, higher API rate limits, dedicated support | List package, realized price hidden | Volume discounts available but depth undisclosed | Revenue quality likely depends on how often support and priority capacity attach to usage. |
| Enterprise: quote-based self-host, custom SLAs, cloud commitments, data residency, advanced RBAC | List package, realized price hidden | No public minimum commit, renewal terms, or services pricing | Enterprise value proposition is operational control, not transparent SKU pricing. |
| Model APIs: per-1M-token pricing with separate input, cached input, and output rates | List pricing | No enterprise rate card, batching curve, or mix data | Useful for benchmarking, but list token prices are not realized revenue. |
| Dedicated compute: billed by compute used down to the minute | List billing rule | HostFleet says minimum dedicated deployment cost and billed awake times still apply | Scale-to-zero helps, but minimums can compress savings for spiky workloads. |
| Fees and invoicing: billed end-of-month and due in 30 days unless an Order says otherwise | Contract rule rather than product list price | Orders govern actual economics | Revenue recognition and payment timing likely vary by negotiated enterprise order form. |
This table separates public list mechanics from private realized economics; all discount, minimum-commit, and enterprise-rate questions remain open.
[CI002, CI003, CI004, CI005, CI006, CI007]Baseten turns usage across dedicated deployments, Model APIs, and Training into metered spend, then converts a subset into higher-value enterprise and support contracts.
Flow depicts commercial logic rather than a quantified waterfall; public evidence does not disclose revenue mix or realized contract values.
[CI001, CI005, CI006, CI008, CI011, CI012]4.2 GTM motion and sales-efficiency proxies
Baseten's go-to-market is best understood as land-with-usage, then expand through reliability, support, and deployment control. The Basic plan and public Model APIs create a low-friction developer entry point, but the monetization narrative shifts quickly toward Pro and Enterprise features such as dedicated compute, higher rate limits, hands-on engineering, self-hosting, and cloud-commitment portability. That packaging suggests Baseten is not trying to win a commodity self-serve race alone; it is trying to become the production-inference layer for teams that care about latency, uptime, and control enough to pay for operational help. Because Baseten is private, CAC, payback, enterprise sales cycle length, and NRR are unavailable. The best public substitutes are customer case studies. Writer credits Baseten with lower cost per million tokens and higher throughput on 70B-class models. OpenEvidence emphasizes flexible access to compute without multi-year reservations plus large deployment and maintenance-time gains. Speechify reports that it could retire a large self-managed GPU estate while cutting cost per million characters. Superhuman and Patreon frame the value proposition as saving scarce engineering time while materially improving latency or lowering GPU cost. Those are not audited financials, but they are directionally consistent with a GTM motion that sells time-to-production and lower total operating cost rather than just list-price compute. The evidence therefore supports a plausible expansion engine, but only in proxy form. The buyer logic is visible; the sales-efficiency math is not. Without internal conversion, retention, and sales-spend data, Baseten's GTM efficiency cannot be underwritten with confidence.[CI003, CI004, CI014, CI018, CI019, CI020]
4.3 Cost structure and unit-economics proxies
Baseten's public materials point to an asset-light but not necessarily low-price cost structure. Sacra describes the company as aggregating capacity across more than 15 cloud providers rather than owning GPU infrastructure outright, which should keep fixed asset intensity below a provider that buys and finances GPU fleets directly. Official materials reinforce the same model from another angle: Baseten talks constantly about multi-cloud capacity management, cross-cloud autoscaling, scale-to-zero, and running in the customer's cloud when needed. In principle, that should let the business flex supply and match cost to workload shape more tightly than a dedicated owned-fleet operator. But asset-light does not mean cheap. HostFleet's April 2026 serverless GPU matrix shows Baseten priced above Runpod on every shared SKU listed and above Modal on the shared L4 and H100 rows, while only Replicate's A100 custom deployment price sits higher among the overlapping A100 rows shown. That is the clearest adverse signal in the public record: Baseten sells a premium managed layer over raw compute. The company's rebuttal, effectively, is its own performance narrative. Dedicated Inference claims 6x better GPU utilization and 5-10x lower costs on optimized runtimes; Model APIs claim 5-10x lower spend versus closed alternatives; customer studies report lower per-unit cost, fewer engineers, or both. Those claims are consistent with a thesis that Baseten expands gross margin through utilization and support attachment, but public data still stops short of a gross-margin proof. That leaves unit economics in proxy territory. We can see the billing unit. We can see that some customers say total cost fell. We can see that list pricing is premium to raw GPU clouds. What we cannot see is the realized balance between cloud pass-through, support labor, negotiated discounts, and retention.[CI006, CI007, CI015, CI016, CI017, CI018]
| metric | value / public proxy | confidence | why it matters | diligence ask |
|---|---|---|---|---|
| Published billing unit | GPU-minute for dedicated inference; per-1M-token for Model APIs | High | Shows Baseten monetizes usage rather than seats. | Provide realized billing mix by workload type. |
| Company-claimed utilization lever | 6x better GPU utilization and 5-10x lower costs on Dedicated Inference | Medium | If true, utilization is the core gross-margin lever. | Provide before/after utilization histograms and gross margin by optimized runtime. |
| Price-floor pressure | HostFleet shows Baseten premium to Runpod on shared SKUs and premium to Modal on shared L4/H100 rows | Medium | Premium pricing must be justified by lower total cost, not raw GPU-hour parity. | Provide win/loss analyses where Baseten beats cheaper raw-GPU alternatives. |
| Writer proxy | 35% lower cost per million tokens; 60% higher tokens/sec; 23% lower TTFT | Medium | Suggests performance optimization can offset premium list pricing. | Provide benchmark methodology and comparable customer gross margin impact. |
| OpenEvidence / Speechify proxy | 78% lower latency, 6x faster deployment, 8x lower maintenance, 44% lower cost per million characters | Medium | Supports TCO argument through both infra savings and fewer platform engineers. | Provide audited customer expansion and retention data after migration. |
| Patreon / Superhuman proxy | $600k resources saved yearly, 70% GPU-cost savings, 80% lower latency, multiple engineers freed | Medium | Shows economic value can sit in labor efficiency as well as compute savings. | Provide cohort-level NRR and services attach for customers citing labor savings. |
| Gross margin / CAC / NRR | Not disclosed publicly | Low | Without these, no public unit-economics bridge closes. | Provide gross margin by product line, CAC, payback, churn, and NRR. |
Rows mix official list mechanics with customer-proof proxies and independent price-floor checks; none substitute for disclosed gross-margin data.
[CI005, CI006, CI015, CI018, CI019, CI020]Public unit-economics evidence runs from workload shape to utilization and support economics, but breaks before gross margin because realized discounts and COGS are private.
The bridge is directional. Public evidence supports the nodes qualitatively or via case-study proxies, but not a closed margin equation.
[CI007, CI015, CI018, CI019, CI020, CI021]4.4 Capital adequacy and financing dependency
Baseten's capital position looks strong on paper and opaque in practice. The public record supports $75 million of Series C financing in February 2025, $150 million of Series D financing in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. Business Wire, Tracxn, and CB Insights all point to roughly $585 million of cumulative capital raised, and Business Wire explicitly characterizes the January 2026 round as the company's third fundraise within the prior year. That pace matters: Baseten is clearly not operating from slow-burn cash generation; it is financing growth aggressively as inference demand expands. The use-of-funds language reinforces that interpretation. Baseten's own Series E post centers the new money on speed, uptime, developer experience, and broadening the infrastructure platform. Tech Funding News adds expected hiring, customer-support expansion, and more integrations. Public headcount proxies line up with that investment story: PitchBook's 2025 snapshot showed 73 employees, while Tracxn listed 258 employees by April 2026. Even if those datasets are imperfect, the direction is clear—Baseten appears to be scaling operating expense meaningfully alongside product and infrastructure scope. What the public record does not show is whether the current capital base is sufficient relative to burn. There is no disclosed cash balance, monthly burn, runway, debt schedule, or covenant package. Sacra's reported $200 million to $600 million annualized revenue estimates suggest substantial scale, and the reported $11 billion to $15 billion valuation talk suggests the market may be willing to finance the next leg. But those are not substitutes for cash, margin, and runway disclosure. The only hard conclusion is that Baseten has had strong access to capital; whether it is adequately capitalized against actual burn remains private.[CI027, CI028, CI029, CI030, CI031, CI032]
| metric | public value / status | source-backed implication | diligence ask |
|---|---|---|---|
| Total capital raised | $585M total capital raised publicly reported | Capital access has been strong enough to fund rapid scale-up, but cash remaining is unknown. | Provide current cash balance and unrestricted cash after the January 2026 round. |
| Latest financing | $300M Series E at $5B valuation in January 2026 | Fresh equity materially improved flexibility entering 2026. | Provide post-close cash bridge and board-approved operating plan. |
| Funding cadence | Series C $75M (Feb 2025), Series D $150M (Sep 2025), Series E $300M (Jan 2026) | Three rounds inside roughly a year implies aggressive investment mode and possible dependence on capital markets. | Provide target next-round timing and financing contingency plan. |
| Planned use of funds | Speed, uptime, developer experience, team growth, platform expansion, more integrations and support | Spend appears aimed at product, infra, and headcount rather than harvest mode. | Provide 24-month capex / opex budget by function. |
| Cash balance / monthly burn / runway | Not publicly disclosed | Capital adequacy cannot be underwritten from public evidence. | Provide monthly net burn, cash balance, runway under base and downside cases. |
| Debt / project-finance obligations | No public debt schedule or project-finance obligations disclosed; SEC operating-company filings unavailable from cited EDGAR page | Absence of disclosure does not equal absence of obligations. | Provide all debt facilities, cloud-commitment liabilities, reserved-capacity obligations, and major vendor terms. |
Funding facts are public; adequacy is not. This table intentionally separates known financing history from unknown liquidity and obligation metrics.
[CI027, CI028, CI029, CI030, CI031, CI033]Public financial signals span wide ranges because they mix closed financing facts with third-party estimates and snapshots rather than audited financials.
Revenue and upper valuation bound are third-party estimates rather than company-disclosed audited figures. Headcount spans two different vendor datasets and dates.
[CI027, CI035, CI037, CI038, CI048]Baseten's cash-flow logic appears to run from repeated equity raises into product, support, and capacity orchestration, but the residual cash position and obligations remain private.
This map shows direction of cash use and financing dependence rather than a numeric cash-flow statement because public burn and cash data are unavailable.
[CI027, CI028, CI029, CI030, CI033, CI034]4.5 Financial verdict and disclosure gaps
The financial verdict is positive on business-model coherence and negative on underwriteable disclosure. On the positive side, Baseten clearly monetizes the right units for its product category: GPU-time, token usage, training jobs, and high-touch enterprise features. Customer proofs consistently reinforce the same story—that production-grade inference is valuable when it lowers total operating cost, shrinks engineering burden, and preserves latency or uptime under real workloads. Capital access has also been unusually strong, with three rounds in roughly a year culminating in a $300 million Series E. On the negative side, nearly every metric that turns a good story into an investment case remains private. Public sources do not show revenue mix by product, realized enterprise pricing, gross margin, CAC, NRR, churn, customer concentration, cash balance, monthly burn, or runway. The SEC EDGAR entity landing page tied to the company lookup does not provide public operating-company filings, so there is no audited financial bridge to fall back on. Reliability evidence is adequate but not spotless: the status page shows recent incidents and the SLA target is 99.9%, not the perfect uptime implied by the strongest marketing claims. Net: Baseten looks like a premium, usage-based inference platform with real demand and credible cost-saving proxies, but the public record is still too thin to underwrite revenue quality or capital adequacy rigorously. The right diligence posture is to treat the company as promising but disclosure-light until private financials close the gaps listed below. Public customer proof now spans finance workflows (Hebbia), coding products (Zed and Posit), voice interfaces (Wispr Flow), and world-model experimentation (World Labs), which strengthens the case that Baseten's usage-based revenue opportunity is diversified across several demanding production workloads rather than one narrow niche.[CI024, CI025, CI027, CI028, CI033, CI040]
| missing private metric | impact on underwriting | exact diligence path |
|---|---|---|
| Revenue mix by Model APIs vs Dedicated Inference vs Training vs services | Cannot judge whether growth is durable software-like expansion or support-heavy services revenue. | Request monthly revenue by product surface for the last 18 months plus contribution margin by surface. |
| List-to-net pricing, enterprise minimum commits, and discount schedules | Public list pricing may overstate realized yield and margin. | Review five recent enterprise orders with associated discount approvals and usage curves. |
| Gross margin by product line and cloud / GPU procurement terms | Impossible to assess whether optimization claims translate into retained gross profit. | Provide product-level COGS, major cloud spend by provider, and any reserved-capacity commitments. |
| Cash balance, monthly burn, and runway | Capital adequacy cannot be underwritten despite recent fundraises. | Provide current cash waterfall, trailing six-month burn, and scenario runway model. |
| Customer concentration, NRR, churn, and cohort expansion | Cannot test revenue quality or durability beyond anecdotal customer stories. | Provide top-20 customer revenue concentration, logo churn, dollar churn, and cohort NRR. |
| Public filing and audit trail depth | Lack of SEC operating financials leaves investors dependent on management-only materials. | Provide audited financial statements, board package KPIs, and any lender reporting packs. |
Every row is a material diligence blocker rather than a nice-to-have. Public evidence establishes narrative direction, not underwriteable private metrics.
[CI047, CI050]05Product & Technology
5.1 Product Surface in Customer Workflow Terms
Baseten now spans most of the modern AI deployment workflow rather than a single hosting SKU. At the lightest-weight end, Model APIs let teams swap an OpenAI or Anthropic base URL and call shared frontier/open models immediately, which is useful for prototyping or for products that do not need dedicated GPUs. At the heavier-weight end, Truss packages a custom or open-source model into a reproducible deployment artifact, while Dedicated Inference adds tenant-isolated capacity, custom scaling, and support for stricter latency or compliance needs. Chains sits above single-model inference for multi-step RAG, transcription, or multimodal flows, Frontier Gateway adds branded URLs plus billing/rate-limit controls for model labs monetizing their own APIs, and Baseten Training/Loops try to close the loop from checkpoint creation to production serving. In customer workflow terms, the company is selling a graduated path from instant API evaluation to custom production inference, not just raw GPU rental.[CE001, CE002, CE006, CE007, CE008, CE009]
| Module / Asset | Primary user | Status / maturity | Core function | Differentiation | Diligence gap |
|---|---|---|---|---|---|
| Model APIs | Application developers evaluating or shipping frontier/open models | GA / mature shared service | Instant OpenAI- and Anthropic-compatible inference on Baseten-managed shared infrastructure | Lowest-friction entry point; built-in caching, tool calling, structured outputs, and migration path to dedicated deployments | Shared infrastructure by design; public docs do not disclose tenant-level contention controls or per-model benchmark methodology |
| Dedicated Inference | Teams serving custom, fine-tuned, or proprietary models in production | GA / core enterprise surface | Single-tenant or customer-controlled inference with custom hardware, scaling, and deployment options | Combines managed performance tuning, cross-cloud autoscaling, and enterprise control surfaces | Only public contractual SLA is 99.9%; published GPU-hour pricing is high versus self-serve peers |
| Truss | ML engineers packaging and iterating on custom deployments | Mature open-source CLI with active May 2026 releases | Packages model code, weights, dependencies, and GPU config; deploys via uvx truss push/watch | Write-once packaging abstraction with live reload and support for many serving frameworks | Open-source activity is healthy, but public docs do not quantify enterprise adoption of Truss specifically |
| Chains | Teams building RAG, transcription, or multi-model workflows | GA / production workflow layer | Orchestrates Python chainlets with per-step hardware, dependencies, and autoscaling | Lets Baseten sell compound-AI workflows without forcing monolithic model deployments | Public performance claims are directional; workflow-specific latency depends on design and workload |
| Frontier Gateway | AI labs commercializing their own hosted models | GA / specialized monetization surface | White-labeled inference API with key management, billing, metering, rate limits, and branded URL routing | Turns Baseten into invisible infrastructure for labs that want their own API brand and monetization layer | Public documentation does not describe customer count, supported billing edge cases, or settlement workflows |
| Training Jobs / Loops | Research and infra teams training or post-training models before deployment | Training Jobs = GA; Loops = early access | Managed GPU training plus a path to deploy checkpoints into inference endpoints | Attempts to close training-to-inference loop inside one platform rather than handing off to another vendor | Loops is still early access, so maturity is uneven versus the inference stack |
Status labels reflect Baseten's own public wording as of 2026-05-30. “Mature” means a repeatedly described GA surface with operational documentation; it does not imply externally audited feature quality.
[CE001, CE002, CE006, CE007, CE008, CE009]| User job | Current workflow | Baseten solution | Public benefit | Limitation |
|---|---|---|---|---|
| Prototype with a frontier/open model quickly | Swap providers without building deployment infra | Model APIs | Point an existing OpenAI or Anthropic SDK at Baseten and start calling supported models immediately | You accept the supported-model list and shared-infrastructure model rather than choosing exact hardware |
| Deploy an open-source or proprietary model to production | Package model, pick hardware/engine, expose stable endpoint | Truss + Dedicated Inference | Config-driven deployment path with TensorRT-LLM or custom-server options, observability, and environment promotion | Customer still needs to validate performance/cost trade-offs per model because Baseten does not publish a universal benchmark methodology |
| Run a compound AI application | Split a multi-step workflow across specialized components | Chains | Each chainlet can use its own hardware and autoscaling, reducing monolithic GPU waste and latency bottlenecks | Public performance claims are directional; workflow-specific latency depends on design and workload |
| Commercialize a lab-owned model | Expose model to third-party customers with metering and rate limits | Frontier Gateway | White-labeled URL, key management, usage limits, and per-customer billing remove the need to build an API gateway from scratch | Commercial and contractual details beyond the marketing surface are not public |
| Train or fine-tune and then deploy checkpoints | Run training code, sync checkpoints, and promote into inference | Training Jobs / Loops + deploy_checkpoints | Same vendor can cover managed training infra and downstream deployment endpoint | Loops remains early access, so the most advanced post-training path is not yet fully mature in public materials |
Benefits are public-product claims and customer-proof outcomes, not guaranteed customer results. Each row describes the workflow Baseten markets most clearly, not every possible implementation variant.
[CE001, CE002, CE005, CE006, CE007, CE008]How a team moves from evaluation or packaging into production inference on Baseten, with an optional training and gateway path.
[CE002, CE005, CE006, CE007, CE008, CE009]5.2 Deployment Architecture and Operating Model
The clearest technical differentiator in Baseten's public corpus is that it explains the deployment path with more specificity than many AI-infrastructure startups. Truss abstracts packaging, dependencies, and GPU configuration; Baseten's build step then validates and uploads the package, compiles supported LLMs with TensorRT-LLM when the engine path is selected, and deploys the resulting container behind a dedicated model subdomain. The MCM control plane sits underneath both training and inference, abstracting cloud-provider differences and rerouting capacity across regions or providers when needed. Request routing resolves environment names from URL paths, environments preserve stable endpoints as deployments are promoted, and async requests enter a queue that protects real-time traffic from background work. BDN addresses the cold-start bottleneck by mirroring and caching large model weights at multiple layers so new replicas are less dependent on external storage. The result is an inference-first architecture with explicit build, routing, autoscaling, and weight-delivery primitives.[CE003, CE004, CE005, CE010, CE011, CE012]
| Layer / component | Public mechanism | Key dependencies | Risk / limitation |
|---|---|---|---|
| Packaging layer (Truss) | Packages model definition, dependencies, secrets, caching, and GPU config from config.yaml or Python model code | Baseten CLI, GitHub/PyPI distribution, user source repositories | Abstraction is strong, but deployment success still depends on model-specific tuning and user-supplied weights |
| Build / compile path | Engine-Builder-LLM downloads weights, compiles with TensorRT-LLM, applies quantization/tensor parallelism, and emits a serving container | Hugging Face or cloud-storage weight source, CUDA-compatible GPU targets | Compile times can take minutes and public docs do not benchmark every model/hardware combination |
| Runtime optimization layer | Inference runtime exposes TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured output, KV-cache optimization, and topology-aware parallelism | Model architecture support, Baseten inference runtime, GPU memory/layout assumptions | Optimization options are numerous but not all are validated publicly per workload |
| MCM control plane | Unifies GPUs across clouds/regions, provisions resources, monitors health, and reroutes around capacity crunches or outages | Underlying cloud GPU supply, networking, Baseten control plane | Cross-cloud abstraction reduces lock-in but introduces dependency on Baseten’s own orchestration layer |
| Weight delivery / cold-start path | BDN mirrors weights to Baseten storage and caches them at mirrored-origin, cluster, and node layers | Upstream weight repository for first mirror, Baseten blob storage, in-cluster cache | First deploy still depends on upstream weight availability; benchmark methodology for “2-3x faster” is not public |
| Request routing / environments | Each model gets a subdomain; URL path resolves environment; async requests queue; promotions keep endpoint names stable | Baseten API gateway, environment config, autoscaler, queue service | Scale-to-zero introduces cold-start trade-offs and regional guarantees require a special endpoint |
| Workflow orchestration / training | Chains coordinates multi-step workflows; Training Jobs and Loops provision GPUs via MCM and can deploy checkpoints into inference | Python SDK/CLI, MCM, storage/checkpoint sync | Loops is early access and training maturity lags the core inference surface |
| Enterprise controls / tenancy | Single-tenant, self-hosted, hybrid, and regional environments plus SSO/SCIM and compliance-policy boundaries | Customer IdP, Baseten support setup, customer cloud if self-hosted | Some controls require sales/support intervention rather than pure self-serve enablement |
This table mixes direct documentation facts with synthesis about operating dependencies. “Risk / limitation” names the public caveat or diligence item, not an observed failure.
[CE003, CE004, CE005, CE006, CE007, CE010]Layered view of Baseten's public architecture from access surfaces through packaging/orchestration to runtime and cross-cloud infrastructure.
[CE001, CE002, CE006, CE007, CE009, CE010]5.3 Trust, Data Handling, and Reliability Controls
Baseten's trust posture is strong by startup standards and particularly important because the company wants sensitive inference workloads. The public security documentation says Baseten maintains SOC 2 Type II and HIPAA compliance, does not store model inputs, outputs, or weights by default, never shares GPUs across users, isolates customers into dedicated Kubernetes namespaces, and uses controls such as Calico, Falco, and Gatekeeper around workload isolation. Enterprise and pricing pages also advertise self-hosted, single-tenant, hybrid, and region-restricted options, while regional-environments docs explain that true residency requires a distinct regional endpoint rather than the default environment CNAME. The nuance is reliability: Baseten marketing frequently uses four-nines language, but the only public contractual commitment is the Dedicated Inference SLA at 99.9% monthly availability. The public status page also logged multiple May 2026 incidents, so diligence should treat trust/compliance as a strength and public reliability guarantees as more mixed.[CE015, CE016, CE017, CE018, CE019, CE020]
| Control / signal | Public status | Scope / evidence | Implication | Gap |
|---|---|---|---|---|
| SOC 2 Type II + HIPAA | Published | Security docs, enterprise page, and pricing page all cite SOC 2 Type II and HIPAA | Strong baseline trust signal for enterprise inference workloads | No public certificate artifacts or audit scope details in the reviewed corpus |
| Default non-storage of prompts/outputs/weights | Published with caveats | Security docs say Baseten does not store inputs, outputs, or weights by default, except temporary async storage and optional caching | Important privacy and IP positioning for sensitive inference | Need contract/DPA review for exact retention edge cases and customer-enabled caching behavior |
| GPU and namespace isolation | Published | Security docs say Baseten never shares GPUs across users and assigns each customer a dedicated Kubernetes namespace with Calico/Falco/Gatekeeper controls | Supports tenant isolation claims beyond generic cloud marketing | No public penetration-test report or architecture diagram was reviewed |
| Regional environments / data residency | Published but support-configured | Docs explain region-constrained replicas and special regional endpoint formats | Useful for GDPR/data-residency buyers that need routing guarantees | Setup requires Baseten involvement and public docs do not state lead times or pricing |
| Identity and lifecycle controls | Expanded in 2026 | SSO/SCIM changelog adds SAML 2.0, SCIM 2.0, JIT provisioning, deprovisioning, and group-based roles on Enterprise | Improves enterprise admin hygiene and procurement readiness | No public mapping to specific IdP limitations or SCIM attributes |
| Hosting flexibility | Published | Enterprise, dedicated-inference, and security pages describe Baseten Cloud, self-hosted, hybrid, and single-tenant modes | Lets buyers choose between speed, control, and cloud-commitment reuse | Exact operational split between Baseten-managed and customer-managed responsibilities is not fully public |
| Contractual availability | Mixed | SLA contract says Dedicated Inference targets 99.9% monthly availability; marketing pages often use 99.99/four-nines language | Public procurement should rely on the legal SLA rather than homepage shorthand | No public SLA was found for Model APIs, Chains, or the broader web app |
| Operational incident visibility | Published but mixed | Public status page shows multiple May 2026 incidents; third-party reachability tracker says detailed incident data is unavailable | Visibility exists, but independent uptime corroboration is thin and headline uptime panels can hide short incidents | Need contractual incident response terms, RCA access, and service-credit history in diligence |
Confidence is highest where multiple official pages agree; lower where public documentation requires contact with support or contract review. This table describes what is public, not what Baseten may provide privately in diligence.
[CE015, CE016, CE017, CE018, CE019, CE020]Key dependencies that sit around Baseten's inference stack: upstream weights, GPU clouds, regional/identity controls, and non-core SaaS tooling.
[CE010, CE016, CE023, CE024, CE025, CE036]5.4 Developer Signal, Customer Proof, and Competitive Positioning
Baseten's moat is not the lowest published unit price; it is the bundle of packaging tooling, performance engineering, and managed cross-cloud operations. Truss gives Baseten a real developer surface: the GitHub repo and PyPI package show an active open-source packaging CLI with frequent May 2026 releases centered on Loops and deployment workflows. Customer proof is stronger than generic logo pages: Writer says Baseten-built TensorRT-LLM engines improved tokens per second and lowered time to first token and cost, while OpenEvidence attributes materially lower latency, faster deployments, and lower maintenance burden to Baseten's MCM, embeddings runtime, and tooling. The trade-off is visible in independent pricing comparison. HostFleet's April 2026 matrix shows Baseten priced above Runpod and Modal on comparable GPU instances, while Runpod and Modal market more aggressive zero-idle and cold-start positioning. Against AWS, Google, and Microsoft, Baseten is narrower in scope but easier to read as an inference-specialist layer rather than a full hyperscaler AI platform.[CE026, CE027, CE028, CE029, CE030, CE031]
5.5 Roadmap and Product Maturity
Public 2026 roadmap signals point to Baseten maturing operating controls around a fairly stable product architecture rather than constantly adding new product families. The notable 2026 releases were SSO/SCIM, rolling deployments, BDN, and a billing usage API—features that make the platform easier to govern, safer to update, faster to cold-start, and easier to instrument financially. That release mix suggests Baseten is moving from “can this serve models?” toward “can this run mission-critical inference inside enterprise processes?” At the same time, maturity is uneven across the stack. Training Jobs are publicly GA, while Loops remains early access, so the training-to-inference story is strategically promising but not uniformly production-proven. Public materials also say little about benchmark methodology, exact enterprise onboarding lead times for regional controls, or product priorities beyond the currently disclosed 2026 changelog, leaving some product-tech diligence items unresolved.[CE007, CE020, CE021, CE022, CE037, CE038]
| Date / stage | Feature / milestone | Status | Implication | Source |
|---|---|---|---|---|
| 2026-03-04 | Billing usage API | Launched | Makes Dedicated Inference, Training, and Model APIs easier to instrument financially via daily API breakdowns | Baseten changelog |
| 2026-03-19 | Baseten Delivery Network (BDN) | Launched | Signals investment in cold-start mitigation and independence from upstream weight stores after first mirror | Baseten changelog + How Baseten Works docs |
| 2026-03-30 | Rolling deployments | Launched | Adds safer zero-downtime promotions and better environment lifecycle control for production releases | Baseten changelog |
| 2026-05-14 | SSO and SCIM | Launched on Enterprise | Improves identity governance and deprovisioning for larger customers | Baseten changelog |
| 2026 public product state | Training Jobs | GA | Shows the managed-training product is no longer experimental | Training product page |
| 2026 public product state | Loops | Early access | Indicates Baseten is investing in post-training/RL workflows but has not yet fully hardened the surface publicly | Training product page + Truss releases |
Only explicitly disclosed public milestones are listed. Absence from this table should not be read as absence from the internal roadmap; it only means the item was not visible in the reviewed public corpus.
[CE007, CE020, CE021, CE022, CE037, CE038]Capability-by-maturity view of Baseten's main product surfaces as of 2026-05-30.
Maturity labels are synthesis, not company-provided scores. “Differentiated” means the public corpus shows a clearer relative advantage or stronger external proof, not that the capability is objectively category-leading on all dimensions.
[CE007, CE026, CE027, CE031, CE032, CE037]5.6 Exhibits
06Customers
6.1 Customer segmentation and buyer profile
Baseten's public customer evidence points to a buyer set made up primarily of AI-native software builders whose own end products live or die on inference speed, reliability, and cost. The buyer is usually an ML, platform, or product-engineering leader, while the user base broadens into application engineers, security teams, and operations leads once a workload moves from model evaluation into production. Publicly named examples span enterprise agent platforms such as Writer and Notion, regulated healthcare apps such as OpenEvidence and Abridge, voice and speech applications such as Speechify, productivity software such as Superhuman, creative tools such as Gamma, and GTM or coding products such as Clay and Cursor. That breadth matters because it shows Baseten is not only selling experimentation infrastructure. At the same time, the disclosed book is still overwhelmingly AI-native software rather than a diversified set of legacy enterprises, so customer breadth is real but not yet institutionally broad in public.[CU001, CU002, CU003, CU004, CU005, CU006]
| Segment | Buyer / user / payer | Representative public accounts | Public value signal | Gap |
|---|---|---|---|---|
| Enterprise agent and knowledge platforms | Buyer: CIO / AI platform leader; User: app and ops teams; Payer: enterprise software vendor | Writer, Notion | Mission-critical AI agents with security and governance requirements | No disclosed revenue mix by enterprise vs startup accounts |
| Healthcare AI applications | Buyer: clinical/IT leadership; User: clinicians, care teams, revenue-cycle ops; Payer: healthcare AI vendor or health system software budget | OpenEvidence, Abridge, Ambience | Regulated medical information and clinical documentation workloads | No public contract-value or health-system count disclosure |
| Voice and speech applications | Buyer: product / ML platform lead; User: end users and content teams; Payer: voice application vendor | Speechify | Consumer-scale TTS and voice infrastructure under real-time latency pressure | No public disclosure of revenue concentration inside audio workloads |
| Productivity and collaboration apps | Buyer: product / engineering leadership; User: professionals and knowledge workers; Payer: software vendor | Superhuman, Notion | Inference directly affects email, workspace, and agent UX | Production KPIs are public for Superhuman, not for Notion |
| Creative and creator-economy platforms | Buyer: product / infra lead; User: creators and prosumers; Payer: software vendor | Gamma, Patreon | Large-scale image generation and creator-media workloads | Consumer logos do not by themselves prove long-term renewal economics |
| GTM and developer tools | Buyer: engineering / revenue-ops lead; User: developers, GTM operators, recruiters; Payer: software vendor | Clay, Cursor, Mercor | Shows Baseten exposure to code, GTM, and AI-economy tooling | Only public name mentions exist for most of these accounts |
Segmentation is assembled from Baseten case studies, fundraising references, and official customer pages; Baseten does not publish customer-count or ARR mix by segment.
[CU001, CU002, CU003, CU004, CU005, CU006]| Customer | Segment | Evidence type | What is public | Proof strength | Gap |
|---|---|---|---|---|---|
| Abridge | Healthcare AI | Business Wire + Abridge site | Named as Baseten customer; Abridge sells enterprise clinical-conversation AI to major health systems | Medium | No Baseten-specific deployment scope or outcomes disclosed |
| Cursor | Developer tools | WorkOS + Cursor site | Named as Baseten customer; Cursor serves AI-assisted coding and says it is trusted by over half the Fortune 500 | Medium | No Baseten-specific workload detail or economics disclosed |
| Notion AI | Enterprise productivity | Business Wire + Notion AI page | Named as Baseten customer; Notion AI markets agents, enterprise search, and zero-retention enterprise controls | Medium | No Baseten-specific performance or spend data disclosed |
| Clay | GTM software | WorkOS + Clay site | Named as Baseten customer; Clay serves a large GTM-team base with enrichment and workflow automation | Medium | No Baseten-specific production metrics disclosed |
| Mercor | AI economy / recruiting | Business Wire + Mercor site | Named as Baseten customer; Mercor positions itself around powering the AI economy | Low-medium | Public customer mention exists, but use case and infrastructure dependence are not described |
These rows extend the named customer set beyond flagship case studies, but they are weaker proof than the six detailed stories because they usually lack deployment specifics.
[CU010, CU011, CU047, CU048, CU049, CU050]Baseten typically starts with a technical buyer evaluating model performance, then expands into operational, security, and procurement stakeholders as the workload becomes business-critical.
[CU001, CU003, CU036, CU037, CU038, CU039]6.2 Adoption trajectory and named production proof
Baseten's best public adoption evidence is not a disclosed customer-count time series so much as a stack of workload-scale and outcome disclosures from reference customers. The company says inference volume grew 100x over the last year, and its customer-story index now spans healthcare, code, audio, presentations, and operations use cases. The flagship six case studies are production-grade rather than pilot-like: OpenEvidence discloses billions of requests per week and a doctor in every U.S. zip code, Speechify discloses 161B+ characters per month for 60M+ users, Gamma discloses 3M+ images per day for 70M+ users, Superhuman says dozens of custom models moved into production, and Patreon reports large cost savings on a scaled Whisper deployment. The quality of proof is strongest where Baseten and the customer share concrete latency, cost, throughput, or workflow metrics, though the public set remains curated.[CU012, CU013, CU014, CU015, CU016, CU017]
| Metric | Value | Date | Source | Confidence | Implication / missing denominator |
|---|---|---|---|---|---|
| Baseten platform inference growth | 100x inference-volume growth in the last year | 2026-01 | Baseten Series E blog | Medium | Strong demand signal, but not broken out by customer or workload |
| Public reference inventory | 13 case studies, 29 testimonials, 4 videos, 654 ratings | 2026-05 | FeaturedCustomers | Medium | Large public reference surface, but aggregator methodology is not fully transparent |
| OpenEvidence workload scale | Billions of requests per week; doctors in every U.S. state and zip code | 2026 viewed | Baseten case study | Medium | Shows national clinical reach, but not revenue or contract expansion |
| Speechify workload scale | 161B+ characters/month for 60M+ users | 2026 viewed | Baseten case study | Medium | Very large consumer-scale inference load |
| Gamma workload scale | 3M+ images/day for 70M+ users | 2026 viewed | Baseten case study | Medium | Strong PLG-scale proof, but no disclosed share of Gamma traffic on Baseten |
| Superhuman production breadth | Dozens of custom embedding models switched into production after a one-week project | 2026 viewed | Baseten case study | Medium | Indicates deployment breadth even without volume metrics |
| Additional named accounts | Abridge, Cursor, Clay, Notion, Mercor and others named publicly | 2026-01 | Business Wire / WorkOS | Medium | Breadth extends beyond the six flagship stories, but most lack quantified deployment detail |
Baseten does not publish one normalized customer-count series, so the table uses workload and reference proxies rather than a single active-account KPI.
[CU012, CU013, CU014, CU015, CU016, CU017]| Customer | Segment | Deployment / use case | Production vs pilot | Outcome | Limitation |
|---|---|---|---|---|---|
| Writer | Enterprise AI platform | Serve custom 70B domain-specific LLMs with TensorRT-LLM on Baseten | Production | 60% higher tokens/sec, 23% lower TTFT, 35% lower cost per million tokens | No renewal term or contract value disclosed |
| OpenEvidence | Healthcare AI | Medical search and embeddings inference for clinicians | Production | Latency cut from >700ms to 160ms, 6x faster deployments, 8x+ less infrastructure maintenance | No public spend or account-expansion disclosure |
| Speechify | Voice / TTS | Host 10+ production model deployments across TTS, voice conversion, and parsing | Production | 44% lower cost per million characters, 30-50% lower p99 latency, 4.5x faster startup | No disclosed revenue concentration or contract duration |
| Gamma | Creative AI platform | Serve open-source image generation models at massive user scale | Production | 30%-80% faster generation, 20% better efficiency, 3M+ images/day | No disclosed retention or spend-per-user metrics |
| Superhuman | AI-native productivity | Deploy dozens of custom embedding models for core product features | Production | 80% lower P95 latency and rapid migration with zero user impact | No public seat-count or contract economics |
| Patreon | Creator economy platform | Serve Whisper transcription and captioning workloads | Production | 70% lower GPU cost, 440+ hours saved, nearly $600k annual savings | No public renewal or expansion metrics |
This is a partial public sample of Baseten deployments with quantified outcomes, not an exhaustive customer list.
[CU013, CU014, CU015, CU016, CU017, CU018]The public adoption path starts with technical evaluation, moves into one production workload, and only later expands into dedicated compute, governance, and larger enterprise commitments.
[CU003, CU036, CU037, CU039, CU040, CU041]Public proof is strongest where Baseten and the customer disclose concrete performance outcomes; durability visibility is weakest across every disclosed account.
[CU014, CU016, CU017, CU018, CU019, CU021]6.3 Durability, satisfaction, and expansion evidence
Durability evidence is directionally positive but incomplete. On the positive side, third-party reference aggregators show unusually strong public sentiment proxies: FeaturedCustomers reports a 4.8/5 reference score from 654 ratings, PeerSpot highlights collaboration and cost effectiveness, and individual customer quotes repeatedly describe Baseten as a winner on execution, uptime, or self-serve deployment. The product packaging also shows a plausible land-and-expand motion, from Basic pay-as-you-go usage into Pro dedicated compute and Enterprise self-hosted or region-restricted deployments. Healthcare and enterprise pages make that motion concrete by advertising HIPAA-sensitive workflows, single-tenant clusters, failover, and hands-on engineering support. The key limitation is that none of the reviewed public sources provide NRR, GRR, renewal cohorts, or contract duration, so expansion has to be inferred from packaging and customer quotes rather than measured account economics.[CU031, CU032, CU033, CU034, CU035, CU036]
| Metric / proxy | Value | Segment | Confidence | Diligence ask |
|---|---|---|---|---|
| Portfolio NRR / GRR / logo churn | All customers | Low | Request cohort retention, gross and net revenue retention, and churn by segment | |
| Contract duration / renewal cadence | All customers | Low | Request median contract length, renewal dates, and committed minimums by plan | |
| Third-party reference score | 4.8/5 from 654 reference ratings | Cross-customer public references | Medium | Validate how many ratings are current and attributable to post-2025 product surface |
| Qualitative review summary | PeerSpot highlights deployment speed, flexibility, and cost effectiveness | Cross-industry users | Low-medium | Ask for raw review count and more granular sentiment distribution |
| OpenEvidence testimonial | Vendor-vetting process ended with Baseten as a clear winner | Healthcare AI | Medium | Ask for renewal history and spend growth since migration |
| Speechify testimonial | Speechify says the partnership continues to grow and delivered highest uptime among inference providers it knows | Voice / TTS | Medium | Ask for uptime SLA, incident frequency, and contract-duration disclosure |
Public durability evidence is testimonial-heavy and lacks cohort metrics, so satisfaction proxies should not be mistaken for audited retention data.
[CU031, CU032, CU033, CU034, CU035, CU046]6.4 Concentration, switching, and competitive pressure
The main customer risk in the public record is not obvious churn but concentrated visibility. Baseten has more named accounts than just the six flagship stories, yet the quantified proof still sits inside a narrow band of AI-native software companies. That creates two diligence questions. First, there is no public disclosure of top-customer revenue share, contract lengths, or renewal rates, so investors cannot tell whether a handful of very large workloads dominate the book. Second, competitive pressure is real. HostFleet's April 2026 matrix shows Baseten as the most expensive listed option on multiple common GPUs, while Runpod's 2026 comparison ranks Baseten fifth and attributes materially faster headline cold starts to some rivals. WorkOS also describes a point where customers spending $10k-$50k per month start considering more control and lower-cost open-source options. Baseten counters that risk with open runtimes and no lock-in around customer models, but that also means portability tolerance will stay high. In other words, Baseten may be easier to adopt than a proprietary stack, but it must continuously re-win customers on operating performance and support rather than on captivity.[CU039, CU040, CU041, CU042, CU043, CU044]
| Expansion driver | Concentration risk / constraint | Impact | Diligence path |
|---|---|---|---|
| Basic → Pro → Enterprise packaging | Public plan ladder suggests upsell, but conversion rates are undisclosed | Supports land-and-expand if customers outgrow self-serve inference | Request plan-mix by customer cohort and expansion rates from Basic into Pro/Enterprise |
| Enterprise controls and self-hosted deployment | Could bias sales toward fewer, larger technical buyers and service-heavy motions | Helpful for regulated and sensitive workloads, but may increase top-account concentration risk | Request top-10 customers by ARR, workload volume, and deployment mode |
| Healthcare-specific compliance posture | Vertical concentration could deepen if healthcare becomes the dominant expansion vector | Regulated workloads may be sticky if compliance and reliability prove durable | Request healthcare revenue share, customer count, and renewal history |
| Flagship case-study concentration | Quantified public proof is concentrated in six stories and mostly AI-native software | Investors cannot infer portfolio durability from a narrow reference set | Request anonymized cohort statistics for the broader customer base |
| Premium public pricing | Higher published GPU prices and minimum deployment costs can increase switching pressure | Pricing could slow expansion for cost-sensitive workloads or make replatforming attractive | Request win/loss and churn reasons on price-sensitive accounts |
| Open-source model portability | Customers can increasingly switch models and bring more infra in-house as spend rises | Baseten must keep winning on speed, support, and economics rather than lock-in | Request data on customer tenure, self-host conversions, and workloads retained after optimization |
The strongest expansion signals are packaging and customer quotes, while the strongest risk signals are public-proof concentration and competitive price pressure.
[CU036, CU037, CU038, CU039, CU040, CU041]07Risks
7.1 Legal and regulatory risk centers on compliance scope, customer contracting, and expanding AI rules
Baseten's public trust posture is strong on its face: the company says it is SOC 2 Type II certified, HIPAA compliant, GDPR aligned, and able to run region-restricted or self-hosted deployments for sensitive workloads. The risk is that this marketing posture narrows materially once the legal stack is read in full. Baseten's security docs say it does not store model inputs or outputs by default and can enforce compliance policies, while the healthcare and enterprise pages market HIPAA-compliant infrastructure for mission-critical workloads. But the DPA embedded in Baseten's public terms says customers must not submit PHI and other restricted data unless otherwise agreed in writing, and it leaves customers responsible for legal basis, notices, and many breach-notification duties. That does not prove a defect; it does mean the public website alone is not enough to underwrite regulated use. Regulatory pressure is also moving beyond privacy into AI-governance and procurement. The European Commission's AI policy page highlights AI Act implementation, sector guidance, codes of practice, and a service desk meant to help businesses comply. For an inference vendor selling into healthcare and other regulated enterprise workloads, that raises the probability of longer diligence cycles around residency, documentation, model governance, and shared-responsibility boundaries even if Baseten is not the final application-layer decision maker. The top legal risk is therefore compliance-scope ambiguity rather than a known enforcement action. [CR001, CR002, CR003, CR008, CR009, CR010]
| Risk | Evidence | Likelihood | Severity | Mitigation maturity | Residual exposure | Diligence path |
|---|---|---|---|---|---|---|
| Healthcare compliance scope depends on signed overrides to the public Restricted Data carve-out | HIPAA-compliant marketing sits beside DPA language that bars PHI unless otherwise agreed in writing | Medium | High | Partial | High | Request signed BAA/HIPAA addendum and the exact list of permitted PHI flows |
| EU AI Act and GDPR implementation can slow regulated enterprise sales | European Commission guidance emphasizes AI Act implementation support, guidance, and sector adoption | Medium | High | Early | Medium-High | Review EU legal memo, residency controls, DPIA templates, and audit artifacts |
| Subprocessor updates create short objection windows and termination as the main public remedy | Baseten gives 15 days' notice and five days to object to new subprocessors | Medium | Medium | Basic | Medium | Review negotiated subprocessor notice rights and change-control process |
| Customer-side legal-basis and breach duties can increase deployment friction | The DPA leaves customers with lawful-basis, notices, and many notification obligations | Medium | Medium | Basic | Medium | Map the shared-responsibility matrix before production launch |
| Mission-critical positioning can be undermined by default contract language | Terms exclude time-critical or mission-critical use even as product pages market mission-critical inference | Medium | High | Low | High | Negotiate custom SLA and carve-out language for critical workloads |
Rows reflect publicly observable legal and regulatory risks as of 2026-05-30; severity ranks investment relevance rather than legal advice.
[CR003, CR008, CR009, CR010, CR011, CR012]7.2 Operational and security risk is defined by contract scope and visible incidents, not only uptime marketing
Baseten markets reliability aggressively. The enterprise, healthcare, dedicated inference, frontier gateway, and model API pages all promise four nines or 99.99% uptime, active-active redundancy, or highly reliable multi-cloud operations. The published SLA is narrower. It applies only to Dedicated Inference when Baseten is the hosting party and sets a 99.9% monthly availability target, with credits capped at 40% of monthly fees and recoverable only if the customer files within 24 hours. The terms then go further by stating the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. That creates a real diligence gap between product positioning and default contractual protection. Baseten's status page also shows this is not theoretical. The service reported multiple May 2026 incidents, including ongoing investigations, identified fixes, monitoring updates, and a major-outage marker in the 90-day view. Third-party monitoring adds only partial comfort because Servicealert says detailed incident data is unavailable and relies on reachability snapshots rather than full root-cause reporting. Baseten has shipped useful mitigations such as rolling deployments and deployment-health tooling, but reliability risk still belongs near the top of the chapter because the platform is sold into production workflows whose customers are highly sensitive to downtime and latency regression. [CR013, CR014, CR015, CR016, CR017, CR018]
| Failure mode | Likelihood | Severity | Mitigation maturity | Residual exposure | Unresolved gap |
|---|---|---|---|---|---|
| Reliability marketing exceeds the default contractual SLA | High | High | Partial | High | Need current custom SLA examples and service-credit terms for real enterprise deals |
| Control-plane or inference incidents recur during rapid product expansion | Medium-High | High | Partial | Medium-High | Need postmortems, Sev1 frequency, and MTTR data beyond the public status page |
| Outage credits are operationally weak because claims must be filed within 24 hours and are capped | Medium | Medium | Low | Medium | Need evidence that enterprise customers negotiate broader remedies |
| Compliance-policy or residency changes require Baseten support intervention | Medium | Medium | Partial | Medium | Need admin screenshots and change-control workflow evidence |
| Deployment regressions still occur despite new rollout controls | Medium | Medium | Medium | Medium | Need adoption data for rolling deployments and rollback success rates |
Residual exposure stays elevated because the public SLA and status data do not reveal customer-specific remedies, postmortems, or negotiated reliability terms.
[CR013, CR014, CR015, CR016, CR017, CR018]Operational, contractual, and pricing risks flow directly into trust, sales velocity, margin, and valuation.
[CR017, CR019, CR023, CR026, CR027, CR034]7.3 Dependency and commercial-model risk comes from upstream capacity, vendor chain complexity, and premium pricing
Baseten's core strategic answer to infrastructure risk is multi-cloud capacity management, cross-cloud autoscaling, single-tenant options, and the ability to run in the customer's cloud. Those are meaningful mitigations, but they do not eliminate dependence on cloud partners, GPU availability, and a long tail of third-party services. Nudge Security's public profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other vendors in Baseten's visible supply chain. Baseten's own product pages further promise access to the latest-generation GPUs, elastic capacity, and priority compute. That means upstream capacity and pricing remain foundational dependencies whether the platform presents them as a single managed surface or not. The commercial risk is that Baseten appears expensive relative to public peers. HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple GPU classes, while Runpod's 2026 comparison gives Baseten premium pricing and slower cold-start ranges than some alternatives. Baseten may still justify that premium through enterprise controls, support, and performance tuning, but if customers can reproduce acceptable latency and uptime on materially cheaper infrastructure, margin and win-rate risk rise quickly. This is why price-performance and supplier concentration belong alongside classic vendor-risk questions. [CR021, CR022, CR023, CR024, CR025, CR026]
| Dependency | Counterparty or surface | Role | Concentration | Failure scenario | Severity | Mitigation | Residual exposure |
|---|---|---|---|---|---|---|---|
| Latest-generation GPU capacity | Cloud/GPU suppliers | Powers premium inference and autoscaling promises | Unknown externally | Capacity tightens or prices rise, compressing margins and slowing onboarding | High | Multi-cloud routing and hybrid/self-host options | High |
| Visible SaaS control-plane vendors | AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, others | Support web, billing, monitoring, messaging, and operations | Medium | A single third-party outage degrades customer experience or internal ops | Medium | Vendor diversification and customer-cloud options | Medium |
| Premium price position | Runpod / Modal / Fal.ai / Replicate set public comparison set | Shapes customer willingness to pay | High | Peers offer acceptable performance at materially lower entry cost | High | Sell observability, support, security, and enterprise controls | High |
| Service-heavy enterprise delivery | Forward-deployed engineers and support teams | Customizes deployments to hit latency, throughput, and compliance goals | Medium | Support load scales faster than product automation | Medium-High | Standardize playbooks and self-service tooling | Medium-High |
| Status visibility | Baseten status page plus incomplete third-party monitoring | Primary external signal for uptime events | High | Public signals understate incident depth or recurrence | Medium | Request internal reliability dashboards and postmortems | Medium |
The company clearly mitigates concentration better than a single-cloud provider, but public evidence still leaves supplier mix and reserved-capacity exposure opaque.
[CR021, CR022, CR023, CR024, CR025, CR026]Baseten's risk surface depends on cloud and GPU partners, third-party SaaS vendors, enterprise controls, and customer-specific contracting.
[CR021, CR022, CR023, CR029, CR037, CR042]7.4 Execution and financial-model risk rise with product sprawl, rapid scaling, and valuation pressure
Baseten is no longer a narrowly scoped model-hosting startup. Public materials now cover Model APIs, Dedicated Inference, Frontier Gateway, model management, custom servers, chains, Training Jobs, and an early-access RL product called Loops. At the same time, the company raised a $300M Series E at a $5B valuation in January 2026 after multiple fundraises in the prior year. That capital lowers near-term financing risk, but it also raises the execution bar: investors and customers will expect the company to convert premium infrastructure positioning into repeatable enterprise growth without losing reliability or gross-margin discipline. The execution burden is amplified by staffing and go-to-market posture. Tracxn shows rapid employee growth, while Baseten's own site repeatedly emphasizes hands-on engineering support, forward-deployed expertise, custom SLAs, and deployment customization. That can be a differentiator for early enterprise expansion, but it also suggests a service-heavy operating model that may be hard to scale cleanly if product complexity, support demands, and customer-specific security asks keep expanding in parallel. The top people and execution question is therefore not whether Baseten has talent; it is whether the organization can maintain product quality, customer responsiveness, and economic discipline while broadening scope this quickly. External capital markets reinforce that risk: Modal said it raised $355 million at a $4.65 billion valuation after surpassing roughly $300 million in annualized revenue, Fireworks AI is already operating at roughly $315 million in annualized revenue on Sacra's estimates, RunPod shows that much lower-capital rivals can still compete on price, and CoreWeave's nearly $100 billion backlog shows how much well-financed infrastructure is chasing the inference opportunity.[CR030, CR031, CR032, CR033, CR034, CR035]
| Role or function | Dependency or gap | Likelihood | Severity | Mitigation | Diligence path |
|---|---|---|---|---|---|
| Executive team and operating cadence | Must translate rapid fundraising into durable enterprise execution at a $5B valuation | Medium | High | Fresh capital and marquee customers | Review 2026 plan, hiring targets, and product roadmap sequencing |
| Product and engineering | Multiple product lines are expanding simultaneously, including an early-access training layer | Medium-High | High | Shared inference stack and tooling | Ask for GA readiness criteria, bug backlog, and reliability ownership map |
| Customer success and support | Hands-on engineering support may be required to justify premium pricing | Medium | Medium-High | Forward-deployed expertise and Enterprise plan packaging | Request support ratios, escalation SLAs, and reference calls |
| Sales and security procurement | Enterprise controls are plan-gated and can require custom contracting | Medium | Medium | SSO/SCIM, self-hosting, custom SLAs | Review average sales cycle, security-review conversion, and healthcare close rates |
This register emphasizes execution scalability rather than founder-quality judgments; the central question is whether Baseten can widen scope without turning every major account into a custom services project.
[CR030, CR031, CR032, CR033, CR034, CR035]The highest residual risks are compliance-scope ambiguity, reliability contract gaps, premium pricing, and execution sprawl.
[CR010, CR019, CR023, CR026, CR034, CR041]7.5 Public mitigations are meaningful, but the investment case should remain gated by explicit kill criteria
Baseten does have tangible mitigations. Self-hosted and hybrid deployments reduce lock-in and residency risk, Truss improves portability, rolling deployments reduce cutover risk, SSO/SCIM strengthens access control, and the billing usage API gives customers a better handle on spend. Those are not superficial website badges; they are concrete product and operational features that can shrink risk if they are fully implemented in production accounts. Even the legal and regulatory risk is manageable if diligence confirms a clean BAA path, acceptable subprocessor controls, and contract terms that match the criticality of the workload. The key is to keep the underwriting discipline explicit. The thesis should break if Baseten cannot close the gap between public reliability marketing and executable SLA language, if price-performance remains far above peers without a quantified enterprise ROI offset, if supplier concentration turns out to be tighter than the multi-cloud story suggests, or if support-heavy sales prove necessary just to preserve baseline account health. The right posture is therefore conditional conviction: Baseten has credible mitigations, but the remaining evidence gaps are material enough that they should be converted into monitored triggers before a high-confidence investment view is allowed to stand. [CR020, CR029, CR037, CR038, CR039, CR040]
| Risk | Monitorable trigger | Threshold or event | Action implication |
|---|---|---|---|
| Healthcare compliance ambiguity | Signed BAA / HIPAA addendum availability | No executable BAA path or unclear PHI boundary for a top healthcare opportunity | Pause regulated-healthcare underwriting until contract evidence is produced |
| Reliability gap | Sev1 incident cadence and SLA terms | Two or more material incidents in 90 days or no negotiated remedy beyond the public SLA | Reduce conviction, require pricing holdback, or stop the process |
| Price competitiveness | Peer GPU economics and customer ROI | Baseten remains >2x public peer pricing on target GPU classes without quantified enterprise ROI offset | Downgrade margin and win-rate assumptions |
| Execution sprawl | GA readiness and support load | Training/Loops, gateway, and dedicated deployments all require heavy custom support to stay healthy | Treat the model as services-heavy rather than software-like |
| Supplier concentration | Capacity-booking evidence | Reserved capacity or supplier diversification is materially weaker than the multi-cloud narrative implies | Increase dependency discount and require contingency planning |
Kill criteria are deliberately monitorable so they can be tied to diligence outputs rather than generic caution.
[CR019, CR026, CR027, CR029, CR036, CR037]08Valuation
8.1 Recommendation: track the company at the closed price, but do not chase momentum pricing
Baseten looks like a high-quality company operating in one of the best parts of the AI stack, but the investment call is still constrained by what is and is not publicly provable. The cleanest valuation anchor is the closed January 2026 Series E: $300 million raised at a $5 billion valuation. That mark is real, recent, and corroborated across Baseten’s own announcement, Business Wire, Tech Funding News, and Tracxn. The harder question is whether public evidence supports treating that mark as attractive, fair, or already stretched. The answer is conditional rather than categorical. Sacra’s $600 million annualized-revenue estimate makes the closed round look plausibly supportable at about 8.3x implied revenue, especially against a comp set that includes MongoDB-like low-double-digit public multiples and Modal or Fireworks private multiples in the low-to-mid teens. But that same public record also shows how fragile the case is: Baseten’s disclosed pricing is premium, third-party pricing matrices say it is expensive on paper, and the mooted $11 billion follow-on would leap straight into premium-software territory without corresponding primary financial disclosure. That is not a setup for a clean buy call. The right posture is therefore track with medium confidence, high risk rating, and a stretched valuation stance. The company is worth staying close to because the market is real and the product is differentiated, but investors should insist on diligence that converts private enthusiasm into evidence before underwriting a richer entry price.[CV001, CV002, CV007, CV008, CV009, CV035]
| Recommendation | Confidence | Risk rating | Valuation stance | Decision implication |
|---|---|---|---|---|
| track | Medium | High | Stretched | Stay engaged at the closed $5B mark only with strict diligence; do not underwrite a step-up round on public evidence alone. |
The call is explicitly price-sensitive: it separates the closed Series E anchor from any hotter follow-on narrative.
[CV001, CV007, CV035, CV042, CV046]The current call stays at track because strong category and product signals are offset by revenue-opacity and valuation-step-up risk.
This is a reasoning map, not a weighted scoring model.
[CV001, CV030, CV031, CV042, CV046, CV047]8.2 The price is only defensible if revenue quality is real and premium pricing survives competition
The bull case starts with speed and timing. Baseten’s financing path moved from a $75 million Series C in February 2025 to a $150 million Series D at $2.15 billion in September 2025 and then to a $5 billion Series E in January 2026. Sacra’s estimate of $600 million in annualized revenue and Baseten’s own 100x inference-volume claim are directionally consistent with a company that hit a steep adoption curve just as inference moved from prototype infrastructure into production infrastructure. Public market and private comp work then gives that growth some context: Modal’s May 2026 round was struck at about 15.5x annualized revenue, Fireworks’ closed valuation works out near 12.7x, and MongoDB-like public infrastructure treatment is still around 10x. The anti-thesis is just as important. Baseten’s pricing page, HostFleet’s matrix, and Runpod’s comparison all point to a premium-priced service layer rather than a commodity endpoint vendor. That can be a strength if the premium buys uptime, support, compliance, and hybrid flexibility. It can also cap the multiple if the business is more support-heavy and lower-margin than premium public software names. Hyperscalers make the downside sharper: AWS, Google, and Azure can bundle model access, compute, governance, and credits inside broader cloud relationships. In other words, Baseten may deserve a premium to raw AI cloud, but it has not yet earned the disclosure quality that would let investors pay Cloudflare-like or Datadog-like premiums with confidence.[CV003, CV004, CV005, CV006, CV010, CV011]
| Argument | What supports it | What would change the view |
|---|---|---|
| Inference is becoming a production budget, not an experiment budget. | Technavio and Mordor both show large, fast-growing AI deployment markets, while Baseten’s financing pace shows capital chasing the theme. | Evidence of slower adoption, weaker enterprise conversion, or shrinking model-serving budgets would reduce the premium case. |
| Baseten may deserve a premium because it sells performance, hybrid control, and support rather than bare GPU time. | Baseten markets custom SLAs, self-hosting, priority GPUs, cross-cloud scale, and forward-deployed engineers. | If customers can replicate acceptable latency and uptime on cheaper alternatives, the premium becomes a liability rather than a moat. |
| The closed $5B round is plausible if the private revenue estimate is roughly right. | Sacra’s $600M run-rate estimate implies about 8.3x revenue, below many premium software comps. | Primary finance data that lands far below the estimate would break this support quickly. |
| The $11B narrative is not yet underwritten by public facts. | The only public support is third-party reporting of talks, not a closed financing or disclosed fundamentals. | A signed term sheet or closed announcement with clean terms and corroborated financials would materially improve the case. |
The table separates company quality from valuation quality; both have to work for a buy call.
[CV007, CV008, CV010, CV011, CV015, CV030]The revenue required to justify a $5B valuation changes sharply depending on which comparable multiple investors anchor to.
Each bar divides the $5B Series E mark by a selected comparable multiple; values are support thresholds, not forecasts.
[CV017, CV020, CV022, CV024, CV027, CV029]8.3 Comparable work and scenarios put $5B inside the base case but leave little room for error
The comparable set matters because Baseten sits between two valuation regimes. On one side are AI-cloud and infrastructure names like CoreWeave, where capital intensity is visible and public multiples are lower. On the other are premium infrastructure-software names like Datadog and Cloudflare, where disclosure quality, margins, and platform breadth allow much richer trading. Baseten’s best public revenue estimate places the closed $5 billion round between those regimes rather than squarely in either one. That is why the closed round is arguable while the rumored $11 billion step-up is not yet underwritten by public evidence. Scenario work makes the same point more concretely. A bear case that assumes only $300 million to $400 million of durable revenue support and a 7x to 9x multiple points to material down-round risk. A base case that uses $500 million to $650 million and 8x to 12x supports roughly $4 billion to $7.8 billion, which comfortably contains the closed Series E. A bull case requires both stronger revenue continuation and a multiple closer to Modal-like or upper private-market inference treatment. That means the current investment debate is not whether Baseten is interesting; it is whether an investor is being paid for the uncertainty between a defensible $5 billion mark and an aspirational $11 billion narrative.[CV016, CV017, CV018, CV019, CV020, CV021]
| Scenario | Assumptions | Valuation / return logic | Key risks | Probability signal |
|---|---|---|---|---|
| Bear | Durable revenue support is only $300M-$400M, premium pricing erodes, and independent vendor economics look more like infrastructure than software. | $2.1B-$3.6B using a 7x-9x range; the current mark would be vulnerable to a reset. | Price pressure, lower margin quality, or slower enterprise expansion expose down-round risk. | This case rises if diligence shows support-heavy delivery, concentrated revenue, or weak unit economics. |
| Base | Revenue support lands around $500M-$650M, growth remains strong, and Baseten keeps a moderate premium to raw AI cloud. | $4.0B-$7.8B using 8x-12x; the closed $5B Series E sits inside this band. | The call still depends on validating revenue quality and gross margin, not just topline growth. | This is the most defensible case on today’s public evidence. |
| Bull | Revenue support reaches $700M-$900M, premium economics hold, and investors keep paying Modal-like or better private inference multiples. | $8.4B-$14.4B using 12x-16x; an $11B step-up becomes possible. | The upside depends on sustained hypergrowth and a premium-quality margin/disclosure profile that is not public yet. | This case requires more proof than the current public record supplies. |
Ranges are scenario outputs, not point estimates, and are designed to show how quickly the underwriting shifts when revenue support or multiple choice changes.
[CV035, CV036, CV043, CV044, CV045]| Comparable | Valuation context | Revenue context | Implied multiple | Relevance to Baseten | Limitation |
|---|---|---|---|---|---|
| Baseten closed Series E | $5.0B post-money (Jan 2026) | ~$600M annualized revenue estimate | ~8.3x | Direct anchor for the current underwriting debate. | Revenue support is third-party-estimated, not company-disclosed. |
| Modal | $4.65B post-money (May 2026) | ~$300M annualized revenue | ~15.5x | Closest premium private comp for elastic inference infrastructure. | Not the same mix of enterprise support, compliance, or customer base. |
| Fireworks AI | $4.0B post-money (Oct 2025 closed) | ~$315M annualized revenue estimate in Feb 2026 | ~12.7x | Relevant private inference comp with explicit gross-margin discussion. | Revenue and margin are also third-party estimates, not audited disclosure. |
| CoreWeave | $59.75B market cap (May 2026) | ~$12.5B 2026 revenue context / guide midpoint proxy | ~4.8x | Useful pure-play AI cloud floor for capital-intensive infrastructure. | Much larger scale, debt profile, and business model than Baseten. |
| Datadog | $88.04B market cap (May 2026) | ~$4.32B 2026 revenue guide midpoint | ~20.4x | Premium public infrastructure-software benchmark for disclosed growth quality. | Observability software carries better margin and disclosure quality than Baseten’s public record. |
| Cloudflare | $85.47B market cap (May 2026) | ~$2.33B trailing revenue | ~36.7x | Upper-bound developer-platform multiple reference. | Category leadership and public-company maturity are far stronger than Baseten’s today. |
| MongoDB | $27.01B market cap (May 2026) | ~$2.60B trailing revenue | ~10.4x | Lower-middle public infrastructure-software reference. | Database economics and installed base are not the same as inference infrastructure. |
This is a partial but intentionally broad sample spanning private inference peers, AI cloud, and public infrastructure software to bracket what the market could plausibly pay.
[CV016, CV017, CV019, CV020, CV021, CV022]Public evidence places the closed $5B round inside the base case, while a rumored $11B round requires bull-case assumptions.
Scenario bands combine revenue-support ranges and comp-multiple bands; the rumored follow-on is shown as an external signal, not an endorsed anchor.
[CV008, CV043, CV044, CV045]Baseten scores well on market tailwind and product differentiation, but much lower on disclosure quality and margin certainty.
Scores are directional IC-style judgments based only on retained public evidence as of the run date.
[CV025, CV030, CV031, CV040, CV041, CV042]8.4 The thesis should be gated by terms, margin evidence, and concentration—not by enthusiasm alone
The final call hinges on a small number of diligence items that can move valuation quickly. First, investors need management-grade revenue evidence. If the company is truly around a $600 million annualized run-rate with strong expansion and acceptable concentration, the closed $5 billion round starts to look reasonable and the next mark becomes debatable rather than fanciful. If the real number is materially lower, the same pricing and comp work flips from “defensible premium” to “overextended late-stage mark.” Second, investors need direct margin data. Fireworks’ roughly 50% gross margin is a useful peer reminder that inference businesses are not pure software. Baseten’s premium pricing only deserves a premium multiple if utilization, support load, and reserved-capacity economics produce better margin quality than that analogy implies. Third, investors need the terms underneath the headline price. Preference overhang, secondaries, and customer concentration can matter more than the post-money headline. This is why the right kill triggers are practical rather than rhetorical: if Baseten cannot preserve premium price-performance with acceptable margin, if growth slips below the base-case band, or if any new round clears only with aggressive terms, the thesis should be downgraded quickly. Until those questions are closed, the company deserves active tracking and structured diligence rather than a high-conviction price-insensitive buy decision.[CV023, CV025, CV032, CV033, CV034, CV039]
| Trigger | Threshold / signal | Transmission to thesis | Action implication |
|---|---|---|---|
| Revenue proof breaks | Management-grade run-rate lands materially below $500M or growth decelerates sharply from the public narrative. | The closed $5B round falls out of the base-case band and starts looking like a late-cycle mark. | Downgrade the recommendation and reset valuation work to the bear-case range. |
| Margin quality disappoints | Gross margin and utilization look closer to infrastructure resale economics than premium software economics. | Premium software multiples no longer fit the business model. | Apply a lower comp set and require a meaningfully better entry price. |
| Price-performance edge erodes | Customers can achieve acceptable production results on cheaper alternatives or bundled hyperscaler products. | Baseten’s premium pricing turns from moat into adoption friction. | Cut conviction and revisit long-term market-share assumptions. |
| Aggressive financing terms appear | A new round clears only with heavy preferences, secondary-heavy structures, or unusual protections. | Headline valuation stops mapping cleanly to common-equity upside. | Treat the mark as structurally weaker and rework return expectations. |
| Concentration emerges | A handful of AI-native accounts drive an outsized share of revenue without corresponding retention evidence. | The company’s revenue quality and durability become much less attractive than the headline growth rate. | Pause any high-conviction call until concentration and expansion data are clarified. |
These triggers are designed to be monitorable and directly tied to valuation support, not generic operating caution.
[CV012, CV013, CV014, CV039, CV042, CV043]| Topic | Missing evidence | Why it matters | Owner or diligence path |
|---|---|---|---|
| Revenue bridge | Monthly and quarterly revenue, ARR, and cohort expansion through the current quarter. | This is the single biggest determinant of whether $5B is fair or already stretched. | CFO / finance package and board materials. |
| Gross margin and utilization | Gross margin by product surface, GPU utilization, reserved-capacity mix, and support burden. | The multiple only deserves to converge toward premium software if margin quality is real. | Infra + finance deep dive with product-line detail. |
| Cap table and terms | Fully diluted share count, liquidation preferences, secondaries, and any structured terms. | Headline post-money can overstate common-equity upside. | Legal + finance review of latest financing docs. |
| Customer concentration and retention | Top-customer exposure, NRR, logo retention, and vertical mix across AI-native and enterprise accounts. | A premium multiple is fragile if spend is concentrated or non-repeatable. | Sales ops and customer-success cohort review. |
| Step-up financing evidence | Signed term sheet or closed-round proof for any valuation above $5B. | A rumored mark should not replace a closed anchor in underwriting. | Board process review and direct financing confirmation. |
The asks are ranked by how quickly they can change valuation support rather than by general company importance.
[CV025, CV037, CV046, CV047, CV048]Disclaimer
This report was produced by an automated research workflow using publicly available information as of 2026-05-30. It is not investment advice. Private-company data may be incomplete, stale, or estimated, and investors should supplement this report with management diligence, contractual review, and direct access to financial materials before making any investment decision.
Evidence index
| ID | Statement | Confidence | Sources |
|---|---|---|---|
| CO001 | Baseten was founded in 2019. | High | SO009, SO016, SO018 |
| CO002 | Baseten is based in San Francisco. | High | SO008, SO014, SO016 |
| CO003 | Baseten’s legal entity is Baseten Labs, Inc., and its privacy policy lists 201 Spear St, Suite 1600, San Francisco, CA 94105. | High | SO007, SO008 |
| CO004 | Baseten currently presents itself as an inference company built around high-performance production inference. | High | SO001, SO003, SO014 |
| CO005 | Official product surfaces show that Baseten combines production inference, model APIs, and training workflows in one platform. | Medium | SO001, SO005 |
| CO006 | Baseten sells cloud, self-hosted, and region-aware deployment options aimed at customers that need control over security or data residency. | High | SO003, SO004, SO005 |
| CO007 | Baseten says it is SOC 2 Type II and HIPAA compliant across its hosting options. | High | SO003, SO004, SO005 |
| CO008 | Baseten’s careers page, customer hub, and Series E press release name customers such as Abridge, Cursor, OpenEvidence, Speechify, Gamma, Clay, Notion, and Lovable. | High | SO006, SO002, SO014 |
| CO009 | The founders say they started Baseten at the end of 2019 to solve model-deployment and ML-infrastructure pain they had experienced themselves. | Medium | SO009 |
| CO010 | Tuhin Srivastava is publicly identified as CEO and co-founder. | High | SO031, SO015 |
| CO011 | Amir Haghighat is publicly identified as CTO and co-founder. | High | SO032, SO015 |
| CO012 | Phil Howes is publicly identified as a co-founder, and Tech Funding News describes him as chief scientist. | Medium | SO034, SO015 |
| CO013 | Pankaj Gupta is publicly identified as a co-founder. | Medium | SO033, SO015 |
| CO014 | Baseten’s Series E announcement is signed by Amir, Pankaj, Phil, and Tuhin, showing that all four founders still anchor the public leadership narrative. | Medium | SO013 |
| CO015 | Public governance visibility is limited in the fetched corpus, but Series D explicitly says Jay Simons joined Baseten’s board. | Medium | SO012, SO016 |
| CO016 | By the Series A milestone, Baseten said it had raised a little over $20 million across seed and Series A, with Greylock leading the Series A and South Park Commons, Lachy Groom, and Ray Tonsing also involved. | Medium | SO009 |
| CO017 | Tracxn and the archived PitchBook profile both place Baseten’s Series A on 2022-04-26. | Medium | SO016, SO018 |
| CO018 | Baseten’s Series B added $40 million led by IVP and Spark, with Greylock, South Park Commons, Lachy Groom, and Base Case also participating. | Medium | SO010 |
| CO019 | Tracxn records the Series B round date as 2024-03-04. | Medium | SO016 |
| CO020 | Baseten’s Series C raised $75 million on 2025-02-19 and was backed by IVP, Spark, Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, and Dick Costolo. | High | SO011, SO016 |
| CO021 | The Series C post says Baseten was already running workloads across thousands of GPUs and serving millions of end customers worldwide by early 2025. | Medium | SO011 |
| CO022 | Baseten’s Series D raised $150 million, was led by BOND, and brought CapitalG and Conviction into the round alongside prior investors. | High | SO012, SO016 |
| CO023 | CB Insights and Tracxn both peg Baseten’s September 2025 valuation at about $2.15 billion. | Medium | SO017, SO016 |
| CO024 | Series D linked financing to governance by adding Jay Simons to the board. | Medium | SO012 |
| CO025 | Baseten’s Series E raised $300 million at a $5 billion valuation, led by IVP and CapitalG with NVIDIA and several prior investors participating. | High | SO014, SO013, SO015 |
| CO026 | Tracxn and CB Insights both list the Series E closing date as 2026-01-20. | Medium | SO016, SO017 |
| CO027 | BusinessWire says Baseten has raised $585 million to date and that the Series E financing was its third fundraise in the prior year. | High | SO014, SO016, SO017 |
| CO028 | BusinessWire describes Baseten as infrastructure behind AI products including Cursor, Mercor, Clay, OpenEvidence, Lovable, and Abridge. | Medium | SO014 |
| CO029 | Official enterprise and healthcare pages market four-nines reliability, multi-cloud autoscaling, and region-restricted or self-hosted deployment options for sensitive workloads. | High | SO003, SO004, SO001 |
| CO030 | NVIDIA’s case study says Baseten reduced cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. | Medium | SO027 |
| CO031 | OpenEvidence’s case study says Baseten now serves billions of requests per week for OpenEvidence and reduced end-to-end latency from over 700 milliseconds to 160 milliseconds. | Medium | SO023 |
| CO032 | Gamma says it generates roughly 3 million images per day on Baseten for 70+ million users and more than $100 million of ARR. | Medium | SO024 |
| CO033 | Speechify says Baseten cut its cost per million characters by 44% while supporting a 60M+ user base. | Medium | SO025 |
| CO034 | Patreon says Baseten saved more than 440 developer hours per year and cut GPU costs by 70% for Whisper-based workloads. | Medium | SO026 |
| CO035 | A January 2026 WorkOS interview says Baseten had just launched a startup program for seed and Series A companies and saw voice as an emerging modality. | Medium | SO028 |
| CO036 | Current headcount is not cleanly supportable because PitchBook and Tracxn show conflicting employee counts and entity-level staff figures. | Low | SO018, SO016 |
| CO037 | ServiceAlert’s 90-day monitor showed May 2026 at 100% uptime and zero days with issues, but it also says it only tracks daily worst-status reachability and lacks detailed incident data. | Medium | SO030 |
| CO038 | Nudge Security frames Baseten as a vendor-risk review target and lists security badges and SSO or MFA features, but it is an external aggregator rather than Baseten’s primary trust documentation. | Medium | SO029, SO007 |
| CO039 | Abridge describes itself as enterprise-grade AI for clinical conversations used by large healthcare systems, consistent with Baseten’s official claim that healthcare AI is a core customer segment. | Medium | SO019, SO006 |
| CO040 | Cursor says it is trusted by over half of the Fortune 500, supporting Baseten’s official claim to serve category-defining AI applications rather than only hobby use cases. | Medium | SO021, SO014 |
| CO041 | Clay says more than 500,000 GTM teams use its platform, and Baseten’s Series E materials identify Clay as a customer. | Medium | SO020, SO014 |
| CO042 | OpenEvidence positions itself as America’s Official Medical Knowledge Platform with major medical-content partners, and Baseten names it as a customer in both careers and Series E materials. | Medium | SO022, SO006, SO014 |
| CO043 | External market-data sources classify Baseten as a private Series E company. | Medium | SO016, SO018 |
| CO044 | Baseten’s pricing page shows a pay-as-you-go model with token-priced model APIs and per-minute compute pricing for GPU and CPU instances. | Medium | SO005 |
| CO045 | Baseten’s terms define the service as a platform for deploying machine learning models and building or operating applications for machine learning through a web interface. | Medium | SO007 |
| CO046 | The Series A post says Baseten quietly announced its first product in May after more than 18 months of building and used the funding moment to launch a public beta. | Medium | SO009 |
| CM001 | Baseten positions itself as a platform for high-performance inference in production rather than as a foundation-model creator. | Medium | SM001, SM003, SM006 |
| CM002 | Baseten's product surface now spans Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training. | Medium | SM005, SM006, SM007, SM015, SM016 |
| CM003 | The included spend in Baseten's core market is model-serving runtime, autoscaling, observability, billing or metering, and associated performance support attached to production inference. | Medium | SM002, SM006, SM007, SM016, SM018 |
| CM004 | The excluded spend is frontier-model R&D and the broader data or analytics stack that hyperscaler AI suites bundle but Baseten does not foreground. | Medium | SM015, SM027, SM028, SM029 |
| CM005 | Baseten competes inside overlapping categories of AI inference-as-a-service, broader AI inference, and enterprise AI platform software rather than a single cleanly defined market. | Medium | SM019, SM020, SM021 |
| CM006 | Status-quo substitutes include hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. | Medium | SM022, SM023, SM024, SM027, SM028, SM029 |
| CM007 | Baseten's positioning emphasizes open-source and custom model deployment rather than ownership of closed frontier models. | Medium | SM005, SM006, SM014 |
| CM008 | Training is an adjacency for Baseten, but the commercial center of gravity remains inference and the train-to-deploy loop that feeds inference endpoints. | Medium | SM001, SM006, SM015 |
| CM009 | Technavio values the AI inference-as-a-service market at USD 85.25 billion in 2025. | Medium | SM019 |
| CM010 | Technavio forecasts a 22.1% CAGR for AI inference-as-a-service during 2026-2030. | Medium | SM019 |
| CM011 | Technavio says the GPU segment accounts for more than 58% of the AI inference-as-a-service market and that cloud deployment dominates the category. | Medium | SM019 |
| CM012 | Technavio says North America contributes 41.1% of forecast growth in AI inference-as-a-service. | Medium | SM019 |
| CM013 | Mordor Intelligence puts the enterprise AI market at USD 114.87 billion in 2026 and projects 18.91% CAGR through 2031. | Medium | SM020 |
| CM014 | Mordor says software and platforms led 65.89% of 2025 enterprise AI revenue. | Medium | SM020 |
| CM015 | Mordor says cloud solutions accounted for 67.33% of 2025 enterprise AI revenue while hybrid and edge configurations are the faster-growing deployment path. | Medium | SM020 |
| CM016 | Large enterprises accounted for 71.43% of 2025 enterprise AI spending in Mordor's market model. | Medium | SM020 |
| CM017 | Healthcare and life sciences are Mordor's fastest-growing enterprise AI vertical at 20.77% CAGR. | Medium | SM020 |
| CM018 | Fortune Business Insights values the broader AI inference market at USD 117.80 billion in 2026, up from USD 103.73 billion in 2025. | Medium | SM021 |
| CM019 | Fortune forecasts 12.98% CAGR to 2034 and says North America held 41.78% of the AI inference market in 2025. | Medium | SM021 |
| CM020 | Across public lenses, Baseten's addressable opportunity is clearly large but scope-sensitive: roughly USD 85 billion for inference-as-a-service today and more than USD 100 billion for adjacent inference or enterprise AI platform categories. | Medium | SM019, SM020, SM021 |
| CM021 | Baseten's best-evidenced buyer groups are performance-sensitive AI product teams, enterprise AI infrastructure teams, and model labs monetizing APIs. | Medium | SM003, SM010, SM016 |
| CM022 | Gamma shows a PLG or self-serve segment that values low latency and open-source model serving without building an internal ML infrastructure team. | Medium | SM012 |
| CM023 | OpenEvidence shows a regulated healthcare segment that wanted reliable performance, redundancy, and flexible compute without multi-year GPU commitments. | Medium | SM011 |
| CM024 | Writer shows enterprise model teams serving 70B models need multi-GPU performance engineering and secure deployments. | Medium | SM013 |
| CM025 | The daily users of Baseten are ML engineers, data scientists, and application engineers, while procurement, security, and IT administrators become stakeholders once deployments require identity, policy, or compliance controls. | Medium | SM009, SM014, SM017 |
| CM026 | Budget ownership appears to begin in product or engineering budgets for usage-based experimentation and shift toward central platform or IT budgets for quoted Pro, Enterprise, or self-hosted deployments. | Medium | SM002, SM003, SM017, SM018 |
| CM027 | Baseten's adoption path commonly starts with Model APIs or simple deployments and expands to Dedicated Inference and Chains as traffic, hardware specialization, or compound workflows grow. | Medium | SM001, SM005, SM006, SM007 |
| CM028 | Frontier Gateway creates a separate buyer motion for labs that need white-labeled APIs, rate limits, token metering, and billing without building their own inference control plane. | Medium | SM016 |
| CM029 | Baseten productizes compliance with HIPAA, SOC 2 Type II, region restrictions, dedicated namespaces, and a no-shared-GPU posture. | Medium | SM003, SM004, SM009 |
| CM030 | Baseten positions self-hosted, hybrid, and cloud deployments as ways to meet data residency, security, and existing cloud-commitment requirements. | Medium | SM002, SM003, SM006, SM008 |
| CM031 | Baseten's Model APIs are OpenAI-compatible and are marketed as 5-10x cheaper than closed alternatives. | Medium | SM005 |
| CM032 | Dedicated Inference is marketed as delivering 6x better GPU utilization and 5-10x lower costs at scale. | Medium | SM006 |
| CM033 | Chains is marketed as giving compound AI teams 6x better GPU usage and roughly half the latency through hardware-aware orchestration. | Medium | SM001, SM007 |
| CM034 | Baseten's value proposition is not just compute rental; customer stories repeatedly emphasize outsourced performance engineering and forward-deployed support. | Medium | SM003, SM011, SM012, SM013 |
| CM035 | OpenEvidence reported 78% lower latency, 6x faster deployments, and 8x-plus lower infrastructure maintenance time after moving to Baseten. | Medium | SM011 |
| CM036 | Gamma reported 30-80% faster image generation, 20% better efficiency, and scaling to 70+ million users and about 3 million images per day on Baseten. | Medium | SM012 |
| CM037 | Writer reported 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. | Medium | SM013 |
| CM038 | Enterprise AI growth is being driven by automation demand, exploding data volumes, cloud AI services, and specialized hardware advances. | Medium | SM020 |
| CM039 | AI inference demand is also being driven by real-time processing needs, generative AI workloads, and edge or IoT expansion. | Medium | SM019, SM021 |
| CM040 | Hardware supply constraints, high accelerator prices, and tariff pressure are material market constraints for both inference providers and buyers. | Medium | SM019, SM021 |
| CM041 | Talent shortages and legacy-system integration complexity remain major barriers to enterprise AI rollout. | Medium | SM020, SM021 |
| CM042 | Multi-cloud capacity management and the ability to avoid long-term GPU commitments address a real buyer pain point around capacity risk and demand spikes. | Medium | SM008, SM011, SM016 |
| CM043 | HostFleet's April 2026 matrix shows Baseten's published hourly GPU prices above Runpod, Modal, and Replicate on like-for-like SKUs such as T4, L4, A100, and H100. | Medium | SM026 |
| CM044 | Runpod's own 2026 comparison ranks Baseten fifth and attributes 8-12 second cold starts to Baseten while highlighting cheaper or faster specialist alternatives. | Medium | SM025 |
| CM045 | Hyperscaler substitutes bundle model deployment with broader data, notebook, governance, and agent tooling rather than pure inference specialization. | Medium | SM027, SM028, SM029 |
| CM046 | Baseten's clearest public beachheads are high-performance consumer AI products, regulated healthcare workloads, and model labs monetizing proprietary models. | Medium | SM011, SM012, SM013, SM016 |
| CM047 | Public pricing and packaging imply Baseten trades a higher headline GPU rate for bundled performance engineering, observability, security, and managed support. | Medium | SM002, SM003, SM016, SM026 |
| CM048 | Public sources do not isolate a clean Baseten-specific SAM or SOM because available estimates mix enterprise AI, inference infrastructure, cloud, edge, and model-serving categories. | Medium | SM019, SM020, SM021 |
| CM049 | Public material does not disclose Baseten's contract sizes, attach rates for support, or revenue mix across Model APIs, Dedicated Inference, Training, and Frontier Gateway. | Medium | SM002, SM016, SM018 |
| CP001 | Baseten's current product surface spans custom-model deployment, Model APIs, training, Chains orchestration, and Frontier Gateway. | High | SP003, SP005, SP006, SP007, SP010 |
| CP002 | Baseten supports Baseten Cloud, single-tenant or self-hosted deployments, and multi-cloud capacity or cross-cloud autoscaling. | High | SP001, SP003, SP004, SP009 |
| CP003 | Baseten's public pitch centers on speed, uptime, and developer experience instead of lowest-cost raw GPU capacity. | High | SP001, SP008, SP009, SP029 |
| CP004 | Modal positions as Python-first serverless AI infrastructure with instant autoscaling to 1000+ GPUs and built-in observability. | High | SP014, SP015 |
| CP005 | Replicate positions around one-line APIs, community-published models, fine-tuning, and custom deployment through Cog. | High | SP016, SP017 |
| CP006 | Runpod offers Pods, Serverless, and Clusters, emphasizing fast scaling, many GPU SKUs or regions, and low-cost capacity. | High | SP018, SP019, SP020 |
| CP007 | AWS SageMaker, Google Vertex AI, and Azure ML each market broader end-to-end ML or AI lifecycle tooling with strong enterprise controls. | High | SP021, SP023, SP024 |
| CP008 | Internal build remains a real substitute because Truss packages models portably and can narrow the software gap between local, self-managed, and hosted deployments. | High | SP011, SP003 |
| CP009 | Frontier Gateway lets model labs ship white-labeled APIs with per-user keys, rate limits, and metering, widening Baseten's competitor set to lab-facing platforms. | Medium | SP010, SP029 |
| CP010 | PitchBook and Tracxn independently name Modal and Replicate among Baseten's comparable competitors, supporting the direct-peer set beyond vendor marketing. | High | SP027, SP026 |
| CP011 | Baseten's public plans split into Basic pay-as-you-go, quote-driven Pro, and Enterprise, with priority compute, dedicated compute, self-host, and custom SLAs above Basic. | High | SP002, SP009 |
| CP012 | Baseten says customers do not pay for idle time and only pay while models are deploying, scaling, or processing on the platform. | Medium | SP002 |
| CP013 | Baseten advertises SOC 2 Type II, HIPAA compliance, and no default storage of model inputs or outputs. | High | SP002, SP004, SP009 |
| CP014 | Baseten's runtime layers open-source engines such as TensorRT-LLM, SGLang, vLLM, TGI, and TEI with custom optimizations like speculative decoding and KV-cache management. | High | SP003, SP008 |
| CP015 | Baseten Model APIs are OpenAI-compatible and can move from shared APIs to dedicated deployments on Baseten-managed hardware. | High | SP005, SP003 |
| CP016 | Modal's public pricing offers $30 per month in Starter credits, 10 GPU concurrency on Starter, and 50 GPU concurrency on Team at $250 per month plus compute. | Medium | SP015 |
| CP017 | Replicate private models usually bill for setup, idle, and active time on dedicated hardware, making always-warm custom deployments costlier than pure scale-to-zero billing. | High | SP017, SP016 |
| CP018 | Runpod Secure Cloud and Serverless publish materially lower raw GPU list prices than Baseten's public per-GPU pricing for comparable capacity tiers. | Medium | SP019, SP020, SP002, SP025 |
| CP019 | Runpod Serverless bills per second from worker start until full stop, with flex workers scaling to zero and active workers remaining on. | High | SP020, SP018 |
| CP020 | AWS Bedrock prices open-model access by provider or model and service tier, and its batch inference option is listed at 50% below on-demand pricing. | Medium | SP022 |
| CP021 | Google Vertex AI prices tools, compute, storage, and management fees separately rather than offering simple public list GPU rates. | Medium | SP023 |
| CP022 | Azure ML charges no standalone platform fee but bills the Azure services consumed around training, deployment, storage, and monitoring. | Medium | SP024 |
| CP023 | Baseten's multi-cloud and self-host options reduce buyer fear of cloud lock-in, but they also make it easier for customers to migrate away from Baseten later. | High | SP001, SP009, SP011 |
| CP024 | Baseten's public trust posture is stronger than most self-serve peers because it combines compliance claims with single-tenant and self-host deployment modes. | High | SP002, SP004, SP009 |
| CP025 | Hyperscalers retain the strongest distribution power because Bedrock or SageMaker, Vertex AI, and Azure ML sit inside existing identity, billing, and procurement relationships. | High | SP021, SP023, SP024 |
| CP026 | Modal narrows the enterprise gap with marketplace transacting, SSO, audit logs, and HIPAA on Enterprise, but its public package is still compute-led rather than inference-specific governance. | Medium | SP014, SP015 |
| CP027 | Replicate minimizes adoption friction for prototypes through community models and simple APIs, but its public materials disclose less enterprise control than Baseten's. | Medium | SP016, SP017, SP009 |
| CP028 | Runpod explicitly markets no lock-in, low cost, and fast scale, making it attractive to cost-sensitive teams comfortable assembling their own serving stack. | High | SP018, SP019, SP020 |
| CP029 | Open-source packaging via Truss and Cog plus raw GPU clouds make multi-homing structurally easier in this market than in closed-model or data-platform markets. | High | SP011, SP017, SP019 |
| CP030 | Baseten's expansion into training and lab-facing gateway products moves it from pure hosting into a broader AI infrastructure platform category. | High | SP006, SP010, SP029 |
| CP031 | Baseten's main moat is the integration of optimized runtimes, multi-cloud capacity, enterprise deployment modes, and hands-on engineering support rather than exclusive model ownership. | High | SP008, SP009, SP029 |
| CP032 | Truss can create developer pull and portability at the same time, so it is both a funnel asset and a limiter on hard lock-in. | High | SP011, SP003 |
| CP033 | HostFleet's April 2026 matrix shows Baseten as the highest published per-GPU-hour option among the compared serverless hosts on multiple common GPU tiers. | Medium | SP025, SP002, SP015, SP017, SP019 |
| CP034 | The same HostFleet comparison still argues Baseten is attractive for production workloads because Truss, observability, and support are tangible despite higher headline pricing. | Medium | SP025, SP002, SP003 |
| CP035 | Baseten's public status page reports 99.91% uptime for Model APIs over the displayed window and records multiple May 2026 incidents. | Medium | SP012 |
| CP036 | Servicealert's independent outage tracker also shows non-perfect recent availability for Baseten, reinforcing that reliability remains a diligence item. | Medium | SP013, SP012 |
| CP037 | Sacra identifies hyperscaler bundling and below-market pricing as the clearest external threat to independent inference platforms like Baseten. | Medium | SP028, SP021, SP023, SP024 |
| CP038 | Business Wire and TechFundingNews both frame Baseten's current strategic battleground as production inference infrastructure rather than frontier-model training ownership. | Medium | SP029, SP030 |
| CP039 | Business Wire says Baseten has raised $585 million and counts NVIDIA, IVP, and CapitalG among key investors, improving its staying power in a capital-intensive market. | Medium | SP029, SP028, SP026 |
| CP040 | Baseten's best-supported positioning is premium, production-grade open-model inference for teams that value performance, portability, and support more than lowest-cost GPU hours. | High | SP003, SP008, SP009, SP025, SP028 |
| CI001 | Baseten's public monetization surfaces span dedicated deployments, Model APIs, and Training. | High | SI001, SI003, SI004, SI007 |
| CI002 | Baseten's public plan structure is Basic at $0 per month pay-as-you-go, with Pro and Enterprise sold via quote. | Medium | SI001 |
| CI003 | Pro includes priority access to high-demand GPUs, dedicated compute, higher Model API rate limits, hands-on engineering expertise, dedicated Slack and Zoom support, and volume discounts. | Medium | SI001 |
| CI004 | Enterprise includes custom SLAs, self-host deployments, use of existing cloud commitments, full control over data residency, and advanced RBAC with Teams. | Medium | SI001, SI005 |
| CI005 | Baseten publishes Model API list pricing per 1 million tokens with separate columns for input, cached input, and output. | High | SI001, SI003 |
| CI006 | Dedicated deployments are billed only for compute used, down to the minute. | Medium | SI001 |
| CI007 | Baseten says customers do not pay for idle time, but do pay while a model is deploying, scaling up or down, or making predictions. | Medium | SI001 |
| CI008 | Baseten sells Training both as Loops early access and as generally available Training Jobs, with a direct train-to-deploy path into production inference. | Medium | SI004 |
| CI009 | Baseten's Terms state that fees are billed at the end of the month and payable within 30 days unless an Order says otherwise. | Medium | SI013 |
| CI010 | Baseten's Terms make the Order the binding commercial instrument, so enterprise economics can vary contract by contract even though list pricing is public. | Medium | SI013, SI005 |
| CI011 | The billing usage API returns separate dedicated_usage, training_usage, and model_apis_usage blocks with subtotals, credits used, totals, and daily breakdowns. | Medium | SI007 |
| CI012 | The model_apis_usage block reports model name plus input, output, and cached input token counts. | Medium | SI007 |
| CI013 | The dedicated_usage block reports billable resource metadata, minutes, subtotal, and inference request counts. | Medium | SI007 |
| CI014 | Baseten explicitly monetizes support and engineering help through Pro, Enterprise, and enterprise deployment offers. | High | SI001, SI002, SI005 |
| CI015 | Dedicated Inference claims Baseten regularly sees 6x better GPU utilization and 5-10x lower costs powered by its inference stack. | Medium | SI002 |
| CI016 | The Model APIs page claims Baseten can spend 5-10x less than closed alternatives when serving optimized frontier open models. | Medium | SI003 |
| CI017 | The Enterprise page frames Baseten's economic advantage as higher output and better GPU utilization from optimized runtimes rather than seat-based software pricing. | Medium | SI005 |
| CI018 | The Healthcare page says per-minute billing and scale-to-zero make GPU costs scale with active inference rather than idle overhead. | Medium | SI006 |
| CI019 | Writer reports 35% lower cost per million tokens, 60% higher tokens per second, and 23% lower time to first token on Baseten. | Medium | SI016 |
| CI020 | OpenEvidence reports 78% lower latency, 6x faster deployment processes, 8x+ lower infrastructure maintenance time, and flexible access to compute without multi-year contracts. | Medium | SI017 |
| CI021 | Speechify reports 44% lower cost per million characters, 30-50% lower p99 latency, and 4.5x faster replica startup after migrating to Baseten. | Medium | SI018 |
| CI022 | Superhuman reports 80% lower P95 latency and says Baseten freed multiple engineers from building and running inference infrastructure in-house. | Medium | SI019 |
| CI023 | Patreon reports 440+ hours of development time saved per year, $600,000 of resources saved per year, and 70% GPU-cost savings on Baseten. | Medium | SI020 |
| CI024 | Taken together, Baseten's customer proofs sell lower total production cost and faster deployment for serious workloads rather than the lowest raw GPU-hour list price. | Medium | SI016, SI017, SI018, SI019, SI020 |
| CI025 | HostFleet's April 2026 matrix shows Baseten priced above Runpod on every shared GPU SKU it lists, above Modal on the shared L4 and H100 rows, and below only Replicate's A100 custom deployment rate among the shared A100 prices shown. | Medium | SI027 |
| CI026 | HostFleet says Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. | Medium | SI027 |
| CI027 | Baseten raised $300 million at a $5 billion valuation in January 2026. | High | SI010, SI021, SI024, SI025 |
| CI028 | Business Wire says the January 2026 financing was Baseten's third fundraise in the prior year and brought total capital raised to $585 million. | High | SI021, SI024, SI025 |
| CI029 | Baseten's Series D was $150 million in September 2025. | High | SI009, SI024 |
| CI030 | Baseten's Series C was $75 million in February 2025. | High | SI008, SI024 |
| CI031 | Tracxn and CB Insights also show $585 million total funding and a $300 million Series E on January 20, 2026. | Medium | SI024, SI025 |
| CI032 | Baseten's Series E blog says inference volume grew 100x in the prior year. | Medium | SI010 |
| CI033 | Baseten's Series E materials say the new capital will fund speed, uptime, developer experience, team growth, and a broader infrastructure platform. | High | SI010, SI021 |
| CI034 | Tech Funding News says the new funding is expected to support hiring in engineering and customer service plus platform and integration expansion. | Medium | SI022 |
| CI035 | Sacra estimates Baseten reached $200 million annualized revenue in December 2025 and $600 million annualized revenue in March 2026. | Low | SI023 |
| CI036 | Sacra says Baseten monetizes either API consumption or GPU minutes and hours and uses multi-cloud capacity management across more than 15 cloud providers instead of owning GPU infrastructure. | Medium | SI023 |
| CI037 | PitchBook labeled Baseten as generating revenue by February 2025 and showed 73 employees in its April 2025 snapshot. | Medium | SI026 |
| CI038 | Tracxn lists 258 employees as of April 2026. | Medium | SI024 |
| CI039 | The jump from 73 employees in PitchBook's 2025 snapshot to 258 in Tracxn's April 2026 snapshot implies substantial operating-expense growth, but payroll and burn are undisclosed. | Medium | SI024, SI026 |
| CI040 | Baseten's status page shows Model APIs at 99.91% uptime over the displayed 90-day window and multiple incidents in May 2026, while the Dedicated Inference component shows 100.0% uptime over the same displayed window. | Medium | SI015 |
| CI041 | The Dedicated Inference SLA targets 99.9% monthly availability, caps service credits at 40% of monthly fees, and requires claims within 24 hours of downtime. | Medium | SI014 |
| CI042 | Baseten's privacy policy identifies the contracting entity as BaseTen Labs, Inc. | Medium | SI012 |
| CI043 | The SEC EDGAR entity landing page for CIK 0001850888 says there is no filings data for the organization, so there are no public SEC operating-company financial statements available from that page. | Medium | SI029 |
| CI044 | Mordor says cloud deployments were 67.33% of enterprise AI revenue in 2025 and hybrid and edge deployments are forecast to grow 19.53% CAGR through 2031. | Medium | SI028 |
| CI045 | Mordor says healthcare and life sciences are forecast to grow 20.77% CAGR through 2031. | Medium | SI028 |
| CI046 | Baseten's enterprise and healthcare pages align with that opportunity through self-host, cloud-commitment, data-residency, HIPAA, and SOC 2 positioning. | Medium | SI005, SI006, SI028 |
| CI047 | Baseten's public materials do not disclose cash balance, monthly burn, runway, gross margin, CAC, NRR, customer concentration, or revenue mix by product surface. | Medium | SI001, SI010, SI013, SI021, SI023, SI024, SI025, SI029 |
| CI048 | Sacra reports Baseten is in talks to raise capital at about an $11 billion post-money valuation, with some reported offers as high as $15 billion, but that is not a closed financing. | Low | SI023 |
| CI049 | Because Baseten appears asset-light on owned GPUs but premium-priced on raw list compute, margin quality likely depends on utilization, enterprise support attachment, and negotiated discounts rather than headline GPU rates alone. | Medium | SI005, SI023, SI027 |
| CI050 | The public evidence supports strong demand, pricing, and capital-access narratives, but a real underwriting decision still depends on private data for realized pricing, retention, gross margin, burn, and concentration. | Medium | SI021, SI023, SI024, SI025, SI027, SI029 |
| CI051 | Baseten's public customer proofs now span financial-services AI, coding copilots, voice dictation, and world-model workloads, indicating that production demand is diversified across several latency-sensitive categories rather than one single end market. | Medium | SI030, SI031, SI032, SI033, SI034 |
| CI052 | Hebbia said Baseten improved tokens per second 2.5x, improved time to first token 4x, and reduced inference cost by more than 10x versus its previous deployment. | Medium | SI030 |
| CI053 | Posit said Baseten delivered sub-200ms latency for its Next Edit Suggestions feature and let the team pay only for compute it actually used. | Medium | SI031 |
| CI054 | Wispr Flow said its end-to-end speech and Llama pipeline ran in under 700 milliseconds at p99 on Baseten and AWS, with scale-to-zero elasticity. | Medium | SI032 |
| CI055 | Zed said Baseten lowered p90 latency by 45% and increased throughput 3.6x versus its previous inference provider, supporting Baseten's claim that performance wins can displace incumbent infrastructure. | Medium | SI033 |
| CE001 | Baseten publicly presents a full-stack product surface spanning Truss-led custom deployment, Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training rather than a single hosting SKU. | High | SE001, SE002, SE005, SE007, SE008, SE009, SE010 |
| CE002 | Model APIs run on shared infrastructure with OpenAI and Anthropic API compatibility, while dedicated deployments let customers choose hardware, engines, and scaling for their own models. | High | SE006, SE007 |
| CE003 | Truss packages model serving logic, dependencies, weights, and GPU configuration so the same artifact behaves consistently in development and production. | Medium | SE025, SE027 |
| CE004 | Truss publicly supports vLLM, SGLang, TensorRT-LLM, transformers, diffusers, PyTorch, and TensorFlow. | Medium | SE025, SE027 |
| CE005 | For supported architectures, a config-only Truss deployment can compile a model with TensorRT-LLM and expose an OpenAI-compatible endpoint without custom Python model code. | Medium | SE004, SE025 |
| CE006 | Chains deploys Python-defined chainlets where each step can set its own hardware resources, software dependencies, and autoscaling settings. | High | SE002, SE009 |
| CE007 | Baseten's training surface has two public tracks: Training Jobs is GA and Loops is early access. | Medium | SE010 |
| CE008 | Loops is positioned as a training SDK whose checkpoints can promote directly into Dedicated Inference, making inference a first-class output of training. | Medium | SE010, SE026 |
| CE009 | Frontier Gateway adds a white-labeled API surface with key management, rate limits, metering, billing, and branded URLs for labs serving their own models to customers. | High | SE002, SE008 |
| CE010 | MCM is Baseten's infrastructure control plane for unifying GPUs across cloud providers and regions, provisioning resources, and rerouting workloads during capacity crunches or outages. | High | SE004, SE011 |
| CE011 | Baseten gives each deployment a dedicated model subdomain and keeps endpoint names stable across environment promotion. | Medium | SE004 |
| CE012 | Baseten's request-routing model parks requests during scale-to-zero cold starts and offers an async queue that prioritizes synchronous traffic when capacity is tight. | Medium | SE004 |
| CE013 | BDN mirrors model weights into Baseten-controlled storage and uses mirrored-origin, cluster, and node caches to make large-model cold starts faster after the first pull. | High | SE004, SE019 |
| CE014 | Baseten publicly documents runtime optimizations including TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured outputs, KV-cache optimization, and topology-aware parallelism. | High | SE002, SE013 |
| CE015 | Baseten offers Baseten Cloud, self-hosted, hybrid, single-tenant, and region-restricted deployment options for customers that need different control or residency models. | High | SE007, SE011, SE015 |
| CE016 | Regional environments require Baseten configuration and a different regional endpoint format to guarantee inference traffic stays inside the designated geography. | Medium | SE021 |
| CE017 | Baseten publicly claims SOC 2 Type II and HIPAA compliance across its cloud hosting surfaces. | High | SE014, SE015, SE016 |
| CE018 | Baseten says it does not store model inputs, outputs, or weights by default, except temporary storage for async inference and optional caching users enable. | High | SE014, SE015 |
| CE019 | Baseten's public security docs say the platform never shares GPUs across users, isolates each customer into a dedicated Kubernetes namespace, and uses Calico, Falco, and Gatekeeper around workload security. | Medium | SE014 |
| CE020 | Baseten added Enterprise SSO and SCIM in May 2026 with SAML 2.0 sign-in, SCIM 2.0 sync, just-in-time provisioning, automatic deprovisioning, and group-based role assignment. | Medium | SE017 |
| CE021 | Rolling deployments launched in March 2026 and introduced max_surge_percent and stabilization_time_seconds controls for gradual zero-downtime promotion. | Medium | SE018 |
| CE022 | The billing usage API launched in March 2026 and exposes daily spend breakdowns across Dedicated Inference, Training, and Model APIs. | Medium | SE020 |
| CE023 | The only reviewed public Baseten SLA is for Dedicated Inference at 99.9% monthly availability, while Baseten marketing elsewhere uses four-nines or 99.99 reliability language. | High | SE001, SE007, SE015, SE023 |
| CE024 | Baseten's public status page showed incidents on May 15, 16, 18, 19, 26, and 29, 2026 even though its summary cards displayed 100.0% uptime for Dedicated Inference and 99.91% for Model APIs over the visible 90-day window. | Medium | SE022 |
| CE025 | ServiceAlert's third-party reachability page listed May 2026 at 100% uptime but explicitly said detailed incident data is unavailable, limiting independent verification of Baseten outage quality. | Medium | SE030 |
| CE026 | Truss has a visible public developer surface through a GitHub repository, a PyPI package, and active May 2026 release activity. | High | SE025, SE026, SE027 |
| CE027 | The May 2026 Truss release stream emphasized Loops CLI features, training checkpoint views, deployment-log links, and inference-call behavior, which indicates active investment in the training-to-inference workflow. | Medium | SE026 |
| CE028 | Writer's Baseten case study says model-specific TensorRT-LLM engines delivered 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. | Medium | SE028 |
| CE029 | OpenEvidence says Baseten reduced end-to-end latency from more than 700 milliseconds to 160 milliseconds and sped deployments 6x. | Medium | SE029 |
| CE030 | OpenEvidence also says Baseten now serves billions of requests per week for its medical workflow and reduced infrastructure maintenance time by more than 8x. | Medium | SE029 |
| CE031 | HostFleet's April 2026 pricing matrix shows Baseten posting higher public GPU-hour rates than Runpod and Modal on comparable L4, A100, and H100 instances. | Medium | SE016, SE031, SE032, SE033 |
| CE032 | Despite the higher published price points, HostFleet characterizes Truss, observability, and support as Baseten's tangible value-adds for startups running production inference. | Medium | SE031 |
| CE033 | Runpod and Modal market more aggressive zero-idle and cold-start language than Baseten, while Baseten emphasizes dedicated compute, managed performance engineering, and control. | Medium | SE005, SE031, SE032, SE033 |
| CE034 | Replicate's public product surface is simpler API-first model serving through Cog, whereas Baseten layers dedicated deployments, Chains, and Frontier Gateway on top of its packaging tool. | Medium | SE008, SE009, SE025, SE034 |
| CE035 | AWS SageMaker, Google Agent Platform, and Azure Machine Learning all span training, deployment, governance, and observability, so Baseten competes by offering a narrower inference-first abstraction rather than full hyperscaler platform breadth. | Medium | SE004, SE035, SE036, SE037 |
| CE036 | A third-party security profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, and other SaaS tools in Baseten's visible operational footprint. | Medium | SE038 |
| CE037 | Baseten's visible 2026 roadmap signal centered on trust and operating controls such as SSO/SCIM, rolling deployments, BDN, and billing instrumentation rather than entirely new product lines. | Medium | SE017, SE018, SE019, SE020 |
| CE038 | Public materials show uneven maturity within the training stack because Training Jobs is GA while Loops is still early access. | Medium | SE010 |
| CE039 | Public Baseten sources still leave unresolved product-tech gaps around benchmark methodology, exact regional-environment setup lead times, and roadmap priorities beyond the currently announced 2026 releases. | Low | SE004, SE021, SE028 |
| CE040 | Baseten's product-tech moat appears strongest for teams that value performance tuning, cross-cloud capacity, and engineering support more than lowest published unit price or hyperscaler breadth. | Medium | SE007, SE015, SE031, SE032, SE033, SE035, SE036, SE037 |
| CU001 | Baseten markets itself as a high-performance inference platform for teams shipping AI products in production. | Medium | SU001 |
| CU002 | Baseten's enterprise page targets mission-critical enterprise inference with secure, scalable, and controllable deployment options. | Medium | SU003 |
| CU003 | Baseten publicly packages Basic, Pro, and Enterprise plans around progressively heavier buyer needs, from pay-as-you-go deployments to self-hosted regulated environments. | Medium | SU005 |
| CU004 | Writer positions itself as an enterprise AI platform used by world-class enterprises. | Medium | SU008, SU015, SU016 |
| CU005 | OpenEvidence describes itself as a medical knowledge platform for clinicians and physicians. | Medium | SU009, SU014 |
| CU006 | Speechify says more than 55 million people use its voice AI productivity assistant. | Medium | SU010, SU018 |
| CU007 | Gamma says its AI tools create presentations, websites, and social content. | Medium | SU011, SU017 |
| CU008 | Superhuman positions itself as AI-enhanced mail, docs, and workflow software for knowledge workers. | Medium | SU012, SU020 |
| CU009 | Patreon says hundreds of thousands of creators use its platform to build direct fan communities and recurring businesses. | Medium | SU013, SU021 |
| CU010 | Business Wire names OpenEvidence, Abridge, Notion, Clay, and Mercor among Baseten customers. | Medium | SU007 |
| CU011 | WorkOS says Baseten powers AI workloads for Cursor, Notion, Clay, OpenEvidence, and Ambience. | Medium | SU023 |
| CU012 | Baseten says its inference volume grew 100x in the last year. | Medium | SU006 |
| CU013 | Baseten's customer-stories index spans speech, healthcare, coding, pharmaceutical search, and AI operations use cases. | Medium | SU002 |
| CU014 | OpenEvidence says Baseten now serves billions of requests per week for its medical-information product. | Medium | SU009 |
| CU015 | OpenEvidence says its product now works with a doctor in every state and zip code in America. | Medium | SU009 |
| CU016 | Speechify says its platform synthesizes more than 161 billion characters per month for 60M+ users. | Medium | SU010 |
| CU017 | Gamma says it generates more than 3 million images per day for more than 70 million users on Baseten. | Medium | SU011 |
| CU018 | Superhuman says Baseten runs dozens of custom embedding models that power core features in its product. | Medium | SU012 |
| CU019 | Patreon says Baseten saved 440+ hours of developer time and nearly $600k per year on its Whisper deployment. | Medium | SU013 |
| CU020 | FeaturedCustomers lists 13 case studies, 29 testimonials, 4 customer videos, and 654 reference ratings for Baseten. | Medium | SU024 |
| CU021 | Writer reports 60% higher tokens per second on Baseten for its domain-specific LLMs. | Medium | SU008 |
| CU022 | Writer reports 23% lower time to first token and 35% lower cost per million tokens on Baseten. | Medium | SU008 |
| CU023 | OpenEvidence reports latency falling from more than 700 milliseconds to 160 milliseconds on Baseten. | Medium | SU009 |
| CU024 | OpenEvidence reports 6x faster deployments and an 8x+ reduction in infrastructure maintenance time on Baseten. | Medium | SU009 |
| CU025 | Speechify reports a 44% lower cost per million characters on Baseten. | Medium | SU010 |
| CU026 | Speechify reports 30-50% lower p99 inference latency and 4.5x faster replica startup on Baseten. | Medium | SU010 |
| CU027 | Gamma reports 30%-80% faster image generation per model on Baseten. | Medium | SU011 |
| CU028 | Gamma reports 20% efficiency improvement while reducing replica count and supporting billions of generated images. | Medium | SU011 |
| CU029 | Superhuman reports an average 80% reduction in P95 latency across its embedding models on Baseten. | Medium | SU012 |
| CU030 | Patreon reports 70% GPU-cost savings and says Baseten was twice as cheap as the next cheapest solution for its Whisper workload. | Medium | SU013 |
| CU031 | FeaturedCustomers reports a 4.8 out of 5 reference-rating score for Baseten based on 654 ratings. | Medium | SU024 |
| CU032 | OpenEvidence says Baseten was a clear winner after the team spent weeks researching and vetting inference providers. | Medium | SU009 |
| CU033 | Speechify says Baseten delivered the highest uptime of any inference provider it knows. | Medium | SU010 |
| CU034 | Superhuman says it was able to self-serve 95% of what it needed on Baseten. | Medium | SU012 |
| CU035 | PeerSpot's review summary emphasizes Baseten's supportive environment, speed-to-deployment, flexibility, and cost effectiveness. | Medium | SU031 |
| CU036 | Baseten's pricing page shows a self-serve Basic plan, a Pro plan with dedicated compute and hands-on engineering, and an Enterprise plan with self-hosting and custom SLAs. | Medium | SU005 |
| CU037 | Baseten's enterprise page says Baseten Cloud offers single-tenant clusters and the self-hosted product can fail over to Baseten Cloud. | Medium | SU003 |
| CU038 | Baseten's healthcare page says the platform is SOC 2 Type II and HIPAA compliant, supports region-restricted deployments, and highlights OpenEvidence and Latent as healthcare cases. | Medium | SU004 |
| CU039 | WorkOS says customers often start thinking about controlling their own destiny once inference spending reaches roughly $10,000-$50,000 per month. | Medium | SU023 |
| CU040 | WorkOS says open-source models let companies switch to options that are faster, cheaper, more customizable, and more reliable at scale. | Medium | SU023 |
| CU041 | Business Wire says Baseten pitches open runtimes and no lock-in around customer models. | Medium | SU007 |
| CU042 | HostFleet says Baseten is the highest-priced listed provider in its April 2026 comparison for T4, L4, A10G, A100, and H100 where listed, and adds that Baseten has a minimum dedicated deployment cost. | Medium | SU026 |
| CU043 | Runpod ranks Baseten fifth in its 2026 serverless GPU comparison and characterizes it as per-minute, configurable-replica infrastructure with 8-12 second speed. | Medium | SU025 |
| CU044 | NVIDIA says Baseten cut cold starts from up to five minutes to 5-10 seconds using NVIDIA GPUs and TensorRT-LLM. | Medium | SU022 |
| CU045 | Publicly quantified proof is concentrated in six flagship case studies even though fundraising and interview materials name additional accounts. | Medium | SU002, SU007, SU023, SU024 |
| CU046 | Reviewed public customer materials do not disclose NRR, GRR, contract length, or top-customer revenue share. | Medium | SU002, SU003, SU005, SU006, SU007 |
| CU047 | Abridge sells enterprise-grade AI for clinical conversations trusted by the largest healthcare systems. | Medium | SU007, SU027 |
| CU048 | Clay says more than 500,000 GTM teams use its data-enrichment and workflow platform. | Medium | SU023, SU028 |
| CU049 | Cursor says it is trusted by over half of the Fortune 500 for AI-assisted software development. | Medium | SU023, SU029 |
| CU050 | Notion AI markets built-in agents, enterprise search, HIPAA-capable enterprise workflows, and zero-data-retention enterprise controls. | Medium | SU007, SU030 |
| CU051 | Mercor says it is organizing human intelligence to power the AI economy. | Medium | SU007, SU032 |
| CU052 | Publicly named strategic accounts extend Baseten beyond consumer applications into healthcare, GTM, coding, and enterprise productivity. | Medium | SU007, SU023, SU027, SU028, SU029, SU030, SU032 |
| CU053 | Public references skew toward AI-native software companies whose own products depend heavily on inference quality and latency. | Medium | SU002, SU007, SU008, SU009, SU010, SU011, SU012, SU013 |
| CU054 | Baseten's public customer-proof quality is high on outcome specificity for six flagship stories but low on disclosed renewal economics. | Medium | SU008, SU009, SU010, SU011, SU012, SU013, SU024 |
| CU055 | The public record supports land-and-expand potential from model experimentation into dedicated compute, multi-cloud scale, and self-hosted enterprise configurations. | Medium | SU003, SU005, SU009, SU010, SU012 |
| CR001 | Baseten says it maintains SOC 2 Type II certification and HIPAA compliance. | High | SR001, SR011, SR012 |
| CR002 | Baseten says it does not store model inputs or outputs by default, except async inputs are temporarily stored until processed. | High | SR001, SR011 |
| CR003 | Baseten says compliance policies are read-only for customers and must be changed through Baseten support. | Medium | SR001 |
| CR004 | Baseten offers self-hosted and single-tenant deployment options for sensitive workloads on higher-tier plans. | High | SR001, SR008, SR011, SR024 |
| CR005 | Baseten's terms incorporate a DPA and security measures that Baseten may update so long as overall protection is not materially decreased. | Medium | SR003 |
| CR006 | Baseten's DPA lets customers object to a new subprocessor within five calendar days after notice. | Medium | SR003 |
| CR007 | Baseten's DPA says it will notify customers without undue delay after discovering a personal-data breach affecting customer personal data, but customers remain responsible for their own notification obligations. | Medium | SR003 |
| CR008 | Baseten's DPA says customers must not provide PHI and other Restricted Data unless otherwise agreed upon with Baseten in writing. | High | SR003, SR029 |
| CR009 | HHS says a covered entity must obtain written satisfactory assurances before disclosing PHI to a business associate. | High | SR029, SR003 |
| CR010 | Baseten's healthcare positioning creates a diligence need to verify a signed BAA or similar written override before underwriting PHI workflows. | High | SR001, SR003, SR012, SR029 |
| CR011 | The European Commission says the AI Act is being implemented through guidance, codes of practice, and an AI Act Service Desk. | Medium | SR030 |
| CR012 | Because Baseten markets healthcare and regulated enterprise workloads, AI Act and GDPR implementation can lengthen security and legal review cycles even if Baseten is infrastructure rather than the end application. | Medium | SR011, SR012, SR030 |
| CR013 | Baseten's published SLA applies only to Dedicated Inference for which Baseten is the hosting party. | High | SR004, SR024 |
| CR014 | Baseten's published Dedicated Inference SLA targets 99.9% monthly availability. | High | SR004, SR024 |
| CR015 | Baseten's SLA caps service credits at 40% of monthly fees and requires customers to submit claims within 24 hours of unscheduled downtime. | Medium | SR004 |
| CR016 | Baseten's terms say the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. | High | SR003, SR004 |
| CR017 | Baseten's status page shows multiple May 2026 incidents, including investigation, identified-fix, monitoring, and major-outage markers in the 90-day view. | Medium | SR005 |
| CR018 | Servicealert says detailed incident data is not available for Baseten and that its history is based on reachability monitoring. | Medium | SR006 |
| CR019 | Baseten's public product pages market four nines or 99.99% uptime more broadly than the default 99.9% Dedicated Inference SLA. | High | SR004, SR011, SR012, SR020, SR023, SR024 |
| CR020 | Baseten shipped rolling deployments with gradual traffic shifting, pause, resume, and cancel controls as a mitigation against deployment-induced outages. | Medium | SR026, SR022 |
| CR021 | Baseten positions its reliability story around multi-cloud, multi-region autoscaling and hybrid deployment options rather than a single-cloud architecture. | High | SR010, SR011, SR023, SR024 |
| CR022 | Nudge Security lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other SaaS tools in Baseten's visible supply chain. | Medium | SR007 |
| CR023 | Baseten's frontier, model API, and dedicated inference pages all tie product promises to access to the latest-generation GPUs and elastic capacity. | High | SR008, SR020, SR023, SR024 |
| CR024 | Technavio says AI inference-as-a-service providers face hardware supply constraints and high accelerator costs that inflate operating costs and limit scalability. | Medium | SR013 |
| CR025 | Mordor Intelligence says hardware accelerators are the fastest-growing enterprise AI component and that GPU supply constraints and salary inflation are current headwinds. | Medium | SR014 |
| CR026 | HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple comparable GPU classes. | Medium | SR016, SR008 |
| CR027 | Runpod's 2026 comparison lists Baseten with per-minute pricing and an 8-12 second cold-start range while ranking cheaper or faster peers above it on some dimensions. | Medium | SR015, SR008 |
| CR028 | HostFleet says Baseten has a minimum dedicated deployment cost and billed minimum awake times, which raises entry friction for smaller customers. | Medium | SR016, SR008 |
| CR029 | Baseten counters lock-in risk with self-hosting, hybrid deployment, open runtimes, and full ownership of trained weights. | Medium | SR011, SR021, SR028 |
| CR030 | Baseten announced a $300M Series E at a $5B valuation in January 2026 after multiple fundraises within the prior year. | Medium | SR018, SR017 |
| CR031 | Baseten says the financing marked the company's third fundraise in the prior year, increasing pressure to convert capital into durable enterprise growth. | Medium | SR018, SR017 |
| CR032 | Tracxn lists Baseten at 46 employees on December 31, 2024 and 258 employees by April 26, 2026. | Low | SR017 |
| CR033 | Baseten's careers page says companies such as Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten for mission-critical AI workloads. | Medium | SR019, SR018 |
| CR034 | Baseten is expanding simultaneously across model APIs, dedicated inference, frontier gateway, model management, and training products. | Medium | SR020, SR021, SR022, SR023, SR024 |
| CR035 | Baseten's Loops training product is still early access even as Training Jobs is generally available. | Medium | SR021 |
| CR036 | SSO/SCIM, advanced identity controls, self-hosting, and custom SLAs are tied to higher-tier enterprise packaging rather than the self-serve entry plan. | Medium | SR008, SR011, SR025 |
| CR037 | Baseten added SSO/SCIM with automatic provisioning and deprovisioning plus group-based role assignment as a concrete mitigation for identity risk in larger accounts. | Medium | SR025, SR011 |
| CR038 | Baseten's billing usage API gives customers programmatic daily cost visibility across Dedicated Inference, Training, and Model APIs. | Medium | SR027, SR008 |
| CR039 | Baseten's model-management tooling says customers can monitor deployment health and adjust autoscaling policies to hit performance SLAs. | Medium | SR022, SR010 |
| CR040 | Truss and custom-server packaging reduce some switching-cost risk because Baseten exposes a more portable packaging layer than a fully closed model-hosting service. | Medium | SR028, SR022 |
| CR041 | Baseten's repeated emphasis on hands-on engineering expertise and customized deployments implies a service-heavy go-to-market model that may pressure margins as enterprise accounts scale. | Medium | SR008, SR011, SR024 |
| CR042 | Baseten's public contract stack leaves customers responsible for system configuration, backups, valid legal basis, and parts of incident response, which can slow regulated deployments even when Baseten provides secure infrastructure. | High | SR003, SR004, SR029 |
| CR043 | Modal said it raised $355 million in May 2026 after surpassing $300 million in annualized revenue, showing that a close inference-infrastructure rival is scaling quickly with large new capital. | High | SR031, SR032 |
| CR044 | Reuters reported that Modal's Series C valued the company at $4.65 billion, close to Baseten's $5 billion January 2026 valuation, which limits room for execution misses if buyers compare the platforms directly. | Medium | SR032 |
| CR045 | Sacra estimated Fireworks AI at roughly $315 million in annualized revenue in 2026 and a $4 billion valuation from its 2025 Series C, indicating that another open-model inference peer is already operating at substantial scale. | Medium | SR033 |
| CR046 | Tracxn says RunPod has raised only $22 million while positioning itself as a cost-effective GPU-infrastructure provider, which suggests cheaper rivals do not need Baseten-like capital intensity to pressure pricing. | Medium | SR034 |
| CR047 | CoreWeave reported nearly $100 billion of revenue backlog in May 2026 and explicitly framed inference as a major growth vector, underscoring that capital-rich infrastructure platforms are racing to absorb the same demand pool Baseten targets. | Medium | SR035 |
| CV001 | Baseten officially announced a $300 million Series E at a $5 billion valuation in January 2026. | High | SV001, SV002, SV003, SV005 |
| CV002 | After the Series E, public sources put Baseten’s total disclosed funding at about $585 million. | Medium | SV002, SV004, SV005, SV006 |
| CV003 | Tracxn records Baseten’s financing path as a $75 million Series C in February 2025, a $150 million Series D at a $2.15 billion valuation in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. | Medium | SV005 |
| CV004 | Baseten’s official Series D announcement says the company raised $150 million in a round led by BOND. | High | SV008, SV005 |
| CV005 | Baseten’s Series C announcement and PitchBook archive together support that the company’s 2025 Series C was a $75 million round. | Medium | SV009, SV007 |
| CV006 | Baseten said inference volume grew 100x during 2025. | Medium | SV001, SV004 |
| CV007 | Sacra estimates Baseten reached $600 million of annualized revenue in March 2026, up from about $200 million in December 2025. | Medium | SV004 |
| CV008 | Sacra says Baseten was in talks in May 2026 to raise $1 billion at an $11 billion post-money valuation, with reported offers reaching as high as $15 billion. | Medium | SV004 |
| CV009 | The gap between the closed $5 billion round and the mooted $11 billion follow-on means the underwriting question is whether fundamentals have caught up with sentiment, not whether enthusiasm exists. | Medium | SV001, SV002, SV004 |
| CV010 | Baseten’s pricing page shows a free pay-as-you-go Basic tier, while Pro adds priority compute and dedicated support and Enterprise adds custom SLAs and self-hosting. | Medium | SV010 |
| CV011 | Baseten’s homepage pitches cross-cloud scale, forward-deployed engineers, and 99.99% uptime as reasons customers should trust it for production workloads. | Medium | SV011 |
| CV012 | HostFleet’s April 2026 pricing matrix shows Baseten at $4.00 per hour for A100 and $6.50 per hour for H100, above Modal at $2.10 and $3.95 and above Runpod at $2.17 and $3.35 on the same GPU classes. | Medium | SV012 |
| CV013 | HostFleet also notes Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. | Medium | SV012 |
| CV014 | Runpod’s 2026 comparison ranks Baseten below several alternatives on affordability and cites usage-based per-minute billing with 8–12 second cold starts. | Medium | SV013 |
| CV015 | Baseten can still justify premium pricing if observability, support, compliance, and hybrid deployment reduce customers’ total cost of production inference. | Medium | SV010, SV011, SV012 |
| CV016 | Modal disclosed a May 2026 Series C of $355 million at a $4.65 billion post-money valuation after surpassing $300 million in annualized revenue. | High | SV015, SV016 |
| CV017 | Modal’s May 2026 round implies an approximate 15.5x annualized-revenue multiple. | Medium | SV015, SV016 |
| CV018 | Modal says it can scale from 0 to 1,000 GPUs in minutes or even seconds, making it a credible direct infrastructure comparable rather than a generic application software company. | Medium | SV015 |
| CV019 | Fireworks AI’s last closed round was a $250 million Series C at a $4 billion post-money valuation, while Sacra estimates roughly $315 million of annualized revenue in February 2026 and gross margin around 50%. | Medium | SV018 |
| CV020 | Fireworks’ closed-round valuation implies an approximate 12.7x annualized-revenue multiple, which is above Baseten’s implied closed-round multiple if the Sacra estimate is right. | Medium | SV018, SV004 |
| CV021 | CoreWeave reported Q1 2026 revenue of $2.078 billion and a $99.4 billion revenue backlog, while CompaniesMarketCap showed a May 2026 market cap of $59.75 billion. | Medium | SV019, SV020 |
| CV022 | Using CoreWeave’s 2026 revenue context, public AI infrastructure is trading at roughly 4.8x market cap to annualized-guide revenue. | Medium | SV019, SV020 |
| CV023 | Datadog guided to $4.30 billion to $4.34 billion of 2026 revenue, and CompaniesMarketCap put its May 2026 market cap at $88.04 billion. | High | SV021, SV022 |
| CV024 | Datadog’s implied multiple is about 20.4x forward revenue, showing how the market prices premium infrastructure software with strong growth and disclosure. | Medium | SV021, SV022 |
| CV025 | Datadog’s Form 10-K highlights the disclosure baseline public investors get on risk factors, revenue, and growth that private Baseten investors do not get from public materials. | High | SV023, SV021 |
| CV026 | CompaniesMarketCap and Stock Analysis put Cloudflare at about an $85.47 billion market cap and $2.33 billion of trailing revenue in late May 2026. | Medium | SV024, SV025 |
| CV027 | Cloudflare’s implied 36.7x revenue multiple is an upper-bound developer-platform reference that assumes much better disclosure, margin structure, and category leadership than Baseten has shown publicly. | Medium | SV024, SV025 |
| CV028 | CompaniesMarketCap and Stock Analysis put MongoDB at about a $27.01 billion market cap and $2.60 billion of trailing revenue in late May 2026. | Medium | SV026, SV027 |
| CV029 | MongoDB’s implied 10.4x revenue multiple is a lower-middle public infrastructure-software reference for a scaled but less euphoric comp set. | Medium | SV026, SV027 |
| CV030 | Technavio values the AI inference-as-a-service market at $85.25 billion in 2025 and expects 22.1% CAGR through 2030. | Medium | SV028 |
| CV031 | Mordor values the broader enterprise AI market at $114.87 billion in 2026, with cloud deployment accounting for 67.33% of 2025 revenue. | Medium | SV029 |
| CV032 | AWS Bedrock advertises select batch inference at 50% below on-demand pricing, showing hyperscalers can attack the inference layer with bundled economics. | Medium | SV030 |
| CV033 | Google promotes a unified agent platform with 200-plus models and free credits for new customers, increasing the risk that enterprises default to broader cloud bundles. | Medium | SV031 |
| CV034 | Azure Machine Learning publishes a 99.9% SLA and no additional platform charge beyond underlying Azure services, reinforcing the bundling threat to independent vendors. | Medium | SV032 |
| CV035 | If Sacra’s $600 million annualized-revenue estimate is directionally right, Baseten’s closed $5 billion round implies roughly an 8.3x revenue multiple. | Medium | SV004 |
| CV036 | An $8.3x implied multiple would place Baseten above CoreWeave-like AI cloud treatment but below Modal, Fireworks, Datadog, and Cloudflare-style premium software treatment. | Medium | SV004, SV018, SV019, SV020, SV021, SV022, SV024, SV025 |
| CV037 | At the same $600 million run-rate, the mooted $11 billion follow-on would imply roughly an 18.3x multiple, much closer to Datadog-grade public software pricing. | Medium | SV004, SV021, SV022 |
| CV038 | Baseten’s pricing and delivery model suggest revenue quality may be more support-intensive and lower-margin than top public software comps even if growth is exceptional. | Medium | SV010, SV011, SV012, SV013 |
| CV039 | Fireworks’ roughly 50% gross margin and explicit 60% target are a useful reminder that inference platforms are infrastructure businesses first, not pure software businesses. | Medium | SV018 |
| CV040 | The strongest pro-valuation argument is that inference demand is large, cloud-heavy, and moving into production workloads where Baseten offers hybrid deployment and performance differentiation. | Medium | SV028, SV029, SV010, SV011 |
| CV041 | The strongest anti-valuation argument is that premium pricing can be attacked by Runpod and Modal at the edge and by hyperscalers through bundled platform pricing. | Medium | SV012, SV013, SV017, SV030, SV031, SV032 |
| CV042 | The current $5 billion price is supportable only conditionally because it assumes the private revenue estimate is directionally right and that Baseten can defend premium economics despite bundling pressure. | Medium | SV004, SV012, SV013, SV015, SV016, SV030, SV031, SV032 |
| CV043 | A reasonable bear case uses $300 million to $400 million of revenue support and a 7x to 9x multiple, implying roughly $2.1 billion to $3.6 billion of value. | Medium | SV004, SV018, SV026, SV027 |
| CV044 | A reasonable base case uses $500 million to $650 million of revenue support and an 8x to 12x multiple, implying roughly $4.0 billion to $7.8 billion and placing the closed $5 billion round inside the range. | Medium | SV004, SV015, SV016, SV018, SV026, SV027 |
| CV045 | A reasonable bull case uses $700 million to $900 million of revenue support and a 12x to 16x multiple, implying roughly $8.4 billion to $14.4 billion and making an $11 billion step-up possible only if growth and premium perception keep compounding. | Medium | SV004, SV015, SV016, SV018 |
| CV046 | The right investment recommendation is track, not buy, because company quality is high but the public evidence leaves the price only fair-to-stretched rather than clearly attractive. | Medium | SV004, SV012, SV013, SV015, SV016, SV023 |
| CV047 | The highest-leverage diligence question is whether internal revenue, gross margin, and customer-concentration data support the market narrative implied by the $5 billion round. | Medium | SV004, SV018, SV023 |
| CV048 | The thesis should break if Baseten cannot preserve premium price-performance with acceptable margin, if growth normalizes materially below the base-case band, or if any new round clears only with aggressive terms. | Medium | SV004, SV012, SV013, SV018, SV023 |
| ID | Publisher | Title | Quote |
|---|---|---|---|
| SO001 | Baseten | Baseten | Inference is everything | |
| SO002 | Baseten | Baseten customers | |
| SO003 | Baseten | Enterprise | |
| SO004 | Baseten | Healthcare | |
| SO005 | Baseten | Pricing | |
| SO006 | Baseten | Careers at Baseten | Companies like Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten to power mission-critical AI workloads in production. |
| SO007 | Baseten | Baseten Terms and Conditions | BASETEN LABS, INC. (“BASETEN”). |
| SO008 | Baseten | Privacy Policy | Company (referred to as either "the Company", "We", "Us" or "Our" in this Agreement) refers to BaseTen Labs, Inc., 201 Spear St, Suite 1600, San Francisco, CA 94105. |
| SO009 | Baseten | Announcing our Series A | We’ve raised a little over $20 million dollars to date across our seed and Series A rounds. |
| SO010 | Baseten | Announcing our Series B | We’re excited to announce that we’ve raised an additional $40M. |
| SO011 | Baseten | Announcing Baseten’s $75M Series C | Today, we run workloads across thousands of GPUs, serving millions of end customers worldwide while continuously adding new cloud partners. |
| SO012 | Baseten | Announcing Baseten’s $150M Series D | Today, we’re excited to announce our $150M Series D, led by BOND, with Jay Simons joining our Board. |
| SO013 | Baseten | Announcing Baseten's $300M Series E | We’re thrilled to announce that we have raised $300M at a $5B valuation. |
| SO014 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | Founded in 2019 and based in San Francisco, Baseten has raised $585 million to date from investors including IVP, CapitalG, Conviction, Bond, Greylock, and Spark Capital. |
| SO015 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | |
| SO016 | Tracxn | Baseten Technologies | |
| SO017 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | |
| SO018 | PitchBook via Internet Archive | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SO019 | Abridge | Abridge | Intelligence at the point of conversation | |
| SO020 | Clay | Clay | GTM workflows at scale | |
| SO021 | Cursor | Cursor | The new way to build software | |
| SO022 | OpenEvidence | OpenEvidence | America's Official Medical Knowledge Platform | |
| SO023 | Baseten | OpenEvidence delivers instant, accurate medical information with Baseten | Baseten now serves billions of requests per week for OpenEvidence. |
| SO024 | Baseten | How Gamma makes building presentations criminally fun | |
| SO025 | Baseten | Speechify real-time text-to-speech | Because of Baseten’s efficient autoscaling, model performance and infrastructure optimizations, Speechify’s cost per million characters dropped by 44%. |
| SO026 | Baseten | Patreon | |
| SO027 | NVIDIA | Streamlined AI Inference Infrastructure in the Cloud | Baseten’s infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. This is an incredible speedup on cold starts, which previously took up to five minutes. |
| SO028 | WorkOS | A conversation with Philip Kiely from Baseten at AWS re:Invent 2025 | |
| SO029 | Nudge Security | Is Baseten safe? Learn if Baseten Is Legit | Review Baseten security risks. |
| SO030 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | Detailed incident data is not available for this service. |
| SO031 | Baseten | Tuhin Srivastava - CEO, Co-Founder | |
| SO032 | Baseten | Amir Haghighat - CTO, Co-Founder | |
| SO033 | Baseten | Pankaj Gupta - Co-Founder | |
| SO034 | Baseten | Phil Howes - Co-Founder | |
| SM001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | Scale workloads across any region and any cloud (in our cloud or yours), with blazing-fast cold starts and 99.99% uptime out of the box. |
| SM002 | Baseten | Cloud Pricing | Basic: $0 per month, pay as you go. Enterprise adds self-host deployments, cloud commitments, and custom SLAs. |
| SM003 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | The Baseten Inference Stack runs inside your cloud infrastructure, keeping your data fully under your control. |
| SM004 | Baseten | Healthcare | 99.99% uptime and infinite scaling through a unified GPU pool spanning 10+ clouds. |
| SM005 | Baseten | Production-First Model APIs - Baseten Inference Stack | Spend 5-10x less than closed alternatives with our optimized multi-cloud infrastructure and efficient frontier open models. |
| SM006 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | We regularly see 6x better GPU utilization and 5-10x lower costs powered by our Inference Stack. |
| SM007 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | Baseten Chains enables granular hardware and autoscaling for compound AI, powering 6x better GPU usage and cutting latency in half. |
| SM008 | Baseten | Cloud-Native AI Infrastructure | Baseten | Scale in your cloud, ours, or both with Baseten Self-hosted, Cloud, and Hybrid deployment options. |
| SM009 | Baseten | Secure model inference - Baseten | Baseten never shares GPUs across users. |
| SM010 | Baseten | Customer stories | Speechify synthesizes 161B+ characters per month for 60M+ users. With Baseten, Speechify cut costs by 44%, p99 latency by 30-50%, and got 4.5x faster cold starts. |
| SM011 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | OpenEvidence can scale efficiently even in the face of traffic spikes, hardware failure, or capacity constraints... without locking into multi-year commitments with single cloud vendors. |
| SM012 | Baseten | How Gamma makes building presentations criminally fun | We generate millions of images a day on Baseten for our 70+ million users with ultra-low latency and high throughput. |
| SM013 | Baseten | How Writer helps businesses transform with AI | In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens. |
| SM014 | Baseten | Why we built and open-sourced a model serving solution | Truss bridges the gap between model development and model deployment by making it equally straightforward to serve a model on localhost and in prod. |
| SM015 | Baseten | AI Model Training Built for Production Inference | Baseten | Train -> deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command. |
| SM016 | Baseten | Baseten Frontier Gateway | The Baseten Frontier Gateway is the path from weights to a production-ready API. |
| SM017 | Baseten | SSO and SCIM | Available on the Enterprise plan with just-in-time provisioning, automatic deprovisioning, and optional group-gated admin access. |
| SM018 | Baseten | Retrieve billing usage via API | The response includes aggregate totals and a per-resource or per-model breakdown array, with daily granularity on each entry. |
| SM019 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during the forecast period 2026-2030. |
| SM020 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | The Enterprise AI market size stood at USD 114.87 billion in 2026 and is projected to reach USD 273.08 billion by 2031, registering an 18.91% CAGR over 2026-2031. |
| SM021 | Fortune Business Insights | AI Inference Market Size, Share | Global Growth Report [2034] | The global AI inference market size was valued at USD 103.73 billion in 2025 and is projected to grow from USD 117.80 billion in 2026 to USD 312.64 billion by 2034. |
| SM022 | Modal | Modal: High-performance AI infrastructure | Autoscale from 0 to 1000+ GPUs, instantly. |
| SM023 | Replicate | Run AI with an API | We scale up and down to handle demand, and you only pay for the compute that you use. |
| SM024 | Runpod | The AI Developer Cloud | Runpod | One platform to go from AI experiment to production. Pods for building. Serverless for shipping. Clusters for scaling. |
| SM025 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | Baseten: usage-based (per-minute), configurable replicas, T4/A10G/L4/A100/H100, 8-12 sec cold starts. |
| SM026 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | You’re a startup with a production inference workload and a budget -> Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SM027 | Amazon Web Services | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure. |
| SM028 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | Build, scale, govern and optimize enterprise grade AI agents. |
| SM029 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment. |
| SP001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | |
| SP002 | Baseten | Cloud Pricing | |
| SP003 | Baseten Docs | Overview - Baseten | |
| SP004 | Baseten Docs | Secure model inference - Baseten | |
| SP005 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SP006 | Baseten | AI Model Training Built for Production Inference | Baseten | |
| SP007 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | |
| SP008 | Baseten | AI Model Performance - Baseten Inference Runtime | |
| SP009 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SP010 | Baseten | Baseten Frontier Gateway | |
| SP011 | Baseten | Why we built and open-sourced a model serving solution | |
| SP012 | Baseten | Baseten Status | |
| SP013 | Servicealert.ai | Baseten Outage History, Downtime & Incident Records | |
| SP014 | Modal | Modal: High-performance AI infrastructure | |
| SP015 | Modal | Plan Pricing | Modal | |
| SP016 | Replicate | Run AI with an API | |
| SP017 | Replicate | Pricing – Replicate | |
| SP018 | Runpod | The AI Developer Cloud | Runpod | |
| SP019 | Runpod | GPU Cloud Pricing - Runpod | |
| SP020 | Runpod Docs | Serverless pricing | Runpod Docs | |
| SP021 | AWS | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | |
| SP022 | AWS | Amazon Bedrock Pricing – AWS | |
| SP023 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | |
| SP024 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | |
| SP025 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | Baseten: Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SP026 | Tracxn | Baseten Technologies | |
| SP027 | PitchBook | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SP028 | Sacra | Baseten revenue, valuation & funding | AWS, Google, and Microsoft leverage extensive enterprise relationships to bundle AI inference with broader cloud commitments at below-market rates. |
| SP029 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SP030 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference — TFN | |
| SI001 | Baseten | Cloud Pricing | |
| SI002 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | |
| SI003 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SI004 | Baseten | AI Model Training Built for Production Inference | Baseten | |
| SI005 | Baseten | Enterprise | |
| SI006 | Baseten | Healthcare | |
| SI007 | Baseten | Retrieve billing usage via API | |
| SI008 | Baseten | Announcing Baseten’s $75M Series C | |
| SI009 | Baseten | Announcing Baseten’s $150M Series D | |
| SI010 | Baseten | Announcing Baseten’s $300M Series E | |
| SI011 | Baseten | Careers at Baseten | |
| SI012 | Baseten | Privacy Policy | |
| SI013 | Baseten | Baseten Terms and Conditions | |
| SI014 | Baseten | Service Level Agreement | |
| SI015 | Baseten | Baseten Status | |
| SI016 | Baseten | How Writer helps businesses transform with AI | |
| SI017 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | |
| SI018 | Baseten | How Speechify makes audio the default with real-time text-to-speech | |
| SI019 | Baseten | Superhuman achieves 80% faster embedding model inference with Baseten | |
| SI020 | Baseten | Patreon scales Whisper transcription with Baseten | |
| SI021 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SI022 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | |
| SI023 | Sacra | Baseten revenue, valuation & funding | |
| SI024 | Tracxn | Baseten Technologies | |
| SI025 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | |
| SI026 | PitchBook via Wayback | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | |
| SI027 | HostFleet | Serverless GPU Pricing Matrix 2026 | |
| SI028 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | |
| SI029 | U.S. Securities and Exchange Commission | EDGAR Entity Landing Page (CIK 0001850888) | |
| SI030 | Baseten | How Hebbia uses Baseten to power AI workflows for the world's leading financial institutions | |
| SI031 | Baseten | Posit launches real-time AI code suggestions with Baseten | |
| SI032 | Baseten | Wispr Flow creates effortless voice dictation with Llama on Baseten | |
| SI033 | Baseten | How Zed is reimagining the code editor from the ground up | |
| SI034 | Baseten | How World Labs is building large world models, pushing the boundaries of 3D | |
| SE001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | Rapidly scale workloads across any cloud provider with global capacity. We offer single-tenant and self-hosted deployments for extra security. |
| SE002 | Baseten | Overview - Baseten | Baseten is a training and inference platform. Bring a model ... and Baseten turns it into a production API endpoint with autoscaling, observability, and optimized serving infrastructure. |
| SE003 | Baseten | Reference documentation - Baseten | |
| SE004 | Baseten | How Baseten works | Behind every GPU workload on Baseten is the Multi-cloud Capacity Management (MCM) system. |
| SE005 | Baseten | Production-First Model APIs - Baseten Inference Stack | Model APIs made for products, not toys. |
| SE006 | Baseten | Model APIs - Baseten | Model APIs provide instant access to high-performance LLMs through endpoints that are compatible with both the OpenAI Chat Completions API and the Anthropic Messages API. |
| SE007 | Baseten | Inference at Scale with Dedicated Deployments | Baseten | Dedicated deployments are single-tenant, can be region-locked, and are HIPAA compliant and SOC 2 Type II certified on Baseten Cloud. |
| SE008 | Baseten | Baseten Frontier Gateway | Baseten Frontier Gateway gives you a production-ready, white-labeled API endpoint. |
| SE009 | Baseten | Multi-Model Inference, Ultra-Low Latency at Scale | Baseten | Deploy your Chain to production with each Chainlet specifying its own hardware resources, software dependencies and scaling settings independently. |
| SE010 | Baseten | AI Model Training Built for Production Inference | Baseten | Loops (early access) ... Training Jobs (GA). |
| SE011 | Baseten | Cloud-Native AI Infrastructure | Baseten | We built cross-cloud autoscaling so you can serve users anywhere in the world with low latency and high reliability. |
| SE012 | Baseten | AI Model Management for Production Inference | Baseten | |
| SE013 | Baseten | AI Model Performance - Baseten Inference Runtime | We take the best open-source inference frameworks (TensorRT, SGLang, vLLM, TGI, TEI, and more) and layer in our own optimizations for maximum performance. |
| SE014 | Baseten | Secure model inference - Baseten | Baseten does not store model inputs, outputs, or weights by default. |
| SE015 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | We are SOC-2 Type II certified and HIPAA compliant across all hosting options and support data residency requirements through region-restricted deployments. |
| SE016 | Baseten | Cloud Pricing | Only pay for the compute you use, down to the minute. |
| SE017 | Baseten | SSO and SCIM | Connect Baseten to your identity provider for SAML 2.0 sign-in and SCIM 2.0 directory sync. |
| SE018 | Baseten | Rolling deployments | You can now gradually shift traffic to new deployments instead of swapping all at once. |
| SE019 | Baseten | Introducing the Baseten Delivery Network (BDN) | We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models. |
| SE020 | Baseten | Retrieve billing usage via API | You can now query your billing usage programmatically using the new GET /v1/billing/usage_summary endpoint. |
| SE021 | Baseten | Regional environments | Regional environments route inference traffic for a deployment exclusively to workload planes within a designated geographic region. |
| SE022 | Baseten | Baseten Status | Past Incidents ... May 29, 2026 ... May 26 ... May 19 ... May 18 ... May 16 ... May 15. |
| SE023 | Baseten | Service Level Agreement | Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%) during each calendar month. |
| SE024 | Baseten | Why we built and open-sourced a model serving solution | To address this problem, we built Truss. |
| SE025 | GitHub / basetenlabs | GitHub - basetenlabs/truss: The simplest way to serve AI/ML models in production | Truss is the CLI for deploying and serving ML models on Baseten. |
| SE026 | GitHub / basetenlabs | Releases · basetenlabs/truss | v0.18.3 ... 21 May 16:14 ... feat(loops/cli) ... feat(train) ... feat(truss). |
| SE027 | PyPI | truss | pip install --upgrade truss |
| SE028 | Baseten | How Writer helps businesses transform with AI | In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw: 60% higher tokens per second, 23% lower time to first token, 35% lower cost per million tokens. |
| SE029 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | By using Baseten, OpenEvidence achieved: 78% lower latency ... 6x faster deployment processes ... 8x+ reduction in infrastructure maintenance time overall. |
| SE030 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | Detailed incident data is not available for this service. |
| SE031 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | You’re a startup with a production inference workload and a budget → Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SE032 | Runpod | The AI Developer Cloud | Runpod | Sub-200ms cold starts ... Zero idle cost. |
| SE033 | Modal | Modal: High-performance AI infrastructure | Autoscale from 0 to 1000+ GPUs, instantly. |
| SE034 | Replicate | Run AI with an API | You can deploy your own custom models using Cog, our open-source tool for packaging machine learning models. |
| SE035 | Amazon Web Services | The center for all your data, analytics, and AI – Amazon SageMaker – AWS | Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure. |
| SE036 | Google Cloud | Gemini Enterprise Agent Platform (formerly Vertex AI) | Agent Platform is our open and comprehensive platform ... to build, scale, govern and optimize enterprise-grade agents. |
| SE037 | Microsoft Azure | Azure Machine Learning - ML as a Service | Microsoft Azure | Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment. |
| SE038 | Nudge Security | Is Baseten Safe? Learn if Baseten Is Legit | Nudge Security | Baseten Supply Chain ... Amazon Web Services (AWS), Vercel, Statuspage, SendGrid, Stripe, Google Analytics, Segment, Sentry ... |
| SU001 | Baseten | Inference Platform: Deploy AI models in production | Baseten | |
| SU002 | Baseten | Customer stories | |
| SU003 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SU004 | Baseten | Healthcare | |
| SU005 | Baseten | Cloud Pricing | |
| SU006 | Baseten | Announcing Baseten's $300M Series E | |
| SU007 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SU008 | Baseten | How Writer helps businesses transform with AI | |
| SU009 | Baseten | OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack | |
| SU010 | Baseten | How Speechify makes audio the default with real-time text-to-speech | |
| SU011 | Baseten | How Gamma makes building presentations criminally fun | |
| SU012 | Baseten | Superhuman achieves 80% faster embedding model inference with Baseten | |
| SU013 | Baseten | Patreon saves nearly $600k/year in ML resources with Baseten | |
| SU014 | OpenEvidence | OpenEvidence | |
| SU015 | Writer | WRITER | |
| SU016 | Writer | About WRITER | |
| SU017 | Gamma | About Us – Reinventing Presentations with AI | Gamma.app | |
| SU018 | Speechify | Speechify: Text to Speech & Voice Typing AI Assistant | 55M+ Users | |
| SU019 | Speechify | Voice Over Studio: Request A Free Demo | Speechify | |
| SU020 | Superhuman | Superhuman: Docs, Mail, and AI That Work Everywhere | |
| SU021 | Patreon | Where Creator Communities Thrive — Patreon | |
| SU022 | NVIDIA | Case study:Baseten’s AI Inference Infrastructure | Baseten's infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. |
| SU023 | WorkOS | Baseten is betting big on open source models — WorkOS | companies could switch to models that were faster, less expensive, more customizable, and more reliable at scale |
| SU024 | FeaturedCustomers | 46 Baseten Customer Reviews & References | Customer Rating Review Score based on 654 reference ratings 4.8/5.0 |
| SU025 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | Baseten ... Usage-based (per-minute) ... 8–12 sec |
| SU026 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet | Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. |
| SU027 | Abridge | Generative AI for Clinical Conversations | Abridge | |
| SU028 | Clay | Clay | Go to market with unique data—and the ability to act on it | |
| SU029 | Cursor | The best coding agent | |
| SU030 | Notion | Meet your AI team | Notion | |
| SU031 | PeerSpot | Baseten Reviews, Competitors and Pricing | |
| SU032 | Mercor | Mercor | Organizing human intelligence to power the AI economy | |
| SR001 | Baseten | Secure model inference | Baseten does not store model inputs, outputs, or weights by default. |
| SR002 | Baseten | Privacy Policy | |
| SR003 | Baseten | Baseten Terms and Conditions | Customer acknowledges and agrees that the Baseten Products & Services will not be used, and is not licensed for use, in connection with any of Customer’s time-critical or mission-critical functions. |
| SR004 | Baseten | Service Level Agreement | Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%). |
| SR005 | Baseten | Baseten Status | |
| SR006 | ServiceAlert | Baseten Outage History, Downtime & Incident Records | |
| SR007 | Nudge Security | Is Baseten Safe? Learn if Baseten Is Legit | |
| SR008 | Baseten | Cloud Pricing | |
| SR009 | Baseten | Baseten homepage | |
| SR010 | Baseten | Cloud-Native AI Infrastructure | |
| SR011 | Baseten | Mission-Critical Inference for Enterprise AI Infrastructure | |
| SR012 | Baseten | Healthcare | SOC-2 Type II and HIPAA compliant with flexible hosting and data residency with region-restricted cloud deployments. |
| SR013 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | |
| SR014 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | |
| SR015 | Runpod | Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More | |
| SR016 | HostFleet | Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) | Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. Scale-to-zero is available but there are billed minimum awake times. |
| SR017 | Tracxn | Baseten Technologies | |
| SR018 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | |
| SR019 | Baseten | Careers at Baseten | |
| SR020 | Baseten | Production-First Model APIs - Baseten Inference Stack | |
| SR021 | Baseten | AI Model Training Built for Production Inference | |
| SR022 | Baseten | AI Model Management for Production Inference | |
| SR023 | Baseten | Baseten Frontier Gateway | |
| SR024 | Baseten | Inference at Scale with Dedicated Deployments | |
| SR025 | Baseten | SSO and SCIM | |
| SR026 | Baseten | Rolling deployments | |
| SR027 | Baseten | Retrieve billing usage via API | |
| SR028 | Baseten | Why we built and open-sourced a model serving solution | |
| SR029 | U.S. Department of Health & Human Services | Business Associates | The satisfactory assurances must be in writing, whether in the form of a contract or other agreement between the covered entity and the business associate. |
| SR030 | European Commission | The EU’s approach to artificial intelligence | |
| SR031 | Modal | Series C announcement | |
| SR032 | Reuters | AI startup Modal raised $355 million in a new round of financing, valuing the company at $4.65 billion | |
| SR033 | Sacra | Fireworks AI revenue, valuation & funding | |
| SR034 | Tracxn | RunPod | |
| SR035 | CoreWeave | Record First Quarter Revenue and Revenue Backlog Highlight Unprecedented Demand for CoreWeave Cloud | |
| SV001 | Baseten | Announcing Baseten’s $300M Series E | We’re thrilled to announce that we have raised $300M at a $5B valuation. |
| SV002 | Business Wire | Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future | This values Baseten at $5 billion and marks the company’s third fundraise in the past year. |
| SV003 | Tech Funding News | Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference | investors invested $300 million in the company, pushing its valuation to about $5 billion |
| SV004 | Sacra | Baseten revenue, valuation & funding | Sacra estimates that Baseten hit $600M in annualized revenue in March 2026. |
| SV005 | Tracxn | Baseten Technologies | Jan 20, 2026 | $300M | Series E | $5B |
| SV006 | CB Insights | Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements | Baseten has raised $585M over 7 rounds. |
| SV007 | PitchBook | Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook | Latest Deal Amount $75M |
| SV008 | Baseten | Announcing Baseten’s $150M Series D | Today, we’re excited to announce our $150M Series D, led by BOND. |
| SV009 | Baseten | Announcing Baseten’s $75M Series C | Today, we’re thrilled to announce our Series C fundraise. |
| SV010 | Baseten | Cloud Pricing | Basic: $0 per month, pay as you go. |
| SV011 | Baseten | Baseten homepage | Scale workloads across any region and any cloud ... with ... 99.99% uptime out of the box. |
| SV012 | HostFleet | Serverless GPU pricing matrix 2026 | Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible. |
| SV013 | Runpod | Top serverless GPU clouds for 2026 AI workloads | Baseten ... Usage-based (per-minute) ... 8–12 sec |
| SV014 | Runpod | Runpod pricing | H100 PCIe $2.89/hr |
| SV015 | Modal | Modal's Series C: Raising $355M at a $4.65B valuation | We’ve raised $355 million ... surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money. |
| SV016 | Reuters / U.S. News | Exclusive-Modal Labs Valued at $4.65 Billion as AI Coding Takes Off | The company’s annualized revenue is about $300 million, up from an annualized rate of $60 million in September. |
| SV017 | Modal | Modal pricing | Get started with $30 / month free credits |
| SV018 | Sacra | Fireworks AI revenue, valuation & funding | Fireworks AI hit $315M in annualized revenue in February 2026 ... gross margin sits at approximately 50%. |
| SV019 | CoreWeave / Business Wire | CoreWeave Reports Strong First Quarter 2026 Results | Revenue backlog was $99.4 billion as of March 31, 2026. |
| SV020 | CompaniesMarketCap | CoreWeave market capitalization | As of May 2026 CoreWeave has a market cap of $59.75 Billion USD. |
| SV021 | Datadog | Datadog Announces First Quarter 2026 Financial Results | Revenue was $1,006 million ... Full Year 2026 Outlook: Revenue between $4.30 billion and $4.34 billion. |
| SV022 | CompaniesMarketCap | Datadog market capitalization | As of May 2026 Datadog has a market cap of $88.04 Billion USD. |
| SV023 | Datadog / SEC filing mirror | Datadog Annual Report 2026 | Form 10-K (NASDAQ:DDOG) ... For the fiscal year ended December 31, 2025 |
| SV024 | CompaniesMarketCap | Cloudflare market capitalization | As of May 2026 Cloudflare has a market cap of $85.47 Billion USD. |
| SV025 | Stock Analysis | Cloudflare revenue 2016-2026 | This brings the company’s revenue in the last twelve months to $2.33B. |
| SV026 | CompaniesMarketCap | MongoDB market capitalization | As of May 2026 MongoDB has a market cap of $27.01 Billion USD. |
| SV027 | Stock Analysis | MongoDB revenue 2017-2026 | This brings the company’s revenue in the last twelve months to $2.60B. |
| SV028 | Technavio | AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 | The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during 2026-2030. |
| SV029 | Mordor Intelligence | Enterprise AI Market - Share, Trends & Size 2025 - 2031 | The Enterprise AI market size stood at USD 114.87 billion in 2026. |
| SV030 | Amazon Web Services | Amazon Bedrock Pricing | Amazon Bedrock offers ... batch inference at a 50% lower price compared to on-demand inference pricing. |
| SV031 | Google Cloud | Gemini Enterprise Agent Platform | New customers get up to $300 in free credits. |
| SV032 | Microsoft Azure | Azure Machine Learning | The SLA for Azure Machine Learning is 99.9 percent uptime. There's no additional charge to use Azure Machine Learning. |