Startup Diligence
Diligence report AI inference infrastructure Late-stage private (Series E) 2026-05-30

Baseten

Premium inference infrastructure for production AI workloads

Baseten is a high-quality AI inference infrastructure company with real enterprise traction and strong category positioning, but public financial disclosure is too thin to justify treating momentum pricing as a high-conviction buy.

Cover facts

Latest valuation 01
$5B USD (Jan 2026 Series E) [CO025]
Total raised 02
$585M USD (publicly disclosed) [CO027]
Last round 03
$300M Series E [CO025]
Revenue estimate 04
$600M annualized run rate (Sacra, Mar 2026) [CI035]
Founded 05
2019 [CO001]
Headcount 06
~258 employees (Tracxn, Apr 2026) [CI038]

Company profile

Baseten is a San Francisco-based inference infrastructure company founded in 2019 by Tuhin Srivastava, Amir Haghighat, Phil Howes, and Pankaj Gupta. The company positions itself as the software layer for running production AI workloads across its own cloud or a customer's environment, combining model APIs, dedicated inference, training, and enterprise deployment controls. Public customer proof spans AI-native products and regulated workloads including Cursor, Clay, OpenEvidence, Abridge, Gamma, Patreon, Speechify, Writer, Hebbia, Wispr Flow, and others. Baseten raised $300M at a $5B valuation in January 2026, bringing disclosed funding to about $585M and reinforcing its status as a late-stage private infrastructure company.

Website
www.baseten.co
Founders
Tuhin Srivastava, Amir Haghighat, Phil Howes, Pankaj Gupta
Founding location
San Francisco, CA, USA
Headquarters
San Francisco, CA, USA
Product
Baseten sells a production inference platform for custom models, model APIs, dedicated inference, training workflows, and compound-AI orchestration. Its Truss framework is the developer entry point, while enterprise features emphasize multi-cloud deployment, self-hosting, regional controls, compliance scope, and performance tuning for latency-sensitive workloads.
Customers
Enterprise AI teams, AI-native application builders, and regulated workloads that need low-latency, high-reliability model serving with security, hybrid deployment, or performance-engineering support. Public customer evidence is strongest in healthcare AI, developer tooling, GTM automation, voice, and productivity software.
Business model
Usage-based monetization: token-priced model APIs, per-minute GPU and CPU compute for deployments, and negotiated Pro or Enterprise contracts for dedicated capacity, support, and self-hosted or data- residency-sensitive environments. The platform also appears to expand via training, chains, and higher-touch enterprise engineering support.
Stage
Late-stage private (Series E)
Funding status
Public funding history includes a little over $20M across seed and Series A, a $40M Series B in 2024, a $75M Series C in February 2025, a $150M Series D in September 2025, and a $300M Series E at a $5B valuation in January 2026. Business Wire says total disclosed funding is about $585M.
[CO001, CO002, CO004, CO005, CO006, CO008, CO020, CO022]

Executive summary

Top strengths

  • Premium positioning in a fast-growing part of the AI stack: production inference for open, custom, and hybrid workloads
  • Strong recent fundraising and investor roster, with IVP, CapitalG, NVIDIA, Greylock, Spark, and others backing the company
  • Clear developer and enterprise wedge via Truss, dedicated inference, self-hosting, and multi-cloud deployment controls
  • Public customer proof across healthcare, coding, voice, GTM, and productivity workloads shows relevance beyond one narrow vertical
  • Customer case studies repeatedly cite meaningful latency, throughput, and cost improvements versus prior inference setups

Top risks

  • Public revenue, margin, burn, concentration, and retention data remain undisclosed, forcing investors to rely on estimated rather than filed financials
  • Premium price positioning can be pressured by cheaper GPU clouds and by hyperscalers bundling inference with broader cloud relationships
  • Reliability and compliance underwriting still depends on negotiated terms, BAAs, and custom SLAs rather than website messaging alone
  • The jump from a $5B closed valuation to mooted $11B follow-on pricing is hard to defend without much better disclosure
  • Baseten appears to run a support-heavy, performance-engineering-intensive model that could be harder to scale cleanly than pure software narratives imply

Open gaps

  • Audited revenue, gross margin, burn, runway, and customer-concentration data are not public
  • No public evidence resolves retention metrics such as NRR, churn, or contract duration
  • Public governance visibility is limited; the full current board, committees, and founder ownership are not disclosed in the fetched corpus
  • Healthcare and regulated-use underwriting still needs exact BAA, DPA, and shared-responsibility terms for production accounts
  • The economics and terms behind any mooted post-Series-E financing are not publicly available

Contents

Chapter 01

01Company Overview

1.1 Identity, History, and Leadership

Baseten is easiest to understand as a founder-led inference infrastructure company rather than a generic MLOps toolkit. Its own history traces the origin back to late 2019, when the four founders started the company to solve model-deployment pain they had experienced firsthand. Current legal pages anchor the business as Baseten Labs, Inc. in San Francisco, while the homepage, enterprise, and pricing surfaces consistently frame the product around high-performance inference, managed APIs, and training or deployment workflows rather than broad horizontal software. That identity matters for the rest of the report: Baseten is positioning itself as the software layer that runs production AI workloads across its own cloud or a customer’s environment, with compliance and data-control features aimed at more sensitive workloads. Leadership continuity is also unusually visible for a late-stage private company. Tuhin Srivastava is publicly surfaced as CEO and co-founder, Amir Haghighat as CTO and co-founder, and author pages still identify Pankaj Gupta and Phil Howes as co-founders. The Series E post is signed by all four founders, reinforcing that the company still presents a founder-centric leadership story. The caveat is governance transparency: the public corpus clearly shows one board addition in 2025, but it does not provide a full current board roster, committee structure, or founder ownership map.[CO001, CO002, CO003, CO004, CO005, CO006]

Snapshot KPI table
MetricValue / statusAs ofConfidenceNote / gap
Founded20192019-01-01HighExact public day is not surfaced in the fetched corpus, so the year is the reliable anchor.
HeadquartersSan Francisco, California2026-05-30HighPrivacy policy gives a specific San Francisco address.
Legal entityBaseten Labs, Inc.2026-05-30HighTerms and privacy policy use the same legal entity name.
Current stagePrivate, Series E2026-05-30HighSupported by Tracxn and the archived PitchBook profile after the January 2026 round.
Latest financing$300M Series E at $5B valuation2026-01-20HighLed by IVP and CapitalG with NVIDIA and prior investors also participating.
Lifetime capital raised$585M2026-01-23HighBusinessWire and market-data sources align on cumulative funding.
Business modelUsage-based API tokens plus per-minute compute2026-05-30MediumPublic pricing is clear; enterprise contract discounts and minimums are not.
Deployment modelBaseten Cloud, self-hosted, and region-aware enterprise options2026-05-30HighOfficial enterprise and healthcare pages emphasize self-hosting and data-control features.
Named customer setAbridge, Cursor, Clay, OpenEvidence, Notion, Speechify, Gamma2026-05-30HighNamed across careers, customer hub, customer stories, and Series E press materials.
Public headcount2026-05-30LowPitchBook and Tracxn conflict, so current employee count should be treated as unresolved.

Snapshot rows mix stable identity facts with current operating and financing markers; null means the fetched public corpus is not reliable enough to support a single number.

[CO001, CO002, CO003, CO004, CO006, CO007]
Leadership and founder table
PersonCurrent role / public titlePublic evidence of fit or coverageVisibility todayDiligence implication
Tuhin SrivastavaCEO, co-founderPublic spokesperson on financing and company thesis; author page and financing coverage identify him as the chief executive.HighFounder-led narrative remains a strength but also creates CEO key-person dependence.
Amir HaghighatCTO, co-founderAuthor page identifies the technical leader; Series E signoff keeps him in the visible founder set.HighTechnical and product credibility remain tied to a founding executive.
Phil HowesCo-founder; chief scientist in independent coverageOfficial author page plus Tech Funding News show ongoing founder visibility tied to model performance and research.MediumScience leadership appears founder-rooted even if the exact org chart is private.
Pankaj GuptaCo-founderOfficial author page and Series E signoff confirm continuity, but the fetched corpus does not surface a current operating title.MediumFunctional coverage is less transparent than for the CEO and CTO.
Jay SimonsBoard member since Series DSeries D explicitly says he joined the board as part of the BOND-led financing.LowGovernance visibility improved in 2025, but the full board and committee map is still incomplete.

Rows cover the founders and the one board addition that is explicit in the fetched corpus; the public materials reviewed do not expose a full executive roster or detailed committee structure.

[CO010, CO011, CO012, CO013, CO014, CO015]
FO002: Company snapshot logic

Baseten’s current story links founder continuity, deployment control, customer-proofed performance, and repeated capital access into one inference-platform thesis.

[CO004, CO005, CO006, CO008, CO010, CO011]

1.2 Funding, Stage, and Investor Base

Baseten’s capital history is now the clearest external signal that the company has graduated into late-stage AI infrastructure. The fetched corpus supports a progression from a modestly funded early company to a Series E business: the Series A post says Baseten had raised a little over $20 million across seed and Series A; the Series B announcement adds $40 million; the Series C announcement adds $75 million; Series D adds $150 million; and Series E adds another $300 million at a $5 billion valuation. Independent market-data sources corroborate the timing of those rounds and place the September 2025 Series D valuation at about $2.15 billion, showing just how quickly the company repriced upward before the January 2026 round. The investor roster has also deepened rather than churned. Greylock and South Park Commons appear early, IVP and Spark become visible at growth stage, and later rounds add BOND, CapitalG, Conviction, 01A, BoxGroup, and NVIDIA. That pattern matters because it suggests both repeated insider support and a widening set of AI-specialist and platform investors. At the same time, public disclosure still stops well short of a full cap-table view: the fetched corpus does not expose current ownership percentages, investor control rights, liquidation preferences, or a fully reliable board-observer map.[CO016, CO017, CO018, CO019, CO020, CO021]

Stakeholder or investor map
Investor / stakeholderFirst explicit round in corpusCurrent relevanceWhy it mattersDiligence ask
GreylockSeries AEarliest clearly named institutional backer in the fetched official corpusAnchors early company formation and remained visible through later growth-round narratives.Confirm current ownership and pro-rata participation after Series E.
South Park CommonsSeries AEarly network backer that still appears in later company historiesSignals founder-network support rather than pure late-stage capital.Confirm whether SPC still holds a meaningful stake after multiple step-up rounds.
IVPSeries BGrowth-stage repeat lead that also led or anchored later roundsAppears repeatedly across B, C, and E, making it one of the clearest long-duration financial sponsors.Confirm board rights, reserve behavior, and concentration at Series E.
Spark CapitalSeries BEarly growth investor visible across the 2024–2025 roundsHelps show continuity from the generative-AI scaling phase into later financing momentum.Confirm current stake and whether Spark still participates after Series E.
01ASeries CLater-stage investor that remains named in subsequent financing dataLinks Baseten to operator-investor sponsorship from Adam Bain and Dick Costolo’s network.Confirm whether 01A has governance rights or only economic exposure.
BONDSeries DSeries D lead and later participant in Series EImportant marker for the 2025 valuation step-up and board evolution.Confirm whether BOND added special terms at the D-to-E transition.
CapitalGSeries DJoined in D and co-led EPotentially valuable strategic network around Google ecosystem distribution and infrastructure credibility.Clarify commercial partnerships or channel overlap beyond pure equity sponsorship.
NVIDIASeries EStrategic investor in the latest roundCould matter for hardware access, performance collaboration, and signaling inside AI infrastructure.Confirm whether the relationship includes commercial commitments or preferred hardware access.
ConvictionSeries CVisible across C, D, and E-era disclosureAdds AI-specialist sponsorship and public advocacy for the inference-layer thesis.Confirm present ownership and board or observer rights.
BoxGroupSeries DStill named in later-round investor rostersShows continued support from earlier network investors even as the cap table deepens.Confirm position size versus symbolic participation in later rounds.

This map enumerates investors explicitly named across verified public financing sources from Series A through Series E; it is not a full cap table and does not expose ownership percentages or liquidation preferences.

[CO016, CO017, CO018, CO019, CO020, CO022]

1.3 Product Scope, Scale Proof, and Milestones

The product and scale story is strong enough to explain why Baseten could raise three times in roughly a year, but it still needs to be read with some caution. Official materials now tie the company to a broad inference platform narrative: cloud and self-hosted deployment options, enterprise controls, model APIs, and pay-as-you-go compute pricing. The strongest external proof comes from customer and partner evidence. NVIDIA’s case study says Baseten cut cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. Customer case studies claim that OpenEvidence runs billions of requests per week on Baseten, Gamma serves roughly 3 million images per day for 70+ million users, Speechify cut cost per million characters by 44%, and Patreon cut GPU cost by 70%. Those numbers support the idea that Baseten is serving meaningful production workloads across healthcare, productivity, creator, and GTM software. Still, chapter-one diligence should preserve three caution flags. Public headcount is inconsistent across data vendors, governance detail remains thin outside financing announcements, and independent reachability monitoring does not provide enough incident detail to fully evaluate reliability history. So the right takeaway is not that Baseten lacks scale; it is that the business looks operationally significant while remaining unusually private relative to its valuation.[CO005, CO006, CO007, CO008, CO021, CO028]

Milestone table
DateEventTypeAmount / valuation / statusParticipantsImplication
2019-01-01Founders start Baseten to solve ML deployment painfoundingCompany formedTuhin Srivastava; Amir Haghighat; Phil Howes; Pankaj GuptaEstablishes the origin point for the current inference-first narrative.
2021-05-01Early product is quietly announced after roughly 18 months of building and public beta is highlighted in the Series A postproductPublic beta eraBaseten foundersShows that the company moved from internal build mode into market testing well before the later capital sprint.
2022-04-26Series A milestone formalizes early investor backingfinancing> $20M cumulative seed + Series AGreylock; South Park Commons; Lachy Groom; Ray Tonsing; angel investorsValidates early demand for the original model-deployment product vision.
2024-03-04Series B adds growth-stage capitalfinancing$40MIVP; Spark; Greylock; South Park Commons; Lachy Groom; Base CaseMoves Baseten from early MLOps roots into broader generative-AI infrastructure expansion.
2025-02-19Series C pairs funding with public scale claimsscale$75M; workloads across thousands of GPUs; millions of end customersIVP; Spark; Greylock; Conviction; South Park Commons; Basecase; 01A-linked investorsDemonstrates that infrastructure scale became central to the story before the late-stage jump.
2025-09-05Series D raises new growth capital and adds a board membergovernance$150M at about $2.15B valuationBOND; CapitalG; Conviction; Jay SimonsCapital formation and governance maturation start to move together.
2026-01-14WorkOS interview highlights a startup program and voice as an emerging modalityproductNew GTM program and voice focusPhilip Kiely; WorkOS interviewersSuggests Baseten is broadening market coverage and prioritizing voice workloads in the next phase.
2026-01-20Series E establishes Baseten as a late-stage inference platform companyfinancing$300M at $5B valuation; third fundraise in prior yearIVP; CapitalG; NVIDIA; 01A; Altimeter; Battery Ventures; BOND; BoxGroup; Blackbird Ventures; Conviction; GreylockConfirms investor appetite for independent inference infrastructure at significant scale.

Year-only dates use January 1 and month-only dates use the first day of the cited month when the fetched public source supports the timing but not a precise day.

[CO001, CO009, CO016, CO017, CO018, CO019]
FO001: Company milestone timeline

The chronology shows Baseten moving from a 2019 founding into a compressed A-to-E financing sequence and a broader inference-platform narrative by early 2026.

Year-only or month-only milestones use January 1 or the first day of the cited month when the fetched source does not provide a precise public day.

[CO001, CO016, CO017, CO018, CO019, CO020]
FO003: Scale and proof KPIs

These KPIs are not internal financial statements; they are the clearest public scale and customer-outcome markers visible in the fetched corpus.

Customer metrics come from individual case studies and should be read as proof points rather than a consolidated operating dashboard for Baseten itself.

[CO025, CO027, CO031, CO032, CO033, CO034]

1.4 Exhibits

Chapter 02

02Market Analysis

2.1 Market Boundary, Included Spend, and Substitutes

Baseten is best understood as a production inference platform, not a general-purpose cloud or a model lab. The included spend is the layer required to package, deploy, run, monitor, meter, and secure AI workloads after a team already has a model or a model endpoint in mind: model APIs, dedicated deployments, compound-AI orchestration, observability, billing, and the support needed to keep latency and uptime inside production targets. The market boundary is narrower than the full enterprise-AI stack because Baseten does not market data lakes, BI, generic application development, or broad agent productivity suites as the center of its value proposition. It is also narrower than frontier-model R&D because Baseten helps teams operationalize models rather than invent them. The closest substitutes are hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. Baseten's adjacencies sit one step above and below the core deployment layer: training that promotes directly into inference, and model-lab monetization through white-labeled APIs.[CM001, CM002, CM003, CM004, CM005, CM006]

Market definition table
Segment / CategoryIncluded SpendExcluded SpendBuyer / PayerRelevance to Baseten
Production inference platformModel-serving runtime, autoscaling, observability, billing, support, security controlsFoundation-model R&D, generic data warehousing, generic app-dev toolsAI product lead or platform team; product or IT budget paysCore market where Baseten is explicitly positioned
Model APIsUsage-based inference endpoints, token metering, OpenAI-compatible accessClosed-model ownership or application-layer SaaS feature spendApplication engineer or product team; engineering budget pays initiallyLow-friction entry point and evaluation wedge
Dedicated / self-hosted inferenceDedicated GPU capacity, self-hosting, data residency, enterprise supportGeneral colocation, generic Kubernetes services, unmanaged GPU reservationsHead of AI platform, CIO/CTO, security or procurement stakeholdersEnterprise expansion path for sensitive or scaled workloads
Compound AI orchestrationMulti-model chains, hardware-aware orchestration, workflow optimizationGeneric workflow automation or iPaaS toolsML engineer or application engineering lead; product/platform budget paysImportant expansion layer for multimodal and agentic workloads
Model-lab API monetizationWhite-labeled API endpoints, rate limits, API keys, billing and meteringConsumer billing software or payment processorsModel lab or frontier-model product team; R&D/platform budget paysDistinct segment exposed by Frontier Gateway
Training-to-inference loopManaged training jobs and checkpoint promotion into inferenceFrontier research spend and pure experimentation without deployment intentML research lead or platform team; R&D budget paysAdjacency that deepens platform stickiness but is not the primary market lens

Boundary rows mix Baseten's core market with immediately adjacent spend layers. The point is to define what belongs in the commercial wedge before applying public market-size estimates.

[CM001, CM002, CM003, CM004, CM006, CM008]

2.2 Multiple Sizing Lenses and Why They Do Not Collapse into One TAM

Public sizing evidence is directionally strong but category boundaries remain messy. Technavio's narrower AI inference-as-a-service category is already a large market at USD 85.25 billion in 2025 and is forecast to grow 22.1% annually through 2030. Fortune Business Insights uses a broader AI inference lens and puts the category at USD 117.80 billion in 2026, while Mordor Intelligence sizes the adjacent enterprise AI market at USD 114.87 billion in 2026 and shows that software or platform layers and cloud deployment dominate spending. Those numbers should not be added together, because they partially overlap and use different definitions, but they do triangulate the same conclusion: Baseten is not pursuing a niche budget line. The useful valuation discipline is to keep moving down the stack from broad AI platform spend to the narrower production-inference wedge where Baseten actually competes. That narrower wedge is cloud-first, platform-heavy, North-America-concentrated, and already large enough that Baseten does not need heroic market-share assumptions to justify a meaningful opportunity. What public data does not provide is a clean Baseten-specific SAM or SOM.[CM009, CM010, CM011, CM012, CM013, CM014]

TAM / SAM / sizing lens table
LensPublisherBase year / horizonGeographyMetricValueLimitation
AI inference-as-a-service market sizeTechnavio2025 base, 2026-2030 forecastGlobalMarket sizeUSD 85.25BNarrower service-based inference category; summary page only
AI inference-as-a-service growthTechnavio2026-2030GlobalCAGR22.1%Forecast, not current revenue; category scope differs from broader inference reports
Broader AI inference market sizeFortune Business Insights2026GlobalMarket sizeUSD 117.80BBroader execution market spanning cloud, edge, and on-prem
Broader AI inference growthFortune Business Insights2026-2034GlobalCAGR12.98%Longer forecast horizon than Technavio
Adjacent enterprise AI market sizeMordor Intelligence2026GlobalMarket sizeUSD 114.87BMuch broader than inference alone
Platform-heavy slice inside enterprise AIMordor Intelligence2025GlobalSoftware/platform share65.89%Share of broader enterprise AI market, not Baseten-specific SAM
Cloud-first deployment lensMordor Intelligence2025GlobalCloud deployment share67.33%Share of enterprise AI revenue, not inference-only spend
Large-enterprise buyer concentrationMordor Intelligence2025GlobalLarge-enterprise share71.43%Useful for buyer concentration, not direct TAM
Regulated-healthcare growth wedgeMordor Intelligence2026-2031GlobalHealthcare CAGR20.77%Vertical growth lens rather than whole-market size

These rows are sizing lenses, not additive market totals. Public sources use overlapping definitions, so the safest use is triangulation rather than arithmetic aggregation.

[CM009, CM010, CM012, CM013, CM014, CM015]
FM001: Market sizing lens

A narrowing lens from broad enterprise AI spend toward Baseten's more defensible beachhead in performance-sensitive, compliant production inference.

This pyramid is a narrowing logic chain, not an additive model. The middle layers mix market share and market size because public sources do not publish one clean Baseten-specific hierarchy.

[CM005, CM009, CM013, CM014, CM015, CM020]
FM002: Serverless GPU price range

Published hourly GPU-rate spread across specialist providers illustrates how visible raw infrastructure pricing has become in this market.

Ranges come from HostFleet's April 2026 matrix of vendor-published prices. They are not performance-normalized benchmark results and do not include negotiated enterprise discounts.

[CM031, CM032, CM043, CM044, CM047]

2.3 Buyer, User, and Payer Segmentation

The public buyer evidence points to three especially relevant segments. First are AI-native product teams such as Gamma, where product or engineering leaders care about launch speed, low latency, and lower-cost open-model serving without building a dedicated ML-infrastructure team. Second are enterprise AI platform teams and model builders such as Writer, where the user is the ML engineer or data scientist but the deployment decision widens to include platform, security, and procurement once workloads become dedicated, multi-GPU, or compliance-sensitive. Third are regulated vertical deployments such as OpenEvidence in healthcare, where reliability, data handling, and the ability to scale without signing large GPU commitments become explicit selection criteria. Baseten's packaging supports these segments differently: usage-based plans and model APIs lower the barrier for experimentation, while Enterprise, self-hosting, SSO or SCIM, compliance policies, and billing APIs are signals that larger buyers expect governance, attribution, and controlled rollout. The budget owner is therefore partly observed and partly inferred: it likely starts in product or engineering budgets and migrates toward central platform or IT budgets as the deployment becomes more strategic.[CM016, CM017, CM021, CM022, CM023, CM024]

Segment / buyer map
SegmentBuyerUserPayerWorkflowBudget ownerAdoption trigger
PLG AI application teamVP Engineering or product leadApplication engineer and ML engineerProduct or engineering budgetPrototype with Model APIs then move to dedicated capacityProduct engineeringLaunch-day latency, reliability, and lower open-model cost
AI-native startup platform teamCTO or Head of AIML engineer and data scientistInfrastructure or platform budgetReplace closed-model dependence with managed open-model servingEngineering / platformNeed performance without hiring an infra-specialist team
Large-enterprise AI platform teamHead of AI platform, CIO, or CTOPlatform engineer, ML engineer, data scientistCentral platform or IT budgetDeploy compliant production inference across business unitsPlatform / ITDedicated capacity, SSO/SCIM, compliance policy, cloud commitments
Regulated healthcare AI workloadVP Engineering, CTO, or clinical-product leaderML engineer or application engineerPlatform plus security/compliance budgetMedical search, transcription, or patient-facing assistant deploymentPlatform plus securityHIPAA, uptime, and data-control requirements
Model lab or proprietary-model vendorResearch-product leader or commercialization leadInference engineer and research engineerR&D or platform budgetWhite-labeled API monetization through Frontier GatewayR&D / platformNeed to sell inference without building a customer-facing control plane
Compound AI / multimodal teamHead of AI application or staff engineerFull-stack engineer plus ML engineerProduct plus platform budgetChains-based orchestration across multiple models and machinesProduct / platformLatency and GPU waste from monolithic deployments

Buyer and budget-owner fields combine directly stated product packaging with cautious inference from customer stories. Public evidence is stronger on user and trigger than on exact signature authority.

[CM016, CM021, CM022, CM023, CM024, CM025]
FM003: Segment fit heatmap

Qualitative fit heatmap showing where Baseten's compliance, cloud, and premium-support proposition appears strongest by segment.

Ratings synthesize public evidence on cloud deployment share, healthcare growth, compliance requirements, and visible pricing pressure; they are not measured win-rate data.

[CM015, CM017, CM029, CM043, CM046, CM047]

2.4 Deployment Value Chain and Adoption Path

Baseten's value chain begins with either an open-source model, a custom model, or a proprietary model that already exists and needs to be turned into a dependable production service. From there the daily user is usually a model, platform, or application engineer who packages the workload and evaluates latency, throughput, and cost. The next gate is organizational rather than technical: security, compliance, and procurement checks intensify when the workload needs dedicated capacity, data residency controls, or identity integration. Baseten then sits in the orchestration layer that decides whether the workload runs through Model APIs, Dedicated Inference, Chains, Frontier Gateway, or a self-hosted or hybrid deployment. Under that layer sits the actual cloud and GPU substrate, which remains economically critical because capacity, price, and regional availability directly determine margin and reliability. Baseten's customer stories suggest that the company tries to move value capture upstream from raw GPU rental into performance engineering, deployment tooling, and operational support, because those are the layers customers cite when they explain why they did not keep building in-house.[CM021, CM025, CM027, CM028, CM029, CM030]

FM004: Deployment value-chain map

Baseten sits between model creation and end-user traffic, trying to capture value above raw GPU supply by owning deployment, controls, and performance operations.

This value chain is synthesized from product packaging, customer stories, and competitor documentation; it is a market-structure diagram, not an internal process map from Baseten.

[CM021, CM025, CM027, CM028, CM029, CM030]

2.5 Growth Drivers, Adoption Constraints, and Market Discipline

The strongest growth drivers are clear in both vendor and analyst material. Open-source models are improving fast enough that product teams increasingly want infrastructure optimized for those models rather than closed-model dependence; real-time and compound-AI workloads make latency and throughput economically visible; and enterprise buyers are moving from pilots into production, especially where regulated data, uptime, or model-performance tuning matter. Baseten's own case studies back that thesis with concrete claims on latency, tokens per second, image throughput, and reduced maintenance burden. But this is not an easy market. Hardware supply constraints, tariffs, and high accelerator prices remain structural headwinds. Talent shortages and legacy-system integration complexity slow rollout for enterprise buyers. Public competitive pricing also shows that raw GPU-hour economics are unforgiving: cheaper specialist clouds and broader hyperscaler suites both put pressure on a standalone inference vendor. That is why the right market view for Baseten is not all AI infrastructure; it is the subset of inference workloads where premium support, compliance, and performance matter enough to offset higher headline price points. Public data supports that wedge, but not yet a precise economic moat or a cleanly measurable serviceable market.[CM031, CM032, CM033, CM034, CM035, CM036]

Growth drivers and constraints table
FactorDirectionTimingImplicationDiligence ask
Open-source frontier models improve price/performanceDriverNowMakes specialist inference platforms more attractive than closed-model APIs for cost-sensitive productsWhat percentage of Baseten revenue already comes from open-model workloads?
Real-time and compound-AI latency sensitivityDriverNowRaises willingness to pay for performance engineering, orchestration, and autoscalingHow much of usage is latency-critical versus offline batch?
Cloud-first enterprise AI deploymentDriverNowSupports adoption of managed inference rather than self-built infra for many teamsHow much of Baseten demand comes from cloud-first versus self-hosted accounts?
Regulated-sector demand for compliance and data controlDriverNowCreates a wedge for HIPAA, region restrictions, and hybrid/self-hosted deploymentWhat share of enterprise pipeline requires regulated deployment boundaries?
GPU supply constraints and tariff pressureConstraintNowRaises cost of goods sold and can limit capacity availabilityWhat reserved-capacity strategy or cloud diversification protects supply?
Skills gaps and integration complexityConstraintMedium termSlow enterprise rollouts and increase implementation burdenHow much deployment work is productized versus services-heavy?
Price competition from specialist GPU cloudsConstraintNowCommodity GPU-hour comparisons can make Baseten look expensive on paperWhere does Baseten consistently win despite higher list prices?
Hyperscaler platform bundlingConstraintMedium termBroader native-cloud suites can absorb spend that might otherwise go to a specialist inference vendorWhich workloads truly require a specialist rather than a hyperscaler-native stack?
Opaque unit economics and support attachmentConstraintDiligence nowPublic material does not show whether premium positioning translates into durable marginRequest product-level gross margin and enterprise discount data.

The driver and constraint rows mix third-party market reports, Baseten product claims, customer evidence, and an independent pricing matrix. They are intended as a diligence framework, not a weighted scoring model.

[CM031, CM032, CM033, CM034, CM038, CM039]

2.6 Exhibits

Chapter 03

03Competitors

3.1 Competitive landscape and job-to-be-done coverage

Baseten sits in a crowded middle layer between low-friction serverless inference peers and large-cloud incumbents. The closest direct substitutes are Modal, Replicate, and Runpod: all give developers a way to get models onto GPUs without owning the infrastructure outright, but each compresses the stack in a different way. Modal optimizes for Python-native serverless compute, Replicate for community models and ultra-low-friction APIs, and Runpod for cheap raw capacity through Pods and Serverless. Above them sit AWS Bedrock/SageMaker, Google Vertex AI, and Azure ML, which compete less on indie-developer ergonomics and more on procurement leverage, governance, and existing cloud commitments. Below them sits the status-quo alternative: internal build on top of open packaging standards and rented GPUs. Baseten broadens the battlefield further because it sells not just deployment, but also training, multi-step orchestration, and white-labeled API monetization for model labs. That breadth means the company is not in a two-vendor race; independent datasets and company materials alike point to a fragmented, multi-class landscape where buyers can substitute across hosted inference, raw compute, hyperscaler tools, or self-managed stacks depending on whether they optimize for speed, control, trust, or cost.[CP001, CP002, CP004, CP005, CP006, CP007]

Competitor profile table
CompetitorCategoryScale/fundingTarget segmentDifferentiationLimitation
ModalDirect serverless peer$30/mo Starter credit; Team plan at $250/mo + computeAI engineers and startupsPython-first serverless DX, instant autoscaling, observabilityNo public self-host option; enterprise controls concentrated in paid tiers
ReplicateDirect hosting/API peerThousands of community models; custom deployment via CogDevelopers, prototyping teams, model tinkerersOne-line API, model marketplace, fine-tunesPrivate models bill setup + idle time and public enterprise posture is thinner
RunpodRaw GPU cloud / serverless substitute750,000+ developers; Pods + Serverless + ClustersCost-sensitive AI builders and infra-heavy teamsCheapest published raw GPU rates, many SKUs, fast scaleMore DIY serving stack and less turnkey inference lifecycle tooling
AWS Bedrock / SageMakerHyperscaler incumbentAWS-scale data/AI platform with provider/model menuEnterprises already committed to AWSProcurement leverage, governance, wide ecosystemComplex pricing and stronger cloud lock-in
Google Vertex AIHyperscaler incumbent200+ Google and third-party models/toolsGCP enterprise and platform teamsModel Garden, pipelines, integrated data + AI stackManagement fees and GCP dependence complicate simple cost comparisons
Azure MLHyperscaler incumbentAzure-native ML platform with 99.9% SLAAzure-centric and regulated enterprisesCentralized studio, model catalog, Azure security postureSeparate Azure service charges and no public multi-cloud story
Internal build (Truss/Cog + rented GPUs)Status quo / internal buildPortable open-source packaging on owned or rented infrastructureTeams with strong platform engineering capacityMaximum control and lowest software lock-inHighest operational burden for scaling, reliability, and compliance
Model labs building branded APIsAdjacent / likely entrantDirect API ownership with custom billing and metering surfacesFrontier model vendors and specialized labsOwn brand, own customer relationship, direct monetizationHard to maintain capacity planning and enterprise operations without a managed partner

Rows compare the main ways buyers can solve the same deployment job, including direct peers, incumbents, and internal-build substitutes.

[CP004, CP005, CP006, CP007, CP008, CP009]
FP001: Competitive positioning map

Ordinal scoring of self-serve simplicity versus enterprise control / portability.

[CP004, CP006, CP007, CP023, CP025, CP028]

3.2 Capability and pricing comparison

Baseten compares best when the buyer wants a managed inference platform rather than just a GPU rental or a one-line demo API. Public materials show a stack that combines custom-model packaging, OpenAI-compatible Model APIs, training, Chains orchestration, enterprise deployment modes, and a runtime built around low-latency optimization techniques. Modal is the sharpest developer-experience counterpoint: clean serverless pricing, generous monthly credits, and explicit GPU concurrency limits make it compelling for teams that mainly need elastic Python compute. Replicate is even lighter weight for prototypes and model discovery, but its private-model economics include setup and idle time on dedicated hardware. Runpod is the price-floor alternative, publishing cheaper raw hourly and per-second GPU rates while leaving more of the serving lifecycle to the customer. Hyperscalers are harder to compare on a like-for-like basis because Bedrock, Vertex, and Azure ML wrap model access in broader cloud billing, governance, and platform fees. Net: Baseten's public list pricing is transparent and feature-rich, but it clearly sells performance, portability, and support rather than commodity compute. That is a valid wedge only if customers value total production outcomes over the cheapest published GPU-hour.[CP003, CP011, CP012, CP013, CP014, CP015]

Feature / capability matrix
Buying criterionBasetenModalReplicateRunpodBedrock / SageMakerVertex AIAzure ML
Custom-model packaging frameworkTrussPython functionsCogContainer / handler modelCustom training + deploymentCustom training + deploymentModel catalog + deployment
OpenAI-compatible hosted open modelsyesunknownpartialunknownpartialpartialpartial
Managed training on same platformyesunknownfine-tunes onlyyesyesyesyes
Self-host / customer-cloud optionyesunknownunknownunknownno public BYOC outside AWSno public multi-cloud optionno public multi-cloud option
Multi-cloud / cloud-agnostic routingyesyesunknownmany regions / no lock-in claimnonono
Enterprise trust postureSOC2 + HIPAA + single-tenantEnterprise SSO / audit / HIPAAunknownSOC2 Type IIenterprise governanceenterprise governance99.9% SLA + Azure controls
Multi-step orchestration built inChainsgeneric functionscustom code onlyqueues + serverlessbroader platform servicespipelines + agentsbroader ML studio
Public list pricing transparencyhighhighmediumhighmediummediumlow

Cells marked unknown reflect missing public evidence; the matrix compares buying criteria, not benchmark winners.

[CP013, CP014, CP015, CP019, CP020, CP021]
Pricing / packaging comparison
VendorPublic modelContract modelPrice signalIncluded capabilitiesImplication
Baseten BasicCustom + open-source deployment$0/mo + usagePublic GPU and per-token tables; no idle-charge claimDedicated deployments, Model APIs, trainingTransparent entry point for production workloads
Baseten Pro / EnterpriseQuotedSales-led / discountedPriority compute, custom SLAs, self-host, volume discountsDedicated support, data residency, enterprise controlsUpsell is breadth and support, not lower public list price
Modal StarterServerless compute$0 + compute$30/mo credits; 10 GPU concurrencyLogs, region selection, serverless primitivesExcellent prototype and small-team on-ramp
Modal TeamServerless compute$250/mo + compute$100/mo credits; 50 GPU concurrencyCustom domains, static IP, rollbacksScales with startups while staying compute-centric
Replicate private modelsDedicated hardware for custom modelsPer-second compute incl. setup / idle / activeNo fixed seat plan; pay while instance is onlineCustom models via Cog, autoscalingWarm custom deployments can get expensive
Runpod Secure CloudRaw GPU instancesPer-hour GPU rentalExample list rates include A100 at $1.39/hr and H100 PCIe at $2.89/hrReliable pods, broad GPU menuCost floor for buyers willing to self-manage
Runpod ServerlessFlex or active workersPer-secondH100 flex $0.00116/s and active $0.00093/sAPI endpoints, queues, fast cold startsAttractive for bursty inference and scale-to-zero workloads
AWS BedrockProvider/model-specific APIsPer-token + tiered serviceBatch listed at 50% below on-demandManaged model access plus AWS add-onsEasy incumbent API path but bill complexity is higher
Google Vertex AIAgent/model platformUsage + compute + feesCompute/storage plus management fees; pipelines at $0.03/runNotebooks, Model Garden, pipelinesBest fit inside existing GCP estate
Azure MLAzure-native ML platformConsumed Azure servicesNo standalone Azure ML feeStudio, model catalog, deploymentProcurement advantage for Azure-first buyers

Public list pricing is comparable only at the headline level; negotiated discounts and workload-specific costs remain private across most enterprise deals.

[CP011, CP012, CP016, CP017, CP018, CP019]
FP002: Feature breadth / capability map

Class-level capability strengths rather than vendor-by-vendor benchmark claims.

[CP014, CP023, CP024, CP025, CP028, CP031]

3.3 Distribution power, switching costs, and trust posture

Baseten's strongest non-performance argument is that customers can keep control while still avoiding the operational burden of a self-built inference platform. Multi-cloud routing, self-hosted deployment, and single-tenant options help win buyers who fear locking critical workloads into one hyperscaler or who need tighter data-residency boundaries. The tradeoff is structural: because Baseten also relies on portable, open packaging and because adjacent tools like Cog plus raw GPU clouds remain available, hard switching costs stay lower than in closed-model APIs or data platforms. On trust, Baseten's public posture is ahead of the self-serve peer set: it pairs SOC 2 Type II and HIPAA claims with explicit statements about not storing inputs or outputs by default. Modal narrows part of that gap through enterprise-only SSO, audit logs, and HIPAA; Replicate stays strongest on ease of adoption; Runpod stays strongest on low-cost infrastructure freedom. The biggest distribution disadvantage remains hyperscaler channel power. AWS, GCP, and Azure can fold AI procurement into pre-existing billing, IAM, and cloud-commitment relationships, which means Baseten must keep proving that open-model performance, portability, and support justify a separate vendor decision.[CP023, CP024, CP025, CP026, CP027, CP028]

3.4 Moat durability and competitive risk

Baseten's moat is real but softer than a proprietary-model or data-network moat. The best-supported edge is integrated execution: optimized runtimes, multi-cloud capacity, regulated deployment options, and hands-on engineering support aimed at teams shipping serious AI products. Training and Frontier Gateway broaden the product into a larger platform story, which can strengthen account control if customers standardize on one vendor from model development through branded API delivery. The countervailing evidence is meaningful. HostFleet's April 2026 serverless GPU matrix shows Baseten as the most expensive published option across multiple common GPU tiers, while Sacra explicitly warns that hyperscalers can pressure independent vendors by bundling inference into broader cloud commitments. Baseten's own status page reports 99.91% Model API uptime over its displayed window and multiple May 2026 incidents, so four-nines reliability should be read as a sales target rather than proof that operations are already frictionless. Capital helps but does not eliminate the problem: the latest financing and investor roster improve staying power in a capital-intensive market, yet the core underwriting conclusion remains that Baseten is best positioned for premium, production-grade open-model inference workloads—not for winning the commodity price war on raw GPU hours.[CP031, CP032, CP033, CP034, CP035, CP036]

Moat durability / competitive risk register
Moat claimThreatSeverityMitigation / diligence ask
Integrated runtime + orchestration stackServerless peers replicate parts of the DX without matching full breadthmediumBenchmark full application workflows, not just one endpoint latency number
Multi-cloud + self-host portabilityCustomers can multi-home or migrate away more easily than in closed platformsmediumMeasure retention and expansion by deployment mode to see if portability still converts into durable spend
Enterprise trust postureHyperscalers bundle governance into existing contracts and cloud commitmentshighCollect regulated-industry win/loss notes against AWS, GCP, and Azure
Training + gateway expansionBaseten now competes with broader AI platform vendors and model-lab toolingmediumQuantify how often training and gateway products lead to incremental inference revenue
Open-source packagingTruss lowers lock-in and shrinks switching costhighTrack Truss-to-paid conversion and production retention by cohort
Price premium versus raw GPU cloudsRunpod and similar hosts undercut headline infrastructure pricehighProve total-cost-of-ownership and reliability ROI with customer benchmarks
Reliability brandRecent incidents weaken the four-nines story in competitive dealsmediumReview incident frequency, MTTR, and customer multi-region failover patterns
Capital backingHyperscalers and other well-funded infra vendors can still outspend BasetenmediumValidate whether investor and GPU-supply ties create real commercial or capacity advantage

Severity reflects the likelihood that each threat compresses pricing power or raises customer acquisition friction over the next 12–24 months.

[CP023, CP024, CP025, CP029, CP030, CP031]
FP003: Moat / readiness KPIs

Compact readout of Baseten's competitive durability and current pressure points.

[CP031, CP033, CP035, CP038, CP039, CP040]

3.5 Exhibits

Chapter 04

04Financials

4.1 Revenue model and public pricing

Baseten's public financial story starts with a straightforward revenue design: the company charges for production inference and adjacent infrastructure usage, not for seats. The pricing page exposes three commercial surfaces—dedicated deployments, Model APIs, and Training—and wraps them in a Basic self-serve tier plus quote-based Pro and Enterprise packaging. Model APIs are priced per million tokens, dedicated deployments are billed for compute used down to the minute, and Training is sold as both managed Training Jobs and the newer Loops workflow that feeds checkpoints straight into inference. That is a coherent production-infrastructure monetization model, and the billing-usage API reinforces it by splitting spend across dedicated inference, training, and Model APIs with daily breakdowns and credits used. The nuance is that list pricing is only the outer shell of the model. Public materials show that the real commercial wedge sits in premium support, priority compute, self-host deployment, use of existing cloud commitments, custom SLAs, and advanced security or governance. Those features imply Baseten is trying to monetize both usage and a higher-touch enterprise motion. The Terms also make the customer Order the binding commercial instrument, which means realized pricing can diverge materially from the public price page once support, discounts, and minimum commitments enter the deal. That is important for revenue recognition and yield analysis because public list prices are observable, but list-to-net economics are not. The result is a credible revenue architecture with decent public visibility into billing units and weak visibility into realized revenue quality. Public evidence can show how Baseten intends to charge; it cannot show revenue mix, attach rates for services, or what customers actually pay after enterprise negotiation.[CI001, CI002, CI003, CI004, CI005, CI006]

Revenue streams table
streammechanismunitcurrent value/statusqualitydiligence ask
Dedicated deploymentsPer-workload GPU compute plus deployment controls and supportGPU-minute / contractPublic list pricing and quote-based Pro/Enterprise packaging existMedium for billing unit; low for realized priceProvide customer-level realized rate cards, discounts, and gross margin by GPU family.
Model APIsUsage-priced hosted models through OpenAI-compatible endpoints1M tokensPublic list pricing exposed with separate input, cached-input, and output columnsHigh for list unit; low for realized yieldProvide token volume by model, cache share, batch share, and realized net revenue per token.
Training Jobs / LoopsManaged training workloads that connect directly into inference deploymentGPU-minute / job / contractCommercial surface exists, but public list pricing is not disclosedLowProvide training-job pricing schedule, contribution margin, and attach rate into production inference.
Support and engineering servicesHands-on engineering, Slack/Zoom support, deployment optimization, and enterprise assistanceservice attachment / contractClearly present in Pro and Enterprise messaging, but no standalone rate cardLowProvide services attach rate, blended pricing, and whether support is margin-accretive or subsidized.
Enterprise self-host / cloud-commitment portabilityBaseten software and support layered into customer-cloud or hybrid deploymentscustom contractPublicly marketed as a key enterprise feature setLowProvide typical annual contract value, minimum commit, and renewal behavior for self-hosted accounts.

Public evidence supports revenue surfaces and billing units, but not product-level revenue mix or realized pricing.

[CI001, CI002, CI003, CI004, CI005, CI006]
Pricing / monetization table
price / unit / contractlist vs realized pricingdiscounts / unknownssource-backed implication
Basic: $0 per month pay-as-you-goPure list priceNo public conversion, ARPU, or activation dataSelf-serve entry point broadens funnel but says nothing about paid conversion.
Pro: quote-based with priority compute, dedicated compute, higher API rate limits, dedicated supportList package, realized price hiddenVolume discounts available but depth undisclosedRevenue quality likely depends on how often support and priority capacity attach to usage.
Enterprise: quote-based self-host, custom SLAs, cloud commitments, data residency, advanced RBACList package, realized price hiddenNo public minimum commit, renewal terms, or services pricingEnterprise value proposition is operational control, not transparent SKU pricing.
Model APIs: per-1M-token pricing with separate input, cached input, and output ratesList pricingNo enterprise rate card, batching curve, or mix dataUseful for benchmarking, but list token prices are not realized revenue.
Dedicated compute: billed by compute used down to the minuteList billing ruleHostFleet says minimum dedicated deployment cost and billed awake times still applyScale-to-zero helps, but minimums can compress savings for spiky workloads.
Fees and invoicing: billed end-of-month and due in 30 days unless an Order says otherwiseContract rule rather than product list priceOrders govern actual economicsRevenue recognition and payment timing likely vary by negotiated enterprise order form.

This table separates public list mechanics from private realized economics; all discount, minimum-commit, and enterprise-rate questions remain open.

[CI002, CI003, CI004, CI005, CI006, CI007]
FI001: Revenue model bridge

Baseten turns usage across dedicated deployments, Model APIs, and Training into metered spend, then converts a subset into higher-value enterprise and support contracts.

Flow depicts commercial logic rather than a quantified waterfall; public evidence does not disclose revenue mix or realized contract values.

[CI001, CI005, CI006, CI008, CI011, CI012]

4.2 GTM motion and sales-efficiency proxies

Baseten's go-to-market is best understood as land-with-usage, then expand through reliability, support, and deployment control. The Basic plan and public Model APIs create a low-friction developer entry point, but the monetization narrative shifts quickly toward Pro and Enterprise features such as dedicated compute, higher rate limits, hands-on engineering, self-hosting, and cloud-commitment portability. That packaging suggests Baseten is not trying to win a commodity self-serve race alone; it is trying to become the production-inference layer for teams that care about latency, uptime, and control enough to pay for operational help. Because Baseten is private, CAC, payback, enterprise sales cycle length, and NRR are unavailable. The best public substitutes are customer case studies. Writer credits Baseten with lower cost per million tokens and higher throughput on 70B-class models. OpenEvidence emphasizes flexible access to compute without multi-year reservations plus large deployment and maintenance-time gains. Speechify reports that it could retire a large self-managed GPU estate while cutting cost per million characters. Superhuman and Patreon frame the value proposition as saving scarce engineering time while materially improving latency or lowering GPU cost. Those are not audited financials, but they are directionally consistent with a GTM motion that sells time-to-production and lower total operating cost rather than just list-price compute. The evidence therefore supports a plausible expansion engine, but only in proxy form. The buyer logic is visible; the sales-efficiency math is not. Without internal conversion, retention, and sales-spend data, Baseten's GTM efficiency cannot be underwritten with confidence.[CI003, CI004, CI014, CI018, CI019, CI020]

4.3 Cost structure and unit-economics proxies

Baseten's public materials point to an asset-light but not necessarily low-price cost structure. Sacra describes the company as aggregating capacity across more than 15 cloud providers rather than owning GPU infrastructure outright, which should keep fixed asset intensity below a provider that buys and finances GPU fleets directly. Official materials reinforce the same model from another angle: Baseten talks constantly about multi-cloud capacity management, cross-cloud autoscaling, scale-to-zero, and running in the customer's cloud when needed. In principle, that should let the business flex supply and match cost to workload shape more tightly than a dedicated owned-fleet operator. But asset-light does not mean cheap. HostFleet's April 2026 serverless GPU matrix shows Baseten priced above Runpod on every shared SKU listed and above Modal on the shared L4 and H100 rows, while only Replicate's A100 custom deployment price sits higher among the overlapping A100 rows shown. That is the clearest adverse signal in the public record: Baseten sells a premium managed layer over raw compute. The company's rebuttal, effectively, is its own performance narrative. Dedicated Inference claims 6x better GPU utilization and 5-10x lower costs on optimized runtimes; Model APIs claim 5-10x lower spend versus closed alternatives; customer studies report lower per-unit cost, fewer engineers, or both. Those claims are consistent with a thesis that Baseten expands gross margin through utilization and support attachment, but public data still stops short of a gross-margin proof. That leaves unit economics in proxy territory. We can see the billing unit. We can see that some customers say total cost fell. We can see that list pricing is premium to raw GPU clouds. What we cannot see is the realized balance between cloud pass-through, support labor, negotiated discounts, and retention.[CI006, CI007, CI015, CI016, CI017, CI018]

Unit economics table
metricvalue / public proxyconfidencewhy it mattersdiligence ask
Published billing unitGPU-minute for dedicated inference; per-1M-token for Model APIsHighShows Baseten monetizes usage rather than seats.Provide realized billing mix by workload type.
Company-claimed utilization lever6x better GPU utilization and 5-10x lower costs on Dedicated InferenceMediumIf true, utilization is the core gross-margin lever.Provide before/after utilization histograms and gross margin by optimized runtime.
Price-floor pressureHostFleet shows Baseten premium to Runpod on shared SKUs and premium to Modal on shared L4/H100 rowsMediumPremium pricing must be justified by lower total cost, not raw GPU-hour parity.Provide win/loss analyses where Baseten beats cheaper raw-GPU alternatives.
Writer proxy35% lower cost per million tokens; 60% higher tokens/sec; 23% lower TTFTMediumSuggests performance optimization can offset premium list pricing.Provide benchmark methodology and comparable customer gross margin impact.
OpenEvidence / Speechify proxy78% lower latency, 6x faster deployment, 8x lower maintenance, 44% lower cost per million charactersMediumSupports TCO argument through both infra savings and fewer platform engineers.Provide audited customer expansion and retention data after migration.
Patreon / Superhuman proxy$600k resources saved yearly, 70% GPU-cost savings, 80% lower latency, multiple engineers freedMediumShows economic value can sit in labor efficiency as well as compute savings.Provide cohort-level NRR and services attach for customers citing labor savings.
Gross margin / CAC / NRRNot disclosed publiclyLowWithout these, no public unit-economics bridge closes.Provide gross margin by product line, CAC, payback, churn, and NRR.

Rows mix official list mechanics with customer-proof proxies and independent price-floor checks; none substitute for disclosed gross-margin data.

[CI005, CI006, CI015, CI018, CI019, CI020]
FI002: Unit economics bridge

Public unit-economics evidence runs from workload shape to utilization and support economics, but breaks before gross margin because realized discounts and COGS are private.

The bridge is directional. Public evidence supports the nodes qualitatively or via case-study proxies, but not a closed margin equation.

[CI007, CI015, CI018, CI019, CI020, CI021]

4.4 Capital adequacy and financing dependency

Baseten's capital position looks strong on paper and opaque in practice. The public record supports $75 million of Series C financing in February 2025, $150 million of Series D financing in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. Business Wire, Tracxn, and CB Insights all point to roughly $585 million of cumulative capital raised, and Business Wire explicitly characterizes the January 2026 round as the company's third fundraise within the prior year. That pace matters: Baseten is clearly not operating from slow-burn cash generation; it is financing growth aggressively as inference demand expands. The use-of-funds language reinforces that interpretation. Baseten's own Series E post centers the new money on speed, uptime, developer experience, and broadening the infrastructure platform. Tech Funding News adds expected hiring, customer-support expansion, and more integrations. Public headcount proxies line up with that investment story: PitchBook's 2025 snapshot showed 73 employees, while Tracxn listed 258 employees by April 2026. Even if those datasets are imperfect, the direction is clear—Baseten appears to be scaling operating expense meaningfully alongside product and infrastructure scope. What the public record does not show is whether the current capital base is sufficient relative to burn. There is no disclosed cash balance, monthly burn, runway, debt schedule, or covenant package. Sacra's reported $200 million to $600 million annualized revenue estimates suggest substantial scale, and the reported $11 billion to $15 billion valuation talk suggests the market may be willing to finance the next leg. But those are not substitutes for cash, margin, and runway disclosure. The only hard conclusion is that Baseten has had strong access to capital; whether it is adequately capitalized against actual burn remains private.[CI027, CI028, CI029, CI030, CI031, CI032]

Capital adequacy table
metricpublic value / statussource-backed implicationdiligence ask
Total capital raised$585M total capital raised publicly reportedCapital access has been strong enough to fund rapid scale-up, but cash remaining is unknown.Provide current cash balance and unrestricted cash after the January 2026 round.
Latest financing$300M Series E at $5B valuation in January 2026Fresh equity materially improved flexibility entering 2026.Provide post-close cash bridge and board-approved operating plan.
Funding cadenceSeries C $75M (Feb 2025), Series D $150M (Sep 2025), Series E $300M (Jan 2026)Three rounds inside roughly a year implies aggressive investment mode and possible dependence on capital markets.Provide target next-round timing and financing contingency plan.
Planned use of fundsSpeed, uptime, developer experience, team growth, platform expansion, more integrations and supportSpend appears aimed at product, infra, and headcount rather than harvest mode.Provide 24-month capex / opex budget by function.
Cash balance / monthly burn / runwayNot publicly disclosedCapital adequacy cannot be underwritten from public evidence.Provide monthly net burn, cash balance, runway under base and downside cases.
Debt / project-finance obligationsNo public debt schedule or project-finance obligations disclosed; SEC operating-company filings unavailable from cited EDGAR pageAbsence of disclosure does not equal absence of obligations.Provide all debt facilities, cloud-commitment liabilities, reserved-capacity obligations, and major vendor terms.

Funding facts are public; adequacy is not. This table intentionally separates known financing history from unknown liquidity and obligation metrics.

[CI027, CI028, CI029, CI030, CI031, CI033]
FI003: Financial estimate range

Public financial signals span wide ranges because they mix closed financing facts with third-party estimates and snapshots rather than audited financials.

Revenue and upper valuation bound are third-party estimates rather than company-disclosed audited figures. Headcount spans two different vendor datasets and dates.

[CI027, CI035, CI037, CI038, CI048]
FI004: Capital intensity / cash-flow map

Baseten's cash-flow logic appears to run from repeated equity raises into product, support, and capacity orchestration, but the residual cash position and obligations remain private.

This map shows direction of cash use and financing dependence rather than a numeric cash-flow statement because public burn and cash data are unavailable.

[CI027, CI028, CI029, CI030, CI033, CI034]

4.5 Financial verdict and disclosure gaps

The financial verdict is positive on business-model coherence and negative on underwriteable disclosure. On the positive side, Baseten clearly monetizes the right units for its product category: GPU-time, token usage, training jobs, and high-touch enterprise features. Customer proofs consistently reinforce the same story—that production-grade inference is valuable when it lowers total operating cost, shrinks engineering burden, and preserves latency or uptime under real workloads. Capital access has also been unusually strong, with three rounds in roughly a year culminating in a $300 million Series E. On the negative side, nearly every metric that turns a good story into an investment case remains private. Public sources do not show revenue mix by product, realized enterprise pricing, gross margin, CAC, NRR, churn, customer concentration, cash balance, monthly burn, or runway. The SEC EDGAR entity landing page tied to the company lookup does not provide public operating-company filings, so there is no audited financial bridge to fall back on. Reliability evidence is adequate but not spotless: the status page shows recent incidents and the SLA target is 99.9%, not the perfect uptime implied by the strongest marketing claims. Net: Baseten looks like a premium, usage-based inference platform with real demand and credible cost-saving proxies, but the public record is still too thin to underwrite revenue quality or capital adequacy rigorously. The right diligence posture is to treat the company as promising but disclosure-light until private financials close the gaps listed below. Public customer proof now spans finance workflows (Hebbia), coding products (Zed and Posit), voice interfaces (Wispr Flow), and world-model experimentation (World Labs), which strengthens the case that Baseten's usage-based revenue opportunity is diversified across several demanding production workloads rather than one narrow niche.[CI024, CI025, CI027, CI028, CI033, CI040]

Public financial gaps table
missing private metricimpact on underwritingexact diligence path
Revenue mix by Model APIs vs Dedicated Inference vs Training vs servicesCannot judge whether growth is durable software-like expansion or support-heavy services revenue.Request monthly revenue by product surface for the last 18 months plus contribution margin by surface.
List-to-net pricing, enterprise minimum commits, and discount schedulesPublic list pricing may overstate realized yield and margin.Review five recent enterprise orders with associated discount approvals and usage curves.
Gross margin by product line and cloud / GPU procurement termsImpossible to assess whether optimization claims translate into retained gross profit.Provide product-level COGS, major cloud spend by provider, and any reserved-capacity commitments.
Cash balance, monthly burn, and runwayCapital adequacy cannot be underwritten despite recent fundraises.Provide current cash waterfall, trailing six-month burn, and scenario runway model.
Customer concentration, NRR, churn, and cohort expansionCannot test revenue quality or durability beyond anecdotal customer stories.Provide top-20 customer revenue concentration, logo churn, dollar churn, and cohort NRR.
Public filing and audit trail depthLack of SEC operating financials leaves investors dependent on management-only materials.Provide audited financial statements, board package KPIs, and any lender reporting packs.

Every row is a material diligence blocker rather than a nice-to-have. Public evidence establishes narrative direction, not underwriteable private metrics.

[CI047, CI050]
Chapter 05

05Product & Technology

5.1 Product Surface in Customer Workflow Terms

Baseten now spans most of the modern AI deployment workflow rather than a single hosting SKU. At the lightest-weight end, Model APIs let teams swap an OpenAI or Anthropic base URL and call shared frontier/open models immediately, which is useful for prototyping or for products that do not need dedicated GPUs. At the heavier-weight end, Truss packages a custom or open-source model into a reproducible deployment artifact, while Dedicated Inference adds tenant-isolated capacity, custom scaling, and support for stricter latency or compliance needs. Chains sits above single-model inference for multi-step RAG, transcription, or multimodal flows, Frontier Gateway adds branded URLs plus billing/rate-limit controls for model labs monetizing their own APIs, and Baseten Training/Loops try to close the loop from checkpoint creation to production serving. In customer workflow terms, the company is selling a graduated path from instant API evaluation to custom production inference, not just raw GPU rental.[CE001, CE002, CE006, CE007, CE008, CE009]

Product Module / Asset Matrix
Module / AssetPrimary userStatus / maturityCore functionDifferentiationDiligence gap
Model APIsApplication developers evaluating or shipping frontier/open modelsGA / mature shared serviceInstant OpenAI- and Anthropic-compatible inference on Baseten-managed shared infrastructureLowest-friction entry point; built-in caching, tool calling, structured outputs, and migration path to dedicated deploymentsShared infrastructure by design; public docs do not disclose tenant-level contention controls or per-model benchmark methodology
Dedicated InferenceTeams serving custom, fine-tuned, or proprietary models in productionGA / core enterprise surfaceSingle-tenant or customer-controlled inference with custom hardware, scaling, and deployment optionsCombines managed performance tuning, cross-cloud autoscaling, and enterprise control surfacesOnly public contractual SLA is 99.9%; published GPU-hour pricing is high versus self-serve peers
TrussML engineers packaging and iterating on custom deploymentsMature open-source CLI with active May 2026 releasesPackages model code, weights, dependencies, and GPU config; deploys via uvx truss push/watchWrite-once packaging abstraction with live reload and support for many serving frameworksOpen-source activity is healthy, but public docs do not quantify enterprise adoption of Truss specifically
ChainsTeams building RAG, transcription, or multi-model workflowsGA / production workflow layerOrchestrates Python chainlets with per-step hardware, dependencies, and autoscalingLets Baseten sell compound-AI workflows without forcing monolithic model deploymentsPublic performance claims are directional; workflow-specific latency depends on design and workload
Frontier GatewayAI labs commercializing their own hosted modelsGA / specialized monetization surfaceWhite-labeled inference API with key management, billing, metering, rate limits, and branded URL routingTurns Baseten into invisible infrastructure for labs that want their own API brand and monetization layerPublic documentation does not describe customer count, supported billing edge cases, or settlement workflows
Training Jobs / LoopsResearch and infra teams training or post-training models before deploymentTraining Jobs = GA; Loops = early accessManaged GPU training plus a path to deploy checkpoints into inference endpointsAttempts to close training-to-inference loop inside one platform rather than handing off to another vendorLoops is still early access, so maturity is uneven versus the inference stack

Status labels reflect Baseten's own public wording as of 2026-05-30. “Mature” means a repeatedly described GA surface with operational documentation; it does not imply externally audited feature quality.

[CE001, CE002, CE006, CE007, CE008, CE009]
Workflow / Use-Case Table
User jobCurrent workflowBaseten solutionPublic benefitLimitation
Prototype with a frontier/open model quicklySwap providers without building deployment infraModel APIsPoint an existing OpenAI or Anthropic SDK at Baseten and start calling supported models immediatelyYou accept the supported-model list and shared-infrastructure model rather than choosing exact hardware
Deploy an open-source or proprietary model to productionPackage model, pick hardware/engine, expose stable endpointTruss + Dedicated InferenceConfig-driven deployment path with TensorRT-LLM or custom-server options, observability, and environment promotionCustomer still needs to validate performance/cost trade-offs per model because Baseten does not publish a universal benchmark methodology
Run a compound AI applicationSplit a multi-step workflow across specialized componentsChainsEach chainlet can use its own hardware and autoscaling, reducing monolithic GPU waste and latency bottlenecksPublic performance claims are directional; workflow-specific latency depends on design and workload
Commercialize a lab-owned modelExpose model to third-party customers with metering and rate limitsFrontier GatewayWhite-labeled URL, key management, usage limits, and per-customer billing remove the need to build an API gateway from scratchCommercial and contractual details beyond the marketing surface are not public
Train or fine-tune and then deploy checkpointsRun training code, sync checkpoints, and promote into inferenceTraining Jobs / Loops + deploy_checkpointsSame vendor can cover managed training infra and downstream deployment endpointLoops remains early access, so the most advanced post-training path is not yet fully mature in public materials

Benefits are public-product claims and customer-proof outcomes, not guaranteed customer results. Each row describes the workflow Baseten markets most clearly, not every possible implementation variant.

[CE001, CE002, CE005, CE006, CE007, CE008]
FE002: Customer Workflow / Operating Flow

How a team moves from evaluation or packaging into production inference on Baseten, with an optional training and gateway path.

[CE002, CE005, CE006, CE007, CE008, CE009]

5.2 Deployment Architecture and Operating Model

The clearest technical differentiator in Baseten's public corpus is that it explains the deployment path with more specificity than many AI-infrastructure startups. Truss abstracts packaging, dependencies, and GPU configuration; Baseten's build step then validates and uploads the package, compiles supported LLMs with TensorRT-LLM when the engine path is selected, and deploys the resulting container behind a dedicated model subdomain. The MCM control plane sits underneath both training and inference, abstracting cloud-provider differences and rerouting capacity across regions or providers when needed. Request routing resolves environment names from URL paths, environments preserve stable endpoints as deployments are promoted, and async requests enter a queue that protects real-time traffic from background work. BDN addresses the cold-start bottleneck by mirroring and caching large model weights at multiple layers so new replicas are less dependent on external storage. The result is an inference-first architecture with explicit build, routing, autoscaling, and weight-delivery primitives.[CE003, CE004, CE005, CE010, CE011, CE012]

Technology / Operating Architecture Table
Layer / componentPublic mechanismKey dependenciesRisk / limitation
Packaging layer (Truss)Packages model definition, dependencies, secrets, caching, and GPU config from config.yaml or Python model codeBaseten CLI, GitHub/PyPI distribution, user source repositoriesAbstraction is strong, but deployment success still depends on model-specific tuning and user-supplied weights
Build / compile pathEngine-Builder-LLM downloads weights, compiles with TensorRT-LLM, applies quantization/tensor parallelism, and emits a serving containerHugging Face or cloud-storage weight source, CUDA-compatible GPU targetsCompile times can take minutes and public docs do not benchmark every model/hardware combination
Runtime optimization layerInference runtime exposes TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured output, KV-cache optimization, and topology-aware parallelismModel architecture support, Baseten inference runtime, GPU memory/layout assumptionsOptimization options are numerous but not all are validated publicly per workload
MCM control planeUnifies GPUs across clouds/regions, provisions resources, monitors health, and reroutes around capacity crunches or outagesUnderlying cloud GPU supply, networking, Baseten control planeCross-cloud abstraction reduces lock-in but introduces dependency on Baseten’s own orchestration layer
Weight delivery / cold-start pathBDN mirrors weights to Baseten storage and caches them at mirrored-origin, cluster, and node layersUpstream weight repository for first mirror, Baseten blob storage, in-cluster cacheFirst deploy still depends on upstream weight availability; benchmark methodology for “2-3x faster” is not public
Request routing / environmentsEach model gets a subdomain; URL path resolves environment; async requests queue; promotions keep endpoint names stableBaseten API gateway, environment config, autoscaler, queue serviceScale-to-zero introduces cold-start trade-offs and regional guarantees require a special endpoint
Workflow orchestration / trainingChains coordinates multi-step workflows; Training Jobs and Loops provision GPUs via MCM and can deploy checkpoints into inferencePython SDK/CLI, MCM, storage/checkpoint syncLoops is early access and training maturity lags the core inference surface
Enterprise controls / tenancySingle-tenant, self-hosted, hybrid, and regional environments plus SSO/SCIM and compliance-policy boundariesCustomer IdP, Baseten support setup, customer cloud if self-hostedSome controls require sales/support intervention rather than pure self-serve enablement

This table mixes direct documentation facts with synthesis about operating dependencies. “Risk / limitation” names the public caveat or diligence item, not an observed failure.

[CE003, CE004, CE005, CE006, CE007, CE010]
FE001: Baseten Product Architecture Map

Layered view of Baseten's public architecture from access surfaces through packaging/orchestration to runtime and cross-cloud infrastructure.

[CE001, CE002, CE006, CE007, CE009, CE010]

5.3 Trust, Data Handling, and Reliability Controls

Baseten's trust posture is strong by startup standards and particularly important because the company wants sensitive inference workloads. The public security documentation says Baseten maintains SOC 2 Type II and HIPAA compliance, does not store model inputs, outputs, or weights by default, never shares GPUs across users, isolates customers into dedicated Kubernetes namespaces, and uses controls such as Calico, Falco, and Gatekeeper around workload isolation. Enterprise and pricing pages also advertise self-hosted, single-tenant, hybrid, and region-restricted options, while regional-environments docs explain that true residency requires a distinct regional endpoint rather than the default environment CNAME. The nuance is reliability: Baseten marketing frequently uses four-nines language, but the only public contractual commitment is the Dedicated Inference SLA at 99.9% monthly availability. The public status page also logged multiple May 2026 incidents, so diligence should treat trust/compliance as a strength and public reliability guarantees as more mixed.[CE015, CE016, CE017, CE018, CE019, CE020]

Trust / Quality / Compliance Table
Control / signalPublic statusScope / evidenceImplicationGap
SOC 2 Type II + HIPAAPublishedSecurity docs, enterprise page, and pricing page all cite SOC 2 Type II and HIPAAStrong baseline trust signal for enterprise inference workloadsNo public certificate artifacts or audit scope details in the reviewed corpus
Default non-storage of prompts/outputs/weightsPublished with caveatsSecurity docs say Baseten does not store inputs, outputs, or weights by default, except temporary async storage and optional cachingImportant privacy and IP positioning for sensitive inferenceNeed contract/DPA review for exact retention edge cases and customer-enabled caching behavior
GPU and namespace isolationPublishedSecurity docs say Baseten never shares GPUs across users and assigns each customer a dedicated Kubernetes namespace with Calico/Falco/Gatekeeper controlsSupports tenant isolation claims beyond generic cloud marketingNo public penetration-test report or architecture diagram was reviewed
Regional environments / data residencyPublished but support-configuredDocs explain region-constrained replicas and special regional endpoint formatsUseful for GDPR/data-residency buyers that need routing guaranteesSetup requires Baseten involvement and public docs do not state lead times or pricing
Identity and lifecycle controlsExpanded in 2026SSO/SCIM changelog adds SAML 2.0, SCIM 2.0, JIT provisioning, deprovisioning, and group-based roles on EnterpriseImproves enterprise admin hygiene and procurement readinessNo public mapping to specific IdP limitations or SCIM attributes
Hosting flexibilityPublishedEnterprise, dedicated-inference, and security pages describe Baseten Cloud, self-hosted, hybrid, and single-tenant modesLets buyers choose between speed, control, and cloud-commitment reuseExact operational split between Baseten-managed and customer-managed responsibilities is not fully public
Contractual availabilityMixedSLA contract says Dedicated Inference targets 99.9% monthly availability; marketing pages often use 99.99/four-nines languagePublic procurement should rely on the legal SLA rather than homepage shorthandNo public SLA was found for Model APIs, Chains, or the broader web app
Operational incident visibilityPublished but mixedPublic status page shows multiple May 2026 incidents; third-party reachability tracker says detailed incident data is unavailableVisibility exists, but independent uptime corroboration is thin and headline uptime panels can hide short incidentsNeed contractual incident response terms, RCA access, and service-credit history in diligence

Confidence is highest where multiple official pages agree; lower where public documentation requires contact with support or contract review. This table describes what is public, not what Baseten may provide privately in diligence.

[CE015, CE016, CE017, CE018, CE019, CE020]
FE003: Critical Dependency Map

Key dependencies that sit around Baseten's inference stack: upstream weights, GPU clouds, regional/identity controls, and non-core SaaS tooling.

[CE010, CE016, CE023, CE024, CE025, CE036]

5.4 Developer Signal, Customer Proof, and Competitive Positioning

Baseten's moat is not the lowest published unit price; it is the bundle of packaging tooling, performance engineering, and managed cross-cloud operations. Truss gives Baseten a real developer surface: the GitHub repo and PyPI package show an active open-source packaging CLI with frequent May 2026 releases centered on Loops and deployment workflows. Customer proof is stronger than generic logo pages: Writer says Baseten-built TensorRT-LLM engines improved tokens per second and lowered time to first token and cost, while OpenEvidence attributes materially lower latency, faster deployments, and lower maintenance burden to Baseten's MCM, embeddings runtime, and tooling. The trade-off is visible in independent pricing comparison. HostFleet's April 2026 matrix shows Baseten priced above Runpod and Modal on comparable GPU instances, while Runpod and Modal market more aggressive zero-idle and cold-start positioning. Against AWS, Google, and Microsoft, Baseten is narrower in scope but easier to read as an inference-specialist layer rather than a full hyperscaler AI platform.[CE026, CE027, CE028, CE029, CE030, CE031]

5.5 Roadmap and Product Maturity

Public 2026 roadmap signals point to Baseten maturing operating controls around a fairly stable product architecture rather than constantly adding new product families. The notable 2026 releases were SSO/SCIM, rolling deployments, BDN, and a billing usage API—features that make the platform easier to govern, safer to update, faster to cold-start, and easier to instrument financially. That release mix suggests Baseten is moving from “can this serve models?” toward “can this run mission-critical inference inside enterprise processes?” At the same time, maturity is uneven across the stack. Training Jobs are publicly GA, while Loops remains early access, so the training-to-inference story is strategically promising but not uniformly production-proven. Public materials also say little about benchmark methodology, exact enterprise onboarding lead times for regional controls, or product priorities beyond the currently disclosed 2026 changelog, leaving some product-tech diligence items unresolved.[CE007, CE020, CE021, CE022, CE037, CE038]

Roadmap / Release / Development-Stage Table
Date / stageFeature / milestoneStatusImplicationSource
2026-03-04Billing usage APILaunchedMakes Dedicated Inference, Training, and Model APIs easier to instrument financially via daily API breakdownsBaseten changelog
2026-03-19Baseten Delivery Network (BDN)LaunchedSignals investment in cold-start mitigation and independence from upstream weight stores after first mirrorBaseten changelog + How Baseten Works docs
2026-03-30Rolling deploymentsLaunchedAdds safer zero-downtime promotions and better environment lifecycle control for production releasesBaseten changelog
2026-05-14SSO and SCIMLaunched on EnterpriseImproves identity governance and deprovisioning for larger customersBaseten changelog
2026 public product stateTraining JobsGAShows the managed-training product is no longer experimentalTraining product page
2026 public product stateLoopsEarly accessIndicates Baseten is investing in post-training/RL workflows but has not yet fully hardened the surface publiclyTraining product page + Truss releases

Only explicitly disclosed public milestones are listed. Absence from this table should not be read as absence from the internal roadmap; it only means the item was not visible in the reviewed public corpus.

[CE007, CE020, CE021, CE022, CE037, CE038]
FE004: Product Maturity / Capability Map

Capability-by-maturity view of Baseten's main product surfaces as of 2026-05-30.

Maturity labels are synthesis, not company-provided scores. “Differentiated” means the public corpus shows a clearer relative advantage or stronger external proof, not that the capability is objectively category-leading on all dimensions.

[CE007, CE026, CE027, CE031, CE032, CE037]

5.6 Exhibits

Chapter 06

06Customers

6.1 Customer segmentation and buyer profile

Baseten's public customer evidence points to a buyer set made up primarily of AI-native software builders whose own end products live or die on inference speed, reliability, and cost. The buyer is usually an ML, platform, or product-engineering leader, while the user base broadens into application engineers, security teams, and operations leads once a workload moves from model evaluation into production. Publicly named examples span enterprise agent platforms such as Writer and Notion, regulated healthcare apps such as OpenEvidence and Abridge, voice and speech applications such as Speechify, productivity software such as Superhuman, creative tools such as Gamma, and GTM or coding products such as Clay and Cursor. That breadth matters because it shows Baseten is not only selling experimentation infrastructure. At the same time, the disclosed book is still overwhelmingly AI-native software rather than a diversified set of legacy enterprises, so customer breadth is real but not yet institutionally broad in public.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
SegmentBuyer / user / payerRepresentative public accountsPublic value signalGap
Enterprise agent and knowledge platformsBuyer: CIO / AI platform leader; User: app and ops teams; Payer: enterprise software vendorWriter, NotionMission-critical AI agents with security and governance requirementsNo disclosed revenue mix by enterprise vs startup accounts
Healthcare AI applicationsBuyer: clinical/IT leadership; User: clinicians, care teams, revenue-cycle ops; Payer: healthcare AI vendor or health system software budgetOpenEvidence, Abridge, AmbienceRegulated medical information and clinical documentation workloadsNo public contract-value or health-system count disclosure
Voice and speech applicationsBuyer: product / ML platform lead; User: end users and content teams; Payer: voice application vendorSpeechifyConsumer-scale TTS and voice infrastructure under real-time latency pressureNo public disclosure of revenue concentration inside audio workloads
Productivity and collaboration appsBuyer: product / engineering leadership; User: professionals and knowledge workers; Payer: software vendorSuperhuman, NotionInference directly affects email, workspace, and agent UXProduction KPIs are public for Superhuman, not for Notion
Creative and creator-economy platformsBuyer: product / infra lead; User: creators and prosumers; Payer: software vendorGamma, PatreonLarge-scale image generation and creator-media workloadsConsumer logos do not by themselves prove long-term renewal economics
GTM and developer toolsBuyer: engineering / revenue-ops lead; User: developers, GTM operators, recruiters; Payer: software vendorClay, Cursor, MercorShows Baseten exposure to code, GTM, and AI-economy toolingOnly public name mentions exist for most of these accounts

Segmentation is assembled from Baseten case studies, fundraising references, and official customer pages; Baseten does not publish customer-count or ARR mix by segment.

[CU001, CU002, CU003, CU004, CU005, CU006]
Publicly named strategic account table
CustomerSegmentEvidence typeWhat is publicProof strengthGap
AbridgeHealthcare AIBusiness Wire + Abridge siteNamed as Baseten customer; Abridge sells enterprise clinical-conversation AI to major health systemsMediumNo Baseten-specific deployment scope or outcomes disclosed
CursorDeveloper toolsWorkOS + Cursor siteNamed as Baseten customer; Cursor serves AI-assisted coding and says it is trusted by over half the Fortune 500MediumNo Baseten-specific workload detail or economics disclosed
Notion AIEnterprise productivityBusiness Wire + Notion AI pageNamed as Baseten customer; Notion AI markets agents, enterprise search, and zero-retention enterprise controlsMediumNo Baseten-specific performance or spend data disclosed
ClayGTM softwareWorkOS + Clay siteNamed as Baseten customer; Clay serves a large GTM-team base with enrichment and workflow automationMediumNo Baseten-specific production metrics disclosed
MercorAI economy / recruitingBusiness Wire + Mercor siteNamed as Baseten customer; Mercor positions itself around powering the AI economyLow-mediumPublic customer mention exists, but use case and infrastructure dependence are not described

These rows extend the named customer set beyond flagship case studies, but they are weaker proof than the six detailed stories because they usually lack deployment specifics.

[CU010, CU011, CU047, CU048, CU049, CU050]
FU001: Customer journey map

Baseten typically starts with a technical buyer evaluating model performance, then expands into operational, security, and procurement stakeholders as the workload becomes business-critical.

[CU001, CU003, CU036, CU037, CU038, CU039]

6.2 Adoption trajectory and named production proof

Baseten's best public adoption evidence is not a disclosed customer-count time series so much as a stack of workload-scale and outcome disclosures from reference customers. The company says inference volume grew 100x over the last year, and its customer-story index now spans healthcare, code, audio, presentations, and operations use cases. The flagship six case studies are production-grade rather than pilot-like: OpenEvidence discloses billions of requests per week and a doctor in every U.S. zip code, Speechify discloses 161B+ characters per month for 60M+ users, Gamma discloses 3M+ images per day for 70M+ users, Superhuman says dozens of custom models moved into production, and Patreon reports large cost savings on a scaled Whisper deployment. The quality of proof is strongest where Baseten and the customer share concrete latency, cost, throughput, or workflow metrics, though the public set remains curated.[CU012, CU013, CU014, CU015, CU016, CU017]

Customer growth / adoption trajectory table
MetricValueDateSourceConfidenceImplication / missing denominator
Baseten platform inference growth100x inference-volume growth in the last year2026-01Baseten Series E blogMediumStrong demand signal, but not broken out by customer or workload
Public reference inventory13 case studies, 29 testimonials, 4 videos, 654 ratings2026-05FeaturedCustomersMediumLarge public reference surface, but aggregator methodology is not fully transparent
OpenEvidence workload scaleBillions of requests per week; doctors in every U.S. state and zip code2026 viewedBaseten case studyMediumShows national clinical reach, but not revenue or contract expansion
Speechify workload scale161B+ characters/month for 60M+ users2026 viewedBaseten case studyMediumVery large consumer-scale inference load
Gamma workload scale3M+ images/day for 70M+ users2026 viewedBaseten case studyMediumStrong PLG-scale proof, but no disclosed share of Gamma traffic on Baseten
Superhuman production breadthDozens of custom embedding models switched into production after a one-week project2026 viewedBaseten case studyMediumIndicates deployment breadth even without volume metrics
Additional named accountsAbridge, Cursor, Clay, Notion, Mercor and others named publicly2026-01Business Wire / WorkOSMediumBreadth extends beyond the six flagship stories, but most lack quantified deployment detail

Baseten does not publish one normalized customer-count series, so the table uses workload and reference proxies rather than a single active-account KPI.

[CU012, CU013, CU014, CU015, CU016, CU017]
Named customer proof table
CustomerSegmentDeployment / use caseProduction vs pilotOutcomeLimitation
WriterEnterprise AI platformServe custom 70B domain-specific LLMs with TensorRT-LLM on BasetenProduction60% higher tokens/sec, 23% lower TTFT, 35% lower cost per million tokensNo renewal term or contract value disclosed
OpenEvidenceHealthcare AIMedical search and embeddings inference for cliniciansProductionLatency cut from >700ms to 160ms, 6x faster deployments, 8x+ less infrastructure maintenanceNo public spend or account-expansion disclosure
SpeechifyVoice / TTSHost 10+ production model deployments across TTS, voice conversion, and parsingProduction44% lower cost per million characters, 30-50% lower p99 latency, 4.5x faster startupNo disclosed revenue concentration or contract duration
GammaCreative AI platformServe open-source image generation models at massive user scaleProduction30%-80% faster generation, 20% better efficiency, 3M+ images/dayNo disclosed retention or spend-per-user metrics
SuperhumanAI-native productivityDeploy dozens of custom embedding models for core product featuresProduction80% lower P95 latency and rapid migration with zero user impactNo public seat-count or contract economics
PatreonCreator economy platformServe Whisper transcription and captioning workloadsProduction70% lower GPU cost, 440+ hours saved, nearly $600k annual savingsNo public renewal or expansion metrics

This is a partial public sample of Baseten deployments with quantified outcomes, not an exhaustive customer list.

[CU013, CU014, CU015, CU016, CU017, CU018]
FU002: Adoption / deployment funnel

The public adoption path starts with technical evaluation, moves into one production workload, and only later expands into dedicated compute, governance, and larger enterprise commitments.

[CU003, CU036, CU037, CU039, CU040, CU041]
FU003: Customer proof quality matrix

Public proof is strongest where Baseten and the customer disclose concrete performance outcomes; durability visibility is weakest across every disclosed account.

[CU014, CU016, CU017, CU018, CU019, CU021]

6.3 Durability, satisfaction, and expansion evidence

Durability evidence is directionally positive but incomplete. On the positive side, third-party reference aggregators show unusually strong public sentiment proxies: FeaturedCustomers reports a 4.8/5 reference score from 654 ratings, PeerSpot highlights collaboration and cost effectiveness, and individual customer quotes repeatedly describe Baseten as a winner on execution, uptime, or self-serve deployment. The product packaging also shows a plausible land-and-expand motion, from Basic pay-as-you-go usage into Pro dedicated compute and Enterprise self-hosted or region-restricted deployments. Healthcare and enterprise pages make that motion concrete by advertising HIPAA-sensitive workflows, single-tenant clusters, failover, and hands-on engineering support. The key limitation is that none of the reviewed public sources provide NRR, GRR, renewal cohorts, or contract duration, so expansion has to be inferred from packaging and customer quotes rather than measured account economics.[CU031, CU032, CU033, CU034, CU035, CU036]

Retention / repeat usage / satisfaction table
Metric / proxyValueSegmentConfidenceDiligence ask
Portfolio NRR / GRR / logo churnAll customersLowRequest cohort retention, gross and net revenue retention, and churn by segment
Contract duration / renewal cadenceAll customersLowRequest median contract length, renewal dates, and committed minimums by plan
Third-party reference score4.8/5 from 654 reference ratingsCross-customer public referencesMediumValidate how many ratings are current and attributable to post-2025 product surface
Qualitative review summaryPeerSpot highlights deployment speed, flexibility, and cost effectivenessCross-industry usersLow-mediumAsk for raw review count and more granular sentiment distribution
OpenEvidence testimonialVendor-vetting process ended with Baseten as a clear winnerHealthcare AIMediumAsk for renewal history and spend growth since migration
Speechify testimonialSpeechify says the partnership continues to grow and delivered highest uptime among inference providers it knowsVoice / TTSMediumAsk for uptime SLA, incident frequency, and contract-duration disclosure

Public durability evidence is testimonial-heavy and lacks cohort metrics, so satisfaction proxies should not be mistaken for audited retention data.

[CU031, CU032, CU033, CU034, CU035, CU046]

6.4 Concentration, switching, and competitive pressure

The main customer risk in the public record is not obvious churn but concentrated visibility. Baseten has more named accounts than just the six flagship stories, yet the quantified proof still sits inside a narrow band of AI-native software companies. That creates two diligence questions. First, there is no public disclosure of top-customer revenue share, contract lengths, or renewal rates, so investors cannot tell whether a handful of very large workloads dominate the book. Second, competitive pressure is real. HostFleet's April 2026 matrix shows Baseten as the most expensive listed option on multiple common GPUs, while Runpod's 2026 comparison ranks Baseten fifth and attributes materially faster headline cold starts to some rivals. WorkOS also describes a point where customers spending $10k-$50k per month start considering more control and lower-cost open-source options. Baseten counters that risk with open runtimes and no lock-in around customer models, but that also means portability tolerance will stay high. In other words, Baseten may be easier to adopt than a proprietary stack, but it must continuously re-win customers on operating performance and support rather than on captivity.[CU039, CU040, CU041, CU042, CU043, CU044]

Expansion and concentration risk table
Expansion driverConcentration risk / constraintImpactDiligence path
Basic → Pro → Enterprise packagingPublic plan ladder suggests upsell, but conversion rates are undisclosedSupports land-and-expand if customers outgrow self-serve inferenceRequest plan-mix by customer cohort and expansion rates from Basic into Pro/Enterprise
Enterprise controls and self-hosted deploymentCould bias sales toward fewer, larger technical buyers and service-heavy motionsHelpful for regulated and sensitive workloads, but may increase top-account concentration riskRequest top-10 customers by ARR, workload volume, and deployment mode
Healthcare-specific compliance postureVertical concentration could deepen if healthcare becomes the dominant expansion vectorRegulated workloads may be sticky if compliance and reliability prove durableRequest healthcare revenue share, customer count, and renewal history
Flagship case-study concentrationQuantified public proof is concentrated in six stories and mostly AI-native softwareInvestors cannot infer portfolio durability from a narrow reference setRequest anonymized cohort statistics for the broader customer base
Premium public pricingHigher published GPU prices and minimum deployment costs can increase switching pressurePricing could slow expansion for cost-sensitive workloads or make replatforming attractiveRequest win/loss and churn reasons on price-sensitive accounts
Open-source model portabilityCustomers can increasingly switch models and bring more infra in-house as spend risesBaseten must keep winning on speed, support, and economics rather than lock-inRequest data on customer tenure, self-host conversions, and workloads retained after optimization

The strongest expansion signals are packaging and customer quotes, while the strongest risk signals are public-proof concentration and competitive price pressure.

[CU036, CU037, CU038, CU039, CU040, CU041]
Chapter 07

07Risks

7.1 Legal and regulatory risk centers on compliance scope, customer contracting, and expanding AI rules

Baseten's public trust posture is strong on its face: the company says it is SOC 2 Type II certified, HIPAA compliant, GDPR aligned, and able to run region-restricted or self-hosted deployments for sensitive workloads. The risk is that this marketing posture narrows materially once the legal stack is read in full. Baseten's security docs say it does not store model inputs or outputs by default and can enforce compliance policies, while the healthcare and enterprise pages market HIPAA-compliant infrastructure for mission-critical workloads. But the DPA embedded in Baseten's public terms says customers must not submit PHI and other restricted data unless otherwise agreed in writing, and it leaves customers responsible for legal basis, notices, and many breach-notification duties. That does not prove a defect; it does mean the public website alone is not enough to underwrite regulated use. Regulatory pressure is also moving beyond privacy into AI-governance and procurement. The European Commission's AI policy page highlights AI Act implementation, sector guidance, codes of practice, and a service desk meant to help businesses comply. For an inference vendor selling into healthcare and other regulated enterprise workloads, that raises the probability of longer diligence cycles around residency, documentation, model governance, and shared-responsibility boundaries even if Baseten is not the final application-layer decision maker. The top legal risk is therefore compliance-scope ambiguity rather than a known enforcement action. [CR001, CR002, CR003, CR008, CR009, CR010]

Regulatory / legal risk register
RiskEvidenceLikelihoodSeverityMitigation maturityResidual exposureDiligence path
Healthcare compliance scope depends on signed overrides to the public Restricted Data carve-outHIPAA-compliant marketing sits beside DPA language that bars PHI unless otherwise agreed in writingMediumHighPartialHighRequest signed BAA/HIPAA addendum and the exact list of permitted PHI flows
EU AI Act and GDPR implementation can slow regulated enterprise salesEuropean Commission guidance emphasizes AI Act implementation support, guidance, and sector adoptionMediumHighEarlyMedium-HighReview EU legal memo, residency controls, DPIA templates, and audit artifacts
Subprocessor updates create short objection windows and termination as the main public remedyBaseten gives 15 days' notice and five days to object to new subprocessorsMediumMediumBasicMediumReview negotiated subprocessor notice rights and change-control process
Customer-side legal-basis and breach duties can increase deployment frictionThe DPA leaves customers with lawful-basis, notices, and many notification obligationsMediumMediumBasicMediumMap the shared-responsibility matrix before production launch
Mission-critical positioning can be undermined by default contract languageTerms exclude time-critical or mission-critical use even as product pages market mission-critical inferenceMediumHighLowHighNegotiate custom SLA and carve-out language for critical workloads

Rows reflect publicly observable legal and regulatory risks as of 2026-05-30; severity ranks investment relevance rather than legal advice.

[CR003, CR008, CR009, CR010, CR011, CR012]

7.2 Operational and security risk is defined by contract scope and visible incidents, not only uptime marketing

Baseten markets reliability aggressively. The enterprise, healthcare, dedicated inference, frontier gateway, and model API pages all promise four nines or 99.99% uptime, active-active redundancy, or highly reliable multi-cloud operations. The published SLA is narrower. It applies only to Dedicated Inference when Baseten is the hosting party and sets a 99.9% monthly availability target, with credits capped at 40% of monthly fees and recoverable only if the customer files within 24 hours. The terms then go further by stating the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. That creates a real diligence gap between product positioning and default contractual protection. Baseten's status page also shows this is not theoretical. The service reported multiple May 2026 incidents, including ongoing investigations, identified fixes, monitoring updates, and a major-outage marker in the 90-day view. Third-party monitoring adds only partial comfort because Servicealert says detailed incident data is unavailable and relies on reachability snapshots rather than full root-cause reporting. Baseten has shipped useful mitigations such as rolling deployments and deployment-health tooling, but reliability risk still belongs near the top of the chapter because the platform is sold into production workflows whose customers are highly sensitive to downtime and latency regression. [CR013, CR014, CR015, CR016, CR017, CR018]

Operational / quality / security risk register
Failure modeLikelihoodSeverityMitigation maturityResidual exposureUnresolved gap
Reliability marketing exceeds the default contractual SLAHighHighPartialHighNeed current custom SLA examples and service-credit terms for real enterprise deals
Control-plane or inference incidents recur during rapid product expansionMedium-HighHighPartialMedium-HighNeed postmortems, Sev1 frequency, and MTTR data beyond the public status page
Outage credits are operationally weak because claims must be filed within 24 hours and are cappedMediumMediumLowMediumNeed evidence that enterprise customers negotiate broader remedies
Compliance-policy or residency changes require Baseten support interventionMediumMediumPartialMediumNeed admin screenshots and change-control workflow evidence
Deployment regressions still occur despite new rollout controlsMediumMediumMediumMediumNeed adoption data for rolling deployments and rollback success rates

Residual exposure stays elevated because the public SLA and status data do not reveal customer-specific remedies, postmortems, or negotiated reliability terms.

[CR013, CR014, CR015, CR016, CR017, CR018]
FR002: Risk transmission map

Operational, contractual, and pricing risks flow directly into trust, sales velocity, margin, and valuation.

[CR017, CR019, CR023, CR026, CR027, CR034]

7.3 Dependency and commercial-model risk comes from upstream capacity, vendor chain complexity, and premium pricing

Baseten's core strategic answer to infrastructure risk is multi-cloud capacity management, cross-cloud autoscaling, single-tenant options, and the ability to run in the customer's cloud. Those are meaningful mitigations, but they do not eliminate dependence on cloud partners, GPU availability, and a long tail of third-party services. Nudge Security's public profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other vendors in Baseten's visible supply chain. Baseten's own product pages further promise access to the latest-generation GPUs, elastic capacity, and priority compute. That means upstream capacity and pricing remain foundational dependencies whether the platform presents them as a single managed surface or not. The commercial risk is that Baseten appears expensive relative to public peers. HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple GPU classes, while Runpod's 2026 comparison gives Baseten premium pricing and slower cold-start ranges than some alternatives. Baseten may still justify that premium through enterprise controls, support, and performance tuning, but if customers can reproduce acceptable latency and uptime on materially cheaper infrastructure, margin and win-rate risk rise quickly. This is why price-performance and supplier concentration belong alongside classic vendor-risk questions. [CR021, CR022, CR023, CR024, CR025, CR026]

Partner / dependency risk register
DependencyCounterparty or surfaceRoleConcentrationFailure scenarioSeverityMitigationResidual exposure
Latest-generation GPU capacityCloud/GPU suppliersPowers premium inference and autoscaling promisesUnknown externallyCapacity tightens or prices rise, compressing margins and slowing onboardingHighMulti-cloud routing and hybrid/self-host optionsHigh
Visible SaaS control-plane vendorsAWS, Vercel, Statuspage, SendGrid, Stripe, Segment, othersSupport web, billing, monitoring, messaging, and operationsMediumA single third-party outage degrades customer experience or internal opsMediumVendor diversification and customer-cloud optionsMedium
Premium price positionRunpod / Modal / Fal.ai / Replicate set public comparison setShapes customer willingness to payHighPeers offer acceptable performance at materially lower entry costHighSell observability, support, security, and enterprise controlsHigh
Service-heavy enterprise deliveryForward-deployed engineers and support teamsCustomizes deployments to hit latency, throughput, and compliance goalsMediumSupport load scales faster than product automationMedium-HighStandardize playbooks and self-service toolingMedium-High
Status visibilityBaseten status page plus incomplete third-party monitoringPrimary external signal for uptime eventsHighPublic signals understate incident depth or recurrenceMediumRequest internal reliability dashboards and postmortemsMedium

The company clearly mitigates concentration better than a single-cloud provider, but public evidence still leaves supplier mix and reserved-capacity exposure opaque.

[CR021, CR022, CR023, CR024, CR025, CR026]
FR003: Dependency map

Baseten's risk surface depends on cloud and GPU partners, third-party SaaS vendors, enterprise controls, and customer-specific contracting.

[CR021, CR022, CR023, CR029, CR037, CR042]

7.4 Execution and financial-model risk rise with product sprawl, rapid scaling, and valuation pressure

Baseten is no longer a narrowly scoped model-hosting startup. Public materials now cover Model APIs, Dedicated Inference, Frontier Gateway, model management, custom servers, chains, Training Jobs, and an early-access RL product called Loops. At the same time, the company raised a $300M Series E at a $5B valuation in January 2026 after multiple fundraises in the prior year. That capital lowers near-term financing risk, but it also raises the execution bar: investors and customers will expect the company to convert premium infrastructure positioning into repeatable enterprise growth without losing reliability or gross-margin discipline. The execution burden is amplified by staffing and go-to-market posture. Tracxn shows rapid employee growth, while Baseten's own site repeatedly emphasizes hands-on engineering support, forward-deployed expertise, custom SLAs, and deployment customization. That can be a differentiator for early enterprise expansion, but it also suggests a service-heavy operating model that may be hard to scale cleanly if product complexity, support demands, and customer-specific security asks keep expanding in parallel. The top people and execution question is therefore not whether Baseten has talent; it is whether the organization can maintain product quality, customer responsiveness, and economic discipline while broadening scope this quickly. External capital markets reinforce that risk: Modal said it raised $355 million at a $4.65 billion valuation after surpassing roughly $300 million in annualized revenue, Fireworks AI is already operating at roughly $315 million in annualized revenue on Sacra's estimates, RunPod shows that much lower-capital rivals can still compete on price, and CoreWeave's nearly $100 billion backlog shows how much well-financed infrastructure is chasing the inference opportunity.[CR030, CR031, CR032, CR033, CR034, CR035]

People / execution risk register
Role or functionDependency or gapLikelihoodSeverityMitigationDiligence path
Executive team and operating cadenceMust translate rapid fundraising into durable enterprise execution at a $5B valuationMediumHighFresh capital and marquee customersReview 2026 plan, hiring targets, and product roadmap sequencing
Product and engineeringMultiple product lines are expanding simultaneously, including an early-access training layerMedium-HighHighShared inference stack and toolingAsk for GA readiness criteria, bug backlog, and reliability ownership map
Customer success and supportHands-on engineering support may be required to justify premium pricingMediumMedium-HighForward-deployed expertise and Enterprise plan packagingRequest support ratios, escalation SLAs, and reference calls
Sales and security procurementEnterprise controls are plan-gated and can require custom contractingMediumMediumSSO/SCIM, self-hosting, custom SLAsReview average sales cycle, security-review conversion, and healthcare close rates

This register emphasizes execution scalability rather than founder-quality judgments; the central question is whether Baseten can widen scope without turning every major account into a custom services project.

[CR030, CR031, CR032, CR033, CR034, CR035]
FR001: Risk heatmap

The highest residual risks are compliance-scope ambiguity, reliability contract gaps, premium pricing, and execution sprawl.

[CR010, CR019, CR023, CR026, CR034, CR041]

7.5 Public mitigations are meaningful, but the investment case should remain gated by explicit kill criteria

Baseten does have tangible mitigations. Self-hosted and hybrid deployments reduce lock-in and residency risk, Truss improves portability, rolling deployments reduce cutover risk, SSO/SCIM strengthens access control, and the billing usage API gives customers a better handle on spend. Those are not superficial website badges; they are concrete product and operational features that can shrink risk if they are fully implemented in production accounts. Even the legal and regulatory risk is manageable if diligence confirms a clean BAA path, acceptable subprocessor controls, and contract terms that match the criticality of the workload. The key is to keep the underwriting discipline explicit. The thesis should break if Baseten cannot close the gap between public reliability marketing and executable SLA language, if price-performance remains far above peers without a quantified enterprise ROI offset, if supplier concentration turns out to be tighter than the multi-cloud story suggests, or if support-heavy sales prove necessary just to preserve baseline account health. The right posture is therefore conditional conviction: Baseten has credible mitigations, but the remaining evidence gaps are material enough that they should be converted into monitored triggers before a high-confidence investment view is allowed to stand. [CR020, CR029, CR037, CR038, CR039, CR040]

Mitigation and kill criteria table
RiskMonitorable triggerThreshold or eventAction implication
Healthcare compliance ambiguitySigned BAA / HIPAA addendum availabilityNo executable BAA path or unclear PHI boundary for a top healthcare opportunityPause regulated-healthcare underwriting until contract evidence is produced
Reliability gapSev1 incident cadence and SLA termsTwo or more material incidents in 90 days or no negotiated remedy beyond the public SLAReduce conviction, require pricing holdback, or stop the process
Price competitivenessPeer GPU economics and customer ROIBaseten remains >2x public peer pricing on target GPU classes without quantified enterprise ROI offsetDowngrade margin and win-rate assumptions
Execution sprawlGA readiness and support loadTraining/Loops, gateway, and dedicated deployments all require heavy custom support to stay healthyTreat the model as services-heavy rather than software-like
Supplier concentrationCapacity-booking evidenceReserved capacity or supplier diversification is materially weaker than the multi-cloud narrative impliesIncrease dependency discount and require contingency planning

Kill criteria are deliberately monitorable so they can be tied to diligence outputs rather than generic caution.

[CR019, CR026, CR027, CR029, CR036, CR037]
Chapter 08

08Valuation

8.1 Recommendation: track the company at the closed price, but do not chase momentum pricing

Baseten looks like a high-quality company operating in one of the best parts of the AI stack, but the investment call is still constrained by what is and is not publicly provable. The cleanest valuation anchor is the closed January 2026 Series E: $300 million raised at a $5 billion valuation. That mark is real, recent, and corroborated across Baseten’s own announcement, Business Wire, Tech Funding News, and Tracxn. The harder question is whether public evidence supports treating that mark as attractive, fair, or already stretched. The answer is conditional rather than categorical. Sacra’s $600 million annualized-revenue estimate makes the closed round look plausibly supportable at about 8.3x implied revenue, especially against a comp set that includes MongoDB-like low-double-digit public multiples and Modal or Fireworks private multiples in the low-to-mid teens. But that same public record also shows how fragile the case is: Baseten’s disclosed pricing is premium, third-party pricing matrices say it is expensive on paper, and the mooted $11 billion follow-on would leap straight into premium-software territory without corresponding primary financial disclosure. That is not a setup for a clean buy call. The right posture is therefore track with medium confidence, high risk rating, and a stretched valuation stance. The company is worth staying close to because the market is real and the product is differentiated, but investors should insist on diligence that converts private enthusiasm into evidence before underwriting a richer entry price.[CV001, CV002, CV007, CV008, CV009, CV035]

Recommendation summary table
RecommendationConfidenceRisk ratingValuation stanceDecision implication
trackMediumHighStretchedStay engaged at the closed $5B mark only with strict diligence; do not underwrite a step-up round on public evidence alone.

The call is explicitly price-sensitive: it separates the closed Series E anchor from any hotter follow-on narrative.

[CV001, CV007, CV035, CV042, CV046]
FV001: Recommendation logic

The current call stays at track because strong category and product signals are offset by revenue-opacity and valuation-step-up risk.

This is a reasoning map, not a weighted scoring model.

[CV001, CV030, CV031, CV042, CV046, CV047]

8.2 The price is only defensible if revenue quality is real and premium pricing survives competition

The bull case starts with speed and timing. Baseten’s financing path moved from a $75 million Series C in February 2025 to a $150 million Series D at $2.15 billion in September 2025 and then to a $5 billion Series E in January 2026. Sacra’s estimate of $600 million in annualized revenue and Baseten’s own 100x inference-volume claim are directionally consistent with a company that hit a steep adoption curve just as inference moved from prototype infrastructure into production infrastructure. Public market and private comp work then gives that growth some context: Modal’s May 2026 round was struck at about 15.5x annualized revenue, Fireworks’ closed valuation works out near 12.7x, and MongoDB-like public infrastructure treatment is still around 10x. The anti-thesis is just as important. Baseten’s pricing page, HostFleet’s matrix, and Runpod’s comparison all point to a premium-priced service layer rather than a commodity endpoint vendor. That can be a strength if the premium buys uptime, support, compliance, and hybrid flexibility. It can also cap the multiple if the business is more support-heavy and lower-margin than premium public software names. Hyperscalers make the downside sharper: AWS, Google, and Azure can bundle model access, compute, governance, and credits inside broader cloud relationships. In other words, Baseten may deserve a premium to raw AI cloud, but it has not yet earned the disclosure quality that would let investors pay Cloudflare-like or Datadog-like premiums with confidence.[CV003, CV004, CV005, CV006, CV010, CV011]

Thesis / anti-thesis table
ArgumentWhat supports itWhat would change the view
Inference is becoming a production budget, not an experiment budget.Technavio and Mordor both show large, fast-growing AI deployment markets, while Baseten’s financing pace shows capital chasing the theme.Evidence of slower adoption, weaker enterprise conversion, or shrinking model-serving budgets would reduce the premium case.
Baseten may deserve a premium because it sells performance, hybrid control, and support rather than bare GPU time.Baseten markets custom SLAs, self-hosting, priority GPUs, cross-cloud scale, and forward-deployed engineers.If customers can replicate acceptable latency and uptime on cheaper alternatives, the premium becomes a liability rather than a moat.
The closed $5B round is plausible if the private revenue estimate is roughly right.Sacra’s $600M run-rate estimate implies about 8.3x revenue, below many premium software comps.Primary finance data that lands far below the estimate would break this support quickly.
The $11B narrative is not yet underwritten by public facts.The only public support is third-party reporting of talks, not a closed financing or disclosed fundamentals.A signed term sheet or closed announcement with clean terms and corroborated financials would materially improve the case.

The table separates company quality from valuation quality; both have to work for a buy call.

[CV007, CV008, CV010, CV011, CV015, CV030]
FV002: Valuation sensitivity

The revenue required to justify a $5B valuation changes sharply depending on which comparable multiple investors anchor to.

Each bar divides the $5B Series E mark by a selected comparable multiple; values are support thresholds, not forecasts.

[CV017, CV020, CV022, CV024, CV027, CV029]

8.3 Comparable work and scenarios put $5B inside the base case but leave little room for error

The comparable set matters because Baseten sits between two valuation regimes. On one side are AI-cloud and infrastructure names like CoreWeave, where capital intensity is visible and public multiples are lower. On the other are premium infrastructure-software names like Datadog and Cloudflare, where disclosure quality, margins, and platform breadth allow much richer trading. Baseten’s best public revenue estimate places the closed $5 billion round between those regimes rather than squarely in either one. That is why the closed round is arguable while the rumored $11 billion step-up is not yet underwritten by public evidence. Scenario work makes the same point more concretely. A bear case that assumes only $300 million to $400 million of durable revenue support and a 7x to 9x multiple points to material down-round risk. A base case that uses $500 million to $650 million and 8x to 12x supports roughly $4 billion to $7.8 billion, which comfortably contains the closed Series E. A bull case requires both stronger revenue continuation and a multiple closer to Modal-like or upper private-market inference treatment. That means the current investment debate is not whether Baseten is interesting; it is whether an investor is being paid for the uncertainty between a defensible $5 billion mark and an aspirational $11 billion narrative.[CV016, CV017, CV018, CV019, CV020, CV021]

Bull / base / bear scenario table
ScenarioAssumptionsValuation / return logicKey risksProbability signal
BearDurable revenue support is only $300M-$400M, premium pricing erodes, and independent vendor economics look more like infrastructure than software.$2.1B-$3.6B using a 7x-9x range; the current mark would be vulnerable to a reset.Price pressure, lower margin quality, or slower enterprise expansion expose down-round risk.This case rises if diligence shows support-heavy delivery, concentrated revenue, or weak unit economics.
BaseRevenue support lands around $500M-$650M, growth remains strong, and Baseten keeps a moderate premium to raw AI cloud.$4.0B-$7.8B using 8x-12x; the closed $5B Series E sits inside this band.The call still depends on validating revenue quality and gross margin, not just topline growth.This is the most defensible case on today’s public evidence.
BullRevenue support reaches $700M-$900M, premium economics hold, and investors keep paying Modal-like or better private inference multiples.$8.4B-$14.4B using 12x-16x; an $11B step-up becomes possible.The upside depends on sustained hypergrowth and a premium-quality margin/disclosure profile that is not public yet.This case requires more proof than the current public record supplies.

Ranges are scenario outputs, not point estimates, and are designed to show how quickly the underwriting shifts when revenue support or multiple choice changes.

[CV035, CV036, CV043, CV044, CV045]
Comparable valuation table
ComparableValuation contextRevenue contextImplied multipleRelevance to BasetenLimitation
Baseten closed Series E $5.0B post-money (Jan 2026)~$600M annualized revenue estimate~8.3xDirect anchor for the current underwriting debate.Revenue support is third-party-estimated, not company-disclosed.
Modal $4.65B post-money (May 2026)~$300M annualized revenue~15.5xClosest premium private comp for elastic inference infrastructure.Not the same mix of enterprise support, compliance, or customer base.
Fireworks AI $4.0B post-money (Oct 2025 closed)~$315M annualized revenue estimate in Feb 2026~12.7xRelevant private inference comp with explicit gross-margin discussion.Revenue and margin are also third-party estimates, not audited disclosure.
CoreWeave $59.75B market cap (May 2026)~$12.5B 2026 revenue context / guide midpoint proxy~4.8xUseful pure-play AI cloud floor for capital-intensive infrastructure.Much larger scale, debt profile, and business model than Baseten.
Datadog $88.04B market cap (May 2026)~$4.32B 2026 revenue guide midpoint~20.4xPremium public infrastructure-software benchmark for disclosed growth quality.Observability software carries better margin and disclosure quality than Baseten’s public record.
Cloudflare $85.47B market cap (May 2026)~$2.33B trailing revenue~36.7xUpper-bound developer-platform multiple reference.Category leadership and public-company maturity are far stronger than Baseten’s today.
MongoDB $27.01B market cap (May 2026)~$2.60B trailing revenue~10.4xLower-middle public infrastructure-software reference.Database economics and installed base are not the same as inference infrastructure.

This is a partial but intentionally broad sample spanning private inference peers, AI cloud, and public infrastructure software to bracket what the market could plausibly pay.

[CV016, CV017, CV019, CV020, CV021, CV022]
FV003: Valuation / return range

Public evidence places the closed $5B round inside the base case, while a rumored $11B round requires bull-case assumptions.

Scenario bands combine revenue-support ranges and comp-multiple bands; the rumored follow-on is shown as an external signal, not an endorsed anchor.

[CV008, CV043, CV044, CV045]
FV004: Investment KPIs

Baseten scores well on market tailwind and product differentiation, but much lower on disclosure quality and margin certainty.

Scores are directional IC-style judgments based only on retained public evidence as of the run date.

[CV025, CV030, CV031, CV040, CV041, CV042]

8.4 The thesis should be gated by terms, margin evidence, and concentration—not by enthusiasm alone

The final call hinges on a small number of diligence items that can move valuation quickly. First, investors need management-grade revenue evidence. If the company is truly around a $600 million annualized run-rate with strong expansion and acceptable concentration, the closed $5 billion round starts to look reasonable and the next mark becomes debatable rather than fanciful. If the real number is materially lower, the same pricing and comp work flips from “defensible premium” to “overextended late-stage mark.” Second, investors need direct margin data. Fireworks’ roughly 50% gross margin is a useful peer reminder that inference businesses are not pure software. Baseten’s premium pricing only deserves a premium multiple if utilization, support load, and reserved-capacity economics produce better margin quality than that analogy implies. Third, investors need the terms underneath the headline price. Preference overhang, secondaries, and customer concentration can matter more than the post-money headline. This is why the right kill triggers are practical rather than rhetorical: if Baseten cannot preserve premium price-performance with acceptable margin, if growth slips below the base-case band, or if any new round clears only with aggressive terms, the thesis should be downgraded quickly. Until those questions are closed, the company deserves active tracking and structured diligence rather than a high-conviction price-insensitive buy decision.[CV023, CV025, CV032, CV033, CV034, CV039]

Thesis-break and kill triggers table
TriggerThreshold / signalTransmission to thesisAction implication
Revenue proof breaksManagement-grade run-rate lands materially below $500M or growth decelerates sharply from the public narrative.The closed $5B round falls out of the base-case band and starts looking like a late-cycle mark.Downgrade the recommendation and reset valuation work to the bear-case range.
Margin quality disappointsGross margin and utilization look closer to infrastructure resale economics than premium software economics.Premium software multiples no longer fit the business model.Apply a lower comp set and require a meaningfully better entry price.
Price-performance edge erodesCustomers can achieve acceptable production results on cheaper alternatives or bundled hyperscaler products.Baseten’s premium pricing turns from moat into adoption friction.Cut conviction and revisit long-term market-share assumptions.
Aggressive financing terms appearA new round clears only with heavy preferences, secondary-heavy structures, or unusual protections.Headline valuation stops mapping cleanly to common-equity upside.Treat the mark as structurally weaker and rework return expectations.
Concentration emergesA handful of AI-native accounts drive an outsized share of revenue without corresponding retention evidence.The company’s revenue quality and durability become much less attractive than the headline growth rate.Pause any high-conviction call until concentration and expansion data are clarified.

These triggers are designed to be monitorable and directly tied to valuation support, not generic operating caution.

[CV012, CV013, CV014, CV039, CV042, CV043]
Final diligence asks table
TopicMissing evidenceWhy it mattersOwner or diligence path
Revenue bridgeMonthly and quarterly revenue, ARR, and cohort expansion through the current quarter.This is the single biggest determinant of whether $5B is fair or already stretched.CFO / finance package and board materials.
Gross margin and utilizationGross margin by product surface, GPU utilization, reserved-capacity mix, and support burden.The multiple only deserves to converge toward premium software if margin quality is real.Infra + finance deep dive with product-line detail.
Cap table and termsFully diluted share count, liquidation preferences, secondaries, and any structured terms.Headline post-money can overstate common-equity upside.Legal + finance review of latest financing docs.
Customer concentration and retentionTop-customer exposure, NRR, logo retention, and vertical mix across AI-native and enterprise accounts.A premium multiple is fragile if spend is concentrated or non-repeatable.Sales ops and customer-success cohort review.
Step-up financing evidenceSigned term sheet or closed-round proof for any valuation above $5B.A rumored mark should not replace a closed anchor in underwriting.Board process review and direct financing confirmation.

The asks are ranked by how quickly they can change valuation support rather than by general company importance.

[CV025, CV037, CV046, CV047, CV048]

Disclaimer

This report was produced by an automated research workflow using publicly available information as of 2026-05-30. It is not investment advice. Private-company data may be incomplete, stale, or estimated, and investors should supplement this report with management diligence, contractual review, and direct access to financial materials before making any investment decision.

Evidence index

Claims
IDStatementConfidenceSources
CO001 Baseten was founded in 2019. High SO009, SO016, SO018
CO002 Baseten is based in San Francisco. High SO008, SO014, SO016
CO003 Baseten’s legal entity is Baseten Labs, Inc., and its privacy policy lists 201 Spear St, Suite 1600, San Francisco, CA 94105. High SO007, SO008
CO004 Baseten currently presents itself as an inference company built around high-performance production inference. High SO001, SO003, SO014
CO005 Official product surfaces show that Baseten combines production inference, model APIs, and training workflows in one platform. Medium SO001, SO005
CO006 Baseten sells cloud, self-hosted, and region-aware deployment options aimed at customers that need control over security or data residency. High SO003, SO004, SO005
CO007 Baseten says it is SOC 2 Type II and HIPAA compliant across its hosting options. High SO003, SO004, SO005
CO008 Baseten’s careers page, customer hub, and Series E press release name customers such as Abridge, Cursor, OpenEvidence, Speechify, Gamma, Clay, Notion, and Lovable. High SO006, SO002, SO014
CO009 The founders say they started Baseten at the end of 2019 to solve model-deployment and ML-infrastructure pain they had experienced themselves. Medium SO009
CO010 Tuhin Srivastava is publicly identified as CEO and co-founder. High SO031, SO015
CO011 Amir Haghighat is publicly identified as CTO and co-founder. High SO032, SO015
CO012 Phil Howes is publicly identified as a co-founder, and Tech Funding News describes him as chief scientist. Medium SO034, SO015
CO013 Pankaj Gupta is publicly identified as a co-founder. Medium SO033, SO015
CO014 Baseten’s Series E announcement is signed by Amir, Pankaj, Phil, and Tuhin, showing that all four founders still anchor the public leadership narrative. Medium SO013
CO015 Public governance visibility is limited in the fetched corpus, but Series D explicitly says Jay Simons joined Baseten’s board. Medium SO012, SO016
CO016 By the Series A milestone, Baseten said it had raised a little over $20 million across seed and Series A, with Greylock leading the Series A and South Park Commons, Lachy Groom, and Ray Tonsing also involved. Medium SO009
CO017 Tracxn and the archived PitchBook profile both place Baseten’s Series A on 2022-04-26. Medium SO016, SO018
CO018 Baseten’s Series B added $40 million led by IVP and Spark, with Greylock, South Park Commons, Lachy Groom, and Base Case also participating. Medium SO010
CO019 Tracxn records the Series B round date as 2024-03-04. Medium SO016
CO020 Baseten’s Series C raised $75 million on 2025-02-19 and was backed by IVP, Spark, Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, and Dick Costolo. High SO011, SO016
CO021 The Series C post says Baseten was already running workloads across thousands of GPUs and serving millions of end customers worldwide by early 2025. Medium SO011
CO022 Baseten’s Series D raised $150 million, was led by BOND, and brought CapitalG and Conviction into the round alongside prior investors. High SO012, SO016
CO023 CB Insights and Tracxn both peg Baseten’s September 2025 valuation at about $2.15 billion. Medium SO017, SO016
CO024 Series D linked financing to governance by adding Jay Simons to the board. Medium SO012
CO025 Baseten’s Series E raised $300 million at a $5 billion valuation, led by IVP and CapitalG with NVIDIA and several prior investors participating. High SO014, SO013, SO015
CO026 Tracxn and CB Insights both list the Series E closing date as 2026-01-20. Medium SO016, SO017
CO027 BusinessWire says Baseten has raised $585 million to date and that the Series E financing was its third fundraise in the prior year. High SO014, SO016, SO017
CO028 BusinessWire describes Baseten as infrastructure behind AI products including Cursor, Mercor, Clay, OpenEvidence, Lovable, and Abridge. Medium SO014
CO029 Official enterprise and healthcare pages market four-nines reliability, multi-cloud autoscaling, and region-restricted or self-hosted deployment options for sensitive workloads. High SO003, SO004, SO001
CO030 NVIDIA’s case study says Baseten reduced cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. Medium SO027
CO031 OpenEvidence’s case study says Baseten now serves billions of requests per week for OpenEvidence and reduced end-to-end latency from over 700 milliseconds to 160 milliseconds. Medium SO023
CO032 Gamma says it generates roughly 3 million images per day on Baseten for 70+ million users and more than $100 million of ARR. Medium SO024
CO033 Speechify says Baseten cut its cost per million characters by 44% while supporting a 60M+ user base. Medium SO025
CO034 Patreon says Baseten saved more than 440 developer hours per year and cut GPU costs by 70% for Whisper-based workloads. Medium SO026
CO035 A January 2026 WorkOS interview says Baseten had just launched a startup program for seed and Series A companies and saw voice as an emerging modality. Medium SO028
CO036 Current headcount is not cleanly supportable because PitchBook and Tracxn show conflicting employee counts and entity-level staff figures. Low SO018, SO016
CO037 ServiceAlert’s 90-day monitor showed May 2026 at 100% uptime and zero days with issues, but it also says it only tracks daily worst-status reachability and lacks detailed incident data. Medium SO030
CO038 Nudge Security frames Baseten as a vendor-risk review target and lists security badges and SSO or MFA features, but it is an external aggregator rather than Baseten’s primary trust documentation. Medium SO029, SO007
CO039 Abridge describes itself as enterprise-grade AI for clinical conversations used by large healthcare systems, consistent with Baseten’s official claim that healthcare AI is a core customer segment. Medium SO019, SO006
CO040 Cursor says it is trusted by over half of the Fortune 500, supporting Baseten’s official claim to serve category-defining AI applications rather than only hobby use cases. Medium SO021, SO014
CO041 Clay says more than 500,000 GTM teams use its platform, and Baseten’s Series E materials identify Clay as a customer. Medium SO020, SO014
CO042 OpenEvidence positions itself as America’s Official Medical Knowledge Platform with major medical-content partners, and Baseten names it as a customer in both careers and Series E materials. Medium SO022, SO006, SO014
CO043 External market-data sources classify Baseten as a private Series E company. Medium SO016, SO018
CO044 Baseten’s pricing page shows a pay-as-you-go model with token-priced model APIs and per-minute compute pricing for GPU and CPU instances. Medium SO005
CO045 Baseten’s terms define the service as a platform for deploying machine learning models and building or operating applications for machine learning through a web interface. Medium SO007
CO046 The Series A post says Baseten quietly announced its first product in May after more than 18 months of building and used the funding moment to launch a public beta. Medium SO009
CM001 Baseten positions itself as a platform for high-performance inference in production rather than as a foundation-model creator. Medium SM001, SM003, SM006
CM002 Baseten's product surface now spans Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training. Medium SM005, SM006, SM007, SM015, SM016
CM003 The included spend in Baseten's core market is model-serving runtime, autoscaling, observability, billing or metering, and associated performance support attached to production inference. Medium SM002, SM006, SM007, SM016, SM018
CM004 The excluded spend is frontier-model R&D and the broader data or analytics stack that hyperscaler AI suites bundle but Baseten does not foreground. Medium SM015, SM027, SM028, SM029
CM005 Baseten competes inside overlapping categories of AI inference-as-a-service, broader AI inference, and enterprise AI platform software rather than a single cleanly defined market. Medium SM019, SM020, SM021
CM006 Status-quo substitutes include hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. Medium SM022, SM023, SM024, SM027, SM028, SM029
CM007 Baseten's positioning emphasizes open-source and custom model deployment rather than ownership of closed frontier models. Medium SM005, SM006, SM014
CM008 Training is an adjacency for Baseten, but the commercial center of gravity remains inference and the train-to-deploy loop that feeds inference endpoints. Medium SM001, SM006, SM015
CM009 Technavio values the AI inference-as-a-service market at USD 85.25 billion in 2025. Medium SM019
CM010 Technavio forecasts a 22.1% CAGR for AI inference-as-a-service during 2026-2030. Medium SM019
CM011 Technavio says the GPU segment accounts for more than 58% of the AI inference-as-a-service market and that cloud deployment dominates the category. Medium SM019
CM012 Technavio says North America contributes 41.1% of forecast growth in AI inference-as-a-service. Medium SM019
CM013 Mordor Intelligence puts the enterprise AI market at USD 114.87 billion in 2026 and projects 18.91% CAGR through 2031. Medium SM020
CM014 Mordor says software and platforms led 65.89% of 2025 enterprise AI revenue. Medium SM020
CM015 Mordor says cloud solutions accounted for 67.33% of 2025 enterprise AI revenue while hybrid and edge configurations are the faster-growing deployment path. Medium SM020
CM016 Large enterprises accounted for 71.43% of 2025 enterprise AI spending in Mordor's market model. Medium SM020
CM017 Healthcare and life sciences are Mordor's fastest-growing enterprise AI vertical at 20.77% CAGR. Medium SM020
CM018 Fortune Business Insights values the broader AI inference market at USD 117.80 billion in 2026, up from USD 103.73 billion in 2025. Medium SM021
CM019 Fortune forecasts 12.98% CAGR to 2034 and says North America held 41.78% of the AI inference market in 2025. Medium SM021
CM020 Across public lenses, Baseten's addressable opportunity is clearly large but scope-sensitive: roughly USD 85 billion for inference-as-a-service today and more than USD 100 billion for adjacent inference or enterprise AI platform categories. Medium SM019, SM020, SM021
CM021 Baseten's best-evidenced buyer groups are performance-sensitive AI product teams, enterprise AI infrastructure teams, and model labs monetizing APIs. Medium SM003, SM010, SM016
CM022 Gamma shows a PLG or self-serve segment that values low latency and open-source model serving without building an internal ML infrastructure team. Medium SM012
CM023 OpenEvidence shows a regulated healthcare segment that wanted reliable performance, redundancy, and flexible compute without multi-year GPU commitments. Medium SM011
CM024 Writer shows enterprise model teams serving 70B models need multi-GPU performance engineering and secure deployments. Medium SM013
CM025 The daily users of Baseten are ML engineers, data scientists, and application engineers, while procurement, security, and IT administrators become stakeholders once deployments require identity, policy, or compliance controls. Medium SM009, SM014, SM017
CM026 Budget ownership appears to begin in product or engineering budgets for usage-based experimentation and shift toward central platform or IT budgets for quoted Pro, Enterprise, or self-hosted deployments. Medium SM002, SM003, SM017, SM018
CM027 Baseten's adoption path commonly starts with Model APIs or simple deployments and expands to Dedicated Inference and Chains as traffic, hardware specialization, or compound workflows grow. Medium SM001, SM005, SM006, SM007
CM028 Frontier Gateway creates a separate buyer motion for labs that need white-labeled APIs, rate limits, token metering, and billing without building their own inference control plane. Medium SM016
CM029 Baseten productizes compliance with HIPAA, SOC 2 Type II, region restrictions, dedicated namespaces, and a no-shared-GPU posture. Medium SM003, SM004, SM009
CM030 Baseten positions self-hosted, hybrid, and cloud deployments as ways to meet data residency, security, and existing cloud-commitment requirements. Medium SM002, SM003, SM006, SM008
CM031 Baseten's Model APIs are OpenAI-compatible and are marketed as 5-10x cheaper than closed alternatives. Medium SM005
CM032 Dedicated Inference is marketed as delivering 6x better GPU utilization and 5-10x lower costs at scale. Medium SM006
CM033 Chains is marketed as giving compound AI teams 6x better GPU usage and roughly half the latency through hardware-aware orchestration. Medium SM001, SM007
CM034 Baseten's value proposition is not just compute rental; customer stories repeatedly emphasize outsourced performance engineering and forward-deployed support. Medium SM003, SM011, SM012, SM013
CM035 OpenEvidence reported 78% lower latency, 6x faster deployments, and 8x-plus lower infrastructure maintenance time after moving to Baseten. Medium SM011
CM036 Gamma reported 30-80% faster image generation, 20% better efficiency, and scaling to 70+ million users and about 3 million images per day on Baseten. Medium SM012
CM037 Writer reported 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. Medium SM013
CM038 Enterprise AI growth is being driven by automation demand, exploding data volumes, cloud AI services, and specialized hardware advances. Medium SM020
CM039 AI inference demand is also being driven by real-time processing needs, generative AI workloads, and edge or IoT expansion. Medium SM019, SM021
CM040 Hardware supply constraints, high accelerator prices, and tariff pressure are material market constraints for both inference providers and buyers. Medium SM019, SM021
CM041 Talent shortages and legacy-system integration complexity remain major barriers to enterprise AI rollout. Medium SM020, SM021
CM042 Multi-cloud capacity management and the ability to avoid long-term GPU commitments address a real buyer pain point around capacity risk and demand spikes. Medium SM008, SM011, SM016
CM043 HostFleet's April 2026 matrix shows Baseten's published hourly GPU prices above Runpod, Modal, and Replicate on like-for-like SKUs such as T4, L4, A100, and H100. Medium SM026
CM044 Runpod's own 2026 comparison ranks Baseten fifth and attributes 8-12 second cold starts to Baseten while highlighting cheaper or faster specialist alternatives. Medium SM025
CM045 Hyperscaler substitutes bundle model deployment with broader data, notebook, governance, and agent tooling rather than pure inference specialization. Medium SM027, SM028, SM029
CM046 Baseten's clearest public beachheads are high-performance consumer AI products, regulated healthcare workloads, and model labs monetizing proprietary models. Medium SM011, SM012, SM013, SM016
CM047 Public pricing and packaging imply Baseten trades a higher headline GPU rate for bundled performance engineering, observability, security, and managed support. Medium SM002, SM003, SM016, SM026
CM048 Public sources do not isolate a clean Baseten-specific SAM or SOM because available estimates mix enterprise AI, inference infrastructure, cloud, edge, and model-serving categories. Medium SM019, SM020, SM021
CM049 Public material does not disclose Baseten's contract sizes, attach rates for support, or revenue mix across Model APIs, Dedicated Inference, Training, and Frontier Gateway. Medium SM002, SM016, SM018
CP001 Baseten's current product surface spans custom-model deployment, Model APIs, training, Chains orchestration, and Frontier Gateway. High SP003, SP005, SP006, SP007, SP010
CP002 Baseten supports Baseten Cloud, single-tenant or self-hosted deployments, and multi-cloud capacity or cross-cloud autoscaling. High SP001, SP003, SP004, SP009
CP003 Baseten's public pitch centers on speed, uptime, and developer experience instead of lowest-cost raw GPU capacity. High SP001, SP008, SP009, SP029
CP004 Modal positions as Python-first serverless AI infrastructure with instant autoscaling to 1000+ GPUs and built-in observability. High SP014, SP015
CP005 Replicate positions around one-line APIs, community-published models, fine-tuning, and custom deployment through Cog. High SP016, SP017
CP006 Runpod offers Pods, Serverless, and Clusters, emphasizing fast scaling, many GPU SKUs or regions, and low-cost capacity. High SP018, SP019, SP020
CP007 AWS SageMaker, Google Vertex AI, and Azure ML each market broader end-to-end ML or AI lifecycle tooling with strong enterprise controls. High SP021, SP023, SP024
CP008 Internal build remains a real substitute because Truss packages models portably and can narrow the software gap between local, self-managed, and hosted deployments. High SP011, SP003
CP009 Frontier Gateway lets model labs ship white-labeled APIs with per-user keys, rate limits, and metering, widening Baseten's competitor set to lab-facing platforms. Medium SP010, SP029
CP010 PitchBook and Tracxn independently name Modal and Replicate among Baseten's comparable competitors, supporting the direct-peer set beyond vendor marketing. High SP027, SP026
CP011 Baseten's public plans split into Basic pay-as-you-go, quote-driven Pro, and Enterprise, with priority compute, dedicated compute, self-host, and custom SLAs above Basic. High SP002, SP009
CP012 Baseten says customers do not pay for idle time and only pay while models are deploying, scaling, or processing on the platform. Medium SP002
CP013 Baseten advertises SOC 2 Type II, HIPAA compliance, and no default storage of model inputs or outputs. High SP002, SP004, SP009
CP014 Baseten's runtime layers open-source engines such as TensorRT-LLM, SGLang, vLLM, TGI, and TEI with custom optimizations like speculative decoding and KV-cache management. High SP003, SP008
CP015 Baseten Model APIs are OpenAI-compatible and can move from shared APIs to dedicated deployments on Baseten-managed hardware. High SP005, SP003
CP016 Modal's public pricing offers $30 per month in Starter credits, 10 GPU concurrency on Starter, and 50 GPU concurrency on Team at $250 per month plus compute. Medium SP015
CP017 Replicate private models usually bill for setup, idle, and active time on dedicated hardware, making always-warm custom deployments costlier than pure scale-to-zero billing. High SP017, SP016
CP018 Runpod Secure Cloud and Serverless publish materially lower raw GPU list prices than Baseten's public per-GPU pricing for comparable capacity tiers. Medium SP019, SP020, SP002, SP025
CP019 Runpod Serverless bills per second from worker start until full stop, with flex workers scaling to zero and active workers remaining on. High SP020, SP018
CP020 AWS Bedrock prices open-model access by provider or model and service tier, and its batch inference option is listed at 50% below on-demand pricing. Medium SP022
CP021 Google Vertex AI prices tools, compute, storage, and management fees separately rather than offering simple public list GPU rates. Medium SP023
CP022 Azure ML charges no standalone platform fee but bills the Azure services consumed around training, deployment, storage, and monitoring. Medium SP024
CP023 Baseten's multi-cloud and self-host options reduce buyer fear of cloud lock-in, but they also make it easier for customers to migrate away from Baseten later. High SP001, SP009, SP011
CP024 Baseten's public trust posture is stronger than most self-serve peers because it combines compliance claims with single-tenant and self-host deployment modes. High SP002, SP004, SP009
CP025 Hyperscalers retain the strongest distribution power because Bedrock or SageMaker, Vertex AI, and Azure ML sit inside existing identity, billing, and procurement relationships. High SP021, SP023, SP024
CP026 Modal narrows the enterprise gap with marketplace transacting, SSO, audit logs, and HIPAA on Enterprise, but its public package is still compute-led rather than inference-specific governance. Medium SP014, SP015
CP027 Replicate minimizes adoption friction for prototypes through community models and simple APIs, but its public materials disclose less enterprise control than Baseten's. Medium SP016, SP017, SP009
CP028 Runpod explicitly markets no lock-in, low cost, and fast scale, making it attractive to cost-sensitive teams comfortable assembling their own serving stack. High SP018, SP019, SP020
CP029 Open-source packaging via Truss and Cog plus raw GPU clouds make multi-homing structurally easier in this market than in closed-model or data-platform markets. High SP011, SP017, SP019
CP030 Baseten's expansion into training and lab-facing gateway products moves it from pure hosting into a broader AI infrastructure platform category. High SP006, SP010, SP029
CP031 Baseten's main moat is the integration of optimized runtimes, multi-cloud capacity, enterprise deployment modes, and hands-on engineering support rather than exclusive model ownership. High SP008, SP009, SP029
CP032 Truss can create developer pull and portability at the same time, so it is both a funnel asset and a limiter on hard lock-in. High SP011, SP003
CP033 HostFleet's April 2026 matrix shows Baseten as the highest published per-GPU-hour option among the compared serverless hosts on multiple common GPU tiers. Medium SP025, SP002, SP015, SP017, SP019
CP034 The same HostFleet comparison still argues Baseten is attractive for production workloads because Truss, observability, and support are tangible despite higher headline pricing. Medium SP025, SP002, SP003
CP035 Baseten's public status page reports 99.91% uptime for Model APIs over the displayed window and records multiple May 2026 incidents. Medium SP012
CP036 Servicealert's independent outage tracker also shows non-perfect recent availability for Baseten, reinforcing that reliability remains a diligence item. Medium SP013, SP012
CP037 Sacra identifies hyperscaler bundling and below-market pricing as the clearest external threat to independent inference platforms like Baseten. Medium SP028, SP021, SP023, SP024
CP038 Business Wire and TechFundingNews both frame Baseten's current strategic battleground as production inference infrastructure rather than frontier-model training ownership. Medium SP029, SP030
CP039 Business Wire says Baseten has raised $585 million and counts NVIDIA, IVP, and CapitalG among key investors, improving its staying power in a capital-intensive market. Medium SP029, SP028, SP026
CP040 Baseten's best-supported positioning is premium, production-grade open-model inference for teams that value performance, portability, and support more than lowest-cost GPU hours. High SP003, SP008, SP009, SP025, SP028
CI001 Baseten's public monetization surfaces span dedicated deployments, Model APIs, and Training. High SI001, SI003, SI004, SI007
CI002 Baseten's public plan structure is Basic at $0 per month pay-as-you-go, with Pro and Enterprise sold via quote. Medium SI001
CI003 Pro includes priority access to high-demand GPUs, dedicated compute, higher Model API rate limits, hands-on engineering expertise, dedicated Slack and Zoom support, and volume discounts. Medium SI001
CI004 Enterprise includes custom SLAs, self-host deployments, use of existing cloud commitments, full control over data residency, and advanced RBAC with Teams. Medium SI001, SI005
CI005 Baseten publishes Model API list pricing per 1 million tokens with separate columns for input, cached input, and output. High SI001, SI003
CI006 Dedicated deployments are billed only for compute used, down to the minute. Medium SI001
CI007 Baseten says customers do not pay for idle time, but do pay while a model is deploying, scaling up or down, or making predictions. Medium SI001
CI008 Baseten sells Training both as Loops early access and as generally available Training Jobs, with a direct train-to-deploy path into production inference. Medium SI004
CI009 Baseten's Terms state that fees are billed at the end of the month and payable within 30 days unless an Order says otherwise. Medium SI013
CI010 Baseten's Terms make the Order the binding commercial instrument, so enterprise economics can vary contract by contract even though list pricing is public. Medium SI013, SI005
CI011 The billing usage API returns separate dedicated_usage, training_usage, and model_apis_usage blocks with subtotals, credits used, totals, and daily breakdowns. Medium SI007
CI012 The model_apis_usage block reports model name plus input, output, and cached input token counts. Medium SI007
CI013 The dedicated_usage block reports billable resource metadata, minutes, subtotal, and inference request counts. Medium SI007
CI014 Baseten explicitly monetizes support and engineering help through Pro, Enterprise, and enterprise deployment offers. High SI001, SI002, SI005
CI015 Dedicated Inference claims Baseten regularly sees 6x better GPU utilization and 5-10x lower costs powered by its inference stack. Medium SI002
CI016 The Model APIs page claims Baseten can spend 5-10x less than closed alternatives when serving optimized frontier open models. Medium SI003
CI017 The Enterprise page frames Baseten's economic advantage as higher output and better GPU utilization from optimized runtimes rather than seat-based software pricing. Medium SI005
CI018 The Healthcare page says per-minute billing and scale-to-zero make GPU costs scale with active inference rather than idle overhead. Medium SI006
CI019 Writer reports 35% lower cost per million tokens, 60% higher tokens per second, and 23% lower time to first token on Baseten. Medium SI016
CI020 OpenEvidence reports 78% lower latency, 6x faster deployment processes, 8x+ lower infrastructure maintenance time, and flexible access to compute without multi-year contracts. Medium SI017
CI021 Speechify reports 44% lower cost per million characters, 30-50% lower p99 latency, and 4.5x faster replica startup after migrating to Baseten. Medium SI018
CI022 Superhuman reports 80% lower P95 latency and says Baseten freed multiple engineers from building and running inference infrastructure in-house. Medium SI019
CI023 Patreon reports 440+ hours of development time saved per year, $600,000 of resources saved per year, and 70% GPU-cost savings on Baseten. Medium SI020
CI024 Taken together, Baseten's customer proofs sell lower total production cost and faster deployment for serious workloads rather than the lowest raw GPU-hour list price. Medium SI016, SI017, SI018, SI019, SI020
CI025 HostFleet's April 2026 matrix shows Baseten priced above Runpod on every shared GPU SKU it lists, above Modal on the shared L4 and H100 rows, and below only Replicate's A100 custom deployment rate among the shared A100 prices shown. Medium SI027
CI026 HostFleet says Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. Medium SI027
CI027 Baseten raised $300 million at a $5 billion valuation in January 2026. High SI010, SI021, SI024, SI025
CI028 Business Wire says the January 2026 financing was Baseten's third fundraise in the prior year and brought total capital raised to $585 million. High SI021, SI024, SI025
CI029 Baseten's Series D was $150 million in September 2025. High SI009, SI024
CI030 Baseten's Series C was $75 million in February 2025. High SI008, SI024
CI031 Tracxn and CB Insights also show $585 million total funding and a $300 million Series E on January 20, 2026. Medium SI024, SI025
CI032 Baseten's Series E blog says inference volume grew 100x in the prior year. Medium SI010
CI033 Baseten's Series E materials say the new capital will fund speed, uptime, developer experience, team growth, and a broader infrastructure platform. High SI010, SI021
CI034 Tech Funding News says the new funding is expected to support hiring in engineering and customer service plus platform and integration expansion. Medium SI022
CI035 Sacra estimates Baseten reached $200 million annualized revenue in December 2025 and $600 million annualized revenue in March 2026. Low SI023
CI036 Sacra says Baseten monetizes either API consumption or GPU minutes and hours and uses multi-cloud capacity management across more than 15 cloud providers instead of owning GPU infrastructure. Medium SI023
CI037 PitchBook labeled Baseten as generating revenue by February 2025 and showed 73 employees in its April 2025 snapshot. Medium SI026
CI038 Tracxn lists 258 employees as of April 2026. Medium SI024
CI039 The jump from 73 employees in PitchBook's 2025 snapshot to 258 in Tracxn's April 2026 snapshot implies substantial operating-expense growth, but payroll and burn are undisclosed. Medium SI024, SI026
CI040 Baseten's status page shows Model APIs at 99.91% uptime over the displayed 90-day window and multiple incidents in May 2026, while the Dedicated Inference component shows 100.0% uptime over the same displayed window. Medium SI015
CI041 The Dedicated Inference SLA targets 99.9% monthly availability, caps service credits at 40% of monthly fees, and requires claims within 24 hours of downtime. Medium SI014
CI042 Baseten's privacy policy identifies the contracting entity as BaseTen Labs, Inc. Medium SI012
CI043 The SEC EDGAR entity landing page for CIK 0001850888 says there is no filings data for the organization, so there are no public SEC operating-company financial statements available from that page. Medium SI029
CI044 Mordor says cloud deployments were 67.33% of enterprise AI revenue in 2025 and hybrid and edge deployments are forecast to grow 19.53% CAGR through 2031. Medium SI028
CI045 Mordor says healthcare and life sciences are forecast to grow 20.77% CAGR through 2031. Medium SI028
CI046 Baseten's enterprise and healthcare pages align with that opportunity through self-host, cloud-commitment, data-residency, HIPAA, and SOC 2 positioning. Medium SI005, SI006, SI028
CI047 Baseten's public materials do not disclose cash balance, monthly burn, runway, gross margin, CAC, NRR, customer concentration, or revenue mix by product surface. Medium SI001, SI010, SI013, SI021, SI023, SI024, SI025, SI029
CI048 Sacra reports Baseten is in talks to raise capital at about an $11 billion post-money valuation, with some reported offers as high as $15 billion, but that is not a closed financing. Low SI023
CI049 Because Baseten appears asset-light on owned GPUs but premium-priced on raw list compute, margin quality likely depends on utilization, enterprise support attachment, and negotiated discounts rather than headline GPU rates alone. Medium SI005, SI023, SI027
CI050 The public evidence supports strong demand, pricing, and capital-access narratives, but a real underwriting decision still depends on private data for realized pricing, retention, gross margin, burn, and concentration. Medium SI021, SI023, SI024, SI025, SI027, SI029
CI051 Baseten's public customer proofs now span financial-services AI, coding copilots, voice dictation, and world-model workloads, indicating that production demand is diversified across several latency-sensitive categories rather than one single end market. Medium SI030, SI031, SI032, SI033, SI034
CI052 Hebbia said Baseten improved tokens per second 2.5x, improved time to first token 4x, and reduced inference cost by more than 10x versus its previous deployment. Medium SI030
CI053 Posit said Baseten delivered sub-200ms latency for its Next Edit Suggestions feature and let the team pay only for compute it actually used. Medium SI031
CI054 Wispr Flow said its end-to-end speech and Llama pipeline ran in under 700 milliseconds at p99 on Baseten and AWS, with scale-to-zero elasticity. Medium SI032
CI055 Zed said Baseten lowered p90 latency by 45% and increased throughput 3.6x versus its previous inference provider, supporting Baseten's claim that performance wins can displace incumbent infrastructure. Medium SI033
CE001 Baseten publicly presents a full-stack product surface spanning Truss-led custom deployment, Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training rather than a single hosting SKU. High SE001, SE002, SE005, SE007, SE008, SE009, SE010
CE002 Model APIs run on shared infrastructure with OpenAI and Anthropic API compatibility, while dedicated deployments let customers choose hardware, engines, and scaling for their own models. High SE006, SE007
CE003 Truss packages model serving logic, dependencies, weights, and GPU configuration so the same artifact behaves consistently in development and production. Medium SE025, SE027
CE004 Truss publicly supports vLLM, SGLang, TensorRT-LLM, transformers, diffusers, PyTorch, and TensorFlow. Medium SE025, SE027
CE005 For supported architectures, a config-only Truss deployment can compile a model with TensorRT-LLM and expose an OpenAI-compatible endpoint without custom Python model code. Medium SE004, SE025
CE006 Chains deploys Python-defined chainlets where each step can set its own hardware resources, software dependencies, and autoscaling settings. High SE002, SE009
CE007 Baseten's training surface has two public tracks: Training Jobs is GA and Loops is early access. Medium SE010
CE008 Loops is positioned as a training SDK whose checkpoints can promote directly into Dedicated Inference, making inference a first-class output of training. Medium SE010, SE026
CE009 Frontier Gateway adds a white-labeled API surface with key management, rate limits, metering, billing, and branded URLs for labs serving their own models to customers. High SE002, SE008
CE010 MCM is Baseten's infrastructure control plane for unifying GPUs across cloud providers and regions, provisioning resources, and rerouting workloads during capacity crunches or outages. High SE004, SE011
CE011 Baseten gives each deployment a dedicated model subdomain and keeps endpoint names stable across environment promotion. Medium SE004
CE012 Baseten's request-routing model parks requests during scale-to-zero cold starts and offers an async queue that prioritizes synchronous traffic when capacity is tight. Medium SE004
CE013 BDN mirrors model weights into Baseten-controlled storage and uses mirrored-origin, cluster, and node caches to make large-model cold starts faster after the first pull. High SE004, SE019
CE014 Baseten publicly documents runtime optimizations including TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured outputs, KV-cache optimization, and topology-aware parallelism. High SE002, SE013
CE015 Baseten offers Baseten Cloud, self-hosted, hybrid, single-tenant, and region-restricted deployment options for customers that need different control or residency models. High SE007, SE011, SE015
CE016 Regional environments require Baseten configuration and a different regional endpoint format to guarantee inference traffic stays inside the designated geography. Medium SE021
CE017 Baseten publicly claims SOC 2 Type II and HIPAA compliance across its cloud hosting surfaces. High SE014, SE015, SE016
CE018 Baseten says it does not store model inputs, outputs, or weights by default, except temporary storage for async inference and optional caching users enable. High SE014, SE015
CE019 Baseten's public security docs say the platform never shares GPUs across users, isolates each customer into a dedicated Kubernetes namespace, and uses Calico, Falco, and Gatekeeper around workload security. Medium SE014
CE020 Baseten added Enterprise SSO and SCIM in May 2026 with SAML 2.0 sign-in, SCIM 2.0 sync, just-in-time provisioning, automatic deprovisioning, and group-based role assignment. Medium SE017
CE021 Rolling deployments launched in March 2026 and introduced max_surge_percent and stabilization_time_seconds controls for gradual zero-downtime promotion. Medium SE018
CE022 The billing usage API launched in March 2026 and exposes daily spend breakdowns across Dedicated Inference, Training, and Model APIs. Medium SE020
CE023 The only reviewed public Baseten SLA is for Dedicated Inference at 99.9% monthly availability, while Baseten marketing elsewhere uses four-nines or 99.99 reliability language. High SE001, SE007, SE015, SE023
CE024 Baseten's public status page showed incidents on May 15, 16, 18, 19, 26, and 29, 2026 even though its summary cards displayed 100.0% uptime for Dedicated Inference and 99.91% for Model APIs over the visible 90-day window. Medium SE022
CE025 ServiceAlert's third-party reachability page listed May 2026 at 100% uptime but explicitly said detailed incident data is unavailable, limiting independent verification of Baseten outage quality. Medium SE030
CE026 Truss has a visible public developer surface through a GitHub repository, a PyPI package, and active May 2026 release activity. High SE025, SE026, SE027
CE027 The May 2026 Truss release stream emphasized Loops CLI features, training checkpoint views, deployment-log links, and inference-call behavior, which indicates active investment in the training-to-inference workflow. Medium SE026
CE028 Writer's Baseten case study says model-specific TensorRT-LLM engines delivered 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs. Medium SE028
CE029 OpenEvidence says Baseten reduced end-to-end latency from more than 700 milliseconds to 160 milliseconds and sped deployments 6x. Medium SE029
CE030 OpenEvidence also says Baseten now serves billions of requests per week for its medical workflow and reduced infrastructure maintenance time by more than 8x. Medium SE029
CE031 HostFleet's April 2026 pricing matrix shows Baseten posting higher public GPU-hour rates than Runpod and Modal on comparable L4, A100, and H100 instances. Medium SE016, SE031, SE032, SE033
CE032 Despite the higher published price points, HostFleet characterizes Truss, observability, and support as Baseten's tangible value-adds for startups running production inference. Medium SE031
CE033 Runpod and Modal market more aggressive zero-idle and cold-start language than Baseten, while Baseten emphasizes dedicated compute, managed performance engineering, and control. Medium SE005, SE031, SE032, SE033
CE034 Replicate's public product surface is simpler API-first model serving through Cog, whereas Baseten layers dedicated deployments, Chains, and Frontier Gateway on top of its packaging tool. Medium SE008, SE009, SE025, SE034
CE035 AWS SageMaker, Google Agent Platform, and Azure Machine Learning all span training, deployment, governance, and observability, so Baseten competes by offering a narrower inference-first abstraction rather than full hyperscaler platform breadth. Medium SE004, SE035, SE036, SE037
CE036 A third-party security profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, and other SaaS tools in Baseten's visible operational footprint. Medium SE038
CE037 Baseten's visible 2026 roadmap signal centered on trust and operating controls such as SSO/SCIM, rolling deployments, BDN, and billing instrumentation rather than entirely new product lines. Medium SE017, SE018, SE019, SE020
CE038 Public materials show uneven maturity within the training stack because Training Jobs is GA while Loops is still early access. Medium SE010
CE039 Public Baseten sources still leave unresolved product-tech gaps around benchmark methodology, exact regional-environment setup lead times, and roadmap priorities beyond the currently announced 2026 releases. Low SE004, SE021, SE028
CE040 Baseten's product-tech moat appears strongest for teams that value performance tuning, cross-cloud capacity, and engineering support more than lowest published unit price or hyperscaler breadth. Medium SE007, SE015, SE031, SE032, SE033, SE035, SE036, SE037
CU001 Baseten markets itself as a high-performance inference platform for teams shipping AI products in production. Medium SU001
CU002 Baseten's enterprise page targets mission-critical enterprise inference with secure, scalable, and controllable deployment options. Medium SU003
CU003 Baseten publicly packages Basic, Pro, and Enterprise plans around progressively heavier buyer needs, from pay-as-you-go deployments to self-hosted regulated environments. Medium SU005
CU004 Writer positions itself as an enterprise AI platform used by world-class enterprises. Medium SU008, SU015, SU016
CU005 OpenEvidence describes itself as a medical knowledge platform for clinicians and physicians. Medium SU009, SU014
CU006 Speechify says more than 55 million people use its voice AI productivity assistant. Medium SU010, SU018
CU007 Gamma says its AI tools create presentations, websites, and social content. Medium SU011, SU017
CU008 Superhuman positions itself as AI-enhanced mail, docs, and workflow software for knowledge workers. Medium SU012, SU020
CU009 Patreon says hundreds of thousands of creators use its platform to build direct fan communities and recurring businesses. Medium SU013, SU021
CU010 Business Wire names OpenEvidence, Abridge, Notion, Clay, and Mercor among Baseten customers. Medium SU007
CU011 WorkOS says Baseten powers AI workloads for Cursor, Notion, Clay, OpenEvidence, and Ambience. Medium SU023
CU012 Baseten says its inference volume grew 100x in the last year. Medium SU006
CU013 Baseten's customer-stories index spans speech, healthcare, coding, pharmaceutical search, and AI operations use cases. Medium SU002
CU014 OpenEvidence says Baseten now serves billions of requests per week for its medical-information product. Medium SU009
CU015 OpenEvidence says its product now works with a doctor in every state and zip code in America. Medium SU009
CU016 Speechify says its platform synthesizes more than 161 billion characters per month for 60M+ users. Medium SU010
CU017 Gamma says it generates more than 3 million images per day for more than 70 million users on Baseten. Medium SU011
CU018 Superhuman says Baseten runs dozens of custom embedding models that power core features in its product. Medium SU012
CU019 Patreon says Baseten saved 440+ hours of developer time and nearly $600k per year on its Whisper deployment. Medium SU013
CU020 FeaturedCustomers lists 13 case studies, 29 testimonials, 4 customer videos, and 654 reference ratings for Baseten. Medium SU024
CU021 Writer reports 60% higher tokens per second on Baseten for its domain-specific LLMs. Medium SU008
CU022 Writer reports 23% lower time to first token and 35% lower cost per million tokens on Baseten. Medium SU008
CU023 OpenEvidence reports latency falling from more than 700 milliseconds to 160 milliseconds on Baseten. Medium SU009
CU024 OpenEvidence reports 6x faster deployments and an 8x+ reduction in infrastructure maintenance time on Baseten. Medium SU009
CU025 Speechify reports a 44% lower cost per million characters on Baseten. Medium SU010
CU026 Speechify reports 30-50% lower p99 inference latency and 4.5x faster replica startup on Baseten. Medium SU010
CU027 Gamma reports 30%-80% faster image generation per model on Baseten. Medium SU011
CU028 Gamma reports 20% efficiency improvement while reducing replica count and supporting billions of generated images. Medium SU011
CU029 Superhuman reports an average 80% reduction in P95 latency across its embedding models on Baseten. Medium SU012
CU030 Patreon reports 70% GPU-cost savings and says Baseten was twice as cheap as the next cheapest solution for its Whisper workload. Medium SU013
CU031 FeaturedCustomers reports a 4.8 out of 5 reference-rating score for Baseten based on 654 ratings. Medium SU024
CU032 OpenEvidence says Baseten was a clear winner after the team spent weeks researching and vetting inference providers. Medium SU009
CU033 Speechify says Baseten delivered the highest uptime of any inference provider it knows. Medium SU010
CU034 Superhuman says it was able to self-serve 95% of what it needed on Baseten. Medium SU012
CU035 PeerSpot's review summary emphasizes Baseten's supportive environment, speed-to-deployment, flexibility, and cost effectiveness. Medium SU031
CU036 Baseten's pricing page shows a self-serve Basic plan, a Pro plan with dedicated compute and hands-on engineering, and an Enterprise plan with self-hosting and custom SLAs. Medium SU005
CU037 Baseten's enterprise page says Baseten Cloud offers single-tenant clusters and the self-hosted product can fail over to Baseten Cloud. Medium SU003
CU038 Baseten's healthcare page says the platform is SOC 2 Type II and HIPAA compliant, supports region-restricted deployments, and highlights OpenEvidence and Latent as healthcare cases. Medium SU004
CU039 WorkOS says customers often start thinking about controlling their own destiny once inference spending reaches roughly $10,000-$50,000 per month. Medium SU023
CU040 WorkOS says open-source models let companies switch to options that are faster, cheaper, more customizable, and more reliable at scale. Medium SU023
CU041 Business Wire says Baseten pitches open runtimes and no lock-in around customer models. Medium SU007
CU042 HostFleet says Baseten is the highest-priced listed provider in its April 2026 comparison for T4, L4, A10G, A100, and H100 where listed, and adds that Baseten has a minimum dedicated deployment cost. Medium SU026
CU043 Runpod ranks Baseten fifth in its 2026 serverless GPU comparison and characterizes it as per-minute, configurable-replica infrastructure with 8-12 second speed. Medium SU025
CU044 NVIDIA says Baseten cut cold starts from up to five minutes to 5-10 seconds using NVIDIA GPUs and TensorRT-LLM. Medium SU022
CU045 Publicly quantified proof is concentrated in six flagship case studies even though fundraising and interview materials name additional accounts. Medium SU002, SU007, SU023, SU024
CU046 Reviewed public customer materials do not disclose NRR, GRR, contract length, or top-customer revenue share. Medium SU002, SU003, SU005, SU006, SU007
CU047 Abridge sells enterprise-grade AI for clinical conversations trusted by the largest healthcare systems. Medium SU007, SU027
CU048 Clay says more than 500,000 GTM teams use its data-enrichment and workflow platform. Medium SU023, SU028
CU049 Cursor says it is trusted by over half of the Fortune 500 for AI-assisted software development. Medium SU023, SU029
CU050 Notion AI markets built-in agents, enterprise search, HIPAA-capable enterprise workflows, and zero-data-retention enterprise controls. Medium SU007, SU030
CU051 Mercor says it is organizing human intelligence to power the AI economy. Medium SU007, SU032
CU052 Publicly named strategic accounts extend Baseten beyond consumer applications into healthcare, GTM, coding, and enterprise productivity. Medium SU007, SU023, SU027, SU028, SU029, SU030, SU032
CU053 Public references skew toward AI-native software companies whose own products depend heavily on inference quality and latency. Medium SU002, SU007, SU008, SU009, SU010, SU011, SU012, SU013
CU054 Baseten's public customer-proof quality is high on outcome specificity for six flagship stories but low on disclosed renewal economics. Medium SU008, SU009, SU010, SU011, SU012, SU013, SU024
CU055 The public record supports land-and-expand potential from model experimentation into dedicated compute, multi-cloud scale, and self-hosted enterprise configurations. Medium SU003, SU005, SU009, SU010, SU012
CR001 Baseten says it maintains SOC 2 Type II certification and HIPAA compliance. High SR001, SR011, SR012
CR002 Baseten says it does not store model inputs or outputs by default, except async inputs are temporarily stored until processed. High SR001, SR011
CR003 Baseten says compliance policies are read-only for customers and must be changed through Baseten support. Medium SR001
CR004 Baseten offers self-hosted and single-tenant deployment options for sensitive workloads on higher-tier plans. High SR001, SR008, SR011, SR024
CR005 Baseten's terms incorporate a DPA and security measures that Baseten may update so long as overall protection is not materially decreased. Medium SR003
CR006 Baseten's DPA lets customers object to a new subprocessor within five calendar days after notice. Medium SR003
CR007 Baseten's DPA says it will notify customers without undue delay after discovering a personal-data breach affecting customer personal data, but customers remain responsible for their own notification obligations. Medium SR003
CR008 Baseten's DPA says customers must not provide PHI and other Restricted Data unless otherwise agreed upon with Baseten in writing. High SR003, SR029
CR009 HHS says a covered entity must obtain written satisfactory assurances before disclosing PHI to a business associate. High SR029, SR003
CR010 Baseten's healthcare positioning creates a diligence need to verify a signed BAA or similar written override before underwriting PHI workflows. High SR001, SR003, SR012, SR029
CR011 The European Commission says the AI Act is being implemented through guidance, codes of practice, and an AI Act Service Desk. Medium SR030
CR012 Because Baseten markets healthcare and regulated enterprise workloads, AI Act and GDPR implementation can lengthen security and legal review cycles even if Baseten is infrastructure rather than the end application. Medium SR011, SR012, SR030
CR013 Baseten's published SLA applies only to Dedicated Inference for which Baseten is the hosting party. High SR004, SR024
CR014 Baseten's published Dedicated Inference SLA targets 99.9% monthly availability. High SR004, SR024
CR015 Baseten's SLA caps service credits at 40% of monthly fees and requires customers to submit claims within 24 hours of unscheduled downtime. Medium SR004
CR016 Baseten's terms say the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. High SR003, SR004
CR017 Baseten's status page shows multiple May 2026 incidents, including investigation, identified-fix, monitoring, and major-outage markers in the 90-day view. Medium SR005
CR018 Servicealert says detailed incident data is not available for Baseten and that its history is based on reachability monitoring. Medium SR006
CR019 Baseten's public product pages market four nines or 99.99% uptime more broadly than the default 99.9% Dedicated Inference SLA. High SR004, SR011, SR012, SR020, SR023, SR024
CR020 Baseten shipped rolling deployments with gradual traffic shifting, pause, resume, and cancel controls as a mitigation against deployment-induced outages. Medium SR026, SR022
CR021 Baseten positions its reliability story around multi-cloud, multi-region autoscaling and hybrid deployment options rather than a single-cloud architecture. High SR010, SR011, SR023, SR024
CR022 Nudge Security lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other SaaS tools in Baseten's visible supply chain. Medium SR007
CR023 Baseten's frontier, model API, and dedicated inference pages all tie product promises to access to the latest-generation GPUs and elastic capacity. High SR008, SR020, SR023, SR024
CR024 Technavio says AI inference-as-a-service providers face hardware supply constraints and high accelerator costs that inflate operating costs and limit scalability. Medium SR013
CR025 Mordor Intelligence says hardware accelerators are the fastest-growing enterprise AI component and that GPU supply constraints and salary inflation are current headwinds. Medium SR014
CR026 HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple comparable GPU classes. Medium SR016, SR008
CR027 Runpod's 2026 comparison lists Baseten with per-minute pricing and an 8-12 second cold-start range while ranking cheaper or faster peers above it on some dimensions. Medium SR015, SR008
CR028 HostFleet says Baseten has a minimum dedicated deployment cost and billed minimum awake times, which raises entry friction for smaller customers. Medium SR016, SR008
CR029 Baseten counters lock-in risk with self-hosting, hybrid deployment, open runtimes, and full ownership of trained weights. Medium SR011, SR021, SR028
CR030 Baseten announced a $300M Series E at a $5B valuation in January 2026 after multiple fundraises within the prior year. Medium SR018, SR017
CR031 Baseten says the financing marked the company's third fundraise in the prior year, increasing pressure to convert capital into durable enterprise growth. Medium SR018, SR017
CR032 Tracxn lists Baseten at 46 employees on December 31, 2024 and 258 employees by April 26, 2026. Low SR017
CR033 Baseten's careers page says companies such as Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten for mission-critical AI workloads. Medium SR019, SR018
CR034 Baseten is expanding simultaneously across model APIs, dedicated inference, frontier gateway, model management, and training products. Medium SR020, SR021, SR022, SR023, SR024
CR035 Baseten's Loops training product is still early access even as Training Jobs is generally available. Medium SR021
CR036 SSO/SCIM, advanced identity controls, self-hosting, and custom SLAs are tied to higher-tier enterprise packaging rather than the self-serve entry plan. Medium SR008, SR011, SR025
CR037 Baseten added SSO/SCIM with automatic provisioning and deprovisioning plus group-based role assignment as a concrete mitigation for identity risk in larger accounts. Medium SR025, SR011
CR038 Baseten's billing usage API gives customers programmatic daily cost visibility across Dedicated Inference, Training, and Model APIs. Medium SR027, SR008
CR039 Baseten's model-management tooling says customers can monitor deployment health and adjust autoscaling policies to hit performance SLAs. Medium SR022, SR010
CR040 Truss and custom-server packaging reduce some switching-cost risk because Baseten exposes a more portable packaging layer than a fully closed model-hosting service. Medium SR028, SR022
CR041 Baseten's repeated emphasis on hands-on engineering expertise and customized deployments implies a service-heavy go-to-market model that may pressure margins as enterprise accounts scale. Medium SR008, SR011, SR024
CR042 Baseten's public contract stack leaves customers responsible for system configuration, backups, valid legal basis, and parts of incident response, which can slow regulated deployments even when Baseten provides secure infrastructure. High SR003, SR004, SR029
CR043 Modal said it raised $355 million in May 2026 after surpassing $300 million in annualized revenue, showing that a close inference-infrastructure rival is scaling quickly with large new capital. High SR031, SR032
CR044 Reuters reported that Modal's Series C valued the company at $4.65 billion, close to Baseten's $5 billion January 2026 valuation, which limits room for execution misses if buyers compare the platforms directly. Medium SR032
CR045 Sacra estimated Fireworks AI at roughly $315 million in annualized revenue in 2026 and a $4 billion valuation from its 2025 Series C, indicating that another open-model inference peer is already operating at substantial scale. Medium SR033
CR046 Tracxn says RunPod has raised only $22 million while positioning itself as a cost-effective GPU-infrastructure provider, which suggests cheaper rivals do not need Baseten-like capital intensity to pressure pricing. Medium SR034
CR047 CoreWeave reported nearly $100 billion of revenue backlog in May 2026 and explicitly framed inference as a major growth vector, underscoring that capital-rich infrastructure platforms are racing to absorb the same demand pool Baseten targets. Medium SR035
CV001 Baseten officially announced a $300 million Series E at a $5 billion valuation in January 2026. High SV001, SV002, SV003, SV005
CV002 After the Series E, public sources put Baseten’s total disclosed funding at about $585 million. Medium SV002, SV004, SV005, SV006
CV003 Tracxn records Baseten’s financing path as a $75 million Series C in February 2025, a $150 million Series D at a $2.15 billion valuation in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. Medium SV005
CV004 Baseten’s official Series D announcement says the company raised $150 million in a round led by BOND. High SV008, SV005
CV005 Baseten’s Series C announcement and PitchBook archive together support that the company’s 2025 Series C was a $75 million round. Medium SV009, SV007
CV006 Baseten said inference volume grew 100x during 2025. Medium SV001, SV004
CV007 Sacra estimates Baseten reached $600 million of annualized revenue in March 2026, up from about $200 million in December 2025. Medium SV004
CV008 Sacra says Baseten was in talks in May 2026 to raise $1 billion at an $11 billion post-money valuation, with reported offers reaching as high as $15 billion. Medium SV004
CV009 The gap between the closed $5 billion round and the mooted $11 billion follow-on means the underwriting question is whether fundamentals have caught up with sentiment, not whether enthusiasm exists. Medium SV001, SV002, SV004
CV010 Baseten’s pricing page shows a free pay-as-you-go Basic tier, while Pro adds priority compute and dedicated support and Enterprise adds custom SLAs and self-hosting. Medium SV010
CV011 Baseten’s homepage pitches cross-cloud scale, forward-deployed engineers, and 99.99% uptime as reasons customers should trust it for production workloads. Medium SV011
CV012 HostFleet’s April 2026 pricing matrix shows Baseten at $4.00 per hour for A100 and $6.50 per hour for H100, above Modal at $2.10 and $3.95 and above Runpod at $2.17 and $3.35 on the same GPU classes. Medium SV012
CV013 HostFleet also notes Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times. Medium SV012
CV014 Runpod’s 2026 comparison ranks Baseten below several alternatives on affordability and cites usage-based per-minute billing with 8–12 second cold starts. Medium SV013
CV015 Baseten can still justify premium pricing if observability, support, compliance, and hybrid deployment reduce customers’ total cost of production inference. Medium SV010, SV011, SV012
CV016 Modal disclosed a May 2026 Series C of $355 million at a $4.65 billion post-money valuation after surpassing $300 million in annualized revenue. High SV015, SV016
CV017 Modal’s May 2026 round implies an approximate 15.5x annualized-revenue multiple. Medium SV015, SV016
CV018 Modal says it can scale from 0 to 1,000 GPUs in minutes or even seconds, making it a credible direct infrastructure comparable rather than a generic application software company. Medium SV015
CV019 Fireworks AI’s last closed round was a $250 million Series C at a $4 billion post-money valuation, while Sacra estimates roughly $315 million of annualized revenue in February 2026 and gross margin around 50%. Medium SV018
CV020 Fireworks’ closed-round valuation implies an approximate 12.7x annualized-revenue multiple, which is above Baseten’s implied closed-round multiple if the Sacra estimate is right. Medium SV018, SV004
CV021 CoreWeave reported Q1 2026 revenue of $2.078 billion and a $99.4 billion revenue backlog, while CompaniesMarketCap showed a May 2026 market cap of $59.75 billion. Medium SV019, SV020
CV022 Using CoreWeave’s 2026 revenue context, public AI infrastructure is trading at roughly 4.8x market cap to annualized-guide revenue. Medium SV019, SV020
CV023 Datadog guided to $4.30 billion to $4.34 billion of 2026 revenue, and CompaniesMarketCap put its May 2026 market cap at $88.04 billion. High SV021, SV022
CV024 Datadog’s implied multiple is about 20.4x forward revenue, showing how the market prices premium infrastructure software with strong growth and disclosure. Medium SV021, SV022
CV025 Datadog’s Form 10-K highlights the disclosure baseline public investors get on risk factors, revenue, and growth that private Baseten investors do not get from public materials. High SV023, SV021
CV026 CompaniesMarketCap and Stock Analysis put Cloudflare at about an $85.47 billion market cap and $2.33 billion of trailing revenue in late May 2026. Medium SV024, SV025
CV027 Cloudflare’s implied 36.7x revenue multiple is an upper-bound developer-platform reference that assumes much better disclosure, margin structure, and category leadership than Baseten has shown publicly. Medium SV024, SV025
CV028 CompaniesMarketCap and Stock Analysis put MongoDB at about a $27.01 billion market cap and $2.60 billion of trailing revenue in late May 2026. Medium SV026, SV027
CV029 MongoDB’s implied 10.4x revenue multiple is a lower-middle public infrastructure-software reference for a scaled but less euphoric comp set. Medium SV026, SV027
CV030 Technavio values the AI inference-as-a-service market at $85.25 billion in 2025 and expects 22.1% CAGR through 2030. Medium SV028
CV031 Mordor values the broader enterprise AI market at $114.87 billion in 2026, with cloud deployment accounting for 67.33% of 2025 revenue. Medium SV029
CV032 AWS Bedrock advertises select batch inference at 50% below on-demand pricing, showing hyperscalers can attack the inference layer with bundled economics. Medium SV030
CV033 Google promotes a unified agent platform with 200-plus models and free credits for new customers, increasing the risk that enterprises default to broader cloud bundles. Medium SV031
CV034 Azure Machine Learning publishes a 99.9% SLA and no additional platform charge beyond underlying Azure services, reinforcing the bundling threat to independent vendors. Medium SV032
CV035 If Sacra’s $600 million annualized-revenue estimate is directionally right, Baseten’s closed $5 billion round implies roughly an 8.3x revenue multiple. Medium SV004
CV036 An $8.3x implied multiple would place Baseten above CoreWeave-like AI cloud treatment but below Modal, Fireworks, Datadog, and Cloudflare-style premium software treatment. Medium SV004, SV018, SV019, SV020, SV021, SV022, SV024, SV025
CV037 At the same $600 million run-rate, the mooted $11 billion follow-on would imply roughly an 18.3x multiple, much closer to Datadog-grade public software pricing. Medium SV004, SV021, SV022
CV038 Baseten’s pricing and delivery model suggest revenue quality may be more support-intensive and lower-margin than top public software comps even if growth is exceptional. Medium SV010, SV011, SV012, SV013
CV039 Fireworks’ roughly 50% gross margin and explicit 60% target are a useful reminder that inference platforms are infrastructure businesses first, not pure software businesses. Medium SV018
CV040 The strongest pro-valuation argument is that inference demand is large, cloud-heavy, and moving into production workloads where Baseten offers hybrid deployment and performance differentiation. Medium SV028, SV029, SV010, SV011
CV041 The strongest anti-valuation argument is that premium pricing can be attacked by Runpod and Modal at the edge and by hyperscalers through bundled platform pricing. Medium SV012, SV013, SV017, SV030, SV031, SV032
CV042 The current $5 billion price is supportable only conditionally because it assumes the private revenue estimate is directionally right and that Baseten can defend premium economics despite bundling pressure. Medium SV004, SV012, SV013, SV015, SV016, SV030, SV031, SV032
CV043 A reasonable bear case uses $300 million to $400 million of revenue support and a 7x to 9x multiple, implying roughly $2.1 billion to $3.6 billion of value. Medium SV004, SV018, SV026, SV027
CV044 A reasonable base case uses $500 million to $650 million of revenue support and an 8x to 12x multiple, implying roughly $4.0 billion to $7.8 billion and placing the closed $5 billion round inside the range. Medium SV004, SV015, SV016, SV018, SV026, SV027
CV045 A reasonable bull case uses $700 million to $900 million of revenue support and a 12x to 16x multiple, implying roughly $8.4 billion to $14.4 billion and making an $11 billion step-up possible only if growth and premium perception keep compounding. Medium SV004, SV015, SV016, SV018
CV046 The right investment recommendation is track, not buy, because company quality is high but the public evidence leaves the price only fair-to-stretched rather than clearly attractive. Medium SV004, SV012, SV013, SV015, SV016, SV023
CV047 The highest-leverage diligence question is whether internal revenue, gross margin, and customer-concentration data support the market narrative implied by the $5 billion round. Medium SV004, SV018, SV023
CV048 The thesis should break if Baseten cannot preserve premium price-performance with acceptable margin, if growth normalizes materially below the base-case band, or if any new round clears only with aggressive terms. Medium SV004, SV012, SV013, SV018, SV023
Sources
IDPublisherTitleQuote
SO001 Baseten Baseten | Inference is everything
SO002 Baseten Baseten customers
SO003 Baseten Enterprise
SO004 Baseten Healthcare
SO005 Baseten Pricing
SO006 Baseten Careers at Baseten Companies like Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten to power mission-critical AI workloads in production.
SO007 Baseten Baseten Terms and Conditions BASETEN LABS, INC. (“BASETEN”).
SO008 Baseten Privacy Policy Company (referred to as either "the Company", "We", "Us" or "Our" in this Agreement) refers to BaseTen Labs, Inc., 201 Spear St, Suite 1600, San Francisco, CA 94105.
SO009 Baseten Announcing our Series A We’ve raised a little over $20 million dollars to date across our seed and Series A rounds.
SO010 Baseten Announcing our Series B We’re excited to announce that we’ve raised an additional $40M.
SO011 Baseten Announcing Baseten’s $75M Series C Today, we run workloads across thousands of GPUs, serving millions of end customers worldwide while continuously adding new cloud partners.
SO012 Baseten Announcing Baseten’s $150M Series D Today, we’re excited to announce our $150M Series D, led by BOND, with Jay Simons joining our Board.
SO013 Baseten Announcing Baseten's $300M Series E We’re thrilled to announce that we have raised $300M at a $5B valuation.
SO014 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future Founded in 2019 and based in San Francisco, Baseten has raised $585 million to date from investors including IVP, CapitalG, Conviction, Bond, Greylock, and Spark Capital.
SO015 Tech Funding News Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference
SO016 Tracxn Baseten Technologies
SO017 CB Insights Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements
SO018 PitchBook via Internet Archive Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook
SO019 Abridge Abridge | Intelligence at the point of conversation
SO020 Clay Clay | GTM workflows at scale
SO021 Cursor Cursor | The new way to build software
SO022 OpenEvidence OpenEvidence | America's Official Medical Knowledge Platform
SO023 Baseten OpenEvidence delivers instant, accurate medical information with Baseten Baseten now serves billions of requests per week for OpenEvidence.
SO024 Baseten How Gamma makes building presentations criminally fun
SO025 Baseten Speechify real-time text-to-speech Because of Baseten’s efficient autoscaling, model performance and infrastructure optimizations, Speechify’s cost per million characters dropped by 44%.
SO026 Baseten Patreon
SO027 NVIDIA Streamlined AI Inference Infrastructure in the Cloud Baseten’s infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. This is an incredible speedup on cold starts, which previously took up to five minutes.
SO028 WorkOS A conversation with Philip Kiely from Baseten at AWS re:Invent 2025
SO029 Nudge Security Is Baseten safe? Learn if Baseten Is Legit Review Baseten security risks.
SO030 ServiceAlert Baseten Outage History, Downtime & Incident Records Detailed incident data is not available for this service.
SO031 Baseten Tuhin Srivastava - CEO, Co-Founder
SO032 Baseten Amir Haghighat - CTO, Co-Founder
SO033 Baseten Pankaj Gupta - Co-Founder
SO034 Baseten Phil Howes - Co-Founder
SM001 Baseten Inference Platform: Deploy AI models in production | Baseten Scale workloads across any region and any cloud (in our cloud or yours), with blazing-fast cold starts and 99.99% uptime out of the box.
SM002 Baseten Cloud Pricing Basic: $0 per month, pay as you go. Enterprise adds self-host deployments, cloud commitments, and custom SLAs.
SM003 Baseten Mission-Critical Inference for Enterprise AI Infrastructure The Baseten Inference Stack runs inside your cloud infrastructure, keeping your data fully under your control.
SM004 Baseten Healthcare 99.99% uptime and infinite scaling through a unified GPU pool spanning 10+ clouds.
SM005 Baseten Production-First Model APIs - Baseten Inference Stack Spend 5-10x less than closed alternatives with our optimized multi-cloud infrastructure and efficient frontier open models.
SM006 Baseten Inference at Scale with Dedicated Deployments | Baseten We regularly see 6x better GPU utilization and 5-10x lower costs powered by our Inference Stack.
SM007 Baseten Multi-Model Inference, Ultra-Low Latency at Scale | Baseten Baseten Chains enables granular hardware and autoscaling for compound AI, powering 6x better GPU usage and cutting latency in half.
SM008 Baseten Cloud-Native AI Infrastructure | Baseten Scale in your cloud, ours, or both with Baseten Self-hosted, Cloud, and Hybrid deployment options.
SM009 Baseten Secure model inference - Baseten Baseten never shares GPUs across users.
SM010 Baseten Customer stories Speechify synthesizes 161B+ characters per month for 60M+ users. With Baseten, Speechify cut costs by 44%, p99 latency by 30-50%, and got 4.5x faster cold starts.
SM011 Baseten OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack OpenEvidence can scale efficiently even in the face of traffic spikes, hardware failure, or capacity constraints... without locking into multi-year commitments with single cloud vendors.
SM012 Baseten How Gamma makes building presentations criminally fun We generate millions of images a day on Baseten for our 70+ million users with ultra-low latency and high throughput.
SM013 Baseten How Writer helps businesses transform with AI In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens.
SM014 Baseten Why we built and open-sourced a model serving solution Truss bridges the gap between model development and model deployment by making it equally straightforward to serve a model on localhost and in prod.
SM015 Baseten AI Model Training Built for Production Inference | Baseten Train -> deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command.
SM016 Baseten Baseten Frontier Gateway The Baseten Frontier Gateway is the path from weights to a production-ready API.
SM017 Baseten SSO and SCIM Available on the Enterprise plan with just-in-time provisioning, automatic deprovisioning, and optional group-gated admin access.
SM018 Baseten Retrieve billing usage via API The response includes aggregate totals and a per-resource or per-model breakdown array, with daily granularity on each entry.
SM019 Technavio AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during the forecast period 2026-2030.
SM020 Mordor Intelligence Enterprise AI Market - Share, Trends & Size 2025 - 2031 The Enterprise AI market size stood at USD 114.87 billion in 2026 and is projected to reach USD 273.08 billion by 2031, registering an 18.91% CAGR over 2026-2031.
SM021 Fortune Business Insights AI Inference Market Size, Share | Global Growth Report [2034] The global AI inference market size was valued at USD 103.73 billion in 2025 and is projected to grow from USD 117.80 billion in 2026 to USD 312.64 billion by 2034.
SM022 Modal Modal: High-performance AI infrastructure Autoscale from 0 to 1000+ GPUs, instantly.
SM023 Replicate Run AI with an API We scale up and down to handle demand, and you only pay for the compute that you use.
SM024 Runpod The AI Developer Cloud | Runpod One platform to go from AI experiment to production. Pods for building. Serverless for shipping. Clusters for scaling.
SM025 Runpod Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More Baseten: usage-based (per-minute), configurable replicas, T4/A10G/L4/A100/H100, 8-12 sec cold starts.
SM026 HostFleet Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet You’re a startup with a production inference workload and a budget -> Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SM027 Amazon Web Services The center for all your data, analytics, and AI – Amazon SageMaker – AWS Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure.
SM028 Google Cloud Gemini Enterprise Agent Platform (formerly Vertex AI) Build, scale, govern and optimize enterprise grade AI agents.
SM029 Microsoft Azure Azure Machine Learning - ML as a Service | Microsoft Azure Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment.
SP001 Baseten Inference Platform: Deploy AI models in production | Baseten
SP002 Baseten Cloud Pricing
SP003 Baseten Docs Overview - Baseten
SP004 Baseten Docs Secure model inference - Baseten
SP005 Baseten Production-First Model APIs - Baseten Inference Stack
SP006 Baseten AI Model Training Built for Production Inference | Baseten
SP007 Baseten Multi-Model Inference, Ultra-Low Latency at Scale | Baseten
SP008 Baseten AI Model Performance - Baseten Inference Runtime
SP009 Baseten Mission-Critical Inference for Enterprise AI Infrastructure
SP010 Baseten Baseten Frontier Gateway
SP011 Baseten Why we built and open-sourced a model serving solution
SP012 Baseten Baseten Status
SP013 Servicealert.ai Baseten Outage History, Downtime & Incident Records
SP014 Modal Modal: High-performance AI infrastructure
SP015 Modal Plan Pricing | Modal
SP016 Replicate Run AI with an API
SP017 Replicate Pricing – Replicate
SP018 Runpod The AI Developer Cloud | Runpod
SP019 Runpod GPU Cloud Pricing - Runpod
SP020 Runpod Docs Serverless pricing | Runpod Docs
SP021 AWS The center for all your data, analytics, and AI – Amazon SageMaker – AWS
SP022 AWS Amazon Bedrock Pricing – AWS
SP023 Google Cloud Gemini Enterprise Agent Platform (formerly Vertex AI)
SP024 Microsoft Azure Azure Machine Learning - ML as a Service | Microsoft Azure
SP025 HostFleet Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet Baseten: Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SP026 Tracxn Baseten Technologies
SP027 PitchBook Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook
SP028 Sacra Baseten revenue, valuation & funding AWS, Google, and Microsoft leverage extensive enterprise relationships to bundle AI inference with broader cloud commitments at below-market rates.
SP029 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP030 Tech Funding News Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference — TFN
SI001 Baseten Cloud Pricing
SI002 Baseten Inference at Scale with Dedicated Deployments | Baseten
SI003 Baseten Production-First Model APIs - Baseten Inference Stack
SI004 Baseten AI Model Training Built for Production Inference | Baseten
SI005 Baseten Enterprise
SI006 Baseten Healthcare
SI007 Baseten Retrieve billing usage via API
SI008 Baseten Announcing Baseten’s $75M Series C
SI009 Baseten Announcing Baseten’s $150M Series D
SI010 Baseten Announcing Baseten’s $300M Series E
SI011 Baseten Careers at Baseten
SI012 Baseten Privacy Policy
SI013 Baseten Baseten Terms and Conditions
SI014 Baseten Service Level Agreement
SI015 Baseten Baseten Status
SI016 Baseten How Writer helps businesses transform with AI
SI017 Baseten OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack
SI018 Baseten How Speechify makes audio the default with real-time text-to-speech
SI019 Baseten Superhuman achieves 80% faster embedding model inference with Baseten
SI020 Baseten Patreon scales Whisper transcription with Baseten
SI021 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SI022 Tech Funding News Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference
SI023 Sacra Baseten revenue, valuation & funding
SI024 Tracxn Baseten Technologies
SI025 CB Insights Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements
SI026 PitchBook via Wayback Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook
SI027 HostFleet Serverless GPU Pricing Matrix 2026
SI028 Mordor Intelligence Enterprise AI Market - Share, Trends & Size 2025 - 2031
SI029 U.S. Securities and Exchange Commission EDGAR Entity Landing Page (CIK 0001850888)
SI030 Baseten How Hebbia uses Baseten to power AI workflows for the world's leading financial institutions
SI031 Baseten Posit launches real-time AI code suggestions with Baseten
SI032 Baseten Wispr Flow creates effortless voice dictation with Llama on Baseten
SI033 Baseten How Zed is reimagining the code editor from the ground up
SI034 Baseten How World Labs is building large world models, pushing the boundaries of 3D
SE001 Baseten Inference Platform: Deploy AI models in production | Baseten Rapidly scale workloads across any cloud provider with global capacity. We offer single-tenant and self-hosted deployments for extra security.
SE002 Baseten Overview - Baseten Baseten is a training and inference platform. Bring a model ... and Baseten turns it into a production API endpoint with autoscaling, observability, and optimized serving infrastructure.
SE003 Baseten Reference documentation - Baseten
SE004 Baseten How Baseten works Behind every GPU workload on Baseten is the Multi-cloud Capacity Management (MCM) system.
SE005 Baseten Production-First Model APIs - Baseten Inference Stack Model APIs made for products, not toys.
SE006 Baseten Model APIs - Baseten Model APIs provide instant access to high-performance LLMs through endpoints that are compatible with both the OpenAI Chat Completions API and the Anthropic Messages API.
SE007 Baseten Inference at Scale with Dedicated Deployments | Baseten Dedicated deployments are single-tenant, can be region-locked, and are HIPAA compliant and SOC 2 Type II certified on Baseten Cloud.
SE008 Baseten Baseten Frontier Gateway Baseten Frontier Gateway gives you a production-ready, white-labeled API endpoint.
SE009 Baseten Multi-Model Inference, Ultra-Low Latency at Scale | Baseten Deploy your Chain to production with each Chainlet specifying its own hardware resources, software dependencies and scaling settings independently.
SE010 Baseten AI Model Training Built for Production Inference | Baseten Loops (early access) ... Training Jobs (GA).
SE011 Baseten Cloud-Native AI Infrastructure | Baseten We built cross-cloud autoscaling so you can serve users anywhere in the world with low latency and high reliability.
SE012 Baseten AI Model Management for Production Inference | Baseten
SE013 Baseten AI Model Performance - Baseten Inference Runtime We take the best open-source inference frameworks (TensorRT, SGLang, vLLM, TGI, TEI, and more) and layer in our own optimizations for maximum performance.
SE014 Baseten Secure model inference - Baseten Baseten does not store model inputs, outputs, or weights by default.
SE015 Baseten Mission-Critical Inference for Enterprise AI Infrastructure We are SOC-2 Type II certified and HIPAA compliant across all hosting options and support data residency requirements through region-restricted deployments.
SE016 Baseten Cloud Pricing Only pay for the compute you use, down to the minute.
SE017 Baseten SSO and SCIM Connect Baseten to your identity provider for SAML 2.0 sign-in and SCIM 2.0 directory sync.
SE018 Baseten Rolling deployments You can now gradually shift traffic to new deployments instead of swapping all at once.
SE019 Baseten Introducing the Baseten Delivery Network (BDN) We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models.
SE020 Baseten Retrieve billing usage via API You can now query your billing usage programmatically using the new GET /v1/billing/usage_summary endpoint.
SE021 Baseten Regional environments Regional environments route inference traffic for a deployment exclusively to workload planes within a designated geographic region.
SE022 Baseten Baseten Status Past Incidents ... May 29, 2026 ... May 26 ... May 19 ... May 18 ... May 16 ... May 15.
SE023 Baseten Service Level Agreement Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%) during each calendar month.
SE024 Baseten Why we built and open-sourced a model serving solution To address this problem, we built Truss.
SE025 GitHub / basetenlabs GitHub - basetenlabs/truss: The simplest way to serve AI/ML models in production Truss is the CLI for deploying and serving ML models on Baseten.
SE026 GitHub / basetenlabs Releases · basetenlabs/truss v0.18.3 ... 21 May 16:14 ... feat(loops/cli) ... feat(train) ... feat(truss).
SE027 PyPI truss pip install --upgrade truss
SE028 Baseten How Writer helps businesses transform with AI In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw: 60% higher tokens per second, 23% lower time to first token, 35% lower cost per million tokens.
SE029 Baseten OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack By using Baseten, OpenEvidence achieved: 78% lower latency ... 6x faster deployment processes ... 8x+ reduction in infrastructure maintenance time overall.
SE030 ServiceAlert Baseten Outage History, Downtime & Incident Records Detailed incident data is not available for this service.
SE031 HostFleet Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet You’re a startup with a production inference workload and a budget → Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SE032 Runpod The AI Developer Cloud | Runpod Sub-200ms cold starts ... Zero idle cost.
SE033 Modal Modal: High-performance AI infrastructure Autoscale from 0 to 1000+ GPUs, instantly.
SE034 Replicate Run AI with an API You can deploy your own custom models using Cog, our open-source tool for packaging machine learning models.
SE035 Amazon Web Services The center for all your data, analytics, and AI – Amazon SageMaker – AWS Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure.
SE036 Google Cloud Gemini Enterprise Agent Platform (formerly Vertex AI) Agent Platform is our open and comprehensive platform ... to build, scale, govern and optimize enterprise-grade agents.
SE037 Microsoft Azure Azure Machine Learning - ML as a Service | Microsoft Azure Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment.
SE038 Nudge Security Is Baseten Safe? Learn if Baseten Is Legit | Nudge Security Baseten Supply Chain ... Amazon Web Services (AWS), Vercel, Statuspage, SendGrid, Stripe, Google Analytics, Segment, Sentry ...
SU001 Baseten Inference Platform: Deploy AI models in production | Baseten
SU002 Baseten Customer stories
SU003 Baseten Mission-Critical Inference for Enterprise AI Infrastructure
SU004 Baseten Healthcare
SU005 Baseten Cloud Pricing
SU006 Baseten Announcing Baseten's $300M Series E
SU007 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SU008 Baseten How Writer helps businesses transform with AI
SU009 Baseten OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack
SU010 Baseten How Speechify makes audio the default with real-time text-to-speech
SU011 Baseten How Gamma makes building presentations criminally fun
SU012 Baseten Superhuman achieves 80% faster embedding model inference with Baseten
SU013 Baseten Patreon saves nearly $600k/year in ML resources with Baseten
SU014 OpenEvidence OpenEvidence
SU015 Writer WRITER
SU016 Writer About WRITER
SU017 Gamma About Us – Reinventing Presentations with AI | Gamma.app
SU018 Speechify Speechify: Text to Speech & Voice Typing AI Assistant | 55M+ Users
SU019 Speechify Voice Over Studio: Request A Free Demo | Speechify
SU020 Superhuman Superhuman: Docs, Mail, and AI That Work Everywhere
SU021 Patreon Where Creator Communities Thrive — Patreon
SU022 NVIDIA Case study:Baseten’s AI Inference Infrastructure Baseten's infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running.
SU023 WorkOS Baseten is betting big on open source models — WorkOS companies could switch to models that were faster, less expensive, more customizable, and more reliable at scale
SU024 FeaturedCustomers 46 Baseten Customer Reviews & References Customer Rating Review Score based on 654 reference ratings 4.8/5.0
SU025 Runpod Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More Baseten ... Usage-based (per-minute) ... 8–12 sec
SU026 HostFleet Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost.
SU027 Abridge Generative AI for Clinical Conversations | Abridge
SU028 Clay Clay | Go to market with unique data—and the ability to act on it
SU029 Cursor The best coding agent
SU030 Notion Meet your AI team | Notion
SU031 PeerSpot Baseten Reviews, Competitors and Pricing
SU032 Mercor Mercor | Organizing human intelligence to power the AI economy
SR001 Baseten Secure model inference Baseten does not store model inputs, outputs, or weights by default.
SR002 Baseten Privacy Policy
SR003 Baseten Baseten Terms and Conditions Customer acknowledges and agrees that the Baseten Products & Services will not be used, and is not licensed for use, in connection with any of Customer’s time-critical or mission-critical functions.
SR004 Baseten Service Level Agreement Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%).
SR005 Baseten Baseten Status
SR006 ServiceAlert Baseten Outage History, Downtime & Incident Records
SR007 Nudge Security Is Baseten Safe? Learn if Baseten Is Legit
SR008 Baseten Cloud Pricing
SR009 Baseten Baseten homepage
SR010 Baseten Cloud-Native AI Infrastructure
SR011 Baseten Mission-Critical Inference for Enterprise AI Infrastructure
SR012 Baseten Healthcare SOC-2 Type II and HIPAA compliant with flexible hosting and data residency with region-restricted cloud deployments.
SR013 Technavio AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030
SR014 Mordor Intelligence Enterprise AI Market - Share, Trends & Size 2025 - 2031
SR015 Runpod Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More
SR016 HostFleet Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. Scale-to-zero is available but there are billed minimum awake times.
SR017 Tracxn Baseten Technologies
SR018 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SR019 Baseten Careers at Baseten
SR020 Baseten Production-First Model APIs - Baseten Inference Stack
SR021 Baseten AI Model Training Built for Production Inference
SR022 Baseten AI Model Management for Production Inference
SR023 Baseten Baseten Frontier Gateway
SR024 Baseten Inference at Scale with Dedicated Deployments
SR025 Baseten SSO and SCIM
SR026 Baseten Rolling deployments
SR027 Baseten Retrieve billing usage via API
SR028 Baseten Why we built and open-sourced a model serving solution
SR029 U.S. Department of Health & Human Services Business Associates The satisfactory assurances must be in writing, whether in the form of a contract or other agreement between the covered entity and the business associate.
SR030 European Commission The EU’s approach to artificial intelligence
SR031 Modal Series C announcement
SR032 Reuters AI startup Modal raised $355 million in a new round of financing, valuing the company at $4.65 billion
SR033 Sacra Fireworks AI revenue, valuation & funding
SR034 Tracxn RunPod
SR035 CoreWeave Record First Quarter Revenue and Revenue Backlog Highlight Unprecedented Demand for CoreWeave Cloud
SV001 Baseten Announcing Baseten’s $300M Series E We’re thrilled to announce that we have raised $300M at a $5B valuation.
SV002 Business Wire Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future This values Baseten at $5 billion and marks the company’s third fundraise in the past year.
SV003 Tech Funding News Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference investors invested $300 million in the company, pushing its valuation to about $5 billion
SV004 Sacra Baseten revenue, valuation & funding Sacra estimates that Baseten hit $600M in annualized revenue in March 2026.
SV005 Tracxn Baseten Technologies Jan 20, 2026 | $300M | Series E | $5B
SV006 CB Insights Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements Baseten has raised $585M over 7 rounds.
SV007 PitchBook Baseten 2025 Company Profile: Valuation, Funding & Investors | PitchBook Latest Deal Amount $75M
SV008 Baseten Announcing Baseten’s $150M Series D Today, we’re excited to announce our $150M Series D, led by BOND.
SV009 Baseten Announcing Baseten’s $75M Series C Today, we’re thrilled to announce our Series C fundraise.
SV010 Baseten Cloud Pricing Basic: $0 per month, pay as you go.
SV011 Baseten Baseten homepage Scale workloads across any region and any cloud ... with ... 99.99% uptime out of the box.
SV012 HostFleet Serverless GPU pricing matrix 2026 Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SV013 Runpod Top serverless GPU clouds for 2026 AI workloads Baseten ... Usage-based (per-minute) ... 8–12 sec
SV014 Runpod Runpod pricing H100 PCIe $2.89/hr
SV015 Modal Modal's Series C: Raising $355M at a $4.65B valuation We’ve raised $355 million ... surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money.
SV016 Reuters / U.S. News Exclusive-Modal Labs Valued at $4.65 Billion as AI Coding Takes Off The company’s annualized revenue is about $300 million, up from an annualized rate of $60 million in September.
SV017 Modal Modal pricing Get started with $30 / month free credits
SV018 Sacra Fireworks AI revenue, valuation & funding Fireworks AI hit $315M in annualized revenue in February 2026 ... gross margin sits at approximately 50%.
SV019 CoreWeave / Business Wire CoreWeave Reports Strong First Quarter 2026 Results Revenue backlog was $99.4 billion as of March 31, 2026.
SV020 CompaniesMarketCap CoreWeave market capitalization As of May 2026 CoreWeave has a market cap of $59.75 Billion USD.
SV021 Datadog Datadog Announces First Quarter 2026 Financial Results Revenue was $1,006 million ... Full Year 2026 Outlook: Revenue between $4.30 billion and $4.34 billion.
SV022 CompaniesMarketCap Datadog market capitalization As of May 2026 Datadog has a market cap of $88.04 Billion USD.
SV023 Datadog / SEC filing mirror Datadog Annual Report 2026 Form 10-K (NASDAQ:DDOG) ... For the fiscal year ended December 31, 2025
SV024 CompaniesMarketCap Cloudflare market capitalization As of May 2026 Cloudflare has a market cap of $85.47 Billion USD.
SV025 Stock Analysis Cloudflare revenue 2016-2026 This brings the company’s revenue in the last twelve months to $2.33B.
SV026 CompaniesMarketCap MongoDB market capitalization As of May 2026 MongoDB has a market cap of $27.01 Billion USD.
SV027 Stock Analysis MongoDB revenue 2017-2026 This brings the company’s revenue in the last twelve months to $2.60B.
SV028 Technavio AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030 The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during 2026-2030.
SV029 Mordor Intelligence Enterprise AI Market - Share, Trends & Size 2025 - 2031 The Enterprise AI market size stood at USD 114.87 billion in 2026.
SV030 Amazon Web Services Amazon Bedrock Pricing Amazon Bedrock offers ... batch inference at a 50% lower price compared to on-demand inference pricing.
SV031 Google Cloud Gemini Enterprise Agent Platform New customers get up to $300 in free credits.
SV032 Microsoft Azure Azure Machine Learning The SLA for Azure Machine Learning is 99.9 percent uptime. There's no additional charge to use Azure Machine Learning.