Diligence report AI inference infrastructure Late-stage private (Series E) 2026-05-30

Baseten

Premium inference infrastructure for production AI workloads

Baseten is a high-quality AI inference infrastructure company with real enterprise traction and strong category positioning, but public financial disclosure is too thin to justify treating momentum pricing as a high-conviction buy.

Cover facts

Latest valuation 01

$5B USD (Jan 2026 Series E) [CO025]

Total raised 02

$585M USD (publicly disclosed) [CO027]

Last round 03

$300M Series E [CO025]

Revenue estimate 04

$600M annualized run rate (Sacra, Mar 2026) [CI035]

Founded 05

2019 [CO001]

Headcount 06

~258 employees (Tracxn, Apr 2026) [CI038]

Company profile

Baseten is a San Francisco-based inference infrastructure company founded in 2019 by Tuhin Srivastava, Amir Haghighat, Phil Howes, and Pankaj Gupta. The company positions itself as the software layer for running production AI workloads across its own cloud or a customer's environment, combining model APIs, dedicated inference, training, and enterprise deployment controls. Public customer proof spans AI-native products and regulated workloads including Cursor, Clay, OpenEvidence, Abridge, Gamma, Patreon, Speechify, Writer, Hebbia, Wispr Flow, and others. Baseten raised $300M at a $5B valuation in January 2026, bringing disclosed funding to about $585M and reinforcing its status as a late-stage private infrastructure company.

Website: www.baseten.co
Founders: Tuhin Srivastava, Amir Haghighat, Phil Howes, Pankaj Gupta
Founding location: San Francisco, CA, USA
Headquarters: San Francisco, CA, USA
Product: Baseten sells a production inference platform for custom models, model APIs, dedicated inference, training workflows, and compound-AI orchestration. Its Truss framework is the developer entry point, while enterprise features emphasize multi-cloud deployment, self-hosting, regional controls, compliance scope, and performance tuning for latency-sensitive workloads.
Customers: Enterprise AI teams, AI-native application builders, and regulated workloads that need low-latency, high-reliability model serving with security, hybrid deployment, or performance-engineering support. Public customer evidence is strongest in healthcare AI, developer tooling, GTM automation, voice, and productivity software.
Business model: Usage-based monetization: token-priced model APIs, per-minute GPU and CPU compute for deployments, and negotiated Pro or Enterprise contracts for dedicated capacity, support, and self-hosted or data- residency-sensitive environments. The platform also appears to expand via training, chains, and higher-touch enterprise engineering support.
Stage: Late-stage private (Series E)
Funding status: Public funding history includes a little over $20M across seed and Series A, a $40M Series B in 2024, a $75M Series C in February 2025, a $150M Series D in September 2025, and a $300M Series E at a $5B valuation in January 2026. Business Wire says total disclosed funding is about $585M.

[CO001, CO002, CO004, CO005, CO006, CO008, CO020, CO022]

Executive summary

Top strengths

Premium positioning in a fast-growing part of the AI stack: production inference for open, custom, and hybrid workloads
Strong recent fundraising and investor roster, with IVP, CapitalG, NVIDIA, Greylock, Spark, and others backing the company
Clear developer and enterprise wedge via Truss, dedicated inference, self-hosting, and multi-cloud deployment controls
Public customer proof across healthcare, coding, voice, GTM, and productivity workloads shows relevance beyond one narrow vertical
Customer case studies repeatedly cite meaningful latency, throughput, and cost improvements versus prior inference setups

Top risks

Public revenue, margin, burn, concentration, and retention data remain undisclosed, forcing investors to rely on estimated rather than filed financials
Premium price positioning can be pressured by cheaper GPU clouds and by hyperscalers bundling inference with broader cloud relationships
Reliability and compliance underwriting still depends on negotiated terms, BAAs, and custom SLAs rather than website messaging alone
The jump from a $5B closed valuation to mooted $11B follow-on pricing is hard to defend without much better disclosure
Baseten appears to run a support-heavy, performance-engineering-intensive model that could be harder to scale cleanly than pure software narratives imply

Open gaps

Audited revenue, gross margin, burn, runway, and customer-concentration data are not public
No public evidence resolves retention metrics such as NRR, churn, or contract duration
Public governance visibility is limited; the full current board, committees, and founder ownership are not disclosed in the fetched corpus
Healthcare and regulated-use underwriting still needs exact BAA, DPA, and shared-responsibility terms for production accounts
The economics and terms behind any mooted post-Series-E financing are not publicly available

Chapter 01

01Company Overview

1.1 Identity, History, and Leadership

Baseten is easiest to understand as a founder-led inference infrastructure company rather than a generic MLOps toolkit. Its own history traces the origin back to late 2019, when the four founders started the company to solve model-deployment pain they had experienced firsthand. Current legal pages anchor the business as Baseten Labs, Inc. in San Francisco, while the homepage, enterprise, and pricing surfaces consistently frame the product around high-performance inference, managed APIs, and training or deployment workflows rather than broad horizontal software. That identity matters for the rest of the report: Baseten is positioning itself as the software layer that runs production AI workloads across its own cloud or a customer’s environment, with compliance and data-control features aimed at more sensitive workloads. Leadership continuity is also unusually visible for a late-stage private company. Tuhin Srivastava is publicly surfaced as CEO and co-founder, Amir Haghighat as CTO and co-founder, and author pages still identify Pankaj Gupta and Phil Howes as co-founders. The Series E post is signed by all four founders, reinforcing that the company still presents a founder-centric leadership story. The caveat is governance transparency: the public corpus clearly shows one board addition in 2025, but it does not provide a full current board roster, committee structure, or founder ownership map.[CO001, CO002, CO003, CO004, CO005, CO006]

Snapshot KPI table
Metric	Value / status	As of	Confidence	Note / gap
Founded	2019	2019-01-01	High	Exact public day is not surfaced in the fetched corpus, so the year is the reliable anchor.
Headquarters	San Francisco, California	2026-05-30	High	Privacy policy gives a specific San Francisco address.
Legal entity	Baseten Labs, Inc.	2026-05-30	High	Terms and privacy policy use the same legal entity name.
Current stage	Private, Series E	2026-05-30	High	Supported by Tracxn and the archived PitchBook profile after the January 2026 round.
Latest financing	$300M Series E at $5B valuation	2026-01-20	High	Led by IVP and CapitalG with NVIDIA and prior investors also participating.
Lifetime capital raised	$585M	2026-01-23	High	BusinessWire and market-data sources align on cumulative funding.
Business model	Usage-based API tokens plus per-minute compute	2026-05-30	Medium	Public pricing is clear; enterprise contract discounts and minimums are not.
Deployment model	Baseten Cloud, self-hosted, and region-aware enterprise options	2026-05-30	High	Official enterprise and healthcare pages emphasize self-hosting and data-control features.
Named customer set	Abridge, Cursor, Clay, OpenEvidence, Notion, Speechify, Gamma	2026-05-30	High	Named across careers, customer hub, customer stories, and Series E press materials.
Public headcount		2026-05-30	Low	PitchBook and Tracxn conflict, so current employee count should be treated as unresolved.

Snapshot rows mix stable identity facts with current operating and financing markers; null means the fetched public corpus is not reliable enough to support a single number.

[CO001, CO002, CO003, CO004, CO006, CO007]

Leadership and founder table
Person	Current role / public title	Public evidence of fit or coverage	Visibility today	Diligence implication
Tuhin Srivastava	CEO, co-founder	Public spokesperson on financing and company thesis; author page and financing coverage identify him as the chief executive.	High	Founder-led narrative remains a strength but also creates CEO key-person dependence.
Amir Haghighat	CTO, co-founder	Author page identifies the technical leader; Series E signoff keeps him in the visible founder set.	High	Technical and product credibility remain tied to a founding executive.
Phil Howes	Co-founder; chief scientist in independent coverage	Official author page plus Tech Funding News show ongoing founder visibility tied to model performance and research.	Medium	Science leadership appears founder-rooted even if the exact org chart is private.
Pankaj Gupta	Co-founder	Official author page and Series E signoff confirm continuity, but the fetched corpus does not surface a current operating title.	Medium	Functional coverage is less transparent than for the CEO and CTO.
Jay Simons	Board member since Series D	Series D explicitly says he joined the board as part of the BOND-led financing.	Low	Governance visibility improved in 2025, but the full board and committee map is still incomplete.

Rows cover the founders and the one board addition that is explicit in the fetched corpus; the public materials reviewed do not expose a full executive roster or detailed committee structure.

[CO010, CO011, CO012, CO013, CO014, CO015]

FO002: Company snapshot logic

Baseten’s current story links founder continuity, deployment control, customer-proofed performance, and repeated capital access into one inference-platform thesis.

[CO004, CO005, CO006, CO008, CO010, CO011]

1.2 Funding, Stage, and Investor Base

Baseten’s capital history is now the clearest external signal that the company has graduated into late-stage AI infrastructure. The fetched corpus supports a progression from a modestly funded early company to a Series E business: the Series A post says Baseten had raised a little over $20 million across seed and Series A; the Series B announcement adds $40 million; the Series C announcement adds $75 million; Series D adds $150 million; and Series E adds another $300 million at a $5 billion valuation. Independent market-data sources corroborate the timing of those rounds and place the September 2025 Series D valuation at about $2.15 billion, showing just how quickly the company repriced upward before the January 2026 round. The investor roster has also deepened rather than churned. Greylock and South Park Commons appear early, IVP and Spark become visible at growth stage, and later rounds add BOND, CapitalG, Conviction, 01A, BoxGroup, and NVIDIA. That pattern matters because it suggests both repeated insider support and a widening set of AI-specialist and platform investors. At the same time, public disclosure still stops well short of a full cap-table view: the fetched corpus does not expose current ownership percentages, investor control rights, liquidation preferences, or a fully reliable board-observer map.[CO016, CO017, CO018, CO019, CO020, CO021]

Stakeholder or investor map
Investor / stakeholder	First explicit round in corpus	Current relevance	Why it matters	Diligence ask
Greylock	Series A	Earliest clearly named institutional backer in the fetched official corpus	Anchors early company formation and remained visible through later growth-round narratives.	Confirm current ownership and pro-rata participation after Series E.
South Park Commons	Series A	Early network backer that still appears in later company histories	Signals founder-network support rather than pure late-stage capital.	Confirm whether SPC still holds a meaningful stake after multiple step-up rounds.
IVP	Series B	Growth-stage repeat lead that also led or anchored later rounds	Appears repeatedly across B, C, and E, making it one of the clearest long-duration financial sponsors.	Confirm board rights, reserve behavior, and concentration at Series E.
Spark Capital	Series B	Early growth investor visible across the 2024–2025 rounds	Helps show continuity from the generative-AI scaling phase into later financing momentum.	Confirm current stake and whether Spark still participates after Series E.
01A	Series C	Later-stage investor that remains named in subsequent financing data	Links Baseten to operator-investor sponsorship from Adam Bain and Dick Costolo’s network.	Confirm whether 01A has governance rights or only economic exposure.
BOND	Series D	Series D lead and later participant in Series E	Important marker for the 2025 valuation step-up and board evolution.	Confirm whether BOND added special terms at the D-to-E transition.
CapitalG	Series D	Joined in D and co-led E	Potentially valuable strategic network around Google ecosystem distribution and infrastructure credibility.	Clarify commercial partnerships or channel overlap beyond pure equity sponsorship.
NVIDIA	Series E	Strategic investor in the latest round	Could matter for hardware access, performance collaboration, and signaling inside AI infrastructure.	Confirm whether the relationship includes commercial commitments or preferred hardware access.
Conviction	Series C	Visible across C, D, and E-era disclosure	Adds AI-specialist sponsorship and public advocacy for the inference-layer thesis.	Confirm present ownership and board or observer rights.
BoxGroup	Series D	Still named in later-round investor rosters	Shows continued support from earlier network investors even as the cap table deepens.	Confirm position size versus symbolic participation in later rounds.

This map enumerates investors explicitly named across verified public financing sources from Series A through Series E; it is not a full cap table and does not expose ownership percentages or liquidation preferences.

[CO016, CO017, CO018, CO019, CO020, CO022]

1.3 Product Scope, Scale Proof, and Milestones

The product and scale story is strong enough to explain why Baseten could raise three times in roughly a year, but it still needs to be read with some caution. Official materials now tie the company to a broad inference platform narrative: cloud and self-hosted deployment options, enterprise controls, model APIs, and pay-as-you-go compute pricing. The strongest external proof comes from customer and partner evidence. NVIDIA’s case study says Baseten cut cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM. Customer case studies claim that OpenEvidence runs billions of requests per week on Baseten, Gamma serves roughly 3 million images per day for 70+ million users, Speechify cut cost per million characters by 44%, and Patreon cut GPU cost by 70%. Those numbers support the idea that Baseten is serving meaningful production workloads across healthcare, productivity, creator, and GTM software. Still, chapter-one diligence should preserve three caution flags. Public headcount is inconsistent across data vendors, governance detail remains thin outside financing announcements, and independent reachability monitoring does not provide enough incident detail to fully evaluate reliability history. So the right takeaway is not that Baseten lacks scale; it is that the business looks operationally significant while remaining unusually private relative to its valuation.[CO005, CO006, CO007, CO008, CO021, CO028]

Milestone table
Date	Event	Type	Amount / valuation / status	Participants	Implication
2019-01-01	Founders start Baseten to solve ML deployment pain	founding	Company formed	Tuhin Srivastava; Amir Haghighat; Phil Howes; Pankaj Gupta	Establishes the origin point for the current inference-first narrative.
2021-05-01	Early product is quietly announced after roughly 18 months of building and public beta is highlighted in the Series A post	product	Public beta era	Baseten founders	Shows that the company moved from internal build mode into market testing well before the later capital sprint.
2022-04-26	Series A milestone formalizes early investor backing	financing	> $20M cumulative seed + Series A	Greylock; South Park Commons; Lachy Groom; Ray Tonsing; angel investors	Validates early demand for the original model-deployment product vision.
2024-03-04	Series B adds growth-stage capital	financing	$40M	IVP; Spark; Greylock; South Park Commons; Lachy Groom; Base Case	Moves Baseten from early MLOps roots into broader generative-AI infrastructure expansion.
2025-02-19	Series C pairs funding with public scale claims	scale	$75M; workloads across thousands of GPUs; millions of end customers	IVP; Spark; Greylock; Conviction; South Park Commons; Basecase; 01A-linked investors	Demonstrates that infrastructure scale became central to the story before the late-stage jump.
2025-09-05	Series D raises new growth capital and adds a board member	governance	$150M at about $2.15B valuation	BOND; CapitalG; Conviction; Jay Simons	Capital formation and governance maturation start to move together.
2026-01-14	WorkOS interview highlights a startup program and voice as an emerging modality	product	New GTM program and voice focus	Philip Kiely; WorkOS interviewers	Suggests Baseten is broadening market coverage and prioritizing voice workloads in the next phase.
2026-01-20	Series E establishes Baseten as a late-stage inference platform company	financing	$300M at $5B valuation; third fundraise in prior year	IVP; CapitalG; NVIDIA; 01A; Altimeter; Battery Ventures; BOND; BoxGroup; Blackbird Ventures; Conviction; Greylock	Confirms investor appetite for independent inference infrastructure at significant scale.

Year-only dates use January 1 and month-only dates use the first day of the cited month when the fetched public source supports the timing but not a precise day.

[CO001, CO009, CO016, CO017, CO018, CO019]

FO001: Company milestone timeline

The chronology shows Baseten moving from a 2019 founding into a compressed A-to-E financing sequence and a broader inference-platform narrative by early 2026.

Year-only or month-only milestones use January 1 or the first day of the cited month when the fetched source does not provide a precise public day.

[CO001, CO016, CO017, CO018, CO019, CO020]

FO003: Scale and proof KPIs

These KPIs are not internal financial statements; they are the clearest public scale and customer-outcome markers visible in the fetched corpus.

Customer metrics come from individual case studies and should be read as proof points rather than a consolidated operating dashboard for Baseten itself.

[CO025, CO027, CO031, CO032, CO033, CO034]

1.4 Exhibits

Chapter 02

02Market Analysis

2.1 Market Boundary, Included Spend, and Substitutes

Baseten is best understood as a production inference platform, not a general-purpose cloud or a model lab. The included spend is the layer required to package, deploy, run, monitor, meter, and secure AI workloads after a team already has a model or a model endpoint in mind: model APIs, dedicated deployments, compound-AI orchestration, observability, billing, and the support needed to keep latency and uptime inside production targets. The market boundary is narrower than the full enterprise-AI stack because Baseten does not market data lakes, BI, generic application development, or broad agent productivity suites as the center of its value proposition. It is also narrower than frontier-model R&D because Baseten helps teams operationalize models rather than invent them. The closest substitutes are hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod. Baseten's adjacencies sit one step above and below the core deployment layer: training that promotes directly into inference, and model-lab monetization through white-labeled APIs.[CM001, CM002, CM003, CM004, CM005, CM006]

Market definition table
Segment / Category	Included Spend	Excluded Spend	Buyer / Payer	Relevance to Baseten
Production inference platform	Model-serving runtime, autoscaling, observability, billing, support, security controls	Foundation-model R&D, generic data warehousing, generic app-dev tools	AI product lead or platform team; product or IT budget pays	Core market where Baseten is explicitly positioned
Model APIs	Usage-based inference endpoints, token metering, OpenAI-compatible access	Closed-model ownership or application-layer SaaS feature spend	Application engineer or product team; engineering budget pays initially	Low-friction entry point and evaluation wedge
Dedicated / self-hosted inference	Dedicated GPU capacity, self-hosting, data residency, enterprise support	General colocation, generic Kubernetes services, unmanaged GPU reservations	Head of AI platform, CIO/CTO, security or procurement stakeholders	Enterprise expansion path for sensitive or scaled workloads
Compound AI orchestration	Multi-model chains, hardware-aware orchestration, workflow optimization	Generic workflow automation or iPaaS tools	ML engineer or application engineering lead; product/platform budget pays	Important expansion layer for multimodal and agentic workloads
Model-lab API monetization	White-labeled API endpoints, rate limits, API keys, billing and metering	Consumer billing software or payment processors	Model lab or frontier-model product team; R&D/platform budget pays	Distinct segment exposed by Frontier Gateway
Training-to-inference loop	Managed training jobs and checkpoint promotion into inference	Frontier research spend and pure experimentation without deployment intent	ML research lead or platform team; R&D budget pays	Adjacency that deepens platform stickiness but is not the primary market lens

Boundary rows mix Baseten's core market with immediately adjacent spend layers. The point is to define what belongs in the commercial wedge before applying public market-size estimates.

[CM001, CM002, CM003, CM004, CM006, CM008]

2.2 Multiple Sizing Lenses and Why They Do Not Collapse into One TAM

Public sizing evidence is directionally strong but category boundaries remain messy. Technavio's narrower AI inference-as-a-service category is already a large market at USD 85.25 billion in 2025 and is forecast to grow 22.1% annually through 2030. Fortune Business Insights uses a broader AI inference lens and puts the category at USD 117.80 billion in 2026, while Mordor Intelligence sizes the adjacent enterprise AI market at USD 114.87 billion in 2026 and shows that software or platform layers and cloud deployment dominate spending. Those numbers should not be added together, because they partially overlap and use different definitions, but they do triangulate the same conclusion: Baseten is not pursuing a niche budget line. The useful valuation discipline is to keep moving down the stack from broad AI platform spend to the narrower production-inference wedge where Baseten actually competes. That narrower wedge is cloud-first, platform-heavy, North-America-concentrated, and already large enough that Baseten does not need heroic market-share assumptions to justify a meaningful opportunity. What public data does not provide is a clean Baseten-specific SAM or SOM.[CM009, CM010, CM011, CM012, CM013, CM014]

TAM / SAM / sizing lens table
Lens	Publisher	Base year / horizon	Geography	Metric	Value	Limitation
AI inference-as-a-service market size	Technavio	2025 base, 2026-2030 forecast	Global	Market size	USD 85.25B	Narrower service-based inference category; summary page only
AI inference-as-a-service growth	Technavio	2026-2030	Global	CAGR	22.1%	Forecast, not current revenue; category scope differs from broader inference reports
Broader AI inference market size	Fortune Business Insights	2026	Global	Market size	USD 117.80B	Broader execution market spanning cloud, edge, and on-prem
Broader AI inference growth	Fortune Business Insights	2026-2034	Global	CAGR	12.98%	Longer forecast horizon than Technavio
Adjacent enterprise AI market size	Mordor Intelligence	2026	Global	Market size	USD 114.87B	Much broader than inference alone
Platform-heavy slice inside enterprise AI	Mordor Intelligence	2025	Global	Software/platform share	65.89%	Share of broader enterprise AI market, not Baseten-specific SAM
Cloud-first deployment lens	Mordor Intelligence	2025	Global	Cloud deployment share	67.33%	Share of enterprise AI revenue, not inference-only spend
Large-enterprise buyer concentration	Mordor Intelligence	2025	Global	Large-enterprise share	71.43%	Useful for buyer concentration, not direct TAM
Regulated-healthcare growth wedge	Mordor Intelligence	2026-2031	Global	Healthcare CAGR	20.77%	Vertical growth lens rather than whole-market size

These rows are sizing lenses, not additive market totals. Public sources use overlapping definitions, so the safest use is triangulation rather than arithmetic aggregation.

[CM009, CM010, CM012, CM013, CM014, CM015]

FM001: Market sizing lens

A narrowing lens from broad enterprise AI spend toward Baseten's more defensible beachhead in performance-sensitive, compliant production inference.

This pyramid is a narrowing logic chain, not an additive model. The middle layers mix market share and market size because public sources do not publish one clean Baseten-specific hierarchy.

[CM005, CM009, CM013, CM014, CM015, CM020]

FM002: Serverless GPU price range

Published hourly GPU-rate spread across specialist providers illustrates how visible raw infrastructure pricing has become in this market.

Ranges come from HostFleet's April 2026 matrix of vendor-published prices. They are not performance-normalized benchmark results and do not include negotiated enterprise discounts.

[CM031, CM032, CM043, CM044, CM047]

2.3 Buyer, User, and Payer Segmentation

The public buyer evidence points to three especially relevant segments. First are AI-native product teams such as Gamma, where product or engineering leaders care about launch speed, low latency, and lower-cost open-model serving without building a dedicated ML-infrastructure team. Second are enterprise AI platform teams and model builders such as Writer, where the user is the ML engineer or data scientist but the deployment decision widens to include platform, security, and procurement once workloads become dedicated, multi-GPU, or compliance-sensitive. Third are regulated vertical deployments such as OpenEvidence in healthcare, where reliability, data handling, and the ability to scale without signing large GPU commitments become explicit selection criteria. Baseten's packaging supports these segments differently: usage-based plans and model APIs lower the barrier for experimentation, while Enterprise, self-hosting, SSO or SCIM, compliance policies, and billing APIs are signals that larger buyers expect governance, attribution, and controlled rollout. The budget owner is therefore partly observed and partly inferred: it likely starts in product or engineering budgets and migrates toward central platform or IT budgets as the deployment becomes more strategic.[CM016, CM017, CM021, CM022, CM023, CM024]

Segment / buyer map
Segment	Buyer	User	Payer	Workflow	Budget owner	Adoption trigger
PLG AI application team	VP Engineering or product lead	Application engineer and ML engineer	Product or engineering budget	Prototype with Model APIs then move to dedicated capacity	Product engineering	Launch-day latency, reliability, and lower open-model cost
AI-native startup platform team	CTO or Head of AI	ML engineer and data scientist	Infrastructure or platform budget	Replace closed-model dependence with managed open-model serving	Engineering / platform	Need performance without hiring an infra-specialist team
Large-enterprise AI platform team	Head of AI platform, CIO, or CTO	Platform engineer, ML engineer, data scientist	Central platform or IT budget	Deploy compliant production inference across business units	Platform / IT	Dedicated capacity, SSO/SCIM, compliance policy, cloud commitments
Regulated healthcare AI workload	VP Engineering, CTO, or clinical-product leader	ML engineer or application engineer	Platform plus security/compliance budget	Medical search, transcription, or patient-facing assistant deployment	Platform plus security	HIPAA, uptime, and data-control requirements
Model lab or proprietary-model vendor	Research-product leader or commercialization lead	Inference engineer and research engineer	R&D or platform budget	White-labeled API monetization through Frontier Gateway	R&D / platform	Need to sell inference without building a customer-facing control plane
Compound AI / multimodal team	Head of AI application or staff engineer	Full-stack engineer plus ML engineer	Product plus platform budget	Chains-based orchestration across multiple models and machines	Product / platform	Latency and GPU waste from monolithic deployments

Buyer and budget-owner fields combine directly stated product packaging with cautious inference from customer stories. Public evidence is stronger on user and trigger than on exact signature authority.

[CM016, CM021, CM022, CM023, CM024, CM025]

FM003: Segment fit heatmap

Qualitative fit heatmap showing where Baseten's compliance, cloud, and premium-support proposition appears strongest by segment.

Ratings synthesize public evidence on cloud deployment share, healthcare growth, compliance requirements, and visible pricing pressure; they are not measured win-rate data.

[CM015, CM017, CM029, CM043, CM046, CM047]

2.4 Deployment Value Chain and Adoption Path

Baseten's value chain begins with either an open-source model, a custom model, or a proprietary model that already exists and needs to be turned into a dependable production service. From there the daily user is usually a model, platform, or application engineer who packages the workload and evaluates latency, throughput, and cost. The next gate is organizational rather than technical: security, compliance, and procurement checks intensify when the workload needs dedicated capacity, data residency controls, or identity integration. Baseten then sits in the orchestration layer that decides whether the workload runs through Model APIs, Dedicated Inference, Chains, Frontier Gateway, or a self-hosted or hybrid deployment. Under that layer sits the actual cloud and GPU substrate, which remains economically critical because capacity, price, and regional availability directly determine margin and reliability. Baseten's customer stories suggest that the company tries to move value capture upstream from raw GPU rental into performance engineering, deployment tooling, and operational support, because those are the layers customers cite when they explain why they did not keep building in-house.[CM021, CM025, CM027, CM028, CM029, CM030]

FM004: Deployment value-chain map

Baseten sits between model creation and end-user traffic, trying to capture value above raw GPU supply by owning deployment, controls, and performance operations.

This value chain is synthesized from product packaging, customer stories, and competitor documentation; it is a market-structure diagram, not an internal process map from Baseten.

[CM021, CM025, CM027, CM028, CM029, CM030]

2.5 Growth Drivers, Adoption Constraints, and Market Discipline

The strongest growth drivers are clear in both vendor and analyst material. Open-source models are improving fast enough that product teams increasingly want infrastructure optimized for those models rather than closed-model dependence; real-time and compound-AI workloads make latency and throughput economically visible; and enterprise buyers are moving from pilots into production, especially where regulated data, uptime, or model-performance tuning matter. Baseten's own case studies back that thesis with concrete claims on latency, tokens per second, image throughput, and reduced maintenance burden. But this is not an easy market. Hardware supply constraints, tariffs, and high accelerator prices remain structural headwinds. Talent shortages and legacy-system integration complexity slow rollout for enterprise buyers. Public competitive pricing also shows that raw GPU-hour economics are unforgiving: cheaper specialist clouds and broader hyperscaler suites both put pressure on a standalone inference vendor. That is why the right market view for Baseten is not all AI infrastructure; it is the subset of inference workloads where premium support, compliance, and performance matter enough to offset higher headline price points. Public data supports that wedge, but not yet a precise economic moat or a cleanly measurable serviceable market.[CM031, CM032, CM033, CM034, CM035, CM036]

Growth drivers and constraints table
Factor	Direction	Timing	Implication	Diligence ask
Open-source frontier models improve price/performance	Driver	Now	Makes specialist inference platforms more attractive than closed-model APIs for cost-sensitive products	What percentage of Baseten revenue already comes from open-model workloads?
Real-time and compound-AI latency sensitivity	Driver	Now	Raises willingness to pay for performance engineering, orchestration, and autoscaling	How much of usage is latency-critical versus offline batch?
Cloud-first enterprise AI deployment	Driver	Now	Supports adoption of managed inference rather than self-built infra for many teams	How much of Baseten demand comes from cloud-first versus self-hosted accounts?
Regulated-sector demand for compliance and data control	Driver	Now	Creates a wedge for HIPAA, region restrictions, and hybrid/self-hosted deployment	What share of enterprise pipeline requires regulated deployment boundaries?
GPU supply constraints and tariff pressure	Constraint	Now	Raises cost of goods sold and can limit capacity availability	What reserved-capacity strategy or cloud diversification protects supply?
Skills gaps and integration complexity	Constraint	Medium term	Slow enterprise rollouts and increase implementation burden	How much deployment work is productized versus services-heavy?
Price competition from specialist GPU clouds	Constraint	Now	Commodity GPU-hour comparisons can make Baseten look expensive on paper	Where does Baseten consistently win despite higher list prices?
Hyperscaler platform bundling	Constraint	Medium term	Broader native-cloud suites can absorb spend that might otherwise go to a specialist inference vendor	Which workloads truly require a specialist rather than a hyperscaler-native stack?
Opaque unit economics and support attachment	Constraint	Diligence now	Public material does not show whether premium positioning translates into durable margin	Request product-level gross margin and enterprise discount data.

The driver and constraint rows mix third-party market reports, Baseten product claims, customer evidence, and an independent pricing matrix. They are intended as a diligence framework, not a weighted scoring model.

[CM031, CM032, CM033, CM034, CM038, CM039]

2.6 Exhibits

Chapter 03

03Competitors

3.1 Competitive landscape and job-to-be-done coverage

Baseten sits in a crowded middle layer between low-friction serverless inference peers and large-cloud incumbents. The closest direct substitutes are Modal, Replicate, and Runpod: all give developers a way to get models onto GPUs without owning the infrastructure outright, but each compresses the stack in a different way. Modal optimizes for Python-native serverless compute, Replicate for community models and ultra-low-friction APIs, and Runpod for cheap raw capacity through Pods and Serverless. Above them sit AWS Bedrock/SageMaker, Google Vertex AI, and Azure ML, which compete less on indie-developer ergonomics and more on procurement leverage, governance, and existing cloud commitments. Below them sits the status-quo alternative: internal build on top of open packaging standards and rented GPUs. Baseten broadens the battlefield further because it sells not just deployment, but also training, multi-step orchestration, and white-labeled API monetization for model labs. That breadth means the company is not in a two-vendor race; independent datasets and company materials alike point to a fragmented, multi-class landscape where buyers can substitute across hosted inference, raw compute, hyperscaler tools, or self-managed stacks depending on whether they optimize for speed, control, trust, or cost.[CP001, CP002, CP004, CP005, CP006, CP007]

Competitor profile table
Competitor	Category	Scale/funding	Target segment	Differentiation	Limitation
Modal	Direct serverless peer	$30/mo Starter credit; Team plan at $250/mo + compute	AI engineers and startups	Python-first serverless DX, instant autoscaling, observability	No public self-host option; enterprise controls concentrated in paid tiers
Replicate	Direct hosting/API peer	Thousands of community models; custom deployment via Cog	Developers, prototyping teams, model tinkerers	One-line API, model marketplace, fine-tunes	Private models bill setup + idle time and public enterprise posture is thinner
Runpod	Raw GPU cloud / serverless substitute	750,000+ developers; Pods + Serverless + Clusters	Cost-sensitive AI builders and infra-heavy teams	Cheapest published raw GPU rates, many SKUs, fast scale	More DIY serving stack and less turnkey inference lifecycle tooling
AWS Bedrock / SageMaker	Hyperscaler incumbent	AWS-scale data/AI platform with provider/model menu	Enterprises already committed to AWS	Procurement leverage, governance, wide ecosystem	Complex pricing and stronger cloud lock-in
Google Vertex AI	Hyperscaler incumbent	200+ Google and third-party models/tools	GCP enterprise and platform teams	Model Garden, pipelines, integrated data + AI stack	Management fees and GCP dependence complicate simple cost comparisons
Azure ML	Hyperscaler incumbent	Azure-native ML platform with 99.9% SLA	Azure-centric and regulated enterprises	Centralized studio, model catalog, Azure security posture	Separate Azure service charges and no public multi-cloud story
Internal build (Truss/Cog + rented GPUs)	Status quo / internal build	Portable open-source packaging on owned or rented infrastructure	Teams with strong platform engineering capacity	Maximum control and lowest software lock-in	Highest operational burden for scaling, reliability, and compliance
Model labs building branded APIs	Adjacent / likely entrant	Direct API ownership with custom billing and metering surfaces	Frontier model vendors and specialized labs	Own brand, own customer relationship, direct monetization	Hard to maintain capacity planning and enterprise operations without a managed partner

Rows compare the main ways buyers can solve the same deployment job, including direct peers, incumbents, and internal-build substitutes.

[CP004, CP005, CP006, CP007, CP008, CP009]

FP001: Competitive positioning map

Ordinal scoring of self-serve simplicity versus enterprise control / portability.

[CP004, CP006, CP007, CP023, CP025, CP028]

3.2 Capability and pricing comparison

Baseten compares best when the buyer wants a managed inference platform rather than just a GPU rental or a one-line demo API. Public materials show a stack that combines custom-model packaging, OpenAI-compatible Model APIs, training, Chains orchestration, enterprise deployment modes, and a runtime built around low-latency optimization techniques. Modal is the sharpest developer-experience counterpoint: clean serverless pricing, generous monthly credits, and explicit GPU concurrency limits make it compelling for teams that mainly need elastic Python compute. Replicate is even lighter weight for prototypes and model discovery, but its private-model economics include setup and idle time on dedicated hardware. Runpod is the price-floor alternative, publishing cheaper raw hourly and per-second GPU rates while leaving more of the serving lifecycle to the customer. Hyperscalers are harder to compare on a like-for-like basis because Bedrock, Vertex, and Azure ML wrap model access in broader cloud billing, governance, and platform fees. Net: Baseten's public list pricing is transparent and feature-rich, but it clearly sells performance, portability, and support rather than commodity compute. That is a valid wedge only if customers value total production outcomes over the cheapest published GPU-hour.[CP003, CP011, CP012, CP013, CP014, CP015]

Feature / capability matrix
Buying criterion	Baseten	Modal	Replicate	Runpod	Bedrock / SageMaker	Vertex AI	Azure ML
Custom-model packaging framework	Truss	Python functions	Cog	Container / handler model	Custom training + deployment	Custom training + deployment	Model catalog + deployment
OpenAI-compatible hosted open models	yes	unknown	partial	unknown	partial	partial	partial
Managed training on same platform	yes	unknown	fine-tunes only	yes	yes	yes	yes
Self-host / customer-cloud option	yes	unknown	unknown	unknown	no public BYOC outside AWS	no public multi-cloud option	no public multi-cloud option
Multi-cloud / cloud-agnostic routing	yes	yes	unknown	many regions / no lock-in claim	no	no	no
Enterprise trust posture	SOC2 + HIPAA + single-tenant	Enterprise SSO / audit / HIPAA	unknown	SOC2 Type II	enterprise governance	enterprise governance	99.9% SLA + Azure controls
Multi-step orchestration built in	Chains	generic functions	custom code only	queues + serverless	broader platform services	pipelines + agents	broader ML studio
Public list pricing transparency	high	high	medium	high	medium	medium	low

Cells marked unknown reflect missing public evidence; the matrix compares buying criteria, not benchmark winners.

[CP013, CP014, CP015, CP019, CP020, CP021]

Pricing / packaging comparison
Vendor	Public model	Contract model	Price signal	Included capabilities	Implication
Baseten Basic	Custom + open-source deployment	$0/mo + usage	Public GPU and per-token tables; no idle-charge claim	Dedicated deployments, Model APIs, training	Transparent entry point for production workloads
Baseten Pro / Enterprise	Quoted	Sales-led / discounted	Priority compute, custom SLAs, self-host, volume discounts	Dedicated support, data residency, enterprise controls	Upsell is breadth and support, not lower public list price
Modal Starter	Serverless compute	$0 + compute	$30/mo credits; 10 GPU concurrency	Logs, region selection, serverless primitives	Excellent prototype and small-team on-ramp
Modal Team	Serverless compute	$250/mo + compute	$100/mo credits; 50 GPU concurrency	Custom domains, static IP, rollbacks	Scales with startups while staying compute-centric
Replicate private models	Dedicated hardware for custom models	Per-second compute incl. setup / idle / active	No fixed seat plan; pay while instance is online	Custom models via Cog, autoscaling	Warm custom deployments can get expensive
Runpod Secure Cloud	Raw GPU instances	Per-hour GPU rental	Example list rates include A100 at $1.39/hr and H100 PCIe at $2.89/hr	Reliable pods, broad GPU menu	Cost floor for buyers willing to self-manage
Runpod Serverless	Flex or active workers	Per-second	H100 flex $0.00116/s and active $0.00093/s	API endpoints, queues, fast cold starts	Attractive for bursty inference and scale-to-zero workloads
AWS Bedrock	Provider/model-specific APIs	Per-token + tiered service	Batch listed at 50% below on-demand	Managed model access plus AWS add-ons	Easy incumbent API path but bill complexity is higher
Google Vertex AI	Agent/model platform	Usage + compute + fees	Compute/storage plus management fees; pipelines at $0.03/run	Notebooks, Model Garden, pipelines	Best fit inside existing GCP estate
Azure ML	Azure-native ML platform	Consumed Azure services	No standalone Azure ML fee	Studio, model catalog, deployment	Procurement advantage for Azure-first buyers

Public list pricing is comparable only at the headline level; negotiated discounts and workload-specific costs remain private across most enterprise deals.

[CP011, CP012, CP016, CP017, CP018, CP019]

FP002: Feature breadth / capability map

Class-level capability strengths rather than vendor-by-vendor benchmark claims.

[CP014, CP023, CP024, CP025, CP028, CP031]

3.3 Distribution power, switching costs, and trust posture

Baseten's strongest non-performance argument is that customers can keep control while still avoiding the operational burden of a self-built inference platform. Multi-cloud routing, self-hosted deployment, and single-tenant options help win buyers who fear locking critical workloads into one hyperscaler or who need tighter data-residency boundaries. The tradeoff is structural: because Baseten also relies on portable, open packaging and because adjacent tools like Cog plus raw GPU clouds remain available, hard switching costs stay lower than in closed-model APIs or data platforms. On trust, Baseten's public posture is ahead of the self-serve peer set: it pairs SOC 2 Type II and HIPAA claims with explicit statements about not storing inputs or outputs by default. Modal narrows part of that gap through enterprise-only SSO, audit logs, and HIPAA; Replicate stays strongest on ease of adoption; Runpod stays strongest on low-cost infrastructure freedom. The biggest distribution disadvantage remains hyperscaler channel power. AWS, GCP, and Azure can fold AI procurement into pre-existing billing, IAM, and cloud-commitment relationships, which means Baseten must keep proving that open-model performance, portability, and support justify a separate vendor decision.[CP023, CP024, CP025, CP026, CP027, CP028]

3.4 Moat durability and competitive risk

Baseten's moat is real but softer than a proprietary-model or data-network moat. The best-supported edge is integrated execution: optimized runtimes, multi-cloud capacity, regulated deployment options, and hands-on engineering support aimed at teams shipping serious AI products. Training and Frontier Gateway broaden the product into a larger platform story, which can strengthen account control if customers standardize on one vendor from model development through branded API delivery. The countervailing evidence is meaningful. HostFleet's April 2026 serverless GPU matrix shows Baseten as the most expensive published option across multiple common GPU tiers, while Sacra explicitly warns that hyperscalers can pressure independent vendors by bundling inference into broader cloud commitments. Baseten's own status page reports 99.91% Model API uptime over its displayed window and multiple May 2026 incidents, so four-nines reliability should be read as a sales target rather than proof that operations are already frictionless. Capital helps but does not eliminate the problem: the latest financing and investor roster improve staying power in a capital-intensive market, yet the core underwriting conclusion remains that Baseten is best positioned for premium, production-grade open-model inference workloads—not for winning the commodity price war on raw GPU hours.[CP031, CP032, CP033, CP034, CP035, CP036]

Moat durability / competitive risk register
Moat claim	Threat	Severity	Mitigation / diligence ask
Integrated runtime + orchestration stack	Serverless peers replicate parts of the DX without matching full breadth	medium	Benchmark full application workflows, not just one endpoint latency number
Multi-cloud + self-host portability	Customers can multi-home or migrate away more easily than in closed platforms	medium	Measure retention and expansion by deployment mode to see if portability still converts into durable spend
Enterprise trust posture	Hyperscalers bundle governance into existing contracts and cloud commitments	high	Collect regulated-industry win/loss notes against AWS, GCP, and Azure
Training + gateway expansion	Baseten now competes with broader AI platform vendors and model-lab tooling	medium	Quantify how often training and gateway products lead to incremental inference revenue
Open-source packaging	Truss lowers lock-in and shrinks switching cost	high	Track Truss-to-paid conversion and production retention by cohort
Price premium versus raw GPU clouds	Runpod and similar hosts undercut headline infrastructure price	high	Prove total-cost-of-ownership and reliability ROI with customer benchmarks
Reliability brand	Recent incidents weaken the four-nines story in competitive deals	medium	Review incident frequency, MTTR, and customer multi-region failover patterns
Capital backing	Hyperscalers and other well-funded infra vendors can still outspend Baseten	medium	Validate whether investor and GPU-supply ties create real commercial or capacity advantage

Severity reflects the likelihood that each threat compresses pricing power or raises customer acquisition friction over the next 12–24 months.

[CP023, CP024, CP025, CP029, CP030, CP031]

FP003: Moat / readiness KPIs

Compact readout of Baseten's competitive durability and current pressure points.

[CP031, CP033, CP035, CP038, CP039, CP040]

3.5 Exhibits

Chapter 04

04Financials

4.1 Revenue model and public pricing

Baseten's public financial story starts with a straightforward revenue design: the company charges for production inference and adjacent infrastructure usage, not for seats. The pricing page exposes three commercial surfaces—dedicated deployments, Model APIs, and Training—and wraps them in a Basic self-serve tier plus quote-based Pro and Enterprise packaging. Model APIs are priced per million tokens, dedicated deployments are billed for compute used down to the minute, and Training is sold as both managed Training Jobs and the newer Loops workflow that feeds checkpoints straight into inference. That is a coherent production-infrastructure monetization model, and the billing-usage API reinforces it by splitting spend across dedicated inference, training, and Model APIs with daily breakdowns and credits used. The nuance is that list pricing is only the outer shell of the model. Public materials show that the real commercial wedge sits in premium support, priority compute, self-host deployment, use of existing cloud commitments, custom SLAs, and advanced security or governance. Those features imply Baseten is trying to monetize both usage and a higher-touch enterprise motion. The Terms also make the customer Order the binding commercial instrument, which means realized pricing can diverge materially from the public price page once support, discounts, and minimum commitments enter the deal. That is important for revenue recognition and yield analysis because public list prices are observable, but list-to-net economics are not. The result is a credible revenue architecture with decent public visibility into billing units and weak visibility into realized revenue quality. Public evidence can show how Baseten intends to charge; it cannot show revenue mix, attach rates for services, or what customers actually pay after enterprise negotiation.[CI001, CI002, CI003, CI004, CI005, CI006]

Revenue streams table
stream	mechanism	unit	current value/status	quality	diligence ask
Dedicated deployments	Per-workload GPU compute plus deployment controls and support	GPU-minute / contract	Public list pricing and quote-based Pro/Enterprise packaging exist	Medium for billing unit; low for realized price	Provide customer-level realized rate cards, discounts, and gross margin by GPU family.
Model APIs	Usage-priced hosted models through OpenAI-compatible endpoints	1M tokens	Public list pricing exposed with separate input, cached-input, and output columns	High for list unit; low for realized yield	Provide token volume by model, cache share, batch share, and realized net revenue per token.
Training Jobs / Loops	Managed training workloads that connect directly into inference deployment	GPU-minute / job / contract	Commercial surface exists, but public list pricing is not disclosed	Low	Provide training-job pricing schedule, contribution margin, and attach rate into production inference.
Support and engineering services	Hands-on engineering, Slack/Zoom support, deployment optimization, and enterprise assistance	service attachment / contract	Clearly present in Pro and Enterprise messaging, but no standalone rate card	Low	Provide services attach rate, blended pricing, and whether support is margin-accretive or subsidized.
Enterprise self-host / cloud-commitment portability	Baseten software and support layered into customer-cloud or hybrid deployments	custom contract	Publicly marketed as a key enterprise feature set	Low	Provide typical annual contract value, minimum commit, and renewal behavior for self-hosted accounts.

Public evidence supports revenue surfaces and billing units, but not product-level revenue mix or realized pricing.

[CI001, CI002, CI003, CI004, CI005, CI006]

Pricing / monetization table
price / unit / contract	list vs realized pricing	discounts / unknowns	source-backed implication
Basic: $0 per month pay-as-you-go	Pure list price	No public conversion, ARPU, or activation data	Self-serve entry point broadens funnel but says nothing about paid conversion.
Pro: quote-based with priority compute, dedicated compute, higher API rate limits, dedicated support	List package, realized price hidden	Volume discounts available but depth undisclosed	Revenue quality likely depends on how often support and priority capacity attach to usage.
Enterprise: quote-based self-host, custom SLAs, cloud commitments, data residency, advanced RBAC	List package, realized price hidden	No public minimum commit, renewal terms, or services pricing	Enterprise value proposition is operational control, not transparent SKU pricing.
Model APIs: per-1M-token pricing with separate input, cached input, and output rates	List pricing	No enterprise rate card, batching curve, or mix data	Useful for benchmarking, but list token prices are not realized revenue.
Dedicated compute: billed by compute used down to the minute	List billing rule	HostFleet says minimum dedicated deployment cost and billed awake times still apply	Scale-to-zero helps, but minimums can compress savings for spiky workloads.
Fees and invoicing: billed end-of-month and due in 30 days unless an Order says otherwise	Contract rule rather than product list price	Orders govern actual economics	Revenue recognition and payment timing likely vary by negotiated enterprise order form.

This table separates public list mechanics from private realized economics; all discount, minimum-commit, and enterprise-rate questions remain open.

[CI002, CI003, CI004, CI005, CI006, CI007]

FI001: Revenue model bridge

Baseten turns usage across dedicated deployments, Model APIs, and Training into metered spend, then converts a subset into higher-value enterprise and support contracts.

Flow depicts commercial logic rather than a quantified waterfall; public evidence does not disclose revenue mix or realized contract values.

[CI001, CI005, CI006, CI008, CI011, CI012]

4.2 GTM motion and sales-efficiency proxies

Baseten's go-to-market is best understood as land-with-usage, then expand through reliability, support, and deployment control. The Basic plan and public Model APIs create a low-friction developer entry point, but the monetization narrative shifts quickly toward Pro and Enterprise features such as dedicated compute, higher rate limits, hands-on engineering, self-hosting, and cloud-commitment portability. That packaging suggests Baseten is not trying to win a commodity self-serve race alone; it is trying to become the production-inference layer for teams that care about latency, uptime, and control enough to pay for operational help. Because Baseten is private, CAC, payback, enterprise sales cycle length, and NRR are unavailable. The best public substitutes are customer case studies. Writer credits Baseten with lower cost per million tokens and higher throughput on 70B-class models. OpenEvidence emphasizes flexible access to compute without multi-year reservations plus large deployment and maintenance-time gains. Speechify reports that it could retire a large self-managed GPU estate while cutting cost per million characters. Superhuman and Patreon frame the value proposition as saving scarce engineering time while materially improving latency or lowering GPU cost. Those are not audited financials, but they are directionally consistent with a GTM motion that sells time-to-production and lower total operating cost rather than just list-price compute. The evidence therefore supports a plausible expansion engine, but only in proxy form. The buyer logic is visible; the sales-efficiency math is not. Without internal conversion, retention, and sales-spend data, Baseten's GTM efficiency cannot be underwritten with confidence.[CI003, CI004, CI014, CI018, CI019, CI020]

4.3 Cost structure and unit-economics proxies

Baseten's public materials point to an asset-light but not necessarily low-price cost structure. Sacra describes the company as aggregating capacity across more than 15 cloud providers rather than owning GPU infrastructure outright, which should keep fixed asset intensity below a provider that buys and finances GPU fleets directly. Official materials reinforce the same model from another angle: Baseten talks constantly about multi-cloud capacity management, cross-cloud autoscaling, scale-to-zero, and running in the customer's cloud when needed. In principle, that should let the business flex supply and match cost to workload shape more tightly than a dedicated owned-fleet operator. But asset-light does not mean cheap. HostFleet's April 2026 serverless GPU matrix shows Baseten priced above Runpod on every shared SKU listed and above Modal on the shared L4 and H100 rows, while only Replicate's A100 custom deployment price sits higher among the overlapping A100 rows shown. That is the clearest adverse signal in the public record: Baseten sells a premium managed layer over raw compute. The company's rebuttal, effectively, is its own performance narrative. Dedicated Inference claims 6x better GPU utilization and 5-10x lower costs on optimized runtimes; Model APIs claim 5-10x lower spend versus closed alternatives; customer studies report lower per-unit cost, fewer engineers, or both. Those claims are consistent with a thesis that Baseten expands gross margin through utilization and support attachment, but public data still stops short of a gross-margin proof. That leaves unit economics in proxy territory. We can see the billing unit. We can see that some customers say total cost fell. We can see that list pricing is premium to raw GPU clouds. What we cannot see is the realized balance between cloud pass-through, support labor, negotiated discounts, and retention.[CI006, CI007, CI015, CI016, CI017, CI018]

Unit economics table
metric	value / public proxy	confidence	why it matters	diligence ask
Published billing unit	GPU-minute for dedicated inference; per-1M-token for Model APIs	High	Shows Baseten monetizes usage rather than seats.	Provide realized billing mix by workload type.
Company-claimed utilization lever	6x better GPU utilization and 5-10x lower costs on Dedicated Inference	Medium	If true, utilization is the core gross-margin lever.	Provide before/after utilization histograms and gross margin by optimized runtime.
Price-floor pressure	HostFleet shows Baseten premium to Runpod on shared SKUs and premium to Modal on shared L4/H100 rows	Medium	Premium pricing must be justified by lower total cost, not raw GPU-hour parity.	Provide win/loss analyses where Baseten beats cheaper raw-GPU alternatives.
Writer proxy	35% lower cost per million tokens; 60% higher tokens/sec; 23% lower TTFT	Medium	Suggests performance optimization can offset premium list pricing.	Provide benchmark methodology and comparable customer gross margin impact.
OpenEvidence / Speechify proxy	78% lower latency, 6x faster deployment, 8x lower maintenance, 44% lower cost per million characters	Medium	Supports TCO argument through both infra savings and fewer platform engineers.	Provide audited customer expansion and retention data after migration.
Patreon / Superhuman proxy	$600k resources saved yearly, 70% GPU-cost savings, 80% lower latency, multiple engineers freed	Medium	Shows economic value can sit in labor efficiency as well as compute savings.	Provide cohort-level NRR and services attach for customers citing labor savings.
Gross margin / CAC / NRR	Not disclosed publicly	Low	Without these, no public unit-economics bridge closes.	Provide gross margin by product line, CAC, payback, churn, and NRR.

Rows mix official list mechanics with customer-proof proxies and independent price-floor checks; none substitute for disclosed gross-margin data.

[CI005, CI006, CI015, CI018, CI019, CI020]

FI002: Unit economics bridge

Public unit-economics evidence runs from workload shape to utilization and support economics, but breaks before gross margin because realized discounts and COGS are private.

The bridge is directional. Public evidence supports the nodes qualitatively or via case-study proxies, but not a closed margin equation.

[CI007, CI015, CI018, CI019, CI020, CI021]

4.4 Capital adequacy and financing dependency

Baseten's capital position looks strong on paper and opaque in practice. The public record supports $75 million of Series C financing in February 2025, $150 million of Series D financing in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026. Business Wire, Tracxn, and CB Insights all point to roughly $585 million of cumulative capital raised, and Business Wire explicitly characterizes the January 2026 round as the company's third fundraise within the prior year. That pace matters: Baseten is clearly not operating from slow-burn cash generation; it is financing growth aggressively as inference demand expands. The use-of-funds language reinforces that interpretation. Baseten's own Series E post centers the new money on speed, uptime, developer experience, and broadening the infrastructure platform. Tech Funding News adds expected hiring, customer-support expansion, and more integrations. Public headcount proxies line up with that investment story: PitchBook's 2025 snapshot showed 73 employees, while Tracxn listed 258 employees by April 2026. Even if those datasets are imperfect, the direction is clear—Baseten appears to be scaling operating expense meaningfully alongside product and infrastructure scope. What the public record does not show is whether the current capital base is sufficient relative to burn. There is no disclosed cash balance, monthly burn, runway, debt schedule, or covenant package. Sacra's reported $200 million to $600 million annualized revenue estimates suggest substantial scale, and the reported $11 billion to $15 billion valuation talk suggests the market may be willing to finance the next leg. But those are not substitutes for cash, margin, and runway disclosure. The only hard conclusion is that Baseten has had strong access to capital; whether it is adequately capitalized against actual burn remains private.[CI027, CI028, CI029, CI030, CI031, CI032]

Capital adequacy table
metric	public value / status	source-backed implication	diligence ask
Total capital raised	$585M total capital raised publicly reported	Capital access has been strong enough to fund rapid scale-up, but cash remaining is unknown.	Provide current cash balance and unrestricted cash after the January 2026 round.
Latest financing	$300M Series E at $5B valuation in January 2026	Fresh equity materially improved flexibility entering 2026.	Provide post-close cash bridge and board-approved operating plan.
Funding cadence	Series C $75M (Feb 2025), Series D $150M (Sep 2025), Series E $300M (Jan 2026)	Three rounds inside roughly a year implies aggressive investment mode and possible dependence on capital markets.	Provide target next-round timing and financing contingency plan.
Planned use of funds	Speed, uptime, developer experience, team growth, platform expansion, more integrations and support	Spend appears aimed at product, infra, and headcount rather than harvest mode.	Provide 24-month capex / opex budget by function.
Cash balance / monthly burn / runway	Not publicly disclosed	Capital adequacy cannot be underwritten from public evidence.	Provide monthly net burn, cash balance, runway under base and downside cases.
Debt / project-finance obligations	No public debt schedule or project-finance obligations disclosed; SEC operating-company filings unavailable from cited EDGAR page	Absence of disclosure does not equal absence of obligations.	Provide all debt facilities, cloud-commitment liabilities, reserved-capacity obligations, and major vendor terms.

Funding facts are public; adequacy is not. This table intentionally separates known financing history from unknown liquidity and obligation metrics.

[CI027, CI028, CI029, CI030, CI031, CI033]

FI003: Financial estimate range

Public financial signals span wide ranges because they mix closed financing facts with third-party estimates and snapshots rather than audited financials.

Revenue and upper valuation bound are third-party estimates rather than company-disclosed audited figures. Headcount spans two different vendor datasets and dates.

[CI027, CI035, CI037, CI038, CI048]

FI004: Capital intensity / cash-flow map

Baseten's cash-flow logic appears to run from repeated equity raises into product, support, and capacity orchestration, but the residual cash position and obligations remain private.

This map shows direction of cash use and financing dependence rather than a numeric cash-flow statement because public burn and cash data are unavailable.

[CI027, CI028, CI029, CI030, CI033, CI034]

4.5 Financial verdict and disclosure gaps

The financial verdict is positive on business-model coherence and negative on underwriteable disclosure. On the positive side, Baseten clearly monetizes the right units for its product category: GPU-time, token usage, training jobs, and high-touch enterprise features. Customer proofs consistently reinforce the same story—that production-grade inference is valuable when it lowers total operating cost, shrinks engineering burden, and preserves latency or uptime under real workloads. Capital access has also been unusually strong, with three rounds in roughly a year culminating in a $300 million Series E. On the negative side, nearly every metric that turns a good story into an investment case remains private. Public sources do not show revenue mix by product, realized enterprise pricing, gross margin, CAC, NRR, churn, customer concentration, cash balance, monthly burn, or runway. The SEC EDGAR entity landing page tied to the company lookup does not provide public operating-company filings, so there is no audited financial bridge to fall back on. Reliability evidence is adequate but not spotless: the status page shows recent incidents and the SLA target is 99.9%, not the perfect uptime implied by the strongest marketing claims. Net: Baseten looks like a premium, usage-based inference platform with real demand and credible cost-saving proxies, but the public record is still too thin to underwrite revenue quality or capital adequacy rigorously. The right diligence posture is to treat the company as promising but disclosure-light until private financials close the gaps listed below. Public customer proof now spans finance workflows (Hebbia), coding products (Zed and Posit), voice interfaces (Wispr Flow), and world-model experimentation (World Labs), which strengthens the case that Baseten's usage-based revenue opportunity is diversified across several demanding production workloads rather than one narrow niche.[CI024, CI025, CI027, CI028, CI033, CI040]

Public financial gaps table
missing private metric	impact on underwriting	exact diligence path
Revenue mix by Model APIs vs Dedicated Inference vs Training vs services	Cannot judge whether growth is durable software-like expansion or support-heavy services revenue.	Request monthly revenue by product surface for the last 18 months plus contribution margin by surface.
List-to-net pricing, enterprise minimum commits, and discount schedules	Public list pricing may overstate realized yield and margin.	Review five recent enterprise orders with associated discount approvals and usage curves.
Gross margin by product line and cloud / GPU procurement terms	Impossible to assess whether optimization claims translate into retained gross profit.	Provide product-level COGS, major cloud spend by provider, and any reserved-capacity commitments.
Cash balance, monthly burn, and runway	Capital adequacy cannot be underwritten despite recent fundraises.	Provide current cash waterfall, trailing six-month burn, and scenario runway model.
Customer concentration, NRR, churn, and cohort expansion	Cannot test revenue quality or durability beyond anecdotal customer stories.	Provide top-20 customer revenue concentration, logo churn, dollar churn, and cohort NRR.
Public filing and audit trail depth	Lack of SEC operating financials leaves investors dependent on management-only materials.	Provide audited financial statements, board package KPIs, and any lender reporting packs.

Every row is a material diligence blocker rather than a nice-to-have. Public evidence establishes narrative direction, not underwriteable private metrics.

[CI047, CI050]

Chapter 05

05Product & Technology

5.1 Product Surface in Customer Workflow Terms

Baseten now spans most of the modern AI deployment workflow rather than a single hosting SKU. At the lightest-weight end, Model APIs let teams swap an OpenAI or Anthropic base URL and call shared frontier/open models immediately, which is useful for prototyping or for products that do not need dedicated GPUs. At the heavier-weight end, Truss packages a custom or open-source model into a reproducible deployment artifact, while Dedicated Inference adds tenant-isolated capacity, custom scaling, and support for stricter latency or compliance needs. Chains sits above single-model inference for multi-step RAG, transcription, or multimodal flows, Frontier Gateway adds branded URLs plus billing/rate-limit controls for model labs monetizing their own APIs, and Baseten Training/Loops try to close the loop from checkpoint creation to production serving. In customer workflow terms, the company is selling a graduated path from instant API evaluation to custom production inference, not just raw GPU rental.[CE001, CE002, CE006, CE007, CE008, CE009]

Product Module / Asset Matrix
Module / Asset	Primary user	Status / maturity	Core function	Differentiation	Diligence gap
Model APIs	Application developers evaluating or shipping frontier/open models	GA / mature shared service	Instant OpenAI- and Anthropic-compatible inference on Baseten-managed shared infrastructure	Lowest-friction entry point; built-in caching, tool calling, structured outputs, and migration path to dedicated deployments	Shared infrastructure by design; public docs do not disclose tenant-level contention controls or per-model benchmark methodology
Dedicated Inference	Teams serving custom, fine-tuned, or proprietary models in production	GA / core enterprise surface	Single-tenant or customer-controlled inference with custom hardware, scaling, and deployment options	Combines managed performance tuning, cross-cloud autoscaling, and enterprise control surfaces	Only public contractual SLA is 99.9%; published GPU-hour pricing is high versus self-serve peers
Truss	ML engineers packaging and iterating on custom deployments	Mature open-source CLI with active May 2026 releases	Packages model code, weights, dependencies, and GPU config; deploys via uvx truss push/watch	Write-once packaging abstraction with live reload and support for many serving frameworks	Open-source activity is healthy, but public docs do not quantify enterprise adoption of Truss specifically
Chains	Teams building RAG, transcription, or multi-model workflows	GA / production workflow layer	Orchestrates Python chainlets with per-step hardware, dependencies, and autoscaling	Lets Baseten sell compound-AI workflows without forcing monolithic model deployments	Public performance claims are directional; workflow-specific latency depends on design and workload
Frontier Gateway	AI labs commercializing their own hosted models	GA / specialized monetization surface	White-labeled inference API with key management, billing, metering, rate limits, and branded URL routing	Turns Baseten into invisible infrastructure for labs that want their own API brand and monetization layer	Public documentation does not describe customer count, supported billing edge cases, or settlement workflows
Training Jobs / Loops	Research and infra teams training or post-training models before deployment	Training Jobs = GA; Loops = early access	Managed GPU training plus a path to deploy checkpoints into inference endpoints	Attempts to close training-to-inference loop inside one platform rather than handing off to another vendor	Loops is still early access, so maturity is uneven versus the inference stack

Status labels reflect Baseten's own public wording as of 2026-05-30. “Mature” means a repeatedly described GA surface with operational documentation; it does not imply externally audited feature quality.

[CE001, CE002, CE006, CE007, CE008, CE009]

Workflow / Use-Case Table
User job	Current workflow	Baseten solution	Public benefit	Limitation
Prototype with a frontier/open model quickly	Swap providers without building deployment infra	Model APIs	Point an existing OpenAI or Anthropic SDK at Baseten and start calling supported models immediately	You accept the supported-model list and shared-infrastructure model rather than choosing exact hardware
Deploy an open-source or proprietary model to production	Package model, pick hardware/engine, expose stable endpoint	Truss + Dedicated Inference	Config-driven deployment path with TensorRT-LLM or custom-server options, observability, and environment promotion	Customer still needs to validate performance/cost trade-offs per model because Baseten does not publish a universal benchmark methodology
Run a compound AI application	Split a multi-step workflow across specialized components	Chains	Each chainlet can use its own hardware and autoscaling, reducing monolithic GPU waste and latency bottlenecks	Public performance claims are directional; workflow-specific latency depends on design and workload
Commercialize a lab-owned model	Expose model to third-party customers with metering and rate limits	Frontier Gateway	White-labeled URL, key management, usage limits, and per-customer billing remove the need to build an API gateway from scratch	Commercial and contractual details beyond the marketing surface are not public
Train or fine-tune and then deploy checkpoints	Run training code, sync checkpoints, and promote into inference	Training Jobs / Loops + deploy_checkpoints	Same vendor can cover managed training infra and downstream deployment endpoint	Loops remains early access, so the most advanced post-training path is not yet fully mature in public materials

Benefits are public-product claims and customer-proof outcomes, not guaranteed customer results. Each row describes the workflow Baseten markets most clearly, not every possible implementation variant.

[CE001, CE002, CE005, CE006, CE007, CE008]

FE002: Customer Workflow / Operating Flow

How a team moves from evaluation or packaging into production inference on Baseten, with an optional training and gateway path.

[CE002, CE005, CE006, CE007, CE008, CE009]

5.2 Deployment Architecture and Operating Model

The clearest technical differentiator in Baseten's public corpus is that it explains the deployment path with more specificity than many AI-infrastructure startups. Truss abstracts packaging, dependencies, and GPU configuration; Baseten's build step then validates and uploads the package, compiles supported LLMs with TensorRT-LLM when the engine path is selected, and deploys the resulting container behind a dedicated model subdomain. The MCM control plane sits underneath both training and inference, abstracting cloud-provider differences and rerouting capacity across regions or providers when needed. Request routing resolves environment names from URL paths, environments preserve stable endpoints as deployments are promoted, and async requests enter a queue that protects real-time traffic from background work. BDN addresses the cold-start bottleneck by mirroring and caching large model weights at multiple layers so new replicas are less dependent on external storage. The result is an inference-first architecture with explicit build, routing, autoscaling, and weight-delivery primitives.[CE003, CE004, CE005, CE010, CE011, CE012]

Technology / Operating Architecture Table
Layer / component	Public mechanism	Key dependencies	Risk / limitation
Packaging layer (Truss)	Packages model definition, dependencies, secrets, caching, and GPU config from config.yaml or Python model code	Baseten CLI, GitHub/PyPI distribution, user source repositories	Abstraction is strong, but deployment success still depends on model-specific tuning and user-supplied weights
Build / compile path	Engine-Builder-LLM downloads weights, compiles with TensorRT-LLM, applies quantization/tensor parallelism, and emits a serving container	Hugging Face or cloud-storage weight source, CUDA-compatible GPU targets	Compile times can take minutes and public docs do not benchmark every model/hardware combination
Runtime optimization layer	Inference runtime exposes TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured output, KV-cache optimization, and topology-aware parallelism	Model architecture support, Baseten inference runtime, GPU memory/layout assumptions	Optimization options are numerous but not all are validated publicly per workload
MCM control plane	Unifies GPUs across clouds/regions, provisions resources, monitors health, and reroutes around capacity crunches or outages	Underlying cloud GPU supply, networking, Baseten control plane	Cross-cloud abstraction reduces lock-in but introduces dependency on Baseten’s own orchestration layer
Weight delivery / cold-start path	BDN mirrors weights to Baseten storage and caches them at mirrored-origin, cluster, and node layers	Upstream weight repository for first mirror, Baseten blob storage, in-cluster cache	First deploy still depends on upstream weight availability; benchmark methodology for “2-3x faster” is not public
Request routing / environments	Each model gets a subdomain; URL path resolves environment; async requests queue; promotions keep endpoint names stable	Baseten API gateway, environment config, autoscaler, queue service	Scale-to-zero introduces cold-start trade-offs and regional guarantees require a special endpoint
Workflow orchestration / training	Chains coordinates multi-step workflows; Training Jobs and Loops provision GPUs via MCM and can deploy checkpoints into inference	Python SDK/CLI, MCM, storage/checkpoint sync	Loops is early access and training maturity lags the core inference surface
Enterprise controls / tenancy	Single-tenant, self-hosted, hybrid, and regional environments plus SSO/SCIM and compliance-policy boundaries	Customer IdP, Baseten support setup, customer cloud if self-hosted	Some controls require sales/support intervention rather than pure self-serve enablement

This table mixes direct documentation facts with synthesis about operating dependencies. “Risk / limitation” names the public caveat or diligence item, not an observed failure.

[CE003, CE004, CE005, CE006, CE007, CE010]

FE001: Baseten Product Architecture Map

Layered view of Baseten's public architecture from access surfaces through packaging/orchestration to runtime and cross-cloud infrastructure.

[CE001, CE002, CE006, CE007, CE009, CE010]

5.3 Trust, Data Handling, and Reliability Controls

Baseten's trust posture is strong by startup standards and particularly important because the company wants sensitive inference workloads. The public security documentation says Baseten maintains SOC 2 Type II and HIPAA compliance, does not store model inputs, outputs, or weights by default, never shares GPUs across users, isolates customers into dedicated Kubernetes namespaces, and uses controls such as Calico, Falco, and Gatekeeper around workload isolation. Enterprise and pricing pages also advertise self-hosted, single-tenant, hybrid, and region-restricted options, while regional-environments docs explain that true residency requires a distinct regional endpoint rather than the default environment CNAME. The nuance is reliability: Baseten marketing frequently uses four-nines language, but the only public contractual commitment is the Dedicated Inference SLA at 99.9% monthly availability. The public status page also logged multiple May 2026 incidents, so diligence should treat trust/compliance as a strength and public reliability guarantees as more mixed.[CE015, CE016, CE017, CE018, CE019, CE020]

Trust / Quality / Compliance Table
Control / signal	Public status	Scope / evidence	Implication	Gap
SOC 2 Type II + HIPAA	Published	Security docs, enterprise page, and pricing page all cite SOC 2 Type II and HIPAA	Strong baseline trust signal for enterprise inference workloads	No public certificate artifacts or audit scope details in the reviewed corpus
Default non-storage of prompts/outputs/weights	Published with caveats	Security docs say Baseten does not store inputs, outputs, or weights by default, except temporary async storage and optional caching	Important privacy and IP positioning for sensitive inference	Need contract/DPA review for exact retention edge cases and customer-enabled caching behavior
GPU and namespace isolation	Published	Security docs say Baseten never shares GPUs across users and assigns each customer a dedicated Kubernetes namespace with Calico/Falco/Gatekeeper controls	Supports tenant isolation claims beyond generic cloud marketing	No public penetration-test report or architecture diagram was reviewed
Regional environments / data residency	Published but support-configured	Docs explain region-constrained replicas and special regional endpoint formats	Useful for GDPR/data-residency buyers that need routing guarantees	Setup requires Baseten involvement and public docs do not state lead times or pricing
Identity and lifecycle controls	Expanded in 2026	SSO/SCIM changelog adds SAML 2.0, SCIM 2.0, JIT provisioning, deprovisioning, and group-based roles on Enterprise	Improves enterprise admin hygiene and procurement readiness	No public mapping to specific IdP limitations or SCIM attributes
Hosting flexibility	Published	Enterprise, dedicated-inference, and security pages describe Baseten Cloud, self-hosted, hybrid, and single-tenant modes	Lets buyers choose between speed, control, and cloud-commitment reuse	Exact operational split between Baseten-managed and customer-managed responsibilities is not fully public
Contractual availability	Mixed	SLA contract says Dedicated Inference targets 99.9% monthly availability; marketing pages often use 99.99/four-nines language	Public procurement should rely on the legal SLA rather than homepage shorthand	No public SLA was found for Model APIs, Chains, or the broader web app
Operational incident visibility	Published but mixed	Public status page shows multiple May 2026 incidents; third-party reachability tracker says detailed incident data is unavailable	Visibility exists, but independent uptime corroboration is thin and headline uptime panels can hide short incidents	Need contractual incident response terms, RCA access, and service-credit history in diligence

Confidence is highest where multiple official pages agree; lower where public documentation requires contact with support or contract review. This table describes what is public, not what Baseten may provide privately in diligence.

[CE015, CE016, CE017, CE018, CE019, CE020]

FE003: Critical Dependency Map

Key dependencies that sit around Baseten's inference stack: upstream weights, GPU clouds, regional/identity controls, and non-core SaaS tooling.

[CE010, CE016, CE023, CE024, CE025, CE036]

5.4 Developer Signal, Customer Proof, and Competitive Positioning

Baseten's moat is not the lowest published unit price; it is the bundle of packaging tooling, performance engineering, and managed cross-cloud operations. Truss gives Baseten a real developer surface: the GitHub repo and PyPI package show an active open-source packaging CLI with frequent May 2026 releases centered on Loops and deployment workflows. Customer proof is stronger than generic logo pages: Writer says Baseten-built TensorRT-LLM engines improved tokens per second and lowered time to first token and cost, while OpenEvidence attributes materially lower latency, faster deployments, and lower maintenance burden to Baseten's MCM, embeddings runtime, and tooling. The trade-off is visible in independent pricing comparison. HostFleet's April 2026 matrix shows Baseten priced above Runpod and Modal on comparable GPU instances, while Runpod and Modal market more aggressive zero-idle and cold-start positioning. Against AWS, Google, and Microsoft, Baseten is narrower in scope but easier to read as an inference-specialist layer rather than a full hyperscaler AI platform.[CE026, CE027, CE028, CE029, CE030, CE031]

5.5 Roadmap and Product Maturity

Public 2026 roadmap signals point to Baseten maturing operating controls around a fairly stable product architecture rather than constantly adding new product families. The notable 2026 releases were SSO/SCIM, rolling deployments, BDN, and a billing usage API—features that make the platform easier to govern, safer to update, faster to cold-start, and easier to instrument financially. That release mix suggests Baseten is moving from “can this serve models?” toward “can this run mission-critical inference inside enterprise processes?” At the same time, maturity is uneven across the stack. Training Jobs are publicly GA, while Loops remains early access, so the training-to-inference story is strategically promising but not uniformly production-proven. Public materials also say little about benchmark methodology, exact enterprise onboarding lead times for regional controls, or product priorities beyond the currently disclosed 2026 changelog, leaving some product-tech diligence items unresolved.[CE007, CE020, CE021, CE022, CE037, CE038]

Roadmap / Release / Development-Stage Table
Date / stage	Feature / milestone	Status	Implication	Source
2026-03-04	Billing usage API	Launched	Makes Dedicated Inference, Training, and Model APIs easier to instrument financially via daily API breakdowns	Baseten changelog
2026-03-19	Baseten Delivery Network (BDN)	Launched	Signals investment in cold-start mitigation and independence from upstream weight stores after first mirror	Baseten changelog + How Baseten Works docs
2026-03-30	Rolling deployments	Launched	Adds safer zero-downtime promotions and better environment lifecycle control for production releases	Baseten changelog
2026-05-14	SSO and SCIM	Launched on Enterprise	Improves identity governance and deprovisioning for larger customers	Baseten changelog
2026 public product state	Training Jobs	GA	Shows the managed-training product is no longer experimental	Training product page
2026 public product state	Loops	Early access	Indicates Baseten is investing in post-training/RL workflows but has not yet fully hardened the surface publicly	Training product page + Truss releases

Only explicitly disclosed public milestones are listed. Absence from this table should not be read as absence from the internal roadmap; it only means the item was not visible in the reviewed public corpus.

[CE007, CE020, CE021, CE022, CE037, CE038]

FE004: Product Maturity / Capability Map

Capability-by-maturity view of Baseten's main product surfaces as of 2026-05-30.

Maturity labels are synthesis, not company-provided scores. “Differentiated” means the public corpus shows a clearer relative advantage or stronger external proof, not that the capability is objectively category-leading on all dimensions.

[CE007, CE026, CE027, CE031, CE032, CE037]

5.6 Exhibits

Chapter 06

06Customers

6.1 Customer segmentation and buyer profile

Baseten's public customer evidence points to a buyer set made up primarily of AI-native software builders whose own end products live or die on inference speed, reliability, and cost. The buyer is usually an ML, platform, or product-engineering leader, while the user base broadens into application engineers, security teams, and operations leads once a workload moves from model evaluation into production. Publicly named examples span enterprise agent platforms such as Writer and Notion, regulated healthcare apps such as OpenEvidence and Abridge, voice and speech applications such as Speechify, productivity software such as Superhuman, creative tools such as Gamma, and GTM or coding products such as Clay and Cursor. That breadth matters because it shows Baseten is not only selling experimentation infrastructure. At the same time, the disclosed book is still overwhelmingly AI-native software rather than a diversified set of legacy enterprises, so customer breadth is real but not yet institutionally broad in public.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
Segment	Buyer / user / payer	Representative public accounts	Public value signal	Gap
Enterprise agent and knowledge platforms	Buyer: CIO / AI platform leader; User: app and ops teams; Payer: enterprise software vendor	Writer, Notion	Mission-critical AI agents with security and governance requirements	No disclosed revenue mix by enterprise vs startup accounts
Healthcare AI applications	Buyer: clinical/IT leadership; User: clinicians, care teams, revenue-cycle ops; Payer: healthcare AI vendor or health system software budget	OpenEvidence, Abridge, Ambience	Regulated medical information and clinical documentation workloads	No public contract-value or health-system count disclosure
Voice and speech applications	Buyer: product / ML platform lead; User: end users and content teams; Payer: voice application vendor	Speechify	Consumer-scale TTS and voice infrastructure under real-time latency pressure	No public disclosure of revenue concentration inside audio workloads
Productivity and collaboration apps	Buyer: product / engineering leadership; User: professionals and knowledge workers; Payer: software vendor	Superhuman, Notion	Inference directly affects email, workspace, and agent UX	Production KPIs are public for Superhuman, not for Notion
Creative and creator-economy platforms	Buyer: product / infra lead; User: creators and prosumers; Payer: software vendor	Gamma, Patreon	Large-scale image generation and creator-media workloads	Consumer logos do not by themselves prove long-term renewal economics
GTM and developer tools	Buyer: engineering / revenue-ops lead; User: developers, GTM operators, recruiters; Payer: software vendor	Clay, Cursor, Mercor	Shows Baseten exposure to code, GTM, and AI-economy tooling	Only public name mentions exist for most of these accounts

Segmentation is assembled from Baseten case studies, fundraising references, and official customer pages; Baseten does not publish customer-count or ARR mix by segment.

[CU001, CU002, CU003, CU004, CU005, CU006]

Publicly named strategic account table
Customer	Segment	Evidence type	What is public	Proof strength	Gap
Abridge	Healthcare AI	Business Wire + Abridge site	Named as Baseten customer; Abridge sells enterprise clinical-conversation AI to major health systems	Medium	No Baseten-specific deployment scope or outcomes disclosed
Cursor	Developer tools	WorkOS + Cursor site	Named as Baseten customer; Cursor serves AI-assisted coding and says it is trusted by over half the Fortune 500	Medium	No Baseten-specific workload detail or economics disclosed
Notion AI	Enterprise productivity	Business Wire + Notion AI page	Named as Baseten customer; Notion AI markets agents, enterprise search, and zero-retention enterprise controls	Medium	No Baseten-specific performance or spend data disclosed
Clay	GTM software	WorkOS + Clay site	Named as Baseten customer; Clay serves a large GTM-team base with enrichment and workflow automation	Medium	No Baseten-specific production metrics disclosed
Mercor	AI economy / recruiting	Business Wire + Mercor site	Named as Baseten customer; Mercor positions itself around powering the AI economy	Low-medium	Public customer mention exists, but use case and infrastructure dependence are not described

These rows extend the named customer set beyond flagship case studies, but they are weaker proof than the six detailed stories because they usually lack deployment specifics.

[CU010, CU011, CU047, CU048, CU049, CU050]

FU001: Customer journey map

Baseten typically starts with a technical buyer evaluating model performance, then expands into operational, security, and procurement stakeholders as the workload becomes business-critical.

[CU001, CU003, CU036, CU037, CU038, CU039]

6.2 Adoption trajectory and named production proof

Baseten's best public adoption evidence is not a disclosed customer-count time series so much as a stack of workload-scale and outcome disclosures from reference customers. The company says inference volume grew 100x over the last year, and its customer-story index now spans healthcare, code, audio, presentations, and operations use cases. The flagship six case studies are production-grade rather than pilot-like: OpenEvidence discloses billions of requests per week and a doctor in every U.S. zip code, Speechify discloses 161B+ characters per month for 60M+ users, Gamma discloses 3M+ images per day for 70M+ users, Superhuman says dozens of custom models moved into production, and Patreon reports large cost savings on a scaled Whisper deployment. The quality of proof is strongest where Baseten and the customer share concrete latency, cost, throughput, or workflow metrics, though the public set remains curated.[CU012, CU013, CU014, CU015, CU016, CU017]

Customer growth / adoption trajectory table
Metric	Value	Date	Source	Confidence	Implication / missing denominator
Baseten platform inference growth	100x inference-volume growth in the last year	2026-01	Baseten Series E blog	Medium	Strong demand signal, but not broken out by customer or workload
Public reference inventory	13 case studies, 29 testimonials, 4 videos, 654 ratings	2026-05	FeaturedCustomers	Medium	Large public reference surface, but aggregator methodology is not fully transparent
OpenEvidence workload scale	Billions of requests per week; doctors in every U.S. state and zip code	2026 viewed	Baseten case study	Medium	Shows national clinical reach, but not revenue or contract expansion
Speechify workload scale	161B+ characters/month for 60M+ users	2026 viewed	Baseten case study	Medium	Very large consumer-scale inference load
Gamma workload scale	3M+ images/day for 70M+ users	2026 viewed	Baseten case study	Medium	Strong PLG-scale proof, but no disclosed share of Gamma traffic on Baseten
Superhuman production breadth	Dozens of custom embedding models switched into production after a one-week project	2026 viewed	Baseten case study	Medium	Indicates deployment breadth even without volume metrics
Additional named accounts	Abridge, Cursor, Clay, Notion, Mercor and others named publicly	2026-01	Business Wire / WorkOS	Medium	Breadth extends beyond the six flagship stories, but most lack quantified deployment detail

Baseten does not publish one normalized customer-count series, so the table uses workload and reference proxies rather than a single active-account KPI.

[CU012, CU013, CU014, CU015, CU016, CU017]

Named customer proof table
Customer	Segment	Deployment / use case	Production vs pilot	Outcome	Limitation
Writer	Enterprise AI platform	Serve custom 70B domain-specific LLMs with TensorRT-LLM on Baseten	Production	60% higher tokens/sec, 23% lower TTFT, 35% lower cost per million tokens	No renewal term or contract value disclosed
OpenEvidence	Healthcare AI	Medical search and embeddings inference for clinicians	Production	Latency cut from >700ms to 160ms, 6x faster deployments, 8x+ less infrastructure maintenance	No public spend or account-expansion disclosure
Speechify	Voice / TTS	Host 10+ production model deployments across TTS, voice conversion, and parsing	Production	44% lower cost per million characters, 30-50% lower p99 latency, 4.5x faster startup	No disclosed revenue concentration or contract duration
Gamma	Creative AI platform	Serve open-source image generation models at massive user scale	Production	30%-80% faster generation, 20% better efficiency, 3M+ images/day	No disclosed retention or spend-per-user metrics
Superhuman	AI-native productivity	Deploy dozens of custom embedding models for core product features	Production	80% lower P95 latency and rapid migration with zero user impact	No public seat-count or contract economics
Patreon	Creator economy platform	Serve Whisper transcription and captioning workloads	Production	70% lower GPU cost, 440+ hours saved, nearly $600k annual savings	No public renewal or expansion metrics

This is a partial public sample of Baseten deployments with quantified outcomes, not an exhaustive customer list.

[CU013, CU014, CU015, CU016, CU017, CU018]

FU002: Adoption / deployment funnel

The public adoption path starts with technical evaluation, moves into one production workload, and only later expands into dedicated compute, governance, and larger enterprise commitments.

[CU003, CU036, CU037, CU039, CU040, CU041]

FU003: Customer proof quality matrix

Public proof is strongest where Baseten and the customer disclose concrete performance outcomes; durability visibility is weakest across every disclosed account.

[CU014, CU016, CU017, CU018, CU019, CU021]

6.3 Durability, satisfaction, and expansion evidence

Durability evidence is directionally positive but incomplete. On the positive side, third-party reference aggregators show unusually strong public sentiment proxies: FeaturedCustomers reports a 4.8/5 reference score from 654 ratings, PeerSpot highlights collaboration and cost effectiveness, and individual customer quotes repeatedly describe Baseten as a winner on execution, uptime, or self-serve deployment. The product packaging also shows a plausible land-and-expand motion, from Basic pay-as-you-go usage into Pro dedicated compute and Enterprise self-hosted or region-restricted deployments. Healthcare and enterprise pages make that motion concrete by advertising HIPAA-sensitive workflows, single-tenant clusters, failover, and hands-on engineering support. The key limitation is that none of the reviewed public sources provide NRR, GRR, renewal cohorts, or contract duration, so expansion has to be inferred from packaging and customer quotes rather than measured account economics.[CU031, CU032, CU033, CU034, CU035, CU036]

Retention / repeat usage / satisfaction table
Metric / proxy	Value	Segment	Confidence	Diligence ask
Portfolio NRR / GRR / logo churn		All customers	Low	Request cohort retention, gross and net revenue retention, and churn by segment
Contract duration / renewal cadence		All customers	Low	Request median contract length, renewal dates, and committed minimums by plan
Third-party reference score	4.8/5 from 654 reference ratings	Cross-customer public references	Medium	Validate how many ratings are current and attributable to post-2025 product surface
Qualitative review summary	PeerSpot highlights deployment speed, flexibility, and cost effectiveness	Cross-industry users	Low-medium	Ask for raw review count and more granular sentiment distribution
OpenEvidence testimonial	Vendor-vetting process ended with Baseten as a clear winner	Healthcare AI	Medium	Ask for renewal history and spend growth since migration
Speechify testimonial	Speechify says the partnership continues to grow and delivered highest uptime among inference providers it knows	Voice / TTS	Medium	Ask for uptime SLA, incident frequency, and contract-duration disclosure

Public durability evidence is testimonial-heavy and lacks cohort metrics, so satisfaction proxies should not be mistaken for audited retention data.

[CU031, CU032, CU033, CU034, CU035, CU046]

6.4 Concentration, switching, and competitive pressure

The main customer risk in the public record is not obvious churn but concentrated visibility. Baseten has more named accounts than just the six flagship stories, yet the quantified proof still sits inside a narrow band of AI-native software companies. That creates two diligence questions. First, there is no public disclosure of top-customer revenue share, contract lengths, or renewal rates, so investors cannot tell whether a handful of very large workloads dominate the book. Second, competitive pressure is real. HostFleet's April 2026 matrix shows Baseten as the most expensive listed option on multiple common GPUs, while Runpod's 2026 comparison ranks Baseten fifth and attributes materially faster headline cold starts to some rivals. WorkOS also describes a point where customers spending $10k-$50k per month start considering more control and lower-cost open-source options. Baseten counters that risk with open runtimes and no lock-in around customer models, but that also means portability tolerance will stay high. In other words, Baseten may be easier to adopt than a proprietary stack, but it must continuously re-win customers on operating performance and support rather than on captivity.[CU039, CU040, CU041, CU042, CU043, CU044]

Expansion and concentration risk table
Expansion driver	Concentration risk / constraint	Impact	Diligence path
Basic → Pro → Enterprise packaging	Public plan ladder suggests upsell, but conversion rates are undisclosed	Supports land-and-expand if customers outgrow self-serve inference	Request plan-mix by customer cohort and expansion rates from Basic into Pro/Enterprise
Enterprise controls and self-hosted deployment	Could bias sales toward fewer, larger technical buyers and service-heavy motions	Helpful for regulated and sensitive workloads, but may increase top-account concentration risk	Request top-10 customers by ARR, workload volume, and deployment mode
Healthcare-specific compliance posture	Vertical concentration could deepen if healthcare becomes the dominant expansion vector	Regulated workloads may be sticky if compliance and reliability prove durable	Request healthcare revenue share, customer count, and renewal history
Flagship case-study concentration	Quantified public proof is concentrated in six stories and mostly AI-native software	Investors cannot infer portfolio durability from a narrow reference set	Request anonymized cohort statistics for the broader customer base
Premium public pricing	Higher published GPU prices and minimum deployment costs can increase switching pressure	Pricing could slow expansion for cost-sensitive workloads or make replatforming attractive	Request win/loss and churn reasons on price-sensitive accounts
Open-source model portability	Customers can increasingly switch models and bring more infra in-house as spend rises	Baseten must keep winning on speed, support, and economics rather than lock-in	Request data on customer tenure, self-host conversions, and workloads retained after optimization

The strongest expansion signals are packaging and customer quotes, while the strongest risk signals are public-proof concentration and competitive price pressure.

[CU036, CU037, CU038, CU039, CU040, CU041]

Chapter 07

07Risks

7.1 Legal and regulatory risk centers on compliance scope, customer contracting, and expanding AI rules

Baseten's public trust posture is strong on its face: the company says it is SOC 2 Type II certified, HIPAA compliant, GDPR aligned, and able to run region-restricted or self-hosted deployments for sensitive workloads. The risk is that this marketing posture narrows materially once the legal stack is read in full. Baseten's security docs say it does not store model inputs or outputs by default and can enforce compliance policies, while the healthcare and enterprise pages market HIPAA-compliant infrastructure for mission-critical workloads. But the DPA embedded in Baseten's public terms says customers must not submit PHI and other restricted data unless otherwise agreed in writing, and it leaves customers responsible for legal basis, notices, and many breach-notification duties. That does not prove a defect; it does mean the public website alone is not enough to underwrite regulated use. Regulatory pressure is also moving beyond privacy into AI-governance and procurement. The European Commission's AI policy page highlights AI Act implementation, sector guidance, codes of practice, and a service desk meant to help businesses comply. For an inference vendor selling into healthcare and other regulated enterprise workloads, that raises the probability of longer diligence cycles around residency, documentation, model governance, and shared-responsibility boundaries even if Baseten is not the final application-layer decision maker. The top legal risk is therefore compliance-scope ambiguity rather than a known enforcement action. [CR001, CR002, CR003, CR008, CR009, CR010]

Regulatory / legal risk register
Risk	Evidence	Likelihood	Severity	Mitigation maturity	Residual exposure	Diligence path
Healthcare compliance scope depends on signed overrides to the public Restricted Data carve-out	HIPAA-compliant marketing sits beside DPA language that bars PHI unless otherwise agreed in writing	Medium	High	Partial	High	Request signed BAA/HIPAA addendum and the exact list of permitted PHI flows
EU AI Act and GDPR implementation can slow regulated enterprise sales	European Commission guidance emphasizes AI Act implementation support, guidance, and sector adoption	Medium	High	Early	Medium-High	Review EU legal memo, residency controls, DPIA templates, and audit artifacts
Subprocessor updates create short objection windows and termination as the main public remedy	Baseten gives 15 days' notice and five days to object to new subprocessors	Medium	Medium	Basic	Medium	Review negotiated subprocessor notice rights and change-control process
Customer-side legal-basis and breach duties can increase deployment friction	The DPA leaves customers with lawful-basis, notices, and many notification obligations	Medium	Medium	Basic	Medium	Map the shared-responsibility matrix before production launch
Mission-critical positioning can be undermined by default contract language	Terms exclude time-critical or mission-critical use even as product pages market mission-critical inference	Medium	High	Low	High	Negotiate custom SLA and carve-out language for critical workloads

Rows reflect publicly observable legal and regulatory risks as of 2026-05-30; severity ranks investment relevance rather than legal advice.

[CR003, CR008, CR009, CR010, CR011, CR012]

7.2 Operational and security risk is defined by contract scope and visible incidents, not only uptime marketing

Baseten markets reliability aggressively. The enterprise, healthcare, dedicated inference, frontier gateway, and model API pages all promise four nines or 99.99% uptime, active-active redundancy, or highly reliable multi-cloud operations. The published SLA is narrower. It applies only to Dedicated Inference when Baseten is the hosting party and sets a 99.9% monthly availability target, with credits capped at 40% of monthly fees and recoverable only if the customer files within 24 hours. The terms then go further by stating the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free. That creates a real diligence gap between product positioning and default contractual protection. Baseten's status page also shows this is not theoretical. The service reported multiple May 2026 incidents, including ongoing investigations, identified fixes, monitoring updates, and a major-outage marker in the 90-day view. Third-party monitoring adds only partial comfort because Servicealert says detailed incident data is unavailable and relies on reachability snapshots rather than full root-cause reporting. Baseten has shipped useful mitigations such as rolling deployments and deployment-health tooling, but reliability risk still belongs near the top of the chapter because the platform is sold into production workflows whose customers are highly sensitive to downtime and latency regression. [CR013, CR014, CR015, CR016, CR017, CR018]

Operational / quality / security risk register
Failure mode	Likelihood	Severity	Mitigation maturity	Residual exposure	Unresolved gap
Reliability marketing exceeds the default contractual SLA	High	High	Partial	High	Need current custom SLA examples and service-credit terms for real enterprise deals
Control-plane or inference incidents recur during rapid product expansion	Medium-High	High	Partial	Medium-High	Need postmortems, Sev1 frequency, and MTTR data beyond the public status page
Outage credits are operationally weak because claims must be filed within 24 hours and are capped	Medium	Medium	Low	Medium	Need evidence that enterprise customers negotiate broader remedies
Compliance-policy or residency changes require Baseten support intervention	Medium	Medium	Partial	Medium	Need admin screenshots and change-control workflow evidence
Deployment regressions still occur despite new rollout controls	Medium	Medium	Medium	Medium	Need adoption data for rolling deployments and rollback success rates

Residual exposure stays elevated because the public SLA and status data do not reveal customer-specific remedies, postmortems, or negotiated reliability terms.

[CR013, CR014, CR015, CR016, CR017, CR018]

FR002: Risk transmission map

Operational, contractual, and pricing risks flow directly into trust, sales velocity, margin, and valuation.

[CR017, CR019, CR023, CR026, CR027, CR034]

7.3 Dependency and commercial-model risk comes from upstream capacity, vendor chain complexity, and premium pricing

Baseten's core strategic answer to infrastructure risk is multi-cloud capacity management, cross-cloud autoscaling, single-tenant options, and the ability to run in the customer's cloud. Those are meaningful mitigations, but they do not eliminate dependence on cloud partners, GPU availability, and a long tail of third-party services. Nudge Security's public profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other vendors in Baseten's visible supply chain. Baseten's own product pages further promise access to the latest-generation GPUs, elastic capacity, and priority compute. That means upstream capacity and pricing remain foundational dependencies whether the platform presents them as a single managed surface or not. The commercial risk is that Baseten appears expensive relative to public peers. HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple GPU classes, while Runpod's 2026 comparison gives Baseten premium pricing and slower cold-start ranges than some alternatives. Baseten may still justify that premium through enterprise controls, support, and performance tuning, but if customers can reproduce acceptable latency and uptime on materially cheaper infrastructure, margin and win-rate risk rise quickly. This is why price-performance and supplier concentration belong alongside classic vendor-risk questions. [CR021, CR022, CR023, CR024, CR025, CR026]

Partner / dependency risk register
Dependency	Counterparty or surface	Role	Concentration	Failure scenario	Severity	Mitigation	Residual exposure
Latest-generation GPU capacity	Cloud/GPU suppliers	Powers premium inference and autoscaling promises	Unknown externally	Capacity tightens or prices rise, compressing margins and slowing onboarding	High	Multi-cloud routing and hybrid/self-host options	High
Visible SaaS control-plane vendors	AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, others	Support web, billing, monitoring, messaging, and operations	Medium	A single third-party outage degrades customer experience or internal ops	Medium	Vendor diversification and customer-cloud options	Medium
Premium price position	Runpod / Modal / Fal.ai / Replicate set public comparison set	Shapes customer willingness to pay	High	Peers offer acceptable performance at materially lower entry cost	High	Sell observability, support, security, and enterprise controls	High
Service-heavy enterprise delivery	Forward-deployed engineers and support teams	Customizes deployments to hit latency, throughput, and compliance goals	Medium	Support load scales faster than product automation	Medium-High	Standardize playbooks and self-service tooling	Medium-High
Status visibility	Baseten status page plus incomplete third-party monitoring	Primary external signal for uptime events	High	Public signals understate incident depth or recurrence	Medium	Request internal reliability dashboards and postmortems	Medium

The company clearly mitigates concentration better than a single-cloud provider, but public evidence still leaves supplier mix and reserved-capacity exposure opaque.

[CR021, CR022, CR023, CR024, CR025, CR026]

FR003: Dependency map

Baseten's risk surface depends on cloud and GPU partners, third-party SaaS vendors, enterprise controls, and customer-specific contracting.

[CR021, CR022, CR023, CR029, CR037, CR042]

7.4 Execution and financial-model risk rise with product sprawl, rapid scaling, and valuation pressure

Baseten is no longer a narrowly scoped model-hosting startup. Public materials now cover Model APIs, Dedicated Inference, Frontier Gateway, model management, custom servers, chains, Training Jobs, and an early-access RL product called Loops. At the same time, the company raised a $300M Series E at a $5B valuation in January 2026 after multiple fundraises in the prior year. That capital lowers near-term financing risk, but it also raises the execution bar: investors and customers will expect the company to convert premium infrastructure positioning into repeatable enterprise growth without losing reliability or gross-margin discipline. The execution burden is amplified by staffing and go-to-market posture. Tracxn shows rapid employee growth, while Baseten's own site repeatedly emphasizes hands-on engineering support, forward-deployed expertise, custom SLAs, and deployment customization. That can be a differentiator for early enterprise expansion, but it also suggests a service-heavy operating model that may be hard to scale cleanly if product complexity, support demands, and customer-specific security asks keep expanding in parallel. The top people and execution question is therefore not whether Baseten has talent; it is whether the organization can maintain product quality, customer responsiveness, and economic discipline while broadening scope this quickly. External capital markets reinforce that risk: Modal said it raised $355 million at a $4.65 billion valuation after surpassing roughly $300 million in annualized revenue, Fireworks AI is already operating at roughly $315 million in annualized revenue on Sacra's estimates, RunPod shows that much lower-capital rivals can still compete on price, and CoreWeave's nearly $100 billion backlog shows how much well-financed infrastructure is chasing the inference opportunity.[CR030, CR031, CR032, CR033, CR034, CR035]

People / execution risk register
Role or function	Dependency or gap	Likelihood	Severity	Mitigation	Diligence path
Executive team and operating cadence	Must translate rapid fundraising into durable enterprise execution at a $5B valuation	Medium	High	Fresh capital and marquee customers	Review 2026 plan, hiring targets, and product roadmap sequencing
Product and engineering	Multiple product lines are expanding simultaneously, including an early-access training layer	Medium-High	High	Shared inference stack and tooling	Ask for GA readiness criteria, bug backlog, and reliability ownership map
Customer success and support	Hands-on engineering support may be required to justify premium pricing	Medium	Medium-High	Forward-deployed expertise and Enterprise plan packaging	Request support ratios, escalation SLAs, and reference calls
Sales and security procurement	Enterprise controls are plan-gated and can require custom contracting	Medium	Medium	SSO/SCIM, self-hosting, custom SLAs	Review average sales cycle, security-review conversion, and healthcare close rates

This register emphasizes execution scalability rather than founder-quality judgments; the central question is whether Baseten can widen scope without turning every major account into a custom services project.

[CR030, CR031, CR032, CR033, CR034, CR035]

FR001: Risk heatmap

The highest residual risks are compliance-scope ambiguity, reliability contract gaps, premium pricing, and execution sprawl.

[CR010, CR019, CR023, CR026, CR034, CR041]

7.5 Public mitigations are meaningful, but the investment case should remain gated by explicit kill criteria

Baseten does have tangible mitigations. Self-hosted and hybrid deployments reduce lock-in and residency risk, Truss improves portability, rolling deployments reduce cutover risk, SSO/SCIM strengthens access control, and the billing usage API gives customers a better handle on spend. Those are not superficial website badges; they are concrete product and operational features that can shrink risk if they are fully implemented in production accounts. Even the legal and regulatory risk is manageable if diligence confirms a clean BAA path, acceptable subprocessor controls, and contract terms that match the criticality of the workload. The key is to keep the underwriting discipline explicit. The thesis should break if Baseten cannot close the gap between public reliability marketing and executable SLA language, if price-performance remains far above peers without a quantified enterprise ROI offset, if supplier concentration turns out to be tighter than the multi-cloud story suggests, or if support-heavy sales prove necessary just to preserve baseline account health. The right posture is therefore conditional conviction: Baseten has credible mitigations, but the remaining evidence gaps are material enough that they should be converted into monitored triggers before a high-confidence investment view is allowed to stand. [CR020, CR029, CR037, CR038, CR039, CR040]

Mitigation and kill criteria table
Risk	Monitorable trigger	Threshold or event	Action implication
Healthcare compliance ambiguity	Signed BAA / HIPAA addendum availability	No executable BAA path or unclear PHI boundary for a top healthcare opportunity	Pause regulated-healthcare underwriting until contract evidence is produced
Reliability gap	Sev1 incident cadence and SLA terms	Two or more material incidents in 90 days or no negotiated remedy beyond the public SLA	Reduce conviction, require pricing holdback, or stop the process
Price competitiveness	Peer GPU economics and customer ROI	Baseten remains >2x public peer pricing on target GPU classes without quantified enterprise ROI offset	Downgrade margin and win-rate assumptions
Execution sprawl	GA readiness and support load	Training/Loops, gateway, and dedicated deployments all require heavy custom support to stay healthy	Treat the model as services-heavy rather than software-like
Supplier concentration	Capacity-booking evidence	Reserved capacity or supplier diversification is materially weaker than the multi-cloud narrative implies	Increase dependency discount and require contingency planning

Kill criteria are deliberately monitorable so they can be tied to diligence outputs rather than generic caution.

[CR019, CR026, CR027, CR029, CR036, CR037]

Chapter 08

08Valuation

8.1 Recommendation: track the company at the closed price, but do not chase momentum pricing

Baseten looks like a high-quality company operating in one of the best parts of the AI stack, but the investment call is still constrained by what is and is not publicly provable. The cleanest valuation anchor is the closed January 2026 Series E: $300 million raised at a $5 billion valuation. That mark is real, recent, and corroborated across Baseten’s own announcement, Business Wire, Tech Funding News, and Tracxn. The harder question is whether public evidence supports treating that mark as attractive, fair, or already stretched. The answer is conditional rather than categorical. Sacra’s $600 million annualized-revenue estimate makes the closed round look plausibly supportable at about 8.3x implied revenue, especially against a comp set that includes MongoDB-like low-double-digit public multiples and Modal or Fireworks private multiples in the low-to-mid teens. But that same public record also shows how fragile the case is: Baseten’s disclosed pricing is premium, third-party pricing matrices say it is expensive on paper, and the mooted $11 billion follow-on would leap straight into premium-software territory without corresponding primary financial disclosure. That is not a setup for a clean buy call. The right posture is therefore track with medium confidence, high risk rating, and a stretched valuation stance. The company is worth staying close to because the market is real and the product is differentiated, but investors should insist on diligence that converts private enthusiasm into evidence before underwriting a richer entry price.[CV001, CV002, CV007, CV008, CV009, CV035]

Recommendation summary table
Recommendation	Confidence	Risk rating	Valuation stance	Decision implication
track	Medium	High	Stretched	Stay engaged at the closed $5B mark only with strict diligence; do not underwrite a step-up round on public evidence alone.

The call is explicitly price-sensitive: it separates the closed Series E anchor from any hotter follow-on narrative.

[CV001, CV007, CV035, CV042, CV046]

FV001: Recommendation logic

The current call stays at track because strong category and product signals are offset by revenue-opacity and valuation-step-up risk.

This is a reasoning map, not a weighted scoring model.

[CV001, CV030, CV031, CV042, CV046, CV047]

8.2 The price is only defensible if revenue quality is real and premium pricing survives competition

The bull case starts with speed and timing. Baseten’s financing path moved from a $75 million Series C in February 2025 to a $150 million Series D at $2.15 billion in September 2025 and then to a $5 billion Series E in January 2026. Sacra’s estimate of $600 million in annualized revenue and Baseten’s own 100x inference-volume claim are directionally consistent with a company that hit a steep adoption curve just as inference moved from prototype infrastructure into production infrastructure. Public market and private comp work then gives that growth some context: Modal’s May 2026 round was struck at about 15.5x annualized revenue, Fireworks’ closed valuation works out near 12.7x, and MongoDB-like public infrastructure treatment is still around 10x. The anti-thesis is just as important. Baseten’s pricing page, HostFleet’s matrix, and Runpod’s comparison all point to a premium-priced service layer rather than a commodity endpoint vendor. That can be a strength if the premium buys uptime, support, compliance, and hybrid flexibility. It can also cap the multiple if the business is more support-heavy and lower-margin than premium public software names. Hyperscalers make the downside sharper: AWS, Google, and Azure can bundle model access, compute, governance, and credits inside broader cloud relationships. In other words, Baseten may deserve a premium to raw AI cloud, but it has not yet earned the disclosure quality that would let investors pay Cloudflare-like or Datadog-like premiums with confidence.[CV003, CV004, CV005, CV006, CV010, CV011]

Thesis / anti-thesis table
Argument	What supports it	What would change the view
Inference is becoming a production budget, not an experiment budget.	Technavio and Mordor both show large, fast-growing AI deployment markets, while Baseten’s financing pace shows capital chasing the theme.	Evidence of slower adoption, weaker enterprise conversion, or shrinking model-serving budgets would reduce the premium case.
Baseten may deserve a premium because it sells performance, hybrid control, and support rather than bare GPU time.	Baseten markets custom SLAs, self-hosting, priority GPUs, cross-cloud scale, and forward-deployed engineers.	If customers can replicate acceptable latency and uptime on cheaper alternatives, the premium becomes a liability rather than a moat.
The closed $5B round is plausible if the private revenue estimate is roughly right.	Sacra’s $600M run-rate estimate implies about 8.3x revenue, below many premium software comps.	Primary finance data that lands far below the estimate would break this support quickly.
The $11B narrative is not yet underwritten by public facts.	The only public support is third-party reporting of talks, not a closed financing or disclosed fundamentals.	A signed term sheet or closed announcement with clean terms and corroborated financials would materially improve the case.

The table separates company quality from valuation quality; both have to work for a buy call.

[CV007, CV008, CV010, CV011, CV015, CV030]

FV002: Valuation sensitivity

The revenue required to justify a $5B valuation changes sharply depending on which comparable multiple investors anchor to.

Each bar divides the $5B Series E mark by a selected comparable multiple; values are support thresholds, not forecasts.

[CV017, CV020, CV022, CV024, CV027, CV029]

8.3 Comparable work and scenarios put $5B inside the base case but leave little room for error

The comparable set matters because Baseten sits between two valuation regimes. On one side are AI-cloud and infrastructure names like CoreWeave, where capital intensity is visible and public multiples are lower. On the other are premium infrastructure-software names like Datadog and Cloudflare, where disclosure quality, margins, and platform breadth allow much richer trading. Baseten’s best public revenue estimate places the closed $5 billion round between those regimes rather than squarely in either one. That is why the closed round is arguable while the rumored $11 billion step-up is not yet underwritten by public evidence. Scenario work makes the same point more concretely. A bear case that assumes only $300 million to $400 million of durable revenue support and a 7x to 9x multiple points to material down-round risk. A base case that uses $500 million to $650 million and 8x to 12x supports roughly $4 billion to $7.8 billion, which comfortably contains the closed Series E. A bull case requires both stronger revenue continuation and a multiple closer to Modal-like or upper private-market inference treatment. That means the current investment debate is not whether Baseten is interesting; it is whether an investor is being paid for the uncertainty between a defensible $5 billion mark and an aspirational $11 billion narrative.[CV016, CV017, CV018, CV019, CV020, CV021]

Bull / base / bear scenario table
Scenario	Assumptions	Valuation / return logic	Key risks	Probability signal
Bear	Durable revenue support is only $300M-$400M, premium pricing erodes, and independent vendor economics look more like infrastructure than software.	$2.1B-$3.6B using a 7x-9x range; the current mark would be vulnerable to a reset.	Price pressure, lower margin quality, or slower enterprise expansion expose down-round risk.	This case rises if diligence shows support-heavy delivery, concentrated revenue, or weak unit economics.
Base	Revenue support lands around $500M-$650M, growth remains strong, and Baseten keeps a moderate premium to raw AI cloud.	$4.0B-$7.8B using 8x-12x; the closed $5B Series E sits inside this band.	The call still depends on validating revenue quality and gross margin, not just topline growth.	This is the most defensible case on today’s public evidence.
Bull	Revenue support reaches $700M-$900M, premium economics hold, and investors keep paying Modal-like or better private inference multiples.	$8.4B-$14.4B using 12x-16x; an $11B step-up becomes possible.	The upside depends on sustained hypergrowth and a premium-quality margin/disclosure profile that is not public yet.	This case requires more proof than the current public record supplies.

Ranges are scenario outputs, not point estimates, and are designed to show how quickly the underwriting shifts when revenue support or multiple choice changes.

[CV035, CV036, CV043, CV044, CV045]

Comparable valuation table
Comparable	Valuation context	Revenue context	Implied multiple	Relevance to Baseten	Limitation
Baseten closed Series E	$5.0B post-money (Jan 2026)	~$600M annualized revenue estimate	~8.3x	Direct anchor for the current underwriting debate.	Revenue support is third-party-estimated, not company-disclosed.
Modal	$4.65B post-money (May 2026)	~$300M annualized revenue	~15.5x	Closest premium private comp for elastic inference infrastructure.	Not the same mix of enterprise support, compliance, or customer base.
Fireworks AI	$4.0B post-money (Oct 2025 closed)	~$315M annualized revenue estimate in Feb 2026	~12.7x	Relevant private inference comp with explicit gross-margin discussion.	Revenue and margin are also third-party estimates, not audited disclosure.
CoreWeave	$59.75B market cap (May 2026)	~$12.5B 2026 revenue context / guide midpoint proxy	~4.8x	Useful pure-play AI cloud floor for capital-intensive infrastructure.	Much larger scale, debt profile, and business model than Baseten.
Datadog	$88.04B market cap (May 2026)	~$4.32B 2026 revenue guide midpoint	~20.4x	Premium public infrastructure-software benchmark for disclosed growth quality.	Observability software carries better margin and disclosure quality than Baseten’s public record.
Cloudflare	$85.47B market cap (May 2026)	~$2.33B trailing revenue	~36.7x	Upper-bound developer-platform multiple reference.	Category leadership and public-company maturity are far stronger than Baseten’s today.
MongoDB	$27.01B market cap (May 2026)	~$2.60B trailing revenue	~10.4x	Lower-middle public infrastructure-software reference.	Database economics and installed base are not the same as inference infrastructure.

This is a partial but intentionally broad sample spanning private inference peers, AI cloud, and public infrastructure software to bracket what the market could plausibly pay.

[CV016, CV017, CV019, CV020, CV021, CV022]

FV003: Valuation / return range

Public evidence places the closed $5B round inside the base case, while a rumored $11B round requires bull-case assumptions.

Scenario bands combine revenue-support ranges and comp-multiple bands; the rumored follow-on is shown as an external signal, not an endorsed anchor.

[CV008, CV043, CV044, CV045]

FV004: Investment KPIs

Baseten scores well on market tailwind and product differentiation, but much lower on disclosure quality and margin certainty.

Scores are directional IC-style judgments based only on retained public evidence as of the run date.

[CV025, CV030, CV031, CV040, CV041, CV042]

8.4 The thesis should be gated by terms, margin evidence, and concentration—not by enthusiasm alone

The final call hinges on a small number of diligence items that can move valuation quickly. First, investors need management-grade revenue evidence. If the company is truly around a $600 million annualized run-rate with strong expansion and acceptable concentration, the closed $5 billion round starts to look reasonable and the next mark becomes debatable rather than fanciful. If the real number is materially lower, the same pricing and comp work flips from “defensible premium” to “overextended late-stage mark.” Second, investors need direct margin data. Fireworks’ roughly 50% gross margin is a useful peer reminder that inference businesses are not pure software. Baseten’s premium pricing only deserves a premium multiple if utilization, support load, and reserved-capacity economics produce better margin quality than that analogy implies. Third, investors need the terms underneath the headline price. Preference overhang, secondaries, and customer concentration can matter more than the post-money headline. This is why the right kill triggers are practical rather than rhetorical: if Baseten cannot preserve premium price-performance with acceptable margin, if growth slips below the base-case band, or if any new round clears only with aggressive terms, the thesis should be downgraded quickly. Until those questions are closed, the company deserves active tracking and structured diligence rather than a high-conviction price-insensitive buy decision.[CV023, CV025, CV032, CV033, CV034, CV039]

Thesis-break and kill triggers table
Trigger	Threshold / signal	Transmission to thesis	Action implication
Revenue proof breaks	Management-grade run-rate lands materially below $500M or growth decelerates sharply from the public narrative.	The closed $5B round falls out of the base-case band and starts looking like a late-cycle mark.	Downgrade the recommendation and reset valuation work to the bear-case range.
Margin quality disappoints	Gross margin and utilization look closer to infrastructure resale economics than premium software economics.	Premium software multiples no longer fit the business model.	Apply a lower comp set and require a meaningfully better entry price.
Price-performance edge erodes	Customers can achieve acceptable production results on cheaper alternatives or bundled hyperscaler products.	Baseten’s premium pricing turns from moat into adoption friction.	Cut conviction and revisit long-term market-share assumptions.
Aggressive financing terms appear	A new round clears only with heavy preferences, secondary-heavy structures, or unusual protections.	Headline valuation stops mapping cleanly to common-equity upside.	Treat the mark as structurally weaker and rework return expectations.
Concentration emerges	A handful of AI-native accounts drive an outsized share of revenue without corresponding retention evidence.	The company’s revenue quality and durability become much less attractive than the headline growth rate.	Pause any high-conviction call until concentration and expansion data are clarified.

These triggers are designed to be monitorable and directly tied to valuation support, not generic operating caution.

[CV012, CV013, CV014, CV039, CV042, CV043]

Final diligence asks table
Topic	Missing evidence	Why it matters	Owner or diligence path
Revenue bridge	Monthly and quarterly revenue, ARR, and cohort expansion through the current quarter.	This is the single biggest determinant of whether $5B is fair or already stretched.	CFO / finance package and board materials.
Gross margin and utilization	Gross margin by product surface, GPU utilization, reserved-capacity mix, and support burden.	The multiple only deserves to converge toward premium software if margin quality is real.	Infra + finance deep dive with product-line detail.
Cap table and terms	Fully diluted share count, liquidation preferences, secondaries, and any structured terms.	Headline post-money can overstate common-equity upside.	Legal + finance review of latest financing docs.
Customer concentration and retention	Top-customer exposure, NRR, logo retention, and vertical mix across AI-native and enterprise accounts.	A premium multiple is fragile if spend is concentrated or non-repeatable.	Sales ops and customer-success cohort review.
Step-up financing evidence	Signed term sheet or closed-round proof for any valuation above $5B.	A rumored mark should not replace a closed anchor in underwriting.	Board process review and direct financing confirmation.

The asks are ranked by how quickly they can change valuation support rather than by general company importance.

[CV025, CV037, CV046, CV047, CV048]

Disclaimer

This report was produced by an automated research workflow using publicly available information as of 2026-05-30. It is not investment advice. Private-company data may be incomplete, stale, or estimated, and investors should supplement this report with management diligence, contractual review, and direct access to financial materials before making any investment decision.

Evidence index

Claims
ID	Statement	Confidence	Sources
CO001	Baseten was founded in 2019.	High	SO009, SO016, SO018
CO002	Baseten is based in San Francisco.	High	SO008, SO014, SO016
CO003	Baseten’s legal entity is Baseten Labs, Inc., and its privacy policy lists 201 Spear St, Suite 1600, San Francisco, CA 94105.	High	SO007, SO008
CO004	Baseten currently presents itself as an inference company built around high-performance production inference.	High	SO001, SO003, SO014
CO005	Official product surfaces show that Baseten combines production inference, model APIs, and training workflows in one platform.	Medium	SO001, SO005
CO006	Baseten sells cloud, self-hosted, and region-aware deployment options aimed at customers that need control over security or data residency.	High	SO003, SO004, SO005
CO007	Baseten says it is SOC 2 Type II and HIPAA compliant across its hosting options.	High	SO003, SO004, SO005
CO008	Baseten’s careers page, customer hub, and Series E press release name customers such as Abridge, Cursor, OpenEvidence, Speechify, Gamma, Clay, Notion, and Lovable.	High	SO006, SO002, SO014
CO009	The founders say they started Baseten at the end of 2019 to solve model-deployment and ML-infrastructure pain they had experienced themselves.	Medium	SO009
CO010	Tuhin Srivastava is publicly identified as CEO and co-founder.	High	SO031, SO015
CO011	Amir Haghighat is publicly identified as CTO and co-founder.	High	SO032, SO015
CO012	Phil Howes is publicly identified as a co-founder, and Tech Funding News describes him as chief scientist.	Medium	SO034, SO015
CO013	Pankaj Gupta is publicly identified as a co-founder.	Medium	SO033, SO015
CO014	Baseten’s Series E announcement is signed by Amir, Pankaj, Phil, and Tuhin, showing that all four founders still anchor the public leadership narrative.	Medium	SO013
CO015	Public governance visibility is limited in the fetched corpus, but Series D explicitly says Jay Simons joined Baseten’s board.	Medium	SO012, SO016
CO016	By the Series A milestone, Baseten said it had raised a little over $20 million across seed and Series A, with Greylock leading the Series A and South Park Commons, Lachy Groom, and Ray Tonsing also involved.	Medium	SO009
CO017	Tracxn and the archived PitchBook profile both place Baseten’s Series A on 2022-04-26.	Medium	SO016, SO018
CO018	Baseten’s Series B added $40 million led by IVP and Spark, with Greylock, South Park Commons, Lachy Groom, and Base Case also participating.	Medium	SO010
CO019	Tracxn records the Series B round date as 2024-03-04.	Medium	SO016
CO020	Baseten’s Series C raised $75 million on 2025-02-19 and was backed by IVP, Spark, Greylock, Conviction, South Park Commons, Basecase, Lachy Groom, Adam Bain, and Dick Costolo.	High	SO011, SO016
CO021	The Series C post says Baseten was already running workloads across thousands of GPUs and serving millions of end customers worldwide by early 2025.	Medium	SO011
CO022	Baseten’s Series D raised $150 million, was led by BOND, and brought CapitalG and Conviction into the round alongside prior investors.	High	SO012, SO016
CO023	CB Insights and Tracxn both peg Baseten’s September 2025 valuation at about $2.15 billion.	Medium	SO017, SO016
CO024	Series D linked financing to governance by adding Jay Simons to the board.	Medium	SO012
CO025	Baseten’s Series E raised $300 million at a $5 billion valuation, led by IVP and CapitalG with NVIDIA and several prior investors participating.	High	SO014, SO013, SO015
CO026	Tracxn and CB Insights both list the Series E closing date as 2026-01-20.	Medium	SO016, SO017
CO027	BusinessWire says Baseten has raised $585 million to date and that the Series E financing was its third fundraise in the prior year.	High	SO014, SO016, SO017
CO028	BusinessWire describes Baseten as infrastructure behind AI products including Cursor, Mercor, Clay, OpenEvidence, Lovable, and Abridge.	Medium	SO014
CO029	Official enterprise and healthcare pages market four-nines reliability, multi-cloud autoscaling, and region-restricted or self-hosted deployment options for sensitive workloads.	High	SO003, SO004, SO001
CO030	NVIDIA’s case study says Baseten reduced cold starts to 5–10 seconds from up to five minutes and doubled one customer’s inference performance with TensorRT-LLM.	Medium	SO027
CO031	OpenEvidence’s case study says Baseten now serves billions of requests per week for OpenEvidence and reduced end-to-end latency from over 700 milliseconds to 160 milliseconds.	Medium	SO023
CO032	Gamma says it generates roughly 3 million images per day on Baseten for 70+ million users and more than $100 million of ARR.	Medium	SO024
CO033	Speechify says Baseten cut its cost per million characters by 44% while supporting a 60M+ user base.	Medium	SO025
CO034	Patreon says Baseten saved more than 440 developer hours per year and cut GPU costs by 70% for Whisper-based workloads.	Medium	SO026
CO035	A January 2026 WorkOS interview says Baseten had just launched a startup program for seed and Series A companies and saw voice as an emerging modality.	Medium	SO028
CO036	Current headcount is not cleanly supportable because PitchBook and Tracxn show conflicting employee counts and entity-level staff figures.	Low	SO018, SO016
CO037	ServiceAlert’s 90-day monitor showed May 2026 at 100% uptime and zero days with issues, but it also says it only tracks daily worst-status reachability and lacks detailed incident data.	Medium	SO030
CO038	Nudge Security frames Baseten as a vendor-risk review target and lists security badges and SSO or MFA features, but it is an external aggregator rather than Baseten’s primary trust documentation.	Medium	SO029, SO007
CO039	Abridge describes itself as enterprise-grade AI for clinical conversations used by large healthcare systems, consistent with Baseten’s official claim that healthcare AI is a core customer segment.	Medium	SO019, SO006
CO040	Cursor says it is trusted by over half of the Fortune 500, supporting Baseten’s official claim to serve category-defining AI applications rather than only hobby use cases.	Medium	SO021, SO014
CO041	Clay says more than 500,000 GTM teams use its platform, and Baseten’s Series E materials identify Clay as a customer.	Medium	SO020, SO014
CO042	OpenEvidence positions itself as America’s Official Medical Knowledge Platform with major medical-content partners, and Baseten names it as a customer in both careers and Series E materials.	Medium	SO022, SO006, SO014
CO043	External market-data sources classify Baseten as a private Series E company.	Medium	SO016, SO018
CO044	Baseten’s pricing page shows a pay-as-you-go model with token-priced model APIs and per-minute compute pricing for GPU and CPU instances.	Medium	SO005
CO045	Baseten’s terms define the service as a platform for deploying machine learning models and building or operating applications for machine learning through a web interface.	Medium	SO007
CO046	The Series A post says Baseten quietly announced its first product in May after more than 18 months of building and used the funding moment to launch a public beta.	Medium	SO009
CM001	Baseten positions itself as a platform for high-performance inference in production rather than as a foundation-model creator.	Medium	SM001, SM003, SM006
CM002	Baseten's product surface now spans Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training.	Medium	SM005, SM006, SM007, SM015, SM016
CM003	The included spend in Baseten's core market is model-serving runtime, autoscaling, observability, billing or metering, and associated performance support attached to production inference.	Medium	SM002, SM006, SM007, SM016, SM018
CM004	The excluded spend is frontier-model R&D and the broader data or analytics stack that hyperscaler AI suites bundle but Baseten does not foreground.	Medium	SM015, SM027, SM028, SM029
CM005	Baseten competes inside overlapping categories of AI inference-as-a-service, broader AI inference, and enterprise AI platform software rather than a single cleanly defined market.	Medium	SM019, SM020, SM021
CM006	Status-quo substitutes include hyperscaler AI platforms, internal GPU infrastructure, and specialist GPU clouds such as Modal, Replicate, and Runpod.	Medium	SM022, SM023, SM024, SM027, SM028, SM029
CM007	Baseten's positioning emphasizes open-source and custom model deployment rather than ownership of closed frontier models.	Medium	SM005, SM006, SM014
CM008	Training is an adjacency for Baseten, but the commercial center of gravity remains inference and the train-to-deploy loop that feeds inference endpoints.	Medium	SM001, SM006, SM015
CM009	Technavio values the AI inference-as-a-service market at USD 85.25 billion in 2025.	Medium	SM019
CM010	Technavio forecasts a 22.1% CAGR for AI inference-as-a-service during 2026-2030.	Medium	SM019
CM011	Technavio says the GPU segment accounts for more than 58% of the AI inference-as-a-service market and that cloud deployment dominates the category.	Medium	SM019
CM012	Technavio says North America contributes 41.1% of forecast growth in AI inference-as-a-service.	Medium	SM019
CM013	Mordor Intelligence puts the enterprise AI market at USD 114.87 billion in 2026 and projects 18.91% CAGR through 2031.	Medium	SM020
CM014	Mordor says software and platforms led 65.89% of 2025 enterprise AI revenue.	Medium	SM020
CM015	Mordor says cloud solutions accounted for 67.33% of 2025 enterprise AI revenue while hybrid and edge configurations are the faster-growing deployment path.	Medium	SM020
CM016	Large enterprises accounted for 71.43% of 2025 enterprise AI spending in Mordor's market model.	Medium	SM020
CM017	Healthcare and life sciences are Mordor's fastest-growing enterprise AI vertical at 20.77% CAGR.	Medium	SM020
CM018	Fortune Business Insights values the broader AI inference market at USD 117.80 billion in 2026, up from USD 103.73 billion in 2025.	Medium	SM021
CM019	Fortune forecasts 12.98% CAGR to 2034 and says North America held 41.78% of the AI inference market in 2025.	Medium	SM021
CM020	Across public lenses, Baseten's addressable opportunity is clearly large but scope-sensitive: roughly USD 85 billion for inference-as-a-service today and more than USD 100 billion for adjacent inference or enterprise AI platform categories.	Medium	SM019, SM020, SM021
CM021	Baseten's best-evidenced buyer groups are performance-sensitive AI product teams, enterprise AI infrastructure teams, and model labs monetizing APIs.	Medium	SM003, SM010, SM016
CM022	Gamma shows a PLG or self-serve segment that values low latency and open-source model serving without building an internal ML infrastructure team.	Medium	SM012
CM023	OpenEvidence shows a regulated healthcare segment that wanted reliable performance, redundancy, and flexible compute without multi-year GPU commitments.	Medium	SM011
CM024	Writer shows enterprise model teams serving 70B models need multi-GPU performance engineering and secure deployments.	Medium	SM013
CM025	The daily users of Baseten are ML engineers, data scientists, and application engineers, while procurement, security, and IT administrators become stakeholders once deployments require identity, policy, or compliance controls.	Medium	SM009, SM014, SM017
CM026	Budget ownership appears to begin in product or engineering budgets for usage-based experimentation and shift toward central platform or IT budgets for quoted Pro, Enterprise, or self-hosted deployments.	Medium	SM002, SM003, SM017, SM018
CM027	Baseten's adoption path commonly starts with Model APIs or simple deployments and expands to Dedicated Inference and Chains as traffic, hardware specialization, or compound workflows grow.	Medium	SM001, SM005, SM006, SM007
CM028	Frontier Gateway creates a separate buyer motion for labs that need white-labeled APIs, rate limits, token metering, and billing without building their own inference control plane.	Medium	SM016
CM029	Baseten productizes compliance with HIPAA, SOC 2 Type II, region restrictions, dedicated namespaces, and a no-shared-GPU posture.	Medium	SM003, SM004, SM009
CM030	Baseten positions self-hosted, hybrid, and cloud deployments as ways to meet data residency, security, and existing cloud-commitment requirements.	Medium	SM002, SM003, SM006, SM008
CM031	Baseten's Model APIs are OpenAI-compatible and are marketed as 5-10x cheaper than closed alternatives.	Medium	SM005
CM032	Dedicated Inference is marketed as delivering 6x better GPU utilization and 5-10x lower costs at scale.	Medium	SM006
CM033	Chains is marketed as giving compound AI teams 6x better GPU usage and roughly half the latency through hardware-aware orchestration.	Medium	SM001, SM007
CM034	Baseten's value proposition is not just compute rental; customer stories repeatedly emphasize outsourced performance engineering and forward-deployed support.	Medium	SM003, SM011, SM012, SM013
CM035	OpenEvidence reported 78% lower latency, 6x faster deployments, and 8x-plus lower infrastructure maintenance time after moving to Baseten.	Medium	SM011
CM036	Gamma reported 30-80% faster image generation, 20% better efficiency, and scaling to 70+ million users and about 3 million images per day on Baseten.	Medium	SM012
CM037	Writer reported 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs.	Medium	SM013
CM038	Enterprise AI growth is being driven by automation demand, exploding data volumes, cloud AI services, and specialized hardware advances.	Medium	SM020
CM039	AI inference demand is also being driven by real-time processing needs, generative AI workloads, and edge or IoT expansion.	Medium	SM019, SM021
CM040	Hardware supply constraints, high accelerator prices, and tariff pressure are material market constraints for both inference providers and buyers.	Medium	SM019, SM021
CM041	Talent shortages and legacy-system integration complexity remain major barriers to enterprise AI rollout.	Medium	SM020, SM021
CM042	Multi-cloud capacity management and the ability to avoid long-term GPU commitments address a real buyer pain point around capacity risk and demand spikes.	Medium	SM008, SM011, SM016
CM043	HostFleet's April 2026 matrix shows Baseten's published hourly GPU prices above Runpod, Modal, and Replicate on like-for-like SKUs such as T4, L4, A100, and H100.	Medium	SM026
CM044	Runpod's own 2026 comparison ranks Baseten fifth and attributes 8-12 second cold starts to Baseten while highlighting cheaper or faster specialist alternatives.	Medium	SM025
CM045	Hyperscaler substitutes bundle model deployment with broader data, notebook, governance, and agent tooling rather than pure inference specialization.	Medium	SM027, SM028, SM029
CM046	Baseten's clearest public beachheads are high-performance consumer AI products, regulated healthcare workloads, and model labs monetizing proprietary models.	Medium	SM011, SM012, SM013, SM016
CM047	Public pricing and packaging imply Baseten trades a higher headline GPU rate for bundled performance engineering, observability, security, and managed support.	Medium	SM002, SM003, SM016, SM026
CM048	Public sources do not isolate a clean Baseten-specific SAM or SOM because available estimates mix enterprise AI, inference infrastructure, cloud, edge, and model-serving categories.	Medium	SM019, SM020, SM021
CM049	Public material does not disclose Baseten's contract sizes, attach rates for support, or revenue mix across Model APIs, Dedicated Inference, Training, and Frontier Gateway.	Medium	SM002, SM016, SM018
CP001	Baseten's current product surface spans custom-model deployment, Model APIs, training, Chains orchestration, and Frontier Gateway.	High	SP003, SP005, SP006, SP007, SP010
CP002	Baseten supports Baseten Cloud, single-tenant or self-hosted deployments, and multi-cloud capacity or cross-cloud autoscaling.	High	SP001, SP003, SP004, SP009
CP003	Baseten's public pitch centers on speed, uptime, and developer experience instead of lowest-cost raw GPU capacity.	High	SP001, SP008, SP009, SP029
CP004	Modal positions as Python-first serverless AI infrastructure with instant autoscaling to 1000+ GPUs and built-in observability.	High	SP014, SP015
CP005	Replicate positions around one-line APIs, community-published models, fine-tuning, and custom deployment through Cog.	High	SP016, SP017
CP006	Runpod offers Pods, Serverless, and Clusters, emphasizing fast scaling, many GPU SKUs or regions, and low-cost capacity.	High	SP018, SP019, SP020
CP007	AWS SageMaker, Google Vertex AI, and Azure ML each market broader end-to-end ML or AI lifecycle tooling with strong enterprise controls.	High	SP021, SP023, SP024
CP008	Internal build remains a real substitute because Truss packages models portably and can narrow the software gap between local, self-managed, and hosted deployments.	High	SP011, SP003
CP009	Frontier Gateway lets model labs ship white-labeled APIs with per-user keys, rate limits, and metering, widening Baseten's competitor set to lab-facing platforms.	Medium	SP010, SP029
CP010	PitchBook and Tracxn independently name Modal and Replicate among Baseten's comparable competitors, supporting the direct-peer set beyond vendor marketing.	High	SP027, SP026
CP011	Baseten's public plans split into Basic pay-as-you-go, quote-driven Pro, and Enterprise, with priority compute, dedicated compute, self-host, and custom SLAs above Basic.	High	SP002, SP009
CP012	Baseten says customers do not pay for idle time and only pay while models are deploying, scaling, or processing on the platform.	Medium	SP002
CP013	Baseten advertises SOC 2 Type II, HIPAA compliance, and no default storage of model inputs or outputs.	High	SP002, SP004, SP009
CP014	Baseten's runtime layers open-source engines such as TensorRT-LLM, SGLang, vLLM, TGI, and TEI with custom optimizations like speculative decoding and KV-cache management.	High	SP003, SP008
CP015	Baseten Model APIs are OpenAI-compatible and can move from shared APIs to dedicated deployments on Baseten-managed hardware.	High	SP005, SP003
CP016	Modal's public pricing offers $30 per month in Starter credits, 10 GPU concurrency on Starter, and 50 GPU concurrency on Team at $250 per month plus compute.	Medium	SP015
CP017	Replicate private models usually bill for setup, idle, and active time on dedicated hardware, making always-warm custom deployments costlier than pure scale-to-zero billing.	High	SP017, SP016
CP018	Runpod Secure Cloud and Serverless publish materially lower raw GPU list prices than Baseten's public per-GPU pricing for comparable capacity tiers.	Medium	SP019, SP020, SP002, SP025
CP019	Runpod Serverless bills per second from worker start until full stop, with flex workers scaling to zero and active workers remaining on.	High	SP020, SP018
CP020	AWS Bedrock prices open-model access by provider or model and service tier, and its batch inference option is listed at 50% below on-demand pricing.	Medium	SP022
CP021	Google Vertex AI prices tools, compute, storage, and management fees separately rather than offering simple public list GPU rates.	Medium	SP023
CP022	Azure ML charges no standalone platform fee but bills the Azure services consumed around training, deployment, storage, and monitoring.	Medium	SP024
CP023	Baseten's multi-cloud and self-host options reduce buyer fear of cloud lock-in, but they also make it easier for customers to migrate away from Baseten later.	High	SP001, SP009, SP011
CP024	Baseten's public trust posture is stronger than most self-serve peers because it combines compliance claims with single-tenant and self-host deployment modes.	High	SP002, SP004, SP009
CP025	Hyperscalers retain the strongest distribution power because Bedrock or SageMaker, Vertex AI, and Azure ML sit inside existing identity, billing, and procurement relationships.	High	SP021, SP023, SP024
CP026	Modal narrows the enterprise gap with marketplace transacting, SSO, audit logs, and HIPAA on Enterprise, but its public package is still compute-led rather than inference-specific governance.	Medium	SP014, SP015
CP027	Replicate minimizes adoption friction for prototypes through community models and simple APIs, but its public materials disclose less enterprise control than Baseten's.	Medium	SP016, SP017, SP009
CP028	Runpod explicitly markets no lock-in, low cost, and fast scale, making it attractive to cost-sensitive teams comfortable assembling their own serving stack.	High	SP018, SP019, SP020
CP029	Open-source packaging via Truss and Cog plus raw GPU clouds make multi-homing structurally easier in this market than in closed-model or data-platform markets.	High	SP011, SP017, SP019
CP030	Baseten's expansion into training and lab-facing gateway products moves it from pure hosting into a broader AI infrastructure platform category.	High	SP006, SP010, SP029
CP031	Baseten's main moat is the integration of optimized runtimes, multi-cloud capacity, enterprise deployment modes, and hands-on engineering support rather than exclusive model ownership.	High	SP008, SP009, SP029
CP032	Truss can create developer pull and portability at the same time, so it is both a funnel asset and a limiter on hard lock-in.	High	SP011, SP003
CP033	HostFleet's April 2026 matrix shows Baseten as the highest published per-GPU-hour option among the compared serverless hosts on multiple common GPU tiers.	Medium	SP025, SP002, SP015, SP017, SP019
CP034	The same HostFleet comparison still argues Baseten is attractive for production workloads because Truss, observability, and support are tangible despite higher headline pricing.	Medium	SP025, SP002, SP003
CP035	Baseten's public status page reports 99.91% uptime for Model APIs over the displayed window and records multiple May 2026 incidents.	Medium	SP012
CP036	Servicealert's independent outage tracker also shows non-perfect recent availability for Baseten, reinforcing that reliability remains a diligence item.	Medium	SP013, SP012
CP037	Sacra identifies hyperscaler bundling and below-market pricing as the clearest external threat to independent inference platforms like Baseten.	Medium	SP028, SP021, SP023, SP024
CP038	Business Wire and TechFundingNews both frame Baseten's current strategic battleground as production inference infrastructure rather than frontier-model training ownership.	Medium	SP029, SP030
CP039	Business Wire says Baseten has raised $585 million and counts NVIDIA, IVP, and CapitalG among key investors, improving its staying power in a capital-intensive market.	Medium	SP029, SP028, SP026
CP040	Baseten's best-supported positioning is premium, production-grade open-model inference for teams that value performance, portability, and support more than lowest-cost GPU hours.	High	SP003, SP008, SP009, SP025, SP028
CI001	Baseten's public monetization surfaces span dedicated deployments, Model APIs, and Training.	High	SI001, SI003, SI004, SI007
CI002	Baseten's public plan structure is Basic at $0 per month pay-as-you-go, with Pro and Enterprise sold via quote.	Medium	SI001
CI003	Pro includes priority access to high-demand GPUs, dedicated compute, higher Model API rate limits, hands-on engineering expertise, dedicated Slack and Zoom support, and volume discounts.	Medium	SI001
CI004	Enterprise includes custom SLAs, self-host deployments, use of existing cloud commitments, full control over data residency, and advanced RBAC with Teams.	Medium	SI001, SI005
CI005	Baseten publishes Model API list pricing per 1 million tokens with separate columns for input, cached input, and output.	High	SI001, SI003
CI006	Dedicated deployments are billed only for compute used, down to the minute.	Medium	SI001
CI007	Baseten says customers do not pay for idle time, but do pay while a model is deploying, scaling up or down, or making predictions.	Medium	SI001
CI008	Baseten sells Training both as Loops early access and as generally available Training Jobs, with a direct train-to-deploy path into production inference.	Medium	SI004
CI009	Baseten's Terms state that fees are billed at the end of the month and payable within 30 days unless an Order says otherwise.	Medium	SI013
CI010	Baseten's Terms make the Order the binding commercial instrument, so enterprise economics can vary contract by contract even though list pricing is public.	Medium	SI013, SI005
CI011	The billing usage API returns separate dedicated_usage, training_usage, and model_apis_usage blocks with subtotals, credits used, totals, and daily breakdowns.	Medium	SI007
CI012	The model_apis_usage block reports model name plus input, output, and cached input token counts.	Medium	SI007
CI013	The dedicated_usage block reports billable resource metadata, minutes, subtotal, and inference request counts.	Medium	SI007
CI014	Baseten explicitly monetizes support and engineering help through Pro, Enterprise, and enterprise deployment offers.	High	SI001, SI002, SI005
CI015	Dedicated Inference claims Baseten regularly sees 6x better GPU utilization and 5-10x lower costs powered by its inference stack.	Medium	SI002
CI016	The Model APIs page claims Baseten can spend 5-10x less than closed alternatives when serving optimized frontier open models.	Medium	SI003
CI017	The Enterprise page frames Baseten's economic advantage as higher output and better GPU utilization from optimized runtimes rather than seat-based software pricing.	Medium	SI005
CI018	The Healthcare page says per-minute billing and scale-to-zero make GPU costs scale with active inference rather than idle overhead.	Medium	SI006
CI019	Writer reports 35% lower cost per million tokens, 60% higher tokens per second, and 23% lower time to first token on Baseten.	Medium	SI016
CI020	OpenEvidence reports 78% lower latency, 6x faster deployment processes, 8x+ lower infrastructure maintenance time, and flexible access to compute without multi-year contracts.	Medium	SI017
CI021	Speechify reports 44% lower cost per million characters, 30-50% lower p99 latency, and 4.5x faster replica startup after migrating to Baseten.	Medium	SI018
CI022	Superhuman reports 80% lower P95 latency and says Baseten freed multiple engineers from building and running inference infrastructure in-house.	Medium	SI019
CI023	Patreon reports 440+ hours of development time saved per year, $600,000 of resources saved per year, and 70% GPU-cost savings on Baseten.	Medium	SI020
CI024	Taken together, Baseten's customer proofs sell lower total production cost and faster deployment for serious workloads rather than the lowest raw GPU-hour list price.	Medium	SI016, SI017, SI018, SI019, SI020
CI025	HostFleet's April 2026 matrix shows Baseten priced above Runpod on every shared GPU SKU it lists, above Modal on the shared L4 and H100 rows, and below only Replicate's A100 custom deployment rate among the shared A100 prices shown.	Medium	SI027
CI026	HostFleet says Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times.	Medium	SI027
CI027	Baseten raised $300 million at a $5 billion valuation in January 2026.	High	SI010, SI021, SI024, SI025
CI028	Business Wire says the January 2026 financing was Baseten's third fundraise in the prior year and brought total capital raised to $585 million.	High	SI021, SI024, SI025
CI029	Baseten's Series D was $150 million in September 2025.	High	SI009, SI024
CI030	Baseten's Series C was $75 million in February 2025.	High	SI008, SI024
CI031	Tracxn and CB Insights also show $585 million total funding and a $300 million Series E on January 20, 2026.	Medium	SI024, SI025
CI032	Baseten's Series E blog says inference volume grew 100x in the prior year.	Medium	SI010
CI033	Baseten's Series E materials say the new capital will fund speed, uptime, developer experience, team growth, and a broader infrastructure platform.	High	SI010, SI021
CI034	Tech Funding News says the new funding is expected to support hiring in engineering and customer service plus platform and integration expansion.	Medium	SI022
CI035	Sacra estimates Baseten reached $200 million annualized revenue in December 2025 and $600 million annualized revenue in March 2026.	Low	SI023
CI036	Sacra says Baseten monetizes either API consumption or GPU minutes and hours and uses multi-cloud capacity management across more than 15 cloud providers instead of owning GPU infrastructure.	Medium	SI023
CI037	PitchBook labeled Baseten as generating revenue by February 2025 and showed 73 employees in its April 2025 snapshot.	Medium	SI026
CI038	Tracxn lists 258 employees as of April 2026.	Medium	SI024
CI039	The jump from 73 employees in PitchBook's 2025 snapshot to 258 in Tracxn's April 2026 snapshot implies substantial operating-expense growth, but payroll and burn are undisclosed.	Medium	SI024, SI026
CI040	Baseten's status page shows Model APIs at 99.91% uptime over the displayed 90-day window and multiple incidents in May 2026, while the Dedicated Inference component shows 100.0% uptime over the same displayed window.	Medium	SI015
CI041	The Dedicated Inference SLA targets 99.9% monthly availability, caps service credits at 40% of monthly fees, and requires claims within 24 hours of downtime.	Medium	SI014
CI042	Baseten's privacy policy identifies the contracting entity as BaseTen Labs, Inc.	Medium	SI012
CI043	The SEC EDGAR entity landing page for CIK 0001850888 says there is no filings data for the organization, so there are no public SEC operating-company financial statements available from that page.	Medium	SI029
CI044	Mordor says cloud deployments were 67.33% of enterprise AI revenue in 2025 and hybrid and edge deployments are forecast to grow 19.53% CAGR through 2031.	Medium	SI028
CI045	Mordor says healthcare and life sciences are forecast to grow 20.77% CAGR through 2031.	Medium	SI028
CI046	Baseten's enterprise and healthcare pages align with that opportunity through self-host, cloud-commitment, data-residency, HIPAA, and SOC 2 positioning.	Medium	SI005, SI006, SI028
CI047	Baseten's public materials do not disclose cash balance, monthly burn, runway, gross margin, CAC, NRR, customer concentration, or revenue mix by product surface.	Medium	SI001, SI010, SI013, SI021, SI023, SI024, SI025, SI029
CI048	Sacra reports Baseten is in talks to raise capital at about an $11 billion post-money valuation, with some reported offers as high as $15 billion, but that is not a closed financing.	Low	SI023
CI049	Because Baseten appears asset-light on owned GPUs but premium-priced on raw list compute, margin quality likely depends on utilization, enterprise support attachment, and negotiated discounts rather than headline GPU rates alone.	Medium	SI005, SI023, SI027
CI050	The public evidence supports strong demand, pricing, and capital-access narratives, but a real underwriting decision still depends on private data for realized pricing, retention, gross margin, burn, and concentration.	Medium	SI021, SI023, SI024, SI025, SI027, SI029
CI051	Baseten's public customer proofs now span financial-services AI, coding copilots, voice dictation, and world-model workloads, indicating that production demand is diversified across several latency-sensitive categories rather than one single end market.	Medium	SI030, SI031, SI032, SI033, SI034
CI052	Hebbia said Baseten improved tokens per second 2.5x, improved time to first token 4x, and reduced inference cost by more than 10x versus its previous deployment.	Medium	SI030
CI053	Posit said Baseten delivered sub-200ms latency for its Next Edit Suggestions feature and let the team pay only for compute it actually used.	Medium	SI031
CI054	Wispr Flow said its end-to-end speech and Llama pipeline ran in under 700 milliseconds at p99 on Baseten and AWS, with scale-to-zero elasticity.	Medium	SI032
CI055	Zed said Baseten lowered p90 latency by 45% and increased throughput 3.6x versus its previous inference provider, supporting Baseten's claim that performance wins can displace incumbent infrastructure.	Medium	SI033
CE001	Baseten publicly presents a full-stack product surface spanning Truss-led custom deployment, Model APIs, Dedicated Inference, Chains, Frontier Gateway, and Training rather than a single hosting SKU.	High	SE001, SE002, SE005, SE007, SE008, SE009, SE010
CE002	Model APIs run on shared infrastructure with OpenAI and Anthropic API compatibility, while dedicated deployments let customers choose hardware, engines, and scaling for their own models.	High	SE006, SE007
CE003	Truss packages model serving logic, dependencies, weights, and GPU configuration so the same artifact behaves consistently in development and production.	Medium	SE025, SE027
CE004	Truss publicly supports vLLM, SGLang, TensorRT-LLM, transformers, diffusers, PyTorch, and TensorFlow.	Medium	SE025, SE027
CE005	For supported architectures, a config-only Truss deployment can compile a model with TensorRT-LLM and expose an OpenAI-compatible endpoint without custom Python model code.	Medium	SE004, SE025
CE006	Chains deploys Python-defined chainlets where each step can set its own hardware resources, software dependencies, and autoscaling settings.	High	SE002, SE009
CE007	Baseten's training surface has two public tracks: Training Jobs is GA and Loops is early access.	Medium	SE010
CE008	Loops is positioned as a training SDK whose checkpoints can promote directly into Dedicated Inference, making inference a first-class output of training.	Medium	SE010, SE026
CE009	Frontier Gateway adds a white-labeled API surface with key management, rate limits, metering, billing, and branded URLs for labs serving their own models to customers.	High	SE002, SE008
CE010	MCM is Baseten's infrastructure control plane for unifying GPUs across cloud providers and regions, provisioning resources, and rerouting workloads during capacity crunches or outages.	High	SE004, SE011
CE011	Baseten gives each deployment a dedicated model subdomain and keeps endpoint names stable across environment promotion.	Medium	SE004
CE012	Baseten's request-routing model parks requests during scale-to-zero cold starts and offers an async queue that prioritizes synchronous traffic when capacity is tight.	Medium	SE004
CE013	BDN mirrors model weights into Baseten-controlled storage and uses mirrored-origin, cluster, and node caches to make large-model cold starts faster after the first pull.	High	SE004, SE019
CE014	Baseten publicly documents runtime optimizations including TensorRT, SGLang, vLLM, TGI, TEI, speculative decoding, structured outputs, KV-cache optimization, and topology-aware parallelism.	High	SE002, SE013
CE015	Baseten offers Baseten Cloud, self-hosted, hybrid, single-tenant, and region-restricted deployment options for customers that need different control or residency models.	High	SE007, SE011, SE015
CE016	Regional environments require Baseten configuration and a different regional endpoint format to guarantee inference traffic stays inside the designated geography.	Medium	SE021
CE017	Baseten publicly claims SOC 2 Type II and HIPAA compliance across its cloud hosting surfaces.	High	SE014, SE015, SE016
CE018	Baseten says it does not store model inputs, outputs, or weights by default, except temporary storage for async inference and optional caching users enable.	High	SE014, SE015
CE019	Baseten's public security docs say the platform never shares GPUs across users, isolates each customer into a dedicated Kubernetes namespace, and uses Calico, Falco, and Gatekeeper around workload security.	Medium	SE014
CE020	Baseten added Enterprise SSO and SCIM in May 2026 with SAML 2.0 sign-in, SCIM 2.0 sync, just-in-time provisioning, automatic deprovisioning, and group-based role assignment.	Medium	SE017
CE021	Rolling deployments launched in March 2026 and introduced max_surge_percent and stabilization_time_seconds controls for gradual zero-downtime promotion.	Medium	SE018
CE022	The billing usage API launched in March 2026 and exposes daily spend breakdowns across Dedicated Inference, Training, and Model APIs.	Medium	SE020
CE023	The only reviewed public Baseten SLA is for Dedicated Inference at 99.9% monthly availability, while Baseten marketing elsewhere uses four-nines or 99.99 reliability language.	High	SE001, SE007, SE015, SE023
CE024	Baseten's public status page showed incidents on May 15, 16, 18, 19, 26, and 29, 2026 even though its summary cards displayed 100.0% uptime for Dedicated Inference and 99.91% for Model APIs over the visible 90-day window.	Medium	SE022
CE025	ServiceAlert's third-party reachability page listed May 2026 at 100% uptime but explicitly said detailed incident data is unavailable, limiting independent verification of Baseten outage quality.	Medium	SE030
CE026	Truss has a visible public developer surface through a GitHub repository, a PyPI package, and active May 2026 release activity.	High	SE025, SE026, SE027
CE027	The May 2026 Truss release stream emphasized Loops CLI features, training checkpoint views, deployment-log links, and inference-call behavior, which indicates active investment in the training-to-inference workflow.	Medium	SE026
CE028	Writer's Baseten case study says model-specific TensorRT-LLM engines delivered 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens on four A100 GPUs.	Medium	SE028
CE029	OpenEvidence says Baseten reduced end-to-end latency from more than 700 milliseconds to 160 milliseconds and sped deployments 6x.	Medium	SE029
CE030	OpenEvidence also says Baseten now serves billions of requests per week for its medical workflow and reduced infrastructure maintenance time by more than 8x.	Medium	SE029
CE031	HostFleet's April 2026 pricing matrix shows Baseten posting higher public GPU-hour rates than Runpod and Modal on comparable L4, A100, and H100 instances.	Medium	SE016, SE031, SE032, SE033
CE032	Despite the higher published price points, HostFleet characterizes Truss, observability, and support as Baseten's tangible value-adds for startups running production inference.	Medium	SE031
CE033	Runpod and Modal market more aggressive zero-idle and cold-start language than Baseten, while Baseten emphasizes dedicated compute, managed performance engineering, and control.	Medium	SE005, SE031, SE032, SE033
CE034	Replicate's public product surface is simpler API-first model serving through Cog, whereas Baseten layers dedicated deployments, Chains, and Frontier Gateway on top of its packaging tool.	Medium	SE008, SE009, SE025, SE034
CE035	AWS SageMaker, Google Agent Platform, and Azure Machine Learning all span training, deployment, governance, and observability, so Baseten competes by offering a narrower inference-first abstraction rather than full hyperscaler platform breadth.	Medium	SE004, SE035, SE036, SE037
CE036	A third-party security profile lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, and other SaaS tools in Baseten's visible operational footprint.	Medium	SE038
CE037	Baseten's visible 2026 roadmap signal centered on trust and operating controls such as SSO/SCIM, rolling deployments, BDN, and billing instrumentation rather than entirely new product lines.	Medium	SE017, SE018, SE019, SE020
CE038	Public materials show uneven maturity within the training stack because Training Jobs is GA while Loops is still early access.	Medium	SE010
CE039	Public Baseten sources still leave unresolved product-tech gaps around benchmark methodology, exact regional-environment setup lead times, and roadmap priorities beyond the currently announced 2026 releases.	Low	SE004, SE021, SE028
CE040	Baseten's product-tech moat appears strongest for teams that value performance tuning, cross-cloud capacity, and engineering support more than lowest published unit price or hyperscaler breadth.	Medium	SE007, SE015, SE031, SE032, SE033, SE035, SE036, SE037
CU001	Baseten markets itself as a high-performance inference platform for teams shipping AI products in production.	Medium	SU001
CU002	Baseten's enterprise page targets mission-critical enterprise inference with secure, scalable, and controllable deployment options.	Medium	SU003
CU003	Baseten publicly packages Basic, Pro, and Enterprise plans around progressively heavier buyer needs, from pay-as-you-go deployments to self-hosted regulated environments.	Medium	SU005
CU004	Writer positions itself as an enterprise AI platform used by world-class enterprises.	Medium	SU008, SU015, SU016
CU005	OpenEvidence describes itself as a medical knowledge platform for clinicians and physicians.	Medium	SU009, SU014
CU006	Speechify says more than 55 million people use its voice AI productivity assistant.	Medium	SU010, SU018
CU007	Gamma says its AI tools create presentations, websites, and social content.	Medium	SU011, SU017
CU008	Superhuman positions itself as AI-enhanced mail, docs, and workflow software for knowledge workers.	Medium	SU012, SU020
CU009	Patreon says hundreds of thousands of creators use its platform to build direct fan communities and recurring businesses.	Medium	SU013, SU021
CU010	Business Wire names OpenEvidence, Abridge, Notion, Clay, and Mercor among Baseten customers.	Medium	SU007
CU011	WorkOS says Baseten powers AI workloads for Cursor, Notion, Clay, OpenEvidence, and Ambience.	Medium	SU023
CU012	Baseten says its inference volume grew 100x in the last year.	Medium	SU006
CU013	Baseten's customer-stories index spans speech, healthcare, coding, pharmaceutical search, and AI operations use cases.	Medium	SU002
CU014	OpenEvidence says Baseten now serves billions of requests per week for its medical-information product.	Medium	SU009
CU015	OpenEvidence says its product now works with a doctor in every state and zip code in America.	Medium	SU009
CU016	Speechify says its platform synthesizes more than 161 billion characters per month for 60M+ users.	Medium	SU010
CU017	Gamma says it generates more than 3 million images per day for more than 70 million users on Baseten.	Medium	SU011
CU018	Superhuman says Baseten runs dozens of custom embedding models that power core features in its product.	Medium	SU012
CU019	Patreon says Baseten saved 440+ hours of developer time and nearly $600k per year on its Whisper deployment.	Medium	SU013
CU020	FeaturedCustomers lists 13 case studies, 29 testimonials, 4 customer videos, and 654 reference ratings for Baseten.	Medium	SU024
CU021	Writer reports 60% higher tokens per second on Baseten for its domain-specific LLMs.	Medium	SU008
CU022	Writer reports 23% lower time to first token and 35% lower cost per million tokens on Baseten.	Medium	SU008
CU023	OpenEvidence reports latency falling from more than 700 milliseconds to 160 milliseconds on Baseten.	Medium	SU009
CU024	OpenEvidence reports 6x faster deployments and an 8x+ reduction in infrastructure maintenance time on Baseten.	Medium	SU009
CU025	Speechify reports a 44% lower cost per million characters on Baseten.	Medium	SU010
CU026	Speechify reports 30-50% lower p99 inference latency and 4.5x faster replica startup on Baseten.	Medium	SU010
CU027	Gamma reports 30%-80% faster image generation per model on Baseten.	Medium	SU011
CU028	Gamma reports 20% efficiency improvement while reducing replica count and supporting billions of generated images.	Medium	SU011
CU029	Superhuman reports an average 80% reduction in P95 latency across its embedding models on Baseten.	Medium	SU012
CU030	Patreon reports 70% GPU-cost savings and says Baseten was twice as cheap as the next cheapest solution for its Whisper workload.	Medium	SU013
CU031	FeaturedCustomers reports a 4.8 out of 5 reference-rating score for Baseten based on 654 ratings.	Medium	SU024
CU032	OpenEvidence says Baseten was a clear winner after the team spent weeks researching and vetting inference providers.	Medium	SU009
CU033	Speechify says Baseten delivered the highest uptime of any inference provider it knows.	Medium	SU010
CU034	Superhuman says it was able to self-serve 95% of what it needed on Baseten.	Medium	SU012
CU035	PeerSpot's review summary emphasizes Baseten's supportive environment, speed-to-deployment, flexibility, and cost effectiveness.	Medium	SU031
CU036	Baseten's pricing page shows a self-serve Basic plan, a Pro plan with dedicated compute and hands-on engineering, and an Enterprise plan with self-hosting and custom SLAs.	Medium	SU005
CU037	Baseten's enterprise page says Baseten Cloud offers single-tenant clusters and the self-hosted product can fail over to Baseten Cloud.	Medium	SU003
CU038	Baseten's healthcare page says the platform is SOC 2 Type II and HIPAA compliant, supports region-restricted deployments, and highlights OpenEvidence and Latent as healthcare cases.	Medium	SU004
CU039	WorkOS says customers often start thinking about controlling their own destiny once inference spending reaches roughly $10,000-$50,000 per month.	Medium	SU023
CU040	WorkOS says open-source models let companies switch to options that are faster, cheaper, more customizable, and more reliable at scale.	Medium	SU023
CU041	Business Wire says Baseten pitches open runtimes and no lock-in around customer models.	Medium	SU007
CU042	HostFleet says Baseten is the highest-priced listed provider in its April 2026 comparison for T4, L4, A10G, A100, and H100 where listed, and adds that Baseten has a minimum dedicated deployment cost.	Medium	SU026
CU043	Runpod ranks Baseten fifth in its 2026 serverless GPU comparison and characterizes it as per-minute, configurable-replica infrastructure with 8-12 second speed.	Medium	SU025
CU044	NVIDIA says Baseten cut cold starts from up to five minutes to 5-10 seconds using NVIDIA GPUs and TensorRT-LLM.	Medium	SU022
CU045	Publicly quantified proof is concentrated in six flagship case studies even though fundraising and interview materials name additional accounts.	Medium	SU002, SU007, SU023, SU024
CU046	Reviewed public customer materials do not disclose NRR, GRR, contract length, or top-customer revenue share.	Medium	SU002, SU003, SU005, SU006, SU007
CU047	Abridge sells enterprise-grade AI for clinical conversations trusted by the largest healthcare systems.	Medium	SU007, SU027
CU048	Clay says more than 500,000 GTM teams use its data-enrichment and workflow platform.	Medium	SU023, SU028
CU049	Cursor says it is trusted by over half of the Fortune 500 for AI-assisted software development.	Medium	SU023, SU029
CU050	Notion AI markets built-in agents, enterprise search, HIPAA-capable enterprise workflows, and zero-data-retention enterprise controls.	Medium	SU007, SU030
CU051	Mercor says it is organizing human intelligence to power the AI economy.	Medium	SU007, SU032
CU052	Publicly named strategic accounts extend Baseten beyond consumer applications into healthcare, GTM, coding, and enterprise productivity.	Medium	SU007, SU023, SU027, SU028, SU029, SU030, SU032
CU053	Public references skew toward AI-native software companies whose own products depend heavily on inference quality and latency.	Medium	SU002, SU007, SU008, SU009, SU010, SU011, SU012, SU013
CU054	Baseten's public customer-proof quality is high on outcome specificity for six flagship stories but low on disclosed renewal economics.	Medium	SU008, SU009, SU010, SU011, SU012, SU013, SU024
CU055	The public record supports land-and-expand potential from model experimentation into dedicated compute, multi-cloud scale, and self-hosted enterprise configurations.	Medium	SU003, SU005, SU009, SU010, SU012
CR001	Baseten says it maintains SOC 2 Type II certification and HIPAA compliance.	High	SR001, SR011, SR012
CR002	Baseten says it does not store model inputs or outputs by default, except async inputs are temporarily stored until processed.	High	SR001, SR011
CR003	Baseten says compliance policies are read-only for customers and must be changed through Baseten support.	Medium	SR001
CR004	Baseten offers self-hosted and single-tenant deployment options for sensitive workloads on higher-tier plans.	High	SR001, SR008, SR011, SR024
CR005	Baseten's terms incorporate a DPA and security measures that Baseten may update so long as overall protection is not materially decreased.	Medium	SR003
CR006	Baseten's DPA lets customers object to a new subprocessor within five calendar days after notice.	Medium	SR003
CR007	Baseten's DPA says it will notify customers without undue delay after discovering a personal-data breach affecting customer personal data, but customers remain responsible for their own notification obligations.	Medium	SR003
CR008	Baseten's DPA says customers must not provide PHI and other Restricted Data unless otherwise agreed upon with Baseten in writing.	High	SR003, SR029
CR009	HHS says a covered entity must obtain written satisfactory assurances before disclosing PHI to a business associate.	High	SR029, SR003
CR010	Baseten's healthcare positioning creates a diligence need to verify a signed BAA or similar written override before underwriting PHI workflows.	High	SR001, SR003, SR012, SR029
CR011	The European Commission says the AI Act is being implemented through guidance, codes of practice, and an AI Act Service Desk.	Medium	SR030
CR012	Because Baseten markets healthcare and regulated enterprise workloads, AI Act and GDPR implementation can lengthen security and legal review cycles even if Baseten is infrastructure rather than the end application.	Medium	SR011, SR012, SR030
CR013	Baseten's published SLA applies only to Dedicated Inference for which Baseten is the hosting party.	High	SR004, SR024
CR014	Baseten's published Dedicated Inference SLA targets 99.9% monthly availability.	High	SR004, SR024
CR015	Baseten's SLA caps service credits at 40% of monthly fees and requires customers to submit claims within 24 hours of unscheduled downtime.	Medium	SR004
CR016	Baseten's terms say the services are not licensed for time-critical or mission-critical functions and are not warranted to be uninterrupted or error-free.	High	SR003, SR004
CR017	Baseten's status page shows multiple May 2026 incidents, including investigation, identified-fix, monitoring, and major-outage markers in the 90-day view.	Medium	SR005
CR018	Servicealert says detailed incident data is not available for Baseten and that its history is based on reachability monitoring.	Medium	SR006
CR019	Baseten's public product pages market four nines or 99.99% uptime more broadly than the default 99.9% Dedicated Inference SLA.	High	SR004, SR011, SR012, SR020, SR023, SR024
CR020	Baseten shipped rolling deployments with gradual traffic shifting, pause, resume, and cancel controls as a mitigation against deployment-induced outages.	Medium	SR026, SR022
CR021	Baseten positions its reliability story around multi-cloud, multi-region autoscaling and hybrid deployment options rather than a single-cloud architecture.	High	SR010, SR011, SR023, SR024
CR022	Nudge Security lists AWS, Vercel, Statuspage, SendGrid, Stripe, Segment, Sentry, GitBook, and other SaaS tools in Baseten's visible supply chain.	Medium	SR007
CR023	Baseten's frontier, model API, and dedicated inference pages all tie product promises to access to the latest-generation GPUs and elastic capacity.	High	SR008, SR020, SR023, SR024
CR024	Technavio says AI inference-as-a-service providers face hardware supply constraints and high accelerator costs that inflate operating costs and limit scalability.	Medium	SR013
CR025	Mordor Intelligence says hardware accelerators are the fastest-growing enterprise AI component and that GPU supply constraints and salary inflation are current headwinds.	Medium	SR014
CR026	HostFleet's April 2026 matrix shows Baseten priced above Runpod, Modal, and Fal.ai on multiple comparable GPU classes.	Medium	SR016, SR008
CR027	Runpod's 2026 comparison lists Baseten with per-minute pricing and an 8-12 second cold-start range while ranking cheaper or faster peers above it on some dimensions.	Medium	SR015, SR008
CR028	HostFleet says Baseten has a minimum dedicated deployment cost and billed minimum awake times, which raises entry friction for smaller customers.	Medium	SR016, SR008
CR029	Baseten counters lock-in risk with self-hosting, hybrid deployment, open runtimes, and full ownership of trained weights.	Medium	SR011, SR021, SR028
CR030	Baseten announced a $300M Series E at a $5B valuation in January 2026 after multiple fundraises within the prior year.	Medium	SR018, SR017
CR031	Baseten says the financing marked the company's third fundraise in the prior year, increasing pressure to convert capital into durable enterprise growth.	Medium	SR018, SR017
CR032	Tracxn lists Baseten at 46 employees on December 31, 2024 and 258 employees by April 26, 2026.	Low	SR017
CR033	Baseten's careers page says companies such as Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten for mission-critical AI workloads.	Medium	SR019, SR018
CR034	Baseten is expanding simultaneously across model APIs, dedicated inference, frontier gateway, model management, and training products.	Medium	SR020, SR021, SR022, SR023, SR024
CR035	Baseten's Loops training product is still early access even as Training Jobs is generally available.	Medium	SR021
CR036	SSO/SCIM, advanced identity controls, self-hosting, and custom SLAs are tied to higher-tier enterprise packaging rather than the self-serve entry plan.	Medium	SR008, SR011, SR025
CR037	Baseten added SSO/SCIM with automatic provisioning and deprovisioning plus group-based role assignment as a concrete mitigation for identity risk in larger accounts.	Medium	SR025, SR011
CR038	Baseten's billing usage API gives customers programmatic daily cost visibility across Dedicated Inference, Training, and Model APIs.	Medium	SR027, SR008
CR039	Baseten's model-management tooling says customers can monitor deployment health and adjust autoscaling policies to hit performance SLAs.	Medium	SR022, SR010
CR040	Truss and custom-server packaging reduce some switching-cost risk because Baseten exposes a more portable packaging layer than a fully closed model-hosting service.	Medium	SR028, SR022
CR041	Baseten's repeated emphasis on hands-on engineering expertise and customized deployments implies a service-heavy go-to-market model that may pressure margins as enterprise accounts scale.	Medium	SR008, SR011, SR024
CR042	Baseten's public contract stack leaves customers responsible for system configuration, backups, valid legal basis, and parts of incident response, which can slow regulated deployments even when Baseten provides secure infrastructure.	High	SR003, SR004, SR029
CR043	Modal said it raised $355 million in May 2026 after surpassing $300 million in annualized revenue, showing that a close inference-infrastructure rival is scaling quickly with large new capital.	High	SR031, SR032
CR044	Reuters reported that Modal's Series C valued the company at $4.65 billion, close to Baseten's $5 billion January 2026 valuation, which limits room for execution misses if buyers compare the platforms directly.	Medium	SR032
CR045	Sacra estimated Fireworks AI at roughly $315 million in annualized revenue in 2026 and a $4 billion valuation from its 2025 Series C, indicating that another open-model inference peer is already operating at substantial scale.	Medium	SR033
CR046	Tracxn says RunPod has raised only $22 million while positioning itself as a cost-effective GPU-infrastructure provider, which suggests cheaper rivals do not need Baseten-like capital intensity to pressure pricing.	Medium	SR034
CR047	CoreWeave reported nearly $100 billion of revenue backlog in May 2026 and explicitly framed inference as a major growth vector, underscoring that capital-rich infrastructure platforms are racing to absorb the same demand pool Baseten targets.	Medium	SR035
CV001	Baseten officially announced a $300 million Series E at a $5 billion valuation in January 2026.	High	SV001, SV002, SV003, SV005
CV002	After the Series E, public sources put Baseten’s total disclosed funding at about $585 million.	Medium	SV002, SV004, SV005, SV006
CV003	Tracxn records Baseten’s financing path as a $75 million Series C in February 2025, a $150 million Series D at a $2.15 billion valuation in September 2025, and a $300 million Series E at a $5 billion valuation in January 2026.	Medium	SV005
CV004	Baseten’s official Series D announcement says the company raised $150 million in a round led by BOND.	High	SV008, SV005
CV005	Baseten’s Series C announcement and PitchBook archive together support that the company’s 2025 Series C was a $75 million round.	Medium	SV009, SV007
CV006	Baseten said inference volume grew 100x during 2025.	Medium	SV001, SV004
CV007	Sacra estimates Baseten reached $600 million of annualized revenue in March 2026, up from about $200 million in December 2025.	Medium	SV004
CV008	Sacra says Baseten was in talks in May 2026 to raise $1 billion at an $11 billion post-money valuation, with reported offers reaching as high as $15 billion.	Medium	SV004
CV009	The gap between the closed $5 billion round and the mooted $11 billion follow-on means the underwriting question is whether fundamentals have caught up with sentiment, not whether enthusiasm exists.	Medium	SV001, SV002, SV004
CV010	Baseten’s pricing page shows a free pay-as-you-go Basic tier, while Pro adds priority compute and dedicated support and Enterprise adds custom SLAs and self-hosting.	Medium	SV010
CV011	Baseten’s homepage pitches cross-cloud scale, forward-deployed engineers, and 99.99% uptime as reasons customers should trust it for production workloads.	Medium	SV011
CV012	HostFleet’s April 2026 pricing matrix shows Baseten at $4.00 per hour for A100 and $6.50 per hour for H100, above Modal at $2.10 and $3.95 and above Runpod at $2.17 and $3.35 on the same GPU classes.	Medium	SV012
CV013	HostFleet also notes Baseten combines per-GPU-hour pricing with a minimum dedicated deployment cost and billed minimum awake times.	Medium	SV012
CV014	Runpod’s 2026 comparison ranks Baseten below several alternatives on affordability and cites usage-based per-minute billing with 8–12 second cold starts.	Medium	SV013
CV015	Baseten can still justify premium pricing if observability, support, compliance, and hybrid deployment reduce customers’ total cost of production inference.	Medium	SV010, SV011, SV012
CV016	Modal disclosed a May 2026 Series C of $355 million at a $4.65 billion post-money valuation after surpassing $300 million in annualized revenue.	High	SV015, SV016
CV017	Modal’s May 2026 round implies an approximate 15.5x annualized-revenue multiple.	Medium	SV015, SV016
CV018	Modal says it can scale from 0 to 1,000 GPUs in minutes or even seconds, making it a credible direct infrastructure comparable rather than a generic application software company.	Medium	SV015
CV019	Fireworks AI’s last closed round was a $250 million Series C at a $4 billion post-money valuation, while Sacra estimates roughly $315 million of annualized revenue in February 2026 and gross margin around 50%.	Medium	SV018
CV020	Fireworks’ closed-round valuation implies an approximate 12.7x annualized-revenue multiple, which is above Baseten’s implied closed-round multiple if the Sacra estimate is right.	Medium	SV018, SV004
CV021	CoreWeave reported Q1 2026 revenue of $2.078 billion and a $99.4 billion revenue backlog, while CompaniesMarketCap showed a May 2026 market cap of $59.75 billion.	Medium	SV019, SV020
CV022	Using CoreWeave’s 2026 revenue context, public AI infrastructure is trading at roughly 4.8x market cap to annualized-guide revenue.	Medium	SV019, SV020
CV023	Datadog guided to $4.30 billion to $4.34 billion of 2026 revenue, and CompaniesMarketCap put its May 2026 market cap at $88.04 billion.	High	SV021, SV022
CV024	Datadog’s implied multiple is about 20.4x forward revenue, showing how the market prices premium infrastructure software with strong growth and disclosure.	Medium	SV021, SV022
CV025	Datadog’s Form 10-K highlights the disclosure baseline public investors get on risk factors, revenue, and growth that private Baseten investors do not get from public materials.	High	SV023, SV021
CV026	CompaniesMarketCap and Stock Analysis put Cloudflare at about an $85.47 billion market cap and $2.33 billion of trailing revenue in late May 2026.	Medium	SV024, SV025
CV027	Cloudflare’s implied 36.7x revenue multiple is an upper-bound developer-platform reference that assumes much better disclosure, margin structure, and category leadership than Baseten has shown publicly.	Medium	SV024, SV025
CV028	CompaniesMarketCap and Stock Analysis put MongoDB at about a $27.01 billion market cap and $2.60 billion of trailing revenue in late May 2026.	Medium	SV026, SV027
CV029	MongoDB’s implied 10.4x revenue multiple is a lower-middle public infrastructure-software reference for a scaled but less euphoric comp set.	Medium	SV026, SV027
CV030	Technavio values the AI inference-as-a-service market at $85.25 billion in 2025 and expects 22.1% CAGR through 2030.	Medium	SV028
CV031	Mordor values the broader enterprise AI market at $114.87 billion in 2026, with cloud deployment accounting for 67.33% of 2025 revenue.	Medium	SV029
CV032	AWS Bedrock advertises select batch inference at 50% below on-demand pricing, showing hyperscalers can attack the inference layer with bundled economics.	Medium	SV030
CV033	Google promotes a unified agent platform with 200-plus models and free credits for new customers, increasing the risk that enterprises default to broader cloud bundles.	Medium	SV031
CV034	Azure Machine Learning publishes a 99.9% SLA and no additional platform charge beyond underlying Azure services, reinforcing the bundling threat to independent vendors.	Medium	SV032
CV035	If Sacra’s $600 million annualized-revenue estimate is directionally right, Baseten’s closed $5 billion round implies roughly an 8.3x revenue multiple.	Medium	SV004
CV036	An $8.3x implied multiple would place Baseten above CoreWeave-like AI cloud treatment but below Modal, Fireworks, Datadog, and Cloudflare-style premium software treatment.	Medium	SV004, SV018, SV019, SV020, SV021, SV022, SV024, SV025
CV037	At the same $600 million run-rate, the mooted $11 billion follow-on would imply roughly an 18.3x multiple, much closer to Datadog-grade public software pricing.	Medium	SV004, SV021, SV022
CV038	Baseten’s pricing and delivery model suggest revenue quality may be more support-intensive and lower-margin than top public software comps even if growth is exceptional.	Medium	SV010, SV011, SV012, SV013
CV039	Fireworks’ roughly 50% gross margin and explicit 60% target are a useful reminder that inference platforms are infrastructure businesses first, not pure software businesses.	Medium	SV018
CV040	The strongest pro-valuation argument is that inference demand is large, cloud-heavy, and moving into production workloads where Baseten offers hybrid deployment and performance differentiation.	Medium	SV028, SV029, SV010, SV011
CV041	The strongest anti-valuation argument is that premium pricing can be attacked by Runpod and Modal at the edge and by hyperscalers through bundled platform pricing.	Medium	SV012, SV013, SV017, SV030, SV031, SV032
CV042	The current $5 billion price is supportable only conditionally because it assumes the private revenue estimate is directionally right and that Baseten can defend premium economics despite bundling pressure.	Medium	SV004, SV012, SV013, SV015, SV016, SV030, SV031, SV032
CV043	A reasonable bear case uses $300 million to $400 million of revenue support and a 7x to 9x multiple, implying roughly $2.1 billion to $3.6 billion of value.	Medium	SV004, SV018, SV026, SV027
CV044	A reasonable base case uses $500 million to $650 million of revenue support and an 8x to 12x multiple, implying roughly $4.0 billion to $7.8 billion and placing the closed $5 billion round inside the range.	Medium	SV004, SV015, SV016, SV018, SV026, SV027
CV045	A reasonable bull case uses $700 million to $900 million of revenue support and a 12x to 16x multiple, implying roughly $8.4 billion to $14.4 billion and making an $11 billion step-up possible only if growth and premium perception keep compounding.	Medium	SV004, SV015, SV016, SV018
CV046	The right investment recommendation is track, not buy, because company quality is high but the public evidence leaves the price only fair-to-stretched rather than clearly attractive.	Medium	SV004, SV012, SV013, SV015, SV016, SV023
CV047	The highest-leverage diligence question is whether internal revenue, gross margin, and customer-concentration data support the market narrative implied by the $5 billion round.	Medium	SV004, SV018, SV023
CV048	The thesis should break if Baseten cannot preserve premium price-performance with acceptable margin, if growth normalizes materially below the base-case band, or if any new round clears only with aggressive terms.	Medium	SV004, SV012, SV013, SV018, SV023

Sources
ID	Publisher	Title	Quote
SO001	Baseten	Baseten \| Inference is everything
SO002	Baseten	Baseten customers
SO003	Baseten	Enterprise
SO004	Baseten	Healthcare
SO005	Baseten	Pricing
SO006	Baseten	Careers at Baseten	Companies like Abridge, Cursor, Lovable, Notion, and OpenEvidence depend on Baseten to power mission-critical AI workloads in production.
SO007	Baseten	Baseten Terms and Conditions	BASETEN LABS, INC. (“BASETEN”).
SO008	Baseten	Privacy Policy	Company (referred to as either "the Company", "We", "Us" or "Our" in this Agreement) refers to BaseTen Labs, Inc., 201 Spear St, Suite 1600, San Francisco, CA 94105.
SO009	Baseten	Announcing our Series A	We’ve raised a little over $20 million dollars to date across our seed and Series A rounds.
SO010	Baseten	Announcing our Series B	We’re excited to announce that we’ve raised an additional $40M.
SO011	Baseten	Announcing Baseten’s $75M Series C	Today, we run workloads across thousands of GPUs, serving millions of end customers worldwide while continuously adding new cloud partners.
SO012	Baseten	Announcing Baseten’s $150M Series D	Today, we’re excited to announce our $150M Series D, led by BOND, with Jay Simons joining our Board.
SO013	Baseten	Announcing Baseten's $300M Series E	We’re thrilled to announce that we have raised $300M at a $5B valuation.
SO014	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future	Founded in 2019 and based in San Francisco, Baseten has raised $585 million to date from investors including IVP, CapitalG, Conviction, Bond, Greylock, and Spark Capital.
SO015	Tech Funding News	Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference
SO016	Tracxn	Baseten Technologies
SO017	CB Insights	Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements
SO018	PitchBook via Internet Archive	Baseten 2025 Company Profile: Valuation, Funding & Investors \| PitchBook
SO019	Abridge	Abridge \| Intelligence at the point of conversation
SO020	Clay	Clay \| GTM workflows at scale
SO021	Cursor	Cursor \| The new way to build software
SO022	OpenEvidence	OpenEvidence \| America's Official Medical Knowledge Platform
SO023	Baseten	OpenEvidence delivers instant, accurate medical information with Baseten	Baseten now serves billions of requests per week for OpenEvidence.
SO024	Baseten	How Gamma makes building presentations criminally fun
SO025	Baseten	Speechify real-time text-to-speech	Because of Baseten’s efficient autoscaling, model performance and infrastructure optimizations, Speechify’s cost per million characters dropped by 44%.
SO026	Baseten	Patreon
SO027	NVIDIA	Streamlined AI Inference Infrastructure in the Cloud	Baseten’s infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running. This is an incredible speedup on cold starts, which previously took up to five minutes.
SO028	WorkOS	A conversation with Philip Kiely from Baseten at AWS re:Invent 2025
SO029	Nudge Security	Is Baseten safe? Learn if Baseten Is Legit	Review Baseten security risks.
SO030	ServiceAlert	Baseten Outage History, Downtime & Incident Records	Detailed incident data is not available for this service.
SO031	Baseten	Tuhin Srivastava - CEO, Co-Founder
SO032	Baseten	Amir Haghighat - CTO, Co-Founder
SO033	Baseten	Pankaj Gupta - Co-Founder
SO034	Baseten	Phil Howes - Co-Founder
SM001	Baseten	Inference Platform: Deploy AI models in production \| Baseten	Scale workloads across any region and any cloud (in our cloud or yours), with blazing-fast cold starts and 99.99% uptime out of the box.
SM002	Baseten	Cloud Pricing	Basic: $0 per month, pay as you go. Enterprise adds self-host deployments, cloud commitments, and custom SLAs.
SM003	Baseten	Mission-Critical Inference for Enterprise AI Infrastructure	The Baseten Inference Stack runs inside your cloud infrastructure, keeping your data fully under your control.
SM004	Baseten	Healthcare	99.99% uptime and infinite scaling through a unified GPU pool spanning 10+ clouds.
SM005	Baseten	Production-First Model APIs - Baseten Inference Stack	Spend 5-10x less than closed alternatives with our optimized multi-cloud infrastructure and efficient frontier open models.
SM006	Baseten	Inference at Scale with Dedicated Deployments \| Baseten	We regularly see 6x better GPU utilization and 5-10x lower costs powered by our Inference Stack.
SM007	Baseten	Multi-Model Inference, Ultra-Low Latency at Scale \| Baseten	Baseten Chains enables granular hardware and autoscaling for compound AI, powering 6x better GPU usage and cutting latency in half.
SM008	Baseten	Cloud-Native AI Infrastructure \| Baseten	Scale in your cloud, ours, or both with Baseten Self-hosted, Cloud, and Hybrid deployment options.
SM009	Baseten	Secure model inference - Baseten	Baseten never shares GPUs across users.
SM010	Baseten	Customer stories	Speechify synthesizes 161B+ characters per month for 60M+ users. With Baseten, Speechify cut costs by 44%, p99 latency by 30-50%, and got 4.5x faster cold starts.
SM011	Baseten	OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack	OpenEvidence can scale efficiently even in the face of traffic spikes, hardware failure, or capacity constraints... without locking into multi-year commitments with single cloud vendors.
SM012	Baseten	How Gamma makes building presentations criminally fun	We generate millions of images a day on Baseten for our 70+ million users with ultra-low latency and high throughput.
SM013	Baseten	How Writer helps businesses transform with AI	In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw 60% higher tokens per second, 23% lower time to first token, and 35% lower cost per million tokens.
SM014	Baseten	Why we built and open-sourced a model serving solution	Truss bridges the gap between model development and model deployment by making it equally straightforward to serve a model on localhost and in prod.
SM015	Baseten	AI Model Training Built for Production Inference \| Baseten	Train -> deploy loop: Models trained with Loops promote directly to Baseten Dedicated Inference with one command.
SM016	Baseten	Baseten Frontier Gateway	The Baseten Frontier Gateway is the path from weights to a production-ready API.
SM017	Baseten	SSO and SCIM	Available on the Enterprise plan with just-in-time provisioning, automatic deprovisioning, and optional group-gated admin access.
SM018	Baseten	Retrieve billing usage via API	The response includes aggregate totals and a per-resource or per-model breakdown array, with daily granularity on each entry.
SM019	Technavio	AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030	The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during the forecast period 2026-2030.
SM020	Mordor Intelligence	Enterprise AI Market - Share, Trends & Size 2025 - 2031	The Enterprise AI market size stood at USD 114.87 billion in 2026 and is projected to reach USD 273.08 billion by 2031, registering an 18.91% CAGR over 2026-2031.
SM021	Fortune Business Insights	AI Inference Market Size, Share \| Global Growth Report [2034]	The global AI inference market size was valued at USD 103.73 billion in 2025 and is projected to grow from USD 117.80 billion in 2026 to USD 312.64 billion by 2034.
SM022	Modal	Modal: High-performance AI infrastructure	Autoscale from 0 to 1000+ GPUs, instantly.
SM023	Replicate	Run AI with an API	We scale up and down to handle demand, and you only pay for the compute that you use.
SM024	Runpod	The AI Developer Cloud \| Runpod	One platform to go from AI experiment to production. Pods for building. Serverless for shipping. Clusters for scaling.
SM025	Runpod	Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More	Baseten: usage-based (per-minute), configurable replicas, T4/A10G/L4/A100/H100, 8-12 sec cold starts.
SM026	HostFleet	Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet	You’re a startup with a production inference workload and a budget -> Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SM027	Amazon Web Services	The center for all your data, analytics, and AI – Amazon SageMaker – AWS	Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure.
SM028	Google Cloud	Gemini Enterprise Agent Platform (formerly Vertex AI)	Build, scale, govern and optimize enterprise grade AI agents.
SM029	Microsoft Azure	Azure Machine Learning - ML as a Service \| Microsoft Azure	Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment.
SP001	Baseten	Inference Platform: Deploy AI models in production \| Baseten
SP002	Baseten	Cloud Pricing
SP003	Baseten Docs	Overview - Baseten
SP004	Baseten Docs	Secure model inference - Baseten
SP005	Baseten	Production-First Model APIs - Baseten Inference Stack
SP006	Baseten	AI Model Training Built for Production Inference \| Baseten
SP007	Baseten	Multi-Model Inference, Ultra-Low Latency at Scale \| Baseten
SP008	Baseten	AI Model Performance - Baseten Inference Runtime
SP009	Baseten	Mission-Critical Inference for Enterprise AI Infrastructure
SP010	Baseten	Baseten Frontier Gateway
SP011	Baseten	Why we built and open-sourced a model serving solution
SP012	Baseten	Baseten Status
SP013	Servicealert.ai	Baseten Outage History, Downtime & Incident Records
SP014	Modal	Modal: High-performance AI infrastructure
SP015	Modal	Plan Pricing \| Modal
SP016	Replicate	Run AI with an API
SP017	Replicate	Pricing – Replicate
SP018	Runpod	The AI Developer Cloud \| Runpod
SP019	Runpod	GPU Cloud Pricing - Runpod
SP020	Runpod Docs	Serverless pricing \| Runpod Docs
SP021	AWS	The center for all your data, analytics, and AI – Amazon SageMaker – AWS
SP022	AWS	Amazon Bedrock Pricing – AWS
SP023	Google Cloud	Gemini Enterprise Agent Platform (formerly Vertex AI)
SP024	Microsoft Azure	Azure Machine Learning - ML as a Service \| Microsoft Azure
SP025	HostFleet	Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet	Baseten: Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SP026	Tracxn	Baseten Technologies
SP027	PitchBook	Baseten 2025 Company Profile: Valuation, Funding & Investors \| PitchBook
SP028	Sacra	Baseten revenue, valuation & funding	AWS, Google, and Microsoft leverage extensive enterprise relationships to bundle AI inference with broader cloud commitments at below-market rates.
SP029	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SP030	Tech Funding News	Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference — TFN
SI001	Baseten	Cloud Pricing
SI002	Baseten	Inference at Scale with Dedicated Deployments \| Baseten
SI003	Baseten	Production-First Model APIs - Baseten Inference Stack
SI004	Baseten	AI Model Training Built for Production Inference \| Baseten
SI005	Baseten	Enterprise
SI006	Baseten	Healthcare
SI007	Baseten	Retrieve billing usage via API
SI008	Baseten	Announcing Baseten’s $75M Series C
SI009	Baseten	Announcing Baseten’s $150M Series D
SI010	Baseten	Announcing Baseten’s $300M Series E
SI011	Baseten	Careers at Baseten
SI012	Baseten	Privacy Policy
SI013	Baseten	Baseten Terms and Conditions
SI014	Baseten	Service Level Agreement
SI015	Baseten	Baseten Status
SI016	Baseten	How Writer helps businesses transform with AI
SI017	Baseten	OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack
SI018	Baseten	How Speechify makes audio the default with real-time text-to-speech
SI019	Baseten	Superhuman achieves 80% faster embedding model inference with Baseten
SI020	Baseten	Patreon scales Whisper transcription with Baseten
SI021	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SI022	Tech Funding News	Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference
SI023	Sacra	Baseten revenue, valuation & funding
SI024	Tracxn	Baseten Technologies
SI025	CB Insights	Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements
SI026	PitchBook via Wayback	Baseten 2025 Company Profile: Valuation, Funding & Investors \| PitchBook
SI027	HostFleet	Serverless GPU Pricing Matrix 2026
SI028	Mordor Intelligence	Enterprise AI Market - Share, Trends & Size 2025 - 2031
SI029	U.S. Securities and Exchange Commission	EDGAR Entity Landing Page (CIK 0001850888)
SI030	Baseten	How Hebbia uses Baseten to power AI workflows for the world's leading financial institutions
SI031	Baseten	Posit launches real-time AI code suggestions with Baseten
SI032	Baseten	Wispr Flow creates effortless voice dictation with Llama on Baseten
SI033	Baseten	How Zed is reimagining the code editor from the ground up
SI034	Baseten	How World Labs is building large world models, pushing the boundaries of 3D
SE001	Baseten	Inference Platform: Deploy AI models in production \| Baseten	Rapidly scale workloads across any cloud provider with global capacity. We offer single-tenant and self-hosted deployments for extra security.
SE002	Baseten	Overview - Baseten	Baseten is a training and inference platform. Bring a model ... and Baseten turns it into a production API endpoint with autoscaling, observability, and optimized serving infrastructure.
SE003	Baseten	Reference documentation - Baseten
SE004	Baseten	How Baseten works	Behind every GPU workload on Baseten is the Multi-cloud Capacity Management (MCM) system.
SE005	Baseten	Production-First Model APIs - Baseten Inference Stack	Model APIs made for products, not toys.
SE006	Baseten	Model APIs - Baseten	Model APIs provide instant access to high-performance LLMs through endpoints that are compatible with both the OpenAI Chat Completions API and the Anthropic Messages API.
SE007	Baseten	Inference at Scale with Dedicated Deployments \| Baseten	Dedicated deployments are single-tenant, can be region-locked, and are HIPAA compliant and SOC 2 Type II certified on Baseten Cloud.
SE008	Baseten	Baseten Frontier Gateway	Baseten Frontier Gateway gives you a production-ready, white-labeled API endpoint.
SE009	Baseten	Multi-Model Inference, Ultra-Low Latency at Scale \| Baseten	Deploy your Chain to production with each Chainlet specifying its own hardware resources, software dependencies and scaling settings independently.
SE010	Baseten	AI Model Training Built for Production Inference \| Baseten	Loops (early access) ... Training Jobs (GA).
SE011	Baseten	Cloud-Native AI Infrastructure \| Baseten	We built cross-cloud autoscaling so you can serve users anywhere in the world with low latency and high reliability.
SE012	Baseten	AI Model Management for Production Inference \| Baseten
SE013	Baseten	AI Model Performance - Baseten Inference Runtime	We take the best open-source inference frameworks (TensorRT, SGLang, vLLM, TGI, TEI, and more) and layer in our own optimizations for maximum performance.
SE014	Baseten	Secure model inference - Baseten	Baseten does not store model inputs, outputs, or weights by default.
SE015	Baseten	Mission-Critical Inference for Enterprise AI Infrastructure	We are SOC-2 Type II certified and HIPAA compliant across all hosting options and support data residency requirements through region-restricted deployments.
SE016	Baseten	Cloud Pricing	Only pay for the compute you use, down to the minute.
SE017	Baseten	SSO and SCIM	Connect Baseten to your identity provider for SAML 2.0 sign-in and SCIM 2.0 directory sync.
SE018	Baseten	Rolling deployments	You can now gradually shift traffic to new deployments instead of swapping all at once.
SE019	Baseten	Introducing the Baseten Delivery Network (BDN)	We just launched the Baseten Delivery Network (BDN), designed to make cold starts 2-3x faster for large models.
SE020	Baseten	Retrieve billing usage via API	You can now query your billing usage programmatically using the new GET /v1/billing/usage_summary endpoint.
SE021	Baseten	Regional environments	Regional environments route inference traffic for a deployment exclusively to workload planes within a designated geographic region.
SE022	Baseten	Baseten Status	Past Incidents ... May 29, 2026 ... May 26 ... May 19 ... May 18 ... May 16 ... May 15.
SE023	Baseten	Service Level Agreement	Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%) during each calendar month.
SE024	Baseten	Why we built and open-sourced a model serving solution	To address this problem, we built Truss.
SE025	GitHub / basetenlabs	GitHub - basetenlabs/truss: The simplest way to serve AI/ML models in production	Truss is the CLI for deploying and serving ML models on Baseten.
SE026	GitHub / basetenlabs	Releases · basetenlabs/truss	v0.18.3 ... 21 May 16:14 ... feat(loops/cli) ... feat(train) ... feat(truss).
SE027	PyPI	truss	pip install --upgrade truss
SE028	Baseten	How Writer helps businesses transform with AI	In a benchmark running the LLMs in FP16 on four NVIDIA A100 GPUs, Writer saw: 60% higher tokens per second, 23% lower time to first token, 35% lower cost per million tokens.
SE029	Baseten	OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack	By using Baseten, OpenEvidence achieved: 78% lower latency ... 6x faster deployment processes ... 8x+ reduction in infrastructure maintenance time overall.
SE030	ServiceAlert	Baseten Outage History, Downtime & Incident Records	Detailed incident data is not available for this service.
SE031	HostFleet	Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet	You’re a startup with a production inference workload and a budget → Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SE032	Runpod	The AI Developer Cloud \| Runpod	Sub-200ms cold starts ... Zero idle cost.
SE033	Modal	Modal: High-performance AI infrastructure	Autoscale from 0 to 1000+ GPUs, instantly.
SE034	Replicate	Run AI with an API	You can deploy your own custom models using Cog, our open-source tool for packaging machine learning models.
SE035	Amazon Web Services	The center for all your data, analytics, and AI – Amazon SageMaker – AWS	Train, customize, and deploy ML and foundation models on a highly performant and cost-effective infrastructure.
SE036	Google Cloud	Gemini Enterprise Agent Platform (formerly Vertex AI)	Agent Platform is our open and comprehensive platform ... to build, scale, govern and optimize enterprise-grade agents.
SE037	Microsoft Azure	Azure Machine Learning - ML as a Service \| Microsoft Azure	Azure Machine Learning is a comprehensive machine learning platform that supports language model fine-tuning and deployment.
SE038	Nudge Security	Is Baseten Safe? Learn if Baseten Is Legit \| Nudge Security	Baseten Supply Chain ... Amazon Web Services (AWS), Vercel, Statuspage, SendGrid, Stripe, Google Analytics, Segment, Sentry ...
SU001	Baseten	Inference Platform: Deploy AI models in production \| Baseten
SU002	Baseten	Customer stories
SU003	Baseten	Mission-Critical Inference for Enterprise AI Infrastructure
SU004	Baseten	Healthcare
SU005	Baseten	Cloud Pricing
SU006	Baseten	Announcing Baseten's $300M Series E
SU007	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SU008	Baseten	How Writer helps businesses transform with AI
SU009	Baseten	OpenEvidence delivers instant, accurate medical information with the Baseten Inference Stack
SU010	Baseten	How Speechify makes audio the default with real-time text-to-speech
SU011	Baseten	How Gamma makes building presentations criminally fun
SU012	Baseten	Superhuman achieves 80% faster embedding model inference with Baseten
SU013	Baseten	Patreon saves nearly $600k/year in ML resources with Baseten
SU014	OpenEvidence	OpenEvidence
SU015	Writer	WRITER
SU016	Writer	About WRITER
SU017	Gamma	About Us – Reinventing Presentations with AI \| Gamma.app
SU018	Speechify	Speechify: Text to Speech & Voice Typing AI Assistant \| 55M+ Users
SU019	Speechify	Voice Over Studio: Request A Free Demo \| Speechify
SU020	Superhuman	Superhuman: Docs, Mail, and AI That Work Everywhere
SU021	Patreon	Where Creator Communities Thrive — Patreon
SU022	NVIDIA	Case study:Baseten’s AI Inference Infrastructure	Baseten's infrastructure, powered by NVIDIA GPUs on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, only takes 5–10 seconds to get the model up and running.
SU023	WorkOS	Baseten is betting big on open source models — WorkOS	companies could switch to models that were faster, less expensive, more customizable, and more reliable at scale
SU024	FeaturedCustomers	46 Baseten Customer Reviews & References	Customer Rating Review Score based on 654 reference ratings 4.8/5.0
SU025	Runpod	Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More	Baseten ... Usage-based (per-minute) ... 8–12 sec
SU026	HostFleet	Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026) — HostFleet	Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost.
SU027	Abridge	Generative AI for Clinical Conversations \| Abridge
SU028	Clay	Clay \| Go to market with unique data—and the ability to act on it
SU029	Cursor	The best coding agent
SU030	Notion	Meet your AI team \| Notion
SU031	PeerSpot	Baseten Reviews, Competitors and Pricing
SU032	Mercor	Mercor \| Organizing human intelligence to power the AI economy
SR001	Baseten	Secure model inference	Baseten does not store model inputs, outputs, or weights by default.
SR002	Baseten	Privacy Policy
SR003	Baseten	Baseten Terms and Conditions	Customer acknowledges and agrees that the Baseten Products & Services will not be used, and is not licensed for use, in connection with any of Customer’s time-critical or mission-critical functions.
SR004	Baseten	Service Level Agreement	Baseten will undertake commercially reasonable measures to ensure that System Availability ... equals or exceeds ninety-nine point nine percent (99.9%).
SR005	Baseten	Baseten Status
SR006	ServiceAlert	Baseten Outage History, Downtime & Incident Records
SR007	Nudge Security	Is Baseten Safe? Learn if Baseten Is Legit
SR008	Baseten	Cloud Pricing
SR009	Baseten	Baseten homepage
SR010	Baseten	Cloud-Native AI Infrastructure
SR011	Baseten	Mission-Critical Inference for Enterprise AI Infrastructure
SR012	Baseten	Healthcare	SOC-2 Type II and HIPAA compliant with flexible hosting and data residency with region-restricted cloud deployments.
SR013	Technavio	AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030
SR014	Mordor Intelligence	Enterprise AI Market - Share, Trends & Size 2025 - 2031
SR015	Runpod	Top Serverless GPU Clouds for 2026: Comparing Runpod, Modal, and More
SR016	HostFleet	Every serverless GPU host compared: pricing, GPUs, and what they claim (April 2026)	Baseten has per-GPU-hour pricing plus a minimum dedicated deployment cost. Scale-to-zero is available but there are billed minimum awake times.
SR017	Tracxn	Baseten Technologies
SR018	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future
SR019	Baseten	Careers at Baseten
SR020	Baseten	Production-First Model APIs - Baseten Inference Stack
SR021	Baseten	AI Model Training Built for Production Inference
SR022	Baseten	AI Model Management for Production Inference
SR023	Baseten	Baseten Frontier Gateway
SR024	Baseten	Inference at Scale with Dedicated Deployments
SR025	Baseten	SSO and SCIM
SR026	Baseten	Rolling deployments
SR027	Baseten	Retrieve billing usage via API
SR028	Baseten	Why we built and open-sourced a model serving solution
SR029	U.S. Department of Health & Human Services	Business Associates	The satisfactory assurances must be in writing, whether in the form of a contract or other agreement between the covered entity and the business associate.
SR030	European Commission	The EU’s approach to artificial intelligence
SR031	Modal	Series C announcement
SR032	Reuters	AI startup Modal raised $355 million in a new round of financing, valuing the company at $4.65 billion
SR033	Sacra	Fireworks AI revenue, valuation & funding
SR034	Tracxn	RunPod
SR035	CoreWeave	Record First Quarter Revenue and Revenue Backlog Highlight Unprecedented Demand for CoreWeave Cloud
SV001	Baseten	Announcing Baseten’s $300M Series E	We’re thrilled to announce that we have raised $300M at a $5B valuation.
SV002	Business Wire	Baseten Raises $300M at a $5B Valuation to Power a Multi-Model Future	This values Baseten at $5 billion and marks the company’s third fundraise in the past year.
SV003	Tech Funding News	Baseten nabs $300M from IVP, CapitalG to challenge Together AI in inference	investors invested $300 million in the company, pushing its valuation to about $5 billion
SV004	Sacra	Baseten revenue, valuation & funding	Sacra estimates that Baseten hit $600M in annualized revenue in March 2026.
SV005	Tracxn	Baseten Technologies	Jan 20, 2026 \| $300M \| Series E \| $5B
SV006	CB Insights	Baseten Stock Price, Funding, Valuation, Revenue & Financial Statements	Baseten has raised $585M over 7 rounds.
SV007	PitchBook	Baseten 2025 Company Profile: Valuation, Funding & Investors \| PitchBook	Latest Deal Amount $75M
SV008	Baseten	Announcing Baseten’s $150M Series D	Today, we’re excited to announce our $150M Series D, led by BOND.
SV009	Baseten	Announcing Baseten’s $75M Series C	Today, we’re thrilled to announce our Series C fundraise.
SV010	Baseten	Cloud Pricing	Basic: $0 per month, pay as you go.
SV011	Baseten	Baseten homepage	Scale workloads across any region and any cloud ... with ... 99.99% uptime out of the box.
SV012	HostFleet	Serverless GPU pricing matrix 2026	Baseten. Most expensive per GPU-hour on paper, but Truss, observability, and support are tangible.
SV013	Runpod	Top serverless GPU clouds for 2026 AI workloads	Baseten ... Usage-based (per-minute) ... 8–12 sec
SV014	Runpod	Runpod pricing	H100 PCIe $2.89/hr
SV015	Modal	Modal's Series C: Raising $355M at a $4.65B valuation	We’ve raised $355 million ... surpassing $300 million in annualized revenue. Our valuation is $4.65B post-money.
SV016	Reuters / U.S. News	Exclusive-Modal Labs Valued at $4.65 Billion as AI Coding Takes Off	The company’s annualized revenue is about $300 million, up from an annualized rate of $60 million in September.
SV017	Modal	Modal pricing	Get started with $30 / month free credits
SV018	Sacra	Fireworks AI revenue, valuation & funding	Fireworks AI hit $315M in annualized revenue in February 2026 ... gross margin sits at approximately 50%.
SV019	CoreWeave / Business Wire	CoreWeave Reports Strong First Quarter 2026 Results	Revenue backlog was $99.4 billion as of March 31, 2026.
SV020	CompaniesMarketCap	CoreWeave market capitalization	As of May 2026 CoreWeave has a market cap of $59.75 Billion USD.
SV021	Datadog	Datadog Announces First Quarter 2026 Financial Results	Revenue was $1,006 million ... Full Year 2026 Outlook: Revenue between $4.30 billion and $4.34 billion.
SV022	CompaniesMarketCap	Datadog market capitalization	As of May 2026 Datadog has a market cap of $88.04 Billion USD.
SV023	Datadog / SEC filing mirror	Datadog Annual Report 2026	Form 10-K (NASDAQ:DDOG) ... For the fiscal year ended December 31, 2025
SV024	CompaniesMarketCap	Cloudflare market capitalization	As of May 2026 Cloudflare has a market cap of $85.47 Billion USD.
SV025	Stock Analysis	Cloudflare revenue 2016-2026	This brings the company’s revenue in the last twelve months to $2.33B.
SV026	CompaniesMarketCap	MongoDB market capitalization	As of May 2026 MongoDB has a market cap of $27.01 Billion USD.
SV027	Stock Analysis	MongoDB revenue 2017-2026	This brings the company’s revenue in the last twelve months to $2.60B.
SV028	Technavio	AI Inference-as-a-service Market Growth Analysis - Size and Forecast 2026-2030	The AI Inference-as-a-service Market size was valued at USD 85.25 billion in 2025, growing at a CAGR of 22.1% during 2026-2030.
SV029	Mordor Intelligence	Enterprise AI Market - Share, Trends & Size 2025 - 2031	The Enterprise AI market size stood at USD 114.87 billion in 2026.
SV030	Amazon Web Services	Amazon Bedrock Pricing	Amazon Bedrock offers ... batch inference at a 50% lower price compared to on-demand inference pricing.
SV031	Google Cloud	Gemini Enterprise Agent Platform	New customers get up to $300 in free credits.
SV032	Microsoft Azure	Azure Machine Learning	The SLA for Azure Machine Learning is 99.9 percent uptime. There's no additional charge to use Azure Machine Learning.