Startup Diligence
Diligence report Generative AI infrastructure / inference cloud late-stage private 2026-05-16

Together AI

Open-model inference cloud with credible technical moat and enterprise traction, priced near its Series B mark

Together AI shows credible inference-cloud product and traction at a Series B valuation that requires multi-year ARR scale to underwrite a strong exit.

Cover facts

Latest disclosed valuation (Series B 2024) 01
3.3 USD B [CV001]
Cumulative capital raised across seed / A / B 02
500 USD M (approximate, per press) [CV001, CV002]
Reported run-rate revenue range (press) 03
130-200 USD M ARR (per The Information, unverified) [CV040]
Named enterprise + startup customers 04
9 case studies + GTC 2025 cohort [CV012]
Developer signups (company-claimed) 05
100000 developers [CU001]

Company profile

Together AI is a generative-AI cloud that runs serverless and dedicated inference, fine-tuning, and training across 200+ open and custom models, anchored by FlashAttention, ThunderKittens, and Together Inference Engine v2. The company combines a defensible technical research base with a Salesforce + NVIDIA channel and an open-source community surface.

Website
www.together.ai
Founded
2022-06-01
Founders
Vipul Ved Prakash, Ce Zhang, Tri Dao, Percy Liang
Founding location
San Francisco, California, USA
Headquarters
San Francisco, California
Product
Together AI sells serverless inference (per-token), dedicated endpoints (reserved GPU capacity), fine-tuning (LoRA + full), batch inference, embeddings, vision, audio, and image APIs across a 200+ open and custom model catalog, all OpenAI-compatible.
Customers
Developers (self-serve), AI-native startups (Pika, Cartesia, Arcee, Nous Research), enterprise SaaS (Salesforce, Zoom), healthcare (Adaption), academia (Washington University), and NVIDIA GTC 2025 Pioneers cohort.
Business model
Usage-based serverless inference + committed dedicated capacity + fine-tuning + enterprise contracts; Salesforce Ventures co-sell and Startup Accelerator augment direct sales.
Stage
late-stage private
Funding status
Privately funded; Series A $102.5M (Nov 2023, Kleiner Perkins led) and Series B $305M (Mar 2024, Salesforce Ventures led, ~$3.3B post-money per CNBC / Bloomberg / Fast Company); investors include NVIDIA, Coatue, Lux Capital, Prosperity7, General Catalyst.

Executive summary

Top strengths

  • Technical moat anchored by FlashAttention (Tri Dao), ThunderKittens (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation.
  • Anchor channel partners (Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, Startup Accelerator) plus a 200+ open-model catalog give wide enterprise + developer reach.
  • Documented enterprise + startup proof base spanning Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, Washington University, and Adaption healthcare.

Top risks

  • Hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure OpenAI) could compress pricing 30-50% over 2026-2027.
  • NVIDIA single-vendor concentration on GPUs, networking, and stack would cap revenue ramp if Blackwell allocation tightens.
  • GenAI regulatory perimeter (EU AI Act, BIS export controls, FTC inquiry) and copyright-litigation precedent (NYT, Authors Guild, Getty) widen through 2027.

Open gaps

  • Exact ARR, NRR / GRR, top-10 customer concentration, GPU committed spend, and opex split (R&D / S&M / G&A) are undisclosed.
  • CFO and CRO presence at runDate is not publicly confirmed.
  • SLA percentage, incident history, pen-test cadence, and breach plan are not disclosed beyond the public status page.
  • Sovereign-channel posture (Prosperity7-adjacent) and OSS hosting policy under tightening copyright precedent require management disclosure.

Contents

Chapter 01

01Company Overview

1.1 Identity, Headquarters, and Product Frame

Together AI markets itself as “the AI acceleration cloud,” offering training, fine-tuning, and inference for open-source and custom large language, image, audio, and vision models. The corporate entity, Together Computer Inc., is headquartered in San Francisco, California, with a satellite presence in Menlo Park and additional research staff in Zurich; the careers page and contact surface confirm both locations and an active hiring posture across infrastructure, kernel, GPU, applied-ML, and revenue roles. The company was incorporated on 27 June 2022 by four co-founders with deep ties to Stanford, Princeton, ETH Zürich, and the broader open-source LLM research community. Its identity rests on three pillars: a hyperscale GPU cloud purpose-built for AI workloads, an open-source research arm (RedPajama, OpenChatKit, StripedHyena, FlashAttention, Mixture-of-Agents), and a self-service inference and fine-tuning API competitive with OpenAI’s and Anthropic’s but priced for open models. The company emphasises that customers can keep weights, control data residency, and dedicate clusters when needed, which is the principal contrast with closed-API competitors.[CO001, CO002, CO003, CO004, CO005]

Snapshot KPI table
MetricValue/statusDateConfidenceGap or diligence ask
Post-money valuation$3.3B2024-07-09highConfirm 2026 secondaries or new round
Total primary capital raised≈$533M disclosed2024-07highVerify any post-Jul-2024 extensions
Annualised revenue≈$100M (third-party report)2024-07mediumNo audited filing; request management figure
Headcount>150 (job board derived)2026-05mediumNo filing; request HR roster
GPU footprint>20,000 NVIDIA Hopper-class2024-07mediumConfirm Blackwell additions and utilisation
Customer count100,000+ developers (company-claimed)2024lowDistinguish paying vs free; verify NRR
HQSan Francisco, CA2026-05high
Founding date27 June 20222022high

Values mix company disclosure (high), third-party reporting (medium), and inferred figures (low); paid-customer count and ARR are unaudited and must be validated with management.

[CO019, CO020, CO021, CO022, CO023, CO024]
FO002: Company snapshot logic

How identity, product, capital, and customers connect.

[CO001, CO003, CO005, CO017, CO020, CO021]

1.2 Founders, Leadership, and Governance

CEO Vipul Ved Prakash was previously co-founder/CTO of Topsy (acquired by Apple for ~$200M in 2013) and an early principal at Cloudmark, giving him both consumer-scale ML and infrastructure operating experience. CTO Ce Zhang is a tenured professor at ETH Zürich and Together’s research lead on distributed training systems and data-centric ML. Chief Scientist Chris Ré is the MacArthur-winning Stanford professor behind Snorkel and many of the FlashAttention/Hyena lines of work; Percy Liang, Stanford CRFM director, is co-founder and an advisor. The leadership bench has expanded with a head of revenue, head of GPU infrastructure, head of inference engineering, and a Zurich-based research head; the board includes investor partners from Coatue, Kleiner Perkins, NEA, and Lux. Key-person dependence is concentrated in Prakash for commercial execution and in the founding research trio for technical credibility, particularly given the open-source flywheel that drives much of Together’s top-of-funnel.[CO006, CO007, CO008, CO009, CO010, CO011]

Leadership and founder table
PersonRoleBackgroundFounder-market fitKey-person dependency
Vipul Ved PrakashCo-founder, CEOPreviously co-founder/CTO Topsy (acquired by Apple 2013), Cloudmark co-founderRepeat infrastructure/consumer-ML founder with operating exitHigh — sole CEO and primary commercial face
Ce ZhangCo-founder, CTOTenured professor ETH Zürich; distributed training & data-centric ML research leadDeep systems/ML research credibilityHigh — only CTO; bridges research & engineering
Chris RéCo-founder, Chief ScientistMacArthur Fellow; Stanford CS; Snorkel co-founder; FlashAttention/Hyena lineageAuthored or advised most open-source IPHigh — anchors research brand
Percy LiangCo-founderDirector Stanford CRFM; HELM benchmark leadSets research agenda & academic credibilityMedium — advisory not full-time operational
Tri DaoChief Scientist (research)FlashAttention author; Princeton CS facultyInference-kernel authorityHigh — drives kernel performance lead
Head of RevenueSales leadership (publicly listed roles)Enterprise SaaS backgroundRequired for enterprise expansionMedium — multiple sales hires already
Head of GPU InfrastructureCluster engineeringPrior hyperscaler experience (job board)Crucial for SLA & costMedium — recruiting actively

Founder bios cross-verified against official about page and Wikipedia; non-founder executives derived from careers postings and public LinkedIn footprints at runDate.

[CO006, CO007, CO008, CO009, CO010, CO011]

1.3 Funding History, Capital Stack, and Valuation

Together AI raised a $20M seed in May 2023 led by Lux Capital with Factory, SciFi, Long Journey, and individual backers including Scott Banister, Jakob Uszkoreit, and Aravind Srinivas. A $102.5M Series A followed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. In March 2024 the company added approximately $106M at a reported $1.25B valuation, then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation; Lakestar, NVIDIA, and an expanded set of strategics also participated. Cumulative disclosed primary capital is therefore approximately $533M before any 2025/2026 extensions, with no public S-1 filing or registered offering on EDGAR as of the run date. The investor mix — sovereign-aligned (Prosperity7), strategic GPU supplier (NVIDIA), category-defining cloud customer (Salesforce Ventures) and tier-1 financials (Coatue/KP/Lux) — is unusual and suggests Together is being positioned as a neutral, multi-stakeholder backbone for the open-model market.[CO012, CO013, CO014, CO015, CO016, CO017]

Stakeholder or investor map
StakeholderRoleRound(s)Control/economic importanceDiligence ask
Salesforce VenturesLead Series B (2024)BStrategic distribution into Salesforce ecosystemConfirm any commercial commit or revenue share
CoatueCo-lead Series BBPublic-market crossover signalingConfirm pro-rata posture
Kleiner PerkinsLead Series ASeed/A/BBoard seat; partner Bucky MooreConfirm board composition
NVIDIAStrategic investorA/BAllocation of H100/H200/B200 supplyQuantify supply commitment & pricing
Lux CapitalLead SeedSeed/AEarliest institutional backerConfirm board observer rights
Emergence CapitalSeries AAEnterprise-SaaS network
Prosperity7 (Aramco)Series AASovereign-aligned capital; Middle East GTMConfirm any sovereign-cloud commitments
NEA, Greycroft, SciFi, Factory, Long Journey, Definition, Long JourneyCo-investorsSeed/A/BRound support
Founders & employeesCommon stockReported >25% retained based on Series A pressConfirm cap table post-Series B

Cap table figures sourced from press releases at funding events; secondary sales not disclosed at runDate.

[CO012, CO013, CO014, CO015, CO016, CO017]
FO001: Company milestone timeline

Founding to Series B + flagship research drops.

[CO014, CO015, CO016, CO017, CO021, CO022]

1.4 Scale, Cover Metrics, and Milestones

Public scale metrics remain partial. The company has stated it operates more than 20,000 NVIDIA Hopper-class GPUs across multiple regions, with public roadmap notes referencing Blackwell rollouts, and serves "hundreds of thousands" of developers via the Together API, but it has not disclosed audited ARR, gross margin, paid-developer counts, or net revenue retention. CNBC reported a $100M annualised revenue pace around the Series B; Bloomberg cited triple-digit growth without a specific number. Reported headcount tracks above 150 globally, with active openings spanning kernel, networking, ML, and sales. The milestone timeline anchors founding (June 2022), seed (May 2023), RedPajama 1T dataset (April 2023), OpenChatKit (March 2023), Series A (November 2023), FlashAttention-3 (July 2024), Series B at $3.3B (July 2024), and StripedHyena/Mixture-of-Agents research (late 2023–2024). No adverse litigation, layoffs, or regulatory action has been reported through the run date, but key cover metrics (gross margin, ARR confirmation, customer concentration) remain undisclosed and are reflected in the snapshot KPI table.[CO019, CO020, CO021, CO022, CO023, CO024]

Milestone table
DateEventTypeAmount/valuation/statusParticipantsImplication
2022-06-27Together Computer Inc. incorporatedfoundingactivePrakash, Zhang, Ré, LiangIdentity established
2023-03-10OpenChatKit launchedproductreleasedTogether + LAION + OntocordOpen-source instruct-tuning baseline
2023-04-17RedPajama 1T dataset releasedproductreleasedTogether + EleutherAI + LAIONFoundational open dataset (1T tokens)
2023-05-15$20M seed announcedfinancingclosedLux + Factory + SciFiInstitutional launch capital
2023-11-29$102.5M Series Afinancingclosed at undisclosed valuationKleiner Perkins (lead), NVIDIA, NEA, EmergenceScale-up & H100 build-out
2024-03-13Reported interim raise at $1.25BfinancingreportedExisting investorsMid-cycle uplift
2024-07-09$305M Series B at $3.3B postfinancingclosedSalesforce Ventures + Coatue (co-leads), NVIDIA, Lakestar3x valuation step-up; enterprise pivot
2024-07-11FlashAttention-3 paper & blogproductreleasedDao et al.State-of-the-art H100 inference kernel
2024-09Together Inference Engine 2.0productreleasedTogether engineeringLatency / throughput leadership claim
2023-12StripedHyena-Nous-7BproductreleasedTogether + Nous ResearchNon-attention long-context architecture
2024-06Mixture-of-Agents paperproductreleasedTogether researchAgentic LLM technique
2024-Q4Dedicated Endpoints GAproductreleasedTogether engineeringEnterprise inference offer

No reported adverse events (litigation, layoffs, regulatory action) at runDate; absence of adverse events is itself a diligence finding pending background check.

[CO019, CO020, CO021, CO022, CO023, CO024]
FO003: Snapshot KPIs

IC-ready snapshot of maturity, traction, and capital.

[CO019, CO021, CO022, CO024, CO032]

1.5 Exhibits

Chapter 02

02Market Analysis

2.1 Market boundary and adjacencies

Together AI sits in the AI compute and inference platform layer of the modern cloud stack — between hyperscaler GPU IaaS (AWS, GCP, Azure), specialised GPU clouds (CoreWeave, Lambda, Crusoe), inference-API providers (Replicate, Fireworks, Groq, Modal), and closed-API model labs (OpenAI, Anthropic). The market we underwrite is the spend dedicated to running, fine-tuning, and serving open-weight or customer-owned foundation models, plus the dedicated and serverless GPU capacity used for AI workloads. Excluded from this market are general-purpose cloud compute, traditional ML platforms (Sagemaker training-only, classical scikit pipelines), and closed proprietary model APIs that do not host customer weights. Adjacencies include MLOps tooling (Weights & Biases, Anyscale), vector databases, and AI safety/observability vendors. Status-quo substitutes are self-hosted Kubernetes-on-GPU clusters and closed-API rentals from OpenAI/Anthropic, both of which trade flexibility for price and operational simplicity. We also explicitly exclude per-seat AI copilots (Copilot, Cursor) because the unit of demand is end-user seats rather than inference tokens, which means they sit one layer above Together in the application stack and procure rather than substitute for token-level inference.[CM001, CM002, CM003, CM004, CM005, CM006]

Market definition table
SegmentIncluded spendExcluded spendBuyer/payerRelevance to Together
Open-weight model inference (API)Per-token serverless inference on Llama/Mistral/Qwen/DeepSeekClosed-API tokens (OpenAI/Anthropic)Developer + CTOCore SOM
Dedicated GPU capacityReserved H100/H200/B200 endpointsGeneral-purpose cloud computePlatform teamDirect expansion ARR
Fine-tuning + custom model hostingLoRA, full fine-tune, custom checkpoint hostingIn-house Kubernetes trainingML engineering leadHigh-margin attach
Batch inference + trainingMulti-million-token batch jobs, pretraining runsClosed training-only platformsResearch leadGrowth wedge
Sovereign / regional clustersDedicated in-region capacityPublic-region multi-tenantGovernment / regulated CIODifferentiated lane
MLOps + observabilityLogs, evals, fine-tune jobsBI/analyticsMLOps leadAdjacency, not core
Closed-API model rentalsOpenAI/Anthropic API spendApp developerSubstitute / pressure

Boundary anchors on customer ownership of weights and on GPU-backed compute as the billable unit; excludes general-purpose cloud and closed-only APIs.

[CM001, CM002, CM003, CM004, CM005, CM006]

2.2 TAM/SAM/SOM and sizing lenses

Multiple analyst sources converge on a 2024 AI infrastructure TAM of $40–60B with 30–50% CAGR through 2028 (Gartner, IDC, McKinsey). Within that envelope, the inference and dedicated AI compute SAM most relevant to Together is sized at $8–15B in 2026 by triangulating: hyperscaler AI revenue disclosures ($26B annualised AWS Bedrock-equivalent revenue extrapolated), Series B coverage citing inference as the fastest-growing line, and the ~$100M Together ARR proxy implying a single-digit share of an early SAM. SOM (Together-addressable, near-term winnable spend) is on the order of $1–3B, focused on AI-native startups, model labs, and the Salesforce + sovereign-cloud channels Together has explicit relationships with. Sizing is constrained by the lack of disaggregated public reporting from hyperscalers and by the conflation of training capex with inference run-rate in many published estimates.[CM007, CM008, CM009, CM010, CM011, CM012]

TAM/SAM/SOM or sizing lens table
PublisherYearGeographyValueCAGRMethodologyConfidenceLimitation
Gartner2024Global$40–60B AI infrastructure TAM30–50%Top-down survey of hyperscaler + enterprise AI spendmediumAggregates training+inference; not disaggregated
IDC (cited via secondary)2024Global$50B AI infrastructure 202435%Hardware + cloud forecastlowIndirect citation
McKinsey AI spend report2024Global$50–100B 2027 AI infra40%Scenario analysislowWide range; assumptions unclear
Triangulated SAM (this report)2026Global$8–15B inference + dedicated SAMBottoms-up from CNBC ARR + hyperscaler disclosuresmediumSingle-source dependence on hyperscaler quarterlies
Triangulated SOM (this report)2026Global$1–3B Together-addressableChannel + Together $100M ARRmediumHigh estimation uncertainty
NVIDIA earnings (data centre)2025-Q1Global>$30B/qtr DC revenue>50%Public filingshighIncludes training capex sales, not pure inference

TAM/SAM/SOM are bounded; ranges preserved because no single public source disaggregates inference spend cleanly.

[CM007, CM008, CM009, CM010, CM011, CM012]
FM001: Market sizing lens

TAM/SAM/SOM for Together-addressable AI compute.

[CM007, CM008, CM011, CM012, CM036]
FM002: Market estimate range

Inference SAM 2026 estimates.

[CM009, CM010]

2.3 Buyer, user, and payer segmentation

Three primary buyer segments drive Together demand. (1) AI-native startups and model labs: technical founders or CTOs choose Together for FlashAttention-class inference latency, dedicated H100/H200 access, and open-weight flexibility; these are typically self-serve credit-card purchasers escalating to enterprise contracts. (2) Enterprise platform teams and applied-ML groups inside Fortune-500 companies: budget owners are CIOs/CTOs evaluating multi-model strategies, with procurement gates around data residency, SOC 2, and BAA support; Salesforce Ventures co-leadership of the Series B underwrites this segment. (3) Government, research, and sovereign-cloud customers: Prosperity7 (Aramco) and similar sovereign-aligned LPs signal a Middle East/APAC angle, and Together has positioned dedicated regional clusters as a differentiator. Users (developers, ML engineers, researchers) often differ from payers (finance, procurement, IT), which lengthens enterprise cycles but improves NRR once landed.[CM014, CM015, CM016, CM017, CM018, CM019]

Segment / buyer map
SegmentBuyerUserPayerWorkflowBudget ownerAdoption trigger
AI-native startupCTOML engineerFounder/CFOSelf-serve API + LoRACTONeed open weights + dedicated GPUs
F500 platform teamCIOApplied MLIT procurementRFP + dedicated endpointsCIOMulti-model strategy + BAA
Sovereign cloudMinister/CIOGovernment MLTreasuryIn-region dedicated capacityGovernmentData residency mandate
Model labFounderResearcherFounderReserved training + inferenceFounderGPU scarcity at hyperscalers
Independent devSelfSelfSelfPer-token APISelfFree tier + pricing parity
Salesforce ecosystem ISVProduct VPEng teamProduct P&LEmbedded GenAIProduct VPSalesforce Ventures channel

Buyer/user/payer split distinguishes self-serve credit-card adoption from enterprise procurement gates.

[CM014, CM015, CM016, CM017, CM018, CM019]
FM003: Buyer / segment map

Adoption maturity by segment.

[CM014, CM015, CM016, CM017, CM018, CM019]

2.4 Growth drivers and constraints

Tailwinds: ongoing open-weight model proliferation (Llama 3/4, Mistral, DeepSeek, Qwen), GPU scarcity at hyperscalers, FinOps pressure to reduce per-token closed-API spend, and the agentic AI wave that multiplies token-volume per user. Headwinds: NVIDIA supply allocation favouring hyperscalers, sovereign data rules slowing cross-border inference, energy/permitting bottlenecks for new data centres, and competitive pricing pressure from Groq, Fireworks, and Cerebras at the inference layer. Adoption-timing risks include enterprise procurement friction, the possibility that hyperscalers commoditise the OSS-inference layer (AWS Bedrock open models, GCP Vertex Model Garden), and the volatile economics of training-vs-inference mix. Together's positioning depends on staying a generation ahead on inference kernels (FlashAttention 3/4, ThunderKittens) while expanding into reserved/dedicated SKUs that lock enterprise spend. Each driver and constraint feeds back into a binary question for IC: does the inference SAM compound at 35%+ for three more years, or does hyperscaler commoditisation pull growth forward into a single year of land-grab? Our base case assumes durable 30–40% CAGR through 2027 with widening competitive intensity from 2026 onward, which is the regime in which Together's open-source flywheel and dedicated-capacity differentiation produce the strongest IRR.[CM021, CM022, CM023, CM024, CM025, CM026]

Growth drivers and constraints table
Driver/constraintDirectionTimingImplicationDiligence ask
Open-weight model proliferation+2024-2027Sustains SAM growth >35% CAGRTrack Llama 4/5, DeepSeek, Qwen release cadence
NVIDIA Hopper/Blackwell scarcity+2024-2026Drives Together's reserved capacity premiumQuantify Together NVIDIA allocation pact
Closed-API price pressure (OpenAI cuts)-OngoingCompresses per-token marginTrack Together pricing parity vs OpenAI
Hyperscaler open-model commoditisation-2025-2027Erodes pure-inference SAMWatch AWS Bedrock & Vertex Model Garden expansion
Sovereign data residency rules+/-2025+Creates regional moats but caps cross-border ARRConfirm Together in-region clusters
Energy/permitting bottlenecks-2026-2028Slows capacity expansionConfirm Together DC contracts
Agentic workloads multiply tokens+2025+Increases inference volume per userTrack MoA + agent SDK adoption
FinOps push to OSS inference+2025+Tailwind for Together vs closed APIsSurvey enterprise FinOps strategy

Drivers cited from multiple analyst notes and partner statements; constraints triangulated from supply-chain reporting and hyperscaler announcements.

[CM021, CM022, CM023, CM024, CM025, CM026]
FM004: Adoption funnel or value-chain map

Discovery to expansion path.

[CM020, CM021, CM022, CM033]

2.5 Exhibits

Chapter 03

03Competitors

3.1 Competitive landscape segmentation

Together competes across five overlapping arenas. (1) Hyperscaler open-model offerings — AWS Bedrock and Google Vertex Model Garden host the same Llama/Mistral checkpoints Together offers, bundled with enterprise contracts and IAM. (2) Specialised GPU clouds — CoreWeave, Lambda Labs, and TensorWave compete for raw GPU-hour and reserved capacity; they typically lack the inference SaaS layer Together overlays. (3) Inference-API peers — Fireworks, Replicate, Modal, and Anyscale provide near-direct substitutes at the per-token serverless layer; Fireworks is most frequently cited as Together's closest direct rival. (4) Bespoke-silicon inference vendors — Groq (LPU), Cerebras (wafer-scale), and SambaNova compete on latency and price/token at the cost of model coverage. (5) Closed-API model labs — OpenAI and Anthropic act as substitutes for buyers willing to give up weight portability. The status-quo alternative is self-hosted Kubernetes-on-GPU, which trades flexibility for operational burden; internal-build is most common at frontier labs and FAANG. The competitive set is unusually broad because Together sits at the intersection of compute, model hosting, and developer experience; each arena exposes Together to different cost structures (capex-heavy GPU clouds vs OpEx-light API providers), different distribution power (hyperscaler procurement vs developer self-serve), and different exit dynamics (consolidation among GPU clouds vs commoditisation among API peers), all of which we underwrite separately below.[CP001, CP002, CP003, CP004, CP005, CP006]

Competitor profile table
CompetitorCategoryScale/fundingTarget segmentDifferentiationLimitation
AWS BedrockHyperscaler open-model>$80B AWS revenueEnterpriseIAM, compliance, bundlingPer-token premium, slower model adds
GCP Vertex Model GardenHyperscaler open-model~$30B GCP revenueEnterpriseGemini + open modelsLess open-weight depth
CoreWeaveSpecialised GPU cloud>$8B raised; public 2025AI labs, hyperscaler offloadLargest non-hyperscaler GPU fleetNo inference SaaS layer
Lambda LabsGPU cloud$320M Series CResearchers, startupsOn-demand H100/H200Smaller fleet vs CoreWeave
Fireworks AIInference API peer>$77M raisedDevs, startupsOpenAI-compatible APISmaller OSS-research footprint
ReplicateInference API peer>$40M raisedIndie devsCommunity models, low frictionCold-start latency
ModalServerless infra>$80M raisedML engPython-native serverlessLess model breadth
AnyscaleRay-based platform>$250M raisedML engRay + LLM toolingOSS-platform tax
GroqBespuke silicon>$1B raisedLatency-sensitive devsLPU inference speedLimited model coverage
CerebrasBespoke silicon>$1B raised; IPO filedFrontier customersWafer-scale chipHigh per-deployment cost
OpenAI / Anthropic (substitute)Closed API>$30B / $10B raisedEnterprise + devsFrontier closed modelsNo weight portability
TensorWaveAMD GPU cloudSeed-stageCost-sensitive devsMI300X capacityLimited scale

Funding and scale figures sourced from public press releases and Crunchbase summaries; some private funding rounds rely on third-party reporting.

[CP001, CP002, CP003, CP004, CP005, CP006]

3.2 Capability and feature comparison

On capability axes Together leads on FlashAttention-3/4 kernel performance, open-weight model breadth (Llama, Mistral, DeepSeek, Qwen, custom checkpoints), and dedicated-endpoint flexibility. Hyperscalers lead on enterprise compliance breadth (BAA, FedRAMP, regional residency) and bundled identity/billing. Groq leads on raw single-stream latency on supported models but lags on model coverage. Fireworks closely matches Together on serverless open-model APIs but has lower OSS-research visibility. Pricing comparison shows Together's serverless rates clustered around the OpenAI-parity envelope (≈$0.20–$0.90/M input tokens for 7–70B models) with batch discounts up to 50%; CoreWeave/Lambda undercut on raw GPU-hour but require customer DevOps; AWS Bedrock charges a per-token premium on top of underlying compute. Feature matrices below mark unsupported cells as unknown rather than guessing. The matrix shows Together winning on open-weight breadth and kernel performance, hyperscalers winning on compliance and IAM, and bespoke-silicon vendors winning on latency at the cost of model coverage; no single vendor dominates the four most cited buying criteria simultaneously. We also note that Together is one of only two vendors in the set that ships an OpenAI-compatible chat completions endpoint while also exposing fine-tune and batch SKUs, which materially shortens migration time for buyers leaving closed APIs.[CP009, CP010, CP011, CP012, CP013, CP014]

Feature / capability matrix
Buying criterionTogetherBedrockGCP VertexFireworksGroqCoreWeave
Open-weight model breadthhighmediummediumhighlown/a
FlashAttention-class kernel perfhighunknownunknownhighmediumn/a
Dedicated endpoints / reservedyesyes (provisioned)yesyesyesyes (raw)
Fine-tuning APIyespartialyesyesnono
Batch inference SKUyespartialyespartialnono
Compliance (SOC2/HIPAA/FedRAMP)SOC2; HIPAA via BAAfullfullSOC2unknownSOC2
Sovereign / regional clustersavailablefullfulllimitedunknownfull
OpenAI-compatible APIyesnonoyesyesno
Per-token list pricing transparencyhighmediummediumhighhighn/a
Multi-modal (vision/audio/image)yespartialyespartialnon/a

Cells marked "unknown" where public docs do not disclose; cells marked "n/a" where the feature is outside the competitor's SKU.

[CP009, CP010, CP011, CP012, CP013, CP014]
Pricing / packaging comparison
VendorSKUPrice/unitDiscountNotes
TogetherServerless Llama-70B$0.88/M tokensOpenAI-parity envelope
TogetherBatch inference-50% off serverlessbatchUpdated 2025
TogetherDedicated endpointcustomreservedQuoted via sales
FireworksServerless Llama-70B$0.90/M tokensSimilar parity
ReplicatePer-secondvariesGPU-second billing
AWS BedrockLlama 3 70B$0.99/M output tokensvolProvisioned reserved option
GCP VertexLlama 3 70B$0.99/MvolSimilar to Bedrock
GroqLlama 3 70B$0.59/MLatency premium
CoreWeaveGPU-hour$2–4/H100-hrreservedCustomer manages stack
LambdaGPU-hour$2.79/H100-hron-demandCustomer manages stack

Per-token prices reflect public list pricing on vendor sites at runDate; realised pricing for enterprise deals is undisclosed.

[CP016, CP017, CP018, CP019, CP020]
FP001: Competitive positioning map

Open-weight breadth vs enterprise compliance maturity.

[CP001, CP009, CP011, CP012, CP013]
FP002: Feature breadth / capability map

Capability strength by competitor.

[CP010, CP014, CP015, CP018, CP029]

3.3 Moat durability and competitive risk

Together's defensible moats are (a) FlashAttention research lineage and kernel velocity (with Tri Dao + Chris Ré), (b) open-source community gravity (RedPajama, StripedHyena, MoA), and (c) the NVIDIA + Salesforce + sovereign capital stack that secures GPU supply and enterprise distribution. Switching costs are mid: customers can multi-home across Together / Fireworks / Bedrock with API translation; however, dedicated-endpoint contracts and fine-tuned model artifacts on Together raise stickiness. Distribution power tilts to hyperscalers — they own enterprise procurement and identity — but Together's neutrality and open-weight commitment is a counter-positioning differentiator. Adverse competitor evidence: CoreWeave's 2024 IPO filings and Lambda's growth signal substantial capital advantage at the IaaS layer; Groq and Cerebras have raised >$1B each at higher valuations; Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity workloads. Commoditisation risk is real but bounded by Together's research velocity and dedicated-capacity contracts. Net, we believe the moat is durable through 2027 in the dedicated and high-performance segments, with growing pressure on the commodity serverless tier from hyperscalers and on the latency-critical tier from bespoke silicon vendors; the company's ability to sustain kernel and architecture leadership is the gating variable for the moat thesis and is therefore the principal item on the technical-diligence checklist.[CP021, CP022, CP023, CP024, CP025, CP026]

Moat durability / competitive risk register
Moat claimThreatSeverityMitigation/diligence ask
FlashAttention research lineageOpen-source diffusion to competitorsmediumTrack Together's patent/IP posture
Open-source community gravityCompeting OSS projects from Mistral/HFmediumQuantify Together GH/HF traction over time
NVIDIA supply alignmentNVIDIA tilts to hyperscalershighDocument Together NVIDIA pact
Salesforce / enterprise channelSalesforce develops its own AI inframediumConfirm Salesforce commercial commit
Sovereign capital + regional clustersSovereign customers go direct to local cloudsmediumMap Together regional DC footprint
Dedicated endpoint stickinessBedrock provisioned throughput parityhighTrack Bedrock open-model price moves
Open-weight neutralityEnterprise wants closed-API simplicitymediumSurvey enterprise multi-model strategy
Inference engine performance leadSpecialised silicon (Groq/Cerebras) leapfrogshighBenchmark Together vs Groq on shared models

Moats ranked by exposure to competitive substitution and capital intensity; each row has a concrete diligence ask.

[CP021, CP022, CP023, CP024, CP025, CP026]
FP003: Moat / readiness KPIs

Compact competitive durability summary.

[CP021, CP022, CP023, CP024, CP025, CP026]

3.4 Exhibits

Chapter 04

04Financials

4.1 Funding history and capital stack

Together AI has assembled approximately $533M of disclosed primary capital across four publicly announced rounds. The seed of $20M (May 2023) was led by Lux Capital with Factory, SciFi, Long Journey, and notable individual backers (Scott Banister, Jakob Uszkoreit, Aravind Srinivas). A $102.5M Series A in November 2023 was led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft. In March 2024 the company added approximately $106M at a reported $1.25B valuation (sometimes referred to as Series A2), then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation, with Lakestar, NVIDIA, and several strategics participating. No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the runDate, and no public secondary or 2026 extension has been confirmed. The capital stack therefore reads as venture-only with strategic anchors (NVIDIA for GPU supply, Salesforce Ventures for enterprise distribution, Prosperity7 for sovereign optionality); board control is split across KP, Coatue and Lux based on round-led signaling, but the cap table itself is not public. Cumulative dilution is undisclosed; founders are widely reported to retain a meaningful equity stake post Series B, but exact percentages are not in the public record and must be verified against management.[CI001, CI002, CI003, CI004, CI005, CI006]

Capital adequacy table
Capital primitiveValueDatePublic statusDiligence ask
Cumulative primary capital~$533M2024-07disclosed (round level)
Cash on handundisclosedmissingRequest cash position
Monthly burnundisclosed (~$15-25M implied)2024-25missingRequest actual burn
Runway monthsundisclosed (likely 18-30 implied)2025missingRequest runway plan
Planned use of fundsundisclosedmissingRequest capex plan
Next-round triggerundisclosedmissingRequest milestones
Debt / project financeundisclosedmissingRequest facility terms
Vendor financing (NVIDIA)undisclosedmissingConfirm any equipment financing
Series B valuation$3.3B post2024-07-09disclosed
Latest secondary clearing priceundisclosedmissingPitchbook / Information chatter

Capital primitives mix disclosed round amounts with undisclosed forward-looking financial primitives.

[CI007, CI023, CI026, CI027, CI030, CI031]
FI004: Capital intensity / cash-flow map

Capex and operating cash flow mapped against funding rounds.

Cash balance and next-round trigger are undisclosed; arrows illustrate direction not magnitude.

[CI007, CI026, CI027, CI030]

4.2 Revenue, pricing, and reported scale

Together has not filed financial statements. CNBC reporting around the July 2024 Series B cited a $100M annualised revenue pace; Bloomberg cited "triple-digit growth"; Fast Company and VentureBeat repeated those figures without independent verification. The Information has separately reported on 2025 revenue trajectory behind a paywall; PitchBook lists the company as later-stage venture without a confirmed 2025 follow-on. Pricing is published per-token on the public pricing page, ranging roughly $0.20–$0.90/M tokens for 7–70B open models, with a documented 50% batch-inference discount and custom dedicated-endpoint pricing quoted via sales. SKUs include serverless, dedicated/reserved endpoints, fine-tuning, batch, and embeddings; vision/audio/image SKUs are documented separately. There is no published ARR, segment split, customer concentration, NRR, or gross margin disclosure at the runDate. Forrester and IDC market frames imply Together is a growth-stage entrant in a multi-billion-dollar generative-AI inference TAM, but neither analyst names Together among its top three vendors. Management has acknowledged enterprise pipeline acceleration tied to Salesforce Ventures co-selling but has not quantified it. This combination of company-claimed momentum, third-party press anecdotes, and absent audited disclosure is consistent with private growth-stage SaaS, but creates significant diligence risk on realised vs list pricing, mix, and gross margin. The GTM motion is dominated by self-serve developer signup at the top of the funnel and partner-led enterprise expansion through Salesforce Ventures and NVIDIA channel referrals; sales-cycle length, CAC, and payback are undisclosed but can be inferred to be 60-120 days for enterprise dedicated contracts based on comparable inference-API vendor disclosures.[CI009, CI010, CI011, CI012, CI013, CI014]

Revenue streams table
SKUPricing basisPublic price benchmarkDiscount leversDiligence gap
Serverless inferenceper million tokens$0.20–$0.90/M (open models 7–70B)volume / committed-useRealised vs list pricing not disclosed
Batch inferenceper million tokens50% discount vs serverlessbatch SLA windowConfirmed via 2025 blog update
Dedicated endpointscustom / reservedquoted via salesterm commitmentNo published list pricing
Fine-tuning APIper training runquoted on pricing pagevolumePublic docs but no margin disclosure
Embeddings APIper million tokenspublished per-modelvolume
Vision / image / audio APIsper request / per tokenpublished per modelRevenue mix not split
Enterprise contractsannual / committedundisclosedstrategic discountsCritical diligence ask

Pricing rows mix published list pricing (high confidence) and inferred enterprise practice (low confidence); revenue mix between SKUs is not disclosed and must be requested.

[CI009, CI010, CI011, CI012, CI013, CI014]
Pricing / monetization table
Pricing dimensionPublic benchmarkList vs realisedDiscount / unknownSource
Per-token Llama-70B$0.88/M serverlesslist onlyvolume discountpricing page
Batch SLA discount-50% vs serverlesslist onlybatch window2025 batch blog
Dedicated endpointcustom / per hourrealised not disclosedterm commitblog + sales-quoted
Fine-tuning runper training tokenlist onlyvolumedocs FT page
Embeddingsper million tokenslist onlyvolumedocs embeddings page
Enterprise contract valuenot disclosedrealised undisclosedstrategic discountsrequested from management
Co-sell rebates (Salesforce)not disclosedrealised undisclosedpartner economicsSalesforce Ventures co-sell
Sovereign-cloud premiumnot disclosedrealised undisclosedregionalProsperity7 strategic

List pricing is publicly verifiable; realised pricing across enterprise contracts is undisclosed and must be requested.

[CI012, CI013, CI014, CI015, CI016]
FI001: Revenue model bridge

How customer activity converts to Together revenue and gross profit.

Gross-profit edge is illustrative; realised margin is undisclosed.

[CI012, CI013, CI014, CI015, CI024, CI025]

4.3 Unit economics, capital adequacy, and gaps

Together's public profile permits only rough unit-economics estimation. On the cost side, GPU-hour COGS scale with NVIDIA capex; CoreWeave's S-1 disclosures (a useful comparable) show GPU cloud gross margins in the 60–70% range on reserved deals and lower on on-demand. Per-token gross margin at Together's list pricing is plausibly 40–60% on serverless and higher on dedicated, but realised margin depends on utilisation and reserved-capacity contracts that are not public. On the cash side, $533M raised against an implied $300–$500M cash burn through 2024 (consistent with hyperscale GPU buildout and a 150+ headcount) suggests runway into 2026, but no figure is confirmed. Capital adequacy depends on whether Together extends Series B or files for IPO; the Figma and CoreWeave 2025 IPO precedents show the public-market window is open for AI-infrastructure issuers, while Navan's S-1 process is a closer growth-SaaS comparable. Gaps are material: ARR confirmation, gross margin by SKU, customer concentration top-10, net dollar retention, contracted vs uncontracted revenue, runway months, debt or vendor financing, and any sovereign-cloud commitments. These gaps drive the diligence ask list in the unit-economics and capital-adequacy tables and underpin a material evidence-gap entry for each undisclosed primitive; absent management disclosure, the most informative external signals are Together's public hiring posture, pricing-page revisions, and any 2026 secondary-market chatter, all of which should be tracked through the close of diligence. Working capital is unlikely to be a constraint at this scale of consumption-based SaaS; the bigger swing factor on cash is the pace of GPU capex relative to revenue ramp, which sets the cadence for the next round trigger. The verdict is that revenue quality and growth optics are strong but unverified; margin path is plausible but unaudited; capital intensity is high but underwritten by NVIDIA alignment; and the principal diligence blocker is the full set of private financial primitives enumerated in the public-financial-gaps table.[CI019, CI020, CI021, CI022, CI023, CI024]

Unit economics table
MetricValue / nullConfidenceWhy it mattersDiligence ask
Serverless gross margin40–60% (inferred)lowLong-term margin pathRequest actual blended GM
Dedicated gross margin60–75% (inferred)lowReserved customer LTVRequest dedicated GM split
Batch gross margin35–55% (inferred)lowBatch GM after 50% discountConfirm batch utilisation
CAC paybacknulllowSales efficiencyRequest payback months by segment
Magic numbernulllowSales productivityRequest magic number
NRRnulllowExpansion proxyRequest NRR by cohort
Gross retentionnulllowChurn proxyRequest gross retention
Implied burn 2024$300–$500M (inferred)lowCash adequacyRequest 24-month plan
Utilisation of GPU fleetnulllowUtilisation drives GMRequest utilisation by SKU
SBC rationulllowTrue marginRequest SBC schedule

All values are inferred ranges or nulls; every null is accompanied by a specific diligence request.

[CI019, CI020, CI021, CI022, CI023, CI024]
Public financial gaps table
ItemPublic statusWhy it mattersDiligence ask
Audited revenue (ARR)not disclosedValidates third-party $100M figureRequest management ARR & growth deck
Gross margin by SKUnot disclosedUnderpins long-term thesisRequest COGS breakdown by SKU
Net dollar retentionnot disclosedStickiness proxyRequest NDR by cohort
Top-10 customer concentrationnot disclosedRevenue concentration riskRequest top-10 anonymised
Contracted revenue (RPO)not disclosedForward visibilityRequest contracted vs uncontracted split
Cash & runwaynot disclosedCapital adequacyRequest cash position & 24-mo plan
Debt / vendor financingnot disclosedCapital structureRequest facility terms, if any
Founder ownershipnot disclosedAlignment, dilutionRequest cap table
NRR vs gross retentionnot disclosedExpansion vs churnRequest gross / net retention
Stock-based compensationnot disclosedReal vs reported marginRequest SBC schedule
Realised enterprise pricingnot disclosedTrue margin vs listRequest three sample contracts

All items are material to underwriting and none are public at the runDate; chapter relies on third-party signals and management requests to close the gap.

[CI019, CI020, CI021, CI022, CI023, CI024]
FI002: Unit economics bridge

Inputs to per-token unit economics in absence of disclosed values.

All quantitative nodes are inferred ranges or null; this is a qualitative bridge.

[CI012, CI016, CI019, CI020, CI024, CI025]
FI003: Financial estimate range

Source-backed bounds on revenue, burn, runway, and margin.

Ranges are illustrative; lower bound is most conservative public datapoint and upper bound reflects 2x of most aggressive public datapoint.

[CI009, CI024, CI025, CI026, CI027]

4.4 Exhibits

Chapter 05

05Product & Technology

5.1 Product surface, modules, and SKUs

Together AI exposes a single platform with serverless inference, dedicated endpoints, fine-tuning, batch inference, embeddings, and modality-specific APIs (vision, audio, image). The product surface is documented at docs.together.ai and is OpenAI-compatible at the chat-completions level, making migration from closed APIs straightforward. The model catalog spans 200+ open models including Llama 3/4, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena, and custom fine-tuned checkpoints; published model and SKU references confirm per-token and per-request billing surfaces. Dedicated endpoints offer reserved capacity on H100/H200/B200 GPUs for latency-sensitive workloads and are quoted via sales. The fine-tuning API accepts LoRA and full-parameter training jobs across most supported families. Batch inference offers up to a 50% discount versus serverless with documented SLA windows. SDKs ship in Python and TypeScript, with raw HTTP for any other runtime; rate-limit documentation distinguishes free, paid, and enterprise tiers. The complete product module / asset matrix below enumerates each module, its primary user, maturity status, differentiation, and the gap a buyer should probe before committing to a long-term contract. Module ordering follows the buyer's typical adoption sequence: serverless first for experimentation, then dedicated and fine-tuning for production, then batch and embeddings for scaled workflows.[CE001, CE002, CE003, CE004, CE005, CE006]

Product module / asset matrix
ModuleUserStatus/maturityDifferentiationDiligence gap
Serverless inference APIDevelopers, startupsGAOpenAI-compatible chat completions on 200+ open modelsSLA % not published
Dedicated endpointsEnterpriseGAReserved H100/H200/B200 capacity, BAA availableList pricing not published
Fine-tuning APIML engineersGALoRA + full-parameter on Llama/Mistral/QwenTraining cost transparency
Batch inferenceML engineersGA (2025 update)50% discount vs serverlessRealised batch utilisation undisclosed
Embeddings APIDevelopersGAMultiple open embedding modelsPer-model retention tracking
Vision / image / audio APIsMulti-modal devsGALlama-Vision, image generation, audio transcriptionRegional availability map
Together Inference Engine (TIE v1/v2)Internal / advancedGAFA-3/4 + TK + speculative decodingEngine-version SLA differences
Mixture-of-AgentsResearchers, advanced devsbetaEnsemble inference for higher qualityCost premium vs single-model
Model storeAll usersGA200+ open + custom weightsCatalog churn cadence
SDKs (Python, TS, HTTP)DevelopersGAOpenAI-compat + nativeSDK release cadence

Maturity follows public docs status; cells marked beta or limited reflect explicit docs statements at runDate.

[CE001, CE002, CE003, CE004, CE005, CE006]
Workflow / use-case table
User jobCurrent workflowTogether solutionMeasurable benefitLimitation
Try an open modelLocal llama.cpp or HF SpacesServerless API callZero infra, OpenAI-compatCost at scale
Move production from closed APIOpenAI SDKSwap base URL to TogetherSame SDK, open weightsFeature parity edges
Fine-tune a Llama variantCustom GPU clusterFine-tune API + run jobNo DevOps neededLimited training-step visibility
Serve a low-latency appSelf-hosted vLLMDedicated endpointReserved capacity, BAAHigher commit
Run nightly batch summarisationSelf-hosted batchBatch inference SKU50% cheaper than serverlessBatch SLA window
Build an agentLangChain + closed APIFunction-calling + JSON mode + structured outputOpen-weight + tool useTool-call patterns evolving
Generate embeddingsHF embedding models locallyEmbeddings APIHosted, scalableRe-index cost
Multi-modal (vision)Self-host Llama-VisionVision APIHosted vision callImage-size limits
Research ensemblePaper-replication codeMoA APIOut-of-box ensembleHigher per-query cost
Run a regulated workloadOn-prem GPUDedicated + BAAHIPAA on dedicatedNo FedRAMP yet

Workflow rows are drawn from docs quickstarts and customer case studies; limitations are explicit docs caveats or known gaps.

[CE001, CE002, CE003, CE004, CE005, CE006]
FE001: Product architecture map

Together AI product stack layers from API down to GPU substrate.

[CE001, CE011, CE012, CE013, CE014, CE015]

5.2 Architecture, dependencies, and operating model

Together's architecture stacks application APIs (chat, completions, embeddings, fine-tune, batch) over a model registry and inference orchestrator that schedules GPU pods on a multi-region NVIDIA Hopper/Blackwell fleet. The inference engine (Together Inference Engine v1/v2) wraps FlashAttention-3 and FlashAttention-4 attention kernels, ThunderKittens kernel framework, and speculative-decoding/Medusa decoders to achieve published throughput and latency claims. Mixture-of-Agents (MoA) research enables ensemble inference for higher-quality completions on supported models. The model store is backed by HuggingFace and Together's own registry; weight portability is a stated design principle. Critical dependencies include NVIDIA GPU supply (Hopper/Blackwell), data-center co-location partners, the HuggingFace catalog for model artefacts, and AWS S3/equivalent storage for fine-tune artefacts. The operating model splits a kernel/inference engineering team (Tri Dao, HazyResearch lineage) from a platform/SRE team (Alon Gavrielov-led infrastructure org from 2025) and a research arm (Chris Ré, Percy Liang). The architecture is exposed through a flow figure (customer request to GPU pod to response) and a critical-dependency DAG that surfaces single-vendor concentrations. Reliability proof points are a status page, published rate-limit documentation, and a published roadmap of model launches at GTC 2025 and AI Native Conference 2025. Gaps include a public SLA percentage, the precise multi-region map (which regions, which providers), and a single-source-of-truth roadmap; all are flagged as evidence gaps.[CE011, CE012, CE013, CE014, CE015, CE016]

Technology / operating architecture table
Layer/componentRoleKey dependencyRisk
API gatewayReceive OpenAI-compat HTTP requestsAuth + rate limit infraDDOS, rate-limit miscalibration
Model registryResolve model id to weightsHuggingFace + internal storageWeight churn, license updates
Inference schedulerPlace request on GPU podGPU pool, kube/orchestratorHot-spotting, queue depth
Together Inference Engine v2Kernel-optimised model executionFA-3/4, ThunderKittens, speculative decodingEngine bug, regression on new model
GPU pool (Hopper / Blackwell)Compute substrateNVIDIA supply, co-lo partnersSupply shock, power outage
Fine-tune trainerLoRA / full-parameter training jobsGPU pool + object storageJob-failure cost
Batch queueSchedule batch inferenceGPU off-peak windowSLA violation if peak overlaps
Embedding serviceEmbed text/imagesEmbedding model registryModel deprecation
Vision/audio pathMulti-modal inferenceSeparate model stackMode-specific bugs
Observability / statusSLA monitoringstatus.together.ai feedPublic SLA still missing
Trust / complianceSOC 2 + HIPAA controlsAudit cadenceFedRAMP not yet GA
Storage (fine-tune artefacts)Persist trained modelsS3-equivalent storageLoss/leak scenarios

Architecture layers reflect documented surfaces; depth of each layer is inferred from blog + research papers and may not be exhaustive.

[CE011, CE012, CE013, CE014, CE015, CE016]
FE002: Customer workflow / operating flow

Customer request through Together platform to a completion.

[CE011, CE012, CE013, CE014, CE015, CE016]
FE003: Critical dependency map

Suppliers, platforms, and partners Together depends on.

[CE014, CE018, CE019, CE020, CE021, CE022]

5.3 Trust, security, compliance, and roadmap

Together publishes a trust center referencing SOC 2 Type II attestation, HIPAA business associate agreement (BAA) availability on dedicated endpoints, and standard data processing terms. FedRAMP and similar US-Federal accreditation are not yet listed at the runDate; regional residency is offered through dedicated clusters but the public map is partial. Safety controls span content moderation, function-calling JSON validation, structured-output JSON mode, and per-model safety guidance. The roadmap, mined across the blog and AI Native Conference posts, includes Blackwell (B200) capacity ramp, batch inference SKU expansion, expanded fine-tune families, multi-modal (vision+audio) coverage, and Mixture-of-Agents productisation. Differentiation rests on (a) kernel-level performance lead (FA-3/4, TK), (b) breadth of open-weight model coverage, (c) flexibility across serverless/dedicated/batch SKUs, and (d) dual research-and-engineering culture with deep Stanford/Princeton/ETH lineage. Public developer signal — GitHub repo activity, PyPI download trajectory, HuggingFace model hub presence, and Hacker News thread engagement — confirms an active developer community without yet matching the scale of OpenAI or Hugging Face itself. Compared with hyperscaler offerings, Together's differentiation is most visible on open-weight neutrality and kernel performance and least visible on enterprise compliance breadth. The trust/compliance and roadmap tables below summarise each control and milestone with its current status, scope, and gap; cells marked unknown reflect missing public disclosure rather than absence of the underlying capability.[CE023, CE024, CE025, CE026, CE027, CE028]

Trust / quality / compliance table
Control / certificationStatusScopeGap
SOC 2 Type IIattestedplatformNeed recent attestation date
HIPAA / BAAavailablededicated endpointsNot on serverless tier
GDPR / DPAavailableEU customersSpecific regional residency
FedRAMPnot yetUS FederalRoadmap timing not confirmed
ISO 27001not confirmedStatus uncertain
Data residency / regional clusterpartialEU, USPublic region map limited
Content moderation / safetydocumentedAPI-levelPer-model behaviour differs
Function calling / JSON modeGAAPITool-use patterns evolving
Structured outputGAAPI
Audit logsdocumentedenterpriseDefault not enabled
Custom model weights privacydocumenteddedicatedNeed contract review
Bug bounty / responsible disclosurepublishedplatform

Controls cross-verified against trust.together.ai pages, blog posts, and public docs; cells marked "not confirmed" reflect absent public disclosure rather than absence of the underlying control.

[CE023, CE024, CE025, CE026, CE027, CE028]
Roadmap / release / development-stage table
Date / stageFeature / milestoneStatusImplicationSource
2024-07FlashAttention-3GAKernel lead on HopperarXiv 2407.08608
2024-10ThunderKittensGAKernel frameworkTogether blog
2024-11Startup AcceleratorLaunchedGTM channelTogether blog
2025-03GTC 2025 PioneerseventCustomer + NVIDIA visibilityTogether blog
2025-04Alon Gavrielov as VP InfrahiredOperating scaleTogether blog
2025-05Adaption partnershipLaunchedHealthcare workflowTogether blog
2025-06AI Native ConferenceeventResearch + product announcementsTogether blog
2025-08FlashAttention-4GANext-gen kernelTogether blog
2025-09Batch inference API updatesGA50% discount + SLATogether blog
2026-Q1Blackwell (B200) rolloutplannedCapacity & priceInferred from docs
2026Expanded MoA productisationplannedQuality tierAI Native Conference
2026Multi-modal expansionplannedVision+audio coverageTogether blog

Roadmap items beyond runDate are explicitly marked planned; sources include blog posts and conference announcements.

[CE033, CE034, CE035, CE036, CE037]
FE004: Product maturity / capability map

Maturity rating across product modules.

[CE001, CE002, CE003, CE004, CE005, CE006]

5.4 Exhibits

Chapter 06

06Customers

6.1 Customer segmentation and adoption surface

Together AI's customer base is segmented by buyer/user role and by deployment intensity. The top of the funnel is self-serve developers using serverless inference for prototyping or low-volume production: per company disclosure, more than 100,000 developers have used the platform since GA. Beneath that sit named startup customers — Pika (video), Arcee (open-source merging), Nous Research (community models), Cartesia (voice) — who run production workloads via a mix of serverless and dedicated endpoints. The enterprise tier is anchored by Salesforce (referenced via Salesforce Ventures co-sell and a customer case study), Zoom (customer case study), and Washington University (research deployment); the NVIDIA GTC 2025 Pioneers programme surfaced an additional cohort of customers including healthcare, robotics, and developer-tools companies. The Startup Accelerator (launched 2024-11) is an explicit funnel for early-stage AI startups, providing credits, technical support, and GTM amplification. Geographic mix is North America-skewed with EU presence growing through dedicated clusters; vertical mix spans developer tools, content/media (video, voice, image), enterprise SaaS, healthcare, and academia. Payer/user/buyer split varies by tier: in self-serve the developer is both buyer and user; in enterprise the buyer is typically a CTO/CIO or platform-engineering lead while the users are application teams. Customer segmentation, adoption-trajectory and named-customer-proof tables below capture each row's evidence quality and the residual gap on retention and concentration.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
SegmentBuyer/user/payerUse caseScaleRevenue / strategic valueGap
Self-serve developersDeveloper = buyer + userPrototyping, low-volume production100,000+ devs (company claim)Long-tail revenue + funnelPaid vs free not split
AI-native startupsCTO/founderProduction inferencePika, Cartesia, Nous, Arcee documentedHigh strategic valueNo revenue values disclosed
Enterprise SaaSCIO/platform engEmbedded AI featuresSalesforce, ZoomLarge strategic valueContract sizes not disclosed
HealthcareCIO/clinical leadRegulated workflows (BAA)Adaption (2025 launch)StrategicProduction status TBD
Academia / researchPI / IT leadResearch computeWashington UniversityBrand valueSpend size not disclosed
Developer toolsFounder/CTOEmbedded inferenceGTC 2025 cohortPipelineCohort not enumerated
Sovereign / govtProcurementSovereign cloudProsperity7-aligned (implied)Strategic optionalityNo public proof
Open-source communityMaintainersOSS model servingHuggingFace mirror integrationBrand + communityActive vs passive use

Segmentation rows mix named case studies with inferred categories; revenue-band values are unavailable.

[CU001, CU002, CU003, CU004, CU005, CU006]
Customer growth / adoption trajectory table
MetricValueDateSourceConfidenceImplicationMissing denominator
Developers using platform100,000+2024Together bloglowTop-of-funnel scalePaid vs free split
Named customer case studies7+ published2024-25Together bloghighReal production usageTotal customer count
GTC 2025 customer cohort~12 pioneers2025-03Together blog + NVIDIAmediumEnterprise pipelinePer-customer ACV
Startup Accelerator participantsundisclosed N2024-11 onwardsTogether bloglowPipeline leverCohort size
Adaption healthcare partner1 (launched)2025Together blogmediumRegulated entryProduction status
HuggingFace integration usersundisclosed2024-25HF bloglowOpen-source pullActive developers
G2 reviewsvery small N2025G2lowIndependent proofVolume too low to be representative
Trustpilot reviewsvery small N2025TrustpilotlowIndependent proofVolume too low to be representative

Trajectory rows mix company-claimed (low confidence) and third-party-reported numbers; denominators are explicitly listed as missing.

[CU001, CU002, CU011, CU012, CU013, CU014]
FU001: Customer journey map

Self-serve developer to enterprise expansion path.

[CU001, CU002, CU003, CU004, CU005, CU006]
FU002: Adoption / deployment funnel

Stage-by-stage developer-to-enterprise conversion.

Awareness, active-paid, and multi-year counts are illustrative placeholders; only signup and named counts are sourced.

[CU001, CU002, CU011, CU012, CU013, CU014]

6.2 Named-customer proof and durability

Named-customer proof spans seven public case studies (Salesforce, Zoom, Pika, Arcee, Nous Research, Cartesia, Washington University) plus the GTC 2025 Pioneers cohort and the Adaption healthcare partnership. Each case study documents the customer's workflow, model used, and qualitative outcome; quantitative outcomes (throughput, latency, cost, ROI) are documented for some but not all deployments. The most-cited outcomes are FlashAttention-driven latency reduction (Pika, Cartesia), cost reduction versus closed APIs (Arcee, Nous), and integration depth (Salesforce, Zoom). Production vs pilot is explicit for Salesforce, Zoom, Pika, Cartesia (production); Adaption is described as a launching partnership rather than a confirmed production deployment. Adverse and durability signals are mixed: G2 and Trustpilot review counts are low, limiting independent retention proxy; Reddit and Hacker News threads occasionally cite latency or cold-start concerns on serverless tier; no public churn announcement or terminated-customer report has been published. The customer proof matrix below tags each named customer with evidence quality, outcome specificity, retention visibility, and production maturity. Retention and repeat-usage primitives (NRR, GRR, gross retention) are not disclosed, and the chapter records that gap as a material evidence gap with a concrete diligence ask. Reference-quality and freshness are best for the 2024-2025 case studies (Salesforce, Zoom, Pika) and weaker for older case studies that have not been updated in 2026.[CU012, CU013, CU014, CU015, CU016, CU017]

Named customer proof table
CustomerSegmentDeployment / use caseProduction vs pilotOutcomeLimitation
SalesforceEnterprise SaaSCo-sell + embedded inferenceproductionIntegration depth + Series B leadContract value not disclosed
ZoomEnterprise SaaSAI feature inferenceproductionLatency improvementSpecific metrics not public
PikaStartup (video)Video model servingproductionLatency reduction via FA-class kernelsCost benefit qualitative
CartesiaStartup (voice)Voice model servingproductionThroughput on dedicatedPricing not disclosed
ArceeStartup (open-source)Model merging + inferenceproductionCost vs closed APIsVolume not disclosed
Nous ResearchOpen-source communityCommunity model hostingproductionOpen-weight neutralityRevenue mix not disclosed
Washington UniversityAcademiaResearch computeproductionResearch throughputSpend size not disclosed
AdaptionHealthcareRegulated workflowlaunchingHealthcare entryProduction status TBD
GTC 2025 Pioneers cohortEnterprise mixVariousproductionNVIDIA + Together jointCohort not fully enumerated

Rows reflect publicly named customers with case-study or press evidence; private named customers (if any) are not in this table.

[CU012, CU013, CU014, CU015, CU016, CU017]
Retention / repeat usage / satisfaction table
MetricValue/nullSegmentConfidenceDiligence ask
NRRnullenterpriselowRequest NRR by cohort
GRRnullenterpriselowRequest gross retention by cohort
Logo churnnullenterpriselowRequest named-account churn list
Active developers (paid)nullself-servelowRequest paid-developer count
Repeat purchase ratenullself-servelowRequest cohort repeat rate
G2 average ratingvery small Nself-servelowCannot extrapolate from small N
Trustpilot average ratingvery small Nself-servelowCannot extrapolate from small N
Reddit/HN sentimentmixed-to-positivecommunitylowAggregate qualitative scan
Named-customer renewalsnullenterpriselowConfirm via reference calls
Dedicated-endpoint renewal ratenullenterpriselowRequest renewal cohort

All retention primitives are null and accompanied by a specific diligence ask.

[CU022, CU023, CU024, CU025, CU026]
FU003: Customer proof matrix

Evidence quality across named customers; rows pivot per-customer evidence axes complementing the named-customer-proof table.

[CU012, CU013, CU014, CU015, CU016, CU017]

6.3 Expansion, concentration, and adverse signals

Expansion proxies are mostly qualitative. The Salesforce Ventures co-sell relationship is the principal enterprise expansion lever, with the Series B led by Salesforce Ventures interpreted by the market as a multi-year channel commitment; NVIDIA GTC 2025 Pioneers and the Startup Accelerator add brand and pipeline. The HuggingFace partnership funnels developers from the model hub into Together. Concentration risk is impossible to bound precisely without management disclosure, but the public customer mix is skewed toward AI-native startups and developer-tools companies rather than a small number of mega-enterprise contracts, which suggests broader top-of-funnel diversification than e.g. an OpenAI-style anchor-customer model. Channel and procurement friction is documented on the dedicated tier: enterprise sales cycles require sales engagement, custom MSAs, and security review, which adds 60-120 days before revenue. Adverse signals include scattered Reddit and Hacker News threads citing latency, cold-start, or occasional reliability events on the serverless tier; the company maintains a public status page but does not publish an SLA percentage. No public lawsuit, lost-customer report, or named-account churn has surfaced through the runDate. The expansion-and-concentration table below records each expansion driver, concentration risk, impact magnitude, and the precise diligence path required to close the residual gap; the chapter's retention table treats every undisclosed primitive as a diligence ask rather than asserting a number that cannot be sourced. Overall the customer evidence base is consistent with a growth-stage inference platform building real enterprise traction on top of a strong self-serve developer flywheel.[CU027, CU028, CU029, CU030, CU031, CU032]

Expansion and concentration risk table
Expansion driverConcentration riskImpactDiligence path
Salesforce Ventures co-sellSalesforce concentration in enterprise winshighQuantify pipeline % from Salesforce
NVIDIA GTC PioneersNVIDIA referral concentrationmediumQuantify GTC-sourced ACV
Startup AcceleratorLong-tail dilution risklowTrack cohort revenue conversion
HuggingFace partnershipHF dependence for funnelmediumConfirm cross-promote terms
Self-serve developer growthLong-tail churn risklowCohort retention by month
Adaption healthcare entrySingle named partner riskmediumTrack follow-on healthcare wins
Sovereign / Prosperity7 channelSovereign concentration if materialisesmediumConfirm pipeline commits
Open-source communityBrand dependence on OSS pulllowTrack GH/HF/PyPI signal stability
Top-10 customer concentrationMaterial if undisclosedhighRequest top-10 anonymised
Geographic concentrationNA-heavymediumRequest regional revenue split

Expansion drivers and concentration risks ranked qualitatively in absence of disclosed customer-revenue breakdown.

[CU027, CU028, CU029, CU030, CU031, CU032]
FU004: Retention / repeat cohort

Time-series retention placeholder using sector-typical PLG SaaS proxy values; all numbers illustrative pending Together disclosure.

All retention cells are illustrative sector benchmarks (PLG SaaS / inference); Together has not disclosed actual cohort retention.

[CU022, CU023, CU024, CU025, CU026]

6.4 Exhibits

Chapter 07

07Risks

7.1 Regulatory and legal risk surface

Together AI faces the same Generative-AI regulatory perimeter as all foundation-model platforms operating in the United States and Europe. In the US, the FTC opened a 6(b) study into generative-AI investments and partnerships in 2024 and has signalled broad antitrust scrutiny of cloud-AI relationships; the Biden / Trump-era Executive Order on AI established a foundation for federal AI standards that the NIST AI Risk Management Framework operationalises. The BIS has tightened export controls on advanced GPUs (A100, H100, H200, B200) and on the export of certain foundation-model weights, directly relevant to a GPU-cloud operator. In the EU, the AI Act entered into force in 2024 with phased obligations for general-purpose AI providers culminating in 2026-2027; the UK ICO and Australian OAIC have published GenAI guidance that creates de-facto compliance baselines. Privacy regimes (CCPA in California, HIPAA for healthcare workloads) impose contract-level obligations Together discharges via BAAs and SOC 2 controls referenced on its trust center. On the litigation side, the NYT v Microsoft/OpenAI docket, Authors Guild v OpenAI, and Getty v Stability AI are the bellwether copyright cases whose outcomes will shape exposure for every model-hosting platform; Together itself is not currently a named defendant but its open-model hosting business carries adjacent exposure if precedent extends to platform-as-host. Civil-society pressure (CDT, EFF) adds reputational risk. The regulatory and legal risk register below ranks each line item by jurisdiction, likelihood, severity, mitigation, and residual exposure, with diligence asks for every undisclosed control.[CR001, CR002, CR003, CR004, CR005, CR006]

Regulatory / legal risk register
Rule / caseJurisdictionStatusLikelihoodSeverityMitigationResidual exposure
FTC 6(b) generative-AI inquiryUSongoinghighmediumengage counsel, monitorpossible behavioural remedies
FTC general AI enforcementUSactivemediummediumstandard advertising/competition complianceenforcement action
EU AI Act (GPAI)EUphased 2024-27highhighGPAI obligations, transparency, copyright opt-outnon-compliance penalties up to 7% revenue
BIS export controls (GPUs + weights)US/globaltightened 2025highhighgeo-fence customers, screeningblocked sovereign deployments
NIST AI RMFUSvoluntarymediumlowadopt framework controlsprocurement disadvantage if absent
UK ICO GenAI guidanceUKactivemediummediumUK DPA + GDPR postureenforcement exposure
Australia OAIC GenAI guideAUactivelowlowadopt guidanceenforcement exposure
White House EO on AIUSactivemediummediumreporting thresholdsreporting burden
CCPA (California)US-CAactivemediummediumprivacy controlsenforcement exposure
HIPAA (healthcare workloads)USactivemediumhighBAA, dedicated tierbreach + fines
SOC 2 attestation surfaceglobalself-declared on trust centermediummediumSOC 2 Type II evidenceattestation gap if expired
NYT v Microsoft/OpenAI (copyright)USactive litigationhighmediummonitor; platform-host distinctionprecedent extension risk
Authors Guild v OpenAIUSactive litigationhighmediummonitor; platform-host distinctionprecedent extension risk
Getty v Stability AIUS/UKactive litigationmediummediummonitor; image-model adjacencyprecedent extension risk
CDT AI policy pressureUSactivelowlowengagement, transparencyreputational

Each row reflects the rule/case posture at runDate; ratings are qualitative pending management disclosure.

[CR001, CR002, CR003, CR004, CR005, CR006]
FR001: Risk heatmap

Likelihood × severity heatmap across top risks.

[CR001, CR003, CR004, CR012, CR018, CR021]

7.2 Operational, security, partner, and dependency risk

Operational risk for Together centres on three vectors: GPU capacity availability (Hopper and Blackwell), model-serving reliability, and regulated-workload controls. NVIDIA is the dominant single-vendor dependency — GPUs, networking (NVLink, InfiniBand), and software stack (CUDA, TensorRT, NeMo, Dynamo) — and is also a strategic investor, which both reduces supply-allocation risk and concentrates correlated downside if Blackwell allocation tightens. HuggingFace is the primary model-artefact dependency; partner risk would emerge if HF changes hosting terms or commercial alignment. Salesforce Ventures is the lead enterprise channel partner via the Series B; channel concentration risk is non-trivial. Security exposure spans the standard model-cloud surface (prompt injection, data exfiltration, prompt-logging leakage, supply-chain compromise of model weights) and the SOC 2 / HIPAA control surface that Together publishes via its trust center. The public status page exists but does not publish an SLA percentage. Competitive displacement risk is real: Fireworks, Replicate, Modal, Anyscale, Cerebras, and Groq all serve overlapping workloads, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI) bundle inference into existing enterprise contracts. People and execution risk includes key-person dependency on Vipul Ved Prakash (CEO), Ce Zhang (CTO), and Tri Dao (Chief Scientist), and the build-out velocity required to keep pace with Hopper→Blackwell→Rubin cadence. The operational, partner, and people registers below capture each failure mode, mitigation maturity, and residual exposure with explicit diligence paths.[CR018, CR019, CR020, CR021, CR022, CR023]

Operational / quality / security risk register
Failure modeLikelihoodSeverityMitigation maturityResidual exposureUnresolved gap
Serverless multi-hour outagemediummediumstatus page; no SLA %customer churnSLA disclosure
Dedicated-endpoint hardware failurelowmediumredundancy impliedrevenue at riskreliability metrics
Prompt injection / data exfiltrationmediummediumsafety models, function-calling guardrailscustomer breachpen-test cadence undisclosed
Model-weight supply-chain compromiselowhighHF integrity checksplatform-wide compromiseweight signing process undisclosed
SOC 2 attestation lapselowmediumtrust center publishes postureenterprise-deal blockexpiry date undisclosed
HIPAA BAA breachlowhighBAA availableregulatory finesbreach plan undisclosed
GPU capacity shortfallmediumhighNVIDIA partnershiprevenue capallocation commit undisclosed
Network / inter-zone failurelowmediummulti-region impliedlatency spikeregion map undisclosed
Insider threatlowmediumstandard controlsdata leakaccess controls undisclosed
Software bug introducing regressionmediumlowstaged rollout impliedreputationrelease cadence undisclosed

Operational ratings are qualitative; multiple control primitives are undisclosed and treated as diligence asks.

[CR018, CR019, CR020, CR021, CR022, CR028]
Partner / dependency risk register
DependencyCounterpartyRoleConcentrationFailure scenarioSeverityMitigationResidual exposure
GPU supplyNVIDIAprimary supplier + investorvery highBlackwell allocation cuthighstrategic investor; multi-gen commitrevenue cap
Model artefactsHuggingFaceregistry + distributionhighhosting policy changemediumcompany self-host fallbackdistribution friction
Enterprise channelSalesforceco-sell + investormediumco-sell deprioritisationmediumdirect sales build-outpipeline shrink
Datacenter capacityMultiple (undisclosed)colo + hyperscalermediumsingle-region capacity lossmediummulti-region buildlatency / cost
NetworkMultipletransit + IXlowpeering losslowmulti-carriertransient latency
Open-source communityLlama, Mistral, Qwen, DeepSeek maintainersmodel upstreamsmediumlicense changemediummodel diversitylicensing review burden
Capital partnersGC / Salesforce / NVIDIA / Lux / Coatue / Prosperity7 / Kleinerinvestorsmediumround oversubscription failuremediumrevenue tractionfinancing risk
Sovereign partnersProsperity7 (KSA-adjacent)strategic investorlowgeo-political pressuremediumdisclosure posturereputational

Dependency ratings reflect public concentration only; private contractual commits remain a diligence ask.

[CR023, CR024, CR025, CR026, CR027, CR030]
People / execution risk register
Role / functionDependency or gapLikelihoodSeverityMitigationDiligence path
CEO Vipul Ved Prakashfounder-led; key-person dependencylowhighfounder retentionreference checks
CTO Ce Zhangkey-person dependencylowhighretentionreference checks
Chief Scientist Tri Daokey-person; brand-defininglowhighacademic dual-affiliationretention plan
VP Infra Alon Gavrielovnew hire (2025)lowmediumrecent joinonboarding review
CFOundisclosed at runDatemediummediumrecruiting in progress (inferred)confirm hire
CRO / sales leaderundisclosed at runDatemediummediumenterprise build-outconfirm hire
Engineering benchgrowing post-Series Bmediummediumhiring momentumheadcount disclosure
Compliance / GRCSOC 2 referenced; team size undisclosedmediummediumattestation evidenceteam size confirm
Board compositionGC + SVP + NVIDIA + foundersmediummediumgrowth-stage governanceboard minutes diligence
Hopper→Blackwell→Rubin transition executionmulti-quarter build-outmediumhighpartnership with NVIDIAprogram plan diligence

People register includes both named individuals and undisclosed roles; CFO/CRO confirmations are explicit diligence asks.

[CR032, CR033, CR034, CR035]
FR002: Risk transmission map

How risks flow into revenue, margin, financing, and valuation.

[CR001, CR003, CR004, CR012, CR023, CR024]

7.3 Mitigations, kill criteria, and thesis-break triggers

The mitigation-and-kill-criteria table below pairs every top risk with a monitorable trigger, an explicit threshold or event, and the action implication if the trigger fires. Triggers span regulatory (e.g. EU AI Act GPAI obligation enforcement in 2027), litigation (e.g. adverse copyright ruling that extends to platform hosts), partner (e.g. NVIDIA allocation cut or HuggingFace hosting change), operational (e.g. multi-hour serverless outage, breach disclosure), competitive (e.g. hyperscaler bundled-inference pricing undercut), commercial (e.g. Salesforce co-sell churn), and execution (e.g. founder departure, missed Blackwell go-live). For each trigger the table records the transmission path into revenue, margin, financing, or valuation, and the action implication (kill, re-underwrite, monitor, accept). The chapter is explicit that several primitives — incident count, SLA, top-10 customer concentration, retention, GPU committed-spend, opex split — are undisclosed and therefore treated as diligence asks rather than asserting numbers that cannot be sourced. Adverse-source coverage is wide: regulatory bodies (FTC, BIS, EU, UK ICO, OAIC), legal dockets (CourtListener: NYT, Authors Guild, Getty), competitor websites (Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda), and developer-sentiment fora (Hacker News, Reddit). The chapter underwrites that Together's public risk surface is normal for a growth-stage AI infrastructure company with a healthy mitigation posture, but several control primitives remain unverified pending management disclosure.[CR034, CR035, CR036, CR037, CR038, CR039]

Mitigation and kill criteria table
RiskMonitorable triggerThreshold / eventAction implication
EU AI Act GPAIenforcement noticefirst 7% fine on a peerre-underwrite EU revenue
BIS export tighteningnew entity-list ruleadditional GPU export class addedre-underwrite sovereign pipeline
Copyright litigation extensionplatform-host rulingany host-liability rulingre-underwrite OSS hosting
NVIDIA allocationBlackwell allocation cutpublished cut to a comparable peerre-underwrite capacity ramp
HuggingFace policy changeHF terms updatematerial commercial changebuild self-host
Serverless outagemulti-hour incident>4h or repeated >1hSLA review + customer comms
Security breachdisclosure eventany reportable incidentimmediate re-underwrite
Customer concentrationtop-10 sharesingle customer >25%concentration discount
Founder departurepublic announcementany of CEO/CTO/CSOkill or major re-underwrite
Down-roundnew financingflat or down vs Series Bre-underwrite valuation

Triggers are monitorable from public disclosure; the table is the chapter's actionable kill-criteria contract.

[CR034, CR035, CR036, CR037, CR038, CR039]
FR003: Dependency map

Critical partners, suppliers, regulators, and financing dependencies.

[CR023, CR024, CR025, CR026, CR027, CR030]

7.4 Exhibits

Chapter 08

08Valuation

8.1 Recommendation, thesis, and anti-thesis

The recommendation is Hold/Monitor with medium confidence and a medium-high risk rating. Investment thesis: Together AI sits at a structurally attractive intersection of (a) the GenAI inference market expanding 40-60% CAGR per analyst-market-data sources (Gartner, Forrester, IDC, a16z, Bessemer, Menlo), (b) a credible technical moat through FlashAttention authorship (Tri Dao), ThunderKittens kernels (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation, and (c) an enterprise distribution channel anchored by Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, and the Startup Accelerator funnel. Anti-thesis: the inference layer is contested by Fireworks, Replicate, Modal, Anyscale, Cerebras, Groq, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI Service) who bundle inference into existing enterprise contracts; revenue (reported $130M-$200M+ ARR per The Information) and retention primitives remain undisclosed; valuation at the Series B mark (~$3.3B-$3.5B) requires multi-year revenue scale to underwrite a 3-5x exit; and the regulatory perimeter (EU AI Act, BIS, copyright litigation precedent) is tightening through 2027. The valuation chapter records each of these as an explicit thesis-break trigger and pairs it with a monitorable threshold and an action implication. The recommendation summary table below pairs recommendation, confidence, risk rating, valuation stance, and decision implication; the thesis / anti-thesis table records the underlying arguments and what would change the view.[CV001, CV002, CV003, CV004, CV005, CV006]

Recommendation summary table
RecommendationConfidenceRisk ratingValuation stanceDecision implication
Hold / Monitormediummedium-highat-or-near current Series B markTrack ARR + NRR + concentration; revisit at Series C
Buy (conditional)mediummedium25% correction OR confirmed >$500M ARREnter on confirmed traction or down-round
Pass (conditional)mediumhighif hyperscaler pricing cut >40% OR NVIDIA allocation cut OR breachExit / decline if bear trigger fires
Bull caselowmedium>$8B exit by 2028Strategic-acquisition or premium IPO path
Base casemediummedium$4B-$6B exit by 2028ARR scale + margin expansion
Bear casemediumhigh$1B-$2.5B outcomeDown-round / compressed exit

Recommendation is conditional on the trigger thresholds in the thesis-break table.

[CV001, CV002, CV003, CV004, CV005]
Thesis / anti-thesis table
ArgumentWhat would change the view
GenAI inference TAM growing 40-60% CAGR per analyst sourcesTAM revisions <20% CAGR
FlashAttention + ThunderKittens + TIE v2 form a credible technical moatOpen-source / hyperscaler kernel parity erodes Together edge
Salesforce Ventures-led Series B implies multi-year channel commitmentSalesforce co-sell deprioritisation or churn
NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipelineNVIDIA reallocation to direct-managed offerings (DGX Cloud)
Open-source neutrality is a defensible positioning vs closed-API providersMajor OSS license changes (Llama, Mistral, Qwen, DeepSeek)
Documented enterprise + startup proof base (Salesforce, Zoom, Pika, Cartesia, Arcee)Named-customer churn or production downgrade
Capital base + brand attract talent and customersDown-round or failed Series C
Anti: hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure) compresses pricingHyperscaler retreats from bundled inference
Anti: GenAI copyright litigation could extend to platform hostsAdverse precedent contained to model-trainer defendants
Anti: revenue + retention undisclosed; price-sensitive entry discipline requiredManagement discloses ARR + NRR

Thesis and anti-thesis are symmetric; the chapter is explicit on what evidence would flip the view.

[CV006, CV007, CV008, CV009, CV010, CV011]
FV001: Recommendation logic

Chain from scale, proof, risks, and valuation to the recommendation.

[CV001, CV002, CV003, CV004, CV005, CV006]

8.2 Scenarios, comparables, and sensitivity

Three scenarios anchor the valuation. Base case ($4B-$6B exit, ~50% probability) assumes ARR scales from current $130M-$200M to $500M-$700M over 2026-2028 with sustained gross margin in the 30-40% range typical of AI inference, modest dilution at a Series C, and Hopper→Blackwell capacity ramp on time. Bull case ($8B-$12B exit, ~25% probability) requires ARR >$1B by 2028, gross margin expansion through FlashAttention-driven utilisation, sustained Salesforce + NVIDIA channel commitment, and either a strategic acquisition (NVIDIA, hyperscaler, Salesforce) or a 2027-2028 IPO at premium multiples. Bear case ($1B-$2.5B outcome, ~25% probability) materialises if hyperscaler bundled inference compresses pricing, NVIDIA allocation tightens, or copyright precedent extends to platform hosts. The comparable-valuation table covers CoreWeave (post-IPO GPU-cloud comparable), Navan (recent S-1 SaaS comparable), Figma (S-1 comparable), and private rounds (Fireworks rumoured $4B, Replicate, Modal, Sakana, Mistral, Anthropic) plus public listings (NVIDIA, Snowflake) used as ceiling references. Sensitivity drivers are revenue growth, gross margin, NRR, exit multiple, and probability-weighted exit window. The bull/base/bear table and comparable-valuation table below capture each scenario's assumptions, valuation logic, and key sensitivity. The valuation-sensitivity bar figure and the valuation-range figure show downside, base, and upside vs the current Series B mark.[CV018, CV019, CV020, CV021, CV022, CV023]

Bull / base / bear scenario table
ScenarioProbabilityARR assumptionGross marginExit multipleValuation/return logicKey risks
Bull25%>$1B ARR by 202840-50%12-15x ARR$8B-$12B exit; strategic / premium IPOHyperscaler bundling; NVIDIA reallocation
Base50%$500M-$700M ARR by 202830-40%8-10x ARR$4B-$6B exit; trade-sale or IPOCompetitive pricing; retention drift
Bear25%$200M-$300M ARR by 202820-30%5-7x ARR$1B-$2.5B outcome; down-roundHyperscaler price war; copyright precedent; NVIDIA allocation cut

Probabilities are subjective and chapter-internal; each row should be re-marked at Series C and at every major customer or regulatory event.

[CV015, CV016, CV017, CV018, CV019, CV020]
Comparable valuation table
ComparableMetricMultiple / valuation / statusRelevanceLimitation
CoreWeave (post-IPO, GPU-cloud)EV / next-12-month revenue8-12x at post-IPOGPU-cloud closest comparableCoreWeave revenue mix is GPU-bare-metal heavier
Navan (S-1, SaaS)EV / NTM revenue8-12x at filingGrowth-stage SaaS comparableSaaS, not inference
Figma (S-1, SaaS)EV / NTM revenue12-15x at filingHigh-multiple SaaS comparableDesign SaaS, not inference
Fireworks AI (rumoured 2024 round)last private round~$4B (rumoured)Direct inference comparableRound value rumoured
Replicate (private)last private roundundisclosedDirect inference comparableLimited disclosure
Modal (private)last private roundundisclosedServerless inference comparableLimited disclosure
Anyscale (private)last private round$1B-$2BRay + inference comparableDifferent positioning
Sakana AI (round)last private round~$1.5B (Aug 2024)OSS model-builder comparableModel lab not infra
Mistral (round)last private round$6B (mid-2024)OSS model lab comparableHybrid model + infra
Anthropic (round)last private round$60B+ (2025)Closed-API comparableDifferent model — not direct
NVIDIA (public)EV / NTM revenuehigh-teens to mid-20sCeiling referenceFar larger scale
Snowflake (public)EV / NTM revenue10-15xSaaS ceiling referenceMature SaaS

Comparable rows mix public and private valuations; private-round figures are taken from press reports and PitchBook.

[CV021, CV022, CV023, CV024, CV025, CV026]
FV002: Valuation sensitivity

Sensitivity of valuation outcome to revenue, margin, multiple, retention.

[CV018, CV019, CV020, CV021]
FV003: Valuation / return range

Low / base / high valuation range across scenarios at 2028 exit window.

[CV022, CV023, CV024, CV025, CV026, CV029]

8.3 Thesis-break triggers, diligence asks, and KPIs

The thesis-break and kill-triggers table converts the chapter's risk and valuation logic into monitorable triggers tied to specific events: (a) revenue miss vs $500M-$700M ARR run-rate by 2027-2028 → re-underwrite base case, (b) Salesforce co-sell deprioritisation → kill bull case, (c) NVIDIA Blackwell allocation cut → re-underwrite capacity ramp, (d) hyperscaler bundled inference price cut >40% → compression, (e) any platform-host copyright ruling → re-underwrite OSS hosting, (f) Series C at flat/down vs Series B → mark-to-market valuation, (g) founder departure → kill thesis, (h) breach disclosure or multi-hour outage → SLA + reputation re-underwrite. The final diligence asks table records the remaining missing primitives — exact ARR, NRR/GRR, top-10 customer concentration, GPU committed spend, opex split, CFO/CRO hires, sovereign-channel posture, paid-developer count — and maps each to an owner or diligence path. The investment-KPI figure consolidates IC-ready scoring across market, proof, moat, economics, risk, valuation, and evidence quality on a 0-100 scale. The chapter is explicit that the recommendation is price-sensitive and evidence-sensitive: at a Series B valuation in the $3.3B-$3.5B range with the disclosed evidence base, Hold/Monitor is the disciplined answer — Buy at a 25%+ correction or with confirmed >$500M ARR + >120% NRR; Pass if any of the bear-case triggers fires before the Series C.[CV034, CV035, CV036, CV037, CV038, CV039]

Thesis-break and kill triggers table
TriggerThresholdTransmission to thesisAction implication
ARR run-rate vs base case<$500M ARR by FY2027revenue mark-downre-underwrite base
Salesforce co-sellpublic deprioritisationchannel mark-downkill bull
NVIDIA allocationpublished cut to peercapacity mark-downre-underwrite capacity
Hyperscaler bundled pricing>40% cut on AWS Bedrock or peermargin compressionre-underwrite base
Copyright precedentplatform-host rulingOSS hosting mark-downre-underwrite OSS revenue
FinancingSeries C flat or down vs Series Bmark-to-marketre-underwrite valuation
Founder departureany of CEO/CTO/CSOexecution mark-downkill thesis
Security / outagebreach disclosure OR multi-hour outagereputation + SLAre-underwrite enterprise pipeline

Trigger thresholds are monitorable from public disclosure or peer comparables.

[CV033, CV034, CV035, CV036, CV037, CV038]
Final diligence asks table
TopicMissing evidenceWhy it mattersOwner / diligence path
Revenueexact ARR at runDatebase/bull scenario underwritingrequest management ARR + growth
RetentionNRR / GRR / cohort retentionquality of revenuerequest retention by cohort
Concentrationtop-10 customer sharesingle-event downsiderequest anonymised top-10
GPU commitcommitted spend with NVIDIAmargin underwritingrequest supplier commit
Opex splitR&D / S&M / G&Aburn underwritingrequest income-statement split
CFO / CROpresence + tenureexecution underwritingconfirm hires
Sovereign channelProsperity7 commitgeo + brand riskconfirm channel posture
Paid-developer countpaid vs free splitself-serve revenue underwritingrequest paid-developer count
SOC 2 expiryType II expiry dateenterprise procurementrequest attestation refresh
Open license postureOSS hosting policycopyright exposurerequest hosting policy

All diligence asks map to chapter-internal questions and to the risks chapter mitigation table.

[CV040, CV041, CV042, CV043, CV044]
FV004: Investment KPIs

IC-ready scoring across market, proof, moat, economics, risk, valuation, evidence.

[CV040, CV041, CV042, CV043, CV044]

8.4 Exhibits

Disclaimer

This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.

Evidence index

Claims
IDStatementConfidenceSources
CO001 Together AI markets itself as "the AI acceleration cloud" offering training, fine-tuning, and inference for open-source and custom models. High SO001, SO002
CO002 The corporate entity is Together Computer Inc., headquartered in San Francisco, California, with an additional research presence in Zurich. High SO002, SO004, SO003
CO003 Together was incorporated on 27 June 2022 by four co-founders: Vipul Ved Prakash, Ce Zhang, Chris Ré, and Percy Liang. High SO002, SO018
CO004 The company's public surface positions three product lines: serverless inference API, dedicated endpoints, and fine-tuning/training services. High SO001, SO035
CO005 Together emphasises that customers can keep weights and choose dedicated capacity, a deliberate contrast with closed-API providers. Medium SO001, SO005
CO006 CEO Vipul Ved Prakash previously co-founded Topsy, which Apple acquired for approximately $200M in 2013, and earlier co-founded Cloudmark. High SO018, SO002
CO007 CTO Ce Zhang is a tenured professor at ETH Zürich specialising in distributed ML and data-centric ML research. High SO002, SO018
CO008 Chief Scientist Chris Ré is a MacArthur Fellow at Stanford and a co-founder of Snorkel, anchoring much of Together's open-source research lineage. High SO002, SO011
CO009 Co-founder Percy Liang directs the Stanford Center for Research on Foundation Models (CRFM) and leads the HELM benchmark. High SO002, SO018
CO010 Princeton CS faculty member Tri Dao is the principal author of FlashAttention and is publicly identified as a Together chief scientist. High SO002, SO009, SO036
CO011 Together actively recruits across kernel engineering, GPU systems, applied ML, sales, and revenue operations roles as of May 2026. High SO003, SO018
CO012 Together raised a $20M Series Seed in May 2023 led by Lux Capital, with Factory, SciFi Capital, and Long Journey Ventures participating. High SO018, SO012
CO013 A $102.5M Series A closed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. High SO006, SO014, SO018
CO014 An interim financing in March 2024 reportedly valued Together at approximately $1.25B. Medium SO015, SO018
CO015 Together closed a $305M Series B on 9 July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation. High SO012, SO013, SO016, SO017
CO016 Cumulative disclosed primary capital totals approximately $533M (seed + A + interim + B) before any 2025–2026 extensions. Medium SO012, SO006, SO018
CO017 No Together AI registration, S-1, or other public filing appears on SEC EDGAR as of the May 2026 run date. High SO027, SO019
CO018 NVIDIA participated as a strategic investor in both Series A and Series B financings, signalling H100/H200 supply alignment. Medium SO026, SO006, SO012
CO019 CNBC reported Together AI was running at an approximately $100M annualised revenue pace around the Series B announcement in July 2024. Medium SO012
CO020 Bloomberg cited triple-digit year-over-year revenue growth for Together AI at the time of the Series B, without disclosing absolute figures. Medium SO013
CO021 Together has publicly stated it operates more than 20,000 NVIDIA Hopper-class GPUs across its multi-region cluster. Medium SO012, SO005
CO022 The company describes its developer footprint as "hundreds of thousands" of developers, without disclosing paid versus free split. Low SO001, SO005
CO023 Together's public job board and LinkedIn footprint imply a headcount above 150 full-time staff globally as of May 2026. Low SO003, SO018
CO024 No audited gross margin, net revenue retention, or paid-customer disclosure exists for Together AI as of the run date. High SO027, SO019
CO025 Together AI launched OpenChatKit in March 2023 with LAION and Ontocord, an early open-source instruction-tuned chat baseline. High SO008, SO030
CO026 The RedPajama 1T token open dataset was released on 17 April 2023, intended to reproduce LLaMA-grade pretraining data. High SO007, SO029
CO027 FlashAttention-3 was published on arXiv and Together's blog on 11 July 2024, claiming state-of-the-art H100 attention performance. High SO036, SO009
CO028 StripedHyena-Nous-7B, a non-attention long-context architecture, was released in December 2023 in collaboration with Nous Research. High SO031, SO034
CO029 Together's Mixture-of-Agents paper, published in June 2024, demonstrated multi-LLM ensembling improvements on AlpacaEval. High SO037, SO011
CO030 Together publishes an active GitHub organisation (togethercomputer) with multiple ten-thousand-star repositories including OpenChatKit and RedPajama-Data. High SO028, SO029, SO030
CO031 The HuggingFace organisation togethercomputer hosts the RedPajama datasets and StripedHyena, Pythia, LLaMA-32k, and m2-bert models. High SO033, SO011
CO032 No public regulatory action, litigation, recall, or executive departure involving Together AI has been reported as of May 2026. Medium SO018, SO019, SO027
CO033 Together AI is described as one of the most followed open-source-AI infrastructure accounts on Hacker News and X. Low SO020, SO024, SO021
CO034 Salesforce Ventures publicly framed the Series B as enabling enterprise customers to deploy open models on Together's cloud. Medium SO025, SO012
CO035 Crunchbase's Together AI profile is paywalled and could not be independently verified for cap-table details at runDate. Medium SO019
CO036 Cover-metric "gaps" remain for ARR, gross margin, NRR, and paid-customer count; all are flagged as diligence asks for management. Medium SO027, SO019, SO012
CM001 Together AI competes in the AI compute and inference platform layer between hyperscaler GPU IaaS and closed-API model labs. High SM001, SM004, SM023
CM002 Together's addressable spend pool excludes general-purpose cloud compute and closed-only proprietary model APIs. Medium SM001, SM002
CM003 Status-quo substitutes for Together include self-hosted Kubernetes-on-GPU clusters and OpenAI/Anthropic closed APIs. Medium SM011, SM012
CM004 Specialised GPU clouds (CoreWeave, Lambda) compete on infrastructure but lack Together's open-source-model SaaS layer. Medium SM013, SM014
CM005 Inference-API providers (Replicate, Fireworks, Groq, Modal) compete directly at the per-token serverless layer. High SM015, SM019, SM018, SM016
CM006 AWS Bedrock and Google Vertex AI offer hosted open-model inference that overlaps Together's serverless product. High SM011, SM012
CM007 Gartner sizes 2024 AI infrastructure TAM at $40–60B with a 30–50% CAGR through 2028. Medium SM021
CM008 IDC-style analyst notes peg 2024 global AI infrastructure spend near $50B. Low SM021, SM022
CM009 Triangulated inference + dedicated GPU SAM for 2026 lands in an $8–15B range. Medium SM021, SM024, SM022
CM010 Together-addressable SOM (channels + open-model demand) is on the order of $1–3B in 2026. Low SM024, SM027
CM011 CNBC reported a ~$100M Together ARR at the July 2024 Series B, implying mid-single-digit SOM share. Medium SM024, SM025
CM012 NVIDIA disclosed >$30B quarterly data-centre revenue in early 2025, evidence that AI-compute spend dwarfs Together's ARR. High SM028, SM022
CM013 No single public source cleanly disaggregates inference spend from training capex, creating range uncertainty. Medium SM021, SM028, SM022
CM014 AI-native startups and model labs are Together's most active early buyers, choosing it for open-weight flexibility and dedicated GPU access. Medium SM003, SM032
CM015 F500 enterprise platform teams are an emerging segment, anchored by Salesforce Ventures Series B leadership. Medium SM027, SM024
CM016 Sovereign and regional cloud customers are a strategic third segment, signalled by Prosperity7 (Aramco) investor presence. Low SM024, SM023
CM017 Within Together, users (developers) frequently differ from payers (procurement/finance), lengthening enterprise sales cycles. Low SM027, SM004
CM018 Self-serve credit-card adoption is the primary land motion for AI-native startup customers on Together. Medium SM002, SM008
CM019 Together's NVIDIA GTC 2025 spotlight emphasised "AI pioneers" as case-study customers, validating the enterprise wedge. Medium SM033, SM028
CM020 Together's AI-Native conference (2025) was framed as a developer community event, reinforcing top-of-funnel demand generation. Medium SM005, SM030
CM021 Open-weight model proliferation (Llama 3/4, DeepSeek, Mistral, Qwen) keeps SAM growth above 35% CAGR through 2027. Medium SM022, SM021, SM029
CM022 NVIDIA Hopper and Blackwell GPU scarcity drives demand for Together's reserved capacity SKUs. Medium SM028, SM013
CM023 Closed-API price cuts from OpenAI compress per-token margins across the inference market. Low SM002, SM030
CM024 Hyperscaler open-model commoditisation (AWS Bedrock, GCP Vertex Model Garden) threatens to erode Together's pure-inference SAM. Medium SM011, SM012
CM025 Sovereign data residency rules accelerate demand for in-region dedicated clusters but cap cross-border ARR. Low SM004, SM023
CM026 Energy and data-centre permitting bottlenecks slow capacity expansion through 2028. Low SM013, SM028
CM027 Agentic AI workloads (Mixture-of-Agents, multi-step reasoning) multiply per-user token volume. Medium SM004, SM005
CM028 FinOps pressure pushes enterprises to substitute open-weight inference for closed-API spend. Low SM002, SM027
CM029 Together announces serverless, dedicated, and batch inference SKUs to capture different buyer demand curves. High SM002, SM008, SM009, SM010
CM030 Batch inference pricing updates in 2025 reduced per-million-token costs to attract high-volume customers. Medium SM006, SM010
CM031 Specialised GPU clouds CoreWeave and Lambda compete on raw GPU-hour pricing; Together overlays an inference SaaS layer. Medium SM013, SM014
CM032 Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. High SM018, SM020
CM033 Modal, Replicate, and Anyscale compete in serverless and Ray-based AI compute SaaS. Medium SM016, SM015, SM017
CM034 Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. Medium SM019, SM030
CM035 Public-cloud earnings (AWS, GCP) describe AI workloads as the fastest-growing portion of cloud revenue. Medium SM011, SM012
CM036 Reddit r/LocalLLaMA and Hacker News discussion volume around Together has risen steadily through 2024–2026. Low SM030, SM029, SM031
CP001 Together competes against AWS Bedrock and Google Vertex Model Garden on hosted open-weight model inference. High SP018, SP019, SP001
CP002 Specialised GPU clouds CoreWeave and Lambda compete with Together at the IaaS layer for reserved GPU capacity. High SP020, SP021
CP003 Fireworks, Replicate, Modal, and Anyscale provide direct substitutes at the per-token serverless inference layer. Medium SP026, SP022, SP023, SP024
CP004 Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. High SP025, SP027
CP005 OpenAI and Anthropic act as substitutes for closed-API customers willing to give up weight portability. Medium SP018, SP036
CP006 TensorWave provides AMD MI300X GPU capacity as a niche alternative for cost-sensitive teams. Low SP028
CP007 Self-hosted Kubernetes-on-GPU is the status-quo alternative most cited by frontier labs and FAANG. Low SP036, SP037
CP008 Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. Medium SP026, SP036, SP037
CP009 Together leads on FlashAttention kernel performance, anchored by the FlashAttention-3 paper and Together engineering team. High SP031, SP005, SP029, SP030
CP010 FlashAttention-4 was released in 2025 and extends Together's kernel lead on Hopper GPUs. Medium SP006
CP011 AWS Bedrock and GCP Vertex lead on enterprise compliance breadth (BAA, FedRAMP, regional residency). High SP018, SP019
CP012 Groq leads on single-stream inference latency on its supported models but lags in model coverage. Medium SP025, SP036
CP013 Fireworks AI provides an OpenAI-compatible API and serves the same open-model catalog as Together. High SP026, SP015
CP014 Together's serverless Llama-70B is listed near $0.88 per million tokens, within the OpenAI-parity envelope. High SP002, SP011
CP015 Together batch inference offers up to 50% discount versus serverless rates as of the 2025 update. Medium SP013
CP016 AWS Bedrock charges $0.99/M output tokens for Llama 3 70B in 2026 list pricing. Medium SP018
CP017 GCP Vertex Llama 3 70B is priced near $0.99/M tokens with volume discounts. Medium SP019
CP018 Groq lists Llama 3 70B at ~$0.59/M tokens, undercutting Together on raw price while constraining model choice. Medium SP025
CP019 CoreWeave and Lambda charge $2–4 per H100-hour for reserved or on-demand GPUs. Medium SP020, SP021
CP020 Together fine-tuning API, batch SKU, and dedicated endpoints differentiate it from raw-GPU competitors. High SP012, SP013, SP011
CP021 Together's open-source research lineage (RedPajama, StripedHyena, MoA, FlashAttention) sustains community gravity that competitors struggle to match. High SP031, SP034, SP004
CP022 Tri Dao and Chris Ré anchor Together's kernel and architecture research velocity. High SP031, SP005, SP008
CP023 NVIDIA's participation in Series A and Series B is read by the market as a GPU supply alignment moat. Medium SP041
CP024 Salesforce Ventures Series B leadership opens an enterprise distribution channel competitors lack. Medium SP004, SP003
CP025 Together advertises dedicated endpoints and reserved capacity SKUs that raise customer switching cost. High SP012, SP002
CP026 Hyperscalers (AWS, GCP) own enterprise procurement and identity, which is a distribution disadvantage Together must compensate for. Medium SP018, SP019
CP027 Enterprise multi-homing across Together / Fireworks / Bedrock is the reported equilibrium in 2026 buyer surveys. Low SP036, SP037
CP028 Open-weight neutrality is a counter-positioning advantage versus closed-only OpenAI and Anthropic substitutes. Medium SP001, SP002
CP029 Together publishes an OpenAI-compatible chat completions endpoint, simplifying migration from closed APIs. High SP015, SP016
CP030 CoreWeave's 2024 IPO disclosures reveal $1B+ revenue scale, implying meaningful capital advantage at the IaaS layer. Medium SP020, SP036
CP031 Lambda Labs raised a $320M Series C in 2024 to expand its H100/H200 fleet. Medium SP021
CP032 Groq and Cerebras have each raised more than $1B in 2024–2025 to fund bespoke silicon expansion. Medium SP025, SP027
CP033 AWS Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity inference workloads. Medium SP018
CP034 Specialised silicon vendors (Groq, Cerebras, SambaNova) pose a latency-leapfrog risk that pure-software inference cannot fully match. Medium SP025, SP027
CP035 Together's Python SDK and PyPI download trajectory signal sustained developer pull comparable to peers. Medium SP042, SP043
CP036 Speculative-decoding and Medusa-class research feed Together's ability to close any Groq latency gap on shared models. Medium SP032, SP033
CI001 Together AI raised a $20M Seed in May 2023 led by Lux Capital. High SI008, SI018, SI019
CI002 Together AI raised a $102.5M Series A in November 2023 led by Kleiner Perkins. High SI005, SI015, SI018
CI003 In March 2024 Together added approximately $106M at a reported $1.25B valuation (Series A2). Medium SI016, SI007, SI014
CI004 Per the canonical company-overview claim, the Series B closed July 2024 at ~$3.3B post led by Salesforce Ventures and Coatue (financials chapter relies on that fact for capital-stack analysis). High SI011, SI012, SI013, SI006
CI005 NVIDIA participated in both Series A and Series B as a strategic investor. High SI022, SI006
CI006 Salesforce Ventures led the Series B, opening an enterprise distribution channel. High SI021, SI011, SI006
CI007 Cumulative disclosed primary capital is approximately $533M across Seed, Series A, March 2024 extension, and Series B. High SI011, SI018, SI006
CI008 No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the 2026-05 runDate. High SI020, SI025
CI009 CNBC reported an approximately $100M annualised revenue pace around the July 2024 Series B announcement. Medium SI011, SI012
CI010 Bloomberg reported triple-digit revenue growth around the July 2024 Series B. Medium SI013, SI014
CI011 Together has not published audited ARR, gross margin, or NRR figures as of the runDate. High SI020, SI001, SI002
CI012 Together publishes per-token list pricing on its public pricing page for serverless inference. High SI002, SI001
CI013 Together offers a 50% batch inference discount as of the 2025 batch pricing update. Medium SI009, SI002
CI014 Dedicated endpoint and reserved-capacity pricing is quoted via sales rather than published. High SI002, SI004
CI015 Together SKUs span serverless, dedicated, fine-tuning, batch, embeddings, vision, audio, and image. High SI002, SI001, SI004
CI016 Realised enterprise pricing for Together is not publicly disclosed and is a material diligence gap. Medium SI002, SI038
CI017 The Information has published paywalled coverage of Together AI 2025 revenue trajectory. Low SI026
CI018 PitchBook lists Together AI as later-stage venture with no public 2025 round confirmation. Medium SI025, SI019
CI019 Together has not disclosed gross margin by SKU as of the runDate. High SI020, SI002, SI001
CI020 Together has not disclosed top-10 customer concentration as of the runDate. High SI020, SI003
CI021 Together has not disclosed net dollar retention (NDR) as of the runDate. High SI020, SI003
CI022 Together has not disclosed contracted-revenue (RPO) figures. High SI020, SI001
CI023 Together has not disclosed cash position or runway as of the runDate. High SI020, SI001
CI024 CoreWeave 2024 S-1 disclosures imply GPU-cloud gross margins in the 60-70% range on reserved deals. Medium SI032, SI035
CI025 Together per-token gross margin on serverless is plausibly 40-60% based on competitor analog disclosures. Low SI032, SI036, SI037
CI026 Implied cash burn through 2024 is roughly $300-$500M consistent with GPU buildout and 150+ headcount. Low SI004, SI001, SI018
CI027 With $533M raised and that implied burn, runway likely extends into 2026 without a new round. Low SI006, SI011
CI028 Figma and CoreWeave 2025 IPOs demonstrate the public-market window is open for AI-infrastructure issuers. High SI034, SI032
CI029 Navan 2025 S-1 process is a closer growth-SaaS comparable than CoreWeave for Together. Medium SI033
CI030 Together has not disclosed any debt or vendor-financing facility. Medium SI020, SI004
CI031 Founder and employee ownership post Series B is widely reported as significant but no exact percentages are public. Low SI006, SI018, SI019
CI032 No public secondary or tender offer for Together AI shares has been reported at the runDate. Medium SI020, SI025, SI026
CI033 Forrester and IDC market frames place Together in the growth-stage generative-AI infrastructure segment without naming it top-three. Medium SI027, SI028
CI034 Menlo Ventures and Bessemer 2025 State-of-AI reports frame the inference market as multi-billion-dollar and growing. Medium SI030, SI031, SI029
CI035 No public 2026 follow-on round, IPO filing, or M&A announcement involving Together has been confirmed at the runDate. High SI020, SI025, SI026, SI011
CI036 Together pricing-page revisions in 2025 added batch and dedicated SKU clarifications, signalling product and financial maturation. Medium SI009, SI002, SI004
CI037 Public disclosure across ten standard financial primitives is missing or partial, qualifying as a material diligence gap. High SI020, SI001, SI002, SI003
CE001 Together AI exposes serverless inference, dedicated endpoints, fine-tuning, batch, embeddings, vision, audio, and image APIs. High SE016, SE018, SE001, SE003
CE002 Together AI publishes an OpenAI-compatible chat-completions endpoint to simplify migration. High SE022, SE035
CE003 The Together model catalog spans 200+ open and custom models including Llama, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena. High SE018, SE036, SE045
CE004 Dedicated endpoints offer reserved H100/H200/B200 capacity with BAA available for HIPAA workloads. High SE020, SE003, SE005
CE005 Fine-tuning API supports LoRA and full-parameter training jobs on most supported families. High SE019, SE042
CE006 Batch inference offers up to 50% discount vs serverless as of the 2025 update. Medium SE011, SE021
CE007 Embeddings API offers multiple open embedding models per published reference. High SE024, SE034
CE008 Together publishes vision, audio, and image APIs as documented surfaces. High SE031, SE032, SE033
CE009 SDKs ship in Python (PyPI: together) and TypeScript with raw HTTP fallback. High SE044, SE043, SE017
CE010 Rate-limit documentation distinguishes free, paid, and enterprise tiers. High SE025, SE016
CE011 Together architecture stacks API gateway, model registry, inference scheduler, TIE v2, and GPU pool. High SE016, SE009, SE010
CE012 Together Inference Engine v2 integrates FlashAttention-3/4 and ThunderKittens kernels. High SE010, SE006, SE007, SE008
CE013 Speculative decoding and Medusa decoders are integrated into the inference engine. Medium SE053, SE054, SE055
CE014 Mixture-of-Agents (MoA) provides ensemble inference for higher-quality completions on supported models. Medium SE056, SE012
CE015 FlashAttention-3 paper (arXiv 2407.08608) describes the kernel anchoring Together throughput claims. High SE052, SE006
CE016 FlashAttention-4 was released in August 2025 and extends the kernel lead to Hopper and Blackwell. Medium SE007, SE012
CE017 ThunderKittens kernel framework was released in 2024 by Together and Stanford HazyResearch. High SE008, SE065
CE018 NVIDIA is the primary GPU supplier (Hopper H100/H200, Blackwell B200) and a strategic investor. High SE060, SE014, SE001
CE019 HuggingFace is the primary model artefact partner and hosts Together-published checkpoints. High SE045, SE049
CE020 A status page is published at status.together.ai documenting platform reliability. Medium SE062
CE021 The public SLA percentage for serverless and dedicated tiers is not yet documented at the runDate. Medium SE062, SE025
CE022 Together infrastructure organisation expanded in 2025 with Alon Gavrielov as VP of Infrastructure Strategy. High SE015, SE005
CE023 Trust center publishes SOC 2 Type II attestation references and HIPAA BAA availability. High SE063, SE066, SE067
CE024 HIPAA BAA is available on dedicated endpoints but not serverless tier per documentation. Medium SE063, SE020
CE025 GDPR / DPA terms are available for EU customers per trust center documentation. Medium SE063
CE026 FedRAMP accreditation is not yet listed in the trust center at the runDate. Medium SE063
CE027 The full regional residency map (which regions, which co-lo partners) is not publicly disclosed. Medium SE063, SE020
CE028 ISO 27001 certification status is not publicly confirmed at the runDate. Medium SE063
CE029 Content moderation, function calling, JSON mode, and structured-output safety controls are documented surfaces. High SE028, SE027, SE026
CE030 Audit logs are documented for enterprise customers but not enabled by default. Medium SE063, SE020
CE031 Custom-model-weights privacy controls are documented for dedicated tier. Medium SE020, SE063
CE032 A bug bounty / responsible disclosure programme is published on the trust center. Medium SE063
CE033 GTC 2025 Pioneers event surfaced multiple Together customer + NVIDIA partnerships. High SE014, SE060
CE034 Adaption partnership (2025) extends Together into healthcare workflows. Medium SE005
CE035 AI Native Conference 2025 announced research and product directions including MoA productisation. High SE012, SE005
CE036 Blackwell (B200) capacity ramp is documented as 2026 roadmap item in blog references. Low SE005, SE014
CE037 Multi-modal expansion (vision + audio) is a documented 2026 roadmap area. Low SE005, SE012
CU001 Together AI reports more than 100,000 developers have used the platform per company disclosure. Medium SU004, SU003, SU001
CU002 Self-serve developer signup is the primary top-of-funnel adoption motion for Together AI. High SU038, SU001, SU003
CU003 Together customers page enumerates named startup and enterprise deployments. High SU003, SU001
CU004 AI-native startups (Pika, Cartesia, Arcee, Nous Research) are documented production customers. High SU012, SU015, SU013, SU014, SU003
CU005 Enterprise SaaS deployments at Salesforce and Zoom are documented case studies. High SU010, SU011, SU003
CU006 Washington University is referenced as a research-compute customer in a case study. Medium SU016, SU003
CU007 Adaption (2025) extends Together into healthcare workflows. Medium SU008, SU004
CU008 NVIDIA GTC 2025 Pioneers programme surfaced a cohort of joint Together + NVIDIA customers. High SU007, SU018
CU009 Startup Accelerator launched in November 2024 as an explicit startup-acquisition funnel. High SU006, SU004
CU010 Geographic mix is North America-skewed with EU presence growing through dedicated clusters. Low SU003, SU004, SU001
CU011 Buyer/user split differs by tier: developer-led self-serve vs CIO/platform-eng-led enterprise. Medium SU038, SU003
CU012 Salesforce case study documents integration depth and is treated as production deployment. High SU010, SU017, SU003
CU013 Zoom case study documents AI-feature inference at production scale. High SU011, SU003
CU014 Pika case study cites latency improvement from FlashAttention-class kernels. High SU012, SU003
CU015 Cartesia case study documents voice-model production deployment on dedicated tier. High SU015, SU003
CU016 Arcee case study documents cost reduction relative to closed APIs. Medium SU013, SU003
CU017 Nous Research case study documents community model hosting on Together. Medium SU014, SU003
CU018 Washington University case study documents research-compute usage. Medium SU016, SU003
CU019 Adaption is described as a launching partnership rather than confirmed production deployment. Medium SU008
CU020 GTC 2025 cohort case studies cover developer tools, robotics, healthcare, and content/media. Medium SU007
CU021 HuggingFace partnership funnels developers from the model hub into Together. Medium SU019, SU020
CU022 Net dollar retention (NDR) is not publicly disclosed at the runDate. High SU003, SU001, SU004
CU023 Gross retention (GRR) and named-account churn are not publicly disclosed. High SU003, SU001, SU004
CU024 Paid vs free developer counts are not disclosed. High SU004, SU003
CU025 Dedicated-endpoint renewal rate is not publicly disclosed. High SU004, SU003
CU026 G2 and Trustpilot review counts for Together are small, limiting independent proxies. Medium SU026, SU027
CU027 Salesforce Ventures-led Series B and customer case study together signal a multi-year channel commitment. Medium SU017, SU010, SU004
CU028 GTC 2025 Pioneers cohort acts as an enterprise pipeline amplifier through NVIDIA. Medium SU007, SU018
CU029 Startup Accelerator provides credits and GTM amplification to long-tail AI startups. High SU006, SU004
CU030 Adaption launch indicates a follow-on path into regulated healthcare workflows. Medium SU008
CU031 Enterprise sales cycle requires custom MSA and security review, adding 60-120 days before revenue. Low SU004, SU038
CU032 Top-10 customer concentration is undisclosed and is a material diligence ask. High SU003, SU004
CU033 Public customer mix skews AI-native startups + developer tools rather than a single mega-anchor. Medium SU003, SU006, SU007
CU034 No public lawsuit or named-account churn report has surfaced for Together at the runDate. Medium SU023, SU022, SU004
CU035 Reddit and Hacker News threads occasionally cite latency or cold-start concerns on the serverless tier. Low SU023, SU022
CU036 Public status page exists but no SLA percentage is published for serverless or dedicated tiers. Medium SU042, SU038
CU037 PyPI download trajectory and GitHub repo activity indicate sustained developer pull. Medium SU040, SU041
CR001 FTC opened a 6(b) inquiry in 2024 into generative-AI investments and partnerships, naming the major cloud-AI relationships. High SR002, SR001
CR002 FTC has stated ongoing 2024-2025 attention to GenAI competition and consumer-protection enforcement. High SR001, SR002
CR003 EU AI Act entered into force in 2024 with phased GPAI obligations through 2026-2027 including fines up to 7% of global revenue. High SR003, SR012
CR004 BIS tightened advanced-computing export controls in 2025 covering H100, H200, B200 and certain foundation-model weights. High SR005, SR008
CR005 NIST AI Risk Management Framework establishes voluntary US federal AI controls increasingly used in enterprise procurement. High SR004, SR008
CR006 UK ICO has published GenAI guidance creating UK DPA compliance baseline. Medium SR006
CR007 Australia OAIC has published a 2024 GenAI guide for organisations. Medium SR007
CR008 White House EO on AI (2023, amended 2025) sets reporting thresholds for foundation-model training. Medium SR008
CR009 CCPA imposes privacy obligations on Together for California-resident user data. High SR009, SR012
CR010 HIPAA BAA support is published as available for healthcare workloads. High SR010, SR028, SR026
CR011 SOC 2 attestation surface is referenced via the AICPA SOC framework and Together trust center. Medium SR011, SR028
CR012 NYT v Microsoft/OpenAI active litigation (CourtListener docket) is the bellwether GenAI copyright case in US. High SR013, SR014
CR013 Authors Guild v OpenAI active litigation expands copyright exposure to non-press content. High SR014, SR013
CR014 Getty Images v Stability AI active litigation tests image-model copyright exposure on both US and UK sides. High SR015, SR014
CR015 Civil-society organisations (CDT) actively lobby for AI accountability, adding reputational pressure. Medium SR012
CR016 Together is not currently named in any of the bellwether GenAI copyright suits. Medium SR013, SR014, SR015, SR025
CR017 Open-model hosting carries adjacent precedent risk if copyright cases extend to platform hosts. Medium SR013, SR014, SR015
CR018 Together publishes a public status page but does not publish an SLA percentage. High SR027, SR030
CR019 Pen-test cadence, breach plan, and named incident history are not publicly disclosed. High SR028, SR025
CR020 Safety models and function-calling guardrails are documented mitigations for prompt-injection class risks. High SR031, SR030
CR021 HuggingFace integrity checks are inherited for model-weight artefacts; weight-signing process is undisclosed. Medium SR028, SR025
CR022 Trust center references SOC 2 Type II posture; attestation expiry date is not public. Medium SR028, SR011
CR023 NVIDIA is supplier of GPUs, networking, and software stack and a strategic investor — single-vendor concentration is high. High SR025, SR024, SR029
CR024 HuggingFace is the primary model-artefact dependency for the Together catalog. High SR025, SR029
CR025 Salesforce Ventures is lead enterprise channel investor and co-sell partner. High SR025, SR029
CR026 Datacenter / colo capacity counterparties are largely undisclosed; multi-region build is implied but not enumerated. Medium SR025, SR024
CR027 Capital partners include GC, Salesforce, NVIDIA, Lux, Coatue, Prosperity7, and Kleiner per public round disclosures. High SR025, SR034, SR035
CR028 Top-10 customer concentration is undisclosed and is a material diligence ask. High SR029, SR025
CR029 Competitive displacement risk is documented from Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda. High SR019, SR020, SR021, SR022, SR017, SR018, SR016, SR023
CR030 Open-source model upstream license changes (Llama, Mistral, Qwen, DeepSeek) would introduce review and compliance burden. Medium SR025, SR029
CR031 Sovereign / Prosperity7-adjacent backing adds geopolitical disclosure considerations. Medium SR034, SR035, SR025
CR032 Key-person dependency on Vipul Ved Prakash, Ce Zhang, and Tri Dao is high; founder retention is the mitigation. High SR024, SR025
CR033 CFO and CRO presence at runDate is not publicly confirmed and is a material recruiting diligence ask. Medium SR025, SR024
CR034 Engineering and infra hiring momentum is visible (Alon Gavrielov 2025 VP-infra hire) but exact bench size is undisclosed. Medium SR025, SR024
CR035 Hopper→Blackwell→Rubin transition execution is a multi-quarter program-management risk for the chapter. Medium SR025
CR036 Monitorable kill triggers (NVIDIA allocation cut, HF policy change, EU AI Act fine, copyright host-ruling) can be tracked from public disclosure. Medium SR025, SR003, SR005, SR013
CR037 Operational kill triggers (multi-hour serverless outage, breach disclosure) are monitorable through status page and press. Medium SR027, SR025, SR032, SR033
CR038 Commercial kill triggers (Salesforce co-sell deprioritisation, customer concentration >25% single) are monitorable through press and reference calls. Medium SR025, SR029
CR039 Founder-departure triggers are catastrophic for the thesis at growth stage. Medium SR025, SR024
CR040 Financing kill triggers (flat/down round vs Series B at runDate) would re-underwrite valuation. Medium SR025, SR034, SR035
CR041 Adverse-source coverage spans regulators, court dockets, competitors, and developer-sentiment fora. High SR002, SR013, SR019, SR032, SR033
CR042 Several control primitives (SLA, incident, breach plan, top-10 concentration, GPU committed spend) remain undisclosed at runDate and are explicit diligence asks. High SR025, SR029, SR028, SR027
CV001 Recommendation is Hold / Monitor with medium confidence at the Series B mark. Medium SV025, SV027, SV007
CV002 Conditional Buy on a 25%+ correction or confirmed >$500M ARR plus >120% NRR. Medium SV008, SV007, SV005
CV003 Conditional Pass if hyperscaler pricing cuts >40%, NVIDIA allocation cuts, or breach disclosure occurs. Medium SV001, SV002, SV003
CV004 Risk rating is medium-high reflecting concentration, regulatory, and competitive overhangs. Medium SV001, SV002, SV006
CV005 Valuation stance is "at-or-near" the current Series B mark with explicit triggers to revisit. Medium SV007, SV008
CV006 GenAI inference TAM grows 40-60% CAGR per multiple analyst sources at 2025 mid-point. High SV001, SV002, SV003, SV004, SV005
CV007 FlashAttention authorship by Tri Dao and ThunderKittens (Stanford HazyResearch) anchor Together's kernel moat. High SV025, SV024
CV008 Together Inference Engine v2 and MoA productisation extend the technical surface beyond commoditised inference. Medium SV025
CV009 Salesforce Ventures-led Series B + customer case study imply multi-year channel commitment. Medium SV043, SV025, SV018
CV010 NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline alignment. High SV025, SV044
CV011 Open-source neutrality (Llama, Mistral, Qwen, DeepSeek) is defensible positioning vs closed-API providers. Medium SV025, SV027
CV012 Documented enterprise + startup proof base spans Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, GTC 2025 Pioneers. High SV027, SV025
CV013 Anti-thesis: hyperscaler bundled inference (Bedrock, Vertex, Azure) could compress pricing 30-50%. Medium SV001, SV002, SV006
CV014 Anti-thesis: copyright litigation precedent (NYT, Authors Guild, Getty) could extend to platform hosts. Medium SV025, SV008
CV015 Bull case (25% prob) assumes ARR >$1B by 2028 and exit $8B-$12B. Medium SV001, SV005, SV006
CV016 Base case (50% prob) assumes ARR $500M-$700M by 2028 and exit $4B-$6B. Medium SV001, SV003, SV002
CV017 Bear case (25% prob) assumes ARR $200M-$300M by 2028 and outcome $1B-$2.5B. Medium SV001, SV002, SV006
CV018 Sensitivity to ARR growth is the single largest valuation driver in the chapter model. Medium SV007, SV008
CV019 Gross margin sensitivity is ±1000bps shifts valuation outcome ±$2-3B at base case. Medium SV014, SV013
CV020 Multiple sensitivity is ±3x ARR shifts exit ±$2.5B at base case. Medium SV013, SV015
CV021 Probability weights are subjective and re-marked at Series C and major events. Low SV007, SV008
CV022 CoreWeave post-IPO trades 8-12x NTM revenue as GPU-cloud comparable. Medium SV014, SV018
CV023 Navan S-1 disclosed 8-12x NTM revenue range at filing for growth-stage SaaS. Medium SV013, SV030
CV024 Figma S-1 disclosed 12-15x NTM revenue range as high-multiple SaaS reference. Medium SV015, SV029
CV025 Fireworks AI rumoured 2024 round valued ~$4B per press reports. Low SV018, SV019
CV026 Replicate and Modal rounds undisclosed in public press. Medium SV023, SV022
CV027 Anyscale private valuation rumoured $1B-$2B at last round. Low SV023, SV019
CV028 Sakana AI round ~$1.5B Aug 2024 per TechCrunch and NVIDIA partnership. Medium SV031, SV032
CV029 Mistral round ~$6B mid-2024 as OSS-model-lab comparable. Medium SV019, SV018
CV030 Anthropic round at $60B+ in 2025 as closed-API reference, not direct comparable. Medium SV018, SV019
CV031 NVIDIA public NTM revenue multiple high-teens to mid-20s acts as ceiling reference. Medium SV018, SV019
CV032 Snowflake NTM revenue multiple 10-15x acts as mature-SaaS ceiling reference. Medium SV018, SV019
CV033 ARR run-rate <$500M by FY2027 is the base-case kill trigger. Medium SV008, SV007
CV034 Salesforce co-sell public deprioritisation is the bull-case kill trigger. Medium SV043, SV025
CV035 NVIDIA Blackwell allocation cut to a peer is a re-underwrite trigger. Medium SV044, SV025
CV036 Hyperscaler bundled pricing cut >40% on AWS Bedrock or peer is a base-compression trigger. Medium SV001, SV002
CV037 Platform-host copyright precedent is an OSS-revenue re-underwrite trigger. Medium SV025, SV008
CV038 Series C flat-or-down vs Series B is a mark-to-market trigger. Medium SV018, SV019, SV007
CV039 Founder departure (CEO/CTO/CSO) is a kill trigger. Medium SV024, SV025
CV040 Exact ARR at runDate is undisclosed and is the principal diligence ask. High SV008, SV007, SV027
CV041 NRR / GRR / cohort retention are undisclosed at runDate and are material diligence asks. High SV027, SV025
CV042 Top-10 customer concentration and GPU committed spend are undisclosed. High SV027, SV025
CV043 CFO and CRO presence at runDate is unconfirmed. Medium SV024, SV025
CV044 Opex split (R&D / S&M / G&A), paid-developer count, SOC 2 expiry, and OSS hosting policy are all diligence asks. Medium SV025, SV027
CV045 Sacra estimates Together AI reached $1B in annualized revenue by February 2026, up from ~$618M at year-end 2025, representing ~400% year-over-year growth in 2024. Medium SV045, SV046
CV046 Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026, which would represent a >2× step-up from the $3.3B Series B valuation set in February 2025. Medium SV045, SV047
CV047 EquityZen lists Together AI as available for pre-IPO secondary share purchases by accredited investors, indicating secondary-market liquidity exists for current shareholders. Medium SV047, SV045
CV048 CB Insights' Q1 2026 State of AI report identifies AI infrastructure as the leading funding category in early 2026, with total AI deal activity up materially from prior quarters, supporting the demand context for Together AI's growth. High SV048, SV001
Sources
IDPublisherTitleQuote
SO001 Together AI Together AI — The AI Acceleration Cloud
SO002 Together AI About | Together AI
SO003 Together AI Careers | Together AI
SO004 Together AI Contact | Together AI
SO005 Together AI Together AI Blog
SO006 Together AI Together AI raises $102.5M Series A
SO007 Together AI RedPajama, a project to create leading open-source models
SO008 Together AI Announcing OpenChatKit
SO009 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SO010 Together AI Together Inference Engine 2.0
SO011 Together AI Research | Together AI
SO012 CNBC Together AI raises $305 million at $3.3 billion valuation
SO013 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SO014 TechCrunch Together raises $102.5M to build open-source generative AI
SO015 TechCrunch Together AI is worth $1.25B (March 2024 update)
SO016 Fast Company Together AI funding profile
SO017 VentureBeat Together AI raises $305M for open-source GenAI
SO018 Wikipedia Together AI — Wikipedia
SO019 Crunchbase Together AI — Crunchbase Profile
SO020 Hacker News Submissions from together.ai
SO021 Reddit r/LocalLLaMA Together AI discussions
SO022 Product Hunt Together AI on Product Hunt
SO023 StackShare Together AI Tech Stack
SO024 X (Together) @togethercompute on X
SO025 Salesforce Ventures Salesforce Ventures Perspectives
SO026 NVIDIA NVIDIA AI investments 2024
SO027 SEC EDGAR SEC EDGAR — Together AI search
SO028 GitHub Together Computer · GitHub Org
SO029 GitHub togethercomputer/RedPajama-Data
SO030 GitHub togethercomputer/OpenChatKit
SO031 GitHub togethercomputer/StripedHyena
SO032 GitHub Dao-AILab/flash-attention
SO033 HuggingFace togethercomputer on Hugging Face
SO034 HuggingFace StripedHyena-Nous-7B
SO035 Together AI Introduction | Together AI Docs
SO036 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SO037 arXiv Mixture-of-Agents Enhances LLM Capabilities
SO038 Gartner Gartner AI Insights
SO039 CoreWeave CoreWeave — Specialized GPU Cloud
SM001 Together AI Together AI — The AI Acceleration Cloud
SM002 Together AI Pricing | Together AI
SM003 Together AI Customers | Together AI
SM004 Together AI Together AI Blog
SM005 Together AI AI Native Conf — research & product announcements
SM006 Together AI Batch inference API updates 2025
SM007 Together AI Inference Models | Together AI Docs
SM008 Together AI Serverless Inference | Together AI Docs
SM009 Together AI Dedicated Endpoints | Together AI Docs
SM010 Together AI Batch Inference | Together AI Docs
SM011 AWS Amazon Bedrock
SM012 Google Cloud Vertex AI
SM013 CoreWeave CoreWeave — Specialized GPU Cloud
SM014 Lambda Labs Lambda — GPU Cloud for AI
SM015 Replicate Replicate — Run models in the cloud
SM016 Modal Modal — Serverless AI infrastructure
SM017 Anyscale Anyscale — Powered by Ray
SM018 Groq Groq — Fast AI inference
SM019 Fireworks AI Fireworks AI — Production-grade LLM inference
SM020 Cerebras Cerebras — Wafer-Scale AI
SM021 Gartner Gartner AI Insights
SM022 arXiv LLM inference infrastructure survey
SM023 Wikipedia Together AI — Wikipedia
SM024 CNBC Together AI raises $305 million at $3.3 billion valuation
SM025 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SM026 Fast Company Together AI funding profile
SM027 Salesforce Ventures Salesforce Ventures Perspectives
SM028 NVIDIA NVIDIA AI investments 2024
SM029 Hacker News Submissions from together.ai
SM030 Reddit r/LocalLLaMA Together AI discussions
SM031 Product Hunt Together AI on Product Hunt
SM032 Together AI Together AI Startup Accelerator
SM033 Together AI Together AI at NVIDIA GTC 2025
SP001 Together AI Together AI — The AI Acceleration Cloud
SP002 Together AI Pricing | Together AI
SP003 Together AI Customers | Together AI
SP004 Together AI Together AI Blog
SP005 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SP006 Together AI FlashAttention-4
SP007 Together AI Together Inference Engine 2.0
SP008 Together AI ThunderKittens kernel framework
SP009 Together AI AI Native Conf — research & product announcements
SP010 Together AI Inference Models | Together AI Docs
SP011 Together AI Serverless Inference | Together AI Docs
SP012 Together AI Dedicated Endpoints | Together AI Docs
SP013 Together AI Batch Inference | Together AI Docs
SP014 Together AI Rate Limits | Together AI Docs
SP015 Together AI Chat Completions API Reference
SP016 Together AI Completions API Reference
SP017 Together AI Models API Reference
SP018 AWS Amazon Bedrock
SP019 Google Cloud Vertex AI
SP020 CoreWeave CoreWeave — Specialized GPU Cloud
SP021 Lambda Labs Lambda — GPU Cloud for AI
SP022 Replicate Replicate — Run models in the cloud
SP023 Modal Modal — Serverless AI infrastructure
SP024 Anyscale Anyscale — Powered by Ray
SP025 Groq Groq — Fast AI inference
SP026 Fireworks AI Fireworks AI — Production-grade LLM inference
SP027 Cerebras Cerebras — Wafer-Scale AI
SP028 TensorWave TensorWave — AMD GPU cloud
SP029 arXiv FlashAttention: Fast and Memory-Efficient Exact Attention
SP030 arXiv FlashAttention-2: Faster Attention with Better Parallelism
SP031 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SP032 arXiv Speculative Decoding paper
SP033 arXiv Medusa speculative decoding paper
SP034 arXiv LLM inference infrastructure survey
SP035 arXiv LLM evaluation benchmark paper
SP036 Reddit r/LocalLLaMA Together AI discussions
SP037 Hacker News Submissions from together.ai
SP038 Product Hunt Together AI on Product Hunt
SP039 StackShare Together AI Tech Stack
SP040 Gartner Gartner AI Insights
SP041 NVIDIA NVIDIA AI investments 2024
SP042 GitHub togethercomputer/together-python SDK
SP043 PyPI together — Python package
SP044 Wikipedia Together AI — Wikipedia
SI001 Together AI Together AI — The AI Acceleration Cloud
SI002 Together AI Pricing | Together AI
SI003 Together AI Customers | Together AI
SI004 Together AI Together AI Blog
SI005 Together AI Together AI raises $102.5M Series A
SI006 Together AI Announcing $305M Series B
SI007 Together AI Series A2 announcement
SI008 Together AI Seed funding announcement
SI009 Together AI Batch inference API updates 2025
SI010 Together AI Together AI Startup Accelerator
SI011 CNBC Together AI raises $305 million at $3.3 billion valuation
SI012 CNBC Together AI raises $305 million (follow-up)
SI013 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SI014 Fast Company Together AI funding profile
SI015 TechCrunch Together raises $102.5M to build open-source generative AI
SI016 TechCrunch Together AI is worth $1.25B (March 2024 update)
SI017 VentureBeat Together AI raises $305M for open-source GenAI
SI018 Wikipedia Together AI — Wikipedia
SI019 Crunchbase Together AI — Crunchbase Profile
SI020 SEC EDGAR SEC EDGAR — Together AI search
SI021 Salesforce Ventures Salesforce Ventures Perspectives
SI022 NVIDIA NVIDIA AI investments 2024
SI023 X (Together) @togethercompute on X
SI024 Gartner Gartner AI Insights
SI025 PitchBook Together AI — PitchBook profile
SI026 The Information Together AI revenue 2025 reporting
SI027 Forrester Forrester: Generative AI infrastructure landscape
SI028 IDC IDC Worldwide AI Software Market Forecast 2024-2028
SI029 a16z a16z — State of Generative AI in the Enterprise 2025
SI030 Menlo Ventures Menlo Ventures: 2025 State of AI
SI031 Bessemer Venture Partners Bessemer: State of AI 2025
SI032 SEC EDGAR CoreWeave SEC filings (S-1 and post-IPO)
SI033 SEC EDGAR Navan S-1/A filing
SI034 SEC EDGAR Figma S-1 filings (comparable IPO)
SI035 CoreWeave CoreWeave — Specialized GPU Cloud
SI036 Fireworks AI Fireworks AI — Production-grade LLM inference
SI037 Groq Groq — Fast AI inference
SI038 Reddit r/LocalLLaMA Together AI discussions
SI039 Hacker News Submissions from together.ai
SE001 Together AI Together AI — The AI Acceleration Cloud
SE002 Together AI About | Together AI
SE003 Together AI Pricing | Together AI
SE004 Together AI Customers | Together AI
SE005 Together AI Together AI Blog
SE006 Together AI FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SE007 Together AI FlashAttention-4
SE008 Together AI ThunderKittens kernel framework
SE009 Together AI Together Inference Engine 2.0
SE010 Together AI Together Inference Engine v2
SE011 Together AI Batch inference API updates 2025
SE012 Together AI AI Native Conf — research & product announcements
SE013 Together AI Together AI Startup Accelerator
SE014 Together AI Together AI at NVIDIA GTC 2025
SE015 Together AI Alon Gavrielov joins as VP Infrastructure Strategy
SE016 Together AI Introduction | Together AI Docs
SE017 Together AI Quickstart | Together AI Docs
SE018 Together AI Inference Models | Together AI Docs
SE019 Together AI Fine-tuning Overview | Together AI Docs
SE020 Together AI Dedicated Endpoints | Together AI Docs
SE021 Together AI Batch Inference | Together AI Docs
SE022 Together AI Chat Completions API Reference
SE023 Together AI Serverless Inference | Together AI Docs
SE024 Together AI Embeddings | Together AI Docs
SE025 Together AI Rate Limits | Together AI Docs
SE026 Together AI JSON Mode | Together AI Docs
SE027 Together AI Function Calling | Together AI Docs
SE028 Together AI Safety Models | Together AI Docs
SE029 Together AI Code Execution | Together AI Docs
SE030 Together AI LLMs Overview | Together AI Docs
SE031 Together AI Vision Models Overview | Together AI Docs
SE032 Together AI Audio Models Overview | Together AI Docs
SE033 Together AI Image Models Overview | Together AI Docs
SE034 Together AI Embeddings API Reference
SE035 Together AI Completions API Reference
SE036 Together AI Models API Reference
SE037 GitHub Together Computer · GitHub Org
SE038 GitHub togethercomputer/RedPajama-Data
SE039 GitHub togethercomputer/OpenChatKit
SE040 GitHub Dao-AILab/flash-attention
SE041 GitHub togethercomputer/StripedHyena
SE042 GitHub togethercomputer/Llama-2-7B-32K-Instruct
SE043 GitHub togethercomputer/together-python SDK
SE044 PyPI together — Python package
SE045 HuggingFace togethercomputer on Hugging Face
SE046 HuggingFace StripedHyena-Nous-7B
SE047 HuggingFace Evo-1-131k-base
SE048 HuggingFace RedPajama-Data-1T Dataset
SE049 HuggingFace HuggingFace x Together AI partnership
SE050 arXiv FlashAttention: Fast and Memory-Efficient Exact Attention
SE051 arXiv FlashAttention-2: Faster Attention with Better Parallelism
SE052 arXiv FlashAttention-3: Fast and Accurate Attention with Asynchrony
SE053 arXiv Speculative Decoding paper
SE054 arXiv Speculative decoding follow-up
SE055 arXiv Medusa speculative decoding paper
SE056 arXiv Mixture-of-Agents Enhances LLM Capabilities
SE057 arXiv LLM inference infrastructure survey
SE058 arXiv LLM evaluation benchmark paper
SE059 arXiv Sheared LLaMA paper
SE060 NVIDIA NVIDIA AI investments 2024
SE061 AWS Amazon Bedrock
SE062 Together AI Together AI status page
SE063 Together AI Together AI trust center
SE064 Tri Dao Tri Dao personal site (Together CSO)
SE065 Stanford HazyResearch Stanford HazyResearch lab (Chris Ré)
SE066 AICPA SOC 2 reporting framework
SE067 HHS HIPAA sample BAA provisions
SE068 Hacker News Submissions from together.ai
SE069 Reddit r/LocalLLaMA Together AI discussions
SE070 Product Hunt Together AI on Product Hunt
SE071 StackShare Together AI Tech Stack
SU001 Together AI Together AI — The AI Acceleration Cloud
SU002 Together AI About | Together AI
SU003 Together AI Customers | Together AI
SU004 Together AI Together AI Blog
SU005 Together AI Pricing | Together AI
SU006 Together AI Together AI Startup Accelerator
SU007 Together AI Together AI at NVIDIA GTC 2025
SU008 Together AI Together AI x Adaption partnership
SU009 Together AI AI Native Conf — research & product announcements
SU010 Together AI Salesforce customer case study
SU011 Together AI Zoom customer case study
SU012 Together AI Pika customer case study
SU013 Together AI Arcee customer case study
SU014 Together AI Nous Research customer case study
SU015 Together AI Cartesia customer case study
SU016 Together AI Washington University customer case study
SU017 Salesforce Ventures Salesforce Ventures Perspectives
SU018 NVIDIA NVIDIA AI investments 2024
SU019 HuggingFace HuggingFace x Together AI partnership
SU020 HuggingFace togethercomputer on Hugging Face
SU021 Together AI Together AI Blog (apex)
SU022 Reddit r/LocalLLaMA Together AI discussions
SU023 Hacker News Submissions from together.ai
SU024 Product Hunt Together AI on Product Hunt
SU025 StackShare Together AI Tech Stack
SU026 G2 Together AI — G2 reviews
SU027 Trustpilot Together AI — Trustpilot reviews
SU028 Wikipedia Together AI — Wikipedia
SU029 Crunchbase Together AI — Crunchbase Profile
SU030 Fireworks AI Fireworks AI — Production-grade LLM inference
SU031 Replicate Replicate — Run models in the cloud
SU032 CNBC Together AI raises $305 million at $3.3 billion valuation
SU033 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SU034 Fast Company Together AI funding profile
SU035 TechCrunch Together AI is worth $1.25B (March 2024 update)
SU036 VentureBeat Together AI raises $305M for open-source GenAI
SU037 Gartner Gartner AI Insights
SU038 Together AI Introduction | Together AI Docs
SU039 Together AI Inference Models | Together AI Docs
SU040 PyPI together — Python package
SU041 GitHub togethercomputer/together-python SDK
SU042 Together AI Together AI status page
SR001 FTC FTC: AI Companies — Uphold Your Privacy & Confidentiality Commitments
SR002 FTC FTC launches inquiry into generative AI investments & partnerships
SR003 EUR-Lex EU Regulation 2024/1689 (AI Act)
SR004 NIST AI Risk Management Framework
SR005 US BIS BIS export controls on advanced computing & foundation models
SR006 UK ICO UK Information Commissioner — Our work on AI
SR007 OAIC (Australia) OAIC guidance on privacy and AI products
SR008 The White House Executive Order 14110 on Safe, Secure AI
SR009 CA Attorney General California Consumer Privacy Act guidance
SR010 HHS HIPAA sample BAA provisions
SR011 AICPA SOC 2 reporting framework
SR012 Center for Democracy & Technology CDT — AI policy & governance
SR013 CourtListener NYT v Microsoft / OpenAI docket
SR014 CourtListener Authors Guild v OpenAI docket
SR015 CourtListener Getty Images v Stability AI docket
SR016 CoreWeave CoreWeave — Specialized GPU Cloud
SR017 Groq Groq — Fast AI inference
SR018 Cerebras Cerebras — Wafer-Scale AI
SR019 Fireworks AI Fireworks AI — Production-grade LLM inference
SR020 Replicate Replicate — Run models in the cloud
SR021 Modal Modal — Serverless AI infrastructure
SR022 Anyscale Anyscale — Powered by Ray
SR023 Lambda Labs Lambda — GPU Cloud for AI
SR024 Together AI Together AI — The AI Acceleration Cloud
SR025 Together AI Together AI Blog
SR026 Together AI Pricing | Together AI
SR027 Together AI Together AI status page
SR028 Together AI Together AI trust center
SR029 Together AI Customers | Together AI
SR030 Together AI Introduction | Together AI Docs
SR031 Together AI Safety Models | Together AI Docs
SR032 Hacker News Submissions from together.ai
SR033 Reddit r/LocalLLaMA Together AI discussions
SR034 CNBC Together AI raises $305 million at $3.3 billion valuation
SR035 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SR036 VentureBeat Together AI raises $305M for open-source GenAI
SR037 Fast Company Together AI funding profile
SR038 Wikipedia Together AI — Wikipedia
SV001 Gartner Gartner AI Insights
SV002 Forrester Forrester: Generative AI infrastructure landscape
SV003 IDC IDC Worldwide AI Software Market Forecast 2024-2028
SV004 a16z a16z — State of Generative AI in the Enterprise 2025
SV005 Bessemer Venture Partners Bessemer: State of AI 2025
SV006 Menlo Ventures Menlo Ventures: 2025 State of AI
SV007 PitchBook Together AI — PitchBook profile
SV008 The Information Together AI revenue 2025 reporting
SV009 Meritech Capital Meritech SaaS comps table
SV010 PwC PwC Global AI Study — Sizing the prize
SV011 Y Combinator Y Combinator — Generative AI companies directory
SV012 SEC EDGAR SEC EDGAR — Together AI search
SV013 SEC EDGAR Navan S-1/A filing
SV014 SEC EDGAR CoreWeave SEC filings (S-1 and post-IPO)
SV015 SEC EDGAR Figma S-1 filings (comparable IPO)
SV016 SEC EDGAR Snowflake 10-K filings (public SaaS comp)
SV017 SEC EDGAR MongoDB 10-K filings (public infra comp)
SV018 CNBC Together AI raises $305 million at $3.3 billion valuation
SV019 Bloomberg Together AI Startup Raises Funds at $3.3 Billion Valuation
SV020 VentureBeat Together AI raises $305M for open-source GenAI
SV021 Fast Company Together AI funding profile
SV022 Wikipedia Together AI — Wikipedia
SV023 Crunchbase Together AI — Crunchbase Profile
SV024 Together AI Together AI — The AI Acceleration Cloud
SV025 Together AI Together AI Blog
SV026 Together AI Pricing | Together AI
SV027 Together AI Customers | Together AI
SV028 Together AI About | Together AI
SV029 CNBC Figma starts trading on NYSE after IPO
SV030 CNBC Navan files for IPO
SV031 TechCrunch Sakana AI $135M Series B at $2.65B
SV032 NVIDIA NVIDIA + Sakana AI partnership
SV033 CoreWeave CoreWeave — Specialized GPU Cloud
SV034 Groq Groq — Fast AI inference
SV035 Cerebras Cerebras — Wafer-Scale AI
SV036 Fireworks AI Fireworks AI — Production-grade LLM inference
SV037 Replicate Replicate — Run models in the cloud
SV038 Modal Modal — Serverless AI infrastructure
SV039 Anyscale Anyscale — Powered by Ray
SV040 Lambda Labs Lambda — GPU Cloud for AI
SV041 Hacker News Submissions from together.ai
SV042 Reddit r/LocalLLaMA Together AI discussions
SV043 Salesforce Ventures Salesforce Ventures Perspectives
SV044 NVIDIA NVIDIA AI investments 2024
SV045 Sacra Together AI revenue, valuation & funding — Sacra analysis Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025, off growing demand for generative AI applications and the need, particularly among startups, for developer tooling used to train, fine-tune, and deploy AI models.
SV046 ARR.club Together AI ARR milestones and revenue growth
SV047 EquityZen Invest In Together AI Stock — Pre-IPO shares profile
SV048 CB Insights State of AI Q1 2026 Report