Diligence report Generative AI infrastructure / inference cloud late-stage private 2026-05-16

Together AI

Open-model inference cloud with credible technical moat and enterprise traction, priced near its Series B mark

Together AI shows credible inference-cloud product and traction at a Series B valuation that requires multi-year ARR scale to underwrite a strong exit.

Cover facts

Latest disclosed valuation (Series B 2024) 01

3.3 USD B [CV001]

Cumulative capital raised across seed / A / B 02

500 USD M (approximate, per press) [CV001, CV002]

Reported run-rate revenue range (press) 03

130-200 USD M ARR (per The Information, unverified) [CV040]

Named enterprise + startup customers 04

9 case studies + GTC 2025 cohort [CV012]

Developer signups (company-claimed) 05

100000 developers [CU001]

Company profile

Together AI is a generative-AI cloud that runs serverless and dedicated inference, fine-tuning, and training across 200+ open and custom models, anchored by FlashAttention, ThunderKittens, and Together Inference Engine v2. The company combines a defensible technical research base with a Salesforce + NVIDIA channel and an open-source community surface.

Website: www.together.ai
Founded: 2022-06-01
Founders: Vipul Ved Prakash, Ce Zhang, Tri Dao, Percy Liang
Founding location: San Francisco, California, USA
Headquarters: San Francisco, California
Product: Together AI sells serverless inference (per-token), dedicated endpoints (reserved GPU capacity), fine-tuning (LoRA + full), batch inference, embeddings, vision, audio, and image APIs across a 200+ open and custom model catalog, all OpenAI-compatible.
Customers: Developers (self-serve), AI-native startups (Pika, Cartesia, Arcee, Nous Research), enterprise SaaS (Salesforce, Zoom), healthcare (Adaption), academia (Washington University), and NVIDIA GTC 2025 Pioneers cohort.
Business model: Usage-based serverless inference + committed dedicated capacity + fine-tuning + enterprise contracts; Salesforce Ventures co-sell and Startup Accelerator augment direct sales.
Stage: late-stage private
Funding status: Privately funded; Series A $102.5M (Nov 2023, Kleiner Perkins led) and Series B $305M (Mar 2024, Salesforce Ventures led, ~$3.3B post-money per CNBC / Bloomberg / Fast Company); investors include NVIDIA, Coatue, Lux Capital, Prosperity7, General Catalyst.

Executive summary

Top strengths

Technical moat anchored by FlashAttention (Tri Dao), ThunderKittens (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation.
Anchor channel partners (Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, Startup Accelerator) plus a 200+ open-model catalog give wide enterprise + developer reach.
Documented enterprise + startup proof base spanning Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, Washington University, and Adaption healthcare.

Top risks

Hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure OpenAI) could compress pricing 30-50% over 2026-2027.
NVIDIA single-vendor concentration on GPUs, networking, and stack would cap revenue ramp if Blackwell allocation tightens.
GenAI regulatory perimeter (EU AI Act, BIS export controls, FTC inquiry) and copyright-litigation precedent (NYT, Authors Guild, Getty) widen through 2027.

Open gaps

Exact ARR, NRR / GRR, top-10 customer concentration, GPU committed spend, and opex split (R&D / S&M / G&A) are undisclosed.
CFO and CRO presence at runDate is not publicly confirmed.
SLA percentage, incident history, pen-test cadence, and breach plan are not disclosed beyond the public status page.
Sovereign-channel posture (Prosperity7-adjacent) and OSS hosting policy under tightening copyright precedent require management disclosure.

Chapter 01

01Company Overview

1.1 Identity, Headquarters, and Product Frame

Together AI markets itself as “the AI acceleration cloud,” offering training, fine-tuning, and inference for open-source and custom large language, image, audio, and vision models. The corporate entity, Together Computer Inc., is headquartered in San Francisco, California, with a satellite presence in Menlo Park and additional research staff in Zurich; the careers page and contact surface confirm both locations and an active hiring posture across infrastructure, kernel, GPU, applied-ML, and revenue roles. The company was incorporated on 27 June 2022 by four co-founders with deep ties to Stanford, Princeton, ETH Zürich, and the broader open-source LLM research community. Its identity rests on three pillars: a hyperscale GPU cloud purpose-built for AI workloads, an open-source research arm (RedPajama, OpenChatKit, StripedHyena, FlashAttention, Mixture-of-Agents), and a self-service inference and fine-tuning API competitive with OpenAI’s and Anthropic’s but priced for open models. The company emphasises that customers can keep weights, control data residency, and dedicate clusters when needed, which is the principal contrast with closed-API competitors.[CO001, CO002, CO003, CO004, CO005]

Snapshot KPI table
Metric	Value/status	Date	Confidence	Gap or diligence ask
Post-money valuation	$3.3B	2024-07-09	high	Confirm 2026 secondaries or new round
Total primary capital raised	≈$533M disclosed	2024-07	high	Verify any post-Jul-2024 extensions
Annualised revenue	≈$100M (third-party report)	2024-07	medium	No audited filing; request management figure
Headcount	>150 (job board derived)	2026-05	medium	No filing; request HR roster
GPU footprint	>20,000 NVIDIA Hopper-class	2024-07	medium	Confirm Blackwell additions and utilisation
Customer count	100,000+ developers (company-claimed)	2024	low	Distinguish paying vs free; verify NRR
HQ	San Francisco, CA	2026-05	high	—
Founding date	27 June 2022	2022	high	—

Values mix company disclosure (high), third-party reporting (medium), and inferred figures (low); paid-customer count and ARR are unaudited and must be validated with management.

[CO019, CO020, CO021, CO022, CO023, CO024]

FO002: Company snapshot logic

How identity, product, capital, and customers connect.

[CO001, CO003, CO005, CO017, CO020, CO021]

1.2 Founders, Leadership, and Governance

CEO Vipul Ved Prakash was previously co-founder/CTO of Topsy (acquired by Apple for ~$200M in 2013) and an early principal at Cloudmark, giving him both consumer-scale ML and infrastructure operating experience. CTO Ce Zhang is a tenured professor at ETH Zürich and Together’s research lead on distributed training systems and data-centric ML. Chief Scientist Chris Ré is the MacArthur-winning Stanford professor behind Snorkel and many of the FlashAttention/Hyena lines of work; Percy Liang, Stanford CRFM director, is co-founder and an advisor. The leadership bench has expanded with a head of revenue, head of GPU infrastructure, head of inference engineering, and a Zurich-based research head; the board includes investor partners from Coatue, Kleiner Perkins, NEA, and Lux. Key-person dependence is concentrated in Prakash for commercial execution and in the founding research trio for technical credibility, particularly given the open-source flywheel that drives much of Together’s top-of-funnel.[CO006, CO007, CO008, CO009, CO010, CO011]

Leadership and founder table
Person	Role	Background	Founder-market fit	Key-person dependency
Vipul Ved Prakash	Co-founder, CEO	Previously co-founder/CTO Topsy (acquired by Apple 2013), Cloudmark co-founder	Repeat infrastructure/consumer-ML founder with operating exit	High — sole CEO and primary commercial face
Ce Zhang	Co-founder, CTO	Tenured professor ETH Zürich; distributed training & data-centric ML research lead	Deep systems/ML research credibility	High — only CTO; bridges research & engineering
Chris Ré	Co-founder, Chief Scientist	MacArthur Fellow; Stanford CS; Snorkel co-founder; FlashAttention/Hyena lineage	Authored or advised most open-source IP	High — anchors research brand
Percy Liang	Co-founder	Director Stanford CRFM; HELM benchmark lead	Sets research agenda & academic credibility	Medium — advisory not full-time operational
Tri Dao	Chief Scientist (research)	FlashAttention author; Princeton CS faculty	Inference-kernel authority	High — drives kernel performance lead
Head of Revenue	Sales leadership (publicly listed roles)	Enterprise SaaS background	Required for enterprise expansion	Medium — multiple sales hires already
Head of GPU Infrastructure	Cluster engineering	Prior hyperscaler experience (job board)	Crucial for SLA & cost	Medium — recruiting actively

Founder bios cross-verified against official about page and Wikipedia; non-founder executives derived from careers postings and public LinkedIn footprints at runDate.

[CO006, CO007, CO008, CO009, CO010, CO011]

1.3 Funding History, Capital Stack, and Valuation

Together AI raised a $20M seed in May 2023 led by Lux Capital with Factory, SciFi, Long Journey, and individual backers including Scott Banister, Jakob Uszkoreit, and Aravind Srinivas. A $102.5M Series A followed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. In March 2024 the company added approximately $106M at a reported $1.25B valuation, then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation; Lakestar, NVIDIA, and an expanded set of strategics also participated. Cumulative disclosed primary capital is therefore approximately $533M before any 2025/2026 extensions, with no public S-1 filing or registered offering on EDGAR as of the run date. The investor mix — sovereign-aligned (Prosperity7), strategic GPU supplier (NVIDIA), category-defining cloud customer (Salesforce Ventures) and tier-1 financials (Coatue/KP/Lux) — is unusual and suggests Together is being positioned as a neutral, multi-stakeholder backbone for the open-model market.[CO012, CO013, CO014, CO015, CO016, CO017]

Stakeholder or investor map
Stakeholder	Role	Round(s)	Control/economic importance	Diligence ask
Salesforce Ventures	Lead Series B (2024)	B	Strategic distribution into Salesforce ecosystem	Confirm any commercial commit or revenue share
Coatue	Co-lead Series B	B	Public-market crossover signaling	Confirm pro-rata posture
Kleiner Perkins	Lead Series A	Seed/A/B	Board seat; partner Bucky Moore	Confirm board composition
NVIDIA	Strategic investor	A/B	Allocation of H100/H200/B200 supply	Quantify supply commitment & pricing
Lux Capital	Lead Seed	Seed/A	Earliest institutional backer	Confirm board observer rights
Emergence Capital	Series A	A	Enterprise-SaaS network	—
Prosperity7 (Aramco)	Series A	A	Sovereign-aligned capital; Middle East GTM	Confirm any sovereign-cloud commitments
NEA, Greycroft, SciFi, Factory, Long Journey, Definition, Long Journey	Co-investors	Seed/A/B	Round support	—
Founders & employees	Common stock	—	Reported >25% retained based on Series A press	Confirm cap table post-Series B

Cap table figures sourced from press releases at funding events; secondary sales not disclosed at runDate.

[CO012, CO013, CO014, CO015, CO016, CO017]

FO001: Company milestone timeline

Founding to Series B + flagship research drops.

[CO014, CO015, CO016, CO017, CO021, CO022]

1.4 Scale, Cover Metrics, and Milestones

Public scale metrics remain partial. The company has stated it operates more than 20,000 NVIDIA Hopper-class GPUs across multiple regions, with public roadmap notes referencing Blackwell rollouts, and serves "hundreds of thousands" of developers via the Together API, but it has not disclosed audited ARR, gross margin, paid-developer counts, or net revenue retention. CNBC reported a $100M annualised revenue pace around the Series B; Bloomberg cited triple-digit growth without a specific number. Reported headcount tracks above 150 globally, with active openings spanning kernel, networking, ML, and sales. The milestone timeline anchors founding (June 2022), seed (May 2023), RedPajama 1T dataset (April 2023), OpenChatKit (March 2023), Series A (November 2023), FlashAttention-3 (July 2024), Series B at $3.3B (July 2024), and StripedHyena/Mixture-of-Agents research (late 2023–2024). No adverse litigation, layoffs, or regulatory action has been reported through the run date, but key cover metrics (gross margin, ARR confirmation, customer concentration) remain undisclosed and are reflected in the snapshot KPI table.[CO019, CO020, CO021, CO022, CO023, CO024]

Milestone table
Date	Event	Type	Amount/valuation/status	Participants	Implication
2022-06-27	Together Computer Inc. incorporated	founding	active	Prakash, Zhang, Ré, Liang	Identity established
2023-03-10	OpenChatKit launched	product	released	Together + LAION + Ontocord	Open-source instruct-tuning baseline
2023-04-17	RedPajama 1T dataset released	product	released	Together + EleutherAI + LAION	Foundational open dataset (1T tokens)
2023-05-15	$20M seed announced	financing	closed	Lux + Factory + SciFi	Institutional launch capital
2023-11-29	$102.5M Series A	financing	closed at undisclosed valuation	Kleiner Perkins (lead), NVIDIA, NEA, Emergence	Scale-up & H100 build-out
2024-03-13	Reported interim raise at $1.25B	financing	reported	Existing investors	Mid-cycle uplift
2024-07-09	$305M Series B at $3.3B post	financing	closed	Salesforce Ventures + Coatue (co-leads), NVIDIA, Lakestar	3x valuation step-up; enterprise pivot
2024-07-11	FlashAttention-3 paper & blog	product	released	Dao et al.	State-of-the-art H100 inference kernel
2024-09	Together Inference Engine 2.0	product	released	Together engineering	Latency / throughput leadership claim
2023-12	StripedHyena-Nous-7B	product	released	Together + Nous Research	Non-attention long-context architecture
2024-06	Mixture-of-Agents paper	product	released	Together research	Agentic LLM technique
2024-Q4	Dedicated Endpoints GA	product	released	Together engineering	Enterprise inference offer

No reported adverse events (litigation, layoffs, regulatory action) at runDate; absence of adverse events is itself a diligence finding pending background check.

[CO019, CO020, CO021, CO022, CO023, CO024]

FO003: Snapshot KPIs

IC-ready snapshot of maturity, traction, and capital.

[CO019, CO021, CO022, CO024, CO032]

1.5 Exhibits

Chapter 02

02Market Analysis

2.1 Market boundary and adjacencies

Together AI sits in the AI compute and inference platform layer of the modern cloud stack — between hyperscaler GPU IaaS (AWS, GCP, Azure), specialised GPU clouds (CoreWeave, Lambda, Crusoe), inference-API providers (Replicate, Fireworks, Groq, Modal), and closed-API model labs (OpenAI, Anthropic). The market we underwrite is the spend dedicated to running, fine-tuning, and serving open-weight or customer-owned foundation models, plus the dedicated and serverless GPU capacity used for AI workloads. Excluded from this market are general-purpose cloud compute, traditional ML platforms (Sagemaker training-only, classical scikit pipelines), and closed proprietary model APIs that do not host customer weights. Adjacencies include MLOps tooling (Weights & Biases, Anyscale), vector databases, and AI safety/observability vendors. Status-quo substitutes are self-hosted Kubernetes-on-GPU clusters and closed-API rentals from OpenAI/Anthropic, both of which trade flexibility for price and operational simplicity. We also explicitly exclude per-seat AI copilots (Copilot, Cursor) because the unit of demand is end-user seats rather than inference tokens, which means they sit one layer above Together in the application stack and procure rather than substitute for token-level inference.[CM001, CM002, CM003, CM004, CM005, CM006]

Market definition table
Segment	Included spend	Excluded spend	Buyer/payer	Relevance to Together
Open-weight model inference (API)	Per-token serverless inference on Llama/Mistral/Qwen/DeepSeek	Closed-API tokens (OpenAI/Anthropic)	Developer + CTO	Core SOM
Dedicated GPU capacity	Reserved H100/H200/B200 endpoints	General-purpose cloud compute	Platform team	Direct expansion ARR
Fine-tuning + custom model hosting	LoRA, full fine-tune, custom checkpoint hosting	In-house Kubernetes training	ML engineering lead	High-margin attach
Batch inference + training	Multi-million-token batch jobs, pretraining runs	Closed training-only platforms	Research lead	Growth wedge
Sovereign / regional clusters	Dedicated in-region capacity	Public-region multi-tenant	Government / regulated CIO	Differentiated lane
MLOps + observability	Logs, evals, fine-tune jobs	BI/analytics	MLOps lead	Adjacency, not core
Closed-API model rentals	OpenAI/Anthropic API spend	—	App developer	Substitute / pressure

Boundary anchors on customer ownership of weights and on GPU-backed compute as the billable unit; excludes general-purpose cloud and closed-only APIs.

[CM001, CM002, CM003, CM004, CM005, CM006]

2.2 TAM/SAM/SOM and sizing lenses

Multiple analyst sources converge on a 2024 AI infrastructure TAM of $40–60B with 30–50% CAGR through 2028 (Gartner, IDC, McKinsey). Within that envelope, the inference and dedicated AI compute SAM most relevant to Together is sized at $8–15B in 2026 by triangulating: hyperscaler AI revenue disclosures ($26B annualised AWS Bedrock-equivalent revenue extrapolated), Series B coverage citing inference as the fastest-growing line, and the ~$100M Together ARR proxy implying a single-digit share of an early SAM. SOM (Together-addressable, near-term winnable spend) is on the order of $1–3B, focused on AI-native startups, model labs, and the Salesforce + sovereign-cloud channels Together has explicit relationships with. Sizing is constrained by the lack of disaggregated public reporting from hyperscalers and by the conflation of training capex with inference run-rate in many published estimates.[CM007, CM008, CM009, CM010, CM011, CM012]

TAM/SAM/SOM or sizing lens table
Publisher	Year	Geography	Value	CAGR	Methodology	Confidence	Limitation
Gartner	2024	Global	$40–60B AI infrastructure TAM	30–50%	Top-down survey of hyperscaler + enterprise AI spend	medium	Aggregates training+inference; not disaggregated
IDC (cited via secondary)	2024	Global	$50B AI infrastructure 2024	35%	Hardware + cloud forecast	low	Indirect citation
McKinsey AI spend report	2024	Global	$50–100B 2027 AI infra	40%	Scenario analysis	low	Wide range; assumptions unclear
Triangulated SAM (this report)	2026	Global	$8–15B inference + dedicated SAM	—	Bottoms-up from CNBC ARR + hyperscaler disclosures	medium	Single-source dependence on hyperscaler quarterlies
Triangulated SOM (this report)	2026	Global	$1–3B Together-addressable	—	Channel + Together $100M ARR	medium	High estimation uncertainty
NVIDIA earnings (data centre)	2025-Q1	Global	>$30B/qtr DC revenue	>50%	Public filings	high	Includes training capex sales, not pure inference

TAM/SAM/SOM are bounded; ranges preserved because no single public source disaggregates inference spend cleanly.

[CM007, CM008, CM009, CM010, CM011, CM012]

FM001: Market sizing lens

TAM/SAM/SOM for Together-addressable AI compute.

[CM007, CM008, CM011, CM012, CM036]

FM002: Market estimate range

Inference SAM 2026 estimates.

[CM009, CM010]

2.3 Buyer, user, and payer segmentation

Three primary buyer segments drive Together demand. (1) AI-native startups and model labs: technical founders or CTOs choose Together for FlashAttention-class inference latency, dedicated H100/H200 access, and open-weight flexibility; these are typically self-serve credit-card purchasers escalating to enterprise contracts. (2) Enterprise platform teams and applied-ML groups inside Fortune-500 companies: budget owners are CIOs/CTOs evaluating multi-model strategies, with procurement gates around data residency, SOC 2, and BAA support; Salesforce Ventures co-leadership of the Series B underwrites this segment. (3) Government, research, and sovereign-cloud customers: Prosperity7 (Aramco) and similar sovereign-aligned LPs signal a Middle East/APAC angle, and Together has positioned dedicated regional clusters as a differentiator. Users (developers, ML engineers, researchers) often differ from payers (finance, procurement, IT), which lengthens enterprise cycles but improves NRR once landed.[CM014, CM015, CM016, CM017, CM018, CM019]

Segment / buyer map
Segment	Buyer	User	Payer	Workflow	Budget owner	Adoption trigger
AI-native startup	CTO	ML engineer	Founder/CFO	Self-serve API + LoRA	CTO	Need open weights + dedicated GPUs
F500 platform team	CIO	Applied ML	IT procurement	RFP + dedicated endpoints	CIO	Multi-model strategy + BAA
Sovereign cloud	Minister/CIO	Government ML	Treasury	In-region dedicated capacity	Government	Data residency mandate
Model lab	Founder	Researcher	Founder	Reserved training + inference	Founder	GPU scarcity at hyperscalers
Independent dev	Self	Self	Self	Per-token API	Self	Free tier + pricing parity
Salesforce ecosystem ISV	Product VP	Eng team	Product P&L	Embedded GenAI	Product VP	Salesforce Ventures channel

Buyer/user/payer split distinguishes self-serve credit-card adoption from enterprise procurement gates.

[CM014, CM015, CM016, CM017, CM018, CM019]

FM003: Buyer / segment map

Adoption maturity by segment.

[CM014, CM015, CM016, CM017, CM018, CM019]

2.4 Growth drivers and constraints

Tailwinds: ongoing open-weight model proliferation (Llama 3/4, Mistral, DeepSeek, Qwen), GPU scarcity at hyperscalers, FinOps pressure to reduce per-token closed-API spend, and the agentic AI wave that multiplies token-volume per user. Headwinds: NVIDIA supply allocation favouring hyperscalers, sovereign data rules slowing cross-border inference, energy/permitting bottlenecks for new data centres, and competitive pricing pressure from Groq, Fireworks, and Cerebras at the inference layer. Adoption-timing risks include enterprise procurement friction, the possibility that hyperscalers commoditise the OSS-inference layer (AWS Bedrock open models, GCP Vertex Model Garden), and the volatile economics of training-vs-inference mix. Together's positioning depends on staying a generation ahead on inference kernels (FlashAttention 3/4, ThunderKittens) while expanding into reserved/dedicated SKUs that lock enterprise spend. Each driver and constraint feeds back into a binary question for IC: does the inference SAM compound at 35%+ for three more years, or does hyperscaler commoditisation pull growth forward into a single year of land-grab? Our base case assumes durable 30–40% CAGR through 2027 with widening competitive intensity from 2026 onward, which is the regime in which Together's open-source flywheel and dedicated-capacity differentiation produce the strongest IRR.[CM021, CM022, CM023, CM024, CM025, CM026]

Growth drivers and constraints table
Driver/constraint	Direction	Timing	Implication	Diligence ask
Open-weight model proliferation	+	2024-2027	Sustains SAM growth >35% CAGR	Track Llama 4/5, DeepSeek, Qwen release cadence
NVIDIA Hopper/Blackwell scarcity	+	2024-2026	Drives Together's reserved capacity premium	Quantify Together NVIDIA allocation pact
Closed-API price pressure (OpenAI cuts)	-	Ongoing	Compresses per-token margin	Track Together pricing parity vs OpenAI
Hyperscaler open-model commoditisation	-	2025-2027	Erodes pure-inference SAM	Watch AWS Bedrock & Vertex Model Garden expansion
Sovereign data residency rules	+/-	2025+	Creates regional moats but caps cross-border ARR	Confirm Together in-region clusters
Energy/permitting bottlenecks	-	2026-2028	Slows capacity expansion	Confirm Together DC contracts
Agentic workloads multiply tokens	+	2025+	Increases inference volume per user	Track MoA + agent SDK adoption
FinOps push to OSS inference	+	2025+	Tailwind for Together vs closed APIs	Survey enterprise FinOps strategy

Drivers cited from multiple analyst notes and partner statements; constraints triangulated from supply-chain reporting and hyperscaler announcements.

[CM021, CM022, CM023, CM024, CM025, CM026]

FM004: Adoption funnel or value-chain map

Discovery to expansion path.

[CM020, CM021, CM022, CM033]

2.5 Exhibits

Chapter 03

03Competitors

3.1 Competitive landscape segmentation

Together competes across five overlapping arenas. (1) Hyperscaler open-model offerings — AWS Bedrock and Google Vertex Model Garden host the same Llama/Mistral checkpoints Together offers, bundled with enterprise contracts and IAM. (2) Specialised GPU clouds — CoreWeave, Lambda Labs, and TensorWave compete for raw GPU-hour and reserved capacity; they typically lack the inference SaaS layer Together overlays. (3) Inference-API peers — Fireworks, Replicate, Modal, and Anyscale provide near-direct substitutes at the per-token serverless layer; Fireworks is most frequently cited as Together's closest direct rival. (4) Bespoke-silicon inference vendors — Groq (LPU), Cerebras (wafer-scale), and SambaNova compete on latency and price/token at the cost of model coverage. (5) Closed-API model labs — OpenAI and Anthropic act as substitutes for buyers willing to give up weight portability. The status-quo alternative is self-hosted Kubernetes-on-GPU, which trades flexibility for operational burden; internal-build is most common at frontier labs and FAANG. The competitive set is unusually broad because Together sits at the intersection of compute, model hosting, and developer experience; each arena exposes Together to different cost structures (capex-heavy GPU clouds vs OpEx-light API providers), different distribution power (hyperscaler procurement vs developer self-serve), and different exit dynamics (consolidation among GPU clouds vs commoditisation among API peers), all of which we underwrite separately below.[CP001, CP002, CP003, CP004, CP005, CP006]

Competitor profile table
Competitor	Category	Scale/funding	Target segment	Differentiation	Limitation
AWS Bedrock	Hyperscaler open-model	>$80B AWS revenue	Enterprise	IAM, compliance, bundling	Per-token premium, slower model adds
GCP Vertex Model Garden	Hyperscaler open-model	~$30B GCP revenue	Enterprise	Gemini + open models	Less open-weight depth
CoreWeave	Specialised GPU cloud	>$8B raised; public 2025	AI labs, hyperscaler offload	Largest non-hyperscaler GPU fleet	No inference SaaS layer
Lambda Labs	GPU cloud	$320M Series C	Researchers, startups	On-demand H100/H200	Smaller fleet vs CoreWeave
Fireworks AI	Inference API peer	>$77M raised	Devs, startups	OpenAI-compatible API	Smaller OSS-research footprint
Replicate	Inference API peer	>$40M raised	Indie devs	Community models, low friction	Cold-start latency
Modal	Serverless infra	>$80M raised	ML eng	Python-native serverless	Less model breadth
Anyscale	Ray-based platform	>$250M raised	ML eng	Ray + LLM tooling	OSS-platform tax
Groq	Bespuke silicon	>$1B raised	Latency-sensitive devs	LPU inference speed	Limited model coverage
Cerebras	Bespoke silicon	>$1B raised; IPO filed	Frontier customers	Wafer-scale chip	High per-deployment cost
OpenAI / Anthropic (substitute)	Closed API	>$30B / $10B raised	Enterprise + devs	Frontier closed models	No weight portability
TensorWave	AMD GPU cloud	Seed-stage	Cost-sensitive devs	MI300X capacity	Limited scale

Funding and scale figures sourced from public press releases and Crunchbase summaries; some private funding rounds rely on third-party reporting.

[CP001, CP002, CP003, CP004, CP005, CP006]

3.2 Capability and feature comparison

On capability axes Together leads on FlashAttention-3/4 kernel performance, open-weight model breadth (Llama, Mistral, DeepSeek, Qwen, custom checkpoints), and dedicated-endpoint flexibility. Hyperscalers lead on enterprise compliance breadth (BAA, FedRAMP, regional residency) and bundled identity/billing. Groq leads on raw single-stream latency on supported models but lags on model coverage. Fireworks closely matches Together on serverless open-model APIs but has lower OSS-research visibility. Pricing comparison shows Together's serverless rates clustered around the OpenAI-parity envelope (≈$0.20–$0.90/M input tokens for 7–70B models) with batch discounts up to 50%; CoreWeave/Lambda undercut on raw GPU-hour but require customer DevOps; AWS Bedrock charges a per-token premium on top of underlying compute. Feature matrices below mark unsupported cells as unknown rather than guessing. The matrix shows Together winning on open-weight breadth and kernel performance, hyperscalers winning on compliance and IAM, and bespoke-silicon vendors winning on latency at the cost of model coverage; no single vendor dominates the four most cited buying criteria simultaneously. We also note that Together is one of only two vendors in the set that ships an OpenAI-compatible chat completions endpoint while also exposing fine-tune and batch SKUs, which materially shortens migration time for buyers leaving closed APIs.[CP009, CP010, CP011, CP012, CP013, CP014]

Feature / capability matrix
Buying criterion	Together	Bedrock	GCP Vertex	Fireworks	Groq	CoreWeave
Open-weight model breadth	high	medium	medium	high	low	n/a
FlashAttention-class kernel perf	high	unknown	unknown	high	medium	n/a
Dedicated endpoints / reserved	yes	yes (provisioned)	yes	yes	yes	yes (raw)
Fine-tuning API	yes	partial	yes	yes	no	no
Batch inference SKU	yes	partial	yes	partial	no	no
Compliance (SOC2/HIPAA/FedRAMP)	SOC2; HIPAA via BAA	full	full	SOC2	unknown	SOC2
Sovereign / regional clusters	available	full	full	limited	unknown	full
OpenAI-compatible API	yes	no	no	yes	yes	no
Per-token list pricing transparency	high	medium	medium	high	high	n/a
Multi-modal (vision/audio/image)	yes	partial	yes	partial	no	n/a

Cells marked "unknown" where public docs do not disclose; cells marked "n/a" where the feature is outside the competitor's SKU.

[CP009, CP010, CP011, CP012, CP013, CP014]

Pricing / packaging comparison
Vendor	SKU	Price/unit	Discount	Notes
Together	Serverless Llama-70B	$0.88/M tokens	—	OpenAI-parity envelope
Together	Batch inference	-50% off serverless	batch	Updated 2025
Together	Dedicated endpoint	custom	reserved	Quoted via sales
Fireworks	Serverless Llama-70B	$0.90/M tokens	—	Similar parity
Replicate	Per-second	varies	—	GPU-second billing
AWS Bedrock	Llama 3 70B	$0.99/M output tokens	vol	Provisioned reserved option
GCP Vertex	Llama 3 70B	$0.99/M	vol	Similar to Bedrock
Groq	Llama 3 70B	$0.59/M	—	Latency premium
CoreWeave	GPU-hour	$2–4/H100-hr	reserved	Customer manages stack
Lambda	GPU-hour	$2.79/H100-hr	on-demand	Customer manages stack

Per-token prices reflect public list pricing on vendor sites at runDate; realised pricing for enterprise deals is undisclosed.

[CP016, CP017, CP018, CP019, CP020]

FP001: Competitive positioning map

Open-weight breadth vs enterprise compliance maturity.

[CP001, CP009, CP011, CP012, CP013]

FP002: Feature breadth / capability map

Capability strength by competitor.

[CP010, CP014, CP015, CP018, CP029]

3.3 Moat durability and competitive risk

Together's defensible moats are (a) FlashAttention research lineage and kernel velocity (with Tri Dao + Chris Ré), (b) open-source community gravity (RedPajama, StripedHyena, MoA), and (c) the NVIDIA + Salesforce + sovereign capital stack that secures GPU supply and enterprise distribution. Switching costs are mid: customers can multi-home across Together / Fireworks / Bedrock with API translation; however, dedicated-endpoint contracts and fine-tuned model artifacts on Together raise stickiness. Distribution power tilts to hyperscalers — they own enterprise procurement and identity — but Together's neutrality and open-weight commitment is a counter-positioning differentiator. Adverse competitor evidence: CoreWeave's 2024 IPO filings and Lambda's growth signal substantial capital advantage at the IaaS layer; Groq and Cerebras have raised >$1B each at higher valuations; Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity workloads. Commoditisation risk is real but bounded by Together's research velocity and dedicated-capacity contracts. Net, we believe the moat is durable through 2027 in the dedicated and high-performance segments, with growing pressure on the commodity serverless tier from hyperscalers and on the latency-critical tier from bespoke silicon vendors; the company's ability to sustain kernel and architecture leadership is the gating variable for the moat thesis and is therefore the principal item on the technical-diligence checklist.[CP021, CP022, CP023, CP024, CP025, CP026]

Moat durability / competitive risk register
Moat claim	Threat	Severity	Mitigation/diligence ask
FlashAttention research lineage	Open-source diffusion to competitors	medium	Track Together's patent/IP posture
Open-source community gravity	Competing OSS projects from Mistral/HF	medium	Quantify Together GH/HF traction over time
NVIDIA supply alignment	NVIDIA tilts to hyperscalers	high	Document Together NVIDIA pact
Salesforce / enterprise channel	Salesforce develops its own AI infra	medium	Confirm Salesforce commercial commit
Sovereign capital + regional clusters	Sovereign customers go direct to local clouds	medium	Map Together regional DC footprint
Dedicated endpoint stickiness	Bedrock provisioned throughput parity	high	Track Bedrock open-model price moves
Open-weight neutrality	Enterprise wants closed-API simplicity	medium	Survey enterprise multi-model strategy
Inference engine performance lead	Specialised silicon (Groq/Cerebras) leapfrogs	high	Benchmark Together vs Groq on shared models

Moats ranked by exposure to competitive substitution and capital intensity; each row has a concrete diligence ask.

[CP021, CP022, CP023, CP024, CP025, CP026]

FP003: Moat / readiness KPIs

Compact competitive durability summary.

[CP021, CP022, CP023, CP024, CP025, CP026]

3.4 Exhibits

Chapter 04

04Financials

4.1 Funding history and capital stack

Together AI has assembled approximately $533M of disclosed primary capital across four publicly announced rounds. The seed of $20M (May 2023) was led by Lux Capital with Factory, SciFi, Long Journey, and notable individual backers (Scott Banister, Jakob Uszkoreit, Aravind Srinivas). A $102.5M Series A in November 2023 was led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft. In March 2024 the company added approximately $106M at a reported $1.25B valuation (sometimes referred to as Series A2), then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation, with Lakestar, NVIDIA, and several strategics participating. No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the runDate, and no public secondary or 2026 extension has been confirmed. The capital stack therefore reads as venture-only with strategic anchors (NVIDIA for GPU supply, Salesforce Ventures for enterprise distribution, Prosperity7 for sovereign optionality); board control is split across KP, Coatue and Lux based on round-led signaling, but the cap table itself is not public. Cumulative dilution is undisclosed; founders are widely reported to retain a meaningful equity stake post Series B, but exact percentages are not in the public record and must be verified against management.[CI001, CI002, CI003, CI004, CI005, CI006]

Capital adequacy table
Capital primitive	Value	Date	Public status	Diligence ask
Cumulative primary capital	~$533M	2024-07	disclosed (round level)	—
Cash on hand	undisclosed	—	missing	Request cash position
Monthly burn	undisclosed (~$15-25M implied)	2024-25	missing	Request actual burn
Runway months	undisclosed (likely 18-30 implied)	2025	missing	Request runway plan
Planned use of funds	undisclosed	—	missing	Request capex plan
Next-round trigger	undisclosed	—	missing	Request milestones
Debt / project finance	undisclosed	—	missing	Request facility terms
Vendor financing (NVIDIA)	undisclosed	—	missing	Confirm any equipment financing
Series B valuation	$3.3B post	2024-07-09	disclosed	—
Latest secondary clearing price	undisclosed	—	missing	Pitchbook / Information chatter

Capital primitives mix disclosed round amounts with undisclosed forward-looking financial primitives.

[CI007, CI023, CI026, CI027, CI030, CI031]

FI004: Capital intensity / cash-flow map

Capex and operating cash flow mapped against funding rounds.

Cash balance and next-round trigger are undisclosed; arrows illustrate direction not magnitude.

[CI007, CI026, CI027, CI030]

4.2 Revenue, pricing, and reported scale

Together has not filed financial statements. CNBC reporting around the July 2024 Series B cited a $100M annualised revenue pace; Bloomberg cited "triple-digit growth"; Fast Company and VentureBeat repeated those figures without independent verification. The Information has separately reported on 2025 revenue trajectory behind a paywall; PitchBook lists the company as later-stage venture without a confirmed 2025 follow-on. Pricing is published per-token on the public pricing page, ranging roughly $0.20–$0.90/M tokens for 7–70B open models, with a documented 50% batch-inference discount and custom dedicated-endpoint pricing quoted via sales. SKUs include serverless, dedicated/reserved endpoints, fine-tuning, batch, and embeddings; vision/audio/image SKUs are documented separately. There is no published ARR, segment split, customer concentration, NRR, or gross margin disclosure at the runDate. Forrester and IDC market frames imply Together is a growth-stage entrant in a multi-billion-dollar generative-AI inference TAM, but neither analyst names Together among its top three vendors. Management has acknowledged enterprise pipeline acceleration tied to Salesforce Ventures co-selling but has not quantified it. This combination of company-claimed momentum, third-party press anecdotes, and absent audited disclosure is consistent with private growth-stage SaaS, but creates significant diligence risk on realised vs list pricing, mix, and gross margin. The GTM motion is dominated by self-serve developer signup at the top of the funnel and partner-led enterprise expansion through Salesforce Ventures and NVIDIA channel referrals; sales-cycle length, CAC, and payback are undisclosed but can be inferred to be 60-120 days for enterprise dedicated contracts based on comparable inference-API vendor disclosures.[CI009, CI010, CI011, CI012, CI013, CI014]

Revenue streams table
SKU	Pricing basis	Public price benchmark	Discount levers	Diligence gap
Serverless inference	per million tokens	$0.20–$0.90/M (open models 7–70B)	volume / committed-use	Realised vs list pricing not disclosed
Batch inference	per million tokens	50% discount vs serverless	batch SLA window	Confirmed via 2025 blog update
Dedicated endpoints	custom / reserved	quoted via sales	term commitment	No published list pricing
Fine-tuning API	per training run	quoted on pricing page	volume	Public docs but no margin disclosure
Embeddings API	per million tokens	published per-model	volume	—
Vision / image / audio APIs	per request / per token	published per model	—	Revenue mix not split
Enterprise contracts	annual / committed	undisclosed	strategic discounts	Critical diligence ask

Pricing rows mix published list pricing (high confidence) and inferred enterprise practice (low confidence); revenue mix between SKUs is not disclosed and must be requested.

[CI009, CI010, CI011, CI012, CI013, CI014]

Pricing / monetization table
Pricing dimension	Public benchmark	List vs realised	Discount / unknown	Source
Per-token Llama-70B	$0.88/M serverless	list only	volume discount	pricing page
Batch SLA discount	-50% vs serverless	list only	batch window	2025 batch blog
Dedicated endpoint	custom / per hour	realised not disclosed	term commit	blog + sales-quoted
Fine-tuning run	per training token	list only	volume	docs FT page
Embeddings	per million tokens	list only	volume	docs embeddings page
Enterprise contract value	not disclosed	realised undisclosed	strategic discounts	requested from management
Co-sell rebates (Salesforce)	not disclosed	realised undisclosed	partner economics	Salesforce Ventures co-sell
Sovereign-cloud premium	not disclosed	realised undisclosed	regional	Prosperity7 strategic

List pricing is publicly verifiable; realised pricing across enterprise contracts is undisclosed and must be requested.

[CI012, CI013, CI014, CI015, CI016]

FI001: Revenue model bridge

How customer activity converts to Together revenue and gross profit.

Gross-profit edge is illustrative; realised margin is undisclosed.

[CI012, CI013, CI014, CI015, CI024, CI025]

4.3 Unit economics, capital adequacy, and gaps

Together's public profile permits only rough unit-economics estimation. On the cost side, GPU-hour COGS scale with NVIDIA capex; CoreWeave's S-1 disclosures (a useful comparable) show GPU cloud gross margins in the 60–70% range on reserved deals and lower on on-demand. Per-token gross margin at Together's list pricing is plausibly 40–60% on serverless and higher on dedicated, but realised margin depends on utilisation and reserved-capacity contracts that are not public. On the cash side, $533M raised against an implied $300–$500M cash burn through 2024 (consistent with hyperscale GPU buildout and a 150+ headcount) suggests runway into 2026, but no figure is confirmed. Capital adequacy depends on whether Together extends Series B or files for IPO; the Figma and CoreWeave 2025 IPO precedents show the public-market window is open for AI-infrastructure issuers, while Navan's S-1 process is a closer growth-SaaS comparable. Gaps are material: ARR confirmation, gross margin by SKU, customer concentration top-10, net dollar retention, contracted vs uncontracted revenue, runway months, debt or vendor financing, and any sovereign-cloud commitments. These gaps drive the diligence ask list in the unit-economics and capital-adequacy tables and underpin a material evidence-gap entry for each undisclosed primitive; absent management disclosure, the most informative external signals are Together's public hiring posture, pricing-page revisions, and any 2026 secondary-market chatter, all of which should be tracked through the close of diligence. Working capital is unlikely to be a constraint at this scale of consumption-based SaaS; the bigger swing factor on cash is the pace of GPU capex relative to revenue ramp, which sets the cadence for the next round trigger. The verdict is that revenue quality and growth optics are strong but unverified; margin path is plausible but unaudited; capital intensity is high but underwritten by NVIDIA alignment; and the principal diligence blocker is the full set of private financial primitives enumerated in the public-financial-gaps table.[CI019, CI020, CI021, CI022, CI023, CI024]

Unit economics table
Metric	Value / null	Confidence	Why it matters	Diligence ask
Serverless gross margin	40–60% (inferred)	low	Long-term margin path	Request actual blended GM
Dedicated gross margin	60–75% (inferred)	low	Reserved customer LTV	Request dedicated GM split
Batch gross margin	35–55% (inferred)	low	Batch GM after 50% discount	Confirm batch utilisation
CAC payback	null	low	Sales efficiency	Request payback months by segment
Magic number	null	low	Sales productivity	Request magic number
NRR	null	low	Expansion proxy	Request NRR by cohort
Gross retention	null	low	Churn proxy	Request gross retention
Implied burn 2024	$300–$500M (inferred)	low	Cash adequacy	Request 24-month plan
Utilisation of GPU fleet	null	low	Utilisation drives GM	Request utilisation by SKU
SBC ratio	null	low	True margin	Request SBC schedule

All values are inferred ranges or nulls; every null is accompanied by a specific diligence request.

[CI019, CI020, CI021, CI022, CI023, CI024]

Public financial gaps table
Item	Public status	Why it matters	Diligence ask
Audited revenue (ARR)	not disclosed	Validates third-party $100M figure	Request management ARR & growth deck
Gross margin by SKU	not disclosed	Underpins long-term thesis	Request COGS breakdown by SKU
Net dollar retention	not disclosed	Stickiness proxy	Request NDR by cohort
Top-10 customer concentration	not disclosed	Revenue concentration risk	Request top-10 anonymised
Contracted revenue (RPO)	not disclosed	Forward visibility	Request contracted vs uncontracted split
Cash & runway	not disclosed	Capital adequacy	Request cash position & 24-mo plan
Debt / vendor financing	not disclosed	Capital structure	Request facility terms, if any
Founder ownership	not disclosed	Alignment, dilution	Request cap table
NRR vs gross retention	not disclosed	Expansion vs churn	Request gross / net retention
Stock-based compensation	not disclosed	Real vs reported margin	Request SBC schedule
Realised enterprise pricing	not disclosed	True margin vs list	Request three sample contracts

All items are material to underwriting and none are public at the runDate; chapter relies on third-party signals and management requests to close the gap.

[CI019, CI020, CI021, CI022, CI023, CI024]

FI002: Unit economics bridge

Inputs to per-token unit economics in absence of disclosed values.

All quantitative nodes are inferred ranges or null; this is a qualitative bridge.

[CI012, CI016, CI019, CI020, CI024, CI025]

FI003: Financial estimate range

Source-backed bounds on revenue, burn, runway, and margin.

Ranges are illustrative; lower bound is most conservative public datapoint and upper bound reflects 2x of most aggressive public datapoint.

[CI009, CI024, CI025, CI026, CI027]

4.4 Exhibits

Chapter 05

05Product & Technology

5.1 Product surface, modules, and SKUs

Together AI exposes a single platform with serverless inference, dedicated endpoints, fine-tuning, batch inference, embeddings, and modality-specific APIs (vision, audio, image). The product surface is documented at docs.together.ai and is OpenAI-compatible at the chat-completions level, making migration from closed APIs straightforward. The model catalog spans 200+ open models including Llama 3/4, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena, and custom fine-tuned checkpoints; published model and SKU references confirm per-token and per-request billing surfaces. Dedicated endpoints offer reserved capacity on H100/H200/B200 GPUs for latency-sensitive workloads and are quoted via sales. The fine-tuning API accepts LoRA and full-parameter training jobs across most supported families. Batch inference offers up to a 50% discount versus serverless with documented SLA windows. SDKs ship in Python and TypeScript, with raw HTTP for any other runtime; rate-limit documentation distinguishes free, paid, and enterprise tiers. The complete product module / asset matrix below enumerates each module, its primary user, maturity status, differentiation, and the gap a buyer should probe before committing to a long-term contract. Module ordering follows the buyer's typical adoption sequence: serverless first for experimentation, then dedicated and fine-tuning for production, then batch and embeddings for scaled workflows.[CE001, CE002, CE003, CE004, CE005, CE006]

Product module / asset matrix
Module	User	Status/maturity	Differentiation	Diligence gap
Serverless inference API	Developers, startups	GA	OpenAI-compatible chat completions on 200+ open models	SLA % not published
Dedicated endpoints	Enterprise	GA	Reserved H100/H200/B200 capacity, BAA available	List pricing not published
Fine-tuning API	ML engineers	GA	LoRA + full-parameter on Llama/Mistral/Qwen	Training cost transparency
Batch inference	ML engineers	GA (2025 update)	50% discount vs serverless	Realised batch utilisation undisclosed
Embeddings API	Developers	GA	Multiple open embedding models	Per-model retention tracking
Vision / image / audio APIs	Multi-modal devs	GA	Llama-Vision, image generation, audio transcription	Regional availability map
Together Inference Engine (TIE v1/v2)	Internal / advanced	GA	FA-3/4 + TK + speculative decoding	Engine-version SLA differences
Mixture-of-Agents	Researchers, advanced devs	beta	Ensemble inference for higher quality	Cost premium vs single-model
Model store	All users	GA	200+ open + custom weights	Catalog churn cadence
SDKs (Python, TS, HTTP)	Developers	GA	OpenAI-compat + native	SDK release cadence

Maturity follows public docs status; cells marked beta or limited reflect explicit docs statements at runDate.

[CE001, CE002, CE003, CE004, CE005, CE006]

Workflow / use-case table
User job	Current workflow	Together solution	Measurable benefit	Limitation
Try an open model	Local llama.cpp or HF Spaces	Serverless API call	Zero infra, OpenAI-compat	Cost at scale
Move production from closed API	OpenAI SDK	Swap base URL to Together	Same SDK, open weights	Feature parity edges
Fine-tune a Llama variant	Custom GPU cluster	Fine-tune API + run job	No DevOps needed	Limited training-step visibility
Serve a low-latency app	Self-hosted vLLM	Dedicated endpoint	Reserved capacity, BAA	Higher commit
Run nightly batch summarisation	Self-hosted batch	Batch inference SKU	50% cheaper than serverless	Batch SLA window
Build an agent	LangChain + closed API	Function-calling + JSON mode + structured output	Open-weight + tool use	Tool-call patterns evolving
Generate embeddings	HF embedding models locally	Embeddings API	Hosted, scalable	Re-index cost
Multi-modal (vision)	Self-host Llama-Vision	Vision API	Hosted vision call	Image-size limits
Research ensemble	Paper-replication code	MoA API	Out-of-box ensemble	Higher per-query cost
Run a regulated workload	On-prem GPU	Dedicated + BAA	HIPAA on dedicated	No FedRAMP yet

Workflow rows are drawn from docs quickstarts and customer case studies; limitations are explicit docs caveats or known gaps.

[CE001, CE002, CE003, CE004, CE005, CE006]

FE001: Product architecture map

Together AI product stack layers from API down to GPU substrate.

[CE001, CE011, CE012, CE013, CE014, CE015]

5.2 Architecture, dependencies, and operating model

Together's architecture stacks application APIs (chat, completions, embeddings, fine-tune, batch) over a model registry and inference orchestrator that schedules GPU pods on a multi-region NVIDIA Hopper/Blackwell fleet. The inference engine (Together Inference Engine v1/v2) wraps FlashAttention-3 and FlashAttention-4 attention kernels, ThunderKittens kernel framework, and speculative-decoding/Medusa decoders to achieve published throughput and latency claims. Mixture-of-Agents (MoA) research enables ensemble inference for higher-quality completions on supported models. The model store is backed by HuggingFace and Together's own registry; weight portability is a stated design principle. Critical dependencies include NVIDIA GPU supply (Hopper/Blackwell), data-center co-location partners, the HuggingFace catalog for model artefacts, and AWS S3/equivalent storage for fine-tune artefacts. The operating model splits a kernel/inference engineering team (Tri Dao, HazyResearch lineage) from a platform/SRE team (Alon Gavrielov-led infrastructure org from 2025) and a research arm (Chris Ré, Percy Liang). The architecture is exposed through a flow figure (customer request to GPU pod to response) and a critical-dependency DAG that surfaces single-vendor concentrations. Reliability proof points are a status page, published rate-limit documentation, and a published roadmap of model launches at GTC 2025 and AI Native Conference 2025. Gaps include a public SLA percentage, the precise multi-region map (which regions, which providers), and a single-source-of-truth roadmap; all are flagged as evidence gaps.[CE011, CE012, CE013, CE014, CE015, CE016]

Technology / operating architecture table
Layer/component	Role	Key dependency	Risk
API gateway	Receive OpenAI-compat HTTP requests	Auth + rate limit infra	DDOS, rate-limit miscalibration
Model registry	Resolve model id to weights	HuggingFace + internal storage	Weight churn, license updates
Inference scheduler	Place request on GPU pod	GPU pool, kube/orchestrator	Hot-spotting, queue depth
Together Inference Engine v2	Kernel-optimised model execution	FA-3/4, ThunderKittens, speculative decoding	Engine bug, regression on new model
GPU pool (Hopper / Blackwell)	Compute substrate	NVIDIA supply, co-lo partners	Supply shock, power outage
Fine-tune trainer	LoRA / full-parameter training jobs	GPU pool + object storage	Job-failure cost
Batch queue	Schedule batch inference	GPU off-peak window	SLA violation if peak overlaps
Embedding service	Embed text/images	Embedding model registry	Model deprecation
Vision/audio path	Multi-modal inference	Separate model stack	Mode-specific bugs
Observability / status	SLA monitoring	status.together.ai feed	Public SLA still missing
Trust / compliance	SOC 2 + HIPAA controls	Audit cadence	FedRAMP not yet GA
Storage (fine-tune artefacts)	Persist trained models	S3-equivalent storage	Loss/leak scenarios

Architecture layers reflect documented surfaces; depth of each layer is inferred from blog + research papers and may not be exhaustive.

[CE011, CE012, CE013, CE014, CE015, CE016]

FE002: Customer workflow / operating flow

Customer request through Together platform to a completion.

[CE011, CE012, CE013, CE014, CE015, CE016]

FE003: Critical dependency map

Suppliers, platforms, and partners Together depends on.

[CE014, CE018, CE019, CE020, CE021, CE022]

5.3 Trust, security, compliance, and roadmap

Together publishes a trust center referencing SOC 2 Type II attestation, HIPAA business associate agreement (BAA) availability on dedicated endpoints, and standard data processing terms. FedRAMP and similar US-Federal accreditation are not yet listed at the runDate; regional residency is offered through dedicated clusters but the public map is partial. Safety controls span content moderation, function-calling JSON validation, structured-output JSON mode, and per-model safety guidance. The roadmap, mined across the blog and AI Native Conference posts, includes Blackwell (B200) capacity ramp, batch inference SKU expansion, expanded fine-tune families, multi-modal (vision+audio) coverage, and Mixture-of-Agents productisation. Differentiation rests on (a) kernel-level performance lead (FA-3/4, TK), (b) breadth of open-weight model coverage, (c) flexibility across serverless/dedicated/batch SKUs, and (d) dual research-and-engineering culture with deep Stanford/Princeton/ETH lineage. Public developer signal — GitHub repo activity, PyPI download trajectory, HuggingFace model hub presence, and Hacker News thread engagement — confirms an active developer community without yet matching the scale of OpenAI or Hugging Face itself. Compared with hyperscaler offerings, Together's differentiation is most visible on open-weight neutrality and kernel performance and least visible on enterprise compliance breadth. The trust/compliance and roadmap tables below summarise each control and milestone with its current status, scope, and gap; cells marked unknown reflect missing public disclosure rather than absence of the underlying capability.[CE023, CE024, CE025, CE026, CE027, CE028]

Trust / quality / compliance table
Control / certification	Status	Scope	Gap
SOC 2 Type II	attested	platform	Need recent attestation date
HIPAA / BAA	available	dedicated endpoints	Not on serverless tier
GDPR / DPA	available	EU customers	Specific regional residency
FedRAMP	not yet	US Federal	Roadmap timing not confirmed
ISO 27001	not confirmed	—	Status uncertain
Data residency / regional cluster	partial	EU, US	Public region map limited
Content moderation / safety	documented	API-level	Per-model behaviour differs
Function calling / JSON mode	GA	API	Tool-use patterns evolving
Structured output	GA	API	—
Audit logs	documented	enterprise	Default not enabled
Custom model weights privacy	documented	dedicated	Need contract review
Bug bounty / responsible disclosure	published	platform	—

Controls cross-verified against trust.together.ai pages, blog posts, and public docs; cells marked "not confirmed" reflect absent public disclosure rather than absence of the underlying control.

[CE023, CE024, CE025, CE026, CE027, CE028]

Roadmap / release / development-stage table
Date / stage	Feature / milestone	Status	Implication	Source
2024-07	FlashAttention-3	GA	Kernel lead on Hopper	arXiv 2407.08608
2024-10	ThunderKittens	GA	Kernel framework	Together blog
2024-11	Startup Accelerator	Launched	GTM channel	Together blog
2025-03	GTC 2025 Pioneers	event	Customer + NVIDIA visibility	Together blog
2025-04	Alon Gavrielov as VP Infra	hired	Operating scale	Together blog
2025-05	Adaption partnership	Launched	Healthcare workflow	Together blog
2025-06	AI Native Conference	event	Research + product announcements	Together blog
2025-08	FlashAttention-4	GA	Next-gen kernel	Together blog
2025-09	Batch inference API updates	GA	50% discount + SLA	Together blog
2026-Q1	Blackwell (B200) rollout	planned	Capacity & price	Inferred from docs
2026	Expanded MoA productisation	planned	Quality tier	AI Native Conference
2026	Multi-modal expansion	planned	Vision+audio coverage	Together blog

Roadmap items beyond runDate are explicitly marked planned; sources include blog posts and conference announcements.

[CE033, CE034, CE035, CE036, CE037]

FE004: Product maturity / capability map

Maturity rating across product modules.

[CE001, CE002, CE003, CE004, CE005, CE006]

5.4 Exhibits

Chapter 06

06Customers

6.1 Customer segmentation and adoption surface

Together AI's customer base is segmented by buyer/user role and by deployment intensity. The top of the funnel is self-serve developers using serverless inference for prototyping or low-volume production: per company disclosure, more than 100,000 developers have used the platform since GA. Beneath that sit named startup customers — Pika (video), Arcee (open-source merging), Nous Research (community models), Cartesia (voice) — who run production workloads via a mix of serverless and dedicated endpoints. The enterprise tier is anchored by Salesforce (referenced via Salesforce Ventures co-sell and a customer case study), Zoom (customer case study), and Washington University (research deployment); the NVIDIA GTC 2025 Pioneers programme surfaced an additional cohort of customers including healthcare, robotics, and developer-tools companies. The Startup Accelerator (launched 2024-11) is an explicit funnel for early-stage AI startups, providing credits, technical support, and GTM amplification. Geographic mix is North America-skewed with EU presence growing through dedicated clusters; vertical mix spans developer tools, content/media (video, voice, image), enterprise SaaS, healthcare, and academia. Payer/user/buyer split varies by tier: in self-serve the developer is both buyer and user; in enterprise the buyer is typically a CTO/CIO or platform-engineering lead while the users are application teams. Customer segmentation, adoption-trajectory and named-customer-proof tables below capture each row's evidence quality and the residual gap on retention and concentration.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
Segment	Buyer/user/payer	Use case	Scale	Revenue / strategic value	Gap
Self-serve developers	Developer = buyer + user	Prototyping, low-volume production	100,000+ devs (company claim)	Long-tail revenue + funnel	Paid vs free not split
AI-native startups	CTO/founder	Production inference	Pika, Cartesia, Nous, Arcee documented	High strategic value	No revenue values disclosed
Enterprise SaaS	CIO/platform eng	Embedded AI features	Salesforce, Zoom	Large strategic value	Contract sizes not disclosed
Healthcare	CIO/clinical lead	Regulated workflows (BAA)	Adaption (2025 launch)	Strategic	Production status TBD
Academia / research	PI / IT lead	Research compute	Washington University	Brand value	Spend size not disclosed
Developer tools	Founder/CTO	Embedded inference	GTC 2025 cohort	Pipeline	Cohort not enumerated
Sovereign / govt	Procurement	Sovereign cloud	Prosperity7-aligned (implied)	Strategic optionality	No public proof
Open-source community	Maintainers	OSS model serving	HuggingFace mirror integration	Brand + community	Active vs passive use

Segmentation rows mix named case studies with inferred categories; revenue-band values are unavailable.

[CU001, CU002, CU003, CU004, CU005, CU006]

Customer growth / adoption trajectory table
Metric	Value	Date	Source	Confidence	Implication	Missing denominator
Developers using platform	100,000+	2024	Together blog	low	Top-of-funnel scale	Paid vs free split
Named customer case studies	7+ published	2024-25	Together blog	high	Real production usage	Total customer count
GTC 2025 customer cohort	~12 pioneers	2025-03	Together blog + NVIDIA	medium	Enterprise pipeline	Per-customer ACV
Startup Accelerator participants	undisclosed N	2024-11 onwards	Together blog	low	Pipeline lever	Cohort size
Adaption healthcare partner	1 (launched)	2025	Together blog	medium	Regulated entry	Production status
HuggingFace integration users	undisclosed	2024-25	HF blog	low	Open-source pull	Active developers
G2 reviews	very small N	2025	G2	low	Independent proof	Volume too low to be representative
Trustpilot reviews	very small N	2025	Trustpilot	low	Independent proof	Volume too low to be representative

Trajectory rows mix company-claimed (low confidence) and third-party-reported numbers; denominators are explicitly listed as missing.

[CU001, CU002, CU011, CU012, CU013, CU014]

FU001: Customer journey map

Self-serve developer to enterprise expansion path.

[CU001, CU002, CU003, CU004, CU005, CU006]

FU002: Adoption / deployment funnel

Stage-by-stage developer-to-enterprise conversion.

Awareness, active-paid, and multi-year counts are illustrative placeholders; only signup and named counts are sourced.

[CU001, CU002, CU011, CU012, CU013, CU014]

6.2 Named-customer proof and durability

Named-customer proof spans seven public case studies (Salesforce, Zoom, Pika, Arcee, Nous Research, Cartesia, Washington University) plus the GTC 2025 Pioneers cohort and the Adaption healthcare partnership. Each case study documents the customer's workflow, model used, and qualitative outcome; quantitative outcomes (throughput, latency, cost, ROI) are documented for some but not all deployments. The most-cited outcomes are FlashAttention-driven latency reduction (Pika, Cartesia), cost reduction versus closed APIs (Arcee, Nous), and integration depth (Salesforce, Zoom). Production vs pilot is explicit for Salesforce, Zoom, Pika, Cartesia (production); Adaption is described as a launching partnership rather than a confirmed production deployment. Adverse and durability signals are mixed: G2 and Trustpilot review counts are low, limiting independent retention proxy; Reddit and Hacker News threads occasionally cite latency or cold-start concerns on serverless tier; no public churn announcement or terminated-customer report has been published. The customer proof matrix below tags each named customer with evidence quality, outcome specificity, retention visibility, and production maturity. Retention and repeat-usage primitives (NRR, GRR, gross retention) are not disclosed, and the chapter records that gap as a material evidence gap with a concrete diligence ask. Reference-quality and freshness are best for the 2024-2025 case studies (Salesforce, Zoom, Pika) and weaker for older case studies that have not been updated in 2026.[CU012, CU013, CU014, CU015, CU016, CU017]

Named customer proof table
Customer	Segment	Deployment / use case	Production vs pilot	Outcome	Limitation
Salesforce	Enterprise SaaS	Co-sell + embedded inference	production	Integration depth + Series B lead	Contract value not disclosed
Zoom	Enterprise SaaS	AI feature inference	production	Latency improvement	Specific metrics not public
Pika	Startup (video)	Video model serving	production	Latency reduction via FA-class kernels	Cost benefit qualitative
Cartesia	Startup (voice)	Voice model serving	production	Throughput on dedicated	Pricing not disclosed
Arcee	Startup (open-source)	Model merging + inference	production	Cost vs closed APIs	Volume not disclosed
Nous Research	Open-source community	Community model hosting	production	Open-weight neutrality	Revenue mix not disclosed
Washington University	Academia	Research compute	production	Research throughput	Spend size not disclosed
Adaption	Healthcare	Regulated workflow	launching	Healthcare entry	Production status TBD
GTC 2025 Pioneers cohort	Enterprise mix	Various	production	NVIDIA + Together joint	Cohort not fully enumerated

Rows reflect publicly named customers with case-study or press evidence; private named customers (if any) are not in this table.

[CU012, CU013, CU014, CU015, CU016, CU017]

Retention / repeat usage / satisfaction table
Metric	Value/null	Segment	Confidence	Diligence ask
NRR	null	enterprise	low	Request NRR by cohort
GRR	null	enterprise	low	Request gross retention by cohort
Logo churn	null	enterprise	low	Request named-account churn list
Active developers (paid)	null	self-serve	low	Request paid-developer count
Repeat purchase rate	null	self-serve	low	Request cohort repeat rate
G2 average rating	very small N	self-serve	low	Cannot extrapolate from small N
Trustpilot average rating	very small N	self-serve	low	Cannot extrapolate from small N
Reddit/HN sentiment	mixed-to-positive	community	low	Aggregate qualitative scan
Named-customer renewals	null	enterprise	low	Confirm via reference calls
Dedicated-endpoint renewal rate	null	enterprise	low	Request renewal cohort

All retention primitives are null and accompanied by a specific diligence ask.

[CU022, CU023, CU024, CU025, CU026]

FU003: Customer proof matrix

Evidence quality across named customers; rows pivot per-customer evidence axes complementing the named-customer-proof table.

[CU012, CU013, CU014, CU015, CU016, CU017]

6.3 Expansion, concentration, and adverse signals

Expansion proxies are mostly qualitative. The Salesforce Ventures co-sell relationship is the principal enterprise expansion lever, with the Series B led by Salesforce Ventures interpreted by the market as a multi-year channel commitment; NVIDIA GTC 2025 Pioneers and the Startup Accelerator add brand and pipeline. The HuggingFace partnership funnels developers from the model hub into Together. Concentration risk is impossible to bound precisely without management disclosure, but the public customer mix is skewed toward AI-native startups and developer-tools companies rather than a small number of mega-enterprise contracts, which suggests broader top-of-funnel diversification than e.g. an OpenAI-style anchor-customer model. Channel and procurement friction is documented on the dedicated tier: enterprise sales cycles require sales engagement, custom MSAs, and security review, which adds 60-120 days before revenue. Adverse signals include scattered Reddit and Hacker News threads citing latency, cold-start, or occasional reliability events on the serverless tier; the company maintains a public status page but does not publish an SLA percentage. No public lawsuit, lost-customer report, or named-account churn has surfaced through the runDate. The expansion-and-concentration table below records each expansion driver, concentration risk, impact magnitude, and the precise diligence path required to close the residual gap; the chapter's retention table treats every undisclosed primitive as a diligence ask rather than asserting a number that cannot be sourced. Overall the customer evidence base is consistent with a growth-stage inference platform building real enterprise traction on top of a strong self-serve developer flywheel.[CU027, CU028, CU029, CU030, CU031, CU032]

Expansion and concentration risk table
Expansion driver	Concentration risk	Impact	Diligence path
Salesforce Ventures co-sell	Salesforce concentration in enterprise wins	high	Quantify pipeline % from Salesforce
NVIDIA GTC Pioneers	NVIDIA referral concentration	medium	Quantify GTC-sourced ACV
Startup Accelerator	Long-tail dilution risk	low	Track cohort revenue conversion
HuggingFace partnership	HF dependence for funnel	medium	Confirm cross-promote terms
Self-serve developer growth	Long-tail churn risk	low	Cohort retention by month
Adaption healthcare entry	Single named partner risk	medium	Track follow-on healthcare wins
Sovereign / Prosperity7 channel	Sovereign concentration if materialises	medium	Confirm pipeline commits
Open-source community	Brand dependence on OSS pull	low	Track GH/HF/PyPI signal stability
Top-10 customer concentration	Material if undisclosed	high	Request top-10 anonymised
Geographic concentration	NA-heavy	medium	Request regional revenue split

Expansion drivers and concentration risks ranked qualitatively in absence of disclosed customer-revenue breakdown.

[CU027, CU028, CU029, CU030, CU031, CU032]

FU004: Retention / repeat cohort

Time-series retention placeholder using sector-typical PLG SaaS proxy values; all numbers illustrative pending Together disclosure.

All retention cells are illustrative sector benchmarks (PLG SaaS / inference); Together has not disclosed actual cohort retention.

[CU022, CU023, CU024, CU025, CU026]

6.4 Exhibits

Chapter 07

07Risks

7.1 Regulatory and legal risk surface

Together AI faces the same Generative-AI regulatory perimeter as all foundation-model platforms operating in the United States and Europe. In the US, the FTC opened a 6(b) study into generative-AI investments and partnerships in 2024 and has signalled broad antitrust scrutiny of cloud-AI relationships; the Biden / Trump-era Executive Order on AI established a foundation for federal AI standards that the NIST AI Risk Management Framework operationalises. The BIS has tightened export controls on advanced GPUs (A100, H100, H200, B200) and on the export of certain foundation-model weights, directly relevant to a GPU-cloud operator. In the EU, the AI Act entered into force in 2024 with phased obligations for general-purpose AI providers culminating in 2026-2027; the UK ICO and Australian OAIC have published GenAI guidance that creates de-facto compliance baselines. Privacy regimes (CCPA in California, HIPAA for healthcare workloads) impose contract-level obligations Together discharges via BAAs and SOC 2 controls referenced on its trust center. On the litigation side, the NYT v Microsoft/OpenAI docket, Authors Guild v OpenAI, and Getty v Stability AI are the bellwether copyright cases whose outcomes will shape exposure for every model-hosting platform; Together itself is not currently a named defendant but its open-model hosting business carries adjacent exposure if precedent extends to platform-as-host. Civil-society pressure (CDT, EFF) adds reputational risk. The regulatory and legal risk register below ranks each line item by jurisdiction, likelihood, severity, mitigation, and residual exposure, with diligence asks for every undisclosed control.[CR001, CR002, CR003, CR004, CR005, CR006]

Regulatory / legal risk register
Rule / case	Jurisdiction	Status	Likelihood	Severity	Mitigation	Residual exposure
FTC 6(b) generative-AI inquiry	US	ongoing	high	medium	engage counsel, monitor	possible behavioural remedies
FTC general AI enforcement	US	active	medium	medium	standard advertising/competition compliance	enforcement action
EU AI Act (GPAI)	EU	phased 2024-27	high	high	GPAI obligations, transparency, copyright opt-out	non-compliance penalties up to 7% revenue
BIS export controls (GPUs + weights)	US/global	tightened 2025	high	high	geo-fence customers, screening	blocked sovereign deployments
NIST AI RMF	US	voluntary	medium	low	adopt framework controls	procurement disadvantage if absent
UK ICO GenAI guidance	UK	active	medium	medium	UK DPA + GDPR posture	enforcement exposure
Australia OAIC GenAI guide	AU	active	low	low	adopt guidance	enforcement exposure
White House EO on AI	US	active	medium	medium	reporting thresholds	reporting burden
CCPA (California)	US-CA	active	medium	medium	privacy controls	enforcement exposure
HIPAA (healthcare workloads)	US	active	medium	high	BAA, dedicated tier	breach + fines
SOC 2 attestation surface	global	self-declared on trust center	medium	medium	SOC 2 Type II evidence	attestation gap if expired
NYT v Microsoft/OpenAI (copyright)	US	active litigation	high	medium	monitor; platform-host distinction	precedent extension risk
Authors Guild v OpenAI	US	active litigation	high	medium	monitor; platform-host distinction	precedent extension risk
Getty v Stability AI	US/UK	active litigation	medium	medium	monitor; image-model adjacency	precedent extension risk
CDT AI policy pressure	US	active	low	low	engagement, transparency	reputational

Each row reflects the rule/case posture at runDate; ratings are qualitative pending management disclosure.

[CR001, CR002, CR003, CR004, CR005, CR006]

FR001: Risk heatmap

Likelihood × severity heatmap across top risks.

[CR001, CR003, CR004, CR012, CR018, CR021]

7.2 Operational, security, partner, and dependency risk

Operational risk for Together centres on three vectors: GPU capacity availability (Hopper and Blackwell), model-serving reliability, and regulated-workload controls. NVIDIA is the dominant single-vendor dependency — GPUs, networking (NVLink, InfiniBand), and software stack (CUDA, TensorRT, NeMo, Dynamo) — and is also a strategic investor, which both reduces supply-allocation risk and concentrates correlated downside if Blackwell allocation tightens. HuggingFace is the primary model-artefact dependency; partner risk would emerge if HF changes hosting terms or commercial alignment. Salesforce Ventures is the lead enterprise channel partner via the Series B; channel concentration risk is non-trivial. Security exposure spans the standard model-cloud surface (prompt injection, data exfiltration, prompt-logging leakage, supply-chain compromise of model weights) and the SOC 2 / HIPAA control surface that Together publishes via its trust center. The public status page exists but does not publish an SLA percentage. Competitive displacement risk is real: Fireworks, Replicate, Modal, Anyscale, Cerebras, and Groq all serve overlapping workloads, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI) bundle inference into existing enterprise contracts. People and execution risk includes key-person dependency on Vipul Ved Prakash (CEO), Ce Zhang (CTO), and Tri Dao (Chief Scientist), and the build-out velocity required to keep pace with Hopper→Blackwell→Rubin cadence. The operational, partner, and people registers below capture each failure mode, mitigation maturity, and residual exposure with explicit diligence paths.[CR018, CR019, CR020, CR021, CR022, CR023]

Operational / quality / security risk register
Failure mode	Likelihood	Severity	Mitigation maturity	Residual exposure	Unresolved gap
Serverless multi-hour outage	medium	medium	status page; no SLA %	customer churn	SLA disclosure
Dedicated-endpoint hardware failure	low	medium	redundancy implied	revenue at risk	reliability metrics
Prompt injection / data exfiltration	medium	medium	safety models, function-calling guardrails	customer breach	pen-test cadence undisclosed
Model-weight supply-chain compromise	low	high	HF integrity checks	platform-wide compromise	weight signing process undisclosed
SOC 2 attestation lapse	low	medium	trust center publishes posture	enterprise-deal block	expiry date undisclosed
HIPAA BAA breach	low	high	BAA available	regulatory fines	breach plan undisclosed
GPU capacity shortfall	medium	high	NVIDIA partnership	revenue cap	allocation commit undisclosed
Network / inter-zone failure	low	medium	multi-region implied	latency spike	region map undisclosed
Insider threat	low	medium	standard controls	data leak	access controls undisclosed
Software bug introducing regression	medium	low	staged rollout implied	reputation	release cadence undisclosed

Operational ratings are qualitative; multiple control primitives are undisclosed and treated as diligence asks.

[CR018, CR019, CR020, CR021, CR022, CR028]

Partner / dependency risk register
Dependency	Counterparty	Role	Concentration	Failure scenario	Severity	Mitigation	Residual exposure
GPU supply	NVIDIA	primary supplier + investor	very high	Blackwell allocation cut	high	strategic investor; multi-gen commit	revenue cap
Model artefacts	HuggingFace	registry + distribution	high	hosting policy change	medium	company self-host fallback	distribution friction
Enterprise channel	Salesforce	co-sell + investor	medium	co-sell deprioritisation	medium	direct sales build-out	pipeline shrink
Datacenter capacity	Multiple (undisclosed)	colo + hyperscaler	medium	single-region capacity loss	medium	multi-region build	latency / cost
Network	Multiple	transit + IX	low	peering loss	low	multi-carrier	transient latency
Open-source community	Llama, Mistral, Qwen, DeepSeek maintainers	model upstreams	medium	license change	medium	model diversity	licensing review burden
Capital partners	GC / Salesforce / NVIDIA / Lux / Coatue / Prosperity7 / Kleiner	investors	medium	round oversubscription failure	medium	revenue traction	financing risk
Sovereign partners	Prosperity7 (KSA-adjacent)	strategic investor	low	geo-political pressure	medium	disclosure posture	reputational

Dependency ratings reflect public concentration only; private contractual commits remain a diligence ask.

[CR023, CR024, CR025, CR026, CR027, CR030]

People / execution risk register
Role / function	Dependency or gap	Likelihood	Severity	Mitigation	Diligence path
CEO Vipul Ved Prakash	founder-led; key-person dependency	low	high	founder retention	reference checks
CTO Ce Zhang	key-person dependency	low	high	retention	reference checks
Chief Scientist Tri Dao	key-person; brand-defining	low	high	academic dual-affiliation	retention plan
VP Infra Alon Gavrielov	new hire (2025)	low	medium	recent join	onboarding review
CFO	undisclosed at runDate	medium	medium	recruiting in progress (inferred)	confirm hire
CRO / sales leader	undisclosed at runDate	medium	medium	enterprise build-out	confirm hire
Engineering bench	growing post-Series B	medium	medium	hiring momentum	headcount disclosure
Compliance / GRC	SOC 2 referenced; team size undisclosed	medium	medium	attestation evidence	team size confirm
Board composition	GC + SVP + NVIDIA + founders	medium	medium	growth-stage governance	board minutes diligence
Hopper→Blackwell→Rubin transition execution	multi-quarter build-out	medium	high	partnership with NVIDIA	program plan diligence

People register includes both named individuals and undisclosed roles; CFO/CRO confirmations are explicit diligence asks.

[CR032, CR033, CR034, CR035]

FR002: Risk transmission map

How risks flow into revenue, margin, financing, and valuation.

[CR001, CR003, CR004, CR012, CR023, CR024]

7.3 Mitigations, kill criteria, and thesis-break triggers

The mitigation-and-kill-criteria table below pairs every top risk with a monitorable trigger, an explicit threshold or event, and the action implication if the trigger fires. Triggers span regulatory (e.g. EU AI Act GPAI obligation enforcement in 2027), litigation (e.g. adverse copyright ruling that extends to platform hosts), partner (e.g. NVIDIA allocation cut or HuggingFace hosting change), operational (e.g. multi-hour serverless outage, breach disclosure), competitive (e.g. hyperscaler bundled-inference pricing undercut), commercial (e.g. Salesforce co-sell churn), and execution (e.g. founder departure, missed Blackwell go-live). For each trigger the table records the transmission path into revenue, margin, financing, or valuation, and the action implication (kill, re-underwrite, monitor, accept). The chapter is explicit that several primitives — incident count, SLA, top-10 customer concentration, retention, GPU committed-spend, opex split — are undisclosed and therefore treated as diligence asks rather than asserting numbers that cannot be sourced. Adverse-source coverage is wide: regulatory bodies (FTC, BIS, EU, UK ICO, OAIC), legal dockets (CourtListener: NYT, Authors Guild, Getty), competitor websites (Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda), and developer-sentiment fora (Hacker News, Reddit). The chapter underwrites that Together's public risk surface is normal for a growth-stage AI infrastructure company with a healthy mitigation posture, but several control primitives remain unverified pending management disclosure.[CR034, CR035, CR036, CR037, CR038, CR039]

Mitigation and kill criteria table
Risk	Monitorable trigger	Threshold / event	Action implication
EU AI Act GPAI	enforcement notice	first 7% fine on a peer	re-underwrite EU revenue
BIS export tightening	new entity-list rule	additional GPU export class added	re-underwrite sovereign pipeline
Copyright litigation extension	platform-host ruling	any host-liability ruling	re-underwrite OSS hosting
NVIDIA allocation	Blackwell allocation cut	published cut to a comparable peer	re-underwrite capacity ramp
HuggingFace policy change	HF terms update	material commercial change	build self-host
Serverless outage	multi-hour incident	>4h or repeated >1h	SLA review + customer comms
Security breach	disclosure event	any reportable incident	immediate re-underwrite
Customer concentration	top-10 share	single customer >25%	concentration discount
Founder departure	public announcement	any of CEO/CTO/CSO	kill or major re-underwrite
Down-round	new financing	flat or down vs Series B	re-underwrite valuation

Triggers are monitorable from public disclosure; the table is the chapter's actionable kill-criteria contract.

[CR034, CR035, CR036, CR037, CR038, CR039]

FR003: Dependency map

Critical partners, suppliers, regulators, and financing dependencies.

[CR023, CR024, CR025, CR026, CR027, CR030]

7.4 Exhibits

Chapter 08

08Valuation

8.1 Recommendation, thesis, and anti-thesis

The recommendation is Hold/Monitor with medium confidence and a medium-high risk rating. Investment thesis: Together AI sits at a structurally attractive intersection of (a) the GenAI inference market expanding 40-60% CAGR per analyst-market-data sources (Gartner, Forrester, IDC, a16z, Bessemer, Menlo), (b) a credible technical moat through FlashAttention authorship (Tri Dao), ThunderKittens kernels (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation, and (c) an enterprise distribution channel anchored by Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, and the Startup Accelerator funnel. Anti-thesis: the inference layer is contested by Fireworks, Replicate, Modal, Anyscale, Cerebras, Groq, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI Service) who bundle inference into existing enterprise contracts; revenue (reported $130M-$200M+ ARR per The Information) and retention primitives remain undisclosed; valuation at the Series B mark (~$3.3B-$3.5B) requires multi-year revenue scale to underwrite a 3-5x exit; and the regulatory perimeter (EU AI Act, BIS, copyright litigation precedent) is tightening through 2027. The valuation chapter records each of these as an explicit thesis-break trigger and pairs it with a monitorable threshold and an action implication. The recommendation summary table below pairs recommendation, confidence, risk rating, valuation stance, and decision implication; the thesis / anti-thesis table records the underlying arguments and what would change the view.[CV001, CV002, CV003, CV004, CV005, CV006]

Recommendation summary table
Recommendation	Confidence	Risk rating	Valuation stance	Decision implication
Hold / Monitor	medium	medium-high	at-or-near current Series B mark	Track ARR + NRR + concentration; revisit at Series C
Buy (conditional)	medium	medium	25% correction OR confirmed >$500M ARR	Enter on confirmed traction or down-round
Pass (conditional)	medium	high	if hyperscaler pricing cut >40% OR NVIDIA allocation cut OR breach	Exit / decline if bear trigger fires
Bull case	low	medium	>$8B exit by 2028	Strategic-acquisition or premium IPO path
Base case	medium	medium	$4B-$6B exit by 2028	ARR scale + margin expansion
Bear case	medium	high	$1B-$2.5B outcome	Down-round / compressed exit

Recommendation is conditional on the trigger thresholds in the thesis-break table.

[CV001, CV002, CV003, CV004, CV005]

Thesis / anti-thesis table
Argument	What would change the view
GenAI inference TAM growing 40-60% CAGR per analyst sources	TAM revisions <20% CAGR
FlashAttention + ThunderKittens + TIE v2 form a credible technical moat	Open-source / hyperscaler kernel parity erodes Together edge
Salesforce Ventures-led Series B implies multi-year channel commitment	Salesforce co-sell deprioritisation or churn
NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline	NVIDIA reallocation to direct-managed offerings (DGX Cloud)
Open-source neutrality is a defensible positioning vs closed-API providers	Major OSS license changes (Llama, Mistral, Qwen, DeepSeek)
Documented enterprise + startup proof base (Salesforce, Zoom, Pika, Cartesia, Arcee)	Named-customer churn or production downgrade
Capital base + brand attract talent and customers	Down-round or failed Series C
Anti: hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure) compresses pricing	Hyperscaler retreats from bundled inference
Anti: GenAI copyright litigation could extend to platform hosts	Adverse precedent contained to model-trainer defendants
Anti: revenue + retention undisclosed; price-sensitive entry discipline required	Management discloses ARR + NRR

Thesis and anti-thesis are symmetric; the chapter is explicit on what evidence would flip the view.

[CV006, CV007, CV008, CV009, CV010, CV011]

FV001: Recommendation logic

Chain from scale, proof, risks, and valuation to the recommendation.

[CV001, CV002, CV003, CV004, CV005, CV006]

8.2 Scenarios, comparables, and sensitivity

Three scenarios anchor the valuation. Base case ($4B-$6B exit, ~50% probability) assumes ARR scales from current $130M-$200M to $500M-$700M over 2026-2028 with sustained gross margin in the 30-40% range typical of AI inference, modest dilution at a Series C, and Hopper→Blackwell capacity ramp on time. Bull case ($8B-$12B exit, ~25% probability) requires ARR >$1B by 2028, gross margin expansion through FlashAttention-driven utilisation, sustained Salesforce + NVIDIA channel commitment, and either a strategic acquisition (NVIDIA, hyperscaler, Salesforce) or a 2027-2028 IPO at premium multiples. Bear case ($1B-$2.5B outcome, ~25% probability) materialises if hyperscaler bundled inference compresses pricing, NVIDIA allocation tightens, or copyright precedent extends to platform hosts. The comparable-valuation table covers CoreWeave (post-IPO GPU-cloud comparable), Navan (recent S-1 SaaS comparable), Figma (S-1 comparable), and private rounds (Fireworks rumoured $4B, Replicate, Modal, Sakana, Mistral, Anthropic) plus public listings (NVIDIA, Snowflake) used as ceiling references. Sensitivity drivers are revenue growth, gross margin, NRR, exit multiple, and probability-weighted exit window. The bull/base/bear table and comparable-valuation table below capture each scenario's assumptions, valuation logic, and key sensitivity. The valuation-sensitivity bar figure and the valuation-range figure show downside, base, and upside vs the current Series B mark.[CV018, CV019, CV020, CV021, CV022, CV023]

Bull / base / bear scenario table
Scenario	Probability	ARR assumption	Gross margin	Exit multiple	Valuation/return logic	Key risks
Bull	25%	>$1B ARR by 2028	40-50%	12-15x ARR	$8B-$12B exit; strategic / premium IPO	Hyperscaler bundling; NVIDIA reallocation
Base	50%	$500M-$700M ARR by 2028	30-40%	8-10x ARR	$4B-$6B exit; trade-sale or IPO	Competitive pricing; retention drift
Bear	25%	$200M-$300M ARR by 2028	20-30%	5-7x ARR	$1B-$2.5B outcome; down-round	Hyperscaler price war; copyright precedent; NVIDIA allocation cut

Probabilities are subjective and chapter-internal; each row should be re-marked at Series C and at every major customer or regulatory event.

[CV015, CV016, CV017, CV018, CV019, CV020]

Comparable valuation table
Comparable	Metric	Multiple / valuation / status	Relevance	Limitation
CoreWeave (post-IPO, GPU-cloud)	EV / next-12-month revenue	8-12x at post-IPO	GPU-cloud closest comparable	CoreWeave revenue mix is GPU-bare-metal heavier
Navan (S-1, SaaS)	EV / NTM revenue	8-12x at filing	Growth-stage SaaS comparable	SaaS, not inference
Figma (S-1, SaaS)	EV / NTM revenue	12-15x at filing	High-multiple SaaS comparable	Design SaaS, not inference
Fireworks AI (rumoured 2024 round)	last private round	~$4B (rumoured)	Direct inference comparable	Round value rumoured
Replicate (private)	last private round	undisclosed	Direct inference comparable	Limited disclosure
Modal (private)	last private round	undisclosed	Serverless inference comparable	Limited disclosure
Anyscale (private)	last private round	$1B-$2B	Ray + inference comparable	Different positioning
Sakana AI (round)	last private round	~$1.5B (Aug 2024)	OSS model-builder comparable	Model lab not infra
Mistral (round)	last private round	$6B (mid-2024)	OSS model lab comparable	Hybrid model + infra
Anthropic (round)	last private round	$60B+ (2025)	Closed-API comparable	Different model — not direct
NVIDIA (public)	EV / NTM revenue	high-teens to mid-20s	Ceiling reference	Far larger scale
Snowflake (public)	EV / NTM revenue	10-15x	SaaS ceiling reference	Mature SaaS

Comparable rows mix public and private valuations; private-round figures are taken from press reports and PitchBook.

[CV021, CV022, CV023, CV024, CV025, CV026]

FV002: Valuation sensitivity

Sensitivity of valuation outcome to revenue, margin, multiple, retention.

[CV018, CV019, CV020, CV021]

FV003: Valuation / return range

Low / base / high valuation range across scenarios at 2028 exit window.

[CV022, CV023, CV024, CV025, CV026, CV029]

8.3 Thesis-break triggers, diligence asks, and KPIs

The thesis-break and kill-triggers table converts the chapter's risk and valuation logic into monitorable triggers tied to specific events: (a) revenue miss vs $500M-$700M ARR run-rate by 2027-2028 → re-underwrite base case, (b) Salesforce co-sell deprioritisation → kill bull case, (c) NVIDIA Blackwell allocation cut → re-underwrite capacity ramp, (d) hyperscaler bundled inference price cut >40% → compression, (e) any platform-host copyright ruling → re-underwrite OSS hosting, (f) Series C at flat/down vs Series B → mark-to-market valuation, (g) founder departure → kill thesis, (h) breach disclosure or multi-hour outage → SLA + reputation re-underwrite. The final diligence asks table records the remaining missing primitives — exact ARR, NRR/GRR, top-10 customer concentration, GPU committed spend, opex split, CFO/CRO hires, sovereign-channel posture, paid-developer count — and maps each to an owner or diligence path. The investment-KPI figure consolidates IC-ready scoring across market, proof, moat, economics, risk, valuation, and evidence quality on a 0-100 scale. The chapter is explicit that the recommendation is price-sensitive and evidence-sensitive: at a Series B valuation in the $3.3B-$3.5B range with the disclosed evidence base, Hold/Monitor is the disciplined answer — Buy at a 25%+ correction or with confirmed >$500M ARR + >120% NRR; Pass if any of the bear-case triggers fires before the Series C.[CV034, CV035, CV036, CV037, CV038, CV039]

Thesis-break and kill triggers table
Trigger	Threshold	Transmission to thesis	Action implication
ARR run-rate vs base case	<$500M ARR by FY2027	revenue mark-down	re-underwrite base
Salesforce co-sell	public deprioritisation	channel mark-down	kill bull
NVIDIA allocation	published cut to peer	capacity mark-down	re-underwrite capacity
Hyperscaler bundled pricing	>40% cut on AWS Bedrock or peer	margin compression	re-underwrite base
Copyright precedent	platform-host ruling	OSS hosting mark-down	re-underwrite OSS revenue
Financing	Series C flat or down vs Series B	mark-to-market	re-underwrite valuation
Founder departure	any of CEO/CTO/CSO	execution mark-down	kill thesis
Security / outage	breach disclosure OR multi-hour outage	reputation + SLA	re-underwrite enterprise pipeline

Trigger thresholds are monitorable from public disclosure or peer comparables.

[CV033, CV034, CV035, CV036, CV037, CV038]

Final diligence asks table
Topic	Missing evidence	Why it matters	Owner / diligence path
Revenue	exact ARR at runDate	base/bull scenario underwriting	request management ARR + growth
Retention	NRR / GRR / cohort retention	quality of revenue	request retention by cohort
Concentration	top-10 customer share	single-event downside	request anonymised top-10
GPU commit	committed spend with NVIDIA	margin underwriting	request supplier commit
Opex split	R&D / S&M / G&A	burn underwriting	request income-statement split
CFO / CRO	presence + tenure	execution underwriting	confirm hires
Sovereign channel	Prosperity7 commit	geo + brand risk	confirm channel posture
Paid-developer count	paid vs free split	self-serve revenue underwriting	request paid-developer count
SOC 2 expiry	Type II expiry date	enterprise procurement	request attestation refresh
Open license posture	OSS hosting policy	copyright exposure	request hosting policy

All diligence asks map to chapter-internal questions and to the risks chapter mitigation table.

[CV040, CV041, CV042, CV043, CV044]

FV004: Investment KPIs

IC-ready scoring across market, proof, moat, economics, risk, valuation, evidence.

[CV040, CV041, CV042, CV043, CV044]

8.4 Exhibits

Disclaimer

This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.

Evidence index

Claims
ID	Statement	Confidence	Sources
CO001	Together AI markets itself as "the AI acceleration cloud" offering training, fine-tuning, and inference for open-source and custom models.	High	SO001, SO002
CO002	The corporate entity is Together Computer Inc., headquartered in San Francisco, California, with an additional research presence in Zurich.	High	SO002, SO004, SO003
CO003	Together was incorporated on 27 June 2022 by four co-founders: Vipul Ved Prakash, Ce Zhang, Chris Ré, and Percy Liang.	High	SO002, SO018
CO004	The company's public surface positions three product lines: serverless inference API, dedicated endpoints, and fine-tuning/training services.	High	SO001, SO035
CO005	Together emphasises that customers can keep weights and choose dedicated capacity, a deliberate contrast with closed-API providers.	Medium	SO001, SO005
CO006	CEO Vipul Ved Prakash previously co-founded Topsy, which Apple acquired for approximately $200M in 2013, and earlier co-founded Cloudmark.	High	SO018, SO002
CO007	CTO Ce Zhang is a tenured professor at ETH Zürich specialising in distributed ML and data-centric ML research.	High	SO002, SO018
CO008	Chief Scientist Chris Ré is a MacArthur Fellow at Stanford and a co-founder of Snorkel, anchoring much of Together's open-source research lineage.	High	SO002, SO011
CO009	Co-founder Percy Liang directs the Stanford Center for Research on Foundation Models (CRFM) and leads the HELM benchmark.	High	SO002, SO018
CO010	Princeton CS faculty member Tri Dao is the principal author of FlashAttention and is publicly identified as a Together chief scientist.	High	SO002, SO009, SO036
CO011	Together actively recruits across kernel engineering, GPU systems, applied ML, sales, and revenue operations roles as of May 2026.	High	SO003, SO018
CO012	Together raised a $20M Series Seed in May 2023 led by Lux Capital, with Factory, SciFi Capital, and Long Journey Ventures participating.	High	SO018, SO012
CO013	A $102.5M Series A closed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating.	High	SO006, SO014, SO018
CO014	An interim financing in March 2024 reportedly valued Together at approximately $1.25B.	Medium	SO015, SO018
CO015	Together closed a $305M Series B on 9 July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation.	High	SO012, SO013, SO016, SO017
CO016	Cumulative disclosed primary capital totals approximately $533M (seed + A + interim + B) before any 2025–2026 extensions.	Medium	SO012, SO006, SO018
CO017	No Together AI registration, S-1, or other public filing appears on SEC EDGAR as of the May 2026 run date.	High	SO027, SO019
CO018	NVIDIA participated as a strategic investor in both Series A and Series B financings, signalling H100/H200 supply alignment.	Medium	SO026, SO006, SO012
CO019	CNBC reported Together AI was running at an approximately $100M annualised revenue pace around the Series B announcement in July 2024.	Medium	SO012
CO020	Bloomberg cited triple-digit year-over-year revenue growth for Together AI at the time of the Series B, without disclosing absolute figures.	Medium	SO013
CO021	Together has publicly stated it operates more than 20,000 NVIDIA Hopper-class GPUs across its multi-region cluster.	Medium	SO012, SO005
CO022	The company describes its developer footprint as "hundreds of thousands" of developers, without disclosing paid versus free split.	Low	SO001, SO005
CO023	Together's public job board and LinkedIn footprint imply a headcount above 150 full-time staff globally as of May 2026.	Low	SO003, SO018
CO024	No audited gross margin, net revenue retention, or paid-customer disclosure exists for Together AI as of the run date.	High	SO027, SO019
CO025	Together AI launched OpenChatKit in March 2023 with LAION and Ontocord, an early open-source instruction-tuned chat baseline.	High	SO008, SO030
CO026	The RedPajama 1T token open dataset was released on 17 April 2023, intended to reproduce LLaMA-grade pretraining data.	High	SO007, SO029
CO027	FlashAttention-3 was published on arXiv and Together's blog on 11 July 2024, claiming state-of-the-art H100 attention performance.	High	SO036, SO009
CO028	StripedHyena-Nous-7B, a non-attention long-context architecture, was released in December 2023 in collaboration with Nous Research.	High	SO031, SO034
CO029	Together's Mixture-of-Agents paper, published in June 2024, demonstrated multi-LLM ensembling improvements on AlpacaEval.	High	SO037, SO011
CO030	Together publishes an active GitHub organisation (togethercomputer) with multiple ten-thousand-star repositories including OpenChatKit and RedPajama-Data.	High	SO028, SO029, SO030
CO031	The HuggingFace organisation togethercomputer hosts the RedPajama datasets and StripedHyena, Pythia, LLaMA-32k, and m2-bert models.	High	SO033, SO011
CO032	No public regulatory action, litigation, recall, or executive departure involving Together AI has been reported as of May 2026.	Medium	SO018, SO019, SO027
CO033	Together AI is described as one of the most followed open-source-AI infrastructure accounts on Hacker News and X.	Low	SO020, SO024, SO021
CO034	Salesforce Ventures publicly framed the Series B as enabling enterprise customers to deploy open models on Together's cloud.	Medium	SO025, SO012
CO035	Crunchbase's Together AI profile is paywalled and could not be independently verified for cap-table details at runDate.	Medium	SO019
CO036	Cover-metric "gaps" remain for ARR, gross margin, NRR, and paid-customer count; all are flagged as diligence asks for management.	Medium	SO027, SO019, SO012
CM001	Together AI competes in the AI compute and inference platform layer between hyperscaler GPU IaaS and closed-API model labs.	High	SM001, SM004, SM023
CM002	Together's addressable spend pool excludes general-purpose cloud compute and closed-only proprietary model APIs.	Medium	SM001, SM002
CM003	Status-quo substitutes for Together include self-hosted Kubernetes-on-GPU clusters and OpenAI/Anthropic closed APIs.	Medium	SM011, SM012
CM004	Specialised GPU clouds (CoreWeave, Lambda) compete on infrastructure but lack Together's open-source-model SaaS layer.	Medium	SM013, SM014
CM005	Inference-API providers (Replicate, Fireworks, Groq, Modal) compete directly at the per-token serverless layer.	High	SM015, SM019, SM018, SM016
CM006	AWS Bedrock and Google Vertex AI offer hosted open-model inference that overlaps Together's serverless product.	High	SM011, SM012
CM007	Gartner sizes 2024 AI infrastructure TAM at $40–60B with a 30–50% CAGR through 2028.	Medium	SM021
CM008	IDC-style analyst notes peg 2024 global AI infrastructure spend near $50B.	Low	SM021, SM022
CM009	Triangulated inference + dedicated GPU SAM for 2026 lands in an $8–15B range.	Medium	SM021, SM024, SM022
CM010	Together-addressable SOM (channels + open-model demand) is on the order of $1–3B in 2026.	Low	SM024, SM027
CM011	CNBC reported a ~$100M Together ARR at the July 2024 Series B, implying mid-single-digit SOM share.	Medium	SM024, SM025
CM012	NVIDIA disclosed >$30B quarterly data-centre revenue in early 2025, evidence that AI-compute spend dwarfs Together's ARR.	High	SM028, SM022
CM013	No single public source cleanly disaggregates inference spend from training capex, creating range uncertainty.	Medium	SM021, SM028, SM022
CM014	AI-native startups and model labs are Together's most active early buyers, choosing it for open-weight flexibility and dedicated GPU access.	Medium	SM003, SM032
CM015	F500 enterprise platform teams are an emerging segment, anchored by Salesforce Ventures Series B leadership.	Medium	SM027, SM024
CM016	Sovereign and regional cloud customers are a strategic third segment, signalled by Prosperity7 (Aramco) investor presence.	Low	SM024, SM023
CM017	Within Together, users (developers) frequently differ from payers (procurement/finance), lengthening enterprise sales cycles.	Low	SM027, SM004
CM018	Self-serve credit-card adoption is the primary land motion for AI-native startup customers on Together.	Medium	SM002, SM008
CM019	Together's NVIDIA GTC 2025 spotlight emphasised "AI pioneers" as case-study customers, validating the enterprise wedge.	Medium	SM033, SM028
CM020	Together's AI-Native conference (2025) was framed as a developer community event, reinforcing top-of-funnel demand generation.	Medium	SM005, SM030
CM021	Open-weight model proliferation (Llama 3/4, DeepSeek, Mistral, Qwen) keeps SAM growth above 35% CAGR through 2027.	Medium	SM022, SM021, SM029
CM022	NVIDIA Hopper and Blackwell GPU scarcity drives demand for Together's reserved capacity SKUs.	Medium	SM028, SM013
CM023	Closed-API price cuts from OpenAI compress per-token margins across the inference market.	Low	SM002, SM030
CM024	Hyperscaler open-model commoditisation (AWS Bedrock, GCP Vertex Model Garden) threatens to erode Together's pure-inference SAM.	Medium	SM011, SM012
CM025	Sovereign data residency rules accelerate demand for in-region dedicated clusters but cap cross-border ARR.	Low	SM004, SM023
CM026	Energy and data-centre permitting bottlenecks slow capacity expansion through 2028.	Low	SM013, SM028
CM027	Agentic AI workloads (Mixture-of-Agents, multi-step reasoning) multiply per-user token volume.	Medium	SM004, SM005
CM028	FinOps pressure pushes enterprises to substitute open-weight inference for closed-API spend.	Low	SM002, SM027
CM029	Together announces serverless, dedicated, and batch inference SKUs to capture different buyer demand curves.	High	SM002, SM008, SM009, SM010
CM030	Batch inference pricing updates in 2025 reduced per-million-token costs to attract high-volume customers.	Medium	SM006, SM010
CM031	Specialised GPU clouds CoreWeave and Lambda compete on raw GPU-hour pricing; Together overlays an inference SaaS layer.	Medium	SM013, SM014
CM032	Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership.	High	SM018, SM020
CM033	Modal, Replicate, and Anyscale compete in serverless and Ray-based AI compute SaaS.	Medium	SM016, SM015, SM017
CM034	Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS.	Medium	SM019, SM030
CM035	Public-cloud earnings (AWS, GCP) describe AI workloads as the fastest-growing portion of cloud revenue.	Medium	SM011, SM012
CM036	Reddit r/LocalLLaMA and Hacker News discussion volume around Together has risen steadily through 2024–2026.	Low	SM030, SM029, SM031
CP001	Together competes against AWS Bedrock and Google Vertex Model Garden on hosted open-weight model inference.	High	SP018, SP019, SP001
CP002	Specialised GPU clouds CoreWeave and Lambda compete with Together at the IaaS layer for reserved GPU capacity.	High	SP020, SP021
CP003	Fireworks, Replicate, Modal, and Anyscale provide direct substitutes at the per-token serverless inference layer.	Medium	SP026, SP022, SP023, SP024
CP004	Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership.	High	SP025, SP027
CP005	OpenAI and Anthropic act as substitutes for closed-API customers willing to give up weight portability.	Medium	SP018, SP036
CP006	TensorWave provides AMD MI300X GPU capacity as a niche alternative for cost-sensitive teams.	Low	SP028
CP007	Self-hosted Kubernetes-on-GPU is the status-quo alternative most cited by frontier labs and FAANG.	Low	SP036, SP037
CP008	Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS.	Medium	SP026, SP036, SP037
CP009	Together leads on FlashAttention kernel performance, anchored by the FlashAttention-3 paper and Together engineering team.	High	SP031, SP005, SP029, SP030
CP010	FlashAttention-4 was released in 2025 and extends Together's kernel lead on Hopper GPUs.	Medium	SP006
CP011	AWS Bedrock and GCP Vertex lead on enterprise compliance breadth (BAA, FedRAMP, regional residency).	High	SP018, SP019
CP012	Groq leads on single-stream inference latency on its supported models but lags in model coverage.	Medium	SP025, SP036
CP013	Fireworks AI provides an OpenAI-compatible API and serves the same open-model catalog as Together.	High	SP026, SP015
CP014	Together's serverless Llama-70B is listed near $0.88 per million tokens, within the OpenAI-parity envelope.	High	SP002, SP011
CP015	Together batch inference offers up to 50% discount versus serverless rates as of the 2025 update.	Medium	SP013
CP016	AWS Bedrock charges $0.99/M output tokens for Llama 3 70B in 2026 list pricing.	Medium	SP018
CP017	GCP Vertex Llama 3 70B is priced near $0.99/M tokens with volume discounts.	Medium	SP019
CP018	Groq lists Llama 3 70B at ~$0.59/M tokens, undercutting Together on raw price while constraining model choice.	Medium	SP025
CP019	CoreWeave and Lambda charge $2–4 per H100-hour for reserved or on-demand GPUs.	Medium	SP020, SP021
CP020	Together fine-tuning API, batch SKU, and dedicated endpoints differentiate it from raw-GPU competitors.	High	SP012, SP013, SP011
CP021	Together's open-source research lineage (RedPajama, StripedHyena, MoA, FlashAttention) sustains community gravity that competitors struggle to match.	High	SP031, SP034, SP004
CP022	Tri Dao and Chris Ré anchor Together's kernel and architecture research velocity.	High	SP031, SP005, SP008
CP023	NVIDIA's participation in Series A and Series B is read by the market as a GPU supply alignment moat.	Medium	SP041
CP024	Salesforce Ventures Series B leadership opens an enterprise distribution channel competitors lack.	Medium	SP004, SP003
CP025	Together advertises dedicated endpoints and reserved capacity SKUs that raise customer switching cost.	High	SP012, SP002
CP026	Hyperscalers (AWS, GCP) own enterprise procurement and identity, which is a distribution disadvantage Together must compensate for.	Medium	SP018, SP019
CP027	Enterprise multi-homing across Together / Fireworks / Bedrock is the reported equilibrium in 2026 buyer surveys.	Low	SP036, SP037
CP028	Open-weight neutrality is a counter-positioning advantage versus closed-only OpenAI and Anthropic substitutes.	Medium	SP001, SP002
CP029	Together publishes an OpenAI-compatible chat completions endpoint, simplifying migration from closed APIs.	High	SP015, SP016
CP030	CoreWeave's 2024 IPO disclosures reveal $1B+ revenue scale, implying meaningful capital advantage at the IaaS layer.	Medium	SP020, SP036
CP031	Lambda Labs raised a $320M Series C in 2024 to expand its H100/H200 fleet.	Medium	SP021
CP032	Groq and Cerebras have each raised more than $1B in 2024–2025 to fund bespoke silicon expansion.	Medium	SP025, SP027
CP033	AWS Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity inference workloads.	Medium	SP018
CP034	Specialised silicon vendors (Groq, Cerebras, SambaNova) pose a latency-leapfrog risk that pure-software inference cannot fully match.	Medium	SP025, SP027
CP035	Together's Python SDK and PyPI download trajectory signal sustained developer pull comparable to peers.	Medium	SP042, SP043
CP036	Speculative-decoding and Medusa-class research feed Together's ability to close any Groq latency gap on shared models.	Medium	SP032, SP033
CI001	Together AI raised a $20M Seed in May 2023 led by Lux Capital.	High	SI008, SI018, SI019
CI002	Together AI raised a $102.5M Series A in November 2023 led by Kleiner Perkins.	High	SI005, SI015, SI018
CI003	In March 2024 Together added approximately $106M at a reported $1.25B valuation (Series A2).	Medium	SI016, SI007, SI014
CI004	Per the canonical company-overview claim, the Series B closed July 2024 at ~$3.3B post led by Salesforce Ventures and Coatue (financials chapter relies on that fact for capital-stack analysis).	High	SI011, SI012, SI013, SI006
CI005	NVIDIA participated in both Series A and Series B as a strategic investor.	High	SI022, SI006
CI006	Salesforce Ventures led the Series B, opening an enterprise distribution channel.	High	SI021, SI011, SI006
CI007	Cumulative disclosed primary capital is approximately $533M across Seed, Series A, March 2024 extension, and Series B.	High	SI011, SI018, SI006
CI008	No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the 2026-05 runDate.	High	SI020, SI025
CI009	CNBC reported an approximately $100M annualised revenue pace around the July 2024 Series B announcement.	Medium	SI011, SI012
CI010	Bloomberg reported triple-digit revenue growth around the July 2024 Series B.	Medium	SI013, SI014
CI011	Together has not published audited ARR, gross margin, or NRR figures as of the runDate.	High	SI020, SI001, SI002
CI012	Together publishes per-token list pricing on its public pricing page for serverless inference.	High	SI002, SI001
CI013	Together offers a 50% batch inference discount as of the 2025 batch pricing update.	Medium	SI009, SI002
CI014	Dedicated endpoint and reserved-capacity pricing is quoted via sales rather than published.	High	SI002, SI004
CI015	Together SKUs span serverless, dedicated, fine-tuning, batch, embeddings, vision, audio, and image.	High	SI002, SI001, SI004
CI016	Realised enterprise pricing for Together is not publicly disclosed and is a material diligence gap.	Medium	SI002, SI038
CI017	The Information has published paywalled coverage of Together AI 2025 revenue trajectory.	Low	SI026
CI018	PitchBook lists Together AI as later-stage venture with no public 2025 round confirmation.	Medium	SI025, SI019
CI019	Together has not disclosed gross margin by SKU as of the runDate.	High	SI020, SI002, SI001
CI020	Together has not disclosed top-10 customer concentration as of the runDate.	High	SI020, SI003
CI021	Together has not disclosed net dollar retention (NDR) as of the runDate.	High	SI020, SI003
CI022	Together has not disclosed contracted-revenue (RPO) figures.	High	SI020, SI001
CI023	Together has not disclosed cash position or runway as of the runDate.	High	SI020, SI001
CI024	CoreWeave 2024 S-1 disclosures imply GPU-cloud gross margins in the 60-70% range on reserved deals.	Medium	SI032, SI035
CI025	Together per-token gross margin on serverless is plausibly 40-60% based on competitor analog disclosures.	Low	SI032, SI036, SI037
CI026	Implied cash burn through 2024 is roughly $300-$500M consistent with GPU buildout and 150+ headcount.	Low	SI004, SI001, SI018
CI027	With $533M raised and that implied burn, runway likely extends into 2026 without a new round.	Low	SI006, SI011
CI028	Figma and CoreWeave 2025 IPOs demonstrate the public-market window is open for AI-infrastructure issuers.	High	SI034, SI032
CI029	Navan 2025 S-1 process is a closer growth-SaaS comparable than CoreWeave for Together.	Medium	SI033
CI030	Together has not disclosed any debt or vendor-financing facility.	Medium	SI020, SI004
CI031	Founder and employee ownership post Series B is widely reported as significant but no exact percentages are public.	Low	SI006, SI018, SI019
CI032	No public secondary or tender offer for Together AI shares has been reported at the runDate.	Medium	SI020, SI025, SI026
CI033	Forrester and IDC market frames place Together in the growth-stage generative-AI infrastructure segment without naming it top-three.	Medium	SI027, SI028
CI034	Menlo Ventures and Bessemer 2025 State-of-AI reports frame the inference market as multi-billion-dollar and growing.	Medium	SI030, SI031, SI029
CI035	No public 2026 follow-on round, IPO filing, or M&A announcement involving Together has been confirmed at the runDate.	High	SI020, SI025, SI026, SI011
CI036	Together pricing-page revisions in 2025 added batch and dedicated SKU clarifications, signalling product and financial maturation.	Medium	SI009, SI002, SI004
CI037	Public disclosure across ten standard financial primitives is missing or partial, qualifying as a material diligence gap.	High	SI020, SI001, SI002, SI003
CE001	Together AI exposes serverless inference, dedicated endpoints, fine-tuning, batch, embeddings, vision, audio, and image APIs.	High	SE016, SE018, SE001, SE003
CE002	Together AI publishes an OpenAI-compatible chat-completions endpoint to simplify migration.	High	SE022, SE035
CE003	The Together model catalog spans 200+ open and custom models including Llama, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena.	High	SE018, SE036, SE045
CE004	Dedicated endpoints offer reserved H100/H200/B200 capacity with BAA available for HIPAA workloads.	High	SE020, SE003, SE005
CE005	Fine-tuning API supports LoRA and full-parameter training jobs on most supported families.	High	SE019, SE042
CE006	Batch inference offers up to 50% discount vs serverless as of the 2025 update.	Medium	SE011, SE021
CE007	Embeddings API offers multiple open embedding models per published reference.	High	SE024, SE034
CE008	Together publishes vision, audio, and image APIs as documented surfaces.	High	SE031, SE032, SE033
CE009	SDKs ship in Python (PyPI: together) and TypeScript with raw HTTP fallback.	High	SE044, SE043, SE017
CE010	Rate-limit documentation distinguishes free, paid, and enterprise tiers.	High	SE025, SE016
CE011	Together architecture stacks API gateway, model registry, inference scheduler, TIE v2, and GPU pool.	High	SE016, SE009, SE010
CE012	Together Inference Engine v2 integrates FlashAttention-3/4 and ThunderKittens kernels.	High	SE010, SE006, SE007, SE008
CE013	Speculative decoding and Medusa decoders are integrated into the inference engine.	Medium	SE053, SE054, SE055
CE014	Mixture-of-Agents (MoA) provides ensemble inference for higher-quality completions on supported models.	Medium	SE056, SE012
CE015	FlashAttention-3 paper (arXiv 2407.08608) describes the kernel anchoring Together throughput claims.	High	SE052, SE006
CE016	FlashAttention-4 was released in August 2025 and extends the kernel lead to Hopper and Blackwell.	Medium	SE007, SE012
CE017	ThunderKittens kernel framework was released in 2024 by Together and Stanford HazyResearch.	High	SE008, SE065
CE018	NVIDIA is the primary GPU supplier (Hopper H100/H200, Blackwell B200) and a strategic investor.	High	SE060, SE014, SE001
CE019	HuggingFace is the primary model artefact partner and hosts Together-published checkpoints.	High	SE045, SE049
CE020	A status page is published at status.together.ai documenting platform reliability.	Medium	SE062
CE021	The public SLA percentage for serverless and dedicated tiers is not yet documented at the runDate.	Medium	SE062, SE025
CE022	Together infrastructure organisation expanded in 2025 with Alon Gavrielov as VP of Infrastructure Strategy.	High	SE015, SE005
CE023	Trust center publishes SOC 2 Type II attestation references and HIPAA BAA availability.	High	SE063, SE066, SE067
CE024	HIPAA BAA is available on dedicated endpoints but not serverless tier per documentation.	Medium	SE063, SE020
CE025	GDPR / DPA terms are available for EU customers per trust center documentation.	Medium	SE063
CE026	FedRAMP accreditation is not yet listed in the trust center at the runDate.	Medium	SE063
CE027	The full regional residency map (which regions, which co-lo partners) is not publicly disclosed.	Medium	SE063, SE020
CE028	ISO 27001 certification status is not publicly confirmed at the runDate.	Medium	SE063
CE029	Content moderation, function calling, JSON mode, and structured-output safety controls are documented surfaces.	High	SE028, SE027, SE026
CE030	Audit logs are documented for enterprise customers but not enabled by default.	Medium	SE063, SE020
CE031	Custom-model-weights privacy controls are documented for dedicated tier.	Medium	SE020, SE063
CE032	A bug bounty / responsible disclosure programme is published on the trust center.	Medium	SE063
CE033	GTC 2025 Pioneers event surfaced multiple Together customer + NVIDIA partnerships.	High	SE014, SE060
CE034	Adaption partnership (2025) extends Together into healthcare workflows.	Medium	SE005
CE035	AI Native Conference 2025 announced research and product directions including MoA productisation.	High	SE012, SE005
CE036	Blackwell (B200) capacity ramp is documented as 2026 roadmap item in blog references.	Low	SE005, SE014
CE037	Multi-modal expansion (vision + audio) is a documented 2026 roadmap area.	Low	SE005, SE012
CU001	Together AI reports more than 100,000 developers have used the platform per company disclosure.	Medium	SU004, SU003, SU001
CU002	Self-serve developer signup is the primary top-of-funnel adoption motion for Together AI.	High	SU038, SU001, SU003
CU003	Together customers page enumerates named startup and enterprise deployments.	High	SU003, SU001
CU004	AI-native startups (Pika, Cartesia, Arcee, Nous Research) are documented production customers.	High	SU012, SU015, SU013, SU014, SU003
CU005	Enterprise SaaS deployments at Salesforce and Zoom are documented case studies.	High	SU010, SU011, SU003
CU006	Washington University is referenced as a research-compute customer in a case study.	Medium	SU016, SU003
CU007	Adaption (2025) extends Together into healthcare workflows.	Medium	SU008, SU004
CU008	NVIDIA GTC 2025 Pioneers programme surfaced a cohort of joint Together + NVIDIA customers.	High	SU007, SU018
CU009	Startup Accelerator launched in November 2024 as an explicit startup-acquisition funnel.	High	SU006, SU004
CU010	Geographic mix is North America-skewed with EU presence growing through dedicated clusters.	Low	SU003, SU004, SU001
CU011	Buyer/user split differs by tier: developer-led self-serve vs CIO/platform-eng-led enterprise.	Medium	SU038, SU003
CU012	Salesforce case study documents integration depth and is treated as production deployment.	High	SU010, SU017, SU003
CU013	Zoom case study documents AI-feature inference at production scale.	High	SU011, SU003
CU014	Pika case study cites latency improvement from FlashAttention-class kernels.	High	SU012, SU003
CU015	Cartesia case study documents voice-model production deployment on dedicated tier.	High	SU015, SU003
CU016	Arcee case study documents cost reduction relative to closed APIs.	Medium	SU013, SU003
CU017	Nous Research case study documents community model hosting on Together.	Medium	SU014, SU003
CU018	Washington University case study documents research-compute usage.	Medium	SU016, SU003
CU019	Adaption is described as a launching partnership rather than confirmed production deployment.	Medium	SU008
CU020	GTC 2025 cohort case studies cover developer tools, robotics, healthcare, and content/media.	Medium	SU007
CU021	HuggingFace partnership funnels developers from the model hub into Together.	Medium	SU019, SU020
CU022	Net dollar retention (NDR) is not publicly disclosed at the runDate.	High	SU003, SU001, SU004
CU023	Gross retention (GRR) and named-account churn are not publicly disclosed.	High	SU003, SU001, SU004
CU024	Paid vs free developer counts are not disclosed.	High	SU004, SU003
CU025	Dedicated-endpoint renewal rate is not publicly disclosed.	High	SU004, SU003
CU026	G2 and Trustpilot review counts for Together are small, limiting independent proxies.	Medium	SU026, SU027
CU027	Salesforce Ventures-led Series B and customer case study together signal a multi-year channel commitment.	Medium	SU017, SU010, SU004
CU028	GTC 2025 Pioneers cohort acts as an enterprise pipeline amplifier through NVIDIA.	Medium	SU007, SU018
CU029	Startup Accelerator provides credits and GTM amplification to long-tail AI startups.	High	SU006, SU004
CU030	Adaption launch indicates a follow-on path into regulated healthcare workflows.	Medium	SU008
CU031	Enterprise sales cycle requires custom MSA and security review, adding 60-120 days before revenue.	Low	SU004, SU038
CU032	Top-10 customer concentration is undisclosed and is a material diligence ask.	High	SU003, SU004
CU033	Public customer mix skews AI-native startups + developer tools rather than a single mega-anchor.	Medium	SU003, SU006, SU007
CU034	No public lawsuit or named-account churn report has surfaced for Together at the runDate.	Medium	SU023, SU022, SU004
CU035	Reddit and Hacker News threads occasionally cite latency or cold-start concerns on the serverless tier.	Low	SU023, SU022
CU036	Public status page exists but no SLA percentage is published for serverless or dedicated tiers.	Medium	SU042, SU038
CU037	PyPI download trajectory and GitHub repo activity indicate sustained developer pull.	Medium	SU040, SU041
CR001	FTC opened a 6(b) inquiry in 2024 into generative-AI investments and partnerships, naming the major cloud-AI relationships.	High	SR002, SR001
CR002	FTC has stated ongoing 2024-2025 attention to GenAI competition and consumer-protection enforcement.	High	SR001, SR002
CR003	EU AI Act entered into force in 2024 with phased GPAI obligations through 2026-2027 including fines up to 7% of global revenue.	High	SR003, SR012
CR004	BIS tightened advanced-computing export controls in 2025 covering H100, H200, B200 and certain foundation-model weights.	High	SR005, SR008
CR005	NIST AI Risk Management Framework establishes voluntary US federal AI controls increasingly used in enterprise procurement.	High	SR004, SR008
CR006	UK ICO has published GenAI guidance creating UK DPA compliance baseline.	Medium	SR006
CR007	Australia OAIC has published a 2024 GenAI guide for organisations.	Medium	SR007
CR008	White House EO on AI (2023, amended 2025) sets reporting thresholds for foundation-model training.	Medium	SR008
CR009	CCPA imposes privacy obligations on Together for California-resident user data.	High	SR009, SR012
CR010	HIPAA BAA support is published as available for healthcare workloads.	High	SR010, SR028, SR026
CR011	SOC 2 attestation surface is referenced via the AICPA SOC framework and Together trust center.	Medium	SR011, SR028
CR012	NYT v Microsoft/OpenAI active litigation (CourtListener docket) is the bellwether GenAI copyright case in US.	High	SR013, SR014
CR013	Authors Guild v OpenAI active litigation expands copyright exposure to non-press content.	High	SR014, SR013
CR014	Getty Images v Stability AI active litigation tests image-model copyright exposure on both US and UK sides.	High	SR015, SR014
CR015	Civil-society organisations (CDT) actively lobby for AI accountability, adding reputational pressure.	Medium	SR012
CR016	Together is not currently named in any of the bellwether GenAI copyright suits.	Medium	SR013, SR014, SR015, SR025
CR017	Open-model hosting carries adjacent precedent risk if copyright cases extend to platform hosts.	Medium	SR013, SR014, SR015
CR018	Together publishes a public status page but does not publish an SLA percentage.	High	SR027, SR030
CR019	Pen-test cadence, breach plan, and named incident history are not publicly disclosed.	High	SR028, SR025
CR020	Safety models and function-calling guardrails are documented mitigations for prompt-injection class risks.	High	SR031, SR030
CR021	HuggingFace integrity checks are inherited for model-weight artefacts; weight-signing process is undisclosed.	Medium	SR028, SR025
CR022	Trust center references SOC 2 Type II posture; attestation expiry date is not public.	Medium	SR028, SR011
CR023	NVIDIA is supplier of GPUs, networking, and software stack and a strategic investor — single-vendor concentration is high.	High	SR025, SR024, SR029
CR024	HuggingFace is the primary model-artefact dependency for the Together catalog.	High	SR025, SR029
CR025	Salesforce Ventures is lead enterprise channel investor and co-sell partner.	High	SR025, SR029
CR026	Datacenter / colo capacity counterparties are largely undisclosed; multi-region build is implied but not enumerated.	Medium	SR025, SR024
CR027	Capital partners include GC, Salesforce, NVIDIA, Lux, Coatue, Prosperity7, and Kleiner per public round disclosures.	High	SR025, SR034, SR035
CR028	Top-10 customer concentration is undisclosed and is a material diligence ask.	High	SR029, SR025
CR029	Competitive displacement risk is documented from Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda.	High	SR019, SR020, SR021, SR022, SR017, SR018, SR016, SR023
CR030	Open-source model upstream license changes (Llama, Mistral, Qwen, DeepSeek) would introduce review and compliance burden.	Medium	SR025, SR029
CR031	Sovereign / Prosperity7-adjacent backing adds geopolitical disclosure considerations.	Medium	SR034, SR035, SR025
CR032	Key-person dependency on Vipul Ved Prakash, Ce Zhang, and Tri Dao is high; founder retention is the mitigation.	High	SR024, SR025
CR033	CFO and CRO presence at runDate is not publicly confirmed and is a material recruiting diligence ask.	Medium	SR025, SR024
CR034	Engineering and infra hiring momentum is visible (Alon Gavrielov 2025 VP-infra hire) but exact bench size is undisclosed.	Medium	SR025, SR024
CR035	Hopper→Blackwell→Rubin transition execution is a multi-quarter program-management risk for the chapter.	Medium	SR025
CR036	Monitorable kill triggers (NVIDIA allocation cut, HF policy change, EU AI Act fine, copyright host-ruling) can be tracked from public disclosure.	Medium	SR025, SR003, SR005, SR013
CR037	Operational kill triggers (multi-hour serverless outage, breach disclosure) are monitorable through status page and press.	Medium	SR027, SR025, SR032, SR033
CR038	Commercial kill triggers (Salesforce co-sell deprioritisation, customer concentration >25% single) are monitorable through press and reference calls.	Medium	SR025, SR029
CR039	Founder-departure triggers are catastrophic for the thesis at growth stage.	Medium	SR025, SR024
CR040	Financing kill triggers (flat/down round vs Series B at runDate) would re-underwrite valuation.	Medium	SR025, SR034, SR035
CR041	Adverse-source coverage spans regulators, court dockets, competitors, and developer-sentiment fora.	High	SR002, SR013, SR019, SR032, SR033
CR042	Several control primitives (SLA, incident, breach plan, top-10 concentration, GPU committed spend) remain undisclosed at runDate and are explicit diligence asks.	High	SR025, SR029, SR028, SR027
CV001	Recommendation is Hold / Monitor with medium confidence at the Series B mark.	Medium	SV025, SV027, SV007
CV002	Conditional Buy on a 25%+ correction or confirmed >$500M ARR plus >120% NRR.	Medium	SV008, SV007, SV005
CV003	Conditional Pass if hyperscaler pricing cuts >40%, NVIDIA allocation cuts, or breach disclosure occurs.	Medium	SV001, SV002, SV003
CV004	Risk rating is medium-high reflecting concentration, regulatory, and competitive overhangs.	Medium	SV001, SV002, SV006
CV005	Valuation stance is "at-or-near" the current Series B mark with explicit triggers to revisit.	Medium	SV007, SV008
CV006	GenAI inference TAM grows 40-60% CAGR per multiple analyst sources at 2025 mid-point.	High	SV001, SV002, SV003, SV004, SV005
CV007	FlashAttention authorship by Tri Dao and ThunderKittens (Stanford HazyResearch) anchor Together's kernel moat.	High	SV025, SV024
CV008	Together Inference Engine v2 and MoA productisation extend the technical surface beyond commoditised inference.	Medium	SV025
CV009	Salesforce Ventures-led Series B + customer case study imply multi-year channel commitment.	Medium	SV043, SV025, SV018
CV010	NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline alignment.	High	SV025, SV044
CV011	Open-source neutrality (Llama, Mistral, Qwen, DeepSeek) is defensible positioning vs closed-API providers.	Medium	SV025, SV027
CV012	Documented enterprise + startup proof base spans Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, GTC 2025 Pioneers.	High	SV027, SV025
CV013	Anti-thesis: hyperscaler bundled inference (Bedrock, Vertex, Azure) could compress pricing 30-50%.	Medium	SV001, SV002, SV006
CV014	Anti-thesis: copyright litigation precedent (NYT, Authors Guild, Getty) could extend to platform hosts.	Medium	SV025, SV008
CV015	Bull case (25% prob) assumes ARR >$1B by 2028 and exit $8B-$12B.	Medium	SV001, SV005, SV006
CV016	Base case (50% prob) assumes ARR $500M-$700M by 2028 and exit $4B-$6B.	Medium	SV001, SV003, SV002
CV017	Bear case (25% prob) assumes ARR $200M-$300M by 2028 and outcome $1B-$2.5B.	Medium	SV001, SV002, SV006
CV018	Sensitivity to ARR growth is the single largest valuation driver in the chapter model.	Medium	SV007, SV008
CV019	Gross margin sensitivity is ±1000bps shifts valuation outcome ±$2-3B at base case.	Medium	SV014, SV013
CV020	Multiple sensitivity is ±3x ARR shifts exit ±$2.5B at base case.	Medium	SV013, SV015
CV021	Probability weights are subjective and re-marked at Series C and major events.	Low	SV007, SV008
CV022	CoreWeave post-IPO trades 8-12x NTM revenue as GPU-cloud comparable.	Medium	SV014, SV018
CV023	Navan S-1 disclosed 8-12x NTM revenue range at filing for growth-stage SaaS.	Medium	SV013, SV030
CV024	Figma S-1 disclosed 12-15x NTM revenue range as high-multiple SaaS reference.	Medium	SV015, SV029
CV025	Fireworks AI rumoured 2024 round valued ~$4B per press reports.	Low	SV018, SV019
CV026	Replicate and Modal rounds undisclosed in public press.	Medium	SV023, SV022
CV027	Anyscale private valuation rumoured $1B-$2B at last round.	Low	SV023, SV019
CV028	Sakana AI round ~$1.5B Aug 2024 per TechCrunch and NVIDIA partnership.	Medium	SV031, SV032
CV029	Mistral round ~$6B mid-2024 as OSS-model-lab comparable.	Medium	SV019, SV018
CV030	Anthropic round at $60B+ in 2025 as closed-API reference, not direct comparable.	Medium	SV018, SV019
CV031	NVIDIA public NTM revenue multiple high-teens to mid-20s acts as ceiling reference.	Medium	SV018, SV019
CV032	Snowflake NTM revenue multiple 10-15x acts as mature-SaaS ceiling reference.	Medium	SV018, SV019
CV033	ARR run-rate <$500M by FY2027 is the base-case kill trigger.	Medium	SV008, SV007
CV034	Salesforce co-sell public deprioritisation is the bull-case kill trigger.	Medium	SV043, SV025
CV035	NVIDIA Blackwell allocation cut to a peer is a re-underwrite trigger.	Medium	SV044, SV025
CV036	Hyperscaler bundled pricing cut >40% on AWS Bedrock or peer is a base-compression trigger.	Medium	SV001, SV002
CV037	Platform-host copyright precedent is an OSS-revenue re-underwrite trigger.	Medium	SV025, SV008
CV038	Series C flat-or-down vs Series B is a mark-to-market trigger.	Medium	SV018, SV019, SV007
CV039	Founder departure (CEO/CTO/CSO) is a kill trigger.	Medium	SV024, SV025
CV040	Exact ARR at runDate is undisclosed and is the principal diligence ask.	High	SV008, SV007, SV027
CV041	NRR / GRR / cohort retention are undisclosed at runDate and are material diligence asks.	High	SV027, SV025
CV042	Top-10 customer concentration and GPU committed spend are undisclosed.	High	SV027, SV025
CV043	CFO and CRO presence at runDate is unconfirmed.	Medium	SV024, SV025
CV044	Opex split (R&D / S&M / G&A), paid-developer count, SOC 2 expiry, and OSS hosting policy are all diligence asks.	Medium	SV025, SV027
CV045	Sacra estimates Together AI reached $1B in annualized revenue by February 2026, up from ~$618M at year-end 2025, representing ~400% year-over-year growth in 2024.	Medium	SV045, SV046
CV046	Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026, which would represent a >2× step-up from the $3.3B Series B valuation set in February 2025.	Medium	SV045, SV047
CV047	EquityZen lists Together AI as available for pre-IPO secondary share purchases by accredited investors, indicating secondary-market liquidity exists for current shareholders.	Medium	SV047, SV045
CV048	CB Insights' Q1 2026 State of AI report identifies AI infrastructure as the leading funding category in early 2026, with total AI deal activity up materially from prior quarters, supporting the demand context for Together AI's growth.	High	SV048, SV001

Sources
ID	Publisher	Title	Quote
SO001	Together AI	Together AI — The AI Acceleration Cloud
SO002	Together AI	About \| Together AI
SO003	Together AI	Careers \| Together AI
SO004	Together AI	Contact \| Together AI
SO005	Together AI	Together AI Blog
SO006	Together AI	Together AI raises $102.5M Series A
SO007	Together AI	RedPajama, a project to create leading open-source models
SO008	Together AI	Announcing OpenChatKit
SO009	Together AI	FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SO010	Together AI	Together Inference Engine 2.0
SO011	Together AI	Research \| Together AI
SO012	CNBC	Together AI raises $305 million at $3.3 billion valuation
SO013	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SO014	TechCrunch	Together raises $102.5M to build open-source generative AI
SO015	TechCrunch	Together AI is worth $1.25B (March 2024 update)
SO016	Fast Company	Together AI funding profile
SO017	VentureBeat	Together AI raises $305M for open-source GenAI
SO018	Wikipedia	Together AI — Wikipedia
SO019	Crunchbase	Together AI — Crunchbase Profile
SO020	Hacker News	Submissions from together.ai
SO021	Reddit r/LocalLLaMA	Together AI discussions
SO022	Product Hunt	Together AI on Product Hunt
SO023	StackShare	Together AI Tech Stack
SO024	X (Together)	@togethercompute on X
SO025	Salesforce Ventures	Salesforce Ventures Perspectives
SO026	NVIDIA	NVIDIA AI investments 2024
SO027	SEC EDGAR	SEC EDGAR — Together AI search
SO028	GitHub	Together Computer · GitHub Org
SO029	GitHub	togethercomputer/RedPajama-Data
SO030	GitHub	togethercomputer/OpenChatKit
SO031	GitHub	togethercomputer/StripedHyena
SO032	GitHub	Dao-AILab/flash-attention
SO033	HuggingFace	togethercomputer on Hugging Face
SO034	HuggingFace	StripedHyena-Nous-7B
SO035	Together AI	Introduction \| Together AI Docs
SO036	arXiv	FlashAttention-3: Fast and Accurate Attention with Asynchrony
SO037	arXiv	Mixture-of-Agents Enhances LLM Capabilities
SO038	Gartner	Gartner AI Insights
SO039	CoreWeave	CoreWeave — Specialized GPU Cloud
SM001	Together AI	Together AI — The AI Acceleration Cloud
SM002	Together AI	Pricing \| Together AI
SM003	Together AI	Customers \| Together AI
SM004	Together AI	Together AI Blog
SM005	Together AI	AI Native Conf — research & product announcements
SM006	Together AI	Batch inference API updates 2025
SM007	Together AI	Inference Models \| Together AI Docs
SM008	Together AI	Serverless Inference \| Together AI Docs
SM009	Together AI	Dedicated Endpoints \| Together AI Docs
SM010	Together AI	Batch Inference \| Together AI Docs
SM011	AWS	Amazon Bedrock
SM012	Google Cloud	Vertex AI
SM013	CoreWeave	CoreWeave — Specialized GPU Cloud
SM014	Lambda Labs	Lambda — GPU Cloud for AI
SM015	Replicate	Replicate — Run models in the cloud
SM016	Modal	Modal — Serverless AI infrastructure
SM017	Anyscale	Anyscale — Powered by Ray
SM018	Groq	Groq — Fast AI inference
SM019	Fireworks AI	Fireworks AI — Production-grade LLM inference
SM020	Cerebras	Cerebras — Wafer-Scale AI
SM021	Gartner	Gartner AI Insights
SM022	arXiv	LLM inference infrastructure survey
SM023	Wikipedia	Together AI — Wikipedia
SM024	CNBC	Together AI raises $305 million at $3.3 billion valuation
SM025	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SM026	Fast Company	Together AI funding profile
SM027	Salesforce Ventures	Salesforce Ventures Perspectives
SM028	NVIDIA	NVIDIA AI investments 2024
SM029	Hacker News	Submissions from together.ai
SM030	Reddit r/LocalLLaMA	Together AI discussions
SM031	Product Hunt	Together AI on Product Hunt
SM032	Together AI	Together AI Startup Accelerator
SM033	Together AI	Together AI at NVIDIA GTC 2025
SP001	Together AI	Together AI — The AI Acceleration Cloud
SP002	Together AI	Pricing \| Together AI
SP003	Together AI	Customers \| Together AI
SP004	Together AI	Together AI Blog
SP005	Together AI	FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SP006	Together AI	FlashAttention-4
SP007	Together AI	Together Inference Engine 2.0
SP008	Together AI	ThunderKittens kernel framework
SP009	Together AI	AI Native Conf — research & product announcements
SP010	Together AI	Inference Models \| Together AI Docs
SP011	Together AI	Serverless Inference \| Together AI Docs
SP012	Together AI	Dedicated Endpoints \| Together AI Docs
SP013	Together AI	Batch Inference \| Together AI Docs
SP014	Together AI	Rate Limits \| Together AI Docs
SP015	Together AI	Chat Completions API Reference
SP016	Together AI	Completions API Reference
SP017	Together AI	Models API Reference
SP018	AWS	Amazon Bedrock
SP019	Google Cloud	Vertex AI
SP020	CoreWeave	CoreWeave — Specialized GPU Cloud
SP021	Lambda Labs	Lambda — GPU Cloud for AI
SP022	Replicate	Replicate — Run models in the cloud
SP023	Modal	Modal — Serverless AI infrastructure
SP024	Anyscale	Anyscale — Powered by Ray
SP025	Groq	Groq — Fast AI inference
SP026	Fireworks AI	Fireworks AI — Production-grade LLM inference
SP027	Cerebras	Cerebras — Wafer-Scale AI
SP028	TensorWave	TensorWave — AMD GPU cloud
SP029	arXiv	FlashAttention: Fast and Memory-Efficient Exact Attention
SP030	arXiv	FlashAttention-2: Faster Attention with Better Parallelism
SP031	arXiv	FlashAttention-3: Fast and Accurate Attention with Asynchrony
SP032	arXiv	Speculative Decoding paper
SP033	arXiv	Medusa speculative decoding paper
SP034	arXiv	LLM inference infrastructure survey
SP035	arXiv	LLM evaluation benchmark paper
SP036	Reddit r/LocalLLaMA	Together AI discussions
SP037	Hacker News	Submissions from together.ai
SP038	Product Hunt	Together AI on Product Hunt
SP039	StackShare	Together AI Tech Stack
SP040	Gartner	Gartner AI Insights
SP041	NVIDIA	NVIDIA AI investments 2024
SP042	GitHub	togethercomputer/together-python SDK
SP043	PyPI	together — Python package
SP044	Wikipedia	Together AI — Wikipedia
SI001	Together AI	Together AI — The AI Acceleration Cloud
SI002	Together AI	Pricing \| Together AI
SI003	Together AI	Customers \| Together AI
SI004	Together AI	Together AI Blog
SI005	Together AI	Together AI raises $102.5M Series A
SI006	Together AI	Announcing $305M Series B
SI007	Together AI	Series A2 announcement
SI008	Together AI	Seed funding announcement
SI009	Together AI	Batch inference API updates 2025
SI010	Together AI	Together AI Startup Accelerator
SI011	CNBC	Together AI raises $305 million at $3.3 billion valuation
SI012	CNBC	Together AI raises $305 million (follow-up)
SI013	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SI014	Fast Company	Together AI funding profile
SI015	TechCrunch	Together raises $102.5M to build open-source generative AI
SI016	TechCrunch	Together AI is worth $1.25B (March 2024 update)
SI017	VentureBeat	Together AI raises $305M for open-source GenAI
SI018	Wikipedia	Together AI — Wikipedia
SI019	Crunchbase	Together AI — Crunchbase Profile
SI020	SEC EDGAR	SEC EDGAR — Together AI search
SI021	Salesforce Ventures	Salesforce Ventures Perspectives
SI022	NVIDIA	NVIDIA AI investments 2024
SI023	X (Together)	@togethercompute on X
SI024	Gartner	Gartner AI Insights
SI025	PitchBook	Together AI — PitchBook profile
SI026	The Information	Together AI revenue 2025 reporting
SI027	Forrester	Forrester: Generative AI infrastructure landscape
SI028	IDC	IDC Worldwide AI Software Market Forecast 2024-2028
SI029	a16z	a16z — State of Generative AI in the Enterprise 2025
SI030	Menlo Ventures	Menlo Ventures: 2025 State of AI
SI031	Bessemer Venture Partners	Bessemer: State of AI 2025
SI032	SEC EDGAR	CoreWeave SEC filings (S-1 and post-IPO)
SI033	SEC EDGAR	Navan S-1/A filing
SI034	SEC EDGAR	Figma S-1 filings (comparable IPO)
SI035	CoreWeave	CoreWeave — Specialized GPU Cloud
SI036	Fireworks AI	Fireworks AI — Production-grade LLM inference
SI037	Groq	Groq — Fast AI inference
SI038	Reddit r/LocalLLaMA	Together AI discussions
SI039	Hacker News	Submissions from together.ai
SE001	Together AI	Together AI — The AI Acceleration Cloud
SE002	Together AI	About \| Together AI
SE003	Together AI	Pricing \| Together AI
SE004	Together AI	Customers \| Together AI
SE005	Together AI	Together AI Blog
SE006	Together AI	FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
SE007	Together AI	FlashAttention-4
SE008	Together AI	ThunderKittens kernel framework
SE009	Together AI	Together Inference Engine 2.0
SE010	Together AI	Together Inference Engine v2
SE011	Together AI	Batch inference API updates 2025
SE012	Together AI	AI Native Conf — research & product announcements
SE013	Together AI	Together AI Startup Accelerator
SE014	Together AI	Together AI at NVIDIA GTC 2025
SE015	Together AI	Alon Gavrielov joins as VP Infrastructure Strategy
SE016	Together AI	Introduction \| Together AI Docs
SE017	Together AI	Quickstart \| Together AI Docs
SE018	Together AI	Inference Models \| Together AI Docs
SE019	Together AI	Fine-tuning Overview \| Together AI Docs
SE020	Together AI	Dedicated Endpoints \| Together AI Docs
SE021	Together AI	Batch Inference \| Together AI Docs
SE022	Together AI	Chat Completions API Reference
SE023	Together AI	Serverless Inference \| Together AI Docs
SE024	Together AI	Embeddings \| Together AI Docs
SE025	Together AI	Rate Limits \| Together AI Docs
SE026	Together AI	JSON Mode \| Together AI Docs
SE027	Together AI	Function Calling \| Together AI Docs
SE028	Together AI	Safety Models \| Together AI Docs
SE029	Together AI	Code Execution \| Together AI Docs
SE030	Together AI	LLMs Overview \| Together AI Docs
SE031	Together AI	Vision Models Overview \| Together AI Docs
SE032	Together AI	Audio Models Overview \| Together AI Docs
SE033	Together AI	Image Models Overview \| Together AI Docs
SE034	Together AI	Embeddings API Reference
SE035	Together AI	Completions API Reference
SE036	Together AI	Models API Reference
SE037	GitHub	Together Computer · GitHub Org
SE038	GitHub	togethercomputer/RedPajama-Data
SE039	GitHub	togethercomputer/OpenChatKit
SE040	GitHub	Dao-AILab/flash-attention
SE041	GitHub	togethercomputer/StripedHyena
SE042	GitHub	togethercomputer/Llama-2-7B-32K-Instruct
SE043	GitHub	togethercomputer/together-python SDK
SE044	PyPI	together — Python package
SE045	HuggingFace	togethercomputer on Hugging Face
SE046	HuggingFace	StripedHyena-Nous-7B
SE047	HuggingFace	Evo-1-131k-base
SE048	HuggingFace	RedPajama-Data-1T Dataset
SE049	HuggingFace	HuggingFace x Together AI partnership
SE050	arXiv	FlashAttention: Fast and Memory-Efficient Exact Attention
SE051	arXiv	FlashAttention-2: Faster Attention with Better Parallelism
SE052	arXiv	FlashAttention-3: Fast and Accurate Attention with Asynchrony
SE053	arXiv	Speculative Decoding paper
SE054	arXiv	Speculative decoding follow-up
SE055	arXiv	Medusa speculative decoding paper
SE056	arXiv	Mixture-of-Agents Enhances LLM Capabilities
SE057	arXiv	LLM inference infrastructure survey
SE058	arXiv	LLM evaluation benchmark paper
SE059	arXiv	Sheared LLaMA paper
SE060	NVIDIA	NVIDIA AI investments 2024
SE061	AWS	Amazon Bedrock
SE062	Together AI	Together AI status page
SE063	Together AI	Together AI trust center
SE064	Tri Dao	Tri Dao personal site (Together CSO)
SE065	Stanford HazyResearch	Stanford HazyResearch lab (Chris Ré)
SE066	AICPA	SOC 2 reporting framework
SE067	HHS	HIPAA sample BAA provisions
SE068	Hacker News	Submissions from together.ai
SE069	Reddit r/LocalLLaMA	Together AI discussions
SE070	Product Hunt	Together AI on Product Hunt
SE071	StackShare	Together AI Tech Stack
SU001	Together AI	Together AI — The AI Acceleration Cloud
SU002	Together AI	About \| Together AI
SU003	Together AI	Customers \| Together AI
SU004	Together AI	Together AI Blog
SU005	Together AI	Pricing \| Together AI
SU006	Together AI	Together AI Startup Accelerator
SU007	Together AI	Together AI at NVIDIA GTC 2025
SU008	Together AI	Together AI x Adaption partnership
SU009	Together AI	AI Native Conf — research & product announcements
SU010	Together AI	Salesforce customer case study
SU011	Together AI	Zoom customer case study
SU012	Together AI	Pika customer case study
SU013	Together AI	Arcee customer case study
SU014	Together AI	Nous Research customer case study
SU015	Together AI	Cartesia customer case study
SU016	Together AI	Washington University customer case study
SU017	Salesforce Ventures	Salesforce Ventures Perspectives
SU018	NVIDIA	NVIDIA AI investments 2024
SU019	HuggingFace	HuggingFace x Together AI partnership
SU020	HuggingFace	togethercomputer on Hugging Face
SU021	Together AI	Together AI Blog (apex)
SU022	Reddit r/LocalLLaMA	Together AI discussions
SU023	Hacker News	Submissions from together.ai
SU024	Product Hunt	Together AI on Product Hunt
SU025	StackShare	Together AI Tech Stack
SU026	G2	Together AI — G2 reviews
SU027	Trustpilot	Together AI — Trustpilot reviews
SU028	Wikipedia	Together AI — Wikipedia
SU029	Crunchbase	Together AI — Crunchbase Profile
SU030	Fireworks AI	Fireworks AI — Production-grade LLM inference
SU031	Replicate	Replicate — Run models in the cloud
SU032	CNBC	Together AI raises $305 million at $3.3 billion valuation
SU033	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SU034	Fast Company	Together AI funding profile
SU035	TechCrunch	Together AI is worth $1.25B (March 2024 update)
SU036	VentureBeat	Together AI raises $305M for open-source GenAI
SU037	Gartner	Gartner AI Insights
SU038	Together AI	Introduction \| Together AI Docs
SU039	Together AI	Inference Models \| Together AI Docs
SU040	PyPI	together — Python package
SU041	GitHub	togethercomputer/together-python SDK
SU042	Together AI	Together AI status page
SR001	FTC	FTC: AI Companies — Uphold Your Privacy & Confidentiality Commitments
SR002	FTC	FTC launches inquiry into generative AI investments & partnerships
SR003	EUR-Lex	EU Regulation 2024/1689 (AI Act)
SR004	NIST	AI Risk Management Framework
SR005	US BIS	BIS export controls on advanced computing & foundation models
SR006	UK ICO	UK Information Commissioner — Our work on AI
SR007	OAIC (Australia)	OAIC guidance on privacy and AI products
SR008	The White House	Executive Order 14110 on Safe, Secure AI
SR009	CA Attorney General	California Consumer Privacy Act guidance
SR010	HHS	HIPAA sample BAA provisions
SR011	AICPA	SOC 2 reporting framework
SR012	Center for Democracy & Technology	CDT — AI policy & governance
SR013	CourtListener	NYT v Microsoft / OpenAI docket
SR014	CourtListener	Authors Guild v OpenAI docket
SR015	CourtListener	Getty Images v Stability AI docket
SR016	CoreWeave	CoreWeave — Specialized GPU Cloud
SR017	Groq	Groq — Fast AI inference
SR018	Cerebras	Cerebras — Wafer-Scale AI
SR019	Fireworks AI	Fireworks AI — Production-grade LLM inference
SR020	Replicate	Replicate — Run models in the cloud
SR021	Modal	Modal — Serverless AI infrastructure
SR022	Anyscale	Anyscale — Powered by Ray
SR023	Lambda Labs	Lambda — GPU Cloud for AI
SR024	Together AI	Together AI — The AI Acceleration Cloud
SR025	Together AI	Together AI Blog
SR026	Together AI	Pricing \| Together AI
SR027	Together AI	Together AI status page
SR028	Together AI	Together AI trust center
SR029	Together AI	Customers \| Together AI
SR030	Together AI	Introduction \| Together AI Docs
SR031	Together AI	Safety Models \| Together AI Docs
SR032	Hacker News	Submissions from together.ai
SR033	Reddit r/LocalLLaMA	Together AI discussions
SR034	CNBC	Together AI raises $305 million at $3.3 billion valuation
SR035	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SR036	VentureBeat	Together AI raises $305M for open-source GenAI
SR037	Fast Company	Together AI funding profile
SR038	Wikipedia	Together AI — Wikipedia
SV001	Gartner	Gartner AI Insights
SV002	Forrester	Forrester: Generative AI infrastructure landscape
SV003	IDC	IDC Worldwide AI Software Market Forecast 2024-2028
SV004	a16z	a16z — State of Generative AI in the Enterprise 2025
SV005	Bessemer Venture Partners	Bessemer: State of AI 2025
SV006	Menlo Ventures	Menlo Ventures: 2025 State of AI
SV007	PitchBook	Together AI — PitchBook profile
SV008	The Information	Together AI revenue 2025 reporting
SV009	Meritech Capital	Meritech SaaS comps table
SV010	PwC	PwC Global AI Study — Sizing the prize
SV011	Y Combinator	Y Combinator — Generative AI companies directory
SV012	SEC EDGAR	SEC EDGAR — Together AI search
SV013	SEC EDGAR	Navan S-1/A filing
SV014	SEC EDGAR	CoreWeave SEC filings (S-1 and post-IPO)
SV015	SEC EDGAR	Figma S-1 filings (comparable IPO)
SV016	SEC EDGAR	Snowflake 10-K filings (public SaaS comp)
SV017	SEC EDGAR	MongoDB 10-K filings (public infra comp)
SV018	CNBC	Together AI raises $305 million at $3.3 billion valuation
SV019	Bloomberg	Together AI Startup Raises Funds at $3.3 Billion Valuation
SV020	VentureBeat	Together AI raises $305M for open-source GenAI
SV021	Fast Company	Together AI funding profile
SV022	Wikipedia	Together AI — Wikipedia
SV023	Crunchbase	Together AI — Crunchbase Profile
SV024	Together AI	Together AI — The AI Acceleration Cloud
SV025	Together AI	Together AI Blog
SV026	Together AI	Pricing \| Together AI
SV027	Together AI	Customers \| Together AI
SV028	Together AI	About \| Together AI
SV029	CNBC	Figma starts trading on NYSE after IPO
SV030	CNBC	Navan files for IPO
SV031	TechCrunch	Sakana AI $135M Series B at $2.65B
SV032	NVIDIA	NVIDIA + Sakana AI partnership
SV033	CoreWeave	CoreWeave — Specialized GPU Cloud
SV034	Groq	Groq — Fast AI inference
SV035	Cerebras	Cerebras — Wafer-Scale AI
SV036	Fireworks AI	Fireworks AI — Production-grade LLM inference
SV037	Replicate	Replicate — Run models in the cloud
SV038	Modal	Modal — Serverless AI infrastructure
SV039	Anyscale	Anyscale — Powered by Ray
SV040	Lambda Labs	Lambda — GPU Cloud for AI
SV041	Hacker News	Submissions from together.ai
SV042	Reddit r/LocalLLaMA	Together AI discussions
SV043	Salesforce Ventures	Salesforce Ventures Perspectives
SV044	NVIDIA	NVIDIA AI investments 2024
SV045	Sacra	Together AI revenue, valuation & funding — Sacra analysis	Sacra estimates that Together AI hit $1B in annualized revenue in February 2026, up from ~$618M at the end of 2025, off growing demand for generative AI applications and the need, particularly among startups, for developer tooling used to train, fine-tune, and deploy AI models.
SV046	ARR.club	Together AI ARR milestones and revenue growth
SV047	EquityZen	Invest In Together AI Stock — Pre-IPO shares profile
SV048	CB Insights	State of AI Q1 2026 Report

Cover facts

Company profile

Executive summary

Top strengths

Top risks

Open gaps

Contents

1.1 Identity, Headquarters, and Product Frame

1.2 Founders, Leadership, and Governance

1.3 Funding History, Capital Stack, and Valuation

1.4 Scale, Cover Metrics, and Milestones

1.5 Exhibits

2.1 Market boundary and adjacencies

2.2 TAM/SAM/SOM and sizing lenses

2.3 Buyer, user, and payer segmentation

2.4 Growth drivers and constraints

2.5 Exhibits

3.1 Competitive landscape segmentation

3.2 Capability and feature comparison

3.3 Moat durability and competitive risk

3.4 Exhibits

4.1 Funding history and capital stack

4.2 Revenue, pricing, and reported scale

4.3 Unit economics, capital adequacy, and gaps

4.4 Exhibits

5.1 Product surface, modules, and SKUs

5.2 Architecture, dependencies, and operating model

5.3 Trust, security, compliance, and roadmap

5.4 Exhibits

6.1 Customer segmentation and adoption surface

6.2 Named-customer proof and durability

6.3 Expansion, concentration, and adverse signals

6.4 Exhibits

7.1 Regulatory and legal risk surface

7.2 Operational, security, partner, and dependency risk

7.3 Mitigations, kill criteria, and thesis-break triggers

7.4 Exhibits

8.1 Recommendation, thesis, and anti-thesis

8.2 Scenarios, comparables, and sensitivity

8.3 Thesis-break triggers, diligence asks, and KPIs

8.4 Exhibits

Disclaimer

Evidence index