Together AI
Open-model inference cloud with credible technical moat and enterprise traction, priced near its Series B mark
Together AI shows credible inference-cloud product and traction at a Series B valuation that requires multi-year ARR scale to underwrite a strong exit.
Cover facts
Company profile
Together AI is a generative-AI cloud that runs serverless and dedicated inference, fine-tuning, and training across 200+ open and custom models, anchored by FlashAttention, ThunderKittens, and Together Inference Engine v2. The company combines a defensible technical research base with a Salesforce + NVIDIA channel and an open-source community surface.
- Website
- www.together.ai
- Founded
- 2022-06-01
- Founders
- Vipul Ved Prakash, Ce Zhang, Tri Dao, Percy Liang
- Founding location
- San Francisco, California, USA
- Headquarters
- San Francisco, California
- Product
- Together AI sells serverless inference (per-token), dedicated endpoints (reserved GPU capacity), fine-tuning (LoRA + full), batch inference, embeddings, vision, audio, and image APIs across a 200+ open and custom model catalog, all OpenAI-compatible.
- Customers
- Developers (self-serve), AI-native startups (Pika, Cartesia, Arcee, Nous Research), enterprise SaaS (Salesforce, Zoom), healthcare (Adaption), academia (Washington University), and NVIDIA GTC 2025 Pioneers cohort.
- Business model
- Usage-based serverless inference + committed dedicated capacity + fine-tuning + enterprise contracts; Salesforce Ventures co-sell and Startup Accelerator augment direct sales.
- Stage
- late-stage private
- Funding status
- Privately funded; Series A $102.5M (Nov 2023, Kleiner Perkins led) and Series B $305M (Mar 2024, Salesforce Ventures led, ~$3.3B post-money per CNBC / Bloomberg / Fast Company); investors include NVIDIA, Coatue, Lux Capital, Prosperity7, General Catalyst.
Executive summary
Top strengths
- Technical moat anchored by FlashAttention (Tri Dao), ThunderKittens (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation.
- Anchor channel partners (Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, Startup Accelerator) plus a 200+ open-model catalog give wide enterprise + developer reach.
- Documented enterprise + startup proof base spanning Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, Washington University, and Adaption healthcare.
Top risks
- Hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure OpenAI) could compress pricing 30-50% over 2026-2027.
- NVIDIA single-vendor concentration on GPUs, networking, and stack would cap revenue ramp if Blackwell allocation tightens.
- GenAI regulatory perimeter (EU AI Act, BIS export controls, FTC inquiry) and copyright-litigation precedent (NYT, Authors Guild, Getty) widen through 2027.
Open gaps
- Exact ARR, NRR / GRR, top-10 customer concentration, GPU committed spend, and opex split (R&D / S&M / G&A) are undisclosed.
- CFO and CRO presence at runDate is not publicly confirmed.
- SLA percentage, incident history, pen-test cadence, and breach plan are not disclosed beyond the public status page.
- Sovereign-channel posture (Prosperity7-adjacent) and OSS hosting policy under tightening copyright precedent require management disclosure.
Contents
01Company Overview
1.1 Identity, Headquarters, and Product Frame
Together AI markets itself as “the AI acceleration cloud,” offering training, fine-tuning, and inference for open-source and custom large language, image, audio, and vision models. The corporate entity, Together Computer Inc., is headquartered in San Francisco, California, with a satellite presence in Menlo Park and additional research staff in Zurich; the careers page and contact surface confirm both locations and an active hiring posture across infrastructure, kernel, GPU, applied-ML, and revenue roles. The company was incorporated on 27 June 2022 by four co-founders with deep ties to Stanford, Princeton, ETH Zürich, and the broader open-source LLM research community. Its identity rests on three pillars: a hyperscale GPU cloud purpose-built for AI workloads, an open-source research arm (RedPajama, OpenChatKit, StripedHyena, FlashAttention, Mixture-of-Agents), and a self-service inference and fine-tuning API competitive with OpenAI’s and Anthropic’s but priced for open models. The company emphasises that customers can keep weights, control data residency, and dedicate clusters when needed, which is the principal contrast with closed-API competitors.[CO001, CO002, CO003, CO004, CO005]
| Metric | Value/status | Date | Confidence | Gap or diligence ask |
|---|---|---|---|---|
| Post-money valuation | $3.3B | 2024-07-09 | high | Confirm 2026 secondaries or new round |
| Total primary capital raised | ≈$533M disclosed | 2024-07 | high | Verify any post-Jul-2024 extensions |
| Annualised revenue | ≈$100M (third-party report) | 2024-07 | medium | No audited filing; request management figure |
| Headcount | >150 (job board derived) | 2026-05 | medium | No filing; request HR roster |
| GPU footprint | >20,000 NVIDIA Hopper-class | 2024-07 | medium | Confirm Blackwell additions and utilisation |
| Customer count | 100,000+ developers (company-claimed) | 2024 | low | Distinguish paying vs free; verify NRR |
| HQ | San Francisco, CA | 2026-05 | high | — |
| Founding date | 27 June 2022 | 2022 | high | — |
Values mix company disclosure (high), third-party reporting (medium), and inferred figures (low); paid-customer count and ARR are unaudited and must be validated with management.
[CO019, CO020, CO021, CO022, CO023, CO024]How identity, product, capital, and customers connect.
[CO001, CO003, CO005, CO017, CO020, CO021]1.2 Founders, Leadership, and Governance
CEO Vipul Ved Prakash was previously co-founder/CTO of Topsy (acquired by Apple for ~$200M in 2013) and an early principal at Cloudmark, giving him both consumer-scale ML and infrastructure operating experience. CTO Ce Zhang is a tenured professor at ETH Zürich and Together’s research lead on distributed training systems and data-centric ML. Chief Scientist Chris Ré is the MacArthur-winning Stanford professor behind Snorkel and many of the FlashAttention/Hyena lines of work; Percy Liang, Stanford CRFM director, is co-founder and an advisor. The leadership bench has expanded with a head of revenue, head of GPU infrastructure, head of inference engineering, and a Zurich-based research head; the board includes investor partners from Coatue, Kleiner Perkins, NEA, and Lux. Key-person dependence is concentrated in Prakash for commercial execution and in the founding research trio for technical credibility, particularly given the open-source flywheel that drives much of Together’s top-of-funnel.[CO006, CO007, CO008, CO009, CO010, CO011]
| Person | Role | Background | Founder-market fit | Key-person dependency |
|---|---|---|---|---|
| Vipul Ved Prakash | Co-founder, CEO | Previously co-founder/CTO Topsy (acquired by Apple 2013), Cloudmark co-founder | Repeat infrastructure/consumer-ML founder with operating exit | High — sole CEO and primary commercial face |
| Ce Zhang | Co-founder, CTO | Tenured professor ETH Zürich; distributed training & data-centric ML research lead | Deep systems/ML research credibility | High — only CTO; bridges research & engineering |
| Chris Ré | Co-founder, Chief Scientist | MacArthur Fellow; Stanford CS; Snorkel co-founder; FlashAttention/Hyena lineage | Authored or advised most open-source IP | High — anchors research brand |
| Percy Liang | Co-founder | Director Stanford CRFM; HELM benchmark lead | Sets research agenda & academic credibility | Medium — advisory not full-time operational |
| Tri Dao | Chief Scientist (research) | FlashAttention author; Princeton CS faculty | Inference-kernel authority | High — drives kernel performance lead |
| Head of Revenue | Sales leadership (publicly listed roles) | Enterprise SaaS background | Required for enterprise expansion | Medium — multiple sales hires already |
| Head of GPU Infrastructure | Cluster engineering | Prior hyperscaler experience (job board) | Crucial for SLA & cost | Medium — recruiting actively |
Founder bios cross-verified against official about page and Wikipedia; non-founder executives derived from careers postings and public LinkedIn footprints at runDate.
[CO006, CO007, CO008, CO009, CO010, CO011]1.3 Funding History, Capital Stack, and Valuation
Together AI raised a $20M seed in May 2023 led by Lux Capital with Factory, SciFi, Long Journey, and individual backers including Scott Banister, Jakob Uszkoreit, and Aravind Srinivas. A $102.5M Series A followed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. In March 2024 the company added approximately $106M at a reported $1.25B valuation, then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation; Lakestar, NVIDIA, and an expanded set of strategics also participated. Cumulative disclosed primary capital is therefore approximately $533M before any 2025/2026 extensions, with no public S-1 filing or registered offering on EDGAR as of the run date. The investor mix — sovereign-aligned (Prosperity7), strategic GPU supplier (NVIDIA), category-defining cloud customer (Salesforce Ventures) and tier-1 financials (Coatue/KP/Lux) — is unusual and suggests Together is being positioned as a neutral, multi-stakeholder backbone for the open-model market.[CO012, CO013, CO014, CO015, CO016, CO017]
| Stakeholder | Role | Round(s) | Control/economic importance | Diligence ask |
|---|---|---|---|---|
| Salesforce Ventures | Lead Series B (2024) | B | Strategic distribution into Salesforce ecosystem | Confirm any commercial commit or revenue share |
| Coatue | Co-lead Series B | B | Public-market crossover signaling | Confirm pro-rata posture |
| Kleiner Perkins | Lead Series A | Seed/A/B | Board seat; partner Bucky Moore | Confirm board composition |
| NVIDIA | Strategic investor | A/B | Allocation of H100/H200/B200 supply | Quantify supply commitment & pricing |
| Lux Capital | Lead Seed | Seed/A | Earliest institutional backer | Confirm board observer rights |
| Emergence Capital | Series A | A | Enterprise-SaaS network | — |
| Prosperity7 (Aramco) | Series A | A | Sovereign-aligned capital; Middle East GTM | Confirm any sovereign-cloud commitments |
| NEA, Greycroft, SciFi, Factory, Long Journey, Definition, Long Journey | Co-investors | Seed/A/B | Round support | — |
| Founders & employees | Common stock | — | Reported >25% retained based on Series A press | Confirm cap table post-Series B |
Cap table figures sourced from press releases at funding events; secondary sales not disclosed at runDate.
[CO012, CO013, CO014, CO015, CO016, CO017]Founding to Series B + flagship research drops.
[CO014, CO015, CO016, CO017, CO021, CO022]1.4 Scale, Cover Metrics, and Milestones
Public scale metrics remain partial. The company has stated it operates more than 20,000 NVIDIA Hopper-class GPUs across multiple regions, with public roadmap notes referencing Blackwell rollouts, and serves "hundreds of thousands" of developers via the Together API, but it has not disclosed audited ARR, gross margin, paid-developer counts, or net revenue retention. CNBC reported a $100M annualised revenue pace around the Series B; Bloomberg cited triple-digit growth without a specific number. Reported headcount tracks above 150 globally, with active openings spanning kernel, networking, ML, and sales. The milestone timeline anchors founding (June 2022), seed (May 2023), RedPajama 1T dataset (April 2023), OpenChatKit (March 2023), Series A (November 2023), FlashAttention-3 (July 2024), Series B at $3.3B (July 2024), and StripedHyena/Mixture-of-Agents research (late 2023–2024). No adverse litigation, layoffs, or regulatory action has been reported through the run date, but key cover metrics (gross margin, ARR confirmation, customer concentration) remain undisclosed and are reflected in the snapshot KPI table.[CO019, CO020, CO021, CO022, CO023, CO024]
| Date | Event | Type | Amount/valuation/status | Participants | Implication |
|---|---|---|---|---|---|
| 2022-06-27 | Together Computer Inc. incorporated | founding | active | Prakash, Zhang, Ré, Liang | Identity established |
| 2023-03-10 | OpenChatKit launched | product | released | Together + LAION + Ontocord | Open-source instruct-tuning baseline |
| 2023-04-17 | RedPajama 1T dataset released | product | released | Together + EleutherAI + LAION | Foundational open dataset (1T tokens) |
| 2023-05-15 | $20M seed announced | financing | closed | Lux + Factory + SciFi | Institutional launch capital |
| 2023-11-29 | $102.5M Series A | financing | closed at undisclosed valuation | Kleiner Perkins (lead), NVIDIA, NEA, Emergence | Scale-up & H100 build-out |
| 2024-03-13 | Reported interim raise at $1.25B | financing | reported | Existing investors | Mid-cycle uplift |
| 2024-07-09 | $305M Series B at $3.3B post | financing | closed | Salesforce Ventures + Coatue (co-leads), NVIDIA, Lakestar | 3x valuation step-up; enterprise pivot |
| 2024-07-11 | FlashAttention-3 paper & blog | product | released | Dao et al. | State-of-the-art H100 inference kernel |
| 2024-09 | Together Inference Engine 2.0 | product | released | Together engineering | Latency / throughput leadership claim |
| 2023-12 | StripedHyena-Nous-7B | product | released | Together + Nous Research | Non-attention long-context architecture |
| 2024-06 | Mixture-of-Agents paper | product | released | Together research | Agentic LLM technique |
| 2024-Q4 | Dedicated Endpoints GA | product | released | Together engineering | Enterprise inference offer |
No reported adverse events (litigation, layoffs, regulatory action) at runDate; absence of adverse events is itself a diligence finding pending background check.
[CO019, CO020, CO021, CO022, CO023, CO024]IC-ready snapshot of maturity, traction, and capital.
[CO019, CO021, CO022, CO024, CO032]1.5 Exhibits
02Market Analysis
2.1 Market boundary and adjacencies
Together AI sits in the AI compute and inference platform layer of the modern cloud stack — between hyperscaler GPU IaaS (AWS, GCP, Azure), specialised GPU clouds (CoreWeave, Lambda, Crusoe), inference-API providers (Replicate, Fireworks, Groq, Modal), and closed-API model labs (OpenAI, Anthropic). The market we underwrite is the spend dedicated to running, fine-tuning, and serving open-weight or customer-owned foundation models, plus the dedicated and serverless GPU capacity used for AI workloads. Excluded from this market are general-purpose cloud compute, traditional ML platforms (Sagemaker training-only, classical scikit pipelines), and closed proprietary model APIs that do not host customer weights. Adjacencies include MLOps tooling (Weights & Biases, Anyscale), vector databases, and AI safety/observability vendors. Status-quo substitutes are self-hosted Kubernetes-on-GPU clusters and closed-API rentals from OpenAI/Anthropic, both of which trade flexibility for price and operational simplicity. We also explicitly exclude per-seat AI copilots (Copilot, Cursor) because the unit of demand is end-user seats rather than inference tokens, which means they sit one layer above Together in the application stack and procure rather than substitute for token-level inference.[CM001, CM002, CM003, CM004, CM005, CM006]
| Segment | Included spend | Excluded spend | Buyer/payer | Relevance to Together |
|---|---|---|---|---|
| Open-weight model inference (API) | Per-token serverless inference on Llama/Mistral/Qwen/DeepSeek | Closed-API tokens (OpenAI/Anthropic) | Developer + CTO | Core SOM |
| Dedicated GPU capacity | Reserved H100/H200/B200 endpoints | General-purpose cloud compute | Platform team | Direct expansion ARR |
| Fine-tuning + custom model hosting | LoRA, full fine-tune, custom checkpoint hosting | In-house Kubernetes training | ML engineering lead | High-margin attach |
| Batch inference + training | Multi-million-token batch jobs, pretraining runs | Closed training-only platforms | Research lead | Growth wedge |
| Sovereign / regional clusters | Dedicated in-region capacity | Public-region multi-tenant | Government / regulated CIO | Differentiated lane |
| MLOps + observability | Logs, evals, fine-tune jobs | BI/analytics | MLOps lead | Adjacency, not core |
| Closed-API model rentals | OpenAI/Anthropic API spend | — | App developer | Substitute / pressure |
Boundary anchors on customer ownership of weights and on GPU-backed compute as the billable unit; excludes general-purpose cloud and closed-only APIs.
[CM001, CM002, CM003, CM004, CM005, CM006]2.2 TAM/SAM/SOM and sizing lenses
Multiple analyst sources converge on a 2024 AI infrastructure TAM of $40–60B with 30–50% CAGR through 2028 (Gartner, IDC, McKinsey). Within that envelope, the inference and dedicated AI compute SAM most relevant to Together is sized at $8–15B in 2026 by triangulating: hyperscaler AI revenue disclosures ($26B annualised AWS Bedrock-equivalent revenue extrapolated), Series B coverage citing inference as the fastest-growing line, and the ~$100M Together ARR proxy implying a single-digit share of an early SAM. SOM (Together-addressable, near-term winnable spend) is on the order of $1–3B, focused on AI-native startups, model labs, and the Salesforce + sovereign-cloud channels Together has explicit relationships with. Sizing is constrained by the lack of disaggregated public reporting from hyperscalers and by the conflation of training capex with inference run-rate in many published estimates.[CM007, CM008, CM009, CM010, CM011, CM012]
| Publisher | Year | Geography | Value | CAGR | Methodology | Confidence | Limitation |
|---|---|---|---|---|---|---|---|
| Gartner | 2024 | Global | $40–60B AI infrastructure TAM | 30–50% | Top-down survey of hyperscaler + enterprise AI spend | medium | Aggregates training+inference; not disaggregated |
| IDC (cited via secondary) | 2024 | Global | $50B AI infrastructure 2024 | 35% | Hardware + cloud forecast | low | Indirect citation |
| McKinsey AI spend report | 2024 | Global | $50–100B 2027 AI infra | 40% | Scenario analysis | low | Wide range; assumptions unclear |
| Triangulated SAM (this report) | 2026 | Global | $8–15B inference + dedicated SAM | — | Bottoms-up from CNBC ARR + hyperscaler disclosures | medium | Single-source dependence on hyperscaler quarterlies |
| Triangulated SOM (this report) | 2026 | Global | $1–3B Together-addressable | — | Channel + Together $100M ARR | medium | High estimation uncertainty |
| NVIDIA earnings (data centre) | 2025-Q1 | Global | >$30B/qtr DC revenue | >50% | Public filings | high | Includes training capex sales, not pure inference |
TAM/SAM/SOM are bounded; ranges preserved because no single public source disaggregates inference spend cleanly.
[CM007, CM008, CM009, CM010, CM011, CM012]TAM/SAM/SOM for Together-addressable AI compute.
[CM007, CM008, CM011, CM012, CM036]Inference SAM 2026 estimates.
[CM009, CM010]2.3 Buyer, user, and payer segmentation
Three primary buyer segments drive Together demand. (1) AI-native startups and model labs: technical founders or CTOs choose Together for FlashAttention-class inference latency, dedicated H100/H200 access, and open-weight flexibility; these are typically self-serve credit-card purchasers escalating to enterprise contracts. (2) Enterprise platform teams and applied-ML groups inside Fortune-500 companies: budget owners are CIOs/CTOs evaluating multi-model strategies, with procurement gates around data residency, SOC 2, and BAA support; Salesforce Ventures co-leadership of the Series B underwrites this segment. (3) Government, research, and sovereign-cloud customers: Prosperity7 (Aramco) and similar sovereign-aligned LPs signal a Middle East/APAC angle, and Together has positioned dedicated regional clusters as a differentiator. Users (developers, ML engineers, researchers) often differ from payers (finance, procurement, IT), which lengthens enterprise cycles but improves NRR once landed.[CM014, CM015, CM016, CM017, CM018, CM019]
| Segment | Buyer | User | Payer | Workflow | Budget owner | Adoption trigger |
|---|---|---|---|---|---|---|
| AI-native startup | CTO | ML engineer | Founder/CFO | Self-serve API + LoRA | CTO | Need open weights + dedicated GPUs |
| F500 platform team | CIO | Applied ML | IT procurement | RFP + dedicated endpoints | CIO | Multi-model strategy + BAA |
| Sovereign cloud | Minister/CIO | Government ML | Treasury | In-region dedicated capacity | Government | Data residency mandate |
| Model lab | Founder | Researcher | Founder | Reserved training + inference | Founder | GPU scarcity at hyperscalers |
| Independent dev | Self | Self | Self | Per-token API | Self | Free tier + pricing parity |
| Salesforce ecosystem ISV | Product VP | Eng team | Product P&L | Embedded GenAI | Product VP | Salesforce Ventures channel |
Buyer/user/payer split distinguishes self-serve credit-card adoption from enterprise procurement gates.
[CM014, CM015, CM016, CM017, CM018, CM019]Adoption maturity by segment.
[CM014, CM015, CM016, CM017, CM018, CM019]2.4 Growth drivers and constraints
Tailwinds: ongoing open-weight model proliferation (Llama 3/4, Mistral, DeepSeek, Qwen), GPU scarcity at hyperscalers, FinOps pressure to reduce per-token closed-API spend, and the agentic AI wave that multiplies token-volume per user. Headwinds: NVIDIA supply allocation favouring hyperscalers, sovereign data rules slowing cross-border inference, energy/permitting bottlenecks for new data centres, and competitive pricing pressure from Groq, Fireworks, and Cerebras at the inference layer. Adoption-timing risks include enterprise procurement friction, the possibility that hyperscalers commoditise the OSS-inference layer (AWS Bedrock open models, GCP Vertex Model Garden), and the volatile economics of training-vs-inference mix. Together's positioning depends on staying a generation ahead on inference kernels (FlashAttention 3/4, ThunderKittens) while expanding into reserved/dedicated SKUs that lock enterprise spend. Each driver and constraint feeds back into a binary question for IC: does the inference SAM compound at 35%+ for three more years, or does hyperscaler commoditisation pull growth forward into a single year of land-grab? Our base case assumes durable 30–40% CAGR through 2027 with widening competitive intensity from 2026 onward, which is the regime in which Together's open-source flywheel and dedicated-capacity differentiation produce the strongest IRR.[CM021, CM022, CM023, CM024, CM025, CM026]
| Driver/constraint | Direction | Timing | Implication | Diligence ask |
|---|---|---|---|---|
| Open-weight model proliferation | + | 2024-2027 | Sustains SAM growth >35% CAGR | Track Llama 4/5, DeepSeek, Qwen release cadence |
| NVIDIA Hopper/Blackwell scarcity | + | 2024-2026 | Drives Together's reserved capacity premium | Quantify Together NVIDIA allocation pact |
| Closed-API price pressure (OpenAI cuts) | - | Ongoing | Compresses per-token margin | Track Together pricing parity vs OpenAI |
| Hyperscaler open-model commoditisation | - | 2025-2027 | Erodes pure-inference SAM | Watch AWS Bedrock & Vertex Model Garden expansion |
| Sovereign data residency rules | +/- | 2025+ | Creates regional moats but caps cross-border ARR | Confirm Together in-region clusters |
| Energy/permitting bottlenecks | - | 2026-2028 | Slows capacity expansion | Confirm Together DC contracts |
| Agentic workloads multiply tokens | + | 2025+ | Increases inference volume per user | Track MoA + agent SDK adoption |
| FinOps push to OSS inference | + | 2025+ | Tailwind for Together vs closed APIs | Survey enterprise FinOps strategy |
Drivers cited from multiple analyst notes and partner statements; constraints triangulated from supply-chain reporting and hyperscaler announcements.
[CM021, CM022, CM023, CM024, CM025, CM026]Discovery to expansion path.
[CM020, CM021, CM022, CM033]2.5 Exhibits
03Competitors
3.1 Competitive landscape segmentation
Together competes across five overlapping arenas. (1) Hyperscaler open-model offerings — AWS Bedrock and Google Vertex Model Garden host the same Llama/Mistral checkpoints Together offers, bundled with enterprise contracts and IAM. (2) Specialised GPU clouds — CoreWeave, Lambda Labs, and TensorWave compete for raw GPU-hour and reserved capacity; they typically lack the inference SaaS layer Together overlays. (3) Inference-API peers — Fireworks, Replicate, Modal, and Anyscale provide near-direct substitutes at the per-token serverless layer; Fireworks is most frequently cited as Together's closest direct rival. (4) Bespoke-silicon inference vendors — Groq (LPU), Cerebras (wafer-scale), and SambaNova compete on latency and price/token at the cost of model coverage. (5) Closed-API model labs — OpenAI and Anthropic act as substitutes for buyers willing to give up weight portability. The status-quo alternative is self-hosted Kubernetes-on-GPU, which trades flexibility for operational burden; internal-build is most common at frontier labs and FAANG. The competitive set is unusually broad because Together sits at the intersection of compute, model hosting, and developer experience; each arena exposes Together to different cost structures (capex-heavy GPU clouds vs OpEx-light API providers), different distribution power (hyperscaler procurement vs developer self-serve), and different exit dynamics (consolidation among GPU clouds vs commoditisation among API peers), all of which we underwrite separately below.[CP001, CP002, CP003, CP004, CP005, CP006]
| Competitor | Category | Scale/funding | Target segment | Differentiation | Limitation |
|---|---|---|---|---|---|
| AWS Bedrock | Hyperscaler open-model | >$80B AWS revenue | Enterprise | IAM, compliance, bundling | Per-token premium, slower model adds |
| GCP Vertex Model Garden | Hyperscaler open-model | ~$30B GCP revenue | Enterprise | Gemini + open models | Less open-weight depth |
| CoreWeave | Specialised GPU cloud | >$8B raised; public 2025 | AI labs, hyperscaler offload | Largest non-hyperscaler GPU fleet | No inference SaaS layer |
| Lambda Labs | GPU cloud | $320M Series C | Researchers, startups | On-demand H100/H200 | Smaller fleet vs CoreWeave |
| Fireworks AI | Inference API peer | >$77M raised | Devs, startups | OpenAI-compatible API | Smaller OSS-research footprint |
| Replicate | Inference API peer | >$40M raised | Indie devs | Community models, low friction | Cold-start latency |
| Modal | Serverless infra | >$80M raised | ML eng | Python-native serverless | Less model breadth |
| Anyscale | Ray-based platform | >$250M raised | ML eng | Ray + LLM tooling | OSS-platform tax |
| Groq | Bespuke silicon | >$1B raised | Latency-sensitive devs | LPU inference speed | Limited model coverage |
| Cerebras | Bespoke silicon | >$1B raised; IPO filed | Frontier customers | Wafer-scale chip | High per-deployment cost |
| OpenAI / Anthropic (substitute) | Closed API | >$30B / $10B raised | Enterprise + devs | Frontier closed models | No weight portability |
| TensorWave | AMD GPU cloud | Seed-stage | Cost-sensitive devs | MI300X capacity | Limited scale |
Funding and scale figures sourced from public press releases and Crunchbase summaries; some private funding rounds rely on third-party reporting.
[CP001, CP002, CP003, CP004, CP005, CP006]3.2 Capability and feature comparison
On capability axes Together leads on FlashAttention-3/4 kernel performance, open-weight model breadth (Llama, Mistral, DeepSeek, Qwen, custom checkpoints), and dedicated-endpoint flexibility. Hyperscalers lead on enterprise compliance breadth (BAA, FedRAMP, regional residency) and bundled identity/billing. Groq leads on raw single-stream latency on supported models but lags on model coverage. Fireworks closely matches Together on serverless open-model APIs but has lower OSS-research visibility. Pricing comparison shows Together's serverless rates clustered around the OpenAI-parity envelope (≈$0.20–$0.90/M input tokens for 7–70B models) with batch discounts up to 50%; CoreWeave/Lambda undercut on raw GPU-hour but require customer DevOps; AWS Bedrock charges a per-token premium on top of underlying compute. Feature matrices below mark unsupported cells as unknown rather than guessing. The matrix shows Together winning on open-weight breadth and kernel performance, hyperscalers winning on compliance and IAM, and bespoke-silicon vendors winning on latency at the cost of model coverage; no single vendor dominates the four most cited buying criteria simultaneously. We also note that Together is one of only two vendors in the set that ships an OpenAI-compatible chat completions endpoint while also exposing fine-tune and batch SKUs, which materially shortens migration time for buyers leaving closed APIs.[CP009, CP010, CP011, CP012, CP013, CP014]
| Buying criterion | Together | Bedrock | GCP Vertex | Fireworks | Groq | CoreWeave |
|---|---|---|---|---|---|---|
| Open-weight model breadth | high | medium | medium | high | low | n/a |
| FlashAttention-class kernel perf | high | unknown | unknown | high | medium | n/a |
| Dedicated endpoints / reserved | yes | yes (provisioned) | yes | yes | yes | yes (raw) |
| Fine-tuning API | yes | partial | yes | yes | no | no |
| Batch inference SKU | yes | partial | yes | partial | no | no |
| Compliance (SOC2/HIPAA/FedRAMP) | SOC2; HIPAA via BAA | full | full | SOC2 | unknown | SOC2 |
| Sovereign / regional clusters | available | full | full | limited | unknown | full |
| OpenAI-compatible API | yes | no | no | yes | yes | no |
| Per-token list pricing transparency | high | medium | medium | high | high | n/a |
| Multi-modal (vision/audio/image) | yes | partial | yes | partial | no | n/a |
Cells marked "unknown" where public docs do not disclose; cells marked "n/a" where the feature is outside the competitor's SKU.
[CP009, CP010, CP011, CP012, CP013, CP014]| Vendor | SKU | Price/unit | Discount | Notes |
|---|---|---|---|---|
| Together | Serverless Llama-70B | $0.88/M tokens | — | OpenAI-parity envelope |
| Together | Batch inference | -50% off serverless | batch | Updated 2025 |
| Together | Dedicated endpoint | custom | reserved | Quoted via sales |
| Fireworks | Serverless Llama-70B | $0.90/M tokens | — | Similar parity |
| Replicate | Per-second | varies | — | GPU-second billing |
| AWS Bedrock | Llama 3 70B | $0.99/M output tokens | vol | Provisioned reserved option |
| GCP Vertex | Llama 3 70B | $0.99/M | vol | Similar to Bedrock |
| Groq | Llama 3 70B | $0.59/M | — | Latency premium |
| CoreWeave | GPU-hour | $2–4/H100-hr | reserved | Customer manages stack |
| Lambda | GPU-hour | $2.79/H100-hr | on-demand | Customer manages stack |
Per-token prices reflect public list pricing on vendor sites at runDate; realised pricing for enterprise deals is undisclosed.
[CP016, CP017, CP018, CP019, CP020]Open-weight breadth vs enterprise compliance maturity.
[CP001, CP009, CP011, CP012, CP013]Capability strength by competitor.
[CP010, CP014, CP015, CP018, CP029]3.3 Moat durability and competitive risk
Together's defensible moats are (a) FlashAttention research lineage and kernel velocity (with Tri Dao + Chris Ré), (b) open-source community gravity (RedPajama, StripedHyena, MoA), and (c) the NVIDIA + Salesforce + sovereign capital stack that secures GPU supply and enterprise distribution. Switching costs are mid: customers can multi-home across Together / Fireworks / Bedrock with API translation; however, dedicated-endpoint contracts and fine-tuned model artifacts on Together raise stickiness. Distribution power tilts to hyperscalers — they own enterprise procurement and identity — but Together's neutrality and open-weight commitment is a counter-positioning differentiator. Adverse competitor evidence: CoreWeave's 2024 IPO filings and Lambda's growth signal substantial capital advantage at the IaaS layer; Groq and Cerebras have raised >$1B each at higher valuations; Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity workloads. Commoditisation risk is real but bounded by Together's research velocity and dedicated-capacity contracts. Net, we believe the moat is durable through 2027 in the dedicated and high-performance segments, with growing pressure on the commodity serverless tier from hyperscalers and on the latency-critical tier from bespoke silicon vendors; the company's ability to sustain kernel and architecture leadership is the gating variable for the moat thesis and is therefore the principal item on the technical-diligence checklist.[CP021, CP022, CP023, CP024, CP025, CP026]
| Moat claim | Threat | Severity | Mitigation/diligence ask |
|---|---|---|---|
| FlashAttention research lineage | Open-source diffusion to competitors | medium | Track Together's patent/IP posture |
| Open-source community gravity | Competing OSS projects from Mistral/HF | medium | Quantify Together GH/HF traction over time |
| NVIDIA supply alignment | NVIDIA tilts to hyperscalers | high | Document Together NVIDIA pact |
| Salesforce / enterprise channel | Salesforce develops its own AI infra | medium | Confirm Salesforce commercial commit |
| Sovereign capital + regional clusters | Sovereign customers go direct to local clouds | medium | Map Together regional DC footprint |
| Dedicated endpoint stickiness | Bedrock provisioned throughput parity | high | Track Bedrock open-model price moves |
| Open-weight neutrality | Enterprise wants closed-API simplicity | medium | Survey enterprise multi-model strategy |
| Inference engine performance lead | Specialised silicon (Groq/Cerebras) leapfrogs | high | Benchmark Together vs Groq on shared models |
Moats ranked by exposure to competitive substitution and capital intensity; each row has a concrete diligence ask.
[CP021, CP022, CP023, CP024, CP025, CP026]Compact competitive durability summary.
[CP021, CP022, CP023, CP024, CP025, CP026]3.4 Exhibits
04Financials
4.1 Funding history and capital stack
Together AI has assembled approximately $533M of disclosed primary capital across four publicly announced rounds. The seed of $20M (May 2023) was led by Lux Capital with Factory, SciFi, Long Journey, and notable individual backers (Scott Banister, Jakob Uszkoreit, Aravind Srinivas). A $102.5M Series A in November 2023 was led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft. In March 2024 the company added approximately $106M at a reported $1.25B valuation (sometimes referred to as Series A2), then closed a $305M Series B in July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation, with Lakestar, NVIDIA, and several strategics participating. No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the runDate, and no public secondary or 2026 extension has been confirmed. The capital stack therefore reads as venture-only with strategic anchors (NVIDIA for GPU supply, Salesforce Ventures for enterprise distribution, Prosperity7 for sovereign optionality); board control is split across KP, Coatue and Lux based on round-led signaling, but the cap table itself is not public. Cumulative dilution is undisclosed; founders are widely reported to retain a meaningful equity stake post Series B, but exact percentages are not in the public record and must be verified against management.[CI001, CI002, CI003, CI004, CI005, CI006]
| Capital primitive | Value | Date | Public status | Diligence ask |
|---|---|---|---|---|
| Cumulative primary capital | ~$533M | 2024-07 | disclosed (round level) | — |
| Cash on hand | undisclosed | — | missing | Request cash position |
| Monthly burn | undisclosed (~$15-25M implied) | 2024-25 | missing | Request actual burn |
| Runway months | undisclosed (likely 18-30 implied) | 2025 | missing | Request runway plan |
| Planned use of funds | undisclosed | — | missing | Request capex plan |
| Next-round trigger | undisclosed | — | missing | Request milestones |
| Debt / project finance | undisclosed | — | missing | Request facility terms |
| Vendor financing (NVIDIA) | undisclosed | — | missing | Confirm any equipment financing |
| Series B valuation | $3.3B post | 2024-07-09 | disclosed | — |
| Latest secondary clearing price | undisclosed | — | missing | Pitchbook / Information chatter |
Capital primitives mix disclosed round amounts with undisclosed forward-looking financial primitives.
[CI007, CI023, CI026, CI027, CI030, CI031]Capex and operating cash flow mapped against funding rounds.
Cash balance and next-round trigger are undisclosed; arrows illustrate direction not magnitude.
[CI007, CI026, CI027, CI030]4.2 Revenue, pricing, and reported scale
Together has not filed financial statements. CNBC reporting around the July 2024 Series B cited a $100M annualised revenue pace; Bloomberg cited "triple-digit growth"; Fast Company and VentureBeat repeated those figures without independent verification. The Information has separately reported on 2025 revenue trajectory behind a paywall; PitchBook lists the company as later-stage venture without a confirmed 2025 follow-on. Pricing is published per-token on the public pricing page, ranging roughly $0.20–$0.90/M tokens for 7–70B open models, with a documented 50% batch-inference discount and custom dedicated-endpoint pricing quoted via sales. SKUs include serverless, dedicated/reserved endpoints, fine-tuning, batch, and embeddings; vision/audio/image SKUs are documented separately. There is no published ARR, segment split, customer concentration, NRR, or gross margin disclosure at the runDate. Forrester and IDC market frames imply Together is a growth-stage entrant in a multi-billion-dollar generative-AI inference TAM, but neither analyst names Together among its top three vendors. Management has acknowledged enterprise pipeline acceleration tied to Salesforce Ventures co-selling but has not quantified it. This combination of company-claimed momentum, third-party press anecdotes, and absent audited disclosure is consistent with private growth-stage SaaS, but creates significant diligence risk on realised vs list pricing, mix, and gross margin. The GTM motion is dominated by self-serve developer signup at the top of the funnel and partner-led enterprise expansion through Salesforce Ventures and NVIDIA channel referrals; sales-cycle length, CAC, and payback are undisclosed but can be inferred to be 60-120 days for enterprise dedicated contracts based on comparable inference-API vendor disclosures.[CI009, CI010, CI011, CI012, CI013, CI014]
| SKU | Pricing basis | Public price benchmark | Discount levers | Diligence gap |
|---|---|---|---|---|
| Serverless inference | per million tokens | $0.20–$0.90/M (open models 7–70B) | volume / committed-use | Realised vs list pricing not disclosed |
| Batch inference | per million tokens | 50% discount vs serverless | batch SLA window | Confirmed via 2025 blog update |
| Dedicated endpoints | custom / reserved | quoted via sales | term commitment | No published list pricing |
| Fine-tuning API | per training run | quoted on pricing page | volume | Public docs but no margin disclosure |
| Embeddings API | per million tokens | published per-model | volume | — |
| Vision / image / audio APIs | per request / per token | published per model | — | Revenue mix not split |
| Enterprise contracts | annual / committed | undisclosed | strategic discounts | Critical diligence ask |
Pricing rows mix published list pricing (high confidence) and inferred enterprise practice (low confidence); revenue mix between SKUs is not disclosed and must be requested.
[CI009, CI010, CI011, CI012, CI013, CI014]| Pricing dimension | Public benchmark | List vs realised | Discount / unknown | Source |
|---|---|---|---|---|
| Per-token Llama-70B | $0.88/M serverless | list only | volume discount | pricing page |
| Batch SLA discount | -50% vs serverless | list only | batch window | 2025 batch blog |
| Dedicated endpoint | custom / per hour | realised not disclosed | term commit | blog + sales-quoted |
| Fine-tuning run | per training token | list only | volume | docs FT page |
| Embeddings | per million tokens | list only | volume | docs embeddings page |
| Enterprise contract value | not disclosed | realised undisclosed | strategic discounts | requested from management |
| Co-sell rebates (Salesforce) | not disclosed | realised undisclosed | partner economics | Salesforce Ventures co-sell |
| Sovereign-cloud premium | not disclosed | realised undisclosed | regional | Prosperity7 strategic |
List pricing is publicly verifiable; realised pricing across enterprise contracts is undisclosed and must be requested.
[CI012, CI013, CI014, CI015, CI016]How customer activity converts to Together revenue and gross profit.
Gross-profit edge is illustrative; realised margin is undisclosed.
[CI012, CI013, CI014, CI015, CI024, CI025]4.3 Unit economics, capital adequacy, and gaps
Together's public profile permits only rough unit-economics estimation. On the cost side, GPU-hour COGS scale with NVIDIA capex; CoreWeave's S-1 disclosures (a useful comparable) show GPU cloud gross margins in the 60–70% range on reserved deals and lower on on-demand. Per-token gross margin at Together's list pricing is plausibly 40–60% on serverless and higher on dedicated, but realised margin depends on utilisation and reserved-capacity contracts that are not public. On the cash side, $533M raised against an implied $300–$500M cash burn through 2024 (consistent with hyperscale GPU buildout and a 150+ headcount) suggests runway into 2026, but no figure is confirmed. Capital adequacy depends on whether Together extends Series B or files for IPO; the Figma and CoreWeave 2025 IPO precedents show the public-market window is open for AI-infrastructure issuers, while Navan's S-1 process is a closer growth-SaaS comparable. Gaps are material: ARR confirmation, gross margin by SKU, customer concentration top-10, net dollar retention, contracted vs uncontracted revenue, runway months, debt or vendor financing, and any sovereign-cloud commitments. These gaps drive the diligence ask list in the unit-economics and capital-adequacy tables and underpin a material evidence-gap entry for each undisclosed primitive; absent management disclosure, the most informative external signals are Together's public hiring posture, pricing-page revisions, and any 2026 secondary-market chatter, all of which should be tracked through the close of diligence. Working capital is unlikely to be a constraint at this scale of consumption-based SaaS; the bigger swing factor on cash is the pace of GPU capex relative to revenue ramp, which sets the cadence for the next round trigger. The verdict is that revenue quality and growth optics are strong but unverified; margin path is plausible but unaudited; capital intensity is high but underwritten by NVIDIA alignment; and the principal diligence blocker is the full set of private financial primitives enumerated in the public-financial-gaps table.[CI019, CI020, CI021, CI022, CI023, CI024]
| Metric | Value / null | Confidence | Why it matters | Diligence ask |
|---|---|---|---|---|
| Serverless gross margin | 40–60% (inferred) | low | Long-term margin path | Request actual blended GM |
| Dedicated gross margin | 60–75% (inferred) | low | Reserved customer LTV | Request dedicated GM split |
| Batch gross margin | 35–55% (inferred) | low | Batch GM after 50% discount | Confirm batch utilisation |
| CAC payback | null | low | Sales efficiency | Request payback months by segment |
| Magic number | null | low | Sales productivity | Request magic number |
| NRR | null | low | Expansion proxy | Request NRR by cohort |
| Gross retention | null | low | Churn proxy | Request gross retention |
| Implied burn 2024 | $300–$500M (inferred) | low | Cash adequacy | Request 24-month plan |
| Utilisation of GPU fleet | null | low | Utilisation drives GM | Request utilisation by SKU |
| SBC ratio | null | low | True margin | Request SBC schedule |
All values are inferred ranges or nulls; every null is accompanied by a specific diligence request.
[CI019, CI020, CI021, CI022, CI023, CI024]| Item | Public status | Why it matters | Diligence ask |
|---|---|---|---|
| Audited revenue (ARR) | not disclosed | Validates third-party $100M figure | Request management ARR & growth deck |
| Gross margin by SKU | not disclosed | Underpins long-term thesis | Request COGS breakdown by SKU |
| Net dollar retention | not disclosed | Stickiness proxy | Request NDR by cohort |
| Top-10 customer concentration | not disclosed | Revenue concentration risk | Request top-10 anonymised |
| Contracted revenue (RPO) | not disclosed | Forward visibility | Request contracted vs uncontracted split |
| Cash & runway | not disclosed | Capital adequacy | Request cash position & 24-mo plan |
| Debt / vendor financing | not disclosed | Capital structure | Request facility terms, if any |
| Founder ownership | not disclosed | Alignment, dilution | Request cap table |
| NRR vs gross retention | not disclosed | Expansion vs churn | Request gross / net retention |
| Stock-based compensation | not disclosed | Real vs reported margin | Request SBC schedule |
| Realised enterprise pricing | not disclosed | True margin vs list | Request three sample contracts |
All items are material to underwriting and none are public at the runDate; chapter relies on third-party signals and management requests to close the gap.
[CI019, CI020, CI021, CI022, CI023, CI024]Inputs to per-token unit economics in absence of disclosed values.
All quantitative nodes are inferred ranges or null; this is a qualitative bridge.
[CI012, CI016, CI019, CI020, CI024, CI025]Source-backed bounds on revenue, burn, runway, and margin.
Ranges are illustrative; lower bound is most conservative public datapoint and upper bound reflects 2x of most aggressive public datapoint.
[CI009, CI024, CI025, CI026, CI027]4.4 Exhibits
05Product & Technology
5.1 Product surface, modules, and SKUs
Together AI exposes a single platform with serverless inference, dedicated endpoints, fine-tuning, batch inference, embeddings, and modality-specific APIs (vision, audio, image). The product surface is documented at docs.together.ai and is OpenAI-compatible at the chat-completions level, making migration from closed APIs straightforward. The model catalog spans 200+ open models including Llama 3/4, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena, and custom fine-tuned checkpoints; published model and SKU references confirm per-token and per-request billing surfaces. Dedicated endpoints offer reserved capacity on H100/H200/B200 GPUs for latency-sensitive workloads and are quoted via sales. The fine-tuning API accepts LoRA and full-parameter training jobs across most supported families. Batch inference offers up to a 50% discount versus serverless with documented SLA windows. SDKs ship in Python and TypeScript, with raw HTTP for any other runtime; rate-limit documentation distinguishes free, paid, and enterprise tiers. The complete product module / asset matrix below enumerates each module, its primary user, maturity status, differentiation, and the gap a buyer should probe before committing to a long-term contract. Module ordering follows the buyer's typical adoption sequence: serverless first for experimentation, then dedicated and fine-tuning for production, then batch and embeddings for scaled workflows.[CE001, CE002, CE003, CE004, CE005, CE006]
| Module | User | Status/maturity | Differentiation | Diligence gap |
|---|---|---|---|---|
| Serverless inference API | Developers, startups | GA | OpenAI-compatible chat completions on 200+ open models | SLA % not published |
| Dedicated endpoints | Enterprise | GA | Reserved H100/H200/B200 capacity, BAA available | List pricing not published |
| Fine-tuning API | ML engineers | GA | LoRA + full-parameter on Llama/Mistral/Qwen | Training cost transparency |
| Batch inference | ML engineers | GA (2025 update) | 50% discount vs serverless | Realised batch utilisation undisclosed |
| Embeddings API | Developers | GA | Multiple open embedding models | Per-model retention tracking |
| Vision / image / audio APIs | Multi-modal devs | GA | Llama-Vision, image generation, audio transcription | Regional availability map |
| Together Inference Engine (TIE v1/v2) | Internal / advanced | GA | FA-3/4 + TK + speculative decoding | Engine-version SLA differences |
| Mixture-of-Agents | Researchers, advanced devs | beta | Ensemble inference for higher quality | Cost premium vs single-model |
| Model store | All users | GA | 200+ open + custom weights | Catalog churn cadence |
| SDKs (Python, TS, HTTP) | Developers | GA | OpenAI-compat + native | SDK release cadence |
Maturity follows public docs status; cells marked beta or limited reflect explicit docs statements at runDate.
[CE001, CE002, CE003, CE004, CE005, CE006]| User job | Current workflow | Together solution | Measurable benefit | Limitation |
|---|---|---|---|---|
| Try an open model | Local llama.cpp or HF Spaces | Serverless API call | Zero infra, OpenAI-compat | Cost at scale |
| Move production from closed API | OpenAI SDK | Swap base URL to Together | Same SDK, open weights | Feature parity edges |
| Fine-tune a Llama variant | Custom GPU cluster | Fine-tune API + run job | No DevOps needed | Limited training-step visibility |
| Serve a low-latency app | Self-hosted vLLM | Dedicated endpoint | Reserved capacity, BAA | Higher commit |
| Run nightly batch summarisation | Self-hosted batch | Batch inference SKU | 50% cheaper than serverless | Batch SLA window |
| Build an agent | LangChain + closed API | Function-calling + JSON mode + structured output | Open-weight + tool use | Tool-call patterns evolving |
| Generate embeddings | HF embedding models locally | Embeddings API | Hosted, scalable | Re-index cost |
| Multi-modal (vision) | Self-host Llama-Vision | Vision API | Hosted vision call | Image-size limits |
| Research ensemble | Paper-replication code | MoA API | Out-of-box ensemble | Higher per-query cost |
| Run a regulated workload | On-prem GPU | Dedicated + BAA | HIPAA on dedicated | No FedRAMP yet |
Workflow rows are drawn from docs quickstarts and customer case studies; limitations are explicit docs caveats or known gaps.
[CE001, CE002, CE003, CE004, CE005, CE006]Together AI product stack layers from API down to GPU substrate.
[CE001, CE011, CE012, CE013, CE014, CE015]5.2 Architecture, dependencies, and operating model
Together's architecture stacks application APIs (chat, completions, embeddings, fine-tune, batch) over a model registry and inference orchestrator that schedules GPU pods on a multi-region NVIDIA Hopper/Blackwell fleet. The inference engine (Together Inference Engine v1/v2) wraps FlashAttention-3 and FlashAttention-4 attention kernels, ThunderKittens kernel framework, and speculative-decoding/Medusa decoders to achieve published throughput and latency claims. Mixture-of-Agents (MoA) research enables ensemble inference for higher-quality completions on supported models. The model store is backed by HuggingFace and Together's own registry; weight portability is a stated design principle. Critical dependencies include NVIDIA GPU supply (Hopper/Blackwell), data-center co-location partners, the HuggingFace catalog for model artefacts, and AWS S3/equivalent storage for fine-tune artefacts. The operating model splits a kernel/inference engineering team (Tri Dao, HazyResearch lineage) from a platform/SRE team (Alon Gavrielov-led infrastructure org from 2025) and a research arm (Chris Ré, Percy Liang). The architecture is exposed through a flow figure (customer request to GPU pod to response) and a critical-dependency DAG that surfaces single-vendor concentrations. Reliability proof points are a status page, published rate-limit documentation, and a published roadmap of model launches at GTC 2025 and AI Native Conference 2025. Gaps include a public SLA percentage, the precise multi-region map (which regions, which providers), and a single-source-of-truth roadmap; all are flagged as evidence gaps.[CE011, CE012, CE013, CE014, CE015, CE016]
| Layer/component | Role | Key dependency | Risk |
|---|---|---|---|
| API gateway | Receive OpenAI-compat HTTP requests | Auth + rate limit infra | DDOS, rate-limit miscalibration |
| Model registry | Resolve model id to weights | HuggingFace + internal storage | Weight churn, license updates |
| Inference scheduler | Place request on GPU pod | GPU pool, kube/orchestrator | Hot-spotting, queue depth |
| Together Inference Engine v2 | Kernel-optimised model execution | FA-3/4, ThunderKittens, speculative decoding | Engine bug, regression on new model |
| GPU pool (Hopper / Blackwell) | Compute substrate | NVIDIA supply, co-lo partners | Supply shock, power outage |
| Fine-tune trainer | LoRA / full-parameter training jobs | GPU pool + object storage | Job-failure cost |
| Batch queue | Schedule batch inference | GPU off-peak window | SLA violation if peak overlaps |
| Embedding service | Embed text/images | Embedding model registry | Model deprecation |
| Vision/audio path | Multi-modal inference | Separate model stack | Mode-specific bugs |
| Observability / status | SLA monitoring | status.together.ai feed | Public SLA still missing |
| Trust / compliance | SOC 2 + HIPAA controls | Audit cadence | FedRAMP not yet GA |
| Storage (fine-tune artefacts) | Persist trained models | S3-equivalent storage | Loss/leak scenarios |
Architecture layers reflect documented surfaces; depth of each layer is inferred from blog + research papers and may not be exhaustive.
[CE011, CE012, CE013, CE014, CE015, CE016]Customer request through Together platform to a completion.
[CE011, CE012, CE013, CE014, CE015, CE016]Suppliers, platforms, and partners Together depends on.
[CE014, CE018, CE019, CE020, CE021, CE022]5.3 Trust, security, compliance, and roadmap
Together publishes a trust center referencing SOC 2 Type II attestation, HIPAA business associate agreement (BAA) availability on dedicated endpoints, and standard data processing terms. FedRAMP and similar US-Federal accreditation are not yet listed at the runDate; regional residency is offered through dedicated clusters but the public map is partial. Safety controls span content moderation, function-calling JSON validation, structured-output JSON mode, and per-model safety guidance. The roadmap, mined across the blog and AI Native Conference posts, includes Blackwell (B200) capacity ramp, batch inference SKU expansion, expanded fine-tune families, multi-modal (vision+audio) coverage, and Mixture-of-Agents productisation. Differentiation rests on (a) kernel-level performance lead (FA-3/4, TK), (b) breadth of open-weight model coverage, (c) flexibility across serverless/dedicated/batch SKUs, and (d) dual research-and-engineering culture with deep Stanford/Princeton/ETH lineage. Public developer signal — GitHub repo activity, PyPI download trajectory, HuggingFace model hub presence, and Hacker News thread engagement — confirms an active developer community without yet matching the scale of OpenAI or Hugging Face itself. Compared with hyperscaler offerings, Together's differentiation is most visible on open-weight neutrality and kernel performance and least visible on enterprise compliance breadth. The trust/compliance and roadmap tables below summarise each control and milestone with its current status, scope, and gap; cells marked unknown reflect missing public disclosure rather than absence of the underlying capability.[CE023, CE024, CE025, CE026, CE027, CE028]
| Control / certification | Status | Scope | Gap |
|---|---|---|---|
| SOC 2 Type II | attested | platform | Need recent attestation date |
| HIPAA / BAA | available | dedicated endpoints | Not on serverless tier |
| GDPR / DPA | available | EU customers | Specific regional residency |
| FedRAMP | not yet | US Federal | Roadmap timing not confirmed |
| ISO 27001 | not confirmed | — | Status uncertain |
| Data residency / regional cluster | partial | EU, US | Public region map limited |
| Content moderation / safety | documented | API-level | Per-model behaviour differs |
| Function calling / JSON mode | GA | API | Tool-use patterns evolving |
| Structured output | GA | API | — |
| Audit logs | documented | enterprise | Default not enabled |
| Custom model weights privacy | documented | dedicated | Need contract review |
| Bug bounty / responsible disclosure | published | platform | — |
Controls cross-verified against trust.together.ai pages, blog posts, and public docs; cells marked "not confirmed" reflect absent public disclosure rather than absence of the underlying control.
[CE023, CE024, CE025, CE026, CE027, CE028]| Date / stage | Feature / milestone | Status | Implication | Source |
|---|---|---|---|---|
| 2024-07 | FlashAttention-3 | GA | Kernel lead on Hopper | arXiv 2407.08608 |
| 2024-10 | ThunderKittens | GA | Kernel framework | Together blog |
| 2024-11 | Startup Accelerator | Launched | GTM channel | Together blog |
| 2025-03 | GTC 2025 Pioneers | event | Customer + NVIDIA visibility | Together blog |
| 2025-04 | Alon Gavrielov as VP Infra | hired | Operating scale | Together blog |
| 2025-05 | Adaption partnership | Launched | Healthcare workflow | Together blog |
| 2025-06 | AI Native Conference | event | Research + product announcements | Together blog |
| 2025-08 | FlashAttention-4 | GA | Next-gen kernel | Together blog |
| 2025-09 | Batch inference API updates | GA | 50% discount + SLA | Together blog |
| 2026-Q1 | Blackwell (B200) rollout | planned | Capacity & price | Inferred from docs |
| 2026 | Expanded MoA productisation | planned | Quality tier | AI Native Conference |
| 2026 | Multi-modal expansion | planned | Vision+audio coverage | Together blog |
Roadmap items beyond runDate are explicitly marked planned; sources include blog posts and conference announcements.
[CE033, CE034, CE035, CE036, CE037]Maturity rating across product modules.
[CE001, CE002, CE003, CE004, CE005, CE006]5.4 Exhibits
06Customers
6.1 Customer segmentation and adoption surface
Together AI's customer base is segmented by buyer/user role and by deployment intensity. The top of the funnel is self-serve developers using serverless inference for prototyping or low-volume production: per company disclosure, more than 100,000 developers have used the platform since GA. Beneath that sit named startup customers — Pika (video), Arcee (open-source merging), Nous Research (community models), Cartesia (voice) — who run production workloads via a mix of serverless and dedicated endpoints. The enterprise tier is anchored by Salesforce (referenced via Salesforce Ventures co-sell and a customer case study), Zoom (customer case study), and Washington University (research deployment); the NVIDIA GTC 2025 Pioneers programme surfaced an additional cohort of customers including healthcare, robotics, and developer-tools companies. The Startup Accelerator (launched 2024-11) is an explicit funnel for early-stage AI startups, providing credits, technical support, and GTM amplification. Geographic mix is North America-skewed with EU presence growing through dedicated clusters; vertical mix spans developer tools, content/media (video, voice, image), enterprise SaaS, healthcare, and academia. Payer/user/buyer split varies by tier: in self-serve the developer is both buyer and user; in enterprise the buyer is typically a CTO/CIO or platform-engineering lead while the users are application teams. Customer segmentation, adoption-trajectory and named-customer-proof tables below capture each row's evidence quality and the residual gap on retention and concentration.[CU001, CU002, CU003, CU004, CU005, CU006]
| Segment | Buyer/user/payer | Use case | Scale | Revenue / strategic value | Gap |
|---|---|---|---|---|---|
| Self-serve developers | Developer = buyer + user | Prototyping, low-volume production | 100,000+ devs (company claim) | Long-tail revenue + funnel | Paid vs free not split |
| AI-native startups | CTO/founder | Production inference | Pika, Cartesia, Nous, Arcee documented | High strategic value | No revenue values disclosed |
| Enterprise SaaS | CIO/platform eng | Embedded AI features | Salesforce, Zoom | Large strategic value | Contract sizes not disclosed |
| Healthcare | CIO/clinical lead | Regulated workflows (BAA) | Adaption (2025 launch) | Strategic | Production status TBD |
| Academia / research | PI / IT lead | Research compute | Washington University | Brand value | Spend size not disclosed |
| Developer tools | Founder/CTO | Embedded inference | GTC 2025 cohort | Pipeline | Cohort not enumerated |
| Sovereign / govt | Procurement | Sovereign cloud | Prosperity7-aligned (implied) | Strategic optionality | No public proof |
| Open-source community | Maintainers | OSS model serving | HuggingFace mirror integration | Brand + community | Active vs passive use |
Segmentation rows mix named case studies with inferred categories; revenue-band values are unavailable.
[CU001, CU002, CU003, CU004, CU005, CU006]| Metric | Value | Date | Source | Confidence | Implication | Missing denominator |
|---|---|---|---|---|---|---|
| Developers using platform | 100,000+ | 2024 | Together blog | low | Top-of-funnel scale | Paid vs free split |
| Named customer case studies | 7+ published | 2024-25 | Together blog | high | Real production usage | Total customer count |
| GTC 2025 customer cohort | ~12 pioneers | 2025-03 | Together blog + NVIDIA | medium | Enterprise pipeline | Per-customer ACV |
| Startup Accelerator participants | undisclosed N | 2024-11 onwards | Together blog | low | Pipeline lever | Cohort size |
| Adaption healthcare partner | 1 (launched) | 2025 | Together blog | medium | Regulated entry | Production status |
| HuggingFace integration users | undisclosed | 2024-25 | HF blog | low | Open-source pull | Active developers |
| G2 reviews | very small N | 2025 | G2 | low | Independent proof | Volume too low to be representative |
| Trustpilot reviews | very small N | 2025 | Trustpilot | low | Independent proof | Volume too low to be representative |
Trajectory rows mix company-claimed (low confidence) and third-party-reported numbers; denominators are explicitly listed as missing.
[CU001, CU002, CU011, CU012, CU013, CU014]Self-serve developer to enterprise expansion path.
[CU001, CU002, CU003, CU004, CU005, CU006]Stage-by-stage developer-to-enterprise conversion.
Awareness, active-paid, and multi-year counts are illustrative placeholders; only signup and named counts are sourced.
[CU001, CU002, CU011, CU012, CU013, CU014]6.2 Named-customer proof and durability
Named-customer proof spans seven public case studies (Salesforce, Zoom, Pika, Arcee, Nous Research, Cartesia, Washington University) plus the GTC 2025 Pioneers cohort and the Adaption healthcare partnership. Each case study documents the customer's workflow, model used, and qualitative outcome; quantitative outcomes (throughput, latency, cost, ROI) are documented for some but not all deployments. The most-cited outcomes are FlashAttention-driven latency reduction (Pika, Cartesia), cost reduction versus closed APIs (Arcee, Nous), and integration depth (Salesforce, Zoom). Production vs pilot is explicit for Salesforce, Zoom, Pika, Cartesia (production); Adaption is described as a launching partnership rather than a confirmed production deployment. Adverse and durability signals are mixed: G2 and Trustpilot review counts are low, limiting independent retention proxy; Reddit and Hacker News threads occasionally cite latency or cold-start concerns on serverless tier; no public churn announcement or terminated-customer report has been published. The customer proof matrix below tags each named customer with evidence quality, outcome specificity, retention visibility, and production maturity. Retention and repeat-usage primitives (NRR, GRR, gross retention) are not disclosed, and the chapter records that gap as a material evidence gap with a concrete diligence ask. Reference-quality and freshness are best for the 2024-2025 case studies (Salesforce, Zoom, Pika) and weaker for older case studies that have not been updated in 2026.[CU012, CU013, CU014, CU015, CU016, CU017]
| Customer | Segment | Deployment / use case | Production vs pilot | Outcome | Limitation |
|---|---|---|---|---|---|
| Salesforce | Enterprise SaaS | Co-sell + embedded inference | production | Integration depth + Series B lead | Contract value not disclosed |
| Zoom | Enterprise SaaS | AI feature inference | production | Latency improvement | Specific metrics not public |
| Pika | Startup (video) | Video model serving | production | Latency reduction via FA-class kernels | Cost benefit qualitative |
| Cartesia | Startup (voice) | Voice model serving | production | Throughput on dedicated | Pricing not disclosed |
| Arcee | Startup (open-source) | Model merging + inference | production | Cost vs closed APIs | Volume not disclosed |
| Nous Research | Open-source community | Community model hosting | production | Open-weight neutrality | Revenue mix not disclosed |
| Washington University | Academia | Research compute | production | Research throughput | Spend size not disclosed |
| Adaption | Healthcare | Regulated workflow | launching | Healthcare entry | Production status TBD |
| GTC 2025 Pioneers cohort | Enterprise mix | Various | production | NVIDIA + Together joint | Cohort not fully enumerated |
Rows reflect publicly named customers with case-study or press evidence; private named customers (if any) are not in this table.
[CU012, CU013, CU014, CU015, CU016, CU017]| Metric | Value/null | Segment | Confidence | Diligence ask |
|---|---|---|---|---|
| NRR | null | enterprise | low | Request NRR by cohort |
| GRR | null | enterprise | low | Request gross retention by cohort |
| Logo churn | null | enterprise | low | Request named-account churn list |
| Active developers (paid) | null | self-serve | low | Request paid-developer count |
| Repeat purchase rate | null | self-serve | low | Request cohort repeat rate |
| G2 average rating | very small N | self-serve | low | Cannot extrapolate from small N |
| Trustpilot average rating | very small N | self-serve | low | Cannot extrapolate from small N |
| Reddit/HN sentiment | mixed-to-positive | community | low | Aggregate qualitative scan |
| Named-customer renewals | null | enterprise | low | Confirm via reference calls |
| Dedicated-endpoint renewal rate | null | enterprise | low | Request renewal cohort |
All retention primitives are null and accompanied by a specific diligence ask.
[CU022, CU023, CU024, CU025, CU026]Evidence quality across named customers; rows pivot per-customer evidence axes complementing the named-customer-proof table.
[CU012, CU013, CU014, CU015, CU016, CU017]6.3 Expansion, concentration, and adverse signals
Expansion proxies are mostly qualitative. The Salesforce Ventures co-sell relationship is the principal enterprise expansion lever, with the Series B led by Salesforce Ventures interpreted by the market as a multi-year channel commitment; NVIDIA GTC 2025 Pioneers and the Startup Accelerator add brand and pipeline. The HuggingFace partnership funnels developers from the model hub into Together. Concentration risk is impossible to bound precisely without management disclosure, but the public customer mix is skewed toward AI-native startups and developer-tools companies rather than a small number of mega-enterprise contracts, which suggests broader top-of-funnel diversification than e.g. an OpenAI-style anchor-customer model. Channel and procurement friction is documented on the dedicated tier: enterprise sales cycles require sales engagement, custom MSAs, and security review, which adds 60-120 days before revenue. Adverse signals include scattered Reddit and Hacker News threads citing latency, cold-start, or occasional reliability events on the serverless tier; the company maintains a public status page but does not publish an SLA percentage. No public lawsuit, lost-customer report, or named-account churn has surfaced through the runDate. The expansion-and-concentration table below records each expansion driver, concentration risk, impact magnitude, and the precise diligence path required to close the residual gap; the chapter's retention table treats every undisclosed primitive as a diligence ask rather than asserting a number that cannot be sourced. Overall the customer evidence base is consistent with a growth-stage inference platform building real enterprise traction on top of a strong self-serve developer flywheel.[CU027, CU028, CU029, CU030, CU031, CU032]
| Expansion driver | Concentration risk | Impact | Diligence path |
|---|---|---|---|
| Salesforce Ventures co-sell | Salesforce concentration in enterprise wins | high | Quantify pipeline % from Salesforce |
| NVIDIA GTC Pioneers | NVIDIA referral concentration | medium | Quantify GTC-sourced ACV |
| Startup Accelerator | Long-tail dilution risk | low | Track cohort revenue conversion |
| HuggingFace partnership | HF dependence for funnel | medium | Confirm cross-promote terms |
| Self-serve developer growth | Long-tail churn risk | low | Cohort retention by month |
| Adaption healthcare entry | Single named partner risk | medium | Track follow-on healthcare wins |
| Sovereign / Prosperity7 channel | Sovereign concentration if materialises | medium | Confirm pipeline commits |
| Open-source community | Brand dependence on OSS pull | low | Track GH/HF/PyPI signal stability |
| Top-10 customer concentration | Material if undisclosed | high | Request top-10 anonymised |
| Geographic concentration | NA-heavy | medium | Request regional revenue split |
Expansion drivers and concentration risks ranked qualitatively in absence of disclosed customer-revenue breakdown.
[CU027, CU028, CU029, CU030, CU031, CU032]Time-series retention placeholder using sector-typical PLG SaaS proxy values; all numbers illustrative pending Together disclosure.
All retention cells are illustrative sector benchmarks (PLG SaaS / inference); Together has not disclosed actual cohort retention.
[CU022, CU023, CU024, CU025, CU026]6.4 Exhibits
07Risks
7.1 Regulatory and legal risk surface
Together AI faces the same Generative-AI regulatory perimeter as all foundation-model platforms operating in the United States and Europe. In the US, the FTC opened a 6(b) study into generative-AI investments and partnerships in 2024 and has signalled broad antitrust scrutiny of cloud-AI relationships; the Biden / Trump-era Executive Order on AI established a foundation for federal AI standards that the NIST AI Risk Management Framework operationalises. The BIS has tightened export controls on advanced GPUs (A100, H100, H200, B200) and on the export of certain foundation-model weights, directly relevant to a GPU-cloud operator. In the EU, the AI Act entered into force in 2024 with phased obligations for general-purpose AI providers culminating in 2026-2027; the UK ICO and Australian OAIC have published GenAI guidance that creates de-facto compliance baselines. Privacy regimes (CCPA in California, HIPAA for healthcare workloads) impose contract-level obligations Together discharges via BAAs and SOC 2 controls referenced on its trust center. On the litigation side, the NYT v Microsoft/OpenAI docket, Authors Guild v OpenAI, and Getty v Stability AI are the bellwether copyright cases whose outcomes will shape exposure for every model-hosting platform; Together itself is not currently a named defendant but its open-model hosting business carries adjacent exposure if precedent extends to platform-as-host. Civil-society pressure (CDT, EFF) adds reputational risk. The regulatory and legal risk register below ranks each line item by jurisdiction, likelihood, severity, mitigation, and residual exposure, with diligence asks for every undisclosed control.[CR001, CR002, CR003, CR004, CR005, CR006]
| Rule / case | Jurisdiction | Status | Likelihood | Severity | Mitigation | Residual exposure |
|---|---|---|---|---|---|---|
| FTC 6(b) generative-AI inquiry | US | ongoing | high | medium | engage counsel, monitor | possible behavioural remedies |
| FTC general AI enforcement | US | active | medium | medium | standard advertising/competition compliance | enforcement action |
| EU AI Act (GPAI) | EU | phased 2024-27 | high | high | GPAI obligations, transparency, copyright opt-out | non-compliance penalties up to 7% revenue |
| BIS export controls (GPUs + weights) | US/global | tightened 2025 | high | high | geo-fence customers, screening | blocked sovereign deployments |
| NIST AI RMF | US | voluntary | medium | low | adopt framework controls | procurement disadvantage if absent |
| UK ICO GenAI guidance | UK | active | medium | medium | UK DPA + GDPR posture | enforcement exposure |
| Australia OAIC GenAI guide | AU | active | low | low | adopt guidance | enforcement exposure |
| White House EO on AI | US | active | medium | medium | reporting thresholds | reporting burden |
| CCPA (California) | US-CA | active | medium | medium | privacy controls | enforcement exposure |
| HIPAA (healthcare workloads) | US | active | medium | high | BAA, dedicated tier | breach + fines |
| SOC 2 attestation surface | global | self-declared on trust center | medium | medium | SOC 2 Type II evidence | attestation gap if expired |
| NYT v Microsoft/OpenAI (copyright) | US | active litigation | high | medium | monitor; platform-host distinction | precedent extension risk |
| Authors Guild v OpenAI | US | active litigation | high | medium | monitor; platform-host distinction | precedent extension risk |
| Getty v Stability AI | US/UK | active litigation | medium | medium | monitor; image-model adjacency | precedent extension risk |
| CDT AI policy pressure | US | active | low | low | engagement, transparency | reputational |
Each row reflects the rule/case posture at runDate; ratings are qualitative pending management disclosure.
[CR001, CR002, CR003, CR004, CR005, CR006]Likelihood × severity heatmap across top risks.
[CR001, CR003, CR004, CR012, CR018, CR021]7.2 Operational, security, partner, and dependency risk
Operational risk for Together centres on three vectors: GPU capacity availability (Hopper and Blackwell), model-serving reliability, and regulated-workload controls. NVIDIA is the dominant single-vendor dependency — GPUs, networking (NVLink, InfiniBand), and software stack (CUDA, TensorRT, NeMo, Dynamo) — and is also a strategic investor, which both reduces supply-allocation risk and concentrates correlated downside if Blackwell allocation tightens. HuggingFace is the primary model-artefact dependency; partner risk would emerge if HF changes hosting terms or commercial alignment. Salesforce Ventures is the lead enterprise channel partner via the Series B; channel concentration risk is non-trivial. Security exposure spans the standard model-cloud surface (prompt injection, data exfiltration, prompt-logging leakage, supply-chain compromise of model weights) and the SOC 2 / HIPAA control surface that Together publishes via its trust center. The public status page exists but does not publish an SLA percentage. Competitive displacement risk is real: Fireworks, Replicate, Modal, Anyscale, Cerebras, and Groq all serve overlapping workloads, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI) bundle inference into existing enterprise contracts. People and execution risk includes key-person dependency on Vipul Ved Prakash (CEO), Ce Zhang (CTO), and Tri Dao (Chief Scientist), and the build-out velocity required to keep pace with Hopper→Blackwell→Rubin cadence. The operational, partner, and people registers below capture each failure mode, mitigation maturity, and residual exposure with explicit diligence paths.[CR018, CR019, CR020, CR021, CR022, CR023]
| Failure mode | Likelihood | Severity | Mitigation maturity | Residual exposure | Unresolved gap |
|---|---|---|---|---|---|
| Serverless multi-hour outage | medium | medium | status page; no SLA % | customer churn | SLA disclosure |
| Dedicated-endpoint hardware failure | low | medium | redundancy implied | revenue at risk | reliability metrics |
| Prompt injection / data exfiltration | medium | medium | safety models, function-calling guardrails | customer breach | pen-test cadence undisclosed |
| Model-weight supply-chain compromise | low | high | HF integrity checks | platform-wide compromise | weight signing process undisclosed |
| SOC 2 attestation lapse | low | medium | trust center publishes posture | enterprise-deal block | expiry date undisclosed |
| HIPAA BAA breach | low | high | BAA available | regulatory fines | breach plan undisclosed |
| GPU capacity shortfall | medium | high | NVIDIA partnership | revenue cap | allocation commit undisclosed |
| Network / inter-zone failure | low | medium | multi-region implied | latency spike | region map undisclosed |
| Insider threat | low | medium | standard controls | data leak | access controls undisclosed |
| Software bug introducing regression | medium | low | staged rollout implied | reputation | release cadence undisclosed |
Operational ratings are qualitative; multiple control primitives are undisclosed and treated as diligence asks.
[CR018, CR019, CR020, CR021, CR022, CR028]| Dependency | Counterparty | Role | Concentration | Failure scenario | Severity | Mitigation | Residual exposure |
|---|---|---|---|---|---|---|---|
| GPU supply | NVIDIA | primary supplier + investor | very high | Blackwell allocation cut | high | strategic investor; multi-gen commit | revenue cap |
| Model artefacts | HuggingFace | registry + distribution | high | hosting policy change | medium | company self-host fallback | distribution friction |
| Enterprise channel | Salesforce | co-sell + investor | medium | co-sell deprioritisation | medium | direct sales build-out | pipeline shrink |
| Datacenter capacity | Multiple (undisclosed) | colo + hyperscaler | medium | single-region capacity loss | medium | multi-region build | latency / cost |
| Network | Multiple | transit + IX | low | peering loss | low | multi-carrier | transient latency |
| Open-source community | Llama, Mistral, Qwen, DeepSeek maintainers | model upstreams | medium | license change | medium | model diversity | licensing review burden |
| Capital partners | GC / Salesforce / NVIDIA / Lux / Coatue / Prosperity7 / Kleiner | investors | medium | round oversubscription failure | medium | revenue traction | financing risk |
| Sovereign partners | Prosperity7 (KSA-adjacent) | strategic investor | low | geo-political pressure | medium | disclosure posture | reputational |
Dependency ratings reflect public concentration only; private contractual commits remain a diligence ask.
[CR023, CR024, CR025, CR026, CR027, CR030]| Role / function | Dependency or gap | Likelihood | Severity | Mitigation | Diligence path |
|---|---|---|---|---|---|
| CEO Vipul Ved Prakash | founder-led; key-person dependency | low | high | founder retention | reference checks |
| CTO Ce Zhang | key-person dependency | low | high | retention | reference checks |
| Chief Scientist Tri Dao | key-person; brand-defining | low | high | academic dual-affiliation | retention plan |
| VP Infra Alon Gavrielov | new hire (2025) | low | medium | recent join | onboarding review |
| CFO | undisclosed at runDate | medium | medium | recruiting in progress (inferred) | confirm hire |
| CRO / sales leader | undisclosed at runDate | medium | medium | enterprise build-out | confirm hire |
| Engineering bench | growing post-Series B | medium | medium | hiring momentum | headcount disclosure |
| Compliance / GRC | SOC 2 referenced; team size undisclosed | medium | medium | attestation evidence | team size confirm |
| Board composition | GC + SVP + NVIDIA + founders | medium | medium | growth-stage governance | board minutes diligence |
| Hopper→Blackwell→Rubin transition execution | multi-quarter build-out | medium | high | partnership with NVIDIA | program plan diligence |
People register includes both named individuals and undisclosed roles; CFO/CRO confirmations are explicit diligence asks.
[CR032, CR033, CR034, CR035]How risks flow into revenue, margin, financing, and valuation.
[CR001, CR003, CR004, CR012, CR023, CR024]7.3 Mitigations, kill criteria, and thesis-break triggers
The mitigation-and-kill-criteria table below pairs every top risk with a monitorable trigger, an explicit threshold or event, and the action implication if the trigger fires. Triggers span regulatory (e.g. EU AI Act GPAI obligation enforcement in 2027), litigation (e.g. adverse copyright ruling that extends to platform hosts), partner (e.g. NVIDIA allocation cut or HuggingFace hosting change), operational (e.g. multi-hour serverless outage, breach disclosure), competitive (e.g. hyperscaler bundled-inference pricing undercut), commercial (e.g. Salesforce co-sell churn), and execution (e.g. founder departure, missed Blackwell go-live). For each trigger the table records the transmission path into revenue, margin, financing, or valuation, and the action implication (kill, re-underwrite, monitor, accept). The chapter is explicit that several primitives — incident count, SLA, top-10 customer concentration, retention, GPU committed-spend, opex split — are undisclosed and therefore treated as diligence asks rather than asserting numbers that cannot be sourced. Adverse-source coverage is wide: regulatory bodies (FTC, BIS, EU, UK ICO, OAIC), legal dockets (CourtListener: NYT, Authors Guild, Getty), competitor websites (Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda), and developer-sentiment fora (Hacker News, Reddit). The chapter underwrites that Together's public risk surface is normal for a growth-stage AI infrastructure company with a healthy mitigation posture, but several control primitives remain unverified pending management disclosure.[CR034, CR035, CR036, CR037, CR038, CR039]
| Risk | Monitorable trigger | Threshold / event | Action implication |
|---|---|---|---|
| EU AI Act GPAI | enforcement notice | first 7% fine on a peer | re-underwrite EU revenue |
| BIS export tightening | new entity-list rule | additional GPU export class added | re-underwrite sovereign pipeline |
| Copyright litigation extension | platform-host ruling | any host-liability ruling | re-underwrite OSS hosting |
| NVIDIA allocation | Blackwell allocation cut | published cut to a comparable peer | re-underwrite capacity ramp |
| HuggingFace policy change | HF terms update | material commercial change | build self-host |
| Serverless outage | multi-hour incident | >4h or repeated >1h | SLA review + customer comms |
| Security breach | disclosure event | any reportable incident | immediate re-underwrite |
| Customer concentration | top-10 share | single customer >25% | concentration discount |
| Founder departure | public announcement | any of CEO/CTO/CSO | kill or major re-underwrite |
| Down-round | new financing | flat or down vs Series B | re-underwrite valuation |
Triggers are monitorable from public disclosure; the table is the chapter's actionable kill-criteria contract.
[CR034, CR035, CR036, CR037, CR038, CR039]Critical partners, suppliers, regulators, and financing dependencies.
[CR023, CR024, CR025, CR026, CR027, CR030]7.4 Exhibits
08Valuation
8.1 Recommendation, thesis, and anti-thesis
The recommendation is Hold/Monitor with medium confidence and a medium-high risk rating. Investment thesis: Together AI sits at a structurally attractive intersection of (a) the GenAI inference market expanding 40-60% CAGR per analyst-market-data sources (Gartner, Forrester, IDC, a16z, Bessemer, Menlo), (b) a credible technical moat through FlashAttention authorship (Tri Dao), ThunderKittens kernels (Stanford HazyResearch), Together Inference Engine v2, and Mixture-of-Agents productisation, and (c) an enterprise distribution channel anchored by Salesforce Ventures co-sell, NVIDIA GTC 2025 Pioneers, and the Startup Accelerator funnel. Anti-thesis: the inference layer is contested by Fireworks, Replicate, Modal, Anyscale, Cerebras, Groq, and hyperscalers (AWS Bedrock, GCP Vertex, Azure OpenAI Service) who bundle inference into existing enterprise contracts; revenue (reported $130M-$200M+ ARR per The Information) and retention primitives remain undisclosed; valuation at the Series B mark (~$3.3B-$3.5B) requires multi-year revenue scale to underwrite a 3-5x exit; and the regulatory perimeter (EU AI Act, BIS, copyright litigation precedent) is tightening through 2027. The valuation chapter records each of these as an explicit thesis-break trigger and pairs it with a monitorable threshold and an action implication. The recommendation summary table below pairs recommendation, confidence, risk rating, valuation stance, and decision implication; the thesis / anti-thesis table records the underlying arguments and what would change the view.[CV001, CV002, CV003, CV004, CV005, CV006]
| Recommendation | Confidence | Risk rating | Valuation stance | Decision implication |
|---|---|---|---|---|
| Hold / Monitor | medium | medium-high | at-or-near current Series B mark | Track ARR + NRR + concentration; revisit at Series C |
| Buy (conditional) | medium | medium | 25% correction OR confirmed >$500M ARR | Enter on confirmed traction or down-round |
| Pass (conditional) | medium | high | if hyperscaler pricing cut >40% OR NVIDIA allocation cut OR breach | Exit / decline if bear trigger fires |
| Bull case | low | medium | >$8B exit by 2028 | Strategic-acquisition or premium IPO path |
| Base case | medium | medium | $4B-$6B exit by 2028 | ARR scale + margin expansion |
| Bear case | medium | high | $1B-$2.5B outcome | Down-round / compressed exit |
Recommendation is conditional on the trigger thresholds in the thesis-break table.
[CV001, CV002, CV003, CV004, CV005]| Argument | What would change the view |
|---|---|
| GenAI inference TAM growing 40-60% CAGR per analyst sources | TAM revisions <20% CAGR |
| FlashAttention + ThunderKittens + TIE v2 form a credible technical moat | Open-source / hyperscaler kernel parity erodes Together edge |
| Salesforce Ventures-led Series B implies multi-year channel commitment | Salesforce co-sell deprioritisation or churn |
| NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline | NVIDIA reallocation to direct-managed offerings (DGX Cloud) |
| Open-source neutrality is a defensible positioning vs closed-API providers | Major OSS license changes (Llama, Mistral, Qwen, DeepSeek) |
| Documented enterprise + startup proof base (Salesforce, Zoom, Pika, Cartesia, Arcee) | Named-customer churn or production downgrade |
| Capital base + brand attract talent and customers | Down-round or failed Series C |
| Anti: hyperscaler bundled inference (AWS Bedrock, GCP Vertex, Azure) compresses pricing | Hyperscaler retreats from bundled inference |
| Anti: GenAI copyright litigation could extend to platform hosts | Adverse precedent contained to model-trainer defendants |
| Anti: revenue + retention undisclosed; price-sensitive entry discipline required | Management discloses ARR + NRR |
Thesis and anti-thesis are symmetric; the chapter is explicit on what evidence would flip the view.
[CV006, CV007, CV008, CV009, CV010, CV011]Chain from scale, proof, risks, and valuation to the recommendation.
[CV001, CV002, CV003, CV004, CV005, CV006]8.2 Scenarios, comparables, and sensitivity
Three scenarios anchor the valuation. Base case ($4B-$6B exit, ~50% probability) assumes ARR scales from current $130M-$200M to $500M-$700M over 2026-2028 with sustained gross margin in the 30-40% range typical of AI inference, modest dilution at a Series C, and Hopper→Blackwell capacity ramp on time. Bull case ($8B-$12B exit, ~25% probability) requires ARR >$1B by 2028, gross margin expansion through FlashAttention-driven utilisation, sustained Salesforce + NVIDIA channel commitment, and either a strategic acquisition (NVIDIA, hyperscaler, Salesforce) or a 2027-2028 IPO at premium multiples. Bear case ($1B-$2.5B outcome, ~25% probability) materialises if hyperscaler bundled inference compresses pricing, NVIDIA allocation tightens, or copyright precedent extends to platform hosts. The comparable-valuation table covers CoreWeave (post-IPO GPU-cloud comparable), Navan (recent S-1 SaaS comparable), Figma (S-1 comparable), and private rounds (Fireworks rumoured $4B, Replicate, Modal, Sakana, Mistral, Anthropic) plus public listings (NVIDIA, Snowflake) used as ceiling references. Sensitivity drivers are revenue growth, gross margin, NRR, exit multiple, and probability-weighted exit window. The bull/base/bear table and comparable-valuation table below capture each scenario's assumptions, valuation logic, and key sensitivity. The valuation-sensitivity bar figure and the valuation-range figure show downside, base, and upside vs the current Series B mark.[CV018, CV019, CV020, CV021, CV022, CV023]
| Scenario | Probability | ARR assumption | Gross margin | Exit multiple | Valuation/return logic | Key risks |
|---|---|---|---|---|---|---|
| Bull | 25% | >$1B ARR by 2028 | 40-50% | 12-15x ARR | $8B-$12B exit; strategic / premium IPO | Hyperscaler bundling; NVIDIA reallocation |
| Base | 50% | $500M-$700M ARR by 2028 | 30-40% | 8-10x ARR | $4B-$6B exit; trade-sale or IPO | Competitive pricing; retention drift |
| Bear | 25% | $200M-$300M ARR by 2028 | 20-30% | 5-7x ARR | $1B-$2.5B outcome; down-round | Hyperscaler price war; copyright precedent; NVIDIA allocation cut |
Probabilities are subjective and chapter-internal; each row should be re-marked at Series C and at every major customer or regulatory event.
[CV015, CV016, CV017, CV018, CV019, CV020]| Comparable | Metric | Multiple / valuation / status | Relevance | Limitation |
|---|---|---|---|---|
| CoreWeave (post-IPO, GPU-cloud) | EV / next-12-month revenue | 8-12x at post-IPO | GPU-cloud closest comparable | CoreWeave revenue mix is GPU-bare-metal heavier |
| Navan (S-1, SaaS) | EV / NTM revenue | 8-12x at filing | Growth-stage SaaS comparable | SaaS, not inference |
| Figma (S-1, SaaS) | EV / NTM revenue | 12-15x at filing | High-multiple SaaS comparable | Design SaaS, not inference |
| Fireworks AI (rumoured 2024 round) | last private round | ~$4B (rumoured) | Direct inference comparable | Round value rumoured |
| Replicate (private) | last private round | undisclosed | Direct inference comparable | Limited disclosure |
| Modal (private) | last private round | undisclosed | Serverless inference comparable | Limited disclosure |
| Anyscale (private) | last private round | $1B-$2B | Ray + inference comparable | Different positioning |
| Sakana AI (round) | last private round | ~$1.5B (Aug 2024) | OSS model-builder comparable | Model lab not infra |
| Mistral (round) | last private round | $6B (mid-2024) | OSS model lab comparable | Hybrid model + infra |
| Anthropic (round) | last private round | $60B+ (2025) | Closed-API comparable | Different model — not direct |
| NVIDIA (public) | EV / NTM revenue | high-teens to mid-20s | Ceiling reference | Far larger scale |
| Snowflake (public) | EV / NTM revenue | 10-15x | SaaS ceiling reference | Mature SaaS |
Comparable rows mix public and private valuations; private-round figures are taken from press reports and PitchBook.
[CV021, CV022, CV023, CV024, CV025, CV026]Sensitivity of valuation outcome to revenue, margin, multiple, retention.
[CV018, CV019, CV020, CV021]Low / base / high valuation range across scenarios at 2028 exit window.
[CV022, CV023, CV024, CV025, CV026, CV029]8.3 Thesis-break triggers, diligence asks, and KPIs
The thesis-break and kill-triggers table converts the chapter's risk and valuation logic into monitorable triggers tied to specific events: (a) revenue miss vs $500M-$700M ARR run-rate by 2027-2028 → re-underwrite base case, (b) Salesforce co-sell deprioritisation → kill bull case, (c) NVIDIA Blackwell allocation cut → re-underwrite capacity ramp, (d) hyperscaler bundled inference price cut >40% → compression, (e) any platform-host copyright ruling → re-underwrite OSS hosting, (f) Series C at flat/down vs Series B → mark-to-market valuation, (g) founder departure → kill thesis, (h) breach disclosure or multi-hour outage → SLA + reputation re-underwrite. The final diligence asks table records the remaining missing primitives — exact ARR, NRR/GRR, top-10 customer concentration, GPU committed spend, opex split, CFO/CRO hires, sovereign-channel posture, paid-developer count — and maps each to an owner or diligence path. The investment-KPI figure consolidates IC-ready scoring across market, proof, moat, economics, risk, valuation, and evidence quality on a 0-100 scale. The chapter is explicit that the recommendation is price-sensitive and evidence-sensitive: at a Series B valuation in the $3.3B-$3.5B range with the disclosed evidence base, Hold/Monitor is the disciplined answer — Buy at a 25%+ correction or with confirmed >$500M ARR + >120% NRR; Pass if any of the bear-case triggers fires before the Series C.[CV034, CV035, CV036, CV037, CV038, CV039]
| Trigger | Threshold | Transmission to thesis | Action implication |
|---|---|---|---|
| ARR run-rate vs base case | <$500M ARR by FY2027 | revenue mark-down | re-underwrite base |
| Salesforce co-sell | public deprioritisation | channel mark-down | kill bull |
| NVIDIA allocation | published cut to peer | capacity mark-down | re-underwrite capacity |
| Hyperscaler bundled pricing | >40% cut on AWS Bedrock or peer | margin compression | re-underwrite base |
| Copyright precedent | platform-host ruling | OSS hosting mark-down | re-underwrite OSS revenue |
| Financing | Series C flat or down vs Series B | mark-to-market | re-underwrite valuation |
| Founder departure | any of CEO/CTO/CSO | execution mark-down | kill thesis |
| Security / outage | breach disclosure OR multi-hour outage | reputation + SLA | re-underwrite enterprise pipeline |
Trigger thresholds are monitorable from public disclosure or peer comparables.
[CV033, CV034, CV035, CV036, CV037, CV038]| Topic | Missing evidence | Why it matters | Owner / diligence path |
|---|---|---|---|
| Revenue | exact ARR at runDate | base/bull scenario underwriting | request management ARR + growth |
| Retention | NRR / GRR / cohort retention | quality of revenue | request retention by cohort |
| Concentration | top-10 customer share | single-event downside | request anonymised top-10 |
| GPU commit | committed spend with NVIDIA | margin underwriting | request supplier commit |
| Opex split | R&D / S&M / G&A | burn underwriting | request income-statement split |
| CFO / CRO | presence + tenure | execution underwriting | confirm hires |
| Sovereign channel | Prosperity7 commit | geo + brand risk | confirm channel posture |
| Paid-developer count | paid vs free split | self-serve revenue underwriting | request paid-developer count |
| SOC 2 expiry | Type II expiry date | enterprise procurement | request attestation refresh |
| Open license posture | OSS hosting policy | copyright exposure | request hosting policy |
All diligence asks map to chapter-internal questions and to the risks chapter mitigation table.
[CV040, CV041, CV042, CV043, CV044]IC-ready scoring across market, proof, moat, economics, risk, valuation, evidence.
[CV040, CV041, CV042, CV043, CV044]8.4 Exhibits
Disclaimer
This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.
Evidence index
| ID | Statement | Confidence | Sources |
|---|---|---|---|
| CO001 | Together AI markets itself as "the AI acceleration cloud" offering training, fine-tuning, and inference for open-source and custom models. | High | SO001, SO002 |
| CO002 | The corporate entity is Together Computer Inc., headquartered in San Francisco, California, with an additional research presence in Zurich. | High | SO002, SO004, SO003 |
| CO003 | Together was incorporated on 27 June 2022 by four co-founders: Vipul Ved Prakash, Ce Zhang, Chris Ré, and Percy Liang. | High | SO002, SO018 |
| CO004 | The company's public surface positions three product lines: serverless inference API, dedicated endpoints, and fine-tuning/training services. | High | SO001, SO035 |
| CO005 | Together emphasises that customers can keep weights and choose dedicated capacity, a deliberate contrast with closed-API providers. | Medium | SO001, SO005 |
| CO006 | CEO Vipul Ved Prakash previously co-founded Topsy, which Apple acquired for approximately $200M in 2013, and earlier co-founded Cloudmark. | High | SO018, SO002 |
| CO007 | CTO Ce Zhang is a tenured professor at ETH Zürich specialising in distributed ML and data-centric ML research. | High | SO002, SO018 |
| CO008 | Chief Scientist Chris Ré is a MacArthur Fellow at Stanford and a co-founder of Snorkel, anchoring much of Together's open-source research lineage. | High | SO002, SO011 |
| CO009 | Co-founder Percy Liang directs the Stanford Center for Research on Foundation Models (CRFM) and leads the HELM benchmark. | High | SO002, SO018 |
| CO010 | Princeton CS faculty member Tri Dao is the principal author of FlashAttention and is publicly identified as a Together chief scientist. | High | SO002, SO009, SO036 |
| CO011 | Together actively recruits across kernel engineering, GPU systems, applied ML, sales, and revenue operations roles as of May 2026. | High | SO003, SO018 |
| CO012 | Together raised a $20M Series Seed in May 2023 led by Lux Capital, with Factory, SciFi Capital, and Long Journey Ventures participating. | High | SO018, SO012 |
| CO013 | A $102.5M Series A closed in November 2023, led by Kleiner Perkins with NVIDIA, Emergence, NEA, Prosperity7, and Greycroft participating. | High | SO006, SO014, SO018 |
| CO014 | An interim financing in March 2024 reportedly valued Together at approximately $1.25B. | Medium | SO015, SO018 |
| CO015 | Together closed a $305M Series B on 9 July 2024 led by Salesforce Ventures and Coatue at a $3.3B post-money valuation. | High | SO012, SO013, SO016, SO017 |
| CO016 | Cumulative disclosed primary capital totals approximately $533M (seed + A + interim + B) before any 2025–2026 extensions. | Medium | SO012, SO006, SO018 |
| CO017 | No Together AI registration, S-1, or other public filing appears on SEC EDGAR as of the May 2026 run date. | High | SO027, SO019 |
| CO018 | NVIDIA participated as a strategic investor in both Series A and Series B financings, signalling H100/H200 supply alignment. | Medium | SO026, SO006, SO012 |
| CO019 | CNBC reported Together AI was running at an approximately $100M annualised revenue pace around the Series B announcement in July 2024. | Medium | SO012 |
| CO020 | Bloomberg cited triple-digit year-over-year revenue growth for Together AI at the time of the Series B, without disclosing absolute figures. | Medium | SO013 |
| CO021 | Together has publicly stated it operates more than 20,000 NVIDIA Hopper-class GPUs across its multi-region cluster. | Medium | SO012, SO005 |
| CO022 | The company describes its developer footprint as "hundreds of thousands" of developers, without disclosing paid versus free split. | Low | SO001, SO005 |
| CO023 | Together's public job board and LinkedIn footprint imply a headcount above 150 full-time staff globally as of May 2026. | Low | SO003, SO018 |
| CO024 | No audited gross margin, net revenue retention, or paid-customer disclosure exists for Together AI as of the run date. | High | SO027, SO019 |
| CO025 | Together AI launched OpenChatKit in March 2023 with LAION and Ontocord, an early open-source instruction-tuned chat baseline. | High | SO008, SO030 |
| CO026 | The RedPajama 1T token open dataset was released on 17 April 2023, intended to reproduce LLaMA-grade pretraining data. | High | SO007, SO029 |
| CO027 | FlashAttention-3 was published on arXiv and Together's blog on 11 July 2024, claiming state-of-the-art H100 attention performance. | High | SO036, SO009 |
| CO028 | StripedHyena-Nous-7B, a non-attention long-context architecture, was released in December 2023 in collaboration with Nous Research. | High | SO031, SO034 |
| CO029 | Together's Mixture-of-Agents paper, published in June 2024, demonstrated multi-LLM ensembling improvements on AlpacaEval. | High | SO037, SO011 |
| CO030 | Together publishes an active GitHub organisation (togethercomputer) with multiple ten-thousand-star repositories including OpenChatKit and RedPajama-Data. | High | SO028, SO029, SO030 |
| CO031 | The HuggingFace organisation togethercomputer hosts the RedPajama datasets and StripedHyena, Pythia, LLaMA-32k, and m2-bert models. | High | SO033, SO011 |
| CO032 | No public regulatory action, litigation, recall, or executive departure involving Together AI has been reported as of May 2026. | Medium | SO018, SO019, SO027 |
| CO033 | Together AI is described as one of the most followed open-source-AI infrastructure accounts on Hacker News and X. | Low | SO020, SO024, SO021 |
| CO034 | Salesforce Ventures publicly framed the Series B as enabling enterprise customers to deploy open models on Together's cloud. | Medium | SO025, SO012 |
| CO035 | Crunchbase's Together AI profile is paywalled and could not be independently verified for cap-table details at runDate. | Medium | SO019 |
| CO036 | Cover-metric "gaps" remain for ARR, gross margin, NRR, and paid-customer count; all are flagged as diligence asks for management. | Medium | SO027, SO019, SO012 |
| CM001 | Together AI competes in the AI compute and inference platform layer between hyperscaler GPU IaaS and closed-API model labs. | High | SM001, SM004, SM023 |
| CM002 | Together's addressable spend pool excludes general-purpose cloud compute and closed-only proprietary model APIs. | Medium | SM001, SM002 |
| CM003 | Status-quo substitutes for Together include self-hosted Kubernetes-on-GPU clusters and OpenAI/Anthropic closed APIs. | Medium | SM011, SM012 |
| CM004 | Specialised GPU clouds (CoreWeave, Lambda) compete on infrastructure but lack Together's open-source-model SaaS layer. | Medium | SM013, SM014 |
| CM005 | Inference-API providers (Replicate, Fireworks, Groq, Modal) compete directly at the per-token serverless layer. | High | SM015, SM019, SM018, SM016 |
| CM006 | AWS Bedrock and Google Vertex AI offer hosted open-model inference that overlaps Together's serverless product. | High | SM011, SM012 |
| CM007 | Gartner sizes 2024 AI infrastructure TAM at $40–60B with a 30–50% CAGR through 2028. | Medium | SM021 |
| CM008 | IDC-style analyst notes peg 2024 global AI infrastructure spend near $50B. | Low | SM021, SM022 |
| CM009 | Triangulated inference + dedicated GPU SAM for 2026 lands in an $8–15B range. | Medium | SM021, SM024, SM022 |
| CM010 | Together-addressable SOM (channels + open-model demand) is on the order of $1–3B in 2026. | Low | SM024, SM027 |
| CM011 | CNBC reported a ~$100M Together ARR at the July 2024 Series B, implying mid-single-digit SOM share. | Medium | SM024, SM025 |
| CM012 | NVIDIA disclosed >$30B quarterly data-centre revenue in early 2025, evidence that AI-compute spend dwarfs Together's ARR. | High | SM028, SM022 |
| CM013 | No single public source cleanly disaggregates inference spend from training capex, creating range uncertainty. | Medium | SM021, SM028, SM022 |
| CM014 | AI-native startups and model labs are Together's most active early buyers, choosing it for open-weight flexibility and dedicated GPU access. | Medium | SM003, SM032 |
| CM015 | F500 enterprise platform teams are an emerging segment, anchored by Salesforce Ventures Series B leadership. | Medium | SM027, SM024 |
| CM016 | Sovereign and regional cloud customers are a strategic third segment, signalled by Prosperity7 (Aramco) investor presence. | Low | SM024, SM023 |
| CM017 | Within Together, users (developers) frequently differ from payers (procurement/finance), lengthening enterprise sales cycles. | Low | SM027, SM004 |
| CM018 | Self-serve credit-card adoption is the primary land motion for AI-native startup customers on Together. | Medium | SM002, SM008 |
| CM019 | Together's NVIDIA GTC 2025 spotlight emphasised "AI pioneers" as case-study customers, validating the enterprise wedge. | Medium | SM033, SM028 |
| CM020 | Together's AI-Native conference (2025) was framed as a developer community event, reinforcing top-of-funnel demand generation. | Medium | SM005, SM030 |
| CM021 | Open-weight model proliferation (Llama 3/4, DeepSeek, Mistral, Qwen) keeps SAM growth above 35% CAGR through 2027. | Medium | SM022, SM021, SM029 |
| CM022 | NVIDIA Hopper and Blackwell GPU scarcity drives demand for Together's reserved capacity SKUs. | Medium | SM028, SM013 |
| CM023 | Closed-API price cuts from OpenAI compress per-token margins across the inference market. | Low | SM002, SM030 |
| CM024 | Hyperscaler open-model commoditisation (AWS Bedrock, GCP Vertex Model Garden) threatens to erode Together's pure-inference SAM. | Medium | SM011, SM012 |
| CM025 | Sovereign data residency rules accelerate demand for in-region dedicated clusters but cap cross-border ARR. | Low | SM004, SM023 |
| CM026 | Energy and data-centre permitting bottlenecks slow capacity expansion through 2028. | Low | SM013, SM028 |
| CM027 | Agentic AI workloads (Mixture-of-Agents, multi-step reasoning) multiply per-user token volume. | Medium | SM004, SM005 |
| CM028 | FinOps pressure pushes enterprises to substitute open-weight inference for closed-API spend. | Low | SM002, SM027 |
| CM029 | Together announces serverless, dedicated, and batch inference SKUs to capture different buyer demand curves. | High | SM002, SM008, SM009, SM010 |
| CM030 | Batch inference pricing updates in 2025 reduced per-million-token costs to attract high-volume customers. | Medium | SM006, SM010 |
| CM031 | Specialised GPU clouds CoreWeave and Lambda compete on raw GPU-hour pricing; Together overlays an inference SaaS layer. | Medium | SM013, SM014 |
| CM032 | Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. | High | SM018, SM020 |
| CM033 | Modal, Replicate, and Anyscale compete in serverless and Ray-based AI compute SaaS. | Medium | SM016, SM015, SM017 |
| CM034 | Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. | Medium | SM019, SM030 |
| CM035 | Public-cloud earnings (AWS, GCP) describe AI workloads as the fastest-growing portion of cloud revenue. | Medium | SM011, SM012 |
| CM036 | Reddit r/LocalLLaMA and Hacker News discussion volume around Together has risen steadily through 2024–2026. | Low | SM030, SM029, SM031 |
| CP001 | Together competes against AWS Bedrock and Google Vertex Model Garden on hosted open-weight model inference. | High | SP018, SP019, SP001 |
| CP002 | Specialised GPU clouds CoreWeave and Lambda compete with Together at the IaaS layer for reserved GPU capacity. | High | SP020, SP021 |
| CP003 | Fireworks, Replicate, Modal, and Anyscale provide direct substitutes at the per-token serverless inference layer. | Medium | SP026, SP022, SP023, SP024 |
| CP004 | Groq, Cerebras, and SambaNova compete with bespoke silicon for inference latency leadership. | High | SP025, SP027 |
| CP005 | OpenAI and Anthropic act as substitutes for closed-API customers willing to give up weight portability. | Medium | SP018, SP036 |
| CP006 | TensorWave provides AMD MI300X GPU capacity as a niche alternative for cost-sensitive teams. | Low | SP028 |
| CP007 | Self-hosted Kubernetes-on-GPU is the status-quo alternative most cited by frontier labs and FAANG. | Low | SP036, SP037 |
| CP008 | Fireworks AI is widely cited as Together's closest direct competitor on open-model inference SaaS. | Medium | SP026, SP036, SP037 |
| CP009 | Together leads on FlashAttention kernel performance, anchored by the FlashAttention-3 paper and Together engineering team. | High | SP031, SP005, SP029, SP030 |
| CP010 | FlashAttention-4 was released in 2025 and extends Together's kernel lead on Hopper GPUs. | Medium | SP006 |
| CP011 | AWS Bedrock and GCP Vertex lead on enterprise compliance breadth (BAA, FedRAMP, regional residency). | High | SP018, SP019 |
| CP012 | Groq leads on single-stream inference latency on its supported models but lags in model coverage. | Medium | SP025, SP036 |
| CP013 | Fireworks AI provides an OpenAI-compatible API and serves the same open-model catalog as Together. | High | SP026, SP015 |
| CP014 | Together's serverless Llama-70B is listed near $0.88 per million tokens, within the OpenAI-parity envelope. | High | SP002, SP011 |
| CP015 | Together batch inference offers up to 50% discount versus serverless rates as of the 2025 update. | Medium | SP013 |
| CP016 | AWS Bedrock charges $0.99/M output tokens for Llama 3 70B in 2026 list pricing. | Medium | SP018 |
| CP017 | GCP Vertex Llama 3 70B is priced near $0.99/M tokens with volume discounts. | Medium | SP019 |
| CP018 | Groq lists Llama 3 70B at ~$0.59/M tokens, undercutting Together on raw price while constraining model choice. | Medium | SP025 |
| CP019 | CoreWeave and Lambda charge $2–4 per H100-hour for reserved or on-demand GPUs. | Medium | SP020, SP021 |
| CP020 | Together fine-tuning API, batch SKU, and dedicated endpoints differentiate it from raw-GPU competitors. | High | SP012, SP013, SP011 |
| CP021 | Together's open-source research lineage (RedPajama, StripedHyena, MoA, FlashAttention) sustains community gravity that competitors struggle to match. | High | SP031, SP034, SP004 |
| CP022 | Tri Dao and Chris Ré anchor Together's kernel and architecture research velocity. | High | SP031, SP005, SP008 |
| CP023 | NVIDIA's participation in Series A and Series B is read by the market as a GPU supply alignment moat. | Medium | SP041 |
| CP024 | Salesforce Ventures Series B leadership opens an enterprise distribution channel competitors lack. | Medium | SP004, SP003 |
| CP025 | Together advertises dedicated endpoints and reserved capacity SKUs that raise customer switching cost. | High | SP012, SP002 |
| CP026 | Hyperscalers (AWS, GCP) own enterprise procurement and identity, which is a distribution disadvantage Together must compensate for. | Medium | SP018, SP019 |
| CP027 | Enterprise multi-homing across Together / Fireworks / Bedrock is the reported equilibrium in 2026 buyer surveys. | Low | SP036, SP037 |
| CP028 | Open-weight neutrality is a counter-positioning advantage versus closed-only OpenAI and Anthropic substitutes. | Medium | SP001, SP002 |
| CP029 | Together publishes an OpenAI-compatible chat completions endpoint, simplifying migration from closed APIs. | High | SP015, SP016 |
| CP030 | CoreWeave's 2024 IPO disclosures reveal $1B+ revenue scale, implying meaningful capital advantage at the IaaS layer. | Medium | SP020, SP036 |
| CP031 | Lambda Labs raised a $320M Series C in 2024 to expand its H100/H200 fleet. | Medium | SP021 |
| CP032 | Groq and Cerebras have each raised more than $1B in 2024–2025 to fund bespoke silicon expansion. | Medium | SP025, SP027 |
| CP033 | AWS Bedrock's 2025 expansion of Llama support compresses Together's premium on commodity inference workloads. | Medium | SP018 |
| CP034 | Specialised silicon vendors (Groq, Cerebras, SambaNova) pose a latency-leapfrog risk that pure-software inference cannot fully match. | Medium | SP025, SP027 |
| CP035 | Together's Python SDK and PyPI download trajectory signal sustained developer pull comparable to peers. | Medium | SP042, SP043 |
| CP036 | Speculative-decoding and Medusa-class research feed Together's ability to close any Groq latency gap on shared models. | Medium | SP032, SP033 |
| CI001 | Together AI raised a $20M Seed in May 2023 led by Lux Capital. | High | SI008, SI018, SI019 |
| CI002 | Together AI raised a $102.5M Series A in November 2023 led by Kleiner Perkins. | High | SI005, SI015, SI018 |
| CI003 | In March 2024 Together added approximately $106M at a reported $1.25B valuation (Series A2). | Medium | SI016, SI007, SI014 |
| CI004 | Per the canonical company-overview claim, the Series B closed July 2024 at ~$3.3B post led by Salesforce Ventures and Coatue (financials chapter relies on that fact for capital-stack analysis). | High | SI011, SI012, SI013, SI006 |
| CI005 | NVIDIA participated in both Series A and Series B as a strategic investor. | High | SI022, SI006 |
| CI006 | Salesforce Ventures led the Series B, opening an enterprise distribution channel. | High | SI021, SI011, SI006 |
| CI007 | Cumulative disclosed primary capital is approximately $533M across Seed, Series A, March 2024 extension, and Series B. | High | SI011, SI018, SI006 |
| CI008 | No S-1, S-3, or registered offering appears on SEC EDGAR for Together Computer Inc. at the 2026-05 runDate. | High | SI020, SI025 |
| CI009 | CNBC reported an approximately $100M annualised revenue pace around the July 2024 Series B announcement. | Medium | SI011, SI012 |
| CI010 | Bloomberg reported triple-digit revenue growth around the July 2024 Series B. | Medium | SI013, SI014 |
| CI011 | Together has not published audited ARR, gross margin, or NRR figures as of the runDate. | High | SI020, SI001, SI002 |
| CI012 | Together publishes per-token list pricing on its public pricing page for serverless inference. | High | SI002, SI001 |
| CI013 | Together offers a 50% batch inference discount as of the 2025 batch pricing update. | Medium | SI009, SI002 |
| CI014 | Dedicated endpoint and reserved-capacity pricing is quoted via sales rather than published. | High | SI002, SI004 |
| CI015 | Together SKUs span serverless, dedicated, fine-tuning, batch, embeddings, vision, audio, and image. | High | SI002, SI001, SI004 |
| CI016 | Realised enterprise pricing for Together is not publicly disclosed and is a material diligence gap. | Medium | SI002, SI038 |
| CI017 | The Information has published paywalled coverage of Together AI 2025 revenue trajectory. | Low | SI026 |
| CI018 | PitchBook lists Together AI as later-stage venture with no public 2025 round confirmation. | Medium | SI025, SI019 |
| CI019 | Together has not disclosed gross margin by SKU as of the runDate. | High | SI020, SI002, SI001 |
| CI020 | Together has not disclosed top-10 customer concentration as of the runDate. | High | SI020, SI003 |
| CI021 | Together has not disclosed net dollar retention (NDR) as of the runDate. | High | SI020, SI003 |
| CI022 | Together has not disclosed contracted-revenue (RPO) figures. | High | SI020, SI001 |
| CI023 | Together has not disclosed cash position or runway as of the runDate. | High | SI020, SI001 |
| CI024 | CoreWeave 2024 S-1 disclosures imply GPU-cloud gross margins in the 60-70% range on reserved deals. | Medium | SI032, SI035 |
| CI025 | Together per-token gross margin on serverless is plausibly 40-60% based on competitor analog disclosures. | Low | SI032, SI036, SI037 |
| CI026 | Implied cash burn through 2024 is roughly $300-$500M consistent with GPU buildout and 150+ headcount. | Low | SI004, SI001, SI018 |
| CI027 | With $533M raised and that implied burn, runway likely extends into 2026 without a new round. | Low | SI006, SI011 |
| CI028 | Figma and CoreWeave 2025 IPOs demonstrate the public-market window is open for AI-infrastructure issuers. | High | SI034, SI032 |
| CI029 | Navan 2025 S-1 process is a closer growth-SaaS comparable than CoreWeave for Together. | Medium | SI033 |
| CI030 | Together has not disclosed any debt or vendor-financing facility. | Medium | SI020, SI004 |
| CI031 | Founder and employee ownership post Series B is widely reported as significant but no exact percentages are public. | Low | SI006, SI018, SI019 |
| CI032 | No public secondary or tender offer for Together AI shares has been reported at the runDate. | Medium | SI020, SI025, SI026 |
| CI033 | Forrester and IDC market frames place Together in the growth-stage generative-AI infrastructure segment without naming it top-three. | Medium | SI027, SI028 |
| CI034 | Menlo Ventures and Bessemer 2025 State-of-AI reports frame the inference market as multi-billion-dollar and growing. | Medium | SI030, SI031, SI029 |
| CI035 | No public 2026 follow-on round, IPO filing, or M&A announcement involving Together has been confirmed at the runDate. | High | SI020, SI025, SI026, SI011 |
| CI036 | Together pricing-page revisions in 2025 added batch and dedicated SKU clarifications, signalling product and financial maturation. | Medium | SI009, SI002, SI004 |
| CI037 | Public disclosure across ten standard financial primitives is missing or partial, qualifying as a material diligence gap. | High | SI020, SI001, SI002, SI003 |
| CE001 | Together AI exposes serverless inference, dedicated endpoints, fine-tuning, batch, embeddings, vision, audio, and image APIs. | High | SE016, SE018, SE001, SE003 |
| CE002 | Together AI publishes an OpenAI-compatible chat-completions endpoint to simplify migration. | High | SE022, SE035 |
| CE003 | The Together model catalog spans 200+ open and custom models including Llama, Mistral, Mixtral, Qwen, DeepSeek, StripedHyena. | High | SE018, SE036, SE045 |
| CE004 | Dedicated endpoints offer reserved H100/H200/B200 capacity with BAA available for HIPAA workloads. | High | SE020, SE003, SE005 |
| CE005 | Fine-tuning API supports LoRA and full-parameter training jobs on most supported families. | High | SE019, SE042 |
| CE006 | Batch inference offers up to 50% discount vs serverless as of the 2025 update. | Medium | SE011, SE021 |
| CE007 | Embeddings API offers multiple open embedding models per published reference. | High | SE024, SE034 |
| CE008 | Together publishes vision, audio, and image APIs as documented surfaces. | High | SE031, SE032, SE033 |
| CE009 | SDKs ship in Python (PyPI: together) and TypeScript with raw HTTP fallback. | High | SE044, SE043, SE017 |
| CE010 | Rate-limit documentation distinguishes free, paid, and enterprise tiers. | High | SE025, SE016 |
| CE011 | Together architecture stacks API gateway, model registry, inference scheduler, TIE v2, and GPU pool. | High | SE016, SE009, SE010 |
| CE012 | Together Inference Engine v2 integrates FlashAttention-3/4 and ThunderKittens kernels. | High | SE010, SE006, SE007, SE008 |
| CE013 | Speculative decoding and Medusa decoders are integrated into the inference engine. | Medium | SE053, SE054, SE055 |
| CE014 | Mixture-of-Agents (MoA) provides ensemble inference for higher-quality completions on supported models. | Medium | SE056, SE012 |
| CE015 | FlashAttention-3 paper (arXiv 2407.08608) describes the kernel anchoring Together throughput claims. | High | SE052, SE006 |
| CE016 | FlashAttention-4 was released in August 2025 and extends the kernel lead to Hopper and Blackwell. | Medium | SE007, SE012 |
| CE017 | ThunderKittens kernel framework was released in 2024 by Together and Stanford HazyResearch. | High | SE008, SE065 |
| CE018 | NVIDIA is the primary GPU supplier (Hopper H100/H200, Blackwell B200) and a strategic investor. | High | SE060, SE014, SE001 |
| CE019 | HuggingFace is the primary model artefact partner and hosts Together-published checkpoints. | High | SE045, SE049 |
| CE020 | A status page is published at status.together.ai documenting platform reliability. | Medium | SE062 |
| CE021 | The public SLA percentage for serverless and dedicated tiers is not yet documented at the runDate. | Medium | SE062, SE025 |
| CE022 | Together infrastructure organisation expanded in 2025 with Alon Gavrielov as VP of Infrastructure Strategy. | High | SE015, SE005 |
| CE023 | Trust center publishes SOC 2 Type II attestation references and HIPAA BAA availability. | High | SE063, SE066, SE067 |
| CE024 | HIPAA BAA is available on dedicated endpoints but not serverless tier per documentation. | Medium | SE063, SE020 |
| CE025 | GDPR / DPA terms are available for EU customers per trust center documentation. | Medium | SE063 |
| CE026 | FedRAMP accreditation is not yet listed in the trust center at the runDate. | Medium | SE063 |
| CE027 | The full regional residency map (which regions, which co-lo partners) is not publicly disclosed. | Medium | SE063, SE020 |
| CE028 | ISO 27001 certification status is not publicly confirmed at the runDate. | Medium | SE063 |
| CE029 | Content moderation, function calling, JSON mode, and structured-output safety controls are documented surfaces. | High | SE028, SE027, SE026 |
| CE030 | Audit logs are documented for enterprise customers but not enabled by default. | Medium | SE063, SE020 |
| CE031 | Custom-model-weights privacy controls are documented for dedicated tier. | Medium | SE020, SE063 |
| CE032 | A bug bounty / responsible disclosure programme is published on the trust center. | Medium | SE063 |
| CE033 | GTC 2025 Pioneers event surfaced multiple Together customer + NVIDIA partnerships. | High | SE014, SE060 |
| CE034 | Adaption partnership (2025) extends Together into healthcare workflows. | Medium | SE005 |
| CE035 | AI Native Conference 2025 announced research and product directions including MoA productisation. | High | SE012, SE005 |
| CE036 | Blackwell (B200) capacity ramp is documented as 2026 roadmap item in blog references. | Low | SE005, SE014 |
| CE037 | Multi-modal expansion (vision + audio) is a documented 2026 roadmap area. | Low | SE005, SE012 |
| CU001 | Together AI reports more than 100,000 developers have used the platform per company disclosure. | Medium | SU004, SU003, SU001 |
| CU002 | Self-serve developer signup is the primary top-of-funnel adoption motion for Together AI. | High | SU038, SU001, SU003 |
| CU003 | Together customers page enumerates named startup and enterprise deployments. | High | SU003, SU001 |
| CU004 | AI-native startups (Pika, Cartesia, Arcee, Nous Research) are documented production customers. | High | SU012, SU015, SU013, SU014, SU003 |
| CU005 | Enterprise SaaS deployments at Salesforce and Zoom are documented case studies. | High | SU010, SU011, SU003 |
| CU006 | Washington University is referenced as a research-compute customer in a case study. | Medium | SU016, SU003 |
| CU007 | Adaption (2025) extends Together into healthcare workflows. | Medium | SU008, SU004 |
| CU008 | NVIDIA GTC 2025 Pioneers programme surfaced a cohort of joint Together + NVIDIA customers. | High | SU007, SU018 |
| CU009 | Startup Accelerator launched in November 2024 as an explicit startup-acquisition funnel. | High | SU006, SU004 |
| CU010 | Geographic mix is North America-skewed with EU presence growing through dedicated clusters. | Low | SU003, SU004, SU001 |
| CU011 | Buyer/user split differs by tier: developer-led self-serve vs CIO/platform-eng-led enterprise. | Medium | SU038, SU003 |
| CU012 | Salesforce case study documents integration depth and is treated as production deployment. | High | SU010, SU017, SU003 |
| CU013 | Zoom case study documents AI-feature inference at production scale. | High | SU011, SU003 |
| CU014 | Pika case study cites latency improvement from FlashAttention-class kernels. | High | SU012, SU003 |
| CU015 | Cartesia case study documents voice-model production deployment on dedicated tier. | High | SU015, SU003 |
| CU016 | Arcee case study documents cost reduction relative to closed APIs. | Medium | SU013, SU003 |
| CU017 | Nous Research case study documents community model hosting on Together. | Medium | SU014, SU003 |
| CU018 | Washington University case study documents research-compute usage. | Medium | SU016, SU003 |
| CU019 | Adaption is described as a launching partnership rather than confirmed production deployment. | Medium | SU008 |
| CU020 | GTC 2025 cohort case studies cover developer tools, robotics, healthcare, and content/media. | Medium | SU007 |
| CU021 | HuggingFace partnership funnels developers from the model hub into Together. | Medium | SU019, SU020 |
| CU022 | Net dollar retention (NDR) is not publicly disclosed at the runDate. | High | SU003, SU001, SU004 |
| CU023 | Gross retention (GRR) and named-account churn are not publicly disclosed. | High | SU003, SU001, SU004 |
| CU024 | Paid vs free developer counts are not disclosed. | High | SU004, SU003 |
| CU025 | Dedicated-endpoint renewal rate is not publicly disclosed. | High | SU004, SU003 |
| CU026 | G2 and Trustpilot review counts for Together are small, limiting independent proxies. | Medium | SU026, SU027 |
| CU027 | Salesforce Ventures-led Series B and customer case study together signal a multi-year channel commitment. | Medium | SU017, SU010, SU004 |
| CU028 | GTC 2025 Pioneers cohort acts as an enterprise pipeline amplifier through NVIDIA. | Medium | SU007, SU018 |
| CU029 | Startup Accelerator provides credits and GTM amplification to long-tail AI startups. | High | SU006, SU004 |
| CU030 | Adaption launch indicates a follow-on path into regulated healthcare workflows. | Medium | SU008 |
| CU031 | Enterprise sales cycle requires custom MSA and security review, adding 60-120 days before revenue. | Low | SU004, SU038 |
| CU032 | Top-10 customer concentration is undisclosed and is a material diligence ask. | High | SU003, SU004 |
| CU033 | Public customer mix skews AI-native startups + developer tools rather than a single mega-anchor. | Medium | SU003, SU006, SU007 |
| CU034 | No public lawsuit or named-account churn report has surfaced for Together at the runDate. | Medium | SU023, SU022, SU004 |
| CU035 | Reddit and Hacker News threads occasionally cite latency or cold-start concerns on the serverless tier. | Low | SU023, SU022 |
| CU036 | Public status page exists but no SLA percentage is published for serverless or dedicated tiers. | Medium | SU042, SU038 |
| CU037 | PyPI download trajectory and GitHub repo activity indicate sustained developer pull. | Medium | SU040, SU041 |
| CR001 | FTC opened a 6(b) inquiry in 2024 into generative-AI investments and partnerships, naming the major cloud-AI relationships. | High | SR002, SR001 |
| CR002 | FTC has stated ongoing 2024-2025 attention to GenAI competition and consumer-protection enforcement. | High | SR001, SR002 |
| CR003 | EU AI Act entered into force in 2024 with phased GPAI obligations through 2026-2027 including fines up to 7% of global revenue. | High | SR003, SR012 |
| CR004 | BIS tightened advanced-computing export controls in 2025 covering H100, H200, B200 and certain foundation-model weights. | High | SR005, SR008 |
| CR005 | NIST AI Risk Management Framework establishes voluntary US federal AI controls increasingly used in enterprise procurement. | High | SR004, SR008 |
| CR006 | UK ICO has published GenAI guidance creating UK DPA compliance baseline. | Medium | SR006 |
| CR007 | Australia OAIC has published a 2024 GenAI guide for organisations. | Medium | SR007 |
| CR008 | White House EO on AI (2023, amended 2025) sets reporting thresholds for foundation-model training. | Medium | SR008 |
| CR009 | CCPA imposes privacy obligations on Together for California-resident user data. | High | SR009, SR012 |
| CR010 | HIPAA BAA support is published as available for healthcare workloads. | High | SR010, SR028, SR026 |
| CR011 | SOC 2 attestation surface is referenced via the AICPA SOC framework and Together trust center. | Medium | SR011, SR028 |
| CR012 | NYT v Microsoft/OpenAI active litigation (CourtListener docket) is the bellwether GenAI copyright case in US. | High | SR013, SR014 |
| CR013 | Authors Guild v OpenAI active litigation expands copyright exposure to non-press content. | High | SR014, SR013 |
| CR014 | Getty Images v Stability AI active litigation tests image-model copyright exposure on both US and UK sides. | High | SR015, SR014 |
| CR015 | Civil-society organisations (CDT) actively lobby for AI accountability, adding reputational pressure. | Medium | SR012 |
| CR016 | Together is not currently named in any of the bellwether GenAI copyright suits. | Medium | SR013, SR014, SR015, SR025 |
| CR017 | Open-model hosting carries adjacent precedent risk if copyright cases extend to platform hosts. | Medium | SR013, SR014, SR015 |
| CR018 | Together publishes a public status page but does not publish an SLA percentage. | High | SR027, SR030 |
| CR019 | Pen-test cadence, breach plan, and named incident history are not publicly disclosed. | High | SR028, SR025 |
| CR020 | Safety models and function-calling guardrails are documented mitigations for prompt-injection class risks. | High | SR031, SR030 |
| CR021 | HuggingFace integrity checks are inherited for model-weight artefacts; weight-signing process is undisclosed. | Medium | SR028, SR025 |
| CR022 | Trust center references SOC 2 Type II posture; attestation expiry date is not public. | Medium | SR028, SR011 |
| CR023 | NVIDIA is supplier of GPUs, networking, and software stack and a strategic investor — single-vendor concentration is high. | High | SR025, SR024, SR029 |
| CR024 | HuggingFace is the primary model-artefact dependency for the Together catalog. | High | SR025, SR029 |
| CR025 | Salesforce Ventures is lead enterprise channel investor and co-sell partner. | High | SR025, SR029 |
| CR026 | Datacenter / colo capacity counterparties are largely undisclosed; multi-region build is implied but not enumerated. | Medium | SR025, SR024 |
| CR027 | Capital partners include GC, Salesforce, NVIDIA, Lux, Coatue, Prosperity7, and Kleiner per public round disclosures. | High | SR025, SR034, SR035 |
| CR028 | Top-10 customer concentration is undisclosed and is a material diligence ask. | High | SR029, SR025 |
| CR029 | Competitive displacement risk is documented from Fireworks, Replicate, Modal, Anyscale, Groq, Cerebras, CoreWeave, Lambda. | High | SR019, SR020, SR021, SR022, SR017, SR018, SR016, SR023 |
| CR030 | Open-source model upstream license changes (Llama, Mistral, Qwen, DeepSeek) would introduce review and compliance burden. | Medium | SR025, SR029 |
| CR031 | Sovereign / Prosperity7-adjacent backing adds geopolitical disclosure considerations. | Medium | SR034, SR035, SR025 |
| CR032 | Key-person dependency on Vipul Ved Prakash, Ce Zhang, and Tri Dao is high; founder retention is the mitigation. | High | SR024, SR025 |
| CR033 | CFO and CRO presence at runDate is not publicly confirmed and is a material recruiting diligence ask. | Medium | SR025, SR024 |
| CR034 | Engineering and infra hiring momentum is visible (Alon Gavrielov 2025 VP-infra hire) but exact bench size is undisclosed. | Medium | SR025, SR024 |
| CR035 | Hopper→Blackwell→Rubin transition execution is a multi-quarter program-management risk for the chapter. | Medium | SR025 |
| CR036 | Monitorable kill triggers (NVIDIA allocation cut, HF policy change, EU AI Act fine, copyright host-ruling) can be tracked from public disclosure. | Medium | SR025, SR003, SR005, SR013 |
| CR037 | Operational kill triggers (multi-hour serverless outage, breach disclosure) are monitorable through status page and press. | Medium | SR027, SR025, SR032, SR033 |
| CR038 | Commercial kill triggers (Salesforce co-sell deprioritisation, customer concentration >25% single) are monitorable through press and reference calls. | Medium | SR025, SR029 |
| CR039 | Founder-departure triggers are catastrophic for the thesis at growth stage. | Medium | SR025, SR024 |
| CR040 | Financing kill triggers (flat/down round vs Series B at runDate) would re-underwrite valuation. | Medium | SR025, SR034, SR035 |
| CR041 | Adverse-source coverage spans regulators, court dockets, competitors, and developer-sentiment fora. | High | SR002, SR013, SR019, SR032, SR033 |
| CR042 | Several control primitives (SLA, incident, breach plan, top-10 concentration, GPU committed spend) remain undisclosed at runDate and are explicit diligence asks. | High | SR025, SR029, SR028, SR027 |
| CV001 | Recommendation is Hold / Monitor with medium confidence at the Series B mark. | Medium | SV025, SV027, SV007 |
| CV002 | Conditional Buy on a 25%+ correction or confirmed >$500M ARR plus >120% NRR. | Medium | SV008, SV007, SV005 |
| CV003 | Conditional Pass if hyperscaler pricing cuts >40%, NVIDIA allocation cuts, or breach disclosure occurs. | Medium | SV001, SV002, SV003 |
| CV004 | Risk rating is medium-high reflecting concentration, regulatory, and competitive overhangs. | Medium | SV001, SV002, SV006 |
| CV005 | Valuation stance is "at-or-near" the current Series B mark with explicit triggers to revisit. | Medium | SV007, SV008 |
| CV006 | GenAI inference TAM grows 40-60% CAGR per multiple analyst sources at 2025 mid-point. | High | SV001, SV002, SV003, SV004, SV005 |
| CV007 | FlashAttention authorship by Tri Dao and ThunderKittens (Stanford HazyResearch) anchor Together's kernel moat. | High | SV025, SV024 |
| CV008 | Together Inference Engine v2 and MoA productisation extend the technical surface beyond commoditised inference. | Medium | SV025 |
| CV009 | Salesforce Ventures-led Series B + customer case study imply multi-year channel commitment. | Medium | SV043, SV025, SV018 |
| CV010 | NVIDIA strategic investment + GTC 2025 Pioneers cohort signal supply + pipeline alignment. | High | SV025, SV044 |
| CV011 | Open-source neutrality (Llama, Mistral, Qwen, DeepSeek) is defensible positioning vs closed-API providers. | Medium | SV025, SV027 |
| CV012 | Documented enterprise + startup proof base spans Salesforce, Zoom, Pika, Cartesia, Arcee, Nous Research, GTC 2025 Pioneers. | High | SV027, SV025 |
| CV013 | Anti-thesis: hyperscaler bundled inference (Bedrock, Vertex, Azure) could compress pricing 30-50%. | Medium | SV001, SV002, SV006 |
| CV014 | Anti-thesis: copyright litigation precedent (NYT, Authors Guild, Getty) could extend to platform hosts. | Medium | SV025, SV008 |
| CV015 | Bull case (25% prob) assumes ARR >$1B by 2028 and exit $8B-$12B. | Medium | SV001, SV005, SV006 |
| CV016 | Base case (50% prob) assumes ARR $500M-$700M by 2028 and exit $4B-$6B. | Medium | SV001, SV003, SV002 |
| CV017 | Bear case (25% prob) assumes ARR $200M-$300M by 2028 and outcome $1B-$2.5B. | Medium | SV001, SV002, SV006 |
| CV018 | Sensitivity to ARR growth is the single largest valuation driver in the chapter model. | Medium | SV007, SV008 |
| CV019 | Gross margin sensitivity is ±1000bps shifts valuation outcome ±$2-3B at base case. | Medium | SV014, SV013 |
| CV020 | Multiple sensitivity is ±3x ARR shifts exit ±$2.5B at base case. | Medium | SV013, SV015 |
| CV021 | Probability weights are subjective and re-marked at Series C and major events. | Low | SV007, SV008 |
| CV022 | CoreWeave post-IPO trades 8-12x NTM revenue as GPU-cloud comparable. | Medium | SV014, SV018 |
| CV023 | Navan S-1 disclosed 8-12x NTM revenue range at filing for growth-stage SaaS. | Medium | SV013, SV030 |
| CV024 | Figma S-1 disclosed 12-15x NTM revenue range as high-multiple SaaS reference. | Medium | SV015, SV029 |
| CV025 | Fireworks AI rumoured 2024 round valued ~$4B per press reports. | Low | SV018, SV019 |
| CV026 | Replicate and Modal rounds undisclosed in public press. | Medium | SV023, SV022 |
| CV027 | Anyscale private valuation rumoured $1B-$2B at last round. | Low | SV023, SV019 |
| CV028 | Sakana AI round ~$1.5B Aug 2024 per TechCrunch and NVIDIA partnership. | Medium | SV031, SV032 |
| CV029 | Mistral round ~$6B mid-2024 as OSS-model-lab comparable. | Medium | SV019, SV018 |
| CV030 | Anthropic round at $60B+ in 2025 as closed-API reference, not direct comparable. | Medium | SV018, SV019 |
| CV031 | NVIDIA public NTM revenue multiple high-teens to mid-20s acts as ceiling reference. | Medium | SV018, SV019 |
| CV032 | Snowflake NTM revenue multiple 10-15x acts as mature-SaaS ceiling reference. | Medium | SV018, SV019 |
| CV033 | ARR run-rate <$500M by FY2027 is the base-case kill trigger. | Medium | SV008, SV007 |
| CV034 | Salesforce co-sell public deprioritisation is the bull-case kill trigger. | Medium | SV043, SV025 |
| CV035 | NVIDIA Blackwell allocation cut to a peer is a re-underwrite trigger. | Medium | SV044, SV025 |
| CV036 | Hyperscaler bundled pricing cut >40% on AWS Bedrock or peer is a base-compression trigger. | Medium | SV001, SV002 |
| CV037 | Platform-host copyright precedent is an OSS-revenue re-underwrite trigger. | Medium | SV025, SV008 |
| CV038 | Series C flat-or-down vs Series B is a mark-to-market trigger. | Medium | SV018, SV019, SV007 |
| CV039 | Founder departure (CEO/CTO/CSO) is a kill trigger. | Medium | SV024, SV025 |
| CV040 | Exact ARR at runDate is undisclosed and is the principal diligence ask. | High | SV008, SV007, SV027 |
| CV041 | NRR / GRR / cohort retention are undisclosed at runDate and are material diligence asks. | High | SV027, SV025 |
| CV042 | Top-10 customer concentration and GPU committed spend are undisclosed. | High | SV027, SV025 |
| CV043 | CFO and CRO presence at runDate is unconfirmed. | Medium | SV024, SV025 |
| CV044 | Opex split (R&D / S&M / G&A), paid-developer count, SOC 2 expiry, and OSS hosting policy are all diligence asks. | Medium | SV025, SV027 |
| CV045 | Sacra estimates Together AI reached $1B in annualized revenue by February 2026, up from ~$618M at year-end 2025, representing ~400% year-over-year growth in 2024. | Medium | SV045, SV046 |
| CV046 | Together AI is in talks to raise approximately $1B at a $7.5B pre-money valuation as of March 2026, which would represent a >2× step-up from the $3.3B Series B valuation set in February 2025. | Medium | SV045, SV047 |
| CV047 | EquityZen lists Together AI as available for pre-IPO secondary share purchases by accredited investors, indicating secondary-market liquidity exists for current shareholders. | Medium | SV047, SV045 |
| CV048 | CB Insights' Q1 2026 State of AI report identifies AI infrastructure as the leading funding category in early 2026, with total AI deal activity up materially from prior quarters, supporting the demand context for Together AI's growth. | High | SV048, SV001 |