Diligence report | Data infrastructure / AI platform | Late-stage private / pre-IPO | 2026-05-05

Databricks

Elite data-and-AI asset, but the current private mark still prices in a lot of future perfection

Databricks is a premier late-stage data-and-AI platform, but the current $134B price still looks stretched versus public comps and available disclosure.

Cover facts

Valuation 01

134000 USD M

Revenue run-rate 02

5400 USD M

Customers 03

20000+ organizations

Employees 04

10000+ employees

Founded 05

2013

Recommendation 06

track

Company profile

Databricks is a late-stage private data-and-AI infrastructure company that grew out of the Apache Spark ecosystem and now sells a multicloud platform spanning data engineering, lakehouse storage and governance, warehousing, AI/BI, model and agent tooling, and adjacent database services. The company has reached rare private-market scale, with disclosed revenue run-rate above $5 billion, deep Fortune 500 penetration, and a fast-growing AI revenue stream, but public disclosure still lags what investors would normally expect to underwrite a $134 billion entry cleanly.

Website: www.databricks.com
Founded: 2013-01-01
Founders: Ali Ghodsi, Matei Zaharia, Ion Stoica, Patrick Wendell, Reynold Xin, Andy Konwinski, Arsalan Tavakoli-Shiraji
Founding location: Berkeley, California, USA
Headquarters: San Francisco, California, USA
Product: Databricks sells a unified, usage-based data and AI platform that combines lakehouse storage formats, data engineering, warehousing, governance, AI/BI, model and agent tooling, and multicloud deployment across Azure, AWS, and Google Cloud.
Customers: Large enterprises, digital-native companies, public-sector organizations, and data/AI teams that want a governed multicloud platform for analytics and production AI.
Business model: Primarily usage-based monetization through DBUs and adjacent serverless services, with expansion driven by workload growth, higher-value AI products, warehousing, governance, and broader platform adoption inside large accounts.
Stage: Late-stage private / pre-IPO
Funding status: Public financing sequence moved from a $62B Series J in December 2024 to more than $100B in Series K terms and a $134B Series L in December 2025, followed by a February 2026 package combining equity and debt.

Executive summary

Top strengths

Rare late-stage scale with revenue run-rate above $5B and continued fast growth.
Strong enterprise depth, including large-spend customer cohorts and multicloud distribution.
Real AI monetization already above a $1B run-rate, not just product marketing.
Deep technical founder roots and broad product surface across data, governance, and AI.
Capital access and free-cash-flow signals reduce near-term financing stress.

Top risks

The $134B mark still relies heavily on management-led run-rate disclosure rather than audited statements.
Public software comps provide only limited support for the current premium multiple.
Gross margin, concentration, cap-table seniority, debt terms, and AI economics remain under-disclosed.
Competition from hyperscalers, Snowflake, and open data formats can pressure differentiation and pricing.
Active copyright litigation and governance opacity still matter at this valuation level.

Open gaps

Audited revenue and gross-margin bridge suitable for IPO-grade underwriting.
Exact cap table, preference stack, debt covenants, and any employee-liquidity pricing terms.
Customer concentration, cohort retention by product, and renewal-duration data.
AI revenue contribution broken down by product, margin, and pass-through infrastructure exposure.
Full governance roster, committee structure, and litigation reserve or insurance detail.

Chapter 01

01 Company Overview

1.1 Identity, mission, and the current public company shape

Databricks now presents itself as a large late-stage private infrastructure company rather than a narrowly defined Spark vendor. The cleanest current identity source is the official about page, which calls Databricks the data and AI company and frames the Data Intelligence Platform as a unified foundation for data, governance, and AI. Headquarters are clearly anchored in San Francisco and the contact page provides a specific 160 Spear Street address, which makes the company easy to anchor geographically for later chapters. The public scale message has also changed over time in a way worth preserving: the about page still says more than 15,000 organizations use Databricks, while the current press kit and later 2025-2026 press releases move that number to more than 20,000 customers. The safest reusable identity for the rest of the report is therefore a San Francisco-headquartered, privately held data-and-AI platform company with a multicloud operating model, broad enterprise reach, and explicit dependence on its own narrative around the Data Intelligence Platform.[CO004, CO005, CO006, CO007, CO008, CO009]

Snapshot KPI table
Metric	Value / Status	Date	Confidence	Gap / Notes
Founding year	2013	2013	high	Consistent across Databricks about materials and the press kit.
Headquarters	San Francisco, California	2026 public pages	high	Specific contact address is 160 Spear Street, 15th Floor.
Stage	Late-stage private / IPO-optional	2026-01-23	medium	Backed by $134B valuation and CNBC pre-IPO framing; still privately held.
Customer count	20000+ organizations	2026 press kit / 2025-2026 releases	high	Current company claim is 20,000+; older about page still says 15,000+, so use the newer figure and keep the older one as historical context.
Headcount	10000+ employees	2026 press kit	medium	Current company claim is 10,000+ employees; CNBC reported roughly 8,000 in June 2025, so treat the trajectory as upward rather than point-precise.
Office count	30+ offices	2026 press kit	medium	Company says 30+ offices globally but does not publish a full office list in the reviewed materials.
Revenue run-rate	5400 USD M	2026-02-09	medium	Annualized revenue run-rate, not audited GAAP revenue.
Latest public valuation	134 USD B	2025-12-16	high	Series L and subsequent CNBC coverage align on $134B.
Capital package disclosed	7 USD B	2026-02-09	medium	February 2026 disclosure combined roughly $5B equity and roughly $2B debt capacity; this is not the same as cumulative lifetime funding.
Public total raised				Public sources describe about $20B raised and multiple debt/equity packages, but the exact cumulative capital base is not fully reconciled.

Use these values as the canonical company-overview baseline. Where the public record mixes annualized run-rate, equity, debt, and secondaries, the table keeps unsupported lifetime totals as null instead of manufacturing precision.

[CO001, CO006, CO007, CO009, CO010, CO011]

1.2 Founders, leadership bench, and governance disclosure limits

The founder record is unusually strong even if the full executive and board roster is not. Databricks says it was founded in 2013 by seven UC Berkeley AMP Lab researchers, and the official founders page names Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji. That matters because the company still derives legitimacy from technical origin stories: Berkeley and Spark are not marketing garnish but core proof of founder-market fit. The Berkeley profile for Ali Ghodsi and the CACM Spark paper reinforce that Databricks emerged from the same academic and open-source ecosystem that created Apache Spark. Public executive disclosure remains thinner than the founder disclosure. Ali Ghodsi is easy to verify as co-founder and CEO, but the reviewed board page mainly proves that a governance surface exists, not who all directors are or how committees and investor rights are allocated. That should be treated as a real diligence gap, not silently papered over.[CO001, CO002, CO003, CO013, CO014, CO015]

Leadership and founder table
Person	Role	Background	Founder-market fit or functional coverage	Key-person dependency
Ali Ghodsi	Co-founder, CEO	UC Berkeley academic; Spark-era founder; public face of financing and product direction	Bridges technical origin story, enterprise narrative, and capital markets communication	High
Ion Stoica	Co-founder	UC Berkeley professor and AMP Lab figure	Anchors Databricks credibility in distributed systems and academic research roots	Medium
Matei Zaharia	Co-founder	Apache Spark creator and Databricks technical founder	Core link between Spark, lakehouse credibility, and platform architecture	High
Patrick Wendell	Co-founder	Spark-era Databricks engineering leader listed on the CACM Spark paper	Adds engineering depth and continuity from early platform design	Medium
Andy Konwinski / Arsalan Tavakoli-Shiraji / Reynold Xin	Co-founding technical bench	Named on the official founders page with Berkeley and open-source roots	Broadens the company's original technical bench beyond the CEO-led narrative	Medium

Founder disclosure is strong; public executive and board detail beyond the founders is much thinner.

[CO002, CO003, CO013, CO014, CO015, CO016]

1.3 Capital formation, investor map, and private-market maturity

Databricks has crossed into very large private-company financing territory, and the progression is well supported by both company and independent reporting. The decisive inflection was the December 2024 Series J announcement: Databricks said it was arranging $10 billion of expected non-dilutive financing at a $62 billion valuation and expected to cross a $3 billion revenue run-rate while reaching positive free cash flow. By September 2025 the company said it had crossed a $4 billion run-rate, exceeded a $1 billion AI revenue run-rate, and closed a $1 billion Series K at a valuation above $100 billion. The December 2025 Series L then pushed public valuation to $134 billion with a $4.8 billion run-rate, and the February 2026 update said Databricks had crossed $5.4 billion in run-rate revenue while completing a package worth more than $7 billion including equity and debt. Those are unmistakable signs of late-stage private-market maturity and IPO optionality. What remains unresolved is the precise cap table and true cumulative capital raised after the mix of non-dilutive financing, equity, debt, and 2025 employee liquidity programs.[CO017, CO018, CO019, CO020, CO025, CO027]
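
The valuation and run-rate pairs in this sequence can be read as implied valuation-to-run-rate multiples. The sketch below is a rough illustration, not an underwriting model: it pairs each disclosed valuation with the run-rate figure quoted alongside it in this section, treats "above $100 billion" and the expected $3 billion run-rate as point values (both assumptions), and uses hypothetical names (`DISCLOSURES`, `implied_multiple`). Run-rate is management-reported and unaudited, so these figures are not directly comparable to multiples built on audited public-company revenue.

```python
# Illustrative only: valuation divided by annualized run-rate, using the
# figures quoted in this section. Point values are assumptions wherever the
# source gives a floor ("above $100B") or an expectation ("expected to cross").
DISCLOSURES = [
    # (label, valuation in USD B, run-rate in USD B)
    ("Series J, 2024-12", 62.0, 3.0),   # run-rate was an expectation at announcement
    ("Series K, 2025-09", 100.0, 4.0),  # valuation disclosed only as above $100B
    ("Series L, 2025-12", 134.0, 4.8),
    ("Update, 2026-02", 134.0, 5.4),    # same $134B mark, newer run-rate
]


def implied_multiple(valuation_b: float, run_rate_b: float) -> float:
    """Return valuation divided by annualized revenue run-rate."""
    if run_rate_b <= 0:
        raise ValueError("run-rate must be positive")
    return valuation_b / run_rate_b


for label, valuation, run_rate in DISCLOSURES:
    print(f"{label}: {implied_multiple(valuation, run_rate):.1f}x run-rate")
```

On these inputs the multiple at the unchanged $134 billion mark eases from roughly 27.9x to roughly 24.8x as run-rate moves from $4.8 billion to $5.4 billion.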

Stakeholder or investor map
Stakeholder	Role	Control or economic importance	Diligence ask
Thrive Capital	Series J lead / Series K co-lead	Lead investor in the 2024 $62B financing and again part of the 2025 >$100B round	Confirm board rights, liquidation preference, and any pro-rata or super-pro-rata rights across J/K/L.
Andreessen Horowitz	Series J co-lead / Series K co-lead	Repeatedly named in 2024-2025 primary rounds	Confirm ownership percentage after Series L and any special governance rights.
Insight Partners	Series J co-lead / Series K co-lead / portfolio investor	Named in multiple rounds and still publicly lists Databricks as a portfolio company	Confirm current economic stake and whether Insight has observer or nomination rights.
DST Global / GIC / WCM	Series J co-leads	Helped anchor the large 2024 non-dilutive package	Request financing docs to understand structure, warrant terms, and seniority.
MGX / Ontario Teachers / Sands / Wellington / ICONIQ	Named participants in 2024-2025 financings	Signal broad sovereign/institutional support but unclear ownership percentages	Request the cap table and side letters to reconcile exact positions.
Employees and former employees	Liquidity counterparties in 2024-2025 tender or secondary activity	Series J explicitly contemplated liquidity and TechCrunch reported two 2025 secondary rounds	Request tender documents, participation rates, and pricing to understand morale and dilution.
Lenders / debt providers	Supplemental financing providers	January-February 2026 debt facilities expanded Databricks' capital base beyond equity alone	Request debt covenants, maturity schedule, security package, and permitted lien language.
Cloud/platform partners (SAP, Microsoft, Google)	Distribution and ecosystem stakeholders	Not equity owners in the reviewed record, but strategically important for route-to-market and platform reach	Confirm commercial concentration, revenue share, and dependency by partner.

Economic importance is supportable publicly; precise ownership and control rights are not.

[CO017, CO018, CO019, CO027, CO030, CO031]

FO003: Snapshot KPIs

The public KPI stack points to a very large private enterprise-software platform with current scale signals but still-incomplete ownership detail.

Customer and employee values are company-reported floor values rather than exact counts; revenue is an annualized run-rate, not audited revenue.

[CO009, CO010, CO032, CO036, CO038, CO052]

1.4 Scale signals, ecosystem reach, and what is reusable as ground truth

The strongest public scale signals are not just valuation headlines; they are the quality of Databricks' customer and ecosystem indicators. Company materials now claim more than 20,000 customers globally, 70% Fortune 500 penetration, more than 10,000 employees, and 30-plus offices. Independent reporting fills in useful texture without fully replacing company claims: CNBC said Databricks had roughly 8,000 employees in June 2025, nearly 50 customers spending more than $10 million annually, and $2.6 billion of revenue in the fiscal year ending January 2025. The company then disclosed successive cohorts of 650-plus, 700-plus, and 800-plus customers each generating more than $1 million of annual revenue run-rate in September 2025, December 2025, and February 2026, respectively. Distribution is also broader than a single direct-sales motion. SAP says Business Data Cloud embeds Databricks technology, while Microsoft and Google each market Databricks as a first-party cloud offering. Together those signals support a reusable picture of Databricks as a scaled enterprise platform with multicloud reach, strong upsell dynamics, and a customer base deep enough to support large-spend cohorts.[CO009, CO010, CO011, CO012, CO021, CO022]
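
The growth of the customer cohort above $1 million of annual run-rate can be made concrete with a small calculation. This is a minimal sketch that treats the disclosed floors (650-plus, 700-plus, 800-plus) as exact counts, which is an assumption; `COHORTS` and `pct_change` are hypothetical names. The disclosure intervals are unequal (about three months, then about two), so the two percentages are not directly comparable without annualizing.

```python
# Illustrative only: change between disclosed floor counts of customers
# above $1M annual run-rate. Floors are treated as exact (an assumption).
COHORTS = [
    ("2025-09", 650),
    ("2025-12", 700),
    ("2026-02", 800),
]


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Pair each disclosure with the one that follows it.
for (prev_date, prev_n), (date, n) in zip(COHORTS, COHORTS[1:]):
    print(f"{prev_date} -> {date}: {pct_change(prev_n, n):+.1f}%")
```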

1.5 Milestones to reuse later, including the active adverse marker

The milestone record is rich enough that later chapters should not have to rediscover it. Databricks was founded in 2013, acquired MosaicML in 2023 to accelerate enterprise generative AI, moved in 2024 to acquire Tabular to converge Delta Lake and Apache Iceberg ecosystems, partnered with SAP in early 2025, and agreed in mid-2025 to acquire Neon to push deeper into operational databases for AI agents. Those strategic moves line up closely with the financing sequence from Series J to Series K to Series L and then the February 2026 debt-and-equity expansion. The one adverse event that clearly belongs in the chronology is the ongoing copyright litigation tied to Mosaic-era and DBRX-related model training. The Register and Saveri both indicate that the case survived a motion to dismiss in April 2026, so this is not a stale allegation that can be ignored. It should carry into later risk work as an active IP and model-governance issue.[CO017, CO018, CO025, CO027, CO032, CO036]

Milestone table
Date	Event	Type	Amount / valuation / status	Participants	Implication
2013-01-01	Databricks founded	founding	Company formed	Seven UC Berkeley AMP Lab researchers	Canonical origin point for the company and later founder-market-fit narrative.
2023-07-19	MosaicML acquisition closed	product	$1.3B reported by TechCrunch	Databricks and MosaicML	Accelerated Databricks into enterprise generative AI training and customization.
2024-06-04	Tabular acquisition announced	product	Agreed; closed 2024-06-07	Databricks and Tabular	Brought Apache Iceberg creators together with Delta Lake creators to reduce format fragmentation.
2024-12-17	Series J financing announced	financing	$10B expected non-dilutive at $62B valuation	Thrive, a16z, DST, GIC, Insight, WCM and others	Established Databricks as a megafinancing private company while targeting $3B revenue run-rate and positive free cash flow.
2025-02-13	SAP Business Data Cloud launched with embedded Databricks technology	partnership	Partnership live	SAP and Databricks	Strengthened enterprise distribution and business-data positioning.
2025-05-14	Neon acquisition agreement announced	product	Agreement announced	Databricks and Neon	Extended strategy into serverless Postgres for developers and AI agents.
2025-09-08	Series K and $4B revenue run-rate disclosed	scale	$1B at >$100B valuation	Databricks, a16z, Insight, MGX, Thrive, WCM	Showed valuation step-up, positive free cash flow, and rising AI monetization.
2025-12-16	Series L and $4.8B run-rate disclosed	financing	>$4B at $134B valuation	Databricks and Series L investors	Pushed the company into a new valuation band and reinforced platform breadth.
2026-02-09	Post-Series-L financing package expanded	financing	>$7B package including ~$2B debt capacity	Databricks and financing counterparties	Confirmed Databricks could layer debt on top of late-stage equity rather than rushing to public markets.
2026-04-21	DBRX-related copyright claims survived motion to dismiss	adverse	Active litigation	Authors, Databricks, Mosaic-related defendants	Creates a live IP and model-governance risk that later chapters should not ignore.

This is the public chronology of record for the report; it prioritizes milestones that materially change identity, scale, strategy, or risk.

[CO001, CO017, CO018, CO019, CO020, CO027]

FO001: Company milestone timeline

Major Databricks inflection points show a shift from open-source roots to late-stage private-market scale, AI-platform expansion, and an active legal overhang.

[CO001, CO017, CO018, CO020, CO027, CO032]

FO002: Company snapshot logic

Databricks' identity, platform, customers, capital base, partner routes, and litigation risk form one connected system rather than isolated datapoints.

[CO003, CO004, CO005, CO009, CO017, CO018]

1.6 Exhibits

Chapter 02

02 Market Analysis

2.1 Market boundary and sizing lenses

Databricks’ addressable market should be bounded from its monetization surfaces outward rather than from every dollar labeled AI or cloud. The company presents a lakehouse-based platform that unifies storage, processing, governance, BI, and AI workloads, then layers agent development and governed business intelligence on top. That means the most relevant direct spend pools are lakehouse software and services, governed analytics, AI development on enterprise data, and public-sector data modernization. The boundary should exclude generic cloud infrastructure, chips, foundation-model spending that never touches Databricks workflows, and broad consulting or systems-integration work that does not attach to the platform. Public market estimates confirm demand but also show why one headline TAM is misleading: for 2025, Grand View pegs the core data lakehouse market at USD 13.94 billion, GMI at USD 14.2 billion, and TBRC at USD 10.33 billion, while IDC’s USD 337 billion 2025 AI-supporting-technology forecast is an outer envelope that is much broader than Databricks’ practical revenue pool. The right reading is that Databricks has macro tailwinds and a credible core category, but not a publicly isolatable SAM or SOM. That leaves the valuation debate anchored less on absolute TAM rhetoric and more on Databricks’ ability to consolidate budgets that would otherwise stay fragmented across warehouses, BI, governance, streaming, and bespoke AI stacks.[CM053, CM054, CM055, CM056, CM057, CM060]

TAM/SAM/SOM or sizing lens table
publisher	year	geography	value (USD B)	CAGR (%)	methodology	confidence	limitation
Grand View Research	2024	Global	11.35		Current data lakehouse market snapshot	medium	Core lakehouse category only, not Databricks-specific revenue pool
Grand View Research	2025	Global	13.94	23.2	Publisher forecast for the core data lakehouse market through 2033	medium	Forecast window and category boundary differ from other publishers
Global Market Insights	2024	Global	11.9		Current data lakehouse market snapshot	medium	Core lakehouse category only, not a broader AI platform measure
Global Market Insights	2025	Global	14.2	25	Publisher forecast for the core data lakehouse market through 2034	medium	Longer horizon and methodology differ from Grand View and TBRC
The Business Research Company	2025	Global	10.33		Current data lakehouse market snapshot	medium	Shorter-term definition than some other publisher estimates
The Business Research Company	2026	Global	12.58	21.8	Near-term forecast from 2025 base	medium	Not directly comparable with 2033-2034 endpoints
The Business Research Company	2030	Global	27.28	21.4	Five-year forecast for the core lakehouse category	medium	Shorter horizon than the 2033-2034 forecasts
IDC FutureScape	2025	Worldwide	337		Outer-envelope spending on AI-supporting technology	low	Far broader than the core data lakehouse category or Databricks SAM
IDC FutureScape	2028	Worldwide	749		Outer-envelope forecast for AI-supporting technology	low	Not comparable to lakehouse-only estimates; useful as macro context only

All values are USD billions. The first seven rows estimate the core data-lakehouse category; the IDC rows are a broader AI-supporting-technology envelope to show why Databricks TAM depends on boundary selection.

[CM023, CM024, CM025, CM028, CM029, CM030]
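
The growth rates implied by pairs of rows in the sizing table can be cross-checked with a standard compound-annual-growth calculation. The sketch below is illustrative: `cagr_pct` and `PAIRS` are hypothetical names, the row pairings are chosen here rather than taken from any publisher, and the IDC result is derived from the two table values, not a rate IDC states in the reviewed material.

```python
def cagr_pct(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, in percent, between two values."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


# (publisher, start year, start value, end year, end value); values in USD billions,
# taken from the table rows above. The pairings themselves are an assumption.
PAIRS = [
    ("TBRC", 2025, 10.33, 2026, 12.58),
    ("TBRC", 2025, 10.33, 2030, 27.28),
    ("IDC FutureScape", 2025, 337.0, 2028, 749.0),
]

for name, y0, v0, y1, v1 in PAIRS:
    print(f"{name} {y0}-{y1}: {cagr_pct(v0, v1, y1 - y0):.1f}% per year")
```

The two TBRC results round to the 21.8 and 21.4 shown in the CAGR column, which suggests those rows are internally consistent. The IDC envelope implies roughly 30.5% per year over 2025-2028 on the same arithmetic.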

FM001: Market sizing lens

Databricks should be valued against nested spending lenses: a broad AI envelope, a narrower core lakehouse category, an enterprise-weighted buyer slice, and then an undisclosed Databricks-specific SAM/SOM.

This figure is a boundary lens, not a strict TAM-SAM-SOM waterfall. Public sources support the outer envelope and core category, but they do not isolate a Databricks-specific SAM or SOM.

[CM024, CM029, CM031, CM027, CM038, CM037]

FM002: Market estimate range

The best apples-to-apples public range is the 2025 core data lakehouse market, not the much broader AI-supporting-technology envelope.

The range keeps one consistent unit and one category definition: 2025 core data lakehouse market size. Broader AI-supporting-technology spend is excluded from the range because it is not comparable.

[CM024, CM029, CM031]
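
The 2025 range described here can be summarized as a low, high, median, and spread across the three publisher estimates cited in this chapter. This is a minimal sketch with hypothetical variable names; it assumes the three figures share one category definition, which the publishers' differing methodologies only approximately support.

```python
from statistics import median

# 2025 core data lakehouse market estimates cited in this chapter, USD billions.
ESTIMATES_2025_USD_B = {
    "Grand View Research": 13.94,
    "Global Market Insights": 14.2,
    "The Business Research Company": 10.33,
}

values = sorted(ESTIMATES_2025_USD_B.values())
low, high = values[0], values[-1]
mid = median(values)
# Spread of the highest estimate over the lowest, as a percentage of the lowest.
spread_pct = (high - low) / low * 100.0

print(f"2025 range: USD {low}B to USD {high}B (median USD {mid}B)")
print(f"High estimate exceeds low estimate by {spread_pct:.1f}%")
```

The gap between the lowest and highest publisher estimate is itself a measure of how loosely the category is defined.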

2.2 Buyer segmentation, budgets, and adoption path

The buyer map is broader than a single data-engineering budget. Official and partner materials show Databricks selling first to central data-platform teams, then to analytics leaders, data scientists, ML engineers, application developers, and public-sector data offices. Microsoft’s Azure Databricks overview explicitly groups data engineering, ML and AI, BI, and streaming analytics as core workloads, which implies multiple internal champions and budget owners. Databricks’ own AI/BI and Unity Catalog pages show a path from governed SQL and semantic layers into nontechnical business-user workflows, while Mosaic AI pushes into model and agent builders. Procurement also varies by segment: some deals can route through AWS Marketplace, Google Cloud, or Azure relationships, while public-sector programs emphasize agency compliance, fiscal decision-making, and secure data sharing across state, local, federal, and higher-education contexts. The enterprise-weighted character of the category still matters: Grand View says large enterprises represented 71.4% of 2024 data-lakehouse revenue, which fits Databricks’ multicloud, governance-heavy pitch and suggests the most important budgets sit with CIO, CDO, platform, and regulated-operations leaders rather than only isolated experimenters. These paths overlap, but they do not spend identically.[CM059, CM060, CM065, CM066, CM067, CM068]

Segment / buyer map
segment	buyer	user	payer/workflow	budget owner	adoption trigger
Central data platform	CIO, CDO, or platform leader	Data engineers and platform teams	Lakehouse consolidation, ETL, governance, and shared data services	Central IT, data office, or platform budget	Need to replace fragmented storage, ETL, governance, and analytics estates
Analytics and BI	Head of analytics, finance systems, or business operations leader	Analysts, finance teams, and business managers	Governed SQL analytics, dashboards, semantics, and conversational BI	Analytics, finance operations, or shared data budget	Need faster self-service analytics without adding separate BI silos
AI and ML builders	CTO, VP Engineering, or ML platform leader	Data scientists, ML engineers, and agent developers	Model training, agent evaluation, vector search, serving, and governed GenAI workflows	Engineering, product, or data science budget	Need to move from AI experimentation to production on enterprise data
Application and product teams	Product leader or engineering manager	Developers building data or AI applications	Use Databricks data, SQL, and AI services inside customer-facing or internal apps	Product engineering budget	Need shared data platform primitives without building the stack internally
Public sector and higher education	Agency CIO, data office leader, or university administrator	Policy teams, analysts, and domain operators	Compliance analytics, fiscal decision support, secure data sharing, and public-service AI use cases	Agency technology, program, or institutional budget	Need compliant modernization and cross-agency or campus data access
Regulated enterprises	Risk, compliance, finance, or operations leader	Analysts, reviewers, and line-of-business specialists	Trusted analytics and AI on sensitive data with auditability and oversight	Functional budget with governance oversight	Need higher productivity but cannot sacrifice lineage, controls, or policy enforcement

Databricks spans technical platform budgets and business-user analytics budgets, but the strongest public evidence still points to enterprise-scale, governance-heavy buying centers.

[CM059, CM065, CM066, CM067, CM068, CM069]

FM003: Buyer / segment map

Databricks fits best where data-platform control, clear ROI, and governance needs come together; public-sector and highly regulated workflows remain attractive but slower.

[CM065, CM066, CM067, CM068, CM069, CM070]

2.3 Growth drivers, adoption constraints, and valuation relevance

The demand backdrop is strong enough to justify continued category expansion. IDC expects AI-supporting-technology spend to reach USD 337 billion in 2025 and USD 749 billion by 2028, while Confluent reports that 90% of surveyed IT leaders are increasing data-streaming investment and 44% report 5x ROI. Deloitte also says worker access to AI rose 50% in 2025 and that the number of companies with 40% or more of projects in production is set to double within six months. But the same sources show the friction that matters for Databricks underwriting. Deloitte says only one in five companies has mature governance for autonomous AI agents and that enterprises feel less prepared in infrastructure, data, risk, and talent than they do strategically. McKinsey highlights inaccuracy, cybersecurity, and training gaps as major barriers. FinOps shows AI-spend governance moving up the priority stack, and CIO argues that outdated data estates still cannot feed AI systems well. The EU AI Act and NIST AI RMF reinforce that governed deployment, not raw experimentation, sets the pace for high-consequence use cases. Competition is also intense: Snowflake’s 2026 results show strong incumbent momentum and continued customer budget rationalization. For valuation, that means Databricks benefits from large secular demand, but durable upside still depends on converting governed pilots into repeatable production budgets faster than peers and substitutes.[CM038, CM039, CM040, CM041, CM042, CM043]

Market definition table
segment/category	included spend	excluded spend	buyer/payer	relevance
Unified lakehouse platform	Core platform spend for storage, processing, SQL, governance, and shared data infrastructure on a lakehouse architecture	Generic object storage, unmanaged compute, and point ETL spend with no Databricks workload	CIO, CDO, data-platform owner, and central IT budgets	Primary direct category for Databricks core platform land motions
Governed analytics and BI	Databricks SQL, AI/BI, semantics, dashboards, and conversational analytics on governed enterprise data	Standalone BI seat licenses or reporting spend that never attaches to Databricks data and semantics	Analytics leaders, finance operations, business intelligence teams, and shared data budgets	Direct expansion vector from technical data teams into business-user workflows
AI, ML, and agent development on enterprise data	Model development, agent evaluation, vector search, serving, and guardrailed GenAI workloads tied to enterprise data	Foundation-model API spend or generic inference spend that never uses Databricks data pipelines or governance	CTO, VP Engineering, ML platform lead, product engineering, and data science budgets	High-growth adjacency that broadens Databricks beyond classic analytics
Public-sector data modernization	Agency data integration, compliance analytics, secure data sharing, and higher-education or government AI use cases	Generic systems integration, public-cloud migration, or consulting spend that does not land on Databricks workloads	Agency CIOs, program leaders, data offices, procurement, and education administrators	Distinct vertical motion with procurement and compliance-heavy sales dynamics
Status-quo substitute stack	Legacy warehouses, fragmented ETL pipelines, point BI tools, self-managed Spark, and internal AI tooling that can be displaced	Net-new AI infrastructure spending that does not replace an existing workflow or data stack	Existing IT and analytics budget owners defending incumbent tools	Main source of displacement budget and the clearest practical substitute set
Broad AI-supporting technology envelope	AI-supporting software, infrastructure, and cloud renovation counted by broad macro forecasts	Any assumption that all AI-supporting spend converts into Databricks revenue	CIO, CTO, cloud platform owner, and transformation budget pools	Useful outer bound for demand context, but too broad to call Databricks SAM

The boundary starts from Databricks monetization surfaces: lakehouse infrastructure, governed analytics, AI workloads on enterprise data, and public-sector modernization. Generic cloud infrastructure and broad AI-enablement spend remain context, not direct market size.

[CM053, CM054, CM055, CM056, CM057, CM058]

Growth drivers and constraints table
driver/constraint	direction	timing	implication	diligence ask
AI-supporting technology spend expansion	up	12-36 months	Expands the macro budget pool for governed data and AI platforms	Request management’s internal split of revenue exposure to core lakehouse, BI, and GenAI workloads.
Shift from experimentation to reinvention	up	current	Supports broader platform decisions that combine data, infrastructure, and AI rather than buying point tools	Request win-rate data versus point solutions and status-quo internal builds.
Real-time data and streaming ROI	up	current	Improves the case for unified platforms that can feed AI with fresh, trusted data	Request attach rates between streaming, lakehouse, and AI workloads on Databricks.
Large-enterprise budget concentration	up	current	Favors Databricks because the category is still enterprise-heavy and governance-heavy	Request Databricks revenue mix by enterprise size and average expansion path.
Public-sector and education modernization	up	12-24 months	Creates vertical demand where secure sharing and compliance matter more than greenfield AI hype	Request public-sector pipeline, contract sizes, and procurement-cycle benchmarks.
Autonomous-agent governance immaturity	down	current	Slows rollout of higher-consequence AI workflows even when experimentation is widespread	Request production deployment counts for governed agents versus proofs of concept.
Regulatory compliance timeline	down	12-24 months	EU and trust frameworks raise the cost of deploying AI into sensitive workflows without controls	Request product roadmap evidence for AI transparency, monitoring, and high-risk use-case support.
FinOps scrutiny and AI-spend governance	down	current	Budget owners are pushing harder on unit economics, forecasting, and policy before scaling spend	Request margin and payback assumptions for AI workloads, especially serverless and model-serving usage.
Legacy data modernization backlog	down	current	Organizations still need data cleanup, governance, and modernization before AI budgets convert into durable platform spend	Request implementation timelines, migration blockers, and professional-services dependency.
Incumbent competition and budget rationalization	down	current	Strong rivals and budget scrutiny can extend sales cycles and reduce contract duration confidence	Request win/loss data versus Snowflake, cloud-native substitutes, and internal platform teams.

The key underwriting question is not whether AI demand exists, but whether Databricks can turn broad demand into governed, retained, multiyear production spend faster than the friction builds.

[CM038, CM039, CM040, CM041, CM042, CM043]

FM004: Adoption funnel or value-chain map

Databricks adoption usually starts with data-platform modernization, then expands into governed analytics and AI, but production scale waits on procurement, governance, and budget proof.

[CM054, CM055, CM060, CM061, CM062, CM071]

2.4 Exhibits

Chapter 03

03 Competitors

3.1 Competitive landscape and buyer alternatives

Databricks no longer competes only against classic cloud data warehouses. Its own platform framing spans integration, storage, processing, governance, sharing, analytics, and AI across major clouds, which means the relevant set includes direct data-platform peers, incumbent cloud suites, adjacent streaming vendors, and the status quo of self-managed open-source stacks. Snowflake remains the closest direct peer because it sells a managed cross-cloud platform for analytics and AI with its own governance and consumption model. BigQuery, Microsoft Fabric, and AWS Redshift are the largest incumbent alternatives because each can absorb part of the same enterprise budget through an existing cloud relationship, then extend from analytics into AI and governance. Confluent overlaps more narrowly around streaming and real-time processing, but it competes for upstream data architecture decisions that can reduce warehouse or lakehouse spend. The final substitute set remains powerful: self-managed Spark, Trino, and other internal-build combinations let skilled platform teams avoid some vendor spend entirely, even if they accept more operational burden. That means Databricks is competing against platform vendors, cloud bundles, and internal build paths at the same time.[CP001, CP002, CP011, CP014, CP024, CP029]

FP001: Competitive positioning map

Ordinal positioning of major alternatives by open multi-cloud posture and breadth of end-to-end enterprise data and AI workflow coverage.

Axes are evidence-backed ordinal scores derived from reviewed platform, governance, and pricing surfaces rather than a published market dataset.

[CP002, CP003, CP014, CP024, CP029, CP036]

3.2 Direct peers, incumbents, and adjacent challengers

Snowflake is the clearest direct incumbent because it is already scaled with more than 13,300 customers, 733 customers spending over $1 million annually, and 790 Forbes Global 2000 customers. It differs from Databricks in architecture and economics: Snowflake is a managed public-cloud service with separate storage, compute, and cloud-services layers, while Databricks leans on lakehouse architecture, open-source lineage, and broader data-engineering-to-AI workflow claims. BigQuery is a more cloud-native substitute than a like-for-like company peer, but it matters because Google can pair a serverless analytics product with a large cloud sales motion and growing Apache Iceberg support. Microsoft Fabric is the most important bundle-led entrant: it packages data engineering, warehousing, Power BI, and Copilot-led workflows in a SaaS environment on OneLake, with Purview-backed governance and Azure procurement leverage. AWS Redshift remains a formidable incumbent where customers already standardize on S3, SageMaker, and AWS operations. Confluent is narrower, yet strategically relevant, because real-time data pipelines and Flink-based pre-processing can capture value before data ever reaches a Databricks or Snowflake warehouse. Together, these alternatives show that Databricks wins when buyers want one governed multi-workload platform, but loses some advantage when the customer already lives inside a hyperscaler bundle or only needs a focused component.[CP008, CP009, CP010, CP011, CP014, CP016]

Competitor profile table
Competitor	Category	Scale / funding	Target segment	Primary differentiation	Key limitation vs Databricks
Databricks	Reference platform	$10B Series J expected financing at $62B valuation (Dec 2024); 500+ $1M ARR-run-rate customers; 15,000+ organizations (snapshot predates the Feb 2026 disclosures of 800+ $1M run-rate customers and a $5.4B run-rate cited in section 4.2)	Enterprise data engineering, analytics, governance, and AI teams	Multi-cloud lakehouse plus unified governance, AI/BI, and open-format posture	Public realized pricing and net retention by workload remain undisclosed; open formats reduce hard lock-in
Snowflake	Direct incumbent	$1.23B Q4 FY26 product revenue; 733 $1M+ customers; 790 Forbes Global 2000 customers	SQL-led enterprise analytics, data sharing, and AI workloads	Large installed base with strong managed-service simplicity and cross-cloud reach	Compute-credit model and managed-service orientation make it less open-source-native than Databricks
Google BigQuery	Incumbent cloud platform	Google Cloud revenue reached $12.0B in Q4 2024	GCP-centric analytics, AI, and lakehouse buyers	Serverless analytics plus managed Apache Iceberg support and Google AI distribution	Best fit is strongest inside Google Cloud buying relationships rather than as a neutral multi-cloud control plane
Microsoft Fabric	Incumbent bundled suite	Microsoft Intelligent Cloud revenue reached $29.9B in FY25 Q4	Power BI, Azure, and business-user-centric analytics estates	End-to-end SaaS analytics with OneLake, Copilot, and Purview-backed governance	Microsoft ecosystem gravity is an advantage for Fabric but also makes it less cloud-neutral than Databricks
AWS Redshift	Incumbent warehouse / lakehouse substitute	AWS segment sales reached $107.6B in 2024	AWS-native data warehousing, S3-lakehouse, and AI workloads	Low-entry pricing, deep AWS integration, zero-ETL, and S3/SageMaker adjacency	Orientation is still AWS-centered and SQL-warehouse-led rather than a neutral data-to-AI control plane
Confluent Cloud + Flink	Adjacent real-time challenger	$922.1M FY2024 subscription revenue; $963.6M total revenue	Streaming-first engineering and real-time AI/data teams	Unified Kafka + Flink stack can shift transformations left before warehouse spend occurs	Not a full warehouse / BI / semantic-governance platform for broad enterprise analytics
Self-managed Spark / Trino	Status quo / internal build	Open-source software; infrastructure and staffing borne internally	Skilled platform teams with strong infra control requirements	Maximum engine choice and avoidance of platform license lock-in	Operational burden, security, governance, and user enablement fall back on the customer

Rows compare the main ways a buyer can solve the same broad enterprise data-and-AI job, including direct peers, incumbent suites, adjacent streaming vendors, and internal build.

[CP005, CP008, CP009, CP010, CP011, CP020]

3.3 Pricing, packaging, switching cost, and multi-homing

Pricing structure is one of the main reasons this market remains multi-homed. Databricks prices on a pay-as-you-go basis with per-second billing and committed-use discounts, but its public materials emphasize model structure more than a single simple list price. Snowflake is more explicit about billing mechanics: storage, compute, and data transfer are distinct, with compute metered in credits and even a small standard warehouse consuming 2 credits per hour. BigQuery exposes a similarly transparent structure through per-TiB scanning and slot-hour commitments. Fabric turns the buying decision into shared Capacity Units and reservation savings, while still preserving Power BI licensing nuances that can favor Microsoft-centric rollouts. Redshift sets clear low-entry serverless and provisioned starting prices and can ride existing AWS enterprise commitments. Confluent uses usage-based Kafka and Flink units that are attractive when streaming, not warehousing, is the center of gravity. The consequence is that Databricks' switching costs are real only after the platform owns governance, semantics, and multiple workload types. Before that, multi-homing is rational: enterprises can keep Snowflake for SQL-heavy sharing, BigQuery for GCP-resident analytics, Fabric for Power BI-heavy teams, Redshift for AWS-native warehousing, and Confluent for streaming while still using Databricks for engineering or AI.[CP003, CP005, CP006, CP007, CP017, CP018]

Pricing / packaging comparison
Platform	Price / unit / contract model	Included capabilities	Discount / unknowns	Implication
Databricks	Pay-as-you-go, per-second billing; committed-use contracts available	Unified data, analytics, governance, SQL, AI, and AI/BI surfaces	Exact realized net pricing varies by SKU and cloud; public page stresses structure more than a single headline price	Flexible but harder for outsiders to benchmark precisely against simpler warehouse tariffs
Snowflake	Storage + compute + data transfer; compute uses credits; small standard warehouse = 2 credits/hour	Managed SQL analytics and AI platform with separate virtual warehouses	Credit price depends on edition / cloud agreement; per-second billing has 60-second minimum per start	Transparent meter design but forecasting requires credit discipline
BigQuery	On-demand $6.25 per TiB scanned or capacity pricing per slot-hour with editions	Serverless analytics, reservations, autoscaler, and AI features in BigQuery editions	Actual cost depends on bytes scanned or slot commitments	Easy low-friction entry but cost can jump with inefficient scans or sustained slot demand
Microsoft Fabric	Shared Capacity Units via pay-as-you-go or reservation; ~41% reservation savings cited	Data engineering, warehousing, BI, AI experiences, and OneLake in one SaaS environment	Publish/share workflows still often require Power BI Pro; some value depends on existing Microsoft contracts	Bundle economics are strong in Microsoft estates even when pure feature-by-feature comparison is debatable
AWS Redshift	Provisioned from $0.543/hour or serverless from $1.50/hour; RPU-hour billing per second	Warehouse, S3-lakehouse querying, zero-ETL, and AI integrations	Reservations can cut serverless compute cost up to 45%; exact TCO depends on S3 and adjacent AWS usage	Low entry point and AWS commitment leverage create a credible pricing wedge
Confluent Cloud + Flink	Kafka autoscaling via eCKUs; Flink priced by CFUs per minute; annual commitments available	Streaming, schema, governance, and serverless Flink processing	Warehouse and BI spend still sits elsewhere; price advantage depends on streaming-first architecture	Attractive when teams want to transform or filter data before paying downstream warehouse costs

This table compares public billing mechanics and packaging posture rather than negotiated enterprise net price.

[CP005, CP006, CP017, CP018, CP019, CP025]
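
A minimal sketch of the metering arithmetic behind two of the rows above, using only the public list signals quoted in the table (2 credits per hour for a small Snowflake warehouse, a 60-second minimum per start, and $6.25 per TiB for BigQuery on-demand). The function names are hypothetical, and the Snowflake result stays in credits because the credit price depends on edition and cloud agreement.

```python
# Illustrative metering arithmetic; constants come from the comparison table above.
SMALL_WAREHOUSE_CREDITS_PER_HOUR = 2   # Snowflake small standard warehouse
SNOWFLAKE_MIN_BILLED_SECONDS = 60      # per-second billing with a 60-second minimum per start
BIGQUERY_USD_PER_TIB = 6.25            # BigQuery on-demand list price per TiB scanned


def snowflake_credits(run_seconds, credits_per_hour=SMALL_WAREHOUSE_CREDITS_PER_HOUR):
    """Credits consumed by a list of warehouse runs, each billed with a 60-second floor."""
    billed_seconds = sum(max(s, SNOWFLAKE_MIN_BILLED_SECONDS) for s in run_seconds)
    return billed_seconds / 3600 * credits_per_hour


def bigquery_on_demand_usd(tib_scanned, usd_per_tib=BIGQUERY_USD_PER_TIB):
    """On-demand scan cost at list price, ignoring any free tier or negotiated discount."""
    return tib_scanned * usd_per_tib


if __name__ == "__main__":
    short_runs = [10] * 120  # hypothetical workload: 120 starts of 10 seconds each
    print("Snowflake credits billed:", snowflake_credits(short_runs))
    print("BigQuery cost for 40 TiB scanned (USD):", bigquery_on_demand_usd(40))
```

For example, 120 starts of 10 seconds each are billed as 120 minutes rather than 20, which is 4 credits instead of roughly 0.67. That gap is one reason forecasting requires credit discipline.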

3.4 Moat durability, open formats, and adverse evidence

Databricks' strongest differentiation still comes from combining open lakehouse positioning, governed data-and-AI workflows, and enough product breadth to span engineering and business users. Unity Catalog is the centerpiece because it expands beyond permissions into lineage, semantics, business metrics, and open-format governance, while AI/BI reduces one classic weakness versus Microsoft by offering native dashboards and conversational analytics without per-seat BI pricing. But the adverse evidence is material. Databricks itself is pushing Iceberg REST catalog support and external-engine interoperability, which is strategically smart but also lowers proprietary lock-in. BigQuery now has managed Iceberg support, Redshift highlights open formats and Iceberg-compatible access through the AWS lakehouse, and Snowflake has responded with its own Iceberg and Open Catalog posture. Fabric raises a different kind of threat: even if it is not as cloud-neutral as Databricks, it can use Microsoft procurement, Power BI distribution, and Copilot familiarity to win pragmatic standardization decisions. The upshot is that Databricks still appears better positioned than most single-product rivals, but its moat is no longer format lock-in. It depends on whether Databricks can remain the best governed control plane across open data, AI assets, and business semantics faster than cloud bundles commoditize the underlying infrastructure.[CP003, CP004, CP007, CP012, CP013, CP027]

Feature / capability matrix
Buying criterion	Databricks	Snowflake	BigQuery	Fabric	Redshift	Confluent / internal build
Cross-cloud neutrality	Strong	Strong	Partial	Partial	Partial	Internal build = strong; Confluent = medium
Governed open-table posture	Strong	Medium	Medium	Medium	Medium	Internal build = medium
Business-user BI in same platform	Strong	Partial	Partial	Strong	Partial	Weak
Streaming / real-time depth	Medium	Partial	Partial	Medium	Medium	Strong
Warehouse simplicity / low-admin path	Medium	Strong	Strong	Strong	Strong	Weak
Open-source / engine portability	Strong	Medium	Medium	Medium	Medium	Strong

Cells are ordinal summaries of reviewed public product, docs, and pricing surfaces; they do not imply identical feature depth or operational maturity across categories.

[CP002, CP003, CP007, CP012, CP013, CP016]

Moat durability / competitive risk register
Moat claim	Threat	Severity	Mitigation / diligence ask
Unity Catalog governance spans data and AI assets	Snowflake, BigQuery, Fabric, and AWS are all improving governance around open formats and shared catalogs	High	Ask for win-loss data where governance breadth alone displaced a bundled incumbent
Open-format leadership reduces buyer lock-in fears	The same openness also lowers Databricks-specific switching costs once Iceberg interoperability becomes table stakes	High	Test whether customers standardize on Databricks as control plane even when compute engines remain multi-homed
AI/BI reduces need for separate BI tooling	Fabric can bundle Power BI, Copilot, and Microsoft procurement into a simpler executive purchase	High	Request attach-rate and expansion data for AI/BI versus Microsoft-centric accounts
Multi-cloud posture broadens buyer pool	Hyperscalers can still use existing cloud spend commitments and service adjacency to narrow evaluation scope	Medium	Review large-account win rates by incumbent cloud home and by regulated vertical
Open-source lineage supports internal credibility with engineers	Self-managed Spark, Trino, and stream-processing stacks remain credible for teams willing to absorb ops burden	Medium	Quantify how many large customers graduate from internal build to paid Databricks versus remaining self-managed

The main risk is not one superior point competitor but converging bundle and interoperability pressure across the stack.

[CP003, CP007, CP012, CP013, CP027, CP030]

FP002: Feature breadth / capability map

Visual summary of how Databricks and the main retained alternatives cover the buying criteria most relevant to an enterprise lakehouse decision.

Matrix cells summarize public product positioning and architecture evidence; they intentionally avoid unsupported claims about private feature adoption or implementation quality.

[CP003, CP007, CP013, CP024, CP027, CP029]

FP003: Moat / readiness KPIs

Ordinal scorecard of the competitive dimensions most likely to determine whether Databricks can defend share as the market converges around open formats and incumbent bundles.

Scores are analyst-derived ordinal judgments based on reviewed public evidence; they are not audited market benchmarks.

[CP003, CP007, CP012, CP013, CP030, CP044]

3.5 Exhibits

Chapter 04

04 Financials

4.1 Revenue model, monetization surfaces, and public traction quality

Databricks now looks financially more like a broad consumption platform than a single analytics SKU. The reviewed public pricing and product surfaces show multiple monetization entry points: data engineering and warehousing compute, AI and model-serving workloads, AI/BI, and newer database products. The important underwriting distinction is that these are usage-based streams, not seat-based subscriptions. Databricks and Microsoft each describe DBU-driven billing with per-second granularity, while Microsoft also makes explicit that Azure customers pay both VM infrastructure charges and DBU platform charges. That dual-bill structure matters because public traction claims look strong but are not equivalent to realized software gross profit. On the traction side, company and independent sources line up on a sharp expansion path from $2.6 billion of recognized fiscal-2025 revenue to a $3.7 billion annualized rate by July 2025, then to $4.0 billion in September 2025 and $5.4 billion by February 2026. AI has become a material second engine rather than a marketing overlay: Databricks said AI products crossed $1.0 billion annualized revenue in September 2025 and $1.4 billion by February 2026, while CRN separately noted data warehousing still exceeded a $1.0 billion revenue run-rate. That is a healthier mix signal than a single narrowly defined warehouse story.[CI001, CI002, CI003, CI004, CI005, CI006]
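
The disclosed points above can be turned into simple period growth and mix figures. The sketch below is illustrative arithmetic on the quoted run-rate values only. It leaves out the $2.6 billion fiscal-2025 figure because recognized revenue and annualized run-rate are not like-for-like, and it takes the month labels at face value from the text.

```python
# Annualized run-rate points (USD billions) as quoted in the text, keyed by (year, month).
TOTAL_RUN_RATE = {(2025, 7): 3.7, (2025, 9): 4.0, (2026, 2): 5.4}
AI_RUN_RATE = {(2025, 9): 1.0, (2026, 2): 1.4}


def pct_change(old, new):
    """Simple percentage change between two values."""
    return (new - old) / old * 100


def months_between(start, end):
    """Whole months between two (year, month) keys."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def ai_share_pct(period):
    """AI run-rate as a percentage of total run-rate for a period with both disclosures."""
    return AI_RUN_RATE[period] / TOTAL_RUN_RATE[period] * 100


if __name__ == "__main__":
    periods = sorted(TOTAL_RUN_RATE)
    for start, end in zip(periods, periods[1:]):
        growth = pct_change(TOTAL_RUN_RATE[start], TOTAL_RUN_RATE[end])
        print(f"{start} -> {end}: {growth:.1f}% over {months_between(start, end)} months")
    for period in sorted(AI_RUN_RATE):
        print(f"AI share in {period}: {ai_share_pct(period):.1f}%")
```

On these inputs total run-rate grew about 35% between September 2025 and February 2026 while AI run-rate grew about 40%, so AI's share moved only from 25.0% to roughly 25.9%. That is consistent with AI being a second engine growing somewhat faster than the base rather than the sole driver.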

Revenue streams table
Revenue stream	Mechanism	Unit	Current value / status	Revenue quality	Diligence ask
Core data engineering compute	Jobs, all-purpose, and serverless workloads billed through DBUs and attached infrastructure usage	DBU-hour plus cloud infrastructure	Core monetization surface remains active and disclosed on pricing pages	Medium; pricing mechanics are public but realized net rates are not	Request workload-level realized price per DBU and gross margin by compute class.
Databricks SQL / warehousing	Serverless SQL and related warehouse compute	DBU-hour	Greater than $1B revenue run-rate by Q3 2025	Medium; strong disclosed run-rate but not audited revenue recognition	Request warehousing revenue mix, warehouse attach, and margin by deployment mode.
AI products	Model serving, AI Gateway, agent and model tooling on governed data	DBU-hour and payload-based usage	Crossed $1B run-rate in Sep 2025 and $1.4B by Feb 2026	Medium; multiple corroborating sources but still company-led disclosure	Request AI revenue split between serving, tooling, and partner-model pass-through.
AI/BI	Native BI and conversational analytics embedded into the platform without per-seat BI licensing	Usage-based platform consumption	Publicly positioned as no per-seat or per-license BI fee	Medium; packaging is public but standalone revenue is undisclosed	Request AI/BI attach rate, user mix, and realized monetization per active account.
Lakebase / database	Serverless Postgres and database serverless compute for AI agents	Database compute and storage usage	Strategic expansion area accelerated with 2026 financing; revenue not disclosed	Low; new product with no public revenue contribution disclosed	Request current bookings, customer count, and cost-to-serve by Lakebase workload.
Professional services / support	Implementation, migration, and support services attached to platform deals	Services and support fees	Publicly unsegmented	Low; no public breakout	Request services mix, gross margin, and whether services are strategic or break-even.

Rows separate public monetization mechanisms from what remains undisclosed. Usage-based surfaces are visible; realized revenue mix is not.

[CI001, CI002, CI005, CI006, CI007, CI008]

Pricing / monetization table
Offer / comparator	Price unit / contract	Public list / billing signal	Discounts / unknowns	Source
Databricks core platform	Pay-as-you-go DBUs	No up-front cost and per-second granularity	Realized net pricing and enterprise discount bands undisclosed	Databricks pricing page
Azure Databricks commitments	1-year or 3-year DBCU pre-purchase	Up to 37% savings versus pay-as-you-go DBUs	Savings apply to DBUs, not the full underlying cloud bill	Microsoft Azure pricing
Databricks AI/BI	Embedded platform usage rather than BI seats	No per-seat or per-license BI fees	Actual monetization path and attach rate undisclosed	Databricks AI/BI page
Snowflake	Compute and storage with list-price tables / calculator	Managed elastic compute plus separate storage pricing	Capacity storage discounts require contract tables not shown on marketing page	Snowflake pricing
BigQuery	On-demand per TiB or reserved slots	On-demand analysis is $6.25 per TiB above the first free TiB monthly; capacity uses slots	Realized enterprise discounts vary by commitment and edition	Google BigQuery pricing
Amazon Redshift	Provisioned or serverless RPUs	Serverless starts at $1.50 per hour and is billed per-second while active	Reservations reduce cost but create commitment structure and separate transfer/storage charges	Amazon Redshift pricing

The table mixes Databricks list mechanics with comparator monetization structures to show how usage-based data-platform economics are actually bought. It does not attempt to estimate Databricks realized net revenue per workload.

[CI001, CI003, CI026, CI027, CI028, CI029]

FI001: Revenue model bridge

Databricks converts enterprise adoption into revenue through usage-based DBUs and adjacent AI / database services, but customer bills still include separate cloud infrastructure charges.

[CI001, CI002, CI004, CI013, CI017, CI037]

FI003: Financial estimate range

The cleanest source-backed revenue range for late 2025 through early 2026 is Databricks' disclosed annualized revenue run-rate.

The figure uses three disclosed run-rate points over two quarters. Mid is a disclosed December 2025 point, not a statistical midpoint.

[CI009, CI013]

4.2 GTM motion and the public unit-economics proxies that actually exist

Databricks still does not publish CAC, payback, quota efficiency, or sales-cycle duration, so the right approach is to lean on public expansion proxies rather than fabricate SaaS precision. The best signals come from spending cohorts and retention. CNBC reported that Databricks had net retention above 140% in June 2025, nearly 50 customers spending more than $10 million annually in the first quarter of the new fiscal year, and roughly 8,000 employees while continuing to hire aggressively. By February 2026 Databricks and CRN were both citing more than 800 customers each generating over $1 million in annual revenue run-rate and more than 70 above $10 million. Those cohorts strongly suggest a land-expand motion with material cross-sell potential across engineering, warehousing, BI, and AI workloads. Sacra also estimates average contract value around $208,696 as of June 2024, which is directionally useful but not a substitute for booked ARR disclosures. This chapter therefore treats Databricks' sales efficiency as promising but only partially observable: public evidence supports strong expansion within existing enterprise accounts, but it does not reveal the customer-acquisition cost, discounting intensity, or time-to-productivity needed for a full payback model.[CI017, CI018, CI021, CO023, CI023, CI045]

Unit economics table
Metric	Value	Confidence	Why it matters	Diligence ask
Recognized revenue for fiscal year ended Jan 2025 (USD billions)	2.6	medium	Anchors run-rate claims in at least one reported fiscal-year revenue datapoint.	Confirm audited GAAP revenue, deferred revenue, and revenue recognition policy by product.
Annualized revenue run-rate by Feb 2026 (USD billions)	5.4	medium	Shows scale and acceleration entering 2026, but run-rate is not the same as recognized revenue.	Bridge run-rate to booked and recognized revenue by quarter.
Net revenue retention	>140%	medium	Indicates strong expansion inside existing accounts and supports a usage-led land-expand thesis.	Provide cohort-level NRR by enterprise segment and by product family.
$1M+ annual run-rate customers	800+	medium	Large high-spend cohort is a practical proxy for enterprise depth and cross-sell durability.	Provide cohort gross retention and gross-margin profile for these accounts.
$10M+ annual run-rate customers	70+	medium	Very large accounts imply strategic embed but also raise concentration questions.	Provide top-10 customer exposure and any hyperscaler/channel overlap.
Average contract value proxy (USD)	208696	low	Third-party estimate provides directional context for typical deal size outside the very largest cohorts.	Validate against internal ACV / annual spend distribution.
Public gross margin		low	Gross margin is the key missing bridge between strong usage growth and durable cash generation.	Provide audited gross profit by major workload and cloud.
Public CAC / payback		low	Without CAC or payback, sales efficiency cannot be underwritten like a public SaaS company.	Provide blended and enterprise-only CAC, payback, and rep productivity curves.
Free cash flow status	Positive on a trailing-12-month basis as of both Sep 2025 and Feb 2026	medium	Suggests improving operating leverage even without full margin disclosure.	Provide absolute operating cash flow, capex, and free cash flow by quarter.

Null fields are intentional where Databricks withholds public detail. The table keeps public proxies separate from missing underwriting inputs.

[CI013, CI015, CI017, CI018, CI020, CI021]
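
One inference the cohort counts do allow is a floor on how much run-rate sits in large accounts. The sketch below is a lower bound, not a disclosed figure. It assumes the $10M+ cohort is a subset of the $1M+ cohort, that both counts and the $5.4 billion run-rate refer to the same February 2026 basis, and that each customer spends exactly the threshold amount.

```python
RUN_RATE_USD_B = 5.4        # Feb 2026 annualized run-rate (USD billions)
CUSTOMERS_1M_PLUS = 800     # "more than 800", so treated as a minimum count
CUSTOMERS_10M_PLUS = 70     # "more than 70", so treated as a minimum count


def cohort_floor_usd_b(n_1m_plus, n_10m_plus):
    """Minimum run-rate (USD billions) implied by the two cohorts.

    Assumes the $10M+ customers are also counted inside the $1M+ cohort.
    """
    tier_10m = n_10m_plus * 10 / 1000                 # each at least $10M
    tier_1m_only = (n_1m_plus - n_10m_plus) * 1 / 1000  # each at least $1M
    return tier_10m + tier_1m_only


def floor_share_pct(floor_usd_b, run_rate_usd_b=RUN_RATE_USD_B):
    """Floor expressed as a percentage of total run-rate."""
    return floor_usd_b / run_rate_usd_b * 100


if __name__ == "__main__":
    floor = cohort_floor_usd_b(CUSTOMERS_1M_PLUS, CUSTOMERS_10M_PLUS)
    print(f"Floor from $1M+ accounts: ${floor:.2f}B ({floor_share_pct(floor):.1f}% of run-rate)")
```

On these assumptions at least about $1.43 billion, or roughly 26% of run-rate, comes from $1M+ accounts. The true share is higher because customers spend above the thresholds, which is why the concentration ask in the table matters.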

Public financial gaps table
Missing metric	Why it matters	Best public proxy	Exact diligence path
Realized net pricing by workload	List pricing does not reveal net revenue quality or discounting intensity.	Public DBU mechanics and commitment discounts only.	Request top-100 contract sample with list price, discount, cloud, and product mix.
Audited gross margin and contribution margin	Growth can look excellent while margin quality deteriorates.	Snowflake and Confluent filings offer comparators; Sacra offers only a low-confidence estimate for Databricks.	Request audited gross profit and infrastructure cost allocation by product family.
Cash balance and debt terms	Capital adequacy cannot be modeled without exact liquidity and obligations.	CNBC says Databricks has billions in cash and about $2B of debt capacity.	Request closing cash, debt docs, maturity schedule, and covenant summary.
Monthly burn and runway	Runway is the basic underwriting test for a private company.	Positive free cash flow signals lower stress but not a full runway model.	Request trailing 18-month monthly cash bridge and downside runway scenarios.
CAC, payback, and sales-cycle length	Late-stage software underwriting needs a go-to-market efficiency view.	NRR >140%, 800+ $1M customers, and 70+ $10M customers indicate strong expansion but not acquisition efficiency.	Request cohort CAC, payback, pipeline conversion, and rep productivity by segment.
Customer concentration and top-account exposure	Very large cohorts can hide dependence on a few strategic accounts or channels.	Public sources show 70+ customers above $10M annualized spend but no top-customer concentration.	Request revenue concentration, top-20 account trends, and hyperscaler / marketplace channel overlap.

These are the core blockers to turning a strong public growth narrative into a full investment-grade financial model.

[CI014, CI017, CI018, CI021, CI044]

FI002: Unit economics bridge

Public unit-economics evidence is strongest on expansion behavior and weakest on acquisition efficiency and margin disclosure.

[CI017, CI018, CI021, CO023, CI045]

4.3 Cost structure, gross-margin path, and why dual billing matters

The cleanest public margin insight is not a Databricks audited statement but the mechanics of the platform and comparator filings. Databricks itself says total cost of ownership spans two components: direct platform costs and the underlying cloud infrastructure costs needed to run workloads. Microsoft adds the operational detail that Azure Databricks customers are billed for both VMs and DBUs, that idle pools can still incur infrastructure billing, and that committed-use purchases lower DBU prices but do not eliminate the cloud bill. That means Databricks margin quality will depend on software take-rate, workload mix, negotiated hyperscaler economics, and how much new AI serving and database usage compresses margin before scale catches up. Snowflake’s 2026 10-K is a helpful upper-bound comparator: product gross margin was 72% even after $248.1 million of additional third-party cloud infrastructure expense, including AI inference. Confluent’s filing is the cautionary counterexample: it says public-cloud pricing materially affects gross margins, that Confluent Cloud historically carried a lower average price than its legacy platform, and that the company shifted toward free-trial and pay-as-you-go land motions with more near-term volatility. Independent Databricks-specific analysis points in the same direction. CloudForecast, Mammoth, and Revefi all highlight that DBU pricing plus separate cloud charges make spend harder to predict, especially as AI workloads spike. The implication is that Databricks could still have attractive software economics, but margin underwriting remains incomplete without audited gross-profit and operating-cash detail.[CI002, CI003, CI004, CI005, CI006, CI007]
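
The dual-bill mechanics can be made concrete with a small sketch. The dollar amounts below are hypothetical placeholders, not Databricks or Azure list prices. The only sourced input is the up-to-37% DBU commitment saving cited in the pricing table, which applies to the DBU charge and not to the VM charge.

```python
from dataclasses import dataclass

MAX_DBU_COMMIT_DISCOUNT = 0.37  # "up to 37%" DBCU saving; applies to DBUs only


@dataclass
class MonthlyBill:
    vm_cost_usd: float   # cloud infrastructure charge (hypothetical amount)
    dbu_cost_usd: float  # platform charge at pay-as-you-go rates (hypothetical amount)


def total_bill(bill, dbu_discount=0.0):
    """All-in customer cost: infrastructure is undiscounted, DBUs take the discount."""
    return bill.vm_cost_usd + bill.dbu_cost_usd * (1 - dbu_discount)


def effective_saving(bill, dbu_discount):
    """Fraction saved on the total bill when only the DBU portion is discounted."""
    base = total_bill(bill)
    return (base - total_bill(bill, dbu_discount)) / base


if __name__ == "__main__":
    example = MonthlyBill(vm_cost_usd=6000.0, dbu_cost_usd=4000.0)
    saving = effective_saving(example, MAX_DBU_COMMIT_DISCOUNT)
    print(f"Headline DBU discount: {MAX_DBU_COMMIT_DISCOUNT:.0%}, all-in saving: {saving:.1%}")
```

With a hypothetical bill split 60/40 between infrastructure and DBUs, a 37% DBU discount lowers the total bill by about 14.8%, so the headline commitment saving overstates the customer's all-in saving.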

FI004: Capital intensity / cash-flow map

Databricks appears less capital-intensive than hardware-heavy AI companies, but the main cash-flow sensitivities are cloud economics, AI workload mix, and hidden debt / liquidity details.

The map is qualitative because Databricks does not publish audited gross margin, capex, or runway figures.

[CI014, CI015, CI032, CI033, CI034, CI037]

4.4 Capital adequacy, financing dependency, and the financial verdict

Public evidence points to low near-term financing stress but still leaves important underwriting holes. Databricks has moved from the December 2024 Series J package, which targeted AI investment, acquisitions, international go-to-market expansion, and employee liquidity, to a September 2025 Series K and then the February 2026 package worth more than $7 billion, including about $5 billion of equity and about $2 billion of additional debt capacity. Combined with public statements that free cash flow was positive over the prior 12 months, that suggests Databricks is financing growth options rather than plugging a disclosed liquidity crisis. CNBC also reported that the company now has billions in cash on hand, but without a precise balance, debt pricing, covenant package, amortization schedule, or monthly burn rate. That is enough to support a forward verdict of strong revenue quality and low immediate capital-intensity risk relative to many late-stage AI companies, but not enough to complete a lender-style or IPO-style liquidity model. The chapter’s practical conclusion is that Databricks appears financially durable in the near term, with multiple growth engines and ample external capital access, yet diligence should still prioritize realized pricing, audited margins, cash and debt schedules, and concentration risk before treating the public run-rate narrative as fully underwritten.[CO028, CI012, CI013, CI014, CI015, CI024]

Capital adequacy table
Capital metric	Public value / status	Evidence	Underwriting implication	Diligence ask
Cash on hand		CNBC says Databricks now has billions in cash on hand after the Feb 2026 package, but gives no exact balance.	Liquidity appears ample, but exact cash cannot be modeled.	Request current cash, restricted cash, and post-close liquidity waterfall.
Monthly burn		No public monthly burn disclosed; company instead emphasizes positive free cash flow.	Exact runway cannot be calculated from public data.	Request monthly cash burn bridge and scenario burn under slower growth.
Runway months		No exact cash balance plus no burn rate.	Runway remains an evidence gap despite strong financing access.	Request base, downside, and acquisition-adjusted runway model.
Planned use of funds	AI products, acquisitions, international GTM, employee liquidity, Lakebase, Genie	Series J and Feb 2026 company statements specify growth, product, M&A, and liquidity uses.	Capital appears growth-oriented rather than rescue-oriented.	Request board-approved capital plan and 12-24 month deployment budget.
Next-round trigger	No immediate public trigger; IPO / private financing appears optional rather than urgent	Positive free cash flow plus >$7B financing package reduce near-term dependency.	Near-term capital risk looks low, but market timing can still shape the path to IPO.	Confirm management trigger points for IPO, debt drawdown, or another private round.
Debt / credit obligations	~$2B additional debt capacity disclosed in Feb 2026; detailed terms undisclosed	Debt broadens flexibility but may embed covenants, pricing, and maturity risk not public.	Forward adequacy depends partly on unseen debt terms.	Request debt agreements, covenants, maturity ladder, and security package.

This table intentionally focuses on forward liquidity and financing dependency rather than repeating the full historical round chronology already established elsewhere in the report.

[CI014, CI015, CI024, CI025]
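
The runway gap in the table above is mechanical: runway is cash divided by monthly burn, and public sources supply neither input. The sketch below is a minimal Python illustration, not a model of Databricks. The helper name runway_months and the placeholder inputs are assumptions introduced here, and the placeholder values are not Databricks figures.

```python
from typing import Optional


def runway_months(cash_usd_m: Optional[float],
                  monthly_burn_usd_m: Optional[float]) -> Optional[float]:
    """Return runway in months, or None when an input is undisclosed.

    A zero or negative burn (that is, positive free cash flow) means the
    company is not consuming cash, which is reported here as infinity.
    """
    if cash_usd_m is None or monthly_burn_usd_m is None:
        return None  # evidence gap: cannot be computed from public data
    if monthly_burn_usd_m <= 0:
        return float("inf")
    return cash_usd_m / monthly_burn_usd_m


# Public data today: no exact cash balance and no burn rate.
print(runway_months(None, None))

# Placeholder diligence inputs in USD millions (hypothetical, for shape only).
print(runway_months(1200.0, 50.0))
```

The None branch is the point of the sketch: until the cash balance and burn bridge are provided, the correct output is an explicit gap rather than an estimate.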

4.5 Exhibits

Chapter 05

05 Product & Technology

5.1 Product scope and customer workflow coverage

Databricks is best understood as a workflow platform that starts before analytics and extends beyond it. The product surface now spans ingestion and transformation patterns, bronze-silver-gold lakehouse organization, centralized governance, BI consumption, AI model deployment, and operational application databases. LakeFlow matters because it pulls ingestion, transformation, and orchestration closer to the platform rather than leaving those jobs entirely to partners. Unity Catalog and AI/BI matter because they move Databricks from technical-platform ownership toward business-facing semantics, lineage, and conversational analytics. Mosaic AI Model Serving extends that workflow into real-time and batch inference, while Lakebase pushes the platform further into operational application development by pairing Postgres with the lakehouse. The net result is a broader customer journey: ingest and clean data, govern it centrally, expose metrics and dashboards to business users, deploy models and external-model endpoints, and increasingly build applications or agents on top of the same governed data estate. That breadth is strategically valuable because it reduces tool sprawl, but it also means underwriting Databricks requires assessing how coherently these modules work together rather than judging a single warehouse SKU.[CE001, CE002, CE007, CE011, CE012, CE014]

Product module / asset matrix
Module / product line	Primary user	Status / maturity	Differentiation	Diligence gap
Core lakehouse + medallion architecture	Data engineers and platform teams	Mature core workflow	Single governed path from raw to enriched data across bronze, silver, and gold layers	Need workload-level evidence on migration friction and performance by cloud.
Unity Catalog	Data platform, governance, security, and analytics teams	Mature control-plane pillar	Open-format governance, lineage, federation, and row/column controls across data and AI assets	Need public evidence on adoption depth of newer business-semantics and AI-governance features.
Databricks SQL + AI/BI	Analysts, business users, and semantic-layer owners	Mature analytics with expanding business-user reach	Native BI on governed data with conversational analytics and no public per-seat BI fee	Need public proof of production BI adoption, concurrency, and dashboard migration success.
Mosaic AI Model Serving	ML engineers, application developers, and platform teams	Mature serving surface with expanding external-model governance	Unified REST deployment and serverless serving for internal and external models	Need independent latency, cost, and guardrail benchmarks versus alternatives.
LakeFlow	Data engineering teams	Expanding; launched 2024 and still integrating partner overlap	Built-in ingestion, transformation, and orchestration reduce need for separate pipeline tooling	Need public evidence on connector breadth, reliability, and large-scale production references.
Lakebase	Application developers and agent builders	Emerging but materially advanced; GA reported in 2026	Operational Postgres integrated with the lakehouse, branching, point-in-time recovery, scale to zero	Need customer volume, cost-to-serve, and multi-cloud availability detail.
Lakewatch	Security teams and SecOps analysts	Newly launched in 2026	Extends Databricks data platform into AI-assisted SIEM workflows	Need public benchmarks, customer references, and false-positive / efficacy data.
CLI + Python SDK	Developers and platform engineers	Active ecosystem tooling with recent releases	Multi-cloud automation and developer workflows beyond notebooks alone	Need broader usage and contributor trends across the full ecosystem.

Rows separate mature core platform layers from newer expansion products such as Lakebase and Lakewatch. “Status / maturity” reflects public release evidence, not internal revenue contribution.

[CE002, CE007, CE012, CE022, CE028, CE037]

Workflow / use-case table
User job	Current workflow	Databricks solution	Measurable benefit	Limitation
Ingest SaaS and database data	Teams often chain separate ingestion, replication, and orchestration tools before analytics	LakeFlow ingestion, transformation, and orchestration inside Databricks	LakeFlow was launched to reduce the need for bespoke or third-party ingestion stacks	Public evidence does not show connector reliability or realized replacement rates at scale.
Create governed enterprise data products	Data lands in fragmented stores and governance tools with duplicated controls	Medallion layers plus Unity Catalog governance and lineage	Unified governance and lineage reduce audit friction and make downstream usage easier to trace	Public proof of governance-operating efficiency is still mostly vendor-authored.
Let business users self-serve analytics	BI sits on separate semantic layers and seat-based licensing models	AI/BI Dashboards, Genie, Databricks SQL, and Business Semantics on governed data	No public per-seat BI fee and conversational analytics reduce access friction	Customer migration effort from incumbent BI tools is not publicly quantified.
Deploy and manage AI inference	Teams manage separate model endpoints, APIs, and provider credentials	Mosaic AI Model Serving with REST APIs, serverless scaling, and centrally governed external models	Unified batch and real-time inference path simplifies deployment under one control plane	Independent latency, cost, and security comparisons are limited.
Build operational apps on governed data	Operational databases and analytics warehouses are separated by ETL and separate tooling	Lakebase adds Postgres integrated with the lakehouse and Databricks Apps	VentureBeat reported early adopters cutting app-delivery times by 75%-95% or 56%-92% depending on customer example	Those performance outcomes are company-reported through press coverage and not yet broadly audited.

Benefits are only included where the retained sources provided a concrete workflow or outcome statement. Press-reported customer outcomes remain lower confidence than audited benchmark data.

[CE011, CE013, CE020, CE021, CE022, CE023]

FE002: Customer Workflow / Operating Flow

Workflow from source-data ingestion to governed analytics, AI deployment, and operational application delivery. The figure emphasizes where Databricks has expanded from core lakehouse roots into pipeline tooling and Postgres-backed apps.

[CE011, CE013, CE020, CE021, CE022, CE028]

FE004: Product Maturity / Capability Map

Qualitative maturity map across Databricks capability areas. Core governance and lakehouse layers appear mature; BI and AI serving are mature-to-expanding; Lakebase and Lakewatch remain newer lines that still need broader public proof.

The maturity labels are analyst judgments synthesized from public release evidence, documentation depth, and independent reporting; they are not company-provided scores.

[CE004, CE011, CE022, CE028, CE031, CE033]

5.2 Architecture, deployment model, and critical dependencies

The most supportable public architecture picture is a hybrid one: Databricks manages a control plane, while classic compute still runs in customer cloud accounts and serverless compute runs in Databricks-managed infrastructure. Azure documentation and independent architecture analysis both describe the control-plane/compute-plane split, while Databricks' own architecture guidance frames the platform around control plane, compute plane, and storage. On the data path, Databricks keeps pushing the medallion pattern because bronze, silver, and gold layers make it easier to express ingestion, validation, and consumption steps as one governed pipeline. Unity Catalog then acts as the metadata and policy plane above those assets, and model serving exposes governed inference endpoints through REST APIs and serverless scaling. The dependency map is therefore not just hyperscalers. Databricks also depends on open-table-format politics, partner cloud services such as BigQuery and Gemini on Google Cloud, GPU acceleration paths such as NVIDIA RAPIDS, and external model providers such as OpenAI and Anthropic where customers want centrally governed third-party models. This architecture is flexible and differentiated, but it also means product quality depends on how well Databricks manages cloud boundaries, open-format interoperability, and external-service performance under one control plane.[CE003, CE015, CE016, CE017, CE018, CE019]

Technology / operating architecture table
Layer / component	Role	Dependency	Risk
Account + control plane	Hosts web app, account services, APIs, and central coordination	Databricks-managed control plane and account services	Control-plane concentration creates blast-radius risk if central services degrade.
Workspace + classic compute plane	Runs notebooks, jobs, and customer-managed compute in the customer cloud account	Customer cloud networking, identity, and classic cluster configuration	Security posture varies with workspace design and cloud-account hygiene.
Serverless compute	Runs model serving and serverless SQL without customer-managed public IPs	Databricks-managed serverless infrastructure and separate terms enablement	Less customer control and public transparency on service-family SLAs and incident rates.
Lakehouse data pipeline	Organizes data through bronze, silver, and gold quality layers	Storage systems, ingestion tools, and medallion discipline	Poor upstream data quality still propagates if silver/gold controls are weak.
Unity Catalog metadata plane	Enforces governance, lineage, discovery, and federation	Open formats, external systems, and policy configuration across clouds	Metadata centralization is strategic strength but becomes a control-plane dependency.
AI deployment layer	Serves internal and external models via REST APIs and AI Functions	Serverless compute, model registries, and third-party model providers such as OpenAI and Anthropic	Latency, cost, and policy outcomes partly depend on external-model vendors.
Open ecosystem + partner cloud layer	Extends Databricks through Iceberg, BigQuery, Gemini, and GPU acceleration	Google Cloud, NVIDIA, and open-table-format interoperability	Differentiation is tied to partner performance and evolving open-format standards.
Operational database layer	Runs Lakebase Postgres for AI-agent and operational apps	Neon/Mooncake-derived database tech plus Unity Catalog sync	Newer product category with less public scaling and reliability history than core lakehouse.

The table emphasizes operating-model dependencies rather than low-level implementation details. Public sources support the control-plane split, medallion pipeline, and partner dependencies but not internal service topology.

[CE015, CE018, CE019, CE020, CE022, CE023]

FE001: Databricks Product Architecture Map

Five-layer stack showing how Databricks ties user-facing analytics and app workflows to centralized governance, lakehouse pipelines, AI serving, and multi-cloud infrastructure. The architecture is broad, but the control plane and partner ecosystem remain critical dependencies.

[CE003, CE007, CE012, CE018, CE020, CE022]

FE003: Critical Dependency Map

Directed graph of the major external and internal dependencies that shape Databricks product delivery: hyperscalers, open formats, GPUs, external models, and the central Databricks control plane.

[CE018, CE019, CE024, CE028, CE030, CE034]
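
The Dependency column of the architecture table in 5.2 can be read as a directed graph, which makes blast-radius questions easy to ask. The Python sketch below is a simplified analyst reading of that table, not Databricks' internal service topology. The node names are shortened from the table, and the two helper functions are hypothetical names introduced here.

```python
from typing import Dict, List, Set

# Edges point from a component to what it depends on (simplified from 5.2).
DEPENDS_ON: Dict[str, List[str]] = {
    "AI deployment layer": [
        "Serverless compute", "Model registries", "External model providers"],
    "Serverless compute": ["Databricks-managed serverless infrastructure"],
    "Operational database layer": [
        "Neon/Mooncake-derived database tech", "Unity Catalog metadata plane"],
    "Unity Catalog metadata plane": ["Open table formats", "External systems"],
    "Open ecosystem + partner cloud layer": [
        "Google Cloud", "NVIDIA", "Open table formats"],
    "Workspace + classic compute plane": ["Customer cloud account"],
    "Lakehouse data pipeline": ["Storage systems", "Ingestion tools"],
}


def transitive_dependencies(component: str) -> Set[str]:
    """Everything a component depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, []))
    return seen


def exposed_components(dependency: str) -> Set[str]:
    """Components that would be affected if the given dependency degraded."""
    return {c for c in DEPENDS_ON if dependency in transitive_dependencies(c)}


print(sorted(exposed_components("Open table formats")))
```

In this simplified reading, a change in open-table-format standards reaches the operational database layer indirectly through Unity Catalog, which is the kind of second-order exposure a dependency map is meant to surface.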

5.3 Trust, safety, privacy, compliance, and reliability posture

Databricks has enough public trust material to show serious enterprise posture, but not enough to treat trust as fully de-risked. The Trust Center says security is built into every layer of the platform and publicly points buyers to encryption, network controls, auditing, identity integration, access controls, and data governance. The compliance pages list a wide range of frameworks relevant to regulated buyers, including FedRAMP, HIPAA, GDPR, PCI-DSS, ISO 27001/27017/27018/27701, and SOC, with SOC 3 public and other reports accessible through diligence channels. Serverless SQL adds one concrete architectural trust signal because Databricks says those warehouses have no public IP addresses. At the same time, reliability remains a live operating issue rather than a solved checkbox: on the run date, the AWS status page showed partial compute disruption in multiple regions even while AI/BI and Apps were largely operational. Independent uptime monitors add only limited comfort because they often summarize uptime without publishing detailed incident data or root causes. AI security is also a moving target, not a closed question. Databricks publicly discusses AI security resources and in March 2026 launched Lakewatch as an AI-assisted SIEM product, but there is still little independent evidence on detection quality, false positives, or how responsible-AI controls perform in production environments.[CE025, CE031, CE032, CE033, CE044, CE045]

Trust / quality / compliance table
Control / certification / quality signal	Status	Scope	Gap
Encryption, network controls, auditing, identity integration, access controls, governance	Publicly documented	Platform-wide trust posture according to Databricks Trust Center	Public pages do not quantify control effectiveness or incident-prevention outcomes.
Serverless SQL without public IPs	Publicly documented	Serverless SQL network isolation on AWS	Does not by itself disclose uptime, egress policy coverage, or all serverless-service boundaries.
FedRAMP, HIPAA, GDPR, PCI-DSS, ISO 27001/27017/27018/27701, SOC	Publicly listed	Regulated-industry and privacy posture across supported clouds	Framework listing is not the same as customer-specific configuration or scope fit.
SOC 3 public; SOC 1 and SOC 2 available via diligence channels; reports refreshed three times yearly	Publicly documented	Audit cadence and report availability	No public SOC detail in the chapter beyond availability and cadence.
Live status page by service family and region	Publicly documented	Operational visibility for active incidents	Historical MTTR, severity distribution, and root-cause reporting remain limited.
Lakewatch AI-assisted SIEM launch	Recently launched	Extends platform into AI security operations	Independent efficacy evidence and customer deployments remain sparse.

Trust evidence is strongest for control coverage and compliance breadth, but weakest for quantified reliability and independent AI-security efficacy.

[CE025, CE031, CE032, CE033, CE044, CE045]

5.4 Maturity, differentiation, and roadmap signals

The strongest differentiation signal is that Databricks is trying to be the governed control plane for open data, AI assets, and now operational application data, not just the place where Spark jobs run. Unity Catalog's open-format posture, federation support, and lineage features are central to that strategy, and Google Cloud's Iceberg commentary plus theCUBE's 2025 summit summary both reinforce that openness is a real product direction rather than a one-off talking point. The 2024-2026 launch timeline also shows consistent scope expansion: LakeFlow addressed ingestion and orchestration, the 2025 summit pushed semantics, agent tooling, and Lakebase into the foreground, Lakebase reached general availability in early 2026, and Lakewatch added a new security layer weeks later. The developer signal points the same way. The CLI and Python SDK show active releases in April 2026, and the SDK documentation emphasizes unified support across AWS, Azure, and GCP, which is the footprint expected from a platform company rather than a narrowly packaged application. The public evidence therefore supports a verdict of broad product maturity in core lakehouse, governance, and developer tooling, expanding maturity in BI and AI serving, and emerging maturity in operational database and AI-security products. What remains weak is forward visibility: Databricks publishes release notes at an active cadence, but not a dated roadmap that would let an external investor cleanly separate near-term launches from longer-horizon ambition.[CE004, CE006, CE009, CE010, CE027, CE035]

Roadmap / release / development-stage table
Date / stage	Feature / milestone	Status	Implication	Source
2024-05-14	NVIDIA publishes RAPIDS-on-Databricks technical guide	Ecosystem capability documented	Signals that Databricks is investing in GPU-accelerated developer workflows, not just CPU-bound analytics.	NVIDIA Technical Blog
2024-06-12	LakeFlow launch for ingestion, transformation, and orchestration	Launched	Moves Databricks upstream into built-in pipeline tooling and reduces reliance on adjacent vendors.	TechCrunch
2025-06-11	Data + AI Summit 2025 updates around Unity Catalog semantics, open formats, GenAI tools, and Lakebase	Announced / expanded direction	Shows platform expansion from lakehouse core toward a broader enterprise data-and-AI operating layer.	theCUBE Research
2025-08-30	Google Cloud blog highlights Unity Catalog Iceberg support across catalogs	Partner-confirmed ecosystem milestone	Reinforces Databricks' open-format and interoperability posture.	Google Cloud Blog
2026-02-03	Lakebase reported generally available and built on Neon plus Mooncake technology	GA reported by independent press	Makes Databricks relevant for operational apps and agent workflows, not only analytics.	VentureBeat
2026-03-24	Lakewatch AI security product launched	New product launch	Expands Databricks into SIEM-style security workflows but also introduces a new proof burden.	TechCrunch
2026-04-30	CLI v0.299.0 and Python SDK 0.106.0 visible in public release surfaces	Active developer-tooling cadence	Developer tooling is shipping quickly enough to matter as a productization signal.	GitHub / PyPI
2026-05-04	Release-note index current through May 2026 with Lakeflow declarative pipelines and serverless called out	Current release cadence visible	Confirms ongoing platform iteration but not a dated forward roadmap.	Databricks Docs

The table focuses on externally visible milestones that change product scope or maturity. It does not infer undisclosed future dates beyond public releases and announcements.

[CE027, CE035, CE037, CE038, CE040, CE041]

5.5 Exhibits

Chapter 06

06 Customers

6.1 Customer segmentation and visible buyer map

Databricks’ public customer picture is broad, but it is most useful when separated into buying contexts rather than treated as a single logo wall. The company’s own 2025 and 2026 disclosures now anchor scale at more than 20,000 organizations and 70% of the Fortune 500, while CNBC still referenced more than 15,000 customers in mid-2025. That progression matters because it shows both breadth and momentum instead of a one-off marketing number. The visible buyer map is also broader than a core data-engineering sale. Microsoft, Google Cloud, and SAP all position Databricks as a route to enterprise analytics and AI procurement, while named accounts span telecom, payments, media, retail, healthcare, and public-sector style workloads. In practice, Databricks is sold to platform leaders, data and AI teams, governance leaders, and increasingly business users who consume AI/BI and governed analytics. The payer is often a central platform or cloud budget, but partner routes can influence procurement and renewal control. This makes Databricks look like a scaled enterprise platform with multiple buyer centers rather than a single-workload tool.[CU001, CU002, CU003, CU042, CU043, CU044]

Customer segmentation table
Segment	Buyer / user / payer	Use case	Scale	Revenue / strategic value	Gap
Large global enterprises	CDO/CIO, platform teams, analysts, governed business users	Unified data, analytics, AI, and agent workloads	20,000+ customers claimed; 70% Fortune 500 penetration claimed	Largest source of expansion and whale cohorts	No public split between active, paying, and channel-routed customers
Consumer and digital experience brands	Marketing, data science, product, operations	Real-time search, fan experience, personalization, store operations	7-Eleven, FOX Sports, Rivian, Block highlighted publicly	Shows Databricks can move beyond classic back-office analytics	No public contract values or renewal detail by consumer brand
Financial services and payments	Data platform, governance, onboarding, fraud, risk teams	AI assistants, secure data collaboration, pipeline optimization	Mastercard and Block explicitly named in current disclosures	High-reference value because governance and privacy matter	No public revenue concentration or procurement-cycle detail
Regulated healthcare, manufacturing, and public sector	Manufacturing ops, compliance, data engineering, federal contractors	Operational data unification, reliability, and compliant analytics	Insulet plus FedRAMP/Azure IL5 public-sector readiness	Supports trust-led and regulated-workload expansion	No disclosed public-sector bookings or healthcare retention data
Partner-routed enterprise buyers	Cloud architects, enterprise platform teams, SAP data owners	Buy via Azure, Google Cloud Marketplace, or SAP Business Data Cloud	Azure, Google Cloud, and SAP maintain active Databricks routes	Extends distribution and lowers procurement friction for some enterprises	Direct versus partner-sourced customer mix is undisclosed

Segmentation separates visible buying contexts and channels instead of treating customer proof as a single undifferentiated enterprise bucket.

[CU001, CU002, CU003, CU042, CU043, CU044]

6.2 Named deployment proof is strongest when it includes measurable outcomes

The strongest public customer proof is not the existence of logos, but named deployments with specific outcomes. The July 2025 Databricks summit recap gives the cleanest recent set. 7-Eleven used Databricks for a multipurpose agentic marketing assistant across more than 13,000 stores and also used Databricks workflows to support a Unity Catalog migration. FOX Sports built Cleatus AI to answer fan questions in natural language and said AI-powered search produced a 2x improvement in query success. Mastercard gives both workflow and economics proof: its onboarding assistant was built with Databricks, it uses human-in-the-loop feedback, and Databricks said Mastercard cut query time by 80% and storage by 70% while reducing processing from months to days. AT&T is the best large-enterprise migration proof from a partner domain, with Microsoft documenting 300% five-year ROI, more than 80 schema reductions, and roughly 3x faster data-science cycles on Azure Databricks. Insulet adds healthcare and manufacturing proof, with 12x faster processing and sharply lower data-stack cost. Together these references support real production use across multiple verticals and multiple deployment shapes, not just conference-stage demos.[CU011, CU012, CU013, CU014, CU015, CU016]

Named customer proof table
Customer	Segment	Deployment / use case	Production vs pilot	Outcome	Limitation
7-Eleven	Retail / store operations / marketing	Agentic marketing assistant, Unity Catalog migration support, technician knowledge retrieval	Production use case presented publicly	Tracks store performance across 13,000+ stores and uses workflows to guide migration steps	No public contract value, renewal data, or ROI disclosure
FOX Sports	Media / consumer engagement	Cleatus AI fan assistant with natural-language search over scores, stats, and commentary	Production	AI-powered search more than doubled query success for fans	No disclosed commercial metrics or retention terms
Mastercard	Financial services / payments	GenAI onboarding assistant plus data-pipeline optimization and governance	Production	Onboarding sped up, drop-off reportedly declined, query time down 80%, storage down 70%, processing reduced from months to days	Customer economics and renewal details are undisclosed
AT&T	Telecom / enterprise data platform	Migration of large data estate onto Azure Databricks plus AutoClassify ML use case	Production	300% five-year ROI, more than 80 schemas reduced, and data-science cycles about 3x faster	Case study focuses on internal platform value, not external revenue outcomes
Insulet	Healthcare / manufacturing	Lakeflow Connect-driven data unification for manufacturing and customer-service data	Production	12x faster processing and 97% lower TCO, with near-real-time enterprise ingestion	Only company-authored proof is public; no independent outcome audit

Rows prioritize the clearest named accounts with measurable outcomes or credible corroboration rather than attempting an exhaustive customer roster.

[CU012, CU013, CU014, CU015, CU016, CU017]

FU003: Customer proof matrix

Relative strength of public customer proof by named account.

[CU012, CU016, CU019, CU020, CU024, CU026]

6.3 Durability is directionally strong, but public retention proof is still incomplete

Public durability evidence is positive, but it is not complete enough to underwrite Databricks like a fully disclosed public software company. The strongest disclosed signal is the company’s sustained net retention above 140%, repeated in September 2025, February 2026, and corroborated by CNBC and CRN. Large-account cohorts point in the same direction: Databricks moved from 650-plus to 800-plus $1M annual run-rate customers and from nearly 50 to more than 70 $10M customers in the same general period. Those are real expansion indicators. But public retention evidence still stops short of what an investor would ideally want. There is no public GRR, no segmented churn, no cohort waterfall, and no contract-term disclosure. Review surfaces are helpful only as directional color. Databricks’ own Gartner recap points to strong AI/BI satisfaction, while accessible independent review pages also surface consistent complaints about cost control, complexity, and onboarding. The right conclusion is that Databricks shows strong expansion and meaningful product value, but public sources still do not prove renewal durability by cohort or by product line.[CU004, CU005, CU006, CU007, CU008, CU009]

Customer growth / adoption trajectory table
Metric	Value	Date	Source	Confidence	Implication	Missing denominator
Public customer count (historical reference)	15,000+	2025-06-12	CNBC	medium	Shows scale was already large before the 2025-2026 financing cycle	No split by paid versus active accounts
Public customer count (current company claim)	20,000+	2026-02-09	Databricks press release	medium	Anchors current breadth across enterprise and AI buyers	No product-family or geography split
Fortune 500 penetration	70%	2026-02-09	Databricks press release	medium	Signals deep enterprise reach and high-reference value	No disclosure of depth per Fortune 500 account
$1M+ annual run-rate customers	650+	2025-09-08	Databricks press release	medium	Supports land-and-expand traction in large accounts	No gross retention for this cohort
$1M+ annual run-rate customers	800+	2026-02-09	Databricks press release / CRN	medium	Shows rapid whale-account expansion into 2026	No public revenue share from the cohort
$10M+ annual spenders	Nearly 50	2025-06-12	CNBC	medium	Confirms an already material whale cohort by mid-2025	No top-10 concentration disclosure
$10M+ annual run-rate customers	70+	2026-02-09	Databricks press release / CRN	medium	Suggests deeper enterprise embed and cross-sell	No segment or cloud-channel split
Net retention	>140%	2025-09 to 2026-02	Databricks / CNBC / CRN	medium	Best public durability proxy for platform expansion	No GRR, churn, or time-bucket cohort data

Trajectory rows rely on public breadth and spend-cohort metrics rather than unsupported estimates of active seats or deployment counts.

[CU001, CU003, CU004, CU005, CU006, CU007]
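
The trajectory rows above imply growth rates that the table does not state. The Python sketch below derives them from the published values only. It treats each "+" figure as its floor and rounds "nearly 50" up to 50, both of which are assumptions, and it mixes CNBC and company sources in the same way the table does. The function names are hypothetical, and the results are directional rather than reported growth.

```python
from datetime import date

# (metric, start value, start date, end value, end date), from the table above.
ROWS = [
    ("Customers", 15_000, date(2025, 6, 12), 20_000, date(2026, 2, 9)),
    ("$1M+ run-rate customers", 650, date(2025, 9, 8), 800, date(2026, 2, 9)),
    ("$10M+ customers", 50, date(2025, 6, 12), 70, date(2026, 2, 9)),
]


def implied_growth(start_value: float, end_value: float) -> float:
    """Simple growth between two published data points."""
    return end_value / start_value - 1.0


def annualized(growth: float, days: int) -> float:
    """Compound a period growth rate to a 365-day equivalent."""
    return (1.0 + growth) ** (365.0 / days) - 1.0


for name, v0, d0, v1, d1 in ROWS:
    days = (d1 - d0).days
    g = implied_growth(v0, v1)
    print(f"{name}: {g:.1%} over {days} days, "
          f"{annualized(g, days):.1%} annualized")
```

Because the inputs are floors from different publishers, the annualized figures are best used to compare the relative pace of the cohorts, not as point estimates.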

Retention / repeat usage / satisfaction table
Metric	Value / null	Segment	Confidence	Diligence ask
Net retention	>140%	Overall platform	medium	Request GRR, logo churn, and retention by product family and customer band.
AI/BI review signal	4.8 / 5 and 94% willingness to recommend from 167 verified reviews	Analytics and BI users	medium	Validate whether AI/BI satisfaction translates into broader platform renewal.
PeerSpot review base	93 reviews; multiple cost-management complaints	Enterprise review-site audience	low	Request support SLAs, FinOps tooling adoption, and cost-governance outcomes by segment.
Capterra archived review signal	17 archived reviews; setup and interface complexity recur in cons	Mixed user base	low	Request onboarding time-to-value and training requirements for newer or smaller teams.
Mastercard onboarding churn direction	Churn down, exact percentage undisclosed	Payments onboarding workflow	low	Request quantitative before/after abandonment rates and enterprise rollout breadth.
Public GRR / cohort retention		All segments	low	Request month-by-month and annual cohort retention by enterprise, regulated, and AI-heavy customers.
Public contract duration / renewal term		All segments	low	Request weighted-average remaining term and standard renewal cadence by account band.

Public durability evidence is strongest on NRR and weakest on cohort retention, GRR, and contract-term detail. This table intentionally substitutes for the planned cohort figure because no public 0-100 retention cohort data was found.

[CU008, CU009, CU010, CU034, CU035, CU036]

Retention cohort substitution table
Evidence area	Public signal	What is missing	Why figure was not used
Net retention	>140% overall NRR is publicly repeated in 2025-2026	No GRR, logo churn, or 0-100 time-bucket retention series	A cohort figure would require actual percentage buckets rather than directional expansion signals.
$1M+ and $10M+ cohorts	Public counts of 650+, 800+, nearly 50, and 70+ large accounts	No cohort renewal or contraction history for those whale bands	Spend-band counts show expansion, but they do not satisfy a retention-cohort data contract.
Review and satisfaction signals	AI/BI review scores are strong while PeerSpot and Capterra surface cost and complexity complaints	No link between those reviews and actual paid renewal behavior	Review text helps with qualitative durability but cannot populate numeric cohort cells.
Named customer stories	Mastercard, AT&T, FOX Sports, 7-Eleven, and Insulet show live deployments and outcomes	No public contract term, cohort, or renewal denominator for those references	Case studies prove production use, not retention percentages over time.

This extra table intentionally substitutes for the planned retention cohort figure because public sources did not provide time-bucket retention percentages between 0 and 100.

[CU005, CU006, CU007, CU008, CU009, CU010]
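
The reason NRR above 140% does not settle the durability question is arithmetic: expansion can mask churn. The Python sketch below uses the common definitions (NRR adds expansion to retained revenue, GRR excludes it) and two invented cohorts that share the same NRR. The definitions are a standard convention rather than Databricks' disclosed methodology, and every number is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Annual recurring revenue movements for one cohort (hypothetical units)."""
    starting_arr: float
    expansion: float
    contraction: float
    churn: float

    def nrr(self) -> float:
        # Net retention: retained revenue plus expansion, over starting ARR.
        retained = self.starting_arr - self.contraction - self.churn
        return (retained + self.expansion) / self.starting_arr

    def grr(self) -> float:
        # Gross retention: retained revenue only, expansion excluded.
        retained = self.starting_arr - self.contraction - self.churn
        return retained / self.starting_arr


# Two invented cohorts, not Databricks data.
sticky = Cohort(starting_arr=100.0, expansion=45.0, contraction=3.0, churn=2.0)
leaky = Cohort(starting_arr=100.0, expansion=60.0, contraction=5.0, churn=15.0)

for label, cohort in (("sticky", sticky), ("leaky", leaky)):
    print(f"{label}: NRR {cohort.nrr():.0%}, GRR {cohort.grr():.0%}")
```

Both cohorts report 140% NRR, yet one retains 95% of starting revenue and the other 80%. That spread is what the GRR and logo-churn diligence asks are meant to resolve.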

6.4 Expansion logic is credible, but concentration and channel economics remain partially hidden

Databricks’ expansion logic is visible even without full cohort disclosure. The platform can start as a data-engineering or cloud-migration buy, then expand into governance, AI serving, AI/BI, operational data products, or customer-specific agent workflows. AT&T’s nearly 90,000 internal users on one architecture and Mastercard’s expansion from data pipelines into onboarding assistants are concrete examples of that widening use. Public partner surfaces reinforce the same pattern. Azure positions Databricks as a first-party Azure service, Google Cloud offers Databricks through Marketplace, and SAP now embeds SAP Databricks as a first-party service inside Business Data Cloud. Those routes should make Databricks easier to buy and harder to displace inside large enterprises. The unresolved issue is economic control. Public sources do not show how much revenue is partner-sourced, how concentrated the top customer set is, or whether the curated reference list overstates the ease of expansion. Databricks therefore looks strong on customer adoption quality and cross-sell logic, while still requiring private diligence on top-account concentration, channel mix, and term structure before treating customer durability as fully underwritten.[CU005, CU006, CU019, CU024, CU031, CU041]

Expansion and concentration risk table
Expansion driver	Concentration risk	Impact	Diligence path
Usage-led platform expansion from data engineering into AI, BI, and operational use cases	$10M+ whale cohort is material but top-customer share is undisclosed	Strong growth with unknown revenue concentration at the very top of the base	Request top-10 and top-20 ARR share, gross retention, and product attach by whale cohort.
Internal land-and-expand after platform standardization, as seen at AT&T and Mastercard	Switching-cost strength may hide dependence on a few deeply embedded accounts	Can improve durability, but also magnify downside if one strategic account slows consumption	Request usage concentration by workspace, region, and product family for the largest customers.
Partner channels through Azure, Google Cloud, and SAP	Partner-routed deals may compress economics or shift renewal control	Channel leverage can help acquisition while reducing direct control over procurement and margin	Request direct versus partner-sourced bookings, marketplace mix, and renewal ownership by channel.
Referenceable AI use cases such as 7-Eleven, FOX Sports, Mastercard, and Insulet	Curated references may overstate average deployment success and understate failed pilots	Good public proof, but survivorship bias remains a real diligence issue	Request win/loss data, failed pilot counts, and references from the last four quarters by segment.
Large review base and high AI/BI satisfaction alongside repeated cost complaints	Smaller or less-mature buyers may expand more slowly if cost governance is weak	Could cap adoption depth outside sophisticated enterprise teams even if top accounts keep expanding	Request cost-governance attach, training coverage, and support resolution metrics for smaller accounts.

Expansion logic is visible, but concentration and channel economics remain materially underdisclosed.

[CU005, CU006, CU019, CU024, CU031, CU035]

FU001: Customer journey map

Observed path from initial platform need to production rollout and cross-workload expansion in large Databricks accounts.

[CU005, CU006, CU012, CU016, CU019, CU024]

FU002: Adoption / deployment funnel

Publicly observable adoption path from enterprise need to standardized multi-workload deployment.

[CU005, CU006, CU012, CU016, CU019, CU029]

6.5 Exhibits

Chapter 07

07 Risks

7.1 Legal, regulatory, and security risk is real even with a strong public trust surface

Databricks has a stronger public compliance and trust posture than many private infrastructure vendors: it publishes a privacy notice, a downloadable DPA, a trust center, a due-diligence package, a legal hub, and technical documentation that explicitly describes security responsibilities. That matters because it reduces initial diligence friction and shows the company is built to sell into regulated enterprises. The residual risk is that documentation is not the same thing as cleared exposure. The EU AI Act is now live with phased obligations for general-purpose AI models, public-sector authorization is explicitly cloud- and package-specific, and Databricks remains inside an active copyright dispute tied to Mosaic and DBRX. The Books3 / RedPajama litigation is not a theoretical AI-policy debate; it is a live legal process with U.S. and Canadian dimensions, uncertain damages, and potential reputational spillover for enterprise AI buyers. Databricks therefore looks better prepared than many peers on paper, but still exposed to a combination of regulatory scope creep, model-governance scrutiny, and litigation outcomes that are difficult to price from public data alone.[CR001, CR004, CR007, CR008, CR009, CR010]

Regulatory / legal risk register
Risk	Jurisdiction / source	Status	Likelihood	Severity	Mitigation	Residual exposure	Diligence path
AI copyright litigation tied to Mosaic / DBRX	U.S. federal + Canada	Active U.S. docket plus proposed Canadian class action	medium	high	Litigation defense, model-governance evidence, privacy / trust materials	Discovery, damages, or settlement could raise legal cost and enterprise trust friction	Request full litigation memo, reserve analysis, insurance coverage, and training-data provenance
EU AI Act and AI-governance obligations	EU / EEA	Phased obligations begin in 2025 and 2026 for GPAI and high-risk uses	medium-high	high	DPA, SCCs, trust materials, Unity Catalog governance claims	If Databricks sits closer to provider obligations than customers assume, compliance cost and GTM friction rise	Request AI Act applicability map by product, model role, and region
Public-sector authorization scope	U.S. federal	FedRAMP Certified exists for Databricks on Azure Commercial as of 2026-01-16	medium	medium-high	Cloud-specific authorization plus enhanced compliance controls	Investors may over-assume public-sector readiness across clouds, regions, or SKUs	Request cloud / region / product authorization matrix and renewal status
Privacy contracting and transfer regime	Global	Privacy notice, DPA, DPF, and legal center are public	medium	medium	Standard contractual clauses, supplementary measures, downloadable DPA, trust center	Cross-border processing, third-party service providers, and shared responsibility can still create contract friction after incidents	Request enterprise MSA, indemnity terms, subprocessor terms, and customer redline trends

Rows are ordered by residual severity, not by how much public disclosure exists.

[CR004, CR005, CR006, CR010, CR011, CR012]

7.2 Operational risk is transmitted through outages, shared responsibility, and partner concentration

Databricks’ operational risk is not just whether a core service stays up; it is how many critical workflows sit on top of cloud-specific deployments, partner models, and customer-side configuration. The official status page gives visibility, but third-party monitoring still shows enough Azure Databricks incident volume to treat reliability as an underwriting variable rather than a footnote. Databricks’ own documentation also makes clear that security and compliance are shared responsibilities across Databricks, customers, and cloud providers. That structure is normal for cloud infrastructure, but it means customer misconfiguration, workload placement, or control gaps can still become Databricks problems in practice when large enterprises evaluate renewals or incident response. At the same time, the company’s AI roadmap is increasingly entangled with Google Gemini, Anthropic Claude, SAP’s embedded data cloud route, and hyperscaler-native buyer relationships. These partnerships clearly accelerate distribution and feature breadth, yet they also create concentrated dependencies around account control, model access, and gross-margin leakage that public sources do not quantify.[CR008, CR009, CR018, CR019, CR020, CR021]

Operational / quality / security risk register
Failure mode	Evidence	Likelihood	Severity	Mitigation maturity	Residual exposure	Unresolved gap
Cloud-specific outages and degradations	IsDown reports 20 incidents in the last 90 days and 173 since Jan 2023; official status page exists	medium-high	high	medium	Enterprise workloads can still be interrupted by provider-region failures or slow recovery	No public SLOs, postmortems, or customer SLA-credit detail
Shared-responsibility misconfiguration	Databricks docs say security and compliance are shared among Databricks, customer, and cloud provider	medium	high	medium	Customer-side gaps can still become churn, legal, or reputational problems for Databricks	No public rate of misconfiguration-driven incidents by customer segment
Advanced controls as add-on rather than obvious default baseline	Enhanced Security and Compliance is a named add-on that includes FedRAMP High, FedRAMP Moderate, and HIPAA	medium	medium-high	medium	Some higher-assurance controls may require explicit packaging or workload choices	No public attach-rate or tier-by-tier baseline control disclosure
Security transparency thinner than compliance marketing	Trust, privacy, and legal surfaces are public, but detailed incident history is not	medium	medium	low-medium	Current mitigation posture is documentation-heavy rather than incident-history heavy	No public breach register, incident taxonomy, or RCA cadence

This table focuses on residual exposure after considering the public mitigation surface, not on whether controls exist at all.

[CR007, CR008, CR009, CR018, CR019, CR020]
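The two IsDown counts in the register can be put on a common footing to judge whether recent incident frequency sits above the longer-run average. The sketch below is a rough check under two assumptions that are not in the source: that "since Jan 2023" starts on 1 January 2023, and that both counts were observed on the report date of 2026-05-05. It says nothing about incident severity or duration.

```python
from datetime import date

INCIDENTS_LAST_90_DAYS = 20       # from the register row
INCIDENTS_SINCE_JAN_2023 = 173    # from the register row
WINDOW_START = date(2023, 1, 1)   # assumption: "since Jan 2023" means 1 January
AS_OF = date(2026, 5, 5)          # assumption: counts observed on the report date


def incidents_per_90_days(total: int, start: date, end: date) -> float:
    """Average number of incidents per 90-day window between start and end."""
    elapsed_days = (end - start).days
    return total / elapsed_days * 90


long_run_rate = incidents_per_90_days(INCIDENTS_SINCE_JAN_2023, WINDOW_START, AS_OF)
recent_vs_long_run = INCIDENTS_LAST_90_DAYS / long_run_rate

print(f"Long-run average: {long_run_rate:.1f} incidents per 90 days")
print(f"Recent 90 days:   {INCIDENTS_LAST_90_DAYS} incidents")
print(f"Recent / long-run ratio: {recent_vs_long_run:.2f}x")
```

Under these assumptions the recent 90-day count runs above the long-run average, which is directionally consistent with treating reliability as an underwriting variable. Changing either assumed date shifts the ratio, so the result should be read as indicative only.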

Partner / dependency risk register
Dependency	Counterparty / market	Role	Concentration	Failure scenario	Severity	Mitigation	Residual exposure
Cloud and channel concentration	Microsoft, Google Cloud, SAP	Hosting, procurement, embedded distribution, enterprise route-to-market	high	Bundling, pricing, or policy shifts weaken Databricks account control or economics	high	Multi-cloud footprint and broad partner set	Large enterprise distribution still clusters around a small set of strategic routes
Frontier model access	Anthropic, Google Gemini, OpenAI	Model access for AI agents and enterprise features	medium-high	Model repricing, safety restrictions, or availability shifts raise COGS or slow roadmap delivery	high	Multiple model partners plus Mosaic AI tooling	External model providers still shape cost and feature availability
Embedded enterprise data route	SAP Business Data Cloud	Databricks becomes infrastructure inside another enterprise platform	medium	SAP controls customer context or product roadmaps more than Databricks does	medium-high	Databricks gains reach into large SAP estates	Channel leverage comes with lower direct control of the account
Native cloud substitutes	Microsoft Fabric, AWS EMR, Snowflake	Integrated data, database, Spark, and AI alternatives	high	Customers standardize on incumbent stacks already inside their cloud or data estate	high	Databricks still differentiates on open-source lineage and partner breadth	Incumbents can bundle data, AI, governance, and procurement in ways Databricks cannot fully offset publicly

The risk is not that Databricks lacks partners; it is that several of its most important partners also shape pricing, roadmap velocity, or competitive boundaries.

[CR023, CR024, CR025, CR026, CR027, CR028]

FR003: Dependency map

Databricks’ dependency stack spans cloud, model, and channel partners; breadth helps, but the critical nodes are still concentrated enough to matter.

The map shows critical dependency and competition nodes, not a complete ecosystem graph.

[CR023, CR024, CR026, CR027, CR028, CR029]

7.3 The largest residual risk may be valuation and execution, not immediate distress

Nothing in the public record suggests Databricks is financially weak today. The company says revenue, AI monetization, and free cash flow are all scaling, and multiple independent outlets corroborate a rapid step-up in capital raised and private valuation. That is precisely why valuation risk matters. Databricks moved from a $62 billion valuation in January 2025 to more than $100 billion in August-September 2025 and then to $134 billion by December 2025-February 2026 while also layering in billions of dollars of debt capacity. Meanwhile it is trying to extend beyond the classic lakehouse story into Lakebase, Agent Bricks, AI apps, and deeper model-provider relationships. A premium mark can remain justified when execution is near-perfect, but it becomes fragile if product sprawl sets in, partner economics or competitive bundling turn against the company, or IPO timing slips. Microsoft Fabric, AWS EMR, and Snowflake all show that Databricks is not competing in a vacuum; large incumbents are already pitching integrated data-plus-AI stacks, cloud-scale resilience, and lower-friction procurement inside their own estates.[CR028, CR029, CR030, CR031, CR032, CR033]

People / execution risk register
Role / function	Dependency or gap	Likelihood	Severity	Mitigation	Diligence path
Product and platform leadership	Databricks is expanding simultaneously into Lakebase, Agent Bricks, AI apps, and deeper model integrations	medium	high	Large capital base and visible investor support	Request product-level resource allocation, GA quality metrics, and launch postmortems
Finance and capital markets execution	Repeated mega-rounds plus >$7B debt access create IPO-grade control expectations	medium	high	Positive free-cash-flow claims and strong investor demand	Request audited financial package, debt covenants, and IPO readiness workstreams
Legal / compliance operations	Active AI copyright litigation and AI-regulation obligations require deep model-governance coordination	medium-high	high	DPA, trust center, legal hub, and compliance artifacts	Request governance ownership map, model provenance controls, and reserve process
SRE and support scaling	Enterprise reliability scrutiny rises as Databricks adds AI and operational database ambitions	medium	medium-high	Status page visibility and multi-cloud operating footprint	Request SRE org chart, SLOs, incident review cadence, and reliability staffing plan

Execution risk is elevated by strategic breadth and valuation expectations rather than by obvious public distress signals.

[CR007, CR018, CR032, CR036, CR037, CR038]

7.4 Mitigations are visible, but the kill criteria depend on closing private diligence gaps

Databricks’ strongest public mitigation is that it already looks like a company preparing for much more invasive diligence: trust materials are organized, privacy contracting is explicit, public-sector authorization exists, and the status surface is transparent enough for outside monitoring. Those are meaningful positives. But a risk chapter should convert them into concrete thresholds. The legal thesis breaks if copyright litigation escalates into class certification, injunctive relief, or reserve levels that change unit economics. The operational thesis weakens if outage frequency stays elevated without corresponding public or private evidence of SLO discipline and postmortem quality. The dependency thesis worsens if hyperscaler bundles or model-provider repricing change who controls the account or who captures the gross margin. And the valuation thesis remains fragile until private diligence closes four gaps that public materials do not answer: partner economics, customer concentration, debt covenants, and litigation downside. Without those, Databricks can still be an exceptional company while remaining a difficult late-stage entry price.[CR007, CR010, CR018, CR020, CR021, CR037]

Mitigation and kill criteria table
Risk	Monitorable trigger	Threshold / event	Action implication
Copyright litigation	Court rulings or settlement posture	Class certification, injunctive relief, or reserve needs that are material relative to disclosed free cash flow	Treat legal risk as thesis-breaking until downside is repriced or reserved
AI regulation	AI Act applicability expansion	Databricks-controlled models or workflows fall clearly into provider obligations without disclosed compliance mapping	Assume higher compliance cost, slower EU expansion, and greater contract friction
Reliability	Incident frequency and recovery time	Repeated major outages or sustained multi-hour median resolution over successive quarters	Underwrite slower enterprise expansion and higher support cost
Partner concentration	Bundling or repricing by strategic partners	Hyperscaler or model-provider changes compress gross margin or shift account ownership	Lower terminal margin assumptions and demand clearer partner economics
Capital dependence	Debt and liquidity trajectory	Debt expands again without comparable improvement in disclosed cash generation or IPO readiness	Treat the $134B valuation as stretched rather than merely aggressive
Disclosure quality	Private diligence gaps persist	Concentration, partner economics, SLA, or litigation-reserve questions remain open after diligence	Pause or price for uncertainty instead of underwriting on narrative momentum

These thresholds are intended to be monitorable from public news, court dockets, incident trackers, and management diligence materials, not from intuition.

[CR012, CR015, CR016, CR020, CR021, CR037]

FR001: Risk heatmap

Databricks’ heaviest residual risks cluster where legal exposure, partner concentration, and premium valuation reinforce one another.

This heatmap uses source-backed ordinal scoring to rank residual exposure rather than pretending to know synthetic probabilities.

[CR010, CR012, CR015, CR016, CR020, CR021]
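One way to turn the registers' qualitative likelihood and severity labels into a rankable ordinal score is sketched below. The 1-to-5 label mapping and the likelihood-times-severity product are illustrative assumptions, not the scoring method behind FR001; the rows and labels are copied from the regulatory / legal risk register.

```python
# Assumed 1-5 ordinal mapping for the registers' qualitative labels.
ORDINAL = {"low": 1, "low-medium": 2, "medium": 3, "medium-high": 4, "high": 5}

# (risk, likelihood, severity) as stated in the regulatory / legal risk register.
LEGAL_REGISTER = [
    ("AI copyright litigation tied to Mosaic / DBRX", "medium", "high"),
    ("EU AI Act and AI-governance obligations", "medium-high", "high"),
    ("Public-sector authorization scope", "medium", "medium-high"),
    ("Privacy contracting and transfer regime", "medium", "medium"),
]


def ordinal_score(likelihood: str, severity: str) -> int:
    """Hypothetical cell score: likelihood rank multiplied by severity rank."""
    return ORDINAL[likelihood] * ORDINAL[severity]


# Rank rows from highest to lowest score; sorted() is stable, so ties would
# keep the register's original row order.
ranked = sorted(
    ((ordinal_score(lik, sev), risk) for risk, lik, sev in LEGAL_REGISTER),
    key=lambda pair: pair[0],
    reverse=True,
)

for score, risk in ranked:
    print(f"{score:>2}  {risk}")
```

A multiplicative score is only one possible choice. It places the EU AI Act row above the copyright row because of the higher likelihood label, whereas the register itself orders rows by residual severity, so the two orderings answer slightly different questions.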

FR002: Risk transmission map

Databricks’ key risks flow through a small set of channels: legal burden, outages, and partner concentration all feed margin, growth durability, and valuation support.

The graph is qualitative and source-backed: it shows transmission channels rather than a synthetic risk model.

[CR012, CR015, CR020, CR021, CR024, CR026]

7.5 Exhibits

Chapter 08

08 Valuation

8.1 Investment thesis, anti-thesis, and recommendation

Databricks still looks like one of the strongest late-stage infrastructure assets in private markets. Public evidence supports a rare combination of scale, growth, customer depth, and improving cash generation: the company moved from a $62 billion Series J in December 2024 to a >$100 billion Series K term sheet in August 2025 and then to a $134 billion Series L in December 2025, while disclosed revenue run-rate moved from an expected $3 billion by early 2025 to $4.8 billion in late 2025 and $5.4 billion by early 2026. AI is no longer a side narrative; management and independent coverage both point to AI revenue run-rate exceeding $1.4 billion, with retention above 140% and an expanding cohort of million-dollar customers. The anti-thesis is that almost every bullish datapoint is still management-led run-rate disclosure rather than audited financial reporting. Investors can admire the asset and still conclude that the current entry price already capitalizes much of the good news. The honest recommendation is therefore track, not buy: company quality is high, but public evidence is not yet clean enough to prove that the current price leaves venture-style upside after accounting for denominator risk, private-market structure, and the still-open IPO timeline.[CV001, CV005, CV006, CV007, CV008, CV011]

Thesis / anti-thesis table
Dimension	Thesis	Anti-thesis	What would change the view
Scale and growth	Run-rate rose from roughly $3B expected in early 2025 to $5.4B by early 2026, with growth still >55% to 65%.	The strongest datapoints are still management-led run-rate snapshots rather than audited financial statements.	An audited revenue bridge and quarter-by-quarter disclosure would strengthen conviction.
AI monetization	AI products now look like a real second engine at >$1.4B run-rate and around one-quarter of run-rate by Sacra estimates.	AI mix can still overstate value if economics are pass-through heavy or margin-dilutive.	Gross-margin disclosure by AI product family would prove whether the premium is software quality or just higher workload volume.
Customer depth	>700 to 800 customers above $1M run-rate and >140% retention suggest durable expansion in large accounts.	Public sources still do not disclose concentration, churn, or workload-level net expansion by product.	Customer-cohort disclosure and concentration data would materially improve underwriting.
Comp premium	Databricks deserves a premium to Snowflake, MongoDB, Confluent, and Elastic because it combines data infrastructure with AI control-plane narrative.	At ~25x to 28x run-rate, the premium is already large relative to most public software benchmarks.	A lower entry price or a sustained public AI premium would make the premium easier to underwrite.
Exit optionality	The IPO window could open in 2026 or later, creating a path to public repricing.	Timing remains management-controlled and disclosure-light; staying private longer can also defer price discovery.	A formal IPO timeline or confidential filing would improve exit confidence.

The anti-thesis focuses on price and disclosure, not on whether Databricks is strategically important.

[CV003, CV006, CV007, CV008, CV012, CV016]

8.2 Financing context, denominator caveats, and comparable framing

The most important caveat in this chapter is denominator honesty. Databricks is private, so the headline valuation is a post-money round mark, not a continuously traded enterprise value. The company mostly discloses annualized revenue run-rate, not audited GAAP revenue, and public observers do not know the cap-table seniority, tender discounts, or debt covenants attached to the recent financing stack. That means dividing the $134 billion mark by a $4.8 billion or $5.4 billion run-rate gives a useful ratio, but one that is not directly comparable to a public-company EV/NTM revenue multiple. Even with that limitation, the rough math is informative. Databricks screens around 25x to 28x run-rate depending on which public denominator one uses. That is well above current public data-platform names such as Snowflake, MongoDB, Confluent, and Elastic, and above a scaled workflow benchmark like ServiceNow. It is below the extreme AI scarcity multiple implied by Palantir, which matters because it shows Databricks is priced as a premium hybrid: better than ordinary data infrastructure, but not yet at the very top end of public AI exuberance. The comp set therefore supports a fair-to-stretched conclusion rather than an obviously absurd one. Databricks can justify a premium, but only if growth, AI monetization, and eventual disclosure quality remain unusually strong.[CV006, CV016, CV020, CV024, CV027, CV028]

Recommendation summary table
Dimension	Value	Rationale
Recommendation	track	Company quality is strong, but public evidence does not yet support a clear margin of safety at $134B.
Confidence	medium	The comp signal is directionally clear, but the key Databricks denominator is still a private run-rate rather than audited revenue.
Risk rating	high	Wide dispersion remains because cap-table structure, debt terms, and IPO timing are still not public.
Valuation stance	stretched	Current pricing sits far above most public data-platform comps and only works cleanly if Databricks keeps an AI premium.
Base-case valuation range	$110B-$145B	This range assumes continued growth with some compression toward public-comp discipline.
Decision implication	Wait for lower entry or fuller disclosure	A better price or audited IPO-style disclosure would move the call more than another funding headline.

Recommendation is explicitly price-sensitive and denominator-sensitive.

[CV006, CV016, CV020, CV021, CV043, CV044]

Comparable valuation table
Comparable	Valuation / market cap	Revenue denominator	Implied multiple / status	Relevance	Limitation
Databricks (subject)	$134B private post-money	$4.8B-$5.4B run-rate	~24.8x-27.9x	Shows what new money is paying for scale plus AI premium.	Private post-money valuation and run-rate are not the same as public EV / NTM revenue.
Snowflake	$49.85B market cap	$4.472B FY2026 product revenue	~11.1x	Closest public data-platform peer with meaningful scale and cloud economics.	Public market cap is not enterprise value and product revenue is a cleaner denominator than Databricks discloses.
MongoDB	$21.27B market cap	$2.01B FY2025 total revenue	~10.6x	Useful high-growth developer-data comp with premium software narrative.	Database exposure and product mix differ from Databricks lakehouse plus AI platform.
Confluent	$11.13B market cap	$1.167B 2025 revenue	~9.5x	Helpful real-time data infra comp showing where narrower infrastructure trades.	Streaming focus is narrower than Databricks and should not be used as a direct valuation anchor alone.
Elastic	$5.24B market cap	$1.483B 2025 revenue	~3.5x	Shows the downside of ordinary infrastructure software without a strong current AI premium.	Search / observability mix and weaker growth make it more of a floor than a direct peer.
ServiceNow	$94.84B market cap	$13.278B 2025 revenue	~7.1x	Scaled workflow-software benchmark for what mature, highly profitable enterprise software can trade at.	ServiceNow has superior disclosure and a more mature model, so the comp mostly anchors the upper bound of ordinary software.
Palantir	$350.05B market cap	$4.475B 2025 revenue	~78.2x	Shows what a public AI-scarcity premium can look like when narrative and government/AI demand are both extreme.	Palantir is an outlier; using it as a direct Databricks anchor would overstate fair value.

Denominators are intentionally mixed and should be read directionally: current public market cap over latest annual revenue for public comps, versus private post-money over disclosed run-rate for Databricks.

[CV006, CV016, CV027, CV028, CV029, CV030]
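The implied multiples in the comparable table are simple ratios and can be reproduced directly from the stated numerators and denominators. The sketch below does that and nothing more; it inherits the table's own simplification of mixing market cap, post-money value, and run-rate, and the dictionary and function names are illustrative.

```python
# Valuation and revenue figures copied from the comparable table, in USD billions.
COMPS = {
    "Databricks (on $5.4B run-rate)": (134.0, 5.4),
    "Databricks (on $4.8B run-rate)": (134.0, 4.8),
    "Snowflake": (49.85, 4.472),
    "MongoDB": (21.27, 2.01),
    "Confluent": (11.13, 1.167),
    "Elastic": (5.24, 1.483),
    "ServiceNow": (94.84, 13.278),
    "Palantir": (350.05, 4.475),
}


def implied_multiple(value_bn: float, revenue_bn: float) -> float:
    """Valuation divided by revenue denominator, rounded to one decimal."""
    return round(value_bn / revenue_bn, 1)


multiples = {name: implied_multiple(v, r) for name, (v, r) in COMPS.items()}

for name, multiple in multiples.items():
    print(f"{name:<32} ~{multiple:.1f}x")
```

The output matches the table's implied multiples to one decimal place, which confirms the ratios are internally consistent without making them any more comparable across public and private denominators.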

8.3 Scenario analysis and price sensitivity

The scenario table should be read as a pricing discipline tool, not management guidance. In a bull case, Databricks keeps growth closer to current levels for longer, turns AI mix into a durable margin and platform premium, and reaches an IPO window while still looking more like an AI control plane than a mature data-warehouse vendor. That outcome can justify a valuation above the current mark. In a base case, the company continues to execute well, but public-market comp pressure and the move from run-rate rhetoric to IPO-grade scrutiny compress the premium enough that upside from a $134 billion entry is limited. In a bear case, the company is still good, but good is not enough: growth decelerates, AI monetization looks more like pass-through than software leverage, or public software multiples remain anchored near the current Snowflake / MongoDB / ServiceNow range. The result is that public evidence points to wide dispersion but a base-skewed distribution. That is why the chapter’s posture is positive on the asset and disciplined on price. The risk is not that Databricks is weak; it is that the current price leaves too little room for ordinary execution mistakes or a less forgiving IPO tape.[CV008, CV016, CV017, CV018, CV022, CV024]

Bull / base / bear scenario table
Scenario	Assumptions	Valuation / return logic	Key risks	Probability signal
Bull	Run-rate approaches $8B-$8.5B, AI mix stays premium, and IPO buyers keep rewarding AI-control-plane names above ordinary software comps.	$180B-$220B; roughly 1.3x-1.6x gross from a $134B entry over a 2-3 year hold.	Premium multiple must persist despite public scrutiny and broader software repricing.	Possible, but requires both execution and a supportive IPO tape.
Base	Run-rate reaches about $6.0B-$6.6B, disclosure improves only modestly, and Databricks rerates toward the upper end of public software comps.	$110B-$145B; roughly 0.8x-1.1x gross from a $134B entry.	Current headline price leaves limited room for normal multiple compression.	Most plausible on public evidence because it honors both quality and denominator caveats.
Bear	Growth decelerates toward mature-software levels, AI economics look less differentiated, or public comps stay anchored near 10x-15x.	$55B-$85B; roughly 0.4x-0.6x gross from a $134B entry.	A good company can still produce a weak entry if public discipline arrives before disclosure quality improves.	Material downside if price discovery moves faster than Databricks disclosure.
Probability-weighted posture	Base-skewed because quality is visible but price support is incomplete.	Supports track rather than buy.	Cap-table opacity and IPO timing keep dispersion wide.	Late-stage private pricing needs more than admiration for the asset.

Ranges are committee discussion tools built from current public comps and explicit denominator caveats, not management guidance.

[CV016, CV017, CV018, CV022, CV024, CV026]
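The gross return figures in the scenario table follow from dividing each scenario valuation by the $134B entry mark. The sketch below reproduces them and, as an additional illustration, derives the run-rate multiple each scenario would imply. Pairing the low valuation with the low run-rate and the high valuation with the high run-rate is an assumption made here; the table does not specify a pairing, and the bear case states no run-rate range.

```python
ENTRY_BN = 134.0  # current post-money mark in USD billions, from the report

# Valuation ranges (USD billions) and, where the table states them, run-rate ranges.
SCENARIOS = {
    "Bull": {"value": (180.0, 220.0), "run_rate": (8.0, 8.5)},
    "Base": {"value": (110.0, 145.0), "run_rate": (6.0, 6.6)},
    "Bear": {"value": (55.0, 85.0), "run_rate": None},  # no run-rate range given
}


def gross_multiple(value_range, entry_bn=ENTRY_BN):
    """Gross return on entry at the low and high end of a valuation range."""
    low, high = value_range
    return (round(low / entry_bn, 1), round(high / entry_bn, 1))


def implied_run_rate_multiple(value_range, run_rate_range):
    """Assumed pairing: low value over low run-rate, high value over high run-rate."""
    (v_low, v_high), (r_low, r_high) = value_range, run_rate_range
    return (round(v_low / r_low, 1), round(v_high / r_high, 1))


returns = {name: gross_multiple(s["value"]) for name, s in SCENARIOS.items()}
implied = {
    name: implied_run_rate_multiple(s["value"], s["run_rate"])
    for name, s in SCENARIOS.items()
    if s["run_rate"] is not None
}

for name, (low, high) in returns.items():
    print(f"{name}: {low:.1f}x-{high:.1f}x gross from a ${ENTRY_BN:.0f}B entry")
for name, (low, high) in implied.items():
    print(f"{name}: ~{low:.1f}x-{high:.1f}x implied run-rate multiple")
```

Under the assumed pairing, the base case implies a run-rate multiple below the roughly 25x to 28x at which Databricks screens today, which is one way to see the multiple compression the base case describes.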

Thesis-break and kill triggers table
Trigger	Threshold / event	Transmission to thesis	Action implication
Growth deceleration	Public growth falls materially below the >55%-65% range before disclosure quality improves	Reduces the premium multiple that currently separates Databricks from ordinary data-infrastructure comps	Move from track toward avoid unless price resets sharply lower
AI monetization disappointment	AI run-rate grows but gross margin or attach economics prove weak in diligence	Turns the AI premium into lower-quality pass-through revenue	Cut valuation range and require product-level margin evidence before investing
Public multiple compression	Snowflake / MongoDB / ServiceNow-style revenue multiples contract further	Shrinks the market-clearing range for a future IPO	Do not underwrite today’s private mark on stale public multiples
Cap-table overhang	Preferred structure, tender discounts, or debt covenants materially reduce common-equity value	Headline valuation ceases to represent what new common-equity capital can earn	Rebuild the model on common-equity economics, not post-money headline value
IPO timeline slips	No meaningful IPO preparation or disclosure path appears after another financing cycle	Pushes liquidity farther out and leaves valuation in a private-mark feedback loop	Raise required return or wait for secondary liquidity at a discount

Triggers focus on observable underwriting breaks rather than broad company-quality concerns.

[CV008, CV016, CV017, CV018, CV021, CV024]

FV001: Recommendation logic

Decision chain from Databricks scale, disclosure quality, comp premium, and exit timing to the final recommendation.

[CV016, CV018, CV021, CV043, CV044, CV055]

FV002: Valuation sensitivity

Directional valuation outcomes as Databricks moves between public-comp and AI-premium multiple regimes.

Values are estimated from public comparables plus scenario run-rate assumptions; they are not management guidance or enterprise-value calculations.

[CV043, CV044, CV045, CV046, CV047, CV048]
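A minimal sketch of the kind of grid FV002 describes is valuation as run-rate times multiple across a small set of regimes. The run-rate points below come from the current disclosure and the scenario table, and the multiples from the bear-case anchor (10x-15x) and the current screen (25x-28x); choosing these specific grid points is an assumption, and the outputs are not enterprise-value calculations.

```python
# Run-rate points in USD billions: current disclosure plus base and bull scenario bounds.
RUN_RATES_BN = [5.4, 6.0, 6.6, 8.0, 8.5]
# Multiple regimes: public-comp anchor (10x, 15x) and current private screen (25x, 28x).
MULTIPLES = [10.0, 15.0, 25.0, 28.0]


def sensitivity_grid(run_rates, multiples):
    """Map each (run-rate, multiple) pair to an implied valuation in USD billions."""
    return {(rr, m): round(rr * m, 1) for rr in run_rates for m in multiples}


grid = sensitivity_grid(RUN_RATES_BN, MULTIPLES)

# Print one row per run-rate and one column per multiple.
print("run-rate " + "".join(f"{m:>8.0f}x" for m in MULTIPLES))
for rr in RUN_RATES_BN:
    print(f"${rr:>4.1f}B   " + "".join(f"{grid[(rr, m)]:>9.1f}" for m in MULTIPLES))
```

Read across a row to see how much of the value depends on the multiple regime rather than on run-rate growth: at the current $5.4B run-rate, 25x lands close to today's $134B mark, while the 10x to 15x columns sit well below it even at higher run-rates.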

FV003: Valuation / return range

Low, base, and high valuation envelopes for Databricks at the current late-stage entry point.

These ranges use valuation-to-run-rate proxies because Databricks does not publish the audited revenue and cap-structure detail needed for a full EV model.

[CV052, CV053, CV054, CV058]

8.4 Entry discipline, thesis-break triggers, and final diligence asks

What would move the call? A cleaner cap table, audited revenue and gross-margin disclosure, and evidence that the AI layer carries true software economics rather than just workload growth would all help materially. Price alone could also move the recommendation faster than another funding headline. A common-equity-equivalent entry below the low end of the current base range would create much more attractive asymmetry, especially if the company enters a formal IPO process with better disclosure. Conversely, the recommendation should worsen if Databricks loses the growth and retention profile that currently underwrites its premium or if private financing terms reveal that common-equity holders sit behind more structure than the headline valuation implies. The practical conclusion is that Databricks is investable as a company, but the current price is not yet adequately underwritten. Investors should treat it as a live watchlist name with aggressive diligence, not as a blind late-stage momentum buy. The public record gets you to track with medium confidence and high risk, while the remaining diligence list determines whether the next move is up toward buy or down toward avoid.[CV020, CV021, CV051, CV055, CV056, CV057]

Final diligence asks table
Topic	Missing evidence	Why it matters	Owner / diligence path
Cap table and preferences	Current share count, round prices, liquidation preferences, tender discounts, and employee-liquidity terms	Common-equity upside may differ materially from the headline $134B valuation	Company / counsel / lead investor
Revenue bridge	Quarterly bridge from run-rate disclosure to audited GAAP revenue and deferred revenue	Prevents overpaying off marketing denominators	Finance diligence / auditor
Gross margin and SBC	Audited gross margin by product family and stock-comp burden	Determines whether AI growth is real software leverage or expensive cloud pass-through	Finance diligence / IPO readiness workstream
Debt terms	Debt pricing, covenant package, maturity profile, and any security or cross-default features	Debt capacity affects true common-equity economics and risk	Treasury / lender diligence
Customer concentration and NRR detail	Top-customer exposure, churn cohorts, NRR by AI and core platform products	Premium multiples require proof that the biggest growth engines are durable	Sales ops / customer analytics
IPO readiness and liquidity path	Board-level IPO criteria, banker prep, and any current secondary windows	Entry return depends heavily on timing and the next real price-discovery event	CEO / CFO / bankers

These asks are the minimum package needed to turn a strong company view into a cleaner valuation view.

[CV020, CV021, CV051, CV057, CV058]
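
The cap-table ask can be illustrated with a deliberately simplified waterfall. The Python sketch below assumes a single class of non-participating preferred stock with one aggregate liquidation preference. Real structures typically involve multiple classes, seniority, and other terms that are not public here. Every input is hypothetical and is not a Databricks figure, and the function name `common_proceeds` is illustrative only.

```python
def common_proceeds(exit_value: float, preference: float,
                    preferred_fraction: float) -> float:
    """Proceeds to common holders under one non-participating preferred class.

    Preferred holders take the larger of (a) their liquidation preference,
    capped at the exit value, and (b) their as-converted share of the exit.
    Common holders receive whatever remains.
    """
    if exit_value < 0 or preference < 0:
        raise ValueError("exit_value and preference must be non-negative")
    if not 0.0 <= preferred_fraction <= 1.0:
        raise ValueError("preferred_fraction must be between 0 and 1")
    preference_claim = min(exit_value, preference)
    as_converted = preferred_fraction * exit_value
    preferred_take = max(preference_claim, as_converted)
    return exit_value - preferred_take


if __name__ == "__main__":
    # Hypothetical terms (arbitrary units): preference of 30, and preferred
    # holding 25% of the company on an as-converted basis.
    preference, preferred_fraction = 30.0, 0.25
    for exit_value in (134.0, 60.0, 20.0):
        pro_rata = (1.0 - preferred_fraction) * exit_value
        actual = common_proceeds(exit_value, preference, preferred_fraction)
        print(f"exit {exit_value:6.1f}: pro-rata common {pro_rata:6.1f}, "
              f"after preference {actual:6.1f}")
```

With these hypothetical terms, common holders receive their pro-rata share at the higher exit value, because preferred holders are better off converting. At the lower exit values the preference takes priority and common proceeds fall below pro rata. That gap is what the requested preference terms would let an investor size.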

FV004: Investment KPIs

IC-style dashboard of the Databricks underwriting dimensions that matter most at the current price.

[CV016, CV018, CV021, CV051, CV055, CV057]

Disclaimer

This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.

Evidence index

Claims
ID	Statement	Confidence	Sources
CO001	Databricks was founded in 2013.	High	SO001, SO002
CO002	Databricks says the company was founded by seven researchers from UC Berkeley's AMP Lab.	Medium	SO002
CO003	The official founders page names Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji as Databricks founders.	Medium	SO004
CO004	Databricks describes itself as the data and AI company.	High	SO001, SO002
CO005	Databricks says its Data Intelligence Platform provides a unified foundation for data and governance combined with AI models tuned to an organization's characteristics.	Medium	SO001
CO006	Databricks says it is headquartered in San Francisco.	High	SO001, SO002, SO003
CO007	Databricks lists 160 Spear Street, 15th Floor, San Francisco, California as its contact address.	Medium	SO003
CO008	The current about page says more than 15,000 organizations worldwide rely on Databricks.	Medium	SO001
CO009	The Databricks press kit says the company has more than 20,000 customers globally.	High	SO002, SO008, SO009, SO010
CO010	The Databricks press kit says the company has more than 10,000 employees worldwide.	Medium	SO002
CO011	The Databricks press kit says the company operates 30-plus offices around the globe.	Medium	SO002
CO012	Databricks says 70% of the Fortune 500 use its platform.	Medium	SO002
CO013	Databricks maintains a public board-of-directors page.	Medium	SO006
CO014	Ali Ghodsi is Databricks' co-founder and CEO.	High	SO023, SO014
CO015	UC Berkeley says Ali Ghodsi cofounded Databricks with six UC Berkeley academics who built Apache Spark.	Medium	SO023
CO016	The Spark CACM paper credits Matei Zaharia, Reynold Xin, Patrick Wendell, Ali Ghodsi and other Berkeley-linked authors, anchoring Databricks' founder bench in Apache Spark's creation.	Medium	SO022
CO017	On December 17, 2024 Databricks announced a Series J financing with $10 billion of expected non-dilutive funding and $8.6 billion completed to date.	High	SO007, SO027
CO018	Databricks said the Series J financing valued the company at $62 billion.	High	SO007, SO027, SO014
CO019	Databricks said Thrive Capital led Series J, with Andreessen Horowitz, DST Global, GIC, Insight Partners and WCM Investment Management as co-leads.	High	SO007, SO027
CO020	Databricks said in the Series J announcement that it expected to cross a $3 billion revenue run-rate and achieve positive free cash flow in the quarter ending January 31, 2025.	High	SO007, SO027
CO021	CNBC reported in June 2025 that Databricks expected annualized revenue to reach $3.7 billion by July 2025 with 50% year-over-year growth.	Medium	SO014
CO022	CNBC reported Databricks generated $2.6 billion of revenue in the fiscal year ending January 2025.	Medium	SO014
CO023	CNBC reported that nearly 50 Databricks customers were spending over $10 million annually in the first quarter of fiscal 2026.	Medium	SO014
CO024	CNBC reported Databricks had roughly 8,000 employees in June 2025 and was hiring 3,000 people in 2025.	Medium	SO014
CO025	Databricks announced in September 2025 that it crossed a $4 billion revenue run-rate with growth above 50% year over year.	Medium	SO008
CO026	Databricks said its AI products had exceeded a $1 billion revenue run-rate by September 2025.	Medium	SO008
CO027	Databricks said it was closing a $1 billion Series K at a valuation above $100 billion in September 2025.	High	SO008, SO018
CO028	Databricks said it had achieved positive free cash flow over the prior 12 months by September 2025.	Medium	SO008
CO029	Databricks said more than 650 customers were consuming over $1 million in annual revenue run-rate by September 2025.	Medium	SO008
CO030	TechCrunch reported in August 2025 that Databricks was closing about $1 billion of new funding at a $100 billion valuation, co-led by Thrive and Insight Partners.	Medium	SO018
CO031	TechCrunch reported Databricks had already offered employees two secondary liquidity rounds in 2025.	Medium	SO018
CO032	Databricks announced on December 16, 2025 that it was raising more than $4 billion in a Series L financing at a $134 billion valuation.	High	SO009, SO015
CO033	Databricks said it crossed a $4.8 billion revenue run-rate in Q3 2025 with growth above 55% year over year.	High	SO009, SO015
CO034	Databricks said both its AI products and its Data Warehousing business had surpassed $1 billion revenue run-rate by December 2025.	Medium	SO009
CO035	Databricks said more than 700 customers were consuming over $1 million in annual revenue run-rate by December 2025.	Medium	SO009
CO036	Databricks announced on February 9, 2026 that it crossed a $5.4 billion revenue run-rate with growth above 65% year over year.	Medium	SO010
CO037	Databricks said the February 2026 financing package exceeded $7 billion, including roughly $5 billion of equity at a $134 billion valuation and roughly $2 billion of additional debt capacity.	High	SO010, SO015
CO038	Databricks said more than 800 customers were consuming over $1 million in annual revenue run-rate by February 2026.	Medium	SO010
CO039	SAP said in February 2025 that SAP Business Data Cloud natively embeds Databricks technology for data engineering, machine learning and AI workloads.	Medium	SO019
CO040	Microsoft markets Azure Databricks as an Azure-managed environment for the data and AI lifecycle.	Medium	SO025
CO041	Google Cloud markets Databricks on Google Cloud as a partnership offering for scalable analytics and AI workloads.	Medium	SO026
CO042	Databricks said it completed the MosaicML acquisition on July 19, 2023.	High	SO011, SO016
CO043	TechCrunch reported Databricks agreed to pay $1.3 billion for MosaicML.	Medium	SO016
CO044	Databricks said the MosaicML deal was meant to help enterprises train, customize and deploy generative AI models on their own data.	Medium	SO011
CO045	Databricks said on June 4, 2024 that it agreed to acquire Tabular and updated on June 7, 2024 that the acquisition had completed.	High	SO012, SO017
CO046	Databricks said the Tabular deal brought the creators of Apache Iceberg together with the creators of Delta Lake to push open lakehouse interoperability.	High	SO012, SO017
CO047	Databricks said on May 14, 2025 that it agreed to acquire Neon to deliver serverless Postgres for developers and AI agents.	Medium	SO013
CO048	The Register reported in April 2026 that a federal judge let authors' copyright claims against Databricks continue over DBRX and Mosaic-related training data.	High	SO020, SO021
CO049	Saveri says the plaintiffs filed suit on March 8, 2024 and that on April 21, 2026 the court denied Databricks' motion to dismiss DBRX-related claims.	Medium	SO021
CO050	Insight Partners publicly lists Databricks as a portfolio investment.	Medium	SO024
CO051	TechCrunch reported in August 2025 that Databricks had raised about $20 billion since founding.	Low	SO018
CO052	CNBC described Databricks in January 2026 as one of the highly valued private technology companies primed to go public in 2026.	Medium	SO015
CM001	Databricks says its Data Intelligence Platform is built on a lakehouse and is intended for an entire organization to use data and AI.	Medium	SM001
CM002	Databricks says lakehouse architecture combines data lakes and data warehouses to reduce costs and accelerate data and AI initiatives.	Medium	SM002
CM003	Databricks says the lakehouse offers one architecture for integration, storage, processing, governance, sharing, analytics, and AI.	Medium	SM002
CM004	Databricks says the lakehouse supports structured and unstructured data across major clouds.	Medium	SM002
CM005	Databricks says AI/BI runs directly on governed data in Unity Catalog.	High	SM004, SM005
CM006	Databricks says integrated semantics create one version of truth across BI dashboards, AI agents, and downstream tools.	High	SM004, SM005
CM007	Databricks says AI/BI supports natural-language dashboard creation and conversational analytics for business users.	Medium	SM005
CM008	Databricks says Mosaic AI is for building production AI agents on enterprise data.	Medium	SM003
CM009	Databricks says Mosaic AI provides built-in evaluation for agents using any AI model.	Medium	SM003
CM010	Databricks says Unity Catalog can enforce guardrails, access controls, rate limits, and lineage across AI workflows.	High	SM003, SM004
CM011	Databricks says Unity Catalog applies governance across structured data, unstructured data, business metrics, and AI models.	Medium	SM004
CM012	Databricks says Unity Catalog uses open lakehouse formats and open APIs to reduce lock-in.	High	SM004, SM007
CM013	Databricks public-sector materials list state and local government, federal agencies, and higher education as distinct target segments.	Medium	SM006
CM014	Databricks says public-sector agencies use the platform to track revenue, strengthen compliance, and improve fiscal decision-making.	Medium	SM006
CM015	Databricks says Delta Sharing and Databricks Marketplace let public-sector users share data without copying it and without requiring counterparties to run Databricks.	Medium	SM025
CM016	AWS Marketplace has a Databricks seller profile, giving buyers a standard marketplace procurement route.	Medium	SM009
CM017	Google Cloud positions Databricks as a partner offering with access to Gemini, open-source models, and BigQuery.	Medium	SM010
CM018	Microsoft describes Azure Databricks as a unified, open analytics platform for enterprise-grade data, analytics, and AI at scale.	Medium	SM011
CM019	Microsoft documentation identifies data engineering as a core Azure Databricks use case.	Medium	SM011
CM020	Microsoft documentation identifies machine learning, AI, and data science as core Azure Databricks use cases.	Medium	SM011
CM021	Microsoft documentation identifies data warehousing, analytics, and BI as core Azure Databricks use cases.	Medium	SM011
CM022	Microsoft documentation identifies real-time and streaming analytics as a core Azure Databricks use case.	Medium	SM011
CM023	Grand View Research estimates the global data lakehouse market at USD 11.35 billion in 2024.	Medium	SM015
CM024	Grand View Research expects the data lakehouse market to reach USD 13.94 billion in 2025.	Medium	SM015
CM025	Grand View Research projects the data lakehouse market will reach USD 74.00 billion by 2033 at a 23.2% CAGR.	Medium	SM015
CM026	Grand View Research says North America held 35.2% of 2024 data lakehouse revenue.	Medium	SM015
CM027	Grand View Research says large enterprises held 71.4% of 2024 data lakehouse revenue.	Medium	SM015
CM028	Global Market Insights estimates the data lakehouse market at USD 11.9 billion in 2024.	Medium	SM016
CM029	Global Market Insights expects the data lakehouse market to reach USD 14.2 billion in 2025.	Medium	SM016
CM030	Global Market Insights projects the data lakehouse market will reach USD 105.9 billion by 2034 at a 25% CAGR.	Medium	SM016
CM031	The Business Research Company estimates the data lakehouse market at USD 10.33 billion in 2025.	Medium	SM017
CM032	The Business Research Company expects the data lakehouse market to reach USD 12.58 billion in 2026 at a 21.8% CAGR from 2025.	Medium	SM017
CM033	The Business Research Company projects the data lakehouse market will reach USD 27.28 billion in 2030 at a 21.4% CAGR.	Medium	SM017
CM034	The Business Research Company says data lakehouse deployments span both cloud-based and on-premise models.	Medium	SM017
CM035	The Business Research Company says data lakehouse demand spans both large enterprises and SMEs.	Medium	SM017
CM036	The Business Research Company says key data lakehouse end markets include IT and telecom, BFSI, retail and e-commerce, healthcare and life sciences, manufacturing, and energy and utilities.	Medium	SM017
CM037	Public data lakehouse market estimates conflict materially across publishers and forecast windows, so one generic TAM figure would overstate precision for Databricks.	Medium	SM015, SM016, SM017
CM038	IDC projects worldwide spending on AI-supporting technology will reach USD 337 billion in 2025.	Medium	SM014
CM039	IDC projects AI-supporting technology spend will surpass USD 749 billion by 2028.	Medium	SM014
CM040	IDC says 2025 marks a shift from AI experimentation to reinvention driven by AI agents and renovation in data, infrastructure, and cloud.	Medium	SM014
CM041	Confluent says 89% of IT leaders view data streaming platforms as critical or important to achieving data-related goals.	Medium	SM013
CM042	Confluent says 44% of IT leaders report 5x ROI from data streaming investments.	Medium	SM013
CM043	Confluent says 90% of IT leaders are increasing data streaming platform investment in 2025.	Medium	SM013
CM044	Confluent says 89% of IT leaders think data streaming platforms ease AI adoption by improving data access, quality assurance, and governance.	Medium	SM013
CM045	Deloitte says worker access to AI rose by 50% in 2025.	Medium	SM019
CM046	Deloitte says the number of companies with at least 40% of AI projects in production is set to double in six months.	Medium	SM019
CM047	Deloitte says only one in five companies has a mature governance model for autonomous AI agents.	Medium	SM019
CM048	Deloitte says 42% of companies believe their AI strategy is highly prepared, but they feel less prepared in infrastructure, data, risk, and talent.	Medium	SM019
CM049	Deloitte says legacy data and infrastructure architectures cannot power real-time autonomous AI.	Medium	SM019
CM050	McKinsey says nearly two-thirds of respondents cite security and risk concerns as the top barrier to scaling agentic AI.	Medium	SM018
CM051	McKinsey says 74% of respondents identify inaccuracy and 72% cite cybersecurity as highly relevant AI risks.	Medium	SM018
CM052	McKinsey says nearly 60% of respondents cite knowledge and training gaps as the main barrier to implementing responsible AI practices.	Medium	SM018
CM053	The FinOps Foundation says 63% of respondents now manage AI spending, up from 31% last year.	Medium	SM022
CM054	The FinOps Foundation says implementing governance and policy at scale becomes the top future priority as organizations manage more AI and ML spend.	Medium	SM022
CM055	CIO says companies without modern data infrastructure cannot feed relevant data into AI systems effectively.	Medium	SM023
CM056	CIO says traditional data platforms are often designed only for structured data and can lack governance and quality features.	Medium	SM023
CM057	CIO says preparing data for AI is the number-one reason companies pursue data modernization.	Medium	SM023
CM058	CIO says only 29.1% of companies reported using AI-centric data management platforms such as Vertex or SageMaker.	Medium	SM023
CM059	NIST says the AI RMF is a voluntary framework for incorporating trustworthiness into the design, development, use, and evaluation of AI systems.	Medium	SM020
CM060	NIST says it released a generative AI risk management profile in July 2024 and a critical infrastructure trust profile concept note in April 2026.	Medium	SM020
CM061	The EU AI Act sets risk-based rules for AI developers and deployers.	Medium	SM021
CM062	Under the EU AI Act, prohibitions took effect in February 2025 and GPAI rules in August 2025, with transparency and high-risk obligations beginning in 2026 and 2027.	Medium	SM021
CM063	CDOTrends says 85% of surveyed organizations were already using GenAI in at least one function.	Medium	SM024
CM064	CDOTrends says only 37% of executives and 29% of practitioners thought GenAI applications were production-ready.	Medium	SM024
CM065	CDOTrends says practitioners cited cost, skills, quality, and governance as the main GenAI deployment hurdles.	Medium	SM024
CM066	CDOTrends says only 22% of respondents felt their current IT architecture could effectively support new AI applications.	Medium	SM024
CM067	Databricks' Economist Impact landing page says companies were quick to adopt GenAI but still struggle to productionize and scale.	Low	SM008
CM068	Databricks' Economist Impact landing page says 71% of practitioners believe their GenAI apps are not production-ready.	Low	SM008
CM069	Snowflake says it added 740 net new customers in Q4 fiscal 2026.	Medium	SM012
CM070	Snowflake says 733 customers spent more than USD 1 million on a trailing-12-month basis.	Medium	SM012
CM071	Snowflake says it served 790 Forbes Global 2000 customers as of January 31, 2026.	Medium	SM012
CM072	Snowflake says customers continue to rationalize budgets and prioritize cash-flow management.	Medium	SM012
CM073	Snowflake says it competes in a continually evolving market where enterprises are increasingly adopting AI for core functions.	Medium	SM012
CP001	Databricks says its Data Intelligence Platform is built on lakehouse architecture that combines the best elements of data lakes and data warehouses.	Medium	SP001
CP002	Databricks describes its lakehouse as one architecture for integration, storage, processing, governance, sharing, analytics, and AI across major clouds.	Medium	SP001
CP003	Databricks markets Unity Catalog as unified governance for all data, analytics, and AI assets.	Medium	SP003
CP004	Databricks says Unity Catalog applies discovery, access, quality monitoring, and compliance controls across structured data, unstructured files, ML models, and business metrics.	Medium	SP003
CP005	Databricks pricing is pay-as-you-go with no up-front costs and per-second billing granularity.	Medium	SP002
CP006	Databricks says committed-use contracts can provide discounts and can flex across multiple clouds.	Medium	SP002
CP007	Databricks says AI/BI is built natively into the platform and removes per-seat or per-license BI fees.	Medium	SP004
CP008	Databricks announced an expected $10 billion Series J financing that valued the company at $62 billion.	Medium	SP006
CP009	Databricks said in December 2024 that it expected to cross a $3 billion revenue run rate and become free-cash-flow positive in the quarter ending January 31, 2025.	Medium	SP006
CP010	Databricks said it had more than 500 customers consuming at over $1 million annual revenue run rate.	Medium	SP006
CP011	Databricks said in June 2025 that more than 15,000 organizations, including 70% of the Fortune 500, rely on its platform.	Medium	SP005
CP012	Databricks said in June 2025 that Unity Catalog added full Apache Iceberg support and native Iceberg REST Catalog APIs.	Medium	SP005
CP013	Databricks said Unity Catalog can let external engines including Trino, Snowflake, and Amazon EMR read and write Iceberg managed tables with fine-grained governance.	Medium	SP005
CP014	Snowflake documentation describes the platform as a self-managed cloud service that combines data storage, processing, and analytic solutions.	Medium	SP007
CP015	Snowflake documentation says customers cannot install and run Snowflake locally or on private cloud infrastructure.	Medium	SP007
CP016	Snowflake documentation describes its architecture as separate storage, compute, and cloud-services layers, with virtual warehouses as independent compute clusters.	Medium	SP007, SP008
CP017	Snowflake documentation says total cost is the aggregate of compute, storage, and data-transfer usage.	Medium	SP008
CP018	Snowflake documentation says virtual warehouses are billed per second with a 60-second minimum each time a warehouse starts.	Medium	SP008
CP019	Snowflake documentation gives a Small Standard virtual warehouse example of 2 credits per hour.	Medium	SP008
CP020	Snowflake reported $1.23 billion of product revenue in Q4 fiscal 2026, up 30% year over year.	Medium	SP009
CP021	Snowflake reported 733 customers with trailing 12-month product revenue greater than $1 million as of January 31, 2026.	Medium	SP009
CP022	Snowflake reported 790 Forbes Global 2000 customers and more than 9,100 accounts using Snowflake AI features as of January 31, 2026.	Medium	SP009
CP023	Snowflake says more than 13,300 customers around the world use its AI Data Cloud.	Medium	SP009
CP024	Google Cloud describes BigQuery as a serverless data analytics platform that does not require users to provision individual instances or virtual machines.	Medium	SP010, SP011
CP025	BigQuery pricing defaults to on-demand billing per TiB scanned and generally provides up to 2,000 concurrent shared slots per project.	Medium	SP011
CP026	BigQuery on-demand query pricing lists $6.25 per tebibyte and also offers capacity pricing per slot-hour with BigQuery editions and autoscaling.	Medium	SP011
CP027	Google Cloud documentation says BigQuery-managed Apache Iceberg tables are designed as a foundation for interoperable lakehouse workflows.	Medium	SP012
CP028	Alphabet said Google Cloud revenue increased 30% to $12.0 billion in Q4 2024.	Medium	SP013
CP029	Microsoft Learn describes Fabric as an end-to-end analytics SaaS platform with data engineering, data factory, data science, real-time intelligence, data warehouse, and database workloads over a shared compute and storage model.	Medium	SP016
CP030	Microsoft Learn says Fabric uses OneLake as a centralized logical data lake and OneLake Catalog as a centralized discovery and governance experience.	Medium	SP016
CP031	Microsoft Learn says Fabric includes Copilot capabilities and Purview-backed governance, compliance, and auditing across workloads.	Medium	SP014, SP016
CP032	Microsoft pricing describes Fabric capacity as a shared pool of Capacity Units that can be bought on a pay-as-you-go or reservation basis.	Medium	SP015
CP033	Microsoft pricing says a one- or three-year Fabric reservation can save about 41% versus pay-as-you-go.	Medium	SP015
CP034	Microsoft pricing says Power BI Pro is still required for report publishers and consumers on smaller Fabric capacities, while F64/P1 or larger capacities can waive Pro for consumers.	Medium	SP015
CP035	Microsoft reported $29.9 billion of Intelligent Cloud revenue in fiscal Q4 2025, up 26% year over year.	Medium	SP017
CP036	AWS positions Amazon Redshift as a cloud data warehouse for analytics and agentic AI that can unify data across Redshift, S3 data lakes, and third-party or federated sources.	Medium	SP018
CP037	AWS pricing says Redshift Provisioned starts at $0.543 per hour and Redshift Serverless starts at $1.50 per hour.	Medium	SP019
CP038	AWS pricing says Redshift Serverless bills RPU-hours on a per-second basis with a 60-second minimum and reservations can reduce compute costs by up to 45%.	Medium	SP019
CP039	Amazon reported AWS segment sales of $28.8 billion in Q4 2024 and $107.6 billion in full-year 2024.	Medium	SP020
CP040	Confluent says its managed Flink offering unifies Apache Kafka and Apache Flink so Kafka topics become queryable Flink tables.	Medium	SP021
CP041	Confluent says its fully managed serverless Flink offering uses usage-based pricing calculated in CFUs consumed per minute.	Medium	SP021
CP042	Confluent pricing says serverless Kafka uses autoscaling eCKUs, with the first eCKU free and listed tiers starting at $2.25 with a two-eCKU minimum.	Medium	SP022
CP043	Confluent reported $922.1 million of fiscal-year 2024 subscription revenue and $963.6 million of total revenue.	Medium	SP023
CP044	Apache Spark describes itself as a unified engine for large-scale data analytics.	Medium	SP024
CP045	Trino describes itself as a distributed SQL query engine for big data.	Medium	SP025
CP046	Microsoft Learn says OneLake shortcuts can provide zero-copy access to Amazon S3 and Google Cloud Storage in addition to Azure storage.	Medium	SP016
CP047	Databricks says its lakehouse is built on open source and open standards including Apache Spark, Delta Lake, MLflow, and Delta Sharing.	Medium	SP001
CP048	BigQuery Iceberg documentation describes metadata snapshot export in Apache Iceberg V2 format and Spark-runtime access patterns for Iceberg tables.	Medium	SP012
CP049	AWS says Redshift can query data in open formats on Amazon S3 and open Redshift data to AWS and Apache Iceberg-compatible analytics engines through the SageMaker lakehouse.	Medium	SP018
CP050	Rill argues the competitive center of gravity is shifting from proprietary table formats toward managed Iceberg infrastructure and catalogs, which reduces vendor lock-in.	Medium	SP026
CI001	Databricks says its pricing is pay-as-you-go with no up-front costs and per-second billing granularity.	Medium	SI001
CI002	Microsoft says Azure Databricks bills customers for both provisioned virtual machines and Databricks Units based on the selected VM instance.	Medium	SI002
CI003	Microsoft says customers can save up to 37% over pay-as-you-go DBU prices by pre-purchasing Databricks Commit Units for one-year or three-year terms.	Medium	SI002
CI004	Microsoft says Azure Databricks does not charge DBUs while instances are idle in a pool, but cloud-instance billing still applies.	Medium	SI002
CI005	Microsoft Learn says some Azure Databricks serverless features use DBU multipliers, including a 2X multiplier for Data Quality Monitoring.	Medium	SI003
CI006	Microsoft Learn says SQL Serverless warehouse sizes range from 4 DBUs per hour at 2X-Small to 528 DBUs per hour at 4X-Large.	Medium	SI003
CI007	Microsoft Learn says CPU model serving bills one concurrent request per hour as 1 DBU per hour.	Medium	SI003
CI008	Microsoft Learn says AI Gateway inference tables bill 7.143 DBUs per 1 GB of payload.	Medium	SI003
CI009	Databricks said on September 8, 2025 that it crossed a $4 billion revenue run-rate growing more than 50% year over year.	Medium	SI005
CI010	Databricks said its AI products had crossed a $1 billion revenue run-rate by September 2025.	Medium	SI005
CI011	Databricks said it had achieved positive free cash flow over the prior 12 months by September 2025.	Medium	SI005
CI012	Databricks said its September 2025 Series K raised $1 billion at a valuation above $100 billion.	Medium	SI005
CI013	Databricks said on February 9, 2026 that it crossed a $5.4 billion revenue run-rate with growth above 65% year over year.	High	SI006, SI010, SI011
CI014	Databricks said in February 2026 that its financing package exceeded $7 billion, including about $5 billion of equity at a $134 billion valuation and about $2 billion of additional debt capacity.	High	SI006, SI010, SI011
CI015	Databricks said in February 2026 that it delivered positive free cash flow over the prior 12 months.	High	SI006, SI011
CI016	Databricks said in February 2026 that its AI products crossed a $1.4 billion revenue run-rate.	High	SI006, SI010, SI011, SI012
CI017	Databricks said in February 2026 that more than 800 customers were consuming at over $1 million in annual revenue run-rate.	High	SI006, SI011, SI023
CI018	Databricks said in February 2026 that more than 70 customers were consuming at over $10 million in annual revenue run-rate.	High	SI006, SI011, SI023
CI019	CNBC reported in June 2025 that Databricks expected annualized revenue to reach $3.7 billion by July 2025.	Medium	SI009
CI020	CNBC reported Databricks generated $2.6 billion of revenue in the fiscal year that ended in January 2025.	Medium	SI009
CI021	CNBC reported in June 2025 that Databricks had a net retention rate above 140%.	Medium	SI009, SI023
CI022	CNBC reported that nearly 50 Databricks customers were spending over $10 million annually in the first quarter of fiscal 2026.	Medium	SI009
CI023	CNBC reported in June 2025 that Databricks was close to free-cash-flow positive in the most recent fiscal year.	Medium	SI009
CI024	Databricks said in December 2024 that it was raising $10 billion of expected non-dilutive financing, with $8.6 billion completed to date, at a $62 billion valuation.	Medium	SI004
CI025	Databricks said the December 2024 capital package was intended for AI products, acquisitions, international go-to-market expansion, and employee liquidity and related taxes.	Medium	SI004
CI026	Snowflake pricing describes a managed platform with elastic compute and separate storage charges.	Medium	SI013
CI027	Google says BigQuery on-demand analysis is priced at $6.25 per tebibyte above the first free tebibyte each month.	Medium	SI016
CI028	Google says BigQuery also offers capacity pricing in slots with pay-as-you-go autoscaling and optional one-year and three-year commitments.	Medium	SI016
CI029	AWS says Amazon Redshift Serverless starts at $1.50 per hour and bills RPU-hours on a per-second basis while the warehouse is active.	Medium	SI017
CI030	AWS says Amazon Redshift Serverless reservations can reduce compute costs by up to 45% for a three-year term or up to 24% for a one-year term.	Medium	SI017
CI031	Snowflake said in its FY2026 10-K that revenue was $4.7 billion and remaining performance obligations were about $9.8 billion, with about 46% expected to be recognized within 12 months.	Medium	SI014
CI032	Snowflake said cost of product revenue increased by $248.1 million in FY2026 mainly because of higher third-party cloud infrastructure expenses, including AI inference.	Medium	SI014
CI033	Snowflake said product gross margin was 72% in FY2026.	Medium	SI014
CI034	Confluent said in its 10-K that public-cloud provider pricing significantly influences its costs and gross margins and that higher cloud mix can hurt margins.	Medium	SI018
CI035	Confluent said its shift to a consumption-oriented sales model could create near-term financial volatility and that Confluent Cloud historically had a lower average price than Confluent Platform subscriptions.	Medium	SI018
CI036	Confluent said its Confluent Cloud land motions include free trial and pay-as-you-go entry points with no commitments, and some customers resist large long-term commitments.	Medium	SI018
CI037	Databricks said total cost of ownership on the platform has two core components: direct platform costs and underlying cloud infrastructure costs.	Medium	SI007
CI038	Databricks said FinOps and platform teams need unified views because Databricks and cloud cost data are fragmented across accounts, clusters, tags, and business units.	Medium	SI007
CI039	CloudForecast wrote in 2026 that Databricks pricing is confusing because DBUs, compute types, tiers, and separate infrastructure costs all contribute to the customer bill.	Medium	SI021
CI040	Mammoth wrote in 2026 that published Databricks pricing ranges from about $0.07 to $0.65+ per DBU plus separate cloud infrastructure charges.	Low	SI022
CI041	Mammoth wrote in 2026 that Databricks billing is pay-per-second with no upfront costs, but total spend includes DBUs plus cloud infrastructure and storage.	Medium	SI022
CI042	Revenue Brew reported in February 2026 that Databricks had reached a $5.4 billion revenue run-rate.	Medium	SI012
CI043	Revenue Brew reported in February 2026 that Databricks said AI products generated $1.4 billion in annualized revenue.	Medium	SI012
CI044	Sacra estimated Databricks gross margins were about 80% as of June 2024, down from about 85% a year earlier.	Low	SI020
CI045	Sacra said Databricks average contract value stood at $208,696 as of June 2024.	Low	SI020
CI046	Sacra says Databricks uses a B2B, consumption-based SaaS model where customers pay for compute, storage, and data processing usage rather than fixed licenses or seat counts.	Low	SI020
CI047	Sacra says Databricks cost inputs include cloud infrastructure, data processing, and compute resources from AWS, Azure, and Google Cloud.	Low	SI020
CI048	Revefi wrote in 2026 that Databricks consumption-based pricing makes spend harder to predict as Genie and Mosaic AI workloads create variable and spiky compute demand.	Medium	SI023
CI049	Databricks pricing and product pages present monetized surfaces that include data engineering, data warehousing, AI, business intelligence, application development, database, and security.	Medium	SI001
CI050	Databricks says AI/BI removes per-seat and per-license BI fees by embedding BI and conversational analytics directly into the platform.	Medium	SI008
CE001	Databricks presents itself as a data and AI platform for enterprises rather than a single analytics SKU.	Medium	SE001
CE002	Databricks' product surface spans lakehouse architecture, governance, serverless SQL analytics, AI governance, and operational database products.	Medium	SE002
CE003	Databricks markets its lakehouse platform across AWS, Azure, and GCP.	Medium	SE002
CE004	Unity Catalog is positioned as unified and open governance for data and AI.	Medium	SE003
CE005	Unity Catalog claims to enforce discovery, access, quality monitoring, and compliance controls across structured and unstructured data, ML models, and business metrics in any cloud.	Medium	SE003
CE006	Unity Catalog advertises support for open formats including Delta, Apache Iceberg, Hudi, and Parquet.	Medium	SE003
CE007	Unity Catalog says it provides a unified catalog for structured data, unstructured data, business metrics, and AI models.	Medium	SE003
CE008	Unity Catalog offers row- and column-level access policies based on attributes and tags.	Medium	SE003
CE009	Unity Catalog provides end-to-end automated column-level lineage for data and AI assets.	Medium	SE003, SE011
CE010	Unity Catalog federates and governs external systems including MySQL, PostgreSQL, Salesforce, Redshift, Snowflake, BigQuery, and Hive Metastore without requiring migration.	Medium	SE003
CE011	AI/BI is described as AI-powered business intelligence that is natively integrated into the Databricks platform.	Medium	SE004
CE012	AI/BI says dashboards, Genie, Databricks SQL, Databricks One, Genie Code, and Unity Catalog Business Semantics are part of the BI product family.	Medium	SE004
CE013	AI/BI says analytics run directly on governed data in Unity Catalog so metrics, lineage, and permissions stay aligned.	Medium	SE004
CE014	AI/BI claims there are no per-seat or per-license BI fees for users exploring data and dashboards.	Low	SE004
CE015	Databricks architecture guidance describes platform fundamentals in terms of control plane, compute plane, and storage components.	Medium	SE009
CE016	Azure Databricks accounts can manage multiple workspaces and multiple Unity Catalog metastores.	Medium	SE019
CE017	Databricks workspaces are the collaboration environment for ingestion, interactive exploration, scheduled jobs, and ML training.	Medium	SE019
CE018	Azure Databricks operates a control plane that Databricks manages outside the customer cloud account, and the web application lives in that control plane.	Medium	SE019, SE021
CE019	Azure Databricks uses different compute planes for serverless and classic compute: serverless runs in the Databricks account, while classic compute runs in the customer Azure subscription.	Medium	SE019
CE020	Databricks recommends a medallion architecture in which bronze, silver, and gold layers progressively improve data quality and structure.	Medium	SE010
CE021	Databricks' medallion example ingests raw data from cloud storage, Kafka, and Salesforce into bronze before validation in silver and enrichment in gold.	Medium	SE010
CE022	Mosaic AI Model Serving is Databricks' interface for deploying, governing, and querying AI and ML models for real-time serving and batch inference.	Medium	SE012
CE023	Mosaic AI Model Serving exposes served models as REST APIs and automatically scales with serverless compute for availability and latency management.	Medium	SE012
CE024	Databricks says external models from providers such as OpenAI and Anthropic can be centrally governed through model-serving endpoints.	Medium	SE012
CE025	Serverless SQL warehouses do not have public IP addresses.	Medium	SE013
CE026	Serverless SQL requires Premium-plan-or-higher workspaces and separate acceptance of serverless terms of service.	Medium	SE013
CE027	The Databricks release-notes index was updated on 2026-05-04 and includes feature-specific notes for Databricks SQL, Lakeflow Spark Declarative Pipelines, and serverless compute.	Medium	SE014
CE028	Lakebase is marketed as an operational Postgres database for AI agents and applications that is integrated with the lakehouse.	Medium	SE005
CE029	Lakebase advertises decoupled compute and storage, point-in-time recovery, scale-to-zero autoscaling, and database branching.	Medium	SE005
CE030	Lakebase says operational data can stay connected to the lakehouse through Unity Catalog governance and one-click data sync.	Medium	SE005
CE031	Databricks Trust says security capabilities include encryption, network controls, auditing, identity integration, access controls, and data governance.	Medium	SE006
CE032	Databricks publicly lists FedRAMP, GDPR, HIPAA, PCI-DSS, ISO 27001/27017/27018/27701, and SOC as supported compliance frameworks or attestations.	Medium	SE007
CE033	Databricks says its due diligence package includes ISO certificates and an annual penetration-test confirmation letter, while SOC 3 is public and SOC reports refresh in June, August, and December.	Medium	SE008
CE034	Google Cloud markets Databricks on Google Cloud as scalable, secure, and cost-effective, with access to Gemini, BigQuery, open-source tools, and multicloud patterns.	Medium	SE028
CE035	Google Cloud says Databricks announced Unity Catalog support for reading and writing managed Apache Iceberg tables across catalogs.	Medium	SE027, SE022
CE036	NVIDIA documents RAPIDS acceleration on Databricks for pandas, Spark, and Dask, including GPU plugins for driver-and-worker Spark clusters.	Medium	SE026
CE037	The Databricks CLI GitHub repository showed 332 stars, 165 forks, and 263 releases, with v0.299.0 dated 2026-04-30.	Medium	SE016
CE038	The Databricks SDK for Python PyPI package showed version 0.106.0 released on 2026-04-30 and requiring Python 3.10 or newer.	Medium	SE018
CE039	The Databricks SDK for Python repository says Runtime 13.1 includes a bundled SDK and that authentication supports Databricks-native, Azure-native, and GCP-native flows.	Medium	SE017
CE040	Databricks launched LakeFlow in June 2024 as a built-in data-engineering product for ingestion, transformation, and orchestration across databases and SaaS sources.	Medium	SE023
CE041	theCUBE Research reported that Databricks' 2025 summit centered on open lakehouse architecture, unified governance, and AI democratization.	Medium	SE022
CE042	theCUBE Research said 2025 summit announcements included business-semantics direction for Unity Catalog plus GenAI, agent, Iceberg, and Lakebase updates.	Medium	SE022
CE043	VentureBeat reported in February 2026 that Lakebase was generally available, built on technology from Neon and Mooncake, and designed to make operational writes queryable by analytics engines without ETL.	Medium	SE025
CE044	TechCrunch reported in March 2026 that Databricks launched Lakewatch, a new security product that performs SIEM-style detection and investigation with AI agents.	Medium	SE024
CE045	On 2026-05-05 the Databricks AWS status page showed an active incident with compute partially disrupted in multiple regions while AI/BI and Databricks Apps remained operational.	Medium	SE015
CE046	ServiceAlert.ai showed 100% uptime over the prior 88 days for Databricks but no detailed incident data, limiting independent verification of outage severity or root causes.	Low	SE029
CE047	Microsoft's production-planning guidance recommends security, governance, and multi-workspace design before production Azure Databricks deployment and suggests serverless workspaces for initial exploration.	Medium	SE020
CE048	lakeFS describes Databricks as a hybrid PaaS in which a single-tenant data plane runs in the customer cloud account while a multi-tenant control plane remains with Databricks.	Low	SE021
CU001	Databricks said in February 2026 that more than 20,000 organizations worldwide rely on the platform and that 70% of the Fortune 500 are customers.	Medium	SU004
CU002	Databricks said in September 2025 that more than 20,000 organizations worldwide relied on the platform, including Block, Comcast, Condé Nast, Rivian, and Shell.	Medium	SU003
CU003	CNBC reported in June 2025 that Databricks had more than 15,000 customers.	Medium	SU005
CU004	Databricks said in September 2025 that it had 650-plus customers each consuming at an annual revenue run-rate above $1 million.	Medium	SU003
CU005	Databricks said in February 2026 that it had more than 800 customers each consuming at an annual revenue run-rate above $1 million.	Medium	SU004
CU006	Databricks said in February 2026 that it had more than 70 customers each consuming at an annual revenue run-rate above $10 million.	Medium	SU004
CU007	CNBC reported in June 2025 that nearly 50 Databricks customers were spending more than $10 million annually in the first quarter of the new fiscal year.	Medium	SU005
CU008	Databricks said in September 2025 that its net retention rate was sustaining above 140 percent.	Medium	SU003
CU009	Databricks said in February 2026 that its net retention rate remained above 140 percent.	Medium	SU004
CU010	CRN reported in February 2026 that Databricks sustained net retention greater than 140 percent.	Medium	SU006
CU011	The Databricks July 2025 summit recap said hundreds of customers, including 7-Eleven, Fox Sports, and Rivian, presented active use cases at Data + AI Summit 2025.	Medium	SU002
CU012	Databricks said 7-Eleven uses the platform to run a multipurpose agentic marketing assistant across more than 13,000 stores.	Medium	SU002
CU013	Databricks said 7-Eleven used assessments and workflows to simplify a Unity Catalog migration.	Medium	SU002
CU014	A Databricks Events YouTube session exists for 7-Eleven on using Mosaic AI to create a multi-purpose agentic marketing assistant.	Low	SU019
CU015	Databricks said FOX Sports built Cleatus AI to answer fan questions in natural language using live scores, stats, and commentary.	Medium	SU002
CU016	Databricks said FOX Sports achieved a 2x higher query-success rate for fans using its AI-powered search experience.	Medium	SU002, SU016
CU017	The FOX Sports Databricks customer story says AI-powered search more than doubled its success rate while delivering more personalized and timely insights to fans.	Medium	SU016
CU018	A Databricks Events YouTube page exists for a FOX Sports session on reimagining the fan experience with the Databricks Data Intelligence Platform.	Low	SU021
CU019	Databricks said Mastercard uses the platform to deploy AI responsibly across teams, platforms, and partners while automating onboarding support with a GenAI assistant.	Medium	SU002
CU020	Databricks said Mastercard used Delta Lake to cut query time by 80 percent and storage by 70 percent, and used Workflows to reduce pipeline processing from months to days.	Medium	SU002
CU021	Mastercard said its new product onboarding assistant was built in collaboration with Databricks on the Data Intelligence Platform.	Medium	SU011
CU022	Mastercard said the onboarding assistant uses retrieval-augmented generation and a human-in-the-loop feedback loop.	Medium	SU011
CU023	Mastercard said it uses machine-learning models to analyze more than 143 billion transactions per year.	Medium	SU011
CU024	In a September 2025 Mastercard story, Arsalan Tavakoli said the Mastercard product onboarding assistant significantly sped up onboarding and that churn in the process had come down.	Medium	SU012
CU025	The current Databricks customer story page for Mastercard frames the account as a responsible-AI and governance deployment at global payments scale.	Low	SU017
CU026	Databricks said Insulet used the platform to achieve 12x faster real-time data processing, 83 percent fewer SQL queries, and 97 percent lower total cost of ownership.	Medium	SU002
CU027	The Insulet Databricks customer story says adopting Databricks delivered 12x faster data processing and 97 percent lower total cost of ownership.	Medium	SU018
CU028	The Insulet Databricks customer story says Lakeflow Connect automated ingestion from enterprise applications including Salesforce and Workday.	Medium	SU018
CU029	Microsoft said AT&T achieved a five-year ROI of 300 percent after migrating to Azure Databricks.	Medium	SU007
CU030	Microsoft said AT&T reduced more than 80 schemas and accelerated its data-science cycles by about three times after migrating to Azure Databricks.	Medium	SU007
CU031	Microsoft said AT&T now supports nearly 90,000 internal customers on one data architecture and can spin up new computing environments in hours rather than three to four months.	Medium	SU007
CU032	Databricks said AT&T and Databricks built AutoClassify, an end-to-end system for automatic multi-head binary classification from unlabeled text.	Medium	SU002
CU033	A Databricks Events YouTube page exists for an AT&T AutoClassify customer session.	Low	SU020
CU034	PeerSpot listed 93 Databricks reviews on the reviewed page.	Medium	SU008
CU035	A PeerSpot reviewer said Databricks had become very expensive for their team and was less forgiving than Snowflake when implemented inefficiently.	Low	SU008
CU036	PeerSpot summarized Databricks as frequently expensive for enterprise buyers because costs vary with usage, compute time, and data processed.	Low	SU008
CU037	The archived Capterra Databricks page showed 17 reviews.	Low	SU009
CU038	Capterra review text said Databricks can feel overwhelming for new users and that initial setup and connections require an experienced professional.	Low	SU009
CU039	A Capterra review said Databricks pricing was fairly expensive and connecting Azure Data Lake required workarounds.	Low	SU009
CU040	Databricks said in a Gartner Peer Insights recap that AI/BI earned a 4.8 out of 5 star rating and 94 percent willingness to recommend from 167 verified customer reviews as of September 30, 2025.	Medium	SU024
CU041	FeaturedCustomers said Databricks had 631 reviews, 457 case studies, and 128 customer videos on its platform.	Low	SU010
CU042	Microsoft describes Azure Databricks as a Spark-based data and AI platform optimized for Microsoft Azure that works with Power BI, Azure AI Foundry, Power Platform, and other Microsoft services.	Medium	SU013
CU043	Google Cloud says Databricks on Google Cloud is available on Marketplace and offers enterprise capabilities for AI-driven outcomes.	Medium	SU014
CU044	SAP said SAP Business Data Cloud includes SAP Databricks as a first-party data service and brings the power of Databricks directly into SAP Business Data Cloud.	Medium	SU015
CU045	PR Newswire reported that Databricks received FedRAMP High agency authority to operate on AWS GovCloud in April 2024 and that Azure Databricks already held FedRAMP High and IL5 authorizations.	Medium	SU022
CU046	The current Databricks AI-customer page still highlights Mastercard and Rivian video references on the customer surface.	Low	SU025
CR001	Databricks’ Privacy Notice says it applies to websites, applications, platform services, events, sales, and marketing activities.	Medium	SR001
CR002	Databricks’ Privacy Notice says California residents have additional rights under the CCPA.	Medium	SR001
CR003	Databricks says it uses large language models and other AI tools for certain uses of collected information in accordance with applicable law.	Medium	SR001
CR004	Databricks says it uses European Commission Standard Contractual Clauses, supplementary measures, and a DPA with SCCs for customer transfers.	High	SR001, SR002
CR005	Databricks says it is certified to the EU-U.S., UK, and Swiss Data Privacy Frameworks.	Medium	SR001
CR006	Databricks offers a downloadable, electronically signable Data Processing Addendum for customers that require one.	Medium	SR002
CR007	Databricks says its due-diligence package includes ISO certifications, an annual pen-test confirmation letter, an Enterprise Security Guide, and a SOC 2 Type II report.	Medium	SR003
CR008	Databricks documentation says security and compliance are a shared responsibility between Databricks, the customer, and the cloud provider.	Medium	SR004
CR009	Databricks documentation says the Enhanced Security and Compliance add-on includes controls for FedRAMP High, FedRAMP Moderate, and HIPAA.	High	SR030, SR005
CR010	The FedRAMP Marketplace lists Databricks on Azure Commercial as FedRAMP Certified, Class D (High), Rev5, as of 2026-01-16.	Medium	SR007
CR011	The EUR-Lex AI Act summary says the regulation applies from 2 August 2026, while some governance, penalty, and general-purpose AI model obligations start on 2 August 2025.	Medium	SR008
CR012	The EUR-Lex AI Act summary says providers of general-purpose AI models face documentation, downstream-information, training-data disclosure, and possible additional risk-management and cybersecurity duties.	Medium	SR008
CR013	CourtListener shows In Re Mosaic LLM Litigation is a live federal copyright case involving Databricks, with a last known filing on 2026-04-29.	Medium	SR009
CR014	Internet Cases reports that the court allowed plaintiffs to amend the complaint to add direct copyright infringement claims against Databricks tied to DBRX.	Medium	SR010
CR015	The Register reported on 2026-04-29 that Judge Breyer denied Databricks’ motion to dismiss and allowed authors’ claims to continue.	Medium	SR011
CR016	The Register says plaintiffs allege DBRX inherited risk from Mosaic’s MPT lineage through RedPajama and Books3, with potential statutory damages up to $150,000 per work if willful infringement is proven.	Medium	SR011
CR017	CFM Lawyers says a proposed class action was filed in British Columbia and Quebec on 2025-07-24 against Databricks and MosaicML over Books3 and The Pile training-data allegations.	Medium	SR012
CR018	Databricks operates a public status page that provides high-level availability information across Databricks services and regions.	Medium	SR006
CR019	Databricks and Azure Databricks documentation says Delta Lake can be used to manage GDPR and CCPA compliance workflows.	High	SR005, SR030
CR020	IsDown says Azure Databricks had 20 incidents in the last 90 days, including 1 major outage and 19 minor incidents, with a median duration of 1 hour 33 minutes.	Medium	SR027
CR021	IsDown says it has documented 173 Azure Databricks outages and incidents since January 2023, averaging 4.4 per month, with typical resolution time of 177 minutes.	Medium	SR027
CR022	IsDown says it monitors the official Azure Databricks status page across 11 components.	Medium	SR027
CR023	Databricks said in its Series K announcement that it had launched or expanded partnerships with Microsoft, Google Cloud, Anthropic, SAP, and Palantir in the prior two quarters.	Medium	SR014
CR024	Databricks said its Google Cloud partnership makes Gemini models native Databricks products billable through Databricks contracts.	Medium	SR021
CR025	Databricks said the Google Cloud partnership lets customers use Gemini on enterprise data under Unity Catalog governance without data replication.	Medium	SR021
CR026	Databricks and Anthropic announced a strategic five-year partnership to offer Claude natively through Databricks across AWS, Azure, and Google Cloud Platform.	Medium	SR023
CR027	SAP said SAP Business Data Cloud natively embeds Databricks for data engineering, machine learning, and AI workloads.	Medium	SR022
CR028	Microsoft Fabric markets a complete data platform with AI-powered tools, a unified lake, autonomous databases, and shared resilience, security, governance, and compliance.	Medium	SR025
CR029	Amazon EMR markets serverless Spark, Trino, and Flink analytics plus a unified data-and-AI environment inside AWS with cost and performance claims.	Medium	SR026
CR030	Snowflake’s FY2026 10-K says its AI Data Cloud runs across three major public clouds and 53 regional deployments and includes cross-cloud business-continuity capabilities.	Medium	SR024
CR031	TechCrunch reported that Databricks closed $10 billion of Series J equity financing at a $62 billion valuation in January 2025 and also added $5.25 billion of debt financing.	Medium	SR013
CR032	TechCrunch reported that Databricks planned to use its January 2025 financing for new AI products, global go-to-market expansion, acquisitions, and employee liquidity.	Medium	SR013
CR033	Databricks said its August 2025 Series K term sheet valued the company at more than $100 billion.	Medium	SR014
CR034	CRN reported that Databricks closed a $1 billion Series K round at a valuation above $100 billion in September 2025.	Medium	SR015
CR035	TechCrunch reported that Databricks raised more than $4 billion at a $134 billion valuation in December 2025, up 34% from $100 billion three months earlier.	High	SR016, SR020
CR036	TechCrunch reported that Databricks was investing heavily in Lakebase and Agent Bricks and had struck model-access deals worth hundreds of millions with Anthropic and OpenAI.	Medium	SR016
CR037	CNBC reported in January 2026 that Databricks landed $1.8 billion of fresh debt and had access to more than $7 billion of debt.	Medium	SR017
CR038	CNBC reported in January 2026 that Databricks’ December round implied a $134 billion valuation alongside $4.8 billion of run-rate revenue growing more than 55% year over year and positive free cash flow.	High	SR017, SR016
CR039	CNBC reported in February 2026 that Databricks completed $5 billion of funding plus $2 billion of new debt capacity at a $134 billion valuation.	High	SR018, SR019
CR040	CNBC reported in February 2026 that Databricks’ annualized revenue exceeded $5.4 billion for the January quarter, up 65% year over year, while delivering positive free cash flow over the prior year.	High	SR018, SR019
CR041	CRN reported in February 2026 that Databricks’ AI-products revenue run rate exceeded $1.4 billion and that the company had 800 customers above $1 million and 70 customers above $10 million of annual run-rate.	Medium	SR019
CR042	Databricks’ legal center is the company’s public hub for legal documents, privacy FAQs, service terms, and compliance resources.	Medium	SR028
CR043	Databricks’ trust and privacy center positions privacy, trust, and subprocessor-related materials as a public diligence surface for customers.	Medium	SR029
CR044	Because advanced compliance controls sit in a named add-on and public-sector authorization is explicitly tied to Databricks on Azure Commercial, Databricks’ public compliance coverage is strong but not obviously uniform across all clouds and tiers.	Medium	SR007, SR009, SR030
CR045	Databricks’ AI roadmap now depends on external model partners, hyperscalers, and embedded channels, so partner concentration can affect product availability, economics, and account control even while it speeds distribution.	Medium	SR021, SR022, SR023, SR025, SR026
CR046	The jump from a $62 billion valuation in January 2025 to more than $100 billion in August-September 2025 and $134 billion by December 2025-February 2026 leaves less room for execution misses or delayed IPO timing.	Medium	SR013, SR014, SR015, SR016, SR018
CR047	Databricks’ simultaneous pushes into Lakebase, Agent Bricks, AI apps, and strategic partner integrations increase execution complexity relative to a narrower lakehouse product story.	Medium	SR014, SR016, SR023
CR048	Databricks’ public documentation surface is stronger than many private AI infrastructure peers, but it reduces diligence friction more than it eliminates litigation, outage, dependency, or valuation risk.	Medium	SR001, SR002, SR003, SR006, SR028, SR029
CV001	Databricks announced a Series J financing on 2024-12-17 at a $62 billion valuation.	Medium	SV001
CV002	Databricks said the Series J package targeted $10 billion of expected non-dilutive financing and had completed $8.6 billion to date.	Medium	SV001
CV003	Databricks said in December 2024 that it expected to cross a $3 billion revenue run-rate in the quarter ending 2025-01-31.	Medium	SV001
CV004	Databricks said in December 2024 that the quarter ending 2025-01-31 would mark its first positive free-cash-flow quarter.	Medium	SV001
CV005	Databricks announced on 2025-08-19 that it had signed a Series K term sheet valuing the company at more than $100 billion.	Medium	SV002
CV006	Databricks announced on 2025-12-16 that it was raising more than $4 billion in a Series L round at a $134 billion valuation.	Medium	SV003, SV004, SV005
CV007	Databricks said it crossed a $4.8 billion revenue run-rate in Q3 2025.	Medium	SV003, SV007
CV008	Databricks said Q3 2025 revenue was growing by more than 55% year over year.	Medium	SV003, SV007
CV009	Databricks said its AI products reached more than a $1 billion revenue run-rate by Q3 2025.	Medium	SV003, SV007
CV010	Databricks said its Data Warehousing business had reached more than a $1 billion revenue run-rate by Q3 2025.	Medium	SV003
CV011	Databricks said it had delivered positive free cash flow over the previous 12 months as of the Series L announcement.	Medium	SV003, SV006
CV012	Databricks said net retention remained above 140% at the time of the Series L announcement.	Medium	SV003, SV006
CV013	Databricks said more than 700 customers were each consuming at an annual revenue run-rate above $1 million by December 2025.	Medium	SV003
CV014	CNBC reported that Databricks’ $134 billion Series L valuation was a 34% jump from the valuation implied by the August 2025 financing.	Medium	SV004
CV015	TechCrunch described the December 2025 Series L as Databricks’ third major venture fundraise in less than a year.	Medium	SV005
CV016	CRN reported that Databricks had surpassed a $5.4 billion annual revenue run-rate by the quarter ended 2026-01-31.	Medium	SV006, SV007
CV017	CRN reported that Databricks grew 65% year over year in the quarter ended 2026-01-31.	Medium	SV006, SV007
CV018	CRN reported that Databricks’ AI products exceeded a $1.4 billion revenue run-rate in the quarter ended 2026-01-31.	Medium	SV006, SV007
CV019	CRN reported that Databricks had 800 customers above a $1 million annual run-rate and 70 customers above a $10 million annual run-rate by February 2026.	Medium	SV006, SV007
CV020	CRN reported that the latest Databricks financing stack exceeded $7 billion, including about $5 billion of equity and about $2 billion of additional debt capacity.	Medium	SV006, SV007
CV021	CRN noted that Databricks still does not disclose detailed financial statements publicly despite reporting run-rate and growth snapshots.	Medium	SV006
CV022	Sacra estimated that AI products represented about 26% of Databricks’ January 2026 annualized revenue run-rate.	Medium	SV007
CV023	Sacra said Databricks was reporting 80% gross margins in June 2024, down from 85% a year earlier.	Low	SV007
CV024	Forbes wrote that Databricks was trading at roughly 25x forward revenue when it carried a $100 billion valuation against a $4 billion annual run-rate in October 2025.	Medium	SV008
CV025	Forbes wrote in October 2025 that Snowflake was trading at roughly 18x forward revenue on about $79 billion of market capitalization and expected fiscal-2026 revenue of $4.395 billion.	Low	SV008
CV026	Forbes argued that steep software valuation multiples came under pressure when public-market growth decelerated, using Snowflake as a cautionary example.	Medium	SV008
CV027	CompaniesMarketCap listed Snowflake’s market capitalization at $49.85 billion as of May 2026.	Medium	SV011
CV028	Snowflake reported $4.4723 billion of fiscal-2026 product revenue.	Medium	SV010, SV013
CV029	CompaniesMarketCap listed MongoDB’s market capitalization at $21.27 billion as of May 2026.	Medium	SV015
CV030	MongoDB reported $2.01 billion of fiscal-2025 total revenue.	Medium	SV014, SV017
CV031	CompaniesMarketCap listed Confluent’s market capitalization at $11.13 billion as of May 2026.	Medium	SV018
CV032	Macrotrends listed Confluent’s 2025 annual revenue at $1.167 billion.	Medium	SV019
CV033	CompaniesMarketCap listed Elastic’s market capitalization at $5.24 billion as of May 2026.	Medium	SV020
CV034	Macrotrends listed Elastic’s 2025 annual revenue at $1.483 billion.	Medium	SV021, SV022
CV035	CompaniesMarketCap listed Cisco’s market capitalization at $365.87 billion as of May 2026.	Medium	SV024
CV036	Macrotrends listed Cisco’s 2025 annual revenue at $56.654 billion.	Medium	SV025
CV037	Cisco’s September 2023 merger filing said the Splunk acquisition would pay $157.00 per share in cash and value the acquired equity at about $28 billion.	Medium	SV023
CV038	CompaniesMarketCap listed Palantir’s market capitalization at $350.05 billion as of May 2026.	Medium	SV026
CV039	Macrotrends listed Palantir’s 2025 annual revenue at $4.475 billion.	Medium	SV027
CV040	CompaniesMarketCap listed ServiceNow’s market capitalization at $94.84 billion as of May 2026.	Medium	SV028
CV041	Macrotrends listed ServiceNow’s 2025 annual revenue at $13.278 billion.	Medium	SV029
CV042	ServiceNow reported $3.671 billion of Q1 2026 subscription revenue and $27.7 billion of remaining performance obligations.	Medium	SV030
CV043	Using Databricks’ disclosed $134 billion valuation and $4.8 billion run-rate implies roughly a 27.9x valuation-to-run-rate multiple.	Medium	SV003
CV044	Using Databricks’ disclosed $134 billion valuation and $5.4 billion run-rate implies roughly a 24.8x valuation-to-run-rate multiple.	Medium	SV006, SV007
CV045	Dividing Snowflake’s May 2026 market capitalization by its fiscal-2026 product revenue implies Snowflake trades around 11.1x revenue.	Medium	SV011, SV013
CV046	Using current May 2026 market capitalization and latest annual revenue implies MongoDB trades around 10.6x revenue.	Medium	SV015, SV017
CV047	Using current May 2026 market capitalization and latest annual revenue implies Confluent trades around 9.5x revenue.	Medium	SV018, SV019
CV048	Using current May 2026 market capitalization and latest annual revenue implies Elastic trades around 3.5x revenue.	Medium	SV020, SV021
CV049	Using current May 2026 market capitalization and latest annual revenue implies Palantir trades around 78.2x revenue.	Medium	SV026, SV027
CV050	Using current May 2026 market capitalization and latest annual revenue implies ServiceNow trades around 7.1x revenue.	Medium	SV028, SV029
CV051	Public sources indicate that Databricks’ IPO timing remained discretionary into early 2026: management would not rule out 2026, but no filing timeline or audited S-1 process was public.	Medium	SV004, SV006, SV009
CV052	A reasonable base-case valuation range is about $110 billion to $145 billion if Databricks reaches roughly $6.0 billion to $6.6 billion run-rate while public comp multiples stay in the high-single-digit to low-double-digit range.	Medium	SV006, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029
CV053	A bull case above the current mark requires Databricks to preserve an AI premium while scaling toward roughly $8 billion or more of run-rate, supporting a valuation range around $180 billion to $220 billion.	Medium	SV003, SV006, SV007, SV026, SV027, SV028, SV029
CV054	A bear case of roughly $55 billion to $85 billion is plausible if growth slows toward mature-software levels and Databricks rerates toward the 10x to 15x range visible in public data-platform comps.	Medium	SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021
CV055	At a $134 billion entry price, Databricks offers limited base-case upside and therefore fits a track posture better than a buy posture on public evidence alone.	Medium	SV004, SV006, SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029
CV056	The main thesis-break triggers are multiple compression, growth slipping below the 55%-plus rate Databricks has disclosed, failure to convert AI mix into durable economics, or disclosure that a preference stack materially reduces common-equity upside.	Medium	SV003, SV006, SV007, SV008
CV057	The most material remaining diligence asks are the cap table and preference stack, audited revenue-to-run-rate bridge, debt terms, customer concentration, and AI-product gross margin.	Medium	SV006, SV007, SV009
CV058	The comparable sample is appropriate only as a partial reference set, because Databricks is private and its valuation is a post-money mark, while public comps are current market-cap snapshots tied to different revenue definitions.	Medium	SV003, SV006, SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029
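
The scenario ranges in CV052 through CV055 can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the model used to produce the report. It takes the $134 billion mark, the $5.4 billion run-rate, and the scenario ranges from the claims above. The helper names (`implied_multiple`, `upside_vs_entry`, `summarize`) are hypothetical, and pairing the low end of each valuation range with the low end of its run-rate range is an assumption.

```python
# Illustrative arithmetic only. All figures are USD billions taken from the
# claims above; pairing low valuation with low run-rate is an assumption.

ENTRY_VALUATION = 134.0   # current private mark
CURRENT_RUN_RATE = 5.4    # latest disclosed revenue run-rate


def implied_multiple(valuation: float, run_rate: float) -> float:
    """Return valuation divided by revenue run-rate."""
    return valuation / run_rate


def upside_vs_entry(valuation: float, entry: float = ENTRY_VALUATION) -> float:
    """Return the fractional gain or loss relative to the entry valuation."""
    return valuation / entry - 1.0


# (low, high) valuation ranges and the run-rates the claims attach to them.
SCENARIOS = {
    "base (CV052)": {"valuation": (110.0, 145.0), "run_rate": (6.0, 6.6)},
    "bull (CV053)": {"valuation": (180.0, 220.0), "run_rate": (8.0, 8.0)},
}

# CV054 gives a multiple range instead of a run-rate, so back out the run-rate.
BEAR_VALUATION = (55.0, 85.0)
BEAR_MULTIPLE = (10.0, 15.0)


def summarize() -> dict:
    """Compute implied multiples and upside versus entry for each scenario."""
    out = {"entry_multiple": implied_multiple(ENTRY_VALUATION, CURRENT_RUN_RATE)}
    for name, s in SCENARIOS.items():
        out[name] = {
            "multiple": tuple(
                implied_multiple(v, r) for v, r in zip(s["valuation"], s["run_rate"])
            ),
            "upside": tuple(upside_vs_entry(v) for v in s["valuation"]),
        }
    out["bear (CV054)"] = {
        "implied_run_rate": tuple(
            v / m for v, m in zip(BEAR_VALUATION, BEAR_MULTIPLE)
        ),
        "upside": tuple(upside_vs_entry(v) for v in BEAR_VALUATION),
    }
    return out


if __name__ == "__main__":
    result = summarize()
    print(f"entry: {result['entry_multiple']:.1f}x current run-rate")
    for name in SCENARIOS:
        lo, hi = result[name]["multiple"]
        up_lo, up_hi = result[name]["upside"]
        print(f"{name}: {lo:.1f}x to {hi:.1f}x, upside {up_lo:+.0%} to {up_hi:+.0%}")
    bear = result["bear (CV054)"]
    rr_lo, rr_hi = bear["implied_run_rate"]
    up_lo, up_hi = bear["upside"]
    print(f"bear (CV054): run-rate {rr_lo:.2f} to {rr_hi:.2f}, upside {up_lo:+.0%} to {up_hi:+.0%}")
```

Under these assumptions the base case works out to about 18.3x to 22.0x run-rate and roughly -18% to +8% versus the $134 billion entry, which is the arithmetic behind the limited-upside conclusion in CV055. Because the inputs mix a private post-money mark with run-rate figures, the caveats in CV058 apply to every output.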

Sources
ID	Publisher	Title	Quote
SO001	Databricks	About Databricks: The data and AI company \| Databricks	Today, more than 15,000 organizations worldwide ... rely on the Databricks Data Intelligence Platform.
SO002	Databricks	Press Kit \| Databricks	More than 20,000 customers globally. More than 10,000 employees worldwide.
SO003	Databricks	Contact Databricks - Get in Touch \| Databricks	160 Spear Street, 15th Floor
SO004	Databricks	Databricks Founders - Company Overview \| Databricks	Databricks was founded by Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji.
SO005	Databricks	Learn About Databricks Spark \| Databricks	Explore Apache Spark: A unified analytics engine for big data and machine learning.
SO006	Databricks	Databricks Board of Directors \| Databricks	Our Databricks Board of Directors leverages decades of experience to chart a new course for data and AI.
SO007	Databricks	Databricks is Raising $10B Series J Investment at $62B Valuation - Databricks	The company is raising $10 billion of expected non-dilutive financing and has completed $8.6 billion to date. This funding values Databricks at $62 billion.
SO008	Databricks	Databricks Surpasses $4B Revenue Run-Rate, Exceeding $1B AI Revenue Run-Rate - Databricks	Databricks ... has crossed a $4 billion revenue run-rate during Q2 ... AI products also recently crossed a $1 billion revenue run-rate.
SO009	Databricks	Databricks Grows >55% YoY, Surpasses $4.8B Revenue Run-Rate, and is Raising >$4B Series L at $134B Valuation - Databricks	Databricks ... is raising a >$4 billion Series L investment, valuing the company at $134 billion.
SO010	Databricks	Databricks Grows >65% YoY, Surpasses $5.4 Billion Revenue Run-Rate, Doubles Down on Lakebase and Genie - Databricks	Databricks ... crossed a $5.4 billion revenue run-rate ... including ~$5B of equity financing at a $134 billion valuation and ~$2B of additional debt capacity.
SO011	Databricks	Databricks + MosaicML \| Databricks Blog	Today, we’re excited to share that we’ve completed our acquisition of MosaicML.
SO012	Databricks	Databricks + Tabular \| Databricks Blog	We are excited to announce that we have agreed to acquire Tabular, Inc ... June 7, 2024 UPDATE: We’re excited to share that we’ve completed our acquisition of Tabular!
SO013	Databricks	Databricks and Neon \| Databricks Blog	Databricks and Neon will deliver serverless Postgres for developers and AI agents.
SO014	CNBC	Databricks says annualized revenue to reach $3.7 billion by next month	Databricks told investors and analysts on Wednesday that annualized revenue will hit $3.7 billion by July.
SO015	CNBC	Databricks obtains $1.8 billion in additional debt ahead of IPO	Data analytics software company Databricks has landed $1.8 billion in fresh debt.
SO016	TechCrunch	Databricks picks up MosaicML, an OpenAI competitor, for $1.3B \| TechCrunch	Databricks announced it will pay $1.3 billion to acquire MosaicML.
SO017	TechCrunch	Databricks acquires Tabular to build a common data lakehouse standard \| TechCrunch	Databricks ... has acquired data management company Tabular for an undisclosed sum. (CNBC reports that Databricks paid over $1 billion.)
SO018	TechCrunch	Databricks CEO says fresh $1B will help him attack a new AI database market \| TechCrunch	The round was co-led by both Thrive and one of Databricks’ early investors, Insight Partners ... The company has now raised about $20 billion since it was founded in 2013.
SO019	SAP News Center	SAP Business Data Cloud: SAP & Databricks Turbocharge AI	The new solution natively embeds Databricks technology for data engineering, machine learning and AI workloads.
SO020	The Register	Databricks fails to shake authors' copyright claim	Databricks cannot shake a class action lawsuit targeting its LLM.
SO021	Joseph Saveri Law Firm	Databricks, Inc. Large Language Model Litigation	On April 21, 2026, the Court denied defendants' motion to dismiss and motion to strike allegations relating to the DBRX series of large language models.
SO022	Communications of the ACM	Apache Spark: A Unified Engine for Big Data Processing	BY MATEI ZAHARIA, REYNOLD S. XIN, PATRICK WENDELL, MICHAEL J. FRANKLIN, ALI GHODSI, JOSEPH GONZALEZ ...
SO023	UC Berkeley Sutardja Center	Ghodsi, Ali - UC Berkeley Sutardja Center	Ali Ghodsi is cofounder and CEO of software startup Databricks ... Ghodsi cofounded Databricks with six UC Berkeley academics who built the popular analytics engine Apache Spark.
SO024	Insight Partners	Databricks \| Investment \| Insight Partners	Databricks \| Investment \| Insight Partners
SO025	Microsoft Azure	Azure Databricks \| Microsoft Azure	Azure Databricks can help streamline your entire data and AI lifecycle within a single, scalable environment.
SO026	Google Cloud	Databricks \| Google Cloud	Read about the Google Cloud and Databricks partnership.
SO027	PRNewswire	Databricks is Raising $10B Series J Investment at $62B Valuation	Databricks ... announced its Series J funding. The company is raising $10 billion of expected non-dilutive financing and has completed $8.6 billion to date.
SM001	Databricks	Databricks IQ: AI-Driven Analytics for Faster Data Insights \| Databricks
SM002	Databricks	Data Lakehouse Architecture \| Databricks
SM003	Databricks	Production-quality ML and GenAI \| Databricks
SM004	Databricks	Unity Catalog \| Databricks
SM005	Databricks	AI-Powered Business Intelligence \| Databricks
SM006	Databricks	Public Sector Solutions \| Databricks
SM007	Databricks Docs	What is Unity Catalog? \| Databricks on AWS
SM008	Databricks	A global study of 1,100 technologists—plus 28 CIO interviews \| Databricks
SM009	AWS Marketplace	AWS Marketplace: Databricks, Inc.
SM010	Google Cloud	Databricks \| Google Cloud
SM011	Microsoft Learn	What is Azure Databricks? - Azure Databricks \| Microsoft Learn
SM012	Snowflake	Snowflake Reports Financial Results for the Fourth Quarter and Full-Year of Fiscal 2026
SM013	Confluent	The 2025 Data Streaming Report: Real-Time Data, Real Business Results
SM014	IDC	IDC Unveils 2025 FutureScapes: Worldwide IT Industry Predictions
SM015	Grand View Research	Data Lakehouse Market Size & Share \| Industry Report, 2033
SM016	Global Market Insights	Data Lakehouse Market Size & Share \| Growth Forecast 2025-2034
SM017	The Business Research Company	Data Lakehouse Market Report 2026, Size And Share By 2035
SM018	McKinsey	State of AI trust in 2026: Shifting to the agentic era
SM019	Deloitte	The State of AI in the Enterprise - 2026 AI report \| Deloitte US
SM020	NIST	AI Risk Management Framework \| NIST
SM021	European Commission	AI Act \| Shaping Europe’s digital future
SM022	FinOps Foundation	The State of FinOps Report 2025
SM023	CIO	Data infrastructure: The missing link in successful AI adoption \| CIO
SM024	CDOTrends	Databricks Study: Enterprise IT Not Ready for AI \| CDOTrends
SM025	Databricks	State and Local Government Industry Solutions \| Databricks
SP001	Databricks	Data Lakehouse Architecture \| Databricks
SP002	Databricks	Databricks Pricing: Flexible Plans for Data and AI Solutions \| Databricks
SP003	Databricks	Unity Catalog \| Databricks
SP004	Databricks	AI-Powered Business Intelligence \| Databricks
SP005	Databricks	Databricks Eliminates Table Format Lock-in and Adds Capabilities for Business Users with Unity Catalog Advancements
SP006	PR Newswire	Databricks is Raising $10B Series J Investment at $62B Valuation
SP007	Snowflake	Snowflake key concepts and architecture \| Snowflake Documentation
SP008	Snowflake	Understanding overall cost \| Snowflake Documentation
SP009	Snowflake Investor Relations	Snowflake Reports Financial Results for the Fourth Quarter and Full-Year of Fiscal 2026
SP010	Google Cloud	BigQuery \| AI data platform \| EDW
SP011	Google Cloud	BigQuery \| Google Cloud
SP012	Google Cloud	Apache Iceberg tables \| BigQuery \| Google Cloud Documentation
SP013	SEC / Alphabet	Alphabet Announces Fourth Quarter and Fiscal Year 2024 Results
SP014	Microsoft	Data Analytics Platform \| Microsoft Fabric
SP015	Microsoft Azure	Microsoft Fabric - Pricing \| Microsoft Azure
SP016	Microsoft Learn	What is Microsoft Fabric - Microsoft Fabric \| Microsoft Learn
SP017	Microsoft Investor Relations	FY25 Q4 - Press Releases - Investor Relations - Microsoft
SP018	AWS	Cloud Data Warehouse – Amazon Redshift – AWS
SP019	AWS	Amazon Redshift Pricing
SP020	Amazon	Amazon Q4 2024 Earnings Release
SP021	Confluent	Stream Processing for Analytics and AI on Confluent
SP022	Confluent	Confluent Pricing–Save on Kafka Costs \| Confluent
SP023	Business Wire	Confluent Announces Fourth Quarter and Fiscal Year 2024 Financial Results
SP024	Apache Spark	Apache Spark - Unified Engine for large-scale data analytics
SP025	Trino Software Foundation	Trino \| Distributed SQL query engine for big data
SP026	Rill	The Open Table Format Revolution: Why Hyperscalers Are Betting on Managed Iceberg
SI001	Databricks	Databricks Pricing: Flexible Plans for Data and AI Solutions \| Databricks
SI002	Microsoft Azure	Azure Databricks pricing
SI003	Microsoft Learn	Serverless DBU consumption by SKU - Azure Databricks \| Microsoft Learn
SI004	Databricks	Databricks is Raising $10B Series J Investment at $62B Valuation - Databricks
SI005	Databricks	Databricks Surpasses $4B Revenue Run-Rate, Exceeding $1B AI Revenue Run-Rate - Databricks
SI006	Databricks	Databricks Grows >65% YoY, Surpasses $5.4 Billion Revenue Run-Rate, Doubles Down on Lakebase and Genie - Databricks
SI007	Databricks	Getting the Full Picture: Unifying Databricks and Cloud Infrastructure Costs
SI008	Databricks	AI-Powered Business Intelligence \| Databricks
SI009	CNBC	Databricks says annualized revenue to reach $3.7 billion by next month - CNBC
SI010	CNBC	Databricks completes $5 billion funding round with $2 billion in debt
SI011	CRN	Databricks Reports $5.4 Billion Revenue Run Rate As It Closes A $7B Investment Round
SI012	Revenue Brew	Databricks raises $5 billion in latest funding round
SI013	Snowflake	Snowflake Pricing \| Choose the Right Edition for Your Data Needs
SI014	SEC / Snowflake	snow-20260131 - SEC.gov
SI015	Snowflake Investor Relations	Snowflake - Financials - SEC Filings
SI016	Google Cloud	BigQuery pricing
SI017	Amazon Web Services	Amazon Redshift pricing
SI018	SEC / Confluent	Confluent 10-K - SEC.gov
SI019	SEC	Confluent, Inc. browse page - SEC.gov
SI020	Sacra	Databricks revenue, valuation & funding \| Sacra
SI021	CloudForecast	Databricks Pricing Guide (2026): DBU Costs, Tiers & How to Cut Your Bill
SI022	Mammoth Analytics	Databricks Pricing Guide 2026: Costs & Plans Broken Down
SI023	Revefi	Databricks Growth Surges Hits Record Revenue 2026 \| Revefi
SI024	SEC	MongoDB, Inc. browse page - SEC.gov
SI025	Snowflake	Snowflake Pricing Calculator \| Estimate Snowflake Cost
SE001	Databricks	Databricks: Leading Data and AI Platform for Enterprises
SE002	Databricks	Data Lakehouse Architecture
SE003	Databricks	Unity Catalog
SE004	Databricks	AI-Powered Business Intelligence
SE005	Databricks	Lakebase
SE006	Databricks	Databricks Trust: Ensuring Security, Privacy, & Compliance
SE007	Databricks	Databricks Trust & Compliance: Ensuring Security & Privacy
SE008	Databricks	Databricks SOC Compliance
SE009	Databricks Docs	Architecture \| Databricks on AWS
SE010	Databricks Docs	What is the medallion lakehouse architecture?
SE011	Databricks Docs	What is Unity Catalog?
SE012	Databricks Docs	Deploy models using Mosaic AI Model Serving
SE013	Databricks Docs	Set up serverless SQL warehouses
SE014	Databricks Docs	Databricks release notes
SE015	Databricks Status	Databricks on AWS Status Page
SE016	GitHub	GitHub - databricks/cli: Databricks CLI
SE017	GitHub	GitHub - databricks/databricks-sdk-py: Databricks SDK for Python (Beta)
SE018	PyPI	databricks-sdk
SE019	Microsoft Learn	High-level architecture - Azure Databricks
SE020	Microsoft Learn	Databricks production planning - Azure Databricks
SE021	lakeFS	Databricks Architecture Overview: Components & Key Features
SE022	theCUBE Research	Databricks Keynote Highlights – Data + AI Summit 2025
SE023	TechCrunch	Databricks launches LakeFlow to help its customers build their data pipelines
SE024	TechCrunch	Databricks bought two startups to underpin its new AI security product
SE025	VentureBeat	Databricks' serverless database slashes app development from months to days as companies prep for agentic AI
SE026	NVIDIA Technical Blog	RAPIDS on Databricks: A Guide to GPU-Accelerated Data Processing
SE027	Google Cloud Blog	Committing to Apache Iceberg with our ecosystem partners
SE028	Google Cloud	Databricks \| Google Cloud
SE029	ServiceAlert.ai	Databricks Outage History, Downtime & Incident Records
SU001	Databricks	Customer Stories \| Databricks
SU002	Databricks	Data Intelligence in Action: 100+ Data and AI Use Cases from Databricks Customers
SU003	Databricks	Databricks Surpasses $4B Revenue Run-Rate, Exceeding $1B AI Revenue Run-Rate
SU004	Databricks	Databricks Grows >65% YoY, Surpasses $5.4 Billion Revenue Run-Rate, Doubles Down on Lakebase and Genie
SU005	CNBC	Databricks says annualized revenue will reach $3.7 billion by next month
SU006	CRN	Databricks Reports $5.4 Billion Revenue Run Rate As It Closes A $7B Investment Round
SU007	Microsoft	AT&T migration to Azure Databricks catalyzes technical staff, advances business goals
SU008	PeerSpot	Databricks reviews 2026
SU009	Capterra	Databricks Reviews 2022 - Capterra
SU010	FeaturedCustomers	1216 Databricks Customer Reviews & References \| FeaturedCustomers
SU011	Mastercard	Mastercard launches new gen AI digital assistant capabilities to enhance customer value
SU012	Mastercard	Databricks co-founder on the value of AI for businesses today
SU013	Microsoft Azure	Azure Databricks \| Microsoft Azure
SU014	Google Cloud	Databricks \| Google Cloud
SU015	SAP News Center	SAP and Databricks: A Bold New Era of Data and AI
SU016	Databricks	FOX Sports Reimagines the Fan Experience \| Databricks
SU017	Databricks	Mastercard powers the digital economy for all \| Databricks
SU018	Databricks	Insulet simplifies life for people living with diabetes \| Databricks
SU019	YouTube / Databricks Events	AI Agents for Marketing: Leveraging Mosaic AI to Create a Multi-Purpose Agentic Marketing Assistant
SU020	YouTube / Databricks Events	AT&T AutoClassify: Unified Multi-Head Binary Classification From Unlabeled Text
SU021	YouTube / Databricks Events	FOX Sports reimagines the fan experience with the Databricks Data Intelligence Platform
SU022	PR Newswire	Databricks Achieves FedRAMP High Agency Authority to Operate for AWS GovCloud
SU023	CNBC	Databricks completes $5 billion funding round with $2 billion in debt
SU024	Databricks	Databricks recognized as a Gartner® Peer Insights™ Customers’ Choice for Analytics and BI
SU025	Databricks	Customers \| Databricks
SR001	Databricks	Privacy Notice \| Databricks
SR002	Databricks	Databricks Data Processing Addendum \| Databricks
SR003	Databricks	Databricks Trust & Compliance: Ensuring Security & Privacy
SR004	Databricks Docs	Security, compliance, and privacy for the data lakehouse
SR005	Microsoft Learn	Compliance - Azure Databricks \| Microsoft Learn
SR006	Databricks	Databricks Status
SR007	FedRAMP Marketplace	Databricks on Azure Commercial \| FedRAMP Marketplace
SR008	EUR-Lex	Rules for trustworthy artificial intelligence in the EU
SR009	CourtListener	In Re Mosaic LLM Litigation, 3:24-cv-01451 – CourtListener.com
SR010	Internet Cases	Court lets authors expand copyright case to target Databricks’ new AI models
SR011	The Register	Databricks fails to shake authors' copyright claim
SR012	CFM Lawyers	Databricks AI - CFM Lawyers
SR013	TechCrunch	Databricks closes $15.3B financing at $62B valuation, Meta joins as strategic investor
SR014	Databricks	Databricks is raising a Series K Investment at >$100 billion valuation
SR015	CRN	Databricks Closes $1B Series K Funding Round, Exceeds $100B Market Cap
SR016	TechCrunch	Databricks raises $4B at $134B valuation as its AI business heats up
SR017	CNBC	Databricks obtains $1.8 billion in additional debt ahead of IPO
SR018	CNBC	Databricks completes $5 billion funding round with $2 billion in debt
SR019	CRN	Databricks Reports $5.4 Billion Revenue Run Rate As It Closes A $7B Investment Round
SR020	Silicon Republic	Databricks raising $4bn Series L at $134bn valuation
SR021	Databricks	Databricks Announces Strategic AI Partnership with Google Cloud to Bring Gemini Models Natively to the Data Intelligence Platform
SR022	SAP News Center	SAP Business Data Cloud: SAP & Databricks Turbocharge AI
SR023	Databricks	Databricks and Anthropic Sign Landmark Deal to Bring Claude Models to the Data Intelligence Platform
SR024	U.S. Securities and Exchange Commission	Snowflake FY2026 Annual Report on Form 10-K
SR025	Microsoft	Data Analytics Platform \| Microsoft Fabric
SR026	Amazon Web Services	Big Data Platform – Amazon EMR – Amazon Web Services
SR027	IsDown	Is Azure Databricks Down? Check current status and user reports
SR028	Databricks	Databricks Legal
SR029	Databricks	Security and Trust Center - Databricks
SR030	Databricks Docs	Compliance - Databricks on AWS
SV001	Databricks	Databricks is Raising $10B Series J Investment at $62B Valuation
SV002	Databricks	Databricks is raising a Series K Investment at >$100 billion valuation
SV003	Databricks	Databricks Grows >55% YoY, Surpasses $4.8B Revenue Run-Rate, and is Raising >$4B Series L at $134B Valuation
SV004	CNBC	Databricks raises capital at $134 billion valuation in latest funding round
SV005	TechCrunch	Databricks raises $4B at $134B valuation as its AI business heats up
SV006	CRN	Databricks Reports $5.4 Billion Revenue Run Rate As It Closes A $7B Investment Round
SV007	Sacra	Databricks revenue, valuation & funding
SV008	Forbes	Databricks: A Much-Anticipated IPO, But Will The Honeymoon Last?
SV009	Allied Venture Partners	Databricks IPO: expectations, key dates, valuation risks
SV010	SEC	Snowflake Form 10-K for fiscal year ended January 31, 2026
SV011	CompaniesMarketCap	Snowflake (SNOW) - Market capitalization
SV012	Macrotrends	Snowflake revenue 2019-2026
SV013	Business Wire	Snowflake Reports Financial Results for the Fourth Quarter and Full Year of Fiscal 2026
SV014	SEC	MongoDB Form 10-K for fiscal year ended January 31, 2025
SV015	CompaniesMarketCap	MongoDB (MDB) - Market capitalization
SV016	Macrotrends	MongoDB revenue 2018-2025
SV017	PR Newswire	MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results
SV018	CompaniesMarketCap	Confluent (CFLT) - Market capitalization
SV019	Macrotrends	Confluent revenue 2021-2025
SV020	CompaniesMarketCap	Elastic NV (ESTC) - Market capitalization
SV021	Macrotrends	Elastic revenue 2018-2025
SV022	Business Wire	Elastic Reports Fourth Quarter and Fiscal 2025 Financial Results
SV023	SEC	Cisco Form 8-K announcing Splunk merger agreement
SV024	CompaniesMarketCap	Cisco (CSCO) - Market capitalization
SV025	Macrotrends	Cisco revenue 2010-2025
SV026	CompaniesMarketCap	Palantir (PLTR) - Market capitalization
SV027	Macrotrends	Palantir Technologies revenue 2019-2025
SV028	CompaniesMarketCap	ServiceNow (NOW) - Market capitalization
SV029	Macrotrends	ServiceNow revenue 2013-2025
SV030	Business Wire	ServiceNow Reports First Quarter 2026 Financial Results