Databricks
Elite data-and-AI asset, but the current private mark still prices in a lot of future perfection
Databricks is a premier late-stage data-and-AI platform, but the current $134B price still looks stretched versus public comps and available disclosure.
Cover facts
Company profile
Databricks is a late-stage private data-and-AI infrastructure company that grew out of the Apache Spark ecosystem and now sells a multicloud platform spanning data engineering, lakehouse storage and governance, warehousing, AI/BI, model and agent tooling, and adjacent database services. The company has reached rare private-market scale, with disclosed revenue run-rate above $5 billion, deep Fortune 500 penetration, and a fast-growing AI revenue stream, but public disclosure still lags what investors would normally expect to underwrite a $134 billion entry cleanly.
- Website: www.databricks.com
- Founded: 2013-01-01
- Founders: Ali Ghodsi, Matei Zaharia, Ion Stoica, Patrick Wendell, Reynold Xin, Andy Konwinski, Arsalan Tavakoli-Shiraji
- Founding location: Berkeley, California, USA
- Headquarters: San Francisco, California, USA
- Product: Databricks sells a unified, usage-based data and AI platform that combines lakehouse storage formats, data engineering, warehousing, governance, AI/BI, model and agent tooling, and multicloud deployment across Azure, AWS, and Google Cloud.
- Customers: Large enterprises, digital-native companies, public-sector organizations, and data/AI teams that want a governed multicloud platform for analytics and production AI.
- Business model: Primarily usage-based monetization through DBUs and adjacent serverless services, with expansion driven by workload growth, higher-value AI products, warehousing, governance, and broader platform adoption inside large accounts.
- Stage: Late-stage private / pre-IPO
- Funding status: Public financing sequence moved from a $62B Series J in December 2024 to a Series K above $100B in September 2025 and a $134B Series L in December 2025, followed by a February 2026 package combining equity and debt.
Executive summary
Top strengths
- Rare late-stage scale with revenue run-rate above $5B and continued fast growth.
- Strong enterprise depth, including large-spend customer cohorts and multicloud distribution.
- Real AI monetization already above a $1B run-rate, not just product marketing.
- Deep technical founder roots and broad product surface across data, governance, and AI.
- Capital access and free-cash-flow signals reduce near-term financing stress.
Top risks
- The $134B mark still relies heavily on management-led run-rate disclosure rather than audited statements.
- Public software comps provide only limited support for the current premium multiple.
- Gross margin, concentration, cap-table seniority, debt terms, and AI economics remain under-disclosed.
- Competition from hyperscalers, Snowflake, and open data formats can pressure differentiation and pricing.
- Active copyright litigation and governance opacity still matter at this valuation level.
Open gaps
- Audited revenue and gross-margin bridge suitable for IPO-grade underwriting.
- Exact cap table, preference stack, debt covenants, and any employee-liquidity pricing terms.
- Customer concentration, cohort retention by product, and renewal-duration data.
- AI revenue contribution broken down by product, margin, and pass-through infrastructure exposure.
- Full governance roster, committee structure, and litigation reserve or insurance detail.
01 Company Overview
1.1 Identity, mission, and the current public company shape
Databricks now presents itself as a large late-stage private infrastructure company rather than a narrowly defined Spark vendor. The cleanest current identity source is the official about page, which calls Databricks the data and AI company and frames the Data Intelligence Platform as a unified foundation for data, governance, and AI. Headquarters are clearly anchored in San Francisco and the contact page provides a specific 160 Spear Street address, which makes the company easy to anchor geographically for later chapters. The public scale message has also changed over time in a way worth preserving: the about page still says more than 15,000 organizations use Databricks, while the current press kit and later 2025-2026 press releases move that number to more than 20,000 customers. The safest reusable identity for the rest of the report is therefore a San Francisco-headquartered, privately held data-and-AI platform company with a multicloud operating model, broad enterprise reach, and explicit dependence on its own narrative around the Data Intelligence Platform.[CO004, CO005, CO006, CO007, CO008, CO009]
| Metric | Value / Status | Date | Confidence | Gap / Notes |
|---|---|---|---|---|
| Founding year | 2013 | 2013 | high | Consistent across Databricks about materials and the press kit. |
| Headquarters | San Francisco, California | 2026 public pages | high | Specific contact address is 160 Spear Street, 15th Floor. |
| Stage | Late-stage private / IPO-optional | 2026-01-23 | medium | Backed by $134B valuation and CNBC pre-IPO framing; still privately held. |
| Customer count | 20,000+ | 2026 press kit / 2025-2026 releases | high | Current company claim is 20,000+; older about page still says 15,000+, so use the newer figure and keep the older one as historical context. |
| Headcount | 10,000+ | 2026 press kit | medium | Current company claim is 10,000+ employees; CNBC reported roughly 8,000 in June 2025, so treat the trajectory as upward rather than point-precise. |
| Office count | 30+ | 2026 press kit | medium | Company says 30+ offices globally but does not publish a full office list in the reviewed materials. |
| Revenue run-rate | $5.4B | 2026-02-09 | medium | Annualized revenue run-rate, not audited GAAP revenue. |
| Latest public valuation | $134B | 2025-12-16 | high | Series L and subsequent CNBC coverage align on $134B. |
| Capital package disclosed | >$7B | 2026-02-09 | medium | February 2026 disclosure combined roughly $5B equity and roughly $2B debt capacity; this is not the same as cumulative lifetime funding. |
| Public total raised | null | | | Public sources describe about $20B raised and multiple debt/equity packages, but the exact cumulative capital base is not fully reconciled. |
Use these values as the canonical company-overview baseline. Where the public record mixes annualized run-rate, equity, debt, and secondaries, the table keeps unsupported lifetime totals as null instead of manufacturing precision.
[CO001, CO006, CO007, CO009, CO010, CO011]
1.2 Founders, leadership bench, and governance disclosure limits
The founder record is unusually strong even if the full executive and board roster is not. Databricks says it was founded in 2013 by seven UC Berkeley AMP Lab researchers, and the official founders page names Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji. That matters because the company still derives legitimacy from technical origin stories: Berkeley and Spark are not marketing garnish but core proof of founder-market fit. The Berkeley profile for Ali Ghodsi and the CACM Spark paper reinforce that Databricks emerged from the same academic and open-source ecosystem that created Apache Spark. Public executive disclosure remains thinner than the founder disclosure. Ali Ghodsi is easy to verify as co-founder and CEO, but the reviewed board page mainly proves that a governance surface exists, not who all directors are or how committees and investor rights are allocated. That should be treated as a real diligence gap, not silently papered over.[CO001, CO002, CO003, CO013, CO014, CO015]
| Person | Role | Background | Founder-market fit or functional coverage | Key-person dependency |
|---|---|---|---|---|
| Ali Ghodsi | Co-founder, CEO | UC Berkeley academic; Spark-era founder; public face of financing and product direction | Bridges technical origin story, enterprise narrative, and capital markets communication | High |
| Ion Stoica | Co-founder | UC Berkeley professor and AMP Lab figure | Anchors Databricks credibility in distributed systems and academic research roots | Medium |
| Matei Zaharia | Co-founder | Apache Spark creator and Databricks technical founder | Core link between Spark, lakehouse credibility, and platform architecture | High |
| Patrick Wendell | Co-founder | Spark-era Databricks engineering leader listed on the CACM Spark paper | Adds engineering depth and continuity from early platform design | Medium |
| Andy Konwinski / Arsalan Tavakoli-Shiraji / Reynold Xin | Co-founding technical bench | Named on the official founders page with Berkeley and open-source roots | Broadens the company's original technical bench beyond the CEO-led narrative | Medium |
Founder disclosure is strong; public executive and board detail beyond the founders is much thinner.
[CO002, CO003, CO013, CO014, CO015, CO016]
1.3 Capital formation, investor map, and private-market maturity
Databricks has crossed into very large private-company financing territory, and the progression is well supported by both company and independent reporting. The decisive inflection was the December 2024 Series J announcement: Databricks said it was arranging $10 billion of expected non-dilutive financing at a $62 billion valuation and expected to cross a $3 billion revenue run-rate while reaching positive free cash flow. By September 2025 the company said it had crossed a $4 billion run-rate, exceeded a $1 billion AI revenue run-rate, and closed a $1 billion Series K at a valuation above $100 billion. The December 2025 Series L then pushed public valuation to $134 billion with a $4.8 billion run-rate, and the February 2026 update said Databricks had crossed $5.4 billion in run-rate revenue while completing a package worth more than $7 billion including equity and debt. Those are unmistakable signs of late-stage private-market maturity and IPO optionality. What remains unresolved is the precise cap table and true cumulative capital raised after the mix of non-dilutive financing, equity, debt, and 2025 employee liquidity programs.[CO017, CO018, CO019, CO020, CO025, CO027]
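The financing sequence above can be reduced to an implied valuation-to-run-rate multiple at each disclosure. The sketch below is illustrative arithmetic only. It treats "above $100 billion" as exactly $100B and "expected to cross $3 billion" as exactly $3B, both simplifying assumptions, and it pairs the February 2026 run-rate with the unchanged $134B Series L valuation. The names `Disclosure` and `implied_multiple` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    label: str
    date: str
    valuation_bn: float  # USD billions; floors treated as point values (assumption)
    run_rate_bn: float   # USD billions; annualized run-rate, not audited revenue


def implied_multiple(d: Disclosure) -> float:
    """Valuation divided by annualized revenue run-rate."""
    return d.valuation_bn / d.run_rate_bn


DISCLOSURES = [
    Disclosure("Series J", "2024-12-17", 62.0, 3.0),
    Disclosure("Series K", "2025-09-08", 100.0, 4.0),
    Disclosure("Series L", "2025-12-16", 134.0, 4.8),
    Disclosure("Feb 2026 update", "2026-02-09", 134.0, 5.4),
]

# Multiple per disclosure, rounded to one decimal place.
MULTIPLES = {d.label: round(implied_multiple(d), 1) for d in DISCLOSURES}

if __name__ == "__main__":
    for d in DISCLOSURES:
        print(f"{d.date}  {d.label:<16} {MULTIPLES[d.label]:>5.1f}x run-rate")
```

On these assumptions the multiple rises from roughly 20.7x at Series J to about 27.9x at Series L, then eases to about 24.8x once the February 2026 run-rate is applied to the same $134B mark.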
| Stakeholder | Role | Control or economic importance | Diligence ask |
|---|---|---|---|
| Thrive Capital | Series J lead / Series K co-lead | Lead investor in the 2024 $62B financing and again part of the 2025 >$100B round | Confirm board rights, liquidation preference, and any pro-rata or super-pro-rata rights across J/K/L. |
| Andreessen Horowitz | Series J co-lead / Series K co-lead | Repeatedly named in 2024-2025 primary rounds | Confirm ownership percentage after Series L and any special governance rights. |
| Insight Partners | Series J co-lead / Series K co-lead / portfolio investor | Named in multiple rounds and still publicly lists Databricks as a portfolio company | Confirm current economic stake and whether Insight has observer or nomination rights. |
| DST Global / GIC / WCM | Series J co-leads | Helped anchor the large 2024 non-dilutive package | Request financing docs to understand structure, warrant terms, and seniority. |
| MGX / Ontario Teachers / Sands / Wellington / ICONIQ | Named participants in 2024-2025 financings | Signal broad sovereign/institutional support but unclear ownership percentages | Request the cap table and side letters to reconcile exact positions. |
| Employees and former employees | Liquidity counterparties in 2024-2025 tender or secondary activity | Series J explicitly contemplated liquidity and TechCrunch reported two 2025 secondary rounds | Request tender documents, participation rates, and pricing to understand morale and dilution. |
| Lenders / debt providers | Supplemental financing providers | January-February 2026 debt facilities expanded Databricks' capital base beyond equity alone | Request debt covenants, maturity schedule, security package, and permitted lien language. |
| Cloud/platform partners (SAP, Microsoft, Google) | Distribution and ecosystem stakeholders | Not equity owners in the reviewed record, but strategically important for route-to-market and platform reach | Confirm commercial concentration, revenue share, and dependency by partner. |
Economic importance is supportable publicly; precise ownership and control rights are not.
[CO017, CO018, CO019, CO027, CO030, CO031]
The public KPI stack points to a very large private enterprise-software platform with current scale signals but still-incomplete ownership detail.
Customer and employee values are company-reported floor values rather than exact counts; revenue is an annualized run-rate, not audited revenue.
[CO009, CO010, CO032, CO036, CO038, CO052]
1.4 Scale signals, ecosystem reach, and what is reusable as ground truth
The strongest public scale signals are not just valuation headlines; they are the quality of Databricks' customer and ecosystem indicators. Company materials now claim more than 20,000 customers globally, 70% Fortune 500 penetration, more than 10,000 employees, and 30-plus offices. Independent reporting fills in useful texture without fully replacing company claims: CNBC said Databricks had roughly 8,000 employees in June 2025, nearly 50 customers spending more than $10 million annually, and $2.6 billion of revenue in the fiscal year ending January 2025. The company then disclosed successive cohorts of 650-plus, 700-plus, and 800-plus customers, each generating more than $1 million in annual revenue run-rate, in September 2025, December 2025, and February 2026 respectively. Distribution is also broader than a single direct-sales motion. SAP says Business Data Cloud embeds Databricks technology, while Microsoft and Google each market Databricks as a first-party cloud offering. Together those signals support a reusable picture of Databricks as a scaled enterprise platform with multicloud reach, strong upsell dynamics, and a customer base deep enough to support large-spend cohorts.[CO009, CO010, CO011, CO012, CO021, CO022]
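A few ratios follow directly from the scale signals above. The sketch below is a rough illustration only: every input is a company-reported floor ("more than", "-plus") treated as a point value, so the outputs are indicative rather than precise, and all variable names are hypothetical. The $5.4 billion run-rate is the February 2026 figure from the financing discussion in section 1.3.

```python
# Company-reported floors, treated as point values (assumption).
CUSTOMERS = 20_000
EMPLOYEES = 10_000
RUN_RATE_USD = 5.4e9  # annualized run-rate, February 2026, not audited revenue

# Customers above $1M annual revenue run-rate, keyed by disclosure month.
MILLION_PLUS_COHORT = {
    "2025-09": 650,
    "2025-12": 700,
    "2026-02": 800,
}

latest_cohort = MILLION_PLUS_COHORT["2026-02"]

# Share of all customers that sit in the $1M+ cohort.
cohort_share = latest_cohort / CUSTOMERS

# Growth in the $1M+ cohort from September 2025 to February 2026.
cohort_growth = latest_cohort / MILLION_PLUS_COHORT["2025-09"] - 1

# Annualized run-rate per employee.
run_rate_per_employee = RUN_RATE_USD / EMPLOYEES

if __name__ == "__main__":
    print(f"$1M+ cohort share of customers: {cohort_share:.1%}")
    print(f"$1M+ cohort growth, Sept 2025 to Feb 2026: {cohort_growth:.1%}")
    print(f"Run-rate per employee: ${run_rate_per_employee:,.0f}")
```

Because both the numerator and denominator are floors, these ratios are best read as orders of magnitude for diligence questions, not as reported KPIs.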
1.5 Milestones to reuse later, including the active adverse marker
The milestone record is rich enough that later chapters should not have to rediscover it. Databricks was founded in 2013, acquired MosaicML in 2023 to accelerate enterprise generative AI, moved in 2024 to acquire Tabular to converge Delta Lake and Apache Iceberg ecosystems, partnered with SAP in early 2025, and agreed in mid-2025 to acquire Neon to push deeper into operational databases for AI agents. Those strategic moves line up closely with the financing sequence from Series J to Series K to Series L and then the February 2026 debt-and-equity expansion. The one adverse event that clearly belongs in the chronology is the ongoing copyright litigation tied to Mosaic-era and DBRX-related model training. The Register and Saveri both indicate that the case survived a motion to dismiss in April 2026, so this is not a stale allegation that can be ignored. It should carry into later risk work as an active IP and model-governance issue.[CO017, CO018, CO025, CO027, CO032, CO036]
| Date | Event | Type | Amount / valuation / status | Participants | Implication |
|---|---|---|---|---|---|
| 2013-01-01 | Databricks founded | founding | Company formed | Seven UC Berkeley AMP Lab researchers | Canonical origin point for the company and later founder-market-fit narrative. |
| 2023-07-19 | MosaicML acquisition closed | product | $1.3B reported by TechCrunch | Databricks and MosaicML | Accelerated Databricks into enterprise generative AI training and customization. |
| 2024-06-04 | Tabular acquisition announced | product | Agreed; closed 2024-06-07 | Databricks and Tabular | Brought Apache Iceberg creators together with Delta Lake creators to reduce format fragmentation. |
| 2024-12-17 | Series J financing announced | financing | $10B expected non-dilutive at $62B valuation | Thrive, a16z, DST, GIC, Insight, WCM and others | Established Databricks as a megafinancing private company while targeting $3B revenue run-rate and positive free cash flow. |
| 2025-02-13 | SAP Business Data Cloud launched with embedded Databricks technology | partnership | Partnership live | SAP and Databricks | Strengthened enterprise distribution and business-data positioning. |
| 2025-05-14 | Neon acquisition agreement announced | product | Agreement announced | Databricks and Neon | Extended strategy into serverless Postgres for developers and AI agents. |
| 2025-09-08 | Series K and $4B revenue run-rate disclosed | scale | $1B at >$100B valuation | Databricks, a16z, Insight, MGX, Thrive, WCM | Showed valuation step-up, positive free cash flow, and rising AI monetization. |
| 2025-12-16 | Series L and $4.8B run-rate disclosed | financing | >$4B at $134B valuation | Databricks and Series L investors | Pushed the company into a new valuation band and reinforced platform breadth. |
| 2026-02-09 | Post-Series-L financing package expanded | governance | >$7B package including ~$2B debt capacity | Databricks and financing counterparties | Confirmed Databricks could layer debt on top of late-stage equity rather than rushing to public markets. |
| 2026-04-21 | DBRX-related copyright claims survived motion to dismiss | adverse | Active litigation | Authors, Databricks, Mosaic-related defendants | Creates a live IP and model-governance risk that later chapters should not ignore. |
This is the public chronology of record for the report; it prioritizes milestones that materially change identity, scale, strategy, or risk.
[CO001, CO017, CO018, CO019, CO020, CO027]
Major Databricks inflection points show a shift from open-source roots to late-stage private-market scale, AI-platform expansion, and an active legal overhang.
[CO001, CO017, CO018, CO020, CO027, CO032]
Databricks' identity, platform, customers, capital base, partner routes, and litigation risk form one connected system rather than isolated datapoints.
[CO003, CO004, CO005, CO009, CO017, CO018]
1.6 Exhibits
02 Market Analysis
2.1 Market boundary and sizing lenses
Databricks' market should be bounded from its monetization surfaces outward rather than from every dollar labeled AI or cloud. The company presents a lakehouse-based platform that unifies storage, processing, governance, BI, and AI workloads, then layers agent development and governed business intelligence on top. That means the most relevant direct spend pools are lakehouse software and services, governed analytics, AI development on enterprise data, and public-sector data modernization. The boundary should exclude generic cloud infrastructure, chips, foundation-model spending that never touches Databricks workflows, and broad consulting or systems-integration work that does not attach to the platform. Published market estimates confirm demand but also show why one headline TAM is misleading: Grand View pegs the core data lakehouse market at USD 13.94 billion in 2025, GMI at USD 14.2 billion, and TBRC at USD 10.33 billion, while IDC’s USD 337 billion 2025 AI-supporting-technology forecast is an outer envelope that is much broader than Databricks’ practical revenue pool. The right reading is that Databricks has macro tailwinds and a credible core category, but not a publicly isolatable SAM or SOM. That leaves the valuation debate anchored less on absolute TAM rhetoric and more on Databricks’ ability to consolidate budgets that would otherwise stay fragmented across warehouses, BI, governance, streaming, and bespoke AI stacks.[CM053, CM054, CM055, CM056, CM057, CM060]
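The gap between the core-category estimates and the IDC envelope is easier to see as arithmetic. The following is a minimal sketch using only the 2025 figures quoted above. The variable names are hypothetical, and the midpoint is a plain average of the range ends rather than any publisher's estimate.

```python
# 2025 core data-lakehouse market estimates quoted in the text (USD billions).
CORE_2025 = {
    "Grand View Research": 13.94,
    "Global Market Insights": 14.2,
    "The Business Research Company": 10.33,
}
IDC_ENVELOPE_2025 = 337.0  # AI-supporting technology, a much broader category

low = min(CORE_2025.values())
high = max(CORE_2025.values())

# Simple average of the range ends (not a publisher estimate).
midpoint = (low + high) / 2

# How far the top estimate sits above the bottom one, in percent.
spread_pct = (high - low) / low * 100

# IDC envelope relative to the largest core-category estimate.
envelope_ratio = IDC_ENVELOPE_2025 / high

if __name__ == "__main__":
    print(f"Core 2025 range: {low:.2f} to {high:.2f} (midpoint {midpoint:.2f})")
    print(f"Spread between publishers: {spread_pct:.1f}%")
    print(f"IDC envelope is {envelope_ratio:.1f}x the largest core estimate")
```

Even against the largest core estimate, the IDC figure is more than twenty times larger, which is why it serves as context here rather than as an addressable market.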
| publisher | year | geography | value | CAGR | methodology | confidence | limitation |
|---|---|---|---|---|---|---|---|
| Grand View Research | 2024 | Global | 11.35 | | Current data lakehouse market snapshot | medium | Core lakehouse category only, not Databricks-specific revenue pool |
| Grand View Research | 2025 | Global | 13.94 | 23.2 | Publisher forecast for the core data lakehouse market through 2033 | medium | Forecast window and category boundary differ from other publishers |
| Global Market Insights | 2024 | Global | 11.9 | | Current data lakehouse market snapshot | medium | Core lakehouse category only, not a broader AI platform measure |
| Global Market Insights | 2025 | Global | 14.2 | 25 | Publisher forecast for the core data lakehouse market through 2034 | medium | Longer horizon and methodology differ from Grand View and TBRC |
| The Business Research Company | 2025 | Global | 10.33 | | Current data lakehouse market snapshot | medium | Shorter-term definition than some other publisher estimates |
| The Business Research Company | 2026 | Global | 12.58 | 21.8 | Near-term forecast from 2025 base | medium | Not directly comparable with 2033-2034 endpoints |
| The Business Research Company | 2030 | Global | 27.28 | 21.4 | Five-year forecast for the core lakehouse category | medium | Shorter horizon than the 2033-2034 forecasts |
| IDC FutureScape | 2025 | Worldwide | 337 | | Outer-envelope spending on AI-supporting technology | low | Far broader than the core data lakehouse category or Databricks SAM |
| IDC FutureScape | 2028 | Worldwide | 749 | | Outer-envelope forecast for AI-supporting technology | low | Not comparable to lakehouse-only estimates; useful as macro context only |
All values are USD billions. The first seven rows estimate the core data-lakehouse category; the IDC rows are a broader AI-supporting-technology envelope to show why Databricks TAM depends on boundary selection.
[CM023, CM024, CM025, CM028, CM029, CM030]
Databricks should be valued against nested spending lenses: a broad AI envelope, a narrower core lakehouse category, an enterprise-weighted buyer slice, and then an undisclosed Databricks-specific SAM/SOM.
This figure is a boundary lens, not a strict TAM-SAM-SOM waterfall. Public sources support the outer envelope and core category, but they do not isolate a Databricks-specific SAM or SOM.
[CM024, CM029, CM031, CM027, CM038, CM037]
The best apples-to-apples public range is the 2025 core data lakehouse market, not the much broader AI-supporting-technology envelope.
The range keeps one consistent unit and one category definition: 2025 core data lakehouse market size. Broader AI-supporting-technology spend is excluded from the range because it is not comparable.
[CM024, CM029, CM031]
2.2 Buyer segmentation, budgets, and adoption path
The buyer map is broader than a single data-engineering budget. Official and partner materials show Databricks selling first to central data-platform teams, then to analytics leaders, data scientists, ML engineers, application developers, and public-sector data offices. Microsoft’s Azure Databricks overview explicitly groups data engineering, ML and AI, BI, and streaming analytics as core workloads, which implies multiple internal champions and budget owners. Databricks’ own AI/BI and Unity Catalog pages show a path from governed SQL and semantic layers into nontechnical business-user workflows, while Mosaic AI pushes into model and agent builders. Procurement also varies by segment: some deals can route through AWS Marketplace, Google Cloud, or Azure relationships, while public-sector programs emphasize agency compliance, fiscal decision-making, and secure data sharing across state, local, federal, and higher-education contexts. The enterprise-weighted character of the category still matters: Grand View says large enterprises represented 71.4% of 2024 data-lakehouse revenue, which fits Databricks’ multicloud, governance-heavy pitch and suggests the most important budgets sit with CIO, CDO, platform, and regulated-operations leaders rather than only isolated experimenters. These paths overlap, but they do not spend identically.[CM059, CM060, CM065, CM066, CM067, CM068]
| segment | buyer | user | payer/workflow | budget owner | adoption trigger |
|---|---|---|---|---|---|
| Central data platform | CIO, CDO, or platform leader | Data engineers and platform teams | Lakehouse consolidation, ETL, governance, and shared data services | Central IT, data office, or platform budget | Need to replace fragmented storage, ETL, governance, and analytics estates |
| Analytics and BI | Head of analytics, finance systems, or business operations leader | Analysts, finance teams, and business managers | Governed SQL analytics, dashboards, semantics, and conversational BI | Analytics, finance operations, or shared data budget | Need faster self-service analytics without adding separate BI silos |
| AI and ML builders | CTO, VP Engineering, or ML platform leader | Data scientists, ML engineers, and agent developers | Model training, agent evaluation, vector search, serving, and governed GenAI workflows | Engineering, product, or data science budget | Need to move from AI experimentation to production on enterprise data |
| Application and product teams | Product leader or engineering manager | Developers building data or AI applications | Use Databricks data, SQL, and AI services inside customer-facing or internal apps | Product engineering budget | Need shared data platform primitives without building the stack internally |
| Public sector and higher education | Agency CIO, data office leader, or university administrator | Policy teams, analysts, and domain operators | Compliance analytics, fiscal decision support, secure data sharing, and public-service AI use cases | Agency technology, program, or institutional budget | Need compliant modernization and cross-agency or campus data access |
| Regulated enterprises | Risk, compliance, finance, or operations leader | Analysts, reviewers, and line-of-business specialists | Trusted analytics and AI on sensitive data with auditability and oversight | Functional budget with governance oversight | Need higher productivity but cannot sacrifice lineage, controls, or policy enforcement |
Databricks spans technical platform budgets and business-user analytics budgets, but the strongest public evidence still points to enterprise-scale, governance-heavy buying centers.
[CM059, CM065, CM066, CM067, CM068, CM069]
Databricks fits best where data-platform control, clear ROI, and governance needs come together; public-sector and highly regulated workflows remain attractive but slower.
[CM065, CM066, CM067, CM068, CM069, CM070]
2.3 Growth drivers, adoption constraints, and valuation relevance
The demand backdrop is strong enough to justify continued category expansion. IDC expects AI-supporting-technology spend to reach USD 337 billion in 2025 and exceed USD 749 billion by 2028, while Confluent reports that 90% of surveyed IT leaders are increasing data-streaming investment and 44% report 5x ROI. Deloitte also says worker access to AI rose 50% in 2025 and that the number of companies with 40% or more of projects in production is set to double in six months. But the same sources show the friction that matters for Databricks underwriting. Deloitte says only one in five companies has mature governance for autonomous AI agents and that enterprises feel less prepared in infrastructure, data, risk, and talent than they do strategically. McKinsey highlights security, inaccuracy, cybersecurity, and training gaps as major barriers. FinOps shows AI-spend governance moving up the priority stack, and CIO argues that outdated data estates still cannot feed AI systems well. The EU AI Act and NIST AI RMF reinforce that governed deployment, not raw experimentation, sets the pace for high-consequence use cases. Competition is also intense: Snowflake’s 2026 results show strong incumbent momentum and continued customer budget rationalization. For valuation, that means Databricks benefits from large secular demand, but durable upside still depends on converting governed pilots into repeatable production budgets faster than peers and substitutes.[CM038, CM039, CM040, CM041, CM042, CM043]
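The IDC figures cited above imply a growth rate that can be checked with the standard compound-annual-growth formula. This is a minimal sketch: `implied_cagr` is a hypothetical helper, and it treats the "exceed USD 749 billion" endpoint as exactly 749, which slightly understates the implied rate.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values, returned as a fraction."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# IDC AI-supporting-technology spend in USD billions, 2025 to 2028 (three years).
IDC_CAGR = implied_cagr(337.0, 749.0, 2028 - 2025)

if __name__ == "__main__":
    print(f"Implied IDC envelope CAGR 2025-2028: {IDC_CAGR:.1%}")
```

The result is a little over 30% a year. The same helper can be pointed at any pair of rows in the market-sizing table to test whether a publisher's stated CAGR matches its own endpoints.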
| segment/category | included spend | excluded spend | buyer/payer | relevance |
|---|---|---|---|---|
| Unified lakehouse platform | Core platform spend for storage, processing, SQL, governance, and shared data infrastructure on a lakehouse architecture | Generic object storage, unmanaged compute, and point ETL spend with no Databricks workload | CIO, CDO, data-platform owner, and central IT budgets | Primary direct category for Databricks core platform land motions |
| Governed analytics and BI | Databricks SQL, AI/BI, semantics, dashboards, and conversational analytics on governed enterprise data | Standalone BI seat licenses or reporting spend that never attaches to Databricks data and semantics | Analytics leaders, finance operations, business intelligence teams, and shared data budgets | Direct expansion vector from technical data teams into business-user workflows |
| AI, ML, and agent development on enterprise data | Model development, agent evaluation, vector search, serving, and guardrailed GenAI workloads tied to enterprise data | Foundation-model API spend or generic inference spend that never uses Databricks data pipelines or governance | CTO, VP Engineering, ML platform lead, product engineering, and data science budgets | High-growth adjacency that broadens Databricks beyond classic analytics |
| Public-sector data modernization | Agency data integration, compliance analytics, secure data sharing, and higher-education or government AI use cases | Generic systems integration, public-cloud migration, or consulting spend that does not land on Databricks workloads | Agency CIOs, program leaders, data offices, procurement, and education administrators | Distinct vertical motion with procurement and compliance-heavy sales dynamics |
| Status-quo substitute stack | Legacy warehouses, fragmented ETL pipelines, point BI tools, self-managed Spark, and internal AI tooling that can be displaced | Net-new AI infrastructure spending that does not replace an existing workflow or data stack | Existing IT and analytics budget owners defending incumbent tools | Main source of displacement budget and the clearest practical substitute set |
| Broad AI-supporting technology envelope | AI-supporting software, infrastructure, and cloud renovation counted by broad macro forecasts | Any assumption that all AI-supporting spend converts into Databricks revenue | CIO, CTO, cloud platform owner, and transformation budget pools | Useful outer bound for demand context, but too broad to call Databricks SAM |
The boundary starts from Databricks monetization surfaces: lakehouse infrastructure, governed analytics, AI workloads on enterprise data, and public-sector modernization. Generic cloud infrastructure and broad AI-enablement spend remain context, not direct market size.
[CM053, CM054, CM055, CM056, CM057, CM058]
| driver/constraint | direction | timing | implication | diligence ask |
|---|---|---|---|---|
| AI-supporting technology spend expansion | up | 12-36 months | Expands the macro budget pool for governed data and AI platforms | Request management’s internal split of revenue exposure to core lakehouse, BI, and GenAI workloads. |
| Shift from experimentation to reinvention | up | current | Supports broader platform decisions that combine data, infrastructure, and AI rather than buying point tools | Request win-rate data versus point solutions and status-quo internal builds. |
| Real-time data and streaming ROI | up | current | Improves the case for unified platforms that can feed AI with fresh, trusted data | Request attach rates between streaming, lakehouse, and AI workloads on Databricks. |
| Large-enterprise budget concentration | up | current | Favors Databricks because the category is still enterprise-heavy and governance-heavy | Request Databricks revenue mix by enterprise size and average expansion path. |
| Public-sector and education modernization | up | 12-24 months | Creates vertical demand where secure sharing and compliance matter more than greenfield AI hype | Request public-sector pipeline, contract sizes, and procurement-cycle benchmarks. |
| Autonomous-agent governance immaturity | down | current | Slows rollout of higher-consequence AI workflows even when experimentation is widespread | Request production deployment counts for governed agents versus proofs of concept. |
| Regulatory compliance timeline | down | 12-24 months | EU and trust frameworks raise the cost of deploying AI into sensitive workflows without controls | Request product roadmap evidence for AI transparency, monitoring, and high-risk use-case support. |
| FinOps scrutiny and AI-spend governance | down | current | Budget owners are pushing harder on unit economics, forecasting, and policy before scaling spend | Request margin and payback assumptions for AI workloads, especially serverless and model-serving usage. |
| Legacy data modernization backlog | down | current | Organizations still need data cleanup, governance, and modernization before AI budgets convert into durable platform spend | Request implementation timelines, migration blockers, and professional-services dependency. |
| Incumbent competition and budget rationalization | down | current | Strong rivals and budget scrutiny can extend sales cycles and reduce contract duration confidence | Request win/loss data versus Snowflake, cloud-native substitutes, and internal platform teams. |
The key underwriting question is not whether AI demand exists, but whether Databricks can turn broad demand into governed, retained, multiyear production spend faster than the friction builds.
[CM038, CM039, CM040, CM041, CM042, CM043]
Databricks adoption usually starts with data-platform modernization, then expands into governed analytics and AI, but production scale waits on procurement, governance, and budget proof.
[CM054, CM055, CM060, CM061, CM062, CM071]
2.4 Exhibits
03 Competitors
3.1 Competitive landscape and buyer alternatives
Databricks no longer competes only against classic cloud data warehouses. Its own platform framing spans integration, storage, processing, governance, sharing, analytics, and AI across major clouds, which means the relevant set includes direct data-platform peers, incumbent cloud suites, adjacent streaming vendors, and the status quo of self-managed open-source stacks. Snowflake remains the closest direct peer because it sells a managed cross-cloud platform for analytics and AI with its own governance and consumption model. BigQuery, Microsoft Fabric, and AWS Redshift are the largest incumbent alternatives because each can absorb part of the same enterprise budget through an existing cloud relationship, then extend from analytics into AI and governance. Confluent overlaps more narrowly around streaming and real-time processing, but it competes for upstream data architecture decisions that can reduce warehouse or lakehouse spend. The final substitute set remains powerful: self-managed Spark, Trino, and other internal-build combinations let skilled platform teams avoid some vendor spend entirely, even if they accept more operational burden. That means Databricks is competing against platform vendors, cloud bundles, and internal build paths at the same time.[CP001, CP002, CP011, CP014, CP024, CP029]
Ordinal positioning of major alternatives by open multi-cloud posture and breadth of end-to-end enterprise data and AI workflow coverage.
Axes are evidence-backed ordinal scores derived from reviewed platform, governance, and pricing surfaces rather than a published market dataset.
[CP002, CP003, CP014, CP024, CP029, CP036]
3.2 Direct peers, incumbents, and adjacent challengers
Snowflake is the clearest direct incumbent because it is already scaled with more than 13,300 customers, 733 customers spending over $1 million annually, and 790 Forbes Global 2000 customers. It differs from Databricks in architecture and economics: Snowflake is a managed public-cloud service with separate storage, compute, and cloud-services layers, while Databricks leans on lakehouse architecture, open-source lineage, and broader data-engineering-to-AI workflow claims. BigQuery is a more cloud-native substitute than a like-for-like company peer, but it matters because Google can pair a serverless analytics product with a large cloud sales motion and growing Apache Iceberg support. Microsoft Fabric is the most important bundle-led entrant: it packages data engineering, warehousing, Power BI, and Copilot-led workflows in a SaaS environment on OneLake, with Purview-backed governance and Azure procurement leverage. AWS Redshift remains a formidable incumbent where customers already standardize on S3, SageMaker, and AWS operations. Confluent is narrower, yet strategically relevant, because real-time data pipelines and Flink-based pre-processing can capture value before data ever reaches a Databricks or Snowflake warehouse. Together, these alternatives show that Databricks wins when buyers want one governed multi-workload platform, but loses some advantage when the customer already lives inside a hyperscaler bundle or only needs a focused component.[CP008, CP009, CP010, CP011, CP014, CP016]
| Competitor | Category | Scale / funding | Target segment | Primary differentiation | Key limitation vs Databricks |
|---|---|---|---|---|---|
| Databricks | Reference platform | $10B Series J expected financing at $62B valuation; 500+ $1M ARR-run-rate customers; 15,000+ organizations | Enterprise data engineering, analytics, governance, and AI teams | Multi-cloud lakehouse plus unified governance, AI/BI, and open-format posture | Public realized pricing and net retention by workload remain undisclosed; open formats reduce hard lock-in |
| Snowflake | Direct incumbent | $1.23B Q4 FY26 product revenue; 733 $1M+ customers; 790 Forbes Global 2000 customers | SQL-led enterprise analytics, data sharing, and AI workloads | Large installed base with strong managed-service simplicity and cross-cloud reach | Compute-credit model and managed-service orientation make it less open-source-native than Databricks |
| Google BigQuery | Incumbent cloud platform | Google Cloud revenue reached $12.0B in Q4 2024 | GCP-centric analytics, AI, and lakehouse buyers | Serverless analytics plus managed Apache Iceberg support and Google AI distribution | Best fit is strongest inside Google Cloud buying relationships rather than as a neutral multi-cloud control plane |
| Microsoft Fabric | Incumbent bundled suite | Microsoft Intelligent Cloud revenue reached $29.9B in FY25 Q4 | Power BI, Azure, and business-user-centric analytics estates | End-to-end SaaS analytics with OneLake, Copilot, and Purview-backed governance | Microsoft ecosystem gravity is an advantage for Fabric but also makes it less cloud-neutral than Databricks |
| AWS Redshift | Incumbent warehouse / lakehouse substitute | AWS segment sales reached $107.6B in 2024 | AWS-native data warehousing, S3-lakehouse, and AI workloads | Low-entry pricing, deep AWS integration, zero-ETL, and S3/SageMaker adjacency | Orientation is still AWS-centered and SQL-warehouse-led rather than a neutral data-to-AI control plane |
| Confluent Cloud + Flink | Adjacent real-time challenger | $922.1M FY2024 subscription revenue; $963.6M total revenue | Streaming-first engineering and real-time AI/data teams | Unified Kafka + Flink stack can shift transformations left before warehouse spend occurs | Not a full warehouse / BI / semantic-governance platform for broad enterprise analytics |
| Self-managed Spark / Trino | Status quo / internal build | Open-source software; infrastructure and staffing borne internally | Skilled platform teams with strong infra control requirements | Maximum engine choice and avoidance of platform license lock-in | Operational burden, security, governance, and user enablement fall back on the customer |
Rows compare the main ways a buyer can solve the same broad enterprise data-and-AI job, including direct peers, incumbent suites, adjacent streaming vendors, and internal build.
[CP005, CP008, CP009, CP010, CP011, CP020]
3.3 Pricing, packaging, switching cost, and multi-homing
Pricing structure is one of the main reasons this market remains multi-homed. Databricks prices on a pay-as-you-go basis with per-second billing and committed-use discounts, but its public materials emphasize model structure more than a single simple list price. Snowflake is more explicit about billing mechanics: storage, compute, and data transfer are distinct, with compute metered in credits and even a small standard warehouse consuming 2 credits per hour. BigQuery exposes a similarly transparent structure through per-TiB scanning and slot-hour commitments. Fabric turns the buying decision into shared Capacity Units and reservation savings, while still preserving Power BI licensing nuances that can favor Microsoft-centric rollouts. Redshift sets clear low-entry serverless and provisioned starting prices and can ride existing AWS enterprise commitments. Confluent uses usage-based Kafka and Flink units that are attractive when streaming, not warehousing, is the center of gravity. The consequence is that Databricks' switching costs are real only after the platform owns governance, semantics, and multiple workload types. Before that, multi-homing is rational: enterprises can keep Snowflake for SQL-heavy sharing, BigQuery for GCP-resident analytics, Fabric for Power BI-heavy teams, Redshift for AWS-native warehousing, and Confluent for streaming while still using Databricks for engineering or AI.[CP003, CP005, CP006, CP007, CP017, CP018]
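The metering differences described above can be made concrete with a minimal sketch. It uses only the public list figures quoted in this chapter (BigQuery on-demand at $6.25 per TiB above the first free TiB each month, a small Snowflake warehouse at 2 credits per hour with a 60-second minimum per start, Redshift serverless from $1.50 per hour billed per second). The function names are hypothetical, the Snowflake result is in credits because the credit price depends on edition and contract, and the Redshift rate is treated as a flat entry rate, which is an assumption.

```python
def bigquery_on_demand_cost(tib_scanned, price_per_tib=6.25, free_tib=1.0):
    """On-demand scan cost in USD for one month; the first free TiB is not billed."""
    billable_tib = max(tib_scanned - free_tib, 0.0)
    return billable_tib * price_per_tib


def snowflake_warehouse_credits(run_seconds, credits_per_hour=2.0, minimum_seconds=60):
    """Credits consumed by one warehouse start, with per-second billing after the minimum."""
    billed_seconds = max(run_seconds, minimum_seconds)
    return credits_per_hour * billed_seconds / 3600


def redshift_serverless_cost(active_seconds, hourly_rate=1.50):
    """USD cost while active, assuming the quoted entry rate and per-second billing."""
    return hourly_rate * active_seconds / 3600


if __name__ == "__main__":
    # Illustrative workloads only; none of these volumes come from the report.
    print("BigQuery, 41 TiB scanned (USD):", bigquery_on_demand_cost(41))
    print("Snowflake, 30-second query (credits):", snowflake_warehouse_credits(30))
    print("Redshift serverless, 20 minutes active (USD):", redshift_serverless_cost(1200))
```

The three meters are not directly comparable (bytes scanned, credits, and capacity-hours), which is one reason outsiders struggle to benchmark platforms on list price alone.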
| Platform | Price / unit / contract model | Included capabilities | Discount / unknowns | Implication |
|---|---|---|---|---|
| Databricks | Pay-as-you-go, per-second billing; committed-use contracts available | Unified data, analytics, governance, SQL, AI, and AI/BI surfaces | Exact realized net pricing varies by SKU and cloud; public page stresses structure more than a single headline price | Flexible but harder for outsiders to benchmark precisely against simpler warehouse tariffs |
| Snowflake | Storage + compute + data transfer; compute uses credits; small standard warehouse = 2 credits/hour | Managed SQL analytics and AI platform with separate virtual warehouses | Credit price depends on edition / cloud agreement; per-second billing has 60-second minimum per start | Transparent meter design but forecasting requires credit discipline |
| BigQuery | On-demand $6.25 per TiB scanned or capacity pricing per slot-hour with editions | Serverless analytics, reservations, autoscaler, and AI features in BigQuery editions | Actual cost depends on bytes scanned or slot commitments | Easy low-friction entry but cost can jump with inefficient scans or sustained slot demand |
| Microsoft Fabric | Shared Capacity Units via pay-as-you-go or reservation; ~41% reservation savings cited | Data engineering, warehousing, BI, AI experiences, and OneLake in one SaaS environment | Publish/share workflows still often require Power BI Pro; some value depends on existing Microsoft contracts | Bundle economics are strong in Microsoft estates even when pure feature-by-feature comparison is debatable |
| AWS Redshift | Provisioned from $0.543/hour or serverless from $1.50/hour; RPU-hour billing per second | Warehouse, S3-lakehouse querying, zero-ETL, and AI integrations | Reservations can cut serverless compute cost up to 45%; exact TCO depends on S3 and adjacent AWS usage | Low entry point and AWS commitment leverage create a credible pricing wedge |
| Confluent Cloud + Flink | Kafka autoscaling via eCKUs; Flink priced by CFUs per minute; annual commitments available | Streaming, schema, governance, and serverless Flink processing | Warehouse and BI spend still sits elsewhere; price advantage depends on streaming-first architecture | Attractive when teams want to transform or filter data before paying downstream warehouse costs |
This table compares public billing mechanics and packaging posture rather than negotiated enterprise net price.
[CP005, CP006, CP017, CP018, CP019, CP025]
3.4 Moat durability, open formats, and adverse evidence
Databricks' strongest differentiation still comes from combining open lakehouse positioning, governed data-and-AI workflows, and enough product breadth to span engineering and business users. Unity Catalog is the centerpiece because it expands beyond permissions into lineage, semantics, business metrics, and open-format governance, while AI/BI reduces one classic weakness versus Microsoft by offering native dashboards and conversational analytics without per-seat BI pricing. But the adverse evidence is material. Databricks itself is pushing Iceberg REST catalog support and external-engine interoperability, which is strategically smart but also lowers proprietary lock-in. BigQuery now has managed Iceberg support, Redshift highlights open formats and Iceberg-compatible access through the AWS lakehouse, and Snowflake has responded with its own Iceberg and Open Catalog posture. Fabric raises a different kind of threat: even if it is not as cloud-neutral as Databricks, it can use Microsoft procurement, Power BI distribution, and Copilot familiarity to win pragmatic standardization decisions. The upshot is that Databricks still appears better positioned than most single-product rivals, but its moat is no longer format lock-in. It depends on whether Databricks can remain the best governed control plane across open data, AI assets, and business semantics faster than cloud bundles commoditize the underlying infrastructure.[CP003, CP004, CP007, CP012, CP013, CP027]
| Buying criterion | Databricks | Snowflake | BigQuery | Fabric | Redshift | Confluent / internal build |
|---|---|---|---|---|---|---|
| Cross-cloud neutrality | Strong | Strong | Partial | Partial | Partial | Internal build = strong; Confluent = medium |
| Governed open-table posture | Strong | Medium | Medium | Medium | Medium | Internal build = medium |
| Business-user BI in same platform | Strong | Partial | Partial | Strong | Partial | Weak |
| Streaming / real-time depth | Medium | Partial | Partial | Medium | Medium | Strong |
| Warehouse simplicity / low-admin path | Medium | Strong | Strong | Strong | Strong | Weak |
| Open-source / engine portability | Strong | Medium | Medium | Medium | Medium | Strong |
Cells are ordinal summaries of reviewed public product, docs, and pricing surfaces; they do not imply identical feature depth or operational maturity across categories.
[CP002, CP003, CP007, CP012, CP013, CP016]
| Moat claim | Threat | Severity | Mitigation / diligence ask |
|---|---|---|---|
| Unity Catalog governance spans data and AI assets | Snowflake, BigQuery, Fabric, and AWS are all improving governance around open formats and shared catalogs | High | Ask for win-loss data where governance breadth alone displaced a bundled incumbent |
| Open-format leadership reduces buyer lock-in fears | The same openness also lowers Databricks-specific switching costs once Iceberg interoperability becomes table stakes | High | Test whether customers standardize on Databricks as control plane even when compute engines remain multi-homed |
| AI/BI reduces need for separate BI tooling | Fabric can bundle Power BI, Copilot, and Microsoft procurement into a simpler executive purchase | High | Request attach-rate and expansion data for AI/BI versus Microsoft-centric accounts |
| Multi-cloud posture broadens buyer pool | Hyperscalers can still use existing cloud spend commitments and service adjacency to narrow evaluation scope | Medium | Review large-account win rates by incumbent cloud home and by regulated vertical |
| Open-source lineage supports internal credibility with engineers | Self-managed Spark, Trino, and stream-processing stacks remain credible for teams willing to absorb ops burden | Medium | Quantify how many large customers graduate from internal build to paid Databricks versus remaining self-managed |
The main risk is not one superior point competitor but converging bundle and interoperability pressure across the stack.
[CP003, CP007, CP012, CP013, CP027, CP030]
Visual summary of how Databricks and the main retained alternatives cover the buying criteria most relevant to an enterprise lakehouse decision.
Matrix cells summarize public product positioning and architecture evidence; they intentionally avoid unsupported claims about private feature adoption or implementation quality.
[CP003, CP007, CP013, CP024, CP027, CP029]
Ordinal scorecard of the competitive dimensions most likely to determine whether Databricks can defend share as the market converges around open formats and incumbent bundles.
Scores are analyst-derived ordinal judgments based on reviewed public evidence; they are not audited market benchmarks.
[CP003, CP007, CP012, CP013, CP030, CP044]
3.5 Exhibits
04 Financials
4.1 Revenue model, monetization surfaces, and public traction quality
Databricks now looks financially more like a broad consumption platform than a single analytics SKU. The reviewed public pricing and product surfaces show multiple monetization entry points: data engineering and warehousing compute, AI and model-serving workloads, AI/BI, and newer database products. The important underwriting distinction is that these are usage-based streams, not seat-based subscriptions. Databricks and Microsoft each describe DBU-driven billing with per-second granularity, while Microsoft also makes explicit that Azure customers pay both VM infrastructure charges and DBU platform charges. That dual-bill structure matters because public traction claims look strong but are not equivalent to realized software gross profit. On the traction side, company and independent sources line up on a sharp expansion path from $2.6 billion of recognized fiscal-2025 revenue to a $3.7 billion annualized rate by July 2025, then to $4.0 billion in September 2025 and $5.4 billion by February 2026. AI has become a material second engine rather than a marketing overlay: Databricks said AI products crossed $1.0 billion annualized revenue in September 2025 and $1.4 billion by February 2026, while CRN separately noted data warehousing still exceeded a $1.0 billion revenue run-rate. That is a healthier mix signal than a single narrowly defined warehouse story.[CI001, CI002, CI003, CI004, CI005, CI006]
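The trajectory above is easier to judge once the step-ups and the AI share are worked out. The sketch below uses only the figures disclosed in this section. Two caveats are assumptions worth flagging: the first point is recognized fiscal-year revenue while the rest are annualized run-rates, so the first step is not like-for-like, and the AI figures are "crossed" thresholds, so the shares are approximate.

```python
# Disclosed figures from this section, in USD billions.
points = [
    ("FY2025 recognized revenue", 2.6),
    ("Jul 2025 run-rate", 3.7),
    ("Sep 2025 run-rate", 4.0),
    ("Feb 2026 run-rate", 5.4),
]
# AI product run-rate at the two dates where it was disclosed.
ai_run_rate = {"Sep 2025 run-rate": 1.0, "Feb 2026 run-rate": 1.4}


def step_growth(series):
    """Return (label, percent change versus the previous point) for each step."""
    return [
        (label, round((value / prev_value - 1) * 100, 1))
        for (_, prev_value), (label, value) in zip(series, series[1:])
    ]


def ai_share(series, ai):
    """Return AI run-rate as a percent of total run-rate where both are disclosed."""
    totals = dict(series)
    return {label: round(ai_value / totals[label] * 100, 1) for label, ai_value in ai.items()}


growth = step_growth(points)
shares = ai_share(points, ai_run_rate)

if __name__ == "__main__":
    for label, pct in growth:
        print(f"{label}: {pct:+.1f}% vs prior point")
    for label, pct in shares.items():
        print(f"{label}: AI is about {pct:.1f}% of total")
```

On these inputs AI sits at roughly a quarter of total run-rate at both dates, which suggests AI has been growing broadly in line with the platform rather than displacing it.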
| Revenue stream | Mechanism | Unit | Current value / status | Revenue quality | Diligence ask |
|---|---|---|---|---|---|
| Core data engineering compute | Jobs, all-purpose, and serverless workloads billed through DBUs and attached infrastructure usage | DBU-hour plus cloud infrastructure | Core monetization surface remains active and disclosed on pricing pages | Medium; pricing mechanics are public but realized net rates are not | Request workload-level realized price per DBU and gross margin by compute class. |
| Databricks SQL / warehousing | Serverless SQL and related warehouse compute | DBU-hour | Greater than $1B revenue run-rate by Q3 2025 | Medium; strong disclosed run-rate but not audited revenue recognition | Request warehousing revenue mix, warehouse attach, and margin by deployment mode. |
| AI products | Model serving, AI Gateway, agent and model tooling on governed data | DBU-hour and payload-based usage | Crossed $1B run-rate in Sep 2025 and $1.4B by Feb 2026 | Medium; multiple corroborating sources but still company-led disclosure | Request AI revenue split between serving, tooling, and partner-model pass-through. |
| AI/BI | Native BI and conversational analytics embedded into the platform without per-seat BI licensing | Usage-based platform consumption | Publicly positioned as no per-seat or per-license BI fee | Medium; packaging is public but standalone revenue is undisclosed | Request AI/BI attach rate, user mix, and realized monetization per active account. |
| Lakebase / database | Serverless Postgres and database serverless compute for AI agents | Database compute and storage usage | Strategic expansion area accelerated with 2026 financing; revenue not disclosed | Low; new product with no public revenue contribution disclosed | Request current bookings, customer count, and cost-to-serve by Lakebase workload. |
| Professional services / support | Implementation, migration, and support services attached to platform deals | Services and support fees | Publicly unsegmented | Low; no public breakout | Request services mix, gross margin, and whether services are strategic or break-even. |
Rows separate public monetization mechanisms from what remains undisclosed. Usage-based surfaces are visible; realized revenue mix is not.
[CI001, CI002, CI005, CI006, CI007, CI008]
| Offer / comparator | Price unit / contract | Public list / billing signal | Discounts / unknowns | Source |
|---|---|---|---|---|
| Databricks core platform | Pay-as-you-go DBUs | No up-front cost and per-second granularity | Realized net pricing and enterprise discount bands undisclosed | Databricks pricing page |
| Azure Databricks commitments | 1-year or 3-year DBCU pre-purchase | Up to 37% savings versus pay-as-you-go DBUs | Savings apply to DBUs, not the full underlying cloud bill | Microsoft Azure pricing |
| Databricks AI/BI | Embedded platform usage rather than BI seats | No per-seat or per-license BI fees | Actual monetization path and attach rate undisclosed | Databricks AI/BI page |
| Snowflake | Compute and storage with list-price tables / calculator | Managed elastic compute plus separate storage pricing | Capacity storage discounts require contract tables not shown on marketing page | Snowflake pricing |
| BigQuery | On-demand per TiB or reserved slots | On-demand analysis is $6.25 per TiB above the first free TiB monthly; capacity uses slots | Realized enterprise discounts vary by commitment and edition | Google BigQuery pricing |
| Amazon Redshift | Provisioned or serverless RPUs | Serverless starts at $1.50 per hour and is billed per-second while active | Reservations reduce cost but create commitment structure and separate transfer/storage charges | Amazon Redshift pricing |
The table mixes Databricks list mechanics with comparator monetization structures to show how usage-based data-platform economics are actually bought. It does not attempt to estimate Databricks realized net revenue per workload.
[CI001, CI003, CI026, CI027, CI028, CI029]
Databricks converts enterprise adoption into revenue through usage-based DBUs and adjacent AI / database services, but customer bills still include separate cloud infrastructure charges.
[CI001, CI002, CI004, CI013, CI017, CI037]The cleanest source-backed late-2025 to early-2026 revenue range is Databricks annualized revenue run-rate.
The figure uses three disclosed run-rate points over two quarters. Mid is a disclosed December 2025 point, not a statistical midpoint.
[CI009, CI013]
4.2 GTM motion and the public unit-economics proxies that actually exist
Databricks still does not publish CAC, payback, quota efficiency, or sales-cycle duration, so the right approach is to lean on public expansion proxies rather than fabricate SaaS precision. The best signals come from spending cohorts and retention. CNBC reported that Databricks had net retention above 140% in June 2025, nearly 50 customers spending more than $10 million annually in the first quarter of the new fiscal year, and roughly 8,000 employees while continuing to hire aggressively. By February 2026 Databricks and CRN were both citing more than 800 customers above $1 million in annual revenue run-rate and more than 70 above $10 million. Those cohorts strongly suggest a land-expand motion with material cross-sell potential across engineering, warehousing, BI, and AI workloads. Sacra also estimates average contract value around $208,696 as of June 2024, which is directionally useful but not a substitute for booked ARR disclosures. The chapter should therefore treat Databricks sales efficiency as promising but only partially observable: public evidence supports strong expansion within existing enterprise accounts, but it does not reveal the customer-acquisition cost, discounting intensity, or time-to-productivity needed for a full payback model.[CI017, CI018, CI021, CO023, CI023, CI045]
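One thing the disclosed cohorts do support is a floor on how much of the run-rate sits in large accounts. The sketch below is a lower bound only: it assumes the $10M+ customers are a subset of the $1M+ cohort (which follows from the thresholds), prices every account at exactly its threshold, and compares the result to the $5.4 billion February 2026 run-rate. Actual spend per account is undisclosed and is necessarily higher than these floors.

```python
# Disclosed February 2026 figures; counts are "more than", so they are floors too.
RUN_RATE_BN = 5.4
CUSTOMERS_1M_PLUS = 800
CUSTOMERS_10M_PLUS = 70  # assumed to be included in the 800


def cohort_floor_bn(count_1m, count_10m):
    """Minimum run-rate (USD bn) implied by the cohort thresholds."""
    floor_10m = count_10m * 10 / 1000              # each at least $10M
    floor_1m_only = (count_1m - count_10m) * 1 / 1000  # the rest at least $1M
    return floor_10m, floor_10m + floor_1m_only


floor_10m_bn, floor_1m_bn = cohort_floor_bn(CUSTOMERS_1M_PLUS, CUSTOMERS_10M_PLUS)
share_10m_pct = round(floor_10m_bn / RUN_RATE_BN * 100, 1)
share_1m_pct = round(floor_1m_bn / RUN_RATE_BN * 100, 1)

if __name__ == "__main__":
    print(f"$10M+ cohort: at least ${floor_10m_bn:.2f}B, or {share_10m_pct}% of run-rate")
    print(f"$1M+ cohort: at least ${floor_1m_bn:.2f}B, or {share_1m_pct}% of run-rate")
```

Even at these floors, the 70-plus largest accounts represent at least about 13% of run-rate, which is why top-account concentration belongs on the diligence list.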
| Metric | Value | Confidence | Why it matters | Diligence ask |
|---|---|---|---|---|
| Recognized revenue for fiscal year ended Jan 2025 (USD billions) | 2.6 | medium | Anchors run-rate claims in at least one reported fiscal-year revenue datapoint. | Confirm audited GAAP revenue, deferred revenue, and revenue recognition policy by product. |
| Annualized revenue run-rate by Feb 2026 (USD billions) | 5.4 | medium | Shows scale and acceleration entering 2026, but run-rate is not the same as recognized revenue. | Bridge run-rate to booked and recognized revenue by quarter. |
| Net revenue retention | >140% | medium | Indicates strong expansion inside existing accounts and supports a usage-led land-expand thesis. | Provide cohort-level NRR by enterprise segment and by product family. |
| $1M+ annual run-rate customers | 800 | medium | Large high-spend cohort is a practical proxy for enterprise depth and cross-sell durability. | Provide cohort gross retention and gross-margin profile for these accounts. |
| $10M+ annual run-rate customers | 70 | medium | Very large accounts imply strategic embed but also raise concentration questions. | Provide top-10 customer exposure and any hyperscaler/channel overlap. |
| Average contract value proxy (USD) | 208696 | low | Third-party estimate provides directional context for typical deal size outside the very largest cohorts. | Validate against internal ACV / annual spend distribution. |
| Public gross margin | | low | Gross margin is the key missing bridge between strong usage growth and durable cash generation. | Provide audited gross profit by major workload and cloud. |
| Public CAC / payback | | low | Without CAC or payback, sales efficiency cannot be underwritten like a public SaaS company. | Provide blended and enterprise-only CAC, payback, and rep productivity curves. |
| Free cash flow status | Positive over prior 12 months by Sep 2025 and Feb 2026 | medium | Suggests improving operating leverage even without full margin disclosure. | Provide absolute operating cash flow, capex, and free cash flow by quarter. |
Null fields are intentional where Databricks withholds public detail. The table keeps public proxies separate from missing underwriting inputs.
[CI013, CI015, CI017, CI018, CI020, CI021]
| Missing metric | Why it matters | Best public proxy | Exact diligence path |
|---|---|---|---|
| Realized net pricing by workload | List pricing does not reveal net revenue quality or discounting intensity. | Public DBU mechanics and commitment discounts only. | Request top-100 contract sample with list price, discount, cloud, and product mix. |
| Audited gross margin and contribution margin | Growth can look excellent while margin quality deteriorates. | Snowflake and Confluent filings offer comparators; Sacra offers only a low-confidence estimate for Databricks. | Request audited gross profit and infrastructure cost allocation by product family. |
| Cash balance and debt terms | Capital adequacy cannot be modeled without exact liquidity and obligations. | CNBC says Databricks has billions in cash and about $2B of debt capacity. | Request closing cash, debt docs, maturity schedule, and covenant summary. |
| Monthly burn and runway | Runway is the basic underwriting test for a private company. | Positive free cash flow signals lower stress but not a full runway model. | Request trailing 18-month monthly cash bridge and downside runway scenarios. |
| CAC, payback, and sales-cycle length | Late-stage software underwriting needs a go-to-market efficiency view. | NRR >140%, 800+ $1M customers, and 70+ $10M customers indicate strong expansion but not acquisition efficiency. | Request cohort CAC, payback, pipeline conversion, and rep productivity by segment. |
| Customer concentration and top-account exposure | Very large cohorts can hide dependence on a few strategic accounts or channels. | Public sources show 70+ customers above $10M annualized spend but no top-customer concentration. | Request revenue concentration, top-20 account trends, and hyperscaler / marketplace channel overlap. |
These are the core blockers to turning a strong public growth narrative into a full investment-grade financial model.
[CI014, CI017, CI018, CI021, CI044]
Public unit-economics evidence is strongest on expansion behavior and weakest on acquisition efficiency and margin disclosure.
[CI017, CI018, CI021, CO023, CI045]
4.3 Cost structure, gross-margin path, and why dual billing matters
The cleanest public margin insight is not a Databricks audited statement but the mechanics of the platform and comparator filings. Databricks itself says total cost of ownership spans two components: direct platform costs and the underlying cloud infrastructure costs needed to run workloads. Microsoft adds the operational detail that Azure Databricks customers are billed for both VMs and DBUs, that idle pools can still incur infrastructure billing, and that committed-use purchases lower DBU prices but do not eliminate the cloud bill. That means Databricks margin quality will depend on software take-rate, workload mix, negotiated hyperscaler economics, and how much new AI serving and database usage compresses margin before scale catches up. Snowflake’s 2026 10-K is a helpful upper-bound comparator: product gross margin was 72% even after $248.1 million of additional third-party cloud infrastructure expense, including AI inference. Confluent’s filing is the cautionary counterexample: it says public-cloud pricing materially affects gross margins, that Confluent Cloud historically carried a lower average price than its legacy platform, and that the company shifted toward free-trial and pay-as-you-go land motions with more near-term volatility. Independent Databricks-specific analysis points in the same direction. CloudForecast, Mammoth, and Revefi all highlight that DBU pricing plus separate cloud charges make spend harder to predict, especially as AI workloads spike. The implication is that Databricks could still have attractive software economics, but margin underwriting remains incomplete without audited gross-profit and operating-cash detail.[CI002, CI003, CI004, CI005, CI006, CI007]
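The dual-bill mechanics can be illustrated with a small sketch. Only the structure comes from the sources above: customers pay VM infrastructure plus DBUs, and commitment savings (up to 37% on Azure, per the pricing table earlier in this chapter) apply to DBUs only. Every rate and volume in the example is hypothetical and chosen for round numbers; the function name is also an assumption.

```python
def azure_databricks_bill(hours, vm_rate, dbus_per_hour, dbu_price, dbu_discount=0.0):
    """Split a bill into VM and DBU components; the discount touches DBUs only."""
    vm_cost = hours * vm_rate
    dbu_cost = hours * dbus_per_hour * dbu_price * (1 - dbu_discount)
    return {"vm": vm_cost, "dbu": dbu_cost, "total": vm_cost + dbu_cost}


# Hypothetical workload: 100 cluster-hours, $1.00/hour of VMs,
# 2 DBUs per hour at a $0.50 list price (illustrative values, not quotes).
pay_as_you_go = azure_databricks_bill(100, 1.00, 2.0, 0.50)
committed = azure_databricks_bill(100, 1.00, 2.0, 0.50, dbu_discount=0.37)

effective_saving = (pay_as_you_go["total"] - committed["total"]) / pay_as_you_go["total"]

if __name__ == "__main__":
    print("Pay-as-you-go:", pay_as_you_go)
    print("With 37% DBU commitment discount:", committed)
    print(f"Saving on the total bill: {effective_saving:.1%}")
```

With the bill split evenly between VMs and DBUs, the headline 37% discount trims the customer's total bill by only 18.5%. The real split varies by workload, which is why realized pricing and cost allocation by compute class remain diligence items.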
Databricks appears less capital-intensive than hardware-heavy AI companies, but the main cash-flow sensitivities are cloud economics, AI workload mix, and hidden debt / liquidity details.
The map is qualitative because Databricks does not publish audited gross margin, capex, or runway figures.
[CI014, CI015, CI032, CI033, CI034, CI037]
4.4 Capital adequacy, financing dependency, and the financial verdict
Public evidence points to low near-term financing stress but still leaves important underwriting holes. Databricks has moved from the December 2024 Series J package, which targeted AI investment, acquisitions, international go-to-market expansion, and employee liquidity, to a September 2025 Series K and then the February 2026 package worth more than $7 billion, including about $5 billion of equity and about $2 billion of additional debt capacity. Combined with public statements that free cash flow was positive over the prior 12 months, that suggests Databricks is financing growth options rather than plugging a disclosed liquidity crisis. CNBC also reported that the company now has billions in cash on hand, but without a precise balance, debt pricing, covenant package, amortization schedule, or monthly burn rate. That is enough to support a forward verdict of strong revenue quality and low immediate capital-intensity risk relative to many late-stage AI companies, but not enough to complete a lender-style or IPO-style liquidity model. The chapter’s practical conclusion is that Databricks appears financially durable in the near term, with multiple growth engines and ample external capital access, yet diligence should still prioritize realized pricing, audited margins, cash and debt schedules, and concentration risk before treating the public run-rate narrative as fully underwritten.[CO028, CI012, CI013, CI014, CI015, CI024]
| Capital metric | Public value / status | Evidence | Underwriting implication | Diligence ask |
|---|---|---|---|---|
| Cash on hand | Exact balance not disclosed | CNBC says Databricks now has billions in cash on hand after the Feb 2026 package, but gives no exact balance. | Liquidity appears ample, but exact cash cannot be modeled. | Request current cash, restricted cash, and post-close liquidity waterfall. |
| Monthly burn | Not disclosed | No public monthly burn disclosed; company instead emphasizes positive free cash flow. | Exact runway cannot be calculated from public data. | Request monthly cash burn bridge and scenario burn under slower growth. |
| Runway months | Not disclosed | No exact cash balance plus no burn rate. | Runway remains an evidence gap despite strong financing access. | Request base, downside, and acquisition-adjusted runway model. |
| Planned use of funds | AI products, acquisitions, international GTM, employee liquidity, Lakebase, Genie | Series J and Feb 2026 company statements specify growth, product, M&A, and liquidity uses. | Capital appears growth-oriented rather than rescue-oriented. | Request board-approved capital plan and 12-24 month deployment budget. |
| Next-round trigger | No immediate public trigger; IPO / private financing appears optional rather than urgent | Positive free cash flow plus >$7B financing package reduce near-term dependency. | Near-term capital risk looks low, but market timing can still shape the path to IPO. | Confirm management trigger points for IPO, debt drawdown, or another private round. |
| Debt / credit obligations | ~$2B additional debt capacity disclosed in Feb 2026; detailed terms undisclosed | Debt broadens flexibility but may embed covenants, pricing, and maturity risk not public. | Forward adequacy depends partly on unseen debt terms. | Request debt agreements, covenants, maturity ladder, and security package. |
This table intentionally focuses on forward liquidity and financing dependency rather than repeating the full historical round chronology already established elsewhere in the report.
[CI014, CI015, CI024, CI025]
4.5 Exhibits
05 Product & Technology
5.1 Product scope and customer workflow coverage
Databricks is best understood as a workflow platform that starts before analytics and extends beyond it. The product surface now spans ingestion and transformation patterns, bronze-silver-gold lakehouse organization, centralized governance, BI consumption, AI model deployment, and operational application databases. LakeFlow matters because it pulls ingestion, transformation, and orchestration closer to the platform rather than leaving those jobs entirely to partners. Unity Catalog and AI/BI matter because they move Databricks from technical-platform ownership toward business-facing semantics, lineage, and conversational analytics. Mosaic AI Model Serving extends that workflow into real-time and batch inference, while Lakebase pushes the platform further into operational application development by pairing Postgres with the lakehouse. The net result is a broader customer journey: ingest and clean data, govern it centrally, expose metrics and dashboards to business users, deploy models and external-model endpoints, and increasingly build applications or agents on top of the same governed data estate. That breadth is strategically valuable because it reduces tool sprawl, but it also means underwriting Databricks requires assessing how coherently these modules work together rather than judging a single warehouse SKU.[CE001, CE002, CE007, CE011, CE012, CE014]
| Module / product line | Primary user | Status / maturity | Differentiation | Diligence gap |
|---|---|---|---|---|
| Core lakehouse + medallion architecture | Data engineers and platform teams | Mature core workflow | Single governed path from raw to enriched data across bronze, silver, and gold layers | Need workload-level evidence on migration friction and performance by cloud. |
| Unity Catalog | Data platform, governance, security, and analytics teams | Mature control-plane pillar | Open-format governance, lineage, federation, and row/column controls across data and AI assets | Need public evidence on adoption depth of newer business-semantics and AI-governance features. |
| Databricks SQL + AI/BI | Analysts, business users, and semantic-layer owners | Mature analytics with expanding business-user reach | Native BI on governed data with conversational analytics and no public per-seat BI fee | Need public proof of production BI adoption, concurrency, and dashboard migration success. |
| Mosaic AI Model Serving | ML engineers, application developers, and platform teams | Mature serving surface with expanding external-model governance | Unified REST deployment and serverless serving for internal and external models | Need independent latency, cost, and guardrail benchmarks versus alternatives. |
| LakeFlow | Data engineering teams | Expanding; launched 2024 and still integrating partner overlap | Built-in ingestion, transformation, and orchestration reduce need for separate pipeline tooling | Need public evidence on connector breadth, reliability, and large-scale production references. |
| Lakebase | Application developers and agent builders | Emerging but materially advanced; GA reported in 2026 | Operational Postgres integrated with the lakehouse, branching, point-in-time recovery, scale to zero | Need customer volume, cost-to-serve, and multi-cloud availability detail. |
| Lakewatch | Security teams and SecOps analysts | Newly launched in 2026 | Extends Databricks data platform into AI-assisted SIEM workflows | Need public benchmarks, customer references, and false-positive / efficacy data. |
| CLI + Python SDK | Developers and platform engineers | Active ecosystem tooling with recent releases | Multi-cloud automation and developer workflows beyond notebooks alone | Need broader usage and contributor trends across the full ecosystem. |
Rows separate mature core platform layers from newer expansion products such as Lakebase and Lakewatch. “Status / maturity” reflects public release evidence, not internal revenue contribution.
[CE002, CE007, CE012, CE022, CE028, CE037]
| User job | Current workflow | Databricks solution | Measurable benefit | Limitation |
|---|---|---|---|---|
| Ingest SaaS and database data | Teams often chain separate ingestion, replication, and orchestration tools before analytics | LakeFlow ingestion, transformation, and orchestration inside Databricks | LakeFlow was launched to reduce the need for bespoke or third-party ingestion stacks | Public evidence does not show connector reliability or realized replacement rates at scale. |
| Create governed enterprise data products | Data lands in fragmented stores and governance tools with duplicated controls | Medallion layers plus Unity Catalog governance and lineage | Unified governance and lineage reduce audit friction and make downstream usage easier to trace | Public proof of governance-operating efficiency is still mostly vendor-authored. |
| Let business users self-serve analytics | BI sits on separate semantic layers and seat-based licensing models | AI/BI Dashboards, Genie, Databricks SQL, and Business Semantics on governed data | No public per-seat BI fee and conversational analytics reduce access friction | Customer migration effort from incumbent BI tools is not publicly quantified. |
| Deploy and manage AI inference | Teams manage separate model endpoints, APIs, and provider credentials | Mosaic AI Model Serving with REST APIs, serverless scaling, and centrally governed external models | Unified batch and real-time inference path simplifies deployment under one control plane | Independent latency, cost, and security comparisons are limited. |
| Build operational apps on governed data | Operational databases and analytics warehouses are separated by ETL and separate tooling | Lakebase adds Postgres integrated with the lakehouse and Databricks Apps | VentureBeat reported early adopters cutting app-delivery times by 75%-95% or 56%-92% depending on customer example | Those performance outcomes are company-reported through press coverage and not yet broadly audited. |
Benefits are only included where the retained sources provided a concrete workflow or outcome statement. Press-reported customer outcomes remain lower confidence than audited benchmark data.
[CE011, CE013, CE020, CE021, CE022, CE023]
Workflow from source-data ingestion to governed analytics, AI deployment, and operational application delivery. The figure emphasizes where Databricks has expanded from core lakehouse roots into pipeline tooling and Postgres-backed apps.
[CE011, CE013, CE020, CE021, CE022, CE028]
Qualitative maturity map across Databricks capability areas. Core governance and lakehouse layers appear mature; BI and AI serving are mature-to-expanding; Lakebase and Lakewatch remain newer lines that still need broader public proof.
The maturity labels are analyst judgments synthesized from public release evidence, documentation depth, and independent reporting; they are not company-provided scores.
[CE004, CE011, CE022, CE028, CE031, CE033]
5.2 Architecture, deployment model, and critical dependencies
The most supportable public architecture picture is a hybrid one: Databricks manages a control plane, while classic compute still runs in customer cloud accounts and serverless compute runs in Databricks-managed infrastructure. Azure documentation and independent architecture analysis both describe the control-plane/compute-plane split, while Databricks' own architecture guidance frames the platform around control plane, compute plane, and storage. On the data path, Databricks keeps pushing the medallion pattern because bronze, silver, and gold layers make it easier to express ingestion, validation, and consumption steps as one governed pipeline. Unity Catalog then acts as the metadata and policy plane above those assets, and model serving exposes governed inference endpoints through REST APIs and serverless scaling. The dependency map is therefore not just hyperscalers. Databricks also depends on open-table-format politics, partner cloud services such as BigQuery and Gemini on Google Cloud, GPU acceleration paths such as NVIDIA RAPIDS, and external model providers such as OpenAI and Anthropic where customers want centrally governed third-party models. This architecture is flexible and differentiated, but it also means product quality depends on how well Databricks manages cloud boundaries, open-format interoperability, and external-service performance under one control plane.[CE003, CE015, CE016, CE017, CE018, CE019]
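The medallion pattern referenced above can be illustrated without any Databricks API. The sketch below uses plain Python lists and dictionaries, with hypothetical field names and records, to show how bronze (raw), silver (validated), and gold (consumption-ready) stages chain into one pipeline. It is a conceptual illustration only, not how Databricks implements these layers.

```python
from collections import defaultdict

# Bronze: raw records as they arrive. Field names are illustrative only.
bronze = [
    {"store": "A", "amount": "10.5"},
    {"store": "B", "amount": "n/a"},   # unparseable value, rejected at silver
    {"store": "A", "amount": "4.5"},
    {"store": None, "amount": "3.0"},  # missing key, rejected at silver
]


def to_silver(rows):
    """Validate and type raw rows; drop records that fail the checks."""
    clean = []
    for row in rows:
        if not row.get("store"):
            continue
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        clean.append({"store": row["store"], "amount": amount})
    return clean


def to_gold(rows):
    """Aggregate validated rows into a consumption-ready summary."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(len(bronze), len(silver), gold)
```

The point for diligence is that the gold layer is only as reliable as the silver-stage checks: if validation is weak, bad upstream records flow straight into the numbers business users consume.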
| Layer / component | Role | Dependency | Risk |
|---|---|---|---|
| Account + control plane | Hosts web app, account services, APIs, and central coordination | Databricks-managed control plane and account services | Control-plane concentration creates blast-radius risk if central services degrade. |
| Workspace + classic compute plane | Runs notebooks, jobs, and customer-managed compute in the customer cloud account | Customer cloud networking, identity, and classic cluster configuration | Security posture varies with workspace design and cloud-account hygiene. |
| Serverless compute | Runs model serving and serverless SQL without customer-managed public IPs | Databricks-managed serverless infrastructure and separate terms enablement | Less customer control and public transparency on service-family SLAs and incident rates. |
| Lakehouse data pipeline | Organizes data through bronze, silver, and gold quality layers | Storage systems, ingestion tools, and medallion discipline | Poor upstream data quality still propagates if silver/gold controls are weak. |
| Unity Catalog metadata plane | Enforces governance, lineage, discovery, and federation | Open formats, external systems, and policy configuration across clouds | Metadata centralization is strategic strength but becomes a control-plane dependency. |
| AI deployment layer | Serves internal and external models via REST APIs and AI Functions | Serverless compute, model registries, and third-party model providers such as OpenAI and Anthropic | Latency, cost, and policy outcomes partly depend on external-model vendors. |
| Open ecosystem + partner cloud layer | Extends Databricks through Iceberg, BigQuery, Gemini, and GPU acceleration | Google Cloud, NVIDIA, and open-table-format interoperability | Differentiation is tied to partner performance and evolving open-format standards. |
| Operational database layer | Runs Lakebase Postgres for AI-agent and operational apps | Neon/Mooncake-derived database tech plus Unity Catalog sync | Newer product category with less public scaling and reliability history than core lakehouse. |
The table emphasizes operating-model dependencies rather than low-level implementation details. Public sources support the control-plane split, medallion pipeline, and partner dependencies but not internal service topology.
[CE015, CE018, CE019, CE020, CE022, CE023]
Five-layer stack showing how Databricks ties user-facing analytics and app workflows to centralized governance, lakehouse pipelines, AI serving, and multi-cloud infrastructure. The architecture is broad, but the control plane and partner ecosystem remain critical dependencies.
[CE003, CE007, CE012, CE018, CE020, CE022]
Directed graph of the major external and internal dependencies that shape Databricks product delivery: hyperscalers, open formats, GPUs, external models, and the central Databricks control plane.
[CE018, CE019, CE024, CE028, CE030, CE034]
5.3 Trust, safety, privacy, compliance, and reliability posture
Databricks has enough public trust material to show serious enterprise posture, but not enough to treat trust as fully de-risked. The Trust Center says security is built into every layer of the platform and publicly points buyers to encryption, network controls, auditing, identity integration, access controls, and data governance. The compliance pages list a wide range of frameworks relevant to regulated buyers, including FedRAMP, HIPAA, GDPR, PCI-DSS, ISO 27001/27017/27018/27701, and SOC, with SOC 3 public and other reports accessible through diligence channels. Serverless SQL adds one concrete architectural trust signal because Databricks says those warehouses have no public IP addresses. At the same time, reliability remains a live operating issue rather than a solved checkbox: on the run date, the AWS status page showed partial compute disruption in multiple regions even while AI/BI and Apps were largely operational. Independent uptime monitors add only limited comfort because they often summarize uptime without publishing detailed incident data or root causes. AI security is also a moving target, not a closed question. Databricks publicly discusses AI security resources and in March 2026 launched Lakewatch as an AI-assisted SIEM product, but there is still little independent evidence on detection quality, false positives, or how responsible-AI controls perform in production environments.[CE025, CE031, CE032, CE033, CE044, CE045]
| Control / certification / quality signal | Status | Scope | Gap |
|---|---|---|---|
| Encryption, network controls, auditing, identity integration, access controls, governance | Publicly documented | Platform-wide trust posture according to Databricks Trust Center | Public pages do not quantify control effectiveness or incident-prevention outcomes. |
| Serverless SQL without public IPs | Publicly documented | Serverless SQL network isolation on AWS | Does not by itself disclose uptime, egress policy coverage, or all serverless-service boundaries. |
| FedRAMP, HIPAA, GDPR, PCI-DSS, ISO 27001/27017/27018/27701, SOC | Publicly listed | Regulated-industry and privacy posture across supported clouds | Framework listing is not the same as customer-specific configuration or scope fit. |
| SOC 3 public; SOC 1 and SOC 2 available via diligence channels; reports refreshed three times yearly | Publicly documented | Audit cadence and report availability | No public SOC detail in the chapter beyond availability and cadence. |
| Live status page by service family and region | Publicly documented | Operational visibility for active incidents | Historical MTTR, severity distribution, and root-cause reporting remain limited. |
| Lakewatch AI-assisted SIEM launch | Recently launched | Extends platform into AI security operations | Independent efficacy evidence and customer deployments remain sparse. |
Trust evidence is strongest for control coverage and compliance breadth, but weakest for quantified reliability and independent AI-security efficacy.
[CE025, CE031, CE032, CE033, CE044, CE045]
5.4 Maturity, differentiation, and roadmap signals
The strongest differentiation signal is that Databricks is trying to be the governed control plane for open data, AI assets, and now operational application data, not just the place where Spark jobs run. Unity Catalog's open-format posture, federation support, and lineage features are central to that strategy, and Google Cloud's Iceberg commentary plus theCUBE's 2025 summit summary both reinforce that openness is a real product direction rather than a one-off talking point. The 2024-2026 launch timeline also shows consistent scope expansion: LakeFlow addressed ingestion and orchestration, the 2025 summit pushed semantics, agent tooling, and Lakebase into the foreground, Lakebase reached general availability in early 2026, and Lakewatch added a new security layer weeks later. Developer signal points the same way. The CLI and Python SDK show active releases in April 2026, and the SDK documentation emphasizes unified support across AWS, Azure, and GCP, which is the footprint expected from a platform company rather than a narrowly packaged application. The public evidence therefore supports a verdict of broad product maturity in core lakehouse, governance, and developer tooling, expanding maturity in BI and AI serving, and emerging maturity in operational database and AI-security products. What remains weak is forward visibility: Databricks publishes active release-note cadence, but not a dated roadmap that would let an external investor cleanly separate near-term launches from longer-horizon ambition.[CE004, CE006, CE009, CE010, CE027, CE035]
| Date / stage | Feature / milestone | Status | Implication | Source |
|---|---|---|---|---|
| 2024-05-14 | NVIDIA publishes RAPIDS-on-Databricks technical guide | Ecosystem capability documented | Signals that Databricks is investing in GPU-accelerated developer workflows, not just CPU-bound analytics. | NVIDIA Technical Blog |
| 2024-06-12 | LakeFlow launch for ingestion, transformation, and orchestration | Launched | Moves Databricks upstream into built-in pipeline tooling and reduces reliance on adjacent vendors. | TechCrunch |
| 2025-06-11 | Data + AI Summit 2025 updates around Unity Catalog semantics, open formats, GenAI tools, and Lakebase | Announced / expanded direction | Shows platform expansion from lakehouse core toward a broader enterprise data-and-AI operating layer. | theCUBE Research |
| 2025-08-30 | Google Cloud blog highlights Unity Catalog Iceberg support across catalogs | Partner-confirmed ecosystem milestone | Reinforces Databricks' open-format and interoperability posture. | Google Cloud Blog |
| 2026-02-03 | Lakebase reported generally available and built on Neon plus Mooncake technology | GA reported by independent press | Makes Databricks relevant for operational apps and agent workflows, not only analytics. | VentureBeat |
| 2026-03-24 | Lakewatch AI security product launched | New product launch | Expands Databricks into SIEM-style security workflows but also introduces a new proof burden. | TechCrunch |
| 2026-04-30 | CLI v0.299.0 and Python SDK 0.106.0 visible in public release surfaces | Active developer-tooling cadence | Developer tooling is shipping quickly enough to matter as a productization signal. | GitHub / PyPI |
| 2026-05-04 | Release-note index current through May 2026 with Lakeflow declarative pipelines and serverless called out | Current release cadence visible | Confirms ongoing platform iteration but not a dated forward roadmap. | Databricks Docs |
The table focuses on externally visible milestones that change product scope or maturity. It does not infer undisclosed future dates beyond public releases and announcements.
[CE027, CE035, CE037, CE038, CE040, CE041]
5.5 Exhibits
06 Customers
6.1 Customer segmentation and visible buyer map
Databricks’ public customer picture is broad, but it is most useful when separated into buying contexts rather than treated as a single logo wall. The company’s own 2025 and 2026 disclosures now anchor scale at more than 20,000 organizations and 70% of the Fortune 500, while CNBC still referenced more than 15,000 customers in mid-2025. That progression matters because it shows both breadth and momentum instead of a one-off marketing number. The visible buyer map is also broader than a core data-engineering sale. Microsoft, Google Cloud, and SAP all position Databricks as a route to enterprise analytics and AI procurement, while named accounts span telecom, payments, media, retail, healthcare, and public-sector style workloads. In practice, Databricks is sold to platform leaders, data and AI teams, governance leaders, and increasingly business users who consume AI/BI and governed analytics. The payer is often a central platform or cloud budget, but partner routes can influence procurement and renewal control. This makes Databricks look like a scaled enterprise platform with multiple buyer centers rather than a single-workload tool.[CU001, CU002, CU003, CU042, CU043, CU044]
| Segment | Buyer / user / payer | Use case | Scale | Revenue / strategic value | Gap |
|---|---|---|---|---|---|
| Large global enterprises | CDO/CIO, platform teams, analysts, governed business users | Unified data, analytics, AI, and agent workloads | 20,000+ customers claimed; 70% Fortune 500 penetration claimed | Largest source of expansion and whale cohorts | No public split between active, paying, and channel-routed customers |
| Consumer and digital experience brands | Marketing, data science, product, operations | Real-time search, fan experience, personalization, store operations | 7-Eleven, FOX Sports, Rivian, Block highlighted publicly | Shows Databricks can move beyond classic back-office analytics | No public contract values or renewal detail by consumer brand |
| Financial services and payments | Data platform, governance, onboarding, fraud, risk teams | AI assistants, secure data collaboration, pipeline optimization | Mastercard and Block explicitly named in current disclosures | High-reference value because governance and privacy matter | No public revenue concentration or procurement-cycle detail |
| Regulated healthcare, manufacturing, and public sector | Manufacturing ops, compliance, data engineering, federal contractors | Operational data unification, reliability, and compliant analytics | Insulet plus FedRAMP/Azure IL5 public-sector readiness | Supports trust-led and regulated-workload expansion | No disclosed public-sector bookings or healthcare retention data |
| Partner-routed enterprise buyers | Cloud architects, enterprise platform teams, SAP data owners | Buy via Azure, Google Cloud Marketplace, or SAP Business Data Cloud | Azure, Google Cloud, and SAP maintain active Databricks routes | Extends distribution and lowers procurement friction for some enterprises | Direct versus partner-sourced customer mix is undisclosed |
Segmentation separates visible buying contexts and channels instead of treating customer proof as a single undifferentiated enterprise bucket.
[CU001, CU002, CU003, CU042, CU043, CU044]
6.2 Named deployment proof is strongest when it includes measurable outcomes
The strongest public customer proof is not the existence of logos, but named deployments with specific outcomes. The July 2025 Databricks summit recap gives the cleanest recent set. 7-Eleven used Databricks for a multipurpose agentic marketing assistant across more than 13,000 stores and also used Databricks workflows to support a Unity Catalog migration. FOX Sports built Cleatus AI to answer fan questions in natural language and said AI-powered search produced a 2x improvement in query success. Mastercard gives both workflow and economics proof: its onboarding assistant was built with Databricks, it uses human-in-the-loop feedback, and Databricks said Mastercard cut query time by 80% and storage by 70% while reducing processing from months to days. AT&T is the best large-enterprise migration proof from a partner domain, with Microsoft documenting 300% five-year ROI, more than 80 schema reductions, and roughly 3x faster data-science cycles on Azure Databricks. Insulet adds healthcare and manufacturing proof, with 12x faster processing and sharply lower data-stack cost. Together these references support real production use across multiple verticals and multiple deployment shapes, not just conference-stage demos.[CU011, CU012, CU013, CU014, CU015, CU016]
| Customer | Segment | Deployment / use case | Production vs pilot | Outcome | Limitation |
|---|---|---|---|---|---|
| 7-Eleven | Retail / store operations / marketing | Agentic marketing assistant, Unity Catalog migration support, technician knowledge retrieval | Production use case presented publicly | Tracks store performance across 13,000+ stores and uses workflows to guide migration steps | No public contract value, renewal data, or ROI disclosure |
| FOX Sports | Media / consumer engagement | Cleatus AI fan assistant with natural-language search over scores, stats, and commentary | Production | AI-powered search more than doubled query success for fans | No disclosed commercial metrics or retention terms |
| Mastercard | Financial services / payments | GenAI onboarding assistant plus data-pipeline optimization and governance | Production | Onboarding sped up, drop-off reportedly declined, query time down 80%, storage down 70%, processing reduced from months to days | Customer economics and renewal details are undisclosed |
| AT&T | Telecom / enterprise data platform | Migration of large data estate onto Azure Databricks plus AutoClassify ML use case | Production | 300% five-year ROI, more than 80 schemas reduced, and data-science cycles about 3x faster | Case study focuses on internal platform value, not external revenue outcomes |
| Insulet | Healthcare / manufacturing | Lakeflow Connect-driven data unification for manufacturing and customer-service data | Production | 12x faster processing and 97% lower TCO, with near-real-time enterprise ingestion | Only company-authored proof is public; no independent outcome audit |
Rows prioritize the clearest named accounts with measurable outcomes or credible corroboration rather than attempting an exhaustive customer roster.
[CU012, CU013, CU014, CU015, CU016, CU017]
Relative strength of public customer proof by named account.
[CU012, CU016, CU019, CU020, CU024, CU026]
6.3 Durability is directionally strong, but public retention proof is still incomplete
Public durability evidence is positive, but it is not complete enough to underwrite Databricks like a fully disclosed public software company. The strongest disclosed signal is the company’s sustained net retention above 140%, repeated in September 2025, February 2026, and corroborated by CNBC and CRN. Large-account cohorts point in the same direction: Databricks moved from 650-plus to 800-plus $1M annual run-rate customers and from nearly 50 to more than 70 $10M customers in the same general period. Those are real expansion indicators. But public retention evidence still stops short of what an investor would ideally want. There is no public GRR, no segmented churn, no cohort waterfall, and no contract-term disclosure. Review surfaces are helpful only as directional color. Databricks’ own Gartner recap points to strong AI/BI satisfaction, while accessible independent review pages also surface consistent complaints about cost control, complexity, and onboarding. The right conclusion is that Databricks shows strong expansion and meaningful product value, but public sources still do not prove renewal durability by cohort or by product line.[CU004, CU005, CU006, CU007, CU008, CU009]
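Why a missing GRR matters can be shown with a short calculation. The sketch below uses the common textbook definitions of net and gross revenue retention, which may differ from the definition Databricks applies internally, and two entirely hypothetical cohorts. Both produce the same net retention while carrying very different underlying churn.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Recurring revenue movements for one customer cohort over a period."""
    starting_arr: float
    expansion: float
    contraction: float
    churn: float

    def nrr(self) -> float:
        # Net retention: expansion is allowed to offset losses.
        retained = self.starting_arr + self.expansion - self.contraction - self.churn
        return retained / self.starting_arr

    def grr(self) -> float:
        # Gross retention: losses only, expansion excluded.
        return (self.starting_arr - self.contraction - self.churn) / self.starting_arr


# Hypothetical cohorts, indexed to a starting base of 100.
sticky = Cohort(starting_arr=100.0, expansion=45.0, contraction=2.0, churn=3.0)
leaky = Cohort(starting_arr=100.0, expansion=60.0, contraction=5.0, churn=15.0)

for label, c in (("sticky", sticky), ("leaky", leaky)):
    print(label, f"NRR={c.nrr():.0%}", f"GRR={c.grr():.0%}")
```

A headline net retention figure above 140% is therefore compatible with both a very durable base and a base where heavy expansion in a few accounts masks meaningful losses elsewhere, which is why the diligence asks below prioritize GRR and cohort data.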
| Metric | Value | Date | Source | Confidence | Implication | Missing denominator |
|---|---|---|---|---|---|---|
| Public customer count (historical reference) | 15,000+ | 2025-06-12 | CNBC | medium | Shows scale was already large before the 2025-2026 financing cycle | No split by paid versus active accounts |
| Public customer count (current company claim) | 20,000+ | 2026-02-09 | Databricks press release | medium | Anchors current breadth across enterprise and AI buyers | No product-family or geography split |
| Fortune 500 penetration | 70% | 2026-02-09 | Databricks press release | medium | Signals deep enterprise reach and high-reference value | No disclosure of depth per Fortune 500 account |
| $1M+ annual run-rate customers | 650+ | 2025-09-08 | Databricks press release | medium | Supports land-and-expand traction in large accounts | No gross retention for this cohort |
| $1M+ annual run-rate customers | 800+ | 2026-02-09 | Databricks press release / CRN | medium | Shows rapid whale-account expansion into 2026 | No public revenue share from the cohort |
| $10M+ annual spenders | Nearly 50 | 2025-06-12 | CNBC | medium | Confirms an already material whale cohort by mid-2025 | No top-10 concentration disclosure |
| $10M+ annual run-rate customers | 70+ | 2026-02-09 | Databricks press release / CRN | medium | Suggests deeper enterprise embed and cross-sell | No segment or cloud-channel split |
| Net retention | >140% | 2025-09 to 2026-02 | Databricks / CNBC / CRN | medium | Best public durability proxy for platform expansion | No GRR, churn, or time-bucket cohort data |
Trajectory rows rely on public breadth and spend-cohort metrics rather than unsupported estimates of active seats or deployment counts.
[CU001, CU003, CU004, CU005, CU006, CU007]
| Metric | Value / null | Segment | Confidence | Diligence ask |
|---|---|---|---|---|
| Net retention | >140% | Overall platform | medium | Request GRR, logo churn, and retention by product family and customer band. |
| AI/BI review signal | 4.8 / 5 and 94% willingness to recommend from 167 verified reviews | Analytics and BI users | medium | Validate whether AI/BI satisfaction translates into broader platform renewal. |
| PeerSpot review base | 93 reviews; multiple cost-management complaints | Enterprise review-site audience | low | Request support SLAs, FinOps tooling adoption, and cost-governance outcomes by segment. |
| Capterra archived review signal | 17 archived reviews; setup and interface complexity recur in cons | Mixed user base | low | Request onboarding time-to-value and training requirements for newer or smaller teams. |
| Mastercard onboarding churn direction | Churn down, exact percentage undisclosed | Payments onboarding workflow | low | Request quantitative before/after abandonment rates and enterprise rollout breadth. |
| Public GRR / cohort retention | Not disclosed | All segments | low | Request month-by-month and annual cohort retention by enterprise, regulated, and AI-heavy customers. |
| Public contract duration / renewal term | Not disclosed | All segments | low | Request weighted-average remaining term and standard renewal cadence by account band. |
Public durability evidence is strongest on NRR and weakest on cohort retention, GRR, and contract-term detail. This table intentionally substitutes for the planned cohort figure because no public 0-100 retention cohort data was found.
[CU008, CU009, CU010, CU034, CU035, CU036]
| Evidence area | Public signal | What is missing | Why figure was not used |
|---|---|---|---|
| Net retention | >140% overall NRR is publicly repeated in 2025-2026 | No GRR, logo churn, or 0-100 time-bucket retention series | A cohort figure would require actual percentage buckets rather than directional expansion signals. |
| $1M+ and $10M+ cohorts | Public counts of 650+, 800+, nearly 50, and 70+ large accounts | No cohort renewal or contraction history for those whale bands | Spend-band counts show expansion, but they do not satisfy a retention-cohort data contract. |
| Review and satisfaction signals | AI/BI review scores are strong while PeerSpot and Capterra surface cost and complexity complaints | No link between those reviews and actual paid renewal behavior | Review text helps with qualitative durability but cannot populate numeric cohort cells. |
| Named customer stories | Mastercard, AT&T, FOX Sports, 7-Eleven, and Insulet show live deployments and outcomes | No public contract term, cohort, or renewal denominator for those references | Case studies prove production use, not retention percentages over time. |
This extra table intentionally substitutes for the planned retention cohort figure because public sources did not provide time-bucket retention percentages between 0 and 100.
[CU005, CU006, CU007, CU008, CU009, CU010]
6.4 Expansion logic is credible, but concentration and channel economics remain partially hidden
Databricks’ expansion logic is visible even without full cohort disclosure. The platform can start as a data-engineering or cloud-migration buy, then expand into governance, AI serving, AI/BI, operational data products, or customer-specific agent workflows. AT&T’s nearly 90,000 internal users on one architecture and Mastercard’s expansion from data pipelines into onboarding assistants are concrete examples of that widening use. Public partner surfaces reinforce the same pattern. Azure positions Databricks as a first-party Azure service, Google Cloud offers Databricks through Marketplace, and SAP now embeds SAP Databricks as a first-party service inside Business Data Cloud. Those routes should make Databricks easier to buy and harder to displace inside large enterprises. The unresolved issue is economic control. Public sources do not show how much revenue is partner-sourced, how concentrated the top customer set is, or whether the curated reference list overstates the ease of expansion. Databricks therefore looks strong on customer adoption quality and cross-sell logic, while still requiring private diligence on top-account concentration, channel mix, and term structure before treating customer durability as fully underwritten.[CU005, CU006, CU019, CU024, CU031, CU041]
| Expansion driver | Concentration risk | Impact | Diligence path |
|---|---|---|---|
| Usage-led platform expansion from data engineering into AI, BI, and operational use cases | $10M+ whale cohort is material but top-customer share is undisclosed | Strong growth with unknown revenue concentration at the very top of the base | Request top-10 and top-20 ARR share, gross retention, and product attach by whale cohort. |
| Internal land-and-expand after platform standardization, as seen at AT&T and Mastercard | Switching-cost strength may hide dependence on a few deeply embedded accounts | Can improve durability, but also magnify downside if one strategic account slows consumption | Request usage concentration by workspace, region, and product family for the largest customers. |
| Partner channels through Azure, Google Cloud, and SAP | Partner-routed deals may compress economics or shift renewal control | Channel leverage can help acquisition while reducing direct control over procurement and margin | Request direct versus partner sourced bookings, marketplace mix, and renewal ownership by channel. |
| Referenceable AI use cases such as 7-Eleven, FOX Sports, Mastercard, and Insulet | Curated references may overstate average deployment success and understate failed pilots | Good public proof, but survivorship bias remains a real diligence issue | Request win/loss data, failed pilot counts, and references from the last four quarters by segment. |
| Large review base and high AI/BI satisfaction alongside repeated cost complaints | Smaller or less-mature buyers may expand more slowly if cost governance is weak | Could cap adoption depth outside sophisticated enterprise teams even if top accounts keep expanding | Request cost-governance attach, training coverage, and support resolution metrics for smaller accounts. |
Expansion logic is visible, but concentration and channel economics remain materially underdisclosed.
[CU005, CU006, CU019, CU024, CU031, CU035]
Observed path from initial platform need to production rollout and cross-workload expansion in large Databricks accounts.
[CU005, CU006, CU012, CU016, CU019, CU024]
Publicly observable adoption path from enterprise need to standardized multi-workload deployment.
[CU005, CU006, CU012, CU016, CU019, CU029]
6.5 Exhibits
07 Risks
7.1 Legal, regulatory, and security risk is real even with a strong public trust surface
Databricks has a stronger public compliance and trust posture than many private infrastructure vendors: it publishes a privacy notice, a downloadable DPA, a trust center, a due-diligence package, a legal hub, and technical documentation that explicitly describes security responsibilities. That matters because it reduces initial diligence friction and shows the company is built to sell into regulated enterprises. The residual risk is that documentation is not the same thing as cleared exposure. The EU AI Act is now live with phased obligations for general-purpose AI models, public-sector authorization is explicitly cloud- and package-specific, and Databricks remains inside an active copyright dispute tied to Mosaic and DBRX. The Books3 / RedPajama litigation is not a theoretical AI-policy debate; it is a live legal process with U.S. and Canadian dimensions, uncertain damages, and potential reputational spillover for enterprise AI buyers. Databricks therefore looks better prepared than many peers on paper, but still exposed to a combination of regulatory scope creep, model-governance scrutiny, and litigation outcomes that are difficult to price from public data alone.[CR001, CR004, CR007, CR008, CR009, CR010]
| Risk | Jurisdiction / source | Status | Likelihood | Severity | Mitigation | Residual exposure | Diligence path |
|---|---|---|---|---|---|---|---|
| AI copyright litigation tied to Mosaic / DBRX | U.S. federal + Canada | Active U.S. docket plus proposed Canadian class action | medium | high | Litigation defense, model-governance evidence, privacy / trust materials | Discovery, damages, or settlement could raise legal cost and enterprise trust friction | Request full litigation memo, reserve analysis, insurance coverage, and training-data provenance |
| EU AI Act and AI-governance obligations | EU / EEA | Phased obligations begin in 2025 and 2026 for GPAI and high-risk uses | medium-high | high | DPA, SCCs, trust materials, Unity Catalog governance claims | If Databricks sits closer to provider obligations than customers assume, compliance cost and GTM friction rise | Request AI Act applicability map by product, model role, and region |
| Public-sector authorization scope | U.S. federal | FedRAMP Certified exists for Databricks on Azure Commercial as of 2026-01-16 | medium | medium-high | Cloud-specific authorization plus enhanced compliance controls | Investors may over-assume public-sector readiness across clouds, regions, or SKUs | Request cloud / region / product authorization matrix and renewal status |
| Privacy contracting and transfer regime | Global | Privacy notice, DPA, DPF, and legal center are public | medium | medium | Standard contractual clauses, supplementary measures, downloadable DPA, trust center | Cross-border processing, third-party service providers, and shared responsibility can still create contract friction after incidents | Request enterprise MSA, indemnity terms, subprocessor terms, and customer redline trends |
Rows are ordered by residual severity, not by how much public disclosure exists.
[CR004, CR005, CR006, CR010, CR011, CR012]
7.2 Operational risk is transmitted through outages, shared responsibility, and partner concentration
Databricks’ operational risk is not just whether a core service stays up; it is how many critical workflows sit on top of cloud-specific deployments, partner models, and customer-side configuration. The official status page gives visibility, but third-party monitoring still shows enough Azure Databricks incident volume to treat reliability as an underwriting variable rather than a footnote. Databricks’ own documentation also makes clear that security and compliance are shared responsibilities across Databricks, customers, and cloud providers. That structure is normal for cloud infrastructure, but it means customer misconfiguration, workload placement, or control gaps can still become Databricks problems in practice when large enterprises evaluate renewals or incident response. At the same time, the company’s AI roadmap is increasingly entangled with Google Gemini, Anthropic Claude, SAP’s embedded data cloud route, and hyperscaler-native buyer relationships. These partnerships clearly accelerate distribution and feature breadth, yet they also create concentrated dependencies around account control, model access, and gross-margin leakage that public sources do not quantify.[CR008, CR009, CR018, CR019, CR020, CR021]
| Failure mode | Evidence | Likelihood | Severity | Mitigation maturity | Residual exposure | Unresolved gap |
|---|---|---|---|---|---|---|
| Cloud-specific outages and degradations | IsDown reports 20 incidents in the last 90 days and 173 since Jan 2023; official status page exists | medium-high | high | medium | Enterprise workloads can still be interrupted by provider-region failures or slow recovery | No public SLOs, postmortems, or customer SLA-credit detail |
| Shared-responsibility misconfiguration | Databricks docs say security and compliance are shared among Databricks, customer, and cloud provider | medium | high | medium | Customer-side gaps can still become churn, legal, or reputational problems for Databricks | No public rate of misconfiguration-driven incidents by customer segment |
| Advanced controls as add-on rather than obvious default baseline | Enhanced Security and Compliance is a named add-on that includes FedRAMP High, FedRAMP Moderate, and HIPAA | medium | medium-high | medium | Some higher-assurance controls may require explicit packaging or workload choices | No public attach-rate or tier-by-tier baseline control disclosure |
| Security transparency thinner than compliance marketing | Trust, privacy, and legal surfaces are public, but detailed incident history is not | medium | medium | low-medium | Current mitigation posture is documentation-heavy rather than incident-history heavy | No public breach register, incident taxonomy, or RCA cadence |
This table focuses on residual exposure after considering the public mitigation surface, not on whether controls exist at all.
[CR007, CR008, CR009, CR018, CR019, CR020]

| Dependency | Counterparty / market | Role | Concentration | Failure scenario | Severity | Mitigation | Residual exposure |
|---|---|---|---|---|---|---|---|
| Cloud and channel concentration | Microsoft, Google Cloud, SAP | Hosting, procurement, embedded distribution, enterprise route-to-market | high | Bundling, pricing, or policy shifts weaken Databricks account control or economics | high | Multi-cloud footprint and broad partner set | Large enterprise distribution still clusters around a small set of strategic routes |
| Frontier model access | Anthropic, Google Gemini, OpenAI | Model access for AI agents and enterprise features | medium-high | Model repricing, safety restrictions, or availability shifts raise COGS or slow roadmap delivery | high | Multiple model partners plus Mosaic AI tooling | External model providers still shape cost and feature availability |
| Embedded enterprise data route | SAP Business Data Cloud | Databricks becomes infrastructure inside another enterprise platform | medium | SAP controls customer context or product roadmaps more than Databricks does | medium-high | Databricks gains reach into large SAP estates | Channel leverage comes with lower direct control of the account |
| Native cloud substitutes | Microsoft Fabric, AWS EMR, Snowflake | Integrated data, database, Spark, and AI alternatives | high | Customers standardize on incumbent stacks already inside their cloud or data estate | high | Databricks still differentiates on open-source lineage and partner breadth | Incumbents can bundle data, AI, governance, and procurement in ways Databricks cannot fully offset publicly |
The risk is not that Databricks lacks partners; it is that several of its most important partners also shape pricing, roadmap velocity, or competitive boundaries.
[CR023, CR024, CR025, CR026, CR027, CR028]
Databricks’ dependency stack spans cloud, model, and channel partners; breadth helps, but the critical nodes are still concentrated enough to matter.
The map shows critical dependency and competition nodes, not a complete ecosystem graph.
[CR023, CR024, CR026, CR027, CR028, CR029]
7.3 The largest residual risk may be valuation and execution, not immediate distress
Nothing in the public record suggests Databricks is financially weak today. The company says revenue, AI monetization, and free cash flow are all scaling, and multiple independent outlets corroborate a rapid step-up in capital raised and private valuation. That is precisely why valuation risk matters. Databricks moved from a $62 billion valuation in December 2024 to more than $100 billion in August-September 2025 and then to $134 billion by December 2025-February 2026 while also layering in billions of dollars of debt capacity. Meanwhile it is trying to extend beyond the classic lakehouse story into Lakebase, Agent Bricks, AI apps, and deeper model-provider relationships. A premium mark can remain justified when execution is near-perfect, but it becomes fragile if product sprawl, partner economics, or competitive bundling worsens, or if IPO timing slips. Microsoft Fabric, AWS EMR, and Snowflake all show that Databricks is not competing in a vacuum; large incumbents are already pitching integrated data-plus-AI stacks, cloud-scale resilience, and lower-friction procurement inside their own estates.[CR028, CR029, CR030, CR031, CR032, CR033]
| Role / function | Dependency or gap | Likelihood | Severity | Mitigation | Diligence path |
|---|---|---|---|---|---|
| Product and platform leadership | Databricks is expanding simultaneously into Lakebase, Agent Bricks, AI apps, and deeper model integrations | medium | high | Large capital base and visible investor support | Request product-level resource allocation, GA quality metrics, and launch postmortems |
| Finance and capital markets execution | Repeated mega-rounds plus >$7B debt access create IPO-grade control expectations | medium | high | Positive free-cash-flow claims and strong investor demand | Request audited financial package, debt covenants, and IPO readiness workstreams |
| Legal / compliance operations | Active AI copyright litigation and AI-regulation obligations require deep model-governance coordination | medium-high | high | DPA, trust center, legal hub, and compliance artifacts | Request governance ownership map, model provenance controls, and reserve process |
| SRE and support scaling | Enterprise reliability scrutiny rises as Databricks adds AI and operational database ambitions | medium | medium-high | Status page visibility and multi-cloud operating footprint | Request SRE org chart, SLOs, incident review cadence, and reliability staffing plan |
Execution risk is elevated by strategic breadth and valuation expectations rather than by obvious public distress signals.
[CR007, CR018, CR032, CR036, CR037, CR038]
7.4 Mitigations are visible, but the kill criteria depend on closing private diligence gaps
Databricks’ strongest public mitigation is that it already looks like a company preparing for much more invasive diligence: trust materials are organized, privacy contracting is explicit, public-sector authorization exists, and the status surface is transparent enough for outside monitoring. Those are meaningful positives. But a risk chapter should convert them into concrete thresholds. The legal thesis breaks if copyright litigation escalates into class certification, injunctive relief, or reserve levels that change unit economics. The operational thesis weakens if outage frequency stays elevated without corresponding public or private evidence of SLO discipline and postmortem quality. The dependency thesis worsens if hyperscaler bundles or model-provider repricing change who controls the account or who captures the gross margin. And the valuation thesis remains fragile until private diligence closes four gaps that public materials do not answer: partner economics, customer concentration, debt covenants, and litigation downside. Without those, Databricks can still be an exceptional company while remaining a difficult late-stage entry price.[CR007, CR010, CR018, CR020, CR021, CR037]
| Risk | Monitorable trigger | Threshold / event | Action implication |
|---|---|---|---|
| Copyright litigation | Court rulings or settlement posture | Class certification, injunctive relief, or reserve needs that are material relative to disclosed free cash flow | Treat legal risk as thesis-breaking until downside is repriced or reserved |
| AI regulation | AI Act applicability expansion | Databricks-controlled models or workflows fall clearly into provider obligations without disclosed compliance mapping | Assume higher compliance cost, slower EU expansion, and greater contract friction |
| Reliability | Incident frequency and recovery time | Repeated major outages or sustained multi-hour median resolution over successive quarters | Underwrite slower enterprise expansion and higher support cost |
| Partner concentration | Bundling or repricing by strategic partners | Hyperscaler or model-provider changes compress gross margin or shift account ownership | Lower terminal margin assumptions and demand clearer partner economics |
| Capital dependence | Debt and liquidity trajectory | Debt expands again without comparable improvement in disclosed cash generation or IPO readiness | Treat the $134B valuation as stretched rather than merely aggressive |
| Disclosure quality | Private diligence gaps persist | No close on concentration, partner economics, SLA, or litigation-reserve questions | Pause or price for uncertainty instead of underwriting on narrative momentum |
These thresholds are intended to be monitorable from public news, court dockets, incident trackers, and management diligence materials, not from intuition.
[CR012, CR015, CR016, CR020, CR021, CR037]
Databricks’ heaviest residual risks cluster where legal exposure, partner concentration, and premium valuation reinforce one another.
This heatmap uses source-backed ordinal scoring to rank residual exposure rather than pretending to know synthetic probabilities.
[CR010, CR012, CR015, CR016, CR020, CR021]
Databricks’ key risks flow through a small set of channels: legal burden, outages, and partner concentration all feed margin, growth durability, and valuation support.
The graph is qualitative and source-backed: it shows transmission channels rather than a synthetic risk model.
[CR012, CR015, CR020, CR021, CR024, CR026]
7.5 Exhibits
08 Valuation
8.1 Investment thesis, anti-thesis, and recommendation
Databricks still looks like one of the strongest late-stage infrastructure assets in private markets. Public evidence supports a rare combination of scale, growth, customer depth, and improving cash generation: the company moved from a $62 billion Series J in December 2024 to a >$100 billion Series K term sheet in August 2025 and then to a $134 billion Series L in December 2025, while disclosed revenue run-rate moved from an expected $3 billion by early 2025 to $4.8 billion in late 2025 and $5.4 billion by early 2026. AI is no longer a side narrative; management and independent coverage both point to AI revenue run-rate exceeding $1.4 billion, with retention above 140% and an expanding cohort of million-dollar customers. The anti-thesis is that almost every bullish datapoint is still management-led run-rate disclosure rather than audited financial reporting. Investors can admire the asset and still conclude that the current entry price already capitalizes much of the good news. The honest recommendation is therefore track, not buy: company quality is high, but public evidence is not yet clean enough to prove that the current price leaves venture-style upside after accounting for denominator risk, private-market structure, and the still-open IPO timeline.[CV001, CV005, CV006, CV007, CV008, CV011]
| Dimension | Thesis | Anti-thesis | What would change the view |
|---|---|---|---|
| Scale and growth | Run-rate rose from roughly $3B expected in early 2025 to $5.4B by early 2026, with growth still >55% to 65%. | The strongest datapoints are still management-led run-rate snapshots rather than audited financial statements. | An audited revenue bridge and quarter-by-quarter disclosure would strengthen conviction. |
| AI monetization | AI products now look like a real second engine at >$1.4B run-rate and around one-quarter of run-rate by Sacra estimates. | AI mix can still overstate value if economics are pass-through heavy or margin-dilutive. | Gross-margin disclosure by AI product family would prove whether the premium is software quality or just higher workload volume. |
| Customer depth | >700 to 800 customers above $1M run-rate and >140% retention suggest durable expansion in large accounts. | Public sources still do not disclose concentration, churn, or workload-level net expansion by product. | Customer-cohort disclosure and concentration data would materially improve underwriting. |
| Comp premium | Databricks deserves a premium to Snowflake, MongoDB, Confluent, and Elastic because it combines data infrastructure with AI control-plane narrative. | At ~25x to 28x run-rate, the premium is already large relative to most public software benchmarks. | A lower entry price or a sustained public AI premium would make the premium easier to underwrite. |
| Exit optionality | The IPO window could open in 2026 or later, creating a path to public repricing. | Timing remains management-controlled and disclosure-light; staying private longer can also defer price discovery. | A formal IPO timeline or confidential filing would improve exit confidence. |
The anti-thesis focuses on price and disclosure, not on whether Databricks is strategically important.
[CV003, CV006, CV007, CV008, CV012, CV016]
8.2 Financing context, denominator caveats, and comparable framing
The most important caveat in this chapter is denominator honesty. Databricks is private, so the headline valuation is a post-money round mark, not a continuously traded enterprise value. The company mostly discloses annualized revenue run-rate, not audited GAAP revenue, and public observers do not know the cap-table seniority, tender discounts, or debt covenants attached to the recent financing stack. That means a simple ratio of the $134 billion mark to the $4.8 billion or $5.4 billion run-rate is useful, but it is not directly comparable to a public-company EV/NTM revenue multiple. Even with that limitation, the rough math is informative. Databricks screens around 25x to 28x run-rate depending on which public denominator one uses. That is well above current public data-platform names such as Snowflake, MongoDB, Confluent, and Elastic, and above a scaled workflow benchmark like ServiceNow. It is below the extreme AI scarcity multiple implied by Palantir, which matters because it shows Databricks is priced as a premium hybrid: better than ordinary data infrastructure, but not yet at the very top end of public AI exuberance. The comp set therefore supports a fair-to-stretched conclusion rather than an obviously absurd one. Databricks can justify a premium, but only if growth, AI monetization, and eventual disclosure quality remain unusually strong.[CV006, CV016, CV020, CV024, CV027, CV028]
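The rough math in the paragraph above can be reproduced in a few lines. The sketch below is illustrative only: it divides each headline value by its revenue denominator using the figures quoted in this chapter's comparables table, and the function and variable names are hypothetical. It deliberately mixes private post-money with public market cap, as the text warns, so the outputs are directional ratios and not EV/NTM multiples.

```python
# All figures in $B, copied from this chapter's comparables table.
# Names below are hypothetical and exist only for this illustration.
POST_MONEY_B = 134.0          # Databricks Series L post-money mark
RUN_RATES_B = (4.8, 5.4)      # disclosed run-rate snapshots (not audited revenue)

# (market cap, latest annual revenue) for each public comparable
COMPS_B = {
    "Snowflake": (49.85, 4.472),
    "MongoDB": (21.27, 2.01),
    "Confluent": (11.13, 1.167),
    "Elastic": (5.24, 1.483),
    "ServiceNow": (94.84, 13.278),
    "Palantir": (350.05, 4.475),
}


def revenue_multiple(value_b: float, revenue_b: float) -> float:
    """Headline value divided by the revenue denominator."""
    if revenue_b <= 0:
        raise ValueError("revenue denominator must be positive")
    return value_b / revenue_b


# Low end uses the larger run-rate, high end uses the smaller one.
databricks_range = sorted(revenue_multiple(POST_MONEY_B, r) for r in RUN_RATES_B)
comp_multiples = {
    name: revenue_multiple(cap, rev) for name, (cap, rev) in COMPS_B.items()
}

if __name__ == "__main__":
    low, high = databricks_range
    print(f"Databricks: {low:.1f}x to {high:.1f}x run-rate")
    for name, multiple in sorted(comp_multiples.items(), key=lambda kv: kv[1]):
        print(f"{name}: {multiple:.1f}x")
```

Because the Databricks denominator is a run-rate and the numerator is a round mark, the spread against the comps is best read as an ordering, not as a precise premium.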
| Dimension | Value | Rationale |
|---|---|---|
| Recommendation | track | Company quality is strong, but public evidence does not yet support a clear margin of safety at $134B. |
| Confidence | medium | The comp signal is directionally clear, but the key Databricks denominator is still a private run-rate rather than audited revenue. |
| Risk rating | high | Wide dispersion remains because cap-table structure, debt terms, and IPO timing are still not public. |
| Valuation stance | stretched | Current pricing sits far above most public data-platform comps and only works cleanly if Databricks keeps an AI premium. |
| Base-case valuation range | $110B-$145B | This range assumes continued growth with some compression toward public-comp discipline. |
| Decision implication | Wait for lower entry or fuller disclosure | A better price or audited IPO-style disclosure would move the call more than another funding headline. |
Recommendation is explicitly price-sensitive and denominator-sensitive.
[CV006, CV016, CV020, CV021, CV043, CV044]

| Comparable | Valuation / market cap | Revenue denominator | Implied multiple / status | Relevance | Limitation |
|---|---|---|---|---|---|
| Databricks (subject) | $134B private post-money | $4.8B-$5.4B run-rate | ~24.8x-27.9x | Shows what new money is paying for scale plus AI premium. | Private post-money valuation and run-rate are not the same as public EV / NTM revenue. |
| Snowflake | $49.85B market cap | $4.472B FY2026 product revenue | ~11.1x | Closest public data-platform peer with meaningful scale and cloud economics. | Public market cap is not enterprise value and product revenue is a cleaner denominator than Databricks discloses. |
| MongoDB | $21.27B market cap | $2.01B FY2025 total revenue | ~10.6x | Useful high-growth developer-data comp with premium software narrative. | Database exposure and product mix differ from Databricks lakehouse plus AI platform. |
| Confluent | $11.13B market cap | $1.167B 2025 revenue | ~9.5x | Helpful real-time data infra comp showing where narrower infrastructure trades. | Streaming focus is narrower than Databricks and should not be used as a direct valuation anchor alone. |
| Elastic | $5.24B market cap | $1.483B 2025 revenue | ~3.5x | Shows the downside of ordinary infrastructure software without a strong current AI premium. | Search / observability mix and weaker growth make it more of a floor than a direct peer. |
| ServiceNow | $94.84B market cap | $13.278B 2025 revenue | ~7.1x | Scaled workflow-software benchmark for what mature, highly profitable enterprise software can trade at. | ServiceNow has superior disclosure and a more mature model, so the comp mostly anchors the upper bound of ordinary software. |
| Palantir | $350.05B market cap | $4.475B 2025 revenue | ~78.2x | Shows what a public AI-scarcity premium can look like when narrative and government/AI demand are both extreme. | Palantir is an outlier; using it as a direct Databricks anchor would overstate fair value. |
Denominators are intentionally mixed and should be read directionally: current public market cap over latest annual revenue for public comps, versus private post-money over disclosed run-rate for Databricks.
[CV006, CV016, CV027, CV028, CV029, CV030]
8.3 Scenario analysis and price sensitivity
The scenario table should be read as a pricing discipline tool, not management guidance. In a bull case, Databricks keeps growth closer to current levels for longer, turns AI mix into a durable margin and platform premium, and reaches an IPO window while still looking more like an AI control plane than a mature data-warehouse vendor. That outcome can justify a valuation above the current mark. In a base case, the company continues to execute well, but public-market comp pressure and the move from run-rate rhetoric to IPO-grade scrutiny compress the premium enough that upside from a $134 billion entry is limited. In a bear case, the company is still good, but good is not enough: growth decelerates, AI monetization looks more like pass-through than software leverage, or public software multiples remain anchored near the current Snowflake / MongoDB / ServiceNow range. The result is that public evidence points to wide dispersion but a base-skewed distribution. That is why the chapter’s posture is positive on the asset and disciplined on price. The risk is not that Databricks is weak; it is that the current price leaves too little room for ordinary execution mistakes or a less forgiving IPO tape.[CV008, CV016, CV017, CV018, CV022, CV024]
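The gross-return ranges in the scenario table are the scenario valuation divided by the $134B entry mark. The sketch below is a minimal illustration of that arithmetic using the table's valuation ranges; the names are hypothetical, and the annualized helper assumes a single entry and exit with no dilution, preferences, fees, or interim cash flows, none of which the public record lets us model.

```python
# All valuations in $B, taken from the scenario table in this section.
# Names are hypothetical and exist only for this illustration.
ENTRY_B = 134.0

SCENARIOS_B = {
    "Bull": (180.0, 220.0),
    "Base": (110.0, 145.0),
    "Bear": (55.0, 85.0),
}


def gross_multiple(exit_b: float, entry_b: float = ENTRY_B) -> float:
    """Exit valuation over entry valuation, before structure or dilution."""
    if entry_b <= 0:
        raise ValueError("entry valuation must be positive")
    return exit_b / entry_b


def annualized_return(multiple: float, years: float) -> float:
    """Simplified annualized return implied by a gross multiple over a hold period."""
    if multiple <= 0 or years <= 0:
        raise ValueError("multiple and years must be positive")
    return multiple ** (1.0 / years) - 1.0


# Low and high gross multiples for each scenario.
gross_ranges = {
    name: (gross_multiple(low), gross_multiple(high))
    for name, (low, high) in SCENARIOS_B.items()
}

if __name__ == "__main__":
    for name, (low, high) in gross_ranges.items():
        print(f"{name}: {low:.1f}x to {high:.1f}x gross")
    # Illustrative only: low end of the bull range over a 3-year hold.
    bull_low = gross_ranges["Bull"][0]
    print(f"Bull low end, 3-year hold: {annualized_return(bull_low, 3):.1%} per year")
```

Read this way, even the low end of the bull range annualizes to a modest figure over a three-year hold, which is the arithmetic behind the chapter's point that the entry price leaves little room for ordinary execution mistakes.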
| Scenario | Assumptions | Valuation / return logic | Key risks | Probability signal |
|---|---|---|---|---|
| Bull | Run-rate approaches $8B-$8.5B, AI mix stays premium, and IPO buyers keep rewarding AI-control-plane names above ordinary software comps. | $180B-$220B; roughly 1.3x-1.6x gross from a $134B entry over a 2-3 year hold. | Premium multiple must persist despite public scrutiny and broader software repricing. | Possible, but requires both execution and a supportive IPO tape. |
| Base | Run-rate reaches about $6.0B-$6.6B, disclosure improves only modestly, and Databricks rerates toward the upper end of public software comps. | $110B-$145B; roughly 0.8x-1.1x gross from a $134B entry. | Current headline price leaves limited room for normal multiple compression. | Most plausible on public evidence because it honors both quality and denominator caveats. |
| Bear | Growth decelerates toward mature-software levels, AI economics look less differentiated, or public comps stay anchored near 10x-15x. | $55B-$85B; roughly 0.4x-0.6x gross from a $134B entry. | A good company can still produce a weak entry if public discipline arrives before disclosure quality improves. | Material downside if price discovery moves faster than Databricks disclosure. |
| Probability-weighted posture | Base-skewed because quality is visible but price support is incomplete. | Supports track rather than buy. | Cap-table opacity and IPO timing keep dispersion wide. | Late-stage private pricing needs more than admiration for the asset. |
Ranges are committee discussion tools built from current public comps and explicit denominator caveats, not management guidance.
[CV016, CV017, CV018, CV022, CV024, CV026]

| Trigger | Threshold / event | Transmission to thesis | Action implication |
|---|---|---|---|
| Growth deceleration | Public growth falls materially below the >55%-65% range before disclosure quality improves | Reduces the premium multiple that currently separates Databricks from ordinary data-infrastructure comps | Move from track toward avoid unless price resets sharply lower |
| AI monetization disappointment | AI run-rate grows but gross margin or attach economics prove weak in diligence | Turns the AI premium into lower-quality pass-through revenue | Cut valuation range and require product-level margin evidence before investing |
| Public multiple compression | Snowflake / MongoDB / ServiceNow-style revenue multiples contract further | Shrinks the market-clearing range for a future IPO | Do not underwrite today’s private mark on stale public multiples |
| Cap-table overhang | Preferred structure, tender discounts, or debt covenants materially reduce common-equity value | Headline valuation ceases to represent what new common-equity capital can earn | Rebuild the model on common-equity economics, not post-money headline value |
| IPO timeline slips | No meaningful IPO preparation or disclosure path appears after another financing cycle | Pushes liquidity farther out and leaves valuation in a private-mark feedback loop | Raise required return or wait for secondary liquidity at a discount |
Triggers focus on observable underwriting breaks rather than broad company-quality concerns.
[CV008, CV016, CV017, CV018, CV021, CV024]Decision chain from Databricks scale, disclosure quality, comp premium, and exit timing to the final recommendation.
[CV016, CV018, CV021, CV043, CV044, CV055]
Directional valuation outcomes as Databricks moves between public-comp and AI-premium multiple regimes.
Values are estimated from public comparables plus scenario run-rate assumptions; they are not management guidance or enterprise-value calculations.
[CV043, CV044, CV045, CV046, CV047, CV048]
Low, base, and high valuation envelopes for Databricks at the current late-stage entry point.
These ranges use valuation-to-run-rate proxies because Databricks does not publish the audited revenue and cap-structure detail needed for a full EV model.
[CV052, CV053, CV054, CV058]

8.4 Entry discipline, thesis-break triggers, and final diligence asks
What would move the call? A cleaner cap table, audited revenue and gross-margin disclosure, and evidence that the AI layer carries true software economics rather than just workload growth would all help materially. Price alone could also move the recommendation faster than another funding headline: a common-equity-equivalent entry below the low end of the current base range would create much more attractive asymmetry, especially if the company enters a formal IPO process with better disclosure. Conversely, the recommendation should worsen if Databricks loses the growth and retention profile that currently underwrites its premium, or if private financing terms reveal that common-equity holders sit behind more structure than the headline valuation implies. The practical conclusion is that Databricks is investable as a company, but the current price is not yet adequately underwritten. Investors should treat it as a live watchlist name that warrants aggressive diligence, not as a blind late-stage momentum buy. The public record supports a track rating with medium confidence and high risk, while the remaining diligence list determines whether the next move is up toward buy or down toward avoid. [CV020, CV021, CV051, CV055, CV056, CV057]
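The entry-price argument above rests on simple valuation-to-run-rate arithmetic, which can be sketched as follows. This is an illustrative calculation, not the report's valuation model. The headline valuations and run-rates are taken from the evidence index (CO018, CO020, CO025, CO027, CO032, CO033, CO036, CO037), while the `Mark` class, the `implied_valuation` helper, and the 12x / 18x / 24x target multiples are hypothetical names and values introduced only for illustration. Two caveats apply to the inputs: the Series J run-rate was a forward expectation at announcement, and the Series K valuation was disclosed only as "above $100 billion", so that multiple is a floor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """One headline private mark paired with the run-rate disclosed alongside it."""
    label: str
    valuation_usd_b: float  # headline valuation, USD billions (not common-equity value)
    run_rate_usd_b: float   # company-disclosed revenue run-rate, USD billions (unaudited)

    @property
    def multiple(self) -> float:
        # Valuation-to-run-rate proxy; this is not an EV/revenue multiple.
        return self.valuation_usd_b / self.run_rate_usd_b


# Figures from the evidence index. Series J run-rate was an expectation;
# Series K valuation is a floor ("above $100 billion").
MARKS = [
    Mark("Series J (Dec 2024)", 62.0, 3.0),
    Mark("Series K (Sep 2025)", 100.0, 4.0),
    Mark("Series L (Dec 2025)", 134.0, 4.8),
    Mark("Feb 2026 package", 134.0, 5.4),
]


def implied_valuation(run_rate_usd_b: float, target_multiple: float) -> float:
    """Headline valuation implied by applying a target multiple to a run-rate."""
    return run_rate_usd_b * target_multiple


if __name__ == "__main__":
    for mark in MARKS:
        print(f"{mark.label}: {mark.multiple:.1f}x run-rate")

    # Hypothetical target multiples, chosen only to show the sensitivity.
    latest = MARKS[-1]
    for target in (12.0, 18.0, 24.0):
        value = implied_valuation(latest.run_rate_usd_b, target)
        discount = 1.0 - value / latest.valuation_usd_b
        print(f"{target:.0f}x on ${latest.run_rate_usd_b}B run-rate: "
              f"${value:.1f}B ({discount:.0%} below the $134B mark)")
```

On these inputs the headline multiple is roughly 20.7x at Series J, at least 25.0x at Series K, 27.9x at Series L, and 24.8x against the February 2026 run-rate. Because the denominators are unaudited run-rates and the numerators ignore preferences and debt, the output is a screening aid rather than a common-equity valuation.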
| Topic | Missing evidence | Why it matters | Owner / diligence path |
|---|---|---|---|
| Cap table and preferences | Current share count, round prices, liquidation preferences, tender discounts, and employee-liquidity terms | Common-equity upside may differ materially from the headline $134B valuation | Company / counsel / lead investor |
| Revenue bridge | Quarterly bridge from run-rate disclosure to audited GAAP revenue and deferred revenue | Prevents overpaying off marketing denominators | Finance diligence / auditor |
| Gross margin and SBC | Audited gross margin by product family and stock-comp burden | Determines whether AI growth is real software leverage or expensive cloud pass-through | Finance diligence / IPO readiness workstream |
| Debt terms | Debt pricing, covenant package, maturity profile, and any security or cross-default features | Debt capacity affects true common-equity economics and risk | Treasury / lender diligence |
| Customer concentration and NRR detail | Top-customer exposure, churn cohorts, NRR by AI and core platform products | Premium multiples require proof that the biggest growth engines are durable | Sales ops / customer analytics |
| IPO readiness and liquidity path | Board-level IPO criteria, banker prep, and any current secondary windows | Entry return depends heavily on timing and the next real price-discovery event | CEO / CFO / bankers |
These asks are the minimum package needed to turn a strong company view into a cleaner valuation view.
[CV020, CV021, CV051, CV057, CV058]
IC-style dashboard of the Databricks underwriting dimensions that matter most at the current price.
[CV016, CV018, CV021, CV051, CV055, CV057]

Disclaimer
This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.
Evidence index
| ID | Statement | Confidence | Sources |
|---|---|---|---|
| CO001 | Databricks was founded in 2013. | High | SO001, SO002 |
| CO002 | Databricks says the company was founded by seven researchers from UC Berkeley's AMP Lab. | Medium | SO002 |
| CO003 | The official founders page names Ali Ghodsi, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, Andy Konwinski, and Arsalan Tavakoli-Shiraji as Databricks founders. | Medium | SO004 |
| CO004 | Databricks describes itself as the data and AI company. | High | SO001, SO002 |
| CO005 | Databricks says its Data Intelligence Platform provides a unified foundation for data and governance combined with AI models tuned to an organization's characteristics. | Medium | SO001 |
| CO006 | Databricks says it is headquartered in San Francisco. | High | SO001, SO002, SO003 |
| CO007 | Databricks lists 160 Spear Street, 15th Floor, San Francisco, California as its contact address. | Medium | SO003 |
| CO008 | The current about page says more than 15,000 organizations worldwide rely on Databricks. | Medium | SO001 |
| CO009 | The Databricks press kit says the company has more than 20,000 customers globally. | High | SO002, SO008, SO009, SO010 |
| CO010 | The Databricks press kit says the company has more than 10,000 employees worldwide. | Medium | SO002 |
| CO011 | The Databricks press kit says the company operates 30-plus offices around the globe. | Medium | SO002 |
| CO012 | Databricks says 70% of the Fortune 500 use its platform. | Medium | SO002 |
| CO013 | Databricks maintains a public board-of-directors page. | Medium | SO006 |
| CO014 | Ali Ghodsi is Databricks' co-founder and CEO. | High | SO023, SO014 |
| CO015 | UC Berkeley says Ali Ghodsi cofounded Databricks with six UC Berkeley academics who built Apache Spark. | Medium | SO023 |
| CO016 | The Spark CACM paper credits Matei Zaharia, Reynold Xin, Patrick Wendell, Ali Ghodsi and other Berkeley-linked authors, anchoring Databricks' founder bench in Apache Spark's creation. | Medium | SO022 |
| CO017 | On December 17, 2024 Databricks announced a Series J financing with $10 billion of expected non-dilutive funding and $8.6 billion completed to date. | High | SO007, SO027 |
| CO018 | Databricks said the Series J financing valued the company at $62 billion. | High | SO007, SO027, SO014 |
| CO019 | Databricks said Thrive Capital led Series J, with Andreessen Horowitz, DST Global, GIC, Insight Partners and WCM Investment Management as co-leads. | High | SO007, SO027 |
| CO020 | Databricks said in the Series J announcement that it expected to cross a $3 billion revenue run-rate and achieve positive free cash flow in the quarter ending January 31, 2025. | High | SO007, SO027 |
| CO021 | CNBC reported in June 2025 that Databricks expected annualized revenue to reach $3.7 billion by July 2025 with 50% year-over-year growth. | Medium | SO014 |
| CO022 | CNBC reported Databricks generated $2.6 billion of revenue in the fiscal year ending January 2025. | Medium | SO014 |
| CO023 | CNBC reported that nearly 50 Databricks customers were spending over $10 million annually in the first quarter of fiscal 2026. | Medium | SO014 |
| CO024 | CNBC reported Databricks had roughly 8,000 employees in June 2025 and was hiring 3,000 people in 2025. | Medium | SO014 |
| CO025 | Databricks announced in September 2025 that it crossed a $4 billion revenue run-rate with growth above 50% year over year. | Medium | SO008 |
| CO026 | Databricks said its AI products had exceeded a $1 billion revenue run-rate by September 2025. | Medium | SO008 |
| CO027 | Databricks said it was closing a $1 billion Series K at a valuation above $100 billion in September 2025. | High | SO008, SO018 |
| CO028 | Databricks said it had achieved positive free cash flow over the prior 12 months by September 2025. | Medium | SO008 |
| CO029 | Databricks said more than 650 customers were consuming over $1 million in annual revenue run-rate by September 2025. | Medium | SO008 |
| CO030 | TechCrunch reported in August 2025 that Databricks was closing about $1 billion of new funding at a $100 billion valuation, co-led by Thrive and Insight Partners. | Medium | SO018 |
| CO031 | TechCrunch reported Databricks had already offered employees two secondary liquidity rounds in 2025. | Medium | SO018 |
| CO032 | Databricks announced on December 16, 2025 that it was raising more than $4 billion in a Series L financing at a $134 billion valuation. | High | SO009, SO015 |
| CO033 | Databricks said it crossed a $4.8 billion revenue run-rate in Q3 2025 with growth above 55% year over year. | High | SO009, SO015 |
| CO034 | Databricks said both its AI products and its Data Warehousing business had surpassed $1 billion revenue run-rate by December 2025. | Medium | SO009 |
| CO035 | Databricks said more than 700 customers were consuming over $1 million in annual revenue run-rate by December 2025. | Medium | SO009 |
| CO036 | Databricks announced on February 9, 2026 that it crossed a $5.4 billion revenue run-rate with growth above 65% year over year. | Medium | SO010 |
| CO037 | Databricks said the February 2026 financing package exceeded $7 billion, including roughly $5 billion of equity at a $134 billion valuation and roughly $2 billion of additional debt capacity. | High | SO010, SO015 |
| CO038 | Databricks said more than 800 customers were consuming over $1 million in annual revenue run-rate by February 2026. | Medium | SO010 |
| CO039 | SAP said in February 2025 that SAP Business Data Cloud natively embeds Databricks technology for data engineering, machine learning and AI workloads. | Medium | SO019 |
| CO040 | Microsoft markets Azure Databricks as an Azure-managed environment for the data and AI lifecycle. | Medium | SO025 |
| CO041 | Google Cloud markets Databricks on Google Cloud as a partnership offering for scalable analytics and AI workloads. | Medium | SO026 |
| CO042 | Databricks said it completed the MosaicML acquisition on July 19, 2023. | High | SO011, SO016 |
| CO043 | TechCrunch reported Databricks agreed to pay $1.3 billion for MosaicML. | Medium | SO016 |
| CO044 | Databricks said the MosaicML deal was meant to help enterprises train, customize and deploy generative AI models on their own data. | Medium | SO011 |
| CO045 | Databricks said on June 4, 2024 that it agreed to acquire Tabular and updated on June 7, 2024 that the acquisition had completed. | High | SO012, SO017 |
| CO046 | Databricks said the Tabular deal brought the creators of Apache Iceberg together with the creators of Delta Lake to push open lakehouse interoperability. | High | SO012, SO017 |
| CO047 | Databricks said on May 14, 2025 that it agreed to acquire Neon to deliver serverless Postgres for developers and AI agents. | Medium | SO013 |
| CO048 | The Register reported in April 2026 that a federal judge let authors' copyright claims against Databricks continue over DBRX and Mosaic-related training data. | High | SO020, SO021 |
| CO049 | Saveri says the plaintiffs filed suit on March 8, 2024 and that on April 21, 2026 the court denied Databricks' motion to dismiss DBRX-related claims. | Medium | SO021 |
| CO050 | Insight Partners publicly lists Databricks as a portfolio investment. | Medium | SO024 |
| CO051 | TechCrunch reported in August 2025 that Databricks had raised about $20 billion since founding. | Low | SO018 |
| CO052 | CNBC described Databricks in January 2026 as one of the highly valued private technology companies primed to go public in 2026. | Medium | SO015 |
| CM001 | Databricks says its Data Intelligence Platform is built on a lakehouse and is intended for an entire organization to use data and AI. | Medium | SM001 |
| CM002 | Databricks says lakehouse architecture combines data lakes and data warehouses to reduce costs and accelerate data and AI initiatives. | Medium | SM002 |
| CM003 | Databricks says the lakehouse offers one architecture for integration, storage, processing, governance, sharing, analytics, and AI. | Medium | SM002 |
| CM004 | Databricks says the lakehouse supports structured and unstructured data across major clouds. | Medium | SM002 |
| CM005 | Databricks says AI/BI runs directly on governed data in Unity Catalog. | High | SM004, SM005 |
| CM006 | Databricks says integrated semantics create one version of truth across BI dashboards, AI agents, and downstream tools. | High | SM004, SM005 |
| CM007 | Databricks says AI/BI supports natural-language dashboard creation and conversational analytics for business users. | Medium | SM005 |
| CM008 | Databricks says Mosaic AI is for building production AI agents on enterprise data. | Medium | SM003 |
| CM009 | Databricks says Mosaic AI provides built-in evaluation for agents using any AI model. | Medium | SM003 |
| CM010 | Databricks says Unity Catalog can enforce guardrails, access controls, rate limits, and lineage across AI workflows. | High | SM003, SM004 |
| CM011 | Databricks says Unity Catalog applies governance across structured data, unstructured data, business metrics, and AI models. | Medium | SM004 |
| CM012 | Databricks says Unity Catalog uses open lakehouse formats and open APIs to reduce lock-in. | High | SM004, SM007 |
| CM013 | Databricks public-sector materials list state and local government, federal agencies, and higher education as distinct target segments. | Medium | SM006 |
| CM014 | Databricks says public-sector agencies use the platform to track revenue, strengthen compliance, and improve fiscal decision-making. | Medium | SM006 |
| CM015 | Databricks says Delta Sharing and Databricks Marketplace let public-sector users share data without copying it and without requiring counterparties to run Databricks. | Medium | SM025 |
| CM016 | AWS Marketplace has a Databricks seller profile, giving buyers a standard marketplace procurement route. | Medium | SM009 |
| CM017 | Google Cloud positions Databricks as a partner offering with access to Gemini, open-source models, and BigQuery. | Medium | SM010 |
| CM018 | Microsoft describes Azure Databricks as a unified, open analytics platform for enterprise-grade data, analytics, and AI at scale. | Medium | SM011 |
| CM019 | Microsoft documentation identifies data engineering as a core Azure Databricks use case. | Medium | SM011 |
| CM020 | Microsoft documentation identifies machine learning, AI, and data science as core Azure Databricks use cases. | Medium | SM011 |
| CM021 | Microsoft documentation identifies data warehousing, analytics, and BI as core Azure Databricks use cases. | Medium | SM011 |
| CM022 | Microsoft documentation identifies real-time and streaming analytics as a core Azure Databricks use case. | Medium | SM011 |
| CM023 | Grand View Research estimates the global data lakehouse market at USD 11.35 billion in 2024. | Medium | SM015 |
| CM024 | Grand View Research expects the data lakehouse market to reach USD 13.94 billion in 2025. | Medium | SM015 |
| CM025 | Grand View Research projects the data lakehouse market will reach USD 74.00 billion by 2033 at a 23.2% CAGR. | Medium | SM015 |
| CM026 | Grand View Research says North America held 35.2% of 2024 data lakehouse revenue. | Medium | SM015 |
| CM027 | Grand View Research says large enterprises held 71.4% of 2024 data lakehouse revenue. | Medium | SM015 |
| CM028 | Global Market Insights estimates the data lakehouse market at USD 11.9 billion in 2024. | Medium | SM016 |
| CM029 | Global Market Insights expects the data lakehouse market to reach USD 14.2 billion in 2025. | Medium | SM016 |
| CM030 | Global Market Insights projects the data lakehouse market will reach USD 105.9 billion by 2034 at a 25% CAGR. | Medium | SM016 |
| CM031 | The Business Research Company says the data lakehouse market reaches USD 10.33 billion in 2025. | Medium | SM017 |
| CM032 | The Business Research Company says the data lakehouse market reaches USD 12.58 billion in 2026 at a 21.8% CAGR from 2025. | Medium | SM017 |
| CM033 | The Business Research Company projects the data lakehouse market reaches USD 27.28 billion in 2030 at a 21.4% CAGR. | Medium | SM017 |
| CM034 | The Business Research Company says data lakehouse deployments span both cloud-based and on-premise models. | Medium | SM017 |
| CM035 | The Business Research Company says data lakehouse demand spans both large enterprises and SMEs. | Medium | SM017 |
| CM036 | The Business Research Company says key data lakehouse end markets include IT and telecom, BFSI, retail and e-commerce, healthcare and life sciences, manufacturing, and energy and utilities. | Medium | SM017 |
| CM037 | Public data lakehouse market estimates conflict materially across publishers and forecast windows, so one generic TAM figure would overstate precision for Databricks. | Medium | SM015, SM016, SM017 |
| CM038 | IDC projects worldwide spending on AI-supporting technology will reach USD 337 billion in 2025. | Medium | SM014 |
| CM039 | IDC projects AI-supporting technology spend will surpass USD 749 billion by 2028. | Medium | SM014 |
| CM040 | IDC says 2025 marks a shift from AI experimentation to reinvention driven by AI agents and renovation in data, infrastructure, and cloud. | Medium | SM014 |
| CM041 | Confluent says 89% of IT leaders view data streaming platforms as critical or important to achieving data-related goals. | Medium | SM013 |
| CM042 | Confluent says 44% of IT leaders report 5x ROI from data streaming investments. | Medium | SM013 |
| CM043 | Confluent says 90% of IT leaders are increasing data streaming platform investment in 2025. | Medium | SM013 |
| CM044 | Confluent says 89% of IT leaders think data streaming platforms ease AI adoption by improving data access, quality assurance, and governance. | Medium | SM013 |
| CM045 | Deloitte says worker access to AI rose by 50% in 2025. | Medium | SM019 |
| CM046 | Deloitte says the number of companies with at least 40% of AI projects in production is set to double in six months. | Medium | SM019 |
| CM047 | Deloitte says only one in five companies has a mature governance model for autonomous AI agents. | Medium | SM019 |
| CM048 | Deloitte says 42% of companies believe their AI strategy is highly prepared, but they feel less prepared in infrastructure, data, risk, and talent. | Medium | SM019 |
| CM049 | Deloitte says legacy data and infrastructure architectures cannot power real-time autonomous AI. | Medium | SM019 |
| CM050 | McKinsey says nearly two-thirds of respondents cite security and risk concerns as the top barrier to scaling agentic AI. | Medium | SM018 |
| CM051 | McKinsey says 74% of respondents identify inaccuracy and 72% cite cybersecurity as highly relevant AI risks. | Medium | SM018 |
| CM052 | McKinsey says nearly 60% of respondents cite knowledge and training gaps as the main barrier to implementing responsible AI practices. | Medium | SM018 |
| CM053 | The FinOps Foundation says 63% of respondents now manage AI spending, up from 31% last year. | Medium | SM022 |
| CM054 | The FinOps Foundation says implementing governance and policy at scale becomes the top future priority as organizations manage more AI and ML spend. | Medium | SM022 |
| CM055 | CIO says companies without modern data infrastructure cannot feed relevant data into AI systems effectively. | Medium | SM023 |
| CM056 | CIO says traditional data platforms are often designed only for structured data and can lack governance and quality features. | Medium | SM023 |
| CM057 | CIO says preparing data for AI is the number-one reason companies pursue data modernization. | Medium | SM023 |
| CM058 | CIO says only 29.1% of companies reported using AI-centric data management platforms such as Vertex or SageMaker. | Medium | SM023 |
| CM059 | NIST says the AI RMF is a voluntary framework for incorporating trustworthiness into the design, development, use, and evaluation of AI systems. | Medium | SM020 |
| CM060 | NIST says it released a generative AI risk management profile in July 2024 and a critical infrastructure trust profile concept note in April 2026. | Medium | SM020 |
| CM061 | The EU AI Act sets risk-based rules for AI developers and deployers. | Medium | SM021 |
| CM062 | The EU AI Act made prohibitions effective in February 2025, GPAI rules effective in August 2025, and begins transparency and high-risk obligations in 2026 and 2027. | Medium | SM021 |
| CM063 | CDOTrends says 85% of surveyed organizations were already using GenAI in at least one function. | Medium | SM024 |
| CM064 | CDOTrends says only 37% of executives and 29% of practitioners thought GenAI applications were production-ready. | Medium | SM024 |
| CM065 | CDOTrends says practitioners cited cost, skills, quality, and governance as the main GenAI deployment hurdles. | Medium | SM024 |
| CM066 | CDOTrends says only 22% of respondents felt their current IT architecture could effectively support new AI applications. | Medium | SM024 |
| CM067 | Databricks’ Economist Impact landing page says companies were quick to adopt GenAI but still struggle to productionize and scale. | Low | SM008 |
| CM068 | Databricks’ Economist Impact landing page says 71% of practitioners believe their GenAI apps are not production-ready. | Low | SM008 |
| CM069 | Snowflake says it added 740 net new customers in Q4 fiscal 2026. | Medium | SM012 |
| CM070 | Snowflake says 733 customers spent more than USD 1 million on a trailing-12-month basis. | Medium | SM012 |
| CM071 | Snowflake says it served 790 Forbes Global 2000 customers as of January 31, 2026. | Medium | SM012 |
| CM072 | Snowflake says customers continue to rationalize budgets and prioritize cash-flow management. | Medium | SM012 |
| CM073 | Snowflake says it competes in a continually evolving market where enterprises are increasingly adopting AI for core functions. | Medium | SM012 |
| CP001 | Databricks says its Data Intelligence Platform is built on lakehouse architecture that combines the best elements of data lakes and data warehouses. | Medium | SP001 |
| CP002 | Databricks describes its lakehouse as one architecture for integration, storage, processing, governance, sharing, analytics, and AI across major clouds. | Medium | SP001 |
| CP003 | Databricks markets Unity Catalog as unified governance for all data, analytics, and AI assets. | Medium | SP003 |
| CP004 | Databricks says Unity Catalog applies discovery, access, quality monitoring, and compliance controls across structured data, unstructured files, ML models, and business metrics. | Medium | SP003 |
| CP005 | Databricks pricing is pay-as-you-go with no up-front costs and per-second billing granularity. | Medium | SP002 |
| CP006 | Databricks says committed-use contracts can provide discounts and can flex across multiple clouds. | Medium | SP002 |
| CP007 | Databricks says AI/BI is built natively into the platform and removes per-seat or per-license BI fees. | Medium | SP004 |
| CP008 | Databricks announced an expected $10 billion Series J financing that valued the company at $62 billion. | Medium | SP006 |
| CP009 | Databricks said in December 2024 that it expected to cross a $3 billion revenue run-rate and become free-cash-flow positive in the quarter ending January 31, 2025. | Medium | SP006 |
| CP010 | Databricks said it had more than 500 customers consuming at over $1 million annual revenue run-rate. | Medium | SP006 |
| CP011 | Databricks said in June 2025 that more than 15,000 organizations, including 70% of the Fortune 500, rely on its platform. | Medium | SP005 |
| CP012 | Databricks said in June 2025 that Unity Catalog added full Apache Iceberg support and native Iceberg REST Catalog APIs. | Medium | SP005 |
| CP013 | Databricks said Unity Catalog can let external engines including Trino, Snowflake, and Amazon EMR read and write Iceberg managed tables with fine-grained governance. | Medium | SP005 |
| CP014 | Snowflake documentation describes the platform as a self-managed cloud service that combines data storage, processing, and analytic solutions. | Medium | SP007 |
| CP015 | Snowflake documentation says customers cannot install and run Snowflake locally or on private cloud infrastructure. | Medium | SP007 |
| CP016 | Snowflake documentation describes its architecture as separate storage, compute, and cloud-services layers, with virtual warehouses as independent compute clusters. | Medium | SP007, SP008 |
| CP017 | Snowflake documentation says total cost is the aggregate of compute, storage, and data-transfer usage. | Medium | SP008 |
| CP018 | Snowflake documentation says virtual warehouses are billed per second with a 60-second minimum each time a warehouse starts. | Medium | SP008 |
| CP019 | Snowflake documentation gives a Small Standard virtual warehouse example of 2 credits per hour. | Medium | SP008 |
| CP020 | Snowflake reported $1.23 billion of product revenue in Q4 fiscal 2026, up 30% year over year. | Medium | SP009 |
| CP021 | Snowflake reported 733 customers with trailing 12-month product revenue greater than $1 million as of January 31, 2026. | Medium | SP009 |
| CP022 | Snowflake reported 790 Forbes Global 2000 customers and more than 9,100 accounts using Snowflake AI features as of January 31, 2026. | Medium | SP009 |
| CP023 | Snowflake says more than 13,300 customers around the world use its AI Data Cloud. | Medium | SP009 |
| CP024 | Google Cloud describes BigQuery as a serverless data analytics platform that does not require users to provision individual instances or virtual machines. | Medium | SP010, SP011 |
| CP025 | BigQuery pricing defaults to on-demand billing per TiB scanned and generally provides up to 2,000 concurrent shared slots per project. | Medium | SP011 |
| CP026 | BigQuery on-demand query pricing lists $6.25 per tebibyte and also offers capacity pricing per slot-hour with BigQuery editions and autoscaling. | Medium | SP011 |
| CP027 | Google Cloud documentation says BigQuery-managed Apache Iceberg tables are designed as a foundation for interoperable lakehouse workflows. | Medium | SP012 |
| CP028 | Alphabet said Google Cloud revenue increased 30% to $12.0 billion in Q4 2024. | Medium | SP013 |
| CP029 | Microsoft Learn describes Fabric as an end-to-end analytics SaaS platform with data engineering, data factory, data science, real-time intelligence, data warehouse, and database workloads over a shared compute and storage model. | Medium | SP016 |
| CP030 | Microsoft Learn says Fabric uses OneLake as a centralized logical data lake and OneLake Catalog as a centralized discovery and governance experience. | Medium | SP016 |
| CP031 | Microsoft Learn says Fabric includes Copilot capabilities and Purview-backed governance, compliance, and auditing across workloads. | Medium | SP014, SP016 |
| CP032 | Microsoft pricing describes Fabric capacity as a shared pool of Capacity Units that can be bought on a pay-as-you-go or reservation basis. | Medium | SP015 |
| CP033 | Microsoft pricing says a one- or three-year Fabric reservation can save about 41% versus pay-as-you-go. | Medium | SP015 |
| CP034 | Microsoft pricing says Power BI Pro is still required for report publishers and consumers on smaller Fabric capacities, while F64/P1 or larger capacities can waive Pro for consumers. | Medium | SP015 |
| CP035 | Microsoft reported $29.9 billion of Intelligent Cloud revenue in fiscal Q4 2025, up 26% year over year. | Medium | SP017 |
| CP036 | AWS positions Amazon Redshift as a cloud data warehouse for analytics and agentic AI that can unify data across Redshift, S3 data lakes, and third-party or federated sources. | Medium | SP018 |
| CP037 | AWS pricing says Redshift Provisioned starts at $0.543 per hour and Redshift Serverless starts at $1.50 per hour. | Medium | SP019 |
| CP038 | AWS pricing says Redshift Serverless bills RPU-hours on a per-second basis with a 60-second minimum and reservations can reduce compute costs by up to 45%. | Medium | SP019 |
| CP039 | Amazon reported AWS segment sales of $28.8 billion in Q4 2024 and $107.6 billion in full-year 2024. | Medium | SP020 |
| CP040 | Confluent says its managed Flink offering unifies Apache Kafka and Apache Flink so Kafka topics become queryable Flink tables. | Medium | SP021 |
| CP041 | Confluent says its fully managed serverless Flink offering uses usage-based pricing calculated in CFUs consumed per minute. | Medium | SP021 |
| CP042 | Confluent pricing says serverless Kafka uses autoscaling eCKUs, with the first eCKU free and listed tiers starting at $2.25 with a two-eCKU minimum. | Medium | SP022 |
| CP043 | Confluent reported $922.1 million of fiscal-year 2024 subscription revenue and $963.6 million of total revenue. | Medium | SP023 |
| CP044 | Apache Spark describes itself as a unified engine for large-scale data analytics. | Medium | SP024 |
| CP045 | Trino describes itself as a distributed SQL query engine for big data. | Medium | SP025 |
| CP046 | Microsoft Learn says OneLake shortcuts can provide zero-copy access to Amazon S3 and Google Cloud Storage in addition to Azure storage. | Medium | SP016 |
| CP047 | Databricks says its lakehouse is built on open source and open standards including Apache Spark, Delta Lake, MLflow, and Delta Sharing. | Medium | SP001 |
| CP048 | BigQuery Iceberg documentation describes metadata snapshot export in Apache Iceberg V2 format and Spark-runtime access patterns for Iceberg tables. | Medium | SP012 |
| CP049 | AWS says Redshift can query data in open formats on Amazon S3 and open Redshift data to AWS and Apache Iceberg-compatible analytics engines through the SageMaker lakehouse. | Medium | SP018 |
| CP050 | Rill argues the competitive center of gravity is shifting from proprietary table formats toward managed Iceberg infrastructure and catalogs, which reduces vendor lock-in. | Medium | SP026 |
| CI001 | Databricks says its pricing is pay-as-you-go with no up-front costs and per-second billing granularity. | Medium | SI001 |
| CI002 | Microsoft says Azure Databricks bills customers for both provisioned virtual machines and Databricks Units based on the selected VM instance. | Medium | SI002 |
| CI003 | Microsoft says customers can save up to 37% over pay-as-you-go DBU prices by pre-purchasing Databricks Commit Units for one-year or three-year terms. | Medium | SI002 |
| CI004 | Microsoft says Azure Databricks does not charge DBUs while instances are idle in a pool, but cloud-instance billing still applies. | Medium | SI002 |
| CI005 | Microsoft Learn says some Azure Databricks serverless features use DBU multipliers, including a 2X multiplier for Data Quality Monitoring. | Medium | SI003 |
| CI006 | Microsoft Learn says SQL Serverless warehouse sizes range from 4 DBUs per hour at 2X-Small to 528 DBUs per hour at 4X-Large. | Medium | SI003 |
| CI007 | Microsoft Learn says CPU model serving bills one concurrent request per hour as 1 DBU per hour. | Medium | SI003 |
| CI008 | Microsoft Learn says AI Gateway inference tables bill 7.143 DBUs per 1 GB of payload. | Medium | SI003 |
| CI009 | Databricks said on September 8, 2025 that it crossed a $4 billion revenue run-rate growing more than 50% year over year. | Medium | SI005 |
| CI010 | Databricks said its AI products recently crossed a $1 billion revenue run-rate by September 2025. | Medium | SI005 |
| CI011 | Databricks said it had achieved positive free cash flow over the prior 12 months by September 2025. | Medium | SI005 |
| CI012 | Databricks said its September 2025 Series K raised $1 billion at a valuation above $100 billion. | Medium | SI005 |
| CI013 | Databricks said on February 9, 2026 that it crossed a $5.4 billion revenue run-rate with growth above 65% year over year. | High | SI006, SI010, SI011 |
| CI014 | Databricks said in February 2026 that its financing package exceeded $7 billion, including about $5 billion of equity at a $134 billion valuation and about $2 billion of additional debt capacity. | High | SI006, SI010, SI011 |
| CI015 | Databricks said in February 2026 that it delivered positive free cash flow over the prior 12 months. | High | SI006, SI011 |
| CI016 | Databricks said in February 2026 that its AI products crossed a $1.4 billion revenue run-rate. | High | SI006, SI010, SI011, SI012 |
| CI017 | Databricks said in February 2026 that more than 800 customers were consuming at over $1 million in annual revenue run-rate. | High | SI006, SI011, SI023 |
| CI018 | Databricks said in February 2026 that more than 70 customers were consuming at over $10 million in annual revenue run-rate. | High | SI006, SI011, SI023 |
| CI019 | CNBC reported in June 2025 that Databricks expected annualized revenue to reach $3.7 billion by July 2025. | Medium | SI009 |
| CI020 | CNBC reported Databricks generated $2.6 billion of revenue in the fiscal year that ended in January 2025. | Medium | SI009 |
| CI021 | CNBC reported in June 2025 that Databricks had a net retention rate above 140%. | Medium | SI009, SI023 |
| CI022 | CNBC reported that nearly 50 Databricks customers were spending over $10 million annually in the first quarter of fiscal 2026. | Medium | SI009 |
| CI023 | CNBC reported in June 2025 that Databricks was close to free-cash-flow positive in the most recent fiscal year. | Medium | SI009 |
| CI024 | Databricks said in December 2024 that it was raising $10 billion of expected non-dilutive financing, with $8.6 billion completed to date, at a $62 billion valuation. | Medium | SI004 |
| CI025 | Databricks said the December 2024 capital package was intended for AI products, acquisitions, international go-to-market expansion, and employee liquidity and related taxes. | Medium | SI004 |
| CI026 | Snowflake pricing describes a managed platform with elastic compute and separate storage charges. | Medium | SI013 |
| CI027 | Google says BigQuery on-demand analysis is priced at $6.25 per tebibyte above the first free tebibyte each month. | Medium | SI016 |
| CI028 | Google says BigQuery also offers capacity pricing in slots with pay-as-you-go autoscaling and optional one-year and three-year commitments. | Medium | SI016 |
| CI029 | AWS says Amazon Redshift Serverless starts at $1.50 per hour and bills RPU-hours on a per-second basis while the warehouse is active. | Medium | SI017 |
| CI030 | AWS says Amazon Redshift Serverless reservations can reduce compute costs by up to 45% for a three-year term or up to 24% for a one-year term. | Medium | SI017 |
| CI031 | Snowflake said in its FY2026 10-K that revenue was $4.7 billion and remaining performance obligations were about $9.8 billion, with about 46% expected to be recognized within 12 months. | Medium | SI014 |
| CI032 | Snowflake said cost of product revenue increased by $248.1 million in FY2026 mainly because of higher third-party cloud infrastructure expenses, including AI inference. | Medium | SI014 |
| CI033 | Snowflake said product gross margin was 72% in FY2026. | Medium | SI014 |
| CI034 | Confluent said in its 10-K that public-cloud provider pricing significantly influences its costs and gross margins and that higher cloud mix can hurt margins. | Medium | SI018 |
| CI035 | Confluent said its shift to a consumption-oriented sales model could create near-term financial volatility and that Confluent Cloud historically had a lower average price than Confluent Platform subscriptions. | Medium | SI018 |
| CI036 | Confluent said its Confluent Cloud land motions include free trial and pay-as-you-go entry points with no commitments, and some customers resist large long-term commitments. | Medium | SI018 |
| CI037 | Databricks said total cost of ownership on the platform has two core components: direct platform costs and underlying cloud infrastructure costs. | Medium | SI007 |
| CI038 | Databricks said FinOps and platform teams need unified views because Databricks and cloud cost data are fragmented across accounts, clusters, tags, and business units. | Medium | SI007 |
| CI039 | CloudForecast wrote in 2026 that Databricks pricing is confusing because DBUs, compute types, tiers, and separate infrastructure costs all contribute to the customer bill. | Medium | SI021 |
| CI040 | Mammoth wrote in 2026 that published Databricks pricing ranges from about $0.07 to $0.65+ per DBU plus separate cloud infrastructure charges. | Low | SI022 |
| CI041 | Mammoth wrote in 2026 that Databricks billing is pay-per-second with no upfront costs, but total spend includes DBUs plus cloud infrastructure and storage. | Medium | SI022 |
| CI042 | Revenue Brew reported in February 2026 that Databricks had reached a $5.4 billion revenue run-rate. | Medium | SI012 |
| CI043 | Revenue Brew reported in February 2026 that Databricks said AI products generated $1.4 billion in annualized revenue. | Medium | SI012 |
| CI044 | Sacra estimated Databricks gross margins were about 80% as of June 2024, down from about 85% a year earlier. | Low | SI020 |
| CI045 | Sacra said Databricks average contract value stood at $208,696 as of June 2024. | Low | SI020 |
| CI046 | Sacra says Databricks uses a B2B, consumption-based SaaS model where customers pay for compute, storage, and data processing usage rather than fixed licenses or seat counts. | Low | SI020 |
| CI047 | Sacra says Databricks cost inputs include cloud infrastructure, data processing, and compute resources from AWS, Azure, and Google Cloud. | Low | SI020 |
| CI048 | Revefi wrote in 2026 that Databricks consumption-based pricing makes spend harder to predict as Genie and Mosaic AI workloads create variable and spiky compute demand. | Medium | SI023 |
| CI049 | Databricks pricing and product pages present monetized surfaces that include data engineering, data warehousing, AI, business intelligence, application development, database, and security. | Medium | SI001 |
| CI050 | Databricks says AI/BI removes per-seat and per-license BI fees by embedding BI and conversational analytics directly into the platform. | Medium | SI008 |
| CE001 | Databricks presents itself as a data and AI platform for enterprises rather than a single analytics SKU. | Medium | SE001 |
| CE002 | Databricks' product surface spans lakehouse architecture, governance, serverless SQL analytics, AI governance, and operational database products. | Medium | SE002 |
| CE003 | Databricks markets its lakehouse platform across AWS, Azure, and GCP. | Medium | SE002 |
| CE004 | Unity Catalog is positioned as unified and open governance for data and AI. | Medium | SE003 |
| CE005 | Databricks claims Unity Catalog enforces discovery, access, quality-monitoring, and compliance controls across structured and unstructured data, ML models, and business metrics in any cloud. | Medium | SE003 |
| CE006 | Databricks advertises Unity Catalog support for open formats including Delta, Apache Iceberg, Hudi, and Parquet. | Medium | SE003 |
| CE007 | Databricks says Unity Catalog provides a unified catalog for structured data, unstructured data, business metrics, and AI models. | Medium | SE003 |
| CE008 | Unity Catalog offers row- and column-level access policies based on attributes and tags. | Medium | SE003 |
| CE009 | Unity Catalog provides end-to-end automated column-level lineage for data and AI assets. | Medium | SE003, SE011 |
| CE010 | Unity Catalog federates and governs external systems including MySQL, PostgreSQL, Salesforce, Redshift, Snowflake, BigQuery, and Hive Metastore without requiring migration. | Medium | SE003 |
| CE011 | AI/BI is described as AI-powered business intelligence that is natively integrated into the Databricks platform. | Medium | SE004 |
| CE012 | Databricks says the AI/BI product family includes dashboards, Genie, Databricks SQL, Databricks One, Genie Code, and Unity Catalog Business Semantics. | Medium | SE004 |
| CE013 | Databricks says AI/BI analytics run directly on governed data in Unity Catalog so metrics, lineage, and permissions stay aligned. | Medium | SE004 |
| CE014 | Databricks claims AI/BI carries no per-seat or per-license BI fees for users exploring data and dashboards. | Low | SE004 |
| CE015 | Databricks architecture guidance describes platform fundamentals in terms of control plane, compute plane, and storage components. | Medium | SE009 |
| CE016 | Azure Databricks accounts can manage multiple workspaces and multiple Unity Catalog metastores. | Medium | SE019 |
| CE017 | Databricks workspaces are the collaboration environment for ingestion, interactive exploration, scheduled jobs, and ML training. | Medium | SE019 |
| CE018 | Azure Databricks operates a control plane that Databricks manages outside the customer cloud account, and the web application lives in that control plane. | Medium | SE019, SE021 |
| CE019 | Azure Databricks uses different compute planes for serverless and classic compute: serverless runs in the Databricks account, while classic compute runs in the customer Azure subscription. | Medium | SE019 |
| CE020 | Databricks recommends a medallion architecture in which bronze, silver, and gold layers progressively improve data quality and structure. | Medium | SE010 |
| CE021 | Databricks' medallion example ingests raw data from cloud storage, Kafka, and Salesforce into bronze before validation in silver and enrichment in gold. | Medium | SE010 |
| CE022 | Mosaic AI Model Serving is Databricks' interface for deploying, governing, and querying AI and ML models for real-time serving and batch inference. | Medium | SE012 |
| CE023 | Mosaic AI Model Serving exposes served models as REST APIs and automatically scales with serverless compute for availability and latency management. | Medium | SE012 |
| CE024 | Databricks says external models from providers such as OpenAI and Anthropic can be centrally governed through model-serving endpoints. | Medium | SE012 |
| CE025 | Serverless SQL warehouses do not have public IP addresses. | Medium | SE013 |
| CE026 | Serverless SQL requires Premium-plan-or-higher workspaces and separate acceptance of serverless terms of service. | Medium | SE013 |
| CE027 | The Databricks release-notes index was updated on 2026-05-04 and includes feature-specific notes for Databricks SQL, Lakeflow Spark Declarative Pipelines, and serverless compute. | Medium | SE014 |
| CE028 | Lakebase is marketed as an operational Postgres database for AI agents and applications that is integrated with the lakehouse. | Medium | SE005 |
| CE029 | Lakebase advertises decoupled compute and storage, point-in-time recovery, scale-to-zero autoscaling, and database branching. | Medium | SE005 |
| CE030 | Databricks says Lakebase keeps operational data connected to the lakehouse through Unity Catalog governance and one-click data sync. | Medium | SE005 |
| CE031 | Databricks Trust says security capabilities include encryption, network controls, auditing, identity integration, access controls, and data governance. | Medium | SE006 |
| CE032 | Databricks publicly lists FedRAMP, GDPR, HIPAA, PCI-DSS, ISO 27001/27017/27018/27701, and SOC as supported compliance frameworks or attestations. | Medium | SE007 |
| CE033 | Databricks says its due diligence package includes ISO certificates and an annual penetration-test confirmation letter, while SOC 3 is public and SOC reports refresh in June, August, and December. | Medium | SE008 |
| CE034 | Google Cloud markets Databricks on Google Cloud as scalable, secure, and cost-effective, with access to Gemini, BigQuery, open-source tools, and multicloud patterns. | Medium | SE028 |
| CE035 | Google Cloud says Databricks announced Unity Catalog support for reading and writing managed Apache Iceberg tables across catalogs. | Medium | SE027, SE022 |
| CE036 | NVIDIA documents RAPIDS acceleration on Databricks for pandas, Spark, and Dask, including GPU plugins for driver-and-worker Spark clusters. | Medium | SE026 |
| CE037 | The Databricks CLI GitHub repository showed 332 stars, 165 forks, and 263 releases, with v0.299.0 dated 2026-04-30. | Medium | SE016 |
| CE038 | The Databricks SDK for Python PyPI package showed version 0.106.0 released on 2026-04-30 and requiring Python 3.10 or newer. | Medium | SE018 |
| CE039 | The Databricks SDK for Python repository says Runtime 13.1 includes a bundled SDK and that authentication supports Databricks-native, Azure-native, and GCP-native flows. | Medium | SE017 |
| CE040 | Databricks launched LakeFlow in June 2024 as a built-in data-engineering product for ingestion, transformation, and orchestration across databases and SaaS sources. | Medium | SE023 |
| CE041 | theCUBE Research reported that Databricks' 2025 summit centered on open lakehouse architecture, unified governance, and AI democratization. | Medium | SE022 |
| CE042 | theCUBE Research said 2025 summit announcements included business-semantics direction for Unity Catalog plus GenAI, agent, Iceberg, and Lakebase updates. | Medium | SE022 |
| CE043 | VentureBeat reported in February 2026 that Lakebase was generally available, built on technology from Neon and Mooncake, and designed to make operational writes queryable by analytics engines without ETL. | Medium | SE025 |
| CE044 | TechCrunch reported in March 2026 that Databricks launched Lakewatch, a new security product that performs SIEM-style detection and investigation with AI agents. | Medium | SE024 |
| CE045 | On 2026-05-05 the Databricks AWS status page showed an active incident with compute partially disrupted in multiple regions while AI/BI and Databricks Apps remained operational. | Medium | SE015 |
| CE046 | ServiceAlert.ai showed 100% uptime over the prior 88 days for Databricks but no detailed incident data, limiting independent verification of outage severity or root causes. | Low | SE029 |
| CE047 | Microsoft's production-planning guidance recommends security, governance, and multi-workspace design before production Azure Databricks deployment and suggests serverless workspaces for initial exploration. | Medium | SE020 |
| CE048 | lakeFS describes Databricks as a hybrid PaaS in which a single-tenant data plane runs in the customer cloud account while a multi-tenant control plane remains with Databricks. | Low | SE021 |
| CU001 | Databricks said in February 2026 that more than 20,000 organizations worldwide rely on the platform and that 70% of the Fortune 500 are customers. | Medium | SU004 |
| CU002 | Databricks said in September 2025 that more than 20,000 organizations worldwide relied on the platform, including Block, Comcast, Condé Nast, Rivian, and Shell. | Medium | SU003 |
| CU003 | CNBC reported in June 2025 that Databricks had more than 15,000 customers. | Medium | SU005 |
| CU004 | Databricks said in September 2025 that it had 650-plus customers consuming more than $1 million of annual revenue run-rate. | Medium | SU003 |
| CU005 | Databricks said in February 2026 that it had more than 800 customers consuming more than $1 million of annual revenue run-rate. | Medium | SU004 |
| CU006 | Databricks said in February 2026 that it had more than 70 customers consuming more than $10 million of annual revenue run-rate. | Medium | SU004 |
| CU007 | CNBC reported in June 2025 that nearly 50 Databricks customers were spending more than $10 million annually in the first quarter of the new fiscal year. | Medium | SU005 |
| CU008 | Databricks said in September 2025 that its net retention rate remained above 140 percent. | Medium | SU003 |
| CU009 | Databricks said in February 2026 that its net retention rate remained above 140 percent. | Medium | SU004 |
| CU010 | CRN reported in February 2026 that Databricks sustained net retention greater than 140 percent. | Medium | SU006 |
| CU011 | The Databricks July 2025 summit recap said hundreds of customers, including 7-Eleven, Fox Sports, and Rivian, presented active use cases at Data + AI Summit 2025. | Medium | SU002 |
| CU012 | Databricks said 7-Eleven uses the platform to run a multipurpose agentic marketing assistant across more than 13,000 stores. | Medium | SU002 |
| CU013 | Databricks said 7-Eleven used assessments and workflows to simplify a Unity Catalog migration. | Medium | SU002 |
| CU014 | A Databricks Events YouTube session exists for 7-Eleven on using Mosaic AI to create a multi-purpose agentic marketing assistant. | Low | SU019 |
| CU015 | Databricks said FOX Sports built Cleatus AI to answer fan questions in natural language using live scores, stats, and commentary. | Medium | SU002 |
| CU016 | Databricks said FOX Sports achieved a 2x higher query-success rate for fans using its AI-powered search experience. | Medium | SU002, SU016 |
| CU017 | The FOX Sports Databricks customer story says AI-powered search more than doubled its success rate while delivering more personalized and timely insights to fans. | Medium | SU016 |
| CU018 | A Databricks Events YouTube page exists for a FOX Sports session on reimagining the fan experience with the Databricks Data Intelligence Platform. | Low | SU021 |
| CU019 | Databricks said Mastercard uses the platform to deploy AI responsibly across teams, platforms, and partners while automating onboarding support with a GenAI assistant. | Medium | SU002 |
| CU020 | Databricks said Mastercard used Delta Lake to cut query time by 80 percent and storage by 70 percent, and used Workflows to reduce pipeline processing from months to days. | Medium | SU002 |
| CU021 | Mastercard said its new product onboarding assistant was built in collaboration with Databricks on the Data Intelligence Platform. | Medium | SU011 |
| CU022 | Mastercard said the onboarding assistant uses retrieval-augmented generation and a human-in-the-loop feedback loop. | Medium | SU011 |
| CU023 | Mastercard said it uses machine-learning models to analyze more than 143 billion transactions per year. | Medium | SU011 |
| CU024 | In a September 2025 Mastercard story, Arsalan Tavakoli said the Mastercard product onboarding assistant significantly sped up onboarding and that churn in the process had come down. | Medium | SU012 |
| CU025 | The current Databricks customer story page for Mastercard frames the account as a responsible-AI and governance deployment at global payments scale. | Low | SU017 |
| CU026 | Databricks said Insulet used the platform to achieve 12x faster real-time data processing, 83 percent fewer SQL queries, and 97 percent lower total cost of ownership. | Medium | SU002 |
| CU027 | The Insulet Databricks customer story says adopting Databricks delivered 12x faster data processing and 97 percent lower total cost of ownership. | Medium | SU018 |
| CU028 | The Insulet Databricks customer story says Lakeflow Connect automated ingestion from enterprise applications including Salesforce and Workday. | Medium | SU018 |
| CU029 | Microsoft said AT&T achieved a five-year ROI of 300 percent after migrating to Azure Databricks. | Medium | SU007 |
| CU030 | Microsoft said AT&T reduced more than 80 schemas and accelerated its data-science cycles by about three times after migrating to Azure Databricks. | Medium | SU007 |
| CU031 | Microsoft said AT&T now supports nearly 90,000 internal customers on one data architecture and can spin up new computing environments in hours rather than three to four months. | Medium | SU007 |
| CU032 | Databricks said AT&T and Databricks built AutoClassify, an end-to-end system for automatic multi-head binary classification from unlabeled text. | Medium | SU002 |
| CU033 | A Databricks Events YouTube page exists for an AT&T AutoClassify customer session. | Low | SU020 |
| CU034 | PeerSpot listed 93 Databricks reviews on the reviewed page. | Medium | SU008 |
| CU035 | A PeerSpot reviewer said Databricks had become very expensive for their team and was less forgiving than Snowflake when implemented inefficiently. | Low | SU008 |
| CU036 | PeerSpot summarized Databricks as frequently expensive for enterprise buyers because costs vary with usage, compute time, and data processed. | Low | SU008 |
| CU037 | The archived Capterra Databricks page showed 17 reviews. | Low | SU009 |
| CU038 | Capterra review text said Databricks can feel overwhelming for new users and that initial setup and connections require an experienced professional. | Low | SU009 |
| CU039 | A Capterra review said Databricks pricing was fairly expensive and connecting Azure Data Lake required workarounds. | Low | SU009 |
| CU040 | Databricks said in a Gartner Peer Insights recap that AI/BI earned a 4.8 out of 5 star rating and 94 percent willingness to recommend from 167 verified customer reviews as of September 30, 2025. | Medium | SU024 |
| CU041 | FeaturedCustomers said Databricks had 631 reviews, 457 case studies, and 128 customer videos on its platform. | Low | SU010 |
| CU042 | Microsoft describes Azure Databricks as a Spark-based data and AI platform optimized for Microsoft Azure that works with Power BI, Azure AI Foundry, Power Platform, and other Microsoft services. | Medium | SU013 |
| CU043 | Google Cloud says Databricks on Google Cloud is available on Marketplace and offers enterprise capabilities for AI-driven outcomes. | Medium | SU014 |
| CU044 | SAP said SAP Business Data Cloud includes SAP Databricks as a first-party data service and brings the power of Databricks directly into SAP Business Data Cloud. | Medium | SU015 |
| CU045 | PR Newswire reported that Databricks received FedRAMP High agency authority to operate on AWS GovCloud in April 2024 and that Azure Databricks already held FedRAMP High and IL5 authorizations. | Medium | SU022 |
| CU046 | The current Databricks AI-customer page still highlights Mastercard and Rivian video references on the customer surface. | Low | SU025 |
| CR001 | Databricks’ Privacy Notice says it applies to websites, applications, platform services, events, sales, and marketing activities. | Medium | SR001 |
| CR002 | Databricks’ Privacy Notice says California residents have additional rights under the CCPA. | Medium | SR001 |
| CR003 | Databricks says it uses large language models and other AI tools for certain uses of collected information in accordance with applicable law. | Medium | SR001 |
| CR004 | Databricks says it uses European Commission Standard Contractual Clauses, supplementary measures, and a DPA with SCCs for customer transfers. | High | SR001, SR002 |
| CR005 | Databricks says it is certified to the EU-U.S., UK, and Swiss Data Privacy Frameworks. | Medium | SR001 |
| CR006 | Databricks offers a downloadable, electronically signable Data Processing Addendum for customers that require one. | Medium | SR002 |
| CR007 | Databricks says its due-diligence package includes ISO certifications, an annual pen-test confirmation letter, an Enterprise Security Guide, and a SOC 2 Type II report. | Medium | SR003 |
| CR008 | Databricks documentation says security and compliance are a shared responsibility between Databricks, the customer, and the cloud provider. | Medium | SR004 |
| CR009 | Databricks documentation says the Enhanced Security and Compliance add-on includes controls for FedRAMP High, FedRAMP Moderate, and HIPAA. | High | SR030, SR005 |
| CR010 | The FedRAMP Marketplace lists Databricks on Azure Commercial as FedRAMP Certified, Class D (High), Rev5, as of 2026-01-16. | Medium | SR007 |
| CR011 | The EUR-Lex AI Act summary says the regulation applies from 2 August 2026, while some governance, penalty, and general-purpose AI model obligations start on 2 August 2025. | Medium | SR008 |
| CR012 | The EUR-Lex AI Act summary says providers of general-purpose AI models face documentation, downstream-information, training-data disclosure, and possible additional risk-management and cybersecurity duties. | Medium | SR008 |
| CR013 | CourtListener shows In Re Mosaic LLM Litigation is a live federal copyright case involving Databricks, with a last known filing on 2026-04-29. | Medium | SR009 |
| CR014 | Internet Cases reports that the court allowed plaintiffs to amend the complaint to add direct copyright infringement claims against Databricks tied to DBRX. | Medium | SR010 |
| CR015 | The Register reported on 2026-04-29 that Judge Breyer denied Databricks’ motion to dismiss and allowed authors’ claims to continue. | Medium | SR011 |
| CR016 | The Register says plaintiffs allege DBRX inherited risk from Mosaic’s MPT lineage through RedPajama and Books3, with potential statutory damages up to $150,000 per work if willful infringement is proven. | Medium | SR011 |
| CR017 | CFM Lawyers says a proposed class action was filed in British Columbia and Quebec on 2025-07-24 against Databricks and MosaicML over Books3 and The Pile training-data allegations. | Medium | SR012 |
| CR018 | Databricks operates a public status page that provides high-level availability information across Databricks services and regions. | Medium | SR006 |
| CR019 | Databricks and Azure Databricks documentation says Delta Lake can be used to manage GDPR and CCPA compliance workflows. | High | SR005, SR030 |
| CR020 | IsDown says Azure Databricks had 20 incidents in the last 90 days, including 1 major outage and 19 minor incidents, with a median duration of 1 hour 33 minutes. | Medium | SR027 |
| CR021 | IsDown says it has documented 173 Azure Databricks outages and incidents since January 2023, averaging 4.4 per month, with typical resolution time of 177 minutes. | Medium | SR027 |
| CR022 | IsDown says it monitors the official Azure Databricks status page across 11 components. | Medium | SR027 |
| CR023 | Databricks said in its Series K announcement that it had launched or expanded partnerships with Microsoft, Google Cloud, Anthropic, SAP, and Palantir in the prior two quarters. | Medium | SR014 |
| CR024 | Databricks said its Google Cloud partnership makes Gemini models native Databricks products billable through Databricks contracts. | Medium | SR021 |
| CR025 | Databricks said the Google Cloud partnership lets customers use Gemini on enterprise data under Unity Catalog governance without data replication. | Medium | SR021 |
| CR026 | Databricks and Anthropic announced a strategic five-year partnership to offer Claude natively through Databricks across AWS, Azure, and Google Cloud Platform. | Medium | SR023 |
| CR027 | SAP said SAP Business Data Cloud natively embeds Databricks for data engineering, machine learning, and AI workloads. | Medium | SR022 |
| CR028 | Microsoft Fabric markets a complete data platform with AI-powered tools, a unified lake, autonomous databases, and shared resilience, security, governance, and compliance. | Medium | SR025 |
| CR029 | Amazon EMR markets serverless Spark, Trino, and Flink analytics plus a unified data-and-AI environment inside AWS with cost and performance claims. | Medium | SR026 |
| CR030 | Snowflake’s FY2026 10-K says its AI Data Cloud runs across three major public clouds and 53 regional deployments and includes cross-cloud business-continuity capabilities. | Medium | SR024 |
| CR031 | TechCrunch reported that Databricks closed $10 billion of Series J equity financing at a $62 billion valuation in January 2025 and also added $5.25 billion of debt financing. | Medium | SR013 |
| CR032 | TechCrunch reported that Databricks planned to use its January 2025 financing for new AI products, global go-to-market expansion, acquisitions, and employee liquidity. | Medium | SR013 |
| CR033 | Databricks said its August 2025 Series K term sheet valued the company at more than $100 billion. | Medium | SR014 |
| CR034 | CRN reported that Databricks closed a $1 billion Series K round at a valuation above $100 billion in September 2025. | Medium | SR015 |
| CR035 | TechCrunch reported that Databricks raised more than $4 billion at a $134 billion valuation in December 2025, up 34% from $100 billion three months earlier. | High | SR016, SR020 |
| CR036 | TechCrunch reported that Databricks was investing heavily in Lakebase and Agent Bricks and had struck model-access deals worth hundreds of millions with Anthropic and OpenAI. | Medium | SR016 |
| CR037 | CNBC reported in January 2026 that Databricks landed $1.8 billion of fresh debt and had access to more than $7 billion of debt. | Medium | SR017 |
| CR038 | CNBC reported in January 2026 that Databricks’ December round implied a $134 billion valuation alongside $4.8 billion of run-rate revenue growing more than 55% year over year and positive free cash flow. | High | SR017, SR016 |
| CR039 | CNBC reported in February 2026 that Databricks completed $5 billion of funding plus $2 billion of new debt capacity at a $134 billion valuation. | High | SR018, SR019 |
| CR040 | CNBC reported in February 2026 that Databricks’ annualized revenue exceeded $5.4 billion for the January quarter, up 65% year over year, while delivering free cash flow over the prior year. | High | SR018, SR019 |
| CR041 | CRN reported in February 2026 that Databricks’ AI-products revenue run-rate exceeded $1.4 billion and that the company had 800 customers above a $1 million annual run-rate and 70 customers above a $10 million annual run-rate. | Medium | SR019 |
| CR042 | Databricks’ legal center is the company’s public hub for legal documents, privacy FAQs, service terms, and compliance resources. | Medium | SR028 |
| CR043 | Databricks’ trust and privacy center positions privacy, trust, and subprocessor-related materials as a public diligence surface for customers. | Medium | SR029 |
| CR044 | Because advanced compliance controls sit in a named add-on and public-sector authorization is explicitly tied to Databricks on Azure Commercial, Databricks’ public compliance coverage is strong but not obviously uniform across all clouds and tiers. | Medium | SR007, SR009, SR030 |
| CR045 | Databricks’ AI roadmap now depends on external model partners, hyperscalers, and embedded channels, so partner concentration can affect product availability, economics, and account control even while it speeds distribution. | Medium | SR021, SR022, SR023, SR025, SR026 |
| CR046 | The jump from a $62 billion valuation in January 2025 to more than $100 billion in August and September 2025 and $134 billion between December 2025 and February 2026 leaves less room for execution misses or a delayed IPO. | Medium | SR013, SR014, SR015, SR016, SR018 |
| CR047 | Databricks’ simultaneous pushes into Lakebase, Agent Bricks, AI apps, and strategic partner integrations increase execution complexity relative to a narrower lakehouse product story. | Medium | SR014, SR016, SR023 |
| CR048 | Databricks’ public documentation surface is stronger than that of many private AI infrastructure peers, but it reduces diligence friction rather than eliminating litigation, outage, dependency, or valuation risk. | Medium | SR001, SR002, SR003, SR006, SR028, SR029 |
| CV001 | Databricks announced a Series J financing on 2024-12-17 at a $62 billion valuation. | Medium | SV001 |
| CV002 | Databricks said the Series J package targeted $10 billion of expected non-dilutive financing and had completed $8.6 billion to date. | Medium | SV001 |
| CV003 | Databricks said in December 2024 that it expected to cross a $3 billion revenue run-rate in the quarter ending 2025-01-31. | Medium | SV001 |
| CV004 | Databricks said in December 2024 that the quarter ending 2025-01-31 would mark its first positive free-cash-flow quarter. | Medium | SV001 |
| CV005 | Databricks announced on 2025-08-19 that it had signed a Series K term sheet valuing the company at more than $100 billion. | Medium | SV002 |
| CV006 | Databricks announced on 2025-12-16 that it was raising more than $4 billion in a Series L round at a $134 billion valuation. | Medium | SV003, SV004, SV005 |
| CV007 | Databricks said it crossed a $4.8 billion revenue run-rate in Q3 2025. | Medium | SV003, SV007 |
| CV008 | Databricks said Q3 2025 revenue was growing by more than 55% year over year. | Medium | SV003, SV007 |
| CV009 | Databricks said its AI products reached more than a $1 billion revenue run-rate by Q3 2025. | Medium | SV003, SV007 |
| CV010 | Databricks said its Data Warehousing business had reached more than a $1 billion revenue run-rate by Q3 2025. | Medium | SV003 |
| CV011 | Databricks said it had delivered positive free cash flow over the previous 12 months as of the Series L announcement. | Medium | SV003, SV006 |
| CV012 | Databricks said net retention remained above 140% at the time of the Series L announcement. | Medium | SV003, SV006 |
| CV013 | Databricks said more than 700 customers were already consuming over $1 million of annual revenue run-rate by December 2025. | Medium | SV003 |
| CV014 | CNBC reported that Databricks’ $134 billion Series L valuation was a 34% jump from the valuation implied by the August 2025 financing. | Medium | SV004 |
| CV015 | TechCrunch described the December 2025 Series L as Databricks’ third major venture fundraise in less than a year. | Medium | SV005 |
| CV016 | CRN reported that Databricks had surpassed a $5.4 billion annual revenue run-rate by the quarter ended 2026-01-31. | Medium | SV006, SV007 |
| CV017 | CRN reported that Databricks grew 65% year over year in the quarter ended 2026-01-31. | Medium | SV006, SV007 |
| CV018 | CRN reported that Databricks’ AI products exceeded a $1.4 billion revenue run-rate in the quarter ended 2026-01-31. | Medium | SV006, SV007 |
| CV019 | CRN reported that Databricks had 800 customers above a $1 million annual run-rate and 70 customers above a $10 million annual run-rate by February 2026. | Medium | SV006, SV007 |
| CV020 | CRN reported that the latest Databricks financing stack exceeded $7 billion, including about $5 billion of equity and about $2 billion of additional debt capacity. | Medium | SV006, SV007 |
| CV021 | CRN noted that Databricks still does not disclose detailed financial statements publicly despite reporting run-rate and growth snapshots. | Medium | SV006 |
| CV022 | Sacra estimated that AI products represented about 26% of Databricks’ January 2026 annualized revenue run-rate. | Medium | SV007 |
| CV023 | Sacra said Databricks was reporting 80% gross margins in June 2024, down from 85% a year earlier. | Low | SV007 |
| CV024 | Forbes wrote that Databricks was trading at roughly 25x forward revenue when it carried a $100 billion valuation against a $4 billion annual run-rate in October 2025. | Medium | SV008 |
| CV025 | Forbes wrote in October 2025 that Snowflake was trading at roughly 18x forward revenue on about $79 billion of market capitalization and expected fiscal-2026 revenue of $4.395 billion. | Low | SV008 |
| CV026 | Forbes argued that steep software valuation multiples came under pressure when public-market growth decelerated, using Snowflake as a cautionary example. | Medium | SV008 |
| CV027 | CompaniesMarketCap listed Snowflake’s market capitalization at $49.85 billion as of May 2026. | Medium | SV011 |
| CV028 | Snowflake reported $4.4723 billion of fiscal-2026 product revenue. | Medium | SV010, SV013 |
| CV029 | CompaniesMarketCap listed MongoDB’s market capitalization at $21.27 billion as of May 2026. | Medium | SV015 |
| CV030 | MongoDB reported $2.01 billion of fiscal-2025 total revenue. | Medium | SV014, SV017 |
| CV031 | CompaniesMarketCap listed Confluent’s market capitalization at $11.13 billion as of May 2026. | Medium | SV018 |
| CV032 | Macrotrends listed Confluent’s 2025 annual revenue at $1.167 billion. | Medium | SV019 |
| CV033 | CompaniesMarketCap listed Elastic’s market capitalization at $5.24 billion as of May 2026. | Medium | SV020 |
| CV034 | Macrotrends listed Elastic’s 2025 annual revenue at $1.483 billion. | Medium | SV021, SV022 |
| CV035 | CompaniesMarketCap listed Cisco’s market capitalization at $365.87 billion as of May 2026. | Medium | SV024 |
| CV036 | Macrotrends listed Cisco’s 2025 annual revenue at $56.654 billion. | Medium | SV025 |
| CV037 | Cisco’s September 2023 merger filing said the Splunk acquisition would pay $157.00 per share in cash and value the acquired equity at about $28 billion. | Medium | SV023 |
| CV038 | CompaniesMarketCap listed Palantir’s market capitalization at $350.05 billion as of May 2026. | Medium | SV026 |
| CV039 | Macrotrends listed Palantir’s 2025 annual revenue at $4.475 billion. | Medium | SV027 |
| CV040 | CompaniesMarketCap listed ServiceNow’s market capitalization at $94.84 billion as of May 2026. | Medium | SV028 |
| CV041 | Macrotrends listed ServiceNow’s 2025 annual revenue at $13.278 billion. | Medium | SV029 |
| CV042 | ServiceNow reported $3.671 billion of Q1 2026 subscription revenue and $27.7 billion of remaining performance obligations. | Medium | SV030 |
| CV043 | Using Databricks’ disclosed $134 billion valuation and $4.8 billion run-rate implies roughly a 27.9x valuation-to-run-rate multiple. | Medium | SV003 |
| CV044 | Using Databricks’ disclosed $134 billion valuation and $5.4 billion run-rate implies roughly a 24.8x valuation-to-run-rate multiple. | Medium | SV006, SV007 |
| CV045 | Using current May 2026 market capitalization and fiscal-2026 product revenue implies Snowflake trades around 11.1x revenue. | Medium | SV011, SV013 |
| CV046 | Using current May 2026 market capitalization and latest annual revenue implies MongoDB trades around 10.6x revenue. | Medium | SV015, SV017 |
| CV047 | Using current May 2026 market capitalization and latest annual revenue implies Confluent trades around 9.5x revenue. | Medium | SV018, SV019 |
| CV048 | Using current May 2026 market capitalization and latest annual revenue implies Elastic trades around 3.5x revenue. | Medium | SV020, SV021 |
| CV049 | Using current May 2026 market capitalization and latest annual revenue implies Palantir trades around 78.2x revenue. | Medium | SV026, SV027 |
| CV050 | Using current May 2026 market capitalization and latest annual revenue implies ServiceNow trades around 7.1x revenue. | Medium | SV028, SV029 |
| CV051 | Public sources support that Databricks’ IPO timing remained discretionary into early 2026: management would not rule out 2026, but no filing timeline or audited S-1 process was public. | Medium | SV004, SV006, SV009 |
| CV052 | A reasonable base-case valuation range is about $110 billion to $145 billion if Databricks reaches roughly $6.0 billion to $6.6 billion run-rate while public comp multiples stay in the high-single-digit to low-double-digit range. | Medium | SV006, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029 |
| CV053 | A bull case above the current mark requires Databricks to preserve an AI premium while scaling toward roughly $8 billion or more of run-rate, supporting a valuation range around $180 billion to $220 billion. | Medium | SV003, SV006, SV007, SV026, SV027, SV028, SV029 |
| CV054 | A bear case of roughly $55 billion to $85 billion is plausible if growth slows toward mature-software levels and Databricks rerates toward the 10x to 15x range visible in public data-platform comps. | Medium | SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021 |
| CV055 | At a $134 billion entry price, Databricks offers limited base-case upside and therefore fits a track posture better than a buy posture on public evidence alone. | Medium | SV004, SV006, SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029 |
| CV056 | The main thesis-break triggers are multiple compression, year-over-year growth falling below 55%, failure to convert AI mix into durable economics, or disclosure that a preference stack materially reduces common-equity upside. | Medium | SV003, SV006, SV007, SV008 |
| CV057 | The most material remaining diligence asks are the cap table and preference stack, audited revenue-to-run-rate bridge, debt terms, customer concentration, and AI-product gross margin. | Medium | SV006, SV007, SV009 |
| CV058 | The comparable sample is appropriate only as a partial reference set: the Databricks figure is a private post-money valuation, while the public comps are current market-cap snapshots tied to different revenue definitions. | Medium | SV003, SV006, SV008, SV011, SV013, SV015, SV017, SV018, SV019, SV020, SV021, SV028, SV029 |
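
The multiple arithmetic in CV043 through CV050 can be reproduced directly from the valuation, market-cap, and revenue figures in the rows above. The following is a minimal Python sketch, not the model behind the register. It assumes the inputs exactly as listed in CV006, CV007, CV016, and CV027 through CV041. The names `INPUTS`, `revenue_multiple`, and `MULTIPLES` are hypothetical and exist only for this illustration.

```python
# All figures in USD billions, copied from the claim rows above.
# Each entry: (valuation or market cap, revenue or run-rate).
# Assumption: Snowflake uses fiscal-2026 product revenue (CV028), as CV045 does.
INPUTS = {
    "Databricks @ $4.8B run-rate": (134.0, 4.8),    # CV006, CV007
    "Databricks @ $5.4B run-rate": (134.0, 5.4),    # CV006, CV016
    "Snowflake": (49.85, 4.4723),                   # CV027, CV028
    "MongoDB": (21.27, 2.01),                       # CV029, CV030
    "Confluent": (11.13, 1.167),                    # CV031, CV032
    "Elastic": (5.24, 1.483),                       # CV033, CV034
    "Palantir": (350.05, 4.475),                    # CV038, CV039
    "ServiceNow": (94.84, 13.278),                  # CV040, CV041
}


def revenue_multiple(value_bn: float, revenue_bn: float) -> float:
    """Return value divided by revenue; both arguments in the same units."""
    if revenue_bn <= 0:
        raise ValueError("revenue must be positive")
    return value_bn / revenue_bn


# Round to one decimal place to match the precision used in CV043-CV050.
MULTIPLES = {
    name: round(revenue_multiple(value, revenue), 1)
    for name, (value, revenue) in INPUTS.items()
}

if __name__ == "__main__":
    # Print the highest multiple first.
    for name, multiple in sorted(MULTIPLES.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<30} {multiple:>5.1f}x")
```

Because the Databricks numerator is a post-money valuation and the public numerators are market capitalizations, the outputs are comparable only with the caveats in CV058.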