Startup Diligence
Diligence report | AI safety / interpretability tools | Series B private | 2026-06-10

Goodfire

Interpretability-native model design lab with elite backing and still-unproven commercial scale

Goodfire looks like a category-defining interpretability company, but the public record still does not justify underwriting the February 2026 valuation as a clear bargain.

Cover facts

Latest public valuation
$1.25B [CV001]
Latest round
$150M [CV001]
Disclosed capital raised
~$207M [CO021, CI005]
Recommendation
research-more [CV047]

Company profile

Goodfire is a San Francisco-based AI interpretability company and public benefit corporation building a model-design environment for understanding, debugging, and steering neural networks. The company sells selective enterprise and research partnerships around Silico/Ember-style interpretability workflows for frontier model teams, healthcare and scientific AI programs, and other high-stakes deployments, but public disclosure still leaves revenue quality and customer breadth mostly opaque.

Website
www.goodfire.ai
Founded
2024 (implied by financing timeline; one independent profile says 2023)
Founders
Eric Ho, Daniel Balsam, Tom McGrath
Founding location
San Francisco, California, USA
Headquarters
San Francisco, California
Product
Goodfire's product is a model-design environment that exposes model internals, helps diagnose failure modes, supports steering and monitoring, and is increasingly packaged around selective enterprise and scientific deployments.
Customers
Frontier model builders, enterprise AI teams, life-sciences and scientific AI groups, and other high-stakes model developers.
Business model
Selective design-partner and enterprise software engagements built around platform access, pilots, and high-touch research or field-engineering support.
Stage
Series B private
Funding status
$150 million Series B announced in February 2026 at a $1.25 billion valuation after earlier seed and Series A rounds.
[CO001, CO003, CO004, CO018, CO021, CI007, CI008, CI009]

Executive summary

Top strengths

  • Goodfire has unusually strong research credibility and a differentiated interpretability-native product thesis.
  • The cap table includes high-signal investors and strategic backers spanning frontier AI and enterprise software.
  • Early flagship partnerships in healthcare, scientific AI, and enterprise design-partner workflows show real wedge potential.

Top risks

  • Public disclosure still does not show ARR, revenue quality, standardized pricing, retention, or customer concentration.
  • The company is valued as a future infrastructure winner before proving repeatable software economics.
  • Adjacent observability, guardrail, and platform vendors may satisfy many buyer budgets without requiring Goodfire's deeper tooling.

Open gaps

  • NDA-level disclosure is still needed on recurring revenue, pricing architecture, and software-versus-services mix.
  • The post-Series-B preference stack, ownership structure, and any secondary or debt features remain undisclosed.
  • Public materials do not provide a verified customer count, headcount, or concentration profile.

Contents

Chapter 01

01 Company Overview

1.1 Identity, mission, and product positioning

Goodfire presents itself as a research company using interpretability to understand, learn from, and design AI systems, and multiple official and financing sources describe it as a San Francisco-based public benefit corporation. The company’s central thesis is that frontier AI is still built too much as a black box, so its mission is to make models understandable, debuggable, and shapeable rather than relying on scale alone. Official materials consistently frame the business around a “model design environment” that helps users inspect model internals, diagnose failure modes, and intervene on behavior at the feature or circuit level. The product story has matured over time. Series A materials in 2025 centered on Ember as Goodfire’s flagship interpretability platform, while by 2026 the public-facing product page markets Silico as the first platform for intentional model design. The go-to-market motion appears selective rather than mass-market: Goodfire says it works with Fortune 500 enterprises, major healthcare institutions, and AI research labs, and its public product copy repeatedly targets organizations training or fine-tuning foundation models. Public evidence therefore supports a company identity that combines research lab, platform vendor, and design-partner model, with customer concentration and commercial scale still largely undisclosed.[CO001, CO002, CO003, CO004, CO005, CO024]

Snapshot KPI table
Metric | Value / status | Date | Confidence | Gap / note
Headquarters | San Francisco, California | 2026-06-10 | high | Official and investor materials agree; careers page specifies Telegraph Hill office
Organization type | Public benefit corporation | 2026-06-10 | high | Repeated in official and financing materials
Current stage | Private, Series B stage | 2026-06-10 | medium | Private status disclosed; stage inferred from latest financing
Latest round | $150M Series B | 2026-02-05 | high | Led by B Capital
Latest valuation | $1.25B | 2026-02-05 | high | Repeated across official and third-party coverage
Total disclosed capital | ~$207M; publicly rounded to >$200M | 2026-02-05 | medium | Sum of disclosed seed, A, and B rounds
Founding date | null | 2026-06-10 | low | 2024 is implied by seed timing and Series A language, but one independent profile says 2023
Current product brand | Silico | 2026-04-30 | medium | Earlier 2025 materials used Ember; product naming evolved
Revenue / ARR | null | 2026-06-10 | low | No public revenue or ARR disclosed in reviewed sources
Customer count | null | 2026-06-10 | low | No public customer count disclosed; only broad customer categories named
Employee count | null | 2026-06-10 | low | No official headcount disclosed; one independent profile estimates ~51 employees as of Jan 2026
Disclosed customer profile | Fortune 500 enterprises, major healthcare institutions, AI research labs | 2026-06-10 | medium | Named logos and contract counts remain sparse

Null values mark unsupported public metrics rather than zero. Funding and valuation are well corroborated, while founding date, headcount, revenue, and customer count remain incomplete or indirect.

[CO001, CO003, CO004, CO008, CO009, CO020]
FO002: Company snapshot logic

How Goodfire links research identity, product architecture, partner types, capital, and execution dependencies.

[CO002, CO003, CO004, CO005, CO024, CO026]

1.2 Founders, leadership, and organizational profile

The founding team publicly centers on three cofounders: Eric Ho as CEO, Daniel Balsam as CTO, and Tom McGrath as chief scientist. Across investor and company materials, Ho is the primary public spokesperson and strategy voice; Balsam appears as the technical operator translating interpretability into product and applied research; and McGrath supplies heavyweight scientific credibility as the former founder of Google DeepMind’s interpretability team. Menlo and Salesforce materials also tie Ho and Balsam to prior operating experience at RippleMatch, reinforcing the narrative that Goodfire mixes frontier-research pedigree with startup execution. The broader team profile is also part of the investment case. Goodfire and its backers highlight alumni from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego, plus named contributors such as Nick Cammarata and Leon Bergen. However, public leadership disclosure is incomplete: reviewed materials do not provide a full C-suite roster, detailed board composition, or ownership map. Even the founding date has some ambiguity. Financing materials imply the company was founded in 2024 because the Series A was said to arrive less than a year after founding and Lightspeed publicly announced a seed round in August 2024, but one independent profile describes Goodfire as founded in 2023. The office footprint disclosed publicly is also narrow: Goodfire’s careers page states roles are in person five days a week at its Telegraph Hill office in San Francisco.[CO006, CO007, CO008, CO009, CO010, CO011]

Leadership and founder table
Person | Role | Background | Founder-market fit / coverage | Key-person dependency
Eric Ho | CEO, co-founder | Former founder/operator at RippleMatch; public face of Goodfire in financing and media coverage | Sets company narrative, fundraising, partnerships, and commercial positioning for interpretability | High: external narrative and investor confidence are tightly linked to Ho
Daniel Balsam | CTO, co-founder | Former AI and engineering leader at RippleMatch; appears in Mayo and investor materials as technical operator | Bridges frontier interpretability research into product and applied genomics/enterprise use cases | High: core technical execution and productization sit heavily with Balsam
Tom McGrath | Chief Scientist, co-founder | Former founder of Google DeepMind’s interpretability team; repeatedly cited as scientific anchor | Supplies research credibility, agenda-setting, and technical recruiting power | High: scientific brand and category authority rely materially on McGrath
Nick Cammarata | Senior interpretability researcher / marquee team member | Core contributor to the seminal OpenAI interpretability team | Signals that Goodfire can recruit from the small global pool of top interpretability talent | Medium: not sole decision-maker, but valuable for research legitimacy

Coverage is partial because reviewed sources do not disclose a full board, finance leadership, or complete management roster. Table focuses on publicly named founders and high-signal technical leadership.

[CO009, CO010, CO011, CO012, CO013, CO014]

1.3 Funding history, investor base, and current stage

Goodfire has raised capital unusually quickly for a research-first infrastructure company. Public sources show a $7 million seed round led by Lightspeed in August 2024, a $50 million Series A led by Menlo Ventures in April 2025, and a $150 million Series B led by B Capital at a $1.25 billion valuation in February 2026. The Series A syndicate included returning seed lead Lightspeed alongside Anthropic, B Capital, Work-Bench, Wing, and South Park Commons. The Series B brought in DFJ Growth, Salesforce Ventures, and Eric Schmidt as new investors alongside returning backers, including Juniper Ventures. Goodfire and third-party coverage consistently round the cumulative funding to “over $200 million,” while simple addition of the disclosed rounds implies roughly $207 million. The investor mix matters as much as the dollars. Anthropic’s participation in the Series A is a strategic signal from a safety-oriented frontier lab. Salesforce Ventures’ participation points to an enterprise-software adoption angle. B Capital’s lead role at Series B suggests a belief that interpretability may become a major infrastructure layer. Still, the public record is thin on ownership stakes, liquidation structure, debt, secondaries, and board seats. Reviewed public sources label Goodfire as private, and the company should be treated as a late-venture, Series B-stage private business rather than a scaled commercial software company. That distinction matters because the financing pace and valuation far outstrip the company’s disclosed revenue and customer metrics.[CO016, CO017, CO018, CO019, CO020, CO021]

Stakeholder or investor map
Stakeholder | Role | Control or economic importance | Diligence ask
Lightspeed Venture Partners | Seed lead; Series A and B participant | Earliest institutional lead and continuing backer; likely influential in early governance | Confirm current ownership, pro rata rights, and any board seat
Menlo Ventures | Series A lead; Series B participant | Key financial sponsor at the first large institutional round and visible public champion | Confirm board role, follow-on reserve usage, and any protective provisions
Anthropic | Series A participant | Strategic investor whose presence signals safety and interpretability relevance to frontier labs | Clarify whether investment includes technical collaboration, channel value, or simple financial exposure
B Capital | Series B lead; Series A participant | Lead investor at the $1.25B valuation; likely major board and governance influence post-Series B | Confirm ownership percentage, board seat, liquidation terms, and any commercial introduction rights
Juniper Ventures | Series B existing investor | Named as returning investor at Series B but less visible in earlier public materials | Determine entry round, ownership, and influence relative to better-known VCs
DFJ Growth | Series B new investor | Adds late-venture scale capital and potential follow-on capacity | Assess whether DFJ view is platform-infrastructure or frontier-model optionality
Salesforce Ventures | Series B new investor and strategic enterprise partner | Signals enterprise software and procurement relevance, not just research backing | Clarify whether Salesforce provides channel access, product partnerships, or board observation rights
Eric Schmidt | Series B angel / strategic investor | Adds brand and policy credibility disproportionate to likely check size | Determine whether Schmidt is passive capital or active network participant
Wing Venture Capital | Series A and B participant | Continuing venture support from infrastructure-oriented investor base | Confirm stake and any role in product go-to-market guidance
South Park Commons | Series A and B participant; early ecosystem sponsor | Important ecosystem backer given Goodfire’s early office history and talent network | Clarify talent pipeline and whether SPC provided incubation before formal founding
Work-Bench | Series A participant | Adds enterprise-software pattern recognition at earlier stage | Determine whether Work-Bench remains active post-Series B
Mayo Clinic / design partners | Strategic non-investor stakeholders | Partners matter economically because commercial proof appears to rely on selective high-stakes collaborations | Request signed customer references, paid pilot status, and renewal dynamics

Investor map is exhaustive for explicitly named public stakeholders, not for the full cap table. Exact ownership, board representation, liquidation preferences, and secondary activity are not publicly disclosed in the reviewed sources.

[CO016, CO017, CO018, CO020, CO022, CO023]
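The ownership map and preference stack are undisclosed. Even so, the two public figures allow a rough bound on what the Series B syndicate collectively bought. This report does not specify whether the $1.25B valuation is pre-money or post-money, so the sketch computes both cases. It assumes the round was entirely primary equity, with no option-pool refresh, secondaries, or debt. The result is the syndicate's combined stake, not B Capital's individual percentage, which remains a diligence ask.

```python
def implied_round_stake(round_usd_m: float, valuation_usd_m: float, post_money: bool) -> float:
    """Fraction of equity bought collectively by a priced round's investors.

    Assumption: the round is 100% primary equity with no option-pool refresh,
    secondaries, or debt; none of these are publicly disclosed for Goodfire.
      post-money valuation: stake = round / valuation
      pre-money valuation:  stake = round / (valuation + round)
    """
    post_money_value = valuation_usd_m if post_money else valuation_usd_m + round_usd_m
    return round_usd_m / post_money_value


if __name__ == "__main__":
    SERIES_B_USD_M = 150      # reported round size
    VALUATION_USD_M = 1_250   # reported valuation; pre/post basis not stated here
    for label, is_post in (("post-money", True), ("pre-money", False)):
        stake = implied_round_stake(SERIES_B_USD_M, VALUATION_USD_M, is_post)
        print(f"If $1.25B is {label}: Series B syndicate holds ~{stake:.1%}")
```

Under these assumptions, the Series B investors would hold roughly 10.7% to 12.0% collectively. Any pool expansion or secondary component would move the actual figure.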
FO003: Snapshot KPIs

A compact maturity snapshot emphasizing financing, stage, and the limited public disclosure of operating metrics.

Total disclosed capital is a simple sum of the public $7M seed, $50M Series A, and $150M Series B. The figure intentionally omits revenue and customer-count KPIs because public sources do not support them.

[CO020, CO021, CO026, CO029, CO030, CO032]
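The ~$207M figure is a plain sum of the three disclosed rounds. The sketch below shows that arithmetic. It assumes the public round sizes are the complete record and excludes any undisclosed debt, secondaries, or grants.

```python
# Disclosed primary rounds from public sources, in USD millions.
# Assumption: these three rounds are the complete public record; debt,
# secondaries, and grants are not disclosed and are therefore excluded.
DISCLOSED_ROUNDS_USD_M = {
    "Seed (2024-08, Lightspeed lead)": 7,
    "Series A (2025-04, Menlo Ventures lead)": 50,
    "Series B (2026-02, B Capital lead)": 150,
}


def total_disclosed_capital(rounds: dict[str, float]) -> float:
    """Return the simple sum of disclosed round sizes."""
    return sum(rounds.values())


if __name__ == "__main__":
    total = total_disclosed_capital(DISCLOSED_ROUNDS_USD_M)
    for name, amount in DISCLOSED_ROUNDS_USD_M.items():
        print(f"{name:<40} ${amount:>4}M")
    print(f"{'Total disclosed':<40} ${total:>4}M")
    # Public coverage rounds this total to "over $200 million".
    print("Consistent with 'over $200M':", total > 200)
```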

1.4 Chronology, cover metrics, and key diligence risks

The public chronology is short but dense. Goodfire emerged with seed financing in 2024, announced its Series A in April 2025, and publicized the Mayo Clinic collaboration in September 2025. It launched a fellowship program and field-building educational content in late 2025, then announced its Series B and broader intentional design agenda in February 2026. By April 2026, MIT Technology Review covered Silico as a commercial product for debugging and steering models. By May 2026, Goodfire was emphasizing SOC 2 Type II certification and a growing enterprise-facing posture. This sequence shows a company trying to convert cutting-edge interpretability research into product and partnership credibility in under two years. The key cover-metric pattern is asymmetry: valuation and capital raised are well supported, while operating metrics remain sparse. No reviewed public source discloses revenue, ARR, or customer count. Headcount is not officially disclosed; one independent profile estimates about 51 employees as of January 2026. That opacity matters because the most credible adverse evidence in the source set concerns execution risk rather than misconduct. MIT Technology Review quotes an external interpretability researcher arguing that Goodfire adds “precision to the alchemy” rather than turning AI engineering into a fully principled science. Separately, an independent health-tech analysis argues the Series B valuation is aggressive for a research-first company with early commercial traction. The diligence burden is therefore less about headline credibility and more about commercial proof, governance disclosure, and how quickly interpretability demand converts into repeatable software revenue.[CO025, CO026, CO027, CO030, CO031, CO032]

Milestone table
Date | Event | Type | Amount / valuation / status | Participants | Implication
2024-08-15 | Lightspeed publicly announces leading Goodfire’s seed round | founding | $7M seed | Goodfire; Lightspeed Venture Partners | Establishes the first public financing marker and anchors a 2024 operating timeline
2025-04-17 | Goodfire announces Series A and Ember platform | financing | $50M Series A | Menlo Ventures lead; Lightspeed, Anthropic, B Capital, Work-Bench, Wing, South Park Commons | Moves the company from seed research lab to institutionally backed platform narrative
2025-09-09 | Goodfire announces Mayo Clinic collaboration for genomic medicine | partnership | Collaboration announced | Goodfire; Mayo Clinic | Expands relevance from core interpretability research into healthcare and clinical AI
2025-10-09 | Goodfire opens fall fellowship program | scale | Fellowship cohort recruitment | Goodfire research staff | Signals active talent build-out and field-building beyond core founder bench
2025-12-11 | Goodfire shares Stanford guest lectures on interpretability | governance | Educational release | Goodfire researchers; Stanford course community | Shows thought leadership and effort to shape the discipline around its agenda
2026-02-05 | Goodfire announces Series B and intentional design agenda | financing | $150M at $1.25B valuation | B Capital lead; Juniper, Menlo, Lightspeed, DFJ Growth, Salesforce Ventures, Eric Schmidt and others | Validates investor appetite and sharply reprices the company as category infrastructure
2026-04-30 | MIT Technology Review covers Silico public launch | product | Fee-based product launch / release coverage | Goodfire; MIT Technology Review | Marks transition from research platform narrative toward broader commercial productization
2026-04-30 | External researcher cautions that Silico adds “precision to the alchemy” | adverse | Skeptical expert commentary | Leonard Bereska; MIT Technology Review | Introduces skepticism that the product fully solves the scientific uncertainty it claims to address
2026-05-22 | Goodfire announces SOC 2 Type II certification | regulatory | Compliance certification announced | Goodfire | Supports enterprise procurement and trust posture for handling sensitive model-development workflows
2026-06-10 | Public customer profile remains selective rather than broad-market | scale | Fortune 500 / healthcare / research-lab usage stated; no broad metrics | Goodfire; unnamed customers | Commercial story still depends on quality of design partners more than disclosed volume metrics

Timeline emphasizes dated events visible in public materials. Some items represent public disclosure dates rather than the underlying operational start date, which remains partially unresolved for founding and commercial scale.

[CO017, CO016, CO018, CO024, CO026, CO027]
FO001: Company milestone timeline

Key public milestones from Goodfire’s seed emergence through Series B, product launch, and the first meaningful skeptical coverage.

[CO017, CO016, CO018, CO024, CO026, CO035]

1.5 Exhibits

Chapter 02

02 Market Analysis

2.1 Market boundary and evidence-constrained sizing

Goodfire's relevant market is narrower than headline AI enthusiasm suggests. Its own materials describe a product stack built around understanding model internals, debugging failures, steering behavior, shaping training, and in some cases monitoring production behavior. That boundary excludes generic copilots, general application observability, and AI infrastructure spend that never reaches a model-design workflow. The closest public analogs are LLM observability and evaluation vendors such as Arize, Fiddler, Datadog, LangSmith, Langfuse, Humanloop, Arthur, and Patronus. Even those mostly instrument prompts, traces, sessions, and outputs rather than model parameters or latent representations. Because Goodfire does not disclose pricing, customer count, or revenue, a classic TAM-SAM-SOM stack would overstate precision. The evidence-constrained approach is to use multiple lenses instead:

- Macro demand signals that show AI usage and ROI pressure are spreading.
- Published adjacent-tool pricing that establishes a software-budget floor for teams already buying observability and eval products.
- An access lens that narrows the reachable market to organizations able to provide model internals and tolerate a services-heavy pilot motion.

That combination supports a real but selective market, with more near-term substance in advanced model teams than in generic enterprise AI narratives.[CM001, CM002, CM015, CM016, CM017, CM020]

Market definition table
Segment / category | Included spend | Excluded spend | Buyer / payer | Relevance
Frontier-lab interpretability and model design | Interpretability research infrastructure, steering workflows, training-shaping tools, safety diagnostics, and production monitors tied to owned models | Generic AI infrastructure, pure inference hosting, and generic app analytics | Research leads, safety teams, and frontier-model R&D budgets | Most natural direct segment because labs own model internals and already value interpretability
Enterprise model engineering and governance | Debugging, eval, steering, and monitoring for proprietary or open-weight enterprise models | Teams using only third-party closed APIs with no internal-model access | VP Engineering, AI platform leaders, ML infra, and advanced product budgets | Reachable when enterprises run or fine-tune their own important models
Scientific AI and life-sciences model design | Model decoding, validation, confounder removal, and discovery workflows in genomics, biology, and robotics | General lab software, wet-lab tools, and non-model R&D software | Scientific-program leads, computational biology teams, and research budgets | Strong fit where internal-model understanding changes scientific or deployment quality
Regulated and high-consequence adopters | Interpretability, governance, and validation layers for finance, healthcare, legal, or safety-critical AI | Commodity workplace copilots and generic knowledge-worker subscriptions | Clinical, compliance, risk, or domain-operations budgets with technical sponsorship | High-need segment, but harder to close because procurement and evidence burdens are heavier
Adjacent LLM observability and evaluation stack | Tracing, prompt management, evals, experiments, and guardrails already budgeted in production AI teams | Deep parameter or latent-space control when vendors only observe outputs or traces | Developer tooling, platform engineering, and MLOps budgets | Important adjacency because these budgets define the closest public comparison set

The market boundary is intentionally narrow: it follows the spend that can plausibly land in model-internal understanding, steering, and validation workflows rather than all generative-AI software or infrastructure.

[CM020, CM021, CM022, CM028, CM029, CM030]
Sizing lens table (in place of TAM/SAM/SOM)
Publisher | Year | Geography | Value | CAGR | Methodology | Confidence | Limitation
Gartner | 2025 | Global | Trough of Disillusionment (qualitative maturity lens) | n/a | Hype-cycle lens for implementation realism and ROI dispersion | high | Useful for timing and caution, but not a market-size number.
PwC | 2025 | Global | 100% of industries increasing AI usage; 3x higher revenue-per-worker growth in AI-exposed industries | n/a | Macro adoption and productivity lens | high | Adoption breadth is real, but it does not isolate interpretability-tool budgets.
Arize + Langfuse | 2026 | Global SaaS | $348-$600 annual list price per small team before heavy usage | n/a | Bottom-up adjacent pricing lens from public self-serve plans | high | Trace-and-eval tooling is adjacent, not the same as model-internal design tooling.
Langfuse | 2026 | Global SaaS | $29,988 annual enterprise list price before volume adders | n/a | Public enterprise list-price lens | high | One vendor datapoint does not reveal Goodfire pricing or win rates.
Fiddler | 2026 | Global SaaS | $0.002 per trace | n/a | Usage-based observability lens | high | Spend depends entirely on trace volume and still reflects output-trace observability, not interpretability work.
Goodfire direct market lens | 2026 | Selective design partners | Undisclosed / case-by-case | n/a | Direct commercial lens from MIT reporting plus Goodfire pilot agreement | medium | No public ACV, customer-count, or pipeline data exist for a true TAM-SAM-SOM build.

This table intentionally mixes qualitative maturity signals and adjacent pricing proxies because public evidence does not support a clean Goodfire TAM-SAM-SOM. The point is to bound the market with observable lenses rather than invent a top-down number.

[CM015, CM016, CM017, CM019, CM024, CM025]
FM001: Market sizing lens

Public evidence supports a large AI-demand backdrop, a visible adjacent observability budget layer, and a much narrower direct Goodfire capture layer defined by model access and high-touch pilots.

This is a constrained lens stack rather than a numeric TAM-SAM-SOM waterfall. Only the adjacent-budget layer has visible public pricing; Goodfire's direct commercial layer is undisclosed.

[CM020, CM019, CM024, CM026, CM044, CM045]
FM002: Market estimate range

Public pricing only supports a bottom-up range for adjacent software budgets; Goodfire's direct ACV remains undisclosed, so these figures are comparison proxies rather than Goodfire revenue estimates.

All values are adjacent-market price proxies, not Goodfire prices. The usage-based row is derived directly from Fiddler's published per-trace rate using explicit 100k, 1M, and 10M annual trace scenarios.

[CM024, CM025, CM026, CM047, CM048, CM049]
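The usage-based row in FM002 can be reproduced directly from the published $0.002-per-trace rate. The sketch below assumes strictly linear pricing, with no tiers, volume discounts, or platform minimums. It places each scenario against the self-serve and Langfuse enterprise list-price anchors from the sizing table. All figures are adjacent-market proxies, not Goodfire prices.

```python
"""Adjacent-budget proxy: annual spend at Fiddler's published per-trace rate.

These are comparison proxies for output-trace observability budgets, not Goodfire prices.
"""

FIDDLER_USD_PER_TRACE = 0.002                 # published per-trace rate cited in the sizing table
SELF_SERVE_USD = (348, 600)                   # Arize + Langfuse self-serve annual list range
LANGFUSE_ENTERPRISE_USD = 29_988              # Langfuse annual enterprise list price before adders
SCENARIOS = (100_000, 1_000_000, 10_000_000)  # annual trace volumes used in FM002


def annual_trace_cost(traces_per_year: int, usd_per_trace: float = FIDDLER_USD_PER_TRACE) -> float:
    """Assumed linear usage pricing: traces x rate, with no tiers, discounts, or minimums."""
    return traces_per_year * usd_per_trace


def compare_to_list_prices(annual_usd: float) -> str:
    """Place a usage-based estimate relative to the published list-price anchors."""
    low, high = SELF_SERVE_USD
    if annual_usd < low:
        return "below self-serve range"
    if annual_usd <= high:
        return "within self-serve range"
    if annual_usd < LANGFUSE_ENTERPRISE_USD:
        return "above self-serve, below enterprise list"
    return "at or above enterprise list"


if __name__ == "__main__":
    for traces in SCENARIOS:
        cost = annual_trace_cost(traces)
        print(f"{traces:>12,} traces/yr -> ${cost:>8,.0f}/yr ({compare_to_list_prices(cost)})")
```

Under that linear assumption, 100k, 1M, and 10M traces cost about $200, $2,000, and $20,000 per year. Even the 10M-trace scenario stays below the $29,988 Langfuse enterprise list price.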

2.2 Buyer segmentation, budget owners, and adoption path

The clearest buyers are teams that both control important models and can expose enough internal state for Goodfire to do meaningful work. Frontier labs sit at the top of that list because they already run interpretability efforts, have research and safety staff who can use the tooling, and face direct pressure to shape model behavior. Enterprise model teams come next when they own proprietary or open-weight models and can justify specialized tooling through AI-platform or advanced-engineering budgets. Scientific AI teams in genomics, biology, robotics, and other research-heavy domains are especially relevant because interpretability can validate whether predictions are driven by real structure or shortcuts and can surface domain knowledge humans can reuse. Regulated adopters have strong need, but the combination of privacy, governance, and evidence requirements makes them slower to close. The payer is not always the end user. Research leads, CTOs, platform heads, or scientific-program owners may buy; model scientists, safety teams, and computational researchers use; and central AI R&D, platform, or research-program budgets pay. Public legal and product evidence implies a pilot-first motion: identify a high-stakes model problem, secure model and data access, run interpretability or steering work in a shared environment, prove a control or validation outcome, and only then expand into longer-term monitoring or licensing. That motion fits a high-touch, design-partner market better than a mass self-serve software motion.[CM003, CM005, CM006, CM009, CM010, CM011]

Segment / buyer map
Segment | Buyer | User | Payer / workflow | Budget owner | Adoption trigger
Frontier labs | Chief scientist, interpretability lead, safety lead | Interpretability researchers, model scientists, safety engineers | Research program around training control, alignment, and failure analysis | Frontier-model R&D and safety budgets | Need to debug, steer, or align internally developed frontier models
Enterprise model teams | CTO, VP Engineering, AI platform lead | Applied scientists, ML engineers, eval teams | Owned or fine-tuned model programs with reliability or control needs | AI platform, infrastructure, or advanced product budgets | High-value model workflow where traces are insufficient and deeper control matters
Life sciences / scientific AI teams | Research director, computational biology lead, scientific founder | Computational scientists, modelers, translational research teams | Scientific discovery or validation workflow tied to owned foundation models | Research program or disease-area budget | Need to validate that model predictions reflect real mechanisms, not confounders
Regulated adopters | Clinical, legal, compliance, or risk executive with technical sponsor | Domain experts, review teams, model-risk staff | Pilot around high-consequence decision support or specialized model governance | Domain budget plus governance oversight | Need for transparent, auditable behavior before broader deployment

The buyer-user-payer split matters because Goodfire is sold as a high-touch capability layer. In every segment, the best trigger is a high-value model that the customer controls deeply enough to inspect.

[CM028, CM029, CM030, CM031, CM032, CM033]
FM003: Buyer / segment map

Goodfire's best near-term segments combine high need for interpretability with real access to model internals; regulated adopters have strong need but weaker immediate reach.

The matrix is an evidence-based ordinal synthesis from public product, legal, research, and independent reporting. It measures relative reachability, not disclosed revenue.

[CM028, CM029, CM030, CM031, CM036, CM046]
FM004: Adoption funnel/value-chain map

Goodfire's public materials imply a pilot-first value chain that starts with a high-stakes model problem and expands only after model access and interpretability work prove value.

The sequence comes from Goodfire's legal pilot agreement, product pages, and Series B narrative. Public sources do not disclose stage-by-stage conversion rates.

[CM009, CM010, CM032, CM036, CM044]

2.3 Growth drivers, constraints, and valuation relevance

Demand-side conditions are favorable. PwC shows that AI-exposed industries are generating materially higher revenue per worker and paying a large skills premium, which suggests real willingness to fund tools that make AI systems more effective. At the same time, adjacent vendors repeatedly frame observability, guardrails, and evaluation as business-critical because autonomous systems now touch revenue, operations, and user experience. That helps Goodfire because it means the budget conversation already exists; the company does not need to invent the importance of reliability or control from scratch. Its scientific and regulated use cases also line up with the places where output-only evaluation is least sufficient and where deeper interpretability has the most strategic value. The brakes are equally important. Gartner says ROI varies widely and hidden implementation costs can be large. NIST-style governance expectations, data privacy rules, and clinical or scientific validation standards all slow deployment. Most importantly, Goodfire's own story and independent reporting agree that the field is still technically immature: the company markets precision engineering, but external critics and even Goodfire's own research papers acknowledge that interpretability still has major open problems. Combined with the requirement for model-internal access and the lack of public pricing or customer data, that means valuation should anchor on a selective high-value wedge rather than a mass-market software assumption.[CM016, CM017, CM018, CM019, CM034, CM035]

Growth drivers and constraints table
Driver / constraint | Direction | Timing | Implication | Diligence ask
Higher-stakes AI deployment | up | current | As AI touches science, healthcare, and autonomous workflows, demand rises for deeper validation and control | Ask which current customers use Goodfire for pre-deployment validation versus post hoc analysis.
Productivity and labor pressure | up | 12-24 months | Firms that see real AI productivity gains are more willing to fund tooling that increases model reliability | Request proof that Goodfire shortens debugging or post-training iteration cycles enough to justify budget.
Adjacent observability budget normalization | up | current | Tracing, evals, and guardrails are already funded categories, making the budget conversation easier | Ask how often Goodfire sells alongside LangSmith, Datadog, Langfuse, or similar platforms.
Scientific discovery upside | up | 12-36 months | Biology and robotics cases broaden the market beyond software teams if outcomes prove repeatable | Request revenue split and renewal evidence for scientific customers or partners.
Model-access dependence | down | current | Closed-model customers are harder to serve because Goodfire needs deeper access than most API-only users can provide | Request the pipeline split between open-weight, proprietary in-house, and closed-API prospects.
Governance and validation burden | down | current and rising | Regulated buyers may value interpretability most, but their procurement cycles are longest | Request average sales cycle and security or governance review time by segment.
Technical immaturity of mechanistic interpretability | down | 12-36 months | Debate over how close the field is to precision engineering can cap budget urgency | Request benchmark evidence that Goodfire changes outcomes on production tasks, not just research demos.
Opaque Goodfire pricing and customer disclosure | down | current | Without public price and customer data, outside investors must underwrite a selective rather than broad-market story | Request ACV bands, pilot-to-license conversion, and customer-count disclosures by cohort.

The key underwriting question is not whether demand exists, but whether Goodfire can convert a real need for control into repeatable commercial deployments faster than access limits, governance friction, and field immaturity slow adoption.

[CM016, CM018, CM019, CM034, CM035, CM036]

2.4 Exhibits

Chapter 03

03 Competitors

3.1 Landscape by competitor class

Goodfire sits in an unusual competitive slot. Its public product language is not about post-hoc prompt monitoring or generic LLM telemetry; it is about intentional model design, feature steering, targeted failure correction, and programmatic access to model internals. MIT Technology Review frames Silico as a mechanistic-interpretability tool that puts techniques previously concentrated inside Anthropic, OpenAI, and Google DeepMind into the hands of smaller firms and research teams. That makes internal frontier-lab interpretability groups and sophisticated in-house research teams the closest direct alternatives for buyers building or adapting open-weight models. The broader commercial landscape is more crowded but more indirect. Arize Phoenix, LangSmith, Langfuse, Datadog, Fiddler, and Arthur all compete for budget tied to trustworthy AI development, as Humanloop did before its move into Anthropic, yet their default control point is tracing, evaluation, guardrails, or governance around deployed systems rather than deep editing of learned representations. The practical implication is that Goodfire should be judged less like another observability dashboard and more like a new tooling layer for model builders who need mechanistic understanding before, during, and after training.[CP001, CP002, CP004, CP006, CP007, CP008]

Competitor profile table
competitor | category | scale / funding signal | target segment | key differentiation | key limitation versus Goodfire
Goodfire / Silico | Mechanistic-interpretability-native model design | Raised $150M Series B at $1.25B valuation; ~$209M total funding disclosed | Teams building or adapting open-weight and domain-specific models | Programmatic access to model internals, feature steering, data attribution, and pre-deployment failure diagnosis | Public pricing, win-rate, and installed-base evidence are sparse relative to adjacent tooling vendors
Frontier-lab internal interpretability teams (Anthropic / OpenAI / Google DeepMind) | Direct incumbent / internal build | Embedded inside frontier labs rather than sold as a stand-alone product | Frontier model builders with closed-weight access | Deepest access to proprietary models and internal research talent | Unavailable as a commercial product for most buyers; not a purchasable vendor
Arize Phoenix | Adjacent open-source tracing and eval platform | Open-source product; AX Pro starts at $50/month with enterprise tier | AI engineers building agents and LLM applications | Tracing, evals, datasets, experiments, and open-source entry point | Focuses on agent development observability rather than mechanistic editing of model internals
Fiddler AI | Adjacent enterprise observability / guardrails vendor | Free tier, $0.002 per trace developer plan, enterprise deployment options | Enterprises needing monitoring, policy, and governance for AI systems | Unified observability, custom evaluators, real-time guardrails, SaaS/VPC/on-prem options | Competes at the monitoring and control-plane layer, not the feature-level model-design layer
Arthur | Adjacent lifecycle reliability and governance vendor | Enterprise AI platform with monitoring and policy workflow proofs on page | Enterprises managing agents, GenAI, and traditional ML together | Continuous evals, policies, guardrails, dashboards, and oversight across the AI lifecycle | Little public evidence of mechanistic interpretability or targeted internal model editing
Datadog LLM Observability | Incumbent observability platform | Free 40K LLM spans/month; Pro starts at $160/month with 100K spans | Existing Datadog customers extending APM into AI delivery | Bundles agent observability with backend monitoring, experiments, data retention, and enterprise controls | Best suited to operating production AI systems, not to reverse engineering model representations
LangChain LangSmith | Adjacent developer workflow incumbent | Free tier for development and small production; paid plans scale with trace volume | Teams already building on LangChain or multi-framework agent stacks | Strong agent tracing, SDK breadth, framework adjacency, and debugging workflows | Public page describes observability, not mechanistic model editing or training-data attribution
Langfuse | Adjacent open-source AI engineering platform | 10B+ observations/month; 100k+ engineers; free plus $29/$199/$2499 self-serve plans | Developers wanting OSS tracing, evals, prompts, and production feedback loops | OpenTelemetry base, self-hosting, transparent pricing, and large OSS distribution | Economic and developer-workflow strength does not translate into Goodfire-style internal model control
Humanloop (historical) | Adjacent eval / prompt management vendor | Free trial with 50 eval runs and 10K logs/month; now joining Anthropic and sunsetting | Teams evaluating models and managing prompts for trustworthy LLM apps | Prompt management, evaluation metrics, private deployment add-ons | No longer an independent platform, which underscores category consolidation risk
Weights / Weave (historical) | Adjacent tooling vendor absorbed by frontier lab | Products wound down after team joined OpenAI | Creators and model builders using earlier Weights products | Demonstrates that AI tooling talent can be absorbed by frontier labs | No longer a live independent competitor; mainly a signal of category absorption
In-house black-box workflow | Status-quo substitute / internal build | Engineering labor plus commodity open-source or point tools | Teams unwilling to buy a new vendor category | Flexible and initially cheap: prompting, evals, fine-tuning, and guardrails can be assembled incrementally | Keeps teams in guess-and-check loops with limited mechanistic evidence on why a model failed

Profile set intentionally mixes direct, incumbent, adjacent, historical, and substitute options because Goodfire competes for a job-to-be-done, not a single analyst-defined software category.

[CP001, CP006, CP007, CP017, CP018, CP019]
FP001: Competitive positioning map

Ordinal positioning shows Goodfire furthest toward mechanistic model control, while Datadog, Fiddler, LangSmith, and Langfuse score higher on deployment-observability breadth.

X-axis is mechanistic access / direct model editability from 1 (surface-level observability only) to 5 (deep model-internal access). Y-axis is deployment and distribution breadth from 1 (narrow research workflow) to 5 (broad installed-base or platform reach). Scores are evidence-backed ordinals synthesized from reviewed source pages, not benchmark measurements.

[CP001, CP006, CP007, CP017, CP019, CP021]

3.2 Adjacent vendors: capabilities, packaging, and budget overlap

The adjacent vendor set is commercially relevant because it competes for the same buyer conversation around trustworthy AI, but the products are usually anchored to different workflows. Arize Phoenix emphasizes open-source tracing, evals, datasets, and experiments for agent development. Fiddler and Arthur lean into lifecycle observability, guardrails, policies, and governance. Datadog folds agent observability into a much larger application-monitoring estate, which is important because that installed base can make “good enough” AI oversight easier to buy than a stand-alone platform. LangSmith and Langfuse both push developer workflow and production debugging; Langfuse, in particular, combines a strong open-source posture with transparent self-serve pricing, while LangSmith advertises a free tier and trace-volume billing. Humanloop historically targeted development, prompt management, and evaluation for trustworthy LLM apps, but its move into Anthropic shows the category can be absorbed by model labs rather than remain independent. Relative to these vendors, Goodfire looks differentiated on mechanistic access and targeted model editing, but thinner on public pricing, installed base, and broadly deployed observability surfaces.[CP017, CP018, CP019, CP020, CP021, CP022]

Feature / capability matrix
buying criterion | Goodfire | Frontier labs internal teams | Arize Phoenix | Fiddler AI | Arthur | Datadog | LangSmith | Langfuse | Humanloop (historical) | In-house black-box stack
Mechanistic access to model internals | strong | strong | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | limited
Targeted steering or editing of learned features | strong | strong | unsupported / unknown | limited | limited | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | limited via prompt or fine-tune only
Training-data attribution or probe workflows | strong | strong | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unknown
Production tracing / experiments / eval loop | limited | unknown | strong | strong | strong | strong | strong | strong | strong | partial
Real-time guardrails / policy enforcement | limited | unknown | limited | strong | strong | limited | limited | limited | limited | partial
Open-source or self-host path | limited public evidence | no commercial path | strong | limited | unknown | limited | unknown | strong | limited | strong
Enterprise deployment / compliance controls | emerging / limited public proof | internal only | strong | strong | strong | strong | unknown | strong | strong historically | depends on internal team
Domain-specific scientific model workflows | strong | limited public proof | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | custom if team builds it

Cells are evidence-backed qualitative judgments from reviewed product pages only; unsupported or absent capability disclosures are marked as unknown rather than inferred.

[CP001, CP002, CP007, CP008, CP011, CP012]
Pricing / packaging comparison
offering | public price / contract model | packaging details | included capabilities | unknowns / discounts | implication
Goodfire / Silico | Case-by-case fee; Goodfire declined specific pricing | Custom commercial engagement aligned to customer requirements | Model design environment, experiment agent, mechanistic debugging and steering | No public self-serve list price or usage meter disclosed | Harder for buyers to benchmark ROI; product must sell on differentiated outcomes rather than transparent entry pricing
Arize AX / Phoenix | Free tier; AX Pro $50/month; enterprise custom | 50k spans/month and 10GB/month in Pro; enterprise SaaS or self-hosted | Tracing, evals, datasets, experiments, observability | Startup pricing mentioned but not publicly enumerated in detail | Sets a low entry point for teams that mainly need agent telemetry and eval workflows
Fiddler AI | Free tier; Developer at $0.002 per trace; enterprise custom | Developer plan adds unified observability, custom evaluators, SSO, SaaS deployment | Observability, tests and experiments, guardrails, governance | Enterprise pricing not public beyond tier framing | Creates usage-based competition for safety and governance budgets around deployed systems
Datadog LLM Observability | Free up to 40K LLM spans/month; Pro starts at $160/month for 100K spans | On-demand overage after 100K spans; discounted M2M and annual commitments | Agent observability, evaluations, retention options, sensitive data scanning | Retention add-ons and full enterprise packaging vary by commitment | Strong incumbent bundle for teams already standardized on Datadog
LangSmith | Free tier for development and small production; paid plans scale with trace volume; enterprise by contact | Framework-agnostic SDK access with usage-based paid expansion | Agent tracing, observability, debugging | Exact public price not shown on reviewed page | Budget overlap is strongest where teams want workflow visibility rather than model-internal control
Langfuse | Free Hobby tier; $29 Core; $199 Pro; $2499 Enterprise | Units-based billing with 50k free and 100k included in paid plans; optional $300 Teams add-on | Tracing, evals, prompts, analytics, compliance features, self-host options | Volume discounts beyond listed unit ladder | Transparent pricing and OSS posture put pressure on vendors pitching generic AI engineering value
Humanloop (historical) | Free trial; enterprise/custom plans | 2 members, 50 eval runs, 10K logs/month; VPC add-on and enterprise support | Prompt management, evaluation, trustworthy LLM app workflow | Independent commercial future is gone after Anthropic deal | Shows how adjacent platform categories can disappear into a frontier lab before they mature independently
In-house black-box stack | No software line item; internal labor plus cloud/tool spend | Mix of prompts, eval harnesses, fine-tuning, and guardrails from existing tools | Flexible substitute path for teams avoiding new vendor spend | True total cost often hidden in compliance review, retraining, and internal overhead | Status quo remains viable unless Goodfire proves materially better debugging, safety, or domain outcomes

Public pricing combines official list pricing, tier descriptions, and explicit unknowns from reviewed pages; absence of a number is treated as an evidence gap, not a hidden assumption.

[CP018, CP020, CP023, CP024, CP026, CP027]
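The benchmarking gap can be made concrete by bracketing what a given monthly volume would cost at each vendor's entry tier, using only the list figures quoted in the pricing comparison. This is a rough, hypothetical helper: `entry_list_cost` and the vendor keys are illustrative names, and it deliberately treats each vendor's own meter (LLM spans, traces, or billing units) as one generic "event", which they are not. It returns None wherever the reviewed pages leave a price undisclosed, such as overage rates or free-tier allowances, and always for Goodfire, whose pricing is case by case.

```python
from typing import Optional

# Hypothetical helper: entry-tier list cost in USD/month for a monthly event volume,
# using only list figures quoted in the pricing comparison table.
# "Events" conflate vendor-specific meters (spans, traces, units), so results are
# rough brackets, not like-for-like quotes. None = no public price for that volume.


def entry_list_cost(vendor: str, monthly_events: int) -> Optional[float]:
    if monthly_events < 0:
        raise ValueError("monthly_events must be non-negative")

    if vendor == "datadog":
        # Free up to 40K LLM spans/month; Pro from $160/month for 100K spans.
        if monthly_events <= 40_000:
            return 0.0
        if monthly_events <= 100_000:
            return 160.0
        return None  # on-demand overage rate not quoted in the table

    if vendor == "arize_ax_pro":
        # AX Pro at $50/month includes 50k spans/month; free-tier allowance not quoted.
        return 50.0 if monthly_events <= 50_000 else None

    if vendor == "fiddler_developer":
        # Developer plan list rate of $0.002 per trace, applied to every event
        # (the free tier's allowance is not quoted, so it is not modeled).
        return round(monthly_events * 0.002, 2)

    if vendor == "langfuse":
        # 50k units free; the $29 Core plan includes 100k units.
        if monthly_events <= 50_000:
            return 0.0
        if monthly_events <= 100_000:
            return 29.0
        return None  # per-unit price beyond the included amount not quoted

    if vendor == "goodfire":
        return None  # case-by-case pricing; no public list price

    raise KeyError(f"unknown vendor: {vendor}")


if __name__ == "__main__":
    for name in ["datadog", "arize_ax_pro", "fiddler_developer", "langfuse", "goodfire"]:
        print(name, entry_list_cost(name, 80_000))
```

Every adjacent vendor can be bracketed at least at low volume from public pages alone, while the Goodfire branch can only return None; that asymmetry is the sales and diligence problem the pricing comparison describes.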
FP002: Feature breadth / capability map

The capability map highlights Goodfire's relative strength in model-internal editing and domain-specific mechanistic workflows, versus adjacent vendors' strength in tracing, governance, and production operations.

Scores are ordinal 1-5 judgments from public capability descriptions only. A 5 indicates strongest visible fit in the reviewed source set, not an audited market ranking. The figure is a synthesized strength map distinct from TP002's support/unknown matrix.

[CP002, CP003, CP011, CP012, CP013, CP017]

3.3 Switching costs, substitutes, and distribution power

Switching costs in this landscape are asymmetric. Once a team standardizes on Datadog, LangSmith, or Langfuse for traces, evals, and production debugging, those tools can become the default operating surface for AI quality work even if they do not expose model internals. That distribution advantage matters because many organizations would rather extend an existing developer or observability stack than adopt a new research-native workflow. Conversely, Goodfire’s strongest use cases appear where tracing alone is not enough: open-weight model builders, safety-critical domains, and research teams that need to inspect features, attribute behaviors to training data, or intervene before deployment. The main substitute is still a black-box stack of prompting, benchmark evals, guardrails, and iterative fine-tuning, sometimes assembled in-house from open-source tools. That path is cheaper up front and familiar, but Goodfire’s argument is that it leaves teams guessing at why a model behaves badly. The competitive question is whether buyers feel enough pain from that guess-and-check loop to move budget from observability or prompt tooling into mechanistic model design.[CP004, CP008, CP016, CP017, CP018, CP022]

3.4 Moat durability and competitive risk

Goodfire’s moat case is easiest to believe when the buyer values mechanistic understanding itself. The company can point to feature steering, data attribution, PII-detection probes, and domain work in biology and robotics as evidence that model internals can be used for debugging, safety, and scientific discovery rather than just post-hoc monitoring. That gives it a more research-native product story than adjacent evaluation vendors. But the adverse evidence matters. MIT Technology Review quotes an outside mechanistic-interpretability researcher arguing that Goodfire may be adding precision to today’s alchemy rather than turning AI into a fully principled engineering discipline. The same article notes that Silico is most useful where customers can access model weights, limiting applicability on closed frontier models. OnHealthcare also frames the company as a 51-person, research-first organization valued aggressively relative to disclosed commercial traction. The highest-risk scenarios are therefore clear: larger observability vendors adding explain-and-steer features, frontier labs keeping the deepest interpretability advantages in-house, or customers deciding that trace-level controls are sufficient. Goodfire can still win if it becomes the default model-design layer for open-weight and domain-specific AI programs, but that durability is not yet proven by public win-rate, pricing, or retention evidence.[CP005, CP007, CP008, CP009, CP010, CP011]

Moat durability / competitive risk register
moat or risk claim | supporting evidence | counter-pressure | severity | mitigation / diligence ask
Mechanistic interpretability is Goodfire's clearest product moat | Silico, feature steering, data attribution, Llama steering, and probe work all point to direct intervention on model internals | Frontier labs also do mechanistic interpretability internally, and outsiders question how principled the workflow already is | high | Request customer evidence showing that mechanistic workflows change deployment or training decisions in ways observability tools cannot
Goodfire is strongest where customers can inspect open-weight or adaptable models | MIT Technology Review says Silico is most usable when teams can access a model's inner workings; Goodfire markets training/debugging model design environments | Closed frontier models limit applicability; many enterprise buyers still consume APIs from black-box providers | high | Ask for customer mix by open-weight versus API-only deployments and proof of a closed-model roadmap
Adjacent observability vendors can absorb large parts of the AI-quality budget | Arize, Fiddler, Datadog, LangSmith, Langfuse, and Arthur sell tracing, evals, guardrails, or governance, as Humanloop did | These tools do not obviously solve feature-level debugging or data attribution, leaving room for a deeper design layer | high | Test whether Goodfire is attached to a separate budget owner or must displace observability spend
Transparent self-serve pricing elsewhere makes Goodfire's opaque pricing a sales risk | Arize, Fiddler, Datadog, and Langfuse publish entry pricing while Goodfire uses case-by-case commercial terms | If buyers perceive Goodfire as another tooling vendor rather than a differentiated research layer, price discovery will feel unfavorable | medium-high | Request realized pricing, pilot-to-production conversion, and average time to first value
Research breadth can become a moat only if it productizes | Goodfire cites hallucination reduction, PII detection, biology discovery, and diffusion-search wins across multiple domains | A broad research portfolio can also create focus risk and slow repeatable product packaging | medium-high | Ask what percentage of roadmap and headcount is tied to reusable product versus custom research engagements
Category consolidation is a real threat | Humanloop is joining Anthropic and sunsetting; Weights wound down after its team joined OpenAI | Frontier labs may absorb adjacent capabilities and talent faster than start-ups can scale independently | medium | Assess whether Goodfire is more likely to be a durable platform, a feature inside another stack, or an attractive acquisition target
Governance and trust requirements help Goodfire only if buyers believe interpretability is additive to observability | NIST AI RMF and Gartner both reinforce governance, evaluation, and hidden operating-cost concerns in sensitive AI systems | Those same concerns also strengthen guardrail and observability incumbents such as Fiddler, Arthur, and Datadog | medium | Validate whether regulated buyers explicitly ask for mechanistic evidence or remain satisfied with trace-level controls and policy enforcement

Severity reflects competitive pressure on Goodfire specifically, not absolute vendor quality; mitigation requests focus on evidence missing from the public record.

[CP005, CP007, CP008, CP009, CP010, CP011]
FP003: Moat / readiness KPIs

Compact KPIs summarize the commercial and competitive boundaries around Goodfire's moat: large research funding, opaque pricing, adjacent free tiers, and direct pressure from internal frontier-lab teams.

KPI items intentionally mix funding, price floors, and packaging signals because Goodfire's competitive durability is shaped by both technical differentiation and adjacent-tool economics.

[CP005, CP010, CP018, CP020, CP023, CP026]
Chapter 04

04 Financials

4.1 Revenue model and pricing surface: software is visible, economics are not

Public evidence supports a commercial product, but not a public price book. Goodfire's official surface describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure, and the vertical pages repeatedly invite teams training or fine-tuning foundation models to request access rather than self-serve into a public checkout flow. The contact page goes further, saying the platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Those statements support the existence of an enterprise product and an enterprise target market. They do not disclose the actual commercial terms those customers accept. The legal documents make the pricing posture clearer. The master services agreement and terms of use both push commercial economics into negotiated order forms. The terms explicitly contemplate fees, overage charges when usage exceeds contracted allotments, and dashboard or usage-report records that are authoritative for billing. The pilot agreement separately states that pilot access is for internal evaluation and that a separate commercial license is required after the evaluation period. That combination points to a monetization stack built around custom contracts rather than public list pricing: pilot fees, commercial platform fees, usage-based overages, and potentially additional service charges. What remains unavailable is the part investors actually need to underwrite. None of the reviewed public pages disclose list price, minimum annual commit, support tier pricing, discount ladders, or realized pricing by customer type. The pricing / monetization table therefore distinguishes verified commercial mechanisms from missing economics. The absence of public prices is not unusual for enterprise AI infrastructure, but it means external readers cannot infer ACV, customer segmentation, or software gross margin from the official surface alone. 
The right conclusion is not that Goodfire lacks revenue; it is that Goodfire has chosen a negotiated, opaque commercial posture.[CI007, CI008, CI009, CI012, CI013, CI014]
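The contract mechanics described above (a negotiated platform fee, a contracted usage allotment, overage charges above that allotment, and optionally invoiced services) can be expressed as a simple billing function. This is a hypothetical sketch: `OrderForm` and `period_invoice` are illustrative names, the structure is inferred from the MSA and terms-of-use language summarized here rather than from any actual Goodfire order form, and none of the inputs has a public value. Pilot fees, which precede the commercial license, are left out for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderForm:
    """Hypothetical commercial terms; every field is undisclosed in public sources."""
    platform_fee: float        # negotiated platform fee for one billing period
    included_usage: float      # contracted allotment, in an undisclosed metering unit
    overage_rate: float        # price per unit of usage above the allotment
    services_fee: float = 0.0  # field engineering / research services, if invoiced separately


def period_invoice(terms: OrderForm, metered_usage: float) -> float:
    """Platform fee + overage on usage above the allotment + separately billed services."""
    if metered_usage < 0:
        raise ValueError("metered_usage must be non-negative")
    overage_units = max(0.0, metered_usage - terms.included_usage)
    return terms.platform_fee + overage_units * terms.overage_rate + terms.services_fee
```

The sketch mainly shows that each parameter is itself a diligence ask: without the fee, allotment, overage rate, and services terms, the function cannot be evaluated for any real account.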

Revenue streams table
stream | mechanism | unit | current value/status | quality | diligence ask
Pilot programs | Evaluation access under pilot agreement before full commercial license | pilot fee / pilot term | Pilot fee exists in order form; public amount undisclosed | Medium for existence, low for value | Provide executed pilot order forms, fee schedule, and conversion rate to commercial contracts.
Silico commercial platform access | Order-form-based access to hosted platform, APIs, tools, documentation, and related software | annual contract or custom license | Commercial fees exist in order forms; no public list price | Medium for mechanism, low for pricing | Provide standard order form, ACV ranges, minimum commits, and billing basis.
Usage overages | Charges for usage beyond contracted allotment under terms of use | usage unit above allotment | Overages explicitly contemplated; triggering unit and price undisclosed | Medium for mechanism, low for realized economics | Disclose metering unit, included allotment, overage rate, and customer usage mix.
Support / field engineering / research services | Technical assistance, field engineering, collaboration activities, and deliverables alongside platform use | project, retainer, or services statement of work | Services are contractually available; public pricing and attach rate undisclosed | Medium for existence, low for margin profile | Disclose service revenue share, pricing method, utilization, and gross margin.
Life-sciences discovery engagements | Platform plus embedded interpretability work for scientific discovery partners such as Prima Mente | custom engagement | Named proof points exist; no contract value or renewal data disclosed | Low for current revenue contribution | Provide contract values, renewal status, and whether these engagements convert to recurring software.
Enterprise design partnerships | Selective engagements with frontier or high-stakes AI teams | custom partnership | Officially described as selective and request-access based; no public contract economics | Low for current revenue quality | Provide design-partner count, conversion to production contracts, and realized annual spend per account.

Verified mechanisms come from legal docs and official product pages. Current value/status is intentionally qualitative because Goodfire does not disclose revenue mix or realized pricing.

[CI007, CI008, CI009, CI012, CI013, CI014]
Pricing / monetization table
product / path | price / unit / contract | list vs realized | discounts / unknowns | source
Silico commercial license | No public amount disclosed | No public list pricing; negotiated realized pricing only | Unknown minimum commits, contract term, seats or compute basis | Official product pages + MSA/TOS
Pilot agreement | Pilot fee set in order form; amount undisclosed | No public list pricing | Unknown evaluation term, conversion credits, and pilot-success criteria | Pilot Agreement
Usage overages | Overage charges apply above included allotment; unit not public | Realized only | Unknown rate card, thresholds, and true usage driver | TOS
Support / field engineering | No public price disclosed | Realized only | Unknown whether bundled, separately invoiced, or included in enterprise tier | TOS + MSA
Compliance-ready enterprise deployment | SOC 2 / SOC 3 support procurement readiness but do not set price | Not a price point | Unknown whether security/compliance premium is monetized directly | SOC 2 blog + contact surface
Deprecated demo / API preview | No current public commercial price; preview API deprecated in Feb 2026 | Historic preview removed from public surface | Unknown whether any self-serve pricing survived privately | Feature steering blog

This table separates disclosed commercial mechanics from undisclosed economics. Official pricing is effectively absent; every public path points to custom contracting.

[CI008, CI013, CI014, CI015, CI018, CI020]
FI001: Revenue model bridge

Illustrates the public revenue architecture from selective customer acquisition through platform usage and services, while marking where realized pricing and margin cease to be public.

Public sources verify the nodes and commercial mechanisms, but not realized values, contract sizes, or margin. This is a structural bridge rather than a quantified waterfall.

[CI007, CI008, CI009, CI012, CI013, CI014]

4.2 GTM motion and unit economics: high-touch deployments, low public observability

Goodfire's public GTM looks selective and high touch. The Series B post says the company engages deeply and selectively with teams building high-stakes or frontier systems, while the contact page describes a platform used by large enterprises, healthcare institutions, and AI research labs. The customer-story material shows why this matters financially: in the Prima Mente engagement, Goodfire researchers embedded with the customer and built a biomarker discovery pipeline around the customer's model. The terms of use also describe support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables. Together, these sources suggest that at least some deployments are not pure seat-based software subscriptions; they likely combine platform access with bespoke scientific or engineering work. That has two opposite implications. On the positive side, embedded work can accelerate design-partner conversion, widen the product moat, and justify premium enterprise pricing. It can also make Goodfire useful in high-stakes domains where customers need interpretation help, not just dashboards. On the negative side, services-heavy revenue generally scales more slowly and often carries a weaker gross-margin profile than pure software. Public sources do not reveal how much of Goodfire's revenue, if any, comes from software usage, annual licenses, pilots, or research services. They also do not disclose customer counts, pilot-to-production conversion, sales cycle length, retention, CAC, or payback. The technical proof points are meaningful but not financial metrics. Goodfire's RLFR research claims a 58 percent hallucination reduction at roughly 90 times lower cost than an LLM-as-a-judge approach, and the life-sciences case studies show credible customer-value stories in diagnostics and scientific discovery. Those are strong commercialization narratives, but they are not the same as disclosed revenue quality. 
For this reason the unit economics bridge is qualitative. It shows the likely path from a selective design partnership to contracted software and overages, while making clear that the realized values at each step are private.[CI009, CI010, CI011, CI012, CI015, CI016]

Unit economics table
metric | value / null | confidence | why it matters | diligence ask
Public list price for Silico | null | low | Without list or starting price, outsiders cannot bracket ACV or customer segmentation. | Request current price card or anonymized quote set by deployment type.
Average contract value (ACV) | null | low | ACV is needed to translate selective design-partner traction into revenue scale. | Provide ACV distribution for pilots, enterprise subscriptions, and strategic partnerships.
Usage gross margin | null | low | Consumption software can be high margin, but embedded compute or human delivery can compress it. | Provide gross margin by platform usage line and by services line.
Services revenue share | null | low | A services-heavy mix changes the scalability profile and the appropriate valuation framework. | Disclose software-versus-services mix for the last twelve months.
Pilot-to-production conversion rate | null | low | This is the clearest proxy for revenue quality in a selective-enterprise GTM model. | Provide count of pilots launched, converted, and churned.
Sales cycle length | null | low | Long enterprise and healthcare procurement cycles can delay revenue recognition and cash collection. | Disclose median cycle from first contact to signed order form by customer segment.
CAC payback | null | low | Necessary to judge whether high-touch GTM is economically durable. | Provide fully loaded CAC and gross-margin payback by cohort.
Retention / expansion | null | low | Overages and usage growth matter only if accounts renew and expand. | Provide logo retention, gross retention, and expansion rates for paying accounts.

Null means the metric is not publicly disclosed in reviewed sources, not that the metric is zero or irrelevant.

[CI012, CI016, CI022, CI023, CI029, CI030]
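The null convention matters for any model built on these metrics: an undisclosed input should propagate as missing rather than be treated as zero. A minimal Python sketch of three of the requested metrics follows, using common textbook definitions rather than any Goodfire-reported definition: pilot conversion as converted over launched pilots, CAC payback as CAC divided by monthly gross profit per account, and net revenue retention as a cohort's ending ARR over its starting ARR. The function names are illustrative.

```python
from typing import Optional


def pilot_conversion_rate(launched: Optional[int], converted: Optional[int]) -> Optional[float]:
    """Share of launched pilots that became commercial contracts; None if undisclosed."""
    if launched is None or converted is None or launched == 0:
        return None
    return converted / launched


def cac_payback_months(
    cac: Optional[float],
    monthly_revenue_per_account: Optional[float],
    gross_margin: Optional[float],
) -> Optional[float]:
    """Months of gross profit needed to recover fully loaded customer-acquisition cost."""
    if cac is None or monthly_revenue_per_account is None or gross_margin is None:
        return None
    monthly_gross_profit = monthly_revenue_per_account * gross_margin
    if monthly_gross_profit <= 0:
        return None  # never pays back at these inputs
    return cac / monthly_gross_profit


def net_revenue_retention(
    start_arr: Optional[float],
    expansion: Optional[float],
    contraction: Optional[float],
    churn: Optional[float],
) -> Optional[float]:
    """Cohort ARR after expansion, contraction, and churn, divided by starting ARR."""
    if None in (start_arr, expansion, contraction, churn) or start_arr == 0:
        return None
    return (start_arr + expansion - contraction - churn) / start_arr
```

With today's public record, every call would return None, which is the quantitative form of the null column above.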
FI002: Unit economics bridge

Qualitative bridge from acquisition motion to blended economics, highlighting where public evidence ends and diligence requests must begin.

The bridge is intentionally qualitative because Goodfire does not disclose ACV, CAC, payback, retention, or gross margin.

[CI012, CI014, CI015, CI016, CI022, CI023]

4.3 Capital adequacy and financing: funding is verified, runway is not

The strongest financial facts in the public record are financing facts. Goodfire announced a $50 million Series A in April 2025 and a $150 million Series B at a $1.25 billion valuation in February 2026. The SEC Form D filings sharpen those announcements. The 2025 filing reports $52,029,991 sold after a first sale on 2025-04-02, and the 2026 filing reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124. On that narrow basis, at least $202.0 million of equity sold in the two disclosed rounds is directly verifiable from primary filing data, and public commentary places total funding modestly above $200 million including earlier capital.

That financing fact pattern supports one clear conclusion: Goodfire has had strong capital access. It does not answer the central capital-adequacy question. No reviewed public source discloses cash on hand, monthly burn, runway months, debt covenants, or a next-round trigger. The public uses of funds are broad: frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences. Those are real cash uses, but they are not enough to derive runway because neither the cash balance nor the burn rate is disclosed. Even the one additional clue in the 2026 Form D, a total offering amount roughly $11.7 million above the amount sold, only shows possible remaining capacity in the round, not cash still available.

The financial estimate range therefore stays disciplined and only brackets financing facts that are source-backed. It does not invent revenue, burn, or runway. Likewise, the capital-intensity map highlights where cash likely goes (product, research, embedded delivery, and enterprise compliance) while preserving the distinction between documented financing and inferred cost structure.
This is the correct evidence-constrained stance: the raise is verified, but capital adequacy beyond the raise cannot be underwritten from public data.[CI001, CI002, CI003, CI004, CI005, CI006]
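The Form D arithmetic cited above can be reproduced directly. The dollar figures below are exactly the ones quoted in this section; the script only restates them and derives the cumulative amount sold, the gap between sold and announced amounts, and the 2026 offering amount not sold. The dictionary layout and function name are illustrative, and the unsold offering amount is not evidence of cash on hand.

```python
# Amounts quoted in this section from the 2025 and 2026 SEC Form D filings (USD).
# The 2025 filing's total offering amount is not quoted here, so it is left as None.
FORM_D = {
    "2025 Series A": {"announced": 50_000_000, "sold": 52_029_991, "offering": None},
    "2026 Series B": {"announced": 150_000_000, "sold": 149_999_796, "offering": 161_674_124},
}


def reconcile(filings: dict) -> dict:
    """Sum amounts sold, compare them with announced round sizes, and derive unsold offering capacity."""
    summary = {"total_sold": 0, "sold_minus_announced": {}, "unsold_offering": {}}
    for name, filing in filings.items():
        summary["total_sold"] += filing["sold"]
        summary["sold_minus_announced"][name] = filing["sold"] - filing["announced"]
        if filing["offering"] is not None:
            # Offering amount not sold as of the filing; this is not a cash balance.
            summary["unsold_offering"][name] = filing["offering"] - filing["sold"]
    return summary


summary = reconcile(FORM_D)
print(f"Total sold across both filings: ${summary['total_sold']:,}")  # $202,029,787
for name, gap in summary["sold_minus_announced"].items():
    print(f"{name}: sold minus announced = {gap:+,} USD")
for name, unsold in summary["unsold_offering"].items():
    print(f"{name}: offering amount not sold = ${unsold:,}")
```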

Capital adequacy table
item | public value / status | confidence | why it matters | diligence ask
Verified Series A financing | Announced $50M; Form D shows $52,029,991 sold | high | Primary evidence confirms external capital raising in 2025. | Reconcile press-announced round size to cap-table-close documents.
Verified Series B financing | Announced $150M at $1.25B valuation; Form D shows $149,999,796 sold and $161,674,124 total offering | high | Primary evidence confirms large 2026 financing and possible residual offering capacity. | Provide final close schedule and whether any unsold allocation remained available.
Cumulative disclosed capital since Series A | At least $202,029,787 sold across 2025-2026 Form D filings; public commentary says backing exceeds $200M overall | high | This is the strongest public capital-adequacy anchor. | Provide total capital raised including seed and remaining unrestricted cash.
Cash on hand | null | low | Cash balance is required to convert financing history into actual runway. | Provide current unrestricted cash and short-term investments.
Monthly burn | null | low | Without burn, no public runway estimate is defensible. | Provide last six months of net burn and planned spend by function.
Runway months | null | low | Runway is the core adequacy metric after a large financing round. | Provide management runway view under base and downside plans.
Planned use of funds | Frontier research, next-generation core product, and scaling partnerships across AI agents and life sciences | medium | Confirms capital is funding both R&D and GTM, not just balance-sheet preservation. | Provide board-approved use-of-proceeds model with timing and budget buckets.
Debt / project-finance obligations | No public debt or project-finance obligations identified in reviewed sources | low | Absence of disclosure is not proof of zero leverage, but no public obligation surfaced here. | Provide debt schedule, venture debt terms, leases, and any committed compute obligations.

This table separates verified financing facts from unavailable liquidity metrics. Null values reflect missing public disclosure, not negative findings.

[CI001, CI002, CI003, CI004, CI005, CI006]
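The table's central point, that financing history cannot be converted into runway without cash and burn, can be expressed as a small function that refuses to estimate when either input is undisclosed. This is a sketch assuming the simple definition runway = unrestricted cash / average monthly net burn; the function name and the sample data-room inputs are hypothetical.

```python
from typing import Optional, Sequence


def runway_months(cash: Optional[float],
                  monthly_net_burn: Optional[Sequence[float]]) -> Optional[float]:
    """Runway in months, or None when either input is undisclosed.

    None follows this report's convention: a null value means "not publicly
    disclosed", never zero.
    """
    if cash is None or not monthly_net_burn:
        return None  # missing numerator or denominator: no defensible estimate
    average_burn = sum(monthly_net_burn) / len(monthly_net_burn)
    if average_burn <= 0:
        return float("inf")  # cash-flow neutral or positive on average: not burn-constrained
    return cash / average_burn


# Public record today: neither cash nor burn is disclosed.
print(runway_months(cash=None, monthly_net_burn=None))  # None
# Hypothetical data-room inputs, six months of net burn (not Goodfire figures).
print(runway_months(cash=120_000_000, monthly_net_burn=[4e6, 5e6, 6e6, 5e6, 5e6, 5e6]))  # 24.0
```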
FI003: Financial estimate range

Source-backed financial ranges limited to financing facts; revenue, burn, and runway are excluded because they are not publicly disclosed.

Low/base/high values reconcile press-announced financing, Form D sold amounts, and broader public commentary about total backing. This figure does not invent ranges for revenue, burn, or runway.

[CI001, CI002, CI003, CI004, CI005, CI034]
FI004: Capital intensity / cash-flow map

Matrix showing where public capital evidence exists and where operating-cash evidence is still missing.

This is a structured evidence map, not a quantified cash-flow statement. The purpose is to keep verified financing separate from missing operating-liquidity data.

[CI006, CI019, CI020, CI037, CI038]

4.4 Financial verdict and public gaps: verified funding, inferred monetization, unresolved underwriting

The evidence supports a precise but narrow verdict. Goodfire is not a financial blank slate: it has a verified enterprise product surface, real external financing, named partners in regulated and frontier domains, enterprise-security credentials, and commercial contracts that contemplate fees, overages, and usage measurement. Those are the ingredients of a real business. But almost every metric needed to judge revenue quality and margin path remains private. There is no public revenue, no public ARR, no gross-margin disclosure, no cash balance, no burn, no runway, and no debt schedule.

That gap matters because the likely business model is mixed. The software platform could become a valuable recurring-revenue layer if usage and overages dominate. Yet the customer evidence and services clauses imply that at least part of the current offering includes embedded scientific and engineering labor. Without knowing the software-versus-services split, investors cannot tell whether Goodfire should be valued more like enterprise infrastructure software, like specialized applied-research services, or like a hybrid that starts services-heavy and becomes more software-driven as it matures.

The adverse read is straightforward. One skeptical sector analysis argues that the $1.25 billion valuation is aggressive for a company with early commercial traction and not yet a predictable SaaS profile. That critique is directionally fair given the public data: the capital has been disclosed, but the operating model has not. The underwriting answer is therefore to separate what is verified from what is inferred. Verified: financing, enterprise contracting mechanics, security readiness, and selective customer traction. Inferred: monetization mix, gross-margin path, and runway durability. The gaps table below lists the specific diligence requests needed before a financial investment case can move from plausible to underwritten.[CI017, CI018, CI020, CI025, CI029, CI030]

Public financial gaps table
missing private metric | impact on underwriting | exact diligence path
Revenue / ARR by quarter | Cannot test whether valuation is supported by actual commercial scale. | Request monthly recurring revenue bridge, quarterly revenue, and last-twelve-month ARR walk.
Realized pricing by customer type | Cannot distinguish premium software economics from service-heavy bespoke work. | Request anonymized signed order forms and invoice samples across enterprise, healthcare, and research customers.
Software versus services revenue mix | Cannot underwrite gross-margin path or scalability. | Request management split of platform, overage, pilot, and services revenue for the last twelve months.
Gross margin and contribution margin | Cannot assess whether consumption and embedded-delivery costs support durable unit economics. | Request gross margin by revenue line, plus cost buckets for compute, support, and personnel.
Cash balance and burn | Cannot estimate runway or next financing need despite large recent rounds. | Request cash, debt, net burn, and planned hiring / research spend through the next 24 months.
Sales efficiency and retention | Cannot judge whether selective GTM converts into repeatable enterprise software economics. | Request pipeline conversion, sales cycle, CAC, payback, logo retention, and expansion metrics.

Every row here is a material diligence blocker rather than a cosmetic omission. These gaps are the reason this chapter remains evidence-constrained.

[CI029, CI030, CI031, CI037, CI038, CI040]
Chapter 05

05 Product & Technology

5.1 Product definition and customer workflow

Goodfire's commercial surface is best understood as a model-design environment rather than as a generic LLM observability dashboard. Silico is presented as the first platform for intentional model design: a workspace for training and debugging models on Goodfire infrastructure that packages interpretability around concrete jobs such as seeing inside predictions, running health checks, debugging failures, shaping behavior, and improving generalization. The practical consequence is that the product sits much closer to model-development loops than to standard application-layer analytics.

The customer workflow is also unusually high touch. Public pages repeatedly push teams into request-access or partnership motions instead of a self-serve onboarding path. In practice, the workflow starts with a model team that already controls weights, activations, or at least enough internals to let Goodfire inspect how the model behaves. Goodfire then pulls models, datasets, prompts, workflows, and evaluation tasks into a shared workspace, runs agent-assisted experiments, and translates the resulting mechanistic findings into interventions such as steering, diagnostics, data filtering, or reward shaping. The vertical pages show the same loop repeated across domains: language teams use the stack to reduce hallucinations; life-sciences teams use it to extract biomarkers and variant hypotheses from model internals; robotics and vision teams use it to catch brittle features and leakage before deployment. The result is a product with real workflow specificity, but one that still depends on customer willingness to operate in a shared, research-adjacent environment rather than through a mature, commodity API surface.[CE001, CE002, CE003, CE004, CE005, CE006]

Product module / asset matrix
Module / asset / product line | Primary user | Status / maturity | Differentiation | Diligence gap
Silico shared workspace | Frontier labs and enterprise model teams | Live product surface; access controlled | Packages interpretability around a model-design environment rather than an app-level dashboard | No public tenant model, API reference, or deployment architecture
Model scientist agent / experiment orchestration | Researchers and model engineers | Live internally and publicly described in launch materials | Automates experiment planning and execution inside the same workspace | Human-review rules, guardrails, and customer autonomy levels are not public
Diagnostics and health checks | Training, evaluation, and safety teams | Live workflow claims | Surfaces bottlenecks, feature collapse, shortcut learning, and rare failures before deployment | No published precision/recall or benchmark coverage by model class
Steering and intervention controls | AI engineers tuning model behavior | Live but still evolving after preview-tool deprecation | Direct feature steering, reward shaping, and data-filtering style edits | Supported-model matrix, rollback controls, and commercial packaging are private
Language reliability workflow | Open-model or fine-tuning teams | Most concrete public workflow | 58% hallucination reduction claim plus rollout viewer for intervention review | Evidence is strong but still concentrated in Goodfire-selected case studies
Scientific discovery workflow | Genomics and life-sciences researchers | Advanced partner workflow | Turns model internals into biomarkers, pathogenicity probes, and human-readable variant hypotheses | Clinical validation and regulatory pathway remain partner-specific
Physical AI / creative workflow assets | Robotics, vision, and image-model teams | Partner workflow or research preview | Extends same interpretability primitives into policy bottlenecks, leakage detection, and latent editing UIs | Commercial status and repeatability outside case studies are not public

Rows combine public product modules and workflow assets because Goodfire markets the platform through problem-specific surfaces rather than through a public SKU sheet.

[CE001, CE002, CE003, CE004, CE007, CE011]
Workflow / use-case table
User job | Current workflow | Goodfire solution | Measurable benefit | Limitation
Reduce LLM hallucinations before deployment | Prompt tweaks, judge loops, and post-hoc output review | RLFR, feature steering, and the Hallucinations Viewer inside the model-design environment | Claimed 58% hallucination reduction at roughly 90x lower intervention cost than an LLM-as-judge approach | Evidence is workflow-specific and not a universal performance guarantee
Debug frontier reasoning-model behavior | Prompt hacks and coarse response benchmarking | Reasoning-model SAEs, feature databases, and timing-aware steering on R1 | Shows reasoning-specific features like backtracking and exposes steering edge cases at large scale | Requires weight or activation access and expert handling of model-specific behavior
Extract biomarkers from a scientific model | Black-box prediction review and wet-lab triage | Embedded interpretability work using SAEs, tracing, and ablation on customer models | Surfaced a novel Alzheimer's biomarker class and a human-readable classifier that generalized to an independent cohort | Still requires downstream experimental validation
Explain genome-wide variant effects | Opaque pathogenicity scores and coding-region-limited tools | Evo 2 embeddings plus probes and reasoning-model synthesis through EVEE | 0.997 AUROC on 839k ClinVar variants and structured hypotheses for 4.2M variants | Outputs are hypotheses, not diagnoses or regulatory-grade evidence
Catch robotics or vision failures before deployment | Wait for benchmark misses or production failures | Inspect latent policy structure, geometry, and leakage before deployment | Can localize bottlenecks, unused observations, and ECG leakage in reviewed case studies | Public evidence is case-study based rather than product-documentation based
Edit image-model behavior directly | Prompt-box iteration only | Paint With Ember canvas that manipulates latent activations and concept weights | Supports adding, moving, and reshaping concepts rather than relying on prompt rewrites alone | This looks like a research preview rather than the core commercial SKU

Benefits mix directly claimed research outcomes with workflow-specific demonstrations. Goodfire does not publish customer-level ROI, conversion, or usage-frequency metrics for these flows.

[CE004, CE005, CE006, CE009, CE010, CE012]
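The 0.997 AUROC figure in the variant-effect row is a ranking metric: the probability that a randomly chosen positive example (for example, a pathogenic variant) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition, with ties counted as one half, which is equivalent to a normalized Mann-Whitney U statistic. The scores are toy values, not EVEE outputs, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
from typing import Sequence


def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie is a coin flip between the two orderings
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores only: higher means "more likely pathogenic".
print(auroc([0.9, 0.8, 0.4], [0.3, 0.5, 0.1]))
```

Because AUROC only measures ranking quality, a very high value says the scores order variants well; it does not by itself establish calibration or clinical usefulness, consistent with the table's note that outputs are hypotheses.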
FE002: Customer workflow / operating flow

The public workflow starts with a partner-led access motion, moves through shared interpretability experiments, and ends in targeted steering or design decisions.

This operating flow synthesizes the recurring pattern across language, life sciences, robotics, and launch materials. Public sources do not expose a formal buyer playbook or conversion funnel.

[CE003, CE004, CE007, CE011, CE016, CE033]

5.2 Interpretability primitives and operating architecture

Goodfire's architecture pairs a shared experiment workspace with a research stack that spans activation analysis, geometry discovery, parameter decomposition, and intervention tooling. The official research surface shows sparse autoencoders, probes, and manifold methods doing the early-stage work of surfacing interpretable features; neural-geometry work argues that many important concepts live on curved internal manifolds rather than single directions; and stochastic parameter decomposition pushes the stack deeper into weights, where Goodfire tries to identify which causal components can be removed without changing outputs. That combination suggests the platform is not a single technique but a layered toolkit for interpreting, localizing, and editing model behavior. The R1 work is especially revealing because it shows both capability and friction. Goodfire says it trained the first public sparse autoencoders on a frontier reasoning model and had to build custom inference and interpreter-model infrastructure to do so. At the same time, the work shows that steering reasoning models is not plug-and-play: interventions had to happen after the model's stock response prefix, and some heavy-handed steering caused behavior to snap back toward the original response. That makes the core product proposition stronger, not weaker: the whole point of Silico is to expose these hidden operational constraints before customers ship or retrain blindly. This architecture also explains Goodfire's dependency stack. The deepest workflows require access to model internals, which makes open-weight or customer-controlled models a better fit than closed API endpoints. It also explains why Goodfire can reuse the same core ideas across domains. 
EVEE, Alzheimer's biomarker work, Paint With Ember, and robotics bottleneck analysis all share the same pattern: pull out internal structure, translate it into something legible, then use that understanding to debug, steer, or design the model more intentionally.[CE013, CE014, CE018, CE019, CE020, CE021]
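The activation-interpretability layer described above leans on sparse autoencoders. In one common formulation, an SAE maps an activation vector x to a wider feature vector f = ReLU(W_enc(x - b_dec) + b_enc), reconstructs x_hat = W_dec f + b_dec, and is trained to minimize reconstruction error plus an L1 penalty on f. The pure-Python sketch below shows only the forward pass and loss under that generic formulation; it is not Goodfire's architecture or training recipe, and the toy weights and `l1_coeff` value are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector) -> Tuple[Vector, Vector]:
    """Encode activation x into features f and reconstruct x_hat.

    f     = ReLU(W_enc (x - b_dec) + b_enc)   # len(f) = dictionary size, usually > len(x)
    x_hat = W_dec f + b_dec
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre_acts = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    features = [max(0.0, p) for p in pre_acts]  # ReLU leaves inactive features at exactly zero
    x_hat = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    return features, x_hat


def sae_loss(x: Vector, features: Vector, x_hat: Vector, l1_coeff: float) -> float:
    """Squared reconstruction error plus an L1 sparsity penalty on feature activations."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(f) for f in features)
    return reconstruction + l1_coeff * sparsity


# Toy 2-d activation space with a 3-feature dictionary (hypothetical weights).
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # 3 features x 2 dims
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # 2 dims x 3 features
B_ENC = [0.0, 0.0, 0.0]
B_DEC = [0.0, 0.0]

x = [2.0, 0.0]
features, x_hat = sae_forward(x, W_ENC, B_ENC, W_DEC, B_DEC)
print(features, x_hat, sae_loss(x, features, x_hat, l1_coeff=0.1))
```

Only one of the three features fires for this input, which is the property that makes SAE features candidates for human-legible labels; real SAEs learn these weights from large activation datasets rather than using hand-set values.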

Technology / operating architecture table
Layer / process / component | Role | Dependency | Risk
Customer model and materials ingestion | Brings weights, datasets, files, code, prompts, and workflows into the workspace | Customer must control or expose enough internals for analysis | Closed API models and restrictive data-sharing rules can block the deepest workflows
Shared workspace and agent orchestration | Runs experiments, captures outputs, and coordinates interpretability tasks on Goodfire infrastructure | Goodfire compute, inference, and agent tooling | Tenancy, region layout, and review/approval controls are not public
Activation interpretability layer | Uses SAEs, probes, and related tools to localize model features and signals | Activation access plus trained interpreter models | Linear feature methods can miss global curved structure
Geometry / manifold layer | Recovers structured concept spaces for smoother understanding and control | Clustering and geometry-discovery pipelines over internal representations | Research maturity is high, but packaged product boundaries are not fully public
Parameter decomposition layer | Inspects weights as causal components rather than only observing activations | SPD-style decomposition and masking methods | Scalability, runtime cost, and product packaging remain partially research-stage
Monitoring and failure-surfacing layer | Uses amplified sampling and eval-awareness analysis to catch rare post-training failures | Before/after checkpoints, rollout analysis, and judge infrastructure | Monitoring findings can depend on prompt design and may not generalize automatically
Intervention and steering loop | Applies feature steering, filtering, reward shaping, and targeted model edits | Edit permissions, rollback discipline, and model-specific heuristics | Wrong timing or oversteering can cause route-around behavior in reasoning models
Service and commercial delivery layer | Adds support, technical assistance, field engineering, and research collaboration around the platform | Order forms, Goodfire personnel, and partner workflows | High-touch delivery can slow scaling and hide how much value is software versus services

This is an evidence-backed operating architecture, not an official engineering diagram. It distinguishes public method layers from undisclosed infrastructure details such as tenancy, vendor stack, and data residency.

[CE018, CE020, CE021, CE022, CE023, CE024]
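The timing risk in the steering row, and the earlier R1 observation that interventions had to start after the model's stock response prefix, can be illustrated with generic additive activation steering: add a scaled unit feature direction to per-token activations, but only from a chosen position onward. This is a hypothetical sketch, not Goodfire's implementation; `prefix_len`, `alpha`, and the direction vector are assumptions chosen for illustration.

```python
from typing import List

Vector = List[float]


def steer_after_prefix(activations: List[Vector], direction: Vector,
                       alpha: float, prefix_len: int) -> List[Vector]:
    """Add alpha * unit(direction) to each token's activation at positions >= prefix_len.

    Positions before prefix_len are copied unchanged, mimicking a policy of only
    intervening once a fixed response prefix has been emitted.
    """
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0.0:
        raise ValueError("steering direction must be non-zero")
    unit = [d / norm for d in direction]
    steered = []
    for position, act in enumerate(activations):
        if position < prefix_len:
            steered.append(list(act))  # leave the prefix untouched
        else:
            steered.append([a + alpha * u for a, u in zip(act, unit)])
    return steered


# Three token positions in a toy 2-d activation space; direction (3, 4) has unit vector (0.6, 0.8).
acts = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(steer_after_prefix(acts, direction=[3.0, 4.0], alpha=10.0, prefix_len=1))
```

In a real model the same idea would be applied inside a forward hook on a chosen layer; a large `alpha` pushes activations far off-distribution, which is one plausible mechanism for the heavy-handed-steering failures the report describes.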
FE001: Product architecture map

Silico stacks customer-controlled model access, shared experimentation, interpretability primitives, and intervention tooling into a single model-design environment.

This stack is inferred from product pages, research posts, launch coverage, and legal terms. Goodfire does not publish a canonical architecture diagram or vendor-by-vendor infrastructure map.

[CE001, CE016, CE018, CE023, CE026, CE033]
FE003: Critical dependency map

Silico depends on customer access to model internals, Goodfire-controlled experiment infrastructure, contractual order forms, and domain-specific partner contexts.

The map is a synthesis of public product, legal, and launch materials. It highlights the practical dependency that Goodfire works best when customers can expose model internals rather than only call opaque APIs.

[CE017, CE033, CE034, CE036, CE037, CE038]

5.3 Trust, quality, and compliance posture

Goodfire's public trust posture is more mature on enterprise security than on public operational transparency. The strongest visible procurement signal is the company's SOC 2 Type II announcement, which says the audit completed with no exceptions and is accompanied by a public SOC 3 summary. Health-facing materials add another layer by describing Mayo-specific privacy protocols and governance frameworks designed to reduce spurious correlations and improve clinical relevance. Those are meaningful indicators for buyers in regulated environments. The legal surface, however, makes clear that many of the operational details investors and enterprise architects normally want to inspect are still private. The terms of use define the platform broadly to include software, APIs, tools, documentation, support, and services, but the concrete economics live in negotiated order forms. Usage reports are authoritative for billing, overages exist, and pilots are explicitly provided on an AS IS basis unless an order form says otherwise. Public terms also reserve suspension rights for security, legal, operational, and payment reasons and allow third-party products into the delivery stack. This is credible enterprise contract scaffolding, but it still leaves important diligence gaps. Public materials do not disclose a self-serve API reference, a public status page, deployment-count evidence, tenancy architecture, or quantitative uptime history. That matters because the external frameworks Goodfire is selling into are increasingly intolerant of black-box governance. NIST focuses on trustworthiness across design, development, use, and evaluation, while Gartner warns that hidden governance and change-management costs can dominate ROI in high-stakes GenAI deployments. Goodfire is directionally aligned with those buyer needs, but still early in how much public operating evidence it exposes.[CE033, CE034, CE035, CE036, CE037, CE038]

Trust / quality / compliance table
Control / certification / quality metric | Status | Scope | Gap
SOC 2 Type II / SOC 3 | Achieved; Type II announced with no exceptions | Enterprise security and procurement assurance | Does not substitute for public uptime or architecture transparency
Order-form commercial controls | Live contractual structure | Fees, overages, service scope, and commercial commitments | No public rate card or public benchmark for deal terms
Pilot program guardrails | Live evaluation structure | Internal evaluation only; separate commercial license required after pilot | Default pilot terms are AS IS and do not publish service levels
Usage reports and metering | Live billing control | Goodfire records are authoritative for fee calculation and usage summaries | Public documents do not disclose exact metering units, quotas, or thresholds
Suspension and third-party-product governance | Live contractual control | Security, legal, operational, payment, and third-party integration handling | Fallback procedures and vendor list are not public
Mayo privacy and governance protocols | Partner-specific public commitment | Health and genomics collaboration | Not a generic public privacy architecture for all customer deployments
Public transparency surface | Limited | Trust portal, contact path, and security summary | No public status page, self-serve API docs, incident history, or deployment-count disclosure

The table separates formal procurement signals from missing public operating evidence. Goodfire looks stronger on negotiated enterprise controls than on broad public transparency.

[CE034, CE035, CE037, CE038, CE039, CE040]
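The metering and overage rows describe a common commercial pattern: a base fee covers an included usage allotment, and metered usage above it is billed at an overage rate. The public terms do not disclose Goodfire's metering units, quotas, or rates, so the sketch below uses a hypothetical `OrderForm` structure and made-up values purely to show why those undisclosed terms determine how much revenue is committed versus usage-driven.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderForm:
    """Hypothetical order-form terms; Goodfire's actual units, quotas, and rates are not public."""
    base_fee: float        # committed platform fee per billing period
    included_units: float  # usage covered by the base fee (metering unit undisclosed)
    overage_rate: float    # price per unit above the included allotment


def period_invoice(terms: OrderForm, metered_units: float) -> dict:
    """Split one billing period into committed revenue and usage-driven overage revenue.

    metered_units stands in for the vendor usage report, which the public terms treat
    as authoritative for fee calculation.
    """
    overage_units = max(0.0, metered_units - terms.included_units)
    overage_fee = overage_units * terms.overage_rate
    total = terms.base_fee + overage_fee
    return {
        "base_fee": terms.base_fee,
        "overage_fee": overage_fee,
        "total": total,
        "overage_share": overage_fee / total if total else 0.0,
    }


terms = OrderForm(base_fee=100_000.0, included_units=1_000.0, overage_rate=50.0)
print(period_invoice(terms, metered_units=800.0))    # under the allotment: no overage
print(period_invoice(terms, metered_units=1_400.0))  # 400 units above the allotment
```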

5.4 Roadmap, release cadence, and maturity

Goodfire's release cadence looks more like a fast-moving research organization productizing an internal stack than like a traditional enterprise software vendor with a stable public changelog. One public breadcrumb is the February 2026 deprecation notice for the earlier SAE demo interface and API, which implies a transition away from narrow research-preview tooling. By late April 2026, MIT Technology Review was covering Silico as an externally available product, and the company's own financing press materials were already framing the roadmap around next-generation product development plus scaled partnerships across AI agents and life sciences. The cadence after launch is still primarily expressed through research drops. In May 2026 alone, Goodfire published work on eval-awareness measurement, story-shape geometry, and SAE-based geometry recovery. That is unusually fast public iteration for a company trying to sell into enterprise and regulated workflows. It also means roadmap visibility is asymmetric: buyers can see the scientific engine moving quickly, but cannot yet inspect a normal SaaS artifact trail such as versioned release notes, public incident history, or a broad integration catalog. The resulting maturity picture is mixed but coherent. Core scientific capability appears strong, and the domain workflows in language, genomics, and scientific discovery are more than conceptual. Security posture is enterprise credible. The main immaturity is packaging: access remains negotiated, many deployments appear service-attached, and several key reliability and integration details remain private. Goodfire therefore looks most mature as a high-end design environment for teams with serious model ownership, and least mature as a broadly standardized developer platform.[CE015, CE017, CE039, CE044, CE046, CE047]

Roadmap / release / development-stage table
Date / stage | Feature / milestone | Status | Implication | Source
Pre-Feb 2026 preview | Standalone SAE demo interface and API | Deprecated in Feb 2026 | Goodfire consolidated from narrow preview tooling toward a broader platform motion | Feature Steering blog
2026-02 strategic thesis | Intentional design and next-generation core-product narrative | Publicly articulated | Roadmap is anchored on closed-loop training control, not only on post-hoc explanation | Intentional Design + PR Newswire
2026-04-30 | Silico launch / external unveiling | Live product surface | Internal interpretability tooling became an externally offered product with case-by-case pricing | MIT Technology Review
2026-05-04 | Verbalized eval awareness paper | Published | Public research cadence focuses on reliability and benchmark quality for safety-conscious buyers | Goodfire Research
2026-05-20 | The Shape of Stories Inside Neural Networks | Published | Shows frequent geometry research output rather than a classic SaaS changelog pattern | Goodfire Research
2026-05-21 | Can SAEs Capture Neural Geometry? | Published | Continues tooling work that can feed future control surfaces and geometry-aware methods | Goodfire Research
2026 security milestone | SOC 2 Type II / SOC 3 | Achieved | Procurement readiness is moving faster than public ops telemetry | Goodfire blog
2026 partner build-out | Mayo, Prima Mente, and Radical AI domain workflows | Active programs | Roadmap includes scientific verticalization in genomics, healthcare, and materials, not just generic LLM tooling | Goodfire partner/customer pages

Goodfire exposes roadmap mainly through research posts, partner announcements, and financing narratives rather than through a public changelog. Dates therefore track public milestones, not a version-history feed.

[CE015, CE039, CE047, CE048, CE049, CE050]
FE004: Product maturity / capability map

Maturity is strongest in the core interpretability engine and domain-specific workflows, and weakest in public platform packaging and transparent operating telemetry.

Ratings are qualitative judgments from public evidence only. They measure visible maturity, not internal product quality or customer satisfaction.

[CE015, CE039, CE044, CE046, CE047, CE048]

5.5 Exhibits

Chapter 06

06 Customers

6.1 Customer segmentation and buying centers

Goodfire's public customer story centers on organizations that build or fine-tune foundation models rather than end-user application buyers. The clearest broad segmentation claim comes from the company's contact page, which says the platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Product pages sharpen that picture: Silico is pitched to teams training or fine-tuning models across architectures and modalities, the language page targets LLM developers who want to predict failures and improve behavior without retraining from scratch, the life-sciences page targets genomics and scientific-model teams, and the robotics / vision page targets physical-AI and medical-imaging workflows. Across those surfaces, the likely economic buyer is an R&D, platform, or product owner responsible for model performance and reliability, while the day-to-day users are research scientists, ML engineers, and interpretability specialists. The important caveat is that Goodfire does not translate those segment claims into counts, revenue mix, or named enterprise references. Public materials do not disclose customer count, ARR, segment share, or a list of the Fortune 500 users behind the broad enterprise claim. The public proof set is therefore much deeper in vertical specificity than in commercial breadth: named evidence clusters in genomics, clinical research, AI-agent safety, and materials discovery, with the rest of the enterprise narrative still mostly unenumerated. That asymmetry suggests a selective, high-touch go-to-market motion in which Goodfire wins a small number of technically sophisticated design partners first and only later may broaden toward more standardized enterprise software distribution.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
Segment | Buyer / user / payer | Primary use case | Public proof | Strategic value | Gap
Frontier model labs and AI research teams | Buyer: research / platform lead; user: interpretability researcher and ML engineer; payer: R&D or model platform budget | Inspect internals, debug failures, shape training, monitor deployment | Silico page, Series B post, MIT Technology Review | Core category where Goodfire can become workflow infrastructure | No public account count or list of labs beyond named references
Healthcare and genomics institutions | Buyer: medical AI leader or scientific program owner; user: computational biology / genomics team; payer: research or translational medicine budget | Interpret scientific models, surface biomarkers, explain variant effects, validate model reasoning | Mayo Clinic, Prima Mente, Arc Institute, EVEE research | Highest-quality named proof and strongest differentiated outcomes | Most evidence is still research-stage, not routine clinical production
Large enterprises / Fortune 500 | Buyer: enterprise AI or product owner; user: ML / safety / model operations teams; payer: innovation, platform, or business-unit budget | Improve reliability, controllability, and ROI of internal models | Contact page and Salesforce Ventures thesis | Could materially broaden ACV if the broad claim converts to named logos | No named Fortune 500 accounts or disclosed outcomes
AI-agent platforms and consumer-internet operators | Buyer: safety / product leader; user: guardrail and infrastructure teams; payer: platform engineering budget | Detect PII, monitor agent behavior, deploy lightweight guardrails | Rakuten production deployment | Best public proof that Goodfire can support live enterprise workflows | Only one named production enterprise case in reviewed sources
Materials and physical-science teams | Buyer: scientific program lead; user: model scientist and autonomous-lab team; payer: R&D budget | Use internals to improve inverse design and candidate targeting | Radical AI partnership and self-correcting-search research | Expands Goodfire beyond biology into broader in silico discovery | Commercial maturity and repeatability remain early

Rows summarize public segment evidence only. Nulls and unnamed enterprise claims indicate missing disclosure rather than absence of customers.

[CU001, CU002, CU003, CU004, CU005, CU008]
FU001: Customer journey map

Public evidence points to a selective enterprise journey: identify a high-stakes model problem, engage Goodfire as a design partner, work in a shared environment, validate technical gains, and then expand into broader monitoring or research programs.

[CU001, CU003, CU009, CU011, CU017, CU022]

6.2 Named customer proof and adoption motion

The named proof set shows that Goodfire is doing real work for customers and partners, but the type of proof varies materially by account. Prima Mente is the clearest model-to-science case study: Goodfire says it embedded researchers with Prima Mente, interpreted the Pleiades epigenomics model, and helped identify a novel class of blood-borne biomarkers for Alzheimer's detection. Arc Institute is a strong scientific reference showing Goodfire can work with frontier biological foundation models at scale; however, Arc evidence is still best understood as a research collaboration rather than a conventional software deployment, especially because the initial steering work was described as early stage. Mayo Clinic similarly supports category credibility, governance readiness, and clinical adjacency, but the public record frames the work as research and hypothesis generation rather than routine clinical deployment. Rakuten stands apart because it is the clearest public production-style deployment: Goodfire says Rakuten deployed SAE probes for PII detection in AI agents after the system had to generalize from synthetic training data to real multilingual traffic with high recall requirements. Radical AI adds a fifth named proof point in materials science, but its commercial maturity remains unclear: the public disclosure emphasizes technical progress and promises more detail later. Taken together, the adoption motion looks consultative and deeply collaborative. Goodfire repeatedly describes a shared environment, selective design-partner engagement, embedded work, and case-by-case pricing rather than self-serve onboarding. That is a credible way to launch an advanced infrastructure product, but it also means that the current evidence base proves depth of technical engagement more clearly than repeatable, scaled software distribution.[CU010, CU011, CU012, CU013, CU014, CU015]

Customer growth / adoption trajectory table
Metric | Value | Date | Source quality | Implication | Missing denominator
Broad customer categories disclosed | Fortune 500 enterprises, major healthcare institutions, AI research labs | 2026-06-10 | medium | Goodfire markets beyond pure research labs | No count by category or logo list
Named public collaborators / customers with specific use cases | 5 | 2026-06-10 | high | Public proof set includes Prima Mente, Arc Institute, Mayo Clinic, Rakuten, and Radical AI | Not total customer count
Named proof points with quantified technical outcomes | 4 | 2026-06-10 | medium | Prima Mente, Mayo EVEE, Rakuten, and Radical disclose measurable technical results | Outcome metrics are technical, not commercial
Named proof points explicitly described as production deployment | 1 | 2026-06-10 | high | Rakuten is the clearest production-style enterprise account | No disclosed production account count across the rest of the base
Pricing disclosure | Case-by-case and request-access | 2026-04-30 | high | Sales motion appears enterprise and consultative | No public pricing tiers or contract ranges
Public follow-on evidence after initial collaboration announcement | 2 | 2026-06-10 | medium | Arc and Mayo have later public updates, suggesting some relationship continuity | Follow-on evidence is not the same as paid renewal
Public customer count / ARR / NRR | null | 2026-06-10 | high | Commercial scale cannot be quantified from public evidence | Core denominator for adoption and durability is undisclosed

Count rows refer to the public proof set visible in reviewed sources, not to Goodfire's total customer base. Null means undisclosed.

[CU001, CU006, CU007, CU022, CU024, CU025]
Named customer proof table
Customer / partner | Segment | Deployment / use case | Production vs pilot | Outcome | Limitation
Prima Mente | AI neuroscience / life sciences | Interpret Pleiades epigenomics model to surface disease signals and improve model design | High-touch research collaboration; not disclosed as routine clinical production | Novel class of blood-borne Alzheimer's biomarkers identified; fragmentomics/fragment length highlighted | Experimental validation and publication are still pending
Arc Institute | Genomics foundation-model research | Interpret Evo 2 representations and explore steerable biological features | Research collaboration with later Nature-linked validation; commercial terms undisclosed | Feature discovery across coding sequences, protein structure, and tree-of-life representations | Initial steering work was described as early stage
Mayo Clinic | Major healthcare institution / genomic medicine | Reverse engineer genomics foundation models and launch EVEE variant-effect explorer | Research and translational collaboration; not disclosed as routine clinical deployment | 0.997 AUROC on 839k ClinVar variants; interpretable predictions for all 4.2M ClinVar variants | Work is undergoing peer review and computational outputs are not diagnoses
Rakuten | Enterprise AI-agent platform | Detect PII in multilingual user messages for AI agents | Production deployment | SAE probes deployed with strong synthetic-to-real generalization and major cost savings vs LLM-as-judge | Only one named production enterprise deployment is public
Radical AI | Materials discovery / autonomous lab | Improve inverse materials design using self-correcting search on MatterGen | Early design partnership / technical proof | ~27% overall increase in successful candidates and ~30% more SUN materials in target range | Public disclosure leaves commercialization and repeat usage unclear

This is an intentionally partial public-proof enumeration. It distinguishes named, use-case-specific evidence from broader but unnamed enterprise claims.

[CU010, CU012, CU013, CU014, CU016, CU017]
FU002: Adoption / deployment funnel

Because customer counts are undisclosed, the adoption figure is shown as a deployment flow rather than a numeric funnel: Goodfire appears to move from selective prospecting to shared-environment work, technical validation, and only occasionally disclosed production rollout.

[CU009, CU011, CU022, CU024, CU029, CU030]
FU003: Customer proof matrix

The matrix compares the public quality of each named reference account across disclosure, quantified outcomes, production maturity, independent corroboration, and retention visibility.

[CU013, CU015, CU018, CU020, CU024, CU025]

6.3 Durability, expansion, and concentration risks

Goodfire's customer durability story is the weakest part of the public record. No reviewed source disclosed NRR, GRR, churn, renewal rates, contract length, seat expansion, customer concentration, or satisfaction metrics such as NPS. The company also does not publish customer count, so outside investors cannot tell whether the business has a few large design partners or a broader installed base. The best available durability proxies are continuity signals in public collaboration history: Arc moved from an early-2025 announcement to a later Nature-linked update, and Mayo moved from a 2025 collaboration announcement to 2026 EVEE research outputs. Those signals show some relationships continue long enough to generate additional public work, but they do not prove paid renewals, revenue expansion, or long-term stickiness. Expansion potential is visible nonetheless. Goodfire can land inside high-stakes model-development workflows and then expand from research support into monitoring, training intervention, guardrails, and adjacent scientific programs. The risk is that the proof set is concentrated in a handful of named collaborators and heavily weighted toward life sciences, while the broad Fortune 500 claim remains mostly anonymous. Two independent sources sharpen the caution. MIT Technology Review praised Silico's utility but quoted Leonard Bereska arguing that Goodfire adds 'precision to the alchemy' rather than turning model design into fully principled engineering, and OnHealthcare argued that the $1.25 billion valuation looks aggressive given limited public commercial disclosure. The customer thesis is therefore promising but still fragile: Goodfire has credible reference accounts and technical outcomes, yet much of the investability question still depends on private evidence around account scale, contract economics, and repeat usage.[CU038, CU039, CU040, CU041, CU042, CU043]

Retention / repeat usage / satisfaction table
Metric | Value / null | Segment | Confidence | Diligence ask
Net revenue retention (NRR) | null | All segments | high | Request customer cohort tables and expansion by account vintage
Gross revenue retention / churn | null | All segments | high | Request renewal and logo-retention data by customer type
Contract length / commercial term | null | Enterprise and research accounts | high | Request pricing schedule, term length, and pilot-to-paid conversion rates
Public continuity proxy: Arc Institute | Initial 2025 announcement followed by 2026 Nature-linked update | Genomics research | medium | Confirm whether continuity reflected paid renewal, expanded scope, or publication only
Public continuity proxy: Mayo Clinic | 2025 collaboration later referenced by 2026 EVEE research | Healthcare / genomics | medium | Confirm whether follow-on work sits under one master agreement or multiple phases
Customer satisfaction proxy | null | All segments | high | Request NPS, reference calls, or user-review data; no public reviews surfaced in reviewed sources

Null means the metric was not publicly disclosed. The two continuity rows are relationship proxies only and should not be read as revenue retention metrics.

[CU019, CU024, CU038, CU039, CU040, CU046]
Expansion and concentration risk table
Expansion driver | Concentration / friction risk | Impact | Evidence | Diligence path
Land from research collaboration into shared product environment | High-touch delivery may scale more like expert services than pure software | Could produce high ACV but slow logo velocity | Series B post, Silico page, Prima Mente embedded-work description | Split revenue by software subscription, services, and custom research
Expand from life sciences into enterprise model operations | Public named proof remains heavily weighted toward biology | Vertical concentration could distort the apparent breadth of demand | Life-sciences page, Rakuten proof, Salesforce Ventures thesis | Measure pipeline and closed-won accounts outside biology
Broaden from open-model research teams to enterprises | Model-access constraints may limit use with closed frontier models | Adoption may skew toward labs with parameter access | MIT Technology Review and Silico page | Document support for closed-model monitoring or partner integrations
Use marquee references to win Fortune 500 buyers | Fortune 500 claim is unnamed and therefore weaker than the named proof set | Enterprise credibility could be overstated relative to disclosed evidence | Contact page and named proof table | Request named references, outcomes, and reference-call permissions
Deepen AI-agent and guardrail use cases | Rakuten is a single disclosed production account | Category could be large, but public production proof is still thin | Rakuten research and funding / investor coverage | Provide additional production customers and renewal evidence in agent workflows

Expansion rows reflect visible go-to-market vectors in public materials. Risks focus on disclosure gaps, concentration in the proof set, and the likely services-heavy delivery model.

[CU022, CU024, CU025, CU029, CU031, CU032]
FU004: Retention / repeat cohort

Goodfire does not disclose true revenue-retention cohorts, so this figure shows a narrower proxy: the share of named public collaboration cohorts that later received additional public follow-on evidence. It is a continuity proxy, not NRR or customer retention.

This figure is evidence-constrained. Goodfire does not disclose customer retention metrics, so the cohort shows only later public continuity for named relationships.

[CU019, CU024, CU038, CU040]
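The FU004 proxy reduces to simple arithmetic. Here is a minimal sketch that assumes each of the five named relationships in the growth table counts as its own cohort. The figure's actual cohort definition is not public. A relationship is flagged only if reviewed sources show a later public update, which the growth table reports for Arc Institute and Mayo Clinic. `NamedRelationship` and `continuity_share` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedRelationship:
    """One named public collaboration (hypothetical structure for illustration)."""
    name: str
    # True only if reviewed sources show a later public update after the
    # initial announcement; False means none surfaced, not that it ended.
    public_follow_on: bool


# Assumption: each named relationship is its own cohort. Flags follow the
# growth table, where only Arc and Mayo have later public updates.
RELATIONSHIPS = [
    NamedRelationship("Prima Mente", False),
    NamedRelationship("Arc Institute", True),
    NamedRelationship("Mayo Clinic", True),
    NamedRelationship("Rakuten", False),
    NamedRelationship("Radical AI", False),
]


def continuity_share(rels: list[NamedRelationship]) -> float:
    """Share of named relationships with later public follow-on evidence.

    This is a continuity proxy only. It says nothing about paid renewal,
    revenue expansion, or NRR.
    """
    if not rels:
        raise ValueError("no named relationships to score")
    return sum(r.public_follow_on for r in rels) / len(rels)


print(f"continuity proxy: {continuity_share(RELATIONSHIPS):.0%}")
```

Under these assumptions the proxy is 2 of 5, or 40%. That figure measures public visibility of relationships and does not measure the revenue retention that the table above asks diligence to obtain.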

6.4 Exhibits

Chapter 07

07 Risks

7.1 Legal, regulatory, and contract risk

Goodfire's legal and regulatory posture is strong enough to clear initial enterprise diligence, but not yet strong enough to offset the downside risk that its default contracts shift onto customers. The positive evidence is real: Goodfire says it has achieved SOC 2 Type II, Mayo describes work under rigorous privacy and governance protocols, and the company frames interpretability as the bridge that makes sensitive AI use cases more governable. The harder underwriting read comes from the contracts. Default terms disclaim warranties around uninterrupted, secure, accurate, or error-free service; pilot and evaluation modes can operate without security or support commitments unless an order form says otherwise; and aggregate liability is capped at fees paid. Those are normal startup-software positions, but for a platform aimed at healthcare, safety, and potentially critical-infrastructure workflows they leave customers carrying a meaningful share of outage, breach, and deployment risk. Data-rights posture is the second sharp edge. The TOS gives Goodfire broad rights over Usage Data and a perpetual license over Workflow Data for improvement, evaluation, training, and commercialization, while also assigning feedback IP to Goodfire. That may be commercially rational for a research-driven platform, but it can slow procurement in regulated settings where customers want hard separation between operational traces, model behavior, and vendor product improvement. NIST's generative-AI profile and 2026 critical-infrastructure concept note both point toward more explicit risk controls, and Gartner likewise emphasizes governance, cost discipline, and realistic measurement as adoption gates. The upshot is that Goodfire does not appear to face public litigation or enforcement today, but it does face a contract-and-governance burden: if order forms do not materially improve on the default paper, expansion into regulated workloads will be slower than the brand narrative implies.[CR008, CR009, CR010, CR011, CR012, CR013]

Regulatory / legal risk register
rule / obligation / posture | jurisdiction | status | likelihood | severity | mitigation | residual exposure | diligence path
Default warranty disclaimers and liability caps | U.S. contract law / customer order forms | Current in public MSA, pilot agreement, and TOS | High | High | Negotiate customer-specific paper, cyber insurance, and security addenda | High for regulated or safety-critical buyers | Review top 10 executed enterprise redlines versus default terms and any uncapped confidentiality / security carve-outs.
Broad Workflow Data and Usage Data rights | Cross-border enterprise procurement / privacy | Current in TOS | Medium-High | High | Customer-specific data-use carve-outs, de-identification controls, audit rights | Medium-High | Review DPA, data-flow maps, retention windows, and whether workflow data can be excluded from improvement/training.
Healthcare explainability and clinical-governance burden | U.S. healthcare / regulated research | Partially mitigated by Mayo governance language and biomarker case study | Medium-High | High | Use interpretability as validation layer, partner with regulated institutions, document governance pack | High until there is broader deployment proof | Obtain clinical-validation plan, regulatory positioning memo, and evidence of deployments beyond named research collaborations.
Critical-infrastructure trustworthiness expectations | U.S. critical infrastructure | Rising external expectation per NIST 2026 concept note | Medium | High | Map controls to NIST AI RMF profiles and customer model-risk workflows | Medium-High | Request sector-specific control matrix, logging / auditability architecture, and incident-response procedures.
Export-control and restricted-jurisdiction constraints | U.S. export / re-export law | Current in public contracts | Medium | Medium | Screen customers, geographies, and downstream model uses; use counsel on sensitive deployments | Medium | Review export-screening process and any blocked-country or restricted-end-use policy.
Feedback assignment and service-IP ownership | Customer/vendor IP allocation | Current in MSA and TOS | Medium | Medium | Contractual carve-outs for customer inventions and regulated workflows | Medium | Review whether enterprise paper limits feedback assignment, deliverable ownership, and derivative-work ambiguity.

Public evidence shows strong enterprise intent but still vendor-favorable default paper; rows are ordered by residual underwriting importance.

[CR008, CR009, CR010, CR011, CR012, CR013]
FR001: Risk heatmap

Places Goodfire's principal risks by mitigation maturity, showing that the company has meaningful intellectual and governance assets but still weak public proof on repeatability, customer breadth, and regulated deployment readiness.

Heatmap cells are synthesis judgments based on public evidence as of 2026-06-10, not company-internal risk scoring.

[CR009, CR011, CR016, CR019, CR024, CR025]

7.2 Technical reliability and product-proof risk

Goodfire's core product claim is ambitious: that interpretability can move model development from guesswork toward controllable engineering. The risk is that Goodfire's own research record shows how early that journey still is. The intentional-design essay says the science is incomplete and the hardest problems remain unsolved. MIT Technology Review highlights the same tension from outside, quoting a mechanistic-interpretability researcher who sees Silico as useful but still more precise alchemy than true engineering. This matters because Goodfire is not selling only dashboards; it is selling trust that its interventions expose the right internal mechanisms and safely change behavior in consequential systems. The company's recent papers reinforce the need for skepticism. Verbalized eval awareness inflates measured safety; reasoning traces can be performative rather than faithful; rare harmful or backdoored behaviors may evade standard evaluations; memorization edits can preserve some reasoning while damaging arithmetic and recall; and Goodfire's own method posts say SAEs, linear steering, and parameter decomposition all have important limitations. None of that invalidates the technology. In fact, it strengthens the case that Goodfire is doing serious work on real failure modes. But it also means buyers and investors should treat current results as advanced instrumentation, not yet as proof that model behavior is fully legible or controllable. The sharp question for diligence is whether Goodfire can turn promising research into production-grade reliability evidence faster than the surrounding AI stack commoditizes adjacent monitoring, evaluation, and tracing workflows.[CR001, CR002, CR003, CR004, CR005, CR006]

Operational / quality / security risk register
failure mode | likelihood | severity | mitigation maturity | residual exposure | unresolved gap
Interpretability science remains incomplete, making product promises outrun causal understanding | High | High | Partial: Goodfire is publishing openly about limitations and building tooling anyway | High | Need independent production case studies showing interventions improve outcomes without hidden regressions.
Benchmark safety scores can be inflated by eval awareness and prompt artifacts | High | High | Partial: Goodfire has identified the distortion and prompt-rewrite mitigations | High | Need third-party eval methodology showing deployment behavior tracks benchmark performance.
Chain-of-thought can be performative rather than faithful on easier tasks | High | Medium-High | Partial: probes and early-exit methods help, but do not solve full faithfulness | Medium-High | Need deployment monitors that do not rely only on visible reasoning.
Rare harmful or backdoored behaviors may evade standard testing until after deployment | Medium-High | High | Partial: model-diff amplification appears useful for surfacing rare failures | High | Need standardized pre-deployment red-team workflow and evidence it generalizes beyond model organisms.
Edits that suppress memorization or steer behavior can degrade arithmetic or factual recall | Medium | Medium-High | Weak-Partial: tradeoffs are documented, not yet cleanly solved | Medium-High | Need model-quality scorecards showing what is lost as interpretability interventions are applied.
Current SAE / steering methods capture only fragments of geometry and can produce off-target effects | Medium | Medium | Partial: Goodfire is moving toward manifold-aware methods and SPD | Medium | Need proof that newer methods scale beyond toy models and simple demo tasks.
Public security posture shows SOC 2 but not public SLAs, incident history, or runtime-control detail | Medium | Medium-High | Partial: SOC 2 and trust portal exist | Medium-High | Need uptime reporting, incident history, and architecture detail for regulated buyers.

Severity reflects whether the failure mode would break trust in Goodfire as a control layer for consequential AI, not merely whether a research result is interesting.

[CR003, CR004, CR005, CR016, CR028, CR029]
FR002: Risk transmission map

Shows how Goodfire's research and contract risks transmit into slower regulated adoption, weaker reference quality, and potential valuation compression.

The DAG expresses directional business logic rather than measured probabilities.

[CR003, CR004, CR009, CR011, CR024, CR025]

7.3 Partner, customer, and dependency risk

Public market proof for Goodfire is narrower than the headline suggests. The company says the platform is used by Fortune 500 enterprises, healthcare institutions, and AI labs, but the named public evidence clusters around a small number of collaborations (Prima Mente in Alzheimer's biomarker discovery, Mayo Clinic in genomic medicine, and Radical AI in materials science), alongside a request-access product page for companies training or fine-tuning models. Even the strongest case study describes Goodfire researchers embedding with the customer and building the workflow jointly. That is valuable evidence of technical depth, but it points to a high-touch delivery model rather than clearly repeatable software revenue. MIT Technology Review's case-by-case pricing note and On Healthcare's observation that this is not yet a predictable SaaS profile both fit the same pattern. This concentration creates two linked risks. First, public reference quality is partner-heavy rather than broad-based: if one flagship collaboration stalls, there is little disclosed volume to absorb the narrative hit. Second, the broader buyer workflow already contains adjacent products from observability and evaluation vendors such as Datadog and LangSmith, which package testing, tracing, monitoring, and governance for production AI teams. Those platforms are not mechanistic-interpretability equivalents, but they compete for budget and for the right to define what AI control and monitoring should look like in production. Goodfire therefore depends on proving that deep white-box access is a distinct control layer worth buying, not just an advanced research add-on inside a stack customers already understand.[CR006, CR007, CR018, CR024, CR025, CR026]

Partner / dependency risk register
dependency | counterparty / surface | role | concentration | failure scenario | severity | mitigation | residual exposure
Named reference base | Prima Mente, Mayo Clinic, Radical AI, and unnamed enterprises | Public proof that the platform works in important domains | High | One or two flagship collaborations stall, leaving little disclosed breadth to offset the narrative hit | High | Add diverse named production references and renewal proof across sectors | High
Research-heavy delivery model | Embedded researchers, field engineering, collaboration services | Transforms customer models and produces the strongest public outcomes | High | Revenue scales with scarce expert labor instead of repeatable software usage | High | Separate productized modules, playbooks, and self-serve workflows from bespoke research work | High
Frontier-model builder demand | Companies training or fine-tuning models across architectures and modalities | Core buyer group for Silico | Medium-High | Open-model teams or frontier labs internalize similar tooling or decide observability is enough | High | Show clear ROI and control advantages that cannot be replicated with standard tracing stacks | Medium-High
Customer willingness to share workflow and usage data | Enterprise customers under TOS / order-form process | Can improve platform performance and product learning loops | Medium | Procurement teams restrict data-use rights or demand hard segregation of traces | Medium-High | Offer tighter customer controls and contract options that preserve trust without removing all learning loops | Medium
Adjacent observability stack | Datadog Agent Observability, LangSmith, and similar tooling | Competes for the same monitoring, evaluation, and governance budget lines | Medium | Customers buy observability plus evaluations and decide they do not need separate white-box interpretability | Medium-High | Position interpretability as a distinct causal-control layer with measurable lift in debugging or model design | Medium-High
Healthcare governance partners | Mayo and other regulated institutions | Provide legitimacy in sensitive domains | Medium | If governance-heavy partners do not translate into broader deployments, Goodfire stays a bespoke research vendor | Medium-High | Turn flagship healthcare work into repeatable compliance and validation packages | Medium-High

Concentration is judged from disclosed public evidence only; the company may have broader commercial breadth privately, but it is not yet visible enough to underwrite as a core mitigation.

[CR006, CR007, CR017, CR018, CR024, CR025]
FR003: Dependency map

Maps the external surfaces Goodfire currently relies on most: flagship collaborators, enterprise buyers willing to share data and buy bespoke work, and adjacent observability platforms shaping buyer expectations.

Partner and buyer concentration are inferred only from publicly disclosed proof points; undisclosed customers could improve the true picture.

[CR007, CR024, CR026, CR027, CR044, CR045]

7.4 Execution, talent, capital, and thesis-break triggers

Goodfire is trying to do three hard things at once: push frontier interpretability research, turn that work into an enterprise platform, and establish category authority in regulated and high-stakes domains. Public evidence suggests the company is still small relative to the size of that ambition. On Healthcare pegs headcount at about 51 people, the labor pool for interpretability specialists appears unusually thin, and the careers page signals a still-scaling organization. At the same time, the February 2026 Series B pushed valuation to $1.25 billion, which compresses the margin for execution error. A company with limited disclosed customer breadth, no public pricing architecture, and a high-touch services component now has to prove that it can become repeatable software quickly enough to justify that mark. The practical investment answer is to convert these uncertainties into hard triggers. If customer contracts continue to leave security and outage risk mostly with the buyer, if named production references do not widen materially, if software revenue still cannot be separated from embedded services, or if adjacent observability platforms satisfy most buyer needs, the thesis weakens fast. Conversely, the risk can compress if Goodfire shows production renewals in regulated settings, enterprise paper that materially tightens default terms, and independent evidence that interpretability interventions work in deployment rather than only in papers or bespoke collaborations. Until then, Goodfire looks like a high-upside but still proof-constrained control-layer bet rather than a de-risked infrastructure standard.[CR019, CR020, CR021, CR022, CR023, CR024]

People / execution risk register
role / function | dependency or gap | likelihood | severity | mitigation | diligence path
Interpretability research bench | Global talent pool appears unusually thin and expensive | High | High | Use capital to recruit senior researchers and convert reputation into hiring leverage | Review retention metrics, key-hire pipeline, and compensation competitiveness versus frontier labs.
Research-to-product translation | Company must turn frontier papers into repeatable enterprise workflows | High | High | Productize the highest-value interventions and narrow initial beachhead use cases | Review product roadmap, services share of revenue, and deployment architecture for named customers.
Commercial scaling / GTM | Case-by-case pricing and request-access posture limit visible repeatability | High | Medium-High | Standardize packages, implementation process, and procurement paper | Request pricing architecture, ACV bands, sales-cycle data, and renewal metrics.
Management bandwidth | Small team is simultaneously building research, platform, and regulated-domain partnerships | Medium-High | Medium-High | Prioritize a few vertical wedges and reduce bespoke projects | Review functional leadership depth, hiring plan, and what share of roadmap is customer-specific.
Capital discipline after unicorn pricing | Series B valuation compresses tolerance for slow commercial proof | Medium | High | Use new capital to widen reference base and prove software leverage quickly | Request board materials on spend allocation, next milestone gates, and target evidence for next round.

The public risk is not simply that the team is small; it is that the company's ambition, valuation, and labor-market scarcity all expand execution scope faster than public proof has expanded.

[CR019, CR020, CR021, CR022, CR023, CR024]
Mitigation and kill criteria table
risk | monitorable trigger | threshold / event | action implication
Contract paper remains startup-favorable | Enterprise MSAs still mirror public liability caps and warranty disclaimers | No meaningful security / outage / confidentiality carve-out in first 3 reference customers | Treat regulated-deployment thesis as unproven; do not underwrite healthcare or critical-infrastructure expansion.
Data-rights friction blocks procurement | Customers require major redlines around Workflow Data or refuse data sharing entirely | Two or more priority accounts stall specifically on data-use terms | Assume slower sales cycles and weaker product-learning loop; haircut software-scale assumptions.
Reference set fails to broaden | Named production customers do not expand beyond current collaboration-heavy proof set | Fewer than 3 additional named production references within the next refresh cycle | Re-rate company as bespoke research/services business rather than infrastructure layer.
Research results do not translate into deployment lift | No independent evidence of production gains from interpretability interventions | No third-party deployment study or customer KPI showing measurable improvement | Reduce moat assumption and compare directly against conventional observability vendors.
Security and uptime posture stays opaque | No public uptime, incident history, or runtime-control evidence beyond SOC 2 | Another refresh passes without SLA, status, or incident disclosures | Assume slower enterprise penetration in sensitive workloads.
Talent pipeline weakens | Hiring velocity or retention falls in core interpretability roles | Missed senior research / product hires for two consecutive quarters | Expect roadmap slippage and heavier founder / researcher concentration risk.
Valuation outruns repeatability | Capital raised and valuation grow faster than visible revenue quality | No pricing standardization or software-services split by next major financing event | Avoid paying for category-optionality without evidence of repeatable unit economics.
Observability platforms absorb the buyer problem | Customers adopt tracing/evaluation stacks without adding white-box interpretability | Reference buyers describe Goodfire as nice-to-have research tooling rather than control-plane infrastructure | Thesis break: category collapses into a feature rather than a standalone platform.

Kill criteria are framed as observable public-or-diligence events so they can be revisited in future refreshes instead of remaining abstract concerns.

[CR009, CR011, CR013, CR016, CR019, CR024]
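Four of the kill criteria above carry countable thresholds, so they can be checked mechanically at each refresh. The Python sketch below is illustrative only: `RefreshObservations` and `tripped_kill_criteria` are hypothetical names rather than an existing tool, and the qualitative criteria (deployment lift, security disclosure, pricing standardization, observability substitution) are deliberately left to analyst judgment.

```python
from dataclasses import dataclass


@dataclass
class RefreshObservations:
    """Counts collected at one diligence refresh (hypothetical field names)."""

    reference_customers_with_carve_outs: int  # among the first 3 reference customers
    priority_accounts_stalled_on_data_terms: int
    new_named_production_references: int  # added since the previous refresh
    consecutive_quarters_missing_senior_hires: int


def tripped_kill_criteria(obs: RefreshObservations) -> list[str]:
    """Return the countable kill criteria whose thresholds are met."""
    tripped = []
    # No meaningful security / outage / confidentiality carve-out in the first 3 references.
    if obs.reference_customers_with_carve_outs == 0:
        tripped.append("Contract paper remains startup-favorable")
    # Two or more priority accounts stall specifically on data-use terms.
    if obs.priority_accounts_stalled_on_data_terms >= 2:
        tripped.append("Data-rights friction blocks procurement")
    # Fewer than 3 additional named production references within the refresh cycle.
    if obs.new_named_production_references < 3:
        tripped.append("Reference set fails to broaden")
    # Missed senior research / product hires for two consecutive quarters.
    if obs.consecutive_quarters_missing_senior_hires >= 2:
        tripped.append("Talent pipeline weakens")
    return tripped


if __name__ == "__main__":
    # Hypothetical refresh: no carve-outs, two stalled accounts, one new reference.
    print(tripped_kill_criteria(RefreshObservations(0, 2, 1, 0)))
```

Keeping the thresholds in code makes each refresh comparable, but the inputs still have to come from reference calls and contract review.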
Chapter 08

08 Valuation

8.1 Recommendation, Financing Context, and Why Price Matters More Than Narrative

Public evidence paints Goodfire as a rare, high-quality interpretability company. The company assembled an elite funding stack quickly: a $50 million Series A in April 2025 followed by a $150 million Series B at a $1.25 billion valuation in February 2026, with Menlo, Anthropic, B Capital, Salesforce Ventures, and Eric Schmidt all showing up across the cap table. Official and filing records also support the basics of institutional quality: Goodfire is a Delaware public benefit corporation based in San Francisco (financing materials imply a 2024 founding, while one independent profile says 2023), and by early 2026 it had filed both Series A- and Series B-era Form D documents. Goodfire further claims enterprise-ready momentum via its Ember platform (branded Silico in 2026 product materials), partners and collaborators including Mayo Clinic, Arc Institute, Prima Mente, and Microsoft, and a February 2026 SOC 2 Type II announcement. Those positives matter, but this chapter is valuation work, not admiration. The evidence is strong on team quality, scientific credibility, and investor signaling; it is weak on the commercial datapoints normally used to justify a software infrastructure price. None of the public round materials in this source pack disclose ARR, revenue, pricing, customer count, retention, gross margin, or software-versus-services mix. That absence is decisive. At $1.25 billion, investors are not obviously paying for proven fundamentals; they are paying for the option that interpretability becomes core AI infrastructure and that Goodfire becomes one of the category winners. That may happen, but on public evidence alone the price already assumes more commercialization than the company has disclosed. The recommendation is therefore research-more, not buy, and the valuation stance is stretched rather than attractive.[CV001, CV002, CV004, CV005, CV006, CV007]

Recommendation summary table
Dimension | Assessment | Decision implication
Recommendation | Research-more | Re-engage only if NDA diligence closes the revenue-quality and cap-table gaps, or if pricing resets toward the base-case range.
Confidence | Medium | Quality of company signal is strong; quality of valuation signal is incomplete.
Risk rating | High | Commercial opacity, category formation risk, and preference-stack uncertainty dominate underwriting.
Valuation stance | Stretched | The $1.25B round sits near the low end of the bull case rather than the center of the base case.
Near-term action | Track aggressively | Maintain diligence access, but do not underwrite the round on narrative alone.

Uses only public evidence as of the run date; entry discipline assumes primary exposure near the February 2026 round terms.

[CV001, CV005, CV015, CV036, CV047, CV048]
Thesis / anti-thesis table
Lens | Thesis | Anti-thesis | What would change the view
Category need | Interpretability should become more important as enterprises demand controllable and explainable AI. | Enterprises may decide observability and guardrails are enough, keeping interpretability niche. | Budget data showing Goodfire wins a standard line item rather than an experimental spend.
Product | Ember offers a differentiated model-internal control layer, not just post-hoc monitoring. | The product may still be too research-heavy or bespoke to scale as software. | Proof of standard pricing, time-to-value, and repeatable deployments.
Scientific proof | Goodfire has real research outputs, including steering, neural geometry, genomics, and multimodal work. | Scientific credibility does not automatically translate into recurring revenue. | Evidence that flagship research programs convert into durable commercial accounts.
Strategic demand | Anthropic, Salesforce, and Eric Schmidt are strong signal investors for the category. | Smart investors can still overpay for strategic option value in a hot AI market. | Independent software metrics that validate the price without relying on cap-table prestige.
Valuation | A $1.25B mark could be justified if Goodfire becomes core AI infrastructure for high-stakes deployments. | Today's public evidence does not disclose the ARR or margins needed to justify that mark on fundamentals. | NDA disclosure of ARR, gross margin, and retention that supports a scalable software multiple.

Rows are evidence-backed arguments and the observable condition that would change the view.

[CV010, CV011, CV013, CV015, CV022, CV029]
FV001: Recommendation logic

How scientific strength, commercial opacity, and round price combine into the recommendation.

The flow is qualitative and designed to show decision logic, not a weighted scoring model.

[CV010, CV015, CV036, CV037, CV047, CV048]
FV004: Investment KPIs

Key underwriting datapoints that are either known from public evidence or still missing.

KPI panel mixes confirmed public facts with flagged gaps; unknown commercial metrics are shown explicitly as undisclosed.

[CV001, CV005, CV015, CV028, CV036, CV047]

8.2 Evidence-Constrained Valuation Framework and Comparable Marks

Because revenue is undisclosed, a conventional revenue-multiple model would create false precision. The right method is to combine comparable private valuation marks with scenario logic anchored on what is and is not public. The comparable set is useful less as a formula than as a discipline check. Anysphere, Harvey, and Glean all carried disclosed ARR when reporters attached multibillion-dollar marks to them, while Anthropic sits in a wholly different frontier-model and compute-scarcity universe. Goodfire does not belong in Anthropic territory, and unlike Anysphere, Harvey, or Glean it has not publicly shown the recurring revenue base that would let outside investors defend a multiple. That forces the current round to be interpreted as strategic option value. The bull, base, and bear cases therefore turn on milestone conversion rather than spreadsheet extrapolation. In the bull case, Goodfire proves that Ember converts design partners and research collaborators into repeatable software revenue, keeps shipping differentiated interpretability breakthroughs, and becomes a must-have layer for high-stakes AI deployment. In the base case, the category is real and Goodfire remains one of its strongest independent teams, but commercialization is still early and high-touch; that warrants a discount to the last round, not a premium. In the bear case, research remains impressive but budgets flow toward observability, guardrails, or frontier labs themselves, leaving Goodfire with a bespoke-services profile and a materially lower valuation. On this framing, the February 2026 round sits near the bottom of the bull range rather than in the middle of the base range.[CV010, CV011, CV016, CV022, CV023, CV024]

Bull / base / bear scenario table
Scenario | Core assumptions | Valuation / return logic | Key risks | Probability signal
Bull | Ember converts research credibility into repeatable software revenue; partners become scaled reference customers; security and governance posture unlocks enterprise adoption. | $1.25B-$1.85B EV; roughly 1.0x-1.5x versus the last round, meaning upside exists but is modest unless execution is exceptional. | Commercial conversion may stay slower than the research narrative implies. | Low-medium; requires proof not yet public.
Base | Category demand is real and Goodfire remains one of the best independent teams, but monetization stays early and partially bespoke. | $0.80B-$1.10B EV; roughly 0.6x-0.9x versus the last round, implying weak risk-adjusted returns at today's price. | Public data never closes the revenue-quality gap; budgets split across adjacent vendors. | Medium; most consistent with current public evidence.
Bear | Interpretability remains valuable but budgets shift toward observability, guardrails, or frontier labs; Goodfire struggles to standardize product revenue. | $0.35B-$0.65B EV; roughly 0.3x-0.5x versus the last round, implying material risk of permanent capital loss. | Commercialization remains bespoke; multiple compression hits AI infrastructure names. | Medium-low, but adverse enough to matter because disclosure is limited.

Scenario values are evidence-constrained enterprise-value ranges, not precise DCF outputs. Return logic is shown against the February 2026 $1.25B round mark.

[CV041, CV042, CV043, CV044, CV045, CV046]
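The return multiples in the scenario table are each scenario's enterprise-value range divided by the $1.25B round mark. A short Python check reproduces them from the table's own figures. Like the table, it divides an enterprise value by a headline round valuation, which is an equity mark, so it ignores net cash and the preference stack; treat it as an arithmetic check, not a return model.

```python
LAST_ROUND_MARK_B = 1.25  # February 2026 Series B valuation, USD billions

# Scenario enterprise-value ranges from the bull / base / bear table, USD billions.
SCENARIO_EV_RANGES_B = {
    "bull": (1.25, 1.85),
    "base": (0.80, 1.10),
    "bear": (0.35, 0.65),
}


def multiple_vs_last_round(ev_low: float, ev_high: float,
                           mark: float = LAST_ROUND_MARK_B) -> tuple[float, float]:
    """Express a scenario EV range as multiples of the last round mark, rounded to 2 dp."""
    return round(ev_low / mark, 2), round(ev_high / mark, 2)


if __name__ == "__main__":
    for name, (low, high) in SCENARIO_EV_RANGES_B.items():
        low_x, high_x = multiple_vs_last_round(low, high)
        print(f"{name}: {low_x:.2f}x-{high_x:.2f}x")
```

The unrounded results (1.00x-1.48x, 0.64x-0.88x, 0.28x-0.52x) round to the table's 1.0x-1.5x, 0.6x-0.9x, and 0.3x-0.5x.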
Comparable valuation table
Comparable | Public metric | Valuation / status | Relevance | Limitation
Goodfire | Revenue undisclosed; $150M Series B | $1.25B valuation (Feb 2026) | Direct market anchor for this chapter. | No public ARR, pricing, or customer data to support a software multiple.
Anysphere / Cursor | >$500M ARR | $9.9B valuation (Jun 2025) | Shows what a leading AI application company looks like when valuation is paired with disclosed scale. | Different product, growth profile, and developer-led distribution.
Harvey | $190M ARR | $11B reported raise target (Feb 2026) | Shows how elite enterprise AI valuations can outrun conventional multiples when growth is proven. | Legal AI is a different vertical and the number is reported, not company-confirmed.
Glean | >$100M ARR | $7.2B valuation (Jun 2025) | Useful application-software benchmark for enterprise AI value with disclosed ARR. | Enterprise search and agents is a more mature commercial category than interpretability.
Anthropic | Frontier model and compute scale | $350B valuation with Google committing up to $40B (Apr 2026) | Upper boundary for frontier-model scarcity value in AI. | Not comparable operationally; Goodfire is not a frontier foundation-model lab.

Selected 2025-2026 private AI marks used as discipline checks, not one-for-one valuation formulas; Goodfire revenue is undisclosed, so implied multiples cannot be calculated responsibly.

[CV001, CV030, CV032, CV033, CV034, CV035]
FV002: Valuation sensitivity

Directional sensitivity of valuation conviction; positive bars strengthen willingness to pay, negative bars weaken it.

Sensitivity bars are directional conviction scores, not dollar deltas, because public revenue disclosure is absent.

[CV015, CV022, CV036, CV039, CV040, CV041]
FV003: Valuation / return range

Evidence-constrained valuation bands against the February 2026 $1.25B round reference.

These are scenario ranges built from public comparables and milestone logic; they are not a substitute for NDA-backed financial underwriting.

[CV001, CV042, CV043, CV044, CV045, CV048]

8.3 Exit Discipline, Thesis-Break Triggers, and Final Diligence Asks

The near-to-mid-term exit path is almost certainly another private round or a strategic transaction, not an IPO. Goodfire is too early and too opaque publicly for public-market underwriting: investors do not have audited revenue scale, margin profile, or even a basic customer-count disclosure. That does not make the company unattractive; it makes the investment case diligence-dependent. The practical implication is that entry discipline must focus on the missing proof points that would move Goodfire from “exceptional research company with commercial promise” to “underwritable software infrastructure business.” Those proof points are recurring revenue quality, standard pricing, concentration, gross margin, and the post-Series-B preference stack. The thesis can also break in observable ways. If collaborators fail to convert into repeatable customers, if management cannot disclose convincing revenue quality under NDA, or if budget holders decide that tracing, monitoring, and guardrails from adjacent vendors are sufficient without Goodfire's deeper internal-control layer, the current price becomes difficult to defend. Conversely, if Goodfire can show repeatable software subscriptions, strong partner conversion, and evidence that interpretability is becoming mandatory infrastructure in regulated and high-stakes deployments, the round can grow into itself. Until that evidence is produced, the disciplined posture is to keep Goodfire on the front of the watchlist, continue diligence aggressively, and avoid treating the February 2026 price as a proven bargain.[CV022, CV026, CV027, CV036, CV039, CV040]

Thesis-break and kill triggers table
Trigger | Threshold | Transmission to thesis | Action implication
Revenue-quality opacity persists | Management cannot disclose ARR, gross margin, concentration, and retention under NDA. | The investment remains narrative-led instead of fundamentals-led. | Do not underwrite above the base-case range; default to pass.
No partner-to-paid conversion pattern | Scientific collaborators and design partners do not convert into repeatable platform revenue. | Goodfire looks like a high-end research shop instead of scalable infrastructure software. | Move valuation toward the bear case and require a lower entry price or structured downside protection.
Observability vendors satisfy the budget | Customers solve their pain with tracing, monitoring, and guardrails without needing model-internal control. | The category wedge narrows and Goodfire's TAM compresses. | Reduce conviction materially and reassess category ownership.
Preference stack is investor-unfriendly | Series B documents reveal heavy seniority, unusual protections, or meaningful dilution overhang. | Enterprise value may not translate into acceptable equity returns. | Re-cut returns on an equity-value basis before proceeding.
Security or governance credibility slips | A major trust, compliance, or governance issue undercuts the high-stakes deployment narrative. | The premium tied to safe and controllable AI weakens quickly. | Pause diligence until remediation is independently verified.

Triggers are framed as observable diligence findings or post-investment monitoring events that would break the underwriting case.

[CV036, CV043, CV046, CV049, CV050]
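The preference-stack trigger matters because exit proceeds are not split pro rata when preferred stock carries a liquidation preference. A minimal Python sketch, assuming a single aggregated preferred class with a 1x preference and uncapped participation when participation applies; Goodfire's actual Series A and B terms, seniority between series, and ownership split are undisclosed, so `split_exit` and every number in the example are hypothetical.

```python
def split_exit(exit_value: float, preference: float, preferred_fraction: float,
               participating: bool = False) -> tuple[float, float]:
    """Split exit proceeds between one aggregated preferred class and common.

    exit_value and preference share a unit (e.g. USD millions); preferred_fraction
    is the preferred class's as-converted ownership share.
    """
    if not 0.0 <= preferred_fraction <= 1.0:
        raise ValueError("preferred_fraction must be between 0 and 1")
    if exit_value < 0 or preference < 0:
        raise ValueError("exit_value and preference must be non-negative")
    pref_paid = min(preference, exit_value)  # preference is capped by available proceeds
    if participating:
        # Participating: take the preference, then share the residual pro rata.
        preferred = pref_paid + preferred_fraction * (exit_value - pref_paid)
    else:
        # Non-participating: take the greater of the preference or the as-converted share.
        preferred = max(pref_paid, preferred_fraction * exit_value)
    return preferred, exit_value - preferred


if __name__ == "__main__":
    # Hypothetical inputs in USD millions; actual Goodfire terms are not public.
    for exit_value in (500.0, 1250.0):
        for participating in (False, True):
            pref, common = split_exit(exit_value, 207.0, 0.30, participating)
            print(f"exit={exit_value:.0f} participating={participating}: "
                  f"preferred={pref:.1f} common={common:.1f}")
```

With a hypothetical $207M aggregate preference (the disclosed capital, assuming all of it carries a 1x preference) and a hypothetical 30% as-converted preferred stake, a $500M exit inside the bear range leaves common with $293M on non-participating terms but only about $205M if the preferred participates. A real re-cut would model each series' seniority and conversion decision from the charter.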
Final diligence asks table
Topic | Missing evidence | Why it matters | Owner / diligence path
Revenue quality | ARR, bookings, net retention, gross retention, gross margin, and revenue mix. | These are the core inputs for any valuation method beyond strategic option value. | CFO / finance room under NDA.
Pricing and packaging | Current pricing sheets, pilot-to-production conversion terms, and software-versus-services monetization. | Determines whether the business can scale as product revenue rather than bespoke work. | Sales leader and product leader interview plus contract sample review.
Customer concentration | Top ten customers, revenue concentration, deployment scope, and renewal status. | High concentration would make the current price much harder to defend. | Customer cohort review and account-level diligence.
Cap table and preferences | Post-Series-B cap table, liquidation preferences, pro rata rights, and governance protections. | Proceeds to a given share class can diverge sharply from its pro rata share of enterprise value if preferences are heavy. | Legal diligence on financing documents.
Commercial conversion | Evidence that Mayo, Arc, Microsoft, or similar relationships create repeatable paid software patterns. | This is the bridge between scientific credibility and a scalable investment case. | Management deep dive with cohort examples and implementation metrics.

These are the minimum asks needed to move Goodfire from an interesting company to an underwritable investment at or near the last round price.

[CV027, CV040, CV049, CV050]

8.4 Exhibits

Disclaimer

This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.

Evidence index

Claims
ID Statement Confidence Sources
CO001 Goodfire describes itself as a San Francisco-based research company and public benefit corporation. High SO001, SO002, SO014, SO018
CO002 Goodfire’s mission is to build safe and powerful AI by understanding and intentionally shaping model internals rather than relying on scaling alone. Medium SO001, SO004, SO005, SO006
CO003 Goodfire’s current public product is a model design environment that helps users understand, debug, and shape models through interpretability-based tooling. Medium SO002, SO007, SO027
CO004 Goodfire says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Medium SO010
CO005 Official materials frame Goodfire around two linked pillars: intentional design of models and scientific discovery from model internals. Medium SO004, SO005, SO008
CO006 Lightspeed publicly announced Goodfire’s $7 million seed round on August 15, 2024, showing the company was operating by mid-2024. Medium SO022, SO023
CO007 Series A materials say the $50 million round came less than one year after Goodfire’s founding, which supports a 2024 founding window. Medium SO018, SO020, SO021
CO008 One independent profile describes Goodfire as founded in 2023, creating a conflict with the 2024 founding window implied by financing materials. Medium SO028
CO009 Goodfire’s careers page says all roles are full-time and in person five days a week at a Telegraph Hill office in San Francisco. Medium SO003
CO010 Eric Ho is Goodfire’s CEO and primary public spokesperson in financing and media materials. Medium SO014, SO018, SO029
CO011 Daniel Balsam is publicly identified as Goodfire’s cofounder and CTO. Medium SO009, SO024, SO030
CO012 Tom McGrath is publicly identified as Goodfire’s cofounder and chief scientist, and partner materials credit him with founding DeepMind’s interpretability team. Medium SO024, SO030
CO013 Goodfire and third-party coverage say the team includes researchers or engineers from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego. Medium SO004, SO014, SO017
CO014 Investor materials tie Eric Ho and Daniel Balsam to prior operating work at RippleMatch, supporting the claim that the founding team combines startup execution with research pedigree. Medium SO021, SO022, SO024
CO015 Reviewed public materials do not disclose a full board roster or a complete executive team beyond the founders and a few named researchers. Medium SO002, SO024, SO025
CO016 Goodfire announced a $50 million Series A led by Menlo Ventures with Lightspeed, Anthropic, B Capital, Work-Bench, Wing, and South Park Commons participating. High SO018, SO019, SO020, SO021, SO026
CO017 Lightspeed says it led Goodfire’s $7 million seed round in August 2024. Medium SO022, SO023
CO018 Goodfire announced a $150 million Series B at a $1.25 billion valuation led by B Capital with Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, Eric Schmidt, and others participating. High SO004, SO014, SO015, SO016, SO017, SO029
CO019 The Series B was announced less than a year after Goodfire’s Series A. Medium SO014, SO015, SO004
CO020 Goodfire and third-party coverage describe the company as having raised more than $200 million in total funding after the Series B. Medium SO004, SO014, SO016
CO021 Adding the publicly disclosed seed, Series A, and Series B rounds implies roughly $207 million of total disclosed capital. Medium SO022, SO018, SO014
CO022 Reviewed public sources do not disclose debt financing, secondary transactions, ownership percentages, or board-seat allocations for Goodfire’s financings. Medium SO014, SO018, SO025
CO023 Salesforce Ventures’ investment materials frame Goodfire as foundational enterprise AI infrastructure rather than only a research project. Medium SO024, SO025
CO024 Goodfire’s public product branding shifted from Ember in 2025 financing materials to Silico in 2026 product materials. Medium SO018, SO020, SO007, SO029
CO025 Goodfire says it reduced hallucinations in a large language model by about half using interpretability-informed training. Medium SO004, SO027
CO026 Official materials name Prima Mente, Arc Institute, Mayo Clinic, and Microsoft as partners or collaborators. Medium SO004, SO008, SO009, SO011
CO027 The Mayo Clinic collaboration explicitly discloses that Mayo Clinic has a financial interest in the technology referenced in the announcement. Medium SO009
CO028 Goodfire’s public commercial proof remains broad and category-based because it names customer types but does not list many named enterprise customers or contract counts. Medium SO010, SO028
CO029 Goodfire should be classified as a private Series B-stage company based on investor profiles labeling it private and the February 2026 financing history. Medium SO025, SO030, SO014
CO030 Goodfire’s best-supported current public valuation is $1.25 billion. High SO004, SO014, SO015, SO016, SO017
CO031 Goodfire’s best-supported public total capital figure is above $200 million. Medium SO004, SO014, SO016, SO022, SO018
CO032 No reviewed public source discloses Goodfire’s revenue, ARR, or customer count. Medium SO004, SO014, SO025
CO033 No official source reviewed discloses employee headcount, but one independent profile estimates Goodfire had about 51 employees as of January 2026. Low SO003, SO028
CO034 Reviewed public sources identify only a single disclosed office location in San Francisco and do not name other offices. Medium SO003, SO025
CO035 The public milestone arc visible in reviewed sources runs from seed financing in August 2024 to Series A in April 2025 and Series B in February 2026. Medium SO022, SO020, SO004
CO036 Goodfire’s September 2025 Mayo Clinic announcement shows the company expanding from interpretability tooling into healthcare and genomic medicine partnerships. Medium SO009
CO037 By February 2026 Goodfire was publicly describing partnerships spanning AI agents and life sciences. Medium SO004, SO015
CO038 MIT Technology Review reported on April 30, 2026 that Goodfire was commercially releasing Silico as a fee-based tool for model debugging and steering. Medium SO027
CO039 MIT Technology Review quoted an outside interpretability researcher saying Goodfire is adding “precision to the alchemy” rather than making model design fully principled. Medium SO027
CO040 An independent health-tech analysis argues the $1.25 billion valuation is aggressive for a research-first company with early commercial traction and an estimated 51 employees. Medium SO028
CO041 Goodfire’s public materials show active field-building and recruiting through a fellowship program, Stanford guest lectures, and ongoing in-person hiring in 2025-2026. Medium SO003, SO012, SO013
CM001 Goodfire positions itself as an interpretability lab focused on understanding and intentionally designing AI rather than only monitoring outputs. High SM001, SM007
CM002 Silico is described as a model design environment for training and debugging models on Goodfire infrastructure. High SM003, SM007
CM003 Goodfire says it partners with organizations training or fine-tuning foundation models across architectures and modalities. High SM003, SM004, SM005, SM006
CM004 Goodfire claims its language-model workflow cut hallucinations by 58% without degrading benchmark performance and at about 90x lower cost than LLM-as-a-judge. Medium SM004
CM005 Goodfire publicly markets use cases across language models, genomics, and robotics or vision instead of only text-model applications. High SM004, SM005, SM006
CM006 Goodfire says it works with partners such as Arc Institute, Mayo Clinic, and Microsoft and uses a shared environment with customers. Medium SM007
CM007 Goodfire publicly describes inference-time monitors and production monitoring as part of its intentional-design platform. High SM001, SM007
CM008 Goodfire argues that black-box prompting and fine-tuning are inadequate for reliable high-stakes AI engineering and that feature steering can substitute for some fine-tuning work. Medium SM008, SM009
CM009 Goodfire's pilot agreement starts with internal evaluation of software plus services and explicitly aims toward a later commercial license. Medium SM014
CM010 The pilot agreement requires customer cooperation, access to software or equipment, and designated contacts, implying a high-touch delivery model. Medium SM014
CM011 Prima Mente used Goodfire to decode an epigenomics model for biomarker discovery and model redesign, showing a plausible scientific-AI buyer archetype. High SM005, SM015
CM012 Goodfire and Mayo frame interpretability as a way to validate model predictions, reduce spurious correlations, and improve scientific or clinical relevance under governance controls. High SM005, SM010
CM013 MIT Technology Review says Goodfire is one of a small handful of companies pioneering mechanistic interpretability and that frontier labs already have internal interpretability teams. Medium SM030
CM014 MIT says Silico is most usable where customers can access model internals, which is easier for open-source or in-house models than for closed models like ChatGPT or Gemini. High SM003, SM030
CM015 MIT reports that Goodfire will price Silico case-by-case instead of publishing standard pricing. Medium SM030
CM016 Gartner says generative-AI ROI varies widely by use case and that hidden costs such as compliance reviews, retraining, and internal overhead can exceed initial expectations. Medium SM016
CM017 Gartner places generative AI in the 2025 Trough of Disillusionment, which implies more cautious implementation expectations even as interest remains high. Medium SM016
CM018 NIST's AI Risk Management Framework treats trustworthy, governable AI as a prerequisite for adoption in higher-risk settings. Medium SM018
CM019 PwC reports that AI-exposed industries have 3x higher revenue-per-worker growth since 2022 and workers with AI skills command a 56% wage premium. Medium SM017
CM020 Goodfire's relevant market boundary is narrower than broad generative-AI narratives and should focus on model design, interpretability, and model-behavior tooling for teams that can inspect or modify internals. High SM001, SM003, SM014, SM030
CM021 The included spend pool covers representation analysis, failure diagnosis, steering, interpretable training feedback, and production monitors, while excluding generic AI hardware, generic copilots, and pure app-performance monitoring. High SM003, SM007, SM023
CM022 Arize, Fiddler, Datadog, LangSmith, Langfuse, Patronus, Arthur, and Humanloop show that tracing, evaluation, monitoring, and agent control are already recognized software categories. High SM019, SM021, SM023, SM024, SM025, SM027, SM028, SM029
CM023 Those adjacent platforms mostly observe prompts, traces, sessions, and outputs, whereas Goodfire's differentiation claim is control over internal features, parameters, or latent representations. High SM003, SM011, SM013, SM019, SM021, SM024, SM025
CM024 Arize sells from free or open-source tooling to a $50-per-month Pro plan and custom enterprise pricing, showing the adjacent observability layer already has self-serve pricing and startup programs. High SM019, SM020
CM025 Fiddler publishes a developer price of $0.002 per trace and markets enterprise guardrails, observability, and governance as one platform. High SM021, SM022
CM026 Langfuse publishes prices from free to $29 per month Core, $199 per month Pro, and $2,499 per month Enterprise, with enterprise security and support features. High SM025, SM026
CM027 Humanloop markets enterprise evaluation tooling with a free trial, 50 eval runs, and 10,000 logs per month, reinforcing that adjacent budgets often begin with workflow tooling rather than custom research engagements. Medium SM029
CM028 Goodfire's direct market reach is highest in frontier labs because they already run interpretability teams, possess model internals, and value precise control over training and behavior. Medium SM003, SM007, SM030
CM029 Enterprise model teams are reachable when they train or fine-tune proprietary or open-weight models, but teams using only closed APIs are outside Goodfire's near-term reach. Medium SM003, SM009, SM014, SM030
CM030 Scientific-AI teams in genomics, biology, and robotics are attractive because model internals can reveal domain mechanisms, improve generalization, and validate whether predictions rely on real structure or shortcuts. High SM005, SM006, SM010, SM012, SM015
CM031 Regulated adopters have strong need for interpretability and trustworthy AI, but procurement and deployment cycles are slower because governance, privacy, and evidence standards are higher. High SM010, SM017, SM018
CM032 Goodfire's adoption motion likely starts with a pilot or design-partner evaluation, then requires model and data access, interpretability work, and only later expands to production monitoring and longer-term licensing. High SM003, SM007, SM014
CM033 In this market the buyer, user, and payer often differ, with research or platform leaders buying, model scientists and safety teams using, and AI R&D or platform budgets paying. Medium SM002, SM003, SM014, SM030
CM034 The category grows as models take on higher-stakes tasks in health, science, finance, and autonomous agent workflows where output-only evaluation is insufficient. High SM005, SM010, SM021, SM023, SM024
CM035 Agent-observability vendors frame autonomous decisions, guardrails, and repeatable evaluation as business-critical, which expands the adjacent budget pool that Goodfire can sell into or alongside. High SM021, SM022, SM023, SM024, SM025, SM027
CM036 Dependence on model-internal access is a major constraint because Goodfire's tooling requires deeper access than teams using only hosted closed-model APIs can usually provide. High SM003, SM014, SM030
CM037 Goodfire presents interpretability as precision engineering that can turn training into intentional design. Medium SM007, SM008
CM038 MIT Technology Review quotes an external researcher saying Goodfire is adding precision to alchemy, which challenges the precision-engineering narrative. Medium SM030
CM039 Goodfire's own intentional-design essay says the agenda is at the beginning of a deep technical tree and still needs better interpretability tools and algorithms. Medium SM008
CM040 Goodfire's parameter-decomposition research says current interpretability methods still struggle to map model behavior cleanly to underlying parameters and circuits, which reinforces technical immaturity. Medium SM013
CM041 Goodfire's manifold-steering research argues that linear steering often mismatches model geometry and that geometry-aware steering works better, suggesting the technical edge is not commodity tracing. Medium SM011
CM042 Goodfire's Evo 2 work shows interpretability can reveal biologically relevant features and possibly guide DNA generation, supporting a scientific-AI market lens beyond enterprise copilots. High SM005, SM012
CM043 Goodfire says customer conversations show teams prioritize rapid iteration and migration to newer models over heavy fine-tuning, which implies demand for lighter-weight control tooling. Medium SM009
CM044 Public adjacent pricing creates a floor for what buyers expect to pay for observability and eval tooling, but Goodfire's undisclosed case-by-case pricing means it must win on higher-value model-internal outcomes rather than commodity traces. High SM020, SM022, SM026, SM029, SM030
CM045 Because Goodfire has no public pricing schedule, customer count, or disclosed recurring revenue, a defensible TAM, SAM, or SOM cannot be computed from public evidence alone. High SM014, SM030
CM046 The most evidence-backed near-term SOM is a small set of frontier labs, advanced enterprise model teams, and scientific model builders willing to grant model access and buy services-heavy pilot engagements. High SM003, SM005, SM006, SM014, SM030
CM047 Published self-serve observability prices imply an annual software band of roughly $348 to $2,388 before enterprise add-ons or heavy usage. High SM020, SM026
CM048 Public list pricing shows adjacent enterprise-grade observability software can reach at least about $29,988 per year before overage charges or custom services. Medium SM026
CM049 Fiddler's per-trace pricing implies annual monitoring spend can range from hundreds to tens of thousands of dollars depending on trace volume. Medium SM022
CP001 Goodfire positions Silico as the first platform for intentional model design and as a workspace for training and debugging models at frontier scale. Medium SP001
CP002 Goodfire says its language-model workflow predicts failures before deployment and can correct failure modes directly without retraining from scratch. High SP001, SP002
CP003 Goodfire extends the same model-internal workflow into life sciences and robotics/vision use cases, not just generic chat applications. High SP003, SP004
CP004 Goodfire explicitly frames feature steering as an alternative to black-box prompting and fine-tuning workflows. Medium SP005
CP005 Goodfire disclosed a $150 million Series B at a $1.25 billion valuation and third-party coverage describes roughly $209 million raised in total. High SP006, SP008
CP006 MIT Technology Review describes Goodfire as one of a small handful of mechanistic-interpretability pioneers alongside Anthropic, OpenAI, and Google DeepMind. Medium SP007
CP007 MIT Technology Review says frontier labs already have internal interpretability teams, making them Goodfire's closest direct incumbent alternative for top-end model builders. Medium SP007
CP008 MIT Technology Review says Silico is most useful where customers can inspect a model's inner workings, limiting its applicability on closed models such as ChatGPT or Gemini. Medium SP007
CP009 Outside researcher Leonard Bereska told MIT Technology Review that Goodfire may be adding precision to existing AI alchemy rather than fully turning model building into engineering. Medium SP007
CP010 On Healthcare Tech characterizes Goodfire as a roughly 51-person, research-first organization whose $1.25 billion valuation looks aggressive relative to disclosed commercial traction. Medium SP008
CP011 Goodfire's probe-based data-attribution work claims a 63% reduction in harmful behavior after filtering flagged data and larger reductions after swapping labels or removing responsible sources. Medium SP009
CP012 Goodfire says SAE probes for Rakuten AI agents generalized better than other probes on PII detection and were cheaper than LLM-as-judge baselines. Medium SP010
CP013 Goodfire's Llama 3 research preview claims it can extract modifiable internal features and steer behavior while minimizing performance degradation. Medium SP011
CP014 Goodfire's VPD explainer says direct edits to decomposed parameter subcomponents can produce precise behavior changes without retraining. Medium SP012
CP015 Goodfire says its self-correcting-search collaboration improved viable candidate materials by about 30%, supporting its claim that mechanistic tools can affect model behavior in non-LLM domains. Medium SP013
CP016 Goodfire's own reasoning-theater research argues that chain-of-thought can be unfaithful to internal computation, which weakens the claim that trace-level reasoning alone is enough for debugging. Medium SP014
CP017 Arize Phoenix markets an open-source platform for agent development and evaluation built around tracing, evals, datasets, and experiments. Medium SP015
CP018 Arize prices AX Pro at $50 per month with 50k spans and 10 GB, while enterprise packaging is custom and can be self-hosted. Medium SP016
CP019 Fiddler positions itself as a unified AI observability and security platform with lifecycle evaluation, monitoring, and real-time guardrails. Medium SP017
CP020 Fiddler publishes a free tier and a Developer plan priced at $0.002 per trace, with enterprise deployment options spanning SaaS, VPC, and on-premise. Medium SP018
CP021 Arthur markets a full-lifecycle platform for reliable AI that combines continuous evals, policies, guardrails, dashboards, and oversight. Medium SP019
CP022 Datadog ties agent observability to its broader application-monitoring estate and says teams can test prompt, model, and tool changes against production data in one workflow. Medium SP020
CP023 Datadog publishes a free tier up to 40K LLM spans per month and a Pro plan starting at $160 per month for 100K spans, with no separate evaluation fee. Medium SP020
CP024 LangSmith markets agent observability with framework-agnostic SDKs and says it has a free tier while paid plans scale with trace volume. Medium SP021
CP025 Langfuse markets an open-source AI engineering platform with self-hosting and OpenTelemetry compatibility, and claims 10+ billion observations per month and more than 100,000 engineers building on it. Medium SP022
CP026 Langfuse publishes transparent self-serve pricing: a free Hobby tier, Core at $29, Pro at $199, and Enterprise at $2,499 per month, plus a unit-based overage ladder. Medium SP023
CP027 Humanloop historically sold enterprise tools to develop, evaluate, and ship trustworthy LLM apps, including private deployment options and a free trial. Medium SP024
CP028 Humanloop is now joining Anthropic and sunsetting its platform, so it is better read as a consolidation signal than as a durable stand-alone peer. Medium SP025
CP029 Weights says its products and services were wound down after its team joined OpenAI, reinforcing the pattern that AI tooling teams can be absorbed by frontier labs. Medium SP026
CP030 NIST's AI RMF and Gartner's GenAI guidance both emphasize trustworthiness, governance, evaluation, and hidden operating costs in sensitive AI deployments. High SP027, SP028
CP031 Goodfire's closest direct alternatives are internal frontier-lab interpretability teams and advanced in-house build paths, not ordinary tracing vendors. High SP001, SP007, SP015, SP021
CP032 Most adjacent vendors in the reviewed set compete at the trace, eval, guardrail, or governance layer rather than through direct edits to learned model features. High SP017, SP019, SP020, SP021, SP022
CP033 Goodfire appears best aligned to buyers building or adapting open-weight models in high-stakes domains where pre-deployment diagnosis matters more than general observability breadth. Medium SP002, SP003, SP004, SP007
CP034 Datadog, LangSmith, and Langfuse have stronger visible developer distribution than Goodfire because they ride existing observability, framework, or open-source workflows. Medium SP020, SP021, SP022
CP035 Fiddler and Arthur compete more directly with governance- and trust-led procurement because they explicitly emphasize guardrails, policies, monitoring, and enterprise oversight. Medium SP017, SP018, SP019
CP036 Goodfire's public commercial disclosure is thinner than that of Arize, Fiddler, Datadog, and Langfuse: those vendors publish list prices, whereas MIT Technology Review describes Silico pricing as case-by-case and Goodfire declined to give specifics. Medium SP007, SP016, SP018, SP020, SP023
CP037 Free or low-cost adjacent tools put price pressure on any attempt to sell Goodfire as a generic AI engineering or observability layer instead of a differentiated model-design product. Medium SP015, SP016, SP020, SP022, SP023, SP024
CP038 Category consolidation is already visible through Humanloop's move to Anthropic and Weights' move to OpenAI, which raises the risk that interpretability adjacencies become features inside larger labs or stacks. Medium SP025, SP026
CP039 Goodfire's moat is strongest if its research outputs in steering, attribution, probes, and domain science can be productized into repeatable workflows rather than bespoke research wins. Medium SP006, SP009, SP010, SP011, SP013
CP040 The public record does not yet show enough win-rate, realized-pricing, or retention evidence to underwrite Goodfire's competitive durability with high confidence. Medium SP007, SP008
CP041 The status-quo substitute for many buyers remains an in-house black-box stack of prompting, eval harnesses, fine-tuning, and guardrails, which is cheaper up front but less mechanistically explanatory. Medium SP005, SP015, SP021, SP024
CP042 Goodfire's partner access today looks more domain-credibility-led than platform-distribution-led: collaborations with Microsoft, Mayo Clinic, Rakuten, and Radical AI support relevance but do not equal Datadog- or LangChain-style installed-base reach. Medium SP006, SP010, SP013, SP020, SP021
CP043 Humanloop historically packaged enterprise LLM evals as a standalone platform, reinforcing that adjacent evaluation vendors compete for some of the same budgets Goodfire targets. Medium SP029
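The price pressure described in CP037 is easier to see when the list prices in CP018, CP020, and CP023 are normalized to a cost per unit of included volume. The sketch below does that normalization. The class and helper names are illustrative and are not taken from any vendor API. Overage rates, storage limits, and free tiers are deliberately ignored. Spans and traces are kept as separate units because a trace normally contains one or more spans, so per-span and per-trace prices are not directly comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListPrice:
    """One published list price, normalized to a cost per unit of volume."""
    vendor_plan: str
    usd: float  # price paid for `units`
    units: int  # volume covered by that price
    unit: str   # "span" or "trace"; a trace usually contains several spans

    @property
    def usd_per_unit(self) -> float:
        return self.usd / self.units


# Figures as stated in CP018, CP020, and CP023 (base prices only).
PRICES = [
    ListPrice("Arize AX Pro (monthly base)", 50.0, 50_000, "span"),
    ListPrice("Datadog Pro (monthly base)", 160.0, 100_000, "span"),
    ListPrice("Fiddler Developer (per trace)", 0.002, 1, "trace"),
]


def cheapest_by_unit(prices: list[ListPrice], unit: str) -> ListPrice:
    """Return the lowest per-unit quote among quotes that share the same unit."""
    same_unit = [p for p in prices if p.unit == unit]
    return min(same_unit, key=lambda p: p.usd_per_unit)


if __name__ == "__main__":
    for p in PRICES:
        print(f"{p.vendor_plan}: ${p.usd_per_unit:.4f} per {p.unit}")
```

At list price, both span-based plans work out to a fraction of a cent per span: $0.0010 for Arize and $0.0016 for Datadog. This illustrates why a generic tracing pitch would compete against very low unit prices.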
CI001 Goodfire announced a $150 million Series B round at a $1.25 billion valuation in February 2026. High SI002, SI016, SI017
CI002 Goodfire's 2026 Form D reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124. Medium SI028
CI003 Goodfire announced a $50 million Series A round in April 2025. High SI021, SI022, SI023, SI027
CI004 Goodfire's 2025 Form D reports $52,029,991 sold after a first sale on 2025-04-02. Medium SI027
CI005 At least $202,029,787 of equity sold across Goodfire's 2025 and 2026 Form D filings is directly verifiable from primary filing evidence. High SI027, SI028
CI006 Goodfire says the Series B proceeds will fund frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences. High SI002, SI016, SI018
CI007 Goodfire describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure. High SI001, SI003
CI008 Goodfire's product and vertical pages route prospects to request access or contact the company instead of publishing self-serve commercial pricing. High SI003, SI004, SI005, SI006, SI008
CI009 Goodfire's contact page says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Medium SI008
CI010 Goodfire's Series B post says the company has partnered with Arc Institute, Mayo Clinic, and Microsoft to deploy its technology. Medium SI002
CI011 In the Prima Mente case study, Goodfire says its research scientists embedded with the customer and built a biomarker-discovery pipeline around the customer's model. Medium SI011
CI012 Goodfire's public contract terms show a commercial bundle that can include platform access plus support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables. High SI013, SI015
CI013 Goodfire's MSA and TOS place core commercial fees in negotiated order forms rather than in public documentation. High SI013, SI015
CI014 Goodfire's TOS explicitly contemplates overage charges when usage exceeds the allotment included in the applicable order form. Medium SI015
CI015 Goodfire's pilot agreement says pilot access is for internal evaluation and requires a separate commercial license for post-evaluation use. Medium SI014
CI016 Goodfire's TOS says usage reports provided through the platform dashboard or on request are the authoritative source for calculating Fees. Medium SI015
CI017 Goodfire's MSA says it will not use Customer Data to train foundation models or generalized machine-learning models for the benefit of Goodfire, other customers, or third parties. Medium SI013
CI018 Goodfire's TOS gives Goodfire a perpetual license to use Workflow Data to provide, improve, train, fine-tune, and commercialize the platform, subject to confidentiality constraints. Medium SI015
CI019 Goodfire's MSA and TOS allow suspension for overdue accounts and provide for late-payment interest of 1.5 percent per month. High SI013, SI015
CI020 Goodfire announced SOC 2 Type II compliance and a public SOC 3 report by February 2026. Medium SI010
CI021 Goodfire's official vertical pages target teams training or fine-tuning AI models across architectures and modalities rather than retail end users. High SI003, SI004, SI005, SI006
CI022 Goodfire's RLFR post claims a 58 percent reduction in hallucinations in Gemma-3-12B-IT at roughly 90 times lower cost than an LLM-as-a-judge alternative, with no degradation on standard benchmarks. High SI004, SI012
CI023 Goodfire's RLFR and life-sciences proof points are technical or scientific outcomes, not disclosed customer ROI or recognized revenue metrics. Medium SI011, SI012, SI026
CI024 Goodfire's feature-steering post says the SAE demo interface and API were deprecated in February 2026. Medium SI007
CI025 The deprecation of public preview tooling and the request-access posture together suggest Goodfire has shifted its public surface toward enterprise and custom deployments. Medium SI003, SI007, SI008
CI026 Goodfire's public evidence includes named life-sciences proof points with Prima Mente, Mayo Clinic, and Arc Institute. High SI002, SI005, SI011
CI027 Salesforce Ventures presents Goodfire as foundational enterprise infrastructure for understanding and intentionally designing modern AI. Medium SI025
CI028 Menlo Ventures says Goodfire is productizing Ember and commercializing model understanding, and notes that Eric Ho previously scaled a prior company to more than $10 million in ARR. Medium SI023
CI029 No reviewed public source discloses Goodfire's revenue, ARR, gross margin, cash balance, burn rate, runway, or customer retention metrics. High SI001, SI002, SI003, SI013, SI015, SI016, SI026
CI030 No reviewed public source discloses public list pricing, minimum commits, or discount ladders for Silico or related commercial offerings. High SI003, SI004, SI005, SI006, SI008, SI013, SI015
CI031 Because pricing is private and contracts are order-form based, Goodfire's realized pricing and software-versus-services revenue mix cannot be inferred from the official surface alone. Medium SI011, SI013, SI014, SI015
CI032 A skeptical sector analysis argues that Goodfire's $1.25 billion valuation is aggressive for a roughly 51-person company with early commercial traction and not yet a predictable SaaS business. Medium SI026
CI033 The same skeptical analysis argues that investors are underwriting Goodfire on research and platform option value rather than on publicly evidenced near-term software revenue. Medium SI026
CI034 Goodfire's Series B was announced less than a year after its Series A, showing capital access that scaled faster than disclosed operating metrics. High SI002, SI021, SI026
CI035 Goodfire's 2025 Form D lists 47 investors, while the 2026 Form D lists 19 investors. High SI027, SI028
CI036 Goodfire's 2026 Form D total offering amount exceeds the press-announced $150 million sold amount, implying possible residual allocation or additional financing capacity within the same offering. Medium SI016, SI028
CI037 No public debt facility or project-finance obligation surfaced in the reviewed sources, but the absence of disclosure should not be treated as proof of zero leverage. Medium SI013, SI015, SI016, SI026
CI038 Post-Series-B capital adequacy can only be assessed qualitatively: Goodfire is well funded relative to public stage signals, but runway cannot be modeled without cash and burn data. Medium SI016, SI026, SI028
CI039 Goodfire's public messaging implies a high-touch GTM motion centered on selective design partnerships rather than broad self-serve transaction volume. Medium SI002, SI008, SI015
CI040 Because Goodfire's customer evidence includes embedded scientific work and its terms contemplate field engineering and collaboration activities, at least some current revenue likely mixes software access with services delivery. Medium SI011, SI015
CI041 Goodfire publicly presents Radical AI as a materials-science design partner, supporting a commercialization path beyond language-model customers. Medium SI029
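The figures in CI005, CI034, and CI036 can be cross-checked against the Form D amounts and first-sale dates reported in CI002 and CI004. The sketch below is plain arithmetic on those reported values. The constant names are illustrative. The result is a floor on verified equity sold, not total capital raised, because earlier rounds without a reviewed Form D are not included.

```python
from datetime import date

# Figures from CI002 and CI004 (Form D amounts sold and first-sale dates).
SOLD_2025_FORM_D = 52_029_991
SOLD_2026_FORM_D = 149_999_796
OFFERING_2026_FORM_D = 161_674_124
FIRST_SALE_2025_FORM_D = date(2025, 4, 2)
FIRST_SALE_2026_FORM_D = date(2025, 12, 17)  # the 2026 filing reports a December 2025 first sale

# CI005: equity sold that is directly verifiable across both filings.
verified_equity_usd = SOLD_2025_FORM_D + SOLD_2026_FORM_D

# CI036: offering capacity on the 2026 Form D not yet reported as sold.
unsold_capacity_usd = OFFERING_2026_FORM_D - SOLD_2026_FORM_D

# CI034 cross-check using filing first-sale dates rather than press announcements.
days_between_first_sales = (FIRST_SALE_2026_FORM_D - FIRST_SALE_2025_FORM_D).days

if __name__ == "__main__":
    print(f"Verified equity sold: ${verified_equity_usd:,}")
    print(f"Unsold 2026 offering capacity: ${unsold_capacity_usd:,}")
    print(f"Days between first sales: {days_between_first_sales}")
```

This yields $202,029,787 of verified equity sold and $11,674,328 of unsold 2026 offering capacity. It also shows 259 days between the two first-sale dates, which supports CI034's point that the Series B followed the Series A in well under a year.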
CE001 Silico is described as the first platform for intentional model design and as a model-design environment built on Goodfire infrastructure. High SE001, SE030
CE002 Silico markets five operator jobs around model internals: seeing inside predictions, running health checks, debugging failures, shaping behavior, and generalizing from less data. Medium SE001
CE003 Goodfire's current product motion is request-access and partnership-led for teams training or fine-tuning foundation models across architectures and modalities. Medium SE001, SE002, SE003, SE004
CE004 Goodfire's language-model workflow claims a 58% hallucination reduction with no degradation on performance benchmarks. High SE002, SE005
CE005 The same language workflow claims roughly 90x lower intervention cost than LLM-as-a-judge approaches. Medium SE002
CE006 The Hallucinations Viewer compares base and policy rollouts on LongFact++ and exposes intervention details for selected outputs. Medium SE005
CE007 Goodfire's life-sciences surface claims state-of-the-art pathogenicity prediction across 839k ClinVar variants. High SE003, SE015
CE008 Goodfire says EVEE provides interpretable predictions and explanations for all 4.2 million ClinVar variants. High SE003, SE015
CE009 Prima Mente and Goodfire identified DNA fragment length as a dominant Alzheimer's signal and distilled the finding into a human-readable classifier. Medium SE013, SE028
CE010 Goodfire says the Alzheimer's biomarker workflow generalized to an independent cohort. High SE003, SE013
CE011 Goodfire's robotics and vision surface says teams can predict failures before deployment by inspecting latent representations directly. Medium SE004
CE012 The robotics case study says Goodfire traced unstable behavior to brittle internal features and information bottlenecks. Medium SE004
CE013 Goodfire markets feature steering as stronger than prompting when prompt engineering hits diminishing returns. Medium SE006
CE014 Goodfire says feature steering can often replace fine-tuning for behavior changes but cannot add new knowledge to a model. Medium SE006
CE015 Goodfire deprecated its earlier SAE demo interface and API in February 2026. Medium SE006
CE016 MIT Technology Review reports that Silico uses agents to automate interpretability work that previously required human researchers. High SE009, SE030
CE017 MIT Technology Review reports that Silico is priced case-by-case and is easier to use on open-source models than on closed APIs. Medium SE030
CE018 Goodfire's intentional design thesis frames current AI training as guess-and-check and positions interpretability as closed-loop steering. Medium SE007
CE019 Goodfire says intentional design aims to change what models learn from individual datapoints rather than hard-wiring heuristics into models. Medium SE007
CE020 Goodfire says it released the first public sparse autoencoders trained on a true reasoning model, DeepSeek R1. Medium SE010
CE021 Goodfire's R1 work says effective steering had to begin after the model's boilerplate response prefix rather than at the first response token. Medium SE010
CE022 Goodfire reports that some R1 features revert toward original behavior under oversteering before outputs become incoherent. Medium SE010
CE023 Goodfire argues that important model concepts often live on curved manifolds rather than along single linear directions. Medium SE011, SE014
CE024 Can SAEs Capture Neural Geometry? says a single sparse-autoencoder feature gives only a partial view of curved internal structure. Medium SE014
CE025 Goodfire says its manifold pipeline clusters sparse features to recover fuller geometric structure from internal representations. Medium SE014
CE026 Stochastic Parameter Decomposition moves interpretability into parameter space by learning which weight components can be removed without changing behavior. Medium SE017
CE027 Model diff amplification makes rare harmful behaviors 10 to 300 times more common in sampling, making them easier to detect. Medium SE016
CE028 Goodfire says model diff amplification can reveal post-training side effects after only a fraction of a training run. Medium SE016
CE029 Goodfire's eval-awareness study found naturally occurring verbalized eval awareness across all 19 benchmarks and 8 models it tested. Medium SE012
CE030 Goodfire says prompt rewrites reduced eval awareness by 40% and an unsupervised method reduced it by 75%, with safe-behavior rates also falling. Medium SE012
CE031 Paint With Ember uses a canvas that manipulates SDXL-Turbo internal activations instead of relying only on text prompts. Medium SE019
CE032 Goodfire's research surfaces and phylogeny work argue that internal geometry recapitulates structured concepts across language, image, and genomic models. Medium SE011, SE021
CE033 Goodfire's terms define the platform as software, APIs, tools, documentation, support, and services, and allow customers to bring models, files, datasets, code, and workflows into the platform. Medium SE022
CE034 Public terms tie commercial fees and overages to order forms and usage reports rather than to a public rate card. Medium SE022
CE035 The Pilot Agreement limits pilot use to internal evaluation and requires a separate commercial license after the evaluation period. Medium SE023
CE036 Goodfire's terms and pilot agreement both describe support, technical assistance, field engineering, research activities, and deliverables around the platform. High SE022, SE023
CE037 Goodfire's terms allow third-party products and permit access suspension for security, legal, operational, or overdue-account reasons. Medium SE022
CE038 Goodfire's terms say customers retain customer materials while Goodfire retains Goodfire IP and broad rights over usage data and licensed workflow data. Medium SE022
CE039 Goodfire announced SOC 2 Type II with no exceptions identified and a public SOC 3 summary. Medium SE008
CE040 Goodfire says its Mayo collaboration operates under rigorous data privacy protocols and Mayo Clinic governance frameworks. Medium SE027
CE041 NIST's AI RMF and generative-AI profile focus on embedding trustworthiness into AI design, development, use, and evaluation. Medium SE033
CE042 Gartner says GenAI total cost of ownership is often understated and that critical decision use cases require more robust and interpretable approaches. Medium SE034
CE043 Salesforce Ventures frames Goodfire as moving AI teams from guessing at behavior to measuring and shaping model intent and reasoning. Medium SE031
CE044 On Healthcare Tech argues that interpretability could become infrastructure for regulated health AI, but public commercialization evidence still looks early. Medium SE032
CE045 Public materials reviewed do not provide a public status page, self-serve API reference, or public deployment-count disclosure for Silico. Medium SE001, SE022, SE030
CE046 Goodfire's careers page, Stanford guest lectures, and fellowship program show active research-engineering recruiting and practitioner education despite a limited public open-source product surface. Medium SE024, SE025, SE026
CE047 Goodfire's public 2026 output is research-led and fast, with published releases on May 4, May 20, and May 21 covering eval awareness and neural geometry. Medium SE012, SE014, SE020
CE048 PR Newswire says Series B proceeds will fund next-generation product development and partnership scaling across AI agents and life sciences. Medium SE038
CE049 EVEE combines Evo 2 embeddings, lightweight probes, and frontier reasoning models to generate human-readable hypotheses about variant effects. Medium SE015
CE050 Goodfire's phylogeny work says Evo 2 encodes tree-of-life relationships as a curved manifold, reinforcing its model-to-human knowledge-transfer thesis. Medium SE021
CE051 Menlo and Lightspeed both describe Goodfire as an applied research lab translating mechanistic interpretability into productized tooling. Medium SE035, SE036
CE052 PYMNTS reports that Goodfire internally uses a model design environment and deploys that shared environment forward with customers. Medium SE037
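CE027 says model diff amplification makes rare harmful behaviors 10 to 300 times more common in sampling. A generic binomial calculation shows what that does to a detection budget. The sketch below does not implement Goodfire's amplification method. It assumes only independent samples and a per-sample behavior rate that is multiplied by exactly k. The base rate used in the usage note is hypothetical.

```python
import math


def samples_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one occurrence in n i.i.d. samples) >= confidence.

    Solves 1 - (1 - p)**n >= confidence for n.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if p == 1:
        return 1
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))


def amplified_samples_needed(p: float, k: float, confidence: float = 0.95) -> int:
    """Budget after a hypothetical k-fold increase in the per-sample rate, capped at 1."""
    return samples_needed(min(1.0, k * p), confidence)


if __name__ == "__main__":
    base_rate = 1e-4  # hypothetical: the behavior appears once per 10,000 samples
    for k in (1, 10, 300):
        print(f"k={k}: {amplified_samples_needed(base_rate, k):,} samples for 95% detection")
```

At a hypothetical base rate of 1 in 10,000, a 95% detection budget is roughly 30,000 samples. A 10x amplification cuts it to roughly 3,000, and a 300x amplification cuts it to about 100. For small p, ln(1 - kp) is approximately -kp, so the budget shrinks by close to a factor of k.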
CU001 Goodfire publicly says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. High SU001, SU014
CU002 Goodfire positions Silico and related services for organizations training or fine-tuning foundation models across architectures and modalities. High SU001, SU002
CU003 Goodfire says it engages deeply and selectively with teams building high-stakes or frontier systems where understanding and control are essential. Medium SU014
CU004 Public product and contact copy imply buyers are research, platform, or product owners while day-to-day users are research scientists and ML engineers. Medium SU001, SU002, SU003, SU004, SU005
CU005 Goodfire's named proof set is concentrated in life sciences, AI-agent infrastructure, and materials discovery rather than a wide range of end markets. Medium SU003, SU006, SU008, SU010, SU011, SU013
CU006 Reviewed public sources do not disclose Goodfire's customer count. High SU001, SU014, SU019, SU020
CU007 Reviewed public sources do not disclose Goodfire's segment-level revenue or ARR by customer type. High SU014, SU020, SU022
CU008 The broad Fortune 500 adoption claim is not backed by a public list of named enterprise customers or outcomes. High SU001, SU014
CU009 Goodfire's public sales surface routes prospects to request access or contact the company rather than to self-serve sign-up. High SU001, SU019
CU010 Prima Mente partnered with Goodfire to understand its Pleiades epigenomics model. Medium SU006, SU007
CU011 Goodfire says its researchers embedded in Prima Mente's team while building a biomarker-discovery pipeline. Medium SU006
CU012 Goodfire and Prima Mente identified a novel class of blood-borne biomarkers for Alzheimer's detection. Medium SU006, SU007, SU022
CU013 Prima Mente's public outcome remains pre-validation because the biomarkers are still undergoing experimental validation and a publication is forthcoming. Medium SU006, SU007
CU014 Goodfire collaborated with Arc Institute to interpret Evo 2, Arc's genomic foundation model. Medium SU010, SU025
CU015 The initial Arc Institute disclosure described feature discovery and steering work that was still in its early stages rather than a mature production deployment. Medium SU010
CU016 By March 2026, Goodfire's Evo 2 interpretability work had been updated to note Nature publication, increasing scientific credibility of the Arc partnership. Medium SU010, SU020
CU017 Goodfire says its Mayo Clinic collaboration combines interpretability work with Mayo's medical AI team and established data-governance frameworks. Medium SU008
CU018 Public Mayo materials frame the work as genomic research and responsible-AI validation rather than routine clinical deployment. Medium SU008, SU009
CU019 Goodfire's EVEE work is described as part of its ongoing collaboration with Mayo Clinic and is still undergoing peer review. Medium SU009
CU020 Goodfire says EVEE achieves 0.997 AUROC on 839k ClinVar variants and provides predictions and explanations for all 4.2 million ClinVar variants. Medium SU003, SU009
CU021 Goodfire says EVEE outputs are computational predictions rather than diagnoses and require further expert review and validation. Medium SU009
CU022 Goodfire partnered with Rakuten on PII detection for multilingual AI-agent messages in a production-critical enterprise setting. Medium SU013
CU023 The Rakuten deployment required synthetic-to-real generalization, multilingual English and Japanese coverage, lightweight inference, and very high recall. Medium SU013
CU024 Goodfire says Rakuten deployed SAE probes and describes the system as the first known enterprise application of SAEs for language-model guardrails. Medium SU013
CU025 Among reviewed sources, Rakuten is the clearest public evidence of a production Goodfire deployment. Medium SU013, SU019
CU026 Goodfire and Radical AI publicly announced a partnership to apply interpretability to inverse materials design. Medium SU011, SU012
CU027 Goodfire says its self-correcting-search work with Radical AI improved successful candidates by about 27% and generated about 30% more SUN materials in the target range. Medium SU012
CU028 Radical AI's public partnership disclosure says more research directions and outcomes will be shared later, leaving commercialization maturity unclear. Medium SU011, SU012
CU029 Goodfire says it deploys its model design environment forward with customers in a shared environment. High SU014, SU026
CU030 MIT Technology Review reports that Silico pricing is determined case by case and Goodfire declined to provide specific pricing. Medium SU019
CU031 MIT Technology Review says Silico could let smaller firms and research teams build or adapt open-source models without hiring interpretability researchers. Medium SU019
CU032 Goodfire's public positioning is selective and high-touch rather than high-volume self-serve SaaS. High SU001, SU014, SU019
CU033 Goodfire's Series B materials say the new funding will scale partnerships across AI agents and life sciences. Medium SU021, SU022, SU023, SU026
CU034 Salesforce Ventures frames Goodfire around enterprise AI ROI, reliability, and control problems. Medium SU017, SU018
CU035 The public proof set spans life sciences, AI-agent operations, materials science, and general frontier-model design. High SU003, SU011, SU013, SU014
CU036 Goodfire's public blog history shows named collaboration proof surfacing across 2025 and 2026 rather than through a single isolated announcement. High SU008, SU010, SU011, SU013, SU016
CU037 Goodfire's public materials distinguish broad segment claims from a much smaller set of named collaborators. High SU001, SU003, SU006, SU008, SU010, SU011, SU013
CU038 No reviewed public source disclosed NRR, GRR, churn, renewal rate, or true retention cohorts for Goodfire. High SU001, SU014, SU019, SU020
CU039 No reviewed public source disclosed contract length, commercial expansion metrics, or top-customer concentration for Goodfire. High SU001, SU014, SU020, SU022
CU040 Arc Institute and Mayo Clinic each have later public follow-on evidence after their initial collaboration announcements, indicating relationship continuity but not proving paid renewal. Medium SU008, SU009, SU010
CU041 The disclosed reference set is concentrated in a handful of named collaborators and is especially weighted toward life-sciences programs. Medium SU003, SU006, SU008, SU009, SU010, SU020
CU042 The broad Fortune 500 claim remains materially weaker than the named proof set because no enterprise names or outcomes are publicly disclosed. Medium SU001, SU014
CU043 MIT Technology Review quoted Leonard Bereska saying Goodfire is adding “precision to the alchemy,” a substantive critique of how principled the product really is. Medium SU019
CU044 On Healthcare Tech argued that Goodfire's $1.25 billion valuation looks aggressive for a research-first company with relatively early commercial traction. Medium SU020
CU045 On Healthcare Tech argued that the public valuation case relies more on platform option value than on disclosed revenue or customer metrics. Medium SU020
CU046 Several scientific customer outcomes remain partly hypothesis-stage because Prima Mente's biomarkers are still under validation and EVEE is still undergoing peer review. Medium SU006, SU007, SU009
CU047 Silico is most naturally usable where customers can inspect model internals, which may bias near-term adoption toward open-model teams and research labs over closed-model users. Medium SU002, SU019
CU048 Goodfire's continued publication of frontier interpretability results supports a customer narrative built on research credibility as much as on packaged software. Medium SU034
CU049 Mayo Clinic is a major medical institution, so Goodfire's disclosed collaboration carries meaningful signal for regulated-domain customer credibility. High SU035, SU007
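CU020 cites a 0.997 AUROC for EVEE on 839k ClinVar variants. AUROC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that pairwise definition. It assumes pathogenic variants are labeled 1 and benign variants are labeled 0. The scores and labels are hypothetical, and this is not EVEE's scoring pipeline.

```python
from itertools import product


def auroc(scores: list[float], labels: list[int]) -> float:
    """Pairwise AUROC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores: 1 = pathogenic, 0 = benign.
    print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

In the toy example, three of the four positive-negative pairs are ordered correctly, so the AUROC is 0.75. On this reading, 0.997 means a pathogenic variant outranks a benign one about 99.7% of the time on the evaluated set. The metric says nothing about calibration or clinical validity, which is consistent with CU021. The pairwise loop is quadratic. At ClinVar scale, a sort-based routine such as scikit-learn's `roc_auc_score` is the practical choice.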
CR001 Goodfire positions itself as a research company using interpretability to understand, learn from, and design AI systems rather than relying on scale alone. Medium SR006, SR022
CR002 Goodfire publicly argues that today's dominant AI-development process still cannot meaningfully understand, debug, or shape what models learn. Medium SR005, SR006
CR003 Goodfire says current model training is still a costly guess-and-check process and presents intentional design as an attempt to move from open-loop tweaking toward closed-loop control. Medium SR005
CR004 Goodfire also states that its techniques are early, the science is incomplete, and the hardest interpretability problems remain unsolved. Medium SR005, SR023
CR005 MIT Technology Review described Silico as potentially useful but quoted an external mechanistic-interpretability researcher saying Goodfire is adding precision to alchemy rather than fully turning model building into engineering. Medium SR013
CR006 Goodfire markets Silico as a model-design environment that can debug behavior, remove confounders, and diagnose failures before production, but access is still request-based rather than self-serve. Medium SR009
CR007 Goodfire claims its platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs, but it does not disclose how many of those users are production customers or what they pay. Medium SR008
CR008 Under the MSA and TOS, Goodfire only commits to support, service levels, implementation help, training, or professional services if those items are expressly defined in an order form. Medium SR001, SR003
CR009 The TOS says pilot, beta, trial, evaluation, or pre-release access may be modified, suspended, or discontinued at any time and, absent an order form, carries no service levels, support commitments, security commitments, or availability commitments. Medium SR003
CR010 Goodfire's default legal terms disclaim warranties that the platform or services will be uninterrupted, secure, accurate, complete, or error free. Medium SR001, SR003
CR011 Goodfire's aggregate liability is capped at fees paid in the prior twelve months under the MSA and TOS, while pilot-agreement liability is capped at pilot fees. Medium SR001, SR002, SR003
CR012 The TOS defines Usage Data broadly to include usage volumes, clickstream, logs, performance data, and error data, and classifies that Usage Data as Goodfire IP. Medium SR003
CR013 The TOS gives Goodfire a perpetual, irrevocable, sublicensable license to use Workflow Data to provide, improve, evaluate, train, and commercialize the platform, subject to promises not to identify the customer or reveal confidential information. Medium SR003
CR014 The MSA's feedback clause assigns customer feedback and related know-how to Goodfire without attribution or compensation. Medium SR001
CR015 Goodfire's public contracts require compliance with U.S. export and re-export restrictions and any necessary government approvals for cross-border use of the service or customer materials. Medium SR001, SR003
CR016 Goodfire says it is SOC 2 Type II compliant and directs customers to a trust portal for SOC 3 materials and full-report requests. Medium SR004
CR017 Goodfire says its Mayo Clinic collaboration operates under rigorous data-privacy protocols and Mayo's established data-governance frameworks. Medium SR011
CR018 Goodfire's Prima Mente case study says the customer needed to narrow model signals for experimental validation and FDA-approval progress, and that Goodfire identified a novel biomarker class through interpretability work. Medium SR010
CR019 On Healthcare argues Goodfire's 2026 valuation is aggressive relative to a roughly 51-person organization that still appears early in commercial traction and is funding green-field research alongside product work. Medium SR014
CR020 Goodfire's February 2026 Series B valued the company at $1.25 billion and brought total disclosed funding to just over $200 million. Medium SR014, SR015, SR021, SR022
CR021 On Healthcare reports Goodfire had about 51 employees as of January 2026. Medium SR014
CR022 Goodfire's careers page explicitly recruits people who thrive in fast-paced environments, signaling that the company is still building organizational depth while scaling. Medium SR007
CR023 On Healthcare says Eric Ho has argued there are probably fewer than 150 full-time interpretability researchers in the world, implying a tight labor pool for the company's core discipline. Medium SR014
CR024 MIT Technology Review says Silico pricing is determined case by case based on customer requirements and that Goodfire declined to disclose pricing specifics. Medium SR013
CR025 On Healthcare argues Goodfire is not yet a predictable SaaS business with clearly disclosed recurring revenue mechanics. Medium SR014
CR026 Goodfire's Prima Mente case study says Goodfire researchers embedded with the customer team and built the biomarker-discovery pipeline jointly. Medium SR010
CR027 Goodfire's TOS contemplates support, technical assistance, field engineering, research activities, collaboration activities, and deliverables alongside platform access. Medium SR003
CR028 Goodfire's May 2026 eval-awareness paper says verbalized eval awareness appeared across all 19 benchmarks and 8 models tested, with 515 manually verified instances. Medium SR024
CR029 The same paper says eval awareness correlates with safer behavior, so benchmarks that do not account for it can systematically overestimate model alignment. Medium SR024
CR030 Goodfire reports that rewriting prompts cut verbalized eval awareness by 40 percent, that an unsupervised paraphrasing method cut it by 75 percent, and that refusal rates also dropped as awareness fell. Medium SR024
CR031 Goodfire's Reasoning Theater work says chain-of-thought text can be performative; on easier tasks models often know the answer early and generate superfluous reasoning that lags internal state. Medium SR025
CR032 Reasoning Theater also reports that probe-based early exit saved 68 percent of MMLU tokens and 33 percent of GPQA-Diamond tokens for DeepSeek-R1 while retaining more than 95 percent of baseline accuracy. Medium SR025
CR033 Goodfire's model-diff-amplification post says harmful or backdoored behaviors are often a needle-in-a-haystack problem that standard evaluations miss until after deployment. Medium SR030
CR034 Model diff amplification made harmful outputs 10x to 300x more frequent in testing and made a sleeper-agent backdoor about 100x easier to surface, but Goodfire says the method is intended only for detection and that the amplified rates overstate real prevalence. Medium SR030
CR035 Goodfire's memorization-via-loss-curvature work says language models memorize substantial portions of training data and that many questions about how memories are stored or localized remain unresolved. Medium SR027
CR036 The same memorization work says suppressing memorization can preserve logical reasoning but degrade arithmetic and closed-book factual recall, showing that edits can trade off reliability across tasks. Medium SR027
CR037 Goodfire's SPD post argues that sparse autoencoders do not explain feature geometry and do not converge to a single true decomposition as they scale, while acknowledging that SPD itself still has non-trivial sensitivity and has only been validated on toy models. Medium SR026
CR038 Goodfire's neural-geometry post says a single SAE direction gives only a partial view of curved structure, so interpreting features one by one misses the global picture. Medium SR028
CR039 Goodfire's manifold-steering post says linear steering often mismatches internal geometry and can produce noisy, off-target effects. Medium SR029
CR040 Goodfire's scientific-model interpretability work argues that interpretability can improve reliability and transparency in downstream applications, especially in clinical domains, while noting that extracting mechanisms from complex models remains both challenging and valuable. Medium SR031
CR041 MIT Technology Review says interpretability tools like Silico could be essential for safety-critical applications in healthcare and finance, increasing the burden on Goodfire to prove deployment-grade trustworthiness rather than just interesting demos. Medium SR013
CR042 NIST says the 2024 generative-AI profile and the 2026 critical-infrastructure concept note are intended to guide organizations toward concrete AI risk-management practices and trustworthy-AI controls. Medium SR016
CR043 Gartner says enterprise GenAI outcomes depend heavily on data quality, governance, change management, realistic expectations, and talent availability. Medium SR017
CR044 Datadog markets a production stack that combines prompt testing, evaluations, tracing, monitoring, sensitive-data scanning, and enterprise controls for AI systems. Medium SR019
CR045 LangSmith markets observability, monitoring, hallucination debugging, and self-hosted or BYOC deployment options so sensitive traces can stay inside the customer environment. Medium SR020
CR046 On Healthcare and Goodfire's Mayo materials both frame healthcare deployment as blocked by the gap between model predictions and biological understanding, positioning interpretability as a compliance and validation bridge rather than only a developer tool. Medium SR011, SR014
CR047 Goodfire's public proof set is concentrated in named collaborations and case studies (Prima Mente, Mayo Clinic, Radical AI) plus unnamed enterprise claims, rather than a broad list of disclosed production references. Medium SR008, SR010, SR011, SR012
CR048 The Radical AI partnership announcement says details on research directions and outcomes will be shared later, so one of Goodfire's flagship scientific partnerships is still forward-looking in public evidence. Medium SR012
CR049 PwC says healthcare AI adoption is slower than in other sectors and emphasizes risk-controlled adoption, which raises go-to-market friction for vendors selling into regulated clinical workflows. Medium SR018
CR050 Adjacent observability vendors already package evaluation, tracing, monitoring, and governance into production platforms, so Goodfire has to prove that interpretability delivers a distinct control layer rather than just another form of observability. Medium SR019, SR020
CR051 Salesforce Ventures argues enterprise AI buyers are increasingly constrained by unclear ROI and by an inability to steer models reliably and consistently, framing control and reliability as buyer pain rather than purely research interests. Medium SR032
CR052 Lightspeed framed Goodfire as critical infrastructure for explainable and mission-critical AI, explicitly tying future demand to regulation and to the need to productize interpretability for enterprises rather than only researchers. Medium SR033
CR053 Investing.com reported that Goodfire works with clients including Microsoft, Mayo Clinic, and Arc Institute and plans to use new capital for model improvement, compute, and hiring, which underscores both the value of its partner relationships and the intensity of its execution demands. Medium SR034
CR054 Adjacent observability vendors already market tracing, monitoring, and workflow-debugging for AI agents, increasing substitution risk around parts of Goodfire's budget. Medium SR035
CR055 Datadog now packages agent observability inside a broader enterprise monitoring suite, which can pull AI-operations budget toward incumbent platforms. Medium SR036
CR056 Langfuse positions itself as an observability layer with open-source adoption, reinforcing price and workflow competition for AI development teams. Medium SR037
CR057 Langfuse publishes transparent pricing, which increases buyer expectations for standardized software packaging that Goodfire has not yet publicly matched. Medium SR038
CR058 LangSmith markets observability for AI agents and LLM applications, underscoring that adjacent tooling vendors can compete for the same developers and platform owners. Medium SR039
CR059 Weights joining OpenAI highlights consolidation risk in AI tooling, where platform vendors can absorb adjacent products before smaller specialists fully scale. Medium SR040
CR060 Mechanistic interpretability results still depend on advancing research rather than finished engineering playbooks. Medium SR041
CR061 Goodfire continues to publish foundational work on latent computation, underscoring that part of its edge still resides in experimental research rather than commoditized software. Medium SR042
CR062 Goodfire's ongoing publication cadence suggests platform differentiation remains tied to research velocity, which creates key-person and execution dependence if commercialization lags. Medium SR043
CR063 Goodfire's valuation and product narrative still depend on turning novel neural-geometry research into dependable commercial workflows, which keeps execution risk elevated. Medium SR044
CV001 Goodfire announced a $150 million Series B at a $1.25 billion valuation in February 2026. High SV001, SV002, SV012
CV002 B Capital led the Series B and the syndicate included Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, and Eric Schmidt. Medium SV001, SV002, SV003
CV003 Goodfire said the Series B came less than a year after its Series A. Medium SV002, SV003, SV006
CV004 Goodfire announced a $50 million Series A in April 2025 led by Menlo Ventures with Anthropic participating. Medium SV006, SV007
CV005 Public company and press-release materials imply that Goodfire has raised more than $200 million in total capital after the Series B. High SV001, SV002, SV006
CV006 Official and SEC materials identify Goodfire as a Delaware company founded in 2023 and based in San Francisco. High SV028, SV029, SV030
CV007 Goodfire describes itself as a public benefit corporation focused on interpretability to understand, learn from, and design AI systems. Medium SV002, SV030
CV008 The April 2025 Form D shows roughly $52.0 million sold against a roughly $52.1 million offering tied to the Series A financing. Medium SV028, SV006
CV009 The February 2026 Form D lists Yan-David Erlich among related persons and shows a $161.7 million offering amount tied to the Series B-era filing. Medium SV029, SV002
CV010 Goodfire positions Ember as its flagship model design environment and interpretability platform. High SV006, SV007, SV010
CV011 Goodfire says Ember is meant to give programmable access to internal model features so users can inspect, edit, and retrain behavior more precisely than black-box methods. Medium SV006, SV007, SV009
CV012 Goodfire says interpretability-guided training reduced hallucinations in a language model by roughly half. Medium SV001, SV002, SV010
CV013 Goodfire cites collaborators including Arc Institute, Mayo Clinic, Prima Mente, and Microsoft. Medium SV001, SV010
CV014 Goodfire says interpretability work surfaced a novel class of Alzheimer's biomarkers from Prima Mente's epigenetic model. Medium SV001, SV002, SV010
CV015 Goodfire announced SOC 2 Type II compliance with no exceptions identified in February 2026. Medium SV031
CV016 Goodfire continued publishing 2026 research across neural geometry, steering, parameter decomposition, and pooling methods. Medium SV022, SV024, SV025, SV027, SV031
CV017 Goodfire's Llama 3 research preview says it trained sparse autoencoders on Llama-3-8B and used causal feature interventions to steer outputs while minimizing degradation. Medium SV023
CV018 Goodfire's Geometric Calculator page says Llama 3.1 8B uses a general-purpose addition module that handles months, days, and arithmetic via circular representations. Medium SV024
CV019 Goodfire's Covariance Pooling page argues second-moment pooling outperforms mean pooling on downstream genomic tasks. Medium SV025
CV020 Goodfire's Painting With Concepts page shows interpretability tooling applied to SDXL-Turbo image generation, indicating modality expansion beyond text. Medium SV026
CV021 Goodfire's VPD explainer says the company decomposed a 67M-parameter model into simple pieces and used that structure to edit behavior without training. Medium SV027
CV022 Goodfire's product wedge sits deeper in the stack than observability vendors because it aims to intervene on model internals rather than only trace outputs or enforce guardrails. Medium SV009, SV019, SV020, SV021
CV023 Arize Phoenix positions itself around tracing, evals, and agent observability rather than model-internal design. Medium SV019
CV024 Fiddler positions its product around observability, guardrails, and governance for agents and predictive AI rather than model-internal representation editing. Medium SV020
CV025 LangSmith positions its product around tracing, monitoring, and clustering for agent behavior rather than model-internal steering. Medium SV021
CV026 Gartner says generative AI entered the 2025 trough of disillusionment and that ROI depends on governance, change management, and full cost accounting. Medium SV017
CV027 NIST says the AI RMF and its generative AI profile exist to help organizations manage trustworthiness and AI risk across design, deployment, and evaluation. Medium SV018
CV028 The On Healthcare analysis says Goodfire raised $209 million across seed, Series A, and Series B and estimated the team at roughly 51 employees as of January 2026. Medium SV010
CV029 The On Healthcare analysis argues that the $1.25 billion valuation is aggressive for a research-first company with relatively early commercial traction. Medium SV010
CV030 TechCrunch's 2026 mega-round list places Goodfire among U.S. AI companies that raised $100 million or more in early 2026 at a $1.25 billion valuation. Medium SV012
CV031 TechCrunch reported that Eric Schmidt's Hillspire invested directly in Goodfire as family offices and private wealth moved earlier into AI deals. Medium SV011
CV032 Anysphere was valued at $9.9 billion after surpassing $500 million in ARR. Medium SV013
CV033 Harvey was reportedly raising at $11 billion after hitting a $190 million ARR rate by the end of 2025. Medium SV014
CV034 Glean reached a $7.2 billion valuation after surpassing $100 million in ARR. Medium SV015
CV035 Anthropic was valued at $350 billion in April 2026 with up to $40 billion of Google investment and large compute commitments. Medium SV016
CV036 Unlike Anysphere, Harvey, and Glean, Goodfire's public round materials do not disclose revenue or ARR, so a comparable revenue multiple cannot be responsibly calculated from public evidence. Medium SV001, SV002, SV010, SV013, SV014, SV015
CV037 The current mark therefore looks like strategic option value on category leadership, research talent, and future platform commercialization rather than a fundamentals-backed software multiple. Medium SV001, SV009, SV010, SV017
CV038 Goodfire's strategic investor mix (Anthropic in the Series A; Salesforce Ventures and Eric Schmidt in the Series B) supports the view that technically sophisticated buyers think interpretability will matter commercially. Medium SV006, SV009, SV011, SV002
CV039 Goodfire's market relevance is helped by enterprise pressure for explainability, governance, and reliable ROI in AI deployments. Medium SV009, SV017, SV018
CV040 Public evidence still does not disclose customer count, pricing, contract structure, retention, gross margin, or software-versus-services mix. Medium SV001, SV002, SV010
CV041 A plausible bull case requires proof that Goodfire is converting research credibility into repeatable software revenue and durable enterprise adoption. Medium SV009, SV017, SV019
CV042 Without that proof, a base case should haircut the last round and anchor below $1.25 billion because market demand is real but commercial evidence is incomplete. Medium SV010, SV017, SV013, SV014, SV015
CV043 A reasonable public-evidence bear case is a sub-$650 million outcome if commercialization stays bespoke, competitors absorb budget, or private AI multiples compress. Medium SV010, SV019, SV020, SV021, SV013
CV044 A reasonable public-evidence base case is roughly $800 million to $1.1 billion, implying the last round already prices in part of the bull thesis. Medium SV010, SV013, SV014, SV015, SV017
CV045 A reasonable public-evidence bull case is roughly $1.25 billion to $1.85 billion, which requires disclosed software revenue, strong design-partner conversion, and continued research and enterprise validation. Medium SV001, SV009, SV010, SV015
CV046 Given stage and disclosure opacity, another private round or strategic acquisition is a more plausible near-to-mid-term path than a public listing. Medium SV010, SV012, SV015, SV016
CV047 The most supportable current recommendation is research-more rather than buy because company-quality evidence exceeds pricing evidence. Medium SV001, SV010, SV017, SV018
CV048 The most supportable valuation stance is stretched because the $1.25 billion round sits at the lower bound of the bull case and above the $800 million to $1.1 billion base-case range. Medium SV010, SV013, SV014, SV015, SV017
CV049 Entry discipline should require NDA-gated disclosure of ARR or revenue, pricing, top-customer concentration, gross margin, and the post-Series-B preference stack before underwriting above the base-case range. Medium SV010, SV017, SV018
CV050 Thesis-break triggers include failure to disclose recurring revenue quality, inability to convert partners into repeatable platform customers, or evidence that observability vendors can satisfy budgets without Goodfire's deeper tooling. Medium SV009, SV019, SV020, SV021, SV026
CV051 Goodfire's valuation case depends partly on owning distinctive interpretability research that competitors may not easily replicate. Medium SV032
CV052 Goodfire continues to invest in foundational interpretability methods, which supports upside optionality but also means commercial value still depends on converting research into repeatable product adoption. Medium SV033
CV053 Goodfire's upside case still depends on scaling its interpretability research edge into a durable commercial moat before adjacent tooling categories commoditize around it. Medium SV034
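CV036 notes that a Goodfire revenue multiple cannot be calculated from public evidence. A minimal sketch can still invert the question. Using only the peer figures in CV032 to CV034 and the scenario ranges in CV043 to CV045, it computes each peer's ARR multiple and the ARR Goodfire would hypothetically need to carry its $1.25 billion mark at that multiple. It assumes peer multiples transfer to Goodfire, which is exactly the assumption CV036 cautions against. Because the peer ARR figures are thresholds the companies had surpassed (and Harvey's mark is a reported raise), the multiples are ceilings and the implied ARR values are floors. The names PEERS, comp_table, and series_b_position are hypothetical illustrations, not part of any disclosed model.

```python
"""Back-of-envelope comparables check using only figures stated in CV032-CV045."""

# Peer marks from CV032-CV034: name -> (valuation, stated ARR), both in USD billions.
# ARR figures are "surpassed" or run-rate thresholds, so each multiple is an upper bound.
# Harvey's $11B is a reported raise, not a closed round (CV033).
PEERS = {
    "Anysphere": (9.9, 0.500),
    "Harvey": (11.0, 0.190),
    "Glean": (7.2, 0.100),
}

SERIES_B_MARK = 1.25      # Goodfire post-money, USD billions (CV001)
BEAR_CEILING = 0.65       # bear case is "sub-$650 million" (CV043)
BASE_CASE = (0.80, 1.10)  # base-case range, USD billions (CV044)
BULL_CASE = (1.25, 1.85)  # bull-case range, USD billions (CV045)


def comp_table(mark: float = SERIES_B_MARK) -> dict[str, tuple[float, float]]:
    """Return name -> (ARR multiple, ARR in USD millions implied for `mark`).

    Hypothetical helper. Since peer multiples are ceilings, the implied ARR
    values are floors on what Goodfire would need at those multiples.
    """
    table = {}
    for name, (valuation, arr) in PEERS.items():
        multiple = valuation / arr
        implied_arr_musd = mark / multiple * 1000  # billions -> millions
        table[name] = (multiple, implied_arr_musd)
    return table


def series_b_position(mark: float = SERIES_B_MARK) -> str:
    """Place a valuation mark (USD billions) against the report's scenario ranges."""
    if mark < BEAR_CEILING:
        return "bear case"
    if BASE_CASE[0] <= mark <= BASE_CASE[1]:
        return "base case"
    if mark == BULL_CASE[0]:
        return "bull-case floor"
    if BULL_CASE[0] < mark <= BULL_CASE[1]:
        return "bull case"
    return "outside stated ranges"


if __name__ == "__main__":
    for name, (multiple, need) in comp_table().items():
        print(f"{name}: <= {multiple:.1f}x ARR; Goodfire would need >= ~${need:.0f}M ARR")
    print("Series B mark:", series_b_position())
```

Run as written, it reports peer multiple ceilings of roughly 20x, 58x, and 72x, implying Goodfire ARR floors of roughly $63 million, $22 million, and $17 million, and places the Series B mark at the bull-case floor. The spread across peers is itself a reason to treat any single multiple cautiously; the sketch only shows what NDA-gated ARR disclosure (CV049) would need to clear under each peer's pricing.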
Sources
ID Publisher Title Quote
SO001 Goodfire Goodfire homepage Goodfire is a research company using interpretability to understand, learn from, and design AI systems.
SO002 Goodfire Goodfire company page
SO003 Goodfire Goodfire careers All roles are full-time, in person five days a week at our San Francisco, Telegraph Hill office.
SO004 Goodfire Our Series B Today, we’re excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SO005 Goodfire Intentionally Designing the Future of AI At Goodfire, we’re developing the science and technology that lets us steer model training — a process we’re calling intentional design.
SO006 Goodfire On optimism for interpretability At Goodfire, we believe we can engineer frontier AI systems that are understandable.
SO007 Goodfire Silico The first platform for intentional model design.
SO008 Goodfire Life Sciences We partner with companies training foundation models across architectures and modalities to interpret their models.
SO009 Goodfire Goodfire Announces Collaboration to Advance Genomic Medicine with AI Interpretability Mayo Clinic has a financial interest in the technology referenced in this press release.
SO010 Goodfire Goodfire contact Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SO011 Goodfire Prima Mente customer story Goodfire’s platform for in silico science decoded their model, identifying a novel class of biomarkers for Alzheimer’s detection.
SO012 Goodfire Fellowship Fall 25 We’re excited to announce that we’ll be bringing on several Research Fellows and Research Engineering Fellows this fall for our fellowship program.
SO013 Goodfire AP293 guest lectures 25 We gave three guest lectures in Surya Ganguli’s course on interpretability at Stanford last fall.
SO014 PR Newswire AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability Today, Goodfire—the AI research lab using interpretability to understand, learn from, and design models—announced a $150 million Series B funding round at a $1.25 billion valuation.
SO015 Yahoo Finance AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SO016 Pulse 2.0 Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab The company has raised more than $200 million in total backing from a mix of venture firms and individual investors.
SO017 Tech Funding News Goodfire raises $150M Series B at $1.25B valuation for interpretability AI
SO018 PR Newswire Goodfire raises $50M Series A to advance AI interpretability research This funding, which comes less than one year after its founding, will support the expansion of Goodfire’s research initiatives and the development of the company’s flagship interpretability platform, Ember.
SO019 Yahoo Finance Goodfire raises $50M Series A to advance AI interpretability research
SO020 Goodfire Announcing our $50M Series A Today, we’re excited to announce a $50 million Series A funding round led by Menlo Ventures.
SO021 Menlo Ventures Leading Goodfire’s $50M Series A to interpret how AI models think
SO022 Lightspeed Venture Partners Goodfire: Building Interpretable AI We at Lightspeed are thrilled to lead their $7M seed round.
SO023 Lightspeed Venture Partners Goodfire company profile
SO024 Salesforce Ventures Welcome, Goodfire Goodfire was founded by Eric Ho, Daniel Balsam, and Thomas McGrath.
SO025 Salesforce Ventures Goodfire company profile
SO026 VCNewsDaily Goodfire Venture Capital Funding
SO027 MIT Technology Review This startup’s new mechanistic interpretability tool lets you debug LLMs In reality, they are adding precision to the alchemy.
SO028 OnHealthcare Goodfire AI and the billion-dollar interpretability bet The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SO029 PYMNTS Goodfire raises $150 million to better understand AI
SO030 LSVP Goodfire company page
SM001 Goodfire Goodfire Understand the scientific foundations of neural networks so that we can intentionally design AI.
SM002 Goodfire Company | Goodfire We engage deeply and selectively, partnering with teams building high-stakes or frontier systems where understanding and control are essential.
SM003 Goodfire Silico | Goodfire A model design environment.
SM004 Goodfire Language | Goodfire 58% reduction in hallucinations by using features as rewards.
SM005 Goodfire Life Sciences | Goodfire Interpretability surfaced fragment length as the dominant predictive signal.
SM006 Goodfire Robotics & Vision | Goodfire Catch generalization failure before deployment.
SM007 Goodfire Our Series B | Goodfire We have built a model design environment ... to improve model behavior, and monitor them in production.
SM008 Goodfire Intentional Design | Goodfire Intentional design will be an advance in model creation similar to the difference between selective breeding and genetic engineering.
SM009 Goodfire Feature Steering for Reliable and Expressive AI Engineering Feature steering works well with fine-tuned models but also often makes fine-tuning unnecessary.
SM010 Goodfire Mayo Clinic Collaboration | Goodfire This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SM011 Goodfire Manifold Steering | Goodfire Research Representation steering ... promises lightweight, adaptable, and granular control of neural networks.
SM012 Goodfire Interpreting Evo 2 | Goodfire Research We discovered a wide range of features corresponding to sophisticated biological concepts.
SM013 Goodfire Interpreting LM Parameters | Goodfire Research This is not just a theoretical issue. It prevents us from achieving practical engineering goals.
SM014 Goodfire Pilot Agreement | Goodfire Customer will be allowed to test the Software and receive Services, with the aim of evaluating Goodfire's technology and considering a future long-term commercial relationship.
SM015 Goodfire / Prima Mente Prima Mente Customer Story | Goodfire Goodfire's interpretability platform ... turned their foundation model into an engine for biomarker discovery.
SM016 Gartner Generative AI | Gartner GenAI enters the Trough of Disillusionment on the 2025 Hype Cycle for Artificial Intelligence.
SM017 PwC AI Jobs Barometer | PwC Workers with AI skills command a 56% wage premium.
SM018 NIST AI Risk Management Framework | NIST AI Risk Management Framework.
SM019 Arize Phoenix | Arize The open-source platform for agent development and evaluation.
SM020 Arize Pricing | Arize AX Pro ... $50 per month.
SM021 Fiddler AI Observability | Fiddler Gain Complete Visibility from Development to Production.
SM022 Fiddler Pricing | Fiddler $0.002 per trace.
SM023 Datadog LLM Observability | Datadog Test prompt, model, and tool changes against real production data before rollout.
SM024 LangChain LangSmith | LangChain LangSmith Observability gives you complete visibility into agent behavior.
SM025 Langfuse Langfuse Langfuse brings observability, prompts, evals, experiments, and human annotation into one connected workflow.
SM026 Langfuse Pricing | Langfuse Enterprise ... $2499/month.
SM027 Patronus AI Patronus AI Evaluate agent effectiveness in tip-of-the-tongue moments.
SM028 Arthur Arthur Gain visibility and reliability of your model through continuous evals.
SM029 Humanloop Pricing | Humanloop Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps.
SM030 MIT Technology Review This startup’s new mechanistic interpretability tool lets you debug LLMs In reality, they are adding precision to the alchemy.
SP001 Goodfire Silico | Goodfire The first platform for intentional model design.
SP002 Goodfire Language | Goodfire Predict how your model will fail before deployment, not after.
SP003 Goodfire Life Sciences | Goodfire Trace predictive signal through interpretable features to confirm whether predictions rely on real biological structure or dataset artifacts and spurious correlations.
SP004 Goodfire Robotics & Vision | Goodfire Evaluate whether your model has learned real physical structure directly from the latent space, before generating a single frame.
SP005 Goodfire Feature steering for reliable and expressive AI engineering AI engineers often ask us how feature steering differs from prompting or fine-tuning.
SP006 Goodfire Our Series B Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SP007 MIT Technology Review This startup's new mechanistic interpretability tool lets you debug LLMs Goodfire is one of a small handful of companies, including industry leaders Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability.
SP008 On Healthcare Tech Goodfire AI and the billion-dollar black box The valuation jump from wherever it was at Series A to $1.25B at Series B is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SP009 Goodfire Research Probe-based data attribution Filtering out the data flagged by our probe reduces the harmful behavior by 63% without compromising general performance.
SP010 Goodfire Research Rakuten: SAE probes for PII detection We detail one of the first uses of sparse autoencoders (SAEs) with a production AI model - using SAE probes to detect personally identifiable information for Rakuten AI agents.
SP011 Goodfire Research Understanding and steering Llama 3 We're releasing preview.goodfire.ai, a desktop interface to help you understand and steer Llama 3's behavior.
SP012 Goodfire Research VPD explainer We tried this and were able to make a precise and predictable change to the model's behaviour by directly editing the subcomponents, with no training required.
SP013 Goodfire Research Self-correcting search We were able to improve generation by giving a diffusion model a feedback loop from its own internals, resulting in ~30% more viable candidate materials in a target range.
SP014 Goodfire Research Reasoning theater Chain-of-thought reasoning is not always faithful to the model's internal computations.
SP015 Arize Phoenix The open-source platform for agent development and evaluation.
SP016 Arize Pricing | Arize AX Pro ... $50 per month.
SP017 Fiddler AI AI Observability | Fiddler AI Gain unified visibility, context, and control across agents and predictive applications.
SP018 Fiddler AI Pricing | Fiddler AI $0.002 per trace.
SP019 Arthur Arthur The full lifecycle platform for ensuring reliable AI.
SP020 Datadog LLM Observability | Datadog Free includes up to 40K LLM spans per month. Pro starts at $160 per month and includes 100K LLM spans.
SP021 LangChain LangSmith LangSmith has a free tier for development and small-scale production. Paid plans scale with trace volume.
SP022 Langfuse Langfuse Open Source AI Engineering Platform.
SP023 Langfuse Pricing | Langfuse $29/month.
SP024 Humanloop Pricing | Humanloop Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps.
SP025 Humanloop Humanloop is joining Anthropic As we sunset the Humanloop platform, we will continue to work closely with our customers to make their transition as smooth as possible.
SP026 Weights Weights is joining OpenAI As part of this transition, our products and services have been wound down and are no longer available.
SP027 National Institute of Standards and Technology AI Risk Management Framework The NIST AI Risk Management Framework (AI RMF) is intended for voluntary use and to improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.
SP028 Gartner Generative AI The total cost of ownership (TCO) for GenAI initiatives can often exceed initial expectations due to hidden costs such as compliance reviews, model retraining and internal overheads.
SP029 Humanloop Humanloop: LLM evals platform for enterprises
SI001 Goodfire Goodfire homepage
SI002 Goodfire Understanding, Learning From, and Designing AI: Our Series B Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SI003 Goodfire Silico
SI004 Goodfire Language
SI005 Goodfire Life Sciences
SI006 Goodfire Robotics & Vision
SI007 Goodfire Feature Steering for Reliable and Expressive AI Engineering Update (Feb 2026): Our SAE demo interface and API have been deprecated.
SI008 Goodfire Contact Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SI009 Goodfire Careers
SI010 Goodfire SOC 2 Type II compliant We're excited to announce that Goodfire is SOC 2 Type II compliant.
SI011 Goodfire Customer story: Prima Mente
SI012 Goodfire RLFR: Reinforcement Learning from Feature Rewards Overall, we reduce the hallucination rate by 58% across the held-out test set.
SI013 Goodfire Master Services Agreement
SI014 Goodfire Pilot Agreement
SI015 Goodfire Silico Terms of Use
SI016 PR Newswire AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SI017 Yahoo Finance AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SI018 The SaaS News Goodfire Raises $150 Million at $1.25 Billion Valuation
SI019 Pulse 2.0 Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab
SI020 Tech Funding News Goodfire raises $150M Series B at $1.25B valuation
SI021 PR Newswire Goodfire Raises $50M Series A to Advance AI Interpretability Research
SI022 Yahoo Finance Goodfire Raises $50M Series A to Advance AI Interpretability Research
SI023 Menlo Ventures Leading Goodfire's $50M Series A to Interpret How AI Models Think
SI024 VC News Daily Goodfire Venture Capital Funding
SI025 Salesforce Ventures Welcome, Goodfire
SI026 On Healthcare Goodfire AI and the Billion-Dollar Black Box The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SI027 SEC Goodfire AI, Inc. Form D filing dated 2025-06-02
SI028 SEC Goodfire AI, Inc. Form D filing dated 2026-02-09
SI029 Goodfire Customer Story: Radical AI We're excited to announce a new partnership between Radical AI and Goodfire to fundamentally dismantle the black box of AI-driven materials discovery and design.
SE001 Goodfire Silico
SE002 Goodfire Language
SE003 Goodfire Life Sciences
SE004 Goodfire Robotics & Vision
SE005 Goodfire Hallucinations Viewer
SE006 Goodfire Feature Steering for Reliable and Expressive AI Engineering
SE007 Goodfire Intentionally Designing the Future of AI
SE008 Goodfire Announcing our SOC 2 Type II Certification
SE009 Goodfire You and Your Research Agent
SE010 Goodfire Under the Hood of a Reasoning Model
SE011 Goodfire The World Inside Neural Networks
SE012 Goodfire Verbalized Eval Awareness Inflates Measured Safety
SE013 Goodfire Interpretability for Alzheimer's Detection
SE014 Goodfire Can SAEs Capture Neural Geometry?
SE015 Goodfire EVEE: Explaining Genetic Variants
SE016 Goodfire Model Diff Amplification
SE017 Goodfire Stochastic Parameter Decomposition
SE018 Goodfire Understanding Memorization via Loss Curvature
SE019 Goodfire Painting with Concepts
SE020 Goodfire The Shape of Stories Inside Neural Networks
SE021 Goodfire Phylogeny Manifold
SE022 Goodfire Silico Terms of Use
SE023 Goodfire Pilot Agreement
SE024 Goodfire Careers
SE025 Goodfire AP293 Guest Lectures 25
SE026 Goodfire Fellowship Fall 25
SE027 Goodfire Announcing our Mayo Clinic Collaboration
SE028 Goodfire Prima Mente Customer Story
SE029 Goodfire Radical AI Partnership Announcement
SE030 MIT Technology Review This startup's new mechanistic interpretability tool lets you debug LLMs
SE031 Salesforce Ventures Welcome, Goodfire
SE032 On Healthcare Tech Goodfire AI and the Billion-Dollar Black Box
SE033 NIST AI Risk Management Framework
SE034 Gartner Generative AI
SE035 Menlo Ventures Leading Goodfire's $50M Series A to Interpret How AI Models Think
SE036 Lightspeed Venture Partners Goodfire
SE037 PYMNTS Goodfire Raises $150 Million to Better Understand AI
SE038 PR Newswire AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SU001 Goodfire Contact / early-access page Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SU002 Goodfire Silico product page
SU003 Goodfire Life sciences page
SU004 Goodfire Language page
SU005 Goodfire Robotics / vision page
SU006 Goodfire Prima Mente customer story Goodfire’s research scientists embedded in Prima Mente’s team as they had finished training their model.
SU007 Goodfire Interpretability for Alzheimer's detection We detail how we studied Pleiades to identify fragmentomics as a novel class of biomarkers for Alzheimer’s detection.
SU008 Goodfire Mayo Clinic collaboration announcement This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SU009 Goodfire EVEE: explaining genetic variants Our pathogenicity probe achieves state-of-the-art performance (0.997 overall AUROC on 839k ClinVar variants).
SU010 Goodfire Interpreting Evo 2 Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages.
SU011 Goodfire Radical AI partnership announcement
SU012 Goodfire Using self-correcting search to accelerate materials discovery Applying self-correcting search improves targeting without harming SUN scores, leading to an overall ~27% increase in successful candidates.
SU013 Goodfire Rakuten SAE probes for PII detection As a result, Rakuten deployed the SAE probes - the first known enterprise application of SAEs for language model guardrails.
SU014 Goodfire Series B announcement / customer positioning We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment.
SU015 Goodfire You and your research agent
SU016 Goodfire Blog index
SU017 Salesforce Ventures Goodfire company profile
SU018 Salesforce Ventures Welcome Goodfire Enterprise customers care more about the ROI they see from their AI investments than ever.
SU019 MIT Technology Review This startup's new mechanistic interpretability tool lets you debug LLMs In reality, they are adding precision to the alchemy.
SU020 OnHealthcare Goodfire AI and the billion-dollar bet on interpretability
SU021 Tech Funding News Goodfire raises $150M Series B at $1.25B valuation
SU022 PR Newswire AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability This funding... will enable Goodfire to ... scale partnerships across AI agents and life sciences.
SU023 Yahoo Finance AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SU024 Lightspeed Venture Partners Goodfire company page
SU025 Menlo Ventures Leading Goodfire's $50M Series A to interpret how AI models think Patrick Hsu, co-founder of Arc Institute... said, “Their interpretability tools have enabled us to extract novel biological concepts that are accelerating our scientific discovery process.”
SU026 PYMNTS Goodfire raises $150 million to better understand AI We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment.
SU027 Goodfire Research index
SU028 Goodfire Radical AI customer story
SU029 Goodfire Open problems in mechanistic interpretability
SU030 Goodfire Belief dynamics in in-context steering
SU031 Goodfire Mixing mechanisms
SU032 Goodfire Replicating circuit tracing for a simple mechanism
SU033 Goodfire Mapping latent spaces in Llama 3.3 70B
SU034 Goodfire A Geometric Calculator
SU035 Mayo Clinic About Mayo Clinic
SR001 Goodfire Master Services Agreement The Services are provided "as is" and Goodfire hereby disclaims all warranties.
SR002 Goodfire Pilot Agreement In no event will either Party's aggregate liability exceed the fees paid for the pilot.
SR003 Goodfire Silico Terms of Use Customer grants Goodfire a non-exclusive, worldwide, perpetual, irrevocable, royalty-free, sublicensable license to Workflow Data.
SR004 Goodfire Goodfire is SOC 2 Type II compliant We're excited to announce that Goodfire is SOC 2 Type II compliant.
SR005 Goodfire Intentional design The techniques are early, the science is incomplete, and the hardest problems remain unsolved.
SR006 Goodfire Company Our goal is to make AI that can be understood, debugged, and shaped like software.
SR007 Goodfire Careers If you thrive in fast-paced environments and believe that understanding AI systems is essential for our future, join us.
SR008 Goodfire Contact Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SR009 Goodfire Silico Precisely debug issues with model behavior, identify and remove confounders, and diagnose failures before they occur in production.
SR010 Goodfire Prima Mente customer story Goodfire's research scientists embedded in Prima Mente's team and built out a biomarker discovery pipeline.
SR011 Goodfire Mayo Clinic collaboration This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SR012 Goodfire Radical AI partnership announcement More details about specific research directions and outcomes will be shared as the partnership progresses.
SR013 MIT Technology Review This startup’s new mechanistic interpretability tool lets you debug LLMs In reality, they are adding precision to the alchemy.
SR014 On Healthcare Goodfire AI and the billion-dollar interpretability bet The valuation jump is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SR015 PYMNTS Goodfire raises $150 million to better understand AI The company's Series B funding round values Goodfire at $1.25 billion.
SR016 NIST AI Risk Management Framework The profile will guide critical infrastructure operators towards specific risk management practices to consider when engaging AI-enabled capabilities.
SR017 Gartner Generative AI The success of these implementations often hinges on the quality of data and the effectiveness of governance frameworks in place.
SR018 PwC AI Jobs Barometer In the Healthcare sector, AI adoption is happening slower than in other industries and risk-controlled adoption of this technology matters.
SR019 Datadog LLM Observability / Agent Observability Validate changes before rollout, monitor production health continuously, and scale AI programs with stronger governance and fewer surprises.
SR020 LangChain LangSmith Observability LLM observability platforms provide visibility into agent decisions and help debug complex failures and hallucinations.
SR021 Tech Funding News Goodfire raises $150M Series B at $1.25B valuation This lack of visibility makes AI hard to control, difficult to fix, and risky to deploy at scale.
SR022 Goodfire Understanding, Learning From, and Designing AI: Our Series B To that end, we've built a model design environment.
SR023 Goodfire On optimism for interpretability Models are complex systems, and understanding them is a genuine research challenge.
SR024 Goodfire Verbalized eval awareness inflates measured safety Unless safety benchmarks account for eval awareness, they may systematically overestimate model alignment.
SR025 Goodfire Reasoning theater Models genuinely reason through hard problems, but coast through easy ones while generating superfluous chain-of-thought.
SR026 Goodfire Stochastic parameter decomposition SPD isn't a complete solution.
SR027 Goodfire Understanding memorization via loss curvature The method is not yet mature and can be heavy-handed in its edits.
SR028 Goodfire Can SAEs capture neural geometry? A single line can only give us a partial view of curved geometric structure.
SR029 Goodfire Manifold steering Linear steering cuts across the behavior manifold and produces noisy, off-target effects.
SR030 Goodfire Model diff amplification Even if an undesired behavior normally occurs only once in a million samples, amplification lets us surface it with far fewer rollouts.
SR031 Goodfire Phylogeny manifold Interpretability can improve reliability and transparency for downstream applications, especially in clinical domains.
SR032 Salesforce Ventures Welcome Goodfire Enterprise customers care more about the ROI they see from their AI investments than ever and cannot steer AI models to behave reliably and consistently.
SR033 Lightspeed Venture Partners Goodfire is building interpretable AI As governments increasingly push regulation mandating explainable AI systems, enterprises will need to provide clear rationales for model behavior.
SR034 Investing.com Goodfire raises $150 million to improve AI model understanding The company works with clients including Microsoft Corp., the Mayo Clinic, and the nonprofit Arc Institute.
SR035 IBM Think Topics: Model Observability
SR036 Datadog Agent Observability | LLM Observability | Datadog
SR037 Langfuse Langfuse
SR038 Langfuse Pricing - Langfuse
SR039 LangChain LangSmith: AI Agent & LLM Observability Platform
SR040 Weights Weights is joining OpenAI
SR041 Goodfire Priors in Time
SR042 Goodfire A Geometric Calculator
SR043 Goodfire Covariance Pooling
SR044 Goodfire The Neural Geometry Series
SV001 Goodfire Understanding, Learning From, and Designing AI: Our Series B Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SV002 PR Newswire AI Lab Goodfire Raises $150M at $1.25B Valuation To Design Models With Interpretability Goodfire... announced a $150 million Series B funding round at a $1.25 billion valuation.
SV003 Yahoo Finance AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SV004 Pulse 2.0 Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab
SV005 Tech Funding News Goodfire bags $150M at $1.25B to build AI interpretability infrastructure
SV006 PR Newswire Goodfire Raises $50M Series A to Advance AI Interpretability Research Today, Goodfire... announced a $50 million Series A funding round led by Menlo Ventures... to support... Ember.
SV007 Menlo Ventures Leading Goodfire's $50M Series A to Interpret How AI Models Think
SV008 Lightspeed Venture Partners Goodfire: Building Interpretable AI
SV009 Salesforce Ventures Welcome Goodfire
SV010 On Healthcare Goodfire AI and the Billion Dollar Black Box The valuation jump... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SV011 TechCrunch The AI gold rush is pulling private wealth into riskier, earlier bets
SV012 TechCrunch Here are the 17 U.S.-based AI companies that have raised $100M or more in 2026
SV013 TechCrunch Cursor's Anysphere nabs $9.9B valuation, soars past $500M ARR
SV014 TechCrunch Harvey reportedly raising at $11B valuation just months after it hit $8B
SV015 TechCrunch Enterprise AI startup Glean lands a $7.2B valuation
SV016 TechCrunch Google to invest up to $40B in Anthropic in cash and compute
SV017 Gartner Generative AI
SV018 NIST AI Risk Management Framework
SV019 Arize AI Phoenix
SV020 Fiddler AI AI Observability and Security
SV021 LangChain LangSmith Observability
SV022 Goodfire Research The Shape of Stories Inside Neural Networks
SV023 Goodfire Research Understanding and Steering Llama 3
SV024 Goodfire Research A Geometric Calculator
SV025 Goodfire Research Covariance Pooling
SV026 Goodfire Research Painting With Concepts
SV027 Goodfire Research VPD Explainer
SV028 U.S. Securities and Exchange Commission Form D for Goodfire AI, Inc. (Series A-era filing) Goodfire AI, Inc.... DELAWARE... 2023
SV029 U.S. Securities and Exchange Commission Form D for Goodfire AI, Inc. (Series B-era filing) Yan-David Erlich
SV030 Goodfire Company
SV031 Goodfire SOC 2 Type II
SV032 Goodfire The Neural Geometry Series
SV033 Goodfire SAE Scaling with Feature Manifolds
SV034 Goodfire SAE Scaling with Feature Manifolds