Diligence report AI safety / interpretability tools Series B private 2026-06-10

Goodfire

Interpretability-native model design lab with elite backing and still-unproven commercial scale

Goodfire looks like a category-defining interpretability company, but the public record still does not justify underwriting the February 2026 valuation as a clear bargain.

Cover facts

Latest public valuation: USD 1.25B [CV001]
Latest round: USD 150M [CV001]
Disclosed capital raised: USD 207M [CO021, CI005]
Headquarters: San Francisco [CO001, CO009]
Recommendation: research-more [CV047]

Company profile

Goodfire is a San Francisco-based AI interpretability company and public benefit corporation building a model-design environment for understanding, debugging, and steering neural networks. The company sells selective enterprise and research partnerships around Silico/Ember-style interpretability workflows for frontier model teams, healthcare and scientific AI programs, and other high-stakes deployments, but public disclosure still leaves revenue quality and customer breadth mostly opaque.

Website: www.goodfire.ai
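Founded: 2024 (implied by seed timing and Series A language; exact date not publicly confirmed, and one independent profile says 2023)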
Founders: Eric Ho, Daniel Balsam, Tom McGrath
Founding location: San Francisco, California, USA
Headquarters: San Francisco, California
Product: Goodfire's product is a model-design environment that exposes model internals, helps diagnose failure modes, supports steering and monitoring, and is increasingly packaged around selective enterprise and scientific deployments.
Customers: Frontier model builders, enterprise AI teams, life-sciences and scientific AI groups, and other high-stakes model developers.
Business model: Selective design-partner and enterprise software engagements built around platform access, pilots, and high-touch research or field-engineering support.
Stage: Series B private
Funding status: $150 million Series B announced in February 2026 at a $1.25 billion valuation after earlier seed and Series A rounds.

[CO001, CO003, CO004, CO018, CO021, CI007, CI008, CI009]

Executive summary

Top strengths

Goodfire has unusually strong research credibility and a differentiated interpretability-native product thesis.
The cap table includes high-signal investors and strategic backers spanning frontier AI and enterprise software.
Early flagship partnerships in healthcare, scientific AI, and enterprise design-partner workflows show real wedge potential.

Top risks

Public disclosure still does not show ARR, revenue quality, standardized pricing, retention, or customer concentration.
The company is valued as a future infrastructure winner before proving repeatable software economics.
Adjacent observability, guardrail, and platform vendors may satisfy many buyer budgets without requiring Goodfire's deeper tooling.

Open gaps

NDA-level disclosure is still needed on recurring revenue, pricing architecture, and software-versus-services mix.
The post-Series-B preference stack, ownership structure, and any secondary or debt features remain undisclosed.
Public materials do not provide a verified customer count, headcount, or concentration profile.

Chapter 01

01 Company Overview

1.1 Identity, mission, and product positioning

Goodfire presents itself as a research company using interpretability to understand, learn from, and design AI systems, and multiple official and financing sources describe it as a San Francisco-based public benefit corporation. The company’s central thesis is that frontier AI is still built too much as a black box, so its mission is to make models understandable, debuggable, and shapeable rather than relying on scale alone. Official materials consistently frame the business around a “model design environment” that helps users inspect model internals, diagnose failure modes, and intervene on behavior at the feature or circuit level. The product story has matured over time. Series A materials in 2025 centered on Ember as Goodfire’s flagship interpretability platform, while by 2026 the public-facing product page markets Silico as the first platform for intentional model design. The go-to-market motion appears selective rather than mass-market: Goodfire says it works with Fortune 500 enterprises, major healthcare institutions, and AI research labs, and its public product copy repeatedly targets organizations training or fine-tuning foundation models. Public evidence therefore supports a company identity that combines research lab, platform vendor, and design-partner model, with customer concentration and commercial scale still largely undisclosed.[CO001, CO002, CO003, CO004, CO005, CO024]

Snapshot KPI table
Metric	Value / status	Date	Confidence	Gap / note
Headquarters	San Francisco, California	2026-06-10	high	Official and investor materials agree; careers page specifies Telegraph Hill office
Organization type	Public benefit corporation	2026-06-10	high	Repeated in official and financing materials
Current stage	Private, Series B stage	2026-06-10	medium	Private status disclosed; stage inferred from latest financing
Latest round	$150M Series B	2026-02-05	high	Led by B Capital
Latest valuation	$1.25B	2026-02-05	high	Repeated across official and third-party coverage
Total disclosed capital	~$207M; publicly rounded to >$200M	2026-02-05	medium	Sum of disclosed seed, A, and B rounds
Founding date		2026-06-10	low	2024 is implied by seed timing and Series A language, but one independent profile says 2023
Current product brand	Silico	2026-04-30	medium	Earlier 2025 materials used Ember; product naming evolved
Revenue / ARR		2026-06-10	low	No public revenue or ARR disclosed in reviewed sources
Customer count		2026-06-10	low	No public customer count disclosed; only broad customer categories named
Employee count		2026-06-10	low	No official headcount disclosed; one independent profile estimates ~51 employees as of Jan 2026
Disclosed customer profile	Fortune 500 enterprises, major healthcare institutions, AI research labs	2026-06-10	medium	Named logos and contract counts remain sparse

Null values mark unsupported public metrics rather than zero. Funding and valuation are well corroborated, while founding date, headcount, revenue, and customer count remain incomplete or indirect.

[CO001, CO003, CO004, CO008, CO009, CO020]
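The null convention in the snapshot table can be made explicit in any downstream model. The following is a minimal sketch, assuming the table is loaded into a plain dictionary; the variable and function names are hypothetical, and only a subset of rows is transcribed.

```python
from typing import Optional

# Subset of the snapshot KPI table. None marks a metric with no public
# support, which is different from a reported value of zero.
snapshot_kpis: dict[str, Optional[str]] = {
    "Headquarters": "San Francisco, California",
    "Latest round": "$150M Series B",
    "Latest valuation": "$1.25B",
    "Founding date": None,
    "Revenue / ARR": None,
    "Customer count": None,
    "Employee count": None,
}


def undisclosed_metrics(kpis: dict[str, Optional[str]]) -> list[str]:
    """Return the metrics that lack a publicly supported value."""
    return [name for name, value in kpis.items() if value is None]


gaps = undisclosed_metrics(snapshot_kpis)
print(gaps)
```

Keeping undisclosed metrics as `None` rather than `0` prevents a spreadsheet or script from silently treating missing revenue or customer counts as real zeros.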

FO002: Company snapshot logic

How Goodfire links research identity, product architecture, partner types, capital, and execution dependencies.

[CO002, CO003, CO004, CO005, CO024, CO026]

1.2 Founders, leadership, and organizational profile

The founding team publicly centers on three cofounders: Eric Ho as CEO, Daniel Balsam as CTO, and Tom McGrath as chief scientist. Across investor and company materials, Ho is the primary public spokesperson and strategy voice; Balsam appears as the technical operator translating interpretability into product and applied research; and McGrath supplies heavyweight scientific credibility as the former founder of Google DeepMind’s interpretability team. Menlo and Salesforce materials also tie Ho and Balsam to prior operating experience at RippleMatch, reinforcing the narrative that Goodfire mixes frontier-research pedigree with startup execution. The broader team profile is also part of the investment case. Goodfire and its backers highlight alumni from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego, plus named contributors such as Nick Cammarata and Leon Bergen. However, public leadership disclosure is incomplete: reviewed materials do not provide a full C-suite roster, detailed board composition, or ownership map. Even the founding date has some ambiguity. Financing materials imply the company was founded in 2024 because the Series A was said to arrive less than a year after founding and Lightspeed publicly announced a seed round in August 2024, but one independent profile describes Goodfire as founded in 2023. The office footprint disclosed publicly is also narrow: Goodfire’s careers page states roles are in person five days a week at its Telegraph Hill office in San Francisco.[CO006, CO007, CO008, CO009, CO010, CO011]

Leadership and founder table
Person	Role	Background	Founder-market fit / coverage	Key-person dependency
Eric Ho	CEO, co-founder	Former founder/operator at RippleMatch; public face of Goodfire in financing and media coverage	Sets company narrative, fundraising, partnerships, and commercial positioning for interpretability	High — external narrative and investor confidence are tightly linked to Ho
Daniel Balsam	CTO, co-founder	Former AI and engineering leader at RippleMatch; appears in Mayo and investor materials as technical operator	Bridges frontier interpretability research into product and applied genomics/enterprise use cases	High — core technical execution and productization sit heavily with Balsam
Tom McGrath	Chief Scientist, co-founder	Former founder of Google DeepMind’s interpretability team; repeatedly cited as scientific anchor	Supplies research credibility, agenda-setting, and technical recruiting power	High — scientific brand and category authority rely materially on McGrath
Nick Cammarata	Senior interpretability researcher / marquee team member	Core contributor to the seminal OpenAI interpretability team	Signals that Goodfire can recruit from the small global pool of top interpretability talent	Medium — not sole decision-maker, but valuable for research legitimacy

Coverage is partial because reviewed sources do not disclose a full board, finance leadership, or complete management roster. Table focuses on publicly named founders and high-signal technical leadership.

[CO009, CO010, CO011, CO012, CO013, CO014]

1.3 Funding history, investor base, and current stage

Goodfire has raised capital unusually quickly for a research-first infrastructure company. Public sources show a $7 million seed round led by Lightspeed in August 2024, a $50 million Series A led by Menlo Ventures in April 2025, and a $150 million Series B led by B Capital at a $1.25 billion valuation in February 2026. The Series A syndicate included Lightspeed, Anthropic, B Capital, Work-Bench, Wing, and South Park Commons, while the Series B added DFJ Growth, Salesforce Ventures, and Eric Schmidt as new investors alongside returning backers, including Juniper Ventures, which is named as an existing investor even though it is less visible in earlier public materials. Goodfire and third-party coverage consistently round the cumulative funding to “over $200 million,” while simple addition of disclosed rounds implies roughly $207 million. The investor mix matters as much as the dollars. Anthropic’s participation in the Series A is a strategic signal from a safety-oriented frontier lab; Salesforce Ventures indicates an enterprise-software adoption angle; and B Capital’s lead role at Series B reflects a belief that interpretability may become a major infrastructure layer. Still, the public record is thin on ownership stakes, liquidation structure, debt, secondaries, and board seats. Public sources reviewed label Goodfire as private, and the company should be treated as a late-venture, Series B-stage private business rather than a scaled commercial software company. That distinction matters because the financing pace and valuation far outstrip the company’s disclosed revenue and customer metrics.[CO016, CO017, CO018, CO019, CO020, CO021]
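
The reviewed sources do not say whether the $1.25 billion figure is pre-money or post-money, and they do not disclose ownership. A minimal sketch of how much the two readings differ, using only the disclosed round size and valuation; the function name is hypothetical, both readings are assumptions rather than disclosed terms, and the calculation ignores option pools, secondaries, and any debt features.

```python
def implied_new_money_stake(round_usd_m: float, valuation_usd_m: float, post_money: bool) -> float:
    """Fraction of the company bought by the round under one reading of the valuation."""
    # If the headline valuation is pre-money, the round is added on top of it.
    post_money_valuation = valuation_usd_m if post_money else valuation_usd_m + round_usd_m
    return round_usd_m / post_money_valuation


ROUND_USD_M = 150.0        # disclosed Series B size
VALUATION_USD_M = 1250.0   # disclosed valuation; pre/post status is not public

stake_if_post = implied_new_money_stake(ROUND_USD_M, VALUATION_USD_M, post_money=True)
stake_if_pre = implied_new_money_stake(ROUND_USD_M, VALUATION_USD_M, post_money=False)

print(f"Post-money reading: {stake_if_post:.1%}")
print(f"Pre-money reading:  {stake_if_pre:.1%}")
```

Under these assumptions the Series B money would represent roughly 12% of the company on a post-money reading and about 10.7% on a pre-money reading, which is why the diligence asks on ownership and liquidation terms remain open.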

Stakeholder or investor map
Stakeholder	Role	Control or economic importance	Diligence ask
Lightspeed Venture Partners	Seed lead; Series A and B participant	Earliest institutional lead and continuing backer; likely influential in early governance	Confirm current ownership, pro rata rights, and any board seat
Menlo Ventures	Series A lead; Series B participant	Key financial sponsor at the first large institutional round and visible public champion	Confirm board role, follow-on reserve usage, and any protective provisions
Anthropic	Series A participant	Strategic investor whose presence signals safety and interpretability relevance to frontier labs	Clarify whether investment includes technical collaboration, channel value, or simple financial exposure
B Capital	Series B lead; Series A participant	Lead investor at the $1.25B valuation; likely major board and governance influence post-Series B	Confirm ownership percentage, board seat, liquidation terms, and any commercial introduction rights
Juniper Ventures	Series B existing investor	Named as returning investor at Series B but less visible in earlier public materials	Determine entry round, ownership, and influence relative to better-known VCs
DFJ Growth	Series B new investor	Adds late-venture scale capital and potential follow-on capacity	Assess whether DFJ view is platform-infrastructure or frontier-model optionality
Salesforce Ventures	Series B new investor and strategic enterprise partner	Signals enterprise software and procurement relevance, not just research backing	Clarify whether Salesforce provides channel access, product partnerships, or board observation rights
Eric Schmidt	Series B angel / strategic investor	Adds brand and policy credibility disproportionate to likely check size	Determine whether Schmidt is passive capital or active network participant
Wing Venture Capital	Series A and B participant	Continuing venture support from infrastructure-oriented investor base	Confirm stake and any role in product go-to-market guidance
South Park Commons	Series A and B participant; early ecosystem sponsor	Important ecosystem backer given Goodfire’s early office history and talent network	Clarify talent pipeline and whether SPC provided incubation before formal founding
Work-Bench	Series A participant	Adds enterprise-software pattern recognition at earlier stage	Determine whether Work-Bench remains active post-Series B
Mayo Clinic / design partners	Strategic non-investor stakeholders	Partners matter economically because commercial proof appears to rely on selective high-stakes collaborations	Request signed customer references, paid pilot status, and renewal dynamics

Investor map is exhaustive for explicitly named public stakeholders, not for the full cap table. Exact ownership, board representation, liquidation preferences, and secondary activity are not publicly disclosed in the reviewed sources.

[CO016, CO017, CO018, CO020, CO022, CO023]

FO003: Snapshot KPIs

A compact maturity snapshot emphasizing financing, stage, and the limited public disclosure of operating metrics.

Total disclosed capital is a simple sum of the public $7M seed, $50M Series A, and $150M Series B. The figure intentionally omits revenue and customer-count KPIs because public sources do not support them.

[CO020, CO021, CO026, CO029, CO030, CO032]
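
The total-disclosed-capital figure can be reproduced directly from the three public rounds. This is a small sketch with hypothetical variable names; the only inputs are the round sizes stated in this report.

```python
# Disclosed round sizes in USD millions, as reported in public materials.
disclosed_rounds_usd_m = {
    "Seed (Aug 2024)": 7,
    "Series A (Apr 2025)": 50,
    "Series B (Feb 2026)": 150,
}

total_disclosed_usd_m = sum(disclosed_rounds_usd_m.values())

# Share of all disclosed capital that arrived in the most recent round.
series_b_share = disclosed_rounds_usd_m["Series B (Feb 2026)"] / total_disclosed_usd_m

print(f"Total disclosed capital: ${total_disclosed_usd_m}M")
print(f"Series B share of total: {series_b_share:.1%}")
```

The sum matches the ~$207M figure, and it also shows that roughly 72% of disclosed capital came from the Series B alone.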

1.4 Chronology, cover metrics, and key diligence risks

The public chronology is short but dense. Goodfire surfaced with seed financing in 2024, announced its Series A in April 2025, publicized the Mayo Clinic collaboration in September 2025, launched a fellowship program and field-building educational content in late 2025, and then announced its Series B and broader intentional design agenda in February 2026. By April 2026, MIT Technology Review covered Silico as a commercial product for debugging and steering models, and by May 2026 Goodfire was emphasizing SOC 2 Type II certification and a growing enterprise-facing posture. This sequence shows a company trying to convert cutting-edge interpretability research into product and partnership credibility in under two years. The key cover-metric pattern is asymmetry: valuation and capital raised are well supported, while operating metrics remain sparse. No reviewed public source discloses revenue, ARR, or customer count. Headcount is not officially disclosed; one independent profile estimates about 51 employees as of January 2026. That opacity matters because the most credible adverse evidence in the source set is not about misconduct, but about execution risk: MIT Technology Review quotes an external interpretability researcher arguing that Goodfire adds “precision to the alchemy” rather than turning AI engineering into a fully principled science, and an independent health-tech analysis argues the Series B valuation is aggressive for a research-first company with early commercial traction. The diligence burden is therefore less about headline credibility and more about commercial proof, governance disclosure, and how quickly interpretability demand converts into repeatable software revenue.[CO025, CO026, CO027, CO030, CO031, CO032]

Milestone table
Date	Event	Type	Amount / valuation / status	Participants	Implication
2024-08-15	Lightspeed publicly announces leading Goodfire’s seed round	founding	$7M seed	Goodfire; Lightspeed Venture Partners	Establishes the first public financing marker and anchors a 2024 operating timeline
2025-04-17	Goodfire announces Series A and Ember platform	financing	$50M Series A	Menlo Ventures lead; Lightspeed, Anthropic, B Capital, Work-Bench, Wing, South Park Commons	Moves the company from seed research lab to institutionally backed platform narrative
2025-09-09	Goodfire announces Mayo Clinic collaboration for genomic medicine	partnership	Collaboration announced	Goodfire; Mayo Clinic	Expands relevance from core interpretability research into healthcare and clinical AI
2025-10-09	Goodfire opens fall fellowship program	scale	Fellowship cohort recruitment	Goodfire research staff	Signals active talent build-out and field-building beyond core founder bench
2025-12-11	Goodfire shares Stanford guest lectures on interpretability	governance	Educational release	Goodfire researchers; Stanford course community	Shows thought leadership and effort to shape the discipline around its agenda
2026-02-05	Goodfire announces Series B and intentional design agenda	financing	$150M at $1.25B valuation	B Capital lead; Juniper, Menlo, Lightspeed, DFJ Growth, Salesforce Ventures, Eric Schmidt and others	Validates investor appetite and sharply reprices the company as category infrastructure
2026-04-30	MIT Technology Review covers Silico public launch	product	Fee-based product launch / release coverage	Goodfire; MIT Technology Review	Marks transition from research platform narrative toward broader commercial productization
2026-04-30	External researcher cautions that Silico adds “precision to the alchemy”	adverse	Skeptical expert commentary	Leonard Bereska; MIT Technology Review	Introduces skepticism that the product fully solves the scientific uncertainty it claims to address
2026-05-22	Goodfire announces SOC 2 Type II certification	regulatory	Compliance certification announced	Goodfire	Supports enterprise procurement and trust posture for handling sensitive model-development workflows
2026-06-10	Public customer profile remains selective rather than broad-market	scale	Fortune 500 / healthcare / research-lab usage stated; no broad metrics	Goodfire; unnamed customers	Commercial story still depends on quality of design partners more than disclosed volume metrics

Timeline emphasizes dated events visible in public materials. Some items represent public disclosure dates rather than the underlying operational start date, which remains partially unresolved for founding and commercial scale.

[CO017, CO016, CO018, CO024, CO026, CO027]
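
The financing pace implied by the milestone table can be quantified with simple date arithmetic. A minimal sketch, assuming the public announcement dates above stand in for closing dates (which are not disclosed); the variable names are hypothetical.

```python
from datetime import date

# (round, public announcement date, size in USD millions) from the milestone table.
financings = [
    ("Seed", date(2024, 8, 15), 7),
    ("Series A", date(2025, 4, 17), 50),
    ("Series B", date(2026, 2, 5), 150),
]

intervals = []
for (prev_name, prev_date, prev_size), (name, when, size) in zip(financings, financings[1:]):
    days_between = (when - prev_date).days   # gap between announcements
    size_multiple = size / prev_size         # round-size step-up, not a valuation step-up
    intervals.append((prev_name, name, days_between, size_multiple))
    print(f"{prev_name} -> {name}: {days_between} days, {size_multiple:.1f}x round size")

seed_to_series_b_days = (financings[-1][1] - financings[0][1]).days
print(f"Seed -> Series B: {seed_to_series_b_days} days")
```

Because only the Series B valuation is public, the multiples here compare round sizes rather than valuations.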

FO001: Company milestone timeline

Key public milestones from Goodfire’s seed emergence through Series B, product launch, and the first meaningful skeptical coverage.

[CO017, CO016, CO018, CO024, CO026, CO035]

1.5 Exhibits

Chapter 02

02 Market Analysis

2.1 Market boundary and evidence-constrained sizing

Goodfire's relevant market is narrower than headline AI enthusiasm. Its own materials describe a product stack built around understanding model internals, debugging failures, steering behavior, shaping training, and in some cases monitoring production behavior. That boundary excludes generic copilots, general application observability, and AI infrastructure spend that never reaches a model-design workflow. The closest public analogs are LLM observability and evaluation vendors such as Arize, Fiddler, Datadog, LangSmith, Langfuse, Humanloop, Arthur, and Patronus, but even those mostly instrument prompts, traces, sessions, and outputs rather than model parameters or latent representations. Because Goodfire does not disclose pricing, customer count, or revenue, a classic TAM-SAM-SOM stack would overstate precision. The evidence-constrained approach is to use multiple lenses instead: first, macro demand signals that show AI usage and ROI pressure are spreading; second, published adjacent-tool pricing that establishes a software-budget floor for teams already buying observability and eval products; and third, an access lens that narrows the reachable market to organizations able to provide model internals and tolerate a services-heavy pilot motion. That combination supports a real but selective market, with more near-term substance in advanced model teams than in generic enterprise AI narratives.[CM001, CM002, CM015, CM016, CM017, CM020]

Market definition table
segment/category	included spend	excluded spend	buyer/payer	relevance
Frontier-lab interpretability and model design	Interpretability research infrastructure, steering workflows, training-shaping tools, safety diagnostics, and production monitors tied to owned models	Generic AI infrastructure, pure inference hosting, and generic app analytics	Research leads, safety teams, and frontier-model R&D budgets	Most natural direct segment because labs own model internals and already value interpretability
Enterprise model engineering and governance	Debugging, eval, steering, and monitoring for proprietary or open-weight enterprise models	Teams using only third-party closed APIs with no internal-model access	VP Engineering, AI platform leaders, ML infra, and advanced product budgets	Reachable when enterprises run or fine-tune their own important models
Scientific AI and life-sciences model design	Model decoding, validation, confounder removal, and discovery workflows in genomics, biology, and robotics	General lab software, wet-lab tools, and non-model R&D software	Scientific-program leads, computational biology teams, and research budgets	Strong fit where internal-model understanding changes scientific or deployment quality
Regulated and high-consequence adopters	Interpretability, governance, and validation layers for finance, healthcare, legal, or safety-critical AI	Commodity workplace copilots and generic knowledge-worker subscriptions	Clinical, compliance, risk, or domain-operations budgets with technical sponsorship	High-need segment, but harder to close because procurement and evidence burdens are heavier
Adjacent LLM observability and evaluation stack	Tracing, prompt management, evals, experiments, and guardrails already budgeted in production AI teams	Deep parameter or latent-space control when vendors only observe outputs or traces	Developer tooling, platform engineering, and MLOps budgets	Important adjacency because these budgets define the closest public comparison set

The market boundary is intentionally narrow: it follows the spend that can plausibly land in model-internal understanding, steering, and validation workflows rather than all generative-AI software or infrastructure.

[CM020, CM021, CM022, CM028, CM029, CM030]

TAM/SAM/SOM or sizing lens table
publisher	year	geography	value	methodology	confidence	limitation
Gartner	2025	Global	Trough of Disillusionment (qualitative maturity lens)	Hype-cycle lens for implementation realism and ROI dispersion	high	Useful for timing and caution, but not a market-size number.
PwC	2025	Global	100% of industries increasing AI usage; 3x higher revenue-per-worker growth in AI-exposed industries	Macro adoption and productivity lens	high	Adoption breadth is real, but it does not isolate interpretability-tool budgets.
Arize + Langfuse	2026	Global SaaS	$348-$600 annual list price per small team before heavy usage	Bottom-up adjacent pricing lens from public self-serve plans	high	Trace-and-eval tooling is adjacent, not the same as model-internal design tooling.
Langfuse	2026	Global SaaS	$29,988 annual enterprise list price before volume adders	Public enterprise list-price lens	high	One vendor datapoint does not reveal Goodfire pricing or win rates.
Fiddler	2026	Global SaaS	$0.002 per trace	Usage-based observability lens	high	Spend depends entirely on trace volume and still reflects output-trace observability, not interpretability work.
Goodfire direct market lens	2026	Selective design partners	Undisclosed / case-by-case	Direct commercial lens from MIT reporting plus Goodfire pilot agreement	medium	No public ACV, customer-count, or pipeline data exist for a true TAM-SAM-SOM build.

This table intentionally mixes qualitative maturity signals and adjacent pricing proxies because public evidence does not support a clean Goodfire TAM-SAM-SOM. The point is to bound the market with observable lenses rather than invent a top-down number.

[CM015, CM016, CM017, CM019, CM024, CM025]

FM001: Market sizing lens

Public evidence supports a large AI-demand backdrop, a visible adjacent observability budget layer, and a much narrower direct Goodfire capture layer defined by model access and high-touch pilots.

This is a constrained lens stack rather than a numeric TAM-SAM-SOM waterfall. Only the adjacent-budget layer has visible public pricing; Goodfire's direct commercial layer is undisclosed.

[CM020, CM019, CM024, CM026, CM044, CM045]

FM002: Market estimate range

Public pricing only supports a bottom-up range for adjacent software budgets; Goodfire's direct ACV remains undisclosed, so these figures are comparison proxies rather than Goodfire revenue estimates.

All values are adjacent-market price proxies, not Goodfire prices. The usage-based row is derived directly from Fiddler's published per-trace rate using explicit 100k, 1M, and 10M annual trace scenarios.

[CM024, CM025, CM026, CM047, CM048, CM049]
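
The usage-based scenarios can be reproduced from the published per-trace rate. A minimal sketch with hypothetical variable names, using only the $0.002 rate, the three stated trace volumes, and the $29,988 enterprise list price from the sizing table; the breakeven volume is an illustrative comparison across two different vendors, not a like-for-like price equivalence.

```python
from decimal import Decimal

PER_TRACE_USD = Decimal("0.002")           # Fiddler published per-trace rate
ENTERPRISE_LIST_USD = Decimal("29988")     # Langfuse annual enterprise list price

# Annual trace-volume scenarios named in the exhibit note.
scenarios = {"100k": 100_000, "1M": 1_000_000, "10M": 10_000_000}

# Decimal avoids floating-point rounding noise in the currency math.
annual_cost_usd = {label: int(PER_TRACE_USD * volume) for label, volume in scenarios.items()}

for label, cost in annual_cost_usd.items():
    print(f"{label} traces/year -> ${cost:,} per year")

# Trace volume at which usage-based spend would equal the enterprise list price.
breakeven_traces = int(ENTERPRISE_LIST_USD / PER_TRACE_USD)
print(f"Breakeven vs enterprise list price: {breakeven_traces:,} traces/year")
```

These remain adjacent-market proxies: they bound what teams already pay for trace-level observability and say nothing about Goodfire's own pricing.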

2.2 Buyer segmentation, budget owners, and adoption path

The clearest buyers are teams that both control important models and can expose enough internal state for Goodfire to do meaningful work. Frontier labs sit at the top of that list because they already run interpretability efforts, have research and safety staff who can use the tooling, and face direct pressure to shape model behavior. Enterprise model teams come next when they own proprietary or open-weight models and can justify specialized tooling through AI-platform or advanced-engineering budgets. Scientific AI teams in genomics, biology, robotics, and other research-heavy domains are especially relevant because interpretability can validate whether predictions are driven by real structure or shortcuts and can surface domain knowledge humans can reuse. Regulated adopters have strong need, but the combination of privacy, governance, and evidence requirements makes them slower to close. The payer is not always the end user. Research leads, CTOs, platform heads, or scientific-program owners may buy; model scientists, safety teams, and computational researchers use; and central AI R&D, platform, or research-program budgets pay. Public legal and product evidence implies a pilot-first motion: identify a high-stakes model problem, secure model and data access, run interpretability or steering work in a shared environment, prove a control or validation outcome, and only then expand into longer-term monitoring or licensing. That motion fits a high-touch, design-partner market better than a mass self-serve software motion.[CM003, CM005, CM006, CM009, CM010, CM011]

Segment / buyer map
segment	buyer	user	payer/workflow	budget owner	adoption trigger
Frontier labs	Chief scientist, interpretability lead, safety lead	Interpretability researchers, model scientists, safety engineers	Research program around training control, alignment, and failure analysis	Frontier-model R&D and safety budgets	Need to debug, steer, or align internally developed frontier models
Enterprise model teams	CTO, VP Engineering, AI platform lead	Applied scientists, ML engineers, eval teams	Owned or fine-tuned model programs with reliability or control needs	AI platform, infrastructure, or advanced product budgets	High-value model workflow where traces are insufficient and deeper control matters
Life sciences / scientific AI teams	Research director, computational biology lead, scientific founder	Computational scientists, modelers, translational research teams	Scientific discovery or validation workflow tied to owned foundation models	Research program or disease-area budget	Need to validate that model predictions reflect real mechanisms, not confounders
Regulated adopters	Clinical, legal, compliance, or risk executive with technical sponsor	Domain experts, review teams, model-risk staff	Pilot around high-consequence decision support or specialized model governance	Domain budget plus governance oversight	Need for transparent, auditable behavior before broader deployment

The buyer-user-payer split matters because Goodfire is sold as a high-touch capability layer. In every segment, the best trigger is a high-value model that the customer controls deeply enough to inspect.

[CM028, CM029, CM030, CM031, CM032, CM033]

FM003: Buyer / segment map

Goodfire's best near-term segments combine high need for interpretability with real access to model internals; regulated adopters have strong need but weaker immediate reach.

The matrix is an evidence-based ordinal synthesis from public product, legal, research, and independent reporting. It measures relative reachability, not disclosed revenue.

[CM028, CM029, CM030, CM031, CM036, CM046]

FM004: Adoption funnel/value-chain map

Goodfire's public materials imply a pilot-first value chain that starts with a high-stakes model problem and expands only after model access and interpretability work prove value.

The sequence comes from Goodfire's legal pilot agreement, product pages, and Series B narrative. Public sources do not disclose stage-by-stage conversion rates.

[CM009, CM010, CM032, CM036, CM044]

2.3 Growth drivers, constraints, and valuation relevance

Demand-side conditions are favorable. PwC shows that AI-exposed industries are generating materially higher revenue per worker and paying a large skills premium, which suggests real willingness to fund tools that make AI systems more effective. At the same time, adjacent vendors repeatedly frame observability, guardrails, and evaluation as business-critical because autonomous systems now touch revenue, operations, and user experience. That helps Goodfire because it means the budget conversation already exists; the company does not need to invent the importance of reliability or control from scratch. Its scientific and regulated use cases also line up with the places where output-only evaluation is least sufficient and where deeper interpretability has the most strategic value. The brakes are equally important. Gartner says ROI varies widely and hidden implementation costs can be large. NIST-style governance expectations, data privacy rules, and clinical or scientific validation standards all slow deployment. Most importantly, Goodfire's own story and independent reporting agree that the field is still technically immature: the company markets precision engineering, but external critics and even Goodfire's own research papers acknowledge that interpretability still has major open problems. Combined with the requirement for model-internal access and the lack of public pricing or customer data, that means valuation should anchor on a selective high-value wedge rather than a mass-market software assumption.[CM016, CM017, CM018, CM019, CM034, CM035]

Growth drivers and constraints table
driver/constraint	direction	timing	implication	diligence ask
Higher-stakes AI deployment	up	current	As AI touches science, healthcare, and autonomous workflows, demand rises for deeper validation and control	Ask which current customers use Goodfire for pre-deployment validation versus post hoc analysis.
Productivity and labor pressure	up	12-24 months	Firms that see real AI productivity gains are more willing to fund tooling that increases model reliability	Request proof that Goodfire shortens debugging or post-training iteration cycles enough to justify budget.
Adjacent observability budget normalization	up	current	Tracing, evals, and guardrails are already funded categories, making the budget conversation easier	Ask how often Goodfire sells alongside LangSmith, Datadog, Langfuse, or similar platforms.
Scientific discovery upside	up	12-36 months	Biology and robotics cases broaden the market beyond software teams if outcomes prove repeatable	Request revenue split and renewal evidence for scientific customers or partners.
Model-access dependence	down	current	Closed-model customers are harder to serve because Goodfire needs deeper access than most API-only users can provide	Request the pipeline split between open-weight, proprietary in-house, and closed-API prospects.
Governance and validation burden	down	current and rising	Regulated buyers may value interpretability most, but their procurement cycles are longest	Request average sales cycle and security or governance review time by segment.
Technical immaturity of mechanistic interpretability	down	12-36 months	Debate over how close the field is to precision engineering can cap budget urgency	Request benchmark evidence that Goodfire changes outcomes on production tasks, not just research demos.
Opaque Goodfire pricing and customer disclosure	down	current	Without public price and customer data, outside investors must underwrite a selective rather than broad-market story	Request ACV bands, pilot-to-license conversion, and customer-count disclosures by cohort.

The key underwriting question is not whether demand exists, but whether Goodfire can convert a real need for control into repeatable commercial deployments faster than access limits, governance friction, and field immaturity slow adoption.

[CM016, CM018, CM019, CM034, CM035, CM036]

2.4 Exhibits

Chapter 03

03 Competitors

3.1 Landscape by competitor class

Goodfire sits in an unusual competitive slot. Its public product language is not about post-hoc prompt monitoring or generic LLM telemetry; it is about intentional model design, feature steering, targeted failure correction, and programmatic access to model internals. MIT Technology Review frames Silico as a mechanistic-interpretability tool that puts techniques previously concentrated inside Anthropic, OpenAI, and Google DeepMind into the hands of smaller firms and research teams. That makes internal frontier-lab interpretability groups and sophisticated in-house research teams the closest direct alternatives for buyers building or adapting open-weight models. The broader commercial landscape is more crowded but more indirect. Arize Phoenix, LangSmith, Langfuse, Datadog, Fiddler, Arthur, and former platforms such as Humanloop all compete for budget tied to trustworthy AI development, yet their default control point is tracing, evaluation, guardrails, or governance around deployed systems rather than deep editing of learned representations. The practical implication is that Goodfire should be judged less like another observability dashboard and more like a new tooling layer for model builders who need mechanistic understanding before, during, and after training.[CP001, CP002, CP004, CP006, CP007, CP008]

Competitor profile table
competitor	category	scale / funding signal	target segment	key differentiation	key limitation versus Goodfire
Goodfire / Silico	Mechanistic-interpretability-native model design	Raised $150M Series B at $1.25B valuation; ~$207M total funding disclosed	Teams building or adapting open-weight and domain-specific models	Programmatic access to model internals, feature steering, data attribution, and pre-deployment failure diagnosis	Public pricing, win-rate, and installed-base evidence are sparse relative to adjacent tooling vendors
Frontier-lab internal interpretability teams (Anthropic / OpenAI / Google DeepMind)	Direct incumbent / internal build	Embedded inside frontier labs rather than sold as a stand-alone product	Frontier model builders with closed-weight access	Deepest access to proprietary models and internal research talent	Unavailable as a commercial product for most buyers; not a purchasable vendor
Arize Phoenix	Adjacent open-source tracing and eval platform	Open-source product; AX Pro starts at $50/month with enterprise tier	AI engineers building agents and LLM applications	Tracing, evals, datasets, experiments, and open-source entry point	Focuses on agent development observability rather than mechanistic editing of model internals
Fiddler AI	Adjacent enterprise observability / guardrails vendor	Free tier, $0.002 per trace developer plan, enterprise deployment options	Enterprises needing monitoring, policy, and governance for AI systems	Unified observability, custom evaluators, real-time guardrails, SaaS/VPC/on-prem options	Competes at the monitoring and control-plane layer, not the feature-level model-design layer
Arthur	Adjacent lifecycle reliability and governance vendor	Enterprise AI platform; product page shows monitoring and policy-workflow proof points but no public pricing	Enterprises managing agents, GenAI, and traditional ML together	Continuous evals, policies, guardrails, dashboards, and oversight across the AI lifecycle	Little public evidence of mechanistic interpretability or targeted internal model editing
Datadog LLM Observability	Incumbent observability platform	Free 40K LLM spans/month; Pro starts at $160/month with 100K spans	Existing Datadog customers extending APM into AI delivery	Bundles agent observability with backend monitoring, experiments, data retention, and enterprise controls	Best suited to operating production AI systems, not to reverse engineering model representations
LangChain LangSmith	Adjacent developer workflow incumbent	Free tier for development and small production; paid plans scale with trace volume	Teams already building on LangChain or multi-framework agent stacks	Strong agent tracing, SDK breadth, framework adjacency, and debugging workflows	Public page describes observability, not mechanistic model editing or training-data attribution
Langfuse	Adjacent open-source AI engineering platform	10B+ observations/month; 100k+ engineers; free plus $29/$199/$2499 self-serve plans	Developers wanting OSS tracing, evals, prompts, and production feedback loops	OpenTelemetry base, self-hosting, transparent pricing, and large OSS distribution	Economic and developer-workflow strength does not translate into Goodfire-style internal model control
Humanloop (historical)	Adjacent eval / prompt management vendor	Free trial with 50 eval runs and 10K logs/month; now joining Anthropic and sunsetting	Teams evaluating models and managing prompts for trustworthy LLM apps	Prompt management, evaluation metrics, private deployment add-ons	No longer an independent platform, which underscores category consolidation risk
Weights / Weave (historical)	Adjacent tooling vendor absorbed by frontier lab	Products wound down after team joined OpenAI	Creators and model builders using earlier Weights products	Demonstrates that AI tooling talent can be absorbed by frontier labs	No longer a live independent competitor; mainly a signal of category absorption
In-house black-box workflow	Status-quo substitute / internal build	Engineering labor plus commodity open-source or point tools	Teams unwilling to buy a new vendor category	Flexible and initially cheap: prompting, evals, fine-tuning, and guardrails can be assembled incrementally	Keeps teams in guess-and-check loops with limited mechanistic evidence on why a model failed

Profile set intentionally mixes direct, incumbent, adjacent, historical, and substitute options because Goodfire competes for a job-to-be-done, not a single analyst-defined software category.

[CP001, CP006, CP007, CP017, CP018, CP019]

FP001: Competitive positioning map

Ordinal positioning shows Goodfire furthest toward mechanistic model control, while Datadog, Fiddler, LangSmith, and Langfuse score higher on deployment-observability breadth.

X-axis is mechanistic access / direct model editability from 1 (surface-level observability only) to 5 (deep model-internal access). Y-axis is deployment and distribution breadth from 1 (narrow research workflow) to 5 (broad installed-base or platform reach). Scores are evidence-backed ordinals synthesized from reviewed source pages, not benchmark measurements.

[CP001, CP006, CP007, CP017, CP019, CP021]

3.2 Adjacent vendors: capabilities, packaging, and budget overlap

The adjacent vendor set is commercially relevant because it competes for the same buyer conversation around trustworthy AI, but the products are usually anchored to different workflows. Arize Phoenix emphasizes open-source tracing, evals, datasets, and experiments for agent development. Fiddler and Arthur lean into lifecycle observability, guardrails, policies, and governance. Datadog folds agent observability into a much larger application-monitoring estate, which is important because that installed base can make “good enough” AI oversight easier to buy than a stand-alone platform. LangSmith and Langfuse both push developer workflow and production debugging; Langfuse, in particular, combines a strong open-source posture with transparent self-serve pricing, while LangSmith advertises a free tier and trace-volume billing. Humanloop historically targeted development, prompt management, and evaluation for trustworthy LLM apps, but its move into Anthropic shows the category can be absorbed by model labs rather than remain independent. Relative to these vendors, Goodfire looks differentiated on mechanistic access and targeted model editing, but thinner on public pricing, installed base, and broadly deployed observability surfaces.[CP017, CP018, CP019, CP020, CP021, CP022]

Feature / capability matrix
buying criterion	Goodfire	Frontier labs internal teams	Arize Phoenix	Fiddler AI	Arthur	Datadog	LangSmith	Langfuse	Humanloop (historical)	In-house black-box stack
Mechanistic access to model internals	strong	strong	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	limited
Targeted steering or editing of learned features	strong	strong	unsupported / unknown	limited	limited	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	limited via prompt or fine-tune only
Training-data attribution or probe workflows	strong	strong	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unknown
Production tracing / experiments / eval loop	limited	unknown	strong	strong	strong	strong	strong	strong	strong	partial
Real-time guardrails / policy enforcement	limited	unknown	limited	strong	strong	limited	limited	limited	limited	partial
Open-source or self-host path	limited public evidence	no commercial path	strong	limited	unknown	limited	unknown	strong	limited	strong
Enterprise deployment / compliance controls	emerging / limited public proof	internal only	strong	strong	strong	strong	unknown	strong	strong historically	depends on internal team
Domain-specific scientific model workflows	strong	limited public proof	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	unsupported / unknown	custom if team builds it

Cells are evidence-backed qualitative judgments from reviewed product pages only; unsupported or absent capability disclosures are marked as unknown rather than inferred.

[CP001, CP002, CP007, CP008, CP011, CP012]

Pricing / packaging comparison
offering	public price / contract model	packaging details	included capabilities	unknowns / discounts	implication
Goodfire / Silico	Case-by-case fee; Goodfire declined specific pricing	Custom commercial engagement aligned to customer requirements	Model design environment, experiment agent, mechanistic debugging and steering	No public self-serve list price or usage meter disclosed	Harder for buyers to benchmark ROI; product must sell on differentiated outcomes rather than transparent entry pricing
Arize AX / Phoenix	Free tier; AX Pro $50/month; enterprise custom	50k spans/month and 10GB/month in Pro; enterprise SaaS or self-hosted	Tracing, evals, datasets, experiments, observability	Startup pricing mentioned but not publicly enumerated in detail	Sets a low entry point for teams that mainly need agent telemetry and eval workflows
Fiddler AI	Free tier; Developer at $0.002 per trace; enterprise custom	Developer plan adds unified observability, custom evaluators, SSO, SaaS deployment	Observability, tests and experiments, guardrails, governance	Enterprise pricing not public beyond tier framing	Creates usage-based competition for safety and governance budgets around deployed systems
Datadog LLM Observability	Free up to 40K LLM spans/month; Pro starts at $160/month for 100K spans	On-demand overage after 100K spans; discounted M2M and annual commitments	Agent observability, evaluations, retention options, sensitive data scanning	Retention add-ons and full enterprise packaging vary by commitment	Strong incumbent bundle for teams already standardized on Datadog
LangSmith	Free tier for development and small production; paid plans scale with trace volume; enterprise by contact	Framework-agnostic SDK access with usage-based paid expansion	Agent tracing, observability, debugging	Exact public price not shown on reviewed page	Budget overlap is strongest where teams want workflow visibility rather than model-internal control
Langfuse	Free Hobby tier; $29 Core; $199 Pro; $2499 Enterprise	Units-based billing with 50k free and 100k included in paid plans; optional $300 Teams add-on	Tracing, evals, prompts, analytics, compliance features, self-host options	Volume discounts beyond listed unit ladder	Transparent pricing and OSS posture put pressure on vendors pitching generic AI engineering value
Humanloop (historical)	Free trial; enterprise/custom plans	2 members, 50 eval runs, 10K logs/month; VPC add-on and enterprise support	Prompt management, evaluation, trustworthy LLM app workflow	Independent commercial future is gone after Anthropic deal	Shows how adjacent platform categories can disappear into a frontier lab before they mature independently
In-house black-box stack	No software line item; internal labor plus cloud/tool spend	Mix of prompts, eval harnesses, fine-tuning, and guardrails from existing tools	Flexible substitute path for teams avoiding new vendor spend	True total cost often hidden in compliance review, retraining, and internal overhead	Status quo remains viable unless Goodfire proves materially better debugging, safety, or domain outcomes

Public pricing combines official list pricing, tier descriptions, and explicit unknowns from reviewed pages; absence of a number is treated as an evidence gap, not a hidden assumption.

[CP018, CP020, CP023, CP024, CP026, CP027]
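The entry prices in the table above can be put on a rough common footing by dividing each monthly price by the usage it includes. The sketch below does that for the four vendors whose entry tier and included volume both appear in the table. It assumes, as a simplification, that a span, a trace, and a Langfuse billing unit are comparable, which they are not strictly, so the output is an order-of-magnitude price floor rather than a like-for-like quote. The function and variable names are illustrative only.

```python
# Entry-tier figures copied from the pricing / packaging table above.
# Assumption: spans, traces, and billing units are treated as roughly
# comparable; vendors meter differently, so this is only a coarse floor.
ENTRY_TIERS = {
    # name: (monthly price in USD, usage included per month)
    "Arize AX Pro": (50.00, 50_000),
    "Datadog LLM Observability Pro": (160.00, 100_000),
    "Langfuse Core": (29.00, 100_000),
}

# Fiddler's developer plan is already quoted per trace.
FIDDLER_PRICE_PER_TRACE = 0.002


def price_per_thousand(monthly_price: float, included_units: int) -> float:
    """Effective USD price per 1,000 metered units at the entry tier."""
    return monthly_price / included_units * 1000


def entry_price_floor() -> dict[str, float]:
    """USD per 1,000 units for each vendor, rounded to cents."""
    floor = {
        name: round(price_per_thousand(price, units), 2)
        for name, (price, units) in ENTRY_TIERS.items()
    }
    floor["Fiddler Developer"] = round(FIDDLER_PRICE_PER_TRACE * 1000, 2)
    return floor


if __name__ == "__main__":
    for vendor, usd in sorted(entry_price_floor().items(), key=lambda kv: kv[1]):
        print(f"{vendor}: ${usd:.2f} per 1,000 units")
```

Goodfire has no row in this calculation because neither a price nor a usage meter is public, which is the evidence gap the table records.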

FP002: Feature breadth / capability map

The capability map highlights Goodfire's relative strength in model-internal editing and domain-specific mechanistic workflows, versus adjacent vendors' strength in tracing, governance, and production operations.

Scores are ordinal 1-5 judgments from public capability descriptions only. A 5 indicates strongest visible fit in the reviewed source set, not an audited market ranking. The figure is a synthesized strength map distinct from TP002's support/unknown matrix.

[CP002, CP003, CP011, CP012, CP013, CP017]

3.3 Switching costs, substitutes, and distribution power

Switching costs in this landscape are asymmetric. Once a team standardizes on Datadog, LangSmith, or Langfuse for traces, evals, and production debugging, those tools can become the default operating surface for AI quality work even if they do not expose model internals. That distribution advantage matters because many organizations would rather extend an existing developer or observability stack than adopt a new research-native workflow. Conversely, Goodfire’s strongest use cases appear where tracing alone is not enough: open-weight model builders, safety-critical domains, and research teams that need to inspect features, attribute behaviors to training data, or intervene before deployment. The main substitute is still a black-box stack of prompting, benchmark evals, guardrails, and iterative fine-tuning, sometimes assembled in-house from open-source tools. That path is cheaper up front and familiar, but Goodfire’s argument is that it leaves teams guessing at why a model behaves badly. The competitive question is whether buyers feel enough pain from that guess-and-check loop to move budget from observability or prompt tooling into mechanistic model design.[CP004, CP008, CP016, CP017, CP018, CP022]

3.4 Moat durability and competitive risk

Goodfire’s moat case is easiest to believe when the buyer values mechanistic understanding itself. The company can point to feature steering, data attribution, PII-detection probes, and domain work in biology and robotics as evidence that model internals can be used for debugging, safety, and scientific discovery rather than just post-hoc monitoring. That gives it a more research-native product story than adjacent evaluation vendors. But the adverse evidence matters. MIT Technology Review quotes an outside mechanistic-interpretability researcher arguing that Goodfire may be adding precision to today’s alchemy rather than turning AI into a fully principled engineering discipline. The same article notes that Silico is most useful where customers can access model weights, limiting applicability on closed frontier models. OnHealthcare also frames the company as a 51-person, research-first organization valued aggressively relative to disclosed commercial traction. The highest-risk scenarios are therefore clear: larger observability vendors adding explain-and-steer features, frontier labs keeping the deepest interpretability advantages in-house, or customers deciding that trace-level controls are sufficient. Goodfire can still win if it becomes the default model-design layer for open-weight and domain-specific AI programs, but that durability is not yet proven by public win-rate, pricing, or retention evidence.[CP005, CP007, CP008, CP009, CP010, CP011]

Moat durability / competitive risk register
moat or risk claim	supporting evidence	counter-pressure	severity	mitigation / diligence ask
Mechanistic interpretability is Goodfire's clearest product moat	Silico, feature steering, data attribution, Llama steering, and probe work all point to direct intervention on model internals	Frontier labs also do mechanistic interpretability internally, and outsiders question how principled the workflow already is	high	Request customer evidence showing that mechanistic workflows change deployment or training decisions in ways observability tools cannot
Goodfire is strongest where customers can inspect open-weight or adaptable models	MIT Technology Review says Silico is most usable when teams can access a model's inner workings; Goodfire markets training/debugging model design environments	Closed frontier models limit applicability; many enterprise buyers still consume APIs from black-box providers	high	Ask for customer mix by open-weight versus API-only deployments and proof of closed-model roadmap
Adjacent observability vendors can absorb large parts of the AI-quality budget	Arize, Fiddler, Datadog, LangSmith, Langfuse, Arthur, and Humanloop all sell tracing, evals, guardrails, or governance	These tools do not obviously solve feature-level debugging or data attribution, leaving room for a deeper design layer	high	Test whether Goodfire is attached to a separate budget owner or must displace observability spend
Transparent self-serve pricing elsewhere makes Goodfire's opaque pricing a sales risk	Arize, Fiddler, Datadog, and Langfuse publish entry pricing while Goodfire uses case-by-case commercial terms	If buyers perceive Goodfire as another tooling vendor rather than a differentiated research layer, price discovery will feel unfavorable	medium-high	Request realized pricing, pilot-to-production conversion, and average time to first value
Research breadth can become a moat only if it productizes	Goodfire cites hallucination reduction, PII detection, biology discovery, and diffusion-search wins across multiple domains	Broad research portfolio can also create focus risk and slow repeatable product packaging	medium-high	Ask what percentage of roadmap and headcount is tied to reusable product versus custom research engagements
Category consolidation is a real threat	Humanloop is joining Anthropic and sunsetting; Weights wound down after team joined OpenAI	Frontier labs may absorb adjacent capabilities and talent faster than start-ups can scale independently	medium	Assess whether Goodfire is more likely to be a durable platform, a feature inside another stack, or an attractive acquisition target
Governance and trust requirements help Goodfire only if buyers believe interpretability is additive to observability	NIST AI RMF and Gartner both reinforce governance, evaluation, and hidden operating-cost concerns in sensitive AI systems	Those same concerns also strengthen guardrail and observability incumbents such as Fiddler, Arthur, and Datadog	medium	Validate whether regulated buyers explicitly ask for mechanistic evidence or remain satisfied with trace-level controls and policy enforcement

Severity reflects competitive pressure on Goodfire specifically, not absolute vendor quality; mitigation requests focus on evidence missing from the public record.

[CP005, CP007, CP008, CP009, CP010, CP011]

FP003: Moat / readiness KPIs

Compact KPIs summarize the commercial and competitive boundaries around Goodfire's moat: large research funding, opaque pricing, adjacent free tiers, and direct pressure from internal frontier-lab teams.

KPI items intentionally mix funding, price floors, and packaging signals because Goodfire's competitive durability is shaped by both technical differentiation and adjacent-tool economics.

[CP005, CP010, CP018, CP020, CP023, CP026]

Chapter 04

04 Financials

4.1 Revenue model and pricing surface: software is visible, economics are not

Public evidence supports a commercial product, but not a public price book. Goodfire's official surface describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure, and the vertical pages repeatedly invite teams training or fine-tuning foundation models to request access rather than self-serve into a public checkout flow. The contact page goes further, saying the platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Those statements support the existence of an enterprise product and an enterprise target market. They do not disclose the actual commercial terms those customers accept. The legal documents make the pricing posture clearer. The master services agreement and terms of use both push commercial economics into negotiated order forms. The terms explicitly contemplate fees, overage charges when usage exceeds contracted allotments, and dashboard or usage-report records that are authoritative for billing. The pilot agreement separately states that pilot access is for internal evaluation and that a separate commercial license is required after the evaluation period. That combination points to a monetization stack built around custom contracts rather than public list pricing: pilot fees, commercial platform fees, usage-based overages, and potentially additional service charges. What remains unavailable is the part investors actually need to underwrite. None of the reviewed public pages disclose list price, minimum annual commit, support tier pricing, discount ladders, or realized pricing by customer type. The pricing / monetization table therefore distinguishes verified commercial mechanisms from missing economics. The absence of public prices is not unusual for enterprise AI infrastructure, but it means external readers cannot infer ACV, customer segmentation, or software gross margin from the official surface alone. 
The right conclusion is not that Goodfire lacks revenue; it is that Goodfire has chosen a negotiated, opaque commercial posture.[CI007, CI008, CI009, CI012, CI013, CI014]
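The contract mechanics described above (a negotiated platform fee, overage charges above a contracted allotment, and possible service charges) can be sketched as a simple invoice calculation. This is an illustration of the structure only: every field name and number is hypothetical, and the actual metering unit, allotment, and rates are not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderForm:
    """Hypothetical order-form terms; no value here is disclosed by Goodfire."""

    platform_fee: float        # contracted fee for the billing period, USD
    included_usage: float      # contracted allotment, in an undisclosed unit
    overage_rate: float        # USD per unit above the allotment
    services_fee: float = 0.0  # optional support or field-engineering charge


def period_invoice(terms: OrderForm, metered_usage: float) -> float:
    """Platform fee plus overage on usage above the allotment plus services."""
    overage_units = max(0.0, metered_usage - terms.included_usage)
    return terms.platform_fee + overage_units * terms.overage_rate + terms.services_fee


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise both branches.
    example = OrderForm(
        platform_fee=100_000.0,
        included_usage=1_000.0,
        overage_rate=50.0,
        services_fee=20_000.0,
    )
    print(period_invoice(example, metered_usage=800.0))    # under the allotment
    print(period_invoice(example, metered_usage=1_200.0))  # overage applies
```

The sketch shows why the missing inputs matter: without the allotment and the overage rate, an outside reader cannot tell whether usage growth or the fixed platform fee drives realized revenue.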

Revenue streams table
stream	mechanism	unit	current value/status	quality	diligence ask
Pilot programs	Evaluation access under pilot agreement before full commercial license	pilot fee / pilot term	Pilot fee exists in order form; public amount undisclosed	Medium for existence, low for value	Provide executed pilot order forms, fee schedule, and conversion rate to commercial contracts.
Silico commercial platform access	Order-form-based access to hosted platform, APIs, tools, documentation, and related software	annual contract or custom license	Commercial fees exist in order forms; no public list price	Medium for mechanism, low for pricing	Provide standard order form, ACV ranges, minimum commits, and billing basis.
Usage overages	Charges for usage beyond contracted allotment under terms of use	usage unit above allotment	Overages explicitly contemplated; triggering unit and price undisclosed	Medium for mechanism, low for realized economics	Disclose metering unit, included allotment, overage rate, and customer usage mix.
Support / field engineering / research services	Technical assistance, field engineering, collaboration activities, and deliverables alongside platform use	project, retainer, or services statement of work	Services are contractually available; public pricing and attach rate undisclosed	Medium for existence, low for margin profile	Disclose service revenue share, pricing method, utilization, and gross margin.
Life-sciences discovery engagements	Platform plus embedded interpretability work for scientific discovery partners such as Prima Mente	custom engagement	Named proof points exist; no contract value or renewal data disclosed	Low for current revenue contribution	Provide contract values, renewal status, and whether these engagements convert to recurring software.
Enterprise design partnerships	Selective engagements with frontier or high-stakes AI teams	custom partnership	Officially described as selective and request-access based; no public contract economics	Low for current revenue quality	Provide design-partner count, conversion to production contracts, and realized annual spend per account.

Verified mechanisms come from legal docs and official product pages. Current value/status is intentionally qualitative because Goodfire does not disclose revenue mix or realized pricing.

[CI007, CI008, CI009, CI012, CI013, CI014]

Pricing / monetization table
product / path	price / unit / contract	list vs realized	discounts / unknowns	source
Silico commercial license	No public amount disclosed	No public list pricing; negotiated realized pricing only	Unknown minimum commits, contract term, seats or compute basis	Official product pages + MSA/TOS
Pilot agreement	Pilot fee set in order form; amount undisclosed	No public list pricing	Unknown evaluation term, conversion credits, and pilot-success criteria	Pilot Agreement
Usage overages	Overage charges apply above included allotment; unit not public	Realized only	Unknown rate card, thresholds, and true usage driver	TOS
Support / field engineering	No public price disclosed	Realized only	Unknown whether bundled, separately invoiced, or included in enterprise tier	TOS + MSA
Compliance-ready enterprise deployment	SOC 2 / SOC 3 support procurement readiness but do not set price	Not a price point	Unknown whether security/compliance premium is monetized directly	SOC 2 blog + contact surface
Deprecated demo / API preview	No current public commercial price; preview API deprecated in Feb 2026	Historic preview removed from public surface	Unknown whether any self-serve pricing survived privately	Feature steering blog

This table separates disclosed commercial mechanics from undisclosed economics. Official pricing is effectively absent; every public path points to custom contracting.

[CI008, CI013, CI014, CI015, CI018, CI020]

FI001: Revenue model bridge

Illustrates the public revenue architecture from selective customer acquisition through platform usage and services, while marking where realized pricing and margin cease to be public.

Public sources verify the nodes and commercial mechanisms, but not realized values, contract sizes, or margin. This is a structural bridge rather than a quantified waterfall.

[CI007, CI008, CI009, CI012, CI013, CI014]

4.2 GTM motion and unit economics: high-touch deployments, low public observability

Goodfire's public GTM looks selective and high touch. The Series B post says the company engages deeply and selectively with teams building high-stakes or frontier systems, while the contact page describes a platform used by large enterprises, healthcare institutions, and AI research labs. The customer-story material shows why this matters financially: in the Prima Mente engagement, Goodfire researchers embedded with the customer and built a biomarker discovery pipeline around the customer's model. The terms of use also describe support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables. Together, these sources suggest that at least some deployments are not pure seat-based software subscriptions; they likely combine platform access with bespoke scientific or engineering work. That has two opposite implications. On the positive side, embedded work can accelerate design-partner conversion, widen the product moat, and justify premium enterprise pricing. It can also make Goodfire useful in high-stakes domains where customers need interpretation help, not just dashboards. On the negative side, services-heavy revenue generally scales more slowly and often carries a weaker gross-margin profile than pure software. Public sources do not reveal how much of Goodfire's revenue, if any, comes from software usage, annual licenses, pilots, or research services. They also do not disclose customer counts, pilot-to-production conversion, sales cycle length, retention, CAC, or payback. The technical proof points are meaningful but not financial metrics. Goodfire's RLFR research claims a 58 percent hallucination reduction at roughly 90 times lower cost than an LLM-as-a-judge approach, and the life-sciences case studies show credible customer-value stories in diagnostics and scientific discovery. Those are strong commercialization narratives, but they are not the same as disclosed revenue quality. 
For this reason the unit economics bridge is qualitative. It shows the likely path from a selective design partnership to contracted software and overages, while making clear that the realized values at each step are private.[CI009, CI010, CI011, CI012, CI015, CI016]

Unit economics table
metric	confidence	why it matters	diligence ask
Public list price for Silico	low	Without list or starting price, outsiders cannot bracket ACV or customer segmentation.	Request current price card or anonymized quote set by deployment type.
Average contract value (ACV)	low	ACV is needed to translate selective design-partner traction into revenue scale.	Provide ACV distribution for pilots, enterprise subscriptions, and strategic partnerships.
Usage gross margin	low	Consumption software can be high margin, but embedded compute or human delivery can compress it.	Provide gross margin by platform usage line and by services line.
Services revenue share	low	A services-heavy mix changes scalability and valuation framework.	Disclose software-versus-services mix for the last twelve months.
Pilot-to-production conversion rate	low	This is the clearest proxy for revenue quality in a selective-enterprise GTM model.	Provide count of pilots launched, converted, and churned.
Sales cycle length	low	Long enterprise and healthcare procurement cycles can delay revenue recognition and cash collection.	Disclose median cycle from first contact to signed order form by customer segment.
CAC payback	low	Necessary to judge whether high-touch GTM is economically durable.	Provide fully loaded CAC and gross-margin payback by cohort.
Retention / expansion	low	Overages and usage growth matter only if accounts renew and expand.	Provide logo retention, gross retention, and expansion rates for paying accounts.

No public value is reported for any metric in this table. That reflects missing disclosure in the reviewed sources, not that the metric is zero or irrelevant.

[CI012, CI016, CI022, CI023, CI029, CI030]
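
The sensitivity behind two of these diligence asks can be sketched in a few lines: how the services share of revenue moves blended gross margin, and how that margin feeds a conventional CAC payback calculation. This is a minimal illustration only. Every input below (the software and services margins, the CAC, and the monthly revenue per account) is a hypothetical placeholder and not a Goodfire figure, and the function names are invented for this sketch.

```python
def blended_gross_margin(services_share, software_gm, services_gm):
    """Revenue-weighted gross margin for a software-plus-services mix."""
    if not 0.0 <= services_share <= 1.0:
        raise ValueError("services_share must be between 0 and 1")
    return (1.0 - services_share) * software_gm + services_share * services_gm


def cac_payback_months(cac, monthly_revenue_per_account, gross_margin):
    """Months of gross profit needed to recover fully loaded CAC."""
    monthly_gross_profit = monthly_revenue_per_account * gross_margin
    if monthly_gross_profit <= 0:
        raise ValueError("monthly gross profit must be positive")
    return cac / monthly_gross_profit


# HYPOTHETICAL inputs for illustration only; none are disclosed by Goodfire.
SOFTWARE_GM = 0.80           # assumed margin on platform usage
SERVICES_GM = 0.30           # assumed margin on embedded delivery work
CAC = 300_000                # assumed fully loaded acquisition cost per account
MONTHLY_REVENUE = 50_000     # assumed monthly revenue per account

for share in (0.0, 0.25, 0.50, 0.75):
    gm = blended_gross_margin(share, SOFTWARE_GM, SERVICES_GM)
    months = cac_payback_months(CAC, MONTHLY_REVENUE, gm)
    print(f"services share {share:.0%}: blended GM {gm:.1%}, "
          f"CAC payback {months:.1f} months")
```

Under these placeholder inputs, each step up in services share lowers blended margin and lengthens payback, which is why the software-versus-services split is requested before any payback figure is interpreted.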

FI002: Unit economics bridge

Qualitative bridge from acquisition motion to blended economics, highlighting where public evidence ends and diligence requests must begin.

The bridge is intentionally qualitative because Goodfire does not disclose ACV, CAC, payback, retention, or gross margin.

[CI012, CI014, CI015, CI016, CI022, CI023]

4.3 Capital adequacy and financing: funding is verified, runway is not

The strongest financial facts in the public record are financing facts. Goodfire announced a $50 million Series A in April 2025 and a $150 million Series B at a $1.25 billion valuation in February 2026. The SEC Form D filings sharpen those announcements. The 2025 filing reports $52,029,991 sold after a first sale on 2025-04-02, and the 2026 filing reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124. On that narrow basis, at least $202.0 million of equity sold in the two disclosed rounds is directly verifiable from primary filing data, and public commentary places total funding modestly above $200 million including earlier capital. That financing fact pattern supports one clear conclusion: Goodfire has had strong capital access.

It does not answer the central capital-adequacy question. No reviewed public source discloses cash on hand, monthly burn, runway months, debt covenants, or a next-round trigger. The public uses of funds are broad: frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences. Those are real cash uses, but they are not enough to derive runway, because neither the cash balance nor the monthly burn is disclosed. Even the relatively small additional clue in the 2026 Form D, a total offering amount roughly $11.7 million above the amount sold, only shows possible capacity or reserve in the round, not actual cash still available.

The financial estimate range therefore stays disciplined and only brackets financing facts that are source-backed. It does not invent revenue, burn, or runway. Likewise, the capital-intensity map highlights where cash likely goes (product, research, embedded delivery, and enterprise compliance) while preserving the distinction between documented financing and inferred cost structure. This is the correct evidence-constrained stance: the raise is verified, but capital adequacy beyond the raise cannot be underwritten from public data.[CI001, CI002, CI003, CI004, CI005, CI006]

Capital adequacy table
item	public value / status	confidence	why it matters	diligence ask
Verified Series A financing	Announced $50M; Form D shows $52,029,991 sold	high	Primary evidence confirms external capital raising in 2025.	Reconcile press-announced round size to cap-table-close documents.
Verified Series B financing	Announced $150M at $1.25B valuation; Form D shows $149,999,796 sold against a $161,674,124 total offering	high	Primary evidence confirms large 2026 financing and possible residual offering capacity.	Provide final close schedule and whether any unsold allocation remained available.
Cumulative disclosed capital since Series A	At least $202,029,787 sold across 2025-2026 Form D filings; public commentary says backing exceeds $200M overall	high	This is the strongest public capital-adequacy anchor.	Provide total capital raised including seed and remaining unrestricted cash.
Cash on hand	Not publicly disclosed	low	Cash balance is required to convert financing history into actual runway.	Provide current unrestricted cash and short-term investments.
Monthly burn	Not publicly disclosed	low	Without burn, no public runway estimate is defensible.	Provide last six months of net burn and planned spend by function.
Runway months	Not publicly disclosed	low	Runway is the core adequacy metric after a large financing round.	Provide management runway view under base and downside plans.
Planned use of funds	Frontier research, next-generation core product, and scaling partnerships across AI agents and life sciences	medium	Confirms capital is funding both R&D and GTM, not just balance-sheet preservation.	Provide board-approved use-of-proceeds model with timing and budget buckets.
Debt / project-finance obligations	No public debt or project-finance obligations identified in reviewed sources	low	Absence of disclosure is not proof of zero leverage, but no public obligation surfaced here.	Provide debt schedule, venture debt terms, leases, and any committed compute obligations.

This table separates verified financing facts from unavailable liquidity metrics. "Not publicly disclosed" entries reflect missing public disclosure, not negative findings.

[CI001, CI002, CI003, CI004, CI005, CI006]
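
The arithmetic behind the verified rows, and the reason runway stays blank, can be reproduced in a short sketch. The three dollar figures are the Form D amounts quoted in this section. The `runway_months` helper is an illustrative name, and returning `None` for missing inputs is a modelling choice of this sketch rather than anything the filings prescribe.

```python
from typing import Optional

# Form D amounts as quoted in this section (USD).
SERIES_A_SOLD = 52_029_991        # 2025 filing, amount sold
SERIES_B_SOLD = 149_999_796       # 2026 filing, amount sold
SERIES_B_OFFERING = 161_674_124   # 2026 filing, total offering amount


def runway_months(cash: Optional[float],
                  monthly_burn: Optional[float]) -> Optional[float]:
    """Runway = cash / monthly net burn; None when either input is undisclosed."""
    if cash is None or monthly_burn is None:
        return None
    if monthly_burn <= 0:
        raise ValueError("monthly_burn must be positive for a runway estimate")
    return cash / monthly_burn


total_sold = SERIES_A_SOLD + SERIES_B_SOLD
unsold_series_b = SERIES_B_OFFERING - SERIES_B_SOLD

print(f"Equity sold across both filings: ${total_sold:,}")
print(f"Series B offering not yet sold:  ${unsold_series_b:,}")

# Cash and burn are not public, so the only defensible public answer is "unknown".
print("Public runway estimate:", runway_months(cash=None, monthly_burn=None))
```

The sum reproduces the $202,029,787 anchor in the table. The runway call returns `None` because both inputs are undisclosed, and equity sold is not a substitute for a current cash balance.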

FI003: Financial estimate range

Source-backed financial ranges limited to financing facts; revenue, burn, and runway are excluded because they are not publicly disclosed.

Low/base/high values reconcile press-announced financing, Form D sold amounts, and broader public commentary about total backing. This figure does not invent ranges for revenue, burn, or runway.

[CI001, CI002, CI003, CI004, CI005, CI034]

FI004: Capital intensity / cash-flow map

Matrix showing where public capital evidence exists and where operating-cash evidence is still missing.

This is a structured evidence map, not a quantified cash-flow statement. The purpose is to keep verified financing separate from missing operating-liquidity data.

[CI006, CI019, CI020, CI037, CI038]

4.4 Financial verdict and public gaps: verified funding, inferred monetization, unresolved underwriting

The evidence supports a precise but narrow verdict. Goodfire is not financially unformed; it has a verified enterprise product surface, real external financing, named partners in regulated and frontier domains, enterprise-security credentials, and commercial contracts that contemplate fees, overages, and usage measurement. Those are the ingredients of a real business. But almost every metric needed to judge revenue quality and margin path remains private. There is no public revenue, no public ARR, no gross-margin disclosure, no cash balance, no burn, no runway, and no debt schedule.

That gap matters because the likely business model is mixed. The software platform could become a valuable recurring-revenue layer if usage and overages dominate. Yet the customer evidence and services clauses imply that at least part of the current offering includes embedded scientific and engineering labor. Without knowing the software-versus-services split, investors cannot tell whether Goodfire should be valued more like enterprise infrastructure software, specialized applied-research services, or a hybrid that starts services-heavy and shifts toward software as it matures.

The adverse read is straightforward. One skeptical sector analysis argues that the $1.25 billion valuation is aggressive for a company with early commercial traction and not yet a predictable SaaS profile. That critique is directionally fair given the public data: the capital has been disclosed, but the operating model has not. The underwriting answer is therefore to separate what is verified from what is inferred. Verified: financing, enterprise contracting mechanics, security readiness, and selective customer traction. Inferred: monetization mix, gross-margin path, and runway durability. The gaps table below captures the exact diligence requests needed before a financial investment case can move from plausible to underwritten.[CI017, CI018, CI020, CI025, CI029, CI030]

Public financial gaps table
missing private metric	impact on underwriting	exact diligence path
Revenue / ARR by quarter	Cannot test whether valuation is supported by actual commercial scale.	Request monthly recurring revenue bridge, quarterly revenue, and last-twelve-month ARR walk.
Realized pricing by customer type	Cannot distinguish premium software economics from service-heavy bespoke work.	Request anonymized signed order forms and invoice samples across enterprise, healthcare, and research customers.
Software versus services revenue mix	Cannot underwrite gross-margin path or scalability.	Request management split of platform, overage, pilot, and services revenue for the last twelve months.
Gross margin and contribution margin	Cannot assess whether consumption and embedded-delivery costs support durable unit economics.	Request gross margin by revenue line, plus cost buckets for compute, support, and personnel.
Cash balance and burn	Cannot estimate runway or next financing need despite large recent rounds.	Request cash, debt, net burn, and planned hiring / research spend through the next 24 months.
Sales efficiency and retention	Cannot judge whether selective GTM converts into repeatable enterprise software economics.	Request pipeline conversion, sales cycle, CAC, payback, logo retention, and expansion metrics.

Every row here is a material diligence blocker rather than a cosmetic omission. These gaps are the reason this chapter remains evidence-constrained.

[CI029, CI030, CI031, CI037, CI038, CI040]

Chapter 05

05 Product & Technology

5.1 Product definition and customer workflow

Goodfire's commercial surface is best understood as a model-design environment rather than as a generic LLM observability dashboard. Silico is presented as the first platform for intentional model design, a workspace for training and debugging models on Goodfire infrastructure, and a system that packages productized interpretability around concrete jobs: seeing inside predictions, running health checks, debugging failures, shaping behavior, and improving generalization. The practical consequence is that the product sits much closer to model-development loops than to standard application-layer analytics. The customer workflow is also unusually high touch. Public pages repeatedly push teams into request-access or partnership motions instead of a self-serve onboarding path. In practice, the workflow starts with a model team that already controls weights, activations, or at least enough internals to let Goodfire inspect how the model behaves. Goodfire then pulls models, datasets, prompts, workflows, and evaluation tasks into a shared workspace, runs agent-assisted experiments, and translates the resulting mechanistic findings into interventions such as steering, diagnostics, data filtering, or reward shaping. The vertical pages show the same loop repeated across domains. Language teams use the stack to reduce hallucinations; life-sciences teams use it to extract biomarkers and variant hypotheses from model internals; robotics and vision teams use it to catch brittle features and leakage before deployment. The result is a product with real workflow specificity, but one that still depends on customer willingness to operate in a shared, research-adjacent environment rather than through a mature, commodity API surface.[CE001, CE002, CE003, CE004, CE005, CE006]

Product module / asset matrix
Module / asset / product line	Primary user	Status / maturity	Differentiation	Diligence gap
Silico shared workspace	Frontier labs and enterprise model teams	Live product surface; access controlled	Packages interpretability around a model-design environment rather than an app-level dashboard	No public tenant model, API reference, or deployment architecture
Model scientist agent / experiment orchestration	Researchers and model engineers	Live internally and publicly described in launch materials	Automates experiment planning and execution inside the same workspace	Human-review rules, guardrails, and customer autonomy levels are not public
Diagnostics and health checks	Training, evaluation, and safety teams	Live workflow claims	Surfaces bottlenecks, feature collapse, shortcut learning, and rare failures before deployment	No published precision/recall or benchmark coverage by model class
Steering and intervention controls	AI engineers tuning model behavior	Live but still evolving after preview-tool deprecation	Direct feature steering, reward shaping, and data-filtering style edits	Supported-model matrix, rollback controls, and commercial packaging are private
Language reliability workflow	Open-model or fine-tuning teams	Most concrete public workflow	58% hallucination reduction claim plus rollout viewer for intervention review	Evidence is strong but still concentrated in Goodfire-selected case studies
Scientific discovery workflow	Genomics and life-sciences researchers	Advanced partner workflow	Turns model internals into biomarkers, pathogenicity probes, and human-readable variant hypotheses	Clinical validation and regulatory pathway remain partner-specific
Physical AI / creative workflow assets	Robotics, vision, and image-model teams	Partner workflow or research preview	Extends same interpretability primitives into policy bottlenecks, leakage detection, and latent editing UIs	Commercial status and repeatability outside case studies are not public

Rows combine public product modules and workflow assets because Goodfire markets the platform through problem-specific surfaces rather than through a public SKU sheet.

[CE001, CE002, CE003, CE004, CE007, CE011]

Workflow / use-case table
User job	Current workflow	Goodfire solution	Measurable benefit	Limitation
Reduce LLM hallucinations before deployment	Prompt tweaks, judge loops, and post-hoc output review	RLFR, feature steering, and the Hallucinations Viewer inside the model-design environment	58% hallucination reduction and roughly 90x lower intervention cost versus LLM-as-judge claims	Evidence is workflow-specific and not a universal performance guarantee
Debug frontier reasoning-model behavior	Prompt hacks and coarse response benchmarking	Reasoning-model SAEs, feature databases, and timing-aware steering on R1	Shows reasoning-specific features like backtracking and exposes steering edge cases at large scale	Requires weight or activation access and expert handling of model-specific behavior
Extract biomarkers from a scientific model	Black-box prediction review and wet-lab triage	Embedded interpretability work using SAEs, tracing, and ablation on customer models	Surfaced a novel Alzheimer's biomarker class and a human-readable classifier that generalized to an independent cohort	Still requires downstream experimental validation
Explain genome-wide variant effects	Opaque pathogenicity scores and coding-region-limited tools	Evo 2 embeddings plus probes and reasoning-model synthesis through EVEE	0.997 AUROC on 839k ClinVar variants and structured hypotheses for 4.2M variants	Outputs are hypotheses, not diagnoses or regulatory-grade evidence
Catch robotics or vision failures before deployment	Wait for benchmark misses or production failures	Inspect latent policy structure, geometry, and leakage before deployment	Can localize bottlenecks, unused observations, and ECG leakage in reviewed case studies	Public evidence is case-study based rather than product-documentation based
Edit image-model behavior directly	Prompt-box iteration only	Paint With Ember canvas that manipulates latent activations and concept weights	Supports adding, moving, and reshaping concepts without only rewriting prompts	This looks like a research preview rather than the core commercial SKU

Benefits mix directly claimed research outcomes with workflow-specific demonstrations. Goodfire does not publish customer-level ROI, conversion, or usage-frequency metrics for these flows.

[CE004, CE005, CE006, CE009, CE010, CE012]
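
For readers less familiar with the AUROC figure cited in the variant-effects row, the metric is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The labels and scores are invented toy data, not EVEE or ClinVar results, and the quadratic loop is for clarity only; production tooling uses rank-based methods.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = pathogenic, 0 = benign.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

print(f"AUROC on toy data: {auroc(toy_labels, toy_scores):.3f}")
```

A score of 1.0 means every positive outranks every negative and 0.5 is chance. The metric measures ranking quality on the evaluated set and says nothing on its own about calibration or clinical validity, which is consistent with the limitation noted in the table.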

FE002: Customer workflow / operating flow

The public workflow starts with a partner-led access motion, moves through shared interpretability experiments, and ends in targeted steering or design decisions.

This operating flow synthesizes the recurring pattern across language, life sciences, robotics, and launch materials. Public sources do not expose a formal buyer playbook or conversion funnel.

[CE003, CE004, CE007, CE011, CE016, CE033]

5.2 Interpretability primitives and operating architecture

Goodfire's architecture pairs a shared experiment workspace with a research stack that spans activation analysis, geometry discovery, parameter decomposition, and intervention tooling. The official research surface shows sparse autoencoders, probes, and manifold methods doing the early-stage work of surfacing interpretable features; neural-geometry work argues that many important concepts live on curved internal manifolds rather than single directions; and stochastic parameter decomposition pushes the stack deeper into weights, where Goodfire tries to identify which causal components can be removed without changing outputs. That combination suggests the platform is not a single technique but a layered toolkit for interpreting, localizing, and editing model behavior.

The R1 work is especially revealing because it shows both capability and friction. Goodfire says it trained the first public sparse autoencoders on a frontier reasoning model and had to build custom inference and interpreter-model infrastructure to do so. At the same time, the work shows that steering reasoning models is not plug-and-play: interventions had to happen after the model's stock response prefix, and some heavy-handed steering caused behavior to snap back toward the original response. That makes the core product proposition stronger, not weaker: the whole point of Silico is to expose these hidden operational constraints before customers ship or retrain blindly.

This architecture also explains Goodfire's dependency stack. The deepest workflows require access to model internals, which makes open-weight or customer-controlled models a better fit than closed API endpoints. It also explains why Goodfire can reuse the same core ideas across domains. EVEE, Alzheimer's biomarker work, Paint With Ember, and robotics bottleneck analysis all share the same pattern: pull out internal structure, translate it into something legible, then use that understanding to debug, steer, or design the model more intentionally.[CE013, CE014, CE018, CE019, CE020, CE021]
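
The pattern described above (extract features from activations, then steer along a feature direction) can be shown with a toy sparse autoencoder. This is a generic textbook-style sketch, not Goodfire's implementation: the weights are hand-set instead of learned, the activation has two dimensions instead of thousands, and all names (`encode`, `decode`, `steer`, `W_ENC`, `W_DEC`) are invented for the example.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    return [sum(w * a for w, a in zip(row, vector)) for row in matrix]


def encode(activation, w_enc, b_enc):
    """Map a model activation to non-negative feature strengths."""
    pre = matvec(w_enc, activation)
    return relu([z + b for z, b in zip(pre, b_enc)])


def decode(features, w_dec):
    """Rebuild the activation as a weighted sum of decoder directions.

    w_dec[k] is the direction in activation space written by feature k.
    """
    dim = len(w_dec[0])
    return [sum(features[k] * w_dec[k][i] for k in range(len(features)))
            for i in range(dim)]


def steer(activation, w_dec, feature, strength):
    """Push the activation along one feature's decoder direction."""
    return [a + strength * d for a, d in zip(activation, w_dec[feature])]


# Hand-set toy weights (ASSUMPTION): four features covering +x, +y, -x, -y.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

x = [2.0, -1.0]
features = encode(x, W_ENC, B_ENC)
print("features:      ", features)
print("reconstruction:", decode(features, W_DEC))

steered = steer(x, W_DEC, feature=1, strength=3.0)
print("steered:       ", steered)
print("features after:", encode(steered, W_ENC, B_ENC))
```

In this toy case only two of the four features fire on `x`, the decoder reproduces the activation exactly, and steering along feature 1 switches that feature on while feature 3 turns off. Real sparse autoencoders are trained with a sparsity penalty on high-dimensional activations and reconstruct only approximately, which is part of why the steering edge cases reported in the R1 work arise.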

Technology / operating architecture table
Layer / process / component	Role	Dependency	Risk
Customer model and materials ingestion	Brings weights, datasets, files, code, prompts, and workflows into the workspace	Customer must control or expose enough internals for analysis	Closed API models and restrictive data-sharing rules can block the deepest workflows
Shared workspace and agent orchestration	Runs experiments, captures outputs, and coordinates interpretability tasks on Goodfire infrastructure	Goodfire compute, inference, and agent tooling	Tenancy, region layout, and review/approval controls are not public
Activation interpretability layer	Uses SAEs, probes, and related tools to localize model features and signals	Activation access plus trained interpreter models	Linear feature methods can miss global curved structure
Geometry / manifold layer	Recovers structured concept spaces for smoother understanding and control	Clustering and geometry-discovery pipelines over internal representations	Research maturity is high, but packaged product boundaries are not fully public
Parameter decomposition layer	Inspects weights as causal components rather than only observing activations	SPD-style decomposition and masking methods	Scalability, runtime cost, and product packaging remain partially research-stage
Monitoring and failure-surfacing layer	Uses amplified sampling and eval-awareness analysis to catch rare post-training failures	Before/after checkpoints, rollout analysis, and judge infrastructure	Monitoring findings can depend on prompt design and may not generalize automatically
Intervention and steering loop	Applies feature steering, filtering, reward shaping, and targeted model edits	Edit permissions, rollback discipline, and model-specific heuristics	Wrong timing or oversteering can cause route-around behavior in reasoning models
Service and commercial delivery layer	Adds support, technical assistance, field engineering, and research collaboration around the platform	Order forms, Goodfire personnel, and partner workflows	High-touch delivery can slow scaling and hide how much value is software versus services

This is an evidence-backed operating architecture, not an official engineering diagram. It distinguishes public method layers from undisclosed infrastructure details such as tenancy, vendor stack, and data residency.

[CE018, CE020, CE021, CE022, CE023, CE024]

FE001: Product architecture map

Silico stacks customer-controlled model access, shared experimentation, interpretability primitives, and intervention tooling into a single model-design environment.

This stack is inferred from product pages, research posts, launch coverage, and legal terms. Goodfire does not publish a canonical architecture diagram or vendor-by-vendor infrastructure map.

[CE001, CE016, CE018, CE023, CE026, CE033]

FE003: Critical dependency map

Silico depends on customer access to model internals, Goodfire-controlled experiment infrastructure, contractual order forms, and domain-specific partner contexts.

The map is a synthesis of public product, legal, and launch materials. It highlights the practical dependency that Goodfire works best when customers can expose model internals rather than only call opaque APIs.

[CE017, CE033, CE034, CE036, CE037, CE038]

5.3 Trust, quality, and compliance posture

Goodfire's public trust posture is more mature on enterprise security than on public operational transparency. The strongest visible procurement signal is the company's SOC 2 Type II announcement, which says the audit completed with no exceptions and is accompanied by a public SOC 3 summary. Health-facing materials add another layer by describing Mayo-specific privacy protocols and governance frameworks designed to reduce spurious correlations and improve clinical relevance. Those are meaningful indicators for buyers in regulated environments. The legal surface, however, makes clear that many of the operational details investors and enterprise architects normally want to inspect are still private. The terms of use define the platform broadly to include software, APIs, tools, documentation, support, and services, but the concrete economics live in negotiated order forms. Usage reports are authoritative for billing, overages exist, and pilots are explicitly provided on an AS IS basis unless an order form says otherwise. Public terms also reserve suspension rights for security, legal, operational, and payment reasons and allow third-party products into the delivery stack. This is credible enterprise contract scaffolding, but it still leaves important diligence gaps. Public materials do not disclose a self-serve API reference, a public status page, deployment-count evidence, tenancy architecture, or quantitative uptime history. That matters because the external frameworks Goodfire is selling into are increasingly intolerant of black-box governance. NIST focuses on trustworthiness across design, development, use, and evaluation, while Gartner warns that hidden governance and change-management costs can dominate ROI in high-stakes GenAI deployments. Goodfire is directionally aligned with those buyer needs, but still early in how much public operating evidence it exposes.[CE033, CE034, CE035, CE036, CE037, CE038]

Trust / quality / compliance table
Control / certification / quality metric	Status	Scope	Gap
SOC 2 Type II / SOC 3	Achieved; Type II announced with no exceptions	Enterprise security and procurement assurance	Does not substitute for public uptime or architecture transparency
Order-form commercial controls	Live contractual structure	Fees, overages, service scope, and commercial commitments	No public rate card or public benchmark for deal terms
Pilot program guardrails	Live evaluation structure	Internal evaluation only; separate commercial license required after pilot	Default pilot terms are AS IS and do not publish service levels
Usage reports and metering	Live billing control	Goodfire records are authoritative for fee calculation and usage summaries	Public documents do not disclose exact metering units, quotas, or thresholds
Suspension and third-party-product governance	Live contractual control	Security, legal, operational, payment, and third-party integration handling	Fallback procedures and vendor list are not public
Mayo privacy and governance protocols	Partner-specific public commitment	Health and genomics collaboration	Not a generic public privacy architecture for all customer deployments
Public transparency surface	Limited	Trust portal, contact path, and security summary	No public status page, self-serve API docs, incident history, or deployment-count disclosure

The table separates formal procurement signals from missing public operating evidence. Goodfire looks stronger on negotiated enterprise controls than on broad public transparency.

[CE034, CE035, CE037, CE038, CE039, CE040]

5.4 Roadmap, release cadence, and maturity

Goodfire's release cadence looks more like a fast-moving research organization productizing an internal stack than like a traditional enterprise software vendor with a stable public changelog. One public breadcrumb is the February 2026 deprecation notice for the earlier SAE demo interface and API, which implies a transition away from narrow research-preview tooling. By late April 2026, MIT Technology Review was covering Silico as an externally available product, and the company's own financing press materials were already framing the roadmap around next-generation product development plus scaled partnerships across AI agents and life sciences. The cadence after launch is still primarily expressed through research drops. In May 2026 alone, Goodfire published work on eval-awareness measurement, story-shape geometry, and SAE-based geometry recovery. That is unusually fast public iteration for a company trying to sell into enterprise and regulated workflows. It also means roadmap visibility is asymmetric: buyers can see the scientific engine moving quickly, but cannot yet inspect a normal SaaS artifact trail such as versioned release notes, public incident history, or a broad integration catalog. The resulting maturity picture is mixed but coherent. Core scientific capability appears strong, and the domain workflows in language, genomics, and scientific discovery are more than conceptual. Security posture is enterprise credible. The main immaturity is packaging: access remains negotiated, many deployments appear service-attached, and several key reliability and integration details remain private. Goodfire therefore looks most mature as a high-end design environment for teams with serious model ownership, and least mature as a broadly standardized developer platform.[CE015, CE017, CE039, CE044, CE046, CE047]

Roadmap / release / development-stage table
Date / stage	Feature / milestone	Status	Implication	Source
Pre-Feb 2026 preview	Standalone SAE demo interface and API	Deprecated in Feb 2026	Goodfire consolidated from narrow preview tooling toward a broader platform motion	Feature Steering blog
2026-02 strategic thesis	Intentional design and next-generation core-product narrative	Publicly articulated	Roadmap is anchored on closed-loop training control, not only on post-hoc explanation	Intentional Design + PR Newswire
2026-04-30	Silico launch / external unveiling	Live product surface	Internal interpretability tooling became an externally offered product with case-by-case pricing	MIT Technology Review
2026-05-04	Verbalized eval awareness paper	Published	Public research cadence focuses on reliability and benchmark quality for safety-conscious buyers	Goodfire Research
2026-05-20	The Shape of Stories Inside Neural Networks	Published	Shows weekly geometry research output rather than a classic SaaS changelog pattern	Goodfire Research
2026-05-21	Can SAEs Capture Neural Geometry?	Published	Continues tooling work that can feed future control surfaces and geometry-aware methods	Goodfire Research
2026 security milestone	SOC 2 Type II / SOC 3	Achieved	Procurement readiness is moving faster than public ops telemetry	Goodfire blog
2026 partner build-out	Mayo, Prima Mente, and Radical domain workflows	Active programs	Roadmap includes scientific verticalization in genomics, healthcare, and materials, not just generic LLM tooling	Goodfire partner/customer pages

Goodfire exposes roadmap mainly through research posts, partner announcements, and financing narratives rather than through a public changelog. Dates therefore track public milestones, not a version-history feed.

[CE015, CE039, CE047, CE048, CE049, CE050]

FE004: Product maturity / capability map

Maturity is strongest in the core interpretability engine and domain-specific workflows, and weakest in public platform packaging and transparent operating telemetry.

Ratings are qualitative judgments from public evidence only. They measure visible maturity, not internal product quality or customer satisfaction.

[CE015, CE039, CE044, CE046, CE047, CE048]

5.5 Exhibits

Chapter 06

06 Customers

6.1 Customer segmentation and buying centers

Goodfire's public customer story centers on organizations that build or fine-tune foundation models rather than end-user application buyers. The clearest broad segmentation claim comes from the company's contact page, which says the platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Product pages sharpen that picture: Silico is pitched to teams training or fine-tuning models across architectures and modalities, the language page targets LLM developers who want to predict failures and improve behavior without retraining from scratch, the life-sciences page targets genomics and scientific-model teams, and the robotics / vision page targets physical-AI and medical-imaging workflows. Across those surfaces, the likely economic buyer is an R&D, platform, or product owner responsible for model performance and reliability, while the day-to-day users are research scientists, ML engineers, and interpretability specialists. The important caveat is that Goodfire does not translate those segment claims into counts, revenue mix, or named enterprise references. Public materials do not disclose customer count, ARR, segment share, or a list of the Fortune 500 users behind the broad enterprise claim. The public proof set is therefore much deeper in vertical specificity than in commercial breadth: named evidence clusters in genomics, clinical research, AI-agent safety, and materials discovery, with the rest of the enterprise narrative still mostly unenumerated. That asymmetry suggests a selective, high-touch go-to-market motion in which Goodfire wins a small number of technically sophisticated design partners first and only later may broaden toward more standardized enterprise software distribution.[CU001, CU002, CU003, CU004, CU005, CU006]

Customer segmentation table
Segment	Buyer / user / payer	Primary use case	Public proof	Strategic value	Gap
Frontier model labs and AI research teams	Buyer: research / platform lead; user: interpretability researcher and ML engineer; payer: R&D or model platform budget	Inspect internals, debug failures, shape training, monitor deployment	Silico page, Series B post, MIT Technology Review	Core category where Goodfire can become workflow infrastructure	No public account count or list of labs beyond named references
Healthcare and genomics institutions	Buyer: medical AI leader or scientific program owner; user: computational biology / genomics team; payer: research or translational medicine budget	Interpret scientific models, surface biomarkers, explain variant effects, validate model reasoning	Mayo Clinic, Prima Mente, Arc Institute, EVEE research	Highest-quality named proof and strongest differentiated outcomes	Most evidence is still research-stage, not routine clinical production
Large enterprises / Fortune 500	Buyer: enterprise AI or product owner; user: ML / safety / model operations teams; payer: innovation, platform, or business-unit budget	Improve reliability, controllability, and ROI of internal models	Contact page and Salesforce Ventures thesis	Could materially broaden ACV if the broad claim converts to named logos	No named Fortune 500 accounts or disclosed outcomes
AI-agent platforms and consumer-internet operators	Buyer: safety / product leader; user: guardrail and infrastructure teams; payer: platform engineering budget	Detect PII, monitor agent behavior, deploy lightweight guardrails	Rakuten production deployment	Best public proof that Goodfire can support live enterprise workflows	Only one named production enterprise case in reviewed sources
Materials and physical-science teams	Buyer: scientific program lead; user: model scientist and autonomous-lab team; payer: R&D budget	Use internals to improve inverse design and candidate targeting	Radical AI partnership and self-correcting-search research	Expands Goodfire beyond biology into broader in silico discovery	Commercial maturity and repeatability remain early

Rows summarize public segment evidence only. Nulls and unnamed enterprise claims indicate missing disclosure rather than absence of customers.

[CU001, CU002, CU003, CU004, CU005, CU008]

FU001: Customer journey map

Public evidence points to a selective enterprise journey: identify a high-stakes model problem, engage Goodfire as a design partner, work in a shared environment, validate technical gains, and then expand into broader monitoring or research programs.

[CU001, CU003, CU009, CU011, CU017, CU022]

6.2 Named customer proof and adoption motion

The named proof set shows that Goodfire is doing real work for customers and partners, but the type of proof varies materially by account. Prima Mente is the clearest model-to-science case study: Goodfire says it embedded researchers with Prima Mente, interpreted the Pleiades epigenomics model, and helped identify a novel class of blood-borne biomarkers for Alzheimer's detection. Arc Institute is a strong scientific reference showing Goodfire can work with frontier biological foundation models at scale; however, Arc evidence is still best understood as a research collaboration rather than a conventional software deployment, especially because the initial steering work was described as early stage. Mayo Clinic similarly supports category credibility, governance readiness, and clinical adjacency, but the public record frames the work as research and hypothesis generation rather than routine clinical deployment. Rakuten stands apart because it is the clearest public production-style deployment: Goodfire says Rakuten deployed SAE probes for PII detection in AI agents after the system had to generalize from synthetic training data to real multilingual traffic with high recall requirements. Radical AI adds a fifth named proof point in materials science, but commercialization maturity remains early because the public disclosure emphasizes technical progress and promises more detail later. Taken together, the adoption motion looks consultative and deeply collaborative. Goodfire repeatedly describes a shared environment, selective design-partner engagement, embedded work, and case-by-case pricing rather than self-serve onboarding. That is a credible way to launch an advanced infrastructure product, but it also means that the current evidence base proves depth of technical engagement more clearly than repeatable, scaled software distribution.[CU010, CU011, CU012, CU013, CU014, CU015]

Customer growth / adoption trajectory table
Metric	Value	Date	Source quality	Implication	Missing denominator
Broad customer categories disclosed	Fortune 500 enterprises, major healthcare institutions, AI research labs	2026-06-10	medium	Goodfire markets beyond pure research labs	No count by category or logo list
Named public collaborators / customers with specific use cases	5	2026-06-10	high	Public proof set includes Prima Mente, Arc Institute, Mayo Clinic, Rakuten, and Radical AI	Not total customer count
Named proof points with quantified technical outcomes	4	2026-06-10	medium	Prima Mente, Mayo EVEE, Rakuten, and Radical disclose measurable technical results	Outcome metrics are technical, not commercial
Named proof points explicitly described as production deployment	1	2026-06-10	high	Rakuten is the clearest production-style enterprise account	No disclosed production account count across the rest of the base
Pricing disclosure	Case-by-case and request-access	2026-04-30	high	Sales motion appears enterprise and consultative	No public pricing tiers or contract ranges
Public follow-on evidence after initial collaboration announcement	2	2026-06-10	medium	Arc and Mayo have later public updates, suggesting some relationship continuity	Follow-on evidence is not the same as paid renewal
Public customer count / ARR / NRR		2026-06-10	high	Commercial scale cannot be quantified from public evidence	Core denominator for adoption and durability is undisclosed

Count rows refer to the public proof set visible in reviewed sources, not to Goodfire's total customer base. Null means undisclosed.

[CU001, CU006, CU007, CU022, CU024, CU025]
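
The count rows above can be reproduced from per-account flags. The following is a minimal sketch, not part of the source evidence pack: the `ProofPoint` structure and its field names are hypothetical, and the flags simply restate what the table's Implication column attributes to each named account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofPoint:
    # Hypothetical record; flags restate the public-proof table, nothing more.
    name: str
    quantified_outcome: bool  # public source gives a measurable technical result
    production: bool          # explicitly described as a production deployment
    public_follow_on: bool    # later public update after the first announcement


PROOF_SET = [
    ProofPoint("Prima Mente", True, False, False),
    ProofPoint("Arc Institute", False, False, True),
    ProofPoint("Mayo Clinic", True, False, True),
    ProofPoint("Rakuten", True, True, False),
    ProofPoint("Radical AI", True, False, False),
]


def tally(points):
    """Count named accounts and how many carry each kind of public evidence."""
    return {
        "named": len(points),
        "quantified_outcome": sum(p.quantified_outcome for p in points),
        "production": sum(p.production for p in points),
        "public_follow_on": sum(p.public_follow_on for p in points),
    }


counts = tally(PROOF_SET)
print(counts)
```

These tallies describe the public proof set only. They say nothing about Goodfire's total customer count, which remains undisclosed.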

Named customer proof table
Customer / partner	Segment	Deployment / use case	Production vs pilot	Outcome	Limitation
Prima Mente	AI neuroscience / life sciences	Interpret Pleiades epigenomics model to surface disease signals and improve model design	High-touch research collaboration; not disclosed as routine clinical production	Novel class of blood-borne Alzheimer's biomarkers identified; fragmentomics/fragment length highlighted	Experimental validation and publication are still pending
Arc Institute	Genomics foundation-model research	Interpret Evo 2 representations and explore steerable biological features	Research collaboration with later Nature-linked validation; commercial terms undisclosed	Feature discovery across coding sequences, protein structure, and tree-of-life representations	Initial steering work was described as early stage
Mayo Clinic	Major healthcare institution / genomic medicine	Reverse engineer genomics foundation models and launch EVEE variant-effect explorer	Research and translational collaboration; not disclosed as routine clinical deployment	0.997 AUROC on 839k ClinVar variants; interpretable predictions for all 4.2M ClinVar variants	Work is undergoing peer review and computational outputs are not diagnoses
Rakuten	Enterprise AI-agent platform	Detect PII in multilingual user messages for AI agents	Production deployment	SAE probes deployed with strong synthetic-to-real generalization and major cost savings vs LLM-as-judge	Only one named production enterprise deployment is public
Radical AI	Materials discovery / autonomous lab	Improve inverse materials design using self-correcting search on MatterGen	Early design partnership / technical proof	~27% overall increase in successful candidates and ~30% more SUN materials in target range	Public disclosure leaves commercialization and repeat usage unclear

This is an intentionally partial public-proof enumeration. It distinguishes named, use-case-specific evidence from broader but unnamed enterprise claims.

[CU010, CU012, CU013, CU014, CU016, CU017]

FU002: Adoption / deployment funnel

Because customer counts are undisclosed, the adoption figure is shown as a deployment flow rather than a numeric funnel: Goodfire appears to move from selective prospecting to shared-environment work, technical validation, and only occasionally disclosed production rollout.

[CU009, CU011, CU022, CU024, CU029, CU030]

FU003: Customer proof matrix

The matrix compares the public quality of each named reference account across disclosure, quantified outcomes, production maturity, independent corroboration, and retention visibility.

[CU013, CU015, CU018, CU020, CU024, CU025]

6.3 Durability, expansion, and concentration risks

Goodfire's customer durability story is the weakest part of the public record. No reviewed source disclosed NRR, GRR, churn, renewal rates, contract length, seat expansion, customer concentration, or satisfaction metrics such as NPS. The company also does not publish customer count, so outside investors cannot tell whether the business has a few large design partners or a broader installed base. The best available durability proxies are continuity signals in public collaboration history: Arc moved from an early-2025 announcement to a later Nature-linked update, and Mayo moved from a 2025 collaboration announcement to 2026 EVEE research outputs. Those signals show some relationships continue long enough to generate additional public work, but they do not prove paid renewals, revenue expansion, or long-term stickiness. Expansion potential is visible nonetheless. Goodfire can land inside high-stakes model-development workflows and then expand from research support into monitoring, training intervention, guardrails, and adjacent scientific programs. The risk is that the proof set is concentrated in a handful of named collaborators and heavily weighted toward life sciences, while the broad Fortune 500 claim remains mostly anonymous. Two independent sources sharpen the caution. MIT Technology Review praised Silico's utility but quoted Leonard Bereska arguing that Goodfire adds 'precision to the alchemy' rather than turning model design into fully principled engineering, and On Healthcare argued that the $1.25 billion valuation looks aggressive given limited public commercial disclosure. The customer thesis is therefore promising but still fragile: Goodfire has credible reference accounts and technical outcomes, yet much of the investability question still depends on private evidence around account scale, contract economics, and repeat usage.[CU038, CU039, CU040, CU041, CU042, CU043]

Retention / repeat usage / satisfaction table
Metric	Value / null	Segment	Confidence	Diligence ask
Net revenue retention (NRR)		All segments	high	Request customer cohort tables and expansion by account vintage
Gross revenue retention / churn		All segments	high	Request renewal and logo-retention data by customer type
Contract length / commercial term		Enterprise and research accounts	high	Request pricing schedule, term length, and pilot-to-paid conversion rates
Public continuity proxy: Arc Institute	Initial 2025 announcement followed by 2026 Nature-linked update	Genomics research	medium	Confirm whether continuity reflected paid renewal, expanded scope, or publication only
Public continuity proxy: Mayo Clinic	2025 collaboration later referenced by 2026 EVEE research	Healthcare / genomics	medium	Confirm whether follow-on work sits under one master agreement or multiple phases
Customer satisfaction proxy		All segments	high	Request NPS, reference calls, or user-review data; no public reviews surfaced in the cache

Null means the metric was not publicly disclosed. The two continuity rows are relationship proxies only and should not be read as revenue retention metrics.

[CU019, CU024, CU038, CU039, CU040, CU046]

Expansion and concentration risk table
Expansion driver	Concentration / friction risk	Impact	Evidence	Diligence path
Land from research collaboration into shared product environment	High-touch delivery may scale more like expert services than pure software	Could produce high ACV but slow logo velocity	Series B post, Silico page, Prima Mente embedded-work description	Split revenue by software subscription, services, and custom research
Expand from life sciences into enterprise model operations	Public named proof remains heavily weighted toward biology	Vertical concentration could distort the apparent breadth of demand	Life-sciences page, Rakuten proof, Salesforce Ventures thesis	Measure pipeline and closed-won accounts outside biology
Broaden from open-model research teams to enterprises	Model-access constraints may limit use with closed frontier models	Adoption may skew toward labs with parameter access	MIT Technology Review and Silico page	Document support for closed-model monitoring or partner integrations
Use marquee references to win Fortune 500 buyers	Fortune 500 claim is unnamed and therefore weaker than the named proof set	Enterprise credibility could be overstated relative to disclosed evidence	Contact page and named proof table	Request named references, outcomes, and reference-call permissions
Deepen AI-agent and guardrail use cases	Rakuten is a single disclosed production account	Category could be large, but public production proof is still thin	Rakuten research and funding / investor coverage	Request evidence of additional production customers and renewals in agent workflows

Expansion rows reflect visible go-to-market vectors in public materials. Risks focus on disclosure gaps, concentration in the proof set, and the likely services-heavy delivery model.

[CU022, CU024, CU025, CU029, CU031, CU032]

FU004: Retention / repeat cohort

Goodfire does not disclose true revenue-retention cohorts, so this figure shows a narrower proxy: the share of named public collaboration cohorts that later received additional public follow-on evidence. It is a continuity proxy, not NRR or customer retention.

This figure is evidence-constrained: it covers only named relationships with later public follow-on evidence.

[CU019, CU024, CU038, CU040]

6.4 Exhibits

Chapter 07

07 Risks

7.1 Legal, regulatory, and contract risk

Goodfire's legal and regulatory posture is strong enough to clear initial enterprise diligence, but not yet strong enough to erase downside transfer. The positive evidence is real: Goodfire says it has achieved SOC 2 Type II, Mayo describes work under rigorous privacy and governance protocols, and the company frames interpretability as the bridge that makes sensitive AI use cases more governable. The harder underwriting read comes from the contracts. Default terms disclaim warranties around uninterrupted, secure, accurate, or error-free service; pilot and evaluation modes can operate without security or support commitments unless an order form says otherwise; and aggregate liability is capped to fees paid. Those are normal startup-software positions, but for a platform aimed at healthcare, safety, and potentially critical-infrastructure workflows they leave customers carrying a meaningful share of outage, breach, and deployment risk. Data-rights posture is the second sharp edge. The TOS gives Goodfire broad rights over Usage Data and a perpetual license over Workflow Data for improvement, evaluation, training, and commercialization, while also assigning feedback IP to Goodfire. That may be commercially rational for a research-driven platform, but it can slow procurement in regulated settings where customers want hard separation between operational traces, model behavior, and vendor product improvement. NIST's generative-AI profile and 2026 critical-infrastructure concept note both point toward more explicit risk controls, and Gartner likewise emphasizes governance, cost discipline, and realistic measurement as adoption gates. The upshot is that Goodfire does not appear to face public litigation or enforcement today, but it does face a contract-and-governance burden: if order forms do not materially improve on the default paper, expansion into regulated workloads will be slower than the brand narrative implies.[CR008, CR009, CR010, CR011, CR012, CR013]

Regulatory / legal risk register
rule / obligation / posture	jurisdiction	status	likelihood	severity	mitigation	residual exposure	diligence path
Default warranty disclaimers and liability caps	U.S. contract law / customer order forms	Current in public MSA, pilot agreement, and TOS	High	High	Negotiate customer-specific paper, cyber insurance, and security addenda	High for regulated or safety-critical buyers	Review top 10 executed enterprise redlines versus default terms and any uncapped confidentiality / security carve-outs.
Broad Workflow Data and Usage Data rights	Cross-border enterprise procurement / privacy	Current in TOS	Medium-High	High	Customer-specific data-use carve-outs, de-identification controls, audit rights	Medium-High	Review DPA, data-flow maps, retention windows, and whether workflow data can be excluded from improvement/training.
Healthcare explainability and clinical-governance burden	U.S. healthcare / regulated research	Partially mitigated by Mayo governance language and biomarker case study	Medium-High	High	Use interpretability as validation layer, partner with regulated institutions, document governance pack	High until there is broader deployment proof	Obtain clinical-validation plan, regulatory positioning memo, and evidence of deployments beyond named research collaborations.
Critical-infrastructure trustworthiness expectations	U.S. critical infrastructure	Rising external expectation per NIST 2026 concept note	Medium	High	Map controls to NIST AI RMF profiles and customer model-risk workflows	Medium-High	Request sector-specific control matrix, logging / auditability architecture, and incident-response procedures.
Export-control and restricted-jurisdiction constraints	U.S. export / re-export law	Current in public contracts	Medium	Medium	Screen customers, geographies, and downstream model uses; use counsel on sensitive deployments	Medium	Review export-screening process and any blocked-country or restricted-end-use policy.
Feedback assignment and service-IP ownership	Customer/vendor IP allocation	Current in MSA and TOS	Medium	Medium	Contractual carve-outs for customer inventions and regulated workflows	Medium	Review whether enterprise paper limits feedback assignment, deliverable ownership, and derivative-work ambiguity.

Public evidence shows strong enterprise intent but still vendor-favorable default paper; rows are ordered by residual underwriting importance.

[CR008, CR009, CR010, CR011, CR012, CR013]

FR001: Risk heatmap

Places Goodfire's principal risks by mitigation maturity, showing that the company has meaningful intellectual and governance assets but still weak public proof on repeatability, customer breadth, and regulated deployment readiness.

Heatmap cells are synthesis judgments based on public evidence as of 2026-06-10, not company-internal risk scoring.

[CR009, CR011, CR016, CR019, CR024, CR025]

7.2 Technical reliability and product-proof risk

Goodfire's core product claim is ambitious: that interpretability can move model development from guesswork toward controllable engineering. The risk is that Goodfire's own research record shows how early that journey still is. The intentional-design essay says the science is incomplete and the hardest problems remain unsolved. MIT Technology Review highlights the same tension from outside, quoting a mechanistic-interpretability researcher who sees Silico as useful but still more precise alchemy than true engineering. This matters because Goodfire is not selling only dashboards; it is selling trust that its interventions expose the right internal mechanisms and safely change behavior in consequential systems. The company's recent papers reinforce the need for skepticism. Verbalized eval awareness inflates measured safety; reasoning traces can be performative rather than faithful; rare harmful or backdoored behaviors may evade standard evaluations; memorization edits can preserve some reasoning while damaging arithmetic and recall; and Goodfire's own method posts say SAEs, linear steering, and parameter decomposition all have important limitations. None of that invalidates the technology. In fact, it strengthens the case that Goodfire is doing serious work on real failure modes. But it also means buyers and investors should treat current results as advanced instrumentation, not yet as proof that model behavior is fully legible or controllable. The sharp question for diligence is whether Goodfire can turn promising research into production-grade reliability evidence faster than the surrounding AI stack commoditizes adjacent monitoring, evaluation, and tracing workflows.[CR001, CR002, CR003, CR004, CR005, CR006]

Operational / quality / security risk register
failure mode	likelihood	severity	mitigation maturity	residual exposure	unresolved gap
Interpretability science remains incomplete, so product promises can outrun causal understanding	High	High	Partial: Goodfire is publishing openly about limitations and building tooling anyway	High	Need independent production case studies showing interventions improve outcomes without hidden regressions.
Benchmark safety scores can be inflated by eval awareness and prompt artifacts	High	High	Partial: Goodfire has identified the distortion and prompt-rewrite mitigations	High	Need third-party eval methodology showing deployment behavior tracks benchmark performance.
Chain-of-thought can be performative rather than faithful on easier tasks	High	Medium-High	Partial: probes and early-exit methods help, but do not solve full faithfulness	Medium-High	Need deployment monitors that do not rely only on visible reasoning.
Rare harmful or backdoored behaviors may evade standard testing until after deployment	Medium-High	High	Partial: model-diff amplification appears useful for surfacing rare failures	High	Need standardized pre-deployment red-team workflow and evidence it generalizes beyond model organisms.
Edits that suppress memorization or steer behavior can degrade arithmetic or factual recall	Medium	Medium-High	Weak-Partial: tradeoffs are documented, not yet cleanly solved	Medium-High	Need model-quality scorecards showing what is lost as interpretability interventions are applied.
Current SAE / steering methods capture only fragments of geometry and can produce off-target effects	Medium	Medium	Partial: Goodfire is moving toward manifold-aware methods and SPD	Medium	Need proof that newer methods scale beyond toy models and simple demo tasks.
Public security posture shows SOC 2 but not public SLAs, incident history, or runtime-control detail	Medium	Medium-High	Partial: SOC 2 and trust portal exist	Medium-High	Need uptime reporting, incident history, and architecture detail for regulated buyers.

Severity reflects whether the failure mode would break trust in Goodfire as a control layer for consequential AI, not merely whether a research result is interesting.

[CR003, CR004, CR005, CR016, CR028, CR029]

FR002: Risk transmission map

Shows how Goodfire's research and contract risks transmit into slower regulated adoption, weaker reference quality, and potential valuation compression.

The DAG expresses directional business logic rather than measured probabilities.

[CR003, CR004, CR009, CR011, CR024, CR025]

7.3 Partner, customer, and dependency risk

Public market proof for Goodfire is narrower than the headline suggests. The company says the platform is used by Fortune 500 enterprises, healthcare institutions, and AI labs, but the named public evidence clusters around a small number of collaborations (Prima Mente in Alzheimer's biomarker discovery, Mayo Clinic in genomic medicine, and Radical AI in materials science) plus a request-access product page for companies training or fine-tuning models. Even the strongest case study describes Goodfire researchers embedding with the customer and building the workflow jointly. That is valuable evidence of technical depth, but it points to a high-touch delivery model rather than clearly repeatable software revenue. MIT Technology Review's case-by-case pricing note and On Healthcare's observation that this is not yet a predictable SaaS profile both fit the same pattern. This concentration creates two linked risks. First, public reference quality is partner-heavy rather than broad-based: if one flagship collaboration stalls, there is little disclosed volume to absorb the narrative hit. Second, the broader buyer workflow already contains adjacent products from observability and evaluation vendors such as Datadog and LangSmith, which package testing, tracing, monitoring, and governance for production AI teams. Those platforms are not mechanistic-interpretability equivalents, but they compete for budget and for the right to define what AI control and monitoring should look like in production. Goodfire therefore depends on proving that deep white-box access is a distinct control layer worth buying, not just an advanced research add-on inside a stack customers already understand.[CR006, CR007, CR018, CR024, CR025, CR026]

Partner / dependency risk register
dependency	counterparty / surface	role	concentration	failure scenario	severity	mitigation	residual exposure
Named reference base	Prima Mente, Mayo Clinic, Radical AI, and unnamed enterprises	Public proof that the platform works in important domains	High	One or two flagship collaborations stall, leaving little disclosed breadth to offset the narrative hit	High	Add diverse named production references and renewal proof across sectors	High
Research-heavy delivery model	Embedded researchers, field engineering, collaboration services	Transforms customer models and produces the strongest public outcomes	High	Revenue scales with scarce expert labor instead of repeatable software usage	High	Separate productized modules, playbooks, and self-serve workflows from bespoke research work	High
Frontier-model builder demand	Companies training or fine-tuning models across architectures and modalities	Core buyer group for Silico	Medium-High	Open-model teams or frontier labs internalize similar tooling or decide observability is enough	High	Show clear ROI and control advantages that cannot be replicated with standard tracing stacks	Medium-High
Customer willingness to share workflow and usage data	Enterprise customers under TOS / order-form process	Can improve platform performance and product learning loops	Medium	Procurement teams restrict data-use rights or demand hard segregation of traces	Medium-High	Offer tighter customer controls and contract options that preserve trust without removing all learning loops	Medium
Adjacent observability stack	Datadog Agent Observability, LangSmith, and similar tooling	Competes for the same monitoring, evaluation, and governance budget lines	Medium	Customers buy observability plus evaluations and decide they do not need separate white-box interpretability	Medium-High	Position interpretability as a distinct causal-control layer with measurable lift in debugging or model design	Medium-High
Healthcare governance partners	Mayo and other regulated institutions	Provide legitimacy in sensitive domains	Medium	If governance-heavy partners do not translate into broader deployments, Goodfire stays a bespoke research vendor	Medium-High	Turn flagship healthcare work into repeatable compliance and validation packages	Medium-High

Concentration is judged from disclosed public evidence only; the company may have broader commercial breadth privately, but it is not yet visible enough to underwrite as a core mitigation.

[CR006, CR007, CR017, CR018, CR024, CR025]

FR003: Dependency map

Maps the external surfaces Goodfire currently relies on most: flagship collaborators, enterprise buyers willing to share data and buy bespoke work, and adjacent observability platforms shaping buyer expectations.

Partner and buyer concentration are inferred only from publicly disclosed proof points; undisclosed customers could improve the true picture.

[CR007, CR024, CR026, CR027, CR044, CR045]

7.4 Execution, talent, capital, and thesis-break triggers

Goodfire is trying to do three hard things at once: push frontier interpretability research, turn that work into an enterprise platform, and establish category authority in regulated and high-stakes domains. Public evidence suggests the company is still small relative to the size of that ambition. On Healthcare pegs headcount at about 51 people, the labor pool for interpretability specialists appears unusually thin, and the careers page signals a still-scaling organization. At the same time, the February 2026 Series B pushed valuation to $1.25 billion, which compresses the margin for execution error. A company with limited disclosed customer breadth, no public pricing architecture, and a high-touch services component now has to prove that it can become repeatable software quickly enough to justify that mark. The practical investment answer is to convert these uncertainties into hard triggers. If customer contracts continue to leave security and outage risk mostly with the buyer, if named production references do not widen materially, if software revenue still cannot be separated from embedded services, or if adjacent observability platforms satisfy most buyer needs, the thesis weakens fast. Conversely, the risk can compress if Goodfire shows production renewals in regulated settings, enterprise paper that materially tightens default terms, and independent evidence that interpretability interventions work in deployment rather than only in papers or bespoke collaborations. Until then, Goodfire looks like a high-upside but still proof-constrained control-layer bet rather than a de-risked infrastructure standard.[CR019, CR020, CR021, CR022, CR023, CR024]

People / execution risk register
role / function	dependency or gap	likelihood	severity	mitigation	diligence path
Interpretability research bench	Global talent pool appears unusually thin and expensive	High	High	Use capital to recruit senior researchers and convert reputation into hiring leverage	Review retention metrics, key-hire pipeline, and compensation competitiveness versus frontier labs.
Research-to-product translation	Company must turn frontier papers into repeatable enterprise workflows	High	High	Productize the highest-value interventions and narrow initial beachhead use cases	Review product roadmap, services share of revenue, and deployment architecture for named customers.
Commercial scaling / GTM	Case-by-case pricing and request-access posture limit visible repeatability	High	Medium-High	Standardize packages, implementation process, and procurement paper	Request pricing architecture, ACV bands, sales-cycle data, and renewal metrics.
Management bandwidth	Small team is simultaneously building research, platform, and regulated-domain partnerships	Medium-High	Medium-High	Prioritize a few vertical wedges and reduce bespoke projects	Review functional leadership depth, hiring plan, and what share of roadmap is customer-specific.
Capital discipline after unicorn pricing	Series B valuation compresses tolerance for slow commercial proof	Medium	High	Use new capital to widen reference base and prove software leverage quickly	Request board materials on spend allocation, next milestone gates, and target evidence for next round.

The public risk is not simply that the team is small; it is that the company's ambition, valuation, and labor-market scarcity all expand execution scope faster than public proof has expanded.

[CR019, CR020, CR021, CR022, CR023, CR024]

Mitigation and kill criteria table
risk	monitorable trigger	threshold / event	action implication
Contract paper remains startup-favorable	Enterprise MSAs still mirror public liability caps and warranty disclaimers	No meaningful security / outage / confidentiality carve-out in first 3 reference customers	Treat regulated-deployment thesis as unproven; do not underwrite healthcare or critical-infrastructure expansion.
Data-rights friction blocks procurement	Customers require major redlines around Workflow Data or refuse data sharing entirely	Two or more priority accounts stall specifically on data-use terms	Assume slower sales cycles and weaker product-learning loop; haircut software-scale assumptions.
Reference set fails to broaden	Named production customers do not expand beyond current collaboration-heavy proof set	Fewer than 3 additional named production references within the next refresh cycle	Re-rate company as bespoke research/services business rather than infrastructure layer.
Research results do not translate into deployment lift	No independent evidence of production gains from interpretability interventions	No third-party deployment study or customer KPI showing measurable improvement	Reduce moat assumption and compare directly against conventional observability vendors.
Security and uptime posture stays opaque	No public uptime, incident history, or runtime-control evidence beyond SOC 2	Another refresh passes without SLA, status, or incident disclosures	Assume slower enterprise penetration in sensitive workloads.
Talent pipeline weakens	Hiring velocity or retention falls in core interpretability roles	Missed senior research / product hires for two consecutive quarters	Expect roadmap slippage and heavier founder / researcher concentration risk.
Valuation outruns repeatability	Capital raised and valuation grow faster than visible revenue quality	No pricing standardization or software-services split by next major financing event	Avoid paying for category-optionality without evidence of repeatable unit economics.
Observability platforms absorb the buyer problem	Customers adopt tracing/evaluation stacks without adding white-box interpretability	Reference buyers describe Goodfire as nice-to-have research tooling rather than control-plane infrastructure	Thesis break: category collapses into a feature rather than a standalone platform.

Kill criteria are framed as observable public-or-diligence events so they can be revisited in future refreshes instead of remaining abstract concerns.

[CR009, CR011, CR013, CR016, CR019, CR024]
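Three of the triggers above carry explicit numeric thresholds, so they can be re-checked mechanically at each refresh. The following is a minimal Python sketch under stated assumptions: the `RefreshObservations` fields and the `fired_triggers` helper are hypothetical names introduced here for illustration, and only the thresholds themselves come from the table.

```python
from dataclasses import dataclass


@dataclass
class RefreshObservations:
    """Hypothetical per-refresh inputs; field names are illustrative only."""
    accounts_stalled_on_data_terms: int
    new_named_production_references: int
    consecutive_quarters_missed_senior_hires: int


def fired_triggers(obs: RefreshObservations) -> list[str]:
    """Return the names of table rows whose numeric threshold is met."""
    fired = []
    # "Two or more priority accounts stall specifically on data-use terms"
    if obs.accounts_stalled_on_data_terms >= 2:
        fired.append("Data-rights friction blocks procurement")
    # "Fewer than 3 additional named production references"
    if obs.new_named_production_references < 3:
        fired.append("Reference set fails to broaden")
    # "Missed senior research / product hires for two consecutive quarters"
    if obs.consecutive_quarters_missed_senior_hires >= 2:
        fired.append("Talent pipeline weakens")
    return fired


# Made-up example values, not observed data.
example = RefreshObservations(
    accounts_stalled_on_data_terms=2,
    new_named_production_references=1,
    consecutive_quarters_missed_senior_hires=0,
)
print(fired_triggers(example))
```

The remaining rows depend on qualitative judgments (contract carve-outs, buyer descriptions, disclosure posture) and are deliberately left out of the sketch rather than forced into numeric form.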

Chapter 08

08 Valuation

8.1 Recommendation, Financing Context, and Why Price Matters More Than Narrative

Public evidence paints Goodfire as a rare, high-quality interpretability company. The company assembled an elite funding stack quickly: a $50 million Series A in April 2025 followed by a $150 million Series B at a $1.25 billion valuation in February 2026, with Menlo, Anthropic, B Capital, Salesforce Ventures, and Eric Schmidt all showing up across the cap table. Official and filing records also support the basics of institutional quality: Goodfire is a Delaware public benefit corporation founded in 2024 per its financing materials (one independent profile says 2023), based in San Francisco, and by early 2026 had filed both Series A- and Series B-era Form D documents. Goodfire further claims enterprise-ready momentum via Ember, Mayo Clinic, Arc Institute, Prima Mente, Microsoft, and a February 2026 SOC 2 Type II announcement. Those positives matter, but this chapter is valuation work, not admiration. The evidence is strong on team quality, scientific credibility, and investor signaling; it is weak on the commercial datapoints normally used to justify a software infrastructure price. None of the public round materials in this source pack disclose ARR, revenue, pricing, customer count, retention, gross margin, or software-versus-services mix. That absence is decisive. At $1.25 billion, investors are not obviously paying for proven fundamentals; they are paying for the option that interpretability becomes core AI infrastructure and that Goodfire becomes one of the category winners. That may happen, but on public evidence alone the price already assumes more commercialization than the company has disclosed. The recommendation is therefore research-more, not buy, and the valuation stance is stretched rather than attractive.[CV001, CV002, CV004, CV005, CV006, CV007]

Recommendation summary table
Dimension	Assessment	Decision implication
Recommendation	Research-more	Re-engage only if NDA diligence closes the revenue-quality and cap-table gaps, or if pricing resets toward the base-case range.
Confidence	Medium	Quality of company signal is strong; quality of valuation signal is incomplete.
Risk rating	High	Commercial opacity, category formation risk, and preference-stack uncertainty dominate underwriting.
Valuation stance	Stretched	The $1.25B round sits near the low end of the bull case rather than the center of the base case.
Near-term action	Track aggressively	Maintain diligence access, but do not underwrite the round on narrative alone.

This table uses only public evidence as of the run date; entry discipline assumes primary exposure near the February 2026 round terms.

[CV001, CV005, CV015, CV036, CV047, CV048]

Thesis / anti-thesis table
Lens	Thesis	Anti-thesis	What would change the view
Category need	Interpretability should become more important as enterprises demand controllable and explainable AI.	Enterprises may decide observability and guardrails are enough, keeping interpretability niche.	Budget data showing Goodfire wins a standard line item rather than an experimental spend.
Product	Ember offers a differentiated model-internal control layer, not just post-hoc monitoring.	The product may still be too research-heavy or bespoke to scale as software.	Proof of standard pricing, time-to-value, and repeatable deployments.
Scientific proof	Goodfire has real research outputs, including steering, neural geometry, genomics, and multimodal work.	Scientific credibility does not automatically translate into recurring revenue.	Evidence that flagship research programs convert into durable commercial accounts.
Strategic demand	Anthropic, Salesforce, and Eric Schmidt are strong signal investors for the category.	Smart investors can still overpay for strategic option value in a hot AI market.	Independent software metrics that validate the price without relying on cap-table prestige.
Valuation	A $1.25B mark could be justified if Goodfire becomes core AI infrastructure for high-stakes deployments.	Today's public evidence does not disclose the ARR or margins needed to justify that mark on fundamentals.	NDA disclosure of ARR, gross margin, and retention that supports a scalable software multiple.

Rows are evidence-backed arguments and the observable condition that would change the view.

[CV010, CV011, CV013, CV015, CV022, CV029]

FV001: Recommendation logic

How scientific strength, commercial opacity, and round price combine into the recommendation.

The flow is qualitative and designed to show decision logic, not a weighted scoring model.

[CV010, CV015, CV036, CV037, CV047, CV048]

FV004: Investment KPIs

Key underwriting datapoints that are either known from public evidence or still missing.

KPI panel mixes confirmed public facts with flagged gaps; unknown commercial metrics are shown explicitly as undisclosed.

[CV001, CV005, CV015, CV028, CV036, CV047]

8.2 Evidence-Constrained Valuation Framework and Comparable Marks

Because revenue is undisclosed, a conventional revenue-multiple model would create false precision. The right method is to combine comparable private valuation marks with scenario logic anchored on what is and is not public. The comparable set is useful less as a formula than as a discipline check. Anysphere, Harvey, and Glean all carried disclosed ARR when reporters attached multibillion-dollar marks to them, while Anthropic sits in a wholly different frontier-model and compute-scarcity universe. Goodfire does not belong in Anthropic territory, and unlike Anysphere, Harvey, or Glean it has not publicly shown the recurring revenue base that would let outside investors defend a multiple. That forces the current round to be interpreted as strategic option value. The bull, base, and bear cases therefore turn on milestone conversion rather than spreadsheet extrapolation. In the bull case, Goodfire proves that Ember converts design partners and research collaborators into repeatable software revenue, keeps shipping differentiated interpretability breakthroughs, and becomes a must-have layer for high-stakes AI deployment. In the base case, the category is real and Goodfire remains one of its strongest independent teams, but commercialization is still early and high-touch; that warrants a discount to the last round, not a premium. In the bear case, research remains impressive but budgets flow toward observability, guardrails, or frontier labs themselves, leaving Goodfire with a bespoke-services profile and a materially lower valuation. On this framing, the February 2026 round sits near the bottom of the bull range rather than in the middle of the base range.[CV010, CV011, CV016, CV022, CV023, CV024]

Bull / base / bear scenario table
Scenario	Core assumptions	Valuation / return logic	Key risks	Probability signal
Bull	Ember converts research credibility into repeatable software revenue; partners become scaled reference customers; security and governance posture unlocks enterprise adoption.	$1.25B-$1.85B EV; roughly 1.0x-1.5x versus the last round, meaning upside exists but is not huge unless execution is exceptional.	Commercial conversion may stay slower than the research narrative implies.	Low-medium; requires proof not yet public.
Base	Category demand is real and Goodfire remains one of the best independent teams, but monetization stays early and partially bespoke.	$0.80B-$1.10B EV; roughly 0.6x-0.9x versus the last round, implying negative returns on an entry at today's price.	Public data never closes the revenue-quality gap; budgets split across adjacent vendors.	Medium; most consistent with current public evidence.
Bear	Interpretability remains valuable but budgets shift toward observability, guardrails, or frontier labs; Goodfire struggles to standardize product revenue.	$0.35B-$0.65B EV; roughly 0.3x-0.5x versus the last round, implying a material risk of permanent capital loss.	Commercialization remains bespoke; multiple compression hits AI infrastructure names.	Medium-low, but adverse enough to matter because disclosure is limited.

Scenario values are evidence-constrained enterprise-value ranges, not precise DCF outputs. Return logic is shown against the February 2026 $1.25B round mark.

[CV041, CV042, CV043, CV044, CV045, CV046]
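The return multiples in the scenario table come from dividing each enterprise-value bound by the $1.25B round mark. A minimal Python sketch of that arithmetic follows, using only the ranges in the table; the names `ROUND_MARK_B`, `SCENARIOS_B`, and `multiples_vs_round` are illustrative and do not come from the source pack.

```python
# Scenario EV ranges in USD billions, copied from the scenario table.
ROUND_MARK_B = 1.25  # February 2026 Series B valuation

SCENARIOS_B = {
    "bull": (1.25, 1.85),
    "base": (0.80, 1.10),
    "bear": (0.35, 0.65),
}


def multiples_vs_round(ev_range, mark=ROUND_MARK_B):
    """Return (low, high) multiples of the round mark for an EV range."""
    low, high = ev_range
    return low / mark, high / mark


for name, ev_range in SCENARIOS_B.items():
    lo, hi = multiples_vs_round(ev_range)
    print(f"{name}: {lo:.2f}x-{hi:.2f}x")
```

The unrounded results (1.00x-1.48x, 0.64x-0.88x, 0.28x-0.52x) agree with the rounded multiples quoted in the table. No probability weighting is applied, because the report gives only qualitative probability signals.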

Comparable valuation table
Comparable	Public metric	Valuation / status	Relevance	Limitation
Goodfire	Revenue undisclosed; $150M Series B	$1.25B valuation (Feb 2026)	Direct market anchor for this chapter.	No public ARR, pricing, or customer data to support a software multiple.
Anysphere / Cursor	>$500M ARR	$9.9B valuation (Jun 2025)	Shows what a leading AI application company looks like when valuation is paired with disclosed scale.	Different product, growth profile, and developer-led distribution.
Harvey	$190M ARR	$11B reported raise target (Feb 2026)	Shows how elite enterprise AI valuations can outrun conventional multiples when growth is proven.	Legal AI is a different vertical and the number is reported, not company-confirmed.
Glean	>$100M ARR	$7.2B valuation (Jun 2025)	Useful application-software benchmark for enterprise AI value with disclosed ARR.	Enterprise search and agents is a more mature commercial category than interpretability.
Anthropic	Frontier model and compute scale	$350B valuation with Google committing up to $40B (Apr 2026)	Upper boundary for frontier-model scarcity value in AI.	Not comparable operationally; Goodfire is not a frontier foundation-model lab.

Selected 2025-2026 private AI marks used as discipline checks, not one-for-one valuation formulas; Goodfire revenue is undisclosed, so implied multiples cannot be calculated responsibly.

[CV001, CV030, CV032, CV033, CV034, CV035]
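As an order-of-magnitude cross-check only, the comparables with disclosed ARR can be turned into implied valuation-to-ARR multiples, and those multiples into the ARR that a $1.25B mark would require at the same ratio. The Python sketch below uses the table's figures as given. It treats the ">" ARR figures as floors (so the resulting multiples are upper bounds) and takes Harvey's reported raise target at face value. The function and variable names are illustrative assumptions. Nothing here estimates Goodfire's actual revenue, which remains undisclosed.

```python
GOODFIRE_VALUATION_B = 1.25  # USD billions, February 2026 round

# name: (valuation in USD B, ARR in USD B, True if ARR is a ">" floor)
COMPARABLES = {
    "Anysphere / Cursor": (9.9, 0.50, True),
    "Harvey": (11.0, 0.19, False),
    "Glean": (7.2, 0.10, True),
}


def implied_multiple(valuation_b, arr_b):
    """Valuation divided by ARR; an upper bound when ARR is a floor."""
    return valuation_b / arr_b


def arr_needed_musd(target_valuation_b, multiple):
    """ARR in USD millions that supports the target at the given multiple."""
    return target_valuation_b / multiple * 1000


for name, (valuation_b, arr_b, arr_is_floor) in COMPARABLES.items():
    multiple = implied_multiple(valuation_b, arr_b)
    needed = arr_needed_musd(GOODFIRE_VALUATION_B, multiple)
    bound = "at most " if arr_is_floor else ""
    print(f"{name}: {bound}{multiple:.1f}x ARR; "
          f"about ${needed:.0f}M ARR at that ratio for a $1.25B mark")
```

On these inputs the implied ratios are about 19.8x, 57.9x, and 72.0x, which would correspond to roughly $63M, $22M, and $17M of ARR at a $1.25B valuation. These companies differ from Goodfire in product, vertical, and maturity, so the output is a sanity band for NDA diligence, not a valuation formula.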

FV002: Valuation sensitivity

Directional sensitivity of valuation conviction; positive bars strengthen willingness to pay, negative bars weaken it.

Sensitivity bars are directional conviction scores, not dollar deltas, because public revenue disclosure is absent.

[CV015, CV022, CV036, CV039, CV040, CV041]

FV003: Valuation / return range

Evidence-constrained valuation bands against the February 2026 $1.25B round reference.

These are scenario ranges built from public comparables and milestone logic; they are not a substitute for NDA-backed financial underwriting.

[CV001, CV042, CV043, CV044, CV045, CV048]

8.3 Exit Discipline, Thesis-Break Triggers, and Final Diligence Asks

The near-to-mid-term exit path is almost certainly another private round or a strategic transaction, not an IPO. Goodfire is too early and too opaque publicly for public-market underwriting: investors do not have audited revenue scale, margin profile, or even a basic customer-count disclosure. That does not make the company unattractive; it makes the investment case diligence-dependent. The practical implication is that entry discipline must focus on the missing proof points that would move Goodfire from “exceptional research company with commercial promise” to “underwritable software infrastructure business.” Those proof points are recurring revenue quality, standard pricing, concentration, gross margin, and the post-Series-B preference stack. The thesis can also break in observable ways. If collaborators fail to convert into repeatable customers, if management cannot disclose convincing revenue quality under NDA, or if budget holders decide that tracing, monitoring, and guardrails from adjacent vendors are sufficient without Goodfire's deeper internal-control layer, the current price becomes difficult to defend. Conversely, if Goodfire can show repeatable software subscriptions, strong partner conversion, and evidence that interpretability is becoming mandatory infrastructure in regulated and high-stakes deployments, the round can grow into itself. Until that evidence is produced, the disciplined posture is to keep Goodfire on the front of the watchlist, continue diligence aggressively, and avoid treating the February 2026 price as a proven bargain.[CV022, CV026, CV027, CV036, CV039, CV040]

Thesis-break and kill triggers table
Trigger	Threshold	Transmission to thesis	Action implication
Revenue-quality opacity persists	Management cannot disclose ARR, gross margin, concentration, and retention under NDA.	The investment remains narrative-led instead of fundamentals-led.	Do not underwrite above the base-case range; default to pass.
No partner-to-paid conversion pattern	Scientific collaborators and design partners do not convert into repeatable platform revenue.	Goodfire looks like a high-end research shop instead of scalable infrastructure software.	Move valuation toward the bear case and require a lower entry or structured downside.
Observability vendors satisfy the budget	Customers solve their pain with tracing, monitoring, and guardrails without needing model-internal control.	The category wedge narrows and Goodfire's TAM compresses.	Reduce conviction materially and reassess category ownership.
Preference stack is investor-unfriendly	Series B documents reveal heavy seniority, unusual protections, or meaningful dilution overhang.	Enterprise value may not translate into acceptable equity returns.	Re-cut returns on an equity-value basis before proceeding.
Security or governance credibility slips	A major trust, compliance, or governance issue undercuts the high-stakes deployment narrative.	The premium tied to safe and controllable AI weakens quickly.	Pause diligence until remediation is independently verified.

Triggers are framed as observable diligence findings or post-investment monitoring events that would break the underwriting case.

[CV036, CV043, CV046, CV049, CV050]

Final diligence asks table
Topic	Missing evidence	Why it matters	Owner / diligence path
Revenue quality	ARR, bookings, net retention, gross retention, gross margin, and revenue mix.	These are the core inputs for any valuation method beyond strategic option value.	CFO / finance room under NDA.
Pricing and packaging	Current pricing sheets, pilot-to-production conversion terms, and software-versus-services monetization.	Determines whether the business can scale as product revenue rather than bespoke work.	Sales leader and product leader interview plus contract sample review.
Customer concentration	Top ten customers, revenue concentration, deployment scope, and renewal status.	High concentration would make the current price much harder to defend.	Customer cohort review and account-level diligence.
Cap table and preferences	Post-Series-B cap table, liquidation preferences, pro rata rights, and governance protections.	Equity value can differ sharply from enterprise value if preferences are heavy.	Legal diligence on financing documents.
Commercial conversion	Evidence that Mayo, Arc, Microsoft, or similar relationships create repeatable paid software patterns.	This is the bridge between scientific credibility and a scalable investment case.	Management deep dive with cohort examples and implementation metrics.

These are the minimum asks needed to move Goodfire from an interesting company to an underwritable investment at or near the last round price.

[CV027, CV040, CV049, CV050]

8.4 Exhibits

Disclaimer

This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.

Evidence index

Claims
ID	Statement	Confidence	Sources
CO001	Goodfire describes itself as a San Francisco-based research company and public benefit corporation.	High	SO001, SO002, SO014, SO018
CO002	Goodfire’s mission is to build safe and powerful AI by understanding and intentionally shaping model internals rather than relying on scaling alone.	Medium	SO001, SO004, SO005, SO006
CO003	Goodfire’s current public product is a model design environment that helps users understand, debug, and shape models through interpretability-based tooling.	Medium	SO002, SO007, SO027
CO004	Goodfire says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.	Medium	SO010
CO005	Official materials frame Goodfire around two linked pillars: intentional design of models and scientific discovery from model internals.	Medium	SO004, SO005, SO008
CO006	Lightspeed publicly announced Goodfire’s $7 million seed round on August 15, 2024, showing the company was operating by mid-2024.	Medium	SO022, SO023
CO007	Series A materials say the $50 million round came less than one year after Goodfire’s founding, which supports a 2024 founding window.	Medium	SO018, SO020, SO021
CO008	One independent profile describes Goodfire as founded in 2023, creating a conflict with the 2024 founding window implied by financing materials.	Medium	SO028
CO009	Goodfire’s careers page says all roles are full-time and in person five days a week at a Telegraph Hill office in San Francisco.	Medium	SO003
CO010	Eric Ho is Goodfire’s CEO and primary public spokesperson in financing and media materials.	Medium	SO014, SO018, SO029
CO011	Daniel Balsam is publicly identified as Goodfire’s cofounder and CTO.	Medium	SO009, SO024, SO030
CO012	Tom McGrath is publicly identified as Goodfire’s cofounder and chief scientist, and partner materials credit him with founding DeepMind’s interpretability team.	Medium	SO024, SO030
CO013	Goodfire and third-party coverage say the team includes researchers or engineers from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego.	Medium	SO004, SO014, SO017
CO014	Investor materials tie Eric Ho and Daniel Balsam to prior operating work at RippleMatch, supporting the claim that the founding team combines startup execution with research pedigree.	Medium	SO021, SO022, SO024
CO015	Reviewed public materials do not disclose a full board roster or a complete executive team beyond the founders and a few named researchers.	Medium	SO002, SO024, SO025
CO016	Goodfire announced a $50 million Series A led by Menlo Ventures with Lightspeed, Anthropic, B Capital, Work-Bench, Wing, and South Park Commons participating.	High	SO018, SO019, SO020, SO021, SO026
CO017	Lightspeed says it led Goodfire’s $7 million seed round in August 2024.	Medium	SO022, SO023
CO018	Goodfire announced a $150 million Series B at a $1.25 billion valuation led by B Capital with Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, Eric Schmidt, and others participating.	High	SO004, SO014, SO015, SO016, SO017, SO029
CO019	The Series B was announced less than a year after Goodfire’s Series A.	Medium	SO014, SO015, SO004
CO020	Goodfire and third-party coverage describe the company as having raised more than $200 million in total funding after the Series B.	Medium	SO004, SO014, SO016
CO021	Adding the publicly disclosed seed, Series A, and Series B rounds implies roughly $207 million of total disclosed capital.	Medium	SO022, SO018, SO014
CO022	Reviewed public sources do not disclose debt financing, secondary transactions, ownership percentages, or board-seat allocations for Goodfire’s financings.	Medium	SO014, SO018, SO025
CO023	Salesforce Ventures’ investment materials frame Goodfire as foundational enterprise AI infrastructure rather than only a research project.	Medium	SO024, SO025
CO024	Goodfire’s public product branding shifted from Ember in 2025 financing materials to Silico in 2026 product materials.	Medium	SO018, SO020, SO007, SO029
CO025	Goodfire says it reduced hallucinations in a large language model by about half using interpretability-informed training.	Medium	SO004, SO027
CO026	Official materials name Prima Mente, Arc Institute, Mayo Clinic, and Microsoft as partners or collaborators.	Medium	SO004, SO008, SO009, SO011
CO027	The Mayo Clinic collaboration explicitly discloses that Mayo Clinic has a financial interest in the technology referenced in the announcement.	Medium	SO009
CO028	Goodfire’s public commercial proof remains broad and category-based because it names customer types but does not list many named enterprise customers or contract counts.	Medium	SO010, SO028
CO029	Goodfire should be classified as a private Series B-stage company based on investor profiles labeling it private and the February 2026 financing history.	Medium	SO025, SO030, SO014
CO030	Goodfire’s best-supported current public valuation is $1.25 billion.	High	SO004, SO014, SO015, SO016, SO017
CO031	Goodfire’s best-supported public total capital figure is above $200 million.	Medium	SO004, SO014, SO016, SO022, SO018
CO032	No reviewed public source discloses Goodfire’s revenue, ARR, or customer count.	Medium	SO004, SO014, SO025
CO033	No official source reviewed discloses employee headcount, but one independent profile estimates Goodfire had about 51 employees as of January 2026.	Low	SO003, SO028
CO034	Reviewed public sources identify only a single disclosed office location in San Francisco and do not name other offices.	Medium	SO003, SO025
CO035	The public milestone arc visible in reviewed sources runs from seed financing in August 2024 to Series A in April 2025 and Series B in February 2026.	Medium	SO022, SO020, SO004
CO036	Goodfire’s September 2025 Mayo Clinic announcement shows the company expanding from interpretability tooling into healthcare and genomic medicine partnerships.	Medium	SO009
CO037	By February 2026 Goodfire was publicly describing partnerships spanning AI agents and life sciences.	Medium	SO004, SO015
CO038	MIT Technology Review reported on April 30, 2026 that Goodfire was commercially releasing Silico as a fee-based tool for model debugging and steering.	Medium	SO027
CO039	MIT Technology Review quoted an outside interpretability researcher saying Goodfire is adding “precision to the alchemy” rather than making model design fully principled.	Medium	SO027
CO040	An independent health-tech analysis argues the $1.25 billion valuation is aggressive for a research-first company with early commercial traction and an estimated 51 employees.	Medium	SO028
CO041	Goodfire’s public materials show active field-building and recruiting through a fellowship program, Stanford guest lectures, and ongoing in-person hiring in 2025-2026.	Medium	SO003, SO012, SO013
CM001	Goodfire positions itself as an interpretability lab focused on understanding and intentionally designing AI rather than only monitoring outputs.	High	SM001, SM007
CM002	Silico is described as a model design environment for training and debugging models on Goodfire infrastructure.	High	SM003, SM007
CM003	Goodfire says it partners with organizations training or fine-tuning foundation models across architectures and modalities.	High	SM003, SM004, SM005, SM006
CM004	Goodfire claims its language-model workflow cut hallucinations by 58% without degrading benchmark performance and at about 90x lower cost than LLM-as-a-judge.	Medium	SM004
CM005	Goodfire publicly markets use cases across language models, genomics, and robotics or vision instead of only text-model applications.	High	SM004, SM005, SM006
CM006	Goodfire says it works with partners such as Arc Institute, Mayo Clinic, and Microsoft and uses a shared environment with customers.	Medium	SM007
CM007	Goodfire publicly describes inference-time monitors and production monitoring as part of its intentional-design platform.	High	SM001, SM007
CM008	Goodfire argues that black-box prompting and fine-tuning are inadequate for reliable high-stakes AI engineering and that feature steering can substitute for some fine-tuning work.	Medium	SM008, SM009
CM009	Goodfire's pilot agreement starts with internal evaluation of software plus services and explicitly aims toward a later commercial license.	Medium	SM014
CM010	The pilot agreement requires customer cooperation, access to software or equipment, and designated contacts, implying a high-touch delivery model.	Medium	SM014
CM011	Prima Mente used Goodfire to decode an epigenomics model for biomarker discovery and model redesign, showing a plausible scientific-AI buyer archetype.	High	SM005, SM015
CM012	Goodfire and Mayo frame interpretability as a way to validate model predictions, reduce spurious correlations, and improve scientific or clinical relevance under governance controls.	High	SM005, SM010
CM013	MIT Technology Review says Goodfire is one of a small handful of companies pioneering mechanistic interpretability and that frontier labs already have internal interpretability teams.	Medium	SM030
CM014	MIT Technology Review says Silico is most usable where customers can access model internals, which is easier for open-source or in-house models than for closed models like ChatGPT or Gemini.	High	SM003, SM030
CM015	MIT Technology Review reports that Goodfire will price Silico case-by-case instead of publishing standard pricing.	Medium	SM030
CM016	Gartner says generative-AI ROI varies widely by use case and that hidden costs such as compliance reviews, retraining, and internal overhead can exceed initial expectations.	Medium	SM016
CM017	Gartner places generative AI in the 2025 Trough of Disillusionment, which implies more cautious implementation expectations even as interest remains high.	Medium	SM016
CM018	NIST's AI Risk Management Framework treats trustworthy, governable AI as a prerequisite for adoption in higher-risk settings.	Medium	SM018
CM019	PwC reports that AI-exposed industries have 3x higher revenue-per-worker growth since 2022 and workers with AI skills command a 56% wage premium.	Medium	SM017
CM020	Goodfire's relevant market boundary is narrower than broad generative-AI narratives and should focus on model design, interpretability, and model-behavior tooling for teams that can inspect or modify internals.	High	SM001, SM003, SM014, SM030
CM021	The included spend pool covers representation analysis, failure diagnosis, steering, interpretable training feedback, and production monitors, while excluding generic AI hardware, generic copilots, and pure app-performance monitoring.	High	SM003, SM007, SM023
CM022	Arize, Fiddler, Datadog, LangSmith, Langfuse, Patronus, Arthur, and Humanloop show that tracing, evaluation, monitoring, and agent control are already recognized software categories.	High	SM019, SM021, SM023, SM024, SM025, SM027, SM028, SM029
CM023	Those adjacent platforms mostly observe prompts, traces, sessions, and outputs, whereas Goodfire's differentiation claim is control over internal features, parameters, or latent representations.	High	SM003, SM011, SM013, SM019, SM021, SM024, SM025
CM024	Arize sells from free or open-source tooling to a $50-per-month Pro plan and custom enterprise pricing, showing the adjacent observability layer already has self-serve pricing and startup programs.	High	SM019, SM020
CM025	Fiddler publishes a developer price of $0.002 per trace and markets enterprise guardrails, observability, and governance as one platform.	High	SM021, SM022
CM026	Langfuse publishes prices from free to $29 per month Core, $199 per month Pro, and $2,499 per month Enterprise, with enterprise security and support features.	High	SM025, SM026
CM027	Humanloop markets enterprise evaluation tooling with a free trial, 50 eval runs, and 10,000 logs per month, reinforcing that adjacent budgets often begin with workflow tooling rather than custom research engagements.	Medium	SM029
CM028	Goodfire's direct market reach is highest in frontier labs because they already run interpretability teams, possess model internals, and value precise control over training and behavior.	Medium	SM003, SM007, SM030
CM029	Enterprise model teams are reachable when they train or fine-tune proprietary or open-weight models, but teams using only closed APIs are outside Goodfire's near-term reach.	Medium	SM003, SM009, SM014, SM030
CM030	Scientific-AI teams in genomics, biology, and robotics are attractive because model internals can reveal domain mechanisms, improve generalization, and validate whether predictions rely on real structure or shortcuts.	High	SM005, SM006, SM010, SM012, SM015
CM031	Regulated adopters have strong need for interpretability and trustworthy AI, but procurement and deployment cycles are slower because governance, privacy, and evidence standards are higher.	High	SM010, SM017, SM018
CM032	Goodfire's adoption motion likely starts with a pilot or design-partner evaluation, then requires model and data access, interpretability work, and only later expands to production monitoring and longer-term licensing.	High	SM003, SM007, SM014
CM033	In this market the buyer, user, and payer often differ, with research or platform leaders buying, model scientists and safety teams using, and AI R&D or platform budgets paying.	Medium	SM002, SM003, SM014, SM030
CM034	The category grows as models take on higher-stakes tasks in health, science, finance, and autonomous agent workflows where output-only evaluation is insufficient.	High	SM005, SM010, SM021, SM023, SM024
CM035	Agent-observability vendors frame autonomous decisions, guardrails, and repeatable evaluation as business-critical, which expands the adjacent budget pool that Goodfire can sell into or alongside.	High	SM021, SM022, SM023, SM024, SM025, SM027
CM036	Dependence on model-internal access is a major constraint because Goodfire's tooling requires deeper access than teams using only hosted closed-model APIs can usually provide.	High	SM003, SM014, SM030
CM037	Goodfire presents interpretability as precision engineering that can turn training into intentional design.	Medium	SM007, SM008
CM038	MIT Technology Review quotes an external researcher saying Goodfire is adding precision to alchemy, which challenges the precision-engineering narrative.	Medium	SM030
CM039	Goodfire's own intentional-design essay says the agenda is at the beginning of a deep technical tree and still needs better interpretability tools and algorithms.	Medium	SM008
CM040	Goodfire's parameter-decomposition research says current interpretability methods still struggle to map model behavior cleanly to underlying parameters and circuits, which reinforces technical immaturity.	Medium	SM013
CM041	Goodfire's manifold-steering research argues that linear steering often mismatches model geometry and that geometry-aware steering works better, suggesting the technical edge is not commodity tracing.	Medium	SM011
CM042	Goodfire's Evo 2 work shows interpretability can reveal biologically relevant features and possibly guide DNA generation, supporting a scientific-AI market lens beyond enterprise copilots.	High	SM005, SM012
CM043	Goodfire says customer conversations show teams prioritize rapid iteration and migration to newer models over heavy fine-tuning, which implies demand for lighter-weight control tooling.	Medium	SM009
CM044	Public adjacent pricing creates a floor for what buyers expect to pay for observability and eval tooling, but Goodfire's undisclosed case-by-case pricing means it must win on higher-value model-internal outcomes rather than commodity traces.	High	SM020, SM022, SM026, SM029, SM030
CM045	Because Goodfire has no public pricing schedule, customer count, or disclosed recurring revenue, a defensible TAM, SAM, or SOM cannot be computed from public evidence alone.	High	SM014, SM030
CM046	The most evidence-backed near-term SOM is a small set of frontier labs, advanced enterprise model teams, and scientific model builders willing to grant model access and buy services-heavy pilot engagements.	High	SM003, SM005, SM006, SM014, SM030
CM047	Published self-serve observability prices imply an annual software band of roughly $348 to $2,388 before enterprise add-ons or heavy usage.	High	SM020, SM026
CM048	Public list pricing shows adjacent enterprise-grade observability software can reach at least about $29,988 per year before overage charges or custom services.	Medium	SM026
CM049	Fiddler's per-trace pricing implies annual monitoring spend can range from hundreds to tens of thousands of dollars depending on trace volume.	Medium	SM022
CP001	Goodfire positions Silico as the first platform for intentional model design and as a workspace for training and debugging models at frontier scale.	Medium	SP001
CP002	Goodfire says its language-model workflow predicts failures before deployment and can correct failure modes directly without retraining from scratch.	High	SP001, SP002
CP003	Goodfire extends the same model-internal workflow into life sciences and robotics/vision use cases, not just generic chat applications.	High	SP003, SP004
CP004	Goodfire explicitly frames feature steering as an alternative to black-box prompting and fine-tuning workflows.	Medium	SP005
CP005	Goodfire disclosed a $150 million Series B at a $1.25 billion valuation and third-party coverage describes roughly $209 million raised in total.	High	SP006, SP008
CP006	MIT Technology Review describes Goodfire as one of a small handful of mechanistic-interpretability pioneers alongside Anthropic, OpenAI, and Google DeepMind.	Medium	SP007
CP007	MIT Technology Review says frontier labs already have internal interpretability teams, making them Goodfire's closest direct incumbent alternative for top-end model builders.	Medium	SP007
CP008	MIT Technology Review says Silico is most useful where customers can inspect a model's inner workings, limiting its applicability on closed models such as ChatGPT or Gemini.	Medium	SP007
CP009	Outside researcher Leonard Bereska told MIT Technology Review that Goodfire may be adding precision to existing AI alchemy rather than fully turning model building into engineering.	Medium	SP007
CP010	On Healthcare Tech characterizes Goodfire as a roughly 51-person, research-first organization whose $1.25 billion valuation looks aggressive relative to disclosed commercial traction.	Medium	SP008
CP011	Goodfire's probe-based data-attribution work claims a 63% reduction in harmful behavior after filtering flagged data and larger reductions after swapping labels or removing responsible sources.	Medium	SP009
CP012	Goodfire says SAE probes for Rakuten AI agents generalized better than other probes on PII detection and were cheaper than LLM-as-judge baselines.	Medium	SP010
CP013	Goodfire's Llama 3 research preview claims it can extract modifiable internal features and steer behavior while minimizing performance degradation.	Medium	SP011
CP014	Goodfire's VPD explainer says direct edits to decomposed parameter subcomponents can produce precise behavior changes without retraining.	Medium	SP012
CP015	Goodfire says its self-correcting-search collaboration increased the number of viable candidate materials by about 30%, supporting its claim that mechanistic tools can affect model behavior in non-LLM domains.	Medium	SP013
CP016	Goodfire's own reasoning-theater research argues that chain-of-thought can be unfaithful to internal computation, which weakens the claim that trace-level reasoning alone is enough for debugging.	Medium	SP014
CP017	Arize Phoenix markets an open-source platform for agent development and evaluation built around tracing, evals, datasets, and experiments.	Medium	SP015
CP018	Arize prices AX Pro at $50 per month with 50k spans and 10 GB, while enterprise packaging is custom and can be self-hosted.	Medium	SP016
CP019	Fiddler positions itself as a unified AI observability and security platform with lifecycle evaluation, monitoring, and real-time guardrails.	Medium	SP017
CP020	Fiddler publishes a free tier and a Developer plan priced at $0.002 per trace, with enterprise deployment options spanning SaaS, VPC, and on-premise.	Medium	SP018
CP021	Arthur markets a full-lifecycle platform for reliable AI that combines continuous evals, policies, guardrails, dashboards, and oversight.	Medium	SP019
CP022	Datadog ties agent observability to its broader application-monitoring estate and says teams can test prompt, model, and tool changes against production data in one workflow.	Medium	SP020
CP023	Datadog publishes a free tier up to 40K LLM spans per month and a Pro plan starting at $160 per month for 100K spans, with no separate evaluation fee.	Medium	SP020
CP024	LangSmith markets agent observability with framework-agnostic SDKs and says it has a free tier while paid plans scale with trace volume.	Medium	SP021
CP025	Langfuse markets an open-source AI engineering platform, self-hosting, OpenTelemetry compatibility, 10+ billion observations per month, and more than 100,000 engineers building on it.	Medium	SP022
CP026	Langfuse publishes transparent self-serve monthly pricing: free Hobby, $29 Core, $199 Pro, and $2,499 Enterprise, plus a unit-based overage ladder.	Medium	SP023
CP027	Humanloop historically sold enterprise tools to develop, evaluate, and ship trustworthy LLM apps, including private deployment options and a free trial.	Medium	SP024
CP028	Humanloop is now joining Anthropic and sunsetting its platform, so it is better read as a consolidation signal than as a durable stand-alone peer.	Medium	SP025
CP029	Weights says its products and services were wound down after its team joined OpenAI, reinforcing the pattern that AI tooling teams can be absorbed by frontier labs.	Medium	SP026
CP030	NIST's AI RMF and Gartner's GenAI guidance both emphasize trustworthiness, governance, evaluation, and hidden operating costs in sensitive AI deployments.	High	SP027, SP028
CP031	Goodfire's closest direct alternatives are internal frontier-lab interpretability teams and advanced in-house build paths, not ordinary tracing vendors.	High	SP001, SP007, SP015, SP021
CP032	Most adjacent vendors in the reviewed set compete at the trace, eval, guardrail, or governance layer rather than through direct edits to learned model features.	High	SP017, SP019, SP020, SP021, SP022
CP033	Goodfire appears best aligned to buyers building or adapting open-weight models in high-stakes domains where pre-deployment diagnosis matters more than general observability breadth.	Medium	SP002, SP003, SP004, SP007
CP034	Datadog, LangSmith, and Langfuse have stronger visible developer distribution than Goodfire because they ride existing observability, framework, or open-source workflows.	Medium	SP020, SP021, SP022
CP035	Fiddler and Arthur compete more directly with governance- and trust-led procurement because they explicitly emphasize guardrails, policies, monitoring, and enterprise oversight.	Medium	SP017, SP018, SP019
CP036	Goodfire's public commercial disclosure is thinner than that of Arize, Fiddler, Datadog, and Langfuse because MIT Technology Review describes Silico pricing as case-by-case and Goodfire declined to provide specifics.	Medium	SP007, SP016, SP018, SP020, SP023
CP037	Free or low-cost adjacent tools put price pressure on any attempt to sell Goodfire as a generic AI engineering or observability layer instead of a differentiated model-design product.	Medium	SP015, SP016, SP020, SP022, SP023, SP024
CP038	Category consolidation is already visible through Humanloop's move to Anthropic and Weights' move to OpenAI, which raises the risk that interpretability adjacencies become features inside larger labs or stacks.	Medium	SP025, SP026
CP039	Goodfire's moat is strongest if its research outputs in steering, attribution, probes, and domain science can be productized into repeatable workflows rather than bespoke research wins.	Medium	SP006, SP009, SP010, SP011, SP013
CP040	The public record does not yet show enough win-rate, realized-pricing, or retention evidence to underwrite Goodfire's competitive durability with high confidence.	Medium	SP007, SP008
CP041	The status-quo substitute for many buyers remains an in-house black-box stack of prompting, eval harnesses, fine-tuning, and guardrails, which is cheaper up front but less mechanistically explanatory.	Medium	SP005, SP015, SP021, SP024
CP042	Goodfire's partner access today looks more domain-credibility-led than platform-distribution-led: Microsoft, Mayo, Rakuten, and Radical-style collaborations support relevance but do not equal Datadog- or LangChain-style installed-base reach.	Medium	SP006, SP010, SP013, SP020, SP021
CP043	Humanloop packages enterprise LLM evals as a standalone platform, reinforcing that adjacent evaluation vendors compete for some of the same budgets Goodfire targets.	Medium	SP029
CI001	Goodfire announced a $150 million Series B round at a $1.25 billion valuation in February 2026.	High	SI002, SI016, SI017
CI002	Goodfire's 2026 Form D reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124.	Medium	SI028
CI003	Goodfire announced a $50 million Series A round in April 2025.	High	SI021, SI022, SI023, SI027
CI004	Goodfire's 2025 Form D reports $52,029,991 sold after a first sale on 2025-04-02.	Medium	SI027
CI005	At least $202,029,787 of equity sold across Goodfire's 2025 and 2026 Form D filings is directly verifiable from primary filing evidence.	High	SI027, SI028
CI006	Goodfire says the Series B proceeds will fund frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences.	High	SI002, SI016, SI018
CI007	Goodfire describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure.	High	SI001, SI003
CI008	Goodfire's product and vertical pages route prospects to request access or contact the company instead of publishing self-serve commercial pricing.	High	SI003, SI004, SI005, SI006, SI008
CI009	Goodfire's contact page says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.	Medium	SI008
CI010	Goodfire's Series B post says the company has partnered with Arc Institute, Mayo Clinic, and Microsoft to deploy its technology.	Medium	SI002
CI011	In the Prima Mente case study, Goodfire says its research scientists embedded with the customer and built a biomarker-discovery pipeline around the customer's model.	Medium	SI011
CI012	Goodfire's public contract terms show a commercial bundle that can include platform access plus support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables.	High	SI013, SI015
CI013	Goodfire's MSA and TOS place core commercial fees in negotiated order forms rather than in public documentation.	High	SI013, SI015
CI014	Goodfire's TOS explicitly contemplates overage charges when usage exceeds the allotment included in the applicable order form.	Medium	SI015
CI015	Goodfire's pilot agreement says pilot access is for internal evaluation and requires a separate commercial license for post-evaluation use.	Medium	SI014
CI016	Goodfire's TOS says usage reports provided through the platform dashboard or on request are the authoritative source for calculating Fees.	Medium	SI015
CI017	Goodfire's MSA says it will not use Customer Data to train foundation models or generalized machine-learning models for the benefit of Goodfire, other customers, or third parties.	Medium	SI013
CI018	Goodfire's TOS gives Goodfire a perpetual license to use Workflow Data to provide, improve, train, fine-tune, and commercialize the platform, subject to confidentiality constraints.	Medium	SI015
CI019	Goodfire's MSA and TOS allow suspension for overdue accounts and provide for late-payment interest of 1.5 percent per month.	High	SI013, SI015
CI020	Goodfire announced SOC 2 Type II compliance and a public SOC 3 report by February 2026.	Medium	SI010
CI021	Goodfire's official vertical pages target teams training or fine-tuning AI models across architectures and modalities rather than retail end users.	High	SI003, SI004, SI005, SI006
CI022	Goodfire's RLFR post claims a 58 percent reduction in hallucinations in Gemma-3-12B-IT at roughly 90 times lower cost than an LLM-as-a-judge alternative, with no degradation on standard benchmarks.	High	SI004, SI012
CI023	Goodfire's RLFR and life-sciences proof points are technical or scientific outcomes, not disclosed customer ROI or recognized revenue metrics.	Medium	SI011, SI012, SI026
CI024	Goodfire's feature-steering post says the SAE demo interface and API were deprecated in February 2026.	Medium	SI007
CI025	The deprecation of public preview tooling and the request-access posture together suggest Goodfire has shifted its public surface toward enterprise and custom deployments.	Medium	SI003, SI007, SI008
CI026	Goodfire's public evidence includes named life-sciences proof points with Prima Mente, Mayo Clinic, and Arc Institute.	High	SI002, SI005, SI011
CI027	Salesforce Ventures presents Goodfire as foundational enterprise infrastructure for understanding and intentionally designing modern AI.	Medium	SI025
CI028	Menlo Ventures says Goodfire is productizing Ember and commercializing model understanding, and notes that Eric Ho previously scaled a prior company to more than $10 million in ARR.	Medium	SI023
CI029	No reviewed public source discloses Goodfire's revenue, ARR, gross margin, cash balance, burn rate, runway, or customer retention metrics.	High	SI001, SI002, SI003, SI013, SI015, SI016, SI026
CI030	No reviewed public source discloses public list pricing, minimum commits, or discount ladders for Silico or related commercial offerings.	High	SI003, SI004, SI005, SI006, SI008, SI013, SI015
CI031	Because pricing is private and contracts are order-form based, Goodfire's realized pricing and software-versus-services revenue mix cannot be inferred from the official surface alone.	Medium	SI011, SI013, SI014, SI015
CI032	A skeptical sector analysis argues that Goodfire's $1.25 billion valuation is aggressive for a roughly 51-person company with early commercial traction and not yet a predictable SaaS business.	Medium	SI026
CI033	The same skeptical analysis argues that investors are underwriting Goodfire on research and platform option value rather than on publicly evidenced near-term software revenue.	Medium	SI026
CI034	Goodfire's Series B was announced less than a year after its Series A, showing capital access that scaled faster than disclosed operating metrics.	High	SI002, SI021, SI026
CI035	Goodfire's 2025 Form D lists 47 investors, while the 2026 Form D lists 19 investors.	High	SI027, SI028
CI036	Goodfire's 2026 Form D total offering amount exceeds the press-announced $150 million sold amount, implying possible residual allocation or additional financing capacity within the same offering.	Medium	SI016, SI028
CI037	No public debt facility or project-finance obligation surfaced in the reviewed sources, but the absence of disclosure should not be treated as proof of zero leverage.	Medium	SI013, SI015, SI016, SI026
CI038	Post-Series-B capital adequacy can only be assessed qualitatively: Goodfire is well funded relative to public stage signals, but runway cannot be modeled without cash and burn data.	Medium	SI016, SI026, SI028
CI039	Goodfire's public messaging implies a high-touch GTM motion centered on selective design partnerships rather than broad self-serve transaction volume.	Medium	SI002, SI008, SI015
CI040	Because Goodfire's customer evidence includes embedded scientific work and its terms contemplate field engineering and collaboration activities, at least some current revenue likely mixes software access with services delivery.	Medium	SI011, SI015
CI041	Goodfire publicly presents Radical AI as a materials-science design partner, supporting a commercialization path beyond language-model customers.	Medium	SI029
CE001	Silico is described as the first platform for intentional model design and as a model-design environment built on Goodfire infrastructure.	High	SE001, SE030
CE002	Silico markets five operator jobs around model internals: seeing inside predictions, running health checks, debugging failures, shaping behavior, and generalizing from less data.	Medium	SE001
CE003	Goodfire's current product motion is request-access and partnership-led for teams training or fine-tuning foundation models across architectures and modalities.	Medium	SE001, SE002, SE003, SE004
CE004	Goodfire's language-model workflow claims a 58% hallucination reduction with no degradation on performance benchmarks.	High	SE002, SE005
CE005	The same language workflow claims roughly 90x lower intervention cost than LLM-as-a-judge approaches.	Medium	SE002
CE006	The Hallucinations Viewer compares base and policy rollouts on LongFact++ and exposes intervention details for selected outputs.	Medium	SE005
CE007	Goodfire's life-sciences surface claims state-of-the-art pathogenicity prediction across 839k ClinVar variants.	High	SE003, SE015
CE008	Goodfire says EVEE provides interpretable predictions and explanations for all 4.2 million ClinVar variants.	High	SE003, SE015
CE009	Prima Mente and Goodfire identified DNA fragment length as a dominant Alzheimer's signal and distilled the finding into a human-readable classifier.	Medium	SE013, SE028
CE010	Goodfire says the Alzheimer's biomarker workflow generalized to an independent cohort.	High	SE003, SE013
CE011	Goodfire's robotics and vision surface says teams can predict failures before deployment by inspecting latent representations directly.	Medium	SE004
CE012	The robotics case study says Goodfire traced unstable behavior to brittle internal features and information bottlenecks.	Medium	SE004
CE013	Goodfire markets feature steering as stronger than prompting when prompt engineering hits diminishing returns.	Medium	SE006
CE014	Goodfire says feature steering can often replace fine-tuning for behavior changes but cannot add new knowledge to a model.	Medium	SE006
CE015	Goodfire deprecated its earlier SAE demo interface and API in February 2026.	Medium	SE006
CE016	MIT Technology Review reports that Silico uses agents to automate interpretability work that previously required human researchers.	High	SE009, SE030
CE017	MIT Technology Review reports that Silico is priced case-by-case and is easier to use on open-source models than on closed APIs.	Medium	SE030
CE018	Goodfire's intentional design thesis frames current AI training as guess-and-check and positions interpretability as closed-loop steering.	Medium	SE007
CE019	Goodfire says intentional design aims to change what models learn from individual datapoints rather than hard-wiring heuristics into models.	Medium	SE007
CE020	Goodfire says it released the first public sparse autoencoders trained on a true reasoning model, DeepSeek R1.	Medium	SE010
CE021	Goodfire's R1 work says effective steering had to begin after the model's boilerplate response prefix rather than at the first response token.	Medium	SE010
CE022	Goodfire reports that some R1 features revert toward original behavior under oversteering before outputs become incoherent.	Medium	SE010
CE023	Goodfire argues that important model concepts often live on curved manifolds rather than along single linear directions.	Medium	SE011, SE014
CE024	Goodfire's “Can SAEs Capture Neural Geometry?” release says a single sparse-autoencoder feature gives only a partial view of curved internal structure.	Medium	SE014
CE025	Goodfire says its manifold pipeline clusters sparse features to recover fuller geometric structure from internal representations.	Medium	SE014
CE026	Stochastic Parameter Decomposition moves interpretability into parameter space by learning which weight components can be removed without changing behavior.	Medium	SE017
CE027	Model diff amplification makes rare harmful behaviors 10 to 300 times more common in sampling, making them easier to detect.	Medium	SE016
CE028	Goodfire says model diff amplification can reveal post-training side effects after only a fraction of a training run.	Medium	SE016
CE029	Goodfire's eval-awareness study found naturally occurring verbalized eval awareness across all 19 benchmarks and 8 models it tested.	Medium	SE012
CE030	Goodfire says prompt rewrites reduced eval awareness by 40% and an unsupervised method reduced it by 75%, with safe-behavior rates also falling.	Medium	SE012
CE031	Paint With Ember uses a canvas that manipulates SDXL-Turbo internal activations instead of relying only on text prompts.	Medium	SE019
CE032	Goodfire's research surfaces and phylogeny work argue that internal geometry recapitulates structured concepts across language, image, and genomic models.	Medium	SE011, SE021
CE033	Goodfire's terms define the platform as software, APIs, tools, documentation, support, and services, and allow customers to bring models, files, datasets, code, and workflows into the platform.	Medium	SE022
CE034	Public terms tie commercial fees and overages to order forms and usage reports rather than to a public rate card.	Medium	SE022
CE035	The Pilot Agreement limits pilot use to internal evaluation and requires a separate commercial license after the evaluation period.	Medium	SE023
CE036	Goodfire's terms and pilot agreement both describe support, technical assistance, field engineering, research activities, and deliverables around the platform.	High	SE022, SE023
CE037	Goodfire's terms allow third-party products and permit access suspension for security, legal, operational, or overdue-account reasons.	Medium	SE022
CE038	Goodfire's terms say customers retain customer materials while Goodfire retains Goodfire IP and broad rights over usage data and licensed workflow data.	Medium	SE022
CE039	Goodfire announced SOC 2 Type II with no exceptions identified and a public SOC 3 summary.	Medium	SE008
CE040	Goodfire says its Mayo collaboration operates under rigorous data privacy protocols and Mayo Clinic governance frameworks.	Medium	SE027
CE041	NIST's AI RMF and generative-AI profile focus on embedding trustworthiness into AI design, development, use, and evaluation.	Medium	SE033
CE042	Gartner says GenAI total cost of ownership is often understated and that critical decision use cases require more robust and interpretable approaches.	Medium	SE034
CE043	Salesforce Ventures frames Goodfire as moving AI teams from guessing at behavior to measuring and shaping model intent and reasoning.	Medium	SE031
CE044	On Healthcare Tech argues that interpretability could become infrastructure for regulated health AI, but public commercialization evidence still looks early.	Medium	SE032
CE045	Public materials reviewed do not provide a public status page, self-serve API reference, or public deployment-count disclosure for Silico.	Medium	SE001, SE022, SE030
CE046	Careers, Stanford guest lectures, and the fellowship program show active research-engineering recruiting and practitioner education despite a limited public OSS product surface.	Medium	SE024, SE025, SE026
CE047	Goodfire's public 2026 output is research-led and fast, with published releases on May 4, May 20, and May 21 covering eval awareness and neural geometry.	Medium	SE012, SE014, SE020
CE048	PR Newswire says Series B proceeds will fund next-generation product development and partnership scaling across AI agents and life sciences.	Medium	SE038
CE049	EVEE combines Evo 2 embeddings, lightweight probes, and frontier reasoning models to generate human-readable hypotheses about variant effects.	Medium	SE015
CE050	Goodfire's phylogeny work says Evo 2 encodes tree-of-life relationships as a curved manifold, reinforcing its model-to-human knowledge-transfer thesis.	Medium	SE021
CE051	Menlo and Lightspeed both describe Goodfire as an applied research lab translating mechanistic interpretability into productized tooling.	Medium	SE035, SE036
CE052	PYMNTS reports that Goodfire uses a model design environment internally and forward-deploys that same environment with customers.	Medium	SE037
CU001	Goodfire publicly says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.	High	SU001, SU014
CU002	Goodfire positions Silico and related services for organizations training or fine-tuning foundation models across architectures and modalities.	High	SU001, SU002
CU003	Goodfire says it engages deeply and selectively with teams building high-stakes or frontier systems where understanding and control are essential.	Medium	SU014
CU004	Public product and contact copy imply buyers are research, platform, or product owners while day-to-day users are research scientists and ML engineers.	Medium	SU001, SU002, SU003, SU004, SU005
CU005	Goodfire's named proof set is concentrated in life sciences, AI-agent infrastructure, and materials discovery rather than a wide range of end markets.	Medium	SU003, SU006, SU008, SU010, SU011, SU013
CU006	Reviewed public sources do not disclose Goodfire's customer count.	High	SU001, SU014, SU019, SU020
CU007	Reviewed public sources do not disclose Goodfire's segment-level revenue or ARR by customer type.	High	SU014, SU020, SU022
CU008	The broad Fortune 500 adoption claim is not backed by a public list of named enterprise customers or outcomes.	High	SU001, SU014
CU009	Goodfire's public sales surface is request-access and contact-led rather than self-serve.	High	SU001, SU019
CU010	Prima Mente partnered with Goodfire to understand its Pleiades epigenomics model.	Medium	SU006, SU007
CU011	Goodfire says its researchers embedded in Prima Mente's team while building a biomarker-discovery pipeline.	Medium	SU006
CU012	Goodfire and Prima Mente identified a novel class of blood-borne biomarkers for Alzheimer's detection.	Medium	SU006, SU007, SU022
CU013	Prima Mente's public outcome remains pre-validation because the biomarkers are still undergoing experimental validation and a publication is forthcoming.	Medium	SU006, SU007
CU014	Goodfire collaborated with Arc Institute to interpret Evo 2, Arc's genomic foundation model.	Medium	SU010, SU025
CU015	The initial Arc Institute disclosure described feature discovery and steering work that was still in its early stages rather than a mature production deployment.	Medium	SU010
CU016	By March 2026, Goodfire's Evo 2 interpretability work had been updated to note its publication in Nature, which increases the scientific credibility of the Arc partnership.	Medium	SU010, SU020
CU017	Goodfire says its Mayo Clinic collaboration combines interpretability work with Mayo's medical AI team and established data-governance frameworks.	Medium	SU008
CU018	Public Mayo materials frame the work as genomic research and responsible-AI validation rather than routine clinical deployment.	Medium	SU008, SU009
CU019	Goodfire's EVEE work is described as part of its ongoing collaboration with Mayo Clinic and is still undergoing peer review.	Medium	SU009
CU020	Goodfire says EVEE achieves 0.997 AUROC on 839k ClinVar variants and provides predictions and explanations for all 4.2 million ClinVar variants.	Medium	SU003, SU009
CU021	Goodfire says EVEE outputs are computational predictions rather than diagnoses and require further expert review and validation.	Medium	SU009
CU022	Goodfire partnered with Rakuten on PII detection for multilingual AI-agent messages in a production-critical enterprise setting.	Medium	SU013
CU023	The Rakuten deployment required synthetic-to-real generalization, multilingual English and Japanese coverage, lightweight inference, and very high recall.	Medium	SU013
CU024	Goodfire says Rakuten deployed SAE probes and describes the system as the first known enterprise application of SAEs for language-model guardrails.	Medium	SU013
CU025	Among reviewed sources, Rakuten is the clearest public evidence of a production Goodfire deployment.	Medium	SU013, SU019
CU026	Goodfire and Radical AI publicly announced a partnership to apply interpretability to inverse materials design.	Medium	SU011, SU012
CU027	Goodfire says its self-correcting-search work with Radical AI improved successful candidates by about 27% and generated about 30% more SUN materials in the target range.	Medium	SU012
CU028	Radical AI's public partnership disclosure says more research directions and outcomes will be shared later, leaving commercialization maturity unclear.	Medium	SU011, SU012
CU029	Goodfire says it forward-deploys its model design environment with customers in a shared environment.	High	SU014, SU026
CU030	MIT Technology Review reports that Silico pricing is determined case by case and Goodfire declined to provide specific pricing.	Medium	SU019
CU031	MIT Technology Review says Silico could let smaller firms and research teams build or adapt open-source models without hiring interpretability researchers.	Medium	SU019
CU032	Goodfire's public positioning is selective and high-touch rather than high-volume self-serve SaaS.	High	SU001, SU014, SU019
CU033	Goodfire's Series B materials say the new funding will scale partnerships across AI agents and life sciences.	Medium	SU021, SU022, SU023, SU026
CU034	Salesforce Ventures frames Goodfire around enterprise AI ROI, reliability, and control problems.	Medium	SU017, SU018
CU035	The public proof set spans life sciences, AI-agent operations, materials science, and general frontier-model design.	High	SU003, SU011, SU013, SU014
CU036	Goodfire's public blog history shows named collaboration proof surfacing across 2025 and 2026 rather than through a single isolated announcement.	High	SU008, SU010, SU011, SU013, SU016
CU037	Goodfire's public materials distinguish broad segment claims from a much smaller set of named collaborators.	High	SU001, SU003, SU006, SU008, SU010, SU011, SU013
CU038	No reviewed public source disclosed NRR, GRR, churn, renewal rate, or true retention cohorts for Goodfire.	High	SU001, SU014, SU019, SU020
CU039	No reviewed public source disclosed contract length, commercial expansion metrics, or top-customer concentration for Goodfire.	High	SU001, SU014, SU020, SU022
CU040	Arc Institute and Mayo Clinic each have later public follow-on evidence after their initial collaboration announcements, indicating relationship continuity but not proving paid renewal.	Medium	SU008, SU009, SU010
CU041	The disclosed reference set is concentrated in a handful of named collaborators and is especially weighted toward life-sciences programs.	Medium	SU003, SU006, SU008, SU009, SU010, SU020
CU042	The broad Fortune 500 claim remains materially weaker than the named proof set because no enterprise names or outcomes are publicly disclosed.	Medium	SU001, SU014
CU043	MIT Technology Review quoted Leonard Bereska saying Goodfire is adding “precision to the alchemy,” a substantive critique of how principled the product really is.	Medium	SU019
CU044	On Healthcare Tech argued that Goodfire's $1.25 billion valuation looks aggressive for a research-first company with relatively early commercial traction.	Medium	SU020
CU045	On Healthcare Tech argued that the public valuation case relies more on platform option value than on disclosed revenue or customer metrics.	Medium	SU020
CU046	Several scientific customer outcomes remain partly hypothesis-stage because Prima Mente's biomarkers are still under validation and EVEE is still undergoing peer review.	Medium	SU006, SU007, SU009
CU047	Silico is most naturally usable where customers can inspect model internals, which may bias near-term adoption toward open-model teams and research labs over closed-model users.	Medium	SU002, SU019
CU048	Goodfire's continued publication of frontier interpretability results supports a customer narrative built on research credibility as much as on packaged software.	Medium	SU034
CU049	Mayo Clinic is a major medical institution, so Goodfire's disclosed collaboration carries meaningful signal for regulated-domain customer credibility.	High	SU035, SU007
CR001	Goodfire positions itself as a research company using interpretability to understand, learn from, and design AI systems rather than relying on scale alone.	Medium	SR006, SR022
CR002	Goodfire publicly argues that today's dominant AI-development process still cannot meaningfully understand, debug, or shape what models learn.	Medium	SR005, SR006
CR003	Goodfire says current model training is still a costly guess-and-check process and presents intentional design as an attempt to move from open-loop tweaking toward closed-loop control.	Medium	SR005
CR004	Goodfire also states that its techniques are early, the science is incomplete, and the hardest interpretability problems remain unsolved.	Medium	SR005, SR023
CR005	MIT Technology Review described Silico as potentially useful but quoted an external mechanistic-interpretability researcher saying Goodfire is adding precision to alchemy rather than fully turning model building into engineering.	Medium	SR013
CR006	Goodfire markets Silico as a model-design environment that can debug behavior, remove confounders, and diagnose failures before production, but access is still request-based rather than self-serve.	Medium	SR009
CR007	Goodfire claims its platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs, but it does not disclose how many of those users are production customers or what they pay.	Medium	SR008
CR008	Under the MSA and TOS, Goodfire only commits to support, service levels, implementation help, training, or professional services if those items are expressly defined in an order form.	Medium	SR001, SR003
CR009	The TOS says pilot, beta, trial, evaluation, or pre-release access may be modified, suspended, or discontinued at any time and, absent an order form, carries no service levels, support commitments, security commitments, or availability commitments.	Medium	SR003
CR010	Goodfire's default legal terms disclaim warranties that the platform or services will be uninterrupted, secure, accurate, complete, or error free.	Medium	SR001, SR003
CR011	Goodfire's aggregate liability is capped at fees paid in the prior twelve months under the MSA and TOS, while pilot-agreement liability is capped at pilot fees.	Medium	SR001, SR002, SR003
CR012	The TOS defines Usage Data broadly to include usage volumes, clickstream, logs, performance data, and error data, and classifies that Usage Data as Goodfire IP.	Medium	SR003
CR013	The TOS gives Goodfire a perpetual, irrevocable, sublicensable license to use Workflow Data to provide, improve, evaluate, train, and commercialize the platform, subject to promises not to identify the customer or reveal confidential information.	Medium	SR003
CR014	The MSA's feedback clause assigns customer feedback and related know-how to Goodfire without attribution or compensation.	Medium	SR001
CR015	Goodfire's public contracts require compliance with U.S. export and re-export restrictions and any necessary government approvals for cross-border use of the service or customer materials.	Medium	SR001, SR003
CR016	Goodfire says it is SOC 2 Type II compliant and directs customers to a trust portal for SOC 3 materials and full-report requests.	Medium	SR004
CR017	Goodfire says its Mayo Clinic collaboration operates under rigorous data-privacy protocols and Mayo's established data-governance frameworks.	Medium	SR011
CR018	Goodfire's Prima Mente case study says the customer needed to narrow model signals for experimental validation and FDA-approval progress, and that Goodfire identified a novel biomarker class through interpretability work.	Medium	SR010
CR019	On Healthcare argues Goodfire's 2026 valuation is aggressive relative to a roughly 51-person organization that still appears early in commercial traction and is funding green-field research alongside product work.	Medium	SR014
CR020	Goodfire's February 2026 Series B valued the company at $1.25 billion and brought total disclosed funding to just over $200 million.	Medium	SR014, SR015, SR021, SR022
CR021	On Healthcare reports Goodfire had about 51 employees as of January 2026.	Medium	SR014
CR022	Goodfire's careers page explicitly recruits people who thrive in fast-paced environments, signaling that the company is still building organizational depth while scaling.	Medium	SR007
CR023	On Healthcare says Eric Ho has argued there are probably fewer than 150 full-time interpretability researchers in the world, implying a tight labor pool for the company's core discipline.	Medium	SR014
CR024	MIT Technology Review says Silico pricing is determined case by case based on customer requirements and that Goodfire declined to disclose pricing specifics.	Medium	SR013
CR025	On Healthcare argues Goodfire is not yet a predictable SaaS business with clearly disclosed recurring revenue mechanics.	Medium	SR014
CR026	Goodfire's Prima Mente case study says Goodfire researchers embedded with the customer team and built the biomarker-discovery pipeline jointly.	Medium	SR010
CR027	Goodfire's TOS contemplates support, technical assistance, field engineering, research activities, collaboration activities, and deliverables alongside platform access.	Medium	SR003
CR028	Goodfire's May 2026 eval-awareness paper says verbalized eval awareness appeared across all 19 benchmarks and 8 models tested, with 515 manually verified instances.	Medium	SR024
CR029	The same paper says eval awareness correlates with safer behavior and can systematically overestimate model alignment if benchmarks do not account for it.	Medium	SR024
CR030	Goodfire reports that rewriting prompts cut verbalized eval awareness by 40 percent, that an unsupervised paraphrasing method cut it by 75 percent, and that refusal rates also dropped as awareness fell.	Medium	SR024
CR031	Goodfire's Reasoning Theater work says chain-of-thought text can be performative; on easier tasks models often know the answer early and generate superfluous reasoning that lags internal state.	Medium	SR025
CR032	Reasoning Theater also reports that probe-based early exit saved 68 percent of MMLU tokens and 33 percent of GPQA-Diamond tokens for DeepSeek-R1 while retaining more than 95 percent of baseline accuracy.	Medium	SR025
CR033	Goodfire's model-diff-amplification post says harmful or backdoored behaviors are often a needle-in-a-haystack problem that standard evaluations miss until after deployment.	Medium	SR030
CR034	Model diff amplification made harmful outputs 10x to 300x more frequent in testing and made a sleeper-agent backdoor about 100x easier to surface, but Goodfire says the method is only for detection and overstates real prevalence.	Medium	SR030
CR035	Goodfire's memorization-via-loss-curvature work says language models memorize substantial portions of training data and that many questions about how memories are stored or localized remain unresolved.	Medium	SR027
CR036	The same memorization work says suppressing memorization can preserve logical reasoning but degrade arithmetic and closed-book factual recall, showing that edits can trade off reliability across tasks.	Medium	SR027
CR037	Goodfire's SPD post argues that sparse autoencoders neither explain feature geometry nor converge to a single true decomposition as they scale, and it concedes that SPD still has non-trivial sensitivity and has only been validated on toy models.	Medium	SR026
CR038	Goodfire's neural-geometry post says a single SAE direction gives only a partial view of curved structure, so interpreting features one by one misses the global picture.	Medium	SR028
CR039	Goodfire's manifold-steering post says linear steering often mismatches internal geometry and can produce noisy, off-target effects.	Medium	SR029
CR040	Goodfire's scientific-model interpretability work argues interpretability can improve reliability and transparency in downstream applications, especially clinical domains, but extracting mechanisms from complex models remains challenging and valuable.	Medium	SR031
CR041	MIT Technology Review says interpretability tools like Silico could be essential for safety-critical applications in healthcare and finance, increasing the burden on Goodfire to prove deployment-grade trustworthiness rather than just interesting demos.	Medium	SR013
CR042	NIST says the 2024 generative-AI profile and the 2026 critical-infrastructure concept note are intended to guide organizations toward concrete AI risk-management practices and trustworthy-AI controls.	Medium	SR016
CR043	Gartner says enterprise GenAI outcomes depend heavily on data quality, governance, change management, realistic expectations, and talent availability.	Medium	SR017
CR044	Datadog markets a production stack that combines prompt testing, evaluations, tracing, monitoring, sensitive-data scanning, and enterprise controls for AI systems.	Medium	SR019
CR045	LangSmith markets observability, monitoring, hallucination debugging, and self-hosted or BYOC deployment options so sensitive traces can stay inside the customer environment.	Medium	SR020
CR046	On Healthcare and Goodfire's Mayo materials both frame healthcare deployment as blocked by the gap between model predictions and biological understanding, positioning interpretability as a compliance and validation bridge rather than only a developer tool.	Medium	SR011, SR014
CR047	Goodfire's public proof set is concentrated in named collaborations and case studies—Prima Mente, Mayo Clinic, Radical AI, and unnamed enterprise claims—rather than a broad list of disclosed production references.	Medium	SR008, SR010, SR011, SR012
CR048	The Radical AI partnership announcement says details on research directions and outcomes will be shared later, so one of Goodfire's flagship scientific partnerships is still forward-looking in public evidence.	Medium	SR012
CR049	PwC says healthcare AI adoption is slower than in other sectors and emphasizes risk-controlled adoption, which raises go-to-market friction for vendors selling into regulated clinical workflows.	Medium	SR018
CR050	Adjacent observability vendors already package evaluation, tracing, monitoring, and governance into production platforms, so Goodfire has to prove that interpretability delivers a distinct control layer rather than just another form of observability.	Medium	SR019, SR020
CR051	Salesforce Ventures argues enterprise AI buyers are increasingly constrained by unclear ROI and by an inability to steer models reliably and consistently, framing control and reliability as buyer pain rather than purely research interests.	Medium	SR032
CR052	Lightspeed framed Goodfire as critical infrastructure for explainable and mission-critical AI, explicitly tying future demand to regulation and to the need to productize interpretability for enterprises rather than only researchers.	Medium	SR033
CR053	Investing.com reported that Goodfire works with clients including Microsoft, Mayo Clinic, and Arc Institute and plans to use new capital for model improvement, compute, and hiring, which reinforces both the value of its partner roster and the execution demands that come with the raise.	Medium	SR034
CR054	Adjacent observability vendors already market tracing, monitoring, and workflow-debugging for AI agents, increasing substitution risk for parts of the budget Goodfire is targeting.	Medium	SR035
CR055	Datadog now packages agent observability inside a broader enterprise monitoring suite, which can pull AI-operations budget toward incumbent platforms.	Medium	SR036
CR056	Langfuse positions itself as an observability layer with open-source adoption, reinforcing price and workflow competition for AI development teams.	Medium	SR037
CR057	Langfuse publishes transparent pricing, which increases buyer expectations for standardized software packaging that Goodfire has not yet publicly matched.	Medium	SR038
CR058	LangSmith markets observability for AI agents and LLM applications, underscoring that adjacent tooling vendors can compete for the same developers and platform owners.	Medium	SR039
CR059	Weights' combination with OpenAI highlights consolidation risk in AI tooling, where platform vendors can absorb adjacent products before smaller specialists fully scale.	Medium	SR040
CR060	Mechanistic interpretability results still depend on advancing research rather than finished engineering playbooks.	Medium	SR041
CR061	Goodfire continues to publish foundational work on latent computation, underscoring that part of its edge still resides in experimental research rather than commoditized software.	Medium	SR042
CR062	Goodfire's ongoing publication cadence suggests platform differentiation remains tied to research velocity, which creates key-person and execution dependence if commercialization lags.	Medium	SR043
CR063	Goodfire's valuation and product narrative still depend on turning novel neural-geometry research into dependable commercial workflows, which keeps execution risk elevated.	Medium	SR044
CV001	Goodfire announced a $150 million Series B at a $1.25 billion valuation in February 2026.	High	SV001, SV002, SV012
CV002	B Capital led the Series B and the syndicate included Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, and Eric Schmidt.	Medium	SV001, SV002, SV003
CV003	Goodfire said the Series B came less than a year after its Series A.	Medium	SV002, SV003, SV006
CV004	Goodfire announced a $50 million Series A in April 2025 led by Menlo Ventures with Anthropic participating.	Medium	SV006, SV007
CV005	Public company and press-release materials imply that Goodfire has raised more than $200 million in total capital after the Series B.	High	SV001, SV002, SV006
CV006	Official and SEC materials identify Goodfire as a Delaware company founded in 2023 and based in San Francisco.	High	SV028, SV029, SV030
CV007	Goodfire describes itself as a public benefit corporation focused on interpretability to understand, learn from, and design AI systems.	Medium	SV002, SV030
CV008	The April 2025 Form D shows roughly $52.0 million sold against a roughly $52.1 million offering tied to the Series A financing.	Medium	SV028, SV006
CV009	The February 2026 Form D lists Yan-David Erlich among related persons and shows a $161.7 million offering amount tied to the Series B-era filing.	Medium	SV029, SV002
CV010	Goodfire positions Ember as its flagship model design environment and interpretability platform.	High	SV006, SV007, SV010
CV011	Goodfire says Ember is meant to give programmable access to internal model features so users can inspect, edit, and retrain behavior more precisely than black-box methods.	Medium	SV006, SV007, SV009
CV012	Goodfire says interpretability-guided training reduced hallucinations in a language model by roughly half.	Medium	SV001, SV002, SV010
CV013	Goodfire cites collaborators including Arc Institute, Mayo Clinic, Prima Mente, and Microsoft.	Medium	SV001, SV010
CV014	Goodfire says interpretability work surfaced a novel class of Alzheimer's biomarkers from Prima Mente's epigenetic model.	Medium	SV001, SV002, SV010
CV015	Goodfire announced SOC 2 Type II compliance with no exceptions identified in February 2026.	Medium	SV031
CV016	Goodfire continued publishing 2026 research across neural geometry, steering, parameter decomposition, and pooling methods.	Medium	SV022, SV024, SV025, SV027, SV031
CV017	Goodfire's Llama 3 research preview says it trained sparse autoencoders on Llama-3-8B and used causal feature interventions to steer outputs while minimizing degradation.	Medium	SV023
CV018	Goodfire's Geometric Calculator page says Llama 3.1 8B uses a general-purpose addition module that handles months, days, and arithmetic via circular representations.	Medium	SV024
CV019	Goodfire's Covariance Pooling page argues second-moment pooling outperforms mean pooling on downstream genomic tasks.	Medium	SV025
CV020	Goodfire's Painting With Concepts page shows interpretability tooling applied to SDXL-Turbo image generation, indicating modality expansion beyond text.	Medium	SV026
CV021	Goodfire's VPD explainer says the company decomposed a 67M-parameter model into simple pieces and used that structure to edit behavior without training.	Medium	SV027
CV022	Goodfire's product wedge sits deeper in the stack than observability vendors because it aims to intervene on model internals rather than only trace outputs or enforce guardrails.	Medium	SV009, SV019, SV020, SV021
CV023	Arize Phoenix positions itself around tracing, evals, and agent observability rather than model-internal design.	Medium	SV019
CV024	Fiddler positions its product around observability, guardrails, and governance for agents and predictive AI rather than model-internal representation editing.	Medium	SV020
CV025	LangSmith positions its product around tracing, monitoring, and clustering for agent behavior rather than model-internal steering.	Medium	SV021
CV026	Gartner says generative AI entered the Trough of Disillusionment on its 2025 Hype Cycle for Artificial Intelligence and that ROI depends on governance, change management, and full cost accounting.	Medium	SV017
CV027	NIST says the AI RMF and its generative AI profile exist to help organizations manage trustworthiness and AI risk across design, deployment, and evaluation.	Medium	SV018
CV028	The On Healthcare analysis says Goodfire raised $209 million across seed, Series A, and Series B and estimated the team at roughly 51 employees as of January 2026.	Medium	SV010
CV029	The On Healthcare analysis argues that the $1.25 billion valuation is aggressive for a research-first company with relatively early commercial traction.	Medium	SV010
CV030	TechCrunch's 2026 mega-round list places Goodfire, at a $1.25 billion valuation, among U.S. AI companies that raised $100 million or more in early 2026.	Medium	SV012
CV031	TechCrunch reported that Eric Schmidt's Hillspire invested directly in Goodfire as family offices and private wealth moved earlier into AI deals.	Medium	SV011
CV032	Anysphere was valued at $9.9 billion after surpassing $500 million in ARR.	Medium	SV013
CV033	Harvey was reportedly raising at $11 billion after reaching $190 million in ARR by the end of 2025.	Medium	SV014
CV034	Glean reached a $7.2 billion valuation after surpassing $100 million in ARR.	Medium	SV015
CV035	Anthropic was valued at $350 billion in April 2026 with up to $40 billion of Google investment and large compute commitments.	Medium	SV016
CV036	Unlike Anysphere, Harvey, and Glean, Goodfire's public round materials do not disclose revenue or ARR, so a comparable revenue multiple cannot be responsibly calculated from public evidence.	Medium	SV001, SV002, SV010, SV013, SV014, SV015
CV037	The current mark therefore looks like strategic option value on category leadership, research talent, and future platform commercialization rather than a fundamentals-backed software multiple.	Medium	SV001, SV009, SV010, SV017
CV038	Goodfire's strategic investor mix—Anthropic in Series A, Salesforce in Series B, and Eric Schmidt in Series B—supports the view that technically sophisticated buyers think interpretability will matter commercially.	Medium	SV006, SV009, SV011, SV002
CV039	Goodfire's market relevance is helped by enterprise pressure for explainability, governance, and reliable ROI in AI deployments.	Medium	SV009, SV017, SV018
CV040	Public evidence still does not disclose customer count, pricing, contract structure, retention, gross margin, or software-versus-services mix.	Medium	SV001, SV002, SV010
CV041	A plausible bull case requires proof that Goodfire is converting research credibility into repeatable software revenue and durable enterprise adoption.	Medium	SV009, SV017, SV019
CV042	Without that proof, a base case should haircut the last round and anchor below $1.25 billion because market demand is real but commercial evidence is incomplete.	Medium	SV010, SV017, SV013, SV014, SV015
CV043	A reasonable public-evidence bear case is a sub-$650 million outcome if commercialization stays bespoke, competitors absorb budget, or private AI multiples compress.	Medium	SV010, SV019, SV020, SV021, SV013
CV044	A reasonable public-evidence base case is roughly $800 million to $1.1 billion, implying the last round already prices in part of the bull thesis.	Medium	SV010, SV013, SV014, SV015, SV017
CV045	A reasonable public-evidence bull case is roughly $1.25 billion to $1.85 billion, which requires disclosed software revenue, strong design-partner conversion, and continued research and enterprise validation.	Medium	SV001, SV009, SV010, SV015
CV046	Given stage and disclosure opacity, another private round or strategic acquisition is a more plausible near-to-mid-term path than a public listing.	Medium	SV010, SV012, SV015, SV016
CV047	The most supportable current recommendation is research-more rather than buy because company-quality evidence exceeds pricing evidence.	Medium	SV001, SV010, SV017, SV018
CV048	The most supportable valuation stance is stretched: the $1.25 billion round sits at the lower bound of the bull-case range and above the top of the base-case range, not at the center of the base case.	Medium	SV010, SV013, SV014, SV015, SV017
CV049	Entry discipline should require NDA-gated disclosure of ARR or revenue, pricing, top-customer concentration, gross margin, and the post-Series-B preference stack before underwriting above the base-case range.	Medium	SV010, SV017, SV018
CV050	Thesis-break triggers include failure to disclose recurring revenue quality, inability to convert partners into repeatable platform customers, or evidence that observability vendors can satisfy budgets without Goodfire's deeper tooling.	Medium	SV009, SV019, SV020, SV021, SV026
CV051	Goodfire's valuation case depends partly on owning distinctive interpretability research that competitors may not easily replicate.	Medium	SV032
CV052	Goodfire continues to invest in foundational interpretability methods, which supports upside optionality but also means commercial value still depends on converting research into repeatable product adoption.	Medium	SV033
CV053	Goodfire's upside case still depends on scaling its interpretability research edge into a durable commercial moat before adjacent tooling categories commoditize around it.	Medium	SV034
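
The scenario ranges in CV043 to CV045 and the positioning of the last round against them reduce to simple interval arithmetic. The following is a minimal Python sketch of that check, not the report's valuation model. It assumes inclusive range bounds and treats the bear case as open-ended below $650 million; the names `SCENARIOS`, `scenarios_containing`, and `haircut_to` are illustrative.

```python
# Scenario ranges in USD millions, taken from CV043-CV045.
# Assumption: bounds are inclusive; the bear case is stated only as
# "sub-$650 million", so its lower bound is left open (None).
LAST_ROUND = 1250.0  # Series B valuation from CV001

SCENARIOS = {
    "bear": (None, 650.0),
    "base": (800.0, 1100.0),
    "bull": (1250.0, 1850.0),
}


def scenarios_containing(value):
    """Return the names of scenarios whose range includes value."""
    hits = []
    for name, (low, high) in SCENARIOS.items():
        if (low is None or value >= low) and value <= high:
            hits.append(name)
    return hits


def haircut_to(target, reference=LAST_ROUND):
    """Fractional markdown from reference needed to reach target."""
    return 1.0 - target / reference


if __name__ == "__main__":
    # Which scenario range does the last round fall into?
    print(scenarios_containing(LAST_ROUND))
    # Markdown from the last round implied by the base-case range.
    base_low, base_high = SCENARIOS["base"]
    print(f"{haircut_to(base_high):.0%} to {haircut_to(base_low):.0%}")
```

Run as a script, this places the $1.25 billion round in the bull range only and shows the base case implying a markdown of roughly 12 to 36 percent from the last round. Values between the stated ranges, such as $700 million, match no scenario, which reflects gaps in the ranges as written.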

Sources
ID	Publisher	Title	Quote
SO001	Goodfire	Goodfire homepage	Goodfire is a research company using interpretability to understand, learn from, and design AI systems.
SO002	Goodfire	Goodfire company page
SO003	Goodfire	Goodfire careers	All roles are full-time, in person five days a week at our San Francisco, Telegraph Hill office.
SO004	Goodfire	Our Series B	Today, we’re excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SO005	Goodfire	Intentionally Designing the Future of AI	At Goodfire, we’re developing the science and technology that lets us steer model training — a process we’re calling intentional design.
SO006	Goodfire	On optimism for interpretability	At Goodfire, we believe we can engineer frontier AI systems that are understandable.
SO007	Goodfire	Silico	The first platform for intentional model design.
SO008	Goodfire	Life Sciences	We partner with companies training foundation models across architectures and modalities to interpret their models.
SO009	Goodfire	Goodfire Announces Collaboration to Advance Genomic Medicine with AI Interpretability	Mayo Clinic has a financial interest in the technology referenced in this press release.
SO010	Goodfire	Goodfire contact	Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SO011	Goodfire	Prima Mente customer story	Goodfire’s platform for in silico science decoded their model, identifying a novel class of biomarkers for Alzheimer’s detection.
SO012	Goodfire	Fellowship Fall 25	We’re excited to announce that we’ll be bringing on several Research Fellows and Research Engineering Fellows this fall for our fellowship program.
SO013	Goodfire	AP293 guest lectures 25	We gave three guest lectures in Surya Ganguli’s course on interpretability at Stanford last fall.
SO014	PR Newswire	AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability	Today, Goodfire—the AI research lab using interpretability to understand, learn from, and design models—announced a $150 million Series B funding round at a $1.25 billion valuation.
SO015	Yahoo Finance	AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SO016	Pulse 2.0	Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab	The company has raised more than $200 million in total backing from a mix of venture firms and individual investors.
SO017	Tech Funding News	Goodfire raises $150M Series B at $1.25B valuation for interpretability AI
SO018	PR Newswire	Goodfire raises $50M Series A to advance AI interpretability research	This funding, which comes less than one year after its founding, will support the expansion of Goodfire’s research initiatives and the development of the company’s flagship interpretability platform, Ember.
SO019	Yahoo Finance	Goodfire raises $50M Series A to advance AI interpretability research
SO020	Goodfire	Announcing our $50M Series A	Today, we’re excited to announce a $50 million Series A funding round led by Menlo Ventures.
SO021	Menlo Ventures	Leading Goodfire’s $50M Series A to interpret how AI models think
SO022	Lightspeed Venture Partners	Goodfire: Building Interpretable AI	We at Lightspeed are thrilled to lead their $7M seed round.
SO023	Lightspeed Venture Partners	Goodfire company profile
SO024	Salesforce Ventures	Welcome, Goodfire	Goodfire was founded by Eric Ho, Daniel Balsam, and Thomas McGrath.
SO025	Salesforce Ventures	Goodfire company profile
SO026	VCNewsDaily	Goodfire Venture Capital Funding
SO027	MIT Technology Review	This startup’s new mechanistic interpretability tool lets you debug LLMs	In reality, they are adding precision to the alchemy.
SO028	OnHealthcare	Goodfire AI and the billion-dollar interpretability bet	The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SO029	PYMNTS	Goodfire raises $150 million to better understand AI
SO030	LSVP	Goodfire company page
SM001	Goodfire	Goodfire	Understand the scientific foundations of neural networks so that we can intentionally design AI.
SM002	Goodfire	Company | Goodfire	We engage deeply and selectively, partnering with teams building high-stakes or frontier systems where understanding and control are essential.
SM003	Goodfire	Silico | Goodfire	A model design environment.
SM004	Goodfire	Language | Goodfire	58% reduction in hallucinations by using features as rewards.
SM005	Goodfire	Life Sciences | Goodfire	Interpretability surfaced fragment length as the dominant predictive signal.
SM006	Goodfire	Robotics & Vision | Goodfire	Catch generalization failure before deployment.
SM007	Goodfire	Our Series B | Goodfire	We have built a model design environment ... to improve model behavior, and monitor them in production.
SM008	Goodfire	Intentional Design | Goodfire	Intentional design will be an advance in model creation similar to the difference between selective breeding and genetic engineering.
SM009	Goodfire	Feature Steering for Reliable and Expressive AI Engineering	Feature steering works well with fine-tuned models but also often makes fine-tuning unnecessary.
SM010	Goodfire	Mayo Clinic Collaboration | Goodfire	This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SM011	Goodfire	Manifold Steering | Goodfire Research	Representation steering ... promises lightweight, adaptable, and granular control of neural networks.
SM012	Goodfire	Interpreting Evo 2 | Goodfire Research	We discovered a wide range of features corresponding to sophisticated biological concepts.
SM013	Goodfire	Interpreting LM Parameters | Goodfire Research	This is not just a theoretical issue. It prevents us from achieving practical engineering goals.
SM014	Goodfire	Pilot Agreement | Goodfire	Customer will be allowed to test the Software and receive Services, with the aim of evaluating Goodfire's technology and considering a future long-term commercial relationship.
SM015	Goodfire / Prima Mente	Prima Mente Customer Story | Goodfire	Goodfire's interpretability platform ... turned their foundation model into an engine for biomarker discovery.
SM016	Gartner	Generative AI | Gartner	GenAI enters the Trough of Disillusionment on the 2025 Hype Cycle for Artificial Intelligence.
SM017	PwC	AI Jobs Barometer | PwC	Workers with AI skills command a 56% wage premium.
SM018	NIST	AI Risk Management Framework | NIST	AI Risk Management Framework.
SM019	Arize	Phoenix | Arize	The open-source platform for agent development and evaluation.
SM020	Arize	Pricing | Arize	AX Pro ... $50 per month.
SM021	Fiddler	AI Observability | Fiddler	Gain Complete Visibility from Development to Production.
SM022	Fiddler	Pricing | Fiddler	$0.002 per trace.
SM023	Datadog	LLM Observability | Datadog	Test prompt, model, and tool changes against real production data before rollout.
SM024	LangChain	LangSmith | LangChain	LangSmith Observability gives you complete visibility into agent behavior.
SM025	Langfuse	Langfuse	Langfuse brings observability, prompts, evals, experiments, and human annotation into one connected workflow.
SM026	Langfuse	Pricing | Langfuse	Enterprise ... $2499/month.
SM027	Patronus AI	Patronus AI	Evaluate agent effectiveness in tip-of-the-tongue moments.
SM028	Arthur	Arthur	Gain visibility and reliability of your model through continuous evals.
SM029	Humanloop	Pricing | Humanloop	Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps.
SM030	MIT Technology Review	This startup’s new mechanistic interpretability tool lets you debug LLMs	In reality, they are adding precision to the alchemy.
SP001	Goodfire	Silico | Goodfire	The first platform for intentional model design.
SP002	Goodfire	Language | Goodfire	Predict how your model will fail before deployment, not after.
SP003	Goodfire	Life Sciences | Goodfire	Trace predictive signal through interpretable features to confirm whether predictions rely on real biological structure or dataset artifacts and spurious correlations.
SP004	Goodfire	Robotics & Vision | Goodfire	Evaluate whether your model has learned real physical structure directly from the latent space, before generating a single frame.
SP005	Goodfire	Feature steering for reliable and expressive AI engineering	AI engineers often ask us how feature steering differs from prompting or fine-tuning.
SP006	Goodfire	Our Series B	Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SP007	MIT Technology Review	This startup's new mechanistic interpretability tool lets you debug LLMs	Goodfire is one of a small handful of companies, including industry leaders Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability.
SP008	On Healthcare Tech	Goodfire AI and the billion-dollar black box	The valuation jump from wherever it was at Series A to $1.25B at Series B is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SP009	Goodfire Research	Probe-based data attribution	Filtering out the data flagged by our probe reduces the harmful behavior by 63% without compromising general performance.
SP010	Goodfire Research	Rakuten: SAE probes for PII detection	We detail one of the first uses of sparse autoencoders (SAEs) with a production AI model - using SAE probes to detect personally identifiable information for Rakuten AI agents.
SP011	Goodfire Research	Understanding and steering Llama 3	We're releasing preview.goodfire.ai, a desktop interface to help you understand and steer Llama 3's behavior.
SP012	Goodfire Research	VPD explainer	We tried this and were able to make a precise and predictable change to the model's behaviour by directly editing the subcomponents, with no training required.
SP013	Goodfire Research	Self-correcting search	We were able to improve generation by giving a diffusion model a feedback loop from its own internals, resulting in ~30% more viable candidate materials in a target range.
SP014	Goodfire Research	Reasoning theater	Chain-of-thought reasoning is not always faithful to the model's internal computations.
SP015	Arize	Phoenix	The open-source platform for agent development and evaluation.
SP016	Arize	Pricing \| Arize	AX Pro ... $50 per month.
SP017	Fiddler AI	AI Observability \| Fiddler AI	Gain unified visibility, context, and control across agents and predictive applications.
SP018	Fiddler AI	Pricing \| Fiddler AI	$0.002 per trace.
SP019	Arthur	Arthur	The full lifecycle platform for ensuring reliable AI.
SP020	Datadog	LLM Observability \| Datadog	Free includes up to 40K LLM spans per month. Pro starts at $160 per month and includes 100K LLM spans.
SP021	LangChain	LangSmith	LangSmith has a free tier for development and small-scale production. Paid plans scale with trace volume.
SP022	Langfuse	Langfuse	Open Source AI Engineering Platform.
SP023	Langfuse	Pricing \| Langfuse	$29/month.
SP024	Humanloop	Pricing \| Humanloop	Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps.
SP025	Humanloop	Humanloop is joining Anthropic	As we sunset the Humanloop platform, we will continue to work closely with our customers to make their transition as smooth as possible.
SP026	Weights	Weights is joining OpenAI	As part of this transition, our products and services have been wound down and are no longer available.
SP027	National Institute of Standards and Technology	AI Risk Management Framework	The NIST AI Risk Management Framework (AI RMF) is intended for voluntary use and to improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.
SP028	Gartner	Generative AI	The total cost of ownership (TCO) for GenAI initiatives can often exceed initial expectations due to hidden costs such as compliance reviews, model retraining and internal overheads.
SP029	Humanloop	Humanloop: LLM evals platform for enterprises
SI001	Goodfire	Goodfire homepage
SI002	Goodfire	Understanding, Learning From, and Designing AI: Our Series B	Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SI003	Goodfire	Silico
SI004	Goodfire	Language
SI005	Goodfire	Life Sciences
SI006	Goodfire	Robotics & Vision
SI007	Goodfire	Feature Steering for Reliable and Expressive AI Engineering	Update (Feb 2026): Our SAE demo interface and API have been deprecated.
SI008	Goodfire	Contact	Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SI009	Goodfire	Careers
SI010	Goodfire	SOC 2 Type II compliant	We're excited to announce that Goodfire is SOC 2 Type II compliant.
SI011	Goodfire	Customer story: Prima Mente
SI012	Goodfire	RLFR: Reinforcement Learning from Feature Rewards	Overall, we reduce the hallucination rate by 58% across the held-out test set.
SI013	Goodfire	Master Services Agreement
SI014	Goodfire	Pilot Agreement
SI015	Goodfire	Silico Terms of Use
SI016	PR Newswire	AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SI017	Yahoo Finance	AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SI018	The SaaS News	Goodfire Raises $150 Million at $1.25 Billion Valuation
SI019	Pulse 2.0	Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab
SI020	Tech Funding News	Goodfire raises $150M Series B at $1.25B valuation
SI021	PR Newswire	Goodfire Raises $50M Series A to Advance AI Interpretability Research
SI022	Yahoo Finance	Goodfire Raises $50M Series A to Advance AI Interpretability Research
SI023	Menlo Ventures	Leading Goodfire's $50M Series A to Interpret How AI Models Think
SI024	VC News Daily	Goodfire Venture Capital Funding
SI025	Salesforce Ventures	Welcome, Goodfire
SI026	On Healthcare	Goodfire AI and the Billion-Dollar Black Box	The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SI027	SEC	Goodfire AI, Inc. Form D filing dated 2025-06-02
SI028	SEC	Goodfire AI, Inc. Form D filing dated 2026-02-09
SI029	Goodfire	Customer Story: Radical AI	We're excited to announce a new partnership between Radical AI and Goodfire to fundamentally dismantle the black box of AI-driven materials discovery and design.
SE001	Goodfire	Silico
SE002	Goodfire	Language
SE003	Goodfire	Life Sciences
SE004	Goodfire	Robotics & Vision
SE005	Goodfire	Hallucinations Viewer
SE006	Goodfire	Feature Steering for Reliable and Expressive AI Engineering
SE007	Goodfire	Intentionally Designing the Future of AI
SE008	Goodfire	Announcing our SOC 2 Type II Certification
SE009	Goodfire	You and Your Research Agent
SE010	Goodfire	Under the Hood of a Reasoning Model
SE011	Goodfire	The World Inside Neural Networks
SE012	Goodfire	Verbalized Eval Awareness Inflates Measured Safety
SE013	Goodfire	Interpretability for Alzheimer's Detection
SE014	Goodfire	Can SAEs Capture Neural Geometry?
SE015	Goodfire	EVEE: Explaining Genetic Variants
SE016	Goodfire	Model Diff Amplification
SE017	Goodfire	Stochastic Parameter Decomposition
SE018	Goodfire	Understanding Memorization via Loss Curvature
SE019	Goodfire	Painting with Concepts
SE020	Goodfire	The Shape of Stories Inside Neural Networks
SE021	Goodfire	Phylogeny Manifold
SE022	Goodfire	Silico Terms of Use
SE023	Goodfire	Pilot Agreement
SE024	Goodfire	Careers
SE025	Goodfire	AP293 Guest Lectures 25
SE026	Goodfire	Fellowship Fall 25
SE027	Goodfire	Announcing our Mayo Clinic Collaboration
SE028	Goodfire	Prima Mente Customer Story
SE029	Goodfire	Radical AI Partnership Announcement
SE030	MIT Technology Review	This startup's new mechanistic interpretability tool lets you debug LLMs
SE031	Salesforce Ventures	Welcome, Goodfire
SE032	On Healthcare Tech	Goodfire AI and the Billion-Dollar Black Box
SE033	NIST	AI Risk Management Framework
SE034	Gartner	Generative AI
SE035	Menlo Ventures	Leading Goodfire's $50M Series A to Interpret How AI Models Think
SE036	Lightspeed Venture Partners	Goodfire
SE037	PYMNTS	Goodfire Raises $150 Million to Better Understand AI
SE038	PR Newswire	AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability
SU001	Goodfire	Contact / early-access page	Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SU002	Goodfire	Silico product page
SU003	Goodfire	Life sciences page
SU004	Goodfire	Language page
SU005	Goodfire	Robotics / vision page
SU006	Goodfire	Prima Mente customer story	Goodfire’s research scientists embedded in Prima Mente’s team as they had finished training their model.
SU007	Goodfire	Interpretability for Alzheimer's detection	We detail how we studied Pleiades to identify fragmentomics as a novel class of biomarkers for Alzheimer’s detection.
SU008	Goodfire	Mayo Clinic collaboration announcement	This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SU009	Goodfire	EVEE: explaining genetic variants	Our pathogenicity probe achieves state-of-the-art performance (0.997 overall AUROC on 839k ClinVar variants).
SU010	Goodfire	Interpreting Evo 2	Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages.
SU011	Goodfire	Radical AI partnership announcement
SU012	Goodfire	Using self-correcting search to accelerate materials discovery	Applying self-correcting search improves targeting without harming SUN scores, leading to an overall ~27% increase in successful candidates.
SU013	Goodfire	Rakuten SAE probes for PII detection	As a result, Rakuten deployed the SAE probes - the first known enterprise application of SAEs for language model guardrails.
SU014	Goodfire	Series B announcement / customer positioning	We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment.
SU015	Goodfire	You and your research agent
SU016	Goodfire	Blog index
SU017	Salesforce Ventures	Goodfire company profile
SU018	Salesforce Ventures	Welcome Goodfire	Enterprise customers care more about the ROI they see from their AI investments than ever.
SU019	MIT Technology Review	This startup's new mechanistic interpretability tool lets you debug LLMs	In reality, they are adding precision to the alchemy.
SU020	On Healthcare	Goodfire AI and the billion-dollar bet on interpretability
SU021	Tech Funding News	Goodfire raises $150M Series B at $1.25B valuation
SU022	PR Newswire	AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability	This funding... will enable Goodfire to ... scale partnerships across AI agents and life sciences.
SU023	Yahoo Finance	AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SU024	Lightspeed Venture Partners	Goodfire company page
SU025	Menlo Ventures	Leading Goodfire's $50M Series A to interpret how AI models think	Patrick Hsu, co-founder of Arc Institute... said, “Their interpretability tools have enabled us to extract novel biological concepts that are accelerating our scientific discovery process.”
SU026	PYMNTS	Goodfire raises $150 million to better understand AI	We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment.
SU027	Goodfire	Research index
SU028	Goodfire	Radical AI customer story
SU029	Goodfire	Open problems in mechanistic interpretability
SU030	Goodfire	Belief dynamics in in-context steering
SU031	Goodfire	Mixing mechanisms
SU032	Goodfire	Replicating circuit tracing for a simple mechanism
SU033	Goodfire	Mapping latent spaces in Llama 3.3 70B
SU034	Goodfire	A Geometric Calculator
SU035	Mayo Clinic	About Mayo Clinic
SR001	Goodfire	Master Services Agreement	The Services are provided "as is" and Goodfire hereby disclaims all warranties.
SR002	Goodfire	Pilot Agreement	In no event will either Party's aggregate liability exceed the fees paid for the pilot.
SR003	Goodfire	Silico Terms of Use	Customer grants Goodfire a non-exclusive, worldwide, perpetual, irrevocable, royalty-free, sublicensable license to Workflow Data.
SR004	Goodfire	Goodfire is SOC 2 Type II compliant	We're excited to announce that Goodfire is SOC 2 Type II compliant.
SR005	Goodfire	Intentional design	The techniques are early, the science is incomplete, and the hardest problems remain unsolved.
SR006	Goodfire	Company	Our goal is to make AI that can be understood, debugged, and shaped like software.
SR007	Goodfire	Careers	If you thrive in fast-paced environments and believe that understanding AI systems is essential for our future, join us.
SR008	Goodfire	Contact	Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs.
SR009	Goodfire	Silico	Precisely debug issues with model behavior, identify and remove confounders, and diagnose failures before they occur in production.
SR010	Goodfire	Prima Mente customer story	Goodfire's research scientists embedded in Prima Mente's team and built out a biomarker discovery pipeline.
SR011	Goodfire	Mayo Clinic collaboration	This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks.
SR012	Goodfire	Radical AI partnership announcement	More details about specific research directions and outcomes will be shared as the partnership progresses.
SR013	MIT Technology Review	This startup’s new mechanistic interpretability tool lets you debug LLMs	In reality, they are adding precision to the alchemy.
SR014	On Healthcare	Goodfire AI and the billion-dollar interpretability bet	The valuation jump is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SR015	PYMNTS	Goodfire raises $150 million to better understand AI	The company's Series B funding round values Goodfire at $1.25 billion.
SR016	NIST	AI Risk Management Framework	The profile will guide critical infrastructure operators towards specific risk management practices to consider when engaging AI-enabled capabilities.
SR017	Gartner	Generative AI	The success of these implementations often hinges on the quality of data and the effectiveness of governance frameworks in place.
SR018	PwC	AI Jobs Barometer	In the Healthcare sector, AI adoption is happening slower than in other industries and risk-controlled adoption of this technology matters.
SR019	Datadog	LLM Observability / Agent Observability	Validate changes before rollout, monitor production health continuously, and scale AI programs with stronger governance and fewer surprises.
SR020	LangChain	LangSmith Observability	LLM observability platforms provide visibility into agent decisions and help debug complex failures and hallucinations.
SR021	Tech Funding News	Goodfire raises $150M Series B at $1.25B valuation	This lack of visibility makes AI hard to control, difficult to fix, and risky to deploy at scale.
SR022	Goodfire	Understanding, Learning From, and Designing AI: Our Series B	To that end, we've built a model design environment.
SR023	Goodfire	On optimism for interpretability	Models are complex systems, and understanding them is a genuine research challenge.
SR024	Goodfire	Verbalized eval awareness inflates measured safety	Unless safety benchmarks account for eval awareness, they may systematically overestimate model alignment.
SR025	Goodfire	Reasoning theater	Models genuinely reason through hard problems, but coast through easy ones while generating superfluous chain-of-thought.
SR026	Goodfire	Stochastic parameter decomposition	SPD isn't a complete solution.
SR027	Goodfire	Understanding memorization via loss curvature	The method is not yet mature and can be heavy-handed in its edits.
SR028	Goodfire	Can SAEs capture neural geometry?	A single line can only give us a partial view of curved geometric structure.
SR029	Goodfire	Manifold steering	Linear steering cuts across the behavior manifold and produces noisy, off-target effects.
SR030	Goodfire	Model diff amplification	Even if an undesired behavior normally occurs only once in a million samples, amplification lets us surface it with far fewer rollouts.
SR031	Goodfire	Phylogeny manifold	Interpretability can improve reliability and transparency for downstream applications, especially in clinical domains.
SR032	Salesforce Ventures	Welcome Goodfire	Enterprise customers care more about the ROI they see from their AI investments than ever and cannot steer AI models to behave reliably and consistently.
SR033	Lightspeed Venture Partners	Goodfire is building interpretable AI	As governments increasingly push regulation mandating explainable AI systems, enterprises will need to provide clear rationales for model behavior.
SR034	Investing.com	Goodfire raises $150 million to improve AI model understanding	The company works with clients including Microsoft Corp., the Mayo Clinic, and the nonprofit Arc Institute.
SR035	IBM	Think Topics: Model Observability
SR036	Datadog	Agent Observability \| LLM Observability \| Datadog
SR037	Langfuse	Langfuse
SR038	Langfuse	Pricing - Langfuse
SR039	LangChain	LangSmith: AI Agent & LLM Observability Platform
SR040	Weights	Weights is joining OpenAI
SR041	Goodfire	Priors in Time
SR042	Goodfire	A Geometric Calculator
SR043	Goodfire	Covariance Pooling
SR044	Goodfire	The Neural Geometry Series
SV001	Goodfire	Understanding, Learning From, and Designing AI: Our Series B	Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation.
SV002	PR Newswire	AI Lab Goodfire Raises $150M at $1.25B Valuation To Design Models With Interpretability	Goodfire... announced a $150 million Series B funding round at a $1.25 billion valuation.
SV003	Yahoo Finance	AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability
SV004	Pulse 2.0	Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab
SV005	Tech Funding News	Goodfire bags $150M at $1.25B to build AI interpretability infrastructure
SV006	PR Newswire	Goodfire Raises $50M Series A to Advance AI Interpretability Research	Today, Goodfire... announced a $50 million Series A funding round led by Menlo Ventures... to support... Ember.
SV007	Menlo Ventures	Leading Goodfire's $50M Series A to Interpret How AI Models Think
SV008	Lightspeed Venture Partners	Goodfire: Building Interpretable AI
SV009	Salesforce Ventures	Welcome Goodfire
SV010	On Healthcare	Goodfire AI and the Billion Dollar Black Box	The valuation jump... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction.
SV011	TechCrunch	The AI gold rush is pulling private wealth into riskier, earlier bets
SV012	TechCrunch	Here are the 17 U.S.-based AI companies that have raised $100M or more in 2026
SV013	TechCrunch	Cursor's Anysphere nabs $9.9B valuation, soars past $500M ARR
SV014	TechCrunch	Harvey reportedly raising at $11B valuation just months after it hit $8B
SV015	TechCrunch	Enterprise AI startup Glean lands a $7.2B valuation
SV016	TechCrunch	Google to invest up to $40B in Anthropic in cash and compute
SV017	Gartner	Generative AI
SV018	NIST	AI Risk Management Framework
SV019	Arize AI	Phoenix
SV020	Fiddler AI	AI Observability and Security
SV021	LangChain	LangSmith Observability
SV022	Goodfire Research	The Shape of Stories Inside Neural Networks
SV023	Goodfire Research	Understanding and Steering Llama 3
SV024	Goodfire Research	A Geometric Calculator
SV025	Goodfire Research	Covariance Pooling
SV026	Goodfire Research	Painting With Concepts
SV027	Goodfire Research	VPD Explainer
SV028	U.S. Securities and Exchange Commission	Form D for Goodfire AI, Inc. (Series A-era filing)	Goodfire AI, Inc.... DELAWARE... 2023
SV029	U.S. Securities and Exchange Commission	Form D for Goodfire AI, Inc. (Series B-era filing)	Yan-David Erlich
SV030	Goodfire	Company
SV031	Goodfire	SOC 2 Type II
SV032	Goodfire	The Neural Geometry Series
SV033	Goodfire	SAE Scaling with Feature Manifolds
SV034	Goodfire	SAE Scaling with Feature Manifolds