Goodfire
Interpretability-native model design lab with elite backing and still-unproven commercial scale
Goodfire looks like a category-defining interpretability company, but the public record still does not justify underwriting the February 2026 valuation as a clear bargain.
Cover facts
Company profile
Goodfire is a San Francisco-based AI interpretability company and public benefit corporation building a model-design environment for understanding, debugging, and steering neural networks. The company sells selective enterprise and research partnerships around Silico/Ember-style interpretability workflows for frontier model teams, healthcare and scientific AI programs, and other high-stakes deployments, but public disclosure still leaves revenue quality and customer breadth mostly opaque.
- Website: www.goodfire.ai
- Founded: 2024 (implied by financing timing; exact date not publicly confirmed, and one independent profile says 2023)
- Founders: Eric Ho, Daniel Balsam, Tom McGrath
- Founding location: San Francisco, California, USA
- Headquarters: San Francisco, California
- Product: Goodfire's product is a model-design environment that exposes model internals, helps diagnose failure modes, supports steering and monitoring, and is increasingly packaged around selective enterprise and scientific deployments.
- Customers: Frontier model builders, enterprise AI teams, life-sciences and scientific AI groups, and other high-stakes model developers.
- Business model: Selective design-partner and enterprise software engagements built around platform access, pilots, and high-touch research or field-engineering support.
- Stage: Series B private
- Funding status: $150 million Series B announced in February 2026 at a $1.25 billion valuation after earlier seed and Series A rounds.
Executive summary
Top strengths
- Goodfire has unusually strong research credibility and a differentiated interpretability-native product thesis.
- The cap table includes high-signal investors and strategic backers spanning frontier AI and enterprise software.
- Early flagship partnerships in healthcare, scientific AI, and enterprise design-partner workflows show real wedge potential.
Top risks
- Public disclosure still does not show ARR, revenue quality, standardized pricing, retention, or customer concentration.
- The company is valued as a future infrastructure winner before proving repeatable software economics.
- Adjacent observability, guardrail, and platform vendors may satisfy many buyer budgets without requiring Goodfire's deeper tooling.
Open gaps
- NDA-level disclosure is still needed on recurring revenue, pricing architecture, and software-versus-services mix.
- The post-Series-B preference stack, ownership structure, and any secondary or debt features remain undisclosed.
- Public materials do not provide a verified customer count, headcount, or concentration profile.
01 Company Overview
1.1 Identity, mission, and product positioning
Goodfire presents itself as a research company using interpretability to understand, learn from, and design AI systems, and multiple official and financing sources describe it as a San Francisco-based public benefit corporation. The company’s central thesis is that frontier AI is still built too much as a black box, so its mission is to make models understandable, debuggable, and shapeable rather than relying on scale alone. Official materials consistently frame the business around a “model design environment” that helps users inspect model internals, diagnose failure modes, and intervene on behavior at the feature or circuit level. The product story has matured over time. Series A materials in 2025 centered on Ember as Goodfire’s flagship interpretability platform, while by 2026 the public-facing product page markets Silico as the first platform for intentional model design. The go-to-market motion appears selective rather than mass-market: Goodfire says it works with Fortune 500 enterprises, major healthcare institutions, and AI research labs, and its public product copy repeatedly targets organizations training or fine-tuning foundation models. Public evidence therefore supports a company identity that combines research lab, platform vendor, and design-partner model, with customer concentration and commercial scale still largely undisclosed.[CO001, CO002, CO003, CO004, CO005, CO024]
| Metric | Value / status | Date | Confidence | Gap / note |
|---|---|---|---|---|
| Headquarters | San Francisco, California | 2026-06-10 | high | Official and investor materials agree; careers page specifies Telegraph Hill office |
| Organization type | Public benefit corporation | 2026-06-10 | high | Repeated in official and financing materials |
| Current stage | Private, Series B stage | 2026-06-10 | medium | Private status disclosed; stage inferred from latest financing |
| Latest round | $150M Series B | 2026-02-05 | high | Led by B Capital |
| Latest valuation | $1.25B | 2026-02-05 | high | Repeated across official and third-party coverage |
| Total disclosed capital | ~$207M; publicly rounded to >$200M | 2026-02-05 | medium | Sum of disclosed seed, A, and B rounds |
| Founding date | null | 2026-06-10 | low | 2024 is implied by seed timing and Series A language, but one independent profile says 2023 |
| Current product brand | Silico | 2026-04-30 | medium | Earlier 2025 materials used Ember; product naming evolved |
| Revenue / ARR | null | 2026-06-10 | low | No public revenue or ARR disclosed in reviewed sources |
| Customer count | null | 2026-06-10 | low | No public customer count disclosed; only broad customer categories named |
| Employee count | null | 2026-06-10 | low | No official headcount disclosed; one independent profile estimates ~51 employees as of Jan 2026 |
| Disclosed customer profile | Fortune 500 enterprises, major healthcare institutions, AI research labs | 2026-06-10 | medium | Named logos and contract counts remain sparse |
Null values mark unsupported public metrics rather than zero. Funding and valuation are well corroborated, while founding date, headcount, revenue, and customer count remain incomplete or indirect.
[CO001, CO003, CO004, CO008, CO009, CO020]
How Goodfire links research identity, product architecture, partner types, capital, and execution dependencies.
[CO002, CO003, CO004, CO005, CO024, CO026]
1.2 Founders, leadership, and organizational profile
The founding team publicly centers on three cofounders: Eric Ho as CEO, Daniel Balsam as CTO, and Tom McGrath as chief scientist. Across investor and company materials, Ho is the primary public spokesperson and strategy voice; Balsam appears as the technical operator translating interpretability into product and applied research; and McGrath supplies heavyweight scientific credibility as the former founder of Google DeepMind’s interpretability team. Menlo and Salesforce materials also tie Ho and Balsam to prior operating experience at RippleMatch, reinforcing the narrative that Goodfire mixes frontier-research pedigree with startup execution. The broader team profile is also part of the investment case. Goodfire and its backers highlight alumni from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego, plus named contributors such as Nick Cammarata and Leon Bergen. However, public leadership disclosure is incomplete: reviewed materials do not provide a full C-suite roster, detailed board composition, or ownership map. Even the founding date has some ambiguity. Financing materials imply the company was founded in 2024 because the Series A was said to arrive less than a year after founding and Lightspeed publicly announced a seed round in August 2024, but one independent profile describes Goodfire as founded in 2023. The office footprint disclosed publicly is also narrow: Goodfire’s careers page states roles are in person five days a week at its Telegraph Hill office in San Francisco.[CO006, CO007, CO008, CO009, CO010, CO011]
| Person | Role | Background | Founder-market fit / coverage | Key-person dependency |
|---|---|---|---|---|
| Eric Ho | CEO, co-founder | Former founder/operator at RippleMatch; public face of Goodfire in financing and media coverage | Sets company narrative, fundraising, partnerships, and commercial positioning for interpretability | High — external narrative and investor confidence are tightly linked to Ho |
| Daniel Balsam | CTO, co-founder | Former AI and engineering leader at RippleMatch; appears in Mayo and investor materials as technical operator | Bridges frontier interpretability research into product and applied genomics/enterprise use cases | High — core technical execution and productization sit heavily with Balsam |
| Tom McGrath | Chief Scientist, co-founder | Former founder of Google DeepMind’s interpretability team; repeatedly cited as scientific anchor | Supplies research credibility, agenda-setting, and technical recruiting power | High — scientific brand and category authority rely materially on McGrath |
| Nick Cammarata | Senior interpretability researcher / marquee team member | Core contributor to the seminal OpenAI interpretability team | Signals that Goodfire can recruit from the small global pool of top interpretability talent | Medium — not sole decision-maker, but valuable for research legitimacy |
Coverage is partial because reviewed sources do not disclose a full board, finance leadership, or complete management roster. Table focuses on publicly named founders and high-signal technical leadership.
[CO009, CO010, CO011, CO012, CO013, CO014]
1.3 Funding history, investor base, and current stage
Goodfire has raised capital unusually quickly for a research-first infrastructure company. Public sources show a $7 million seed round led by Lightspeed in August 2024, a $50 million Series A led by Menlo Ventures in April 2025, and a $150 million Series B led by B Capital at a $1.25 billion valuation in February 2026. The Series A syndicate also included Lightspeed, Anthropic, B Capital, Work-Bench, Wing, and South Park Commons, while the Series B brought in DFJ Growth, Salesforce Ventures, and Eric Schmidt as new investors alongside returning investors, among them Juniper Ventures. Goodfire and third-party coverage consistently round the cumulative funding to “over $200 million,” while simple addition of disclosed rounds implies roughly $207 million. The investor mix matters as much as the dollars. Anthropic’s participation in the Series A is a strategic signal from a safety-oriented frontier lab; Salesforce Ventures indicates an enterprise-software adoption angle; and B Capital’s lead role at Series B reflects a belief that interpretability may become a major infrastructure layer. Still, the public record is thin on ownership stakes, liquidation structure, debt, secondaries, and board seats. Public sources reviewed label Goodfire as private, and the company should be treated as a late-venture, Series B-stage private business rather than a scaled commercial software company. That distinction matters because the financing pace and valuation far outstrip the company’s disclosed revenue and customer metrics.[CO016, CO017, CO018, CO019, CO020, CO021]
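The "roughly $207 million" figure above is plain addition of the three disclosed rounds. A minimal Python sketch of that arithmetic follows; the dictionary keys and function name are illustrative assumptions, and the inputs are only the publicly stated round sizes, not a verified cap table.

```python
# Disclosed round sizes in USD millions, as stated in the paragraph above.
# Labels are illustrative; undisclosed capital (debt, secondaries) is excluded.
DISCLOSED_ROUNDS_USD_M = {
    "Seed (Aug 2024)": 7,
    "Series A (Apr 2025)": 50,
    "Series B (Feb 2026)": 150,
}


def total_disclosed_capital(rounds):
    """Return the simple sum of disclosed round sizes in USD millions."""
    return sum(rounds.values())


total = total_disclosed_capital(DISCLOSED_ROUNDS_USD_M)
# Share of all disclosed capital that arrived in the most recent round.
series_b_share = DISCLOSED_ROUNDS_USD_M["Series B (Feb 2026)"] / total

print(f"Total disclosed capital: ~${total}M")
print(f"Series B share of disclosed capital: {series_b_share:.1%}")
```

On these inputs the Series B alone accounts for about 72% of disclosed capital, which is consistent with the point that most of the funding arrived at the latest and highest valuation.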
| Stakeholder | Role | Control or economic importance | Diligence ask |
|---|---|---|---|
| Lightspeed Venture Partners | Seed lead; Series A and B participant | Earliest institutional lead and continuing backer; likely influential in early governance | Confirm current ownership, pro rata rights, and any board seat |
| Menlo Ventures | Series A lead; Series B participant | Key financial sponsor at the first large institutional round and visible public champion | Confirm board role, follow-on reserve usage, and any protective provisions |
| Anthropic | Series A participant | Strategic investor whose presence signals safety and interpretability relevance to frontier labs | Clarify whether investment includes technical collaboration, channel value, or simple financial exposure |
| B Capital | Series B lead; Series A participant | Lead investor at the $1.25B valuation; likely major board and governance influence post-Series B | Confirm ownership percentage, board seat, liquidation terms, and any commercial introduction rights |
| Juniper Ventures | Series B existing investor | Named as returning investor at Series B but less visible in earlier public materials | Determine entry round, ownership, and influence relative to better-known VCs |
| DFJ Growth | Series B new investor | Adds late-venture scale capital and potential follow-on capacity | Assess whether DFJ view is platform-infrastructure or frontier-model optionality |
| Salesforce Ventures | Series B new investor and strategic enterprise partner | Signals enterprise software and procurement relevance, not just research backing | Clarify whether Salesforce provides channel access, product partnerships, or board observation rights |
| Eric Schmidt | Series B angel / strategic investor | Adds brand and policy credibility disproportionate to likely check size | Determine whether Schmidt is passive capital or active network participant |
| Wing Venture Capital | Series A and B participant | Continuing venture support from infrastructure-oriented investor base | Confirm stake and any role in product go-to-market guidance |
| South Park Commons | Series A and B participant; early ecosystem sponsor | Important ecosystem backer given Goodfire’s early office history and talent network | Clarify talent pipeline and whether SPC provided incubation before formal founding |
| Work-Bench | Series A participant | Adds enterprise-software pattern recognition at earlier stage | Determine whether Work-Bench remains active post-Series B |
| Mayo Clinic / design partners | Strategic non-investor stakeholders | Partners matter economically because commercial proof appears to rely on selective high-stakes collaborations | Request signed customer references, paid pilot status, and renewal dynamics |
Investor map is exhaustive for explicitly named public stakeholders, not for the full cap table. Exact ownership, board representation, liquidation preferences, and secondary activity are not publicly disclosed in the reviewed sources.
[CO016, CO017, CO018, CO020, CO022, CO023]
A compact maturity snapshot emphasizing financing, stage, and the limited public disclosure of operating metrics.
Total disclosed capital is a simple sum of the public $7M seed, $50M Series A, and $150M Series B. The figure intentionally omits revenue and customer-count KPIs because public sources do not support them.
[CO020, CO021, CO026, CO029, CO030, CO032]
1.4 Chronology, cover metrics, and key diligence risks
The public chronology is short but dense. Goodfire surfaced from seed financing in 2024, announced its Series A in April 2025, publicized the Mayo Clinic collaboration in September 2025, launched a fellowship program and released field-building educational content in late 2025, and then announced its Series B and broader intentional design agenda in February 2026. By April 2026, MIT Technology Review covered Silico as a commercial product for debugging and steering models, and by May 2026 Goodfire was emphasizing SOC 2 certification and a growing enterprise-facing posture. This sequence shows a company trying to convert cutting-edge interpretability research into product and partnership credibility in under two years. The key cover-metric pattern is asymmetry: valuation and capital raised are well supported, while operating metrics remain sparse. No reviewed public source discloses revenue, ARR, or customer count. Headcount is not officially disclosed; one independent profile estimates about 51 employees as of January 2026. That opacity matters because the most credible adverse evidence in the source set is not about misconduct, but about execution risk: MIT Technology Review quotes an external interpretability researcher arguing that Goodfire adds “precision to the alchemy” rather than turning AI engineering into a fully principled science, and an independent health-tech analysis argues the Series B valuation is aggressive for a research-first company with early commercial traction. The diligence burden is therefore less about headline credibility and more about commercial proof, governance disclosure, and how quickly interpretability demand converts into repeatable software revenue.[CO025, CO026, CO027, CO030, CO031, CO032]
| Date | Event | Type | Amount / valuation / status | Participants | Implication |
|---|---|---|---|---|---|
| 2024-08-15 | Lightspeed publicly announces leading Goodfire’s seed round | founding | $7M seed | Goodfire; Lightspeed Venture Partners | Establishes the first public financing marker and anchors a 2024 operating timeline |
| 2025-04-17 | Goodfire announces Series A and Ember platform | financing | $50M Series A | Menlo Ventures lead; Lightspeed, Anthropic, B Capital, Work-Bench, Wing, South Park Commons | Moves the company from seed research lab to institutionally backed platform narrative |
| 2025-09-09 | Goodfire announces Mayo Clinic collaboration for genomic medicine | partnership | Collaboration announced | Goodfire; Mayo Clinic | Expands relevance from core interpretability research into healthcare and clinical AI |
| 2025-10-09 | Goodfire opens fall fellowship program | scale | Fellowship cohort recruitment | Goodfire research staff | Signals active talent build-out and field-building beyond core founder bench |
| 2025-12-11 | Goodfire shares Stanford guest lectures on interpretability | governance | Educational release | Goodfire researchers; Stanford course community | Shows thought leadership and effort to shape the discipline around its agenda |
| 2026-02-05 | Goodfire announces Series B and intentional design agenda | financing | $150M at $1.25B valuation | B Capital lead; Juniper, Menlo, Lightspeed, DFJ Growth, Salesforce Ventures, Eric Schmidt and others | Validates investor appetite and sharply reprices the company as category infrastructure |
| 2026-04-30 | MIT Technology Review covers Silico public launch | product | Fee-based product launch / release coverage | Goodfire; MIT Technology Review | Marks transition from research platform narrative toward broader commercial productization |
| 2026-04-30 | External researcher cautions that Silico adds “precision to the alchemy” | adverse | Skeptical expert commentary | Leonard Bereska; MIT Technology Review | Introduces skepticism that the product fully solves the scientific uncertainty it claims to address |
| 2026-05-22 | Goodfire announces SOC 2 Type II certification | regulatory | Compliance certification announced | Goodfire | Supports enterprise procurement and trust posture for handling sensitive model-development workflows |
| 2026-06-10 | Public customer profile remains selective rather than broad-market | scale | Fortune 500 / healthcare / research-lab usage stated; no broad metrics | Goodfire; unnamed customers | Commercial story still depends on quality of design partners more than disclosed volume metrics |
Timeline emphasizes dated events visible in public materials. Some items represent public disclosure dates rather than the underlying operational start date, which remains partially unresolved for founding and commercial scale.
[CO017, CO016, CO018, CO024, CO026, CO027]
Key public milestones from Goodfire’s seed emergence through Series B, product launch, and the first meaningful skeptical coverage.
[CO017, CO016, CO018, CO024, CO026, CO035]
1.5 Exhibits
02 Market Analysis
2.1 Market boundary and evidence-constrained sizing
Goodfire's relevant market is narrower than headline AI enthusiasm. Its own materials describe a product stack built around understanding model internals, debugging failures, steering behavior, shaping training, and in some cases monitoring production behavior. That boundary excludes generic copilots, general application observability, and AI infrastructure spend that never reaches a model-design workflow. The closest public analogs are LLM observability and evaluation vendors such as Arize, Fiddler, Datadog, LangSmith, Langfuse, Humanloop, Arthur, and Patronus, but even those mostly instrument prompts, traces, sessions, and outputs rather than model parameters or latent representations. Because Goodfire does not disclose pricing, customer count, or revenue, a classic TAM-SAM-SOM stack would overstate precision. The evidence-constrained approach is to use multiple lenses instead: first, macro demand signals that show AI usage and ROI pressure are spreading; second, published adjacent-tool pricing that establishes a software-budget floor for teams already buying observability and eval products; and third, an access lens that narrows the reachable market to organizations able to provide model internals and tolerate a services-heavy pilot motion. That combination supports a real but selective market, with more near-term substance in advanced model teams than in generic enterprise AI narratives.[CM001, CM002, CM015, CM016, CM017, CM020]
| segment/category | included spend | excluded spend | buyer/payer | relevance |
|---|---|---|---|---|
| Frontier-lab interpretability and model design | Interpretability research infrastructure, steering workflows, training-shaping tools, safety diagnostics, and production monitors tied to owned models | Generic AI infrastructure, pure inference hosting, and generic app analytics | Research leads, safety teams, and frontier-model R&D budgets | Most natural direct segment because labs own model internals and already value interpretability |
| Enterprise model engineering and governance | Debugging, eval, steering, and monitoring for proprietary or open-weight enterprise models | Teams using only third-party closed APIs with no internal-model access | VP Engineering, AI platform leaders, ML infra, and advanced product budgets | Reachable when enterprises run or fine-tune their own important models |
| Scientific AI and life-sciences model design | Model decoding, validation, confounder removal, and discovery workflows in genomics, biology, and robotics | General lab software, wet-lab tools, and non-model R&D software | Scientific-program leads, computational biology teams, and research budgets | Strong fit where internal-model understanding changes scientific or deployment quality |
| Regulated and high-consequence adopters | Interpretability, governance, and validation layers for finance, healthcare, legal, or safety-critical AI | Commodity workplace copilots and generic knowledge-worker subscriptions | Clinical, compliance, risk, or domain-operations budgets with technical sponsorship | High-need segment, but harder to close because procurement and evidence burdens are heavier |
| Adjacent LLM observability and evaluation stack | Tracing, prompt management, evals, experiments, and guardrails already budgeted in production AI teams | Deep parameter or latent-space control when vendors only observe outputs or traces | Developer tooling, platform engineering, and MLOps budgets | Important adjacency because these budgets define the closest public comparison set |
The market boundary is intentionally narrow: it follows the spend that can plausibly land in model-internal understanding, steering, and validation workflows rather than all generative-AI software or infrastructure.
[CM020, CM021, CM022, CM028, CM029, CM030]
| publisher | year | geography | value | CAGR | methodology | confidence | limitation |
|---|---|---|---|---|---|---|---|
| Gartner | 2025 | Global | Trough of Disillusionment (qualitative maturity lens) | n/a | Hype-cycle lens for implementation realism and ROI dispersion | high | Useful for timing and caution, but not a market-size number. |
| PwC | 2025 | Global | 100% of industries increasing AI usage; 3x higher revenue-per-worker growth in AI-exposed industries | n/a | Macro adoption and productivity lens | high | Adoption breadth is real, but it does not isolate interpretability-tool budgets. |
| Arize + Langfuse | 2026 | Global SaaS | $348-$600 annual list price per small team before heavy usage | n/a | Bottom-up adjacent pricing lens from public self-serve plans | high | Trace-and-eval tooling is adjacent, not the same as model-internal design tooling. |
| Langfuse | 2026 | Global SaaS | $29,988 annual enterprise list price before volume adders | n/a | Public enterprise list-price lens | high | One vendor datapoint does not reveal Goodfire pricing or win rates. |
| Fiddler | 2026 | Global SaaS | $0.002 per trace | n/a | Usage-based observability lens | high | Spend depends entirely on trace volume and still reflects output-trace observability, not interpretability work. |
| Goodfire direct market lens | 2026 | Selective design partners | Undisclosed / case-by-case | n/a | Direct commercial lens from MIT reporting plus Goodfire pilot agreement | medium | No public ACV, customer-count, or pipeline data exist for a true TAM-SAM-SOM build. |
This table intentionally mixes qualitative maturity signals and adjacent pricing proxies because public evidence does not support a clean Goodfire TAM-SAM-SOM. The point is to bound the market with observable lenses rather than invent a top-down number.
[CM015, CM016, CM017, CM019, CM024, CM025]
Public evidence supports a large AI-demand backdrop, a visible adjacent observability budget layer, and a much narrower direct Goodfire capture layer defined by model access and high-touch pilots.
This is a constrained lens stack rather than a numeric TAM-SAM-SOM waterfall. Only the adjacent-budget layer has visible public pricing; Goodfire's direct commercial layer is undisclosed.
[CM020, CM019, CM024, CM026, CM044, CM045]
Public pricing only supports a bottom-up range for adjacent software budgets; Goodfire's direct ACV remains undisclosed, so these figures are comparison proxies rather than Goodfire revenue estimates.
All values are adjacent-market price proxies, not Goodfire prices. The usage-based row is derived directly from Fiddler's published per-trace rate using explicit 100k, 1M, and 10M annual trace scenarios.
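The usage-based scenarios named in the note above can be reproduced from the $0.002 per-trace rate. The sketch below is a minimal Python illustration under that single assumption; the function and variable names are hypothetical, volume discounts and platform fees are ignored, and none of the outputs are Goodfire prices.

```python
# Fiddler's published usage-based rate, as cited in the pricing table.
RATE_PER_TRACE_USD = 0.002

# Annual trace-volume scenarios named in the note above.
SCENARIOS = {"100k": 100_000, "1M": 1_000_000, "10M": 10_000_000}


def annual_trace_cost(traces_per_year, rate=RATE_PER_TRACE_USD):
    """Annual spend in USD at a flat per-trace rate, rounded to cents."""
    return round(traces_per_year * rate, 2)


costs = {label: annual_trace_cost(n) for label, n in SCENARIOS.items()}

for label, cost in costs.items():
    print(f"{label} traces/year -> ${cost:,.0f}")
```

Under these assumptions the scenarios come to $200, $2,000, and $20,000 per year, so even the 10M-trace case sits below the $29,988 enterprise list price shown for Langfuse. That supports reading all of these figures as a floor for adjacent budgets, not as an estimate of Goodfire contract values.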
[CM024, CM025, CM026, CM047, CM048, CM049]
2.2 Buyer segmentation, budget owners, and adoption path
The clearest buyers are teams that both control important models and can expose enough internal state for Goodfire to do meaningful work. Frontier labs sit at the top of that list because they already run interpretability efforts, have research and safety staff who can use the tooling, and face direct pressure to shape model behavior. Enterprise model teams come next when they own proprietary or open-weight models and can justify specialized tooling through AI-platform or advanced-engineering budgets. Scientific AI teams in genomics, biology, robotics, and other research-heavy domains are especially relevant because interpretability can validate whether predictions are driven by real structure or shortcuts and can surface domain knowledge humans can reuse. Regulated adopters have strong need, but the combination of privacy, governance, and evidence requirements makes them slower to close. The payer is not always the end user. Research leads, CTOs, platform heads, or scientific-program owners may buy; model scientists, safety teams, and computational researchers use; and central AI R&D, platform, or research-program budgets pay. Public legal and product evidence implies a pilot-first motion: identify a high-stakes model problem, secure model and data access, run interpretability or steering work in a shared environment, prove a control or validation outcome, and only then expand into longer-term monitoring or licensing. That motion fits a high-touch, design-partner market better than a mass self-serve software motion.[CM003, CM005, CM006, CM009, CM010, CM011]
| segment | buyer | user | payer/workflow | budget owner | adoption trigger |
|---|---|---|---|---|---|
| Frontier labs | Chief scientist, interpretability lead, safety lead | Interpretability researchers, model scientists, safety engineers | Research program around training control, alignment, and failure analysis | Frontier-model R&D and safety budgets | Need to debug, steer, or align internally developed frontier models |
| Enterprise model teams | CTO, VP Engineering, AI platform lead | Applied scientists, ML engineers, eval teams | Owned or fine-tuned model programs with reliability or control needs | AI platform, infrastructure, or advanced product budgets | High-value model workflow where traces are insufficient and deeper control matters |
| Life sciences / scientific AI teams | Research director, computational biology lead, scientific founder | Computational scientists, modelers, translational research teams | Scientific discovery or validation workflow tied to owned foundation models | Research program or disease-area budget | Need to validate that model predictions reflect real mechanisms, not confounders |
| Regulated adopters | Clinical, legal, compliance, or risk executive with technical sponsor | Domain experts, review teams, model-risk staff | Pilot around high-consequence decision support or specialized model governance | Domain budget plus governance oversight | Need for transparent, auditable behavior before broader deployment |
The buyer-user-payer split matters because Goodfire is sold as a high-touch capability layer. In every segment, the best trigger is a high-value model that the customer controls deeply enough to inspect.
[CM028, CM029, CM030, CM031, CM032, CM033]
Goodfire's best near-term segments combine high need for interpretability with real access to model internals; regulated adopters have strong need but weaker immediate reach.
The matrix is an evidence-based ordinal synthesis from public product, legal, research, and independent reporting. It measures relative reachability, not disclosed revenue.
[CM028, CM029, CM030, CM031, CM036, CM046]
Goodfire's public materials imply a pilot-first value chain that starts with a high-stakes model problem and expands only after model access and interpretability work prove value.
The sequence comes from Goodfire's legal pilot agreement, product pages, and Series B narrative. Public sources do not disclose stage-by-stage conversion rates.
[CM009, CM010, CM032, CM036, CM044]
2.3 Growth drivers, constraints, and valuation relevance
Demand-side conditions are favorable. PwC shows that AI-exposed industries are generating materially higher revenue per worker and paying a large skills premium, which suggests real willingness to fund tools that make AI systems more effective. At the same time, adjacent vendors repeatedly frame observability, guardrails, and evaluation as business-critical because autonomous systems now touch revenue, operations, and user experience. That helps Goodfire because it means the budget conversation already exists; the company does not need to invent the importance of reliability or control from scratch. Its scientific and regulated use cases also line up with the places where output-only evaluation is least sufficient and where deeper interpretability has the most strategic value. The brakes are equally important. Gartner says ROI varies widely and hidden implementation costs can be large. NIST-style governance expectations, data privacy rules, and clinical or scientific validation standards all slow deployment. Most importantly, Goodfire's own story and independent reporting agree that the field is still technically immature: the company markets precision engineering, but external critics and even Goodfire's own research papers acknowledge that interpretability still has major open problems. Combined with the requirement for model-internal access and the lack of public pricing or customer data, that means valuation should anchor on a selective high-value wedge rather than a mass-market software assumption.[CM016, CM017, CM018, CM019, CM034, CM035]
| driver/constraint | direction | timing | implication | diligence ask |
|---|---|---|---|---|
| Higher-stakes AI deployment | up | current | As AI touches science, healthcare, and autonomous workflows, demand rises for deeper validation and control | Ask which current customers use Goodfire for pre-deployment validation versus post hoc analysis. |
| Productivity and labor pressure | up | 12-24 months | Firms that see real AI productivity gains are more willing to fund tooling that increases model reliability | Request proof that Goodfire shortens debugging or post-training iteration cycles enough to justify budget. |
| Adjacent observability budget normalization | up | current | Tracing, evals, and guardrails are already funded categories, making the budget conversation easier | Ask how often Goodfire sells alongside LangSmith, Datadog, Langfuse, or similar platforms. |
| Scientific discovery upside | up | 12-36 months | Biology and robotics cases broaden the market beyond software teams if outcomes prove repeatable | Request revenue split and renewal evidence for scientific customers or partners. |
| Model-access dependence | down | current | Closed-model customers are harder to serve because Goodfire needs deeper access than most API-only users can provide | Request the pipeline split between open-weight, proprietary in-house, and closed-API prospects. |
| Governance and validation burden | down | current and rising | Regulated buyers may value interpretability most, but their procurement cycles are longest | Request average sales cycle and security or governance review time by segment. |
| Technical immaturity of mechanistic interpretability | down | 12-36 months | Debate over how close the field is to precision engineering can cap budget urgency | Request benchmark evidence that Goodfire changes outcomes on production tasks, not just research demos. |
| Opaque Goodfire pricing and customer disclosure | down | current | Without public price and customer data, outside investors must underwrite a selective rather than broad-market story | Request ACV bands, pilot-to-license conversion, and customer-count disclosures by cohort. |
The key underwriting question is not whether demand exists, but whether Goodfire can convert a real need for control into repeatable commercial deployments faster than access limits, governance friction, and field immaturity slow adoption.
[CM016, CM018, CM019, CM034, CM035, CM036]
2.4 Exhibits
03 Competitors
3.1 Landscape by competitor class
Goodfire sits in an unusual competitive slot. Its public product language is not about post-hoc prompt monitoring or generic LLM telemetry; it is about intentional model design, feature steering, targeted failure correction, and programmatic access to model internals. MIT Technology Review frames Silico as a mechanistic-interpretability tool that puts techniques previously concentrated inside Anthropic, OpenAI, and Google DeepMind into the hands of smaller firms and research teams. That makes internal frontier-lab interpretability groups and sophisticated in-house research teams the closest direct alternatives for buyers building or adapting open-weight models. The broader commercial landscape is more crowded but more indirect. Arize Phoenix, LangSmith, Langfuse, Datadog, Fiddler, Arthur, and former platforms such as Humanloop all compete for budget tied to trustworthy AI development, yet their default control point is tracing, evaluation, guardrails, or governance around deployed systems rather than deep editing of learned representations. The practical implication is that Goodfire should be judged less like another observability dashboard and more like a new tooling layer for model builders who need mechanistic understanding before, during, and after training.[CP001, CP002, CP004, CP006, CP007, CP008]
| competitor | category | scale / funding signal | target segment | key differentiation | key limitation versus Goodfire |
|---|---|---|---|---|---|
| Goodfire / Silico | Mechanistic-interpretability-native model design | Raised $150M Series B at $1.25B valuation; ~$209M total funding disclosed | Teams building or adapting open-weight and domain-specific models | Programmatic access to model internals, feature steering, data attribution, and pre-deployment failure diagnosis | Public pricing, win-rate, and installed-base evidence are sparse relative to adjacent tooling vendors |
| Frontier-lab internal interpretability teams (Anthropic / OpenAI / Google DeepMind) | Direct incumbent / internal build | Embedded inside frontier labs rather than sold as a stand-alone product | Frontier model builders with closed-weight access | Deepest access to proprietary models and internal research talent | Unavailable as a commercial product for most buyers; not a purchasable vendor |
| Arize Phoenix | Adjacent open-source tracing and eval platform | Open-source product; AX Pro starts at $50/month with enterprise tier | AI engineers building agents and LLM applications | Tracing, evals, datasets, experiments, and open-source entry point | Focuses on agent development observability rather than mechanistic editing of model internals |
| Fiddler AI | Adjacent enterprise observability / guardrails vendor | Free tier, $0.002 per trace developer plan, enterprise deployment options | Enterprises needing monitoring, policy, and governance for AI systems | Unified observability, custom evaluators, real-time guardrails, SaaS/VPC/on-prem options | Competes at the monitoring and control-plane layer, not the feature-level model-design layer |
| Arthur | Adjacent lifecycle reliability and governance vendor | Enterprise AI platform; product page shows monitoring and policy-workflow proof points | Enterprises managing agents, GenAI, and traditional ML together | Continuous evals, policies, guardrails, dashboards, and oversight across the AI lifecycle | Little public evidence of mechanistic interpretability or targeted internal model editing |
| Datadog LLM Observability | Incumbent observability platform | Free 40K LLM spans/month; Pro starts at $160/month with 100K spans | Existing Datadog customers extending APM into AI delivery | Bundles agent observability with backend monitoring, experiments, data retention, and enterprise controls | Best suited to operating production AI systems, not to reverse engineering model representations |
| LangChain LangSmith | Adjacent developer workflow incumbent | Free tier for development and small production; paid plans scale with trace volume | Teams already building on LangChain or multi-framework agent stacks | Strong agent tracing, SDK breadth, framework adjacency, and debugging workflows | Public page describes observability, not mechanistic model editing or training-data attribution |
| Langfuse | Adjacent open-source AI engineering platform | 10B+ observations/month; 100k+ engineers; free plus $29/$199/$2499 self-serve plans | Developers wanting OSS tracing, evals, prompts, and production feedback loops | OpenTelemetry base, self-hosting, transparent pricing, and large OSS distribution | Economic and developer-workflow strength does not translate into Goodfire-style internal model control |
| Humanloop (historical) | Adjacent eval / prompt management vendor | Free trial with 50 eval runs and 10K logs/month; now joining Anthropic and sunsetting | Teams evaluating models and managing prompts for trustworthy LLM apps | Prompt management, evaluation metrics, private deployment add-ons | No longer an independent platform, which underscores category consolidation risk |
| Weights / Weave (historical) | Adjacent tooling vendor absorbed by frontier lab | Products wound down after team joined OpenAI | Creators and model builders using earlier Weights products | Demonstrates that AI tooling talent can be absorbed by frontier labs | No longer a live independent competitor; mainly a signal of category absorption |
| In-house black-box workflow | Status-quo substitute / internal build | Engineering labor plus commodity open-source or point tools | Teams unwilling to buy a new vendor category | Flexible and initially cheap: prompting, evals, fine-tuning, and guardrails can be assembled incrementally | Keeps teams in guess-and-check loops with limited mechanistic evidence on why a model failed |
Profile set intentionally mixes direct, incumbent, adjacent, historical, and substitute options because Goodfire competes for a job-to-be-done, not a single analyst-defined software category.
[CP001, CP006, CP007, CP017, CP018, CP019]
Ordinal positioning shows Goodfire furthest toward mechanistic model control, while Datadog, Fiddler, LangSmith, and Langfuse score higher on deployment-observability breadth.
X-axis is mechanistic access / direct model editability from 1 (surface-level observability only) to 5 (deep model-internal access). Y-axis is deployment and distribution breadth from 1 (narrow research workflow) to 5 (broad installed-base or platform reach). Scores are evidence-backed ordinals synthesized from reviewed source pages, not benchmark measurements.
[CP001, CP006, CP007, CP017, CP019, CP021]
3.2 Adjacent vendors: capabilities, packaging, and budget overlap
The adjacent vendor set is commercially relevant because it competes for the same buyer conversation around trustworthy AI, but the products are usually anchored to different workflows. Arize Phoenix emphasizes open-source tracing, evals, datasets, and experiments for agent development. Fiddler and Arthur lean into lifecycle observability, guardrails, policies, and governance. Datadog folds agent observability into a much larger application-monitoring estate, which is important because that installed base can make “good enough” AI oversight easier to buy than a stand-alone platform. LangSmith and Langfuse both push developer workflow and production debugging; Langfuse, in particular, combines a strong open-source posture with transparent self-serve pricing, while LangSmith advertises a free tier and trace-volume billing. Humanloop historically targeted development, prompt management, and evaluation for trustworthy LLM apps, but its move into Anthropic shows the category can be absorbed by model labs rather than remain independent. Relative to these vendors, Goodfire looks differentiated on mechanistic access and targeted model editing, but thinner on public pricing, installed base, and broadly deployed observability surfaces.[CP017, CP018, CP019, CP020, CP021, CP022]
| buying criterion | Goodfire | Frontier-lab internal teams | Arize Phoenix | Fiddler AI | Arthur | Datadog | LangSmith | Langfuse | Humanloop (historical) | In-house black-box stack |
|---|---|---|---|---|---|---|---|---|---|---|
| Mechanistic access to model internals | strong | strong | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | limited |
| Targeted steering or editing of learned features | strong | strong | unsupported / unknown | limited | limited | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | limited via prompt or fine-tune only |
| Training-data attribution or probe workflows | strong | strong | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unknown |
| Production tracing / experiments / eval loop | limited | unknown | strong | strong | strong | strong | strong | strong | strong | partial |
| Real-time guardrails / policy enforcement | limited | unknown | limited | strong | strong | limited | limited | limited | limited | partial |
| Open-source or self-host path | limited public evidence | no commercial path | strong | limited | unknown | limited | unknown | strong | limited | strong |
| Enterprise deployment / compliance controls | emerging / limited public proof | internal only | strong | strong | strong | strong | unknown | strong | strong historically | depends on internal team |
| Domain-specific scientific model workflows | strong | limited public proof | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | unsupported / unknown | custom if team builds it |
Cells are evidence-backed qualitative judgments from reviewed product pages only; unsupported or absent capability disclosures are marked as unknown rather than inferred.
[CP001, CP002, CP007, CP008, CP011, CP012]
| offering | public price / contract model | packaging details | included capabilities | unknowns / discounts | implication |
|---|---|---|---|---|---|
| Goodfire / Silico | Case-by-case fee; Goodfire declined specific pricing | Custom commercial engagement aligned to customer requirements | Model design environment, experiment agent, mechanistic debugging and steering | No public self-serve list price or usage meter disclosed | Harder for buyers to benchmark ROI; product must sell on differentiated outcomes rather than transparent entry pricing |
| Arize AX / Phoenix | Free tier; AX Pro $50/month; enterprise custom | 50k spans/month and 10GB/month in Pro; enterprise SaaS or self-hosted | Tracing, evals, datasets, experiments, observability | Startup pricing mentioned but not publicly enumerated in detail | Sets a low entry point for teams that mainly need agent telemetry and eval workflows |
| Fiddler AI | Free tier; Developer at $0.002 per trace; enterprise custom | Developer plan adds unified observability, custom evaluators, SSO, SaaS deployment | Observability, tests and experiments, guardrails, governance | Enterprise pricing not public beyond tier framing | Creates usage-based competition for safety and governance budgets around deployed systems |
| Datadog LLM Observability | Free up to 40K LLM spans/month; Pro starts at $160/month for 100K spans | On-demand overage after 100K spans; discounted M2M and annual commitments | Agent observability, evaluations, retention options, sensitive data scanning | Retention add-ons and full enterprise packaging vary by commitment | Strong incumbent bundle for teams already standardized on Datadog |
| LangSmith | Free tier for development and small production; paid plans scale with trace volume; enterprise by contact | Framework-agnostic SDK access with usage-based paid expansion | Agent tracing, observability, debugging | Exact public price not shown on reviewed page | Budget overlap is strongest where teams want workflow visibility rather than model-internal control |
| Langfuse | Free Hobby tier; $29 Core; $199 Pro; $2499 Enterprise | Units-based billing with 50k free and 100k included in paid plans; optional $300 Teams add-on | Tracing, evals, prompts, analytics, compliance features, self-host options | Volume discounts beyond listed unit ladder | Transparent pricing and OSS posture put pressure on vendors pitching generic AI engineering value |
| Humanloop (historical) | Free trial; enterprise/custom plans | 2 members, 50 eval runs, 10K logs/month; VPC add-on and enterprise support | Prompt management, evaluation, trustworthy LLM app workflow | Independent commercial future is gone after Anthropic deal | Shows how adjacent platform categories can disappear into a frontier lab before they mature independently |
| In-house black-box stack | No software line item; internal labor plus cloud/tool spend | Mix of prompts, eval harnesses, fine-tuning, and guardrails from existing tools | Flexible substitute path for teams avoiding new vendor spend | True total cost often hidden in compliance review, retraining, and internal overhead | Status quo remains viable unless Goodfire proves materially better debugging, safety, or domain outcomes |
Public pricing combines official list pricing, tier descriptions, and explicit unknowns from reviewed pages; absence of a number is treated as an evidence gap, not a hidden assumption.
[CP018, CP020, CP023, CP024, CP026, CP027]
The capability map highlights Goodfire's relative strength in model-internal editing and domain-specific mechanistic workflows, versus adjacent vendors' strength in tracing, governance, and production operations.
Scores are ordinal 1-5 judgments from public capability descriptions only. A 5 indicates strongest visible fit in the reviewed source set, not an audited market ranking. The figure is a synthesized strength map distinct from TP002's support/unknown matrix.
[CP002, CP003, CP011, CP012, CP013, CP017]
3.3 Switching costs, substitutes, and distribution power
Switching costs in this landscape are asymmetric. Once a team standardizes on Datadog, LangSmith, or Langfuse for traces, evals, and production debugging, those tools can become the default operating surface for AI quality work even if they do not expose model internals. That distribution advantage matters because many organizations would rather extend an existing developer or observability stack than adopt a new research-native workflow. Conversely, Goodfire’s strongest use cases appear where tracing alone is not enough: open-weight model builders, safety-critical domains, and research teams that need to inspect features, attribute behaviors to training data, or intervene before deployment. The main substitute is still a black-box stack of prompting, benchmark evals, guardrails, and iterative fine-tuning, sometimes assembled in-house from open-source tools. That path is cheaper up front and familiar, but Goodfire’s argument is that it leaves teams guessing at why a model behaves badly. The competitive question is whether buyers feel enough pain from that guess-and-check loop to move budget from observability or prompt tooling into mechanistic model design.[CP004, CP008, CP016, CP017, CP018, CP022]
3.4 Moat durability and competitive risk
Goodfire’s moat case is easiest to believe when the buyer values mechanistic understanding itself. The company can point to feature steering, data attribution, PII-detection probes, and domain work in biology and robotics as evidence that model internals can be used for debugging, safety, and scientific discovery rather than just post-hoc monitoring. That gives it a more research-native product story than adjacent evaluation vendors. But the adverse evidence matters. MIT Technology Review quotes an outside mechanistic-interpretability researcher arguing that Goodfire may be adding precision to today’s alchemy rather than turning AI into a fully principled engineering discipline. The same article notes that Silico is most useful where customers can access model weights, limiting applicability on closed frontier models. OnHealthcare also frames the company as a 51-person, research-first organization valued aggressively relative to disclosed commercial traction. The highest-risk scenarios are therefore clear: larger observability vendors adding explain-and-steer features, frontier labs keeping the deepest interpretability advantages in-house, or customers deciding that trace-level controls are sufficient. Goodfire can still win if it becomes the default model-design layer for open-weight and domain-specific AI programs, but that durability is not yet proven by public win-rate, pricing, or retention evidence.[CP005, CP007, CP008, CP009, CP010, CP011]
| moat or risk claim | supporting evidence | counter-pressure | severity | mitigation / diligence ask |
|---|---|---|---|---|
| Mechanistic interpretability is Goodfire's clearest product moat | Silico, feature steering, data attribution, Llama steering, and probe work all point to direct intervention on model internals | Frontier labs also do mechanistic interpretability internally, and outsiders question how principled the workflow already is | high | Request customer evidence showing that mechanistic workflows change deployment or training decisions in ways observability tools cannot |
| Goodfire is strongest where customers can inspect open-weight or adaptable models | MIT Technology Review says Silico is most usable when teams can access a model's inner workings; Goodfire markets training/debugging model design environments | Closed frontier models limit applicability; many enterprise buyers still consume APIs from black-box providers | high | Ask for customer mix by open-weight versus API-only deployments and proof of closed-model roadmap |
| Adjacent observability vendors can absorb large parts of the AI-quality budget | Arize, Fiddler, Datadog, LangSmith, Langfuse, Arthur, and Humanloop all sell tracing, evals, guardrails, or governance | These tools do not obviously solve feature-level debugging or data attribution, leaving room for a deeper design layer | high | Test whether Goodfire is attached to a separate budget owner or must displace observability spend |
| Transparent self-serve pricing elsewhere makes Goodfire's opaque pricing a sales risk | Arize, Fiddler, Datadog, and Langfuse publish entry pricing while Goodfire uses case-by-case commercial terms | If buyers perceive Goodfire as another tooling vendor rather than a differentiated research layer, price discovery will feel unfavorable | medium-high | Request realized pricing, pilots-to-production conversion, and average time to first value |
| Research breadth can become a moat only if it productizes | Goodfire cites hallucination reduction, PII detection, biology discovery, and diffusion-search wins across multiple domains | Broad research portfolio can also create focus risk and slow repeatable product packaging | medium-high | Ask what percentage of roadmap and headcount is tied to reusable product versus custom research engagements |
| Category consolidation is a real threat | Humanloop is joining Anthropic and sunsetting; Weights wound down after team joined OpenAI | Frontier labs may absorb adjacent capabilities and talent faster than start-ups can scale independently | medium | Assess whether Goodfire is more likely to be a durable platform, a feature inside another stack, or an attractive acquisition target |
| Governance and trust requirements help Goodfire only if buyers believe interpretability is additive to observability | NIST AI RMF and Gartner both reinforce governance, evaluation, and hidden operating-cost concerns in sensitive AI systems | Those same concerns also strengthen guardrail and observability incumbents such as Fiddler, Arthur, and Datadog | medium | Validate whether regulated buyers explicitly ask for mechanistic evidence or remain satisfied with trace-level controls and policy enforcement |
Severity reflects competitive pressure on Goodfire specifically, not absolute vendor quality; mitigation requests focus on evidence missing from the public record.
[CP005, CP007, CP008, CP009, CP010, CP011]
Compact KPIs summarize the commercial and competitive boundaries around Goodfire's moat: large research funding, opaque pricing, adjacent free tiers, and direct pressure from internal frontier-lab teams.
KPI items intentionally mix funding, price floors, and packaging signals because Goodfire's competitive durability is shaped by both technical differentiation and adjacent-tool economics.
[CP005, CP010, CP018, CP020, CP023, CP026]
04 Financials
4.1 Revenue model and pricing surface: software is visible, economics are not
Public evidence supports a commercial product, but not a public price book. Goodfire's official surface describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure, and the vertical pages repeatedly invite teams training or fine-tuning foundation models to request access rather than self-serve into a public checkout flow. The contact page goes further, saying the platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Those statements support the existence of an enterprise product and an enterprise target market. They do not disclose the actual commercial terms those customers accept. The legal documents make the pricing posture clearer. The master services agreement and terms of use both push commercial economics into negotiated order forms. The terms explicitly contemplate fees, overage charges when usage exceeds contracted allotments, and dashboard or usage-report records that are authoritative for billing. The pilot agreement separately states that pilot access is for internal evaluation and that a separate commercial license is required after the evaluation period. That combination points to a monetization stack built around custom contracts rather than public list pricing: pilot fees, commercial platform fees, usage-based overages, and potentially additional service charges. What remains unavailable is the part investors actually need to underwrite. None of the reviewed public pages disclose list price, minimum annual commit, support tier pricing, discount ladders, or realized pricing by customer type. The pricing / monetization table therefore distinguishes verified commercial mechanisms from missing economics. The absence of public prices is not unusual for enterprise AI infrastructure, but it means external readers cannot infer ACV, customer segmentation, or software gross margin from the official surface alone. 
The right conclusion is not that Goodfire lacks revenue; it is that Goodfire has chosen a negotiated, opaque commercial posture.[CI007, CI008, CI009, CI012, CI013, CI014]
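The contract mechanics described above (a negotiated platform fee, an included usage allotment, overage charges, and possible services fees) can be sketched as a simple invoice calculation. This is a minimal illustration under stated assumptions, not Goodfire's billing logic: the `OrderForm` fields, the metering unit, and every number below are hypothetical placeholders, because none of them are publicly disclosed.

```python
from dataclasses import dataclass


@dataclass
class OrderForm:
    """Hypothetical order-form terms; no field value here is disclosed by Goodfire."""
    platform_fee: float        # contracted platform fee for the billing period
    included_units: int        # usage allotment covered by the platform fee
    overage_rate: float        # price per usage unit above the allotment
    services_fee: float = 0.0  # optional support or field-engineering charge


def period_invoice(terms: OrderForm, metered_units: int) -> float:
    """Platform fee, plus overage on usage above the allotment, plus services."""
    overage_units = max(0, metered_units - terms.included_units)
    return terms.platform_fee + overage_units * terms.overage_rate + terms.services_fee


# Placeholder numbers chosen only to exercise the formula.
example = OrderForm(platform_fee=100_000.0, included_units=1_000_000,
                    overage_rate=0.25, services_fee=20_000.0)
within_allotment = period_invoice(example, metered_units=800_000)
over_allotment = period_invoice(example, metered_units=1_200_000)
print(within_allotment, over_allotment)
```

The sketch makes the diligence gap concrete: every input to `period_invoice` is something an investor would have to request, which is why the tables that follow list mechanisms rather than values.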
| stream | mechanism | unit | current value/status | quality | diligence ask |
|---|---|---|---|---|---|
| Pilot programs | Evaluation access under pilot agreement before full commercial license | pilot fee / pilot term | Pilot fee exists in order form; public amount undisclosed | Medium for existence, low for value | Provide executed pilot order forms, fee schedule, and conversion rate to commercial contracts. |
| Silico commercial platform access | Order-form-based access to hosted platform, APIs, tools, documentation, and related software | annual contract or custom license | Commercial fees exist in order forms; no public list price | Medium for mechanism, low for pricing | Provide standard order form, ACV ranges, minimum commits, and billing basis. |
| Usage overages | Charges for usage beyond contracted allotment under terms of use | usage unit above allotment | Overages explicitly contemplated; triggering unit and price undisclosed | Medium for mechanism, low for realized economics | Disclose metering unit, included allotment, overage rate, and customer usage mix. |
| Support / field engineering / research services | Technical assistance, field engineering, collaboration activities, and deliverables alongside platform use | project, retainer, or services statement of work | Services are contractually available; public pricing and attach rate undisclosed | Medium for existence, low for margin profile | Disclose service revenue share, pricing method, utilization, and gross margin. |
| Life-sciences discovery engagements | Platform plus embedded interpretability work for scientific discovery partners such as Prima Mente | custom engagement | Named proof points exist; no contract value or renewal data disclosed | Low for current revenue contribution | Provide contract values, renewal status, and whether these engagements convert to recurring software. |
| Enterprise design partnerships | Selective engagements with frontier or high-stakes AI teams | custom partnership | Officially described as selective and request-access based; no public contract economics | Low for current revenue quality | Provide design-partner count, conversion to production contracts, and realized annual spend per account. |
Verified mechanisms come from legal docs and official product pages. Current value/status is intentionally qualitative because Goodfire does not disclose revenue mix or realized pricing.
[CI007, CI008, CI009, CI012, CI013, CI014]
| product / path | price / unit / contract | list vs realized | discounts / unknowns | source |
|---|---|---|---|---|
| Silico commercial license | No public amount disclosed | No public list pricing; negotiated realized pricing only | Unknown minimum commits, contract term, seats or compute basis | Official product pages + MSA/TOS |
| Pilot agreement | Pilot fee set in order form; amount undisclosed | No public list pricing | Unknown evaluation term, conversion credits, and pilot-success criteria | Pilot Agreement |
| Usage overages | Overage charges apply above included allotment; unit not public | Realized only | Unknown rate card, thresholds, and true usage driver | TOS |
| Support / field engineering | No public price disclosed | Realized only | Unknown whether bundled, separately invoiced, or included in enterprise tier | TOS + MSA |
| Compliance-ready enterprise deployment | SOC 2 / SOC 3 support procurement readiness but do not set price | Not a price point | Unknown whether security/compliance premium is monetized directly | SOC 2 blog + contact surface |
| Deprecated demo / API preview | No current public commercial price; preview API deprecated in Feb 2026 | Historic preview removed from public surface | Unknown whether any self-serve pricing survived privately | Feature steering blog |
This table separates disclosed commercial mechanics from undisclosed economics. Official pricing is effectively absent; every public path points to custom contracting.
[CI008, CI013, CI014, CI015, CI018, CI020]
Illustrates the public revenue architecture from selective customer acquisition through platform usage and services, while marking where realized pricing and margin cease to be public.
Public sources verify the nodes and commercial mechanisms, but not realized values, contract sizes, or margin. This is a structural bridge rather than a quantified waterfall.
[CI007, CI008, CI009, CI012, CI013, CI014]
4.2 GTM motion and unit economics: high-touch deployments, low public observability
Goodfire's public GTM looks selective and high touch. The Series B post says the company engages deeply and selectively with teams building high-stakes or frontier systems, while the contact page describes a platform used by large enterprises, healthcare institutions, and AI research labs. The customer-story material shows why this matters financially: in the Prima Mente engagement, Goodfire researchers embedded with the customer and built a biomarker discovery pipeline around the customer's model. The terms of use also describe support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables. Together, these sources suggest that at least some deployments are not pure seat-based software subscriptions; they likely combine platform access with bespoke scientific or engineering work. That has two opposite implications. On the positive side, embedded work can accelerate design-partner conversion, widen the product moat, and justify premium enterprise pricing. It can also make Goodfire useful in high-stakes domains where customers need interpretation help, not just dashboards. On the negative side, services-heavy revenue generally scales more slowly and often carries a weaker gross-margin profile than pure software. Public sources do not reveal how much of Goodfire's revenue, if any, comes from software usage, annual licenses, pilots, or research services. They also do not disclose customer counts, pilot-to-production conversion, sales cycle length, retention, CAC, or payback. The technical proof points are meaningful but not financial metrics. Goodfire's RLFR research claims a 58 percent hallucination reduction at roughly 90 times lower cost than an LLM-as-a-judge approach, and the life-sciences case studies show credible customer-value stories in diagnostics and scientific discovery. Those are strong commercialization narratives, but they are not the same as disclosed revenue quality. 
For this reason the unit economics bridge is qualitative. It shows the likely path from a selective design partnership to contracted software and overages, while making clear that the realized values at each step are private.[CI009, CI010, CI011, CI012, CI015, CI016]
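The undisclosed metrics named above reduce to a few standard ratios once the underlying data is provided in diligence. The sketch below shows those ratios under common textbook definitions; the function names and all input values are hypothetical and are not drawn from any Goodfire disclosure.

```python
def conversion_rate(pilots_launched: int, pilots_converted: int) -> float:
    """Share of pilots that became commercial contracts."""
    if pilots_launched <= 0:
        raise ValueError("pilots_launched must be positive")
    return pilots_converted / pilots_launched


def cac_payback_months(cac: float, monthly_revenue: float, gross_margin: float) -> float:
    """Months of gross profit needed to recover fully loaded acquisition cost."""
    monthly_gross_profit = monthly_revenue * gross_margin
    if monthly_gross_profit <= 0:
        raise ValueError("monthly gross profit must be positive")
    return cac / monthly_gross_profit


def blended_gross_margin(software_share: float, software_margin: float,
                         services_margin: float) -> float:
    """Revenue-weighted margin across a software line and a services line."""
    if not 0.0 <= software_share <= 1.0:
        raise ValueError("software_share must be between 0 and 1")
    return software_share * software_margin + (1.0 - software_share) * services_margin


# Placeholder inputs only; real values would come from company-provided data.
print(conversion_rate(pilots_launched=10, pilots_converted=4))
print(cac_payback_months(cac=120_000.0, monthly_revenue=20_000.0, gross_margin=0.5))
print(blended_gross_margin(software_share=0.5, software_margin=0.8, services_margin=0.4))
```

The blended-margin function illustrates why the services share matters for valuation: holding each line's margin fixed, a higher services share pulls the blended figure toward the lower services margin.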
| metric | value / null | confidence | why it matters | diligence ask |
|---|---|---|---|---|
| Public list price for Silico | null | low | Without list or starting price, outsiders cannot bracket ACV or customer segmentation. | Request current price card or anonymized quote set by deployment type. |
| Average contract value (ACV) | null | low | ACV is needed to translate selective design-partner traction into revenue scale. | Provide ACV distribution for pilots, enterprise subscriptions, and strategic partnerships. |
| Usage gross margin | null | low | Consumption software can be high margin, but embedded compute or human delivery can compress it. | Provide gross margin by platform usage line and by services line. |
| Services revenue share | null | low | A services-heavy mix changes scalability and valuation framework. | Disclose software-versus-services mix for the last twelve months. |
| Pilot-to-production conversion rate | null | low | This is the clearest proxy for revenue quality in a selective-enterprise GTM model. | Provide count of pilots launched, converted, and churned. |
| Sales cycle length | null | low | Long enterprise and healthcare procurement cycles can delay revenue recognition and cash collection. | Disclose median cycle from first contact to signed order form by customer segment. |
| CAC payback | null | low | Necessary to judge whether high-touch GTM is economically durable. | Provide fully loaded CAC and gross-margin payback by cohort. |
| Retention / expansion | null | low | Overages and usage growth matter only if accounts renew and expand. | Provide logo retention, gross retention, and expansion rates for paying accounts. |
Null means the metric is not publicly disclosed in reviewed sources, not that the metric is zero or irrelevant.
[CI012, CI016, CI022, CI023, CI029, CI030]
Qualitative bridge from acquisition motion to blended economics, highlighting where public evidence ends and diligence requests must begin.
The bridge is intentionally qualitative because Goodfire does not disclose ACV, CAC, payback, retention, or gross margin.
[CI012, CI014, CI015, CI016, CI022, CI023]
4.3 Capital adequacy and financing: funding is verified, runway is not
The strongest financial facts in the public record are financing facts. Goodfire announced a $50 million Series A in April 2025 and a $150 million Series B at a $1.25 billion valuation in February 2026. The SEC Form D filings sharpen those announcements. The 2025 filing reports $52,029,991 sold after a first sale on 2025-04-02, and the 2026 filing reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124. On that narrow basis, at least $202.0 million of equity sold in the two disclosed rounds is directly verifiable from primary filing data, and public commentary places total funding modestly above $200 million including earlier capital.
That financing fact pattern supports one clear conclusion: Goodfire has had strong capital access. It does not answer the central capital-adequacy question. No reviewed public source discloses cash on hand, monthly burn, runway months, debt covenants, or a next-round trigger. The public uses of funds are broad: frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences. Those are real cash uses, but they are not enough to derive runway, because runway is cash on hand divided by net burn and neither input is public. Even the relatively small additional clue in the 2026 Form D, a total offering amount about $11.7 million above the reported sold amount, only shows possible capacity or reserve in the round, not actual cash still available.
The financial estimate range therefore stays disciplined and only brackets financing facts that are source-backed. It does not invent revenue, burn, or runway. Likewise, the capital-intensity map highlights where cash likely goes (product, research, embedded delivery, and enterprise compliance) while preserving the distinction between documented financing and inferred cost structure. This is the correct evidence-constrained stance: the raise is verified, but capital adequacy beyond the raise cannot be underwritten from public data.[CI001, CI002, CI003, CI004, CI005, CI006]
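The Form D reconciliation and the reason runway stays undefined can be checked with a few lines of arithmetic. This is a minimal sketch: the three dollar constants are the Form D figures quoted in this section, while the `runway_months` helper and its parameter names are illustrative assumptions, not anything Goodfire publishes.

```python
# Form D figures as quoted in this section (USD).
SERIES_A_SOLD = 52_029_991       # 2025 filing, amount sold
SERIES_B_SOLD = 149_999_796      # 2026 filing, amount sold
SERIES_B_OFFERING = 161_674_124  # 2026 filing, total offering amount


def runway_months(cash_on_hand, monthly_net_burn):
    """Runway = cash / net burn. Returns None if either input is unknown
    or if the company is not burning cash (runway is then not finite)."""
    if cash_on_hand is None or monthly_net_burn is None:
        return None
    if monthly_net_burn <= 0:
        return None
    return cash_on_hand / monthly_net_burn


verified_sold = SERIES_A_SOLD + SERIES_B_SOLD
unsold_capacity = SERIES_B_OFFERING - SERIES_B_SOLD

print(f"Verified equity sold across both filings: ${verified_sold:,}")
print(f"Series B offering amount not yet sold:    ${unsold_capacity:,}")

# Neither cash nor burn is public, so the only defensible output is None.
print("Public runway estimate:", runway_months(None, None))
```

The helper deliberately returns `None` rather than a guess when inputs are missing, which mirrors the null entries in the liquidity table.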
| item | public value / status | confidence | why it matters | diligence ask |
|---|---|---|---|---|
| Verified Series A financing | Announced $50M; Form D shows $52.029991M sold | high | Primary evidence confirms external capital raising in 2025. | Reconcile press-announced round size to cap-table-close documents. |
| Verified Series B financing | Announced $150M at $1.25B valuation; Form D shows $149.999796M sold and $161.674124M total offering | high | Primary evidence confirms large 2026 financing and possible residual offering capacity. | Provide final close schedule and whether any unsold allocation remained available. |
| Cumulative disclosed capital since Series A | At least $202.029787M sold across 2025-2026 Form D filings; public commentary says backing exceeds $200M overall | high | This is the strongest public capital-adequacy anchor. | Provide total capital raised including seed and remaining unrestricted cash. |
| Cash on hand | null | low | Cash balance is required to convert financing history into actual runway. | Provide current unrestricted cash and short-term investments. |
| Monthly burn | null | low | Without burn, no public runway estimate is defensible. | Provide last six months of net burn and planned spend by function. |
| Runway months | null | low | Runway is the core adequacy metric after a large financing round. | Provide management runway view under base and downside plans. |
| Planned use of funds | Frontier research, next-generation core product, and scaling partnerships across AI agents and life sciences | medium | Confirms capital is funding both R&D and GTM, not just balance-sheet preservation. | Provide board-approved use-of-proceeds model with timing and budget buckets. |
| Debt / project-finance obligations | No public debt or project-finance obligations identified in reviewed sources | low | Absence of disclosure is not proof of zero leverage, but no public obligation surfaced here. | Provide debt schedule, venture debt terms, leases, and any committed compute obligations. |
This table separates verified financing facts from unavailable liquidity metrics. Null values reflect missing public disclosure, not negative findings.
[CI001, CI002, CI003, CI004, CI005, CI006]
Source-backed financial ranges limited to financing facts; revenue, burn, and runway are excluded because they are not publicly disclosed.
Low/base/high values reconcile press-announced financing, Form D sold amounts, and broader public commentary about total backing. This figure does not invent ranges for revenue, burn, or runway.
[CI001, CI002, CI003, CI004, CI005, CI034]
Matrix showing where public capital evidence exists and where operating-cash evidence is still missing.
This is a structured evidence map, not a quantified cash-flow statement. The purpose is to keep verified financing separate from missing operating-liquidity data.
[CI006, CI019, CI020, CI037, CI038]
4.4 Financial verdict and public gaps: verified funding, inferred monetization, unresolved underwriting
The evidence supports a precise but narrow verdict. Goodfire is not financially unformed; it has a verified enterprise product surface, real external financing, named partners in regulated and frontier domains, enterprise-security credentials, and commercial contracts that contemplate fees, overages, and usage measurement. Those are the ingredients of a real business. But almost every metric needed to judge revenue quality and margin path remains private. There is no public revenue, no public ARR, no gross-margin disclosure, no cash balance, no burn, no runway, and no debt schedule.
That gap matters because the likely business model is mixed. The software platform could become a valuable recurring-revenue layer if usage and overages dominate. Yet the customer evidence and services clauses imply that at least part of the current offering includes embedded scientific and engineering labor. Without knowing the software-versus-services split, investors cannot tell whether Goodfire should be valued more like enterprise infrastructure software, specialized applied-research services, or a hybrid that starts services-heavy and software-light before maturing.
The adverse read is straightforward. One skeptical sector analysis argues that the $1.25 billion valuation is aggressive for a company with early commercial traction and not yet a predictable SaaS profile. That critique is directionally fair given the public data: the capital has been disclosed, but the operating model has not.
The underwriting answer is therefore to separate what is verified from what is inferred. Verified: financing, enterprise contracting mechanics, security readiness, and selective customer traction. Inferred: monetization mix, gross-margin path, and runway durability. The gaps table below captures the exact diligence requests needed before a financial investment case can move from plausible to underwritten.[CI017, CI018, CI020, CI025, CI029, CI030]
| missing private metric | impact on underwriting | exact diligence path |
|---|---|---|
| Revenue / ARR by quarter | Cannot test whether valuation is supported by actual commercial scale. | Request monthly recurring revenue bridge, quarterly revenue, and last-twelve-month ARR walk. |
| Realized pricing by customer type | Cannot distinguish premium software economics from service-heavy bespoke work. | Request anonymized signed order forms and invoice samples across enterprise, healthcare, and research customers. |
| Software versus services revenue mix | Cannot underwrite gross-margin path or scalability. | Request management split of platform, overage, pilot, and services revenue for the last twelve months. |
| Gross margin and contribution margin | Cannot assess whether consumption and embedded-delivery costs support durable unit economics. | Request gross margin by revenue line, plus cost buckets for compute, support, and personnel. |
| Cash balance and burn | Cannot estimate runway or next financing need despite large recent rounds. | Request cash, debt, net burn, and planned hiring / research spend through the next 24 months. |
| Sales efficiency and retention | Cannot judge whether selective GTM converts into repeatable enterprise software economics. | Request pipeline conversion, sales cycle, CAC, payback, logo retention, and expansion metrics. |
Every row here is a material diligence blocker rather than a cosmetic omission. These gaps are the reason this chapter remains evidence-constrained.
[CI029, CI030, CI031, CI037, CI038, CI040]
05 Product & Technology
5.1 Product definition and customer workflow
Goodfire's commercial surface is best understood as a model-design environment rather than as a generic LLM observability dashboard. Silico is presented as the first platform for intentional model design, a workspace for training and debugging models on Goodfire infrastructure, and a system that packages productized interpretability around concrete jobs: seeing inside predictions, running health checks, debugging failures, shaping behavior, and improving generalization. The practical consequence is that the product sits much closer to model-development loops than to standard application-layer analytics. The customer workflow is also unusually high touch. Public pages repeatedly push teams into request-access or partnership motions instead of a self-serve onboarding path. In practice, the workflow starts with a model team that already controls weights, activations, or at least enough internals to let Goodfire inspect how the model behaves. Goodfire then pulls models, datasets, prompts, workflows, and evaluation tasks into a shared workspace, runs agent-assisted experiments, and translates the resulting mechanistic findings into interventions such as steering, diagnostics, data filtering, or reward shaping. The vertical pages show the same loop repeated across domains. Language teams use the stack to reduce hallucinations; life-sciences teams use it to extract biomarkers and variant hypotheses from model internals; robotics and vision teams use it to catch brittle features and leakage before deployment. The result is a product with real workflow specificity, but one that still depends on customer willingness to operate in a shared, research-adjacent environment rather than through a mature, commodity API surface.[CE001, CE002, CE003, CE004, CE005, CE006]
| Module / asset / product line | Primary user | Status / maturity | Differentiation | Diligence gap |
|---|---|---|---|---|
| Silico shared workspace | Frontier labs and enterprise model teams | Live product surface; access controlled | Packages interpretability around a model-design environment rather than an app-level dashboard | No public tenant model, API reference, or deployment architecture |
| Model scientist agent / experiment orchestration | Researchers and model engineers | Live internally and publicly described in launch materials | Automates experiment planning and execution inside the same workspace | Human-review rules, guardrails, and customer autonomy levels are not public |
| Diagnostics and health checks | Training, evaluation, and safety teams | Live workflow claims | Surfaces bottlenecks, feature collapse, shortcut learning, and rare failures before deployment | No published precision/recall or benchmark coverage by model class |
| Steering and intervention controls | AI engineers tuning model behavior | Live but still evolving after preview-tool deprecation | Direct feature steering, reward shaping, and data-filtering style edits | Supported-model matrix, rollback controls, and commercial packaging are private |
| Language reliability workflow | Open-model or fine-tuning teams | Most concrete public workflow | 58% hallucination reduction claim plus rollout viewer for intervention review | Evidence is strong but still concentrated in Goodfire-selected case studies |
| Scientific discovery workflow | Genomics and life-sciences researchers | Advanced partner workflow | Turns model internals into biomarkers, pathogenicity probes, and human-readable variant hypotheses | Clinical validation and regulatory pathway remain partner-specific |
| Physical AI / creative workflow assets | Robotics, vision, and image-model teams | Partner workflow or research preview | Extends same interpretability primitives into policy bottlenecks, leakage detection, and latent editing UIs | Commercial status and repeatability outside case studies are not public |
Rows combine public product modules and workflow assets because Goodfire markets the platform through problem-specific surfaces rather than through a public SKU sheet.
[CE001, CE002, CE003, CE004, CE007, CE011]
| User job | Current workflow | Goodfire solution | Measurable benefit | Limitation |
|---|---|---|---|---|
| Reduce LLM hallucinations before deployment | Prompt tweaks, judge loops, and post-hoc output review | RLFR, feature steering, and the Hallucinations Viewer inside the model-design environment | 58% hallucination reduction and roughly 90x lower intervention cost versus LLM-as-judge claims | Evidence is workflow-specific and not a universal performance guarantee |
| Debug frontier reasoning-model behavior | Prompt hacks and coarse response benchmarking | Reasoning-model SAEs, feature databases, and timing-aware steering on R1 | Shows reasoning-specific features like backtracking and exposes steering edge cases at large scale | Requires weight or activation access and expert handling of model-specific behavior |
| Extract biomarkers from a scientific model | Black-box prediction review and wet-lab triage | Embedded interpretability work using SAEs, tracing, and ablation on customer models | Surfaced a novel Alzheimer's biomarker class and a human-readable classifier that generalized to an independent cohort | Still requires downstream experimental validation |
| Explain genome-wide variant effects | Opaque pathogenicity scores and coding-region-limited tools | Evo 2 embeddings plus probes and reasoning-model synthesis through EVEE | 0.997 AUROC on 839k ClinVar variants and structured hypotheses for 4.2M variants | Outputs are hypotheses, not diagnoses or regulatory-grade evidence |
| Catch robotics or vision failures before deployment | Wait for benchmark misses or production failures | Inspect latent policy structure, geometry, and leakage before deployment | Can localize bottlenecks, unused observations, and ECG leakage in reviewed case studies | Public evidence is case-study based rather than product-documentation based |
| Edit image-model behavior directly | Prompt-box iteration only | Paint With Ember canvas that manipulates latent activations and concept weights | Supports adding, moving, and reshaping concepts without only rewriting prompts | This looks like a research preview rather than the core commercial SKU |
Benefits mix directly claimed research outcomes with workflow-specific demonstrations. Goodfire does not publish customer-level ROI, conversion, or usage-frequency metrics for these flows.
[CE004, CE005, CE006, CE009, CE010, CE012]
The public workflow starts with a partner-led access motion, moves through shared interpretability experiments, and ends in targeted steering or design decisions.
This operating flow synthesizes the recurring pattern across language, life sciences, robotics, and launch materials. Public sources do not expose a formal buyer playbook or conversion funnel.
[CE003, CE004, CE007, CE011, CE016, CE033]
5.2 Interpretability primitives and operating architecture
Goodfire's architecture pairs a shared experiment workspace with a research stack that spans activation analysis, geometry discovery, parameter decomposition, and intervention tooling. The official research surface shows sparse autoencoders, probes, and manifold methods doing the early-stage work of surfacing interpretable features; neural-geometry work argues that many important concepts live on curved internal manifolds rather than single directions; and stochastic parameter decomposition pushes the stack deeper into weights, where Goodfire tries to identify which causal components can be removed without changing outputs. That combination suggests the platform is not a single technique but a layered toolkit for interpreting, localizing, and editing model behavior.
The R1 work is especially revealing because it shows both capability and friction. Goodfire says it trained the first public sparse autoencoders on a frontier reasoning model and had to build custom inference and interpreter-model infrastructure to do so. At the same time, the work shows that steering reasoning models is not plug-and-play: interventions had to happen after the model's stock response prefix, and some heavy-handed steering caused behavior to snap back toward the original response. That makes the core product proposition stronger, not weaker: the whole point of Silico is to expose these hidden operational constraints before customers ship or retrain blindly.
This architecture also explains Goodfire's dependency stack. The deepest workflows require access to model internals, which makes open-weight or customer-controlled models a better fit than closed API endpoints. It also explains why Goodfire can reuse the same core ideas across domains. EVEE, Alzheimer's biomarker work, Paint With Ember, and robotics bottleneck analysis all share the same pattern: pull out internal structure, translate it into something legible, then use that understanding to debug, steer, or design the model more intentionally.[CE013, CE014, CE018, CE019, CE020, CE021]
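For readers unfamiliar with the mechanics, the "pull out internal structure, then steer" pattern can be illustrated with a toy sparse autoencoder. This is a generic textbook-style sketch, not Goodfire's implementation: the weights are hand-picked rather than trained, the dimensions are tiny, and every function name here is hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_encode(activation, w_enc, b_enc):
    """Map a model activation to non-negative feature strengths (ReLU)."""
    pre = [p + b for p, b in zip(matvec(w_enc, activation), b_enc)]
    return [max(0.0, p) for p in pre]


def sae_decode(features, w_dec, b_dec):
    """Reconstruct the activation as a weighted sum of feature directions.

    Each row of w_dec is one feature's direction in activation space.
    """
    out = list(b_dec)
    for strength, direction in zip(features, w_dec):
        for j, d in enumerate(direction):
            out[j] += strength * d
    return out


def steer(activation, feature_index, strength, w_dec):
    """Push the activation along one feature's decoder direction."""
    direction = w_dec[feature_index]
    return [a + strength * d for a, d in zip(activation, direction)]


# Toy, hand-picked weights (ASSUMPTION: a trained SAE would learn these).
W_ENC = [[1.0, 0.0], [0.0, 1.0]]
B_ENC = [0.0, 0.0]
W_DEC = [[1.0, 0.0], [0.0, 1.0]]
B_DEC = [0.0, 0.0]

x = [1.0, -0.5]
features = sae_encode(x, W_ENC, B_ENC)      # feature 1 is inactive
steered = steer(x, 1, 2.0, W_DEC)           # amplify feature 1
print("features before:", features)
print("features after: ", sae_encode(steered, W_ENC, B_ENC))
```

In a real deployment the steered activation would be written back into the model's forward pass, which is why these workflows need access to weights or activations rather than a closed API.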
| Layer / process / component | Role | Dependency | Risk |
|---|---|---|---|
| Customer model and materials ingestion | Brings weights, datasets, files, code, prompts, and workflows into the workspace | Customer must control or expose enough internals for analysis | Closed API models and restrictive data-sharing rules can block the deepest workflows |
| Shared workspace and agent orchestration | Runs experiments, captures outputs, and coordinates interpretability tasks on Goodfire infrastructure | Goodfire compute, inference, and agent tooling | Tenancy, region layout, and review/approval controls are not public |
| Activation interpretability layer | Uses SAEs, probes, and related tools to localize model features and signals | Activation access plus trained interpreter models | Linear feature methods can miss global curved structure |
| Geometry / manifold layer | Recovers structured concept spaces for smoother understanding and control | Clustering and geometry-discovery pipelines over internal representations | Research maturity is high, but packaged product boundaries are not fully public |
| Parameter decomposition layer | Inspects weights as causal components rather than only observing activations | SPD-style decomposition and masking methods | Scalability, runtime cost, and product packaging remain partially research-stage |
| Monitoring and failure-surfacing layer | Uses amplified sampling and eval-awareness analysis to catch rare post-training failures | Before/after checkpoints, rollout analysis, and judge infrastructure | Monitoring findings can depend on prompt design and may not generalize automatically |
| Intervention and steering loop | Applies feature steering, filtering, reward shaping, and targeted model edits | Edit permissions, rollback discipline, and model-specific heuristics | Wrong timing or oversteering can cause route-around behavior in reasoning models |
| Service and commercial delivery layer | Adds support, technical assistance, field engineering, and research collaboration around the platform | Order forms, Goodfire personnel, and partner workflows | High-touch delivery can slow scaling and hide how much value is software versus services |
This is an evidence-backed operating architecture, not an official engineering diagram. It distinguishes public method layers from undisclosed infrastructure details such as tenancy, vendor stack, and data residency.
[CE018, CE020, CE021, CE022, CE023, CE024]
Silico stacks customer-controlled model access, shared experimentation, interpretability primitives, and intervention tooling into a single model-design environment.
This stack is inferred from product pages, research posts, launch coverage, and legal terms. Goodfire does not publish a canonical architecture diagram or vendor-by-vendor infrastructure map.
[CE001, CE016, CE018, CE023, CE026, CE033]
Silico depends on customer access to model internals, Goodfire-controlled experiment infrastructure, contractual order forms, and domain-specific partner contexts.
The map is a synthesis of public product, legal, and launch materials. It highlights the practical dependency that Goodfire works best when customers can expose model internals rather than only call opaque APIs.
[CE017, CE033, CE034, CE036, CE037, CE038]
5.3 Trust, quality, and compliance posture
Goodfire's public trust posture is more mature on enterprise security than on public operational transparency. The strongest visible procurement signal is the company's SOC 2 Type II announcement, which says the audit completed with no exceptions and is accompanied by a public SOC 3 summary. Health-facing materials add another layer by describing Mayo-specific privacy protocols and governance frameworks designed to reduce spurious correlations and improve clinical relevance. Those are meaningful indicators for buyers in regulated environments. The legal surface, however, makes clear that many of the operational details investors and enterprise architects normally want to inspect are still private. The terms of use define the platform broadly to include software, APIs, tools, documentation, support, and services, but the concrete economics live in negotiated order forms. Usage reports are authoritative for billing, overages exist, and pilots are explicitly provided on an AS IS basis unless an order form says otherwise. Public terms also reserve suspension rights for security, legal, operational, and payment reasons and allow third-party products into the delivery stack. This is credible enterprise contract scaffolding, but it still leaves important diligence gaps. Public materials do not disclose a self-serve API reference, a public status page, deployment-count evidence, tenancy architecture, or quantitative uptime history. That matters because the external frameworks Goodfire is selling into are increasingly intolerant of black-box governance. NIST focuses on trustworthiness across design, development, use, and evaluation, while Gartner warns that hidden governance and change-management costs can dominate ROI in high-stakes GenAI deployments. Goodfire is directionally aligned with those buyer needs, but still early in how much public operating evidence it exposes.[CE033, CE034, CE035, CE036, CE037, CE038]
| Control / certification / quality metric | Status | Scope | Gap |
|---|---|---|---|
| SOC 2 Type II / SOC 3 | Achieved; Type II announced with no exceptions | Enterprise security and procurement assurance | Does not substitute for public uptime or architecture transparency |
| Order-form commercial controls | Live contractual structure | Fees, overages, service scope, and commercial commitments | No public rate card or public benchmark for deal terms |
| Pilot program guardrails | Live evaluation structure | Internal evaluation only; separate commercial license required after pilot | Default pilot terms are AS IS and do not publish service levels |
| Usage reports and metering | Live billing control | Goodfire records are authoritative for fee calculation and usage summaries | Public documents do not disclose exact metering units, quotas, or thresholds |
| Suspension and third-party-product governance | Live contractual control | Security, legal, operational, payment, and third-party integration handling | Fallback procedures and vendor list are not public |
| Mayo privacy and governance protocols | Partner-specific public commitment | Health and genomics collaboration | Not a generic public privacy architecture for all customer deployments |
| Public transparency surface | Limited | Trust portal, contact path, and security summary | No public status page, self-serve API docs, incident history, or deployment-count disclosure |
The table separates formal procurement signals from missing public operating evidence. Goodfire looks stronger on negotiated enterprise controls than on broad public transparency.
[CE034, CE035, CE037, CE038, CE039, CE040]
5.4 Roadmap, release cadence, and maturity
Goodfire's release cadence looks more like a fast-moving research organization productizing an internal stack than like a traditional enterprise software vendor with a stable public changelog. One public breadcrumb is the February 2026 deprecation notice for the earlier SAE demo interface and API, which implies a transition away from narrow research-preview tooling. By late April 2026, MIT Technology Review was covering Silico as an externally available product, and the company's own financing press materials were already framing the roadmap around next-generation product development plus scaled partnerships across AI agents and life sciences. The cadence after launch is still primarily expressed through research drops. In May 2026 alone, Goodfire published work on eval-awareness measurement, story-shape geometry, and SAE-based geometry recovery. That is unusually fast public iteration for a company trying to sell into enterprise and regulated workflows. It also means roadmap visibility is asymmetric: buyers can see the scientific engine moving quickly, but cannot yet inspect a normal SaaS artifact trail such as versioned release notes, public incident history, or a broad integration catalog. The resulting maturity picture is mixed but coherent. Core scientific capability appears strong, and the domain workflows in language, genomics, and scientific discovery are more than conceptual. Security posture is enterprise credible. The main immaturity is packaging: access remains negotiated, many deployments appear service-attached, and several key reliability and integration details remain private. Goodfire therefore looks most mature as a high-end design environment for teams with serious model ownership, and least mature as a broadly standardized developer platform.[CE015, CE017, CE039, CE044, CE046, CE047]
| Date / stage | Feature / milestone | Status | Implication | Source |
|---|---|---|---|---|
| Pre-Feb 2026 preview | Standalone SAE demo interface and API | Deprecated in Feb 2026 | Goodfire consolidated from narrow preview tooling toward a broader platform motion | Feature Steering blog |
| 2026-02 strategic thesis | Intentional design and next-generation core-product narrative | Publicly articulated | Roadmap is anchored on closed-loop training control, not only on post-hoc explanation | Intentional Design + PR Newswire |
| 2026-04-30 | Silico launch / external unveiling | Live product surface | Internal interpretability tooling became an externally offered product with case-by-case pricing | MIT Technology Review |
| 2026-05-04 | Verbalized eval awareness paper | Published | Public research cadence focuses on reliability and benchmark quality for safety-conscious buyers | Goodfire Research |
| 2026-05-20 | The Shape of Stories Inside Neural Networks | Published | Shows weekly geometry research output rather than a classic SaaS changelog pattern | Goodfire Research |
| 2026-05-21 | Can SAEs Capture Neural Geometry? | Published | Continues tooling work that can feed future control surfaces and geometry-aware methods | Goodfire Research |
| 2026 security milestone | SOC 2 Type II / SOC 3 | Achieved | Procurement readiness is moving faster than public ops telemetry | Goodfire blog |
| 2026 partner build-out | Mayo, Prima Mente, and Radical domain workflows | Active programs | Roadmap includes scientific verticalization in genomics, healthcare, and materials, not just generic LLM tooling | Goodfire partner/customer pages |
Goodfire exposes roadmap mainly through research posts, partner announcements, and financing narratives rather than through a public changelog. Dates therefore track public milestones, not a version-history feed.
[CE015, CE039, CE047, CE048, CE049, CE050]
Maturity is strongest in the core interpretability engine and domain-specific workflows, and weakest in public platform packaging and transparent operating telemetry.
Ratings are qualitative judgments from public evidence only. They measure visible maturity, not internal product quality or customer satisfaction.
[CE015, CE039, CE044, CE046, CE047, CE048]
5.5 Exhibits
06 Customers
6.1 Customer segmentation and buying centers
Goodfire's public customer story centers on organizations that build or fine-tune foundation models rather than end-user application buyers. The clearest broad segmentation claim comes from the company's contact page, which says the platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. Product pages sharpen that picture: Silico is pitched to teams training or fine-tuning models across architectures and modalities, the language page targets LLM developers who want to predict failures and improve behavior without retraining from scratch, the life-sciences page targets genomics and scientific-model teams, and the robotics / vision page targets physical-AI and medical-imaging workflows. Across those surfaces, the likely economic buyer is an R&D, platform, or product owner responsible for model performance and reliability, while the day-to-day users are research scientists, ML engineers, and interpretability specialists. The important caveat is that Goodfire does not translate those segment claims into counts, revenue mix, or named enterprise references. Public materials do not disclose customer count, ARR, segment share, or a list of the Fortune 500 users behind the broad enterprise claim. The public proof set is therefore much deeper in vertical specificity than in commercial breadth: named evidence clusters in genomics, clinical research, AI-agent safety, and materials discovery, with the rest of the enterprise narrative still mostly unenumerated. That asymmetry suggests a selective, high-touch go-to-market motion in which Goodfire wins a small number of technically sophisticated design partners first and only later may broaden toward more standardized enterprise software distribution.[CU001, CU002, CU003, CU004, CU005, CU006]
| Segment | Buyer / user / payer | Primary use case | Public proof | Strategic value | Gap |
|---|---|---|---|---|---|
| Frontier model labs and AI research teams | Buyer: research / platform lead; user: interpretability researcher and ML engineer; payer: R&D or model platform budget | Inspect internals, debug failures, shape training, monitor deployment | Silico page, Series B post, MIT Technology Review | Core category where Goodfire can become workflow infrastructure | No public account count or list of labs beyond named references |
| Healthcare and genomics institutions | Buyer: medical AI leader or scientific program owner; user: computational biology / genomics team; payer: research or translational medicine budget | Interpret scientific models, surface biomarkers, explain variant effects, validate model reasoning | Mayo Clinic, Prima Mente, Arc Institute, EVEE research | Highest-quality named proof and strongest differentiated outcomes | Most evidence is still research-stage, not routine clinical production |
| Large enterprises / Fortune 500 | Buyer: enterprise AI or product owner; user: ML / safety / model operations teams; payer: innovation, platform, or business-unit budget | Improve reliability, controllability, and ROI of internal models | Contact page and Salesforce Ventures thesis | Could materially broaden ACV if the broad claim converts to named logos | No named Fortune 500 accounts or disclosed outcomes |
| AI-agent platforms and consumer-internet operators | Buyer: safety / product leader; user: guardrail and infrastructure teams; payer: platform engineering budget | Detect PII, monitor agent behavior, deploy lightweight guardrails | Rakuten production deployment | Best public proof that Goodfire can support live enterprise workflows | Only one named production enterprise case in reviewed sources |
| Materials and physical-science teams | Buyer: scientific program lead; user: model scientist and autonomous-lab team; payer: R&D budget | Use internals to improve inverse design and candidate targeting | Radical AI partnership and self-correcting-search research | Expands Goodfire beyond biology into broader in silico discovery | Commercial maturity and repeatability remain early |
Rows summarize public segment evidence only. Nulls and unnamed enterprise claims indicate missing disclosure rather than absence of customers.
[CU001, CU002, CU003, CU004, CU005, CU008]
Public evidence points to a selective enterprise journey: identify a high-stakes model problem, engage Goodfire as a design partner, work in a shared environment, validate technical gains, and then expand into broader monitoring or research programs.
[CU001, CU003, CU009, CU011, CU017, CU022]
6.2 Named customer proof and adoption motion
The named proof set shows that Goodfire is doing real work for customers and partners, but the type of proof varies materially by account. Prima Mente is the clearest model-to-science case study: Goodfire says it embedded researchers with Prima Mente, interpreted the Pleiades epigenomics model, and helped identify a novel class of blood-borne biomarkers for Alzheimer's detection. Arc Institute is a strong scientific reference showing Goodfire can work with frontier biological foundation models at scale; however, Arc evidence is still best understood as a research collaboration rather than a conventional software deployment, especially because the initial steering work was described as early stage. Mayo Clinic similarly supports category credibility, governance readiness, and clinical adjacency, but the public record frames the work as research and hypothesis generation rather than routine clinical deployment. Rakuten stands apart because it is the clearest public production-style deployment: Goodfire says Rakuten deployed SAE probes for PII detection in AI agents after the system had to generalize from synthetic training data to real multilingual traffic with high recall requirements. Radical AI adds a fifth named proof point in materials science, but commercialization maturity remains early because the public disclosure emphasizes technical progress and promises more detail later. Taken together, the adoption motion looks consultative and deeply collaborative. Goodfire repeatedly describes a shared environment, selective design-partner engagement, embedded work, and case-by-case pricing rather than self-serve onboarding. That is a credible way to launch an advanced infrastructure product, but it also means that the current evidence base proves depth of technical engagement more clearly than repeatable, scaled software distribution.[CU010, CU011, CU012, CU013, CU014, CU015]
| Metric | Value | Date | Source quality | Implication | Missing denominator |
|---|---|---|---|---|---|
| Broad customer categories disclosed | Fortune 500 enterprises, major healthcare institutions, AI research labs | 2026-06-10 | medium | Goodfire markets beyond pure research labs | No count by category or logo list |
| Named public collaborators / customers with specific use cases | 5 | 2026-06-10 | high | Public proof set includes Prima Mente, Arc Institute, Mayo Clinic, Rakuten, and Radical AI | Not total customer count |
| Named proof points with quantified technical outcomes | 4 | 2026-06-10 | medium | Prima Mente, Mayo EVEE, Rakuten, and Radical disclose measurable technical results | Outcome metrics are technical, not commercial |
| Named proof points explicitly described as production deployment | 1 | 2026-06-10 | high | Rakuten is the clearest production-style enterprise account | No disclosed production account count across the rest of the base |
| Pricing disclosure | Case-by-case and request-access | 2026-04-30 | high | Sales motion appears enterprise and consultative | No public pricing tiers or contract ranges |
| Public follow-on evidence after initial collaboration announcement | 2 | 2026-06-10 | medium | Arc and Mayo have later public updates, suggesting some relationship continuity | Follow-on evidence is not the same as paid renewal |
| Public customer count / ARR / NRR | Null | 2026-06-10 | high | Commercial scale cannot be quantified from public evidence | Core denominator for adoption and durability is undisclosed |
Count rows refer to the public proof set visible in reviewed sources, not to Goodfire's total customer base. Null means undisclosed.
[CU001, CU006, CU007, CU022, CU024, CU025]

| Customer / partner | Segment | Deployment / use case | Production vs pilot | Outcome | Limitation |
|---|---|---|---|---|---|
| Prima Mente | AI neuroscience / life sciences | Interpret Pleiades epigenomics model to surface disease signals and improve model design | High-touch research collaboration; not disclosed as routine clinical production | Novel class of blood-borne Alzheimer's biomarkers identified; fragmentomics/fragment length highlighted | Experimental validation and publication are still pending |
| Arc Institute | Genomics foundation-model research | Interpret Evo 2 representations and explore steerable biological features | Research collaboration with later Nature-linked validation; commercial terms undisclosed | Feature discovery across coding sequences, protein structure, and tree-of-life representations | Initial steering work was described as early stage |
| Mayo Clinic | Major healthcare institution / genomic medicine | Reverse engineer genomics foundation models and launch EVEE variant-effect explorer | Research and translational collaboration; not disclosed as routine clinical deployment | 0.997 AUROC on 839k ClinVar variants; interpretable predictions for all 4.2M ClinVar variants | Work is undergoing peer review and computational outputs are not diagnoses |
| Rakuten | Enterprise AI-agent platform | Detect PII in multilingual user messages for AI agents | Production deployment | SAE probes deployed with strong synthetic-to-real generalization and major cost savings vs LLM-as-judge | Only one named production enterprise deployment is public |
| Radical AI | Materials discovery / autonomous lab | Improve inverse materials design using self-correcting search on MatterGen | Early design partnership / technical proof | ~27% overall increase in successful candidates and ~30% more SUN materials in target range | Public disclosure leaves commercialization and repeat usage unclear |
This is an intentionally partial public-proof enumeration. It distinguishes named, use-case-specific evidence from broader but unnamed enterprise claims.
[CU010, CU012, CU013, CU014, CU016, CU017]
Because customer counts are undisclosed, the adoption figure is shown as a deployment flow rather than a numeric funnel: Goodfire appears to move from selective prospecting to shared-environment work, technical validation, and only occasionally disclosed production rollout.
[CU009, CU011, CU022, CU024, CU029, CU030]
The matrix compares the public quality of each named reference account across disclosure, quantified outcomes, production maturity, independent corroboration, and retention visibility.
[CU013, CU015, CU018, CU020, CU024, CU025]
6.3 Durability, expansion, and concentration risks
Goodfire's customer durability story is the weakest part of the public record. No reviewed source disclosed NRR, GRR, churn, renewal rates, contract length, seat expansion, customer concentration, or satisfaction metrics such as NPS. The company also does not publish customer count, so outside investors cannot tell whether the business has a few large design partners or a broader installed base. The best available durability proxies are continuity signals in public collaboration history: Arc moved from an early-2025 announcement to a later Nature-linked update, and Mayo moved from a 2025 collaboration announcement to 2026 EVEE research outputs. Those signals show some relationships continue long enough to generate additional public work, but they do not prove paid renewals, revenue expansion, or long-term stickiness. Expansion potential is visible nonetheless. Goodfire can land inside high-stakes model-development workflows and then expand from research support into monitoring, training intervention, guardrails, and adjacent scientific programs. The risk is that the proof set is concentrated in a handful of named collaborators and heavily weighted toward life sciences, while the broad Fortune 500 claim remains mostly anonymous. Two independent sources sharpen the caution. MIT Technology Review praised Silico's utility but quoted Leonard Bereska arguing that Goodfire adds 'precision to the alchemy' rather than turning model design into fully principled engineering, and OnHealthcare argued that the $1.25 billion valuation looks aggressive given limited public commercial disclosure. The customer thesis is therefore promising but still fragile: Goodfire has credible reference accounts and technical outcomes, yet much of the investability question still depends on private evidence around account scale, contract economics, and repeat usage.[CU038, CU039, CU040, CU041, CU042, CU043]
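The continuity proxy discussed above reduces to simple arithmetic over the named proof set. The following is a minimal sketch, not the report's own method: the dictionary and the `continuity_share` helper are illustrative names, and a `False` flag means only that no follow-on evidence surfaced in the reviewed sources, not that the relationship ended.

```python
# Named relationships from the public proof set. True marks a later public
# follow-on update (Arc and Mayo, per the continuity signals above).
# False means no follow-on surfaced in reviewed sources (an assumption
# about disclosure, not about the underlying relationship).
named_relationships = {
    "Prima Mente": False,
    "Arc Institute": True,
    "Mayo Clinic": True,
    "Rakuten": False,
    "Radical AI": False,
}


def continuity_share(relationships: dict[str, bool]) -> float:
    """Share of named relationships with later public follow-on evidence."""
    if not relationships:
        raise ValueError("no named relationships to evaluate")
    return sum(relationships.values()) / len(relationships)


share = continuity_share(named_relationships)
with_follow_on = sum(named_relationships.values())
print(f"{with_follow_on} of {len(named_relationships)} = {share:.0%}")
# prints "2 of 5 = 40%"
```

The result is a disclosure-continuity ratio only. It says nothing about paid renewal or revenue retention, and it would change if the cohorts were defined by announcement year rather than by named account.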
| Metric | Value / null | Segment | Confidence | Diligence ask |
|---|---|---|---|---|
| Net revenue retention (NRR) | Null | All segments | high | Request customer cohort tables and expansion by account vintage |
| Gross revenue retention / churn | Null | All segments | high | Request renewal and logo-retention data by customer type |
| Contract length / commercial term | Null | Enterprise and research accounts | high | Request pricing schedule, term length, and pilot-to-paid conversion rates |
| Public continuity proxy: Arc Institute | Initial 2025 announcement followed by 2026 Nature-linked update | Genomics research | medium | Confirm whether continuity reflected paid renewal, expanded scope, or publication only |
| Public continuity proxy: Mayo Clinic | 2025 collaboration later referenced by 2026 EVEE research | Healthcare / genomics | medium | Confirm whether follow-on work sits under one master agreement or multiple phases |
| Customer satisfaction proxy | Null | All segments | high | Request NPS, reference calls, or user-review data; no public reviews surfaced in the cache |
Null means the metric was not publicly disclosed. The two continuity rows are relationship proxies only and should not be read as revenue retention metrics.
[CU019, CU024, CU038, CU039, CU040, CU046]

| Expansion driver | Concentration / friction risk | Impact | Evidence | Diligence path |
|---|---|---|---|---|
| Land from research collaboration into shared product environment | High-touch delivery may scale more like expert services than pure software | Could produce high ACV but slow logo velocity | Series B post, Silico page, Prima Mente embedded-work description | Split revenue by software subscription, services, and custom research |
| Expand from life sciences into enterprise model operations | Public named proof remains heavily weighted toward biology | Vertical concentration could distort the apparent breadth of demand | Life-sciences page, Rakuten proof, Salesforce Ventures thesis | Measure pipeline and closed-won accounts outside biology |
| Broaden from open-model research teams to enterprises | Model-access constraints may limit use with closed frontier models | Adoption may skew toward labs with parameter access | MIT Technology Review and Silico page | Document support for closed-model monitoring or partner integrations |
| Use marquee references to win Fortune 500 buyers | Fortune 500 claim is unnamed and therefore weaker than the named proof set | Enterprise credibility could be overstated relative to disclosed evidence | Contact page and named proof table | Request named references, outcomes, and reference-call permissions |
| Deepen AI-agent and guardrail use cases | Rakuten is a single disclosed production account | Category could be large, but public production proof is still thin | Rakuten research and funding / investor coverage | Provide additional production customers and renewal evidence in agent workflows |
Expansion rows reflect visible go-to-market vectors in public materials. Risks focus on disclosure gaps, concentration in the proof set, and the likely services-heavy delivery model.
[CU022, CU024, CU025, CU029, CU031, CU032]
Goodfire does not disclose true revenue-retention cohorts, so this figure shows a narrower proxy: the share of named public collaboration cohorts that later received additional public follow-on evidence. It is a continuity proxy, not NRR or customer retention.
This figure is evidence-constrained. Goodfire does not disclose customer retention metrics, so the cohort shows only later public continuity for named relationships.
[CU019, CU024, CU038, CU040]
6.4 Exhibits
07 Risks
7.1 Legal, regulatory, and contract risk
Goodfire's legal and regulatory posture is strong enough to clear initial enterprise diligence, but not yet strong enough to erase downside transfer. The positive evidence is real: Goodfire says it has achieved SOC 2 Type II, Mayo describes work under rigorous privacy and governance protocols, and the company frames interpretability as the bridge that makes sensitive AI use cases more governable. The harder underwriting read comes from the contracts. Default terms disclaim warranties around uninterrupted, secure, accurate, or error-free service; pilot and evaluation modes can operate without security or support commitments unless an order form says otherwise; and aggregate liability is capped to fees paid. Those are normal startup-software positions, but for a platform aimed at healthcare, safety, and potentially critical-infrastructure workflows they leave customers carrying a meaningful share of outage, breach, and deployment risk. Data-rights posture is the second sharp edge. The TOS gives Goodfire broad rights over Usage Data and a perpetual license over Workflow Data for improvement, evaluation, training, and commercialization, while also assigning feedback IP to Goodfire. That may be commercially rational for a research-driven platform, but it can slow procurement in regulated settings where customers want hard separation between operational traces, model behavior, and vendor product improvement. NIST's generative-AI profile and 2026 critical-infrastructure concept note both point toward more explicit risk controls, and Gartner likewise emphasizes governance, cost discipline, and realistic measurement as adoption gates. The upshot is that Goodfire does not appear to face public litigation or enforcement today, but it does face a contract-and-governance burden: if order forms do not materially improve on the default paper, expansion into regulated workloads will be slower than the brand narrative implies.[CR008, CR009, CR010, CR011, CR012, CR013]
| rule / obligation / posture | jurisdiction | status | likelihood | severity | mitigation | residual exposure | diligence path |
|---|---|---|---|---|---|---|---|
| Default warranty disclaimers and liability caps | U.S. contract law / customer order forms | Current in public MSA, pilot agreement, and TOS | High | High | Negotiate customer-specific paper, cyber insurance, and security addenda | High for regulated or safety-critical buyers | Review top 10 executed enterprise redlines versus default terms and any uncapped confidentiality / security carve-outs. |
| Broad Workflow Data and Usage Data rights | Cross-border enterprise procurement / privacy | Current in TOS | Medium-High | High | Customer-specific data-use carve-outs, de-identification controls, audit rights | Medium-High | Review DPA, data-flow maps, retention windows, and whether workflow data can be excluded from improvement/training. |
| Healthcare explainability and clinical-governance burden | U.S. healthcare / regulated research | Partially mitigated by Mayo governance language and biomarker case study | Medium-High | High | Use interpretability as validation layer, partner with regulated institutions, document governance pack | High until there is broader deployment proof | Obtain clinical-validation plan, regulatory positioning memo, and evidence of deployments beyond named research collaborations. |
| Critical-infrastructure trustworthiness expectations | U.S. critical infrastructure | Rising external expectation per NIST 2026 concept note | Medium | High | Map controls to NIST AI RMF profiles and customer model-risk workflows | Medium-High | Request sector-specific control matrix, logging / auditability architecture, and incident-response procedures. |
| Export-control and restricted-jurisdiction constraints | U.S. export / re-export law | Current in public contracts | Medium | Medium | Screen customers, geographies, and downstream model uses; use counsel on sensitive deployments | Medium | Review export-screening process and any blocked-country or restricted-end-use policy. |
| Feedback assignment and service-IP ownership | Customer/vendor IP allocation | Current in MSA and TOS | Medium | Medium | Contractual carve-outs for customer inventions and regulated workflows | Medium | Review whether enterprise paper limits feedback assignment, deliverable ownership, and derivative-work ambiguity. |
Public evidence shows strong enterprise intent but still startup-favorable default paper; rows are ordered by residual underwriting importance.
[CR008, CR009, CR010, CR011, CR012, CR013]
The heatmap places Goodfire's principal risks by mitigation maturity, showing that the company has meaningful intellectual and governance assets but still weak public proof on repeatability, customer breadth, and regulated deployment readiness.
Heatmap cells are synthesis judgments based on public evidence as of 2026-06-10, not company-internal risk scoring.
[CR009, CR011, CR016, CR019, CR024, CR025]
7.2 Technical reliability and product-proof risk
Goodfire's core product claim is ambitious: that interpretability can move model development from guesswork toward controllable engineering. The risk is that Goodfire's own research record shows how early that journey still is. The intentional-design essay says the science is incomplete and the hardest problems remain unsolved. MIT Technology Review highlights the same tension from outside, quoting a mechanistic-interpretability researcher who sees Silico as useful but still more precise alchemy than true engineering. This matters because Goodfire is not selling only dashboards; it is selling trust that its interventions expose the right internal mechanisms and safely change behavior in consequential systems. The company's recent papers reinforce the need for skepticism. Verbalized eval awareness inflates measured safety; reasoning traces can be performative rather than faithful; rare harmful or backdoored behaviors may evade standard evaluations; memorization edits can preserve some reasoning while damaging arithmetic and recall; and Goodfire's own method posts say SAEs, linear steering, and parameter decomposition all have important limitations. None of that invalidates the technology. In fact, it strengthens the case that Goodfire is doing serious work on real failure modes. But it also means buyers and investors should treat current results as advanced instrumentation, not yet as proof that model behavior is fully legible or controllable. The sharp question for diligence is whether Goodfire can turn promising research into production-grade reliability evidence faster than the surrounding AI stack commoditizes adjacent monitoring, evaluation, and tracing workflows.[CR001, CR002, CR003, CR004, CR005, CR006]
| failure mode | likelihood | severity | mitigation maturity | residual exposure | unresolved gap |
|---|---|---|---|---|---|
| Interpretability science remains incomplete, making product promises outrun causal understanding | High | High | Partial: Goodfire is publishing openly about limitations and building tooling anyway | High | Need independent production case studies showing interventions improve outcomes without hidden regressions. |
| Benchmark safety scores can be inflated by eval awareness and prompt artifacts | High | High | Partial: Goodfire has identified the distortion and prompt-rewrite mitigations | High | Need third-party eval methodology showing deployment behavior tracks benchmark performance. |
| Chain-of-thought can be performative rather than faithful on easier tasks | High | Medium-High | Partial: probes and early-exit methods help, but do not solve full faithfulness | Medium-High | Need deployment monitors that do not rely only on visible reasoning. |
| Rare harmful or backdoored behaviors may evade standard testing until after deployment | Medium-High | High | Partial: model-diff amplification appears useful for surfacing rare failures | High | Need standardized pre-deployment red-team workflow and evidence it generalizes beyond model organisms. |
| Edits that suppress memorization or steer behavior can degrade arithmetic or factual recall | Medium | Medium-High | Weak-Partial: tradeoffs are documented, not yet cleanly solved | Medium-High | Need model-quality scorecards showing what is lost as interpretability interventions are applied. |
| Current SAE / steering methods capture only fragments of geometry and can produce off-target effects | Medium | Medium | Partial: Goodfire is moving toward manifold-aware methods and SPD | Medium | Need proof that newer methods scale beyond toy models and simple demo tasks. |
| Public security posture shows SOC 2 but not public SLAs, incident history, or runtime-control detail | Medium | Medium-High | Partial: SOC 2 and trust portal exist | Medium-High | Need uptime reporting, incident history, and architecture detail for regulated buyers. |
Severity reflects whether the failure mode would break trust in Goodfire as a control layer for consequential AI, not merely whether a research result is interesting.
[CR003, CR004, CR005, CR016, CR028, CR029]
The DAG shows how Goodfire's research and contract risks transmit into slower regulated adoption, weaker reference quality, and potential valuation compression.
The DAG expresses directional business logic rather than measured probabilities.
[CR003, CR004, CR009, CR011, CR024, CR025]
7.3 Partner, customer, and dependency risk
Public market proof for Goodfire is narrower than the headline suggests. The company says the platform is used by Fortune 500 enterprises, healthcare institutions, and AI labs, but the named public evidence clusters around a small number of collaborations: Prima Mente in Alzheimer's biomarker discovery, Mayo Clinic in genomic medicine, Radical AI in materials science, and a request-access product page for companies training or fine-tuning models. Even the strongest case study describes Goodfire researchers embedding with the customer and building the workflow jointly. That is valuable evidence of technical depth, but it points to a high-touch delivery model rather than clearly repeatable software revenue. MIT Technology Review's case-by-case pricing note and On Healthcare's observation that this is not yet a predictable SaaS profile both fit the same pattern. This concentration creates two linked risks. First, public reference quality is partner-heavy rather than broad-based: if one flagship collaboration stalls, there is little disclosed volume to absorb the narrative hit. Second, the broader buyer workflow already contains adjacent products from observability and evaluation vendors such as Datadog and LangSmith, which package testing, tracing, monitoring, and governance for production AI teams. Those platforms are not mechanistic-interpretability equivalents, but they compete for budget and for the right to define what AI control and monitoring should look like in production. Goodfire therefore depends on proving that deep white-box access is a distinct control layer worth buying, not just an advanced research add-on inside a stack customers already understand.[CR006, CR007, CR018, CR024, CR025, CR026]
| dependency | counterparty / surface | role | concentration | failure scenario | severity | mitigation | residual exposure |
|---|---|---|---|---|---|---|---|
| Named reference base | Prima Mente, Mayo Clinic, Radical AI, and unnamed enterprises | Public proof that the platform works in important domains | High | One or two flagship collaborations stall, leaving little disclosed breadth to offset the narrative hit | High | Add diverse named production references and renewal proof across sectors | High |
| Research-heavy delivery model | Embedded researchers, field engineering, collaboration services | Transforms customer models and produces the strongest public outcomes | High | Revenue scales with scarce expert labor instead of repeatable software usage | High | Separate productized modules, playbooks, and self-serve workflows from bespoke research work | High |
| Frontier-model builder demand | Companies training or fine-tuning models across architectures and modalities | Core buyer group for Silico | Medium-High | Open-model teams or frontier labs internalize similar tooling or decide observability is enough | High | Show clear ROI and control advantages that cannot be replicated with standard tracing stacks | Medium-High |
| Customer willingness to share workflow and usage data | Enterprise customers under TOS / order-form process | Can improve platform performance and product learning loops | Medium | Procurement teams restrict data-use rights or demand hard segregation of traces | Medium-High | Offer tighter customer controls and contract options that preserve trust without removing all learning loops | Medium |
| Adjacent observability stack | Datadog Agent Observability, LangSmith, and similar tooling | Competes for the same monitoring, evaluation, and governance budget lines | Medium | Customers buy observability plus evaluations and decide they do not need separate white-box interpretability | Medium-High | Position interpretability as a distinct causal-control layer with measurable lift in debugging or model design | Medium-High |
| Healthcare governance partners | Mayo and other regulated institutions | Provide legitimacy in sensitive domains | Medium | If governance-heavy partners do not translate into broader deployments, Goodfire stays a bespoke research vendor | Medium-High | Turn flagship healthcare work into repeatable compliance and validation packages | Medium-High |
Concentration is judged from disclosed public evidence only; the company may have broader commercial breadth privately, but it is not yet visible enough to underwrite as a core mitigation.
[CR006, CR007, CR017, CR018, CR024, CR025]
The figure maps the external surfaces Goodfire currently relies on most: flagship collaborators, enterprise buyers willing to share data and buy bespoke work, and adjacent observability platforms shaping buyer expectations.
Partner and buyer concentration are inferred only from publicly disclosed proof points; undisclosed customers could improve the true picture.
[CR007, CR024, CR026, CR027, CR044, CR045]
7.4 Execution, talent, capital, and thesis-break triggers
Goodfire is trying to do three hard things at once: push frontier interpretability research, turn that work into an enterprise platform, and establish category authority in regulated and high-stakes domains. Public evidence suggests the company is still small relative to the size of that ambition. On Healthcare pegs headcount at about 51 people, the labor pool for interpretability specialists appears unusually thin, and the careers page signals a still-scaling organization. At the same time, the February 2026 Series B pushed valuation to $1.25 billion, which compresses the margin for execution error. A company with limited disclosed customer breadth, no public pricing architecture, and a high-touch services component now has to prove that it can become repeatable software quickly enough to justify that mark. The practical investment answer is to convert these uncertainties into hard triggers. If customer contracts continue to leave security and outage risk mostly with the buyer, if named production references do not widen materially, if software revenue still cannot be separated from embedded services, or if adjacent observability platforms satisfy most buyer needs, the thesis weakens fast. Conversely, the risk can compress if Goodfire shows production renewals in regulated settings, enterprise paper that materially tightens default terms, and independent evidence that interpretability interventions work in deployment rather than only in papers or bespoke collaborations. Until then, Goodfire looks like a high-upside but still proof-constrained control-layer bet rather than a de-risked infrastructure standard.[CR019, CR020, CR021, CR022, CR023, CR024]
| role / function | dependency or gap | likelihood | severity | mitigation | diligence path |
|---|---|---|---|---|---|
| Interpretability research bench | Global talent pool appears unusually thin and expensive | High | High | Use capital to recruit senior researchers and convert reputation into hiring leverage | Review retention metrics, key-hire pipeline, and compensation competitiveness versus frontier labs. |
| Research-to-product translation | Company must turn frontier papers into repeatable enterprise workflows | High | High | Productize the highest-value interventions and narrow initial beachhead use cases | Review product roadmap, services share of revenue, and deployment architecture for named customers. |
| Commercial scaling / GTM | Case-by-case pricing and request-access posture limit visible repeatability | High | Medium-High | Standardize packages, implementation process, and procurement paper | Request pricing architecture, ACV bands, sales-cycle data, and renewal metrics. |
| Management bandwidth | Small team is simultaneously building research, platform, and regulated-domain partnerships | Medium-High | Medium-High | Prioritize a few vertical wedges and reduce bespoke projects | Review functional leadership depth, hiring plan, and what share of roadmap is customer-specific. |
| Capital discipline after unicorn pricing | Series B valuation compresses tolerance for slow commercial proof | Medium | High | Use new capital to widen reference base and prove software leverage quickly | Request board materials on spend allocation, next milestone gates, and target evidence for next round. |
The public risk is not simply that the team is small; it is that the company's ambition, valuation, and labor-market scarcity all expand execution scope faster than public proof has expanded.
[CR019, CR020, CR021, CR022, CR023, CR024]
| risk | monitorable trigger | threshold / event | action implication |
|---|---|---|---|
| Contract paper remains startup-favorable | Enterprise MSAs still mirror public liability caps and warranty disclaimers | No meaningful security / outage / confidentiality carve-out in first 3 reference customers | Treat regulated-deployment thesis as unproven; do not underwrite healthcare or critical-infrastructure expansion. |
| Data-rights friction blocks procurement | Customers require major redlines around Workflow Data or refuse data sharing entirely | Two or more priority accounts stall specifically on data-use terms | Assume slower sales cycles and weaker product-learning loop; haircut software-scale assumptions. |
| Reference set fails to broaden | Named production customers do not expand beyond current collaboration-heavy proof set | Fewer than 3 additional named production references within the next refresh cycle | Re-rate company as bespoke research/services business rather than infrastructure layer. |
| Research results do not translate into deployment lift | No independent evidence of production gains from interpretability interventions | No third-party deployment study or customer KPI showing measurable improvement | Reduce moat assumption and compare directly against conventional observability vendors. |
| Security and uptime posture stays opaque | No public uptime, incident history, or runtime-control evidence beyond SOC 2 | Another refresh passes without SLA, status, or incident disclosures | Assume slower enterprise penetration in sensitive workloads. |
| Talent pipeline weakens | Hiring velocity or retention falls in core interpretability roles | Missed senior research / product hires for two consecutive quarters | Expect roadmap slippage and heavier founder / researcher concentration risk. |
| Valuation outruns repeatability | Capital raised and valuation grow faster than visible revenue quality | No pricing standardization or software-services split by next major financing event | Avoid paying for category-optionality without evidence of repeatable unit economics. |
| Observability platforms absorb the buyer problem | Customers adopt tracing/evaluation stacks without adding white-box interpretability | Reference buyers describe Goodfire as nice-to-have research tooling rather than control-plane infrastructure | Thesis break: category collapses into a feature rather than a standalone platform. |
Kill criteria are framed as observable public-or-diligence events so they can be revisited in future refreshes instead of remaining abstract concerns.
[CR009, CR011, CR013, CR016, CR019, CR024]
08 Valuation
8.1 Recommendation, Financing Context, and Why Price Matters More Than Narrative
Public evidence paints Goodfire as a rare, high-quality interpretability company. The company assembled an elite funding stack quickly: a $50 million Series A in April 2025 followed by a $150 million Series B at a $1.25 billion valuation in February 2026, with Menlo, Anthropic, B Capital, Salesforce Ventures, and Eric Schmidt all showing up across the cap table. Official and filing records also support the basics of institutional quality: Goodfire is a Delaware public benefit corporation based in San Francisco, and by early 2026 had filed both Series A- and Series B-era Form D documents. These records give a 2023 founding year, which conflicts with the 2024 founding window implied by the financing materials cited elsewhere in this report. Goodfire further claims enterprise-ready momentum via Ember, Mayo Clinic, Arc Institute, Prima Mente, Microsoft, and a February 2026 SOC 2 Type II announcement. Those positives matter, but this chapter is valuation work, not admiration. The evidence is strong on team quality, scientific credibility, and investor signaling; it is weak on the commercial datapoints normally used to justify a software infrastructure price. None of the public round materials in this source pack disclose ARR, revenue, pricing, customer count, retention, gross margin, or software-versus-services mix. That absence is decisive. At $1.25 billion, investors are not obviously paying for proven fundamentals; they are paying for the option that interpretability becomes core AI infrastructure and that Goodfire becomes one of the category winners. That may happen, but on public evidence alone the price already assumes more commercialization than the company has disclosed. The recommendation is therefore research-more, not buy, and the valuation stance is stretched rather than attractive.[CV001, CV002, CV004, CV005, CV006, CV007]
| Dimension | Assessment | Decision implication |
|---|---|---|
| Recommendation | Research-more | Re-engage only if NDA diligence closes the revenue-quality and cap-table gaps, or if pricing resets toward the base-case range. |
| Confidence | Medium | Quality of company signal is strong; quality of valuation signal is incomplete. |
| Risk rating | High | Commercial opacity, category formation risk, and preference-stack uncertainty dominate underwriting. |
| Valuation stance | Stretched | The $1.25B round sits near the low end of the bull case rather than the center of the base case. |
| Near-term action | Track aggressively | Maintain diligence access, but do not underwrite the round on narrative alone. |
Uses only public evidence as of the run date; entry discipline assumes primary exposure near the February 2026 round terms.
[CV001, CV005, CV015, CV036, CV047, CV048]
| Lens | Thesis | Anti-thesis | What would change the view |
|---|---|---|---|
| Category need | Interpretability should become more important as enterprises demand controllable and explainable AI. | Enterprises may decide observability and guardrails are enough, keeping interpretability niche. | Budget data showing Goodfire wins a standard line item rather than an experimental spend. |
| Product | Ember offers a differentiated model-internal control layer, not just post-hoc monitoring. | The product may still be too research-heavy or bespoke to scale as software. | Proof of standard pricing, time-to-value, and repeatable deployments. |
| Scientific proof | Goodfire has real research outputs, including steering, neural geometry, genomics, and multimodal work. | Scientific credibility does not automatically translate into recurring revenue. | Evidence that flagship research programs convert into durable commercial accounts. |
| Strategic demand | Anthropic, Salesforce, and Eric Schmidt are strong signal investors for the category. | Smart investors can still overpay for strategic option value in a hot AI market. | Independent software metrics that validate the price without relying on cap-table prestige. |
| Valuation | A $1.25B mark could be justified if Goodfire becomes core AI infrastructure for high-stakes deployments. | Today's public evidence does not disclose the ARR or margins needed to justify that mark on fundamentals. | NDA disclosure of ARR, gross margin, and retention that supports a scalable software multiple. |
Rows are evidence-backed arguments and the observable condition that would change the view.
[CV010, CV011, CV013, CV015, CV022, CV029]
How scientific strength, commercial opacity, and round price combine into the recommendation.
The flow is qualitative and designed to show decision logic, not a weighted scoring model.
[CV010, CV015, CV036, CV037, CV047, CV048]
Key underwriting datapoints that are either known from public evidence or still missing.
KPI panel mixes confirmed public facts with flagged gaps; unknown commercial metrics are shown explicitly as undisclosed.
[CV001, CV005, CV015, CV028, CV036, CV047]
8.2 Evidence-Constrained Valuation Framework and Comparable Marks
Because revenue is undisclosed, a conventional revenue-multiple model would create false precision. The right method is to combine comparable private valuation marks with scenario logic anchored on what is and is not public. The comparable set is useful less as a formula than as a discipline check. Anysphere, Harvey, and Glean all carried disclosed ARR when reporters attached multibillion-dollar marks to them, while Anthropic sits in a wholly different frontier-model and compute-scarcity universe. Goodfire does not belong in Anthropic territory, and unlike Anysphere, Harvey, or Glean it has not publicly shown the recurring revenue base that would let outside investors defend a multiple. That forces the current round to be interpreted as strategic option value. The bull, base, and bear cases therefore turn on milestone conversion rather than spreadsheet extrapolation. In the bull case, Goodfire proves that Ember converts design partners and research collaborators into repeatable software revenue, keeps shipping differentiated interpretability breakthroughs, and becomes a must-have layer for high-stakes AI deployment. In the base case, the category is real and Goodfire remains one of its strongest independent teams, but commercialization is still early and high-touch; that warrants a discount to the last round, not a premium. In the bear case, research remains impressive but budgets flow toward observability, guardrails, or frontier labs themselves, leaving Goodfire with a bespoke-services profile and a materially lower valuation. On this framing, the February 2026 round sits near the bottom of the bull range rather than in the middle of the base range.[CV010, CV011, CV016, CV022, CV023, CV024]
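The return logic in the scenario table amounts to dividing each enterprise-value bound by the $1.25B round mark. A minimal sketch of that arithmetic follows, using the ranges stated in this chapter; the function and variable names are illustrative, and the outputs are ratios of scenario EV to the last-round valuation, not forecast investor returns.

```python
ROUND_MARK_B = 1.25  # February 2026 Series B valuation in $B, as stated in this chapter

# Scenario enterprise-value ranges in $B, copied from the scenario table.
SCENARIOS_B = {
    "Bull": (1.25, 1.85),
    "Base": (0.80, 1.10),
    "Bear": (0.35, 0.65),
}


def multiple_range(low_b, high_b, mark_b=ROUND_MARK_B):
    """Express an EV range as multiples of the round mark, to two decimals."""
    return (round(low_b / mark_b, 2), round(high_b / mark_b, 2))


def scenario_multiples(scenarios=SCENARIOS_B, mark_b=ROUND_MARK_B):
    """Map each scenario name to its (low, high) multiple of the round mark."""
    return {
        name: multiple_range(low_b, high_b, mark_b)
        for name, (low_b, high_b) in scenarios.items()
    }


if __name__ == "__main__":
    for name, (low_x, high_x) in scenario_multiples().items():
        print(f"{name}: {low_x:.2f}x to {high_x:.2f}x of the round mark")
```

The unrounded ratios are 1.00x to 1.48x (bull), 0.64x to 0.88x (base), and 0.28x to 0.52x (bear), which the table rounds to one decimal place.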
| Scenario | Core assumptions | Valuation / return logic | Key risks | Probability signal |
|---|---|---|---|---|
| Bull | Ember converts research credibility into repeatable software revenue; partners become scaled reference customers; security and governance posture unlocks enterprise adoption. | $1.25B-$1.85B EV; roughly 1.0x-1.5x versus the last round, meaning upside exists but is not huge unless execution is exceptional. | Commercial conversion may stay slower than the research narrative implies. | Low-medium; requires proof not yet public. |
| Base | Category demand is real and Goodfire remains one of the best independent teams, but monetization stays early and partially bespoke. | $0.80B-$1.10B EV; roughly 0.6x-0.9x versus the last round, implying weak risk-adjusted returns at today's price. | Public data never closes the revenue-quality gap; budgets split across adjacent vendors. | Medium; most consistent with current public evidence. |
| Bear | Interpretability remains valuable but budgets shift toward observability, guardrails, or frontier labs; Goodfire struggles to standardize product revenue. | $0.35B-$0.65B EV; roughly 0.3x-0.5x versus the last round, implying material permanent-capital risk. | Commercialization remains bespoke; multiple compression hits AI infrastructure names. | Medium-low, but adverse enough to matter because disclosure is limited. |
Scenario values are evidence-constrained enterprise-value ranges, not precise DCF outputs. Return logic is shown against the February 2026 $1.25B round mark.
[CV041, CV042, CV043, CV044, CV045, CV046]
| Comparable | Public metric | Valuation / status | Relevance | Limitation |
|---|---|---|---|---|
| Goodfire | Revenue undisclosed; $150M Series B | $1.25B valuation (Feb 2026) | Direct market anchor for this chapter. | No public ARR, pricing, or customer data to support a software multiple. |
| Anysphere / Cursor | >$500M ARR | $9.9B valuation (Jun 2025) | Shows what a leading AI application company looks like when valuation is paired with disclosed scale. | Different product, growth profile, and developer-led distribution. |
| Harvey | $190M ARR | $11B reported raise target (Feb 2026) | Shows how elite enterprise AI valuations can outrun conventional multiples when growth is proven. | Legal AI is a different vertical and the number is reported, not company-confirmed. |
| Glean | >$100M ARR | $7.2B valuation (Jun 2025) | Useful application-software benchmark for enterprise AI value with disclosed ARR. | Enterprise search and agents is a more mature commercial category than interpretability. |
| Anthropic | Frontier model and compute scale | $350B valuation with Google committing up to $40B (Apr 2026) | Upper boundary for frontier-model scarcity value in AI. | Not comparable operationally; Goodfire is not a frontier foundation-model lab. |
Selected 2025-2026 private AI marks used as discipline checks, not one-for-one valuation formulas; Goodfire revenue is undisclosed, so implied multiples cannot be calculated responsibly.
[CV001, CV030, CV032, CV033, CV034, CV035]
Directional sensitivity of valuation conviction; positive bars strengthen willingness to pay, negative bars weaken it.
Sensitivity bars are directional conviction scores, not dollar deltas, because public revenue disclosure is absent.
[CV015, CV022, CV036, CV039, CV040, CV041]
Evidence-constrained valuation bands against the February 2026 $1.25B round reference.
These are scenario ranges built from public comparables and milestone logic; they are not a substitute for NDA-backed financial underwriting.
[CV001, CV042, CV043, CV044, CV045, CV048]
8.3 Exit Discipline, Thesis-Break Triggers, and Final Diligence Asks
The near-to-mid-term exit path is almost certainly another private round or a strategic transaction, not an IPO. Goodfire is too early and too opaque publicly for public-market underwriting: investors do not have audited revenue scale, margin profile, or even a basic customer-count disclosure. That does not make the company unattractive; it makes the investment case diligence-dependent. The practical implication is that entry discipline must focus on the missing proof points that would move Goodfire from “exceptional research company with commercial promise” to “underwritable software infrastructure business.” Those proof points are recurring revenue quality, standard pricing, concentration, gross margin, and the post-Series-B preference stack. The thesis can also break in observable ways. If collaborators fail to convert into repeatable customers, if management cannot disclose convincing revenue quality under NDA, or if budget holders decide that tracing, monitoring, and guardrails from adjacent vendors are sufficient without Goodfire's deeper internal-control layer, the current price becomes difficult to defend. Conversely, if Goodfire can show repeatable software subscriptions, strong partner conversion, and evidence that interpretability is becoming mandatory infrastructure in regulated and high-stakes deployments, the round can grow into itself. Until that evidence is produced, the disciplined posture is to keep Goodfire on the front of the watchlist, continue diligence aggressively, and avoid treating the February 2026 price as a proven bargain.[CV022, CV026, CV027, CV036, CV039, CV040]
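Because the post-Series-B preference stack is undisclosed, the gap between enterprise value and equity value can only be illustrated with assumed terms. The sketch below models a single preferred class with a 1x non-participating preference equal to the roughly $207M of disclosed capital (seed, Series A, and Series B) and a purely hypothetical 30% as-converted ownership. Neither the preference terms nor the ownership figure comes from Goodfire's documents, and real stacks usually have several classes with different seniority. The sketch also ignores debt, cash, and option pools, treating exit proceeds as equal to the scenario value.

```python
def split_exit_proceeds(exit_value_m, preference_m, preferred_ownership):
    """Split exit proceeds between one preferred class and common holders.

    Assumes a 1x non-participating preference: preferred holders take the
    larger of (a) their preference, capped at the exit value, or (b) their
    as-converted share of the exit value. All amounts are in $M.
    """
    if not 0.0 <= preferred_ownership <= 1.0:
        raise ValueError("preferred_ownership must be between 0 and 1")
    preference_payout = min(exit_value_m, preference_m)
    as_converted_payout = preferred_ownership * exit_value_m
    preferred = max(preference_payout, as_converted_payout)
    return preferred, exit_value_m - preferred


# Disclosed rounds in $M: seed + Series A + Series B.
DISCLOSED_CAPITAL_M = 7 + 50 + 150
# ASSUMPTION: ownership is not publicly disclosed; 0.30 is a placeholder.
HYPOTHETICAL_PREFERRED_OWNERSHIP = 0.30

if __name__ == "__main__":
    # Exit values in $M: low end of the bear case, and the round mark itself.
    for label, exit_value_m in [("bear-case low end", 350), ("round mark", 1250)]:
        preferred, common = split_exit_proceeds(
            exit_value_m, DISCLOSED_CAPITAL_M, HYPOTHETICAL_PREFERRED_OWNERSHIP
        )
        print(f"{label}: preferred ${preferred:.0f}M, common ${common:.0f}M")
```

Under these assumed terms, a $350M exit would leave $143M for common holders, about 41% of proceeds rather than the 70% their as-converted ownership would suggest. That is the kind of divergence the cap-table diligence ask is meant to quantify with the actual financing documents.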
| Trigger | Threshold | Transmission to thesis | Action implication |
|---|---|---|---|
| Revenue-quality opacity persists | Management cannot disclose ARR, gross margin, concentration, and retention under NDA. | The investment remains narrative-led instead of fundamentals-led. | Do not underwrite above the base-case range; default to pass. |
| No partner-to-paid conversion pattern | Scientific collaborators and design partners do not convert into repeatable platform revenue. | Goodfire looks like a high-end research shop instead of scalable infrastructure software. | Move valuation toward the bear case and require a lower entry or structured downside. |
| Observability vendors satisfy the budget | Customers solve their pain with tracing, monitoring, and guardrails without needing model-internal control. | The category wedge narrows and Goodfire's TAM compresses. | Reduce conviction materially and reassess category ownership. |
| Preference stack is investor-unfriendly | Series B documents reveal heavy seniority, unusual protections, or meaningful dilution overhang. | Enterprise value may not translate into acceptable equity returns. | Re-cut returns on an equity-value basis before proceeding. |
| Security or governance credibility slips | A major trust, compliance, or governance issue undercuts the high-stakes deployment narrative. | The premium tied to safe and controllable AI weakens quickly. | Pause diligence until remediation is independently verified. |
Triggers are framed as observable diligence findings or post-investment monitoring events that would break the underwriting case.
[CV036, CV043, CV046, CV049, CV050]
| Topic | Missing evidence | Why it matters | Owner / diligence path |
|---|---|---|---|
| Revenue quality | ARR, bookings, net retention, gross retention, gross margin, and revenue mix. | These are the core inputs for any valuation method beyond strategic option value. | CFO / finance room under NDA. |
| Pricing and packaging | Current pricing sheets, pilot-to-production conversion terms, and software-versus-services monetization. | Determines whether the business can scale as product revenue rather than bespoke work. | Sales leader and product leader interview plus contract sample review. |
| Customer concentration | Top ten customers, revenue concentration, deployment scope, and renewal status. | High concentration would make the current price much harder to defend. | Customer cohort review and account-level diligence. |
| Cap table and preferences | Post-Series-B cap table, liquidation preferences, pro rata rights, and governance protections. | Equity value can differ sharply from enterprise value if preferences are heavy. | Legal diligence on financing documents. |
| Commercial conversion | Evidence that Mayo, Arc, Microsoft, or similar relationships create repeatable paid software patterns. | This is the bridge between scientific credibility and a scalable investment case. | Management deep dive with cohort examples and implementation metrics. |
These are the minimum asks needed to move Goodfire from an interesting company to an underwritable investment at or near the last round price.
[CV027, CV040, CV049, CV050]
8.4 Exhibits
Disclaimer
This report is a public-evidence diligence snapshot, not investment advice. Important financial, legal, technical, and contractual facts remain non-public and should be verified directly with management and primary documents before any investment decision.
Evidence index
| ID | Statement | Confidence | Sources |
|---|---|---|---|
| CO001 | Goodfire describes itself as a San Francisco-based research company and public benefit corporation. | High | SO001, SO002, SO014, SO018 |
| CO002 | Goodfire’s mission is to build safe and powerful AI by understanding and intentionally shaping model internals rather than relying on scaling alone. | Medium | SO001, SO004, SO005, SO006 |
| CO003 | Goodfire’s current public product is a model design environment that helps users understand, debug, and shape models through interpretability-based tooling. | Medium | SO002, SO007, SO027 |
| CO004 | Goodfire says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. | Medium | SO010 |
| CO005 | Official materials frame Goodfire around two linked pillars: intentional design of models and scientific discovery from model internals. | Medium | SO004, SO005, SO008 |
| CO006 | Lightspeed publicly announced Goodfire’s $7 million seed round on August 15, 2024, showing the company was operating by mid-2024. | Medium | SO022, SO023 |
| CO007 | Series A materials say the $50 million round came less than one year after Goodfire’s founding, which supports a 2024 founding window. | Medium | SO018, SO020, SO021 |
| CO008 | One independent profile describes Goodfire as founded in 2023, creating a conflict with the 2024 founding window implied by financing materials. | Medium | SO028 |
| CO009 | Goodfire’s careers page says all roles are full-time and in person five days a week at a Telegraph Hill office in San Francisco. | Medium | SO003 |
| CO010 | Eric Ho is Goodfire’s CEO and primary public spokesperson in financing and media materials. | Medium | SO014, SO018, SO029 |
| CO011 | Daniel Balsam is publicly identified as Goodfire’s cofounder and CTO. | Medium | SO009, SO024, SO030 |
| CO012 | Tom McGrath is publicly identified as Goodfire’s cofounder and chief scientist, and partner materials credit him with founding DeepMind’s interpretability team. | Medium | SO024, SO030 |
| CO013 | Goodfire and third-party coverage say the team includes researchers or engineers from OpenAI, Google DeepMind, Harvard, Stanford, and UC San Diego. | Medium | SO004, SO014, SO017 |
| CO014 | Investor materials tie Eric Ho and Daniel Balsam to prior operating work at RippleMatch, supporting the claim that the founding team combines startup execution with research pedigree. | Medium | SO021, SO022, SO024 |
| CO015 | Reviewed public materials do not disclose a full board roster or a complete executive team beyond the founders and a few named researchers. | Medium | SO002, SO024, SO025 |
| CO016 | Goodfire announced a $50 million Series A led by Menlo Ventures with Lightspeed, Anthropic, B Capital, Work-Bench, Wing, and South Park Commons participating. | High | SO018, SO019, SO020, SO021, SO026 |
| CO017 | Lightspeed says it led Goodfire’s $7 million seed round in August 2024. | Medium | SO022, SO023 |
| CO018 | Goodfire announced a $150 million Series B at a $1.25 billion valuation led by B Capital with Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, Eric Schmidt, and others participating. | High | SO004, SO014, SO015, SO016, SO017, SO029 |
| CO019 | The Series B was announced less than a year after Goodfire’s Series A. | Medium | SO014, SO015, SO004 |
| CO020 | Goodfire and third-party coverage describe the company as having raised more than $200 million in total funding after the Series B. | Medium | SO004, SO014, SO016 |
| CO021 | Adding the publicly disclosed seed, Series A, and Series B rounds implies roughly $207 million of total disclosed capital. | Medium | SO022, SO018, SO014 |
| CO022 | Reviewed public sources do not disclose debt financing, secondary transactions, ownership percentages, or board-seat allocations for Goodfire’s financings. | Medium | SO014, SO018, SO025 |
| CO023 | Salesforce Ventures’ investment materials frame Goodfire as foundational enterprise AI infrastructure rather than only a research project. | Medium | SO024, SO025 |
| CO024 | Goodfire’s public product branding shifted from Ember in 2025 financing materials to Silico in 2026 product materials. | Medium | SO018, SO020, SO007, SO029 |
| CO025 | Goodfire says it reduced hallucinations in a large language model by about half using interpretability-informed training. | Medium | SO004, SO027 |
| CO026 | Official materials name Prima Mente, Arc Institute, Mayo Clinic, and Microsoft as partners or collaborators. | Medium | SO004, SO008, SO009, SO011 |
| CO027 | The Mayo Clinic collaboration explicitly discloses that Mayo Clinic has a financial interest in the technology referenced in the announcement. | Medium | SO009 |
| CO028 | Goodfire’s public commercial proof remains broad and category-based because it names customer types but does not list many named enterprise customers or contract counts. | Medium | SO010, SO028 |
| CO029 | Goodfire should be classified as a private Series B-stage company based on investor profiles labeling it private and the February 2026 financing history. | Medium | SO025, SO030, SO014 |
| CO030 | Goodfire’s best-supported current public valuation is $1.25 billion. | High | SO004, SO014, SO015, SO016, SO017 |
| CO031 | Goodfire’s best-supported public total capital figure is above $200 million. | Medium | SO004, SO014, SO016, SO022, SO018 |
| CO032 | No reviewed public source discloses Goodfire’s revenue, ARR, or customer count. | Medium | SO004, SO014, SO025 |
| CO033 | No official source reviewed discloses employee headcount, but one independent profile estimates Goodfire had about 51 employees as of January 2026. | Low | SO003, SO028 |
| CO034 | Reviewed public sources identify only a single disclosed office location in San Francisco and do not name other offices. | Medium | SO003, SO025 |
| CO035 | The public milestone arc visible in reviewed sources runs from seed financing in August 2024 to Series A in April 2025 and Series B in February 2026. | Medium | SO022, SO020, SO004 |
| CO036 | Goodfire’s September 2025 Mayo Clinic announcement shows the company expanding from interpretability tooling into healthcare and genomic medicine partnerships. | Medium | SO009 |
| CO037 | By February 2026 Goodfire was publicly describing partnerships spanning AI agents and life sciences. | Medium | SO004, SO015 |
| CO038 | MIT Technology Review reported on April 30, 2026 that Goodfire was commercially releasing Silico as a fee-based tool for model debugging and steering. | Medium | SO027 |
| CO039 | MIT Technology Review quoted an outside interpretability researcher saying Goodfire is adding “precision to the alchemy” rather than making model design fully principled. | Medium | SO027 |
| CO040 | An independent health-tech analysis argues the $1.25 billion valuation is aggressive for a research-first company with early commercial traction and an estimated 51 employees. | Medium | SO028 |
| CO041 | Goodfire’s public materials show active field-building and recruiting through a fellowship program, Stanford guest lectures, and ongoing in-person hiring in 2025-2026. | Medium | SO003, SO012, SO013 |
| CM001 | Goodfire positions itself as an interpretability lab focused on understanding and intentionally designing AI rather than only monitoring outputs. | High | SM001, SM007 |
| CM002 | Silico is described as a model design environment for training and debugging models on Goodfire infrastructure. | High | SM003, SM007 |
| CM003 | Goodfire says it partners with organizations training or fine-tuning foundation models across architectures and modalities. | High | SM003, SM004, SM005, SM006 |
| CM004 | Goodfire claims its language-model workflow cut hallucinations by 58% without degrading benchmark performance and at about 90x lower cost than LLM-as-a-judge. | Medium | SM004 |
| CM005 | Goodfire publicly markets use cases across language models, genomics, and robotics or vision instead of only text-model applications. | High | SM004, SM005, SM006 |
| CM006 | Goodfire says it works with partners such as Arc Institute, Mayo Clinic, and Microsoft and uses a shared environment with customers. | Medium | SM007 |
| CM007 | Goodfire publicly describes inference-time monitors and production monitoring as part of its intentional-design platform. | High | SM001, SM007 |
| CM008 | Goodfire argues that black-box prompting and fine-tuning are inadequate for reliable high-stakes AI engineering and that feature steering can substitute for some fine-tuning work. | Medium | SM008, SM009 |
| CM009 | Goodfire's pilot agreement starts with internal evaluation of software plus services and explicitly aims toward a later commercial license. | Medium | SM014 |
| CM010 | The pilot agreement requires customer cooperation, access to software or equipment, and designated contacts, implying a high-touch delivery model. | Medium | SM014 |
| CM011 | Prima Mente used Goodfire to decode an epigenomics model for biomarker discovery and model redesign, showing a plausible scientific-AI buyer archetype. | High | SM005, SM015 |
| CM012 | Goodfire and Mayo frame interpretability as a way to validate model predictions, reduce spurious correlations, and improve scientific or clinical relevance under governance controls. | High | SM005, SM010 |
| CM013 | MIT Technology Review says Goodfire is one of a small handful of companies pioneering mechanistic interpretability and that frontier labs already have internal interpretability teams. | Medium | SM030 |
| CM014 | MIT says Silico is most usable where customers can access model internals, which is easier for open-source or in-house models than for closed models like ChatGPT or Gemini. | High | SM003, SM030 |
| CM015 | MIT reports that Goodfire will price Silico case-by-case instead of publishing standard pricing. | Medium | SM030 |
| CM016 | Gartner says generative-AI ROI varies widely by use case and that hidden costs such as compliance reviews, retraining, and internal overhead can exceed initial expectations. | Medium | SM016 |
| CM017 | Gartner places generative AI in the 2025 Trough of Disillusionment, which implies more cautious implementation expectations even as interest remains high. | Medium | SM016 |
| CM018 | NIST's AI Risk Management Framework treats trustworthy, governable AI as a prerequisite for adoption in higher-risk settings. | Medium | SM018 |
| CM019 | PwC reports that AI-exposed industries have 3x higher revenue-per-worker growth since 2022 and workers with AI skills command a 56% wage premium. | Medium | SM017 |
| CM020 | Goodfire's relevant market boundary is narrower than broad generative-AI narratives and should focus on model design, interpretability, and model-behavior tooling for teams that can inspect or modify internals. | High | SM001, SM003, SM014, SM030 |
| CM021 | The included spend pool covers representation analysis, failure diagnosis, steering, interpretable training feedback, and production monitors, while excluding generic AI hardware, generic copilots, and pure app-performance monitoring. | High | SM003, SM007, SM023 |
| CM022 | Arize, Fiddler, Datadog, LangSmith, Langfuse, Patronus, Arthur, and Humanloop show that tracing, evaluation, monitoring, and agent control are already recognized software categories. | High | SM019, SM021, SM023, SM024, SM025, SM027, SM028, SM029 |
| CM023 | Those adjacent platforms mostly observe prompts, traces, sessions, and outputs, whereas Goodfire's differentiation claim is control over internal features, parameters, or latent representations. | High | SM003, SM011, SM013, SM019, SM021, SM024, SM025 |
| CM024 | Arize sells from free or open-source tooling to a $50-per-month Pro plan and custom enterprise pricing, showing the adjacent observability layer already has self-serve pricing and startup programs. | High | SM019, SM020 |
| CM025 | Fiddler publishes a developer price of $0.002 per trace and markets enterprise guardrails, observability, and governance as one platform. | High | SM021, SM022 |
| CM026 | Langfuse publishes prices from free to $29 per month Core, $199 per month Pro, and $2,499 per month Enterprise, with enterprise security and support features. | High | SM025, SM026 |
| CM027 | Humanloop markets enterprise evaluation tooling with a free trial, 50 eval runs, and 10,000 logs per month, reinforcing that adjacent budgets often begin with workflow tooling rather than custom research engagements. | Medium | SM029 |
| CM028 | Goodfire's direct market reach is highest in frontier labs because they already run interpretability teams, possess model internals, and value precise control over training and behavior. | Medium | SM003, SM007, SM030 |
| CM029 | Enterprise model teams are reachable when they train or fine-tune proprietary or open-weight models, but teams using only closed APIs are outside Goodfire's near-term reach. | Medium | SM003, SM009, SM014, SM030 |
| CM030 | Scientific-AI teams in genomics, biology, and robotics are attractive because model internals can reveal domain mechanisms, improve generalization, and validate whether predictions rely on real structure or shortcuts. | High | SM005, SM006, SM010, SM012, SM015 |
| CM031 | Regulated adopters have strong need for interpretability and trustworthy AI, but procurement and deployment cycles are slower because governance, privacy, and evidence standards are higher. | High | SM010, SM017, SM018 |
| CM032 | Goodfire's adoption motion likely starts with a pilot or design-partner evaluation, then requires model and data access, interpretability work, and only later expands to production monitoring and longer-term licensing. | High | SM003, SM007, SM014 |
| CM033 | In this market the buyer, user, and payer often differ, with research or platform leaders buying, model scientists and safety teams using, and AI R&D or platform budgets paying. | Medium | SM002, SM003, SM014, SM030 |
| CM034 | The category grows as models take on higher-stakes tasks in health, science, finance, and autonomous agent workflows where output-only evaluation is insufficient. | High | SM005, SM010, SM021, SM023, SM024 |
| CM035 | Agent-observability vendors frame autonomous decisions, guardrails, and repeatable evaluation as business-critical, which expands the adjacent budget pool that Goodfire can sell into or alongside. | High | SM021, SM022, SM023, SM024, SM025, SM027 |
| CM036 | Dependence on model-internal access is a major constraint because Goodfire's tooling requires deeper access than teams using only hosted closed-model APIs can usually provide. | High | SM003, SM014, SM030 |
| CM037 | Goodfire presents interpretability as precision engineering that can turn training into intentional design. | Medium | SM007, SM008 |
| CM038 | MIT Technology Review quotes an external researcher saying Goodfire is adding precision to alchemy, which challenges the precision-engineering narrative. | Medium | SM030 |
| CM039 | Goodfire's own intentional-design essay says the agenda is at the beginning of a deep technical tree and still needs better interpretability tools and algorithms. | Medium | SM008 |
| CM040 | Goodfire's parameter-decomposition research says current interpretability methods still struggle to map model behavior cleanly to underlying parameters and circuits, which reinforces technical immaturity. | Medium | SM013 |
| CM041 | Goodfire's manifold-steering research argues that linear steering often mismatches model geometry and that geometry-aware steering works better, suggesting the technical edge is not commodity tracing. | Medium | SM011 |
| CM042 | Goodfire's Evo 2 work shows interpretability can reveal biologically relevant features and possibly guide DNA generation, supporting a scientific-AI market lens beyond enterprise copilots. | High | SM005, SM012 |
| CM043 | Goodfire says customer conversations show teams prioritize rapid iteration and migration to newer models over heavy fine-tuning, which implies demand for lighter-weight control tooling. | Medium | SM009 |
| CM044 | Public adjacent pricing creates a floor for what buyers expect to pay for observability and eval tooling, but Goodfire's undisclosed case-by-case pricing means it must win on higher-value model-internal outcomes rather than commodity traces. | High | SM020, SM022, SM026, SM029, SM030 |
| CM045 | Because Goodfire has no public pricing schedule, customer count, or disclosed recurring revenue, a defensible TAM, SAM, or SOM cannot be computed from public evidence alone. | High | SM014, SM030 |
| CM046 | The most evidence-backed near-term SOM is a small set of frontier labs, advanced enterprise model teams, and scientific model builders willing to grant model access and buy services-heavy pilot engagements. | High | SM003, SM005, SM006, SM014, SM030 |
| CM047 | Published self-serve observability prices imply an annual software band of $348 to $2,388 (12 months at Langfuse's $29 Core and $199 Pro tiers, with Arize's $50 Pro plan at $600 in between) before enterprise add-ons or heavy usage. | High | SM020, SM026 |
| CM048 | Public list pricing shows adjacent enterprise-grade observability software can reach at least $29,988 per year (12 months at Langfuse's $2,499 Enterprise tier) before overage charges or custom services. | Medium | SM026 |
| CM049 | Fiddler's $0.002 per-trace pricing implies annual monitoring spend can range from hundreds to tens of thousands of dollars depending on trace volume (illustratively, about $240 at 10,000 traces per month and about $24,000 at 1 million traces per month). | Medium | SM022 |
| CP001 | Goodfire positions Silico as the first platform for intentional model design and as a workspace for training and debugging models at frontier scale. | Medium | SP001 |
| CP002 | Goodfire says its language-model workflow predicts failures before deployment and can correct failure modes directly without retraining from scratch. | High | SP001, SP002 |
| CP003 | Goodfire extends the same model-internal workflow into life sciences and robotics/vision use cases, not just generic chat applications. | High | SP003, SP004 |
| CP004 | Goodfire explicitly frames feature steering as an alternative to black-box prompting and fine-tuning workflows. | Medium | SP005 |
| CP005 | Goodfire disclosed a $150 million Series B at a $1.25 billion valuation and third-party coverage describes roughly $209 million raised in total. | High | SP006, SP008 |
| CP006 | MIT Technology Review describes Goodfire as one of a small handful of mechanistic-interpretability pioneers alongside Anthropic, OpenAI, and Google DeepMind. | Medium | SP007 |
| CP007 | MIT Technology Review says frontier labs already have internal interpretability teams, making them Goodfire's closest direct incumbent alternative for top-end model builders. | Medium | SP007 |
| CP008 | MIT Technology Review says Silico is most useful where customers can inspect a model's inner workings, limiting its applicability on closed models such as ChatGPT or Gemini. | Medium | SP007 |
| CP009 | Outside researcher Leonard Bereska told MIT Technology Review that Goodfire may be adding precision to existing AI alchemy rather than fully turning model building into engineering. | Medium | SP007 |
| CP010 | On Healthcare Tech characterizes Goodfire as a roughly 51-person, research-first organization whose $1.25 billion valuation looks aggressive relative to disclosed commercial traction. | Medium | SP008 |
| CP011 | Goodfire's probe-based data-attribution work claims a 63% reduction in harmful behavior after filtering flagged data and larger reductions after swapping labels or removing responsible sources. | Medium | SP009 |
| CP012 | Goodfire says SAE probes for Rakuten AI agents generalized better than other probes on PII detection and were cheaper than LLM-as-judge baselines. | Medium | SP010 |
| CP013 | Goodfire's Llama 3 research preview claims it can extract modifiable internal features and steer behavior while minimizing performance degradation. | Medium | SP011 |
| CP014 | Goodfire's VPD explainer says direct edits to decomposed parameter subcomponents can produce precise behavior changes without retraining. | Medium | SP012 |
| CP015 | Goodfire says its self-correcting-search collaboration improved viable candidate materials by about 30%, supporting its claim that mechanistic tools can affect model behavior in non-LLM domains. | Medium | SP013 |
| CP016 | Goodfire's own reasoning-theater research argues that chain-of-thought can be unfaithful to internal computation, which weakens the claim that trace-level reasoning alone is enough for debugging. | Medium | SP014 |
| CP017 | Arize Phoenix markets an open-source platform for agent development and evaluation built around tracing, evals, datasets, and experiments. | Medium | SP015 |
| CP018 | Arize prices AX Pro at $50 per month with 50k spans and 10 GB, while enterprise packaging is custom and can be self-hosted. | Medium | SP016 |
| CP019 | Fiddler positions itself as a unified AI observability and security platform with lifecycle evaluation, monitoring, and real-time guardrails. | Medium | SP017 |
| CP020 | Fiddler publishes a free tier and a Developer plan priced at $0.002 per trace, with enterprise deployment options spanning SaaS, VPC, and on-premise. | Medium | SP018 |
| CP021 | Arthur markets a full-lifecycle platform for reliable AI that combines continuous evals, policies, guardrails, dashboards, and oversight. | Medium | SP019 |
| CP022 | Datadog ties agent observability to its broader application-monitoring estate and says teams can test prompt, model, and tool changes against production data in one workflow. | Medium | SP020 |
| CP023 | Datadog publishes a free tier up to 40K LLM spans per month and a Pro plan starting at $160 per month for 100K spans, with no separate evaluation fee. | Medium | SP020 |
| CP024 | LangSmith markets agent observability with framework-agnostic SDKs and says it has a free tier while paid plans scale with trace volume. | Medium | SP021 |
| CP025 | Langfuse markets an open-source AI engineering platform, self-hosting, OpenTelemetry compatibility, 10+ billion observations per month, and more than 100,000 engineers building on it. | Medium | SP022 |
| CP026 | Langfuse publishes transparent self-serve pricing: free Hobby, $29 Core, $199 Pro, and $2,499 Enterprise, plus a unit-based overage ladder. | Medium | SP023 |
| CP027 | Humanloop historically sold enterprise tools to develop, evaluate, and ship trustworthy LLM apps, including private deployment options and a free trial. | Medium | SP024 |
| CP028 | Humanloop is now joining Anthropic and sunsetting its platform, so it is better read as a consolidation signal than as a durable stand-alone peer. | Medium | SP025 |
| CP029 | Weights says its products and services were wound down after its team joined OpenAI, reinforcing the pattern that AI tooling teams can be absorbed by frontier labs. | Medium | SP026 |
| CP030 | NIST's AI RMF and Gartner's GenAI guidance both emphasize trustworthiness, governance, evaluation, and hidden operating costs in sensitive AI deployments. | High | SP027, SP028 |
| CP031 | Goodfire's closest direct alternatives are internal frontier-lab interpretability teams and advanced in-house build paths, not ordinary tracing vendors. | High | SP001, SP007, SP015, SP021 |
| CP032 | Most adjacent vendors in the reviewed set compete at the trace, eval, guardrail, or governance layer rather than through direct edits to learned model features. | High | SP017, SP019, SP020, SP021, SP022 |
| CP033 | Goodfire appears best aligned to buyers building or adapting open-weight models in high-stakes domains where pre-deployment diagnosis matters more than general observability breadth. | Medium | SP002, SP003, SP004, SP007 |
| CP034 | Datadog, LangSmith, and Langfuse have stronger visible developer distribution than Goodfire because they ride existing observability, framework, or open-source workflows. | Medium | SP020, SP021, SP022 |
| CP035 | Fiddler and Arthur compete more directly with governance- and trust-led procurement because they explicitly emphasize guardrails, policies, monitoring, and enterprise oversight. | Medium | SP017, SP018, SP019 |
| CP036 | Goodfire's public commercial disclosure is thinner than that of Arize, Fiddler, Datadog, and Langfuse because MIT Technology Review describes Silico pricing as case-by-case and Goodfire declined to give specifics. | Medium | SP007, SP016, SP018, SP020, SP023 |
| CP037 | Free or low-cost adjacent tools put price pressure on any attempt to sell Goodfire as a generic AI engineering or observability layer instead of a differentiated model-design product. | Medium | SP015, SP016, SP020, SP022, SP023, SP024 |
| CP038 | Category consolidation is already visible through Humanloop's move to Anthropic and Weights' move to OpenAI, which raises the risk that interpretability adjacencies become features inside larger labs or stacks. | Medium | SP025, SP026 |
| CP039 | Goodfire's moat is strongest if its research outputs in steering, attribution, probes, and domain science can be productized into repeatable workflows rather than bespoke research wins. | Medium | SP006, SP009, SP010, SP011, SP013 |
| CP040 | The public record does not yet show enough win-rate, realized-pricing, or retention evidence to underwrite Goodfire's competitive durability with high confidence. | Medium | SP007, SP008 |
| CP041 | The status-quo substitute for many buyers remains an in-house black-box stack of prompting, eval harnesses, fine-tuning, and guardrails, which is cheaper up front but less mechanistically explanatory. | Medium | SP005, SP015, SP021, SP024 |
| CP042 | Goodfire's partner access today looks more domain-credibility-led than platform-distribution-led: collaborations such as those with Microsoft, Mayo Clinic, Rakuten, and Radical AI support relevance but do not equal Datadog- or LangChain-style installed-base reach. | Medium | SP006, SP010, SP013, SP020, SP021 |
| CP043 | Humanloop packages enterprise LLM evals as a standalone platform, reinforcing that adjacent evaluation vendors compete for some of the same budgets Goodfire targets. | Medium | SP029 |
| CI001 | Goodfire announced a $150 million Series B round at a $1.25 billion valuation in February 2026. | High | SI002, SI016, SI017 |
| CI002 | Goodfire's 2026 Form D reports $149,999,796 sold after a first sale on 2025-12-17, against a total offering amount of $161,674,124. | Medium | SI028 |
| CI003 | Goodfire announced a $50 million Series A round in April 2025. | High | SI021, SI022, SI023, SI027 |
| CI004 | Goodfire's 2025 Form D reports $52,029,991 sold after a first sale on 2025-04-02. | Medium | SI027 |
| CI005 | At least $202,029,787 of equity sold across Goodfire's 2025 and 2026 Form D filings is directly verifiable from primary filing evidence. | High | SI027, SI028 |
| CI006 | Goodfire says the Series B proceeds will fund frontier research, next-generation product development, and scaled partnerships across AI agents and life sciences. | High | SI002, SI016, SI018 |
| CI007 | Goodfire describes Silico as a model-design environment and workspace for training and debugging models on Goodfire infrastructure. | High | SI001, SI003 |
| CI008 | Goodfire's product and vertical pages route prospects to request access or contact the company instead of publishing self-serve commercial pricing. | High | SI003, SI004, SI005, SI006, SI008 |
| CI009 | Goodfire's contact page says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. | Medium | SI008 |
| CI010 | Goodfire's Series B post says the company has partnered with Arc Institute, Mayo Clinic, and Microsoft to deploy its technology. | Medium | SI002 |
| CI011 | In the Prima Mente case study, Goodfire says its research scientists embedded with the customer and built a biomarker-discovery pipeline around the customer's model. | Medium | SI011 |
| CI012 | Goodfire's public contract terms show a commercial bundle that can include platform access plus support, technical assistance, field engineering support, research activities, collaboration activities, and deliverables. | High | SI013, SI015 |
| CI013 | Goodfire's MSA and TOS place core commercial fees in negotiated order forms rather than in public documentation. | High | SI013, SI015 |
| CI014 | Goodfire's TOS explicitly contemplates overage charges when usage exceeds the allotment included in the applicable order form. | Medium | SI015 |
| CI015 | Goodfire's pilot agreement says pilot access is for internal evaluation and requires a separate commercial license for post-evaluation use. | Medium | SI014 |
| CI016 | Goodfire's TOS says usage reports provided through the platform dashboard or on request are the authoritative source for calculating Fees. | Medium | SI015 |
| CI017 | Goodfire's MSA says it will not use Customer Data to train foundation models or generalized machine-learning models for the benefit of Goodfire, other customers, or third parties. | Medium | SI013 |
| CI018 | Goodfire's TOS gives Goodfire a perpetual license to use Workflow Data to provide, improve, train, fine-tune, and commercialize the platform, subject to confidentiality constraints. | Medium | SI015 |
| CI019 | Goodfire's MSA and TOS allow suspension for overdue accounts and provide for late-payment interest of 1.5 percent per month. | High | SI013, SI015 |
| CI020 | Goodfire announced SOC 2 Type II compliance and a public SOC 3 report by February 2026. | Medium | SI010 |
| CI021 | Goodfire's official vertical pages target teams training or fine-tuning AI models across architectures and modalities rather than retail end users. | High | SI003, SI004, SI005, SI006 |
| CI022 | Goodfire's RLFR post claims a 58 percent reduction in hallucinations in Gemma-3-12B-IT at roughly 90 times lower cost than an LLM-as-a-judge alternative, with no degradation on standard benchmarks. | High | SI004, SI012 |
| CI023 | Goodfire's RLFR and life-sciences proof points are technical or scientific outcomes, not disclosed customer ROI or recognized revenue metrics. | Medium | SI011, SI012, SI026 |
| CI024 | Goodfire's feature-steering post says the SAE demo interface and API were deprecated in February 2026. | Medium | SI007 |
| CI025 | The deprecation of public preview tooling and the request-access posture together suggest Goodfire has shifted its public surface toward enterprise and custom deployments. | Medium | SI003, SI007, SI008 |
| CI026 | Goodfire's public evidence includes named life-sciences proof points with Prima Mente, Mayo Clinic, and Arc Institute. | High | SI002, SI005, SI011 |
| CI027 | Salesforce Ventures presents Goodfire as foundational enterprise infrastructure for understanding and intentionally designing modern AI. | Medium | SI025 |
| CI028 | Menlo Ventures says Goodfire is productizing Ember and commercializing model understanding, and notes that Eric Ho previously scaled a prior company to more than $10 million in ARR. | Medium | SI023 |
| CI029 | No reviewed public source discloses Goodfire's revenue, ARR, gross margin, cash balance, burn rate, runway, or customer retention metrics. | High | SI001, SI002, SI003, SI013, SI015, SI016, SI026 |
| CI030 | No reviewed public source discloses public list pricing, minimum commits, or discount ladders for Silico or related commercial offerings. | High | SI003, SI004, SI005, SI006, SI008, SI013, SI015 |
| CI031 | Because pricing is private and contracts are order-form based, Goodfire's realized pricing and software-versus-services revenue mix cannot be inferred from the official surface alone. | Medium | SI011, SI013, SI014, SI015 |
| CI032 | A skeptical sector analysis argues that Goodfire's $1.25 billion valuation is aggressive for a roughly 51-person company that has early commercial traction but is not yet a predictable SaaS business. | Medium | SI026 |
| CI033 | The same skeptical analysis argues that investors are underwriting Goodfire on research and platform option value rather than on publicly evidenced near-term software revenue. | Medium | SI026 |
| CI034 | Goodfire's Series B was announced less than a year after its Series A, showing capital access that scaled faster than disclosed operating metrics. | High | SI002, SI021, SI026 |
| CI035 | Goodfire's 2025 Form D lists 47 investors, while the 2026 Form D lists 19 investors. | High | SI027, SI028 |
| CI036 | Goodfire's 2026 Form D total offering amount of $161,674,124 exceeds the press-announced $150 million round by roughly $11.7 million, implying possible residual allocation or additional financing capacity within the same offering. | Medium | SI016, SI028 |
| CI037 | No public debt facility or project-finance obligation surfaced in the reviewed sources, but the absence of disclosure should not be treated as proof of zero leverage. | Medium | SI013, SI015, SI016, SI026 |
| CI038 | Post-Series-B capital adequacy can only be assessed qualitatively: Goodfire is well funded relative to public stage signals, but runway cannot be modeled without cash and burn data. | Medium | SI016, SI026, SI028 |
| CI039 | Goodfire's public messaging implies a high-touch GTM motion centered on selective design partnerships rather than broad self-serve transaction volume. | Medium | SI002, SI008, SI015 |
| CI040 | Because Goodfire's customer evidence includes embedded scientific work and its terms contemplate field engineering and collaboration activities, at least some current revenue likely mixes software access with services delivery. | Medium | SI011, SI015 |
| CI041 | Goodfire publicly presents Radical AI as a materials-science design partner, supporting a commercialization path beyond language-model customers. | Medium | SI029 |
| CE001 | Silico is described as the first platform for intentional model design and as a model-design environment built on Goodfire infrastructure. | High | SE001, SE030 |
| CE002 | Silico markets five operator jobs around model internals: seeing inside predictions, running health checks, debugging failures, shaping behavior, and generalizing from less data. | Medium | SE001 |
| CE003 | Goodfire's current product motion is request-access and partnership-led for teams training or fine-tuning foundation models across architectures and modalities. | Medium | SE001, SE002, SE003, SE004 |
| CE004 | Goodfire's language-model workflow claims a 58% hallucination reduction with no degradation on performance benchmarks. | High | SE002, SE005 |
| CE005 | The same language workflow claims roughly 90x lower intervention cost than LLM-as-a-judge approaches. | Medium | SE002 |
| CE006 | The Hallucinations Viewer compares base and policy rollouts on LongFact++ and exposes intervention details for selected outputs. | Medium | SE005 |
| CE007 | Goodfire's life-sciences surface claims state-of-the-art pathogenicity prediction across 839k ClinVar variants. | High | SE003, SE015 |
| CE008 | Goodfire says EVEE provides interpretable predictions and explanations for all 4.2 million ClinVar variants. | High | SE003, SE015 |
| CE009 | Prima Mente and Goodfire identified DNA fragment length as a dominant Alzheimer's signal and distilled the finding into a human-readable classifier. | Medium | SE013, SE028 |
| CE010 | Goodfire says the Alzheimer's biomarker workflow generalized to an independent cohort. | High | SE003, SE013 |
| CE011 | Goodfire's robotics and vision surface says teams can predict failures before deployment by inspecting latent representations directly. | Medium | SE004 |
| CE012 | The robotics case study says Goodfire traced unstable behavior to brittle internal features and information bottlenecks. | Medium | SE004 |
| CE013 | Goodfire markets feature steering as stronger than prompting when prompt engineering hits diminishing returns. | Medium | SE006 |
| CE014 | Goodfire says feature steering can often replace fine-tuning for behavior changes but cannot add new knowledge to a model. | Medium | SE006 |
| CE015 | Goodfire deprecated its earlier SAE demo interface and API in February 2026. | Medium | SE006 |
| CE016 | MIT Technology Review reports that Silico uses agents to automate interpretability work that previously required human researchers. | High | SE009, SE030 |
| CE017 | MIT Technology Review reports that Silico is priced case-by-case and is easier to use on open-source models than on closed APIs. | Medium | SE030 |
| CE018 | Goodfire's intentional design thesis frames current AI training as guess-and-check and positions interpretability as closed-loop steering. | Medium | SE007 |
| CE019 | Goodfire says intentional design aims to change what models learn from individual datapoints rather than hard-wiring heuristics into models. | Medium | SE007 |
| CE020 | Goodfire says it released the first public sparse autoencoders trained on a true reasoning model, DeepSeek R1. | Medium | SE010 |
| CE021 | Goodfire's R1 work says effective steering had to begin after the model's boilerplate response prefix rather than at the first response token. | Medium | SE010 |
| CE022 | Goodfire reports that some R1 features revert toward original behavior under oversteering before outputs become incoherent. | Medium | SE010 |
| CE023 | Goodfire argues that important model concepts often live on curved manifolds rather than along single linear directions. | Medium | SE011, SE014 |
| CE024 | "Can SAEs Capture Neural Geometry?" says a single sparse-autoencoder feature gives only a partial view of curved internal structure. | Medium | SE014 |
| CE025 | Goodfire says its manifold pipeline clusters sparse features to recover fuller geometric structure from internal representations. | Medium | SE014 |
| CE026 | Stochastic Parameter Decomposition moves interpretability into parameter space by learning which weight components can be removed without changing behavior. | Medium | SE017 |
| CE027 | Model diff amplification makes rare harmful behaviors 10 to 300 times more common in sampling, making them easier to detect. | Medium | SE016 |
| CE028 | Goodfire says model diff amplification can reveal post-training side effects after only a fraction of a training run. | Medium | SE016 |
| CE029 | Goodfire's eval-awareness study found naturally occurring verbalized eval awareness across all 19 benchmarks and 8 models it tested. | Medium | SE012 |
| CE030 | Goodfire says prompt rewrites reduced eval awareness by 40% and an unsupervised method reduced it by 75%, with safe-behavior rates also falling. | Medium | SE012 |
| CE031 | Paint With Ember uses a canvas that manipulates SDXL-Turbo internal activations instead of relying only on text prompts. | Medium | SE019 |
| CE032 | Goodfire's research surfaces and phylogeny work argue that internal geometry recapitulates structured concepts across language, image, and genomic models. | Medium | SE011, SE021 |
| CE033 | Goodfire's terms define the platform as software, APIs, tools, documentation, support, and services, and allow customers to bring models, files, datasets, code, and workflows into the platform. | Medium | SE022 |
| CE034 | Public terms tie commercial fees and overages to order forms and usage reports rather than to a public rate card. | Medium | SE022 |
| CE035 | The Pilot Agreement limits pilot use to internal evaluation and requires a separate commercial license after the evaluation period. | Medium | SE023 |
| CE036 | Goodfire's terms and pilot agreement both describe support, technical assistance, field engineering, research activities, and deliverables around the platform. | High | SE022, SE023 |
| CE037 | Goodfire's terms allow third-party products and permit access suspension for security, legal, operational, or overdue-account reasons. | Medium | SE022 |
| CE038 | Goodfire's terms say customers retain customer materials while Goodfire retains Goodfire IP and broad rights over usage data and licensed workflow data. | Medium | SE022 |
| CE039 | Goodfire announced SOC 2 Type II with no exceptions identified and a public SOC 3 summary. | Medium | SE008 |
| CE040 | Goodfire says its Mayo collaboration operates under rigorous data privacy protocols and Mayo Clinic governance frameworks. | Medium | SE027 |
| CE041 | NIST's AI RMF and generative-AI profile focus on embedding trustworthiness into AI design, development, use, and evaluation. | Medium | SE033 |
| CE042 | Gartner says GenAI total cost of ownership is often understated and that critical decision use cases require more robust and interpretable approaches. | Medium | SE034 |
| CE043 | Salesforce Ventures frames Goodfire as moving AI teams from guessing at behavior to measuring and shaping model intent and reasoning. | Medium | SE031 |
| CE044 | On Healthcare Tech argues that interpretability could become infrastructure for regulated health AI, but public commercialization evidence still looks early. | Medium | SE032 |
| CE045 | Public materials reviewed do not provide a public status page, self-serve API reference, or public deployment-count disclosure for Silico. | Medium | SE001, SE022, SE030 |
| CE046 | Careers, Stanford guest lectures, and the fellowship program show active research-engineering recruiting and practitioner education despite a limited public OSS product surface. | Medium | SE024, SE025, SE026 |
| CE047 | Goodfire's public 2026 output is research-led and fast, with published releases on May 4, May 20, and May 21 covering eval awareness and neural geometry. | Medium | SE012, SE014, SE020 |
| CE048 | PR Newswire says Series B proceeds will fund next-generation product development and partnership scaling across AI agents and life sciences. | Medium | SE038 |
| CE049 | EVEE combines Evo 2 embeddings, lightweight probes, and frontier reasoning models to generate human-readable hypotheses about variant effects. | Medium | SE015 |
| CE050 | Goodfire's phylogeny work says Evo 2 encodes tree-of-life relationships as a curved manifold, reinforcing its model-to-human knowledge-transfer thesis. | Medium | SE021 |
| CE051 | Menlo and Lightspeed both describe Goodfire as an applied research lab translating mechanistic interpretability into productized tooling. | Medium | SE035, SE036 |
| CE052 | PYMNTS reports that Goodfire internally uses a model design environment and deploys that shared environment forward with customers. | Medium | SE037 |
| CU001 | Goodfire publicly says its platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. | High | SU001, SU014 |
| CU002 | Goodfire positions Silico and related services for organizations training or fine-tuning foundation models across architectures and modalities. | High | SU001, SU002 |
| CU003 | Goodfire says it engages deeply and selectively with teams building high-stakes or frontier systems where understanding and control are essential. | Medium | SU014 |
| CU004 | Public product and contact copy imply buyers are research, platform, or product owners while day-to-day users are research scientists and ML engineers. | Medium | SU001, SU002, SU003, SU004, SU005 |
| CU005 | Goodfire's named proof set is concentrated in life sciences, AI-agent infrastructure, and materials discovery rather than a wide range of end markets. | Medium | SU003, SU006, SU008, SU010, SU011, SU013 |
| CU006 | Reviewed public sources do not disclose Goodfire's customer count. | High | SU001, SU014, SU019, SU020 |
| CU007 | Reviewed public sources do not disclose Goodfire's segment-level revenue or ARR by customer type. | High | SU014, SU020, SU022 |
| CU008 | The broad Fortune 500 adoption claim is not backed by a public list of named enterprise customers or outcomes. | High | SU001, SU014 |
| CU009 | Goodfire's public sales surface is request-access and contact-led rather than self-serve. | High | SU001, SU019 |
| CU010 | Prima Mente partnered with Goodfire to understand its Pleiades epigenomics model. | Medium | SU006, SU007 |
| CU011 | Goodfire says its researchers embedded in Prima Mente's team while building a biomarker-discovery pipeline. | Medium | SU006 |
| CU012 | Goodfire and Prima Mente identified a novel class of blood-borne biomarkers for Alzheimer's detection. | Medium | SU006, SU007, SU022 |
| CU013 | Prima Mente's public outcome remains pre-validation because the biomarkers are still undergoing experimental validation and a publication is forthcoming. | Medium | SU006, SU007 |
| CU014 | Goodfire collaborated with Arc Institute to interpret Evo 2, Arc's genomic foundation model. | Medium | SU010, SU025 |
| CU015 | The initial Arc Institute disclosure described feature discovery and steering work that was still in its early stages rather than a mature production deployment. | Medium | SU010 |
| CU016 | By March 2026, Goodfire's Evo 2 interpretability work had been updated to note its publication in Nature, increasing the scientific credibility of the Arc partnership. | Medium | SU010, SU020 |
| CU017 | Goodfire says its Mayo Clinic collaboration combines interpretability work with Mayo's medical AI team and established data-governance frameworks. | Medium | SU008 |
| CU018 | Public Mayo materials frame the work as genomic research and responsible-AI validation rather than routine clinical deployment. | Medium | SU008, SU009 |
| CU019 | Goodfire's EVEE work is described as part of its ongoing collaboration with Mayo Clinic and is still undergoing peer review. | Medium | SU009 |
| CU020 | Goodfire says EVEE achieves 0.997 AUROC on 839k ClinVar variants and provides predictions and explanations for all 4.2 million ClinVar variants. | Medium | SU003, SU009 |
| CU021 | Goodfire says EVEE outputs are computational predictions rather than diagnoses and require further expert review and validation. | Medium | SU009 |
| CU022 | Goodfire partnered with Rakuten on PII detection for multilingual AI-agent messages in a production-critical enterprise setting. | Medium | SU013 |
| CU023 | The Rakuten deployment required synthetic-to-real generalization, multilingual English and Japanese coverage, lightweight inference, and very high recall. | Medium | SU013 |
| CU024 | Goodfire says Rakuten deployed SAE probes and describes the system as the first known enterprise application of SAEs for language-model guardrails. | Medium | SU013 |
| CU025 | Among reviewed sources, Rakuten is the clearest public evidence of a production Goodfire deployment. | Medium | SU013, SU019 |
| CU026 | Goodfire and Radical AI publicly announced a partnership to apply interpretability to inverse materials design. | Medium | SU011, SU012 |
| CU027 | Goodfire says its self-correcting-search work with Radical AI improved successful candidates by about 27% and generated about 30% more SUN materials in the target range. | Medium | SU012 |
| CU028 | Radical AI's public partnership disclosure says more research directions and outcomes will be shared later, leaving commercialization maturity unclear. | Medium | SU011, SU012 |
| CU029 | Goodfire says it forward-deploys its model design environment with customers in a shared environment. | High | SU014, SU026 |
| CU030 | MIT Technology Review reports that Silico pricing is determined case by case and Goodfire declined to provide specific pricing. | Medium | SU019 |
| CU031 | MIT Technology Review says Silico could let smaller firms and research teams build or adapt open-source models without hiring interpretability researchers. | Medium | SU019 |
| CU032 | Goodfire's public positioning is selective and high-touch rather than high-volume self-serve SaaS. | High | SU001, SU014, SU019 |
| CU033 | Goodfire's Series B materials say the new funding will scale partnerships across AI agents and life sciences. | Medium | SU021, SU022, SU023, SU026 |
| CU034 | Salesforce Ventures frames Goodfire around enterprise AI ROI, reliability, and control problems. | Medium | SU017, SU018 |
| CU035 | The public proof set spans life sciences, AI-agent operations, materials science, and general frontier-model design. | High | SU003, SU011, SU013, SU014 |
| CU036 | Goodfire's public blog history shows named collaboration proof surfacing across 2025 and 2026 rather than through a single isolated announcement. | High | SU008, SU010, SU011, SU013, SU016 |
| CU037 | Goodfire's public materials distinguish broad segment claims from a much smaller set of named collaborators. | High | SU001, SU003, SU006, SU008, SU010, SU011, SU013 |
| CU038 | No reviewed public source disclosed NRR, GRR, churn, renewal rate, or true retention cohorts for Goodfire. | High | SU001, SU014, SU019, SU020 |
| CU039 | No reviewed public source disclosed contract length, commercial expansion metrics, or top-customer concentration for Goodfire. | High | SU001, SU014, SU020, SU022 |
| CU040 | Arc Institute and Mayo Clinic each have later public follow-on evidence after their initial collaboration announcements, indicating relationship continuity but not proving paid renewal. | Medium | SU008, SU009, SU010 |
| CU041 | The disclosed reference set is concentrated in a handful of named collaborators and is especially weighted toward life-sciences programs. | Medium | SU003, SU006, SU008, SU009, SU010, SU020 |
| CU042 | The broad Fortune 500 claim remains materially weaker than the named proof set because no enterprise names or outcomes are publicly disclosed. | Medium | SU001, SU014 |
| CU043 | MIT Technology Review quoted Leonard Bereska saying Goodfire is adding “precision to the alchemy,” a substantive critique of how principled the product really is. | Medium | SU019 |
| CU044 | OnHealthcare argued that Goodfire's $1.25 billion valuation looks aggressive for a research-first company with relatively early commercial traction. | Medium | SU020 |
| CU045 | OnHealthcare argued that the public valuation case relies more on platform option value than on disclosed revenue or customer metrics. | Medium | SU020 |
| CU046 | Several scientific customer outcomes remain partly hypothesis-stage because Prima Mente's biomarkers are still under validation and EVEE is still undergoing peer review. | Medium | SU006, SU007, SU009 |
| CU047 | Silico is most naturally usable where customers can inspect model internals, which may bias near-term adoption toward open-model teams and research labs over closed-model users. | Medium | SU002, SU019 |
| CU048 | Goodfire's continued publication of frontier interpretability results supports a customer narrative built on research credibility as much as on packaged software. | Medium | SU034 |
| CU049 | Mayo Clinic is a major medical institution, so Goodfire's disclosed collaboration carries meaningful signal for regulated-domain customer credibility. | High | SU035, SU007 |
| CR001 | Goodfire positions itself as a research company using interpretability to understand, learn from, and design AI systems rather than relying on scale alone. | Medium | SR006, SR022 |
| CR002 | Goodfire publicly argues that today's dominant AI-development process still cannot meaningfully understand, debug, or shape what models learn. | Medium | SR005, SR006 |
| CR003 | Goodfire says current model training is still a costly guess-and-check process and presents intentional design as an attempt to move from open-loop tweaking toward closed-loop control. | Medium | SR005 |
| CR004 | Goodfire also states that its techniques are early, the science is incomplete, and the hardest interpretability problems remain unsolved. | Medium | SR005, SR023 |
| CR005 | MIT Technology Review described Silico as potentially useful but quoted an external mechanistic-interpretability researcher saying Goodfire is adding precision to alchemy rather than fully turning model building into engineering. | Medium | SR013 |
| CR006 | Goodfire markets Silico as a model-design environment that can debug behavior, remove confounders, and diagnose failures before production, but access is still request-based rather than self-serve. | Medium | SR009 |
| CR007 | Goodfire claims its platform is already used by Fortune 500 enterprises, major healthcare institutions, and AI research labs, but it does not disclose how many of those users are production customers or what they pay. | Medium | SR008 |
| CR008 | Under the MSA and TOS, Goodfire only commits to support, service levels, implementation help, training, or professional services if those items are expressly defined in an order form. | Medium | SR001, SR003 |
| CR009 | The TOS says pilot, beta, trial, evaluation, or pre-release access may be modified, suspended, or discontinued at any time and, absent an order form, carries no service levels, support commitments, security commitments, or availability commitments. | Medium | SR003 |
| CR010 | Goodfire's default legal terms disclaim warranties that the platform or services will be uninterrupted, secure, accurate, complete, or error free. | Medium | SR001, SR003 |
| CR011 | Goodfire's aggregate liability is capped at fees paid in the prior twelve months under the MSA and TOS, while pilot-agreement liability is capped at pilot fees. | Medium | SR001, SR002, SR003 |
| CR012 | The TOS defines Usage Data broadly to include usage volumes, clickstream, logs, performance data, and error data, and classifies that Usage Data as Goodfire IP. | Medium | SR003 |
| CR013 | The TOS gives Goodfire a perpetual, irrevocable, sublicensable license to use Workflow Data to provide, improve, evaluate, train, and commercialize the platform, subject to promises not to identify the customer or reveal confidential information. | Medium | SR003 |
| CR014 | The MSA's feedback clause assigns customer feedback and related know-how to Goodfire without attribution or compensation. | Medium | SR001 |
| CR015 | Goodfire's public contracts require compliance with U.S. export and re-export restrictions and any necessary government approvals for cross-border use of the service or customer materials. | Medium | SR001, SR003 |
| CR016 | Goodfire says it is SOC 2 Type II compliant and directs customers to a trust portal for SOC 3 materials and full-report requests. | Medium | SR004 |
| CR017 | Goodfire says its Mayo Clinic collaboration operates under rigorous data-privacy protocols and Mayo's established data-governance frameworks. | Medium | SR011 |
| CR018 | Goodfire's Prima Mente case study says the customer needed to narrow model signals for experimental validation and FDA-approval progress, and that Goodfire identified a novel biomarker class through interpretability work. | Medium | SR010 |
| CR019 | OnHealthcare argues Goodfire's 2026 valuation is aggressive relative to a roughly 51-person organization that still appears early in commercial traction and is funding green-field research alongside product work. | Medium | SR014 |
| CR020 | Goodfire's February 2026 Series B valued the company at $1.25 billion and brought total disclosed funding to just over $200 million. | Medium | SR014, SR015, SR021, SR022 |
| CR021 | OnHealthcare reports Goodfire had about 51 employees as of January 2026. | Medium | SR014 |
| CR022 | Goodfire's careers page explicitly recruits people who thrive in fast-paced environments, signaling that the company is still building organizational depth while scaling. | Medium | SR007 |
| CR023 | OnHealthcare says Eric Ho has argued there are probably fewer than 150 full-time interpretability researchers in the world, implying a tight labor pool for the company's core discipline. | Medium | SR014 |
| CR024 | MIT Technology Review says Silico pricing is determined case by case based on customer requirements and that Goodfire declined to disclose pricing specifics. | Medium | SR013 |
| CR025 | OnHealthcare argues Goodfire is not yet a predictable SaaS business with clearly disclosed recurring revenue mechanics. | Medium | SR014 |
| CR026 | Goodfire's Prima Mente case study says Goodfire researchers embedded with the customer team and built the biomarker-discovery pipeline jointly. | Medium | SR010 |
| CR027 | Goodfire's TOS contemplates support, technical assistance, field engineering, research activities, collaboration activities, and deliverables alongside platform access. | Medium | SR003 |
| CR028 | Goodfire's May 2026 eval-awareness paper says verbalized eval awareness appeared across all 19 benchmarks and 8 models tested, with 515 manually verified instances. | Medium | SR024 |
| CR029 | The same paper says eval awareness correlates with safer behavior and can systematically overestimate model alignment if benchmarks do not account for it. | Medium | SR024 |
| CR030 | Goodfire reports that rewriting prompts cut verbalized eval awareness by 40 percent, that an unsupervised paraphrasing method cut it by 75 percent, and that refusal rates also dropped as awareness fell. | Medium | SR024 |
| CR031 | Goodfire's Reasoning Theater work says chain-of-thought text can be performative; on easier tasks models often know the answer early and generate superfluous reasoning that lags internal state. | Medium | SR025 |
| CR032 | Reasoning Theater also reports that probe-based early exit saved 68 percent of MMLU tokens and 33 percent of GPQA-Diamond tokens for DeepSeek-R1 while retaining more than 95 percent of baseline accuracy. | Medium | SR025 |
| CR033 | Goodfire's model-diff-amplification post says harmful or backdoored behaviors are often a needle-in-a-haystack problem that standard evaluations miss until after deployment. | Medium | SR030 |
| CR034 | Model diff amplification made harmful outputs 10x to 300x more frequent in testing and made a sleeper-agent backdoor about 100x easier to surface, but Goodfire says the method is only for detection and overstates real prevalence. | Medium | SR030 |
| CR035 | Goodfire's memorization-via-loss-curvature work says language models memorize substantial portions of training data and that many questions about how memories are stored or localized remain unresolved. | Medium | SR027 |
| CR036 | The same memorization work says suppressing memorization can preserve logical reasoning but degrade arithmetic and closed-book factual recall, showing that edits can trade off reliability across tasks. | Medium | SR027 |
| CR037 | Goodfire's SPD post argues that sparse autoencoders do not explain feature geometry, do not converge to a single true decomposition as they scale, and that SPD still has non-trivial sensitivity and has only been validated on toy models. | Medium | SR026 |
| CR038 | Goodfire's neural-geometry post says a single SAE direction gives only a partial view of curved structure, so interpreting features one by one misses the global picture. | Medium | SR028 |
| CR039 | Goodfire's manifold-steering post says linear steering often mismatches internal geometry and can produce noisy, off-target effects. | Medium | SR029 |
| CR040 | Goodfire's scientific-model interpretability work argues interpretability can improve reliability and transparency in downstream applications, especially clinical domains, and that extracting mechanisms from complex models is valuable but remains challenging. | Medium | SR031 |
| CR041 | MIT Technology Review says interpretability tools like Silico could be essential for safety-critical applications in healthcare and finance, increasing the burden on Goodfire to prove deployment-grade trustworthiness rather than just interesting demos. | Medium | SR013 |
| CR042 | NIST says the 2024 generative-AI profile and the 2026 critical-infrastructure concept note are intended to guide organizations toward concrete AI risk-management practices and trustworthy-AI controls. | Medium | SR016 |
| CR043 | Gartner says enterprise GenAI outcomes depend heavily on data quality, governance, change management, realistic expectations, and talent availability. | Medium | SR017 |
| CR044 | Datadog markets a production stack that combines prompt testing, evaluations, tracing, monitoring, sensitive-data scanning, and enterprise controls for AI systems. | Medium | SR019 |
| CR045 | LangSmith markets observability, monitoring, hallucination debugging, and self-hosted or BYOC deployment options so sensitive traces can stay inside the customer environment. | Medium | SR020 |
| CR046 | OnHealthcare and Goodfire's Mayo materials both frame healthcare deployment as blocked by the gap between model predictions and biological understanding, positioning interpretability as a compliance and validation bridge rather than only a developer tool. | Medium | SR011, SR014 |
| CR047 | Goodfire's public proof set is concentrated in named collaborations and case studies—Prima Mente, Mayo Clinic, Radical AI, and unnamed enterprise claims—rather than a broad list of disclosed production references. | Medium | SR008, SR010, SR011, SR012 |
| CR048 | The Radical AI partnership announcement says details on research directions and outcomes will be shared later, so one of Goodfire's flagship scientific partnerships is still forward-looking in public evidence. | Medium | SR012 |
| CR049 | PwC says healthcare AI adoption is slower than in other sectors and emphasizes risk-controlled adoption, which raises go-to-market friction for vendors selling into regulated clinical workflows. | Medium | SR018 |
| CR050 | Adjacent observability vendors already package evaluation, tracing, monitoring, and governance into production platforms, so Goodfire has to prove that interpretability delivers a distinct control layer rather than just another form of observability. | Medium | SR019, SR020 |
| CR051 | Salesforce Ventures argues enterprise AI buyers are increasingly constrained by unclear ROI and by an inability to steer models reliably and consistently, framing control and reliability as buyer pain rather than purely research interests. | Medium | SR032 |
| CR052 | Lightspeed framed Goodfire as critical infrastructure for explainable and mission-critical AI, explicitly tying future demand to regulation and to the need to productize interpretability for enterprises rather than only researchers. | Medium | SR033 |
| CR053 | Investing.com reported that Goodfire works with clients including Microsoft, Mayo Clinic, and Arc Institute and plans to use new capital for model improvement, compute, and hiring, which reinforces both the value of its partner roster and the intensity of its execution demands. | Medium | SR034 |
| CR054 | Adjacent observability vendors already market tracing, monitoring, and workflow-debugging for AI agents, increasing substitution risk around parts of Goodfire's budget. | Medium | SR035 |
| CR055 | Datadog now packages agent observability inside a broader enterprise monitoring suite, which can pull AI-operations budget toward incumbent platforms. | Medium | SR036 |
| CR056 | Langfuse positions itself as an observability layer with open-source adoption, reinforcing price and workflow competition for AI development teams. | Medium | SR037 |
| CR057 | Langfuse publishes transparent pricing, which increases buyer expectations for standardized software packaging that Goodfire has not yet publicly matched. | Medium | SR038 |
| CR058 | LangSmith markets observability for AI agents and LLM applications, underscoring that adjacent tooling vendors can compete for the same developer and platform owners. | Medium | SR039 |
| CR059 | Weights' combination with OpenAI highlights consolidation risk in AI tooling, where platform vendors can absorb adjacent products before smaller specialists fully scale. | Medium | SR040 |
| CR060 | Mechanistic interpretability results still depend on advancing research rather than finished engineering playbooks. | Medium | SR041 |
| CR061 | Goodfire continues to publish foundational work on latent computation, underscoring that part of its edge still resides in experimental research rather than commoditized software. | Medium | SR042 |
| CR062 | Goodfire's ongoing publication cadence suggests platform differentiation remains tied to research velocity, which creates key-person and execution dependence if commercialization lags. | Medium | SR043 |
| CR063 | Goodfire's valuation and product narrative still depend on turning novel neural-geometry research into dependable commercial workflows, which keeps execution risk elevated. | Medium | SR044 |
| CV001 | Goodfire announced a $150 million Series B at a $1.25 billion valuation in February 2026. | High | SV001, SV002, SV012 |
| CV002 | B Capital led the Series B and the syndicate included Juniper Ventures, Menlo Ventures, Lightspeed Venture Partners, South Park Commons, Wing Venture Capital, DFJ Growth, Salesforce Ventures, and Eric Schmidt. | Medium | SV001, SV002, SV003 |
| CV003 | Goodfire said the Series B came less than a year after its Series A. | Medium | SV002, SV003, SV006 |
| CV004 | Goodfire announced a $50 million Series A in April 2025 led by Menlo Ventures with Anthropic participating. | Medium | SV006, SV007 |
| CV005 | Public company and press-release materials imply that Goodfire has raised more than $200 million in total capital after the Series B. | High | SV001, SV002, SV006 |
| CV006 | Official and SEC materials identify Goodfire as a Delaware company founded in 2023 and based in San Francisco. | High | SV028, SV029, SV030 |
| CV007 | Goodfire describes itself as a public benefit corporation focused on interpretability to understand, learn from, and design AI systems. | Medium | SV002, SV030 |
| CV008 | The April 2025 Form D shows roughly $52.0 million sold against a roughly $52.1 million offering tied to the Series A financing. | Medium | SV028, SV006 |
| CV009 | The February 2026 Form D lists Yan-David Erlich among related persons and shows a $161.7 million offering amount tied to the Series B-era filing. | Medium | SV029, SV002 |
| CV010 | Goodfire positions Ember as its flagship model design environment and interpretability platform. | High | SV006, SV007, SV010 |
| CV011 | Goodfire says Ember is meant to give programmable access to internal model features so users can inspect, edit, and retrain behavior more precisely than black-box methods. | Medium | SV006, SV007, SV009 |
| CV012 | Goodfire says interpretability-guided training reduced hallucinations in a language model by roughly half. | Medium | SV001, SV002, SV010 |
| CV013 | Goodfire cites collaborators including Arc Institute, Mayo Clinic, Prima Mente, and Microsoft. | Medium | SV001, SV010 |
| CV014 | Goodfire says interpretability work surfaced a novel class of Alzheimer's biomarkers from Prima Mente's epigenetic model. | Medium | SV001, SV002, SV010 |
| CV015 | Goodfire announced SOC 2 Type II compliance with no exceptions identified in February 2026. | Medium | SV031 |
| CV016 | Goodfire continued publishing 2026 research across neural geometry, steering, parameter decomposition, and pooling methods. | Medium | SV022, SV024, SV025, SV027, SV031 |
| CV017 | Goodfire's Llama 3 research preview says it trained sparse autoencoders on Llama-3-8B and used causal feature interventions to steer outputs while minimizing degradation. | Medium | SV023 |
| CV018 | Goodfire's Geometric Calculator page says Llama 3.1 8B uses a general-purpose addition module that handles months, days, and arithmetic via circular representations. | Medium | SV024 |
| CV019 | Goodfire's Covariance Pooling page argues second-moment pooling outperforms mean pooling on downstream genomic tasks. | Medium | SV025 |
| CV020 | Goodfire's Painting With Concepts page shows interpretability tooling applied to SDXL-Turbo image generation, indicating modality expansion beyond text. | Medium | SV026 |
| CV021 | Goodfire's VPD explainer says the company decomposed a 67M-parameter model into simple pieces and used that structure to edit behavior without training. | Medium | SV027 |
| CV022 | Goodfire's product wedge sits deeper in the stack than observability vendors because it aims to intervene on model internals rather than only trace outputs or enforce guardrails. | Medium | SV009, SV019, SV020, SV021 |
| CV023 | Arize Phoenix positions itself around tracing, evals, and agent observability rather than model-internal design. | Medium | SV019 |
| CV024 | Fiddler positions its product around observability, guardrails, and governance for agents and predictive AI rather than model-internal representation editing. | Medium | SV020 |
| CV025 | LangSmith positions its product around tracing, monitoring, and clustering for agent behavior rather than model-internal steering. | Medium | SV021 |
| CV026 | Gartner says generative AI entered the 2025 trough of disillusionment and that ROI depends on governance, change management, and full cost accounting. | Medium | SV017 |
| CV027 | NIST says the AI RMF and its generative AI profile exist to help organizations manage trustworthiness and AI risk across design, deployment, and evaluation. | Medium | SV018 |
| CV028 | The OnHealthcare analysis says Goodfire raised $209 million across seed, Series A, and Series B and estimated the team at roughly 51 employees as of January 2026. | Medium | SV010 |
| CV029 | The OnHealthcare analysis argues that the $1.25 billion valuation is aggressive for a research-first company with relatively early commercial traction. | Medium | SV010 |
| CV030 | TechCrunch's 2026 mega-round list places Goodfire among U.S. AI companies that raised $100 million or more in early 2026 at a $1.25 billion valuation. | Medium | SV012 |
| CV031 | TechCrunch reported that Eric Schmidt's Hillspire invested directly in Goodfire as family offices and private wealth moved earlier into AI deals. | Medium | SV011 |
| CV032 | Anysphere was valued at $9.9 billion after surpassing $500 million in ARR. | Medium | SV013 |
| CV033 | Harvey was reportedly raising at an $11 billion valuation after reaching $190 million in ARR by the end of 2025. | Medium | SV014 |
| CV034 | Glean reached a $7.2 billion valuation after surpassing $100 million in ARR. | Medium | SV015 |
| CV035 | Anthropic was valued at $350 billion in April 2026 with up to $40 billion of Google investment and large compute commitments. | Medium | SV016 |
| CV036 | Unlike Anysphere, Harvey, and Glean, Goodfire's public round materials do not disclose revenue or ARR, so a comparable revenue multiple cannot be responsibly calculated from public evidence. | Medium | SV001, SV002, SV010, SV013, SV014, SV015 |
| CV037 | The current mark therefore looks like strategic option value on category leadership, research talent, and future platform commercialization rather than a fundamentals-backed software multiple. | Medium | SV001, SV009, SV010, SV017 |
| CV038 | Goodfire's strategic investor mix—Anthropic in Series A, Salesforce in Series B, and Eric Schmidt in Series B—supports the view that technically sophisticated buyers think interpretability will matter commercially. | Medium | SV006, SV009, SV011, SV002 |
| CV039 | Goodfire's market relevance is helped by enterprise pressure for explainability, governance, and reliable ROI in AI deployments. | Medium | SV009, SV017, SV018 |
| CV040 | Public evidence still does not disclose customer count, pricing, contract structure, retention, gross margin, or software-versus-services mix. | Medium | SV001, SV002, SV010 |
| CV041 | A plausible bull case requires proof that Goodfire is converting research credibility into repeatable software revenue and durable enterprise adoption. | Medium | SV009, SV017, SV019 |
| CV042 | Without that proof, a base case should haircut the last round and anchor below $1.25 billion because market demand is real but commercial evidence is incomplete. | Medium | SV010, SV017, SV013, SV014, SV015 |
| CV043 | A reasonable public-evidence bear case is a sub-$650 million outcome if commercialization stays bespoke, competitors absorb budget, or private AI multiples compress. | Medium | SV010, SV019, SV020, SV021, SV013 |
| CV044 | A reasonable public-evidence base case is roughly $800 million to $1.1 billion, implying the last round already prices in part of the bull thesis. | Medium | SV010, SV013, SV014, SV015, SV017 |
| CV045 | A reasonable public-evidence bull case is roughly $1.25 billion to $1.85 billion, which requires disclosed software revenue, strong design-partner conversion, and continued research and enterprise validation. | Medium | SV001, SV009, SV010, SV015 |
| CV046 | Given stage and disclosure opacity, another private round or strategic acquisition is a more plausible near-to-mid-term path than a public listing. | Medium | SV010, SV012, SV015, SV016 |
| CV047 | The most supportable current recommendation is research-more rather than buy because company-quality evidence exceeds pricing evidence. | Medium | SV001, SV010, SV017, SV018 |
| CV048 | The most supportable valuation stance is stretched because the $1.25 billion round sits near the lower bound of the bull case, not the center of the base case. | Medium | SV010, SV013, SV014, SV015, SV017 |
| CV049 | Entry discipline should require NDA-gated disclosure of ARR or revenue, pricing, top-customer concentration, gross margin, and the post-Series-B preference stack before underwriting above the base-case range. | Medium | SV010, SV017, SV018 |
| CV050 | Thesis-break triggers include failure to disclose recurring revenue quality, inability to convert partners into repeatable platform customers, or evidence that observability vendors can satisfy budgets without Goodfire's deeper tooling. | Medium | SV009, SV019, SV020, SV021, SV026 |
| CV051 | Goodfire's valuation case depends partly on owning distinctive interpretability research that competitors may not easily replicate. | Medium | SV032 |
| CV052 | Goodfire continues to invest in foundational interpretability methods, which supports upside optionality but also means commercial value still depends on converting research into repeatable product adoption. | Medium | SV033 |
| CV053 | Goodfire's upside case still depends on scaling its interpretability research edge into a durable commercial moat before adjacent tooling categories commoditize around it. | Medium | SV034 |
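
The valuation claims in rows CV032 to CV048 above rest on simple arithmetic, so a minimal Python sketch may help readers check it. It uses only figures quoted in those rows (in millions of USD); the function names and the decision to treat "surpassing" ARR figures as floors and the bear ceiling as inclusive are assumptions made for illustration, not part of any cited source. It computes the peer valuation-to-ARR multiples and shows where the $1.25 billion round sits against the bear, base, and bull ranges. No Goodfire multiple is computed, because no Goodfire revenue is publicly disclosed (CV036). Calling `scenarios_containing(LAST_ROUND_MUSD)` returns only the bull range, which matches the reading in CV048.

```python
# Figures copied from claims CV001, CV032-CV034 and CV043-CV045 (millions of USD).
COMPS = {
    # name: (valuation, ARR). "Surpassing" ARR figures are treated as floors,
    # so the resulting multiples for Anysphere and Glean are upper bounds.
    "Anysphere": (9_900, 500),
    "Harvey": (11_000, 190),
    "Glean": (7_200, 100),
}

# Scenario ranges as (low, high). The bear case is "sub-$650 million";
# its ceiling is treated as inclusive here for simplicity (an assumption).
SCENARIOS = {
    "bear": (0, 650),
    "base": (800, 1_100),
    "bull": (1_250, 1_850),
}

LAST_ROUND_MUSD = 1_250  # February 2026 Series B valuation (CV001)


def arr_multiple(valuation_musd, arr_musd):
    """Valuation divided by annual recurring revenue."""
    return valuation_musd / arr_musd


def scenarios_containing(value_musd):
    """Names of the scenario ranges that contain the given valuation."""
    return [
        name
        for name, (low, high) in SCENARIOS.items()
        if low <= value_musd <= high
    ]


def haircut_from_round(target_musd):
    """Fractional discount from the last round needed to reach target_musd."""
    return 1 - target_musd / LAST_ROUND_MUSD


def premium_over_base_midpoint():
    """How far the last round sits above the midpoint of the base range."""
    low, high = SCENARIOS["base"]
    return LAST_ROUND_MUSD / ((low + high) / 2) - 1


if __name__ == "__main__":
    for name, (valuation, arr) in COMPS.items():
        print(f"{name}: {arr_multiple(valuation, arr):.1f}x ARR")
    print("Round falls in:", scenarios_containing(LAST_ROUND_MUSD))
    low, high = SCENARIOS["base"]
    print(f"Haircut to base range: {haircut_from_round(high):.0%} "
          f"to {haircut_from_round(low):.0%}")
    print(f"Premium over base midpoint: {premium_over_base_midpoint():.1%}")
```
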

| ID | Publisher | Title | Quote |
|---|---|---|---|
| SO001 | Goodfire | Goodfire homepage | Goodfire is a research company using interpretability to understand, learn from, and design AI systems. |
| SO002 | Goodfire | Goodfire company page | |
| SO003 | Goodfire | Goodfire careers | All roles are full-time, in person five days a week at our San Francisco, Telegraph Hill office. |
| SO004 | Goodfire | Our Series B | Today, we’re excited to announce a $150 million Series B funding round at a $1.25 billion valuation. |
| SO005 | Goodfire | Intentionally Designing the Future of AI | At Goodfire, we’re developing the science and technology that lets us steer model training — a process we’re calling intentional design. |
| SO006 | Goodfire | On optimism for interpretability | At Goodfire, we believe we can engineer frontier AI systems that are understandable. |
| SO007 | Goodfire | Silico | The first platform for intentional model design. |
| SO008 | Goodfire | Life Sciences | We partner with companies training foundation models across architectures and modalities to interpret their models. |
| SO009 | Goodfire | Goodfire Announces Collaboration to Advance Genomic Medicine with AI Interpretability | Mayo Clinic has a financial interest in the technology referenced in this press release. |
| SO010 | Goodfire | Goodfire contact | Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. |
| SO011 | Goodfire | Prima Mente customer story | Goodfire’s platform for in silico science decoded their model, identifying a novel class of biomarkers for Alzheimer’s detection. |
| SO012 | Goodfire | Fellowship Fall 25 | We’re excited to announce that we’ll be bringing on several Research Fellows and Research Engineering Fellows this fall for our fellowship program. |
| SO013 | Goodfire | AP293 guest lectures 25 | We gave three guest lectures in Surya Ganguli’s course on interpretability at Stanford last fall. |
| SO014 | PR Newswire | AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability | Today, Goodfire—the AI research lab using interpretability to understand, learn from, and design models—announced a $150 million Series B funding round at a $1.25 billion valuation. |
| SO015 | Yahoo Finance | AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability | |
| SO016 | Pulse 2.0 | Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab | The company has raised more than $200 million in total backing from a mix of venture firms and individual investors. |
| SO017 | Tech Funding News | Goodfire raises $150M Series B at $1.25B valuation for interpretability AI | |
| SO018 | PR Newswire | Goodfire raises $50M Series A to advance AI interpretability research | This funding, which comes less than one year after its founding, will support the expansion of Goodfire’s research initiatives and the development of the company’s flagship interpretability platform, Ember. |
| SO019 | Yahoo Finance | Goodfire raises $50M Series A to advance AI interpretability research | |
| SO020 | Goodfire | Announcing our $50M Series A | Today, we’re excited to announce a $50 million Series A funding round led by Menlo Ventures. |
| SO021 | Menlo Ventures | Leading Goodfire’s $50M Series A to interpret how AI models think | |
| SO022 | Lightspeed Venture Partners | Goodfire: Building Interpretable AI | We at Lightspeed are thrilled to lead their $7M seed round. |
| SO023 | Lightspeed Venture Partners | Goodfire company profile | |
| SO024 | Salesforce Ventures | Welcome, Goodfire | Goodfire was founded by Eric Ho, Daniel Balsam, and Thomas McGrath. |
| SO025 | Salesforce Ventures | Goodfire company profile | |
| SO026 | VCNewsDaily | Goodfire Venture Capital Funding | |
| SO027 | MIT Technology Review | This startup’s new mechanistic interpretability tool lets you debug LLMs | In reality, they are adding precision to the alchemy. |
| SO028 | OnHealthcare | Goodfire AI and the billion-dollar interpretability bet | The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction. |
| SO029 | PYMNTS | Goodfire raises $150 million to better understand AI | |
| SO030 | LSVP | Goodfire company page | |
| SM001 | Goodfire | Goodfire | Understand the scientific foundations of neural networks so that we can intentionally design AI. |
| SM002 | Goodfire | Company \| Goodfire | We engage deeply and selectively, partnering with teams building high-stakes or frontier systems where understanding and control are essential. |
| SM003 | Goodfire | Silico \| Goodfire | A model design environment. |
| SM004 | Goodfire | Language \| Goodfire | 58% reduction in hallucinations by using features as rewards. |
| SM005 | Goodfire | Life Sciences \| Goodfire | Interpretability surfaced fragment length as the dominant predictive signal. |
| SM006 | Goodfire | Robotics & Vision \| Goodfire | Catch generalization failure before deployment. |
| SM007 | Goodfire | Our Series B \| Goodfire | We have built a model design environment ... to improve model behavior, and monitor them in production. |
| SM008 | Goodfire | Intentional Design \| Goodfire | Intentional design will be an advance in model creation similar to the difference between selective breeding and genetic engineering. |
| SM009 | Goodfire | Feature Steering for Reliable and Expressive AI Engineering | Feature steering works well with fine-tuned models but also often makes fine-tuning unnecessary. |
| SM010 | Goodfire | Mayo Clinic Collaboration \| Goodfire | This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks. |
| SM011 | Goodfire | Manifold Steering \| Goodfire Research | Representation steering ... promises lightweight, adaptable, and granular control of neural networks. |
| SM012 | Goodfire | Interpreting Evo 2 \| Goodfire Research | We discovered a wide range of features corresponding to sophisticated biological concepts. |
| SM013 | Goodfire | Interpreting LM Parameters \| Goodfire Research | This is not just a theoretical issue. It prevents us from achieving practical engineering goals. |
| SM014 | Goodfire | Pilot Agreement \| Goodfire | Customer will be allowed to test the Software and receive Services, with the aim of evaluating Goodfire's technology and considering a future long-term commercial relationship. |
| SM015 | Goodfire / Prima Mente | Prima Mente Customer Story \| Goodfire | Goodfire's interpretability platform ... turned their foundation model into an engine for biomarker discovery. |
| SM016 | Gartner | Generative AI \| Gartner | GenAI enters the Trough of Disillusionment on the 2025 Hype Cycle for Artificial Intelligence. |
| SM017 | PwC | AI Jobs Barometer \| PwC | Workers with AI skills command a 56% wage premium. |
| SM018 | NIST | AI Risk Management Framework \| NIST | AI Risk Management Framework. |
| SM019 | Arize | Phoenix \| Arize | The open-source platform for agent development and evaluation. |
| SM020 | Arize | Pricing \| Arize | AX Pro ... $50 per month. |
| SM021 | Fiddler | AI Observability \| Fiddler | Gain Complete Visibility from Development to Production. |
| SM022 | Fiddler | Pricing \| Fiddler | $0.002 per trace. |
| SM023 | Datadog | LLM Observability \| Datadog | Test prompt, model, and tool changes against real production data before rollout. |
| SM024 | LangChain | LangSmith \| LangChain | LangSmith Observability gives you complete visibility into agent behavior. |
| SM025 | Langfuse | Langfuse | Langfuse brings observability, prompts, evals, experiments, and human annotation into one connected workflow. |
| SM026 | Langfuse | Pricing \| Langfuse | Enterprise ... $2499/month. |
| SM027 | Patronus AI | Patronus AI | Evaluate agent effectiveness in tip-of-the-tongue moments. |
| SM028 | Arthur | Arthur | Gain visibility and reliability of your model through continuous evals. |
| SM029 | Humanloop | Pricing \| Humanloop | Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps. |
| SM030 | MIT Technology Review | This startup’s new mechanistic interpretability tool lets you debug LLMs | In reality, they are adding precision to the alchemy. |
| SP001 | Goodfire | Silico \| Goodfire | The first platform for intentional model design. |
| SP002 | Goodfire | Language \| Goodfire | Predict how your model will fail before deployment, not after. |
| SP003 | Goodfire | Life Sciences \| Goodfire | Trace predictive signal through interpretable features to confirm whether predictions rely on real biological structure or dataset artifacts and spurious correlations. |
| SP004 | Goodfire | Robotics & Vision \| Goodfire | Evaluate whether your model has learned real physical structure directly from the latent space, before generating a single frame. |
| SP005 | Goodfire | Feature steering for reliable and expressive AI engineering | AI engineers often ask us how feature steering differs from prompting or fine-tuning. |
| SP006 | Goodfire | Our Series B | Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation. |
| SP007 | MIT Technology Review | This startup's new mechanistic interpretability tool lets you debug LLMs | Goodfire is one of a small handful of companies, including industry leaders Anthropic, OpenAI, and Google DeepMind, pioneering a technique known as mechanistic interpretability. |
| SP008 | On Healthcare Tech | Goodfire AI and the billion-dollar black box | The valuation jump from wherever it was at Series A to $1.25B at Series B is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction. |
| SP009 | Goodfire Research | Probe-based data attribution | Filtering out the data flagged by our probe reduces the harmful behavior by 63% without compromising general performance. |
| SP010 | Goodfire Research | Rakuten: SAE probes for PII detection | We detail one of the first uses of sparse autoencoders (SAEs) with a production AI model - using SAE probes to detect personally identifiable information for Rakuten AI agents. |
| SP011 | Goodfire Research | Understanding and steering Llama 3 | We're releasing preview.goodfire.ai, a desktop interface to help you understand and steer Llama 3's behavior. |
| SP012 | Goodfire Research | VPD explainer | We tried this and were able to make a precise and predictable change to the model's behaviour by directly editing the subcomponents, with no training required. |
| SP013 | Goodfire Research | Self-correcting search | We were able to improve generation by giving a diffusion model a feedback loop from its own internals, resulting in ~30% more viable candidate materials in a target range. |
| SP014 | Goodfire Research | Reasoning theater | Chain-of-thought reasoning is not always faithful to the model's internal computations. |
| SP015 | Arize | Phoenix | The open-source platform for agent development and evaluation. |
| SP016 | Arize | Pricing \| Arize | AX Pro ... $50 per month. |
| SP017 | Fiddler AI | AI Observability \| Fiddler AI | Gain unified visibility, context, and control across agents and predictive applications. |
| SP018 | Fiddler AI | Pricing \| Fiddler AI | $0.002 per trace. |
| SP019 | Arthur | Arthur | The full lifecycle platform for ensuring reliable AI. |
| SP020 | Datadog | LLM Observability \| Datadog | Free includes up to 40K LLM spans per month. Pro starts at $160 per month and includes 100K LLM spans. |
| SP021 | LangChain | LangSmith | LangSmith has a free tier for development and small-scale production. Paid plans scale with trace volume. |
| SP022 | Langfuse | Langfuse | Open Source AI Engineering Platform. |
| SP023 | Langfuse | Pricing \| Langfuse | $29/month. |
| SP024 | Humanloop | Pricing \| Humanloop | Get the enterprise platform to develop, evaluate, and ship trustworthy LLM powered apps. |
| SP025 | Humanloop | Humanloop is joining Anthropic | As we sunset the Humanloop platform, we will continue to work closely with our customers to make their transition as smooth as possible. |
| SP026 | Weights | Weights is joining OpenAI | As part of this transition, our products and services have been wound down and are no longer available. |
| SP027 | National Institute of Standards and Technology | AI Risk Management Framework | The NIST AI Risk Management Framework (AI RMF) is intended for voluntary use and to improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems. |
| SP028 | Gartner | Generative AI | The total cost of ownership (TCO) for GenAI initiatives can often exceed initial expectations due to hidden costs such as compliance reviews, model retraining and internal overheads. |
| SP029 | Humanloop | Humanloop: LLM evals platform for enterprises | |
| SI001 | Goodfire | Goodfire homepage | |
| SI002 | Goodfire | Understanding, Learning From, and Designing AI: Our Series B | Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation. |
| SI003 | Goodfire | Silico | |
| SI004 | Goodfire | Language | |
| SI005 | Goodfire | Life Sciences | |
| SI006 | Goodfire | Robotics & Vision | |
| SI007 | Goodfire | Feature Steering for Reliable and Expressive AI Engineering | Update (Feb 2026): Our SAE demo interface and API have been deprecated. |
| SI008 | Goodfire | Contact | Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. |
| SI009 | Goodfire | Careers | |
| SI010 | Goodfire | SOC 2 Type II compliant | We're excited to announce that Goodfire is SOC 2 Type II compliant. |
| SI011 | Goodfire | Customer story: Prima Mente | |
| SI012 | Goodfire | RLFR: Reinforcement Learning from Feature Rewards | Overall, we reduce the hallucination rate by 58% across the held-out test set. |
| SI013 | Goodfire | Master Services Agreement | |
| SI014 | Goodfire | Pilot Agreement | |
| SI015 | Goodfire | Silico Terms of Use | |
| SI016 | PR Newswire | AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability | |
| SI017 | Yahoo Finance | AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability | |
| SI018 | The SaaS News | Goodfire Raises $150 Million at $1.25 Billion Valuation | |
| SI019 | Pulse 2.0 | Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab | |
| SI020 | Tech Funding News | Goodfire raises $150M Series B at $1.25B valuation | |
| SI021 | PR Newswire | Goodfire Raises $50M Series A to Advance AI Interpretability Research | |
| SI022 | Yahoo Finance | Goodfire Raises $50M Series A to Advance AI Interpretability Research | |
| SI023 | Menlo Ventures | Leading Goodfire's $50M Series A to Interpret How AI Models Think | |
| SI024 | VC News Daily | Goodfire Venture Capital Funding | |
| SI025 | Salesforce Ventures | Welcome, Goodfire | |
| SI026 | On Healthcare | Goodfire AI and the Billion-Dollar Black Box | The valuation jump ... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction. |
| SI027 | SEC | Goodfire AI, Inc. Form D filing dated 2025-06-02 | |
| SI028 | SEC | Goodfire AI, Inc. Form D filing dated 2026-02-09 | |
| SI029 | Goodfire | Customer Story: Radical AI | We're excited to announce a new partnership between Radical AI and Goodfire to fundamentally dismantle the black box of AI-driven materials discovery and design. |
| SE001 | Goodfire | Silico | |
| SE002 | Goodfire | Language | |
| SE003 | Goodfire | Life Sciences | |
| SE004 | Goodfire | Robotics & Vision | |
| SE005 | Goodfire | Hallucinations Viewer | |
| SE006 | Goodfire | Feature Steering for Reliable and Expressive AI Engineering | |
| SE007 | Goodfire | Intentionally Designing the Future of AI | |
| SE008 | Goodfire | Announcing our SOC 2 Type II Certification | |
| SE009 | Goodfire | You and Your Research Agent | |
| SE010 | Goodfire | Under the Hood of a Reasoning Model | |
| SE011 | Goodfire | The World Inside Neural Networks | |
| SE012 | Goodfire | Verbalized Eval Awareness Inflates Measured Safety | |
| SE013 | Goodfire | Interpretability for Alzheimer's Detection | |
| SE014 | Goodfire | Can SAEs Capture Neural Geometry? | |
| SE015 | Goodfire | EVEE: Explaining Genetic Variants | |
| SE016 | Goodfire | Model Diff Amplification | |
| SE017 | Goodfire | Stochastic Parameter Decomposition | |
| SE018 | Goodfire | Understanding Memorization via Loss Curvature | |
| SE019 | Goodfire | Painting with Concepts | |
| SE020 | Goodfire | The Shape of Stories Inside Neural Networks | |
| SE021 | Goodfire | Phylogeny Manifold | |
| SE022 | Goodfire | Silico Terms of Use | |
| SE023 | Goodfire | Pilot Agreement | |
| SE024 | Goodfire | Careers | |
| SE025 | Goodfire | AP293 Guest Lectures 25 | |
| SE026 | Goodfire | Fellowship Fall 25 | |
| SE027 | Goodfire | Announcing our Mayo Clinic Collaboration | |
| SE028 | Goodfire | Prima Mente Customer Story | |
| SE029 | Goodfire | Radical AI Partnership Announcement | |
| SE030 | MIT Technology Review | This startup's new mechanistic interpretability tool lets you debug LLMs | |
| SE031 | Salesforce Ventures | Welcome, Goodfire | |
| SE032 | On Healthcare Tech | Goodfire AI and the Billion-Dollar Black Box | |
| SE033 | NIST | AI Risk Management Framework | |
| SE034 | Gartner | Generative AI | |
| SE035 | Menlo Ventures | Leading Goodfire's $50M Series A to Interpret How AI Models Think | |
| SE036 | Lightspeed Venture Partners | Goodfire | |
| SE037 | PYMNTS | Goodfire Raises $150 Million to Better Understand AI | |
| SE038 | PR Newswire | AI Lab Goodfire Raises $150M at $1.25B Valuation to Design Models with Interpretability | |
| SU001 | Goodfire | Contact / early-access page | Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. |
| SU002 | Goodfire | Silico product page | |
| SU003 | Goodfire | Life sciences page | |
| SU004 | Goodfire | Language page | |
| SU005 | Goodfire | Robotics / vision page | |
| SU006 | Goodfire | Prima Mente customer story | Goodfire’s research scientists embedded in Prima Mente’s team as they had finished training their model. |
| SU007 | Goodfire | Interpretability for Alzheimer's detection | We detail how we studied Pleiades to identify fragmentomics as a novel class of biomarkers for Alzheimer’s detection. |
| SU008 | Goodfire | Mayo Clinic collaboration announcement | This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks. |
| SU009 | Goodfire | EVEE: explaining genetic variants | Our pathogenicity probe achieves state-of-the-art performance (0.997 overall AUROC on 839k ClinVar variants). |
| SU010 | Goodfire | Interpreting Evo 2 | Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages. |
| SU011 | Goodfire | Radical AI partnership announcement | |
| SU012 | Goodfire | Using self-correcting search to accelerate materials discovery | Applying self-correcting search improves targeting without harming SUN scores, leading to an overall ~27% increase in successful candidates. |
| SU013 | Goodfire | Rakuten SAE probes for PII detection | As a result, Rakuten deployed the SAE probes - the first known enterprise application of SAEs for language model guardrails. |
| SU014 | Goodfire | Series B announcement / customer positioning | We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment. |
| SU015 | Goodfire | You and your research agent | |
| SU016 | Goodfire | Blog index | |
| SU017 | Salesforce Ventures | Goodfire company profile | |
| SU018 | Salesforce Ventures | Welcome Goodfire | Enterprise customers care more about the ROI they see from their AI investments than ever. |
| SU019 | MIT Technology Review | This startup's new mechanistic interpretability tool lets you debug LLMs | In reality, they are adding precision to the alchemy. |
| SU020 | OnHealthcare | Goodfire AI and the billion-dollar bet on interpretability | |
| SU021 | Tech Funding News | Goodfire raises $150M Series B at $1.25B valuation | |
| SU022 | PR Newswire | AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability | This funding... will enable Goodfire to ... scale partnerships across AI agents and life sciences. |
| SU023 | Yahoo Finance | AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability | |
| SU024 | Lightspeed Venture Partners | Goodfire company page | |
| SU025 | Menlo Ventures | Leading Goodfire's $50M Series A to interpret how AI models think | Patrick Hsu, co-founder of Arc Institute... said, “Their interpretability tools have enabled us to extract novel biological concepts that are accelerating our scientific discovery process.” |
| SU026 | PYMNTS | Goodfire raises $150 million to better understand AI | We use this environment internally for research, and deploy it forward with our customers, collaborating in a shared environment. |
| SU027 | Goodfire | Research index | |
| SU028 | Goodfire | Radical AI customer story | |
| SU029 | Goodfire | Open problems in mechanistic interpretability | |
| SU030 | Goodfire | Belief dynamics in in-context steering | |
| SU031 | Goodfire | Mixing mechanisms | |
| SU032 | Goodfire | Replicating circuit tracing for a simple mechanism | |
| SU033 | Goodfire | Mapping latent spaces in Llama 3.3 70B | |
| SU034 | Goodfire | A Geometric Calculator | |
| SU035 | Mayo Clinic | About Mayo Clinic | |
| SR001 | Goodfire | Master Services Agreement | The Services are provided "as is" and Goodfire hereby disclaims all warranties. |
| SR002 | Goodfire | Pilot Agreement | In no event will either Party's aggregate liability exceed the fees paid for the pilot. |
| SR003 | Goodfire | Silico Terms of Use | Customer grants Goodfire a non-exclusive, worldwide, perpetual, irrevocable, royalty-free, sublicensable license to Workflow Data. |
| SR004 | Goodfire | Goodfire is SOC 2 Type II compliant | We're excited to announce that Goodfire is SOC 2 Type II compliant. |
| SR005 | Goodfire | Intentional design | The techniques are early, the science is incomplete, and the hardest problems remain unsolved. |
| SR006 | Goodfire | Company | Our goal is to make AI that can be understood, debugged, and shaped like software. |
| SR007 | Goodfire | Careers | If you thrive in fast-paced environments and believe that understanding AI systems is essential for our future, join us. |
| SR008 | Goodfire | Contact | Our platform is used by Fortune 500 enterprises, major healthcare institutions, and AI research labs. |
| SR009 | Goodfire | Silico | Precisely debug issues with model behavior, identify and remove confounders, and diagnose failures before they occur in production. |
| SR010 | Goodfire | Prima Mente customer story | Goodfire's research scientists embedded in Prima Mente's team and built out a biomarker discovery pipeline. |
| SR011 | Goodfire | Mayo Clinic collaboration | This collaboration operates under rigorous data privacy protocols and Mayo Clinic's established data governance frameworks. |
| SR012 | Goodfire | Radical AI partnership announcement | More details about specific research directions and outcomes will be shared as the partnership progresses. |
| SR013 | MIT Technology Review | This startup’s new mechanistic interpretability tool lets you debug LLMs | In reality, they are adding precision to the alchemy. |
| SR014 | On Healthcare | Goodfire AI and the billion-dollar interpretability bet | The valuation jump is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction. |
| SR015 | PYMNTS | Goodfire raises $150 million to better understand AI | The company's Series B funding round values Goodfire at $1.25 billion. |
| SR016 | NIST | AI Risk Management Framework | The profile will guide critical infrastructure operators towards specific risk management practices to consider when engaging AI-enabled capabilities. |
| SR017 | Gartner | Generative AI | The success of these implementations often hinges on the quality of data and the effectiveness of governance frameworks in place. |
| SR018 | PwC | AI Jobs Barometer | In the Healthcare sector, AI adoption is happening slower than in other industries and risk-controlled adoption of this technology matters. |
| SR019 | Datadog | LLM Observability / Agent Observability | Validate changes before rollout, monitor production health continuously, and scale AI programs with stronger governance and fewer surprises. |
| SR020 | LangChain | LangSmith Observability | LLM observability platforms provide visibility into agent decisions and help debug complex failures and hallucinations. |
| SR021 | Tech Funding News | Goodfire raises $150M Series B at $1.25B valuation | This lack of visibility makes AI hard to control, difficult to fix, and risky to deploy at scale. |
| SR022 | Goodfire | Understanding, Learning From, and Designing AI: Our Series B | To that end, we've built a model design environment. |
| SR023 | Goodfire | On optimism for interpretability | Models are complex systems, and understanding them is a genuine research challenge. |
| SR024 | Goodfire | Verbalized eval awareness inflates measured safety | Unless safety benchmarks account for eval awareness, they may systematically overestimate model alignment. |
| SR025 | Goodfire | Reasoning theater | Models genuinely reason through hard problems, but coast through easy ones while generating superfluous chain-of-thought. |
| SR026 | Goodfire | Stochastic parameter decomposition | SPD isn't a complete solution. |
| SR027 | Goodfire | Understanding memorization via loss curvature | The method is not yet mature and can be heavy-handed in its edits. |
| SR028 | Goodfire | Can SAEs capture neural geometry? | A single line can only give us a partial view of curved geometric structure. |
| SR029 | Goodfire | Manifold steering | Linear steering cuts across the behavior manifold and produces noisy, off-target effects. |
| SR030 | Goodfire | Model diff amplification | Even if an undesired behavior normally occurs only once in a million samples, amplification lets us surface it with far fewer rollouts. |
| SR031 | Goodfire | Phylogeny manifold | Interpretability can improve reliability and transparency for downstream applications, especially in clinical domains. |
| SR032 | Salesforce Ventures | Welcome Goodfire | Enterprise customers care more about the ROI they see from their AI investments than ever and cannot steer AI models to behave reliably and consistently. |
| SR033 | Lightspeed Venture Partners | Goodfire is building interpretable AI | As governments increasingly push regulation mandating explainable AI systems, enterprises will need to provide clear rationales for model behavior. |
| SR034 | Investing.com | Goodfire raises $150 million to improve AI model understanding | The company works with clients including Microsoft Corp., the Mayo Clinic, and the nonprofit Arc Institute. |
| SR035 | IBM | Think Topics: Model Observability | |
| SR036 | Datadog | Agent Observability \| LLM Observability \| Datadog | |
| SR037 | Langfuse | Langfuse | |
| SR038 | Langfuse | Pricing - Langfuse | |
| SR039 | LangChain | LangSmith: AI Agent & LLM Observability Platform | |
| SR040 | Weights | Weights is joining OpenAI | |
| SR041 | Goodfire | Priors in Time | |
| SR042 | Goodfire | A Geometric Calculator | |
| SR043 | Goodfire | Covariance Pooling | |
| SR044 | Goodfire | The Neural Geometry Series | |
| SV001 | Goodfire | Understanding, Learning From, and Designing AI: Our Series B | Today, we're excited to announce a $150 million Series B funding round at a $1.25 billion valuation. |
| SV002 | PR Newswire | AI Lab Goodfire Raises $150M at $1.25B Valuation To Design Models With Interpretability | Goodfire... announced a $150 million Series B funding round at a $1.25 billion valuation. |
| SV003 | Yahoo Finance | AI lab Goodfire raises $150M at $1.25B valuation to design models with interpretability | |
| SV004 | Pulse 2.0 | Goodfire: $150 Million Series B At $1.25 Billion Valuation Raised For Interpretability AI Lab | |
| SV005 | Tech Funding News | Goodfire bags $150M at $1.25B to build AI interpretability infrastructure | |
| SV006 | PR Newswire | Goodfire Raises $50M Series A to Advance AI Interpretability Research | Today, Goodfire... announced a $50 million Series A funding round led by Menlo Ventures... to support... Ember. |
| SV007 | Menlo Ventures | Leading Goodfire's $50M Series A to Interpret How AI Models Think | |
| SV008 | Lightspeed Venture Partners | Goodfire: Building Interpretable AI | |
| SV009 | Salesforce Ventures | Welcome Goodfire | |
| SV010 | On Healthcare | Goodfire AI and the Billion Dollar Black Box | The valuation jump... is aggressive for a company with around 51 employees and what appears to be relatively early commercial traction. |
| SV011 | TechCrunch | The AI gold rush is pulling private wealth into riskier, earlier bets | |
| SV012 | TechCrunch | Here are the 17 U.S.-based AI companies that have raised $100M or more in 2026 | |
| SV013 | TechCrunch | Cursor's Anysphere nabs $9.9B valuation, soars past $500M ARR | |
| SV014 | TechCrunch | Harvey reportedly raising at $11B valuation just months after it hit $8B | |
| SV015 | TechCrunch | Enterprise AI startup Glean lands a $7.2B valuation | |
| SV016 | TechCrunch | Google to invest up to $40B in Anthropic in cash and compute | |
| SV017 | Gartner | Generative AI | |
| SV018 | NIST | AI Risk Management Framework | |
| SV019 | Arize AI | Phoenix | |
| SV020 | Fiddler AI | AI Observability and Security | |
| SV021 | LangChain | LangSmith Observability | |
| SV022 | Goodfire Research | The Shape of Stories Inside Neural Networks | |
| SV023 | Goodfire Research | Understanding and Steering Llama 3 | |
| SV024 | Goodfire Research | A Geometric Calculator | |
| SV025 | Goodfire Research | Covariance Pooling | |
| SV026 | Goodfire Research | Painting With Concepts | |
| SV027 | Goodfire Research | VPD Explainer | |
| SV028 | U.S. Securities and Exchange Commission | Form D for Goodfire AI, Inc. (Series A-era filing) | Goodfire AI, Inc.... DELAWARE... 2023 |
| SV029 | U.S. Securities and Exchange Commission | Form D for Goodfire AI, Inc. (Series B-era filing) | Yan-David Erlich |
| SV030 | Goodfire | Company | |
| SV031 | Goodfire | SOC 2 Type II | |
| SV032 | Goodfire | The Neural Geometry Series | |
| SV033 | Goodfire | SAE Scaling with Feature Manifolds | |
| SV034 | Goodfire | SAE Scaling with Feature Manifolds | |