Style or substance? What an AI analysis of 90 impact funds suggests about reporting

Roughly a third of impact funds appear credible on first read: They have clear theses, polished SDG mappings and well-designed reports. Yet those same funds score far lower on the less-visible dimensions of impact reporting, particularly measurement rigour and governance of impact performance.

That finding — and the three that follow — comes from an AI-powered analysis of publicly available impact reports. Dalberg’s IMMPactAI engine assessed 90 funds against the Impact Performance Reporting Norms, with technical input provided by Impact Frontiers, the non-profit steward of the Reporting Norms.

I presented the preliminary assessment for the first time on June 2 at a Glocal Evaluation Week webinar with Matt Ripley from Impact Frontiers and three impact allocators — Lauren Rosales Farello from Blue Haven Initiative, Sarah Freeman from Trimtab and Rafael Matos Martinón from COFIDES, Spain’s development finance institution. 

The setup

Impact reporting has long been an industry pain point. Funds spend substantial time preparing reports that allocators struggle to compare, interpret and use effectively. A 2022 BlueMark study found dissatisfaction on both sides: GPs are overwhelmed by inconsistent data requests, and LPs are frustrated by reports that often disclose only the upside.

The Impact Performance Reporting Norms aim to address this gap by establishing shared, consensus-based expectations for the content of impact reporting. They cover six core topics, from impact thesis through governance and independent review. Since late 2025, more than 100 organizations have signed on as Founding Adopters.

This AI-powered gap analysis reviewed 47 Founding Adopters with public impact reports, plus 43 peers — for a cohort of 90 funds in total. IMMPactAI scored each report against the Impact Performance Reporting Norms using roughly 50 sub-questions across five criteria. 

The assessment also included two rounds of human-in-the-loop review. This step is needed to check the AI’s literal, text-bound findings and place them in real-world context. With that calibration complete, the cohort mean came in at 57/100.

But the average hides the story.

Four findings investors should know about

As the AI-powered assessment is still being refined, these findings should be read as directional and hypothesis-generating, not definitive — but they do point to four patterns worth closer examination.

1. The style-over-substance gap is real. The AI analysis suggested four distinct archetypes across the cohort. A 26-fund “Vanguard” group scores well across every criterion. A 24-fund “Emerging” archetype is on its way up. A 13-fund “Nascent” archetype is at the beginning of its reporting journey.

But the 27-fund “Marketers” archetype is the most interesting. They score similarly to Vanguard funds on strategy (68) and disclosure (64) but collapse on measurement (45) and governance (29) — a 35-point internal gap between how they present impact intentions and how they govern and manage towards them. Lauren Rosales Farello put it most directly during the panel: “Governance is really where the distinction is made between those who are leading the course and those who are just putting information in a report.”

This is where the mechanics of an AI-driven audit become revealing. Because an LLM lacks the industry’s shared context, it cannot read between the lines or give a fund credit for unwritten conventions. It evaluates codified frameworks and disclosures literally. For the “Marketers,” the AI effectively looked past the industry jargon, polished SDG graphics and aspirational intent, and exposed a structural gap in how impact is actually managed and governed.
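For readers who want to check the arithmetic behind the Marketers gap, here is a minimal sketch in Python. It uses only the four archetype scores quoted above. The variable names are illustrative, and reading the 35-point gap as disclosure minus governance is an assumption, since the analysis does not spell out which pair it compares.

```python
# Mean scores for the "Marketers" archetype, as quoted in the text.
marketers = {"strategy": 68, "disclosure": 64, "measurement": 45, "governance": 29}

# Assumption: the 35-point gap compares disclosure with governance.
disclosure_vs_governance = marketers["disclosure"] - marketers["governance"]

# Other readings of the same four numbers, for comparison.
strategy_vs_governance = marketers["strategy"] - marketers["governance"]
presentation_mean = (marketers["strategy"] + marketers["disclosure"]) / 2
management_mean = (marketers["measurement"] + marketers["governance"]) / 2
blended_gap = presentation_mean - management_mean

print(disclosure_vs_governance)  # 35
print(strategy_vs_governance)    # 39
print(blended_gap)               # 29.0
```

On any of these readings the gap falls between 29 and 39 points, so the pattern does not depend on which pair is compared.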

2. The institutional paradox. The five largest funds in the assessed cohort — with combined AUM above $1 trillion — produce visually polished reports whose composite scores run from 26/100 to 72/100. Across the cohort, AUM and disclosure quality are uncorrelated. Within the top archetypes, AUM and strategy quality are negatively correlated; smaller funds with narrower theses often articulate impact more precisely than diversified mega-platforms. For smaller managers worried about competing on production values, the directional takeaway is clear: Budget does not appear to be the binding constraint on reporting quality.

3. The “highlight reel” bias. Forty-two percent of the cohort publish no substantive disclosure of negative impacts, trade-offs or underperformance. Yet disclosing them is the single largest practice signal in the dataset: Funds that disclose negatives score roughly 15 points higher. The panel surfaced why this is hard.

COFIDES’s Rafael Matos Martinón named the constraint explicitly. Working at a sovereign-backed DFI, “it’s difficult to convince political appointees of the negative impact that should be published, because they want to sell a different story,” he said. 

His reframe, which the panel endorsed: Disclose trade-offs, not failures. Connect negative outcomes to corrective actions and learning. Trimtab’s Sarah Freeman uses the phrase “what’s different than expected and what’s better than expected.” Blue Haven’s Lauren Rosales Farello added the necessary LP discipline: Don’t punish funds for honesty. “Think about what they learned, what they committed to, what came of the trade-off.”

Shifting an organization’s culture away from a curated highlight reel requires a safe environment in which to confront shortcomings. This points to a distinct advantage of AI-enabled tools like IMMPactAI: Funds can run low-stakes reviews and benchmarking of their own data, building the internal comfort needed to publish authentic, learning-oriented disclosures to their LPs.

4. The verification void. Only 20% of funds in the cohort name a third-party assurer for their impact data or a verifier of their reporting practices. Much of what is described as “verification” in reports turns out, on closer inspection, to be internal review or B-Corp certification. Both are useful, but neither is the same as an independent view on impact performance. Verification has a relatively small effect on overall scores, and it is an optional component of the Impact Performance Reporting Norms. Still, a well-established supporting infrastructure exists, including verifiers, assurance standards and audit practices that funds can draw on.

A high-leverage trio: Three practices of leading funds

Investors and managers reading the findings above may simply conclude that “doing this is hard.” But the AI analysis hinted at a way forward, identifying three practices that appear to distinguish leading funds from the rest: disclosing negative impacts, commissioning third-party reviews and demonstrating a clear link between financial remuneration and impact performance.

Across the 90 funds, the relationship between adopting these practices and overall score is one of the clearest dose-response patterns in the dataset. Adopting just one, with disclosure of negative impacts appearing to be the highest-leverage of the three, is enough to place a fund in the top half of the cohort. As Matos Martinón put it, “At the end of the day, the bar is not that high. But it’s very revealing as an LP.”

What’s next

This work remains preliminary. Impact Frontiers expects to publish a State of Impact Reporting report this fall, using a refined AI-powered analysis as one input into a broader review of adopter practices. The free-to-use IMMPactAI gap-analysis tool is already being used by more than 60 organizations to self-assess their reports against the Impact Performance Reporting Norms.

As adoption of the reporting norms and automated diagnostics scales, however, the industry must guard against the temptation to let AI analysis become the final word rather than the first step. An AI engine can catch missing metrics and reporting inconsistencies, but it can never perform the real-world work of building investor trust.

The Impact Performance Reporting Norms are not a regulatory regime. They are a set of shared expectations for an industry that, by its own admission, has carried too much variance in reporting practices for too long. These early years represent a baseline. The real test will come in later cycles — and in whether reporting practices begin to lift consistently across the industry.


Kusi Hornberger is a partner at Dalberg. 

Guest posts on ImpactAlpha represent the opinions of their authors and do not necessarily reflect the views of ImpactAlpha.

The methodology and presentation deck used for this preliminary analysis are available on request. The gap-analysis tool can also be piloted by contacting the author.