As generating research becomes easier, validating research becomes more important. This may be the most important consequence of AI (automation) in evidence synthesis.
Automation can increasingly assist with searching, screening, extraction and summarisation. But trustworthy evidence has never depended primarily on the mechanics of processing information. It depends on the methods used to decide what evidence belongs, how it should be interpreted and what conclusions can legitimately be drawn from it.
If anything, increasingly capable automation makes methodological expertise more important, not less.
Trustworthy evidence still depends on methods
Protocols, eligibility criteria, search strategies and risk-of-bias assessments are not administrative hurdles. They are the safeguards that turn information into evidence.
As generating research becomes easier, the consequences of weak validation become more visible. A 2026 audit published in The Lancet found that fabricated references in biomedical papers rose sharply between 2023 and early 2026. At the same time, organisations such as arXiv have reaffirmed a simple principle: researchers remain accountable for work produced with automation.
Together, these developments point to a broader shift. The challenge is no longer generating information. It is determining which information can be trusted. High-quality methods are how researchers separate signal from noise, justify conclusions and ensure findings can be scrutinised, challenged and reproduced by others.
Emerging standards such as RAISE reflect the same reality. Automation can assist researchers, but responsibility for the final output remains with them.
Evidence synthesis is more than information processing
Automation performs best when tasks are clear, structured and repeatable. Evidence synthesis is only partly that.
Automation is increasingly capable of summarising individual studies. Determining what an entire body of evidence means is much harder. It requires weighing conflicting findings, evaluating study quality and deciding how much confidence to place in the available evidence.
The challenge is that evidence synthesis does not happen in a world of clean inputs and obvious answers. Studies sit on the edge of eligibility criteria. Outcomes are reported inconsistently. Even experienced reviewers can reasonably disagree about how evidence should be interpreted.
What separates experienced reviewers from novices is often not knowledge of the rules, but judgment about how to apply them. This is why the hardest problems in evidence synthesis are rarely the routine ones. They are the ambiguous trial, the borderline inclusion decision and the study that does not quite fit the framework. A system that is usually correct may still be unsuitable if its mistakes occur precisely where judgment matters most.
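To make that last point concrete, here is a minimal sketch in Python of how a single headline accuracy figure can hide weak performance on the hard cases. Everything in it is an assumption made for illustration: the `Decision` record, the `borderline` flag and the counts are invented, and it does not describe any particular tool or published result.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    borderline: bool  # hypothetical flag: reviewers judged the record ambiguous
    correct: bool     # automated decision matched the reference decision


def accuracy(decisions):
    """Share of correct decisions, or None when the list is empty."""
    if not decisions:
        return None
    return sum(d.correct for d in decisions) / len(decisions)


def accuracy_by_stratum(decisions):
    """Report accuracy overall and separately for routine and borderline records."""
    routine = [d for d in decisions if not d.borderline]
    borderline = [d for d in decisions if d.borderline]
    return {
        "overall": accuracy(decisions),
        "routine": accuracy(routine),
        "borderline": accuracy(borderline),
    }


# Invented counts: 950 routine records (5 errors), 50 borderline records (25 errors).
decisions = (
    [Decision(borderline=False, correct=True)] * 945
    + [Decision(borderline=False, correct=False)] * 5
    + [Decision(borderline=True, correct=True)] * 25
    + [Decision(borderline=True, correct=False)] * 25
)

report = accuracy_by_stratum(decisions)
for name, value in report.items():
    print(f"{name}: {value:.1%}")
```

With these invented numbers the system is right 97% of the time overall, yet only half the time on the borderline records, which are exactly the ones where a reviewer's judgment matters. Reporting accuracy by stratum rather than as one figure is one way to surface that gap.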
As routine review tasks become increasingly automated, the value of methodological expertise shifts rather than disappears. Researchers may spend less time processing information and more time resolving uncertainty, validating outputs and deciding what conclusions can legitimately be drawn from the evidence.
The role of researchers is changing, not disappearing
Of course, not everyone would agree with this conclusion. A reasonable counterargument to the superiority of human expertise comes from Rich Sutton’s The Bitter Lesson. Sutton argues that many of AI’s biggest advances have come from general-purpose learning systems outperforming approaches that rely heavily on human domain knowledge. It would be unwise to underestimate how capable future systems may become. In fact, we expect them to become dramatically more capable.
But this argument is not primarily about capability. It is about trust. Trustworthy evidence depends on more than capability alone. It includes transparency, reproducibility, accountability and sound judgment. Two of these requirements are particularly difficult to delegate:
- Accountability, because systematic reviews inform real-world decisions. When a guideline gets it wrong, responsibility ultimately sits with people and institutions, not models.
- Judgment, because evidence synthesis exists to help people make decisions in conditions of uncertainty. Producing information is not enough. Researchers must decide how much confidence to place in imperfect evidence, how to interpret ambiguity and whether a conclusion is justified. Those decisions depend not only on information, but on human understanding of context, consequence and risk.
However capable future systems become, researchers cannot delegate accountability for the outcome or the judgment needed to reach it. The work will therefore likely shift from conducting reviews to designing, governing and validating increasingly automated systems.
The future of evidence synthesis may involve more automation than ever before. But as generating research becomes easier, validating research becomes more important. The role of researchers will not disappear. It will increasingly centre on the thing that has always mattered most: deciding what evidence can be trusted and what conclusions are justified.
This is the fifth article in Covidence’s six-part series on AI:
- Part 1: Covidence’s approach to responsible automation (AI)
- Part 2: How Covidence decides what AI to release (and what to hold back)
- Part 3: Beyond evaluation: deciding when AI is appropriate in evidence synthesis
- Part 4: How we evaluate AI models for evidence synthesis
- Part 5: AI won’t replace methods skills, it depends on them
- Coming next: How to justify AI tools to institutions


