Editor’s Note: This article examines findings from News Integrity in AI Assistants, the largest international study of AI assistant accuracy, conducted by 22 public service media organizations evaluating more than 3,000 responses across 18 countries and 14 languages. For cybersecurity, information governance, and eDiscovery professionals, these findings carry direct implications for research workflows, compliance processes, and evidence handling. The documented 45% rate of responses with at least one significant issue—including fabricated facts, altered quotes, and unreliable sourcing—challenges assumptions about AI reliability in professional contexts where accuracy isn’t negotiable. As organizations increasingly embed these tools in critical workflows, understanding their limitations becomes an operational necessity. This analysis connects research findings to practical applications in legal technology, threat intelligence, regulatory compliance, and digital evidence authentication, providing actionable insights for professionals navigating the intersection of AI capabilities and professional obligations.
Content Assessment: Beyond the Hype: Major Study Reveals AI Assistants Have Issues in Nearly Half of Responses
A short, percentage-based assessment of the qualitative benefit and positive reception of the recent article from ComplexDiscovery OÜ titled "Beyond the Hype: Major Study Reveals AI Assistants Have Issues in Nearly Half of Responses."
- Information: 94%
- Insight: 94%
- Relevance: 92%
- Objectivity: 94%
- Authority: 93%
- Overall: 93% (Excellent)
Industry News – Artificial Intelligence Beat
Beyond the Hype: Major Study Reveals AI Assistants Have Issues in Nearly Half of Responses
ComplexDiscovery Staff
Artificial intelligence assistants produce at least one significant issue in 45% of their responses to news questions, according to the most extensive international study of its kind, raising urgent questions about the reliability of information tools that millions of professionals use daily for research, case preparation, and decision-making. The findings, released in October 2025 by the European Broadcasting Union and the BBC in their News Integrity in AI Assistants study, expose systemic failures across ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity—platforms increasingly embedded in the workflows of legal, cybersecurity, and information governance professionals who depend on accurate, verifiable information.
Researchers from 22 public service media organizations across 18 countries evaluated 2,709 core and 353 custom AI-generated responses (3,062 total) to news questions in 14 languages, revealing that 81% of all responses contained at least some form of issue, from minor inaccuracies to fabricated facts that could materially mislead users. For professionals who rely on precision in digital evidence handling, regulatory compliance, and risk assessment, these error rates represent more than statistical anomalies—they signal fundamental challenges to information integrity in an age when AI tools are rapidly displacing traditional search and research methods.
The study found sourcing failures in 31% of responses, with information either unsupported by cited sources, incorrectly attributed, or backed by non-existent references altogether. Accuracy issues plagued 20% of responses, including completely fabricated facts, outdated information, and distorted representations of events. Perhaps most concerning for professionals handling sensitive matters, 14% of responses failed to provide sufficient context, potentially leading to incomplete understanding of complex legal, regulatory, or security issues.
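These category rates overlap rather than add up: a single response can have sourcing, accuracy, and context problems at once, which is why the shares of responses with any issue (81%) and with at least one significant issue (45%) are not the sum of the category figures. A minimal sketch of how such overlapping annotations are tallied, using invented records rather than the study's data:

```python
# Hypothetical, hand-made annotations illustrating how overlapping issue
# categories are tallied; these records are NOT drawn from the EBU/BBC data.
from dataclasses import dataclass

@dataclass
class Annotation:
    response_id: str
    sourcing_issue: bool = False
    accuracy_issue: bool = False
    context_issue: bool = False
    significant: bool = False  # reviewer judged at least one issue significant

annotations = [
    Annotation("r1", sourcing_issue=True, accuracy_issue=True, significant=True),
    Annotation("r2", context_issue=True),
    Annotation("r3"),
    Annotation("r4", sourcing_issue=True, significant=True),
]

n = len(annotations)
any_issue = sum(a.sourcing_issue or a.accuracy_issue or a.context_issue for a in annotations)
significant = sum(a.significant for a in annotations)
sourcing = sum(a.sourcing_issue for a in annotations)

# Category rates overlap, so they do not add up to the "any issue" rate.
print(f"sourcing: {sourcing / n:.0%}, any issue: {any_issue / n:.0%}, significant: {significant / n:.0%}")
```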
Google Gemini emerged as the worst performer, with 76% of its responses containing at least one significant issue—more than double the rate of other assistants—primarily driven by severe sourcing problems that appeared in 72% of Gemini responses compared with less than 25% for competitors. The platform’s presentation of sources often lacked direct links and showed inconsistent sourcing, creating a confusing landscape for users attempting to verify information.
The research documented numerous examples of basic factual errors with potentially serious consequences. Assistants incorrectly identified the current NATO Secretary General and German Chancellor, despite both individuals no longer holding these positions when responses were generated. Copilot drew on outdated NATO sources and presented opinion as fact in responses about NATO, illustrating how reliance on superseded information creates compliance risks. Such temporal confusion proves particularly problematic for professionals working with time-sensitive information in litigation, incident response, or regulatory compliance contexts.
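One practical guard against this kind of temporal confusion is to treat publication dates on cited sources as first-class metadata and to flag anything older than a defined freshness window before it informs a time-sensitive decision. A minimal sketch, assuming hypothetical citation records with known publication dates rather than any particular assistant's output format:

```python
# Minimal freshness check for cited sources; the Citation structure and the
# 180-day window are illustrative assumptions, not part of the EBU/BBC study.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Citation:
    title: str
    url: str
    published: date

def stale_citations(citations: list[Citation], max_age: timedelta = timedelta(days=180),
                    today: date | None = None) -> list[Citation]:
    """Return citations older than the freshness window, for human review."""
    today = today or date.today()
    return [c for c in citations if today - c.published > max_age]

citations = [
    Citation("Leadership overview", "https://example.org/leaders", date(2023, 4, 1)),
    Citation("Summit readout", "https://example.org/summit", date(2025, 9, 30)),
]

for c in stale_citations(citations, today=date(2025, 10, 15)):
    print(f"REVIEW: '{c.title}' ({c.published.isoformat()}) may be outdated")
```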
Legal technology specialists will recognize familiar dangers in the study’s documentation of fabricated and altered quotes. Perplexity created entirely fictitious quotes attributed to labor unions and government councils when answering questions about labor disputes, even presenting these fabrications under a “Key Quotes” heading that implied authoritative sourcing. ChatGPT altered direct quotes in ways that changed their tone and meaning, transforming “It’s a very stupid thing to do” into the more inflammatory “stupid trade war” when quoting a Canadian official. For eDiscovery professionals who must authenticate statements and maintain chain of custody for digital evidence, such alterations raise fundamental questions about AI-generated content as a reliable information source.
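For teams that must authenticate quoted language, a simple first-pass control is to confirm that any quotation an assistant attributes to a source appears verbatim in that source's text before the quote is relied upon. A minimal sketch, assuming the cited source text has already been retrieved and is available locally (a hypothetical setup, not a description of any assistant's API):

```python
# First-pass verbatim check of AI-supplied quotes against retrieved source text.
# Whitespace and curly quotes are normalized; anything that does not match
# goes to human review rather than being silently accepted.
import re

def normalize(text: str) -> str:
    """Collapse whitespace and straighten curly quotes for comparison."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_verbatim(quote: str, source_text: str) -> bool:
    return normalize(quote) in normalize(source_text)

source_text = 'The official said: "It\'s a very stupid thing to do," before leaving the podium.'
ai_quote = "stupid trade war"

if not quote_is_verbatim(ai_quote, source_text):
    print("FLAG: quote not found verbatim in cited source; route to human review")
```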
Information governance specialists face parallel challenges. When AI assistants provide incomplete context or fail to distinguish opinion from fact—issues identified in 14% and 6% of responses respectively—professionals making retention decisions, classification determinations, or privacy impact assessments may work from fundamentally flawed understandings of regulatory requirements or legal precedents. The study documented cases where assistants presented outdated political leadership, obsolete laws, and superseded regulations as current fact, creating compliance risks for organizations relying on these tools for regulatory intelligence.
The research also revealed concerning patterns in how assistants handle uncertainty. Rather than acknowledging limitations or declining to answer when information is unavailable, the assistants answered virtually all questions, even when they lacked reliable data. Across the entire dataset of 3,113 questions put to the assistants, only 17 were refused, a rate of just 0.5%. This eagerness to respond regardless of capability, combined with confident tones that mask underlying uncertainty, creates what researchers call “over-confidence bias.” For professionals trained to assess the reliability of sources and maintain healthy skepticism about unsupported claims, these characteristics fundamentally undermine AI assistants’ utility as research tools.
Despite these failures, public trust in AI-generated information remains surprisingly high, particularly among younger users who increasingly turn to these tools for news and information. Recent BBC research found that over one-third of UK adults completely trust AI to produce accurate information summaries, rising to nearly half among those under 35. Yet 42% of adults indicated they would trust an original news source less if an AI summary contained errors, creating reputational risks for organizations whose content gets misrepresented by AI assistants. This challenge presents a troubling paradox: users trust AI while simultaneously acknowledging that errors in AI summaries damage their trust in the actual sources—even when those sources bear no responsibility for the AI’s mistakes.
The study did identify modest improvements since earlier BBC research in February 2025. Among BBC responses specifically, the percentage of answers with issues dropped from 51% to 37%, with notable reductions in problems distinguishing opinion from fact and in editorialization. For Gemini specifically, accuracy issues among BBC responses improved from 46% to 25%. Additionally, sourcing improvements were notable: only a single BBC response this round lacked a direct URL source, compared with 25 in the previous round—mostly from Gemini. However, researchers emphasized that even with these improvements, current error rates remain “alarmingly high” and insufficient for AI assistants to serve as reliable news sources. The rapid pace of AI development means these findings may already be outdated, as companies release new models with different capabilities and limitations—underscoring the need for continuous, independent evaluation.
Organizations can implement several strategies to mitigate risks while using AI research tools (a sketch illustrating the verification and documentation points follows this list):
1. Establish mandatory verification protocols requiring independent confirmation of all AI-generated facts, citations, and legal authorities before use in consequential decisions or client communications.
2. Implement AI literacy training that educates staff about common failure modes, including fabricated sources, temporal confusion, and incomplete context.
3. Maintain access to traditional research tools and databases as backup verification systems.
4. Document AI tool use in case files and work product to enable later auditing if questions about information accuracy arise.
5. Consider restricting AI assistant use to preliminary research and ideation rather than definitive fact-finding or legal analysis.
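How these controls are operationalized will vary by organization. As one illustration of the verification and documentation points, a lightweight claim-verification record can force a named reviewer to confirm or reject each AI-generated assertion and leave an auditable trail. A minimal sketch with hypothetical field names and invented example values, not a reference implementation:

```python
# Lightweight audit record for AI-assisted research: every claim must be
# independently confirmed (or rejected) by a named reviewer before use.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ClaimVerification:
    claim: str
    ai_tool: str
    independent_source: str = ""          # where the claim was confirmed
    verified_by: str = ""                 # reviewer of record
    confirmed: bool | None = None         # None = not yet reviewed
    reviewed_at: datetime | None = None

    def confirm(self, source: str, reviewer: str, confirmed: bool) -> None:
        self.independent_source = source
        self.verified_by = reviewer
        self.confirmed = confirmed
        self.reviewed_at = datetime.now(timezone.utc)

# Invented example values for illustration only.
record = ClaimVerification(
    claim="Regulation X retention period is 7 years",
    ai_tool="assistant-summary-2025-10-28",
)
record.confirm(source="Official gazette, consolidated text", reviewer="J. Analyst", confirmed=False)

if record.confirmed is not True:
    print(f"Do not rely on: '{record.claim}' (reviewed by {record.verified_by})")
```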
The BBC and European Broadcasting Union have released a News Integrity in AI Assistants Toolkit providing detailed taxonomies of failure modes and recommendations for improvement. The toolkit identifies five essential criteria for evaluating AI responses: accuracy (including accuracy of direct quotes), sourcing with verifiable citations, clear distinction between opinion and fact, avoiding inappropriate editorialization, and sufficient context for proper understanding. These standards closely align with professional requirements in legal, compliance, and security contexts, where precision, verifiability, and completeness aren’t optional features but fundamental necessities.
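Teams adapting the toolkit to internal review could encode its five criteria as a simple rubric so that every assistant response is scored on the same dimensions before it is accepted. A minimal sketch; the 0-2 scale and the acceptance threshold are assumptions for illustration, not something the toolkit prescribes:

```python
# Simple rubric mirroring the toolkit's five criteria; the 0-2 scale and the
# acceptance threshold are illustrative choices, not prescribed by the toolkit.
CRITERIA = (
    "accuracy",              # includes accuracy of direct quotes
    "sourcing",              # verifiable citations
    "opinion_vs_fact",       # clear distinction between opinion and fact
    "no_editorialization",   # avoids inappropriate editorialization
    "context",               # sufficient context for proper understanding
)

def score_response(scores: dict[str, int], passing_total: int = 8) -> bool:
    """Each criterion scored 0 (fails), 1 (partial), 2 (meets); none may be absent or 0."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(scores[c] for c in CRITERIA) >= passing_total and min(scores[c] for c in CRITERIA) > 0

example = {"accuracy": 2, "sourcing": 1, "opinion_vs_fact": 2, "no_editorialization": 2, "context": 1}
print("accept" if score_response(example) else "reject")
```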
For eDiscovery professionals, the implications extend beyond research into core practice areas. As courts and parties increasingly discuss using generative AI for document review, early case assessment, and privilege logging, the baseline unreliability documented in this study raises questions about appropriate applications. Technology-assisted review (TAR) systems, which underwent extensive validation and generated substantial case law establishing reliability standards, may provide instructive precedents for evaluating newer AI tools. The key difference: TAR systems operate on closed document sets with measurable accuracy and human-validated workflows, while generative AI assistants operate across open-ended web content with inconsistent quality controls.
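The validation discipline that grew up around TAR suggests one transferable practice: measure a workflow's recall and precision against a human-coded control sample before trusting its output. A minimal sketch of that arithmetic, using invented counts rather than data from any real matter:

```python
# Recall/precision on a human-coded control sample, in the spirit of TAR
# validation; the counts below are invented for illustration only.
def recall_precision(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return recall, precision

# Hypothetical control sample: reviewer coding compared against tool flags.
tp, fp, fn = 150, 30, 50
r, p = recall_precision(tp, fp, fn)
print(f"recall {r:.0%}, precision {p:.0%}")  # recall 75%, precision 83%
```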
Information governance programs face strategic decisions about the use of AI assistants in classification, retention, and privacy workflows. When AI tools misrepresent legal requirements, conflate jurisdictions, or present outdated regulations, the resulting governance policies may fail to meet actual obligations—creating exposure to regulatory enforcement, civil litigation, or data breaches. Organizations should consider whether AI-assisted research requires additional approval layers, peer review, or compliance validation before policy implementation. Some may choose to restrict AI use to non-critical research while reserving authoritative guidance for human experts and established legal research platforms.
Cybersecurity teams navigating AI-generated threat intelligence must balance efficiency gains against accuracy risks. While AI can rapidly process vast amounts of security information, the documented tendency toward fabrication and confident presentation of false information creates potential for resource misallocation and missed genuine threats. Organizations should maintain robust human-in-the-loop validation for security intelligence, correlate AI-generated insights against multiple independent sources, and avoid automation of security decisions based solely on AI analysis. The principle applies equally to regulatory compliance, incident response, and vulnerability management.
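One way to keep a human in the loop without giving up the speed advantage is to gate any automated action on corroboration from a minimum number of independent sources and escalate everything else to an analyst. A minimal sketch; the indicator structure and the two-source threshold are illustrative assumptions:

```python
# Corroboration gate for AI-surfaced threat indicators: act automatically only
# when enough independent sources agree; everything else goes to an analyst.
from dataclasses import dataclass

@dataclass
class Indicator:
    value: str                       # e.g., a domain or file hash
    ai_reported: bool
    corroborating_sources: set[str]  # independent feeds or internal telemetry

def disposition(ind: Indicator, min_sources: int = 2) -> str:
    if len(ind.corroborating_sources) >= min_sources:
        return "auto-block"
    if ind.ai_reported:
        return "analyst-review"      # never act on AI output alone
    return "monitor"

examples = [
    Indicator("bad.example.net", ai_reported=True, corroborating_sources={"feed-a", "edr"}),
    Indicator("maybe.example.org", ai_reported=True, corroborating_sources=set()),
]
for ind in examples:
    print(ind.value, "->", disposition(ind))
```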
The study’s international scope—covering 18 countries and 14 languages—reveals that these issues transcend borders, platforms, and linguistic contexts. Problems appear systemic rather than isolated, affecting all four major assistants across diverse user populations and question types. This geographic and linguistic breadth suggests that underlying architectural challenges, rather than localized data quality issues, drive the failures. For multinational organizations operating across jurisdictions, this means that AI reliability problems will appear consistently across global operations rather than concentrating in specific regions or languages.
Looking forward, several developments may influence AI assistant reliability. Regulatory frameworks are emerging, with the European Union implementing AI legislation and various jurisdictions considering transparency, accountability, and accuracy requirements. The EBU and its member organizations are calling for strengthened enforcement of existing regulations, establishment of oversight bodies for continuous monitoring, and formal dialogue between technology companies and news organizations to develop accuracy and transparency standards. Whether industry self-regulation proves sufficient or governmental intervention becomes necessary remains an open question as error rates persist despite apparent technological improvements.
Professional communities can contribute to improving AI reliability by documenting failures, sharing best practices, and establishing discipline-specific benchmarks for acceptable performance. Legal technology associations, information governance groups, and cybersecurity professional organizations are well-positioned to develop evaluation frameworks, certification programs, and standards of practice governing AI tool use in high-stakes contexts. Research indicates that approximately 7% of all online news consumers now use AI assistants, rising to 15% among those under 25—suggesting that professional use will only increase as younger cohorts enter the workforce.
The fundamental challenge extends beyond current error rates to the nature of large language models themselves. These systems generate probabilistic outputs based on pattern recognition rather than logical reasoning or factual knowledge databases. Even as technical improvements reduce certain types of errors, the underlying architecture means that hallucinations, temporal confusion, and context failures represent intrinsic characteristics rather than bugs awaiting fixes. Organizations must therefore develop sustainable approaches to AI tool use that account for persistent unreliability rather than assuming problems will disappear with next-generation models.
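The probabilistic point can be made concrete: at each step a language model samples from a distribution over candidate next tokens, so the same prompt can yield different continuations, some of them wrong, without any bug being involved. A toy illustration with an invented distribution, not a real model:

```python
# Toy illustration of probabilistic next-token selection; the distribution is
# invented and vastly simplified compared with a real language model.
import random

next_token_probs = {"2019": 0.55, "2021": 0.30, "2016": 0.15}  # invented probabilities

def sample_token(probs: dict[str, float], rng: random.Random) -> str:
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(7)
completions = [sample_token(next_token_probs, rng) for _ in range(5)]
print(completions)  # plausible-looking answers, not all of them correct
```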
The question facing cybersecurity, legal, and information governance professionals isn’t whether to use AI assistants—their integration into workflows appears inevitable—but how to use them responsibly given documented limitations. When 45% of responses contain at least one significant issue and 81% contain some form of problem, these tools cannot serve as authoritative sources. They may function as starting points for research, hypothesis generators for investigation, or efficiency multipliers for routine tasks—but only with robust verification systems ensuring that their outputs meet professional standards before influencing consequential decisions.
As artificial intelligence continues reshaping how professionals find, evaluate, and apply information, the gap between capability and reliability demands attention. The challenge isn’t simply technical—improving algorithms or expanding training data. It’s fundamentally about whether the probabilistic nature of large language models can ever align with the deterministic requirements of legal, compliance, and security work, where “mostly accurate” proves insufficient and single errors carry disproportionate consequences. Until that alignment occurs—if it occurs—professional skepticism must remain the first line of defense against artificially intelligent but fundamentally unreliable assistants.
How will your organization balance the efficiency promises of AI research tools against the accuracy requirements of professional practice?
News Sources
- Fletcher, J., & Verckist, D. (2025, October). News Integrity in AI Assistants: An international PSM study. European Broadcasting Union (EBU) & BBC.
- Largest study of its kind shows AI assistants misrepresent news content 45% of the time – regardless of language or territory (BBC)
- News Integrity in AI Assistants – an International PSM Study (DW)
- Global study on news integrity in AI assistants shows need for safeguards and improved accuracy (NPR)
- AI assistants make widespread errors about the news, new research shows (Reuters)
- AI models misrepresent news events nearly half the time, study says (Al Jazeera)
- AI Assistants Get The News Wrong Nearly Half The Time, Say Researchers (Forbes)
- Top AI assistants misrepresent news content, study finds (CBC)
- News Integrity in AI Assistants Report (European Broadcasting Union)
Assisted by GAI and LLM Technologies
Additional Reading
- The Agentic State: A Global Framework for Secure and Accountable AI-Powered Government
- Cyberocracy and the Efficiency Paradox: Why Democratic Design is the Smartest AI Strategy for Government
- The European Union’s Strategic AI Shift: Fostering Sovereignty and Innovation
Source: ComplexDiscovery OÜ