Exploring the Inclusion of eDiscovery-Centric Resources in the Google C4 Dataset: A Highly Selective Search

Apr 26, 2023

Content Assessment: Exploring the Inclusion of eDiscovery-Centric Resources in the Google C4 Dataset - A Highly Selective Search

Information - 93%

Insight - 92%

Relevance - 90%

Objectivity - 89%

Authority - 90%

91%

Excellent

A short percentage-based assessment of the qualitative benefit of the recent post highlighting the presence of selected eDiscovery resources in Google's C4 Dataset.

Editor’s Note: From time to time, ComplexDiscovery highlights publicly available or privately purchasable announcements, content updates, and research from cyber, data, and legal discovery providers, research organizations, and ComplexDiscovery community members. While ComplexDiscovery regularly highlights this information, it does not assume any responsibility for content assertions.

Contact us today to submit recommendations for consideration and inclusion in ComplexDiscovery’s data and legal discovery-centric service, product, or research announcements.

Background Note: The impact of organizations and entities on the output from Large Language Models (LLMs) can be more significant than one might initially anticipate. In some instances, specific resources within an industry can considerably influence how LLMs process and respond to information. One example of this influence can be observed by examining the Google C4 Dataset and searching it for the domains of a non-comprehensive selection of 55 eDiscovery-centric websites. While this exploration offers only a snapshot of selected resources from a non-exhaustive list, it may provide valuable context for those evaluating how resources affect LLMs, and it highlights tools that can help professionals better understand the content used to train LLMs. This deeper understanding may, in turn, shed light on the role selected eDiscovery resources play in shaping the knowledge and responses generated by LLMs, a role that may be much more (or much less) significant than one might think.

Industry Backgrounder

ComplexDiscovery*

Large language models, such as those developed by Google and OpenAI, are becoming increasingly sophisticated and pervasive in various industries. One such application of these models is in the eDiscovery ecosystem, which contains touchpoints ranging from cybersecurity and information governance to legal discovery. This article explores at a very high level the inclusion of selected eDiscovery-centric resources in the Google C4 Dataset. It also discusses why understanding this exploration may benefit professionals working in the eDiscovery ecosystem.

Google’s C4 Dataset and its Relevance to eDiscovery

Understanding the Google C4 Dataset

Google’s C4 (Colossal Clean Crawled Corpus) is a large, cleaned dataset created for training large language models. It is built from web pages in a 2019 Common Crawl snapshot and filtered to retain English-language text drawn from a wide range of sites and topics. The C4 Dataset serves as an essential foundation for developing language models that can understand and generate human-like text.

The C4 Dataset contains approximately 750 GB of cleaned text derived from Common Crawl web pages. It has been used to train and improve large language models such as Google’s T5 and Meta’s LLaMA.

Common Crawl is a nonprofit initiative that crawls the web and archives publicly available content as open data. This vast, multilingual repository is invaluable for training large language models because it provides an extensive and varied source of text. Common Crawl supplies all of the raw text in the C4 Dataset; C4’s cleaning and filtering steps are what improve that text’s quality and usefulness for AI research.
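To make “cleaned” concrete, here is a minimal Python sketch of a few of the line- and page-level filters described for C4 when it was released alongside Google’s T5 model: keep only lines that end in terminal punctuation and contain at least five words, drop lines that mention JavaScript, and discard pages that contain “lorem ipsum,” a curly brace, a blocklisted word, or fewer than three sentences once filtering is done. It is an approximation under stated assumptions, not Google’s pipeline: deduplication and English-language detection are omitted, the sentence counter is crude, and the function `clean_page` and its `blocklist` parameter are hypothetical stand-ins (the real banned-word list is not reproduced).

```python
import re

# Simplified approximations of some published C4 cleaning heuristics.
# This is an illustration, not Google's actual pipeline.
TERMINAL_PUNCTUATION = (".", "!", "?", '"', "\u201d")
MIN_WORDS_PER_LINE = 5
MIN_SENTENCES_PER_PAGE = 3


def clean_page(text, blocklist=frozenset()):
    """Return the cleaned page text, or None if the whole page is discarded.

    `blocklist` is a hypothetical stand-in for C4's banned-word list.
    """
    lowered = text.lower()
    # Page-level filters: placeholder text, likely code, or any blocklisted word.
    if "lorem ipsum" in lowered or "{" in text:
        return None
    if set(re.findall(r"[a-z']+", lowered)) & set(blocklist):
        return None

    kept = []
    for line in text.splitlines():
        line = line.strip()
        # Line-level filters: must end like a sentence, be long enough,
        # and not be a "please enable JavaScript" style warning.
        if not line.endswith(TERMINAL_PUNCTUATION):
            continue
        if len(line.split()) < MIN_WORDS_PER_LINE:
            continue
        if "javascript" in line.lower():
            continue
        kept.append(line)

    cleaned = "\n".join(kept)
    # Crude sentence count: split on runs of terminal punctuation followed by
    # whitespace or the end of the text, ignoring empty pieces.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", cleaned) if s.strip()]
    if len(sentences) < MIN_SENTENCES_PER_PAGE:
        return None
    return cleaned


page = (
    "Home\n"
    "Courts weigh proportionality when resolving discovery disputes.\n"
    "Please enable JavaScript to view this page properly.\n"
    "Collection gathers potentially relevant data from many sources.\n"
    "Review decides which documents are relevant or privileged.\n"
    "Read more."
)
print(clean_page(page))
```

On the sample page, the navigation word, the JavaScript warning, and the two-word “Read more.” line are dropped, and the three remaining sentences are enough for the page to survive.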

The role of large language models in eDiscovery

Large language models can potentially revolutionize the eDiscovery process by automating tasks ranging from document review to review reporting. These models can analyze vast amounts of data quickly and efficiently, identify relevant information, and generate insightful summaries or responses. As a result, they can save time, reduce costs, and improve the accuracy of eDiscovery outcomes.

Inclusion of eDiscovery-centric resources in the C4 Dataset

The presence of eDiscovery resources in the C4 Dataset can affect the accuracy and relevance of large language model outputs in the eDiscovery context. Models trained on high-quality eDiscovery resources may better understand domain-specific language, concepts, and best practices, producing more reliable and valuable results for eDiscovery professionals.

ComplexDiscovery’s Non-Comprehensive List of eDiscovery Resources and Its Significance

Introduction to ComplexDiscovery’s resource listing

On March 9, 2023, ComplexDiscovery published a non-comprehensive list of potentially helpful eDiscovery-centric resources. These resources, ranging from analyst and research firms to industry associations and blogs, were designed to serve as a simple starting point for individuals seeking information related to eDiscovery.

Selection of resources from ComplexDiscovery’s list for analysis

Because this resource listing is manageable in size and each listed resource is directly or indirectly relevant to the eDiscovery ecosystem, ComplexDiscovery created a truncated listing from an initial grouping of 100+ resources and used the domain names of those resources to search the C4 Dataset. The truncation removed duplicate domains (where multiple resources shared the same domain) and resources that were not available at the time of the C4 Dataset snapshot, resulting in a list of 55 resource domains.
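The truncation step can be sketched in a few lines of Python. This is a hypothetical illustration rather than ComplexDiscovery’s actual process: it assumes each resource is recorded as a name plus a URL, treats a resource’s domain as its host name with any leading “www.” removed (so subdomains such as ENISA.Europa.eu stay distinct), keeps the first resource listed for each domain, and drops domains supplied in a hand-maintained set of sites that were not available at the time of the C4 snapshot. The function names and the sample entries marked hypothetical are illustrative.

```python
from urllib.parse import urlparse


def normalize_domain(url):
    """Return a URL's lowercase host name with any leading 'www.' removed."""
    # urlparse only recognizes the host when a scheme is present, so add one if needed.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host[len("www."):] if host.startswith("www.") else host


def truncate_resource_list(resources, unavailable_domains):
    """Keep the first resource per domain and drop domains absent at snapshot time.

    resources: (name, url) pairs in listing order.
    unavailable_domains: hypothetical hand-maintained set of domains that were
    not available when the C4 snapshot was taken.
    """
    seen = set()
    kept = []
    for name, url in resources:
        domain = normalize_domain(url)
        if domain in seen or domain in unavailable_domains:
            continue
        seen.add(domain)
        kept.append((name, domain))
    return kept


# Illustrative inputs; entries marked hypothetical are not real resources.
SAMPLE_RESOURCES = [
    ("Legaltech News", "https://www.law.com/legaltechnews/"),
    ("Second law.com resource (hypothetical)", "law.com"),
    ("ENISA", "https://www.enisa.europa.eu/"),
    ("Post-snapshot resource (hypothetical)", "https://new-resource.example/"),
]
SAMPLE_UNAVAILABLE = {"new-resource.example"}

truncated = truncate_resource_list(SAMPLE_RESOURCES, SAMPLE_UNAVAILABLE)
for name, domain in truncated:
    print(f"{domain}\t{name}")
```

Keeping the full host name rather than collapsing it to a registered domain matches how the table treats subdomain-hosted resources such as ENISA and the EDPB.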

Domain name searches against the C4 Dataset

The objective of searching the domain names of the selected resources within the C4 Dataset was to explore how a very targeted snapshot of eDiscovery resources is represented in the C4 Dataset. This information may help gauge how these resources influence large language models trained on C4, including Google’s, when they respond to inquiries and prompts related to eDiscovery.

The results of domain name searches for the 55 eDiscovery-centric resources are provided in the following table, as extracted from the C4 Dataset search tool featured in the Washington Post article titled “Inside the Secret List of Websites That Make AI Like ChatGPT Sound Smart.” For each domain, the table reports its rank (by token count) among the websites in the dataset, its token count, and its percentage of all tokens in the dataset. Together, the results show how prevalent content from these selected resources is in the C4 Dataset.

Table: Selected eDiscovery Resources and the C4 Dataset

Resource Category (ComplexDiscovery)	Resource	Domain Searched	Rank	Tokens (Rounded)	Percent of All Tokens
Analyst, Research, and Review Firms	G2	G2.com	152	16,000,000	0.01%
Analyst, Research, and Review Firms	Capterra	Capterra.com	216	13,000,000	0.008%
News, Announcement, and Commentary Resources	Lexology	Lexology.com	519	8,100,000	0.005%
Analyst, Research, and Review Firms	Software Advice	SoftwareAdvice.com	730	6,300,000	0.004%
Associations, Consortiums, and Groups	IAPP (International Association of Privacy Professionals)	IAPP.org	5,236	1,900,000	0.001%
News, Announcement, and Commentary Resources	JD Supra	JDSupra.com	5,274	1,800,000	0.001%
News, Announcement, and Commentary Resources	Legaltech News	Law.com	5,898	1,700,000	0.001%
Information and Research Resources	NIST (National Institute of Standards and Technology)	NIST.gov	5,920	1,700,000	0.001%
Analyst, Research, and Review Firms	TrustRadius	TrustRadius.com	6,958	1,500,000	0.001%
Information and Research Resources	Cybersecurity Legal Task Force (American Bar Association)	AmericanBar.org	8,266	1,300,000	0.0009%
Information and Research Resources	FTC Premerger Notification Program (Federal Trade Commission)	FTC.gov	10,959	1,100,000	0.0007%
Analyst, Research, and Review Firms	Gartner	Gartner.com	19,166	720,000	0.0005%
Industry Blogs	eDiscovery Team (Ralph Losey)	E-DiscoveryTeam.com	29,362	530,000	0.0003%
Analyst, Research, and Review Firms	IDC	IDC.com	41,812	400,000	0.0003%
Analyst, Research, and Review Firms	Forrester	Forrester.com	42,218	400,000	0.0003%
News, Announcement, and Commentary Resources	LawSites	LawSitesblog.com	63,769	290,000	0.0002%
Analyst, Research, and Review Firms	Chambers and Partners	Chambers.com	77,729	250,000	0.0002%
Industry Blogs	Artificial Lawyer (Richard Tromans)	ArtificialLawyer.com	85,162	230,000	0.0001%
Educational Training and Resources	E-Discovery Team Training	e-DiscoveryTeamTraining.com	93,748	210,000	0.0001%
News, Announcement, and Commentary Resources	LexBlog	LexBlog.com	110,534	180,000	0.0001%
News, Announcement, and Commentary Resources	LegalIT Insider	LegalTechnology.com	122,034	170,000	0.0001%
eDiscovery Provider Websites	Relativity	Relativity.com	145,664	150,000	0.00009%
Industry Blogs	eDisclosure Information Project (Chris Dale)	ChrisDaleOxford.com	187,731	120,000	0.00008%
News, Announcement, and Commentary Resources	Legal IT Professionals	LegalITProfessionals.com	220,976	100,000	0.00007%
Information and Research Resources	ENISA (European Union Agency for Cybersecurity)	ENISA.Europa.eu	271,149	85,000	0.00005%
Associations, Consortiums, and Groups	EDRM (Electronic Discovery Reference Model)	EDRM.net	293,316	79,000	0.00005%
eDiscovery Provider Websites	IPRO	IPROTech.com	299,993	77,000	0.00005%
Associations, Consortiums, and Groups	Women in eDiscovery	WomenineDiscovery.org	303,379	77,000	0.00005%
eDiscovery Provider Websites	Nuix	Nuix.com	323,733	72,000	0.00005%
eDiscovery Provider Websites	Epiq	EpiqGlobal.com	387,082	61,000	0.00004%
Analyst, Research, and Review Firms	ComplexDiscovery	ComplexDiscovery.com	445,248	53,000	0.00003%
Associations, Consortiums, and Groups	ACEDS (Association of Certified E-Discovery Specialists)	ACEDS.org	470,275	50,000	0.00003%
Industry Blogs	Hanzo Blog (Hanzo)	Hanzo.co	486,348	49,000	0.00003%
eDiscovery Provider Websites	Exterro	Exterro.com	508,502	46,000	0.00003%
Associations, Consortiums, and Groups	The Sedona Conference (TSC)	TheSedonaConference.org	508,617	46,000	0.00003%
Industry Blogs	Ball In Your Court (Craig Ball)	CraigBall.net	602,359	39,000	0.00002%
eDiscovery Provider Websites	Disco	CSDisco.com	747,835	31,000	0.00002%
eDiscovery Provider Websites	HaystackID	HaystackID.com	763,781	30,000	0.00002%
Information and Research Resources	International Cyber Law in Practice: Interactive Toolkit (NATO CCDCOE)	CCDCOE.org	818,082	28,000	0.00002%
eDiscovery Provider Websites	Logikcull	Logikcull.com	838,778	27,000	0.00002%
eDiscovery Provider Websites	Lexbe	Lexbe.com	894,973	26,000	0.00002%
Associations, Consortiums, and Groups	ILTA (International Legal Technology Association)	ILTAnet.org	929,143	24,000	0.00002%
eDiscovery Provider Websites	Lighthouse	LighthouseGlobal.com	1,049,929	21,000	0.00001%
eDiscovery Provider Websites	KLDiscovery	KLDiscovery.com	1,064,262	21,000	0.00001%
Information and Research Resources	GDPR (General Data Protection Regulation) (European Union)	GDPR.eu	1,089,043	20,000	0.00001%
Associations, Consortiums, and Groups	CLOC (Corporate Legal Operations Consortium)	CLOC.org	1,200,575	18,000	0.00001%
Industry Blogs	Ride the Lightning (Sharon Nelson)	SenseiEnt.com	1,222,763	18,000	0.00001%
Information and Research Resources	EDPB (European Data Protection Board)	EDPB.Europa.eu	1,306,894	17,000	0.00001%
Associations, Consortiums, and Groups	ARMA International	Arma.org	1,321,946	16,000	0.00001%
Industry Blogs	The Cowen Group (David Cowen)	CowenGroup.com	1,637,480	13,000	0.000008%
Industry Blogs	eDiscovery Assistant Blog (Kelly Twigger)	eDiscoveryAssistant.com	1,757,035	12,000	0.000007%
Educational Training and Resources	Nordic Institute for Interoperability Solutions	NIIS.org	2,609,572	7,000	0.000004%
Industry Blogs	Reveal Blog (George Socha and Cat Casey)	RevealData.com	5,437,005	2,100	0.000001%
Associations, Consortiums, and Groups	GICLI (The Government Investigations & Civil Litigation Institute)	GICLI.org	10,772,422	330	0.0000002%
eDiscovery Provider Websites	L2 Services	L2Services.net	13,335,285	110	0.00000007%

Source: ComplexDiscovery and the Washington Post
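The table lists each domain separately; the aggregate picture is easier to see when the rounded token counts are grouped by resource category. The Python sketch below does that grouping using only the figures in the table (the constant and function names are illustrative). Because the inputs are rounded, the totals are approximate.

```python
# Rounded token counts from the table above, grouped by ComplexDiscovery resource category.
TOKENS_BY_CATEGORY = {
    "Analyst, Research, and Review Firms": [
        16_000_000, 13_000_000, 6_300_000, 1_500_000, 720_000,
        400_000, 400_000, 250_000, 53_000,
    ],
    "News, Announcement, and Commentary Resources": [
        8_100_000, 1_800_000, 1_700_000, 290_000, 180_000, 170_000, 100_000,
    ],
    "Associations, Consortiums, and Groups": [
        1_900_000, 79_000, 77_000, 50_000, 46_000, 24_000, 18_000, 16_000, 330,
    ],
    "Information and Research Resources": [
        1_700_000, 1_300_000, 1_100_000, 85_000, 28_000, 20_000, 17_000,
    ],
    "Industry Blogs": [
        530_000, 230_000, 120_000, 49_000, 39_000, 18_000, 13_000, 12_000, 2_100,
    ],
    "Educational Training and Resources": [210_000, 7_000],
    "eDiscovery Provider Websites": [
        150_000, 77_000, 72_000, 61_000, 46_000, 31_000, 30_000,
        27_000, 26_000, 21_000, 21_000, 110,
    ],
}


def summarize(tokens_by_category):
    """Return (total tokens, {category: (domain count, tokens, share of selected total)})."""
    total = sum(sum(counts) for counts in tokens_by_category.values())
    summary = {
        category: (len(counts), sum(counts), sum(counts) / total)
        for category, counts in tokens_by_category.items()
    }
    return total, summary


total, summary = summarize(TOKENS_BY_CATEGORY)
print(f"{sum(n for n, _, _ in summary.values())} domains, {total:,} tokens")
# Largest categories first.
for category, (n, tokens, share) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
    print(f"{category}: {n} domains, {tokens:,} tokens, {share:.1%} of selected total")
```

With these rounded values, the 55 domains total roughly 59.2 million tokens. The analyst, research, and review firms category, led by G2, Capterra, and Software Advice, accounts for about 65 percent of that total, while the twelve eDiscovery provider websites together account for less than 1 percent.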

Implications of eDiscovery Resource Representation in the C4 Dataset

Identifying potential biases and limitations

By analyzing the representation of eDiscovery resources in the C4 Dataset, professionals in the eDiscovery ecosystem can identify potential biases and limitations in the data used to train large language models. This knowledge may enable them to make more informed decisions about the reliability and applicability of AI-generated outputs in their work.

Enhancing the quality and diversity of data used to train large language models

Understanding the inclusion of eDiscovery resources in the C4 Dataset can also help researchers and developers improve the quality and diversity of data used to train large language models. By incorporating a more comprehensive range of eDiscovery-centric resources, models may become better equipped to generate more accurate and relevant responses in the eDiscovery context.

Addressing the needs of cybersecurity, information governance, and legal discovery professionals

By exploring the eDiscovery resources represented in the C4 Dataset, developers can better understand the needs of cybersecurity, information governance, and legal discovery professionals. This insight may allow them to fine-tune large language models to better address the unique challenges and requirements of the eDiscovery ecosystem, ultimately leading to more useful AI-generated outputs for these professionals.

Encouraging transparency in AI development

Highlighting the inclusion of eDiscovery-centric resources in the C4 Dataset emphasizes the importance of transparency in AI development. By understanding the data sources used to train large language models, professionals in the eDiscovery ecosystem may be able to better evaluate the reliability of AI-generated outputs and make more informed decisions about adopting and integrating them into their work and workflows.

Conclusion

This high-level exploration of selected eDiscovery-centric resources in the Google C4 Dataset has meaningful implications for professionals in the eDiscovery ecosystem. Analyzing the representation of selected resources in the dataset may help identify potential biases and limitations, enhance the quality and diversity of data used to train large language models, and encourage transparency in AI development. It may also highlight, with context, resources that may have more influence than one might expect on LLM-driven answers to prompts and queries. As large language models continue to evolve and become more integrated into the eDiscovery ecosystem, understanding their data sources and potential limitations will be crucial in ensuring their successful application and adoption.

*Assisted by GAI and LLM Technologies

Source: ComplexDiscovery

Have a Request?

If you have information or offering requests that you would like to ask us about, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is an independent digital publication and research organization based in Tallinn, Estonia. ComplexDiscovery covers cybersecurity, data privacy, regulatory compliance, and eDiscovery, with reporting that connects legal and business technology developments—including high-growth startup trends—to international business, policy, and global security dynamics. Focusing on technology and risk issues shaped by cross-border regulation and geopolitical complexity, ComplexDiscovery delivers editorial coverage, original analysis, and curated briefings for a global audience of legal, compliance, security, and technology professionals. Learn more at ComplexDiscovery.com.

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Gemini, Grammarly, Midjourney, and Perplexity, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages, a practice initiated in late 2022.

Market Sizing

Editor's Note: The eDiscovery Market Size Mashup is a research tool now in its fourteenth annual cycle. Since 2012, it has tracked the worldwide eDiscovery market through three structural eras: the early-cloud era, when subscription consumption began to displace perpetual licensing; the AI-assisted-review era, when predictive coding reset per-document review economics; and the demand-and-response era that defines 2025 through 2030, when generative-AI-assisted review and emerging agentic workflow features compress per-document cost against a data curve growing roughly five times faster than the dollars available to discover it.

The purpose has remained consistent across the cycles: to give legal technology executives, consultants, analysts, service providers, investors, and corporate legal teams a reconciled mid-range view of the market that is internally consistent, methodologically disclosed, and updated each year against the latest available data. The 2025 to 2030 cycle places the worldwide eDiscovery market at an estimated $19.61 billion in 2025 and a projected $28.08 billion by 2030, a reconciled 7.44 percent compound annual growth rate. Underneath the aggregate trajectory sits the central arithmetic of the cycle: a 27.6-percentage-point annual gap between data growth and market growth that compounds across five years into a 3.13-times productivity-per-dollar mandate by 2030. Each of the twelve Market Intelligence installments published across this cycle examined a single segmentation lens. The consolidated synthesis that follows brings those lenses together as one citable reference for procurement, capability planning, market analysis, and vendor-selection decisions through the back half of the decade.

Industry Research - eDiscovery Market Sizing Beat

Complete look: ComplexDiscovery OÜ's 2025 to 2030 eDiscovery market size mashup

A synthesis of the worldwide eDiscovery market across software, services, deployment, geography, sector, delivery, task share, and the demand-side data growth curve, reconciled within the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model

ComplexDiscovery OÜ Staff

Two numbers shape worldwide eDiscovery through 2030, and they do not move at the same pace. The dollars spent to discover potentially relevant information rise from approximately 19.61 billion dollars in 2025 to approximately 28.08 billion dollars by 2030, a multiplier of 1.4 times. The data those dollars must reach rises from approximately 181 zettabytes in 2025 to approximately 812 zettabytes in 2030, a multiplier of 4.5 times. By the end of the decade, the same dollar must cover roughly 3.13 times more data than it does at the start. That arithmetic, larger than any single segment shift or composition change, is the structural force underneath this cycle of the mashup.

The 3.13-times productivity-per-dollar mandate that compounds out of the gap is a primary force underneath much of what follows. It pressures software to take share from services as channel billing shifts toward AI-driven workflows. It pressures review's relative share of task spend to decline even as review dollars rise. It pressures cloud-first procurement to become the operational default. And it sits alongside other structural dynamics, including subscription consumption migration, direct-buyer maturation, and supplier consolidation. None of those other dynamics is reducible to the productivity mandate alone, but each operates in a market where the mandate sets the demand-side ceiling.

What follows is a tour of how the mandate lands across each segmentation lens explored in this cycle of the Market Intelligence series, from the aggregate market line down through software, services, deployment, cloud composition, geography, sector, delivery approach, task share, and the demand-side data growth curve underneath all of it.

The shape of the worldwide market

The reconciled view places the worldwide eDiscovery market at approximately 19.61 billion dollars in 2025, rising to approximately 28.08 billion dollars by 2030, a compound annual growth rate of approximately 7.44 percent. Software, the smaller of the two segments in absolute terms, grows at roughly 10.41 percent. Services, the larger segment, grows at roughly 5.75 percent. The 4.66-percentage-point segment CAGR gap translates into a 5-percentage-point composition shift across the five-year horizon. Software's share of total worldwide eDiscovery spend rises from 34 percent in 2025 to 39 percent in 2030; services' share falls correspondingly from 66 percent to 61 percent. Services remains the larger segment by absolute spend through 2030; the segment crossover point falls outside the 2025 to 2030 window.
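The growth rates in this paragraph follow from the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The Python sketch below applies it to the 2025 and 2030 market totals and derives segment dollars from the rounded software shares (34 and 39 percent). Because those shares are rounded to whole percentages, the segment rates land near, rather than exactly on, the reconciled 10.41 and 5.75 percent; the `cagr` helper is an illustrative name, not part of the Mashup Model.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


MARKET = {2025: 19.61, 2030: 28.08}        # worldwide market, billions of US dollars
SOFTWARE_SHARE = {2025: 0.34, 2030: 0.39}  # rounded software shares from the text

# Implied segment dollars: software from its share, services as the remainder.
software = {year: MARKET[year] * SOFTWARE_SHARE[year] for year in MARKET}
services = {year: MARKET[year] - software[year] for year in MARKET}

market_cagr = cagr(MARKET[2025], MARKET[2030], 5)
software_cagr = cagr(software[2025], software[2030], 5)
services_cagr = cagr(services[2025], services[2030], 5)

print(f"Market CAGR:   {market_cagr:.2%}")
print(f"Software CAGR: {software_cagr:.2%}")
print(f"Services CAGR: {services_cagr:.2%}")
print(f"Segment gap:   {(software_cagr - services_cagr) * 100:.2f} percentage points")
```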

Chart: eDiscovery Market Sizing, Past and Projected (2012 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Market-Sizing-Past-and-Projected-2026.pdf

Chart: eDiscovery Software and Services Market (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Software-and-Services-Market-2025-2030.pdf

Software's outperformance is not a fluke of the moment. The 4.66-percentage-point CAGR gap is, in large measure, the segment-level expression of an AI-driven channel reallocation: the same review workflow that once generated services revenue at a per-document or per-hour rate increasingly generates software revenue at a SaaS subscription or AI-inference rate. The work has not disappeared, and in many cases has expanded as data volumes grow, but the channel through which the work gets billed has steadily shifted from services to software. Services growth is slower but structurally durable; cross-border data, regulatory exposure, advisory work, and specialized response work sustain the services line even as software automates discrete tasks.

How the composition is changing

Within the software segment, the cloud-first transition that has been underway for the better part of a decade is now functionally complete for new deployments. Off-premise software grows from approximately 5.29 billion dollars in 2025 to approximately 8.87 billion dollars in 2030, while on-premise software grows from 1.37 billion to 2.08 billion. On-premise solutions persist where security, sovereignty, or contractual constraints require them, but they are no longer the default for new procurement.
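The on- and off-premise figures can be checked the same way. This Python sketch, using an illustrative `cagr` helper, computes each deployment model’s implied annual growth and the off-premise share of software spend from the four dollar figures in the paragraph above.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


OFF_PREMISE = {2025: 5.29, 2030: 8.87}  # billions of US dollars
ON_PREMISE = {2025: 1.37, 2030: 2.08}   # billions of US dollars

off_cagr = cagr(OFF_PREMISE[2025], OFF_PREMISE[2030], 5)
on_cagr = cagr(ON_PREMISE[2025], ON_PREMISE[2030], 5)

# Software segment total and the off-premise portion of it, by year.
software_total = {year: OFF_PREMISE[year] + ON_PREMISE[year] for year in OFF_PREMISE}
off_share = {year: OFF_PREMISE[year] / software_total[year] for year in OFF_PREMISE}

print(f"Off-premise CAGR: {off_cagr:.2%}")
print(f"On-premise CAGR:  {on_cagr:.2%}")
for year in (2025, 2030):
    print(f"{year}: software {software_total[year]:.2f} billion, off-premise {off_share[year]:.0%}")
```

The two components sum to 6.66 and 10.95 billion dollars, about 34 and 39 percent of the 2025 and 2030 market totals, consistent with the segment shares given earlier. Off-premise software compounds at roughly 10.9 percent a year against roughly 8.7 percent for on-premise, lifting its share of software spend from about 79 to about 81 percent.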

Chart: eDiscovery Software Market, On and Off Premise (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Software-Market-2025-2030-On-Off-Premise.pdf

The more interesting subplot is inside the cloud category. SaaS holds roughly two-thirds of cloud spend in 2025 (approximately 67 percent) and drifts to approximately 63 percent by 2030 as PaaS and IaaS components compound at faster rates. PaaS rises from 15 percent of cloud spend to 17 percent; IaaS rises from 18 percent to 20 percent. The reason is simple: as advanced eDiscovery workloads incorporate large-scale processing, AI inference, vector search, and complex data engineering, customers and providers are increasingly integrating directly with platform and infrastructure services. Some of that integration appears in vendor SaaS revenue, and some appears as direct PaaS or IaaS spend.

Chart: eDiscovery Cloud Software Market (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Cloud-Software-Market-2025-2030.pdf

Services growth lags software growth, but the headline number understates the qualitative shift inside services. Traditional managed-review revenue faces continued pricing pressure as AI-assisted review reduces billable hours. Advisory services (litigation readiness, information governance, AI risk advisory) grow on the back of regulatory complexity. Specialized response work (forensic collection, second-request response, cross-border data transfer, regulatory inquiry support) grows at premium rates. The services segment in 2030 will not be a slower-growing version of the services segment of 2025; it will be a different mix.

Chart: eDiscovery Services Market (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Services-Market-2025-2030.pdf

Where the money is going

The United States continues to anchor the worldwide market, accounting for roughly 66 percent of global spend in 2025 and easing to roughly 64 percent in 2030. The shift toward rest-of-world is gradual but real, driven by the maturation of data protection regimes outside the United States, the internationalization of regulatory inquiries, and the steady buildout of regional capacity. Within rest-of-world, the United Kingdom, Canada, Germany, Australia, and Japan continue to claim the largest sub-shares, with rising activity in Singapore, India, and parts of the Middle East.
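Applying the rounded United States shares to the market totals gives an approximate sense of the dollars behind the percentages. The Python sketch below is an illustration under that assumption; the implied figures carry the rounding of the 66 and 64 percent shares and are not reconciled Mashup Model outputs. The `cagr` helper and constant names are illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


MARKET = {2025: 19.61, 2030: 28.08}   # worldwide market, billions of US dollars
US_SHARE = {2025: 0.66, 2030: 0.64}   # rounded United States shares from the text

# Implied dollars: apply the share to the total, and treat the remainder as rest of world.
us = {year: MARKET[year] * US_SHARE[year] for year in MARKET}
rest_of_world = {year: MARKET[year] - us[year] for year in MARKET}

us_cagr = cagr(us[2025], us[2030], 5)
row_cagr = cagr(rest_of_world[2025], rest_of_world[2030], 5)

print(f"United States: {us[2025]:.1f} -> {us[2030]:.1f} billion ({us_cagr:.1%} a year)")
print(f"Rest of world: {rest_of_world[2025]:.1f} -> {rest_of_world[2030]:.1f} billion ({row_cagr:.1%} a year)")
```

On these inputs, United States spend rises from about 12.9 to about 18.0 billion dollars (roughly 6.8 percent a year), while rest-of-world spend rises from about 6.7 to about 10.1 billion dollars (roughly 8.7 percent a year), which is why the United States share eases even as its dollars grow.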

Chart: eDiscovery Market Geographical Overview (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Market-Geographical-Overview-2025-2030.pdf

The reconciliation distinguishes between government and regulatory demand on one hand and non-government (private-sector) demand on the other. Both grow over the period; non-government grows faster in both percentage and absolute terms. Non-government growth reflects expansion in civil litigation, internal investigations, corporate compliance, and AI-related risk advisory work. Government and regulatory growth reflects persistent investigative activity, ongoing premerger notification work, parallel inquiries in the European Union and the United Kingdom, and continued cross-border regulatory coordination.

Chart: eDiscovery Government and Non-Government Market Overview (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Government-and-Non-Government-Market-Overview-2025-2030.pdf

Segmenting worldwide spend by who captures the direct economic transaction (corporations and governments, law firms, or service providers) reveals a clear picture. The corporations-and-governments category remains the dominant channel throughout the period, reflecting continued in-house consumption supplemented by direct vendor procurement. Service providers grow faster than law firms over the forecast period. Law firms increasingly act as orchestrators rather than primary procurement channels, with the work and the dollars flowing around them.

Chart: eDiscovery Market by Direct Delivery Approach (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/eDiscovery-Market-By-Direct-Delivery-Approach-2025-2030.pdf

The task shift, and why it matters

The most consequential structural shift in the industry is not visible in the aggregate market line. It is in the composition of where eDiscovery dollars get spent across the three core tasks of collection, processing, and review. Review remains the largest single task expenditure throughout the period, but collection and processing capture increasing absolute and relative shares. Across a longer horizon, from RAND Corporation's 2012 baseline through ComplexDiscovery OÜ's 2025 modeling and 2030 forecast, review's share of total task spend has fallen from 73 percent to a reconciled 62 percent to a projected 52 percent: a 21-percentage-point decline across 18 years. Collection, over the same span, has more than tripled, from 8 percent to a projected 25 percent, a 17-percentage-point gain. Processing has been comparatively stable, rising from 19 percent to 23 percent.
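The share movements in this paragraph are simple differences and ratios. The Python sketch below recomputes them from the 2012 and 2030 shares given in the text and checks that the three tasks account for 100 percent of core-task spend in each year; `TASK_SHARES` and `share_changes` are illustrative names.

```python
# Shares of core eDiscovery task spend, in percent, as given in the text.
# Only review has a stated 2025 value, so the comparison runs 2012 -> 2030.
TASK_SHARES = {
    "review": {2012: 73, 2030: 52},
    "collection": {2012: 8, 2030: 25},
    "processing": {2012: 19, 2030: 23},
}


def share_changes(shares, start, end):
    """Return {task: (percentage-point change, end/start multiple)}."""
    return {
        task: (by_year[end] - by_year[start], by_year[end] / by_year[start])
        for task, by_year in shares.items()
    }


# Sanity check: the three tasks account for 100 percent of core-task spend each year.
for year in (2012, 2030):
    assert sum(s[year] for s in TASK_SHARES.values()) == 100

for task, (pp, multiple) in share_changes(TASK_SHARES, 2012, 2030).items():
    print(f"{task:<11} {pp:+3d} pp  ({multiple:.2f}x)")
```

Review falls 21 points, collection gains 17 points (a 3.125-times multiple), and processing gains 4 points, so the two gains exactly offset review's decline.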

Chart: eDiscovery Market by Task (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/05/eDiscovery-Market-By-Task-2025-2030.pdf

Chart: Relative Task Expenditures for Core eDiscovery Tasks (2012, 2025, 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/Relative-Task-Expenditures-for-Core-eDiscovery-Tasks.pdf

The pace of the rebalance is accelerating. Roughly 47 percent of the 18-year share movement happens in the final five years from 2025 to 2030. Review's 5-year decline of 10 percentage points nearly equals the prior 13-year decline of 11 percentage points. The trend is consistent with a demand-and-response dynamic rather than two independent forces operating in parallel. The demand side is the growth in data volume subject to potential collection. The supply-side response is AI-assisted review's compression of per-document review costs, with predictive coding through the prior decade, generative-AI-assisted review through the current decade, and emerging agentic workflow features as the next compression wave. Data growth raises the absolute volume of review work to be done; AI-assisted review compresses per-document review prices. Whether the resulting absolute review spend rises, falls, or stays flat depends on the relative pace of the two. In the reconciled view, review absolute spend continues to grow modestly, from approximately 12.16 billion dollars in 2025 to approximately 14.60 billion dollars in 2030, but materially slower than the aggregate market, which is why review's share declines. For practitioners and providers, the practical consequence is that capacity decisions made today should anticipate the structural drift toward collection-heavy and processing-heavy task profiles.
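The claim that review dollars can rise while review’s share falls is easy to verify from the figures given. This Python sketch (with an illustrative `cagr` helper) computes review spend as a share of the total market in each year, compares review and market growth rates, and reproduces the proportion of the 18-year review-share decline that falls in the final five years.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


MARKET = {2025: 19.61, 2030: 28.08}   # worldwide eDiscovery market, billions of US dollars
REVIEW = {2025: 12.16, 2030: 14.60}   # review task spend, billions of US dollars

review_share = {year: REVIEW[year] / MARKET[year] for year in MARKET}
review_cagr = cagr(REVIEW[2025], REVIEW[2030], 5)
market_cagr = cagr(MARKET[2025], MARKET[2030], 5)

# Portion of review's 2012-2030 share decline (73 -> 52) that occurs in 2025-2030 (62 -> 52).
late_fraction = (62 - 52) / (73 - 52)

for year, share in review_share.items():
    print(f"{year}: review is {share:.0%} of spend")
print(f"Review spend CAGR {review_cagr:.1%} vs market CAGR {market_cagr:.1%}")
print(f"Final five years carry {late_fraction:.1%} of the 18-year decline")
```

Review spend grows at roughly 3.7 percent a year against roughly 7.4 percent for the market, so its share slips from about 62 to about 52 percent. The final five years account for 10 of the 21 points of decline, or about 47.6 percent.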

The demand side, data growth as the underlying force

Worldwide data volumes are projected to grow from approximately 181 zettabytes in 2025 to approximately 812 zettabytes in 2030, a compound annual growth rate of approximately 35 percent. Enterprise-held data, the subset most relevant to discoverable information, expands from approximately 54 zettabytes to approximately 243 zettabytes over the same period at the same rate, holding steady at roughly 30 percent of the global total. The 181 zettabyte 2025 anchor is consistent with IDC's Global DataSphere baseline. The 35 percent CAGR through 2030 reflects the Mashup Model's reconciliation across data-universe forecasts and enterprise-specific projections, and sits at the upper end of published industry growth estimates. It is higher than IDC's headline total-data forecast trajectory and lower than the most aggressive AI-content-driven projections.
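The data-growth rates follow from the same CAGR formula applied to zettabytes instead of dollars. The Python sketch below recomputes the global and enterprise rates and the enterprise share of the total from the rounded endpoints in the paragraph above; the helper and constant names are illustrative.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


GLOBAL_ZB = {2025: 181, 2030: 812}      # worldwide data volume, zettabytes
ENTERPRISE_ZB = {2025: 54, 2030: 243}   # enterprise-held data, zettabytes

global_cagr = cagr(GLOBAL_ZB[2025], GLOBAL_ZB[2030], 5)
enterprise_cagr = cagr(ENTERPRISE_ZB[2025], ENTERPRISE_ZB[2030], 5)
enterprise_share = {year: ENTERPRISE_ZB[year] / GLOBAL_ZB[year] for year in GLOBAL_ZB}

print(f"Global data CAGR:     {global_cagr:.1%}")
print(f"Enterprise data CAGR: {enterprise_cagr:.1%}")
for year, share in enterprise_share.items():
    print(f"{year}: enterprise share {share:.1%}")
```

Both series compound at about 35 percent a year, and the enterprise share stays just under 30 percent in both years.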

Chart: Data Volume and Growth in Zettabytes (2025 to 2030). PDF: https://complexdiscovery.com/wp-content/uploads/2026/06/Data-Volume-and-Growth-in-Zettabytes-2025-2030.pdf

The 35-percent data CAGR set against the 7.44-percent eDiscovery market CAGR is the central arithmetic of the cycle. Across the five-year horizon, global data multiplies roughly 4.5 times (4.484x, from 181 to 812 zettabytes); the worldwide eDiscovery market multiplies roughly 1.4 times (1.432x, from 19.6 to 28.1 billion dollars). Divide one by the other and the productivity-per-dollar requirement falls out: 4.484 ÷ 1.432 ≈ 3.13. By 2030, the same dollar must process, store, search, review, and produce against roughly 3.13 times as much data as it did in 2025 just to maintain the same coverage ratio. That is not a marginal improvement. It is the productivity mandate that defines the decade for the industry. The mandate sits underneath several of the shifts documented above: the segment-level CAGR gap reflects the channel through which the productivity gain flows; the task-share rebalance reflects where the gain lands at the workflow level. AI capability compounding (predictive coding through the prior decade, generative-AI-assisted review through the current decade, emerging agentic workflow features as the next compression wave) is the bridge that must close the gap. Whether the bridge closes the gap fully, partially, or in stages depends on the pace at which the current generation of tooling matures, the pace at which agentic features move from product roadmap to production deployment, and whether the industry holds to full-coverage discovery as the standard or moves toward risk-tiered coverage that reserves the most intensive workflows for the documents that matter most. The mandate sets the bar; how the industry clears it is the open question of the decade.
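The coverage-flat ratio can be recomputed directly. This sketch uses the rounded endpoints quoted in the article, so the data multiplier comes out at about 4.486x rather than the 4.484x quoted; the small gap is consistent with the Model working from unrounded values, and the mandate still rounds to 3.13.

```python
# Sketch: the coverage-flat productivity-per-dollar ratio, computed from the
# rounded endpoints quoted in this article (not the Model's unrounded series).
data_multiplier = 812.0 / 181.0      # global data, zettabytes, 2030 vs 2025
market_multiplier = 28.08 / 19.61    # eDiscovery market, USD billions, 2030 vs 2025

# Data each dollar must cover in 2030 relative to 2025, at flat coverage.
mandate = data_multiplier / market_multiplier

print(f"Data multiplier:   {data_multiplier:.3f}x")
print(f"Market multiplier: {market_multiplier:.3f}x")
print(f"Productivity-per-dollar mandate: {mandate:.2f}x")
```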

What the reconciled view implies

The reconciled view supports a small set of interpretive points, none of them prescriptive, all of them grounded in the figures. For software vendors, the 10.41 percent reconciled software CAGR outpaces services by 4.66 percentage points a year, and the vendors positioned to capture a disproportionate share of incremental software dollars are those integrating AI-assisted review, modular SaaS delivery, and platform-aware processing into the same product surface. For service providers, slower nominal growth does not imply a less attractive market. It implies a different one. Providers that reposition around higher-value advisory and specialized regulatory response can outpace the segment headline. For law firms, the modest share of direct economic transactions captured by law firms suggests a continued shift toward orchestration and advisory positioning rather than primary procurement. For corporate and government legal teams, in-house consumption continues to dominate the direct delivery approach. Build-versus-buy on internal capabilities, governance over AI use in discovery, and readiness for second-request response remain the central program-level questions. For investors and analysts, the dynamics support a continued investment thesis around cloud-native, AI-enabled software platforms, with margin pressure on traditional services and continued consolidation at the supplier level.
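The 4.66-point gap between software and services compounds over the forecast horizon. A minimal sketch, assuming the services CAGR is simply the software CAGR minus the stated gap (about 5.75 percent, a derived figure rather than one quoted directly), shows how far the two segments' cumulative growth diverges over five years:

```python
# Sketch: compounding the software/services CAGR gap over five years.
# Assumption: services CAGR = software CAGR - 4.66 points (derived, ~5.75%).
software_cagr = 0.1041
gap = 0.0466
services_cagr = software_cagr - gap

def cumulative_growth(rate: float, years: int = 5) -> float:
    """Total growth over the period implied by a constant annual rate."""
    return (1.0 + rate) ** years - 1.0

print(f"Implied services CAGR: {services_cagr:.2%}")
print(f"Software, 5-year cumulative growth: {cumulative_growth(software_cagr):.1%}")
print(f"Services, 5-year cumulative growth: {cumulative_growth(services_cagr):.1%}")
```

Software grows by roughly 64 percent over the period against roughly 32 percent for services, so a gap of under five points a year translates into about twice the cumulative growth.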

Closing the loop

At the start of this analysis, the cycle was framed around two trajectories that do not move at the same pace: a market rising from 19.61 billion dollars to 28.08 billion dollars against data volumes rising from 181 zettabytes to 812 zettabytes. The productivity-per-dollar mandate that compounds out of that gap, 3.13 times by 2030, is not a forecast of efficiency. It is a measurement of pressure. The segment, task, and channel shifts documented above are the visible places where the pressure shows up. AI capability compounding is the bridge that must close the gap. Whether the industry meets that pressure through AI-assisted tooling alone, or moves toward a structural redefinition of what discovery coverage means in practice, from full coverage of every potentially relevant artifact to risk-tiered coverage that reserves the most intensive workflows for the documents that matter most, is the open question of the decade. The mashup measures the pressure. The industry answers it.

About the Model behind these figures

All quantitative figures in this analysis are drawn from the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model, a proprietary analyst-aggregation framework that reconciles publicly available third-party research, vendor disclosures, and industry reference work into a single defensible mid-range view of the worldwide eDiscovery market. The Model is a research aggregation tool maintained by ComplexDiscovery OÜ. It is not distributed publicly and does not constitute primary research; figures cited here represent reconciled estimates aligned to a common scope, geography, and timeframe.

This Mashup is the public synthesis vehicle for the Model's 2025 to 2030 cycle. It provides the consolidated reconciliation across software, services, deployment, cloud composition, geography, sector, delivery approach, task composition, long-horizon task share, and data growth, along with a representative listing of the organizations and publications that inform the Model's source aggregation.

Methodology

The scope of this synthesis is the worldwide eDiscovery market, encompassing software and services, expressed in U.S. dollars, across calendar years 2025 through 2030. Reconciliation of varying market definitions, geographic scopes, and source methodologies is presented as ranges with assumptions disclosed where precise alignment is not possible. The 2012 task baseline cited in the long-horizon task share section derives from the RAND Corporation 2012 study by Pace and Zakaras. Compound annual growth rates are derived using the standard formula CAGR = (End / Start)^(1/n) - 1, where n is the number of years, set to five unless otherwise noted. The 3.13-times productivity mandate is a coverage-flat ratio (data multiplier divided by market multiplier), not a forecast of realized productivity gains.
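Written as equations, the two methodology definitions look like this, with the CAGR checked against the market endpoints. The symbols are chosen here for illustration: n is the number of years, D the data volume in zettabytes, and V the market value in billions of U.S. dollars.

```latex
\[
\mathrm{CAGR} = \left(\frac{\text{End}}{\text{Start}}\right)^{1/n} - 1,
\qquad
\left(\frac{28.08}{19.61}\right)^{1/5} - 1 \approx 1.4319^{0.2} - 1 \approx 0.0744
\]
\[
M = \frac{D_{2030}/D_{2025}}{V_{2030}/V_{2025}}
  = \frac{812/181}{28.08/19.61}
  \approx \frac{4.486}{1.432}
  \approx 3.13
\]
```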

Citing this analysis

The primary citable resource for the figures and analysis in this article is the ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model. The Model is the aggregated research artifact maintained by ComplexDiscovery OÜ since 2012 and is the appropriate citation when referencing data points, projections, segmentation, or analyses derived from any ComplexDiscovery annual eDiscovery market size mashup. Suggested citation: Robinson, R. (2026). 2025 to 2030 eDiscovery Market Size Mashup (H. Robinson, Ed.). ComplexDiscovery OÜ.

Sources informing the Model

The listing below provides an overview of the organizations and publications whose data points have informed the development of the Model over time. The Model itself aggregates publicly available content (including abstracts, excerpts, quotes, references, and data points) from these sources, with inputs collected since the inaugural ComplexDiscovery eDiscovery Market Size Mashup in 2012. Individual entries are presented for transparency about Model construction and are not the appropriate citation for figures appearing in this analysis; readers referencing figures should cite the Model. Rounding in the market model may result in slight differences in aggregate numbers.

360 Market Updates / 360iResearch
Aberdeen
ACG Partners
Allied Market Research
American Medical Association
BMC
Catalyst Investors
ComplexDiscovery OÜ eDiscovery Marketplace Mashup Model (incorporating industry news, editorial analysis, eDiscovery Business Confidence Surveys, eDiscovery Pricing Surveys, and prior Annual eDiscovery Market Size Mashups since 2012)
CS Disco
Discovery & Legal Technology Association (DLTA)
eDiscovery Journal
EY
Facts and Factors
Forbes
FRONTEO (UBIC)
Future Market Insights
Gartner
Georgetown Law Center for the Study of the Legal Profession and Thomson Reuters Legal Executive Institute
Global Industry Analysts
Grand View Research
Greentarget
Harvard Business Review
Houlihan Lokey
i360
IBIS World
IDC
Industry eDiscovery Provider, Analyst, and Investor Briefings and Discussions
Industry Observer Estimations (Multiple Observers)
Industry Research (Company)
KLDiscovery
Markets and Markets
Mordor Intelligence
Nasdaq
Nuix
P&S Market Research
Prescient & Strategic Intelligence
RAND Institute for Civil Justice
Relativity Fest, Industry Panel Discussions
Reports and Data
Research and Markets
Richmond Journal of Law and Technology
Statista
The Conference Board
The Radicati Group
Third-Party Market Studies (Independent Industry Briefings)
Transparency Market Research
U.S. Bureau of Economic Analysis
U.S. Department of Commerce, International Trade Administration
U.S. Securities and Exchange Commission (public company filings)
Zion Market Research

Market Intelligence series reports (2025-30 eDiscovery market size mashup)


Assisted by GAI and LLM Technologies

Source: ComplexDiscovery OÜ

ComplexDiscovery’s mission is to enable clarity for complex decisions by providing independent, data‑driven reporting, research, and commentary that make digital risk, legal technology, and regulatory change more legible for practitioners, policymakers, and business leaders.

[taq_review]

[/exclude_from_rss] Industry Research - eDiscovery Market Sizing Beat

Complete look: ComplexDiscovery OÜ's 2025 to 2030 eDiscovery market size mashup


Pricing

Editor's Note: Generative AI is no longer a future-state concept in eDiscovery pricing; it is already reshaping how legal, technology, and corporate teams evaluate cost, value, and defensibility. In this Winter 2026 Pricing Pulse analysis, ComplexDiscovery OÜ, in partnership with EDRM, examines a market that is simultaneously stabilizing in traditional service categories and fragmenting in newer AI-driven ones. The findings highlight a clear divide between established pricing norms for forensic collection, processing, hosting, and document review, and the still-developing commercial models emerging around GenAI-assisted review. For cybersecurity, data privacy, regulatory compliance, and eDiscovery professionals, that divide matters. Pricing transparency now directly affects budgeting, vendor selection, matter planning, and risk management—especially as organizations weigh the promise of AI efficiency against unresolved questions around exception handling, quality control, and contract structure. This analysis offers a timely benchmark for understanding where the market stands today and where pricing pressure is likely to intensify next.

[exclude_from_rss]

[taq_review]

[/exclude_from_rss] Industry Research

A Complete Analysis of the Winter 2026 eDiscovery Pricing Survey

ComplexDiscovery Staff

Executive Summary

The Winter 2026 eDiscovery Pricing Survey, conducted by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) from December 2025 through February 2026, captures a market at a pivotal inflection point. Generative AI (GenAI) has moved into operational workflows for a significant and growing segment of the eDiscovery market, but adoption is uneven, pricing frameworks have not kept pace, and a meaningful share of practitioners have not yet engaged with AI-assisted review at any level. That bifurcation between early adopters and the rest of the market is itself one of the survey's defining findings. Drawing on 53 responses from legal professionals, technology providers, corporations, and consultancies, the survey provides a detailed pricing snapshot of the current eDiscovery market, spanning forensic collection, data processing and hosting, document review, and GenAI-assisted review. Several clear signals emerge from the data. Forensic collection and examination rates have stabilized in the $250–$350 per hour range for standard work, with premium rates for testimony and analysis. Data hosting has commoditized meaningfully at the infrastructure level, while analytics-enabled hosting retains pricing differentiation. Document review rates are stable, but per-document billing remains opaque. Most critically, GenAI-assisted review pricing varies widely as providers experiment: hybrid models and per-document billing each account for roughly 28% of reported primary models, and the $0.11–$0.50 per-document range is emerging as a competitive zone that directly challenges traditional human review economics. This report covers all 25 survey questions, organized into four thematic sections, with analyst observations and strategic implications throughout. All findings represent self-reported practitioner perceptions of prevailing market pricing, not verified transaction records, and should be read as directional market intelligence.
Unlike vendor-produced or client-commissioned pricing guides, the Pricing Pulse is designed and published independently by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM), with no commercial interest in any specific pricing outcome.

About the Survey

Survey Design and Purpose

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of its ongoing Pricing Pulse research program. The survey's primary purpose is to provide eDiscovery practitioners, technology providers, and legal operations professionals with empirically grounded pricing benchmarks across the key service categories that define the eDiscovery market. The Pricing Pulse is practitioner-reported and independently produced; it is not sponsored by, or designed to favor, any vendor, platform, or service category. Respondent comments critiquing the survey design itself are actively incorporated into future iterations, as reflected in this report's processing methodology note. This iteration of the survey placed particular emphasis on generative AI-assisted review pricing, a category first addressed formally in prior survey cycles and given greater prominence in Winter 2026 to reflect the technology's accelerating, if uneven, integration into eDiscovery workflows. The five GenAI pricing questions (Questions 18–22) were designed to capture not just price points but also pricing model structures, exception handling practices, and the nascent development of outcome-based pricing, recognizing that practitioners at very different stages of AI adoption would be responding.

Respondent Profile

The survey received 53 completed responses. By business segment, law firms represented the largest cohort at 43.4% (23 respondents), followed by software and/or services providers at 24.5% (13), corporations at 15.1% (8), consultancies at 9.4% (5), and media, research, or educational organizations at 7.5% (4). By primary function, 67.9% (36) identified as legal/litigation support professionals, 26.4% (14) as business or business support functions, and 5.7% (3) as IT or product development.
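The respondent-profile percentages follow directly from the reported counts. This sketch recomputes them (category labels abbreviated from the text) and also shows how coarse each step is with a sample of 53: one respondent moves any percentage by roughly 1.9 points, which is worth keeping in mind when reading small differences as directional signals.

```python
# Sketch: recompute respondent-profile percentages from the reported counts.
N = 53  # completed responses

by_segment = {
    "Law firms": 23,
    "Software and/or services providers": 13,
    "Corporations": 8,
    "Consultancies": 5,
    "Media, research, or education": 4,
}
by_function = {
    "Legal/litigation support": 36,
    "Business or business support": 14,
    "IT or product development": 3,
}

def pct(count: int, n: int = N) -> float:
    """Share of respondents, in percent, rounded to one decimal place."""
    return round(100.0 * count / n, 1)

for label, groups in (("Segment", by_segment), ("Function", by_function)):
    # Each breakdown should account for every respondent exactly once.
    assert sum(groups.values()) == N, f"{label} counts do not sum to {N}"
    for name, count in groups.items():
        print(f"{label:<9}{name:<36}{count:>3}  {pct(count):>5.1f}%")

# With n = 53, a single respondent is worth about 1.9 percentage points.
print(f"One respondent = {100.0 / N:.1f} percentage points")
```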

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Organizational-Segment-Winter-2026.pdf" title="Survey Respondents by Organizational Segment - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Primary-Function-Winter-2026.pdf" title="Survey Respondents by Primary Function - Winter 2026"]

Geographically, the survey is overwhelmingly U.S.-centric: 92.5% of respondents (49) indicated North America – United States as their primary eDiscovery business geography, with the remaining 7.5% distributed across Europe (United Kingdom and non-UK) and Asia/Asia Pacific. This composition reflects the survey's community of practitioners and should be taken into account when applying results to non-U.S. markets.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Geographic-Region-Winter-2026.pdf" title="Survey Respondents by Geographic Region - Winter 2026"]

The respondent pool's composition — heavily weighted toward legal practitioners with meaningful technology provider and in-house corporate representation — lends credibility to the pricing data for legal use cases while also surfacing supply-side perspectives from vendors who see pricing across many client engagements.

Section 1: Forensic Collection, Examination, and Testimony Pricing

Forensic collection and digital examination form the evidentiary foundation of eDiscovery. Unlike commoditized downstream services, forensic work depends on specialized expertise, defensible chain-of-custody protocols, and increasingly complex device environments. Mobile devices, cloud-linked data ecosystems, encrypted storage, and enterprise application footprints have expanded the examiner's scope considerably over the past several years, sustaining rate levels that resist the downward pressure more commoditized services face. Expert witness testimony sits at the highest value tier of forensic work, where practitioner credentials, courtroom experience, and legal exposure command significant premium pricing.

Q1 & Q2: Per Hour Cost for Onsite and Remote Collection

The $250–$350 per hour range is the clear market anchor for forensic collection, cited by 56.6% of respondents for both onsite and remote collection. However, the distributions diverge meaningfully at the premium tier: 20.8% of respondents report onsite collection rates exceeding $350 per hour, compared to just 5.7% for remote. Conversely, remote collection skews lower: 18.9% report sub-$250 rates for remote work, versus only 5.7% for onsite. This onsite premium reflects real cost structures: travel, physical access logistics, on-premises security requirements, and the coordination burden of collecting in active enterprise environments. The growth of remote forensic collection tools, driven in part by pandemic-era necessity and now institutionalized in many engagements, has introduced competitive downward pressure on remote rates that onsite services do not face to the same degree. Four respondents (7.5%) indicate alternative pricing models for remote collection, suggesting some providers are moving toward flat-fee or subscription-based remote collection arrangements.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-an-Onsite-Collection-by-a-Forensic-Examiner-Winter-2026-.pdf" title="Collection Pricing - Per Hour Cost for an Onsite Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-a-Remote-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for a Remote Collection by a Forensic Examiner - Winter 2026"]

Q3 & Q4: Per Device Cost for Desktop/Laptop and Mobile Device Collection

Device-based pricing skews decisively to the upper tier: 50.9% of respondents report per-device costs exceeding $350 for desktop and laptop collections, and 49.1% report the same for mobile devices. The $250–$350 mid-range captures 18.9% for computers and 24.5% for mobile devices; the higher mobile representation in the mid-range may reflect lower-complexity or volume-based mobile collection engagements where physical access is easier and device configurations are more standardized. Perhaps most notable is the convergence of mobile and computer collection pricing at the upper tier. Mobile device collection, once considered simpler than computer collection due to smaller storage capacities, now commands comparable rates as encryption, cloud sync architectures, third-party application data, and ephemeral messaging platforms have substantially increased examiner effort and risk. Practitioners seeking to budget mobile collection as a lower-cost alternative to computer collection will increasingly find the market does not support that assumption.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Desktop-Laptop-Computer-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Desktop Laptop Computer Collection by a Forensic Examiner - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Device-Cost-for-a-Mobile-Device-Collection-by-a-Forensic-Examiner-Winter-2026.pdf" title="Collection Pricing - Per Device Cost for a Mobile Device Collection by a Forensic Examiner - Winter 2026"]

Q5: Per Hour Cost for Investigation, Analysis, and Report Generation

Investigation, analysis, and report generation command a higher hourly rate floor than collection itself. More than half of respondents (54.7%) report rates in the $350–$550 range for this work, compared with the $250–$350 range cited by a majority for collection. Only 30.2% report rates below $350 per hour for analysis, and 5.7% exceed $550. This premium reflects the cognitive and legal weight of analytical work. Forensic examiners producing reports that will be used in litigation, regulatory proceedings, or internal investigations are exercising expert judgment that creates professional liability, and the market prices that exposure accordingly. Practitioners purchasing forensic services should anticipate that billing rates will escalate from collection through analysis, often within the same engagement.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Investigation-Analysis-and-Report-Generation-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Investigation Analysis and Report Generation by an FE - Winter 2026"]

Q6: Per Hour Cost for Expert Witness Testimony

Expert witness testimony carries the highest rate profile in the forensic pricing group. While 47.2% report testimony rates in the $350–$550 range (consistent with analysis rates), a notable 26.4% report rates exceeding $550 per hour, the highest proportion in any >$550 category across the survey. The elevated 'do not know' response rate (20.8%) likely reflects that many practitioners engage forensic examiners for collection and analysis but not testimony, creating a meaningful gap in their pricing awareness for this segment. Expert witness rates are driven by factors beyond standard hourly billing, including the examiner's track record, publication history, geographic availability, and the complexity of the matter at issue. The wide distribution, from below $350 to above $550, reflects a market where individual credentials create significant pricing dispersion.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Collection-Pricing-Per-Hour-Cost-for-Expert-Witness-Testimony-In-Person-and-Written-by-an-FE-Winter-2026.pdf" title="Collection Pricing - Per Hour Cost for Expert Witness Testimony (In-Person and Written) by an FE - Winter 2026"]

Analyst Observation: Forensic Collection & Examination

The forensic pricing landscape shows a well-established rate structure for collection and a predictable escalation through analysis to testimony. The $250–$350 range for collection hours serves as a reliable negotiation baseline. The key risk for buyers is underbudgeting for the analysis and testimony phases, where rates routinely exceed $350/hour and, for testimony, surpass $550/hour for roughly a quarter of respondents. Practitioners with active litigation portfolios should establish explicit rate schedules with forensic vendors for all service tiers at the outset of an engagement, not just for collection.

Key Takeaways: Section 1

$250–$350/hour is the market anchor for both onsite and remote forensic collection (56.6% each).
Onsite collection carries a measurable premium: 20.8% report >$350/hour vs. 5.7% for remote.
Mobile device collection rates have converged with computer collection at the upper tier (both ~50% report >$350/device).
Investigation, analysis, and report generation rates escalate to $350–$550/hour for 54.7% of respondents.
Expert witness testimony exceeds $550/hour for 26.4% of respondents, the highest share reporting rates above $550 in any survey category.
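The underbudgeting risk described in the analyst observation can be made concrete with a small budget sketch. The hourly rates are single illustrative points picked from the modal survey ranges (not survey medians or averages), and the hours per phase are hypothetical.

```python
# Sketch: why budgeting every forensic hour at the collection rate
# understates cost. Rates are illustrative points within the survey's
# modal ranges; hours per phase are hypothetical.
PHASE_RATES = {
    "collection": 300.0,  # within the $250-$350 collection anchor
    "analysis": 450.0,    # within the $350-$550 analysis range
    "testimony": 550.0,   # threshold above which 26.4% report testimony rates
}

def phase_budget(hours_by_phase: dict[str, float], rates: dict[str, float]) -> float:
    """Sum of hours x hourly rate across engagement phases."""
    return sum(hours * rates[phase] for phase, hours in hours_by_phase.items())

hours = {"collection": 20.0, "analysis": 30.0, "testimony": 8.0}  # hypothetical

tiered = phase_budget(hours, PHASE_RATES)
flat = sum(hours.values()) * PHASE_RATES["collection"]

print(f"Tiered budget:          ${tiered:,.0f}")
print(f"Collection-rate budget: ${flat:,.0f}")
print(f"Shortfall if flat:      ${tiered - flat:,.0f}")
```

With these placeholder hours, the tiered budget is $23,900 against $17,400 if every hour were priced at the collection rate, which is the gap an engagement-outset rate schedule is meant to surface.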

Section 2: Data Processing, Hosting, and Project Management Pricing

Data processing and hosting represent the operational infrastructure of eDiscovery delivery. Processing, which transforms raw electronically stored information (ESI) into a reviewable format, has historically been a significant cost driver in large matters. Hosting provides the platform on which review takes place. Both categories have experienced significant commoditization pressure from cloud infrastructure economics, but the emergence of AI-driven early culling and processing tools is beginning to reshape volume dynamics in ways that affect both pricing and billing model design.

Q7 & Q8: Per GB Cost to Process ESI at Ingestion and at Completion

Processing pricing at ingestion is relatively compressed: 39.6% of respondents report rates in the $25–$75 per GB range, and 34.0% report rates below $25 per GB. A significant 18.9% indicate alternative pricing models, reflecting the market's movement away from traditional per-GB ingestion billing. Pricing at completion of processing tells a different story. The most commonly reported range shifts to 'less than $100 per GB' (37.7%), and the proportion reporting alternative pricing models rises to 22.6%. Another 15.1% report $100–$150 per GB at completion, and 9.4% exceed $150 per GB. The jump from ingestion to completion reflects the data expansion and enrichment that occur through native processing, deduplication, OCR, and promotion, processes that substantially increase the per-GB cost basis for providers. One respondent offered a methodologically important observation worth acknowledging directly: the survey's two-question processing model may conflate two distinct industry billing philosophies, namely an 'all-in' per-GB rate that covers ingestion through promotion versus a staged model with separate per-GB charges for ingestion and for native processing or promotion to review. This is a legitimate distinction, and practitioners benchmarking against these results should clarify which model their vendor employs. Future survey iterations will address this more precisely.
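The all-in versus staged distinction raised by the respondent matters because the two quotes are only comparable once the promoted volume is known. This is a minimal sketch under stated assumptions: the all-in rate is assumed to be billed on ingested volume, the staged promotion charge on the volume promoted to review, and every rate and volume below is a hypothetical placeholder rather than a survey result.

```python
# Sketch: compare an "all-in" per-GB processing quote with a staged quote
# (ingestion plus promotion to review). All rates and volumes are
# hypothetical placeholders, not survey results.

def all_in_total(rate_per_gb: float, ingested_gb: float) -> float:
    """All-in model, assumed here to be billed on ingested volume."""
    return rate_per_gb * ingested_gb

def staged_total(ingest_rate: float, promote_rate: float,
                 ingested_gb: float, promoted_gb: float) -> float:
    """Staged model: one charge on ingested GB, another on GB promoted to review."""
    return ingest_rate * ingested_gb + promote_rate * promoted_gb

def break_even_promoted_gb(all_in_rate: float, ingest_rate: float,
                           promote_rate: float, ingested_gb: float) -> float:
    """Promoted volume at which the two quotes cost the same."""
    return (all_in_rate - ingest_rate) * ingested_gb / promote_rate

ingested, promoted = 200.0, 60.0  # hypothetical matter volumes, in GB

print(f"All-in total: ${all_in_total(40.0, ingested):,.0f}")
print(f"Staged total: ${staged_total(15.0, 90.0, ingested, promoted):,.0f}")
print(f"Break-even promoted volume: "
      f"{break_even_promoted_gb(40.0, 15.0, 90.0, ingested):.1f} GB")
```

Under these placeholder rates, the staged quote is cheaper only when culling keeps the promoted volume below about 55.6 GB of the 200 GB ingested, which is why a per-GB rate at ingestion and a per-GB rate at completion cannot be compared without knowing the culling ratio.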

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Ingestion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Ingestion - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-to-Process-ESI-Based-on-Volume-at-Completion-Winter-2026.pdf" title="Processing Pricing - Per GB Cost to Process ESI Based on Volume at Completion - Winter 2026"]

Q9 & Q10: Per GB Per Month Cost to Host ESI Without and With Analytics

Data hosting without analytics has substantially commoditized. More than half of respondents (54.7%) report hosting rates below $10 per GB per month, and another 30.2% fall in the $10–$20 range. Less than 2% report rates exceeding $20 per GB per month. This distribution reflects years of cloud infrastructure cost reduction passed through to buyers, as major platform providers compete on storage economics. Analytics-enabled hosting shows a wider and higher distribution. While 43.4% report rates below $15 per GB per month with analytics, 32.1% fall in the $15–$25 range, and 11.3% exceed $25 per GB per month. The premium for analytics-capable hosting reflects platform differentiation: vendors with mature AI search, conceptual clustering, visualization tools, and review workflow automation can sustain higher rates. Undifferentiated platforms (those competing primarily on storage price) face continued downward pressure as infrastructure costs decline. One respondent's comment corroborates this trajectory directly, observing that while overall eDiscovery pricing has been stable, technology costs specifically appear to be coming down, a signal consistent with the commoditization pattern visible in the hosting data.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-without-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI without Analytics - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-GB-Cost-Per-Month-to-Host-ESI-with-Analytics-Winter-2026.pdf" title="Processing Pricing - Per GB Cost Per Month to Host ESI with Analytics - Winter 2026"]

Q11: User License Fee Per Month for Access to Hosted Data

User licensing is in an active state of structural transition. The $50–$100 per user per month range is the most frequently cited (41.5%), but a striking 34.0% of respondents report alternative pricing models, the highest alternative-model proportion among any category in the processing and hosting section. Only 17.0% report rates below $50 per user per month. The high alternative-model rate reflects a market shift away from traditional per-seat licensing toward enterprise agreements, volume tiers, and managed service arrangements that bundle access costs into broader contract structures. For corporate legal departments and law firms managing multi-matter eDiscovery portfolios, these bundled arrangements restructure cost visibility: per-matter spend attribution becomes less granular, which may simplify budgeting at the portfolio level but reduces transparency at the individual matter level. Whether bundled arrangements represent a net financial advantage depends on volume, negotiated terms, and how closely actual usage tracks the contracted scope, variables the survey does not measure.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-User-License-Fee-Per-Month-for-Access-to-Hosted-Data-Winter-2026.pdf" title="Processing Pricing - User License Fee Per Month for Access to Hosted Data - Winter 2026"]
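Whether a bundled arrangement beats per-seat licensing comes down largely to how many seats are actually used. The sketch below assumes a hypothetical $75 per user per month seat price (inside the survey's $50–$100 band) and a hypothetical $3,000 monthly bundle; neither figure comes from the survey, and real enterprise agreements usually carry more terms than a single flat fee.

```python
import math


def per_seat_monthly(active_users: int, rate_per_user: float) -> float:
    """Traditional per-seat licensing: pay for each active user each month."""
    return active_users * rate_per_user


def effective_bundle_rate(bundle_fee: float, active_users: int) -> float:
    """Effective per-user monthly cost of a flat bundled fee at a given usage level."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return bundle_fee / active_users


def break_even_users(bundle_fee: float, rate_per_user: float) -> int:
    """Fewest active users at which the bundle costs no more than per-seat pricing."""
    return math.ceil(bundle_fee / rate_per_user)


# Hypothetical terms: $75/user/month per seat versus a $3,000/month bundle.
print(break_even_users(3000, 75))        # 40
print(effective_bundle_rate(3000, 25))   # 120.0 (under-used bundle)
print(effective_bundle_rate(3000, 60))   # 50.0 (well-used bundle)
```

The same bundle is either a discount or a premium depending on usage, which is why tracking actual seats against the contracted scope matters.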

Q12 — Per Hour Cost of Project Management Support for eDiscovery

Project management pricing is the most consistent and well-understood category in the processing and hosting group. More than half of respondents (52.8%) report rates in the $100–$200 per hour range, and 26.4% report rates exceeding $200 per hour. The low 'do not know' rate (5.7%), tied with Q9 for the lowest across all Section 2 questions, indicates that PM pricing is regularly visible to practitioners in vendor proposals. The 26.4% reporting more than $200 per hour likely reflects the growing complexity of modern eDiscovery engagements. Today's project managers must coordinate AI review platforms, multiple review vendor relationships, technical review workflows, and real-time quality control functions, a scope considerably broader than the data management and platform coordination role the title implied in earlier phases of the market.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Processing-Pricing-Per-Hour-Cost-of-Project-Management-Support-for-eDiscovery-Winter-2026.pdf" title="Processing Pricing - Per Hour Cost of Project Management Support for eDiscovery - Winter 2026"]

Analyst Observation — Processing, Hosting & Project Management

Processing pricing is bifurcating: per-GB billing at ingestion remains common, but completion-phase and analytics-related pricing is shifting toward bundled and alternative models. Practitioners anchored to traditional per-GB benchmarks for TAR, analytics hosting, or managed service arrangements may be negotiating based on outdated frameworks. Hosting has genuinely commoditized at the infrastructure level; the pricing action now lives in analytics differentiation layered above the storage tier.

Key Takeaways — Section 2

Processing at ingestion is largely below $75/GB (73.6% combined), but completion-phase pricing climbs with 24.5% reporting $100/GB or more.
Alternative pricing models account for 18.9% at ingestion and 22.6% at completion — signaling a structural shift away from per-GB processing billing.
Basic hosting has commoditized: 54.7% report sub-$10/GB/month. Analytics hosting retains differentiation with 11.3% exceeding $25/GB/month.
User licensing is migrating from per-seat to bundled models — 34.0% report alternative pricing structures.
Project management rates are well understood and rising: 26.4% now exceed $200/hour, reflecting growing engagement complexity.

Section 3: Document Review Pricing

Document review sits at the commercial center of most eDiscovery engagements. It is the largest cost driver in complex litigation, the primary arena in which human expertise meets technology leverage, and the category most directly disrupted by the emergence of GenAI-assisted review. Pricing in this section spans both hourly attorney rates (the traditional billing model) and per-document rates (a model that has gained traction as technology-assisted review has enabled higher throughput). The data in this section provides critical context for interpreting the GenAI pricing data that follows in Section 4.

Q13 — Per GB Cost for Predictive Coding / Technology-Assisted Review

Predictive coding and technology-assisted review (TAR) pricing has largely migrated away from per-GB billing. The most frequent response (35.8%) is 'alternative pricing model', the highest alternative-model proportion of any per-GB question in the survey. Per-GB responses are spread thinly: 30.2% of all respondents report rates below $75 per GB, 13.2% report $75–$150 per GB, and only 1.9% report rates above $150 per GB. The 18.9% 'do not know' rate for TAR pricing suggests that many practitioners receive predictive coding as an embedded capability within their review platform subscription rather than as a separately line-itemed service. This bundling trend, combined with the high alternative-model rate, indicates that standalone per-GB TAR billing is becoming the exception rather than the rule as platforms integrate AI-driven prioritization into standard hosting fees.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-GB-Cost-for-Predictive-Coding-in-a-Technology-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Per GB Cost for Predictive Coding in a Technology-Assisted Review - Winter 2026"]

Q14 & Q15 — Per Hour Cost for Onsite and Remote Managed Review Attorneys

Among respondents with visibility, hourly managed review attorney rates show a consistent onsite premium over remote delivery. For onsite review, 45.3% of respondents report rates exceeding $40 per hour, and 32.1% report $25–$40 per hour. For remote review, the distribution shifts: 41.5% report $25–$40 per hour, and 35.8% report more than $40 per hour. The onsite premium likely reflects overhead recovery for physical review facilities, security infrastructure, and on-site supervision. Despite the normalization of remote review since the pandemic, onsite review commands a persistent rate premium that clients with physical review requirements should anticipate. The relatively high 'do not know' rates for both onsite (18.9%) and remote (17.0%) review suggest that many practitioners engage review vendors without direct visibility into the underlying attorney billing rates, a transparency gap that can make accurate matter budgeting difficult.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Hour-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Hour Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]
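Hourly reviewer rates become comparable to per-document and GenAI pricing only once a review pace is assumed. This Python sketch converts an hourly rate into an implied per-document cost; the throughput figures (40 and 60 documents per hour) and the $55 rate are hypothetical assumptions, not survey data, and actual pace varies widely with document type and issue complexity.

```python
def per_doc_equivalent(hourly_rate: float, docs_per_hour: float) -> float:
    """Convert an hourly reviewer rate into an implied per-document cost."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return round(hourly_rate / docs_per_hour, 4)


# Hypothetical inputs: $40/hour is the survey's upper threshold; $55/hour and
# both review paces are assumptions for illustration only.
for rate in (40.0, 55.0):
    for pace in (40, 60):  # documents reviewed per hour (assumed)
        print(f"${rate:.0f}/hr at {pace} docs/hr -> ${per_doc_equivalent(rate, pace):.3f}/doc")
```

Small changes in assumed pace move the implied per-document cost across the survey's $0.50–$1.00 band, which is one reason per-document transparency in proposals is worth requesting.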

Q16 & Q17 — Cost Per Document for Onsite and Remote Managed Review

Per-document billing for human document review carries significant uncertainty across the respondent pool. For onsite per-document review, 34.0% of respondents indicate they do not know the cost, the highest 'do not know' rate among all document review questions. For remote per-document review, 30.2% report not knowing. Among those with visibility, the $0.50–$1.00 per document range is the most common for both onsite (30.2%) and remote (28.3%) delivery, with onsite showing a higher proportion of rates exceeding $1.00 per document (22.6% vs. 18.9% remote). Remote per-document review trends lower at the bottom of the range: 13.2% report sub-$0.50 rates for remote work versus only 3.8% for onsite. This directional difference is consistent with lower overhead costs in remote delivery environments.

The per-document rate distribution for human review is strategically important as the baseline against which GenAI-assisted review pricing should be evaluated. In this analyst's view, where human review runs $0.50–$1.00 per document and GenAI-assisted alternatives are priced in the $0.11–$0.50 range, the cost differential is substantial enough to drive adoption decisions, provided quality and defensibility standards are met. The economic case ultimately depends on matter-specific quality thresholds and the degree to which AI exception handling costs are controlled.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Onsite-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Onsite - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Review-Pricing-Per-Document-Cost-for-Document-Review-Attorneys-to-Review-Documents-Remote-Winter-2026.pdf" title="Review Pricing - Per Document Cost for Document Review Attorneys to Review Documents Remote - Winter 2026"]
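The comparison between the human and GenAI per-document bands can be made concrete by computing worst-case and best-case savings for a fixed population. This sketch uses the two bands discussed in this section and a hypothetical 100,000-document review; it deliberately ignores QC, exception handling, and platform fees.

```python
from typing import Tuple

Band = Tuple[float, float]  # (low, high) per-document rate in dollars


def cost_range(doc_count: int, band: Band) -> Band:
    """Total cost at the low and high ends of a per-document rate band."""
    low, high = band
    return (round(doc_count * low, 2), round(doc_count * high, 2))


def savings_range(doc_count: int, human: Band, genai: Band) -> Band:
    """(worst case, best case) savings of GenAI over human per-document review.

    Worst case pairs the cheapest human rate with the most expensive GenAI rate;
    best case pairs the most expensive human rate with the cheapest GenAI rate.
    """
    human_low, human_high = cost_range(doc_count, human)
    genai_low, genai_high = cost_range(doc_count, genai)
    return (round(human_low - genai_high, 2), round(human_high - genai_low, 2))


HUMAN_BAND: Band = (0.50, 1.00)  # most common human per-document band in the survey
GENAI_BAND: Band = (0.11, 0.50)  # GenAI range highlighted in the analysis

docs = 100_000  # hypothetical review population
print(savings_range(docs, HUMAN_BAND, GENAI_BAND))  # (0.0, 89000.0)
```

The worst case is zero because the two bands meet at $0.50, so the size of the advantage depends on where within each band a given matter actually lands.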

Analyst Observation — Document Review

Traditional document review rates have held relatively stable, but the large share of practitioners unable to articulate per-document pricing, particularly for onsite review, signals a structural shift away from document-count-based billing toward time-based models that are less directly comparable to AI-assisted pricing. Practitioners should push for per-document rate transparency in vendor proposals to enable genuine cost modeling against AI alternatives.

Key Takeaways — Section 3

TAR/predictive coding billing is migrating away from per-GB models: 35.8% report alternative pricing, 18.9% don't know — bundled platform pricing is absorbing this cost.
Onsite managed review attorney rates exceed $40/hour for 45.3% of respondents vs. 35.8% for remote — the onsite premium persists.
Per-document review rates cluster in the $0.50–$1.00 range for both onsite and remote, with significant 'do not know' responses (34.0% onsite, 30.2% remote) indicating a transparency gap.
The $0.50–$1.00 per-document human review baseline sets up direct economic competition with emerging GenAI-assisted review pricing.

Section 4: GenAI-Assisted Review Pricing

The Winter 2026 survey's GenAI section was designed to illuminate where pricing clarity exists, where models are still fluid, and where the industry is beginning to form conventions around AI-assisted document review. The results reveal not a uniformly mature market but a bifurcated one: a segment of practitioners actively deploying and pricing GenAI review, and a substantial minority (17.0%) who report it as not applicable or unknown and have not yet engaged with it at a pricing level. Both cohorts are represented in the data, and the analysis in this section is relevant to each in different ways.

This split is not surprising. GenAI-assisted review introduces fundamentally different cost economics than traditional review: provider costs are driven by token consumption, GPU infrastructure, and model licensing, not attorney hours. Translating those costs into buyer-facing pricing structures that are transparent, predictable, and defensible has proven more difficult than the technology adoption itself.

Q18 — Primary Model for GenAI-Assisted Review

The two leading GenAI pricing models are effectively tied: hybrid pricing (combinations of multiple models) and per-document billing each account for 28.3% of primary model responses (15 respondents each). Per-GB billing captures 11.3%, per-token billing 5.7%, flat monthly subscription 5.7%, and outcome-based pricing 3.8%. The remaining 17.0% report that GenAI-assisted review pricing is not applicable or unknown to them. The prominence of hybrid models reflects the reality that many providers are constructing bespoke proposals that combine per-document minimums, per-GB infrastructure charges, and platform subscription components. This complexity makes apples-to-apples comparison difficult for buyers, and it may be intentional.
Per-document pricing's co-equal standing with hybrid models suggests that a document-level unit of value is widely accepted as a conceptual billing anchor, even when the final structure is more complex. One respondent's comment illustrates the breadth of emerging structures not fully captured by the survey's predefined model options: some providers are pricing GenAI review as an hourly professional service, with consultants performing query engineering, model interaction, and attorney collaboration, billed at standard hourly rates with per-matter minimums and not-to-exceed caps. This hourly model sits outside the per-document and per-GB frameworks the market most commonly discusses, and its presence signals that GenAI pricing models are more diverse than any single survey's categories can fully capture. Per-token pricing, the underlying cost reality for large language model deployments, has not been widely passed through to buyers (5.7%). This suggests that providers are currently absorbing token cost variability and presenting buyers with higher-order pricing units. As token costs evolve with model efficiency improvements, the degree to which providers pass these economics through will be an important market dynamic to watch.
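The hybrid and hourly structures described for Q18 can be expressed as simple fee functions. The following is a minimal sketch, not any provider's actual billing logic: the `HybridTerms` fields, the $0.20 per-document rate, $5,000 document minimum, $20 per-GB infrastructure charge, $1,500 monthly platform fee, and the $300 hourly rate with a $5,000 minimum and $10,000 not-to-exceed cap are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HybridTerms:
    """Hypothetical hybrid GenAI review terms; every field is an illustrative assumption."""
    per_doc_rate: float          # $ per document processed by the model
    per_doc_minimum: float       # minimum document charge for the matter
    per_gb_infra_rate: float     # $ per GB infrastructure charge
    monthly_platform_fee: float  # platform subscription component


def hybrid_invoice(terms: HybridTerms, docs: int, gb: float, months: int) -> float:
    """Sum the three components; the document charge never falls below its minimum."""
    doc_charge = max(docs * terms.per_doc_rate, terms.per_doc_minimum)
    infra_charge = gb * terms.per_gb_infra_rate
    platform_charge = months * terms.monthly_platform_fee
    return round(doc_charge + infra_charge + platform_charge, 2)


def hourly_service_fee(hours: float, hourly_rate: float,
                       matter_minimum: float, not_to_exceed: float) -> float:
    """Hourly professional-service billing bounded by a per-matter minimum and a cap."""
    if matter_minimum > not_to_exceed:
        raise ValueError("matter_minimum cannot exceed not_to_exceed")
    return round(min(max(hours * hourly_rate, matter_minimum), not_to_exceed), 2)


terms = HybridTerms(per_doc_rate=0.20, per_doc_minimum=5_000.0,
                    per_gb_infra_rate=20.0, monthly_platform_fee=1_500.0)
print(hybrid_invoice(terms, docs=20_000, gb=100, months=2))   # 10000.0
print(hourly_service_fee(10, 300.0, 5_000.0, 10_000.0))       # 5000.0 (minimum applies)
print(hourly_service_fee(40, 300.0, 5_000.0, 10_000.0))       # 10000.0 (cap applies)
```

Even in this simplified form, the document minimum rather than the per-document rate sets the document charge for a 20,000-document matter, which illustrates why hybrid proposals are hard to compare on headline rates alone.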

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Primary-Model-for-Gen-AI-Assisted-Review-in-eDiscovery-Winter-2026.pdf" title="Review Pricing - Primary Model for Gen AI-Assisted Review in eDiscovery - Winter 2026"]
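With a sample of 53, each respondent is worth roughly 1.9 percentage points, which is why small categories such as outcome-based pricing (3.8%) represent only two practitioners. The conversion behind figures like "15 respondents each" can be sketched as follows; the only inputs are the Q18 shares and the 53-respondent sample size reported in this analysis.

```python
N_RESPONDENTS = 53  # Winter 2026 sample size


def respondents_from_share(percent: float, n: int = N_RESPONDENTS) -> int:
    """Recover the whole-number respondent count behind a reported percentage."""
    return round(percent / 100 * n)


def share_from_respondents(count: int, n: int = N_RESPONDENTS) -> float:
    """Reported percentage (one decimal place) for a given respondent count."""
    return round(count / n * 100, 1)


# Q18 primary-model shares as reported
q18 = {"hybrid": 28.3, "per-document": 28.3, "per-GB": 11.3, "per-token": 5.7,
       "flat subscription": 5.7, "outcome-based": 3.8, "n/a or unknown": 17.0}
counts = {model: respondents_from_share(pct) for model, pct in q18.items()}
print(counts)
print(sum(counts.values()))        # 53
print(share_from_respondents(1))   # 1.9 (one respondent's weight)
```

Counts recovered this way should sum to the sample size; a mismatch usually signals a transcription or rounding error in a chart.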

Q19 — Average Cost Per Document for GenAI-Assisted Review (Per-Document Model)

Among all survey respondents, the $0.26–$0.50 per-document tier is the most frequently cited GenAI price point (20.8%), followed by the $0.11–$0.25 and $0.05–$0.10 ranges (15.1% each). A further 7.5% report per-document GenAI rates exceeding $0.50, and 5.7% report rates below $0.05. A significant 35.8% indicate this pricing model is not applicable to them or that they do not know the cost. The broad distribution among those with pricing visibility, from under five cents to over fifty cents per document, reflects the wide variance in task complexity, model selection, and quality control overhead across GenAI review implementations. The $0.11–$0.50 range represents the most commercially active zone. At the lower end, GenAI review offers compelling cost efficiency relative to the $0.50–$1.00 range for human per-document review. At the upper end of GenAI pricing (above $0.50), the value proposition requires stronger justification, particularly around accuracy, speed, or reduced downstream review burden. Practitioners should push vendors for specificity on what the per-document fee includes: model inference costs alone, or QC, exception handling, and reporting as well.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Per-Document-in-Per-Document-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Per Document in Per Document Model of Gen AI-Assisted Review - Winter 2026"]
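Comparing per-document GenAI quotes is only meaningful once each is normalized to an all-in figure. The sketch below assumes that some quotes bill human QC sampling and reporting separately from the per-document rate; the two quotes, the 10% QC share, the $0.75 QC rate, and the $2,000 reporting fee are hypothetical.

```python
def all_in_per_doc(docs: int, base_rate: float, qc_share: float = 0.0,
                   qc_rate: float = 0.0, flat_fees: float = 0.0) -> float:
    """Normalize a per-document GenAI quote to an all-in cost per document.

    qc_share: fraction of documents receiving separately billed human QC (assumed).
    qc_rate: per-document price of that QC (assumed).
    flat_fees: reporting or setup charges billed outside the per-document rate (assumed).
    """
    if docs <= 0 or not 0.0 <= qc_share <= 1.0:
        raise ValueError("docs must be positive and qc_share must be in [0, 1]")
    total = docs * base_rate + docs * qc_share * qc_rate + flat_fees
    return round(total / docs, 4)


# Two hypothetical quotes for the same 100,000-document population
quote_a = all_in_per_doc(100_000, 0.20, qc_share=0.10, qc_rate=0.75, flat_fees=2_000)
quote_b = all_in_per_doc(100_000, 0.35)  # assumed to already include QC and reporting
print(quote_a, quote_b)                  # 0.295 0.35
```

Here the lower headline rate still wins after normalization, but a larger QC share or higher QC rate can reverse the ranking, which is the reason to ask what the fee includes.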

Q20 — Average Cost Range for GenAI-Assisted Review (Per-GB Model)

Per-GB GenAI pricing is less prevalent in practice: 64.2% of respondents indicate this model is not applicable or unknown. Among those who do report per-GB GenAI pricing, the $25–$50 per GB range is most common (17.0% of all respondents), followed by below $25 per GB (13.2%). Two respondents (3.8%) report rates exceeding $100 per GB for GenAI review, likely representing specialized, computationally intensive analytical workflows rather than standard review acceleration. Given that data processing at ingestion typically falls below $75 per GB, a per-GB GenAI review charge layered on top represents a meaningful incremental cost. Practitioners evaluating per-GB GenAI pricing should model total matter economics carefully, including whether early data culling through AI reduces the volume that reaches review, potentially offsetting the per-GB GenAI charge with reduced processing and hosting costs downstream.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Average-Cost-Range-Per-GB-in-Per-GB-Model-of-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Average Cost Range Per GB in Per GB Model of Gen AI-Assisted Review - Winter 2026"]
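The culling offset described for Q20 can be tested with a rough break-even calculation. This sketch makes several simplifying assumptions: processing and the per-GB GenAI charge apply to the full ingested volume, only hosting shrinks with culling, and downstream human review savings are ignored. The 200 GB volume, $50 processing rate, $15 analytics hosting rate, 12-month term, $40 GenAI rate, and 40% cull fraction are all hypothetical.

```python
def matter_cost(gb: float, processing_per_gb: float, hosting_per_gb_month: float,
                months: int, genai_per_gb: float = 0.0, cull_fraction: float = 0.0) -> float:
    """Rough matter cost under simplifying assumptions.

    Processing and any per-GB GenAI charge apply to the full ingested volume;
    hosting applies only to the volume that survives AI-assisted culling.
    """
    if not 0.0 <= cull_fraction <= 1.0:
        raise ValueError("cull_fraction must be in [0, 1]")
    hosted_gb = gb * (1.0 - cull_fraction)
    total = gb * processing_per_gb + gb * genai_per_gb + hosted_gb * hosting_per_gb_month * months
    return round(total, 2)


def break_even_cull(genai_per_gb: float, hosting_per_gb_month: float, months: int) -> float:
    """Cull fraction at which saved hosting exactly offsets the per-GB GenAI charge."""
    return genai_per_gb / (hosting_per_gb_month * months)


# Hypothetical 200 GB matter hosted with analytics for 12 months
baseline = matter_cost(200, 50.0, 15.0, 12)
with_genai = matter_cost(200, 50.0, 15.0, 12, genai_per_gb=40.0, cull_fraction=0.40)
print(baseline, with_genai)                        # 46000.0 39600.0
print(round(break_even_cull(40.0, 15.0, 12), 3))   # 0.222
```

Under these assumptions the GenAI charge pays for itself once culling removes about 22% of the volume; shorter hosting periods or lower hosting rates push that threshold up.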

Q21 — Outcome-Based Pricing Structure for GenAI-Assisted Review

Outcome-based pricing for GenAI review remains largely theoretical in the current market: 79.2% of respondents report no applicable experience with it. Among the minority with exposure, custom agreements are most common (9.4%), with small numbers reporting tiered pricing based on review speed improvements (3.8%), fixed fees based on achieved accuracy rates (3.8%), a combination of performance metrics (1.9%), and a percentage of cost savings compared to traditional review (1.9%). The theoretical appeal of outcome-based pricing is clear: it aligns provider incentives with client results and shares the benefits of AI transparently. The operational mechanisms, however, remain underdeveloped. Defining accuracy baselines, attributing speed gains to AI rather than to staffing decisions, and calculating savings against hypothetical traditional review costs are all methodologically complex. The prevalence of custom agreements (9.4%) indicates that outcome-based structures, where they exist, are negotiated on a bespoke basis without market-standardized frameworks. In this analyst's view, this is an area where the industry is likely to see active experimentation and standardization attempts in coming survey cycles, though the timeline will depend on how quickly buyers begin demanding performance accountability in AI review contracts.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Typical-Structure-of-Outcome-Based-Pricing-Models-in-Gen-AI-Assisted-Review-Winter-2026.pdf" title="Review Pricing - Typical Structure of Outcome-Based Pricing Models in Gen AI-Assisted Review - Winter 2026"]
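Two of the outcome-based structures respondents reported, a percentage of cost savings and a fixed fee keyed to achieved accuracy, can be sketched in a few lines. The sketch also shows where the methodological difficulty sits: the savings fee depends on a traditional-review baseline that is itself an estimate. All baselines, thresholds, and fees below are hypothetical.

```python
from typing import List, Tuple


def savings_share_fee(baseline_cost: float, actual_cost: float, share: float) -> float:
    """Fee equal to an agreed share of savings against a traditional-review baseline.

    The baseline is an estimate of what traditional review would have cost;
    if there are no savings, no fee is earned.
    """
    savings = max(baseline_cost - actual_cost, 0.0)
    return round(savings * share, 2)


def accuracy_tier_fee(measured_accuracy: float, tiers: List[Tuple[float, float]]) -> float:
    """Largest fee among the tiers whose accuracy threshold was met; tiers are (threshold, fee)."""
    reached = [fee for threshold, fee in tiers if measured_accuracy >= threshold]
    return max(reached, default=0.0)


TIERS = [(0.75, 20_000.0), (0.85, 30_000.0)]  # hypothetical thresholds and fees
print(savings_share_fee(100_000.0, 40_000.0, 0.20))  # 12000.0
print(accuracy_tier_fee(0.80, TIERS))                 # 20000.0
print(accuracy_tier_fee(0.70, TIERS))                 # 0.0
```

Shifting the assumed baseline by a few thousand dollars moves the savings fee proportionally, which is why baseline definition tends to dominate these negotiations.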

Q22 — How Pricing Models Handle Failed or Exception Documents in GenAI Review

Exception document handling (documents that fail AI processing or require human intervention) is a practical and financially significant issue that is underappreciated in headline GenAI pricing discussions. Nearly 40% of respondents (39.6%) cannot say how their contracts address this scenario. Among those with visibility, no single approach dominates: 18.9% report that exception documents route to manual review at standard rates; 17.0% say handling depends on the specific issue encountered; 9.4% each report that exceptions are charged as additional processing time or included in the base price (no additional charge); and 5.7% report per-document exception billing. The variability of exception handling approaches, together with the high proportion of respondents with no visibility, represents a meaningful contract risk for buyers. In matters where a significant share of documents require human intervention, the effective cost of a GenAI-assisted review engagement can increase substantially depending on which exception pricing structure applies. Buyers negotiating GenAI review engagements should require explicit exception handling clauses that specify the triggering conditions, billing treatment, and quality control obligations for documents that exit the AI workflow.

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/03/Review-Pricing-Accounting-for-Docs-That-Fail-To-Process-or-Require-Special-Handing-Gen-AI-Winter-2026.pdf" title="Review Pricing - Accounting for Docs That Fail To Process or Require Special Handling (Gen AI) - Winter 2026"]
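The effect of different exception handling terms on total cost can be sketched for three of the structures respondents reported. The model assumes the GenAI rate applies to every document, including exceptions; the 100,000-document population, $0.25 GenAI rate, 15% exception share, $0.75 manual review rate, and $0.40 exception fee are hypothetical. The "additional processing time" and "depends on the issue" structures are omitted because their cost drivers are not specified.

```python
from enum import Enum


class ExceptionTerms(Enum):
    INCLUDED = "included in base price"
    MANUAL_REVIEW = "routed to manual review at standard rates"
    PER_DOC_FEE = "billed per exception document"


def genai_review_cost(docs: int, genai_rate: float, exception_share: float,
                      terms: ExceptionTerms, human_rate: float = 0.0,
                      exception_fee: float = 0.0) -> float:
    """Total cost when a share of documents exits the AI workflow.

    Assumes the GenAI rate is charged on every document, including exceptions.
    """
    if not 0.0 <= exception_share <= 1.0:
        raise ValueError("exception_share must be in [0, 1]")
    exceptions = docs * exception_share
    extra = {
        ExceptionTerms.INCLUDED: 0.0,
        ExceptionTerms.MANUAL_REVIEW: exceptions * human_rate,
        ExceptionTerms.PER_DOC_FEE: exceptions * exception_fee,
    }[terms]
    return round(docs * genai_rate + extra, 2)


# Hypothetical matter: 100,000 documents at $0.25, with 15% needing human handling
for terms in ExceptionTerms:
    total = genai_review_cost(100_000, 0.25, 0.15, terms, human_rate=0.75, exception_fee=0.40)
    print(f"{terms.value}: ${total:,.2f} (${total / 100_000:.4f}/doc)")
```

Under these assumptions the same engagement ranges from $0.2500 to $0.3625 per document depending solely on the exception clause.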

Analyst Observation — GenAI-Assisted Review

The GenAI pricing market is operationally engaged but structurally immature. The concentration in hybrid and per-document models reflects practitioners and providers reaching for familiar pricing analogues while the technology matures. The $0.11–$0.50 per-document zone is emerging as a competitive market range, one that creates genuine economic pressure on traditional human review for appropriate document populations. The most important near-term challenge for the market is not the headline per-document or per-GB rate but the hidden cost variables: exception document handling, quality control overhead, model retraining requirements, and the total cost of ownership of integrating GenAI review into existing workflows. One survey respondent offered a perspective worth placing on record: many vendors are still determining their AI pricing strategies and are rushing to market to capture first-mover advantage or market share, and token-based pricing pressures may cause AI solution costs to increase materially in the future absent significant reductions in GPU infrastructure costs. This caution deserves attention as buyers evaluate multi-year GenAI review commitments.

Key Takeaways — Section 4

Hybrid and per-document models are the dominant GenAI pricing structures, each at 28.3% — the market has converged on document-level units but not uniform delivery structures.
The $0.11–$0.50 per-document range is the emerging competitive zone for GenAI-assisted review, with direct economic implications for traditional human review.
Per-token pricing has not been widely passed to buyers (5.7%) — providers are absorbing LLM cost variability for now.
Outcome-based GenAI pricing is theoretically compelling but operationally undeveloped; 79.2% of respondents have no applicable experience.
Exception document handling is an underappreciated contract risk: 39.6% don't know how their agreements address it, and no standard approach has emerged.

Conclusion and Strategic Implications

The Winter 2026 eDiscovery Pricing Survey paints a picture of a market undergoing several transitions at once: forensic services have found stable pricing floors; processing and hosting have bifurcated between commoditized infrastructure and differentiated analytics tiers; document review is experiencing pricing model fragmentation as AI alternatives create new economic reference points; and GenAI-assisted review is operationally deployed but commercially immature in its pricing structures.

For eDiscovery Buyers and Legal Operations Professionals

The $250–$350 per hour range for forensic collection provides a reliable negotiation baseline, but buyers should build explicit rate schedules covering the analysis and testimony phases, where rates commonly reach $350–$550 per hour and expert testimony exceeds $550 per hour for more than a quarter of respondents. Processing and hosting negotiations should move beyond per-GB benchmarks for analytics-enabled and TAR-related services, where bundled models increasingly dominate. For document review, the critical action item is requiring per-document rate transparency even when hourly billing is the primary model, so that costs can be genuinely modeled against AI review alternatives.

Corporate legal operations professionals face a distinct version of these challenges. Unlike law firms that pass eDiscovery costs to clients, in-house legal departments absorb them entirely, making pricing transparency a budget integrity issue, not just a negotiation tactic. The hosting commoditization finding (54.7% below $10/GB/month for basic hosting) and the user licensing transition (34.0% of respondents on alternative models) both represent leverage points that legal operations teams can use directly in enterprise vendor negotiations. The project management escalation finding (26.4% above $200/hour) warrants particular attention for in-house teams managing multi-matter portfolios: as PM rates rise with engagement complexity, the cost of inadequate internal scoping and vendor coordination compounds.
Corporate legal operations teams are well positioned to offset this by investing in internal eDiscovery program management capability rather than outsourcing all coordination to vendor project managers at premium rates. For GenAI-assisted review engagements, two contractual priorities stand out: first, obtain explicit pricing for exception documents rather than accepting provider discretion; second, require specificity on what is included in per-document or per-GB GenAI rates to enable accurate total-cost modeling. The $0.11–$0.50 per-document range is commercially viable for appropriate document populations, but hidden costs can erode that advantage quickly if not addressed in the agreement.

For eDiscovery Service Providers and Technology Vendors

The survey data suggests that buyers are engaging with GenAI pricing at a level of sophistication that requires providers to move beyond introductory pricing structures. The prevalence of hybrid models reflects buyer uncertainty as much as provider flexibility, and that uncertainty is not sustainable as GenAI review becomes a standard engagement component rather than a premium add-on. Providers who develop clear, reproducible pricing structures with transparent exception handling will differentiate themselves in a market where 39.6% of respondents currently report no visibility into this critical cost variable. The trajectory of outcome-based pricing deserves attention. While only a small minority of respondents currently have exposure to these models, the direction of the market, toward accountability for AI review quality and not just delivery, suggests that providers who invest in outcome measurement frameworks now will be better positioned as client sophistication increases.

Looking Ahead: Open Questions for the Evolving eDiscovery Pricing Landscape

Several questions are worth watching in future survey cycles. Will per-token pricing migrate from provider cost basis to buyer-facing billing as LLM economics become more visible? Will outcome-based pricing develop standardized frameworks, or remain bespoke indefinitely? Will the onsite/remote premium for forensic collection and attorney review compress as remote delivery tools mature further? And will the exception document handling gap in GenAI contracts become a litigation issue that forces market standardization? The Pricing Pulse series will continue to track these dynamics. The Winter 2026 results establish a pricing baseline at a pivotal moment, one against which future surveys will be measured as generative AI transforms both the economics and the practice of eDiscovery.

Research Methodology Note

The Winter 2026 eDiscovery Pricing Survey was designed and administered by ComplexDiscovery OÜ in partnership with the Electronic Discovery Reference Model (EDRM) as part of the Pricing Pulse research series. The survey was conducted via an online form distributed through ComplexDiscovery's professional community and partner networks. The survey period ran from December 2025 through February 2026 and closed with a final respondent cohort of 53 individuals. The survey comprised 25 questions in total: pricing questions organized across four service categories (forensic collection and examination, data processing and hosting, document review, and GenAI-assisted review) plus three respondent classification questions addressing geography, business segment, and primary function. Response options were structured as defined ranges rather than open-ended numeric inputs to facilitate comparative analysis and protect respondent pricing confidentiality.

All responses represent self-reported market observations and practitioner experience. Results should be interpreted as directional market intelligence reflecting current practitioner perceptions of prevailing pricing, not as verified transaction records or audited benchmarks. The U.S.-centric geographic distribution (92.5%) should be taken into account when applying findings to non-U.S. markets. ComplexDiscovery OÜ maintains editorial independence in the analysis and publication of survey results. Individual respondent data is treated as confidential; only aggregated findings are reported.

ComplexDiscovery and the Electronic Discovery Reference Model (EDRM) thank the 53 practitioners and professionals who contributed their time and market knowledge to this research. Organizations and individuals interested in participating in future Pricing Pulse surveys are encouraged to connect with ComplexDiscovery at complexdiscovery.com.

© 2026 ComplexDiscovery OÜ. All rights reserved.
Published on ComplexDiscovery.com. Conducted in partnership with the Electronic Discovery Reference Model (EDRM). The Pricing Pulse is an ongoing research series examining pricing dynamics across the eDiscovery market.

News Source

Rob Robinson and Holley Robinson, ComplexDiscovery OÜ, "Winter 2026 eDiscovery Pricing Survey," February 2026.


Assisted by GAI and LLM Technologies

Source: ComplexDiscovery OÜ

ComplexDiscovery’s mission is to enable clarity for complex decisions by providing independent, data‑driven reporting, research, and commentary that make digital risk, legal technology, and regulatory change more legible for practitioners, policymakers, and business leaders.


A Complete Analysis of the Winter 2026 eDiscovery Pricing Survey

ComplexDiscovery Staff

Executive Summary

About the Survey

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Primary-Function-Winter-2026.pdf" title="Survey Respondents by Primary Function - Winter 2026"]

[pdf-embedder url="https://complexdiscovery.com/wp-content/uploads/2026/02/Survey-Respondents-by-Geographic-Region-Winter-2026.pdf" title="Survey Respondents by Geographic Region - Winter 2026"]

Section 1: Forensic Collection, Examination, and Testimony Pricing

$250–$350/hour is the market anchor for both onsite and remote forensic collection (56.6% each).
Onsite collection carries a measurable premium: 20.8% report >$350/hour vs. 5.7% for remote.
Mobile device collection rates have converged with computer collection at the upper tier (both ~50% report >$350/device).
Investigation, analysis, and report generation rates escalate to $350–$550/hour for 54.7% of respondents.
Expert witness testimony exceeds $550/hour for 26.4% — the highest proportion across all survey categories.
