Content Assessment: Cybersecurity Challenges for Artificial Intelligence: Considering the AI Lifecycle
Information - 95%
Insight - 95%
Relevance - 95%
Objectivity - 95%
Authority - 100%
A short percentage-based assessment of the qualitative benefit of the recently published European Union Agency for Cybersecurity (ENISA) report on cybersecurity challenges for artificial intelligence.
Editor’s Note: The European Union Agency for Cybersecurity, ENISA, is the Union’s agency dedicated to achieving a high common level of cybersecurity across Europe. In December of 2020, ENISA published the report AI Cybersecurity Challenges – Threat Landscape for Artificial Intelligence. The report presents the Agency’s active mapping of the AI cybersecurity ecosystem and its Threat Landscape. As part of the report, a generic lifecycle reference model for AI is provided to allow for a structured and methodical approach to understanding the different facets of AI. This generic AI lifecycle may be beneficial for legal, business, and information security professionals in the eDiscovery ecosystem beginning to consider cybersecurity and its relationship with AI.
AI Cybersecurity Challenges – European Union Agency for Cybersecurity
Report Extract on AI Lifecycle Shared with Permission*
AI Lifecycle Phases
Figure – AI Lifecycle Generic Reference Model
In this section, we provide a short definition for each stage of the AI Lifecycle and recap the individual steps it involves (“Phase in a Nutshell”).
Business Goal Definition
Prior to carrying out any AI application/system development, it is important that the user organization fully understand the business context of the AI application/system and the data required to achieve the AI application’s business goals, as well as the business metrics to be used to assess the degree to which these goals have been achieved.
Business Goal Definition Phase in a Nutshell: Identify the business purpose of the AI application/system. Link the purpose with the question to be answered by the AI model to be used in the application/system. Identify the model type based on the question.
Data Collection/Ingestion
Data Ingestion is the AI lifecycle stage where data is obtained from multiple sources (raw data may be of any form, structured or unstructured) to compose multi-dimensional data points, called vectors, for immediate use or for storage in order to be accessed and used later. Data Ingestion lies at the basis of any AI application. Data can be ingested directly from its sources in a real-time, continuous fashion, also known as streaming, or by importing data batches, where data is imported periodically in large macro-batches or in small micro-batches.
Different ingestion mechanisms can be active simultaneously in the same application, synchronizing or decoupling batch and stream ingestion of the same data flows. Ingestion components can also specify data annotation, i.e., whether ingestion is performed with or without metadata (data dictionary, or ontology/taxonomy of the data types). Often, access control operates during data ingestion, modeling the privacy status of the data (personal/non-personal data), choosing suitable privacy-preserving techniques, and taking into account the achievable trade-off between privacy impact and analytic accuracy. Compliance with the applicable EU privacy and data protection legal framework needs to be ensured in all cases.
The privacy status assigned to data is used to define the AI application Service Level Agreement (SLA) in accordance with the applicable EU privacy and data protection legal framework, including, among other things, the possibility of inspection/auditing by competent regulatory authorities (such as Data Protection Authorities). It is important to remark that, in ingesting data, an IT governance conflict may arise. On the one hand, data is compartmentalized by its owners in order to ensure access control and privacy protection; on the other hand, it must be integrated in order to enable analytics. Often, different policies and policy rules apply to items of the same category. For multimedia data sources, access protocols may even follow a Digital Rights Management (DRM) approach where proof-of-hold must first be negotiated with license servers. It is the responsibility of the AI application designer to make sure that ingestion is done respecting the data providers’ policies on data usage and the applicable EU privacy and data protection legal framework.
Data Collection/Ingestion Phase in a Nutshell: Identify the input (dynamic) data to be collected and the corresponding context metadata. Organize ingestion according to the AI application requirements, importing data in a stream, batch or multi-modal fashion.
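As an illustration of the batch and micro-batch ingestion described above, the following Python sketch groups a record stream into micro-batches; the records, field names, and batch size are hypothetical, not taken from the report.

```python
from itertools import islice

def ingest_micro_batches(stream, batch_size):
    """Group a (possibly unbounded) record stream into micro-batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        yield batch

# Each record pairs a data value with context metadata, as the phase suggests.
records = [{"value": v, "source": "sensor-1"} for v in range(7)]
batches = list(ingest_micro_batches(records, batch_size=3))
print([len(b) for b in batches])  # → [3, 3, 1]
```

The same generator works unchanged whether the stream is a finite batch import or a long-running feed, which is why streaming and batch ingestion can share plumbing in one application.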
Data Validation/Exploration
Data Exploration is the stage where insights start to be drawn from ingested data. While it may be skipped in some AI applications where the data is well understood, it is usually a very time-consuming phase of the AI lifecycle. At this stage, it is important to understand the type of data that was collected. A key distinction must be drawn between the different possible types of data, with numerical and categorical being the most prominent categories, alongside multimedia data (e.g., image, audio, video, etc.). Numerical data lends itself to plotting and allows for computing descriptive statistics and verifying whether the data fits simple parametric distributions like the Gaussian one. Missing data values can also be detected and handled at the exploration stage. Categorical variables are those that have two or more categories but no intrinsic order. If a variable has a clear ordering, it is considered an ordinal variable.
Data Validation/Exploration in a Nutshell: Verify whether the data fit a known statistical distribution, either by component (mono-variate distributions) or as vectors (multi-variate distribution). Estimate the corresponding statistical parameters.
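A minimal Python illustration of this exploration step, using purely illustrative values, computes descriptive statistics on a numerical column and counts its missing values:

```python
import statistics

# Hypothetical ingested numerical column with one missing value (None).
column = [4.1, 3.9, None, 4.4, 4.0, 3.8]

observed = [x for x in column if x is not None]   # drop missing values
missing = len(column) - len(observed)

mean = statistics.mean(observed)     # descriptive statistics on the
stdev = statistics.stdev(observed)   # observed values only

print(f"missing={missing}, mean={mean:.2f}, stdev={stdev:.2f}")
# → missing=1, mean=4.04, stdev=0.23
```

Plotting a histogram of `observed` against a Gaussian with these estimated parameters would be the natural next step when checking the fit to a parametric distribution.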
Data Pre-processing
The data pre-processing stage employs techniques to cleanse, integrate and transform the data. This process aims at improving data quality, which will improve the performance and efficiency of the overall AI system by saving time during the analytic models’ training phase and by promoting better quality of results. Specifically, the term data cleaning designates techniques to correct inconsistencies, remove noise and anonymize/pseudonymize data.
Data integration puts together data coming from multiple sources, while data transformation prepares the data for feeding an analytic model, typically by encoding it in a numerical format. A typical encoding is one-hot encoding used to represent categorical variables as binary vectors. This encoding first requires that the categorical values be mapped to integer values. Then, each integer value is represented as a binary vector that is all zero values except the position of the integer, which is marked with a 1.
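The two-step one-hot encoding described above can be sketched in Python (the category values are illustrative):

```python
def one_hot(values):
    # Step 1: map each categorical value to an integer index.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    # Step 2: represent each integer as a binary vector that is all
    # zeros except for a 1 at the integer's position.
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

# Categories sort to ['blue', 'green', 'red'], so 'red' maps to position 2.
print(one_hot(["red", "green", "red", "blue"]))
# → [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```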
Once converted to numbers, data can be subject to further types of transformation: re-scaling, standardization, normalization, and labeling. At the end of this process, a numerical data set is obtained, which will be the basis for training, testing and evaluating the AI model.
Since having a large enough dataset is one of the key success factors when properly training a model, it is common to apply data augmentation techniques to training datasets that are too small, increasing both the quantity of data and the diversity of scenarios covered. Data augmentation usually consists in applying transformations known to be label-preserving, i.e., the model should not change its output (namely, its prediction) when presented with the transformed data items. For instance, a training dataset can be extended with scaled or rotated versions of images already in that dataset; when processing text, a word can be replaced by a synonym. Even when the training dataset is large enough, data augmentation can improve the final trained model, and in particular its robustness to benign perturbations. One task where data augmentation is used by default is image classification, where data can be augmented by applying, for instance, translations, rotations and blurring filters.
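The synonym-replacement augmentation mentioned above can be sketched in Python; the synonym table here is hand-written and purely illustrative (a real pipeline would draw on a lexical resource):

```python
# Hypothetical synonym table; illustrative only.
SYNONYMS = {"quick": "fast", "unhappy": "sad"}

def augment_by_synonym(sentence):
    """Label-preserving augmentation: produce one variant per
    replaceable word, swapping it for its synonym."""
    words = sentence.split()
    variants = []
    for i, w in enumerate(words):
        if w in SYNONYMS:
            variants.append(" ".join(words[:i] + [SYNONYMS[w]] + words[i + 1:]))
    return variants

print(augment_by_synonym("the quick fox was unhappy"))
# → ['the fast fox was unhappy', 'the quick fox was sad']
```

Each variant keeps the original label, so the augmented items can simply be appended to the training set.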
Data pre-processing in a Nutshell: Convert ingested data to a metric (numerical) format, integrate data from different sources, handle missing/null values by interpolation, densify to reduce data sparsity, de-noise, filter outliers, change representation interval, anonymize/pseudonymize data, augment data.
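As a small Python illustration of two of the transformations listed above, the following sketch re-scales a column to the [0, 1] interval and standardizes it to zero mean and unit standard deviation (the values are illustrative):

```python
import statistics

data = [2.0, 4.0, 6.0, 8.0]

# Min-max re-scaling to the [0, 1] interval.
lo, hi = min(data), max(data)
rescaled = [(x - lo) / (hi - lo) for x in data]

# Standardization: zero mean, unit (sample) standard deviation.
mu, sigma = statistics.mean(data), statistics.stdev(data)
standardized = [(x - mu) / sigma for x in data]

print([round(x, 2) for x in rescaled])  # → [0.0, 0.33, 0.67, 1.0]
```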
Feature Selection
Feature Selection (in general, feature engineering) is the stage where the number of components or features (also called dimensions) composing each data vector is reduced, by identifying the components believed to be the most meaningful for the AI model. The result is a reduced dataset, as each data vector has fewer components than before. Besides the reduction in computational cost, feature selection can yield more accurate models.
Additionally, models built on top of lower-dimensional data are more understandable and explainable. This stage can also be embedded in the model building phase (for instance when processing image or speech data), to be discussed in the next section.
Feature selection in a Nutshell: Identify the dimensions of the data set that account for a global parameter, e.g., the overall variance of the labels. Project the data set along these dimensions, discarding the others.
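A minimal sketch of variance-based feature selection in Python, assuming a small illustrative data set: keep the k highest-variance dimensions and discard the rest.

```python
import statistics

# Each row is a data vector; columns are features (dimensions).
rows = [[1.0, 5.0, 0.1],
        [2.0, 5.0, 0.1],
        [3.0, 5.0, 0.2],
        [4.0, 5.0, 0.1]]

def select_features(rows, k):
    """Keep the k columns with the highest variance; drop the others."""
    n_cols = len(rows[0])
    variances = [statistics.pvariance([r[c] for r in rows])
                 for c in range(n_cols)]
    keep = sorted(range(n_cols), key=lambda c: variances[c], reverse=True)[:k]
    keep.sort()  # preserve the original column order
    return [[r[c] for c in keep] for r in rows]

print(select_features(rows, k=1))  # → [[1.0], [2.0], [3.0], [4.0]]
```

Column 1 is constant and column 2 barely varies, so only column 0 survives; real feature engineering would of course weigh variance against relevance to the labels.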
AI Model Selection
This stage performs the selection/building of the best AI model or algorithm for analyzing the data. It is a difficult task, often subject to trial and error. Based on the business goal and the type of available data, different types of AI techniques can be used. The three commonly identified major categories are supervised learning, unsupervised learning, and reinforcement learning models. Supervised techniques deal with labeled data: the AI model is used to learn the mapping between input examples and the target outputs.
Supervised models can be designed as Classifiers, whose aim is to predict a class label, and Regressors, whose aim is to predict a numerical value function of the inputs. Here some common algorithms are Support Vector Machines, Naïve Bayes, Hidden Markov Model, Bayesian networks, and Neural Networks.
Unsupervised techniques use unlabelled training data to describe and extract relations from it, with the aim of organizing it into clusters, highlighting associations within the input data space, summarizing the distribution of the data, or reducing data dimensionality (this topic was already addressed as a preliminary step for data preparation in the section on feature selection). Reinforcement learning maps situations to actions, by learning behaviors that will maximize a desired reward function.
While the type of training data, labeled or not, is key for the type of technique to be used and selected, models may also be built from scratch (although this is rather unlikely), with the data scientist designing and coding the model using the inherent software engineering techniques, or built by composing existing methods. It is important to remark that model selection (namely, choosing the model adapted to the data) may trigger further transformation of the input data, as different AI models require different numerical encodings of the input data vectors.
Generally speaking, selecting a model also includes choosing its training strategy. In the context of supervised learning for example, training involves computing (a learning function of) the difference between the model’s output when it receives each training set data item D as input, and D’s label. This result is used to modify the model in order to decrease the difference.
Many training algorithms for error minimization are available, most of them based on gradient descent. Training algorithms have their own hyperparameters, including the function to be used to compute the model error (e.g., mean squared error), and the batch size, i.e., the number of labeled samples to be fed to the model to accumulate a value of the error to be used for adapting the model itself.
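As an illustrative sketch of gradient descent with a mean-squared-error loss and a batch-size hyper-parameter, the following Python snippet fits a one-parameter linear model; the data and hyper-parameter values are hypothetical:

```python
# Training data generated from y = 2x; training should recover w ≈ 2.
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

w = 0.0                # model parameter (weight), learned from data
learning_rate = 0.01   # training hyper-parameter
batch_size = 2         # hyper-parameter: samples accumulated per update

for epoch in range(200):
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of the mean squared error w.r.t. w over this batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad  # step against the gradient

print(round(w, 3))  # → 2.0
```

Note how the error function (mean squared error here), the learning rate, and the batch size are all fixed before training begins; they shape the descent but are not themselves learned.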
AI Model Selection in a Nutshell: Choose the type of AI model suitable for the application. Encode the data input vectors to match the model’s preferred input format.
AI Model Training
Having selected an AI model, which in the context of this reference model mostly refers to a Machine Learning (ML) model, the training phase of the AI system commences. In the context of supervised learning, the selected ML model must go through a training phase, where internal model parameters such as weights and biases are learned from the data. This allows the model to gain understanding of the data being used and thus become more capable of analyzing it. Again, training involves computing (a function of) the difference between the model’s output when it receives each training set data item D as input, and D’s label. This result is used to modify the model in order to decrease the difference between the inferred result and the desired result, and thus progressively leads to more accurate, expected results.
The training phase will feed the ML model with batches of input vectors and will use the selected learning function to adapt the model’s internal parameters (weights and biases) based on a measure (e.g., linear, quadratic, log loss) of the difference between the model’s output and the labels. Often, the available data set is partitioned at this stage into a training set, used for setting the model’s parameters, and a test set, where evaluation criteria (e.g., error rate) are only recorded in order to assess the model’s performance outside the training set. Cross-validation schemes randomly partition a data set multiple times into a training and a test portion of fixed sizes (e.g., 80% and 20% of the available data) and then repeat the training and validation phases on each partition.
AI Model Training in a Nutshell: Apply the selected training algorithm with the appropriate parameters to modify the chosen model according to the training data. Validate the model training on the test set according to a cross-validation strategy.
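The train/test split and cross-validation partitioning described above can be sketched in Python; the data set here is just a list of indices, and the 80/20 split mirrors the example in the text:

```python
import random

random.seed(0)
points = list(range(100))   # stand-in for a labeled data set
random.shuffle(points)

# 80% / 20% train-test split, as in the text's example.
split = int(0.8 * len(points))
train, test = points[:split], points[split:]

def k_fold(items, k):
    """Yield (training, validation) pairs; each item appears in
    exactly one validation fold across the k rounds."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, f in enumerate(folds) if j != i for x in f]
        yield training, validation

sizes = [(len(tr), len(va)) for tr, va in k_fold(train, k=5)]
print(sizes)  # each round: 64 training items, 16 validation items
```

The held-out `test` portion stays untouched throughout; only the folds of `train` rotate between training and validation roles.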
AI Model Tuning
Model tuning usually overlaps with model training, since tuning is usually considered part of the training process. We opted to separate the two stages in the AI lifecycle to highlight the differences in terms of functional operation, although it is most likely that in the majority of AI systems they will both be part of the training process.
Certain parameters define high-level concepts about the model, such as its learning function or modality, and cannot be learned from the input data. These special parameters, often called hyper-parameters, need to be set up manually, although they can under certain circumstances be tuned automatically by searching the model parameters’ space. This search, called hyper-parameter optimization, is often performed using classic optimization techniques like Grid Search, but Random Search and Bayesian optimization can also be used. It is important to remark that the Model Tuning stage uses a special data set (often called the validation set), distinct from the training and test sets used in the previous stages. An evaluation phase can also be considered to estimate the output’s limits and to assess how the model would behave in extreme conditions, for example, by using wrong/unsafe data sets. It should also be noted that, depending on the number of hyper-parameters to be adjusted, trying all possible combinations may simply not be feasible.
AI Model Tuning in a Nutshell: Apply model adaptation to the hyper-parameters of the trained AI model using a validation data set, according to deployment conditions.
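A minimal Grid Search sketch in Python, assuming a hypothetical validation-error function; a real search would train a model per combination and score it on the validation set:

```python
from itertools import product

def validation_error(learning_rate, batch_size):
    """Toy, hand-made error surface with a minimum at (0.1, 32);
    a stand-in for training + validation of a real model."""
    return (learning_rate - 0.1) ** 2 + ((batch_size - 32) / 32) ** 2

# The grid: every combination of these hyper-parameter values is tried.
grid = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}

best = min(product(grid["learning_rate"], grid["batch_size"]),
           key=lambda combo: validation_error(*combo))
print(best)  # → (0.1, 32)
```

With 3 values per hyper-parameter the grid has 9 combinations; since the grid size grows multiplicatively with each added hyper-parameter, exhaustive search quickly becomes infeasible, which is exactly the limitation noted above.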
Transfer Learning
In this phase, the user organization sources a pre-trained and pre-tuned AI model and uses it as a starting point for further training to achieve faster and better convergence. This is commonly the case when little data is available for training. It should be noted that all steps described above (tuning, testing, etc.) also apply to transfer learning. Moreover, since transfer learning usually serves as the starting point of the training algorithm, it can be considered part of the model training phase. To ensure wider scope, we treat transfer learning as a distinct phase in the AI lifecycle presented here.
Transfer Learning in a Nutshell: Source a pre-trained AI model in the same application domain, and apply additional training to it, as needed to improve its in-production accuracy.
Model Deployment
A Machine Learning model will bring knowledge to an organization only when its predictions become available to final users. Deployment is the process of taking a trained model and making it available to the users.
Model Deployment in a Nutshell: Generate an in-production incarnation of the model as software, firmware or hardware. Deploy the model incarnation to edge or cloud, connecting in-production data flows.
Model Maintenance
After deployment, AI models need to be continuously monitored and maintained to handle concept changes and potential concept drifts that may arise during their operation. A concept change happens when the meaning of an input to the model (or of an output label) changes, e.g., due to modified regulations. A concept drift occurs when the change is not drastic but emerges slowly. Drift is often due to sensor encrustment, i.e., slow evolution over time in sensor resolution (the smallest detectable difference between two values) or in the overall representation interval.

A popular strategy to handle model maintenance is window-based relearning, which relies on recent data points to build an ML model. Another useful technique for AI model maintenance is back testing. In most cases, the user organization knows what happened in the aftermath of the AI model adoption and can compare model predictions to reality. This highlights concept changes: if an underlying concept switches, organizations see a decrease in performance.

Another way of detecting concept drifts may involve statistically characterizing the input dataset used for training the AI model, so that this training dataset can be compared to the current input data in terms of statistical properties. Significant differences between the datasets may indicate potential concept drifts requiring a relearning process, even before the output of the system is significantly affected. In this way, retraining/relearning processes, which may be potentially time- and resource-consuming, can be carried out only when required instead of periodically, as in the above-mentioned window-based relearning strategies. Model maintenance also reflects the need to monitor the business goals and assets that might evolve over time and accordingly influence the model itself.
Model Maintenance in a Nutshell: Monitor the ML inference results of the deployed AI model, as well as the input data received by the model, in order to detect possible concept changes or drifts. Retrain the model when needed.
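One way to realize the statistical drift check described above is sketched below in Python; the data, threshold, and scoring rule are illustrative assumptions, not prescriptions from the report:

```python
import statistics

def drift_score(train_sample, live_sample):
    """Absolute difference between the two means, measured in units of
    the training sample's standard deviation (a crude drift signal)."""
    mu = statistics.mean(train_sample)
    sigma = statistics.stdev(train_sample)
    return abs(statistics.mean(live_sample) - mu) / sigma

train = [10.0, 10.5, 9.5, 10.2, 9.8]     # inputs seen at training time
stable = [10.1, 9.9, 10.3]               # in-production inputs, no drift
drifted = [12.9, 13.2, 13.1]             # in-production inputs, drifted

THRESHOLD = 3.0  # hypothetical trigger for retraining
print(drift_score(train, stable) > THRESHOLD)   # → False
print(drift_score(train, drifted) > THRESHOLD)  # → True
```

Only when the score crosses the threshold would a (potentially costly) relearning process be launched, rather than retraining on a fixed schedule.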
Business Understanding
Building an AI model is often expensive and always time-consuming. It poses several business risks, including failing to have a meaningful impact on the user organization as well as missing in-production deadlines after completion. Business understanding is the stage at which companies that deploy AI models gain insight into the impact of AI on their business and try to maximize the probability of success.
Business Understanding in a Nutshell: Assess the value proposition of the deployed AI model. Estimate (before deployment) and verify (after deployment) its business impact.
ENISA Report – AI Cybersecurity Challenges*
*Shared with permission under Creative Commons – Attribution 4.0 International (CC BY 4.0) – license.