    2010-2018 ARCHIVED CONTENT
    You are viewing ARCHIVED CONTENT released online from 1 April, 2010 to August 24, 2018. Content in this archive site is NOT UPDATED, and links may not function. For current information, go to ComplexDiscovery.com.

    Predictive Coding One-Question Provider Implementation Survey – Initial Results

    Provided below for your consideration and use are the in-progress results of the One-Question Provider Implementation Survey launched by ComplexDiscovery on 3/3/13.

    The goal of the survey is to provide a specific and detailed look at the use of the technology-assisted review feature of predictive coding among leading eDiscovery providers as represented by those providers. The specific technologies highlighted in the one-question survey include:

    • Active Learning
    • Language Modeling
    • Latent Semantic Analysis
    • Linguistic Analysis
    • Naïve Bayesian Classifier
    • Nearest Neighbor Classifier
    • Probabilistic Latent Semantic Analysis
    • Relevance Feedback
    • Support Vector Machine
    • Other (Provider Machine Learning Approach Not Listed)

    The in-progress results consist of survey answers harvested directly from the online survey form as completed by provider representatives. Additional survey responses from the eDiscovery provider community will be added to this listing as they are completed.

    Additional responders are welcome and encouraged. Click here to go to survey.

    Note: The running results of a previously presented general survey on eDiscovery provider use of predictive coding are available for review. The initial 120-second survey contained six high-level questions related to the technology development, offering integration, machine learning approach, and sampling approach of providers in relation to predictive coding. The following one-question provider implementation survey was designed to build on the machine learning question from the initial general survey by providing additional and important layers of detail.

    Updated 9/16/2013

    Current Responders

    • @Legal
    • Altep
    • BIA
    • Catalyst Repository Systems
    • Content Analyst
    • D4
    • Daegis
    • Driven
    • Huron Legal
    • kCura
    • Kroll Ontrack
    • Liquid Litigation Management (LLM)
    • Nuix
    • Orange Legal Technologies
    • OrcaTec
    • Prolorem
    • Recommind
    • Servient
    • Symantec/Clearwell
    • TCDI
    • UBIC
    • Valora Technologies
    • Xerox Litigation Services

    Current Responses (By Provider)

    @Legal (CasePoint TAR) @AtLegal

    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Altep (Tag | Content Analyst) @Altep_Inc

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    BIA (BIA Predictive Coding Engine | Coda™) @BIA_eDiscovery

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Linguistic Analysis:  Linguists examine responsive and non-responsive documents to derive classification rules that maximize the correct classification of documents.
    • Naïve Bayesian Classifier:  A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents.  The system is naïve in the sense that it assumes that all words are independent of one another.
    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Catalyst Repository Systems (Insight Predict) @CatalystSecure

    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.

    Content Analyst (Conceptual Categorization) @Content_Analyst

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    D4 Discovery (Equivio Relevance) @D4Discovery

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Daegis (Acumen) @Daegis

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    Driven (START (Simplified Technology Assisted Review Tool)) (Content Analyst)

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.

    Huron Legal (Integrated Analytics | PureDiscovery) @HuronLegal

    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.

    kCura (Relativity Assisted Review | Content Analyst)

    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.

    Kroll Ontrack (Intelligent Review Technology (IRT) | Ontrack® Inview™)

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Other:  Logistic regression is well accepted by the computer science and information retrieval communities as a sound statistical modeling approach for data analysis and predictive modeling.  Logistic regression is a form of supervised learning, in that a logistic regression model is produced by “training” on a set of documents that have been manually categorized.  Once trained, the logistic regression model can be used to estimate the probability that a new document belongs to each of the possible categories. The model can use both content features such as words and phrases, and metadata features such as custodian, date, file type, and contextual information. Logistic regression models can be applied to data sets with millions of documents and billions of content features, and are one of the most effective approaches in a wide range of text and data mining tasks.

    Liquid Litigation Management (Technology-Assisted Review) @LLMinc

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    Nuix (Automatic Classification) @Nuix

    • Naïve Bayesian Classifier:  A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents.  The system is naïve in the sense that it assumes that all words are independent of one another.

    Orange Legal Technologies (Predictive Review | OrcaTec) @OrangeLT 

    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.

    OrcaTec (OrcaPredict) @OrcaTec

    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.

    Prolorem (Prolorem eDi)

    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Recommind (Predictive Coding | Axelerate) @Recommind

    • Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Servient (Servient Predictive Review) @Servient

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Linguistic Analysis:  Linguists examine responsive and non-responsive documents to derive classification rules that maximize the correct classification of documents.
    • Naïve Bayesian Classifier:  A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents.  The system is naïve in the sense that it assumes that all words are independent of one another.
    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Symantec (Clearwell Systems) (Transparent Predictive Coding) @SYMCeDiscovery

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    TCDI (Suggestive Coding | Content Analyst)

    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    UBIC (CJK TAR)

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.
    • Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.
    • Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Valora Technologies (Valora Auto Review Services) @ValoraTech

    • Other:  Probabilistic Hierarchical Context-Free Grammars approach to machine learning.

    Xerox Litigation Services (CategoriX) @Xerox_XLS

    • Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.
    • Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.
    • Linguistic Analysis:  Linguists examine responsive and non-responsive documents to derive classification rules that maximize the correct classification of documents.
    • Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.
    • Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.
    • Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    Current Responses (By Answer)

    Active Learning

    Active Learning: An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified.  In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line.  The line is moved if any of the presented documents has been misclassified.

    • Altep
    • BIA
    • Content Analyst
    • D4
    • Daegis
    • Driven
    • Kroll Ontrack
    • Liquid Litigation Management (LLM)
    • Servient
    • Symantec/Clearwell
    • UBIC
    • Xerox Litigation Services
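
    The Active Learning description above can be sketched in a few lines. This is a minimal illustration, not any provider's implementation: it assumes documents are already reduced to two hypothetical numeric features, that the current separating line is given as `weights` and `bias`, and that the line is moved with a simple perceptron-style update, which is one of several possible choices.

```python
def decision_value(x, weights, bias):
    """Signed score of a document: positive = responsive side of the line."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def select_for_review(pool, weights, bias, batch_size=2):
    """Pick the unreviewed documents closest to the current separating line."""
    return sorted(pool, key=lambda x: abs(decision_value(x, weights, bias)))[:batch_size]

def update_from_review(weights, bias, reviewed, learning_rate=0.1):
    """Move the line only when a reviewed document sits on the wrong side.

    `reviewed` holds (features, label) pairs, with label +1 for responsive
    and -1 for non-responsive. The perceptron-style step is an assumption.
    """
    for x, label in reviewed:
        if label * decision_value(x, weights, bias) <= 0:
            weights = [w + learning_rate * label * xi for w, xi in zip(weights, x)]
            bias += learning_rate * label
    return weights, bias

# Hypothetical pool of unreviewed documents and a hypothetical starting line.
POOL = [[5, 0], [0, 5], [3, 2], [2, 4]]
weights, bias = [1.0, -1.0], 0.0

batch = select_for_review(POOL, weights, bias)
# The reviewer codes both presented documents as non-responsive (-1);
# the first one was on the responsive side, so the line moves.
weights, bias = update_from_review(weights, bias, [(doc, -1) for doc in batch])
```

    Documents far from the line are never shown to the reviewer in this sketch, which is the point of the technique: reviewer effort goes to the documents the current model is least sure about.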

    Language Modeling

    Language Modeling:  A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents.  Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.

    • BIA
    • Orange Legal Technologies
    • OrcaTec
    • Servient
    • UBIC
    • Xerox Litigation Services
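
    As a rough sketch of the Language Modeling description above, the following builds one unigram word-occurrence model for responsive documents and one for non-responsive documents, then assigns a new document to whichever model accounts for its words better. The tiny training sets, the interpolation weight `lam`, and the smoothing against the whole collection are assumptions made for illustration; the survey does not say how any provider smooths or structures its models.

```python
import math
from collections import Counter

def word_counts(docs):
    """Pool word counts over a list of documents."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    return counts

def log_likelihood(text, class_counts, collection_counts, lam=0.7):
    """Log probability of the text under a class model mixed with a background model."""
    class_total = sum(class_counts.values())
    collection_total = sum(collection_counts.values())
    vocab_size = len(collection_counts)
    score = 0.0
    for word in text.lower().split():
        p_class = class_counts[word] / class_total
        # Add-one smoothing keeps unseen words from getting zero probability.
        p_background = (collection_counts[word] + 1) / (collection_total + vocab_size)
        score += math.log(lam * p_class + (1 - lam) * p_background)
    return score

def classify(text, responsive_docs, non_responsive_docs):
    """Return the category whose word model best accounts for the text."""
    responsive = word_counts(responsive_docs)
    non_responsive = word_counts(non_responsive_docs)
    collection = responsive + non_responsive
    scores = {
        "responsive": log_likelihood(text, responsive, collection),
        "non-responsive": log_likelihood(text, non_responsive, collection),
    }
    return max(scores, key=scores.get)

# Hypothetical coded examples.
RESPONSIVE = ["pricing agreement with competitor", "competitor pricing memo"]
NON_RESPONSIVE = ["lunch menu for friday", "friday parking notice"]

label = classify("competitor pricing update", RESPONSIVE, NON_RESPONSIVE)
```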

    Latent Semantic Analysis

    Latent Semantic Analysis:  A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    • Altep
    • BIA
    • Content Analyst
    • Driven
    • kCura
    • Liquid Litigation Management (LLM)
    • Servient
    • TCDI
    • UBIC

    Linguistic Analysis

    Linguistic Analysis:  Linguists examine responsive and non-responsive documents to derive classification rules that maximize the correct classification of documents.

    • BIA
    • Servient
    • Xerox Litigation Services
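
    A rule set of the kind the Linguistic Analysis description refers to might look like the sketch below. The rules here are invented for illustration only; in practice they would be written by linguists after examining coded documents, and nothing in the survey describes their actual form.

```python
import re

# Hypothetical hand-written rules: (pattern, category), checked in order.
RULES = [
    (re.compile(r"\b(price|pricing)\b.*\bcompetitor\b|\bcompetitor\b.*\b(price|pricing)\b", re.I),
     "responsive"),
    (re.compile(r"\b(lunch|parking|holiday party)\b", re.I),
     "non-responsive"),
]

def classify_by_rules(text, default="needs review"):
    """Return the category of the first rule that matches, else the default."""
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return default

label = classify_by_rules("Competitor raised pricing again")
```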

    Naïve Bayesian Classifier

    Naïve Bayesian Classifier:  A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents.  The system is naïve in the sense that it assumes that all words are independent of one another.

    • BIA
    • Nuix
    • Servient
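
    The Naïve Bayesian Classifier description above can be illustrated with a small sketch. It assumes whitespace tokenization, a handful of hypothetical coded documents, and add-one (Laplace) smoothing; none of these details come from the survey responses.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Count words per category and documents per category."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Pick the category with the highest log prior plus summed log word probabilities."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            # Each word contributes independently: this is the "naive" assumption.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical coded examples.
TRAINING_SET = [
    ("pricing agreement with competitor", "responsive"),
    ("competitor pricing memo", "responsive"),
    ("lunch menu for friday", "non-responsive"),
    ("friday parking notice", "non-responsive"),
]

model = train_naive_bayes(TRAINING_SET)
label = classify("competitor pricing update", model)
```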

    Nearest Neighbor Classifier

    Nearest Neighbor Classifier:  A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.

    • @Legal
    • BIA
    • Catalyst Repository Systems
    • Driven
    • Huron Legal
    • kCura
    • Servient
    • UBIC
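
    The Nearest Neighbor Classifier description above translates fairly directly into code. The sketch below assumes bag-of-words vectors and cosine similarity as the measure of "nearness"; the survey does not specify which similarity measure any provider uses, and the coded examples are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_neighbor_label(document, training_set):
    """Give the document the category of its most similar coded example."""
    doc_vec = vectorize(document)
    _, best_label = max(training_set, key=lambda ex: cosine(doc_vec, vectorize(ex[0])))
    return best_label

# Hypothetical coded examples.
TRAINING_SET = [
    ("merger pricing agreement with competitor", "responsive"),
    ("quarterly pricing discussion with competitor", "responsive"),
    ("office holiday party lunch menu", "non-responsive"),
    ("parking garage closed on friday", "non-responsive"),
]

label = nearest_neighbor_label("competitor pricing call notes", TRAINING_SET)
```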

    Probabilistic Latent Semantic Analysis

    Probabilistic Latent Semantic Analysis:  A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words.  PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

    • BIA
    • Daegis
    • Recommind
    • UBIC
    • Xerox Litigation Services

    Relevance Feedback

    Relevance Feedback:  A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.

    • BIA
    • Driven
    • kCura
    • Servient
    • Symantec/Clearwell
    • UBIC
    • Xerox Litigation Services
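
    One classic way to adjust search criteria from user feedback is Rocchio's formula, used here as a stand-in; the survey does not say which relevance feedback method any provider applies. The sketch treats the "criteria" as a weighted term vector and shifts it toward documents the reviewer marked relevant and away from those marked not relevant. The weights `alpha`, `beta`, and `gamma` and the sample documents are assumptions.

```python
from collections import Counter

def centroid(docs):
    """Average bag-of-words vector of a list of documents."""
    total = Counter()
    for text in docs:
        total.update(text.lower().split())
    return {word: count / len(docs) for word, count in total.items()} if docs else {}

def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a new term-weight vector adjusted by reviewer feedback."""
    rel = centroid(relevant)
    non = centroid(non_relevant)
    updated = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0.0)
                  + beta * rel.get(term, 0.0)
                  - gamma * non.get(term, 0.0))
        if weight > 0:  # terms pushed to zero or below are dropped
            updated[term] = weight
    return updated

# Hypothetical starting criteria and reviewer feedback.
query = {"pricing": 1.0}
updated = rocchio_update(
    query,
    relevant=["competitor pricing agreement"],
    non_relevant=["lunch pricing menu"],
)
```

    After the update, terms that co-occur with relevant documents (such as "competitor") enter the criteria, while terms seen only in non-relevant documents (such as "lunch") stay out.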

    Support Vector Machine

    Support Vector Machine:  A mathematical approach that seeks to find a line that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

    • @Legal
    • BIA
    • D4
    • Prolorem
    • Recommind
    • Servient
    • Symantec/Clearwell
    • UBIC
    • Xerox Litigation Services
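
    To make the Support Vector Machine description above concrete, the sketch below fits a separating line to a few documents described by two hypothetical features (counts of the words "pricing" and "lunch"). It trains a linear model by subgradient descent on the regularized hinge loss, which is one simple way to approximate a linear SVM and is an assumption here, not a description of any provider's solver. With more than two features the "line" becomes a hyperplane.

```python
def decision_value(x, weights, bias):
    """Signed score: positive = responsive side of the separating line."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Subgradient descent on L2-regularized hinge loss; labels are +1 or -1."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * decision_value(x, weights, bias)
            # Regularization shrinks the weights a little on every step.
            weights = [w - learning_rate * reg * w for w in weights]
            if margin < 1:
                # The document is inside the margin or misclassified: move the line.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

# Hypothetical features per document: [count of "pricing", count of "lunch"].
SAMPLES = [[3, 0], [2, 1], [0, 2], [1, 3]]
LABELS = [1, 1, -1, -1]  # +1 responsive, -1 non-responsive

weights, bias = train_linear_svm(SAMPLES, LABELS)
is_responsive = decision_value([4, 0], weights, bias) > 0
```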

    Other: Logistic Regression

    • Kroll Ontrack
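
    The logistic regression approach described in the Kroll Ontrack response can be sketched as follows. This is a minimal illustration under stated assumptions: three hypothetical features (two word counts and one metadata flag), a handful of manually coded documents, and plain stochastic gradient descent for training. It is not a description of the provider's actual system.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, weights, bias):
    """Estimated probability that a document is responsive."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train_logistic_regression(samples, labels, epochs=500, learning_rate=0.1):
    """Stochastic gradient descent on log loss; labels are 1 or 0."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict_probability(x, weights, bias)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical features per document:
# [count of "pricing", count of "lunch", 1 if custodian is in sales else 0]
SAMPLES = [[2, 0, 1], [1, 0, 1], [0, 2, 0], [0, 1, 0]]
LABELS = [1, 1, 0, 0]  # 1 responsive, 0 non-responsive

weights, bias = train_logistic_regression(SAMPLES, LABELS)
probability = predict_probability([3, 0, 1], weights, bias)
```

    Unlike the hard side-of-the-line decision of a separating-line classifier, the output here is a probability, which allows documents to be ranked or thresholded.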

    Other: Probabilistic Hierarchical Context-Free Grammars

    • Valora Technologies

    End of Survey Results
