Wed. Apr 24th, 2024
Learning curves for different example selection methods
ARCHIVED CONTENT
You are viewing ARCHIVED CONTENT released online between 1 April 2010 and 24 August 2018 or content that has been selectively archived and is no longer active. Content in this archive is NOT UPDATED, and links may not function.
 

By William Webber

The problem with agreeing to teach is that you have less time for blogging, and the problem with a hiatus in blogging is that the topic you were in the middle of discussing gets overtaken by questions of more immediate interest. I hope to return to the question of simulating assessor error in a later post, but first I want to talk about an issue that is attracting attention at the moment: how to select documents for training a predictive coding system.

The catalyst for this current interest is “Evaluation of Machine Learning Protocols for Technology Assisted Review in Electronic Discovery”, recently presented at SIGIR by Gord Cormack and Maura Grossman. In fact, there are two questions, somewhat conflated in Maura and Gord’s paper. The first is, should the initial seed set of training examples be selected by random or by judgmental sampling (for instance, based on a keyword query)? The second is, having seeded the training cycle, how should subsequent training examples be selected? I’ll tackle the second issue here; I hope to return to the first in a later post.

 

Have a Request?

If you have information or offering requests that you would like to ask us about, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.

 

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Midjourney, and DALL-E, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages published (initiated in late 2022).

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.