Fri. Mar 29th, 2024
ARCHIVED CONTENT
You are viewing ARCHIVED CONTENT released online between 1 April 2010 and 24 August 2018 or content that has been selectively archived and is no longer active. Content in this archive is NOT UPDATED, and links may not function.
 

Editor’s Note: A widely recognized expert in search and retrieval technology, Dr. Herb Roitblat shares explanations and thoughts on machine learning in this new article extract from his eDiscovery Science blog.

Extract from article by Herbert L. Roitblat, Ph.D.

Machine learning still stands on the three core features of representation, assessment, and optimization.  Because most systems are built from these same three ingredients, many machine learning systems tend to return approximately the same results.  Given a choice between cleverer algorithms and better-quality training data, it is often preferable to spend the effort on better data.
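The three features named above can be made concrete in a minimal sketch. It assumes a bag-of-words representation, uses training-set accuracy as the assessment, and uses stochastic gradient ascent on a logistic model's log-likelihood as the optimization. The function names `represent`, `assess`, and `optimize` and the four toy documents are hypothetical. Logistic regression stands in here for whatever model a given product actually uses.

```python
import math
from collections import Counter

# Representation (an assumption): a bag of lowercase words with counts.
def represent(text: str) -> Counter:
    return Counter(text.lower().split())

def probability(weights: dict, bias: float, doc: Counter) -> float:
    """P(category = 1 | doc) under a logistic model over word counts."""
    z = bias + sum(weights.get(word, 0.0) * count for word, count in doc.items())
    return 1.0 / (1.0 + math.exp(-z))

# Assessment: fraction of documents whose predicted category matches the code.
def assess(weights: dict, bias: float, docs: list, labels: list) -> float:
    correct = sum(
        (probability(weights, bias, doc) >= 0.5) == bool(label)
        for doc, label in zip(docs, labels)
    )
    return correct / len(docs)

# Optimization: stochastic gradient ascent on the log-likelihood.
# Each update raises or lowers the weight (importance) of the words in a document.
def optimize(docs: list, labels: list, epochs: int = 200, lr: float = 0.1):
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            error = label - probability(weights, bias, doc)
            bias += lr * error
            for word, count in doc.items():
                weights[word] = weights.get(word, 0.0) + lr * error * count
    return weights, bias

# Hypothetical toy training set: 1 = in the category, 0 = not.
texts = ["merger agreement signed", "contract terms for the merger",
         "lunch on friday", "friday softball game"]
labels = [1, 1, 0, 0]
docs = [represent(t) for t in texts]

weights, bias = optimize(docs, labels)
accuracy = assess(weights, bias, docs, labels)
print(accuracy)  # 1.0
```

Because each feature is a separate function, any one of them (a different representation, a different optimizer) can be swapped out while the other two stay fixed.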

Particularly in the context of categorizing documents, the choice of machine learning algorithm makes little difference to the ultimate outcome of the project.  The same algorithm can lead to good results or to bad results, depending on how it is used.  We had a project, for example, in which several very similar datasets were to be categorized.  Each set was handled by a different attorney who was to provide the training.  One attorney paid close attention to the problem, worked at making consistent decisions, and selected the training examples over a few days.  The system did really well on this set.  The other attorneys selected their training documents less systematically, choosing only a few documents per day with several days in between.  The very same system did poorly on these sets.

To summarize: Learning algorithms, particularly those used for document categorization, all work to maximize the probability that a document will be categorized correctly given its content.  The main tool machine learning provides for accomplishing this is adjusting the importance (weight) of each document element (for example, the words, or whatever representation is used).  Methods differ in how they represent the documents, and this difference can be critically important.  The quality of the training examples, primarily their consistency, is also a critically important part of machine learning.  It does not much matter how these documents are selected, provided that the examples are representative of those that need to be categorized, that they are consistently coded, and that the coding represents the actual desired code for each one.
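Because consistent coding matters so much, it can be worth screening a training set before any learning happens. The following is a hypothetical helper, not a feature of any particular product. It assumes a word-set representation and flags documents that look identical under that representation but received different codes. It catches only this crudest kind of inconsistency; near-duplicates coded differently, or codes that drift over several days of review, would need other checks.

```python
from collections import defaultdict

def represent(text: str) -> frozenset:
    # Hypothetical representation: the set of lowercase words in the document.
    return frozenset(text.lower().split())

def conflicting_codes(examples):
    """Group training examples by representation and return every group
    whose members were coded differently."""
    groups = defaultdict(list)
    for text, code in examples:
        groups[represent(text)].append((text, code))
    return [group for group in groups.values()
            if len({code for _, code in group}) > 1]

# Hypothetical training examples: (document text, reviewer's code).
training = [
    ("Merger agreement signed", "responsive"),
    ("merger agreement signed", "not responsive"),
    ("Friday softball game", "not responsive"),
]
for group in conflicting_codes(training):
    print(group)
```

On these three examples it reports one group: the two "merger agreement signed" documents, which differ only in capitalization but were given opposite codes.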

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.