Editor’s Update: Although this algorithm can be a useful starting point to understand concepts, it is worth noting that it can have trouble reaching the high recall levels we demand for eDiscovery. For more on this topic please see the excellent article TAR, Proportionality, and Bad Algorithms (1-NN) by Bill Dimm.

Extract from an article by Gregory Stein (Published on the Caches to Caches Blog)

Roughly a year and a half ago, I had the privilege of taking a graduate “Introduction to Machine Learning” course under the tutelage of the fantastic Professor Leslie Kaelbling. While I learned a great deal over the course of the semester, there was one minor point that she made to the class which stuck with me more than I expected it to at the time: before using a really fancy or sophisticated or “in-vogue” machine learning algorithm to solve your problem, try a simple Nearest Neighbor Search first.

Let’s say I gave you a bunch of data points, each with a location in space and a value, and then asked you to predict the value of a new point in space. What would you do? Perhaps the values of you data are binary (just +s and -s) and you’ve heard of Support Vector Machines. Should you give that a shot? Maybe the values are continuously valued (anything on the real number line) and you feel like giving Linear Regression a whirl. Or you’ve heard of the ongoing Deep Learning Revolution and feel sufficiently bold as to tackle this problem with a Neural Net.

…or you could simply find the closest point in your dataset to the one you’re interested in and offer up the value of this “nearest neighbor”. A Nearest Neighbor Search is perhaps the simplest procedure you might conceive of if presented with a machine-learning-type problem while under the influence of some sort of generalized “veil of ignorance”. Though there exist slightly more complicated variations in the algorithm, the basic principle of all of them is effectively the same.

Additional Reading

Source: ComplexDiscovery

 

Have a Request?

If you have information or offering requests that you would like to ask us about, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.

 

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, DALL-E2, Grammarly, Midjourney, and Perplexity, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages published (initiated in late 2022).

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.