Have You Tried Using a ‘Nearest Neighbor Search’?

Editor’s Update: Although this algorithm can be a useful starting point to understand concepts, it is worth noting that it can have trouble reaching the high recall levels we demand for eDiscovery. For more on this topic, please see the excellent article “TAR, Proportionality, and Bad Algorithms (1-NN)” by Bill Dimm.

Extract from an article by Gregory Stein (Published on the Caches to Caches Blog)

Roughly a year and a half ago, I had the privilege of taking a graduate “Introduction to Machine Learning” course under the tutelage of the fantastic Professor Leslie Kaelbling. While I learned a great deal over the course of the semester, there was one minor point that she made to the class which stuck with me more than I expected it to at the time: before using a really fancy or sophisticated or “in-vogue” machine learning algorithm to solve your problem, try a simple Nearest Neighbor Search first.

Let’s say I gave you a bunch of data points, each with a location in space and a value, and then asked you to predict the value of a new point in space. What would you do? Perhaps the values of your data are binary (just +s and -s) and you’ve heard of Support Vector Machines. Should you give that a shot? Maybe the values are continuous (anything on the real number line) and you feel like giving Linear Regression a whirl. Or you’ve heard of the ongoing Deep Learning Revolution and feel sufficiently bold as to tackle this problem with a Neural Net.

…or you could simply find the closest point in your dataset to the one you’re interested in and offer up the value of this “nearest neighbor”. A Nearest Neighbor Search is perhaps the simplest procedure you might conceive of if presented with a machine-learning-type problem while under the influence of some sort of generalized “veil of ignorance”. Though there exist slightly more complicated variations in the algorithm, the basic principle of all of them is effectively the same.
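The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the article author's code: the function name `nearest_neighbor_predict`, the use of Euclidean distance, and the toy dataset are all assumptions made for the example.

```python
import math


def nearest_neighbor_predict(points, values, query):
    """Return the value attached to the stored point closest to `query`.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    if not points or len(points) != len(values):
        raise ValueError("points and values must be non-empty and the same length")
    # Brute-force scan: compare the query against every stored point.
    best_index = min(range(len(points)), key=lambda i: math.dist(points[i], query))
    return values[best_index]


# Toy dataset (made up for illustration): 2-D locations with binary labels.
points = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
values = ["-", "-", "+", "+"]

print(nearest_neighbor_predict(points, values, (0.5, 0.2)))  # nearest stored point is (0, 0)
print(nearest_neighbor_predict(points, values, (4.4, 4.9)))  # nearest stored point is (5, 5)
```

The same function works unchanged when the values are real numbers rather than labels, since it simply returns whatever value is stored with the closest point. The scan touches every stored point for each query, which is fine for small datasets but is the first thing to reconsider as the data grows.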

Source: ComplexDiscovery
