Tue. Apr 23rd, 2024

Content Assessment: Butter Knives and Chainsaws? Balancing Traditional Algorithms and Artificial Intelligence

Information - 93%
Insight - 95%
Relevance - 94%
Objectivity - 93%
Authority - 95%

94%

Excellent

A short percentage-based assessment of the qualitative benefit of the recent article by Chief Data Scientist John Brewer on using artificial intelligence to solve problems already addressed by traditional algorithms.

Editor’s Note: From time to time, ComplexDiscovery highlights publicly available or privately purchasable announcements, content updates, and research from cyber, data, and legal discovery providers, research organizations, and ComplexDiscovery community members. While ComplexDiscovery regularly highlights this information, it does not assume any responsibility for content assertions.

Contact us today to submit recommendations for consideration and inclusion in ComplexDiscovery’s data and legal discovery-centric service, product, or research announcements.


Background Note: In a recent article published on LinkedIn, HaystackID Chief Data Scientist John Brewer discusses the growing trend of utilizing AI to solve problems that could be efficiently addressed using traditional algorithms. Brewer shares his experience with various AI pitches, specifically noting that many of these solutions target well-structured data types that can be easily processed using regular expressions. He argues that while AI has its merits, it is crucial for developers to consider whether simpler, more cost-effective, and faster solutions are available through traditional data processing techniques. Brewer’s insights serve as a reminder to look beyond the hype and critically evaluate the best approach for each problem.

Industry Backgrounder

Killing an Ant with HAL 9000*

John Brewer, HaystackID

I hear a lot of pitches from firms selling AI solutions, from two-man operations all the way up to huge players like Microsoft and Google. One thing that has struck me, especially in the last few weeks, is that a lot of them are using AI to solve problems that clever algorithms can solve, and in many cases solved years ago.

A Regular Expression of Interest

AI-based tools bring a lot to the table, but at their heart, they create a repeatable process for doing something. The model or network or ruleset they create is a function that takes in some parameters and produces an output. When we know a relationship exists but we don’t know what it is, or how to express it mathematically, that is a godsend. Especially in areas where machine learning techniques traditionally excel, such as classification, astounding results can be achieved.

I am seeing AI used on solved problems, though. Names and products have been changed to protect the misguided but innocent. Acme.ai brought me a tool that searched extracted text for certain types of information. One of the centerpieces of their sales pitch was the ability to find North American phone numbers. They explained that they could find numbers expressed as 7 digits, 10 digits, or even 11 digits (including the +1 country code). I fairly quickly realized that all of the information types they could detect were things that are well-structured when they appear in documents: phone numbers, ZIP codes, driver’s license numbers, and so on. All of them could be searched for using relatively basic regular expressions.

I dug in on this because I was certain they weren’t just trying to sell me something I could do with an IBM 7000-series machine from 1968. As I looked closer, I learned these well-meaning chaps were indeed tokenizing their documents and had, with great effort, meticulously trained models up to 99.9% accuracy for each category of data they were searching for. They were very good classifiers. But they were an early-21st-century sledgehammer being swung at a mid-20th-century ant.
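To make the regular-expression point concrete, here is a minimal Python sketch of the kind of pattern the article alludes to. It is not Acme.ai’s tool and not a production matcher: the names `PHONE_RE`, `ZIP_RE`, and `find_identifiers` are hypothetical, the phone pattern covers only the 7-, 10-, and 11-digit North American formats named above (with optional spaces, dots, hyphens, or parentheses), and the ZIP pattern assumes five-digit US codes with an optional ZIP+4 suffix. Driver’s license numbers are left out because their formats vary by issuing state.

```python
import re

# Illustrative patterns only (hypothetical names); real documents would need
# tuning for false positives and unusual formatting.

# North American phone numbers in 7-digit (555-1234), 10-digit
# ((212) 555-1234), or 11-digit (+1 212 555 1234) form.
PHONE_RE = re.compile(
    r"""
    (?<!\d)                          # not preceded by another digit
    (?:
        (?:\+?1[\s.-]?)?             # optional country code: 1 or +1
        (?:\(\d{3}\)|\d{3})[\s.-]?   # area code, optionally parenthesized
    )?                               # area code absent for 7-digit numbers
    \d{3}[\s.-]?\d{4}                # exchange and line number
    (?!\d)                           # not followed by another digit
    """,
    re.VERBOSE,
)

# US ZIP codes: five digits with an optional ZIP+4 extension.
ZIP_RE = re.compile(r"(?<!\d)\d{5}(?:-\d{4})?(?!\d)")


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return every phone number and ZIP code found in the text."""
    return {"phone": PHONE_RE.findall(text), "zip": ZIP_RE.findall(text)}


sample = "Call 555-1234, (212) 555-1234, or +1 212 555 1234. Mail: 10001 or 10001-1234."
print(find_identifiers(sample))
```

On that sample, the phone pattern returns the three numbers in order and the ZIP pattern returns `10001` and `10001-1234`. Each pattern is a few lines long, needs no training data, and behaves identically on every run, though it would still need testing against real document sets.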

It’s Easy to Train on a Math Problem

There’s a growing population of companies and teams building AI systems around problems that have more traditional solutions. In many cases, these problems are chosen precisely because traditional solutions already exist. Training ML models takes vast quantities of example data, and data is painful and expensive to collect, clean, and use in bulk. Choosing a problem whose answers you can compute independently removes that obstacle. If I want to train an AI to tell me whether a word is seven characters long, I can generate huge amounts of data very easily to train and test on, and I can quickly train that model up to dazzling levels of accuracy. I can then put “AI-enabled” on the side of my product. At the end of the day, though, I have just created a string length function with a surprisingly high error rate.
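The seven-character example can be sketched the same way. In this hypothetical Python snippet, `is_seven_chars` is the traditional solution, and `make_training_data` shows why such problems are so cheap to train on: the labels are produced by the very rule a model would be asked to learn, so unlimited examples cost nothing. Both function names, the 1 to 14 length range, and the lowercase alphabet are illustrative assumptions.

```python
import random
import string


def is_seven_chars(word: str) -> bool:
    """The traditional solution: exact, instant, and never wrong."""
    return len(word) == 7


def make_training_data(n: int, seed: int = 0) -> list[tuple[str, bool]]:
    """Generate n labeled examples by using the rule itself as the labeler.

    Hypothetical helper: the length range and alphabet are arbitrary choices.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for _ in range(n):
        length = rng.randint(1, 14)
        word = "".join(rng.choices(string.ascii_lowercase, k=length))
        # The "ground truth" label is just len(word) == 7.
        examples.append((word, is_seven_chars(word)))
    return examples


data = make_training_data(1000)
positives = sum(label for _, label in data)
print(f"{len(data)} examples, {positives} labeled as seven characters")
```

Any classifier trained on this data can, at best, reproduce `len(word) == 7`, which the first function already computes exactly.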

Butter Knives for Butter, Chain Saws for Oak Trees

There seems to be a lot less pure snake oil in the AI industry than there was five years ago, but as the tools have gotten into the hands of more and more entrepreneurs and amateur developers, we’re seeing more and more naive uses of AI for problems that traditional data processing techniques solve just as well, or better. Machine learning and AI open up new horizons in computation, but when we’re solving a new challenge as developers, we should always stop to ask whether a simpler, cheaper, and faster solution is available with our older tools.

Read the original article.


*Shared with permission.

Source: ComplexDiscovery

Have a Request?

If you have information or offering requests that you would like to ask us about, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is a highly recognized digital publication focused on providing detailed insights into the fields of cybersecurity, information governance, and eDiscovery. Based in Estonia, a hub for digital innovation, ComplexDiscovery OÜ upholds rigorous standards in journalistic integrity, delivering nuanced analyses of global trends, technology advancements, and the eDiscovery sector. The publication expertly connects intricate legal technology issues with the broader narrative of international business and current events, offering its readership invaluable insights for informed decision-making.

For the latest in law, technology, and business, visit ComplexDiscovery.com.

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Midjourney, and DALL-E, to assist, augment, and accelerate the development and publication of both new and revised content in its published posts and pages (a practice initiated in late 2022).

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.