Fair and Balanced Search? New NIST Initiative Addresses Information Retrieval Bias


News Announcement from the National Institute of Standards and Technology (NIST)

To Measure Bias in Data, NIST Initiates ‘Fair Ranking’ Research Effort

New initiative has long-term goal of making search technology more evenhanded.

A new research effort at the National Institute of Standards and Technology (NIST) aims to address a pervasive issue in our data-driven society: a lack of fairness that sometimes turns up in the answers we get from information retrieval software.

Software of this type is everywhere, from popular search engines to less-known algorithms that help specialists comb through databases. This software usually incorporates forms of artificial intelligence that help it learn to make better decisions over time. But it bases these decisions on the data it receives, and if that data is biased in some way, the software will learn to make decisions that reflect that bias too. These decisions can have real-world consequences, influencing what music artists a streaming service suggests and whether you get recommended for a job interview.

“It’s now recognized that systems aren’t unbiased. They can actually amplify existing bias because of the historical data the systems train on,” said Ellen Voorhees, a NIST computer scientist. “The systems are going to learn that bias and recommend you take an action that reflects it.”

As a step toward confronting this problem, NIST has launched the Fair Ranking track this year as part of its long-running Text Retrieval Conference (TREC), which is taking place this week at NIST’s Gaithersburg, Maryland, campus. Proposed and organized by researchers from Microsoft, Boise State University and NIST, the track — essentially an incubator for a new area of study — aims to coordinate research around the idea of fairness. By finding appropriate ways to measure the amount of bias in data and search techniques, the organizers hope to identify strategies for eliminating it.

“We would like to develop systems that serve all of their users, as opposed to benefiting a certain group of people,” said Asia Biega, a postdoctoral researcher at Microsoft Research Montreal and one of the track’s co-organizers. “We are trying to avoid developing systems that amplify existing inequality.”

While awareness of the trouble that biased data creates is growing, there are also many different ways of defining and evaluating fairness in datasets and search tools. In order to keep the research effort focused, the organizers have chosen a particular set of data used by a specific search tool: the Semantic Scholar search engine, developed by the nonprofit Allen Institute for Artificial Intelligence to help academics search for papers relevant to their field. The Allen Institute provided TREC with a 400-gigabyte database of queries made to Semantic Scholar, the resulting lists of answers it returned, and — as a measure of each answer’s relevance to the searcher — the number of clicks each answer received.

The organizers also have elected to concentrate on a problem that often crops up both in commercial search engine results and in scholarly searches for research papers: the same answers appearing at the top of the list every time a particular search term is run.

One problem with academic searches is that they often return a list of papers by the best-known researchers from high-ranked institutions. In one way, this approach to ranking makes sense, as conscientious scientists want to show they have reviewed the most relevant past research before claiming to have discovered something new. Similarly, when we use an internet search tool, we might not mind if we see a well-respected company at the top of our search for a product we know little about.

While this result is fine if we are looking for the most popular answer, it is problematic if there are many other worthwhile answers. To be sure, some of us do scroll through pages of results, but by and large, most people who use search tools never look beyond the first page.

“The results on that first page influence people’s economic livelihood in the real world,” Biega said. “Search engines have the power to amplify exposure. Whoever is on the first page gets more.”

A fair algorithm, by the research track’s measure, would not always return the exact same list of articles in the same order in response to a query, but instead would give other articles their fair share of exposure. In practice, this would mean more prominent articles might still show up more frequently, and the less prominent ones less so — but the returned list would not always be the same. It would contain answers relevant to the searcher’s needs, but it would vary in ways that would be quantifiable.
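
One way such a system could behave (a minimal sketch, not the track's prescribed method) is to sample each result list with probability weighted by relevance — for example, Plackett-Luce sampling — so that relevant items usually rank high while the exact order varies across impressions. The paper names and relevance scores below are hypothetical.

```python
import random

def sample_ranking(scores, rng=random):
    """Draw one ranking by Plackett-Luce sampling: repeatedly pick the
    next item with probability proportional to its relevance score, so
    relevant items tend to rank high but the order varies per query."""
    remaining = dict(scores)  # item -> positive relevance score
    ranking = []
    while remaining:
        items = list(remaining)
        chosen = rng.choices(items, weights=[remaining[i] for i in items])[0]
        ranking.append(chosen)
        del remaining[chosen]
    return ranking

# Hypothetical relevance scores for three papers answering one query.
# Over many impressions, paper_a tops the list most often, but paper_b
# and paper_c still get a proportional share of first-page exposure.
scores = {"paper_a": 3.0, "paper_b": 2.0, "paper_c": 1.0}
```

Because every item keeps some probability of appearing first, exposure accumulates across impressions in proportion to relevance rather than all flowing to a single fixed winner.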

This is, of course, not the only way to define fairness, and Voorhees said she does not expect a single research project to solve so broad a societal problem. She does say, however, that quantifying the problem is an appropriate first step.

“It’s important for us to be able to measure the amount of bias in a system effectively enough that we can do research on it,” she said. “We need to measure it if we want to try.”
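
One illustrative way to quantify such bias (a sketch, not the track's official metric) is to compare each item's share of exposure — discounted by rank position, as in the DCG measure commonly used in information retrieval — against its share of relevance. The item names and scores below are hypothetical.

```python
import math

def exposure_disparity(rankings, relevance):
    """Compare each item's share of total exposure across many rankings
    (using a logarithmic position discount, as in DCG) with its share of
    total relevance; 0 means exposure exactly tracks relevance."""
    exposure = {item: 0.0 for item in relevance}
    for ranking in rankings:
        for position, item in enumerate(ranking):
            exposure[item] += 1.0 / math.log2(position + 2)
    total_exposure = sum(exposure.values())
    total_relevance = sum(relevance.values())
    return sum(abs(exposure[item] / total_exposure
                   - relevance[item] / total_relevance)
               for item in relevance)

# Three nearly equally relevant papers: a fixed ranking concentrates
# exposure on whichever paper happens to sit on top, while rotating
# the order spreads exposure closer to the papers' relevance shares.
relevance = {"a": 2.0, "b": 1.9, "c": 1.8}
fixed = [["a", "b", "c"]] * 3
rotated = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
```

A score like this makes the fairness question measurable: a deterministic ranker that always returns the same order yields a larger disparity on near-equally relevant items than one that varies the order.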

The Fair Ranking track is open to all interested research teams. NIST will issue the official call for participation in TREC 2020 this December; the conference will take place November 18-20, 2020, in Gaithersburg, Maryland.

Read the original news announcement: To Measure Bias in Data, NIST Initiates ‘Fair Ranking’ Research Effort

Source: ComplexDiscovery