From Big Oil to Big Data? AI, Machine Learning, and the Montreal Data License

The emergence of machine learning (ML) and artificial intelligence (AI) is already shaping society, political systems, and our economies. The underlying assets driving such changes are largely informational. Access to and licensing of data can thus be understood as one of the cornerstones of the development of ML and AI. This is true in an abstract sense, but when combined with the fact that there exists a widening data gap between multinational firms with platform-based business models on the one hand, and governments, citizens, and other businesses on the other, the need for clarity in data licensing becomes imperative.

Editor’s Note: Data is valuable for the purposes of artificial intelligence (AI) and machine learning (ML) when it is voluminous, organized, and standardized. This short reference post, published with the permission of Element AI, presents considerations that may be helpful in developing a common framework for data licensing. As a reasonable first step toward data standardization, such a framework may increase the accessibility, usability, and value of data used in the fields of AI and ML.

Toward Standardization of Data Licenses: The Montreal Data License

An article by Misha Benjamin, Paul Gagnon, Negar Rostamzadeh, Chris Pal, Yoshua Bengio, and Alex Shee

This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper’s goal is to build towards a common framework for data licensing akin to the licensing of open source software. Increased transparency and the resolution of conceptual ambiguities in existing licensing language are two noted benefits of the approach proposed in the paper. These benefits may, in turn, help foster fairer and more efficient markets for data by bringing about clearer tools and concepts that better define how data can be used in the fields of AI and ML. The paper’s approach is summarized in a new family of data license language: the Montreal Data License (MDL). Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language reflecting the taxonomies articulated in this paper.

“Data is the new oil” is an oft-repeated mantra, used in many fora in recognition of the fundamental role that data plays as a catalyst for the creation of artificial intelligence and machine learning assets and systems. Oil is an appealing analogy from a conceptual standpoint. Its extraction is a resource-heavy endeavor, and its processing and refinement yield fuel, plastics, and other economically valuable derivatives. The acquisition and extraction of data are equally resource-intensive. Given the technological progress underlying the current age of big data, machine learning, and artificial intelligence, it is no exaggeration to describe the economic benefits of data in terms comparable to those used for oil.

Unlike the market for data, the market for oil is heavily regulated throughout its supply and extraction chain. Regulation is driven by the need to reduce transaction friction, prevent security issues, control toxicity and greenhouse gas emissions, and, ultimately, foster public trust. Through regulation and standardization, the end-products derived from oil gained in quality and predictability (Unit, 2011). One underestimated benefit of such regulation is the emergence and consolidation of standardized terminology, which is ultimately market-making and fosters scalability. Unfortunately, none of these elements exist in markets for data. Their absence creates considerable friction and uncertainty and increases transaction costs. These transaction costs consist of resources allocated to harmonizing the data itself and to assessing whether the data is usable, both technically and from a legal standpoint. Metadata made available alongside data may help reduce such costs. However, it is inconsistent and may not always facilitate the assessment of how relevant or technically useful the data may be. Moreover, metadata is technical in nature and does not clarify how data may legally be used. That said, recent efforts to standardize the presentation of metadata for datasets and ML models are highly relevant and contribute to fostering transparency in the fields of ML and AI (Gebru et al., 2018; Mitchell et al., 2019). For ML and AI to continue their growth, and for that growth to benefit everyone, standardized terminology and increased predictability are necessary. This article aims to provide a first step toward such standards with respect to data licensing.




Read the complete article at Toward Standardization of Data Licenses: The Montreal Data License

The Montreal Data License: Online License Generator

The Montreal Data License was created to bring clarity to individuals and companies that make data available to third parties for use cases in the fields of Machine Learning and Artificial Intelligence. The License Generator tool aims to standardize the language used when making data available. The goal is for data providers' intent to be better reflected in, and understood by, those who wish to use the data.

Learn more about the License Generator, and use it, at MontrealDataLicense.com

Source: ComplexDiscovery
