From Big Oil to Big Data? AI, Machine Learning, and the Montreal Data License

The emergence of ML and AI is already shaping society, political systems and our economies. The underlying assets driving such changes are largely informational. Access to and licensing of data can thus be understood as one of the cornerstones of the development of ML and AI. This is true in an abstract sense, but when combined with the fact that there is a widening data gap between multinational firms with platform-based business models on one hand, and governments, citizens and other businesses on the other, the need for clarity in data licensing becomes imperative.

Editor’s Note: Data is valuable for the purposes of artificial intelligence (AI) and machine learning (ML) when it is voluminous, organized, and standardized. Provided in this short reference post, published with the permission of Element AI, are considerations that may be helpful in developing a common framework for data licensing. This reasonable first step toward data standardization may increase the accessibility, usability, and value of data being used in the fields of AI and ML.

Toward Standardization of Data Licenses: The Montreal Data License

An article by Misha Benjamin, Paul Gagnon, Negar Rostamzadeh, Chris Pal, Yoshua Bengio, and Alex Shee

This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper’s goal is to build towards a common framework for data licensing akin to the licensing of open source software. Increased transparency and the resolution of conceptual ambiguities in existing licensing language are two noted benefits of the approach proposed in the paper. In parallel, such benefits may help foster fairer and more efficient markets for data by introducing clearer tools and concepts that better define how data can be used in the fields of AI and ML. The paper’s approach is summarized in a new family of data license language: the Montreal Data License (MDL). Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language espousing the taxonomies articulated in this paper.

“Data is the new oil” is an oft-repeated mantra, used in many fora in recognition of the fundamental role that data plays as a catalyst for the creation of artificial intelligence and machine learning assets and systems. Oil is an appealing analogy from a conceptual standpoint. Its extraction is a resource-heavy endeavor. The processing and refinement of oil yields fuel, plastics and other economically valuable derivatives. The acquisition and extraction processes for data are equally resource-intensive, and, given the technological progress underlying the current age of big data, machine learning and artificial intelligence, it is no exaggeration to describe the economic benefits of data in terms comparable to those used for oil.

In contrast to the market for data, the market for oil is heavily regulated throughout its supply and extraction chain. Regulation is driven by the need to reduce transaction friction, prevent security issues, regulate toxicity and greenhouse gas emissions, and ultimately, to foster public trust. Through regulation and standardization, the end-products derived from oil gained in quality and predictability (Unit, 2011). One underestimated benefit arising from such regulation is the emergence and consolidation of standardized terminology, which in turn is ultimately market-making and fosters scalability. Unfortunately, none of those elements exist with regard to markets for data, which creates considerable friction and uncertainty and increases transaction costs. These transaction costs consist of resources allocated to harmonizing the data itself as well as assessing whether the data is usable (both technically and from a legal standpoint). While metadata made available alongside data may prove useful in reducing such costs, metadata is inconsistent and may not always facilitate the assessment of how relevant or technically useful data may be. Moreover, metadata is technical in nature, and does not clarify how data may legally be used. That said, recent efforts in standardizing the presentation of metadata for ML models are highly relevant, and contribute to fostering transparency in the fields of ML and AI (Gebru et al., 2018; Mitchell et al., 2019). For ML and AI to continue their growth, and for that growth to be beneficial for all, standardized terminology and increased predictability are necessary. This article aims to provide a first step towards such standards with respect to data licensing.

Complete PDF Copy: Toward Standardization of Data Licenses: The Montreal Data License

Read the complete article at Toward Standardization of Data Licenses: The Montreal Data License

The Montreal Data License: Online License Generator

The Montreal Data License was created to bring clarity to individuals and companies that make data available to third parties for use cases in the fields of machine learning and artificial intelligence. The License Generator tool aims to standardize the language used in doing so, so that the intent of those who make data available can be better reflected and understood by those who wish to use the data.

Learn more about and use the License Generator at

Source: ComplexDiscovery

ComplexDiscovery combines original industry research with curated expert articles to create an informational resource that helps legal, business, and information technology professionals better understand the business and practice of data discovery and legal discovery.
