Thu. Mar 28th, 2024

Editor’s Note: Data is valuable for the purposes of artificial intelligence (AI) and machine learning (ML) when it is voluminous, organized, and standardized. Provided in this short reference post, published with the permission of Element AI, are considerations that may be helpful in developing a common framework for data licensing. This reasonable first step toward data standardization may increase the accessibility, usability, and value of data used in the fields of AI and ML.

Toward Standardization of Data Licenses: The Montreal Data License

An article by Misha Benjamin, Paul Gagnon, Negar Rostamzadeh, Chris Pal, Yoshua Bengio, and Alex Shee

This paper provides a taxonomy for the licensing of data in the fields of artificial intelligence and machine learning. The paper’s goal is to build towards a common framework for data licensing akin to the licensing of open source software. Increased transparency and the resolution of conceptual ambiguities in existing licensing language are two noted benefits of the approach proposed in the paper. In parallel, such benefits may help foster fairer and more efficient markets for data by bringing about clearer tools and concepts that better define how data can be used in the fields of AI and ML. The paper’s approach is summarized in a new family of data license language: the Montreal Data License (MDL). Alongside this new license, the authors and their collaborators have developed a web-based tool to generate license language espousing the taxonomies articulated in this paper.

“Data is the new oil” is an oft-repeated mantra, used in many fora in recognition of the fundamental role that data plays as a catalyst for the creation of artificial intelligence and machine learning assets and systems. Oil is an appealing analogy from a conceptual standpoint. Its extraction is a resource-heavy endeavor. The processing and refinement of oil yields fuel, plastics and other economically valuable derivatives. The acquisition and extraction processes for data are equally resource-intensive, and, given the technological progress underlying the current age of big data, machine learning and artificial intelligence, it is no exaggeration to describe the economic benefits of data in terms comparable to those used for oil.

In contrast to the market for data, the market for oil is heavily regulated throughout its supply and extraction chain. Regulation is driven by the need to reduce transaction friction, prevent security issues, regulate toxicity and greenhouse gas emissions, and ultimately, to foster public trust. Through regulation and standardization, the end-products derived from oil gained in quality and predictability (Unit, 2011). One underestimated benefit arising from such regulation is the emergence and consolidation of standardized terminology, which in turn is ultimately market-making and fosters scalability. Unfortunately, none of those elements exist with regard to markets for data, which creates considerable friction and uncertainty and increases transaction costs. These transaction costs consist of resources allocated to harmonizing the data itself as well as to assessing whether the data is usable (both technically and from a legal standpoint). While metadata made available alongside data may prove useful in reducing such costs, metadata is inconsistent and may not always facilitate the assessment of how relevant or technically useful data may be. Moreover, metadata is technical in nature and does not clarify how data may legally be used. That said, recent efforts in standardizing the presentation of metadata for ML models are highly relevant, and contribute to fostering transparency in the fields of ML and AI (Gebru et al., 2018; Mitchell et al., 2019). For ML and AI to continue their growth, and for that growth to be beneficial for all, standardized terminology and increased predictability are necessary. This article aims to provide a first step towards such standards with respect to data licensing.



Complete PDF Copy: Toward Standardization of Data Licenses: The Montreal Data License

Read the complete article at Toward Standardization of Data Licenses: The Montreal Data License

The Montreal Data License: Online License Generator

The Montreal Data License was created to bring clarity to individuals and companies that make data available to third parties for use-cases in the fields of Machine Learning and Artificial Intelligence. The License Generator tool aims to standardize the language used to make data available so that the intent of those who make data available can be better reflected and understood by those who wish to make use of the data.

Learn more about and use the License Generator at MontrealDataLicense.com

Source: ComplexDiscovery

 
