dedupe test table
ARCHIVED CONTENT
You are viewing ARCHIVED CONTENT released online between 1 April 2010 and 24 August 2018 or content that has been selectively archived and is no longer active. Content in this archive is NOT UPDATED, and links may not function.
 

pablo-CraigBall-Dedup

By Craig Ball

An employee of an e-discovery service provider asked me to help him explain to his boss why deduplication works well for native files but frequently fails when applied to TIFF images. The question intrigued me because it requires we dip our toes into the shallow end of cryptographic hashing and dispel a common misconception about electronic documents.

Most people regard a Word document file, a PDF or TIFF image made from the document file, a printout of the file and a scan of the printout as being essentially “the same thing.”  Understandably, they focus on content and pay little heed to form.  But when it comes to electronically stored information, the form of the data—the structure, encoding and medium employed to store and deliver content–matters a great deal.  As data, a Word document and its imaged counterpart are radically different data streams from one-another and from a digital scan of a paper printout.  Visually, they are alike when viewed as an image or printout; but digitally, they bear not the slightest resemblance.

 

Have a Request?

If you have information or offering requests that you would like to ask us about, please let us know, and we will make our response to you a priority.

ComplexDiscovery OÜ is an independent digital publication and research organization based in Tallinn, Estonia. ComplexDiscovery covers cybersecurity, data privacy, regulatory compliance, and eDiscovery, with reporting that connects legal and business technology developments—including high-growth startup trends—to international business, policy, and global security dynamics. Focusing on technology and risk issues shaped by cross-border regulation and geopolitical complexity, ComplexDiscovery delivers editorial coverage, original analysis, and curated briefings for a global audience of legal, compliance, security, and technology professionals. Learn more at ComplexDiscovery.com.

 

Generative Artificial Intelligence and Large Language Model Use

ComplexDiscovery OÜ recognizes the value of GAI and LLM tools in streamlining content creation processes and enhancing the overall quality of its research, writing, and editing efforts. To this end, ComplexDiscovery OÜ regularly employs GAI tools, including ChatGPT, Claude, Gemini, Grammarly, Midjourney, and Perplexity, to assist, augment, and accelerate the development and publication of both new and revised content in posts and pages published (initiated in late 2022).

ComplexDiscovery also provides a ChatGPT-powered AI article assistant for its users. This feature leverages LLM capabilities to generate relevant and valuable insights related to specific page and post content published on ComplexDiscovery.com. By offering this AI-driven service, ComplexDiscovery OÜ aims to create a more interactive and engaging experience for its users, while highlighting the importance of responsible and ethical use of GAI and LLM technologies.