Provided for review and consideration is a general technical glossary designed to enhance the education and understanding of eDiscovery and eDisclosure professionals.
ACTIVE OR LIVE DATA: Information residing on a computer’s hard drive or servers which is readily visible to users (e.g. a document, spreadsheet or an e-mail).
ALGORITHM: A detailed formula or set of steps for solving a particular problem (e.g. searching for relevant electronic documents, or calculating a hash value such as MD5 or SHA-1).
APPLICATION: A collection of one or more related software programmes that allow a user to enter, store, view, change or extract information from files or databases (e.g. Word, Excel and Microsoft Office). Also referred to as “programmes” or “software”.
ARCHITECTURE: Hardware and/or software comprising a computer system or network.
ARCHIVAL DATA: Information that is not directly accessible to the user of a computer system but is data that the organisation maintains for long term storage and record keeping purposes (e.g. backup data).
ATTACHMENT: A record or file associated with another record for the purposes of retention or transfer. The attachment is commonly referred to as the “child” with the record it is attached to as the “parent”. If the attachment itself has an attachment this would be a “grandchild” and so on. A synonym is an ATTACHED DOCUMENT, which means a Document attached to, or embedded in, a HOST DOCUMENT.
AUDIT TRAIL: Information about where data has been, in whose possession and why, held in sufficient detail so as to allow the reconstruction of that activity.
AUTHOR: The person, office or designated person responsible for a document’s creation or issuance. Also referred to as “originator”.
BACKUP DATA: A copy of data created as a precaution against the loss or damage of the original data. Backup data is information that is not presently in use by an organisation and is routinely stored separately upon portable media, to free up space and permit data recovery in the event of disaster. Backup data can be incremental (where only new data is saved) or complete (where all data is saved).
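The difference between a complete and an incremental backup can be sketched as follows. This is an illustration only: the function name, the file names and the use of plain numbers as modification times are assumptions, and real backup software tracks changes in more sophisticated ways.

```python
def select_for_backup(files, last_backup_time=None):
    """Return the names of the files a backup run would copy.

    files: dict mapping a (hypothetical) file name to its modification time.
    last_backup_time: None requests a complete backup; otherwise only
    files modified after that time are selected (incremental backup).
    """
    if last_backup_time is None:
        return sorted(files)
    return sorted(name for name, modified in files.items()
                  if modified > last_backup_time)


files = {"a.doc": 100, "b.xls": 250, "c.msg": 300}
print(select_for_backup(files))       # ['a.doc', 'b.xls', 'c.msg']
print(select_for_backup(files, 200))  # ['b.xls', 'c.msg']
```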
BACKUP TAPE RECYCLING: The process whereby an organisation’s backup tapes are overwritten with new backup data, usually on a fixed schedule (e.g. the use of nightly backup tapes for each day of the week with a daily backup tape for a particular day being overwritten on the same day the following week; weekly and monthly backups being stored offsite for a specified period of time before being placed back in rotation).
BATES NUMBERING: A method used in the legal, medical and business fields to place identifying numbers and/or date/time-marks on images and documents as they are scanned or processed, for example during the discovery stage of preparations for trial or when identifying business receipts. Bates stamping can also be used to mark images with copyright information by adding a company name, logo and/or legal copyright notice. The process provides identification, protection and automatic consecutive numbering of the images. It is named after the late 19th century inventor Edwin G. Bates of New York City.
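The consecutive numbering described above can be illustrated with a short sketch. The prefix "ABC", the six-digit width and the function name are assumptions chosen for the example; actual numbering schemes are agreed between the parties or set by the review platform.

```python
def bates_numbers(prefix, start, count, width=6):
    """Return `count` consecutive identifiers, zero-padded to `width` digits."""
    return [f"{prefix}{n:0{width}d}" for n in range(start, start + count)]


print(bates_numbers("ABC", 1, 3))  # ['ABC000001', 'ABC000002', 'ABC000003']
```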
BYTE: The basic unit of measurement for most computer data, consisting of eight bits.
CD-ROM (CD READ ONLY MEMORY): Data storage medium that uses compact discs to store about 1,500 floppy discs worth of data, that is, approximately 55,000 pages. Variations include CD-Rs (CD Recordable) and CD-RWs (CD Re-Writable).
CLUSTERING: Functionality whereby ESI containing similar content is grouped together by the software without human intervention. Results might be shown in a pictorial manner with items of ESI “clustered” together, or by folders of similar documents.
COMPRESSION: The reduction of the size of a file to save storage space. “Compression ratio” is the ratio of the size of an uncompressed file to a compressed file.
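As a rough illustration of the compression ratio, the sketch below compresses some repetitive sample text with Python's standard zlib module and divides the uncompressed size by the compressed size. The sample data and function name are assumptions; the ratio achieved in practice depends heavily on the content of the file.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Ratio of the uncompressed size to the compressed size."""
    compressed = zlib.compress(data)
    return len(data) / len(compressed)


sample = b"disclosure " * 1000  # highly repetitive, so it compresses well
print(round(compression_ratio(sample), 1))
```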
COMPUTER ASSISTED REVIEW (CAR): Also known as Technology Assisted Review (TAR). A process in which computer software electronically classifies documents based on input from expert reviewers, in an effort to expedite the organisation and prioritisation of the document collection. The computer classification may include broad topics pertaining to discovery responsiveness, privilege, and other designated issues. Also see: Predictive Coding.
COMPUTER ASSISTED REVIEW REFERENCE MODEL (CARRM): A model that shows the stages of the Computer Assisted Review (CAR) process.
COMPUTER FORENSICS: The use of specialised techniques for recovery, authentication, and analysis of electronic data.
CSV FILE: A computer file containing a list of values separated by a comma or other delimiter.
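A minimal sketch of reading such a file with Python's standard csv module follows. The column names and values are invented for illustration. Note how a value that itself contains a comma is wrapped in quotation marks so that it is not split into two fields.

```python
import csv
import io

# Hypothetical load-file content: a header row followed by two records.
text = (
    "DocID,Author,Date\n"
    "ABC000001,J. Smith,2021-03-04\n"
    'ABC000002,"Jones, A.",2021-03-05\n'
)


def read_rows(csv_text, delimiter=","):
    """Parse delimited text into a list of dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))


rows = read_rows(text)
print(rows[1]["Author"])  # Jones, A.
```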
CUSTODIAN: Person having control of a network, computer or specific electronic folder.
DAT (DIGITAL AUDIO TAPE): A high capacity storage medium. Used in some backup systems.
DATA MAP: A written description (possibly with a diagram or two) of where the client’s data sources are.
DATA SAMPLING: See SAMPLING.
DE-DUPLICATION: The process of identifying and removing duplicate Documents from a collection of Documents so that only one unique copy of each document remains. A cryptographic hash function such as the Message Digest algorithm 5 (MD5) may be used to generate a digital fingerprint for an Electronic Document. The digital fingerprint of a Document can then be electronically compared against the digital fingerprint of any other Document to determine whether the Documents are exact duplicates. De-duplication may also be implemented by using a cryptographic hash function applied to a group of Documents.
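The fingerprint comparison described above can be sketched with Python's standard hashlib module. The document names and contents are hypothetical, and the sketch only detects exact, byte-for-byte duplicates; commercial tools typically hash selected fields of an email or a whole document family, which is not shown here.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the MD5 digest of the content as a hexadecimal string."""
    return hashlib.md5(content).hexdigest()


def deduplicate(documents):
    """Split documents into unique items and duplicates.

    documents: dict mapping a (hypothetical) document name to its bytes.
    Returns (unique_names, duplicates), where each duplicate is a pair
    (duplicate_name, name_of_first_copy_seen).
    """
    seen = {}
    unique, duplicates = [], []
    for name, content in documents.items():
        digest = fingerprint(content)
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
            unique.append(name)
    return unique, duplicates


docs = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
print(deduplicate(docs))  # (['a.txt', 'c.txt'], [('b.txt', 'a.txt')])
```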
DELETED DATA: Data that, in the past, existed on the computer as live data and which has been deleted by the computer system or end-user. Deleted data remains on storage media in whole or part until it is overwritten by on-going usage or “wiped” with a software program specifically designed to remove deleted data. Even after the data itself has been wiped, directory entries, pointers, or other metadata relating to the deleted data may remain on the computer.
DELETION: The process whereby data is removed from active files and other data storage structures on computers and rendered inaccessible except by using special data recovery tools designed to recover deleted data.
DISC (DISK): It may be a floppy disk, or it may be a hard disk. Either way, it is a magnetic storage medium on which data is digitally stored.
DISCLOSURE DATA: Data relating to disclosed Documents, including for example the type of document, the date of the document, the names of the author/sender and the recipient, and the party disclosing the document. See OBJECTIVE and SUBJECTIVE CODING. Normally only OBJECTIVE CODING is disclosed with documents.
DISTRIBUTED DATA: Information belonging to an organisation which resides on portable media and non-local devices such as remote offices, home computers, laptop computers, personal digital assistants (PDAs), wireless communication devices (e.g. Blackberry) and internet repositories (such as email hosted by an internet service provider, or portals and web sites).
DOCUMENT: Anything in which information of any description is recorded (see CPR Rule 31.4). It includes all ESI.
DOCUMENT CODING: The process of identifying and recording case-relevant information (e.g. author, date authored, date sent, recipient, date opened, etc.) from a document. Can be automated or manual. Also referred to as INDEXING. See also OBJECTIVE CODING and SUBJECTIVE CODING.
DOCUMENT MANAGEMENT: The manual and automated processes for the management of documents during the course of proceedings, including the identification, preservation, collection, processing, analysis, review, production and exchange of documents.
DVD (DIGITAL VIDEO DISC OR DIGITAL VERSATILE DISC): Data storage medium, like a compact disc, upon which data can be written and read. DVDs are faster, can hold more information, and can support more data formats than CDs.
EARLY CASE ASSESSMENT (ECA): Also known as “EARLY DATA ASSESSMENT”. Initial process in the EDRM approach whereby a large volume of data (normally emails and attachments) goes through various processes such as clustering, semantic analysis, and email threading to enable early decisions to be taken on the relevance of ESI.
ELECTRONIC DATA DISCLOSURE (EDD): Also known as EDISCLOSURE. Process of disclosing ESI. Not to be confused with using electronic means to carry out the disclosure of images of paper documents or printed out emails, Word documents etc.
ELECTRONIC DISCOVERY REFERENCE MODEL (EDRM): A model that shows the stages of the electronic discovery process.
ELECTRONIC DOCUMENT: See ELECTRONICALLY STORED INFORMATION (ESI).
ELECTRONIC IMAGE: An electronic representation of a paper document or Electronically Stored Information. An Electronic Image may be a SEARCHABLE IMAGE or an UNSEARCHABLE IMAGE. Examples are image PDF files and TIF/TIFF files.
ELECTRONIC STORAGE SYSTEM: A system or medium for retaining Electronically Stored Information.
ELECTRONICALLY STORED INFORMATION (ESI): Electronic files on a computer, such as emails and Word, Excel, PowerPoint and Adobe PDF documents. It includes (for example) e-mail and other electronic communications such as SMS and voicemail, word-processed documents and databases, and documents stored on portable devices such as memory sticks and mobile phones. In addition to documents that are readily accessible from computer systems and other electronic devices and media, it includes documents that are stored on servers and back-up systems and electronic documents that have been ‘deleted’. It also includes METADATA and EMBEDDED DATA.
EMAIL THREADING: Software functionality that pulls together the various emails that make up a “thread of conversation” and displays them in an easy to understand manner. The normal aim is to have the final email in a chain readily identifiable so that all the secondary emails in the conversation can be read in one pass.
EMBEDDED DATA: Text or other information which is not typically visible to the user viewing the output display on screen or as a print-out. Examples of Embedded Data include spreadsheet formulae (which display as the result of the formula operation), hidden columns, externally or internally linked files (e.g., sound files in PowerPoint presentations), references to external files and content (e.g., hyperlinks to HTML files or URLs), references and fields (e.g., the field codes for an auto-numbered document), and certain database information if the data is part of a database (e.g. a date field in a database will display as a formatted date, but its actual value is typically a long integer).
ENCRYPTION: Procedure whereby the contents of a message or file are scrambled or made unintelligible to anyone not authorised to use it.
FIELD: A section of data in a database, for example a field containing the date of a document.
FILE SLACK SPACE: A form of residual data, slack space is the amount of on-disk file space from the end of a file’s logical record information to the end of the physical disk record. Slack space can contain information soft-deleted from the record, information from prior records stored at the same physical location as current records, metadata fragments and other information useful for forensic analysis of computer systems.
FORENSIC COPY: An exact copy of an entire physical storage medium (hard drive, CD-ROM, DVD, tape etc.). Also referred to as “mirror imaged copies”, “imaging” and “disc mirroring”.
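The size of the slack space for a single file is simple arithmetic once the allocation unit is known. The sketch below assumes, purely for illustration, that storage is allocated in fixed units of 4,096 bytes; the actual unit size depends on the file system and how the disk was formatted.

```python
def slack_bytes(file_size, cluster_size=4096):
    """Bytes between the logical end of a file and the end of its last
    allocated unit. cluster_size is an assumed allocation unit."""
    clusters = -(-file_size // cluster_size)  # round up to whole units
    return clusters * cluster_size - file_size


print(slack_bytes(10000))  # 2288 (three units = 12,288 bytes allocated)
print(slack_bytes(4096))   # 0 (the file exactly fills one unit)
```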
FORMAT: The way in which Electronic Images and other documents are stored and made accessible.
GIF (GRAPHIC INTERCHANGE FORMAT): A computer compression format for pictures.
GIGABYTE (GB): A measure of computer data storage capacity and equivalent to a billion (1,000,000,000) bytes. Also referred to as a “gig”.
HARD DRIVE: The primary storage unit on PCs, consisting of one or more magnetic media platters on which digital data can be written and erased magnetically.
HOST DOCUMENT: A Document with one or more ATTACHED DOCUMENTS. For example, an e-mail is a Host Document and any Documents attached to the e-mail are its Attached Documents.
INDEXING: See DOCUMENT CODING.
INTERNET SERVICE PROVIDER (ISP): A business that provides access to the Internet.
JPEG (JOINT PHOTOGRAPHIC EXPERTS GROUP): An image compression standard for photographs.
KEYWORD SEARCH: A search for documents containing one or more words that are specified by a user. Normally conducted on ELECTRONICALLY STORED INFORMATION, but can also be carried out on OCR TEXT.
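A very simple keyword search can be sketched as below. The document names and text are invented, and the whole-word, case-insensitive matching shown is only one possible design choice; review platforms normally search a pre-built index and support features such as stemming and proximity operators.

```python
import re


def keyword_search(documents, keywords):
    """Return the names of documents containing any keyword as a whole
    word, ignoring case. documents maps a name to its extracted text."""
    alternatives = "|".join(re.escape(word) for word in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [name for name, text in documents.items() if pattern.search(text)]


docs = {
    "a": "The Contract was signed.",
    "b": "No relevant text.",
    "c": "Subcontractor invoice",
}
print(keyword_search(docs, ["contract"]))             # ['a']
print(keyword_search(docs, ["contract", "invoice"]))  # ['a', 'c']
```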
KILOBYTE (KB): A measure of computer data storage capacity and equivalent to a thousand (1,000) bytes.
LEGACY DATA: Information that has been created or stored by the use of software and/or hardware that has become obsolete or has been replaced (“Legacy Systems”).
LEGACY SYSTEMS: Systems containing legacy data.
LITIGATION HOLD: An instruction issued as a result of current or anticipated litigation, audit investigation or other such matter that suspends the normal processing or disposal of records.
LITIGATION SUPPORT SOFTWARE/SYSTEM: Application that supports the process of litigation. In terms of the EDRM approach this stage occurs after the Early Case Assessment stage.
LOOSE DOCUMENT: An Electronic Document that is stored in its Native Form in a file system or directory system but not an email box. An email or document attached to an email, even if extracted from the email box in which it was originally stored, is not a Loose Document.
MEDIA FREE SPACE: Unused space on storage media that is available for storage.
MEGABYTE (MB): A measure of computer data storage capacity and equivalent to a million (1,000,000) bytes. Also referred to as a “meg”.
METADATA: Commonly described as “data about data”. It is information about a document or file that may describe, for example, how, when and by whom the document was received, created, accessed and modified, and how it is formatted. Some metadata, such as file dates and sizes, can easily be seen by users. Other metadata can be hidden or embedded and is unavailable to computer users who are not technically adept. Metadata is generally not reproduced in full form when a document is printed.
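The easily visible file-system metadata mentioned above (dates and sizes) can be read with Python's standard library, as sketched below. The function name and the temporary file are assumptions for the example. Application metadata held inside a document, such as its author or revision history, requires format-specific tools and is not shown.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_system_metadata(path):
    """Return basic file-system metadata for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed_utc": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


# Demonstration on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "note.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    print(file_system_metadata(path)["size_bytes"])  # 5
```

Merely opening or copying a file can change some of these values, which is one reason forensic copies are taken before examination.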
MIGRATED DATA: Information that has been moved from one database or format to another.
MIRROR IMAGE: Used in computer forensic investigations and some electronic disclosure investigations, a mirror image is an exact bit-by-bit copy of a computer hard drive that ensures the operating system is not altered during the forensic examination. May also be referred to as “disc mirroring”, or as a “forensic copy” or “imaged copy”.
MPEG (MOVING PICTURES EXPERT GROUP): An image compression standard for full motion video.
NATIVE FORMAT: An associated file structure for an electronic document defined by the original creating application. Viewing or searching documents in the native format may require the original application (for example, viewing a Microsoft Word document may require the Microsoft Word application).
NETWORK: A group of two or more computers and other devices connected together for the exchange and sharing of data and resources.
OBJECTIVE CODING: Coded information that can be derived from a document without any specific legal training. Normally comprises: Date, Estimated Date, Document Title, Document Type, From, To, Copyee. Objective coding is normally conducted by a vendor (often overseas to provide a cheaper service).
OFF-LINE DATA: The storage of electronic data outside the network in daily use that is only accessible through the off-line storage system.
OPTICAL CHARACTER RECOGNITION (OCR): The computer-facilitated recognition of printed or written text characters in an UNSEARCHABLE IMAGE.
ON LINE DATA: Electronic data stored on the network in daily use.
PDF (PORTABLE DOCUMENT FORMAT): A common format for images of documents which enables documents to be displayed or printed in a manner which preserves the formatting originally used by the author. A PDF file may be either a Searchable Image file or an Unsearchable Image file.
PETABYTE (PB): A petabyte is a measure of computer data storage capacity and equivalent to one quadrillion (1,000,000,000,000,000) bytes.
PERSONAL DATA: Information of a personal nature that must not be disclosed, such as medical records, salary, home addresses, relationship discussions, social security numbers, etc. Personal data is normally REDACTED.
PREDICTIVE CODING: Functionality that automatically codes records by conducting analysis on the ESI. The coding can encompass OBJECTIVE and SUBJECTIVE CODING. Objective coding is usually a simpler process than Subjective coding, which requires the software to be “seeded” with examples of relevant and/or privileged documents. The application then “learns” the criteria used to arrive at the Subjective decisions and (once trained) will identify such documents and pass them to a user to confirm the coding calls. See also: COMPUTER ASSISTED REVIEW REFERENCE MODEL (CARRM).
PST (PERSONAL STORE): The place where Microsoft Outlook stores its data (when Outlook is used without Microsoft Exchange Server). A PST file is created when a mail account is set up. Additional PST files can be created for backing up and archiving Outlook folders, messages, forms and files. The file extension given to PST files is .pst. A PST file can be broken down into individual emails, each saved as a .msg file.
RETENTION PERIOD: The length of time a given records series must be kept, expressed as either a time period (e.g. four years), an event or action (e.g. audit), or a combination (e.g. six months after audit).
REDACTION: The process whereby sensitive information is hidden by rendering part of a document unreadable. It is sometimes referred to as ‘Masking’. Redaction is typically used to render unreadable the confidential, privileged or personal data portions of an otherwise disclosable document.
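For plain text, masking can be sketched with a regular expression. The example below hides anything that looks like an email address. The pattern is a deliberately simple assumption and will not catch every address format. Redacting an image or PDF additionally requires removing the underlying text layer, not merely drawing over it, which this sketch does not address.

```python
import re

# Simplified, assumed pattern for email-like strings.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text, pattern=EMAIL, mask="[REDACTED]"):
    """Replace every match of `pattern` in `text` with `mask`."""
    return pattern.sub(mask, text)


print(redact("Contact jane.doe@example.com today."))
# Contact [REDACTED] today.
```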
RESIDUAL DATA: Data that is not active on a computer system (sometimes referred to as “Ambient Data”). Residual data includes (1) data found on media free space; (2) data found in file slack space; and (3) data within files that has functionally been deleted, in that it is not visible using the application with which the file was created, without use of undelete or special data recovery techniques.
RESTORE: To transfer data from a backup medium (such as tapes) to an on-line system, often for the purposes of recovery from a problem, failure, or disaster. Restoration of archival media is the transfer of data from an archival store to an on-line system for the purposes of processing (such as query, analysis, extraction or disposition of that data). Archival restoration of systems may require not only data restoration but also replication of the original hardware and software operating environment. Restoration of systems is often called “recovery”.
SAMPLING: Usually (but not always) refers to the process of statistically testing a data set for the likelihood of relevant information. It can be a useful technique in addressing a number of issues relating to litigation, including decisions as to which repositories of data should be preserved and reviewed, and determinations of the validity and effectiveness of searches or other data extraction procedures. Sampling can be useful in providing information to the court about the relative cost burden versus benefit of requiring a party to review certain electronic records.
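A basic form of statistical sampling can be sketched as follows: draw a simple random sample from the document population, have it reviewed, and use the proportion of relevant documents in the sample as an estimate for the whole set. The function names, the fixed seed and the stand-in review decision are assumptions for illustration; choosing a sample size and reporting a margin of error are separate questions not covered here.

```python
import random


def sample_documents(doc_ids, sample_size, seed=0):
    """Draw a simple random sample without replacement.
    A fixed seed makes the selection reproducible for audit purposes."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), sample_size)


def estimated_prevalence(sample, is_relevant):
    """Proportion of sampled documents judged relevant by `is_relevant`."""
    hits = sum(1 for doc in sample if is_relevant(doc))
    return hits / len(sample)


population = range(1, 10001)  # hypothetical document identifiers
sample = sample_documents(population, 400)
# Stand-in for a human review decision: treat every tenth document as relevant.
print(estimated_prevalence(sample, lambda doc: doc % 10 == 0))
```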
SEARCHABLE IMAGE: An ELECTRONIC IMAGE in which the text-based contents can be searched electronically.
SEMANTIC ANALYSIS: Method by which a number of products conduct clustering. Refers to the “automatic” identification of key words and concepts within a document so that there is a “spine” of a central concept, off which related groups of documents are clustered.
SCANNING: The process of converting a hard copy paper document into a digital image for use in a computer system. Often associated with the OCR process, as in “documents will be scanned and subject to an OCR process”.
SUBJECTIVE CODING: Information held in a litigation support system about records (either paper or electronic). Subjective coding requires legal input and covers initial calls on Relevance, Privilege and Trade Secret as well as case specific issue and matter coding.
TECHNOLOGY ASSISTED REVIEW (TAR): See: Computer Assisted Review (CAR) and Predictive Coding.
TERABYTE (TB): A measure of computer data storage capacity and equivalent to one trillion (1,000,000,000,000) bytes.
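The decimal storage units defined in this glossary (kilobyte, megabyte, gigabyte, terabyte and petabyte) can be converted as sketched below. The function names are assumptions. Some operating systems instead report sizes using multiples of 1,024, which gives slightly different figures.

```python
# Decimal multiples, as defined in this glossary.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def to_bytes(value, unit):
    """Convert a quantity in the given unit to a whole number of bytes."""
    return int(value * UNITS[unit])


def describe(num_bytes):
    """Express a byte count in the largest unit that does not exceed it."""
    for unit in ("PB", "TB", "GB", "MB", "KB"):
        if num_bytes >= UNITS[unit]:
            return f"{num_bytes / UNITS[unit]:.1f} {unit}"
    return f"{num_bytes} bytes"


print(to_bytes(2, "GB"))    # 2000000000
print(describe(1_500_000))  # 1.5 MB
```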
TIF OR TIFF (TAGGED IMAGE FILE FORMAT): One of the most widely supported file formats for storing bit-mapped images. Files in TIFF format often end with a .tif or .tiff extension. Other image file formats include JPG and BMP.
UNATTACHED DOCUMENT: An Electronic Document without a Host Document.
UNSEARCHABLE IMAGE: An Electronic Image in which the text-based contents cannot be searched electronically.
Source: eDisclosure Systems Buyers Guide – 2022 Update