An overview by Rob Robinson
Understanding Electronically Stored Information – The Elements of ESI (1)
What is Electronically Stored Information?
Electronically stored information, or ESI, is described by the Electronic Discovery Reference Model (EDRM) as information stored electronically on many types of media, regardless of the original format in which it was created. (2)
Federal Rules of Civil Procedure 26 and 34, as amended effective December 1, 2006, use the broad term “electronically stored information” to identify a distinct category of information that, along with “documents” and “tangible things,” is subject to discovery rights and obligations. (3)
What is Information?
From a technology perspective, information is defined as the summarization of data. Technically, data are raw facts and figures that are processed into information, such as summaries and totals. But because information can also serve as the raw data for the next job or person, the two terms cannot be precisely distinguished and are often used interchangeably.
What Is Data?
- Factual information, especially information organized for analysis or used to reason or make decisions.
- Computer Science. Numerical or other information represented in a form suitable for processing by a computer.
Data Scope (What is the scope of the data in question?)
- Entity Scope – Entities whose individuals may have been involved in creating, reviewing, or responding to data that may contain information relevant to the matter at hand.
- Custodian Scope – Individuals who may have been involved in creating, reviewing, or responding to data that may contain information relevant to the matter at hand.
- Data Steward Scope – Individuals who have information technology management responsibilities for the entities and individuals determined to be related to the matter at hand, or who maintain access rights to the applications and equipment used by those entities and individuals.
- Geographical Scope – The geographic locales of the entities and individuals that may have been involved in creating, reviewing, and/or responding to communications and/or documents relevant to the matter at hand, as well as the locales of the equipment used to create, transmit, review, and store these communications and/or documents.
- Time Frame Scope – The period of time in which relevant information may have been created, reviewed, or responded to for the matter at hand.
- Volume Scope – The estimated volume of data that may contain relevant information for the matter at hand.
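To make the scoping dimensions concrete, here is a minimal Python sketch of how entity, custodian, and time frame scope might be combined to select a subset of collected items, with total size serving as a rough volume estimate. The `Item` and `Scope` classes, their fields, and the sample data are hypothetical; geographical and data steward scope are left out but could be added as further conditions in the same way.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """One item of potentially relevant ESI (hypothetical fields)."""
    entity: str
    custodian: str
    created: date
    size_bytes: int


@dataclass(frozen=True)
class Scope:
    """Hypothetical scoping criteria for one matter."""
    entities: frozenset[str]
    custodians: frozenset[str]
    start: date  # first day of the relevant time frame
    end: date    # last day of the relevant time frame (inclusive)

    def includes(self, item: Item) -> bool:
        # An item is in scope only if it satisfies every scope dimension.
        return (
            item.entity in self.entities
            and item.custodian in self.custodians
            and self.start <= item.created <= self.end
        )


def apply_scope(items: list[Item], scope: Scope) -> tuple[list[Item], int]:
    """Return the in-scope items and their total size (a rough volume estimate)."""
    selected = [item for item in items if scope.includes(item)]
    return selected, sum(item.size_bytes for item in selected)


items = [
    Item("Acme Corp", "j.doe", date(2008, 3, 14), 120_000),
    Item("Acme Corp", "j.doe", date(2011, 1, 2), 80_000),     # outside the time frame
    Item("Acme Corp", "a.smith", date(2008, 6, 1), 50_000),   # not a named custodian
    Item("Other LLC", "j.doe", date(2008, 5, 5), 30_000),     # outside entity scope
]
scope = Scope(
    entities=frozenset({"Acme Corp"}),
    custodians=frozenset({"j.doe"}),
    start=date(2008, 1, 1),
    end=date(2009, 12, 31),
)
selected, volume = apply_scope(items, scope)
print(len(selected), volume)  # 1 120000
```

Treating each dimension as an independent filter reflects the idea that an item must fall within every agreed scope to be collected. In a real matter, the criteria themselves would come out of the parties' scoping discussions.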
Data Structure (What is the structure of the data?)
- Unstructured – Unstructured data (or unstructured information) refers to masses of (usually) digital information in which not every bit of information has an assigned format and significance. Examples of “unstructured data” may include audio, video, and unstructured text such as the body of an email or a word processor document. Unstructured data represents approximately 80% of enterprise data. (4)
- Structured – Structured data (or structured information) refers to masses of (usually) digital information in which every bit of information has an assigned format and significance. Examples of “structured data” may include a relational database queried with SQL or a spreadsheet such as Excel. Structured data represents approximately 20% of enterprise data.
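The following small Python sketch illustrates the difference. It holds the same facts twice: once as rows in an in-memory SQLite table and once as the free-text body of an email. The table, column names, and message text are invented for illustration.

```python
import sqlite3

# Structured: every value sits in a named column with a declared type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1001, "Acme Corp", 2500.0), (1002, "Globex", 730.5)],
)
# Because each field has an assigned meaning, a precise question is one query.
total = conn.execute(
    "SELECT SUM(amount) FROM invoices WHERE customer = ?", ("Acme Corp",)
).fetchone()[0]

# Unstructured: the same facts embedded in free text, as in an email body.
email_body = (
    "Hi Pat, following up on our call: Acme Corp still owes us "
    "$2,500 on invoice 1001. Can you chase this before Friday?"
)
# Nothing marks which characters are the customer or the amount, so without
# further processing the best available tool is a keyword search.
mentions_acme = "acme corp" in email_body.lower()

print(total, mentions_acme)  # 2500.0 True
```

The structured version answers a precise question with a single query. The unstructured version supports only keyword search until some further processing assigns meaning to parts of the text.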
Data Format (What is the format of the data?)
- Still Image – Images that convey their meaning in visual terms, e.g. pictorial images, photographs, posters, graphs, diagrams, documentary architectural drawings. Formats for such images may be bitmapped (sometimes called raster), vector, or some combination of the two. A bitmapped image is an array of dots (usually called pixels, from picture elements, when referring to screen display), the type of image produced by a digital camera or a scanner. Vector images are made up of scalable objects—lines, curves, and shapes—defined in mathematical terms, often with typographic insertions.
- Sound – Media-independent sound content that can be broken into two format sub-categories. The first sub-category consists of formats that represent recorded sound, often called waveform sound. Such formats are employed for applications like popular music recordings, recorded books, and digital oral histories. The second sub-category consists of formats that provide data to support the dynamic construction of sound through combinations of software and hardware. Such software includes sequencers and trackers that use data that controls when individual sound elements should start and stop, attributes such as volume and pitch, and other effects that should be applied to the sound elements. The sound elements may be short sections of waveform sound (sometimes called samples or loops) or data elements that characterize a sound so that a synthesizer (which may be in software or hardware) or sound generator (usually hardware) can produce the actual sound. The data are brought together when the file is played, i.e., the sounds are generated dynamically at runtime. This second sub-category is sometimes called structured audio.
- Moving Image – A variety of media-independent digital moving image formats and their implementations. Some formats, e.g., QuickTime and MPEG-4, allow for a very wide range of implementations compared to, say, MPEG-2, an encoding format whose possible implementations are relatively more constrained.
- Textual – Works consisting primarily of text.
- Web Archive – Content in formats that hold the results of a crawl of a Web site or set of Web sites, a crawl being the dynamic action of a software package that calls up Web pages and captures them in the form disseminated to users.
- Generic – Content in widely accepted generic formats, including but not limited to specifications for wrappers (e.g., RIFF and ISO_BMFF), bundling formats (e.g., METS and AES-31), and encodings (e.g., UTF-8 and IEEE 754-1985).
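In practice, a file's format is often identified from the signature (“magic”) bytes at the start of its contents rather than from its extension, which can be missing or wrong. The sketch below checks a handful of well-known signatures and maps them onto the categories above. The table is illustrative rather than exhaustive, and the category labels are assumptions made for this example. Processing tools rely on far larger signature libraries.

```python
# Illustrative subset of file signatures ("magic bytes"); not exhaustive.
SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "PDF document (textual)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image (still image, bitmapped)"),
    (b"\xff\xd8\xff", "JPEG image (still image, bitmapped)"),
    (b"GIF87a", "GIF image (still image, bitmapped)"),
    (b"GIF89a", "GIF image (still image, bitmapped)"),
    (b"RIFF", "RIFF container (generic wrapper, e.g. WAV audio or AVI video)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc, .xls)"),
]


def identify(header: bytes) -> str:
    """Return the label of the first signature the header starts with."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


print(identify(b"%PDF-1.7\n"))      # PDF document (textual)
print(identify(b"just some text"))  # unknown
```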
Data State (What is the state of the data?)
- Active State – Active Data is information residing on the hard drives or optical drives of computer systems that is readily visible to the operating system or application software with which it was created and is immediately accessible to users without undeletion, modification, or reconstruction.
- Static State – Static Data (or Archival Data) is information that is not directly accessible to the user of a computer system but that the organization maintains for long-term storage and record keeping purposes. Static data may be written to removable media such as a CD, magneto-optical media, tape or another electronic storage device, or may be maintained on system hard drives in compressed formats.
- Residual State – Residual Data (sometimes referred to as “Ambient Data”) refers to data that is not active on a computer system. Residual data includes data found on media free space; data found in file slack space; and data within files that has functionally been deleted in that it is not visible using the application with which the file was created, without the use of undelete or special data recovery techniques.
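File slack arises because file systems allocate storage in fixed-size clusters. The unused tail of a file's last cluster may still hold fragments of data previously written there. The simplified calculation below assumes a 4,096-byte cluster size and ignores file-system features such as small files stored inline in metadata records.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes allocated to a file but not used by its contents (file slack).

    Simplified model (an assumption): the file occupies whole clusters, so any
    remainder in its last cluster is slack. Inline, sparse, or compressed
    storage is ignored.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file size must be non-negative and cluster size positive")
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size - file_size


print(slack_bytes(10_000))  # 2288: three clusters (12,288 bytes) minus 10,000
```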
Data Network (How does one “Connect” to the data?)
- Non-Networked: Data resides on a device that is not connected to any network of computers.
- Personal Area Network (PAN): A personal area network (PAN) is a computer network used for communication among computer devices close to one person. Some examples of devices that may be used in a PAN are printers, fax machines, telephones, PDAs, or scanners. The reach of a PAN is typically within about 20-30 feet (approximately 6-9 meters). PANs can be used for communication among the individual devices (intrapersonal communication), or for connecting to a higher-level network and the Internet (an uplink).
- Local Area Network (LAN): A network covering a small geographic area, like a home, office, or building. Current LANs are most likely to be based on Ethernet technology.
- Wireless Local Area Network (WLAN): A wireless distribution method for two or more devices that use high-frequency radio waves and often include an access point to the Internet.
- Campus Area Network (CAN): A network that connects two or more LANs but that is limited to a specific and contiguous geographical area such as a college campus, industrial complex, or a military base. A CAN may be considered a type of MAN (metropolitan area network), but is generally limited to an area that is smaller than a typical MAN.
- Metro Area Network (MAN): A Metropolitan Area Network is a network that connects two or more Local Area Networks or Campus Area Networks but does not extend beyond the boundaries of the immediate town, city, or metropolitan area. Multiple routers, switches & hubs are connected to create a MAN.
- Wide Area Network (WAN): A WAN is a data communications network that covers a relatively broad geographic area (e.g., from one city to another or from one country to another) and that often uses transmission facilities provided by common carriers, such as telephone companies.
- InterNetwork: Two or more networks or network segments connected using devices that operate at layer 3 (the ‘network’ layer) of the OSI Basic Reference Model, such as a router. Any interconnection among or between public, private, commercial, industrial, or governmental networks may also be defined as an internetwork. In modern practice, the interconnected networks use the Internet Protocol. There are at least three variants of an internetwork, depending on who administers and who participates in them:
+ Intranet: An intranet is a set of interconnected networks that uses the Internet Protocol and IP-based tools such as web browsers and is under the control of a single administrative entity. That administrative entity closes the intranet to the rest of the world and allows access only to specific users. Most commonly, an intranet is the internal network of a company or other enterprise.
+ Extranet: An extranet is a network or internetwork that is limited in scope to a single organization or entity but which also has limited connections to the networks of one or more other usually, but not necessarily, trusted organizations or entities (e.g. a company’s customers may be given access to some part of its intranet creating in this way an extranet, while at the same time the customers may not be considered ‘trusted’ from a security standpoint). Technically, an extranet may also be categorized as a CAN, MAN, WAN, or another type of network, although, by definition, an extranet cannot consist of a single LAN; it must have at least one connection with an external network.
+ “The Internet”: A specific internetwork consisting of a worldwide interconnection of governmental, academic, public, and private networks based upon the Advanced Research Projects Agency Network (ARPANET) developed by ARPA of the U.S. Department of Defense. It is also home to the World Wide Web (WWW) and is referred to as the ‘Internet’ with a capital ‘I’ to distinguish it from other generic internetworks.
Intranets and extranets may or may not have connections to the Internet. If connected to the Internet, the intranet or extranet is generally protected from being accessed from the Internet without proper authorization. The Internet itself is not considered to be a part of the intranet or extranet, although the Internet may serve as a portal for access to portions of an extranet.
Data Storage Network (How does one get to Active State data?)
- Direct Attached Storage (DAS): Direct-attached storage (DAS) refers to a digital storage system directly attached to a server or workstation, without a storage network in between. It is a retronym, mainly used to differentiate non-networked storage from SAN and NAS.
- Network-Attached Storage (NAS): Network-attached storage (NAS) is file-level computer data storage connected to a computer network, providing data access to heterogeneous network clients.
- Storage Area Network (SAN): A storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that, to the operating system, the devices appear as locally attached.
Data Storage Media (How does one maintain the Static State data?)
- Semiconductor-Based Storage Media (Memory Cards, USB Flash Drives, PDAs, Digital Audio Players, Digital Cameras, Mobile Phones, Copiers, Solid-State Hard Drives)
- Magnetic-Based Storage Media (Floppy Disk, Hard Disk, Magnetic Tape)
- Optical and Magneto-Optical Storage Media (CD, CD-ROM, DVD, BD-R, BD-RE, HD DVD, CD-R, DVD-R, DVD+R, CD-RW, DVD-RW, DVD+RW, DVD-RAM, UDO)
Data Volume (How much data will be acted upon?)
- Uncompressed – Data not having undergone a process of transformation from one representation to another, smaller representation from which the original, or a close approximation to it, can be recovered.
- Compressed – Data having undergone a process of transformation from one representation to another, smaller representation from which the original, or a close approximation to it, can be recovered.
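A lossless compressor such as DEFLATE, which is available in Python's standard `zlib` module, recovers the original exactly. Lossy formats such as JPEG or MP3 recover only a close approximation. The sketch below shows a lossless round trip on sample text. The text is made up, and compression ratios for real collections vary widely.

```python
import zlib

# Sample text standing in for a document collection; repetitive content compresses well.
original = ("Meet and confer agenda: preservation, custodians, date ranges.\n" * 200).encode("utf-8")

compressed = zlib.compress(original, level=9)  # lossless DEFLATE compression
restored = zlib.decompress(compressed)          # expand back to the original bytes

print(f"{len(original)} bytes -> {len(compressed)} bytes")
assert restored == original  # lossless: the exact original is recovered
```

For volume estimates, this has a practical consequence: the size of a compressed archive can understate the amount of data that will need to be processed once it is expanded.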
Data Encryption (Is the data encrypted?)
- Data Not Encrypted – Data that has not undergone a procedure that renders the contents of a computer message or file unintelligible to anyone not authorized to read it; its contents can be read without a decryption key.
- Data Encrypted – Data having undergone a procedure that renders the contents of a computer message or file unintelligible to anyone not authorized to read it. The data is encoded mathematically with a string of characters called a data encryption key.
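Python's standard library does not include a general-purpose cipher. To keep the example self-contained, the sketch below uses a one-time pad: each byte is XORed with a random key byte, and XORing again with the same key restores the original. This illustrates the central point of the definitions above, namely that the ciphertext is unintelligible without the key. It is not a recommendation. Practical systems use vetted algorithms such as AES, and a one-time pad key must be truly random, as long as the message, and never reused.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key (one-time pad)."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "Privileged and confidential: draft settlement terms".encode("utf-8")
key = secrets.token_bytes(len(plaintext))  # random key, used once, kept secret

ciphertext = xor_bytes(plaintext, key)  # unintelligible without the key
recovered = xor_bytes(ciphertext, key)  # XOR with the same key reverses it

assert recovered == plaintext
```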
Data Code Format (What capabilities will be needed to display information?)
- Unicode Support – The data uses Unicode, which provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.
- Non-Unicode Support – The data uses a character encoding, such as a legacy single-byte code page, that does not provide a unique number for every character regardless of platform, program, or language.
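The following short Python sketch shows the practical difference. Every character has a single Unicode code point, and UTF-8 can encode any of them, whereas a legacy single-byte code page such as Windows-1252 (`cp1252`) covers only a limited repertoire. The sample string is arbitrary.

```python
import unicodedata


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be represented in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "Müller, 東京, €"

# Unicode gives each character its own code point, whatever the platform or program.
for ch in "ü東€":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print(can_encode(sample, "utf-8"))   # True: UTF-8 covers all Unicode characters
print(can_encode(sample, "cp1252"))  # False: this Western European code page has no CJK characters
```

Characters that cannot be represented in the target encoding can become garbled when ESI moves between systems, which is why display capabilities matter.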
Data Output (How will data reports or files be provided to requestor?)
- Native File Formats: Producing files in the format in which they were created and maintained is known as native production. In a native production, MS Word documents are produced as .doc files, MS Excel files are produced as .xls files, Adobe files are produced as .pdf files, and so on. Native format is often recommended for files that were not created for printing, such as spreadsheets and small databases. For some file types, the native format may be the only way to adequately produce the documents.
- Near-Native Formats: Some files, including most e-mail, cannot be reviewed for production and/or produced without some form of conversion. Most e-mail files must be extracted and converted into individual files for document review and production. As a result, the original format is altered and the files are no longer in native format. There is no standard format for near-native file productions. Files are typically converted to a structured text format such as .html or .xml. These formats do not require special software for viewing. Other common e-mail formats include .msg and .eml.
- Near-Paper Formats: ESI can also be produced in a near-paper format. Rendering an image is the process of converting ESI or scanning paper into a non-editable digital file. During this process, a “picture” is taken of the file as it exists or would exist in paper format. Depending on the print settings in the document, the printer, or the computer, data can be altered or missing from the image. Expertise in electronic discovery and in image rendering tools is necessary to minimize this risk.
- Paper Formats: A paper production is just what it sounds like: paper is produced as paper or ESI is printed to paper and the paper is produced. As with converting to image, printing documents to paper can result in missed or altered data. When producing ESI in paper, it is recommended to utilize someone with expertise in the field of e-discovery and image rendering tools to minimize this risk during the printing or image rendering process.
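The following minimal Python sketch illustrates a near-native conversion. It builds a sample message in .eml (RFC 5322) form, extracts its header fields and plain-text body with the standard `email` package, and renders the result as simple HTML. The addresses, message, and record layout are hypothetical. Real collections also involve attachments, HTML bodies, and proprietary containers such as .msg or PST files, none of which this sketch handles.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html import escape


def build_sample_eml() -> bytes:
    """Create a small message as it might be exported to an .eml file (hypothetical data)."""
    msg = EmailMessage()
    msg["From"] = "pat@example.com"
    msg["To"] = "lee@example.com"
    msg["Subject"] = "Invoice 1001"
    msg["Date"] = "Mon, 02 Mar 2009 10:15:00 -0500"
    msg.set_content("Acme Corp still owes $2,500 on invoice 1001.")
    return msg.as_bytes()


def eml_to_review_record(raw: bytes) -> dict[str, str]:
    """Extract header fields and the plain-text body into a flat review record."""
    msg = message_from_bytes(raw, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "subject": str(msg["Subject"]),
        "date": str(msg["Date"]),
        "body": body,
    }


def record_to_html(record: dict[str, str]) -> str:
    """Render the record as a simple HTML table viewable in any browser."""
    rows = "".join(
        f"<tr><th>{escape(k)}</th><td>{escape(v)}</td></tr>" for k, v in record.items()
    )
    return f"<table>{rows}</table>"


record = eml_to_review_record(build_sample_eml())
print(record["subject"])  # Invoice 1001
print(record_to_html(record))
```

The conversion changes the file's form, so the output is no longer native, but the fields a reviewer needs stay readable without special software.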
Data Storage Requirements (How will the data be stored after being acted upon?)
- Hot – Data is stored in an active state and is immediately accessible to end users.
- Warm – Data is stored in an active state but is not immediately accessible to end users.
- Cold – Data is stored in a static state.
- Destruct – Data is destroyed.
As one begins to understand these storage dispositions, one can start to assign economic values (time/money) to the potential approaches for getting the data and making it available to all parties involved in a specific matter. Whether those values are general and subjective or specific and objective, understanding them can also give the parties an informed basis for discussing whether ESI is accessible or not reasonably accessible from a case-specific legal perspective.
ESI Technology Focus Framework
- Creation – enables the creation of ESI.
- Connectivity – infrastructure that connects communication and storage nodes of ESI.
- Communication – enables the dissemination of and collaboration on ESI.
- Conduct (Management) – enables functional area management of ESI.
Understanding Electronically Stored Information – Examination
What are the general categories of ESI in relation to ESI examination?
- Accessible*: “Information deemed ‘accessible’ is stored in a readily usable format.” (Zubulake v. UBS Warburg)
- Not Reasonably Accessible: Information not stored in a reasonably usable format.
What are the potential preservation and production implications of accessible/not reasonably accessible ESI?
- Accessible – Need to preserve and to produce.
- Not Reasonably Accessible – Need to preserve and to understand the requirements for production.
What electronic media may need to be examined for ESI*?
- Active, Online Data (Typically Accessible)
- Nearline Data (Typically Accessible)
- Offline Storage/Archives (Sometimes Accessible, Sometimes Not Reasonably Accessible)
- Backup Tapes (Typically Not Reasonably Accessible)
- Erased, Fragmented, or Damaged Data (Typically Not Reasonably Accessible)
* The continuum of accessibility is not defined in the FRCP, as it may change over time.
Understanding Electronically Stored Information – Experts
Why is there a need to identify eDiscovery experts for investigations and litigation?
- Allows for the selection of a primary eDiscovery advisor for the responsible attorney of record.
- Allows for the selection of an eDiscovery liaison/facilitator for coordination and communication.
- Allows for the proactive selection and training of Rule 30(b)(6) experts.
Who might be selected as an eDiscovery expert/liaison?
- Attorney (In-House or Outside Counsel)
- Court Appointed Special Master
- Third Party Consultant
- Company/Organization Employee
What are the typical characteristics of an expert eDiscovery liaison?
- Technical familiarity with party’s electronic systems and capabilities.
- Technical understanding of eDiscovery.
- Familiarity with and ability to establish “chain of custody” for all ESI.
- Prepared to participate in eDiscovery dispute resolutions and litigation.
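One widely used way to support chain of custody is to record a cryptographic hash, such as SHA-256, of each item when it is collected and again at each hand-off. Matching values indicate that the content has not changed in between. In the minimal Python sketch below, the log fields and handler names are hypothetical.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_entry(path: Path, handler: str, action: str) -> dict[str, str]:
    """Record who did what to which file, when, and the file's hash at that moment."""
    return {
        "file": str(path),
        "sha256": sha256_of_file(path),
        "handler": handler,
        "action": action,
        "utc_time": datetime.now(timezone.utc).isoformat(),
    }


def unchanged(first: dict[str, str], later: dict[str, str]) -> bool:
    """True if the file's hash is identical at both custody events."""
    return first["sha256"] == later["sha256"]


# Usage: hash a sample file at collection and again when a vendor receives it.
with tempfile.TemporaryDirectory() as tmp:
    evidence = Path(tmp) / "memo.txt"
    evidence.write_bytes(b"Q3 pricing memo")
    collected = custody_entry(evidence, "collector A", "collected")
    received = custody_entry(evidence, "vendor B", "received")
    print(unchanged(collected, received))  # True
```

A hash log records only the integrity of the content. The surrounding custody documentation (who held the item, where, and under what controls) still has to be maintained separately.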
Content originally developed to provide a general overview of electronically stored information for those conducting Federal Rule 26(f) “Meet and Confer” planning.
(1) Robinson, R. (2009). Considering Meet and Confer. 1st ed. [ebook] Orange Legal Technologies. Available at: https://www.jdsupra.com/legalnews/44-page-considering-meet-and-confer-eb-54415/ [Accessed 25 Aug. 2018].
(2) Edrm.net. (2018). ESI / Electronically Stored Information. [online] Available at: https://www.edrm.net/glossary/esi-electronically-stored-information/ [Accessed 25 Aug. 2018].
(3) Hedges, R., Rothstein, B. and Wiggins, E. (2017). Managing Discovery of Electronic Information. 3rd ed. [ebook] Washington, DC: Federal Judicial Center, p.3. Available at: https://www.fjc.gov/sites/default/files/2017/Managing_Discovery_of_Electronic_Information_3d_ed.pdf [Accessed 25 Aug. 2018].
(4) Rizkallah, J. (2017). The Big (Unstructured) Data Problem. [online] Forbes.com. Available at: https://www.forbes.com/sites/forbestechcouncil/2017/06/05/the-big-unstructured-data-problem/#54562bc9493a [Accessed 25 Aug. 2018].