From LegalTech NY 2010: Backup is for Recovery, Archiving is for Discovery

This post is one of several summarizing our coverage of LegalTech New York 2010.


Reported by:  Alexis Gambetta /The Posse List

The functions of managing record retention and disposition, electronic discovery, data privacy, audit trails, and the like are inter-related. In fact, they are more than just inter-related: they are part of an emerging underlying concept, information governance. Organizations need to avoid siloing these governance functions into separate solutions. Instead, they should look at them holistically and integrate them into unified corporate information governance programs.

That was the big take-away from the double session on backup, archiving, and information management. But the discussion was more nuanced than that, and the panel covered a lot of territory.

The panel: Annie Goranson (Discovery Attorney, Symantec Corporation); George Socha (Socha Consulting); Hon. Ron Hedges; Denise Backhouse (Associate, Morgan Lewis); Mikki Tomlinson (Litigation Support Manager, Chesapeake Energy); and Jonathan Moskin (Partner, Foley & Lardner).

First, some concepts and definitions. Backup and archiving are two distinct processes with different objectives and requirements. With backup, the objective is to ensure that a recent copy of production data is available for recovery in the event of a disaster, outage, or accidental loss. The process is to make a copy of production system data and store it until it is overwritten by a new version. With digital archiving, the objective is to enable the long-term retention and management of digital assets to satisfy regulatory compliance, audit, litigation support, records management, data management and new business process requirements. The process is to remove records from production systems, and to preserve them and keep them available for easy access and reference until the retention period has expired or the data no longer has business value.

Backup

Backup technologies have long provided effective recovery options for systems subject to data loss from human error, hardware failure or major natural disasters. They are ideally suited for quick restoration of large amounts of lost information and can return complete systems to full operational capacity in a short period of time. However, backup is also a major pain point for storage administrators. Massive amounts of data can strain the ability of backup infrastructures to keep up. Still, the time required to back up data is shrinking, and the ability to quickly restore information has significantly improved. A walk around the vendor floors would have told you that.

However, these technologies will be only stopgap measures if the uncontrolled growth in the amount of data requiring backup isn’t curtailed. This becomes a real danger when a company treats backup as a single solution for both data protection and data retention, resulting in highly ineffective and inefficient data management.

File Archiving

By introducing file archiving, corporations can improve their service levels for backup and recovery while reducing backup costs. File archiving can also meet regulatory requirements for data retention, managing files with complete knowledge of the file system and document metadata, as well as knowledge of the files’ content. A file archiving system moves or copies files according to the value of the actual content. It also finds and retrieves individual files based on their content and metadata, which could include any number of parameters, including author, date and customized tags such as “audit” or “Sarbanes-Oxley.”

To effectively manage data, file archiving systems discover all files on a network and provide an inventory of unstructured data. During the discovery process, the systems collect file system metadata and extract file contents, building a foundation for data classification and application of information governance policies.
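
To make the discovery step concrete, here is a minimal sketch in Python of building such an inventory for a single directory tree. It is only an illustration, not how any particular archiving product works: the `FileRecord` structure, the `build_inventory` function and the `TEXT_EXTENSIONS` list are all hypothetical names chosen for this example, and content extraction is limited to plain-text files.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    modified_ts: float  # last-modified time, seconds since the epoch
    extension: str
    text: str           # extracted content; empty if the file was not read


# Assumption for this sketch: only plain-text formats are read. A real system
# would need format-specific extractors for office documents, email, etc.
TEXT_EXTENSIONS = {".txt", ".csv", ".md"}


def build_inventory(root: str) -> list[FileRecord]:
    """Walk a directory tree and record metadata and content for each file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            stat = full.stat()
            ext = full.suffix.lower()
            text = ""
            if ext in TEXT_EXTENSIONS:
                text = full.read_text(encoding="utf-8", errors="replace")
            records.append(
                FileRecord(str(full), stat.st_size, stat.st_mtime, ext, text)
            )
    return records
```

The resulting list holds both file system metadata (size, modification time, extension) and extracted content, which is the raw material a classification or governance policy would work from.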

A file archiving system must provide the following capabilities:

— Be content-aware. For example, it should index the content in the documents, not only the file system metadata.

— Populate customized metadata tags by extracting information from content.

— Prune production storage by using policies to archive information to the appropriate tiered storage level.

— Archive a subset of data (defined by archival policies) selectively to meet regulatory compliance and corporate information governance rules.

—  Provide quick access to archived data.
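
As a rough illustration of the first four capabilities, the Python sketch below tags documents based on their content and then uses those tags to pick a storage tier. Everything in it is an assumption made for the example: the tagging phrases in `TAG_RULES`, the tier names in `TIER_BY_TAG`, and the `Document` structure are hypothetical and do not come from the panel or from any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical tagging rules: tag name -> phrases that trigger it.
TAG_RULES = {
    "audit": ["audit"],
    "Sarbanes-Oxley": ["sarbanes-oxley", "sox compliance"],
}

# Hypothetical archival policy: tag -> storage tier. Dictionary order sets
# priority, so compliance material wins over general audit material.
TIER_BY_TAG = {
    "Sarbanes-Oxley": "compliance-tier",
    "audit": "nearline-tier",
}


@dataclass
class Document:
    name: str
    author: str
    content: str
    tags: set = field(default_factory=set)


def tag_document(doc: Document) -> Document:
    """Populate custom tags by looking inside the content, not just metadata."""
    lowered = doc.content.lower()
    for tag, phrases in TAG_RULES.items():
        if any(phrase in lowered for phrase in phrases):
            doc.tags.add(tag)
    return doc


def choose_tier(doc: Document):
    """Return the tier for the first matching policy, or None to leave the
    file on production storage."""
    for tag, tier in TIER_BY_TAG.items():
        if tag in doc.tags:
            return tier
    return None


def find_by_tag(docs, tag: str) -> list:
    """Retrieve the names of documents carrying a given tag."""
    return [d.name for d in docs if tag in d.tags]
```

Simple phrase matching stands in here for real content indexing. The point is only that archiving decisions are driven by what a file contains, and that only the tagged subset is archived.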

So file archiving and backup systems have two distinct and complementary functions within an enterprise: backup for high-speed copy and restore to minimize the impact of failures, human error or disaster; and file archiving to effectively manage data for retention and long-term access and retrieval.

But the two can intersect

Backup and archiving processes can intersect at two specific points. First, IT should archive inactive data to free up capacity on primary storage and servers, and reduce the amount of data that needs to be backed up regularly from these systems. If the data being protected is old, unchanging or rarely accessed, but still needs to be retained, there’s no reason to keep the information on production servers and storage. That data can be archived and moved to lower cost storage where it will still be accessible. This takes the aged data out of recurring backup operations. Organizations can complete backups much faster and save money on tertiary media by archiving. The brute-force alternative is to simply delete old data from primary systems. However, this would put an organization at risk of being out of compliance with regulations and limit the opportunity to leverage the information for other business purposes.
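
The first intersection point can be sketched in a few lines of Python: split files into active and inactive sets by how long they have gone unaccessed, then estimate how much of the recurring backup volume archiving would remove. The `FileInfo` structure, the function names and the idle threshold are hypothetical choices for this illustration; a real policy would likely weigh retention rules and content as well as age.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_access_ts: float  # seconds since the epoch


def split_by_activity(files, now_ts: float, max_idle_days: int):
    """Return (active, inactive) lists based on time since last access."""
    cutoff = now_ts - max_idle_days * SECONDS_PER_DAY
    active = [f for f in files if f.last_access_ts >= cutoff]
    inactive = [f for f in files if f.last_access_ts < cutoff]
    return active, inactive


def backup_reduction(files, inactive) -> float:
    """Fraction of recurring backup volume removed by archiving `inactive`."""
    total = sum(f.size_bytes for f in files)
    if total == 0:
        return 0.0
    return sum(f.size_bytes for f in inactive) / total
```

Files in the inactive list are candidates to move to lower-cost storage rather than to delete, which keeps them available for compliance and other business purposes.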

The second point of intersection involves adding information archive systems to the backup schema for data protection purposes. Efficient archiving mandates that the data doesn’t reside anywhere else (because it was moved from primary systems). As such, IT must back up the archive system as part of the backup schema so that archived data is also protected appropriately.

Issues around accessibility and inaccessibility of data

FRCP 26(b)(2)(B) states:

(B) Specific Limitations on Electronically Stored Information. A party need not provide discovery of electronically stored information from sources that the party identifies as not reasonably accessible because of undue burden or cost. On motion to compel discovery or for a protective order, the party from whom discovery is sought must show that the information is not reasonably accessible because of undue burden or cost. If that showing is made, the court may nonetheless order discovery from such sources if the requesting party shows good cause, considering the limitations of Rule 26(b)(2)(C). The court may specify conditions for the discovery.

So how are these accessible/not accessible issues examined? A documented accessibility analysis is essential to back up a litigant’s or investigator’s determination that data is inaccessible: failure to maintain such documentation can be extremely costly to both you and your client. The Federal Rules of Civil Procedure require a thorough accessibility assessment in order to successfully claim that data is inaccessible. Once the responding party has shown that a source is not reasonably accessible, Rule 26(b)(2)(B) permits discovery from that source only where the requesting party meets its burden of showing good cause for the discovery, subject to the limitations of Rule 26(b)(2)(C).

In practice, the Rule means that a party is not required to respond to requests for inaccessible electronic information and produce it at any cost. (The text of Rule 26 is available at http://www.law.cornell.edu/rules/frcp/Rule26.htm).

Six common types of electronic media ranging from the most accessible to the least accessible include:

* Active online data, usually magnetic disks, used in the most active stages of the electronic record’s life, such as computer hard drives (active data);

* Removable magnetic and optical media (active data);

* Cell phones, iPods, Personal Digital Assistants (PDAs) (active data);

*  Offline storage/archives used for disaster recovery (inactive data);

*  Backup tapes or compressed data requiring sequential access (generally, backup tapes are not organized for retrieval of individual documents) (inactive data); and

*  Erased, fragmented or damaged data that are only retrievable using sophisticated forensic tools and professionals (inactive data).

But these rules are changing as the technology has become more advanced. The analysis of accessibility depends on the software application used to store and/or archive the data, the operating system, the presence or absence of encryption, the data format, existing records (i.e. an index of the media content), age, and storage method. Courts and industry guidelines have tried to classify data into two categories, active and inactive, the latter being (usually) considered inaccessible. Sedona Principle 8 provides that: “The primary source of electronically stored information for production should be active data and information. Resort to disaster recovery backup tapes and other sources of electronically stored information that are not reasonably accessible requires the requesting party to demonstrate need and relevance that outweigh the costs and burdens of retrieving and processing the electronically stored information from such sources, including the disruption of business and information management activities.” See Redgrave, Jonathan; The Sedona Principles (Second Edition) Addressing Electronic Document Production (2007 The Sedona Conference) (Available at http://www.thesedonaconference.org/dltForm?did=TSC_PRINCP_2nd_ed_607.pdf).

The panel also suggested some cases:

Phillip M. Adams & Associates, L.L.C. v. Dell, Inc.: The court held that the defendant’s practice of having employees archive email did “not establish the good-faith nature” of the defendant’s data management practices. Also, Fed. R. Civ. P. 37(e) did not provide a “safe harbor” for email apparently not saved by employees, because the fact that the defendant had backed up servers on which it stored financial data demonstrated that the defendant “does know how to protect data it regards as important.”

Capitol Records v. MP3tunes: During the course of discovery in this copyright infringement case, several disputes arose related to the burdensome nature of the parties’ respective requests for production. The battle went back and forth, with one party claiming it had ineffective search software. The court stated: “The day undoubtedly will come when burden arguments based on a large organization’s lack of internal ediscovery software will be received about as well as the contention that a party should be spared from retrieving paper documents because it had filed them sequentially, but in no apparent groupings, in an effort to avoid the added expense of file folders or indices.  Nonetheless, at this stage in the development of ediscovery case law, the Court cannot say that the EMI Labels’ failure to acquire such software and to configure its systems to permit centralized email searches means that its burdensomeness arguments should be disregarded.  I therefore conclude that the EMI Labels’ email files that MP3tunes seeks to search are not reasonably accessible within the meaning of Rule 26(b)(2)(B)”.

Social media and e-discovery

The panel ended with a discussion of the increase in Web 2.0 activity. If anything, social media are more easily discoverable than just about any other form of user-generated content, but these content sources are fraught with privacy and technical issues. The big issue, though, is enterprise applications. As has been reported, companies are increasingly moving to advanced enterprise social media platforms as a way of improving internal collaboration and making projects run more smoothly and effectively. Because such enterprise platforms are often used on a company’s most important and strategic projects, robust e-discovery capabilities are becoming critical to forward-thinking enterprises: internal blog, wiki, and discussion content needs to be captured and placed into a format that can be searched seamlessly along with more traditional documents. Over time, it will become a requirement for e-discovery platforms to integrate with enterprise social media products.