Data Culling Services for Electronically Stored Information (ESI)

Typically, the amount of electronically stored information involved in discovery is unmanageable for any review team. At Electronic Legal we pride ourselves on working with our clients to develop a culling strategy that best fits their needs. We have perfected best-of-breed techniques so that our clients can worry less about the technical details and concentrate on what they are experts at: being attorneys. Below are descriptions of some of the most popular culling services we offer.

Data Culling Services:
Search Terms Filtering
Date Range Filtering
File Type Filtering
DeDuplication
Email DeDuplication
De-NISTing

Search Terms Filtering
Developing an effective keyword list is one of the first steps toward giving your review team a review set limited to potentially responsive data without overlooking critical information. Electronic Legal will help you develop a keyword list that makes the best use of your team’s time by producing review sets that are as relevant as possible.
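As a rough illustration of how a keyword list narrows a review set, the sketch below flags documents that contain at least one search term. It is a minimal sketch, not our production tooling: the function name `filter_by_terms`, the document IDs and the case-insensitive, whole-word matching rule are all assumptions for illustration.

```python
import re

def filter_by_terms(documents, terms):
    """Return {doc_id: sorted matched terms} for documents hitting any term.

    Hypothetical helper. Matching is case-insensitive and whole-word, so
    "merger" does not hit "mergers". Real review platforms also support
    wildcards, proximity and Boolean syntax, which this sketch omits."""
    if not terms:
        return {}
    # One alternation pattern; re.escape treats each term as literal text.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = {}
    for doc_id, text in documents.items():
        found = {m.group(0).lower() for m in pattern.finditer(text)}
        if found:
            hits[doc_id] = sorted(found)
    return hits


docs = {
    "DOC-001": "Please review the Merger Agreement draft.",
    "DOC-002": "Lunch on Friday?",
    "DOC-003": "Pricing for the merger is attached.",
}
print(filter_by_terms(docs, ["merger", "pricing"]))
# {'DOC-001': ['merger'], 'DOC-003': ['merger', 'pricing']}
```

Recording which terms hit each document, rather than a simple yes/no, makes it easier to see which terms are pulling in noise when the list is refined.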

Date Range Filtering
Although electronically stored information has brought headaches to the legal industry, it has also brought some efficiencies. Discovery requests are often limited to specific periods of time. If your team decides that documents outside a specific date range are irrelevant, we can exclude every document that falls outside that range.
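A minimal sketch of date range filtering follows. It assumes each document record carries an `"id"` and a single `"date"` value (hypothetical fields; in practice this might be the sent date for e-mail or a last-modified date for loose files), and it treats the range as inclusive at both ends. Undated documents are set aside in their own list rather than silently dropped.

```python
from datetime import date

def filter_by_date_range(documents, start, end):
    """Split documents into (in_range, out_of_range, undated) ID lists.

    Hypothetical record shape: {"id": ..., "date": datetime.date or None}.
    The range [start, end] is inclusive at both ends."""
    in_range, out_of_range, undated = [], [], []
    for doc in documents:
        doc_date = doc.get("date")
        if doc_date is None:
            undated.append(doc["id"])  # flag for a human decision
        elif start <= doc_date <= end:
            in_range.append(doc["id"])
        else:
            out_of_range.append(doc["id"])
    return in_range, out_of_range, undated


docs = [
    {"id": "A", "date": date(2019, 3, 1)},
    {"id": "B", "date": date(2021, 7, 15)},
    {"id": "C", "date": None},
    {"id": "D", "date": date(2020, 1, 1)},
]
print(filter_by_date_range(docs, date(2019, 1, 1), date(2020, 1, 1)))
# (['A', 'D'], ['B'], ['C'])
```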

File Type Filtering
The extension of a file, such as .doc, .pdf or .xls, does not tell the whole story. Files do not always carry the correct extension, whether they were misnamed by mistake or renamed intentionally. At Electronic Legal we examine each file’s actual binary content to determine its true type. This process helps assure your firm that no vital documents were left out of your review: when you request all Excel spreadsheets for your review, they will all be there. Don’t let another company’s lack of processing expertise expose you or your client to sanctions.
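To illustrate what "looking at the true binary data" can mean, the sketch below checks a file's leading bytes (its "magic number") against a few well-known signatures and flags files whose extension disagrees with their content. It is a simplified sketch under stated assumptions: the signature table covers only PDF, the legacy Office (OLE2) container and the ZIP container used by .docx/.xlsx/.pptx, and the helper names are hypothetical.

```python
import os

# Leading byte signatures ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # legacy .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),  # container for .docx/.xlsx/.pptx
]

# Container each extension should have. Telling .xlsx from .docx would
# require looking inside the ZIP, which this sketch does not do.
EXPECTED = {
    "pdf": "pdf",
    "doc": "ole2", "xls": "ole2", "ppt": "ole2",
    "docx": "zip", "xlsx": "zip", "pptx": "zip",
}

def sniff_type(header: bytes) -> str:
    """Identify a file by its first bytes instead of its name."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return "unknown"

def extension_mismatch(filename: str, header: bytes) -> bool:
    """True if the file's content does not match what its extension claims."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    expected = EXPECTED.get(ext)
    if expected is None:
        return False  # extension not covered by this sketch
    return sniff_type(header) != expected

def read_header(path: str, size: int = 8) -> bytes:
    """Read just enough bytes from disk to check the signatures above."""
    with open(path, "rb") as f:
        return f.read(size)


ole2 = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
print(extension_mismatch("budget.pdf", ole2))  # True: really an Office file
print(extension_mismatch("budget.xls", ole2))  # False
```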

DeDuplication
DeDuplication is a technique used to identify and segregate files that are exact duplicates. Why waste your team’s time reviewing the same document ten or more times? Deduplication becomes even more important when there are multiple copies of the same privileged document, where reviewing each copy separately risks inconsistent privilege calls.

Each electronic file has a digital fingerprint, known as a hash value, that is effectively unique to its contents. The hash is generated by applying one of two mathematical algorithms, MD5 or SHA-1, which produce a 128-bit or a 160-bit identifier respectively. These values are what we use to determine exact duplicates: files with the same hash are treated as identical, regardless of their names.
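The hashing step can be sketched with Python's standard `hashlib` module, which provides both MD5 and SHA-1. This is a minimal sketch: the file names and contents are invented, and grouping on MD5 alone is a simplifying assumption.

```python
import hashlib
from collections import defaultdict

def file_hashes(data: bytes) -> tuple[str, str]:
    """Return the MD5 (128-bit) and SHA-1 (160-bit) hex digests of a file's bytes."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents hash identically.

    The hash covers content only, so renamed copies still match."""
    groups = defaultdict(list)
    for name, data in files.items():
        md5_hex, _sha1_hex = file_hashes(data)
        groups[md5_hex].append(name)
    # Only groups with more than one member contain duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


files = {
    "memo_v1.docx": b"Quarterly results",
    "copy_of_memo.docx": b"Quarterly results",
    "notes.txt": b"Different text",
}
print(group_duplicates(files))  # [['copy_of_memo.docx', 'memo_v1.docx']]
```

A 128-bit digest prints as 32 hexadecimal characters and a 160-bit digest as 40, which is a quick way to tell the two apart in a load file.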

File duplication can occur within a single custodian’s data (for example, one person’s e-mail) or across entire infrastructures of file servers, e-mail servers and other computers. To account for these two scenarios, we offer two types of deduplication:

• Vertical deduplication locates duplicates within the records and data of a single custodian, and

• Horizontal deduplication applies globally across all custodians.
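The difference between the two scopes comes down to what counts as "already seen": a hash per custodian (vertical) or a hash across the whole collection (horizontal). The sketch below assumes items arrive as hypothetical `(custodian, doc_id, content)` tuples and that the first copy encountered is the one kept. It also records which kept copy each suppressed document duplicated, so it remains possible to report every custodian who held a copy.

```python
import hashlib

def deduplicate(items, scope="horizontal"):
    """Return (kept_ids, suppressed) for (custodian, doc_id, content) items.

    scope="vertical": a document is a duplicate only within its custodian.
    scope="horizontal": a document is a duplicate across all custodians.
    `suppressed` maps each removed doc_id to the kept doc_id it duplicates."""
    if scope not in ("vertical", "horizontal"):
        raise ValueError("scope must be 'vertical' or 'horizontal'")
    seen = {}
    kept, suppressed = [], {}
    for custodian, doc_id, content in items:
        digest = hashlib.md5(content).hexdigest()
        key = (custodian, digest) if scope == "vertical" else digest
        if key in seen:
            suppressed[doc_id] = seen[key]
        else:
            seen[key] = doc_id
            kept.append(doc_id)
    return kept, suppressed


items = [
    ("Smith", "S1", b"Board minutes"),
    ("Smith", "S2", b"Board minutes"),
    ("Jones", "J1", b"Board minutes"),
    ("Jones", "J2", b"Travel policy"),
]
print(deduplicate(items, "vertical"))    # (['S1', 'J1', 'J2'], {'S2': 'S1'})
print(deduplicate(items, "horizontal"))  # (['S1', 'J2'], {'S2': 'S1', 'J1': 'S1'})
```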

There are advantages and drawbacks for both vertical and horizontal deduplication. An Electronic Legal project manager will help you determine what is best for your project.

Email DeDuplication
In some instances, e-mail can be deduplicated using the MD5 hash value of four common metadata fields, as opposed to the entire message. The fields typically used are: the sender's name (the "From" field), the "Sent On" date and time, the "Subject" line, and the attachment count (just the count, since some e-mail servers strip attachments completely and automatically). Additional fields such as "CC" and "BCC" may also be used.
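A minimal sketch of a metadata-based e-mail key follows: the selected fields are joined and hashed with MD5 instead of hashing the whole message. The normalization rules (trimming whitespace, lower-casing addresses) and the unit-separator character used to join fields are assumptions of this sketch, not a fixed standard; it also assumes timestamps have already been normalized to a common time zone.

```python
import hashlib
from datetime import datetime

def email_dedup_key(sender, sent_on, subject, attachment_count, extra_fields=()):
    """MD5 over selected metadata fields rather than the full message.

    `extra_fields` can carry optional values such as CC or BCC lists.
    Normalization and the "\\x1f" separator are assumptions of this sketch."""
    parts = [
        sender.strip().lower(),         # addresses compared case-insensitively
        sent_on.isoformat(),            # assumes a time-zone-normalized datetime
        subject.strip(),
        str(attachment_count),          # count only, not attachment content
        *(f.strip().lower() for f in extra_fields),
    ]
    return hashlib.md5("\x1f".join(parts).encode("utf-8")).hexdigest()


a = email_dedup_key("Pat.Lee@example.com", datetime(2020, 5, 4, 9, 30), "Q2 forecast", 2)
b = email_dedup_key("pat.lee@example.com ", datetime(2020, 5, 4, 9, 30), "Q2 forecast", 2)
print(a == b)  # True: same message metadata after normalization
```

Because attachments are represented only by their count, a copy whose attachments were stripped by a server will still differ in that field; whether that is desirable is one of the choices a project manager would weigh.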

De-NISTing
A typical desktop computer can contain between 10,000 and 100,000 files, each of which might otherwise need to be reviewed. To eliminate as many known irrelevant files as possible before review, an automated filter program can screen files for specific profiles and signatures. If a file’s profile and signature match a database of known irrelevant files, the file can be eliminated from review; only files that do not match are subject to further investigation. This process can considerably reduce the volume of data your team has to review. Why waste your team’s time reviewing driver files, settings files and other non-unique, irrelevant files?

The National Institute of Standards and Technology (NIST) maintains a list of known software signatures in the National Software Reference Library (NSRL). Our software screens your data against these known signatures, once again helping your team save time and money by concentrating only on potentially relevant data.
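At its core, the screening step described above is a set lookup: hash each file and eliminate it if the hash appears in a list of known software hashes. The sketch below assumes the known hashes are already loaded into memory as SHA-1 hex strings; `known_hashes` is a stand-in for a list derived from the NSRL, and how that list is obtained and parsed is deliberately left out. The function name `denist` is hypothetical.

```python
import hashlib

def denist(files, known_hashes):
    """Split {name: bytes} into (for_review, eliminated) lists of names.

    A file is eliminated when the SHA-1 of its bytes appears in
    `known_hashes` (a stand-in for an NSRL-derived hash list)."""
    known = {h.lower() for h in known_hashes}  # normalize hex case
    for_review, eliminated = [], []
    for name, data in files.items():
        if hashlib.sha1(data).hexdigest() in known:
            eliminated.append(name)
        else:
            for_review.append(name)
    return for_review, eliminated


driver = b"bytes of a stock video driver"
known_list = [hashlib.sha1(driver).hexdigest().upper()]
print(denist({"video.sys": driver, "contract.docx": b"Signed contract"}, known_list))
# (['contract.docx'], ['video.sys'])
```

Using a set makes each lookup effectively constant-time, which matters when a single desktop contributes tens of thousands of files.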