Data Classification: The Elephant in the File Server

You know the common thought process: "Data classification is a planning thing. It's an operations thing. It's a disaster-recovery thing. But it can't be this department's thing. Nope. Not doing it. It's too hard. Takes too long. Wastes manpower. Why is it done, anyway? Nobody cares about it."

The amount of data created and collected every year is huge. Enough of it is generated every day to fill another Library of Congress! From video surveillance footage to TPS reports (with or without a cover sheet) to emails, US-based corporations in 2012 created approximately 2.8 zettabytes of data. And that number is expanding. Classifying all that data is a monstrous task. Why would anyone want to do it?

Simple, really. Data classification must be planned, performed, and perfected to avoid breaches and other losses. There is almost nothing more important than data classification for disaster recovery, data loss prevention, data protection and privacy, and data destruction/lifespan policies. Unfortunately, it's also the most difficult operation to plan, design, and implement.

Planning data classification should be simple. There's public data, confidential data, secret data, private data, and mission-critical-top-secret-eyes-only-burn-before-reading data. Roughly five tiers. Oh, and what about employee health plan data? That's ePHI. OK, six tiers. And data needed for audit only once a year? Seven tiers. (Not so simple, and this is a small sample.) Every stakeholder in sight is going to want a say in which data is classified, which tier it lands in, and in some cases, when it gets destroyed. Politics will be involved. Top-down buy-in is essential.
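To make the planning problem concrete, here is a minimal sketch of what the outcome of all that negotiation might look like: a table mapping each tier to a handling policy. Everything in it is an assumption for illustration. The tier names, the `HandlingPolicy` fields, the retention periods, and the owning departments are placeholders, not recommendations. The one design choice worth noting is that unlabeled data falls back to the strictest tier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class HandlingPolicy:
    """Hypothetical handling rules attached to one classification tier."""
    encrypt_at_rest: bool
    retention_days: Optional[int]  # None means no scheduled destruction
    owner: str                     # stakeholder who approves changes


# Placeholder tiers and values; real ones come out of stakeholder negotiation.
POLICIES: Dict[str, HandlingPolicy] = {
    "public":       HandlingPolicy(False, None, "communications"),
    "private":      HandlingPolicy(True, 365 * 3, "hr"),
    "confidential": HandlingPolicy(True, 365 * 5, "legal"),
    "secret":       HandlingPolicy(True, 365 * 7, "infosec"),
    "ephi":         HandlingPolicy(True, 365 * 6, "compliance"),
    "audit-annual": HandlingPolicy(True, 365 * 7, "finance"),
}

# Assumption: anything without a recognized label is handled as the strictest tier.
DEFAULT_TIER = "secret"


def policy_for(label: Optional[str]) -> HandlingPolicy:
    """Return the policy for a label, falling back to the strictest tier."""
    key = (label or "").strip().lower()
    return POLICIES.get(key, POLICIES[DEFAULT_TIER])
```

Calling `policy_for("Public")` returns the public policy, while `policy_for(None)` returns the `secret` policy. That fallback is exactly why unclassified data gets expensive.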

Then there's the design of the solution. All the stakeholder interests must be evaluated. The cost of holding, encrypting, and classifying data must be taken into account. For example, if all emails must be classified, perhaps an Outlook plugin would work. But who is going to administer the plugin, who will be responsible for it, and who will spot-check the emails to make sure the classification rules are being applied correctly?

Spot-checking actually comes under implementation and maintenance. So assuming the InfoSec budget will pay for deployment of our Outlook plugin, InfoSec security operations center (SOC) personnel will need to perform weekly spot-checks on random emails to make sure they're being classified properly. They should also establish a data loss prevention (DLP) solution to make sure that high-classification emails and documents aren't exfiltrated from the corporate perimeter. Without data classification, DLP is worthless, because it has no way to tell which data must stay inside.
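The two jobs described above, the weekly spot-check and the DLP gate, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular plugin or DLP product works: the `Email` record, the tier ranking, the `example.com` internal domain, and the "block at confidential and above" threshold are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical tier ranking, least to most sensitive.
TIER_RANK = {"public": 0, "private": 1, "confidential": 2, "secret": 3}
BLOCK_AT = TIER_RANK["confidential"]  # assumed threshold for outbound blocking
INTERNAL_DOMAIN = "example.com"       # assumed corporate mail domain


@dataclass(frozen=True)
class Email:
    msg_id: str
    recipient: str
    label: Optional[str]  # classification set by the sender, if any


def weekly_sample(emails: Sequence[Email], k: int,
                  seed: Optional[int] = None) -> List[Email]:
    """Pick up to k emails uniformly at random for manual review."""
    rng = random.Random(seed)
    return rng.sample(list(emails), min(k, len(emails)))


def may_leave_perimeter(email: Email) -> bool:
    """Allow internal mail; allow external mail only below the block threshold."""
    if email.recipient.lower().endswith("@" + INTERNAL_DOMAIN):
        return True
    rank = TIER_RANK.get((email.label or "").strip().lower())
    # Unlabeled or unknown labels are blocked: with no classification,
    # the gate has nothing to base a decision on.
    return rank is not None and rank < BLOCK_AT
```

Note what happens to an unlabeled message bound for an outside address: the gate can only block it or wave it through blindly. Either choice is bad, which is the practical meaning of DLP depending on classification.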

DLP solutions are typically costly and difficult to tune correctly. It is extremely important to get expert advice on which system to purchase, and a consultant's help to install, tune, and maintain it. Many DLP solutions are bought and never turned on, just for lack of that expertise. While this may check a box regarding the acquisition of a DLP system, it really doesn't do much for keeping data where it needs to stay.

Disaster recovery becomes much harder without data classification, because if all data is of the same importance, all data has to be treated as mission critical. That means more data to worry about, more contingency plans to write, more hot sites to pay for, and more redundant storage to maintain across multiple geographic areas than if data classification were used correctly.
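A back-of-the-envelope calculation shows the size of that gap. The sketch below compares monthly disaster-recovery storage cost with and without classification. Every number and name in it is invented for illustration: the per-terabyte prices, the three recovery classes, and the 1,000 TB inventory are assumptions, not vendor figures.

```python
from typing import Dict

# Assumed monthly cost per terabyte for each kind of DR treatment.
COST_PER_TB = {"hot_site": 300.0, "warm_replica": 80.0, "cold_archive": 10.0}

# Assumed mapping from recovery class to DR treatment.
DR_TREATMENT = {
    "mission_critical": "hot_site",
    "important": "warm_replica",
    "routine": "cold_archive",
}


def tiered_cost(tb_by_class: Dict[str, float]) -> float:
    """Monthly DR cost when each class gets only the treatment it needs."""
    return sum(COST_PER_TB[DR_TREATMENT[cls]] * tb
               for cls, tb in tb_by_class.items())


def unclassified_cost(tb_by_class: Dict[str, float]) -> float:
    """Monthly DR cost when everything must be treated as mission critical."""
    return COST_PER_TB["hot_site"] * sum(tb_by_class.values())


# Hypothetical inventory: 1,000 TB, of which only 50 TB is truly critical.
inventory = {"mission_critical": 50.0, "important": 150.0, "routine": 800.0}
```

With these made-up figures, `tiered_cost(inventory)` comes to 35,000 per month and `unclassified_cost(inventory)` to 300,000. The exact ratio depends entirely on the assumed prices and mix, but the direction of the result does not.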

Data classification is hard. It takes planning, clout, stakeholder buy-in, budget, time, and major effort. But the payoffs are many, and well worth it. It is yet another part of InfoSec life where "failing to plan is planning to fail."

Posted on May 7, 2014

© 2017 EMC Corporation. All rights reserved.