
Data Masking – A five-phase implementation methodology

Authors: Ashim Roy, Nikhil Patwardhan

Data Masking – Recommended Steps 1 of 11 TATA CONSULTANCY SERVICES


About the Authors

Ashim Roy has been in the IT industry for the last decade and a half, playing many roles in
delivery assignments and product development. His current area of interest is customer data
privacy. He has architected solutions on diverse platforms for multiple customers in the
engineering, financial products and services industries. He holds a master's degree in Robotics
from IIT Kanpur.

Nikhil Patwardhan has been in the IT industry for the last 7 years, playing many critical roles
in the development of industry-scale products in testing, program analysis and data privacy. He
has been instrumental in understanding customers' core privacy problems and in pioneering
customized solutions that deliver significant privacy benefits for organizations. He holds a
master's degree in Mechanical Engineering from IIT Madras.

Table of Contents
What is Masking?
Need for Data Masking
Challenges to Data Masking
What must Data Masking take care of?
Data Masking Phases
  Analysis
  Set-Up
  Mask
  Upload
  Review and sign-off
Conclusion
REFERENCES

What is Masking?

An example of data masking

Trader Table

          Production Data                       Masked Data
TraderID  Name          Location      TraderID  Name             Location
100       James Brown   CA            999       Robert Williams  CA
101       Roy Hooper    NJ            1011      Tom Cox          NJ
102       Timothy Fox   NY            1050      Vincent Hill     NY

Order Table

          Production Data                       Masked Data
TraderID  Quantity  InitDate          TraderID  Quantity  InitDate
100       100000    15-Aug-06         999       25000     15-Aug-06
101       250000    13-Aug-06         1011      35000     13-Aug-06
101       50000     13-Aug-06         1011      10000     13-Aug-06
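
The key property of the example above is that each TraderID is replaced by the same new value in both tables, so every order still points at its trader. A minimal Python sketch of that idea follows. The function name build_key_map, the replacement range 900 to 9999 and the seed are illustrative assumptions and not part of any particular tool; names and quantities are left unmasked here to keep the sketch short.

```python
import random


def build_key_map(keys, low=900, high=9999, seed=None):
    """Assign every distinct original key one distinct replacement key."""
    rng = random.Random(seed)
    distinct = sorted(set(keys))
    # sample() draws without repetition, so replacement keys stay unique.
    replacements = rng.sample(range(low, high + 1), len(distinct))
    return dict(zip(distinct, replacements))


traders = [
    {"TraderID": 100, "Name": "James Brown", "Location": "CA"},
    {"TraderID": 101, "Name": "Roy Hooper", "Location": "NJ"},
    {"TraderID": 102, "Name": "Timothy Fox", "Location": "NY"},
]
orders = [
    {"TraderID": 100, "Quantity": 100000, "InitDate": "15-Aug-06"},
    {"TraderID": 101, "Quantity": 250000, "InitDate": "13-Aug-06"},
    {"TraderID": 101, "Quantity": 50000, "InitDate": "13-Aug-06"},
]

# Build the mapping once, then apply the same mapping to both tables.
key_map = build_key_map([t["TraderID"] for t in traders], seed=7)
masked_traders = [{**t, "TraderID": key_map[t["TraderID"]]} for t in traders]
masked_orders = [{**o, "TraderID": key_map[o["TraderID"]]} for o in orders]
```

Because the mapping is built once and reused, the two orders that belonged to trader 101 still share one masked TraderID.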

Need for Data Masking
Today’s globally distributed businesses need data to move around the world quickly. An
organization must share data not only with its own geographically dispersed teams but also with
various supporting and interfacing agencies. The reasons are many: global IT system development,
maintenance and support, regulatory mandates, data publication for awareness, publication of
trends, and so on.
Without sufficient and careful measures at the source, this sharing can lead to data privacy
breaches. Organizations and individuals can suffer serious damage, with repercussions that last
well into the future.
The recent rise in data privacy violations has led to regulations[1] in many countries that
constrain how organizations handle sensitive customer data. An organization found in violation
faces heavy financial penalties[2] and legal complications, and may see a sharp loss of
customers.

To tackle this problem, organizations spend a lot of time and money on one of the following:
1. Share production data with an implicit agreement not to abuse it. This carries a great deal
of risk because sensitive data is disclosed as is.
2. Create a fictitious dataset and share it. The generated dataset rarely looks realistic.
3. Create an in-house program that hides the sensitive parts of production data before it is
shared. Such programs lack a common approach: different silos maintain separate customized
programs, which leads to high cost and is difficult to sustain in the long term.

As far as the utility and privacy of data are concerned, approach 3 looks good, but it has
inherent problems that must be solved before it can be deployed enterprise-wide.

Data masking addresses precisely these problems.
The expected benefits are:

1. Create test data that looks like production data and is therefore realistic.
2. Create an enterprise-wide, non-intrusive program that takes care of data sensitivity and
complex inter-application data integrity.
3. Help organizations comply with stringent regulations by guarding customer and
business-confidential information.

Challenges to Data Masking
Data masking is a new technology, and not everybody is on the same page. There are
misconceptions, confusion and, at times, inflated expectations of what data masking can achieve.
A few typical challenges to data masking are listed below.

1. Need to mask all data fields
Organizations that do not want to take any chances categorize every data field as sensitive and
want all of them masked.

2. Lack of knowledge/documentation of database/application implementation
Data masking demands a complete understanding of how the database is implemented, as well as of
any implicit assumptions about the data made by the applications that use it. Without this, a
data masking effort will not succeed. Unfortunately, the required depth and breadth of knowledge
is often not available on the ground, and the organization needs to invest sufficient time and
effort to acquire it.

3. Data stored in a variety of storage formats (RDBMS, mainframe, flat file)
This makes masking more difficult and time consuming. A data masking application cannot be built
with only certain types of data source in mind, which adds to the overall cost of masking.

4. Data volume is too high
Today’s business is mostly volume driven and runs on thin and still shrinking profit margins.
Masking high-volume data adds to the overall cost and time.

5. Masking required too frequently
Some organizations want to mask data frequently. This turns data masking into a closely
monitored business process that requires more funding than is allocated.

6. Insufficient awareness of the masking process
There is enough evidence to suggest a lack of awareness of what data masking can do for an
organization. People confuse privacy with security, and masking with access control.

7. Failure to strike a balance between privacy and utility
Organizations find it hard to strike the right balance between privacy and utility, and reaching
a consensus often takes a long time. Many times they get lost in hypothetical utility and
recommend a data masking solution that is hard to use and expensive.

8. Masking means encryption
Many people initially assume that masking and encryption are synonymous. However, encryption is
only one of many ways in which masking can be achieved.

9. Nice to have
Organizations do not yet accept data masking as a solution they must have; it is still one of
the “nice to have” things. Unless a violation occurs, they feel they are all right and can carry
on for some time without looking at data masking.
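
To illustrate challenge 8 above, the following Python sketch shows two ways of masking a column without encrypting it. The paper does not enumerate specific techniques; substitution and shuffling are chosen here as commonly used examples, and the function names and sample data set are assumptions made for illustration.

```python
import random


def substitute(values, dataset, rng):
    """Replace every value with an entry drawn from an external data set."""
    return [rng.choice(dataset) for _ in values]


def shuffle_column(values, rng):
    """Reorder a column so that values stay real but leave their own rows."""
    shuffled = list(values)   # copy, so the caller's list is not modified
    rng.shuffle(shuffled)     # note: a value may land on its own row by chance
    return shuffled


rng = random.Random(3)
names = ["James Brown", "Roy Hooper", "Timothy Fox"]
locations = ["CA", "NJ", "NY"]
replacement_names = ["Robert Williams", "Tom Cox", "Vincent Hill"]

masked_names = substitute(names, replacement_names, rng)
masked_locations = shuffle_column(locations, rng)
```

Neither function needs a key, and neither result can be decrypted back to the original, which is the practical difference from encryption.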

What must Data Masking take care of?

Besides masking sensitive information, a data masking tool should provide the following features:
1. Referential integrity
It is very important to understand the relationships between data elements in a database. Often
these relationships reside in applications outside the data store, and finding them is hard if
the current owners have thin application knowledge. Data is also rarely standalone: it is fed to
downstream systems, which have an implicit data relationship with the upstream system. Masking
must take note of these relationships and make sure that data integrity is maintained.

2. De-normalized data
Because data might exist in de-normalized form, masking must make sure that de-normalized data
is masked with the same values wherever it occurs.

3. Derived data fields
Some data fields may be derived from other fields. For example, a full name could be constructed
from a first and a last name. If the first name is masked, the full name must automatically be
masked as well.

4. Data field length
The masking process must understand the metadata well enough to generate data values within the
minimum and maximum length bounds.

5. Value constraints
Masking must guarantee that data values stay within the lower and upper value bounds specified
in the metadata. Where the database defines no such constraints, users should be able to define
them as part of the masking process.

6. Uniqueness property for single and composite fields
The masking process should understand the uniqueness properties of data values from the
metadata. It should additionally allow users to define such properties in order to incorporate
application constraints. Masking must generate values that comply with these constraints.

7. Sampling
Organizations often have large data stores and would like to mask only a sample. The masking
process should select the most meaningful and representative data from such a large data set and
then proceed with masking.

8. Intelligence
Masking requires a lot of manual housekeeping and is prone to error, particularly in the set-up
phase. The masking process should automate much of this manual work, create useful checklists
and be intelligent enough to point out mistakes early in the masking lifecycle.

9. Ease of use and customizability
A masking solution is of little use if it cannot easily accommodate the unique requirements of
different business environments. Data masking should not require highly technical people; this
is possible with a solution that is easy to use and simple at its core.

10. Preserving the look and feel in masked data
Masking should ensure not only depersonalization but also a realistic look of the data. If the
original data contains the name “Richard” and masking turns it into “123azx”, end users are
unlikely to accept the masked data.
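
Several of the features above can be shown together in a short Python sketch: derived fields (3), field length (4), uniqueness of a composite field (6) and a realistic look (10). The column names, the small name lists and the function mask_names are illustrative assumptions; a real tool would read length and uniqueness rules from the database metadata.

```python
import itertools
import random

# Hypothetical external data set of realistic replacement names.
FIRST_NAMES = ["Robert", "Tom", "Vincent", "Alice", "Maria", "David"]
LAST_NAMES = ["Williams", "Cox", "Hill", "Stone", "Reed", "Lane"]


def mask_names(rows, max_len, seed=None):
    """Return masked copies of rows with unique, length-bounded names."""
    rng = random.Random(seed)
    # Length bound: keep only candidates that fit the column width.
    candidates = [
        (first, last)
        for first, last in itertools.product(FIRST_NAMES, LAST_NAMES)
        if len(first) <= max_len and len(last) <= max_len
    ]
    if len(candidates) < len(rows):
        raise ValueError("data set too small to keep (first, last) unique")
    # Uniqueness: sample() never returns the same (first, last) pair twice.
    chosen = rng.sample(candidates, len(rows))
    masked = []
    for row, (first, last) in zip(rows, chosen):
        masked.append({
            **row,
            "FirstName": first,
            "LastName": last,
            # Derived field: rebuilt from the masked parts, never copied.
            "FullName": f"{first} {last}",
        })
    return masked


rows = [
    {"FirstName": "James", "LastName": "Brown", "FullName": "James Brown"},
    {"FirstName": "Roy", "LastName": "Hooper", "FullName": "Roy Hooper"},
    {"FirstName": "Timothy", "LastName": "Fox", "FullName": "Timothy Fox"},
]
masked = mask_names(rows, max_len=8, seed=1)
```

Drawing replacements from a list of real-looking names, rather than generating random characters, is what avoids the “123azx” problem described in item 10.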

Data Masking Phases


A five-phase process is recommended here for a data masking exercise. Before the work can
start, a data masking task force must be set up, comprising people from Law, Legal and
Compliance, the business units (BU) and IT. The task force should acquire knowledge of the
applicable privacy norms, the application environment and the various databases. It should set
an overall plan for data masking and execute it, keeping in mind the business requirements and
the appropriate legislative mandates.

The recommended phases of a data masking exercise are:

Analysis
This is a critical step towards ensuring the success of the data masking effort. The task force
must do its due diligence to understand the complexity of the application and data environment.
This helps to identify the bare minimum of sensitive data (as opposed to all data) and to
understand the data relationships and constraints defined both in the data and in the
applications. Sufficient rigor is needed to understand the flow of data across applications.
Understanding the various assumptions, at both the application and the inter-application level,
helps to devise a sound data masking strategy at the enterprise level. The task force must keep
the applicable regulations in mind so that the right masking techniques are used in the later
stages of the masking process. The following checklist provides good guidance in this phase.
• Which data privacy regulations apply, and which application data is in scope for masking?
• Which fields are sensitive, and what are the masking requirements for each of them? In what
context will the masked data be used, and what is the possible impact of masking?
• What does the overall ER diagram look like, including unique keys and foreign key
relationships?
• Which referential integrity requirements are kept outside the data?
• What are the data sources and their respective sizes?
• Which data is replicated across applications?
• Which data is shared with other applications?
• What is the sampling strategy for the test bed? Is it a full replica or a small subset of the
production environment?
• How frequently is the test bed updated?
• What is the current method of creating the test bed, and how long does it take?
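
Part of the checklist above, the foreign key relationships, can often be read straight from the database catalog. The Python sketch below does this for SQLite as a stand-in for whatever database is actually in use; the two-table schema is an assumption modeled on the Trader and Order example, with the order table named Orders because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Trader (
        TraderID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        Location TEXT
    );
    CREATE TABLE Orders (
        OrderID  INTEGER PRIMARY KEY,
        TraderID INTEGER NOT NULL REFERENCES Trader(TraderID),
        Quantity INTEGER,
        InitDate TEXT
    );
""")


def foreign_keys(conn):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    links = []
    for table in tables:
        # Result columns: id, seq, table (parent), from, to, on_update, ...
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            links.append((table, fk[3], fk[2], fk[4]))
    return links


links = foreign_keys(conn)
```

This only finds relationships that are declared in the schema. Relationships enforced solely by application code, which the checklist asks about separately, still have to be collected by hand.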

Set-Up
This phase consists of the following steps.
• Identify the right people to be involved in masking.
• Assign the proper role to each person. TCS recommends role-based masking for improved data
privacy.
• Create a static data source from production. TCS strongly discourages using the production
environment itself as a data source; doing so may lead to problems such as integrity violations
and failure of the masking process.
• Install a data privacy tool in the client’s environment.
• Set up a project in the tool. This includes defining the data source identifiers and importing
the entities intended for masking.
• Define any constraints and referential relationships of the data in addition to those already
defined at the data level.
• Create any customized data sets. A data set is an external set of data used to mask the
original data.
• Select the sensitive data and define the appropriate masking technique for it.
• Define the right tests to gauge the quality of masking.
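
The last few set-up steps amount to writing down a masking plan: which field is masked, with which technique and from which data set. A minimal Python sketch of such a plan, with a check that catches set-up mistakes before masking starts, is shown below. The class FieldRule, the technique names and the schema dictionary are all hypothetical and do not describe any specific data privacy tool.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical technique names used only in this sketch.
KNOWN_TECHNIQUES = {"substitute", "shuffle", "remap_key"}


@dataclass(frozen=True)
class FieldRule:
    table: str
    column: str
    technique: str
    dataset: Optional[str] = None  # external data set, needed for "substitute"


def validate_plan(rules, schema, datasets):
    """Return a list of problems found in the plan; empty means it is usable."""
    problems = []
    seen = set()
    for rule in rules:
        where = f"{rule.table}.{rule.column}"
        if rule.column not in schema.get(rule.table, ()):
            problems.append(f"{where}: no such column")
        if rule.technique not in KNOWN_TECHNIQUES:
            problems.append(f"{where}: unknown technique {rule.technique!r}")
        if rule.technique == "substitute" and rule.dataset not in datasets:
            problems.append(f"{where}: missing data set {rule.dataset!r}")
        if (rule.table, rule.column) in seen:
            problems.append(f"{where}: defined twice")
        seen.add((rule.table, rule.column))
    return problems


schema = {
    "Trader": {"TraderID", "Name", "Location"},
    "Orders": {"TraderID", "Quantity", "InitDate"},
}
datasets = {"person_names": ["Robert Williams", "Tom Cox", "Vincent Hill"]}
plan = [
    FieldRule("Trader", "TraderID", "remap_key"),
    FieldRule("Orders", "TraderID", "remap_key"),
    FieldRule("Trader", "Name", "substitute", dataset="person_names"),
]
# A rule with a misspelled column and an undefined data set.
bad_plan = plan + [FieldRule("Trader", "Nmae", "substitute", dataset="names")]

problems_ok = validate_plan(plan, schema, datasets)
problems_bad = validate_plan(bad_plan, schema, datasets)
```

Validating the plan against the schema before any data is touched is one simple way to point out mistakes early, as asked for under "Intelligence" in the feature list.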

Mask
After set-up is complete, the actual masking runs and produces the masked data. This phase may
be time consuming, depending on the volume of data being masked. The user may need to monitor
the process from time to time and, if an exception occurs, may decide to resume from the point
of failure. At the end of masking, the user must verify the quality of masking and identify any
unintended variation in the statistical measures. One should move on to the next phase only if
the masking meets the specification defined in the “Analysis” phase.
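
Resuming from the point of failure is easiest when masking runs in batches and records its progress after each one. The Python sketch below shows the idea with an in-memory checkpoint. The function names, the batch size and the masking rule (integer division by 10) are assumptions chosen only to keep the example small; a real run would persist the checkpoint to disk or to a control table.

```python
def mask_in_batches(rows, mask_row, checkpoint, batch_size=2):
    """Mask rows batch by batch, recording progress in `checkpoint`.

    `checkpoint` is a dict holding "next" (index of the first unmasked row)
    and "done" (rows masked so far). Calling the function again with the
    same dict resumes after the last completed batch.
    """
    start = checkpoint.setdefault("next", 0)
    done = checkpoint.setdefault("done", [])
    for begin in range(start, len(rows), batch_size):
        batch = [mask_row(row) for row in rows[begin:begin + batch_size]]
        # Commit only after the whole batch has been masked successfully.
        done.extend(batch)
        checkpoint["next"] = begin + len(batch)
    return done


failures = {"remaining": 1}


def flaky_mask(quantity):
    # Hypothetical masking rule that fails once to simulate an outage.
    if quantity == 50000 and failures["remaining"]:
        failures["remaining"] -= 1
        raise RuntimeError("simulated outage")
    return quantity // 10


quantities = [100000, 250000, 50000]
checkpoint = {}
try:
    mask_in_batches(quantities, flaky_mask, checkpoint)
except RuntimeError:
    pass  # the operator inspects the failure, then resumes
after_failure = {"next": checkpoint["next"], "done": list(checkpoint["done"])}
result = mask_in_batches(quantities, flaky_mask, checkpoint)
```

The first call completes one batch and fails in the second; the second call starts at the recorded position instead of masking the first batch again.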

Upload
Once masking is done, the masked data must be uploaded to the various target systems.

Review and sign-off
Once the target systems are loaded with masked data, a thorough review must take place to check
the privacy and utility of the data. At a minimum, this consists of checking the various
inter-relationships and constraints of the data and running the applications that use it. If
the task force finds the data acceptable, it must sign off and formally close the exercise.
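
Two of the review checks can be automated with very little code: that no masked child row points at a missing parent, and that no original value of a sensitive column survives in the masked data. The Python sketch below uses the Trader and Order example from the start of this paper; the function names and the deliberately broken order list are illustrative assumptions.

```python
def orphaned_keys(child_rows, parent_rows, key):
    """Return key values used by child rows that no parent row defines."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - parent_keys)


def leaked_values(original_rows, masked_rows, column):
    """Return original values of a sensitive column that survive masking."""
    original = {row[column] for row in original_rows}
    return sorted(original & {row[column] for row in masked_rows})


production_traders = [
    {"TraderID": 100, "Name": "James Brown", "Location": "CA"},
    {"TraderID": 101, "Name": "Roy Hooper", "Location": "NJ"},
    {"TraderID": 102, "Name": "Timothy Fox", "Location": "NY"},
]
masked_traders = [
    {"TraderID": 999, "Name": "Robert Williams", "Location": "CA"},
    {"TraderID": 1011, "Name": "Tom Cox", "Location": "NJ"},
    {"TraderID": 1050, "Name": "Vincent Hill", "Location": "NY"},
]
masked_orders = [
    {"TraderID": 999, "Quantity": 25000, "InitDate": "15-Aug-06"},
    {"TraderID": 1011, "Quantity": 35000, "InitDate": "13-Aug-06"},
    {"TraderID": 1011, "Quantity": 10000, "InitDate": "13-Aug-06"},
]
# A deliberately broken load in which one order kept its original TraderID.
broken_orders = masked_orders[:2] + [
    {"TraderID": 101, "Quantity": 10000, "InitDate": "13-Aug-06"},
]

orphans_ok = orphaned_keys(masked_orders, masked_traders, "TraderID")
orphans_broken = orphaned_keys(broken_orders, masked_traders, "TraderID")
name_leaks = leaked_values(production_traders, masked_traders, "Name")
```

The leak check is only meaningful for columns that were meant to change. Location, for example, is intentionally left as is in the example and would be reported in full.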

Conclusion
In today’s business scenario, data masking is no longer optional; it is a mandatory activity.
Organizations can restore customer confidence by quickly adopting data masking and integrating
it into their data management policies. An organization must choose a tool that not only
handles the complexity of today’s business environment but can also be customized easily to
handle the complexity of tomorrow. More organizations are already on this path, but a clear lack
of understanding is making the process unnecessarily complex. Organizations should take help
from data privacy experts to create the right awareness internally and to drive the masking
program efficiently, and so gain a competitive advantage.

REFERENCES
1. Protecting Private Data With Data Masking by Noel Yuhanna
2. Build Your Privacy Program: Oversight And Management by Michael Rasmussen
3. Data privacy in Indian IT and ITES-BPO sectors: NASSCOM handbook
(http://www.nasscom.org/download/Information Security in the ITES-BPO sector.pdf)
4. Articles and white papers about data privacy
(http://www.itbusinessedge.com/taxonomy/?t=8256)

[1] USA - GLBA, HIPAA, SOX, RFPA, CFAA, ECPA, FIPPA
UK - Data Protection Act of 1998, Regulation of Investigatory Powers Act 2000
Australia - Privacy Amendment Act of 2000
Canada - Personal Information Protection and Electronic Documents Act
EU - European Union Personal Data Protection Directive 1998
New Zealand - Privacy Act of 1993
Hong Kong - Hong Kong Personal Data (Privacy) Ordinance of 1995

[2] Financial Institution Privacy Protection Act 2003
– A firm’s employees are personally liable for $10,000 in case of a data breach
– The firm itself is liable for $100,000 in damages for a data theft
