Introduction to Masking
Data masking modifies existing information to such an extent that conclusions about the real data can no longer be drawn. To achieve this, certain guidelines must be followed, such as those set out in the GDPR.
Any project that has data masking as its main topic requires a tool that can mask the existing data according to these guidelines. XDM’s masking tool can use different information as a basis and manipulate data using defined methods, so that no conclusions can be drawn about the original data.
There are always various hurdles to overcome in this process. Some well-known examples are described below, together with how such masking can be carried out with XDM.
Masking of personal information
The obvious example of data manipulation is a person’s name, which usually consists of a first name and a surname. A common requirement when modifying a first name is that the result remains recognizable as the name of a person, so that the data and the reports generated from it can still be used for technical and exploratory tests.
This is achieved by using lookup tables that contain a specific set of first and last names. A mathematical calculation determines which name replaces a given name. An identical name usually has to be changed in the same way wherever it appears in the database. To achieve this, the calculation can be based either on a fixed number (such as a personal number) or on the input name itself. Using the input name has the advantage that no reference number has to be added or available, but the disadvantage that completely different names may result if a name is spelled slightly differently in different places. Find more information on using mapping table containers and building custom lookup tables in the modification configuration documentation.
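The lookup-based replacement described above can be sketched as follows. This is a minimal illustration in Python, not XDM's implementation: the lookup table, the `SECRET_KEY`, and the function name `mask_name` are assumptions made for this example, and a keyed hash (HMAC) stands in for whatever calculation a real project defines.

```python
import hashlib
import hmac
from typing import Sequence

# Hypothetical lookup table; a real project would use a far larger list.
FIRST_NAMES = ["Mark", "Paul", "Peter", "Anna", "Laura", "Maria"]

# Hypothetical secret, so the mapping cannot be recomputed from the table alone.
SECRET_KEY = b"replace-with-a-project-secret"


def mask_name(seed_value: str, lookup: Sequence[str]) -> str:
    """Pick a replacement name deterministically from the seed value."""
    digest = hmac.new(SECRET_KEY, seed_value.encode("utf-8"), hashlib.sha256).digest()
    # Turn the first 8 bytes of the digest into an index into the lookup table.
    index = int.from_bytes(digest[:8], "big") % len(lookup)
    return lookup[index]


# Seeded with a fixed number: the same person always gets the same name,
# regardless of how the name is spelled in each table.
masked_by_id = mask_name("4711", FIRST_NAMES)

# Seeded with the input name: no reference number is needed, but "John"
# and a variant spelling such as "Jon" may map to unrelated names.
masked_by_name = mask_name("John", FIRST_NAMES)
```

Because the result depends only on the seed and the key, the same input is masked identically wherever it occurs, which is the consistency property the paragraph asks for.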
In addition, there is often the requirement that the gender of the person in question remains the same. If the source value is the first name "John", a male name, then the masked target value should also be a male name, e.g. "Mark". This requires information about whether the person is male or female. This information can either be read directly from the table or determined using a lookup table as described above. If neither is possible, you can also use XDM to obtain a random name as the target value. A ready-to-use example can be found here.
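A gender-preserving replacement could look roughly like the sketch below. The table `NAMES_BY_GENDER`, the gender codes `"M"` and `"F"`, and the function name are hypothetical. As a further assumption, the fallback for an unknown gender picks deterministically from the combined list rather than truly at random.

```python
import hashlib
from typing import Optional

# Hypothetical lookup table, keyed by an assumed gender code.
NAMES_BY_GENDER = {
    "M": ["Mark", "Paul", "Peter"],
    "F": ["Anna", "Laura", "Maria"],
}


def mask_first_name(first_name: str, gender: Optional[str] = None) -> str:
    """Replace a first name with one from the list for the same gender."""
    if gender in NAMES_BY_GENDER:
        candidates = NAMES_BY_GENDER[gender]
    else:
        # Gender unknown: fall back to all names (assumption for this sketch).
        candidates = NAMES_BY_GENDER["M"] + NAMES_BY_GENDER["F"]
    digest = hashlib.sha256(first_name.encode("utf-8")).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


masked = mask_first_name("John", "M")  # always one of the male names
```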
Surnames may be double names or contain spaces, and first and last names may be stored together as "Name" in an existing table structure. Such cases must be recognized and masked correctly, where correct means according to the calculation previously defined for the modification.
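One possible way to handle double names is to mask each part separately and keep the separators, so that a part is masked the same way whether it occurs alone or inside a double name. The following Python sketch illustrates this under stated assumptions: the surname table, the helper `_pick`, and the splitting rule (hyphens and whitespace) are choices made for this example, not XDM behaviour.

```python
import hashlib
import re

# Hypothetical lookup table of replacement surnames.
SURNAMES = ["Miller", "Baker", "Fisher", "Cooper", "Turner"]

# Hyphens and whitespace separate the parts of a double name.
SEPARATOR = re.compile(r"([-\s]+)")


def _pick(value: str, lookup: list) -> str:
    # Lower-casing makes "SMITH" and "Smith" map to the same replacement.
    digest = hashlib.sha256(value.lower().encode("utf-8")).digest()
    return lookup[int.from_bytes(digest[:8], "big") % len(lookup)]


def mask_compound_surname(surname: str) -> str:
    """Mask each part of a surname and keep hyphens and spaces as they are."""
    masked_parts = []
    for part in SEPARATOR.split(surname):
        if part == "" or SEPARATOR.fullmatch(part):
            masked_parts.append(part)  # separator or empty edge: keep unchanged
        else:
            masked_parts.append(_pick(part, SURNAMES))
    return "".join(masked_parts)


masked = mask_compound_surname("Smith-Jones")
```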
Further examples of special masking requirements arise with dates of birth. Depending on the use case, it is important that a person still belongs to a certain age group after the data has been modified.
An example from the insurance industry: A person named John has insured his car and is 18 years old. He is therefore considered a novice driver and is assigned to a specific tariff group for car insurance. To ensure that John remains in this tariff group even after his real data has been altered for testing purposes, his date of birth cannot be altered arbitrarily, but must be masked according to certain criteria. These criteria are defined by an XDM user. For example, only the day and month of birth may be masked, but not the year.
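The rule "mask day and month, keep the year" can be illustrated with a short Python sketch. The function name and the use of a hash of a seed value (for example a personal number) are assumptions for this example. XDM users define their own criteria.

```python
import calendar
import hashlib
from datetime import date


def mask_birth_date(birth_date: date, seed_value: str) -> date:
    """Replace day and month deterministically while keeping the birth year."""
    digest = hashlib.sha256(seed_value.encode("utf-8")).digest()
    month = digest[0] % 12 + 1
    # Limit the day to the length of the chosen month in that year,
    # so that leap years and short months never yield an invalid date.
    days_in_month = calendar.monthrange(birth_date.year, month)[1]
    day = digest[1] % days_in_month + 1
    return date(birth_date.year, month, day)


masked = mask_birth_date(date(2006, 5, 17), "4711")
```

Keeping the year preserves the year of birth, but the age on a particular reference date can still shift by up to one year, because the birthday may move to before or after that date. If a tariff depends on the exact age on a key date, the criteria would have to restrict the masked date further.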
Masking of addresses
Another example from the insurance industry is the masking of policyholder addresses. Here, the type of insured property usually has to be taken into account and must not be changed. For example, a single-family home is insured differently than a multi-family home. If data records of people who have insured a single-family home are required for a test case, the masked data must still contain exactly such records.
A similar scenario is the risk of flooding (or other hazards, such as avalanches) and the corresponding classification of buildings into risk groups. These risk groups are taken into account when taking out insurance and must be preserved when data is masked for test scenarios that depend on them. In concrete terms, this means that the address may not be changed arbitrarily. Instead, a person who actually lives in risk area A must still live in risk area A after the real data has been modified.
XDM provides further options for masking addresses. For example, you can define that the masked address must lie within a certain distance of the source address. You can also specify that the target address must be in the same city or the same zip code area.
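As an illustration of the "same zip code area" constraint, the following Python sketch chooses a replacement street only from addresses that share the source zip code. The table `ADDRESSES_BY_ZIP`, its contents, and the function name are hypothetical. The same grouping idea would apply if the key were a risk area or a city instead of a zip code.

```python
import hashlib
from typing import Tuple

# Hypothetical lookup table of replacement streets, grouped by zip code.
ADDRESSES_BY_ZIP = {
    "10115": ["Example Street 1", "Sample Road 12", "Test Avenue 7"],
    "80331": ["Demo Lane 3", "Mock Square 9"],
}


def mask_address(street: str, zip_code: str) -> Tuple[str, str]:
    """Replace the street with another one from the same zip code area."""
    candidates = ADDRESSES_BY_ZIP.get(zip_code)
    if not candidates:
        raise KeyError(f"no replacement addresses for zip code {zip_code}")
    digest = hashlib.sha256(f"{zip_code}|{street}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    # The zip code is returned unchanged, so the area stays the same.
    return candidates[index], zip_code


masked_street, masked_zip = mask_address("Real Street 5", "10115")
```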
Masking of banking information
In addition to names and addresses of people, the masking of International Bank Account Numbers (IBANs) is one of the most common use cases in masking projects. IBANs can be masked using predefined algorithms, as shown in the corresponding example for IBANs.
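A masked IBAN is only useful for testing if it still passes the standard check digit validation. The Python sketch below shows one way to do this for a German IBAN: it keeps the country code and bank code, derives a new account number, and recomputes the check digits with the MOD 97 procedure. It is not XDM's predefined algorithm, and the function names and the choice to keep the bank code are assumptions. The IBAN used is the widely published German example number.

```python
import hashlib


def _to_number(text: str) -> int:
    # Digits stay as they are; letters become two-digit numbers (A=10 ... Z=35).
    return int("".join(str(int(ch, 36)) for ch in text))


def check_digits(country: str, bban: str) -> str:
    """Compute the two IBAN check digits for a country code and BBAN."""
    remainder = _to_number(bban + country + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_iban(iban: str) -> bool:
    iban = iban.replace(" ", "").upper()
    # Move the first four characters to the end; the result must be 1 mod 97.
    return _to_number(iban[4:] + iban[:4]) % 97 == 1


def mask_german_iban(iban: str) -> str:
    """Replace the account number of a German IBAN and fix the check digits."""
    iban = iban.replace(" ", "").upper()
    country, bank_code, account = iban[:2], iban[4:12], iban[12:]
    digest = hashlib.sha256(account.encode("utf-8")).hexdigest()
    # Derive a new account number with the same number of digits.
    new_account = str(int(digest, 16) % 10 ** len(account)).zfill(len(account))
    bban = bank_code + new_account
    return country + check_digits(country, bban) + bban


masked = mask_german_iban("DE89 3704 0044 0532 0130 00")
```

The sketch only guarantees a valid IBAN check digit. Bank-specific check digits inside the account number are not handled here.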
Masking data using scripts for conditions
Of course, more than one of the above scenarios usually applies to a real masking project. As a rule, a combination of several attributes has to be handled. This can be implemented with XDM’s masking tool by working with special scripts (for more details see chapter Condition script). A common example is applying different types of data masking depending on other data about the person. For example, information for people older than 60 years should be masked differently from information for people who are younger.
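The age-based condition mentioned above can be expressed as a small decision function. The following Python sketch shows only the logic. It does not use XDM's condition script syntax, and the rule names `senior_rule` and `default_rule` are hypothetical placeholders for two differently configured masking methods.

```python
from datetime import date


def age_on(birth_date: date, reference: date) -> int:
    """Return the age in completed years on the reference date."""
    years = reference.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def choose_masking_rule(birth_date: date, reference: date) -> str:
    """Select a masking rule depending on the person's age."""
    if age_on(birth_date, reference) > 60:
        return "senior_rule"   # hypothetical rule for people older than 60
    return "default_rule"      # hypothetical rule for everyone else


rule = choose_masking_rule(date(1950, 1, 1), date(2024, 6, 1))
```

Passing the reference date explicitly keeps the decision reproducible, so repeated masking runs assign the same rule to the same person.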
Further information on masking
Further information on how to set up and work with XDM for masking and modification purposes can be found in the following chapter called Data Modification.
A guide that shows how to best create an anonymization project with XDM can be found here. The guide shows which steps need to be taken, what needs to be considered, and which mistakes should be avoided.