Manually populate the MaskingRules sheet

Search Knowledge Base by Keyword

< Back

The MaskingRules sheet in the Control spreadsheet (C:\VIPTDM\SubsetCloneAndMaskRun\Mask\SQLServer\VIPMaskSQLServer.xlsx ) contains the details of what the user wants to mask. For example, if a user wants to mask the Addresses of users, they will create a rule that specifies changing parts of the address so that it is no longer the original name, although it is still an address value.

This information is populated by the SCANCOLUMNS action.

The MaskingRules sheet contains a set of rules that are used to do the data masking.

Each column has a title/category in the first row of the sheet which tells you what the column refers to, such as Database, Schema, Table etc.

Each row contains a set of values for the categories/titles of the columns. Here’s a screenshot of the MaskingRules sheet.

The MaskingRules sheet is used to specify the rules for both:

1) what data is to be Masked

2) how it should be Masked.

The Masking Rules do this by specifying functions (often pre-defined) to act on the relevant Data in the database. The specific details of the data to be masked are specified in the Masking Rules sheet – Database name, Table name, Columns/Rows name, DataTypes, etc.
Example: The MaskingRules sheet contains a rule that defines the set of data to be masked as ‘addresses’ by replacing all ‘addresses’ in the specified database with RandomText (a pre-defined function).

Note: Help with knowing what data to mask? There is lots of information in the Tables tab sheet that tells you about the data. This information includes:

Distinct Count

For instance by looking at the output of the ScanColumns table, you can see a Column called Distinct Count. If the number in this column is very high, meaning that this data is highly distinct then that might give you an indication that you should mask this information.

Active

The Active column provides a way to make Masking Rules active/inactive say in the event that you have a very large number of rules but are only interested in a subset of those. This is also useful for getting the Masking ‘just right’. This allows for turning off and on rules until you get the results you would like.

DateFormat

If dates are stored in string or numeric columns, rather than date columns, you should specify the date format used, for example “YYYY-MM-DD”.

MaskingFunctions

This column specifies User-Defined functions. A drop-down menu shows the available functions:

 List
 RandomInteger
 RandomNumber
 RandomText

All of these functions require parameters which should be entered in columns Parm1 to Parm7.

List masks a column using a value from a seedlist, the seedlist should be populated in the database unless UseDBFunction (see below) is set to “No”.

Parm1 should contain one of the following:

Sequential
Random
Hash

This value determines how the row from the seedlist is selected:

Sequential starts at the first entry in the list and goes through the list sequentially. When the end of the list is reached it starts again at the first entry.

Random uses a random number to select from the list.

Hash generates the row number based on the content of the column a value from a column is hashed to give a number, this number is used to determine the entry from the seedlist to use for masking. Hashing is deterministic, what this means is that the same value always results in the same number. So, eg if you mask a Surname in two different tables using hashing you get the same masked value for the same original value.

Parm2 names the seedlist to use
Parm3 specifies the column from the seedlist to use
Parm4 is only applicable if Parm1 contains “Hash” and it is optional, if not set the value from the column being masked is used to determine the seedlist row to use in masking, alternatively it can be set to the name of a column in the table being masked and this will be used to determine the seedlist row.

RandomInteger masks a column using a randomly generated integer within a range

Parm1 is the start of the range
Parm2 is the end of the range

RandomNumber masks a column using a randomly generated decimal within a range

Parm1 is the start of the range
Parm2 is the end of the range

RandomText masks a column using a randomly generated string within a range

Parm1 is the minimum length of the string
Parm2 is the maximum length of the string (this can be the same as Parm1 for a fixed length string)
Parm3 is the list of characters to use in the string

Use DBFunction

If you wish to use the List function for masking but don’t want to get the Seedlist data from the database tables set this cell to “No”, the Seedlist data will then be read from a .csv in the Seedlists folder. Warning – this will be significantly slower than masking using the database seed tables.

WhereClause

This column allows the user to include a condition (valid SQL where clause) for when to apply the masking. To do this supply a valid SQL where clause (excluding the “where” keyword).

VIPMaskingFunctions

A large number of functions are available which are not implemented as database user-defined functions – these can be entered as VIPMaskingFunctions. If you enter “=VIP” in a cell you will be presented with a list of the available functions. Double click on any entry in this list and you will be shown details of all parameters required for the chosen function. See the KB for a full lists of VIP functions: https://curiositysoftware.ie/vip-knowledge-base/data-masking-function-list/

*If you need to process a large amount of data with these functions, it could be slow.