Retrieve metadata from the Source Database

Search Knowledge Base by Keyword

< Back

The first Action(s) to perform in a Basic Subset Job retrieve metadata from the Source Database. This retrieves the relationships that must be reflected in a coherent Subset, and are therefore used to generate the rules that will create a coherent Subset from the source data.

The metadata retrieved during the Basic Subset Job include information regarding the tables and their rows, the Foreign Keys, and the Identity Columns. All this information can be read in one composite action, GETMETADATA. GETMETADATA is made up of three individual actions, that can also be run individually:

  1. The TABLESAction retrieve metadata related to the tables in the Source Database.
  2. The GETKEYSAction reads Foreign Keys from the Source Database.
  3. The FINDIDENTITYCOLUMNS Action retrieves the Identity Columns for each specified table.

In most instances, the composite GETMETADATA action is the best action to use due its speed and simplicity. The individual actions might be run for analysis of the subsetting process and data, for learning, or for debugging and refining a defined Subset.

Instructions are provided below on running GETMETADATA, TABLES, GETKEYS, and FINDIDENTIFYCOLUMNS

Run GETMETADATA

GETMETADATA will retrieve all requisite metadata from the source database needed to create a coherent subset. It is therefore the first action to run in a Basic Subset.

To run GETMETADATA, open The VIP Server Controller from your desktop or start menu. You must open it as an administrator.

If you have a re-usable Config file, load it in The VIP Server Controller. It is recommended that you create a Config File after parameterising the GETMETADATA action. That way, you can re-use it for all Actions used in the following Subset.

Check and complete all mandatory parameters in the “Flow Cache Server” tab and commit the automated workflows to memory. See this article for instructions.

Next, complete the mandatory parameters in the “Run Flow” tab. In addition to filling out the mandatory fields in the UI, Metadata generally requires the following Arguments:

parAction – Enter GETMETADATA to run the GETMETADATA action:

parAction=GETMETADATA

parControlExcel – Specify the location of your Excel Configuration spreadsheet. For example:

parControlExcel=C:\Users\VIP1\Documents\Subsetting Documentation 5.0\VIPsubsetStartPoint.xlsx

parDatabaseType Specify the database type. For example, enter the following argument when working with SQLServer:

parDatabaseType=SQL Server

parReportFile Specify the name and location to which your Subset Report file will be saved. Include the file name and .txt extension, not just a directory. For example:

parReportFile=C:\Users\VIP1\Documents\Subsetting Documentation 5.0\SubsetReport.txt

parSQLActionsPath when running any Subset action, you must always specify the location of the SQLActions vip Enter the following arugment:

parSQLActionsPath=C:\VIPTDM\SubsetCloneAndMask\SQLActions.enc.vip

You can also specify any Optional parameters or Arguments at this point. See this Knowledge Base article for a full list of Arguments and example parameters.

See this Knowledge Base article for instructions on parameterising an action.

An example GETMETADATA action appears as follows in the VIP Server Controller:

GetMetadata

Once you have parameterized the GETMETADATA Action, save a re-usable Config File to use throughout your Subset. When performing a Subset, you will typically only need to change the parAction action each time. You can therefore quickly re-use the Config File created at this point for each subsequent Action throughout the Subset. Instructions for working with Config files are provided in this article.

Next, click “Create File” and then “Start” to run GETMETADATA.

In the output dialogue, you will see something like following once the GETMETADATA Action is complete”:

GetMetadataOutput

The Log will display something like the following after GETMETADATA is complete:

GetMetadataLog

If you receive an error message in your output dialogue or log, please refer to this article to troubleshoot common issues. You can also run your Action on debug mode for a more detailed log.

Finally, check your Subset Report to check that the GETMETADATA has run successfully.

Review the Resultant Control Spreadsheet

The composite GETMETADATA action will create and/or populate three worksheets in the Control Spreadsheet: the Tables sheet, the ForeignKeys sheet, and the IdentityColumns sheet. You can review these worksheets to see the metadata retrieved, which will in turn be used to formulate the Subset rules.

Each sheet is created and/or populated by one of the constituent actions run as part part of GETMETADATA. The TABLES action creates the Tables sheet. The GetKeys action creates the ForeignKeys sheet, and FINDIDENTITYCOLUMNS creates the Identity Columns sheet. Each action can be ran individually, as is set out in the next subsection of this article.

Review the Tables Sheet

In your Control Sheet, you should now see a newly created and populated Tables Sheet:

Data Subsetting

The Tables Sheet is created by the TABLES action and shows the metadata that has been read from the source database.

You should check and make sure the metadata looks correct. Each row represents a Primary Key constraint/relationship. It contains the following columns:

  1. Active: “Yes” or “No”. This field is used later to toggle relationships. Ignore this when checking to see if the TABLES action has run correctly.
  2. TableName: The source table form which metadata will be read.
  3. SchemaName: The source schema name.
  4. NumberOfRows: The number of rows in the source database.
  5. PK_Name: The Primary Key name.
  6. PK_Columns: Primary Key Columns, separated by a semi-colon for composite keys.
  7. PK_Column_Types: Primary key column types, separated by a semi-colon if there is more than one.
  8. IsIdentity: Specifies whether the column is an identify column.
  9. InheritedBy: This cell is populated if the table is inherited from another table.
  10. InheritsFrom: This cell is populated if the table inherits from another table.
  11. IsSoftKey: This cell displays “Yes” if the primary key was given in the SoftKeys sheet.
  12. NextValueFunction: If the INCREMENTKEYS action is run this cell provides a list of functions to use to increment the key columns.

You can further check the Log and Subset Report after each Action. This is particularly useful for debugging or if something has gone wrong.

Review the ForeignKeys Sheet

You should additionally see that the ForeignKeys sheet has been populated:

Data Subsetting

The ForeignKeys sheet displays the Foreign Keys that have been read from the source database. It is populated by the GETKEYS Action.

You should check and make sure the Foreign Keys looks correct. Each Row represents a different Foreign Key that has been read from the source database. The Columns are:

  1. Active: Setting Active to “Yes” or “No” includes or excludes a relationship.
  2. FK_Schema: The source Foreign Key schema.
  3. FK_Table: The Foreign Key table name.
  4. FK_Columns: The Column Names. For composite keys the column names are delimited by a semi-colon.
  5. PK_Schema: The source Primary Key schema
  6. PK_Table: Primary Key table name
  7. PK_Columns: The Primary Key Column Names. For composite keys the column names are delimited by a semi-colon.
  8. FK_Name: The Foreign Key name.

Both sheets have been created by the Subset Action. Each time the Actions are run, the sheets will be overwritten.

Review the Log and Subset Report if needed.

Review the IdentityColumns Sheet

The FINDIDENTITYCOLUMNS action will create and/or populate the IdentityColumns sheet, ran either individually or as part of GETMETADATA.

A populated IdentityColumns sheet appears as follows:

IdentityColumns

The IdentityColumns sheet displays the Identity Columns have been read from the source database. These Identity Columns are used by subsetting actions to insert data into the target tables.

You should check and make sure the Identity Columns looks correct. Each Row represents a different Identity Column that has been read from the source database. The Columns are:

  1. Schema: The Source Schema name.
  2. Table: The Source Table name.
  3. Column: The name of the Identity Column.

Each time the GETMETADATA or FINDIDENTITYCOLUMNS Action is run this sheet will be overwritten.

Review the Log and Subset Report if needed.

Run GETMETADATA from the Command Line

To run GETMETADATA using a .CMD script, you must close the Control spreadsheet and any log files you have open.

Next, navigate to the Subset folder that you created earlier.

Open the template .cmd script called “Subset1_getmetadata.cmd”.

You first need to specify the mandatory parameters:

  1. The Database Type.
  2. The location of the Subset.enc.vip flow that will execute the Subsetting;
  3. The location of the Basic Control Spreadsheet
  4. The Actions that will be executed.

To specify the mandatory parameters, first specify your database type:

Data Subsetting

Next, specify the location of the SQLSubset.enc.vip workflow that will execute your Subset. This is contained in your Subset folder. Specify the location as follows:

Data Subsetting

Next, specify where your Control Spreadsheet is. This is specified under

-parControlExcel=””

Locate this in your script and specify the location of the Basic Control Sheet as follows:

Data Subsetting

Now, specify GETMETADATA under =parAction=””:

GETMETADATACMD

Make sure your Control Spreadsheet, the Subset Report, and any Log Files are closed.

Run the script, either by double-clicking the .CMD file, or right-clicking to run the script as administrator.

Review the Log and Subset Report if needed.

Run the constituent Actions individually

You can run the TABLES, GETKEYS, and FINDIDENTITYCOLUMNS actions individually if needed. This is useful for learning and analysing the subsetting process, as well as for debugging and reviewing a source database.

Run TABLES

The TABLES Action will retrieve metadata related to the tables in the Source Database. This metadata is populated in the Tables sheet of the control spreadsheet and is been defined above. TABLES is itself a composite action and also performs the FINDIDENTITYCOLUMNS action, populating data into the Identity Columns sheet.

The process for running Tables from the VIP Server Controller is similar to running GETMETADATA and you can use the instructions from above. You only need to change the parAction argument from “GETMETADATA” to “TABLES”:

parAction=Tables

Otherwise, enter the same parameters and arguments in both the “Flow Cache Server” and “Run Flow” tab. An example TABLES action appears as follows in the VIP Server Controller:

TablesController

Click “Create File” and “Start” to run TABLES. Review the output dialogue and Tables sheet to make sure the action has run correctly. Review the Log and Subset Report if needed.

Run TABLES from Command Line

To run TABLES from the Command Line, navigate to your Subset folder and open your re-usable GETMETADATA .cmd script.

If you have already configured a .cmd script for one Subset Action and wish to edit and re-use that script, you should only need to specify “TABLES” for  -parAction=””:

Otherwise, configure all mandatory parameters in the .cmd script

You can also specify optional parameters at this stage.

Make sure your Control Spreadsheet, the Subset Report, and any Log Files are closed.

Run the script, either by double-clicking the .CMD file, or right-clicking to run the script as administrator.

Now review the Tables sheet to make sure it has been populated correctly. Review the Log and Subset Report if needed.

Run GETKEYS

The GETKEYS Action reads the Foreign Keys from the source database. This information is populated in the ForeignKeys sheet of the control spreadsheet and is been defined above.

The process for running GETKEYS from the VIP Server Controller is similar to running GETMETADATA. You can use the instructions from above. You only need to change the parAction argument from “GETMETADATA” to “GETKEYS”:

parAction=GETKEYS

Otherwise, enter the same parameters and arguments in both the “Flow Cache Server” and “Run Flow” tab as you did to run the “GETMETADATA” and “TABLES” actions. An example GETKEYS action appears as follows in the VIP Server Controller:

GetKeys Controller

Click “Create File” and “Start” to run GETKEYS. Review the output dialogue and ForeignKeys sheet to make sure the action has run correctly. Review the Log and Subset Report if needed.

Run GETKEYS from Command Line

To run GETKEYS from the Command Line, navigate to your Subset folder and open your re-usable .cmd script.

If you have already configured a .cmd script for one Subset Action and wish to edit and re-use that script, you should only need to specify “GETKEYS” for  -parAction=””:

GETKEYSCMD

Otherwise, configure all mandatory parameters in the .cmd script

You can also specify optional parameters at this stage.

Make sure your Control Spreadsheet, the Subset Report, and any Log Files are closed.

Run the script, either by double-clicking the .CMD file, or right-clicking to run the script as administrator.

Now review the Tables sheet to make sure it has been populated correctly. Review the Log and Subset Report if needed.

Run FINDIDENTITYCOLUMNS

The FINDIDENTITYCOLUMNS Action reads the Identity Columns from the source database. This information is populated in the IdentityColumns sheet of the control spreadsheet and is been defined above.

The process for running FINDIDENTITYCOLUMNS from the VIP Server Controller is similar to running GETMETADATA and you can use the instructions from above. You only need to change the parAction argument from “GETMETADATA” to “FINDIDENTITYCOLUMNS”:

parAction=GETKEYS

Otherwise, enter the same parameters and arguments in both the “Flow Cache Server” and “Run Flow” tab as you did to run the “GETMETADATA” action. An example FINDIDENTITYCOLUMNS action appears as follows in the VIP Server Controller:

FINDIDENTITYCOLUMNSController

Click “Create File” and “Start” to run FINDIDENTITYCOLUMNS. Review the output dialogue and IdentityColumns sheet to make sure the action has run correctly. Review the Log and Subset Report if needed.

Run FINDIDENTITYCOLUMNS from Command Line

To run FINDIDENTITYCOLUMNS from the Command Line, navigate to your Subset folder and open your re-usable .cmd script.

If you have already configured a .cmd script for one Subset Action and wish to edit and re-use that script, you should only need to specify “FINDIDENTITYCOLUMNS ” for  -parAction=””:

FINDIDENTITYCOLUMNSCMD

Otherwise, configure all mandatory parameters in the .cmd script

You can also specify optional parameters at this stage.

Make sure your Control Spreadsheet, the Subset Report, and any Log Files are closed.

Run the script, either by double-clicking the .CMD file, or right-clicking to run the script as administrator.

Now review the Tables sheet to make sure it has been populated correctly. Review the Log and Subset Report if needed.