Application Models

Preparations for copying subsets

We have now prepared data masking, but there is another requirement where qa1 and qa2 differ. While qa1 needed a copy of all the data in the production tables, qa2 only needs a subset of the data. Only data for employees of one of the nine departments - Customer Service - is supposed to be copied. For this purpose, XDM provides row level processor (RLP) tasks.

In RLP tasks, you do not use selection rules to specify which tables to copy. Instead, information about the relationships between the tables must be provided through data relation rules which are stored in application models. Application models also define a start table. The RLP task itself contains a start condition that determines which rows from the start table are initially extracted. Based on the provided information, XDM automatically determines which additional tables need to be processed, and which rows from those tables need to be extracted. The process is iterative and it is possible that the same table is processed multiple times. The end result is a consistent subset of the data from multiple tables, which is then inserted into the target schema.

Data relation rules are similar — but not identical — to foreign keys in a database. Foreign keys are used to make sure that every row in a dependant table (child table) has exactly one corresponding row in a referenced table (parent table). Databases use foreign keys to prevent inserting rows into a child table for which no parent row exists, and to automatically delete rows in a child table if a parent row is deleted.

By contrast, data relation rules tell XDM - after extracting rows from a given table - which tables to process next. Data relation rules can be used to instruct XDM to visit related tables even if no formal foreign key exists in the database. In addition to forcing the extraction of parent rows, you can also use these rules to extract all child rows if a parent row is extracted (the database system does not require you to do this - a parent row without any entries in a child table is allowed).

This mechanism allows you to specify subset copies that make sense for testers. For example, when copying a row from the table departments, the database system does not force us to copy any additional rows from other tables because the table departments is not a child table of any other table. However, copying only the entry in the table departments does not make sense for testers because it would result in a department without any employees. Using data relation rules, you can instruct XDM to process the table dept_emp after extracting rows from the table departments and extract all entries for the selected department, and then to process the table employees and extract all corresponding employee records. The remaining tables can be processed in a similar way.

About application models and versions

To manage data relation rules, XDM uses application models. An application model has one or more associated versions. The different application model versions represent different versions of an application that is used in an organization. For example, version 2 of a production application may have introduced new tables when compared with version 1 of the same application. You can create one application model with two versions called V1 and V2, and V2 would contain additional rules that reflect the newly added tables and how they are connected to the tables that already existed in V1.

A version contains information about the start table. This table is the first table that an RLP process will access. The data relation rules, which are also associated with the version, tell XDM which tables to process next. In this example, the start table will be the table departments because we want the user to be able to specify the desired department name, and XDM is supposed to copy information about all the employees that work in this department. The two tables dept_manager and dept_emp are child tables of the table departments. After extracting the desired row from the departments table, we want XDM to access these two tables and extract all rows that refer to the chosen department. We need to add two data relation rules to convey this information to XDM. We then need to add additional data relation rules so that processing does not stop after data has been extracted from the tables departments, dept_emp and dept_manager, but rather continue with the remaining three tables in the schema.

Both application models and the individual versions can also have associated modification sets. When a version of an application model is used in an RLP task, the modification sets from the version and the application model to which it belongs are merged and automatically applied. This way, data masking can be enforced whenever the application model is used, even if the RLP task itself does not explicitly use any modification sets.

Application models and their versions are generic - they are not tied to any specific schema. Tables are only referenced by one-part names. However, it is possible to use an existing schema to import existing foreign keys into a version. In our sample database, all relations are represented by real foreign keys in the database. We can import these foreign keys as data relation rules from any of the existing schemas (production, qa1, or qa2).

Creating the application model and version

  1. In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select the menu item Application Models. Then click the button + Create. Enter the following information:

    Name

    HR Application

  2. Click the button Create and edit.

  3. In the bottom panel select the tab Versions and click + Create. Enter the following information:

    Name

    V1

    Modelling connection

    Select Sample production database from the drop down list

    Modelling schema

    production

    For the modelling schema, you can either enter the schema name production, or you can click the button Pick, select the schema production from the list of schemas, and then click the button Choose.
  4. Click the button Create and edit

  5. Next to the field Start table name, click the button Pick. This opens a list of existing tables in the specified schema.

    Browse schemas and tables
  6. The list Tables will now show the six tables that exist in the schema production. Select the table departments and click the button Choose. The schema browser will close and the field Start table name will be filled with the selected table name.

  7. Click the button Save Changes.

We can now add data relation rules to version V1 of this application model. Since we selected a modelling connection and schema, XDM allows us to import existing foreign keys. Even though these foreign keys are defined in the schema production, the resulting data relation rules are not attached to it in any way.

In this example, all relations are known and can be added by import or manually. If relationships between tables are unknown, the Guide Exploring and Recommending Data Relation Rules with the Table Reference Stage Hook describes how to use the Explore table references stage hook to iteratively identify data relation rules.

When importing existing foreign keys, it is important to keep in mind that data relation rules have slightly different semantics to foreign keys. Data relation rules tell XDM the order in which tables need to be processed. You can import the same foreign key in two different ways: either by selecting the parent table first, or by selecting the child table first. In general, when you import a foreign key between the tables A and B, you should think of it as follows: "After XDM has extracted rows from table A, it needs to extract related rows from table B next". You would then select table A first, followed by table B, when importing an existing foreign key between the two tables, regardless of whether table A is the parent or child table in this relationship.

To import the first two foreign keys that are defined on the start table departments:

  1. Open the panel Table Relation Modelling at the bottom of the page.

  2. Then select the tab Data Relation Rules and click Import from Database (Import From Database button).

  3. Click the button Pick to open the list of available schemas and tables.

  4. In the list Schemas, select production. In the list Tables, select departments. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.

    Screenshot showing a list of foreign keys found
    The names of the foreign keys are shown in the following format:
    <p|c>.<schema_name>.<table_name>.<constraint_name>
    where p or c indicates if the selected table is the parent or child table in a foreign key relationship.
  5. Select the check boxes in front of both foreign keys and click the import button (Button with an arrow pointing to the bottom). In the dialog window Data Relation Rule Import, click the button Import to confirm.

If we were to run a row level processing task with these two data relation rules, XDM would first copy rows from the table departments and then try to copy rows from the tables dept_emp and dept_manager. This would not work, however, because dept_emp and dept_manager are not only child tables of the table departments, but also of the table employees. Since XDM has not been instructed to follow the connection to the table employees, attempting to insert rows into the tables dept_emp or dept_manager would not work. The rows would be rejected by the target database and XDM would discard them. Therefore, we need to add additional rules:

  • We need to tell XDM that, after extracting rows from the table dept_emp, it needs to extract related rows from the table employees.

  • We then need to tell XDM that, after extracting rows from the table employees, it needs to extract rows from the tables salaries and titles.

Note that there is also a relationship between the tables dept_manager and employees. The entries in the table dept_manager are a subset of the entries in the table dept_emp. Therefore, adding a data relation rule between the table dept_emp and the table employees will automatically add all parent rows in the table employees that any of the rows in the table dept_manager might require to exist. This is a peculiarity of the sample schema. In general, you would need to add an extra data relation rule to properly represent this relationship.

To import the foreign key between the tables dept_emp and employees:

After importing the first rules, the browser window now shows the list of data relation rules by itself.
  1. Use the bread crumbs to navigate back to the Version V1.

  2. In the bottom panel Table Relation Modelling, select the tab Data Relation Rules and click Import from database (Import From Database button).

  3. Click the button Pick to open the list of available schemas and tables.

  4. In the list Schemas, select production. In the list Tables, select dept_emp. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.

    Screenshot showing a list of foreign keys with the first entry selected
  5. Select the check box in front of the foreign key with the name c.production.dept_emp.dept_emp_emp_no_fkey and click the import button (Button with an arrow pointing to the bottom). In the dialog window Data Relation Rule Import, click the button Import to confirm.

    The two foreign keys that are displayed have very similar names, and the order in which they are shown is not necessarily the same order as in the above screenshot. Make sure to import c.production.dept_emp.dept_emp_emp_no_fkey and not c.production.dept_emp.dept_emp_dept_no_fkey.

To import the foreign keys between employees and its child tables salaries and titles:

  1. In the bottom panel Table Relation Modelling, select the tab Data Relation Rules and click Import from Database (Import From Database button).

  2. Click the button Pick to open the list of available schemas and tables.

  3. In the list Schemas, select production. In the list Tables, select employees. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.

    Screenshot of the foreign keys found for the production table
    The order in which the foreign keys are shown is not necessarily the same order as in the above screenshot.
  4. Select the check box in front of the foreign keys with the names p.production.salaries.salaries_emp_no_fkey and p.production.titles.titles_emp_no_fkey, then click the import button (Button with an arrow pointing to the bottom). In the dialog window Data Relation Rule Import, click the button Import to confirm.

You will be taken back to the panel Data Relation Rules. The complete list of rules should now look like this:

Screenshot with the list of created data relation rules

The version V1 of the application model HR Application is now complete.

Note that V1 of our application covers all six tables of the sample schema. In more complex environments, it is possible to have application models that cover some, but not all of the tables.

Defining the source and target environments

While defining V1 of the application model, we have imported foreign keys from the real schema production in the database production. The resulting rule set is not tied to any database or schema name. In practice, there are typically multiple different instances of the same set of tables in different databases and under different schema names. In XDM, you use so-called environments to define where these sets of tables exist.

In this example, we need to tell XDM about the schemas production (in the database production) and qa2 (in the database testing) so that we can use them in an RLP task. Even though the same set of tables also exists in the schemas qa1, these are not relevant because we populated qa1 using a TC task. TC tasks copy full tables based on a list of table names and do not need to know the details about the relationships between those tables.

To create the production environment:

  1. In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select Environments. Then click + Create. Enter the following information:

    Name

    Production HR environment

  2. Click the button Create and edit. Optionally, enter a description. Then click the button Save Changes.

  3. In the bottom panel, under Installed Applications, click the button + Create on the right hand side. Enter the following information:

    Description

    (Optional, may be left empty)

    Application model

    Select HR application from the drop down list

    Version

    Select V1 from the drop down list

    Connection

    Select Sample production database from the drop down list

    Table schema

    production

    Create an environment
    For the table schema, you can either enter the schema name production, or you can click the button Pick, select the schema production from the list of schemas, and then click the button Choose.
  4. Click the button Save Changes. You will be taken to the panel Production HR environment.

  5. Verify that the field Start model is set to HR application ( V1 ) on Sample production database. If it is not, click the button Edit, then select this value from the drop down box, then click the button Save Changes.

To create the testing environment:

  1. In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select the menu item Environments. Then click + Create. Enter the following information:

    Name

    Testing HR environment qa2

  2. Click the button Create and edit. Optionally, enter a description. Then click the button Save Changes.

  3. In the bottom panel, under the tab Installed Applications, click the button + Create on the right hand side. Enter the following information:

    Description

    (Optional, may be left empty)

    Application model

    Select HR application from the drop down list

    Version

    Select V1 from the drop down list

    Connection

    Select Sample testing database from the drop down list

    Table schema

    qa2

  4. Click the button Save Changes. You will be taken to the panel Testing HR environment qa2.

  5. Verify that the field Start model is set to HR application ( V1 ) on Sample testing database. If it is not, click Edit, then select this value from the drop down box, then click the button Save Changes.