Application Models
Preparations for copying subsets
We have now prepared data masking, but there is another requirement where
qa1 and qa2 differ. While qa1 needed a copy of all the data in the
production tables, qa2 only needs a subset of the data. Only data for
employees of one of the nine departments - Customer Service - is supposed
to be copied. For this purpose, XDM provides row level processor (RLP) tasks.
In RLP tasks, you do not use selection rules to specify which tables to copy. Instead, information about the relationships between the tables must be provided through data relation rules which are stored in application models. Application models also define a start table. The RLP task itself contains a start condition that determines which rows from the start table are initially extracted. Based on the provided information, XDM automatically determines which additional tables need to be processed, and which rows from those tables need to be extracted. The process is iterative and it is possible that the same table is processed multiple times. The end result is a consistent subset of the data from multiple tables, which is then inserted into the target schema.
Data relation rules are similar — but not identical — to foreign keys in a database. Foreign keys are used to make sure that every row in a dependant table (child table) has exactly one corresponding row in a referenced table (parent table). Databases use foreign keys to prevent inserting rows into a child table for which no parent row exists, and to automatically delete rows in a child table if a parent row is deleted.
By contrast, data relation rules tell XDM - after extracting rows from a given table - which tables to process next. Data relation rules can be used to instruct XDM to visit related tables even if no formal foreign key exists in the database. In addition to forcing the extraction of parent rows, you can also use these rules to extract all child rows if a parent row is extracted (the database system does not require you to do this - a parent row without any entries in a child table is allowed).
This mechanism allows you to specify subset copies that make sense for
testers. For example, when copying a row from the table departments, the
database system does not force us to copy any additional rows from other
tables because the table departments is not a child table of any other
table. However, copying only the entry in the table departments does not
make sense for testers because it would result in a department without any
employees. Using data relation rules, you can instruct XDM to process the
table dept_emp after extracting rows from the table departments and extract
all entries for the selected department, and then to process the table
employees and extract all corresponding employee records. The remaining
tables can be processed in a similar way.
About application models and versions
To manage data relation rules, XDM uses application models. An application model has one or more associated versions. The different application model versions represent different versions of an application that is used in an organization. For example, version 2 of a production application may have introduced new tables when compared with version 1 of the same application. You can create one application model with two versions called V1 and V2, and V2 would contain additional rules that reflect the newly added tables and how they are connected to the tables that already existed in V1.
A version contains information about the start table. This table is the
first table that an RLP process will access. The data relation rules,
which are also associated with the version, tell XDM which tables to
process next. In this example, the start table will be the table
departments because we want the user to be able to specify the desired
department name, and XDM is supposed to copy information about all the
employees that work in this department. The two tables dept_manager and
dept_emp are child tables of the table departments. After extracting
the desired row from the departments table, we want XDM to access these two
tables and extract all rows that refer to the chosen department. We need
to add two data relation rules to convey this information to XDM. We then
need to add additional data relation rules so that processing does not
stop after data has been extracted from the tables departments, dept_emp
and dept_manager, but rather continue with the remaining three tables
in the schema.
Both application models and the individual versions can also have associated modification sets. When a version of an application model is used in an RLP task, the modification sets from the version and the application model to which it belongs are merged and automatically applied. This way, data masking can be enforced whenever the application model is used, even if the RLP task itself does not explicitly use any modification sets.
Application models and their versions are generic - they are not tied to
any specific schema. Tables are only referenced by one-part names.
However, it is possible to use an existing schema to import existing
foreign keys into a version. In our sample database, all relations are
represented by real foreign keys in the database. We can import these
foreign keys as data relation rules from any of the existing schemas
(production, qa1, or qa2).
Creating the application model and version
-
In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select the menu item Application Models. Then click the button + Create. Enter the following information:
Name
HR Application -
Click the button Create and edit.
-
In the bottom panel select the tab Versions and click + Create. Enter the following information:
Name
V1Modelling connection
Select Sample production database from the drop down list
Modelling schema
productionFor the modelling schema, you can either enter the schema name production, or you can click the button Pick, select the schemaproductionfrom the list of schemas, and then click the button Choose. -
Click the button Create and edit
-
Next to the field Start table name, click the button Pick. This opens a list of existing tables in the specified schema.
-
The list Tables will now show the six tables that exist in the schema production. Select the table
departmentsand click the button Choose. The schema browser will close and the field Start table name will be filled with the selected table name. -
Click the button Save Changes.
We can now add data relation rules to version V1 of this application model.
Since we selected a modelling connection and schema, XDM allows us to import
existing foreign keys. Even though these foreign keys are defined in the
schema production, the resulting data relation rules are not attached to it
in any way.
| In this example, all relations are known and can be added by import or manually. If relationships between tables are unknown, the Guide Exploring and Recommending Data Relation Rules with the Table Reference Stage Hook describes how to use the Explore table references stage hook to iteratively identify data relation rules. |
When importing existing foreign keys, it is important to keep in mind that data relation rules have slightly different semantics to foreign keys. Data relation rules tell XDM the order in which tables need to be processed. You can import the same foreign key in two different ways: either by selecting the parent table first, or by selecting the child table first. In general, when you import a foreign key between the tables A and B, you should think of it as follows: "After XDM has extracted rows from table A, it needs to extract related rows from table B next". You would then select table A first, followed by table B, when importing an existing foreign key between the two tables, regardless of whether table A is the parent or child table in this relationship.
To import the first two foreign keys that are defined on the start table departments:
-
Open the panel Table Relation Modelling at the bottom of the page.
-
Then select the tab Data Relation Rules and click Import from Database (
). -
Click the button Pick to open the list of available schemas and tables.
-
In the list Schemas, select
production. In the list Tables, selectdepartments. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.
The names of the foreign keys are shown in the following format:
<p|c>.<schema_name>.<table_name>.<constraint_name>
where p or c indicates if the selected table is the parent or child table in a foreign key relationship. -
Select the check boxes in front of both foreign keys and click the import button (
). In the dialog window Data
Relation Rule Import, click the button Import to confirm.
If we were to run a row level processing task with these two data
relation rules, XDM would first copy rows from the table departments
and then try to copy rows from the tables dept_emp and dept_manager.
This would not work, however, because dept_emp and dept_manager are
not only child tables of the table departments, but also of the table
employees. Since XDM has not been instructed to follow the connection
to the table employees, attempting to insert rows into the tables
dept_emp or dept_manager would not work. The rows would be rejected
by the target database and XDM would discard them. Therefore, we need to
add additional rules:
-
We need to tell XDM that, after extracting rows from the table
dept_emp, it needs to extract related rows from the tableemployees. -
We then need to tell XDM that, after extracting rows from the table
employees, it needs to extract rows from the tablessalariesandtitles.
Note that there is also a relationship between the tables dept_manager and
employees. The entries in the table dept_manager are a subset of the
entries in the table dept_emp. Therefore, adding a data relation rule
between the table dept_emp and the table employees will automatically
add all parent rows in the table employees that any of the rows in the
table dept_manager might require to exist. This is a peculiarity of the
sample schema. In general, you would need to add an extra data relation rule
to properly represent this relationship.
To import the foreign key between the tables dept_emp and employees:
| After importing the first rules, the browser window now shows the list of data relation rules by itself. |
-
Use the bread crumbs to navigate back to the Version
V1. -
In the bottom panel Table Relation Modelling, select the tab Data Relation Rules and click Import from database (
). -
Click the button Pick to open the list of available schemas and tables.
-
In the list Schemas, select
production. In the list Tables, selectdept_emp. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.
-
Select the check box in front of the foreign key with the name
c.production.dept_emp.dept_emp_emp_no_fkeyand click the import button (
). In the dialog window Data
Relation Rule Import, click the button Import to confirm.The two foreign keys that are displayed have very similar names, and the order in which they are shown is not necessarily the same order as in the above screenshot. Make sure to import c.production.dept_emp.dept_emp_emp_no_fkeyand notc.production.dept_emp.dept_emp_dept_no_fkey.
To import the foreign keys between employees and its child tables salaries and titles:
-
In the bottom panel Table Relation Modelling, select the tab Data Relation Rules and click Import from Database (
). -
Click the button Pick to open the list of available schemas and tables.
-
In the list Schemas, select production. In the list Tables, select
employees. Click the button Choose. The schema browser will disappear and you will see a list of foreign keys.
The order in which the foreign keys are shown is not necessarily the same order as in the above screenshot. -
Select the check box in front of the foreign keys with the names
p.production.salaries.salaries_emp_no_fkeyandp.production.titles.titles_emp_no_fkey, then click the import button (
). In the dialog window
Data Relation Rule Import, click the button Import to confirm.
You will be taken back to the panel Data Relation Rules. The complete list of rules should now look like this:
The version V1 of the application model HR Application is now complete.
Note that V1 of our application covers all six tables of the sample schema. In more complex environments, it is possible to have application models that cover some, but not all of the tables.
Defining the source and target environments
While defining V1 of the application model, we have imported foreign keys
from the real schema production in the database production. The resulting
rule set is not tied to any database or schema name. In practice, there are
typically multiple different instances of the same set of tables in
different databases and under different schema names. In XDM, you use
so-called environments to define where these sets of tables exist.
In this example, we need to tell XDM about the schemas production (in the
database production) and qa2 (in the database testing) so that we can use
them in an RLP task. Even though the same set of tables also exists in the
schemas qa1, these are not relevant because we populated qa1 using a TC
task. TC tasks copy full tables based on a list of table names and do not need
to know the details about the relationships between those tables.
To create the production environment:
-
In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select Environments. Then click + Create. Enter the following information:
Name
Production HR environment -
Click the button Create and edit. Optionally, enter a description. Then click the button Save Changes.
-
In the bottom panel, under Installed Applications, click the button + Create on the right hand side. Enter the following information:
Description
(Optional, may be left empty)
Application model
Select HR application from the drop down list
Version
Select V1 from the drop down list
Connection
Select Sample production database from the drop down list
Table schema
production
For the table schema, you can either enter the schema name production, or you can click the button Pick, select the schemaproductionfrom the list of schemas, and then click the button Choose. -
Click the button Save Changes. You will be taken to the panel Production HR environment.
-
Verify that the field Start model is set to HR application ( V1 ) on Sample production database. If it is not, click the button Edit, then select this value from the drop down box, then click the button Save Changes.
To create the testing environment:
-
In the left sidebar, click the menu category Data Subsets to expand the data subsets menu and select the menu item Environments. Then click + Create. Enter the following information:
Name
Testing HR environment qa2 -
Click the button Create and edit. Optionally, enter a description. Then click the button Save Changes.
-
In the bottom panel, under the tab Installed Applications, click the button + Create on the right hand side. Enter the following information:
Description
(Optional, may be left empty)
Application model
Select HR application from the drop down list
Version
Select V1 from the drop down list
Connection
Select Sample testing database from the drop down list
Table schema
qa2 -
Click the button Save Changes. You will be taken to the panel Testing HR environment qa2.
-
Verify that the field Start model is set to HR application ( V1 ) on Sample testing database. If it is not, click Edit, then select this value from the drop down box, then click the button Save Changes.
Previous section: Modification methods and modification sets | Next section: Row level processing tasks