Data Selection and Data Reduction

All row level processing tasks select rows from a table (called the start table) using a start condition to define the rows to select. XDM will also select all referenced rows in related tables. A table is related to the start table if there are data relation rules connecting the two tables.

Start Table

The Start Table is the table that is used for data selection definition in the task. For every application model version, a start table is defined. By setting the Start model in an environment, the start table of this model is used if this environment is the source environment of the task.

For more details refer to the start table description in the reference.

Start Condition

The Start condition is specified by choosing one of the options in the sub-panel Kind of data selection and then supplying the associated parameters. This defines how the task should select rows from the start table.

There are four possible start conditions:

Select Number of Rows

Sets a fixed, non-negative number r of rows. XDM automatically and randomly chooses the keys for r rows and uses them to select those rows in the start table. r must be greater than -1; if r is greater than the number of rows in the table, all rows of the table will be selected.

Select Percentage of Table

Sets a fixed value p between 0 and 100. XDM automatically and randomly chooses the keys of p% of the rows in the start table and uses them to select those rows.

Specify Query

Select rows in the start table by using a custom query. The start condition is used in the WHERE clause of a SELECT statement. This query must be a valid SQL statement in the source system without a semicolon at the end. It is possible to specify a start condition over joined tables.

XDM selects the keys of the desired rows from the start table using the specified query. It then uses these key values to select the rows to be copied.

Using the start query template.

It is possible to define source and target environment in the task instead of in the task template. Therefore, the start table does not need to be defined when creating the task template. Furthermore, the unique row identifier to identify individual rows will be evaluated during the task run, so it is also not known when creating the task template.

XDM uses a start query template to represent these values in the task template. This start query template is defined as follows:

SELECT ${uniqueRowIdentifier}
FROM ${startTableSchema}.${startTableName} T
WHERE ...

The variables evaluate as follows:

uniqueRowIdentifier

A list of columns that can uniquely identify a single row in the start table.

The uniqueRowIdentifier is set during task execution, so it must be part of the start query.

startTableSchema

The table schema as set in the installed application version marked as starting point for the chosen environment.

startTableName

The table name as set in the application model for the installed application marked as starting point for the chosen environment.

It is recommended to use both name and schema variables in the start condition. Otherwise, the query must be changed manually when changing the environment.

The query can be changed to your needs so that it returns the rows you wish to have in the initial extract for row level processing.

The query template is evaluated using FreeMarker™. You can use all expressions described in the FreeMarker manual.

When using a complex start condition, it is useful to add comments to improve structure and readability. A line starting with # or -- is treated as a comment and is ignored when the start condition is evaluated. Empty lines are also ignored when the start condition is evaluated.

Using Custom Parameters

By enabling custom parameters for your task template, you can use those parameters in the query. If you have activated a parameter named customerId in your task you can access this with the following query:

SELECT ${uniqueRowIdentifier}
FROM ${startTableSchema}.${startTableName} T
WHERE "CUSTOMER_ID" = ${customerId}

In the next example we have two different parameters customerId and personId where only one may be active. This can be handled by the use of a FreeMarker™ expression:

SELECT ${uniqueRowIdentifier}
FROM ${startTableSchema}.${startTableName} T
<#if customerId?? && customerId != ''>
WHERE "CUSTOMER_ID" = ${customerId}
<#elseif personId?? && personId != ''>
INNER JOIN ${startTableSchema}.PERSON P
ON (P.CUSTOMER_ID = T.CUSTOMER_ID)
WHERE PERSON_ID = '${personId?trim}'
<#else>
WHERE 1=0
</#if>

This example shows the use of FreeMarker™ conditionals, the possibility to JOIN other tables into the start query, and an example of the use of functions on variables by applying trim to the personId.

The section Using file content in a query describes how file content can be used as a custom parameter value in a start query.

Read from external file

Specify to use start table selection values from a CSV file in the start condition.

This file must be stored in XDM as a file object with type Comma separated value (CSV).

The file is set in Start condition CSV file, the Columns in file property defines, which table columns are used in the file.

For more details refer to the row selection mode description and to the start condition description in the reference.

Reduction Rules

Reduction rules specify which objects should be reduced in an XDM task execution.

They are used to control the amount of data when fetching rows from a table. To this end the SELECT statement to fetch the data can be customized. The statement can be either a conditional expression (e.g. WHERE clause), or a template that defines the entire SELECT statement.

Once the set of rows to be copied has been determined, the reduction rules are applied to this set. When a row is removed from the set, because it does not satisfy the condition of a reduction rule, XDM automatically removes all dependent rows from the set as well. Therefore, reduction rules will be set on the source tables.

All selected rows which fulfill the reduction rule condition are copied, all other rows are discarded.

If a reduction rule is set for a table that is not selected in the task execution, the rule will be ignored.

Every reduction rule in the task execution is used. If there are two or more rules on a table, only the last matching rule is used. All previous rules are ignored for this table.

The reduction rule expression specifies a conditional expression that determines whether a row is to be copied or not.

There are two ways to define an expression:

By defining a WHERE condition for a SELECT statement generated at runtime
By defining a template that contains the full SELECT statement.

XDM evaluates the expression as a SELECT statement when processing the keys during subset extract. If the full SELECT statement template is not defined, the expression is interpreted as a WHERE clause for a SELECT statement. This must be entered in the form WHERE <condition>. The first part of the statement is generated by XDM.

It is possible to use runtime parameters as variables in the query. This is both for the WHERE condition and for the template possible. When using a SELECT template, the variables ${columns}, ${tableName}, and ${keys} will be set automatically in the task execution.

Using the WHERE clause

Usually the expression is set by defining a WHERE clause. A SELECT query that uses the specified WHERE clause is generated. Only rows which satisfy the resulting SELECT query are copied.

It is not possible to use conditions containing columns from other tables. If it is necessary to use such condition, use full templates instead.

Using the SELECT template

With a template, it is possible to customize the SELECT and FROM clauses of the statement. A SELECT template starts with <#--template--> and has to contain a valid SQL SELECT statement, as well as the variables ${columns} for the list of columns, ${tableName} for the name of the table and ${keys} for the list of keys selected by XDM when evaluating the start condition. These values will be set automatically by XDM during task execution (see default template below).

The variables ${columns}, ${tableName} and ${keys} must be part of the template, otherwise the template cannot be processed correctly.

A template should be used for complex reduction rules that contain more than a straight forward WHERE clause.

The default template is defined as:

<#--template-->
SELECT ${columns}
   FROM ${tableName} "T"
   WHERE ${keys}

Examples

Thin out rows

A typical use for a reduction rule is to thin out the rows that are copied from the child table of a relationship. For example, if a table contains history records, and a data relation rule specifies that the rows from the child table are also to be copied, a reduction rule can be added to the child table in order to specify that not all dependent rows, but only those that fulfill a certain conditional expression, are supposed to be copied. An example for this conditional expression is:

WHERE ID < 1000

Copy data depending on condition

Another usage of reduction rules is to filter rows that have a special value in one column. Assume, you want to copy customers with their contracts. The contracts have different types named A, B and C, which are stored in the CONTRACT_TYPE column of the contract table. For your purpose, only contracts of type A are necessary. To avoid to copying contracts of type B and C, you would define a reduction rule for the table PROD.CONTRACT and the column CONTRACT_TYPE:

WHERE CONTRACT_TYPE = 'A'

Copy data depending on a condition in another table

If the condition is set in another table, it is not possible to use the WHERE condition. We modify the example Depending Condition: Instead of a CONTRACT_TYPE column with the types as data, the type is stored in its own table called PROD.CONTRACTTYPE. The ID of this table is stored as foreign key in the column CONTRACT_TYPE. Again, only contracts of type A are necessary. To satisfy this requirement, a reduction rule template with a join between PROD.CONTRACT and PROD.CONTRACTTYPE is used. Note that PROD.CONTRACT is the subject of the reduction rule and therefore specified by the variable ${tableName}.

<#--template-->
SELECT ${columns}
   FROM ${tableName} "T" JOIN "PROD"."CONTRACTTYPE" "C" ON "T"."CONTRACT_TYPE" = "C"."ID"
   WHERE ${keys} AND "C"."TYPE" = 'A'

For more details refer to the reduction rules description in the reference.