Modification Method Script Examples

The following chapter shows examples how to implement some general use cases of Modification Methods either in Groovy or JavaScript code.

The detailed documentation of the modification method code can be found in the Modification Method code reference.

More documentation on modification methods itself as a configuration object can be found Modification Method object reference.

Example for a simple modification method

The following example demonstrates how to create a fictional e-mail address from a column that contains the first name of a person, and a column that contains the last name of a person. It is assumed that this modification method is used in a modification rule that contains the three parameters firstNameColumn, lastNameColumn, and domain, where firstNameColumn is the name of the table column that contains the first name of the person, lastNameColumn is the name of the table column that contains the last name of the person, and domain is a syntactically valid domain name. The modification rule itself is assumed to be applied to a column that is supposed to store the generated e-mail addresses. The position of this column inside the table is stored in the variable columnIndex.

Groovy
JavaScript

def apply() {
    def firstNameIdx = ctx.indexOf(modificationMethod.firstNameColumn)
    def lastNameIdx = ctx.indexOf(modificationMethod.lastNameColumn)
    def email = "${data[firstNameIdx]}.${data[lastNameIdx]}@${modificationMethod.domain}"

    data[columnIndex] = email
    return true
}

function apply() {
    var firstNameIdx = ctx.indexOf(modificationMethod.firstNameColumn);
    var lastNameIdx = ctx.indexOf(modificationMethod.lastNameColumn);
    var email = data[firstNameIdx] + "." +
                   data[lastNameIdx] + "@" +
                   modificationMethod.domain;

    data[columnIndex] = email;
    return true;
}

Line 1 determines the position of the first name column inside the table.

Line 2 determines the position of the last name column inside the table.

Lines 3 create an e-mail address by combining the first name, last name, and the fictional domain name.

Line 4 writes the generated e-mail address into the data array. The position inside the array is determined by columnIndex, which points to the table column that is supposed to store the e-mail address.

Each apply() method returns a boolean value, weather to keep or retain the current row in the further processing.

Filtering rows using modification methods

With XDM, it is possible to define which rows should be filtered during a modification process. Filtered rows will not be part of the modification process. To filter rows, you have to specify the return value of the applied modification method. Returning true means that the row will not be filtered, whereas returning false means that it will be filtered, as also described in the object reference for modification method.

As an example, we assume that we have a table structure containing a table that consists of information of employees. For a specific test, we only want to modify female employees that are born before 1980. To achieve this, we have to filter every row, that does not match the requirements. Through this, these rows will not be part of the copy.

If the requirements match, the first name of the person should be changed to the fix value "Jane". The corresponding modification code could then look like this:

Groovy
JavaScript

def apply() {
    def genderColumnIndex = ctx.indexOf('gender')
    def gender = data[genderColumnIndex]

    def birthDateColumnIndex = ctx.indexOf('birth_date')
    def birthDate = data[birthDateColumnIndex].toString()

    // the actual modification
    data[ctx.indexOf('first_name')] = "Jane"

    return gender == 'F' && birthDate < "1980-01-01"
}

function apply() {
    var genderColumnIndex = ctx.indexOf('gender');
    var gender = data[genderColumnIndex];

    var birthDateColumnIndex = ctx.indexOf('birth_date');
    var birthDate = data[birthDateColumnIndex].toString();

    // the actual modification
    data[ctx.indexOf('first_name')] = "Jane";

    return gender == 'F' && birthDate < "1980-01-01";
}

To illustrate the impact of filtering rows, a table comparing input and output values will look as follows. Matching rows are displayed in yellow in the source table.

Logging and Debugging

If there is a case that something does not work correctly or is in development, print shows an output.

This example shows how to debug a modification method for development cases with ctx.print.

It is also possible to use the native print() or println() methods of Groovy or JavaScript. These will write to the task log independent of the tasks log level. Then each message will be prefixed with the current modification methods name and the table the method is applied to.

The following example will print the modified value for the first 30 rows of the table the modification method is applied to. The doModificationMethod in this example is just a stub to demonstrate the usage the ctx.print methods

Groovy
JavaScript

def rowCount = 0
def logRowLimit = 30

def doModification(input) {
    // The masking algorithm, method, or something static that can be evaluated.
    return "example ${input}"
}

def apply() {
    if(rowCount < logRowLimit) {
        String modifiedValue = doModification(data[columnIndex])
        // will only be visible, if task is running with log level TRACE
        ctx.print("FINE","Original value: ${data[columnIndex]}  - Modified value: ${modifiedValue}")
        // will be visible, if task is running with log level INFO
        ctx.print("Modified value: ${modifiedValue}")
        // will be visible in the task stage log on INFO level, prefixed with the method name
        println("Print using native method")
        rowCount += 1
    }
    return true
}

var rowCount = 0;
var logRowLimit = 30;

function doModification(input) {
    return "example " + input;
}

function apply() {
    if(rowCount < logRowLimit)
    {
        var masked = doModification(data[columnIndex]);
        // will only be visible, if task is running with log level TRACE
        ctx.print("FINE","Original value: " + data[columnIndex] + " - Modified value: " + modifiedValue);
        // will be visible, if task is running with log level INFO
        ctx.print("Modified value: ${modifiedValue}");
        // will be visible in the task stage log on INFO level, prefixed with the method name
        println("Print using native method");
        rowCount++;
    }
    return true;
}

Also, it is useful for bigger modification configurations, to set up a library file with a debugging log method, that can be configured by a custom parameter in the task configuration. This example assumes there is a custom parameter definition modificationDebugLog of type Boolean, that is configured in the respective task template. If the utility function is used in all modification methods, the debug logging can be enabled and disabled at once.

Groovy
JavaScript

def apply() {
    debugLog("Current value ${data[columnIndex]}")
    return true
}

// ...
// inside a library file
// ...

def debugLog(def message) {
    if(ctx.getTaskRuntimeProperty('modificationDebugLog') == 'true') {
        ctx.print(message)
    }
}

function apply() {
    debugLog("Current value " + data[columnIndex]});
    return true;
}

// ...
// inside a library file
// ...

function debugLog(var message) {
    if (ctx.getTaskRuntimeProperty('modificationDebugLog') == 'true') {
        ctx.print(message);
    }
}

Multiply rows using modification

This is an example of how to multiply rows by stashing and unstashing rows. With this it is possible to add new rows into the process. There are two helper methods to support the multiplication of rows: ctx.stashRow(…) and ctx.unstashRows().

The following example explains the usage of these methods, implemented in Groovy and JavaScript. We assume there is a table consisting of personal data, including the columns first_name and last_name. We want to do two things with this data.

Add the letter 'X' on every first name in the table
Replace every second last name with the constant string "Watson"

For this, we use the helper methods stashRow(…) and unstashRows(). First, we append the letter 'X' for every first name. After this, we do a deep copy of the modified row to make sure that it is not changed any further.

Now we can define the second modification part by setting the last name to "Watson". We stash this row as well.

To get both of the stashed rows back, we need to call unstashRows() and put the returned rows into the variable replacementData. This variable is used by XDM as the output of the processing of the modification method, if it is not empty.

Groovy
JavaScript

def apply() {
    def firstNameColIndex = ctx.indexOf("first_name")
    data[firstNameColIndex] += 'X'
    ctx.stashRow(data)

    def lastNameColIndex = ctx.indexOf("last_name")
    data[lastNameColIndex] = "Watson"
    ctx.stashRow(data)

    replacementData.addAll(ctx.unstashRows())
    return true
}

function apply() {
    var firstNameColIndex = ctx.indexOf("first_name");
    data[firstNameColIndex] += 'X';
    ctx.stashRow(data);

    var lastNameColIndex = ctx.indexOf("last_name");
    data[lastNameColIndex] = "Watson";
    ctx.stashRow(data);

    replacementData.addAll(ctx.unstashRows());
    return true;
}

Multiplying rows may lead to problems concerning primary key constraints, if the primary key values are not modified as well.

The following tables display the comparison of source and target tables.

Using Java libraries in modification scripts

It is possible to use external JAVA libraries to be called from the modification methods scripts. This example will show how to add and use the Java Faker library in a modification method.

Add a Java Library by creating a new XDM File:

Enter a display name (e.g. Java Faker)
Set File type to Java libraries
Leave delimiter as it is, it will not be evaluated
Upload the JAR into the XDM file (Download the JAR file here)

The Java Faker library itself has a dependency on SnakeYML. These dependencies will not be resolved automatically, but all dependent other libraries need to be added to the runtime class path of the modification method.

Also add a Java Library as XDM File for the dependent library:

Enter a display name (SnakeYML)
Set File type to Java libraries
Leave delimiter as it is, it will not be evaluated
Upload the JAR into the XDM file (Download the JAR file here)

With the libraries added to XDM, they can be used in a modification method. In the modification method, under libraries add the Java Faker and the SnakeYML libraries.

In the code, the libraries can be used as follows:

Groovy
JavaScript

faker = new com.github.javafaker.Faker(java.util.Locale.GERMAN)
def apply() {
  data[columnIndex] = faker.name().lastName()
  return true
}

faker = new com.github.javafaker.Faker(java.util.Locale.GERMAN);
function apply() {
  data[columnIndex] = faker.name().lastName();
  return true;
}

To illustrate the impact of the libraries here is an example, where the modification method was used on the last_name column.

Calling a REST API with modification

It is possible to call an external service from modification methods via a REST API.

This example calls a public REST API genderize.io, which gives an evaluation on the gender of a given name.

Groovy
JavaScript

import groovy.json.JsonSlurper
import org.springframework.http.HttpHeaders
import org.springframework.http.HttpEntity
import org.springframework.http.HttpMethod
import org.springframework.web.client.RestTemplate

def apply() {
    var firstname = data[columnIndex]
    def body = restCall("https://api.genderize.io/?name=${firstname}", "GET", null,
            [
                "Content-Type": "application/json",
                "Accept": "application/json",
            ]
        )
    print(body)
    var result = new JsonSlurper().parseText(body)
    print "The name  ${result.name} is with a probability of ${result.probability} of gender ${result.gender}"

    data[columnIndex] = result.gender
    return true
}

def restCall(url, method, payload, headersMap) {

    def headers = new HttpHeaders()
    headersMap.each { entry ->
        headers.set(entry.key, entry.value)
    }

    def entity = new HttpEntity(payload, headers)
    println "Call REST Endpoint $url method: $method"

    def restTemplate = buildRestTemplate()
    def response = restTemplate.exchange(url, HttpMethod.valueOf(method), entity, String.class)

    return response.getBody()
}

def buildRestTemplate() {
    return new RestTemplate()
}

importClass(org.springframework.http.HttpHeaders)
importClass(org.springframework.http.HttpEntity)
importClass(org.springframework.http.HttpMethod)
importClass(org.springframework.web.client.RestTemplate)

function apply() {
    var firstname = data[columnIndex];
    var body = restCall("https://api.genderize.io/?name=" + firstname, "GET", null,
            {
                "Content-Type": "application/json",
                "Accept": "application/json",
            }
        );
    print("Raw REST result:" + body);
    var result = JSON.parse(body);
    print("The name " + result.name + " is with a probability of " + result.probability + " of gender " + result.gender);

    data[columnIndex] = result.gender;
    return true;
}

function restCall(url, method, payload, headersMap) {
    var headers = new HttpHeaders();

    for(var key in headersMap) {
        headers.set(key, headersMap[key]);
    }
    var entity = new HttpEntity(payload, headers);
    print("Call REST Endpoint " + url + " with method: " + method);

    var restTemplate = buildRestTemplate();
    var response = restTemplate.exchange(url, HttpMethod.valueOf(method), entity, Packages.java.lang.String.class);

    return response.getBody();
}

function buildRestTemplate() {
    return new RestTemplate();
}

Find more examples for REST API calls in the common examples.

Executing SQL

Executing SQL using Groovy or JavaScript

In modification methods the typical script methods to use JDBC for querying data can be used as described in common script examples for executing SQL.

For more information about the Groovy SQL classes and methods can be found here

Implementing a query in the apply() method for every modified row will lead to bad modification performance. If necessary to have an information for a database at place, consider writing a hook, that extracts the necessary information into a mapping table container and lookup the information for the modification method from there.

An example how to use a SQL query in combination with the init() method:

Groovy
JavaScript

import groovy.sql.Sql

def maxValue = 0

void init() {
    def query = 'SELECT MAX(emp_no) FROM production.employees'
    Sql sql = new Sql(modificationMethod.connection.jdbcConnection)
    maxValue = sql.firstRow(query)[0]
}

def apply() {
    // change the id for all IDs that are within the existing range of IDs
    if (data[columnIndex] <= maxValue) {
        data[columnIndex] = data[columnIndex] + maxValue
    }
    return true
}

importPackage(java.sql);

var maxValue = 0;

function init() {
    var query = 'SELECT MAX(emp_no) FROM production.employees';
    var prepareStatement = modificationMethod.connection.jdbcConnection.prepareStatement(query);
    var resultSet = prepareStatement.executeQuery();

    while (resultSet.next()) {
        // returns only one row
        maxValue = resultSet.getString(1);
    }

    resultSet.close();
    prepareStatement.close();
}

function apply() {
    // change the id for all IDs that are within the existing range of IDs
    if (data[columnIndex] <= maxValue) {
        data[columnIndex] = data[columnIndex] + maxValue;
    }
    return true;
}

Get and use Source and Target Connection

In some scenarios, a modification method needs to read additional data from another table during row processing. For example, you might want to fill a column with a derived value such as MAX(emp_no) from a separate table. JDBC connections for both the current source and the current target connection can be obtained from the modification context by using ctx.getCurrentSourceConnectionPool() and ctx.getCurrentTargetConnectionPool() respectively. Both methods return a connection pool. A connection is acquired with .get() and must be returned with .put() when no longer needed.

Using JDBC inside a modification method can significantly slow down task execution, when statements are executed in the apply function, because then the statement is executed once per processed data row. If possible statements should be executed only once in the init function and the results should be stored in global variables. This approach should only be used if simpler alternatives (such as mapping tables or lookup data) are not sufficient.

The following example demonstrates how to:

obtain JDBC connections for the current source connections from the modification context,
execute a SELECT statement on the source database, and
write the result into the current row using the data and columnIndex variables .

Groovy
JavaScript

 import groovy.sql.Sql

// Global variables used across init(), apply(), and close()
mySourceConnection = null
sqlSource = null
sqlTarget = null
currentEmpNo = null

void init() {
    // Obtain JDBC connection from the current connection pool.
    // This connection corresponds to the configured source connections of the task.
    mySourceConnection = ctx.getCurrentSourceConnectionPool().get()
    // Wrap the JDBC connection with groovy.sql.Sql for convenient querying.
    sqlSource = new Sql(mySourceConnection)

    // Example query: read the maximum employee number from another table.
    def query = 'SELECT MAX(emp_no) FROM production.employees'
    def row = sqlSource.firstRow(query)

    // Store the retrieved maximum employee number in a global variable
    // This will be used and incremented in apply()
    currentEmpNo = row[0]
}

def apply() {
    // Increment the employee number by 1 for each row processed
    currentEmpNo = currentEmpNo + 1
    // Write the new employee number into the current record
    data[columnIndex] = currentEmpNo
    return true
}

void close() {
    // Return the connection to their pools to avoid resource leaks.
    ctx.getCurrentSourceConnectionPool().put(mySourceConnection)
}

// Make Groovy Sql class available for Nashorn
var Sql = Java.type("groovy.sql.Sql");

var mySourceConnection = null;
var sqlSource = null;
var sqlTarget = null;
var currentEmpNo = null;

function init() {
    // Get JDBC connections from the current connection pools
    mySourceConnection = ctx.getCurrentSourceConnectionPool().get();

    // Wrap the JDBC connections with groovy.sql.Sql
    sqlSource = new Sql(mySourceConnection);

    // Only once at startup: get the current maximum employee number
    var query = "SELECT MAX(emp_no) FROM production.employees";
    var row = sqlSource.firstRow(query); // returns a GroovyRowResult (Map-like)

    currentEmpNo = row.getAt(0);
}

function apply() {
    // Increase the employee number by 1 for each new row
    currentEmpNo = currentEmpNo + 1;

    // Write the new value into the current column
    data[columnIndex] = currentEmpNo;

    return true;
}

function close() {
    ctx.getCurrentSourceConnectionPool().put(mySourceConnection);
}

Use persistent memory in modification methods

Within XDM, you have the possibility to use a persistent memory to store information into and read information from. There are three different options of using the ModificationUtils and MappingTableUtils API in a modification method, all consisting of different types of data storage. These options are described below, containing one example each. The example itself always shows the same doing but with different ways to gain mapping information from. The example uses describes a first name mapping.

An overview of the available API in modification methods can be found in the corresponding object reference.

Use stored data from a Mapping Table Container

One possibility to use stored data from a mapping table is to read information from a defined Mapping Table Container. For this, the MappingTableUtils API should be used. As a precondition, a Mapping Table Container has to be defined and filled with data.

Even if it is possible to store data in the Mapping Table Container from within a modification method, this is not recommended, because it would cause lock issues when writing into the container. We recommend to populate the contents of a mapping table container by a task stage hook.

For having access to the mapping table container, that for this example is called firstNamesContainer, the MappingTableUtils have to be initialized appropriately and can then be used in the further processing of the data to be modified. The name of the MappingTableContainer is internally interpreted as a Connection object.

Groovy
JavaScript

import de.ubs.xdm3.batch.modification.MappingTableUtils
utils = new MappingTableUtils(modificationMethod.firstNamesContainer)

def apply() {
    def firstNameIdx = ctx.indexOf("first_name")
    def oldFirstName = data[firstNameIdx]
    // specify mapping table name inside container and key for which a value should be retrieved
    def newFirstName = utils.get("first_names", oldFirstName).first()

    data[firstNameIdx] = newFirstName
    return true
}

importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
const utils = new MappingTableUtils(modificationMethod.firstNamesContainer);

function apply() {
    var firstNameIdx = ctx.indexOf("first_name");
    var oldFirstName = data[firstNameIdx];
    // specify mapping table name inside container and key for which a value should be retrieved
    var newFirstName = utils.get("first_names", oldFirstName)[0];

    data[firstNameIdx] = newFirstName;
    return true;
}

Use stored data from XDM’s Administration Database

Another example of where to store the mapping information is the usage of the XDM administration database. A mapping table will then be stored directly in this existing database and can be accessed from within modification methods. The following example illustrates how to initialize the MappingTableUtils API appropriately.

The example uses the get(tableName, key) method that has two mandatory parameters. The first parameter is the name of the mapping table inside the administration database. The second parameter is the key for which the mapped value should be retrieved.

Groovy
JavaScript

import de.ubs.xdm3.batch.modification.MappingTableUtils
// initialize the API without a parameter to use the xdm administration database as persistent memory
utils = new MappingTableUtils()

def apply() {
    def firstNameIdx = ctx.indexOf("first_name")
    def oldFirstName = data[firstNameIdx]
    def newFirstName = utils.get("first_names", oldFirstName).first()

    data[firstNameIdx] = newFirstName
    return true
}

importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
const utils = new MappingTableUtils();

function apply() {
    var firstNameIdx = ctx.indexOf("first_name");
    var oldFirstName = data[firstNameIdx];
    // specify mapping table name inside container and key for which a value should be retrieved
    var newFirstName = utils.get("first_names", oldFirstName)[0];

    data[firstNameIdx] = newFirstName;
    return true;
}

Use stored data from a local file inside the task execution directory

For specific scenarios it can be necessary to have mapping information directly inside the task execution directory, e.g. when information is only available during the execution of modification with scope source. Then the information has to be persisted and transported to when modification with scope target is executed and the information would not be available anymore. The ModificationUtils API allows that by initializing the API accordingly, as shown in the example below.

The first part would be filling the mapping table with values when modification with scope source is applied. The second part would then be using this mapping table in a modification using scope target.

Groovy
JavaScript

import de.ubs.xdm3.batch.modification.MappingTableUtils
def utils

void init() {
    def h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/"
    // initialize the API using the path to a local file in which the h2 database MappingDB is located
    utils = new MappingTableUtils("jdbc:h2:file:${h2Folder}/h2:MappingDB", "sa", "")
}

def apply() {
    // assume that this information is only available on source modification
    def sourceInfo = data[columnIndex]
    utils.put("source_info", data[ctx.indexOf("id")], sourceInfo)

    // do any modification
    return true
}

importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
var utils;

function init() {
    var h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/";
    // initialize the API using the path to a local file in which the h2 database MappingDB is located
    utils = new MappingTableUtils("jdbc:h2:file:" + h2Folder + "/h2:MappingDB", "sa", "");
}

function apply() {
    // assume that this information is only available on source modification
    var sourceInfo = data[columnIndex];
    utils.put("source_info", data[ctx.indexOf("id")], sourceInfo);

    // do any modification
    return true;
}

The previously populated mapping table can now be referenced as a data basis when modifying data with scope source.

Groovy
JavaScript

import de.ubs.xdm3.batch.modification.MappingTableUtils
def utils

void init() {
    def h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/"
    // initialize the API using the path to a local file in which the h2 database MappingDB is located
    utils = new MappingTableUtils("jdbc:h2:file:${h2Folder}/h2:MappingDB", "sa", "")
}

def apply() {
    def idVal = data[ctx.indexOf("id")]
    def newFirstName = utils.get("source_info", idVal).first()

    // do any additional modification with the retrieved value
    return true
}

importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
var utils;

function init() {
    var h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/";
    // initialize the API using the path to a local file in which the h2 database MappingDB is located
    utils = new MappingTableUtils("jdbc:h2:file:" + h2Folder + "/h2:MappingDB", "sa", "");
}

function apply() {
    var idVal = data[ctx.indexOf("id")];
    var newFirstName = utils.get("source_info", idVal)[0];

    // do any additional modification with the retrieved value
    return true;
}

Working with large numeric values

This only applies to legacy modification methods.

When using big values in modification that are larger than a typical INTEGER value, it could come to rounding issues. By using default operations like + and - the rounding issues could occur because the JavaScript environment will internally cast this to double. To be able to calculate those values use libraries like java.math.BigDecimal:

dataArray[columnIndex] = new Packages.java.math.BigDecimal(dataArray[columnIndex]).add(Packages.java.math.BigDecimal.ONE);