Modification Method Script Examples
The following chapter shows examples how to implement some general use cases of Modification Methods either in Groovy or JavaScript code.
The detailed documentation of the modification method code can be found in the Modification Method code reference.
More documentation on modification methods itself as a configuration object can be found Modification Method object reference.
Example for a simple modification method
The following example demonstrates how to create a fictional e-mail address from a column that contains the first name of a person, and a column that contains the last name of a person. It is assumed that this modification method is used in a modification rule that contains the three parameters firstNameColumn, lastNameColumn, and domain, where firstNameColumn is the name of the table column that contains the first name of the person, lastNameColumn is the name of the table column that contains the last name of the person, and domain is a syntactically valid domain name. The modification rule itself is assumed to be applied to a column that is supposed to store the generated e-mail addresses. The position of this column inside the table is stored in the variable columnIndex.
-
Groovy
-
JavaScript
def apply() {
def firstNameIdx = ctx.indexOf(modificationMethod.firstNameColumn)
def lastNameIdx = ctx.indexOf(modificationMethod.lastNameColumn)
def email = "${data[firstNameIdx]}.${data[lastNameIdx]}@${modificationMethod.domain}"
data[columnIndex] = email
return true
}
function apply() {
var firstNameIdx = ctx.indexOf(modificationMethod.firstNameColumn);
var lastNameIdx = ctx.indexOf(modificationMethod.lastNameColumn);
var email = data[firstNameIdx] + "." +
data[lastNameIdx] + "@" +
modificationMethod.domain;
data[columnIndex] = email;
return true;
}
Line 1 determines the position of the first name column inside the table.
Line 2 determines the position of the last name column inside the table.
Lines 3 create an e-mail address by combining the first name, last name, and the fictional domain name.
Line 4 writes the generated e-mail address into the data array. The
position inside the array is determined by columnIndex, which points to
the table column that is supposed to store the e-mail address.
Each apply() method returns a boolean value, weather to keep or retain the current row in the further processing.
Filtering rows using modification methods
With XDM, it is possible to define which rows should be filtered during a modification process. Filtered rows
will not be part of the modification process. To filter rows, you have to specify the return value
of the applied modification method. Returning true means that the row will not be filtered, whereas
returning false means that it will be filtered, as also described in the
object reference for modification method.
As an example, we assume that we have a table structure containing a table that consists of information of employees. For a specific test, we only want to modify female employees that are born before 1980. To achieve this, we have to filter every row, that does not match the requirements. Through this, these rows will not be part of the copy.
If the requirements match, the first name of the person should be changed to the fix value "Jane". The corresponding modification code could then look like this:
-
Groovy
-
JavaScript
def apply() {
def genderColumnIndex = ctx.indexOf('gender')
def gender = data[genderColumnIndex]
def birthDateColumnIndex = ctx.indexOf('birth_date')
def birthDate = data[birthDateColumnIndex].toString()
// the actual modification
data[ctx.indexOf('first_name')] = "Jane"
return gender == 'F' && birthDate < "1980-01-01"
}
function apply() {
var genderColumnIndex = ctx.indexOf('gender');
var gender = data[genderColumnIndex];
var birthDateColumnIndex = ctx.indexOf('birth_date');
var birthDate = data[birthDateColumnIndex].toString();
// the actual modification
data[ctx.indexOf('first_name')] = "Jane";
return gender == 'F' && birthDate < "1980-01-01";
}
To illustrate the impact of filtering rows, a table comparing input and output values will look as follows. Matching rows are displayed in yellow in the source table.
Logging and Debugging
If there is a case that something does not work correctly or is in development, print shows an output.
This example shows how to debug a modification method for development cases with ctx.print.
It is also possible to use the native print() or println() methods of Groovy or JavaScript. These will
write to the task log independent of the tasks log level. Then each message will be prefixed with the current
modification methods name and the table the method is applied to.
The following example will print the modified value for the first 30 rows of the table the modification method is applied to.
The doModificationMethod in this example is just a stub to demonstrate the usage the ctx.print methods
-
Groovy
-
JavaScript
def rowCount = 0
def logRowLimit = 30
def doModification(input) {
// The masking algorithm, method, or something static that can be evaluated.
return "example ${input}"
}
def apply() {
if(rowCount < logRowLimit) {
String modifiedValue = doModification(data[columnIndex])
// will only be visible, if task is running with log level TRACE
ctx.print("FINE","Original value: ${data[columnIndex]} - Modified value: ${modifiedValue}")
// will be visible, if task is running with log level INFO
ctx.print("Modified value: ${modifiedValue}")
// will be visible in the task stage log on INFO level, prefixed with the method name
println("Print using native method")
rowCount += 1
}
return true
}
var rowCount = 0;
var logRowLimit = 30;
function doModification(input) {
return "example " + input;
}
function apply() {
if(rowCount < logRowLimit)
{
var masked = doModification(data[columnIndex]);
// will only be visible, if task is running with log level TRACE
ctx.print("FINE","Original value: " + data[columnIndex] + " - Modified value: " + modifiedValue);
// will be visible, if task is running with log level INFO
ctx.print("Modified value: ${modifiedValue}");
// will be visible in the task stage log on INFO level, prefixed with the method name
println("Print using native method");
rowCount++;
}
return true;
}
Also, it is useful for bigger modification configurations, to set up a library file with a debugging log method, that can be configured by a custom parameter in the task configuration. This example assumes there is a custom parameter definition modificationDebugLog of type Boolean,
that is configured in the respective task template. If the utility function is used in all modification methods, the debug logging can be enabled and disabled at once.
-
Groovy
-
JavaScript
def apply() {
debugLog("Current value ${data[columnIndex]}")
return true
}
// ...
// inside a library file
// ...
def debugLog(def message) {
if(ctx.getTaskRuntimeProperty('modificationDebugLog') == 'true') {
ctx.print(message)
}
}
function apply() {
debugLog("Current value " + data[columnIndex]});
return true;
}
// ...
// inside a library file
// ...
function debugLog(var message) {
if (ctx.getTaskRuntimeProperty('modificationDebugLog') == 'true') {
ctx.print(message);
}
}
Multiply rows using modification
This is an example of how to multiply rows by stashing and unstashing rows.
With this it is possible to add new rows into the process. There are two helper methods to
support the multiplication of rows: ctx.stashRow(…) and ctx.unstashRows().
The following example explains the usage of these methods, implemented in Groovy and JavaScript.
We assume there is a table consisting of personal data, including the columns first_name and last_name.
We want to do two things with this data.
-
Add the letter 'X' on every first name in the table
-
Replace every second last name with the constant string "Watson"
For this, we use the helper methods stashRow(…) and unstashRows(). First, we append the letter 'X' for every first name. After this, we do a deep copy of the modified row to make sure that it is not changed any further.
Now we can define the second modification part by setting the last name to "Watson". We stash this row as well.
To get both of the stashed rows back, we need to call unstashRows() and put the returned rows into the variable
replacementData. This variable is used by XDM as the output of the processing of the modification method, if it is
not empty.
-
Groovy
-
JavaScript
def apply() {
def firstNameColIndex = ctx.indexOf("first_name")
data[firstNameColIndex] += 'X'
ctx.stashRow(data)
def lastNameColIndex = ctx.indexOf("last_name")
data[lastNameColIndex] = "Watson"
ctx.stashRow(data)
replacementData.addAll(ctx.unstashRows())
return true
}
function apply() {
var firstNameColIndex = ctx.indexOf("first_name");
data[firstNameColIndex] += 'X';
ctx.stashRow(data);
var lastNameColIndex = ctx.indexOf("last_name");
data[lastNameColIndex] = "Watson";
ctx.stashRow(data);
replacementData.addAll(ctx.unstashRows());
return true;
}
| Multiplying rows may lead to problems concerning primary key constraints, if the primary key values are not modified as well. |
The following tables display the comparison of source and target tables.
Using Java libraries in modification scripts
It is possible to use external JAVA libraries to be called from the modification methods scripts. This example will show how to add and use the Java Faker library in a modification method.
Add a Java Library by creating a new XDM File:
-
Enter a display name (e.g. Java Faker)
-
Set File type to Java libraries
-
Leave delimiter as it is, it will not be evaluated
-
Upload the JAR into the XDM file (Download the JAR file here)
The Java Faker library itself has a dependency on SnakeYML. These dependencies will not be resolved automatically, but all dependent other libraries need to be added to the runtime class path of the modification method.
Also add a Java Library as XDM File for the dependent library:
-
Enter a display name (SnakeYML)
-
Set File type to Java libraries
-
Leave delimiter as it is, it will not be evaluated
-
Upload the JAR into the XDM file (Download the JAR file here)
With the libraries added to XDM, they can be used in a modification method. In the modification method, under libraries add the Java Faker and the SnakeYML libraries.
In the code, the libraries can be used as follows:
-
Groovy
-
JavaScript
faker = new com.github.javafaker.Faker(java.util.Locale.GERMAN)
def apply() {
data[columnIndex] = faker.name().lastName()
return true
}
faker = new com.github.javafaker.Faker(java.util.Locale.GERMAN);
function apply() {
data[columnIndex] = faker.name().lastName();
return true;
}
To illustrate the impact of the libraries here is an example, where the modification method was used on the last_name column.
Calling a REST API with modification
It is possible to call an external service from modification methods via a REST API.
This example calls a public REST API genderize.io, which gives an evaluation on the gender of a given name.
-
Groovy
-
JavaScript
import groovy.json.JsonSlurper
import org.springframework.http.HttpHeaders
import org.springframework.http.HttpEntity
import org.springframework.http.HttpMethod
import org.springframework.web.client.RestTemplate
def apply() {
var firstname = data[columnIndex]
def body = restCall("https://api.genderize.io/?name=${firstname}", "GET", null,
[
"Content-Type": "application/json",
"Accept": "application/json",
]
)
print(body)
var result = new JsonSlurper().parseText(body)
print "The name ${result.name} is with a probability of ${result.probability} of gender ${result.gender}"
data[columnIndex] = result.gender
return true
}
def restCall(url, method, payload, headersMap) {
def headers = new HttpHeaders()
headersMap.each { entry ->
headers.set(entry.key, entry.value)
}
def entity = new HttpEntity(payload, headers)
println "Call REST Endpoint $url method: $method"
def restTemplate = buildRestTemplate()
def response = restTemplate.exchange(url, HttpMethod.valueOf(method), entity, String.class)
return response.getBody()
}
def buildRestTemplate() {
return new RestTemplate()
}
importClass(org.springframework.http.HttpHeaders)
importClass(org.springframework.http.HttpEntity)
importClass(org.springframework.http.HttpMethod)
importClass(org.springframework.web.client.RestTemplate)
function apply() {
var firstname = data[columnIndex];
var body = restCall("https://api.genderize.io/?name=" + firstname, "GET", null,
{
"Content-Type": "application/json",
"Accept": "application/json",
}
);
print("Raw REST result:" + body);
var result = JSON.parse(body);
print("The name " + result.name + " is with a probability of " + result.probability + " of gender " + result.gender);
data[columnIndex] = result.gender;
return true;
}
function restCall(url, method, payload, headersMap) {
var headers = new HttpHeaders();
for(var key in headersMap) {
headers.set(key, headersMap[key]);
}
var entity = new HttpEntity(payload, headers);
print("Call REST Endpoint " + url + " with method: " + method);
var restTemplate = buildRestTemplate();
var response = restTemplate.exchange(url, HttpMethod.valueOf(method), entity, Packages.java.lang.String.class);
return response.getBody();
}
function buildRestTemplate() {
return new RestTemplate();
}
Find more examples for REST API calls in the common examples.
Executing SQL
Executing SQL using Groovy or JavaScript
In modification methods the typical script methods to use JDBC for querying data can be used as described in common script examples for executing SQL.
| Implementing a query in the apply() method for every modified row will lead to bad modification performance. If necessary to have an information for a database at place, consider writing a hook, that extracts the necessary information into a mapping table container and lookup the information for the modification method from there. |
An example how to use a SQL query in combination with the init() method:
-
Groovy
-
JavaScript
import groovy.sql.Sql
def maxValue = 0
void init() {
def query = 'SELECT MAX(emp_no) FROM production.employees'
Sql sql = new Sql(modificationMethod.connection.jdbcConnection)
maxValue = sql.firstRow(query)[0]
}
def apply() {
// change the id for all IDs that are within the existing range of IDs
if (data[columnIndex] <= maxValue) {
data[columnIndex] = data[columnIndex] + maxValue
}
return true
}
importPackage(java.sql);
var maxValue = 0;
function init() {
var query = 'SELECT MAX(emp_no) FROM production.employees';
var prepareStatement = modificationMethod.connection.jdbcConnection.prepareStatement(query);
var resultSet = prepareStatement.executeQuery();
while (resultSet.next()) {
// returns only one row
maxValue = resultSet.getString(1);
}
resultSet.close();
prepareStatement.close();
}
function apply() {
// change the id for all IDs that are within the existing range of IDs
if (data[columnIndex] <= maxValue) {
data[columnIndex] = data[columnIndex] + maxValue;
}
return true;
}
Use persistent memory in modification methods
Within XDM, you have the possibility to use a persistent memory to store information into and read information from. There are three different options of using the ModificationUtils and MappingTableUtils API in a modification method, all consisting of different types of data storage. These options are described below, containing one example each. The example itself always shows the same doing but with different ways to gain mapping information from. The example uses describes a first name mapping.
An overview of the available API in modification methods can be found in the corresponding object reference.
Use stored data from a Mapping Table Container
One possibility to use stored data from a mapping table is to read information from a defined Mapping Table Container. For this, the MappingTableUtils API should be used. As a precondition, a Mapping Table Container has to be defined and filled with data.
| Even if it is possible to store data in the Mapping Table Container from within a modification method, this is not recommended, because it would cause lock issues when writing into the container. We recommend to populate the contents of a mapping table container by a task stage hook. |
For having access to the mapping table container, that for this example is called firstNamesContainer, the MappingTableUtils
have to be initialized appropriately and can then be used in the further processing of the data to be modified. The name of
the MappingTableContainer is internally interpreted as a Connection object.
-
Groovy
-
JavaScript
import de.ubs.xdm3.batch.modification.MappingTableUtils
utils = new MappingTableUtils(modificationMethod.firstNamesContainer)
def apply() {
def firstNameIdx = ctx.indexOf("first_name")
def oldFirstName = data[firstNameIdx]
// specify mapping table name inside container and key for which a value should be retrieved
def newFirstName = utils.get("first_names", oldFirstName).first()
data[firstNameIdx] = newFirstName
return true
}
importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
const utils = new MappingTableUtils(modificationMethod.firstNamesContainer);
function apply() {
var firstNameIdx = ctx.indexOf("first_name");
var oldFirstName = data[firstNameIdx];
// specify mapping table name inside container and key for which a value should be retrieved
var newFirstName = utils.get("first_names", oldFirstName)[0];
data[firstNameIdx] = newFirstName;
return true;
}
Use stored data from XDM’s Administration Database
Another example of where to store the mapping information is the usage of the XDM administration database. A mapping table will then be stored directly in this existing database and can be accessed from within modification methods. The following example illustrates how to initialize the MappingTableUtils API appropriately.
The example uses the get(tableName, key) method that has two mandatory parameters. The first parameter is the name of the mapping
table inside the administration database. The second parameter is the key for which the mapped value should be retrieved.
-
Groovy
-
JavaScript
import de.ubs.xdm3.batch.modification.MappingTableUtils
// initialize the API without a parameter to use the xdm administration database as persistent memory
utils = new MappingTableUtils()
def apply() {
def firstNameIdx = ctx.indexOf("first_name")
def oldFirstName = data[firstNameIdx]
def newFirstName = utils.get("first_names", oldFirstName).first()
data[firstNameIdx] = newFirstName
return true
}
importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
const utils = new MappingTableUtils();
function apply() {
var firstNameIdx = ctx.indexOf("first_name");
var oldFirstName = data[firstNameIdx];
// specify mapping table name inside container and key for which a value should be retrieved
var newFirstName = utils.get("first_names", oldFirstName)[0];
data[firstNameIdx] = newFirstName;
return true;
}
Use stored data from a local file inside the task execution directory
For specific scenarios it can be necessary to have mapping information directly inside the task execution directory, e.g. when information is only available during the execution of modification with scope source. Then the information has to be persisted and transported to when modification with scope target is executed and the information would not be available anymore. The ModificationUtils API allows that by initializing the API accordingly, as shown in the example below.
The first part would be filling the mapping table with values when modification with scope source is applied. The second part would then be using this mapping table in a modification using scope target.
-
Groovy
-
JavaScript
import de.ubs.xdm3.batch.modification.MappingTableUtils
def utils
void init() {
def h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/"
// initialize the API using the path to a local file in which the h2 database MappingDB is located
utils = new MappingTableUtils("jdbc:h2:file:${h2Folder}/h2:MappingDB", "sa", "")
}
def apply() {
// assume that this information is only available on source modification
def sourceInfo = data[columnIndex]
utils.put("source_info", data[ctx.indexOf("id")], sourceInfo)
// do any modification
return true
}
importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
var utils;
function init() {
var h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/";
// initialize the API using the path to a local file in which the h2 database MappingDB is located
utils = new MappingTableUtils("jdbc:h2:file:" + h2Folder + "/h2:MappingDB", "sa", "");
}
function apply() {
// assume that this information is only available on source modification
var sourceInfo = data[columnIndex];
utils.put("source_info", data[ctx.indexOf("id")], sourceInfo);
// do any modification
return true;
}
The previously populated mapping table can now be referenced as a data basis when modifying data with scope source.
-
Groovy
-
JavaScript
import de.ubs.xdm3.batch.modification.MappingTableUtils
def utils
void init() {
def h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/"
// initialize the API using the path to a local file in which the h2 database MappingDB is located
utils = new MappingTableUtils("jdbc:h2:file:${h2Folder}/h2:MappingDB", "sa", "")
}
def apply() {
def idVal = data[ctx.indexOf("id")]
def newFirstName = utils.get("source_info", idVal).first()
// do any additional modification with the retrieved value
return true
}
importClass(Packages.de.ubs.xdm3.batch.modification.MappingTableUtils);
var utils;
function init() {
var h2Folder = ctx.getTaskRuntimeProperty("root_dir") + "/files/";
// initialize the API using the path to a local file in which the h2 database MappingDB is located
utils = new MappingTableUtils("jdbc:h2:file:" + h2Folder + "/h2:MappingDB", "sa", "");
}
function apply() {
var idVal = data[ctx.indexOf("id")];
var newFirstName = utils.get("source_info", idVal)[0];
// do any additional modification with the retrieved value
return true;
}
Working with large numeric values
| This only applies to legacy modification methods. |
When using big values in modification that are larger than a typical INTEGER value, it could come to rounding issues.
By using default operations like + and - the rounding issues could occur because
the JavaScript environment will internally cast this to double. To be able to calculate those values use
libraries like java.math.BigDecimal:
dataArray[columnIndex] = new Packages.java.math.BigDecimal(dataArray[columnIndex]).add(Packages.java.math.BigDecimal.ONE);