PII Finder Task Template

An XDM PII (Personally Identifiable Information) finder task template is designed to analyze the contents of given tables in a single database.

When a PII finder task is executed, it examines the environment by processing selection rules and exclude rules.

PII finder tasks are used to find personally identifiable information, such as first and last names, email addresses, or birth-dates. A PII finder task checks which percentage of a column’s values match the criteria defined in a matcher object. A matcher object compares each value to a regular expression or to a list of reference values.

Permissions

Task Templates have specific permissions to manage user access. The table below displays the available permissions and their purposes.

For more details about the concept of XDMs permission management refer to Permission Management.

Permission

Description

ADMINISTRATION

Specifies that the grantee can grant and revoke permissions to and from other users.

A user that creates an object automatically receives the ADMINISTRATION permission on that object.

DELETE

Specifies that the grantee can delete objects of the selected types.

DIAGNOSE

This permission controls access to diagnostic data of a task execution. The diagnostic data consists of the stages, their outputs and the batch reports.

EXECUTE

Specifies that the grantee may execute the object, or schedule the object for later execution. This permission only applies to objects that are executable, i.e. tasks, task templates, workflows, workflow templates, and data shops.

READ

Specifies that the grantee has read permission on the object. The grantee is able to see the object in lists and can see all of the object’s details, such as rules or access permissions.

In addition, the grantee can reference this object. For example, a user who has READ permission on a credential object can refer to this credential object when creating a new database connection.

WRITE

Specifies that the grantee has the permission to change the settings and attributes of an object. This also includes modifying any rule lists that might be associated with the object (for example, the selection rules of a task template).

Properties

The table below documents the available properties for task templates. The 'name' column displays the property name as it can be used in Groovy and Java Scripts.

Name

Type

Default

Description

Connection

connection

Connection

n/a

Specifies the connection that is used in the PII finder task. A connection must be set in either the task template or the task. If it is set in both, the connection set in the task is used. The connection must be usable as source connection.

Delete execution files on success

deleteExecutionFilesOnSuccess

Boolean

false

Specifies if working files of a task execution should be deleted in case of a successful task execution. This reduces the required space for a task and should therefore prevent space problems on the server when many tasks exist.

Description

description

String

n/a

An optional description for this object. The description can contain multiple lines to give more context on the configured object. The description is not used in a technical context.

Name

displayName

String

n/a

Specifies the name of the object. The name is used to display and identify the object in lists. The name can contain any valid UTF-8 characters.

Execution platform

executionPlatform

String

default

This property defines whether and on which platform the Dataflow server is being built for a task or workflow execution. The property on the template is overwritten by the property on the task or workflow.

Execute retention period

executionRetentionPeriod

String

-1

Specifies how long executions are kept for the specific task or workflow. If an execution is older than the specified retention period, the XDM service will remove it. If the executions should not be deleted automatically, the retention period must be set to -1. Otherwise, the input must start with P for a date period or with PT with a time period.

A change of the execution retention period will only affect executions created after the change.

A detailed description of the period syntax can be found here.

Fetch lobs

fetchLobs

Boolean

false

Controls whether lob columns are fetched, and the content is analyzed. The analysis of lob values might increase the processing time.

Include numeric fields

includeNumericFields

Boolean

false

A PII finder task normally only analyzes columns that can contain strings and characters. This flag can be used to specify that columns with numeric data types are also taken into account.

Performance supervision

jdbcProfiling

Boolean

false

Activate the performance supervision in the associated task template (all task types are supported). If this option is activated certain warnings in the task logs will be activated and profiling statistics for JDBC queries are logged. They are available through a task execution export in the task folder files/performanceSupervision.

Log level

logLevel

Loglevel

INFO

Controls the granularity of the log messages for the XDM task issued directly by XDM. The number of log messages decreases in the following order: Trace → Info → Warning → Error

To keep the size of the output small, set the log level to the default value Info, unless there are errors that you need to diagnose, which require more detail than that provided by log level Info. The log size is set to 10MB by default. To modify the log size, use a custom parameter with the name modifiedLogSize. The value must be the desired size in bytes. After the log size is reached, the last log entry will notify the user that the log size has been reached and a hint that the full log can be downloaded via exporting the task execution.

It is also possible to overwrite the log level in specific program parts. This can be done by adding the prefix logger.level. + package name via a task stage hook to the global property map. For example, to set a specific log level to all components of the package de.ubs.xdm.batch.subsetapply you have to set the property logger.level.de.ubs.xdm.batch.subsetapply with the wanted required log level. inside a task stage hook. Supported values are info, warning, error, verbose and debug. A example how to set a property in a task stage hook can be found here. Typically, the support team will instruct to set a package log level for detailed analysis if necessary.

Possible values are:

Error (error)

The least fine granularity. Only severe error messages that cause the task to terminate are written to the log file.

Info (info)

The most important messages about the progress of the task are written to the log file.

Trace (verbose)

Additional messages without any messages for debugging purposes are written to the log file.

Warning (warning)

The log file contains warning messages that may or may not require user attention.

Matcher

matcher

Matcher

n/a

The property contains a list of matcher that can be used in the analysis of the selected table. For every matcher it is also stored, if it is selected or not.

It is not possible to create an PII finder task without a matcher.

Probability

probability

Number

20

The minimum probability for which a matcher will generate entries in the report file, expressed as a percentage. For example, if a column in a table is analyzed with a matcher using the default value for this property, and 20 rows or more out of 100 rows in this table are matched, then these hits will be reported. If less than 20 rows are hit, no result will be reported in the report file.

Sample size

sampleSize

Number

10000

Defines how many rows per table are read by the data miner. If the table contains 1000 rows and sample size is 100, then just 100 rows are read. If Sample size is set to 2000, then all 1000 rows are read.

If the property Use absolute count is used, you can set the sample size to 0 to scan the entire table.

Tags

tags

Tag

n/a

Contains the tags that apply to this object. These tags can be used in the search to find objects quickly and effortlessly.

Task execution report

taskExecutionReport

TaskExecutionReport

n/a

This list defines which reports are created at the end of an execution and are displayed in the task execution. The list includes all defined reports to which the current user has READ access. The selected reports are generated at the end of a task or workflow execution.

If a workflow template calls one or more tasks, only the reports that were defined on the workflow template are generated. The reports that have been specified for the respective tasks are not generated at the end of the execution.

Trace data configuration

traceDataConfiguration

String

n/a

This property specifies a list with one or more values which should be traced during XDM task processing. If multiple values are specified, these should be separated by semicolons. The specified values are reported in a trace data report at the end of a task stage in which the value was detected. Reported are the rows of a table in which one of the specified values occurs in any of the row’s columns, whenever that row is part of an INSERT, UPDATE, DELETE, or SELECT action.

If a specified value is detected in a stage as part of one of the above named actions, then the row in which it occurs is reported in the trace data report at the end of that stage. In the event that a specified value is affected by a modification method, the row in which it occurs is reported in the trace data modification report.

Use absolute count

useAbsoluteCount

Boolean

false

Normally, a task shows the relative frequency of hits as a percentage. With this property XDM will show the absolute number of hits. In this case the probability field for this task is deactivated.

Actions

The available actions are described below. Some actions apply to the list, while others are specific to selected task templates.

List Actions

The following actions are available on the task templates list. If the action is disabled a tooltip will provide the exact reason for the deactivation. The required permissions are described in detail for each action.

  • Bulk Create Permission

  • Bulk Delete

  • Bulk Export

  • Create

  • List History

Create a new permission on the selected objects. Shows in the result list whether the permission could be granted on the respective object. Only these permissions can be granted that are existing on the underlying object.

A permission in the result list can have three different states, these are:

CREATED

The permission successfully granted on the object.

MERGED

The granted permission already exists on the object and merged with the new permission.

SKIPPED

The permission could not be granted, because of missing administration permission on the object.

The following permissions are required on the list:

  • ADMINISTRATION

  • READ

Delete the selected objects.

The following options are available:

Cascade

Recursively delete depending objects.

When using cascade, dependent objects are deleted first also with cascade enabled. Thus, a cascade deletion is a recursive function that deeply searches for dependent objects and deletes them first. There is only a confirmation for the first object. The dependent objects are deleted without confirmation but only when the user has the DELETE permission.

This feature is only available in development mode. More information about development mode can be found in the chapter User Settings. It should be used with caution.

An object in the result list can have two different states, these are:

DELETED

The object could be deleted.

NOT_DELETED

The object could be not deleted. This may be because the executing person does not have a delete permission on the object or the object is still referenced by others. A detailed reason can be determined with the help of the error message. If the object is still in use, these objects are also displayed.

The following permissions are required on the list:

  • DELETE

  • READ

Exports the selected objects.

YAML

Generates a YAML file containing all the object’s settings. The user has the option to download the export file, or to paste the content in the import dialog. The YAML export is particularly suitable for importing the exported objects again via the XDM UI.

ZIP

This export writes several individual YAML-files. Each YAML-file is stored in a directory according to its type. For example, when exporting a native table backup task template named 'A backup template', a YAML-file 'A backup template.yaml' is created inside the directory /TaskTemplate/native-table-backup-task-template/ of the ZIP-file. This kind of export is suitable for usage in git-repositories together with XDM’s configuration as code feature.

Related and dependent objects can optionally be included in the export. The export dialog has the following options:

Include dependent objects

Dependent objects only belong to the exported object like rules and tasks.

Include permissions

Permissions of each exported object, only when the object supports permissions. Some objects like rules don’t have permissions.

Include referenced objects

Referenced objects exist by their own and are used in the exported object like connections and environments.

Include objects that depend on referenced objects

Also include the dependent objects of the referenced objects. E.g. the rules of a modification set or the rules in an application model version.

Objects on which the user does not have READ permission are not exported. This includes dependent and referenced objects. However, the reference to an object will be exported. For example a connection object would refer to the credential, even if the user does not have READ permission on the credential. The definition of the credential object itself will not be part of the export file. This can lead to issues during the import, because the connection cannot be created without an existing credential.

The following permissions are required on the list:

  • READ

Creates a new object in the current list. Depending on the object type either a popup dialog is shown for the most important settings, or the complete object is shown in edit mode. The dialog provides the option to create the object and remain in the current list or to switch to the newly created object in edit mode to perform further changes.

The following permissions are required on the list:

  • CREATE

The history list tracks all modifications made to objects within it. A new record is added each time an object is created, edited, or deleted. A record indicates who made the change, which object was affected, and when the change was made.

For more information about the concept of the history refer to the history concepts.

The following permissions are required on the list:

  • READ

Object Actions

The following actions are available on specific task templates. In order to execute the action, the user must possess the necessary permissions for the object. The permissions required for each action are described individually. If the user does not have these permissions, the action will be disabled and the tooltip will provide the exact reason for the deactivation.

  • Check

  • Delete

  • Duplicate

  • Edit

  • Event List

  • Export

  • Object History

  • Permission Check

  • Usage

  • Uses

This action validates the object and its dependencies, reporting configuration errors that could cause issues during task or workflow execution. The validation will cascade through the child objects of the checked objects and objects referenced by them.

For instance, if an installed application of an environment is checked, the check will process the application model, the specified version, the connection, modification sets, and involved modification methods. If an object has rules, all active rules will be checked. The modeling connection and version, including their modification sets and methods, will also be checked. Deactivated objects will not be included in recursive checks, but can be checked individually if the check is executed on the object itself.

Checks often require additional information from the context of the objects being checked, such as necessary connections or custom parameter values. The check will gather information from the objects being checked and use it to perform checks on child objects. Any required additional information must be provided before the check begins. The check queries the user to provide these missing information.

Database object checks

For all rules which reference database objects such as tables, columns, etc, the check verifies that the those objects exist in the database system. If a connection can be inferred from the context, then this connection is used. If no connection is available in the context, it must be specified before the check is executed.

Connection checks

For objects which configure access to external systems, such as connections or storage locations, the configuration check verifies that access can be established using the given credentials. Furthermore, additional operations on database connections are performed to check whether the credential user has the necessary authorization to access relevant database objects. In particular, the credential user’s permission to read source tables and write to target tables is verified. Similarly, for storage locations the check verifies that the credential user has permission to write to the working directory.

Code checks

For all entities containing code segments, such as modification methods or condition scripts, the syntax for the code is checked. This does not check, however, whether at run time all necessary variables are likely to be available.

The following permissions are required:

  • READ

Delete the object. If the object is still used by another entity, an error message is displayed, and the object is not deleted. The delete operation must be confirmed in a separate popup.

The following options are available:

Cascade

Recursively delete depending objects.

When using cascade, dependent objects are deleted first also with cascade enabled. Thus, a cascade deletion is a recursive function that deeply searches for dependent objects and deletes them first. There is only a confirmation for the first object. The dependent objects are deleted without confirmation but only when the user has the DELETE permission.

This feature is only available in development mode. More information about development mode can be found in the chapter User Settings. It should be used with caution.

The following permissions are required:

  • DELETE

  • READ

Will create an exact copy of the current object with a different display name in the same list. Users can decide whether they want to copy child objects like rules, permissions or tasks. It is only possible to select complete classes of objects and not to select individual child objects. Copied child-objects will preserve their display name. The default is to copy all child objects.

The following permissions are required:

  • CREATE

  • READ

Opens the current entity in edit mode.

The following permissions are required:

  • READ

  • WRITE

This list shows all registered events for the object. It includes events that are specific to the object, or for that type.

The following permissions are required:

  • READ

This action allows to export XDM objects in different formats in order to import them via export or CasC in another environment.

Refer to configuration of export for more information.

Related and dependent objects can optionally be included in the export. The export dialog has the following options:

Include dependent objects

Dependent objects only belong to the exported object like rules and tasks.

Include permissions

Permissions of each exported object, only when the object supports permissions. Some objects like rules don’t have permissions.

Include referenced objects

Referenced objects exist by their own and are used in the exported object like connections and environments.

Include objects that depend on referenced objects

Also include the dependent objects of the referenced objects. E.g. the rules of a modification set or the rules in an application model version.

Include implicit created objects

Implicit created objects are tasks or workflows which were automatically created for execution. These objects won’t be exported by default, but can be included by setting this flag. When exporting implicit objects, make sure that the Include dependent objects flag is also enabled.

Objects on which the user does not have READ permission are not exported. This includes dependent and referenced objects. However, the reference to an object will be exported.

For example a connection object would refer to the credential, even if the user does not have READ permission on the credential. The definition of the credential object itself will not be part of the export file. This can lead to issues during the import, because the connection cannot be created without an existing credential.

The following permissions are required:

  • READ

The history displays all changes made to the respective XDM object, including any changes made to its rules.

Each change record includes information about the operation performed (e.g. CREATE, UPDATE, DELETE), the timestamp, and the user responsible for the change.

For more information about the concept of the history refer to the history concepts.

The following permissions are required:

  • READ

The check verifies that the current user has the authorization to access the object. The check can also be performed for a specific user or role, if needed. By default, the check is performed using the current user’s credentials. It is then applied to child and referenced objects.

Additional permission checks are applied when these can be inferred from the context in which the check was started. For example, if the check is performed on a table copy task, the referenced source and target connections are checked to determine whether the given identity has source or target usage permission respectively.

The following permissions are required:

  • READ

The Usage List shows all objects that refer to the current object. It provides an overview of the relationships and makes it easy to track these relationships.

The following permissions are required:

  • READ

The Uses List shows all objects that the current object uses. It provides an overview of the relationships and makes it easy to track these relationships.

The following permissions are required:

  • READ

Child objects

The following child objects can be specified on task templates:

License Options

This object is available if the following license option is enabled:

  • TASK_TYPE:ANALYSE_TASK

The object is also available if the license package is at least: STANDARD.

Prerequisites

Before an XDM PII finder task template can be completely configured you have to set up a connection and credential. Furthermore, at least one matcher must be defined. XDM contains several pre-defined matchers, but it is also possible to define your own.

Required permissions

Users to execute a task

  • Source database user, defined in the credential of the source connection

Authorization for database user

  • SELECT authority for the database catalog, and

  • SELECT authority for the tables to be copied.

  • DB2 LUW only: System monitor authority (SYSMON) to obtain information from the database catalog (optional, but recommended),

  • MS SQL Server only: The user must be a database user, a domain user is not sufficient.

Executing a task

The user needs the following permissions to successfully start the task execution:

  • READ permission on the task and task template,

  • EXECUTE permission on the task,

  • SOURCE_USAGE on the connection,

  • READ permission on the credential used in the connection,

  • READ permission on any matcher linked in the task template, and

  • READ permission on any custom parameter used in the task template.

Task procedure

An XDM PII finder task contains two stages. Before executing Stage 1, XDM will create all necessary files and folders in the task directory during the Tailoring process.

Tailoring

All necessary files and folders are created in the task directory.
The following reports are generated by this stage:

Stage 1

The structure of the tables in the environment is cached in task files. XDM collects information about all tables that match at least one selection rule. Tables and columns that match any exclude rule will not be in the task’s selection set.
The following reports are generated by this stage:

Stage 2

The data of the selected tables is analyzed. XDM fetches the data of the tables and compares it with the selected matcher.
The following reports are generated by this stage:

Setting up a PII finder task template and task

This section describes the most import settings for an XDM PII finder task template.