PII Finder Task Template
An XDM PII (Personally Identifiable Information) finder task template is designed to analyze the contents of given tables in a single database.
When a PII finder task is executed, it examines the environment by processing selection rules and exclude rules.
PII finder tasks are used to find personally identifiable information, such as first and last names, email addresses, or birth dates. A PII finder task checks what percentage of a column’s values match the criteria defined in a matcher object. A matcher object compares each value to a regular expression or to a list of reference values.
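The matching idea described above can be illustrated with a minimal sketch. It is not XDM's implementation: the function name `match_percentage`, the use of a full-string regex match, and the case-insensitive comparison against reference values are all assumptions made for illustration.

```python
import re


def match_percentage(values, pattern=None, reference_values=None):
    """Return the percentage of values accepted by a regex or a reference list.

    Hypothetical helper: full-string matching and case-insensitive
    reference comparison are assumptions, not documented XDM behaviour.
    """
    if not values:
        return 0.0
    if pattern is not None:
        regex = re.compile(pattern)
        hits = sum(1 for value in values if regex.fullmatch(value))
    else:
        allowed = {ref.lower() for ref in (reference_values or ())}
        hits = sum(1 for value in values if value.lower() in allowed)
    return 100.0 * hits / len(values)


# Illustrative column content and a deliberately simple e-mail pattern.
emails = ["a@example.com", "b@example.org", "n/a", "c@example.net"]
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

print(match_percentage(emails, pattern=EMAIL_PATTERN))
print(match_percentage(["Anna", "Ben", "X17", "Zed"], reference_values=["anna", "ben"]))
```

Three of the four sample values look like e-mail addresses, so the first call yields 75.0. The second call uses a reference list instead of a regular expression.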
Permissions
Task templates have specific permissions to manage user access. The table below displays the available permissions and their purposes.
For more details about the concept of XDM’s permission management, refer to Permission Management.
| Permission | Description |
|---|---|
| ADMINISTRATION | Specifies that the grantee can grant and revoke permissions to and from other users. A user that creates an object automatically receives the |
| DELETE | Specifies that the grantee can delete objects of the selected types. |
| DIAGNOSE | This permission controls access to diagnostic data of a task execution. The diagnostic data consists of the stages, their outputs and the batch reports. |
| EXECUTE | Specifies that the grantee may execute the object, or schedule the object for later execution. This permission only applies to objects that are executable, i.e. tasks, task templates, workflows, workflow templates, and data shops. |
| READ | Specifies that the grantee has read permission on the object. The grantee is able to see the object in lists and can see all of the object’s details, such as rules or access permissions. In addition, the grantee can reference this object. For example, a user who has |
| WRITE | Specifies that the grantee has the permission to change the settings and attributes of an object. This also includes modifying any rule lists that might be associated with the object (for example, the selection rules of a task template). |
Properties
The table below documents the available properties for task templates. The 'name' column displays the property name as it can be used in Groovy and Java Scripts.
| Name | Type | Default | Description |
|---|---|---|---|
| connection | Connection | n/a | Specifies the connection that is used in the PII finder task. A connection must be set in either the task template or the task. If it is set in both, the connection set in the task is used. The connection must be usable as a source connection. |
| deleteExecutionFilesOnSuccess (Delete execution files on success) | Boolean | false | Specifies whether the working files of a task execution are deleted after a successful execution. This reduces the disk space a task requires and helps to prevent space problems on the server when many tasks exist. |
| description | String | n/a | An optional description for this object. The description can contain multiple lines to give more context on the configured object. The description is not used in a technical context. |
| displayName | String | n/a | Specifies the name of the object. The name is used to display and identify the object in lists. The name can contain any valid UTF-8 characters. |
| executionPlatform | String | default | Defines whether and on which platform the Dataflow server is built for a task or workflow execution. The property on the template is overwritten by the property on the task or workflow. |
| executionRetentionPeriod | String | -1 | Specifies how long executions are kept for the specific task or workflow. If an execution is older than the specified retention period, the XDM service removes it. If executions should not be deleted automatically, the retention period must be set to -1. Otherwise, the input must start with … A detailed description of the period syntax can be found here. |
| fetchLobs | Boolean | false | Controls whether LOB columns are fetched and their content is analyzed. Analyzing LOB values might increase the processing time. |
| includeNumericFields | Boolean | false | A PII finder task normally only analyzes columns that can contain strings and characters. This flag specifies that columns with numeric data types are also taken into account. |
| jdbcProfiling | Boolean | false | Activates performance supervision for the associated task template (all task types are supported). If this option is activated, certain warnings are enabled in the task logs and profiling statistics for JDBC queries are logged. They are available through a task execution export in the task folder |
| logLevel | Loglevel | INFO | Controls the granularity of the log messages that XDM itself issues for the task. The number of log messages decreases in the following order: Trace → Info → Warning → Error. It is also possible to overwrite the log level in specific program parts. This can be done by adding the prefix … Possible values are: |
| matcher | Matcher | n/a | Contains the list of matchers that can be used to analyze the selected tables. For each matcher, the property also stores whether the matcher is selected. |
| probability | Number | 20 | The minimum probability for which a matcher generates entries in the report file, expressed as a percentage. For example, if a column in a table is analyzed with a matcher using the default value for this property, and 20 or more out of 100 rows in this table are matched, these hits are reported. If fewer than 20 rows are hit, no result is written to the report file. |
| sampleSize | Number | 10000 | Defines how many rows per table are read by the data miner. If the table contains 1000 rows and the sample size is 100, only 100 rows are read. If the sample size is set to 2000, all 1000 rows are read. If the property |
| tags | Tag | n/a | Contains the tags that apply to this object. These tags can be used in the search to find objects quickly. |
| taskExecutionReport | TaskExecutionReport | n/a | Defines which reports are created at the end of an execution and displayed in the task execution. The list includes all defined reports to which the current user has READ access. The selected reports are generated at the end of a task or workflow execution. |
| traceDataConfiguration | String | n/a | Specifies a list of one or more values that should be traced during XDM task processing. Multiple values should be separated by semicolons. The trace data report lists the rows of a table in which one of the specified values occurs in any of the row’s columns, whenever that row is part of an INSERT, UPDATE, DELETE, or SELECT action. If a specified value is detected in a stage as part of one of these actions, the row in which it occurs is reported in the trace data report at the end of that stage. If a specified value is affected by a modification method, the row in which it occurs is reported in the trace data modification report. |
| useAbsoluteCount | Boolean | false | Normally, a task shows the relative frequency of hits as a percentage. With this property, XDM shows the absolute number of hits instead. In this case, the probability field for this task is deactivated. |
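How `sampleSize` and `probability` interact can be sketched as follows. This is only an illustration of the arithmetic described for these two properties. The function names `rows_read` and `is_reported` are hypothetical, and the assumption that the percentage is computed over the rows actually read is not confirmed by the text.

```python
def rows_read(table_rows, sample_size=10000):
    """Number of rows read for a table: never more than the table contains."""
    return min(table_rows, sample_size)


def is_reported(hits, rows_analyzed, probability=20):
    """True if the share of matching rows reaches the minimum probability (in percent).

    Assumption: the share is calculated over the rows that were analyzed.
    """
    if rows_analyzed == 0:
        return False
    return 100.0 * hits / rows_analyzed >= probability


print(rows_read(1000, sample_size=100))   # sample smaller than the table
print(rows_read(1000, sample_size=2000))  # sample larger than the table
print(is_reported(20, 100))               # exactly at the default threshold
print(is_reported(19, 100))               # just below the default threshold
```

With the defaults from the table, a column with 20 hits in 100 analyzed rows is reported, while a column with 19 hits is not.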
Actions
The available actions are described below. Some actions apply to the list, while others are specific to selected task templates.
List Actions
The following actions are available on the task templates list. If an action is disabled, a tooltip provides the exact reason for the deactivation. The required permissions are described in detail for each action.
- Bulk Create Permission
- Bulk Delete
- Bulk Export
- Create
- List History
Bulk Create Permission
Creates a new permission on the selected objects. The result list shows whether the permission could be granted on each object. Only permissions that exist on the underlying object can be granted.
A permission in the result list can have one of three states:
- CREATED: The permission was successfully granted on the object.
- MERGED: The granted permission already existed on the object and was merged with the new permission.
- SKIPPED: The permission could not be granted because the administration permission on the object is missing.
The following permissions are required on the list:
- ADMINISTRATION
- READ
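The three result states of a bulk permission grant can be pictured with the following sketch. The data model is an assumption made for illustration only: each object is represented as a dictionary that maps a grantee to a set of rights, and `bulk_grant` and `administrable` are hypothetical names.

```python
def bulk_grant(objects, grantee, rights, administrable):
    """Grant rights on several objects and report one state per object.

    objects:       {object name: {grantee: set of rights}}  (assumed model)
    administrable: names of objects on which the caller holds ADMINISTRATION
    """
    results = {}
    for name, permissions in objects.items():
        if name not in administrable:
            # Caller lacks the administration permission on this object.
            results[name] = "SKIPPED"
        elif grantee in permissions:
            # A permission for this grantee exists: merge the new rights into it.
            permissions[grantee] |= set(rights)
            results[name] = "MERGED"
        else:
            permissions[grantee] = set(rights)
            results[name] = "CREATED"
    return results


templates = {
    "Template A": {},
    "Template B": {"alice": {"READ"}},
    "Template C": {},
}
states = bulk_grant(templates, "alice", {"READ", "EXECUTE"},
                    administrable={"Template A", "Template B"})
print(states)
```

In this example the grant is created on Template A, merged into the existing permission on Template B, and skipped on Template C.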
Bulk Delete
Deletes the selected objects.
The following options are available:
- Cascade: Recursively deletes dependent objects.
When cascade is used, dependent objects are deleted first, also with cascade enabled. A cascade deletion is therefore a recursive operation that searches deeply for dependent objects and deletes them first. Only the first object requires confirmation. The dependent objects are deleted without confirmation, but only if the user has the DELETE permission. This feature is only available in development mode and should be used with caution. More information about development mode can be found in the chapter User Settings.
An object in the result list can have one of two states:
- DELETED: The object was deleted.
- NOT_DELETED: The object could not be deleted. This may be because the user executing the action does not have the DELETE permission on the object, or because the object is still referenced by other objects. The error message gives the detailed reason. If the object is still in use, the objects that use it are also displayed.
The following permissions are required on the list:
- DELETE
- READ
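The recursive nature of a cascade deletion can be sketched as a depth-first traversal. The sketch below is an illustration under assumptions, not XDM's algorithm: the dependency map, the object names, and the rule that an object stays NOT_DELETED while any of its dependents could not be deleted are hypothetical.

```python
def cascade_delete(name, dependents, can_delete, result=None):
    """Delete the dependents of `name` first, then `name` itself.

    dependents: {object: [objects that depend on it]}   (assumed model)
    can_delete: objects on which the user holds the DELETE permission
    Returns a dict mapping each visited object to DELETED or NOT_DELETED.
    """
    result = {} if result is None else result
    if name in result:
        return result  # already handled; also guards against cycles
    result[name] = "NOT_DELETED"  # provisional state while dependents are processed
    children = dependents.get(name, [])
    for child in children:
        cascade_delete(child, dependents, can_delete, result)
    # Assumption: an object can only go once all of its dependents are gone.
    if name in can_delete and all(result[child] == "DELETED" for child in children):
        result[name] = "DELETED"
    return result


dependents = {"template": ["task 1", "task 2"], "task 1": ["schedule"]}

all_allowed = cascade_delete("template", dependents,
                             can_delete={"template", "task 1", "task 2", "schedule"})
partly_allowed = cascade_delete("template", dependents,
                                can_delete={"template", "task 1", "task 2"})
print(all_allowed)
print(partly_allowed)
```

In the second call the user lacks DELETE on the hypothetical `schedule` object, so `schedule`, `task 1`, and `template` remain, while `task 2` is still removed.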
Bulk Export
Exports the selected objects.
- YAML: Generates a YAML file containing all of the object’s settings. The user has the option to download the export file or to paste the content into the import dialog. The YAML export is particularly suitable for importing the exported objects again via the XDM UI.
- ZIP: Writes several individual YAML files. Each YAML file is stored in a directory according to its type. For example, when exporting a native table backup task template named 'A backup template', a YAML file 'A backup template.yaml' is created inside the directory /TaskTemplate/native-table-backup-task-template/ of the ZIP file. This kind of export is suitable for use in Git repositories together with XDM’s configuration as code feature.
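The directory layout of the ZIP export can be reproduced for the example given above with a few lines of code. The helper `zip_entry_path` is hypothetical. Only the single path from the example is taken from the text. How XDM derives directory names for other object types, or handles special characters in display names, is not covered here.

```python
from pathlib import PurePosixPath


def zip_entry_path(category, type_directory, display_name):
    """Build the path of one exported YAML file inside the ZIP archive.

    Hypothetical helper: it only concatenates the parts named in the example.
    """
    return PurePosixPath("/", category, type_directory, f"{display_name}.yaml")


example = zip_entry_path("TaskTemplate",
                         "native-table-backup-task-template",
                         "A backup template")
print(example)
```

A layout like this keeps one file per object, which is what makes the export convenient to track in a version control system.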
Related and dependent objects can optionally be included in the export. The export dialog has the following options:
- Include dependent objects: Dependent objects belong only to the exported object, such as rules and tasks.
- Include permissions: The permissions of each exported object, provided that the object supports permissions. Some objects, such as rules, do not have permissions.
- Include referenced objects: Referenced objects exist on their own and are used by the exported object, such as connections and environments.
- Include objects that depend on referenced objects: Also includes the dependent objects of the referenced objects, e.g. the rules of a modification set or the rules in an application model version.
Objects on which the user does not have READ permission are not exported. This includes dependent and referenced objects. However, the reference to such an object is exported. For example, a connection object would refer to the credential even if the user does not have READ permission on the credential. The definition of the credential object itself is not part of the export file. This can lead to issues during the import, because the connection cannot be created without an existing credential.
The following permissions are required on the list:
- READ
Create
Creates a new object in the current list. Depending on the object type, either a popup dialog with the most important settings is shown, or the complete object is shown in edit mode. The dialog provides the option to create the object and remain in the current list, or to switch to the newly created object in edit mode to make further changes.
The following permissions are required on the list:
- CREATE
List History
The history list tracks all modifications made to the objects in the list. A new record is added each time an object is created, edited, or deleted. A record indicates who made the change, which object was affected, and when the change was made.
For more information about the concept of the history, refer to the history concepts.
The following permissions are required on the list:
- READ
Object Actions
The following actions are available on specific task templates. In order to execute the action, the user must possess the necessary permissions for the object. The permissions required for each action are described individually. If the user does not have these permissions, the action will be disabled and the tooltip will provide the exact reason for the deactivation.
- Check
- Delete
- Duplicate
- Edit
- Event List
- Export
- Object History
- Permission Check
- Usage
- Uses
Check
This action validates the object and its dependencies and reports configuration errors that could cause issues during task or workflow execution. The validation cascades through the child objects of the checked objects and the objects referenced by them.
For instance, if an installed application of an environment is checked, the check will process the application model, the specified version, the connection, modification sets, and involved modification methods. If an object has rules, all active rules will be checked. The modeling connection and version, including their modification sets and methods, will also be checked. Deactivated objects will not be included in recursive checks, but can be checked individually if the check is executed on the object itself.
Checks often require additional information from the context of the objects being checked, such as necessary connections or custom parameter values. The check gathers information from the objects being checked and uses it to perform checks on child objects. Any required additional information must be provided before the check begins. The check prompts the user to provide this missing information.
- Database object checks: For all rules that reference database objects such as tables or columns, the check verifies that those objects exist in the database system. If a connection can be inferred from the context, this connection is used. If no connection is available in the context, it must be specified before the check is executed.
- Connection checks: For objects that configure access to external systems, such as connections or storage locations, the configuration check verifies that access can be established using the given credentials. Furthermore, additional operations on database connections are performed to check whether the credential user has the necessary authorization to access relevant database objects. In particular, the credential user’s permission to read source tables and write to target tables is verified. Similarly, for storage locations the check verifies that the credential user has permission to write to the working directory.
- Code checks: For all entities containing code segments, such as modification methods or condition scripts, the syntax of the code is checked. However, this does not check whether all necessary variables are likely to be available at run time.
The following permissions are required:
- READ
Delete
Deletes the object. If the object is still used by another entity, an error message is displayed and the object is not deleted. The delete operation must be confirmed in a separate popup.
The following options are available:
- Cascade: Recursively deletes dependent objects.
When cascade is used, dependent objects are deleted first, also with cascade enabled. A cascade deletion is therefore a recursive operation that searches deeply for dependent objects and deletes them first. Only the first object requires confirmation. The dependent objects are deleted without confirmation, but only if the user has the DELETE permission. This feature is only available in development mode and should be used with caution. More information about development mode can be found in the chapter User Settings.
The following permissions are required:
- DELETE
- READ
Duplicate
Creates an exact copy of the current object with a different display name in the same list. Users can decide whether to copy child objects such as rules, permissions, or tasks. Only complete classes of child objects can be selected, not individual child objects. Copied child objects keep their display names. By default, all child objects are copied.
The following permissions are required:
- CREATE
- READ
Edit
Opens the current entity in edit mode.
The following permissions are required:
- READ
- WRITE
Event List
This list shows all registered events for the object. It includes events that are specific to the object as well as events registered for its type.
The following permissions are required:
- READ
Export
This action exports XDM objects in different formats so that they can be imported into another environment, either via the import dialog or via CasC (configuration as code).
Refer to configuration of export for more information.
Related and dependent objects can optionally be included in the export. The export dialog has the following options:
- Include dependent objects: Dependent objects belong only to the exported object, such as rules and tasks.
- Include permissions: The permissions of each exported object, provided that the object supports permissions. Some objects, such as rules, do not have permissions.
- Include referenced objects: Referenced objects exist on their own and are used by the exported object, such as connections and environments.
- Include objects that depend on referenced objects: Also includes the dependent objects of the referenced objects, e.g. the rules of a modification set or the rules in an application model version.
- Include implicit created objects: Implicitly created objects are tasks or workflows that were automatically created for execution. These objects are not exported by default, but can be included by setting this flag. When exporting implicit objects, make sure that the Include dependent objects flag is also enabled.
Objects on which the user does not have READ permission are not exported. This includes dependent and referenced objects. However, the reference to such an object is exported. For example, a connection object would refer to the credential even if the user does not have READ permission on the credential. The definition of the credential object itself is not part of the export file. This can lead to issues during the import, because the connection cannot be created without an existing credential.
The following permissions are required:
- READ
Object History
The history displays all changes made to the respective XDM object, including any changes made to its rules.
Each change record includes information about the operation performed (e.g. CREATE, UPDATE, DELETE), the timestamp, and the user responsible for the change.
For more information about the concept of the history, refer to the history concepts.
The following permissions are required:
- READ
Permission Check
The check verifies that the current user is authorized to access the object. By default, the check is performed for the current user, but it can also be performed for a specific user or role. The check is then applied to child and referenced objects.
Additional permission checks are applied when these can be inferred from the context in which the check was started. For example, if the check is performed on a table copy task, the referenced source and target connections are checked to determine whether the given identity has source or target usage permission respectively.
The following permissions are required:
- READ
Usage
The Usage List shows all objects that refer to the current object. It provides an overview of these relationships and makes them easy to track.
The following permissions are required:
- READ
Uses
The Uses List shows all objects that the current object uses. It provides an overview of these relationships and makes them easy to track.
The following permissions are required:
- READ
License Options
This object is available if the following license option is enabled:
- TASK_TYPE:ANALYSE_TASK
The object is also available if the license package is at least STANDARD.
Prerequisites
Before an XDM PII finder task template can be completely configured, you have to set up a connection and a credential. Furthermore, at least one matcher must be defined. XDM contains several pre-defined matchers, but it is also possible to define your own.
Required permissions
Users to execute a task
- Source database user, defined in the credential of the source connection
Authorization for database user
- SELECT authority for the database catalog, and
- SELECT authority for the tables to be analyzed.
- DB2 LUW only: System monitor authority (SYSMON) to obtain information from the database catalog (optional, but recommended).
- MS SQL Server only: The user must be a database user; a domain user is not sufficient.
Executing a task
The user needs the following permissions to successfully start the task execution:
- READ permission on the task and task template,
- EXECUTE permission on the task,
- SOURCE_USAGE on the connection,
- READ permission on the credential used in the connection,
- READ permission on any matcher linked in the task template, and
- READ permission on any custom parameter used in the task template.
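The permission list above can be turned into a simple pre-flight checklist. The sketch below only illustrates the list. It does not call any XDM API. The names `REQUIRED` and `missing_permissions`, and the simplification to one entry per object kind (a template may link several matchers or custom parameters), are assumptions.

```python
# Object kind and permission pairs taken from the list above.
REQUIRED = [
    ("task", "READ"),
    ("task template", "READ"),
    ("task", "EXECUTE"),
    ("connection", "SOURCE_USAGE"),
    ("credential", "READ"),
    ("matcher", "READ"),
    ("custom parameter", "READ"),
]


def missing_permissions(granted):
    """Return the (object kind, permission) pairs the user still lacks.

    granted: {object kind: set of permissions held by the user}  (assumed model)
    """
    return [(kind, permission)
            for kind, permission in REQUIRED
            if permission not in granted.get(kind, set())]


granted = {
    "task": {"READ", "EXECUTE"},
    "task template": {"READ"},
    "connection": {"READ"},          # SOURCE_USAGE is missing on purpose
    "credential": {"READ"},
    "matcher": {"READ"},
    "custom parameter": {"READ"},
}
print(missing_permissions(granted))
```

In this example the user can read the connection but lacks SOURCE_USAGE on it, so that pair is the only one reported as missing.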
Task procedure
An XDM PII finder task consists of two stages. Before Stage 1 is executed, XDM creates all necessary files and folders in the task directory during the tailoring process.
- Tailoring: All necessary files and folders are created in the task directory.
The following reports are generated by this stage:
- Stage 1: The structure of the tables in the environment is cached in task files. XDM collects information about all tables that match at least one selection rule. Tables and columns that match any exclude rule are not part of the task’s selection set.
The following reports are generated by this stage:
- Stage 2: The data of the selected tables is analyzed. XDM fetches the data of the tables and compares it with the selected matchers.
The following reports are generated by this stage: