Setting a Dummy PDF in a Database Column for Masking

This guide describes how to mask the contents of a database column by replacing every value with a predefined dummy file. The steps use a PDF file as the example, but the same approach works for other binary file types such as JPG, PNG, or ZIP.

Overview

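The process involves creating a dummy file, encoding it as Base64, storing the encoded text in an XDM file, writing a modification method that decodes the text and writes the binary content to the target column, and adding that method to a modification set. The original column content is replaced, while the table structure and column type remain unchanged.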

Prerequisites

  • Basic familiarity with the software environment, Groovy scripting, and database manipulation.

Steps

1. Create the Dummy File

Create the file (e.g., dummy.pdf) that will replace sensitive data in the target database column.

2. Convert the File to Base64 Representation

Use a Base64 encoder to convert the dummy file into text so that its content can be stored in a text file and read by a script. You can do this with the base64 command-line tool, Notepad++, or an online tool.
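If you prefer a scripted conversion, the encoding step can be sketched in Java using only JDK classes. The class name EncodeDummyFile and the input and output paths are illustrative assumptions, not part of the product. The encoder used here writes a single line without line breaks.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

public class EncodeDummyFile {
    // Encode raw file bytes as a single-line Base64 string.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: EncodeDummyFile <input file> <output file>");
            return;
        }
        Path in = Paths.get(args[0]);   // e.g. dummy.pdf (hypothetical path)
        Path out = Paths.get(args[1]);  // e.g. dummy.pdf.b64 (hypothetical path)
        // Base64 output is pure ASCII, so US_ASCII is sufficient here.
        Files.write(out, encode(Files.readAllBytes(in)).getBytes(StandardCharsets.US_ASCII));
    }
}
```

With Java 11 or later, this can be run directly as: java EncodeDummyFile.java dummy.pdf dummy.pdf.b64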

3. Create an XDM File of Type Text

  • Create an XDM file.

  • Upload the Base64 representation of the dummy file into the XDM file.
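Before uploading, it can be worth checking that the text really decodes back to a PDF. The sketch below, in Java with JDK classes only, decodes the text and tests for the %PDF- signature that PDF files start with. The class and method names are hypothetical, and the signature check applies to PDF only; other file types need their own check. The MIME decoder is used because it ignores the line breaks that some encoders, such as GNU base64 with default settings, insert.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CheckBase64Pdf {
    // Returns true if the text decodes as Base64 and the result starts with "%PDF-".
    static boolean looksLikePdf(String base64Text) {
        byte[] decoded;
        try {
            // The MIME decoder skips characters outside the Base64 alphabet, such as line breaks.
            decoded = Base64.getMimeDecoder().decode(base64Text);
        } catch (IllegalArgumentException e) {
            return false; // not decodable as Base64
        }
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (decoded.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (decoded[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```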

4. Develop a Modification Method in Groovy

  1. Add a parameter to the modification method:

    • Name: fileName

    • Display Name: File Name

    • Type: String

  2. Use the following code, which implements the init, apply, and close methods:

   import org.apache.commons.codec.binary.Base64
   import javax.sql.rowset.serial.SerialBlob
   import de.ubs.xdm3.script.evaluator.ScriptValidationException

   fileContentBinary = null

   /*
    * This function is called once before the processing starts.
    */

   void init() {
       // Look up the XDM file whose display name matches the fileName parameter
       file = api.files.find{ modificationMethod.fileName.equals(it.displayName)}
       if (file == null) {
           throw new ScriptValidationException("File " + modificationMethod.fileName + " could not be found.")
       }
       // Decode the Base64 content to binary once; apply() reuses it for every row
       fileContentBase64 = file.content
       fileContentBinary = Base64.decodeBase64(fileContentBase64)
   }

   /*
    * This function modifies each row of the table.
    */
   def apply() {
       // Convert to Blob for binary storage in database
       data[columnIndex] = new SerialBlob(fileContentBinary)
       return true
   }

   /*
    * This function is executed after the table modification is complete.
    */
   def close() {
       // Perform any necessary cleanup (if required)
   }

The SerialBlob class used in apply() wraps a byte array in a Blob object. This is necessary if the database column is represented as a BLOB in the data process. If the column is represented as byte[], the wrapper is not needed and the byte array can be assigned directly. To find out which representation applies to your column, see the data type mappings in Data Types Mappings.
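To illustrate the difference between the two representations, the following Java sketch wraps a byte array in a SerialBlob, the class used in the script above, and reads it back. The class BlobWrapDemo and its helper methods are hypothetical; only SerialBlob and the java.sql.Blob interface come from the JDK.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.rowset.serial.SerialBlob;

public class BlobWrapDemo {
    // Wrap a byte array in a Blob, as needed for columns represented as BLOB.
    static Blob toBlob(byte[] content) {
        try {
            return new SerialBlob(content);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read the full content of a Blob back into a byte array.
    static byte[] toBytes(Blob blob) {
        try {
            // Blob positions are 1-based, so the first byte is at position 1.
            return blob.getBytes(1, (int) blob.length());
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = {0x25, 0x50, 0x44, 0x46, 0x2D}; // the ASCII bytes of "%PDF-"
        Blob blob = toBlob(content);
        System.out.println(Arrays.equals(content, toBytes(blob)));
    }
}
```

For a column represented as byte[], the array would be assigned to the column directly and neither helper is needed.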

5. Create a New Modification Set

  • Create a modification set in the software.

  • Add a rule specifying the database column where the dummy file should be applied.

  • Enter the display name of the XDM file created in step 3 in the File Name parameter. The script looks up the file by this name.

6. Use the Modification Set in Task Execution

Assign the modification set to a task for execution. Ensure the appropriate table and column are targeted during the process.

Benefits

  • Data Masking: Securely replaces sensitive data with a controlled dummy file.

  • Automation: Simplifies the process with reusable methods and scripts.

  • Flexibility: Supports various file types by modifying the input file.

Conclusion

By following this guide, you can mask database columns with dummy files using a structured and reusable approach. This supports compliance with data privacy requirements, because the original file content no longer appears in the target column.