The Configuration is the first tab of the SSIS component. Here you define the basic properties of the SSIS component.
Depending on the selected settings, the configuration of the SSIS component can be very extensive. To make it easier for you to configure the components across several packages, you can save the settings and reload them in another package. This also ensures that you use a uniform duplicate search within your project. Make sure that the profile can only be loaded correctly if the column names are identical.
The SSIS component adds its own columns to the data flow. Here you can define which prefix these columns should have.
Add Matching Group Threshold to the Output
By default, the component only adds the matching group per search parameter and the total matching group to the output. For analysis purposes, it can be very helpful to also output the threshold with which a duplicate has been found. Activate the checkbox to add the additional information to the data flow. This setting can decrease the performance of your package.
The search uses a sliding window to check records against each other. By default, the component searches with a widowsize of 5, but you can change this parameter to improve your search results. However, the larger the sliding window, the more the performance of your package is affected.
By default, the component uses one thread per available core. If you want to restrict the component to a certain number of threads, you can define the corresponding number here. With a value greater than the number of cores, the component uses one thread per core (maximum setting).
If you want to check already cleaned data against new uncleaned data, you can use partial matching to improve performance. Activate the corresponding checkbox and select the filter column for partial matching. Enter a value as the Partial Filter Value for the rows that you do not want to check again.
For example, if you check new data records from your CRM system against data from your already cleaned ERP system, you can add the respective values CRM and ERP to the data in a “Source” column in advance. Then activate partial matching for the “Source” column and enter Value ERP as the filter.
The search then compares only rows with the values CRM and CRM as well as CRM and ERP against each other. Rows with the values ERP and ERP are excluded from the check.
The component requires an SQL Server to cache and sort data. Here you define the SQL Server (SQL Server, Managed Instance or Azure SQL DB) and the database on which the data is to be stored.
Use Temporary Tables
If you are using an SQL Server or a Managed Instance, the Temp DB is configured for high performance in most cases. If you select the corresponding checkbox, the tables are created as Temp Tables by the component. If your database is an Azure SQL DB, we recommend storing the tables in a regular database. In this case, you can also define your own database that is optimized for short access times.