It is always expected that there will be duplicated in the result set, its all related to the way how keys are generated in the Fuzzy matching process. The fuzzy matching tool will output the Record ID fields only. ![]() Once these configurations are done in the fuzzy matching tool and the workflow is executed, the Tool will output the list of potential matches based on the configured match styles and threshold percentage. ![]() Match style can be either pre-configured or custom style. Under the match fields section, select the required field and appropriate match style for the selected field. ![]() If the match score is less than this specified value, then the record will not be considered as a potential match. And then, the match threshold as a percentage should be specified on the tool. Once we decided on the matching mode, specify the record ID field on the tool. Source ID column helps us to locate the stream of data where a record is coming from. And, each source must contain a Source ID. It is always good to de-dupe the input files before the merge mode because Merge mode does not detect duplicate records within the same source. Records from different sources will be compared to identify the duplicates across different input files. Records from a single source will be compared against each other to identify the potential duplicates. There are two match modes available in this tool. So, it always necessary to union the different data streams into one before Fuzzy matching. Unlike the other tools in the Join family, the Fuzzy matching tool has only one input stream available. Once the data preparation process is successful, then and only then, performing the fuzzy matching on top of this cleansed data sets would yield the required outcome. After cleansing activities, the data model should be consistent in most of the aspects with different sources.įollowing is a set of possible tools to use in the Data Preparation Process, but it is not only limited to this set. Data cleansing activities may also involve activities like, harmonization of data and standardization of data. Better success rates, it is always important to understand the data and correcting corrupt or inaccurate records from the data set. A comprehensive set of tools is available in Alteryx to perform this step. The next step in the preparation process is to cleanse the data. In case we are comparing records from different sources (Merge Process), it is always good to keep an eye on the Record ID field to make sure that these IDs are not getting overlapped and duplicated. We can either use an existing field as Record ID or create one from scratch. The fuzzy match tool uses the Record ID to output the list of matched records. The very first step is setting the record ID for our data. Please keep in mind that, we can always use a Join tool to join back the required fields at the end of the fuzzy matching process. Also, all unneeded fields can be removed to keep the matching process simple and clean. To extract the most out of a fuzzy matching tool, data must be properly cleaned. In order to perform a successful fuzzy matching, it is essential to prep the data for it. Fuzzy matching is a process that enables the identification of duplicates or matches that are not the same. Alteryx dispense fuzzy match tool to address these scenarios with ease. This is a common problem we experience in a wide range of scenarios starting from human errs to joining data from different databases where it coded differently. We often want to match words that were referring to the same thing, but they were written somewhat different, or with a typographical error. ![]()
0 Comments
Leave a Reply. |