CRM data cleansing and machine learning

Duplicate data in CRM systems can cause you to miss out on a promising lead. Worse, it can give you the impression that you have complied with a deletion request while duplicate data about the same entity is in fact still lingering in the system, violating privacy regulations in the process. Avoiding duplicate data is therefore of the utmost importance.

Prevention is better than cure

Preventing duplicate data is, of course, better than fixing it afterwards. Strict and clear data entry guidelines for all employees working with the CRM system are paramount. These guidelines should prescribe a standardized way of entering data and require a check for existing entries before new data is entered.

But a cure is better than no cure

Even with strict guidelines, or perhaps because the guidelines are too lenient, duplicate data might already be present in your CRM system. Fixing these issues manually can be a very time-consuming and error-prone process. CRM objects contain many links to other objects, and improperly editing these entries may break those links, leaving companies without associated contacts or causing you to lose promising leads. Proper merging is thus very important.

cimt can help you with these deduplication jobs. Using Talend and sophisticated matching algorithms, we can automatically detect duplicates and merge them. This works even when the duplicate data is not an exact match because of missing fields, spelling errors and the like. Most of this data can be merged automatically, massively cutting down on the time required for these jobs. Only uncertain matches are checked manually, but even these are then merged automatically, ensuring that the merging process is done properly.
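To illustrate the idea of matching records that are not exact duplicates, here is a minimal sketch in Python. It is not the Talend matching logic. It uses the standard library's difflib as a stand-in similarity measure, and the field names, the two thresholds and the helper names are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field names and thresholds, for illustration only.
FIELDS = ("name", "email", "city")
AUTO_MERGE = 0.90    # at or above this score: merge without review
NEEDS_REVIEW = 0.70  # between the two thresholds: hand to a human


def field_similarity(a, b):
    """Return a similarity in [0, 1], or None if either value is missing."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(left, right):
    """Average the similarity over the fields that both records fill in."""
    scores = []
    for field in FIELDS:
        score = field_similarity(left.get(field), right.get(field))
        if score is not None:  # missing fields are skipped, not penalized
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def classify(left, right):
    """Sort a candidate pair into auto_merge, manual_review or distinct."""
    score = record_similarity(left, right)
    if score >= AUTO_MERGE:
        return "auto_merge"
    if score >= NEEDS_REVIEW:
        return "manual_review"
    return "distinct"
```

Skipping missing fields instead of counting them as mismatches is one possible design choice. It keeps sparse records from being pushed apart, at the cost of scoring a pair on less evidence.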

What we did

We have already employed these techniques to deduplicate a customer’s data in Salesforce. As mentioned before, we used Talend to automatically find the duplicate data. So-called survivorship rules were used to combine the data from all duplicate records in the best possible way into a single record, the golden record. At the same time, uncertain matches were processed efficiently and intuitively in Talend’s data stewardship console to produce the golden records manually. Another Talend job sent a merge request to Salesforce’s SOAP API, making sure that only the golden records are stored, while all duplicate records are merged. This ensures that all links between CRM objects are preserved. For this particular customer, the process was applied to all accounts, leads and contacts. But technically, it could just as well be applied to any object in the CRM system.
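The survivorship idea can be sketched in a few lines of Python. This is only an illustration of the concept, not the rules used in the project or Talend's implementation. The two rules, the field-to-rule mapping and the `last_modified` key are assumptions.

```python
from datetime import date


def most_recent(records, field):
    """Rule: take the non-empty value from the most recently modified record."""
    newest_first = sorted(records, key=lambda r: r["last_modified"], reverse=True)
    for record in newest_first:
        if record.get(field):
            return record[field]
    return None


def longest(records, field):
    """Rule: take the longest non-empty value, assuming it is the most complete."""
    values = [record[field] for record in records if record.get(field)]
    return max(values, key=len) if values else None


# Hypothetical mapping of fields to survivorship rules.
RULES = {"name": longest, "email": most_recent, "phone": most_recent}


def golden_record(records):
    """Combine a group of duplicate records into a single golden record."""
    return {field: rule(records, field) for field, rule in RULES.items()}


duplicates = [
    {"name": "ACME", "email": "info@acme.example", "phone": "",
     "last_modified": date(2020, 1, 1)},
    {"name": "ACME Corporation", "email": "", "phone": "555-0100",
     "last_modified": date(2021, 1, 1)},
]
golden = golden_record(duplicates)
```

Here the golden record takes the longer name and the phone number from the newer record, and falls back to the older record for the email address because the newer one has none.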

Beyond CRM systems

While these techniques are very effective for CRM systems, they work equally well with almost any database system. If duplicate data is an issue in your company’s database systems, there is a good chance that we can assist with deduplicating it.

Machine learning

In addition to engaging business users to clean duplicate suspects, you can also employ machine learning algorithms that learn from the manual checks done on the uncertain matches, so that your implementation benefits from this manual labour and improves its duplicate detection accuracy over time. This cuts down the cleanup time even more, resulting in more efficient deduplication workflows.
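As a rough sketch of how manual decisions could feed back into matching, the following Python example fits a small logistic regression on per-field similarity scores labelled by a reviewer. The choice of model, the feature layout, the sample data and all names are assumptions for illustration. The text above does not say which algorithm is used in practice.

```python
import math

# Hypothetical review outcomes: ([name_similarity, email_similarity], is_duplicate)
STEWARD_DECISIONS = [
    ([0.95, 1.00], 1),
    ([0.90, 0.80], 1),
    ([0.85, 0.90], 1),
    ([0.40, 0.20], 0),
    ([0.50, 0.10], 0),
    ([0.30, 0.30], 0),
]


def predict(weights, bias, features):
    """Estimated probability that a candidate pair is a true duplicate."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(decisions, epochs=5000, learning_rate=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(decisions[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(decisions)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in decisions:
            error = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


weights, bias = train(STEWARD_DECISIONS)
```

A new uncertain pair can then be scored with `predict(weights, bias, [name_similarity, email_similarity])`. Pairs the model is confident about could skip manual review, while the rest still go to a reviewer and become further training data.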

More information

Would you like to know more about data deduplication, applying machine learning algorithms or how we can help you with this process? Feel free to contact us!