Understanding Entity Resolution
Entity resolution is the process of determining whether two data entries represent the same real-world object, which makes it a decision-making process. The decision is made at the level of individual entities, but the process can be scaled to accommodate big data. Because the work happens at the entity level, there is significant room for proprietary approaches that differ in quality and speed.
What exactly is entity resolution?
Entity resolution is a key processing step in Master Data Management. Master Data acts as a superset of a company’s overall data, tying together data from disparate sources that potentially refer to the same unique entities. Entities such as customers, products, and suppliers can be represented in multiple databases and used by different departments, but they may not be represented with the same data structures, or even the same referential names. Furthermore, data within those structures may not be formatted consistently, and this lack of standardization contributes to poor data integrity.
What is dynamic entity resolution?
In entity resolution, the process of matching different data points that could represent a single entity is called similarity analysis, and it is an ever-improving field. There are three common approaches to similarity analysis, in order of increasing complexity: traditional matching, which matches records directly but yields poor results; batch entity resolution, which produces better results by consolidating records into a single view of entities; and real-time entity resolution, which constructs a single view that remains current.
The next evolution of similarity analysis, referred to as dynamic entity resolution, emphasizes regenerating entity views from the underlying raw data in real time according to the requirements of a specific use case. Like real-time entity resolution, it remains current; unlike it, the resulting view is also tailored to the use case and so remains more relevant.
Some use cases require broader or tighter targeting or specificity. The premise of regenerating entities is therefore to allow different combinations of matching criteria for individual entities, instead of assuming that one set of criteria can fit all use cases. In essence, dynamic entity resolution allows different fuzziness levels that fulfill data access and application requirements. This is beneficial for enterprise-level data solutions that support multiple use cases.
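The idea of regenerating entity views at different fuzziness levels can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular platform's method: the sample records, the `resolve` function, and both thresholds are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for a real similarity measure.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical raw records, kept unmerged so that views can be regenerated.
RAW_RECORDS = {
    "r1": "Jacob Smith",
    "r2": "Jacob Smyth",
    "r3": "J. Smith",
    "r4": "Joanna Smith",
}


def similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(records, threshold):
    """Group record IDs whose names are at least `threshold` similar."""
    parent = {rid: rid for rid in records}

    def find(rid):
        # Follow parent links to the group representative (union-find).
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for a, b in combinations(records, 2):
        if similarity(records[a], records[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for rid in records:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())


# The same raw data yields different entity views per use case.
tight_view = resolve(RAW_RECORDS, threshold=0.85)  # e.g. billing
broad_view = resolve(RAW_RECORDS, threshold=0.50)  # e.g. household marketing
```

With the tighter threshold only the two near-identical spellings are grouped, while the broader threshold pulls all four records into a single household-like view. Neither view is stored as the one true answer; each is rebuilt from the raw records on demand.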
Why is entity resolution important?
Entity resolution is critical to Master Data Management. The Master Data set can be constructed only by matching and merging records from disparate datasets. Without entity resolution, there is no reliable tie between entries in separate databases, so any insights that could be drawn from combining them are squandered because the data is unlikely to be combined accurately. In essence, the output of an entity resolution process is the Master Data record.
What is the process of entity resolution?
Entity resolution is one step within the larger Master Data Management key processing model. The stages of that model overlap, and each one affects overall data quality, so the effectiveness of entity resolution should not be considered in isolation. A comprehensive MDM environment will include the following processing steps:
Data Model Management: Master Data is purpose-built to transcend the complications of inconsistent data that lead to poor understanding. The solution is to establish clear and consistent logical data definitions within the context of the business, and then to make data systems speak this common language to each other.
Another established method is to assign each entity a globally unique identifier (GUID) and to associate reference data with the entity through that GUID. In this way the data model no longer depends on system-specific terminology, a principle that should also extend down to the attributes that describe data within systems.
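A GUID-based cross-reference can be sketched in a few lines. This is a minimal illustration, assuming an in-memory mapping; the `CrossReference` class, its method names, and the `crm` and `billing` system names are hypothetical, and only the standard `uuid` module is used to mint identifiers.

```python
import uuid


class CrossReference:
    """Maps (source_system, source_id) pairs to one global entity ID."""

    def __init__(self):
        self._xref = {}

    def register(self, source_system, source_id, guid=None):
        """Link a source record to an existing GUID, or mint a new one."""
        key = (source_system, source_id)
        if key in self._xref:
            return self._xref[key]  # already linked; keep the first GUID
        guid = guid or str(uuid.uuid4())
        self._xref[key] = guid
        return guid

    def sources_for(self, guid):
        """List every source record attached to the given GUID."""
        return sorted(key for key, g in self._xref.items() if g == guid)


xref = CrossReference()
customer_guid = xref.register("crm", "549")              # new entity
xref.register("billing", "183", guid=customer_guid)      # same entity
linked = xref.sources_for(customer_guid)
```

Each system keeps its own local ID, and consumers look records up through the shared GUID rather than through any one system's naming.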
Data Acquisition: New data sources, and the data within those sources, may be inconsistent. Because of these external and internal inconsistencies, a reliable, repeatable data acquisition process makes it possible to manage and improve entity resolution activities such as validating, standardizing, and enriching data.
Data Validation, Standardization and Enrichment: At a minimum, validation, standardization, and enrichment should be implemented to ensure good data consistency. Validation aims to eliminate erroneous data entries, like fake emails. Standardization conforms data to known values (like country codes), formats (like telephone numbers), and fields (like addresses). Enrichment adds useful attributes that aid in more accurate entity resolution. The result is cleansed data ready for entity resolution.
Entity Resolution: Entity resolution consists of a general workflow that subjects the validated and standardized data to a set of match rules, which determine how to proceed based on deterministic and probabilistic matching algorithms. Similar entries are treated according to their score. Entities with scores that signal tight similarity may be resolved automatically, while fuzzier ones may be sent to a data steward for resolution. In still other cases, the entity cross-reference may simply be recorded while the master record remains unchanged. Further entity resolution management activities include Master Data ID management, which covers the Global IDs and Cross-Reference (x-Ref) information, and Affiliation Management, which covers understanding and establishing the relationships between MD entity records that correspond to the relationships those entities share in the real world.
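The three cleansing activities can be sketched as small functions. This is a simplified illustration, not a production rule set: the email pattern is deliberately loose, the ten-digit phone rule assumes North American numbers, and the `COUNTRY_CODES` table, the `cleanse` function, and the `email_domain` attribute are hypothetical.

```python
import re

# Loose shape check only; real validation would go further.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Tiny illustrative lookup of known values.
COUNTRY_CODES = {"united states": "US", "usa": "US", "germany": "DE"}


def is_valid_email(email):
    """Validation: reject entries that do not look like an email."""
    return bool(EMAIL_PATTERN.match(email))


def standardize_phone(phone):
    """Standardization: keep digits only (assumes 10-digit numbers)."""
    digits = re.sub(r"\D", "", phone)
    return digits if len(digits) == 10 else None


def standardize_country(country):
    """Standardization: map free text to a known country code."""
    return COUNTRY_CODES.get(country.strip().lower())


def cleanse(record):
    """Validate, standardize, and enrich one record."""
    cleaned = dict(record)
    if not is_valid_email(cleaned.get("email", "")):
        cleaned["email"] = None
    cleaned["phone"] = standardize_phone(cleaned.get("phone", ""))
    cleaned["country"] = standardize_country(cleaned.get("country", ""))
    if cleaned["email"]:
        # Enrichment: derive an extra attribute that can aid matching.
        cleaned["email_domain"] = cleaned["email"].rsplit("@", 1)[1].lower()
    return cleaned


cleaned = cleanse(
    {"email": "J.Smith@Example.com", "phone": "(234) 567-8900", "country": "USA"}
)
```

Values that fail validation or standardization are set to `None` rather than guessed, so later matching rules can treat them as missing.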
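Score-based routing of candidate matches might look like the sketch below. It is an assumption-laden illustration: the field weights, both thresholds, the outcome labels, and the rule that an identical email is a certain match are all hypothetical, and `difflib.SequenceMatcher` is a stand-in for a real probabilistic matcher.

```python
from difflib import SequenceMatcher

# Hypothetical tuning values; real systems calibrate these on their data.
AUTO_MERGE_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.70
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def fuzzy(a, b):
    """Similarity of two field values; missing values score 0.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(a, b):
    """Combine a deterministic rule with a weighted fuzzy score."""
    # Deterministic rule: identical non-empty emails are a certain match.
    if a.get("email") and a.get("email") == b.get("email"):
        return 1.0
    # Probabilistic-style fallback: weighted field similarities.
    return sum(
        weight * fuzzy(a.get(field, ""), b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def route(score):
    """Decide what happens to a candidate pair based on its score."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "steward_review"
    return "no_match"


pair_score = match_score(
    {"name": "Jacob Smith", "address": "555 Main St.", "phone": "2345678900"},
    {"name": "Jacob Smith", "address": "555 Main St.", "phone": "2345678900"},
)
decision = route(pair_score)
```

Keeping the thresholds separate from the scoring function makes it straightforward to tighten or loosen automatic merging without touching the match rules themselves.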
At this point, Identification Management and Metadata Management systems will begin to manage the growing metadata and Globally Unique Identifiers that support access to the data now connected to newly discovered entities.
What are examples of entity resolution?
To illustrate, we use the following source data received by an MDM system. Imagine two data sets pulled together with very similar structures, but inconsistent entry data.
Source ID | Name | Address | Telephone |
---|---|---|---|
549 | Jacob Smith | 555 Main St., Freedonia, QT 87456 | |
183 | J. Smith | 555 Main St., Freedonia | 2345678900 |
349 | Joanna Smith | 555 Main St., Freedonia | 234-567-8900 |
Across the three entries, standardization appears to be missing, but many similarities are present. The surnames match, and because the addresses are nearly identical there is cause to believe these entries are related. But the abbreviated first name in entry 183 leaves questions, and the entities need to be resolved. This entry could represent the same entity as one of the other two, represent a third entity living at the same address, or simply be out of date. The discrepancies in the telephone fields also raise questions. If it is learned that Jacob Smith’s telephone number is different from Joanna Smith’s, then there is a better chance that entry 183 is Joanna Smith. But if entry 549’s telephone number is identical to J. Smith’s, then more information may be needed to resolve the correct entity.
This simplified demonstration shows entity resolution at a very basic level. Sometimes it is performed manually on small data sets using spreadsheets, but such techniques are wholly inadequate for organizations that leverage their big data as an operational asset. In these big data cases, entity resolution needs to be automatic to be effective and efficient. Master Data Management platforms provide these automated entity resolution capabilities.
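The comparison of the three source records in the table can be expressed as a short script. This is an illustrative sketch only: the `compare` function and the evidence it reports (shared phone after stripping punctuation, matching first initial, one address being a prefix of the other) are assumptions chosen to mirror the reasoning here, not a complete matching method.

```python
import re

# The three source records from the table, keyed by Source ID.
records = {
    549: {"name": "Jacob Smith", "address": "555 Main St., Freedonia, QT 87456", "phone": ""},
    183: {"name": "J. Smith", "address": "555 Main St., Freedonia", "phone": "2345678900"},
    349: {"name": "Joanna Smith", "address": "555 Main St., Freedonia", "phone": "234-567-8900"},
}


def digits(phone):
    """Strip formatting so differently written numbers can be compared."""
    return re.sub(r"\D", "", phone)


def compare(id_a, id_b):
    """Collect simple pieces of evidence about two source records."""
    a, b = records[id_a], records[id_b]
    phone_a, phone_b = digits(a["phone"]), digits(b["phone"])
    return {
        # A missing phone number is no evidence either way.
        "same_phone": bool(phone_a) and phone_a == phone_b,
        "same_initial": a["name"][0] == b["name"][0],
        "address_overlap": a["address"].startswith(b["address"])
        or b["address"].startswith(a["address"]),
    }


evidence_183_vs_349 = compare(183, 349)
evidence_183_vs_549 = compare(183, 549)
```

Entry 183 shares an initial and an address with both of the other entries, so those fields do not discriminate. Only the standardized telephone number separates the candidates: it links 183 to 349, while 549 has no number on file to compare.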
What is augmented entity resolution?
Augmented entity resolution (AER) is a sophisticated data management technique designed to elevate the accuracy and connectivity of information within large datasets. Organizations today amass vast amounts of data from diverse sources; the challenges they face are not in collecting data but in finding the meaning in it. Identifying and linking related entities from large datasets is a complex task.
Augmented entity resolution uses a range of advanced algorithms and techniques, such as machine learning algorithms, natural language processing, and statistical models, to refine the accuracy of entity matching and linking processes. By incorporating these technologies, AER can adapt to data complexities and improve the overall resolution process.
An example of AER in action is in customer relationship management. Consider a scenario where a retail company wants to merge customer data from multiple touchpoints, such as online purchases, in-store transactions, and customer service interactions. AER matches and links customer profiles across those data sources, providing a unified view of each customer’s journey and preferences.
Benefits of augmented entity resolution
- Improved data accuracy: AER enhances the accuracy of entity resolution, reducing the likelihood of false positives and negatives. This improves the reliability and trustworthiness of integrated datasets, instilling more confidence in data-driven insights.
- Enhanced connectivity: AER adeptly identifies and links related entities, bolstering the overall connectivity of data. Establishing more connections enriches the depth and breadth of insights that can be derived from a dataset and provides a more comprehensive understanding of relationships and patterns within the data. Uncovering deeper insights leads to better decisions.
- Adaptability to diverse data sources: AER is a versatile solution capable of harmonizing structured and unstructured data seamlessly. From customer profiles to financial records, AER can integrate disparate datasets to provide organizations with a unified and holistic view of their data.
What is a flexible entity resolution network (FERN)?
Flexible entity resolution network (FERN) is Reltio’s advanced and unique solution that addresses the challenges of linking and resolving entities within complex datasets. FERN’s adaptability to diverse and dynamic data sources ensures accurate augmented entity resolution, providing organizations with reliable and actionable insights.
FERN uses advanced neural network architectures and flexible algorithms, allowing it to learn intricate patterns and enabling it to make intelligent decisions when resolving entities. The network processes input data, extracts relevant features, and produces refined outputs. Its sophistication makes FERN indispensable for organizations seeking a deeper understanding of relationships within datasets.
To understand and identify similarities, FERN uses embedding techniques, which capture semantic relationships between entities. This enhances FERN’s flexibility, enabling it to adapt to varying data structures and complexities. It also incorporates attention mechanisms, which focus on relevant information during the entity resolution process. FERN can assign varying degrees of importance to different parts of the input data. This enables better decisions, especially in situations where certain features are more critical for accurate resolution.
Financial institutions managing customer data for fraud detection demonstrate FERN’s capabilities. FERN flags suspicious transactions by analyzing transactional data, identifying patterns indicative of fraudulent activity, and accurately resolving entities across disparate datasets.
Benefits of flexible entity resolution network
- Improved accuracy and adaptability: FERN harnesses the power of neural network architectures to learn intricate patterns within data, enabling precise entity resolution even in the face of evolving datasets. This enhanced accuracy ensures that organizations can trust the insights derived from their data analysis processes.
- Efficient real-time processing: FERN’s efficiency enables entity resolution in or near real time, crucial for applications where the timely identification and resolution of entities is imperative. From fraud detection to customer data management to cybersecurity, FERN gives organizations the agility to respond swiftly to emerging threats and opportunities.
- Scalability: FERN handles large-scale datasets without compromising on performance. Its neural network architecture allows it to process substantial volumes of data, and it can scale along with an organization’s evolving data needs.
FERN combines advanced algorithms and neural network architectures to deliver accurate, adaptable, and scalable solutions for today’s data-intensive challenges.