
Right Data in the Right Storage for the Right Insights

The world we live in isn't just relational, so why should your data be stored that way?

Max Lukichev, Reltio

Today’s data management requires reliable handling of master data at big data volume, variety, and velocity. There are two broad categories of databases on the market: relational and NoSQL. NoSQL in turn comes in several variations, such as graph, key-value, and document stores. The challenge is to compare and contrast them and figure out which database is right for your business.

There are dozens of databases to choose from. Data architects have explored multiple options for various business needs and have come to realize that no single type of database is optimal for every application use case. This realization led to the notion of polyglot persistence: a single application should be able to talk to different types of databases, using each for the job it does best, to achieve the business objective.
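To make polyglot persistence concrete, here is a minimal Python sketch, assuming two in-memory stand-ins: a key-value store for entity attributes and a graph store for relationships. The class names (`KeyValueStore`, `GraphStore`, `CustomerService`) are hypothetical and not taken from any product; a real system would wrap actual database clients behind the same kind of interface.

```python
from collections import defaultdict


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value/columnar store."""

    def __init__(self):
        self._rows = {}

    def put(self, key, attributes):
        # Merge new attributes into the existing row, if any.
        self._rows.setdefault(key, {}).update(attributes)

    def get(self, key):
        return self._rows.get(key, {})


class GraphStore:
    """Hypothetical in-memory stand-in for a graph store."""

    def __init__(self):
        self._edges = defaultdict(set)

    def relate(self, source, relation, target):
        self._edges[source].add((relation, target))

    def neighbors(self, source, relation):
        return sorted(t for r, t in self._edges[source] if r == relation)


class CustomerService:
    """One application object that talks to two different store types."""

    def __init__(self, kv, graph):
        self.kv = kv
        self.graph = graph

    def save_customer(self, customer_id, attributes, employer=None):
        self.kv.put(customer_id, attributes)  # attributes go to the key-value store
        if employer:
            # relationships go to the graph store
            self.graph.relate(customer_id, "WORKS_AT", employer)

    def profile(self, customer_id):
        # Assemble one view from both stores.
        view = dict(self.kv.get(customer_id))
        view["employers"] = self.graph.neighbors(customer_id, "WORKS_AT")
        return view


service = CustomerService(KeyValueStore(), GraphStore())
service.save_customer("c1", {"name": "Ana Diaz", "city": "Austin"}, employer="Acme")
print(service.profile("c1"))
# {'name': 'Ana Diaz', 'city': 'Austin', 'employers': ['Acme']}
```

Keeping each store behind a small interface lets the application use each database for what it does well without leaking storage details into the business logic.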

When you build a data foundation for enterprise applications that draw on internal, external, third-party, and social data sources, you need a data store that is flexible enough to handle disparate data types at big data scale. You also need the agility to update the data model quickly without hurting performance.

In this article, I discuss hybrid data stores designed to handle the most complex multi-domain master data management (MDM) challenges while bringing together transactional, interaction, social, and machine-generated data at scale. Inspired by consumer applications such as LinkedIn (Economic Graph) and Facebook (Social Graph), a hybrid data foundation helps create data-driven applications that are highly scalable, flexible, and extensible.

One example of such a data-driven application is Customer 360, which consolidates customer information from multiple sources such as CRM, ERP, support, marketing automation, social, channel, and others. For a true Customer 360 view, you may also want to augment the customer profile with data from third-party providers such as data.com.
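Consolidation for a Customer 360 view can be sketched as a simple survivorship rule: when sources disagree, keep the non-empty value from the most trusted source and record where it came from. The source names and the `SOURCE_PRIORITY` ranking below are illustrative assumptions; a production system would likely need richer rules (recency, completeness, per-attribute trust).

```python
# Hypothetical trust ranking: a lower number wins when sources disagree.
SOURCE_PRIORITY = {"crm": 0, "erp": 1, "support": 2, "third_party": 3}


def consolidate(records):
    """Merge per-source records into one profile plus per-attribute provenance.

    For each attribute, keep the first non-empty value seen when records
    are visited from most to least trusted source.
    """
    profile, provenance = {}, {}
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY[r["source"]])
    for record in ordered:
        for attr, value in record["data"].items():
            if value in (None, "") or attr in profile:
                continue  # skip blanks and attributes already set by a more trusted source
            profile[attr] = value
            provenance[attr] = record["source"]
    return profile, provenance


records = [
    {"source": "third_party", "data": {"name": "A. Diaz", "industry": "Retail", "employees": 1200}},
    {"source": "crm", "data": {"name": "Ana Diaz", "email": "ana@example.com", "industry": ""}},
    {"source": "support", "data": {"open_tickets": 2}},
]
profile, provenance = consolidate(records)
print(profile)
print(provenance)
```

Here the CRM name wins over the third-party spelling, while the third-party industry fills the gap left by the blank CRM field.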

Business users want access to accurate and complete customer data and analytical insights (customer value, churn propensity, and next-best-offer) to help increase revenue and improve customer experience. Customer-facing teams also want to uncover the relationships that customers have with various organizations, places, and products.

A single columnar or graph database cannot easily support such use cases. Columnar databases do not manage relationships well. Graph databases, while suited to uncovering and handling relationships, lack the horizontal scalability to meet enterprise requirements. When a Fortune 500 company wants to manage 100 million or more customer profiles across the globe and across thousands of product offerings, a graph database alone is not up to the task. A modern data management platform that takes a polyglot approach, built on hybrid columnar-graph stores, is better suited to applications with varying data storage needs.

We selected Apache Cassandra as the columnar store for all data, which gave us a schema-less model for storing information at big data scale. It provides elastic storage with fast access and the ability to run analytics for aggregated insights. We also added graph technology to model and visualize real-world relationships in a way that is more efficient and easier to understand than in relational databases.
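The schema-less, wide-row idea can be illustrated with a small in-memory model in which each entity is a partition and each attribute value is a timestamped cell, so new attributes need no schema change and the reader decides which view to build. This is a hypothetical sketch of the data model only; it does not use Cassandra's driver or CQL and is not Reltio's actual storage layout.

```python
from collections import defaultdict


class WideRowStore:
    """Hypothetical in-memory model of a wide-column table.

    Each entity id is a partition; each attribute holds timestamped
    versions, so new attributes can appear without a schema change.
    """

    def __init__(self):
        # entity_id -> {attribute: {timestamp: value}}
        self._partitions = defaultdict(lambda: defaultdict(dict))

    def write(self, entity_id, attribute, value, ts):
        self._partitions[entity_id][attribute][ts] = value

    def read_latest(self, entity_id):
        """Schema-on-read: build a view from whatever attributes exist now."""
        row = self._partitions.get(entity_id, {})
        return {attr: versions[max(versions)] for attr, versions in row.items()}

    def history(self, entity_id, attribute):
        """All stored values of one attribute, oldest first."""
        versions = self._partitions.get(entity_id, {}).get(attribute, {})
        return [versions[ts] for ts in sorted(versions)]


store = WideRowStore()
store.write("c1", "name", "Ana Diaz", ts=1)
store.write("c1", "email", "ana@example.com", ts=1)
store.write("c1", "email", "ana.diaz@example.com", ts=5)
# A brand-new attribute: no ALTER TABLE step exists in this model.
store.write("c1", "loyalty_tier", "gold", ts=7)
print(store.read_latest("c1"))
# {'name': 'Ana Diaz', 'email': 'ana.diaz@example.com', 'loyalty_tier': 'gold'}
```

Keeping old versions alongside the latest value is one way such a model can support both current views and history from the same rows.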

With a graph, you can describe an unlimited number and variety of entities and their relationships, connecting people, products, organizations, and places in many-to-many relationships. A data management platform with a hybrid columnar and graph store enables schema-on-read, graph relationship modeling, and horizontal scalability across all business entities with any number of attributes.
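A property graph with typed nodes and typed, directed edges is one common way to represent these many-to-many relationships. The minimal in-memory `PropertyGraph` below is a hypothetical illustration, not the API of any particular graph database; `traverse` follows a sequence of relationship types from a starting node.

```python
from collections import defaultdict


class PropertyGraph:
    """Minimal in-memory property graph: typed nodes, typed directed edges."""

    def __init__(self):
        self.nodes = {}               # node id -> {"type": ..., **attributes}
        self.out = defaultdict(list)  # node id -> [(relation, target id)]

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, source, relation, target):
        self.out[source].append((relation, target))

    def traverse(self, start, *relations):
        """Follow a path of relation types from start; return the reached node ids."""
        frontier = {start}
        for relation in relations:
            frontier = {t for n in frontier for r, t in self.out[n] if r == relation}
        return sorted(frontier)


g = PropertyGraph()
for node_id, node_type in [("ana", "Person"), ("raj", "Person"), ("acme", "Organization"),
                           ("globex", "Organization"), ("crm-suite", "Product"), ("austin", "Location")]:
    g.add_node(node_id, node_type)

# Many-to-many: people link to several organizations, organizations to several products.
g.add_edge("ana", "WORKS_AT", "acme")
g.add_edge("ana", "ADVISES", "globex")
g.add_edge("raj", "WORKS_AT", "acme")
g.add_edge("acme", "USES", "crm-suite")
g.add_edge("globex", "USES", "crm-suite")
g.add_edge("acme", "LOCATED_IN", "austin")

print(g.traverse("ana", "WORKS_AT", "USES"))        # ['crm-suite']
print(g.traverse("raj", "WORKS_AT", "LOCATED_IN"))  # ['austin']
```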

Many use cases do not require advanced analytics, but when you need deeper insight into the activities and behavior of customers organized in a hierarchy, database queries alone are not sufficient. Modeling the roll-up of dynamic hierarchical information (revenue, value, product usage) or finding key influencers in customer networks requires a more scalable and efficient solution than columnar database queries.
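Rolling up values through a hierarchy means that each node's total is its own value plus the totals of all its descendants. The sketch below assumes a tree given as a hypothetical child-to-parent map; it recurses, so very deep hierarchies would need an iterative version or a distributed engine.

```python
from collections import defaultdict


def roll_up(parent_of, own_revenue):
    """Return each node's own revenue plus the revenue of all its descendants.

    parent_of maps child id -> parent id (None for roots). Assumes a tree
    with no cycles; recursion depth grows with hierarchy depth.
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        if parent is not None:
            children[parent].append(child)

    totals = {}

    def total(node):
        if node not in totals:  # memoize so each subtree is summed once
            totals[node] = own_revenue.get(node, 0) + sum(total(c) for c in children[node])
        return totals[node]

    for node in parent_of:
        total(node)
    return totals


parent_of = {
    "global-corp": None,
    "emea": "global-corp",
    "amer": "global-corp",
    "acme-uk": "emea",
    "acme-de": "emea",
    "acme-us": "amer",
}
own_revenue = {"acme-uk": 120, "acme-de": 80, "acme-us": 300, "amer": 10}
totals = roll_up(parent_of, own_revenue)
print(totals["global-corp"], totals["emea"], totals["amer"])  # 510 200 310
```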

Sophisticated analytics over many-to-many entity relationships, connected to interactions and transactions, need big data analytics technology such as Apache Spark. Spark supports data science queries, percentile calculations, window functions, advanced machine learning, and graph analytics. A Spark environment lets you run complex queries and distributes tasks across multiple nodes, making it possible to process enormous datasets.
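To show what percentile calculations and window functions compute, here are plain-Python versions of an exact nearest-rank percentile and a `RANK() OVER (PARTITION BY ... ORDER BY ... DESC)` window. They illustrate the semantics only; in PySpark you would express similar logic with `pyspark.sql.Window.partitionBy(...).orderBy(...)`, `pyspark.sql.functions.rank`, and `pyspark.sql.functions.percentile_approx` (approximate, so results can differ slightly from the exact value), and Spark would distribute the work across nodes.

```python
from collections import defaultdict
from math import ceil


def percentile_nearest_rank(values, p):
    """Exact nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need non-empty values and 0 < p <= 100")
    ordered = sorted(values)
    return ordered[ceil(p * len(ordered) / 100) - 1]


def rank_within_partition(rows, partition_key, order_key):
    """Mimic RANK() OVER (PARTITION BY partition_key ORDER BY order_key DESC)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    ranked = []
    for group in groups.values():
        group.sort(key=lambda r: r[order_key], reverse=True)
        rank = 0
        for i, row in enumerate(group):
            # Ties share a rank, and the next distinct value skips ahead (1, 1, 3, ...).
            if i == 0 or row[order_key] != group[i - 1][order_key]:
                rank = i + 1
            ranked.append({**row, "rank": rank})
    return ranked


sales = [
    {"region": "EMEA", "customer": "acme-uk", "revenue": 120},
    {"region": "EMEA", "customer": "acme-de", "revenue": 80},
    {"region": "EMEA", "customer": "initech", "revenue": 120},
    {"region": "AMER", "customer": "acme-us", "revenue": 300},
    {"region": "AMER", "customer": "globex", "revenue": 90},
]
revenues = [s["revenue"] for s in sales]
print(percentile_nearest_rank(revenues, 50), percentile_nearest_rank(revenues, 90))  # 120 300
for row in rank_within_partition(sales, "region", "revenue"):
    print(row["region"], row["customer"], row["rank"])
```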

With a Spark environment in place, you can apply advanced graph algorithms such as triangle counting, PageRank, node connectivity, and inbound and outbound node degree to learn more about the relationships between data entities. These capabilities are useful when you calculate scores, relationship strengths, or the influence of the data elements you manage.
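The algorithms named above can be sketched on a small directed graph in plain Python: in- and out-degree, triangle counting (edges treated as undirected), PageRank by power iteration, and connected components via union-find as one reading of "node connectivity." These are single-machine illustrations under those assumptions; on Spark, libraries such as GraphFrames or GraphX provide distributed implementations.

```python
from collections import defaultdict


def degrees(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


def triangle_count(edges):
    """Number of triangles, treating edges as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    count = 0
    for a in adj:
        for b in adj[a]:
            if a < b:
                # Count each triangle exactly once, as a < b < c.
                count += sum(1 for c in adj[a] & adj[b] if b < c)
    return count


def pagerank(edges, damping=0.85, iterations=50):
    """Basic power-iteration PageRank; dangling nodes spread their rank evenly."""
    nodes = sorted({n for e in edges for n in e})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def connected_components(edges):
    """Group nodes linked by any path (edges treated as undirected)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


edges = [("ana", "bob"), ("bob", "cara"), ("cara", "ana"), ("dev", "ana"), ("eve", "finn")]
indeg, outdeg = degrees(edges)
print(indeg["ana"], outdeg["ana"])  # 2 1
print(triangle_count(edges))        # 1
print(connected_components(edges))  # [['ana', 'bob', 'cara', 'dev'], ['eve', 'finn']]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))    # ana
```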

Data scientists use these capabilities to find the most influential doctors based on publications, identify influencers in social media, recommend cross-sell offers, or detect fraudulent insurance claims.

A modern data management platform also needs mechanisms that ensure the reliability, integrity, and governance of data, and that match the corresponding entities and objects across hundreds of systems and devices. A hybrid data store, combined with data cleansing and management capabilities and Spark analytics, gives users reliable business data, relevant operational insights, and recommended actions for making informed decisions. By combining columnar and graph technologies with the power of advanced analytics, companies can assemble a scalable data store, a scalable graph, and scalable analytics to design a new generation of data-driven applications and run the business at big data scale.
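Matching entities across systems can be sketched as deterministic match rules plus transitive clustering: records that share a match key end up in the same group. The rules below (same normalized email, or same normalized name plus postal code) and the record fields are hypothetical; real-world matching would typically need fuzzy comparisons and review steps as well.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())


def match_keys(record):
    """Hypothetical match rules: same email, or same normalized name + postal code."""
    keys = []
    if record.get("email"):
        keys.append(("email", record["email"].strip().lower()))
    if record.get("name") and record.get("postal_code"):
        keys.append(("name_zip", normalize_name(record["name"]), record["postal_code"]))
    return keys


def cluster(records):
    """Group record ids that share any match key, transitively, using union-find."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # match key -> first record id that produced it
    for r in records:
        for key in match_keys(r):
            if key in first_seen:
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(sorted(g) for g in groups.values())


records = [
    {"id": "crm-1", "name": "Ana Diaz", "email": "Ana@Example.com", "postal_code": "78701"},
    {"id": "erp-7", "name": "Ana  Diaz.", "email": "", "postal_code": "78701"},
    {"id": "support-3", "name": "A. Diaz", "email": "ana@example.com ", "postal_code": ""},
    {"id": "crm-2", "name": "Raj Patel", "email": "raj@example.com", "postal_code": "10001"},
]
print(cluster(records))  # [['crm-1', 'erp-7', 'support-3'], ['crm-2']]
```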

(About the author: Maxim Lukichev is the Lead Data Scientist at Reltio. He is currently focused on extending Reltio’s big data analytics framework and machine learning capabilities on top of the reliable master data management foundation built into the Reltio Cloud platform. You can contact him at maxim.lukichev@reltio.com)

Originally published at Information-management.com.