Click here for the article on BI this week at TDWI (The Data Warehousing Institute)
There’s a good chance you’ve heard the term “data-driven.” It’s the buzzword du jour. Pick a subject area and there’s probably a data-driven for it — from data-driven decision management to data-driven marketing.
What does data-driven really mean?
For starters, says Manish Sood, CEO of Reltio Inc., the term “data-driven” doesn’t mean anything without a reliable data management (DM) infrastructure behind it. “This goal [of data-driven apps] has to be done on the basis of reliable data that then drives relevant insights and allows us to surface relevant actions out of those combined data sets,” says Sood, citing the increasing use of poly-structured text and file-based data, in addition to data from the strictly structured (R)DBMS.
Assume you have such an infrastructure. What does data-driven mean then? Sood contrasts the “data-driven” paradigm with that of packaged applications, which he says are premised on an “information recording” paradigm. “Most of these are functional apps, such as CRM, that drive a very structured process. They’re designed to do certain things in a certain way so certain steps are not missed. But these are more information-recording devices,” he argues.
“When you look at a salesperson in an enterprise company … they’re all trying to use the information they have in their existing systems as a starting point. In most cases, they want to augment that information, to enrich it, to have better insights: who are these customers they’re dealing with, what are these products being sold to them, who are the vendors providing these products, and what does the competitive landscape look like to them?”
In other words, data-driven apps give you context.
More precisely, data-driven apps give you context by synthesizing and coalescing insights from a variety of vectors — of which the traditional, strictly-structured relational (R)DBMS is but one example. Other vectors include the event messages exchanged by applications; data from Web services, subscription services (which increasingly use Web interfaces), and other data-as-a-service (DaaS) providers; and the results of advanced NoSQL analytical processes.
As Sood sees it, this makes for a two-fold challenge: first, to build and deploy an infrastructure that can reliably control for and manage both data heterogeneity and data periodicity; second, to deliver an application user experience (UX) that promotes exploratory discovery and analysis and which does so in the context of a familiar or intuitive paradigm.
He cites the business social app LinkedIn as an example of what he means.
“What we did [with Reltio] is inspired by some of the work that has been done by various consumer-facing products that you see in the market, such as LinkedIn or even Google Knowledge Graph, that are based on similar concepts,” he explains. “Reltio is driven by a hybrid structure of columnar and graph data structures that are combined together … [which] gives us the ability to have infinite attribution details, whereas the graph gives us the ability to expand into various relationships.”
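To make the attribute-plus-graph idea concrete, here is a minimal sketch in Python. It is not Reltio's implementation or API; the `EntityStore` class, its method names, and the sample entities are all hypothetical. It only illustrates how open-ended attributes per entity can sit alongside a graph of relationships that can be expanded hop by hop.

```python
from collections import defaultdict


class EntityStore:
    """Toy store (hypothetical): free-form attributes plus a relationship graph."""

    def __init__(self):
        self.attributes = defaultdict(dict)  # entity_id -> {attribute: value}
        self.edges = defaultdict(set)        # entity_id -> {(relation, other_id)}

    def set_attribute(self, entity_id, name, value):
        # Any attribute name is accepted; nothing is declared up front.
        self.attributes[entity_id][name] = value

    def relate(self, source_id, relation, target_id):
        # Store the edge in both directions so either side can be expanded.
        self.edges[source_id].add((relation, target_id))
        self.edges[target_id].add((relation, source_id))

    def neighbors(self, entity_id, max_hops=1):
        """Breadth-first expansion out to max_hops relationships."""
        seen = {entity_id}
        frontier = [entity_id]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for _relation, other in self.edges.get(node, ()):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        seen.discard(entity_id)
        return seen


store = EntityStore()
store.set_attribute("acme", "industry", "manufacturing")
store.relate("acme", "buys", "widget")
store.relate("widget", "supplied_by", "vendor_x")

one_hop = store.neighbors("acme", max_hops=1)
two_hops = store.neighbors("acme", max_hops=2)
```

Expanding one hop from the customer reaches the product; a second hop reaches the vendor, which is the kind of customer, product, and vendor context Sood describes.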
From Master to Modern Data Management
The first challenge — building and deploying a platform that reliably serves up consistent, quality data from multiple vectors — is arguably the hardest. Traditional data management concepts — such as data warehouse architecture or the heretofore dominant MDM paradigm — simply aren’t up to the task, Sood argues. Traditional DM is grounded in the relational data model, which expects to structure data in a very specific (and, for this reason, specifically inflexible) format.
The shift to software-as-a-service (SaaS) and cloud helped to chip away at the dominance of traditional DM; the phenomenon of big data has altogether upended it.
Cloud services, especially, are built on a new app dev paradigm — representational state transfer, or REST — that’s fundamentally different from the client-server paradigm that underpins traditional DM and its architectural linchpin, the data warehouse. REST-ful apps aren’t transactional in the traditional sense. Instead of recording transactions in a single back-end database, they use asynchronous messaging to “transfer” a representation of the state of data. REST-ful apps can and do record to databases, but in many if not most cases, they’re also communicating with and transferring data between and among other event-driven REST-ful apps.
Sood describes Reltio as a “modern” data management platform. It combines homegrown MDM technology with a distributed, fault-tolerant database substrate (Apache Cassandra), a general-purpose parallel processing platform (Hadoop, running on Cassandra), Apache Spark for streaming, a natural-language search facility (via Apache Lucene), and a homegrown graphing capability.
“Our architecture is built from the ground up on a big data foundation. Instead of taking the traditional route of using the RDBMS-type of capability to define the data model and then use that as a starting point … we took a columnar-oriented approach [so that] we’re able to introduce new entities, new attribution details, many-to-many connections at scale without having to go back and extend or add to the data model,” he says. (Cassandra itself is technically a partitioned row store. Because of how it manages and replicates data, i.e., as tables with optional columns, it’s commonly described as a “column-oriented” store.) “Users just have to think about the logical concepts that they’re trying to bring to life and aggregate the information for,” Sood continues.
“In the past, let’s say that you define a data model where you had one e-mail [address] per person in a database, and tomorrow you decide that everybody has to have multiple e-mail addresses. You have to go back and redefine the structure and extend it and accommodate [your changes] by changing the data model. In our case, multi-value attribution is a normal thing.”
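Sood's e-mail example can be sketched in a few lines of Python. This is an illustration only, not Reltio's data model; the helper names (`new_profile`, `add_value`) and the sample person are hypothetical. The point is that when every attribute holds a list of values, a second e-mail address is just another value rather than a structural change.

```python
from collections import defaultdict

# Fixed-schema style: one e-mail slot per person. Supporting a second
# address means changing the record definition itself.
fixed_record = {"name": "Pat Lee", "email": "pat@example.com"}


def new_profile():
    """Return a profile in which every attribute maps to a list of values."""
    return defaultdict(list)


def add_value(profile, attribute, value):
    """Append a value to an attribute, skipping exact duplicates."""
    if value not in profile[attribute]:
        profile[attribute].append(value)
    return profile


person = new_profile()
add_value(person, "name", "Pat Lee")
add_value(person, "email", "pat@example.com")
add_value(person, "email", "pat.lee@example.org")  # second address, no redesign
add_value(person, "twitter", "@patlee")            # brand-new attribute
```

In a relational design the same change would typically mean adding a child table or altering a column, which is the rework Sood is contrasting against.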
Deus Ex Machina?
Companies such as Cloudera Inc. (with its vision of a Hadoop-centered Enterprise Data Hub), IBM Corp. (with its Watson Foundations platform), and Teradata (with its Unified Data Architecture) — just to name a few — are spending tens (if not hundreds) of millions of R&D dollars on the problems Sood describes. How can Reltio hope to convince potential customers that a tiny start-up player has them all licked? (As a BI industry research analyst told BI This Week: “I read their [Reltio’s] marketing, and I’m not sure what they do. I know what they say they do. I just don’t know how they do it.”)
Sood points to Reltio’s Siperian pedigree (Informatica acquired Siperian, a highly respected MDM pure-play, in early 2010 to anchor its own MDM offering) and notes that he and other principals have been grappling with some of these problems for decades. (Sood himself holds an MDM-related patent, US 20090024589 A1, for a “data integration system that can easily integrate data from several disparate sources [and] can flexibly manage data.”)
“Some of us [at Reltio] were responsible for the previous generation of master data technologies that you see in the market today. We used that background to solve the [master data] problem at big data scale, [which requires] standardization, augmentation, enrichment, and normalization of information that’s coming in from multiple sources,” he says. “That forms a foundation inside the product we have built out, so that every data-driven app has MDM capabilities built into it.”
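As a rough illustration of what standardization and normalization across sources involve, the following Python sketch cleans two records and merges them into a single view. It is a simplified assumption, not Reltio's matching logic: the field names, the `standardize` and `merge` helpers, and the choice of exact e-mail as the match key are all hypothetical, and production MDM systems generally rely on far more sophisticated matching.

```python
def standardize(record):
    """Normalize whitespace, case, and phone formatting for comparison."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
        "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
    }


def merge(records):
    """Group standardized records by e-mail; later sources fill empty fields."""
    merged = {}
    for raw in records:
        rec = standardize(raw)
        key = rec["email"]
        if not key:
            continue  # no match key available in this simplified sketch
        current = merged.setdefault(key, {})
        for field, value in rec.items():
            if value and not current.get(field):
                current[field] = value
    return merged


crm = {"name": "  pat   LEE ", "email": "Pat@Example.com"}
billing = {"name": "Pat Lee", "email": "pat@example.com ", "phone": "(555) 010-0199"}

golden = merge([crm, billing])
```

Here the CRM record supplies the name and the billing record contributes the phone number, so the merged entry is more complete than either source alone.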