Understanding Big Data
Big data refers to data sets characterized by high volume, velocity, and variety, generated at a rapid rate by sources such as social media, sensors, and business transactions. Working with big data means using technologies, tools, and methodologies to manage, process, and analyze this data in order to extract useful insights and gain a competitive advantage.
Definition of Data Modeling
Data modeling is the process of defining and diagramming the data model of a software or database system, including the attributes of the data and how the data flows. It requires close collaboration between modelers, business stakeholders, and end users. Data modeling matters because a comprehensive data model helps structure a database properly, reducing errors and redundancy while improving performance and resource utilization.
Benefits of Data Modeling
Data modeling’s overarching benefit is that it sets up a data strategy for success by forcing designers to think through what the real data problem is. It begins with a data model structure that gets the most value from the data by describing the data and its flows accurately and completely. Once that is achieved, the following benefits follow.
- Avoid and reduce errors in development by addressing large design challenges before implementation
- Improve documentation consistency through accurate and comprehensive definitions
- Increase performance through thoroughly planned, robust, well-implemented models
- Reduce the burden of data mapping
- Improve collaboration and communication between technical and business personnel
- Accelerate the process of database design at all three data modeling levels
Types of Data Models
Data modelers rely on three models, or views, of the data, each at a different level of abstraction, to understand how their data is structured.
Conceptual model: The conceptual model starts from the overall business content and uses it to shape the structure of the data model. It identifies the entities that will form the foundational elements and informs the definition of the data structures. Its focus is on entities, their characteristics, and the relationships between them.
Logical model: The logical model represents the elements described in the conceptual model in greater technical detail, defining data structures, keys, data types, and attributes. These details do not include technical specifications for any particular database. At this stage, the logical model can serve as a blueprint for building the data structures in any database product.
Physical model: The logical model is then translated into a physical model of the database application. The physical model is a blueprint tailored to the specific database in which it will be implemented.
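The three levels can be sketched side by side. This is a minimal illustration, not a prescribed method: the Customer and Order entities, their attributes, and the choice of SQLite as the target database are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level: only entities and the relationships between them.
# (Customer and Order are hypothetical entities chosen for illustration.)
CONCEPTUAL = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}


# Logical level: attributes, data types, and keys, with no database-specific detail.
@dataclass
class Customer:
    customer_id: int  # primary key
    name: str


@dataclass
class Order:
    order_id: int     # primary key
    customer_id: int  # foreign key to Customer
    total: float


# Physical level: the logical model translated into DDL for one specific
# database product (SQLite is an assumption here).
PHYSICAL_DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
"""


def build_physical_model() -> sqlite3.Connection:
    """Create the physical tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(PHYSICAL_DDL)
    return conn
```

The same logical classes could be mapped to a different database product by swapping only the DDL, which is the point of keeping the logical level product-neutral.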
Data Modeling Structures
Hierarchical Data Model
The hierarchical data model is arranged in a treelike configuration of parents and children. Each child has exactly one parent, while a parent can have many children. The model therefore expresses one-to-many relationships, and each element can be reached by only one path.
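A small sketch of the single-parent rule, assuming a hypothetical Node class and an invented company hierarchy. Because every node stores exactly one parent, walking upward from any node yields its one and only access path.

```python
from typing import List, Optional


class Node:
    """One element in a hierarchical model (hypothetical helper class)."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent            # at most one parent per node
        self.children: List["Node"] = []

    def add_child(self, name: str) -> "Node":
        """Attach a new child; the parent-to-children link is one-to-many."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Return the unique path from the root down to this node."""
        names = []
        node: Optional["Node"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Invented example data.
company = Node("Company")
sales = company.add_child("Sales")
emea = sales.add_child("EMEA")
```

Calling emea.path() walks the parent links and returns the names from the root to that node.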
Relational Data Model
The relational data model largely supplanted the hierarchical model. It organizes data into tables in which columns record attributes and rows represent individual instances of an entity. Relational databases can support one-to-one, one-to-many, and many-to-many relationships.
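As a rough sketch of how these relationships look in tables, the example below uses Python's built-in sqlite3 module. The schema (customer, customer_order, product, order_item) and the sample rows are invented for illustration: a foreign key column gives the one-to-many link, and a junction table gives the many-to-many link.

```python
import sqlite3

# Hypothetical schema; none of these table names come from the text above.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- One-to-many: each order row points at exactly one customer.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- Many-to-many: a junction table pairs orders with products.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (order_id, product_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create the tables in memory and load a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
    conn.executemany("INSERT INTO customer_order VALUES (?, ?)", [(10, 1), (11, 1)])
    conn.executemany("INSERT INTO product VALUES (?, ?)", [(100, "Keyboard"), (101, "Mouse")])
    conn.executemany("INSERT INTO order_item VALUES (?, ?)", [(10, 100), (10, 101), (11, 100)])
    return conn


def products_for_customer(conn: sqlite3.Connection, customer_id: int) -> list:
    """Follow customer -> orders -> order items -> products with joins."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM customer_order AS o
        JOIN order_item AS oi ON oi.order_id = o.order_id
        JOIN product AS p ON p.product_id = oi.product_id
        WHERE o.customer_id = ?
        ORDER BY p.name
        """,
        (customer_id,),
    )
    return [name for (name,) in rows]
```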
Dimensional Data Model
Dimensional data models are used for data warehouses and data marts because they support business intelligence. They consist of two types of tables: fact tables, which contain data about transactions and events, and dimension tables, which list attributes of the entities referenced in the fact table. These models are designed to speed up queries and reporting.
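A minimal star-schema sketch, again using sqlite3. The table names (fact_sales, dim_product, dim_date), the columns, and the figures are all made up for the example; the point is only that the fact table holds measurable events while the dimension tables hold the descriptive attributes used to group them.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    category   TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    amount     REAL NOT NULL
);
"""


def build_warehouse() -> sqlite3.Connection:
    """Create the star schema in memory and load invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Hardware"), (2, "Software")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2023), (2, 2024)])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0), (2, 1, 75.0)],
    )
    return conn


def sales_by_category(conn: sqlite3.Connection) -> list:
    """Aggregate the facts, grouped by an attribute from a dimension table."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    )
    return rows.fetchall()
```

A typical business intelligence query has this shape: sum or count a measure in the fact table, grouped by one or more dimension attributes.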
Entity-relationship (ER) Data Models
Entity-relationship data models resemble relational data models, but they are modeled closely on real-world entities and their relationships rather than on the underlying database. A relational data model can be produced from the ER data model.
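One simple way to picture the step from an ER model to a relational one is sketched below. It is an assumption-laden illustration, not a standard algorithm: the Entity and Relationship classes are hypothetical, every key is treated as an integer, every other attribute as text, and only one-to-many relationships are handled, each becoming a foreign key column on the "many" side.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    name: str
    key: str                # name of the identifying attribute
    attributes: List[str]   # remaining attributes (all treated as TEXT here)


@dataclass
class Relationship:
    one: str    # entity on the "one" side
    many: str   # entity on the "many" side


def to_relational(entities: List[Entity], relationships: List[Relationship]) -> List[str]:
    """Return one CREATE TABLE statement per entity.

    Each one-to-many relationship adds a foreign key column to the
    table of the entity on the "many" side.
    """
    by_name = {e.name: e for e in entities}
    statements = []
    for entity in entities:
        columns = [f"{entity.key} INTEGER PRIMARY KEY"]
        columns += [f"{attr} TEXT" for attr in entity.attributes]
        for rel in relationships:
            if rel.many == entity.name:
                parent = by_name[rel.one]
                columns.append(
                    f"{parent.key} INTEGER REFERENCES {parent.name}({parent.key})"
                )
        statements.append(f"CREATE TABLE {entity.name} ({', '.join(columns)})")
    return statements


# Invented ER model: a department has many employees.
er_entities = [
    Entity("department", "department_id", ["name"]),
    Entity("employee", "employee_id", ["name"]),
]
er_relationships = [Relationship(one="department", many="employee")]
```

Many-to-many relationships would need an extra junction table, which this sketch leaves out to stay short.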
Object-oriented Data Models
Object-oriented data models describe data as objects organized in a class hierarchy, with attributes inherited from parent classes. This approach works well for multimedia and hypertext databases.
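A short sketch of a class hierarchy with inherited attributes, using invented multimedia classes (MediaObject, Image, Video). The subclasses pick up title and uri from the parent and add only what is specific to them.

```python
from dataclasses import dataclass


@dataclass
class MediaObject:
    """Hypothetical base class: attributes shared by every media item."""
    title: str
    uri: str

    def describe(self) -> str:
        # type(self).__name__ is the concrete subclass name at runtime.
        return f"{self.title} ({type(self).__name__})"


@dataclass
class Image(MediaObject):
    """Inherits title and uri, adds image-specific attributes."""
    width: int
    height: int


@dataclass
class Video(MediaObject):
    """Inherits title and uri, adds a video-specific attribute."""
    duration_seconds: float


logo = Image("Logo", "logo.png", 64, 64)
intro = Video("Intro", "intro.mp4", 12.5)
```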
Steps in the Data Modeling Process
The data modeling process consists of six steps, as outlined by Peter Aiken, associate professor of information systems, in a 2019 Dataversity webinar.
- Identify the business entities that are represented in the data set.
- Identify key properties for each entity to differentiate between them.
- Create a draft entity-relationship model to show how entities are connected.
- Identify the data attributes that need to be incorporated into the model.
- Map the attributes to entities to illustrate the data’s business meaning.
- Finalize the data model and validate its accuracy.
A seventh step belongs at the end: continuously revise and update the data models as the data assets and business needs shift.
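The steps above can be pictured with a toy example. Everything here is hypothetical: the entities, keys, attributes, and the validate_model helper are invented to show what each step produces, and the checks are only one possible reading of "validate its accuracy".

```python
from typing import Dict, List, Tuple

# Steps 1 and 2: business entities and the key property that identifies each.
entities: Dict[str, str] = {"Customer": "customer_id", "Order": "order_id"}

# Step 3: draft relationships as (one side, many side) pairs.
relationships: List[Tuple[str, str]] = [("Customer", "Order")]

# Step 4: data attributes that need to appear in the model.
attributes: List[str] = ["customer_id", "customer_name", "order_id", "order_date"]

# Step 5: map each attribute to the entity it describes.
mapping: Dict[str, str] = {
    "customer_id": "Customer",
    "customer_name": "Customer",
    "order_id": "Order",
    "order_date": "Order",
}


def validate_model(
    entities: Dict[str, str],
    relationships: List[Tuple[str, str]],
    attributes: List[str],
    mapping: Dict[str, str],
) -> List[str]:
    """Step 6: return a list of problems; an empty list means the checks passed."""
    problems = []
    for attr in attributes:
        if mapping.get(attr) not in entities:
            problems.append(f"attribute {attr!r} is not mapped to a known entity")
    for entity, key in entities.items():
        if mapping.get(key) != entity:
            problems.append(f"entity {entity!r} is missing its key {key!r}")
    for one, many in relationships:
        for name in (one, many):
            if name not in entities:
                problems.append(f"relationship refers to unknown entity {name!r}")
    return problems
```

Rerunning a check like this whenever entities or attributes change is one lightweight way to support the ongoing revision described in the seventh step.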
Data Modeling FAQs
What is data modeling?
The process of designing and drafting a visual diagram of a system, software package, or database. The diagram should define connections, data attributes, and flows using text, symbols, and lines.
What are the types of data modeling?
The three most commonly used data models are relational models, dimensional models, and entity-relationship models. Less commonly used models are hierarchical, object-oriented, multi-valued, and network.
What is the data modeling process?
The six steps in the data modeling process are:
- Identify the business entities that are represented in the data set.
- Identify key properties for each entity to differentiate between them.
- Create a draft entity-relationship model to show how entities are connected.
- Identify the data attributes that need to be incorporated into the model.
- Map the attributes to entities to illustrate the data’s business meaning.
- Finalize the data model and validate its accuracy.
Why is data modeling important?
Data modeling is a collaborative planning process that is essential to ensuring the data model accurately reflects its intended use. Inaccuracies can lead to flaws in the database design, which in turn cause further errors and complications.
What are the three levels of data abstraction?
The three layers of data abstraction are the conceptual layer, the logical layer, and the physical layer. The conceptual layer lays the groundwork for the database by defining at a high level the elements to be modeled. The logical layer adds more detail and serves as a master blueprint that can be used to develop the model for a database. The physical layer translates the logical layer into the strict requirements of a specific database. Attributes identified on the logical layer are fully defined on the physical layer.