Standardization vs. normalization

Data standardization and data normalization both intend to improve data quality but do so by different approaches. Data standardization enforces rules and definitions on data, whereas database normalization is designing a database structure to reduce data redundancy and improve data integrity. Because this means creating tables and relationships in ways that eliminate redundancy and inconsistencies, normalizing a database after data entry is not advisable as it can further cause integrity issues. This means that normalization is typically first considered during the design phase of database development.

Normalization contributes to the elimination of database designs that produce modification anomalies. When a database design that breaks normalization rules is modified, either by insertion, deletion or update, certain side-effects may occur due to its structure.

  • An insertion anomaly can occur in circumstances where data entry cannot be recorded because it is simply in violation of the database structure. Such as, at the point of data insertion, data is missing, but the table requires complete data, then the insertion fails.
  • An update anomaly occurs in circumstances when there may be multiple records for the same entity. If data is updated, it must be updated in all congruent entries otherwise the system will soon add its own inconsistencies into its data.
  • deletion anomaly can occur when dependencies are not normalized. For instance, in a database where courses are attached to teachers, unassigning a teacher mid-semester from a course could very well delete those course entries from the database and subsequently remove those courses from student curriculums.

Normalization is the process of structuring a database using “normal forms” that enforce an internal logic which protects against losing data integrity. Normal forms are progress design guidelines, meaning that each normalization level builds upon the previous normal forms, enforcing stricter data integrity and reinforcing good data standards.

An unnormalized database is labeled UNF. Applying the first normal form, or 1NF, to a database eliminates nesting relations into tables, and instead replaces it with the use of foreign keys to increase flexibility and data independence. After establishing 1NF, the second normal form, or 2NF, can be applied, which demands that non-prime attributes depend only on candidate keys.  The third normal form, or 3NF, is achieved when all attributes are functionally dependent solely on the primary key. At this point, there are further normal forms 4NF, 5NF, and 6NF, but applying up to the third normal form on database designs categorizes it as sufficiently ‘normalized’, and adequately free from any modification anomalies. Other normal forms exist in special or academic uses.