Data redundancy occurs when the same data is stored more than once. This is problematic because redundant data must be updated in multiple places; if one location is overlooked, the data becomes inconsistent. Redundant data also requires additional storage space, raising overall storage costs. File-based systems were notorious for redundant data such as names: one file might store a first name as William while a second file stores the same person as Bill or Will, leaving the data inconsistent. Ideally, each piece of data should be stored once and only once.
Data integrity is compromised when the same data is stored redundantly in multiple places. For example, a customer’s first name may be stored as “William” in the customer table, and again as “Bill” in the sales table. This redundancy creates several risks:
![](https://thecoderscatnip.com/wp-content/uploads/2023/09/example-of-data-integrity.png?w=560)
- Increased storage needs – Storing the same data multiple times unnecessarily takes up more storage space.
- Inconsistency – If the name is updated in one place but not others, the records become inconsistent.
- Difficulty updating – Changes must be applied across all redundant copies, or inconsistencies result.
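The inconsistency risk can be illustrated with a minimal sketch. It assumes a hypothetical in-memory layout in which the first name is copied into every sales record; the names `customers`, `sales`, `rename_customer`, and `find_inconsistent_sales` are made up for this example and do not come from any particular system.

```python
# Hypothetical denormalized data: the first name is copied into each sale.
customers = {1: {"first_name": "William"}}
sales = [
    {"sale_id": 100, "customer_id": 1, "first_name": "William"},
    {"sale_id": 101, "customer_id": 1, "first_name": "William"},
]


def rename_customer(customer_id, new_name):
    """Update the customer table only, overlooking the copies in sales."""
    customers[customer_id]["first_name"] = new_name


def find_inconsistent_sales():
    """Return the IDs of sales whose copied name differs from the customer table."""
    return [
        sale["sale_id"]
        for sale in sales
        if sale["first_name"] != customers[sale["customer_id"]]["first_name"]
    ]


rename_customer(1, "Bill")
inconsistent = find_inconsistent_sales()
print(inconsistent)
```

After the rename, every sale that still carries the old copy of the name is reported as inconsistent, which is the update problem described in the list above.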
Best practices for avoiding redundant data include:
- Normalizing databases so each piece of data has a single canonical representation.
- Using unique IDs to reference data instead of duplicating it. Rather than storing “William” in multiple tables, store the customer’s unique ID and look up the name in the customer table.
- Establishing data governance policies on allowable redundancies. Some replication may be needed for performance.
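The first two practices can be sketched with Python's built-in `sqlite3` module. This is one possible layout under stated assumptions, not a prescribed schema: the `customer` and `sale` tables and their columns are hypothetical. The name lives only in `customer`, and each sale refers to it by `customer_id`.

```python
import sqlite3

# In-memory database for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL
    );
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL
    );
""")

conn.execute("INSERT INTO customer (customer_id, first_name) VALUES (1, 'William')")
conn.executemany(
    "INSERT INTO sale (sale_id, customer_id, amount) VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 1, 40.0)],
)

# The name is stored once, so a single UPDATE changes it everywhere it is used.
conn.execute("UPDATE customer SET first_name = 'Bill' WHERE customer_id = 1")

# Sales obtain the name through a join instead of keeping their own copy.
rows = conn.execute("""
    SELECT s.sale_id, c.first_name
    FROM sale AS s
    JOIN customer AS c ON c.customer_id = s.customer_id
    ORDER BY s.sale_id
""").fetchall()
print(rows)
```

Because the sales rows hold only the ID, no copy of the name can be overlooked during an update. The trade-off is the join at read time, which is one reason some controlled replication may still be allowed for performance.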
In summary, redundant data storage undermines data integrity by increasing the risk of inconsistencies, wasted space, and difficult updates. Following normalization and unique ID best practices can minimize unnecessary duplication. Careful data governance is key to maintaining integrity across the entire data ecosystem.