The Coder's Catnip

Follow an aspiring developer's adventures in programming, data science, and machine learning. From early gaming communities to exploring new career paths, my fascination with coding eventually drew me back to computer science. Although programming seemed challenging at first, its creative possibilities continued to motivate me. Now, after returning to school, I'm fully embracing this journey. Join me as I chronicle my path as a lifelong learner – sharing, projects, mindset shifts, and resources that help me progress from student to coder. You'll find motivational highs, frustrating lows, and everything in between. My goal is to pass on tools, inspiration, and community to empower aspiring developers. Let's explore this endless world of coding together!


Data Integrity

Data Integrity occurs when the same data is stored more than once. This is problematic because redundant data must be updated in multiple places. If one location is overlooked, the data becomes inconsistent. Redundant data also requires additional space to store it raising overall storage costs. File systems were notorious for having redundant data like names. One file may store a first name as William and a second file may store it redundantly as Bill or Will. The data has become inconsistent. Ideally each piece of data should be stored once and only once.

Data integrity is compromised when the same data is stored in multiple places redundantly. For example, a customer’s first name may be stored as “William” in the customer table, and again as “Bill” in the sales table. This redundancy creates several risks:

Example of redundant data vs normal data
  • Increased storage needs – Storing the same data multiple times unnecessarily takes up more storage space.
  • Inconsistency – If the name is updated in one place but not others, the records become inconsistent.
  • Difficulty updating – Changes must be applied across all redundant copies, or inconsistencies result.

Best practices for avoiding redundant data include:

  • Normalizing databases so each piece of data has a single canonical representation.
  • Using unique IDs to reference data instead of duplicating it. So rather than storing “William” in multiple tables, store his unique customer ID.
  • Establishing data governance policies on allowable redundancies. Some replication may be needed for performance.

In summary, redundant data storage impacts data integrity by increasing the risks of inconsistencies, wasted space, and difficult updates. Following normalization and unique ID best practices can minimize unnecessary duplication. Careful data governance is key to enabling integrity across the entire data ecosystem.

About Me

I’m always on the lookout for fresh learning materials. Whether it’s blogging, data science, productivity, personal growth, AI, or coding. If that piques your interest, sign up for my Newsletter and connect with me on social media to stay updated!
(ノ◕ヮ◕)ノ*:・゚✧


Newsletter

Blog at WordPress.com.