The Coder's Catnip

Follow an aspiring developer's adventures in programming, data science, and machine learning. From early gaming communities to exploring new career paths, my fascination with coding eventually drew me back to computer science. Although programming seemed challenging at first, its creative possibilities kept me motivated. Now that I've returned to school, I'm fully embracing this journey. Join me as I chronicle my path as a lifelong learner – sharing projects, mindset shifts, and resources that help me progress from student to coder. You'll find motivational highs, frustrating lows, and everything in between. My goal is to pass on tools, inspiration, and community to empower aspiring developers. Let's explore this endless world of coding together!


TheCodersCatnip – Intro to Data Analytics Chapter 1 Notes

What is Big Data?

  • Traditional techniques fall short when data grows too:
    • Large in volume (massive scales)
    • Fast in velocity (high ingestion rates)
    • Diverse in variety (structured, unstructured, semi-structured formats)
  • Requires new solutions to:
    • Combine multiple, disparate, complex datasets
    • Process huge volumes of multi-format unstructured/semi-structured data
    • Extract insights quickly from rapid data flows
  • Uses and benefits:
    • Optimize operations by pinpointing inefficiencies
    • Gain actionable intelligence for competitive advantages
    • Discover new products, services, markets
    • Build predictive models for forecasting
    • Detect faults, failures, fraud, anomalies
    • Maintain comprehensive, detailed records
    • Make data-driven decisions from deep insights
    • Enable scientific research and discoveries
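As a small illustration of the "Detect faults, failures, fraud, anomalies" use case, here is a minimal sketch that flags outliers with a z-score. The transaction amounts, the function name `find_anomalies`, and the cutoff of 2 standard deviations are hypothetical choices for illustration; real fraud detection at big-data scale relies on far more robust methods.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    The threshold of 2.0 is an assumed cutoff, not a standard.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / stdev > threshold
    ]


# Hypothetical transaction amounts; the 950.0 entry is the suspicious one.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 950.0, 43.0]
print(find_anomalies(amounts))
```

A z-score is only one simple heuristic: an extreme value also inflates the standard deviation, so on small samples an outlier can partly mask itself.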

Data Analysis vs Data Analytics

  • Data Analysis
    • The process of examining data
    • To find patterns, relationships, insights, trends
  • Data Analytics
    • Broader discipline that includes analysis
    • Encompasses management of the full data lifecycle:
      • Collection, organization, storage
      • Integration, cleansing, governance
      • Mining, analysis, visualization
    • In business: Lowers costs, enables strategic decisions
    • In science: Identifies causes, improves predictions
    • In services: Optimizes operations, improves quality

Four Categories of Analytics

  1. Descriptive Analytics
    • Summarizes what has happened
    • Provides context and information on past events
    • Techniques: Reports, visualizations, KPI monitoring
    • Example: What were sales by product/region?
  2. Diagnostic Analytics
    • Identifies the causes behind past outcomes
    • Techniques: Data mining, drill-downs, discovery
    • Example: Why did sales decline in a region?
  3. Predictive Analytics
    • Forecasts what is likely to happen in the future
    • Techniques: Statistical modeling, machine learning, data mining
    • Example: What are the chances of customer churn?
  4. Prescriptive Analytics
    • Recommends actions to achieve optimal outcomes
    • Techniques: Rules, constraints, optimization, simulations
    • Example: How can marketing ROI be maximized?

Figure: The operational systems on the left are queried via descriptive analytics tools to generate the reports or dashboards on the right.
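To connect two of these categories to code, here is a minimal sketch using only the standard library: a descriptive summary ("What were sales by region?") and the simplest possible predictive model, an ordinary least-squares trend line extrapolated one month ahead. The `Sale` record type, the sales figures, and the helper names are hypothetical; real predictive analytics would use proper statistical or machine learning tooling and validate the model.

```python
from collections import defaultdict
from typing import NamedTuple


class Sale(NamedTuple):
    month: int
    product: str
    region: str
    revenue: float


# Hypothetical sales records for three months.
sales = [
    Sale(1, "widget", "north", 120.0),
    Sale(1, "gadget", "south", 80.0),
    Sale(2, "widget", "north", 130.0),
    Sale(2, "gadget", "south", 90.0),
    Sale(3, "widget", "south", 130.0),
    Sale(3, "gadget", "north", 110.0),
]


def revenue_by(records, field):
    """Descriptive: total revenue grouped by one field (e.g. 'region')."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, field)] += record.revenue
    return dict(totals)


def linear_forecast(xs, ys, next_x):
    """Predictive (simplest case): fit y = a + b*x by ordinary least squares,
    then extrapolate the fitted line to next_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * next_x


by_region = revenue_by(sales, "region")  # descriptive: what happened?
monthly = revenue_by(sales, "month")
months = sorted(monthly)
next_month = linear_forecast(months, [monthly[m] for m in months], next_x=4)
print(by_region)
print(f"Forecast for month 4: {next_month:.1f}")  # predictive: what is likely next?
```

Diagnostic analytics would then drill into the grouped totals (for example, splitting one region by product) to explain why a number moved.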

Business Intelligence (BI)

  • Applies analytics across the entire enterprise
  • Provides insights into organizational performance
  • Consolidates data into enterprise data warehouses
  • Analytics queries and reports displayed on:
    • Dashboards
    • Scorecards
    • Data visualizations
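To make the warehouse-and-dashboard idea concrete, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for an enterprise data warehouse. The table name `fact_sales`, the two source systems, and the figures are hypothetical; a real warehouse runs on a dedicated platform, but the pattern of consolidating rows and then running aggregate queries for a dashboard is the same.

```python
import sqlite3

# In-memory SQLite database standing in for a hypothetical data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, quarter TEXT, revenue REAL)")

# Rows consolidated from two hypothetical operational systems.
crm_rows = [("north", "Q1", 500.0), ("south", "Q1", 300.0)]
erp_rows = [("north", "Q2", 650.0), ("south", "Q2", 280.0)]
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", crm_rows + erp_rows)

# An aggregate query of the kind a dashboard tile might run.
dashboard = conn.execute(
    "SELECT quarter, SUM(revenue) FROM fact_sales GROUP BY quarter ORDER BY quarter"
).fetchall()
conn.close()
print(dashboard)
```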

Key Performance Indicators (KPIs)

  • Quantifiable metrics to gauge performance
  • Aligned to strategic goals and objectives
  • Track progress and identify issues
  • Visualized on dashboards with target thresholds
  • Examples: Sales revenue, customer churn, call times
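At its core, a KPI is a metric compared against a target. The sketch below computes a churn rate and checks it, along with a revenue figure, against target thresholds the way a dashboard indicator might. The helper names, targets, and quarterly numbers are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of customers lost during the period."""
    return customers_lost / customers_at_start


def kpi_status(value, target, higher_is_better=True):
    """Compare a KPI with its target threshold, like a dashboard indicator."""
    on_track = value >= target if higher_is_better else value <= target
    return "on track" if on_track else "needs attention"


# Hypothetical figures for one quarter.
churn = churn_rate(customers_at_start=2000, customers_lost=90)
revenue = 1_150_000

# For churn, lower is better; for revenue, higher is better.
print(f"Churn {churn:.1%}: {kpi_status(churn, target=0.05, higher_is_better=False)}")
print(f"Revenue {revenue:,}: {kpi_status(revenue, target=1_200_000)}")
```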

The 5 V’s of Big Data

  1. Volume
    • Massive scales and quantities of data
    • Terabytes, petabytes or more
    • Requires special storage and processing
  2. Velocity
    • High speed of data inflows/feeds
    • Streaming data, real-time updates
    • Demands rapid ingestion and processing
  3. Variety
    • Different data formats and structures
    • Structured, unstructured, semi-structured
    • Challenges in integration and processing
  4. Veracity
    • Uncertainty about how reliable and trustworthy the data is
    • Data quality, accuracy issues
    • Cleansing and transformation needed
  5. Value
    • Extracting usefulness and benefits
    • Depends on data quality and the questions asked
    • Also depends on how data is stored, cleansed, and analyzed
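Veracity problems usually show up as duplicates, missing values, and impossible readings. Here is a minimal cleansing sketch that rejects such records and notes why; the sensor data, the `cleanse` function, and the plausible range of -50 to 60 °C are assumptions for illustration.

```python
# Hypothetical raw sensor readings with typical quality problems.
raw = [
    {"id": 1, "temp_c": 21.5},
    {"id": 2, "temp_c": None},    # missing value
    {"id": 3, "temp_c": -999.0},  # sentinel / physically implausible
    {"id": 1, "temp_c": 21.5},    # duplicate of id 1
    {"id": 4, "temp_c": 22.1},
]


def cleanse(records, low=-50.0, high=60.0):
    """Split records into clean ones and (record, reason) rejections.

    The low/high bounds are an assumed plausible temperature range.
    """
    seen = set()
    clean, rejected = [], []
    for record in records:
        temp = record.get("temp_c")
        if record["id"] in seen:
            rejected.append((record, "duplicate"))
        elif temp is None:
            rejected.append((record, "missing"))
        elif not low <= temp <= high:
            rejected.append((record, "out of range"))
        else:
            seen.add(record["id"])
            clean.append(record)
    return clean, rejected


clean, rejected = cleanse(raw)
print([r["id"] for r in clean], [reason for _, reason in rejected])
```

Keeping the rejection reasons, rather than silently dropping rows, makes it easier to judge how much the cleansing step affects the value of any later analysis.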

Data Types

  1. Structured Data
    • Conforms to defined data model/schema
    • Database tables, CSV/Excel files
    • Relational data from enterprise systems
  2. Unstructured Data
    • Does not follow a pre-defined structure
    • Text files, documents, PDFs
    • Media files like images, audio, video
    • Requires special processing
  3. Semi-Structured Data
    • Has some organizational properties (e.g., tags or keys) but no rigid schema
    • Hierarchical, nested constructs
    • XML, JSON, log files, data feeds
  4. Metadata
    • Data that describes other data
    • Technical and business metadata
    • Essential for tracking data lineage and context
    • Records sources, processing steps, definitions
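The differences between these data types become obvious as soon as you try to read them. The sketch below uses only the Python standard library: `csv` for structured data, `re` plus `Counter` for a bit of unstructured text, and `json` for nested semi-structured records, then builds a small metadata dictionary describing the CSV data. All sample data and metadata fields are hypothetical.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: every CSV row follows the same schema (the header line).
csv_text = "order_id,product,amount\n1,widget,19.99\n2,gadget,5.00\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_amount = sum(float(row["amount"]) for row in rows)

# Unstructured: free text has no schema, so even counting words needs processing.
review = "Great widget. Arrived late, but the widget works great!"
word_counts = Counter(re.findall(r"[a-z]+", review.lower()))

# Semi-structured: JSON records nest and may omit fields.
json_text = """[
  {"order_id": 3, "customer": {"name": "Ada", "tags": ["vip"]}},
  {"order_id": 4, "customer": {"name": "Lin"}}
]"""
orders = json.loads(json_text)
# .get() with a default handles records that lack a "tags" field.
vip_orders = [o["order_id"] for o in orders if "vip" in o["customer"].get("tags", [])]

# Metadata: data that describes the CSV data above (hypothetical fields).
metadata = {
    "source": "orders.csv",
    "columns": list(rows[0].keys()),
    "row_count": len(rows),
}
print(total_amount, word_counts["widget"], vip_orders, metadata)
```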

