Data_lab/README.md
2026-01-11 22:04:49 +00:00


🧪 Data Lab

Learning • Experimentation • Data Engineering Projects

Note

Code, datasets, and designs may change as I refine implementations and adopt best practices.


Table of Contents
  1. About
  2. Projects
  3. Tools & Technologies
  4. Connect With Me

About

Data Lab is a personal sandbox for building and iterating on data engineering projects.

The focus is on:

  • Designing reliable data pipelines
  • Applying analytics engineering principles
  • Working with batch data, lakehouse patterns, and validation
  • Learning by building realistic systems rather than toy examples

Projects range from small experiments to end-to-end pipelines using production-style tools.

(back to top)


Projects

Each project lives in its own repository.

Banking Transaction Pipeline

  • Spark-based ETL pipeline
  • Bronze → Silver → Gold lakehouse design
  • Data quality enforcement and validation
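
To make the Bronze → Silver → Gold idea concrete, here is a minimal sketch in plain Python (standard library only, not Spark, so it runs anywhere). It is not the project's actual code: the record fields (`txn_id`, `account`, `amount`), the sample rows, and the function names `to_silver` and `to_gold` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical raw records as they might land in the bronze layer (all strings).
BRONZE = [
    {"txn_id": "t1", "account": "A-100", "amount": "250.00"},
    {"txn_id": "t2", "account": "A-100", "amount": "-40.50"},
    {"txn_id": "t2", "account": "A-100", "amount": "-40.50"},  # duplicate
    {"txn_id": "t3", "account": "", "amount": "19.99"},        # missing account
    {"txn_id": "t4", "account": "B-200", "amount": "abc"},     # unparseable amount
    {"txn_id": "t5", "account": "B-200", "amount": "75.25"},
]


def to_silver(bronze_rows):
    """Validate, type, and deduplicate bronze rows.

    Returns (clean_rows, rejected), where rejected pairs each bad row
    with the reason it failed, so nothing is dropped silently.
    """
    clean, rejected, seen = [], [], set()
    for row in bronze_rows:
        if not row["account"]:
            rejected.append((row, "missing account"))
            continue
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            rejected.append((row, "invalid amount"))
            continue
        if row["txn_id"] in seen:
            rejected.append((row, "duplicate txn_id"))
            continue
        seen.add(row["txn_id"])
        clean.append(
            {"txn_id": row["txn_id"], "account": row["account"], "amount": amount}
        )
    return clean, rejected


def to_gold(silver_rows):
    """Aggregate validated transactions into a net total per account."""
    totals = defaultdict(Decimal)  # Decimal() starts each account at 0
    for row in silver_rows:
        totals[row["account"]] += row["amount"]
    return dict(totals)


silver, rejected = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)
for row, reason in rejected:
    print(f"rejected {row['txn_id']}: {reason}")
```

In this sketch three of the six sample rows reach the silver layer, and the other three are kept alongside a rejection reason. In a Spark version, the same steps would typically become DataFrame filters, a deduplication step, and a grouped aggregation.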

(back to top)


Tools & Technologies

  • Languages: Python, SQL, Java
  • Processing: Apache Spark (PySpark)
  • Storage: S3 object storage
  • Data Formats: Parquet, Delta-style layouts
  • Databases: PostgreSQL, SQLite (project-dependent)
  • Visualization: Tableau / Power BI

Tools may expand as new projects are added.


💬 Connect With Me

LinkedIn • Portfolio • Kaggle • Email • Resume

(back to top)