🧪 Data Lab
Learning • Experimentation • Data Engineering Projects
Note
Code, datasets, and designs may change as I refine implementations and adopt best practices.
About
Data Lab is a personal sandbox for building and iterating on data engineering projects.
The focus is on:
- Designing reliable data pipelines
- Applying analytics engineering principles
- Working with batch data, lakehouse patterns, and validation
- Learning by building realistic systems rather than toy examples
Projects range from small experiments to end-to-end pipelines using production-style tools.
Projects
Each project lives in its own repository.
Banking Transaction Pipeline
- Spark-based ETL pipeline
- Bronze → Silver → Gold lakehouse design
- Data quality enforcement and validation
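To make the layering concrete, here is a minimal sketch of the Bronze → Silver → Gold idea in plain Python. It is not the pipeline's actual Spark code: the record fields (`txn_id`, `account`, `amount`), the function names, and the validation rules are all assumptions chosen for illustration.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Bronze: raw records as ingested. Field names and values are hypothetical.
BRONZE = [
    {"txn_id": "t1", "account": "A-100", "amount": "25.50"},
    {"txn_id": "t2", "account": "A-100", "amount": "-5.00"},
    {"txn_id": "t3", "account": "", "amount": "10.00"},       # missing account
    {"txn_id": "t4", "account": "B-200", "amount": "abc"},    # unparseable amount
    {"txn_id": "t1", "account": "A-100", "amount": "25.50"},  # duplicate txn_id
]


def to_silver(bronze):
    """Validate, type, and deduplicate raw rows.

    Returns (clean_rows, rejected), where rejected is a list of
    (row, reason) pairs so that bad data is kept for inspection.
    """
    clean, rejected, seen = [], [], set()
    for row in bronze:
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            rejected.append((row, "invalid amount"))
            continue
        if not row["account"]:
            rejected.append((row, "missing account"))
            continue
        if row["txn_id"] in seen:
            rejected.append((row, "duplicate txn_id"))
            continue
        seen.add(row["txn_id"])
        clean.append(
            {"txn_id": row["txn_id"], "account": row["account"], "amount": amount}
        )
    return clean, rejected


def to_gold(silver):
    """Aggregate validated rows into a per-account net total."""
    totals = defaultdict(Decimal)  # Decimal() starts each account at 0
    for row in silver:
        totals[row["account"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    silver, rejected = to_silver(BRONZE)
    print("gold:", to_gold(silver))
    for row, reason in rejected:
        print("rejected:", row["txn_id"], reason)
```

Rejected rows are returned alongside their reason instead of being dropped silently, which is one common way to keep validation failures auditable. In a Spark version, each layer would typically be a DataFrame written to its own storage location.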
Tools & Technologies
- Languages: Python, SQL, Java
- Processing: Apache Spark (PySpark)
- Storage: S3 object storage
- Data Formats: Parquet, Delta-style layouts
- Databases: PostgreSQL, SQLite (project-dependent)
- Visualization: Tableau / Power BI
Tools may expand as new projects are added.