diff --git a/readme.md b/readme.md
index 46af03d..5991cd7 100644
--- a/readme.md
+++ b/readme.md
@@ -30,7 +30,7 @@ Table of Contents
-About The Project
+About The Project
@@ -38,23 +38,25 @@
 Architecture
-Data Quality & Validation
+Data Quality & Validation
 Outputs
 Roadmap
 License
-Connect With Me
+Connect With Me
+
 ---
+
 # About The Project
 This project simulates a **banking transaction data pipeline** using **Python + Apache Spark** with an **S3-backed data lake**.
@@ -63,7 +65,7 @@ It demonstrates how raw transactional data can be ingested, validated, transform
 ## **Tech Stack:** Python, PySpark, Apache Spark, S3 storage
-
+
 ### Key Features
 - **Batch ingestion** of banking-style transaction data into an S3-backed Bronze layer
@@ -75,11 +77,12 @@ It demonstrates how raw transactional data can be ingested, validated, transform

(back to top)

-
+
 # Architecture
 The pipeline follows a lakehouse pattern where each layer has a clear responsibility.
+
 ## Bronze (Raw)
 **Purpose**
@@ -91,6 +94,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
 ---
+
 ## Silver (Clean & Validated)
 **Purpose**
@@ -105,6 +109,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
 ---
+
 ## Gold (Curated & Analytics-Ready)
 **Purpose**
@@ -127,6 +132,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
 ---
+
 # Data Quality & Validation
 The pipeline applies checks to prevent bad data from reaching curated datasets.
@@ -144,6 +150,7 @@ These checks keep the Silver and Gold layers consistent and trustworthy for down
 ---
+
 ## Outputs
 **Example S3 layout:**
@@ -163,6 +170,7 @@ Gold-layer datasets are structured to support:

(back to top)

+
 ## Roadmap
 - Add orchestration (Airflow / Dagster)
@@ -173,12 +181,12 @@ Gold-layer datasets are structured to support:
 - Add CDC-style ingestion simulation
-
+
 ## License
 Distributed under the MIT License. See [LICENSE.txt](https://git.camcodes.dev/Cameron/Data_Lab/src/branch/main/LICENSE.txt) for more information.
-
+
 ## 💬 Connect With Me