Update readme.md

Cameron Seamons 2026-01-11 21:37:07 +00:00
parent 8cc44c1c99
commit 91942ca6ba

@@ -30,7 +30,7 @@
<summary>Table of Contents</summary> <summary>Table of Contents</summary>
<ol> <ol>
<li> <li>
<a href="#about-the-project">About The Project</a> <a href="#about">About The Project</a>
<ul> <ul>
<li><a href="#key-features">Key Features</a></li> <li><a href="#key-features">Key Features</a></li>
</ul> </ul>
@@ -38,23 +38,25 @@
<li> <li>
<a href="#architecture">Architecture</a> <a href="#architecture">Architecture</a>
<ul> <ul>
<li><a href="#bronze-raw">Bronze (Raw)</a></li> <li><a href="#bronze">Bronze (Raw)</a></li>
<li><a href="#silver-clean--validated">Silver (Clean & Validated)</a></li> <li><a href="#silver">Silver (Clean & Validated)</a></li>
<li><a href="#gold-curated--analytics-ready">Gold (Curated & Analytics-Ready)</a></li> <li><a href="#gold">Gold (Curated & Analytics-Ready)</a></li>
</ul> </ul>
</li> </li>
<li><a href="#data-quality--validation">Data Quality & Validation</a></li> <li><a href="#data-quality">Data Quality & Validation</a></li>
<li><a href="#outputs">Outputs</a></li> <li><a href="#outputs">Outputs</a></li>
<li><a href="#roadmap">Roadmap</a></li> <li><a href="#roadmap">Roadmap</a></li>
<li><a href="#license">License</a></li> <li><a href="#license">License</a></li>
<li><a href="#-connect-with-me">Connect With Me</a></li> <li><a href="#contact">Connect With Me</a></li>
</ol> </ol>
</details> </details>
--- ---
<!-- ABOUT THE PROJECT --> <!-- ABOUT THE PROJECT -->
<a id="about"></a>
# About The Project # About The Project
This project simulates a **banking transaction data pipeline** using **Python + Apache Spark** with an **S3-backed data lake**. This project simulates a **banking transaction data pipeline** using **Python + Apache Spark** with an **S3-backed data lake**.
@@ -63,7 +65,7 @@ It demonstrates how raw transactional data can be ingested, validated, transform
## **Tech Stack:** Python, PySpark, Apache Spark, S3 storage ## **Tech Stack:** Python, PySpark, Apache Spark, S3 storage
<a id="key-features"></a>
### Key Features ### Key Features
- **Batch ingestion** of banking-style transaction data into an S3-backed Bronze layer - **Batch ingestion** of banking-style transaction data into an S3-backed Bronze layer
@@ -75,11 +77,12 @@ It demonstrates how raw transactional data can be ingested, validated, transform
<p align="right">(<a href="#readme-top">back to top</a>)</p> <p align="right">(<a href="#readme-top">back to top</a>)</p>
<a id="architecture"></a>
# Architecture # Architecture
The pipeline follows a lakehouse pattern where each layer has a clear responsibility. The pipeline follows a lakehouse pattern where each layer has a clear responsibility.
<a id="bronze"></a>
## Bronze (Raw) ## Bronze (Raw)
**Purpose** **Purpose**
@@ -91,6 +94,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
--- ---
<a id="silver"></a>
## Silver (Clean & Validated) ## Silver (Clean & Validated)
**Purpose** **Purpose**
@@ -105,6 +109,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
--- ---
<a id="gold"></a>
## Gold (Curated & Analytics-Ready) ## Gold (Curated & Analytics-Ready)
**Purpose** **Purpose**
@@ -127,6 +132,7 @@ The pipeline follows a lakehouse pattern where each layer has a clear responsibi
--- ---
<a id="data-quality"></a>
# Data Quality & Validation # Data Quality & Validation
The pipeline applies checks to prevent bad data from reaching curated datasets. The pipeline applies checks to prevent bad data from reaching curated datasets.
@@ -144,6 +150,7 @@ These checks keep the Silver and Gold layers consistent and trustworthy for down
--- ---
<a id="outputs"></a>
## Outputs ## Outputs
**Example S3 layout:** **Example S3 layout:**
@@ -163,6 +170,7 @@ Gold-layer datasets are structured to support:
<p align="right">(<a href="#readme-top">back to top</a>)</p> <p align="right">(<a href="#readme-top">back to top</a>)</p>
<a id="roadmap"></a>
## Roadmap ## Roadmap
- Add orchestration (Airflow / Dagster) - Add orchestration (Airflow / Dagster)
@@ -173,12 +181,12 @@ Gold-layer datasets are structured to support:
- Add CDC-style ingestion simulation - Add CDC-style ingestion simulation
<a id="license"></a>
## License ## License
Distributed under the MIT License. See [LICENSE.txt](https://git.camcodes.dev/Cameron/Data_Lab/src/branch/main/LICENSE.txt) for more information. Distributed under the MIT License. See [LICENSE.txt](https://git.camcodes.dev/Cameron/Data_Lab/src/branch/main/LICENSE.txt) for more information.
<a id="contact"></a>
## 💬 Connect With Me ## 💬 Connect With Me