DAY 54: Capstone Complete & Documented!
Finished the `README.md` with an architecture diagram (Bronze -> Silver -> Gold). Documented the pipeline run instructions and the materials science logic behind the Gold layer. Project submitted!
github.com/middaycoffee...
#dezoomcamp #dataengineering
Posts by baris a.
DAY 53: Bringing the data to life in Looker Studio!
Wired BigQuery to Looker Studio. Built an executive scorecard and an interactive scatter plot (Energy Above Hull vs Band Gap). Wrote a custom calculated field to visually "grey out" materials that failed mechanical or safety checks!
#dataviz
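The grey-out logic can be sketched as a status column computed upstream of the dashboard. This is a minimal illustration; the column names and pass/fail fields here are assumptions, not the report's exact schema.

```python
# Hypothetical sketch: derive a display status that a Looker Studio
# calculated field can key on. Field names are assumptions.

def display_status(row: dict) -> str:
    """Grey out materials that fail mechanical or safety screening."""
    if not row["passed_mechanical"] or not row["passed_safety"]:
        return "greyed_out"
    return "candidate"

rows = [
    {"material_id": "mp-1", "passed_mechanical": True, "passed_safety": True},
    {"material_id": "mp-2", "passed_mechanical": True, "passed_safety": False},
]
statuses = {r["material_id"]: display_status(r) for r in rows}
print(statuses)  # {'mp-1': 'candidate', 'mp-2': 'greyed_out'}
```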
DAY 52: CI/CD Pipeline implemented!
Set up a `.github/workflows/deploy.yml` file. Now, whenever I push code to the main branch, GitHub Actions automatically spins up a runner to lint my SQL and Python code. The pipeline is fully portable and containerized.
#githubactions #cicd #devops #dezoomcamp
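A workflow like that can be sketched as follows; the linter choices (ruff for Python, sqlfluff for SQL) and job layout are assumptions, not the exact setup in the repo.

```yaml
# .github/workflows/deploy.yml -- a minimal sketch, not the exact file
name: deploy
on:
  push:
    branches: [main]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff sqlfluff
      - run: ruff check .
      - run: sqlfluff lint . --dialect bigquery
```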
DAY 51: Dockerization.
Wrote the `Dockerfile` for the pipeline. Used `python:3.11-slim`, installed Bruin CLI and dependencies. Ran into the classic "Virtual Disk Balloon" disk-space issue, but managed to clear out WSL space, optimize layer caching, and build the image successfully!
#docker
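The layer-caching idea mentioned above can be sketched like this; the file paths and the Bruin install step are placeholders, not the project's exact Dockerfile.

```dockerfile
# Sketch of the pipeline image; paths and install steps are placeholders.
FROM python:3.11-slim

WORKDIR /app

# Dependencies first, in their own layer, so they cache across code edits
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install the Bruin CLI here (per Bruin's install docs for the target platform)

# Copy pipeline code last; edits here won't invalidate the pip layer above
COPY . .

CMD ["python", "ingestion.py"]
```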
DAY 50: The Gold Layer is materialized!
Wrote the final SQL transformation. Filtered out toxic elements (Pb, Cd, As) using BigQuery `UNNEST` and ranked the surviving materials. Out of thousands of compounds, narrowed it down to the elite candidates. Zero-tolerance DQ checks passed!
#bigquery #sql
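The toxic-element filter plus ranking can be sketched with BigQuery's `UNNEST` inside an `EXISTS` subquery; the table and column names below are assumptions.

```sql
-- Sketch of the Gold-layer filter; project/dataset/column names are placeholders.
SELECT
  material_id,
  formula_pretty,
  energy_above_hull,
  band_gap,
  RANK() OVER (ORDER BY energy_above_hull ASC, band_gap DESC) AS candidate_rank
FROM `my_project.silver.materials`
WHERE NOT EXISTS (
  SELECT 1
  FROM UNNEST(elements) AS e
  WHERE e IN ('Pb', 'Cd', 'As')
)
```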
DAY 49: Coding the Science (Gold Layer Prep).
Defined the actual metallurgical constraints for a viable solid-state electrolyte: Stability (energy_above_hull <= 0.05), Insulation (band_gap > 1.0), and calculated Pugh's Ratio for ductility to avoid brittle batteries!
#materialsscience #sql
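The constraints above can be expressed as a small screening function. The stability and insulation thresholds follow the post; the 1.75 cutoff is the commonly cited Pugh criterion for ductility (B/G > 1.75), used here as an assumption, and the moduli values are illustrative.

```python
# Screening sketch for solid-state electrolyte candidates.

def pugh_ratio(bulk_modulus_gpa: float, shear_modulus_gpa: float) -> float:
    """Pugh's ratio B/G; higher values suggest more ductile behavior."""
    return bulk_modulus_gpa / shear_modulus_gpa

def is_viable(energy_above_hull: float, band_gap: float,
              bulk_gpa: float, shear_gpa: float) -> bool:
    return (
        energy_above_hull <= 0.05                    # thermodynamically stable
        and band_gap > 1.0                           # electronically insulating
        and pugh_ratio(bulk_gpa, shear_gpa) > 1.75   # ductile, not brittle
    )

print(is_viable(0.01, 4.2, bulk_gpa=70.0, shear_gpa=30.0))  # B/G ~ 2.33 -> True
```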
DAY 48: Polishing the Silver Layer.
Built the `silver_materials.sql` asset in Bruin. Flattened nested JSON, casted `band_gap` and `energy_above_hull` to FLOAT64, and implemented Bruin Data Quality checks to ensure `material_id` is unique. All checks passed!
#sql #dataquality #analyticsengineering
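The casting and uniqueness checks can be mirrored in plain Python; this is a sketch of the logic Bruin's quality checks enforce, with illustrative record fields, not the Bruin check syntax itself.

```python
# Silver-layer sketch: cast string fields to floats, then enforce uniqueness.
records = [
    {"material_id": "mp-10", "band_gap": "2.1", "energy_above_hull": "0.00"},
    {"material_id": "mp-11", "band_gap": "0.3", "energy_above_hull": "0.04"},
]

# Cast payload fields to floats (the FLOAT64 equivalent)
silver = [
    {**r,
     "band_gap": float(r["band_gap"]),
     "energy_above_hull": float(r["energy_above_hull"])}
    for r in records
]

# Uniqueness check on material_id -- the pipeline fails if this is violated
ids = [r["material_id"] for r in silver]
assert len(ids) == len(set(ids)), "material_id must be unique"
```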
DAY 47: Bridging the Lake and the Warehouse.
Designed the Silver schema. Kept the thermodynamic and structural data, dropped the API noise. Wrote the DDL to create a BigQuery External Table pointing directly at the GCS Parquet files to save on storage costs.
#bigquery #sql #gcp #dezoomcamp
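The DDL for such an external table looks roughly like this; the project, dataset, and GCS path are placeholders.

```sql
-- Sketch of the external-table DDL; names and URIs are placeholders.
CREATE OR REPLACE EXTERNAL TABLE `my_project.bronze.materials_raw`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bronze-bucket/materials/*.parquet']
);
```

BigQuery then reads the Parquet files in place, so the warehouse stores no second copy of the data.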
DAY 46: Automation engaged!
Moved my Python extraction logic into a Bruin Python Asset. Ran my first end-to-end `bruin run` dry run. The orchestrator successfully triggered the script and landed the Parquet file in the cloud with zero manual intervention. Phase 2 complete!
#bruin #orchestration
DAY 45: The Bronze Layer is active.
Updated the ingestion script to convert the raw JSON payloads into Pandas DataFrames. Validated the schemas locally and successfully saved the raw data as `.parquet` files directly into my GCS bucket.
#pandas #gcp #parquet #dataengineering
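The Bronze step can be sketched as payload -> DataFrame -> Parquet; the field names here are illustrative, and the real payload carries many more fields.

```python
import pandas as pd

# Bronze-layer sketch: raw API payloads -> DataFrame -> Parquet.
payload = [
    {"material_id": "mp-10", "formula_pretty": "LiCoO2", "band_gap": 2.2},
    {"material_id": "mp-22", "formula_pretty": "Li2O", "band_gap": 5.0},
]

df = pd.DataFrame(payload)

# Lightweight local schema validation before landing the file
expected = {"material_id", "formula_pretty", "band_gap"}
assert expected.issubset(df.columns), f"missing columns: {expected - set(df.columns)}"

# In the pipeline the target is a GCS URI (needs gcsfs + pyarrow installed):
# df.to_parquet("gs://<bucket>/bronze/materials.parquet", index=False)
print(df.shape)  # (2, 3)
```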
DAY 44: Digging for Lithium!
Wrote the Python extraction script (`ingestion.py`) using `mp-api`. Implemented batching logic to pull all Lithium-containing materials with 2-4 elements. Pulling thousands of records without hitting API rate limits!
#python #api #materials #data
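A sketch in the spirit of `ingestion.py`: the call shape follows mp-api's documented `summary.search` interface, but the exact fields pulled and the batch size are assumptions, and the batching helper is illustrative.

```python
# Extraction sketch; needs an MP API key and the mp-api package to run live.

def fetch_li_materials(api_key: str):
    from mp_api.client import MPRester  # deferred import: needs mp-api installed

    with MPRester(api_key) as mpr:
        return mpr.materials.summary.search(
            elements=["Li"],       # lithium-containing
            num_elements=(2, 4),   # 2-4 elements per compound
            fields=["material_id", "formula_pretty",
                    "band_gap", "energy_above_hull"],
        )

def chunked(items, size):
    """Yield fixed-size batches, e.g. for rate-limit-friendly downstream writes."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

batches = list(chunked(list(range(10)), 4))
print([len(b) for b in batches])  # [4, 4, 2]
```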
DAY 43: Local environment is locked in.
Ran `terraform apply` and watched the GCP infrastructure spin up perfectly. Installed the Bruin CLI, initialized the project, and configured my `.bruin.yml` with GCP credentials securely ignored in Git. Ready to extract data!
#bruin #dataengineering #dezoomcamp
DAY 42: Infrastructure as Code day!
Set up my Google Cloud Platform project and secured my Service Accounts. Wrote the Terraform scripts (`main.tf` and `variables.tf`) to automatically provision my GCS Bronze buckets and BigQuery datasets.
#terraform #gcp #dataengineering #cloud
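The provisioning in `main.tf` can be sketched like this; resource names, variables, and settings are placeholders, not the exact configuration.

```hcl
# main.tf sketch -- names, project, and location are placeholders.
resource "google_storage_bucket" "bronze" {
  name          = var.bronze_bucket_name
  location      = var.region
  force_destroy = true
}

resource "google_bigquery_dataset" "silver" {
  dataset_id = "silver"
  location   = var.region
}
```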
DAY 41: Started the DE Zoomcamp 2026 Final Capstone Project!
I'll build an end-to-end pipeline to support solid-state EV battery research, using the Materials Project API as the data source and Bruin for the workflow + SQL!
#python #dezoomcamp #sql #data
DAY 40: Module 7 of Data Engineering Zoomcamp done!
- Kafka producers and consumers
- PyFlink tumbling and session windows
- Real-time taxi data analysis
- Redpanda as Kafka replacement
My solution: github.com/middaycoffee...
Free course by DataTalksClub: github.com/DataTalksClu...
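The Kafka producer side of the module can be sketched with kafka-python; the topic, broker address, and record fields are assumptions. Creating the producer needs a running broker, so only the serializer runs here.

```python
import json

# Producer sketch with kafka-python; broker/topic details are assumptions.

def serialize(value: dict) -> bytes:
    """JSON-encode a record for a Kafka topic."""
    return json.dumps(value).encode("utf-8")

def make_producer(bootstrap_servers: str = "localhost:9092"):
    from kafka import KafkaProducer  # deferred: needs kafka-python + a broker
    return KafkaProducer(bootstrap_servers=bootstrap_servers,
                         value_serializer=serialize)

msg = serialize({"ride_id": 1, "fare": 12.5})
print(msg)  # b'{"ride_id": 1, "fare": 12.5}'
```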
DAY 39: Session and tumbling window operations in Flink, and querying the results with pgcli.
#sql #python #flink
www.youtube.com/live/YDUgFeH...
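Inspecting the windowed results from a Postgres sink via pgcli looks roughly like this; the table and column names are hypothetical.

```sql
-- Hypothetical sink table written by the Flink windowing job
SELECT window_start, window_end, event_count
FROM taxi_events_windowed
ORDER BY window_start DESC
LIMIT 10;
```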
DAY 38: Wrote a `models.py` file for JSON serialization and deserialization, so defining models is easier and they can be reused in any notebook. #flink #postgres #python #sql
github.com/middaycoffee...
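The `models.py` idea can be sketched as a dataclass per event with JSON round-tripping; the class and field names here are illustrative, not the repo's actual models.

```python
from dataclasses import dataclass, asdict
import json

# Sketch of a reusable model with JSON (de)serialization for notebooks.

@dataclass
class TaxiRide:
    ride_id: int
    pickup_zone: str
    fare: float

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "TaxiRide":
        return cls(**json.loads(raw))

ride = TaxiRide(1, "Astoria", 12.5)
assert TaxiRide.from_json(ride.to_json()) == ride  # lossless round trip
```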
DAY 37: Using producer and consumer Jupyter notebook scripts, I processed the NYC green taxi data for November 2025. Ran into problems with the Postgres port (5432 vs 5433) and the Redpanda initialization, but solved them in the end. #python #sql #flink #postgres
www.youtube.com/live/YDUgFeH...
DAY 36: Created my first job with Flink! Used docker-compose to spin up the jobmanager and taskmanager, then submitted the job with
docker compose exec jobmanager bash -c "./bin/flink run -py /opt/src/job/pass_through_job.py -d"
#flink #python #sql #data
youtu.be/YDUgFeHQzJU?...
DAY 35: Real-time pipeline construction with Kafka, checking results via pgcli on localhost. Used NYC taxi data and processed 1000 rows.
#python #kafka #sql
www.youtube.com/live/YDUgFeH...
DAY 34: Started the stream processing tutorial. Diving into PyFlink, Kafka, and Redpanda to build a real-time pipeline.
#python #pyflink #kafka #redpanda
www.youtube.com/live/YDUgFeH...
DAY 33: Module 6 of Data Engineering Zoomcamp done!
- Batch processing with Spark
- PySpark & DataFrames
- Parquet file optimization
- Spark UI on port 4040
My solution: github.com/middaycoffee...
Free course by DataTalksClub: github.com/DataTalksClu...
DAY 32: Practiced joins with PySpark, the anatomy of Spark clusters, and GroupBy in Spark.
#sql #python #pyspark #spark
www.youtube.com/watch?v=lu7T...
DAY 31: SQL queries with `spark.sql`, writing the resulting tables to a local folder via `.write.parquet()`, using `.coalesce(1)` to get a single output file.
#spark #sql #python
github.com/DataTalksClu...
DAY 30: Downloading and reading parquet files with spark, running SQL queries.
#python #sql #dezoomcamp #spark
www.youtube.com/watch?v=uAlp...
DAY 29: Select, filter, actions, transformations, and functions with Spark on NYC taxi data (.csv.gz -> parquet). Now I get a sense of how big companies handle massive data across many executors and CPUs...
#python #spark #sql
www.youtube.com/watch?v=ti3a...
DAY 28: Repartitioning with Spark! A great technique for efficiency when handling big data. You should check it out.
#partitioning #python #spark #sql #dezoomcamp
www.youtube.com/watch?v=r_Sf...
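Repartitioning can be sketched like this. The `repartition(...).write.parquet(...)` call is the standard PySpark DataFrame API and needs a running Spark session; the 128 MB-per-partition target is a common rule of thumb, used here as an assumption.

```python
import math

# Repartitioning sketch: pick a partition count, then rewrite the data.

def target_partitions(total_size_mb: float, partition_mb: float = 128.0) -> int:
    """Pick a partition count so each partition lands near the target size."""
    return max(1, math.ceil(total_size_mb / partition_mb))

def repartition_and_save(df, out_path: str, total_size_mb: float):
    # df is a pyspark.sql.DataFrame; requires an active Spark session.
    n = target_partitions(total_size_mb)
    df.repartition(n).write.parquet(out_path)

print(target_partitions(1000))  # ceil(1000 / 128) = 8
```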
DAY 27: Started Module 6 (batch processing) of the Data Engineering Zoomcamp. Using PySpark from a Jupyter notebook to handle some "massive files".
check the videos:
www.youtube.com/watch?v=r_Sf...
#python #sql #dezoomcamp