Posts by baris a. 🌿🌌

GitHub - middaycoffee/lithium-lake Contribute to middaycoffee/lithium-lake development by creating an account on GitHub.

DAY 54: Capstone Complete & Documented!
Finished the `README.md` with an architecture diagram (Bronze -> Silver -> Gold). Documented the pipeline run instructions and the materials science logic behind the Gold layer. Project submitted

github.com/middaycoffee...
#dezoomcamp #dataengineering

2 days ago 0 0 0 0

DAY 53: Bringing the data to life in Looker Studio!
Wired BigQuery to Looker Studio. Built an executive scorecard and an interactive scatter plot (Energy Above Hull vs Band Gap). Wrote a custom calculated field to visually "grey out" materials that failed mechanical or safety checks!
#dataviz

2 days ago 0 0 0 0

DAY 52: CI/CD Pipeline implemented!
Set up a `.github/workflows/deploy.yml` file. Now, whenever I push code to the main branch, GitHub Actions automatically spins up a runner to lint my SQL and Python code. The pipeline is fully portable and containerized.
#githubactions #cicd #devops #dezoomcamp

2 days ago 0 0 0 0

DAY 51: Dockerization.
Wrote the `Dockerfile` for the pipeline. Used `python:3.11-slim`, installed Bruin CLI and dependencies. Ran into the classic "Virtual Disk Balloon" disk-space issue, but managed to clear out WSL space, optimize layer caching, and build the image successfully!
#docker

2 days ago 0 0 0 0

DAY 50: The Gold Layer is materialized!
Wrote the final SQL transformation. Filtered out toxic elements (Pb, Cd, As) using BigQuery `UNNEST` and ranked the surviving materials. Out of thousands of compounds, narrowed it down to the elite candidates. Zero-tolerance DQ checks passed!
#bigquery #sql
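
The post does this step in BigQuery SQL with `UNNEST`. The same idea in plain Python is sketched below; it is not the project's actual query. The record layout (`elements`, `energy_above_hull`), the sample rows, and ranking by lowest energy above hull are all assumptions for illustration.

```python
TOXIC_ELEMENTS = {"Pb", "Cd", "As"}  # the elements named in the post


def screen_and_rank(materials):
    """Drop any material containing a toxic element, then rank the rest.

    Ranking by ascending energy_above_hull is an assumption; the post
    does not say which column the ranking uses.
    """
    safe = [m for m in materials if TOXIC_ELEMENTS.isdisjoint(m["elements"])]
    return sorted(safe, key=lambda m: m["energy_above_hull"])


# Made-up sample rows, not real Materials Project data.
sample = [
    {"material_id": "demo-1", "elements": ["Li", "P", "S"], "energy_above_hull": 0.02},
    {"material_id": "demo-2", "elements": ["Li", "Pb", "O"], "energy_above_hull": 0.00},
    {"material_id": "demo-3", "elements": ["Li", "La", "O"], "energy_above_hull": 0.01},
]
ranked = screen_and_rank(sample)
print([m["material_id"] for m in ranked])  # ['demo-3', 'demo-1']
```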

2 days ago 0 0 0 0

DAY 49: Coding the Science (Gold Layer Prep).
Defined the actual metallurgical constraints for a viable solid-state electrolyte: Stability (energy_above_hull <= 0.05), Insulation (band_gap > 1.0), and calculated Pugh's Ratio for ductility to avoid brittle batteries!
#materialsscience #sql
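
A minimal Python sketch of the three screening rules above. The stability and band gap thresholds come from the post. The field names `bulk_modulus` and `shear_modulus` and the 1.75 ductility cutoff (the commonly quoted Pugh criterion, bulk modulus over shear modulus) are assumptions, since the post does not state them.

```python
from dataclasses import dataclass

# Thresholds from the post.
MAX_ENERGY_ABOVE_HULL = 0.05  # stability
MIN_BAND_GAP = 1.0            # electronic insulation
# Assumed cutoff: the commonly quoted Pugh criterion (K/G > 1.75 suggests ductile).
MIN_PUGH_RATIO = 1.75


@dataclass
class Material:
    material_id: str
    energy_above_hull: float
    band_gap: float
    bulk_modulus: float   # K (hypothetical field name)
    shear_modulus: float  # G (hypothetical field name)


def pugh_ratio(m: Material) -> float:
    """Pugh's ratio K/G: bulk modulus divided by shear modulus."""
    return m.bulk_modulus / m.shear_modulus


def is_viable(m: Material) -> bool:
    """Apply the stability, insulation, and ductility checks in turn."""
    return (
        m.energy_above_hull <= MAX_ENERGY_ABOVE_HULL
        and m.band_gap > MIN_BAND_GAP
        and m.shear_modulus > 0  # guard against division by zero
        and pugh_ratio(m) > MIN_PUGH_RATIO
    )
```

A higher K/G means the material resists volume change more than shear, which is why it is used as a quick ductility proxy.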

2 days ago 0 0 0 0

DAY 48: Polishing the Silver Layer.
Built the `silver_materials.sql` asset in Bruin. Flattened nested JSON, cast `band_gap` and `energy_above_hull` to FLOAT64, and implemented Bruin Data Quality checks to ensure `material_id` is unique. All checks passed!
#sql #dataquality #analyticsengineering
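
The post does this in SQL inside a Bruin asset. As a rough illustration of the same three steps (flatten, cast, uniqueness check), here is a plain-Python sketch. The helper names and the `_` separator are hypothetical; only `band_gap`, `energy_above_hull`, and `material_id` come from the post.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts: {"a": {"b": 1}} -> {"a_b": 1}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_silver(raw_records):
    """Flatten each record, cast the numeric columns, and enforce unique ids."""
    rows = []
    seen_ids = set()
    for raw in raw_records:
        row = flatten(raw)
        for column in ("band_gap", "energy_above_hull"):
            row[column] = float(row[column])
        if row["material_id"] in seen_ids:
            raise ValueError(f"duplicate material_id: {row['material_id']}")
        seen_ids.add(row["material_id"])
        rows.append(row)
    return rows
```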

2 days ago 0 0 0 0

DAY 47: Bridging the Lake and the Warehouse.
Designed the Silver schema. Kept the thermodynamic and structural data, dropped the API noise. Wrote the DDL to create a BigQuery External Table pointing directly at the GCS Parquet files to save on storage costs.
#bigquery #sql #gcp #dezoomcamp

2 days ago 0 0 0 0

DAY 46: Automation engaged!
Moved my Python extraction logic into a Bruin Python Asset. Ran my first end-to-end `bruin run` dry run. The orchestrator successfully triggered the script and landed the Parquet file in the cloud with zero manual intervention. Phase 2 complete!
#bruin #orchestration

2 days ago 1 0 0 0

DAY 45: The Bronze Layer is active.
Updated the ingestion script to convert the raw JSON payloads into Pandas DataFrames. Validated the schemas locally and successfully saved the raw data as `.parquet` files directly into my GCS bucket.
#pandas #gcp #parquet #dataengineering

2 days ago 0 0 0 0

DAY 44: Digging for Lithium!
Wrote the Python extraction script (`ingestion.py`) using `mp-api`. Implemented batching logic to pull all Lithium-containing materials with 2-4 elements. Pulling thousands of records without hitting API rate limits!
#python #api #materials #data
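
One way batching like this can look, as a sketch only: split the work into fixed-size chunks and pause between requests. `fetch_batch` is a hypothetical stand-in for the real `mp-api` call, and the batch size and pause length are made-up values, not the ones used in `ingestion.py`.

```python
import time
from itertools import islice


def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fetch_all(material_ids, fetch_batch, batch_size=500, pause_seconds=0.0):
    """Call fetch_batch once per chunk, pausing between calls to ease API load."""
    records = []
    for batch in batched(material_ids, batch_size):
        records.extend(fetch_batch(batch))
        time.sleep(pause_seconds)
    return records
```

Passing the fetch function in as an argument keeps the batching logic testable without touching the network.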

2 days ago 2 0 0 0

DAY 43: Local environment is locked in.
Ran `terraform apply` and watched the GCP infrastructure spin up perfectly. Installed the Bruin CLI, initialized the project, and configured my `.bruin.yml` with GCP credentials securely ignored in Git. Ready to extract data
#bruin #dataengineering #dezoomcamp

2 days ago 0 0 0 0

DAY 42: Infrastructure as Code day! ☁️
Set up my Google Cloud Platform project and secured my Service Accounts. Wrote the Terraform scripts (`main.tf` and `variables.tf`) to automatically provision my GCS Bronze buckets and BigQuery datasets.
#terraform #gcp #dataengineering #cloud

2 days ago 1 0 0 0

DAY 41: started DE ZOOMCAMP 2026 Final Capstone Project!

I will try to build an end-to-end pipeline to boost solid-state EV battery research, using the Materials Project API as the data source and Bruin for workflow + SQL!
#python #dezoomcamp #sql #data

2 days ago 1 0 0 0
GitHub - DataTalksClub/data-engineering-zoomcamp: Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

DAY 40: Module 7 of Data Engineering Zoomcamp done!

- Kafka producers and consumers
- PyFlink tumbling and session windows
- Real-time taxi data analysis
- Redpanda as Kafka replacement

My solution: github.com/middaycoffee...

Free course by DataTalksClub: github.com/DataTalksClu...

3 weeks ago 0 0 0 0
PyFlink Stream Processing Tutorial: Build a Real-Time Pipeline with Kafka, Redpanda and Python YouTube video by DataTalksClub

DAY 39: session and tumbling window operations in flink, and querying the results with pgcli.

#sql #python #flink
www.youtube.com/live/YDUgFeH...
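
To make the two window types concrete, here is a small pure-Python sketch (not PyFlink code). A tumbling window assigns each event to a fixed-size, non-overlapping bucket, while a session window keeps growing until events stop arriving for longer than a gap. The timestamps, window size, and gap are made up.

```python
from collections import defaultdict


def tumbling_windows(timestamps, size):
    """Group timestamps into fixed, non-overlapping windows of length `size`."""
    windows = defaultdict(list)
    for ts in timestamps:
        window_start = ts - (ts % size)
        windows[window_start].append(ts)
    return dict(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions; a new session starts once more
    than `gap` passes with no events."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


events = [1, 3, 4, 12, 13, 30]  # event times in seconds
print(tumbling_windows(events, size=10))  # {0: [1, 3, 4], 10: [12, 13], 30: [30]}
print(session_windows(events, gap=5))     # [[1, 3, 4], [12, 13], [30]]
```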

3 weeks ago 1 0 0 0
data-engineering-zoomcamp/module-7-streaming at main · middaycoffee/data-engineering-zoomcamp Notes and codes of Data Engineering Zoomcamp focusing on Docker, Kestra, dbt, Terraform, BigQuery, Bruin, Kafka.

DAY 38: models[.]py file for json serialization and deserialization, so that architecting models is easier and they can be reused in any notebook. #flink #postgres #python #sql
github.com/middaycoffee...
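
A sketch of what such a models file can look like, assuming a dataclass plus a serializer/deserializer pair. The `Ride` fields and function names are hypothetical; the real file in the repo may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Ride:
    # Hypothetical fields; the real models file may define different ones.
    pickup_location_id: int
    dropoff_location_id: int
    trip_distance: float
    total_amount: float


def ride_serializer(ride: Ride) -> bytes:
    """Turn a Ride into UTF-8 JSON bytes, ready to hand to a Kafka producer."""
    return json.dumps(asdict(ride)).encode("utf-8")


def ride_deserializer(data: bytes) -> Ride:
    """Rebuild a Ride from the JSON bytes a consumer receives."""
    return Ride(**json.loads(data.decode("utf-8")))
```

Keeping both directions in one module means the producer and consumer notebooks import the same schema instead of each redefining it.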

3 weeks ago 0 0 0 0
PyFlink Stream Processing Tutorial: Build a Real-Time Pipeline with Kafka, Redpanda and Python YouTube video by DataTalksClub

DAY 37: using producer and consumer jupyter notebooks, i processed the nyc green taxi data for november 2025. some problems occurred with the postgres port (5432 vs 5433) and redpanda's initialization, but they were solved in the end #python #sql #flink #postgres
www.youtube.com/live/YDUgFeH...

3 weeks ago 0 0 0 0
PyFlink Stream Processing Tutorial: Build a Real-Time Pipeline with Kafka, Redpanda and Python YouTube video by DataTalksClub

DAY 36: created my first job with flink! used docker-compose to create jobmanager and taskmanager and used

docker compose exec jobmanager bash -c "./bin/flink run -py /opt/src/job/pass_through_job.py -d"

to create the job.

#flink #python #sql #data

youtu.be/YDUgFeHQzJU?...

3 weeks ago 0 0 0 0
PyFlink Stream Processing Tutorial: Build a Real-Time Pipeline with Kafka, Redpanda and Python YouTube video by DataTalksClub

DAY 35: Built a real-time pipeline with Kafka and checked the results with pgcli on localhost. used nyc taxi data and processed 1000 rows.

#python #kafka #sql
www.youtube.com/live/YDUgFeH...

4 weeks ago 1 1 0 0
PyFlink Stream Processing Tutorial: Build a Real-Time Pipeline with Kafka, Redpanda and Python YouTube video by DataTalksClub

DAY 34: Started stream processing tutorial. We are diving into PyFlink, kafka and redpanda to build a real-time pipeline

#python #pyflink #kafka #redpanda

www.youtube.com/live/YDUgFeH...

4 weeks ago 0 0 0 0
GitHub - DataTalksClub/data-engineering-zoomcamp: Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

DAY 33: ⚑ Module 6 of Data Engineering Zoomcamp done!

- Batch processing with Spark 🔥
- PySpark & DataFrames
- Parquet file optimization
- Spark UI on port 4040

My solution: github.com/middaycoffee...

Free course by DataTalksClub: github.com/DataTalksClu...

1 month ago 0 0 0 0
DE Zoomcamp 5.4.3 - Joins in Spark YouTube video by DataTalksClub

DAY 32: practiced joins with pyspark. anatomy of spark clusters, and GroupBy in spark

#sql #python #pyspark #spark
www.youtube.com/watch?v=lu7T...

1 month ago 0 1 0 0
data-engineering-zoomcamp/06-batch/code/06_spark_sql.py at main · DataTalksClub/data-engineering-zoomcamp Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

DAY 31: sql queries with spark.sql, then writing the resulting tables to a local folder with .write.parquet, using .coalesce(1) to get a single output file.

#spark #sql #python
github.com/DataTalksClu...

1 month ago 0 0 0 0
DE Zoomcamp 5.3.4 - SQL with Spark YouTube video by DataTalksClub

DAY 30: Downloading and reading parquet files with spark, running SQL queries.

#python #sql #dezoomcamp #spark
www.youtube.com/watch?v=uAlp...

1 month ago 1 0 0 0
DE Zoomcamp 5.3.2 - Spark DataFrames YouTube video by DataTalksClub

DAY 29: select, filter, actions, transformations, and functions with spark on nyc taxi data (.csv.gz -> parquet). now i get a sense of how big companies handle massive data across many executors and CPUs...
#python #spark #sql

www.youtube.com/watch?v=ti3a...

1 month ago 0 0 0 0
DE Zoomcamp 5.3.1 - First Look at Spark/PySpark YouTube video by DataTalksClub

DAY 28: Repartitioning with spark! there are some great applications for efficiency when handling big data. you should check it out.
#partitioning #python #spark #sql #dezoomcamp

www.youtube.com/watch?v=r_Sf...

1 month ago 1 1 0 0
DE Zoomcamp 5.3.1 - First Look at Spark/PySpark YouTube video by DataTalksClub

DAY 27: started module 6 batch processing of data engineering zoomcamp. using pyspark from jupyter notebook to handle some "massive files".

check the videos:
www.youtube.com/watch?v=r_Sf...

#python #sql #dezoomcamp

1 month ago 1 0 0 0

i don't wanna go back to X but it feels like a desert here. there are only some people from the USA here.
#python #sql

1 month ago 1 0 0 0
python-playground/istanbul-air-tracker at main · middaycoffee/python-playground A playground for Python, small scripts for random tasks.

github.com/middaycoffee...

1 month ago 0 0 0 0