Final Peer Reviews for #DEZoomcamp complete!
Just evaluated a complex SkyPulse Streaming Pipeline. Impressed by the real-time flight data visualization and the use of Apache Flink + Redpanda. Reviewing expert architectures like this sharpens my own engineering criteria.
Posts by Jhames Mejia
Final Project for the Data Engineering Zoomcamp complete!
Built an end-to-end pipeline to audit Colombian public procurement (SECOP II), bridging law and tech. Used AWS (S3, Glue, Athena), dbt for transformations, and Looker Studio.
Check my repo & dashboard: github.com/CodingJhames...
The Data Warehouse is officially online! 🌩️
Mapped 100k raw SECOP II records in S3 using AWS Glue and queried them with Athena. The best part? Connecting my local James-T-850 via #dbt to run the first staging models directly in the cloud.
github.com/CodingJhames...
#DataEngineering #AWS #dbt
Data Ingestion milestone reached!
Successfully deployed #AWS S3 with #Terraform and ingested 100k records from the SECOP II API using Python on my James-T-850. Everything is now stored as Parquet.
Repo: github.com/CodingJhames...
#DataEngineering #Python #Cloud #DEZoomcamp
Week 7 #DataEngineering Zoomcamp
Streamed 4.4M records via #Redpanda & #PySpark on my James-T-850. Speed is nothing without logic!
Results:
Dist: 9506
Zone: 74
Session: 31m
Peak Tip: 10-16 18:00
Progress: github.com/CodingJhames...
#Streaming #Python #BigData #DataTalksClub
Week 6 Spark Module of #DataEngineeringZoomcamp completed! 🛠️
Bridging legal logic with large-scale batch processing.
🔹 PySpark & DataFrames
🔹 Parquet optimization
🔹 AWS EC2 deployment
🔹 Spark UI monitoring (4040)
Solution: github.com/CodingJhames...
Course: github.com/DataTalksClu...
Totally agree! Schema drift was a great lesson today. dlt handles schema versioning out of the box, storing it as YAML in the destination. It definitely gives me peace of mind knowing the evolution is tracked even if the API changes. Thanks for the insight!
#DEZoomcamp W6: dlt & DuckDB!
Schema drift: API changed tip_amount to tip_amt. ⚠️
DuckDB's error logs pointed out the fix instantly.
Running #dlt on an AWS t3.micro (1GB RAM).
0.2666 Credit prop | 6063.41 Tips.
Repo: github.com/CodingJhames/de-zoomcamp-james
#DataEngineering #AWS #BuildInPublic
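A minimal sketch of guarding against a rename like tip_amount to tip_amt before loading, assuming a hand-maintained alias map. The names `COLUMN_ALIASES` and `normalize_record` are hypothetical and plain Python only; this is not how dlt handles schema evolution internally.

```python
# Hypothetical alias map: drifted column name -> canonical column name.
COLUMN_ALIASES = {"tip_amt": "tip_amount"}


def normalize_record(record, aliases=COLUMN_ALIASES):
    """Return a copy of `record` with drifted keys renamed to canonical ones."""
    normalized = {}
    for key, value in record.items():
        canonical = aliases.get(key, key)
        # If both the drifted and the canonical key are present,
        # keep the value stored under the canonical key.
        if canonical in normalized and key != canonical:
            continue
        normalized[canonical] = value
    return normalized


old_row = {"fare": 10.0, "tip_amount": 2.5}   # shape before the API change
new_row = {"fare": 12.0, "tip_amt": 3.0}      # shape after the API change
rows = [normalize_record(r) for r in (old_row, new_row)]
```

After this step both rows expose the tip under tip_amount, so downstream queries do not need to know which API version produced them.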
Thanks, Jeremy! Performance took a hit due to disk I/O, but the goal was to prove that logic beats a tight budget. I used the zone data for a join, and you're right: it's the perfect case for broadcast joins to avoid exploding that 1GB RAM limit!
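A rough sketch of the idea behind a broadcast join, in plain Python rather than PySpark: the small table is turned into an in-memory lookup once, and the large side is streamed past it row by row, so the large table is never shuffled or fully materialized. The function name, column names, and sample rows are hypothetical. In PySpark the corresponding hint is `pyspark.sql.functions.broadcast` applied to the small DataFrame.

```python
def broadcast_join(large_rows, small_rows, key):
    """Inner-join an iterable of dicts against a small in-memory table."""
    # Build the lookup once from the small side (the "broadcast" copy).
    lookup = {row[key]: row for row in small_rows}
    for row in large_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Merge the matched small-side columns into the streamed row.
            yield {**match, **row}


# Hypothetical sample data: a small zone table and a stream of trips.
zones = [{"zone_id": 1, "zone": "Airport"}, {"zone_id": 2, "zone": "Downtown"}]
trips = [{"zone_id": 2, "tip": 3.0}, {"zone_id": 9, "tip": 1.0}, {"zone_id": 1, "tip": 0.5}]

joined = list(broadcast_join(trips, zones, "zone_id"))
```

Only the lookup dict has to fit in memory, which is what makes this shape attractive on a small instance.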
Data Engineering Week 5: Done!
Pivoted to #AWS from GCP.
Ran #PySpark on a t3.micro (1GB RAM) using a 4GB Swapfile. Processed NYC Taxi data smoothly without crashes.
Adaptability > Tools. 🦾
Code: github.com/CodingJhames...
#DataEngineering #Spark #AWS #OpenSource
Week 4 Analytics Engineering: Complete! 👨‍💻
Mastered dbt with NYC Taxi data in AWS Athena. Overcoming those Parquet type mismatches was the final boss!
Project link:
github.com/CodingJhames...
#DataEngineering #dbt #BuildInPublic
Week 3 #DataZoomcamp done!
Migrated the DW logic to AWS Athena & S3
🔹 20M+ records with #Kestra
🔹 Optimized queries: 310MB → 26MB scan using Partitioning & Clustering
🔹 Mastered cloud-agnostic DW concepts
Check my repo: github.com/CodingJhames...
#DataEngineering #AWS #Athena