Posts by Akash Rajvanshi
Built a small NotebookLM helper this weekend. NotebookLM is great for parsing PDFs and other text-heavy content, but quality can drop when you upload a full book as a single PDF.
So I created a simple wrapper that splits PDFs into chapters and logical groups, then converts them into both Markdown and PDF. This makes it easier to work with the content in both NotebookLM-style RAG workflows and tools like Codex or Claude.
- Python + pypdf
- Marker (for Markdown conversion)
- Preserves images, tables, diagrams
- Bookmark parsing + fallback to text scanning
- Page offset detection (handles front matter)
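The bookmark-based splitting described in this post could look roughly like the sketch below. It is not the actual wrapper: `Chapter`, `chapter_ranges`, `split_pdf`, the `page_offset` parameter, and the output file naming are all hypothetical, and the text-scanning fallback and the Marker conversion step are left out. Only top-level bookmarks are used, and pypdf is imported lazily so the range logic can be tried without it.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Chapter:
    title: str
    start: int  # 0-based first page index (inclusive)
    end: int    # 0-based page index where the chapter stops (exclusive)


def chapter_ranges(bookmarks, total_pages, page_offset=0):
    """Turn (title, start_page) bookmarks into contiguous page ranges.

    page_offset is a hypothetical knob for front matter: it shifts every
    bookmark page by a fixed amount before the ranges are computed.
    """
    starts = sorted(
        (min(max(page + page_offset, 0), total_pages), title)
        for title, page in bookmarks
    )
    chapters = []
    for i, (start, title) in enumerate(starts):
        # A chapter ends where the next one starts, or at the end of the file.
        end = starts[i + 1][0] if i + 1 < len(starts) else total_pages
        if end > start:  # skip empty ranges (duplicate or out-of-range bookmarks)
            chapters.append(Chapter(title, start, end))
    return chapters


def split_pdf(pdf_path, out_dir, page_offset=0):
    """Write one PDF per top-level bookmark (assumes the PDF has an outline)."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily

    reader = PdfReader(pdf_path)
    bookmarks = []
    for item in reader.outline:
        if isinstance(item, list):  # nested lists hold sub-sections; ignored here
            continue
        page = reader.get_destination_page_number(item)
        if page is not None:
            bookmarks.append((item.title, page))

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    ranges = chapter_ranges(bookmarks, len(reader.pages), page_offset)
    for n, chapter in enumerate(ranges, start=1):
        writer = PdfWriter()
        for i in range(chapter.start, chapter.end):
            writer.add_page(reader.pages[i])
        with open(out / f"{n:02d}.pdf", "wb") as fh:
            writer.write(fh)
    return ranges
```

Keeping the range computation separate from the file handling makes it easy to check the chapter boundaries on a plain list of bookmarks before touching a real book.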
- Spent the week learning Flink, Spark, and AWS data services like EMR, Glue, and Managed Flink
- Added a refurbished Seagate Exos 4TB drive to Unraid as a parity drive
Day 18
- Explored Envoy Gateway with Gateway API while moving away from the deprecated NGINX Ingress Controller, and also looked into Robusta and HolmesGPT for observability and SRE workflows
- Next step is understanding these correlations more deeply to add dynamic alerting and SRE agent integration into notifications
- Spent the day on observability, using a correlation-based approach and Istio for multi-region observability across AKS, with a centralized visualization layer
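As a rough illustration of the correlation-based approach mentioned in these posts, the sketch below buckets two event streams into tumbling windows and computes a Pearson correlation across the windows they share. It is plain Python rather than a Flink job, and the function names, the 60-second window, and the example series are assumptions made for illustration, not details of the actual dashboard.

```python
from collections import defaultdict
from math import sqrt


def window_sums(events, window_s):
    """Sum (timestamp_seconds, value) events into tumbling windows.

    The result maps each window's start time to the sum of its values.
    """
    sums = defaultdict(float)
    for ts, value in events:
        sums[int(ts // window_s) * window_s] += value
    return dict(sums)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:  # a constant series has no defined correlation
        return None
    return cov / sqrt(var_x * var_y)


def correlate(a_events, b_events, window_s=60):
    """Correlate two streams over the windows in which both have data."""
    a = window_sums(a_events, window_s)
    b = window_sums(b_events, window_s)
    shared = sorted(a.keys() & b.keys())
    return pearson([a[k] for k in shared], [b[k] for k in shared])


# Hypothetical example: error-log counts vs. a latency metric
errors = [(0, 1), (30, 1), (60, 4), (120, 6)]
latency = [(10, 10), (70, 20), (130, 30)]
r = correlate(errors, latency)
```

A value of `r` close to 1 or -1 suggests the two signals move together within the chosen window size; a high correlation on its own does not establish which signal causes the other.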
Day 17
- Built a problem correlation dashboard using Flink to analyze logs and metrics from Kafka topics and generate correlated metrics. Currently ingesting homelab VM/container, Kafka, and Traefik data.
- Had a disk failure in the homelab, so currently looking for a refurbished drive for Unraid.
Day 16
- Planning to build a dashboard that gives a complete bird's-eye view of the project infrastructure
- Understanding how correlation works in observability with Flink and ClickHouse to build better dashboards and alerts.
Resuming my 180 days of learning across Observability, Data, AI, Development, and SRE. I'll also be sharing more of my homelab work along the way. The streak was paused for a while because of other commitments.
Part 3: the operational side. 3-2-1 backup strategy, a Kafka-powered observability pipeline with VictoriaMetrics + Grafana, GitOps deployments, and 2026 plans.
blogs.thedevopsguy.biz/blog/homelab-architectur...
#homelab #selfhosted #devops #proxmox
Part 2: every VM, LXC, and container across all nodes. 30+ services on the media stack alone, plus Authentik, Teleport, Gitea, FreshRSS, and more.
blogs.thedevopsguy.biz/blog/homelab-architectur...
My 2026 homelab runs 7 Proxmox nodes, 37TB of NAS storage, pfSense + UDM SE, PiKVM for remote management, and APC UPS protection.
Part 1 covers the full infrastructure:
blogs.thedevopsguy.biz/blog/homelab-architectur...
My 2026 Homelab Architecture Part 1: The Infrastructure
UDM SE + pfSense dual-router setup, Proxmox nodes with 40+ CPU and 200GB+ RAM, 30TB+ across 4 NAS devices, PiKVM for out-of-band access, and dual APC UPS tiers.
blogs.thedevopsguy.biz/blog/homelab-architectur...
- Working on my homelab architecture and components write-up
- PostgreSQL replication slot copy: x.com/crunchydata/status/20215...
Day 15
- Completed the material on pure and first-class functions in Python functional programming
- Studied concepts and algorithms used in non-relational databases (document-based and key-value stores), focusing on scaling and high availability
- Great read today: medium.com/@devanshusharma658/build...
- Spent the day working on AKS observability: added the Cilium dataplane and replaced NGINX Ingress with Istio. Since this is a centralized observability setup, we are using an API gateway to receive data from multiple accounts and sources
Day 14
- Worked on my notes website over the weekend; completed it today and will keep adding more content:
https://blogs.thedevopsguy.biz/
- github.com/SoulKyu/vault-db-injecto...
- Exploring immutable OS options for the homelab (NixOS, Fedora CoreOS, Flatcar, openSUSE MicroOS). Planning to try Fedora CoreOS and openSUSE MicroOS for servers while continuing NixOS on the desktop
Day 13
- Read about PostgreSQL tuning; explored pgwatch and a Vault DB injector demo using OpenBao.
- stormatics.tech/blogs/postgresql-materia...
- https://github.com/cybertec-postgresql/pgwatch
Read these PostgreSQL-related blogs:
- engineering.zalando.com/posts/2025/12/contributi...
- neon.com/postgresql/postgresql-18...
Day 12
- Haven't been feeling well for the past two days
- Reading an excellent book on Python workout exercises and continuing to explore functional programming in Python
www.manning.com/books/python-workout-sec...
Day 11
- Reading and solving problems on functional programming in Python
- Must-reads for databases: understanding how WAL works and the algorithms behind database recovery
https://yashagw.github.io/blog/db-recovery/
https://x.com/BenjDicken/status/2016514026344464535
[3/3]
- Used NotebookLM to chat with PDFs for TypeScript and Python concepts
stormatics.tech/blogs/unused-indexes-in-...
[2/3]
- Read: Unused Indexes in PostgreSQL: Risks, Detection, and Safe Removal
- Key takeaway: Unused indexes increase write overhead and waste storage. Use pg_stat_user_indexes to detect them and always verify before dropping.
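The detection step from the takeaway above can be sketched as a query against `pg_stat_user_indexes` plus a small filter. This is a minimal sketch and not taken from the linked article: the query text, the `drop_candidates` helper, and the choice to exclude unique and primary-key indexes are assumptions. The function expects rows as dictionaries, so it works with whatever database driver returns them that way.

```python
# Lists every user index with its scan count and on-disk size.
# pg_stat_user_indexes and pg_index are standard PostgreSQL catalogs/views.
UNUSED_INDEX_SQL = """
SELECT s.schemaname,
       s.relname       AS table_name,
       s.indexrelname  AS index_name,
       s.idx_scan,
       pg_relation_size(s.indexrelid) AS size_bytes,
       i.indisunique,
       i.indisprimary
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
ORDER BY size_bytes DESC;
"""


def drop_candidates(rows, min_size_bytes=0):
    """Return names of indexes that were never scanned and enforce no constraint.

    rows: iterable of dicts with the columns selected by UNUSED_INDEX_SQL.
    Unique and primary-key indexes are skipped because they enforce
    constraints even when no query ever reads them.
    """
    return [
        row["index_name"]
        for row in rows
        if row["idx_scan"] == 0
        and not row["indisunique"]
        and not row["indisprimary"]
        and row["size_bytes"] >= min_size_bytes
    ]


# Hypothetical rows, shaped like the query output
example_rows = [
    {"index_name": "orders_pkey", "idx_scan": 0, "size_bytes": 8192,
     "indisunique": True, "indisprimary": True},
    {"index_name": "orders_note_idx", "idx_scan": 0, "size_bytes": 4_000_000,
     "indisunique": False, "indisprimary": False},
    {"index_name": "orders_created_idx", "idx_scan": 912, "size_bytes": 2_000_000,
     "indisunique": False, "indisprimary": False},
]
candidates = drop_candidates(example_rows)
```

The result is only a list of candidates to verify. The counters are tracked per server and start from zero after a statistics reset, so an index that looks unused on the primary may still serve queries on a replica or in a rarely run job.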
Day 10
[1/3]
Major updates to the homelab media stack:
- Added Glance dashboard
- Applied resource limits to all containers
- Enabled Komodo webhook-based auto-deployment
- Separated multi-host environment configs
https://github.com/AkashRajvanshi/homelab-media-stack