14 years running production infrastructure and Kubernetes. From monolith migrations to supply chain incident response, I architect platforms that scale without breaking.
Designing and operating production-grade EKS and GKE clusters. Multi-account, multi-cloud, with Istio service mesh, RBAC hardening, and network policies that actually enforce boundaries.
Multi-environment Terraform and Terragrunt setups with versioned module pipelines. ArgoCD App-of-Apps patterns, sync waves, and GitOps workflows where Git is the only way things change.
Supply chain hardening, OIDC-based CI/CD, runtime threat detection with Falco and Tetragon, WAF consolidation, and secrets management. Security baked in, not bolted on.
A malicious package was injected via a compromised GitHub Actions runner, exposing secrets across AWS and GCP. Led the full incident response: forensic analysis of 58 findings, credential rotation, an OIDC migration replacing static service account keys, runtime threat detection deployment, and a reusable playbook shared org-wide.
Fragmented WAF policies, inconsistent DNS, and per-service certificates made security audits painful. Designed a unified architecture behind Imperva WAF, with DNS delegation managed in Terraform. Backends can now be switched between AWS and GCP instantly without DNS changes, and certificate sprawl is eliminated across all environments.
Ad-hoc deployments caused config drift, inconsistent rollouts, and no audit trail for production changes. Designed a GitOps platform with ArgoCD App-of-Apps and sync waves for ordered deployment orchestration. Git became the single source of truth: no manual kubectl, and every rollback is a git revert.
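Sync waves are driven by the argocd.argoproj.io/sync-wave annotation: Argo CD applies lower waves first, treats a missing annotation as wave 0, and waits for each wave to become healthy before starting the next. A minimal sketch of that ordering, assuming manifests are already parsed into dicts. The apply and wait_healthy callbacks are hypothetical stand-ins for a real cluster client, and real Argo CD also orders by sync phase and resource kind, which this sketch omits.

```python
from collections import defaultdict

SYNC_WAVE = "argocd.argoproj.io/sync-wave"


def group_by_wave(manifests):
    """Group manifests into sync waves, lowest wave first.

    Missing annotations default to wave 0 and waves may be negative.
    Within a wave this sketch simply sorts by resource name.
    """
    waves = defaultdict(list)
    for m in manifests:
        annotations = m.get("metadata", {}).get("annotations", {}) or {}
        waves[int(annotations.get(SYNC_WAVE, "0"))].append(m)
    return [
        (wave, sorted(waves[wave], key=lambda m: m["metadata"]["name"]))
        for wave in sorted(waves)
    ]


def apply_in_waves(manifests, apply, wait_healthy):
    """Apply each wave, then block until it is healthy before the next one.

    `apply` and `wait_healthy` are hypothetical callbacks, not Argo CD APIs.
    """
    for _wave, batch in group_by_wave(manifests):
        for m in batch:
            apply(m)
        wait_healthy(batch)
```

Putting a CustomResourceDefinition in wave -1 and the resources that use it in wave 0 or later is the usual way this ordering avoids applying custom resources before their CRD exists.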
Every new microservice required DevOps coordination for manifests, databases, buckets, and secrets, which made DevOps a constant bottleneck. Built a simple YAML config per service declaring its full context boundary. A custom Kubernetes operator reconciled everything automatically: deployment, database, storage, secrets, networking. Developers gained full autonomy. DevOps kept guardrails.
Ingesting logs for terabytes of daily VPN traffic with an EFK stack on Kubernetes, where existing tooling couldn't throttle log ingestion without losing data. Designed a custom HA logging and monitoring setup and contributed the "Throttle" filter plugin upstream to Fluent Bit. It is still used in production systems today.
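The idea behind throttling a log stream by its average rate can be sketched as a sliding-window counter. This is a generic illustration, not the Fluent Bit plugin's C source. The rate, window, and interval parameters mirror the plugin's documented Rate, Window, and Interval settings, but the exact admission rule below is an assumption.

```python
from collections import deque


class SlidingWindowThrottle:
    """Admit records while the average count per interval stays at or under `rate`.

    `counts` holds one bucket per interval for the last `window` intervals;
    a record is dropped when admitting it would push the average above `rate`.
    """

    def __init__(self, rate, window, interval):
        self.rate = rate
        self.interval = interval
        self.counts = deque([0] * window, maxlen=window)
        self.current_slot = None

    def _advance(self, now):
        slot = int(now // self.interval)
        if self.current_slot is None:
            self.current_slot = slot
            return
        elapsed = slot - self.current_slot
        # One empty bucket per elapsed interval; maxlen evicts the oldest ones
        for _ in range(min(max(elapsed, 0), self.counts.maxlen)):
            self.counts.append(0)
        if elapsed > 0:
            self.current_slot = slot

    def allow(self, now):
        """Return True to keep the record at time `now`, False to drop it."""
        self._advance(now)
        if (sum(self.counts) + 1) / len(self.counts) > self.rate:
            return False
        self.counts[-1] += 1
        return True
```

Averaging over several intervals tolerates short bursts while still capping sustained throughput. In this sketch, records over the limit are simply dropped.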
A practical look at deploying Tetragon for runtime security: what syscall-level visibility looks like in a real Kubernetes cluster, and the false-positive problem nobody talks about.
// runtime-security · ebpf
Migrating GCP service account keys to Workload Identity Federation after a supply chain incident. The before/after, the gotchas, and a Terraform setup you can actually use.
// security · oidc · gcp
How sync waves solve the "CRD not ready" problem at scale. Practical patterns for App-of-Apps, what breaks first, and the monitoring setup that saves you at 2am.
// gitops · argocd · kubernetes
Building a proper home lab for testing security tooling: a QNAP NAS as a PXE server, UniFi networking, NetBird for remote access, and why the GL.iNet GL-RM10 is worth it.
// homelab · k3s · networking
The full runbook for migrating a production Kafka cluster through ZooKeeper deprecation to KRaft mode. Metadata version transitions, the rollback plan, and what the monitoring looks like mid-migration.
// kafka · strimzi · kubernetes
When you have 250+ environments across AWS and GCP, resource naming becomes a serious operational problem. A hash-based convention that makes debugging production logs take seconds instead of minutes.
// platform · multi-cloud · iac
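The teaser doesn't spell out the naming convention, so what follows is a hypothetical sketch of one hash-based scheme, not the convention from the article. It assumes each environment is identified by a (cloud, account, region, env) tuple. A short SHA-256 prefix of that tuple is appended to a readable service-env prefix, and the name is kept within the 63-character DNS-1123 label limit. A reverse index then maps a hash seen in a log line back to its environment.

```python
import hashlib
import re

MAX_LABEL = 63  # DNS-1123 label limit that many Kubernetes resource names follow


def env_hash(cloud, account, region, env, length=8):
    """Deterministic short hash identifying one environment (hypothetical scheme)."""
    key = f"{cloud}/{account}/{region}/{env}".encode()
    return hashlib.sha256(key).hexdigest()[:length]


def resource_name(service, cloud, account, region, env):
    """Build '<service>-<env>-<hash>' as a valid DNS-1123 label."""
    h = env_hash(cloud, account, region, env)
    prefix = re.sub(r"[^a-z0-9-]", "-", f"{service}-{env}".lower())
    # Truncate the readable part first so the hash suffix always survives
    prefix = prefix[: MAX_LABEL - len(h) - 1].strip("-")
    return f"{prefix}-{h}"


def build_index(environments):
    """Map hash -> environment tuple so a name seen in a log can be decoded."""
    index = {}
    for e in environments:
        h = env_hash(*e)
        # Fail loudly on a collision instead of silently mislabeling logs
        if h in index and index[h] != e:
            raise ValueError(f"hash collision for {h}: {index[h]} vs {e}")
        index[h] = e
    return index
```

With 8 hex characters (32 bits), a collision among a few hundred environments is very unlikely, and build_index raises rather than overwriting an entry if one ever occurs.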