Dmitriy Shmatov — Cloud & Kubernetes Infrastructure

// 01 — services

What I do

Kubernetes & Cloud Architecture

Designing and operating production-grade EKS and GKE clusters. Multi-account, multi-cloud, with Istio service mesh, RBAC hardening, and network policies that actually enforce boundaries.

EKS GKE Istio Cilium RBAC

Infrastructure as Code & GitOps

Multi-environment Terraform and Terragrunt setups with versioned module pipelines. ArgoCD App-of-Apps patterns, sync waves, and GitOps workflows where Git is the only way things change.

Terraform Terragrunt ArgoCD Helm SOPS

DevSecOps & Runtime Security

Supply chain hardening, OIDC-based CI/CD, runtime threat detection with Falco and Tetragon, WAF consolidation, and secrets management. Security baked in, not bolted on.

Falco Tetragon OIDC WAF eBPF

// 02 — case studies

Deep dives

2025 Security · Incident response

Supply chain attack: from compromised runner to hardened pipeline

A malicious package was injected via a compromised GitHub Actions runner, exposing secrets across AWS and GCP. Led the full incident response: forensic analysis of 58 findings, credential rotation, migration from static service account keys to OIDC, deployment of runtime threat detection, and a reusable playbook shared org-wide.

Zero confirmed breaches · Pipeline fully hardened · No downtime
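The main gain from moving off static keys is that CI credentials become short-lived and tied to a specific source. The CI job presents a GitHub Actions OIDC token, and the cloud side accepts it only for an allow-listed repository and branch. Below is a minimal Python sketch of that trust check.

The claim names (`iss`, `aud`, `repository`, `ref`) are GitHub's. The policy contents and the `is_trusted` helper are hypothetical. The sketch also assumes that signature, expiry, and key verification happen in the cloud provider (for example, a GCP Workload Identity Federation attribute condition or an AWS IAM trust policy), so it models only the condition logic.

```python
# Issuer used by GitHub Actions OIDC tokens.
GITHUB_ISSUER = "https://token.actions.githubusercontent.com"

# Hypothetical trust policy: (repository, ref) pairs allowed to obtain cloud credentials.
ALLOWED_SOURCES = {
    ("example-org/platform-infra", "refs/heads/main"),
}


def is_trusted(claims: dict, expected_audience: str) -> bool:
    """Return True if already-verified token claims satisfy the trust policy.

    Signature, expiry and key checks are assumed to be done upstream by the
    cloud provider; this only models the attribute condition.
    """
    if claims.get("iss") != GITHUB_ISSUER:
        return False
    # `aud` may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        return False
    return (claims.get("repository"), claims.get("ref")) in ALLOWED_SOURCES
```

Under this policy, a workflow running from a fork or a feature branch is refused. Any token that does leak expires quickly, unlike a key file that stays valid until someone rotates it.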

2024 Architecture · Multi-cloud

WAF & API gateway consolidation across 250+ environments

Fragmented WAF policies, inconsistent DNS, and per-service certificates made security audits painful. Designed a unified architecture behind Imperva WAF with DNS delegation via Terraform — enabling instant backend switching between AWS and GCP without DNS changes, eliminating certificate sprawl across all environments.

Single security perimeter · Instant multi-cloud failover · 100% managed as code

2022 Platform · GitOps

GitOps at scale: ArgoCD App-of-Apps across 250+ environments

Ad-hoc deployments caused config drift, inconsistent rollouts, and no audit trail for production changes. Designed a GitOps platform with ArgoCD App-of-Apps and sync waves for ordered deployment orchestration. Git became the single source of truth — no manual kubectl, rollbacks are a git revert.

Zero config drift · Full audit trail · Dramatically faster time-to-production
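Sync waves are driven by the `argocd.argoproj.io/sync-wave` annotation. Lower waves are applied first, unannotated resources default to wave 0, and Argo CD waits for a wave to become healthy before starting the next one.

Below is a minimal Python sketch of only the grouping step, under simplifying assumptions. It ignores sync phases (PreSync/Sync/PostSync hooks), the per-kind ordering Argo CD applies within a wave, and health checks. `plan_waves` is a hypothetical helper, not part of Argo CD.

```python
from collections import defaultdict

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Wave number of a manifest; resources without the annotation are in wave 0."""
    annotations = manifest.get("metadata", {}).get("annotations") or {}
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def plan_waves(manifests: list[dict]) -> list[tuple[int, list[str]]]:
    """Group manifests by wave, lowest wave first; input order is kept within a wave."""
    waves: dict[int, list[str]] = defaultdict(list)
    for m in manifests:
        waves[wave_of(m)].append(f"{m['kind']}/{m['metadata']['name']}")
    return [(wave, waves[wave]) for wave in sorted(waves)]


# Example: a CRD must exist before the custom resources that use it.
manifests = [
    {"kind": "Certificate",
     "metadata": {"name": "api-tls", "annotations": {WAVE_ANNOTATION: "1"}}},
    {"kind": "Deployment", "metadata": {"name": "cert-manager"}},
    {"kind": "CustomResourceDefinition",
     "metadata": {"name": "certificates.cert-manager.io",
                  "annotations": {WAVE_ANNOTATION: "-1"}}},
]

for wave, resources in plan_waves(manifests):
    print(wave, resources)
```

In this example, the CRD lands in wave -1, the controller in wave 0, and the custom resource in wave 1. In Argo CD, the health gate between waves keeps later waves from starting before earlier resources are ready.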

2019 Platform engineering · Kubernetes

Developer-first deployment platform with custom Kubernetes operator

Every new microservice needed DevOps coordination for manifests, databases, buckets, and secrets, which made DevOps a constant bottleneck. Built a simple YAML config per service declaring everything the service owns. A custom Kubernetes operator reconciled it automatically into a deployment, database, storage, secrets, and networking. Developers gained full autonomy; DevOps kept the guardrails.

Self-service deployments · Consistent resource ownership · Zero DevOps bottleneck
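The operator's core idea is a reconcile loop. It derives the full set of resources a service should own from its YAML config and compares that with what exists in the cluster. It then creates what is missing and prunes what is no longer declared.

Below is a minimal Python sketch of that diff step. The config schema, the `Database` and `Bucket` kinds, and the naming scheme are hypothetical. A real operator would also update drifted resources, set owner references, and requeue on errors.

```python
# Hypothetical per-service config, as a developer might declare it in YAML.
SERVICE_SPEC = {
    "name": "billing",
    "database": True,
    "bucket": True,
    "secrets": ["stripe-api-key"],
}

Resource = tuple[str, str]  # (kind, name)


def desired_resources(spec: dict) -> set[Resource]:
    """Everything the service should own, derived only from its config."""
    name = spec["name"]
    wanted = {("Deployment", name), ("Service", name), ("NetworkPolicy", name)}
    if spec.get("database"):
        wanted.add(("Database", f"{name}-db"))       # hypothetical kind
    if spec.get("bucket"):
        wanted.add(("Bucket", f"{name}-storage"))    # hypothetical kind
    for secret in spec.get("secrets", []):
        wanted.add(("Secret", f"{name}-{secret}"))
    return wanted


def reconcile(spec: dict, observed: set[Resource]) -> tuple[list[Resource], list[Resource]]:
    """Return (to_create, to_delete).

    `observed` must contain only resources owned by this service (e.g. selected
    by an ownership label), otherwise unrelated resources would be pruned.
    """
    desired = desired_resources(spec)
    return sorted(desired - observed), sorted(observed - desired)
```

Running `reconcile` again against the resulting state returns two empty lists. That idempotence is what lets developers edit only their YAML while the operator keeps the result consistent.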

2017 Open source · SRE

5–10TB/day: building HA logging at scale and contributing to FluentBit

Handling terabytes of VPN traffic logs per day with an EFK stack on Kubernetes, where existing tooling couldn't throttle log ingestion without losing data. Designed a custom HA logging and monitoring setup and contributed the "Throttle" filter plugin to FluentBit, which became part of the upstream project and is still used in production systems today.

Open source contribution · System stability at petabyte scale
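The upstream Fluent Bit `throttle` filter is configured with `Rate`, `Window`, and `Interval` settings. It limits the average number of records per interval over a sliding window.

Below is a minimal Python sketch of that sliding-window average idea. It is an illustration written for this page, not the plugin's C code, and the exact upstream semantics may differ. `SlidingWindowThrottle` is a hypothetical name. It takes timestamps as arguments so it can be tested deterministically, and it only makes the admission decision. What happens to a rejected record (drop, buffer, or retry) is left to the caller.

```python
from collections import deque


class SlidingWindowThrottle:
    """Admit records while the average count per interval over the last
    `window` intervals stays at or below `rate`."""

    def __init__(self, rate: int, window: int, interval: float) -> None:
        self.rate = rate
        self.window = window
        self.interval = interval
        self.buckets = deque()  # [interval_index, count] pairs, oldest first
        self.total = 0          # records admitted within the current window

    def allow(self, now: float) -> bool:
        """Decide whether a record arriving at `now` (seconds, non-decreasing) is admitted."""
        idx = int(now // self.interval)
        # Slide the window: forget intervals that are `window` or more intervals old.
        while self.buckets and self.buckets[0][0] <= idx - self.window:
            self.total -= self.buckets.popleft()[1]
        # Reject if admitting this record would push the average above `rate`.
        if (self.total + 1) / self.window > self.rate:
            return False
        if self.buckets and self.buckets[-1][0] == idx:
            self.buckets[-1][1] += 1
        else:
            self.buckets.append([idx, 1])
        self.total += 1
        return True
```

With `rate=2, window=3`, a burst of up to six records can be admitted within a single interval, because only the average over the window is limited. A smaller window trades burst tolerance for tighter limiting.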

// 03 — tooling

Stack & tooling

Cloud & Orchestration

AWS (EKS, EC2, S3, RDS, IAM)
GCP (GKE, GCS, IAM, Pub/Sub)
Kubernetes (since 2015)
Docker (since 2013)
Istio service mesh
Cilium · eBPF

IaC & GitOps

Terraform · Terragrunt
ArgoCD · Flux
Helm · Kustomize
GitHub Actions · OIDC
SOPS · KMS
External Secrets Operator

Security & Observability

Falco · Tetragon
KubeArmor
Imperva WAF
AWS GuardDuty · Security Hub
Prometheus · Grafana
Loki · FluentBit

Messaging & Data

Kafka (Strimzi · KRaft)
PostgreSQL · Redis
Kong API Gateway
Bash · Python · Go
NetBird · WireGuard
k3s · home lab

// 04 — lab notes

Writing & experiments

Coming soon

eBPF tracing in production: what Tetragon actually sees

A practical look at deploying Tetragon for runtime security — what syscall-level visibility looks like in a real Kubernetes cluster, and the false positive problem nobody talks about.

// runtime-security · ebpf

Coming soon

OIDC for CI/CD: why static keys are a disaster waiting to happen

Migrating GCP service account keys to Workload Identity Federation after a supply chain incident. The before/after, the gotchas, and a Terraform setup you can actually use.

// security · oidc · gcp

Coming soon

ArgoCD sync waves: ordering chaos in 250+ environments

How sync waves solve the "CRD not ready" problem at scale. Practical patterns for App-of-Apps, what breaks first, and the monitoring setup that saves you at 2am.

// gitops · argocd · kubernetes

Coming soon

Home lab: 5-node k3s cluster with NetBird and PXE boot

Building a proper home lab for testing security tooling — QNAP NAS as PXE server, UniFi networking, NetBird for remote access, and why the GL.iNet GL-RM10 is worth it.

// homelab · k3s · networking

Coming soon

Kafka KRaft migration on Strimzi: Kafka 3.7 → 4.1 without downtime

The full runbook for migrating a production Kafka cluster through ZooKeeper deprecation to KRaft mode. Metadata version transitions, the rollback plan, and what the monitoring looks like mid-migration.

// kafka · strimzi · kubernetes

Coming soon

Multi-cloud naming strategy: deterministic hashes at 250+ envs

When you have 250+ environments across AWS and GCP, resource naming becomes a serious operational problem. A hash-based convention that makes debugging production logs take seconds instead of minutes.

// platform · multi-cloud · iac
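One way such a deterministic naming convention could work is to combine a readable prefix for humans with a short SHA-256 digest of the resource's full identity (cloud, account, environment, component). The same inputs then always produce the same name, and identical-looking environments in different accounts do not collide in practice. Below is a minimal Python sketch of that approach.

The field set, the `|` separator, the 8-character digest, and the `resource_name` helper are assumptions, not the convention the article describes. The 63-character cap is the DNS-label limit that applies to, for example, Kubernetes Service names and S3 bucket names.

```python
import hashlib
import re

MAX_LEN = 63  # DNS-label limit (e.g. Kubernetes Service names, S3 buckets)


def resource_name(prefix: str, cloud: str, account: str, env: str,
                  component: str, hash_len: int = 8) -> str:
    """Deterministic, readable, length-bounded name for a cloud resource."""
    # "|" keeps field boundaries unambiguous: ("a-b", "c") and ("a", "b-c") hash differently.
    identity = "|".join((cloud, account, env, component))
    digest = hashlib.sha256(identity.encode("utf-8")).hexdigest()[:hash_len]
    # Human-readable part: lowercase, runs of non-alphanumerics collapsed to one hyphen.
    readable = re.sub(r"[^a-z0-9]+", "-", f"{prefix}-{env}-{component}".lower()).strip("-")
    # Truncate the readable part, never the digest, so uniqueness survives long names.
    readable = readable[: MAX_LEN - hash_len - 1].rstrip("-")
    return f"{readable}-{digest}"
```

Grepping logs for the digest finds the resource regardless of which cloud or account emitted the line. The readable part tells you the environment and component at a glance.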

Helping teams ship safer, more resilient infrastructure
