At SREcon19 Americas, I gave a talk called "Operating within Normal Parameters: Monitoring Kubernetes". I also reprised this talk at the Cloud Native PDX meetup in October 2019 and the Portland DevOps meetup in May 2020. Here are some links and resources related to the talk, for your reference.
Operating within Normal Parameters: Monitoring Kubernetes
- Talk slides (pdf download)
- Talk video, hosted on YouTube
- Try it yourself: sample code on GitHub
- Prometheus documentation
- Grafana documentation (for dashboards and visualization)
Additional Prometheus metrics sources
I'm including these documents as reference, to add some context around what's happening (as of Q1 2019) in Kubernetes SIG Instrumentation and the wider ecosystem.
Note that GitHub links are pinned to the most recent commit at the time of writing so that they will not break; if you want the latest version, switch the branch to "master".
- SIG Instrumentation Meeting Minutes (note: you must join the Google Group to be able to access these)
- Kubernetes 1.14 metrics overhaul (KEP-0031)
- Core Metrics proposal
- Kubelet Resource Metrics (formerly "Core Metrics") Endpoint proposal
- Kubernetes monitoring architecture
- Kubernetes instrumentation guidelines