Kubernetes is a platform that's constantly evolving. That's why I occasionally take a fresh look at features I haven't worked with for a few years. I recently went through this exercise with Horizontal Pod Autoscalers (HPAs) and noticed some features and limitations that are useful to know about, but aren't well documented or obvious.
A "gotcha" worth noting is that HPA's apiVersion: autoscaling/v2 implies that HPAs are a mature API. In reality, scaling based on custom and external metrics is still a bit of an untamed frontier that could use improvement. This is unfortunate, as custom metrics tend to be more accurate and useful than CPU- and RAM-based autoscaling.
Let's explore a few use cases where custom metrics improve scalability, and the tools to make it happen.
Use Cases
- Once custom metrics have been set up and presented in an HPA-friendly format, HPAs can scale based on multiple metrics. The Kubernetes docs give a partial example of this, where an HPA is configured to scale based on CPU utilization, packets per second and requests per second. The idea is that each metric can suggest a different number of desired replicas, such as 3, 5 and 8, and the HPA scales to the highest suggested count.
- The Kube Prometheus Stack Helm chart is one of the most popular and well-supported cloud-agnostic solutions for providing custom metrics. The stack contains several apps; Prometheus Adapter for Kubernetes Metrics APIs is the component that converts and publishes Prometheus metrics in a format that HPAs can understand and use.
- Incoming requests per second and request duration/latency are solid metrics for scaling web services.
- Architectures with queues or storage buckets, where objects to be processed are uploaded, can autoscale the services that process those objects based on the number of items waiting.
- Kubernetes Event-Driven Autoscaling (KEDA) has scalers that can interface with several queues (like Pub/Sub) and object storage buckets. It also supports running queries against metric, log, SQL and NoSQL databases to generate metrics.
- Variable minimum replicas can help balance the ability to support traffic spikes against cost. If you've ever supported an app whose replicas are slow to start, or one that experiences massive traffic spikes, you've probably run into the need to increase your minimum replicas to maintain quality of service.
- For example: If one replica can support 100 reqs/sec, it could be configured to autoscale at 50 reqs/sec to avoid errors or lag seen above 100 reqs/sec. Having a minimum of 10 replicas would support traffic spikes better than a minimum of two.
- The only problem with this technique is that costs go up. But it's reasonable to think an app might have semi-predictable traffic spikes: maybe a minimum of two replicas would work outside of normal business hours, a minimum of 10 during hours when spikes are common, and a minimum of five at other times.
- KEDA recently merged an option to plug multiple metrics into a user-customized math formula, creating composite metrics. So a formula like desired count = autoscaling metric + cron-based desired count can create the effect of a variable minimum replica count.
- Serverless and functions-as-a-service platforms hosted on Kubernetes, like the following:
- keda.sh
- knative.dev
- openfaas.com
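The multi-metric pattern from the first bullet can be sketched as a single autoscaling/v2 manifest. This is a minimal sketch modeled on the partial example in the Kubernetes docs; the Deployment name, Ingress name and target values are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app            # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Resource metric: served by metrics-server
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  # Pods metric: requires a custom metrics server (e.g., Prometheus Adapter)
  - type: Pods
    pods:
      metric:
        name: packets_per_second
      target:
        type: AverageValue
        averageValue: 1k
  # Object metric: measured on an Ingress rather than on the pods
  - type: Object
    object:
      metric:
        name: requests_per_second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
```

Each metric yields its own suggested replica count, and the HPA controller scales to the highest of the three.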
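To give a feel for what Prometheus Adapter does, here is a sketch of one of its discovery rules, which turn Prometheus series into named custom metrics the HPA can query. The series and label names are hypothetical; the rule structure follows the adapter's documented config format (under the Helm chart, rules like this typically live in the chart's values):

```yaml
# Expose a Prometheus counter as a per-pod custom metric:
# http_requests_total becomes "http_requests_per_second".
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

An HPA can then reference http_requests_per_second as a Pods-type metric.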
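The variable-minimum-replicas idea can also be sketched with KEDA, even without the composite-formula feature: KEDA (like the HPA itself) scales to the highest count any trigger suggests, so a cron trigger acts as a time-based floor while a load-based trigger handles spikes. Everything below — names, the Prometheus address, the query, the schedule — is a hypothetical illustration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web-app            # hypothetical
spec:
  scaleTargetRef:
    name: web-app          # hypothetical Deployment
  minReplicaCount: 2       # absolute floor outside business hours
  maxReplicaCount: 50
  triggers:
  # Load-based scaling: roughly one replica per 50 req/sec
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      query: sum(rate(http_requests_total[2m]))
      threshold: "50"
  # Time-based floor: at least 10 replicas on weekdays, 8:00-18:00
  - type: cron
    metadata:
      timezone: America/New_York
      start: 0 8 * * 1-5
      end: 0 18 * * 1-5
      desiredReplicas: "10"
```

During business hours the cron trigger keeps at least 10 replicas running; at all other times the minimum falls back to 2 and the Prometheus trigger scales on load alone.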
Limitations and Consequences
All of the above sounds good, and we have the tooling to pull it off. So what's the problematic limitation that's preventing HPAs from being great?
The following limitation results in significant UX (user experience) consequences: https://github.com/kubernetes-sigs/custom-metrics-apiserver/issues/70
The limitation, summarized, is "there can only be one custom metrics server." If you install the popular kube-prometheus-stack with default settings, you'll get Prometheus Adapter. You then can't use KEDA's keda-operator-metrics-apiserver, Knative's Knative Pod Autoscaler, OpenFaaS Pro's autoscaler, Datadog's autoscaler or others, because they all work by hosting a custom metrics server. I suspect this is why KEDA has enough scalers to look like the poster child of feature creep: KEDA has over 60 scalers, including ones for Prometheus and Datadog. That redundancy doesn't make sense until you realize it partly exists to work around the limitation.
So why does this limitation degrade the UX? Let's start with an example of what good UX looks like.
The Kube Prometheus Stack and Helm are popular for a reason. They offer a great UX, and part of their secret sauce is following the concept of "convention over configuration." They can offer sensible default values and a default configuration in which hundreds to thousands of YAML objects are pre-wired together according to conventions, achieving a turnkey UX where a lot of stuff works out of the box.
A prerequisite for making that magical UX happen is owning a namespace that won't conflict with other things: you can then establish conventions in your namespace so components can be pre-wired together without handcrafted configuration.
Final Thoughts
Multiple proofs of concept (PoCs) have been made for this common problem, and a Kubernetes enhancement proposal was even created to solve it, but that fizzled out. It's my hope that this article shines a light on the problem and brings some renewed interest to custom metric scaling APIs.