Vertical Pod Autoscaler
A resource request is a contract between your workload and the Kubernetes scheduler.
Monitoring tools for your Kubernetes clusters can help by giving you insight into your application’s memory usage. Complementary to this, running kubectl top pods can give more insight too. In the meantime, you could run load tests to simulate heavier usage of your applications: the closer you get to what you would see in production under high traffic, the better.
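As a rough illustration of the kubectl top approach mentioned above, here is a small helper that lists per-container usage sorted by memory. The function name top_by_memory and its namespace argument are my own assumptions for this sketch, and it only works if the Metrics API (metrics-server) is available in the cluster.

```shell
# Hypothetical helper: per-container CPU/memory usage in a namespace,
# sorted by memory. Needs the Metrics API to be available in the cluster.
# $1 = namespace (defaults to "default")
top_by_memory() {
  kubectl top pods --namespace "${1:-default}" --containers --sort-by=memory
}

# Usage (against a real cluster):
#   top_by_memory default
```

Running it a few times during a load test gives a first feel for how usage moves under traffic.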
I recently discovered that the Vertical Pod Autoscaler (VPA) is better suited to this, since that’s exactly its job.
Setting resource requests and limits is hard, and VPA is here to help. It observes usage, recommends resources and updates resources (if
Let’s now see in action how easy it is to leverage VPA on a GKE cluster.
First enable VPA:
# For a new cluster
gcloud container clusters create CLUSTER_NAME --enable-vertical-pod-autoscaling
# For an existing cluster
gcloud container clusters update CLUSTER_NAME --enable-vertical-pod-autoscaling
Then deploy a VPA resource for your specific application:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myblog
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myblog
  updatePolicy:
    updateMode: "Off"
And finally, after some time and some load tests against your application, you can see the recommendations by running kubectl describe vpa myblog:
...
Recommendation:
  Container Recommendations:
    Container Name: myblog
    Lower Bound:
      Cpu: 3m
      Memory: 4194304
    Target:
      Cpu: 6m
      Memory: 5242880
    Uncapped Target:
      Cpu: 6m
      Memory: 5242880
    Upper Bound:
      Cpu: 6m
      Memory: 5242880
...
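If you would rather read a single value than scan the whole describe output, a jsonpath query against the VPA status can do it. This is a minimal sketch: the helper name vpa_recommendation is hypothetical, and it assumes the recommendation for your container is the first entry in the list.

```shell
# Hypothetical helper: print one field of the VPA recommendation
# for the first container in the list.
# $1 = VPA name
# $2 = bound: lowerBound | target | uncappedTarget | upperBound
# $3 = resource: cpu | memory
vpa_recommendation() {
  kubectl get vpa "$1" \
    -o jsonpath="{.status.recommendation.containerRecommendations[0].$2.$3}"
}

# Usage (against a real cluster):
#   vpa_recommendation myblog target memory
```

The field names mirror the sections of the describe output (Lower Bound, Target, Uncapped Target, Upper Bound), written in camelCase as they appear in the VPA status.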
In this illustration, I’m using updateMode: "Off", which means I need to manually apply those numbers on my Deployment.
Lower Bound could be used to set the requests numbers (but you may want to use Target to be more conservative). Upper Bound could be used to set the limits numbers. So with the numbers obtained above, here is what the Deployment manifest will look like:
...
resources:
  requests:
    cpu: 3m
    memory: 4Mi
  limits:
    cpu: 6m
    memory: 5Mi
...
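The memory values in the recommendation are expressed in bytes, while the manifest uses Mi (1 Mi = 1048576 bytes). The conversion can be sketched as below; bytes_to_mi is a hypothetical helper of mine, and rounding up is my own choice so the value never lands below what VPA recommended.

```shell
# Convert a byte count to a Kubernetes "Mi" quantity, rounding up.
# $1 = memory in bytes
bytes_to_mi() {
  echo "$(( ($1 + 1048575) / 1048576 ))Mi"
}

bytes_to_mi 4194304   # prints 4Mi (the Lower Bound above)
bytes_to_mi 5242880   # prints 5Mi (the Upper Bound above)
```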
Notes: those numbers could be applied automatically and continuously if you use updateMode: "Auto" instead. Furthermore, there are also some known limitations with VPA to be aware of.
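To make the switch between the two modes concrete, here is a small sketch that prints the same VPA manifest with the update mode as a parameter. The helper name vpa_manifest is hypothetical, and it assumes the VPA and the Deployment share the same name, as in the myblog example.

```shell
# Hypothetical helper: print a VPA manifest targeting a Deployment.
# $1 = Deployment (and VPA) name
# $2 = update mode, for example "Off" or "Auto" (defaults to "Off")
vpa_manifest() {
  cat <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: $1
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: $1
  updatePolicy:
    updateMode: "${2:-Off}"
EOF
}

# Usage (against a real cluster):
#   vpa_manifest myblog Auto | kubectl apply -f -
```

Defaulting to "Off" keeps the helper in recommendation-only mode unless you explicitly ask for automatic updates.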
Complementary and further resources:
- VPA on GKE
- Autoscaling with GKE: Overview and pods
- Best practices for running cost-optimized Kubernetes applications on GKE
- Kubernetes Autoscaling 101: Cluster Autoscaler, Horizontal Autoscaler, and Vertical Pod Autoscaler
Hope you enjoyed this blog article and that you are now better equipped to properly set Kubernetes resource requests and limits for your own applications.