[Feature Request] Stagger pod start-up
I have no idea if it's something that can be done easily or at all, so here it is:
We use the default configuration and set an uptime range, so our pods go to sleep at 9pm and wake up at 7am.
When it's time to scale all the pods back up (at 7am), our servers cannot cope with the number of pods starting: the CPU required to start many pods at once is huge, AND the cluster was downscaled overnight.
It would help to have a configuration option that lets us set something like:
"Do not upscale more than 5 pods per minute" (or "5 deployments per minute", if that makes things easier?)
Thanks for your help !
Can you share more details on why the upscale causes issues? This looks like a problem with your setup, e.g. we (Zalando) frequently scale from something like 10 to 200 replicas which takes some time (cluster autoscaler has to provision nodes), but causes no issues.
Yes, this is a problem with our setup:
we use Java, so pod startup smashes the CPU of the EC2 servers.
This creates a feedback loop: pods take longer to start, so their probes fail, they get killed and start up again (thus using even more CPU).
There are several ways we could work around this, like:
- setting a lower CPU limit for each pod, but then they'd take longer to start up even under normal circumstances (and they would not be able to burst when needed);
- increasing the probe delays, but then it would take longer to detect failures.
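As an aside on the probe-delay trade-off, a Kubernetes `startupProbe` can tolerate slow JVM startup without loosening failure detection afterwards. A rough sketch, not from the original discussion; the container name, endpoint, and thresholds are made-up examples:

```yaml
# Hypothetical pod spec fragment: the startupProbe allows up to
# 30 * 10s = 5 min for the JVM to come up; once it succeeds, the
# livenessProbe takes over with its normal short cadence.
containers:
  - name: my-java-app          # made-up name
    startupProbe:
      httpGet:
        path: /health          # assumed health endpoint
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
```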
That's why I thought staggered pod upscaling could be a good idea, but I agree the issue stems from our setup.
PS: this is for our "dev" cluster anyway, so it's not a huge issue.
We would also face heavy load spikes when scaling around 250-700 pods (16-20 namespaces) back up at the same time.
I was also reading the code: it iterates over resource types first, then over namespaces, so I think staggering per namespace (1 NS every 2-3 min) is not easy to implement without switching this logic.
In our setup, 1 NS = 1 review system. But StatefulSets and Deployments sometimes have start-up dependencies, so scaling up all Deployments in a staggered way while they sit waiting for some StatefulSets to become reachable would/could mean no benefit at all.
My solution for now is to add a namespace annotation with a random minute value for each NS, so the load is better spread and kube-downscaler can stay as it is.
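That spreading idea can be sketched as a small script that emits per-namespace `downscaler/uptime` annotations with staggered wake-up minutes. The namespace names and the 2-minute step are made-up assumptions; only the annotation key and its schedule format come from kube-downscaler:

```python
# Sketch: stagger each namespace's wake-up time so not all pods
# start at 07:00 sharp. Prints kubectl commands rather than applying
# them, so the output can be reviewed first.
namespaces = ["review-1", "review-2", "review-3", "review-4"]  # made-up list
STEP_MIN = 2  # assumed gap (minutes) between namespace wake-ups

for i, ns in enumerate(namespaces):
    start_min = (i * STEP_MIN) % 60
    uptime = f"Mon-Sun 07:{start_min:02d}-21:00 UTC"
    print(f"kubectl annotate namespace {ns} --overwrite "
          f"downscaler/uptime='{uptime}'")
```

Using a deterministic offset per namespace (instead of a random minute) keeps the schedule reproducible across re-runs.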