Fixing bad CPU usage distribution in Kubernetes
It all started with that smile, that damned smile, that promise of better performance. You know, the thing they promise you when you enable HTTP keep-alive (persistent connections). While that is true for the most part, nothing comes for free.
The price of keep-alive in Kubernetes is that it renders the IPVS/iptables-based load balancing nearly useless: kube-proxy picks a backend pod only when a connection is opened, not for every request. To explain that, let’s take a look at the following example:
Service A (pod A1) calls service B at http://service-b:80. Pod A1 establishes a TCP connection through IPVS, which routes it to pod B1. Pod A1 is now connected to pod B1, and all requests between them will use that same connection.
As time goes by, service A generates more requests to service B and pod B1’s CPU utilization climbs. Being responsible professionals, we already have HPA in place, so pod B2 is created and all is good in the world! Right?
WRONG!
All of service A’s pods are still talking to the same old pod of service B. Nothing triggers them to drop their existing connections and open new ones to the new pod.
Now that we understand the problem, let’s talk solutions:
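To make the effect concrete, here is a toy simulation of connection-level balancing. It is only a sketch: the simulate function, its parameters and the round-robin choice are invented for illustration and are not real Kubernetes or IPVS code. A backend is picked only when a connection is opened, and B2 joins partway through.

```javascript
// Toy model: the balancer picks a backend only when a connection is opened,
// never per request. All names here are hypothetical.
function simulate({ clients, requestsPerClient, scaleUpAt, maxRequestsPerConnection }) {
  const backends = ['B1'];           // B2 is added once the "HPA" scales up
  const handled = { B1: 0, B2: 0 };  // requests served by each pod
  let nextBackend = 0;               // round-robin pointer
  const connections = new Array(clients).fill(null); // one connection per client

  for (let i = 0; i < requestsPerClient; i++) {
    if (i === scaleUpAt) backends.push('B2'); // second pod becomes available

    for (let c = 0; c < clients; c++) {
      let conn = connections[c];
      // A limit of 0 means "never recycle the connection".
      const expired = conn !== null &&
        maxRequestsPerConnection > 0 &&
        conn.used >= maxRequestsPerConnection;

      if (conn === null || expired) {
        // Only here does the balancer get a chance to choose a pod.
        conn = { backend: backends[nextBackend++ % backends.length], used: 0 };
        connections[c] = conn;
      }
      conn.used++;
      handled[conn.backend]++;
    }
  }
  return handled;
}

const base = { clients: 2, requestsPerClient: 1000, scaleUpAt: 100 };
console.log('no limit: ', simulate({ ...base, maxRequestsPerConnection: 0 }));
console.log('limit 100:', simulate({ ...base, maxRequestsPerConnection: 100 }));
```

Without a limit, B2 never receives a single request, because no client ever opens a new connection. With a cap of 100 requests per connection, clients reconnect periodically and the new pod starts taking its share.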
A. Service mesh
Introducing a service mesh such as Istio/Linkerd/Consul Mesh/etc. would solve this problem completely, since its sidecar proxies balance individual requests rather than connections, but again, at a cost. Maintaining and scaling a service mesh takes intimate know-how and a lot of hard work, and it brings its own caveats, such as changing HPA scaling policies, init container ordering, graceful shutdown and more.
B. Limit connection time
When you think about it, it is weird that we don’t see this issue when using a load balancer such as AWS ELB… So how does it solve it? If we take a little dive into NGINX, on which ELB is reportedly based, we’ll see this little gem: by default it caps how many HTTP/1.1 requests it serves over a single keep-alive connection (the keepalive_requests directive, historically 100), after which the connection is closed. A similar cap exists for HTTP/2.
Now how hard can it be to implement it in NodeJS as a fastify middleware?
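One way this could look is sketched below. It is not the exact snippet, just a rough illustration of the idea: limitRequestsPerConnection is a hypothetical helper name, the default of 100 mirrors the NGINX figure above, and the approach assumes HTTP/1.1, where a Connection: close response header makes Node close the socket after the response. The function has the shape of a fastify onRequest hook (request, reply, done).

```javascript
// Hypothetical helper: builds a fastify-style onRequest hook that ends a
// keep-alive connection once it has served `maxRequests` requests.
function limitRequestsPerConnection(maxRequests = 100) {
  // WeakMap keyed by socket, so counters disappear together with the socket.
  const served = new WeakMap();

  return function onRequest(request, reply, done) {
    const socket = request.raw.socket;       // underlying TCP socket
    const count = (served.get(socket) || 0) + 1;
    served.set(socket, count);

    if (count >= maxRequests) {
      // Tell the client this is the last response on this connection.
      // Its next request needs a new connection, which gives the
      // IPVS/iptables balancer a chance to pick a different pod.
      reply.header('Connection', 'close');
    }
    done();
  };
}

// Registration, assuming fastify is installed in the project:
// const fastify = require('fastify')();
// fastify.addHook('onRequest', limitRequestsPerConnection(100));
```

Counting per socket keeps the hook stateless from the application's point of view, and the client simply reconnects on its next request.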
With just 17 lines of code, the impact on CPU is this:
A few things are worth mentioning:
1) Some web servers already support this functionality, so check the documentation.
2) Some HTTP clients have similar functionality, in the form of a max connection time / max requests per connection setting.
3) This feature was added to NodeJS 16 (server.maxRequestsPerSocket, available since v16.10).
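For the built-in NodeJS option, a minimal sketch could look like the following. createLimitedServer, the response body and the port in the comment are made up for illustration; server.maxRequestsPerSocket is the real property on Node's http server. As far as the Node.js docs describe it, once the limit is reached Node sets Connection: close on the response, and a value of 0 disables the limit.

```javascript
const http = require('http');

// Hypothetical factory: an HTTP server that allows at most `maxRequests`
// requests on each keep-alive connection (needs Node.js 16.10 or newer).
function createLimitedServer(maxRequests = 100) {
  const server = http.createServer((req, res) => {
    res.end('hello from service B\n');
  });

  // 0 (the default) means unlimited requests per connection.
  server.maxRequestsPerSocket = maxRequests;
  return server;
}

// Example usage (port is an arbitrary choice):
// createLimitedServer(100).listen(8080);
```

If the web framework exposes the underlying Node server, the same property can usually be set there instead of writing a custom hook.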
Conclusion:
In a high-scale environment with HPA-enabled services, you don’t have to go for the win-all solution of a service mesh if the only problem you are trying to solve is CPU distribution between pods. It can be achieved easily and safely with simpler methods.