F5 is expanding its Application Delivery and Security platform with new capabilities designed for AI workloads running in Kubernetes environments. A key element is the new BIG-IP Next for Kubernetes module, which, in combination with the NVIDIA BlueField-3 DPU and the NVIDIA DOCA software framework, provides more efficient traffic management and security for AI applications.
F5's new offering is not just another load-balancing tool; it is a step towards next-generation AI infrastructure. The solution, validated in testing by Sesterce, shows that dynamic load balancing allows for better GPU utilisation even as data and query volumes grow, and thus lower costs and improved quality of service.
In practice, this means intelligently routing queries to large language models (LLMs) according to their complexity and the available resources: simple tasks can be handled by lighter models, while complex ones go to larger instances. This flexible routing not only improves response times but also lets individual models specialise in different subject domains.
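F5 has not published the routing logic itself; purely as an illustration of the idea, a minimal sketch of complexity-based model routing could look like the following (the model tiers, thresholds and helper names are all hypothetical):

```python
# Illustrative sketch of complexity-based LLM routing. The actual logic F5
# runs on the BlueField-3 DPU is not public; every name below is hypothetical.

MODEL_TIERS = [
    # (upper complexity bound, model endpoint)
    (0.3, "llm-small:8000"),   # light model for simple prompts
    (0.7, "llm-medium:8000"),  # mid-size model
    (1.0, "llm-large:8000"),   # largest model for the hardest prompts
]

def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and analytical keywords score higher."""
    score = min(len(prompt) / 2000, 1.0)
    if any(kw in prompt.lower() for kw in ("explain", "analyse", "prove", "code")):
        score = min(score + 0.4, 1.0)
    return score

def route(prompt: str) -> str:
    """Return the lightest endpoint whose complexity bound covers the prompt."""
    score = estimate_complexity(prompt)
    for bound, endpoint in MODEL_TIERS:
        if score <= bound:
            return endpoint
    return MODEL_TIERS[-1][1]  # unreachable while the last bound is 1.0

print(route("What time is it in Warsaw?"))                 # llm-small:8000
print(route("Explain and prove the CAP theorem in depth"))  # llm-medium:8000
```

A production router would of course weigh live signals such as queue depth and GPU occupancy rather than a keyword heuristic; the sketch only shows the tiering principle.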
Working with NVIDIA allows F5 to offload some operations from the CPU directly to the BlueField-3 DPU, reducing latency and freeing up valuable server resources. The KV Cache Manager also plays a key role: in conjunction with NVIDIA Dynamo, it allows previously processed data to be reused, speeding up AI systems and reducing GPU memory consumption.
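Neither vendor has published the KV Cache Manager's internals. Conceptually, KV-cache reuse means that the attention keys and values computed for a shared prompt prefix are stored once and looked up on subsequent requests instead of being recomputed; a hedged sketch of that idea, with all names hypothetical:

```python
# Conceptual sketch of prefix-based KV-cache reuse (illustrative only; not
# the actual implementation of NVIDIA Dynamo or F5's KV Cache Manager).
from hashlib import sha256

kv_cache: dict[str, list] = {}  # hash of token prefix -> cached keys/values

def prefix_key(tokens: list[int]) -> str:
    """Derive a stable cache key from a token prefix."""
    return sha256(repr(tokens).encode()).hexdigest()

def get_or_compute_kv(tokens: list[int], compute_kv) -> list:
    """Return cached keys/values for a seen prefix, else run prefill once."""
    key = prefix_key(tokens)
    if key not in kv_cache:
        kv_cache[key] = compute_kv(tokens)  # expensive prefill on cache miss
    return kv_cache[key]                    # cheap lookup on every reuse

# Usage: two requests sharing the same system-prompt prefix hit the cache.
shared_prefix = [101, 2023, 2003]  # e.g. a tokenised system prompt
kv = get_or_compute_kv(shared_prefix, lambda t: [f"kv({tok})" for tok in t])
kv_again = get_or_compute_kv(shared_prefix, lambda t: 1 / 0)  # never called
assert kv is kv_again
```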
“Enterprises are deploying a growing number of LLMs for increasingly complex AI applications, but classifying and routing LLM traffic can be computationally expensive and degrade the user experience,” points out Kunal Anand, Chief Innovation Officer at F5. “By implementing routing logic directly on the NVIDIA BlueField-3 DPU, F5 BIG-IP Next for Kubernetes is the most efficient way to deliver and secure such traffic. And this is just the beginning. Our platform opens up new possibilities for AI infrastructure, and we look forward to deepening our partnership with NVIDIA as we scale enterprise AI applications.”
Importantly, the new F5 module supports the Model Context Protocol (MCP), an open standard from Anthropic, securing the LLM servers that expose this protocol and enabling faster adaptation to changing requirements.
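For orientation, MCP is built on JSON-RPC 2.0, so the traffic the module has to secure consists of messages like the client's opening handshake below (the field values are illustrative and not specific to the F5 module):

```python
# Illustrative MCP "initialize" request; MCP is a JSON-RPC 2.0 protocol.
import json

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # an MCP specification revision
        "capabilities": {},               # features this client supports
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}
print(json.dumps(initialize_request, indent=2))
```

Because the protocol is structured this way, a delivery layer can classify and inspect MCP calls much as it does any other API traffic.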
For IT service providers and integrators, this is concrete news: F5 and NVIDIA now offer tools that genuinely optimise AI infrastructure, and they are already commercially available. In a world where every millisecond of processing comes at a price, that is an advantage that is hard to ignore.
“BIG-IP Next for Kubernetes, supported by the NVIDIA BlueField-3 DPU, allows companies and service providers to better manage traffic in AI environments, optimising GPU performance and reducing processing time during inference, model training and the deployment of AI systems,” says Ash Bhalgat, Senior Director of AI Networking and Security Solutions, Ecosystem and Marketing at NVIDIA. “In addition, F5's multi-tenancy support and iRules programmability create a platform well positioned for further integration and development, such as support for the distributed KV Cache Manager in NVIDIA Dynamo.”