High HTTP Scaling with Azure Functions Flex Consumption: Q&A with Thiago Almeida and Paul Batum

Microsoft has introduced a significant enhancement to its Azure Functions platform with the Flex Consumption plan, designed to handle high HTTP scale efficiently. This new plan supports customizable per-instance concurrency, allowing users to achieve high throughput while managing costs effectively. In practical tests, Azure Functions Flex demonstrated the ability to scale from zero to 32,000 requests per second (RPS) in just seven seconds.
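Little's law gives a rough way to see why per-instance concurrency matters for cost. The number of requests in flight equals the arrival rate times the average latency, and each instance can hold up to its concurrency setting of them. The sketch below is only an illustration. It is not the Flex Consumption scaling algorithm, which the article does not describe, and the 100 ms average latency is a hypothetical assumption.

```python
import math


def instances_needed(target_rps: int, avg_latency_ms: int, per_instance_concurrency: int) -> int:
    """Rough steady-state instance estimate via Little's law (illustrative only).

    Requests in flight = arrival rate x average latency; each instance is
    assumed to serve up to `per_instance_concurrency` of them at once.
    """
    if target_rps < 0 or avg_latency_ms <= 0 or per_instance_concurrency < 1:
        raise ValueError("rps must be >= 0, latency > 0 and concurrency >= 1")
    in_flight = target_rps * avg_latency_ms / 1000  # concurrent requests
    return math.ceil(in_flight / per_instance_concurrency)


if __name__ == "__main__":
    # Hypothetical workload: 32,000 RPS with an assumed 100 ms average latency.
    for concurrency in (1, 16, 32):
        count = instances_needed(32_000, 100, concurrency)
        print(f"concurrency={concurrency:>2}: ~{count} instances")
```

Under that assumed latency, raising concurrency from 1 to 16 shrinks the estimate from 3,200 instances to 200. Paul Batum describes later in the interview whether a given workload can actually tolerate the higher setting.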

The Flex Consumption plan currently supports two memory sizes, 2,048 MB and 4,096 MB, with more sizes expected in the future. It also includes features to reduce cold starts and lets users protect downstream components by adjusting the maximum instance count. The plan aims to offer a robust and flexible solution for varying load requirements, from retail flash sales to large-scale data processing.
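Setting a maximum instance count caps how much concurrent load the app can push onto a downstream dependency. The sketch below assumes that each in-flight request holds exactly one downstream call, such as one database connection. `max_instance_cap` is a hypothetical helper and the 1,000-connection limit is made up for illustration. Check any computed value against the instance-count range the plan's documentation allows.

```python
def max_instance_cap(downstream_max_concurrent: int, per_instance_concurrency: int) -> int:
    """Largest maximum-instance-count whose worst case fits a downstream limit.

    Worst-case concurrent downstream calls = instances x per-instance concurrency,
    assuming (hypothetically) one downstream call per in-flight request.
    """
    if per_instance_concurrency < 1:
        raise ValueError("per-instance concurrency must be >= 1")
    if downstream_max_concurrent < per_instance_concurrency:
        raise ValueError("downstream limit is below a single instance's concurrency")
    return downstream_max_concurrent // per_instance_concurrency


if __name__ == "__main__":
    # Hypothetical downstream limit of 1,000 concurrent connections.
    for concurrency in (16, 32):
        cap = max_instance_cap(1_000, concurrency)
        print(f"concurrency {concurrency}: max instances <= {cap}")
```

A concurrency of 16 yields a cap of 62 instances and a concurrency of 32 yields 31. Doubling per-instance concurrency halves the safe instance count for the same downstream limit.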

A case study featured in the announcement showcases the plan's capabilities. A retail customer handling a flash online promotion achieved an average throughput of 15,630 RPS, processing nearly 3 million requests in three minutes. The system could handle up to 32,000 RPS by fine-tuning the concurrency settings, illustrating the plan's scalability and performance benefits.

Average throughput of 15,630 RPS (Source: Tech community blog post)
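The case-study figures are internally consistent. Sustaining the reported 15,630 RPS average for three minutes works out to roughly 2.8 million requests, which matches the "nearly 3 million" cited above. A quick check:

```python
def total_requests(avg_rps: int, duration_s: int) -> int:
    """Total requests served at a sustained average rate."""
    return avg_rps * duration_s


if __name__ == "__main__":
    total = total_requests(15_630, 3 * 60)  # 15,630 RPS for three minutes
    print(f"{total:,} requests")  # prints "2,813,400 requests"
```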

The company also improved the virtual network features of Azure Functions Flex Consumption. A function app can now access services secured behind a virtual network (VNet) without sacrificing speed, even during scaling. In one recent scenario, VNet integration allowed a function app to write to an event hub that had no public endpoint. To measure the cost of this, the company compared startup performance with and without VNet integration across multiple language stacks and regions.

Enabling VNet injection has minimal impact on scale-out performance. The added delay is only 37 ms at the 50th percentile, so the security benefits of using virtual networks with Flex Consumption outweigh the performance cost. These results stem from significant investment in the networking stack of Project Legion, the compute foundation for Flex Consumption.

InfoQ spoke with Thiago Almeida, a principal program manager at Microsoft, and Paul Batum, a principal software engineer at Microsoft, to learn more about Azure Functions Flex Consumption performance.

InfoQ: How does one determine the optimal per-instance concurrency setting for different workloads in Azure Functions Flex?

Thiago Almeida: You can generally trust the default values to work for most cases and let Azure Functions scale dynamically. Flex Consumption provides default values that maximize each language’s capabilities. The default concurrency for Python apps is 1 for all instance sizes. For other languages, the 2,048 MB instance size uses a default concurrency of 16, and the 4,096 MB uses 32. Running tests with varying concurrencies is recommended to further fine-tune high HTTP scaling scenarios for your Flex Consumption applications. The Performance Optimizer feature has been enabled for everyone. It is a great tool to help identify the best concurrency and instance size for your HTTP function apps on Flex Consumption.
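The defaults Almeida describes fit in a small lookup. The fine-tuning he recommends amounts to comparing load-test runs at different concurrency settings. The sketch below is hypothetical: `default_http_concurrency`, `LoadTestRun`, and `pick_concurrency` are illustrative names, not part of any Azure SDK. The selection rule picks the highest throughput whose P99 latency stays within a budget. It is one reasonable choice, not a description of how the Performance Optimizer works.

```python
from __future__ import annotations

from dataclasses import dataclass


def default_http_concurrency(language: str, instance_memory_mb: int) -> int:
    """Default per-instance HTTP concurrency as described in the interview."""
    if instance_memory_mb not in (2048, 4096):
        raise ValueError("expected a 2,048 MB or 4,096 MB instance size")
    if language.strip().lower() == "python":
        return 1  # Python defaults to 1 for every instance size
    return 16 if instance_memory_mb == 2048 else 32  # all other languages


@dataclass(frozen=True)
class LoadTestRun:
    """One load-test result for a given concurrency setting (hypothetical shape)."""
    concurrency: int
    throughput_rps: float
    p99_latency_ms: float


def pick_concurrency(runs: list[LoadTestRun], p99_budget_ms: float) -> int | None:
    """Concurrency of the highest-throughput run whose P99 latency fits the budget."""
    eligible = [run for run in runs if run.p99_latency_ms <= p99_budget_ms]
    if not eligible:
        return None  # no tested setting meets the latency target
    return max(eligible, key=lambda run: run.throughput_rps).concurrency


if __name__ == "__main__":
    print(default_http_concurrency("Python", 4096))  # prints 1
    print(default_http_concurrency("Node", 2048))  # prints 16
```

`pick_concurrency` returns `None` when no tested setting meets the P99 target. In that case, test lower concurrencies or a larger instance size.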

InfoQ: When adjusting instance concurrency and memory sizes, can you elaborate on the trade-offs between cost and performance?

Paul Batum: It varies significantly across workloads, but a general rule is that increasing the per-instance concurrency will improve your overall efficiency but with some type of performance impact, such as slower responses, particularly at high percentiles (P99, P99.9, etc.). For some workloads, this is a huge win: if the application has called an external API and is waiting for a response, it's much more efficient to process other requests on that instance while waiting for that response.

On the other hand, if the workload is highly CPU intensive, context switching between 16 concurrent operations is less efficient than handling them one after another. So, in this type of scenario, you would likely see your efficiency improve by reducing the concurrency. When you increase memory size, there is a proportional increase in allocated CPU, which can help reduce the time it takes for the system to complete a CPU-heavy task.
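Batum's rule of thumb can be illustrated with a simple bottleneck model. The model makes three assumptions. Each request spends some time on CPU and some time waiting on I/O. An instance can run at most `vcpus` requests' worth of CPU at once. Context-switching overhead is ignored, so the model is optimistic for CPU-heavy work. The per-request timings and the 1-vCPU instance are hypothetical. The article says only that CPU scales proportionally with memory, not how many vCPUs each size gets.

```python
def max_throughput_rps(concurrency: int, cpu_ms: float, wait_ms: float, vcpus: float) -> float:
    """Upper bound on one instance's throughput in requests per second.

    Two limits apply: each of `concurrency` slots finishes at most one request
    per (cpu_ms + wait_ms), and the CPUs finish at most vcpus / cpu_ms requests.
    """
    if concurrency < 1 or cpu_ms <= 0 or wait_ms < 0 or vcpus <= 0:
        raise ValueError("invalid model parameters")
    slot_limit = concurrency * 1000 / (cpu_ms + wait_ms)
    cpu_limit = vcpus * 1000 / cpu_ms
    return min(slot_limit, cpu_limit)


def latency_at_saturation_ms(concurrency: int, throughput_rps: float) -> float:
    """Little's law: average latency = requests in flight / throughput."""
    return concurrency * 1000 / throughput_rps


if __name__ == "__main__":
    # Hypothetical request profiles: (CPU ms, I/O wait ms) on a 1-vCPU instance.
    profiles = {"I/O-heavy": (5, 195), "CPU-heavy": (100, 0)}
    for name, (cpu_ms, wait_ms) in profiles.items():
        for concurrency in (1, 16):
            rps = max_throughput_rps(concurrency, cpu_ms, wait_ms, vcpus=1)
            lat = latency_at_saturation_ms(concurrency, rps)
            print(f"{name} concurrency={concurrency:>2}: <= {rps:g} RPS, ~{lat:g} ms")
```

In this idealized model, the I/O-heavy profile goes from 5 to 80 RPS at concurrency 16 with no latency change. The CPU-heavy profile stays at 10 RPS while its latency grows from 100 ms to 1,600 ms. Real instances also pay context-switching and contention costs, which is where the tail-latency impact Batum mentions shows up.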

InfoQ: How does enabling VNet injection impact the performance, and what specific optimizations have been made in Project Legion to address this?

Almeida: Project Legion was built to allow scaling to the levels that Flex Consumption requires, including VNet injection with kernel-level routing and efficient use of subnet IP addresses. The Azure Functions team also introduced the ‘trigger monitor’ component that is injected into the customer’s VNet to provide scaling decisions for the app, even if the app is scaled to zero. As the case study article shows, these platform improvements allow VNet injection to have a very low impact on scale-out performance. We observed a 37 ms difference at the 50th percentile between tests with and without VNet integration.

Lastly, more Azure Functions Flex Consumption samples are available on GitHub.

About the Author

Steef-Jan Wiggers
