Public Preview: Expanded GenAI Gateway capabilities in Azure API Management
We are excited to announce new enhancements to our GenAI Gateway capabilities, designed specifically for large language model (LLM) use cases. Building on our initial release in May 2024, we are introducing new policies that support a wider range of LLMs via the Azure AI Model Inference API. These new policies offer the same robust functionality as the originals but extend compatibility to the broader array of models available in Azure AI Studio.
Key Highlights of the New GenAI Policies:
- LLM Token Limit Policy (Preview): This policy lets you define and enforce token limits for interactions with large language models, helping you manage resource usage and control costs. It automatically rejects requests that exceed the configured token limit, preventing overuse and ensuring fair usage across applications (see the first configuration sketch below).
- LLM Emit Token Metric Policy (Preview): Gain detailed insight into token consumption with this policy, which emits token-count metrics in real time. It provides valuable information on token usage patterns, aiding cost management by letting you attribute costs to different teams, departments, or applications (see the second sketch below).
- LLM Semantic Caching Policy (Preview): Designed to improve efficiency and reduce costs, this policy caches responses based on the semantic content of prompts. By serving cached responses to semantically similar prompts, it cuts redundant model inferences, lowers latency, and speeds up response times for frequently requested queries (see the third sketch below).
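To make these concrete, here are brief configuration sketches. They follow the policy schema documented for Azure API Management, but the specific values (limits, namespaces, backend names) are illustrative placeholders rather than prescribed settings.

First, a minimal sketch of the token limit policy in an API's inbound policy section, keying the limit to the caller's subscription:

```xml
<policies>
    <inbound>
        <base />
        <!-- Illustrative limit: 5000 tokens per minute per subscription.
             Requests that would exceed the limit are rejected. -->
        <llm-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="true"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```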
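Second, a sketch of the emit token metric policy; the namespace is a hypothetical name, and the dimensions shown rely on their documented defaults so that token counts can be sliced by API and subscription:

```xml
<policies>
    <inbound>
        <base />
        <!-- Emits prompt, completion, and total token counts as custom
             metrics (for example, to Application Insights), dimensioned
             for cost attribution. -->
        <llm-emit-token-metric namespace="llm-usage">
            <dimension name="API ID" />
            <dimension name="Subscription ID" />
        </llm-emit-token-metric>
    </inbound>
</policies>
```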
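Third, a sketch of semantic caching, which pairs a cache lookup in the inbound section with a cache store in the outbound section. Here "embeddings-backend" is a hypothetical backend entity pointing at an embeddings deployment, and the similarity threshold and cache duration are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response if a semantically similar prompt
             (within the score threshold) has been answered before. -->
        <llm-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the model's response for 60 seconds. -->
        <llm-semantic-cache-store duration="60" />
    </outbound>
</policies>
```

With this pair in place, repeated or near-duplicate prompts are served from the cache instead of triggering a new model inference.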
These enhancements ensure efficient, cost-effective, and powerful LLM usage, allowing you to take full advantage of the models available in Azure AI. With seamless integration and enhanced monitoring capabilities, Azure API Management continues to empower your intelligent applications with advanced AI functionalities.
Start exploring these new policies today and elevate your application development with Azure API Management!