AI Gateway
Single entry point to the models: multi-provider routing, cache, quotas — and a policy enforcement point.
Plane: Gateway & Protection
Flow steps: 1 · 5
Frameworks: NIST 800-53 · 800-207 · OWASP LLM10/01
Technology
Why use it
Concentrate all model traffic at one point: multi-provider routing, cache, normalization and quotas.
Why it matters to security
It is an ideal PEP: it centralizes authentication, policy enforcement, rate/cost caps and input/output inspection before the model — an auditable choke point.
Implementations: Azure AI Gateway · Kong AI Gateway · LiteLLM · Cloudflare AI Gateway · Apigee
One path to the model means one place to control everything.
Recommendations by maturity tier
Foundation
Minimum viable baseline
- Centralized routing and model-call authentication. (NIST 800-53 AC-3 · IA-5) No direct model calls: everything goes through a governed point.
- Logging of every model call. (NIST 800-53 AU-2 · AU-12) Tracing prompts and responses is essential to audit and security debugging.
- Rate and token caps. (NIST 800-53 SC-5 · OWASP LLM10:2025) Bounding tokens protects both availability and budget.
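The token cap in the last item can be sketched as a sliding-window limiter. This is a minimal illustration, not any listed product's API: the class name `TokenWindowLimiter`, its parameters and the in-memory storage are all assumptions made for the example. A real gateway would likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque


class TokenWindowLimiter:
    """Caps the tokens one caller may consume inside a sliding time window.

    Hypothetical helper for illustration; names and storage are assumptions.
    """

    def __init__(self, max_tokens: int, window_seconds: float, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = defaultdict(deque)  # caller_id -> deque of (timestamp, tokens)
        self._used = defaultdict(int)      # caller_id -> tokens counted in the window

    def _expire(self, caller_id: str, now: float) -> None:
        # Drop usage records that have aged out of the window.
        events = self._events[caller_id]
        while events and now - events[0][0] >= self.window_seconds:
            _, tokens = events.popleft()
            self._used[caller_id] -= tokens

    def allow(self, caller_id: str, tokens: int) -> bool:
        """Return True and record the usage if the request fits the budget."""
        now = self.clock()
        self._expire(caller_id, now)
        if self._used[caller_id] + tokens > self.max_tokens:
            return False  # over the cap: the gateway would reject or queue the call
        self._events[caller_id].append((now, tokens))
        self._used[caller_id] += tokens
        return True
```

Keying the counters by caller identity ties the cap to the authentication step: an unauthenticated call has no identity to meter and should not reach the limiter at all.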
Enterprise
Enterprise standard
- Request validation (model guard). (NIST 800-53 SI-10 · OWASP LLM01:2025) The gateway checks shape and content before the model is reached.
- Per-tenant cost / token quotas and isolation. (NIST 800-53 SC-5 · OWASP LLM10:2025) Each tenant has its budget; abuse stays contained.
- Secure cache with no cross-tenant leakage. (NIST 800-53 SC-4) A poorly isolated shared cache leaks one tenant’s data to another.
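One common way to keep a shared cache from leaking across tenants is to bind the tenant identity into the cache key. The sketch below assumes an in-memory dictionary and hypothetical names (`cache_key`, `TenantCache`); it is one possible approach, not the behavior of any specific gateway.

```python
import hashlib
import json


def cache_key(tenant_id: str, model: str, prompt: str) -> str:
    """Derive a cache key that is bound to the tenant.

    The fields are JSON-encoded as a list before hashing so that field
    boundaries stay unambiguous (("a", "bc") and ("ab", "c") differ).
    """
    payload = json.dumps([tenant_id, model, prompt], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TenantCache:
    """Hypothetical response cache; every lookup requires the tenant id."""

    def __init__(self):
        self._store = {}

    def get(self, tenant_id: str, model: str, prompt: str):
        # Returns None on a miss, including when another tenant cached the same prompt.
        return self._store.get(cache_key(tenant_id, model, prompt))

    def put(self, tenant_id: str, model: str, prompt: str, response: str) -> None:
        self._store[cache_key(tenant_id, model, prompt)] = response
```

With this layout an identical prompt from two tenants yields two separate entries, trading some hit rate for isolation.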
Advanced
High-assurance / regulated
- PEP → PDP decision on every request. (NIST 800-53 AC-3 · AC-24 · NIST 800-207 §3.1) Model access becomes a context-aware policy decision, not a standing right.
- Risk-adaptive capping. (NIST 800-53 SI-4) Quotas tighten when behavior turns suspicious.
- Abuse and cost-drift detection. (NIST 800-53 SI-4) A runaway agent loop is caught before the bill.
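The PEP → PDP split and risk-adaptive capping can be sketched together. Everything here is assumed for illustration: the `risk_score` signal, the linear scaling rule, the 0.9 deny threshold, the base cap and the subject and model names. In practice the decision function would usually live in a separate policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    subject: str
    model: str
    risk_score: float  # hypothetical signal: 0.0 (normal) to 1.0 (highly suspicious)


@dataclass(frozen=True)
class Decision:
    allow: bool
    token_cap: int
    reason: str


BASE_TOKEN_CAP = 8000  # assumed per-request cap at zero risk
ALLOWED_MODELS = {     # assumed policy data
    "analyst": {"small-model", "large-model"},
    "batch-agent": {"small-model"},
}


def pdp_decide(ctx: RequestContext) -> Decision:
    """Policy Decision Point: evaluates context, returns a decision."""
    if ctx.model not in ALLOWED_MODELS.get(ctx.subject, set()):
        return Decision(False, 0, "model not permitted for subject")
    if ctx.risk_score >= 0.9:
        return Decision(False, 0, "risk too high")
    # Risk-adaptive capping: the cap shrinks linearly as risk grows.
    cap = int(BASE_TOKEN_CAP * (1.0 - ctx.risk_score))
    return Decision(True, cap, "permitted")


def pep_handle(ctx: RequestContext, requested_tokens: int) -> str:
    """Policy Enforcement Point: asks the PDP on every request, then enforces."""
    decision = pdp_decide(ctx)
    if not decision.allow:
        return f"deny: {decision.reason}"
    if requested_tokens > decision.token_cap:
        return "deny: over token cap"
    return "forward"
```

Because the PEP holds no standing permissions, a change in the risk signal takes effect on the next request without any reconfiguration of the gateway.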
Architecture notes
- Cap cost, not just throughput. A looping agent can drain a budget in minutes. LLM10 (Unbounded Consumption) covers cost as much as availability: enforce token and spend quotas.
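A spend quota differs from a throughput cap in that it meters money rather than requests. The sketch below uses made-up prices and hypothetical names (`PRICE_PER_1K`, `call_cost`, `SpendBudget`) to show a looping agent being stopped once its budget is exhausted.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider and model.
PRICE_PER_1K = {
    "small-model": {"input": Decimal("0.0005"), "output": Decimal("0.0015")},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one call under the assumed price table."""
    price = PRICE_PER_1K[model]
    return (price["input"] * input_tokens + price["output"] * output_tokens) / 1000


class SpendBudget:
    """Tracks spend against a hard limit; Decimal avoids float rounding drift."""

    def __init__(self, limit_usd: Decimal):
        self.limit_usd = limit_usd
        self.spent_usd = Decimal("0")

    def charge(self, cost: Decimal) -> bool:
        """Record the cost and return True, or return False if it would exceed the limit."""
        if self.spent_usd + cost > self.limit_usd:
            return False
        self.spent_usd += cost
        return True


def run_agent_loop(budget: SpendBudget, max_iterations: int = 1000) -> int:
    """Simulates a runaway agent; returns how many calls the budget let through."""
    calls = 0
    for _ in range(max_iterations):
        if not budget.charge(call_cost("small-model", 1000, 1000)):
            break  # the gateway refuses further calls once the budget is spent
        calls += 1
    return calls
```

A request-per-minute limit alone would let this loop keep running at a steady rate; the spend quota is what stops it.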
References
NIST SP 800-53 Rev5
AC-3, AC-24 (Access Control Decisions), SC-5 (DoS), SC-4 (Shared Resources), SI-10, AU-12.
NIST SP 800-207
§3.1 — the AI Gateway as a Policy Enforcement Point in front of the model.
OWASP LLM10:2025
Unbounded Consumption — token and cost quotas.
Abbreviations
PDP
Policy Decision Point
PEP
Policy Enforcement Point
PIP
Policy Information Point
PAP
Policy Administration Point
IdP
Identity Provider
TSS
Token Service
NHI
Non-Human Identity
RBAC
Role-Based Access Control
ABAC
Attribute-Based Access Control
MFA
Multi-Factor Authentication
HITL
Human-in-the-loop
JIT
Just-In-Time
CAE
Continuous Access Evaluation
CAEP
Continuous Access Evaluation Profile
DPoP
Demonstrating Proof-of-Possession
mTLS
mutual TLS
PII
Personally Identifiable Information
KMS
Key Management Service
CI/CD
Continuous Integration / Continuous Delivery
SIEM
Security Information and Event Management
SOAR
Security Orchestration, Automation and Response
SCIM
System for Cross-domain Identity Management
XACML
eXtensible Access Control Markup Language
OPA
Open Policy Agent
OWASP
Open Worldwide Application Security Project
NIST
National Institute of Standards and Technology
ATLAS
Adversarial Threat Landscape for Artificial-Intelligence Systems
LLM
Large Language Model
WAF
Web Application Firewall
CDN
Content Delivery Network
DDoS
Distributed Denial of Service
DLP
Data Loss Prevention
JWT
JSON Web Token
API
Application Programming Interface
CRS
Core Rule Set (OWASP)
RAG
Retrieval-Augmented Generation
MCP
Model Context Protocol
PBAC
Policy-Based Access Control
HSM
Hardware Security Module
UEBA
User and Entity Behavior Analytics
SBOM
Software Bill of Materials
SLSA
Supply-chain Levels for Software Artifacts
WORM
Write Once, Read Many
SPIFFE
Secure Production Identity Framework For Everyone