AI Gateways Inspect Prompts, But Who Secures the Connection to Your Infrastructure?
Andrew Baumbach, Product Marketing Engineer

TL;DR: AI gateways handle prompt filtering, rate limiting, and model routing. They don't handle how an agent actually reaches your internal database, API, or MCP server. That gap is a network access problem, and it's where ZTNA fits.
Every week there's a new AI gateway. Cloudflare AI Gateway adds guardrails. Kong ships an AI plugin suite. Portkey, LiteLLM, and Helicone all pitch some flavor of "observability and control for your LLM calls." They're useful, and they solve a real problem: when your application sends a prompt to OpenAI or Anthropic, you want logging, caching, content filtering, key rotation, and budget controls in one place.
But there's a question these tools don't answer, and it's the one that keeps coming up in security reviews: when an AI agent needs to read from your production Postgres, hit your internal billing API, or call a tool on an MCP server running in a VPC, how does that connection actually happen? Who decides whether the agent is allowed to make it? And what stops a compromised agent from reaching everything else on the same network?
The gateway sits in front of the model. The connectivity layer sits behind the agent. Most teams have invested heavily in the first and left the second to whatever VPN, bastion host, or hardcoded IP allowlist was already there.
What AI gateways actually do
It's worth being precise about the gateway's job, because the marketing around these products has gotten broad enough to imply they cover more ground than they do.
An AI gateway is a proxy that sits between your application and one or more LLM providers. The core feature set is fairly consistent across vendors:
| Capability | What it does |
|---|---|
| Prompt and response inspection | Scans for PII, prompt injection patterns, policy violations |
| Rate limiting and quotas | Per-user, per-team, per-model spend caps |
| Model routing and fallback | Send GPT-4 traffic to Azure OpenAI, fail over to Anthropic |
| Caching | Return cached responses for repeated prompts |
| Centralized API key management | Apps hit the gateway, gateway holds provider keys |
| Observability | Logs, traces, token counts, latency metrics |
Cloudflare's AI Gateway is a clean example. So is Kong's AI Gateway plugin set. Portkey and LiteLLM cover similar ground from the application side.
What they all have in common: the traffic they see is the traffic between your app and the model. Prompts going out, completions coming back, and maybe tool-call definitions and structured outputs in the middle.
What they don't see: the connection from an AI agent into your private infrastructure when that agent needs to actually do something.
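To make that boundary concrete, here is a minimal Python sketch of the gateway's role. The `GatewaySketch` class, the provider key, and the word-count token estimate are all assumptions for illustration, not any vendor's implementation. It shows key custody, budgets, and logging sitting in the prompt path.

```python
from dataclasses import dataclass, field


@dataclass
class GatewaySketch:
    """Toy stand-in for an AI gateway (hypothetical, not a vendor API)."""

    provider_key: str                  # held by the gateway, never by the app
    token_budget: dict[str, int]       # remaining tokens per user
    log: list[dict] = field(default_factory=list)

    def forward(self, user: str, prompt: str) -> dict:
        cost = len(prompt.split())     # crude token estimate, illustration only
        remaining = self.token_budget.get(user, 0)
        if cost > remaining:
            self.log.append({"user": user, "action": "rejected", "tokens": cost})
            raise PermissionError(f"token budget exceeded for {user}")
        self.token_budget[user] = remaining - cost
        self.log.append({"user": user, "action": "forwarded", "tokens": cost})
        # The request that would go to the provider: prompt plus the gateway's key.
        return {
            "headers": {"Authorization": f"Bearer {self.provider_key}"},
            "body": prompt,
        }


gateway = GatewaySketch(provider_key="sk-hypothetical", token_budget={"alice": 5})
request = gateway.forward("alice", "status of order 4471")
print(request["headers"], gateway.token_budget, gateway.log)
```

Everything the sketch can observe is a prompt, a user, and a token count. Nothing in it opens, authorizes, or logs a connection to a database or internal API.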
The connectivity problem nobody bought a product for
Take a realistic example. You've built an internal support agent. A user asks "what's the status of order 4471?" The agent needs to:
1. Authenticate the user
2. Query an internal orders API behind your VPC
3. Read from a Postgres read replica that isn't internet-facing
4. Call a shipping carrier's webhook (this part is external, fine)
5. Return a synthesized answer
Steps 2 and 3 are the interesting ones. The agent needs network-level access to resources that, correctly, aren't exposed to the public internet. So how do you connect it?
The answers I see in the wild, roughly in order of frequency:
Run the agent inside the VPC. This works until you need agents running anywhere else, like a developer's laptop, a customer's environment, or an edge function.
Open a port and put the API on the internet behind an API key. This is the path of least resistance and the largest attack surface. The API key becomes a static secret with broad scope, and now your "internal" API isn't.
Use a traditional VPN. Now the agent has IP-level access to the entire subnet, not just the one API it needs. If the agent or its host is compromised, the blast radius is the entire network.
Bastion hosts or jump boxes. These work for humans, but they are painful for programmatic access and don't scale to dozens of agents.
Wire up SSH tunnels in startup scripts. I've seen this before. It works, but I do not recommend it.
None of these are network access policies. They're plumbing. And the gateway in front of the model can't see any of it, because it's not in the path.
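The VPN case is easy to illustrate. The sketch below uses Python's standard `ipaddress` module with a hypothetical subnet and hypothetical host addresses. It shows that once a subnet is routed to the agent's host, every address in it is reachable, whether or not the agent needs it.

```python
import ipaddress

# Hypothetical subnet a traditional VPN grants to the agent's host.
VPN_SUBNET = ipaddress.ip_network("10.0.4.0/24")

# Hypothetical hosts that share the subnet; the agent only needs the first.
HOSTS = {
    "orders-api": "10.0.4.10",
    "postgres-replica": "10.0.4.22",
    "payroll-db": "10.0.4.77",
}


def reachable_over_vpn(hosts: dict[str, str]) -> list[str]:
    """Return every host whose IP falls inside the VPN-routed subnet."""
    return [
        name
        for name, ip in hosts.items()
        if ipaddress.ip_address(ip) in VPN_SUBNET
    ]


print(reachable_over_vpn(HOSTS))
```

All three hosts come back as reachable, including the payroll database the support agent has no reason to touch. That is the blast radius.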
This matters more for agents than for users
When a human employee accesses an internal API, there are usually layers of context: their identity is in the IdP, their device is enrolled in MDM, their access is reviewed quarterly, and if something looks weird, somebody notices.
Agents don't have any of that by default. An agent is, from the network's perspective, just a process making HTTP calls. It often runs with credentials that don't expire on a sensible cadence. It can fire thousands of requests in the time a human makes one. And increasingly, agents are calling tools that have side effects: writing to databases, sending emails, kicking off workflows.
A few things follow from this:
The identity of the agent matters more than its IP. If you allowlist by IP, anything running on that host inherits the access. A compromised dependency, a misconfigured sidecar, a curl command in a Slack message: they all look like the agent.
Least privilege has to be per-resource, not per-network. "The agent can reach the VPC" is too coarse. "The agent can reach the orders API on port 8080, and nothing else" is the correct grain.
Standing access is a liability. An agent that can hit the production database 24/7 because it might need to once an hour is a worse trade than just-in-time access scoped to actual use.
The audit trail needs to show the agent, not the bastion. If your logs say "10.0.4.22 queried customers," that's not useful. "Agent `support-bot-v3`, identity-verified, called `GET /orders/4471` at 14:02:11" is.
This is roughly the brief that Zero Trust Network Access was written for, just applied to a non-human principal.
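The requirements above can be sketched as a small deny-by-default check. This is a minimal illustration, not any product's policy engine. The `Grant` type, the `authorize` function, the 15-minute expiry, and the timestamp are assumptions; the identity and hostname are borrowed from the examples in this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """One identity, one resource, one port, one expiry (hypothetical model)."""

    identity: str
    host: str
    port: int
    expires_at: datetime


def authorize(grants: list[Grant], identity: str, host: str, port: int,
              now: datetime) -> dict:
    """Deny by default; allow only on an exact, unexpired grant.

    Returns an audit record keyed by identity rather than source IP.
    """
    allowed = any(
        g.identity == identity and g.host == host and g.port == port
        and now < g.expires_at
        for g in grants
    )
    return {
        "time": now.isoformat(),
        "identity": identity,
        "resource": f"{host}:{port}",
        "decision": "allow" if allowed else "deny",
    }


now = datetime(2026, 1, 15, 14, 2, 11, tzinfo=timezone.utc)
grants = [Grant("support-bot-v3", "orders.internal", 8080, now + timedelta(minutes=15))]

print(authorize(grants, "support-bot-v3", "orders.internal", 8080, now))
print(authorize(grants, "support-bot-v3", "orders.internal", 5432, now))
print(authorize(grants, "support-bot-v3", "orders.internal", 8080, now + timedelta(hours=1)))
```

The first call matches the grant exactly. The second asks for a different port and the third arrives after the grant has expired, so both fall through to deny. Every record names the agent and the resource, which is what makes the audit trail useful.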
Where ZTNA fits in the AI stack
The cleanest way to think about it: the AI gateway secures the prompt path, ZTNA secures the resource path. They're in different parts of the request flow and they're solving different problems.
A reasonable layered model looks like this:
User / Client
│
▼
[ Application / Agent Runtime ]
The AI gateway inspects what the model sees and says. The ZTNA layer decides whether the agent's process is allowed to reach a specific resource at all, verifies that decision on both ends, and never exposes the resource to the internet.
In Twingate's case, the mechanics are concrete:
A Connector runs inside the network where the resource lives (VPC, K8s cluster, on-prem rack). All connections from the Connector are outbound-only. No inbound firewall rules, no exposed ports.
A Client runs alongside the agent, either on the same host, in the same container, or on the developer machine where the agent is being built.
The Controller holds policy: which identity can reach which resource, under which conditions. It delegates authentication to your IdP.
When the agent makes a request to, say, `orders.internal`, the Client intercepts the DNS query, checks the local ACL, requests authorization from the Controller, and only then sets up an end-to-end encrypted tunnel through the Connector to the resource. The Connector independently re-verifies the policy before proxying traffic. Neither side trusts the other to have made the right call alone.
The agent never sees a network. It sees the specific resources it has been authorized to reach, by hostname, and nothing else. If the agent process is compromised, the attacker inherits exactly that scope and no more.
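The flow described above can be modeled in a few lines. This is a conceptual sketch only: the class and method names are invented for illustration and are not Twingate's API. For brevity the Connector re-checks against the same in-memory policy object, and the "tunnel" is just a string.

```python
class Controller:
    """Holds policy: which identity may reach which hostnames (hypothetical model)."""

    def __init__(self, policy: dict[str, set[str]]):
        self.policy = policy

    def is_allowed(self, identity: str, hostname: str) -> bool:
        return hostname in self.policy.get(identity, set())


class Connector:
    """Sits next to the resource and re-verifies policy before proxying."""

    def __init__(self, controller: Controller):
        self.controller = controller

    def proxy(self, identity: str, hostname: str) -> str:
        if not self.controller.is_allowed(identity, hostname):
            raise PermissionError("connector: policy re-check failed")
        return f"tunnel:{identity}->{hostname}"


class Client:
    """Runs alongside the agent and gates each connection attempt."""

    def __init__(self, identity: str, local_acl: set[str],
                 controller: Controller, connector: Connector):
        self.identity = identity
        self.local_acl = local_acl
        self.controller = controller
        self.connector = connector

    def connect(self, hostname: str) -> str:
        if hostname not in self.local_acl:          # step 1: local ACL
            raise PermissionError("client: hostname not in local ACL")
        if not self.controller.is_allowed(self.identity, hostname):  # step 2
            raise PermissionError("controller: authorization denied")
        return self.connector.proxy(self.identity, hostname)         # step 3


controller = Controller({"support-bot-v3": {"orders.internal"}})
connector = Connector(controller)
client = Client("support-bot-v3", {"orders.internal"}, controller, connector)
print(client.connect("orders.internal"))
```

The useful property is that the Connector's check does not depend on the Client having behaved. A caller that skips the Client and goes straight to `connector.proxy(...)` for an unauthorized hostname is still refused.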
What "identity" means when the principal isn't a person
The hard part of applying ZTNA to AI workloads is identity. Humans authenticate through an IdP and an MFA prompt. Agents… don't, usually.
A few patterns work in practice:
Treat the agent's runtime host as the identity boundary. Install the ZTNA Client on the host (VM, container, Pod) and bind access to a service identity verified by the Controller. The agent process inherits the access of its host.
Use a service account in your IdP. Most IdPs support non-human identities now. Tie the ZTNA policy to that account. Rotate credentials on a real schedule.
Scope per-agent, not per-application. If you have three agents (support, ops, data), each gets its own service identity with its own resource policy. The support agent should not have any path, network or otherwise, to the data warehouse.
Log at the identity layer, not just the network layer. When something goes wrong, you want to know which agent made which call, not which subnet a packet came from.
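As a rough illustration of per-agent scoping, the sketch below gives each of the three agents its own service identity and resource set. The identity names and hostnames are hypothetical. In practice this mapping would live in your IdP and ZTNA policy, not in application code.

```python
# Hypothetical service identities and hostnames, one policy per agent.
AGENT_POLICIES: dict[str, set[str]] = {
    "svc-support-agent": {"orders.internal"},
    "svc-ops-agent": {"deploy.internal", "metrics.internal"},
    "svc-data-agent": {"warehouse.internal"},
}


def can_reach(identity: str, hostname: str) -> bool:
    """Deny by default: unknown identities and unlisted hosts get no path."""
    return hostname in AGENT_POLICIES.get(identity, set())


def identities_with_access(hostname: str) -> list[str]:
    """Answer 'who can reach this resource?' at the identity layer."""
    return sorted(
        identity
        for identity, resources in AGENT_POLICIES.items()
        if hostname in resources
    )


print(can_reach("svc-support-agent", "warehouse.internal"))
print(identities_with_access("warehouse.internal"))
```

Keeping one identity per agent means a question like "who can reach the warehouse?" has a short, reviewable answer, and revoking one agent does not disturb the others.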
This is also where MCP gets interesting.
An MCP server exposing tools to an agent is, architecturally, an internal API. If you're running MCP servers that talk to private resources (internal docs, customer data, code repos), the same logic applies. The MCP server shouldn't be reachable from the open internet, and the agents that consume it shouldn't have broader access than the tools they actually call.
A practical checklist for AI infrastructure access
If you're putting together the security model for an AI deployment that touches private resources, here's what's worth answering before the agent ships:
Where does the agent run, and what does that host have access to? If the answer is "the whole VPC by default," tighten it.
Is any internal API or database exposed to the public internet for the agent to reach? If yes, why? Can it be moved behind an identity-aware proxy or ZTNA Connector?
What identity does the agent authenticate as? Is it a service account in your IdP, or a static API key in an env var?
Are access policies scoped per-resource or per-network? Network-scoped access was acceptable for humans on a corporate LAN in 2008. It's not appropriate for autonomous agents in 2026.
What's the audit trail when the agent makes a call to a private resource? Can you answer "which agent reached which resource when" without grepping seven log sources?
What happens when the agent is decommissioned or its credentials leak? How fast can you revoke access, and is that revocation enforced at the network layer or only at the application layer?
If you can answer these cleanly, you probably already have a ZTNA-shaped layer in place, whether you call it that or not. If you can't, the AI gateway is solving the visible problem and leaving the larger one wide open.
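Parts of the checklist can be turned into a simple review script. The sketch below is one hypothetical way to do that. The `AgentDeployment` fields and the string values they take are assumptions for illustration, and the answers would still have to come from someone who knows the deployment.

```python
from dataclasses import dataclass, field


@dataclass
class AgentDeployment:
    """Self-reported answers to the checklist (hypothetical schema)."""

    name: str
    credential: str                      # "idp_service_account" or "static_api_key"
    access_scope: str                    # "per_resource" or "per_network"
    public_resources: list[str] = field(default_factory=list)
    revocation_layer: str = "network"    # "network" or "application"


def review(d: AgentDeployment) -> list[str]:
    """Return one finding per checklist item the deployment fails."""
    findings = []
    if d.access_scope != "per_resource":
        findings.append("access is scoped per-network, not per-resource")
    if d.public_resources:
        findings.append(
            "internal resources exposed publicly: " + ", ".join(d.public_resources)
        )
    if d.credential == "static_api_key":
        findings.append("agent authenticates with a static API key")
    if d.revocation_layer != "network":
        findings.append("revocation is only enforced at the application layer")
    return findings


risky = AgentDeployment(
    name="support-bot-v3",
    credential="static_api_key",
    access_scope="per_network",
    public_resources=["orders-api"],
    revocation_layer="application",
)
for finding in review(risky):
    print(f"{risky.name}: {finding}")
```

An empty list from `review` does not prove the deployment is safe. It only means none of these specific questions turned up a problem.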
The short version
AI gateways are necessary. They sit in the prompt path, where prompt-shaped problems live: injection, leakage, cost, latency, model selection. They are not network access products and don't claim to be.
The connectivity layer between an AI workload and your internal infrastructure is a separate problem with a separate set of tools. It's the same problem teams have always had with remote employees, contractors, and service-to-service calls, but this time with a faster, more autonomous, more privilege-hungry principal.
ZTNA was built for exactly this shape: identity-based, least-privilege, deny-by-default access to specific resources without exposing them to the internet.
Pairing the two is the version of the AI security stack that actually covers the request from the user typing into a chat box all the way down to the row in the database.
Closing
For a deeper look at how Twingate's architecture handles service-to-service and agent access, including Connectors, resource-level policies, and headless client deployment, check out the Twingate documentation.
New to Twingate? You can use Twingate for free for up to 5 users, request a personalized demo, or reach out to the team over on the Twingate subreddit.