AI Vendor Security Checklist

How to use this checklist

When you connect an AI agent platform to your team’s tools, you grant it access to your code, conversations, documents, and potentially customer data. That decision deserves due diligence before you make it.

This checklist covers 20 questions across five categories. For each question, we explain why it matters and what a good answer looks like. We also share how ClawStaff answers each one, not because we think you should skip the evaluation, but because we believe vendors should answer these questions publicly.

Send this list to any AI agent vendor you’re evaluating. The quality and specificity of their answers will tell you a lot about their security posture.

Isolation Architecture

1. How are customer workloads isolated from each other?

Why it matters: If your AI agents run in a shared runtime with other customers’ agents, a vulnerability in another customer’s workload could expose your data. Isolation architecture is the foundation of multi-tenant security.

What good looks like: Container-level or VM-level isolation per customer. Each organization gets its own runtime environment with dedicated resources.

ClawStaff’s answer: Every organization gets its own isolated container through ClawCage. No shared process space, memory, or disk between organizations.

2. Can one customer’s agent access another customer’s data?

Why it matters: This is the practical consequence of question 1. Even with good isolation, misconfigured access controls or shared databases can create cross-tenant data access.

What good looks like: “No, and here’s the architectural reason why.” The answer should reference specific isolation mechanisms, not just policies.

ClawStaff’s answer: No. Container isolation means each organization’s agents, data, and configurations are separated at the infrastructure level. There is no shared database or runtime between organizations.

3. What happens if an agent crashes or behaves unexpectedly?

Why it matters: An agent that enters an error loop could consume resources, leak data in error messages, or affect other services. The blast radius of agent failures matters.

What good looks like: Agent failures are contained within the customer’s isolated environment. Crash recovery is automatic, with no manual intervention required. Error states don’t expose data.

ClawStaff’s answer: Agent failures are contained within the organization’s container. A crashing Claw cannot affect other Claws in different organizations. The container boundary limits the blast radius.

4. Is isolation consistent across all pricing tiers?

Why it matters: Some platforms offer isolation only on enterprise plans, leaving starter and mid-tier customers in shared environments. Security shouldn’t be a premium feature.

What good looks like: “Yes, every customer gets the same isolation architecture regardless of plan.”

ClawStaff’s answer: Yes. Every organization gets container isolation from the Starter plan through Scale. Isolation is architectural, not a feature gate.

Data Flow

5. Does my data pass through your servers?

Why it matters: Every server your data touches is an additional point of risk and an additional entity in your compliance scope. The shorter the data path, the smaller the attack surface.

What good looks like: The vendor should clearly describe the data flow: what enters their infrastructure, what passes through, and what they store. BYOK models minimize vendor contact with customer data.

ClawStaff’s answer: With BYOK, your LLM calls go directly from your container to your LLM provider. ClawStaff orchestrates agent behavior but does not sit in the data path between your tools and your model provider.

6. Do you train models on my data?

Why it matters: If the vendor uses your data to train or fine-tune their models, your proprietary information could influence outputs for other customers. This is especially concerning with sensitive or regulated data.

What good looks like: “No, we do not train on customer data.” Full stop, no qualifiers.

ClawStaff’s answer: No. ClawStaff does not train models on customer data. With BYOK, your data flows to your LLM provider under their terms, not ours.

7. What data do you store, and for how long?

Why it matters: Even if data passes through briefly, storage creates persistent risk. You need to know what’s retained, where it’s stored, and when it’s deleted.

What good looks like: Clear documentation of what’s stored (agent configurations, logs, metadata) versus what’s not stored (prompts, responses, customer data). Defined retention periods, with customer control over deletion.

ClawStaff’s answer: ClawStaff stores agent configurations, organization metadata, and audit logs. With BYOK, prompts and LLM responses are not stored by ClawStaff. They flow directly between your container and your provider.

8. Can I delete my data, and is deletion verifiable?

Why it matters: Data deletion rights are required by GDPR, expected by customers, and practically important when offboarding from a platform.

What good looks like: Customer-initiated deletion of all stored data, including backups, within a defined timeframe. Confirmation of deletion provided.

ClawStaff’s answer: Organization data can be deleted by the organization administrator. Deletion removes agent configurations, audit logs, and all associated metadata.

Permission Granularity

9. Can I set different permissions for different agents?

Why it matters: A code review agent and a customer support agent should not have the same tool access. Per-agent permissions enforce the principle of least privilege: each agent gets only the access its job requires.

What good looks like: Per-agent permission configuration, including which tools each agent can access and what actions it can perform.

ClawStaff’s answer: Yes. Each Claw has its own tool access configuration. Access controls are defined per agent, per tool.

10. Can I restrict which tools each agent accesses?

Why it matters: An agent with access to “everything” is an agent waiting to cause a data incident. Tool-level restrictions limit the blast radius of any single agent.

What good looks like: Granular tool-by-tool access configuration for each agent. Default to no access; tools must be explicitly granted.

ClawStaff’s answer: Yes. Tool access is explicitly configured per Claw. Agents have no tool access by default; each integration must be granted.
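
To make “default to no access” concrete when you talk to a vendor, here is a minimal sketch of a default-deny check. It is an illustration only, not ClawStaff’s or any other vendor’s implementation. The class name AgentPolicy, the tool names, and the action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy: maps a tool name to its allowed actions."""
    agent_id: str
    # Empty by default, so a new agent can do nothing until tools are granted.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, tool: str, *actions: str) -> None:
        """Explicitly allow the given actions on one tool."""
        self.grants.setdefault(tool, set()).update(actions)

    def is_allowed(self, tool: str, action: str) -> bool:
        """Default deny: anything not explicitly granted is refused."""
        return action in self.grants.get(tool, set())


# Two agents with different jobs get different, non-overlapping grants.
code_reviewer = AgentPolicy("code-review-agent")
code_reviewer.grant("github", "read_pull_request", "post_comment")

support_agent = AgentPolicy("support-agent")
support_agent.grant("helpdesk", "read_ticket", "reply_ticket")

print(code_reviewer.is_allowed("github", "post_comment"))        # True: granted explicitly
print(code_reviewer.is_allowed("github", "merge"))               # False: never granted
print(support_agent.is_allowed("github", "read_pull_request"))   # False: no grant for this agent
```

The property worth asking a vendor about is the last line of is_allowed: the absence of a grant means refusal, so a forgotten configuration step fails closed rather than open.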

11. Can I control agent visibility within my organization?

Why it matters: Not every agent should be visible to every team member. A finance team’s AI coworker shouldn’t be accessible to the entire organization by default.

What good looks like: Multiple visibility tiers: private (creator only), team (specific group), and organization-wide.

ClawStaff’s answer: Yes. Claws are scoped as private, team, or organization. Visibility is set at deployment and can be changed.
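
The three visibility tiers described above can be sketched as a small check. This is an assumption-laden illustration, not a description of how any vendor implements scoping: the Agent fields, the can_see function, and the choice of private as the default are ours.

```python
from dataclasses import dataclass, field
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"            # creator only
    TEAM = "team"                  # a specific group
    ORGANIZATION = "organization"  # everyone in the organization


@dataclass
class Agent:
    name: str
    creator: str
    # Assumption: the most restrictive tier is the default.
    visibility: Visibility = Visibility.PRIVATE
    team_members: set[str] = field(default_factory=set)


def can_see(agent: Agent, user: str) -> bool:
    """Return True if `user` (assumed to be an org member) may see the agent."""
    if user == agent.creator:
        return True
    if agent.visibility is Visibility.ORGANIZATION:
        return True
    if agent.visibility is Visibility.TEAM:
        return user in agent.team_members
    return False  # PRIVATE: creator only


finance_bot = Agent("finance-bot", creator="dana",
                    visibility=Visibility.TEAM, team_members={"dana", "lee"})

print(can_see(finance_bot, "lee"))   # True: on the finance team
print(can_see(finance_bot, "sam"))   # False: in the org, but not on the team
```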

12. How do you handle OAuth tokens and API credentials?

Why it matters: Your agents need credentials to access your tools. How those credentials are stored, rotated, and scoped determines whether a credential compromise is a minor incident or a major breach.

What good looks like: Encrypted credential storage, per-agent credential scoping, support for credential rotation without agent downtime.

ClawStaff’s answer: Credentials are encrypted at rest and scoped to individual Claws. BYOK API keys are stored encrypted and used only for the designated agent’s LLM calls.

Audit and Monitoring

13. Is every agent action logged?

Why it matters: If you can’t see what an agent did, you can’t audit it, troubleshoot it, or learn from it. Complete logging is the foundation of agent oversight.

What good looks like: Every tool access, data read, output generation, and error is logged with timestamps and agent identification. No gaps for “internal” actions.

ClawStaff’s answer: Yes. Every Claw action is logged: tool access, data reads, outputs, and errors. Logs include timestamps, agent identification, and action details.
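
When you review a vendor’s sample logs, it helps to know what a complete record looks like. The sketch below shows one possible shape: a timestamp, the agent’s identity, the kind of action, and what it touched. The field names and the record_event helper are hypothetical, not any vendor’s actual log schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; field names are illustrative only."""
    timestamp: str   # ISO 8601, in UTC
    agent_id: str
    action: str      # e.g. "tool_access", "data_read", "output", "error"
    target: str      # the tool or resource the action touched
    detail: str


def record_event(agent_id: str, action: str, target: str, detail: str = "") -> str:
    """Build one event and serialize it as a single JSON line."""
    event = AuditEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        action=action,
        target=target,
        detail=detail,
    )
    return json.dumps(asdict(event), sort_keys=True)


line = record_event("code-review-agent", "tool_access", "github", "read pull request")
print(line)
```

If a vendor’s records are missing any of these fields, ask how you would reconstruct who did what, and when, during an incident.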

14. Can I export audit logs?

Why it matters: Logs that live only in the vendor’s dashboard aren’t useful for compliance audits, SIEM integration, or long-term record retention under your control.

What good looks like: Log export in standard formats (JSON, CSV). API access for automated log collection. Integration with common SIEM tools.

ClawStaff’s answer: Audit logs are accessible through the admin dashboard and can be exported for external review and compliance documentation.
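
As a rough picture of what “export in standard formats” means in practice, the sketch below turns the same records into JSON lines and into CSV using only Python’s standard library. The sample records are made up, and the field list is an assumption; a real export would use whatever schema the vendor documents.

```python
import csv
import io
import json

# Made-up sample records; a real export would come from the vendor's
# dashboard or API.
events = [
    {"timestamp": "2025-01-15T09:30:00+00:00", "agent_id": "support-agent",
     "action": "data_read", "target": "helpdesk"},
    {"timestamp": "2025-01-15T09:31:12+00:00", "agent_id": "support-agent",
     "action": "error", "target": "helpdesk"},
]

FIELDS = ["timestamp", "agent_id", "action", "target"]


def to_json_lines(records: list[dict]) -> str:
    """One JSON object per line, convenient for automated log collection."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def to_csv(records: list[dict]) -> str:
    """CSV with a header row, convenient for spreadsheets and auditors."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_json_lines(events))
print(to_csv(events))
```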

15. Do you alert on anomalous agent behavior?

Why it matters: An agent that suddenly accesses ten times its normal volume of data, or starts accessing tools it hasn’t used before, could indicate a compromised credential or a configuration error. Automated detection catches issues before they become incidents.

What good looks like: Configurable alerts based on activity thresholds, tool access patterns, and error rates. Notifications through your existing channels (email, Slack).

ClawStaff’s answer: Activity monitoring is available through the admin dashboard. Teams can set up monitoring cadences as part of their AI governance framework.
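
The two warning signs named above, a sudden jump in data volume and a tool the agent has never used before, can be expressed as a simple threshold check. This is a deliberately naive sketch under our own assumptions (a plain average as the baseline, a configurable multiplier defaulting to 10); it is not how any particular vendor detects anomalies, and the function and parameter names are hypothetical.

```python
from statistics import mean


def find_anomalies(
    history: list[int],
    current_reads: int,
    known_tools: set[str],
    current_tools: set[str],
    volume_multiplier: float = 10.0,
) -> list[str]:
    """Return human-readable alerts for one agent's current activity window.

    history: data-read counts from earlier windows, used as the baseline.
    volume_multiplier: alert when the current volume reaches this multiple
    of the baseline average.
    """
    alerts = []
    baseline = mean(history) if history else 0.0
    # With no history there is no baseline, so no volume alert is raised.
    if baseline > 0 and current_reads >= volume_multiplier * baseline:
        alerts.append(
            f"read volume {current_reads} is at least "
            f"{volume_multiplier:g}x the baseline of {baseline:g}"
        )
    # Any tool seen now but never seen before is flagged.
    for tool in sorted(current_tools - known_tools):
        alerts.append(f"first use of tool: {tool}")
    return alerts


alerts = find_anomalies(
    history=[100, 120, 80],
    current_reads=1500,
    known_tools={"github"},
    current_tools={"github", "payroll"},
)
for alert in alerts:
    print(alert)
```

In a real deployment, each alert would be routed to email or Slack rather than printed. The point of the sketch is that both thresholds are configurable, which is the property to ask the vendor about.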

16. How long are audit logs retained?

Why it matters: Compliance frameworks often require specific retention periods, and your internal policies may require logs for a defined duration. If the vendor deletes logs after 30 days, your audit trail may have gaps that leave you out of compliance.

What good looks like: Defined retention periods that meet or exceed common compliance requirements (90 days minimum, with options for longer retention). Customer control over retention settings.

ClawStaff’s answer: Audit logs are retained for the duration of your subscription with options for extended retention on higher-tier plans.

Key Management

17. Do you offer Bring Your Own Key (BYOK)?

Why it matters: BYOK means your LLM calls use your API keys, keeping the data relationship between you and your model provider. Without BYOK, the vendor processes your data through their keys, adding themselves to the data flow.

What good looks like: Full BYOK support where the customer provides their own LLM API keys. The vendor orchestrates agent behavior without inserting themselves into the LLM data path.

ClawStaff’s answer: Yes. BYOK is supported across all plans. Your API keys, your provider relationship, your data flow.

18. Can I use different models for different agents?

Why it matters: Some tasks need larger, more capable models. Others work fine with smaller, faster, cheaper models. Per-agent model assignment lets you optimize cost and capability at the agent level.

What good looks like: Per-agent model configuration. Support for multiple LLM providers (OpenAI, Anthropic, etc.) across different agents within the same organization.

ClawStaff’s answer: Yes. Each Claw can be configured with a different LLM provider and model through BYOK. Use GPT-4o for one agent and Claude for another.

19. How are API keys stored and protected?

Why it matters: Your LLM API keys are sensitive credentials. If compromised, they allow unrestricted access to your LLM provider at your expense.

What good looks like: Keys encrypted at rest using industry-standard encryption (AES-256 or equivalent). Keys never logged in plaintext. Access to key storage limited to the runtime processes that need them.

ClawStaff’s answer: API keys are encrypted at rest and accessible only to the agent’s container runtime. Keys are never exposed in logs, dashboards, or API responses.
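
“Keys never logged in plaintext” is easy to promise and easy to get wrong, so it is worth asking how it is enforced. One common approach is to scrub log messages before they are written. The sketch below does this with a standard Python logging filter. The key pattern is an assumption (an “sk-” prefix followed by at least 20 key characters); real providers use different key formats, and this is not a description of any vendor’s runtime.

```python
import logging
import re

# Assumption: keys look like "sk-" followed by 20 or more key characters.
# A real filter would need one pattern per provider's key format.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{20,}")


def redact(text: str) -> str:
    """Replace anything that looks like an API key with a fixed placeholder."""
    return KEY_PATTERN.sub("[REDACTED_KEY]", text)


class RedactKeysFilter(logging.Filter):
    """Logging filter that scrubs key-like strings before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so keys passed as arguments are caught too.
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True       # keep the record, now redacted


logger = logging.getLogger("agent-runtime")  # hypothetical logger name
logger.addFilter(RedactKeysFilter())

print(redact("calling provider with key sk-abcdefghijklmnopqrstuvwxyz123456"))
```

Pattern-based redaction is a safety net, not a guarantee: it only catches keys that match the pattern. The stronger control is never passing the key to logging code in the first place.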

20. What happens to my keys if I leave the platform?

Why it matters: When you offboard from a vendor, your API keys should be deleted, not retained “just in case” or stored in backups indefinitely.

What good looks like: Keys are deleted when the organization is deleted or the agent is decommissioned. Deletion includes backups. Confirmation provided.

ClawStaff’s answer: When an organization or agent is deleted, associated API keys are removed from encrypted storage. Your keys are your credentials. ClawStaff does not retain them after offboarding.

How to score the answers

Not every vendor will answer all 20 questions perfectly. Here’s how to prioritize:

Non-negotiable (questions 1, 2, 5, 6, 9, 10, 13, 17): If a vendor can’t clearly answer these, they haven’t built security into their architecture. These cover isolation, data flow, permissions, logging, and key management: the fundamentals.

Important (questions 3, 4, 7, 8, 11, 12, 14, 19): Gaps here indicate a less mature security posture. They’re addressable, but you should understand the risks and timelines for resolution.

Nice to have (questions 15, 16, 18, 20): These differentiate good from great. A vendor without anomaly detection or flexible key management can still be secure, but may require more manual oversight on your end.
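
If you are comparing several vendors, the three tiers above can be turned into a small scoring helper. The question numbers come straight from this checklist. The rule that any missed non-negotiable question disqualifies a vendor is our assumption about how to apply the tiers, and the function name score_vendor is hypothetical.

```python
# Question tiers as listed in this checklist.
TIERS = {
    "non_negotiable": {1, 2, 5, 6, 9, 10, 13, 17},
    "important": {3, 4, 7, 8, 11, 12, 14, 19},
    "nice_to_have": {15, 16, 18, 20},
}


def score_vendor(satisfactory: set[int]) -> dict:
    """Summarize a vendor, given the question numbers they answered well.

    Assumption: any missed non-negotiable question disqualifies the vendor;
    gaps in the other tiers are only reported for follow-up.
    """
    gaps = {
        tier: sorted(questions - satisfactory)
        for tier, questions in TIERS.items()
    }
    return {
        "disqualified": bool(gaps["non_negotiable"]),
        "gaps": gaps,
    }


# Example: a vendor that answered everything well except questions 6 and 15.
result = score_vendor(satisfactory=set(range(1, 21)) - {6, 15})
print(result["disqualified"])  # True: question 6 is non-negotiable
print(result["gaps"])
```

Treat the output as a prompt for follow-up questions rather than a verdict. A gap in the “important” tier with a credible resolution timeline may be acceptable; a vague answer to a non-negotiable question usually is not.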

Use this checklist as a conversation starter

The best security evaluation isn’t a checkbox exercise. It’s a conversation. A vendor that answers these questions openly and specifically is demonstrating their security posture. A vendor that deflects, generalizes, or gates answers behind NDAs is telling you something too.

For more on building security into your AI deployment process, see our AI Governance Framework. For healthcare-specific security considerations, see HIPAA-Compliant AI Agents.