Rate Limiting

Overview

VIZOCHOK enforces rate limits at three tiers to protect the platform, ensure fair usage, and prevent abuse:

Tier 1: Per-Connection

10 messages / 60 seconds (sliding window per WebSocket)

Tier 2: Per-User

200 messages / day, 20 conversations / day (per user)

Tier 3: Per-Tenant

500,000 tokens / day, 10,000,000 tokens / month (per tenant)

Each tier is checked in order. If any tier rejects the request, an error is returned immediately and subsequent tiers are not checked.
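The short-circuit order described above can be sketched as follows. This is only an illustration, not VIZOCHOK's server code: `checkTiers` and the three stand-in check functions are hypothetical names.

```javascript
// Hypothetical sketch: run tier checks in order and stop at the first rejection.
// Each check returns null when the request passes, or an error object when it is rejected.
function checkTiers(request, checks) {
  for (const check of checks) {
    const error = check(request);
    if (error) {
      return error; // later tiers are never evaluated
    }
  }
  return null; // every tier accepted the request
}

// Stand-in checks that record when they are called (the real checks run on the server).
const called = [];
const perConnection = () => { called.push('connection'); return null; };
const perUser = () => {
  called.push('user');
  return { type: 'error', code: 'user_message_limit', limit: 200 };
};
const perTenant = () => { called.push('tenant'); return null; };

const result = checkTiers({}, [perConnection, perUser, perTenant]);
console.log(result.code);   // user_message_limit
console.log(called.length); // 2: the tenant check never ran
```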

Tier 1: Per-Connection Rate Limit

Protects against rapid message flooding on a single WebSocket connection.

Parameter	Value
Max messages	10
Time window	60 seconds (sliding)
Scope	Single WebSocket connection

How It Works

The server tracks message frequency per connection and rejects messages that exceed the limit.

Error Response

{
  "type": "error",
  "code": "rate_limit_exceeded",
  "retry_after": 45
}

The retry_after field gives the number of seconds until the oldest message in the window expires, at which point a new message can be sent.
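To make the sliding window and retry_after concrete, here is a minimal sketch of a limiter for one connection. It assumes the server keeps a timestamp for each accepted message. The class name `SlidingWindowLimiter` and its method are hypothetical, and the real implementation may differ.

```javascript
// Hypothetical sliding-window limiter for one connection (not the actual server code).
class SlidingWindowLimiter {
  constructor(maxMessages = 10, windowMs = 60_000) {
    this.maxMessages = maxMessages;
    this.windowMs = windowMs;
    this.timestamps = []; // send times of accepted messages, oldest first
  }

  // Returns null if the message is accepted, or an error payload if it is rejected.
  tryAccept(nowMs = Date.now()) {
    // Drop timestamps that have left the window.
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxMessages) {
      this.timestamps.push(nowMs);
      return null;
    }
    // Seconds until the oldest accepted message expires from the window.
    const retryAfter = Math.ceil((this.timestamps[0] + this.windowMs - nowMs) / 1000);
    return { type: 'error', code: 'rate_limit_exceeded', retry_after: retryAfter };
  }
}

const limiter = new SlidingWindowLimiter();
for (let i = 0; i < 10; i++) {
  limiter.tryAccept(i * 1000); // ten messages, one per second, starting at t = 0
}
const rejected = limiter.tryAccept(15_000); // eleventh message at t = 15 s
console.log(rejected.retry_after); // 45
```

In this sketch the eleventh message arrives 15 seconds after the first, so the oldest timestamp leaves the 60-second window 45 seconds later.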

Connection Limit

Each API key is also limited to a maximum of 3 concurrent WebSocket connections. A new connection that would exceed this limit is closed with close code 4029.
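If you manage a WebSocket yourself rather than through the Widget SDK, you may want to treat close code 4029 differently from ordinary disconnects. The helper below is a hypothetical sketch: `shouldAutoReconnect` and `showTooManyConnectionsNotice` are illustrative names, not part of any VIZOCHOK API.

```javascript
// Close code documented above for exceeding the concurrent connection limit.
const CLOSE_TOO_MANY_CONNECTIONS = 4029;

// Hypothetical helper: decide whether a client should reconnect automatically.
function shouldAutoReconnect(closeCode) {
  // An immediate reconnect would likely be closed again while the other
  // connections for this API key are still open.
  return closeCode !== CLOSE_TOO_MANY_CONNECTIONS;
}

// With a standard WebSocket, the code is available on the close event:
//   socket.addEventListener('close', (event) => {
//     if (!shouldAutoReconnect(event.code)) showTooManyConnectionsNotice(); // hypothetical UI hook
//   });

console.log(shouldAutoReconnect(4029)); // false
console.log(shouldAutoReconnect(1006)); // true
```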

Tier 2: Per-User Rate Limit

Prevents individual users from consuming excessive resources.

Parameter	Default	Configurable
Messages per day	200	Yes
Conversations per day	20	Yes

How It Works

Counters reset daily at midnight UTC. Rate limits are tracked per authenticated user. Always pass userId in the widget config for accurate per-user tracking.
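Because the counters reset at midnight UTC rather than local midnight, a client that wants to tell users when they can send again has to compute the reset time in UTC. A small sketch follows; the function name `secondsUntilUtcMidnight` is hypothetical.

```javascript
// Seconds remaining until the next midnight UTC, when the daily counters reset.
function secondsUntilUtcMidnight(now = new Date()) {
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1 // Date.UTC normalizes day overflow into the next month
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

console.log(secondsUntilUtcMidnight(new Date('2024-01-31T23:59:30Z'))); // 30
```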

Error Responses

{
  "type": "error",
  "code": "user_message_limit",
  "limit": 200
}

{
  "type": "error",
  "code": "user_conversation_limit",
  "limit": 20
}

Configuring Limits

Per-user limits are configurable per tenant via the Admin Panel.

Tier 3: Per-Tenant Rate Limit

Prevents a single tenant from consuming disproportionate LLM resources.

Parameter	Default	Configurable
Tokens per day	500,000	Yes
Tokens per month	10,000,000	Yes

How It Works

Token usage is recorded after each AI response and checked before processing new messages. Both daily and monthly limits are enforced.
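The check-before, record-after flow can be sketched with an in-memory counter. This is an assumption-laden illustration, not the actual server code: the class `TenantTokenBudget`, the `>=` comparison, and checking the daily limit before the monthly one are all choices made for the example.

```javascript
// Hypothetical in-memory token budget for one tenant (not the actual server code).
class TenantTokenBudget {
  constructor(dailyLimit = 500_000, monthlyLimit = 10_000_000) {
    this.dailyLimit = dailyLimit;
    this.monthlyLimit = monthlyLimit;
    this.usedToday = 0;
    this.usedThisMonth = 0;
  }

  // Called before processing a new message.
  check() {
    if (this.usedToday >= this.dailyLimit) {
      return { type: 'error', code: 'tenant_daily_token_limit' };
    }
    if (this.usedThisMonth >= this.monthlyLimit) {
      return { type: 'error', code: 'tenant_monthly_token_limit' };
    }
    return null;
  }

  // Called after each AI response with the number of tokens it consumed.
  record(tokens) {
    this.usedToday += tokens;
    this.usedThisMonth += tokens;
  }
}

const budget = new TenantTokenBudget();
budget.record(499_000);
console.log(budget.check());      // null: still under the daily limit
budget.record(2_000);             // this response pushes usage past 500,000
console.log(budget.check().code); // tenant_daily_token_limit
```

In this sketch, usage is only known once a response has finished, so the last accepted message can carry the total slightly past the limit before the next check rejects.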

Error Responses

{
  "type": "error",
  "code": "tenant_daily_token_limit"
}

{
  "type": "error",
  "code": "tenant_monthly_token_limit"
}

Tenant token limits affect all users of a tenant. When a tenant hits the daily or monthly limit, no user under that tenant can send new messages until the limit resets.

Usage Alerts

VIZOCHOK sends a one-time alert notification when a tenant reaches 80% of its monthly token budget (8,000,000 tokens at the default limit of 10,000,000).
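A one-time threshold alert of this kind can be sketched as a small closure that remembers whether it has already fired. The names `makeUsageAlert` and `onUsage` are hypothetical, and how VIZOCHOK actually tracks the alert state is not documented here.

```javascript
// Hypothetical one-time alert once usage reaches a percentage of the monthly budget.
function makeUsageAlert(monthlyLimit, notify, thresholdPercent = 80) {
  let alerted = false;
  return function onUsage(usedThisMonth) {
    // Integer comparison avoids floating-point rounding around the threshold.
    if (!alerted && usedThisMonth * 100 >= monthlyLimit * thresholdPercent) {
      alerted = true; // fire only once
      notify(usedThisMonth);
    }
  };
}

const alerts = [];
const onUsage = makeUsageAlert(10_000_000, (used) => alerts.push(used));
onUsage(7_999_999); // below 8,000,000: no alert
onUsage(8_000_000); // reaches 80%: alert fires
onUsage(9_000_000); // already alerted: nothing happens
console.log(alerts.length, alerts[0]); // 1 8000000
```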

Configuring Limits

Per-tenant token limits are configurable via the Admin Panel.

Additional Limits

Beyond the three tiers, the system enforces several other limits:

Limit	Value	Description
Message size	64 KB	Maximum size of a single WebSocket message (checked client-side and server-side).
Agent processing timeout	120s	Maximum time for the agent to process a single message.
Auth timeout	10s	Maximum time to receive the auth message after connection.
Heartbeat timeout	60s	Connection closed if no pong within this period.
Per-session token limit	Configurable	Maximum tokens per conversation (prevents runaway sessions).
Max tool chain rounds	Configurable	Maximum LLM iterations per response.
Client message queue	50	SDK-side limit on queued messages during disconnection.
Concurrent WS connections	3	Per API key.
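For the message size limit, a client-side pre-check could look like the sketch below. It assumes that 64 KB means 64 * 1024 bytes of UTF-8 encoded text, which the table does not state explicitly. The function names are hypothetical.

```javascript
// Assumption: 64 KB means 64 * 1024 bytes of UTF-8 encoded payload.
const MAX_MESSAGE_BYTES = 64 * 1024;

// Size of a string in bytes once encoded as UTF-8 (not its length in characters).
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

function isWithinSizeLimit(text) {
  return utf8ByteLength(text) <= MAX_MESSAGE_BYTES;
}

console.log(isWithinSizeLimit('a'.repeat(65_536)));      // true: exactly 65,536 bytes
console.log(isWithinSizeLimit('\u00e9'.repeat(40_000))); // false: 80,000 bytes at 2 bytes per character
```

Measuring bytes rather than characters matters because non-ASCII characters take more than one byte in UTF-8.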

What Happens When Limits Are Hit

Limit	User Experience
Connection rate limit	Error shown in chat with countdown timer. User can send another message after `retry_after` seconds.
User message limit	Error shown in chat: “Daily message limit reached.” No more messages until midnight UTC.
User conversation limit	Error shown in chat: “Daily conversation limit reached.” Starting new conversations is blocked.
Tenant daily tokens	Error shown in chat: “Service temporarily unavailable.” All users of the tenant are blocked.
Tenant monthly tokens	Same as daily tokens. Persists until the calendar month changes.
Agent busy	Error shown in chat: “Assistant is processing a previous request.” Resolves when the previous request completes.
Message too large	Error shown immediately (client-side check). User must shorten the message.
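If you render errors in your own UI, the table above can be turned into a lookup keyed by error code. This sketch covers only the codes documented on this page. The wording for rate_limit_exceeded is an assumption, since the table only says that a countdown is shown, and `describeLimitError` is a hypothetical name.

```javascript
// User-facing text per error code, taken from the table above.
const LIMIT_MESSAGES = new Map([
  ['user_message_limit', 'Daily message limit reached.'],
  ['user_conversation_limit', 'Daily conversation limit reached.'],
  ['tenant_daily_token_limit', 'Service temporarily unavailable.'],
  ['tenant_monthly_token_limit', 'Service temporarily unavailable.'],
]);

// Hypothetical helper: returns display text, or null for codes this sketch does not know.
function describeLimitError(error) {
  if (error.code === 'rate_limit_exceeded') {
    // Assumed wording; the documented behavior is a countdown based on retry_after.
    return `Please wait ${error.retry_after} seconds before sending another message.`;
  }
  return LIMIT_MESSAGES.get(error.code) ?? null;
}

console.log(describeLimitError({ type: 'error', code: 'user_message_limit', limit: 200 }));
// Daily message limit reached.
console.log(describeLimitError({ type: 'error', code: 'rate_limit_exceeded', retry_after: 45 }));
// Please wait 45 seconds before sending another message.
```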

Best Practices for High-Traffic Stores

1. Set Appropriate Per-User Limits

For high-traffic stores, consider lowering per-user limits to prevent individual users from consuming a disproportionate share of the tenant’s token budget:

Messages per day: 50-100 for most retail use cases
Conversations per day: 5-10

2. Monitor Token Usage

Use the Admin Panel dashboard to monitor daily and monthly token consumption. VIZOCHOK automatically sends an alert when you reach 80% of your monthly budget.

3. Optimize with Smart Prompts

Configure your tenant’s system prompt to encourage concise interactions:

Provide clear store rules to reduce unnecessary tool calls
Disable tools that are not relevant to your use case via disabled_tools

4. Use User IDs for Accurate Tracking

Always pass userId in the widget config when the user is authenticated. This enables accurate per-user rate limiting rather than falling back to per-API-key limits.

const widget = new VizochokWidget({
  apiKey: 'pk_your_key',
  storeId: 'your-store',
  userId: currentUser.id, // Important for rate limiting
});

5. Handle Limit Errors Gracefully

Listen for limit errors in the onError callback and provide appropriate feedback:

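const widget = new VizochokWidget({
  apiKey: 'pk_your_key',
  storeId: 'your-store',
  onError: (error) => {
    // Tenant-wide limits block every user, so alert your operations team.
    if (error.code === 'tenant_daily_token_limit' ||
        error.code === 'tenant_monthly_token_limit') {
      notifyOps('VIZOCHOK token limit reached'); // notifyOps: your own alerting function
    }
  },
});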
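Since a tenant-wide limit blocks every user, an error handler like the one above could fire many times in a short period. One way to avoid flooding your alerting channel is to throttle notifications per error code. The wrapper below is a hypothetical sketch: `throttleByCode` is not part of the SDK, and the one-hour interval is an arbitrary default.

```javascript
// Hypothetical wrapper: forward at most one notification per error code per interval.
function throttleByCode(notify, intervalMs = 60 * 60 * 1000, now = () => Date.now()) {
  const lastSent = new Map(); // error code -> time of the last forwarded notification
  return function notifyThrottled(code, message) {
    const t = now();
    const previous = lastSent.get(code);
    if (previous !== undefined && t - previous < intervalMs) {
      return false; // suppressed: this code was reported recently
    }
    lastSent.set(code, t);
    notify(message);
    return true;
  };
}

// Example with a fake clock so the behavior is easy to follow.
let fakeTime = 0;
const sent = [];
const notifyThrottled = throttleByCode((msg) => sent.push(msg), 1000, () => fakeTime);

notifyThrottled('tenant_daily_token_limit', 'daily limit reached');   // forwarded
notifyThrottled('tenant_daily_token_limit', 'daily limit reached');   // suppressed
fakeTime = 1000;
notifyThrottled('tenant_daily_token_limit', 'daily limit reached');   // forwarded again
console.log(sent.length); // 2
```

Inside onError, you would call the throttled function with error.code in place of calling your alerting function directly.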

Monitoring Usage

Monitor your current usage in the Admin Panel dashboard, which shows:

Tenant’s token count for today and this month
Per-user message and conversation counts
Usage trends and alerts

Usage data is also available via the REST API; see the API Reference tab.
