
Multi-Tenant Architecture

VIZOCHOK is a multi-tenant SaaS where each tenant (retailer) gets an isolated environment with their own:
  • Product catalog and vector embeddings
  • AI configuration (LLM model, system prompt, disabled tools)
  • API keys (public and secret)
  • Webhook endpoints (products, cart)
  • Usage limits and billing
  • Admin users
All tenants share the same infrastructure (database, Redis, LLM API), but data is strictly isolated via tenant_id scoping at every layer.
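A minimal sketch of what tenant_id scoping means in practice: every query is constrained to the authenticated tenant before any other filter is applied. Table and column names here are illustrative, not VIZOCHOK's actual schema.

```python
# Hypothetical sketch of tenant_id scoping: no query runs without a tenant
# filter, so one tenant can never read another tenant's rows.

def scoped_query(table: str, tenant_id_param: str = "tenant_id", where: str = "") -> str:
    """Build a SQL query that is always constrained to one tenant."""
    clause = f"tenant_id = %({tenant_id_param})s"
    if where:
        clause += f" AND ({where})"
    return f"SELECT * FROM {table} WHERE {clause}"

# Every layer (search, cart, config, analytics) applies the same rule.
print(scoped_query("products", where="category = %(category)s"))
```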

[Diagram: Tenant A and Tenant B shown side by side, each with its own Catalog + Embeddings, API Keys (pk_, sk_), LLM Config (configurable per tenant), Webhooks (products, cart), Usage Limits + Billing, and Admin Users.]

Data Flow

The complete data flow from a user message to a response:

  1. Widget SDK (Storefront): the customer sends a message via a WebSocket connection.
  2. VIZOCHOK Backend: the WS handler authenticates the connection and checks rate limits; the Shopping Agent (LLM + tools) processes the request using hybrid search and webhook calls.
  3. Client Backend: VIZOCHOK calls your Products API and Cart API via HTTP webhooks to get live prices and confirm cart operations.
  4. Streaming Response: results stream back to the Widget SDK via WebSocket events. Data is persisted to PostgreSQL (products, tenants) and Redis (sessions, counters).

Step-by-Step Flow

  1. User sends a message via the Widget SDK over WebSocket
  2. API server authenticates the connection using the API key (SHA-256 hashed)
  3. Rate limits are checked at three tiers: per-connection, per-user, per-tenant
  4. Agent is initialized with tenant config, session state (from Redis), and user profile
  5. Smart model routing selects the LLM model:
    • First message in a conversation uses the complex model
    • Subsequent tool-chain iterations use the fast model
  6. Agent processes the message using a tool-based architecture:
    • Search tools: Hybrid search (vector + FTS + trigram) with RRF fusion and optional Cohere reranker
    • Cart tools: Add, remove, update, clear — each calls the client’s webhook
    • UI tools: Show products, ask user, show recipe checklist, show meal plan
  7. Prices are fetched from the client’s backend via the products_url webhook (server-to-server)
  8. Cart operations call the client’s cart_url webhook, which confirms or rejects
  9. Results stream back to the widget via WebSocket as text_delta, product_cards, cart_changed, etc.
  10. Session is saved to Redis for the next message (1h TTL)
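Step 3's three-tier rate limiting can be sketched as counters keyed per connection, per user, and per tenant; a message is allowed only if all three tiers are under their limits. A dict stands in for the Redis counters here, and the limit values and key names are illustrative assumptions.

```python
# Hedged sketch of the three-tier rate-limit check (step 3 above).
from collections import defaultdict

LIMITS = {"connection": 5, "user": 20, "tenant": 1000}  # example values per window
counters: dict[str, int] = defaultdict(int)

def check_rate_limits(conn_id: str, user_id: str, tenant_id: str) -> bool:
    """Return True if the message is allowed at all three tiers."""
    keys = {
        "connection": f"rl:conn:{conn_id}",
        "user": f"rl:user:{tenant_id}:{user_id}",
        "tenant": f"rl:tenant:{tenant_id}",
    }
    # Reject before incrementing if any tier is already at its limit.
    if any(counters[k] >= LIMITS[tier] for tier, k in keys.items()):
        return False
    for k in keys.values():
        counters[k] += 1
    return True
```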

What VIZOCHOK Stores vs. What the Client Stores

Understanding the data boundary is critical for integration:

VIZOCHOK Stores

| Data | Storage | Purpose |
|------|---------|---------|
| Product catalog | PostgreSQL | Names, descriptions, categories, images, SKUs — for search and embeddings |
| Product embeddings | pgvector | 1536-dimension vectors for semantic search |
| Conversation history | PostgreSQL | Full chat log for analytics and session restore |
| Session state | Redis | Cart, pending tools, context — 1h TTL |
| API keys (hashed) | PostgreSQL | Authentication |
| Tenant configuration | PostgreSQL | LLM, webhooks, limits, prompts |
| Usage counters | Redis | Rate limiting and billing |
| User profiles | PostgreSQL | Language, dietary preferences, favorite brands |

Client Stores

| Data | Purpose |
|------|---------|
| Product prices | Returned via products_url webhook on demand |
| Product availability | Returned via products_url webhook on demand |
| Shopping cart | Managed via cart_url / cart_get_url webhooks |
| User identity | Passed as userId to the widget |
| Order history | Never shared with VIZOCHOK |
| Payment information | Never shared with VIZOCHOK |

VIZOCHOK intentionally does not store prices or availability. These are always fetched in real-time from the client’s backend via webhooks, ensuring the AI always has current data.
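This boundary can be sketched as a merge step: catalog hits from VIZOCHOK's search are joined with the live data the client's products_url webhook returns, and any SKU the webhook omits is dropped as unavailable. Field names here are illustrative.

```python
# Sketch of the price/availability boundary: the catalog lives in VIZOCHOK,
# prices and stock come from the client's webhook at request time.

def merge_live_prices(catalog_hits: list[dict], webhook_prices: dict[str, dict]) -> list[dict]:
    """Attach live price data to search hits; drop SKUs the webhook omitted."""
    merged = []
    for hit in catalog_hits:
        live = webhook_prices.get(hit["sku"])
        if live is None:
            continue  # not returned by the webhook -> treated as unavailable
        merged.append({**hit, "price": live["price"], "in_stock": live["in_stock"]})
    return merged
```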

Session Lifecycle

New Connection

  1. Client sends auth message
  2. Server responds with auth_ok
  3. Server sends conversation_started with new conversation_id
  4. Message loop begins
  5. On disconnect, session saved to Redis (1h TTL)

Reconnection

  1. Client sends auth message
  2. Server responds with auth_ok
  3. Server sends session_restored with cart + pending tools
  4. Message loop resumes
  5. On disconnect, session saved to Redis (1h TTL)

Session Details

  • Server side: Redis with 1-hour TTL, scoped by tenant_id:conversation_id
  • Client side: sessionStorage (per-tab) stores conversation ID and message history under key vz-session-{storeId}
  • On reconnect: Server sends session_restored with cart state and any pending interactive tool (product selection, quick replies, etc.)
  • On expiry: After 1 hour of inactivity, the Redis session expires and a new conversation starts

Smart Model Routing

VIZOCHOK uses a dual-model strategy to balance quality and cost:

| Model | When Used | Strengths |
|-------|-----------|-----------|
| Complex model (configurable) | First message in a conversation | Complex reasoning, intent classification |
| Fast model (configurable) | Subsequent tool-chain iterations | Fast response, cost-effective for tool calls |
The routing logic selects the model based on:
  1. Conversation position: First message always uses the complex model
  2. Tool chain depth: After the first LLM call, subsequent iterations in the same response use the fast model
  3. Tenant configuration: Each tenant can configure which models to use (via llm_model and llm_model_fast columns)
This approach typically reduces LLM costs by 60-70% compared to using the complex model for every call, while maintaining high quality for the initial intent understanding.
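The routing rule reduces to a small function. The llm_model / llm_model_fast column names come from the text; the function shape and call sites are an assumption.

```python
# Hedged sketch of the smart model routing rule described above.

def select_model(tenant_cfg: dict, is_first_message: bool, iteration: int) -> str:
    """Pick the LLM for one call within a response.

    iteration 0 is the first LLM call for this user message; later iterations
    are tool-chain follow-ups within the same response.
    """
    if is_first_message and iteration == 0:
        return tenant_cfg["llm_model"]      # complex model: intent understanding
    return tenant_cfg["llm_model_fast"]     # fast model: tool-call iterations
```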

Token Budget

Each conversation has a configurable token budget (max_tokens_per_session). When the accumulated token usage approaches the limit, the agent sends a session_token_limit error and suggests starting a new conversation.
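As a sketch, the budget check compares accumulated usage against max_tokens_per_session and emits the session_token_limit error event; the exact threshold logic is an assumption.

```python
# Sketch of the per-session token budget check described above.

def check_token_budget(used: int, incoming: int, max_tokens_per_session: int) -> dict:
    """Return an error event once accumulated usage would reach the cap."""
    if used + incoming >= max_tokens_per_session:
        return {"type": "error", "code": "session_token_limit"}
    return {"type": "ok", "remaining": max_tokens_per_session - used - incoming}
```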

Agent Tool Architecture

The AI agent uses a tool-based architecture where the LLM decides which tools to call:

Non-Interactive Tools

These tools execute and return results to the LLM for further processing. The agent loops: LLM call -> tool execution -> LLM call -> … until it produces a final text response or calls an interactive tool.
| Tool | Description |
|------|-------------|
| search_products | Hybrid search across the product catalog |
| add_to_cart | Add a product to cart (via webhook) |
| remove_from_cart | Remove a product from cart (via webhook) |
| update_quantity | Change quantity of a cart item (via webhook) |
| clear_cart | Clear all cart items (via webhook) |
| get_cart | Retrieve current cart state |
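The loop described above can be sketched as: call the LLM, execute any non-interactive tool it requests and feed the result back, and stop on either a final text answer or an interactive tool. All callables and return shapes here are illustrative stand-ins, not VIZOCHOK's actual agent API.

```python
# Illustrative sketch of the tool-based agent loop, with assumed interfaces.

INTERACTIVE = {"show_products_to_user", "ask_user",
               "show_recipe_checklist", "show_meal_plan"}

def run_agent(llm_step, exec_tool, max_iters: int = 8):
    """llm_step(i) returns ("text", answer) or ("tool", name, args)."""
    for i in range(max_iters):
        step = llm_step(i)
        if step[0] == "text":
            return {"done": True, "text": step[1]}
        _, name, args = step
        if name in INTERACTIVE:
            return {"done": False, "awaiting": name}  # pause for user input
        exec_tool(name, args)                         # result goes back to the LLM
    return {"done": True, "text": ""}                 # safety cap on chain depth
```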

Interactive Tools

These tools render UI in the widget and pause for user input. The conversation resumes when the user interacts with the UI element.
| Tool | Description | Widget UI |
|------|-------------|-----------|
| show_products_to_user | Display product cards for selection | Product list with quantity steppers |
| ask_user | Present quick-reply options | Pill-shaped buttons |
| show_recipe_checklist | Show ingredient checklist | Checkable ingredient list with submit |
| show_meal_plan | Display a meal plan for approval | Multi-day plan with approve/modify |

Hybrid Search

Product search combines three strategies using Reciprocal Rank Fusion (RRF):
  1. Vector search — Cohere embed-v4.0 embeddings (1536 dimensions) for semantic similarity via pgvector
  2. Full-text search — PostgreSQL tsvector with Ukrainian language configuration
  3. Trigram search — pg_trgm for fuzzy matching of brand names and misspellings
Results are fused with RRF and optionally re-ranked using a Cohere reranker for improved relevance. After search, the backend calls the client’s products_url webhook to fetch real-time prices and availability. Products not returned by the webhook are filtered out (treated as unavailable).
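RRF itself is simple: each ranked list contributes 1 / (k + rank) per document, and documents are sorted by the summed score. This is a generic sketch with k = 60 (the conventional constant) and made-up result lists, not VIZOCHOK's actual fusion code.

```python
# Minimal Reciprocal Rank Fusion over the three ranked result lists
# (vector, full-text, trigram) described above.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([
    ["milk-1", "milk-2", "oat-1"],   # vector search
    ["milk-2", "milk-1"],            # full-text search
    ["milk-2", "brand-x"],           # trigram search
])
```

A document ranked well by all three strategies outranks one that tops only a single list, which is why RRF works without score normalization across heterogeneous retrievers.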

Webhook Architecture

VIZOCHOK uses server-to-server webhooks for all commercial data:
  1. VIZOCHOK Backend calls your HTTP endpoints server-to-server
  2. GET products_url returns prices and stock for requested SKUs
  3. POST cart_url confirms add/remove/update/clear cart operations
  4. GET cart_get_url returns current cart contents at session start
All webhook requests include an HMAC-SHA256 signature in the X-VIZOCHOK-Signature header for verification. The shared secret is configured per tenant.
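On the client side, verification recomputes the HMAC over the raw request body with the per-tenant secret and compares it to the X-VIZOCHOK-Signature header. This sketch assumes a hex-encoded digest of the raw bytes; confirm the exact payload encoding against the webhook reference.

```python
# Sketch of webhook signature verification with the per-tenant shared secret.
import hashlib
import hmac

def sign(secret: str, body: bytes) -> str:
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

def verify(secret: str, body: bytes, header_value: str) -> bool:
    # compare_digest avoids leaking timing information
    return hmac.compare_digest(sign(secret, body), header_value)
```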

Error Handling

Errors flow through the system as machine-readable codes:
  1. Backend detects error (rate limit, validation, LLM failure)
  2. Backend sends {"type": "error", "code": "rate_limit_exceeded"} via WebSocket
  3. SDK maps the code to a localized message using its i18n dictionary
  4. SDK renders the error in the chat and fires the onError callback
  5. The host page can handle the error (e.g., show a toast, log to analytics)
See Error Codes for the full list.
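Step 3 of this flow (code-to-message mapping) can be sketched as a lookup with language and code fallbacks, so unknown codes still render something. The codes shown appear on this page; the dictionary contents are illustrative, not the SDK's real i18n bundle.

```python
# Sketch of the SDK-side i18n lookup for machine-readable error codes.

I18N = {
    "en": {"rate_limit_exceeded": "Too many messages, please wait a moment.",
           "session_token_limit": "This conversation is full. Start a new one?"},
    "uk": {"rate_limit_exceeded": "Забагато повідомлень, зачекайте хвилинку."},
}

def localize_error(code: str, lang: str = "en") -> str:
    table = I18N.get(lang, I18N["en"])
    # Fall back to English, then to the raw code, so unknown codes still render.
    return table.get(code) or I18N["en"].get(code) or code
```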