Operations Guide - Uppzy Documentation

This page is for the team that owns the live integration after implementation is complete. Use it to define:

What must be checked before release
What must be monitored after release
How to respond to common incidents
How to rotate credentials and publish content safely

Read this together with:

Operating model

Every production integration should have clear ownership for:

Runtime configuration
API key lifecycle
Document rollout
Monitoring and alerts
Incident response
Release approvals

If more than one service calls Uppzy, define one owning team and one shared escalation path.

Environment checklist

Keep these values explicit per environment:

UPPZY_BASE_URL
UPPZY_API_KEY
UPPZY_TENANT_ID
UPPZY_SITE_ID
Request timeout settings
Retry settings
Async polling timeout settings

Do not reuse production keys in development or staging.
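To catch a missing value before any request is sent, a service or script can check the four named variables at start-up. This is a minimal sketch under stated assumptions: `uppzy_check_env` is a hypothetical helper name. It also treats a value that still looks like a `<...>` placeholder (such as `<YOUR_API_KEY>` from the smoke script) as missing. Timeout and retry settings are not checked, because their variable names depend on your own client layer.

```shell
# Hypothetical start-up check: fail if a required Uppzy setting is empty
# or still holds a "<...>" placeholder such as "<YOUR_API_KEY>".
uppzy_check_env() {
  local missing="" name value
  for name in UPPZY_BASE_URL UPPZY_API_KEY UPPZY_TENANT_ID UPPZY_SITE_ID; do
    # Indirect lookup; safe here because the names are fixed literals.
    eval "value=\${$name:-}"
    case $value in
      "" | "<"*">") missing="$missing $name" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "Missing or placeholder Uppzy settings:$missing" >&2
    return 1
  fi
}
```

A service entrypoint or smoke script could run `uppzy_check_env || exit 1` before its first request.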

Pre-release checklist

Before a production rollout:

Confirm the correct tenant_id and site_id
Confirm the deployment has the correct API key
Run a tenant connectivity check
Run one document operation if the release affects content sync
Run one sync chat smoke test
Run one async chat smoke test if async mode is used
Run one feedback submission test if feedback is wired
Confirm dashboards and alerts are active

Smoke test script

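Run a small smoke flow after every deployment and after any key rotation. The script below covers the connectivity check and one sync chat request. Extend it with async and feedback checks if your integration uses those features.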

#!/usr/bin/env bash
# Stop at the first failed step, unset variable, or failed pipe stage.
set -euo pipefail

export UPPZY_BASE_URL="https://api.uppzy.com/api/v1"
export UPPZY_API_KEY="<YOUR_API_KEY>"
export UPPZY_TENANT_ID="<YOUR_TENANT_ID>"
export UPPZY_SITE_ID="<YOUR_SITE_ID>"

echo "1) Connectivity"
# --fail makes curl exit non-zero on HTTP 4xx/5xx, so a bad key or ID fails the smoke test.
curl --silent --show-error --fail \
  --request GET \
  --url "$UPPZY_BASE_URL/m2m/tenants/$UPPZY_TENANT_ID/limits" \
  --header "X-API-Key: $UPPZY_API_KEY"

echo
echo "2) Sync chat"
curl --silent --show-error --fail \
  --request POST \
  --url "$UPPZY_BASE_URL/m2m/sites/$UPPZY_SITE_ID/chat" \
  --header "X-API-Key: $UPPZY_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "session_id": "ops_smoke_sync",
    "message": "What are your support hours?",
    "response_language": "en"
  }'

Keep smoke questions simple and based on approved, stable content.
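For the async smoke test, the main risk is a check that waits forever. One option is to wrap the status check in a loop that stops after a fixed number of attempts. This is a sketch only: `poll_with_budget` is a hypothetical helper. The command it runs is yours to supply, because this page does not define the async endpoint paths.

```shell
# Hypothetical bounded polling loop: re-run a check command until it
# succeeds or the attempt budget is spent.
# Usage: poll_with_budget <max_attempts> <interval_seconds> <command...>
poll_with_budget() {
  local max_attempts=$1 interval=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "Polling budget exhausted after $max_attempts attempts" >&2
  return 1
}
```

For example, `poll_with_budget 30 2 check_async_result` (where `check_async_result` is a hypothetical command that exits 0 once the result is ready) waits at most about 60 seconds plus request time. That makes the async timeout budget explicit.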

Dashboard checklist

At minimum, your dashboards should show:

Request volume by endpoint
Success rate by endpoint
4xx, 429, and 5xx rates
P50 and P95 latency
Timeout count
Async completion latency
Feedback trend

Useful dimensions:

Environment
Integration name
Tenant ID
Site ID
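Dashboards are easier to build when every request produces one log line carrying these dimensions. This is a minimal sketch assuming a key=value log format. `uppzy_metric_line`, `UPPZY_ENV`, and `UPPZY_INTEGRATION` are hypothetical names, not part of Uppzy.

```shell
# Hypothetical structured log line: one line per Uppzy request,
# carrying the dashboard dimensions plus status and latency.
# Usage: uppzy_metric_line <endpoint_label> <http_status> <latency_ms>
uppzy_metric_line() {
  printf 'env=%s integration=%s tenant_id=%s site_id=%s endpoint=%s status=%s latency_ms=%s\n' \
    "${UPPZY_ENV:-unknown}" "${UPPZY_INTEGRATION:-unknown}" \
    "${UPPZY_TENANT_ID:-unknown}" "${UPPZY_SITE_ID:-unknown}" \
    "$1" "$2" "$3"
}
```

With curl, `--write-out '%{http_code} %{time_total}'` reports the HTTP status and the total time in seconds, which can feed the last two arguments after converting seconds to milliseconds.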

Alert checklist

Start with simple operational alerts:

Repeated 401 over a short period
Repeated 403 after a deployment or configuration change
Sustained 429 rate
Sustained 5xx rate
Timeout spike
Async polling timeout spike
Sudden drop in request volume for a critical integration

Avoid alerting on one-off failures. A single failed request is noise, not a trend.
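A common way to separate a sustained rate from a one-off failure is to require both a minimum request count and an error-rate threshold over the same window. This is a sketch under that assumption. `should_alert` and its thresholds are hypothetical and should be tuned per endpoint.

```shell
# Hypothetical alert rule: fire only if the window saw at least
# min_requests requests AND errors/total exceeds threshold_pct percent.
# Usage: should_alert <errors> <total> <threshold_pct> <min_requests>
should_alert() {
  local errors=$1 total=$2 threshold_pct=$3 min_requests=$4
  if [ "$total" -lt "$min_requests" ]; then
    return 1  # too little traffic to call it a trend
  fi
  # Integer form of errors/total > threshold_pct/100, avoiding division.
  [ $((errors * 100)) -gt $((threshold_pct * total)) ]
}
```

With a 5% threshold and a 20-request minimum, one failure out of one request does not alert, but 10 failures out of 100 requests do.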

Content rollout runbook

When publishing a new content batch:

Verify that the source content is approved
Start with a small batch
Record the source IDs in your own sync log
Run smoke questions for the changed topics
Monitor feedback and low-confidence outcomes
Expand the batch only after answer quality is acceptable

Pause the rollout if:

Bad feedback rises sharply
Expected answers stop matching approved content
5xx or timeout rate increases during bulk sync

API key rotation runbook

Treat key rotation as a planned operational change. Recommended sequence:

Create the replacement key
Update runtime configuration in the target environment
Deploy or reload the service
Run the smoke test
Confirm normal traffic and error rate
Revoke the old key after verification

After rotation, watch closely for:

401 spike
Retry storm
Worker failures using stale configuration
Async jobs still using an outdated client instance
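Before a replacement key goes into runtime configuration, it can be checked directly against the tenant connectivity endpoint used in the smoke test. This is a sketch assuming that any 2xx response from that endpoint means the key is accepted. `uppzy_verify_key` is a hypothetical helper. It checks the candidate key only, so the smoke test after reload is still needed.

```shell
# Hypothetical pre-switch check for key rotation.
# Usage: uppzy_verify_key <candidate_key>
uppzy_verify_key() {
  local key=$1 code
  # curl prints 000 here if no HTTP response arrives (e.g. timeout).
  code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    --max-time 10 \
    --header "X-API-Key: $key" \
    "$UPPZY_BASE_URL/m2m/tenants/$UPPZY_TENANT_ID/limits")
  case $code in
    2??) return 0 ;;
    *) echo "Key check failed with HTTP $code" >&2; return 1 ;;
  esac
}
```

Running this between "Create the replacement key" and "Update runtime configuration" means a mistyped or wrongly scoped key is rejected before any service picks it up.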

Incident runbook

`401 Unauthorized`

Check:

Wrong key in runtime configuration
Old key still used after rotation
Missing key in one worker or one deployment target

Immediate action:

Stop repeated retries
Validate current configuration
Re-run connectivity smoke test
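To validate configuration in each worker or deployment target, it helps to print the values the process will actually use without exposing the key. This is a minimal sketch. `uppzy_show_config` is a hypothetical helper that masks the API key to its last four characters. The same output is useful for the 404 checks on tenant and site identifiers.

```shell
# Hypothetical config dump that is safe to paste into an incident channel.
uppzy_show_config() {
  local key_tail
  key_tail=$(printf '%s' "${UPPZY_API_KEY:-}" | tail -c 4)
  printf 'base_url=%s tenant_id=%s site_id=%s api_key=****%s\n' \
    "${UPPZY_BASE_URL:-<unset>}" "${UPPZY_TENANT_ID:-<unset>}" \
    "${UPPZY_SITE_ID:-<unset>}" "$key_tail"
}
```

Comparing the masked key tail across workers quickly shows whether one target is still running with the old key.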

`403 Forbidden`

Check:

Wrong tenant or site targeting
Plan or capability mismatch
Endpoint used by the wrong integration flow

Immediate action:

Pause the affected rollout
Validate environment ownership and endpoint usage

`404 Not Found`

Check:

Wrong tenant_id
Wrong site_id
Wrong request_id
Resource deleted or outside the expected scope

Immediate action:

Inspect the exact identifier values in logs
Confirm which service produced the request

`429 Too Many Requests`

Check:

Traffic burst
Worker concurrency
Polling interval
Unexpected loop or duplicate retries

Immediate action:

Reduce concurrency
Increase backoff
Pause non-critical bulk sync jobs
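"Increase backoff" usually means waiting longer after each consecutive 429. A common pattern is capped exponential backoff, sketched here. `backoff_seconds` and its parameters are hypothetical, and the right base and cap depend on your traffic.

```shell
# Hypothetical capped exponential backoff: base * 2^attempt seconds,
# never more than cap. attempt starts at 0.
# Usage: backoff_seconds <attempt> <base_seconds> <cap_seconds>
backoff_seconds() {
  local attempt=$1 base=$2 cap=$3 delay
  # Clamp the exponent so the shift cannot overflow.
  if [ "$attempt" -gt 20 ]; then
    attempt=20
  fi
  delay=$((base * (1 << attempt)))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}
```

With a 1-second base and a 30-second cap, the waits are 1, 2, 4, 8, 16, 30, 30, and so on. In bash, adding a small random jitter from `$RANDOM` keeps many workers from retrying in lockstep.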

`5xx` or timeout spike

Check:

Current traffic shape
Bulk content sync jobs
Async worker backlog
Recent deployment or configuration change

Immediate action:

Keep retries bounded
Pause non-essential traffic if needed
Run a small smoke flow to confirm recovery
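"Keep retries bounded" also means not retrying errors that a retry cannot fix. The sketch below encodes the runbook above. It treats 429, 5xx, and curl's `000` (no HTTP response, for example a timeout) as retryable, and stops immediately on 401, 403, and 404. `is_retryable_status` and `request_with_retry` are hypothetical helpers. The wrapped command is assumed to print an HTTP status code.

```shell
# Hypothetical retry policy matching the incident runbook.
is_retryable_status() {
  case $1 in
    429|5??|000) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: request_with_retry <max_attempts> <pause_seconds> <command...>
# Prints the final status; returns 0 only on a 2xx status.
request_with_retry() {
  local max=$1 pause=$2 attempt=1 status
  shift 2
  while :; do
    status=$("$@")
    case $status in 2??) echo "$status"; return 0 ;; esac
    if ! is_retryable_status "$status" || [ "$attempt" -ge "$max" ]; then
      echo "$status"
      return 1
    fi
    sleep "$pause"
    attempt=$((attempt + 1))
  done
}
```

With curl, the wrapped command could be `curl --silent --output /dev/null --write-out '%{http_code}' --max-time 10 --header "X-API-Key: $UPPZY_API_KEY" "$UPPZY_BASE_URL/m2m/tenants/$UPPZY_TENANT_ID/limits"`. Because `--write-out` still prints `000` when curl gets no response, timeouts are classified correctly.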

Deploy checklist for application teams

Use this checklist when a new service or release starts using Uppzy:

Shared client layer is in place
User-facing fallback messages are safe
Metrics emit tenant ID, site ID, status, and latency
Retry behavior is bounded
Async timeout budget is defined
Feedback flow is tested if used
A rollback path is documented

Change log checklist

Record these changes in your internal release notes:

API key rotation date
Environment configuration change
New content batch release
Session strategy change
Retry policy change
Async polling timeout change
Alert threshold change
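If release notes are assembled from many services, an append-only log with fixed categories keeps entries comparable. This is a minimal sketch. `record_change`, the category names, and the default log path are hypothetical labels for the items listed above.

```shell
# Hypothetical append-only change log with fixed categories.
# Usage: record_change <category> <detail> [log_file]
record_change() {
  local category=$1 detail=$2 log=${3:-./uppzy-changes.log}
  case $category in
    key_rotation|env_config|content_batch|session_strategy|retry_policy|async_poll_timeout|alert_threshold) ;;
    *) echo "Unknown change category: $category" >&2; return 1 ;;
  esac
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$category" "$detail" >> "$log"
}
```

For example, `record_change retry_policy "max attempts 5 -> 3"` adds one tab-separated, timestamped line that can be copied into the release notes.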

Weekly review checklist

At least once per week, review:

High-error endpoints
429 trend
Timeout trend
Low-confidence answer trend
Bad feedback trend
Recent content rollout impact

Use this review to decide whether the next action should be:

Content improvement
Retry tuning
Polling change
Alert tuning
Rollback of a recent rollout
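For the 429 and error trends, a weekly count by status class can come straight from request logs. This is a sketch assuming one log line per request that contains a `status=<code>` field, a hypothetical format. `summarize_statuses` is a hypothetical helper. It counts 429 separately from other 4xx responses, matching the dashboard list.

```shell
# Hypothetical weekly summary: reads log lines on stdin and counts
# statuses found in "status=<code>" fields.
summarize_statuses() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^status=/) {
          code = substr($i, 8)   # text after "status="
          total++
          if (code == "429") r429++
          else if (code ~ /^4/) r4xx++
          else if (code ~ /^5/) r5xx++
        }
      }
    }
    END { printf "total=%d 4xx=%d 429=%d 5xx=%d\n", total, r4xx, r429, r5xx }
  '
}
```

Comparing this week's line with last week's gives a quick signal of whether retry tuning or polling changes are the next action.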
