- What must be checked before release
- What must be monitored after release
- How to respond to common incidents
- How to rotate credentials and publish content safely
Operating model
Every production integration should have clear ownership for:
- Runtime configuration
- API key lifecycle
- Document rollout
- Monitoring and alerts
- Incident response
- Release approvals
Environment checklist
Keep these values explicit per environment:
- UPPZY_BASE_URL
- UPPZY_API_KEY
- UPPZY_TENANT_ID
- UPPZY_SITE_ID
- Request timeout settings
- Retry settings
- Async polling timeout settings
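One way to keep these values explicit is to load them once at startup and fail fast when any is missing. The sketch below assumes the values arrive as environment variables. The four UPPZY_* names for URL, key, tenant, and site come from the list above; the three timeout and retry variable names, the UppzyConfig class, and load_config are hypothetical and only illustrate the idea.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UppzyConfig:
    base_url: str
    api_key: str
    tenant_id: str
    site_id: str
    request_timeout_s: float
    max_retries: int
    async_poll_timeout_s: float


# The last three names are hypothetical; use whatever your deployment defines.
REQUIRED = (
    "UPPZY_BASE_URL",
    "UPPZY_API_KEY",
    "UPPZY_TENANT_ID",
    "UPPZY_SITE_ID",
    "UPPZY_REQUEST_TIMEOUT_S",
    "UPPZY_MAX_RETRIES",
    "UPPZY_ASYNC_POLL_TIMEOUT_S",
)


def load_config(env: Mapping[str, str] = os.environ) -> UppzyConfig:
    """Build the config from `env`, naming every missing value at once."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing configuration: " + ", ".join(missing))
    return UppzyConfig(
        base_url=env["UPPZY_BASE_URL"].rstrip("/"),
        api_key=env["UPPZY_API_KEY"],
        tenant_id=env["UPPZY_TENANT_ID"],
        site_id=env["UPPZY_SITE_ID"],
        request_timeout_s=float(env["UPPZY_REQUEST_TIMEOUT_S"]),
        max_retries=int(env["UPPZY_MAX_RETRIES"]),
        async_poll_timeout_s=float(env["UPPZY_ASYNC_POLL_TIMEOUT_S"]),
    )
```

Requiring every value, with no hidden defaults, means a misconfigured environment fails at startup rather than on the first production request.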
Pre-release checklist
Before a production rollout:
- Confirm the correct tenant_id and site_id
- Confirm the deployment has the correct API key
- Run a tenant connectivity check
- Run one document operation if the release affects content sync
- Run one sync chat smoke test
- Run one async chat smoke test if async mode is used
- Run one feedback submission test if feedback is wired
- Confirm dashboards and alerts are active
Smoke test script
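A minimal sketch of a smoke runner, assuming each check is supplied by your integration as a plain callable that raises on failure. The names run_smoke and smoke_passed are hypothetical, and no Uppzy endpoint or SDK call is implied; the step functions would wrap whatever client your service already uses.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]
Result = Tuple[str, bool, str]


def run_smoke(steps: List[Step]) -> List[Result]:
    """Run each named step in order and stop at the first failure."""
    results: List[Result] = []
    for name, check in steps:
        try:
            check()
        except Exception as exc:
            results.append((name, False, f"{type(exc).__name__}: {exc}"))
            break  # later steps usually depend on earlier ones
        results.append((name, True, "ok"))
    return results


def smoke_passed(steps: List[Step], results: List[Result]) -> bool:
    """True only if every step ran and every step succeeded."""
    return len(results) == len(steps) and all(ok for _, ok, _ in results)
```

A typical step list would follow the pre-release checklist: connectivity first, then one document operation, one sync chat, one async chat, and one feedback submission, each included only if the integration uses it.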
Use a small smoke flow after deployment and after any key rotation.
Dashboard checklist
At minimum, your dashboards should show:
- Request volume by endpoint
- Success rate by endpoint
- 4xx, 429, and 5xx rate
- P50 and P95 latency
- Timeout count
- Async completion latency
- Feedback trend
Break these metrics down by:
- Environment
- Integration name
- Tenant ID
- Site ID
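If your metrics backend does not compute these figures for you, they can be derived from raw request samples. The sketch below assumes each sample is an (endpoint, HTTP status, latency in ms) tuple and uses the nearest-rank percentile method; the function names and the sample shape are illustrative, not part of any Uppzy tooling.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[str, int, float]  # (endpoint, HTTP status, latency in ms)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; `values` must be non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: List[Sample]) -> Dict[str, Dict[str, float]]:
    """Per-endpoint request volume, status rates, and P50/P95 latency."""
    by_endpoint: Dict[str, List[Sample]] = {}
    for sample in samples:
        by_endpoint.setdefault(sample[0], []).append(sample)

    summary: Dict[str, Dict[str, float]] = {}
    for endpoint, rows in by_endpoint.items():
        n = len(rows)
        latencies = [row[2] for row in rows]
        summary[endpoint] = {
            "requests": n,
            "success_rate": sum(200 <= row[1] < 300 for row in rows) / n,
            "rate_4xx": sum(400 <= row[1] < 500 for row in rows) / n,
            "rate_429": sum(row[1] == 429 for row in rows) / n,
            "rate_5xx": sum(row[1] >= 500 for row in rows) / n,
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
        }
    return summary
```

Note that 429 responses are counted in both rate_4xx and rate_429, so the two figures should be read separately rather than summed.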
Alert checklist
Start with simple operational alerts:
- Repeated 401 over a short period
- Repeated 403 after a deployment or configuration change
- Sustained 429 rate
- Sustained 5xx rate
- Timeout spike
- Async polling timeout spike
- Sudden drop in request volume for a critical integration
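Most alerting systems express "repeated over a short period" as a count within a sliding time window. If you need to evaluate such a rule in application code, a minimal sketch could look like the following; the class name and the example threshold and window values are assumptions, not recommended settings.

```python
from collections import deque
from typing import Deque


class RepeatedStatusAlert:
    """Fires when `threshold` matching events fall within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._events: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one matching response; return True if the alert should fire."""
        self._events.append(timestamp)
        # Drop events that have aged out of the window.
        while self._events and timestamp - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

For example, RepeatedStatusAlert(threshold=3, window_s=60) fed with the timestamps of 401 responses fires on the third 401 seen within one minute.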
Content rollout runbook
When publishing a new content batch:
- Verify that the source content is approved
- Start with a small batch
- Record the source IDs in your own sync log
- Run smoke questions for the changed topics
- Monitor feedback and low-confidence outcomes
- Expand the batch only after answer quality is acceptable
Pause or roll back the rollout if:
- Bad feedback rises sharply
- Expected answers stop matching approved content
- 5xx or timeout rate increases during bulk sync
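The staged rollout above can be sketched as a loop that publishes small batches first and stops as soon as a quality check fails. Everything here is hypothetical scaffolding: publish and quality_ok stand in for your own sync call and your own smoke-question or feedback check, and the starting size and growth factor are arbitrary.

```python
from typing import Callable, Iterator, List, Sequence


def expanding_batches(
    ids: Sequence[str], first: int = 5, factor: int = 2
) -> Iterator[List[str]]:
    """Yield batches that start small and grow by `factor` each round."""
    start, size = 0, first
    while start < len(ids):
        yield list(ids[start:start + size])
        start += size
        size *= factor


def roll_out(
    source_ids: Sequence[str],
    publish: Callable[[List[str]], None],
    quality_ok: Callable[[List[str]], bool],
    sync_log: List[str],
    first: int = 5,
) -> bool:
    """Publish in growing batches; return False at the first failed check."""
    for batch in expanding_batches(source_ids, first=first):
        publish(batch)
        sync_log.extend(batch)  # keep your own record of published source IDs
        if not quality_ok(batch):
            return False  # pause here; sync_log shows what to roll back
    return True
```

Because the sync log is written before the quality check, it always lists every source ID that reached production, including the batch that triggered the pause.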
API key rotation runbook
Treat key rotation as a planned operational change. Recommended sequence:
- Create the replacement key
- Update runtime configuration in the target environment
- Deploy or reload the service
- Run the smoke test
- Confirm normal traffic and error rate
- Revoke the old key after verification
Watch for:
- 401 spike
- Retry storm
- Worker failures using stale configuration
- Async jobs still using an outdated client instance
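The important property of the sequence is that the old key is revoked last, and only after verification passes. A minimal sketch, assuming each operational step is wrapped in a callable you provide; none of these function names correspond to a real Uppzy API.

```python
from typing import Callable


def rotate_key(
    create_key: Callable[[], str],
    apply_config: Callable[[str], None],
    reload_service: Callable[[], None],
    verify: Callable[[], bool],
    revoke_old_key: Callable[[], None],
) -> bool:
    """Run the rotation steps in order; revoke the old key only on success."""
    new_key = create_key()
    apply_config(new_key)
    reload_service()
    # `verify` should cover the smoke test plus a traffic and error-rate check.
    if not verify():
        # Leave the old key valid so the configuration can be rolled back.
        return False
    revoke_old_key()
    return True
```

Workers and async jobs that cache a client instance need to be included in the reload step, otherwise they keep sending the old key after it is revoked.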
Incident runbook
401 Unauthorized
Check:
- Wrong key in runtime configuration
- Old key still used after rotation
- Missing key in one worker or one deployment target
Immediate action:
- Stop repeated retries
- Validate current configuration
- Re-run connectivity smoke test
403 Forbidden
Check:
- Wrong tenant or site targeting
- Plan or capability mismatch
- Endpoint used by the wrong integration flow
Immediate action:
- Pause the affected rollout
- Validate environment ownership and endpoint usage
404 Not Found
Check:
- Wrong tenant_id
- Wrong site_id
- Wrong request_id
- Resource deleted or outside the expected scope
Immediate action:
- Inspect the exact identifier values in logs
- Confirm which service produced the request
429 Too Many Requests
Check:
- Traffic burst
- Worker concurrency
- Polling interval
- Unexpected loop or duplicate retries
Immediate action:
- Reduce concurrency
- Increase backoff
- Pause non-critical bulk sync jobs
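"Increase backoff" usually means exponential backoff with a cap and some jitter so that workers do not retry in lockstep. The sketch below computes a bounded delay schedule using the common full-jitter variant; the base, cap, and retry count are placeholder values, and this is a general technique rather than documented Uppzy guidance.

```python
import random
from typing import Callable, List


def backoff_delays(
    max_retries: int,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> List[float]:
    """Full-jitter exponential backoff.

    Delay i is drawn from [0, min(cap_s, base_s * 2**i)), so the schedule
    is bounded both in length (max_retries) and in per-attempt wait (cap_s).
    """
    return [
        rng() * min(cap_s, base_s * 2 ** attempt)
        for attempt in range(max_retries)
    ]
```

Passing a fixed rng makes the schedule deterministic for tests, while the default spreads retries from concurrent workers across the window.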
5xx or timeout spike
Check:
- Current traffic shape
- Bulk content sync jobs
- Async worker backlog
- Recent deployment or configuration change
Immediate action:
- Keep retries bounded
- Pause non-essential traffic if needed
- Run a small smoke flow to confirm recovery
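Taken together, the incident sections imply a simple client-side policy: 401, 403, and 404 point to configuration or targeting problems and should not be retried, while 429, 5xx, and timeouts can be retried a bounded number of times. A minimal sketch of that decision, where the function name, the return labels, and the default retry limit are assumptions:

```python
from typing import Optional


def next_action(status: Optional[int], attempt: int, max_retries: int = 3) -> str:
    """Return "ok", "retry", or "stop" for one response.

    `status` is the HTTP status code, or None for a timeout.
    `attempt` is the number of retries already made for this request.
    """
    if status is not None and 200 <= status < 300:
        return "ok"
    # Only rate limiting, server errors, and timeouts are worth retrying.
    retryable = status is None or status == 429 or status >= 500
    if retryable and attempt < max_retries:
        return "retry"
    # 401/403/404 and exhausted retries: surface the error instead of looping.
    return "stop"
```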
Deploy checklist for application teams
Use this checklist when a new service or release starts using Uppzy:
- Shared client layer is in place
- User-facing fallback messages are safe
- Metrics emit tenant ID, site ID, status, and latency
- Retry behavior is bounded
- Async timeout budget is defined
- Feedback flow is tested if used
- One rollback path is documented
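A defined async timeout budget means the polling loop has a hard deadline instead of running until the job happens to finish. The sketch below assumes a fetch callable that returns None while the async request is still pending; that contract, the function name, and the parameters are illustrative and do not describe the Uppzy async API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_done(
    fetch: Callable[[], Optional[T]],
    timeout_s: float,
    interval_s: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` every `interval_s` seconds until it returns a result.

    Raises TimeoutError once another wait would exceed the `timeout_s` budget.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if clock() + interval_s > deadline:
            raise TimeoutError(f"no result within {timeout_s} seconds")
        sleep(interval_s)
```

Injecting clock and sleep keeps the loop testable without real waiting, and time.monotonic avoids surprises when the system clock is adjusted.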
Change log checklist
Record these changes in your internal release notes:
- API key rotation date
- Environment configuration change
- New content batch release
- Session strategy change
- Retry policy change
- Async polling timeout change
- Alert threshold change
Weekly review checklist
At least once per week, review:
- High-error endpoints
- 429 trend
- Timeout trend
- Low-confidence answer trend
- Bad feedback trend
- Recent content rollout impact
Use the review to decide whether any of these follow-ups is needed:
- Content improvement
- Retry tuning
- Polling change
- Alert tuning
- Rollback of a recent rollout