Error response model
For failed requests, treat the HTTP status code as the primary contract. The response body may include an `error` field or a small JSON object with details. For each failed request, record:
- HTTP status
- endpoint name
- request context you own
- retry count
- latency
- `request_id` when the response includes one
Status code handling
| Status | Meaning | Client behavior |
|---|---|---|
| 400 | Invalid request payload, missing field, or malformed input | Fix request construction. Do not retry without changing the request. |
| 401 | Missing or invalid authentication | Treat as a configuration or credential incident. Do not retry repeatedly. |
| 403 | Valid identity but action is not allowed | Check plan, tenant access, site access, or allowed capability. Do not blind-retry. |
| 404 | Tenant, site, request, or document was not found in the requested scope | Check identifiers and ownership. Do not blind-retry. |
| 409 | Conflict, when returned by an endpoint | Check whether the resource already exists or whether the operation is duplicated. |
| 429 | Too many requests | Retry with bounded exponential backoff and jitter. |
| 5xx | Temporary server-side or upstream failure | Retry with bounded exponential backoff for safely repeatable operations. |
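The status table can be expressed as a single lookup so that every call site applies the same rule. This is a minimal sketch: the function name `classifyStatus` and the category strings it returns are illustrative assumptions, not values defined by the API.

```javascript
// Hypothetical helper: maps an HTTP status to the client behavior in the table.
// The returned category names are placeholders chosen for this sketch.
function classifyStatus(status) {
  if (status >= 200 && status <= 299) return "ok";
  if (status === 400) return "fix_request";          // do not retry unchanged
  if (status === 401) return "credential_incident";  // alert, do not retry repeatedly
  if (status === 403) return "check_access";         // plan, tenant, site, capability
  if (status === 404) return "check_identifiers";    // identifiers and ownership
  if (status === 409) return "check_duplicate";      // resource exists or duplicated call
  if (status === 429) return "retry_with_backoff";   // bounded backoff with jitter
  if (status >= 500 && status <= 599) return "retry_if_repeatable";
  return "do_not_retry";                             // anything unlisted: be conservative
}
```

Treating unlisted statuses as non-retryable is a conservative choice made for this sketch; adjust it if an endpoint documents other codes.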
Retry decision table
| Operation | Retry on 429 | Retry on 5xx | Retry on timeout | Notes |
|---|---|---|---|---|
| Read tenant or site information | Yes | Yes | Yes | Safe to retry with backoff. |
| Read sessions or statistics | Yes | Yes | Yes | Safe to retry with backoff. |
| Create text or Q&A document | Yes, bounded | Yes, bounded | Use care | Avoid creating duplicates in your own sync process. |
| Upload file document | Yes, bounded | Yes, bounded | Use care | Keep a source-to-document sync log in your system. |
| Sync chat | Yes, bounded | Yes, bounded | Use care | The user may have already waited; show a fallback on final failure. |
| Async chat submit | Yes, bounded | Yes, bounded | Use care | Store local job state and avoid duplicate user notifications. |
| Async chat polling | Yes | Yes | Stop at timeout budget | Polling should never run forever. |
| Feedback submission | Yes, bounded | Yes, bounded | Use care | Avoid duplicate feedback events from the same UI event. |
| Session close | Yes, bounded | Yes, bounded | Yes | Safe if your app treats close as idempotent at the workflow level. |
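One way to keep retry decisions consistent is to encode the table as data. The sketch below does that under several assumptions: the operation keys, the `shouldRetry` function, and its options are hypothetical names; "Use care" is interpreted as "retry only if the caller has its own duplicate guard"; and every operation is capped by `maxAttempts`.

```javascript
const YES = "yes";       // retry with backoff
const CARE = "care";     // "Use care": retry only with a caller-side duplicate guard
const BUDGET = "budget"; // "Stop at timeout budget": retry only while budget remains

// Hypothetical operation keys; each row mirrors one row of the retry decision table.
const RETRY_POLICY = {
  readTenantOrSite:    { "429": YES, "5xx": YES, timeout: YES },
  readSessionsOrStats: { "429": YES, "5xx": YES, timeout: YES },
  createDocument:      { "429": YES, "5xx": YES, timeout: CARE },
  uploadFile:          { "429": YES, "5xx": YES, timeout: CARE },
  syncChat:            { "429": YES, "5xx": YES, timeout: CARE },
  asyncChatSubmit:     { "429": YES, "5xx": YES, timeout: CARE },
  asyncChatPoll:       { "429": YES, "5xx": YES, timeout: BUDGET },
  submitFeedback:      { "429": YES, "5xx": YES, timeout: CARE },
  closeSession:        { "429": YES, "5xx": YES, timeout: YES },
};

// failure is one of "429", "5xx", or "timeout"; attempt counts from 1.
function shouldRetry(operation, failure, options = {}) {
  const { attempt = 1, maxAttempts = 3, hasDedupeGuard = false, budgetLeftMs = 0 } = options;
  const policy = RETRY_POLICY[operation];
  if (!policy || attempt >= maxAttempts) return false; // unknown operation or retries exhausted
  const rule = policy[failure];
  if (rule === YES) return true;
  if (rule === CARE) return hasDedupeGuard;
  if (rule === BUDGET) return budgetLeftMs > 0;
  return false;
}
```

For example, `shouldRetry("uploadFile", "timeout")` returns `false` until the caller passes `hasDedupeGuard: true`, which matches the note about keeping a source-to-document sync log before repeating an upload.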
JavaScript error class
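A custom error class for this integration could look like the sketch below. The class name `ApiError` and its field names are illustrative assumptions, not part of the API. The `retryable` getter covers only the status-based rules (429 and 5xx); timeouts are left to the caller because that decision depends on the operation.

```javascript
// Hypothetical error type carrying the fields application code needs to branch on.
class ApiError extends Error {
  constructor(message, details = {}) {
    super(message);
    this.name = "ApiError";
    this.status = details.status ?? null;       // HTTP status, or null if no response arrived
    this.endpoint = details.endpoint ?? null;   // endpoint name or path template
    this.requestId = details.requestId ?? null; // request_id when the response includes one
    this.timedOut = details.timedOut ?? false;  // true when the client-side timeout fired
    this.retryCount = details.retryCount ?? 0;  // retries already performed
  }

  // Status-based retryability only: 429 and 5xx.
  get retryable() {
    return this.status === 429 || (this.status >= 500 && this.status <= 599);
  }
}
```

Callers can then write `if (err instanceof ApiError && err.retryable)` instead of matching on message text.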
Use a custom error type so application code can make decisions without parsing strings.
JavaScript client with timeout and retry
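The following is a minimal sketch of a request wrapper with a per-attempt timeout and bounded retries. It assumes a runtime that provides `fetch` and `AbortController` (for example Node 18+ or a modern browser). `requestWithRetry`, `ApiRequestError`, and every option name are hypothetical, and the defaults are placeholders rather than recommended values. A compact error type is included so the block runs on its own.

```javascript
// Compact hypothetical error type so this block is self-contained.
class ApiRequestError extends Error {
  constructor(message, { status = null, timedOut = false, attempts = 0 } = {}) {
    super(message);
    this.name = "ApiRequestError";
    this.status = status;
    this.timedOut = timedOut;
    this.attempts = attempts;
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff with full jitter, capped at maxDelayMs.
function backoffDelay(attempt, baseDelayMs, maxDelayMs) {
  const ceiling = Math.min(maxDelayMs, baseDelayMs * 2 ** (attempt - 1));
  return Math.random() * ceiling;
}

async function requestWithRetry(url, init = {}, options = {}) {
  const {
    maxAttempts = 3,
    timeoutMs = 10000,
    baseDelayMs = 250,
    maxDelayMs = 4000,
    retryOnTimeout = false, // enable only for safely repeatable operations
    fetchImpl = fetch,      // injectable for tests
  } = options;

  for (let attempt = 1; ; attempt++) {
    // Abort this attempt if no response arrives within timeoutMs.
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    let error;
    try {
      const response = await fetchImpl(url, { ...init, signal: controller.signal });
      if (response.ok) return response;
      error = new ApiRequestError(`HTTP ${response.status}`, {
        status: response.status,
        attempts: attempt,
      });
    } catch {
      // Either our timeout fired or the network call failed for another reason.
      const timedOut = controller.signal.aborted;
      error = new ApiRequestError(timedOut ? "Request timed out" : "Network failure", {
        timedOut,
        attempts: attempt,
      });
    } finally {
      clearTimeout(timer);
    }

    // Retry only 429, 5xx, and (when opted in) timeouts; other failures surface at once.
    const retryable =
      error.status === 429 ||
      (error.status >= 500 && error.status <= 599) ||
      (error.timedOut && retryOnTimeout);
    if (!retryable || attempt >= maxAttempts) throw error;
    await sleep(backoffDelay(attempt, baseDelayMs, maxDelayMs));
  }
}
```

In this sketch, network failures that are not timeouts are not retried; that is a conservative choice for illustration, and the timer covers only the wait for the response, not reading the body afterwards.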
This example retries only retryable failures and applies a timeout to each request attempt.
Safe user-facing fallback
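A small lookup is usually enough to keep user-facing messages safe. The sketch below assumes failures are represented as objects with optional `status` and `timedOut` fields; the function name `toUserMessage` and the message wording are placeholders to replace with your product's own copy.

```javascript
// Placeholder copy: none of these messages reveals status codes or internal details.
const SAFE_MESSAGES = {
  busy: "The service is busy right now. Please try again in a moment.",
  unavailable: "Something went wrong on our side. Please try again later.",
  timeout: "This is taking longer than expected. Please try again.",
  generic: "We could not complete your request. Please try again or contact support.",
};

// Hypothetical mapper from an integration failure to one of the safe messages.
function toUserMessage(failure) {
  const { status = null, timedOut = false } = failure ?? {};
  if (timedOut) return SAFE_MESSAGES.timeout;
  if (status === 429) return SAFE_MESSAGES.busy;
  if (status >= 500 && status <= 599) return SAFE_MESSAGES.unavailable;
  // 400, 401, 403, 404, 409, and anything else: keep the cause internal.
  return SAFE_MESSAGES.generic;
}
```

Configuration problems such as 401 and 403 deliberately fall through to the generic message, since the user cannot act on them and the detail belongs in logs and alerts.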
Do not expose raw API errors to end users. Map integration failures to a small set of safe messages in your product.
Async polling failure handling
For async chat, distinguish submit failures from polling failures. When polling exceeds its timeout budget:
- Mark the local job as timed out
- Store the `request_id`
- Avoid endless polling
- Let the user retry through your own product flow
- Review timeout frequency in monitoring
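The polling rules above can be sketched as a loop with an overall deadline. Everything named here is an assumption for illustration: `pollUntilDone`, the caller-supplied `fetchStatus(requestId)` function (expected to resolve to an object with a boolean `done`), the in-memory `jobs` map standing in for your own job store, and the default budget and interval.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the job is done or the overall budget is spent, whichever comes first.
// Errors thrown by fetchStatus propagate to the caller unchanged.
async function pollUntilDone(requestId, fetchStatus, options = {}) {
  const { budgetMs = 60000, intervalMs = 1000, jobs = new Map() } = options;
  const deadline = Date.now() + budgetMs;

  // Store the request_id alongside local job state before polling starts.
  jobs.set(requestId, { state: "pending" });

  while (Date.now() < deadline) {
    const result = await fetchStatus(requestId);
    if (result.done) {
      jobs.set(requestId, { state: "completed" });
      return result;
    }
    // Never sleep past the deadline.
    await sleep(Math.min(intervalMs, Math.max(0, deadline - Date.now())));
  }

  // Budget exhausted: mark the local job as timed out and let the product flow offer a retry.
  jobs.set(requestId, { state: "timed_out" });
  return { done: false, timedOut: true, requestId };
}
```

Returning a `timedOut` result rather than throwing keeps polling timeouts distinct from submit failures, and the stored state gives monitoring something to count.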
Logging checklist
Log enough to debug without leaking sensitive data. Recommended fields:
- Integration name
- Environment
- Endpoint path template
- HTTP status
- Tenant ID and site ID
- Session ID
- Request ID when available
- Retry count
- Latency
- Timeout flag
Do not log:
- API keys
- Authorization headers
- Full secret configuration
- Raw private customer records
- Full document content in normal production logs
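An allowlist is one simple way to enforce this checklist: copy only the recommended fields into the log record and drop everything else, so secrets and document content cannot leak by accident. The field names and the `toLogRecord` function below are hypothetical; adapt them to your own logging schema.

```javascript
// Hypothetical field names corresponding to the recommended logging fields.
const LOG_FIELDS = [
  "integration", "environment", "endpoint", "status", "tenantId", "siteId",
  "sessionId", "requestId", "retryCount", "latencyMs", "timedOut",
];

// Builds a log record from the allowlisted fields only; any other key is discarded.
function toLogRecord(context) {
  const record = {};
  for (const key of LOG_FIELDS) {
    if (context[key] !== undefined) record[key] = context[key];
  }
  return record;
}
```

With this approach a call such as `toLogRecord({ status: 429, apiKey: "..." })` yields a record containing only `status`, because `apiKey` is not on the allowlist.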
Incident routing
Use these routing rules when alerts fire:
- Repeated `401`: check key rotation, deployment configuration, and revoked keys
- Repeated `403`: check plan level, tenant access, site access, and enabled capability
- Repeated `404`: check configured `tenant_id`, `site_id`, `request_id`, or document ID
- Repeated `429`: reduce concurrency, add backoff, and inspect traffic spikes
- Repeated `5xx`: keep bounded retries, monitor recovery, and pause non-critical bulk jobs if needed
- Polling timeouts: check async job volume, user timeout budget, and worker concurrency
Production checklist
- All HTTP calls have timeout settings
- Retries are bounded
- Retryable and non-retryable failures are separated
- `401` and `403` create operational alerts instead of retry storms
- Async polling has an overall timeout
- User-facing errors are safe and non-technical
- Server logs exclude API keys and sensitive customer records
- Bulk document sync jobs can pause when error rate increases