When to use Website Crawl
Use Website Crawl for:
- Help center pages
- Product pages
- Service pages
- Public policy pages
- Public FAQ pages
Start a website crawl
- Open Reference Documents for the selected domain.
- Select Website Crawl.
- Enter the website URL you want to import from.
- Optionally set the maximum number of pages.
- Start the crawl.
- Watch the crawl job status.
- Review the imported documents after the crawl finishes.
- Activate only the pages that should be used by the assistant.
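The steps above can be sketched as a minimal breadth-first crawl capped by a maximum page count. This is an illustrative sketch, not the product's actual implementation; the site is modeled as an in-memory link graph instead of live HTTP fetches, and all names are hypothetical.

```python
from collections import deque

# Hypothetical in-memory site: URL -> list of linked URLs.
# A real crawl would fetch each page over HTTP instead.
SITE = {
    "https://example.com/": ["https://example.com/help", "https://example.com/pricing"],
    "https://example.com/help": ["https://example.com/help/faq"],
    "https://example.com/pricing": [],
    "https://example.com/help/faq": [],
}

def crawl(start_url, max_pages):
    """Breadth-first crawl from start_url, stopping once max_pages pages are visited."""
    visited = []
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # in a real crawl, fetch and store the page here
        for link in SITE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("https://example.com/", max_pages=3))
```

Because the crawl is breadth-first, the cap keeps the pages closest to the starting URL, which is usually what you want for a first review pass.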
Maximum pages
Maximum pages controls how many pages the crawl attempts to collect. Use a smaller number for the first crawl so you can review the results more easily, and increase it later if the first results are relevant and useful.

Crawl jobs list
The crawl jobs list shows previous or current crawl activity. It can help you understand:
- Which URL was used
- Whether the crawl is still running
- How many pages were visited
- How many documents were stored
- Whether any pages were skipped or failed
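One way to picture a crawl job entry is as a small record holding exactly these fields. The field names below are illustrative, not the product's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class CrawlJob:
    """Illustrative crawl job record; all field names are hypothetical."""
    start_url: str
    status: str = "running"      # e.g. "running", "completed", "failed"
    pages_visited: int = 0
    documents_stored: int = 0
    skipped: list = field(default_factory=list)  # URLs that were skipped or failed

# A finished job might look like this:
job = CrawlJob(start_url="https://example.com/help")
job.pages_visited = 25
job.documents_stored = 23
job.skipped = ["https://example.com/help/old", "https://example.com/login"]
job.status = "completed"

print(job.status, job.pages_visited, job.documents_stored, len(job.skipped))
```

The gap between pages visited and documents stored, plus the skipped list, is usually the first thing to check when a crawl returns fewer documents than expected.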
Review imported documents
After a crawl finishes, review the resulting documents in the list. For each imported page:
- Preview the content.
- Confirm the page is useful for visitor questions.
- Deactivate pages that are outdated, duplicated, or unrelated.
- Activate the pages the assistant should use.
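The review pass above amounts to flipping an active flag per document. A minimal sketch, assuming documents are plain records with an `active` flag and the URLs to deactivate are known from manual review (all data here is made up):

```python
# Hypothetical imported documents, all inactive until reviewed.
documents = [
    {"url": "https://example.com/help/returns", "title": "Return policy", "active": False},
    {"url": "https://example.com/help/returns-2019", "title": "Return policy (2019)", "active": False},
    {"url": "https://example.com/sitemap", "title": "Sitemap", "active": False},
]

# URLs a reviewer decided are outdated, duplicated, or unrelated.
DO_NOT_ACTIVATE = {
    "https://example.com/help/returns-2019",  # outdated duplicate
    "https://example.com/sitemap",            # navigation-only page
}

for doc in documents:
    doc["active"] = doc["url"] not in DO_NOT_ACTIVATE

active_urls = [d["url"] for d in documents if d["active"]]
print(active_urls)  # only the current return policy remains active
```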
Delete a crawl job
Deleting a completed or failed crawl job can remove both the job history and the documents imported from that crawl. Use this when the crawl targeted the wrong URL or produced results you do not want to keep. Before deleting, confirm that none of the documents imported by that crawl are still needed.

Crawl quality checklist
Before activating crawl results, confirm that:
- The starting URL was correct.
- Imported pages contain useful visitor-facing content.
- Duplicate pages have been removed or deactivated.
- Navigation-only pages are not active.
- Outdated or campaign-only pages are not active.
- Test questions produce answers based on the expected pages.
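The duplicate check in the list above can be partly automated by hashing normalized page text, so the same content reached via different URLs (for example, tracking parameters) is flagged. A sketch under that assumption, with made-up page data:

```python
import hashlib

def content_key(text):
    """Normalize whitespace and case, then hash, so near-identical pages compare equal."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "https://example.com/faq": "How do refunds work?  Refunds take 5 days.",
    "https://example.com/faq?utm=ad": "How do refunds work? Refunds take 5 days.",
    "https://example.com/shipping": "Shipping takes 2 days.",
}

seen = {}
duplicates = []
for url, text in pages.items():
    key = content_key(text)
    if key in seen:
        duplicates.append(url)  # candidate to deactivate
    else:
        seen[key] = url

print(duplicates)
```

This only catches exact duplicates after normalization; pages with lightly reworded content still need a manual pass.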