Website Crawl helps you import content from public website pages into Reference Documents. Use it when your website already contains useful public information that visitors may ask about.

When to use Website Crawl

Use Website Crawl for:
  • Help center pages
  • Product pages
  • Service pages
  • Public policy pages
  • Public FAQ pages
Avoid crawling pages that do not help answer visitor questions, such as checkout pages, account pages, search pages, campaign-only pages, or pages that consist mostly of navigation.
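
If you already have a list of your site's URLs (for example, from a sitemap), you can pre-screen it before choosing what to crawl. The sketch below is a local planning aid only and is not part of Website Crawl. The segment names in `EXCLUDED_SEGMENTS`, the function name `is_worth_crawling`, and the example URLs are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical path segments that usually mark pages with no answer content.
EXCLUDED_SEGMENTS = {"checkout", "cart", "account", "login", "search", "campaign"}


def is_worth_crawling(url: str) -> bool:
    """Return False when any path segment of the URL matches an excluded name."""
    path = urlparse(url).path.lower()
    segments = [segment for segment in path.split("/") if segment]
    return not any(segment in EXCLUDED_SEGMENTS for segment in segments)


# Example URLs are made up for this sketch.
urls = [
    "https://example.com/help/shipping",
    "https://example.com/checkout/payment",
    "https://example.com/account/orders",
    "https://example.com/products/widget",
]

to_review = [url for url in urls if is_worth_crawling(url)]
print(to_review)
```

Adjust the excluded segments to match how your own site names its checkout, account, and search paths.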

Start a website crawl

  1. Open Reference Documents for the selected domain.
  2. Select Website Crawl.
  3. Enter the website URL you want to import from.
  4. Optionally set the maximum number of pages.
  5. Start the crawl.
  6. Watch the crawl job status.
  7. Review the imported documents after the crawl finishes.
  8. Activate only the pages that should be used by the assistant.
Use a focused starting URL when possible. For example, start with a help center or product documentation section instead of the entire website home page.

Maximum pages

Maximum pages controls how many pages the crawl should try to collect. Use a smaller number for the first crawl so you can review the results more easily. Increase the number later if the first results are relevant and useful.
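
To see why a page limit and a focused starting URL work well together, consider a minimal sketch of a breadth-first walk over a small, made-up link graph. This is not how Website Crawl is implemented; the traversal order, the `bounded_crawl` function, and the `SITE` graph are assumptions used only to illustrate the idea.

```python
from collections import deque


def bounded_crawl(links, start, max_pages):
    """Walk a link graph breadth-first and stop after max_pages pages."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for linked_page in links.get(page, []):
            if linked_page not in seen:
                seen.add(linked_page)
                queue.append(linked_page)
    return visited


# Hypothetical site structure: each page maps to the pages it links to.
SITE = {
    "/": ["/help", "/products", "/blog"],
    "/help": ["/help/shipping", "/help/returns"],
    "/products": ["/products/widget"],
}

print(bounded_crawl(SITE, "/", 3))      # ['/', '/help', '/products']
print(bounded_crawl(SITE, "/help", 3))  # ['/help', '/help/shipping', '/help/returns']
```

With the same limit of three pages, starting from the home page spends the budget on section landing pages, while starting from the help section collects the actual help articles.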

Crawl jobs list

The crawl jobs list shows past and in-progress crawl activity. It can help you understand:
  • Which URL was used
  • Whether the crawl is still running
  • How many pages were visited
  • How many documents were stored
  • Whether any pages were skipped or failed
Select a job to inspect its activity while it is running or after it finishes.
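
As a way to think about what a job entry tells you, the sketch below models one job as a small record and prints a one-line summary. The `CrawlJob` class, its field names, and the sample numbers are hypothetical; they mirror the items listed above and do not describe an actual export format or API.

```python
from dataclasses import dataclass


@dataclass
class CrawlJob:
    # Field names are assumptions that mirror the list above.
    start_url: str
    running: bool
    pages_visited: int
    documents_stored: int
    pages_skipped: int
    pages_failed: int


def summarize(job: CrawlJob) -> str:
    """Build a one-line, human-readable summary of a crawl job."""
    state = "running" if job.running else "finished"
    return (
        f"{job.start_url}: {state}, {job.documents_stored} of "
        f"{job.pages_visited} visited pages stored "
        f"({job.pages_skipped} skipped, {job.pages_failed} failed)"
    )


# Sample values are made up for illustration.
job = CrawlJob("https://example.com/help", False, 40, 31, 6, 3)
print(summarize(job))
```

A large gap between visited pages and stored documents is a cue to open the job and check which pages were skipped or failed.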

Review imported documents

After a crawl finishes, review the resulting documents in the list. For each imported page:
  1. Preview the content.
  2. Confirm the page is useful for visitor questions.
  3. Deactivate pages that are outdated, duplicated, or unrelated.
  4. Activate the pages the assistant should use.
Do not leave every imported page active by default. A smaller set of high-quality pages usually produces better answers than a large set of mixed pages.
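
Duplicates are often the same page reached through slightly different URLs. If you can copy the text of imported pages into a script, one rough way to spot exact duplicates is to compare hashes of the normalized text. This is a minimal sketch under that assumption; the function names and the sample documents are invented for illustration, and it does not detect pages that are only similar.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(documents: dict) -> list:
    """Group document URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in documents.items():
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Made-up sample content for this sketch.
docs = {
    "/help/returns": "Returns are accepted within 30 days.",
    "/help/returns?ref=footer": "Returns are  accepted within 30 days. ",
    "/help/shipping": "Orders ship in 2 business days.",
}

print(find_duplicates(docs))  # [['/help/returns', '/help/returns?ref=footer']]
```

For each group found this way, keep one page active and deactivate the rest.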

Delete a crawl job

Deleting a completed or failed crawl job can remove both the job history and the documents imported from that crawl. Use this when the crawl targeted the wrong URL or produced results you do not want to keep. Before deleting, confirm that you no longer need any of the documents imported by that crawl.

Crawl quality checklist

Before activating crawl results, confirm that:
  1. The starting URL was correct.
  2. Imported pages contain useful visitor-facing content.
  3. Duplicate pages have been removed or deactivated.
  4. Navigation-only pages are not active.
  5. Outdated or campaign-only pages are not active.
  6. Test questions produce answers based on the expected pages.