Reference Documents: Website Crawl

Website Crawl helps you import content from public website pages into Reference Documents. Use it when your website already contains useful public information that visitors may ask about.

When to use Website Crawl

Use Website Crawl for:

Help center pages
Product pages
Service pages
Public policy pages
Public FAQ pages

Avoid crawling pages that do not help answer visitor questions, such as checkout pages, account pages, search pages, campaign-only pages, or pages that consist mostly of navigation.
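
If you keep a list of candidate URLs before starting a crawl, a small script can flag the ones that match the page types to avoid. The sketch below is only an illustration and is not part of Website Crawl. The path fragments in `SKIP_FRAGMENTS` and the function name `is_worth_crawling` are assumptions, so adjust them to match how your own site names these pages.

```python
from urllib.parse import urlparse

# Hypothetical path fragments for pages that rarely help answer visitor questions.
# Adjust these to your own site's URL structure.
SKIP_FRAGMENTS = ("/checkout", "/cart", "/account", "/login", "/search", "/campaign")


def is_worth_crawling(url: str) -> bool:
    """Return False when the URL path suggests a page type to avoid."""
    path = urlparse(url).path.lower()
    return not any(fragment in path for fragment in SKIP_FRAGMENTS)


candidates = [
    "https://example.com/help/shipping",
    "https://example.com/checkout/step-1",
    "https://example.com/search?q=returns",
    "https://example.com/products/widget",
]

# Keep only the URLs that pass the check.
keep = [url for url in candidates if is_worth_crawling(url)]
```

A check like this only looks at the URL, so it cannot spot navigation-only pages. Those still need a manual review after the crawl.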

Start a website crawl

Open Reference Documents for the selected domain.
Select Website Crawl.
Enter the website URL you want to import from.
Optionally set the maximum number of pages.
Start the crawl.
Watch the crawl job status.
Review the imported documents after the crawl finishes.
Activate only the pages that should be used by the assistant.

Use a focused starting URL when possible. For example, start with a help center or product documentation section instead of the entire website home page.

Maximum pages

Maximum pages controls how many pages the crawl should try to collect. Use a smaller number for the first crawl so you can review the results more easily. Increase the number later if the first results are relevant and useful.
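
To see why a focused starting URL and a small page limit make the first crawl easier to review, consider the simplified simulation below. It is a sketch under stated assumptions and not a description of how Website Crawl works internally: it assumes a breadth-first visit order, and the `links` map, the paths in it, and the function `simulate_crawl` are hypothetical.

```python
from collections import deque


def simulate_crawl(links: dict[str, list[str]], start_url: str, max_pages: int) -> list[str]:
    """Visit pages breadth-first from start_url and stop after max_pages pages."""
    visited: list[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        # Queue every linked page that has not been seen yet.
        for linked in links.get(url, []):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return visited


# Hypothetical site structure: each page maps to the pages it links to.
links = {
    "/": ["/help", "/products", "/blog"],
    "/help": ["/help/shipping", "/help/returns"],
    "/products": ["/products/widget"],
}

from_home = simulate_crawl(links, "/", max_pages=3)
from_help = simulate_crawl(links, "/help", max_pages=3)
```

In this simulation, starting at the home page with a limit of three pages uses the whole limit on top-level pages, while starting at `/help` with the same limit collects the help articles themselves.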

Crawl jobs list

The crawl jobs list shows current and previous crawl jobs. For each job, it can help you understand:

Which URL was used
Whether the crawl is still running
How many pages were visited
How many documents were stored
Whether any pages were skipped or failed

Select a job to inspect its activity while it is running or after it finishes.

Review imported documents

After a crawl finishes, review the resulting documents in the list. For each imported page:

Preview the content.
Confirm the page is useful for visitor questions.
Deactivate pages that are outdated, duplicated, or unrelated.
Activate the pages the assistant should use.

Do not leave every imported page active by default. A smaller set of high-quality pages usually produces better answers than a large set of mixed pages.
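
Duplicated pages are easier to find when the page text is compared after normalizing case and whitespace. The sketch below assumes you have the text of each imported page available outside the product, for example by copying it from the preview. The `pages` data and the function names `fingerprint` and `find_duplicates` are hypothetical, and the check only catches pages whose text is identical after normalization, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose normalized text is identical; return groups of two or more."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical imported pages: URL mapped to page text.
pages = {
    "https://example.com/returns": "Returns  policy\n30 days",
    "https://example.com/returns?ref=footer": "returns policy 30 days",
    "https://example.com/shipping": "Shipping takes 3 to 5 days",
}

duplicates = find_duplicates(pages)
```

Each group in `duplicates` is a set of pages where you would typically keep one active and deactivate the rest.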

Delete a crawl job

Deleting a completed or failed crawl job can remove the job history and the documents imported from that crawl. Use this when the crawl targeted the wrong URL or produced results you do not want to keep. Before deleting, confirm that you no longer need any of the documents imported from that crawl.

Crawl quality checklist

Before activating crawl results, confirm that:

The starting URL was correct.
Imported pages contain useful visitor-facing content.
Duplicate pages have been removed or deactivated.
Navigation-only pages are not active.
Outdated or campaign-only pages are not active.
Test questions produce answers based on the expected pages.
