Hi Everyone,

As we have been adding URLs to our AI Agent’s knowledge sources, I’ve noticed a few challenges during the crawling process:

  1. Unnecessary Pages Being Crawled – Some sites generate unwanted pages, which I have to manually remove. It would be more efficient if there were a way to refine or control which pages are included during the crawl.
  2. Handling Redirects – Some pages redirect, which causes the bot to fail to read their content. Is there a way to improve how redirects are handled to ensure successful crawling?
  3. Unclear Error Messages – Certain pages fail to load, but the error messages provided aren’t detailed enough to diagnose the issue (e.g., “Could not process URL”). More descriptive error reporting would help with troubleshooting. I’ve sketched a rough pre-check just after this list.
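
In the meantime, here’s the kind of pre-check I have in mind on my side: a rough Python sketch using the requests library (the URLs below are just placeholders). It follows redirects and prints the final status or the underlying error for each page, which is much more informative than “could not process URL”:

```python
import requests

# Placeholder list -- swap in the pages you plan to add as knowledge sources.
CANDIDATE_URLS = [
    "https://example.com/docs/getting-started",
    "https://example.com/old-page-that-redirects",
]

def precheck(url):
    """Request the page, follow redirects, and report what actually happened."""
    try:
        resp = requests.get(url, timeout=10, allow_redirects=True)
        note = f" (redirected to {resp.url})" if resp.url != url else ""
        return f"{url} -> HTTP {resp.status_code}{note}"
    except requests.RequestException as exc:
        # Surfaces the underlying reason (DNS failure, timeout, SSL error, ...)
        return f"{url} -> failed: {exc}"

if __name__ == "__main__":
    for url in CANDIDATE_URLS:
        print(precheck(url))
```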

I believe these challenges could be addressed by allowing users to upload a sitemap, enabling the bot to crawl only the specified pages. Has anyone else encountered similar issues? Would a sitemap upload feature be feasible for future development?
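
For the sitemap idea, this is roughly what I’m picturing: a minimal sketch (the sitemap URL is just an example) that pulls the page list out of a standard sitemap.xml, so the crawler would only ever touch the URLs the site owner actually publishes:

```python
import xml.etree.ElementTree as ET
import requests

# Example location -- most sites expose a sitemap at /sitemap.xml.
SITEMAP_URL = "https://example.com/sitemap.xml"

def sitemap_urls(sitemap_url):
    """Fetch a standard sitemap.xml and return the <loc> entries it lists."""
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

if __name__ == "__main__":
    for url in sitemap_urls(SITEMAP_URL):
        print(url)  # the exact page list I’d want the crawler to stick to
```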

Looking forward to any insights or workarounds you may have!

Thanks! Bryce
