AgenticFlow

Product details

Q: Questions

Hi,

I'm interested in the app and have a few questions.

1. Is there a way to make the agent answer only on the training data and not generic knowledge?

2. Is there a way to train URLs and possibly sitemaps? If so, does it have auto-retrain functionality?

101366915154176305384PLUSMay 28, 2025
Founder Team
SeanP_AgenticFlowAI

SeanP_AgenticFlowAI

Jun 1, 2025

A: Hey there!

Great questions – these are key to building effective and reliable AI agents!

1. Agent Answering ONLY on Training Data (Restricting Generic Knowledge):

Yes, this is crucial and achievable through careful prompting. In your Agent's System Prompt (its core instructions), you need to be very explicit:

- "You are an assistant for [My Company/Product]. Your role is to answer questions exclusively based on the information provided in your knowledge base (the documents and website content you have been given)."

"Do NOT use your general knowledge or information from outside these provided sources to answer user queries."

"If you cannot find an answer to a question within your provided knowledge base, politely state that you don't have that specific information and offer to [e.g., connect them to support / provide a contact email / search the web if you've given it that tool explicitly]."

This technique, combined with Retrieval-Augmented Generation (RAG) where the agent first searches your documents, significantly helps in keeping answers grounded in your data. While no LLM can be 100% guaranteed to never access its base training, strong prompting makes a huge difference.

2. Training on URLs and Sitemaps & Auto-Retrain:

Training on URLs:

Yes. When you create an Agent using our 1-click widget in Templates page, you can directly paste URLs, and AgenticFlow will attempt to crawl and index the content from those pages for the agent's knowledge base.

You can also use workflow nodes like Web Scraping or the Firecrawl MCP (https://agenticflow.ai/mcp/firecrawl) to fetch content from URLs and then process/add that content to a dataset your agent can reference.

Training on Sitemaps:
AgenticFlow doesn't have a direct "input sitemap.xml" feature for agent creation at this moment.
Workaround: You can use a workflow:
- Fetch the sitemap.xml (e.g., using the Web Scraping node to get its content or the Firecrawl Map node if it can process sitemaps).
- Parse the XML to extract all the individual page URLs.
- Then, loop through those URLs, scrape each one, and compile the content into a dataset or feed it to an agent for knowledge ingestion (e.g., by updating a Table Dataset programmatically via API, or soon, direct knowledge base updates via API).

This is a great feature request for more direct sitemap support! Please add it to our roadmap: https://agenticflow.featurebase.app/

Auto-Retrain Functionality:
Not fully automatic in the background yet. Currently, if your website content or uploaded documents change, you would typically need to:
- Re-crawl the URLs (if using the agent's URL knowledge source and there's a "re-sync" option, which we're improving).
- Re-upload updated files to the Agent's Knowledge Base.
- Or, re-run the workflow that populates its Table Dataset.

Scheduled Re-Training (Roadmap/Workaround): True "auto-retrain on a schedule" (e.g., "re-crawl these URLs every week and update the agent's knowledge") is a more advanced feature we're planning. For now, you could build a scheduled workflow (triggered by external cron + API call) that re-scrapes key URLs and updates a Table Dataset that your agent uses for RAG.

We're continuously working on making knowledge ingestion and updates more seamless and automated. Your feedback helps us prioritize!

Hope this helps!
— Sean

Share
Helpful?
Log in to join the conversation
Related questions
View product details