This capstone combines the whole LangChain tutorial into one realistic developer project: a documentation assistant that answers questions from private files, cites sources, can call a small business tool, and exposes an API endpoint for a frontend or internal dashboard.
The goal is not to copy one script. The goal is to understand the boundaries: ingestion prepares searchable knowledge, retrieval finds relevant chunks, the answer chain grounds model output, tools handle live operations, and evaluation protects the application when you change prompts, models, chunk sizes, or retrievers.
Treat this capstone as a practical lesson, not as a label. For each stage, name the input, the step that transforms it, and the result you should be able to predict: a user question goes in, retrieval selects chunks, and a grounded, cited answer comes out.
Each part of the project connects a definition to a working scenario, a mistake that beginners actually make, and the exact check that proves the fix. That makes the material useful for coding, debugging, and interview revision.
A production RAG agent is two systems working together: a deterministic retrieval-and-answer pipeline for knowledge questions, and a controlled tool layer for actions that require live data or business logic.
Keep the capstone split by responsibility. Ingestion should run offline or on a schedule. The API should load an existing index and answer requests. Tools should be small, typed, and safe. Evaluation should run independently from the web server.
This separation makes the project easier to debug. If answers are weak, you can inspect chunks and retrieval. If actions are wrong, you can test tools without involving the LLM. If latency is high, you can profile retrieval, model calls, and streaming separately.
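Because the stages are separate, each one can be timed on its own. The following is a minimal sketch, not part of the original project: `timed` and `handle_request` are hypothetical helpers, and `retrieve` and `generate` stand in for whatever callables wrap the retriever and the model.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Add the wall-clock seconds spent inside the block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + (time.perf_counter() - start)


def handle_request(question, retrieve, generate):
    """Hypothetical request handler that times each stage separately."""
    timings = {}
    with timed("retrieval", timings):
        chunks = retrieve(question)
    with timed("model", timings):
        answer = generate(question, chunks)
    return answer, timings
```

Logging the `timings` dictionary per request is usually enough to tell whether retrieval or the model call dominates latency.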
The assistant must refuse to invent answers when retrieval does not contain enough evidence. It must cite source filenames. It must not call tools unless the user request needs live or operational information. It must never expose hidden prompts, raw API keys, internal traces, or unrelated chunks.
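The refusal rule can be enforced in code before the model is ever called. This is a minimal sketch under stated assumptions: retrieved chunks carry a relevance score in the range 0 to 1 where higher means more relevant, and `Evidence`, `has_enough_evidence`, `answer_or_refuse`, and the threshold values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

REFUSAL = "I do not know from the documents."


@dataclass(frozen=True)
class Evidence:
    source: str
    text: str
    score: float  # assumption: relevance in [0, 1], higher is better


def has_enough_evidence(chunks, min_score=0.35, min_chunks=1):
    """True when at least min_chunks chunks reach the score threshold."""
    strong = [chunk for chunk in chunks if chunk.score >= min_score]
    return len(strong) >= min_chunks


def answer_or_refuse(question, chunks, generate):
    """Call the model only when retrieval produced usable evidence."""
    if not has_enough_evidence(chunks):
        return REFUSAL
    return generate(question, chunks)
```

The thresholds here are placeholders. Suitable values depend on the embedding model and the score type the vector store returns, so they should be tuned against the evaluation set.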
This architecture matters because it changes how the program is written, tested, and debugged. Follow the normal flow first: what the developer writes, what the runtime does, and what result should appear.
Do not stop at syntax. For each component, note the surrounding decision: why it was chosen, what problem it removes, and what would become harder without it.
Create a small structure that can grow without becoming a single messy script.
langchain-support-assistant/
  .env
  requirements.txt
  data/
    docs/
      billing-policy.md
      security.md
      onboarding.md
  storage/
    faiss_index/
  app/
    ingest.py
    rag.py
    tools.py
    server.py
  evals/
    run_eval.py
Use separate LangChain packages so imports stay explicit and easier to upgrade.
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux: source .venv/bin/activate
pip install langchain langchain-core langchain-community langchain-openai langchain-text-splitters faiss-cpu fastapi uvicorn python-dotenv
Ingestion turns documents into searchable chunks. Run it whenever source documents change.
# app/ingest.py
from pathlib import Path

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

load_dotenv()

DOCS_DIR = Path("data/docs")
INDEX_DIR = "storage/faiss_index"


def load_documents():
    docs = []
    for path in DOCS_DIR.glob("*.md"):
        loader = TextLoader(str(path), encoding="utf-8")
        for doc in loader.load():
            doc.metadata["source"] = path.name
            docs.append(doc)
    return docs


def build_index():
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=900,
        chunk_overlap=160,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    chunks = splitter.split_documents(load_documents())
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = FAISS.from_documents(chunks, embeddings)
    vectorstore.save_local(INDEX_DIR)
    print(f"Saved {len(chunks)} chunks to {INDEX_DIR}")


if __name__ == "__main__":
    build_index()
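Since ingestion should run whenever source documents change, a scheduled job needs a cheap way to detect a change. One possible approach, sketched here with the standard library only, is to store a manifest of content hashes next to the index. The function names and the manifest file are assumptions for illustration and are not part of the original project.

```python
import hashlib
import json
from pathlib import Path


def fingerprint_docs(docs_dir):
    """Map each Markdown filename to the SHA-256 hash of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(docs_dir).glob("*.md"))
    }


def needs_rebuild(docs_dir, manifest_path):
    """True when files were added, removed, or edited since the last build."""
    manifest = Path(manifest_path)
    if not manifest.exists():
        return True
    previous = json.loads(manifest.read_text(encoding="utf-8"))
    return previous != fingerprint_docs(docs_dir)


def write_manifest(docs_dir, manifest_path):
    """Record the current fingerprints after a successful index build."""
    Path(manifest_path).write_text(
        json.dumps(fingerprint_docs(docs_dir), indent=2), encoding="utf-8"
    )
```

A scheduled job could call `needs_rebuild` first, run `build_index()` only when it returns `True`, and then call `write_manifest`.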
This chain retrieves context, formats it with source names, and instructs the model to avoid unsupported answers.
# app/rag.py
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()

INDEX_DIR = "storage/faiss_index"


def load_retriever(k=4):
    # Must be the same embedding model that ingest.py used to build the index.
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = FAISS.load_local(
        INDEX_DIR,
        embeddings,
        # The saved index includes a pickle file. Enable this only for
        # index files that your own ingestion job produced.
        allow_dangerous_deserialization=True,
    )
    return vectorstore.as_retriever(search_kwargs={"k": k})


def format_docs(docs):
    # Prefix every chunk with its filename so the model can cite it.
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown-source")
        blocks.append(f"Source: {source}\n{doc.page_content}")
    return "\n\n---\n\n".join(blocks)


prompt = ChatPromptTemplate.from_messages([
    ("system", """You answer using only the provided context.
If the context does not contain the answer, say you do not know from the documents.
Cite source filenames in every factual answer."""),
    ("human", "Question: {question}\n\nContext:\n{context}"),
])


def build_rag_chain():
    retriever = load_retriever()
    model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    return (
        {
            "context": retriever | format_docs,
            "question": RunnablePassthrough(),
        }
        | prompt
        | model
        | StrOutputParser()
    )
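The prompt asks the model to cite source filenames, but a prompt instruction is not a guarantee. A simple post-check can confirm that an answer cites at least one retrieved file and does not mention a file that was never retrieved. This is a sketch using plain string matching. `cited_sources` and `unknown_citations` are hypothetical helpers, and the regular expression assumes sources are `.md` filenames as in this project.

```python
import re


def cited_sources(answer, allowed_sources):
    """Return the retrieved filenames that literally appear in the answer."""
    lowered = answer.lower()
    return [source for source in allowed_sources if source.lower() in lowered]


def unknown_citations(answer, allowed_sources):
    """Return .md filenames mentioned in the answer that were not retrieved."""
    mentioned = set(re.findall(r"[\w\-]+\.md", answer.lower()))
    allowed = {source.lower() for source in allowed_sources}
    return sorted(mentioned - allowed)
```

An answer that returns an empty list from `cited_sources`, or a non-empty list from `unknown_citations`, is a candidate for a retry or a refusal.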
Tools should be narrow and predictable. The API can route all user questions through one service boundary.
# app/server.py
from fastapi import FastAPI
from pydantic import BaseModel, Field

from app.rag import build_rag_chain

app = FastAPI(title="LangChain Support Assistant")
rag_chain = build_rag_chain()


class Question(BaseModel):
    question: str = Field(min_length=3, max_length=1000)


class Answer(BaseModel):
    answer: str


@app.post("/ask", response_model=Answer)
def ask(payload: Question):
    answer = rag_chain.invoke(payload.question)
    return Answer(answer=answer)

# Run:
# uvicorn app.server:app --reload
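The project layout lists `app/tools.py`, but its contents are not shown. As an illustration of what "narrow and predictable" can mean, here is a hedged sketch of one read-only tool. The invoice data, the `INV-1234` ID format, and the names `InvoiceStatus` and `get_invoice_status` are all invented for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical in-memory data standing in for a real billing system.
_INVOICES = {"INV-1001": "paid", "INV-1002": "overdue"}
_INVOICE_ID = re.compile(r"INV-\d{4}")


@dataclass(frozen=True)
class InvoiceStatus:
    invoice_id: str
    status: str


def get_invoice_status(invoice_id: str) -> InvoiceStatus:
    """Read-only lookup: validates the ID format and never mutates data."""
    cleaned = invoice_id.strip().upper()
    if not _INVOICE_ID.fullmatch(cleaned):
        raise ValueError("invoice_id must look like INV-1234")
    return InvoiceStatus(cleaned, _INVOICES.get(cleaned, "not_found"))
```

Because the function is plain Python, it can be unit tested without an LLM. In a LangChain project it could then be wrapped with the `tool` decorator from `langchain_core.tools` so that a model can call it. That wrapping is omitted here to keep the sketch dependency-free.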
Run this after changing prompts, chunking, embedding models, retriever settings, or source documents.
# evals/run_eval.py
# Run from the project root so that the app package is importable:
# python -m evals.run_eval
from app.rag import build_rag_chain

cases = [
    {
        "question": "How long do annual customers have to request a refund?",
        "must_include": ["14 days", "billing-policy.md"],
    },
    {
        "question": "Can the product reset a user's bank password?",
        "must_include": ["do not know"],
    },
]

chain = build_rag_chain()
failures = []

for case in cases:
    answer = chain.invoke(case["question"])
    # Case-insensitive check that every required phrase appears in the answer.
    missing = [
        required
        for required in case["must_include"]
        if required.lower() not in answer.lower()
    ]
    if missing:
        failures.append((case["question"], missing, answer))

if failures:
    for question, missing, answer in failures:
        print(f"\nFAILED: {question}")
        print("Missing:", ", ".join(missing))
        print("Answer:", answer)
    # A non-zero exit code lets CI fail the build.
    raise SystemExit(1)

print("Capstone evaluation passed")
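The evaluation logic itself can be checked without an API key by separating the scoring from the chain call. The sketch below is one way to do that, with the checks factored into functions. `find_missing`, `run_cases`, and `fake_invoke` are hypothetical names, and `fake_invoke` returns canned answers in place of `chain.invoke`.

```python
def find_missing(answer, must_include):
    """Return required phrases that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [phrase for phrase in must_include if phrase.lower() not in lowered]


def run_cases(invoke, cases):
    """Run every case through `invoke` and collect the failures."""
    failures = []
    for case in cases:
        answer = invoke(case["question"])
        missing = find_missing(answer, case["must_include"])
        if missing:
            failures.append((case["question"], missing, answer))
    return failures


def fake_invoke(question):
    """Hypothetical stand-in for chain.invoke; returns canned answers."""
    if "refund" in question.lower():
        return "Annual customers have 14 days (billing-policy.md)."
    return "I do not know from the documents."


cases = [
    {
        "question": "How long do annual customers have to request a refund?",
        "must_include": ["14 days", "billing-policy.md"],
    },
    {
        "question": "Can the product reset a user's bank password?",
        "must_include": ["do not know"],
    },
]

print(run_cases(fake_invoke, cases))  # prints []
```

Passing `chain.invoke` instead of `fake_invoke` runs the same cases against the real chain.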
Mistake: Rebuilding the vector index inside every HTTP request.
Fix: Build indexes offline and load them once when the app starts.
Mistake: Letting the model answer from general knowledge when retrieval is weak.
Fix: Require context-grounded answers and allow “I do not know from the documents.”
Mistake: Exposing broad tools that can do many actions.
Fix: Expose small tools with typed inputs and limited permissions.
Mistake: Memorizing the capstone code without knowing the situation where each part is useful.
Fix: Connect each component to a concrete task, such as answering a billing question from billing-policy.md.
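The first fix above, loading once instead of rebuilding per request, can be sketched with `functools.lru_cache`. Here `expensive_build` is a hypothetical stand-in for a function such as `build_rag_chain()`, and the `build_calls` list exists only to make the single build observable.

```python
from functools import lru_cache

build_calls = []  # records each real build so the caching is visible


def expensive_build():
    """Hypothetical stand-in for loading the index and building the chain."""
    build_calls.append("built")
    return {"index": "loaded"}


@lru_cache(maxsize=1)
def get_chain():
    """Build on the first call, then return the same cached object."""
    return expensive_build()


first = get_chain()
second = get_chain()
print(len(build_calls), first is second)  # prints: 1 True
```

A request handler that calls `get_chain()` pays the load cost only once per process. Building the chain at module import time, as `app/server.py` does, achieves the same effect.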
This capstone is a strong foundation, not a finished deployment. A real deployment should also add authentication, rate limits, structured logging, tracing, secret management, monitoring, and a review process for source documents.
You do not need an agent for every request. Use the deterministic RAG chain for normal knowledge questions. Add agents only for tasks that require choosing among tools or multi-step actions.
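One way to keep the deterministic path as the default is a small router that sends a request to the tool layer only when it clearly asks for live data. The sketch below is an assumption-heavy illustration: the patterns, the `INV-1234` ID format, and `choose_route` are invented, and a real system might use a classifier or model-based routing instead.

```python
import re

# Hypothetical patterns signalling a request for live, account-specific data.
LIVE_DATA_PATTERNS = [
    re.compile(r"\binv-\d{4}\b", re.IGNORECASE),
    re.compile(r"\b(status of|current balance|my invoice)\b", re.IGNORECASE),
]


def choose_route(question):
    """Return "tools" for live-data requests, otherwise the default "rag"."""
    if any(pattern.search(question) for pattern in LIVE_DATA_PATTERNS):
        return "tools"
    return "rag"
```

Because the default is `"rag"`, a question that matches no pattern never reaches a tool, which matches the rule that tools are called only when the request needs live or operational information.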
The most common mistake is memorizing syntax without understanding when the behavior changes or fails.
To retain the material, remember the problem each component solves, then attach the syntax or steps to that problem.