Tutorials Logic, IN info@tutorialslogic.com

LangChain Capstone: Build a Production RAG Agent with Tools

This capstone combines the whole LangChain tutorial into one realistic developer project: a documentation assistant that answers questions from private files, cites sources, can call a small business tool, and exposes an API endpoint for a frontend or internal dashboard.

The goal is not to copy one script. The goal is to understand the boundaries: ingestion prepares searchable knowledge, retrieval finds relevant chunks, the answer chain grounds model output, tools handle live operations, and evaluation protects the application when you change prompts, models, chunk sizes, or retrievers.

Mental Model

A production RAG agent is two systems working together: a deterministic retrieval-and-answer pipeline for knowledge questions, and a controlled tool layer for actions that require live data or business logic.
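
One way to picture the two systems is a small router that decides which path a request takes. The sketch below is only an illustration: `route_request`, `LIVE_DATA_HINTS`, and the keyword matching are assumptions made for this example, and a real agent might let the model choose the path instead.

```python
from typing import Callable

# Hypothetical phrases suggesting the request needs live data, not documents.
LIVE_DATA_HINTS = ("order status", "invoice", "my account")


def route_request(
    question: str,
    answer_from_docs: Callable[[str], str],
    call_tool: Callable[[str], str],
) -> dict:
    """Send live-data requests to the tool path and everything else to RAG."""
    lowered = question.lower()
    if any(hint in lowered for hint in LIVE_DATA_HINTS):
        return {"path": "tool", "answer": call_tool(question)}
    return {"path": "rag", "answer": answer_from_docs(question)}


if __name__ == "__main__":
    # Stand-ins for the real RAG chain and tool layer.
    fake_rag = lambda q: f"[docs] {q}"
    fake_tool = lambda q: f"[tool] {q}"
    print(route_request("What is the refund policy?", fake_rag, fake_tool))
    print(route_request("What is my order status?", fake_rag, fake_tool))
```

Because both paths are passed in as plain callables, each one can be replaced with a stub and tested without a model call.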

Project Architecture

Keep the capstone split by responsibility. Ingestion should run offline or on a schedule. The API should load an existing index and answer requests. Tools should be small, typed, and safe. Evaluation should run independently from the web server.

This separation makes the project easier to debug. If answers are weak, you can inspect chunks and retrieval. If actions are wrong, you can test tools without involving the LLM. If latency is high, you can profile retrieval, model calls, and streaming separately.

  • data/docs: markdown, text, policy, support, or product files.
  • app/ingest.py: load files, split chunks, embed text, and save the vector index.
  • app/rag.py: retrieve context and answer with citations.
  • app/tools.py: safe business functions exposed to the model.
  • app/server.py: FastAPI endpoint used by the website or dashboard.
  • evals/run_eval.py: regression checks for grounded answers.

Production Rules for the Capstone

The assistant must refuse to invent answers when retrieval does not contain enough evidence. It must cite source filenames. It must not call tools unless the user request needs live or operational information. It must never expose hidden prompts, raw API keys, internal traces, or unrelated chunks.

  • Use a low-temperature model for factual support workflows.
  • Format retrieved chunks with source names before passing them to the model.
  • Return structured metadata such as sources, latency, and tool usage to the API caller.
  • Create test questions where the correct response is “I do not know from the provided documents.”
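
The rules above can be sketched as a thin wrapper around retrieval and generation. This is a minimal illustration rather than the tutorial's implementation: `AssistantResponse`, `answer_with_metadata`, and the `retrieve` and `generate` callables are hypothetical names, and the refusal is triggered here only when retrieval returns nothing at all.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

REFUSAL = "I do not know from the provided documents."


@dataclass
class AssistantResponse:
    answer: str
    sources: List[str] = field(default_factory=list)
    latency_ms: float = 0.0
    tool_used: Optional[str] = None


def answer_with_metadata(
    question: str,
    retrieve: Callable[[str], List[Tuple[str, str]]],
    generate: Callable[[str, str], str],
) -> AssistantResponse:
    """retrieve returns (source, text) pairs; generate gets question and context."""
    started = time.perf_counter()
    chunks = retrieve(question)
    if not chunks:
        # No evidence retrieved: refuse instead of letting the model guess.
        answer, sources = REFUSAL, []
    else:
        # Label every chunk with its source before it reaches the model.
        context = "\n\n".join(f"Source: {src}\n{text}" for src, text in chunks)
        answer = generate(question, context)
        sources = sorted({src for src, _ in chunks})
    latency_ms = (time.perf_counter() - started) * 1000
    return AssistantResponse(answer=answer, sources=sources, latency_ms=latency_ms)
```

Returning sources and latency as separate fields lets the API caller display citations and monitor performance without parsing the answer text.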

Recommended Project Layout

Create a small structure that can grow without becoming a single messy script.

langchain-support-assistant/
  .env
  requirements.txt
  data/
    docs/
      billing-policy.md
      security.md
      onboarding.md
  storage/
    faiss_index/
  app/
    ingest.py
    rag.py
    tools.py
    server.py
  evals/
    run_eval.py
  • Keep generated indexes in storage and human-authored content in data/docs.
  • Commit code and sample docs, but keep secrets and large generated artifacts out of git.

Install Dependencies

Install the split LangChain packages so imports stay explicit and each package can be upgraded on its own.

python -m venv .venv
.venv\Scripts\activate
pip install langchain langchain-core langchain-community langchain-openai langchain-text-splitters faiss-cpu fastapi uvicorn python-dotenv
  • On macOS or Linux, activate with source .venv/bin/activate.
  • Set OPENAI_API_KEY in .env or your deployment secret manager.
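
A missing key usually surfaces later as a confusing authentication error, so it can help to fail early. The helper below is a small sketch; `require_env` is a hypothetical name and not part of LangChain or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value or stop with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Report only the variable name, never the secret itself.
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

A script could call `require_env("OPENAI_API_KEY")` right after `load_dotenv()` so that ingestion and the server both stop immediately when the key is absent.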

Offline Ingestion Pipeline

Ingestion turns documents into searchable chunks. Run it whenever source documents change.

# app/ingest.py
from pathlib import Path

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

load_dotenv()

DOCS_DIR = Path("data/docs")
INDEX_DIR = "storage/faiss_index"

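def load_documents():
    """Load every Markdown file in DOCS_DIR and tag it with its filename."""
    docs = []
    # Only *.md files are picked up; sorting keeps the chunk order reproducible.
    for path in sorted(DOCS_DIR.glob("*.md")):
        loader = TextLoader(str(path), encoding="utf-8")
        for doc in loader.load():
            # Store the bare filename so answers can cite it later.
            doc.metadata["source"] = path.name
            docs.append(doc)
    return docs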

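def build_index():
    # Prefer paragraph, line, and sentence boundaries before splitting mid-word.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=900,
        chunk_overlap=160,
        separators=["\n\n", "\n", ". ", " ", ""],
    )
    chunks = splitter.split_documents(load_documents())
    if not chunks:
        # An empty corpus cannot be indexed; stop with a clear message.
        raise SystemExit(f"No documents found in {DOCS_DIR}")
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = FAISS.from_documents(chunks, embeddings)
    vectorstore.save_local(INDEX_DIR)
    print(f"Saved {len(chunks)} chunks to {INDEX_DIR}")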

if __name__ == "__main__":
    build_index()
  • Chunk size is a quality knob. Smaller chunks improve precision; larger chunks preserve context.
  • Always preserve source metadata so final answers can cite where evidence came from.
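
To see how chunk size and overlap interact, the following sketch uses a plain fixed-width sliding window. It is a simplification for intuition only: `sliding_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` behaves differently because it first tries to split on the configured separators.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be between 0 and chunk_size - 1")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
    # With the tutorial's settings, each window advances 900 - 160 = 740 characters.
    print(len(sliding_chunks("x" * 2000, chunk_size=900, chunk_overlap=160)))
```

A larger overlap repeats more text across neighbouring chunks, which protects sentences that straddle a boundary but increases the number of embeddings to compute and store.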

Grounded RAG Chain with Citations

This chain retrieves context, formats it with source names, and instructs the model to avoid unsupported answers.

# app/rag.py
from dotenv import load_dotenv
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

load_dotenv()

INDEX_DIR = "storage/faiss_index"

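def load_retriever(k=4):
    # Must match the embedding model used in app/ingest.py.
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    # The saved index includes a pickle file, so enable this flag only for
    # index files produced by your own ingestion step.
    vectorstore = FAISS.load_local(
        INDEX_DIR,
        embeddings,
        allow_dangerous_deserialization=True,
    )
    # Return the k most similar chunks for each question.
    return vectorstore.as_retriever(search_kwargs={"k": k})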

def format_docs(docs):
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown-source")
        blocks.append(f"Source: {source}\n{doc.page_content}")
    return "\n\n---\n\n".join(blocks)

prompt = ChatPromptTemplate.from_messages([
    ("system", """You answer using only the provided context.
If the context does not contain the answer, say you do not know from the documents.
Cite source filenames in every factual answer."""),
    ("human", "Question: {question}\n\nContext:\n{context}"),
])

def build_rag_chain():
    retriever = load_retriever()
    model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    return (
        {
            "context": retriever | format_docs,
            "question": RunnablePassthrough(),
        }
        | prompt
        | model
        | StrOutputParser()
    )
  • The chain is deterministic enough to test because retrieval and prompt structure are explicit.
  • The model still needs guardrails; context-only instructions reduce hallucination but do not replace evaluation.
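
Because the formatting step is plain Python, it can be checked without an API key or a vector index. The sketch below repeats `format_docs` from the listing and feeds it stand-in objects; `FakeDoc` is a hypothetical test double that only mimics the `page_content` and `metadata` attributes used by the function.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Test double exposing the two attributes format_docs reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Same logic as in app/rag.py: prefix each chunk with its source name.
    blocks = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown-source")
        blocks.append(f"Source: {source}\n{doc.page_content}")
    return "\n\n---\n\n".join(blocks)


if __name__ == "__main__":
    docs = [
        FakeDoc("Refunds are described here.", {"source": "billing-policy.md"}),
        FakeDoc("Chunk without metadata."),
    ]
    print(format_docs(docs))
```

The second stand-in shows the fallback label, which makes chunks that lost their metadata during ingestion easy to spot.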

Add a Safe Tool and API Endpoint

Tools should be narrow and predictable. The API can route all user questions through one service boundary.

# app/server.py
from fastapi import FastAPI
from pydantic import BaseModel, Field

from app.rag import build_rag_chain

app = FastAPI(title="LangChain Support Assistant")
# Build the chain once at import time so every request reuses the loaded index.
rag_chain = build_rag_chain()

class Question(BaseModel):
    question: str = Field(min_length=3, max_length=1000)

class Answer(BaseModel):
    answer: str

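@app.post("/ask", response_model=Answer)
def ask(payload: Question):
    # A plain `def` endpoint runs in FastAPI's threadpool, so this blocking
    # call does not stall the event loop.
    answer = rag_chain.invoke(payload.question)
    return Answer(answer=answer)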

# Run:
# uvicorn app.server:app --reload
  • Load chains once at startup instead of rebuilding indexes on every request.
  • Keep request validation at the web boundary; never pass unbounded user input straight into expensive workflows.
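
The listing above covers the endpoint; a narrow tool for app/tools.py might look like the sketch below. Everything in it is an assumption made for illustration: the `get_order_status` function, the `ORD-1001` ID format, and the in-memory `_ORDERS` table stand in for whatever business system a real project would query.

```python
# app/tools.py (illustrative sketch with hypothetical data)
import re

# Hypothetical read-only data standing in for a real order system.
_ORDERS = {"ORD-1001": "shipped", "ORD-1002": "processing"}

# Accept only IDs shaped like ORD-1234; anything else is rejected up front.
_ORDER_ID = re.compile(r"ORD-\d{4}")


def get_order_status(order_id: str) -> str:
    """Look up the status of one order by its ID, for example ORD-1001."""
    order_id = order_id.strip().upper()
    if not _ORDER_ID.fullmatch(order_id):
        return "Invalid order ID. Expected a value like ORD-1001."
    status = _ORDERS.get(order_id)
    if status is None:
        return f"No order found for {order_id}."
    return f"Order {order_id} is {status}."


if __name__ == "__main__":
    print(get_order_status("ord-1001"))
    print(get_order_status("ORD-9999"))
    print(get_order_status("1001; drop table"))
```

The function returns a short message for every failure case instead of raising, so the model always receives something it can relay. In a LangChain project, a typed function like this is commonly wrapped with the `tool` decorator from `langchain_core.tools`, which uses the type hints and docstring to describe the tool to the model.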

Regression Evaluation for the Capstone

Run this after changing prompts, chunking, embedding models, retriever settings, or source documents.

# evals/run_eval.py
# Run from the project root so `app` is importable: python -m evals.run_eval
from app.rag import build_rag_chain

cases = [
    {
        "question": "How long do annual customers have to request a refund?",
        "must_include": ["14 days", "billing-policy.md"],
    },
    {
        "question": "Can the product reset a user's bank password?",
        "must_include": ["do not know"],
    },
]

chain = build_rag_chain()
failures = []

for case in cases:
    answer = chain.invoke(case["question"])
    missing = [
        required
        for required in case["must_include"]
        if required.lower() not in answer.lower()
    ]
    if missing:
        failures.append((case["question"], missing, answer))

if failures:
    for question, missing, answer in failures:
        print(f"\nFAILED: {question}")
        print("Missing:", ", ".join(missing))
        print("Answer:", answer)
    raise SystemExit(1)

print("Capstone evaluation passed")
  • These checks are not perfect, but they catch obvious regressions quickly.
  • Add real support questions over time so the test set becomes a map of production risk.
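
The phrase check itself can be exercised without calling a model, which is useful when extending the test set. The sketch below pulls the comparison into two hypothetical helpers, `missing_phrases` and `run_cases`, and drives them with a stub in place of the real chain; the stub answers are invented for the example.

```python
from typing import Callable, Dict, List, Tuple


def missing_phrases(answer: str, must_include: List[str]) -> List[str]:
    """Return the required phrases that do not appear in the answer."""
    lowered = answer.lower()
    return [phrase for phrase in must_include if phrase.lower() not in lowered]


def run_cases(
    cases: List[Dict], invoke: Callable[[str], str]
) -> List[Tuple[str, List[str]]]:
    """Run every case through invoke and collect (question, missing) failures."""
    failures = []
    for case in cases:
        missing = missing_phrases(invoke(case["question"]), case["must_include"])
        if missing:
            failures.append((case["question"], missing))
    return failures


if __name__ == "__main__":
    # Stub standing in for chain.invoke; the answers are made up.
    canned = {
        "q1": "See billing-policy.md for the refund window.",
        "q2": "The product can do that.",
    }
    cases = [
        {"question": "q1", "must_include": ["billing-policy.md"]},
        {"question": "q2", "must_include": ["do not know"]},
    ]
    print(run_cases(cases, canned.__getitem__))
```

Passing `chain.invoke` as the `invoke` argument would run the same cases against the real chain.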

Key Takeaways

  • A complete LangChain app separates ingestion, retrieval, answer generation, tools, API serving, and evaluation.
  • Source metadata is not optional in serious RAG. Without citations, users cannot verify the answer.
  • Evaluation should include both answerable and unanswerable questions.
  • Agents and tools belong behind tight boundaries with input validation and clear failure behavior.

Common Mistakes to Avoid

WRONG: Rebuild the vector index inside every HTTP request.
RIGHT: Build indexes offline and load them once when the app starts.
Indexing on request creates high latency and unnecessary API cost.

WRONG: Let the model answer from general knowledge when retrieval is weak.
RIGHT: Require context-grounded answers and allow “I do not know from the documents.”
A useful refusal is better than a confident unsupported answer.

WRONG: Expose broad tools that can do many actions.
RIGHT: Expose small tools with typed inputs and limited permissions.
The safer the tool boundary, the easier the agent is to trust.

Practice Tasks

  • Add a /health endpoint that checks whether the vector index can load.
  • Add source links to the API response instead of embedding citations only in text.
  • Create 25 evaluation questions from your own documents, including five unanswerable questions.
  • Add streaming to /ask so long answers appear incrementally in the UI.

Frequently Asked Questions

Is this capstone ready for production?

It is a strong foundation. A real deployment should also add authentication, rate limits, structured logging, tracing, secret management, monitoring, and a review process for source documents.

Should every question go through an agent?

No. Use the deterministic RAG chain for normal knowledge questions. Add agents only for tasks that require choosing among tools or multi-step actions.
