LangChain Conversational RAG: Chat History, Query Rewriting and Source-Grounded Answers

Conversational RAG is harder than single-turn RAG because users ask follow-up questions like “what about enterprise plans?” or “explain step two again.” The retriever needs a standalone search query, while the answer model needs both retrieved evidence and conversation context.

A reliable design separates query rewriting from answering. The first model call rewrites the user question for search. The second model call answers using retrieved documents and visible conversation history.

This page walks through that design in LangChain with short examples and beginner-focused checks.

Read the concept first, then trace each example line by line. The important habit is to connect every rule to behavior you can observe instead of memorizing only its name.

Mental Model

Conversational RAG has two contexts: chat history for understanding the user, and retrieved documents for factual grounding.

Why Query Rewriting Matters

A retriever matches the query text against document content and has no access to the conversation, so it cannot resolve pronouns and vague follow-ups the way a chat model can. If the user asks “does it support that?”, the retriever needs a rewritten query such as “Does the product support SAML SSO for enterprise customers?”

Rewrite only the search query; do not answer during rewriting.
Keep the original user message for the final response.
Limit history length so old turns do not pollute retrieval.
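
The last rule, limiting history length, can be sketched with plain Python. This is a minimal illustration, not a LangChain API: the `(role, content)` tuple shape, the `trim_history` name, and the default of six messages are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Assumed message shape for this sketch: (role, content).
Message = Tuple[str, str]


def trim_history(history: List[Message], max_messages: int = 6) -> List[Message]:
    """Keep only the most recent messages, starting on a human turn."""
    if max_messages <= 0:
        return []
    recent = history[-max_messages:]
    # Drop leading assistant replies whose question was cut off by the slice.
    while recent and recent[0][0] != "human":
        recent = recent[1:]
    return recent


history = [
    ("human", "What plans do you offer?"),
    ("ai", "Starter, Team, and Enterprise."),
    ("human", "Does Team include SSO?"),
    ("ai", "No, SSO is Enterprise only."),
    ("human", "What does Enterprise cost?"),
    ("ai", "Pricing is by quote."),
]

# Only the last complete exchange survives a three-message window.
print(trim_history(history, max_messages=3))
```

Starting the window on a human turn avoids handing the rewriter an assistant reply with no question attached. Trimming by message count is the simplest policy; trimming by token count or summarizing older turns are alternatives when conversations grow long.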

Conversation State

Use chat history when it changes the meaning of the current question. Do not stuff the entire conversation into every prompt forever. Summarize old turns or store only relevant state when conversations become long.

Persist history by session ID.
Store human and assistant messages, not hidden internal tool traces.
Handle topic changes by allowing retrieval to ignore irrelevant history.
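
The first two rules above can be sketched as a small in-memory store. This is an illustrative sketch under stated assumptions: the class name `InMemoryHistoryStore`, the role names `"human"` and `"ai"`, and the tuple message shape are hypothetical choices, and a production app would likely use a database or cache instead of a dict.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # assumed shape: (role, content)

# Only user-visible roles are stored; tool traces and other internals are rejected.
VISIBLE_ROLES = {"human", "ai"}


class InMemoryHistoryStore:
    """Hypothetical per-session chat history store for illustration."""

    def __init__(self) -> None:
        self._sessions: Dict[str, List[Message]] = defaultdict(list)

    def append(self, session_id: str, role: str, content: str) -> bool:
        """Store a message; return False if the role is not user-visible."""
        if role not in VISIBLE_ROLES:
            return False
        self._sessions[session_id].append((role, content))
        return True

    def get(self, session_id: str) -> List[Message]:
        """Return a copy so callers cannot mutate stored history."""
        return list(self._sessions.get(session_id, []))


store = InMemoryHistoryStore()
store.append("session-a", "human", "Do you support SSO?")
store.append("session-a", "tool", "vector search trace")  # ignored
store.append("session-a", "ai", "Yes, on the Enterprise plan.")
print(store.get("session-a"))
print(store.get("session-b"))  # unknown session yields an empty history
```

Keying by session ID keeps one user's follow-ups from leaking into another user's retrieval queries.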

Detailed Explanation of LangChain

LangChain becomes much easier when you separate the concept from the tool syntax. First identify the problem being solved, then identify the data or resource being changed, and finally identify the proof that the change worked.

In LangChain, this topic should be studied through prompt inputs, model calls, parser behavior, retrieved context, tool boundaries, and validation. Those points explain not only how to use the feature, but also why it fails when the wrong assumption is made.

A good way to learn this page is to read the normal path once, run or trace the example, then intentionally change one input to observe the different result. That one change teaches more than memorizing several definitions.

Write down the goal of the chain before touching code or configuration.
Identify the normal case, edge case, and failure case.
Trace what changes before and after the operation.
Use a command, output, compiler message, log, metric, or table to verify the result.
Record the mistake that would confuse a beginner and the exact fix.

Beginner-Friendly Walkthrough for LangChain

Start with a tiny project scenario. For example, imagine one user action, one request, one resource, one function call, or one batch of data. Keep the scenario small enough that every step can be explained without skipping details.

Next, describe the movement of information. Where does the input start? Which rule or component handles it? What result should appear? If the result is wrong, where would you inspect first?

Finally, compare two outcomes. The correct outcome proves that you understand the main rule. The incorrect outcome teaches the symptom, which is what you will recognize later during debugging or interviews.

Normal path: valid input produces the expected result.
Boundary path: the smallest, largest, empty, or unusual input still behaves predictably.
Error path: a realistic mistake creates a visible symptom.
Fix path: one focused correction removes the symptom without changing unrelated code.

Standalone Question Rewriter

The rewriter turns follow-up questions into search-friendly questions.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

rewrite_prompt = ChatPromptTemplate.from_messages([
    ("system", "Rewrite the latest user question as a standalone search query. Do not answer it."),
    MessagesPlaceholder("chat_history"),
    ("human", "{question}"),
])

rewriter = rewrite_prompt | ChatOpenAI(model="gpt-4o-mini", temperature=0) | StrOutputParser()

The rewriter should produce a query, not a final answer.
This improves retrieval for pronouns, ellipsis, and follow-up language.
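
One design choice worth considering around the rewriter is when to skip it. The sketch below is an assumption-laden illustration, not part of LangChain: `standalone_query` and `fake_rewrite` are hypothetical names, and the rewriter is passed in as a plain callable so the logic can be exercised without a model call.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # assumed shape: (role, content)


def standalone_query(
    question: str,
    chat_history: List[Message],
    rewrite: Callable[[str, List[Message]], str],
) -> str:
    """Return a search query, calling the rewriter only when history exists."""
    if not chat_history:
        # First turn: nothing to resolve, so search with the question as typed.
        return question.strip()
    rewritten = rewrite(question, chat_history).strip()
    # Fall back to the original question if the rewriter returns nothing usable.
    return rewritten or question.strip()


def fake_rewrite(question: str, chat_history: List[Message]) -> str:
    """Hypothetical stand-in for the model-backed rewriter."""
    return "Does the product support SAML SSO for enterprise customers?"


history = [
    ("human", "Tell me about enterprise SSO options."),
    ("ai", "Enterprise plans offer SAML-based SSO."),
]
print(standalone_query("does it support that?", history, fake_rewrite))
print(standalone_query("What is SAML SSO?", [], fake_rewrite))
```

With the chain defined above, the `rewrite` argument could be a wrapper such as `lambda q, h: rewriter.invoke({"question": q, "chat_history": h})`. Skipping the call on the first turn saves one model request per conversation.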

Answer with History and Retrieved Context

The answer prompt uses history for continuity and documents for factual claims.

answer_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer using only the context.
If the context is insufficient, say you do not know from the documents.
Cite source filenames."""),
    MessagesPlaceholder("chat_history"),
    ("human", "Question: {question}\n\nContext:\n{context}"),
])

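def answer_question(question, chat_history, retriever, model):
    """Rewrite the question, retrieve documents, then answer from them."""
    # Step 1: turn the follow-up into a standalone search query.
    rewritten = rewriter.invoke({"question": question, "chat_history": chat_history})
    # Step 2: search with the rewritten query, not the raw follow-up.
    docs = retriever.invoke(rewritten)
    # Step 3: label each chunk with its source so the model can cite it.
    context = "\n\n".join(
        f"Source: {d.metadata.get('source', 'unknown')}\n{d.page_content}"
        for d in docs
    )
    # Step 4: answer the original question using history and retrieved context.
    return (answer_prompt | model | StrOutputParser()).invoke({
        "question": question,
        "chat_history": chat_history,
        "context": context,
    })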

The final answer receives the original question so tone and intent are preserved.
Retrieved context is the evidence boundary for factual claims.
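
Because the context is the evidence boundary, it helps to decide explicitly what the model sees when retrieval returns nothing. The following is a minimal sketch under assumptions: `Doc` is a hypothetical stand-in that mirrors only the `page_content` and `metadata` attributes used above, and `format_context`, `NO_CONTEXT`, and the `max_docs` limit are illustrative names and values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in with the two attributes the answer step reads."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


# Explicit marker so the prompt's "say you do not know" rule can apply.
NO_CONTEXT = "No relevant documents were retrieved."


def format_context(docs: List[Doc], max_docs: int = 4) -> str:
    """Build the context string, handling empty results and missing sources."""
    if not docs:
        return NO_CONTEXT
    blocks = []
    for doc in docs[:max_docs]:
        source = doc.metadata.get("source", "unknown")
        blocks.append(f"Source: {source}\n{doc.page_content}")
    return "\n\n".join(blocks)


docs = [
    Doc("Enterprise plans include SAML SSO.", {"source": "pricing.md"}),
    Doc("SSO setup requires an identity provider."),
]
print(format_context(docs))
print(format_context([]))
```

Passing an explicit "nothing retrieved" marker, rather than an empty string, gives the answer model a clear signal to decline instead of answering from chat history alone.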

Runnable LangChain example

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template('Explain LangChain with one example and one warning.')

# The prompt emits a ChatPromptValue; to_string() flattens it to plain text.
# The lambda is a placeholder for a chat model so the chain runs without an API key.
chain = prompt | (lambda prompt_value: prompt_value.to_string()) | StrOutputParser()

print(chain.invoke({}))

# In a real app, replace the lambda with a chat model and keep the parser step explicit.

LangChain validation example

def check_answer(answer: str) -> list[str]:
    issues = []
    if 'source' not in answer.lower():
        issues.append('Add sources or retrieved context.')
    if len(answer) < 120:
        issues.append('Add a fuller explanation for LangChain.')
    return issues

print(check_answer('Short answer without source'))
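
The check above only looks for the word "source". A somewhat stricter variant, sketched here under assumptions, compares the answer against the filenames that were actually retrieved. The function names `cited_sources` and `check_citations` are hypothetical, and simple substring matching is an assumption that would miss paraphrased or reformatted citations.

```python
from typing import List


def cited_sources(answer: str, retrieved_sources: List[str]) -> List[str]:
    """Return the retrieved filenames that literally appear in the answer."""
    lowered = answer.lower()
    return [name for name in retrieved_sources if name.lower() in lowered]


def check_citations(answer: str, retrieved_sources: List[str]) -> List[str]:
    """Flag answers that cite none of the documents they were given."""
    issues = []
    if not cited_sources(answer, retrieved_sources):
        issues.append("Answer cites none of the retrieved sources.")
    return issues


retrieved = ["pricing.md", "sso-guide.md"]
answer = "Enterprise plans include SAML SSO (source: pricing.md)."
print(cited_sources(answer, retrieved))
print(check_citations("SSO is supported.", retrieved))
```

A check like this can run after every answer in tests or logging, so ungrounded responses are visible rather than silently returned.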

Key Takeaways

Rewrite follow-up questions before retrieval.
Use chat history for meaning, not as a replacement for documents.
Cite documents in final answers and refuse unsupported answers.
Explain the purpose of each LangChain component in your own words.
Run or trace a small LangChain example.
Test a normal case, a boundary case, and a broken case.
Verify the result with visible output, logs, metrics, compiler feedback, or a table.
Summarize the common mistake and the correction.

Common Mistakes to Avoid

WRONG Retrieve with vague follow-up text such as “what about that?”

RIGHT Rewrite follow-ups into standalone retrieval queries.

Retrievers need explicit search terms.

WRONG Put unlimited chat history into every prompt.

RIGHT Keep recent useful turns and summarize or trim old context.

Long irrelevant history increases cost and confusion.

WRONG Learning LangChain only as a term.

RIGHT Learn it through a working example, a boundary case, and a failure case.

Concept plus behavior is easier to remember than definition alone.

WRONG Skipping verification.

RIGHT Always check output, state, logs, metrics, query results, or compiler feedback.

Verification turns confidence into evidence.

WRONG Changing many things at once while debugging.

RIGHT Change one setting, input, or line, then inspect the result.

Small changes reveal the real cause.

Practice Tasks

Build a session-based chat endpoint that stores history in memory.
Add a topic-change test where old history should not affect retrieval.
Create five follow-up questions and verify rewritten search queries manually.
Create a small demo that shows the rewrite, retrieve, and answer steps clearly.
Add one edge case and write the expected result before running it.
Break the demo intentionally and document the error symptom.
Fix the broken version and explain why the fix works.

Frequently Asked Questions

Does conversational RAG require memory?

It requires some session history, but that can be short-term chat history rather than long-term user memory.

Should source citations include every retrieved document?

No. Cite only sources actually used to answer the question.

What is the fastest way to understand LangChain?

Start with one tiny example, trace every step, then compare it with a broken version.

What should I verify after using LangChain?

Verify the visible result: output, state, log entry, metric, query result, compiler feedback, or rendered behavior.

Why does LangChain feel confusing at first?

It often combines vocabulary with behavior. The confusion drops when you trace the input, rule, result, and failure path.
