LangChain interview questions covering chains, agents, prompts, retrievers, tools, memory, LangSmith, and production RAG apps.
LangChain is a framework for building applications around language models. It helps developers compose prompts, models, retrievers, tools, memory, agents, and observability into reusable workflows. Example: a support bot can use LangChain to retrieve policy documents, format a prompt, call an LLM, parse the answer, and trace the run.
LangChain is used to organize common LLM application patterns such as RAG, tool calling, prompt templates, agents, and output parsing. Instead of writing every step manually, developers can compose building blocks. Example: Retriever -> PromptTemplate -> ChatModel -> OutputParser can become one pipeline.
LCEL stands for LangChain Expression Language. It lets you compose chains using runnable components and pipe-style syntax. Example: prompt | model | parser creates a simple chain that formats input, calls the model, and parses the result.
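The pipe syntax can be illustrated without LangChain itself. The sketch below is a plain-Python imitation, not LangChain's implementation: the `Step` class and the three toy components are hypothetical stand-ins that show how overloading `|` lets small pieces compose into one pipeline.

```python
class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Three toy components standing in for prompt, model, and parser.
prompt = Step(lambda inputs: f"Answer briefly: {inputs['question']}")
fake_model = Step(lambda text: f"MODEL OUTPUT for <{text}>  ")  # no real LLM call
parser = Step(lambda text: text.strip())

chain = prompt | fake_model | parser
result = chain.invoke({"question": "What is LCEL?"})
```

Each stage only needs to accept the previous stage's output, which is why a prompt, a model, and a parser can be swapped independently.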
A Runnable is a component that can be invoked, streamed, batched, or composed with other components. Prompts, models, retrievers, parsers, and custom functions can behave as runnables. Example: a retriever runnable can accept a question and return relevant documents.
A chain is a sequence or graph of steps that work together to complete a task. Example: take a user question, retrieve documents, build a prompt, call an LLM, and parse the output. Chains make workflows easier to reuse and test.
A chain follows a predefined workflow, while an agent can decide which tools to use and what steps to take. Example: a RAG chain always retrieves then answers; an agent may decide to search docs, call a calculator, ask a follow-up question, or stop.
PromptTemplate is used to create reusable prompts with variables. Example: "Answer the question using this context: {context}. Question: {question}" can be filled at runtime with retrieved documents and a user question.
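As a rough illustration of the idea, the sketch below fills the template from the example using only the standard library. `SimplePromptTemplate` is a hypothetical class written for this example, not LangChain's `PromptTemplate`; it only shows variable substitution and a check for missing variables.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical, minimal imitation of a prompt template."""

    def __init__(self, template):
        self.template = template
        # Collect the {placeholders} named in the template.
        self.variables = {name for _, name, _, _ in Formatter().parse(template) if name}

    def format(self, **values):
        missing = self.variables - set(values)
        if missing:
            raise KeyError(f"missing prompt variables: {sorted(missing)}")
        return self.template.format(**values)


template = SimplePromptTemplate(
    "Answer the question using this context: {context}. Question: {question}"
)
filled = template.format(
    context="Refunds are accepted within 30 days",
    question="What is the refund window?",
)
```

Failing on a missing variable at format time is a design choice here: it surfaces a wiring mistake before any model call is made.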
ChatPromptTemplate builds multi-message prompts for chat models, including system, human, and assistant messages. Example: a system message can set "You are a policy assistant", while the human message contains the user question and context.
An output parser converts model text into a desired structure such as JSON, a list, a Pydantic object, or plain string. Example: after asking a model to classify a ticket, a parser can ensure the output has category, priority, and reason fields.
Output parsers are important because LLM output can be inconsistent. Parsers make responses safer for downstream systems. Example: if an API expects JSON, a parser can validate fields and fail fast instead of passing malformed text.
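A minimal sketch of the fail-fast idea, using only the standard library. The function name `parse_ticket` and the field list are assumptions taken from the ticket-classification example above; a real project would more likely use a LangChain parser or a schema library.

```python
import json

REQUIRED_FIELDS = ("category", "priority", "reason")


def parse_ticket(raw_text):
    """Parse model text as JSON and fail fast if required fields are missing."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"model output is missing fields: {missing}")
    return data


good = parse_ticket('{"category": "billing", "priority": "high", "reason": "double charge"}')
```

Calling `parse_ticket("Sure! Here is the answer")` raises `ValueError` instead of handing malformed text to the downstream API.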
A retriever returns relevant documents for a query. It is commonly used in RAG systems. Example: a retriever receives "What is the refund window?" and returns the most relevant chunks from the refund policy.
A document loader reads content from sources such as PDFs, web pages, text files, CSV files, Notion, GitHub, or databases and converts it into Document objects. Example: a PDF loader can extract HR policy pages for indexing.
A text splitter breaks long documents into smaller chunks for embedding and retrieval. Example: RecursiveCharacterTextSplitter can split policy text by sections, paragraphs, and characters while trying to preserve meaning.
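The recursive idea can be sketched in a few lines. This is a simplified illustration, not the code of `RecursiveCharacterTextSplitter`: the function `split_recursive` and its default separators are assumptions, and it omits chunk overlap. It tries the coarsest separator first and only falls back to finer ones for pieces that are still too long.

```python
def split_recursive(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard character cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: split it with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


policy = (
    "Refunds are accepted within 30 days.\n\n"
    "Items must be unused.\n\n"
    "Shipping fees are not refundable."
)
chunks = split_recursive(policy, chunk_size=40)
```

With a 40-character limit, each paragraph of the sample policy ends up in its own chunk because no two paragraphs fit together.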
A vector store stores document embeddings and supports similarity search. LangChain integrates with stores such as FAISS, Chroma, Pinecone, Weaviate, Milvus, Qdrant, and pgvector. Example: store support article embeddings and retrieve the closest chunks for a customer question.
An embedding model converts text into vectors that capture semantic meaning. LangChain uses embedding models when indexing documents and when converting user queries for vector search. Example: embed "refund after delivery" and find policy chunks about post-delivery returns.
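To make the indexing and query steps concrete, here is a toy sketch. It assumes a stand-in "embedding" based on word counts (`toy_embed`), which captures word overlap rather than semantic meaning, and a hypothetical in-memory `ToyVectorStore`; real systems use a trained embedding model and one of the vector stores listed above.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: embeds on add, ranks by cosine on search."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, toy_embed(text)))

    def search(self, query, k=1):
        query_vector = toy_embed(query)  # the query is embedded the same way as documents
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Returns and refund requests are allowed after delivery")
store.add("Reset your password from the account settings page")
top = store.search("refund after delivery", k=1)
```

The important point is that documents and queries pass through the same embedding function, so their vectors are comparable.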
A simple RAG chain loads documents, splits them, embeds chunks, stores vectors, retrieves relevant chunks, inserts them into a prompt, calls a chat model, and parses the answer. Example workflow: loader -> splitter -> embedding model -> vector store -> retriever -> prompt -> model -> parser.
RetrievalQA is a legacy LangChain chain in which a retriever supplies documents and an LLM answers the question using those documents. Newer LangChain code usually builds the same pattern with LCEL, which gives more control over prompt format, citations, and parsing.
ConversationalRetrievalChain is a pattern for RAG with chat history. It can reformulate follow-up questions before retrieval. Example: if the user asks "What about next month?" after discussing leave, the chain rewrites it into a standalone leave-policy query.
Question rewriting converts a conversational or vague question into a standalone search query. Example: "Can I use it later?" becomes "Can an employee carry forward unused annual leave?" This improves retriever accuracy.
To add source citations to RAG answers, store metadata such as source URL, page number, and document title with each chunk. Pass retrieved documents and metadata to the prompt, then instruct the model to cite source IDs. Example: "Use [S1], [S2] after claims and do not cite sources you did not use."
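One way to wire this up is sketched below. The document shape (a dict with `text` and `metadata`) and the helper `format_context` are assumptions made for illustration; the sample chunks are made-up data.

```python
def format_context(documents):
    """Number each retrieved chunk as [S1], [S2], ... and keep its source metadata."""
    lines = []
    for index, doc in enumerate(documents, start=1):
        meta = doc["metadata"]
        lines.append(f"[S{index}] {meta['title']} page {meta['page']}: {doc['text']}")
    return "\n".join(lines)


retrieved = [
    {"text": "Refunds are accepted within 30 days.", "metadata": {"title": "refund policy", "page": 3}},
    {"text": "Shipping fees are not refundable.", "metadata": {"title": "shipping policy", "page": 1}},
]

prompt = (
    "Use [S1], [S2] after claims and do not cite sources you did not use.\n\n"
    + format_context(retrieved)
    + "\n\nQuestion: What is the refund window?"
)
```

Because the IDs are assigned when the context is built, the application can map any `[S1]` in the answer back to the original URL or page.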
A tool is a callable function an agent or model can use to interact with external systems. Example: a shipping-status tool accepts order_id and returns current delivery state. Tools should have clear names, descriptions, schemas, and permission checks.
A good tool description tells the model when to use the tool, what inputs it requires, and what it returns. Example: "Use get_order_status only when the user provides a valid order ID. It returns status, ETA, and carrier. Do not use it for refund questions."
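The sketch below pairs that description with a function and an input check. The `Tool` dataclass is a hypothetical record for illustration, not LangChain's tool class, and the 6-digit order ID format and returned values are invented placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Hypothetical tool record: what the model sees plus the function to run."""
    name: str
    description: str
    func: Callable[[str], dict]


def get_order_status(order_id: str) -> dict:
    # Validate input before touching any backend; the ID format is an assumption.
    if not (order_id.isdigit() and len(order_id) == 6):
        raise ValueError("order_id must be a 6-digit string")
    # Placeholder data standing in for a real order-system lookup.
    return {"status": "in transit", "eta": "2 days", "carrier": "ExampleCarrier"}


order_tool = Tool(
    name="get_order_status",
    description=(
        "Use get_order_status only when the user provides a valid order ID. "
        "It returns status, ETA, and carrier. Do not use it for refund questions."
    ),
    func=get_order_status,
)

status = order_tool.func("123456")
```

The description guides the model's choice, while the validation inside the function protects the backend when the model chooses badly.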
A LangChain agent uses an LLM to choose actions, call tools, observe results, and continue until it completes a task. Example: an agent can search documentation, call a calculator, summarize findings, and ask the user for missing information.
Agents can call the wrong tool, loop too long, expose data, make expensive calls, or take unsafe actions. Production agents need limits, timeouts, tool permissions, human approval for risky actions, and detailed tracing.
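Two of those safeguards, a step limit and a tool allowlist, can be sketched as a plain loop. Everything here is hypothetical: `run_agent` is not a LangChain API, and the `decide` functions are scripted stand-ins for an LLM's decisions.

```python
def run_agent(decide, tools, question, max_steps=3):
    """Run a tool-using loop that stops at a final answer or at max_steps."""
    observations = []
    for _ in range(max_steps):
        action = decide(question, observations)
        if action["type"] == "final":
            return action["answer"]
        if action["tool"] not in tools:
            # Only allowlisted tools may run.
            observations.append(f"error: unknown tool {action['tool']}")
            continue
        observations.append(tools[action["tool"]](action["input"]))
    return "Stopped: step limit reached without a final answer."


def scripted_decide(question, observations):
    # Stand-in for an LLM decision: call the calculator once, then answer.
    if not observations:
        return {"type": "tool", "tool": "calculator", "input": (19, 3)}
    return {"type": "final", "answer": f"The total is {observations[-1]}."}


def looping_decide(question, observations):
    # A misbehaving policy that never finishes; the step cap contains it.
    return {"type": "tool", "tool": "calculator", "input": (1, 1)}


tools = {"calculator": lambda pair: pair[0] * pair[1]}
answer = run_agent(scripted_decide, tools, "What do 19 items at 3 each cost?")
capped = run_agent(looping_decide, tools, "loop forever")
```

The second call shows why the cap matters: without `max_steps`, a policy that keeps choosing tools would run, and bill, indefinitely.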
Memory stores conversation or state across turns. Example: a chatbot can remember the user is asking about order #123 during the session. Memory should be scoped, privacy-aware, and not treated as a permanent source of truth unless intentionally persisted.
Conversation buffer memory stores previous chat messages directly. It is simple but can grow large and exceed the context window. Example: after 50 turns, the full history may become too expensive or too long for the model.
Summary memory compresses previous conversation into a shorter summary. Example: instead of storing every message, it stores "User is comparing refund policy for domestic orders and has order #123." It saves tokens but may lose details.
LangSmith is an observability and evaluation platform for LangChain applications. It helps trace runs, inspect prompts, compare outputs, evaluate datasets, debug errors, and monitor production behavior.
Tracing shows each step of a chain or agent run: inputs, retrieved documents, prompts, model calls, tool calls, outputs, latency, and errors. Example: if a RAG answer is wrong, tracing helps reveal whether retrieval failed or the model ignored good context.
To debug a wrong RAG answer, check the original question, rewritten query, retrieved documents, metadata filters, final prompt, model output, and parser result. Example: if the correct document was not retrieved, fix chunking or retrieval before changing the LLM prompt.
Streaming sends model output chunks as they are generated. It improves perceived speed in chat apps. Example: a writing assistant can show text word by word instead of waiting for the entire response.
Batching processes multiple inputs together or concurrently. Example: classify 100 support tickets using the same chain. Batching can improve throughput but must respect rate limits and error handling.
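A minimal sketch of concurrent batching with a concurrency cap and per-item error capture, using only the standard library. `classify` is a toy stand-in for a chain call, and `run_batch` and `max_concurrency` are names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(ticket):
    """Stand-in for a chain call; raises on empty input to mimic a failure."""
    if not ticket.strip():
        raise ValueError("empty ticket")
    return "billing" if "charge" in ticket.lower() else "general"


def safe_classify(ticket):
    # Capture per-item errors so one bad input does not fail the whole batch.
    try:
        return {"ok": True, "label": classify(ticket)}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}


def run_batch(tickets, max_concurrency=4):
    # max_concurrency caps parallel calls, which helps respect rate limits.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(safe_classify, tickets))  # results keep input order


results = run_batch(["I was charged twice", "", "How do I log in?"])
```

Returning a result record per input, rather than raising, lets the caller retry only the failed items.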
Async execution lets chains or tools wait on I/O, such as model or API calls, without blocking other work. Example: a RAG app can retrieve documents and call independent tools concurrently to reduce latency.
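The concurrency benefit can be sketched with `asyncio.gather`. Both coroutines below are hypothetical stand-ins that simulate I/O with a short sleep; no real retriever or order system is called.

```python
import asyncio


async def retrieve_documents(question):
    await asyncio.sleep(0.05)  # simulated I/O latency
    return ["Refunds are accepted within 30 days."]


async def get_order_status(order_id):
    await asyncio.sleep(0.05)  # simulated I/O latency
    return {"order_id": order_id, "status": "delivered"}


async def gather_context(question, order_id):
    # The two calls do not depend on each other, so their waits can overlap.
    docs, order = await asyncio.gather(
        retrieve_documents(question),
        get_order_status(order_id),
    )
    return {"docs": docs, "order": order}


context = asyncio.run(gather_context("Can I still get a refund?", "123"))
```

Run sequentially, the two simulated calls would take roughly the sum of their delays; gathered, they take roughly the longer of the two.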
To handle rate limits, use retries with backoff, queueing, caching, batching, smaller models for simple tasks, and request throttling. Example: if the model provider returns rate-limit errors, retry after a delay and avoid launching unlimited parallel calls.
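Retry with exponential backoff might look like the sketch below. `RateLimitError` is a hypothetical exception standing in for whatever error a given provider's client raises, and the `sleep` parameter exists only so the example can record delays instead of actually waiting.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error type standing in for a provider's rate-limit response."""


def call_with_backoff(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


calls = {"count": 0}


def flaky_model_call():
    # Fails twice with a rate-limit error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


delays = []
result = call_with_backoff(flaky_model_call, sleep=delays.append)  # record delays instead of sleeping
```

Capping `max_attempts` keeps a persistent outage from turning into an endless retry loop.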
Retry logic reruns a failed step, often with delay or modified instructions. Example: if a model returns invalid JSON, retry with a stricter prompt or use an output-fixing parser, but cap retries to prevent loops.
Fallback strategy uses an alternate model or path when the primary call fails or is too expensive. Example: use a smaller model for simple classification, but fall back to a larger model when confidence is low.
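A sketch of the confidence-based fallback from the example. Both "models" are toy functions, and the assumption that a model reports a usable `confidence` score is itself a simplification; the 0.8 threshold is arbitrary.

```python
def small_model(text):
    """Stand-in for a cheap model that reports a confidence score."""
    if "refund" in text.lower():
        return {"label": "refund", "confidence": 0.95}
    return {"label": "other", "confidence": 0.40}


def large_model(text):
    """Stand-in for a more capable, more expensive model."""
    label = "shipping" if "package" in text.lower() else "other"
    return {"label": label, "confidence": 0.90}


def classify_with_fallback(text, threshold=0.8):
    try:
        result = small_model(text)
        if result["confidence"] >= threshold:
            return {**result, "model": "small"}
    except Exception:
        pass  # treat a failed primary call like a low-confidence result
    return {**large_model(text), "model": "large"}


easy = classify_with_fallback("I want a refund")
hard = classify_with_fallback("My package never arrived")
```

Recording which model produced each answer makes it possible to measure how often the fallback path is taken and what it costs.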
To validate structured output, use an output parser or schema validator to check fields, types, and allowed values. Example: parse a ticket triage response into category, priority, and reason; reject or retry if category is not one of the approved labels.
A Pydantic output parser validates model output against a Pydantic schema. Example: define a CustomerIssue model with category, priority, and summary, then parse the LLM output into that model before saving it.
Secure tools with input validation, authorization, least privilege, allowlisted actions, audit logs, and human approval for sensitive operations. Example: a refund tool should verify user permission and amount limits before issuing a refund.
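The refund example can be sketched as follows. The permission string `refunds:write`, the `MAX_AUTO_REFUND` limit, and the data shapes are all assumptions made for illustration, not part of any real system.

```python
MAX_AUTO_REFUND = 100.00  # assumed policy limit, for illustration only
audit_log = []


def issue_refund(user, order, amount):
    """Check authorization and limits before performing a refund."""
    if "refunds:write" not in user["permissions"]:
        raise PermissionError("user is not allowed to issue refunds")
    if amount <= 0 or amount > order["total"]:
        raise ValueError("refund amount must be positive and within the order total")
    # Large refunds are queued for a human instead of being executed directly.
    needs_approval = amount > MAX_AUTO_REFUND
    status = "pending_human_approval" if needs_approval else "refunded"
    audit_log.append({"user": user["id"], "order": order["id"], "amount": amount, "status": status})
    return status


agent_user = {"id": "agent-7", "permissions": {"refunds:write"}}
order = {"id": "123", "total": 250.00}

small = issue_refund(agent_user, order, 40.00)
large = issue_refund(agent_user, order, 200.00)
```

The checks live inside the tool rather than in the prompt, so they hold even if the model is manipulated into requesting an unsafe refund.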
To defend against prompt injection, separate trusted instructions from user and retrieved content, restrict tool permissions, validate tool inputs, avoid exposing secrets, and monitor suspicious traces. Example: a retrieved web page saying "ignore the system prompt" must be treated as data, not instruction.
When evaluating a LangChain application, measure task success, retrieval relevance, answer faithfulness, citation accuracy, format compliance, tool correctness, latency, cost, and user feedback. Example: use a LangSmith dataset of 100 support questions and compare prompt versions.
An evaluation dataset contains test inputs, expected outputs, labels, or grading criteria. Example: store common HR questions, expected policy sections, and correct answers to test a RAG chain after prompt or retriever changes.
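A tiny regression check over such a dataset might look like this. The dataset rows, the `toy_chain` stand-in, and the substring grader are invented for illustration; real evaluations typically use richer graders and a platform such as LangSmith.

```python
dataset = [
    {"question": "What is the refund window?", "expected": "30 days"},
    {"question": "How many annual leave days do employees get?", "expected": "20 days"},
    {"question": "Are shipping fees refundable?", "expected": "no"},
]


def toy_chain(question):
    """Stand-in for the RAG chain under test; one answer is deliberately wrong."""
    answers = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "How many annual leave days do employees get?": "Employees get 15 days.",
        "Are shipping fees refundable?": "No, shipping fees are not refundable.",
    }
    return answers[question]


def evaluate(chain, examples):
    # Simple grader: the expected text must appear in the answer (case-insensitive).
    failures = []
    for example in examples:
        answer = chain(example["question"])
        if example["expected"].lower() not in answer.lower():
            failures.append(example["question"])
    accuracy = 1 - len(failures) / len(examples)
    return accuracy, failures


accuracy, failures = evaluate(toy_chain, dataset)
```

Rerunning the same dataset after each prompt or retriever change turns "it seems better" into a comparable number plus a list of regressions.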
To monitor cost, track token usage, model calls, tool calls, retries, retrieved context length, and latency. Example: a prompt that adds 20 retrieved chunks may dramatically increase cost, so logs should show prompt size and output tokens.
Reduce latency by shortening prompts, lowering top-k retrieval, caching, streaming output, running independent steps in parallel, using smaller models, limiting agent loops, and avoiding unnecessary reranking.
Common mistakes include overusing agents, weak tool descriptions, no output validation, poor chunking, missing access control, no tracing, no evaluation dataset, and changing prompts without regression tests.
Avoid LangChain when the task is a simple one-off model call, when the abstraction adds more complexity than value, or when strict low-level control is required. Example: a basic "summarize this paragraph" endpoint may not need LangChain.
Metadata filters restrict retrieval to documents that match fields such as user role, region, product, source, or document type. Example: a support bot can retrieve only chunks where product = "Billing" and region = "India" before answering a pricing-policy question.
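The filtering step can be sketched independently of any vector store. The chunk shape and the helper `filter_chunks` are assumptions, and the chunk texts are made-up sample data; real stores usually accept the filter as part of the search call so that filtering and ranking happen together.

```python
chunks = [
    {"text": "Billing plans in India start at the basic tier.", "metadata": {"product": "Billing", "region": "India"}},
    {"text": "Billing plans in the US include annual discounts.", "metadata": {"product": "Billing", "region": "US"}},
    {"text": "Shipping in India takes 3 to 5 days.", "metadata": {"product": "Shipping", "region": "India"}},
]


def filter_chunks(items, **required):
    """Keep only chunks whose metadata matches every required field."""
    return [
        item for item in items
        if all(item["metadata"].get(key) == value for key, value in required.items())
    ]


candidates = filter_chunks(chunks, product="Billing", region="India")
```

Only the surviving candidates would then be ranked by similarity, which also keeps documents the user is not allowed to see out of the prompt.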
To keep citations accurate, keep source metadata with retrieved documents, pass source IDs into the prompt, and ask the model to cite only documents it used. Example: format context as [S1] refund policy page 3 and instruct: "Add [S1] after any claim supported by that source."
For production LangChain applications, start with a clear workflow, keep chains simple, validate inputs and outputs, enforce permissions, trace every run, build an evaluation dataset, monitor cost and latency, add fallback behavior, and test prompts before deployment.
When describing a LangChain project in an interview, explain the business problem, chain or agent design, prompt template, retriever or tools used, output parser, evaluation method, tracing setup, and production lessons. Example: "I built a policy RAG bot using loaders, splitters, pgvector retriever, source citations, JSON parser, and LangSmith evals."