LangChain and LlamaIndex: Building LLM Applications

Authors
  • Jared Chung

Introduction

Building LLM applications from scratch requires implementing chains, memory, document loading, vector storage, and more. LangChain and LlamaIndex are frameworks that provide these building blocks, saving weeks of development time.

But which should you use? They overlap significantly but have different philosophies:

  • LangChain is a general-purpose orchestration framework for any LLM workflow
  • LlamaIndex is focused specifically on connecting LLMs with your data

This guide helps you understand both frameworks and choose the right one for your project.

Understanding the Frameworks

LangChain vs LlamaIndex Comparison

LangChain: The Swiss Army Knife

LangChain's philosophy is flexibility. It provides primitives for:

  • Models: Unified interface to any LLM (OpenAI, Anthropic, local models)
  • Prompts: Template management and composition
  • Chains: Sequences of operations
  • Memory: Conversation history management
  • Agents: LLMs that can use tools and make decisions
  • Callbacks: Logging, streaming, and monitoring

The learning curve is steeper simply because there is more surface area, but you can build almost any LLM workflow with it.

LlamaIndex: The Data Expert

LlamaIndex's philosophy is simplicity for data-centric applications. It excels at:

  • Data Connectors: Load from 160+ sources (Notion, Slack, databases, etc.)
  • Index Structures: Different ways to organize data for retrieval
  • Query Engines: Sophisticated retrieval and response synthesis
  • Node Processing: Transform and filter retrieved content

For RAG applications, LlamaIndex often requires less code.
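
As a small illustration of the node-processing point above, here is a hedged sketch that drops weakly matching chunks with a similarity cutoff (the 0.7 threshold and the "data" directory are just example values):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.postprocessor import SimilarityPostprocessor

# Build an index, then filter out retrieved nodes below a similarity threshold
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("data").load_data())
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = query_engine.query("What is the main topic?")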

When to Use Each

Use Case                    | Recommended | Why
Simple Q&A over documents   | LlamaIndex  | Purpose-built, less code
Complex multi-step agents   | LangChain   | Better agent framework
Chatbot with memory         | LangChain   | Robust memory options
Multiple data sources       | LlamaIndex  | 160+ connectors
Custom LLM workflows        | LangChain   | LCEL composition
Production RAG              | Both        | Use together for full power

Core Concepts

LangChain Expression Language (LCEL)

LCEL is LangChain's declarative way to compose chains using the pipe operator:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create components
prompt = ChatPromptTemplate.from_template("Explain {topic} simply")
model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

# Compose with pipe operator
chain = prompt | model | parser

# Run
result = chain.invoke({"topic": "quantum computing"})

The chain flows left to right: prompt formats the input → model generates → parser extracts text.

Why LCEL?

  • Automatic streaming support
  • Parallel execution where possible
  • Built-in retry logic
  • Easy debugging with intermediate values
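
For instance, parallel branches and token streaming fall out of the same composition model. A brief sketch (the prompt texts are just placeholders):

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

# Two branches over the same input run concurrently
branches = RunnableParallel(
    summary=ChatPromptTemplate.from_template("Summarize {topic} in one sentence") | model | parser,
    analogy=ChatPromptTemplate.from_template("Give an analogy for {topic}") | model | parser,
)
results = branches.invoke({"topic": "quantum computing"})

# Any composed chain can also stream token by token
chain = ChatPromptTemplate.from_template("Explain {topic} simply") | model | parser
for chunk in chain.stream({"topic": "quantum computing"}):
    print(chunk, end="", flush=True)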

LlamaIndex Query Engine

LlamaIndex's high-level equivalent is the query engine, which bundles retrieval and response synthesis behind a single object:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load and index documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create query engine
query_engine = index.as_query_engine()

# Query
response = query_engine.query("What is the main topic?")

LlamaIndex's approach is more opinionated—it handles more automatically but gives less control.
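
You can still override the defaults when you need to. For example (the values below are illustrative):

# Pull more context and change how the answer is synthesized
query_engine = index.as_query_engine(
    similarity_top_k=3,
    response_mode="tree_summarize",
)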

Building a RAG Application

Let's compare implementing the same RAG system in both frameworks.

LangChain RAG

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Load documents
loader = PyPDFLoader("document.pdf")
docs = loader.load()

# 2. Split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3. Create vector store
vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# 4. Create chain
template = """Answer based on this context:
{context}

Question: {question}"""

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(model="gpt-4o-mini")

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
)

# 5. Query
response = chain.invoke("What are the key findings?")

LlamaIndex RAG

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# 1. Configure (optional)
Settings.llm = OpenAI(model="gpt-4o-mini")

# 2. Load and index (handles chunking automatically)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 3. Create query engine
query_engine = index.as_query_engine(similarity_top_k=5)

# 4. Query
response = query_engine.query("What are the key findings?")

Comparison:

  • LlamaIndex: ~10 lines, automatic chunking
  • LangChain: ~25 lines, explicit control over each step

For simple RAG, LlamaIndex is faster to implement. For customization (custom chunking, hybrid search, reranking), LangChain gives more control.
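
As one example of that extra control, the retrieved chunks can be reformatted before they reach the prompt. A sketch building on the LangChain RAG chain above (reusing its retriever, prompt, and model):

def format_docs(docs):
    """Number each chunk and surface its page metadata for easier citation."""
    return "\n\n".join(
        f"[{i + 1}] (page {doc.metadata.get('page', '?')}) {doc.page_content}"
        for i, doc in enumerate(docs)
    )

# Plain functions are coerced into Runnables when piped in LCEL
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
)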

Key Features Deep Dive

Agents: LangChain's Strength

Agents are LLMs that can decide which tools to use and in what order:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Define tools
@tool
def search_database(query: str) -> str:
    """Search the product database."""
    return f"Found 3 products matching '{query}'"

@tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 72°F, sunny"

# Create agent
llm = ChatOpenAI(model="gpt-4o-mini")
tools = [search_database, get_weather]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    MessagesPlaceholder("agent_scratchpad"),
])

agent = create_openai_functions_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

# The agent decides which tools to use
result = executor.invoke({"input": "What products match 'laptop'?"})

The agent autonomously:

  1. Understands the user's intent
  2. Decides to call search_database
  3. Formats and returns the response
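
A single request can also trigger multiple tool calls. Running the executor with verbose=True prints each intermediate decision, which helps when debugging:

executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke({
    "input": "Find products matching 'laptop' and check the weather in Seattle"
})
print(result["output"])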

Data Connectors: LlamaIndex's Strength

LlamaIndex has connectors for virtually any data source:

from llama_index.readers.notion import NotionPageReader
from llama_index.readers.slack import SlackReader
from llama_index.readers.database import DatabaseReader

# From Notion
notion_loader = NotionPageReader(integration_token="secret_...")
notion_docs = notion_loader.load_data(page_ids=["page_id"])

# From Slack
slack_loader = SlackReader(slack_token="xoxb-...")
slack_docs = slack_loader.load_data(channel_ids=["C12345"])

# From SQL database
db_loader = DatabaseReader(uri="postgresql://...")
db_docs = db_loader.load_data(query="SELECT * FROM articles")

# Combine all sources into one index
from llama_index.core import VectorStoreIndex
all_docs = notion_docs + slack_docs + db_docs
index = VectorStoreIndex.from_documents(all_docs)
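
Because every document keeps its own metadata, the combined index can be queried once and the answer traced back to whichever source it came from (the question below is just an example):

query_engine = index.as_query_engine()
response = query_engine.query("What did the team decide about the launch?")

# Inspect which sources contributed to the answer
for node in response.source_nodes:
    print(node.node.metadata, node.score)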

Memory: Conversation History

Both frameworks support conversation memory, but they expose it differently.

LangChain:

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain_with_memory = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)
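
The wrapped chain is then invoked with a session_id so each user gets an isolated history (this assumes the underlying prompt includes a MessagesPlaceholder("history")):

response = chain_with_memory.invoke(
    {"input": "My name is Sam."},
    config={"configurable": {"session_id": "user-42"}},
)
followup = chain_with_memory.invoke(
    {"input": "What's my name?"},  # history for "user-42" is injected automatically
    config={"configurable": {"session_id": "user-42"}},
)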

LlamaIndex:

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(memory=memory)

# Maintains context across calls
response1 = chat_engine.chat("What is the document about?")
response2 = chat_engine.chat("Can you elaborate on that?")  # Remembers context

Choosing the Right Framework

Start with LlamaIndex if:

  1. Your primary use case is RAG - Purpose-built with sensible defaults
  2. You're connecting multiple data sources - Extensive connector library
  3. You want to move fast - Less configuration needed
  4. The team is new to LLM apps - Gentler learning curve

Start with LangChain if:

  1. You need agents with tools - Best-in-class agent framework
  2. You have complex workflows - LCEL enables sophisticated composition
  3. You need fine-grained control - Explicit control over every step
  4. You're building beyond RAG - More general-purpose

Use Both Together

For production applications, the frameworks complement each other:

# Use LlamaIndex for data loading and indexing
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Use LlamaIndex's data layer inside LangChain's orchestration by wrapping
# the LlamaIndex retriever as a plain function (LCEL coerces it to a Runnable)
from langchain_core.runnables import RunnableLambda

li_retriever = index.as_retriever(similarity_top_k=5)

def retrieve_context(question: str) -> str:
    nodes = li_retriever.retrieve(question)
    return "\n\n".join(n.node.get_content() for n in nodes)

context_runnable = RunnableLambda(retrieve_context)
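
A minimal sketch of closing the loop, reusing the context_runnable defined above with the same prompt pattern as the LangChain RAG example:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    "Answer based on this context:\n{context}\n\nQuestion: {question}"
)

combined_chain = (
    {"context": context_runnable, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
)

print(combined_chain.invoke("What are the key findings?").content)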

Common Patterns

Hybrid Search

Both frameworks support combining vector and keyword search:

# LangChain
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

bm25 = BM25Retriever.from_documents(docs)
vector = vectorstore.as_retriever()
hybrid = EnsembleRetriever(retrievers=[bm25, vector], weights=[0.5, 0.5])
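
A rough LlamaIndex equivalent fuses a vector retriever with BM25 (this sketch assumes the separate llama-index-retrievers-bm25 package is installed):

# LlamaIndex
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.retrievers.bm25 import BM25Retriever

bm25 = BM25Retriever.from_defaults(docstore=index.docstore, similarity_top_k=5)
vector = index.as_retriever(similarity_top_k=5)

fusion = QueryFusionRetriever(
    [vector, bm25],
    similarity_top_k=5,
    num_queries=1,  # disable automatic query rewriting
    mode="reciprocal_rerank",
)
hybrid_engine = RetrieverQueryEngine.from_args(fusion)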

Structured Output

Enforce output schemas:

# LangChain
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]
    sentiment: str

structured_llm = llm.with_structured_output(Summary)
result = structured_llm.invoke("Summarize: ...")  # Returns Summary object
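
Because the result is a validated Pydantic instance, its fields are regular attributes:

print(result.title)
print(result.key_points)
print(result.sentiment)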

Streaming

Stream responses for better UX:

# LangChain
for chunk in chain.stream({"question": "Explain AI"}):
    print(chunk, end="", flush=True)

# LlamaIndex (the query engine must be created with streaming=True)
streaming_engine = index.as_query_engine(streaming=True)
response = streaming_engine.query("Explain AI")
for token in response.response_gen:
    print(token, end="", flush=True)

Conclusion

Both LangChain and LlamaIndex are excellent frameworks that will accelerate your LLM development:

LlamaIndex gets you to a working RAG system faster with less code. Start here if document Q&A is your primary use case.

LangChain gives you more control and flexibility for complex applications. Start here if you need agents, complex chains, or highly customized workflows.

For production applications, consider using both—LlamaIndex for data handling, LangChain for orchestration. They integrate well together.

The best framework is the one that matches your use case and team experience. Start with the simpler option (usually LlamaIndex for RAG) and add complexity only when needed.
