Vector Databases Explained: Architecture and Selection Guide

Authors

  • Jared Chung

Introduction

Vector databases have become essential infrastructure for AI applications. They power semantic search, recommendation systems, and Retrieval Augmented Generation (RAG). But what exactly makes them different from traditional databases, and how do you choose the right one?

This guide explains the architecture behind vector databases, the algorithms that make them fast, and provides practical guidance for selecting the right option for your use case.

Why Vector Databases?

The Semantic Gap

Traditional databases excel at exact matching:

SELECT * FROM products WHERE name = 'iPhone 15';

But they can't handle queries like "smartphones with good cameras" because they don't understand that "good cameras" relates to megapixels, aperture, and low-light performance.

Semantic search solves this by:

  1. Converting text to vectors that capture meaning
  2. Finding similar vectors to match concepts, not just keywords
  3. Scaling efficiently to millions of documents

What is a Vector Embedding?

An embedding is a numerical representation of data where similar items have similar numbers. The core insight is that meaning can be captured in geometry:

Text                             | Vector (simplified)
"I love Python programming"      | [0.8, 0.3, 0.9, 0.1]
"Python is my favorite language" | [0.75, 0.28, 0.85, 0.15]
"The weather is nice today"      | [0.1, 0.9, 0.2, 0.7]

The first two texts have similar vectors because they express similar meanings. The third is geometrically distant because it's semantically unrelated.

Modern embedding models like all-MiniLM-L6-v2 produce 384-dimensional vectors. Larger models like text-embedding-3-large produce 3072 dimensions. More dimensions capture finer semantic distinctions but require more storage and compute.
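
A quick way to see these dimensions is to encode a sentence directly; a minimal sketch using the sentence-transformers library (assuming it is installed and the model can be downloaded):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vector = model.encode("I love Python programming")
print(vector.shape)  # (384,) -- one 384-dimensional vector for the input text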

How Vector Databases Work

[Figure: vector database architecture diagram]

The Architecture

A vector database has two main flows:

Ingestion Flow

  1. Documents arrive (text, images, or any data)
  2. Embedding model converts them to vectors
  3. Vector index organizes vectors for efficient search
  4. Storage layer persists vectors and original documents

Query Flow

  1. Query arrives ("What is machine learning?")
  2. Same embedding model converts query to vector
  3. ANN search finds k nearest vectors in the index
  4. Results returned with similarity scores and original documents

The key insight: the same embedding model must be used for both documents and queries. Different models produce incompatible vector spaces.
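
To make the two flows concrete, here is a minimal in-memory sketch (not a real database) using sentence-transformers and numpy; note that the same model object embeds both the documents and the query:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

# Ingestion flow: embed documents and keep the vectors in a simple matrix "index"
docs = ["Machine learning is a subset of AI", "The weather is nice today"]
doc_vectors = model.encode(docs, normalize_embeddings=True)

# Query flow: embed the query with the SAME model, then rank by cosine similarity
query_vector = model.encode("What is machine learning?", normalize_embeddings=True)
scores = doc_vectors @ query_vector            # cosine similarity (vectors are normalized)
best = np.argsort(scores)[::-1][:1]            # index of the top match
print([(docs[i], float(scores[i])) for i in best])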

The Nearest Neighbor Problem

Given a query vector, find the k most similar vectors in the database.

Brute Force Approach:

  • Compare query with every vector in database
  • Time complexity: O(n) where n = number of vectors
  • For 1 million 384-dimensional vectors: roughly 384 million floating-point operations per query
  • Result: Too slow for production

The Solution: Approximate Nearest Neighbor (ANN)

ANN algorithms trade perfect accuracy for speed. Instead of guaranteeing exact matches, they find vectors that are, with high probability, among the true nearest neighbors.
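
To get a feel for the brute-force cost, here is a rough numpy sketch with random data (timings vary widely by hardware; the point is the linear scan):

import time
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 384                            # try 1_000_000 to see the linear growth
db = rng.random((n, d), dtype=np.float32)      # stand-in for the stored vectors
query = rng.random(d, dtype=np.float32)

start = time.perf_counter()
scores = db @ query                            # compares the query against every vector: O(n * d)
top10 = np.argpartition(-scores, 10)[:10]      # indices of the 10 highest scores
print(f"brute-force scan over {n:,} vectors: {time.perf_counter() - start:.4f}s")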

ANN Index Algorithms

The choice of index algorithm determines your speed/accuracy/memory tradeoffs.

HNSW (Hierarchical Navigable Small World)

HNSW is the most popular algorithm. It builds a multi-layer graph where each layer contains fewer nodes with longer-range connections.

How it works:

  1. Start search at the top layer (few nodes, long connections)
  2. Greedily navigate toward the query vector
  3. Drop to the next layer and continue
  4. Repeat until reaching the bottom layer with all nodes

Think of it like navigating a map: start with highways (top layer), then main roads, then local streets.

Characteristics:

Aspect            | Rating
Query Speed       | Excellent
Recall (accuracy) | Very Good
Memory Usage      | High
Update Handling   | Good
Best For          | General purpose, real-time apps

Key Parameters:

  • M: Connections per node (higher = better recall, more memory)
  • ef_construction: Build-time search width (higher = better index quality)
  • ef_search: Query-time search width (higher = better recall, slower)
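
To show where these parameters appear in practice, here is a minimal sketch using the hnswlib library (assumed installed; the parameter values are illustrative, not tuned):

import hnswlib
import numpy as np

dim, num_elements = 384, 10_000
data = np.random.rand(num_elements, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, M=16, ef_construction=200)  # build-time knobs
index.add_items(data, np.arange(num_elements))

index.set_ef(50)                                     # ef_search: query-time recall/speed knob
labels, distances = index.knn_query(data[:1], k=5)   # approximate 5 nearest neighbors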

IVF (Inverted File Index)

IVF clusters vectors and only searches relevant clusters.

How it works:

  1. During indexing, group vectors into clusters (using k-means)
  2. For each query, identify the most promising clusters
  3. Search only within those clusters

Characteristics:

Aspect          | Rating
Query Speed     | Good
Recall          | Good (depends on nprobe)
Memory Usage    | Lower than HNSW
Update Handling | Poor (requires retraining)
Best For        | Static datasets, memory-constrained

Key Parameters:

  • nlist: Number of clusters
  • nprobe: Clusters to search (higher = better recall, slower)
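
A minimal IVF sketch with these two knobs, using the FAISS library (assumed installed; values are illustrative):

import faiss
import numpy as np

d = 384
xb = np.random.rand(100_000, d).astype("float32")   # vectors to index

nlist = 256                                         # number of k-means clusters
quantizer = faiss.IndexFlatL2(d)                    # assigns vectors to their nearest cluster
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                     # learns the cluster centroids
index.add(xb)

index.nprobe = 16                                   # clusters to visit per query
distances, ids = index.search(xb[:1], 5)            # approximate 5 nearest neighbors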

Product Quantization (PQ)

PQ compresses vectors by splitting them into subvectors and quantizing each to a codebook ID.

How it works:

  1. Split 384-dimensional vector into 8 subvectors of 48 dimensions each
  2. Learn a codebook of 256 centroids for each subvector
  3. Replace each subvector with its nearest centroid ID (1 byte)
  4. Result: 384 floats (1536 bytes) → 8 bytes (192x compression)

Characteristics:

Aspect       | Rating
Query Speed  | Good
Recall       | Lower (lossy compression)
Memory Usage | Excellent (32x+ reduction)
Best For     | Billions of vectors, cost-sensitive

Often combined with IVF as IVF-PQ for memory-efficient large-scale search.
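
Continuing the FAISS sketch above (same d and xb), IVF and PQ can be combined into a single IVF-PQ index; the parameters below mirror the description: 8 sub-vectors, each coded with 8 bits:

# nlist=256 clusters, m=8 sub-vectors, 8 bits (1 byte) per sub-vector code
coarse = faiss.IndexFlatL2(d)
index_pq = faiss.IndexIVFPQ(coarse, d, 256, 8, 8)
index_pq.train(xb)                                  # learns cluster centroids and PQ codebooks
index_pq.add(xb)
index_pq.nprobe = 16
distances, ids = index_pq.search(xb[:1], 5)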

Distance Metrics

How similarity is calculated:

Metric         | Use Case
Cosine         | Normalized embeddings (most common)
Euclidean (L2) | When vector magnitude matters
Dot Product    | Recommendation systems with magnitude as relevance

For text embeddings, cosine similarity is almost always the right choice because it measures directional similarity regardless of vector length.
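
The three metrics are easy to compare directly on the toy vectors from earlier; a small numpy sketch:

import numpy as np

a = np.array([0.8, 0.3, 0.9, 0.1])
b = np.array([0.75, 0.28, 0.85, 0.15])

dot = a @ b                                              # dot product
l2 = np.linalg.norm(a - b)                               # Euclidean (L2) distance
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine similarity

print(f"dot={dot:.3f}  L2={l2:.3f}  cosine={cosine:.3f}")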

Vector Database Comparison

The Options

Database | Type              | Best For
ChromaDB | Embedded/Local    | Prototyping, small datasets
Pinecone | Managed Cloud     | Production SaaS, minimal ops
Weaviate | Self-hosted/Cloud | Hybrid search, GraphQL
Qdrant   | Self-hosted/Cloud | Performance, complex filtering
Milvus   | Self-hosted       | Enterprise scale, GPU support

Decision Framework

Choose ChromaDB when:

  • Prototyping or learning vector databases
  • Dataset under 1 million vectors
  • Want simplest possible setup
  • Embedded in application is acceptable

ChromaDB runs in-process with your Python code—no separate server needed.

Choose Pinecone when:

  • Production SaaS application
  • Don't want to manage infrastructure
  • Need enterprise features (SSO, audit logs)
  • Willing to pay for managed service
  • Serverless scaling is important

Pinecone handles sharding, replication, and backups automatically.

Choose Weaviate when:

  • Need hybrid search (vector + keyword BM25)
  • GraphQL API preferred
  • Multi-modal data (text + images)
  • Want self-hosted with cloud option
  • Need schema-based data modeling

Weaviate's hybrid search combines the precision of keyword matching with the semantic understanding of vectors.

Choose Qdrant when:

  • Performance is critical
  • Need advanced filtering on metadata
  • Prefer Rust-based reliability
  • Self-hosted infrastructure is acceptable
  • Large-scale filtering workloads

Qdrant's payload indexing enables efficient filtered search at scale.

Practical Implementation

ChromaDB Quick Start

ChromaDB is the fastest way to get started:

import chromadb

# Create client (persistent storage)
client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

# Add documents (embedding handled automatically)
collection.add(
    documents=[
        "Machine learning is a subset of AI",
        "Deep learning uses neural networks",
        "NLP processes human language",
    ],
    metadatas=[{"topic": "ml"}, {"topic": "dl"}, {"topic": "nlp"}],
    ids=["doc1", "doc2", "doc3"]
)

# Query
results = collection.query(
    query_texts=["What is deep learning?"],
    n_results=2
)

Pinecone Production Setup

For production workloads:

from pinecone import Pinecone, ServerlessSpec
from sentence_transformers import SentenceTransformer

# Initialize
pc = Pinecone(api_key="your-api-key")
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create serverless index
pc.create_index(
    name="semantic-search",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("semantic-search")

# Upsert with embeddings
documents = ["Python for data science", "JavaScript for web dev"]
vectors = [
    {"id": str(i), "values": model.encode(doc).tolist(), "metadata": {"text": doc}}
    for i, doc in enumerate(documents)
]
index.upsert(vectors=vectors)

# Query
query_vector = model.encode("machine learning language").tolist()
results = index.query(vector=query_vector, top_k=3, include_metadata=True)

Metadata Filtering

All vector databases support filtering results by metadata:

# ChromaDB
results = collection.query(
    query_texts=["neural networks"],
    where={"topic": "dl"},  # Only deep learning docs
    n_results=5
)

# With complex filters
results = collection.query(
    query_texts=["machine learning"],
    where={
        "$and": [
            {"topic": {"$in": ["ml", "dl"]}},
            {"year": {"$gte": 2020}}
        ]
    }
)

Metadata filtering is typically applied before or alongside the vector search (pre-filtering), reducing the search space and keeping results relevant.

Hybrid Search

Pure vector search sometimes misses exact keyword matches. Hybrid search combines:

  • Vector search: Semantic similarity
  • Keyword search: BM25 or similar

When to use hybrid:

  • Technical documentation (acronyms, product names)
  • Legal/medical text (precise terminology matters)
  • E-commerce (exact brand/model matching)

How it works:

  1. Run both vector and keyword search
  2. Normalize scores from each
  3. Combine with weighted average (alpha parameter)
  4. Re-rank merged results

# Weaviate hybrid search
results = collection.query.hybrid(
    query="BERT transformer architecture",
    alpha=0.5,  # 0 = pure keyword, 1 = pure vector
    limit=10
)
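
Under the hood, the alpha-weighted fusion from step 3 is straightforward to sketch by hand (hypothetical scores; min-max normalization is one common choice):

def fuse(vector_scores: dict, keyword_scores: dict, alpha: float = 0.5) -> dict:
    """Combine per-document scores from vector and keyword search."""
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {doc: (s - lo) / (hi - lo or 1.0) for doc, s in scores.items()}

    v, k = normalize(vector_scores), normalize(keyword_scores)
    # alpha = 1.0 -> pure vector, alpha = 0.0 -> pure keyword
    return {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }

fused = fuse({"doc1": 0.92, "doc2": 0.55}, {"doc2": 11.3, "doc3": 7.1}, alpha=0.5)
print(sorted(fused.items(), key=lambda kv: kv[1], reverse=True))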

Performance Optimization

Batch Operations

Always upsert in batches:

BATCH_SIZE = 100
for i in range(0, len(vectors), BATCH_SIZE):
    batch = vectors[i:i + BATCH_SIZE]
    index.upsert(vectors=batch)

Individual inserts incur network overhead. Batching can be 10-100x faster.

Index Tuning

For HNSW (most common):

Parameter       | Effect of low value        | Effect of high value
M (connections) | Faster build, less memory  | Better recall
ef_construction | Faster build               | Higher quality index
ef_search       | Faster queries             | Better recall

Start with defaults, then tune based on your recall/latency requirements.

Embedding Optimization

Choose embedding model based on your needs:

Model                  | Dimensions | Speed    | Quality
all-MiniLM-L6-v2       | 384        | Fast     | Good
all-mpnet-base-v2      | 768        | Medium   | Better
text-embedding-3-small | 1536       | API call | Very Good
text-embedding-3-large | 3072       | API call | Best

Smaller dimensions = less storage, faster search. Test whether the quality difference matters for your use case.
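
A quick way to compare candidates on your own data is to measure dimensions and encoding speed side by side (a rough sketch; assumes both sentence-transformers models can be downloaded):

import time
from sentence_transformers import SentenceTransformer

texts = ["Vector databases enable semantic search."] * 100

for name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(name)
    start = time.perf_counter()
    embeddings = model.encode(texts)
    print(f"{name}: {embeddings.shape[1]} dims, "
          f"{time.perf_counter() - start:.2f}s for {len(texts)} texts")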

Common Patterns

Multi-Tenancy with Namespaces

Isolate data per user or organization:

# Pinecone: use namespaces
index.upsert(vectors=user_data, namespace="user_123")
index.query(vector=query, top_k=5, namespace="user_123")

# ChromaDB: use separate collections
user_collection = client.get_or_create_collection(f"user_{user_id}")

Chunking Strategy

For long documents:

Strategy  | Chunk Size     | Overlap
Sentence  | 1-3 sentences  | 0
Paragraph | 200-500 tokens | 50-100 tokens
Fixed     | 512 tokens     | 128 tokens
Semantic  | Variable       | Based on topic shifts

Overlap ensures context isn't lost at chunk boundaries. Test different strategies on your data.
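
A fixed-size chunker with overlap takes only a few lines; the sketch below splits on whitespace words for simplicity (real pipelines usually count model tokens instead):

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 128) -> list[str]:
    """Split text into word-based chunks whose boundaries overlap."""
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_text("your long document text " * 400)
print(len(chunks), len(chunks[0].split()))   # number of chunks, words in the first chunk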

Caching Query Embeddings

If the same queries repeat, cache embeddings:

from functools import lru_cache

@lru_cache(maxsize=1000)
def get_query_embedding(query: str):
    # Cached by query string; returned as a tuple so the value is immutable
    return tuple(model.encode(query).tolist())

# Repeated queries hit cache
embedding = get_query_embedding("what is machine learning")

Conclusion

Vector databases enable semantic search at scale. Key takeaways:

Architecture:

  • Same embedding model for documents and queries
  • ANN algorithms trade accuracy for speed
  • HNSW is the default choice for most use cases

Selection:

  • ChromaDB for prototyping
  • Pinecone for managed production
  • Weaviate for hybrid search
  • Qdrant for performance-critical filtering

Optimization:

  • Batch upserts (100-1000 at a time)
  • Tune HNSW parameters for recall/latency tradeoff
  • Choose embedding dimensions based on quality needs
  • Use hybrid search when exact matches matter

Start with ChromaDB for development, then evaluate production options based on your scaling and operational requirements.
