Every team shipping retrieval-augmented generation, semantic search, or recommendation features eventually hits the same fork in the road: where do the embeddings live? The choice between pgvector, Pinecone, Qdrant, and Weaviate is less about which database has the fastest benchmark and more about how your team operates, what your data looks like, and how much you want to pay at the scale you actually have. This guide cuts through the marketing to give product teams a defensible decision.
What a vector database actually has to do
A vector database stores high-dimensional embeddings (typically 768 to 3072 floats per record) and answers approximate nearest-neighbor (ANN) queries fast. Under the hood, almost everyone uses the same family of index structures: HNSW (Hierarchical Navigable Small World graphs) for low-latency in-memory search, or IVF/quantized variants when memory cost dominates.
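To make "nearest neighbor" concrete, here is the exact computation an ANN index approximates: score every stored vector against the query and keep the k closest. This is a minimal, stdlib-only Python sketch with illustrative function names. The distance follows the cosine-distance convention (1 minus cosine similarity) that pgvector's `<=>` operator uses. Brute force costs O(N·d) per query, which is exactly the work HNSW and IVF exist to avoid.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity (0 = same direction, 2 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(query: list[float], vectors: dict[int, list[float]], k: int) -> list[tuple[int, float]]:
    """Exact k-nearest neighbors: score every stored vector, keep the k closest."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds to thousands of dimensions.
docs = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
print(exact_knn([1.0, 0.1], docs, k=2))  # ids 1 then 3
```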
That means raw ANN performance is largely commoditized. The differences that matter in production are elsewhere:
- Metadata filtering: can you constrain a search to `tenant_id = X AND status = 'active'` without wrecking recall or latency?
- Hybrid search: can you combine dense vectors with sparse keyword signals (BM25) and fuse the results?
- Operational surface: who patches it, scales it, backs it up, and gets paged at 3am?
- Cost curve: what does it cost at 100K vectors, and what does it cost at 100M?
Hold those four axes in your head as we go.
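The cost-curve axis is mostly arithmetic. This back-of-the-envelope sketch assumes float32 storage (4 bytes per dimension) and 1536-dimension embeddings, the same size as the pgvector example later in this guide. It counts only the raw vectors. HNSW neighbor links, metadata, and replicas all come on top.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> int:
    """Bytes for the raw vectors alone (float32 = 4 bytes per dimension).
    Excludes index overhead (e.g. HNSW neighbor links), metadata, and replicas."""
    return num_vectors * dims * bytes_per_dim


for n in (100_000, 100_000_000):
    gib = raw_vector_bytes(n, dims=1536) / 2**30
    print(f"{n:>11,} vectors: {gib:,.1f} GiB of raw float32")
```

That works out to roughly 0.6 GiB versus roughly 572 GiB. The first fits in any instance's RAM. The second forces decisions about quantization, disk-based indexes, or sharding.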
pgvector: the database you already run
pgvector is a PostgreSQL extension that adds a vector column type and ANN indexes (HNSW and IVFFlat) directly inside Postgres. If your application already uses Postgres — and most do — this is the lowest-friction option in existence.
The strategic advantage is not performance; it is that your embeddings live in the same transactional database as your business data. You can join vectors against users, orders, and permissions in a single SQL query, inside a single transaction, with foreign keys and row-level security doing their job.
```sql
-- Enable once
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  tenant_id bigint NOT NULL,
  title text,
  embedding vector(1536)
);

-- HNSW index with cosine distance
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- B-tree index on the metadata column used for filtering
CREATE INDEX ON documents (tenant_id);

-- Filtered semantic search ($1 = query embedding, $2 = tenant id):
-- metadata filter in WHERE, ANN ordering by cosine distance
SELECT id, title, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 10;
```

The honest limits: pgvector's HNSW index performs best when the graph stays resident in memory (shared buffers plus the OS page cache), so once your index no longer fits comfortably in RAM, latency degrades. Filtered queries can also force the planner into trade-offs between using the vector index and the metadata index. When it picks the HNSW index, the filter is applied after the index scan, so a selective `WHERE` can return fewer rows than the `LIMIT` asks for. In practice pgvector is excellent up to the low tens of millions of vectors on a well-provisioned instance, especially when combined with pgvectorscale (StreamingDiskANN) for larger sets. Beyond that, or when you need aggressive multi-tenant isolation at scale, a dedicated engine starts to earn its keep.
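Why selective filters hurt an HNSW index that filters after the scan: the index hands back a bounded candidate pool (pgvector sizes it with `hnsw.ef_search`, default 40), and the `WHERE` clause then discards non-matching rows. The pgvector documentation gives the example that a filter matching 10% of rows leaves about 4 results on average. The sketch below reproduces that arithmetic. It assumes the filter is uncorrelated with vector similarity, which real data often violates, and the function names are illustrative.

```python
import math


def expected_filtered_hits(ef_search: int, selectivity: float) -> float:
    """Average rows left when a filter matching `selectivity` of all rows is applied
    after an HNSW scan that returned about `ef_search` candidates."""
    return ef_search * selectivity


def ef_search_needed(k: int, selectivity: float) -> int:
    """Smallest candidate pool that leaves k matching rows on average."""
    return math.ceil(k / selectivity)


print(expected_filtered_hits(40, 0.10))  # default pool, 10%-selective filter
print(ef_search_needed(10, 0.10))        # pool needed for LIMIT 10 at 10% selectivity
print(ef_search_needed(10, 0.01))        # a 1%-selective filter needs a far larger pool
```

In pgvector you can enlarge the pool per session with `SET hnsw.ef_search = 100;`, at the cost of latency. Engines that apply filters during graph traversal avoid most of this trade-off.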
Pinecone: managed, serverless, opinionated
Pinecone is a fully managed vector database with no servers to run. Its serverless tier decouples storage from compute and bills on usage, which removes the "provision a cluster and pray you sized it right" problem entirely.
What you are buying with Pinecone is operational silence. There is no index to tune by hand for most workloads, no node to patch, no replication to configure. For a small team that wants vectors to be a solved problem, that is genuinely valuable.
The trade-offs are equally real:
- Cost at idle and at scale. Serverless is cheap when traffic is low, but a large, hot index queried constantly can become a meaningful line item. Model it before you commit.
- Vendor lock-in. Your data lives in Pinecone's cloud in Pinecone's format. Migrating out means re-embedding or bulk export and re-ingest.
- Less control. You cannot drop down to tune the index or run exotic queries. You get the API surface they give you, which is clean but bounded.
Pinecone added native sparse-dense hybrid search and integrated reranking, so it is no longer dense-only. For teams whose differentiation is the product, not the infrastructure, paying Pinecone to never think about ANN ops is a rational trade.
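The "model it before you commit" advice from the cost bullet above can fit in a few lines of code. This sketch compares a usage-based bill against a fixed monthly instance cost. Every price in it is a hypothetical placeholder, not Pinecone's or anyone else's price list. Real serverless billing may also meter read and write units rather than raw requests, so substitute the unit definitions and prices from your vendor's current pricing page.

```python
from dataclasses import dataclass


@dataclass
class UsagePricing:
    """Hypothetical unit prices; replace with your vendor's current numbers."""
    storage_per_gb_month: float
    per_million_queries: float
    per_million_writes: float


def monthly_usage_cost(p: UsagePricing, stored_gb: float,
                       queries_per_month: float, writes_per_month: float) -> float:
    """Usage-based bill: storage + queries + writes (a simplified model)."""
    return (stored_gb * p.storage_per_gb_month
            + queries_per_month / 1e6 * p.per_million_queries
            + writes_per_month / 1e6 * p.per_million_writes)


SECONDS_PER_MONTH = 30 * 24 * 3600
pricing = UsagePricing(storage_per_gb_month=0.5, per_million_queries=10.0,
                       per_million_writes=5.0)  # placeholder numbers
fixed_instance_per_month = 600.0  # hypothetical self-managed node

for qps in (1, 20, 200):
    usage = monthly_usage_cost(pricing, stored_gb=20,
                               queries_per_month=qps * SECONDS_PER_MONTH,
                               writes_per_month=2e6)
    print(f"{qps:>4} avg QPS: usage-based ${usage:,.0f}/mo vs fixed ${fixed_instance_per_month:,.0f}/mo")
```

The shape matters more than the numbers: usage-based cost grows linearly with sustained traffic, while a provisioned instance stays flat until you outgrow it.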
Qdrant: the control-and-performance pick
Qdrant is an open-source vector database written in Rust, available self-hosted or as Qdrant Cloud. It sits in the sweet spot for teams that want serious vector performance and rich features without surrendering control or accepting black-box pricing.
Where Qdrant shines:
- First-class filtering. Its filterable HNSW lets you apply complex payload filters during graph traversal rather than as a slow post-filter, so recall stays high even with selective filters.
- Quantization built in. Scalar and binary quantization can shrink memory 4x to 32x, letting you hold large collections in RAM affordably or push cold data to disk.
- Hybrid search. Native sparse vectors plus the Query API with fusion (Reciprocal Rank Fusion) and multi-stage reranking give you proper hybrid retrieval.
- Self-host or managed. Run it on your own infrastructure for data residency and cost control, or use Qdrant Cloud when you want managed convenience.
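The 4x and 32x figures in the quantization bullet fall straight out of the bit widths. Int8 scalar quantization stores one byte per dimension instead of a four-byte float32, and binary quantization keeps a single sign bit. The minimal sketch below covers both ideas. For simplicity it uses a fixed value range, whereas engines typically derive the range from the data and rescore top candidates with the original vectors. It illustrates the general technique, not Qdrant's implementation.

```python
def scalar_quantize(vec: list[float], lo: float, hi: float) -> bytes:
    """Int8-style scalar quantization: map [lo, hi] onto 0..255, one byte per dimension."""
    scale = 255.0 / (hi - lo)
    return bytes(round((min(max(x, lo), hi) - lo) * scale) for x in vec)


def dequantize(codes: bytes, lo: float, hi: float) -> list[float]:
    """Approximate reconstruction; for inputs inside [lo, hi] the error is at most half a step."""
    step = (hi - lo) / 255.0
    return [lo + c * step for c in codes]


def binary_quantize(vec: list[float]) -> bytes:
    """Binary quantization: one bit per dimension (1 if positive), packed 8 per byte."""
    packed = bytearray((len(vec) + 7) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits: the cheap distance used to compare binary codes."""
    return bin(int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).count("1")


dims = 1536
float32_bytes = dims * 4
print(float32_bytes / len(scalar_quantize([0.0] * dims, -1.0, 1.0)))  # 4.0
print(float32_bytes / len(binary_quantize([0.0] * dims)))            # 32.0
```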
Qdrant is our default recommendation when a project has outgrown pgvector but the team still wants to own its stack — common for AI products handling tens of millions of vectors with demanding filter and hybrid-search requirements.
Weaviate: the batteries-included knowledge engine
Weaviate is an open-source vector database that leans into being a complete search and knowledge platform rather than a bare vector store. It bundles modules for embedding generation, hybrid search, and generative pipelines, so it can call your embedding and LLM providers for you.
This module ecosystem is the differentiator. With Weaviate you can ingest raw text and let the database vectorize it on write, then run hybrid (BM25 + vector) queries and even RAG-style generation through one API. For teams that want the database to absorb more of the pipeline, that integration reduces glue code.
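To see what "hybrid (BM25 + vector)" means at the scoring level, here is a sketch in the spirit of Weaviate's relative-score fusion. A single `alpha` weight slides between pure keyword ranking (`alpha = 0`) and pure vector ranking (`alpha = 1`). This is a simplified illustration, not Weaviate's code. Each result list is min-max normalized so BM25 scores and similarity scores become comparable, and then the two are blended. The input shape (document id mapped to a higher-is-better score) is an assumption of this sketch.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list so its best score is 1.0 and its worst is 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def alpha_fusion(vector_scores: dict[str, float], keyword_scores: dict[str, float],
                 alpha: float = 0.5) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha=1.0 is pure vector, alpha=0.0 is pure keyword.
    Vector scores must be similarities (higher is better), not distances.
    A document missing from one list contributes 0 from that side."""
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
             for doc in v.keys() | kw.keys()}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


vector_hits = {"a": 0.9, "b": 0.8, "c": 0.1}   # cosine similarities
keyword_hits = {"c": 12.0, "d": 3.0}           # raw BM25 scores
print(alpha_fusion(vector_hits, keyword_hits, alpha=0.75))
```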
The flip side is that the abstraction adds surface area to learn and operate, and the schema-and-modules model is more opinionated than Qdrant's leaner core. Weaviate offers both self-hosted and Weaviate Cloud deployments. Pick it when hybrid search and an integrated, schema-driven knowledge layer are central to the product rather than a feature bolted onto an app.
Comparison table
| Dimension | pgvector | Pinecone | Qdrant | Weaviate |
|---|---|---|---|---|
| Model | Postgres extension | Fully managed SaaS | Open source + Cloud | Open source + Cloud |
| Best scale | Up to ~10–50M (more with pgvectorscale) | Small to very large, elastic | Tens of millions+ | Millions to large |
| Ops burden | You already run Postgres | Near zero | Self-host or managed | Self-host or managed |
| Hybrid search | Manual (combine with FTS) | Native sparse-dense + rerank | Native sparse + RRF fusion | Native BM25 + vector |
| Filtering | SQL, planner-dependent | Metadata filters | Filterable HNSW (strong) | Where filters |
| Quantization | Limited | Managed internally | Scalar + binary, tunable | Yes (PQ/BQ) |
| Lock-in risk | None (open) | High (proprietary cloud) | Low (open source) | Low (open source) |
| Cost shape | Your DB instance cost | Usage-based, can spike | Predictable infra cost | Predictable infra cost |
| Transactional joins | Yes, native SQL | No | No | No |
Treat the scale numbers as guidance, not guarantees — real limits depend on dimensions, filter selectivity, RAM, and query patterns. Always benchmark on your own data before a large commitment.
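Benchmarking on your own data mostly means measuring recall. Run each query twice, once through the ANN index and once as an exact scan with the same filter, and compare the result sets. In Postgres, for example, an exact scan can be forced by disabling index scans for the session. The minimal sketch below covers only the scoring side. How your harness obtains the two id lists is up to you and is not shown. Run it with your real filters, because filtered recall is where engines diverge.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    """Share of the true top-k (from an exact, filtered scan) that the ANN query returned.
    Missing results, e.g. from a post-filtering shortfall, count as misses."""
    truth = set(exact_ids[:k])
    if not truth:  # nothing matches the filter, so there is nothing to miss
        return 1.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(runs: list[tuple[list[int], list[int]]], k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per query."""
    return sum(recall_at_k(a, e, k) for a, e in runs) / len(runs)


runs = [
    ([4, 9, 1], [4, 9, 1]),  # ANN matched the exact top-3
    ([4, 7], [4, 9, 1]),     # filter left only 2 results, one of them wrong
]
print(mean_recall(runs, k=3))
```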
How to actually decide
Use this short decision list:
- You already use Postgres and have under ~10M vectors → Start with pgvector. Do not add infrastructure you do not need. Reach for `pgvectorscale` before you reach for a new database.
- Tiny team, want zero ops, fine with vendor lock-in → Pinecone serverless. Pay to make the problem disappear.
- Tens of millions of vectors, heavy metadata filtering or hybrid search, want to control cost and avoid lock-in → Qdrant, self-hosted or Cloud.
- Hybrid search and an integrated knowledge/RAG layer are core to the product → Weaviate.
- Strict data residency or air-gapped requirements → Self-hosted Qdrant or Weaviate, or pgvector on your own Postgres.
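The list above is small enough to encode as a function, which is a handy way to make a team's assumptions explicit and reviewable. This sketch is one possible reading of it. The precedence order (residency first, then "already on Postgres", and so on) and the field names are assumptions. The 10M threshold is this guide's rough guidance, not a hard limit.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    num_vectors: int
    uses_postgres: bool = False
    strict_residency: bool = False
    wants_zero_ops: bool = False
    accepts_lock_in: bool = False
    heavy_filtering_or_hybrid: bool = False
    integrated_rag_is_core: bool = False


def suggest_store(w: Workload) -> str:
    # Hard constraint first: data must stay on infrastructure you control.
    if w.strict_residency:
        return "self-hosted Qdrant or Weaviate, or pgvector on your own Postgres"
    # Cheapest path: no new infrastructure at all.
    if w.uses_postgres and w.num_vectors < 10_000_000:
        return "pgvector (try pgvectorscale before adding a new database)"
    if w.integrated_rag_is_core:
        return "Weaviate"
    if w.wants_zero_ops and w.accepts_lock_in:
        return "Pinecone serverless"
    if w.heavy_filtering_or_hybrid or w.num_vectors >= 10_000_000:
        return "Qdrant (self-hosted or Cloud)"
    return "no clear winner: benchmark two candidates on your own data"


print(suggest_store(Workload(num_vectors=3_000_000, uses_postgres=True)))
```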
A pattern we use often at CodeAustral: ship the first version on pgvector to validate the feature with real users, then migrate to Qdrant only when filtering complexity, scale, or memory cost crosses a line you can measure. Migrating is cheap if you kept your embeddings and metadata in a clean, re-ingestible form from day one.
Avoiding the expensive mistakes
- Do not pick the database before you know your scale. A 50,000-document internal search tool does not need a distributed cluster. Provisioning for imaginary scale is the most common waste we see.
- Test filtered recall, not just raw QPS. A vector DB that is fast on unfiltered queries can collapse to a linear scan, or quietly return fewer results, once you add a selective `WHERE`. This is where engines genuinely differ.
- Decide on hybrid search early. Pure vector search misses exact-match terms (SKUs, error codes, names). If your domain has those, plan for sparse + dense fusion from the start.
- Keep embeddings reproducible. Store the source text and the embedding model version. Re-embedding is the real cost of any migration, and model upgrades will force it eventually anyway.
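Keeping embeddings reproducible mostly comes down to what you store next to each vector. In this minimal sketch, the field names and model labels are hypothetical. The record carries the source text, the embedding model version, and a hash of the text. A later job can then tell which vectors are stale after a model upgrade or a content edit.

```python
import hashlib
from dataclasses import dataclass


def text_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EmbeddingRecord:
    doc_id: int
    source_text: str              # keep the text: it is what you re-embed later
    model_version: str            # hypothetical label, e.g. "embed-model-v1"
    text_sha256: str              # hash of the text that produced this embedding
    embedding: tuple[float, ...]


def make_record(doc_id: int, text: str, model_version: str,
                embedding: list[float]) -> EmbeddingRecord:
    return EmbeddingRecord(doc_id, text, model_version, text_hash(text), tuple(embedding))


def needs_reembedding(rec: EmbeddingRecord, current_text: str, current_model: str) -> bool:
    """Stale if the embedding model changed or the source text was edited since embedding."""
    return rec.model_version != current_model or rec.text_sha256 != text_hash(current_text)


rec = make_record(1, "Error E-4012: sensor offline", "model-v1", [0.12, -0.03])
print(needs_reembedding(rec, "Error E-4012: sensor offline", "model-v2"))  # True: model upgraded
```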
Frequently Asked Questions
Is pgvector good enough for production?
Yes, for a large share of real workloads. pgvector handles up to roughly tens of millions of vectors well on a properly sized Postgres instance, and pgvectorscale extends that further. Its biggest win is keeping vectors and business data in one transactional database. Move to a dedicated engine only when scale, filtering, or memory cost demonstrably exceed what Postgres delivers.
What is hybrid search and do I need it?
Hybrid search combines dense vector similarity with sparse keyword scoring (like BM25) and fuses the rankings, usually with Reciprocal Rank Fusion. You need it when queries contain exact terms that semantics alone miss: product SKUs, part numbers, proper names, or error codes. Pinecone, Qdrant, and Weaviate support it natively; with pgvector you combine the vector index with Postgres full-text search manually.
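Reciprocal Rank Fusion is simple enough to run in application code, which is also how you would fuse a pgvector similarity query with a Postgres full-text query. In this minimal sketch, each ranked list contributes 1 / (k + rank) for every document it contains, using 1-based ranks, and documents are re-sorted by the summed score. The constant k = 60 is the commonly used default. The document ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked id lists; a document at 1-based rank r adds 1 / (k + r)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["intro", "overview", "sku-guide"]      # from vector similarity
keyword = ["sku-guide", "intro", "sku-table"]   # from BM25 / full-text search
for doc_id, score in reciprocal_rank_fusion([dense, keyword]):
    print(f"{doc_id:<10} {score:.5f}")
```

Here `sku-guide` rises from third to second because keyword search ranked it first. That is exactly the exact-term rescue hybrid search is meant to provide.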
When should I choose Pinecone over Qdrant?
Choose Pinecone when your team is small, wants zero operational overhead, and is comfortable with proprietary lock-in and usage-based pricing. Choose Qdrant when you want strong filtered search, tunable quantization, predictable infrastructure cost, and the option to self-host. Qdrant gives more control; Pinecone gives more silence. Model both against your real query volume before deciding.
How hard is it to migrate between vector databases?
The mechanics are straightforward: export records and re-ingest them. The real cost is embeddings. If you stored the original source text and the embedding model version, migration is a bulk re-ingest, sometimes with a re-embedding pass. If you only kept opaque vectors, you are locked to that model's output. Design for portability from day one.
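The "bulk re-ingest, sometimes with a re-embedding pass" step can be sketched as a batched loop. In this sketch the record shape (a dict with `id`, `text`, `model_version`, `embedding`) and the `embed` and `upsert` callables are hypothetical stand-ins for your embedding provider and the target database's client. Only records produced by a different model are re-embedded, and the input records are not modified.

```python
from typing import Callable, Iterable, Iterator

Record = dict  # hypothetical shape: {"id", "text", "model_version", "embedding"}


def batched(items: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Yield lists of up to `size` records (Python 3.12+ offers itertools.batched, which yields tuples)."""
    batch: list[Record] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(records: Iterable[Record],
            embed: Callable[[list[str]], list[list[float]]],
            upsert: Callable[[list[Record]], None],
            target_model: str,
            batch_size: int = 256) -> int:
    """Copy records to a new store, re-embedding only those made by another model."""
    moved = 0
    for batch in batched(records, batch_size):
        stale = [r for r in batch if r["model_version"] != target_model]
        vectors = embed([r["text"] for r in stale]) if stale else []
        fresh = {r["id"]: v for r, v in zip(stale, vectors)}
        out = [{**r, "embedding": fresh[r["id"]], "model_version": target_model}
               if r["id"] in fresh else r
               for r in batch]
        upsert(out)  # write the whole batch to the target store
        moved += len(out)
    return moved


written: list[list[Record]] = []
count = migrate(
    [{"id": 1, "text": "hello", "model_version": "old", "embedding": [0.0]}],
    embed=lambda texts: [[float(len(t))] for t in texts],  # fake embedder for the demo
    upsert=written.append,
    target_model="new",
)
print(count, written[0][0]["embedding"])
```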
Do I need a dedicated vector database at all?
Often not. If you already run Postgres and your dataset is in the thousands to low millions of vectors, pgvector is usually the correct answer and adds no new infrastructure. A dedicated vector database earns its place when you cross into heavy filtering, hybrid search, large scale, or strict isolation requirements that Postgres cannot serve efficiently.
Working with CodeAustral
We build AI products end to end — retrieval pipelines, embeddings, search, and the apps around them — and we choose infrastructure based on your actual scale and team, not the trend of the month. If you are deciding where your embeddings should live or planning a migration, send us a short brief at codeaustral.com/contact and we will help you pick the boring, correct option.

