Vector Databases Compared: pgvector, Pinecone, Qdrant, Weaviate and When to Use Each

Every team shipping retrieval-augmented generation, semantic search, or recommendation features eventually hits the same fork in the road: where do the embeddings live? The choice between pgvector, Pinecone, Qdrant, and Weaviate is less about which database has the fastest benchmark and more about how your team operates, what your data looks like, and how much you want to pay at the scale you actually have. This guide cuts through the marketing to help product teams make a defensible decision.

What a vector database actually has to do

A vector database stores high-dimensional embeddings (typically 768 to 3072 floats per record) and answers approximate nearest-neighbor (ANN) queries fast. Under the hood, almost everyone uses the same family of index structures: HNSW (Hierarchical Navigable Small World graphs) for low-latency in-memory search, or IVF/quantized variants when memory cost dominates.
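ANN indexes approximate an exact search that is easy to write but expensive to run. The sketch below shows that exact baseline in plain Python using cosine distance (1 minus cosine similarity), the same metric pgvector's `<=>` operator returns. The function names and toy data are illustrative assumptions, not any engine's API.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 means same direction, 1.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_knn(query: list[float], vectors: dict[int, list[float]], k: int) -> list[int]:
    """Brute-force nearest neighbours: score every vector, keep the k closest.

    This costs O(n * d) per query; HNSW and IVF exist to avoid this full scan.
    """
    ranked = sorted(vectors, key=lambda vid: cosine_distance(query, vectors[vid]))
    return ranked[:k]


if __name__ == "__main__":
    docs = {
        1: [1.0, 0.0, 0.0],
        2: [0.9, 0.1, 0.0],
        3: [0.0, 1.0, 0.0],
        4: [0.0, 0.0, 1.0],
    }
    print(exact_knn([1.0, 0.05, 0.0], docs, k=2))  # [1, 2]
```

An ANN index returns (usually) the same IDs without touching every vector; how often it does is what "recall" measures.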

That means raw ANN performance is largely commoditized. The differences that matter in production are elsewhere:

Metadata filtering — can you constrain a search to tenant_id = X AND status = 'active' without wrecking recall or latency?
Hybrid search — can you combine dense vectors with sparse keyword signals (BM25) and fuse the results?
Operational surface — who patches it, scales it, backs it up, and gets paged at 3am?
Cost curve — what does it cost at 100K vectors, and what does it cost at 100M?

Hold those four axes in your head as we go.
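On the cost axis, memory for the vectors themselves usually dominates. This back-of-the-envelope sketch assumes uncompressed float32 storage and ignores index and metadata overhead, which come on top. Even so, it shows how quickly the numbers grow: 1536-dimensional vectors need about 0.6 GB at 100K records and about 614 GB at 100M.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_dim: float = BYTES_PER_FLOAT32) -> float:
    """Bytes for the vector payloads only, before index graphs, metadata, or replicas."""
    return num_vectors * dims * bytes_per_dim


def human_gb(num_bytes: float) -> str:
    """Format a byte count as decimal gigabytes with one decimal place."""
    return f"{num_bytes / 1e9:,.1f} GB"


if __name__ == "__main__":
    for n in (100_000, 100_000_000):
        print(f"{n:>11,} x 1536-dim float32 -> {human_gb(raw_vector_bytes(n, 1536))}")
```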

pgvector: the database you already run

pgvector is a PostgreSQL extension that adds a vector column type and ANN indexes (HNSW and IVFFlat) directly inside Postgres. If your application already uses Postgres, and most do, this is the lowest-friction option available.

The strategic advantage is not performance; it is that your embeddings live in the same transactional database as your business data. You can join vectors against users, orders, and permissions in a single SQL query, inside a single transaction, with foreign keys and row-level security doing their job.

-- Enable once
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id          bigserial PRIMARY KEY,
  tenant_id   bigint NOT NULL,
  title       text,
  embedding   vector(1536)
);

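-- HNSW index with cosine distance (m and ef_construction shown at their defaults)
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- B-tree index so the planner can also filter by tenant efficiently
CREATE INDEX ON documents (tenant_id);

-- Filtered semantic search: $1 is the query embedding, $2 the tenant id.
-- When the HNSW index is used, the tenant filter is applied to its candidates.
SELECT id, title, embedding <=> $1 AS distance
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 10;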

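The honest limits: pgvector's HNSW search is fast only while the index stays in memory (Postgres shared buffers plus the OS page cache), so once the index no longer fits comfortably in RAM, latency degrades. Filtered queries have their own catch: an HNSW scan returns a bounded candidate list (hnsw.ef_search, 40 by default) and the WHERE clause is applied to those candidates afterwards, so a selective filter can return fewer rows than the LIMIT. Raising hnsw.ef_search or enabling iterative index scans (pgvector 0.8.0 and later) helps, and the planner may instead use a metadata index, if one exists, and compute exact distances for the matching rows. In practice pgvector is excellent up to the low tens of millions of vectors on a well-provisioned instance, especially when combined with pgvectorscale (StreamingDiskANN) for larger sets. Beyond that, or when you need aggressive multi-tenant isolation at scale, a dedicated engine starts to earn its keep.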
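To see why selective filters bite, here is a toy simulation. It is not pgvector code. The "index" is mimicked by an exact search that returns only `ef_search` candidates (40, matching pgvector's default `hnsw.ef_search`), and the tenant filter is applied afterwards. The 1-D points and the 100-tenant layout are assumptions chosen to keep the result easy to verify by hand.

```python
# Toy model of post-filtering vs. pre-filtering. The "index" is simulated by an
# exact search that hands back only ef_search candidates.


def nearest(query: float, points: dict[int, float], limit: int) -> list[int]:
    """Exact nearest neighbours on 1-D points, closest first."""
    return sorted(points, key=lambda pid: abs(points[pid] - query))[:limit]


def post_filtered_search(query: float, points: dict[int, float], tenant_of: dict[int, int],
                         tenant: int, k: int, ef_search: int) -> list[int]:
    # The index returns a bounded candidate list; the WHERE clause runs afterwards.
    candidates = nearest(query, points, ef_search)
    return [pid for pid in candidates if tenant_of[pid] == tenant][:k]


def pre_filtered_search(query: float, points: dict[int, float], tenant_of: dict[int, int],
                        tenant: int, k: int) -> list[int]:
    # Restrict to the tenant's rows first, then rank them exactly.
    subset = {pid: pos for pid, pos in points.items() if tenant_of[pid] == tenant}
    return nearest(query, subset, k)


if __name__ == "__main__":
    points = {i: float(i) for i in range(1000)}     # 1,000 toy "embeddings"
    tenant_of = {i: i % 100 for i in range(1000)}  # 100 equal-sized tenants
    print(len(post_filtered_search(0.0, points, tenant_of, tenant=0, k=10, ef_search=40)))  # 1
    print(len(pre_filtered_search(0.0, points, tenant_of, tenant=0, k=10)))                 # 10
```

The query asks for 10 rows but gets 1, with no error. Iterative index scans in pgvector and filter-aware graph traversal in engines such as Qdrant are both aimed at this shortfall.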

Pinecone: managed, serverless, opinionated

Pinecone is a fully managed vector database with no servers to run. Its serverless tier decouples storage from compute and bills on usage, which removes the "provision a cluster and pray you sized it right" problem entirely.

What you are buying with Pinecone is operational silence. There is no index to tune by hand for most workloads, no node to patch, no replication to configure. For a small team that wants vectors to be a solved problem, that is genuinely valuable.

The trade-offs are equally real:

Cost at idle and at scale. Serverless is cheap when traffic is low, but a large, hot index queried constantly can become a meaningful line item. Model it before you commit.
Vendor lock-in. Your data lives in Pinecone's cloud in Pinecone's format. Migrating out means re-embedding or bulk export and re-ingest.
Less control. You cannot drop down to tune the index or run exotic queries. You get the API surface they give you, which is clean but bounded.
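"Model it before you commit" can start as a few lines of arithmetic. The sketch below compares a usage-based bill with a fixed monthly infrastructure cost and finds the break-even query volume. Every rate in it is a hypothetical placeholder, not Pinecone's pricing. Real serverless bills are metered more finely (Pinecone, for example, meters reads, writes, and storage separately), so treat a flat per-query price as a first approximation only.

```python
from dataclasses import dataclass


@dataclass
class UsagePricing:
    """Hypothetical flat rates; replace with your vendor's current price sheet."""
    storage_per_gb_month: float  # currency units per GB stored per month
    per_million_queries: float   # simplification: real meters count read/write units


def usage_monthly_cost(p: UsagePricing, stored_gb: float, queries_per_month: int) -> float:
    """Monthly bill under the simplified usage model."""
    return stored_gb * p.storage_per_gb_month + queries_per_month / 1e6 * p.per_million_queries


def break_even_queries(p: UsagePricing, stored_gb: float, fixed_monthly: float) -> float:
    """Monthly query volume above which a fixed-cost deployment is cheaper."""
    storage_cost = stored_gb * p.storage_per_gb_month
    if storage_cost >= fixed_monthly:
        return 0.0  # storage alone already costs more than the fixed option
    return (fixed_monthly - storage_cost) / p.per_million_queries * 1e6


if __name__ == "__main__":
    pricing = UsagePricing(storage_per_gb_month=0.5, per_million_queries=10.0)  # placeholders
    print(usage_monthly_cost(pricing, stored_gb=100, queries_per_month=30_000_000))  # 350.0
    print(break_even_queries(pricing, stored_gb=100, fixed_monthly=500.0))  # 45000000.0
```

`fixed_monthly` is meant to be the all-in cost of the alternative (instances, storage, and the engineering time to run it), which is usually the hardest number to estimate honestly.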

Pinecone added native sparse-dense hybrid search and integrated reranking, so it is no longer dense-only. For teams whose differentiation is the product, not the infrastructure, paying Pinecone to never think about ANN ops is a rational trade.

Qdrant: the control-and-performance pick

Qdrant is an open-source vector database written in Rust, available self-hosted or as Qdrant Cloud. It sits in the sweet spot for teams that want serious vector performance and rich features without surrendering control or accepting black-box pricing.

Where Qdrant shines:

First-class filtering. Its filterable HNSW lets you apply complex payload filters during graph traversal rather than as a slow post-filter, so recall stays high even with selective filters.
Quantization built in. Scalar and binary quantization can shrink memory 4x to 32x, letting you hold large collections in RAM affordably or push cold data to disk.
Hybrid search. Native sparse vectors plus the Query API with fusion (Reciprocal Rank Fusion) and multi-stage reranking give you proper hybrid retrieval.
Self-host or managed. Run it on your own infrastructure for data residency and cost control, or use Qdrant Cloud when you want managed convenience.
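The 4x and 32x figures follow directly from the bit widths: float32 to 8-bit codes is 4x, and float32 to 1 bit per dimension is 32x. The sketch below shows both forms in plain Python. It is a simplification: Qdrant calibrates scalar ranges across the collection (via a quantile setting), whereas this sketch uses each vector's own min/max. The binary form keeps only the sign of each dimension, which is one common approach. Engines typically keep the original vectors around for rescoring; this sketch does not model that.

```python
import math


def scalar_quantize(vec: list[float]) -> tuple[bytes, float, float]:
    """float32 -> uint8 codes using this vector's own min/max (a simplification)."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # constant vector: any non-zero scale works
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def scalar_dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximate reconstruction; per-dimension error is at most scale / 2."""
    return [lo + c * scale for c in codes]


def binary_quantize(vec: list[float]) -> bytes:
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    packed = bytearray(math.ceil(len(vec) / 8))
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


if __name__ == "__main__":
    dims = 1536
    vec = [math.sin(i) for i in range(dims)]
    float32_bytes = dims * 4
    codes, lo, scale = scalar_quantize(vec)
    print(float32_bytes / len(codes))                 # 4.0 (ignores the two stored floats)
    print(float32_bytes / len(binary_quantize(vec)))  # 32.0
```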

Qdrant is our default recommendation when a project has outgrown pgvector but the team still wants to own its stack — common for AI products handling tens of millions of vectors with demanding filter and hybrid-search requirements.

Weaviate: the batteries-included knowledge engine

Weaviate is an open-source vector database that leans into being a complete search and knowledge platform rather than a bare vector store. It bundles modules for embedding generation, hybrid search, and generative pipelines, so it can call your embedding and LLM providers for you.

This module ecosystem is the differentiator. With Weaviate you can ingest raw text and let the database vectorize it on write, then run hybrid (BM25 + vector) queries and even RAG-style generation through one API. For teams that want the database to absorb more of the pipeline, that integration reduces glue code.

The flip side is that the abstraction adds surface area to learn and operate, and the schema-and-modules model is more opinionated than Qdrant's leaner core. Weaviate offers both self-hosted and Weaviate Cloud deployments. Pick it when hybrid search and an integrated, schema-driven knowledge layer are central to the product rather than a feature bolted onto an app.
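Weaviate exposes the keyword-versus-vector balance through an `alpha` parameter on hybrid queries: 1 is pure vector search and 0 is pure keyword search. The sketch below implements one score-based way to fuse the two lists: min-max normalize each list, then take an alpha-weighted sum. It is in the spirit of Weaviate's relative-score fusion, not its exact implementation, so check the Weaviate documentation for the algorithm your version uses. The document IDs and scores are made up.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list to [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def alpha_fusion(vector_scores: dict[str, float], keyword_scores: dict[str, float],
                 alpha: float = 0.5) -> list[str]:
    """alpha = 1.0 -> pure vector ranking, alpha = 0.0 -> pure keyword ranking."""
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    fused = {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in vec.keys() | kw.keys()
    }
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    vector_hits = {"a": 0.9, "b": 0.8, "c": 0.1}  # cosine similarities
    keyword_hits = {"c": 12.0, "d": 3.0}         # BM25 scores on a different scale
    print(alpha_fusion(vector_hits, keyword_hits, alpha=0.7))  # ['a', 'b', 'c', 'd']
```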

Comparison table

Dimension	pgvector	Pinecone	Qdrant	Weaviate
Model	Postgres extension	Fully managed SaaS	Open source + Cloud	Open source + Cloud
Best scale	Up to ~10–50M (more with pgvectorscale)	Small to very large, elastic	Tens of millions+	Millions to large
Ops burden	You already run Postgres	Near zero	Self-host or managed	Self-host or managed
Hybrid search	Manual (combine with FTS)	Native sparse-dense + rerank	Native sparse + RRF fusion	Native BM25 + vector
Filtering	SQL, planner-dependent	Metadata filters	Filterable HNSW (strong)	Where filters
Quantization	Manual (halfvec, binary via bit)	Managed internally	Scalar + binary, tunable	Yes (PQ/BQ)
Lock-in risk	None (open)	High (proprietary cloud)	Low (open source)	Low (open source)
Cost shape	Your DB instance cost	Usage-based, can spike	Predictable infra cost	Predictable infra cost
Transactional joins	Yes, native SQL	No	No	No

Treat the scale numbers as guidance, not guarantees — real limits depend on dimensions, filter selectivity, RAM, and query patterns. Always benchmark on your own data before a large commitment.

How to actually decide

Use this short decision list:

You already use Postgres and have under ~10M vectors → Start with pgvector. Do not add infrastructure you do not need. Reach for pgvectorscale before you reach for a new database.
Tiny team, want zero ops, fine with vendor lock-in → Pinecone serverless. Pay to make the problem disappear.
Tens of millions of vectors, heavy metadata filtering or hybrid search, want to control cost and avoid lock-in → Qdrant, self-hosted or Cloud.
Hybrid search and an integrated knowledge/RAG layer are core to the product → Weaviate.
Strict data residency or air-gapped requirements → Self-hosted Qdrant or Weaviate, or pgvector on your own Postgres.

A pattern we use often at CodeAustral: ship the first version on pgvector to validate the feature with real users, then migrate to Qdrant only when filtering complexity, scale, or memory cost crosses a line you can measure. Migrating is cheap if you kept your embeddings and metadata in a clean, re-ingestible form from day one.

Avoiding the expensive mistakes

Do not pick the database before you know your scale. A 50,000-document internal search tool does not need a distributed cluster. Provisioning for imaginary scale is the most common waste we see.
Test filtered recall, not just raw QPS. A vector DB that is fast on unfiltered queries can collapse to a linear scan, or silently return fewer results than requested, once you add a selective WHERE. This is where engines genuinely differ.
Decide on hybrid search early. Pure vector search misses exact-match terms (SKUs, error codes, names). If your domain has those, plan for sparse + dense fusion from the start.
Keep embeddings reproducible. Store the source text and the embedding model version. Re-embedding is the real cost of any migration, and model upgrades will force it eventually anyway.
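Measuring filtered recall takes only ground truth and a set comparison. In the sketch below, `exact_ids` is assumed to come from a brute-force search over just the rows that pass the filter, and `approx_ids` from the production query. A search that silently returns fewer rows than requested shows up as lost recall. The function names are illustrative.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    """Share of the true top-k (exact search over filtered rows) that the ANN query returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing matched the filter, so nothing could be missed
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall(pairs: list[tuple[list[int], list[int]]], k: int) -> float:
    """Average recall@k over (approx_ids, exact_ids) pairs, one pair per test query."""
    return sum(recall_at_k(approx, exact, k) for approx, exact in pairs) / len(pairs)


if __name__ == "__main__":
    pairs = [
        ([1, 2, 3], [1, 2, 3]),  # unfiltered query: perfect
        ([7], [7, 8, 9]),        # selective filter: the index returned only one row
    ]
    print(mean_recall(pairs, k=3))
```

Run it on a sample of real queries with your most selective filters, not just on the unfiltered case.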

Frequently Asked Questions

Is pgvector good enough for production?

Yes, for a large share of real workloads. pgvector handles up to roughly tens of millions of vectors well on a properly sized Postgres instance, and pgvectorscale extends that further. Its biggest win is keeping vectors and business data in one transactional database. Move to a dedicated engine only when scale, filtering, or memory cost demonstrably exceed what Postgres delivers.

What is hybrid search and do I need it?

Hybrid search combines dense vector similarity with sparse keyword scoring (like BM25) and fuses the rankings, usually with Reciprocal Rank Fusion. You need it when queries contain exact terms that semantics alone miss: product SKUs, part numbers, proper names, or error codes. Pinecone, Qdrant, and Weaviate support it natively; with pgvector you combine the vector index with Postgres full-text search manually.
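Reciprocal Rank Fusion is short enough to write out. Each document earns 1 / (k + rank) from every ranked list it appears in, and k = 60 is the constant commonly used since the original RRF paper. Because only ranks matter, BM25 scores and cosine similarities never need to be put on a common scale. This is a plain-Python sketch with made-up document IDs; the engines named above run fusion server-side.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # ranked by vector similarity
    sparse = ["c", "a", "d"]  # ranked by BM25
    print(reciprocal_rank_fusion([dense, sparse]))  # ['a', 'c', 'b', 'd']
```

Documents that rank reasonably well in both lists ("a", "c") rise above documents that appear in only one.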

When should I choose Pinecone over Qdrant?

Choose Pinecone when your team is small, wants zero operational overhead, and is comfortable with proprietary lock-in and usage-based pricing. Choose Qdrant when you want strong filtered search, tunable quantization, predictable infrastructure cost, and the option to self-host. Qdrant gives more control; Pinecone gives more silence. Model both against your real query volume before deciding.

How hard is it to migrate between vector databases?

The mechanics are straightforward: export records and re-ingest them. The real cost is embeddings. If you stored the original source text and the embedding model version, migration is a bulk re-ingest, sometimes with a re-embedding pass. If you only kept opaque vectors, you are locked to that model's output. Design for portability from day one.
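Portability comes down to what you store next to the vector. The sketch below uses hypothetical field and model names. It shows that keeping the source text plus the model identity lets you compute exactly which rows a model upgrade or a migration must re-embed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingRecord:
    doc_id: str
    source_text: str          # the text you embedded, kept so you can re-embed later
    model: str                # embedding model identifier (hypothetical names below)
    model_version: str
    vector: tuple[float, ...]


def needs_reembedding(records: list[EmbeddingRecord], model: str, version: str) -> list[str]:
    """IDs whose stored vectors came from a different model or model version."""
    return [r.doc_id for r in records if (r.model, r.model_version) != (model, version)]


if __name__ == "__main__":
    records = [
        EmbeddingRecord("doc-1", "Reset your password", "example-embed", "1", (0.1, 0.2)),
        EmbeddingRecord("doc-2", "Error E1042 on login", "example-embed", "2", (0.3, 0.1)),
    ]
    print(needs_reembedding(records, model="example-embed", version="2"))  # ['doc-1']
```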

Do I need a dedicated vector database at all?

Often not. If you already run Postgres and your dataset is in the thousands to low millions of vectors, pgvector is usually the correct answer and adds no new infrastructure. A dedicated vector database earns its place when you cross into heavy filtering, hybrid search, large scale, or strict isolation requirements that Postgres cannot serve efficiently.

Working with CodeAustral

We build AI products end to end — retrieval pipelines, embeddings, search, and the apps around them — and we choose infrastructure based on your actual scale and team, not the trend of the month. If you are deciding where your embeddings should live or planning a migration, send us a short brief at codeaustral.com/contact and we will help you pick the boring, correct option.