Migration Guidance on Qdrant - Vector Search Engine

Pre-Migration Baseline

info@qdrant.tech (Andrey Vasnetsov) — Mon, 01 Jan 0001 00:00:00 +0000

A baseline is the foundation of migration verification: if you don’t capture what “correct” looks like before you migrate, you have nothing to compare against afterward. This page covers what to record from your source system before the migration starts.

What to Capture

Capture four things when establishing a baseline: a collection/index inventory, metadata samples, baseline search results, and system configuration snapshots.
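One way to keep the four items together is a single JSON snapshot taken before the migration starts. The sketch below is a minimal, hypothetical layout: apart from `total_vector_count`, the field names, the `build_baseline` helper, and the sample values are illustrative assumptions, not a format Qdrant defines.

```python
import json
from datetime import datetime, timezone


def build_baseline(collections, metadata_samples, search_results, config):
    """Bundle the four baseline items into one JSON-serializable dict.

    All field names except total_vector_count are illustrative assumptions.
    """
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        # 1. Collection/index inventory: name -> vector count and dimension
        "collections": collections,
        "total_vector_count": sum(c["vector_count"] for c in collections.values()),
        # 2. Metadata samples: a handful of records with their full payloads
        "metadata_samples": metadata_samples,
        # 3. Baseline search results: query id -> ordered list of result ids
        "search_results": search_results,
        # 4. System configuration snapshot: metric, index parameters, etc.
        "config": config,
    }


baseline = build_baseline(
    collections={"products": {"vector_count": 1000, "dimension": 384}},
    metadata_samples=[{"id": "a1", "payload": {"price": 9.99, "tags": ["x"]}}],
    search_results={"q1": ["a1", "a7", "a3"]},
    config={"distance": "cosine"},
)

# Serialize the snapshot; in practice, write this string to a file
# and store it somewhere the migration cannot overwrite.
serialized = json.dumps(baseline, indent=2)
print(serialized)
```

Keeping the snapshot in one file means every later check can load the same reference data, rather than re-querying a source system that may already have changed.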

Data Integrity

Data Integrity Verification

With a baseline in place, check data integrity first. Data integrity answers the question: “Did all my data arrive, and did it arrive correctly?” These checks are the fastest to run, and they catch the most common migration failures.

1. Vector Count Verification

The simplest check: does the number of vectors in Qdrant match your source system?

from qdrant_client import QdrantClient

client = QdrantClient("localhost", port=6333)

# Get collection info
collection_info = client.get_collection("your_collection")
qdrant_count = collection_info.points_count

# Compare against the baseline dict loaded from the pre-migration capture
source_count = baseline["total_vector_count"]

if qdrant_count == source_count:
    print(f"✓ Vector count matches: {qdrant_count}")
else:
    diff = source_count - qdrant_count
    # Guard against an empty source to avoid dividing by zero
    pct = (diff / source_count) * 100 if source_count else 0.0
    print(f"✗ Count mismatch: source={source_count}, qdrant={qdrant_count}, "
          f"missing={diff} ({pct:.2f}%)")
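The count reported in collection info may be approximate while the collection is still being optimized, so a stricter comparison may prefer an exact count. The sketch below assumes the client exposes `count(collection_name=..., exact=True)` returning an object with a `count` attribute, as `qdrant-client` does; the `exact_point_count` and `count_report` helpers and the `StubClient` stand-in are hypothetical and exist only so the example runs without a server.

```python
from types import SimpleNamespace


def exact_point_count(client, collection_name):
    """Return an exact point count via the client's count API."""
    result = client.count(collection_name=collection_name, exact=True)
    return result.count


def count_report(source_count, qdrant_count):
    """Summarize a count comparison without dividing by zero."""
    diff = source_count - qdrant_count
    pct = (diff / source_count * 100) if source_count else 0.0
    return {"match": diff == 0, "missing": diff, "missing_pct": round(pct, 2)}


class StubClient:
    """Stand-in for QdrantClient so the sketch runs without a server."""

    def count(self, collection_name, exact=True):
        return SimpleNamespace(count=990)


report = count_report(1000, exact_point_count(StubClient(), "your_collection"))
print(report)
```

With a real deployment, pass a `QdrantClient` instance instead of `StubClient()`. A negative `missing` value means Qdrant holds more points than the source reported.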

Common causes of count mismatches:

Search Quality

Search Quality Verification

Two systems can hold identical vectors and produce different search results because of differences in indexing, quantization, scoring, and filtering implementation.

This is perhaps the hardest part of migration verification. The guide breaks it into three tiers so you can pick the level of rigor that matches your resources and risk tolerance.

Three-Tiered Search Quality Checks

Tier	Effort	What It Catches	When to Use
Tier 1: Spot-Check	15 min	Gross failures: wrong metric, broken filters, obviously wrong results	Every migration
Tier 2: Statistical Sampling	1-2 hours	Systematic recall degradation, filter interaction bugs, score distribution shifts	Production workloads, >100K vectors
Tier 3: Gold-Standard Evaluation	Half day to days	Measurable relevance changes with confidence intervals	High-stakes search (revenue, safety), regulated industries

Our recommendation: Every migration should run Tier 1 and Tier 2. Tier 3 is for teams that have (or can build) labeled evaluation data. If you don’t have labeled data today, Tier 2 gives you a strong quantitative baseline and this guide shows you how to build toward Tier 3 over time.
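Tier 2 comes down to comparing ranked result lists for a sample of queries. Here is a minimal sketch of one common overlap measure, recall@k of the Qdrant results against the baseline results from the source system. The function names and sample IDs are illustrative assumptions, and the source system's top-k is treated as the reference, which is itself approximate unless it came from an exact search.

```python
def recall_at_k(baseline_ids, qdrant_ids, k=10):
    """Fraction of the baseline top-k that also appears in Qdrant's top-k."""
    expected = set(baseline_ids[:k])
    if not expected:
        return 1.0  # nothing to find, so nothing was missed
    found = expected & set(qdrant_ids[:k])
    return len(found) / len(expected)


def mean_recall(baseline_results, qdrant_results, k=10):
    """Average recall@k over every query present in the baseline."""
    scores = [
        recall_at_k(ids, qdrant_results.get(query_id, []), k)
        for query_id, ids in baseline_results.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample: query id -> ordered result ids from each system
baseline_results = {"q1": ["a", "b", "c", "d"], "q2": ["e", "f", "g", "h"]}
qdrant_results = {"q1": ["a", "b", "d", "x"], "q2": ["e", "f", "g", "h"]}

score = mean_recall(baseline_results, qdrant_results, k=4)
print(f"mean recall@4 = {score:.3f}")
```

This measure ignores rank order inside the top-k; a query missing from the Qdrant results scores 0 rather than being skipped, so gaps in coverage lower the average instead of hiding.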

Diagnosing Discrepancies

When verification catches a problem, you need to determine whether it’s a data issue (something went wrong during migration) or a configuration issue (the data is correct but the systems behave differently). This page provides a diagnostic decision tree and vendor-specific gotchas.

Decision Tree

Start here when any verification check fails:

Is the vector count wrong?
├─ Yes → Data-level issue
│   ├─ Count lower than expected → Check migration script logs for errors,
│   │   timeouts, or partial failures. Re-run for missing segments.
│   ├─ Count higher than expected → Check for duplicate inserts (retried batches)
│   │   or source count excluding namespaces/partitions.
│   └─ Count matches but IDs differ → ID mapping error during migration.
│
└─ No (count matches) → Continue
    │
    Are metadata fields missing or wrong type?
    ├─ Yes → Payload mapping issue
    │   ├─ Fields missing → Source system may omit null fields on export.
    │   │   Check migration script's null handling.
    │   ├─ Types changed → See "Type Coercion" section below.
    │   └─ Values differ → Encoding issue (UTF-8, special characters, unicode normalization).
    │
    └─ No (metadata looks correct) → Continue
        │
        Are search results completely different?
        ├─ Yes → Configuration-level issue
        │   ├─ Check distance metric (most common cause)
        │   ├─ Check if index is built (HNSW may not be built yet on fresh data)
        │   └─ Check if vectors are normalized (affects cosine vs. dot product)
        │
        └─ No (results overlap but differ at the margins) → Expected behavior
            │
            Is recall@10 below 0.85?
            ├─ Yes → Indexing parameter mismatch
            │   ├─ Compare HNSW ef_construction and M values
            │   ├─ Compare ef (search-time) parameters
            │   └─ Check quantization settings
            │
            └─ No → Migration is working correctly.
                Results differ on borderline cases due to
                ANN approximation. This is normal.
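The same tree can be walked in code once the individual checks have produced their results. The sketch below is a hypothetical helper, not part of any Qdrant tooling: the `diagnose` name, its boolean inputs, and the wording of the returned strings are assumptions; only the ordering of the checks and the 0.85 recall@10 threshold come from the tree.

```python
def diagnose(count_matches, metadata_ok, results_overlap, recall_at_10,
             recall_threshold=0.85):
    """Walk the decision tree top to bottom and return the first diagnosis."""
    if not count_matches:
        return "data-level issue: check logs, duplicate inserts, ID mapping"
    if not metadata_ok:
        return "payload mapping issue: check null handling, types, encoding"
    if not results_overlap:
        return "configuration issue: check metric, index build, normalization"
    if recall_at_10 < recall_threshold:
        return "indexing parameter mismatch: compare HNSW, ef, quantization"
    return "ok: remaining differences are expected ANN approximation"


# Counts and metadata are fine, results overlap, but recall is low
verdict = diagnose(count_matches=True, metadata_ok=True,
                   results_overlap=True, recall_at_10=0.78)
print(verdict)
```

Returning at the first failed check mirrors the tree's ordering: data-level problems are ruled out before configuration is examined, since a missing or malformed record will also distort search results.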

Configuration-Level Issues

Distance Metric Mismatch

A distance metric mismatch is the most impactful configuration error. Here’s how metrics map across systems: