OPEN SOURCE · DISTRIBUTED VECTOR DATABASE · BILLION-SCALE
Milvus
An open-source, cloud-native, distributed vector database built by Zilliz. It is designed from day one for tens of billions of vectors, is fully Kubernetes-native, and offers optional GPU acceleration for high-throughput similarity search. It is the right retrieval layer for hospital-system-scale corpora and FHIR-grounded medical RAG when the workload outgrows single-node engines.
Milvus Distributed handles datasets from 100M vectors up to tens of billions. Sharded storage architecture; storage, query, and coordination scale independently. Designed for the corpora that don't fit on a single node.
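Simple arithmetic shows where the single-node ceiling comes from. The sketch below is minimal and makes several assumptions: embeddings are float32 at 768 dimensions, and it counts raw vectors only, with no index structures, replicas, or scalar fields. The 256 GB per-node budget is a hypothetical planning figure.

```python
# Back-of-envelope sizing for raw float32 embeddings only: no index overhead,
# no replicas, no scalar fields. The per-node budget is a hypothetical planning figure.

BYTES_PER_FLOAT32 = 4
GB = 10**9
TB = 10**12


def raw_vector_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed to hold the raw float32 vectors alone."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def nodes_needed(num_vectors: int, dim: int, node_budget_bytes: int) -> int:
    """Minimum node count if each node can devote node_budget_bytes to raw vectors."""
    total = raw_vector_bytes(num_vectors, dim)
    return -(-total // node_budget_bytes)  # ceiling division on integers


print(raw_vector_bytes(100_000_000, 768) / GB)  # 307.2
print(raw_vector_bytes(1_000_000_000, 768) / TB)  # 3.072
print(nodes_needed(1_000_000_000, 768, 256 * GB))  # 12
```

Index structures, replicas, and payload fields only push these numbers higher. That growth is the pressure sharding relieves.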
Fully Kubernetes-native and distributed from day one. Compute is separated from storage: vector and scalar data persist in object storage (S3 / MinIO), while etcd holds cluster metadata. It requires a real ops team; that is both the cost and the value.
Milvus supports GPU-accelerated index building and similarity search through its NVIDIA cuVS / RAFT integration. That makes it one of the strongest open-source options for high-throughput search when GPU capacity is available.
Milvus Lite (embedded, for prototyping), Milvus Standalone (single-node for pilots), Milvus Distributed (production, billion-vector scale). The mode you choose is a real architectural decision.
What Milvus actually is
Milvus is an open-source vector database originally built by the Zilliz team in 2019 and now under LF AI & Data Foundation governance. The core differentiator versus Qdrant is architectural: Milvus is distributed by design, with separated storage and compute layers, sharded ANN indexes, and a fully Kubernetes-native operating model. That makes it heavier to run but also the most credible open-source option once a corpus passes the billion-vector mark or once HA / multi-region replication is a binding requirement.
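In a sharded index, a search is scattered to the shards and every shard returns its local top-k. A coordinating layer then reduces those lists to a global top-k. The sketch below is a minimal, generic version of that reduce step, not Milvus's own code. It assumes distance scores where smaller means closer.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# (distance, primary_key); smaller distance = closer, as with L2.
# For similarity metrics where larger is better (IP, COSINE), negate the score first.
Hit = Tuple[float, int]


def merge_shard_results(per_shard: List[List[Hit]], k: int) -> List[Hit]:
    """Reduce per-shard local top-k lists into one global top-k."""
    # heapq.merge expects each input sorted ascending; sort defensively.
    sorted_lists = [sorted(hits) for hits in per_shard]
    return list(islice(heapq.merge(*sorted_lists), k))


shard_a = [(0.12, 101), (0.40, 102), (0.55, 103)]
shard_b = [(0.05, 201), (0.33, 202), (0.90, 203)]
print(merge_shard_results([shard_a, shard_b], k=3))
# [(0.05, 201), (0.12, 101), (0.33, 202)]
```

The merge is exact as long as every shard returns at least k hits. Any member of the global top-k must also be in its own shard's local top-k.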
For healthcare specifically, Milvus is one of the engines most often used in published FHIR-grounded medical RAG examples. Public examples from 2025–26 include:

- time-aware patient-history search built on FHIR + Milvus + BGE embeddings;
- visual medication-identification systems combining CLIP and Milvus;
- large-scale knowledge bases that fuse internal clinical content with licensed external evidence.

Zilliz, the commercial company behind Milvus, publishes extensive documentation and maintains active integrations with the open-source stack commonly used in healthcare (Docling, Haystack, LangChain, LlamaIndex).
What Milvus is not: the right choice when the workload comfortably fits on a single node and operational simplicity matters more than horizontal-scale headroom. For most departmental hospital RAG deployments, Qdrant is the better starting point. Reach for Milvus when the corpus is system-wide and growing, when GPU-accelerated search is on the table, or when the org already operates a Kubernetes platform team.
Deployment posture
Milvus is delivered in three deployment modes: Milvus Lite (embedded Python library, for prototypes and Jupyter notebooks), Milvus Standalone (single-node Docker, for pilots up to ~100M vectors), and Milvus Distributed (Kubernetes-native, for production billion-vector workloads). Production Milvus uses S3-compatible object storage for persistence (MinIO is the typical on-prem choice), etcd for metadata, and a separation of query / index / coordinator pods that lets each scale independently.
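The mode choice can be made explicit in code. The helper below is hypothetical and encodes the thresholds stated above: prototype use, a ~100M-vector single-node ceiling, and an HA requirement. The URIs are illustrative:

- a local file path for Milvus Lite;
- the default Milvus port 19530 for a server;
- `milvus.example.internal` as a placeholder for a real cluster endpoint.

```python
from dataclasses import dataclass

# Hypothetical helper; the ~100M ceiling is this profile's rule of thumb, not a Milvus limit.
STANDALONE_CEILING = 100_000_000


@dataclass
class ModeChoice:
    mode: str
    example_uri: str  # illustrative value only


def choose_mode(num_vectors: int, prototype: bool, needs_ha: bool) -> ModeChoice:
    if prototype:
        # Milvus Lite is embedded and persists to a local file.
        return ModeChoice("Milvus Lite", "./milvus_demo.db")
    if needs_ha or num_vectors > STANDALONE_CEILING:
        # Standalone is single-node, so HA or very large corpora point to Distributed.
        return ModeChoice("Milvus Distributed", "http://milvus.example.internal:19530")
    return ModeChoice("Milvus Standalone", "http://localhost:19530")


choice = choose_mode(num_vectors=40_000_000, prototype=False, needs_ha=False)
print(choice.mode, choice.example_uri)  # Milvus Standalone http://localhost:19530
```

The chosen URI is what would be passed to PyMilvus as `MilvusClient(uri=...)`. Application code written against `MilvusClient` can largely stay the same as a project moves from Lite to Standalone to Distributed.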
Native gRPC API, plus a RESTful API. Official SDKs include Python, Java, Go, Node.js, and C#; PyMilvus, the Python SDK, is the most widely used client. Integrations exist for Haystack, LangChain, LlamaIndex, and Docling.
HNSW, IVF_FLAT, IVF_PQ, DiskANN, SCANN, and GPU-accelerated cuVS-backed indexes: one of the widest index palettes in the open-source category.
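A starting-point index configuration per collection might look like the following. This is a hypothetical heuristic. The `index_type` names and parameter keys follow Milvus's documented index parameters, but the thresholds and numeric values are assumptions to tune, not recommendations from the Milvus project.

```python
from typing import Dict

# Hypothetical selection heuristic. Index names and parameter keys follow Milvus's
# documented index parameters; thresholds and values are illustrative starting points.
# Default metric is IP: on L2-normalized embeddings it ranks identically to cosine.


def suggest_index(num_vectors: int, fits_in_ram: bool, gpu_available: bool,
                  metric: str = "IP") -> Dict:
    if gpu_available:
        return {"index_type": "GPU_CAGRA", "metric_type": metric,
                "params": {"intermediate_graph_degree": 64, "graph_degree": 32}}
    if not fits_in_ram:
        # Disk-resident graph index for corpora larger than memory.
        return {"index_type": "DISKANN", "metric_type": metric, "params": {}}
    if num_vectors > 100_000_000:
        # Product quantization trades some recall for memory; m must divide the dimension.
        return {"index_type": "IVF_PQ", "metric_type": metric,
                "params": {"nlist": 4096, "m": 64, "nbits": 8}}
    return {"index_type": "HNSW", "metric_type": metric,
            "params": {"M": 16, "efConstruction": 200}}


print(suggest_index(5_000_000, fits_in_ram=True, gpu_available=False)["index_type"])  # HNSW
```

The returned dictionary mirrors the arguments that `add_index` takes on the object returned by PyMilvus `MilvusClient.prepare_index_params()`, with the field name supplied separately.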
Vector and scalar data persist in S3-compatible object storage (MinIO on-prem); etcd stores cluster metadata and handles coordination. This separation is what makes billion-vector horizontal scaling possible.
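Horizontal scaling rests on spreading entities across shards. By default Milvus assigns each inserted entity to a shard by hashing its primary key. The sketch below shows the idea with `zlib.crc32`, a stand-in chosen only because it is deterministic. It is not the hash function Milvus uses internally.

```python
import zlib
from collections import Counter
from typing import Iterable

# Conceptual sketch of primary-key hash sharding; zlib.crc32 is an assumption,
# not Milvus's internal hash.


def shard_for(primary_key: int, num_shards: int) -> int:
    """Map an int64 primary key to a shard index in [0, num_shards)."""
    return zlib.crc32(primary_key.to_bytes(8, "little", signed=True)) % num_shards


def shard_histogram(keys: Iterable[int], num_shards: int) -> Counter:
    """Count how many keys land on each shard, to eyeball balance."""
    return Counter(shard_for(k, num_shards) for k in keys)


print(sorted(shard_histogram(range(10_000), 4).items()))
```

The shard is a pure function of the key and the shard count, so the count is fixed when a collection is created. A well-mixed hash keeps shard sizes roughly even without any coordination at write time.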
Production Milvus expects a Kubernetes platform team. That is both the cost (compared with Qdrant's single-binary simplicity) and the value (compared with the ceiling of a single-node Qdrant deployment).
Healthcare fit
Milvus is the right retrieval layer when a healthcare workload genuinely needs billion-vector scale, multi-team concurrent usage at high throughput, GPU-accelerated similarity search, or operational architectures already aligned with Kubernetes. Real-world deployments are growing: FHIR-grounded patient history retrieval, multi-modal medical RAG (CLIP + Milvus for pill / pathology image search), large-scale literature search, and hospital-system knowledge bases that combine internal SOPs with licensed external evidence.
- Good fit: system-wide RAG over EHR, claims, imaging metadata, and literature where corpus size genuinely exceeds single-node capacity.
- Good fit: teams that already run Kubernetes and prefer to standardize one data-platform pattern across services.
- Good fit: GPU-rich environments where cuVS-accelerated indexing and search materially improve cost-per-query.
- Good fit: FHIR-grounded medical RAG patterns where the published reference architecture explicitly uses Milvus.
- Bad fit: single-department pilots where Qdrant's operational simplicity wins. Rule of thumb: under 100M vectors and no HA requirement, start with Qdrant.
- Bad fit: teams without a Kubernetes ops capability. Milvus Distributed is not a single-binary deployment.
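The rule of thumb in the bad-fit list can be written as a small decision function. The helper is hypothetical, and its 100M threshold is this profile's rule of thumb rather than a limit of either engine.

```python
from typing import List, Tuple

# Hypothetical encoding of this profile's rule of thumb; not a Milvus or Qdrant limit.
QDRANT_FIRST_CEILING = 100_000_000


def recommend_engine(num_vectors: int, needs_ha: bool, has_k8s_team: bool) -> Tuple[str, List[str]]:
    """Return the suggested engine plus any blockers that must be resolved first."""
    if num_vectors < QDRANT_FIRST_CEILING and not needs_ha:
        return "Qdrant", []
    blockers: List[str] = []
    if not has_k8s_team:
        # Milvus Distributed is not a single-binary deployment.
        blockers.append("no Kubernetes ops capability")
    return "Milvus Distributed", blockers


print(recommend_engine(20_000_000, needs_ha=False, has_k8s_team=False))
# ('Qdrant', [])
print(recommend_engine(2_000_000_000, needs_ha=True, has_k8s_team=False))
# ('Milvus Distributed', ['no Kubernetes ops capability'])
```

The function returns blockers instead of silently downgrading the recommendation. That keeps the Kubernetes prerequisite visible as a separate decision.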
Privacy and governance
Milvus runs entirely on customer-controlled infrastructure in the self-hosted deployment — no telemetry by default, no remote dependency, all data on local Kubernetes / object storage. The data-handling posture is consistent with HIPAA / PIPEDA / PHIPA / Quebec Law 25 environments, with the caveat that a real Milvus deployment has more moving parts than a single Qdrant binary, and each component (etcd, MinIO, query nodes, coordinator) needs its own network-boundary, access-control, and audit story.
Governance for Milvus is the same shape as for any distributed data system: schema ownership, ingestion-side population of permission metadata, query-side authorization at the gateway, audit log retention, backup and restore policy, and a documented capacity model. The engine does retrieval; the operator owns the operating model.
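The ingestion-side and query-side halves of that governance model can be sketched together. At ingestion, every chunk carries permission metadata. At query time, the gateway turns the caller's entitlements into a Milvus boolean filter expression.

This is a minimal sketch built on a hypothetical schema. The `id` and `vector` field names match the defaults of a PyMilvus quick-setup collection; `department` and `sensitivity_level` are illustrative. The stable-ID scheme is also an assumption.

```python
import hashlib
import re
from typing import Dict, Iterable, List

# Hypothetical schema: "id" and "vector" match PyMilvus quick-setup defaults;
# "text", "department" and "sensitivity_level" are illustrative scalar fields.

_SAFE_VALUE = re.compile(r"[A-Za-z0-9_-]+")


def stable_int64_id(source_id: str, chunk_index: int) -> int:
    """Deterministic non-negative int64 key, so re-ingesting a chunk upserts the same row."""
    digest = hashlib.sha256(f"{source_id}#{chunk_index}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> 1  # keep within signed int64 range


def to_rows(source_id: str, department: str, sensitivity_level: int,
            chunks: List[str], vectors: List[List[float]]) -> List[Dict]:
    """Ingestion side: attach permission metadata to every chunk before insert."""
    if len(chunks) != len(vectors):
        raise ValueError("each chunk needs exactly one embedding")
    return [
        {
            "id": stable_int64_id(source_id, i),
            "vector": vec,
            "text": chunk,
            "department": department,
            "sensitivity_level": sensitivity_level,
        }
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ]


def permission_filter(departments: Iterable[str], max_sensitivity: int) -> str:
    """Query side: build a Milvus boolean filter from the caller's entitlements."""
    depts = sorted(set(departments))
    if not depts:
        raise PermissionError("caller has no department entitlements")
    for d in depts:
        # Validate rather than escape: only plain identifiers may reach the expression.
        if not _SAFE_VALUE.fullmatch(d):
            raise ValueError(f"unsafe department value: {d!r}")
    dept_list = ", ".join(f'"{d}"' for d in depts)
    return f"department in [{dept_list}] and sensitivity_level <= {int(max_sensitivity)}"


print(permission_filter(["oncology", "cardiology"], 2))
# department in ["cardiology", "oncology"] and sensitivity_level <= 2
```

With PyMilvus, the rows would go to `client.upsert(collection_name=..., data=rows)` and the expression to the `filter` argument of `client.search(...)`. The entitlements themselves must come from the authenticated session at the gateway, never from the user's query text.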
Strengths and limitations
Among the highest scale ceilings of any open-source vector database; credibly handles billions of vectors. Kubernetes-native architecture that aligns with existing hospital-system platform teams. GPU-accelerated indexing and search via cuVS / RAFT. One of the widest index families. Strong reference architectures for FHIR-grounded medical RAG. Backed by Zilliz, with commercial support available if needed.
Higher operational complexity than Qdrant; not a single-binary deployment. The Kubernetes prerequisite is real. Less polished metadata-filter integration than Qdrant for permission-aware retrieval. Steeper learning curve for teams new to vector databases. Overkill for departmental pilots under ~100M vectors.
Where Milvus fits in a hospital stack
| Layer | What Milvus contributes | What still has to be solved |
|---|---|---|
| Vector retrieval at scale | Billion-vector horizontal scaling on Kubernetes; storage / compute separation; GPU-accelerated search. | Capacity planning, ops team, backup and restore strategy, multi-region replication policy. |
| Index palette | HNSW, IVF, DiskANN, SCANN, GPU-accelerated cuVS — pick by recall / latency / cost trade-off. | Choosing the right index per collection; periodic re-indexing as the model or corpus evolves. |
| Healthcare reference patterns | FHIR-grounded patient history search, CLIP + Milvus visual medical RAG, large multi-modal corpora. | FHIR ingestion pipeline, embedding strategy per modality, evaluation per use case. |
| Embeddings | None — bring your own embedding model served through Ollama, vLLM, or LocalAI. | Embedding choice (BGE, E5, MedCPT, nomic-embed-text), evaluation, re-indexing cadence. |
| Orchestration | None — pair with Haystack or LangChain for the full RAG pipeline. | Workflow design, error handling, evaluation rubric, audit trail. |
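The index-palette row above describes a recall / latency / cost trade-off, and simple arithmetic can estimate the cost side. This minimal sketch assumes float32 embeddings and 8-bit PQ codes. It ignores codebooks, IVF list overhead, and graph links. Whether a compressed index still meets a recall target has to be measured per collection.

```python
# Per-vector storage: raw float32 (as kept by FLAT / IVF_FLAT) versus IVF_PQ codes.
# PQ stores m sub-quantizer codes of nbits each; codebooks and list overhead are ignored.


def flat_bytes_per_vector(dim: int) -> int:
    return dim * 4  # float32


def pq_bytes_per_vector(dim: int, m: int, nbits: int = 8) -> float:
    if dim % m != 0:
        raise ValueError("m must divide the vector dimension")
    return m * nbits / 8


def compression_ratio(dim: int, m: int, nbits: int = 8) -> float:
    return flat_bytes_per_vector(dim) / pq_bytes_per_vector(dim, m, nbits)


print(flat_bytes_per_vector(768), pq_bytes_per_vector(768, 64))  # 3072 64.0
print(compression_ratio(768, 64))  # 48.0
```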
Milvus is the heavyweight retrieval layer. It is the right answer when a hospital corpus genuinely outgrows single-node engines or when Kubernetes-native operational architecture is already a given. For most starting-point hospital workloads, Qdrant is the lighter choice; graduate to Milvus when the workload earns it.
Quick facts
| Field | Detail |
|---|---|
| Project | Milvus (open-source, Apache 2.0). Under LF AI & Data Foundation governance. Commercial support: Zilliz. GitHub: milvus-io/milvus. |
| Type | Distributed, Kubernetes-native vector database with separated storage and compute and a wide index palette. |
| Deployment modes | Milvus Lite (embedded). Milvus Standalone (single-node Docker). Milvus Distributed (production Kubernetes). |
| Indexes | HNSW, IVF_FLAT, IVF_PQ, DiskANN, SCANN, GPU-accelerated (cuVS / RAFT). |
| Storage | S3-compatible object store (MinIO on-prem). etcd for coordination. |
| Scale class | Tens of billions of vectors. Multi-shard, multi-replica. Designed for HA and horizontal scaling. |
| Healthcare reference patterns | FHIR-grounded medical RAG (BGE + Milvus), CLIP + Milvus visual medication ID, Docling + Milvus document RAG. |
| Website | milvus.io · GitHub: github.com/milvus-io/milvus |
Use Milvus when scale and Kubernetes are both already in the picture
Milvus is the retrieval layer to reach for when the hospital workload genuinely exceeds single-node capacity, or when the org already operates Kubernetes and wants one distributed pattern across services. Moneli Automation's typical Milvus deployment is paired with FHIR-grounded ingestion, BGE or MedCPT embeddings, and Haystack orchestration — the same shape as a Qdrant deployment, scaled up.
Further reading
- Milvus official site
- Milvus on GitHub
- Milvus deployment options overview
- Build RAG with Milvus (official tutorial)
- FHIR-powered medical RAG with Milvus (DEV community)
- Visual RAG for medication safety with CLIP and Milvus
- Qdrant profile — the operationally simpler counterpart for sub-billion workloads
- Haystack profile — the orchestration layer Milvus most often pairs with