OpenSearch — Open-Source Search + Vector for Hospital Hybrid Retrieval

License

Apache 2.0

A permissive open-source license. Elastic's move away from Apache 2.0 was the original reason for the 2021 fork, and license stability is a practical reason large healthcare and public-sector organizations adopted OpenSearch.

Vector engines

3

Lucene (smart pre-filter, smaller deployments), Faiss (large-scale, recommended for billions of vectors), nmslib (HNSW classic). Pick by scale and filter shape: Lucene's auto-selecting filter strategy is particularly useful for permission-aware retrieval.

Performance gap

Up to 12×

Elastic's own published benchmarks claim Elasticsearch is up to 12× faster than OpenSearch for vector search. Independent benchmarks vary widely; for hospital-scale workloads the gap is rarely the binding constraint, but buyers should measure on their own corpus.

Built-in security

No premium

OpenSearch ships RBAC, field-level security, audit logging, encryption-at-rest, and SAML / OIDC integration without a premium subscription. This closes the historical Elastic Security pricing gap that drove many organizations off X-Pack.

What OpenSearch actually is

OpenSearch is a distributed search and analytics engine forked from Elasticsearch 7.10 in 2021 by AWS and the broader community after Elastic moved Elasticsearch off Apache 2.0 to a dual SSPL / Elastic License model. It is now developed under the OpenSearch Software Foundation (a Linux Foundation project) and is the search layer inside Amazon OpenSearch Service, as well as many self-hosted deployments. It shares with Elasticsearch the Lucene heritage, the JSON query DSL, the indices / shards / replicas model, and much of the ecosystem: Logstash and Beats compatibility (version-dependent) and OpenSearch Dashboards, a fork of Kibana.

For healthcare buyers building a private RAG stack, OpenSearch's value is hybrid retrieval. The k-NN plugin adds a knn_vector data type and a choice of three indexing engines: Lucene (good for smaller deployments, with automatic pre-/post-filter selection), Faiss (recommended for billions of vectors), and nmslib (the original HNSW engine, marked deprecated in recent releases, so new indices should prefer Lucene or Faiss). Combine vector search with standard BM25 keyword scoring in one query, layer on the existing filter machinery, and you have a permission-aware hybrid retrieval system without operating two separate databases.
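A minimal sketch of what this looks like at the API level. It only builds the JSON request bodies and sends nothing. The index fields (title, body, department, embedding), the 384-dimension embedding size, and the 0.3 / 0.7 weights are assumptions for illustration, not recommendations. The shapes follow the k-NN plugin's knn_vector mapping and the hybrid query type, which in recent OpenSearch releases is paired with a search pipeline containing a normalization processor.

```python
import json

EMBEDDING_DIM = 384  # assumption: set this to your embedding model's output size

# Index body: enable k-NN and declare one knn_vector field beside the text fields.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "department": {"type": "keyword"},
            "embedding": {
                "type": "knn_vector",
                "dimension": EMBEDDING_DIM,
                "method": {
                    "name": "hnsw",
                    "engine": "lucene",
                    "space_type": "l2",
                    "parameters": {"m": 16, "ef_construction": 128},
                },
            },
        }
    },
}

# Search pipeline body: normalizes and combines the sub-query scores.
# The weights are illustrative and should come from your own evaluation.
search_pipeline_body = {
    "description": "Hybrid score normalization (illustrative weights)",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    "parameters": {"weights": [0.3, 0.7]},
                },
            }
        }
    ],
}


def hybrid_query(text, vector, k=10):
    """Build a hybrid query body: one BM25 clause plus one k-NN clause."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError("query vector has the wrong dimension")
    return {
        "size": k,
        "query": {
            "hybrid": {
                "queries": [
                    {"match": {"body": {"query": text}}},
                    {"knn": {"embedding": {"vector": vector, "k": k}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    example = hybrid_query("anticoagulation dosing", [0.0] * EMBEDDING_DIM, k=5)
    print(json.dumps(example)[:120])
```

The order of the weights matches the order of the sub-queries, so 0.3 applies to the keyword clause and 0.7 to the vector clause in this sketch.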

What OpenSearch is not: a pure vector database. Qdrant and Milvus are faster and more memory-efficient for pure k-NN at scale. OpenSearch wins when the workload genuinely needs hybrid keyword-and-vector retrieval, when the operator already runs Elastic-style infrastructure, or when the built-in RBAC and audit-logging story is a procurement advantage.

Deployment posture

OpenSearch runs as a self-hosted cluster on Linux (Docker, RPM, Kubernetes via the official Helm chart) and is also available as Amazon OpenSearch Service for organizations that are comfortable with cloud hosting. The on-prem deployment is a standard distributed-cluster shape: cluster-manager nodes (called master nodes in older releases), data nodes, ingest nodes, and optional coordinating nodes, with replicas across availability zones. Storage is on attached block volumes.

SURFACE

REST API + clients

JSON REST API on port 9200 with the familiar Elasticsearch query DSL. Native clients in most languages. OpenSearch Dashboards (Kibana fork) for visualization on port 5601.
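As a rough illustration of the REST surface, the sketch below builds (but does not send) a search request using only the Python standard library. The host, the index name clinical-guidelines, and the field name body are hypothetical; a real deployment would normally use an official client and configure TLS and credentials.

```python
import json
import urllib.request

# Assumptions: a local development node and a hypothetical index name.
BASE_URL = "https://localhost:9200"
INDEX = "clinical-guidelines"


def build_search_request(query_body, base_url=BASE_URL, index=INDEX):
    """Return an unsent POST request for the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request({"query": {"match": {"body": "sepsis bundle"}}})
# Sending it would be urllib.request.urlopen(request), with TLS and
# credentials configured for your cluster; that step is omitted here.
```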

VECTOR

k-NN plugin

Adds knn_vector field type, three vector engines (Lucene / Faiss / nmslib), approximate-and-exact k-NN, hybrid keyword-and-vector queries, and integrated filter handling.
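A sketch of the integrated filter handling, assuming a hypothetical knn_vector field named embedding and a keyword field named department. The filter sits inside the k-NN clause, so the engine can decide how to apply it during the vector search instead of filtering results afterwards. Support for this form depends on the engine and the OpenSearch version in use.

```python
def filtered_knn_query(vector, allowed_departments, k=10):
    """Build a k-NN query restricted to documents the caller may see."""
    if not allowed_departments:
        # An empty allow-list should return nothing rather than everything.
        raise ValueError("allowed_departments must not be empty")
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {  # hypothetical knn_vector field name
                    "vector": list(vector),
                    "k": k,
                    "filter": {
                        "bool": {
                            "filter": [
                                {"terms": {"department": list(allowed_departments)}}
                            ]
                        }
                    },
                }
            }
        },
    }
```

In a permission-aware application, the allowed_departments list would typically be derived from the caller's identity, not supplied by the caller.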

SECURITY

Included

RBAC, field-level security, document-level security, audit logging, encryption at rest, SAML / OIDC integration. The big advantage versus historical Elastic deployments that required X-Pack.

SCALE

Cluster-grade

Designed for sharded, replicated, multi-node deployments from day one. Hospital-system corpora with billions of documents and billions of vectors are well within range, with the standard cluster-ops overhead.
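For capacity planning at this scale, the k-NN documentation gives a rule of thumb for HNSW graph memory of roughly 1.1 × (4 × dimension + 8 × M) bytes per vector. The sketch below applies that estimate; the corpus size, dimension, and replica count are assumptions, and the result is a first approximation that ignores quantization, heap, and the rest of the index.

```python
def hnsw_bytes_per_vector(dimension, m=16):
    """Approximate native memory per vector: 1.1 * (4 * d + 8 * M) bytes."""
    return 1.1 * (4 * dimension + 8 * m)


def hnsw_cluster_memory_gib(num_vectors, dimension, m=16, replicas=1):
    """Estimated HNSW memory in GiB across primary and replica copies."""
    if replicas < 0:
        raise ValueError("replicas must be zero or more")
    copies = 1 + replicas  # each replica holds its own copy of the graph
    total_bytes = num_vectors * hnsw_bytes_per_vector(dimension, m) * copies
    return total_bytes / 2**30


if __name__ == "__main__":
    # Assumed workload: one billion 768-dimension vectors, one replica.
    estimate = hnsw_cluster_memory_gib(1_000_000_000, 768, m=16, replicas=1)
    print(f"Estimated HNSW memory: {estimate:,.0f} GiB")
```

With these assumed inputs, a single copy of the graph already needs roughly 3.5 TB of memory, which is why billion-vector deployments are usually sharded across many data nodes and often use quantization.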

Healthcare fit

OpenSearch is the right retrieval layer when a hospital workload genuinely benefits from hybrid keyword-and-vector search — clinical guideline retrieval where exact-term match and semantic similarity both matter, claims-data search where structured filters dominate, policy-document Q&A where the existing keyword pipeline is already trusted, or any deployment where the security and audit-logging story is a procurement advantage over a newer vector-only engine.

Good fit: hybrid keyword-plus-vector retrieval over clinical guidelines, policy SOPs, formulary tables, and structured claims data.
Good fit: hospitals that already operate Elastic / OpenSearch for logs, observability, or general search and would rather extend than introduce a separate vector database.
Good fit: environments where field-level security, document-level security, and audit logging are procurement requirements; OpenSearch includes them without a premium tier.
Bad fit: pure-vector workloads at billion-scale where Milvus's GPU-accelerated indexing or Qdrant's single-binary simplicity wins.
Bad fit: small teams without distributed-systems operational capacity. A real OpenSearch cluster is a real cluster.

Privacy and governance

Self-hosted OpenSearch runs entirely on customer-controlled infrastructure with no telemetry. It has the same data-handling shape as any on-prem search cluster: encryption in transit, RBAC, audit logging, and field- / document-level security are bundled into the engine itself, alongside encryption at rest (on self-hosted clusters this is usually provided by the underlying volume or operating system). That is a meaningful advantage over engines that defer identity and audit to a separate gateway: in OpenSearch, access control and audit are enforced inside the engine rather than in a layer in front of it.

For HIPAA / PIPEDA / PHIPA / Quebec Law 25 environments, the typical Moneli Automation pattern is OpenSearch as the retrieval-plus-search-plus-audit substrate, with the standard cluster ops responsibilities (capacity, snapshots, replication) and a clinician-facing application that consumes OpenSearch via service accounts scoped to specific indices and field-level policies. The engine carries more of the governance weight than the lighter vector-only alternatives.
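A sketch of what a service-account role scoped to specific indices and field-level policies might look like. The body follows the Security plugin's role format, which is created with a PUT to _plugins/_security/api/roles/<role-name>. The index pattern, department value, and field names are hypothetical.

```python
import json


def read_only_role(index_pattern, department, hidden_fields):
    """Build a role body with document-level and field-level restrictions."""
    # Document-level security: only documents matching this query are visible.
    dls_query = {"term": {"department": department}}
    return {
        "cluster_permissions": [],
        "index_permissions": [
            {
                "index_patterns": [index_pattern],
                # The Security plugin expects the DLS query as a JSON string.
                "dls": json.dumps(dls_query),
                # A leading "~" excludes a field; all other fields stay visible.
                "fls": [f"~{name}" for name in hidden_fields],
                "allowed_actions": ["read"],
            }
        ],
    }


role_body = read_only_role(
    index_pattern="guidelines-*",       # hypothetical index pattern
    department="cardiology",            # hypothetical department value
    hidden_fields=["reviewer_notes"],   # hypothetical sensitive field
)
```

Mapping the clinician-facing application's service account to a role like this keeps the permission check inside the engine, so the application cannot retrieve documents or fields outside its scope even if its own query logic is wrong.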

Strengths and limitations

STRENGTHS

Why hospital stacks pick it

Hybrid keyword + vector retrieval in one engine. RBAC, field-level security, document-level security, audit logging, and SAML / OIDC included without premium. Apache 2.0 license. Familiar to teams that already run Elastic; mature operational tooling (snapshots, Index State Management, dashboards). Three vector-engine choices give scale flexibility from small Lucene-backed deployments to billion-vector Faiss-backed ones.

LIMITATIONS

Where it does not fit

Pure-vector throughput lower than Qdrant or Milvus. Higher operational overhead than a single-binary engine. Elastic's published benchmarks claim Elasticsearch is up to 12× faster for vector search; independent results vary. Cluster operations are real distributed-systems work — not the right starting point for a small pilot. Vector roadmap moves more slowly than dedicated vector databases.

Where OpenSearch fits in a hospital stack

Layer	What OpenSearch contributes	What still has to be solved
Hybrid retrieval	Keyword + vector search in one query, integrated with filters, RBAC, and field-level security.	Embedding model choice, chunking strategy, hybrid scoring weights, evaluation harness.
Security + audit	RBAC, field-/document-level security, audit logging, encryption — built into the engine.	Identity provider integration, retention policy, audit-trail review workflow.
Cluster ops	Standard distributed-search operating model — sharding, replication, snapshots, dashboards.	Capacity planning, snapshot strategy, version upgrades, multi-AZ replication.
Embeddings	None — bring your own embedding model served through Ollama, vLLM, or LocalAI.	Embedding choice, dimensionality, re-indexing cadence.
Orchestration	None — pair with Haystack or LangChain for full RAG pipelines.	Workflow design, evaluation, audit trail.
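The "hybrid scoring weights" item in the table is easier to reason about with a worked example. The sketch below is a simplified, standalone illustration of min-max normalization followed by a weighted arithmetic mean, which is the general idea behind combining BM25 and vector scores. It is not OpenSearch's internal implementation, and the default weights are assumptions that an evaluation harness should tune.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same top score.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(keyword_scores, vector_scores, w_keyword=0.3, w_vector=0.7):
    """Weighted mean of normalized scores, returned highest first.

    A document missing from one result list contributes 0.0 for that list
    (a simplifying assumption made for this sketch).
    """
    norm_kw = min_max(keyword_scores)
    norm_vec = min_max(vector_scores)
    combined = {
        doc: w_keyword * norm_kw.get(doc, 0.0) + w_vector * norm_vec.get(doc, 0.0)
        for doc in set(norm_kw) | set(norm_vec)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25 = {"guideline-a": 10.0, "guideline-b": 5.0, "guideline-c": 0.0}
    knn = {"guideline-a": 0.2, "guideline-b": 0.9, "guideline-d": 0.5}
    for doc, score in combine(bm25, knn):
        print(doc, round(score, 3))
```

Normalization matters because raw BM25 scores are unbounded while vector similarity scores usually fall in a narrow range; without it, one signal would dominate regardless of the weights.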

OpenSearch is the hybrid-search retrieval layer. It is the right answer when keyword and vector queries genuinely matter together, when the operator already runs Elastic-style infrastructure, or when bundled security and audit features are a procurement advantage. For pure vector workloads at large scale, Qdrant or Milvus is the better fit.

Quick facts

Project	OpenSearch (Apache 2.0). Forked from Elasticsearch 7.10 in 2021. Now under the OpenSearch Foundation (Linux Foundation). GitHub: opensearch-project/OpenSearch.
Type	Distributed search and analytics engine with k-NN vector search.
Vector engines	Lucene (smart filter selection, smaller deployments), Faiss (large-scale, recommended for billions of vectors), nmslib (HNSW reference).
Hybrid search	Keyword (BM25) + vector + filter combinations in a single query.
Security	RBAC, field-level security, document-level security, audit logging, encryption at rest, encryption in transit, SAML / OIDC. Included without premium.
Companion tooling	OpenSearch Dashboards (Kibana fork), Data Prepper, Logstash compatibility, ML Commons (in-cluster ML), Anomaly Detection.
Reference deployment shape	Multi-node Kubernetes cluster, sharded indices, replicated across nodes. SAML / OIDC for identity, field-level policies for permission-aware retrieval.
Website	opensearch.org · GitHub: github.com/opensearch-project/OpenSearch

Use OpenSearch when hybrid search and built-in security matter

OpenSearch is the retrieval layer to reach for when the workload is genuinely hybrid keyword-and-vector, when the bundled security model is a procurement advantage, or when the operator already runs Elastic-style infrastructure. Moneli Automation's typical OpenSearch deployment pairs it with vLLM-served embeddings, Haystack for orchestration, and the in-engine field-level security model for permission-aware retrieval.
