GLOSSARY · BUYER-SIDE DEFINITIONS · 12 min

Healthcare AI Glossary

Plain-language definitions for the ~70 terms that come up in clinical AI procurement: model architecture, retrieval, speech and multimodal, healthcare standards, deployment and infrastructure, evaluation and safety, regulatory and privacy. Each term links to where it shows up in the rest of the directory, so the glossary doubles as a navigation index.

A

Agentic AI: An AI system that plans and executes multi-step tasks with limited human input — calling tools, making decisions between steps, looping until a goal is met. Distinct from a chat assistant, which responds turn-by-turn. Agentic patterns are the largest 2026 healthcare-AI trend per Deloitte and Becker's.
AI governance: The policy and operational layer that decides which AI tools are allowed, how they are evaluated, how their use is logged, and how problems are escalated. Distinct from the technology itself — the governance layer is what makes a deployment auditable.
Air-gapped: A system that has no network connectivity to anything outside the organization's network boundary. Ollama and llama.cpp support air-gapped operation; vendor cloud scribes do not.
Ambient scribe: An AI tool that listens to a patient encounter and produces a draft clinical note. Modern ambient scribes also draft after-visit summaries, code suggestions, and referral letters from the same audio. See the AI Scribes category guide.
ASR (Automatic Speech Recognition): Speech-to-text. The first stage in an ambient scribe pipeline before the language model summarizes the transcript into a note. Whisper is the open-source baseline; most commercial scribes use it.
Audit log: A record of what the system did, when, for whom, and on what data. PHIPA Section 12, Quebec Law 25, and Alberta HIA all impose audit-trail expectations. The buyer should confirm the audit log is exportable, not just internal to the vendor.
Automation bias: The well-documented tendency of clinicians (and reviewers generally) to accept a plausibly-worded pre-filled draft more readily than they would re-derive the same content from scratch. The reason "signature is review" is not a safe AI-scribe workflow. See the hallucination and omission reference.

B

BAA (Business Associate Agreement): The HIPAA contract that lets a vendor handle PHI on the hospital's behalf. Required for every U.S. healthcare cloud relationship. The BAA's specific terms — audit access, breach notification, retention — are what matter; "HIPAA-compliant" is not by itself an answer. Question 1 of the RFP checklist.
Benchmark: A standardized test of an AI system's capability or performance — accuracy on a held-out clinical-summarization corpus, throughput in tokens per second on a GPU, latency at p95. Vendor benchmarks should be reproduced on the buyer's own audio / corpus before trust.
BERT: A 2018 transformer-encoder model from Google, still the basis of many embedding models. Variants like BioBERT and PubMedBERT add medical-domain pretraining; useful as embedding backbones for clinical RAG.
BM25: The classical keyword-search ranking algorithm. Used in hybrid retrieval alongside vector search — keyword exact-match plus semantic similarity. Built into OpenSearch by default.
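
The keyword half of hybrid retrieval described in the BM25 entry can be sketched in a few lines. This is a minimal illustration of Okapi BM25 scoring, not the OpenSearch implementation: the function name `bm25_scores`, the sample sentences, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for the example, and the IDF term uses the non-negative `log(1 + ...)` variant.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are illustrative values, not tuned defaults for any product.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical note fragments, tokenized by whitespace.
docs = [
    "metformin dose increased to 1000 mg".split(),
    "patient reports improved sleep".split(),
    "continue metformin and recheck a1c".split(),
]
scores = bm25_scores("metformin dose".split(), docs)
```

The first fragment matches both query terms and scores highest, the third matches one term, and the second scores zero. A hybrid retriever would combine a ranking like this with a vector-similarity ranking.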

C

CAI (Commission d'accès à l'information): Quebec's privacy regulator. Enforces Law 25; aggregate fines in Q1 2026 exceeded C$2.3M. The CAI is the body that interprets the algorithmic-transparency requirements in Section 12.
Chunking: Splitting documents into smaller pieces (paragraphs, sections, fixed-size windows) before generating embeddings. Chunking strategy is one of the most important RAG design choices — too-small chunks lose context, too-large chunks dilute similarity.
Continuous batching: An LLM serving technique used by vLLM that fills GPU cycles with incoming requests as previous requests complete, instead of waiting for batches to fill. The complement to PagedAttention; together they explain vLLM's throughput advantage.
Context window: The maximum number of tokens an LLM can attend to in one inference call. Modern models offer 32k-1M token windows; longer windows enable longer RAG chunks and larger chart-context payloads but cost more in latency and memory.
CPT (Current Procedural Terminology): The U.S. medical-procedure coding system used for billing. AI scribes increasingly suggest CPT codes from the encounter — the workflow that justifies the contract on revenue-cycle ROI grounds.
Custodian (PHIPA / HIA): Under PHIPA in Ontario and HIA in Alberta, the "health information custodian" is the party legally accountable for protected health information. Hospitals are typically custodians; AI vendors operating on PHI act as agents or affiliates of the custodian, which keeps the legal responsibility with the hospital.
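
The fixed-size-window strategy named in the Chunking entry can be sketched as follows. This is a minimal example under stated assumptions: the helper `chunk_words`, the word-based window, and the `size` and `overlap` values are hypothetical, and production pipelines often chunk by tokens or by document sections instead.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into windows of `size` words.

    Each window repeats the last `overlap` words of the previous one, so a
    sentence that straddles a boundary stays partly visible in both chunks.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Hypothetical note text used only for illustration.
note = ("Patient presents with intermittent chest pain on exertion. "
        "Pain resolves with rest. No radiation to arm or jaw. "
        "Started aspirin and ordered stress test.")
chunks = chunk_words(note, size=8, overlap=2)
```

Raising `size` keeps more context in each chunk but dilutes similarity; raising `overlap` reduces boundary losses at the cost of more chunks to embed and store.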

D

DAX Copilot: Microsoft's ambient documentation product (Dragon Ambient eXperience), now folded into Dragon Copilot. The product line behind the UCLA NEJM AI RCT's −1.7% time-in-note result.
Diarization: Speaker attribution in audio — labeling who said what. Whisper does not natively diarize; production pipelines pair it with WhisperX or pyannote.audio.
Distillation: Training a smaller "student" model to imitate a larger "teacher" model. Used to compress capable but expensive frontier models into smaller deployable variants.
DPA (Data Processing Agreement): The EU / UK / AU counterpart to a BAA. It is the contract that lets a vendor process personal data on the customer's behalf under GDPR or analogous law. Vendors operating internationally typically sign a BAA, a DPA, or both, depending on geography.

E

EHR (Electronic Health Record): The hospital's clinical record system. Epic, Oracle Health (formerly Cerner), Meditech, athenahealth, eClinicalWorks, Allscripts, NextGen, and Oscar (Canadian primary care) are the most-named EHRs in the directory's vendor profiles.
E/M (Evaluation and Management): The U.S. CMS coding system for outpatient visit complexity (levels 1-5 for new and established patients). One of the highest-value coding surfaces for AI scribes — accurate E/M coding has direct revenue-cycle impact.
Embedding: A numeric vector representation of a piece of text, image, or audio that encodes its meaning so that similar items end up close together in vector space. The foundation of semantic search and RAG. Healthcare-relevant embedding models include nomic-embed-text, BGE, E5, MedCPT, and PubMedBERT.
Epic Pal: Epic's program for tightly-integrated third-party AI partners. Abridge is the first Epic Pal; the designation is the deepest integration tier Epic offers to scribes.
Evaluation (eval): The structured measurement of an AI system's quality — accuracy, hallucination rate, omission rate, edit distance, citation coverage. A pilot without a written evaluation framework is not a real pilot.
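
To make "close in vector space" from the Embedding entry concrete, the sketch below ranks phrases by cosine similarity. The four-dimensional vectors are hand-made toy values rather than output from any real embedding model, and the names `cosine_similarity` and `toy_embeddings` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors (assumed values); real models emit hundreds
# or thousands of dimensions.
toy_embeddings = {
    "myocardial infarction": [0.9, 0.1, 0.0, 0.2],
    "heart attack": [0.8, 0.2, 0.1, 0.3],
    "ankle sprain": [0.1, 0.9, 0.3, 0.0],
}

query = toy_embeddings["heart attack"]
# Exact (brute-force) nearest-neighbor ranking over the whole collection.
ranked = sorted(
    toy_embeddings,
    key=lambda phrase: cosine_similarity(query, toy_embeddings[phrase]),
    reverse=True,
)
```

The ranking places "myocardial infarction" directly after the query phrase and "ankle sprain" last. This brute-force scan is the exact search that approximate indexes speed up on large collections.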

F

FAccT: The ACM Conference on Fairness, Accountability, and Transparency. The venue for the 2024 study that established the ~1% Whisper hallucination baseline in clinical audio.
Fine-tuning: Continuing the training of a pretrained model on a specific dataset to specialize its behavior — for example, fine-tuning Llama 3.1 on medical-documentation corpora. Distinct from RAG, which leaves the model unchanged and provides context at inference time.
FHIR (Fast Healthcare Interoperability Resources): The modern healthcare data standard for exchanging clinical information. The integration surface most often used by on-prem AI stacks to pull patient context safely. See the on-premise reference architecture.
FP8 / FP16 / BF16: Floating-point number precisions used in LLM inference. FP16 / BF16 is the production default; FP8 (supported by H100's Transformer Engine) trades a small accuracy hit for substantial throughput.

G

GGUF: The model file format used by llama.cpp and Ollama. Self-contained, quantization-aware, optimized for fast load and inference on CPU + GPU.
GPU (A100 / H100 / B200): NVIDIA's data-center GPU line. A100 (40 / 80 GB) is the workhorse for 70B-class models; H100 delivers ~3× more LLM throughput via the Transformer Engine and FP8; B200 (2024-25) extends the curve. See the on-prem reference architecture for sizing.

H

Hallucination: AI-generated content that is not present in the source — a fabricated symptom, an invented diagnosis, a documented physical exam that never happened. Published rate: 1.47% on the npj Digital Medicine framework, 44% of which are classified as major.
HIA (Alberta Health Information Act): Alberta's provincial health-privacy statute. Section 20 imposes consent rules for AI processing of health information; Section 60 governs custodian / affiliate accountability. See the Canadian compliance hub.
HIPAA: The U.S. Health Insurance Portability and Accountability Act. The federal floor for PHI protection. The 2026 HIPAA Security Rule update added explicit AI-deployment expectations. Compliance is necessary but is not by itself sufficient to satisfy Canadian provincial requirements.
HL7: The older, message-based predecessor to FHIR for clinical data exchange. Many hospital integrations are still HL7 v2; FHIR is the newer standard most modern AI integrations target.
HNSW (Hierarchical Navigable Small World): The dominant approximate-nearest-neighbor index for vector search. Qdrant, Milvus, and OpenSearch all support HNSW; differences are in filter integration and quantization options.

I

ICD-10: The 10th revision of the International Classification of Diseases coding system. The diagnosis-coding output most AI scribes produce alongside the visit note.
I-PASS: A structured handoff framework whose name spells out its steps (Illness severity, Patient summary, Action list, Situation awareness and contingency planning, Synthesis by receiver). The published 2014 NEJM trial reported a 30% reduction in preventable adverse events; multi-site replications report 47%. See the Handoff Tools category page.
Inference: The act of running an AI model on input to produce output. Distinct from training (which produces the model). Inference is the cost line on the operating-budget side; training is the cost line on the capex side.
IPC (Information and Privacy Commissioner): Provincial privacy regulators — IPC of Ontario, the OIPC of Alberta, the OIPC of BC, the CAI in Quebec. The regulator each hospital answers to under its provincial regime.

K

KLAS Research: An independent healthcare-IT research firm. Best in KLAS rankings (Ambient AI category) are the most cited third-party recognition in vendor sales conversations. Abridge has held the #1 position multiple years.
k-NN (k-Nearest Neighbors): The search problem vector databases solve — finding the k vectors in an index most similar to a query vector. Approximate k-NN (via HNSW or similar) trades some recall for speed.
KV cache: The key-value tensors an LLM caches during generation. Memory-hungry; PagedAttention's contribution was modeling the KV cache as virtual-memory pages to cut memory waste from ~60-80% to under 4%.
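
A back-of-the-envelope calculation shows why the KV cache entry calls it memory-hungry. The model shape below (32 layers, 32 key-value heads, head dimension 128, FP16 values) is an assumed 7B-class decoder without grouped-query attention, and the 8,192-token reservation with a 2,048-token request is a hypothetical scenario. The helper name `kv_cache_bytes_per_token` is illustrative only.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache needed for one token of one sequence."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# Assumed model shape; substitute the real configuration of the model in use.
per_token = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)

# Memory for one sequence if the server reserves a full 8,192-token slot.
reserved_tokens = 8192
per_request_gib = per_token * reserved_tokens / 2**30

# If the request only ever uses 2,048 tokens, the rest of a contiguous
# reservation sits idle. Paging the cache is meant to avoid this waste.
used_tokens = 2048
wasted_fraction = 1 - used_tokens / reserved_tokens
```

Under these assumptions each token costs 512 KiB, a full-length reservation costs 4 GiB, and three quarters of that reservation goes unused for the shorter request.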

L

Latency (p50 / p95 / p99): Time from request to first or last token, measured at percentiles. p99 latency matters in clinical workflows because the tail is where clinicians notice; mean latency hides bad worst-case behavior.
Law 25 (Quebec): Quebec's modernized private-sector privacy law (formerly Bill 64). Section 12 obliges organizations to disclose use of automated decision-making and explain the principal factors. Section 17 governs cross-border transfers. The strictest Canadian penalty regime; ~C$2.3M in aggregate Q1 2026 fines. See the Canadian compliance hub.
LLM (Large Language Model): A neural network trained on large text corpora to generate or analyze text. Examples: Llama 3.x, Mistral, Gemma, GPT-class models, Claude-class models, Qwen, DeepSeek. The compute and capability tier behind every product in the directory.
LoRA (Low-Rank Adaptation): An efficient fine-tuning technique that updates only a small "adapter" of a model rather than all its weights. Makes domain fine-tuning of large models affordable on modest hardware.
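
The point in the Latency entry that the mean hides the tail can be shown with a small calculation. This is a sketch using the nearest-rank percentile method; the function name `percentile` and the latency samples are hypothetical, and monitoring tools may interpolate percentiles differently.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 98 fast requests, 2 very slow ones.
latencies_ms = [200] * 98 + [4000] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50, p95, p99 = (percentile(latencies_ms, p) for p in (50, 95, 99))
```

Here the mean is 276 ms and both p50 and p95 are 200 ms, yet p99 is 4,000 ms. Two requests in a hundred take twenty times longer than the rest, and only the p99 figure reveals it.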

M

MedGemma: Google's open-weight medical Gemma model — 4B and 27B variants, multimodal under MedGemma 1.5, runnable through Ollama or vLLM. The most credible open-weight medical model in 2026.
Model card: A short structured document describing a model's training data, intended use, known limitations, and evaluation. Originating in Google AI's 2018 framework; the Health AI Partnership's AI Vendor Disclosure Framework is the procurement-oriented extension.
Multimodal: An AI model that accepts more than one input type — typically text plus images (and sometimes audio or video). MedGemma 1.5 and Whisper are multimodal in different senses.

N

NEJM AI: The New England Journal of Medicine's AI-specific journal. The UCLA 2025 ambient-scribe RCT (NEJM AI) is the single most-cited published evaluation in the category — Nabla −9.5% time-in-note (p=0.02), DAX −1.7% (not significant).
nmslib: An open-source library implementing HNSW. One of three vector-search engines offered inside OpenSearch.
npj Digital Medicine: The Nature Portfolio journal of digital health. Source of the 2025 hallucination / omission framework (1.47% / 3.45% rates, 44% major) and the "Beyond human ears" editorial on AI-scribe risks. See the safety reference.

O

Omission: Content the AI failed to document that was present in the source — a missed medication change, a missed history item, a missed plan item. Published rate: 3.45%; 55% of major omissions cluster in the "current issues" section of the note.
On-prem (on-premises): Software running inside the hospital's own network rather than in a vendor's cloud. The deployment shape that satisfies provincial residency rules in Canada without contractual gymnastics. See the on-premise reference architecture.
Open-weight: An AI model whose trained parameters (weights) are publicly downloadable. Distinct from open-source, which would also require releasing the training code and, under stricter definitions, the training data. Llama 3.x, Mistral, Gemma 2/3, MedGemma, Qwen, and Whisper are open-weight; GPT-class models from OpenAI are not.

P

PagedAttention: The KV-cache memory-management technique introduced by vLLM at UC Berkeley in 2023. Treats the KV cache as virtual-memory pages; cuts memory waste from ~60-80% to under 4%. The architectural reason vLLM's published benchmarks show up to 24× the throughput of vanilla Hugging Face Transformers.
Pajama time: The clinical-informatics slang for after-hours EHR work done at home. Family physicians average ~86 minutes per night. The headline burden ambient scribes are most directly trying to bend.
Parameters (weights): The numbers a neural network learns during training. Modern LLMs range from ~1B parameters (small / efficient) through ~70B (mainstream production) to 405B+ (frontier). MedGemma ships at 4B and 27B.
PHI (Protected Health Information): The U.S. HIPAA-defined category for identifiable health information. The Canadian equivalents are "personal health information" under PHIPA, "health information" under HIA, and "personal information" generally under Quebec Law 25.
PHIPA (Ontario): Ontario's Personal Health Information Protection Act. Section 18 governs consent for AI use of PHI; Section 10 governs permitted uses; Section 12 imposes audit-trail requirements. The most-cited Canadian provincial framework for AI deployment. See the Canadian compliance hub.
PIA (Privacy Impact Assessment): The structured assessment of how a new system handles personal information, where the risks are, and how they are mitigated. Mandatory under Quebec Law 25 for technology projects that process personal information; strongly recommended by the IPC of Ontario under PHIPA. The most common compliance failure is skipping the PIA before procurement hardens.
PIPA (BC): British Columbia's Personal Information Protection Act. Applies to private-sector personal information in BC. Public-sector hospitals are also subject to FIPPA, which contains residency-relevant provisions.
PIPEDA: The federal Personal Information Protection and Electronic Documents Act. The default Canadian privacy floor; cross-border PHI flows usually trigger the accountability principle in PIPEDA Schedule 1, clause 4.1.3. 2026 amendments under Bill C-27's CPPA modernize consent rules around AI training and automated decision-making.
Prompt: The input text given to an LLM at inference time. In production scribes, the prompt usually combines a system role, retrieved chart context, the audio transcript, and an instruction template.

Q

Q4_K_M: The production-recommended quantization preset for llama.cpp, using roughly 4-bit weights with K-quant groupings; ~4.5 GB for a 7B model, with minimal quality loss versus FP16.
Quantization: Reducing the numerical precision of a model's weights (FP16 → 8-bit → 4-bit) to shrink memory footprint and speed up inference, at the cost of some accuracy. 4-bit (Q4_K_M, GPTQ, AWQ) is the most common production-quantization tier.
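
The precision trade-off in the Quantization entry can be illustrated with a toy round trip. The sketch below uses plain symmetric per-tensor rounding to signed 4-bit codes, which is an assumption made for clarity. It is not the K-quant, GPTQ, or AWQ scheme, all of which use grouped scales and more careful error handling. The function names and weight values are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization to integer codes in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7
    if scale == 0:
        return [0] * len(weights), 1.0  # all-zero tensor: any scale works
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]

# Hypothetical weight values for illustration.
weights = [0.70, -0.34, 0.12, 0.03, -0.61]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs 4 bits instead of 16, a fourfold reduction before the small overhead of storing the scale. The cost is a reconstruction error of up to half a step per weight.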

R

RAG (Retrieval-Augmented Generation): A pattern that retrieves relevant documents from a private corpus and supplies them to the LLM as context, instead of fine-tuning the model on the corpus. The dominant pattern for hospital document Q&A and private medical search. See the Document Q&A category page.
RCM (Revenue Cycle Management): The hospital function that manages billing, coding, prior authorization, and payment collection. AI scribes increasingly bundle RCM uplift (ICD-10 / E/M / HCC accuracy) into the value pitch; Commure Ambient and Ambience lean hardest on this.
Recall (and precision): Standard information-retrieval metrics. Recall is the share of relevant items retrieved; precision is the share of retrieved items that are relevant. RAG quality depends on both, and on whether the retrieved passages are reranked before generation.
Reranker: A second-pass model that scores retrieved documents for relevance to the query before they reach the LLM. Typically a cross-encoder (slower but more accurate than the embedding retrieval that produced the candidate set).
Residency (data residency): The legal-and-architectural requirement that data live in a specific geography. Quebec Law 25 and BC FIPPA are the most explicit Canadian residency regimes; PHIPA and HIA bind through cross-border-transfer documentation rules.
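
The two metrics in the Recall (and precision) entry reduce to simple set arithmetic. This is a minimal sketch for a single query; the function name `precision_recall` and the document IDs are hypothetical, and a real evaluation would average over many queries with clinician-labeled relevance judgments.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieved document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: what share of the retrieved items were actually relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant items were actually retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical IDs: the retriever returned four documents, three were relevant.
retrieved = ["doc-1", "doc-4", "doc-7", "doc-9"]
relevant = ["doc-1", "doc-2", "doc-7"]
precision, recall = precision_recall(retrieved, relevant)
```

Two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67). The missed document never reaches the LLM, and no amount of reranking can recover it.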

S

SBAR: A clinical communication structure — Situation, Background, Assessment, Recommendation. The standard nursing-handoff framework alongside I-PASS. See Handoff Tools.
Self-RAG: A retrieval pattern where the model decides when to retrieve, what to retrieve, and which retrieved passages to use — instead of retrieving once for every query. Used in clinical-RAG systems where naive retrieval can degrade accuracy.
SOAP note: The classical clinical-note structure — Subjective, Objective, Assessment, Plan. The dominant note format ambient scribes produce.
SOC 2: A widely-adopted U.S. security audit framework (Type 1 = controls in place; Type 2 = controls operating effectively over a period). Standard healthcare-cloud-vendor compliance attestation alongside HIPAA / BAA.
SSO / mTLS: Single Sign-On and mutual TLS authentication — the standard identity and transport-security layer in front of a hospital AI gateway. The engine layer (vLLM, LocalAI, Qdrant) is intentionally minimal on identity; the gateway in front owns it.
Subprocessor: A third-party service the AI vendor uses to handle PHI on their behalf (e.g., a cloud provider, an embedding API). The vendor must disclose subprocessors; an undisclosed subprocessor is where most residency commitments fail in practice.

T

Temperature: A generation parameter controlling how deterministic an LLM's output is. Temperature 0 is greedy / deterministic; higher temperatures produce more varied output. Most clinical-summarization workflows operate at low temperature.
Tensor parallelism: Splitting a model's weights across multiple GPUs so each handles part of every forward pass. Supported by vLLM; the standard way to serve 70B-class models on 2×A100 or 4×H100.
TEVV (Test, Evaluation, Verification, Validation): The structured AI assurance discipline coming out of NIST and the broader compliance ecosystem. Increasingly cited as the expected baseline for hospital AI procurement under the Health AI Partnership vendor framework.
Token: The unit of text an LLM consumes and produces — typically a subword. English text averages ~4 characters per token; "ICD-10" tokenizes to about 3 tokens. Latency and throughput are usually measured per token.
TPS (Tokens Per Second): The throughput unit for LLM inference. Single-stream TPS measures how fast one user sees output; aggregate TPS measures how much total work the GPU produces under concurrent load. vLLM's headline ~2,200 TPS on Llama 2 70B / 4× A100 is aggregate at 256 concurrent users.
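
The effect described in the Temperature entry comes from dividing the model's raw scores by the temperature before they are turned into probabilities. The sketch below assumes three hypothetical candidate tokens with made-up scores; the function name `softmax_with_temperature` is illustrative and does not come from any particular serving library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked distribution
warm = softmax_with_temperature(logits, 1.5)  # flatter distribution
```

At temperature 0.2 the top candidate receives more than 99% of the probability mass, so output is nearly deterministic. At 1.5 it receives a little over half, so sampled output varies from run to run.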

V

Vector database: A database optimized for similarity search over high-dimensional vectors. Qdrant, Milvus, OpenSearch, Weaviate, and pgvector are the common open-source options.
vLLM: The UC Berkeley-originated open-source LLM serving engine. Introduced PagedAttention and pairs it with continuous batching. The de facto production inference layer for hospital-scale concurrent workloads. See the full profile.

W

Whisper: OpenAI's open-source speech-to-text model. Powers most ambient AI scribes. Has documented hallucination behavior in medical audio; OpenAI's own documentation warns against use in "high-risk domains." See the Whisper profile and the safety reference.

Where this fits in the WalledCare directory

Every term in this glossary cross-links to the directory page where it most often comes up. Use the glossary as the buyer's onboarding artifact for a steering committee that includes IT, clinical informatics, privacy, and procurement reviewers — different members will encounter different sets of terms, and the cross-links route them to the right depth.
