BUYER GUIDE · RFP CHECKLIST · 12 min read

AI Scribe RFP Questions That Survive Procurement

Thirty questions a hospital should put in writing before signing an AI scribe contract — across security and privacy, clinical safety, EHR integration, published evidence, pricing, and vendor risk. Each question includes why it matters, an acceptable-answer hint, and the red flag that means walk away. Copy what you need into your RFP.

Questions

Across six categories — security, safety, integration, evidence, pricing, vendor risk. Roughly the depth a real hospital RFP needs, none of the boilerplate a generic vendor template adds.

Red flags

Every question pairs an acceptable-answer hint with the red flag. The red flags are not hypothetical — they are taken from published 2024–26 reporting and from real hospital procurement experience.

Anchor for all 11 vendors

Use this checklist alongside the vendor profiles in the WalledCare directory — Abridge, Nabla, Suki, DeepScribe, Ambience, Augmedix, Commure Ambient, Dragon Copilot, Freed, Heidi Health, Mintha — to make the comparison apples-to-apples.

How to use this checklist

This is not a procurement template you photocopy and send. It is the question list a steering committee should rephrase in its own language, weight by the constraints that actually bind the organization, and use to test what a vendor will commit to in writing versus what the marketing page says. Each section has a different natural reviewer: security and privacy belong to the privacy office and CISO, clinical safety to the CMIO and clinical informatics, EHR integration to IT, evidence to the medical-affairs reviewer, and vendor risk to procurement and legal.

Two principles run through the questions. First, "we are HIPAA-compliant" is not an answer: the compliance posture has to be specific (BAA terms, retention windows, training-data-use defaults, breach notification). Second, the published evidence sets the lower bound for what to expect. A vendor that anchors on its own marketing rather than peer-reviewed studies is easier to push for real numbers when you have the studies ready to cite.
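
One way a committee might turn "weight by the constraints that bind" into a comparable score is sketched below. The category weights, the 0 to 2 scoring scale, and every name in the code are illustrative assumptions, not part of the checklist itself; the point is only that red flags stay visible separately instead of being averaged away.

```python
from dataclasses import dataclass

# Hypothetical weights; a steering committee would set its own.
CATEGORY_WEIGHTS = {
    "security": 0.25,
    "safety": 0.25,
    "integration": 0.20,
    "evidence": 0.10,
    "pricing": 0.10,
    "vendor_risk": 0.10,
}
MAX_SCORE = 2  # assumed scale: 0 = red flag, 1 = partial, 2 = acceptable in writing


@dataclass
class Answer:
    question: int   # question number in the checklist (1-30)
    category: str   # one of the keys in CATEGORY_WEIGHTS
    score: int      # 0, 1, or 2 on the assumed scale


def weighted_score(answers):
    """Weighted mean of per-category scores, normalized to the range 0-1."""
    total = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        scores = [a.score for a in answers if a.category == category]
        if not scores:
            continue  # a category with no scored answers contributes nothing
        total += weight * sum(scores) / (MAX_SCORE * len(scores))
    return total


def red_flags(answers):
    """Question numbers scored 0, reported separately from the average."""
    return sorted(a.question for a in answers if a.score == 0)


if __name__ == "__main__":
    sample = [
        Answer(1, "security", 2),
        Answer(4, "security", 0),
        Answer(7, "safety", 1),
    ]
    print(weighted_score(sample), red_flags(sample))
```

A single red flag on a binding constraint can reasonably outweigh a high overall score, which is why the sketch returns the flagged question numbers alongside the total.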

Security and privacy (1–6)

1. Will you sign our Business Associate Agreement as-is, or do you require us to sign yours? Why it matters: Vendor-templated BAAs often shift retention, audit, and breach-notification terms in the vendor's favor. Acceptable answer: "We will accept material redlines, especially on audit access and breach notification timelines." Red flag: "Our BAA is non-negotiable," or a refusal to share it before contract talks.
2. Where exactly does PHI live during and after processing? Why it matters: Data-residency rules under Quebec Law 25, Ontario PHIPA, Alberta HIA, and BC FIPPA depend on the actual region, not on which cloud provider's logo is on the page. Acceptable answer: A specific named region (US East, Canada Central, etc.) with a contractual commitment to keep it there. Red flag: "Our cloud provider may route data through other regions for performance reasons."
3. How long is audio retained, and can we configure or eliminate that retention? Why it matters: Audio is the original source of truth; deleting it removes the safety net for verifying suspicious transcripts. Some vendors retain by default for "quality"; others delete by default. Acceptable answer: Explicit retention windows with a customer-controlled override. Red flag: Audio retained by default with no off switch.
4. Do you train your AI models on our patient data? If so, how do we opt out? Why it matters: Training on customer PHI is a fundamentally different data-use story from inference on it, and the opt-out path is the only governance buyers have. Acceptable answer: "No training on customer data by default, and an explicit contractual prohibition is available." Red flag: A nuanced "we de-identify and aggregate before training"; read the fine print.
5. List every subprocessor with access to PHI and the purpose of access. Why it matters: Subprocessors are where data residency commitments most often fail in practice. Acceptable answer: A current, public subprocessor list with notification commitments before changes. Red flag: "We don't disclose subprocessors for competitive reasons."
6. What is your breach notification timeline, and is it contractual? Why it matters: HIPAA allows up to 60 days after discovery; many state laws and Canadian provincial laws require faster notice. The contract should match the strictest requirement that applies to you. Acceptable answer: A specific timeline (often 24–72 hours) written into the BAA. Red flag: "We will notify in accordance with applicable law" without specifying.

Clinical safety (7–12)

7. What is your measured hallucination rate, and against what evaluation framework? Why it matters: The 2025 npj Digital Medicine framework analysis (12,999 sentences, 18 model configurations) reported 1.47% hallucinations, 44% of which were classified as major. That is the published baseline. Acceptable answer: Specific rates with a description of the evaluation framework. Red flag: "Hallucinations are very rare in our system," with no numbers.
8. What is your measured omission rate? Why it matters: The same framework reported 3.45% omissions, more common than hallucinations, with 55% of major omissions clustered in the "current issues" section where they matter most. Acceptable answer: Specific rates plus which note sections the omissions cluster in. Red flag: "Omissions are caught in clinician review." That is a workflow claim, not a measurement.
9. How do you handle pronoun and negation errors? Show us examples. Why it matters: "Denies chest pain" rendered as "chest pain" is a documented failure mode and the dominant non-hallucination error class. Acceptable answer: Specific guardrails, recent test results, and example error logs. Red flag: Evasion or "our model handles negation correctly" without evidence.
10. Is your transcription engine Whisper-based? If so, what safeguards address its documented hallucination issues? Why it matters: OpenAI's Whisper powers most ambient scribes; OpenAI's own docs warn against high-risk-domain use. Published 2024–25 reporting documented invented sentences in medical audio. Acceptable answer: Yes, with specific guardrails (audio retention, sample edit-distance monitoring, post-processing, model fine-tuning). Red flag: "We use proprietary speech recognition" without specifics.
11. What stop conditions does your contract support if our internal evaluation finds the system unsafe? Why it matters: Pilots and rollouts that cannot pause quickly are not safe pilots. Acceptable answer: Defined termination-for-cause language and a documented escalation path. Red flag: Multi-year commitments with no off-ramp.
12. How is the clinician's review and sign-off recorded in your audit trail? Why it matters: Treating the signature as proof of review is unsafe per the published evidence, but the audit trail still needs to show what the clinician actually approved versus what the model produced. Acceptable answer: Differential audit (model draft, clinician edits, final signed note) with retention sufficient for your malpractice policy. Red flag: Audit only retains the final note.
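
The "differential audit" in question 12 and the "edit-distance monitoring" in question 10 both come down to comparing the model's draft with what the clinician signed. The sketch below uses Python's standard difflib to show the idea; the function name, field names, and sample note text are illustrative assumptions, and a vendor's real audit trail would be stored and structured differently.

```python
import difflib


def audit_diff(model_draft: str, signed_note: str) -> dict:
    """Compare the model's draft with the clinician-signed note.

    Returns a line-level unified diff plus a word-level similarity ratio
    (1.0 means the clinician signed the draft unchanged).
    """
    diff = list(
        difflib.unified_diff(
            model_draft.splitlines(),
            signed_note.splitlines(),
            fromfile="model_draft",
            tofile="signed_note",
            lineterm="",
        )
    )
    ratio = difflib.SequenceMatcher(
        None, model_draft.split(), signed_note.split()
    ).ratio()
    return {
        "edited": model_draft != signed_note,
        "word_similarity": ratio,
        "diff": diff,
    }


if __name__ == "__main__":
    # Made-up example of the negation error described in question 9.
    draft = "Patient reports chest pain.\nNo fever."
    signed = "Patient denies chest pain.\nNo fever."
    result = audit_diff(draft, signed)
    print(result["edited"], round(result["word_similarity"], 3))
    print("\n".join(result["diff"]))
```

An audit trail that keeps only the final note cannot produce this comparison at all, which is the practical meaning of the red flag in question 12.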

EHR integration (13–18)

13. Exactly what does "Epic integration" mean for your product? Why it matters: "Epic integration" ranges from copy-paste to Epic Pal status. Abridge is the first Epic "Pal"; many vendors stop at sidecar. Acceptable answer: Specific integration surface (Pal, sidecar, Haiku app, In Basket), bi-directional write-back, version compatibility. Red flag: "We integrate with Epic" without specifying which surface.
14. List the EHRs you support and the depth of each integration. Why it matters: Most vendor pages claim "all major EHRs"; depth varies massively. Acceptable answer: A matrix of EHR × integration depth × write-back support. Red flag: Marketing claim without operational reference.
15. Do you support inpatient and procedural workflows, or only outpatient? Why it matters: Ambient documentation in surgical, ED, and ICU settings has different failure modes, and most vendors started in outpatient settings. Acceptable answer: Named inpatient deployments and the specific workflow surfaces (rounding, handoff, procedure notes) supported. Red flag: "We support inpatient" without named references.
16. How do you handle our specialty templates, smart phrases, and clinical preference lists? Why it matters: Notes that match the clinician's existing template are accepted faster; notes that don't get rewritten. Acceptable answer: A documented onboarding process for specialty templates and a service-level commitment for changes. Red flag: "Our templates are standardized for all customers."
17. What happens if the EHR is down or your service is degraded? Why it matters: Clinicians need a fallback that doesn't lose the encounter. Acceptable answer: Documented degraded-mode behavior (local audio cache, retry, alternate workflow). Red flag: "Our uptime is 99.95%." That is an SLA, not a fallback plan.
18. Can we export our historical notes, encounter audio, and metadata in a portable format? Why it matters: Vendor lock-in is real; the exit story is the procurement test. Acceptable answer: Documented export formats (JSON, FHIR, MP3/WAV), customer-controlled retention, deletion-on-request. Red flag: "Export is a custom services engagement."
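
The "EHR × integration depth × write-back" matrix from question 14 is easier to compare across vendors when it is captured as structured data and not as free text. A minimal sketch follows, assuming a three-level depth scale and placeholder vendor names; none of the values describe a real product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Depth(IntEnum):
    """Hypothetical depth scale; a real RFP would define its own levels."""
    COPY_PASTE = 0  # clinician pastes the note into the chart
    SIDECAR = 1     # separate app running alongside the EHR
    EMBEDDED = 2    # runs inside the EHR's own workflow


@dataclass(frozen=True)
class Integration:
    ehr: str
    depth: Depth
    write_back: bool


def meets_requirement(rows, ehr, min_depth, need_write_back=True):
    """True if any row covers the EHR at the required depth and write-back."""
    for row in rows:
        if row.ehr == ehr and row.depth >= min_depth:
            if row.write_back or not need_write_back:
                return True
    return False


# Illustrative vendor responses, not real product data.
responses = {
    "Vendor A": [Integration("Epic", Depth.EMBEDDED, True)],
    "Vendor B": [Integration("Epic", Depth.SIDECAR, False)],
}

shortlist = [
    name
    for name, rows in responses.items()
    if meets_requirement(rows, "Epic", Depth.SIDECAR)
]
print(shortlist)  # ['Vendor A']
```

Vendor B drops out here only because write-back is required; relaxing `need_write_back` would keep it, which makes the trade-off explicit for the committee.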

Published evidence (19–22)

19. List the peer-reviewed studies of your product, with author institutions and dates. Why it matters: The category has real evidence (UCLA NEJM AI RCT, Mass General Brigham JAMA cohort, multi-system burnout QI), but most vendors have none for their own product. Acceptable answer: Named papers in named journals with reference numbers. Red flag: Vendor case studies marketed as "studies."
20. What does the strongest peer-reviewed evidence actually show? Why it matters: Specifics anchor the conversation. Nabla cut time-in-note by 9.5% in the UCLA RCT; DAX showed −1.7%, not significant. Mass General Brigham reported 13.4 min/day total EHR-time reduction. Acceptable answer: Specific numbers a vendor will commit to in writing. Red flag: "Our customers report saving up to two hours a day."
21. What independent evaluations have been run on your product (KLAS, etc.)? Why it matters: KLAS Ambient AI rankings are public and meaningful; vendor-curated awards less so. Acceptable answer: Named recent rankings, source, and date. Red flag: Awards from organizations the vendor itself sponsors.
22. Provide three customer references in our specialty and size range that we can call independently. Why it matters: The reference call where the vendor is not on the line is the only one worth taking. Acceptable answer: Named contacts at named institutions in your specialty and size. Red flag: "We will arrange a reference call with our customer-success team."

Pricing and contracts (23–26)

23. What is your total cost for our specific deployment, in writing? Why it matters: "Contact sales" is a negotiation tactic, not a pricing model. Acceptable answer: A specific quote with all components (per-clinician fee, implementation, support, integration). Red flag: Multi-page quotes that bury per-feature costs.
24. What is your year-2 price escalation policy? Why it matters: Industry reporting documents automatic year-2 price increases that aren't visible at signing. Acceptable answer: Capped escalation (e.g., 3–5%) tied to a named index, customer right to renegotiate. Red flag: "Pricing is reviewed annually" without a cap.
25. Which features are standard versus add-on, and what is the path to adding them? Why it matters: Pricing comparisons fail when one vendor bundles features another charges separately for. Acceptable answer: A current SKU sheet with feature availability per tier. Red flag: Demos that show features the contract does not include.
26. What is the cancellation policy if our pilot fails or our needs change? Why it matters: Auto-renewal and termination-for-convenience terms decide whether a failed pilot becomes a multi-year obligation. Acceptable answer: Documented termination-for-convenience after a stated notice period; no auto-renewal lock-in. Red flag: Three-year initial term with auto-renewal and a 90-day cancellation window buried in section 12.
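
To see why the escalation cap in question 24 matters, it helps to project the contract total over the full term. The sketch below compounds a year-1 total at a fixed annual rate; the dollar figure and the 12% uncapped rate are made-up illustrations, not numbers from any vendor or from industry reporting.

```python
def projected_cost(year1_total: float, annual_escalation: float, years: int) -> list[float]:
    """Yearly contract cost, compounding a fixed escalation rate from year 1."""
    return [
        round(year1_total * (1 + annual_escalation) ** year, 2)
        for year in range(years)
    ]


if __name__ == "__main__":
    YEAR1 = 100_000.0  # hypothetical year-1 total for the deployment

    capped = projected_cost(YEAR1, 0.05, 3)    # 5% cap, top of the 3-5% range
    uncapped = projected_cost(YEAR1, 0.12, 3)  # assumed uncapped increase

    print(capped)
    print(uncapped)
    print(round(sum(uncapped) - sum(capped), 2))  # extra cost over the term
```

Running the same projection with the vendor's actual quote and stated escalation policy gives the committee a term total to compare, instead of a year-1 headline price.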

Vendor risk (27–30)

27. Describe your funding history, current cash runway, and ownership status. Why it matters: The category is in active consolidation: Augmedix was taken private by Commure in 2024 for $139M after losing money. Acquisition risk is real. Acceptable answer: Funding rounds disclosed, runway in quarters, ownership structure. Red flag: "We're a private company and don't disclose finances."
28. Has your product changed hands recently? If so, what changed for existing customers? Why it matters: Augmedix → Commure, Nuance → Microsoft → Dragon Copilot. Acquisitions reset defaults, sometimes terms. Acceptable answer: A specific change-log for existing customers, with continuity guarantees. Red flag: "Nothing has changed for existing customers." Verify that with a reference.
29. Who at your company will be our named operational contact, and what is their tenure? Why it matters: Customer success teams turn over; the named individual matters more than the title. Acceptable answer: A named CS lead and a named clinical-informatics lead with tenure measured in years. Red flag: "Your account team will be assigned at kickoff."
30. What insurance do you carry, and will you indemnify us for AI-output errors? Why it matters: Clinicians remain liable for AI-generated documentation errors today; the vendor's insurance posture is the buyer's risk floor. Acceptable answer: Errors-and-omissions and cyber-liability insurance with specific limits, and a defined indemnification scope. Red flag: No errors-and-omissions coverage; broad indemnification carve-outs.

What the answers should add up to

The vendor's answers should compose into a single, internally consistent story: where data lives, what the failure modes are, how clinicians review, what the evidence base shows, what the contract obliges. Inconsistencies between sections are the most useful signal — a vendor whose marketing claims a 2-hour-per-day saving but whose evidence list contains the UCLA RCT showing a 41-second saving has a coherence problem the procurement committee should name out loud.

The Moneli Automation framing is to use this checklist as the comparison rubric against an on-prem alternative — the on-prem path answers some of these questions trivially (where does PHI live? on hospital hardware) and others substantively differently (what is the cost model? capex plus ops rather than per-clinician-per-month). The right pilot scopes both paths against the same questions.

Where this fits in the WalledCare directory

This RFP checklist pairs with the workflow-specific guidance in the AI Scribes category page, the deployment-decision framework in Cloud vs Local AI for Hospitals, and the safety-oriented pilot design in How to Test an AI Scribe Safely. Use the vendor profiles in the vendor hub to pre-fill which vendors will answer which questions credibly before the RFP goes out.
