BUYER GUIDE · RFP CHECKLIST · 12 min read

AI Scribe RFP Questions That Survive Procurement

Thirty questions a hospital should put in writing before signing an AI scribe contract — across security and privacy, clinical safety, EHR integration, published evidence, pricing, and vendor risk. Each question includes why it matters, an acceptable-answer hint, and the red flag that means walk away. Copy what you need into your RFP.

Questions
30

Across six categories — security, safety, integration, evidence, pricing, vendor risk. Roughly the depth a real hospital RFP needs, none of the boilerplate a generic vendor template adds.

Red flags
30

Every question pairs an acceptable-answer hint with the red flag. The red flags are not hypothetical — they are taken from published 2024–26 reporting and from real hospital procurement experience.

Anchor for
All 11 vendors

Use this checklist alongside the vendor profiles in the WalledCare directory — Abridge, Nabla, Suki, DeepScribe, Ambience, Augmedix, Commure Ambient, Dragon Copilot, Freed, Heidi Health, Mintha — to make the comparison apples-to-apples.

How to use this checklist

This is not a procurement template you photocopy and send. It is the question list a steering committee should rephrase in its own language, weight by the constraints that actually bind the organization, and use to test what a vendor will commit to in writing versus what the marketing page says. Each section has a different natural reviewer: security and privacy belong to the privacy office and CISO, clinical safety to the CMIO and clinical informatics, EHR integration to IT, evidence to the medical-affairs reviewer, and vendor risk to procurement and legal.

Two principles run through the questions. First, "we are HIPAA-compliant" is not an answer: the compliance posture has to be specific (BAA terms, retention windows, training-data-use defaults, breach notification). Second, the published evidence sets the minimum expectation. Vendors who anchor on their own marketing rather than peer-reviewed studies are easier to push for the real numbers if you have the studies ready to cite.
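One way to make the "weight by your own constraints" step concrete is a small scoring rubric. The sketch below is illustrative only: the category weights, the 0 to 2 scoring scale, and the names `WEIGHTS`, `Answer`, and `weighted_score` are assumptions introduced here, not part of any WalledCare tooling. A committee would substitute its own weights.

```python
from dataclasses import dataclass

# Hypothetical category weights (sum to 1.0); set your own.
WEIGHTS = {
    "security": 0.25,
    "safety": 0.25,
    "integration": 0.15,
    "evidence": 0.15,
    "pricing": 0.10,
    "vendor_risk": 0.10,
}

@dataclass
class Answer:
    category: str           # one of the WEIGHTS keys
    score: int              # assumed scale: 0 = red flag, 1 = partial, 2 = acceptable in writing
    red_flag: bool = False  # True when the vendor's answer matched the red flag

def weighted_score(answers):
    """Return (weighted score in [0, 1], number of red flags) for one vendor."""
    totals = {c: [0, 0] for c in WEIGHTS}  # category -> [points earned, max points]
    for a in answers:
        totals[a.category][0] += a.score
        totals[a.category][1] += 2
    # Categories with no answers contribute nothing, so gaps lower the score.
    score = sum(WEIGHTS[c] * pts / mx for c, (pts, mx) in totals.items() if mx > 0)
    flags = sum(1 for a in answers if a.red_flag)
    return round(score, 3), flags

# Usage: three answers from one hypothetical vendor.
answers = [
    Answer("security", 2),
    Answer("safety", 1),
    Answer("pricing", 0, red_flag=True),
]
result = weighted_score(answers)  # (0.375, 1)
```

Reporting the red-flag count separately from the weighted score is a deliberate choice in this sketch: a high average should not hide a single walk-away answer.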

Security and privacy (1–6)

  • 1. Will you sign our Business Associate Agreement as-is, or do you require us to sign yours? Why it matters: Vendor-templated BAAs often shift retention, audit, and breach-notification terms in the vendor's favor. Acceptable answer: "We will accept material redlines, especially on audit access and breach notification timelines." Red flag: "Our BAA is non-negotiable," or a refusal to share it before contract talks.
  • 2. Where exactly does PHI live during and after processing? Why it matters: Data-residency rules under Quebec Law 25, Ontario PHIPA, Alberta HIA, and BC FIPPA depend on the actual region, not on which cloud provider's logo is on the page. Acceptable answer: A specific named region (US East, Canada Central, etc.) with a contractual commitment to keep it there. Red flag: "Our cloud provider may route data through other regions for performance reasons."
  • 3. How long is audio retained, and can we configure or eliminate that retention? Why it matters: Audio is the original source of truth; deleting it removes the safety net for verifying suspicious transcripts. Some vendors retain by default for "quality"; others delete by default. Acceptable answer: Explicit retention windows with a customer-controlled override. Red flag: Audio retained by default with no off switch.
  • 4. Do you train your AI models on our patient data? If so, how do we opt out? Why it matters: Training on customer PHI is a fundamentally different data-use story from inference on it, and the opt-out path is the only governance buyers have. Acceptable answer: "No training on customer data by default, and an explicit contractual prohibition is available." Red flag: A nuanced "we de-identify and aggregate before training." Read the fine print.
  • 5. List every subprocessor with access to PHI and the purpose of access. Why it matters: Subprocessors are where data residency commitments most often fail in practice. Acceptable answer: A current, public subprocessor list with notification commitments before changes. Red flag: "We don't disclose subprocessors for competitive reasons."
  • 6. What is your breach notification timeline, and is it contractual? Why it matters: HIPAA allows up to 60 days from discovery; many state laws and Canadian provincial laws require faster notice. The contract should match the strictest applicable requirement. Acceptable answer: A specific timeline (often 24–72 hours) written into the BAA. Red flag: "We will notify in accordance with applicable law" without specifying.
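The "match the strictest" rule in question 6 can be checked mechanically once every applicable notification window is written down. The following is a minimal sketch under stated assumptions: `strictest_deadline` is a hypothetical helper, and apart from HIPAA's 60-day outer limit (cited above), the entries are placeholders rather than statements about any real statute.

```python
from datetime import timedelta

def strictest_deadline(deadlines):
    """Return (name, window) for the shortest notification window supplied."""
    if not deadlines:
        raise ValueError("no applicable deadlines supplied")
    name = min(deadlines, key=deadlines.get)
    return name, deadlines[name]

# Only the HIPAA figure comes from the checklist text; the other two
# entries are hypothetical placeholders for your counsel to fill in.
applicable = {
    "HIPAA outer limit": timedelta(days=60),
    "hypothetical state rule": timedelta(days=30),
    "hypothetical BAA commitment": timedelta(hours=72),
}

name, window = strictest_deadline(applicable)
# name is "hypothetical BAA commitment"; window is 72 hours
```

If the shortest window in the table is not the one written into the BAA, the contract is weaker than the organization's actual obligation.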

Clinical safety (7–12)

  • 7. What is your measured hallucination rate, and against what evaluation framework? Why it matters: The 2025 npj Digital Medicine framework analysis (12,999 sentences, 18 model configurations) reported a 1.47% hallucination rate, with 44% of those hallucinations classified as major. That is the published baseline. Acceptable answer: Specific rates with a description of the evaluation framework. Red flag: "Hallucinations are very rare in our system," without numbers.
  • 8. What is your measured omission rate? Why it matters: The same framework reported a 3.45% omission rate, higher than the hallucination rate, with 55% of major omissions clustered in the "current issues" section where they matter most. Acceptable answer: Specific rates plus which note sections the omissions cluster in. Red flag: "Omissions are caught in clinician review." That is a workflow claim, not a measurement.
  • 9. How do you handle pronoun and negation errors? Show us examples. Why it matters: "Denies chest pain" rendered as "chest pain" is a documented failure mode and the dominant non-hallucination error class. Acceptable answer: Specific guardrails, recent test results, and example error logs. Red flag: Evasion or "our model handles negation correctly" without evidence.
  • 10. Is your transcription engine Whisper-based? If so, what safeguards address its documented hallucination issues? Why it matters: OpenAI's Whisper powers most ambient scribes; OpenAI's own docs warn against high-risk-domain use. Published 2024–25 reporting documented invented sentences in medical audio. Acceptable answer: Yes, with specific guardrails (audio retention, sample edit-distance monitoring, post-processing, model fine-tuning). Red flag: "We use proprietary speech recognition" without specifics.
  • 11. What stop conditions does your contract support if our internal evaluation finds the system unsafe? Why it matters: Pilots and rollouts that cannot pause quickly are not safe pilots. Acceptable answer: Defined termination-for-cause language and a documented escalation path. Red flag: Multi-year commitments with no off-ramp.
  • 12. How is the clinician's review and sign-off recorded in your audit trail? Why it matters: "Signature is review" is unsafe per the published evidence, but the audit trail still needs to prove what the clinician actually approved versus what the model produced. Acceptable answer: Differential audit (model draft, clinician edits, final signed note) with retention sufficient for your malpractice policy. Red flag: The audit trail retains only the final note.
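To see what the differential audit in question 12 buys you, consider a word-level comparison between the model draft and the signed note. The sketch below uses Python's standard `difflib`; the function name `draft_vs_signed` and the sample sentences are hypothetical, and a real vendor's audit format will differ.

```python
import difflib

def draft_vs_signed(draft: str, signed: str):
    """Return (clinician edits, similarity ratio) between draft and signed note."""
    a, b = draft.split(), signed.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what the model wrote and what the clinician left in its place.
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits, round(matcher.ratio(), 3)

# Hypothetical example: the clinician corrects a negation error.
edits, ratio = draft_vs_signed(
    "Patient reports chest pain",
    "Patient denies chest pain",
)
# edits == [("replace", "reports", "denies")]; ratio == 0.75
```

An audit trail that stores only the final note cannot produce this comparison at all, which is why question 12 treats it as a red flag. The same similarity ratio, sampled across notes, is one simple form of the edit-distance monitoring mentioned in question 10.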

EHR integration (13–18)

  • 13. Exactly what does "Epic integration" mean for your product? Why it matters: "Epic integration" ranges from copy-paste to Epic Pal status. Abridge is the first Epic "Pal"; many vendors stop at sidecar. Acceptable answer: Specific integration surface (Pal, sidecar, Haiku app, In Basket), bi-directional write-back, version compatibility. Red flag: "We integrate with Epic" without specifying which surface.
  • 14. List the EHRs you support and the depth of each integration. Why it matters: Most vendor pages claim "all major EHRs"; depth varies widely. Acceptable answer: A matrix of EHR × integration depth × write-back support. Red flag: A marketing claim with no operational reference.
  • 15. Do you support inpatient and procedural workflows, or only outpatient? Why it matters: Ambient documentation in surgical, ED, and ICU settings has different failure modes, and most vendors started in outpatient care. Acceptable answer: Named inpatient deployments and the specific workflow surfaces (rounding, handoff, procedure notes) supported. Red flag: "We support inpatient" without named references.
  • 16. How do you handle our specialty templates, smart phrases, and clinical preference lists? Why it matters: Notes that match the clinician's existing template are accepted faster; notes that don't match get rewritten. Acceptable answer: A documented onboarding process for specialty templates and a service-level commitment for changes. Red flag: "Our templates are standardized for all customers."
  • 17. What happens if the EHR is down or your service is degraded? Why it matters: Clinicians need a fallback that doesn't lose the encounter. Acceptable answer: Documented degraded-mode behavior (local audio cache, retry, alternate workflow). Red flag: "Our uptime is 99.95%." That's an SLA, not a fallback plan.
  • 18. Can we export our historical notes, encounter audio, and metadata in a portable format? Why it matters: Vendor lock-in is real; the exit story is the procurement test. Acceptable answer: Documented export formats (JSON, FHIR, MP3/WAV), customer-controlled retention, deletion-on-request. Red flag: "Export is a custom services engagement."

Published evidence (19–22)

  • 19. List the peer-reviewed studies of your product, with author institutions and dates. Why it matters: The category has real evidence (the UCLA NEJM AI RCT, the Mass General Brigham JAMA cohort, multi-system burnout QI work), but most vendors have no peer-reviewed studies of their own product. Acceptable answer: Named papers in named journals with reference numbers. Red flag: Vendor case studies marketed as "studies."
  • 20. What does the strongest peer-reviewed evidence actually show? Why it matters: Specifics anchor the conversation. Nabla cut time-in-note by 9.5% in the UCLA RCT; DAX showed −1.7%, which was not statistically significant. Mass General Brigham reported a 13.4 min/day reduction in total EHR time. Acceptable answer: Specific numbers a vendor will commit to in writing. Red flag: "Our customers report saving up to two hours a day."
  • 21. What independent evaluations have been run on your product (KLAS, etc.)? Why it matters: KLAS Ambient AI rankings are public and meaningful; vendor-curated awards are less so. Acceptable answer: Named recent rankings, source, and date. Red flag: Awards from organizations the vendor itself sponsors.
  • 22. Provide three customer references in our specialty and size range that we can call independently. Why it matters: The reference call where the vendor is not on the line is the only one worth taking. Acceptable answer: Named contacts at named institutions in your specialty and size. Red flag: "We will arrange a reference call with our customer-success team."

Pricing and contracts (23–26)

  • 23. What is your total cost for our specific deployment, in writing? Why it matters: "Contact sales" is a negotiation tactic, not a pricing model. Acceptable answer: A specific quote with all components (per-clinician fee, implementation, support, integration). Red flag: Multi-page quotes that bury per-feature costs.
  • 24. What is your year-2 price escalation policy? Why it matters: Industry reporting documents automatic year-2 price increases that aren't visible at signing. Acceptable answer: Capped escalation (e.g., 3–5%) tied to a named index, plus a customer right to renegotiate. Red flag: "Pricing is reviewed annually" without a cap.
  • 25. Which features are standard versus add-on, and what is the path to adding them? Why it matters: Pricing comparisons fail when one vendor bundles features another charges separately for. Acceptable answer: A current SKU sheet with feature availability per tier. Red flag: Demos that show features the contract does not include.
  • 26. What is the cancellation policy if our pilot fails or our needs change? Why it matters: Auto-renewal and termination-for-convenience terms decide whether a failed pilot becomes a multi-year obligation. Acceptable answer: Documented termination-for-convenience after a stated notice period; no auto-renewal lock-in. Red flag: Three-year initial term with auto-renewal and a 90-day cancellation window buried in section 12.
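Questions 23 and 24 are easier to press when the committee has already projected the full-term cost under a stated escalation rate. Here is a rough sketch: the function `contract_cost` and every figure in the usage example (fee, headcount, one-time costs, 5% escalation) are hypothetical and chosen only to show the arithmetic, not drawn from any vendor's pricing.

```python
def contract_cost(per_clinician_month, clinicians, years, escalation, one_time=0.0):
    """Project yearly and total cost with a fixed annual escalation rate.

    escalation is a fraction (0.05 means 5% per year), applied from year 2 on.
    one_time covers implementation and integration fees paid once.
    """
    yearly = []
    price = per_clinician_month
    for _ in range(years):
        yearly.append(round(price * 12 * clinicians, 2))
        price *= 1 + escalation  # compound the per-clinician fee each year
    return yearly, round(sum(yearly) + one_time, 2)

# Hypothetical deployment: $100 per clinician per month, 10 clinicians,
# 3-year term, 5% capped escalation, $5,000 one-time implementation.
yearly, total = contract_cost(100.0, 10, 3, 0.05, one_time=5000.0)
# yearly == [12000.0, 12600.0, 13230.0]; total == 42830.0
```

Running the same projection with an uncapped rate the vendor declines to rule out shows the committee what "pricing is reviewed annually" could cost over the term.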

Vendor risk (27–30)

  • 27. Describe your funding history, current cash runway, and ownership status. Why it matters: The category is in active consolidation: Augmedix was taken private by Commure in 2024 for $139M after losing money. Acquisition risk is real. Acceptable answer: Funding rounds disclosed, runway in quarters, ownership structure. Red flag: "We're a private company and don't disclose finances."
  • 28. Has your product changed hands recently? If so, what changed for existing customers? Why it matters: Augmedix → Commure; Nuance → Microsoft → Dragon Copilot. Acquisitions reset defaults and sometimes terms. Acceptable answer: A specific change-log for existing customers, with continuity guarantees. Red flag: "Nothing has changed for existing customers." Verify with a reference.
  • 29. Who at your company will be our named operational contact, and what is their tenure? Why it matters: Customer success teams turn over; the named individual matters more than the title. Acceptable answer: A named CS lead and a named clinical-informatics lead with tenure measured in years. Red flag: "Your account team will be assigned at kickoff."
  • 30. What insurance do you carry, and will you indemnify us for AI-output errors? Why it matters: Clinicians remain liable for AI-generated documentation errors today; the vendor's insurance posture is the buyer's risk floor. Acceptable answer: Errors-and-omissions and cyber-liability insurance with specific limits, and a defined indemnification scope. Red flag: No errors-and-omissions coverage; broad indemnification carve-outs.

What the answers should add up to

The vendor's answers should compose into a single, internally consistent story: where data lives, what the failure modes are, how clinicians review, what the evidence base shows, what the contract obliges. Inconsistencies between sections are the most useful signal — a vendor whose marketing claims a 2-hour-per-day saving but whose evidence list contains the UCLA RCT showing a 41-second saving has a coherence problem the procurement committee should name out loud.

The Moneli Automation framing is to use this checklist as the comparison rubric against an on-prem alternative — the on-prem path answers some of these questions trivially (where does PHI live? on hospital hardware) and others substantively differently (what is the cost model? capex plus ops rather than per-clinician-per-month). The right pilot scopes both paths against the same questions.

Where this fits in the WalledCare directory

This RFP checklist pairs with the workflow-specific guidance in the AI Scribes category page, the deployment-decision framework in Cloud vs Local AI for Hospitals, and the safety-oriented pilot design in How to Test an AI Scribe Safely. Use the vendor profiles in the vendor hub to pre-fill which vendors will answer which questions credibly before the RFP goes out.


Further reading