Discharge Summaries — Buyer Guide for AI Discharge Drafting

Quality parity

3.67 / 3.77

Mean Likert quality scores for LLM- vs. physician-generated discharge narratives in a 2025 evaluation. Comparable on overall quality; LLM drafts more concise and coherent, less comprehensive. Eight blinded clinicians, 292 paired summaries, Dutch academic hospital, April 2025.

Errors per summary

2.91 vs 1.82

LLM drafts in the same study contained more unique errors per summary than physician-written drafts (2.91 vs 1.82). This is why mandatory clinician review is non-negotiable, not a polite recommendation.

Readability gap

88.7%

Of internal-medicine discharge instructions exceed the recommended sixth-grade reading level, which makes them inaccessible to most of the patients they serve. This is the gap a patient-friendly draft is built to close.

Medication discrepancies

88%

Of patients had at least one medication discrepancy after discharge in a 2025 cross-sectional study (median 3 per patient). Medication reconciliation is the dominant safety failure mode AI drafting must explicitly address.

Why discharge is the hardest documentation moment

Discharge summaries serve two distinct readers and have to satisfy both. The receiving primary-care or specialty provider needs a complete clinical picture: hospital course, problems addressed, medication changes with rationale, pending labs, follow-up plan, and red flags. The patient leaving the hospital needs the same information at a sixth-grade reading level, in plain language, with the medications and the warning signs clear enough to act on alone at home.

Both readers are routinely failed. About 88.7% of internal-medicine discharge instructions exceed the recommended reading level. Medication non-reconciliation is identified in ~50% of inpatient episodes, with 63% of those non-reconciled events carrying potential for moderate harm and 2% for severe harm. Delays in producing discharge summaries are independently associated with readmission rates. AI drafting can address each of these problems (readability, comprehensiveness, medication clarity, time-to-completion), but only when the workflow is designed around clinician review and explicit medication reconciliation.

The dual-reader pattern that works

The reference architecture in the 2025–2026 literature converges on a two-output workflow: one clinician-grade summary for the receiving provider, one patient-friendly version for the patient. Same source material, same medication list, same plan — different language and structure.

SURFACE 01

Clinician-grade discharge summary

Hospital course, problems addressed, procedures, results, medication changes with rationale, pending tasks, follow-up plan. Generated from the encounter notes plus structured chart artifacts (med list, problem list, recent vitals). Reviewed and signed by the discharging clinician before filing.

SURFACE 02

Patient-friendly version

Same content at a sixth-grade reading level, in the patient's preferred language, with medication instructions formatted for action, warning signs, and follow-up clearly listed. NEJM AI's GPT-4 plain-language work showed +2.4 subjective and +1.2 objective comprehension points.

SURFACE 03

Medication reconciliation block

Explicit "what changed" comparison: pre-admission medications, in-hospital medications, discharge medications, with the rationale for each addition / discontinuation / dose change. The single highest-leverage component of the entire summary.
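As a rough illustration of the "what changed" comparison, the sketch below classifies each medication by comparing the three lists. It is a minimal sketch, not a clinical tool: the `Med` record, the status labels, and the example drugs and doses are all hypothetical, and it assumes names have already been normalized to a single generic name per drug.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Med:
    name: str       # generic name, lower-case (assumed already normalized)
    dose: str       # e.g. "10 mg"
    frequency: str  # e.g. "daily"


def reconcile(pre_admission, in_hospital, discharge):
    """Return (name, status) rows comparing the three medication lists."""
    pre = {m.name: m for m in pre_admission}
    inpatient = {m.name: m for m in in_hospital}
    out = {m.name: m for m in discharge}
    rows = []
    for name in sorted(set(pre) | set(inpatient) | set(out)):
        before, after = pre.get(name), out.get(name)
        if before is not None and after is not None:
            status = "continued" if before == after else "changed"
        elif before is not None:
            status = "discontinued"    # on the home list, missing at discharge
        elif after is not None:
            status = "new"             # not on the home list, present at discharge
        else:
            status = "inpatient_only"  # given in hospital, not carried home
        rows.append((name, status))
    return rows


def needs_rationale(rows):
    """Additions, discontinuations, and dose changes need a documented reason."""
    return [name for name, status in rows
            if status in {"new", "discontinued", "changed"}]


# Invented example data, for illustration only.
home = [Med("metformin", "500 mg", "twice daily"),
        Med("lisinopril", "10 mg", "daily"),
        Med("atorvastatin", "20 mg", "daily")]
hospital = [Med("metformin", "500 mg", "twice daily"),
            Med("lisinopril", "10 mg", "daily"),
            Med("enoxaparin", "40 mg", "daily")]
going_home = [Med("metformin", "500 mg", "twice daily"),
              Med("lisinopril", "20 mg", "daily"),
              Med("apixaban", "5 mg", "twice daily")]

rows = reconcile(home, hospital, going_home)
flagged = needs_rationale(rows)
```

In this sketch an inpatient-only medication that is wrongly retained on the discharge list shows up as "new" and is therefore flagged for a rationale, which is one way the reviewer gets a chance to catch it.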

SURFACE 04

Teach-back integration

The patient-friendly version is the input to the bedside teach-back at discharge. The clinician asks the patient to explain back the medications and warning signs, so the AI summary works as a structured prompt rather than only a takeaway document.

What the published evidence shows

Quality parity on the clinician summary. The Dutch academic-hospital evaluation (292 paired summaries, eight blinded clinicians) found comparable mean Likert quality scores: 3.67 (LLM) vs 3.77 (physician). LLM drafts were more concise and coherent; physician drafts were more comprehensive.
Patient comprehension improves materially. NEJM AI's GPT-4 plain-language translation study found subjective comprehension scores rose 2.4 points and objective scores rose 1.2 points across diagnoses, with the largest gains in populations with historically low health literacy.
Readability shifts. Expert evaluators rated AI-generated patient-friendly summaries comprehensible in 88–97% of cases, compared with a baseline in which ~88.7% of original instructions exceed a sixth-grade reading level.
Multimodal extensions. Stanford's ED-Explain (PSB 2026) showed AI-generated discharge instructions were significantly more complete, correct, and accessible than originals, with personalized video as an additional surface for low-literacy patients.
The error count caveat. The same Dutch study reported 2.91 errors per LLM summary vs 1.82 per physician summary. Omitted facts and reduced comprehensiveness are the main failure modes. This is why "review and sign" must be a real review step, not a one-click confirmation.

Where it goes wrong — the patterns to plan around

Medication-list errors. The dominant safety failure. 88% of patients have at least one medication discrepancy after discharge (median 3 per patient). Common patterns: unintentional discontinuation of a chronic medication, inappropriate retention of an inpatient-only medication, dose / frequency drift, missing rationale for a change. Mitigation: explicit reconciliation block, enforced before signature.
Comprehensiveness drops. LLM drafts read better but skip more. Mitigation: structured input (the chart artifacts the LLM is given) must include problem list, med list, recent labs, pending tasks, and active orders, not just the encounter narrative.
Plain-language overshoot. Aggressive simplification can drop clinically critical nuance. Mitigation: keep the clinician-grade summary as ground truth, derive the patient version from it, and have the same clinician review both.
Language and equity gaps. Comprehension gains are largest where health literacy is lowest, but accuracy in non-English drafts depends on model coverage. Evaluate per language, not in aggregate.
"Sign without reading" automation bias. A pre-filled draft is harder to scrutinize than a blank one. Mitigation: friction in the review UI (require explicit acknowledgement of the medication block), random spot audit by pharmacy or a second clinician, edit-distance tracking.

The evaluation rubric that survives the demo

METRIC 01

Medication accuracy

Per-medication audit: pre-admission, in-hospital, discharge. Catch unintentional discontinuations, retained inpatient meds, and dose drift. Lead metric — every other gain is undermined if this one fails.

METRIC 02

Completeness

Coverage of: hospital course, problems addressed, medication changes with rationale, pending labs, follow-up plan, red flags. Pull a structured checklist from each summary; missing items are the LLM's known weakness.
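One way to make this checklist mechanical is sketched below. It assumes the summary has already been parsed into named sections; the section keys and the example summary are hypothetical, and a non-empty section is treated as "covered", which says nothing about whether its content is correct.

```python
# Section keys are hypothetical names for the checklist items above.
REQUIRED_SECTIONS = (
    "hospital_course",
    "problems_addressed",
    "medication_changes",
    "pending_labs",
    "follow_up_plan",
    "red_flags",
)


def missing_sections(summary):
    """List required sections that are absent or blank in a parsed summary."""
    return [name for name in REQUIRED_SECTIONS
            if not str(summary.get(name) or "").strip()]


def completeness(summary):
    """Fraction of required sections that contain any text."""
    covered = len(REQUIRED_SECTIONS) - len(missing_sections(summary))
    return covered / len(REQUIRED_SECTIONS)


# Invented example: two of the six sections are missing or blank.
example_summary = {
    "hospital_course": "Admitted with pneumonia, treated with antibiotics.",
    "problems_addressed": "Community-acquired pneumonia.",
    "medication_changes": "Started antibiotic course; rationale documented.",
    "pending_labs": "   ",
    "follow_up_plan": "Primary-care visit in one week.",
}

gaps = missing_sections(example_summary)
score = completeness(example_summary)
```

A presence check like this only catches whole-section omissions. Facts omitted within a section still need a human reviewer.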

METRIC 03

Reading level

Flesch-Kincaid Grade Level on the patient-friendly version. Target ≤ 6th grade. Per-language reading-level checks for non-English drafts.
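For reference, the Flesch-Kincaid Grade Level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula with a deliberately naive vowel-group syllable counter, so its scores are approximate and will differ somewhat from dedicated readability tools. The formula is calibrated for English only, and the two sample texts are invented.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text):
    """Flesch-Kincaid Grade Level for English text (approximate)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


plain = "Take one pill each day. Call us if you feel sick."
dense = ("Administer anticoagulation therapy continuously; discontinue "
         "immediately following hemorrhagic complications.")

plain_grade = fk_grade(plain)
dense_grade = fk_grade(dense)
meets_target = plain_grade <= 6
```

Very short, simple text can score below zero on this scale. That is a property of the formula rather than a bug.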

METRIC 04

Patient comprehension

Teach-back pass rate at the bedside. Optional: post-discharge phone-call comprehension check at 48 hours. The downstream metric that predicts readmissions.

METRIC 05

Edit distance

Clinician edits between draft and signed summary. Per section. Big edits in medications or follow-up are red flags about the upstream chart artifacts being passed to the model.
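A simple proxy for per-section edit distance is sketched below using the similarity ratio from Python's standard `difflib` module, computed over whitespace-separated tokens. This is not a true Levenshtein distance, and the section names and example text are hypothetical. It reports the fraction of each section that changed between the draft and the signed version.

```python
import difflib


def edit_fraction(draft_text, signed_text):
    """Share of tokens changed: 0.0 means identical, 1.0 means nothing in common."""
    matcher = difflib.SequenceMatcher(
        None, draft_text.split(), signed_text.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()


def section_edits(draft, signed):
    """Per-section edit fraction for two {section_name: text} mappings."""
    names = sorted(set(draft) | set(signed))
    return {name: edit_fraction(draft.get(name, ""), signed.get(name, ""))
            for name in names}


# Invented example: the clinician corrected a dose in the medication section.
draft = {
    "medications": "lisinopril 10 mg daily",
    "follow_up": "See your family doctor in one week",
}
signed = {
    "medications": "lisinopril 20 mg daily",
    "follow_up": "See your family doctor in one week",
}

edits = section_edits(draft, signed)
```

Tracking the fraction per section rather than per document keeps a small but critical medication edit from being diluted by long unchanged narrative sections.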

METRIC 06

Time to discharge summary

Discharge order to filed summary, in minutes. Delayed summaries are independently associated with readmission rates and downstream prescribing errors.
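The measurement itself is a timestamp difference. The sketch below assumes both events are available as ISO 8601 strings in the same time zone; the timestamps are invented. It reports the median because a few very late summaries can distort a mean.

```python
from datetime import datetime
from statistics import median


def minutes_to_summary(order_time, filed_time):
    """Minutes from discharge order to filed summary (ISO 8601 strings)."""
    ordered = datetime.fromisoformat(order_time)
    filed = datetime.fromisoformat(filed_time)
    return (filed - ordered).total_seconds() / 60


# Invented (discharge order, summary filed) pairs.
discharges = [
    ("2025-04-01T10:00", "2025-04-01T10:45"),
    ("2025-04-01T13:30", "2025-04-01T15:30"),
    ("2025-04-02T09:15", "2025-04-02T09:35"),
]

durations = [minutes_to_summary(o, f) for o, f in discharges]
median_minutes = median(durations)
```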

METRIC 07

Readmission and 30-day adverse events

Track at the cohort level after rollout. Slow signal but the one the C-suite cares about. Use a difference-in-differences design against a non-rollout unit if possible.
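The arithmetic behind a difference-in-differences estimate is small enough to show directly. The sketch below uses invented 30-day readmission rates (per 100 discharges) for a rollout unit and a comparison unit. It is only the point estimate: a real analysis would also need confidence intervals and a check that the two units were trending in parallel before rollout.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the rollout unit minus change in the comparison unit."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Invented readmission rates per 100 discharges.
rollout_before, rollout_after = 18.0, 15.0
comparison_before, comparison_after = 17.0, 16.0

effect = diff_in_diff(rollout_before, rollout_after,
                      comparison_before, comparison_after)
# The rollout unit fell by 3.0 and the comparison unit by 1.0,
# so the estimated effect of the rollout is -2.0 per 100 discharges.
```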

METRIC 08

Equity breakdown

Every metric above split by patient primary language and health-literacy proxy. Aggregate numbers are comforting; the disaggregated picture is the true safety signal.
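The split can be as simple as grouping per-patient records before computing each rate. The sketch below does this for teach-back pass rate by primary language. The record fields and the data are hypothetical, and a real breakdown would also report the group sizes so that small groups are not over-interpreted.

```python
from collections import defaultdict


def rate_by_group(records, group_key, outcome_key):
    """Pass rate of a boolean outcome within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for record in records:
        tally = counts[record[group_key]]
        tally[0] += 1 if record[outcome_key] else 0
        tally[1] += 1
    return {group: passed / total for group, (passed, total) in counts.items()}


# Invented per-patient records.
records = [
    {"language": "English", "teach_back_passed": True},
    {"language": "English", "teach_back_passed": True},
    {"language": "English", "teach_back_passed": True},
    {"language": "English", "teach_back_passed": False},
    {"language": "Spanish", "teach_back_passed": True},
    {"language": "Spanish", "teach_back_passed": False},
]

overall = sum(r["teach_back_passed"] for r in records) / len(records)
by_language = rate_by_group(records, "language", "teach_back_passed")
```

In this made-up data the aggregate pass rate sits between the two groups and hides the lower rate in the smaller group, which is the effect the disaggregated view is meant to expose.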

Cloud commercial vs. on-prem — the architecture choice

Discharge drafting touches the broadest patient-data surface in the hospital: full hospital course, complete medication history, problem list, labs, vitals, plan. The cloud-vs-on-prem choice is therefore most often decided by which architecture can credibly keep that entire surface inside its inference path.

Dimension	Cloud commercial draft tools	On-prem (WalledCare)
Data surface	Notes + structured artifacts shipped to vendor cloud per discharge.	Notes + chart artifacts processed inside hospital data center.
Medication-rec integration	Variable; depends on EHR + pharmacy integration depth.	Native: medication system → reconciliation block → summary, all on-prem.
Patient-friendly translation	Available; depends on vendor's plain-language model coverage.	Same model serves clinical + patient surfaces; translation is one prompt away.
Multilingual support	Vendor-supplied; configuration limited.	Customer chooses model with appropriate language coverage.
Audit	Vendor-side, exposed via API.	Append-only audit log inside hospital data center; native to the same compliance stack as everything else.
Residency (Canadian provincial)	Cloud surface is presumptively non-compliant with PHIPA, HIA, Law 25 without province-resident infrastructure.	Province-resident, no outbound API.

For most hospitals, the on-prem path is the cleaner architecture for discharge specifically — because the data surface is largest, the safety stakes are highest, and the medication reconciliation step benefits most from being hosted next to the pharmacy system rather than across a vendor boundary.

How this fits into a multi-app local stack

Discharge drafting shares infrastructure with the rest of the on-prem clinical AI stack: the same retrieval layer that backs document Q&A sources patient-instruction language from the approved patient-education library; the same audit log that captures policy lookups captures discharge edits; the same FHIR integration that grounds ambient scribes grounds discharge medication reconciliation; the same handoff-style summary that feeds shift handoffs gets reused for inter-facility transfer.

Ambient scribing: hospital-course documentation flows into the discharge summary at the end of the stay, using the same audio capture and the same structured note artifact.

Document Q&A: patient-instruction language is sourced from the approved internal patient-education library, not the public internet.

Guideline retrieval: local guidelines ground the medication-reconciliation rationale and red-flag selection.

Shift handoff: SBAR-style summaries are derived from the same hospital-course content used to draft the discharge summary.

Pick a unit, run a real pilot

The fastest path to a defensible discharge-summary deployment is to scope one unit (typically internal medicine or hospital medicine), define the rubric above, and run a 60-day pilot with mandatory clinician review and explicit medication-reconciliation tracking. Compare a cloud commercial draft tool against an on-prem reference stack on every metric — but particularly on medication accuracy, time-to-summary, and patient comprehension at teach-back.
