Discharge Summaries
Discharge is the inflection point where most preventable adverse events happen. The summary serves two readers — the next clinician taking over care, and the patient going home — and it has to be accurate, complete, and readable. AI drafting helps with all three goals when the workflow is designed around dual-reader output, medication-reconciliation discipline, and mandatory clinician review. This guide is the buyer's view: what the published 2025–2026 evidence shows, where the failure modes cluster, the evaluation rubric that matters, and where the on-prem path is the cleaner architecture.
3.67 vs 3.77. Mean Likert quality scores for LLM- vs. physician-generated discharge narratives in a 2025 evaluation. Comparable on overall quality; LLM drafts were more concise and coherent but less comprehensive. Eight blinded clinicians, 292 paired summaries, Dutch academic hospital, April 2025.
2.91 vs 1.82. Unique errors per summary for LLM drafts vs. physicians in the same study. This is why mandatory clinician review is non-negotiable, not a polite recommendation.
88.7%. Share of internal-medicine discharge instructions that exceed the recommended sixth-grade reading level, making them inaccessible to most of the patients they serve. This is the gap a patient-friendly draft is built to close.
88%. Share of patients with at least one medication discrepancy after discharge in a 2025 cross-sectional study (median 3 per patient). Medication reconciliation is the dominant safety failure mode AI drafting must explicitly address.
Why discharge is the hardest documentation moment
Discharge summaries serve two distinct readers and have to satisfy both. The receiving primary-care or specialty provider needs a complete clinical picture: hospital course, problems addressed, medication changes with rationale, pending labs, follow-up plan, and red flags. The patient leaving the hospital needs the same information at a sixth-grade reading level, in plain language, with the medications and the warning signs clear enough to act on alone at home.
Both readers are consistently failed. About 88.7% of internal-medicine discharge instructions exceed the recommended sixth-grade reading level. Medication non-reconciliation is identified in ~50% of inpatient episodes, with 63% of those non-reconciled events carrying potential for moderate harm and 2% for severe harm. Delays in producing discharge summaries are independently associated with readmission rates. AI drafting can address each of these (readability, comprehensiveness, medication clarity, time-to-completion) when the workflow is designed around clinician review and explicit medication reconciliation.
The dual-reader pattern that works
The reference architecture in the 2025–2026 literature converges on a two-output workflow: one clinician-grade summary for the receiving provider, one patient-friendly version for the patient. Same source material, same medication list, same plan — different language and structure.
Clinician-grade summary. Hospital course, problems addressed, procedures, results, medication changes with rationale, pending tasks, follow-up plan. Generated from the encounter notes plus structured chart artifacts (med list, problem list, recent vitals). Reviewed and signed by the discharging clinician before filing.
Patient-friendly version. Same content at a sixth-grade reading level, in the patient's preferred language, with medication instructions formatted for action and with warning signs and follow-up clearly listed. NEJM AI's GPT-4 plain-language work showed +2.4 subjective and +1.2 objective comprehension points.
Medication reconciliation block. Explicit "what changed" comparison: pre-admission medications, in-hospital medications, discharge medications, with the rationale for each addition, discontinuation, or dose change. The single highest-leverage component of the entire summary.
Teach-back. The patient-friendly version is the input to the bedside teach-back at discharge. The clinician asks the patient to explain back the medications and warning signs, so the AI summary becomes a structured prompt rather than a takeaway document.
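The "what changed" comparison can be sketched as a three-way diff over the medication lists. This is a minimal illustration, not a production reconciliation engine: the `Med` record, the status labels, and the `reconcile` and `unresolved` helpers are hypothetical names, and it assumes drug names were already normalized upstream (for example to generic names).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Med:
    name: str       # normalized generic name (assumed to be mapped upstream)
    dose: str       # e.g. "10 mg"
    frequency: str  # e.g. "once daily"


def reconcile(pre_admission, inpatient, discharge):
    """Classify each medication by what changed between admission and discharge."""
    pre = {m.name: m for m in pre_admission}
    inp = {m.name: m for m in inpatient}
    dis = {m.name: m for m in discharge}
    rows = []
    for name in sorted(set(pre) | set(inp) | set(dis)):
        before, after = pre.get(name), dis.get(name)
        if before and after:
            status = "continued" if before == after else "dose_or_frequency_changed"
        elif before:
            status = "discontinued"
        elif after and name in inp:
            status = "started_in_hospital_and_continued"
        elif after:
            status = "new_at_discharge"
        else:
            status = "inpatient_only"
        # The rationale is left empty here; a clinician fills it in during review.
        rows.append({"medication": name, "status": status, "rationale": None})
    return rows


def unresolved(rows):
    """Changes at discharge that still lack a rationale (candidates to block signature)."""
    no_rationale_needed = {"continued", "inpatient_only"}
    return [r for r in rows if r["status"] not in no_rationale_needed and not r["rationale"]]


# Illustrative data only.
pre = [Med("atorvastatin", "20 mg", "once daily"),
       Med("lisinopril", "10 mg", "once daily"),
       Med("metformin", "500 mg", "twice daily")]
inp = [Med("apixaban", "5 mg", "twice daily"),
       Med("heparin", "5000 units", "every 8 hours"),
       Med("lisinopril", "10 mg", "once daily"),
       Med("metformin", "500 mg", "twice daily")]
dis = [Med("apixaban", "5 mg", "twice daily"),
       Med("lisinopril", "20 mg", "once daily"),
       Med("metformin", "500 mg", "twice daily")]

rows = reconcile(pre, inp, dis)
for row in rows:
    print(row["medication"], "->", row["status"])
```

In this toy example atorvastatin surfaces as discontinued, which is exactly the kind of possibly unintentional change a reviewer should have to explain before signing. An inpatient-only drug that was wrongly retained would surface as `started_in_hospital_and_continued` and would also require a rationale.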
What the published evidence shows
- Quality parity on the clinician summary. The Dutch academic-hospital evaluation (292 paired summaries, eight blinded clinicians) found mean Likert quality scores of 3.67 (LLM) vs 3.77 (physician), which is comparable. LLM drafts were more concise and coherent; physician drafts were more comprehensive.
- Patient comprehension improves materially. NEJM AI's GPT-4 plain-language translation study found subjective comprehension scores rose 2.4 points and objective scores rose 1.2 points across diagnoses, with the largest gains in populations with historically low health literacy.
- Readability shifts. Expert evaluators rated AI-generated patient-friendly summaries comprehensible in 88–97% of cases, against a baseline where ~88.7% of original instructions exceed a sixth-grade reading level.
- Multimodal extensions. Stanford's ED-Explain (PSB 2026) showed AI-generated discharge instructions were significantly more complete, correct, and accessible than originals, with personalized video as an additional surface for low-literacy patients.
- The error count caveat. The same Dutch study reported 2.91 unique errors per LLM summary vs 1.82 per physician summary. Omitted facts and reduced comprehensiveness are the main failure modes. This is the reason "review and sign" must be a real review step, not a one-click confirmation.
Where it goes wrong — the patterns to plan around
- Medication-list errors. The dominant safety failure. In a 2025 cross-sectional study, 88% of patients had at least one medication discrepancy after discharge (median 3 per patient). Common patterns: unintentional discontinuation of a chronic medication, inappropriate retention of an inpatient-only medication, dose or frequency drift, missing rationale for a change. Mitigation: explicit reconciliation block, enforced before signature.
- Comprehensiveness drops. LLM drafts read better but skip more. Mitigation: the structured input (the chart artifacts the LLM is given) must include problem list, med list, recent labs, pending tasks, and active orders, not just the encounter narrative.
- Plain-language overshoot. Aggressive simplification can drop clinically critical nuance. Mitigation: keep the clinician-grade summary as ground truth, derive the patient version from it, and have the same clinician review both.
- Language and equity gaps. Comprehension gains are largest where health literacy is lowest, but accuracy in non-English drafts depends on model coverage. Mitigation: evaluate per language, not in aggregate.
- "Sign without reading" automation bias. A pre-filled draft is harder to scrutinize than a blank one. Mitigation: friction in the review UI (require explicit acknowledgement of the medication block), random spot audits by pharmacy or a second clinician, and edit-distance tracking.
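The review-friction and edit-distance mitigations above can be approximated with the standard library. The sketch below is one possible shape, not a vendor feature: `edit_fraction`, `review_flags`, the section names, and the 60-second threshold are all hypothetical, and word-level `difflib.SequenceMatcher` similarity stands in for whatever edit-distance measure a real deployment chooses.

```python
from difflib import SequenceMatcher


def edit_fraction(draft: str, signed: str) -> float:
    """Share of a section that changed between AI draft and signed text (0.0 = untouched)."""
    matcher = SequenceMatcher(None, draft.split(), signed.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def review_flags(draft_sections, signed_sections, med_block_acknowledged,
                 seconds_in_review, min_seconds=60):
    """Return per-section edit fractions plus reasons to route the summary to a spot audit."""
    edits = {
        name: edit_fraction(text, signed_sections.get(name, ""))
        for name, text in draft_sections.items()
    }
    flags = []
    if not med_block_acknowledged:
        flags.append("medication block not acknowledged")
    # Zero edits after a very short review is a possible sign of automation bias.
    if all(value == 0.0 for value in edits.values()) and seconds_in_review < min_seconds:
        flags.append("signed unchanged after a very short review")
    return edits, flags


# Illustrative draft sections only.
draft = {
    "medications": "Continue metformin 500 mg twice daily.",
    "follow_up": "See primary care within 7 days.",
}
edits, flags = review_flags(draft, dict(draft), med_block_acknowledged=False,
                            seconds_in_review=20)
print(edits, flags)
```

An unchanged draft is not wrong by itself, so the flags here are inputs to a random spot audit rather than hard blocks. Only the medication-block acknowledgement is a natural candidate for a hard gate.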
The evaluation rubric that survives the demo
Medication accuracy. Per-medication audit across pre-admission, in-hospital, and discharge lists. Catch unintentional discontinuations, retained inpatient meds, and dose drift. This is the lead metric: every other gain is undermined if this one fails.
Completeness. Coverage of hospital course, problems addressed, medication changes with rationale, pending labs, follow-up plan, and red flags. Pull a structured checklist from each summary; missing items are the LLM's known weakness.
Readability. Flesch-Kincaid Grade Level on the patient-friendly version, with a target of sixth grade or below. Run per-language reading-level checks for non-English drafts.
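The Flesch-Kincaid Grade Level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula with a deliberately crude vowel-group syllable counter. `count_syllables` and `fk_grade` are hypothetical helper names, the heuristic will miscount some words, and the formula is calibrated for English only, so treat the output as a screening signal rather than a validated score.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable estimate: count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text has no sentences or words")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Illustrative sentences only.
plain = "Take one pill each day. Call us if you feel sick."
clinical = ("Administer anticoagulation therapy daily; monitor for hemorrhagic "
            "complications and hypersensitivity reactions.")

for label, text in (("plain", plain), ("clinical", clinical)):
    grade = fk_grade(text)
    print(label, round(grade, 1), "meets target" if grade <= 6 else "above target")
```

Very short, simple text can score below zero on this scale. That is expected behavior of the formula, not a bug in the sketch.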
Patient comprehension. Teach-back pass rate at the bedside. Optional: a post-discharge phone-call comprehension check at 48 hours. This is the downstream metric that predicts readmissions.
Edit distance. Clinician edits between draft and signed summary, tracked per section. Big edits in medications or follow-up are red flags about the upstream chart artifacts being passed to the model.
Time-to-summary. Minutes from discharge order to filed summary. Delayed summaries are independently associated with readmission rates and downstream prescribing errors.
Readmission rate. Track at the cohort level after rollout. It is a slow signal, but the one the C-suite cares about. Use a difference-in-differences design against a non-rollout unit if possible.
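The arithmetic behind a difference-in-differences estimate is simple: the change in the rollout unit minus the change in the comparison unit over the same period. The sketch below uses invented counts purely for illustration, and `readmission_rate` and `diff_in_diff` are hypothetical helper names. A real analysis would also need confidence intervals and a check that the two units were trending in parallel before rollout.

```python
def readmission_rate(readmissions: int, discharges: int) -> float:
    """Readmissions as a fraction of discharges for one unit and period."""
    if discharges <= 0:
        raise ValueError("discharges must be positive")
    return readmissions / discharges


def diff_in_diff(pilot_pre: float, pilot_post: float,
                 comparison_pre: float, comparison_post: float) -> float:
    """Change in the pilot unit minus change in the comparison unit."""
    return (pilot_post - pilot_pre) - (comparison_post - comparison_pre)


# Illustrative counts only, not results from any study.
pilot_pre = readmission_rate(90, 500)        # 18% before rollout
pilot_post = readmission_rate(75, 500)       # 15% after rollout
comparison_pre = readmission_rate(85, 500)   # 17% in the non-rollout unit, before
comparison_post = readmission_rate(80, 500)  # 16% in the non-rollout unit, after

effect = diff_in_diff(pilot_pre, pilot_post, comparison_pre, comparison_post)
print(f"Estimated effect: {effect * 100:+.1f} percentage points")
```

In this example the pilot unit improved by three points, but one point of that also appeared in the comparison unit, so only about two points are attributable to the rollout under the parallel-trends assumption.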
Equity breakdown. Every metric above, split by patient primary language and health-literacy proxy. Aggregate numbers are comforting; the disaggregated picture is the true safety signal.
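A small example of why the split matters, using teach-back pass rate as the metric. The record fields (`language`, `teach_back_passed`) and the `pass_rate_by_group` helper are hypothetical, and the data is invented for illustration.

```python
from collections import defaultdict


def pass_rate_by_group(records, key):
    """Teach-back pass rate per group, where `key` names the field to split on."""
    totals = defaultdict(lambda: [0, 0])  # group -> [passed, total]
    for record in records:
        counts = totals[record[key]]
        counts[0] += 1 if record["teach_back_passed"] else 0
        counts[1] += 1
    return {group: passed / total for group, (passed, total) in totals.items()}


# Illustrative records only: 10 English-language and 5 Spanish-language patients.
records = (
    [{"language": "en", "teach_back_passed": i < 9} for i in range(10)]
    + [{"language": "es", "teach_back_passed": i < 3} for i in range(5)]
)

overall = sum(r["teach_back_passed"] for r in records) / len(records)
by_language = pass_rate_by_group(records, "language")
print(f"overall: {overall:.0%}")
for language, rate in by_language.items():
    print(f"{language}: {rate:.0%}")
```

The aggregate pass rate here looks acceptable while one language group sits well below it, which is the pattern the disaggregated view is meant to expose.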
Cloud commercial vs. on-prem — the architecture choice
Discharge drafting touches the broadest patient-data surface in the hospital: full hospital course, complete medication history, problem list, labs, vitals, plan. The cloud-versus-on-prem choice therefore usually comes down to which architecture can credibly handle that entire surface inside its inference path.
| Dimension | Cloud commercial draft tools | On-prem (WalledCare) |
|---|---|---|
| Data surface | Notes + structured artifacts shipped to vendor cloud per discharge. | Notes + chart artifacts processed inside hospital data center. |
| Medication-rec integration | Variable; depends on EHR + pharmacy integration depth. | Native: medication system → reconciliation block → summary, all on-prem. |
| Patient-friendly translation | Available; depends on vendor's plain-language model coverage. | Same model serves clinical + patient surfaces; translation is one prompt away. |
| Multilingual support | Vendor-supplied; configuration limited. | Customer chooses model with appropriate language coverage. |
| Audit | Vendor-side, exposed via API. | Append-only audit log inside hospital data center; native to the same compliance stack as everything else. |
| Residency (Canadian provincial) | Cloud surface is presumptively non-compliant with PHIPA, HIA, Law 25 without province-resident infrastructure. | Province-resident, no outbound API. |
For most hospitals, the on-prem path is the cleaner architecture for discharge specifically — because the data surface is largest, the safety stakes are highest, and the medication reconciliation step benefits most from being hosted next to the pharmacy system rather than across a vendor boundary.
How this fits into a multi-app local stack
Discharge drafting shares infrastructure with the rest of the on-prem clinical AI stack: the same retrieval layer that backs document Q&A sources patient-instruction language from the approved patient-education library; the same audit log that captures policy lookups captures discharge edits; the same FHIR integration that grounds ambient scribes grounds discharge medication reconciliation; the same handoff-style summary that feeds shift handoffs gets reused for inter-facility transfer.
Ambient scribe. Hospital-course documentation that flows into the discharge summary at the end of stay. Same audio capture, same structured note artifact.
Document Q&A. Sources patient-instruction language from the approved internal patient-education library, not the public internet.
Guideline lookup. Retrieves local guidelines to ground medication-reconciliation rationale and red-flag selection.
Shift handoff. SBAR-style summaries derived from the same hospital-course content used to draft the discharge summary.
Pick a unit, run a real pilot
The fastest path to a defensible discharge-summary deployment is to scope one unit (typically internal medicine or hospital medicine), define the rubric above, and run a 60-day pilot with mandatory clinician review and explicit medication-reconciliation tracking. Compare a cloud commercial draft tool against an on-prem reference stack on every metric — but particularly on medication accuracy, time-to-summary, and patient comprehension at teach-back.
Further reading
- PubMed: Physician- and LLM-generated hospital discharge summaries (2025)
- PMC: Quality of AI-generated vs. physician-written discharge summaries (Dutch hospital, April 2025)
- Scientific Reports: Automated discharge-summary generation from clinical data (2025)
- PMC: LLM discharge-summary preparation using real-world EMR data
- JAMA Network Open: Generative AI for patient-friendly language in discharge summaries
- NEJM AI: GPT-4 plain-language translation of clinical notes — comprehension gains
- npj Digital Medicine: Quality and safety of generative AI for patient-centred discharge instructions
- Communications Medicine: LLM to simplify discharge summaries with cardiological lifestyle recommendations
- American Journal of Medicine: Language and readability barriers in discharge instructions
- PMC: Generative AI to transform inpatient discharge summaries to patient-friendly language
- AHRQ PSNet primer: Readmissions and adverse events after discharge
- Discrepancies in medication lists after hospital discharge — multi-condition study (2025)
- PMC: Medication-information completeness in discharge summaries — Norwegian rural hospital
- PSB 2026: ED-Explain — personalized video discharge instructions
- JGIM: Physician-led verbal communication and the teach-back method
- WalledCare: On-premise clinical assistants reference stack