Case Study Coming soon

High-Risk Disease Detection — Clinical Document Intelligence

A production system that reads ~3 million inbound clinical documents per month (ER records, specialist notes, labs, faxes) and uses vector embeddings to surface high-risk conditions for clinical review in near real time.

Client: ChenMed LLC
Role: Principal Architect
Period: 2023 – Present
Scale: ~3M documents/month · XML, PDF, scanned TIF · 12,000+ recommendations in a two-week period

In value-based care, a diagnosis that arrives from an outside encounter and sits unread is a patient outcome waiting to get worse. This system reads every inbound clinical document and uses vector embeddings to find semantic matches against ICD-10, HCC, and high-risk condition taxonomies (cancers, diabetes, cardiovascular and renal disease), so that care teams learn about new diagnoses in near real time.
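
To make the matching idea concrete, here is a minimal sketch. It is not the production implementation. It assumes a toy bag-of-words vector in place of a real embedding model, a three-entry hypothetical taxonomy in place of the full ICD-10 and HCC sets, and an arbitrary similarity threshold. The names embed, cosine, surface_matches, and TAXONOMY are illustrative only.

```python
import math
from collections import Counter

# Hypothetical miniature taxonomy: code label -> descriptive text.
# Real ICD-10 / HCC sets are far larger; these entries are illustrative only.
TAXONOMY = {
    "E11 type 2 diabetes mellitus": "type 2 diabetes mellitus hyperglycemia insulin",
    "N18 chronic kidney disease": "chronic kidney disease renal failure",
    "C50 malignant neoplasm of breast": "breast cancer malignant neoplasm carcinoma",
}


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector.

    Assumption: a real system would call a trained embedding model here.
    Tokenization is a plain whitespace split, so punctuation is not stripped.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def surface_matches(document, threshold=0.3):
    """Return (label, score) pairs at or above the threshold, best first.

    The threshold value is an arbitrary placeholder; in practice it would be
    tuned against clinical review feedback.
    """
    doc_vec = embed(document)
    scored = [
        (label, cosine(doc_vec, embed(description)))
        for label, description in TAXONOMY.items()
    ]
    hits = [(label, score) for label, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    note = "new diagnosis of chronic kidney disease with renal failure"
    for label, score in surface_matches(note):
        print(f"{label}: {score:.2f}")
```

The threshold is the main tuning knob in this sketch. Raising it surfaces fewer, higher-confidence matches for reviewers, and lowering it trades reviewer workload for recall.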

The full write-up will cover:

  • Vector-embedding document classification across heterogeneous formats (XML, PDF, scanned TIF)
  • Building taxonomy matching against ICD-10, HCC, and high-risk condition sets
  • The clinical-review workflow: tuning what surfaces and what does not
  • Operating semantic search at the scale of millions of documents per month
  • Why manual review cannot scale to this problem, and what that means for patient safety

Full case study coming soon.

