High-Risk Disease Detection — Clinical Document Intelligence
A production system that reads ~3 million inbound clinical documents monthly — ER records, specialist notes, labs, faxes — and uses vector embeddings to surface high-risk conditions for clinical review in near real-time.
In value-based care, a diagnosis made at an outside encounter that then sits unread is a patient outcome waiting to get worse. The system embeds each inbound document as a vector and matches it semantically against ICD-10, HCC, and high-risk condition taxonomies (cancers, diabetes, cardiovascular and renal disease), so care teams learn about new diagnoses in near real-time.
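As a rough illustration of the matching idea only, the sketch below scores a document against a handful of condition descriptions by cosine similarity and surfaces anything above a threshold. Everything in it is an assumption made for the example: the `TAXONOMY` entries, the `embed` function (a toy bag-of-words counter standing in for a learned embedding model), the `surface` helper, and the 0.3 threshold. It is not the production pipeline.

```python
import math
import re
from collections import Counter

# Hypothetical mini-taxonomy: category -> short descriptive text.
# A real system would hold ICD-10 / HCC entries instead.
TAXONOMY = {
    "diabetes": "diabetes mellitus elevated glucose insulin hyperglycemia",
    "renal disease": "chronic kidney disease renal failure reduced gfr dialysis",
    "cardiovascular disease": "heart failure myocardial infarction coronary artery disease",
    "cancer": "malignant neoplasm carcinoma tumor metastasis oncology",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def surface(document, threshold=0.3):
    """Return (category, score) pairs at or above the threshold, best first."""
    doc_vec = embed(document)
    scored = ((label, cosine(doc_vec, embed(desc))) for label, desc in TAXONOMY.items())
    hits = [(label, score) for label, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


note = "ER record: patient admitted with heart failure, history of coronary artery disease."
matches = surface(note)
print(matches)
```

With these toy vectors the note surfaces only the cardiovascular category. The renal entry shares the words "failure" and "disease" and scores just under the threshold, which hints at why tuning what surfaces for clinical review matters.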
The full write-up will cover:
- Vector-embedding document classification across heterogeneous formats (XML, PDF, scanned TIF)
- Matching documents against ICD-10, HCC, and high-risk condition taxonomies
- The clinical-review workflow: tuning what surfaces and what does not
- Operating semantic search at the scale of millions of documents per month
- Why manual review cannot scale to this problem, and what that means for patient safety
Full case study coming soon.