Contributor: Unlocking the Value of Unstructured Data From Continuity of Care Documents Using Medical-Grade NLP

Commentary
Article

Clinical-grade, expert-supported natural language processing (NLP) is valuable to payers and providers when exchanging patient information through continuity-of-care documents.

 Kim Perry | Image Credit: emtelligent

One of the most critical documents in health care is the Continuity of Care Document (CCD). This patient-specific electronic document contains vital patient health information and facilitates the exchange of patient data between health care providers. CCDs are essential to patient safety and outcomes as people move between health care settings or providers.

In addition to enabling continuity of care, CCDs are commonly used to exchange patient information between providers and payers. Although CCDs today are based on the HL7 Consolidated Clinical Document Architecture (C-CDA), an initiative to create a standards-based means of exchanging patient information, the content in these electronic documents can vary widely depending on the software used to generate them and on which data administrators or users choose to include.

Stage 1 of CMS’ Meaningful Use program, launched in 2011 to create incentives to accelerate the adoption of electronic health records (EHRs), required that CCDs contain sections only for problems, allergies, medications, and lab results. Stage 2, which encouraged the use of EHRs to promote information exchange and quality improvement at the point of care, mandated the inclusion of additional data, such as vital signs, smoking status, and care plans. In 2015, Stages 1 and 2 were consolidated into a new program.

The Complexity of CCDs

CCD files are formatted in Extensible Markup Language (XML) and contain multiple sections that include both HTML-formatted and XML-formatted data. Unique identifiers link the HTML and XML data, although the content of the two can differ slightly.
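To make the linkage concrete, here is a minimal Python sketch that pairs each structured entry with the human-readable text it points to. It assumes the common C-CDA pattern in which a narrative element carries an `ID` attribute and the structured entry refers to it with `<reference value="#...">`. The sample document, the `EXAMPLE-001` code, and the function name `link_narrative_to_entries` are invented for illustration; real CCDs are far larger and vary by vendor.

```python
import xml.etree.ElementTree as ET

HL7 = "urn:hl7-org:v3"   # default namespace used by HL7 CDA documents
NS = {"hl7": HL7}

# Invented, heavily simplified sample; not taken from any real patient record.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component>
    <section>
      <title>Medications</title>
      <text>
        <list>
          <item><content ID="med1">Lisinopril 10 mg oral tablet, once daily</content></item>
        </list>
      </text>
      <entry>
        <substanceAdministration classCode="SBADM" moodCode="EVN">
          <text><reference value="#med1"/></text>
          <consumable><manufacturedProduct><manufacturedMaterial>
            <code code="EXAMPLE-001" displayName="lisinopril 10 mg oral tablet"/>
          </manufacturedMaterial></manufacturedProduct></consumable>
        </substanceAdministration>
      </entry>
    </section>
  </component></structuredBody></component>
</ClinicalDocument>"""


def link_narrative_to_entries(xml_text):
    """Pair each structured entry with the narrative text it references."""
    root = ET.fromstring(xml_text)
    links = []
    for section in root.iter(f"{{{HL7}}}section"):
        title = section.findtext("hl7:title", default="", namespaces=NS)

        # Index every narrative element that carries an ID attribute.
        narrative_by_id = {}
        narrative = section.find("hl7:text", NS)
        if narrative is not None:
            for el in narrative.iter():
                if "ID" in el.attrib:
                    narrative_by_id[el.attrib["ID"]] = "".join(el.itertext()).strip()

        # Resolve each entry's "#id" reference against that index.
        for entry in section.findall("hl7:entry", NS):
            code_el = entry.find(".//hl7:code", NS)
            for ref in entry.iter(f"{{{HL7}}}reference"):
                ref_id = ref.get("value", "").lstrip("#")
                links.append({
                    "section": title,
                    "ref_id": ref_id,
                    "narrative": narrative_by_id.get(ref_id),  # None if unresolved
                    "code": code_el.get("code") if code_el is not None else None,
                    "display": code_el.get("displayName") if code_el is not None else None,
                })
    return links


links = link_narrative_to_entries(SAMPLE)
for link in links:
    print(link)
```

Keeping the narrative text next to the coded value makes it easy to spot where the two disagree, which is one reason a consumer of CCDs may want to read both rather than trusting either alone.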

How an EHR system is configured to create CCDs affects the type and amount of information in the file. Some EHRs are configured to produce CCDs that include vast amounts of detailed historical patient data, while others may include only more recent patient information.

The Unstructured Data Problem

Among the biggest challenges involving CCDs is how to extract and exchange unstructured data such as free-text clinician notes and free-text embedded documents. Patients with long or complex care histories can have hundreds of reports in their CCDs, including megabytes of unstructured data that are difficult or nearly impossible to extract without the right tools.

These free-text clinical notes and reports in CCDs typically contain specific procedure details, diagnostic data, or other results that clinicians need to provide effective and appropriate care. For example, a surgeon may include free-text notes in a CCD detailing a complication from surgery. Such clinically important information, which can be used to predict important outcomes in a patient, may not be captured elsewhere in the CCD and could be overlooked.

Medical-Grade NLP Unlocks Unstructured Data

Finding and extracting unstructured data, however, can be extremely difficult for payers and providers. Having humans search CCDs for unstructured data takes far too long and consumes time they could spend on other activities. Unfortunately, digital tools, such as traditional natural language processing (NLP), have not been up to the task of coping with the complexities of CCD architecture. Until now.

New tools exist in the form of medical-grade NLP powered by artificial intelligence (AI) that can unleash the value contained in unstructured data within CCDs. Clinical-grade medical NLP software reads and understands unstructured clinical notes—capabilities lacking in traditional NLP. It can extract data from these free-text notes in a structured format—while linking back to the source information and context—store the data, and then make it available to use in multiple ways.
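The idea of structured output that links back to its source can be illustrated with a small Python sketch. It is not a clinical NLP engine: the three-term lexicon, the crude negation rule, and the `Finding` and `extract_findings` names are all hypothetical stand-ins chosen for illustration. What it shows is the shape of the result, in which each extracted item keeps the character offsets and section of the note it came from.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    concept: str   # normalized label (hypothetical vocabulary)
    text: str      # exact words matched in the note
    start: int     # character offset where the match begins
    end: int       # character offset where the match ends
    section: str   # CCD section the note came from
    negated: bool  # True if a negation cue precedes the match in its sentence


# Toy lexicon, an assumption for this sketch; real systems rely on trained
# models and large clinical vocabularies rather than a lookup table.
LEXICON = {
    "anastomotic leak": "anastomotic_leak",
    "shortness of breath": "shortness_of_breath",
    "sob": "shortness_of_breath",  # common clinical shorthand
}

# Crude rule: a negation cue between the sentence start and the match.
NEGATION = re.compile(r"\b(no|denies|without)\b[^.]*$", re.IGNORECASE)


def extract_findings(note, section):
    """Return findings with offsets that point back into the source note."""
    findings = []
    for surface, concept in LEXICON.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, note, re.IGNORECASE):
            sentence_start = note.rfind(".", 0, m.start()) + 1
            negated = bool(NEGATION.search(note[sentence_start:m.start()]))
            findings.append(
                Finding(concept, m.group(0), m.start(), m.end(), section, negated)
            )
    return sorted(findings, key=lambda f: f.start)


note = "Postoperative course complicated by anastomotic leak. Patient denies SOB."
findings = extract_findings(note, section="Hospital Course")
for f in findings:
    # The offsets let a reviewer jump straight to the supporting text.
    print(f.concept, f.negated, repr(note[f.start:f.end]))
```

Storing offsets rather than only the extracted label is the design choice that makes audit possible: a reviewer can always return to the exact sentence behind a structured value.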

Clinical-grade NLP overcomes the problems with inaccuracies and the inability to understand shorthand and abbreviations that have limited the usefulness of traditional NLP. This high-quality NLP contextualizes language in medical notes and accurately identifies common medical terms. By infusing deep learning models with specialized medical knowledge, modern medical NLP software can help providers, payers, pharmaceutical companies, and clinical researchers optimize their data.

Clinical-grade NLP can process and understand information from unstructured medical data far faster than humans and at scale, which is invaluable for large payer and provider organizations with tens of thousands of plan members and patients. Indeed, clinical-grade NLP can help payers identify high-risk populations, develop targeted interventions, and enhance risk adjustment.

Capture the Full Value of CCDs for Payers and Providers

For CCD consumers such as payers and providers, the value of clinical-grade, expert-supported NLP is clear and compelling. With its ability to structure unstructured text data and connect it with structured data, medical-grade NLP is a foundational technology that can transform how health care is practiced and delivered.

© 2025 MJH Life Sciences
AJMC®
All rights reserved.