Offline reinforcement learning was also shown to correct for challenging control scenarios such as irregular meal timing and compression errors.
Offline reinforcement learning (RL) in hybrid closed-loop systems can significantly increase time in the healthy blood glucose range for patients living with type 1 diabetes (T1D), new research shows. In addition, offline RL can correct for common and challenging control scenarios like incorrect bolus dosing, irregular meal timing, and compression errors, the authors wrote.
Findings of the proof-of-concept study were published in the Journal of Biomedical Informatics.
Hybrid closed-loop systems allow patients with T1D to automatically regulate their basal insulin dosing.
“The majority of commercially available hybrid closed loop systems utilize proportional-integral-derivative [PID] controllers or model predictive controllers [MPCs],” the researchers explained.
Although these algorithms are easily interpretable, they can limit devices’ efficacy. PID algorithms can overestimate insulin doses after meals, while MPCs typically rely on linear or simplified models of glucose dynamics, which cannot capture the full complexity of the task, they added.
RL offers one solution to this problem, in which “a decision-making agent learns the optimal sequence of actions to take in order to maximize some concept of reward.”
Current approaches typically use online RL, which requires interaction with a patient or simulator during training. This method has limitations, however, underscoring the need for approaches capable of learning accurate dosing policies from realistically obtainable amounts of glucose data, without the associated risks.
In the current proof-of-concept study, researchers utilized offline RL for glucose control. Specifically, the RL agent did not interact with its environment during training, instead learning from a static dataset collected under a different agent’s policy.
“This entails a rigorous analysis of the offline RL algorithms—batch-constrained deep Q-learning, conservative Q-learning and twin-delayed deep deterministic policy gradient with behavioral cloning (TD3-BC)—in their ability to develop safe and high-performing insulin-dosing strategies in hybrid closed-loop systems,” the authors wrote.
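To illustrate how the last of these algorithms couples value learning with behavioral cloning, the sketch below shows a TD3-BC-style actor update in PyTorch: the standard policy-gradient term is regularized toward the actions recorded in the fixed dataset so the learned dosing policy stays close to behavior that was actually observed. The network sizes, feature dimensions, and random batch are illustrative placeholders, not details of the authors’ implementation.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 11, 1          # e.g., glucose-history features -> insulin dose (assumed shapes)
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
optimizer = torch.optim.Adam(actor.parameters(), lr=3e-4)
alpha = 2.5                            # behavioral-cloning weight from the original TD3-BC paper

def actor_update(states, dataset_actions):
    """One actor step on a batch drawn from the fixed (offline) dataset."""
    policy_actions = actor(states)
    q_values = critic(torch.cat([states, policy_actions], dim=1))
    lam = alpha / q_values.abs().mean().detach()      # normalizes the Q term
    bc_loss = ((policy_actions - dataset_actions) ** 2).mean()
    loss = -lam * q_values.mean() + bc_loss           # maximize Q while staying near logged actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with a random batch standing in for logged glucose/insulin data.
states, actions = torch.randn(64, state_dim), torch.rand(64, action_dim)
actor_update(states, actions)
```

In practice the critic is trained alongside the actor with the usual twin-delayed Q-learning targets; only the behavioral-cloning-regularized actor step is shown here.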
Investigators trained and tested the algorithms on 30 virtual patients—10 children, 10 adolescents, and 10 adults—using the UVA/Padova glucose dynamics simulator. They then assessed the algorithms’ performance and sample efficiency against the strongest current online RL and control baselines.
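The UVA/Padova virtual cohorts are also available through the open-source simglucose package, which wraps the simulator in a Gym-style interface. The snippet below shows how one virtual adolescent might be set up for data collection; the environment ID and patient name follow the simglucose README and are assumptions, not configuration details taken from the study.

```python
import gym
from gym.envs.registration import register

# Register one virtual adolescent from the UVA/Padova cohort as a Gym environment.
register(
    id='simglucose-adolescent2-v0',
    entry_point='simglucose.envs:T1DSimEnv',
    kwargs={'patient_name': 'adolescent#002'}
)

env = gym.make('simglucose-adolescent2-v0')
observation = env.reset()
for t in range(200):
    action = env.action_space.sample()   # replace with a PID or RL dosing policy
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
```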
Data showed that when trained on less than a tenth of the training samples required by online RL to reach stable performance, offline RL significantly increased time in the healthy blood glucose range, from a mean (SD) of 61.6% (0.3%) to 65.3% (0.5%), compared with the strongest state-of-the-art baseline (P < .001), the authors wrote.
This was achieved without any associated increase in low blood glucose events, they added.
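For context, both outcomes are typically computed directly from continuous glucose monitor (CGM) traces. A minimal calculation, assuming the conventional 70-180 mg/dL target range and a 70 mg/dL hypoglycemia threshold (standard consensus values, not figures reported in the study), might look like this:

```python
import numpy as np

def glucose_metrics(cgm_mg_dl):
    """Percent of CGM readings in the 70-180 mg/dL target range and below 70 mg/dL."""
    readings = np.asarray(cgm_mg_dl, dtype=float)
    time_in_range = np.mean((readings >= 70) & (readings <= 180)) * 100
    time_below_range = np.mean(readings < 70) * 100
    return time_in_range, time_below_range

# Example with a synthetic day of 5-minute CGM readings.
trace = np.random.normal(140, 45, size=288)
tir, tbr = glucose_metrics(trace)
print(f"Time in range: {tir:.1f}%, time below range: {tbr:.1f}%")
```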
Analyses also revealed:
The use of a T1D simulator marks the main limitation of the study, as these environments cannot capture factors such as stress, activity, and illness, the authors emphasized.
“Future work could include validating the method on simulated populations with type 2 diabetes, building on offline RL methods to incorporate online learning for continuous adaptation of control policies, or incorporating features such as interpretability or integration of prior medical knowledge, which may ease the transition from simulation to clinical use,” the authors concluded.
Reference
Emerson H, Guy M, McConville R. Offline reinforcement learning for safer blood glucose control in people with type 1 diabetes. J Biomed Inform. Published online May 4, 2023. doi:10.1016/j.jbi.2023.104376