Searching for the Policy-Relevant Treatment Effect in Medicare’s ACO Evaluations

Bryan Dowd; Roger Feldman; Woolton Lee; Kathleen Rowan; Shriram Parashuram; Katie White

December 9, 2024

Bryan E. Dowd, PhD
Roger D. Feldman, PhD

Publication

Peer-Reviewed

Population Health, Equity & Outcomes

December 2024

Volume 30

Issue Spec No. 13

Pages: SP978-SP984

The authors discuss multiple challenges to the production of policy-relevant results from evaluation of Medicare accountable care organizations (ACOs).

ABSTRACT

Objectives: To explain key challenges to evaluating Center for Medicare and Medicaid Innovation (CMMI) accountable care organization (ACO) models and ways to address those challenges.

Study Design: We enumerate the challenges, beginning with the conception of the alternative payment model and extending through the decision to scale up the model should the initial evaluation suggest that the model is successful. The challenges include churn at the provider and ACO levels, beneficiary leakage and spillover, participation in prior payment models, and determinants of shared savings and penalties.

Methods: We explain challenges posed in evaluations of voluntary ACO models vs models in which ACOs are randomly assigned to the treatment group. We also note the relationship between the design used in an evaluation and subsequent plans for scaling up successful models.

Results: The optimal research design is inextricably tied to the plans for scaling up a successful model. Decisions regarding churn, leakage, spillover, and participation in past payment models can alter the estimated effects of the intervention on participants in the model.

Conclusions: If CMMI intends to offer the model to a larger, but similar, group of volunteers, then the estimated treatment effect based on voluntary participants may be the most policy-relevant parameter. However, if the scaled-up population has different characteristics than the evaluation sample, perhaps due to mandatory participation, then the evaluator will need to employ pseudo-randomization appropriate for observational data.

Am J Manag Care. 2024;30(Spec. No. 13):SP978-SP984. https://doi.org/10.37765/ajmc.2024.89647

_____

The Center for Medicare and Medicaid Innovation (CMMI) within CMS is tasked with fostering “healthcare transformation by finding new ways to pay for and deliver care that can lower costs and improve care.”1 An important component of CMMI’s agenda is the accountable care organization (ACO) model in fee-for-service (FFS) traditional Medicare (TM). CMS defines ACOs as “groups of doctors, hospitals, and other health care professionals that work together to give patients high-quality, coordinated service and health care, improve health outcomes, and manage costs. ACOs may be in a specific geographic area and/or focused on patients who have a specific condition, like chronic kidney disease.”2 The Medicare Shared Savings Program (MSSP) made ACOs a permanent feature of TM in 2012. CMMI’s past ACO models include Pioneer ACO; ACO Investment Model; Advance Payment; Comprehensive ESRD Care, which focuses on end-stage renal disease; and Next Generation ACOs (NGACOs). Current models as of November 2024 include Kidney Care Choices, Making Care Primary, Primary Care First, and Realizing Equity, Access, and Community Health (known as REACH). Currently, one-third of TM beneficiaries are in Medicare’s ACO models, pursuant to CMMI’s goal of having all TM beneficiaries in an accountable care relationship by 2030.3

To test its ACO models, CMMI specifies the rules for participation and the research questions to be addressed by the evaluation. Some features of the model are fixed by CMMI, whereas others are left to the discretion of the ACOs. In the NGACO model, the focus of this discussion, Part D costs were excluded from the model, and although a goal of the NGACO model was to strengthen the patient-physician relationship, participating beneficiaries remained free to choose any Medicare-eligible provider. NGACO participants included 3 types of organizations: hospitals, physician-affiliated organizations, and physician-hospital partnerships. NGACOs chose a risk level and a risk cap, which together determined an ACO’s financial risk: 80% or 100% 2-sided risk for shared savings or losses and a cap on reward or liability from shared savings or losses that could range from 5% to 15% of benchmark expenditures. NGACOs also chose among 4 payment mechanisms: traditional FFS, FFS with a fixed per-beneficiary per-month infrastructure payment, population-based payments, or all-inclusive population-based payments.4
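The interaction between the risk level and the risk cap can be illustrated with a minimal Python sketch. It is a simplification rather than CMS's settlement methodology: the function name `aco_settlement`, the dollar figures, and the choice to apply the cap after the sharing rate are all assumptions made for illustration, and other elements of the actual calculation are omitted.

```python
def aco_settlement(benchmark: float, actual: float,
                   risk_share: float, cap_rate: float) -> float:
    """Return the payment to the ACO (positive) or owed by the ACO (negative).

    Hypothetical, simplified settlement: only the risk level and risk cap
    described in the text are modeled.
    """
    if risk_share not in (0.80, 1.00):
        raise ValueError("risk_share must be 0.80 or 1.00")
    if not 0.05 <= cap_rate <= 0.15:
        raise ValueError("cap_rate must be between 0.05 and 0.15")
    difference = benchmark - actual        # > 0: spending came in under the benchmark
    aco_share = risk_share * difference    # the ACO's share of savings or losses
    cap = cap_rate * benchmark             # maximum reward or liability
    # Assumption: the cap is applied after the sharing rate.
    return max(-cap, min(aco_share, cap))


# Illustrative (made-up) figures.
# Savings of $4M at 80% risk with a 5% cap: reward of $3.2M (cap of $5M not binding).
reward = aco_settlement(100_000_000, 96_000_000, risk_share=0.80, cap_rate=0.05)
# Losses of $20M at 100% risk with a 15% cap: liability limited to $15M.
penalty = aco_settlement(100_000_000, 120_000_000, risk_share=1.00, cap_rate=0.15)
print(round(reward), round(penalty))
```

In this sketch, a higher risk level raises both the potential reward and the potential liability, and the cap bounds them symmetrically.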

Two features of CMMI’s ACO initiatives distinguish them from similar ACO initiatives by private commercial health insurance plans. First, unlike smaller commercial health plans, CMS controls a large enough market share of many physicians’ practices that it has a degree of monopsony pricing power, extending beyond fees to mandatory participation in particular payment models in TM. Second, expanding or scaling up a model that proves to be successful in the evaluation phase is under the control of the CMS Office of the Actuary (OACT). Evaluations of CMMI’s ACO models are statutorily required to assess the effects of the model on both spending and quality of care. OACT must certify that scaling up the model would reduce or not increase Medicare spending with no deleterious effects on quality of care.5 The following discussion applies to the evaluation of both cost and quality of care.

There are 2 ways that an ACO model can reduce Medicare spending. First, the model’s incentives could cause providers to become more efficient relative to the same providers’ spending in the absence of the model’s incentives. Second, the model could increase the number of beneficiaries receiving care from inherently more efficient providers. Thus far, CMMI ACO models have not included any incentive for beneficiaries to choose more efficient providers.

METHODS

Estimating the Policy-Relevant Treatment Effects

An important feature of CMMI’s ACO models that affects the policy-relevant research questions is that a participant organization’s choice to join an ACO model has almost always been voluntary. CMMI chooses model participants from a pool of applicants based on prior experience with financial risk sharing, organizational history, patient-centeredness, and existing infrastructure and workflows. The process whereby ACOs apply and CMMI selects participants could result in biased estimates of the policy-relevant treatment effect.

In current CMS ACO evaluations, the estimated treatment effect includes (1) the effect of nonrandom decisions by health care providers to form ACOs and apply for participation in the model, (2) the effect of the nonrandom decision by CMMI to allow specific ACOs to become actual participants, and (3) the effect of the model’s incentives on participants vs nonparticipants. ACO participants could be those that either the ACO or CMMI expects to be successful in the model. If a successful model is scaled up in a similar way to a larger group of similar, voluntary participants, it will not be necessary to correct for participation decisions by either the ACOs or CMMI because all those processes will remain constant in the scaled-up environment. However, the evaluator still may wish to consider heterogeneous treatment effects and the possibility that later adopters will differ from earlier adopters in their responses to the model’s incentives.

Alternatively, CMS could scale up the model to a population with different characteristics than the evaluation sample, perhaps by mandating participation in the model for all TM providers. In that case, the evaluation results, based on voluntary participants, could yield a biased estimate of the average effect on the broader target population.
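The gap between the effect on volunteers and the effect on a mandated population can be shown with a small, purely hypothetical Python example. The per-provider effects and the volunteering rule (providers join only if they expect savings) are assumptions invented for illustration and are not drawn from any evaluation.

```python
from statistics import mean

# Hypothetical effect of the model on annual spending per beneficiary for
# 8 providers (dollars; negative values are savings). Invented numbers.
provider_effects = [-300, -250, -200, -100, 0, 50, 100, 150]

# Assumption: a provider volunteers only if it expects to reduce spending.
volunteer_effects = [e for e in provider_effects if e < 0]

effect_on_volunteers = mean(volunteer_effects)   # what a voluntary evaluation targets
effect_on_everyone = mean(provider_effects)      # what mandatory scale-up would deliver

print(effect_on_volunteers)  # -212.5
print(effect_on_everyone)    # -68.75
```

Under these assumptions, the volunteer-based estimate overstates the savings from mandatory participation by roughly a factor of 3, even though it is a valid estimate for a similar group of volunteers.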

Identification of the most policy-relevant treatment effect often is couched in the language of internal and external validity. Gertler et al define internal validity as “the estimated impact of the program…net of all other potential confounding factors—or, in other words, that the comparison group provides an accurate estimate of the counterfactual so that we are estimating the true impact of the program.”6 Gertler et al further state that “an evaluation is externally valid if the evaluation sample accurately represents the population of eligible units. The results of the evaluation can then be generalized to the population of eligible units.”6

If scaling up means applying the same model to participants with different characteristics from the evaluation sample (eg, by mandating participation), then both internal and external validity are important. In that case, it becomes important to correct for observed and unobserved variables that could affect both the participation decision and the outcomes of interest—a problem known as omitted variable bias, spurious correlation, or unobserved confounders. The policy-relevant parameter is the effect of the treatment on the new, different target population.

Having identified the policy-relevant treatment effect, the next step is to determine the best feasible way to estimate it. We need to estimate what would have happened to the target population in the absence of the model’s incentives. That counterfactual cannot be observed directly, so it must be approximated by the analyst.

In some cases, the comparison group can be constructed by assigning participants randomly to the treatment and comparison groups. A randomized controlled trial is designed to balance the observed and unobserved characteristics of the participants in the treatment and comparison groups. CMS has used randomization in the Comprehensive Care for Joint Replacement and Million Hearts models but not the Pioneer, MSSP, or NGACO ACO models.

A fully randomized research design for ACO models would require assignment of beneficiaries, providers, organizations, and even market areas to treatment and comparison groups and even to each other (eg, random assignment of beneficiaries to providers, providers to organizations).7 That option is not feasible in the evaluation of most CMMI ACO models because ACO models generally involve individual providers who voluntarily form, join, or leave an ACO. Nonetheless, there are 2 opportunities for randomization that CMMI could consider. The first is to randomize market areas to the model’s offer (intention to treat). The second is to assign ACOs that meet the eligibility requirements for participation in the model randomly to actual participation in the model. Of course, CMMI could be constrained from implementing these options for various reasons.

In some cases, randomization is unethical, illegal, too expensive, or otherwise infeasible. When randomization is infeasible, CMMI must turn to the collection of observational (nonexperimental) data in which participants self-select or otherwise are nonrandomly assigned to the treatment and comparison groups. If randomization-like results are required from an evaluation based on voluntary participants, comparability of the treatment and comparison groups can be improved by matching beneficiaries, providers, ACOs, or market areas on observable characteristics,8,9 but matching on observable variables leaves open the possibility of omitted variable bias (ie, unobserved variables that affect both ACO participation and subsequent outcomes). For example, some ACOs may have learned from their experience with past CMMI models or commercial ACOs.

Omitted variable problems can be addressed through econometric estimation methods10,11 such as difference-in-differences (DID),12-16 instrumental variables,17 sample selection models,18-20 2-stage residual inclusion,21 or regression discontinuity.22 These methods are designed to obtain randomization-like results for at least a subset of participants in the focal population. The results apply only to the subset of participants for whom the estimation method works like randomization. That subset of participants may or may not be the same subset of participants for whom the policy-relevant treatment effect is desired. Each method has its strengths and weaknesses, which are the subject of current work in econometrics, and heterogeneous treatment effects should be considered regardless of the research design or estimation approach.

For example, DID estimation is vulnerable to events that happen at the same time as implementation of the model but have different effects on the treatment and comparison groups. Staggered implementation dates can reduce the probability of such events, but staggered implementation implies multiple evaluation years, which introduces the possibility that later voluntary participants may be different from earlier volunteers because they have learned about the model from earlier volunteers. Also, the sample of volunteers at later implementation dates increasingly may consist of successful, earlier participants who have not dropped out of the model. This point is discussed further in the section on churn.
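The vulnerability of DID to concurrent events can be seen in the simplest 2-group, 2-period case. The Python sketch below is illustrative only: the function name `did_estimate` and all spending figures are hypothetical, and an actual evaluation would use regression-based DID with covariates and appropriate standard errors.

```python
def did_estimate(treat_pre: float, treat_post: float,
                 comp_pre: float, comp_post: float) -> float:
    """2x2 difference-in-differences on group mean spending."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical mean monthly spending per beneficiary (dollars).
# Treatment group rises by 20 and comparison group by 50: DID = -30.
baseline = did_estimate(1000.0, 1020.0, 1000.0, 1050.0)

# Suppose an unrelated event at implementation raises comparison-group
# spending by a further 10 but leaves the treatment group untouched.
with_shock = did_estimate(1000.0, 1020.0, 1000.0, 1060.0)

# The shock is attributed to the model: the estimate moves from -30 to -40.
bias = with_shock - baseline
print(baseline, with_shock, bias)  # -30.0 -40.0 -10.0
```

Nothing in the data distinguishes the extra 10 dollars from a treatment effect, which is why the choice of comparison group and implementation timing matters.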

Results Relevant to Other Challenges

Accounting for prior and contemporaneous CMS initiatives. CMMI often tests multiple models in the same market area and even at the same time.23 Should the effect of a new model be the effect of the model relative to the status quo, in which members of both the treatment and control groups may be participants in current or past payment reform models? Or should the comparison group consist of a “clean” set of providers with no previous or current experience in CMMI models? Recent evidence suggests that participation in multiple models can have an additive effect on an ACO’s performance.24

As CMMI continues to experiment with a growing number of models, providers who have no concurrent or previous experience in interventions could increasingly represent an unrepresentative set of providers, thereby limiting the generalizability (external validity) of the evaluation results. The problem is amplified by the fact that commercial insurers also are experimenting with alternative provider payment models.

In the NGACO evaluation, many participants in the treatment group had previous experience in the MSSP and Pioneer ACOs, and their NGACO participation decisions may have been based on that experience. In contrast, providers with such experience were excluded from the comparison group.

Provider and ACO churn. When ACOs or individual providers drop out of the treatment group or providers leave an ACO during the evaluation observation period, researchers refer to such changes as churn-out. When new ACOs enter the treatment group or providers join a participating ACO during the observation period, they are said to churn in. Provider churn is important to CMS because recruiting and retaining only inherently efficient providers can result in financial rewards for an ACO without any savings for Medicare. As noted earlier, Medicare saves money only if the model’s incentives cause providers to become more efficient or result in more beneficiaries being treated by inherently more efficient providers. In fact, if the ACOs recruit and retain only inherently efficient providers, Medicare will lose money because it will pay bonuses to inherently efficient providers whose behavior was not altered by the model’s incentives.
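The arithmetic behind this point can be sketched in Python. The example is hypothetical: the function name, the dollar amounts, the 80% sharing rate applied to all savings below the benchmark, and especially the assumption that the benchmark sits above an inherently efficient provider's usual spending are illustrative simplifications, not features of any specific CMMI model.

```python
def medicare_net_change(spending_without_model: float,
                        spending_with_model: float,
                        benchmark: float,
                        sharing_rate: float) -> float:
    """Change in Medicare outlays per beneficiary, including any bonus paid."""
    behavioral_change = spending_with_model - spending_without_model
    # Assumption: a bonus is paid on any spending below the benchmark.
    bonus = sharing_rate * max(benchmark - spending_with_model, 0.0)
    return behavioral_change + bonus


# Inherently efficient provider: spends 9,500 with or without the model,
# but the benchmark is 10,000, so the ACO earns a bonus of 400.
selection_only = medicare_net_change(9_500.0, 9_500.0, 10_000.0, 0.80)

# Provider that changes behavior: spending falls from 10,000 to 9,500.
behavior_change = medicare_net_change(10_000.0, 9_500.0, 10_000.0, 0.80)

print(round(selection_only), round(behavior_change))  # 400 -100
```

In the first case Medicare pays 400 more per beneficiary with no change in care; in the second it saves 100 net of the bonus.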

Organizational-level churn, or entrance and exit from the model, is not random. ACOs that experience financial losses are more likely to leave the current model and less likely to volunteer to participate in future models. Consequently, results over multiple years or a sequence of models will largely reflect the experience of successful ACOs. Attrition from the model could affect external validity.

Leakage and spillover effects. Beneficiaries in TM are free to seek care from any provider, but ACOs are accountable for Medicare spending incurred only for their aligned beneficiaries. Evaluators refer to services received by ACO-aligned beneficiaries from non-ACO providers as leakage (care sought from outside the NGACO network). Similarly, expenditures for beneficiaries in the comparison group (non-ACO providers) who receive care from ACO providers are referred to as spillover (the treatment effect spills over to the comparison group).

Both leakage and spillover can affect the model’s estimated treatment effect. For example, if patients attributed to the ACO have lower risk-adjusted spending or better health outcomes than patients in the comparison group, failure to account for leakage and spillover could reduce an otherwise positive estimated treatment effect. Beneficiaries attributed to an ACO could incur larger costs if they seek care from non-ACO providers, and the ACO would be held responsible for those costs. Conversely, comparison group patients who receive care from more efficient ACO providers could incur lower costs, which could bias the model’s estimated treatment effect toward zero. In current ACO models, ACOs do not receive payments for lowering the cost of care to beneficiaries in the comparison group, and cost data for comparison providers are not adjusted for the care received from ACO providers.
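A stylized Python calculation shows how spillover pulls the estimate toward zero. It assumes, purely for illustration, that a fixed share of comparison-group beneficiaries receive care from ACO providers and experience the full treatment effect; the function name and numbers are hypothetical.

```python
def estimated_effect(true_effect: float, spillover_share: float) -> float:
    """Treatment-vs-comparison difference when some comparison patients are 'treated'.

    true_effect: change in spending caused by ACO care (negative = savings).
    spillover_share: assumed fraction of comparison beneficiaries who receive
    care from ACO providers and experience the full effect.
    """
    if not 0.0 <= spillover_share <= 1.0:
        raise ValueError("spillover_share must be between 0 and 1")
    treated_change = true_effect
    comparison_change = spillover_share * true_effect
    return treated_change - comparison_change


no_spillover = estimated_effect(-100.0, 0.0)    # recovers the true effect
some_spillover = estimated_effect(-100.0, 0.2)  # attenuated toward zero
print(no_spillover, round(some_spillover, 6))   # -100.0 -80.0
```

With 20% spillover, a true saving of 100 dollars appears as a saving of 80 dollars in the treatment-vs-comparison contrast.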

Gross and net spending. CMMI ACOs receive financial rewards when they perform better than financial benchmarks determined by CMMI and financial penalties when they perform worse than the benchmarks. CMS determines a benchmark for each ACO based on risk-adjusted, historical Medicare Part A and B spending for NGACO-aligned beneficiaries, regional spending trends, and quality-of-care metrics. In evaluations of CMS ACOs, changes in Medicare spending that do not include these financial rewards and penalties are referred to as gross impacts. Changes in Medicare spending that include such rewards or penalties are referred to as net impacts. Gross and net spending are 2 possible standards against which the performance of ACOs could be assessed.

Instead of comparing an ACO’s performance against the CMS benchmark, ACOs could be compared with a comparison group. These 2 standards produce different conclusions regarding ACO performance. An ACO could reduce Medicare spending relative to a contemporaneous comparison group but be penalized because it exceeded its administrative benchmark target.25 Conversely, an ACO could receive a financial reward for meeting its administrative benchmark target but still increase Medicare spending relative to its contemporaneous comparison group. When reporting results on the ACO’s performance, it is important to specify whether the results represent gross or net impacts and performance relative to CMMI’s benchmark vs the comparison group.
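The 2 scenarios described above can be worked through in a short Python sketch. The sign convention (negative values are reductions in Medicare spending, and payments to an ACO are positive while penalties collected from it are negative), the function name, and the per-beneficiary dollar amounts are assumptions chosen for illustration.

```python
def net_impact(gross_impact: float, payment_to_aco: float) -> float:
    """Net impact on Medicare spending = gross impact plus settlement payments.

    gross_impact: ACO spending change relative to the comparison group.
    payment_to_aco: reward paid to the ACO (+) or penalty collected from it (-),
    determined by the administrative benchmark rather than the comparison group.
    """
    return gross_impact + payment_to_aco


# Scenario 1: the ACO spends 50 less than its comparison group but exceeds
# its benchmark and pays a penalty of 20.
scenario_1 = net_impact(gross_impact=-50.0, payment_to_aco=-20.0)

# Scenario 2: the ACO beats its benchmark and receives a reward of 40 but
# spends 10 more than its comparison group.
scenario_2 = net_impact(gross_impact=10.0, payment_to_aco=40.0)

print(scenario_1, scenario_2)  # -70.0 50.0
```

In scenario 1 the ACO is penalized even though Medicare saves money on both a gross and a net basis; in scenario 2 the ACO is rewarded even though Medicare spending rises on both bases.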

DISCUSSION AND CONCLUSIONS

CMS has tested many alternative provider payment models. Evaluation of ACO models can be difficult for many reasons. Evaluators will continue to contend with issues related to scaling up successful models, overlapping and past participation in other models, multiple levels of endogenous sample selection, churn, leakage, spillover, ways of determining shared savings and penalties, and the advantages and disadvantages of different estimation approaches. By understanding and anticipating these problems early in the evaluation process, CMS and its evaluators can determine the most policy-relevant research questions, the most appropriate research designs, and the best methods for estimating the policy-relevant treatment effects.

Author Affiliations: University of Minnesota (BED, RDF, KW), Minneapolis, MN; Center for Medicare and Medicaid Innovation (WL), Baltimore, MD; NORC at the University of Chicago (KR, SP), Bethesda, MD.

Source of Funding: The authors acknowledge support from the Center for Medicare and Medicaid Innovation.

Author Disclosures: Dr Dowd, Dr Feldman, Dr Rowan, Dr Parashuram, and Dr White received support from CMS for the present manuscript from 2016 to 2023 and for other work from 2021 to 2023. They also received support for attending meetings and/or travel from CMS as part of the evaluation contract. Dr Dowd performed this work as part of a subcontract between the University of Minnesota and NORC. Dr Lee received salary and work equipment from CMS as an employee of the Center for Medicare and Medicaid Innovation.

Authorship Information: Concept and design (BED, RDF, WL, SP); acquisition of data (WL, KR, SP, KW); analysis and interpretation of data (BED, RDF, WL, KR, SP, KW); drafting of the manuscript (BED, RDF, SP, KW); critical revision of the manuscript for important intellectual content (BED, RDF, WL, KR, SP, KW); statistical analysis (BED, SP); obtaining funding (BED, SP); administrative, technical, or logistic support (BED, KR); and supervision (BED, SP).

Send Correspondence to: Bryan E. Dowd, PhD, University of Minnesota, 729 MMC, Minneapolis, MN 55419. Email: dowdx001@umn.edu.

REFERENCES

1. Our mission. Center for Medicare and Medicaid Innovation. Updated March 4, 2024. Accessed August 4, 2023. https://innovation.cms.gov/about/our-mission
2. Accountable care and accountable care organizations. Center for Medicare and Medicaid Innovation. Updated May 14, 2024. Accessed August 4, 2023. https://innovation.cms.gov/key-concept/accountable-care-and-accountable-care-organizations
3. Innovation Center Strategy Refresh. CMS. Accessed September 9, 2023. https://www.cms.gov/priorities/innovation/strategic-direction-whitepaper
4. Evaluation of the Next Generation Accountable Care Organization model. NORC at the University of Chicago. Accessed November 6, 2024. https://bit.ly/3Znvm8Q
5. CMMI model certifications. CMS. Updated September 10, 2024. Accessed April 17, 2023. https://bit.ly/414uffp
6. Gertler PJ, Martinez S, Premand P, Rawlings LB, Vermeersch CMJ. Impact Evaluation in Practice. 2nd ed. World Bank Publications; 2016.
7. Heckman JJ. Epilogue: randomization and social policy revisited. In: Bédécarrats F, Guérin I, Roubaud F, eds. Randomized Control Trials in the Field of Development: A Critical Perspective. Oxford University Press; 2020:304-330.
8. Abadie A. Using synthetic controls: feasibility, data requirements, and methodological aspects. J Econ Lit. 2021;59(2):391-425. doi:10.1257/jel.20191450
9. Bai J. Panel data models with interactive fixed effects. Econometrica. 2009;77(4):1229-1279. doi:10.3982/ECTA6135
10. Cunningham S. Causal Inference: The Mixtape. Yale University Press; 2021.
11. Huntington-Klein N. The Effect: An Introduction to Research Design and Causality. Chapman and Hall; 2021.
12. Bertrand M, Duflo E, Mullainathan S. How much should we trust difference-in-differences estimates? Q J Econ. 2004;119(1):249-275. doi:10.1162/003355304772839588
13. Goodman-Bacon A. Difference-in-differences with variation in treatment timing. J Econom. 2021;225(2):254-277. doi:10.1016/j.jeconom.2021.03.014
14. Callaway B, Sant’Anna PHC. Difference-in-differences with multiple time periods. J Econom. 2021;225(2):200-230. doi:10.1016/j.jeconom.2020.12.001
15. Sun L, Abraham S. Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. J Econom. 2021;225(2):175-199. doi:10.1016/j.jeconom.2020.09.006
16. Arkhangelsky D, Athey S, Hirshberg DA, Imbens GW, Wager S. Synthetic difference-in-differences. Am Econ Rev. 2021;111(12):4088-4118. doi:10.1257/aer.20190159
17. Baiocchi M, Cheng J, Small DS. Instrumental variable methods for causal inference. Stat Med. 2014;33(13):2297-2340. doi:10.1002/sim.6128
18. Heckman J. Shadow prices, market wages, and labor supply. Econometrica. 1974;42(4):679-694. doi:10.2307/1913937
19. Lee LF. Estimation of Limited Dependent Variables by Two Stage Methods. PhD thesis. University of Rochester Department of Economics; 1977.
20. Lee LF. Generalized econometric models with selectivity. Econometrica. 1983;51(2):507-512. doi:10.2307/1912003
21. Terza JV. Two-stage residual inclusion estimation in health services research and health economics. Health Serv Res. 2018;53(3):1890-1899. doi:10.1111/1475-6773.12714
22. Cook TD. “Waiting for life to arrive”: a history of the regression-discontinuity design in psychology, statistics and economics. J Econom. 2008;142(2):636-654. doi:10.1016/j.jeconom.2007.05.002
23. Grannemann TW, Brown RS. Adapting evaluations of alternative payment models to a changing environment. Health Serv Res. 2018;53(2):991-1007. doi:10.1111/1475-6773.12689
24. Navathe AS, Liao JM, Wang E, et al. Association of patient outcomes with bundled payments among hospitalized patients attributed to accountable care organizations. JAMA Health Forum. 2021;2(8):e212131. doi:10.1001/jamahealthforum.2021.2131
25. Parashuram S, Lee W, Rowan K, et al. The effect of Next Generation accountable care organizations on Medicare expenditures. Health Aff (Millwood). 2024;43(7):933-941. doi:10.1377/hlthaff.2022.01648