Machine Learning–Based Classifier Accurately Identifies Ovarian Cancer

January 25, 2024

A predictive model utilizing serum metabolic profiles was able to distinguish ovarian cancer from control samples with 93% accuracy, according to a new study.

Machine learning–based classification systems have shown promise in detecting cancer and other diseases early, and a study published in Gynecologic Oncology found the same may hold true for early detection of ovarian cancer.

Early detection has proven especially difficult for cancers, such as ovarian cancer, that typically cause no clinical symptoms in their early stages. Even at the molecular level, the authors explained, the disease's heterogeneity makes it difficult to find a single biomarker or biomarker set that is present in all individuals with a given type of cancer.

“The ideal cancer diagnostic should not only be highly accurate but additionally non-invasive and low cost to be widely available to the general public,” the authors wrote. “Despite heroic efforts to develop such cancer diagnostics over the last several decades, this goal has proven to be frustratingly elusive.”

Applying machine learning to large omics (genomic, proteomic, or metabolomic) datasets can reveal patterns useful for diagnosing cancer, and the current study aimed to develop an effective machine learning–based approach to detecting ovarian cancer early from metabolic profiles in blood.

To develop the model, researchers collected serum samples from 431 patients with ovarian cancer and 133 healthy individuals with no known medical pathologies. The samples were acquired from 4 sites across the US and Canada and sent to the Creative Proteomics laboratory in Shirley, NY, where ultra-performance liquid chromatography and high-resolution mass spectrometry were used to characterize the metabolome of each sample.

The investigators examined the predictive accuracy of 5 machine learning classifiers (logistic regression, random forest, support vector machine, k-nearest neighbors, and adaptive boosting) and aggregated the results to construct a consensus classifier. Because of the size of the sample pool, metabolomic analyses were run in 2 batches, and each batch was analyzed in 2 ionization modes (negative and positive), producing 4 distinct datasets on which the classifiers were evaluated. Before evaluating each classifier independently, the investigators identified reliable metabolites using recursive feature elimination with cross-validation, then ranked those features by their relative frequency and importance.
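
The feature-ranking step can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' code: it supposes that a feature-selection procedure (for example, scikit-learn's `RFECV`) has already been run on each cross-validation fold and has returned the metabolites it kept, each with an importance score. The hypothetical `rank_features` function then orders metabolites by how often they were selected across folds, breaking ties by mean importance. The metabolite names and values are made up.

```python
from collections import defaultdict


def rank_features(fold_results):
    """Rank features by selection frequency across folds, then by mean importance.

    fold_results: one dict per cross-validation fold, mapping each feature kept
    in that fold to its importance score (hypothetical upstream output).
    """
    counts = defaultdict(int)
    importance_sums = defaultdict(float)
    for selected in fold_results:
        for feature, importance in selected.items():
            counts[feature] += 1
            importance_sums[feature] += importance

    n_folds = len(fold_results)
    ranking = []
    for feature, count in counts.items():
        frequency = count / n_folds  # share of folds that kept the feature
        mean_importance = importance_sums[feature] / count
        ranking.append((feature, frequency, mean_importance))

    # Most frequently selected first; ties broken by higher mean importance, then by name.
    ranking.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return ranking


# Hypothetical per-fold selections (metabolite names and importances are made up).
folds = [
    {"metabolite_A": 0.40, "metabolite_B": 0.25, "metabolite_C": 0.10},
    {"metabolite_A": 0.35, "metabolite_C": 0.20},
    {"metabolite_A": 0.45, "metabolite_B": 0.30},
]
for name, freq, imp in rank_features(folds):
    print(f"{name}: kept in {freq:.0%} of folds, mean importance {imp:.2f}")
```

Ranking by selection frequency first favors metabolites that are chosen consistently regardless of how the data are split, which is one reasonable reading of "reliable" features.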

The 5 classifiers were assessed using positive predictive value (PPV), negative predictive value (NPV), F1-score (F1), and Matthews correlation coefficient (MCC). Performance varied slightly between classifiers but was high across all 4 datasets, with PPV of 93% or higher, NPV of 87% or higher, F1 of 92% or higher, and MCC of 0.78 or higher.
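
For readers unfamiliar with these metrics, the sketch below shows how each is computed from a confusion matrix, treating cancer as the positive class. The function name and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """PPV, NPV, F1, and MCC from confusion-matrix counts (cancer = positive class).

    Assumes tp > 0 and that both classes occur among the true labels and the
    predictions, so no denominator is zero.
    """
    ppv = tp / (tp + fp)          # of samples called cancer, share that truly are
    npv = tn / (tn + fn)          # of samples called non-cancer, share that truly are
    sensitivity = tp / (tp + fn)  # share of true cancers detected (recall)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"PPV": ppv, "NPV": npv, "F1": f1, "MCC": mcc}


# Hypothetical held-out counts, not from the study.
metrics = classification_metrics(tp=90, fp=5, tn=25, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike PPV, NPV, and F1, which range from 0 to 1, MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is reported as a coefficient rather than a percentage.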

“By combining results from the 4 datasets and selecting the best average score among them, we observed a notable improvement in classifying both cancer and normal samples,” the authors wrote. “This underscores that each dataset brings its unique contribution to the accurate prediction of cancer or non-cancer status.”

Combining the findings into a consensus classifier yielded the best overall performance, with a PPV of 93%. Accuracy was slightly higher for early-stage than for late-stage disease, which the authors attributed to greater heterogeneity among the molecular profiles of patients with late-stage disease.
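
The article does not specify how the classifiers' outputs were aggregated. One common way to build a consensus classifier is a simple majority vote, sketched below as an assumption; the `consensus_vote` function and the per-sample predictions are hypothetical.

```python
def consensus_vote(predictions_by_model):
    """Per-sample majority vote across models.

    predictions_by_model maps a model name to a list of 0/1 predictions
    (1 = cancer), with samples in the same order for every model.
    """
    consensus = []
    for votes in zip(*predictions_by_model.values()):
        # Strict majority of "cancer" votes; with an even number of models,
        # a tie falls back to 0 (non-cancer).
        consensus.append(1 if 2 * sum(votes) > len(votes) else 0)
    return consensus


# Hypothetical predictions for 4 samples from the 5 model types named in the study.
predictions = {
    "logistic_regression": [1, 0, 1, 1],
    "random_forest": [1, 0, 0, 1],
    "svm": [1, 1, 0, 1],
    "knn": [0, 0, 1, 1],
    "adaboost": [1, 0, 0, 0],
}
print(consensus_vote(predictions))  # → [1, 0, 0, 1]
```

With 5 models there is always a strict majority; if votes were pooled across the 4 datasets as well (20 votes per sample), a tie-breaking rule like the one in the comment would be needed.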

“In this regard, it may be relevant to note that the terms ‘early stage’ vs ‘late stage’ when applied to ovarian cancer do not necessarily imply a temporal progression,” the authors noted. “A more accurate terminology might be ‘pre-metastatic’ vs ‘post-metastatic.’”

Overall, the high PPV of the consensus classifier suggests that machine learning analysis of omics data, particularly metabolomic data, is a potentially effective approach to diagnosing ovarian cancer and other malignancies, the authors wrote. Still, they acknowledged limitations, including that the PPV of the same machine learning model can differ depending on the size and composition of the datasets used to build and test it.
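
One reason dataset composition matters is that PPV depends not only on a model's sensitivity and specificity but also on the proportion of cancer cases in the population being tested. The worked example below applies Bayes' theorem with a hypothetical sensitivity and specificity of 90% each (not figures from the study). The study cohort itself was about 76% cancer cases (431 of 564 samples), far higher than the prevalence of ovarian cancer in a general screening population.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    p_true_pos = sensitivity * prevalence                 # P(test positive and cancer)
    p_false_pos = (1 - specificity) * (1 - prevalence)    # P(test positive and no cancer)
    return p_true_pos / (p_true_pos + p_false_pos)


study_prevalence = 431 / (431 + 133)  # share of cancer samples in the study cohort, about 0.76

# Hypothetical model with 90% sensitivity and 90% specificity (not the study's figures).
for label, prevalence in [("study-like mix", study_prevalence), ("balanced", 0.5), ("1% prevalence", 0.01)]:
    ppv = ppv_from_prevalence(0.90, 0.90, prevalence)
    print(f"{label}: prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

With the same hypothetical model, PPV falls from roughly 97% at the study's case mix to under 10% at 1% prevalence, which is why a PPV measured in a case-enriched cohort may not carry over to a screening setting.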

Models such as the one developed in the study could be translated into clinical use by generating score ranges that indicate how likely a patient is to have cancer, potentially informing treatment decisions, according to the authors.
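
One way to turn model scores into the kind of score ranges the authors describe is to bin scores from held-out patients and report the observed share of cancer cases in each bin. The sketch below assumes a score between 0 and 1 per patient (for example, a consensus probability or the fraction of classifiers voting "cancer"); the `likelihood_by_score_range` function, bin edges, scores, and labels are all hypothetical.

```python
def likelihood_by_score_range(scores, labels, edges):
    """Share of cancer cases (label 1) among samples whose score falls in each range.

    edges: sorted bin boundaries, e.g. [0.0, 0.25, 0.5, 0.75, 1.0]; each bin is
    [low, high), except the last, which also includes its upper edge.
    Returns (range, sample count, cancer rate or None if the bin is empty) per bin.
    """
    table = []
    for i in range(len(edges) - 1):
        low, high = edges[i], edges[i + 1]
        last_bin = i == len(edges) - 2
        in_bin = [
            label for score, label in zip(scores, labels)
            if low <= score < high or (last_bin and score == high)
        ]
        rate = sum(in_bin) / len(in_bin) if in_bin else None
        table.append(((low, high), len(in_bin), rate))
    return table


# Hypothetical held-out scores and true labels (1 = cancer).
scores = [0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.80, 0.95, 1.00, 0.20]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
for (low, high), n, rate in likelihood_by_score_range(scores, labels, [0.0, 0.25, 0.5, 0.75, 1.0]):
    shown = "n/a" if rate is None else f"{rate:.0%}"
    print(f"score {low:.2f}-{high:.2f}: {n} samples, {shown} with cancer")
```

In practice the scores would need to come from patients not used to train the model, and each range would need enough patients for its observed rate to be meaningful.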

“We believe this personalized/probabilistic approach to cancer diagnostics is more robust and clinically informative than the more traditional binary (yes/no) tests and may represent a promising new direction in the early detection of ovarian cancer and perhaps other cancer types as well,” the authors concluded.

Reference

Ban D, Housley SN, Matyunina LV, et al. A personalized probabilistic approach to ovarian cancer diagnostics. Gynecol Oncol. Published online January 23, 2024. doi:10.1016/j.ygyno.2023.12.030