Analysis of four long noncoding RNAs for screening and prognosis of hepatocellular carcinoma using machine learning techniques

Hepatocellular carcinoma (HCC) represents a significant public health challenge in Egypt. Early detection is crucial for optimal patient outcomes. The aim of this study was to develop a machine learning model to improve HCC diagnosis by integrating long noncoding RNA (lncRNA) biomarkers with conventional liver function tests. In HCC, traditional biomarkers such as ALT and AST are often elevated due to liver damage. However, this increase is non-specific and occurs in various liver diseases, including hepatitis and cirrhosis9. ALT and AST measure hepatocyte integrity but lack specificity for the molecular changes unique to HCC. In contrast, lncRNAs such as UCA1, GAS5, LINC00152 and LINC00853 are involved in HCC-specific oncogenic processes, including the regulation of cell proliferation, apoptosis and metastasis, which directly correlate with cancer pathology13. These lncRNAs provide insights into the molecular basis of HCC that ALT and AST do not capture. By integrating these lncRNAs into our diagnostic model alongside ALT and AST, we significantly improved sensitivity and specificity and were better able to distinguish HCC from other liver diseases. This integration leverages the predictive information provided by lncRNAs, improving the overall diagnostic performance of the model and addressing the limitations of traditional liver enzymes in HCC screening.

Previous studies reported changes in the expression of lncRNAs in HCC tissues28 and their elevated levels in patients’ serum samples26. The four lncRNAs selected in this study, UCA1, GAS5, LINC00152, and LINC00853, were chosen for their demonstrated functional relevance in HCC and their potential as diagnostic biomarkers. UCA1 has been extensively documented as an oncogenic lncRNA involved in various malignancies, including HCC. It promotes cell proliferation, migration and resistance to apoptosis, in part through its interaction with key pathways such as the Hippo pathway, which influences tumor growth and survival17. Increased UCA1 expression has been associated with poor outcomes in HCC patients, suggesting that it may be an indicator of aggressive disease progression13. GAS5, on the other hand, functions as a tumor suppressor, and its downregulation in HCC is associated with increased proliferation and reduced apoptosis. Studies suggest that GAS5 plays a role in cell cycle arrest and apoptosis through mechanisms such as caspase-dependent stress pathways in the endoplasmic reticulum11. These opposing roles of UCA1 and GAS5 provide complementary insights into disease biology and justify their inclusion as diagnostic markers. LINC00152 has been shown to promote cell proliferation and migration through modulation of cyclin D1 (CCND1), with high expression associated with poor prognosis16. In addition, LINC00152 functions as a competing endogenous RNA (ceRNA) and influences oncogenic signaling pathways by binding miRNAs that regulate tumor suppressor genes. Although LINC00853 has been less extensively studied in HCC, there is emerging evidence for its role in cell proliferation and invasion, making it a promising candidate for further study as a diagnostic biomarker for HCC23.
Although several lncRNAs were tested at the beginning of this study, our final model prioritized the combination of these four lncRNAs, which optimized prediction accuracy while minimizing complexity and cost.

Our results show that lncRNAs alone provide moderate sensitivity and specificity for HCC diagnosis. Furthermore, some of the lncRNAs examined showed a prognostic association with mortality risk. The machine learning model we implemented significantly improved diagnostic sensitivity and specificity, highlighting the potential of this approach for improved early detection and diagnosis of HCC.
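Sensitivity and specificity, the two metrics reported throughout this discussion, can be computed from true and predicted labels as in the minimal Python sketch below. The function name and the toy labels are illustrative assumptions, not the study's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = HCC, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # HCC correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # HCC missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # fraction of true HCC cases detected
    specificity = tn / (tn + fp)  # fraction of controls correctly excluded
    return sensitivity, specificity


# Hypothetical toy labels, NOT taken from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A single marker with moderate sensitivity and specificity misses some cases and flags some controls, as in the toy labels above; a combined model is judged by the same two ratios.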

To our knowledge, this study represents a pioneering effort in using a machine learning model for HCC diagnosis by integrating lncRNAs with standard laboratory data. By leveraging the data processing capabilities of machine learning, we achieved a significant improvement in diagnostic performance, with sensitivity and specificity both approaching 100%. In addition, the developed model was translated into a user-friendly web application that was piloted by healthcare professionals. Their feedback indicated a straightforward interface that delivers fast and accurate results based on laboratory data. Compared with traditional diagnostic methods, this approach enables cost-effective screening of a large population. The use of readily available laboratory data for screening has the potential to reduce the financial burden on the healthcare system and enable more comprehensive and efficient service delivery.

Previous research has evaluated the use of artificial intelligence and the accuracy of machine learning (ML) for the prediction and/or diagnosis of HCC and documented differences in the accuracy of different models. Sato et al. compared different algorithms (logistic regression, SVM, gradient boosting) on clinical data and found that gradient boosting had the highest accuracy29. Angelis et al., who used a publicly available HCC dataset to evaluate various feature selection and classification techniques, also achieved the best results with gradient boosting (84% accuracy, 93% precision)30. Wong et al. reported that ridge regression and random forest models showed performance comparable to traditional scores such as CU-HCC (Chinese University HCC score) and GAG-HCC (Guide with Age, Gender, HBV DNA, Core promoter mutations and Cirrhosis) for HCC prediction in HBV/HCV patients31. In our study, the Support Vector Machine and Logistic Regression algorithms showed the strongest performance. These results highlight the importance of algorithm selection and the potential variability in model performance for HCC diagnosis. Moreover, although the performance of the model developed in this study is close to 100%, the results of a machine learning model are sensitive to sample size, experimental setup and dataset. Performance may therefore vary across datasets, and validation of the model on different datasets is strongly recommended to ensure clinical applicability.
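To illustrate why a near-100% result depends on sample size, the following Python sketch computes a Wilson score confidence interval for an observed proportion such as sensitivity. The function name and the counts are hypothetical assumptions chosen for illustration; they are not the study's cohort sizes, and the Wilson interval is offered as one common choice rather than the method used by the authors.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: every case detected, in a small and a larger cohort.
for n in (50, 500):
    low, high = wilson_interval(n, n)
    print(f"n={n}: observed 100%, 95% interval [{low:.3f}, {high:.3f}]")
```

With all 50 of 50 hypothetical cases detected, the lower bound is only about 0.93, whereas the same observed 100% in 500 cases gives a lower bound above 0.99, which is one way to see why validation on larger and independent datasets matters.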

Studies have also investigated the use of genetic data in ML models for HCC prediction. Chen et al. used a random forest model to investigate the HCC prediction potential of the HBV reverse transcriptase gene. Their model achieved optimal performance with a combination of 10 features and demonstrated robustness across different HBV genotypes and sequencing depths32. Likewise, Tao et al. applied a random forest model to distinguish HCC from chronic HBV infection based on ctDNA copy number aberrations. The model achieved robust performance in the two validation cohorts evaluated33.

Our study identified a significant association between increased mortality risk in HCC patients and both higher expression levels of LINC00152 and lower expression levels of GAS5. LINC00152 is known to be aberrantly expressed in various cancer types and is associated with cell proliferation, migration, invasion, therapeutic resistance, tumor growth and metastasis34. Previous research found overexpression of LINC00152 in HCC tissues compared to healthy controls and demonstrated its role as an independent prognostic factor associated with worse patient survival35,36, suggesting its potential as a therapeutic target for HCC37.

In contrast to LINC00152, GAS5 showed a protective effect against mortality in HCC patients, although it had higher expression levels in HCC compared to controls. Previous studies have documented the tumor suppressive role of GAS5 in HCC, including increased radiosensitivity, inhibition of invasion, and the poor prognosis associated with its downregulation38,39,40. Overall, our results suggest a complex role for GAS5 in HCC, potentially playing a role in tumorigenesis but also exerting a protective effect against disease progression.

Our study has some limitations. First, there are inherent limitations related to patient demographics. The average age of the study population (63 years) and its predominantly male composition (80%) are consistent with the typical HCC patient profile41,42. However, these characteristics could affect the generalizability of the model to populations with different age and gender distributions. Another limitation is the relatively small sample size. While our results provide valuable insights, a larger cohort could strengthen the generalizability of the results. We focused only on circulating lncRNA levels in plasma, which is suitable for screening purposes. However, integrating these data with tissue expression levels of the same lncRNAs would have provided a more comprehensive perspective, valuable validation of our results and a deeper understanding of the role of these lncRNAs in HCC. Finally, it is important to emphasize that although the performance of the models approaches 100%, which is very promising, the model needs to be validated on different datasets before moving to clinical use, in order to examine how sample size and variability between datasets influence the results.

Conclusions and recommendations

Our study shows that lncRNAs provide moderate diagnostic value for HCC. However, implementing a machine learning model that integrates lncRNAs with standard laboratory data significantly improves their diagnostic utility. This model can be easily translated into a user-friendly interface such as a website or mobile application, facilitating convenient use by healthcare professionals. The simplicity of the model, coupled with the relative speed and affordability of the underlying laboratory tests, makes it a promising tool for large-scale screening.

Future research directions include evaluating the robustness and prognostic prediction capabilities of the model in a larger patient cohort. In addition, the study of a broader range of lncRNAs promises further refinement and optimization of the model. Finally, examining the model’s ability to distinguish HCC from benign liver diseases represents a promising approach for future research.
