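Value of machine learning models based on clinicopathological features and inflammatory markers for predicting lymphovascular invasion in gastric cancer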

杨宝顺; 马晓梅; 曹栋; 张恒

doi:10.12025/j.issn.1008-6358.2026.20250949

Abstract:
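Objective To evaluate the value of machine learning models built on clinicopathological features and inflammatory markers for the preoperative prediction of lymphovascular invasion (LVI) in gastric cancer.
Methods Patients with primary gastric cancer were retrospectively enrolled: 193 from The First Hospital of Lanzhou University and 185 from Zhongshan Hospital, Fudan University, serving as the training set and the validation set, respectively. Clinicopathological features, tumor markers, and inflammatory markers were collected, and independent risk factors for LVI were screened. Six machine learning models were built in the training set. Their predictive performance was assessed with the area under the ROC curve (area under the curve, AUC), calibration curves, decision curves, and the Brier score, and Shapley additive explanations (SHAP) were used for visual analysis.
Results Multivariate logistic regression showed that deeper tumor invasion (T stage), a greater number of metastatic lymph nodes (N stage), and a higher systemic immune-inflammation index (SII) were independent risk factors for LVI in gastric cancer (P < 0.05). Six machine learning models were built on these three indicators, and all showed good predictive performance, with a minimum AUC of 0.79 in the training set and 0.76 in the validation set. The LightGBM model performed best overall, with AUCs of 0.83 and 0.82 and Brier scores of 0.163 and 0.187 in the training and validation sets, respectively; calibration curve and decision curve analyses confirmed that the model had good predictive accuracy and clinical utility. SHAP analysis showed that, in the LightGBM model, N stage contributed most, followed by T stage and SII.
Conclusion Machine learning models built on clinicopathological features and inflammatory markers can effectively predict LVI status in gastric cancer, with the LightGBM model performing best.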
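
The abstract names the SII as one of the three predictors but does not state how it was calculated. As a minimal sketch, assuming the commonly used definition (platelet count × neutrophil count / lymphocyte count, all in 10^9/L), the index could be computed as below. The function name and the blood counts are hypothetical and are not taken from the study.

```python
def systemic_immune_inflammation_index(platelets, neutrophils, lymphocytes):
    """Return SII = platelets * neutrophils / lymphocytes.

    Assumes the commonly used definition, with every count in 10^9/L.
    The study's exact formula and cutoff are not given in the abstract.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


# Hypothetical blood counts (10^9/L), for illustration only.
sii = systemic_immune_inflammation_index(
    platelets=250.0, neutrophils=4.0, lymphocytes=2.0
)
print(sii)  # 500.0
```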

Abstract:
Objective To evaluate the performance of machine learning models integrating clinicopathological features and inflammatory markers for the preoperative prediction of lymphovascular invasion (LVI) in gastric cancer.
Methods A retrospective cohort of 193 gastric cancer patients from The First Hospital of Lanzhou University (training set) and 185 patients from Zhongshan Hospital, Fudan University (validation set) was included. Preoperative clinicopathological characteristics, tumor markers, and inflammatory markers were collected to identify independent risk factors for LVI. Six machine learning models were established in the training set. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, decision curve analysis (DCA), and the Brier score. Shapley additive explanations (SHAP) were applied for model interpretability.
Results Multivariate logistic regression showed that greater tumor invasion depth (T stage), a greater number of lymph node metastases (N stage), and a higher systemic immune-inflammation index (SII) were independent risk factors for LVI in gastric cancer (P < 0.05). Using these three indicators, six machine learning models were developed, all of which demonstrated good predictive performance, with minimum AUC values of 0.79 in the training set and 0.76 in the validation set. Among them, the light gradient boosting machine (LightGBM) model showed the best overall performance, achieving AUCs of 0.83 and 0.82 and Brier scores of 0.163 and 0.187 in the training set and the validation set, respectively. Calibration curves and DCA further confirmed that the model had good predictive accuracy and clinical utility. SHAP analysis of the LightGBM model identified N stage as the top contributor, followed by T stage and SII.
Conclusion Machine learning models incorporating clinicopathological features and inflammatory markers can effectively predict LVI status in gastric cancer, with the LightGBM model demonstrating the best performance.
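
The two headline metrics in the abstract, AUC and the Brier score, can be illustrated with a short standard-library sketch. It is not the authors' code: the function names, labels, and predicted probabilities are hypothetical. The AUC is computed as the fraction of (LVI-positive, LVI-negative) pairs in which the positive case gets the higher predicted probability, with ties counted as one half, and the Brier score is the mean squared difference between predicted probability and observed outcome (lower is better).

```python
def roc_auc(y_true, y_prob):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count concordant pairs; a tied pair contributes one half.
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = LVI present) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

print(roc_auc(y_true, y_prob))                # 0.75
print(round(brier_score(y_true, y_prob), 4))  # 0.1925
```

On this reading, the reported Brier scores of 0.163 and 0.187 sit below 0.25, the score of a model that always predicts a probability of 0.5.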
