Abstract:
Objective To develop a dual-stream deep learning model that improves the efficiency and accuracy of assessing videonystagmography calibration test results and enables effective recognition of saccadic undershoot waveforms.
Methods A vestibular function calibration test recognition model with cross-modal feature fusion was constructed by integrating a vision transformer (ViT) and a modified ConvNeXt convolutional network. The model took trajectory images and spatial distribution maps as inputs and used a multi-task learning framework both to classify calibration data and to directly evaluate undershoot waveforms.
Results The model showed outstanding performance in assessing calibration compliance. Its accuracy, sensitivity, and specificity for the left side, middle, and right side all exceeded 90%, and all AUC values exceeded 0.99. The best values were an accuracy of 97.66% (middle), a sensitivity of 98.98% (middle), a specificity of 96.87% (right side), and an AUC of 0.997 (right side). The model also showed promising performance in undershoot waveform recognition, with an accuracy of 87.50%, a sensitivity of 89.66%, a specificity of 85.71%, an F1 score of 86.67%, and an AUC of 0.931.
Conclusions The proposed method not only significantly improves the efficiency and accuracy of calibration test assessment but also provides a novel solution for undershoot waveform recognition.
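
For readers who want to see how the undershoot recognition metrics in the Results relate to one another, the following is a minimal sketch of the standard binary classification formulas. The confusion-matrix counts are hypothetical: the abstract does not report them, and they were chosen here only because they are consistent with the stated accuracy, sensitivity, specificity, and F1 score. The function name `binary_metrics` is likewise an illustrative assumption.

```python
# Hypothetical counts, NOT reported in the abstract. They are one set of
# values that happens to reproduce the stated undershoot metrics.
TP, FN, TN, FP = 26, 3, 30, 5


def binary_metrics(tp, fn, tn, fp):
    """Return standard binary classification metrics as fractions in [0, 1]."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,        # correct predictions / all samples
        "sensitivity": tp / (tp + fn),        # true positive rate (recall)
        "specificity": tn / (tn + fp),        # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),    # harmonic mean of precision and recall
    }


metrics = binary_metrics(TP, FN, TN, FP)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
# accuracy: 87.50%
# sensitivity: 89.66%
# specificity: 85.71%
# f1: 86.67%
```

The AUC cannot be recovered from a single confusion matrix, because it depends on the model's scores across all decision thresholds, so it is not computed here.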