Speech and language technologies are effective tools for identifying the distinct speech changes associated with Parkinson's disease (PD), enabling earlier and more accurate diagnosis. Recent advances in self-supervised speech pretraining, particularly Wav2Vec models, have demonstrated superior performance over traditional feature extraction methods. While Wav2Vec 2.0 has been used successfully for PD detection, a rigorous quantitative comparison with Wav2Vec 1.0 is needed to evaluate its advantages, limitations, and applicability across different speech modes in PD. This study presents a systematic comparison of Wav2Vec 1.0 and Wav2Vec 2.0 embeddings across three multilingual datasets, using various classification approaches to distinguish the speech of healthy controls (HC) from PD speech. Additionally, both Wav2Vec versions were benchmarked against traditional baseline features across diverse linguistic contexts, including spontaneous speech, non-spontaneous speech, and isolated vowels. A multicriteria TOPSIS approach was employed to rank the feature extraction methods, revealing that Wav2Vec 2.0 consistently excelled across all speech modes, with its first transformer layer performing best in contextual tasks (read text and monologue) and its feature extractor performing best in vowel-based classification. In contrast, Wav2Vec 1.0, while generally outperformed by Wav2Vec 2.0, still provided a faster alternative with competitive performance in contextual tasks, highlighting its potential for specific applications such as federated learning. This comparative analysis underscores the strengths of each Wav2Vec architecture and informs their optimal use in PD detection.
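The TOPSIS ranking mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which criteria, weights, or normalisation the authors used, so everything here is an assumption: the function name `topsis`, the vector normalisation, the equal weights, and the three placeholder alternatives with made-up scores (two benefit criteria and one cost criterion such as extraction time). None of the numbers are results from the study.

```python
import numpy as np


def topsis(scores, weights, benefit):
    """Return the TOPSIS closeness coefficient for each alternative.

    scores  : (n_alternatives, n_criteria) array of raw criterion values
    weights : (n_criteria,) relative importance of each criterion
    benefit : (n_criteria,) True if higher is better, False if lower is better
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)

    # Normalise the weights so they sum to one.
    weights = weights / weights.sum()

    # Vector-normalise each criterion column, then apply the weights.
    # Assumes no column is all zeros.
    weighted = scores / np.linalg.norm(scores, axis=0) * weights

    # Ideal and anti-ideal points: best and worst value per criterion.
    ideal = np.where(benefit, weighted.max(axis=0), weighted.min(axis=0))
    anti_ideal = np.where(benefit, weighted.min(axis=0), weighted.max(axis=0))

    # Euclidean distance of every alternative to both reference points.
    d_best = np.linalg.norm(weighted - ideal, axis=1)
    d_worst = np.linalg.norm(weighted - anti_ideal, axis=1)

    # Closeness in [0, 1]; higher means closer to the ideal point.
    # Assumes the alternatives are not all identical.
    return d_worst / (d_best + d_worst)


# Placeholder alternatives and scores, purely illustrative.
methods = ["method_a", "method_b", "method_c"]
scores = [
    [0.90, 0.95, 1.0],  # criterion 1, criterion 2, cost (e.g. seconds)
    [0.80, 0.85, 2.0],
    [0.70, 0.75, 3.0],
]
closeness = topsis(scores, weights=[1, 1, 1], benefit=[True, True, False])

# Rank alternatives from best to worst by closeness coefficient.
ranking = [methods[i] for i in np.argsort(-closeness)]
print(ranking)
```

Because the closeness coefficient combines the distance to the ideal and to the anti-ideal point, a method that is slightly less accurate but much cheaper can still rank well, which is the kind of trade-off a multicriteria ranking is meant to expose.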
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by the project of the National Institute for Neurological Research (Programme EXCELES, ID Project No. LX22NPO5107), funded by the European Union, Next Generation EU.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The datasets used in this study are publicly available, one of them upon request from its authors.
1. (MDVR KCL) Available at: https://doi.org/10.5281/zenodo.2867216
2. (PC-GITA) Available upon request from Juan Rafael Orozco-Arroyave, Universidad de Antioquia (UdeA). The study complied with the Helsinki Declaration and was approved by the Ethics Committee of Clinica Noel in Medellín, Colombia. Written informed consent was signed by each participant.
3. Available at: https://figshare.com/articles/dataset/Voice_Samples_for_Patients_with_Parkinson_s_Disease_and_Healthy_Controls/23849127
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes