Analyzing wav2vec embedding in Parkinson's disease speech: A study on cross-database classification and regression tasks

Abstract

Advancements in deep learning speech representations have facilitated the effective use of extensive datasets composed of unlabeled speech signals and have achieved success in modeling tasks associated with Parkinson's disease (PD) with minimal annotated data. This study focuses on the non-fine-tuned wav2vec 1.0 architecture applied to PD speech. Using features derived from the wav2vec embedding, we develop machine learning models tailored to clinically relevant PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics, for instance, the subjects' age and the number of characters per second. The primary aim is to conduct a feature importance analysis on both classification and regression tasks, investigating whether latent discrete speech representations in PD are shared across models, particularly for related tasks. The proposed wav2vec-based models were evaluated on PD versus healthy controls using three PD datasets covering multiple languages and speech tasks. Results indicated that wav2vec accurately detected PD from speech, outperforming feature extraction based on mel-frequency cepstral coefficients in the proposed cross-database scenarios. Furthermore, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to intelligibility and aging. A subsequent analysis of important features, obtained using the built-in feature importance tools of scikit-learn and the Shapley additive explanations (SHAP) method, examined whether the classification and regression models overlap significantly in the features they rely on. The feature importance experiments revealed features shared across the trained models, with greater sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. In conclusion, the study proposes the wav2vec embedding as a promising step toward a speech-based universal model to assist in the evaluation of PD.
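To make the idea of feature overlap between a classification model and a regression model concrete, here is a minimal sketch that is not the authors' method. It assumes each trained model yields one non-negative importance score per wav2vec embedding dimension (for example, a scikit-learn tree ensemble's `feature_importances_` attribute, or mean absolute SHAP values per feature), and it measures sharing as the Jaccard overlap of the two models' top-k feature sets. The helper names `top_k_features` and `top_k_overlap`, the use of Jaccard overlap, and the choice of k are hypothetical illustrations.

```python
from __future__ import annotations

from typing import Sequence


def top_k_features(importances: Sequence[float], k: int) -> set[int]:
    """Return the indices of the k largest importance scores (hypothetical helper)."""
    # Rank feature indices by importance, highest first; Python's sort is stable,
    # so tied scores keep their original index order.
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return set(ranked[:k])


def top_k_overlap(imp_a: Sequence[float], imp_b: Sequence[float], k: int) -> float:
    """Jaccard overlap between the top-k feature sets of two models (assumed metric)."""
    if len(imp_a) != len(imp_b):
        raise ValueError("importance vectors must cover the same features")
    a = top_k_features(imp_a, k)
    b = top_k_features(imp_b, k)
    # 1.0 means both models rank the same k features highest; 0.0 means no overlap.
    return len(a & b) / len(a | b)
```

With hypothetical importance vectors over six embedding dimensions, `[0.30, 0.05, 0.25, 0.10, 0.20, 0.10]` for a classifier and `[0.28, 0.22, 0.02, 0.08, 0.25, 0.15]` for a regressor, the top-3 sets are {0, 2, 4} and {0, 1, 4}, so `top_k_overlap` returns 0.5. Comparing top-k sets rather than raw scores sidesteps the fact that built-in importances and SHAP values live on different scales.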

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the project of the National Institute for Neurological Research (Programme EXCELES, ID Project No. LX22NPO5107), funded by the European Union, Next Generation EU.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study used ONLY openly available human data. The Italian dataset is available at the following URL: https://ieee-dataport.org/open-access/italian-parkinsons-voice-and-speech. The English dataset is available at the following URL: https://zenodo.org/records/2867216. The dataset composed of participants rhythmically repeating the syllable /pa/ was openly available as a training dataset for the international competition Biosignal Challenge 2018 at Czech Technical University in Prague.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The Italian and English datasets analyzed during the current study are publicly available. The Italian data is available at the following URL: https://ieee-dataport.org/open-access/italian-parkinsons-voice-and-speech. The English data is available at the following URL: https://zenodo.org/records/2867216. The dataset composed of participants rhythmically repeating the syllable /pa/ is available from the corresponding author upon reasonable request.
