Publications - ESPERANTO

Dissemination

- Speaker embeddings by modeling channel-wise correlations
  Stafylakis T., Rohdin J., Burget L., Speaker embeddings by modeling channel-wise correlations, Interspeech 2021
  Read more
- Log-Likelihood-Ratio Cost Function as Objective Loss for Speaker Verification Systems
  Mingote, V., Miguel, A., Ortega, A., Lleida, E. "Log-Likelihood-Ratio Cost Function as Objective Loss for Speaker Verification Systems" 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Brno, Czech Republic.
  Read more
- Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021
  Gimeno, P.; Ortega, A.; Miguel, A.; Lleida, E. "Unsupervised Representation Learning for Speech Activity Detection in the Fearless Steps Challenge 2021" 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Brno, Czech Republic.
  Read more
- The Domain Mismatch Problem in the Broadcast Speaker Attribution Task
  Viñals, I.; Ortega, A.; Miguel, A.; Lleida, E. "The Domain Mismatch Problem in the Broadcast Speaker Attribution Task". Applied Sciences, vol. 11, no. 18, p. 8521, Sept. 2021.
  Read more
- Generalising AUC Optimisation to Multiclass Classification for Audio Segmentation with Limited Training Data
  Gimeno, P.; Ortega, A.; Miguel, A.; Lleida, E. "Generalising AUC Optimisation to Multiclass Classification for Audio Segmentation with Limited Training Data". IEEE Signal Processing Letters, 28, pp. 1135-1139, 2021.
  Read more
- The LIUM Human Active Correction Platform for Speaker Diarization
  Flucha, A., Larcher, A., Mehrish, A., Meignier, S., Plaut, F., Poupon, N., Prokopalo, Y., Puertolas, A., Shamsi, M., Tahon, M. (2021) The LIUM Human Active Correction Platform for Speaker Diarization. Proc. Interspeech 2021, 965-966
  Read more
- MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification
  Mošner L., Plchot O., Burget L., Černocký J. (2022) MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification. ICASSP 2022.
  Read more
- Multi-channel Speaker Verification with Conv-TasNet Based Beamformer
  Mošner L., Plchot O., Burget L., Černocký J. (2022) Multi-channel Speaker Verification with Conv-TasNet Based Beamformer. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  Read more
- Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch
  Silnova A., Stafylakis T., Mosner L., Plchot O., Rohdin J., Matejka P., Burget L., Glembek O., Brummer N. (2022) Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. Odyssey: The Speaker and Language Recognition Workshop 2022
  Read more
- Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation
  Alam J., Beneš R., Beszédeš M., Burget L., Dahmane M., Fathan A., Ghodrati H., Glembek O., Kang W.H., Matějka P., Mošner L., Plchot O., Rohdin J., Silnova A., Stafylakis T. (2022) Development of ABC Systems for the 2021 Edition of NIST Speaker Recognition Evaluation. Odyssey: The Speaker and Language Recognition Workshop 2022
  Read more
- Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings
  Brummer N., Swart A., Mosner L., Silnova A., Plchot O., Stafylakis T., Burget L. (2022) Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. Interspeech 2022
  Read more
- Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries
  Stafylakis T., Mosner L., Plchot O., Rohdin J., Silnova A., Burget L., Cernocky J. (2022) Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. Interspeech 2022
  Read more
- Microphone Array Channel Combination Algorithms for Overlapped Speech Detection
  Mariotte T., Larcher A., Montresor S., Thomas J.-H. (2022) Microphone Array Channel Combination Algorithms for Overlapped Speech Detection. Interspeech 2022
  Read more
- Phone-Level Pronunciation Scoring for Spanish Speakers Learning English Using a GOP-DNN System
  Vidal J., Bonomi C., Sancinetti M., Ferrer L. (2022) Phone-Level Pronunciation Scoring for Spanish Speakers Learning English Using a GOP-DNN System. Interspeech 2022
  Read more
- ViVoLAB System Description for the S2TC IberSPEECH-RTVE 2022 challenge
  Miguel, A., Ortega, A., & Lleida, E. (2022). ViVoLAB System Description for the S2TC IberSPEECH-RTVE 2022 challenge. Proc. IberSPEECH 2022, 284.
  Read more
- End-to-End Speech Translation of Arabic to English Broadcast News.
  Bougares F. and Jouili S., End-to-End Speech Translation of Arabic to English Broadcast News, The Seventh Arabic Natural Language Processing Workshop (WANLP 2022)
  Read more
- Automatic Voice Disorder Detection Using Self-Supervised Representations
  D. Ribas, M. A. Pastor, A. Miguel, D. Martínez, A. Ortega and E. Lleida, "Automatic Voice Disorder Detection Using Self-Supervised Representations," in IEEE Access, vol. 11, pp. 14915-14927, 2023, doi: 10.1109/ACCESS.2023.3243986.
  Read more
- Class token and knowledge distillation for multi-head self-attention speaker verification systems
  Mingote, V.; Miguel, A.; Ortega, A.; Lleida, E. Class token and knowledge distillation for multi-head self-attention speaker verification systems. Digital Signal Processing, 2023, vol. 133, p. 103859.
  Read more
- Multimodal Diarization Systems by Training Enrollment Models as Identity Representations
  Mingote, V.; Viñals, I.; Gimeno, P.; Miguel, A.; Ortega, A.; Lleida, E.
  Multimodal Diarization Systems by Training Enrollment Models as Identity Representations. Applied Sciences. 2022; 12(3):1141
  DOI:10.3390/app12031141, Corpus ID: 246348513
  Read more
- aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems.
  Mingote, V.; Miguel, A.; Ribas D.; Ortega, A.; Lleida, E. aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 772-784, 2022, doi: 10.1109/TASLP.2022.3145307
  Read more
- Unsupervised Adaptation of Deep Speech Activity Detection Models to Unseen Domains
  Gimeno, P.; Ribas, D.; Ortega, A.; Miguel, A.; Lleida, E. Unsupervised Adaptation of Deep Speech Activity Detection Models to Unseen Domains. Applied Sciences 2022, 12, 1832. https://doi.org/10.3390/app12041832
  Read more
- Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement
  Ribas, D.; Miguel, A.; Ortega, A.; Lleida, E. Wiener Filter and Deep Neural Networks: A Well-Balanced Pair for Speech Enhancement. Applied Sciences 2022, 12, 9000.
  DOI:10.3390/app12189000, Corpus ID: 252174387
  Read more
- Shouted and whispered speech compensation for speaker verification systems
  Prieto, S.; Ortega, A.; López-Espejo, I.; Lleida, E. Shouted and whispered speech compensation for speaker verification systems. Digital Signal Processing, 2022, vol. 127, p. 103536.
  https://doi.org/10.1016/j.dsp.2022.103536
  Read more
- On the Problem of Data Availability in Automatic Voice Disorder Detection
  Ribas, Dayana; Miguel, Antonio; Ortega, Alfonso; Lleida, Eduardo
  On the Problem of Data Availability in Automatic Voice Disorder Detection. Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2023) – Volume 5: HEALTHINF, 2023, ISBN: 978-989-758-631-6.
  Read more
- S3prl-Disorder: Open-Source Voice Disorder Detection System based in the Framework of S3PRL-toolkit
  Ribas, D., Yoldi, M.A.P., Miguel, A., Martínez, D., Ortega, A., Lleida, E. (2022) S3prl-Disorder: Open-Source Voice Disorder Detection System based in the Framework of S3PRL-toolkit. Proc. IberSPEECH 2022, 136-140,
  doi: 10.21437/IberSPEECH.2022-28
  Read more
- A Study on the Use of wav2vec Representations for Multiclass Audio Segmentation
  Gimeno, P., Ortega, A., Miguel, A., Lleida, E. (2022) A Study on the Use of wav2vec Representations for Multiclass Audio Segmentation. Proc. IberSPEECH 2022, 56-60, doi: 10.21437/IberSPEECH.2022-12
  Read more
- A Transfer Learning Approach for Pronunciation Scoring
  M. Sancinetti, J. Vidal, C. Bonomi and L. Ferrer, "A Transfer Learning Approach for Pronunciation Scoring," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 6812-6816, doi: 10.1109/ICASSP43922.2022.9747727.
  Read more
- Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing
  Kakouros S., T. Stafylakis, L. Mošner and L. Burget, "Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094673.
  Read more
- Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models
  Kesiraju, S., Sarvaš, M., Pavlíček, T., Macaire, C., Ciuba, A. (2023) Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. Proc. INTERSPEECH 2023, 2148-2152, doi: 10.21437/Interspeech.2023-2506
  Read more
- Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization
  Landini F., M Diez, A. Lozano-Diez and L. Burget, "Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. _, doi: 10.1109/ICASSP49357.2023.10097049.
  Read more
- Semantic Enrichment Towards Efficient Speech Representations
  Laperrière, G., Nguyen, H., Ghannay, S., Jabaian, B., Estève, Y. (2023) Semantic Enrichment Towards Efficient Speech Representations. Proc. INTERSPEECH 2023, 705-709, doi: 10.21437/Interspeech.2023-2234
  Read more
- An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies
  Lleida E.; Rodriguez-Fuentes, L.J.; Tejedor; Ortega, A.; Miguel, A.; Bazán, V.; Pérez, C.; de Prada, A.; Penagarikano, M.; Varona, A.; et al. An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies. Appl. Sci. 2023, 13, 8577. https://doi.org/10.3390/app13158577
  Read more
- An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification
  Peng J., O Plchot, T. Stafylakis, L. Mošner, L. Burget and J. Černocký, "An Attention-Based Backend Allowing Efficient Fine-Tuning of Transformer Models for Speaker Verification," 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 555-562, doi: 10.1109/SLT54892.2023.10022775.
  Read more
- Cross-Corpus Training Strategy for Speech Emotion Recognition Using Self-Supervised Representations
  Pastor, M.A.; Ribas, D.; Ortega, A.; Miguel, A.; Lleida, E. Cross-Corpus Training Strategy for Speech Emotion Recognition Using Self-Supervised Representations. Appl. Sci. 2023, 13, 9062.
  https://doi.org/10.3390/app13169062
  Read more
- Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters
  Peng J., Themos Stafylakis, Rongzhi Gu, Oldřich Plchot, Ladislav Mošner, Lukáš Burget, Jan Černocký, "Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10094795.
  Read more
- Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations
  Stafylakis T., L. Mošner, S. Kakouros, O. Plchot, L. Burget and J. Černocký, "Extracting Speaker and Emotion Information from Self-Supervised Speech Models via Channel-Wise Correlations," 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar, 2023, pp. 1136-1143, doi: 10.1109/SLT54892.2023.10023345.
  Read more
- Direct Text to Speech Translation System Using Acoustic Units
  Mingote V., P. Gimeno, L. Vicente, S. Khurana, A. Laurent and J. Duret, "Direct Text to Speech Translation System Using Acoustic Units," in IEEE Signal Processing Letters, vol. 30, pp. 1262-1266, 2023, doi: 10.1109/LSP.2023.3313513.
  Read more
- ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks.
  Antoine Laurent, Souhir Gahbiche, Ha Nguyen, Haroun Elleuch, Fethi Bougares, Antoine Thiol, Hugo Riguidel, Salima Mdhaffar, Gaëlle Laperrière, Lucas Maison, Sameer Khurana, and Yannick Estève. 2023. ON-TRAC Consortium Systems for the IWSLT 2023 Dialectal and Low-resource Speech Translation Tasks. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 219–226, Toronto, Canada (in-person and online). Association for Computational Linguistics.
  Read more
- Toroidal Probabilistic Spherical Discriminant Analysis
  A. Silnova, N. Brümmer, A. Swart and L. Burget, "Toroidal Probabilistic Spherical Discriminant Analysis," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5, doi: 10.1109/ICASSP49357.2023.10095580.
  Read more
- Description and Analysis of ABC Submission to NIST LRE 2022
  Matejka, P., Silnova, A., Slavíček, J., Mosner, L., Plchot, O., Klčo, M., Peng, J., Stafylakis, T., Burget, L. (2023) Description and Analysis of ABC Submission to NIST LRE 2022. Proc. INTERSPEECH 2023, 511-515, doi: 10.21437/Interspeech.2023-1529
  Read more
- Improving Speaker Verification with Self-Pretrained Transformer Models
  Peng, J., Plchot, O., Stafylakis, T., Mosner, L., Burget, L., Černocký, J. (2023) Improving Speaker Verification with Self-Pretrained Transformer Models. Proc. INTERSPEECH 2023, 5361-5365, doi: 10.21437/Interspeech.2023-453
  Read more
- Multi-Channel Speech Separation with Cross-Attention and Beamforming
  Mosner, L., Plchot, O., Peng, J., Burget, L., Černocký, J. (2023) Multi-Channel Speech Separation with Cross-Attention and Beamforming. Proc. INTERSPEECH 2023, 1693-1697, doi: 10.21437/Interspeech.2023-2537
  Read more
- BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task
  Kesiraju, S., Beneš, K., Tikhonov, M., & Černocký, J.H. (2023). BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. International Workshop on Spoken Language Translation.
  Read more
- Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus
  Rahim M. Z., S. S. Juan and F. S. Mohamad, "Improving Speaker Diarization for Low-Resourced Sarawak Malay Language Conversational Speech Corpus," 2023 International Conference on Asian Language Processing (IALP), Singapore, Singapore, 2023, pp. 228-233, doi: 10.1109/IALP61005.2023.10337314.
  Read more
- Improved Vocal Effort Transfer Vector Estimation For Vocal Effort-Robust Speaker Verification
  I. López-Espejo, S. Prieto, A. Ortega and E. Lleida, "Improved Vocal Effort Transfer Vector Estimation For Vocal Effort-Robust Speaker Verification," 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), Rome, Italy, 2023, pp. 1-6, doi: 10.1109/MLSP55844.2023.10285923.
  Read more
- Towards Lifelong Human Assisted Speaker Diarization
  Meysam Shamsi, Anthony Larcher, Loïc Barrault, Sylvain Meignier, Yevhenii Prokopalo, Marie Tahon, Ambuj Mehrish, Simon Petitrenaud, Olivier Galibert, Samuel Gaist, André Anjos, Sébastien Marcel, Marta Costa-Jussà, Towards Lifelong Human Assisted Speaker Diarization, Computer Speech and Language, 2023, 77, pp.101437. ⟨10.1016/j.csl.2022.101437⟩.
  Read more
- Active Correction for Incremental Speaker Diarization of a Collection with Human in the Loop
  Yevhenii Prokopalo, Meysam Shamsi, Loïc Barrault, Sylvain Meignier, Anthony Larcher. Active Correction for Incremental Speaker Diarization of a Collection with Human in the Loop. Applied Sciences, 2022, ⟨10.3390/app12041782⟩. ⟨hal-03563148⟩
  Read more
- An explainable proxy model for multilabel audio segmentation
  Mariotte Théo, Antonio Almudévar, Marie Tahon, Alfonso Ortega. An explainable proxy model for multilabel audio segmentation. International Conference on Acoustics Speech and Signal Processing, IEEE, Apr 2024, Seoul (Korea).
  Read more
- Predefined Prototypes for Intra-Class Separation and Disentanglement
  Almudévar Antonio, Théo Mariotte, Alfonso Ortega, Marie Tahon, Luis Vicente, Antonio Miguel, Eduardo Lleida. Predefined Prototypes for Intra-Class Separation and Disentanglement. Interspeech 2024.
  Read more
- A Phonetic Analysis of Speaker Verification Systems through Phoneme selection and Integrated Gradients
  Thebaud Thomas, Gabriel Hernandez Sierra, Sarah Flora Samson Juan, Marie Tahon. A Phonetic Analysis of Speaker Verification Systems through Phoneme selection and Integrated Gradients. Speaker and Language Recognition Workshop - Odyssey, Jun 2024, Quebec, Canada.
  Read more
- 3MAS: a multitask, multilabel, multidataset semi-supervised audio segmentation model
  Lebourdais Martin, Pablo Gimeno, Théo Mariotte, Marie Tahon, Alfonso Ortega, Anthony Larcher. 3MAS: a multitask, multilabel, multidataset semi-supervised audio segmentation model. Speaker and Language Recognition Workshop Odyssey, Jun 2024, Quebec, Canada.
  Read more
- Automatic Speech Interruption Detection: Analysis, Corpus, and System
  Lebourdais Martin, Marie Tahon, Antoine Laurent, Sylvain Meignier. Automatic Speech Interruption Detection: Analysis, Corpus, and System. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-Coling 2024), ELRA Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy.
  Read more
- ALLIES: A Speech Corpus for Segmentation, Speaker Diarization Speech Recognition and Speaker Change Detection
  Tahon Marie, Anthony Larcher, Martin Lebourdais, Fethi Bougares, Ana Silnova, Pablo Gimeno. ALLIES: A Speech Corpus for Segmentation, Speaker Diarization Speech Recognition and Speaker Change Detection. Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-Coling 2024), ELRA Language Resources Association (ELRA); International Committee on Computational Linguistics (ICCL), May 2024, Torino, Italy. <hal-04578441>
  Read more
- Interprétabilité pour l’identification de locuteurs. Retour sur le projet JSALT 2023
  Tahon Marie, Imen Ben Amor, Nicolas Dugué, Jean-François Bonastre. Interprétabilité pour l’identification de locuteurs. Retour sur le projet JSALT 2023. Actes de la Journée Extraction de connaissances interprétables pour l’étude de la communication parlée. Journée commune AFIA-TLH -AFCP, 2023. <hal-04489273>
  Read more
- The CONILIUM proposition for Odyssey Emotion Challenge Leveraging major class with complex annotations
  Shamsi Meysam, Lara Gauder, Marie Tahon. The CONILIUM proposition for Odyssey Emotion Challenge Leveraging major class with complex annotations. The Speaker and Language Recognition Workshop (Odyssey), Jun 2024, Quebec, Canada. hal-04600047
  Read more
