ESPERANTO Consortium


ESPERANTO is a collaborative research program built on a partnership between 15 academic and 4 non-academic partners with complementary expertise and resources. The project involves academic institutions on 4 continents that have been leading the field of speech processing for decades and that cover a wide range of speech processing applications.

Academic partners

LIUM (Laboratoire d'Informatique de l'Université du Mans) - project coordinator

The research of the Language and Speech Technology (LST) team covers machine translation, speech recognition, speech understanding, speaker characterization and text-to-speech.

The LIUM has organized many national and international workshops and conferences, and it produces open-source resources (corpora and toolkits) for speech processing. LIUM has been involved in many international challenges (NIST-SRE, REPERE, MGB, WLT…).



ViVoLAB - Universidad Zaragoza

UNIZAR is a public university with its main campus in Zaragoza, Spain. VIVOLAB is the speech technologies group, recognized as a reference research group. It is composed of 7 academic and 2 post-doc researchers, and several PhD students. VIVOLAB has three main research lines:

  • Acoustic information processing: speaker recognition & diarization, multimodal (speaker & face) diarization, acoustic event detection (monitoring systems with optical fibers), non-standard speech, speech enhancement…
  • Interaction technologies: automatic speech recognition, text-to-speech, language modelling, alternative and augmentative communication systems for inclusive interaction.
  • Multimedia content retrieval and indexing: metadata extraction, audiovisual summarization in collaboration with the Spanish public radio and television corporation (RTVE) through the RTVE chair at UNIZAR.

Vivolab has organized the IberSpeech (2012, 2014, 2016) international conference on speech technologies and co-organized the 2016 Speaker Odyssey workshop. It has a long tradition of participating in national and international evaluation challenges. Vivolab has been organizing the Albayzín evaluations in speech technologies since 2006.


LNE (Laboratoire National de Métrologie et d'Essais)

The LNE is a French state-owned laboratory charged with testing and certifying products and with coordinating French metrology. Since 2008 it has been entrusted by national public agencies and industry with the evaluation of ICT systems (natural language processing, robotics …).
The LNE has organized AI evaluation campaigns (QUAERO, REPERE, MAURDOR …) and specializes in designing evaluation plans, protocols and metrics.


BUT Speech@FIT

The BUT Speech@FIT group is part of the Department of Computer Graphics and Multimedia at the Faculty of Information Technology at Brno University of Technology. Its more than 20 members (faculty, research staff, PhD students and support) are world-renowned experts in speech data mining: speaker and language identification, speech recognition, and keyword spotting. Excellent results in various international evaluations (especially those organized by NIST) are among its main achievements. The group is also known for its work in feature extraction and acoustic modeling for speech recognition (features based on neural networks, multi-lingual training). The group has a track record of more than 15 European projects, and has also been funded by U.S. DARPA and IARPA. It has extensive industrial cooperation and has spun off three companies: Phonexia, ReplayWell and Voice Dimension. BUT Speech@FIT organizes Interspeech 2021 in Brno.


LIA (Laboratoire d'Informatique d'Avignon)

The LIA is part of Avignon University, which has more than 700 years of history.

It is composed of about 80 people, including 65 teacher-researchers (30 permanent staff and about 30 PhD students) who work on three main research themes: Automatic Processing of Natural Language (whether written or oral), Operational Research and Networks. The LIA is part of the Laboratory of Excellence (Labex) Brain and Language Research Institute (BLRI).



LIG (Laboratoire Informatique de Grenoble)

The LIG is the computer science lab of University Grenoble Alpes and includes the GETALP team (Study Group for Machine Translation and Automated Processing of Languages and Speech).

Resulting from the union of researchers in NLP, GETALP is a multidisciplinary group (computer scientists, linguists, phoneticians, translators and signal processing specialists) with the objective to address theoretical, methodological, and practical aspects of multilingual communication and multilingual information processing.



Natural Language Processing Group of The University of Sheffield

The NLP Group of the University of Sheffield, established in 1993, is one of the largest and most successful language processing groups in the UK and has a strong global reputation. Its research themes include:

  • Information Access
  • Language Resources and Architectures for NLP
  • Machine Translation
  • Human-Computer Dialogue Systems
  • NLP for social media
  • Biomedical Text Processing

UNIMAS (Universiti Malaysia Sarawak)

UNIMAS is Malaysia's eighth university, officially incorporated on 24 December 1992. We are in Sarawak, Borneo, a unique island that is home to more than 100 indigenous languages.

UNIMAS's commitment to research has been recognized by local and international stakeholders and industry partners through various endowments for the establishment of eight research chairs, including the Tun Zaidi Chair for Medicinal Chemistry, the Tun Openg Chair for Sago Technology, the Shell Chair for Environmental Studies, and the Sapura Chair for ICT. UNIMAS has organized international conferences on natural language processing such as IALP 2014 and MALINDO 2012.


USM (Universiti Sains Malaysia)

USM is a public research university founded on 1 June 1969 in Penang, Malaysia. It has three campuses: a main campus on the island of Penang, a health campus in Kelantan, and an engineering campus in Nibong Tebal. Since its early establishment, the School of Computer Sciences (CS) at USM has been active in the domain of artificial intelligence, particularly in natural language processing. The speech processing group in CS has collaborated with international researchers, for instance on the publication of language resources to LDC, and has also organized a number of international language and speech processing conferences such as SLTU 2009, IALP 2011 and SLAM 2014.


IDASCO (Informatique Distribuée Appliquée aux Systèmes Complexes)

People from the University of Yaounde 1 (UY1) involved in the project are members of IDASCO (Informatique Distribuée Appliquée aux Systèmes Complexes), a research team of the Laboratoire d’Informatique et Applications, a public research laboratory in computer science of the Faculty of Sciences of UY1. IDASCO is also a member of the research unit UMI 209 UMMISCO (IRD/Sorbonne Université).

The general objective of IDASCO is to develop models and tools that can be used to collect data produced by complex computational, linguistic, epidemiological or environmental systems, and extract knowledge from these data in order to better understand their structure and dynamics.


ICC (Instituto de Ciencias de la Computación)

CONICET is the main agency that fosters science and technology in Argentina. The speech lab involved in this project belongs to the Computer Science Institute (ICC, its acronym in Spanish), doubly affiliated to CONICET and the University of Buenos Aires (UBA). The lab has made contributions in different tasks of speech processing including coordination in dialogs, extraction of high-level information from speech and computer assisted language learning.



Speech Processing and Transmission Lab

The Laboratory for the Processing and Transmission of Speech (LPTV) of the University of Chile was created in the year 2000 by Professor Néstor Becerra Yoma, Ph.D., as part of the project “Robust Processing of Acoustic Patterns of Telephone and Internet Applications”, financed by CONICYT/FONDECYT, Chile. Our research area of interest is focused on the following topics: speech technologies, QoS on the Internet, and usability in engineering.



CENATAV (Centro de Aplicaciones de Tecnologías de Avanzada)

CENATAV is a center for theoretical and applied research in Pattern Recognition and Data Mining. It was created in 2004 and currently has around 30 researchers in the areas of Image & Video Processing, Data Mining and Voice Processing. The group has participated in projects in the areas of:

  • Speech Recognition
  • Speech Processing
  • Speaker and Language Characterization
  • Speaker Diarization



JHU (Johns Hopkins University)

The Center for Language and Speech Processing (CLSP) at the Johns Hopkins University is a world-renowned academic research center in human language technologies, including automatic speech recognition, speaker and language recognition, emotion and sentiment recognition from speech, speech translation, and keyword search, as well as text parsing, computational morphology and phonology, information extraction and retrieval, computational semantics and machine translation.

Among its many widely acknowledged contributions to the field is the organization and hosting since 1995 of the (now called) Frederick Jelinek Memorial Summer Workshops in Speech and Language Technology (JSALT), an 8-week summer research marathon that brings together top researchers from academia, industry and government/non-profit organizations at all levels of seniority (PhDs, graduate students and undergraduates) to work together collaboratively and intensively on a few select problems each year.



Mila (Montreal Institute for Learning Algorithms)

Mila is a research lab founded by Professor Yoshua Bengio of the Université de Montréal. It brings together researchers specializing in deep learning. Recognized globally for its significant contributions to that field, Mila has distinguished itself in the areas of language modelling, machine translation, speech recognition, object recognition and generative models. Since 2017, Mila has been a partnership between the Université de Montréal and McGill University, together with École Polytechnique de Montréal and HEC Montréal.


Non-Academic Partners


Phonexia specializes in technologies associated with data mining from speech. As a result of its research and development, Phonexia introduced the Phonexia Speech Platform for the commercial and government sectors. This product provides a complete set of state-of-the-art speech technologies (speaker / language / gender identification, keyword spotting, speech transcription and others) in a single software platform. The easy-to-integrate and scalable product makes it possible to understand massive amounts of audio data without listening to it. Phonexia gained expertise by building technologies and solutions for voice verification in call centers and for voicebots (also called voice-enabled chatbots) for Central European languages.



Elyadata was founded in 2012 to provide businesses with services that leverage world-class data-driven technologies to their full potential and deliver the ROI customers expect from their investment. ELYADATA specializes in applying advanced AI research findings to real business cases and provides R&D consulting services in artificial intelligence and data-related fields (data mining, data architecture, ...). Its main clients include leading insurance companies and banks in Europe and governments in the MENA region.


Omilia is a speech technology company with offices in Athens (Greece), Chania (Greece), Limassol (Cyprus), Kiev (Ukraine), and Toronto (Canada). Omilia has developed an entire stack of spoken language understanding technologies (including automatic speech recognition, natural language understanding, dialogue management, voice biometrics and fraud detection, among others) that allow enterprises to realize the digital transformation of their customer care. Its products are used by some of the largest financial institutions in North America, such as Royal Bank of Canada and Discover Financial Services (USA).


Allo-Media provides an AI platform based on Call Tracking, Automatic Natural Language Recognition, and Speech Analytics that helps convert calls into the right actions. Allo-Media offers two products: CookieVocal™, which applies AI to phone calls for acquisition marketing campaigns, linking phone conversation data to analytics solutions to increase marketing efficiency and improve customer experience; and Scribr, a SaaS speech analytics platform for phone calls. Allo-Media’s proprietary live call transcripts and voice-of-the-customer analysis provide actionable data for continuous improvement in customer experience processes.


Collaborating Institutions

NIST (National Institute of Standards and Technology)

NIST manages a broad array of research areas throughout its research laboratories. One of those research areas is speech technology evaluation and metrology, managed by the Information Access Division of the Information Technology Laboratory.  NIST has expertise in:

  • human-assisted speech technology,
  • low-resourced speech technology,
  • explainability in speech technology.




Project Coordinator:

Prof. Anthony Larcher, LMU (anthony.larcher[at]univ-lemans.fr)

Project Manager:

Dr. Emmanuelle Billard, LMU (emmanuelle.billard[at]univ-lemans.fr)

Le Mans Université

Avenue Olivier Messiaen,

72085 - LE MANS Cedex 09

+33 (0)2 43 23 38 53