From December 12, 2022 to April 19, 2023
Low-resource languages are languages for which little of the data needed to build effective models for NLP applications is available.
My work focuses on multilingual embedding methods that exploit semantic information from other, so-called resource-rich languages. The goal is to design models that perform well on tasks such as machine translation and speech alignment for low-resource languages, particularly Ewondo.
Ewondo is widely spoken, but few written resources exist for it. My work aims to develop NLP tools that translate easily and efficiently from languages such as French or English into Ewondo and vice versa. Such tools would have a strong societal impact by giving the population access to the many documents that exist only in French or English. The main difficulty of this task is the small amount of available data. The approaches currently being explored rely on transfer learning from better-resourced languages and on exploiting imperfect parallel corpora drawn from the Bible.
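Bible translations share a standard book/chapter/verse reference system, which is what makes them usable as parallel corpora: verses with the same reference can be paired across languages. They are also imperfect, because translations may number or merge verses differently or render a verse loosely. The sketch below illustrates one simple way to build and filter such pairs. It is not the pipeline used in this work: the function name `align_by_verse`, the `(book, chapter, verse)` key format and the word-count ratio threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical verse key: (book, chapter, verse), e.g. ("GEN", 1, 1).
VerseKey = Tuple[str, int, int]


def align_by_verse(
    source: Dict[VerseKey, str],
    target: Dict[VerseKey, str],
    max_length_ratio: float = 2.0,
) -> List[Tuple[str, str]]:
    """Pair verses that share a reference and drop likely misalignments.

    The word-count ratio filter is an illustrative heuristic (an assumption),
    not a method taken from the original work.
    """
    pairs: List[Tuple[str, str]] = []
    # Only references present in both translations can be paired.
    for key in sorted(source.keys() & target.keys()):
        src, tgt = source[key].strip(), target[key].strip()
        if not src or not tgt:
            continue  # skip empty verses (e.g. merged into a neighbouring verse)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Very different lengths often signal a misaligned or merged verse.
        ratio = max(n_src, n_tgt) / min(n_src, n_tgt)
        if ratio <= max_length_ratio:
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    # Placeholder tokens standing in for French and Ewondo verse text.
    french = {
        ("GEN", 1, 1): "a b c d",
        ("GEN", 1, 2): "a b c d e f g h",
        ("GEN", 1, 3): "a b",
    }
    ewondo = {
        ("GEN", 1, 1): "w x y",
        ("GEN", 1, 2): "w x",
        ("GEN", 1, 4): "w x y z",
    }
    print(align_by_verse(french, ewondo))  # → [('a b c d', 'w x y')]
```

In this toy example, GEN 1:2 is dropped because its word counts differ by a factor of 4. GEN 1:3 and 1:4 are dropped because each exists in only one translation. A real pipeline would likely need language-aware tokenization and a threshold tuned on the actual French–Ewondo data.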