FFR (Fon-French Neural Machine Translation)

Bonaventure DOSSOU and Chris Chinenye Emezue
An article on Idemi.africa discusses the project: https://idemi.africa/le-projet-ffr-et-les-recherches-en-intelligence-artificielle-ia-en-afrique/

"FFR v1.0" is the first stage of a Fon-French translation model project. The model was trained on the FFR dataset (https://github.com/bonaventuredossou/ffr-v1/tree/master/FFR-Dataset) using neural machine translation with attention. Masakhane (https://www.masakhane.io/, https://twitter.com/MasakhaneMt), an online community of African researchers working on machine translation for African languages, has produced translation models and baselines from and to many African languages. "Project FFR v1.0", however, is the first to make this effort on a large scale, by painstakingly amassing a large training dataset and exploring techniques for handling the Fon diacritics to improve translation accuracy. The goal is a publishable model that people can use with a reasonable degree of reliability.

To source the data for this research, we rigorously compiled it by web-scraping and parsing open-source dataset websites. The owners of the website were contacted and granted permission to collect the data from their site. Through these efforts, we obtained 53,975 Fon-French parallel words and sentences, which we used for the pilot stage. The dataset was then cleaned, pre-processed and tokenized in a way that preserves the diacritics and special characters of the Fon alphabet.
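
The cleaning and tokenization step described above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code used for FFR v1.0: the function names `normalize` and `tokenize` are hypothetical, and the choice of Unicode NFC normalization plus a regex tokenizer is our assumption about one reasonable way to keep Fon diacritics intact. The key point it shows is that combining marks (tone accents on letters such as ɔ and ɛ, which have no precomposed accented form) must stay attached to their base letter instead of being stripped or split off as separate tokens.

```python
import re
import unicodedata

# Hypothetical helpers for illustration; not taken from the FFR repository.

# A token is either a run of word characters, each optionally followed by
# combining diacritical marks (U+0300-U+036F), or one punctuation character.
_TOKEN_RE = re.compile(r"(?:\w[\u0300-\u036f]*)+|[^\w\s]")


def normalize(text: str) -> str:
    """Compose characters with NFC and collapse runs of whitespace.

    NFC merges a base letter and its accent into one code point when a
    precomposed form exists (e + U+0301 becomes U+00E9). Letters without a
    precomposed form keep their combining mark as a separate code point.
    """
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word and punctuation tokens."""
    return _TOKEN_RE.findall(normalize(text))


if __name__ == "__main__":
    # "\u0254\u0300" is the open o with a combining grave accent.
    print(tokenize("\u0254\u0300  wa!"))
```

A plain `\w+` pattern would not be enough here: Python's `\w` does not match combining marks, so the accent would be separated from letters like ɔ. Extending the pattern with the combining-marks range is the design choice that keeps each accented letter inside a single token.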

