UMR 8135 CNRS - INaLCO

ANR contracts

NaijaSynCor – A Corpus-Based Macro-Syntactic Study of Naija (Nigerian Pidgin) (01.02.2017-31.07.2020):

NaijaSynCor takes an exhaustive and in-depth look at the structure of Naija (Nigerian Pidgin) in Nigeria today. As spoken by educated Nigerians in Lagos, it has been shown to be developing as a distinct language, separate from Nigerian English. The project assesses whether this holds true for the rest of Nigeria, where Naija is spoken by over 75 million people. It examines diachronic, diatopic, diaphasic, diastratic and genre variation. The project is a collaborative effort between two leading Nigerian experts on Naija (F. Egbokhare and C. Ofulue) and two research units that have demonstrated their expertise in corpus annotation in previous programmes: LLACAN, on lesser-described languages, and MoDyCo, on the interaction of prosody and syntax in French and the development of large treebanks.

BULB (Breaking the Unwritten Language Barrier) (01.03.2015-28.02.2018):

At a time when a growing number of languages are in danger of extinction and linguists are in dire need of efficient language documentation tools, Breaking the Unwritten Language Barrier (BULB) aims to support the documentation of unwritten languages with the help of modern natural language processing technologies, in particular automatic speech recognition (ASR) and machine translation (MT).

This ANR/DFG project relies on close German-French cooperation between linguists and computer scientists: ZAS (F. Hamlaoui), KIT (S. Stücker) and the University of Stuttgart (S. Zerbian) on the German side, and LPP (M. Adda-Decker, A. Rialland), LLACAN (M. van de Velde, D. Idiatov), LIMSI (L. Lamel and F. Yvon), LIG (L. Besacier) and IMMI-CNRS (G. Adda) on the French side. These researchers and their local teams are pooling their expertise to document three largely unwritten and generally under-resourced African languages of the Bantu family: Basaa (Cameroon), Myene (Gabon) and Embosi (Republic of Congo).

ELLAF (Encyclopaedia of Literature in African Languages) (01/2014-01/2017):

Literature in African languages, though little known, is rich and encompasses oral literature as well as literature written in a variety of scripts. The great linguistic and formal diversity of African literature raises important analytical questions, as well as questions relating to the theory of literature in general: What is the link between the status of a language and its capacity to produce literary texts? What is the relationship between oral literature and literary writing?

To improve the state of documentation, it is essential to develop documentation tools; they are a prerequisite for interdisciplinary research that goes beyond the limits of a single type of literature. ELLAF aims to be both a database of literature in African languages, irrespective of their sociolinguistic status, and a research platform. Following a shared protocol, excerpts or full versions of literary texts are presented on the ELLAF website with a translation into French and/or English. Each text is contextualised: information is given on the circumstances of its creation or performance, and the literary genre to which it belongs is identified.

CorTypo (03/2013-03/2017):

The aim of the CorTypo project is to develop an innovative system for the linguistic annotation of natural-language corpora of lesser-described spoken languages, with a view to testing linguistic hypotheses on spontaneous discourse data from a typological perspective.

To achieve this goal, a number of fundamental theoretical questions concerning language form and language function must be resolved. Crucially, the project addresses the question of what kind of theoretical apparatus is required to compare languages that use different formal means and serve different functions.

By implementing theoretical solutions in corpus design and database design, the project provides the basis for the empirical testing and falsification of hypotheses, and makes it possible to formulate new hypotheses on language structure and cross-linguistic comparison. By proposing solutions to the problem of linguistic interoperability, it paves the way for large-scale typological work based on first-hand natural language data.

RefLex (12/2010 – 05/2015):

The RefLex project aims to provide the scientific community with (i) a lexical reference corpus of the languages of Africa and (ii) the tools needed to process and analyse the data in this corpus. A more detailed description (in French) is available as a PDF document.

Sénélangues (10/2009 – 01/2014):

The project aims to contribute to the documentation and description of the languages of Senegal and to the classification of the Atlantic languages. Identifying the least documented and/or most endangered languages will allow us to define research priorities. Describing these languages will considerably advance our knowledge of the languages of Senegal and help safeguard those that are endangered. The project intends to make an Africanist contribution to linguistic typology as well as to language classification; it will provide valuable arguments for revising the contested classification of the Atlantic language family.

CORPAFROAS (2007-2012):

CORPAFROAS (led by Amina Mettouchi) was a project funded by the ANR (France) from 2007 to 2012. It was an integrated pilot project carried out by field linguists for field linguists and typologists, which proposed a methodology for processing fieldwork textual data in little-known languages, from data gathering to automatic searches on the corpus. It developed new free, open-source and user-friendly software, ELAN-CorpA, based on ELAN (Max Planck Institute, Nijmegen). It made available a corpus of time-aligned, annotated first-hand transcriptions of narrative and conversational data from different Afroasiatic languages, with accompanying sound files, lists of glosses, grammatical sketches and metadata. The corpus is freely accessible online, together with software, tools and publications intended to facilitate contributions to CORPAFROAS by other field linguists and to inspire initiatives modelled on the CORPAFROAS project.