NaijaSynCor: A Corpus-based Macro-Syntactic Study of Naija (Nigerian Pidgin)
NaijaSynCor Research Project
Discover the project
NaijaSynCor (A Corpus-based Macro-Syntactic Study of Naija, aka Nigerian Pidgin) takes an exhaustive, in-depth look at the structure of Naija as it is spoken in Nigeria today. Spoken by educated Nigerians, Naija has been shown to be developing in Lagos as a discrete language, separate from Nigerian English. This study sets out to assess whether this holds true for the rest of Nigeria, where Naija has over 75 million speakers. It examines diachronic, diatopic, diaphasic, diastratic, and genre variation.
The project is a collaborative effort between two leading Nigerian experts on Naija (F. Egbokhare & C. Ofulue) and two research units that have demonstrated their expertise in corpus annotation in previous programmes: Llacan, on lesser-described languages, and Modyco, on the interaction of prosody and syntax in French and the development of large treebanks. The macrosyntactic framework developed in the ANR Rhapsodie project (Lacheret, Pietrandrea & Tchobanov 2014) has proved particularly effective in dealing with the specificities of oral corpora, e.g. piles stacking, disfluencies, repetitions, discourse markers, overlaps, co-enunciation, false starts, self-repairs and truncations. This method is data-driven, inductive (the relevant units are identified through annotation) and modular.
The tools developed by the research team in these previous corpus study programmes are robust and mature enough for the project to focus on the linguistic problem posed by Naija: in its geographical and functional expansion, does Naija maintain its status as a discrete language, separate from Nigerian English, or does it undergo decreolization? While answering this question, the research programme aims to overcome two remaining technological challenges: (i) automatic identification of illocutionary units using intonation data as a parameter; and (ii) building a parser that integrates intonation data as a parameter.
Through the creation of a deeply annotated 500,000-word corpus, the project documents the emergence of Naija as a language at the national level, challenging existing theories of the development of creoles and languages in contact. Capitalizing on the latest developments in corpus annotation, this innovative approach to the dynamics of contact and change in the areas of human behaviour and sociology of language is expected to have a strong impact on the methodology and technology of research on emerging languages.
Starting: February 1st, 2017
Duration: 42 months
Meet the team
Partner #1: LLACAN, UMR 8135, Langages, Langues et Cultures d’Afrique Noire (Inalco – CNRS): http://llacan.vjf.cnrs.fr/
Partner #2: MODYCO, UMR 7114, Modèles, Dynamiques, Corpus (Université Paris-Ouest Nanterre La Défense – CNRS): https://scanr.enseignementsup-recherche.gouv.fr/structure/200112501N
Publications
- Bigi, Brigitte, Bernard Caron & Abiola S. Oyelere. 2017. Developing Resources for Automated Speech Processing of the African Language Naija (Nigerian Pidgin). 8th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 441–445. Poznan, Poland. https://hal.archives-ouvertes.fr/hal-01705707.
- Caron, Bernard. In press. Clefts in Naija, a Nigerian pidgincreole. Linguistic Discovery. 41 pp.
- Caron, Bernard. 2018a. NaijaSynCor. Methodological and technical challenges of a corpus-based study of Naija (a post-creole spoken in Nigeria). Keynote address presented at SYWAL 2018 (3rd Symposium on West African Languages), 28–29 September, Warsaw.
- Caron, Bernard. 2018b. Could Naija (aka Common Nigerian Pidgin) be a solution to the curse of indigeneity? (Nigerian Pidgin: E fit go be di future?). Paper presented at the ninth edition of the CAAS (Consortium for Asian and African Studies, CAAS 2018), Inalco, Paris.
- Courtin, Marine, Sylvain Kahane, Kim Gerdes & Bernard Caron. 2018. Establishing a Language by Annotating a Corpus: the Case of Naija, a Post-creole Spoken in Nigeria. In Sandra Kübler & Heike Zinsmeister (eds.), Proceedings of the Workshop on Annotation in Digital Humanities co-located with ESSLLI 2018, vol. 2155, 7–11. Sofia, Bulgaria: CEUR Workshop Proceedings.
- Gerdes, Kim, Bruno Guillaume, Sylvain Kahane & Guy Perrier. 2018. SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD. Workshop paper presented at the Universal Dependencies Workshop 2018 (UDW 2018), EMNLP 2018, Brussels. http://universaldependencies.org/udw18/PDFs/33_Paper.pdf (accessed 20 November 2018).
- Oyelere, Abiola S. 2018. Vowel Nasality in Naija. Paper presented at SYWAL 2018 (3rd Symposium on West African Languages), 28–29 September, Warsaw.
- Oyelere, Abiola S., Candide Simard & Anne Lacheret-Dujour. 2018. Prominence in the identification of the focus elements in Naija (Nigerian Pidgin). PROSLANG, 12–13. Wellington, New Zealand.