NaijaSynCor project: release of a pilot syntactic corpus of Naija
NaijaSynCor has just released a pilot syntactic corpus of Naija on the Turku Universal Dependencies Website: http://bionlp-www.utu.fi/dep_search/
The treebank was created within the NaijaSynCor project, directed by Bernard Caron and funded by the ANR, the French National Research Agency.
This corpus is a pilot (948 sentences and 12,863 tokens) for the larger corpus being built as part of the NaijaSynCor Project (Projet-ANR-16-CE27-0007). Its main aim is to develop and test the annotation and procedures used in the ANR project. It will be part of a larger 500,000-word corpus that will be projected onto prosodic and information structures and analysed for sociolinguistic variation (http://naijasyncor.huma-num.fr/).
The pilot corpus was recorded in various locations in Ibadan (Nigeria) by Bukola Babalola and Opeyemi Lewis. It was transcribed, translated and tagged manually using ELAN-CorpA (http://llacan.vjf.cnrs.fr/res_ELAN-CorpA_en.php) by Folakemi Ladoja, Emeka Onwuegbuzia, Biola Oyelere and Samson Tella under the supervision of Bernard Caron. It was converted to CoNLL by Mourad Aouini. The final Universal Dependencies annotations were manually checked by Sandra Bellato, Marine Courtin, Bernard Caron, Kim Gerdes, Sylvain Kahane, and Manying Zhang using the processing chain developed by Kim Gerdes. The guidelines were written by Marine Courtin and Sandra Bellato under the supervision of Sylvain Kahane, Bernard Caron, and Kim Gerdes.
The treebank can be searched for words, lemmas, parts of speech and functions at this address: http://bionlp-www.utu.fi/dep_search/
On the left-hand side, in the ‘search’ area, scroll down to the "UV dev-branch" section and select ‘Naija’.
Now you can type in your query, e.g. the word ‘pikin’.
If you click on ‘context’ above an example, you will open a window showing the paragraph where the sentence appears.
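For readers who would rather work offline, the same kind of search (by word, lemma, part of speech or function) can be sketched in a few lines of Python over a file in the standard ten-column CoNLL-U format. This is only an illustration, not the code behind the search site: the function names `read_sentences` and `search` are hypothetical, and the two sample sentences and their tags are invented for the example rather than taken from the treebank.

```python
from typing import Dict, Iterator, List

# The ten CoNLL-U columns, in their standard order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def read_sentences(text: str) -> Iterator[List[Dict[str, str]]]:
    """Yield each sentence as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                      # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):    # skip comment/metadata lines
            cols = line.split("\t")
            if len(cols) == len(FIELDS):
                sentence.append(dict(zip(FIELDS, cols)))
    if sentence:                          # last sentence without trailing blank line
        yield sentence


def search(text: str, **criteria: str) -> List[str]:
    """Return the sentences containing a token that matches all criteria,
    e.g. form="pikin", lemma="...", upos="NOUN" or deprel="nsubj"."""
    hits = []
    for sentence in read_sentences(text):
        if any(all(tok[k] == v for k, v in criteria.items()) for tok in sentence):
            hits.append(" ".join(tok["form"] for tok in sentence))
    return hits


# Invented sample (NOT from the corpus); tags are assumptions for illustration.
SAMPLE = (
    "# text = di pikin dey sleep\n"
    "1\tdi\tdi\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tpikin\tpikin\tNOUN\t_\t_\t4\tnsubj\t_\t_\n"
    "3\tdey\tdey\tAUX\t_\t_\t4\taux\t_\t_\n"
    "4\tsleep\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = I see am\n"
    "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsee\tsee\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tam\tam\tPRON\t_\t_\t2\tobj\t_\t_\n"
)

if __name__ == "__main__":
    print(search(SAMPLE, form="pikin"))
```

Calling `search(SAMPLE, form="pikin")` returns the sentences that contain that word form; passing `upos=` or `deprel=` instead filters by part of speech or by function. The sketch treats every ten-column line as a token, so multiword-token range lines would need extra handling in real data.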