The RapCor corpus is one of the smaller, topic-specific corpora of French. It is being developed at the Institute of Romance Languages and Literatures of the Faculty of Arts of Masaryk University in Brno under the leadership of doc. PhDr. Alena Policka, Ph.D.

It is a corpus of spoken French as used in rap songs, created for socio-lexical research. The specific character of rap lyrics offers broader insight into substandard French, in particular into the dynamics of generational and ethno-socio-geographical word formation and of neology in relation to lexicography. The corpus can also serve those interested in modern poetics or sociolinguistics (especially in relation to multiethnic suburbs).

 

Introduction: what is a corpus?

The word corpus refers to a collection of texts under study. With the growth of computing power, however, the term is increasingly used to mean an electronic corpus, i.e. a collection of texts (or transcripts of audio recordings) stored and processed by computer and used for linguistic research. Because results can be retrieved and evaluated so easily, it is possible to obtain much more reliable information and statistics than in the earlier era of card catalogues.

Electronic language corpora began to emerge with the development of computer technology in the last decades of the 20th century. Today, there are a number of small and large corpora for most of the world's languages; the largest of them aim to describe an entire national language and contain several hundred million word forms. For Czech, for example, the Institute of the Czech National Corpus at the Faculty of Arts of Charles University in Prague is actively building the Czech National Corpus (ČNK), made up of several subcorpora of written and spoken texts. For French, the largest corpus is Frantext, a corpus of mainly literary texts conceived at the University of Nancy. There are also a number of smaller corpora, of which we mention only spoken French corpora such as Eslo and Clapi, among others.

About the RapCor corpus

RapCor has been under development since 2009, within the framework of a postdoctoral project of the Grant Agency of the Czech Republic: Expressivity in Youth Slang on the Background of the Search for Self and Group Identity (GP405/09/P307). The source material is collected and given its primary editing in cooperation with students of French, who obtain the lyrics of selected French rap songs either from fan transcriptions available on the Internet or (currently the preferred option) directly from the original lyrics printed on CD covers, where these are included.