FrSemCor: Annotating a French Corpus with Supersenses
Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoit Crabbé, Vincent Segonne
Abstract
French, as many languages, lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences which were annotated in double blind and adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons Licence.- Anthology ID:
- 2020.lrec-1.724
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 5912–5918
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.724
- DOI:
- Bibkey:
- Cite (ACL):
- Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoit Crabbé, and Vincent Segonne. 2020. FrSemCor: Annotating a French Corpus with Supersenses. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5912–5918, Marseille, France. European Language Resources Association.
- Cite (Informal):
- FrSemCor: Annotating a French Corpus with Supersenses (Barque et al., LREC 2020)
- Copy Citation:
- PDF:
- https://aclanthology.org/2020.lrec-1.724.pdf
Export citation
@inproceedings{barque-etal-2020-frsemcor, title = "{F}r{S}em{C}or: Annotating a {F}rench Corpus with Supersenses", author = "Barque, Lucie and Haas, Pauline and Huyghe, Richard and Tribout, Delphine and Candito, Marie and Crabb{\'e}, Benoit and Segonne, Vincent", editor = "Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios", booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference", month = may, year = "2020", address = "Marseille, France", publisher = "European Language Resources Association", url = "https://aclanthology.org/2020.lrec-1.724", pages = "5912--5918", abstract = "French, as many languages, lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences which were annotated in double blind and adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons Licence.", language = "English", ISBN = "979-10-95546-34-4", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="barque-etal-2020-frsemcor"> <titleInfo> <title>FrSemCor: Annotating a French Corpus with Supersenses</title> </titleInfo> <name type="personal"> <namePart type="given">Lucie</namePart> <namePart type="family">Barque</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Pauline</namePart> <namePart type="family">Haas</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Richard</namePart> <namePart type="family">Huyghe</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Delphine</namePart> <namePart type="family">Tribout</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marie</namePart> <namePart type="family">Candito</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Benoit</namePart> <namePart type="family">Crabbé</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Vincent</namePart> <namePart type="family">Segonne</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2020-05</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <language> <languageTerm type="text">English</languageTerm> <languageTerm type="code" authority="iso639-2b">eng</languageTerm> </language> <relatedItem type="host"> <titleInfo> <title>Proceedings of the Twelfth Language Resources and Evaluation Conference</title> </titleInfo> <name type="personal"> <namePart type="given">Nicoletta</namePart> <namePart type="family">Calzolari</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Frédéric</namePart> <namePart type="family">Béchet</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philippe</namePart> <namePart type="family">Blache</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Khalid</namePart> <namePart type="family">Choukri</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christopher</namePart> <namePart type="family">Cieri</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Thierry</namePart> <namePart type="family">Declerck</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sara</namePart> <namePart type="family">Goggi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hitoshi</namePart> <namePart type="family">Isahara</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bente</namePart> <namePart type="family">Maegaard</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Joseph</namePart> <namePart type="family">Mariani</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hélène</namePart> <namePart type="family">Mazo</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Asuncion</namePart> <namePart type="family">Moreno</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Jan</namePart> <namePart type="family">Odijk</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Stelios</namePart> <namePart type="family">Piperidis</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>European Language Resources Association</publisher> <place> <placeTerm type="text">Marseille, France</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> <identifier type="isbn">979-10-95546-34-4</identifier> </relatedItem> <abstract>French, as many languages, lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences which were annotated in double blind and adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons Licence.</abstract> <identifier type="citekey">barque-etal-2020-frsemcor</identifier> <location> <url>https://aclanthology.org/2020.lrec-1.724</url> </location> <part> <date>2020-05</date> <extent unit="page"> <start>5912</start> <end>5918</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T FrSemCor: Annotating a French Corpus with Supersenses %A Barque, Lucie %A Haas, Pauline %A Huyghe, Richard %A Tribout, Delphine %A Candito, Marie %A Crabbé, Benoit %A Segonne, Vincent %Y Calzolari, Nicoletta %Y Béchet, Frédéric %Y Blache, Philippe %Y Choukri, Khalid %Y Cieri, Christopher %Y Declerck, Thierry %Y Goggi, Sara %Y Isahara, Hitoshi %Y Maegaard, Bente %Y Mariani, Joseph %Y Mazo, Hélène %Y Moreno, Asuncion %Y Odijk, Jan %Y Piperidis, Stelios %S Proceedings of the Twelfth Language Resources and Evaluation Conference %D 2020 %8 May %I European Language Resources Association %C Marseille, France %@ 979-10-95546-34-4 %G English %F barque-etal-2020-frsemcor %X French, as many languages, lacks semantically annotated corpus data. Our aim is to provide the linguistic and NLP research communities with a gold standard sense-annotated corpus of French, using WordNet Unique Beginners as semantic tags, thus allowing for interoperability. In this paper, we report on the first phase of the project, which focused on the annotation of common nouns. The resulting dataset consists of more than 12,000 French noun occurrences which were annotated in double blind and adjudicated according to a carefully redefined set of supersenses. The resource is released online under a Creative Commons Licence. %U https://aclanthology.org/2020.lrec-1.724 %P 5912-5918
Markdown (Informal)
[FrSemCor: Annotating a French Corpus with Supersenses](https://aclanthology.org/2020.lrec-1.724) (Barque et al., LREC 2020)
- FrSemCor: Annotating a French Corpus with Supersenses (Barque et al., LREC 2020)
ACL
- Lucie Barque, Pauline Haas, Richard Huyghe, Delphine Tribout, Marie Candito, Benoit Crabbé, and Vincent Segonne. 2020. FrSemCor: Annotating a French Corpus with Supersenses. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5912–5918, Marseille, France. European Language Resources Association.