Abstract
We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.- Anthology ID:
- 2022.coling-1.387
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- SIG:
- Publisher:
- International Committee on Computational Linguistics
- Note:
- Pages:
- 4397–4406
- Language:
- URL:
- https://aclanthology.org/2022.coling-1.387
- DOI:
- Bibkey:
- Cite (ACL):
- Ludovic Mompelat, Daniel Dakota, and Sandra Kübler. 2022. How to Parse a Creole: When Martinican Creole Meets French. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4397–4406, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- How to Parse a Creole: When Martinican Creole Meets French (Mompelat et al., COLING 2022)
- Copy Citation:
- PDF:
- https://aclanthology.org/2022.coling-1.387.pdf
Export citation
@inproceedings{mompelat-etal-2022-parse, title = "How to Parse a Creole: When Martinican Creole Meets {F}rench", author = {Mompelat, Ludovic and Dakota, Daniel and K{\"u}bler, Sandra}, editor = "Calzolari, Nicoletta and Huang, Chu-Ren and Kim, Hansaem and Pustejovsky, James and Wanner, Leo and Choi, Key-Sun and Ryu, Pum-Mo and Chen, Hsin-Hsi and Donatelli, Lucia and Ji, Heng and Kurohashi, Sadao and Paggio, Patrizia and Xue, Nianwen and Kim, Seokhwan and Hahm, Younggyun and He, Zhong and Lee, Tony Kyungil and Santus, Enrico and Bond, Francis and Na, Seung-Hoon", booktitle = "Proceedings of the 29th International Conference on Computational Linguistics", month = oct, year = "2022", address = "Gyeongju, Republic of Korea", publisher = "International Committee on Computational Linguistics", url = "https://aclanthology.org/2022.coling-1.387", pages = "4397--4406", abstract = "We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="mompelat-etal-2022-parse"> <titleInfo> <title>How to Parse a Creole: When Martinican Creole Meets French</title> </titleInfo> <name type="personal"> <namePart type="given">Ludovic</namePart> <namePart type="family">Mompelat</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Daniel</namePart> <namePart type="family">Dakota</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sandra</namePart> <namePart type="family">Kübler</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2022-10</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 29th International Conference on Computational Linguistics</title> </titleInfo> <name type="personal"> <namePart type="given">Nicoletta</namePart> <namePart type="family">Calzolari</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Chu-Ren</namePart> <namePart type="family">Huang</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hansaem</namePart> <namePart type="family">Kim</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">James</namePart> <namePart type="family">Pustejovsky</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Leo</namePart> <namePart type="family">Wanner</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Key-Sun</namePart> <namePart type="family">Choi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Pum-Mo</namePart> <namePart type="family">Ryu</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Hsin-Hsi</namePart> <namePart type="family">Chen</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Lucia</namePart> <namePart type="family">Donatelli</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Heng</namePart> <namePart type="family">Ji</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Sadao</namePart> <namePart type="family">Kurohashi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Patrizia</namePart> <namePart type="family">Paggio</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Nianwen</namePart> <namePart type="family">Xue</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Seokhwan</namePart> <namePart type="family">Kim</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Younggyun</namePart> <namePart type="family">Hahm</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Zhong</namePart> <namePart type="family">He</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tony</namePart> <namePart type="given">Kyungil</namePart> <namePart type="family">Lee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Enrico</namePart> <namePart type="family">Santus</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Francis</namePart> <namePart type="family">Bond</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Seung-Hoon</namePart> <namePart type="family">Na</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>International Committee on Computational Linguistics</publisher> <place> <placeTerm type="text">Gyeongju, Republic of Korea</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer.</abstract> <identifier type="citekey">mompelat-etal-2022-parse</identifier> <location> <url>https://aclanthology.org/2022.coling-1.387</url> </location> <part> <date>2022-10</date> <extent unit="page"> <start>4397</start> <end>4406</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T How to Parse a Creole: When Martinican Creole Meets French %A Mompelat, Ludovic %A Dakota, Daniel %A Kübler, Sandra %Y Calzolari, Nicoletta %Y Huang, Chu-Ren %Y Kim, Hansaem %Y Pustejovsky, James %Y Wanner, Leo %Y Choi, Key-Sun %Y Ryu, Pum-Mo %Y Chen, Hsin-Hsi %Y Donatelli, Lucia %Y Ji, Heng %Y Kurohashi, Sadao %Y Paggio, Patrizia %Y Xue, Nianwen %Y Kim, Seokhwan %Y Hahm, Younggyun %Y He, Zhong %Y Lee, Tony Kyungil %Y Santus, Enrico %Y Bond, Francis %Y Na, Seung-Hoon %S Proceedings of the 29th International Conference on Computational Linguistics %D 2022 %8 October %I International Committee on Computational Linguistics %C Gyeongju, Republic of Korea %F mompelat-etal-2022-parse %X We investigate methods to develop a parser for Martinican Creole, a highly under-resourced language, using a French treebank. We compare transfer learning and multi-task learning models and examine different input features and strategies to handle the massive size imbalance between the treebanks. Surprisingly, we find that a simple concatenated (French + Martinican Creole) baseline yields optimal results even though it has access to only 80 Martinican Creole sentences. POS embeddings work better than lexical ones, but they suffer from negative transfer. %U https://aclanthology.org/2022.coling-1.387 %P 4397-4406
Markdown (Informal)
[How to Parse a Creole: When Martinican Creole Meets French](https://aclanthology.org/2022.coling-1.387) (Mompelat et al., COLING 2022)
- How to Parse a Creole: When Martinican Creole Meets French (Mompelat et al., COLING 2022)
ACL
- Ludovic Mompelat, Daniel Dakota, and Sandra Kübler. 2022. How to Parse a Creole: When Martinican Creole Meets French. In Proceedings of the 29th International Conference on Computational Linguistics, pages 4397–4406, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.