A Neural Model for Language Identification in Code-Switched Tweets
Aaron Jaech | George Mulcaire | Mari Ostendorf | Noah A. Smith
Proceedings of the Second Workshop on Computational Approaches to Code Switching
Hierarchical Character-Word Models for Language Identification
Aaron Jaech | George Mulcaire | Shobhit Hathi | Mari Ostendorf | Noah A. Smith
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media
Many Languages, One Parser
Waleed Ammar | George Mulcaire | Miguel Ballesteros | Chris Dyer | Noah A. Smith
Transactions of the Association for Computational Linguistics, Volume 4
We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
- Noah A. Smith 3
- Aaron Jaech 2
- Mari Ostendorf 2
- Shobhit Hathi 1
- Waleed Ammar 1