LFTK: Handcrafted Features in Computational Linguistics

Bruce W. Lee, Jason Lee


Abstract
Past research has identified a rich set of handcrafted linguistic features that can potentially assist various tasks. However, their sheer number makes it difficult to select and use them effectively. Implementations are also inconsistent across research works, and there is no categorization scheme or generally accepted set of feature names, which creates confusion. Moreover, no actively maintained open-source library extracts a wide variety of handcrafted features. Current extraction practices are therefore inefficient, and a researcher often has to build an extraction system from the ground up. We collect and categorize more than 220 popular handcrafted features grounded in past literature. We then conduct a correlation analysis on several task-specific datasets and report the potential use cases of each feature. Lastly, we devise a multilingual handcrafted linguistic feature extraction system designed to be systematically expandable. We open-source our system to give the community a rich set of pre-implemented handcrafted features.
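To make the idea of a feature-to-task correlation analysis concrete, the following is a minimal sketch in plain Python. It does not use the LFTK library or reproduce the authors' method: the four features, their names, the helper functions `extract_features`, `pearson`, and `rank_features`, and the toy texts and labels are all assumptions chosen for illustration, and Pearson correlation is assumed as the correlation measure.

```python
import math
import re


def extract_features(text):
    """Compute a few simple surface features (illustrative names, not LFTK's)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        "total_words": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_features(texts, labels):
    """Correlate every feature with the labels and sort by absolute correlation."""
    rows = [extract_features(t) for t in texts]
    scores = {
        name: pearson([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data invented for this sketch: texts with made-up difficulty labels.
    toy_texts = [
        "The cat sat. The dog ran.",
        "Children often enjoy reading short illustrated stories.",
        "Computational linguistics investigates statistical representations of language.",
    ]
    toy_labels = [1, 2, 3]
    for name, score in rank_features(toy_texts, toy_labels):
        print(f"{name}: {score:+.3f}")
```

On a real dataset, the ranking would be computed per task, so that a feature strongly correlated with one task's labels can be flagged as a candidate for that task.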
Anthology ID:
2023.bea-1.1
Volume:
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, Torsten Zesch
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
1–19
URL:
https://aclanthology.org/2023.bea-1.1
DOI:
10.18653/v1/2023.bea-1.1
Cite (ACL):
Bruce W. Lee and Jason Lee. 2023. LFTK: Handcrafted Features in Computational Linguistics. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 1–19, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
LFTK: Handcrafted Features in Computational Linguistics (Lee & Lee, BEA 2023)
PDF:
https://aclanthology.org/2023.bea-1.1.pdf