Thomas Gaillat


A new learner language data set for the study of English for Specific Purposes at university
Cyriel Mallart | Nicolas Ballier | Jen-Yu Li | Andrew Simpkin | Bernardo Stearns | Rémi Venant | Thomas Gaillat
Proceedings of the 4th Conference on Language, Data and Knowledge

Exploring a New Grammatico-functional Type of Measure as Part of a Language Learning Expert System
Cyriel Mallart | Andrew Simpkin | Rémi Venant | Nicolas Ballier | Bernardo Stearns | Jen Yu Li | Thomas Gaillat
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

This paper explores the use of L2-specific grammatical microsystems as elements of the domain knowledge of an Intelligent Computer-assisted Language Learning (ICALL) system. We report on the design of new grammatico-functional measures and their association with proficiency. We illustrate the approach with the design of the IT, THIS, THAT proform microsystem. The measures rely on the paradigmatic relations between words of the same linguistic functions. They are operationalised with one frequency-based and two probabilistic methods, i.e., the relative proportions of the forms and their likelihood of occurrence. Ordinal regression models show that the measures are significant in terms of association with CEFR levels, paving the way for their introduction in a specific proform microsystem expert model.


Language learning analytics: designing and testing new functional complexity measures in L2 writings
Thomas Gaillat
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning


Towards a Data Analytics Pipeline for the Visualisation of Complexity Metrics in L2 writings
Thomas Gaillat | Anas Knefati | Antoine Lafontaine
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

We present the design of a tool for the visualisation of linguistic complexity in second language (L2) learner writings. We show how metrics can be exploited to visualise complexity in L2 writings in relation to CEFR levels.


Un prototype en ligne pour la prédiction du niveau de compétence en anglais des productions écrites (A prototype for web-based prediction of English proficiency levels in writings)
Thomas Gaillat | Nicolas Ballier | Annanda Sousa | Manon Bouyé | Andrew Simpkin | Bernardo Stearns | Manel Zarrouk
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

This paper describes a prototype focused on predicting the proficiency level of learners of English. The system relies on a supervised learning model coupled with a web interface.

From Linguistic Research Projects to Language Technology Platforms: A Case Study in Learner Data
Annanda Sousa | Nicolas Ballier | Thomas Gaillat | Bernardo Stearns | Manel Zarrouk | Andrew Simpkin | Manon Bouyé
Proceedings of the 1st International Workshop on Language Technology Platforms

This paper describes the workflow and architecture adopted by a linguistic research project. We report our experience and present the research outputs turned into resources that we wish to share with the community. We discuss the current limitations and the next steps that could be taken for the scaling and development of our research project. Allying NLP and language-centric AI, we discuss similar projects and possible ways to start collaborating towards potential platform interoperability.

Automatic detection of unexpected/erroneous collocations in learner corpus
Jen-Yu Li | Thomas Gaillat
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons

This research investigates the collocational errors made by English learners in a learner corpus. It focuses on the extraction of unexpected collocations. A system was proposed and implemented with an open-source toolkit. First, the collocation extraction module was evaluated against a corpus with manually annotated collocations. Second, a standard collocation list was collected from a corpus of native speakers. Third, a list of unexpected collocations was generated by extracting candidates from a learner corpus and discarding those that appear on the standard list. The overall performance was evaluated, and possible sources of error were identified for future improvement.


FinSentiA: Sentiment Analysis in English Financial Microblogs
Thomas Gaillat | Annanda Sousa | Manel Zarrouk | Brian Davis
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

The objective of this paper is to report on the building of a Sentiment Analysis (SA) system dedicated to financial microblogs in English. The purpose of our work is to build a financial classifier that predicts the sentiment of stock investors on microblog platforms such as StockTwits and Twitter. Our contribution shows that it is possible to conduct such tasks in order to provide fine-grained SA of financial microblogs. We extracted financial entities with relevant contexts and assigned scores on a continuous scale by adopting a deep learning method for the classification.

Implicit and Explicit Aspect Extraction in Financial Microblogs
Thomas Gaillat | Bernardo Stearns | Gopal Sridhar | Ross McDermott | Manel Zarrouk | Brian Davis
Proceedings of the First Workshop on Economics and Natural Language Processing

This paper focuses on aspect extraction, which is a sub-task of Aspect-based Sentiment Analysis. The goal is to report on a method for extracting financial aspects from microblog messages. Our approach uses a stock-investment taxonomy for the identification of explicit and implicit aspects. We compare supervised and unsupervised methods to assign predefined categories at message level. Results on 7 aspect classes show 0.71 accuracy, while the 32-class classification gives 0.82 accuracy for messages containing explicit aspects and 0.35 for implicit aspects.

The SSIX Corpora: Three Gold Standard Corpora for Sentiment Analysis in English, Spanish and German Financial Microblogs
Thomas Gaillat | Manel Zarrouk | André Freitas | Brian Davis
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


Automatic tagging of a learner corpus of English with a modified version of the Penn Treebank tagset (Annotation automatique d’un corpus d’apprenants d’anglais avec un jeu d’étiquettes modifié du Penn Treebank) [in French]
Thomas Gaillat
Proceedings of TALN 2013 (Volume 1: Long Papers)