2024
NLPeople at L+M-24 Shared Task: An Ensembled Approach for Molecule Captioning from SMILES
Shinnosuke Tanaka, Carol Mak, Flaviu Cipcigan, James Barry, Mohab Elkaref, Movina Moses, Vishnudev Kuruvanthodi, Geeth De Mel
Proceedings of the 1st Workshop on Language + Molecules (L+M 2024)
This paper presents our approach submitted to the Molecular Captioning track of the Language + Molecules 2024 (L+M-24) Shared Task. The task involves generating captions that describe the properties of molecules provided in SMILES format. We propose a method that decomposes the challenge of generating captions from SMILES into a classification problem, where we first predict the molecule's properties. Molecules whose properties can be predicted with high accuracy also obtain high translation metric scores when LLMs generate their captions, while the others obtain low scores. We then use the predicted properties to select among the captions generated by different types of LLMs, and use the selected caption as the final output. Compared with the baseline, our submission increased the overall score, computed from translation metrics and property metrics, by 15.21 on the development set and 12.30 on the evaluation set.
NLPeople at TextGraphs-17 Shared Task: Chain of Thought Questioning to Elicit Decompositional Reasoning
Movina Moses, Vishnudev Kuruvanthodi, Mohab Elkaref, Shinnosuke Tanaka, James Barry, Geeth De Mel, Campbell Watson
Proceedings of TextGraphs-17: Graph-based Methods for Natural Language Processing
This paper presents the approach of the NLPeople team to the Text-Graph Representations for KGQA Shared Task at TextGraphs-17. The task involved selecting an answer to a given question from a list of candidate entities. We show that prompting Large Language Models (LLMs) to break down a natural language question into a series of sub-questions allows the models to understand complex questions. The LLMs arrive at the final answer by answering the intermediate questions using their internal knowledge, without needing additional context. Our approach uses an ensemble of prompting strategies to guide how LLMs interpret different types of questions. Our submission achieves an F1 score of 85.90, ranking 1st among all participants in the task.
2023
NLPeople at NADI 2023 Shared Task: Arabic Dialect Identification with Augmented Context and Multi-Stage Tuning
Mohab Elkaref, Movina Moses, Shinnosuke Tanaka, James Barry, Geeth De Mel
Proceedings of ArabicNLP 2023
This paper presents the approach of the NLPeople team to the Nuanced Arabic Dialect Identification (NADI) 2023 shared task. Subtask 1 involves identifying the dialect of a source text at the country level. Our approach to Subtask 1 combines language-specific language models; a clustering and retrieval method that provides additional context for a target sentence; a fine-tuning strategy that makes use of the data provided for the 2020 and 2021 shared tasks; and, finally, ensembling over the predictions of multiple models. Our submission achieves a macro-averaged F1 score of 87.27, ranking 1st among all participants in the task.
El-Kawaref at WojoodNER shared task: StagedNER for Arabic Named Entity Recognition
Nehal Elkaref, Mohab Elkaref
Proceedings of ArabicNLP 2023
Named Entity Recognition (NER) is the task of identifying word-units that correspond to mentions of entities such as locations, organizations, persons, or currencies. In this shared task we tackle flat-entity classification for Arabic, where a single entity must be identified for each word-unit. To solve this classification problem we propose StagedNER, a novel technique for fine-tuning on downstream NER tasks that divides the learning process of a transformer model into two phases: for an input sequence, the model first learns sequence tags and then entity tags, rather than learning both simultaneously. Using this method, we create an ensemble of two base models that yields a score of on the development set, an F1 of 90.03% on the validation set, and an F1 of 91.95% on the test set.
NLPeople at SemEval-2023 Task 2: A Staged Approach for Multilingual Named Entity Recognition
Mohab Elkaref, Nathan Herr, Shinnosuke Tanaka, Geeth De Mel
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The MultiCoNER II shared task aims at detecting complex, ambiguous named entities with fine-grained types in a low-context setting. Previous winning systems incorporated external knowledge bases (KBs) to retrieve helpful contexts. In our submission we additionally propose splitting the NER task into two stages: a Span Extraction step and an Entity Classification step. Our results show that the former is considerably less affected by the low-context setting, which leads to higher overall performance for a system assisted by an external KB. We achieve 3rd place on the multilingual track and an average of 6th place overall.
2021
A Joint Training Approach to Tweet Classification and Adverse Effect Extraction and Normalization for SMM4H 2021
Mohab Elkaref, Lamiece Hassan
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task
In this work we describe our submissions to the Social Media Mining for Health (SMM4H) 2021 Shared Task. We investigated the effectiveness of a joint training approach to Task 1, namely the classification, extraction, and normalization of Adverse Drug Effect (ADE) mentions in English tweets. Our approach performed well on the normalization task, achieving an above-average F1 score of 24%, but less well on classification and extraction, with F1 scores of 22% and 37%, respectively. Our experiments also showed that a larger dataset with more negative examples led to stronger results than a smaller, more balanced dataset, even when both datasets contained the same positive examples. Finally, we also submitted a tuned BERT model for Task 6 (classification of Covid-19 tweets containing symptoms), which achieved an above-average F1 score of 96%.
2019
Recursive LSTM Tree Representation for Arc-Standard Transition-Based Dependency Parsing
Mohab Elkaref, Bernd Bohnet
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)
2015
Domain Adaptation for Dependency Parsing via Self-Training
Juntao Yu, Mohab Elkaref, Bernd Bohnet
Proceedings of the 14th International Conference on Parsing Technologies
2014
Exploring Options for Fast Domain Adaptation of Dependency Parsers
Viktor Pekar, Juntao Yu, Mohab El-karef, Bernd Bohnet
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages