Octavia-Maria Şulea

Also published as: Octavia-Maria Sulea, Maria-Octavia Sulea, Maria Sulea

2017

Recognizing Textual Entailment in Twitter Using Word Embeddings
Octavia-Maria Şulea
Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

In this paper, we investigate the application of machine learning techniques and word embeddings to the task of Recognizing Textual Entailment (RTE) in Social Media. We look at a manually labeled dataset consisting of user-generated short texts posted on Twitter (tweets) and related to four recent media events (the Charlie Hebdo shooting, the Ottawa shooting, the Sydney Siege, and the Germanwings crash) and test to what extent neural techniques and embeddings are able to distinguish between tweets that entail or contradict each other or that claim unrelated things. We obtain results comparable to the state of the art in a train-test setting, but we show that, due to the noisy nature of the data, results plummet under an evaluation strategy crafted to better simulate a real-life train-test scenario.

Predicting the Law Area and Decisions of French Supreme Court Cases
Octavia-Maria Şulea | Marcos Zampieri | Mihaela Vela | Josef van Genabith
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made on the textual form of the case description and the extent to which it is necessary to mask the judge’s motivation for a ruling to emulate a real-world test scenario. Using a linear Support Vector Machine (SVM) classifier trained on lexical features, we report an F1 score of 96% in predicting a case ruling, 90% in predicting the law area of a case, and 75.9% in estimating the time span in which a ruling was issued.

2016

Using Word Embeddings to Translate Named Entities
Octavia-Maria Şulea | Sergiu Nisioi | Liviu P. Dinu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we investigate the usefulness of neural word embeddings in translating Named Entities (NEs) from a resource-rich language to a language low on resources relevant to the task at hand, introducing a novel yet simple way of obtaining bilingual word vectors. We are inspired by observations in (Mikolov et al., 2013b), which show that training their word vector model on comparable corpora yields comparable vector space representations of those corpora, reducing the problem of translating words to finding a rotation matrix, and by results in (Zou et al., 2013), which showed that bilingual word embeddings can improve Chinese Named Entity Recognition (NER) and English to Chinese phrase translation. We use the sentence-aligned English-French EuroParl corpora and show that word embeddings extracted from a merged corpus (a corpus resulting from the merger of the two aligned corpora) can be used for NE translation. We extrapolate that word embeddings trained on merged parallel corpora are useful in Named Entity Recognition and Translation tasks for resource-poor languages.

2013

Sequence Tagging for Verb Conjugation in Romanian
Liviu Dinu | Octavia-Maria Şulea | Vlad Niculae
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Temporal classification for historical Romanian texts
Alina Maria Ciobanu | Anca Dinu | Liviu Dinu | Vlad Niculae | Octavia-Maria Şulea
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

Temporal Text Classification for Romanian Novels set in the Past
Alina Maria Ciobanu | Liviu P. Dinu | Octavia-Maria Şulea | Anca Dinu | Vlad Niculae
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

2012

Dealing with the Grey Sheep of the Romanian Gender System, the Neuter
Liviu P. Dinu | Vlad Niculae | Maria Sulea
Proceedings of COLING 2012: Demonstration Papers

The Romanian Neuter Examined Through A Two-Gender N-Gram Classification System
Liviu P. Dinu | Vlad Niculae | Octavia-Maria Şulea
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Romanian has traditionally been seen as bearing three lexical genders: masculine, feminine and neuter, although it has always been known to have only two agreement patterns (for masculine and feminine). A recent analysis of the Romanian gender system described in (Bateman and Polinsky, 2010), based on older observations, argues that there are two lexically unspecified noun classes in the singular and two different ones in the plural, and that what is generally called neuter in Romanian shares the class in the singular with masculines and the class in the plural with feminines, based not only on agreement features but also on form. Previous machine learning classifiers that have attempted to discriminate Romanian nouns according to gender have so far taken as input only the singular form, presupposing the traditional tripartite analysis. We propose a classifier based on two parallel support vector machines using n-gram features from the singular and from the plural, which outperforms previous classifiers in its ability to distinguish the neuter. The performance of our system suggests that the two-gender analysis of Romanian, on which it is based, is on the right track.

Pastiche Detection Based on Stopword Rankings. Exposing Impersonators of a Romanian Writer
Liviu P. Dinu | Vlad Niculae | Maria-Octavia Sulea
Proceedings of the Workshop on Computational Approaches to Deception Detection

Learning How to Conjugate the Romanian Verb. Rules for Regular and Partially Irregular Verbs
Liviu P. Dinu | Vlad Niculae | Octavia-Maria Sulea
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics