Mihai Lupu


2017

Character-based Neural Embeddings for Tweet Clustering
Svitlana Vakulenko | Lyndon Nixon | Mihai Lupu
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media

In this paper, we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations caused by vocabulary explosion in word-based models and allows for seamless processing of multilingual content. Our evaluation results and code are available online: https://github.com/vendi12/tweet2vec_clustering.

Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models
Navid Rekabsaz | Mihai Lupu | Artem Baklanov | Alexander Dür | Linda Andersson | Allan Hanbury
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Volatility prediction—an essential concept in financial markets—has recently been addressed using sentiment analysis methods. We investigate the sentiment of annual disclosures of companies in stock markets to forecast volatility. We specifically explore the use of recent Information Retrieval (IR) term weighting models that are effectively extended by related terms using word embeddings. In parallel to textual information, factual market data have been widely used as the mainstream approach to forecast market risk. We therefore study different fusion methods to combine text and market data resources. Our word embedding-based approach significantly outperforms state-of-the-art methods. In addition, we investigate the characteristics of the reports of the companies in different financial sectors.

2016

Standard Test Collection for English-Persian Cross-Lingual Word Sense Disambiguation
Navid Rekabsaz | Serwah Sabetghadam | Mihai Lupu | Linda Andersson | Allan Hanbury
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we address the shortage of evaluation benchmarks for the Persian (Farsi) language by creating and making available a new benchmark for English-to-Persian Cross-Lingual Word Sense Disambiguation (CL-WSD). In creating the benchmark, we follow the format of the SemEval-2013 CL-WSD task, so that the tools introduced for that task can also be applied to the benchmark. In effect, the new benchmark extends the SemEval-2013 CL-WSD task to the Persian language.

2012

Applying Random Indexing to Structured Data to Find Contextually Similar Words
Danica Damljanović | Udo Kruschwitz | M-Dyaa Albakour | Johann Petrak | Mihai Lupu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Language resources extracted from structured data (e.g. Linked Open Data) have already been used in various scenarios to improve conventional Natural Language Processing techniques. The meanings of words and the relations between them are made more explicit in RDF graphs than in human-readable text, and hence have great potential to improve legacy applications. In this paper, we describe an approach that can be used to extend or clarify the semantic meaning of a word by constructing a list of contextually related terms. Our approach exploits the structure inherent in an RDF graph and then applies methods from statistical semantics, in particular Random Indexing, to discover contextually related terms. We evaluate our approach in the life science domain using a dataset generated with the help of domain experts from a large pharmaceutical company (AstraZeneca). They were involved in two phases: first, to generate a set of keywords of interest to them, and second, to judge the set of generated contextually similar words for each keyword of interest. We compare our proposed approach, which exploits the semantic graph, with the same method applied to the human-readable text extracted from the graph.