2020
Information Extraction from Federal Open Market Committee Statements
Oana Frunza
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
We present a novel approach to unsupervised information extraction that identifies and extracts relevant concept-value pairs from textual data. The system’s building blocks are domain agnostic, making it broadly applicable. In this paper, we describe each component of the system and how it extracts relevant economic information from U.S. Federal Open Market Committee (FOMC) statements. Our methodology achieves 96% accuracy in identifying relevant information for a set of seven economic indicators: household spending, inflation, unemployment, economic activity, fixed investment, federal funds rate, and labor market.
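The abstract does not spell out the extraction mechanism, so the Python sketch below only illustrates what a concept-value pair over FOMC-style text could look like: each of the seven indicator concepts is paired with the few words that immediately follow its mention. The keyword-and-window heuristic, the window size, and the function name concept_value_pairs are assumptions for illustration, not the paper's method.

import re

# The seven economic indicators named in the abstract.
CONCEPTS = ["household spending", "inflation", "unemployment",
            "economic activity", "fixed investment",
            "federal funds rate", "labor market"]

def concept_value_pairs(sentence, window=4):
    """Pair each concept found in the sentence with the few words that
    follow it (a crude stand-in for the extracted 'value')."""
    pairs = []
    lowered = sentence.lower()
    for concept in CONCEPTS:
        m = re.search(re.escape(concept), lowered)
        if m:
            tail = sentence[m.end():].split()[:window]
            pairs.append((concept, " ".join(tail).strip(" ,.")))
    return pairs

print(concept_value_pairs(
    "Household spending has been rising moderately while inflation remains low."))
# [('household spending', 'has been rising moderately'), ('inflation', 'remains low')]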
2010
Extraction of Disease-Treatment Semantic Relations from Biomedical Sentences
Oana Frunza | Diana Inkpen
Proceedings of the 2010 Workshop on Biomedical Natural Language Processing
Building Systematic Reviews Using Automatic Text Classification Techniques
Oana Frunza | Diana Inkpen | Stan Matwin
Coling 2010: Posters
2008
Textual Information for Predicting Functional Properties of the Genes
Oana Frunza | Diana Inkpen
Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing
A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization
Oana Frunza
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Tokenization is one of the initial steps in almost any text-processing task. It is not generally considered challenging for monolingual English systems, but its complexity increases rapidly for systems that must handle different languages. This article proposes a supervised learning approach to the tokenization task. The method presented in this article is based on a character-transition representation that allows compound expressions to be recognized as a single token; compound tokens are identified independently of the characters that form the expression. The method automatically learns tokenization rules from a pre-tokenized corpus. The results obtained with the trainable system show that, for Romanian and English, a statistically significant improvement is obtained over a baseline system that tokenizes text at every non-alphanumeric character.
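The abstract describes the approach only at a high level, so the following Python sketch is an assumed reconstruction of the general idea rather than the paper's implementation: every transition between two adjacent characters is treated as a binary split/no-split decision, training labels are derived from a pre-tokenized corpus, and a classifier then predicts boundaries on new text, which lets a compound such as "New York" survive as a single token. The feature set, the scikit-learn classifier, and the helper names are illustrative choices.

from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def gap_features(raw, i):
    """Features describing the transition between raw[i] and raw[i + 1]."""
    left, right = raw[i], raw[i + 1]
    return {"left": left, "right": right,
            "left_space": left.isspace(), "right_space": right.isspace(),
            "left_alnum": left.isalnum(), "right_alnum": right.isalnum()}

def boundary_labels(raw, tokens):
    """Label each character transition in raw as split (True) or not,
    according to the gold tokenization."""
    cuts, pos = set(), 0
    for tok in tokens:
        start = raw.index(tok, pos)
        end = start + len(tok)
        if start > 0:
            cuts.add(start - 1)
        if end < len(raw):
            cuts.add(end - 1)
        pos = end
    return [i in cuts for i in range(len(raw) - 1)]

# Tiny pre-tokenized corpus; "New York" is a compound kept as one token.
corpus = [("New York is big.", ["New York", "is", "big", "."]),
          ("Paris is far away.", ["Paris", "is", "far", "away", "."])]

vec, feats, labels = DictVectorizer(), [], []
for raw, toks in corpus:
    feats += [gap_features(raw, i) for i in range(len(raw) - 1)]
    labels += boundary_labels(raw, toks)
clf = DecisionTreeClassifier().fit(vec.fit_transform(feats), labels)

def tokenize(raw):
    """Split raw text at predicted boundaries and drop whitespace pieces."""
    cuts = clf.predict(vec.transform(
        [gap_features(raw, i) for i in range(len(raw) - 1)]))
    pieces, start = [], 0
    for i, cut in enumerate(cuts):
        if cut:
            pieces.append(raw[start:i + 1])
            start = i + 1
    pieces.append(raw[start:])
    return [p.strip() for p in pieces if p.strip()]

print(tokenize("New York is big."))  # ['New York', 'is', 'big', '.']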
2007
A tool for detecting French-English cognates and false friends
Oana Frunza | Diana Inkpen
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Cognates are pairs of words in different languages similar in spelling and meaning. They can help a second-language learner on the tasks of vocabulary expansion and reading comprehension. False friends are pairs of words that have similar spelling but different meanings. Partial cognates are pairs of words in two languages that have the same meaning in some, but not all contexts. In this article we present a method to automatically classify a pair of words as cognates or false friends, by using several measures of orthographic similarity as features for classification. We use this method to create complete lists of cognates and false friends between two languages. We also disambiguate partial cognates in context. We applied all our methods to French and English, but they can be applied to other pairs of languages as well. We built a tool that takes the produced lists and annotates a French text with equivalent English cognates or false friends, in order to help second-language learners improve their reading comprehension skills and retention rate.
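The abstract mentions several measures of orthographic similarity used as classification features without listing them, so the Python sketch below shows two measures commonly used for this task that could plausibly serve as such features: similarity derived from normalized edit distance, and the longest-common-subsequence ratio (LCSR). The specific measures and function names are assumptions; the paper's full feature set and classifier are not reproduced here.

def edit_distance(a, b):
    """Levenshtein distance between two words, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def lcs_length(a, b):
    """Length of the longest common subsequence of two words."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ca == cb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def orthographic_features(fr, en):
    """Two similarity scores in [0, 1]; higher means more alike in spelling."""
    longest = max(len(fr), len(en))
    return {"norm_edit_sim": 1.0 - edit_distance(fr, en) / longest,
            "lcsr": lcs_length(fr, en) / longest}

# Example pairs: a true cognate ("nation"/"nation") and a false friend
# ("librairie"/"library": bookshop vs. library). A classifier trained on
# labeled pairs would combine such scores to decide between the two classes.
print(orthographic_features("nation", "nation"))
print(orthographic_features("librairie", "library"))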
2006
Semi-Supervised Learning of Partial Cognates Using Bilingual Bootstrapping
Oana Frunza | Diana Inkpen
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics