Zornitsa Kozareva


2021

SoDA: On-device Conversational Slot Extraction
Sujith Ravi | Zornitsa Kozareva
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

We propose a novel on-device neural sequence labeling model that uses embedding-free projections and character information to construct compact word representations, and learns a sequence model using a combination of a bidirectional LSTM with self-attention and a CRF. Unlike typical dialog models that rely on huge, complex neural network architectures and large-scale pre-trained Transformers to achieve state-of-the-art results, our method achieves comparable results to BERT and even outperforms its smaller variant DistilBERT on conversational slot extraction tasks. Our method is faster than BERT models while achieving a significant model size reduction: our model requires 135x and 81x fewer model parameters than BERT and DistilBERT, respectively. We conduct experiments on multiple conversational datasets and show significant improvements over existing methods, including recent on-device models. Experimental results and ablation studies also show that our neural models preserve the tiny memory footprint necessary to operate on smart devices, while still maintaining high performance.

Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)
Zornitsa Kozareva | Sujith Ravi | Andreas Vlachos | Priyanka Agrawal | André Martins
Proceedings of the 5th Workshop on Structured Prediction for NLP (SPNLP 2021)

ProFormer: Towards On-Device LSH Projection Based Transformers
Chinnadhurai Sankar | Sujith Ravi | Zornitsa Kozareva
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

At the heart of text-based neural models lie word representations, which are powerful but occupy a lot of memory, making it challenging to deploy them to memory-constrained devices such as mobile phones, watches and IoT. To surmount these challenges, we introduce ProFormer, a projection-based transformer architecture that is faster and lighter, making it suitable for deployment to memory-constrained devices while preserving user privacy. We use an LSH projection layer to dynamically generate word representations on-the-fly without embedding lookup tables, leading to a significant memory footprint reduction from O(V·d) to O(T), where V is the vocabulary size, d is the embedding dimension size and T is the dimension of the LSH projection representation. We also propose a local projection attention (LPA) layer, which uses self-attention to transform the input sequence of N LSH word projections into a sequence of N/K representations, reducing the computations quadratically by O(K^2). We evaluate ProFormer on multiple text classification tasks and observe improvements over prior state-of-the-art on-device approaches for short text classification and comparable performance for long text classification tasks. ProFormer is also competitive with other popular but highly resource-intensive approaches like BERT and even outperforms small-sized BERT variants with significant resource savings: it reduces the embedding memory footprint from 92.16 MB to 1.7 KB and requires 16x less computation overhead, making it the fastest and smallest on-device model.
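The two complexity claims in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: the abstract does not state V, d, T, N, K or the number of bytes per value, so every constant here is an assumption. V = 30,000, d = 768, T = 420 and 4-byte floats were picked because they happen to reproduce the quoted 92.16 MB and roughly 1.7 KB figures.

```python
def embedding_table_bytes(vocab_size: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory for a dense V x d lookup table, i.e. the O(V*d) term."""
    return vocab_size * dim * bytes_per_value


def projection_bytes(proj_dim: int, bytes_per_value: int = 4) -> int:
    """Memory for one T-dimensional projection computed on the fly, i.e. the O(T) term."""
    return proj_dim * bytes_per_value


def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs scored by full self-attention over seq_len items."""
    return seq_len * seq_len


# Assumed values, not taken from the abstract.
V, D, T = 30_000, 768, 420
N, K = 128, 4

table = embedding_table_bytes(V, D)
proj = projection_bytes(T)
print(f"embedding table: {table / 1e6:.2f} MB")  # 92.16 MB
print(f"projection:      {proj / 1e3:.2f} KB")   # 1.68 KB

# Shrinking the sequence from N to N/K items cuts attention pairs by K^2.
full = attention_pairs(N)
reduced = attention_pairs(N // K)
print(f"attention pairs: {full} -> {reduced} ({full // reduced}x fewer)")  # 16384 -> 1024 (16x fewer)
```

The K^2 factor follows from attention cost growing with the square of the sequence length: (N/K)^2 = N^2 / K^2.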

On-Device Text Representations Robust To Misspellings via Projections
Chinnadhurai Sankar | Sujith Ravi | Zornitsa Kozareva
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Recently, there has been strong interest in developing natural language applications that live on personal devices such as mobile phones, watches and IoT, with the objective of preserving user privacy and keeping memory usage low. Advances in Locality-Sensitive Hashing (LSH)-based projection networks have demonstrated state-of-the-art performance in various classification tasks without explicit word (or word-piece) embedding lookup tables by computing on-the-fly text representations. In this paper, we show that projection-based neural classifiers are inherently robust to misspellings and perturbations of the input text. We empirically demonstrate that LSH projection-based classifiers are more robust to common misspellings than BiLSTMs (with both word-piece and word-only tokenization) and fine-tuned BERT-based methods. When subjected to misspelling attacks, LSH projection-based classifiers had a small average accuracy drop of 2.94% across multiple classification tasks, while the fine-tuned BERT model's accuracy dropped significantly, by 11.44%.
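A minimal sketch of why a hash-based projection can tolerate misspellings, assuming a SimHash-style construction over character trigrams. This is one common LSH scheme, not necessarily the projection function used in the paper, and the names `char_ngrams`, `project` and `agreement` are hypothetical. Because a misspelled token keeps most of its character n-grams, most of the per-bit votes stay the same, so the two bit vectors tend to agree on more positions than those of unrelated tokens.

```python
import hashlib


def char_ngrams(token: str, n: int = 3) -> list[str]:
    """Character n-grams of a token, with boundary markers added."""
    padded = f"^{token}$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def project(token: str, num_bits: int = 64) -> list[int]:
    """SimHash-style projection: each n-gram casts a +1/-1 vote per bit.

    No lookup table is stored; the vector is recomputed from the raw
    characters each time. num_bits must be a multiple of 8 and at most 512,
    because blake2b digests are at most 64 bytes long.
    """
    totals = [0] * num_bits
    for gram in char_ngrams(token):
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=num_bits // 8).digest()
        bits = int.from_bytes(digest, "big")
        for j in range(num_bits):
            totals[j] += 1 if (bits >> j) & 1 else -1
    # Ties (a total of 0) are mapped to -1.
    return [1 if t > 0 else -1 for t in totals]


def agreement(a: list[int], b: list[int]) -> float:
    """Fraction of positions where two projections carry the same sign."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


correct = project("restaurant")
misspelled = project("restaurnat")   # transposition typo
unrelated = project("keyboard")

print("correct vs. misspelled:", agreement(correct, misspelled))
print("correct vs. unrelated: ", agreement(correct, unrelated))
```

The exact agreement values depend on the hash function and the number of bits, so they are not listed here.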

2020

Proceedings of the Fourth Workshop on Structured Prediction for NLP
Priyanka Agrawal | Zornitsa Kozareva | Julia Kreutzer | Gerasimos Lampouras | André Martins | Sujith Ravi | Andreas Vlachos
Proceedings of the Fourth Workshop on Structured Prediction for NLP

2019

Proceedings of the Third Workshop on Structured Prediction for NLP
André Martins | Andreas Vlachos | Zornitsa Kozareva | Sujith Ravi | Gerasimos Lampouras | Vlad Niculae | Julia Kreutzer
Proceedings of the Third Workshop on Structured Prediction for NLP

Transferable Neural Projection Representations
Chinnadhurai Sankar | Sujith Ravi | Zornitsa Kozareva
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural word representations are at the core of many state-of-the-art natural language processing models. A widely used approach is to pre-train, store and look up word or character embedding matrices. While useful, such representations occupy a large amount of memory, making them hard to deploy on-device, and often do not generalize to unknown words due to vocabulary pruning. In this paper, we propose a skip-gram based architecture coupled with Locality-Sensitive Hashing (LSH) projections to learn efficient, dynamically computable representations. Our model does not need to store lookup tables, as representations are computed on-the-fly and require a low memory footprint. The representations can be trained in an unsupervised fashion and can be easily transferred to other NLP tasks. For qualitative evaluation, we analyze the nearest neighbors of the word representations and discover semantically similar words even with misspellings. For quantitative evaluation, we plug our transferable projections into a simple LSTM, run it on multiple NLP tasks, and show that our transferable projections achieve better performance than prior work.

On-device Structured and Context Partitioned Projection Networks
Sujith Ravi | Zornitsa Kozareva
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

A challenging problem in on-device text classification is to build highly accurate neural models that fit in a small memory footprint and have low latency. To address this challenge, we propose an on-device neural network, SGNN++, which dynamically learns compact projection vectors from raw text using structured and context-dependent partition projections. We show that this results in accelerated inference and performance improvements. We conduct extensive evaluation on multiple conversational tasks and languages such as English, Japanese, Spanish and French. Our SGNN++ model significantly outperforms all baselines, improves upon existing on-device neural models and even surpasses RNN, CNN and BiLSTM models on dialog act and intent prediction. Through a series of ablation studies, we show the impact of the partitioned projections and structured information, which lead to a 10% improvement. We study the impact of the model size on accuracy and introduce quantization-aware training for SGNN++ to further reduce the model size while preserving the same quality. Finally, we show fast inference on mobile phones.

ProSeqo: Projection Sequence Networks for On-Device Text Classification
Zornitsa Kozareva | Sujith Ravi
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose a novel on-device sequence model for text classification using recurrent projections. Our model, ProSeqo, uses dynamic recurrent projections without the need to store or look up any pre-trained embeddings. This results in fast and compact neural networks that can perform on-device inference for complex short and long text classification tasks. We conducted exhaustive evaluation on multiple text classification tasks. Results show that ProSeqo outperformed state-of-the-art neural and on-device approaches for short text classification tasks such as dialog act and intent prediction. To the best of our knowledge, ProSeqo is the first on-device long text classification neural model. It achieved comparable results to previous neural approaches for news article, answers and product categorization, while preserving a small memory footprint and maintaining high accuracy.

PRADO: Projection Attention Networks for Document Classification On-Device
Prabhu Kaliamoorthi | Sujith Ravi | Zornitsa Kozareva
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recently, there has been great interest in the development of small and accurate neural networks that run entirely on devices such as mobile phones, smart watches and IoT. This enables user privacy, consistent user experience and low latency. Although a wide range of applications have been targeted, from wake word detection to short text classification, there are no on-device networks for long text classification. We propose a novel projection attention neural network, PRADO, that combines trainable projections with attention and convolutions. We evaluate our approach on multiple large document text classification tasks. Our results show the effectiveness of the trainable projection model in finding semantically similar phrases and reaching high performance while maintaining compact size. Using this approach, we train tiny neural networks just 200 kilobytes in size that improve over prior CNN and LSTM models and achieve near state-of-the-art performance on multiple long document classification tasks. We also apply our model to transfer learning and show its robustness and ability to further improve performance in limited data scenarios.

2018

Self-Governing Neural Networks for On-Device Short Text Classification
Sujith Ravi | Zornitsa Kozareva
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Deep neural networks reach state-of-the-art performance for a wide range of natural language processing, computer vision and speech applications. Yet, one of the biggest challenges is running these complex networks on devices such as mobile phones or smart watches with a tiny memory footprint and low computational capacity. We propose on-device Self-Governing Neural Networks (SGNNs), which learn compact projection vectors with locality-sensitive hashing. The key advantage of SGNNs over existing work is that they surmount the need for pre-trained word embeddings and complex networks with huge numbers of parameters. We conduct extensive evaluation on dialog act classification and show significant improvement over state-of-the-art results. Our findings show that SGNNs are effective at capturing low-dimensional semantic text representations, while maintaining high accuracy.

2016

Recognizing Salient Entities in Shopping Queries
Zornitsa Kozareva | Qi Li | Ke Zhai | Weiwei Guo
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Which Tumblr Post Should I Read Next?
Zornitsa Kozareva | Makoto Yamada
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2015

Everyone Likes Shopping! Multi-class Product Categorization for e-Commerce
Zornitsa Kozareva
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Multilingual Affect Polarity and Valence Prediction in Metaphors
Zornitsa Kozareva
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

2013

Proceedings of the First Workshop on Metaphor in NLP
Ekaterina Shutova | Beata Beigman Klebanov | Joel Tetreault | Zornitsa Kozareva
Proceedings of the First Workshop on Metaphor in NLP

Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing
Zornitsa Kozareva | Irina Matveeva | Gabor Melli | Vivi Nastase
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing

Multilingual Affect Polarity and Valence Prediction in Metaphor-Rich Texts
Zornitsa Kozareva
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

SemEval-2013 Task 4: Free Paraphrases of Noun Compounds
Iris Hendrickx | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Stan Szpakowicz | Tony Veale
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

SemEval-2013 Task 2: Sentiment Analysis in Twitter
Preslav Nakov | Sara Rosenthal | Zornitsa Kozareva | Veselin Stoyanov | Alan Ritter | Theresa Wilson
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Learning Verbs on the Fly
Zornitsa Kozareva
Proceedings of COLING 2012: Posters

Cause-Effect Relation Learning
Zornitsa Kozareva
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning
Andrew Gordon | Zornitsa Kozareva | Melissa Roemmele
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Class Label Enhancement via Related Instances
Zornitsa Kozareva | Konstantin Voevodski | Shanghua Teng
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Proceedings of the ACL 2011 Workshop on Relational Models of Semantics
Su Nam Kim | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Sebastian Padó | Stan Szpakowicz
Proceedings of the ACL 2011 Workshop on Relational Models of Semantics

Unsupervised Name Ambiguity Resolution Using A Generative Model
Zornitsa Kozareva | Sujith Ravi
Proceedings of the First workshop on Unsupervised Learning in NLP

Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition
Preslav Nakov | Zornitsa Kozareva | Kuzman Ganchev | Jerry Hobbs
Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition

Combining Relational and Attributional Similarity for Semantic Relation Classification
Preslav Nakov | Zornitsa Kozareva
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Insights from Network Structure for Text Mining
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

A Semi-Supervised Method to Learn and Construct Taxonomies Using the Web
Zornitsa Kozareva | Eduard Hovy
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals
Iris Hendrickx | Su Nam Kim | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Sebastian Padó | Marco Pennacchiotti | Lorenza Romano | Stan Szpakowicz
Proceedings of the 5th International Workshop on Semantic Evaluation

Not All Seeds Are Equal: Measuring the Quality of Text Mining Seeds
Zornitsa Kozareva | Eduard Hovy
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Toward Completeness in Concept Extraction and Classification
Eduard Hovy | Zornitsa Kozareva | Ellen Riloff
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Iris Hendrickx | Su Nam Kim | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Sebastian Padó | Marco Pennacchiotti | Lorenza Romano | Stan Szpakowicz
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

2008

Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs
Zornitsa Kozareva | Ellen Riloff | Eduard Hovy
Proceedings of ACL-08: HLT

2007

A Language Independent Approach for Name Categorization and Discrimination
Zornitsa Kozareva | Sonia Vázquez | Andrés Montoyo
Proceedings of the Workshop on Balto-Slavonic Natural Language Processing

UA-ZBSA: A Headline Emotion Classification through Web Information
Zornitsa Kozareva | Borja Navarro | Sonia Vázquez | Andrés Montoyo
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

UA-ZSA: Web Page Clustering on the basis of Name Disambiguation
Zornitsa Kozareva | Sonia Vázquez | Andrés Montoyo
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Improving Name Discrimination: A Language Salad Approach
Ted Pedersen | Anagha Kulkarni | Roxana Angheluta | Zornitsa Kozareva | Thamar Solorio
Proceedings of the Cross-Language Knowledge Induction Workshop

Bootstrapping Named Entity Recognition with Automatically Generated Gazetteer Lists
Zornitsa Kozareva
Student Research Workshop

2004

Cluster Analysis and Classification of Named Entities
Joaquim F. Ferreira da Silva | Zornitsa Kozareva | José Gabriel Pereira Lopes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Extracting Named Entities. A Statistical Approach
Joaquim Silva | Zornitsa Kozareva | Veska Noncheva | Gabriel Lopes
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Named entities and more generally Multiword Lexical Units (MWUs) are important for various applications. However, language independent methods for automatically extracting MWUs do not provide us with clean data. So, in this paper we propose a method for selecting possible named entities from automatically extracted MWUs, and later, a statistics-based language independent unsupervised approach is applied to possible named entities in order to cluster them according to their type. Statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; other clusters still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.