Marius Pasca

Also published as: Marius A. Pasca, Marius Paşca


2020

Interpreting Open-Domain Modifiers: Decomposition of Wikipedia Categories into Disambiguated Property-Value Pairs
Marius Pasca
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

This paper proposes an open-domain method for automatically annotating modifier constituents (20th-century) within Wikipedia categories (20th-century male writers) with properties (date of birth). The annotations offer a semantically-anchored understanding of the role of the constituents in defining the underlying meaning of the categories. In experiments over an evaluation set of Wikipedia categories, the proposed method annotates constituent modifiers as semantically-anchored properties, rather than as mere strings as in a previous method. It does so at a better trade-off between precision and recall.

2019

Wikipedia as a Resource for Text Analysis and Retrieval
Marius Pasca
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

This tutorial examines the role of Wikipedia in tasks related to text analysis and retrieval. Text analysis tasks that take advantage of Wikipedia include coreference resolution, word sense and entity disambiguation, and information extraction. In information retrieval, a better understanding of the structure and meaning of queries helps in matching queries against documents, clustering search results, retrieving answers and entities, and retrieving knowledge panels for queries asking about popular entities.

2017

Acquisition, Representation and Usage of Conceptual Hierarchies
Marius Pasca
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

Through subsumption and instantiation, individual instances (“artificial intelligence”, “the spotted pig”) otherwise spanning a wide range of domains can be brought together and organized under conceptual hierarchies. The hierarchies connect more specific concepts (“computer science subfields”, “gastropubs”) to more general concepts (“academic disciplines”, “restaurants”) through IsA relations. Explicit or implicit properties applicable to, and defining, more general concepts are inherited by their more specific concepts, down to the instances connected to the lower parts of the hierarchies. Subsumption represents a crisp, universally-applicable principle towards consistently representing IsA relations in any knowledge resource. Yet knowledge resources often exhibit significant differences in their scope, representation choices and intended usage, leading to significant differences in their expected usage and impact on various tasks. This tutorial examines the theoretical foundations of subsumption, and its practical embodiment through IsA relations compiled manually or extracted automatically. It addresses IsA relations from their formal definition; through practical choices made in their representation within the larger and more widely-used of the available knowledge resources; to their automatic acquisition from document repositories, as opposed to their manual compilation by human contributors; to their impact in text analysis and information retrieval. As search engines move away from returning a set of links and closer to returning results that more directly answer queries, IsA relations play an increasingly important role towards a better understanding of documents and queries.
The tutorial teaches the audience about definitions, assumptions and practical choices related to modeling and representing IsA relations in existing, human-compiled resources of instances, concepts and resulting conceptual hierarchies; methods for automatically extracting sets of instances within unlabeled or labeled concepts, where the concepts may be considered as a flat set or organized hierarchically; and applications of IsA relations in information retrieval.

Identifying 1950s American Jazz Musicians: Fine-Grained IsA Extraction via Modifier Composition
Ellie Pavlick | Marius Paşca
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We present a method for populating fine-grained classes (e.g., “1950s American jazz musicians”) with instances (e.g., Charles Mingus). While state-of-the-art methods tend to treat class labels as single lexical units, the proposed method considers each of the individual modifiers in the class label relative to the head. An evaluation on the task of reconstructing Wikipedia category pages demonstrates a >10 point increase in AUC, over a strong baseline relying on widely-used Hearst patterns.

2016

Revisiting Taxonomy Induction over Wikipedia
Amit Gupta | Francesco Piccinno | Mikhail Kozhevnikov | Marius Paşca | Daniele Pighin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Guided by multiple heuristics, a unified taxonomy of entities and categories is distilled from the Wikipedia category network. A comprehensive evaluation, based on the analysis of upward generalization paths, demonstrates that the taxonomy supports generalizations which are more than twice as accurate as the state of the art. The taxonomy is available at http://headstaxonomy.com.

The Role of Wikipedia in Text Analysis and Retrieval
Marius Paşca
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

This tutorial examines the characteristics, advantages and limitations of Wikipedia relative to other existing, human-curated resources of knowledge; derivative resources, created by converting semi-structured content in Wikipedia into structured data; the role of Wikipedia and its derivatives in text analysis; and the role of Wikipedia and its derivatives in enhancing information retrieval.

2015

Interpreting Compound Noun Phrases Using Web Search Queries
Marius Paşca
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Knowledge Acquisition for Web Search
Marius Pasca
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts

The identification of textual items, or documents, that best match a user’s information need, as expressed in search queries, forms the core functionality of information retrieval systems. Well-known challenges are associated with understanding the intent behind user queries; and, more importantly, with matching inherently-ambiguous queries to documents that may employ lexically different phrases to convey the same meaning. The conversion of semi-structured content from Wikipedia and other resources into structured data produces knowledge potentially more suitable to database-style queries and, ideally, to use in information retrieval. In parallel, the availability of textual documents on the Web enables an aggressive push towards the automatic acquisition of various types of knowledge from text. Methods developed under the umbrella of open-domain information extraction acquire open-domain classes of instances and relations from Web text. The methods operate over unstructured or semi-structured text available within collections of Web documents, or over relatively more intriguing streams of anonymized search queries. Some of the methods import the automatically-extracted data into human-generated resources, or otherwise exploit existing human-generated resources. In both cases, the goal is to expand the coverage of the initial resources, thus providing information about more of the topics that people in general, and Web search users in particular, may be interested in.

2014

Queries as a Source of Lexicalized Commonsense Knowledge
Marius Paşca
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Acquisition of Noncontiguous Class Attributes from Web Search Queries
Marius Paşca
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

2013

Open-Domain Fine-Grained Class Extraction from Web Search Queries
Marius Paşca
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

Attribute Extraction from Conjectural Queries
Marius Paşca
Proceedings of COLING 2012

Instance-Driven Attachment of Semantic Annotations over Conceptual Hierarchies
Janara Christensen | Marius Paşca
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Jun’ichi Tsujii | James Henderson | Marius Paşca
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Fine-Grained Class Label Markup of Search Queries
Joseph Reisinger | Marius Paşca
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Ranking Class Labels Using Query Sessions
Marius Paşca
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Web Search Queries as a Corpus
Marius Paşca
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts

Attribute Extraction from Synthetic Web Search Queries
Marius Paşca
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Proceedings of the NAACL HLT 2010 Workshop on Semantic Search
Donghui Feng | Jamie Callan | Eduard Hovy | Marius Pasca
Proceedings of the NAACL HLT 2010 Workshop on Semantic Search

Instance Sense Induction from Attribute Sets
Ricardo Martin-Brualla | Enrique Alfonseca | Marius Pasca | Keith Hall | Enrique Robledo-Arnuncio | Massimiliano Ciaramita
Coling 2010: Posters

The Role of Queries in Ranking Labeled Instances Extracted from Text
Marius Paşca
Coling 2010: Posters

2009

A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
Eneko Agirre | Enrique Alfonseca | Keith Hall | Jana Kravalova | Marius Paşca | Aitor Soroa
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Outclassing Wikipedia in Open-Domain Information Extraction: Weakly-Supervised Acquisition of Attributes over Conceptual Hierarchies
Marius Paşca
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Latent Variable Models of Concept-Attribute Attachment
Joseph Reisinger | Marius Paşca
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

2008

Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs
Marius Paşca | Benjamin Van Durme
Proceedings of ACL-08: HLT

Mining Parenthetical Translations from the Web by Word Alignment
Dekang Lin | Shaojun Zhao | Benjamin Van Durme | Marius Paşca
Proceedings of ACL-08: HLT

Answering Definition Questions via Temporally-Anchored Text Snippets
Marius Paşca
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Low-Complexity Heuristics for Deriving Fine-Grained Classes of Named Entities from Web Textual Data
Marius Paşca
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We introduce a low-complexity method for acquiring fine-grained classes of named entities from the Web. The method exploits the large amounts of textual data available on the Web, while avoiding the use of any expensive text processing techniques or tools. The quality of the extracted classes is encouraging with respect to both the precision of the sets of named entities acquired within various classes, and the labels assigned to the sets of named entities.

Weakly-Supervised Acquisition of Labeled Class Instances using Graph Random Walks
Partha Pratim Talukdar | Joseph Reisinger | Marius Paşca | Deepak Ravichandran | Rahul Bhagat | Fernando Pereira
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2006

Names and Similarities on the Web: Fact Extraction in the Fast Lane
Marius Paşca | Dekang Lin | Jeffrey Bigham | Andrei Lifchits | Alpa Jain
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Using Encyclopedic Knowledge for Named Entity Disambiguation
Razvan Bunescu | Marius Paşca
11th Conference of the European Chapter of the Association for Computational Linguistics

2005

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web
Marius Paşca | Péter Dienes
Second International Joint Conference on Natural Language Processing: Full Papers

Book Review: New Directions in Question Answering, edited by Mark T. Maybury
Marius Paşca
Computational Linguistics, Volume 31, Number 3, September 2005

2002

Performance Issues and Error Analysis in an Open-Domain Question Answering System
Dan Moldovan | Marius Pasca | Sanda Harabagiu | Mihai Surdeanu
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

2001

Answer Mining from On-Line Documents
Marius Pasca | Sanda Harabagiu
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
Sanda Harabagiu | Dan Moldovan | Marius Pasca | Rada Mihalcea | Mihai Surdeanu | Razvan Bunescu | Roxana Girju | Vasile Rus | Paul Morarescu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

The Structure and Performance of an Open-Domain Question Answering System
Dan Moldovan | Sanda Harabagiu | Marius Pasca | Rada Mihalcea | Roxana Girju | Richard Goodrum | Vasile Rus
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

Experiments with Open-Domain Textual Question Answering
Sanda M. Harabagiu | Marius A. Pasca | Steven J. Maiorano
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics