Ted Pedersen


2021

Proceedings of the Fifth Workshop on Teaching NLP
David Jurgens | Varada Kolhatkar | Lucy Li | Margot Mieskes | Ted Pedersen
Proceedings of the Fifth Workshop on Teaching NLP

SemEval-2021 Task 11: NLPContributionGraph - Structuring Scholarly NLP Contributions for a Research Knowledge Graph
Jennifer D’Souza | Sören Auer | Ted Pedersen
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

There is currently a gap between the natural language expression of scholarly publications and their structured semantic content modeling to enable intelligent content search. With the volume of research growing exponentially every year, a search feature operating over semantically structured content is compelling. The SemEval-2021 Shared Task NLPContributionGraph (a.k.a. ‘the NCG task’) asks participants to develop automated systems that structure contributions from NLP scholarly articles in the English language. As the first of its kind in the SemEval series, the task released structured data from NLP scholarly articles at three levels of information granularity, i.e. at sentence-level, phrase-level, and phrases organized as triples toward Knowledge Graph (KG) building. The sentence-level annotations comprised the few sentences about the article’s contribution. The phrase-level annotations were scientific term and predicate phrases from the contribution sentences. Finally, the triples constituted the research overview KG. For the Shared Task, participating systems were then expected to automatically classify contribution sentences, extract scientific terms and relations from the sentences, and organize them as KG triples. Overall, the task drew strong participation, with seven teams and 27 participants. The best end-to-end task system classified contribution sentences at 57.27% F1, phrases at 46.41% F1, and triples at 22.28% F1. While the absolute performance on generating triples remains low, the article concludes by highlighting the difficulty of producing such data and, as a consequence, of modeling it.

Duluth at SemEval-2021 Task 11: Applying DeBERTa to Contributing Sentence Selection and Dependency Parsing for Entity Extraction
Anna Martin | Ted Pedersen
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper describes the Duluth system that participated in SemEval-2021 Task 11, NLP Contribution Graph. It details the extraction of contribution sentences and scientific entities and their relations from scholarly articles in the domain of Natural Language Processing. Our solution uses DeBERTa for multi-class sentence classification to extract the contributing sentences and their type, and dependency parsing to outline each sentence and extract subject-predicate-object triples. Our system ranked fifth of seven for Phase 1: end-to-end pipeline, sixth of eight for Phase 2 Part 1: phrases and triples, and fifth of eight for Phase 2 Part 2: triples extraction.

2020

Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines
Shuning Jin | Yue Yin | XianE Tang | Ted Pedersen
Proceedings of the Fourteenth Workshop on Semantic Evaluation

We use pretrained transformer-based language models in SemEval-2020 Task 7: Assessing the Funniness of Edited News Headlines. Inspired by the incongruity theory of humor, we use a contrastive approach to capture the surprise in the edited headlines. In the official evaluation, our system gets 0.531 RMSE in Subtask 1, 11th among 49 submissions. In Subtask 2, our system gets 0.632 accuracy, 9th among 32 submissions.

Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression
Ted Pedersen
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the Duluth systems that participated in SemEval–2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval–2020). We participated in the three English language tasks. Our systems provide a simple machine learning baseline using logistic regression. We trained our models on the distantly supervised training data made available by the task organizers and used no other resources. As might be expected, we did not rank highly in the comparative evaluation: 79th of 85 in task A, 34th of 43 in task B, and 24th of 39 in task C. We carried out a qualitative analysis of our results and found that the class labels in the gold standard data are somewhat noisy. We hypothesize that the extremely high accuracy (> 90%) of the top ranked systems may reflect methods that learn the training data very well but may not generalize to the task of identifying offensive language in English. This analysis includes examples of tweets that despite being mildly redacted are still offensive.

2019

Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets
Ted Pedersen
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the Duluth systems that participated in SemEval–2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval). For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data. However, our most successful system for classifying a tweet as offensive (or not) was a rule-based black–list approach, and we also experimented with combining the training data from two different but related SemEval tasks. Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.

Duluth at SemEval-2019 Task 4: The Pioquinto Manterola Hyperpartisan News Detector
Saptarshi Sengupta | Ted Pedersen
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes the Pioquinto Manterola Hyperpartisan News Detector, which participated in SemEval-2019 Task 4. Hyperpartisan news is highly polarized and takes a very biased or one–sided view of a particular story. We developed two variants of our system; the more successful was a Logistic Regression classifier based on unigram features. This was our official entry in the task, and it placed 23rd of 42 participating teams. Our second variant was a Convolutional Neural Network that did not perform as well.

2018

UMDSub at SemEval-2018 Task 2: Multilingual Emoji Prediction Multi-channel Convolutional Neural Network on Subword Embedding
Zhenduo Wang | Ted Pedersen
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the UMDSub system that participated in Task 2 of SemEval-2018. We developed a system that predicts an emoji given the raw text in an English tweet. The system is a Multi-channel Convolutional Neural Network based on subword embeddings for the representation of tweets. This model improves on character or word based methods by about 2%. Our system placed 21st of 48 participating systems in the official evaluation.

Duluth UROP at SemEval-2018 Task 2: Multilingual Emoji Prediction with Ensemble Learning and Oversampling
Shuning Jin | Ted Pedersen
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the Duluth UROP systems that participated in SemEval–2018 Task 2, Multilingual Emoji Prediction. We relied on a variety of ensembles made up of classifiers using Naive Bayes, Logistic Regression, and Random Forests. We used unigram and bigram features and tried to offset the skewness of the data through the use of oversampling. Our task evaluation results place us 19th of 48 systems in the English evaluation, and 5th of 21 in the Spanish. After the evaluation we realized that some simple changes to our pre-processing could significantly improve our results. After making these changes we attained results that would have placed us sixth in the English evaluation, and second in the Spanish.

ALANIS at SemEval-2018 Task 3: A Feature Engineering Approach to Irony Detection in English Tweets
Kevin Swanberg | Madiha Mirza | Ted Pedersen | Zhenduo Wang
Proceedings of The 12th International Workshop on Semantic Evaluation

This paper describes the ALANIS system that participated in Task 3 of SemEval-2018. We develop a system for detection of irony, as well as the detection of three types of irony: verbal polar irony, other verbal irony, and situational irony. The system uses a logistic regression model in subtask A and a voted classifier system with manually developed features to identify ironic tweets. This model improves on a Naive Bayes baseline by about 8 percent on the training set.

UMDuluth-CS8761 at SemEval-2018 Task9: Hypernym Discovery using Hearst Patterns, Co-occurrence frequencies and Word Embeddings
Arshia Zernab Hassan | Manikya Swathi Vallabhajosyula | Ted Pedersen
Proceedings of The 12th International Workshop on Semantic Evaluation

Hypernym Discovery is the task of identifying potential hypernyms for a given term. A hypernym is a more generalized word that is super-ordinate to more specific words. This paper explores several approaches that rely on co-occurrence frequencies of word pairs, Hearst Patterns based on regular expressions, and word embeddings created from the UMBC corpus. Our system Babbage participated in Subtask 1A for English and placed 6th of 19 systems when identifying concept hypernyms, and 12th of 18 systems for entity hypernyms.

2017

Improving Correlation with Human Judgments by Integrating Semantic Similarity with Second–Order Vectors
Bridget McInnes | Ted Pedersen
BioNLP 2017

Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co–occurrence frequencies or statistical measures of association to weight the importance of particular co–occurrences. In this paper, we extend these methods by incorporating a measure of semantic similarity based on a human curated taxonomy into a second–order vector representation. This results in a measure of semantic relatedness that combines the contextual information available in a corpus–based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that incorporating semantic similarity into second–order co–occurrence matrices improves correlation with human judgments for both similarity and relatedness, and that our method compares favorably to various word embedding methods that have recently been evaluated on the same reference standards we have used.

Duluth at SemEval-2017 Task 6: Language Models in Humor Detection
Xinru Yan | Ted Pedersen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the Duluth system that participated in SemEval-2017 Task 6 #HashtagWars: Learning a Sense of Humor. The system participated in Subtasks A and B using N-gram language models, ranking highly in the task evaluation. This paper discusses the results of our system in the development and evaluation stages and from two post-evaluation runs.

Duluth at SemEval-2017 Task 7 : Puns Upon a Midnight Dreary, Lexical Semantics for the Weak and Weary
Ted Pedersen
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the Duluth systems that participated in SemEval-2017 Task 7 : Detection and Interpretation of English Puns. The Duluth systems participated in all three subtasks, and relied on methods that included word sense disambiguation and measures of semantic relatedness.

2016

Semi-supervised CLPsych 2016 Shared Task System Submission
Nicolas Rey-Villamizar | Prasha Shrestha | Thamar Solorio | Farig Sadeque | Steven Bethard | Ted Pedersen
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

Analysis of Anxious Word Usage on Online Health Forums
Nicolas Rey-Villamizar | Prasha Shrestha | Farig Sadeque | Steven Bethard | Ted Pedersen | Arjun Mukherjee | Thamar Solorio
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis

Why Do They Leave: Modeling Participation in Online Depression Forums
Farig Sadeque | Ted Pedersen | Thamar Solorio | Prasha Shrestha | Nicolas Rey-Villamizar | Steven Bethard
Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media

Duluth at SemEval 2016 Task 14: Extending Gloss Overlaps to Enrich Semantic Taxonomies
Ted Pedersen
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

UMNDuluth at SemEval-2016 Task 14: WordNet’s Missing Lemmas
Jon Rusert | Ted Pedersen
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Age and Gender Prediction on Health Forum Data
Prasha Shrestha | Nicolas Rey-Villamizar | Farig Sadeque | Ted Pedersen | Steven Bethard | Thamar Solorio
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. But users might not always disclose their age and gender. It is therefore desirable to be able to automatically extract this information from users’ content. However, to the best of our knowledge there is no such resource for author profiling of health forum data. Here we present a large corpus, with close to 85,000 users, for profiling and also outline our approach and benchmark results to automatically detect a user’s age and gender from their forum posts. We use a mix of features from a user’s text as well as forum-specific features to obtain accuracy well above the baseline, thus showing that both our dataset and our method are useful and valid.

2015

Duluth: Word Sense Discrimination in the Service of Lexicography
Ted Pedersen
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

Screening Twitter Users for Depression and PTSD with Lexical Decision Lists
Ted Pedersen
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Predicting Continued Participation in Online Health Forums
Farig Sadeque | Thamar Solorio | Ted Pedersen | Prasha Shrestha | Steven Bethard
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

Duluth : Measuring Cross-Level Semantic Similarity with First and Second Order Dictionary Overlaps
Ted Pedersen
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Offspring from Reproduction Problems: What Replication Failure Teaches Us
Antske Fokkens | Marieke van Erp | Marten Postma | Ted Pedersen | Piek Vossen | Nuno Freire
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Duluth : Word Sense Induction Applied to Web Page Clustering
Ted Pedersen
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

UMLS::Similarity: Measuring the Relatedness and Similarity of Biomedical Concepts
Bridget McInnes | Ted Pedersen | Serguei Pakhomov | Ying Liu | Genevieve Melton-Meaux
Proceedings of the 2013 NAACL HLT Demonstration Session

2012

Duluth : Measuring Degrees of Relational Similarity with the Gloss Vector Measure of Semantic Relatedness
Ted Pedersen
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Using Second-order Vectors in a Knowledge-based Method for Acronym Disambiguation
Bridget T. McInnes | Ted Pedersen | Ying Liu | Serguei V. Pakhomov | Genevieve B. Melton
Proceedings of the Fifteenth Conference on Computational Natural Language Learning

The Ngram Statistics Package (Text::NSP) : A Flexible Tool for Identifying Ngrams, Collocations, and Word Associations
Ted Pedersen | Satanjeev Banerjee | Bridget McInnes | Saiyam Kohli | Mahesh Joshi | Ying Liu
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

Identifying Collocations to Measure Compositionality: Shared Task System Description
Ted Pedersen
Proceedings of the Workshop on Distributional Semantics and Compositionality

2010

Information Content Measures of Semantic Similarity Perform Better Without Sense-Tagged Text
Ted Pedersen
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Duluth-WSI: SenseClusters Applied to the Sense Induction Task of SemEval-2
Ted Pedersen
Proceedings of the 5th International Workshop on Semantic Evaluation

Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas
Thamar Solorio | Ted Pedersen
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

2009

WordNet::SenseRelate::AllWords - A Broad Coverage Word Sense Tagger that Maximizes Semantic Relatedness
Ted Pedersen | Varada Kolhatkar
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Demonstration Session

2008

Last Words: Empiricism Is Not a Matter of Faith
Ted Pedersen
Computational Linguistics, Volume 34, Number 3, September 2008

2007

Determining the Syntactic Structure of Medical Terms in Clinical Notes
Bridget McInnes | Ted Pedersen | Serguei Pakhomov
Biological, translational, and clinical language processing

UMND1: Unsupervised Word Sense Disambiguation Using Contextual Semantic Relatedness
Siddharth Patwardhan | Satanjeev Banerjee | Ted Pedersen
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

UMND2 : SenseClusters Applied to the Sense Induction Task of Senseval-4
Ted Pedersen
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Improving Name Discrimination: A Language Salad Approach
Ted Pedersen | Anagha Kulkarni | Roxana Angheluta | Zornitsa Kozareva | Thamar Solorio
Proceedings of the Cross-Language Knowledge Induction Workshop

Using WordNet-based Context Vectors to Estimate the Semantic Relatedness of Concepts
Siddharth Patwardhan | Ted Pedersen
Proceedings of the Workshop on Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together

Automatic Cluster Stopping with Criterion Functions and the Gap Statistic
Ted Pedersen | Anagha Kulkarni
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations

Selecting the “Right” Number of Senses Based on Clustering Criterion Functions
Ted Pedersen | Anagha Kulkarni
Demonstrations

2005

Proceedings of the ACL Interactive Poster and Demonstration Sessions
Masaaki Nagata | Ted Pedersen
Proceedings of the ACL Interactive Poster and Demonstration Sessions

SenseRelate::TargetWord—A Generalized Framework for Word Sense Disambiguation
Siddharth Patwardhan | Satanjeev Banerjee | Ted Pedersen
Proceedings of the ACL Interactive Poster and Demonstration Sessions

SenseClusters: Unsupervised Clustering and Labeling of Similar Contexts
Anagha Kulkarni | Ted Pedersen
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Proceedings of the ACL Workshop on Building and Using Parallel Texts
Philipp Koehn | Joel Martin | Rada Mihalcea | Christof Monz | Ted Pedersen
Proceedings of the ACL Workshop on Building and Using Parallel Texts

Word Alignment for Languages with Scarce Resources
Joel Martin | Rada Mihalcea | Ted Pedersen
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

SenseClusters - Finding Clusters that Represent Word Senses
Amruta Purandare | Ted Pedersen
Demonstration Papers at HLT-NAACL 2004

WordNet::Similarity - Measuring the Relatedness of Concepts
Ted Pedersen | Siddharth Patwardhan | Jason Michelizzi
Demonstration Papers at HLT-NAACL 2004

The Senseval-3 Multilingual English-Hindi lexical sample task
Timothy Chklovski | Rada Mihalcea | Ted Pedersen | Amruta Purandare
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Complementarity of lexical and simple syntactic features: The SyntaLex approach to Senseval-3
Saif Mohammad | Ted Pedersen
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

The Duluth lexical sample systems in Senseval-3
Ted Pedersen
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation
Saif Mohammad | Ted Pedersen
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004

Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces
Amruta Purandare | Ted Pedersen
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004

2003

An Evaluation Exercise for Word Alignment
Rada Mihalcea | Ted Pedersen
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

The Duluth Word Alignment System
Bridget Thomson McInnes | Ted Pedersen
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2002

Assessing System Agreement and Instance Difficulty in the Lexical
Ted Pedersen
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

Evaluating the Effectiveness of Ensembles of Decision Trees
Ted Pedersen
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

2001

A Decision Tree of Bigrams is an Accurate Predictor of Word Sense
Ted Pedersen
Second Meeting of the North American Chapter of the Association for Computational Linguistics

Machine Learning with Lexical Features: The Duluth Approach to SENSEVAL-2
Ted Pedersen
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

A Simple Approach to Building Ensembles of Naive Bayesian Classifiers for Word Sense Disambiguation
Ted Pedersen
1st Meeting of the North American Chapter of the Association for Computational Linguistics

1997

Sequential Model Selection for Word Sense Disambiguation
Ted Pedersen | Rebecca Bruce | Janyce Wiebe
Fifth Conference on Applied Natural Language Processing

Distinguishing Word Senses in Untagged Text
Ted Pedersen | Rebecca Bruce
Second Conference on Empirical Methods in Natural Language Processing

A Statistical Decision Making Method: A Case Study on Prepositional Phrase Attachment
Mehmet Kayaalp | Ted Pedersen | Rebecca Bruce
CoNLL97: Computational Natural Language Learning

1996

The Measure of a Model
Rebecca Bruce | Janyce Wiebe | Ted Pedersen
Conference on Empirical Methods in Natural Language Processing