Björn Gambäck

Also published as: Bjorn Gamback, Björn Gämback


2020

Sentimental Poetry Generation
Kasper Aalberg Røstvold | Björn Gambäck
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The paper investigates how well poetry can be generated to contain a specific sentiment, and whether readers of the poetry experience the intended sentiment. The poetry generator consists of a bi-directional Long Short-Term Memory (LSTM) model, combined with rhyme pair generation, rule-based word prediction methods, and tree search for extending generation possibilities. The LSTM network was trained on a set of English poetry written and published by users on a public website. Human judges evaluated poems generated by the system, with both positive and negative sentiment. The results indicate that while there are some weaknesses in the system compared to other state-of-the-art solutions, it is fully capable of generating poetry with an inherent sentiment that is perceived by readers.

Native-Language Identification with Attention
Stian Steinbakken | Björn Gambäck
Proceedings of the 17th International Conference on Natural Language Processing (ICON)

The paper explores how an attention-based approach can increase performance on the task of native-language identification (NLI), i.e., to identify an author’s first language given information expressed in a second language. Previously, Support Vector Machines have consistently outperformed deep learning-based methods on the TOEFL11 data set, the de facto standard for evaluating NLI systems. The attention-based system BERT (Bidirectional Encoder Representations from Transformers) was first tested in isolation on the TOEFL11 data set, then used in a meta-classifier stack in combination with traditional techniques to produce an accuracy of 0.853. However, more labelled NLI data is now available, so BERT was also trained on the much larger Reddit-L2 data set, containing 50 times as many examples as previously used for English NLI, giving an accuracy of 0.902 on the Reddit-L2 in-domain test scenario, improving the state-of-the-art by 21.2 percentage points.

SemEval-2020 Task 8: Memotion Analysis- the Visuo-Lingual Metaphor!
Chhavi Sharma | Deepesh Bhageria | William Scott | Srinivas PYKL | Amitava Das | Tanmoy Chakraborty | Viswanath Pulabaigari | Björn Gambäck
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Information on social media comprises various modalities such as textual, visual and audio. NLP and Computer Vision communities often leverage only one prominent modality in isolation to study social media. However, computational processing of Internet memes needs a hybrid approach. The growing ubiquity of Internet memes on social media platforms such as Facebook, Instagram, and Twitter further suggests that we cannot ignore such multimodal content anymore. To the best of our knowledge, meme emotion analysis has received little attention. The objective of this proposal is to bring the attention of the research community towards the automatic processing of Internet memes. The Memotion Analysis task released approximately 10K annotated memes with human-annotated labels, namely sentiment (positive, negative, neutral), type of emotion (sarcastic, funny, offensive, motivation) and their corresponding intensity. The challenge consisted of three subtasks: sentiment (positive, negative, and neutral) analysis of memes, overall emotion (humor, sarcasm, offensive, and motivational) classification of memes, and classifying intensity of meme emotion. The best performances achieved were F1 (macro average) scores of 0.35, 0.51 and 0.32, respectively for each of the three subtasks.

SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Parth Patwa | Gustavo Aguilar | Sudipta Kar | Suraj Pandey | Srinivas PYKL | Björn Gambäck | Tanmoy Chakraborty | Thamar Solorio | Amitava Das
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora annotated with word-level language identification and sentence-level sentiment labels. These corpora comprise 20K and 19K examples, respectively. The sentiment labels are Positive, Negative, and Neutral. SentiMix attracted 89 submissions in total including 61 teams that participated in the Hinglish contest and 28 submitted systems to the Spanglish competition. The best performance achieved was 75.0% F1 score for Hinglish and 80.6% F1 for Spanglish. We observe that BERT-like models and ensemble methods are the most common and successful approaches among the participants.

Using Transfer-based Language Models to Detect Hateful and Offensive Language Online
Vebjørn Isaksen | Björn Gambäck
Proceedings of the Fourth Workshop on Online Abuse and Harms

Distinguishing hate speech from non-hate offensive language is challenging, as hate speech does not always include offensive slurs and offensive language does not always express hate. Here, four deep learners based on the Bidirectional Encoder Representations from Transformers (BERT), with either general or domain-specific language models, were tested against two datasets containing tweets labelled as either ‘Hateful’, ‘Normal’ or ‘Offensive’. The results indicate that the attention-based models profoundly confuse hate speech with offensive and normal language. However, the pre-trained models outperform state-of-the-art results in terms of accurately predicting the hateful instances.

2019

Studying Generalisability across Abusive Language Detection Datasets
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of this, there exists a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.

A Platform Agnostic Dual-Strand Hate Speech Detector
Johannes Skjeggestad Meyer | Björn Gambäck
Proceedings of the Third Workshop on Abusive Language Online

Hate speech detectors must be applicable across a multitude of services and platforms, and there is hence a need for detection approaches that do not depend on any information specific to a given platform. For instance, the information stored about the text’s author may differ between services, and so using such data would reduce a system’s general applicability. The paper thus focuses on using exclusively text-based input in the detection, in an optimised architecture combining Convolutional Neural Networks and Long Short-Term Memory networks. The hate speech detector merges two strands with character n-grams and word embeddings to produce the final classification, and is shown to outperform comparable previous approaches.

NIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora
Steve Durairaj Swamy | Anupam Jamatia | Björn Gambäck | Amitava Das
Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on ‘Identifying and Categorizing Offensive Language in Social Media’ by the ‘NIT_Agartala_NLP_Team’. An annotated Twitter dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B, and C, respectively. The paper highlights the overall low predictive nature of various linguistic features and surface level count features, as well as the limitations of a traditional machine learning approach when compared to a Deep Learning counterpart.
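
Several results in this list, including the ones just above, are reported as macro-averaged F1. As a reminder of what that metric computes, here is a minimal sketch in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or the task data.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over the labels that occur in gold."""
    labels = sorted(set(gold))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when undefined.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two classes, six instances.
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]
score = macro_f1(gold, pred)
print(score)
```

Because every class contributes equally to the mean, a rare class that is predicted poorly lowers the score as much as a frequent one would.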

2018

Utilizing Large Twitter Corpora to Create Sentiment Lexica
Valerij Fredriksen | Brage Jahren | Björn Gambäck
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Named Entity Recognition on Code-Switched Data Using Conditional Random Fields
Utpal Kumar Sikdar | Biswanath Barik | Björn Gambäck
Proceedings of the Third Workshop on Computational Approaches to Linguistic Code-Switching

Named Entity Recognition is an important information extraction task that identifies proper names in unstructured texts and classifies them into some pre-defined categories. Identification of named entities in code-mixed social media texts is a more difficult and challenging task as the contexts are short, ambiguous and often noisy. This work proposes a Conditional Random Fields based named entity recognition system to identify proper names in code-switched data and classify them into nine categories. The system ranked fifth among nine participant systems and achieved a 59.25% F1-score.

The Effects of User Features on Twitter Hate Speech Detection
Elise Fehn Unsvåg | Björn Gambäck
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

The paper investigates the potential effects user features have on hate speech classification. A quantitative analysis of Twitter data was conducted to better understand user characteristics, but no correlations were found between hateful text and the characteristics of the users who had posted it. However, experiments with a hate speech classifier based on datasets from three different languages showed that combining certain user features with textual features gave slight improvements of classification performance. While the incorporation of user features resulted in varying impact on performance for the different datasets used, user network-related features provided the most consistent improvements.

Ternary Twitter Sentiment Classification with Distant Supervision and Sentiment-Specific Word Embeddings
Mats Byrkjeland | Frederik Gørvell de Lichtenberg | Björn Gambäck
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

The paper proposes the Ternary Sentiment Embedding Model, a new model for creating sentiment embeddings based on the Hybrid Ranking Model of Tang et al. (2016), but trained on ternary-labeled data instead of binary-labeled, utilizing sentiment embeddings from datasets made with different distant supervision methods. The model is used as part of a complete Twitter Sentiment Analysis system and empirically compared to existing systems, showing that it outperforms Hybrid Ranking and that the quality of the distant-supervised dataset has a great impact on the quality of the produced sentiment embeddings.

NTNU at SemEval-2018 Task 7: Classifier Ensembling for Semantic Relation Identification and Classification in Scientific Papers
Biswanath Barik | Utpal Kumar Sikdar | Björn Gambäck
Proceedings of The 12th International Workshop on Semantic Evaluation

The paper presents NTNU’s contribution to SemEval-2018 Task 7 on relation identification and classification. The class weights and parameters of five alternative supervised classifiers were optimized through grid search and cross-validation. The outputs of the classifiers were combined through voting for the final prediction. A wide variety of features were explored, with the most informative identified by feature selection. The best setting achieved F1 scores of 47.4% and 66.0% in the relation classification subtasks 1.1 and 1.2. For relation identification and classification in subtask 2, it achieved F1 scores of 33.9% and 17.0%.
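
The abstract says the classifier outputs were combined through voting but does not specify the scheme. The sketch below assumes plain unweighted majority voting, which may differ from what the paper used. The function name `majority_vote` and the relation labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per instance.

    `predictions` holds one list of labels per classifier, all of equal
    length. Ties go to the label that appears first in classifier order.
    """
    combined = []
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on three instances.
clf_outputs = [
    ["USAGE", "NONE", "PART_WHOLE"],
    ["USAGE", "USAGE", "PART_WHOLE"],
    ["NONE", "NONE", "COMPARE"],
]
final = majority_vote(clf_outputs)
print(final)
```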

Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers
Utpal Kumar Sikdar | Biswanath Barik | Björn Gambäck
Proceedings of The 12th International Workshop on Semantic Evaluation

Cybersecurity risks such as malware threaten the personal safety of users, but identifying malware text is a major challenge. The paper proposes a supervised learning approach to identifying malware sentences given a document (subTask1 of SemEval 2018, Task 8), as well as to classifying malware tokens in the sentences (subTask2). The approach achieved good results, ranking second of twelve participants for both subtasks, with F-scores of 57% for subTask1 and 28% for subTask2.

2017

Twitter Topic Modeling by Tweet Aggregation
Asbjørn Steinskog | Jonas Therkelsen | Björn Gambäck
Proceedings of the 21st Nordic Conference on Computational Linguistics

Using Convolutional Neural Networks to Classify Hate-Speech
Björn Gambäck | Utpal Kumar Sikdar
Proceedings of the First Workshop on Abusive Language Online

The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism) and non-hate-speech. Four Convolutional Neural Network models were trained on, respectively, character 4-grams, word vectors based on semantic information built using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max-pooling, and a softmax function used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score.
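
One of the input representations mentioned above is character 4-grams. The following is a minimal sketch of how overlapping character n-grams can be extracted from a string. The helper name `char_ngrams` is an assumption, and any preprocessing the paper applied (lowercasing, padding, handling of whitespace) is not reproduced here.

```python
def char_ngrams(text, n=4):
    """Return all overlapping character n-grams of `text`, left to right.

    Strings shorter than n yield an empty list.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = char_ngrams("tweet")
print(grams)
```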

A Feature-based Ensemble Approach to Recognition of Emerging and Rare Named Entities
Utpal Kumar Sikdar | Björn Gambäck
Proceedings of the 3rd Workshop on Noisy User-generated Text

Detecting previously unseen named entities in text is a challenging task. The paper describes how three initial classifier models were built using Conditional Random Fields (CRFs), Support Vector Machines (SVMs) and a Long Short-Term Memory (LSTM) recurrent neural network. The outputs of these three classifiers were then used as features to train another CRF classifier working as an ensemble. 5-fold cross-validation based on training and development data for the emerging and rare named entity recognition shared task showed precision, recall and F1-score of 66.87%, 46.75% and 54.97%, respectively. For surface form evaluation, the CRF ensemble-based system achieved precision, recall and F1 scores of 65.18%, 45.20% and 53.30%. When applied to unseen test data, the model reached 47.92% precision, 31.97% recall and 38.55% F1-score for entity level evaluation, with the corresponding surface form evaluation values of 44.91%, 30.47% and 36.31%.

A Societal Sentiment Analysis: Predicting the Values and Ethics of Individuals by Analysing Social Media Content
Tushar Maheshwari | Aishwarya N. Reganti | Samiksha Gupta | Anupam Jamatia | Upendra Kumar | Björn Gambäck | Amitava Das
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

To find out how users’ social media behaviour and language are related to their ethical practices, the paper investigates applying Schwartz’ psycholinguistic model of societal sentiment to social media text. The analysis is based on corpora collected from user essays as well as social media (Facebook and Twitter). Several experiments were carried out on the corpora to classify the ethical values of users, incorporating Linguistic Inquiry Word Count analysis, n-grams, topic models, psycholinguistic lexica, speech-acts, and non-linguistic information, while applying a range of machine learners (Support Vector Machines, Logistic Regression, and Random Forests) to identify the best linguistic and non-linguistic features for automatic classification of values and ethics.

2016

Comparing the Level of Code-Switching in Corpora
Björn Gambäck | Amitava Das
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Social media texts are often fairly informal and conversational, and when produced by bilinguals tend to be written in several different languages simultaneously, in the same way as conversational speech. The recent availability of large social media corpora has thus also made large-scale code-switched resources available for research. The paper addresses the issues of evaluation and comparison these new corpora entail, by defining an objective measure of corpus level complexity of code-switched texts. It is also shown how this formal measure can be used in practice, by applying it to several code-switched corpora.

Feature-Rich Twitter Named Entity Recognition and Classification
Utpal Kumar Sikdar | Björn Gambäck
Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT)

Twitter named entity recognition is the process of identifying proper names and classifying them into some predefined labels/categories. The paper introduces a Twitter named entity system using a supervised machine learning approach, namely Conditional Random Fields. A large set of different features was developed and the system was trained using these. The Twitter named entity task can be divided into two parts: i) Named entity extraction from tweets and ii) Twitter name classification into ten different types. For Twitter named entity recognition on unseen test data, our system obtained the second highest F1 score in the shared task: 63.22%. The system performance on the classification task was worse, with an F1 measure of 40.06% on unseen test data, which was the fourth best of the ten systems participating in the shared task.

Language Identification in Code-Switched Text Using Conditional Random Fields and Babelnet
Utpal Kumar Sikdar | Björn Gambäck
Proceedings of the Second Workshop on Computational Approaches to Code Switching

Twitter Named Entity Extraction and Linking Using Differential Evolution
Utpal Kumar Sikdar | Björn Gambäck
Proceedings of the 13th International Conference on Natural Language Processing

NTNUSentEval at SemEval-2016 Task 4: Combining General Classifiers for Fast Twitter Sentiment Analysis
Brage Ekroll Jahren | Valerij Fredriksen | Björn Gambäck | Lars Bungum
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

Negation Scope Detection for Twitter Sentiment Analysis
Johan Reitan | Jørgen Faret | Björn Gambäck | Lars Bungum
Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Self-Organizing Maps for Classification of a Multi-Labeled Corpus
Lars Bungum | Björn Gambäck
Proceedings of the 12th International Conference on Natural Language Processing

Sentence Boundary Detection for Social Media Text
Dwijen Rudrapal | Anupam Jamatia | Kunal Chakma | Amitava Das | Björn Gambäck
Proceedings of the 12th International Conference on Natural Language Processing

Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
Anupam Jamatia | Björn Gambäck | Amitava Das
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

Agent-based modeling of language evolution
Torvald Lekvam | Björn Gambäck | Lars Bungum
Proceedings of the 5th Workshop on Cognitive Aspects of Computational Language Learning (CogACLL)

Identifying Languages at the Word Level in Code-Mixed Indian Social Media Text
Amitava Das | Björn Gambäck
Proceedings of the 11th International Conference on Natural Language Processing

NTNU: Measuring Semantic Similarity with Sublexical Feature Representations and Soft Cardinality
André Lynum | Partha Pakray | Björn Gambäck | Sergio Jimenez
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries
Lars Bungum | Björn Gambäck | André Lynum | Erwin Marsi
Proceedings of the 9th Workshop on Multiword Expressions

Towards Dynamic Word Sense Discrimination with Random Indexing
Hans Moen | Erwin Marsi | Björn Gambäck
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality

NTNU-CORE: Combining strong features for semantic similarity
Erwin Marsi | Hans Moen | Lars Bungum | Gleb Sizov | Björn Gambäck | André Lynum
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

NTNU: Domain Semi-Independent Short Message Sentiment Classification
Øyvind Selmer | Mikael Brevik | Björn Gambäck | Lars Bungum
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Sentimantics: Conceptual Spaces for Lexical Sentiment Polarity Representation with Contextuality
Amitava Das | Björn Gambäck
Proceedings of the 3rd Workshop in Computational Approaches to Subjectivity and Sentiment Analysis

2010

Proceedings of the 2010 Workshop on Companionable Dialogue Systems
Yorick Wilks | Björn Gambäck | Morena Danieli
Proceedings of the 2010 Workshop on Companionable Dialogue Systems

2009

A Mobile Health and Fitness Companion Demonstrator
Olov Ståhl | Björn Gambäck | Markku Turunen | Jaakko Hakulinen
Proceedings of the Demonstrations Session at EACL 2009

Methods for Amharic Part-of-Speech Tagging
Björn Gambäck | Fredrik Olsson | Atelach Alemu Argaw | Lars Asker
Proceedings of the First Workshop on Language Technologies for African Languages

2005

Natural Language Processing at the School of Information Studies for Africa
Björn Gambäck | Gunnar Eriksson | Athanassia Fourla
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

Classifying Amharic News Text Using Self-Organizing Maps
Samuel Eyassu | Björn Gambäck
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2003

Introduction: Dialogue Systems: Interaction, Adaptation and Styles of Management
Kristiina Jokinen | Björn Gämback | William Black | Roberta Catizone | Yorick Wilks
Proceedings of the 2003 EACL Workshop on Dialogue Systems: interaction, adaptation and styles of management

2000

Designing a System for Swedish Spoken Document Retrieval
Botond Pakucs | Björn Gambäck
Proceedings of the 12th Nordic Conference of Computational Linguistics (NODALIDA 1999)

Experiences of Language Engineering Algorithm Reuse
Björn Gambäck | Fredrik Olsson
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Composing a General-Purpose Toolbox for Swedish
Fredrik Olsson | Björn Gambäck
Proceedings of the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems

1998

Semantic-Head Based Resolution of Scopal Ambiguities
Bjorn Gamback | Johan Bos
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Semantic-Head Based Resolution of Scopal Ambiguities
Bjorn Gamback | Johan Bos
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

1996

Underspecified Japanese Semantics in a Machine Translation System
Björn Gambäck | Christian Lieske | Yoshiki Mori
Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation

Compositional Semantics in Verbmobil
Johan Bos | Bjorn Gamback | Christian Lieske | Yoshiki Mori | Manfred Pinkal | Karsten Worm
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

Swedish Language Processing in the Spoken Language Translator
Björn Gambäck
Proceedings of the 10th Nordic Conference of Computational Linguistics (NODALIDA 1995)

1994

Complex Verb Transfer Phenomena in the SLT System
Björn Gambäck | Ivan Bretan
Proceedings of the First Conference of the Association for Machine Translation in the Americas

Tagging Experiments Using Neural Networks
Martin Eineborg | Björn Gambäck
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)

On Implementing Swedish Tense and Aspect
Björn Gambäck
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)

Clustering Sentences – Making Sense of Synonymous Sentences
Jussi Karlgren | Björn Gambäck | Christer Samuelsson
Proceedings of the 9th Nordic Conference of Computational Linguistics (NODALIDA 1993)

1993

A Speech to Speech Translation System Built From Standard Components
Manny Rayner | Hiyan Alshawi | Ivan Bretan | David Carter | Vassilios Digalakis | Bjorn Gamback | Jaan Kaja | Jussi Karlgren | Bertil Lyberg | Steve Pulman | Patti Price | Christer Samuelsson
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

English-Swedish translation of dialogue software
Hiyan Alshawi | David Carter | Steve Pulman | Manny Rayner | Björn Gambäck
Proceedings of Translating and the Computer 14: Quality standards and the implementation of technology in translation

Ebl²: An Approach to Automatic Lexical Acquisition
Lars Asker | Bjorn Gamback | Christer Samuelsson
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

1991

Translation by Quasi Logical Form Transfer
Hiyan Alshawi | David Carter | Manny Rayner | Bjorn Gamback
29th Annual Meeting of the Association for Computational Linguistics