Alica Hövelmeyer
2022
Varanalysis@SV-Ident 2022: Variable Detection and Disambiguation Based on Semantic Similarity
Alica Hövelmeyer | Yavuz Selim Kartal
Proceedings of the Third Workshop on Scholarly Document Processing
This paper describes an approach to the SV-Ident Shared Task, which requires the detection and disambiguation of survey variables in sentences taken from social science publications. It treats both subtasks as problems of semantic textual similarity (STS) and relies on sentence transformers. Sentences and variables are compared for semantic similarity, both to detect sentences that contain variables and to disambiguate the respective variables. The analysis focuses on the effects of including different parts of the variables and on the differences between English and German instances. Additionally, for the variable detection task, a bag-of-words model is used to preselect sentences that are likely to contain a variable mention; the semantic similarity comparison is then performed only on these preselected sentences.
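The two-stage idea in the abstract (bag-of-words preselection, then similarity-based detection and disambiguation) can be illustrated with a minimal sketch. This is not the authors' implementation. All function names, the `threshold` and `min_hits` values, and the vocabulary-overlap form of the preselection are assumptions made for illustration. In particular, `bow_embed` is a token-count stand-in for a sentence transformer so that the example runs with the standard library only; the `embed` parameter marks where a real sentence encoder would be plugged in.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def bow_embed(text: str) -> Vector:
    """Stand-in embedding: raw token counts (NOT a sentence transformer)."""
    return dict(Counter(tokenize(text)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(token, 0.0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def preselect(sentences: List[str], vocabulary: Set[str],
              min_hits: int = 1) -> List[str]:
    """Hypothetical preselection: keep sentences that share at least
    `min_hits` distinct tokens with the variable vocabulary."""
    return [s for s in sentences
            if len(set(tokenize(s)) & vocabulary) >= min_hits]


def rank_variables(sentence: str, variables: Dict[str, str],
                   embed: Callable[[str], Vector] = bow_embed
                   ) -> List[Tuple[str, float]]:
    """Disambiguation: score every variable against the sentence and
    return (variable_id, similarity) pairs, most similar first."""
    sentence_vec = embed(sentence)
    scored = [(var_id, cosine(sentence_vec, embed(text)))
              for var_id, text in variables.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def detect(sentence: str, variables: Dict[str, str],
           threshold: float = 0.3,
           embed: Callable[[str], Vector] = bow_embed) -> bool:
    """Detection: flag the sentence if its best-matching variable reaches
    the (arbitrarily chosen) similarity threshold."""
    ranked = rank_variables(sentence, variables, embed)
    return bool(ranked) and ranked[0][1] >= threshold
```

Under these assumptions, `variables` maps a variable identifier to whichever text fields are included for it (for example, a label or question text), which is where the effect of including different parts of a variable could be varied. The preselection vocabulary could be built as the union of `tokenize(text)` over all variable texts.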