Investigating the detection of Tortured Phrases in Scientific Literature
Puthineath Lay | Martin Lentschat | Cyril Labbe
Proceedings of the Third Workshop on Scholarly Document Processing
With the help of online tools, unscrupulous authors can today generate a pseudo-scientific article and attempt to publish it. Some of these tools work by replacing or paraphrasing existing texts to produce new content, but they have a tendency to generate nonsensical expressions. A recent study introduced the concept of “tortured phrase”, an unexpected odd phrase that appears instead of the fixed expression. E.g. counterfeit consciousness instead of artificial intelligence. The present study aims at investigating how tortured phrases, that are not yet listed, can be detected automatically. We conducted several experiments, including non-neural binary classification, neural binary classification and cosine similarity comparison of the phrase tokens, yielding noticeable results.