Vera Aleksić
Also published as: Vera Aleksic
2023
Lexicon-Driven Automatic Sentence Generation for the Skills Section in a Job Posting
Vera Aleksic | Mona Brems | Anna Mathes | Theresa Bertele
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
This paper presents a sentence generation pipeline as implemented on the online job board Stepstone. The goal is to automatically create, for a given input skill, a set of sentences for the candidate profile and task description sections of a job ad. The sentences must cover two “tone of voice” variants in German (Du, Sie), three experience levels (junior, mid, senior), and two optionality values (the skill is mandatory or optional/nice to have). The generation process distinguishes between soft skills, natural language competencies, and hard skills, as well as more specific sub-categories such as IT skills and programming languages. To create grammatically consistent text, morphosyntactic features from the proprietary skill ontology and lexicon are consulted. The approach is lexicon-driven: it compares all lexical features of a new input skill with those of skills already in the sentence database and creates new sentences according to the corresponding templates.
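The variant grid and the feature-matching step described in this abstract can be illustrated with a minimal Python sketch. It is not the Stepstone implementation: the feature names, the `template_db` structure, and the placeholder templates are all hypothetical, and the templates are deliberately not real German sentences.

```python
from itertools import product

# Variant dimensions named in the abstract.
TONES = ("du", "sie")
LEVELS = ("junior", "mid", "senior")
OPTIONALITY = ("mandatory", "optional")


def variant_grid():
    """All tone x level x optionality combinations (2 * 3 * 2 = 12)."""
    return list(product(TONES, LEVELS, OPTIONALITY))


def features_key(features):
    """Turn a feature dict into a hashable, order-independent key."""
    return tuple(sorted(features.items()))


def generate(skill_name, features, template_db):
    """Reuse the templates stored for an identical feature set.

    Returns an empty dict when no known skill shares the features
    (an assumption of this sketch; the paper does not specify this case).
    """
    templates = template_db.get(features_key(features))
    if templates is None:
        return {}
    return {variant: tpl.format(skill=skill_name)
            for variant, tpl in templates.items()}


# Hypothetical feature set and placeholder templates for illustration only.
known = {"category": "programming_language", "pos": "noun"}
template_db = {
    features_key(known): {
        v: f"<{v[0]}|{v[1]}|{v[2]}> requires {{skill}}" for v in variant_grid()
    }
}

sentences = generate("Python", known, template_db)
```

Keying the database on the full feature set mirrors the abstract's statement that all lexical features are compared; a real system would presumably also need a strategy for skills whose features match no existing entry.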
2012
Creating Term and Lexicon Entries from Phrase Tables
Gregor Thurmair | Vera Aleksić
Proceedings of the 16th Annual Conference of the European Association for Machine Translation
Large Scale Lexical Analysis
Gregor Thurmair | Vera Aleksić | Christoph Schwarz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. Lexical analysis has the task of assigning linguistic information (such as part of speech, inflectional class, gender, subcategorisation frame, and semantic properties) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotations for later (syntactic) disambiguation, be it tagging or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (more than 50 million tokens), providing a workflow that takes crawled corpora as input and produces POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.
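The tool sequence named in this abstract (sentence splitting, lexicon lookup, decomposition, POS defaulting) can be sketched as a fallback chain. The Python below is a toy illustration under stated assumptions, not the PANACEA component: the lexicon, the tag names, the two-part compound split, and the suffix-based default are all hypothetical.

```python
import re

# Toy lexicon (hypothetical): lowercased form -> all possible POS tags.
LEXICON = {
    "haus": {"NOUN"},
    "tür": {"NOUN"},
    "the": {"DET"},
    "run": {"NOUN", "VERB"},
}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def decompose(token):
    """Split into two known parts; the head (last part) supplies the tags."""
    low = token.lower()
    for i in range(2, len(low) - 1):
        left, right = low[:i], low[i:]
        if left in LEXICON and right in LEXICON:
            return set(LEXICON[right])
    return None


def default_pos(token):
    """Last resort: a crude suffix guess, otherwise assume a noun."""
    return {"ADV"} if token.lower().endswith("ly") else {"NOUN"}


def analyse(token):
    """Lexicon lookup, then decomposition, then POS defaulting."""
    low = token.lower()
    if low in LEXICON:
        # Ambiguous tokens keep every reading for later disambiguation.
        return set(LEXICON[low])
    return decompose(token) or default_pos(token)
```

For example, `analyse("run")` keeps both readings, `analyse("Haustür")` is resolved through decomposition into `haus` + `tür`, and an unknown form such as `quickly` falls through to the default guess.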
2011
Personal Translator at WMT2011
Vera Aleksić | Gregor Thurmair
Proceedings of the Sixth Workshop on Statistical Machine Translation