Christian Wallraven

2023

Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which stems from two major challenges: 1) it possesses a large tag space and 2) it is multi-label. Though a retrieval approach is reportedly good at low-resource scenarios, there have been fewer efforts to directly address the data scarcity problem. To mitigate these issues, here we propose a novel retrieval approach CEAA that provides effective learning in educational text classification. Our main contributions are as follows: 1) we leverage transfer learning from question-answering datasets, and 2) we propose a simple but effective data augmentation method introducing cross-encoder style texts to a bi-encoder architecture for more efficient inference. An extensive set of experiments shows that our proposed method is effective in multi-label scenarios and low-resource tags compared to state-of-the-art models.

2010

pdf bib abs

The POETICON Corpus: Capturing Language Use and Sensorimotor Experience in Everyday Interaction
Katerina Pastra | Christian Wallraven | Michael Schultze | Argyro Vataki | Kathrin Kaulard
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Natural language use, acquisition, and understanding takes place usually in multisensory and multimedia communication environments. Therefore, for one to model language in its interaction and integration with sensorimotor experiences, one needs a representative corpus of such interplay. In this paper, we will present the first corpus of language use and sensorimotor experience recordings in everyday human:human interaction, in which spontaneous language communication has been recorded along with corresponding multiview video recordings, recordings of 3D full body kinematics, and 3D tracking of objects in focus. It is a twelve-hour corpus which comprises of six everyday human:human interaction scenes, each one performed 3 times by 4 different English-speaking couples (interaction between a male and a female actor), each couple acting each scene in two settings: a fully naturalistic setting in which 5-camera multi-view video recordings take place, and a high-tech setting, with full body motion capture for both individuals, a 2-camera multiview video recording, and 3D tracking of focus objects. The corpus has been developed within an EU-funded cognitive systems research project, POETICON (http://www.poeticon.eu), and represents a new type of language resources for cognitive systems. Namely, a corpus that reveals the dynamic role of language in its interplay with sensorimotor experiences and which allows one to computationally model this interplay.

Co-authors

Venues

Findings1
LREC1

Fix author