Sung-Ju Lee
2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
|
Biniyam Aschalew Tolera
|
Taesik Gong
|
Kimin Lee
|
Sung-Ju Lee
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. In this paper, we propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). Specifically, we design a visual prompt that directs MLLMs to utilize visualized sensor data alongside descriptions of the target sensory task. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy compared to text-based prompts and reducing token costs by 15.8 times. Our findings highlight the effectiveness and cost-efficiency of using visual prompts with MLLMs for various sensory tasks. The source code is available at https://github.com/diamond264/ByMyEyes.
2023
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning
Jaemin Shin
|
Hyungjun Yoon
|
Seungjoo Lee
|
Sungjoon Park
|
Yunxin Liu
|
Jinho Choi
|
Sung-Ju Lee
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Psychiatrists diagnose mental disorders via the linguistic use of patients. Still, due to data privacy, existing passive mental health monitoring systems use alternative features such as activity, app usage, and location via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. We explore multiple model designs by comparing their performance and overhead for FedTherapist to overcome the complex nature of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones’ large and noisy text for mental health signal sensing. Our IRB-approved evaluation of the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows higher accuracy of FedTherapist compared with the performance with non-language features, achieving 0.15 AUROC improvement and 8.21% MAE reduction.
Search
Co-authors
- Hyungjun Yoon 2
- Biniyam Aschalew Tolera 1
- Taesik Gong 1
- Kimin Lee 1
- Jaemin Shin 1
- show all...