Kazuhiro Nakadai
2026
Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models
Ragib Amin Nihal | Rui Wen | Kazuhiro Nakadai | Jun Sakuma
Findings of the Association for Computational Linguistics: ACL 2026
Ragib Amin Nihal | Rui Wen | Kazuhiro Nakadai | Jun Sakuma
Findings of the Association for Computational Linguistics: ACL 2026
Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to bypass safety constraints gradually. These attacks target different harm categories through distinct conversational approaches. Existing multi-turn methods often rely on heuristic or ad hoc exploration strategies, providing limited insight into underlying model weaknesses. The relationship between conversation patterns and model vulnerabilities across harm categories remains poorly understood. We propose Pattern Enhanced Chain of Attack (PE-CoA), a framework of five conversation patterns to construct multi-turn jailbreaks through natural dialogue. Evaluating PE-CoA on twelve LLMs spanning ten harm categories, we achieve state-of-the-art performance, uncovering pattern-specific vulnerabilities and LLM behavioral characteristics: models exhibit distinct weakness profiles, defense to one pattern does not generalize to others, and model families share similar failure modes. These findings highlight limitations of safety training and indicate the need for pattern-aware defenses. Code available on: https://github.com/Ragib-Amin-Nihal/PE-CoA
2025
Improvement in Sign Language Translation Using Text CTC Alignment
Sihan Tan | Taro Miyazaki | Nabeela Khan | Kazuhiro Nakadai
Proceedings of the 31st International Conference on Computational Linguistics
Sihan Tan | Taro Miyazaki | Nabeela Khan | Kazuhiro Nakadai
Proceedings of the 31st International Conference on Computational Linguistics
Current sign language translation (SLT) approaches often rely on gloss-based supervision with Connectionist Temporal Classification (CTC), limiting their ability to handle non-monotonic alignments between sign language video and spoken text. In this work, we propose a novel method combining joint CTC/Attention and transfer learning. The joint CTC/Attention introduces hierarchical encoding and integrates CTC with the attention mechanism during decoding, effectively managing both monotonic and non-monotonic alignments. Meanwhile, transfer learning helps bridge the modality gap between vision and language in SLT. Experimental results on two widely adopted benchmarks, RWTH-PHOENIX-Weather 2014 T and CSL-Daily, show that our method achieves results comparable to state-of-the-art and outperforms the pure-attention baseline. Additionally, this work opens a new door for future research into gloss-free SLT using text-based CTC alignment.
Multilingual Gloss-free Sign Language Translation: Towards Building a Sign Language Foundation Model
Sihan Tan | Taro Miyazaki | Kazuhiro Nakadai
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Sihan Tan | Taro Miyazaki | Kazuhiro Nakadai
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Sign Language Translation (SLT) aims to convert sign language (SL) videos into spoken language text, thereby bridging the communication gap between the sign and the spoken community. While most existing works focus on translating a single SL into a single spoken language (one-to-one SLT), leveraging multilingual resources could mitigate low-resource issues and enhance accessibility. However, multilingual SLT (MLSLT) remains unexplored due to language conflicts and alignment difficulties across SLs and spoken languages. To address these challenges, we propose a multilingual gloss-free model with dual CTC objectives for token-level SL identification and spoken text generation. Our model supports 10 SLs and handles one-to-one, many-to-one, and many-to-many SLT tasks, achieving competitive performance compared to state-of-the-art methods on three widely adopted benchmarks: multilingual SP-10, PHOENIX14T, and CSL-Daily.
2024
SEDA: Simple and Effective Data Augmentation for Sign Language Understanding
Sihan Tan | Taro Miyazaki | Katsutoshi Itoyama | Kazuhiro Nakadai
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
Sihan Tan | Taro Miyazaki | Katsutoshi Itoyama | Kazuhiro Nakadai
Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources
2018
Deep JSLC: A Multimodal Corpus Collection for Data-driven Generation of Japanese Sign Language Expressions
Heike Brock | Kazuhiro Nakadai
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Heike Brock | Kazuhiro Nakadai
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2016
Construction of Japanese Audio-Visual Emotion Database and Its Application in Emotion Recognition
Nurul Lubis | Randy Gomez | Sakriani Sakti | Keisuke Nakamura | Koichiro Yoshino | Satoshi Nakamura | Kazuhiro Nakadai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Nurul Lubis | Randy Gomez | Sakriani Sakti | Keisuke Nakamura | Koichiro Yoshino | Satoshi Nakamura | Kazuhiro Nakadai
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Emotional aspects play a vital role in making human communication a rich and dynamic experience. As we introduce more automated system in our daily lives, it becomes increasingly important to incorporate emotion to provide as natural an interaction as possible. To achieve said incorporation, rich sets of labeled emotional data is prerequisite. However, in Japanese, existing emotion database is still limited to unimodal and bimodal corpora. Since emotion is not only expressed through speech, but also visually at the same time, it is essential to include multiple modalities in an observation. In this paper, we present the first audio-visual emotion corpora in Japanese, collected from 14 native speakers. The corpus contains 100 minutes of annotated and transcribed material. We performed preliminary emotion recognition experiments on the corpus and achieved an accuracy of 61.42% for five classes of emotion.