WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating

Hyeong-seob Kim; Sang-Woo Son; Hyun-woo Cho; Hyeonsang Kim; Jinmo Kim

Abstract

Real-time speech translation with large language models (LLMs) has become feasible in controlled wideband settings—mobile apps, web browsers, and end-to-end full-duplex systems pushing latency below 200 ms—where developers can assume client-side echo cancellation. However, deploying such systems over the Public Switched Telephone Network (PSTN) remains challenging due to narrowband G.711 audio, unpredictable round-trip delays, and absence of client-side signal processing. We present **WIGVO** (WIGTN Voice-Only), a server-side relay system that enables bidirectional LLM-based speech translation over ordinary telephone calls without requiring app installation or carrier integration. A central contribution is addressing what we term *echo-induced self-reinforcing translation loops*: synthesized speech echoing back through the PSTN gets re-ingested and repeatedly translated. WIGVO solves this through a dual-session architecture with deterministic silence injection and energy-based voice activity detection (VAD) gating. We evaluate WIGVO on 155 Korean–English PSTN calls (148 instrumented, 147 completed) across three communication modes—voice-to-voice (V2V), text-to-voice (T2V), and full-agent—observing 555 ms median caller-to-callee latency and 2,684 ms median callee-to-caller latency, zero echo-induced translation loops, COMET semantic adequacy of 0.71 (en→ko) and 0.62 (ko→en) against offline LLM references, and USD 0.28 per minute cost. The system is deployed at https://wigvo.wigtn.com, with a video walkthrough at https://youtu.be/4Uf6zMPOInY. Evaluation scripts and anonymized call logs are available in the open-source repository.

Anthology ID: 2026.acl-demo.33
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Greg Durrett, Ping Jian
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 336–344
URL: https://aclanthology.org/2026.acl-demo.33/
Cite (ACL): Hyeong-seob Kim, Sang-Woo Son, Hyun-woo Cho, Hyeonsang Kim, and Jinmo Kim. 2026. WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 336–344, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): WIGVO: Real-Time Bidirectional Speech Translation over Legacy PSTN Calls via Dual-Session Echo Gating (Kim et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-demo.33.pdf
