2025
pdf
bib
abs
Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg Vasilyev
|
Randy Sawaya
|
John Bohannon
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
A query encoder of a dual passage retrieval system can be tuned for specific types of queries or domains, while the precomputed and stored documents representations are kept intact. Switching from one query encoder to another when needed is easily feasible, unlike overhauling the embeddings of a whole knowledge base. In this work we raise a question: Can the generic, original qualities of the encoder be preserved or at least left not too degraded when it is tuned on a narrow domain? We conducted experiments on a high quality multilingual embedding model: Tuning it on a single English-only dataset, we observe that the tuning not only preserves the multilingual qualities, but even improves them. The embedding qualities on distinctly different data are also improved or at least preserved. Drawing on our observations, we suggest a more general hypothesis: Tuning with intentionally low learning rate can preserve or improve a system’s properties acquired in training, but not specifically targeted by tuning. We call this adiabatic tuning and provide tentative explanations.
2024
pdf
bib
abs
Linear Cross-Lingual Mapping of Sentence Embeddings
Oleg Vasilyev
|
Fumika Isono
|
John Bohannon
Findings of the Association for Computational Linguistics: ACL 2024
Semantics of a sentence is defined with much less ambiguity than semantics of a single word, and we assume that it should be better preserved by translation to another language. If multilingual sentence embeddings intend to represent sentence semantics, then the similarity between embeddings of any two sentences must be invariant with respect to translation. Based on this suggestion, we consider a simple linear cross-lingual mapping as a possible improvement of the multilingual embeddings. We also consider deviation from orthogonality conditions as a measure of deficiency of the embeddings.
2022
pdf
bib
abs
Does Summary Evaluation Survive Translation to Other Languages?
Spencer Braun
|
Oleg Vasilyev
|
Neslihan Iskender
|
John Bohannon
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
The creation of a quality summarization dataset is an expensive, time-consuming effort, requiring the production and evaluation of summaries by both trained humans and machines. The returns to such an effort would increase significantly if the dataset could be used in additional languages without repeating human annotations. To investigate how much we can trust machine translation of summarization datasets, we translate the English SummEval dataset to seven languages and compare performances across automatic evaluation measures. We explore equivalence testing as the appropriate statistical paradigm for evaluating correlations between human and automated scoring of summaries. We also consider the effect of translation on the relative performance between measures. We find some potential for dataset reuse in languages similar to the source and along particular dimensions of summary quality. Our code and data can be found at 
https://github.com/PrimerAI/primer-research/.
2021
pdf
bib
abs
ESTIME: Estimation of Summary-to-Text Inconsistency by Mismatched Embeddings
Oleg Vasilyev
|
John Bohannon
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
We propose a new reference-free summary quality evaluation measure, with emphasis on the faithfulness. The measure is based on finding and counting all probable potential inconsistencies of the summary with respect to the source document. The proposed ESTIME, Estimator of Summary-to-Text Inconsistency by Mismatched Embeddings, correlates with expert scores in summary-level SummEval dataset stronger than other common evaluation measures not only in Consistency but also in Fluency. We also introduce a method of generating subtle factual errors in human summaries. We show that ESTIME is more sensitive to subtle errors than other common evaluation measures.
pdf
bib
Is Human Scoring the Best Criteria for Summary Evaluation?
Oleg Vasilyev
|
John Bohannon
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
2020
pdf
bib
abs
Fill in the BLANC: Human-free quality estimation of document summaries
Oleg Vasilyev
|
Vedant Dharnidharka
|
John Bohannon
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems
We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the document’s text. We present evidence that BLANC scores have as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.