Gary Farmaner


2025

Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders
Saeed Abbasi | Aijun An | Heidar Davoudi | Ron Di Carlantonio | Gary Farmaner
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track

We introduce a novel Transformer-based method for document segmentation, tailored for practical, real-world applications. This method utilizes overlapping text sequences with a unique position-aware weighting mechanism to enhance segmentation accuracy. Through comprehensive experiments on both public and proprietary datasets, we demonstrate significant improvements, establishing new state-of-the-art standards by achieving up to a 10% increase in segmentation F1 score compared to existing methods. Additionally, we explore the application of our segmentation method in downstream retrieval-augmented question answering tasks, where it improves the quality of generated responses by 5% while achieving up to four times greater efficiency. These results underscore our model’s potential as a robust and scalable solution for real-world text segmentation challenges.
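As a rough illustration of the general idea described in the abstract (not the paper's exact architecture), the sketch below scores token-level segment boundaries from overlapping windows and combines the per-window predictions with weights that favour positions near the window centre. The encoder name, window size, stride, and triangular weighting function are all illustrative assumptions, and the classification head shown here would need to be fine-tuned on labelled boundaries before it produces meaningful scores.

```python
"""Sketch: document segmentation with overlapping windows and
position-aware weighting. All hyperparameters below are assumptions."""
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "bert-base-uncased"   # assumed encoder backbone
WINDOW, STRIDE = 256, 128          # assumed overlapping token windows

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Untrained boundary head here; in practice this would be fine-tuned.
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def boundary_probs(text: str) -> np.ndarray:
    """Per-token boundary probability, averaged over overlapping windows
    with weights that favour tokens near the centre of each window."""
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    n = len(ids)
    scores, weights = np.zeros(n), np.zeros(n)
    starts = list(range(0, max(n - WINDOW, 0) + 1, STRIDE))
    if starts[-1] + WINDOW < n:                 # make sure the tail is covered
        starts.append(n - WINDOW)
    for start in starts:
        chunk = ids[start:start + WINDOW]
        with torch.no_grad():
            logits = model(input_ids=torch.tensor([chunk])).logits[0]  # (len, 2)
        probs = torch.softmax(logits, dim=-1)[:, 1].numpy()            # P(boundary)
        centre = (len(chunk) - 1) / 2
        pos = np.arange(len(chunk))
        w = 1.0 - np.abs(pos - centre) / (centre + 1)   # triangular position weights
        scores[start:start + len(chunk)] += w * probs
        weights[start:start + len(chunk)] += w
    return scores / np.maximum(weights, 1e-8)
```

The position weighting is the key point of the illustration: a token near a window edge has less surrounding context, so its prediction from that window contributes less to the final average than the prediction it receives from a window where it sits near the centre.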

2024

Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models
James Fletcher | Nicholas Dehnen | Seyed Nima Tayarani Bathaie | Aijun An | Heidar Davoudi | Ron DiCarlantonio | Gary Farmaner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track

To enhance a question-answering system for automotive drivers, we tackle the problem of automatic generation of icon image descriptions. The descriptions can match the driver’s query about the icon appearing on the dashboard and tell the driver what is happening so that they may take an appropriate action. We use three state-of-the-art large vision-language models to generate both visual and functional descriptions based on the icon image and its context information in the car manual. Both zero-shot and few-shot prompts are used. We create a dataset containing over 400 icons with their ground-truth descriptions and use it to evaluate model-generated descriptions across several performance metrics. Our evaluation shows that two of these models (GPT-4o and Claude 3.5) performed well on this task, while the third model (LLaVA-NEXT) performed poorly.
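For a concrete picture of the zero-shot setting, the sketch below sends an icon image plus its manual context to a vision-language model and asks for a visual and a functional description. The prompt wording, the `describe_icon` helper, and the example inputs are illustrative assumptions rather than the prompts or evaluation setup used in the paper; only the general API call pattern is standard.

```python
"""Sketch: zero-shot icon description with a vision-language model.
Prompt text and helper names are assumptions, not the paper's prompts."""
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_icon(image_path: str, manual_context: str) -> str:
    """Ask the model for a visual and functional description of a dashboard
    icon, given surrounding text from the car manual as context."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    prompt = (
        "You are assisting a driver. Describe this dashboard icon.\n"
        "1. Visual description: what the icon looks like.\n"
        "2. Functional description: what it indicates and what the driver should do.\n"
        f"Context from the car manual: {manual_context}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical usage:
# describe_icon("icons/tire_pressure.png",
#               "The tire pressure monitoring system warns when pressure is low.")
```

A few-shot variant would prepend a small number of icon/description pairs to the message list before the query icon, which is the main difference between the two prompting regimes compared in the paper.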