Sara Noeman


DIMSIM: An Accurate Chinese Phonetic Similarity Algorithm Based on Learned High Dimensional Encoding
Min Li | Marina Danilevsky | Sara Noeman | Yunyao Li
Proceedings of the 22nd Conference on Computational Natural Language Learning

Phonetic similarity algorithms identify words and phrases with similar pronunciation and are used in many natural language processing tasks. However, existing approaches are designed mainly for Indo-European languages and fail to capture the unique properties of Chinese pronunciation. In this paper, we propose DIMSIM, a phonetic similarity algorithm for Chinese based on high-dimensional encodings. The encodings are learned from annotated data and map initial and final phonemes separately into n-dimensional coordinates. Pinyin phonetic similarities are then calculated by aggregating the similarities of the initial, the final, and the tone. DIMSIM demonstrates a 7.5X improvement in mean reciprocal rank over state-of-the-art phonetic similarity approaches.
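
The aggregation described in the abstract can be illustrated with a minimal Python sketch. It is not the DIMSIM implementation: the 2-D coordinates, the tone penalty, and the function name `syllable_distance` are all hypothetical, and summing the three components is only one assumed way to aggregate them. The learned encodings from the paper are not reproduced here.

```python
import math

# Hypothetical 2-D coordinates, invented for illustration only; the paper
# learns its encodings from annotated data and they are not shown here.
INITIAL_COORDS = {"zh": (1.0, 0.0), "z": (1.2, 0.1), "b": (5.0, 3.0)}
FINAL_COORDS = {"ang": (0.0, 1.0), "an": (0.2, 1.1), "i": (4.0, 4.0)}
TONE_PENALTY = 0.5  # assumed cost when two syllables differ in tone


def syllable_distance(a, b):
    """Distance between two (initial, final, tone) syllables; lower is closer."""
    init_a, final_a, tone_a = a
    init_b, final_b, tone_b = b
    # Initials and finals are compared separately, each in its own space.
    d_initial = math.dist(INITIAL_COORDS[init_a], INITIAL_COORDS[init_b])
    d_final = math.dist(FINAL_COORDS[final_a], FINAL_COORDS[final_b])
    d_tone = 0.0 if tone_a == tone_b else TONE_PENALTY
    # Assumed aggregation: a plain sum of the three components.
    return d_initial + d_final + d_tone


near = syllable_distance(("zh", "ang", 1), ("z", "an", 1))
far = syllable_distance(("zh", "ang", 1), ("b", "i", 1))
print(near < far)
```

With these made-up coordinates, the pair "zhang"/"zan" comes out closer than "zhang"/"bi", which is the kind of ranking such an encoding is meant to support.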


IBM_EG-CORE: Comparing multiple Lexical and NE matching features in measuring Semantic Textual similarity
Sara Noeman
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity


Language Independent Transliteration Mining System Using Finite State Automata Framework
Sara Noeman | Amgad Madkour
Proceedings of the 2010 Named Entities Workshop


Language Independent Transliteration System Using Phrase-based SMT Approach on Substrings
Sara Noeman
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)


Language Independent Text Correction using Finite State Automata
Ahmed Hassan | Sara Noeman | Hany Hassan
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II


Graph Based Semi-Supervised Approach for Information Extraction
Hany Hassan | Ahmed Hassan | Sara Noeman
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing