Saher Esmeir
2023
Semantic Similarity Covariance Matrix Shrinkage
Guillaume Becquin
|
Saher Esmeir
Findings of the Association for Computational Linguistics: EMNLP 2023
An accurate estimation of the covariance matrix is a critical component of many applications in finance, including portfolio optimization. The sample covariance suffers from the curse of dimensionality when the number of observations is in the same order or lower than the number of variables. This tends to be the case in portfolio optimization, where a portfolio manager can choose between thousands of stocks using historical daily returns to guide their investment decisions. To address this issue, past works proposed linear covariance shrinkage to regularize the estimated matrix. While effective, the proposed methods relied solely on historical price data and thus ignored company fundamental data. In this work, we propose to utilise semantic similarity derived from textual descriptions or knowledge graphs to improve the covariance estimation. Rather than using the semantic similarity directly as a biased estimator to the covariance, we employ it as a shrinkage target. The resulting covariance estimators leverage both semantic similarity and recent price history, and can be readily adapted to a broad range of financial securities. The effectiveness of the approach is demonstrated for a period including diverse market conditions and compared with the covariance shrinkage prior art.
2022
Entity Retrieval from Multilingual Knowledge Graphs
Saher Esmeir
|
Arthur Câmara
|
Edgar Meij
Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL)
Knowledge Graphs (KGs) are structured databases that capture real-world entities and their relationships. The task of entity retrieval from a KG aims at retrieving a ranked list of entities relevant to a given user query. While English-only entity retrieval has attracted considerable attention, user queries, as well as the information contained in the KG, may be represented in multiple—and possibly distinct—languages. Furthermore, KG content may vary between languages due to different information sources and points of view. Recent advances in language representation have enabled natural ways of bridging gaps between languages. In this paper, we therefore propose to utilise language models (LMs) and diverse entity representations to enable truly multilingual entity retrieval. We propose two approaches: (i) an array of monolingual retrievers and (ii) a single multilingual retriever, trained using queries and documents in multiple languages. We show that while our approach is on par with the significantly more complex state-of-the-art method for the English task, it can be successfully applied to virtually any language with a LM. Furthermore, it allows languages to benefit from one another, yielding significantly better performance, both for low- and high-resource languages.
2021
SERAG: Semantic Entity Retrieval from Arabic Knowledge Graphs
Saher Esmeir
Proceedings of the Sixth Arabic Natural Language Processing Workshop
Knowledge graphs (KGs) are widely used to store and access information about entities and their relationships. Given a query, the task of entity retrieval from a KG aims at presenting a ranked list of entities relevant to the query. Lately, an increasing number of models for entity retrieval have shown a significant improvement over traditional methods. These models, however, were developed for English KGs. In this work, we build on one such system, named KEWER, to propose SERAG (Semantic Entity Retrieval from Arabic knowledge Graphs). Like KEWER, SERAG uses random walks to generate entity embeddings. DBpedia-Entity v2 is considered the standard test collection for entity retrieval. We discuss the challenges of using it for non-English languages in general and Arabic in particular. We provide an Arabic version of this standard collection, and use it to evaluate SERAG. SERAG is shown to significantly outperform the popular BM25 model thanks to its multi-hop reasoning.