Textual Representations for Crosslingual Information Retrieval

Hang Zhang; Liling Tan

doi:10.18653/v1/2021.ecnlp-1.14

Abstract

In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token-level representation, we adopted subword- and character-level representations for information retrieval, which have been shown to improve neural machine translation by reducing out-of-vocabulary issues. We found that cross-lingual information retrieval performance can be improved by combining search results from subword- and token-level representations. Additionally, we improved search performance by combining and re-ranking the result sets from the different text representations for German, French, and Japanese.
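The abstract mentions three levels of representation (token, subword, character) and merging the result sets they produce. The minimal Python sketch below is for illustration only. It shows what each representation might look like for a German query and one generic way to merge ranked result lists. Several pieces are assumptions substituted here: the toy subword vocabulary, the greedy longest-match segmenter (WordPiece-style), and reciprocal rank fusion with the commonly used constant k = 60. The abstract does not say which segmentation model or re-ranking method the paper actually used.

```python
from collections import defaultdict


def token_repr(text: str) -> list[str]:
    # Token level: lowercase and split on whitespace.
    return text.lower().split()


def char_repr(text: str) -> list[str]:
    # Character level: every non-space character becomes its own unit.
    return [c for c in text.lower() if not c.isspace()]


def subword_repr(text: str, vocab: set[str]) -> list[str]:
    # Subword level (assumption): greedy longest-match segmentation of each
    # token against a hypothetical subword vocabulary. Unmatched characters
    # fall back to single characters, so no unit is ever out of vocabulary.
    pieces = []
    for token in token_repr(text):
        start = 0
        while start < len(token):
            end = len(token)
            # Shrink the candidate until it is in the vocabulary or one char long.
            while end > start + 1 and token[start:end] not in vocab:
                end -= 1
            pieces.append(token[start:end])
            start = end
    return pieces


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Merge ranked result lists (one per representation): each document
    # scores sum(1 / (k + rank)) over every list it appears in.
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score, breaking ties by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    vocab = {"lauf", "schuhe", "kauf", "en"}  # hypothetical toy vocabulary
    print(subword_repr("Laufschuhe kaufen", vocab))  # ['lauf', 'schuhe', 'kauf', 'en']
    print(reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d3", "d1"]]))  # ['d2', 'd1', 'd3']
```

The subword split lets a compound such as "Laufschuhe" match documents that contain only "Schuhe", which is the out-of-vocabulary benefit the abstract alludes to. Rank-based fusion is used here because it needs no score calibration across the different representations.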

Anthology ID: 2021.ecnlp-1.14
Volume: Proceedings of the 4th Workshop on e-Commerce and NLP
Month: August
Year: 2021
Address: Online
Editors: Shervin Malmasi, Surya Kallumadi, Nicola Ueffing, Oleg Rokhlenko, Eugene Agichtein, Ido Guy
Venue: ECNLP
Publisher: Association for Computational Linguistics
Pages: 116–122
URL: https://aclanthology.org/2021.ecnlp-1.14/
DOI: 10.18653/v1/2021.ecnlp-1.14
Bibkey: zhang-tan-2021-textual
Cite (ACL): Hang Zhang and Liling Tan. 2021. Textual Representations for Crosslingual Information Retrieval. In Proceedings of the 4th Workshop on e-Commerce and NLP, pages 116–122, Online. Association for Computational Linguistics.
Cite (Informal): Textual Representations for Crosslingual Information Retrieval (Zhang & Tan, ECNLP 2021)
PDF: https://aclanthology.org/2021.ecnlp-1.14.pdf

Export citation

@inproceedings{zhang-tan-2021-textual,
    title = "Textual Representations for Crosslingual Information Retrieval",
    author = "Zhang, Hang  and
      Tan, Liling",
    editor = "Malmasi, Shervin  and
      Kallumadi, Surya  and
      Ueffing, Nicola  and
      Rokhlenko, Oleg  and
      Agichtein, Eugene  and
      Guy, Ido",
    booktitle = "Proceedings of the 4th Workshop on e-Commerce and NLP",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.ecnlp-1.14/",
    doi = "10.18653/v1/2021.ecnlp-1.14",
    pages = "116--122",
    abstract = "In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token level representation, we adopted the subword and character level representations for information retrieval that had shown to improve neural machine translation by reducing the out-of-vocabulary issues in machine translation. We found that crosslingual information retrieval performance can be improved by combining search results from subwords and token level representation. Additionally, we improved the search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="zhang-tan-2021-textual">
    <titleInfo>
        <title>Textual Representations for Crosslingual Information Retrieval</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Hang</namePart>
        <namePart type="family">Zhang</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Liling</namePart>
        <namePart type="family">Tan</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2021-08</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 4th Workshop on e-Commerce and NLP</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Shervin</namePart>
            <namePart type="family">Malmasi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Surya</namePart>
            <namePart type="family">Kallumadi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nicola</namePart>
            <namePart type="family">Ueffing</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Oleg</namePart>
            <namePart type="family">Rokhlenko</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Eugene</namePart>
            <namePart type="family">Agichtein</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Ido</namePart>
            <namePart type="family">Guy</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>Association for Computational Linguistics</publisher>
            <place>
                <placeTerm type="text">Online</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token level representation, we adopted the subword and character level representations for information retrieval that had shown to improve neural machine translation by reducing the out-of-vocabulary issues in machine translation. We found that crosslingual information retrieval performance can be improved by combining search results from subwords and token level representation. Additionally, we improved the search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.</abstract>
    <identifier type="citekey">zhang-tan-2021-textual</identifier>
    <identifier type="doi">10.18653/v1/2021.ecnlp-1.14</identifier>
    <location>
        <url>https://aclanthology.org/2021.ecnlp-1.14/</url>
    </location>
    <part>
        <date>2021-08</date>
        <extent unit="page">
            <start>116</start>
            <end>122</end>
        </extent>
    </part>
</mods>
</modsCollection>

%0 Conference Proceedings
%T Textual Representations for Crosslingual Information Retrieval
%A Zhang, Hang
%A Tan, Liling
%Y Malmasi, Shervin
%Y Kallumadi, Surya
%Y Ueffing, Nicola
%Y Rokhlenko, Oleg
%Y Agichtein, Eugene
%Y Guy, Ido
%S Proceedings of the 4th Workshop on e-Commerce and NLP
%D 2021
%8 August
%I Association for Computational Linguistics
%C Online
%F zhang-tan-2021-textual
%X In this paper, we explored different levels of textual representations for cross-lingual information retrieval. Beyond the traditional token level representation, we adopted the subword and character level representations for information retrieval that had shown to improve neural machine translation by reducing the out-of-vocabulary issues in machine translation. We found that crosslingual information retrieval performance can be improved by combining search results from subwords and token level representation. Additionally, we improved the search performance by combining and re-ranking the result sets from the different text representations for German, French and Japanese.
%R 10.18653/v1/2021.ecnlp-1.14
%U https://aclanthology.org/2021.ecnlp-1.14/
%U https://doi.org/10.18653/v1/2021.ecnlp-1.14
%P 116-122

Markdown (Informal)

[Textual Representations for Crosslingual Information Retrieval](https://aclanthology.org/2021.ecnlp-1.14/) (Zhang & Tan, ECNLP 2021)