VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization

Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran, Minh-Le Nguyen

Abstract

This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected using the keywords of 141 Web documents on 12 special events mentioned on Vietnamese Web pages. Social users were asked to participate in creating standard summaries and labeling each sentence or comment. After validation, the inter-annotator agreement among raters, calculated by Cohen’s Kappa, is 0.685. To illustrate the potential use of the dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summarization model trained on our dataset outperforms state-of-the-art baselines on both ROUGE-1 and ROUGE-2 in social context summarization.
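The abstract reports rater agreement as Cohen’s Kappa and evaluates summaries with ROUGE-1 and ROUGE-2. As a rough illustration only (not the authors’ code), the sketch below computes Cohen’s kappa, κ = (p_o − p_e) / (1 − p_e), for two raters, and a simplified ROUGE-N recall. Assumptions: the function names are hypothetical, the label lists and sentences are invented toy inputs, and text is split on whitespace, whereas Vietnamese text would normally need word segmentation. The official ROUGE toolkit also offers stemming, stopword removal and F-scores, which are omitted here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items that received the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over labels of the product of both raters' label rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Multiset of contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Simplified ROUGE-N recall: clipped n-gram overlap / number of reference n-grams."""
    ref = ngram_counts(reference, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    cand = ngram_counts(candidate, n)
    # Each reference n-gram is credited at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    # Toy binary labels (1 = include in summary), invented for illustration.
    a = [1, 1, 0, 0, 1, 0, 1, 0]
    b = [1, 0, 0, 0, 1, 0, 1, 1]
    print(round(cohens_kappa(a, b), 3))  # 0.5

    cand = "the cat sat on the mat".split()
    ref = "the cat is on the mat".split()
    print(round(rouge_n_recall(cand, ref, 1), 3))  # 0.833
    print(round(rouge_n_recall(cand, ref, 2), 3))  # 0.6
```

Cohen’s kappa is defined for a pair of raters; with more raters, pairwise values are often averaged or Fleiss’ kappa is used instead. The abstract does not state which aggregation produced the 0.685 figure.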

Anthology ID: W16-5405
Volume: Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
Month: December
Year: 2016
Address: Osaka, Japan
Editors: Koiti Hasida, Kam-Fai Wong, Nicoletta Calzolari, Key-Sun Choi
Venue: ALR
Publisher: The COLING 2016 Organizing Committee
Pages: 38–48
URL: https://aclanthology.org/W16-5405/
Bibkey: nguyen-etal-2016-vsolscsum
Cite (ACL): Minh-Tien Nguyen, Dac Viet Lai, Phong-Khac Do, Duc-Vu Tran, and Minh-Le Nguyen. 2016. VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization. In Proceedings of the 12th Workshop on Asian Language Resources (ALR12), pages 38–48, Osaka, Japan. The COLING 2016 Organizing Committee.
Cite (Informal): VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization (Nguyen et al., ALR 2016)
PDF: https://aclanthology.org/W16-5405.pdf

Export citation

BibTeX

@inproceedings{nguyen-etal-2016-vsolscsum,
    title = "{VS}o{LSCS}um: Building a {V}ietnamese Sentence-Comment Dataset for Social Context Summarization",
    author = "Nguyen, Minh-Tien  and
      Lai, Dac Viet  and
      Do, Phong-Khac  and
      Tran, Duc-Vu  and
      Nguyen, Minh-Le",
    editor = "Hasida, Koiti  and
      Wong, Kam-Fai  and
      Calzolari, Nicoletta  and
      Choi, Key-Sun",
    booktitle = "Proceedings of the 12th Workshop on {A}sian Language Resources ({ALR}12)",
    month = dec,
    year = "2016",
    address = "Osaka, Japan",
    publisher = "The COLING 2016 Organizing Committee",
    url = "https://aclanthology.org/W16-5405/",
    pages = "38--48",
    abstract = "This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected using the keywords of 141 Web documents on 12 special events mentioned on Vietnamese Web pages. Social users were asked to participate in creating standard summaries and labeling each sentence or comment. After validation, the inter-annotator agreement among raters, calculated by Cohen{'}s Kappa, is 0.685. To illustrate the potential use of the dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summarization model trained on our dataset outperforms state-of-the-art baselines on both ROUGE-1 and ROUGE-2 in social context summarization."
}

MODS XML

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="nguyen-etal-2016-vsolscsum">
    <titleInfo>
        <title>VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization</title>
    </titleInfo>
    <name type="personal">
        <namePart type="given">Minh-Tien</namePart>
        <namePart type="family">Nguyen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Dac</namePart>
        <namePart type="given">Viet</namePart>
        <namePart type="family">Lai</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Phong-Khac</namePart>
        <namePart type="family">Do</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Duc-Vu</namePart>
        <namePart type="family">Tran</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <name type="personal">
        <namePart type="given">Minh-Le</namePart>
        <namePart type="family">Nguyen</namePart>
        <role>
            <roleTerm authority="marcrelator" type="text">author</roleTerm>
        </role>
    </name>
    <originInfo>
        <dateIssued>2016-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
        <titleInfo>
            <title>Proceedings of the 12th Workshop on Asian Language Resources (ALR12)</title>
        </titleInfo>
        <name type="personal">
            <namePart type="given">Koiti</namePart>
            <namePart type="family">Hasida</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Kam-Fai</namePart>
            <namePart type="family">Wong</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Nicoletta</namePart>
            <namePart type="family">Calzolari</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <name type="personal">
            <namePart type="given">Key-Sun</namePart>
            <namePart type="family">Choi</namePart>
            <role>
                <roleTerm authority="marcrelator" type="text">editor</roleTerm>
            </role>
        </name>
        <originInfo>
            <publisher>The COLING 2016 Organizing Committee</publisher>
            <place>
                <placeTerm type="text">Osaka, Japan</placeTerm>
            </place>
        </originInfo>
        <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected using the keywords of 141 Web documents on 12 special events mentioned on Vietnamese Web pages. Social users were asked to participate in creating standard summaries and labeling each sentence or comment. After validation, the inter-annotator agreement among raters, calculated by Cohen’s Kappa, is 0.685. To illustrate the potential use of the dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summarization model trained on our dataset outperforms state-of-the-art baselines on both ROUGE-1 and ROUGE-2 in social context summarization.</abstract>
    <identifier type="citekey">nguyen-etal-2016-vsolscsum</identifier>
    <location>
        <url>https://aclanthology.org/W16-5405/</url>
    </location>
    <part>
        <date>2016-12</date>
        <extent unit="page">
            <start>38</start>
            <end>48</end>
        </extent>
    </part>
</mods>
</modsCollection>

Endnote

%0 Conference Proceedings
%T VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization
%A Nguyen, Minh-Tien
%A Lai, Dac Viet
%A Do, Phong-Khac
%A Tran, Duc-Vu
%A Nguyen, Minh-Le
%Y Hasida, Koiti
%Y Wong, Kam-Fai
%Y Calzolari, Nicoletta
%Y Choi, Key-Sun
%S Proceedings of the 12th Workshop on Asian Language Resources (ALR12)
%D 2016
%8 December
%I The COLING 2016 Organizing Committee
%C Osaka, Japan
%F nguyen-etal-2016-vsolscsum
%X This paper presents VSoLSCSum, a Vietnamese linked sentence-comment dataset, which was manually created to address the lack of standard corpora for social context summarization in Vietnamese. The dataset was collected using the keywords of 141 Web documents on 12 special events mentioned on Vietnamese Web pages. Social users were asked to participate in creating standard summaries and labeling each sentence or comment. After validation, the inter-annotator agreement among raters, calculated by Cohen’s Kappa, is 0.685. To illustrate the potential use of the dataset, a learning-to-rank method was trained using a set of local and social features. Experimental results indicate that the summarization model trained on our dataset outperforms state-of-the-art baselines on both ROUGE-1 and ROUGE-2 in social context summarization.
%U https://aclanthology.org/W16-5405/
%P 38-48


Markdown (Informal)

[VSoLSCSum: Building a Vietnamese Sentence-Comment Dataset for Social Context Summarization](https://aclanthology.org/W16-5405/) (Nguyen et al., ALR 2016)
