Proceedings of the Second International Workshop of Discourse Processing
Discourse parsing aims to comprehensively acquire the logical structure of the whole text which may be helpful to some downstream applications such as summarization, reading comprehension, QA and so on. One important issue behind discourse parsing is the representation of discourse structure. Up to now, many discourse structures have been proposed, and the correponding parsing methods are designed, promoting the development of discourse research. In this paper, we mainly introduce our recent discourse research and its preliminary application from the dependency view.
Machine translation (MT) models usually translate a text at sentence level by considering isolated sentences, which is based on a strict assumption that the sentences in a text are independent of one another. However, the fact is that the texts at discourse level have properties going beyond individual sentences. These properties reveal texts in the frequency and distribution of words, word senses, referential forms and syntactic structures. Dissregarding dependencies across sentences will harm translation quality especially in terms of coherence, cohesion, and consistency. To solve these problems, several approaches have previously been investigated for conventional statistical machine translation (SMT). With the fast growth of neural machine translation (NMT), discourse-level NMT has drawn increasing attention from researchers. In this work, we review major works on addressing discourse related problems for both SMT and NMT models with a survey of recent trends in the fields.
The need to evaluate the ability of context-aware neural machine translation (NMT) models in dealing with specific discourse phenomena arises in document-level NMT. However, test sets that satisfy this need are rare. In this paper, we propose a test suite to evaluate three common discourse phenomena in English-Chinese translation: pronoun, discourse connective and ellipsis where discourse divergences lie across the two languages. The test suite contains 1,200 instances, 400 for each type of discourse phenomena. We perform both automatic and human evaluation with three state-of-the-art context-aware NMT models on the proposed test suite. Results suggest that our test suite can be used as a challenging benchmark test bed for evaluating document-level NMT. The test suite will be publicly available soon.
In recent years, attention mechanism has been widely used in various neural machine translation tasks based on encoder decoder. This paper focuses on the performance of encoder decoder attention mechanism in word sense disambiguation task with different text length, trying to find out the influence of context marker on attention mechanism in word sense disambiguation task. We hypothesize that attention mechanisms have similar performance when translating texts of different lengths. Our conclusion is that the alignment effect of attention mechanism is magnified in short text translation tasks with ambiguous nouns, while the effect of attention mechanism is far less than expected in long-text tasks, which means that attention mechanism is not the main mechanism for NMT model to feed WSD to integrate context information. This may mean that attention mechanism pays more attention to ambiguous nouns than context markers. The experimental results show that with the increase of text length, the performance of NMT model using attention mechanism will gradually decline.
Previous neural approaches achieve significant progress for Chinese word segmentation (CWS) as a sentence-level task, but it suffers from limitations on real-world scenario. In this paper, we address this issue with a context-aware method and optimize the solution at document-level. This paper proposes a three-step strategy to improve the performance for discourse CWS. First, the method utilizes an auxiliary segmenter to remedy the limitation on pre-segmenter. Then the context-aware algorithm computes the confidence of each split. The maximum probability path is reconstructed via this algorithm. Besides, in order to evaluate the performance in discourse, we build a new benchmark consisting of the latest news and Chinese medical articles. Extensive experiments on this benchmark show that our proposed method achieves a competitive performance on a document-level real-world scenario for CWS.
With regards to WikiSum (CITATION) that empowers applicative explorations of Neural Multi-Document Summarization (MDS) to learn from large scale dataset, this study develops two hierarchical Transformers (HT) that describe both the cross-token and cross-document dependencies, at the same time allow extended length of input documents. By incorporating word- and paragraph-level multi-head attentions in the decoder based on the parallel and vertical architectures, the proposed parallel and vertical hierarchical Transformers (PHT &VHT) generate summaries utilizing context-aware word embeddings together with static and dynamics paragraph embeddings, respectively. A comprehensive evaluation is conducted on WikiSum to compare PHT &VHT with established models and to answer the question whether hierarchical structures offer more promising performances than flat structures in the MDS task. The results suggest that our hierarchical models generate summaries of higher quality by better capturing cross-document relationships, and save more memory spaces in comparison to flat-structure models. Moreover, we recommend PHT given its practical value of higher inference speed and greater memory-saving capacity.
In this paper, we explore a new approach based on discourse analysis for the task of intent segmentation. Our target texts are user queries from a real-world chatbot. Our results show the feasibility of our approach with an F1-score of 82.97 points, and some advantages and disadvantages compared to two machine learning baselines: BERT and LSTM+CRF.
In human question-answering (QA), questions are often expressed in the form of multiple sentences. One can see this in both spoken QA interactions, when one person asks a question of another, and written QA, such as are found on-line in FAQs and in what are called ”Community Question-Answering Forums”. Computer-based QA has taken the challenge of these ”multi-sentence questions” to be that of breaking them into an appropriately ordered sequence of separate questions, with both the previous questions and their answers serving as context for the next question. This can be seen, for example, in two recent workshops at AAAI called ”Reasoning for Complex QA” [https://rcqa-ws.github.io/program/]. We claim that, while appropriate for some types of ”multi-sentence questions” (MSQs), it is not appropriate for all, because they are essentially different types of discourse. To support this claim, we need to provide evidence that: • different types of MSQs are answered differently in written or spoken QA between people; • people can (and do) distinguish these different types of MSQs; • systems can be made to both distinguish different types of MSQs and provide appropriate answers.
NT Clause Complex Framework defines a clause complex as a combination of NT clauses through component sharing and logic-semantic relationship. This paper clarifies the existence of component sharing mechanism in both English and Chinese clause complexes, illustrates the differences in component sharing between the two languages, and introduces a formal annotation scheme to represent clause-complex level structural transformations. Under the guidance of the annotation scheme, the English-Chinese Clause Alignment Corpus is built. It is believed that this corpus will aid comparative linguistic studies, translation studies and machine translation studies by providing abundant formal and computable samples for English-Chinese structural transformations on the clause complex level.
Connected texts are characterised by the presence of linguistic elements relating to shared referents throughout the text. These elements together form a structure that lends cohesion to the text. The realisation of those cohesive structures is subject to different constraints and varying preferences in different languages. We regularly observe mismatches of cohesive structures across languages in parallel texts. This can be a result of either a divergence of language-internal constraints or of effects of the translation process. As fully automatic high-quality MT is starting to look achievable, the question arises how cohesive elements should be handled in MT evaluation, since the common assumption of 1:1 correspondence between referring expressions is a poor match for what we find in corpus data. Focusing on the translation of pronouns, I discuss different approaches to evaluating a particular type of cohesive elements in MT output and the trade-offs they make between evaluation cost, validity, specificity and coverage. I suggest that a meaningful evaluation of cohesive structures in translation is difficult to achieve simply by appealing to the intuition of human annotators, but requires a more structured approach that forces us to make up our minds about the standards we expect the translation output to adhere to.