In this work, we take a further step towards satisfying practical demands in Chinese lyric generation from musical short-video creators, in respect of the challenges on songs’ format constraints, creating specific lyrics from open-ended inspiration inputs, and language rhyme grace. One representative detail in these demands is to control lyric format at word level, that is, for Chinese songs, creators even expect fix-length words on certain positions in a lyric to match a special melody, while previous methods lack such ability. Although recent lyric generation community has made gratifying progress, most methods are not comprehensive enough to simultaneously meet these demands. As a result, we propose ChipSong, which is an assisted lyric generation system built based on a Transformer-based autoregressive language model architecture, and generates controlled lyric paragraphs fit for musical short-video display purpose, by designing 1) a novel Begin-Internal-End (BIE) word-granularity embedding sequence with its guided attention mechanism for word-level length format control, and an explicit symbol set for sentence-level length format control; 2) an open-ended trigger word mechanism to guide specific lyric contents generation; 3) a paradigm of reverse order training and shielding decoding for rhyme control. Extensive experiments show that our ChipSong generates fluent lyrics, with assuring the high consistency to pre-determined control conditions.
Semantic parses are directed acyclic graphs (DAGs), but in practice most parsers treat them as strings or trees, mainly because models that predict graphs are far less understood. This simplification, however, comes at a cost: there is no guarantee that the output is a well-formed graph. A recent work by Fancellu et al. (2019) addressed this problem by proposing a graph-aware sequence model that utilizes a DAG grammar to guide graph generation. We significantly improve upon this work, by proposing a simpler architecture as well as more efficient training and inference algorithms that can always guarantee the well-formedness of the generated graphs. Importantly, unlike Fancellu et al., our model does not require language-specific features, and hence can harness the inherent ability of DAG-grammar parsing in multilingual settings. We perform monolingual as well as multilingual experiments on the Parallel Meaning Bank (Abzianidze et al., 2017). Our parser outperforms previous graph-aware models by a large margin, and closes the performance gap between string-based and DAG-grammar parsing.
Despite the recent advances in coherence modelling, most such models including state-of-the-art neural ones, are evaluated on either contrived proxy tasks such as the standard order discrimination benchmark, or tasks that require special expert annotation. Moreover, most evaluations are conducted on small newswire corpora. To address these shortcomings, in this paper we propose four generic evaluation tasks that draw on different aspects of coherence at both the lexical and document levels, and can be applied to any corpora. In designing these tasks, we aim at capturing coherence-specific properties, such as the correct use of discourse connectives, lexical cohesion, as well as the overall temporal and causal consistency among events and participants in a story. Importantly, our proposed tasks either rely on automatically-generated data, or data annotated for other purposes, hence alleviating the need for annotation specifically targeted to the task of coherence modelling. We perform experiments with several existing state-of-the-art neural models of coherence on these tasks, across large corpora from different domains, including newswire, dialogue, as well as narrative and instructional text. Our findings point to a strong need for revisiting the common practices in the development and evaluation of coherence models.