Roy, Pierre
Avoiding Plagiarism in Markov Sequence Generation
Papadopoulos, Alexandre (Sony CSL and Sorbonne Universites, UPMC Univ Paris 06) | Roy, Pierre (Sony CSL) | Pachet, François (Sony CSL and Sorbonne Universites, UPMC Univ Paris 06)
Markov processes are widely used to generate sequences that imitate a given style, using random walk. Random walk generates sequences by iteratively concatenating states to prefixes of length equal or less than the given Markov order}. However, at higher orders, Markov chains tend to replicate chunks of the corpus with a size possibly higher than the order, a primary form of plagiarism. The Markov order defines a maximum length for training but not for generation. In the framework of constraint satisfaction (CSP), we introduce MaxOrder. This global constraint ensures that generated sequences do not include chunks larger than a given maximum order. We exhibit an automaton that recognises the solution set, with a size linear in the size of the corpus. We propose a linear-time procedure to generate this automaton from a corpus and a given max order. We then use this automaton to achieve generalised arc consistency for the MaxOrder constraint, holding on a sequence of size n, in O(n.T) time, where T is the size of the automaton. We illustrate our approach by generating text sequences from text corpora with a maximum order guarantee, effectively controlling plagiarism.
Enforcing Meter in Finite-Length Markov Sequences
Roy, Pierre (Associate Researcher) | Pachet, Francois (Sony CSL Paris)
Markov processes are increasingly used to generate finite-length sequences that imitate a given style. However, Markov processes are notoriously difficult to control. Recently, Markov constraints have been introduced to give users some control on generated sequences. Markov constraints reformulate finite-length Markov sequence generation in the framework of constraint satisfaction (CSP). However, in practice, this approach is limited to local constraints and its performance is low for global constraints, such as cardinality or arithmetic constraints. This limitation prevents generated sequences to follow structural properties which are independent of the style, but inherent to the domain, such as meter. In this article, we introduce meter, a constraint that ensures a sequence is 1) Markovian with regards to a given corpus and 2) follows metrical rules expressed as cumulative cost functions. Additionally, meter can simultaneously enforce cardinality constraints. We propose a domain consistency algorithm whose complexity is pseudo-polynomial. This result is obtained thanks to a theorem on the growth of sumsets by Khovanskii. We illustrate our constraint on meter-constrained music generation problems that were so far not solvable by any other technique.
Finite-Length Markov Processes with Constraints
Pachet, Francois (SONY CSL-Paris) | Roy, Pierre (SONY CSL-Paris) | Barbieri, Gabriele (SONY CSL-Paris)
Many systems use Markov models to generate finite-length sequences that imitate a given style. These systems often need to enforce specific control constraints on the sequences to generate. Unfortunately, control constraints are not compatible with Markov models, as they induce long-range dependencies that violate the Markov hypothesis of limited memory. Attempts to solve this issue using heuristic search do not give any guarantee on the nature and probability of the sequences generated. We propose a novel and efficient approach to controlled Markov generation for a specific class of control constraints that 1) guarantees that generated sequences satisfy control constraints and 2) follow the statistical distribution of the initial Markov model. Revisiting Markov generation in the framework of constraint satisfaction, we show how constraints can be compiled into a non-homogeneous Markov model, using arc-consistency techniques and renormalization. We illustrate the approach on a melody generation problem and sketch some realtime applications in which control constraints are given by gesture controllers.