Collaborating Authors

 ferrer-i-cancho


The Distribution of Dependency Distance and Hierarchical Distance in Contemporary Written Japanese and Its Influencing Factors

Wang, Linxuan, Yu, Shuiyuan

arXiv.org Artificial Intelligence

To explore the relationship between dependency distance (DD) and hierarchical distance (HD) in Japanese, we used the Balanced Corpus of Contemporary Written Japanese to compare the probability distributions of DD and HD with and without sentence length fixed, and to analyze how mean dependency distance (MDD) and mean hierarchical distance (MHD) change as sentence length increases, along with their correlation coefficient. It was found that the valency of the predicates is the underlying factor behind the trade-off relation between MDD and MHD in Japanese. Native speakers of Japanese regulate the linear complexity and hierarchical complexity through the valency of the predicates, and the relative sizes of MDD and MHD depend on whether the threshold of valency has been reached. Apart from the cognitive load, the valency of the predicates also affects the probability distributions of DD and HD. The effect of the valency of the predicates on the distribution of HD is greater than on that of DD, which leads to differences in their probability distributions and causes the mean of MDD to be lower than that of MHD.
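As an illustration of the two measures compared in this abstract, here is a minimal Python sketch. It assumes that DD is the absolute difference between the positions of a word and its head, that HD is the number of edges between a word and the root, and that both means are taken over non-root words. The `heads` encoding and the function names are hypothetical, not taken from the paper.

```python
from statistics import mean
from typing import List


def dependency_distances(heads: List[int]) -> List[int]:
    """heads[i] is the 1-based position of the head of word i+1; 0 marks the root.

    Returns the linear distance |position - head position| for every non-root word.
    """
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def hierarchical_distances(heads: List[int]) -> List[int]:
    """Returns the number of edges from each non-root word up to the root.

    Assumes `heads` encodes a well-formed tree (one root, no cycles).
    """
    depths = []
    for h in heads:
        if h == 0:
            continue  # the root itself is skipped
        depth = 1
        while heads[h - 1] != 0:  # climb until the current head is the root
            h = heads[h - 1]
            depth += 1
        depths.append(depth)
    return depths


# Toy four-word sentence (assumed structure): word 2 is the root,
# words 1 and 4 depend on word 2, and word 3 depends on word 4.
heads = [2, 0, 4, 2]
mdd = mean(dependency_distances(heads))
mhd = mean(hierarchical_distances(heads))
```

For this toy tree the dependency distances are `[1, 1, 2]` and the hierarchical distances are `[1, 2, 1]`, so the two means happen to coincide.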


On the class of coding optimality of human languages and the origins of Zipf's law

Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

Here we present a new class of optimality for coding systems. Members of that class are displaced linearly from optimal coding and thus exhibit Zipf's law, namely a power-law distribution of frequency ranks. Within that class, Zipf's law, the size-rank law and the size-probability law form a group-like structure. We identify human languages that are members of the class. All languages showing sufficient agreement with Zipf's law are potential members of the class. In contrast, communication systems in some other species cannot be members of that class because they exhibit an exponential distribution instead, although those of dolphins and humpback whales might be. We provide a new insight into plots of frequency versus rank in double logarithmic scale. For any system, a straight line in that scale indicates that the lengths of optimal codes under non-singular coding and under uniquely decodable encoding are displaced by a linear function whose slope is the exponent of Zipf's law. For systems under compression and constrained to be uniquely decodable, such a straight line may indicate that the system is coding close to optimality. We provide support for the hypothesis that Zipf's law originates from compression and define testable conditions for the emergence of Zipf's law in compressing systems.


The exponential distribution of the orders of demonstrative, numeral, adjective and noun

Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

The frequency of the preferred order for a noun phrase formed by demonstrative, numeral, adjective and noun has received significant attention over the last two decades. We investigate the actual distribution of the 24 possible preferred orders. There is no consensus on whether it can be well-fitted by an exponential or a power law distribution. We find that an exponential distribution is a much better model. This finding and other circumstances where an exponential-like distribution is found challenge the view that power-law distributions, e.g., Zipf's law for word frequencies, are inevitable. We also investigate which of two exponential distributions gives a better fit: an exponential model where the 24 orders have non-zero probability or an exponential model where the number of orders that can have non-zero probability is variable. When parsimony and generalizability are prioritized, we find strong support for the exponential model where all 24 orders have non-zero probability. This finding suggests that there is no hard constraint on word order variation and that unattested orders merely result from undersampling, consistent with Cysouw's view.


Predictability maximization and the origins of word order harmony

Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

We address the linguistic problem of the sequential arrangement of a head and its dependents from an information theoretic perspective. In particular, we consider the optimal placement of a head that maximizes the predictability of the sequence. We assume that dependents are statistically independent given a head, in line with the open-choice principle and the core assumptions of dependency grammar. We demonstrate the optimality of harmonic order, i.e., placing the head last maximizes the predictability of the head whereas placing the head first maximizes the predictability of dependents. We also show that postponing the head is the optimal strategy to maximize its predictability while bringing it forward is the optimal strategy to maximize the predictability of dependents. We unravel the advantages of the strategy of maximizing the predictability of the head over maximizing the predictability of dependents. Our findings shed light on the placements of the head adopted by real languages or emerging in different kinds of experiments.


The optimal placement of the head in the noun phrase. The case of demonstrative, numeral, adjective and noun

Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

The word order of a sentence is shaped by multiple principles. The principle of syntactic dependency distance minimization is in conflict with the principle of surprisal minimization (or predictability maximization) in single-head syntactic dependency structures: while the former predicts that the head should be placed at the center of the linear arrangement, the latter predicts that the head should be placed at one of the ends (either first or last). A critical question is when surprisal minimization (or predictability maximization) should surpass syntactic dependency distance minimization. In the context of single-head structures, it has been predicted that this is more likely to happen when two conditions are met, namely (a) fewer words are involved and (b) words are shorter. Here we test the prediction on the noun phrase when it is composed of a demonstrative, a numeral, an adjective and a noun. We find that, across preferred orders in languages, the noun tends to be placed at one of the ends, confirming the theoretical prediction. We also show evidence of anti-locality effects: syntactic dependency distances in preferred orders are longer than expected by chance.


Swap distance minimization in SOV languages. Cognitive and mathematical foundations

Ferrer-i-Cancho, Ramon, Namboodiripad, Savithry

arXiv.org Artificial Intelligence

Distance minimization is a general principle of language. A special case of this principle in the domain of word order is swap distance minimization. This principle predicts that variations from a canonical order that are reached by fewer swaps of adjacent constituents are less costly and thus more likely. Here we investigate the principle in the context of the triple formed by subject (S), object (O) and verb (V). We introduce the concept of word order rotation as a cognitive underpinning of that prediction. When the canonical order of a language is SOV, the principle predicts SOV < SVO, OSV < VSO, OVS < VOS, in order of increasing cognitive cost. We test the prediction in three flexible order SOV languages: Korean (Koreanic), Malayalam (Dravidian), and Sinhalese (Indo-European). Evidence of swap distance minimization is found in all three languages, but it is weaker in Sinhalese. Swap distance minimization is stronger than a preference for the canonical order in Korean and especially Malayalam.
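The ranking predicted for a canonical SOV order can be reproduced with a short Python sketch. It relies on the standard fact that the minimum number of swaps of adjacent elements needed to turn one permutation into another equals the number of inversions between them. The function names are hypothetical and the code is only an illustration, not the authors' implementation.

```python
from itertools import permutations
from typing import Dict, List


def swap_distance(order: str, canonical: str = "SOV") -> int:
    """Minimum number of adjacent swaps turning `canonical` into `order`.

    Computed as the number of inversions of `order` relative to `canonical`.
    """
    rank = {symbol: i for i, symbol in enumerate(canonical)}
    seq = [rank[symbol] for symbol in order]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


def orders_by_swap_distance(canonical: str = "SOV") -> Dict[int, List[str]]:
    """Groups every permutation of `canonical` by its swap distance to it."""
    groups: Dict[int, List[str]] = {}
    for perm in permutations(canonical):
        order = "".join(perm)
        groups.setdefault(swap_distance(order, canonical), []).append(order)
    return {d: sorted(orders) for d, orders in sorted(groups.items())}


groups = orders_by_swap_distance("SOV")
```

Here `groups` maps distance 0 to SOV, distance 1 to OSV and SVO, distance 2 to OVS and VSO, and distance 3 to VOS, matching the ordering SOV < SVO, OSV < VSO, OVS < VOS stated in the abstract.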


The expected sum of edge lengths in planar linearizations of trees. Theory and applications

Alemany-Puig, Lluís, Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline reduces as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as random baseline.
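To make the planar baseline concrete, the following Python sketch computes the expected sum of edge lengths over uniformly random planar arrangements by brute-force enumeration. This is not the $O(n)$-time algorithm of the paper: it takes factorial time and is only meant as a reference for very small trees. It assumes that an arrangement is planar when no two edges cross, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def is_planar(pos: Sequence[int], edges: List[Edge]) -> bool:
    """True if no two edges cross when vertex v is placed at position pos[v]."""
    spans = [tuple(sorted((pos[u], pos[v]))) for u, v in edges]
    for i, (a, b) in enumerate(spans):
        for c, d in spans[i + 1:]:
            # Two edges cross when their endpoints strictly interleave.
            if a < c < b < d or c < a < d < b:
                return False
    return True


def expected_sum_planar(n: int, edges: List[Edge]) -> Tuple[int, Fraction]:
    """Returns (number of planar arrangements, exact mean sum of edge lengths).

    Enumerates all n! arrangements, so it is only usable for small n.
    """
    total = 0
    count = 0
    for pos in permutations(range(n)):  # pos[v] is the position of vertex v
        if is_planar(pos, edges):
            total += sum(abs(pos[u] - pos[v]) for u, v in edges)
            count += 1
    return count, Fraction(total, count)


# A star on 4 vertices: edges share the hub, so no arrangement has crossings.
star_count, star_mean = expected_sum_planar(4, [(0, 1), (0, 2), (0, 3)])
# A path on 4 vertices: edges (0, 1) and (2, 3) cross in some arrangements.
path_count, path_mean = expected_sum_planar(4, [(0, 1), (1, 2), (2, 3)])
```

For the star, all 24 arrangements are planar and the mean equals the unconstrained expectation $(n^2-1)/3 = 5$. For the path, only 16 of the 24 arrangements are planar, which shows how the constraint restricts the sample space of the baseline.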


The distribution of syntactic dependency distances

Petrini, Sonia, Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

The syntactic structure of a sentence can be represented as a graph where vertices are words and edges indicate syntactic dependencies between them. In this setting, the distance between two syntactically linked words can be defined as the difference between their positions. Here we want to contribute to the characterization of the actual distribution of syntactic dependency distances, and unveil its relationship with short-term memory limitations. We propose a new double-exponential model in which decay in probability is allowed to change after a break-point. This transition could mirror the transition from the processing of word chunks to higher-level structures. We find that a two-regime model -- where the first regime follows either an exponential or a power-law decay -- is the most likely one in all 20 languages we considered, independently of sentence length and annotation style. Moreover, the break-point is fairly stable across languages and averages 4-5 words, suggesting that the number of words that can be simultaneously processed abstracts from the specific language to a high degree. Finally, we give an account of the relation between the best estimated model and the closeness of syntactic dependencies, as measured by a recently introduced optimality score.


Memory limitations are hidden in grammar

Gómez-Rodríguez, Carlos, Christiansen, Morten H., Ferrer-i-Cancho, Ramon

arXiv.org Artificial Intelligence

For many centuries, the goal of linguistics has been to capture this capacity by a formal description--a grammar--consisting of a systematic set of rules and/or principles that determine which sentences are part of a given language and which are not (Bod, 2013). Over the years, these formal grammars have taken many forms but common to them all is the assumption that they capture the idealized linguistic competence of a native speaker/hearer, independent of any memory limitations or other non-linguistic cognitive constraints (Chomsky, 1965; Miller, 2000). These abstract formal descriptions have come to play a foundational role in the language sciences, from linguistics, psycholinguistics, and neurolinguistics (Hauser et al., 2002; Pinker, 2003) to computer science, engineering, and machine learning (Klein and Manning, 2003; Dyer et al., 2016; Gómez-Rodríguez et al., 2018). Despite evidence that processing difficulty underpins the unacceptability of certain sentences (Morrill, 2010; Hawkins, 2004), the cognitive independence assumption that is a defining feature of linguistic competence has not been examined in a systematic way using the tools of formal grammar. It is therefore unclear whether these supposedly idealized descriptions of language are free of non-linguistic cognitive constraints, such as memory limitations.