Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition
Aerts, Diederik, Arguëlles, Jonito Aerts, Beltran, Lester, Geriente, Suzette, Leporini, Roberto, de Bianchi, Massimiliano Sassoli, Sozzo, Sandro
We present the results of cognitive tests on conceptual combinations, performed using specific Large Language Models (LLMs) as test subjects. In the first test, performed with ChatGPT and Gemini, we show that Bell's inequalities are significantly violated, which indicates the presence of 'quantum entanglement' in the tested concepts. In the second test, also performed using ChatGPT and Gemini, we instead identify the presence of 'Bose-Einstein statistics', rather than the intuitively expected 'Maxwell-Boltzmann statistics', in the distribution of the words contained in large-size texts. Interestingly, these findings mirror the results previously obtained in both cognitive tests with human participants and information retrieval tests on large corpora. Taken together, they point to the 'systematic emergence of quantum structures in conceptual-linguistic domains', regardless of whether the cognitive agent is human or artificial. Although LLMs are classified as neural networks for historical reasons, we believe that a more essential form of knowledge organization takes place in the distributional semantic structure of vector spaces built on top of the neural network. It is this meaning-bearing structure that lends itself to a phenomenon of evolutionary convergence between human cognition and language, slowly established through biological evolution, and LLM cognition and language, emerging much more rapidly as a result of self-learning and training. We analyze various aspects and examples that contain evidence supporting the above hypothesis. We also advance a unifying framework that explains the pervasive quantum organization of meaning that we identify.
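The Bell-test side of this abstract can be made concrete with a small numerical sketch. The joint probabilities below are invented for illustration (they are not data from the paper); the sketch only shows how the CHSH statistic S is assembled from joint outcome probabilities and compared against the classical bound |S| ≤ 2.

```python
# Toy CHSH computation with invented numbers (NOT data from the paper), showing
# how a Bell-test statistic is assembled from joint outcome probabilities.

def expectation(p_pp, p_pm, p_mp, p_mm):
    """E(A, B) = p(+,+) - p(+,-) - p(-,+) + p(-,-)."""
    return p_pp - p_pm - p_mp + p_mm

# Hypothetical joint probabilities for the four measurement settings.
E_AB   = expectation(0.45, 0.05, 0.05, 0.45)   # joint test of (A, B)
E_ABp  = expectation(0.45, 0.05, 0.05, 0.45)   # (A, B')
E_ApB  = expectation(0.45, 0.05, 0.05, 0.45)   # (A', B)
E_ApBp = expectation(0.05, 0.45, 0.45, 0.05)   # (A', B')

S = E_AB + E_ABp + E_ApB - E_ApBp
print(S)  # ~3.2; any |S| > 2 violates the classical (CHSH) bound
```

In the concept-combination experiments, each "measurement setting" corresponds to asking the agent (human or LLM) to choose an exemplar of a combined concept, and the joint choice frequencies play the role of the probabilities above.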
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Gong, Albert, Stankevičiūtė, Kamilė, Wan, Chao, Kabra, Anmol, Thesmar, Raphael, Lee, Johann, Klenke, Julius, Gomes, Carla P., Weinberger, Kilian Q.
High-quality benchmarks are essential for evaluating the reasoning and retrieval capabilities of large language models (LLMs). However, curating static datasets for this purpose is not a permanent solution, as they are prone to data leakage and inflated performance results. To address these challenges, we propose PhantomWiki: a pipeline to generate unique, factually consistent document corpora with diverse question-answer pairs. Unlike prior work, PhantomWiki is neither a fixed dataset, nor is it based on any existing data. Instead, a new PhantomWiki instance is generated on demand for each evaluation. We vary the question difficulty and corpus size to disentangle reasoning and retrieval capabilities respectively, and find that PhantomWiki datasets are surprisingly challenging for frontier LLMs. Thus, we contribute a scalable and data leakage-resistant framework for disentangled evaluation of reasoning, retrieval, and tool-use abilities. Our code is available at https://github.com/kilian-group/phantom-wiki.
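The "fresh instance per evaluation" idea can be illustrated with a toy generator. This is my own minimal sketch, not the PhantomWiki API (see the linked repository for the actual pipeline); the names and relation template are invented.

```python
# Toy illustration of generating a dataset on demand per evaluation run
# (my sketch, NOT the PhantomWiki pipeline -- see the linked repository).
import random

def generate_instance(seed):
    """Return one tiny synthetic (document, question, answer) triple."""
    rng = random.Random(seed)
    names = ["Ada", "Ben", "Cleo", "Dan", "Eve"]
    parent, child = rng.sample(names, 2)
    document = f"{parent} is the parent of {child}."
    question = f"Who is the parent of {child}?"
    return document, question, parent

doc, question, answer = generate_instance(seed=7)
# Each seed yields a fresh, never-before-published instance, so the answer
# cannot already sit in a model's training data: regeneration prevents leakage.
assert generate_instance(seed=7) == (doc, question, answer)  # deterministic per seed
```

Scaling the same idea up (larger synthetic corpora, deeper relation chains) is what lets corpus size and question difficulty be varied independently to separate retrieval load from reasoning load.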
Identifying Quantum Mechanical Statistics in Italian Corpora
Aerts, Diederik, Arguëlles, Jonito Aerts, Beltran, Lester, de Bianchi, Massimiliano Sassoli, Sozzo, Sandro
We present a theoretical and empirical investigation of the statistical behaviour of the words in a text produced by human language. To this aim, we analyse the word distribution of various Italian-language texts selected from a specific literary corpus. We first generalise a theoretical framework that we previously elaborated to identify 'quantum mechanical statistics' in large-size texts. Then, we show that, in all analysed texts, words distribute according to 'Bose-Einstein statistics' and show significant deviations from 'Maxwell-Boltzmann statistics'. Next, we introduce an effect of 'word randomization', which instead indicates that the difference between the two statistical models is not as pronounced as in the original cases. These results confirm the empirical patterns obtained in English-language texts and strongly indicate that identical words tend to 'clump together' as a consequence of their meaning, which can be explained as an effect of 'quantum entanglement' produced through a phenomenon of 'contextual updating'. Moreover, word randomization can be seen as the linguistic-conceptual equivalent of an increase in temperature, which destroys 'coherence' and makes classical statistics prevail over quantum statistics. Finally, we provide some insights into the origin of quantum statistics in physics.
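The bookkeeping behind this comparison can be sketched as follows. As described in the authors' related papers, words are ranked by frequency and the rank serves as an energy level; the observed occupation numbers N(E_i) are then compared against Bose-Einstein and Maxwell-Boltzmann profiles. The model functions and parameters below are schematic (in the actual study they are fitted to each text), and the sample sentence is mine.

```python
# Schematic sketch (not the authors' code) of the energy-level bookkeeping:
# rank words by frequency, use the rank i as the energy E_i, and compare the
# occupation numbers N(E_i) with the two candidate statistical profiles.
import math
from collections import Counter

def energy_levels(text):
    """Return (E_i, N(E_i)) pairs: rank-indexed energies and word occupation numbers."""
    counts = Counter(text.lower().split())
    occupations = sorted(counts.values(), reverse=True)
    return list(enumerate(occupations))

def n_bose_einstein(E, A, B):
    """Bose-Einstein profile N(E) = 1 / (A * exp(B*E) - 1); A, B fitted per text."""
    return 1.0 / (A * math.exp(B * E) - 1.0)

def n_maxwell_boltzmann(E, C, D):
    """Maxwell-Boltzmann profile N(E) = C * exp(-D*E); C, D fitted per text."""
    return C * math.exp(-D * E)

text = "the cat saw the dog and the dog saw the cat run"
levels = energy_levels(text)
print(levels)  # [(0, 4), (1, 2), (2, 2), (3, 2), (4, 1), (5, 1)]
```

On real texts, one would fit both profiles to the `levels` data and compare goodness of fit; the papers report that the Bose-Einstein form wins decisively, and that randomizing word order weakens this difference.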
Development of a Thermodynamics of Human Cognition and Human Culture
Aerts, Diederik, Arguëlles, Jonito Aerts, Beltran, Lester, Sozzo, Sandro
Inspired by foundational studies in classical and quantum physics, and by information retrieval studies in quantum information theory, we prove that the notions of 'energy' and 'entropy' can be consistently introduced in human language and, more generally, in human culture. More explicitly, if energy is attributed to words according to their frequency of appearance in a text, then the ensuing energy levels are distributed non-classically, namely, they obey Bose-Einstein, rather than Maxwell-Boltzmann, statistics, as a consequence of the genuine 'quantum indistinguishability' of the words that appear in the text. Secondly, the 'quantum entanglement' due to the way meaning is carried by a text reduces the (von Neumann) entropy of the words that appear in the text, a behaviour which cannot be accounted for by classical (thermodynamic or information-theoretic) entropy. We claim here that this 'quantum-type behaviour is valid in general in human language', namely, any text is conceptually more concrete than the words composing it, which entails that the entropy of the overall text decreases. In addition, we provide examples taken from cognition, where quantization of energy appears in categorical perception, and from culture, where entities collaborate, and thus 'entangle', to decrease overall entropy. We use these findings to propose the development of a new 'non-classical thermodynamic theory' for human cognition, which also covers broad parts of human culture and its artefacts and bridges concepts with quantum physics entities.
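The entropy claim rests on a standard quantum fact that is easy to verify numerically: a pure entangled state has zero von Neumann entropy while each of its subsystems has positive entropy, so the whole is lower-entropy than its parts, which classical (Shannon) entropy forbids. A minimal NumPy check of that fact, using a Bell state in place of a text and its words (my illustration, not the paper's code):

```python
# Minimal numpy check of the quantum fact the abstract relies on: a pure
# entangled state has LOWER von Neumann entropy than its subsystems.
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # drop numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

# Bell state |phi+> = (|00> + |11>) / sqrt(2): maximally entangled, pure.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(psi, psi.conj())               # density matrix of the whole

# Reduced state of the first subsystem: partial trace over the second.
rho_A = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

print(von_neumann_entropy(rho))    # ~0.0 bits: the whole ("text")
print(von_neumann_entropy(rho_A))  # ~1.0 bit:  a subsystem ("word")
```

Classically, the entropy of a joint system is always at least that of each part; the inversion above is exactly the signature the authors attribute to meaning-induced entanglement in texts.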
Are Words the Quanta of Human Language? Extending the Domain of Quantum Cognition
Aerts, Diederik, Beltran, Lester
In previous research, we showed that 'texts that tell a story' exhibit a statistical structure that is not Maxwell-Boltzmann but Bose-Einstein. Our explanation is that this is due to the presence of 'indistinguishability' in human language, a result of the same words in different parts of the story being indistinguishable from one another. In the current article, we set out to explain this Bose-Einstein statistics. We show that it is the presence of 'meaning' in 'stories' that gives rise to the lack of independence characteristic of Bose-Einstein statistics, and provides conclusive evidence that 'words can be considered the quanta of human language', structurally similar to how 'photons are the quanta of light'. Drawing on several studies on entanglement from our Brussels research group, we also show that it is the presence of 'meaning' in texts that makes the von Neumann entropy of a total text smaller than the entropy of the words composing it. We explain how the new insights in this article fit into the research domain called 'quantum cognition', where quantum probability models and quantum vector spaces are used to model human cognition, how they are also relevant to the use of quantum structures in information retrieval and natural language processing, and how they introduce 'quantization' and 'Bose-Einstein statistics' as relevant quantum effects there. Inspired by the conceptuality interpretation of quantum mechanics, and relying on the new insights, we put forward hypotheses about the nature of physical reality. In doing so, we note how this new type of decrease in entropy, and its explanation, may be important for the development of quantum thermodynamics. We likewise note how it can give rise to an original explanatory picture of the nature of physical reality on the surface of planet Earth, in which human culture emerges as a reinforcing continuation of life.
Learning to Rank based on Analogical Reasoning
Fahandar, Mohsen Ahmadi, Hüllermeier, Eyke
Object ranking, or "learning to rank", is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: given objects $A,B,C,D$, if object $A$ is known to be preferred to $B$, and $C$ relates to $D$ as $A$ relates to $B$, then $C$ is (supposedly) preferred to $D$. Our method applies this pattern as its main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on initial experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be particularly interesting in situations in which the objects come from different subdomains and hence require a kind of knowledge transfer.
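The inference pattern above can be sketched concretely. The arithmetic-proportion formalization below (a − b = c − d per feature) is a common simplification of analogical proportions, not necessarily the exact formalization used in the paper, and the feature vectors are invented:

```python
# Toy version of the analogical inference pattern: if A is preferred to B, and
# (C, D) stand in (nearly) the same feature-wise relation as (A, B), then
# predict C preferred to D. Arithmetic-proportion simplification, my sketch.
import numpy as np

def proportion_degree(a, b, c, d):
    """Degree to which a:b :: c:d holds; 1.0 when a - b == c - d in every feature."""
    return float(np.exp(-np.linalg.norm((a - b) - (c - d))))

# Known training preference: object A is preferred to object B.
A = np.array([0.9, 0.8]); B = np.array([0.4, 0.3])
# New pair whose feature-wise relation mirrors that of (A, B):
C = np.array([0.8, 0.9]); D = np.array([0.3, 0.4])

deg = proportion_degree(A, B, C, D)
print(deg)  # close to 1.0 -> by analogy, predict that C is preferred to D
```

In the paper this building block is combined with instance-based learning and rank aggregation: many known preference pairs cast analogical "votes" on each new pair, and the votes are aggregated into a predicted linear order.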