The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics

Chang, Edward Y.

arXiv.org Artificial Intelligence

Influential critiques argue that Large Language Models (LLMs) are a dead end for AGI: "mere pattern matchers" structurally incapable of reasoning or planning. We argue this conclusion misidentifies the bottleneck: it confuses the ocean with the net. Pattern repositories are the necessary System-1 substrate; the missing component is a System-2 coordination layer that selects, constrains, and binds these patterns. We formalize this layer via UCCT, a theory of semantic anchoring that models reasoning as a phase transition governed by effective support (rho_d), representational mismatch (d_r), and an adaptive anchoring budget (gamma log k). Under this lens, ungrounded generation is simply an unbaited retrieval of the substrate's maximum likelihood prior, while "reasoning" emerges when anchors shift the posterior toward goal-directed constraints. We translate UCCT into architecture with MACI, a coordination stack that implements baiting (behavior-modulated debate), filtering (Socratic judging), and persistence (transactional memory). By reframing common objections as testable coordination failures, we argue that the path to AGI runs through LLMs, not around them.
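The abstract above describes reasoning as a phase transition governed by effective support (rho_d), representational mismatch (d_r), and an adaptive anchoring budget (gamma log k). A toy numerical sketch, not taken from the paper, can illustrate what such a transition might look like; the sigmoid form, the sharpness parameter beta, and all numeric values below are illustrative assumptions:

```python
import math

def anchoring_success(rho_d, d_r, gamma, k, beta=10.0):
    """Toy phase-transition model (illustrative only): success probability
    rises sharply once the anchoring budget (gamma * log k), weighted by
    effective support (rho_d), exceeds the representational mismatch (d_r)."""
    margin = gamma * math.log(k) * rho_d - d_r
    return 1.0 / (1.0 + math.exp(-beta * margin))

# Sweeping the number of anchors k shows an abrupt shift from
# prior-dominated output to goal-directed behavior.
for k in (1, 2, 4, 8, 16):
    p = anchoring_success(rho_d=0.6, d_r=1.0, gamma=0.8, k=k)
    print(k, round(p, 3))
```

Under this caricature, "ungrounded generation" corresponds to the low-k regime where the margin is negative and the substrate's prior dominates.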


If I Could Turn Back Time: Temporal Reframing as a Historical Reasoning Task for LLMs

Bungum, Lars, Huang, Charles Yijia, Kashar, Abeer

arXiv.org Artificial Intelligence

In this study, we examine the ability of LLMs to perform temporal reasoning. Using a Norwegian book of trivia questions from 1940, we prompt the LLMs to answer the questions as if the year were 1940, posing the questions in both English and Norwegian. Correct answers are often presented as sentences, and grading is done by means of LLM-as-judge, with sampled checks by a native speaker. Unexpectedly, prompting in English consistently gave better results than prompting in Norwegian. Larger LLMs, as expected, improved results. We tested the DeepSeek-R1, Gemma3, Qwen3, and Llama3.1 model families, as well as the largest available LLM specifically built for Norwegian.
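The setup above can be sketched as two prompt templates: one that reframes the question to a past vantage point, and one for LLM-as-judge grading of free-text answers. This is a hypothetical reconstruction, not the authors' actual harness; the function names and prompt wording are assumptions:

```python
def reframe_prompt(question, year=1940, language="English"):
    """Wrap a trivia question so the model answers from a past vantage point."""
    preamble = {
        "English": f"Answer as if the current year is {year}. ",
        "Norwegian": f"Svar som om det er {year}. ",
    }[language]
    return preamble + question

def judge_prompt(question, reference, answer):
    """Ask a grader model whether a free-text answer matches the reference."""
    return (
        "You are grading a trivia answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {answer}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )

print(reframe_prompt("Who is the King of Norway?"))
```

Because reference answers are sentences rather than single tokens, exact-match scoring would fail, which is why the judge model receives the full question, reference, and candidate.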


Inverse-Free Wilson Loops for Transformers: A Practical Diagnostic for Invariance and Order Sensitivity

Chang, Edward Y., Chang, Ethan Y.

arXiv.org Artificial Intelligence

Large language models can change answers under harmless edits that matter in practice: RAG outputs flip when passages are reordered, fine-tuning erodes invariances learned at pretraining, debate or chain-of-thought prompts take path-dependent routes, and compiler fusion or reordering perturbs logits near decision boundaries. These failures violate intended invariances, break continuous integration, and force teams to trade safety for speed. The effects are small yet distributed across layers and positions, sensitive to context length and evaluation order, and costly to repair with retraining or formal verification. We present WILSON, a minimal post-hoc diagnostic suite that converts simple loop and reordering checks on internal representations into system signals. WILSON combines an inverse-free curvature map over positions and layers, computed with JVPs and Hutchinson probes, with activation-level commutators that flag reorder risk. Signals are cheap to compute, model-agnostic for standard Transformers, and exported as thresholds and CSV artifacts for orchestrators. This enables concrete actions: guard RAG against order effects, catch fine-tuning regressions, stabilize debate pathways and long multi-turn contexts, and gate fusions or reorders in deployment. In short, WILSON helps anticipate failures and approve safe optimizations so reliability and throughput can improve together without changing model architecture or training.
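Two ingredients named in the abstract, inverse-free trace estimation via Hutchinson probes and activation-level commutators that flag reorder risk, can be illustrated in a few lines of NumPy. This is a generic sketch of those two primitives, not WILSON's implementation; the function names are my own, and WILSON applies such signals to transformer internals rather than toy matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(matvec, dim, probes=4000):
    """Inverse-free trace estimate E[z^T A z] with Rademacher probes,
    needing only matrix-vector products (the JVP analogue)."""
    total = 0.0
    for _ in range(probes):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ matvec(z)
    return total / probes

def reorder_commutator(layer, x, perm):
    """Activation-level commutator ||layer(x[perm]) - layer(x)[perm]||.
    Zero for permutation-equivariant maps; large values flag reorder risk."""
    return np.linalg.norm(layer(x[perm]) - layer(x)[perm])

A = rng.standard_normal((32, 32))
est = hutchinson_trace(lambda v: A @ v, 32)
print(round(est, 2), round(np.trace(A), 2))  # estimate vs. exact trace

x = rng.standard_normal((8, 32))
perm = np.roll(np.arange(8), 1)
elementwise = lambda t: np.tanh(t)   # equivariant: commutator is zero
mixing = lambda t: t.cumsum(axis=0)  # order-sensitive: commutator is large
print(reorder_commutator(elementwise, x, perm))
print(reorder_commutator(mixing, x, perm))
```

The commutator check is what makes order effects in RAG concrete: if reordering inputs before a layer differs from reordering its outputs, downstream logits can flip under harmless edits.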



The Bias is in the Details: An Assessment of Cognitive Bias in LLMs

Knipper, R. Alexander, Knipper, Charles S., Zhang, Kaiqi, Sims, Valerie, Bowers, Clint, Karmaker, Santu

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) are increasingly embedded in real-world decision-making processes, it becomes crucial to examine the extent to which they exhibit cognitive biases. Extensively studied in the field of psychology, cognitive biases appear as systematic distortions commonly observed in human judgments. This paper presents a large-scale evaluation of eight well-established cognitive biases across 45 LLMs, analyzing over 2.8 million LLM responses generated through controlled prompt variations. To achieve this, we introduce a novel evaluation framework based on multiple-choice tasks, hand-curate a dataset of 220 decision scenarios targeting fundamental cognitive biases in collaboration with psychologists, and propose a scalable approach for generating diverse prompts from human-authored scenario templates. Our analysis shows that LLMs exhibit bias-consistent behavior in 17.8-57.3% of instances across a range of judgment and decision-making contexts targeting anchoring, availability, confirmation, framing, interpretation, overattribution, prospect theory, and representativeness biases. We find that both model size and prompt specificity play a significant role in bias susceptibility: larger size (>32B parameters) can reduce bias in 39.5% of cases, while higher prompt detail reduces most biases by up to 14.9%, except in one case (overattribution), which is exacerbated by up to 8.8%.
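A multiple-choice bias evaluation of the kind described can be scored very simply. The sketch below is a minimal illustration for one bias (framing), not the authors' framework; the scenario text, the `toy` model stand-in, and the scoring rule are assumptions:

```python
# Each pair: (gain-framed prompt, loss-framed prompt,
#             sure/risk-averse option, risky/risk-seeking option)
FRAMING_PAIRS = [
    ("Program A saves 200 of 600 people for certain. Choose A or B.",
     "Under Program A, 400 of 600 people die for certain. Choose A or B.",
     "A", "B"),
]

def bias_consistent_rate(model, pairs):
    """A response pair is bias-consistent when the model picks the sure
    option under gain framing but the risky option under loss framing,
    even though the two framings describe identical outcomes."""
    hits = 0
    for gain, loss, sure, risky in pairs:
        if model(gain) == sure and model(loss) == risky:
            hits += 1
    return hits / len(pairs)

# A toy "model" exhibiting the classic framing effect:
toy = lambda prompt: "B" if "die" in prompt else "A"
print(bias_consistent_rate(toy, FRAMING_PAIRS))
```

Pairing logically equivalent framings is what lets the evaluation attribute a choice flip to bias rather than to task difficulty.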


Cat owners donate more money than dog owners

Popular Science

An analysis of nearly $70 billion in donations showed feline lovers contributed slightly more. The cat-versus-dog rivalry is ancient, or at least as old as the roughly 26 to 35 million years since felines and canines first evolved. The debate over which makes the better human companion has spilled over to their owners, with research showing that dog owners are more often considered social and community-oriented, while cat owners are believed to be more introverted and open-minded.


Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe

You, Chong, Jayaram, Rajesh, Suresh, Ananda Theertha, Nittka, Robin, Yu, Felix, Kumar, Sanjiv

arXiv.org Machine Learning

Dual encoder (DE) models, where a pair of matching query and document are embedded into similar vector representations, are widely used in information retrieval due to their simplicity and scalability. However, the Euclidean geometry of the embedding space limits the expressive power of DEs, which may compromise their quality. This paper investigates such limitations in the context of hierarchical retrieval (HR), where the document set has a hierarchical structure and the matching documents for a query are all of its ancestors. We first prove that DEs are feasible for HR as long as the embedding dimension is linear in the depth of the hierarchy and logarithmic in the number of documents. Then we study the problem of learning such embeddings in a standard retrieval setup where DEs are trained on samples of matching query and document pairs. Our experiments reveal a lost-in-the-long-distance phenomenon, where retrieval accuracy degrades for documents further away in the hierarchy. To address this, we introduce a pretrain-finetune recipe that significantly improves long-distance retrieval without sacrificing performance on closer documents. We experiment on a realistic hierarchy from WordNet for retrieving documents at various levels of abstraction, and show that pretrain-finetune boosts the recall on long-distance pairs from 19% to 76%. Finally, we demonstrate that our method improves retrieval of relevant products on a shopping queries dataset.
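The core feasibility claim, that inner products in a dual encoder can express ancestor retrieval, can be illustrated with a deliberately simplified embedding. In the toy below (my construction, not the paper's, which achieves dimension linear in depth and logarithmic in document count; here the dimension is simply the edge count), each node is embedded as the indicator of the edges on its root path, so a query's inner product with a document equals their shared-prefix length:

```python
import numpy as np

def path_embedding(path, num_edges):
    """Indicator vector over the edge ids on the root-to-node path."""
    v = np.zeros(num_edges)
    v[list(path)] = 1.0
    return v

# A tiny tree with globally unique edge ids 0..5.
num_edges = 6
docs = {
    "root": (), "a": (0,), "a/b": (0, 1), "a/b/c": (0, 1, 2),
    "x": (3,), "x/y": (3, 4),
}

query = docs["a/b/c"]
q = path_embedding(query, num_edges)

# score(q, d) = |shared prefix|; d is an ancestor-or-self of the query
# exactly when the score covers d's entire path.
scores = {name: q @ path_embedding(p, num_edges) for name, p in docs.items()}
retrieved = {name for name, s in scores.items() if s >= len(docs[name])}
ancestors = {name for name, p in docs.items() if p == query[:len(p)]}
print(sorted(retrieved))
```

The lost-in-the-long-distance phenomenon concerns the learned analogue of this geometry: trained DEs recover nearby ancestors well but degrade on distant ones, which the pretrain-finetune recipe addresses.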


It shocked the market but has China's DeepSeek changed AI?

BBC News

DeepSeek's arrival also marked a turning point in the US-China AI rivalry, some experts say. "China was seen as playing catch-up in large language models until this point, with competitive models but always trailing the best western ones," policy analyst Wendy Chang of the Mercator Institute for China Studies told the BBC. A large language model (LLM) is a system trained to predict the next word in a given sentence or phrase. DeepSeek changed perceptions when it claimed to have achieved a leading model for a fraction of the computational resources and costs common among its American counterparts. OpenAI had spent $5bn (£3.7bn) in 2024 alone.