wonderland
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet
Zhao, James Xu, Hooi, Bryan, Ng, See-Kiong
Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains, and has shown strong performance across many domains. However, in this work, we show that this approach is not yet effective for knowledge-intensive tasks, where high factual accuracy and low hallucination rates are essential. We conduct a comprehensive evaluation of test-time scaling using 12 reasoning models on two knowledge-intensive benchmarks. Our results reveal that increasing test-time computation does not consistently improve accuracy and, in many cases, it even leads to more hallucinations. We then analyze how extended reasoning affects hallucination behavior. We find that reduced hallucinations often result from the model choosing to abstain after thinking more, rather than from improved factual recall. Conversely, for some models, longer reasoning encourages attempts on previously unanswered questions, many of which result in hallucinations. Case studies show that extended reasoning can induce confirmation bias, leading to overconfident hallucinations. Despite these limitations, we observe that compared to non-thinking, enabling thinking remains beneficial. Code and data are available at https://github.com/XuZhao0/tts-knowledge
- North America > Canada > Ontario > Toronto (0.15)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom (0.14)
Story Grammar Semantic Matching for Literary Study
Swenor, Abigail, Coffee, Neil, Scheirer, Walter
In Natural Language Processing (NLP), semantic matching algorithms have traditionally relied on the feature of word co-occurrence to measure semantic similarity. While this feature approach has proven valuable in many contexts, its simplistic nature limits its analytical and explanatory power when used to understand literary texts. To address these limitations, we propose a more transparent approach that makes use of story structure and related elements. Using a BERT language model pipeline, we label prose and epic poetry with story element labels and perform semantic matching by only considering these labels as features. This new method, Story Grammar Semantic Matching, guides literary scholars to allusions and other semantic similarities across texts in a way that allows for characterizing patterns and literary technique.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York (0.04)
Jurors must search for truth in the 'Alice in Wonderland' case against Trump
As former President Donald Trump awaits a Manhattan jury's verdict, he can be forgiven for feeling that his criminal trial resembles a surreal "Alice in Wonderland" farce. He is left to peer through a "Looking-Glass" where everything is backward. The culprit for this hallucinatory nightmare is District Attorney Alvin Bragg, who brought a bizarre case based on warped interpretations of law and distorted facts. It is now up to twelve jurors to wade through the lunacy in search of the elusive truth. Bragg's fractured case requires the jury to reach several distinct conclusions on issues that make little sense to begin with.
- Law (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (0.53)
Branching Narratives: Character Decision Points Detection
This paper presents the Character Decision Points Detection (CHADPOD) task: identifying points within narratives where characters make decisions that may significantly influence the story's direction. We propose a novel dataset based on the graphs of Choose Your Own Adventure (a registered trademark of Chooseco LLC) games to be used as a benchmark for this task. We provide a comparative analysis of different models' performance on this task, including a couple of LLMs and several MLMs as baselines, achieving up to 89% accuracy. This underscores the complexity of narrative analysis, showing the challenges associated with understanding character-driven story dynamics. Additionally, we show how such a model can be applied to existing text to produce linear segments divided by potential branching points, demonstrating the practical application of our findings in narrative analysis.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Law > Intellectual Property & Technology Law (0.55)
- Leisure & Entertainment > Games > Computer Games (0.46)
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
Yan, An, Yang, Zhengyuan, Zhu, Wanrong, Lin, Kevin, Li, Linjie, Wang, Jianfeng, Yang, Jianwei, Zhong, Yiwu, McAuley, Julian, Gao, Jianfeng, Liu, Zicheng, Wang, Lijuan
We present MM-Navigator, a GPT-4V-based agent for the smartphone graphical user interface (GUI) navigation task. MM-Navigator can interact with a smartphone screen as human users do, and determine subsequent actions to fulfill given instructions. Our findings demonstrate that large multimodal models (LMMs), specifically GPT-4V, excel in zero-shot GUI navigation through their advanced screen interpretation, action reasoning, and precise action localization capabilities. We first benchmark MM-Navigator on our collected iOS screen dataset. According to human assessments, the system exhibited a 91% accuracy rate in generating reasonable action descriptions and a 75% accuracy rate in executing the correct actions for single-step instructions on iOS. Additionally, we evaluate the model on a subset of an Android screen navigation dataset, where the model outperforms previous GUI navigators in a zero-shot fashion. Our benchmark and detailed analyses aim to lay a robust groundwork for future research into the GUI navigation task. The project page is at https://github.com/zzxslp/MM-Navigator.
- Information Technology > Graphics (1.00)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.80)
'Tiny Tina's Wonderlands' nails the feeling of playing Dungeons & Dragons, warts and all
The gameplay is traditional Borderlands. You level up by killing enemies with your weapons, and the stronger the enemy, the more experience you get. As you go through the cities and caves in the game, killing powerful enemies and finding specific chests grants you better loot. Where the mainline Borderlands games have shields, "Wonderlands" re-skins them into "wards" to fit the fantasy theme. Instead of class mods and buff items, you have magic armor and rings that increase your class skills and give you benefits like health regeneration and flat damage buffs.
A Wild Snark descends into the Metaverse by Wild Snark
A wild snark descends into the metaverse: is it heaven or hell? He makes this perilous journey to rescue Alice. However, she does not need rescuing. She has now realised that the metaverse and wonderland are the same place. She is happy now, as are the white rabbits, who are in fact blue. One of the last few snarks that live in the wild. The other snarks have either been hunted by humans or domesticated.
58 Ways to Visualize Alice in Wonderland (+10 more)
How many ways are there to visualize a book? And, yes, there are websites showing how academics visualize text. But what happens out in the wild? Ever so curious, I decided to find out. To come up with some kind of method to search broadly, I picked one book, Lewis Carroll's Alice's Adventures in Wonderland, and decided to find all the possible visualizations that might pop up on Google/Bing text search, image search, and scholar search.
Bayes in Wonderland! Predictive supervised classification inference hits unpredictability
Amiryousefi, Ali, Kinnula, Ville, Tang, Jing
The marginal Bayesian predictive classifiers (mBpc), as opposed to the simultaneous Bayesian predictive classifiers (sBpc), handle each data point separately and hence tacitly assume the independence of the observations. However, due to saturation in the learning of generative model parameters, the adverse effect of this false assumption on the accuracy of mBpc tends to wear out in the face of an increasing amount of training data, guaranteeing the convergence of these two classifiers under de Finetti type exchangeability. This result, however, is far from trivial for sequences generated under Partition Exchangeability (PE), where even a vast amount of training data does not rule out the possibility of an unobserved outcome (Wonderland!). We provide a computational scheme that allows the generation of sequences under PE. Based on that, with a controlled increase of the training data, we show the convergence of the sBpc and mBpc. This supports the use of simpler and computationally more efficient marginal classifiers instead of simultaneous ones. We also provide a parameter estimation of the generative model giving rise to the partition exchangeable sequence, as well as a testing paradigm for the equality of this parameter across different samples. The package for Bayesian predictive supervised classification, parameter estimation, and hypothesis testing of the Ewens sampling formula generative model is deposited on CRAN as the PEkit package and is freely available from https://github.com/AmiryousefiLab/PEkit.
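To make the idea of a partition exchangeable sequence concrete, here is a minimal Python sketch. It assumes the standard Chinese restaurant process (Hoppe urn) construction, which is a well-known way to draw sequences whose partition follows the Ewens sampling formula. It is not taken from the abstract or from PEkit, and the function names `sample_partition_exchangeable` and `new_label_probability` and the parameter name `theta` are hypothetical, chosen only for illustration.

```python
import random


def sample_partition_exchangeable(n, theta, seed=None):
    """Draw n labels with the Chinese restaurant process (assumed construction).

    theta > 0 is the dispersal parameter of the Ewens sampling formula.
    Labels are integers assigned in order of first appearance.
    """
    if n < 0 or theta <= 0:
        raise ValueError("n must be non-negative and theta must be positive")
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of times label k has been drawn so far
    for i in range(n):
        # After i draws, a previously unseen label appears
        # with probability theta / (theta + i).
        if rng.random() < theta / (theta + i):
            counts.append(1)
            labels.append(len(counts) - 1)
        else:
            # Otherwise reuse an existing label, proportionally to its count.
            k = rng.choices(range(len(counts)), weights=counts)[0]
            counts[k] += 1
            labels.append(k)
    return labels


def new_label_probability(n, theta):
    """Probability that draw n + 1 is a label never seen in the first n draws."""
    return theta / (theta + n)
```

Under this construction, `new_label_probability(n, theta)` shrinks as `n` grows but never reaches zero, which illustrates the "Wonderland" effect described above: no finite amount of training data rules out an unobserved outcome.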
World of AI
Simply put, AI is described as any task performed by a program or a machine that requires the application of human-like intelligence to accomplish. It is a technical simulation, i.e., technology that uses complex algorithmic techniques to simulate the way neurons work in the human brain. Neurons are the basic unit of our nervous system. AI is a superset of machine learning, cognitive learning, deep learning, and reinforcement learning. ML is an algorithmic and statistical approach to approximating conclusions and predictions without direct human input.
- Leisure & Entertainment > Games > Chess (0.74)
- Health & Medicine > Therapeutic Area (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)