infinite loop


StyleBench: Evaluating thinking styles in Large Language Models

arXiv.org Artificial Intelligence

The effectiveness of Large Language Models (LLMs) is heavily influenced by the reasoning strategies, or styles of thought, employed in their prompts. However, the interplay between these reasoning styles, model architecture, and task type remains poorly understood. To address this, we introduce StyleBench, a comprehensive benchmark for systematically evaluating reasoning styles across diverse tasks and models. We assess five representative reasoning styles (Chain-of-Thought (CoT), Tree-of-Thought (ToT), Algorithm-of-Thought (AoT), Sketch-of-Thought (SoT), and Chain-of-Draft (CoD)) on five reasoning tasks, using 15 open-source models from major families (LLaMA, Qwen, Mistral, Gemma, GPT-OSS, Phi, and DeepSeek) ranging from 270M to 120B parameters. Our large-scale analysis reveals that no single style is universally optimal. We demonstrate that strategy efficacy is highly contingent on both model scale and task type: search-based methods (AoT, ToT) excel in open-ended problems but require large-scale models, while concise styles (SoT, CoD) achieve radical efficiency gains on well-defined tasks. Furthermore, we identify key behavioral patterns: smaller models frequently fail to follow output instructions and default to guessing, while reasoning robustness emerges as a function of scale. Our findings offer a crucial roadmap for selecting optimal reasoning strategies based on specific constraints. We open-source the benchmark at https://github.com/JamesJunyuGuo/Style_Bench.

Large Language Models (LLMs) have demonstrated impressive capabilities across a diverse range of tasks, including mathematical reasoning, code generation, and complex question answering (Imani et al., 2023; Wang & Chen, 2023; Tan et al., 2023). A key insight from prior work is that their performance on challenging problems is not merely a function of scale, but is critically dependent on the methods used to guide reasoning (Huang & Yang, 2025).
This has spurred the development of sophisticated prompting techniques designed to structure the model's internal reasoning process. Notable among these are Chain-of-Thought (CoT) (Wei et al., 2022), which decomposes problems into sequential steps, and more advanced paradigms such as Tree-of-Thought (ToT) (Yao et al., 2023), which explores multiple reasoning paths in parallel, and ReasonFlux (Yang et al., 2025b), which employs high-level templates to explore potential solutions. Performance remains highly sensitive to prompt phrasing and frequently requires iterative feedback to achieve robust results (Sel et al., 2023). In response, recent work has sought to automate reasoning strategy selection.


Non-Termination Proving: 100 Million LoC and Beyond

arXiv.org Artificial Intelligence

We report on our tool, Pulse Infinite, that uses proof techniques to show non-termination (divergence) in large programs. Pulse Infinite works compositionally and under-approximately: the former supports scale, and the latter ensures soundness for proving divergence. Prior work focused on small benchmarks in the tens or hundreds of lines of code (LoC), and scale limits their practicality: a single company may have tens of millions, or even hundreds of millions of LoC or more. We report on applying Pulse Infinite to over a hundred million lines of open-source and proprietary software written in C, C++, and Hack, identifying over 30 previously unknown issues, establishing a new state of the art for detecting divergence in real-world codebases.
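The abstract does not describe Pulse Infinite's proof techniques, so the following is only a much simpler illustration of what "under-approximate, sound for divergence" can mean, not the tool's method. For a deterministic loop over a finite, hashable state, revisiting a state while the guard still holds proves the loop never exits from that input. The check is under-approximate: it can miss real divergence (a counter that grows forever never repeats a state), and in that case it answers "inconclusive" rather than guessing. The function `proves_divergence` and its three-valued result are illustrative assumptions.

```python
from typing import Callable, Hashable, Optional


def proves_divergence(
    guard: Callable[[Hashable], bool],
    body: Callable[[Hashable], Hashable],
    state: Hashable,
    max_steps: int = 10_000,
) -> Optional[bool]:
    """True: divergence proved. False: the loop exits from this state. None: inconclusive."""
    seen = set()
    for _ in range(max_steps):
        if not guard(state):
            return False  # the loop exits on this run
        if state in seen:
            return True  # a deterministic body revisited a state, so it cycles forever
        seen.add(state)
        state = body(state)
    return None  # no exit and no repeated state within the step budget


# while x != 0: x = (x + 2) % 10
print(proves_divergence(lambda x: x != 0, lambda x: (x + 2) % 10, 1))  # True: odd x never reaches 0
print(proves_divergence(lambda x: x != 0, lambda x: (x + 2) % 10, 2))  # False: 2, 4, 6, 8, 0
# while x > 0: x += 1  (diverges, but no state ever repeats)
print(proves_divergence(lambda x: x > 0, lambda x: x + 1, 1, max_steps=100))  # None
```

Reporting "inconclusive" instead of "terminates" is what keeps the check sound: it only ever claims divergence when it has an actual witness (a lasso-shaped run).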


Tinkering with Monte Carlo Method in Reinforcement Learning

#artificialintelligence

Monte Carlo, Dynamic Programming, and Temporal Difference learning are the main methods beginners encounter in Reinforcement Learning. First, a brief reminder of what the Monte Carlo method is. Monte Carlo generates paths (each of which constitutes an episode) by following the current policy, which usually balances exploration and exploitation (for example, epsilon-greedy), until the path reaches a terminal state. Once that state is reached, the algorithm walks back through the path and assigns to each state the discounted sum of the rewards received from that point until the end of the episode. These values (the discounted returns) are averaged with any values previously recorded for those states.
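A minimal Python sketch of the two pieces described above, assuming states are hashable and an episode is given as parallel lists of visited states and the reward received after leaving each one: an epsilon-greedy action choice, and the backward pass that computes discounted returns and averages them per state (every-visit averaging). The names `epsilon_greedy`, `discounted_returns` and `MonteCarloValues` are illustrative, not taken from the article.

```python
import random
from collections import defaultdict


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def discounted_returns(rewards, gamma):
    """Walk the episode backwards: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


class MonteCarloValues:
    """Running average of the returns observed from each state (every-visit MC)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, states, rewards, gamma):
        for state, g in zip(states, discounted_returns(rewards, gamma)):
            self.total[state] += g
            self.count[state] += 1

    def value(self, state):
        n = self.count[state]
        return self.total[state] / n if n else 0.0


# Usage: two episodes through a tiny chain A -> B -> terminal.
values = MonteCarloValues()
values.update(["A", "B"], [0.0, 1.0], gamma=0.9)  # returns: A = 0.9, B = 1.0
values.update(["A", "B"], [0.0, 0.0], gamma=0.9)  # returns: A = 0.0, B = 0.0
print(values.value("A"), values.value("B"))
```

Keeping a running total and count per state is one simple way to do the averaging; an incremental mean (`V += (G - V) / n`) gives the same result without storing totals.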


This AI Creates Realistic Animated Looping Videos from Static Images

#artificialintelligence

I explain Artificial Intelligence terms and news to non-experts. This model takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while leaving the rest of the picture entirely unchanged. The end result is strikingly realistic videos generated from still pictures alone. Read the full article: https://www.louisbouchard.ai/animate-pictures/ Have you ever taken a beautiful landscape picture and later noticed that it didn't capture the water flowing or the smoke dispersing in the air?


Agents in Artificial Intelligence

#artificialintelligence

Artificial intelligence is the study of rational agents. A rational agent can be anything that makes decisions: a program, a machine, or a person. An agent carries out the actions that give the best outcome based on its past and present percepts. An AI system consists of an agent and the environment in which the agent performs actions, and there can be many agents in the same environment.
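To make the agent/environment split concrete, here is a small Python sketch of the classic two-square vacuum world often used in AI textbooks. The toy environment, the action names and the `run` loop are illustrative assumptions, not something described in the original post. The agent records its percept history, although this simple reflex rule only acts on the current percept.

```python
class VacuumWorld:
    """Toy environment: two squares, A and B, each either dirty or clean."""

    def __init__(self, dirt, location="A"):
        self.dirt = dict(dirt)  # e.g. {"A": True, "B": True}
        self.location = location

    def percept(self):
        return (self.location, self.dirt[self.location])

    def execute(self, action):
        if action == "Suck":
            self.dirt[self.location] = False
        elif action == "Right":
            self.location = "B"
        elif action == "Left":
            self.location = "A"


class ReflexVacuumAgent:
    """Chooses an action from the current percept; keeps the full history for inspection."""

    def __init__(self):
        self.percepts = []

    def act(self, percept):
        self.percepts.append(percept)
        location, dirty = percept
        if dirty:
            return "Suck"
        return "Right" if location == "A" else "Left"


def run(env, agent, steps):
    """Agent-environment loop: perceive, decide, act."""
    actions = []
    for _ in range(steps):
        action = agent.act(env.percept())
        env.execute(action)
        actions.append(action)
    return actions


world = VacuumWorld({"A": True, "B": True})
agent = ReflexVacuumAgent()
print(run(world, agent, steps=4))  # ['Suck', 'Right', 'Suck', 'Left']
```

An agent that actually used `self.percepts` (for example, stopping once it has seen both squares clean) would be one step closer to acting on past as well as present percepts.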


Stochastic Hill Climbing in Python from Scratch - DLTK.AI

#artificialintelligence

Stochastic hill climbing is an optimization algorithm that makes use of randomness as part of the search process. This makes the algorithm appropriate for nonlinear objective functions where other local search algorithms do not perform well. It is also a local search algorithm, meaning that it modifies a single solution and searches the relatively local area of the search space until a local optimum is located. This means that it is appropriate for unimodal optimization problems or for use after the application of a global optimization algorithm.
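A minimal Python sketch, assuming a minimization problem over a box of real-valued bounds. It uses one common variant of stochastic hill climbing: each iteration perturbs the single current solution with Gaussian noise of scale `step_size` and keeps the candidate only if it is at least as good. (Other formulations instead pick randomly among improving neighbours.) The function name and parameters are illustrative, not taken from the DLTK.AI article.

```python
import random


def stochastic_hill_climb(objective, bounds, step_size, n_iter, seed=None):
    """Minimize `objective` starting from a random point inside `bounds`."""
    rng = random.Random(seed)
    current = [rng.uniform(lo, hi) for lo, hi in bounds]
    current_score = objective(current)
    for _ in range(n_iter):
        # Propose a nearby point: Gaussian noise on every coordinate.
        candidate = [x + rng.gauss(0.0, step_size) for x in current]
        # Keep the candidate inside the search box.
        candidate = [min(max(x, lo), hi) for x, (lo, hi) in zip(candidate, bounds)]
        score = objective(candidate)
        # Accept moves that are no worse (lower is better here).
        if score <= current_score:
            current, current_score = candidate, score
    return current, current_score


best, score = stochastic_hill_climb(
    lambda v: v[0] ** 2, [(-5.0, 5.0)], step_size=0.1, n_iter=1000, seed=1
)
print(best, score)
```

Accepting equal-scoring moves lets the search drift across flat regions. Because only one solution is kept, the search can still stall in a local optimum on multimodal functions, which is why the text recommends it for unimodal problems or for refining the output of a global optimizer.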


Generalized Planning with Positive and Negative Examples

arXiv.org Artificial Intelligence

Generalized planning aims at computing an algorithm-like structure (a generalized plan) that solves a set of multiple planning instances. In this paper we define negative examples for generalized planning as planning instances that must not be solved by a generalized plan. In this regard, the paper extends the notion of validation of a generalized plan to the problem of verifying that a given generalized plan solves the set of input positive instances while failing to solve a given input set of negative instances. This notion of plan validation allows us to define quantitative metrics to assess the generalization capacity of generalized plans. The paper also shows how to incorporate this new notion of plan validation into a compilation for plan synthesis that takes both positive and negative instances as input. Experiments show that incorporating negative examples can accelerate plan synthesis in several domains, and that the quantitative metrics can be leveraged to evaluate the generalization capacity of the synthesized plans.
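The validation notion above can be illustrated with a toy Python sketch. Everything here is a hypothetical stand-in, not the paper's formalism: an instance is a `(start, goal)` integer pair, a generalized plan is a policy returning `"inc"`, `"dec"` or `None` (halt), and the two ratios are illustrative metrics in the spirit of the quantitative ones the paper mentions. A plan counts as valid only if it solves every positive instance and none of the negative ones.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Instance:
    start: int
    goal: int


# Hypothetical plan interface: (current value, goal) -> "inc", "dec", or None to halt.
Plan = Callable[[int, int], Optional[str]]


def solves(plan: Plan, inst: Instance, max_steps: int = 1000) -> bool:
    x = inst.start
    for _ in range(max_steps):
        action = plan(x, inst.goal)
        if action is None:
            return x == inst.goal  # halted: success iff the goal was reached
        x += 1 if action == "inc" else -1
    return False  # step bound exceeded: count as unsolved


def validate(plan: Plan, positives: Sequence[Instance], negatives: Sequence[Instance]) -> dict:
    pos_solved = sum(solves(plan, i) for i in positives)
    neg_solved = sum(solves(plan, i) for i in negatives)
    return {
        "valid": pos_solved == len(positives) and neg_solved == 0,
        "positive_coverage": pos_solved / len(positives) if positives else 1.0,
        "negative_rejection": 1 - neg_solved / len(negatives) if negatives else 1.0,
    }


def count_up(x: int, goal: int) -> Optional[str]:
    return "inc" if x < goal else None


def move_toward(x: int, goal: int) -> Optional[str]:
    if x < goal:
        return "inc"
    if x > goal:
        return "dec"
    return None


positives = [Instance(0, 3), Instance(2, 5)]
negatives = [Instance(5, 2)]
print(validate(count_up, positives, negatives))
print(validate(move_toward, positives, negatives))
```

Here `count_up` passes validation, while the more general `move_toward` also solves the negative instance and is rejected: the negative example rules out a plan that generalizes further than intended.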


An Introduction to Computability Theory and Complexity All Essential Tech

#artificialintelligence

Have you ever wondered what exactly the device you are reading this article on is? Computational science dates back to a time long before these modern computing devices were even imagined. In an industry where the most frequently asked questions revolve around programming languages, frameworks, and libraries, we often take for granted the fundamental concepts that make a computer tick. But do these computers, which seem to possess endless potential, have any limitations? Are there problems that computers cannot be used to solve? In this article, we will address these questions by stepping away from the particulars of programming languages and computer architectures. By understanding the power and limitations of computers and algorithms, we can improve the way we think and better reason about different strategies. The abstract view of computing produces results that have stood the test of time, being as valuable to us today as they were when initially developed in the 1970s.


Watch: Viral Video Shows Amazon's Alexa Caught In An Infinite Loop

International Business Times

A YouTuber who goes by the name Tester Junkies uploaded a video Wednesday showing Amazon's Alexa voice assistant caught in an infinite loop, repeating the same phrase again and again. The user gave an Echo Dot speaker the voice command "Bark, Alexa, set a reminder for 1 second." After 1 second, Alexa responds, "Bark Alexa set a reminder for 1 second" and keeps repeating it. Essentially, the user has set a reminder for Alexa to set a reminder for itself. When Alexa asks, "What is the reminder for?", the user repeats the same command.


Watch Amazon's Echo Dot get stuck in an 'infinite loop' chatting to Google's Home

Daily Mail - Science & tech

The 'smart' speakers that won't stop talking to each other. Google's $130 Home speaker went on sale earlier this month, and Amazon's Alexa has been a huge hit with 5.1m sold. Both can do everything from controlling lights to answering questions. Google Home AI speaker (left) shows the incredible potential of a smart home assistant, but still has a little bit of learning to do before it becomes indispensable.