d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning
Zhao, Siyan, Gupta, Devaansh, Zheng, Qinqing, Grover, Aditya
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated within the left-to-right autoregressive (AR) generation paradigm. In contrast, non-autoregressive paradigms based on diffusion generate text in a coarse-to-fine manner. Although recent diffusion-based large language models (dLLMs) have achieved competitive language modeling performance compared to their AR counterparts, it remains unclear if dLLMs can also leverage recent advances in LLM reasoning. To this end, we propose d1, a framework to adapt pre-trained masked dLLMs into reasoning models via a combination of supervised finetuning (SFT) and RL. Specifically, we develop and extend techniques to improve reasoning in pretrained dLLMs: (a) we utilize a masked SFT technique to distill knowledge and instill self-improvement behavior directly from existing datasets, and (b) we introduce a novel critic-free, policy-gradient-based RL algorithm called diffu-GRPO, the first integration of policy gradient methods into masked dLLMs. Through empirical studies, we investigate the performance of different post-training recipes on multiple mathematical and planning benchmarks. We find that d1 yields the best performance and significantly improves performance of a state-of-the-art dLLM. Our code is released at https://dllm-reasoning.github.io/.
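The abstract's diffu-GRPO is described as critic-free and policy-gradient-based, in the GRPO family. A minimal sketch of the group-relative advantage that makes such algorithms critic-free is below; the function name and details are illustrative assumptions, not the authors' implementation of diffu-GRPO itself.

```python
# Hypothetical sketch: GRPO-style group-relative advantages. Each completion's
# reward is normalized against the other completions sampled for the same
# prompt, replacing a learned value-function (critic) baseline.

def group_relative_advantages(rewards):
    """Return per-completion advantages normalized within the sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All completions tied: no relative signal, hence zero gradient.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

Each completion's log-probability gradient would then be weighted by its advantage, so above-average completions in the group are reinforced and below-average ones suppressed.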
Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
Payoungkhamdee, Patomporn, Tuchinda, Pume, Baek, Jinheon, Cahyawijaya, Samuel, Udomcharoenchaikit, Can, Manakul, Potsawee, Limkonchotiwat, Peerat, Chuangsuwanich, Ekapol, Nutanong, Sarana
Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting its potential as a test-time performance improvement heuristic.
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Asia > Singapore (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (3 more...)
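The PoT setup in the abstract above separates reasoning (the model writes a program) from execution (an interpreter runs it). A hedged toy illustration of that split, where the "generated" program is an invented stand-in for model output:

```python
# Illustrative Program-of-Thought execution step. The sandbox here is a toy
# (empty __builtins__), not production-safe isolation.

def execute_pot_program(program: str):
    """Run a model-generated program and return its `answer` variable."""
    namespace = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace.get("answer")

# Invented example of what a model might emit for: "A shop sells 12 apples
# a day for 7 days. How many apples in total?"
generated = "apples_per_day = 12\ndays = 7\nanswer = apples_per_day * days"
```

Because the arithmetic is delegated to the interpreter, the model's burden in a non-English setting reduces to mapping the question into correct code, which is exactly the alignment the paper evaluates.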
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Xie, Yuxi, Goyal, Anirudh, Wu, Xiaobao, Yin, Xunjian, Xu, Xiao, Kan, Min-Yen, Pan, Liangming, Wang, William Yang
Iterative refinement has emerged as an effective paradigm for enhancing the capabilities of large language models (LLMs) on complex tasks. However, existing approaches typically implement iterative refinement at the application or prompting level, relying on autoregressive (AR) modeling. The sequential token generation in AR models can lead to high inference latency. To overcome these challenges, we propose Context-Wise Order-Agnostic Language Modeling (COrAL), which incorporates iterative refinement directly into the LLM architecture while maintaining computational efficiency. Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process. Leveraging the order-agnostic nature of COrAL, we introduce sliding blockwise order-agnostic decoding, which performs multi-token forward prediction and backward reconstruction within context windows. This allows the model to iteratively refine its outputs in parallel in the sliding block, effectively capturing diverse dependencies without the high inference cost of sequential generation. Empirical evaluations on reasoning tasks demonstrate that COrAL improves both performance and inference speed, achieving absolute accuracy gains of $4.6\%$ on GSM8K and $4.0\%$ on LogiQA, along with inference speedups of up to $3.9\times$ over next-token baselines. Preliminary results on code generation indicate a drop in pass rates due to inconsistencies in order-agnostic outputs, highlighting the inherent quality--speed trade-off. Our code is publicly available at https://github.com/YuxiXie/COrAL.
- Europe > Austria > Vienna (0.15)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- (20 more...)
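The COrAL abstract describes decoding over overlapping sliding blocks, with forward prediction and backward reconstruction inside each window. A schematic sketch of such a window schedule follows; the block size, stride, and function name are illustrative assumptions, not the paper's actual hyperparameters or code.

```python
# Hypothetical sliding-block schedule: overlapping (start, end) windows over a
# sequence. Within each window a COrAL-style model would predict later tokens
# and re-check (refine) earlier ones in parallel.

def sliding_blocks(seq_len: int, block: int, stride: int):
    """Yield (start, end) index pairs for overlapping refinement windows."""
    start = 0
    while start < seq_len:
        yield (start, min(start + block, seq_len))
        start += stride
```

With a stride smaller than the block size, consecutive windows overlap, so each position is visited more than once and can be revised after its initial prediction.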
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
Xie, Yuxi, Goyal, Anirudh, Zheng, Wenyue, Kan, Min-Yen, Lillicrap, Timothy P., Kawaguchi, Kenji, Shieh, Michael
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the importance of using on-policy sampled data for successful self-improving. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and ARC-C, with substantial increases in accuracy to $81.8\%$ (+$5.9\%$), $34.7\%$ (+$5.8\%$), and $76.4\%$ (+$15.8\%$), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains. Our code is publicly available at https://github.com/YuxiXie/MCTS-DPO.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (10 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
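The MCTS paper above optimizes step-level preference pairs with Direct Preference Optimization (DPO). As a reference point, here is the standard DPO loss on a single (chosen, rejected) pair over scalar log-probabilities; the beta value and variable names are generic, not the paper's exact configuration.

```python
# Standard DPO loss: -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r))),
# where ref_* are log-probs under a frozen reference policy.
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Loss is small when the policy prefers the chosen response more than the
    reference does, and large when it prefers the rejected one."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In the paper's setting, the (chosen, rejected) pairs come from MCTS rollouts scored by outcome validation and stepwise self-evaluation, rather than from human annotation.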
Tab-CoT: Zero-shot Tabular Chain of Thought
Chain-of-thought (CoT) prompting methods have been successful in various natural language processing (NLP) tasks thanks to their ability to unveil the underlying complex reasoning processes. Such reasoning processes typically exhibit implicitly structured steps. Recent efforts have also begun investigating methods to encourage more explicitly structured reasoning procedures to be captured. In this work, we propose Tab-CoT, a novel tabular-format CoT prompting method, which allows the complex reasoning process to be explicitly modelled in a highly structured manner. Despite its simplicity, we show that our approach is capable of performing reasoning across multiple dimensions (i.e., both rows and columns). We demonstrate our approach's strong zero-shot and few-shot capabilities through extensive experiments on a range of reasoning tasks.
- Asia > Singapore (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > Los Angeles County > Beverly Hills (0.04)
- (2 more...)
- Research Report (1.00)
- Workflow (0.68)
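Tab-CoT, described above, prompts the model to lay out its reasoning as a table rather than free-form text. A hedged sketch of how such a prompt might be assembled; the column header below follows a common step/subquestion/process/result layout and is an assumption, not necessarily the paper's exact template.

```python
# Hypothetical Tab-CoT-style prompt builder: the question is followed by a
# table header whose columns structure the model's step-by-step reasoning.

def tab_cot_prompt(question: str,
                   columns=("step", "subquestion", "process", "result")):
    """Return a zero-shot prompt asking the model to fill a reasoning table."""
    header = "|" + "|".join(columns) + "|"
    return f"{question}\n{header}"
```

The model then continues the table row by row, so each reasoning step is forced into the same explicit column structure, which is what enables reasoning "across both rows and columns."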
30 Great Deals at Best Buy, Target, and Other Amazon Prime Day Rivals (Updated)
Prime Day is nearly over, and while Amazon still has plenty of discounts, so do its competitors. We've gathered up corresponding deals from Walmart, Target, Best Buy, and other stores. You won't need a membership to shop these sales, but you should keep in mind that this is just the start of the holiday shopping season. Black Friday and Cyber Monday are just over a month away, and we'll be covering those sales, too. Note: We strike through items that sell out or rise in price as we update this guide.
- Information Technology > Artificial Intelligence (0.54)
- Information Technology > Communications > Mobile (0.31)
A new fleet of autonomous robots is now making one of the world's oldest foods
In the beginning, archaeologists believe, the first breads were created using some of the most rudimentary technologies in human history: fire and stone. In the region that now encompasses Jordan, one of the world's most ancient examples -- a flatbread vaguely resembling pita and made from wild cereal grains and water -- was cooked in large fireplaces using flat basalt stones, according to Reuters. The taste is "gritty and salty," Amaia Arranz-Otaegui, a University of Copenhagen postdoctoral researcher in archaeobotany, told the news service. "But it is a bit sweet, as well." More than 10,000 years later, bread has clearly evolved but, perhaps, not as dramatically as the technology being used to bake it.
- Europe > Denmark > Capital Region > Copenhagen (0.25)
- Asia > Middle East > Jordan (0.25)
- North America > United States > Nevada > Clark County > Las Vegas (0.05)
- North America > United States > Colorado (0.05)
Meet the BreadBot: Autonomous bread-making robot bakes 10 loaves every hour
It could be the best thing since sliced bread. A family-owned baking business is attempting to disrupt how you get your next loaf at the grocery store with the first fully automated bread-making machine. Called the BreadBot, the machine, which mixes, kneads and bakes bread in just 90 minutes, was unveiled by Washington-based Wilkinson Baking Company at the Consumer Electronics Show in Las Vegas. Up to 10 loaves are ready to pick up from a vending machine every hour, and its creators say the BreadBot does as good a job as a human baker - so good that the machines are expected to land in major grocery stores soon. BreadBot can make just about any kind of loaf you want, including whole wheat, nine grain, honey oat and rye.
- Retail (0.62)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.62)
- Semiconductors & Electronics (0.59)
How To Score A Summer Internship At SpaceX
But this year's going to be a little different. The aerospace engineering major is headed to SpaceX headquarters outside Los Angeles for her first summer internship. Andrews is SpaceX's 2017 'Brooke Owens Fellow' – one of 36 undergraduate women working in paid positions at aviation and space companies across the country. Other young ladies in the inaugural class of fellows will join up with brands including Orbital ATK, Blue Origin, and Virgin Orbit. Here are a few of the Georgia Tech junior's hottest tips on how to get in with the Mars-bound crew at SpaceX – the company promising to start up its space tourism program in 2018: Andrews started her engineering career early.
- Aerospace & Defense (1.00)
- Transportation > Air (0.59)