AITopics | Large Language Model

'Probably' doesn't mean the same thing to your AI as it does to you

AIHub, Apr-21-2026, 13:45:47 GMT

When a human says an event is "probable" or "likely," people generally have a shared, if fuzzy, understanding of what that means. But when an AI chatbot like ChatGPT uses the same word, it's not assessing the odds the way we do, my colleagues and I found. We recently published a study in the journal NPJ Complexity that suggests that, while large language model AIs excel at conversation, they often fail to align with humans when communicating uncertainty. The research focused on words of estimative probability, which include terms like "maybe," "probably" and "almost certain." By comparing how AI models and humans map these words to numerical percentages, we uncovered significant gaps between humans and large language models.

artificial intelligence, large language model, natural language, (16 more...)

AIHub

Country: North America > United States > California (0.15)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)

Identifying interactions at scale for LLMs

AIHub, Apr-21-2026, 13:37:46 GMT

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a step toward safer and more trustworthy AI. To achieve state-of-the-art performance, models synthesize complex feature relationships, find shared patterns from diverse training examples, and process information through highly interconnected internal components. In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale. We mask or remove specific segments of the input prompt and measure the resulting shift in the predictions.
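The masking step described in the summary above can be sketched as follows. This is a minimal illustration only, not SPEX or ProxySPEX themselves: `toy_model` is a hypothetical stand-in for a model's scalar prediction, `MASK` is an assumed placeholder token, and the pairwise score is a plain inclusion-exclusion difference chosen for clarity.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple

MASK = "[MASK]"  # assumed placeholder for a removed input segment


def toy_model(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a model's scalar prediction.

    'not' and 'bad' only matter together (an interaction);
    'great' contributes on its own (an individual effect).
    """
    score = 0.0
    if "not" in tokens and "bad" in tokens:
        score += 1.0
    if "great" in tokens:
        score += 0.5
    return score


def masked(tokens: Sequence[str], idxs: Set[int]) -> List[str]:
    """Replace the tokens at the given positions with MASK."""
    return [MASK if i in idxs else t for i, t in enumerate(tokens)]


def single_effects(model: Callable[[Sequence[str]], float],
                   tokens: Sequence[str]) -> Dict[int, float]:
    """Shift in the prediction when each token is masked on its own."""
    base = model(tokens)
    return {i: base - model(masked(tokens, {i})) for i in range(len(tokens))}


def pairwise_interactions(model: Callable[[Sequence[str]], float],
                          tokens: Sequence[str]) -> Dict[Tuple[int, int], float]:
    """Inclusion-exclusion score: nonzero when two tokens matter jointly."""
    base = model(tokens)
    scores = {}
    for i, j in combinations(range(len(tokens)), 2):
        scores[(i, j)] = (base
                          - model(masked(tokens, {i}))
                          - model(masked(tokens, {j}))
                          + model(masked(tokens, {i, j})))
    return scores
```

Enumerating every pair costs a number of model calls that grows quadratically with the input length, and higher-order interactions grow faster still, which is the scaling problem the post says SPEX and ProxySPEX are designed to address.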

large language model, machine learning, natural language, (18 more...)

AIHub

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Interview with Sukanya Mandal: Synthesizing multi-modal knowledge graphs for smart city intelligence

AIHub, Apr-21-2026, 13:37:43 GMT

In the paper LLMasMMKG: LLM Assisted Synthetic Multi-Modal Knowledge Graph Creation For Smart City Cognitive Digital Twins, published in the AAAI Fall Symposium series, the authors introduced an approach that leverages large language models to automate the construction of synthetic multi-modal knowledge graphs specifically designed for a smart city cognitive digital twin. Here, Sukanya tells us more about cognitive digital twins, the framework they employed, and some key results. Could you start by introducing the idea of smart city cognitive digital twins and why this is an interesting area for study? Cities grow increasingly complex and interconnected, demanding sophisticated tools for management. A cognitive digital twin (CDT) serves as an AI-enabled virtual replica that models the dynamic interplay of physical and social systems, enabling simulations, predictions, and optimized operations.

artificial intelligence, large language model, natural language, (12 more...)

AIHub

Country: Europe > Ireland (0.05)

Genre: Personal > Interview (0.30)

Industry:

Health & Medicine (0.71)
Energy (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)

Japanet expands its VC fund after bets on Anthropic and xAI pay off

The Japan Times, Apr-21-2026, 08:30:00 GMT

Japanet is expanding its venture capital fund with Pegasus Tech Ventures, after early investments in firms like SpaceX, OpenAI, Anthropic and xAI showed strong growth. Japanese home shopping company Japanet is expanding its venture capital fund with San Jose-based Pegasus Tech Ventures, following the success of early bets in SpaceX, OpenAI, Anthropic and xAI. The Nagasaki-based retailer known for infomercials targeting seniors in aging Japan will allocate $200 million to the fund, up from an initial $50 million in 2021, following "significant growth" in investments so far, the companies said in a statement. The fund, of which Pegasus is general partner, will focus on areas such as generative AI, robotics and space technology. Its Japan portfolio includes startup Aillis, which seeks to use artificial intelligence to analyze medical scans. Asian companies have struggled to win stakes in promising startups in Silicon Valley, hampered by a lack of personal connections and a reputation for slow decision-making. Pegasus also manages startup investments on behalf of Toyota Motor-affiliate Aisin, Japanese chemical maker Denka, Taiwan's Asustek Computer and Acer and Indonesia's pharma company Kalbe Farma. "Everybody wants a piece of the Silicon Valley AI action," Pegasus Chief Executive Officer Anis Uzzaman said on a video call.

large language model, machine learning, natural language, (16 more...)

The Japan Times

Country:

North America > United States > California (0.46)
Asia > Middle East > Iran (0.43)
Asia > Taiwan (0.26)
(7 more...)

Genre: Press Release (1.00)

Industry:

Automobiles & Trucks > Manufacturer (0.56)
Consumer Products & Services > Travel (0.54)
Information Technology > Services (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.77)
Information Technology > Communications > Social Media (0.77)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

Amazon to invest an additional $5 billion in Anthropic

The Japan Times, Apr-21-2026, 01:54:00 GMT

Anthropic was founded in 2021 by several former employees of OpenAI. Amazon is investing an additional $5 billion in Anthropic, and may inject $20 billion more over time, a deal that deepens the companies' ties in an increasingly competitive artificial intelligence industry. Anthropic, which makes the Claude chatbot and coding tool, plans to spend more than $100 billion over the next 10 years on Amazon's cloud technologies and chips, the companies said in a statement on Monday. Amazon shares gained about 3% on the news in extended trading. Amazon was already one of Anthropic's biggest backers, with prior investments totaling $8 billion.

artificial intelligence, large language model, natural language, (13 more...)

The Japan Times

Country:

Asia > Middle East > Iran (0.44)
Oceania > Australia (0.05)
North America > United States (0.05)
(5 more...)

Genre: Press Release (0.37)

Industry:

Information Technology > Services (0.33)
Government (0.33)
Media > News (0.31)
Leisure & Entertainment (0.31)

Technology:

Information Technology > Communications > Social Media (0.79)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.56)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)

Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

Davidov, Hen, Cohen, Nachshon, Kalinsky, Oren, Fairstein, Yaron, Kushilevitz, Guy, Yazdi, Ram, Rebeschini, Patrick

arXiv.org Machine Learning, Apr-21-2026

Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
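The threshold rule stated in the abstract (abstain once the value function falls below the abstention reward) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the per-token value estimates are taken as given, the function name `first_abstention_step` is hypothetical, and the paper's own method for approximating the value function is not reproduced here.

```python
from typing import Optional, Sequence


def first_abstention_step(values: Sequence[float],
                          abstain_reward: float) -> Optional[int]:
    """Return the first token position whose estimated value is below
    abstain_reward, or None if generation should run to completion.

    values[t] is an assumed estimate of the expected final reward
    given the reasoning trace up to token t.
    """
    for t, value in enumerate(values):
        if value < abstain_reward:
            return t  # stop generating here and abstain
    return None
```

With estimates `[0.8, 0.6, 0.3, 0.7]` and an abstention reward of 0.5, the rule stops at position 2, so the remaining tokens are never generated. A higher abstention reward stops earlier and saves more compute, at the cost of discarding traces that might have recovered.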

abstention, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2604.18419

Country:

Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Tight Sample Complexity Bounds for Best-Arm Identification Under Bounded Systematic Bias

Qian, Tianhao

arXiv.org Machine Learning, Apr-21-2026

As search depth increases in autonomous reasoning and embodied planning, the candidate action space expands exponentially, heavily taxing computational budgets. While heuristic pruning is a common countermeasure, it operates without formal safety guarantees when surrogate models (like LLMs) exhibit systematic evaluation biases. This paper frames the node expansion process as a localized Best-Arm Identification (BAI) problem over dynamic frontiers, subject to a bounded systematic bias $L$. By inverting the Lambert W function, we establish an additive sample complexity of $\mathcal{O}((\Delta-4L)^{-2})$, which indicates that safe node elimination is only feasible when the empirical reward gap exceeds $4L$. We complement this with an information-theoretic lower bound of $\Omega((\Delta-2L)^{-2})$ to confirm the structural limits of biased search. Subsequent evaluations on both synthetic trees and complex reasoning tasks demonstrate that adhering to this local safety boundary successfully preserves optimal trajectories while maximizing sample allocation efficiency.
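To see how the two rates in the abstract behave, the sketch below evaluates only their leading terms for a given reward gap and bias bound. It is an illustration under stated assumptions: the function names are hypothetical, and the constants and any logarithmic or confidence factors hidden by the asymptotic notation are omitted, so the outputs are relative scales and not sample counts.

```python
import math


def upper_bound_scale(gap: float, bias: float) -> float:
    """Leading term (gap - 4*bias)^-2 of the stated upper bound.

    Returns infinity when gap <= 4*bias, the regime in which the
    abstract says safe elimination is not feasible.
    """
    margin = gap - 4.0 * bias
    return math.inf if margin <= 0 else margin ** -2


def lower_bound_scale(gap: float, bias: float) -> float:
    """Leading term (gap - 2*bias)^-2 of the stated lower bound."""
    margin = gap - 2.0 * bias
    return math.inf if margin <= 0 else margin ** -2
```

For a gap of 0.5 and a bias bound of 0.05, the upper-bound term is about 11.1 and the lower-bound term is 6.25. With the same bias and a gap of 0.1, the upper-bound term is infinite because the gap no longer exceeds four times the bias.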

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Machine Learning

2604.14345

Country: Asia > China (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo

Markovic-Voronov, Jelena, Zhu, Wenhui, Long, Bo, Wang, Zhipeng, Gupta, Suyash, Behdin, Kayhan, Chen, Bee-Chung, Agarwal, Deepak

arXiv.org Machine Learning, Apr-21-2026

We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates resample-move updates with Metropolis-Hastings rejuvenation and supports block-wise generation, subsuming common decoding strategies such as temperature sampling and power-tempered objectives. Empirical results across three 7B models show significant gains. On code generation (HumanEval), our method improves base performance by up to 54.9% and surpasses the strongest sampling baselines by 9.1%-15.3%. On mathematical reasoning (MATH500), it achieves gains of up to 8.8%. Notably, it reaches 87.8% on HumanEval and 78.4% on MATH500 with Qwen2.5-7B, consistently outperforming the reinforcement learning method GRPO.
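The general idea of sampling from a reward-augmented sequence distribution with Sequential Monte Carlo can be sketched on a toy problem. This is a generic textbook-style particle filter, not the paper's algorithms: the two-token vocabulary, `base_probs`, `reward`, the potential `exp(beta * reward(prefix))`, and the resampling threshold are all assumptions made for illustration, and the lookahead variant and Metropolis-Hastings rejuvenation are not included.

```python
import math
import random
from typing import List, Sequence, Tuple

VOCAB = ["a", "b"]  # hypothetical two-token vocabulary


def base_probs(prefix: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for a model's next-token distribution."""
    return [0.7, 0.3]


def reward(prefix: Sequence[str]) -> float:
    """Hypothetical prefix reward: the number of 'b' tokens so far."""
    return float(sum(tok == "b" for tok in prefix))


def effective_sample_size(weights: Sequence[float]) -> float:
    """(sum w)^2 / sum w^2; equals the particle count for equal weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def smc_decode(num_particles: int, length: int, beta: float,
               rng: random.Random) -> Tuple[List[List[str]], List[float]]:
    """Weighted particles targeting p_model(x) * exp(beta * reward(x))."""
    particles: List[List[str]] = [[] for _ in range(num_particles)]
    weights = [1.0] * num_particles
    for _ in range(length):
        for i, prefix in enumerate(particles):
            old_potential = math.exp(beta * reward(prefix))
            # Propose the next token from the unmodified base model.
            token = rng.choices(VOCAB, weights=base_probs(prefix))[0]
            prefix.append(token)
            # Incremental weight: ratio of new to old reward potential.
            weights[i] *= math.exp(beta * reward(prefix)) / old_potential
        # Resample when the weights become too uneven.
        if effective_sample_size(weights) < num_particles / 2:
            chosen = rng.choices(particles, weights=weights, k=num_particles)
            particles = [list(p) for p in chosen]
            weights = [1.0] * num_particles
    return particles, weights
```

The incremental weights telescope, so each complete sequence ends up weighted by `exp(beta * reward(sequence))` relative to the base model, while the model's own probabilities are never modified. Calling `smc_decode(50, 6, 1.0, random.Random(0))` returns 50 six-token sequences with their weights.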

large language model, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2604.16453

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

FUSE: Ensembling Verifiers with Zero Labeled Data

Lee, Joonhyuk, Ma, Virginia, Zhao, Sarah, Nair, Yash, Spector, Asher, Cohen, Regev, Candès, Emmanuel J.

arXiv.org Machine Learning, Apr-21-2026

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground truth acquisition can be time-consuming and expensive. We introduce Fully Unsupervised Score Ensembling (FUSE), a method for improving verification quality by ensembling verifiers without access to ground truth correctness labels. The key idea behind FUSE is to control conditional dependencies between verifiers in a manner that improves the unsupervised performance of a class of spectral algorithms from the ensembling literature. Despite requiring zero ground truth labels, FUSE typically matches or improves upon semi-supervised alternatives in test-time scaling experiments with diverse sets of generator models, verifiers, and benchmarks. In particular, we validate our method on both conventional academic benchmarks such as GPQA Diamond and on frontier, unsaturated benchmarks such as Humanity's Last Exam and IMO Shortlist questions.

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2604.18547

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Asia > Middle East > Lebanon (0.04)
Asia > China (0.04)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Prior-Fitted Functional Flow: In-Context Generative Models for Pharmacokinetics

Ojeda, César, Hartung, Niklas, Huisinga, Wilhelm, Jahn, Tim, Kavwele, Purity Kamene, Klose, Marian, Kumar, Piyush, Sánchez, Ramsés J., Faroughy, Darius A.

arXiv.org Machine Learning, Apr-21-2026

We introduce Prior-Fitted Functional Flows, a generative foundation model for pharmacokinetics that enables zero-shot population synthesis and individual forecasting without manual parameter tuning. We learn functional vector fields, explicitly conditioned on the sparse, irregular data of an entire study population. This enables the generation of coherent virtual cohorts as well as forecasting of partially observed patient trajectories with calibrated uncertainty. We construct a new open-access literature corpus to inform our priors, and demonstrate state-of-the-art predictive accuracy on extensive real-world datasets.

large language model, machine learning, trajectory, (20 more...)

arXiv.org Machine Learning

2604.1767

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)
Europe > Germany (0.05)

Genre: Research Report > Experimental Study (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
