AITopics | prefix

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

Cesari, Tommaso, Colomboni, Roberto

arXiv.org Machine Learning, May-29-2026

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $\Delta_{\min}$, the known lower bound is of order $\frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}$, or equivalently, up to universal constants, of order \[ \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
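The block structure described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' algorithm: the function names, the uniform draw of the prefix length, the uniform choice of the first action, and the $\varepsilon/2$ score scaling (the standard exponential-mechanism scaling for scores with sensitivity 1) are all assumptions, and no regret or privacy guarantee is claimed for this sketch.

```python
import math
import random


def exponential_mechanism_probs(cum_losses, eps):
    """Selection probabilities proportional to exp(-eps * loss / 2).

    Assumption: each cumulative loss changes by at most 1 when one round
    changes, which gives the usual eps/2 scaling.
    """
    scores = [-0.5 * eps * loss for loss in cum_losses]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def run_blocks(losses, eps, rng):
    """Play one action per block; block sizes double (1, 2, 4, ...).

    losses: list of T rows, each a list of K losses in [0, 1].
    rng: a random.Random instance.
    Returns the list of actions played in each round.
    """
    horizon, num_actions = len(losses), len(losses[0])
    actions = []
    action = rng.randrange(num_actions)  # assumption: first action is uniform
    start, size = 0, 1
    while start < horizon:
        end = min(start + size, horizon)
        actions.extend([action] * (end - start))
        # Prefix length is drawn without looking at the losses
        # (hypothetical choice: uniform over 1..block length).
        m = rng.randint(1, end - start)
        prefix_losses = [
            sum(losses[t][k] for t in range(start, start + m))
            for k in range(num_actions)
        ]
        probs = exponential_mechanism_probs(prefix_losses, eps)
        action = rng.choices(range(num_actions), weights=probs)[0]
        start, size = end, 2 * size
    return actions
```

Calling `run_blocks(losses, eps, random.Random(0))` on a `T x K` loss table returns one action index per round, constant within each block.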

artificial intelligence, machine learning, privacy, (15 more...)

arXiv.org Machine Learning

2605.29148

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.92)

Conformal Certification of Reasoning Trace Prefixes

Cheung, Matt Y., Veeraraghavan, Ashok, Chen, Hanjie, Balakrishnan, Guha

arXiv.org Machine Learning, May-29-2026

Language model reasoning traces are rarely all-or-nothing; they frequently contain valid intermediate steps before a critical error occurs. Existing uncertainty quantification methods typically certify final answers or entire responses, failing to provide statistical guarantees for the proportion of a sequential trace that can be safely retained. To address this, we introduce CROP (Conformal Reasoning Output Prefixes), a verifier-agnostic calibration procedure for clean-prefix certification. Given any step-level risk proxy, CROP selects a calibrated threshold and returns the longest contiguous prefix whose step risk proxies remain below it, routing the uncertified suffix for downstream review or repair. Assuming exchangeability, CROP rigorously controls the marginal probability that the returned prefix contains an annotated error. Across six process-labeled reasoning datasets, we demonstrate that standard step-level metrics such as AUROC do not fully capture prefix utility, suggesting verifiers should instead be evaluated by certified prefix length. Furthermore, CROP balances over- and under-withholding, improving downstream repair accuracy by preserving valid intermediate reasoning while discarding misleading suffixes. Ultimately, this work positions prefix certification as a rigorous, practical bridge between process supervision, abstention, and repair.
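To make the idea of a calibrated threshold concrete, here is a minimal sketch using one standard split-conformal construction. It may differ from the paper's actual calibration rule: the function names, the per-trace score (the largest risk proxy up to and including the first annotated error), and the strict "below the threshold" convention are assumptions made for illustration.

```python
import math


def first_error_score(risks, first_error):
    """Largest step risk up to and including the first annotated error.

    A prefix cut at threshold tau contains that error exactly when this
    score is below tau. Traces with no annotated error get +inf.
    """
    if first_error is None:
        return math.inf
    return max(risks[: first_error + 1])


def calibrate_threshold(calib, alpha):
    """calib: list of (risks, first_error_index_or_None); 0 < alpha < 1.

    Returns the floor(alpha * (n + 1))-th smallest calibration score, or
    -inf (certify nothing) when that index is zero.
    """
    scores = sorted(first_error_score(r, e) for r, e in calib)
    k = math.floor(alpha * (len(scores) + 1))
    return -math.inf if k == 0 else scores[k - 1]


def certified_prefix_length(risks, tau):
    """Length of the longest initial run of steps with risk strictly below tau."""
    length = 0
    for r in risks:
        if r >= tau:
            break
        length += 1
    return length
```

Under exchangeability of calibration and test traces, the usual conformal rank argument bounds the probability that the test score falls below this threshold by `alpha`; steps after the returned length would be the suffix routed for review or repair.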

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2605.30085

Genre:

Workflow (0.93)
Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

Corielli, Francesco

arXiv.org Machine Learning, May-25-2026

Language models trained on observed sequences are often described as learning the conditional distribution of the next token given previous tokens. This description is only conditionally correct. A model trained on realized token trajectories does not observe full conditional laws; it receives sampled continuations. Moreover, real language generation is conditioned not only on previous words but also on non-textual circumstances: facts, events, intentions, goals, beliefs, social context, and task-specific constraints. This paper distinguishes three objects that are often conflated: the full conditional language process conditioned on latent circumstances, the marginal text-only process obtained by integrating those circumstances out, and the model-induced distribution learned from finite observed corpora. The paper argues that interpreting model training as estimating the marginal text-only law requires strong assumptions of stationarity, representativeness, and ergodicity, assumptions that are standard in statistical estimation but problematic when applied to heterogeneous language corpora. Even if these assumptions hold, the marginal text-only law is useful only when the observed prefix is an approximately sufficient statistic for the latent circumstances relevant to continuation. In information-theoretic terms, usefulness requires that the residual conditional mutual information between the next token and the omitted circumstances, given the observed text, be small. The paper then extends this argument to heterogeneous training corpora. Finally, the paper interprets Retrieval Augmented Generation (RAG) and tool use as conditional sufficiency devices.
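The sufficiency condition can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the abstract: $X_{1:t}$ is the observed prefix, $X_{t+1}$ the next token, $C$ the latent circumstances, and $\delta$ a hypothetical tolerance.

```latex
% Marginal text-only law: the circumstances are integrated out.
p(x_{t+1} \mid x_{1:t})
  = \int p(x_{t+1} \mid x_{1:t}, c)\, p(c \mid x_{1:t})\, dc

% Residual conditional mutual information, written as an expected KL divergence.
I(X_{t+1}; C \mid X_{1:t})
  = \mathbb{E}_{X_{1:t},\, C}\!\left[
      D_{\mathrm{KL}}\!\big( p(\cdot \mid X_{1:t}, C) \,\big\|\, p(\cdot \mid X_{1:t}) \big)
    \right]
  \le \delta
```

The quantity is zero exactly when the next token is conditionally independent of the circumstances given the prefix, which is the case in which the prefix is a sufficient statistic for continuation.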

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.23278

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Into the Single Cell Multiverse: an End-to-End Dataset for Procedural Knowledge Extraction in Biomedical Texts

Neural Information Processing Systems, Apr-25-2026, 22:19:22 GMT

Here we describe the additional details of FlaMBé's curation, including structured guidelines for each annotation task, corpus curation, and file assembly. All manual curation in FlaMBé was conducted by three annotators who have doctorate-level expertise in computational biology. For named entity tagging annotations, a set of structured guidelines was followed to ensure consistency. The guidelines given to reviewers are in the annotator guidelines section below.

B.1 Tissue and cell type entities

Generally, all terms, related synonyms, and text entities that can be mapped to an entry from the tissue, organ, body part, fluid, and cell type branches of the NCI Thesaurus were labeled. Instead of a rigid vocabulary fixed on exact matches of NCI Thesaurus (NCIT) terms and synonyms, annotators were encouraged to tag any word with the same meaning as an ontology term. For example, "Pancreatic ductal adenocarcinoma" describes cancer of the pancreas, which can be related back to the NCI Thesaurus, and thus was tagged as a "TISSUE". An initial set of rules was provided to each annotator. When one annotator encountered a corner case (e.g., "is neuron a tissue or cell type?"), all annotators discussed it, reached a consensus, and then added the corner case to the set of annotation rules.

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.54)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.34)

0cde695b83bd186c1fd456302888454c-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 13:29:47 GMT

artificial intelligence, intrinsic, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Cold-Start Forecasting of New Product Life-Cycles via Conditional Diffusion Models

Zhou, Ruihan, Zhang, Zishi, Han, Jinhui, Peng, Yijie, Zhang, Xiaowei

arXiv.org Machine Learning, Apr-23-2026

Forecasting the life-cycle trajectory of a newly launched product is important for launch planning, resource allocation, and early risk assessment. This task is especially difficult in the pre-launch and early post-launch phases, when product-specific outcome history is limited or unavailable, creating a cold-start problem. In these phases, firms must make decisions before demand patterns become reliably observable, while early signals are often sparse, noisy, and unstable. We propose the Conditional Diffusion Life-cycle Forecaster (CDLF), a conditional generative framework for forecasting new-product life-cycle trajectories under cold start. CDLF combines three sources of information: static descriptors, reference trajectories from similar products, and newly arriving observations when available. Here, static descriptors refer to structured pre-launch characteristics of the product, such as category, price tier, brand or organization identity, scale, and access conditions. This structure allows the model to condition forecasts on relevant product context and to update them adaptively over time without retraining, yielding flexible multi-modal predictive distributions under extreme data scarcity. The method satisfies consistency with a horizon-uniform distributional error bound for recursive generation. Across studies on Intel microprocessor stock keeping unit (SKU) life cycles and the platform-mediated adoption of open large language model repositories, CDLF delivers more accurate point forecasts and higher-quality probabilistic forecasts than classical diffusion models, Bayesian updating approaches, and other state-of-the-art machine-learning baselines.

forecasting, large language model, machine learning, (21 more...)

arXiv.org Machine Learning

2604.2037

Country: