

Toffee Crisp and Blue Riband can't be called chocolate any more

BBC News

Toffee Crisp and Blue Riband bars can no longer be called chocolate after maker Nestle changed their recipes. To be described as milk chocolate in the UK, a product needs at least 20% cocoa solids and 20% milk solids, a threshold each product fell below once a higher amount of cheaper vegetable fat was used. Nestle said the reformulations were needed due to higher input costs but had been carefully developed and sensory tested, and that there were no plans to alter the recipes of other chocolate products. As the cost of many ingredients, such as cocoa and butter, has risen, food companies have altered recipes to use less of the expensive ingredients and have shrunk serving sizes. Nestle now describes the treats as being encased in a smooth milk chocolate flavour coating rather than covered in milk chocolate.


Systematic Evaluation of Uncertainty Estimation Methods in Large Language Models

Hobelsberger, Christian, Winner, Theresa, Nawroth, Andreas, Mitevski, Oliver, Haensch, Anna-Carolina

arXiv.org Artificial Intelligence

Large language models (LLMs) produce outputs with varying levels of uncertainty and, just as often, varying levels of correctness, making their practical reliability far from guaranteed. To quantify this uncertainty, we systematically evaluate four approaches to confidence estimation in LLM outputs: VCE, MSP, Sample Consistency, and CoCoA (Vashurin et al., 2025). To evaluate these approaches, we conduct experiments on four question-answering tasks using a state-of-the-art open-source LLM. Our results show that each uncertainty metric captures a different facet of model confidence and that the hybrid CoCoA approach yields the best reliability overall, improving both calibration and discrimination of correct answers. We discuss the trade-offs of each method and provide recommendations for selecting uncertainty measures in LLM applications.
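Of the metrics listed above, MSP-style confidence is the simplest to sketch: it reduces an answer's per-token probabilities to a single score. The function name and input format below are illustrative assumptions, not the paper's implementation.

```python
import math

def msp_confidence(token_logprobs):
    """Length-normalized sequence probability: exp of the mean
    per-token log-probability of the generated answer."""
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

# High per-token probabilities yield a confidence near 1.0;
# low ones push the score toward 0.
confident = msp_confidence([-0.01, -0.02, -0.05])
uncertain = msp_confidence([-1.2, -0.9, -2.1])
```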


Towards Agents That Know When They Don't Know: Uncertainty as a Control Signal for Structured Reasoning

Stoisser, Josefa Lia, Martell, Marc Boubnovski, Phillips, Lawrence, Mazzoni, Gianluca, Harder, Lea Mørch, Torr, Philip, Ferkinghoff-Borg, Jesper, Martens, Kaspar, Fauqueur, Julien

arXiv.org Artificial Intelligence

Large language model (LLM) agents are increasingly deployed in structured biomedical data environments, yet they often produce fluent but overconfident outputs when reasoning over complex multi-table data. We introduce an uncertainty-aware agent for query-conditioned multi-table summarization that leverages two complementary signals: (i) retrieval uncertainty--entropy over multiple table-selection rollouts--and (ii) summary uncertainty--combining self-consistency and perplexity. Summary uncertainty is incorporated into reinforcement learning (RL) with Group Relative Policy Optimization (GRPO), while both retrieval and summary uncertainty guide inference-time filtering and support the construction of higher-quality synthetic datasets. On multi-omics benchmarks, our approach improves factuality and calibration, nearly tripling correct and useful claims per summary (3.0→8.4 internal; 3.6→9.9 cancer multi-omics) and substantially improving downstream survival prediction (C-index 0.32→0.63). These results demonstrate that uncertainty can serve as a control signal--enabling agents to abstain, communicate confidence, and become more reliable tools for complex structured-data environments.
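The retrieval-uncertainty signal described above (entropy over multiple table-selection rollouts) is a small computation once the rollouts are collected; the rollout representation below is a hypothetical simplification, not the paper's data format.

```python
import math
from collections import Counter

def retrieval_entropy(rollouts):
    """Shannon entropy of the empirical distribution over the
    table selections observed across repeated rollouts."""
    counts = Counter(rollouts)
    n = len(rollouts)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Perfect agreement across rollouts gives zero entropy;
# disagreement drives the entropy up.
low = retrieval_entropy([("genes", "samples")] * 5)
high = retrieval_entropy([("genes",), ("samples",), ("genes", "samples"),
                          ("clinical",), ("genes",)])
```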


CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs

Vashurin, Roman, Goloburda, Maiya, Nakov, Preslav, Shelmanov, Artem, Panov, Maxim

arXiv.org Artificial Intelligence

Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated through repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks such as question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
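The synthesis the abstract describes pairs a confidence signal with a consistency signal. A minimal sketch, assuming confidence comes from the answer's mean token log-probability and consistency from its mean semantic similarity to the other sampled outputs; the product form is an illustrative choice, not the paper's exact estimator.

```python
import math

def combined_score(mean_token_logprob, similarities_to_samples):
    """Combine model confidence (exp of the mean token log-prob of the
    chosen answer) with output consistency (the answer's mean semantic
    similarity to the other sampled outputs)."""
    confidence = math.exp(mean_token_logprob)
    consistency = sum(similarities_to_samples) / len(similarities_to_samples)
    return confidence * consistency

# A fully confident, fully consistent answer scores 1.0; weaker
# probabilities or disagreeing samples pull the score down.
full = combined_score(0.0, [1.0, 1.0, 1.0])
weaker = combined_score(-0.5, [0.8, 0.6])
```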


Cocoa: Co-Planning and Co-Execution with AI Agents

Feng, K. J. Kevin, Pu, Kevin, Latzke, Matt, August, Tal, Siangliulue, Pao, Bragg, Jonathan, Weld, Daniel S., Zhang, Amy X., Chang, Joseph Chee

arXiv.org Artificial Intelligence

We present Cocoa, a system that implements a novel interaction design pattern -- interactive plans -- for users to collaborate with an AI agent on complex, multi-step tasks in a document editor. Cocoa harmonizes human and AI efforts and enables flexible delegation of agency through two actions: Co-planning (where users collaboratively compose a plan of action with the agent) and Co-execution (where users collaboratively execute plan steps with the agent). Using scientific research as a sample domain, we motivate the design of Cocoa through a formative study with 9 researchers while also drawing inspiration from the design of computational notebooks. We evaluate Cocoa through a user study with 16 researchers and find that when compared to a strong chat baseline, Cocoa improved agent steerability without sacrificing ease of use. A deeper investigation of the general utility of both systems uncovered insights into usage contexts where interactive plans may be more appropriate than chat, and vice versa. Our work surfaces numerous practical implications and paves new paths for interactive interfaces that foster more effective collaboration between humans and agentic AI systems.


Reviews: COLA: Decentralized Linear Learning

Neural Information Processing Systems

This paper deals with learning linear models in a decentralized setting, where each node holds a subset of the dataset (features or data points, depending on the application) and communication can only occur between neighboring nodes in a connected network graph. The authors extend the CoCoA algorithm, originally designed for the distributed (master/slave) setting. They provide convergence rates as well as numerical comparisons. The authors should state more clearly that they are extending CoCoA to the decentralized setting. The adaptation of the setup, the local subproblems, and the algorithm itself is fairly direct: the information accessible by each node is restricted to its direct neighbors (instead of having access to information from all nodes).


Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

Song, Yeda, Lee, Dongwook, Kim, Gunhee

arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporating conservatism into the policy or the value function to safeguard against uncertainties and unknowns. In this work, we focus on achieving the same objectives of conservatism but from a different perspective. We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline RL, an approach that pursues conservatism in a compositional manner on top of the transductive reparameterization (Netanyahu et al., 2023), which decomposes the input variable (the state in our case) into an anchor and its difference from the original input. Our COCOA seeks both in-distribution anchors and differences by utilizing the learned reverse dynamics model, encouraging conservatism in the compositional input space for the policy or value function. Such compositional conservatism is independent of and agnostic to the prevalent behavioral conservatism in offline RL. We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm. The code is available at https://github.com/runamu/compositional-conservatism.


Counterfactual Data Augmentation with Contrastive Learning

Aloui, Ahmed, Dong, Juncheng, Le, Cat P., Tarokh, Vahid

arXiv.org Machine Learning

Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we use contrastive learning to learn a representation space and a similarity measure such that, in the learned representation space, individuals identified as close by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for individuals with close neighbors in the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between treatment groups while inducing minimal imputation error. The augmented dataset is then used to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models. One of the most significant challenges for Conditional Average Treatment Effect (CATE) estimation is the statistical disparity between distinct treatment groups (Goldsmith-Pinkham et al., 2022). While Randomized Controlled Trials (RCTs) mitigate this issue (Rubin, 1974; Imbens & Rubin, 2015), they can be expensive, unethical, and sometimes infeasible to conduct.
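The neighbor-based imputation step can be sketched as follows, assuming individuals are (representation, outcome) pairs and a learned similarity function is given; the closeness threshold and all names are illustrative, not the paper's exact procedure.

```python
def impute_counterfactual(individual, other_group, similarity, threshold):
    """Impute the counterfactual outcome of `individual` as the outcome
    of its most similar member of the alternative treatment group, but
    only when that neighbor is close enough to trust."""
    rep, _ = individual
    best = max(other_group, key=lambda rec: similarity(rep, rec[0]))
    if similarity(rep, best[0]) >= threshold:
        return best[1]   # reliable imputation: augment the dataset with it
    return None          # no close neighbor: leave this individual out

# Toy 1-D representations with negative distance as similarity:
sim = lambda a, b: -abs(a - b)
treated = [(0.1, 5.0), (0.9, 9.0)]
imputed = impute_counterfactual((0.12, 4.0), treated, sim, threshold=-0.1)
far = impute_counterfactual((0.5, 4.0), treated, sim, threshold=-0.1)
```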


Contrastive Corpus Attribution for Explaining Representations

Lin, Chris, Chen, Hugh, Kim, Chanwoo, Lee, Su-In

arXiv.org Artificial Intelligence

Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, the elements of which are not good candidates to explain because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot product-based similarity in the representation space to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP). Machine learning models based on deep neural networks are increasingly used in a diverse set of tasks including chess (Silver et al., 2018), protein folding (Jumper et al., 2021), and language translation (Jean et al., 2014). The majority of neural networks have many parameters, which impedes humans from understanding them (Lipton, 2018). To address this, many tools have been developed to understand supervised models in terms of their prediction (Lundberg & Lee, 2017; Wachter et al., 2017). 
In this supervised setting, the model maps features to labels (f: X → Y), and explanations aim to understand the model's prediction of a label of interest. These explanations are interpretable, because the label of interest (e.g., mortality, an image class) is meaningful to humans (Figure 1a). In contrast, models trained in unsupervised settings map features to representations (f: X → H). Unfortunately, the meaning of individual elements in the representation space is unknown in general.
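The contrastive corpus similarity defined above is a small computation: the mean similarity of the explicand's representation to a reference corpus minus its mean similarity to a foil set. The sketch below uses a plain dot product as the similarity; all names and toy values are illustrative.

```python
def contrastive_corpus_similarity(explicand, corpus, foil, sim):
    """Mean similarity to the corpus minus mean similarity to the foil;
    positive values mean the explicand sits closer to the corpus."""
    mean = lambda reps: sum(sim(explicand, r) for r in reps) / len(reps)
    return mean(corpus) - mean(foil)

dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
score = contrastive_corpus_similarity(
    (1.0, 0.0),
    corpus=[(0.9, 0.1), (1.0, 0.2)],  # samples resembling the explicand
    foil=[(0.0, 1.0), (-0.2, 0.9)],   # contrasting samples
    sim=dot,
)
```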


Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Kalischek, Nikolai, Lang, Nico, Renier, Cécile, Daudt, Rodrigo Caye, Addoah, Thomas, Thompson, William, Blaser-Hart, Wilma J., Garrett, Rachael, Schindler, Konrad, Wegner, Jan D.

arXiv.org Artificial Intelligence

In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of cocoa planted area are missing, hindering accurate quantification of expansion in protected areas, production and yields, and limiting information available for improved sustainability governance. Here, we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% and 13% of forest loss in protected areas in Côte d'Ivoire and Ghana, respectively, and that official reports substantially underestimate the planted area, by up to 40% in Ghana. These maps serve as a crucial building block to advance understanding of conservation and economic development in cocoa producing regions.