Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex

Shoaib, Muhammad, Riccomagno, Eva, Leonelli, Manuele, Varando, Gherardo

arXiv.org Machine Learning

Staged tree models enhance Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex, utilizing simplex-based divergences. We conduct a thorough evaluation of several distance and divergence metrics, including Total Variation, Hellinger, Fisher, and Kaniadakis, alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. Our simulation experiments reveal that Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance using the relative Bayesian Information Criterion (BIC) and the Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. On the other hand, Total Variation divergence with Ward.D2 linkage achieves similar performance with substantially better computational efficiency, making it a more viable option for large-scale or time-sensitive tasks.
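The core estimation step described above, clustering conditional probability distributions on the simplex with a Total Variation divergence and Ward-style linkage, can be sketched with SciPy. This is a minimal illustration, not the authors' implementation: the probability matrix and the stage count are hypothetical, and SciPy's `ward` method assumes Euclidean inputs, so running it on a precomputed TV distance matrix only approximates the Ward.D2 update used in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Rows: conditional probability distributions attached to tree vertices.
P = np.array([
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],   # close to row 0, so it should share a stage
    [0.10, 0.30, 0.60],
    [0.12, 0.28, 0.60],   # close to row 2, so it should share a stage
])

# Total Variation distance between discrete distributions p and q:
# TV(p, q) = 0.5 * sum_i |p_i - q_i|, i.e. half the cityblock (L1) distance.
tv = 0.5 * pdist(P, metric="cityblock")

# Ward-style agglomeration on the condensed TV distance matrix.
Z = linkage(tv, method="ward")

# Cut the dendrogram into two stages (the stage count is chosen by hand here;
# the paper selects structures via criteria such as relative BIC).
stages = fcluster(Z, t=2, criterion="maxclust")
print(stages)
```

Rows 0 and 1 end up in one stage and rows 2 and 3 in the other, merging vertices with nearly identical conditional distributions.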


Supplementary Information

Neural Information Processing Systems

The claim and evidence conflict pairs can be found at https://huggingface. The scope of our dataset is purely scientific research. Conflict Verification: ensuring that the default and conflict evidence are contradictory. The human evaluation results showed a high level of accuracy in our data generation process. We select models with 2B and 7B parameters for our analysis. LLaMA2 [Touvron et al., 2023] is a popular open-source foundation model trained on 2T tokens; models with 7B and 70B parameters are selected for our analysis. To facilitate parallel training, we employ DeepSpeed ZeRO Stage 3 [Ren et al.]. The prompt for generating semantic conflict descriptions is shown in Figure 1. The prompt for generating default evidence is shown in Table 6. The prompt for generating misinformation conflict evidence is shown in Table 7. The prompt for generating temporal conflict evidence is shown in Table 8. The prompt for generating semantic conflict evidence is shown in Table 9.




Chef 'not embarrassed' by one-star hygiene rating at Michelin-starred restaurant

BBC News

The chef behind Wales' only two-Michelin-star restaurant has said he is not embarrassed after it was awarded a one-star hygiene rating. Ynyshir Restaurant and Rooms, near Machynlleth in Ceredigion, which charges nearly £500 per head, received the rating after a visit by food safety officers on 5 November. According to the Food Standards Agency (FSA), a score of one out of five means major improvement is necessary. But chef patron Gareth Ward, a contestant on MasterChef The Professionals, said the restaurant was working at the highest standard in the world and doing something different with how it approaches raw ingredients and techniques. Ynyshir offers a high-end dining experience starting at £468 per person, including a 30-course tasting menu and an in-house DJ.


Subquadratic High-Dimensional Hierarchical Clustering

Neural Information Processing Systems

We consider the widely used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high-dimensional Euclidean space, since they implicitly require solving the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely, these algorithms successively merge the pair of clusters at the smallest average distance (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-squares error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm is allowed to merge $\gamma$-approximate closest clusters (namely, clusters whose distance, be it average, minimum, or sum-of-squares error, is at most $\gamma$ times the distance of the closest clusters). We show that one can indeed take advantage of this relaxation and compute the approximate hierarchical clustering tree using $\widetilde{O}(n)$ $\gamma$-approximate nearest neighbor queries.
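The relaxation described above, merging any pair of clusters whose linkage distance is within a factor $\gamma$ of the true minimum, can be illustrated with a brute-force toy for single-linkage. This is only a sketch of the merge rule: the approximate nearest-neighbor data structure that yields the subquadratic bound is replaced here by exhaustive search.

```python
import numpy as np

def approx_single_linkage(X, gamma=1.1):
    """Single-linkage agglomeration that may merge any pair of clusters
    whose minimum inter-point distance is within gamma times the true
    closest-pair distance (a gamma-approximate merge). Brute force stands
    in for the paper's approximate nearest-neighbor queries."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        # Single-linkage distance between every pair of current clusters.
        dists = {}
        best = np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                dists[(a, b)] = d
                best = min(best, d)
        # Accept the first pair within gamma of the optimum.
        a, b = next(p for p, d in dists.items() if d <= gamma * best)
        merges.append((sorted(clusters[a]), sorted(clusters[b])))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Two tight groups: the approximate rule still merges within groups first.
merges = approx_single_linkage(np.array([[0.0], [0.1], [5.0], [5.1]]))
print(merges)
```

Any merge order produced under this rule is a valid $\gamma$-approximate hierarchy; the paper's contribution is achieving it with only $\widetilde{O}(n)$ approximate nearest-neighbor queries rather than the quadratic scan used above.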


SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models

Gu, Ken, Bhat, Advait, Merrill, Mike A, West, Robert, Liu, Xin, McDuff, Daniel, Althoff, Tim

arXiv.org Artificial Intelligence

Evaluating the reasoning ability of language models (LMs) is complicated by their extensive parametric world knowledge, where benchmark performance often reflects factual recall rather than genuine reasoning. Existing datasets and approaches (e.g., temporal filtering, paraphrasing, adversarial substitution) cannot cleanly separate the two. We present SynthWorlds, a framework that disentangles task reasoning complexity from factual knowledge. In SynthWorlds, we construct parallel corpora representing two worlds with identical interconnected structure: a real-mapped world, where models may exploit parametric knowledge, and a synthetic-mapped world, where such knowledge is meaningless. On top of these corpora, we design two mirrored tasks as case studies: multi-hop question answering and page navigation, which maintain equal reasoning difficulty across worlds. Experiments in parametric-only (e.g., closed-book QA) and knowledge-augmented (e.g., retrieval-augmented) LM settings reveal a persistent knowledge advantage gap, defined as the performance boost models gain from memorized parametric world knowledge. Knowledge acquisition and integration mechanisms reduce but do not eliminate this gap, highlighting opportunities for system improvements. Fully automatic and scalable, SynthWorlds provides a controlled environment for evaluating LMs in ways that were previously challenging, enabling precise and testable comparisons of reasoning and memorization.
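The knowledge advantage gap defined above is simply the difference between a model's score in the real-mapped world and its score in the synthetic-mapped world, where reasoning difficulty is held equal by construction. A minimal sketch (the accuracy figures are hypothetical, not results from the paper):

```python
def knowledge_advantage_gap(acc_real_mapped: float, acc_synth_mapped: float) -> float:
    """Performance boost attributable to memorized parametric knowledge:
    score in the real-mapped world minus score in the synthetic-mapped
    world, with reasoning difficulty identical across the two worlds."""
    return acc_real_mapped - acc_synth_mapped

# Hypothetical multi-hop QA accuracies for one model in the two worlds.
gap = knowledge_advantage_gap(0.62, 0.41)
print(f"knowledge advantage gap: {gap:.2f}")
```

A positive gap indicates the model is leaning on factual recall; a gap near zero would suggest its performance reflects reasoning alone.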



A Benchmark for Evaluating Knowledge Conflicts in Large Language Models

Neural Information Processing Systems

Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored the conflicts between the inherent knowledge of LLMs and retrieved contextual knowledge, a comprehensive assessment of knowledge conflicts in LLMs is still missing.