qwen1
- Asia > China > Beijing > Beijing (0.04)
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (3 more...)
- Health & Medicine (0.68)
- Education > Educational Setting (0.46)
- Energy > Renewable (0.45)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (8 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Education (0.92)
- Health & Medicine (0.66)
- Asia > South Korea (0.14)
- North America > United States (0.14)
- Asia > China (0.05)
- (16 more...)
- Leisure & Entertainment (0.68)
- Education > Health & Safety > School Nutrition (0.46)
- North America > United States (0.04)
- Asia > Singapore (0.04)
Beyond Arrow: From Impossibility to Possibilities in Multi-Criteria Benchmarking
Gordienko, Polina, Jansen, Christoph, Rodemann, Julian, Schollmeyer, Georg
Modern benchmarks such as HELM MMLU account for multiple metrics like accuracy, robustness and efficiency. When trying to turn these metrics into a single ranking, natural aggregation procedures can become incoherent or unstable to changes in the model set. We formalize this aggregation as a social choice problem where each metric induces a preference ranking over models on each dataset, and a benchmark operator aggregates these votes across metrics. While prior work has focused on Arrow's impossibility result, we argue that the impossibility often originates from pathological examples and identify sufficient conditions under which these disappear, and meaningful multi-criteria benchmarking becomes possible. In particular, we deal with three restrictions on the combinations of rankings and prove that on single-peaked, group-separable and distance-restricted preferences, the benchmark operator allows for the construction of well-behaved rankings of the involved models. Empirically, we investigate several modern benchmark suites like HELM MMLU and verify which structural conditions are fulfilled on which benchmark problems.
From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Shani, Chen, Soffer, Liron, Jurafsky, Dan, LeCun, Yann, Shwartz-Ziv, Ravid
Humans organize knowledge into compact conceptual categories that balance compression with semantic richness. Large Language Models (LLMs) exhibit impressive linguistic abilities, but whether they navigate this same compression-meaning trade-off remains unclear. We apply an Information Bottleneck framework to compare human conceptual structure with embeddings from 40+ LLMs using classic categorization benchmarks. We find that LLMs broadly align with human category boundaries, yet fall short on fine-grained semantic distinctions. Unlike humans, who maintain ``inefficient'' representations that preserve contextual nuance, LLMs aggressively compress, achieving more optimal information-theoretic compression at the cost of semantic richness. Surprisingly, encoder models outperform much larger decoder models in human alignment, suggesting that understanding and generation rely on distinct representational mechanisms. Training-dynamics analysis reveals a two-phase trajectory: rapid initial concept formation followed by architectural reorganization, during which semantic processing migrates from deep to mid-network layers as the model discovers increasingly efficient, sparser encodings. These divergent strategies, where LLMs optimize for compression and humans for adaptive utility, reveal fundamental differences between artificial and natural intelligence. This highlights the need for models that preserve the conceptual ``inefficiencies'' essential for human-like understanding.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Newfoundland and Labrador > Labrador (0.04)
- (5 more...)
Bayesian Mixture of Experts For Large Language Models
Dialameh, Maryam, Rajabzadeh, Hossein, Zhang, Weiwei, Ahmed, Walid, Kwon, Hyock Ju
We present Bayesian Mixture of Experts (Bayesian-MoE), a post-hoc uncertainty estimation framework for fine-tuned large language models (LLMs) based on Mixture-of-Experts architectures. Our method applies a structured Laplace approximation to the second linear layer of each expert, enabling calibrated uncertainty estimation without modifying the original training procedure or introducing new parameters. Unlike prior approaches, which apply Bayesian inference to added adapter modules, Bayesian-MoE directly targets the expert pathways already present in MoE models, leveraging their modular design for tractable block-wise posterior estimation. We use Kronecker-factored low-rank approximations to model curvature and derive scalable estimates of predictive uncertainty and marginal likelihood. Experiments on common-sense reasoning benchmarks with Qwen1.5-MoE and DeepSeek-MoE demonstrate that Bayesian-MoE improves both expected calibration error (ECE) and negative log-likelihood (NLL) over baselines, confirming its effectiveness for reliable downstream decision-making.
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)