Apple's Big OS Rebrand, OnePlus Embraces AI, and Samsung's Next Folds: Your Gear News of the Week
Bloomberg reports that this year at WWDC, Apple plans to announce a broad overhaul of all of its operating systems, including renaming them to be more consistent. Starting this year, Apple will reportedly denote each OS version by year rather than by version number. Confusingly, the naming will use the upcoming year rather than the current one (just like cars), so the versions unveiled at this year's WWDC will not be iOS 25, but rather iOS 26, watchOS 26, and so on, in place of iOS 19 and watchOS 12. The move is reportedly part of a larger push toward a cohesive user experience across platforms. Here's more you may have missed this week:
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness
Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu
Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utilizing RoPE as the position embedding, we introduce a novel method called "Mixture of In-Context Experts" (MoICE) to address this challenge. MoICE comprises two key components: a router integrated into each attention head within the LLM, and a lightweight router-only training optimization strategy. (1) MoICE views each RoPE angle as an 'in-context' expert, demonstrated to be capable of directing the attention of a head to specific contextual positions. Consequently, each attention head flexibly processes tokens using multiple RoPE angles dynamically selected by the router to attend to the needed positions.
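The core idea of mixing RoPE-angle "experts" per attention head can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which trains the routers and uses dynamic top-k selection); the routing signal, function names, and base values below are illustrative assumptions.

```python
import numpy as np

def rope(x, pos, base):
    """Apply rotary position embedding to vectors x (seq, dim) using a given base angle."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) rotation frequencies
    ang = np.outer(pos, inv_freq)                  # (seq, d/2) rotation angles
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def moice_head(q, k, v, router_w, bases):
    """One attention head mixing several RoPE-angle 'experts' via a softmax router.

    q, k, v: (seq, dim) head inputs; router_w: (dim, n_experts) router weights;
    bases: one RoPE base per expert. Each base steers attention toward different
    contextual positions; the router weights combine the experts' outputs.
    """
    seq, d = q.shape
    pos = np.arange(seq)
    # Illustrative routing signal: router scores from the mean query state.
    logits = q.mean(axis=0) @ router_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()
    out = np.zeros_like(v)
    for w, base in zip(gate, bases):
        qr, kr = rope(q, pos, base), rope(k, pos, base)
        att = qr @ kr.T / np.sqrt(d)
        att = np.exp(att - att.max(axis=-1, keepdims=True))
        att /= att.sum(axis=-1, keepdims=True)
        out += w * (att @ v)
    return out
```

Because only the router parameters would need gradients, training stays lightweight relative to fine-tuning the whole model.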
We are glad that the reviewers appreciate the technical novelty of our approach and its theoretical guarantees. We respond in more detail below, and took all comments into account in our revised version.
We sincerely thank the reviewers for their time, feedback, and thoughtful suggestions. We would like to first clarify the claims and evaluation of our work: in the context of HC, we focus on Dasgupta's cost (DC).

Approximation Ratio (R3). R3's main concerns are two clarifications about our approximation. The first asks whether the approximation result (Thm 4.1) holds only for the optimal embedding.

HC Baselines (R2). We thank R2 for the suggestions to improve our experiments. We compare against K-Means, a top-down method which is the direct analog of HKM in a similarity-based context [33].
Credal Deep Ensembles for Uncertainty Quantification
Kaizheng Wang
This paper introduces an innovative approach to classification called Credal Deep Ensembles (CreDEs), namely, ensembles of novel Credal-Set Neural Networks (CreNets). CreNets are trained to predict a lower and an upper probability bound for each class, which, in turn, determine a convex set of probabilities (credal set) on the class set. The training employs a loss inspired by distributionally robust optimization which simulates the potential divergence of the test distribution from the training distribution, in such a way that the width of the predicted probability interval reflects the 'epistemic' uncertainty about the future data distribution. Ensembles can be constructed by training multiple CreNets, each associated with a different random seed, and averaging the output intervals. Extensive experiments are conducted on various out-of-distribution (OOD) detection benchmarks (CIFAR10/100 vs SVHN/Tiny-ImageNet, CIFAR10 vs CIFAR10-C, ImageNet vs ImageNet-O) and using different network architectures (ResNet50, VGG16, and ViT Base). Compared to Deep Ensemble baselines, CreDEs demonstrate higher test accuracy, lower expected calibration error, and significantly improved epistemic uncertainty estimation.
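The ensemble step described above, averaging per-class probability intervals across members and reading epistemic uncertainty off the interval width, can be sketched as follows. This assumes each CreNet already outputs valid bounds (the paper's loss and output parameterization handle that); the function name and the width-as-uncertainty score are illustrative.

```python
import numpy as np

def credal_ensemble(lowers, uppers):
    """Average per-class probability intervals from several CreNets.

    lowers, uppers: arrays of shape (n_members, n_classes) holding the
    lower/upper probability bounds predicted by each ensemble member.
    Returns the averaged interval and its mean width, used here as an
    epistemic-uncertainty score (wider interval = more uncertainty).
    """
    lo = np.mean(lowers, axis=0)
    hi = np.mean(uppers, axis=0)
    width = hi - lo
    return lo, hi, float(width.mean())
```

For OOD detection, inputs with a larger mean interval width would be flagged as more epistemically uncertain.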
Supplementary Materials for S-PIFu: Integrating Parametric Human Models with PIFu for Single-view Clothed Human Reconstruction
In Figure 1, we show S-PIFu's results when given images of test subjects who wear large clothing. Images of these test subjects have pixels that belong to the human subject but not to the SMPL-X body, and yet S-PIFu is able to reconstruct the human subjects accurately. Pixels that belong to the human subject but not to the SMPL-X body act as a natural regularizer that prevents S-PIFu from being overly reliant on estimated SMPL-X meshes to reconstruct clothed human meshes. This happens because these pixels only have valid values for the RGB channels and not for the channels of our 2D feature maps (i.e., C, B, and N; recall that C refers to coordinate information, B refers to blendweights-based labels, and N refers to body part orientation information). In Figure 1, we also observe what happens if we feed a noisy SMPL-X mesh (i.e., a SMPL-X mesh with inaccurate pose parameters) to our S-PIFu (note that S-PIFu has not been trained with any noisy SMPL-X meshes). It is not uncommon for an estimated SMPL-X mesh to have an inaccurate pose, as observed by PaMIR [9], ARCH++ [3], and ICON [8].
High-Quality Self-Supervised Deep Image Denoising
Samuli Laine, Tero Karras, Jaakko Lehtinen, Timo Aila
We describe a novel method for training high-quality image denoising models based on unorganized collections of corrupted images. The training does not need access to clean reference images, or explicit pairs of corrupted images, and can thus be applied in situations where such data is unacceptably expensive or impossible to acquire. We build on a recent technique that removes the need for reference data by employing networks with a "blind spot" in the receptive field, and significantly improve two key aspects: image quality and training efficiency. Our result quality is on par with state-of-the-art neural network denoisers in the case of i.i.d. Gaussian noise.
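The "blind spot" idea is that each output pixel is predicted without ever seeing its own input value, so the network cannot learn the identity mapping and must denoise from context. Laine et al. realize the blind spot efficiently via four rotated half-plane receptive fields; the simpler construction sketched below, a convolution whose center weight is masked to zero, conveys the same property and is an illustrative stand-in, not the paper's architecture.

```python
import numpy as np

def blind_spot_conv(img, kernel):
    """2D correlation whose kernel center is forced to zero, so each
    output pixel never depends on its own input value (the 'blind spot')."""
    k = kernel.astype(float).copy()
    c = k.shape[0] // 2
    k[c, c] = 0.0                      # the blind spot: mask the center tap
    h, w = img.shape
    r = c
    padded = np.pad(img, r, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 2 * r + 1, j:j + 2 * r + 1] * k)
    return out
```

With zero-mean noise, the masked prediction of a pixel from its neighbors is what makes self-supervised training on corrupted images alone possible.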
We would like to thank the reviewers for their comments and remarks, and for their suggestions on clarifying the paper. Reviewers #1 and #4 inquired about the quality of our method with smaller training sets. Our method always performs roughly on par with the baseline supervised training, even with very small training sets.

Training images:
Method          all     10 000   1000   500   300   200   100 (10 runs)
Baseline, N2C   31.60   31.59    …      …     …     …     …

Reviewer #1 remarked that our experiments are performed on synthetic data only. As the non-learned CBM3D method is also designed for natural images, we feel that our comparisons are fair.
Supplementary Material for "Adaptive Experimental Design with Temporal Interference: A Maximum Likelihood Approach"
Throughout this section, we refer to the two Markov chains depicted in Figure 1. The transition probabilities are as depicted in the figure. We assume each chain only earns a reward in state x = 1. Thus the treatment effect is (q(2) − q(1))/s. First, suppose that for ℓ = 1, 2 we wanted to estimate only (ℓ) by running chain ℓ alone. Then note that in every S steps, only one observation is received of the reward in state 1.

Figure 1: The two Markov chains described in Appendix A. Chain 1 is red, and chain 2 is blue.
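The estimation strategy described above, running each chain separately and averaging the reward collected in the single rewarded state, can be sketched generically. The transition matrices below are made-up three-state stand-ins, not the chains of Figure 1, and the function name is illustrative.

```python
import numpy as np

def stationary_reward(P, reward_state=0, n_steps=20_000, seed=0):
    """Estimate the long-run average reward of a Markov chain that earns
    reward 1 only in `reward_state`, by simulating n_steps transitions."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    x, total = 0, 0
    for _ in range(n_steps):
        total += (x == reward_state)       # reward collected only in one state
        x = rng.choice(n, p=P[x])          # sample the next state
    return total / n_steps

# Hypothetical 3-state chains standing in for chains 1 and 2 of Figure 1
# (state 0 plays the role of the rewarded state).
P1 = np.array([[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]])
P2 = np.array([[0.5, 0.5, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]])
effect = stationary_reward(P2) - stationary_reward(P1)
```

Note how slowly information accrues: the chain must cycle back to the rewarded state before producing another reward observation, which is the point made above about one observation per S steps.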
Compact Proofs of Model Performance via Mechanistic Interpretability
We propose using mechanistic interpretability (techniques for reverse-engineering model weights into human-interpretable algorithms) to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving accuracy lower bounds for a small transformer trained on Max-of-K, validating proof transferability across 151 random seeds and four values of K. We create 102 different computer-assisted proof strategies and assess their length and the tightness of the bound they give on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless errors as a key challenge for using mechanistic interpretability to generate compact proofs of model performance.
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making
Yubin Kim, Chanwoo Park
Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents), that helps to address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, a simple emulation of the way real-world medical decision-making processes adapt to tasks of different complexity. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against that of human physicians.
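The complexity-adaptive routing described above, solo answering for simple queries and group deliberation for complex ones, can be sketched in a few lines. Everything here is an illustrative stand-in: MDAgents classifies complexity with an LLM and uses structured discussion protocols, whereas this sketch uses a crude keyword heuristic and majority voting.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    name: str
    answer: Callable[[str], str]   # in practice, a wrapper around an LLM call

def classify_complexity(query: str) -> str:
    """Hypothetical stand-in for MDAgents' LLM-based complexity check:
    a crude keyword/length heuristic instead of a model judgment."""
    hard_markers = ("differential diagnosis", "comorbid", "contraindication")
    if any(m in query.lower() for m in hard_markers) or len(query) > 200:
        return "high"
    return "low"

def route(query: str, solo: Agent, team: List[Agent]) -> str:
    """Low-complexity queries go to a single agent; high-complexity
    queries are answered by every team member and then aggregated."""
    if classify_complexity(query) == "low":
        return solo.answer(query)
    opinions = [a.answer(query) for a in team]
    # Trivial aggregation for the sketch: majority vote over the answers.
    return max(set(opinions), key=opinions.count)
```

The design point is that coordination cost is only paid when the task warrants it; simple queries skip the group discussion entirely.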