Industry
Computable universal online learning
Understanding when learning is possible is a fundamental task in the theory of machine learning. However, many characterizations known from the literature deal with abstract learning as a mathematical object and ignore the crucial question: when can learning be implemented as a computer program? We address this question for universal online learning, a generalist theoretical model of online binary classification, recently characterized by Bousquet et al. (STOC 2021). In this model, there is no hypothesis fixed in advance; instead, Adversary--playing the role of Nature--can change their mind as long as local consistency with the given class of hypotheses is maintained. We require Learner to achieve a finite number of mistakes while using a strategy that can be implemented as a computer program. We show that universal online learning does not imply computable universal online learning, even if the class of hypotheses is relatively easy from a computability-theoretic perspective. We then study the agnostic variant of computable universal online learning and provide an exact characterization of classes that are learnable in this sense. We also consider a variant of proper universal online learning and show exactly when it is possible. Together, our results give a more realistic perspective on the existing theory of online binary classification and the related problem of inductive inference.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, inadvertently enabling LLMs to time travel--leveraging future information embedded in their training corpora, thus resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLMs in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data--specifically data published after each model's pretraining cutoff--to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions--including ticker-level analysis, investment decision-making, portfolio management, and risk control--reveal significant practical challenges.
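The leakage-free condition above (only data published after a model's pretraining cutoff) can be sketched as a simple date filter. This is an illustrative assumption, not DeepFund's actual code: the `Bar` record, the `leakage_free` helper, the ticker `XYZ`, the prices, and the cutoff date are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Bar:
    """Hypothetical daily market record."""
    ticker: str
    day: date
    close: float


def leakage_free(bars: List[Bar], pretraining_cutoff: date) -> List[Bar]:
    """Keep only records published strictly after the model's cutoff."""
    return [b for b in bars if b.day > pretraining_cutoff]


# Made-up data: one record before the assumed cutoff, two after it.
BARS = [
    Bar("XYZ", date(2024, 12, 30), 10.0),
    Bar("XYZ", date(2025, 1, 2), 10.5),
    Bar("XYZ", date(2025, 1, 3), 10.4),
]
CUTOFF = date(2024, 12, 31)
KEPT = leakage_free(BARS, CUTOFF)
```

Because each model has its own cutoff, a filter of this kind would have to be applied per model, or the evaluation window would have to start after the latest cutoff among the models compared.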
DAAC: Discrepancy-Aware Adaptive Contrastive Learning for Medical Time series
Medical time-series data play a vital role in disease diagnosis but suffer from limited labeled samples and single-center bias, which hinder model generalization and lead to overfitting. To address these challenges, we propose DAAC (Discrepancy-Aware Adaptive Contrastive learning), a learnable multi-view contrastive framework that integrates external normal samples and enhances feature learning through adaptive contrastive strategies. DAAC consists of two key modules: (1) a Discrepancy Estimator, built upon a GAN-enhanced encoder-decoder architecture, captures the distribution of normal data and computes reconstruction errors as indicators of abnormality. These discrepancy features augment the target dataset to mitigate overfitting.
SplashNet: Split‑and‑Share Encoders for Accurate and Efficient Typing with Surface Electromyography
Surface electromyography (sEMG) at the wrists could enable natural, keyboard-free text entry, yet the state-of-the-art emg2qwerty baseline still misrecognizes 51.8\% of characters zero-shot on unseen users and 7.0\% after user-specific fine-tuning. We trace many of these errors to mismatched cross-user signal statistics, fragile reliance on high-order feature dependencies, and the absence of architectural inductive biases aligned with the bilateral nature of typing. To address these issues, we introduce three simple modifications: (i) Rolling Time Normalization, which adaptively aligns input distributions across users; (ii) Aggressive Channel Masking, which encourages reliance on low-order feature combinations more likely to generalize across users; and (iii) a Split-and-Share encoder that processes each hand independently with weight-shared streams to reflect the bilateral symmetry of the neuromuscular system. Combined with a fivefold reduction in spectral resolution (33$\rightarrow$6 frequency bands), these components yield a compact Split-and-Share model, SplashNet-mini, which uses only a fraction of the parameters and 0.6$\times$ the FLOPs of the baseline while reducing character error rate (CER) to 36.4\% zero-shot and 5.9\% after fine-tuning. An upscaled variant, SplashNet (1.15$\times$ the FLOPs of the baseline), further lowers error to 35.7\% and 5.5\%, representing 31\% and 21\% relative improvements in the zero-shot and fine-tuned settings, respectively. SplashNet therefore establishes a new state of the art without requiring additional data.
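The relative improvements quoted above follow directly from the reported character error rates. A short check, where the helper name `relative_reduction` is an assumption introduced for illustration:

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fraction of the baseline error rate that was removed."""
    return (baseline - new) / baseline


# CER values (in percent) as reported in the abstract.
ZERO_SHOT = relative_reduction(51.8, 35.7)   # baseline vs. SplashNet, zero-shot
FINETUNED = relative_reduction(7.0, 5.5)     # baseline vs. SplashNet, fine-tuned

print(f"zero-shot: {ZERO_SHOT:.1%}, fine-tuned: {FINETUNED:.1%}")
```

Rounded to whole percentages, these match the 31\% and 21\% figures in the abstract.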
TreeFinder: A US-Scale Benchmark Dataset for Individual Tree Mortality Monitoring Using High-Resolution Aerial Imagery
Monitoring individual tree mortality at scale is crucial for understanding forest loss, ecosystem resilience, carbon fluxes, and climate-induced impacts. However, fine-granularity monitoring faces major challenges on both the data and methodology sides because: (1) finding isolated individual-level tree deaths requires high-resolution remote sensing images with broad coverage, and (2) compared to regular geo-objects (e.g., buildings), dead trees often exhibit weaker contrast and high variability across tree types, landscapes, and ecosystems. Existing datasets on tree mortality primarily rely on moderate-resolution satellite imagery (e.g., 30m resolution), which aims to detect large-patch wipe-outs but is unable to recognize individual-level tree mortality events. Several efforts have explored alternatives via very-high-resolution drone imagery. However, drone images are expensive and can only be collected at local scales, so they are not suitable for national-scale applications and beyond. To bridge these gaps, we introduce TreeFinder, the first high-resolution remote sensing benchmark dataset designed for individual-level tree mortality mapping across the Contiguous United States (CONUS).
Self-Refining Language Model Anonymizers via Adversarial Distillation
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce $\textit{SElf-refining Anonymization with Language model}$ (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. SEAL leverages adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are then used to distill anonymization and critique capabilities into SLMs through supervised fine-tuning and preference learning. The resulting models learn both to anonymize text and to evaluate their outputs, enabling iterative improvement of anonymization quality via self-refinement. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection.
Functional data analysis for multivariate distributions through Wasserstein slicing
The modeling of samples of distributions is a major challenge since distributions do not form a vector space. While various approaches exist for univariate distributions, including transformations to a Hilbert space, far less is known about the multivariate case. We utilize a transformation approach to map multivariate distributions to a Hilbert space via a Wasserstein slicing method that is invertible. This approach combines functional data analysis tools, such as functional principal component analysis and modes of variation, with the facility to map back to interpretable distributions. We also provide convergence guarantees for the Hilbert space representations under a broad class of such transforms. The method is illustrated using joint systolic and diastolic blood pressure data.
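To make the slicing idea concrete, the following is a minimal sketch of the generic sliced 2-Wasserstein distance between two bivariate samples: each sample is projected onto a set of directions, and the one-dimensional projections are compared through their order statistics (empirical quantiles). This is the standard textbook construction under stated assumptions (equal sample sizes, evenly spaced directions), not necessarily the invertible transform proposed in the abstract. The function names and the blood-pressure-like numbers are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def project(sample: Sequence[Point], theta: float) -> List[float]:
    """Sorted projections of 2-D points onto (cos theta, sin theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return sorted(x * c + y * s for x, y in sample)


def sliced_w2(a: Sequence[Point], b: Sequence[Point], n_dirs: int = 16) -> float:
    """Sliced 2-Wasserstein distance between two equal-size samples.

    For equal-size 1-D samples, the squared W2 distance is the mean
    squared gap between matching order statistics. Directions are
    spread evenly over the half-circle [0, pi).
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    total = 0.0
    for k in range(n_dirs):
        theta = math.pi * k / n_dirs
        qa, qb = project(a, theta), project(b, theta)
        total += sum((u - v) ** 2 for u, v in zip(qa, qb)) / len(qa)
    return math.sqrt(total / n_dirs)


# Made-up (systolic, diastolic) pairs; B is A shifted by (3, 4).
A = [(120.0, 80.0), (130.0, 85.0), (110.0, 70.0)]
B = [(x + 3.0, y + 4.0) for x, y in A]
DIST = sliced_w2(A, B)
```

For a pure shift by a vector $v$, every slice differs by the projection of $v$ onto that direction, so with evenly spaced directions in two dimensions the sliced distance equals $\lVert v \rVert / \sqrt{2}$. Here that is $5/\sqrt{2} \approx 3.54$.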
Global capitalism bets it all on AI future, alarming voters
Days after filing confidentially to go public, Anthropic, the $965 billion artificial intelligence juggernaut that's one of the fastest-growing startups of all time, dropped another bombshell. In a blog post, Anthropic suggested the world might benefit from a slowdown in development of the very technologies that have been minting cash for the company. Provided global peers agreed, and enforcement mechanisms could be set up, that would help societies deal with the "immense implications" of AI, it said. Critics have long accused Anthropic of "doom marketing" -- hyping its own products as so good that they're bad. But the post's co-author, who's also the company's co-founder, says the motive is very different. "We say this stuff because we think the world needs to know the truth about what's happening," Jack Clark, who now heads Anthropic's public benefit work, said in an interview.
Worse than Zero-shot? A Fact-Checking Dataset for Evaluating the Robustness of RAG Against Misleading Retrievals
Retrieval-augmented generation (RAG) has shown impressive capabilities in mitigating hallucinations in large language models (LLMs). However, LLMs struggle to maintain consistent reasoning when exposed to misleading or conflicting evidence, especially in real-world domains such as politics, where information is polarized or selectively framed. Mainstream RAG benchmarks evaluate models under clean retrieval settings, where systems generate answers from gold-standard documents, or under synthetically perturbed settings, where documents are artificially injected with noise. These assumptions fail to reflect real-world conditions, often leading to an overestimation of RAG system performance. To address this gap, we introduce \textsc{RAGuard}, the first benchmark to evaluate the robustness of RAG systems against \textit{misleading} retrievals. Unlike prior benchmarks that rely on synthetic noise, our fact-checking dataset captures naturally occurring misinformation by constructing its retrieval corpus from Reddit discussions. It categorizes retrieved evidence into three types: \textit{supporting}, \textit{misleading}, and \textit{unrelated}, providing a realistic and challenging testbed for assessing how well RAG systems navigate different types of evidence. Our experiments reveal that, when exposed to potentially misleading retrievals, all tested LLM-powered RAG systems perform worse than their zero-shot baselines (i.e., no retrieval at all), while human annotators consistently perform better, highlighting LLMs' susceptibility to noisy environments. To our knowledge, \textsc{RAGuard} is the first benchmark to systematically assess the robustness of RAG systems against misleading evidence. We expect this benchmark to drive future research toward improving RAG systems beyond idealized datasets, making them more reliable for real-world applications.