Technology
KnowMol: Advancing Molecular Large Language Models with Multi-Level Chemical Knowledge
Tthesehe challenges, we introduce cKnoarbwMol-100K,oxylate group and the polarizable sulfur atom, methylsulfanyl group attaalarchge-scaed tole tdatasethe sixwithth c100Karbofine-grainedn and molecular annotations Theacross polamriultiplety of the molecule is increased by the polar verum with data available.
Don't Just Chase " Highlighted Tokens " in MLLMs: Revisiting Visual Holistic Context Retention
Despite their powerful capabilities, Multimodal Large Language Models (MLLMs) suffer from considerable computational overhead due to their reliance on massive visual tokens. Recent studies have explored token pruning to alleviate this problem, which typically uses text-vision cross-attention or [CLS] attention to assess and discard redundant visual tokens. In this work, we identify a critical limitation of such attention-first pruning approaches, i.e., they tend to preserve semantically similar tokens, resulting in pronounced performance drops under high pruning ratios. To this end, we propose HoloV, a simple yet effective, plug-and-play visual token pruning framework for efficient inference.
DAMamba: Vision State Space Model with Dynamic Adaptive Scan
State space models (SSMs) have recently garnered significant attention in computer vision. However, due to the unique characteristics of image data, adapting SSMs from natural language processing to computer vision has not outperformed the state-of-the-art convolutional neural networks (CNNs) and Vision Transformers (ViTs). Existing vision SSMs primarily leverage manually designed scans to flatten image patches into sequences locally or globally. This approach disrupts the original semantic spatial adjacency of the image and lacks flexibility, making it difficult to capture complex image structures. To address this limitation, we propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. This enables more flexible modeling capabilities while maintaining linear computational complexity and global modeling capacity. Based on DAS, we further propose the vision backbone DAMamba, which significantly outperforms popular vision Mamba models in vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation.
Efficient Fairness-Performance Pareto Front Computation
There is a well known intrinsic trade-off between the fairness of a representation and the performance of classifiers derived from the representation. In this paper we propose a new method to compute the optimal Pareto front of this trade off. In contrast to the existing methods, this approach does not require the training of complex fair representation models. Our approach is derived through three main steps: We analyze fair representations theoretically, and derive several structural properties of optimal representations. We then show that these properties enable a reduction of the computation of the Pareto Front to a compact discrete problem. Finally, we show that these compact approximating problems can be efficiently solved via off-the shelf concave-convex programming methods.
'Pretty Crazy' Token Usage Is Testing Bosses' Bet on AI
'Pretty Crazy' Token Usage Is Testing Bosses' Bet on AI A Silicon Valley software maker and an ecommerce company reveal to WIRED how they are navigating the emerging challenge of "tokenomics." At the software company 8x8, employees are using Anthropic's Claude to draft emails, analyze customer feedback, and write code, but so far, their growing reliance on the artificial intelligence chatbot hasn't troubled the finance team. While other Silicon Valley companies, such as Meta, Uber, and Salesforce, have publicly expressed concerns about the growing cost of generative AI tools and have begun introducing usage caps in some cases, 8x8 says it finds itself in the black. Over the past 18 months, the company estimates it has saved about $5 million in annual costs by canceling subscriptions to dozens of software and educational tools it deemed unnecessary in part because Claude could provide similar capabilities. So far, 8x8's annualized bill for Claude is "well below" that figure, says Joel Neeb, the company's chief transformation and business operations officer.
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
The selection of hyperparameters, such as prompt templates in large language models (LLMs), must often strike a balance between reliability and cost. In many cases, structural relationships between the expected reliability levels of the hyperparameters can be inferred from prior information and held-out data - e.g., longer prompt templates may be more detailed and thus more reliable. However, existing hyperparameter selection methods either do not provide formal reliability guarantees or are unable to incorporate structured knowledge in the hyperparameter space. This paper introduces reliability graph-based Pareto testing (RG-PT), a novel multi-objective hyperparameter selection framework that maintains formal reliability guarantees in terms of false discovery rate (FDR), while accounting for known relationships among hyperparameters via a directed acyclic graph. Edges in the graph reflect expected reliability and cost trade-offs among hyperparameters, which are inferred via the Bradley-Terry (BT) ranking model from prior information and held-out data. Experimental evaluations demonstrate that RG-PT significantly outperforms existing methods such as learn-then-test (LTT) and Pareto testing (PT) through a more efficient exploration of the hyperparameter space.
FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees
Efficiently solving constrained optimization problems is crucial for numerous realworld applications, yet traditional solvers are often computationally prohibitive for real-time use. Machine learning-based approaches have emerged as a promising alternative to provide approximate solutions at faster speeds, but they struggle to strictly enforce constraints, leading to infeasible solutions in practice. To address this, we propose the Feasibility-Seeking Neural Network (FSNet), which integrates a feasibility-seeking step directly into its solution procedure to ensure constraint satisfaction. This feasibility-seeking step solves an unconstrained optimization problem that minimizes constraint violations in a differentiable manner, enabling end-to-end training and providing guarantees on feasibility and convergence. Our experiments across a range of different optimization problems, including both smooth/nonsmooth and convex/nonconvex problems, demonstrate that FSNet can provide feasible solutions with solution quality comparable to (or in some cases better than) traditional solvers, at significantly faster speeds.1
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Different gradient-based methods for optimizing overparameterized models can all achieve zero training error yet converge to distinctly different solutions inducing different generalization properties. We provide the first complete characterization of implicit optimization bias for p-norm normalized steepest descent (NSD) and momentum steepest descent (NMD) algorithms in multi-class linear classification with cross-entropy loss. Our key theoretical contribution is proving that these algorithms converge to solutions maximizing the margin with respect to the classifier matrix's p-norm, with established convergence rates. These results encompass important special cases including Spectral Descent and Muon, which we show converge to max-margin solutions with respect to the spectral norm. A key insight of our contribution is that the analysis of general entry-wise and Schatten p-norms can be reduced to the analysis of NSD/NMD with max-norm by exploiting a natural ordering property between all p-norms relative to the max-norm and its dual sumnorm. For the specific case of descent with respect to the max-norm, we further extend our analysis to include preconditioning, showing that Adam converges to the matrix's max-norm solution. Our results demonstrate that the multi-class linear setting, which is inherently richer than the binary counterpart, provides the most transparent framework for studying implicit biases of matrix-parameter optimization algorithms.
DataSIR: ABenchmark Dataset for Sensitive Information Recognition
A.1 Comparison of Results for Gemini with Different Format Transformations Gemini attained optimal performance metrics for sensitive category and format transformation scenarios tasks, surpassing all comparator models in maximum achievable performance. The focus was then placed on Gemini's ability to recognize and restore both original and transformed data. The experimental results are shown in Table 1. In the main text section Experiments, due to space constraints, only four key observations were analyzed, as follows: i) The LRAcc and DRAcc of total format transformed data is less than original data, which indicates that it is more difficult to recognize and restore data after format transformed. These transformations only affect numbers, and only the IMEI and IMSI (purely numeric) sensitive categories support such transformations. Due to the lack of contextual information in the sample data, large language models may confuse these with personal identifiers, mobile numbers, and MEID.
DataSIR: ABenchmark Dataset for Sensitive Information Recognition
With the rapid development of artificial intelligence technologies, the demand for training data has surged, exacerbating risks of data leakage. Despite increasing incidents and costs associated with such leaks, data leakage prevention (DLP) technologies lag behind evolving evasion techniques that bypass existing sensitive information recognition (SIR) models. Current datasets lack comprehensive coverage of these adversarial transformations, limiting the evaluation of robust SIR systems. To address this gap, we introduce DataSIR, a benchmark dataset specifically designed to evaluate SIR models on sensitive data subjected to diverse format transformations. We curate 26 sensitive data categories based on multiple international regulations, and collect 131,890 original samples correspondingly.