Connectivity Shapes Implicit Regularization in Matrix Factorization Models for Matrix Completion
Matrix factorization models have been extensively studied as a valuable test-bed for understanding the implicit biases of overparameterized models. Although both low nuclear norm and low rank regularization have been studied for these models, a unified understanding of when, how, and why they achieve different implicit regularization effects remains elusive. In this work, we systematically investigate the implicit regularization of matrix factorization for solving matrix completion problems. We empirically discover that the connectivity of observed data plays a crucial role in the implicit bias, with a transition from low nuclear norm to low rank as data shifts from disconnected to connected with increased observations. We identify a hierarchy of intrinsic invariant manifolds in the loss landscape that guide the training trajectory to evolve from low-rank to higher-rank solutions. Based on this finding, we theoretically characterize the training trajectory as following the hierarchical invariant manifold traversal process, generalizing the characterization of Li et al. (2020) to include the disconnected case. Furthermore, we establish conditions that guarantee minimum nuclear norm, closely aligning with our experimental findings, and we provide a dynamics characterization condition for ensuring minimum rank. Our work reveals the intricate interplay between data connectivity, training dynamics, and implicit regularization in matrix factorization models.
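As a rough illustration of what "connected" versus "disconnected" observations could mean, the sketch below treats the observed entries as edges of a bipartite graph between row indices and column indices and checks whether that graph has a single connected component. This reading of connectivity is an assumption (the abstract does not spell out its definition), and the function name `is_connected` is hypothetical.

```python
from typing import Iterable, Tuple


def is_connected(n_rows: int, n_cols: int,
                 observed: Iterable[Tuple[int, int]]) -> bool:
    """Return True if the bipartite row/column graph of observed entries
    forms a single connected component (assumed notion of connectivity)."""
    # Nodes 0..n_rows-1 are rows; nodes n_rows..n_rows+n_cols-1 are columns.
    parent = list(range(n_rows + n_cols))

    def find(x: int) -> int:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each observed entry (i, j) links row i to column j.
    for i, j in observed:
        root_row, root_col = find(i), find(n_rows + j)
        if root_row != root_col:
            parent[root_row] = root_col

    # Connected means every row and column shares one root.
    return len({find(x) for x in range(n_rows + n_cols)}) == 1


# Only the diagonal of a 2x2 matrix is observed: two separate components.
print(is_connected(2, 2, [(0, 0), (1, 1)]))
# Adding entry (0, 1) links the two components.
print(is_connected(2, 2, [(0, 0), (0, 1), (1, 1)]))
```

Under this convention, a row or column with no observed entry also counts as making the data disconnected.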
Questioning the Survey Responses of Large Language Models Ricardo Dominguez-Olmedo Max-Planck Institute for Intelligent Systems, Tübingen
Surveys have recently gained popularity as a tool to study large language models. By comparing survey responses of models to those of human reference populations, researchers aim to infer the demographics, political opinions, or values best represented by current language models. In this work, we critically examine this methodology on the basis of the well-established American Community Survey by the U.S. Census Bureau. Evaluating 43 different language models using de facto standard prompting methodologies, we establish two dominant patterns. First, models' responses are governed by ordering and labeling biases, for example, towards survey responses labeled with the letter 'A'.
MMSite: A Multi-modal Framework for the Identification of Active Sites in Proteins
The accurate identification of active sites in proteins is essential for the advancement of life sciences and pharmaceutical development, as these sites are of critical importance for enzyme activity and drug design. Recent advancements in protein language models (PLMs), trained on extensive datasets of amino acid sequences, have significantly improved our understanding of proteins. However, compared to the abundant protein sequence data, functional annotations, especially precise per-residue annotations, are scarce, which limits the performance of PLMs. On the other hand, textual descriptions of proteins, which could be annotated by human experts or a pretrained protein sequence-to-text model, provide meaningful context that could assist in the functional annotations, such as the localization of active sites. This motivates us to construct a ProTein-Attribute text Dataset (ProTAD), comprising over 570,000 pairs of protein sequences and multi-attribute textual descriptions.
Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data
Federated Learning (FL) is an evolving paradigm that enables multiple parties to collaboratively train models without sharing raw data. Among its variants, Vertical Federated Learning (VFL) is particularly relevant in real-world, cross-organizational collaborations, where distinct features of a shared instance group are contributed by different parties. In these scenarios, parties are often linked using fuzzy identifiers, leading to a common practice termed multi-party fuzzy VFL. Existing models generally address either multi-party VFL or fuzzy VFL between two parties. Extending these models to practical multi-party fuzzy VFL typically results in significant performance degradation and increased costs for maintaining privacy.
We have a chance to prevent AI decimating Britain's creative industries – but it's slipping away Beeban Kidron
But opting out is impossible to do without AI transparency. The plan is a charter for theft, since creatives would have no idea who is taking what, when and from whom. When the government stoops to a preferred outcome that undermines the moral right to your work and income, you might reasonably be angered. As Elton John said last weekend: "The government have no right to do this to my songs. They have no right to do it to anybody's songs, or anybody's prose."
MicroSD Express Cards are a must-have Switch 2 accessory -- if you can find one in stock
Planning to use your trusty MicroSD card for your Switch 2 when it finally arrives? Then we have bad news. Gamers require a whole new storage medium this time around. Specifically, you need MicroSD Express cards for the Switch 2. The Nintendo Switch 2 is poised to be one of the most successful video game console launches of all time, with high demand for preorders globally. The hype for Nintendo's next-generation console is understandable, as it's an improvement over the nearly decade-old original in meaningful ways.
Lingjiao Chen, Jared Davis
Many recent state-of-the-art results in language tasks were achieved using compound systems that perform multiple Language Model (LM) calls and aggregate their responses. However, there is little understanding of how the number of LM calls - e.g., when asking the LM to answer each question multiple times and taking a majority vote - affects such a compound system's performance. In this paper, we initiate the study of scaling properties of compound inference systems. We analyze, theoretically and empirically, how the number of LM calls affects the performance of Vote and Filter-Vote, two of the simplest compound system designs, which aggregate LM responses via majority voting, optionally applying LM filters. We find, surprisingly, that across multiple language tasks, the performance of both Vote and Filter-Vote can first increase but then decrease as a function of the number of LM calls. Our theoretical results suggest that this non-monotonicity is due to the diversity of query difficulties within a task: more LM calls lead to higher performance on "easy" queries, but lower performance on "hard" queries, and non-monotone behavior can emerge when a task contains both types of queries. This insight then allows us to compute, from a small number of samples, the number of LM calls that maximizes system performance, and define an analytical scaling model for both systems. Experiments show that our scaling model can accurately predict the performance of Vote and Filter-Vote systems and thus find the optimal number of LM calls to make.
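The non-monotone effect can be reproduced in a toy setting. The sketch below is not the paper's analytical scaling model: it assumes binary answers, independent LM calls, an odd number of calls (so there are no ties), and a task made of just two query types, "easy" (per-call accuracy above 0.5) and "hard" (below 0.5). The function names and the values `alpha=0.6`, `p_easy=0.9`, `p_hard=0.4` are hypothetical choices for illustration.

```python
from math import comb


def vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent calls is correct,
    when each call is correct with probability p (binary answers, k odd)."""
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))


def mixture_accuracy(k: int, alpha: float = 0.6,
                     p_easy: float = 0.9, p_hard: float = 0.4) -> float:
    """Expected majority-vote accuracy on a task where a fraction alpha of
    queries is easy and the remainder is hard (illustrative values)."""
    return (alpha * vote_accuracy(p_easy, k)
            + (1 - alpha) * vote_accuracy(p_hard, k))


def best_num_calls(max_k: int = 21) -> int:
    """Odd number of calls in [1, max_k] with the highest mixture accuracy."""
    return max(range(1, max_k + 1, 2), key=mixture_accuracy)


# Accuracy rises from 1 to 3 calls, then falls as hard queries dominate.
for k in (1, 3, 5, 7, 21):
    print(k, round(mixture_accuracy(k), 4))
print("best k:", best_num_calls())
```

With these assumed values, voting helps easy queries quickly saturate near 1 while steadily hurting hard queries, so the mixture peaks at a small number of calls and then declines.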
clearly went into our reviews, and all reviewers seem happy with the substantial potential impact of our approach, and
We thank the reviewers for their extensive comments. Where is the novelty (R2+R4) / What is the point of the new proofs (R2)? However, our primary result is to show why it works. Newton's method with a more stable trust-region based method gave rise to a more stable fixed-point (line 131), and Given this, partial derivatives and full derivatives coincide. This mischaracterisation by R6 is our fault; we had intended to cite Fitzgibbon's later We emphasise that we're modifying a baseline [24] that was published independently All issues raised by the reviewers will be clarified. 'network' instead of 'parameters of the energy function' in the pose experiment; we agree the name should be changed.