
The Creator of Succession Is Back With a Movie. There's a Reason He Rushed to Make It Right Away.

Slate

Outside an opulent retreat in the mountains of Utah, the world is going to hell. Thanks to disinformation-spreading tools on the world's largest social media platform, people are being executed by bloodthirsty mobs and machine-gunned by their neighbors, politicians are being assassinated, and governments are crumbling. But inside Mountainhead, the billionaire tech moguls responsible for the chaos are smoking cigars and shooting the breeze, debating whether the global meltdown is a crisis to be managed or a surge of "creative destruction" that will help usher humanity into a brighter future. If the fictional setting of Mountainhead, the debut feature by Jesse Armstrong, seems a little too close to reality, that's because it's meant to be. The movie, which stars Steve Carell, Jason Schwartzman, Ramy Youssef, and Cory Michael Smith, was conceived, written, cast, shot, edited, and released in about six months, an astonishingly short timeline for any director, let alone a first-timer.


Infinite-Dimensional Feature Interaction Maoliang Li

Neural Information Processing Systems

Past neural network design has largely focused on the dimension and capacity scaling of the feature representation space (e.g., width, depth), but has overlooked the scaling of the feature interaction space. Recent advances have shifted focus towards element-wise multiplication, which facilitates a higher-dimensional feature interaction space for better information transformation. Despite this progress, multiplications predominantly capture low-order interactions and thus remain confined to a finite-dimensional interaction space. To transcend this limitation, classic kernel methods emerge as a promising solution for engaging features in an infinite-dimensional space. We introduce InfiNet, a model architecture that enables feature interaction within the infinite-dimensional space induced by the RBF kernel. Our experiments reveal that InfiNet achieves new state-of-the-art results, owing to its capability to leverage infinite-dimensional interactions, significantly enhancing model performance.
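The mechanism can be sketched in a few lines of PyTorch. The layer below is an illustrative assumption, not InfiNet's published design: it fuses two feature branches with an element-wise RBF kernel, whose Taylor expansion contains powers of (u − v) of every order, in contrast to element-wise multiplication, which stops at second-order interactions. The module name and the learnable per-channel bandwidth are hypothetical choices.

```python
import torch
import torch.nn as nn

class RBFInteraction(nn.Module):
    """Sketch of kernel-based feature interaction: k(u, v) = exp(-gamma * (u - v)^2),
    applied element-wise. The exponential's Taylor series contains all powers of
    (u - v), so interactions are not confined to low orders."""
    def __init__(self, channels: int):
        super().__init__()
        # Learnable per-channel bandwidth, kept positive via softplus.
        self.log_gamma = nn.Parameter(torch.zeros(channels))

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        gamma = nn.functional.softplus(self.log_gamma)
        gamma = gamma.view(1, -1, *([1] * (u.dim() - 2)))  # broadcast over H, W
        return torch.exp(-gamma * (u - v) ** 2)

# Usage: interact two 1x1-conv projections of the same feature map.
x = torch.randn(2, 64, 8, 8)
proj_u, proj_v = nn.Conv2d(64, 64, 1), nn.Conv2d(64, 64, 1)
fused = RBFInteraction(64)(proj_u(x), proj_v(x))  # shape (2, 64, 8, 8)
```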


Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning Jaehyun Nam 1 Jihoon Tack

Neural Information Processing Systems

In tabular prediction tasks, tree-based models combined with automated feature engineering methods often outperform deep learning approaches that rely on learned representations. While these feature engineering techniques are effective, they typically depend on a pre-defined search space and rely primarily on validation scores for feature selection, thereby missing valuable insights from previous experiments. To address these limitations, we propose a novel tabular learning framework that utilizes large language models (LLMs), termed Optimizing Column feature generator with decision Tree reasoning (OCTree). Our key idea is to leverage the reasoning capabilities of LLMs to identify effective feature generation rules without manually specifying the search space, and to provide language-based reasoning information about past experiments as feedback for iterative rule improvement. We use decision trees to convey this reasoning information, as they can easily be represented in natural language, effectively communicating knowledge from prior experiments (i.e., the impact of the generated features on performance) to the LLMs. Our empirical results demonstrate that OCTree consistently enhances the performance of various prediction models across diverse benchmarks, outperforming competing automated feature engineering methods.
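Under stated assumptions, the loop can be sketched as follows; `llm` is a hypothetical callable returning a pandas expression string, and the model, prompt, and feature naming are illustrative rather than OCTree's exact protocol. The point of the sketch is the feedback channel: a shallow decision tree fitted on the augmented data is verbalized and handed back to the LLM alongside the validation score.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

def octree_style_loop(df: pd.DataFrame, target: str, llm, n_rounds: int = 5):
    """Schematic OCTree-style loop (names hypothetical)."""
    history = []
    X, y = df.drop(columns=[target]), df[target]
    best_score = cross_val_score(RandomForestClassifier(), X, y).mean()
    for r in range(n_rounds):
        # 1. Ask the LLM for a feature-generation rule, conditioned on past
        #    rules, their scores, and the tree-based reasoning strings.
        rule = llm(f"Columns: {list(X.columns)}. History: {history}. "
                   "Propose one new feature as a pandas expression.")
        candidate = X.assign(**{f"feat_{r}": X.eval(rule)})
        score = cross_val_score(RandomForestClassifier(), candidate, y).mean()
        # 2. Verbalize a shallow decision tree so the LLM sees *why* the
        #    feature helped or hurt, not just the raw validation score.
        tree = DecisionTreeClassifier(max_depth=3).fit(candidate, y)
        reasoning = export_text(tree, feature_names=list(candidate.columns))
        history.append((rule, round(score, 4), reasoning))
        if score > best_score:  # keep the feature only if it helps
            best_score, X = score, candidate
    return X, best_score
```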


An Improved Analysis of Training Over-parameterized Deep Neural Networks

Neural Information Processing Systems

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network required to ensure global convergence is very stringent, often a high-degree polynomial in the training sample size n (e.g., O(n^24)).


DM2C: Deep Mixed-Modal Clustering Zhiyong Yang 1,2

Neural Information Processing Systems

Data exhibiting multiple modalities are ubiquitous in real-world clustering tasks. Most existing methods, however, rest on the strong assumption that pairing information across modalities is available for all instances. In this paper, we consider a more challenging setting in which each instance is represented in only one modality, which we call mixed-modal data. Without any extra pairing supervision across modalities, it is difficult to find a universal semantic space for all of them. To tackle this problem, we present an adversarial learning framework for clustering with mixed-modal data. Instead of transforming all the samples into a joint modality-independent space, our framework learns the mappings across individual modality spaces by virtue of cycle-consistency. Through these mappings, we can easily unify all the samples into a single modality space and perform the clustering. Evaluations on several real-world mixed-modal datasets demonstrate the superiority of our proposed framework.
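The cycle-consistency idea can be sketched compactly; the code below is a minimal illustration under assumed dimensions, not DM2C's exact networks, and omits the adversarial losses that accompany the cycle term in practice.

```python
import torch
import torch.nn as nn

# G maps modality A -> B, F maps B -> A; the cycle loss keeps F(G(a)) close
# to a (and G(F(b)) close to b) without any paired supervision.
dim_a, dim_b = 128, 64
G = nn.Sequential(nn.Linear(dim_a, 256), nn.ReLU(), nn.Linear(256, dim_b))
F = nn.Sequential(nn.Linear(dim_b, 256), nn.ReLU(), nn.Linear(256, dim_a))

def cycle_loss(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # ||F(G(a)) - a||_1 + ||G(F(b)) - b||_1
    return (F(G(a)) - a).abs().mean() + (G(F(b)) - b).abs().mean()

a = torch.randn(32, dim_a)  # unpaired samples represented in modality A
b = torch.randn(32, dim_b)  # unpaired samples represented in modality B
loss = cycle_loss(a, b)     # minimized jointly with adversarial losses
```

Once the mappings are trained, every sample can be pushed through them into one chosen modality space, where an ordinary clustering algorithm such as k-means is applied.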


Response for Submission 3163 "DM2C: Deep Mixed-Modal Clustering"

Neural Information Processing Systems

We thank all the reviewers for their careful and valuable comments. Ablation study: we evaluate k-means on the latent modality-specific representations obtained before/after the cross-modal mappings are applied; the results are recorded in the table. Cycle consistency on multiple modalities: perhaps due to our way of writing, we regret that this point was left unclear. Q2, the "1-Lipschitz constraint" is not explained: the 1-Lipschitz constraint is a requirement of the dual formulation of the Wasserstein distance; this approximates the cycle-consistency condition.
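For reference, the 1-Lipschitz requirement comes from the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance, which adversarial critics optimize:

```latex
% Kantorovich-Rubinstein duality: the supremum ranges over 1-Lipschitz
% critics f, which is the constraint referenced in the response.
W_1(P, Q) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)]
```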


Unraveling the Gradient Descent Dynamics of Transformers

Neural Information Processing Systems

While the Transformer architecture has achieved remarkable success across various domains, a thorough theoretical foundation explaining its optimization dynamics is yet to be fully developed. In this study, we aim to bridge this understanding gap by answering the following two core questions: (1) Which types of Transformer architectures allow Gradient Descent (GD) to achieve guaranteed convergence?


Intrinsic Self-Supervision for Data Quality Audits Fabian Gröger, Alvaro Gonzalez-Jimenez

Neural Information Processing Systems

Benchmark datasets in computer vision often contain off-topic images, near-duplicates, and label errors, leading to inaccurate estimates of model performance. In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, which significantly reduces human inspection effort, or a scoring problem, which allows for automated decisions based on score distributions. We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
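As a rough illustration of the distance-based side, the sketch below ranks samples in an embedding space by nearest-neighbor statistics; the indicator definitions and the choice of k are assumptions for the example, not the paper's exact criteria.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rank_data_issues(embeddings: np.ndarray, k: int = 5):
    """Rank samples by distance statistics in a (self-supervised) embedding space:
    tiny nearest-neighbor distance -> near-duplicate candidate;
    large mean distance to k neighbors -> off-topic candidate."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    dists, _ = nn.kneighbors(embeddings)       # column 0 is the point itself
    dup_score = dists[:, 1]                    # smaller -> more duplicate-like
    outlier_score = dists[:, 1:].mean(axis=1)  # larger -> more off-topic
    return np.argsort(dup_score), np.argsort(-outlier_score)

emb = np.random.randn(1000, 128)  # stand-in for SSL features of a dataset
dup_ranking, offtopic_ranking = rank_data_issues(emb)  # inspect the heads
```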


Selecting the independent coordinates of manifolds with large aspect ratios

Neural Information Processing Systems

Many manifold embedding algorithms appear to fail when the data manifold has a large aspect ratio (such as a long, thin strip). Here, we formulate success and failure in terms of finding a smooth embedding, showing that the problem is both pervasive and more complex than previously recognized.
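The failure mode is easy to reproduce with off-the-shelf tools; the demo below uses scikit-learn's SpectralEmbedding on a synthetic 10:1 strip and illustrates the problem the paper formalizes, not its proposed selection method.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

# On a long, thin strip the leading Laplacian eigenvectors are successive
# harmonics of the long axis, so a 2D spectral embedding tends to return
# two functions of the same coordinate and lose the short axis entirely.
rng = np.random.default_rng(0)
strip = np.column_stack([rng.uniform(0, 10, 2000),   # long axis
                         rng.uniform(0, 1, 2000)])   # short axis
emb = SpectralEmbedding(n_components=2, n_neighbors=15).fit_transform(strip)
# If the coordinates were independent, the 2nd one should track the short
# axis; its correlation with strip[:, 1] is typically near zero here.
print(abs(np.corrcoef(emb[:, 1], strip[:, 1])[0, 1]))
```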


Author Feedback: "Selecting the independent coordinates of manifolds with large aspect ratios"

Neural Information Processing Systems

Thanks for the very careful, responsible, and competent reviews our paper has received! Here we comment only on the more significant questions raised. Reviewer 1: "Relate to 'Non-Redundant Spectral Dimensionality Reduction', Michaeli et al." We will do so. "The choice of kernel bandwidth (ε) is not addressed." "If ε is chosen as a diagonal matrix ..., the aspect ratio problem could be fixed (see for example 'Kernel Scaling for ...')." We will discuss this reference in the final paper.
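For concreteness, the reviewer's suggestion replaces a single scalar bandwidth with a per-coordinate one; a sketch of the resulting anisotropic kernel (notation assumed, not taken from the paper) is:

```latex
% RBF kernel with a diagonal bandwidth matrix diag(eps_1, ..., eps_D):
% each coordinate is scaled separately, which can compensate for a
% large aspect ratio of the underlying manifold.
K(x, y) = \exp\!\left( - \sum_{d=1}^{D} \frac{(x_d - y_d)^2}{\varepsilon_d} \right)
```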