Alias-Free Mamba Neural Operator

Neural Information Processing Systems

Benefiting from booming deep learning techniques, neural operators (NOs) have emerged as a promising alternative to traditional, computationally expensive methods for solving partial differential equations (PDEs).


Optimal Best-arm Identification in Linear Bandits

Neural Information Processing Systems

We study the problem of best-arm identification with fixed confidence in stochastic linear bandits. The objective is to identify the best arm with a given level of certainty while minimizing the sampling budget. We devise a simple algorithm whose sampling complexity matches known instance-specific lower bounds, asymptotically almost surely and in expectation. The algorithm relies on an arm sampling rule that tracks an optimal proportion of arm draws, and that remarkably can be updated as rarely as we wish, without compromising its theoretical guarantees. Moreover, unlike existing best-arm identification strategies, our algorithm uses a stopping rule that does not depend on the number of arms. Experimental results suggest that our algorithm significantly outperforms existing algorithms. The paper further provides a first analysis of the best-arm identification problem in linear bandits with a continuous set of arms.
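The sampling rule described above, which tracks an optimal proportion of arm draws, can be illustrated with a minimal sketch. Note this is only an illustrative tracking-style rule: the paper's target allocation solves an instance-specific lower-bound optimization and the algorithm also includes a stopping rule, both of which are omitted here; the fixed `target` vector below is a placeholder assumption.

```python
import numpy as np

def tracking_sample(counts, target, t):
    """Pick the arm whose empirical draw count lags furthest behind
    its target share t * target[a] (a C/D-tracking-style rule)."""
    return int(np.argmin(counts - t * target))

K = 4
# Placeholder allocation; in the paper this would be the optimal
# proportion of draws derived from the instance-specific lower bound.
target = np.array([0.4, 0.3, 0.2, 0.1])
counts = np.zeros(K)
for t in range(1, 1001):
    a = tracking_sample(counts, target, t)
    counts[a] += 1  # reward observation omitted in this sketch

proportions = counts / counts.sum()  # stays close to target
```

The rule is deterministic given the targets, which is what allows the empirical draw proportions to track the desired allocation closely even when the allocation is recomputed only occasionally.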


FineStyle: Fine-grained Controllable Style Personalization for Text-to-image Models

Neural Information Processing Systems

Nine image pairs generated by personalized text-to-image models, each fine-tuned on a single style reference image displayed at the corner of the left image of each pair. Fine-grained concepts are written above the images for comparison, illustrating nuanced compositionality across color, foreground object, background, and texture. Full prompts are available in Appendix A.1.


Appendix

Neural Information Processing Systems

The appendix is organized as follows. In Appendix A, we first discuss the relationship of our work to prior art. In Appendix B, we provide some preliminary tools for analyzing our manifold optimization problem. Building on these, the proofs of Theorem 1 and Theorem 2 are provided in Appendix C and Appendix D, respectively. Finally, our experimental setup and additional experimental results are provided in Appendix E. Notation. Before proceeding, we first introduce the notation used throughout the appendix.


Graph Convolutions Enrich the Self-Attention in Transformers!

Neural Information Processing Systems

Transformers, renowned for their self-attention mechanism, have achieved state-of-the-art performance across various tasks in natural language processing, computer vision, time-series modeling, and more. However, one challenge with deep Transformer models is the oversmoothing problem, where representations across layers converge to indistinguishable values, leading to significant performance degradation. We interpret the original self-attention as a simple graph filter and redesign it from a graph signal processing (GSP) perspective.
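The "self-attention as a graph filter" view can be sketched as follows: the row-stochastic attention matrix acts as the weighted adjacency of a graph over tokens, so `A @ V` is a one-hop graph filter applied to node signals. This is only an illustration of the general GSP perspective; the polynomial filter coefficients below are arbitrary assumptions, not the paper's specific redesign.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 8
Q, K_, V = (rng.standard_normal((n, d)) for _ in range(3))

# Standard self-attention: A is row-stochastic and can be read as the
# adjacency matrix of a weighted graph over the n tokens.
A = softmax(Q @ K_.T / np.sqrt(d))

# A polynomial graph filter in A: h0*I + h1*A + h2*A^2 mixes multi-hop
# neighborhoods; plain attention is the special case h = (0, 1, 0).
h0, h1, h2 = 0.5, 1.0, -0.5
H = h0 * np.eye(n) + h1 * A + h2 * (A @ A)
out = H @ V
```

Higher-order terms let the filter attenuate or amplify different graph-frequency components, which is the lever GSP-based redesigns use to counteract oversmoothing.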


A Supplemental Figures

Neural Information Processing Systems

Supplementary Material for "What shapes feature representations?"

Figure A.1: Feature decodability in models with a ResNet-50 architecture trained on the Navon dataset. Accuracy decoding features (shape, texture) from an untrained model (left) versus from shape-trained (center) and texture-trained (right) models. Results for trained models are means across models trained on 5 cross-validation splits. Target features are enhanced relative to the untrained model, whereas non-target features are suppressed.

Figure A.2: Non-target features are suppressed in the post-pool layer of models with a ResNet-50 architecture trained on the Trifeature dataset.


Katherine L. Hermann, Andrew K. Lampinen

Neural Information Processing Systems

In naturalistic learning problems, a model's input contains a wide range of features, some useful for the task at hand, and others not. Of the useful features, which ones does the model use? Of the task-irrelevant features, which ones does the model represent? Answers to these questions are important for understanding the basis of models' decisions, as well as for building models that learn versatile, adaptable representations useful beyond the original training task. We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly.
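Asking "which features does the model represent?" is typically operationalized by training a linear probe to decode each feature from frozen representations. The sketch below is purely illustrative, using synthetic "representations" and a least-squares probe; the paper's actual models, datasets, and probing setup may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20

# Synthetic "representations": one binary latent feature is linearly
# embedded (decodable); a second feature is never encoded at all.
feat_present = rng.integers(0, 2, n)
feat_absent = rng.integers(0, 2, n)
reps = rng.standard_normal((n, d))
reps[:, 0] += 3.0 * feat_present  # only the first feature leaves a trace

def probe_accuracy(X, y):
    """Least-squares linear probe; accuracy of 0.5-thresholded outputs."""
    Xb = np.c_[X, np.ones(len(X))]          # add a bias column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    pred = (Xb @ w) > 0.5
    return (pred == y.astype(bool)).mean()

acc_present = probe_accuracy(reps, feat_present)  # high: decodable
acc_absent = probe_accuracy(reps, feat_absent)    # near chance (0.5)
```

Comparing probe accuracy against an untrained baseline, as in the supplemental figures, distinguishes features a model enhances from those it suppresses.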


71e9c6620d381d60196ebe694840aaaa-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their helpful comments. Feature difficulty (R3): "I hope that the authors have a grasp of manually designed image features and their..." We agree that color is an easier feature than shape or texture. We performed experiments using both vision and non-vision datasets. Indeed, we found that feature difficulty was not the sole determinant of feature use or representation (Figs. 5 & 6). The joint image feature-label statistics of ImageNet are unknown and uncontrolled.


Hume's new EVI 3 model lets you customize AI voices - how to try it

ZDNet

Hume AI is launching EVI 3, the third iteration of its Empathic Voice Interface (EVI) model, which can interact with users in a huge variety of humanlike voices. Like ChatGPT's voice mode, EVI 3 comes with an assortment of preprogrammed AI voices. These are listed by personality and character descriptions, including "Old Knocks Comedian," "Seasoned Life Coach," "Wise Wizard," and "Dungeon Master," as well as the company's namesake, the 18th-century philosopher David Hume. Crucially, the model also comes with a feature that allows users to customize their own AI voices from scratch. And rather than having to adjust a long list of specific attributes, as you might when building a Bitmoji or a video game character, you can simply describe the characteristics of your desired voice, using natural language, and the model will do the rest. The launch reflects a broader effort among AI companies to build more personable and engaging models by training them to exhibit distinct "personalities."


InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction Ziyin Wang

Neural Information Processing Systems

Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interaction (HOI) generation faces notable challenges, primarily due to the lack of large-scale interaction data and comprehensive descriptions that align with these interactions. This paper takes the initiative and showcases the potential of generating human-object interactions without direct training on text-interaction pair data. Our key insight in achieving this is that interaction semantics and dynamics can be decoupled. Being unable to learn interaction semantics through supervised training, we instead leverage pre-trained large models, synergizing knowledge from a large language model and a text-to-motion model. While such knowledge offers high-level control over interaction semantics, it cannot grasp the intricacies of low-level interaction dynamics. To overcome this issue, we introduce a world model designed to comprehend simple physics, modeling how human actions influence object motion. By integrating these components, our novel framework, InterDreamer, is able to generate text-aligned 3D HOI sequences without relying on paired text-interaction data. We apply InterDreamer to the BEHAVE, OMOMO, and CHAIRS datasets, and our comprehensive experimental analysis demonstrates its capability to generate realistic and coherent interaction sequences that seamlessly align with the text directives.