

CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes

Neural Information Processing Systems

Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. Cartoons follow the principles of animation, which allow animators to create expressive, unambiguous causal relationships between events that form a coherent storyline. Leveraging these properties, along with thought-provoking questions and multilevel answers (an answer plus a detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand that models solve more challenging, yet well-defined, causal relationships. We also introduce hard incorrect-answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced and explicit causal relationship modeling, together with joint modeling of vision and language, as the immediate areas for future work. Alongside other complementary datasets, our new, challenging dataset will pave the way for these developments in the field.
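
As a concrete illustration of the multilevel answer format, here is a hypothetical sketch of what a single QA record might look like; the field names and the example clip are our assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of a CausalChaos!-style QA record; the schema is an
# illustrative assumption, not the dataset's published format.
from dataclasses import dataclass, field

@dataclass
class CausalWhyQASample:
    clip_id: str                     # reference to a "Tom and Jerry" clip
    question: str                    # a causal Why-question about the clip
    answer: str                      # short correct answer
    explanation: str                 # detailed causal-chain explanation
    distractors: list = field(default_factory=list)  # hard mined incorrect answers

sample = CausalWhyQASample(
    clip_id="clip_0001",
    question="Why does Tom jump off the table?",
    answer="Because Jerry lit a firecracker under him.",
    explanation="Jerry lights a firecracker; the bang startles Tom, "
                "who leaps off the table to escape the blast.",
    distractors=["Because Tom slipped on a banana peel."],
)
```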



Collaborative Refining for Learning from Inaccurate Labels

Neural Information Processing Systems

This paper considers the problem of learning from multiple sets of inaccurate labels, which can be obtained cheaply from low-cost annotators such as rule-based annotators. Previous works typically concentrate on aggregating information from all the annotators, overlooking the significance of data refinement. This paper presents a collaborative refining approach for learning from inaccurate labels. To refine the data, we introduce annotator agreement as an instrument: whether multiple annotators agree or disagree on the label for a given sample. For samples on which some annotators disagree, we propose a comparative strategy to filter out noise.
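
To make the idea concrete, here is a minimal sketch of agreement-based refinement as we read it from the abstract; the agreement test, the loss-comparison rule, and the median threshold are our assumptions, not the authors' method.

```python
import numpy as np

def refine_by_agreement(labels, losses):
    """Toy sketch of agreement-based data refinement (our reading of the
    abstract, not the authors' code). `labels` is (n_samples, n_annotators);
    `losses` holds the current model's per-sample loss, used comparatively."""
    agree = (labels == labels[:, [0]]).all(axis=1)  # all annotators agree
    keep = agree.copy()
    # Comparative strategy (assumed): among disagreement samples, keep those
    # the model already fits comparatively well and treat the rest as noise.
    threshold = np.median(losses[agree]) if agree.any() else np.median(losses)
    keep |= (~agree) & (losses <= threshold)
    return keep

mask = refine_by_agreement(np.array([[1, 1], [0, 1], [1, 0]]),
                           np.array([0.2, 0.1, 0.9]))  # -> [True, True, False]
```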


Is Inductive Inference Possible?

Neural Information Processing Systems

Can a physicist make only a finite number of errors in the eternal quest to uncover the law of nature? This millennium-old philosophical problem, known as inductive inference, lies at the heart of epistemology. Despite its significance to understanding human reasoning, a rigorous justification of inductive inference has remained elusive. At a high level, inductive inference asks whether one can make at most finitely many errors amidst an infinite sequence of observations, when deducing the correct hypothesis from a given hypothesis class. Historically, the only theoretical guarantee has been that if the hypothesis class is countable, inductive inference is possible, as exemplified by Solomonoff induction for learning Turing machines. In this paper, we provide a tight characterization of inductive inference by establishing a novel link to online learning theory. As our main result, we prove that inductive inference is possible if and only if the hypothesis class is a countable union of online learnable classes, potentially of uncountable size, no matter whether the observations are adaptively chosen or i.i.d. sampled. Moreover, the same condition is sufficient and necessary in the agnostic setting: any hypothesis class meeting this criterion enjoys an Õ(√T) regret bound for any time step T, while others suffer an arbitrarily slow rate of regret. Our main technical tool is a novel non-uniform online learning framework, which may be of independent interest.
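
As a reading aid, here is one way to formalize the claim; the notation (the error count, the class decomposition) is ours and may differ from the paper's.

```latex
% Setting (notation ours): a learner sees x_t, predicts \hat{y}_t, then
% observes h^*(x_t) for an unknown h^* in a hypothesis class \mathcal{H}.
% Inductive inference = at most finitely many errors on any infinite sequence:
\sum_{t=1}^{\infty} \mathbf{1}\!\left[\hat{y}_t \neq h^*(x_t)\right] < \infty .
% The characterization: this is achievable for every h^* \in \mathcal{H}
% if and only if
\mathcal{H} = \bigcup_{k=1}^{\infty} \mathcal{H}_k ,
\qquad \text{each } \mathcal{H}_k \text{ online learnable,}
% and in the agnostic setting any such class enjoys regret
\mathrm{Reg}(T) = \tilde{O}\!\left(\sqrt{T}\right).
```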


Scalable Global Optimization via Local Bayesian Optimization (Author Feedback)

Neural Information Processing Systems

R2, why compare to COBYLA instead of BOBYQA: Thank you for the suggestion! We ran BOBYQA (using nlopt) on robot pushing (mean 9.22) and rover (mean 1.40), and it indeed performs better than COBYLA. We will therefore replace COBYLA with BOBYQA in all experiments; note that TuRBO still outperforms both by a large margin. R2, the results of EBO for the robot experiments are quite different from the results in the original paper: The results are consistent, and we used the code released by the authors.
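
For readers who want to reproduce the comparison, a minimal sketch of calling BOBYQA through nlopt's Python bindings follows; the quadratic test function, bounds, and evaluation budget are placeholders, not the robot pushing or rover benchmarks.

```python
import numpy as np
import nlopt

def run_bobyqa(f, lb, ub, x0, max_evals=200):
    """Minimal sketch of derivative-free minimization with BOBYQA via nlopt,
    as the rebuttal describes using it; problem setup here is a stand-in."""
    opt = nlopt.opt(nlopt.LN_BOBYQA, len(x0))
    opt.set_lower_bounds(lb)
    opt.set_upper_bounds(ub)
    opt.set_min_objective(lambda x, grad: float(f(x)))  # grad unused (derivative-free)
    opt.set_maxeval(max_evals)
    x_best = opt.optimize(x0)
    return x_best, opt.last_optimum_value()

x, fx = run_bobyqa(lambda x: np.sum(x**2), [-5.0] * 3, [5.0] * 3, np.ones(3))
```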


End-to-End Learning on 3D Protein Structure for Interface Prediction

Neural Information Processing Systems

Despite an explosion in the number of experimentally determined, atomically detailed structures of biomolecules, many critical tasks in structural biology remain data-limited. Whether performance in such tasks can be improved by using large repositories of tangentially related structural data remains an open question. To address this question, we focused on a central problem in biology: predicting how proteins interact with one another--that is, which surfaces of one protein bind to those of another protein. We built a training dataset, the Database of Interacting Protein Structures (DIPS), that contains biases but is two orders of magnitude larger than those used previously. We found that these biases significantly degrade the performance of existing methods on gold-standard data. Hypothesizing that assumptions baked into the hand-crafted features on which these methods depend were the source of the problem, we developed the first end-to-end learning model for protein interface prediction, the Siamese Atomic Surfacelet Network (SASNet). Using only spatial coordinates and identities of atoms, SASNet outperforms state-of-the-art methods trained on gold-standard structural data, even when trained on only 3% of our new dataset.
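
A loose sketch of what such an end-to-end model could look like, assuming voxelized atomic neighborhoods as input; the layer sizes, input encoding, and pairing head are our guesses, not SASNet's actual architecture.

```python
import torch
import torch.nn as nn

class SiameseSurfaceletNet(nn.Module):
    """Loose sketch of a SASNet-style model (architecture details assumed):
    two weight-shared 3D CNN towers encode voxelized atomic neighborhoods
    ("surfacelets") from each protein, and a head scores whether the pair
    of surfacelets belongs to the interface."""
    def __init__(self, atom_channels=4, embed_dim=128):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv3d(atom_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        self.head = nn.Linear(2 * embed_dim, 1)

    def forward(self, voxels_a, voxels_b):
        za, zb = self.tower(voxels_a), self.tower(voxels_b)  # shared weights
        return self.head(torch.cat([za, zb], dim=-1))        # interface logit

logits = SiameseSurfaceletNet()(torch.randn(2, 4, 16, 16, 16),
                                torch.randn(2, 4, 16, 16, 16))
```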


End-to-End Learning on 3D Protein Structure for Interface Prediction (Author Feedback)

Neural Information Processing Systems

We thank the reviewers for their detailed and thoughtful feedback; we respond to each reviewer individually below. The reported statistic is the mean (across training seeds) of the median (across complexes) AUROC. We will clarify this in the captions of Table 2 and Figure 1.
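
For clarity, a small sketch of how such a mean-of-medians AUROC could be computed; the input format is a toy assumption, not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_of_median_auroc(preds):
    """Per seed, take the median AUROC across complexes, then average across
    seeds. `preds` maps seed -> list of (y_true, y_score) pairs per complex."""
    per_seed_medians = [
        np.median([roc_auc_score(y, s) for y, s in complexes])
        for complexes in preds.values()
    ]
    return float(np.mean(per_seed_medians))

scores = {0: [([0, 1, 1], [0.1, 0.8, 0.6])],
          1: [([0, 1, 1], [0.3, 0.7, 0.9])]}
print(mean_of_median_auroc(scores))
```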


Researchers develop face 'e-tattoo' to track mental workload in high-stress jobs

FOX News

Scientists say they have devised a way to help people in stressful and demanding work environments track their brainwaves and mental workload: an electronic tattoo device, or "e-tattoo," worn on the face. In a study published in the journal Device, the team of researchers wrote that they found e-tattoos to be a more cost-effective and simpler way to track one's mental workload. Dr. Nanshu Lu, the senior author of the research from the University of Texas at Austin, wrote that mental workload is a critical factor in human-in-the-loop systems, directly influencing cognitive performance and decision-making. Lu told Fox News Digital in an email that the device was motivated by high-demand, high-stakes jobs such as pilots, air traffic controllers, doctors and emergency dispatchers.


Rethinking the Membrane Dynamics and Optimization Objectives of Spiking Neural Networks

Neural Information Processing Systems

Although spiking neural networks (SNNs) have demonstrated notable energy efficiency across various fields, the limited firing patterns of spiking neurons within fixed time steps restrict the expression of information, which impedes further improvement of SNN performance. In addition, current implementations of SNNs typically take the firing rate or the average membrane potential of the last layer as the output, leaving other possibilities unexplored. In this paper, we identify that the limited spike patterns of spiking neurons stem from the initial membrane potential (IMP), which is conventionally set to 0. By adjusting the IMP, spiking neurons can generate additional firing patterns and pattern mappings. Furthermore, we find that in static tasks, the accuracy of SNNs at each time step increases as the membrane potential evolves from zero. This observation inspires us to propose a learnable IMP, which accelerates the evolution of the membrane potential and enables higher performance within a limited number of time steps. Additionally, we introduce the last-time-step (LTS) approach to accelerate convergence in static tasks, and we propose a label-smoothed temporal efficient training (TET) loss to mitigate the conflict between the optimization objective and the regularization term in vanilla TET. Our methods improve accuracy by 4.05% on ImageNet over the baseline and achieve state-of-the-art performance of 87.80% on CIFAR10-DVS and 87.86% on N-Caltech101.
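
A minimal sketch of a leaky integrate-and-fire neuron with a learnable IMP, as we understand the idea; the exact dynamics, the soft reset, and the omission of a surrogate gradient are simplifying assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LIFWithLearnableIMP(nn.Module):
    """Sketch of a leaky integrate-and-fire neuron whose initial membrane
    potential (IMP) is a learnable parameter rather than fixed at zero."""
    def __init__(self, features, tau=2.0, v_th=1.0):
        super().__init__()
        self.imp = nn.Parameter(torch.zeros(features))  # learnable IMP (init 0)
        self.decay, self.v_th = 1.0 - 1.0 / tau, v_th

    def forward(self, x):                  # x: (time, batch, features)
        v = self.imp.expand_as(x[0]).clone()
        spikes = []
        for x_t in x:
            v = self.decay * v + x_t       # leaky integration
            s = (v >= self.v_th).float()   # fire (surrogate gradient omitted)
            v = v - s * self.v_th          # soft reset
            spikes.append(s)
        return torch.stack(spikes)

out = LIFWithLearnableIMP(8)(torch.randn(4, 2, 8))  # (time=4, batch=2, feat=8)
```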