
Unpacking the Flaws of Techbro Dreams of the Future

Mother Jones

Cutaway view of a fictional space colony concept painted by artist Rick Guidice as part of a NASA art program in the 1970s.

This story was originally published by Undark and is reproduced here as part of the Climate Desk collaboration.

Elon Musk once joked: "I would like to die on Mars." Musk is, in fact, deadly serious about colonizing the Red Planet. Part of his motivation is the idea of having a "back-up" planet in case some future catastrophe renders the Earth uninhabitable. Musk has suggested that a million people may be calling Mars home by 2050, and he's hardly alone in his enthusiasm. Venture capitalist Marc Andreessen believes the world can easily support 50 billion people, and more than that once we settle other planets. And Jeff Bezos has spoken of exploiting the resources of the moon and the asteroids to build giant space stations. "I would love to see a trillion humans living in the solar system," he has said. Not so fast, cautions science journalist Adam Becker.


Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Neural Information Processing Systems

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments.
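The epipolar attention idea can be illustrated with a toy sketch. The following is a minimal NumPy illustration of the underlying geometry, not the paper's module: given two camera poses, each query pixel attends preferentially to key pixels lying near its epipolar line in the other view. All function names and the Gaussian falloff are illustrative assumptions.

```python
import numpy as np

def skew(t):
    # Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def fundamental_matrix(K1, K2, R, t):
    # F maps a homogeneous pixel in view 1 to its epipolar line in view 2
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def epipolar_attention_weights(pix1, pix2, F, sigma=2.0):
    # pix1: (N, 2) query pixels in view 1; pix2: (M, 2) key pixels in view 2.
    # Returns (N, M) attention weights biased toward keys near the epipolar line.
    h1 = np.hstack([pix1, np.ones((len(pix1), 1))])  # (N, 3) homogeneous
    h2 = np.hstack([pix2, np.ones((len(pix2), 1))])  # (M, 3)
    lines = h1 @ F.T                                 # (N, 3) epipolar lines in view 2
    num = np.abs(lines @ h2.T)                       # (N, M) |l . p|
    den = np.linalg.norm(lines[:, :2], axis=1, keepdims=True) + 1e-8
    dist = num / den                                 # point-to-line distance
    logits = -(dist / sigma) ** 2                    # Gaussian falloff
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)          # softmax over keys
```

For a pure horizontal translation with identity intrinsics, the epipolar line of a pixel is the same image row in the other view, so keys on that row receive nearly all the attention mass.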



AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties

Neural Information Processing Systems

Anomaly detection is widely used for identifying critical errors and suspicious behaviors, but current methods lack interpretability. We leverage common properties of existing methods and recent advances in generative models to introduce counterfactual explanations for anomaly detection. Given an input, we generate its counterfactual as a diffusion-based repair that shows what a non-anomalous version should have looked like. A key advantage of this approach is that it enables a domain-independent formal specification of explainability desiderata, offering a unified framework for generating and evaluating explanations. We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. The code used for the experiments is accessible at: https://github.com/xjiae/arpro.
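The overall recipe can be mimicked at toy scale. In the sketch below, a median-based detector and linear interpolation stand in for the paper's detectors and diffusion-based repair; all names are assumptions. The point is the shape of the pipeline: detect, repair only the flagged region, then check an explainability desideratum on the repair.

```python
import numpy as np

def anomaly_mask(x, thresh=3.0):
    # Flag points whose deviation from the series median exceeds
    # `thresh` robust standard deviations (a stand-in detector).
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-8
    score = np.abs(x - med) / (1.4826 * mad)
    return score > thresh

def repair(x, mask):
    # "Counterfactual" repair: replace flagged points by interpolating
    # from the unflagged ones (a stand-in for diffusion inpainting).
    x = x.astype(float).copy()
    idx = np.arange(len(x))
    x[mask] = np.interp(idx[mask], idx[~mask], x[~mask])
    return x

def explanation_is_valid(x, x_cf, mask, thresh=3.0):
    # Formal-style desideratum: the repair changes only flagged points
    # and is itself judged non-anomalous by the same detector.
    untouched = np.allclose(x[~mask], x_cf[~mask])
    non_anomalous = not anomaly_mask(x_cf, thresh).any()
    return untouched and non_anomalous
```

Because the desideratum is stated over detector scores rather than over any particular model, the same check applies whether the repair comes from interpolation, as here, or from a diffusion model, as in the paper.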


Ghost kitchen delivery drivers have overrun an Echo Park neighborhood, say frustrated residents

Los Angeles Times

As soon as Echo Park Eats opened on the corner of Sunset Boulevard and Douglas Street in the fall of 2023, Sandy Romero said her neighborhood became overrun with delivery drivers. "The first day that they opened business it was chaotic, unorganized and it's just such a nuisance now," she said. Echo Park Eats is a ghost kitchen, a meal preparation hub for app-based delivery orders. It rents its kitchens to 26 different food vendors. The facility is part of CloudKitchens, led by Travis Kalanick, co-founder of Uber Technologies, which has kitchen locations across the nation including 11 in Los Angeles County.


Mixture of Link Predictors on Graphs

Neural Information Processing Systems

Link prediction, which aims to forecast unseen connections in graphs, is a fundamental task in graph machine learning. Heuristic methods, leveraging a range of different pairwise measures such as common neighbors and shortest paths, often rival the performance of vanilla Graph Neural Networks (GNNs). Recent advances in GNNs for link prediction (GNN4LP) have therefore focused primarily on integrating one or a few types of pairwise information. In this work, we reveal that different node pairs within the same dataset require different pairwise information for accurate prediction, and that models applying the same pairwise information uniformly to all pairs can achieve suboptimal performance. We therefore propose Link-MoE, a simple mixture-of-experts model for link prediction. Link-MoE utilizes various GNNs as experts and strategically selects the appropriate expert for each node pair based on various types of pairwise information. Experimental results across diverse real-world datasets demonstrate substantial performance improvements from Link-MoE. Notably, Link-MoE achieves relative improvements of 18.71% on the MRR metric for the Pubmed dataset and 9.59% on the Hits@100 metric for the ogbl-ppa dataset over the best baselines. The code is available at https://github.com/ml-ml/Link-MoE/.
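In spirit, the per-pair gating can be sketched as a NumPy miniature: hand-crafted pairwise heuristics feed a linear gating network, which mixes the scores of several experts. Arbitrary scoring functions stand in for the paper's GNN experts, and all names and weights are assumptions.

```python
import numpy as np

def pairwise_features(adj, u, v):
    # Hand-crafted pairwise heuristics used as gating input:
    # common neighbors, Jaccard overlap, and preferential attachment.
    nu, nv = set(np.nonzero(adj[u])[0]), set(np.nonzero(adj[v])[0])
    cn = len(nu & nv)
    jac = cn / (len(nu | nv) + 1e-8)
    pa = len(nu) * len(nv)
    return np.array([cn, jac, pa], dtype=float)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_link_score(adj, u, v, experts, gate_w):
    # experts: list of callables (adj, u, v) -> link score in [0, 1]
    # gate_w: (n_experts, n_features) weights of a linear gating network
    feats = pairwise_features(adj, u, v)
    gates = softmax(gate_w @ feats)               # per-pair expert weights
    scores = np.array([e(adj, u, v) for e in experts])
    return float(gates @ scores)                  # weighted expert mixture
```

Because the gate is recomputed per node pair, two pairs in the same graph can be routed to entirely different experts, which is the behavior the paper argues uniform models lack.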


Rectifying the Shortcut Learning of Background for Few-Shot Learning

Neural Information Processing Systems

The category gap between training and evaluation has been characterised as one of the main obstacles to the success of Few-Shot Learning (FSL). In this paper, we empirically identify, for the first time, image background, common in realistic images, as shortcut knowledge that is helpful for in-class classification but does not generalize beyond the training categories in FSL. A novel framework, COSOC, is designed to tackle this problem by extracting the foreground objects in images at both training and evaluation, without any extra supervision. Extensive experiments carried out on inductive FSL tasks demonstrate the effectiveness of our approach.
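As a rough intuition for foreground extraction without extra supervision, one could threshold a backbone activation map and crop the image to its active region. The sketch below is only that crude stand-in, not the paper's actual procedure; the function name and quantile threshold are assumptions.

```python
import numpy as np

def foreground_crop(image, act_map, q=0.95):
    # Crude stand-in for unsupervised foreground extraction: threshold an
    # activation map at its q-th quantile and crop the image to the
    # bounding box of the surviving activations, discarding background.
    mask = act_map >= np.quantile(act_map, q)
    rows, cols = np.nonzero(mask)
    r0, r1 = rows.min(), rows.max() + 1
    c0, c1 = cols.min(), cols.max() + 1
    return image[r0:r1, c0:c1]
```

Training and evaluating on such crops removes most background pixels, which is the kind of shortcut signal the abstract identifies as ungeneralizable across categories.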


Supplementary Material. A: Proofs & Derivations. A.1: Finite- and Infinite-Horizon Variational Objectives

Neural Information Processing Systems

To solve the variational problem in Equation (A.5), we can define a factorized variational family. Returning to the variational problem in Equation (A.5), we can now write D. Following Haarnoja et al. [16], we define the variational distribution over next states to be the true transition dynamics, that is, q.

Corollary 1 (Fixed-Time Outcome-Driven Reward Function). Parts of this proof are adapted from the proof given in Haarnoja et al. [16], modified for the Bellman operator proposed in Definition 1.

1. Outcome-Driven Policy Evaluation (ODPE): Instead of absorbing the entropy term into the Q-function, we can define an entropy-augmented reward as r. The first condition is true by assumption. Therefore, we apply convergence results for policy evaluation with transition-dependent discount factors [53] to this contraction mapping, and the result immediately follows. Convergence follows from Outcome-Driven Policy Evaluation above.

Remark 2. The convergence proof of ODPE assumes a transition-dependent discount factor [53], because the variational distribution used in Equation (11) depends on the next state and action, as well as on the desired outcome.
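For context, the entropy-augmented reward invoked in the ODPE step typically takes the form used by Haarnoja et al. [16]; the following is a sketch of that standard soft policy evaluation construction, not the paper's exact outcome-driven variant:

```latex
\tilde{r}(\mathbf{s}_t, \mathbf{a}_t) \triangleq
  r(\mathbf{s}_t, \mathbf{a}_t)
  + \mathbb{E}_{\mathbf{s}_{t+1} \sim p}\!\left[
      \mathcal{H}\!\left( \pi(\,\cdot \mid \mathbf{s}_{t+1}) \right)
    \right]
```

Absorbing the policy entropy into the reward in this way lets the standard contraction argument for policy evaluation be applied to the entropy-regularized objective unchanged.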


EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning

Neural Information Processing Systems

Figure 2: Still images extracted from the 360-degree videos used in the experiment to display various environments to the participants. The videos were selected from a publicly available 360-degree VR video dataset (Li et al., 2017). The EEVR dataset comprises synchronized pairs of physiological signals and textual data. It includes responses to four self-assessment questions regarding perceived arousal, valence, dominance, and discrete emotion ratings collected using the PANAS questionnaire (which were further utilized to calculate Positive and Negative Affect Scores). The EEVR dataset was collected using Virtual Reality (VR) 360-degree videos as the elicitation medium. The videos were selected based on their arousal and valence ratings to cover all four quadrants of the Russell circumplex model of emotion (Russell et al., 1989), as shown in Figure 2. The remainder of the supplementary materials provides detailed information about the EEVR dataset. Figure 3 provides a datasheet for the EEVR dataset based on Gebru et al. (2018).


EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning

Neural Information Processing Systems

EEVR (Emotion Elicitation in Virtual Reality) is a novel dataset designed for language-supervised pre-training for emotion recognition tasks, such as valence and arousal classification. It features high-quality physiological signals, including electrodermal activity (EDA) and photoplethysmography (PPG), acquired through emotion elicitation with 360-degree virtual reality (VR) videos. Additionally, it includes subject-wise textual descriptions of the emotions experienced during each stimulus, gathered from qualitative interviews. The dataset consists of recordings from 37 participants and is the first to pair raw text with physiological signals, providing contextual information that objective labels cannot offer. To leverage this dataset, we introduce the Contrastive Language Signal Pre-training (CLSP) method, which jointly learns representations from pairs of physiological signals and textual descriptions. Our results show that integrating self-reported textual descriptions with physiological signals significantly improves performance on emotion recognition tasks such as arousal and valence classification. Moreover, our pre-trained CLSP model demonstrates strong zero-shot transferability to existing datasets, outperforming supervised baseline models and suggesting that the representations learned by our method are more contextualized and generalizable. The release also includes baseline models for arousal, valence, and emotion classification, as well as code for data cleaning and feature extraction.
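A contrastive language-signal objective of this kind presumably follows the CLIP recipe of symmetric contrastive alignment between the two embedding spaces. The following is a minimal NumPy sketch of that generic loss; the function name, temperature, and batch handling are assumptions, not the released CLSP code.

```python
import numpy as np

def clip_style_loss(sig_emb, txt_emb, temp=0.07):
    # Symmetric InfoNCE over a batch of paired (signal, text) embeddings.
    # sig_emb, txt_emb: (B, D) L2-normalized embeddings; row i of each
    # matrix is a matched pair, all other rows are negatives.
    logits = sig_emb @ txt_emb.T / temp              # (B, B) similarities
    labels = np.arange(len(logits))                  # pair i matches i

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)      # stable log-softmax
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()          # diagonal = matches

    # Average the signal->text and text->signal directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each signal embedding toward its own textual description and away from the other descriptions in the batch, which is what enables the zero-shot transfer described in the abstract.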