

Bringing Image Structure to Video via Frame-Clip Consistency of Object Tokens

Neural Information Processing Systems

Recent action recognition models have achieved impressive results by integrating objects, their locations, and their interactions. However, obtaining dense structured annotations for each frame is tedious and time-consuming, making these methods expensive to train and less scalable. On the other hand, one often does have access to a small set of annotated images, either within or outside the domain of interest. Here we ask how such images can be leveraged for downstream video understanding tasks. We propose a learning framework, StructureViT (SViT for short), which demonstrates how the structure of a small number of images, available only during training, can be used to improve a video model.


clarify that B-RAI [24] is a recently proposed algorithm for estimating the posterior probability of causal relations among observed variables

Neural Information Processing Systems

We would like to sincerely thank you for your important ideas and constructive comments. It is not related to the deep learning domain. We will clearly state these contributions in the paper. As you suggest, we will define B2N, RAI, and GGT in the paper. An ensemble of 15 (last point on the curve, Figure 1) has a total of 3.6M parameters. Optimizing for a specific loss hinders other objectives, e.g., accuracy and calibration.


Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

In terms of g(·, y), we consider two settings - strongly convex and nonconvex - and improve upon the best known rates in both. For strongly-convex g(·, y), ∀y, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it can find the global optimum in Õ(1/k²) iterations.
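The paper's own DIAG construction is not reproduced here, but the extragradient idea behind Mirror-Prox (in its Euclidean form) can be sketched on a toy bilinear saddle-point problem min_x max_y xy. The step size, iteration count, and function are illustrative assumptions, not values from the paper.

```python
# Extragradient (Euclidean Mirror-Prox) sketch on the bilinear
# saddle-point problem min_x max_y f(x, y) = x * y.
# Step size and iteration count are illustrative choices.

def extragradient(x0, y0, eta=0.1, steps=1000):
    x, y = x0, y0
    for _ in range(steps):
        # Half step: gradients evaluated at the current point.
        xh = x - eta * y          # grad_x f(x, y) = y
        yh = y + eta * x          # grad_y f(x, y) = x
        # Full step: gradients re-evaluated at the half point.
        x = x - eta * yh
        y = y + eta * xh
    return x, y

x, y = extragradient(1.0, 1.0)
```

Plain simultaneous gradient descent/ascent diverges on this problem, while the extragradient iterates contract toward the saddle point (0, 0); this "look ahead, then step" structure is the same device that Mirror-Prox-style methods build on.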


strongly-convex-concave minimax problems first, which we will add in the final revision

Neural Information Processing Systems

We thank all the reviewers for their constructive comments. Conceptual DIAG: The intuition behind Algorithm 1 stems from a "conceptual" version of DIAG (also specified in Algorithm 1, Step 4), which is inspired by the conceptual version of Mirror-Prox (MP). Thus the overall complexity of Imp-STEP is O(·) steps. Response to reviewer 1: We agree with, and will include, the reviewer's comment on the non-smoothness of the objective. We will devote more space to explaining the DIAG algorithm and to discussing more related works. We will add a precise justification (omitted due to lack of space) in the next revision.


DARE: Disentanglement-Augmented Rationale Extraction

Neural Information Processing Systems

Rationale extraction is a straightforward way to improve model explainability: rationales are a subsequence of the original input that can be extracted to support the prediction. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. Since previous works fail to fully exploit the original input and ignore the information in non-selected tokens, in this paper we propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into rationale representations and non-rationale ones, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of the proposed method. Code is released at https://github.com/yuelinan/DARE.
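DARE's neural MI estimator is not specified in the abstract; purely to illustrate the quantity its disentanglement objective drives down, here is a plug-in estimate of mutual information between two discretized representations. The discretization into labels and all variable names are assumptions for the sketch, not part of the method.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in MI estimate (in bits) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * log2( p_joint / (p(x) * p(y)) )
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi

# Perfectly dependent representations share 1 bit of information,
# while independent ones share none - the state DARE's MI
# minimization pushes the two disentangled representations toward.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])    # 1.0
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # 0.0
```

In DARE itself the representations are continuous and the MI is bounded with a learned estimator, but the target of the minimization is the same quantity computed exactly here for the discrete case.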


The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Neural Information Processing Systems

The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LLMs like Llama 3 and Mixtral are not publicly available, and very little is known about how they were created. In this work, we introduce FineWeb, a 15-trillion token dataset derived from 96 Common Crawl snapshots that produces better-performing LLMs than other open pretraining datasets. To advance the understanding of how best to curate high-quality pretraining datasets, we carefully document and ablate all of the design choices used in FineWeb, including in-depth investigations of deduplication and filtering strategies. In addition, we introduce FineWeb-Edu, a 1.3-trillion token collection of educational text filtered from FineWeb.
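The abstract mentions deduplication ablations without giving the exact pipeline; a common building block for web-scale near-deduplication is MinHash over word shingles, sketched below. The shingle size, number of hash functions, and example documents are illustrative assumptions, not FineWeb's actual configuration.

```python
import hashlib

def shingles(text, k=3):
    """Set of k-word shingles for a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(shingle_set, num_hashes=64):
    """One minimum per seeded hash function; similar sets get similar signatures."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set
        ))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc = "the quick brown fox jumps over the lazy dog near the river bank"
near_dup = "the quick brown fox jumps over the sleepy dog near the river bank"
unrelated = "pretraining corpora require careful filtering and quality heuristics"
```

Near-duplicate pages share most shingles and so collide in most signature slots, while unrelated pages match almost none; thresholding this estimate lets a pipeline drop near-duplicates without pairwise comparison of full texts.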


Appendix: Not All Low-Pass Filters are Robust in Graph Convolutional Networks (B Broader Impact; C Additional Related Work; D Additional Preliminaries on Graph Signal Filtering)

Neural Information Processing Systems

For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope?
If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
If you used crowdsourcing or conducted research with human subjects... (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A] (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A] (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

Graph Convolutional Networks (GCNs) can be crucial tools for a broad range of applications, including social networks, computer vision, natural language processing, traffic prediction, chemistry, protein design, and recommendation systems [64, 58]. Any of these applications may have a different social effect. The use of GCNs could improve protein-design efficiency and lead to the development of new medicines, but it could also result in job losses.


Appendix A Limitations

Neural Information Processing Systems

Table 6 provides summary statistics of domain coverage. Overall, the benchmark covers 8,637 biology images and 8,678 pathology images across 12 subdomains. Similarly, Table 7 shows summary statistics of microscopy modalities covered by Micro-Bench perception, including 10,864 images for light microscopy, 5,618 for fluorescence microscopy, and 833 images for electron microscopy across 8 microscopy imaging submodalities and 25 unique microscopy staining techniques (see Table 8). Micro-Bench Perception (Coarse-grained): Hierarchical metadata for each of the 17,235 perception images and task-specific templates (shown in Table 23) are used to create 5 coarse-grained questions and captions regarding microscopy modality, submodality, domain, subdomain, and staining technique. The use of hierarchical metadata enables the generation of options within each hierarchical level.
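The exact Micro-Bench templates are given in the paper's Table 23; purely as an illustration of template-based question generation from hierarchical metadata, the record fields, template text, and option pool below are assumptions, not the benchmark's real schema.

```python
# Hypothetical metadata record and option pool. The key idea from the
# paper is that distractors are drawn from within the same hierarchical
# level (e.g. other modalities when asking about modality).
record = {"modality": "light microscopy", "submodality": "brightfield",
          "domain": "pathology", "stain": "H&E"}
option_pool = {"modality": ["light microscopy", "fluorescence microscopy",
                            "electron microscopy"]}

def coarse_question(record, level, pool):
    """Fill a template for one hierarchical level with same-level options."""
    question = f"Which {level} was used to acquire this image?"
    options = sorted(pool[level])   # distractors stay within the level
    answer = record[level]
    return question, options, answer

q, opts, ans = coarse_question(record, "modality", option_pool)
```

Because options come from the same level of the hierarchy as the answer, a question about modality never mixes in, say, staining techniques, which keeps the coarse-grained questions well-posed.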


Topological Attention for Time Series Forecasting

Neural Information Processing Systems

The problem of (point) forecasting univariate time series is considered. Most approaches, ranging from traditional statistical methods to recent learning-based techniques with neural networks, directly operate on raw time series observations. As an extension, we study whether local topological properties, as captured via persistent homology, can serve as a reliable signal that provides complementary information for learning to forecast. To this end, we propose topological attention, which allows attending to local topological features within a time horizon of historical data. Our approach easily integrates into existing end-to-end trainable forecasting models, such as N-BEATS, and, in combination with the latter, exhibits state-of-the-art performance on the large-scale M4 benchmark dataset of 100,000 diverse time series from different domains. Ablation experiments, as well as a comparison to a broad range of forecasting methods in a setting where only a single time series is available for training, corroborate the beneficial nature of including local topological information through an attention mechanism.
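As a sketch of the "local topological features" involved, the 0-dimensional sublevel-set persistence of a univariate window can be computed with a simple merge procedure: sweep values in increasing order, track connected components with union-find, and apply the elder rule, pairing each local minimum (birth) with the level at which its component merges into an older one (death). This illustrates persistent homology on a toy series under those standard conventions; it is not the paper's feature pipeline.

```python
def sublevel_persistence(values):
    """0-dim persistence pairs (birth, death) of a 1-D series via the
    elder rule; the never-dying global-minimum component is omitted."""
    n = len(values)
    parent = list(range(n))
    birth = {}                       # component root -> birth value

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    added = [False] * n
    for i in sorted(range(n), key=lambda i: values[i]):
        added[i] = True
        birth[i] = values[i]         # tentatively a new component
        for j in (i - 1, i + 1):
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Elder rule: the younger component dies at this level.
                    young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                    if birth[young] < values[i]:   # skip zero-persistence pairs
                        pairs.append((birth[young], values[i]))
                    parent[young] = old
    return pairs

# The local minimum at value 1.0 dies when the peak at 2.0 merges its
# component into the one born at the global minimum 0.0.
print(sublevel_persistence([0.0, 2.0, 1.0, 3.0]))  # [(1.0, 2.0)]
```

Persistence values (death minus birth) computed this way over sliding windows are the kind of complementary local signal the attention mechanism described above could attend to.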


'Frasier' star Kelsey Grammer voices growing alarm over AI manipulation

FOX News

While artificial intelligence (AI) is playing a bigger role than ever in Hollywood, award-winning actor Kelsey Grammer is warning it may be "dangerous." The "Karen: A Brother Remembers" author opened up about his growing concern over AI deepfakes and the potential blurred lines between reality and manipulation. "What I'm a little sad about is our prevalence these days to come up with so many, as they try to say, deepfakes," he told Fox News Digital. "You know, the ones who say it usually are the ones who are actually doing it." AI-generated images, known as "deepfakes," often involve editing videos or photos of people to make them look like someone else by using artificial intelligence. While the "Frasier" star has acknowledged AI to be beneficial in some capacity, including in the medical field, Grammer shared his reservations about how the technology can fabricate someone's identity in seconds. WATCH: KELSEY GRAMMER WARNS AI WILL 'NEVER REFLECT THE SAME SPONTANEITY' AS HUMANS "I recognize the validity and the potential in AI, especially in medicine and a number of other things," Grammer said. Grammer warned, "But AI still is...