AITopics | large-scale dataset

GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring

Neural Information Processing Systems | Mar-22-2026, 18:06:23 GMT

To train a deblurring network, an appropriate dataset with paired blurry and sharp images is essential. Existing datasets collect blurry images either synthetically, by aggregating consecutive sharp frames, or by using sophisticated camera systems to capture real blur. However, these methods offer limited diversity in blur types (blur trajectories) or require extensive human effort to construct large-scale datasets, failing to fully reflect real-world blur scenarios. To address this, we propose GS-Blur, a dataset of synthesized realistic blurry images created using a novel approach. We first reconstruct 3D scenes from multi-view images using 3D Gaussian Splatting (3DGS), then render blurry images by moving the camera view along randomly generated motion trajectories. By adopting various camera trajectories in constructing GS-Blur, our dataset contains realistic and diverse types of blur, offering a large-scale dataset that generalizes well to real-world blur. Using GS-Blur with various deblurring methods, we demonstrate its ability to generalize effectively compared to previous synthetic or real blur datasets, showing significant improvements in deblurring performance. We will publicly release our dataset.
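The abstract does not spell out how the rendered views are combined into one blurry image. As a rough illustration only, the sketch below assumes the common approach of averaging sharp frames sampled along a trajectory; `synthesize_blur` and the toy frames are hypothetical and stand in for renders at sampled camera poses, not for the authors' pipeline.

```python
def synthesize_blur(frames):
    """Average equally sized grayscale frames pixel by pixel.

    Each frame is a list of rows; each row is a list of intensities.
    Assumption: blur is modeled as the plain mean of sharp frames
    captured along the motion trajectory during the exposure.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(frame[r][c] for frame in frames) / n for c in range(width)]
        for r in range(height)
    ]


# Toy example: a single bright pixel moving one step to the right per frame.
sharp_frames = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]
blurry = synthesize_blur(sharp_frames)
```

In the toy example the bright pixel is smeared evenly over the three positions it visited, which is the streak pattern motion blur produces.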

artificial intelligence, dataset, proceedings, (5 more...)

Technology: Information Technology > Artificial Intelligence (0.40)

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Neural Information Processing Systems | Mar-21-2026, 05:06:34 GMT

The arrival of Sora marks a new era for text-to-video diffusion models, bringing significant advancements in video generation and potential applications. However, Sora, along with other text-to-video diffusion models, is highly reliant on prompts, and there is no publicly available dataset that features a study of text-to-video prompts. In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 Million unique text-to-Video Prompts from real users. Additionally, this dataset includes 6.69 million videos generated by four state-of-the-art diffusion models, alongside some related data. We initially discuss the curation of this large-scale dataset, a process that is both time-consuming and costly. Subsequently, we underscore the need for a new prompt dataset specifically designed for text-to-video generation by illustrating how VidProM differs from DiffusionDB, a large-scale prompt-gallery dataset for image generation. Our extensive and diverse dataset also opens up many exciting new research areas. For instance, we suggest exploring text-to-video prompt engineering, efficient video generation, and video copy detection for diffusion models to develop better, more efficient, and safer models. The project (including the collected dataset VidProM and related code) is publicly available at https://vidprom.github.io

artificial intelligence, machine learning, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels

Neural Information Processing Systems | Mar-17-2026, 02:06:41 GMT

Deep neural networks (DNNs) have achieved tremendous success in a variety of applications across many disciplines. Yet, their superior performance comes with the expensive cost of requiring correctly annotated large-scale datasets. Moreover, due to DNNs' rich capacity, errors in training labels can hamper performance. To combat this problem, mean absolute error (MAE) has recently been proposed as a noise-robust alternative to the commonly-used categorical cross entropy (CCE) loss. However, as we show in this paper, MAE can perform poorly with DNNs and large-scale datasets. Here, we present a theoretically grounded set of noise-robust loss functions that can be seen as a generalization of MAE and CCE. Proposed loss functions can be readily applied with any existing DNN architecture and algorithm, while yielding good performance in a wide range of noisy label scenarios. We report results from experiments conducted with CIFAR-10, CIFAR-100 and FASHION-MNIST datasets and synthetically generated noisy labels.
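The abstract does not state the loss itself. A minimal sketch, assuming the L_q form (1 - p_y^q) / q usually associated with this paper, where p_y is the predicted probability of the labeled class: as q approaches 0 it tends to categorical cross entropy, and at q = 1 it is proportional to MAE on one-hot labels. The function names and the default q = 0.7 are illustrative choices, not taken from the listing.

```python
import math


def generalized_cross_entropy(probs, label, q=0.7):
    """L_q loss for one sample: (1 - p_y**q) / q, with q in (0, 1].

    probs: predicted class probabilities (e.g. softmax output).
    label: index of the (possibly noisy) target class.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = probs[label]
    return (1.0 - p_y ** q) / q


def categorical_cross_entropy(probs, label):
    """Standard CCE for one sample: -log(p_y)."""
    return -math.log(probs[label])


probs = [0.7, 0.2, 0.1]
loss_mae_like = generalized_cross_entropy(probs, 0, q=1.0)   # equals 1 - p_y
loss_cce_like = generalized_cross_entropy(probs, 0, q=1e-6)  # close to -log(p_y)
loss_cce = categorical_cross_entropy(probs, 0)
```

Intermediate values of q trade off the two behaviors: smaller q behaves more like CCE, larger q more like the noise-robust MAE.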

artificial intelligence, machine learning, proceedings, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Learning from Highly Sparse Spatio-temporal Data

Neural Information Processing Systems | Feb-17-2026, 08:11:50 GMT

Incomplete spatio-temporal data in the real world has spawned much research.

artificial intelligence, data mining, machine learning, (22 more...)

Country:

Europe > Spain > Galicia > Madrid (0.04)
Asia > China (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Energy (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.67)

c2469e35d469e3c0eca09dbe484eb474-Paper-Conference.pdf

Neural Information Processing Systems | Feb-16-2026, 22:48:40 GMT

artificial intelligence, data mining, machine learning, (15 more...)

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Katakomba: Tools and Benchmarks for Data-Driven NetHack

Neural Information Processing Systems | Feb-16-2026, 08:16:02 GMT

NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: Europe > Portugal > Braga > Braga (0.04)

Genre: Research Report (0.93)

Industry: Leisure & Entertainment > Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Discrimination-aware Channel Pruning for Deep Neural Networks

Zhuangwei Zhuang, Mingkui Tan, Bohan Zhuang, Jing Liu, Yong Guo, Qingyao Wu, Junzhou Huang, Jinhui Zhu

Neural Information Processing Systems | Feb-12-2026, 20:48:58 GMT

Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels.

artificial intelligence, arxivpreprintarxiv, machine learning, (17 more...)

Country: