AITopics | Asia

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data Only The Falcon LLM Team

Neural Information Processing Systems | Apr-30-2026, 09:16:27 GMT

This curation process is believed to be necessary to produce performant models with broad zero-shot generalization abilities. However, as larger models requiring pretraining on trillions of tokens are considered, it is unclear how scalable is curation, and whether we will run out of unique high-quality data soon.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia (0.28)
North America > United States (0.28)

Genre:

Research Report (0.68)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Neural Information Processing Systems | Apr-30-2026, 09:07:36 GMT

Many fine-grained classification tasks, like rare animal identification, have limited training data and consequently classifiers trained on these datasets often fail to generalize to variations in the domain like changes in weather or location. As such, we explore how natural language descriptions of the domains seen in training data can be used with large vision models trained on diverse pretraining datasets to generate useful variations of the training data. We introduce ALIA (Automated Language-guided Image Augmentation), a method which utilizes large vision and language models to automatically generate natural language descriptions of a dataset's domains and augment the training data via language-guided image editing. To maintain data integrity, a model trained on the original dataset filters out minimal image edits and those which corrupt class-relevant information. The resulting dataset is visually consistent with the original training data and offers significantly enhanced diversity. We show that ALIA surpasses traditional data augmentation and text-to-image generated data on fine-grained classification tasks, including cases of domain generalization and contextual bias. Code is available at https://github.com/lisadunlap/ALIA.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia > Middle East (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

f7d3cef7ff579f2f903c8f458e730cae-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 08:38:51 GMT

artificial intelligence, machine learning, subtask, (14 more...)

Neural Information Processing Systems

Country:

Asia > China (0.47)
Europe (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Dataset for Analyzing Streaming Media Performance over HTTP/3 Browsers

Neural Information Processing Systems | Apr-30-2026, 08:09:28 GMT

HTTP/3 is a new application layer protocol supported by most browsers. It uses QUIC as an underlying transport protocol. QUIC provides multiple benefits, like faster connection establishment, reduced latency, and improved connection migration. Hence, popular browsers like Chrome/Chromium, Microsoft Edge, Apple Safari, and Mozilla Firefox have started supporting it. This paper presents an HTTP/3-supported browser dataset collection tool named H3B.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > India (0.15)

Industry:

Telecommunications (0.94)
Information Technology > Software (0.68)
Information Technology > Networks (0.46)
(2 more...)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Communications > Web (0.66)

GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image

Neural Information Processing Systems | Apr-30-2026, 07:49:01 GMT

The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images.

artificial intelligence, deep learning, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

f4b6ef2a78684dca2fb3f1c09372e041-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 07:47:43 GMT

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.28)
Asia (0.28)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

f4b6ef2a78684dca2fb3f1c09372e041-Paper-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 07:47:39 GMT

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
Asia (0.28)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

ChatGPT trounces humans in entrance exams for top Japan university, study finds

The Japan Times | Apr-30-2026, 07:38:00 GMT

AI models surpassed the highest score recorded for a human test taker in this year's University of Tokyo entrance exam, a new study shows. If an artificial intelligence model such as ChatGPT had taken the entrance exams for Japan's top university in 2026, it would have been assessed as top of the class and admitted for scoring higher than any human test takers, a study by AI startup LifePrompt has found. The research used three major AI models -- ChatGPT 5.2 Thinking by OpenAI, Gemini 3 Pro Preview by Google and Claude Opus 4.5 by Anthropic -- and had them take the actual entrance exam used by the University of Tokyo in February 2026 to assess candidates for courses set to start in April. The university's category 3 science exam, often taken by those who want to enter the institution's medical school, is considered the most difficult exam to pass in Japan.

large language model, machine learning, natural language, (13 more...)

The Japan Times

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.89)

Industry:

Media > News (0.71)
Education > Assessment & Standards > Student Performance (0.56)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The split between China and Silicon Valley just got wider

The Japan Times | Apr-30-2026, 07:33:00 GMT

Beijing's insistence that Meta unwind its deal with a Chinese A.I. start-up marks an escalation in the geopolitical fight over advanced tech. TAIPEI - Manus, an artificial intelligence startup, began with an idea among three engineers in Wuhan, China, united by an obsession with AI and a shared ambition to build a global venture. From the outset, they looked beyond China. Their big break came in March last year. Manus had drawn the attention of Silicon Valley investors with an AI agent capable of carrying out tasks on its own.

artificial intelligence, sanae takaichi sushi tech tokyo, social media, (8 more...)

The Japan Times

Country:

Asia > China (1.00)
North America > United States > California (0.89)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.47)

Industry:

Information Technology > Services (0.33)
Media > News (0.31)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.79)

Appendix

Neural Information Processing Systems | Apr-30-2026, 07:24:16 GMT

The following section answers the questions listed in Datasheets for Datasets. A.1 Motivation For what purpose was the dataset created? VisAlign is created to serve as a benchmark for measuring visual perception alignment between AI models and humans. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)? Who funded the creation of the dataset? If there is an associated grant, please provide the name of the grantor and the grant name and number. This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No.2019-0-00075, Artificial Intelligence Graduate School Program(KAIST)) and National Research Foundation of Korea (NRF) grant (NRF2020H1D3A2A03100945), funded by the Korea government (MSIT). A.2 Composition What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? VisAlign contains eight different types of images and their corresponding gold human labels. How many instances are there in total (of each type, if appropriate)? There are a total of 12,500 images in the train set, distributed equally among the 10 classes. The open test set and the closed test set each contain 900 images: 100 images each in Categories 1 to 7 and 200 images in Category 8. Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?

artificial intelligence, dataset, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia (0.46)
North America (0.28)

Genre: Research Report (1.00)

Industry:

Law (0.67)
Information Technology (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
