AITopics | statistically significant improvement

Collaborating Authors

statistically significant improvement

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Alchemist: Turning Public Text-to-Image Data into Generative Gold

Neural Information Processing SystemsJun-15-2026, 13:14:56 GMT

Pre-training equips text-to-image (T2I) models with broad world knowledge, but this alone is often insufficient to achieve high aesthetic quality and alignment. Consequently, supervised fine-tuning (SFT) is crucial for further refinement. However, its effectiveness highly depends on the quality of the fine-tuning dataset. Existing public SFT datasets frequently target narrow domains (e.g., anime or specific art styles), and the creation of high-quality, general-purpose SFT datasets remains a significant challenge. Current curation methods are often costly and struggle to identify truly impactful samples.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Transportation (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Add feedback

39d0a8908fbe6c18039ea8227f827023-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 12:23:56 GMT

artificial intelligence, participant, sketch, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.47)

Industry: Leisure & Entertainment > Games (0.47)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

39d0a8908fbe6c18039ea8227f827023-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 06:37:52 GMT

agent, participant, sketch, (15 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.47)

Industry: Leisure & Entertainment > Games (0.47)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

Hyperellipsoid Density Sampling: Exploitative Sequences to Accelerate High-Dimensional Optimization

Soltes, Julian

arXiv.org Artificial IntelligenceNov-18-2025

The curse of dimensionality presents a pervasive challenge in optimization problems, with exponential expansion of the search space rapidly causing traditional algorithms to become inefficient or infeasible. An adaptive sampling strategy is presented to accelerate optimization in this domain as an alternative to uniform quasi-Monte Carlo (QMC) methods. This method, referred to as Hyperellipsoid Density Sampling (HDS), generates its sequences by defining multiple hyperellipsoids throughout the search space. HDS uses three types of unsupervised learning algorithms to circumvent high-dimensional geometric calculations, producing an intelligent, non-uniform sample sequence that exploits statistically promising regions of the parameter space and improves final solution quality in high-dimensional optimization problems. A key feature of the method is optional Gaussian weights, which may be provided to influence the sample distribution towards known locations of interest. This capability makes HDS versatile for applications beyond optimization, providing a focused, denser sample distribution where models need to concentrate their efforts on specific, non-uniform regions of the parameter space. The method was evaluated against Sobol, a standard QMC method, using differential evolution (DE) on the 29 CEC2017 benchmark test functions. The results show statistically significant improvements in solution geometric mean error (p < 0.05), with average performance gains ranging from 3% in 30D to 37% in 10D. This paper demonstrates the efficacy of HDS as a robust alternative to QMC sampling for high-dimensional optimization.

artificial intelligence, dimension, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2511.07836

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Bias-Aware Mislabeling Detection via Decoupled Confident Learning

Li, Yunyi, De-Arteaga, Maria, Saar-Tsechansky, Maytal

arXiv.org Artificial IntelligenceJul-15-2025

Reliable data is a cornerstone of modern organizational systems. A notable data integrity challenge stems from label bias, which refers to systematic errors in a label, a covariate that is central to a quantitative analysis, such that its quality differs across social groups. This type of bias has been conceptually and empirically explored and is widely recognized as a pressing issue across critical domains. However, effective methodologies for addressing it remain scarce. In this work, we propose Decoupled Confident Learning (DeCoLe), a principled machine learning based framework specifically designed to detect mislabeled instances in datasets affected by label bias, enabling bias aware mislabelling detection and facilitating data quality improvement. We theoretically justify the effectiveness of DeCoLe and evaluate its performance in the impactful context of hate speech detection, a domain where label bias is a well documented challenge. Empirical results demonstrate that DeCoLe excels at bias aware mislabeling detection, consistently outperforming alternative approaches for label error detection. Our work identifies and addresses the challenge of bias aware mislabeling detection and offers guidance on how DeCoLe can be integrated into organizational data management practices as a powerful tool to enhance data reliability.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2507.07216

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.67)
Information Technology > Security & Privacy (0.45)
Health & Medicine > Government Relations & Public Policy (0.45)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

The Limited Impact of Medical Adaptation of Large Language and Vision-Language Models

Jeong, Daniel P., Mani, Pranav, Garg, Saurabh, Lipton, Zachary C., Oberst, Michael

arXiv.org Artificial IntelligenceNov-13-2024

Several recent works seek to develop foundation models specifically for medical applications, adapting general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on publicly available biomedical corpora. These works typically claim that such domain-adaptive pretraining (DAPT) improves performance on downstream medical tasks, such as answering medical licensing exam questions. In this paper, we compare ten public "medical" LLMs and two VLMs against their corresponding base models, arriving at a different conclusion: all medical VLMs and nearly all medical LLMs fail to consistently improve over their base models in the zero-/few-shot prompting and supervised fine-tuning regimes for medical question-answering (QA). For instance, across all tasks and model pairs we consider in the 3-shot setting, medical LLMs only outperform their base models in 22.7% of cases, reach a (statistical) tie in 36.8% of cases, and are significantly worse than their base models in the remaining 40.5% of cases. Our conclusions are based on (i) comparing each medical model head-to-head, directly against the corresponding base model; (ii) optimizing the prompts for each model separately in zero-/few-shot prompting; and (iii) accounting for statistical uncertainty in comparisons. While these basic practices are not consistently adopted in the literature, our ablations show that they substantially impact conclusions. Meanwhile, we find that after fine-tuning on specific QA tasks, medical LLMs can show performance improvements, but the benefits do not carry over to tasks based on clinical notes. Our findings suggest that state-of-the-art general-domain models may already exhibit strong medical knowledge and reasoning capabilities, and offer recommendations to strengthen the conclusions of future studies.

dataset, prompt format, qa dataset, (13 more...)

arXiv.org Artificial Intelligence

2411.0887

Country:

Asia > Middle East > UAE (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Health & Medicine > Health Care Providers & Services (0.67)
Health & Medicine > Health Care Technology > Medical Record (0.50)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Learning Curricula in Open-Ended Worlds

Jiang, Minqi

arXiv.org Artificial IntelligenceDec-7-2023

Deep reinforcement learning (RL) provides powerful methods for training optimal sequential decision-making agents. As collecting real-world interactions can entail additional costs and safety risks, the common paradigm of sim2real conducts training in a simulator, followed by real-world deployment. Unfortunately, RL agents easily overfit to the choice of simulated training environments, and worse still, learning ends when the agent masters the specific set of simulated environments. In contrast, the real world is highly open-ended, featuring endlessly evolving environments and challenges, making such RL approaches unsuitable. Simply randomizing over simulated environments is insufficient, as it requires making arbitrary distributional assumptions and can be combinatorially less likely to sample specific environment instances that are useful for learning. An ideal learning process should automatically adapt the training environment to maximize the learning potential of the agent over an open-ended task space that matches or surpasses the complexity of the real world. This thesis develops a class of methods called Unsupervised Environment Design (UED), which aim to produce such open-ended processes. Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments at the frontier of the learning agent's capabilities. Through extensive empirical studies and theoretical arguments founded on minimax-regret decision theory and game theory, the findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness and generalization to previously unseen environment instances. Such autocurricula are promising paths toward open-ended learning systems that achieve more general intelligence by continually generating and mastering additional challenges of their own design.

intelligence and interactive digital entertainment, proximal policy optimization algorithm, statistically significant improvement, (12 more...)

arXiv.org Artificial Intelligence

2312.03126

Country:

North America > United States > New York > New York County > New York City (0.27)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.13)
North America > United States > California > San Francisco County > San Francisco (0.13)
(41 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Sports > Motorsports > Formula One (1.00)
Energy (0.92)
Leisure & Entertainment > Games > Computer Games (0.92)
Education > Curriculum (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

Add feedback

A Gentle Introduction to Random Forests, Ensembles, and Performance Metrics in a Commercial System

#artificialintelligenceNov-19-2022, 15:35:40 GMT

This is the first in a series of posts that illustrate what our data team is up to, experimenting with, and building'under the hood' at CitizenNet. He has been involved in web-scale machine learning and information retrieval for over 10 years. One of the first posts we published spoke at a high level of the technical problem CitizenNet is trying to solve. In essence, we are trying to predict what combinations of demographic and interest targets will be interested in some piece of content. On the CitizenNet platform, a user would create a project that would define (broadly) the target audience, the pieces of Facebook content they are looking to promote, and other campaign and financial information. Behind the scenes, a robust prediction system builds the targets for the project.

classifier, high ctr, precision, (15 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.31)
Research Report > Experimental Study (0.30)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.48)

Add feedback

Few-shot training LLMs for project-specific code-summarization

Ahmed, Toufique, Devanbu, Premkumar

arXiv.org Artificial IntelligenceSep-8-2022

Very large language models (LLMs), such as GPT-3 and Codex have achieved state-of-the-art performance on several natural-language tasks, and show great promise also for code. A particularly exciting aspect of LLMs is their knack for few-shot and zero-shot learning: they can learn to perform a task with very few examples. Few-shotting has particular synergies in software engineering, where there are a lot of phenomena (identifier names, APIs, terminology, coding patterns) that are known to be highly project-specific. However, project-specific data can be quite limited, especially early in the history of a project; thus the few-shot learning capacity of LLMs might be very relevant. In this paper, we investigate the use few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code-summarization, leveraging project-specific training.

code summarization, codex, few-shot training, (11 more...)

arXiv.org Artificial Intelligence

2207.04237

Country:

North America > United States > California > Yolo County > Davis (0.14)
North America > United States > Michigan > Oakland County > Rochester (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

Representation Learning for Efficient and Effective Similarity Search and Recommendation

Hansen, Casper

arXiv.org Artificial IntelligenceSep-4-2021

How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors, denoted \textit{hash codes}, which require little storage and enable efficient similarity search through direct indexing into a hash table or through similarity computations in an appropriate space. Due to the limited expressibility of hash codes, compared to real-valued representations, a core open challenge is how to generate hash codes that well capture semantic content or latent properties using a small number of bits, while ensuring that the hash codes are distributed in a way that does not reduce their search efficiency. State of the art methods use representation learning for generating such hash codes, focusing on neural autoencoder architectures where semantics are encoded into the hash codes by learning to reconstruct the original inputs of the hash codes. This thesis addresses the above challenge and makes a number of contributions to representation learning that (i) improve effectiveness of hash codes through more expressive representations and a more effective similarity measure than the current state of the art, namely the Hamming distance, and (ii) improve efficiency of hash codes by learning representations that are especially suited to the choice of search method. The contributions are empirically validated on several tasks related to similarity search and recommendation.

learning representation, unsupervised neural generative semantic hashing, unsupervised semantic hashing, (15 more...)

arXiv.org Artificial Intelligence

2109.01815

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
Europe > Denmark > Capital Region > Copenhagen (0.05)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
(12 more...)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
(5 more...)

Add feedback