AITopics | proportion

Collaborating Authors

proportion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FRESH: Information-Geometric Calibration of Patient-Level Models to Aggregate Evidence

Fuller, Franklin, Bertolini, Daniele, Liang, Samantha, Christopher, Jason, Smith, Aaron M.

arXiv.org Machine LearningMay-18-2026

Many decision in clinical science and epidemiology -- estimating probability of technical success for a clinical trial, assessing comparative effectiveness of two therapies, imputing a placebo effect onto natural history data -- rely on combining sources of information about a clinical cohort that comes from different kinds of studies. Specifically we contrast patient-level sources that provide granular pictures of individual disease course (clinical trial, registries, or electronic health records) with aggregate sources such as published clinical trial results and the TFLs (tables figures and listings). One strategy for combining aggregate with patient-level data sources is to bring each into a common format for a unified analysis. If one wants to maintain the analytic flexibility of patient-level data, then a natural solution is to convert the aggregate data information into a simulated patient-level dataset that recapitulate those aggregate statistics. This is an under-determined inverse problem in that there are many such datasets, and it cannot be well specified without further constraints. FRESH(Fusion of Recent Evidence with Subject Histories) provides a well-defined method for solving this problem, and therefore providing maximal analytic flexibility.

artificial intelligence, constraint, machine learning, (18 more...)

arXiv.org Machine Learning

2605.16246

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Imbalanced Classification under Capacity Constraints

Fraiman, Daniel, Fraiman, Ricardo

arXiv.org Machine LearningMay-6-2026

In many classification settings, the class of primary interest is underrepresented, leading to imbalanced data problems that arise in applications such as rare disease detection and fraud identification. In these contexts, identifying a potential positive instance typically triggers costly follow-up actions, such as medical imaging or detailed transaction inspection, which are subject to limited operational capacity. Motivated by this setting, we consider classification problems where data may arrive sequentially and decisions must be made under constraints on the number of instances that can be selected for further analysis. We propose a classification framework that explicitly controls the rate of positive predictions, enforcing a user-defined bound on the proportion of observations classified as belonging to the minority class while maximizing detection performance. The approach can be implemented using standard learning methods and naturally extends to online settings, where decisions are taken in real time. We show that incorporating capacity constraints leads to substantial improvements over classical approaches, including resampling techniques such as SMOTE, which do not directly control the selection rate.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

2605.03289

Country: South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.40)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Add feedback

AMDP: An Adaptive Detection Procedure for False Discovery Rate Control in High-Dimensional Mediation Analysis

Neural Information Processing SystemsMay-1-2026, 04:46:39 GMT

High-dimensional mediation analysis is often associated with a multiple testing problem for detecting significant mediators. Assessing the uncertainty of this detecting process via false discovery rate (FDR) has garnered great interest. To control the FDR in multiple testing, two essential steps are involved: ranking and selection. Existing approaches either construct p-values without calibration or disregard the joint information across tests, leading to conservation in FDR control or non-optimal ranking rules for multiple hypotheses. In this paper, we develop an adaptive mediation detection procedure (referred to as "AMDP") to identify relevant mediators while asymptotically controlling the FDR in high-dimensional mediation analysis. AMDP produces the optimal rule for ranking hypotheses and proposes a data-driven strategy to determine the threshold for mediator selection. This novel method captures information from the proportions of composite null hypotheses and the distribution of p-values, which turns the high dimensionality into an advantage instead of a limitation. The numerical studies on synthetic and real data sets illustrate the performances of AMDP compared with existing approaches.

artificial intelligence, dact, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.90)

Industry:

Law > Alternative Dispute Resolution (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

f499387f191d6be56e68966181095878-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 07:46:48 GMT

artificial intelligence, decoder, machine learning, (18 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

e2cfb719f58585f779d0a4f9f07bd618-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-30-2026, 02:17:07 GMT

A.1 Creation of the Multimodal Web Document Dataset A.1.1 Collecting of a Large Number of HTMLFiles Our data collection process begins by considering the 25 most recent Common Crawl6 dumps available at the time of dataset creation. It contains webpages spanning from February 2020 to January/February 2023. We use a modified version of readability-lxml7 to extract the main text from the pages, discarding any pages that contain text of excessively high perplexity. This process yields a total of 41.2 billion documents. Selection of English content To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, e ectively filtering out 63.6% of the documents. Early text deduplication Often, a set of URLs is crawled repeatedly across di erent Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Africa (1.00)
North America > Canada (0.93)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Sports > Martial Arts (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(14 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Mobile (0.68)
(2 more...)

Add feedback

Mixed Membership sub-Gaussian Models

Qing, Huan

arXiv.org Machine LearningApr-27-2026

The Gaussian mixture model is widely used in unsupervised learning, owing to its simplicity and interpretability. However, a fundamental limitation of the classical Gaussian mixture model is that it forces each observation to belong to exactly one component. In many practical applications, such as genetics, social network analysis, and text mining, an observation may naturally belong to multiple components or exhibit partial membership in several latent components. To overcome this limitation, we propose the mixed membership sub-Gaussian model, which extends the classical Gaussian mixture framework by allowing each observation to belong to multiple components. This model inherits the interpretability of the classical Gaussian mixture model while offering greater flexibility for capturing complex overlapping structures. We develop an efficient spectral algorithm to estimate the mixed membership of each individual observation, and under mild separation conditions on the component centres, we prove that the estimation error of the per-individual membership vector can be made arbitrarily small with high probability. To our knowledge, this is the first work to provide a computationally efficient estimator with such a vanishing-error guarantee for a mixed-membership extension of the Gaussian mixture model. Extensive experimental studies demonstrate that our method outperforms existing approaches that ignore mixed memberships.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2604.22633

Country: Asia (0.28)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

5446f217e9504bc593ad9dcf2ec88dda-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 23:03:32 GMT

Python notebooks producing the figures of this paper are available at https://github.com/ Let F be the joint distribution on RTD obtained by first assigning a random vector =( 1| | T) via F and then applying the map to each of the components t, and let F 1,..., F T denote the corresponding marginal distributions on RD. Given F and t F t, define the second moment matrices = E[ >] 2 RTD TD and t = E[ t >t ] 2 RD D, and let r = rank() and rt = rank( t). Let M 2 Rr TD be a matrix whose rows form a basis of supp(F), and similarly let Nt 2 Rrt D be a matrix whose rows form a basis of supp(F t). Let N = diag(N1,..., NT), and define the Writing V =( V1| | VT), where Vt 2 Rrt d has rank dr, we can then construct the singular value decompositions Vt = Ut tW>t, with Ut 2 O(rt dt), t 2 Rdt dt and Wt 2 O(d dt), where dt = rank(Vt).

artificial intelligence, machine learning, matrix, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

47b4f1bfdf6d298682e610ad74b37dca-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 17:13:30 GMT

classifier, data mining, machine learning, (22 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Degraded Polygons Raise Fundamental Questions of Neural Network Perception

Neural Information Processing SystemsApr-25-2026, 16:56:53 GMT

It is well-known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters. To this end, we implement the Automated Shape Recoverability Test1 for rapidly generating large-scale datasets of perimeter-degraded regular polygons, modernizing the historically manual creation of image recoverability experiments. We then investigate the capacity of neural networks to recognize and recover such degraded shapes when initialized with different priors. Ultimately, we find that neural networks' behavior on this simple task conflicts with human behavior, raising a fundamental question of the robustness and learning capabilities of modern computer vision models.

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report (0.46)

Industry: Information Technology (0.54)

Technology: