AITopics

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > New York > Erie County > Amherst (0.04)
Asia > India > West Bengal > Kharagpur (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

Overlapping Clustering Models, and One (class) SVM to Bind Them All

Neural Information Processing Systems Feb-13-2026, 05:55:14 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, matrix, vector, (17 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

Overlapping Clustering Models, and One (class) SVM to Bind Them All

Neural Information Processing Systems Nov-20-2025, 17:24:12 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, bayesian inference, machine learning, (20 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

arXiv.org Artificial Intelligence Oct-7-2025

WAREX: Web Agent Reliability Evaluation on Existing Benchmarks

Kara, Su, Faisal, Fazle, Nath, Suman

Recent advances in browser-based LLM agents have shown promise for automating tasks ranging from simple form filling to hotel booking or online shopping. Current benchmarks measure agent performance in controlled environments, such as containers or stable networks, where websites behave deterministically. However, in the real world, users access websites over networks and HTTPS connections that introduce instability from multiple sources: client-side issues, server-side issues, or broader system failures. Moreover, live websites are prone to web attacks such as Cross-Site Scripting, as well as general site modifications, which can cause unexpected or malicious pop-ups or improper functionality. Our experiments show that introducing WAREX leads to significant drops in task success rates, highlighting the limited robustness of state-of-the-art agents. Web agents are leaving the lab and entering the wild, but benchmarks give a false sense of reliability. Web agents have emerged as a promising paradigm for automating complex online tasks, attracting significant attention across academia and industry. Recent advances have produced state-of-the-art web agents with diverse designs, ranging from variations in prompting and observation spaces to reinforcement learning-based action policies. Notable examples include SteP (Sodhi et al., 2024), WebNaviX (Shlomov et al., 2024), Agent Q (Putta et al., 2024), and GUI-Owl (Ye et al., 2025), among myriad others. Large technology companies have also begun deploying production-grade agents, such as OpenAI (2025), Perplexity (2025), and TinyFish (2025).

large language model, machine learning, natural language, (18 more...)

2510.03285

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial Intelligence Jun-25-2025

MedErr-CT: A Visual Question Answering Benchmark for Identifying and Correcting Errors in CT Reports

Kyung, Sunggu, Park, Hyungbin, Seo, Jinyoung, Sung, Jimin, Kim, Jihyun, Kim, Dongyeong, Jo, Wooyoung, Nam, Yoojin, Park, Sangah, Kwon, Taehee, Lee, Sang Min, Kim, Namkug

Computed Tomography (CT) plays a crucial role in clinical diagnosis, but the growing demand for CT examinations has raised concerns about diagnostic errors. While Multimodal Large Language Models (MLLMs) demonstrate promising comprehension of medical knowledge, their tendency to produce inaccurate information highlights the need for rigorous validation. However, existing medical visual question answering (VQA) benchmarks primarily focus on simple visual recognition tasks, lacking clinical relevance and failing to assess expert-level knowledge. We introduce MedErr-CT, a novel benchmark for evaluating medical MLLMs' ability to identify and correct errors in CT reports through a VQA framework. The benchmark includes six error categories--four vision-centric errors (Omission, Insertion, Direction, Size) and two lexical error types (Unit, Typo)--and is organized into three task levels: classification, detection, and correction. Using this benchmark, we quantitatively assess the performance of state-of-the-art 3D medical MLLMs, revealing substantial variation in their capabilities across different error types. Our benchmark contributes to the development of more reliable and clinically applicable MLLMs, ultimately helping reduce diagnostic errors and improve accuracy in clinical practice.

artificial intelligence, large language model, natural language, (16 more...)

2506.19217

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Nuclear Medicine (0.96)
Health & Medicine > Health Care Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Kim, Taehoon, Gouk, Henry, Kim, Minyoung, Hospedales, Timothy

Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning

arXiv.org Artificial Intelligence May-22-2025

Certifying the IID generalisation ability of deep networks is the first of many requirements for trusting AI in high-stakes applications from medicine to security. However, when instantiating generalisation bounds for deep networks it remains challenging to obtain non-vacuous guarantees, especially when applying contemporary large models to the small-scale data prevalent in such high-stakes fields. In this paper, we draw a novel connection between a family of learning methods based on model fusion and generalisation certificates, and surprisingly show that with minor adjustment several existing learning strategies already provide non-trivial generalisation guarantees. Essentially, by focusing on data-driven learning of downstream tasks by fusion rather than fine-tuning, the certified generalisation gap becomes tiny and independent of the base network size, facilitating its certification. Our results show for the first time non-trivial generalisation guarantees for learning with as few as 100 examples, while using vision models such as ViT-B and language models such as Mistral-7B. This observation is significant as it has immediate implications for facilitating the certification of existing systems as trustworthy, and opens up new directions for research at the intersection of practice and theory.

large language model, machine learning, natural language, (19 more...)

2505.15798

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Mar-6-2025

Randomized based restricted kernel machine for hyperspectral image classification

Quadir, A., Tanveer, M.

In recent years, the random vector functional link (RVFL) network has gained significant popularity in hyperspectral image (HSI) classification due to its simplicity, speed, and strong generalization performance. However, despite these advantages, RVFL models face several limitations, particularly in handling non-linear relationships and complex data structures. The random initialization of input-to-hidden weights can lead to instability, and the model struggles with determining the optimal number of hidden nodes, affecting its performance on more challenging datasets. To address these issues, we propose a novel randomized based restricted kernel machine ($R^2KM$) model that combines the strengths of RVFL and restricted kernel machines (RKM). $R^2KM$ introduces a layered structure that represents kernel methods using both visible and hidden variables, analogous to the energy function in restricted Boltzmann machines (RBM). This structure enables $R^2KM$ to capture complex data interactions and non-linear relationships more effectively, improving both interpretability and model robustness. A key contribution of $R^2KM$ is the introduction of a novel conjugate feature duality based on the Fenchel-Young inequality, which expresses the problem in terms of conjugate dual variables and provides an upper bound on the objective function. This duality enhances the model's flexibility and scalability, offering a more efficient solution for complex data analysis tasks. Extensive experiments on hyperspectral image datasets and real-world data from the UCI and KEEL repositories show that $R^2KM$ outperforms baseline models, demonstrating its effectiveness in classification and regression tasks.
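
For readers unfamiliar with the Fenchel-Young inequality mentioned in the abstract above, the following is a minimal sketch of its general statement and of the quadratic instance commonly used in the restricted kernel machine literature. The symbols $f$, $e$, $h$, and $\lambda$ are illustrative assumptions and are not taken from the abstract; whether $R^2KM$ uses exactly this instance is not stated here.

```latex
% General Fenchel-Young inequality for a convex function f with conjugate f^*:
%   f^*(h) = \sup_{e} \left( e^\top h - f(e) \right)
\[
  f(e) + f^{*}(h) \;\ge\; e^\top h .
\]
% Quadratic instance (assumed for illustration), with \lambda > 0:
%   f(e) = \frac{1}{2\lambda} e^\top e
%   \quad\Longrightarrow\quad
%   f^{*}(h) = \frac{\lambda}{2} h^\top h ,
% since the supremum over e is attained at e = \lambda h.
\[
  \frac{1}{2\lambda}\, e^\top e + \frac{\lambda}{2}\, h^\top h \;\ge\; e^\top h ,
  \qquad \text{with equality if and only if } e = \lambda h .
\]
```

Here $h$ plays the role of the conjugate dual (hidden) variable paired with the primal quantity $e$, which is how such an inequality lets a problem be rewritten in terms of dual variables.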

baseline model, dataset, km model, (13 more...)

2503.05837

Country:

North America > United States > California (0.04)
North America > United States > Indiana (0.04)
North America > United States > Florida > Brevard County (0.04)
Asia > India > Madhya Pradesh (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

arXiv.org Artificial Intelligence Feb-13-2025

TRKM: Twin Restricted Kernel Machines for Classification and Regression

Quadir, A., Tanveer, M.

Restricted kernel machines (RKMs) have considerably improved generalization in machine learning. Recent advancements explored various techniques within the RKM framework, integrating kernel functions with least squares support vector machines (LSSVM) to mirror the energy function of restricted Boltzmann machines (RBM), leading to enhanced performance. However, RKMs may face challenges in generalization when dealing with unevenly distributed or complexly clustered data. Additionally, as the dataset size increases, the computational burden of managing high-dimensional feature spaces can become substantial, potentially hindering performance on large-scale datasets. To address these challenges, we propose the twin restricted kernel machine (TRKM). TRKM combines the benefits of twin models with the robustness of the RKM framework to enhance classification and regression tasks. By leveraging the Fenchel-Young inequality, we introduce a novel conjugate feature duality, allowing the formulation of classification and regression problems in terms of dual variables. This duality provides an upper bound to the objective function of the TRKM problem, resulting in a new methodology under the RKM framework. The model uses an energy function similar to that of RBM, incorporating both visible and hidden variables corresponding to both classes. Additionally, the kernel trick is employed to map data into a high-dimensional feature space, where the model identifies an optimal separating hyperplane using a regularized least squares approach. Experiments on UCI and KEEL datasets confirm TRKM's superiority over baselines, showcasing its robustness and efficiency in handling complex data. Furthermore, we implemented the TRKM model on the brain age dataset, demonstrating its efficacy in predicting brain age.

baseline model, dataset, trkm-r model, (16 more...)

2502.15759

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > India > Madhya Pradesh (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Neural Information Processing Systems Feb-9-2025, 08:37:09 GMT

Multi-Class Deep Boosting

Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multiclass classification setting expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble's mixture weights.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)

Hou, Xiaotian, Zhang, Linjun

Finite-Sample and Distribution-Free Fair Classification: Optimal Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness

arXiv.org Machine Learning Nov-6-2024

Algorithmic fairness in machine learning has recently garnered significant attention. However, two pressing challenges remain: (1) The fairness guarantees of existing fair classification methods often rely on specific data distribution assumptions and large sample sizes, which can lead to fairness violations when the sample size is moderate, a common situation in practice. (2) Due to legal and societal considerations, using sensitive group attributes during decision-making may not always be feasible (the setting in which they are unavailable is referred to as the group-blind setting). In this work, we quantify the impact of enforcing algorithmic fairness and group-blindness in binary classification under group fairness constraints. Specifically, we propose a unified framework for fair classification that provides distribution-free and finite-sample fairness guarantees with controlled excess risk. This framework is applicable to various group fairness notions in both group-aware and group-blind scenarios. Furthermore, we establish a minimax lower bound on the excess risk, showing the minimax optimality of our proposed algorithm up to logarithmic factors. Through extensive simulation studies and real data analysis, we further demonstrate the superior performance of our algorithm compared to existing methods, and provide empirical support for our theoretical findings.

classifier, excess risk, unfairness measure, (15 more...)

2410.16477

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > California (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report (0.81)

Industry: Law (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)