Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Neural Information Processing Systems

We estimate the difficulty of individual problems by leveraging the performance data of many human subjects and LLMs on prominent leaderboards. Harnessing this rich performance data, we employ widely recognized difficulty ranking systems, including Item Response Theory (IRT) and Glicko-2 models, to uniformly assign difficulty scores to problems. The Easy2Hard datasets distinguish themselves from previous collections by incorporating a significantly higher proportion of challenging problems, presenting a novel and demanding test for state-of-the-art LLMs. Through extensive experiments with six state-of-the-art LLMs on the Easy2Hard datasets, we offer valuable insights into their performance and generalization capabilities across varying degrees of difficulty, setting the stage for future research on LLM generalization.
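
As a concrete illustration of the IRT-style scoring above, the minimal sketch below fits Rasch (1PL) difficulties from a binary solver-by-problem response matrix; the random data is a stand-in for real leaderboard outcomes, and the Glicko-2 side of the paper is not shown.

    import numpy as np

    # responses[i, j] = 1 if solver i answered problem j correctly, else 0
    # (random data here; real input would come from leaderboard logs)
    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(100, 20)).astype(float)

    n_solvers, n_items = responses.shape
    ability = np.zeros(n_solvers)      # theta_i
    difficulty = np.zeros(n_items)     # b_j

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Gradient ascent on the Rasch (1PL) log-likelihood,
    # where P(correct) = sigmoid(theta_i - b_j).
    lr = 0.05
    for _ in range(500):
        p = sigmoid(ability[:, None] - difficulty[None, :])
        grad = responses - p                 # residuals drive both updates
        ability += lr * grad.sum(axis=1)
        difficulty -= lr * grad.sum(axis=0)
        difficulty -= difficulty.mean()      # center b_j for identifiability

    print(difficulty)  # higher b_j = harder problem, one standardized scale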


A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems

Shih, Po-An, Wang, Shao-Hua, Li, Yung-Che, Tu, Chia-Heng, Chang, Chih-Han

arXiv.org Artificial Intelligence

Designing autonomous driving systems requires efficient exploration of large hardware/software configuration spaces under diverse environmental conditions, e.g., with varying traffic, weather, and road layouts. Traditional design space exploration (DSE) approaches struggle with multi-modal execution outputs and complex performance trade-offs, and often require human involvement to assess correctness based on execution outputs. This paper presents a multi-agent, large language model (LLM)-based DSE framework, which integrates multi-modal reasoning with 3D simulation and profiling tools to automate the interpretation of execution outputs and guide the exploration of system designs. Specialized LLM agents are leveraged to handle user input interpretation, design point generation, execution orchestration, and analysis of both visual and textual execution outputs, which enables identification of potential bottlenecks without human intervention. A prototype implementation is developed and evaluated on a robotaxi case study (an SAE Level 4 autonomous driving application). Compared with a genetic algorithm baseline, the proposed framework identifies more Pareto-optimal, cost-efficient solutions with reduced navigation time under the same exploration budget. Experimental results also demonstrate the efficiency of adopting the LLM-based approach for DSE. We believe that this framework paves the way toward the design automation of autonomous driving systems.
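
A hypothetical skeleton of the agent loop the abstract describes: the agent functions below are stand-ins for LLM calls and evaluate() for the 3D simulation and profiling tools, so all names and signatures are illustrative rather than the paper's actual API.

    from dataclasses import dataclass

    @dataclass
    class DesignPoint:
        config: dict            # hardware/software knobs
        cost: float = 0.0
        nav_time: float = 0.0

    def interpret_user_input(spec_text):        # user-input interpretation agent
        return {"budget": 50, "scenario": "urban, rain"}

    def generate_design_points(goal, history):  # design-point generation agent
        return [DesignPoint({"cpu_cores": 4, "gpus": 1})]

    def analyze_outputs(frames, logs):          # multi-modal analysis agent
        return {"bottleneck": "perception latency"}

    def evaluate(point, scenario):              # simulator + profiler stand-in
        return [], "", 12.3, 100.0              # frames, logs, nav_time, cost

    def explore(spec_text):
        goal, history, pareto = interpret_user_input(spec_text), [], []
        for _ in range(goal["budget"]):
            for point in generate_design_points(goal, history):
                frames, logs, point.nav_time, point.cost = evaluate(point, goal["scenario"])
                history.append((point, analyze_outputs(frames, logs)))
                # simplified Pareto filter: keep a design no kept design dominates
                if not any(p.cost <= point.cost and p.nav_time <= point.nav_time
                           for p in pareto):
                    pareto.append(point)
        return pareto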


A Control-Theoretic Approach to Dynamic Payment Routing for Success Rate Optimization

Agrawal, Aniket, Patil, Harsharanga

arXiv.org Artificial Intelligence

This paper introduces a control-theoretic framework for dynamic payment routing, implemented within JUSPAY's Payment Orchestrator to maximize transaction success rate. The routing system is modeled as a closed-loop feedback controller that continuously senses gateway [3] performance, computes corrective actions, and dynamically routes transactions across gateways to ensure operational resilience. The system leverages concepts from control theory, reinforcement learning, and multi-armed bandit optimization to achieve both short-term responsiveness and long-term stability. Rather than relying on explicit PID regulation, the framework applies generalized feedback-based adaptation, ensuring that corrective actions remain proportional to observed performance deviations and that the computed gateway score gradually converges toward the success rate [2]. This hybrid approach unifies control theory and adaptive decision systems, enabling self-regulating transaction routing that dampens instability and improves reliability. Live production results show an improvement of up to 1.15% in success rate over traditional rule-based routing, demonstrating the effectiveness of feedback-based control in payment systems.
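
A minimal sketch (not JUSPAY's implementation) of the feedback idea: each gateway keeps a score that is nudged toward its observed success rate, with the correction proportional to the deviation, and traffic is then split by an epsilon-greedy bandit rule. The gain and exploration constants are assumptions.

    import random

    scores = {"gw_a": 0.90, "gw_b": 0.90}   # per-gateway success-rate estimates
    GAIN, EPSILON = 0.1, 0.05               # feedback gain, exploration rate

    def record_outcome(gateway, success):
        # Proportional correction: the score converges toward the
        # observed success rate as outcomes accumulate.
        error = (1.0 if success else 0.0) - scores[gateway]
        scores[gateway] += GAIN * error

    def pick_gateway():
        if random.random() < EPSILON:           # occasionally explore
            return random.choice(list(scores))
        return max(scores, key=scores.get)      # otherwise exploit best score

    gw = pick_gateway()
    record_outcome(gw, success=True)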


MAB Optimizer for Estimating Math Question Difficulty via Inverse CV without NLP

Das, Surajit, Roy, Gourav, Eliseev, Aleksei, Rajendran, Ram Kumar

arXiv.org Artificial Intelligence

The evolution of technology and education is driving the emergence of Intelligent & Autonomous Tutoring Systems (IATS), where objective and domain-agnostic methods for determining question difficulty are essential. Traditional human labeling is subjective, and existing NLP-based approaches fail in symbolic domains like algebra. This study introduces the Approach of Passive Measures among Educands (APME), a reinforcement learning-based Multi-Armed Bandit (MAB) framework that estimates difficulty solely from solver performance data -- marks obtained and time taken -- without requiring linguistic features or expert labels. By leveraging the inverse coefficient of variation as a risk-adjusted metric, the model provides an explainable and scalable mechanism for adaptive assessment. Empirical validation was conducted on three heterogeneous datasets. Across these diverse contexts, the model achieved an average R^2 of 0.9213 and an average RMSE of 0.0584, confirming its robustness, accuracy, and adaptability to different educational levels and assessment formats. Compared with baseline approaches, such as regression-based, NLP-driven, and IRT models, the proposed framework consistently outperformed alternatives, particularly in purely symbolic domains. The findings highlight that (i) item heterogeneity strongly influences perceived difficulty, and (ii) variance in solver outcomes is as critical as mean performance for adaptive allocation. Pedagogically, the model aligns with Vygotsky's Zone of Proximal Development by identifying tasks that balance challenge and attainability, supporting motivation while minimizing disengagement. This domain-agnostic, self-supervised approach advances difficulty tagging in IATS and can be extended beyond algebra wherever solver interaction data is available.
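
The risk-adjusted metric itself is easy to reproduce. The sketch below computes a per-question inverse coefficient of variation (mean over standard deviation) from marks and time taken; the time normalization is an assumption, and the paper's exact formulation may differ.

    import numpy as np

    # rows: solvers, columns: questions (toy data)
    marks = np.array([[0.9, 0.4], [0.8, 0.2], [1.0, 0.5]])
    time_taken = np.array([[30.0, 120.0], [40.0, 150.0], [25.0, 90.0]])

    # Assumed combination: marks per unit of relative time spent.
    performance = marks / (time_taken / time_taken.mean(axis=0))

    mean = performance.mean(axis=0)
    std = performance.std(axis=0) + 1e-9   # guard against zero variance
    inverse_cv = mean / std                # high = consistently easy; low = hard

    difficulty_rank = np.argsort(inverse_cv)   # hardest questions first
    print(inverse_cv, difficulty_rank)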


Model Performance-Guided Evaluation Data Selection for Effective Prompt Optimization

Dong, Ximing, Wang, Shaowei, Lin, Dayi, Hassan, Ahmed E.

arXiv.org Artificial Intelligence

Optimizing Large Language Model (LLM) performance requires well-crafted prompts, but manual prompt engineering is labor-intensive and often ineffective. Automated prompt optimization techniques address this challenge, but most of them rely on randomly selected evaluation subsets, which fail to represent the full dataset, leading to unreliable evaluations and suboptimal prompts. Existing coreset selection methods, designed for LLM benchmarking, are unsuitable for prompt optimization due to challenges in clustering similar samples, high data collection costs, and the unavailability of performance data for new or private datasets. To overcome these issues, we propose IPOMP, an Iterative evaluation data selection approach for effective Prompt Optimization using real-time Model Performance. IPOMP is a two-stage approach that selects representative and diverse samples using semantic clustering and boundary analysis, followed by iterative refinement with real-time model performance data to replace redundant samples. Evaluations on the BIG-bench dataset show that IPOMP improves effectiveness by 1.6% to 5.3% and stability by at least 57% compared with SOTA baselines, with minimal computational overhead below 1%. Furthermore, the results demonstrate that our real-time performance-guided refinement approach can be universally applied to enhance existing coreset selection methods.
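
The two-stage idea can be sketched compactly: cluster sample embeddings and keep center plus boundary points, then iteratively swap out samples whose real-time performance profiles are highly correlated. The helper names, threshold, and KMeans choice below are illustrative, not IPOMP's exact procedure.

    import numpy as np
    from sklearn.cluster import KMeans

    def select_initial(embeddings, k):
        # Stage 1: semantic clustering plus boundary analysis.
        km = KMeans(n_clusters=k, n_init=10).fit(embeddings)
        picks = []
        for c in range(k):
            idx = np.where(km.labels_ == c)[0]
            d = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
            picks += [idx[d.argmin()], idx[d.argmax()]]   # center + boundary point
        return list(dict.fromkeys(picks))                  # dedupe, keep order

    def refine(selected, pool, perf, corr_thresh=0.95):
        # Stage 2: perf[i] holds the per-prompt scores observed so far for
        # sample i; highly correlated pairs are treated as redundant.
        for i in list(selected):
            for j in selected:
                if i != j and pool and np.corrcoef(perf[i], perf[j])[0, 1] > corr_thresh:
                    selected.remove(i)              # drop the redundant sample
                    selected.append(pool.pop())     # bring in a fresh candidate
                    break
        return selected

    emb = np.random.rand(200, 32)
    subset = select_initial(emb, k=5)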


JanusDDG: A Thermodynamics-Compliant Model for Sequence-Based Protein Stability via Two-Fronts Multi-Head Attention

Barducci, Guido, Rossi, Ivan, Codicè, Francesco, Rollo, Cesare, Repetto, Valeria, Pancotti, Corrado, Iannibelli, Virginia, Sanavia, Tiziana, Fariselli, Piero

arXiv.org Artificial Intelligence

Understanding how residue variations affect protein stability is crucial for designing functional proteins and deciphering the molecular mechanisms underlying disease-related mutations. Recent advances in protein language models (PLMs) have revolutionized computational protein analysis, enabling, among other things, more accurate predictions of mutational effects. In this work, we introduce JanusDDG, a deep learning framework that leverages PLM-derived embeddings and a bidirectional cross-attention transformer architecture to predict $\Delta\Delta G$ of single and multiple-residue mutations while simultaneously being constrained to respect fundamental thermodynamic properties, such as antisymmetry and transitivity. Unlike conventional self-attention, JanusDDG computes queries (Q) and values (V) as the difference between wild-type and mutant embeddings, while keys (K) alternate between the two. This cross-interleaved attention mechanism enables the model to capture mutation-induced perturbations while preserving essential contextual information. Experimental results show that JanusDDG achieves state-of-the-art performance in predicting $\Delta\Delta G$ from sequence alone, matching or exceeding the accuracy of structure-based methods for both single and multiple mutations. Code availability: https://github.com/compbiomed-unito/JanusDDG
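
A schematic PyTorch rendering of the cross-interleaved attention described above, with Q and V computed from the wild-type minus mutant embedding difference and K alternating between the two sequences; the single-head form and the dimensions are simplifications for illustration, not the published architecture.

    import torch
    import torch.nn.functional as F

    def two_fronts_attention(wt, mut, wq, wk, wv):
        # wt, mut: (seq_len, d) PLM embeddings of wild-type and mutant
        diff = wt - mut                      # mutation-induced perturbation
        q, v = diff @ wq, diff @ wv          # Q and V from the difference
        outs = []
        for k_src in (wt, mut):              # front 1: K from wild-type; front 2: K from mutant
            k = k_src @ wk
            attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
            outs.append(attn @ v)
        return torch.cat(outs, dim=-1)       # concatenate the two fronts

    d = 64
    wt, mut = torch.randn(10, d), torch.randn(10, d)
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    print(two_fronts_attention(wt, mut, wq, wk, wv).shape)  # (10, 128)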


Generative Data Imputation for Sparse Learner Performance Data Using Generative Adversarial Imputation Networks

Zhang, Liang, Lin, Jionghao, Sabatini, John, Zapata-Rivera, Diego, Forsyth, Carol, Jiang, Yang, Hollander, John, Hu, Xiangen, Graesser, Arthur C.

arXiv.org Artificial Intelligence

Advancements in AI-driven technologies have significantly enhanced modern education through personalized tutoring and adaptive learning strategies on online platforms [1], [2]. Intelligent Tutoring Systems (ITSs) exemplify this progress by leveraging advanced machine learning and natural language processing models to create interactive learning environments that improve outcomes across domains like literacy [3], mathematics [4], language learning [5], biology [6], and other STEM fields [7]. As human learners interact with ITSs, often through question-and-answer scenarios with immediate responses, their performance data becomes crucial for learner modeling, enabling systems to track progress, predict future performance, and adapt instruction accordingly [8]. Learner models like Bayesian Knowledge Tracing (BKT) and other knowledge tracing variants utilize the learner performance data to uncover learning characteristics and estimate knowledge states and acquisition [9]. However, in real-world scenarios, missing learner performance data is prevalent due to factors such as learner dropout or disengagement [10], technical issues or incomplete data logging [11], biased sampling within experimental groups [12], and more. These challenges often lead to sparse data, where items (i.e., questions or problems) remain unattempted (e.g., learners may bypass the question, leave it unanswered due to a lack of response initiation, or make no attempt to engage with it), alongside limited learner interactions [13], [14]. As shown in Figure 1, missing performance records can occur along both the attempt and question dimensions during learner-ITS interactions; in the right portion of the figure's two matrices, entries marked with "?" denote missing records.
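
A minimal GAIN-style sketch in PyTorch (after Yoon et al.'s GAIN, which the title references): the generator fills in masked entries of a learner-performance matrix and the discriminator tries to tell observed cells from imputed ones. Layer sizes, the simplified hint vector, and the loss weighting are illustrative.

    import torch
    import torch.nn as nn

    d = 20                                     # questions per learner row
    G = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d), nn.Sigmoid())
    D = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d), nn.Sigmoid())
    opt_g, opt_d = (torch.optim.Adam(m.parameters(), 1e-3) for m in (G, D))

    x = torch.rand(32, d)                      # observed correctness values
    mask = (torch.rand(32, d) > 0.8).float()   # 1 = observed, 0 = missing (~80% sparse)
    hint = mask * (torch.rand(32, d) > 0.1).float()   # simplified hint vector

    for _ in range(100):
        noise = torch.rand(32, d)
        x_in = mask * x + (1 - mask) * noise
        imputed = G(torch.cat([x_in, mask], 1))
        x_hat = mask * x + (1 - mask) * imputed        # observed kept, gaps filled

        # Discriminator: predict which cells were actually observed.
        d_prob = D(torch.cat([x_hat.detach(), hint], 1))
        loss_d = nn.functional.binary_cross_entropy(d_prob, mask)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator: fool D on missing cells + reconstruct observed cells.
        d_prob = D(torch.cat([x_hat, hint], 1))
        loss_g = -torch.log(d_prob + 1e-8)[(1 - mask).bool()].mean() \
                 + 10 * ((mask * (imputed - x)) ** 2).mean()
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()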


Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI

Zhang, Liang, Lin, Jionghao, Sabatini, John, Borchers, Conrad, Weitekamp, Daniel, Cao, Meng, Hollander, John, Hu, Xiangen, Graesser, Arthur C.

arXiv.org Artificial Intelligence

Learning performance data describe correct and incorrect answers or problem-solving attempts in adaptive learning, such as in intelligent tutoring systems (ITSs). Learning performance data tend to be highly sparse (80% to 90% missing observations) in most real-world applications due to adaptive item selection. This data sparsity presents challenges to using learner models to effectively predict future performance and explore new hypotheses about learning. This article proposes a systematic framework for augmenting learner data to address data sparsity in learning performance data. First, learning performance is represented as a three-dimensional tensor of learners' questions, answers, and attempts, capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation on knowledge tracing tasks that predict missing performance values based on real observations. Third, a module for generating simulated patterns of learning is used. This study contrasts two forms of generative Artificial Intelligence (AI), Generative Adversarial Networks (GANs) and Generative Pre-Trained Transformers (GPT), to generate data associated with different clusters of learner data. We tested this approach on an adult literacy dataset from AutoTutor lessons developed for Adult Reading Comprehension (ARC). We found that: (1) tensor factorization improved the performance in tracing and predicting knowledge mastery compared with other knowledge tracing techniques without data augmentation, showing higher relative fidelity for this imputation method, and (2) the GAN-based simulation showed greater overall stability and less statistical bias, based on a divergence evaluation with varying simulation sample sizes, compared to GPT.
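
The second step can be sketched as a CP-style factorization fit only on observed tensor entries, so that the reconstruction imputes the missing ones; the rank, learning rate, and random data below are illustrative, not the paper's settings.

    import numpy as np

    rng = np.random.default_rng(0)
    L, Q, A, R = 50, 20, 5, 4                 # learners, questions, attempts, rank
    tensor = rng.random((L, Q, A))            # stand-in for real performance data
    observed = rng.random((L, Q, A)) < 0.15   # ~85% missing, as in sparse ITS logs

    U, V, W = (0.1 * rng.standard_normal(s) for s in ((L, R), (Q, R), (A, R)))
    lr = 0.05
    for _ in range(300):
        recon = np.einsum('lr,qr,ar->lqa', U, V, W)
        err = np.where(observed, recon - tensor, 0.0)   # gradient only on observed cells
        U -= lr * np.einsum('lqa,qr,ar->lr', err, V, W)
        V -= lr * np.einsum('lqa,lr,ar->qr', err, U, W)
        W -= lr * np.einsum('lqa,lr,qr->ar', err, U, V)

    imputed = np.einsum('lr,qr,ar->lqa', U, V, W)       # predictions for missing cells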


Optimizing Fantasy Sports Team Selection with Deep Reinforcement Learning

Bhattacharjee, Shamik, Marathe, Kamlesh, Kapoor, Hitesh, Patil, Nilesh

arXiv.org Artificial Intelligence

Fantasy sports, particularly fantasy cricket, have garnered immense popularity in India in recent years, offering enthusiasts the opportunity to engage in strategic team-building and compete based on the real-world performance of professional athletes. In this paper, we address the challenge of optimizing fantasy cricket team selection using reinforcement learning (RL) techniques. By framing the team creation process as a sequential decision-making problem, we aim to develop a model that can adaptively select players to maximize the team's potential performance. Our approach leverages historical player data to train RL algorithms, which then predict future performance and optimize team composition. This not only represents a huge business opportunity by enabling more accurate predictions of high-performing teams but also enhances the overall user experience. Through empirical evaluation and comparison with traditional fantasy team drafting methods, we demonstrate the effectiveness of RL in constructing competitive fantasy teams. Our results show that RL-based strategies provide valuable insights into player selection in fantasy sports.
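
As a toy illustration of the sequential framing, the sketch below learns a per-slot value table with an epsilon-greedy update and then reads a squad off it greedily; the reward (historical fantasy points) and all sizes are stand-ins, not the paper's model.

    import numpy as np

    rng = np.random.default_rng(1)
    n_players, squad_size = 30, 11
    hist_points = rng.random(n_players) * 100   # stand-in for historical player data

    Q = np.zeros((squad_size, n_players))       # value of picking player p at slot s
    alpha, eps = 0.1, 0.2
    for _ in range(2000):                       # training episodes
        chosen = []
        for slot in range(squad_size):
            avail = [p for p in range(n_players) if p not in chosen]
            if rng.random() < eps:
                p = int(rng.choice(avail))                 # explore
            else:
                p = max(avail, key=lambda i: Q[slot, i])   # exploit
            reward = hist_points[p]                        # simplistic reward signal
            Q[slot, p] += alpha * (reward - Q[slot, p])
            chosen.append(p)

    # Greedy rollout: build the final team slot by slot from the learned table.
    team = []
    for slot in range(squad_size):
        avail = [p for p in range(n_players) if p not in team]
        team.append(max(avail, key=lambda p: Q[slot, p]))
    print(team)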