AITopics | irr

Collaborating Authors

irr

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Recursive Dynamics in Fast-Weights Homeostatic Reentry Networks: Toward Reflective Intelligence

Chae, B. G.

arXiv.org Artificial IntelligenceNov-11-2025

This study introduces the Fast-Weights Homeostatic Reentry Layer (FH-RL), a neural mechanism that integrates fast-weight associative memory, homeostatic regularization, and learned reentrant feedback to approximate self-referential computation in neural networks. Unlike standard transformer architectures that operate in a purely feedforward manner during inference, FH-RL enables internal recurrence without external looping, allowing prior latent states to be dynamically re-entered into the ongoing computation stream. We conduct controlled experiments sweeping the reentry gain $γ$ and evaluate emergent internal dynamics using three novel metrics: the Information Reentry Ratio (IRR), Eigen-Spectrum Recursion Index (ESRI), and Representational Drift Periodicity (RDP). Results show that reentry quantity increases proportionally with~$γ$, while the learned feedback matrix $W_r$ remains bounded and becomes more structured at moderate gains. Critically, a stable reflective band emerges around $γ\approx 0.10-0.20$, where internal feedback is maximally expressive yet spectrally stable: IRR rises smoothly, ESRI remains near zero, and RDP exhibits consistent low-frequency cycles. These findings provide quantitative evidence that reflective, thought-like internal processing can arise from a principled balance between feedback amplification and homeostatic regulation, linking modern fast-weight architectures to theories of cortical reentry and recursive cognition.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.06798

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Thomas, Danielle R., Borchers, Conrad, Koedinger, Kenneth R.

arXiv.org Artificial IntelligenceAug-4-2025

Humans can be notoriously imperfect evaluators. They are often biased, unreliable, and unfit to define "ground truth." Yet, given the surging need to produce large amounts of training data in educational applications using AI, traditional inter-rater reliability (IRR) metrics like Cohen's kappa remain central to validating labeled data. IRR remains a cornerstone of many machine learning pipelines for educational data. Take, for example, the classification of tutors' moves in dialogues or labeling open responses in machine-graded assessments. This position paper argues that overreliance on human IRR as a gatekeeper for annotation quality hampers progress in classifying data in ways that are valid and predictive in relation to improving learning. To address this issue, we highlight five examples of complementary evaluation methods, such as multi-label annotation schemes, expert-based approaches, and close-the-loop validity. We argue that these approaches are in a better position to produce training data and subsequent models that produce improved student learning and more actionable insights than IRR approaches alone. We also emphasize the importance of external validity, for example, by establishing a procedure of validating tutor moves and demonstrating that it works across many categories of tutor actions (e.g., providing hints). We call on the field to rethink annotation quality and ground truth--prioritizing validity and educational impact over consensus alone.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.00143

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Adversarially Pretrained Transformers may be Universally Robust In-Context Learners

Kumano, Soichiro, Kera, Hiroshi, Yamasaki, Toshihiko

arXiv.org Machine LearningMay-21-2025

Adversarial training is one of the most effective adversarial defenses, but it incurs a high computational cost. In this study, we show that transformers adversarially pretrained on diverse tasks can serve as robust foundation models and eliminate the need for adversarial training in downstream tasks. Specifically, we theoretically demonstrate that through in-context learning, a single adversarially pretrained transformer can robustly generalize to multiple unseen tasks without any additional training, i.e., without any parameter updates. This robustness stems from the model's focus on robust features and its resistance to attacks that exploit non-predictive features. Besides these positive findings, we also identify several limitations. Under certain conditions (though unrealistic), no universally robust single-layer transformers exist. Moreover, robust transformers exhibit an accuracy--robustness trade-off and require a large number of in-context demonstrations. The code is available at https://github.com/s-kumano/universally-robust-in-context-learner.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2505.14042

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

TSPRank: Bridging Pairwise and Listwise Methods with a Bilinear Travelling Salesman Model

Li, Weixian Waylon, Ziser, Yftah, Xie, Yifei, Cohen, Shay B., Ma, Tiejun

arXiv.org Artificial IntelligenceNov-18-2024

Traditional Learning-To-Rank (LETOR) approaches, including pairwise methods like RankNet and LambdaMART, often fall short by solely focusing on pairwise comparisons, leading to sub-optimal global rankings. Conversely, deep learning based listwise methods, while aiming to optimise entire lists, require complex tuning and yield only marginal improvements over robust pairwise models. To overcome these limitations, we introduce Travelling Salesman Problem Rank (TSPRank), a hybrid pairwise-listwise ranking method. TSPRank reframes the ranking problem as a Travelling Salesman Problem (TSP), a well-known combinatorial optimisation challenge that has been extensively studied for its numerous solution algorithms and applications. This approach enables the modelling of pairwise relationships and leverages combinatorial optimisation to determine the listwise ranking. This approach can be directly integrated as an additional component into embeddings generated by existing backbone models to enhance ranking performance. Our extensive experiments across three backbone models on diverse tasks, including stock ranking, information retrieval, and historical events ordering, demonstrate that TSPRank significantly outperforms both pure pairwise and listwise methods. Our qualitative analysis reveals that TSPRank's main advantage over existing methods is its ability to harness global information better while ranking. TSPRank's robustness and superior performance across different domains highlight its potential as a versatile and effective LETOR solution. The code and preprocessed data are available at https://github.com/waylonli/TSPRank-KDD2025.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.12064

Country:

North America > United States (0.92)
Europe > United Kingdom (0.67)
North America > Canada (0.15)
(10 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Transportation (1.00)
Banking & Finance > Trading (0.93)
Government > Regional Government (0.92)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.86)

Add feedback

Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction

Dutta, Senjuti, Chen, Sherol, Mak, Sunny, Ahmad, Amnah, Collins, Katherine, Butryna, Alena, Ramachandran, Deepak, Dvijotham, Krishnamurthy, Pavlick, Ellie, Rajakumar, Ravi

arXiv.org Artificial IntelligenceFeb-26-2024

Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models

annotator, motivational context, subjectivity, (16 more...)

arXiv.org Artificial Intelligence

2403.05576

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > California > Santa Clara County > Mountain View (0.05)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

k-Rater Reliability: The Correct Unit of Reliability for Aggregated Human Annotations

Wong, Ka, Paritosh, Praveen

arXiv.org Machine LearningMar-24-2022

Since the inception of crowdsourcing, aggregation has been a common strategy for dealing with unreliable data. Aggregate ratings are more reliable than individual ones. However, many natural language processing (NLP) applications that rely on aggregate ratings only report the reliability of individual ratings, which is the incorrect unit of analysis. In these instances, the data reliability is under-reported, and a proposed k-rater reliability (kRR) should be used as the correct data reliability for aggregated datasets. It is a multi-rater generalization of inter-rater reliability (IRR). We conducted two replications of the WordSim-353 benchmark, and present empirical, analytical, and bootstrap-based methods for computing kRR on WordSim-353. These methods produce very similar results. We hope this discussion will nudge researchers to report kRR in addition to IRR.

artificial intelligence, natural language, social media, (19 more...)

arXiv.org Machine Learning

2203.12913

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.50)

Add feedback

On the preferred extensions of argumentation frameworks: bijections with naive extensions

Elaroussi, Mohammed, Nourine, Lhouari, Radjef, Mohammed Said, Vilmin, Simon

arXiv.org Artificial IntelligenceFeb-11-2022

This paper deals with the problem of finding the preferred extensions of an argumentation framework by means of a bijection with the naive extensions of another framework. First we consider the case where an argumentation framework is naive-realizable: its naive and preferred extensions are equal. Recognizing naive-realizable argumentation frameworks is hard, but we show that it is tractable for frameworks with bounded in-degree. Next, we give a bijection between the preferred extensions of an argumentation framework being admissible-closed (the intersection of two admissible sets is admissible) and the naive extensions of another framework on the same set of arguments. On the other hand, we prove that identifying admissible-closed argumentation frameworks is coNP-complete. At last, we introduce the notion of irreducible self-defending sets as those that are not the union of others. It turns out there exists a bijection between the preferred extensions of an argumentation framework and the naive extensions of a framework on its irreducible self-defending sets. Consequently, the preferred extensions of argumentation frameworks with some lattice properties can be listed with polynomial delay and polynomial space.

argumentation framework, extension, irr, (13 more...)

arXiv.org Artificial Intelligence

2202.05506

Country:

Africa > Middle East > Algeria > Béjaïa Province > Béjaïa (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > The Hague (0.04)
Europe > France > Auvergne-Rhône-Alpes > Puy-de-Dôme > Clermont-Ferrand (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)

Add feedback

Cross-replication Reliability -- An Empirical Approach to Interpreting Inter-rater Reliability

Wong, Ka, Paritosh, Praveen, Aroyo, Lora

arXiv.org Artificial IntelligenceJun-11-2021

We present a new approach to interpreting IRR that is empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's kappa. We call this approach the xRR framework. We opensource a replication dataset of 4 million human judgements of facial expressions and analyze it with the proposed framework. We argue this framework can be used to measure the quality of crowdsourced datasets.

agreement, irr, replication, (16 more...)

arXiv.org Artificial Intelligence

2106.07393

Country:

North America > Mexico > Mexico City > Mexico City (0.05)
Europe > Hungary > Budapest > Budapest (0.05)
Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.05)
(3 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Communications > Social Media > Crowdsourcing (0.36)

Add feedback

Identity testing of reversible Markov chains

Fried, Sela, Wolfer, Geoffrey

arXiv.org Machine LearningMay-13-2021

We consider the problem of identity testing of Markov chains based on a single trajectory of observations under the distance notion introduced by Daskalakis et al. [2018a] and further analyzed by Cherapanamjeri and Bartlett [2019]. Both works made the restrictive assumption that the Markov chains under consideration are symmetric. In this work we relax the symmetry assumption to the more natural assumption of reversibility, still assuming that both the reference and the unknown Markov chains share the same stationary distribution.

lemma 4, markov chain, probability, (16 more...)

arXiv.org Machine Learning

2105.06347

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States (0.04)
Asia > India (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Towards Efficient Anytime Computation and Execution of Decoupled Robustness Envelopes for Temporal Plans

Cashmore, Michael, Cimatti, Alessandro, Magazzeni, Daniele, Micheli, Andrea, Zehtabi, Parisa

arXiv.org Artificial IntelligenceNov-17-2019

Robustness Envelopes characterize the set of possible contingencies that a plan is able to address without re-planning, but their exact computation is extremely expensive; furthermore, general robustness envelopes are not amenable for efficient execution. In this paper, we present a novel, anytime algorithm to approximate Robustness Envelopes, making them scalable and executable. This is proven by an experimental analysis showing the efficiency of the algorithm, and by a concrete case study where the execution of robustness envelopes significantly reduces the number of re-plannings. 1 Introduction When planning and scheduling techniques are employed in practical applications, one of the major problems is the need for online re-planning when the observed contingencies are not aligned with the ones that were considered at planning time. These situations are common, because it is arguably impossible to predict the entire range of situations an autonomous system can encounter, especially when the planning domain encompasses time and temporal constraints. Unfortunately, re-planning can be costly in terms of time, and computational resources can be scarce on-board, so limiting the use of re-planning is very important for practical purposes. In principle, it is also possible to continue with the execution of a plan even when the observed contingencies are unexpected, optimistically hoping for a successful completion. However, this approach offers no formal guarantee, and is prone to the risk of continuing execution of a plan that is bound to fail. Several approaches have been proposed in the literature to address this problem (see (In-grand and Ghallab 2017) for a survey focused on robotics).

algorithm, execution, robustness envelope, (13 more...)

arXiv.org Artificial Intelligence

1911.07318

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Add feedback