AITopics | cmm

Collaborating Authors

cmm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Curriculum Model Merging: Harmonizing Chemical LLMs for Enhanced Cross-Task Generalization

Neural Information Processing SystemsJun-16-2026, 08:23:18 GMT

The emergence of large language models (LLMs) has opened new opportunities for AI-driven chemical problem solving. However, existing chemical LLMs are typically tailored to specific task formats or narrow domains, limiting their capacity to integrate knowledge and generalize across tasks. Model merging offers a promising route for efficiently combining specialized LLMs into a unified model without access to original training data, which is urgently needed in the chemical domain where in-house data and privacy preservation are critical. However, effective model merging in the chemical domain poses unique challenges: (1) significant disparities among chemical LLMs due to task-specific specialization, and (2) a highly imbalanced distribution of chemical LLMs in targeted downstream tasks, where some are over-benchmarked while others remain underexplored. These challenges intensify model inconsistencies such as parameter interference and accumulated fine-tuning noise, which collectively hinder effective model merging. To this end, we propose Curriculum Model Merging (CMM), a curriculum-based framework that progressively merges expert chemical LLMs in a moderate and continual manner. CMM aims to harmonize their inconsistencies while meantime preserve their domain-specific expertise. Comprehensive experiments on two benchmark datasets show that CMM effectively consolidates task-specific expertise and outperforms the state-of-the-art methods by 29.03% in terms of overall average performance.

expert model, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota (0.28)
Asia > Middle East > UAE (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs

Høst, Anders Mølmen, Lison, Pierre, Moonen, Leon

arXiv.org Artificial IntelligenceOct-21-2025

Vulnerability databases, such as the National Vulnerability Database (NVD), offer detailed descriptions of Common Vulnerabilities and Exposures (CVEs), but often lack information on their real-world impact, such as the tactics, techniques, and procedures (TTPs) that adversaries may use to exploit the vulnerability. However, manually linking CVEs to their corresponding TTPs is a challenging and time-consuming task, and the high volume of new vulnerabilities published annually makes automated support desirable. This paper introduces TRIAGE, a two-pronged automated approach that uses Large Language Models (LLMs) to map CVEs to relevant techniques from the ATT&CK knowledge base. We first prompt an LLM with instructions based on MITRE's CVE Mapping Methodology to predict an initial list of techniques. This list is then combined with the results from a second LLM-based module that uses in-context learning to map a CVE to relevant techniques. This hybrid approach strategically combines rule-based reasoning with data-driven inference. Our evaluation reveals that in-context learning outperforms the individual mapping methods, and the hybrid approach improves recall of exploitation techniques. We also find that GPT-4o-mini performs better than Llama3.3-70B on this task. Overall, our results show that LLMs can be used to automatically predict the impact of cybersecurity vulnerabilities and TRIAGE makes the process of mapping CVEs to ATT&CK more efficient. A replication package is available for download from https://doi.org/10.5281/zenodo.17341503. Keywords: vulnerability impact, CVE, ATT&CK techniques, large language models, automated mapping.

large language model, machine learning, secondary impact, (17 more...)

arXiv.org Artificial Intelligence

2508.18439

Country: Europe > Norway (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MeMo: Towards Language Models with Associative Memory Mechanisms

Zanzotto, Fabio Massimo, Ruzzetti, Elena Sofia, Xompero, Giancarlo A., Ranaldi, Leonardo, Venditti, Davide, Ranaldi, Federico, Giannone, Cristina, Favalli, Andrea, Romagnoli, Raniero

arXiv.org Artificial IntelligenceFeb-18-2025

Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model editing, including forgetting texts. We experimented with the MeMo architecture, showing the memorization power of the one-layer and the multi-layer configurations.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.12851

Country:

Europe (1.00)
North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Clustered Mallows Model

Piancastelli, Luiza S. C., Friel, Nial

arXiv.org Machine LearningMar-19-2024

Rankings are a type of preference elicitation that arise in experiments where assessors arrange items, for example, in decreasing order of utility. Orderings of n items labelled {1,...,n} denoted are permutations that reflect strict preferences. For a number of reasons, strict preferences can be unrealistic assumptions for real data. For example, when items share common traits it may be reasonable to attribute them equal ranks. Also, there can be different importance attributions to decisions that form the ranking. In a situation with, for example, a large number of items, an assessor may wish to rank at top a certain number items; to rank other items at the bottom and to express indifference to all others. In addition, when aggregating opinions, a judging body might be decisive about some parts of the rank but ambiguous for others. In this paper we extend the well-known Mallows (Mallows, 1957) model (MM) to accommodate item indifference, a phenomenon that can be in place for a variety of reasons, such as those above mentioned.The underlying grouping of similar items motivates the proposed Clustered Mallows Model (CMM). The CMM can be interpreted as a Mallows distribution for tied ranks where ties are learned from the data. The CMM provides the flexibility to combine strict and indifferent relations, achieving a simpler and robust representation of rank collections in the form of ordered clusters. Bayesian inference for the CMM is in the class of doubly-intractable problems since the model's normalisation constant is not available in closed form. We overcome this challenge by sampling from the posterior with a version of the exchange algorithm \citep{murray2006}. Real data analysis of food preferences and results of Formula 1 races are presented, illustrating the CMM in practical situations.

cmm, criterion, probability, (14 more...)

arXiv.org Machine Learning

2403.1288

Country:

Asia > Japan > Honshū > Tōhoku (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Exponentially Faster Language Modelling

Belcak, Peter, Wattenhofer, Roger

arXiv.org Artificial IntelligenceNov-21-2023

Language models only really need to use an exponential fraction of their neurons for individual inferences. As proof, we present UltraFastBERT, a BERT variant that uses 0.3% of its neurons during inference while performing on par with similar BERT models. UltraFastBERT selectively engages just 12 out of 4095 neurons for each layer inference. This is achieved by replacing feedforward networks with fast feedforward networks (FFFs). While no truly efficient implementation currently exists to unlock the full acceleration potential of conditional neural execution, we provide high-level CPU code achieving 78x speedup over the optimized baseline feedforward implementation, and a PyTorch implementation delivering 40x speedup over the equivalent batched feedforward inference. We publish our training code, benchmarking setup, and model weights.

implementation, inference, neuron, (14 more...)

arXiv.org Artificial Intelligence

2311.1077

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports

Wang, Xinyu, Gui, Lin, He, Yulan

arXiv.org Artificial IntelligenceOct-27-2023

Table of contents (ToC) extraction centres on structuring documents in a hierarchical manner. In this paper, we propose a new dataset, ESGDoc, comprising 1,093 ESG annual reports from 563 companies spanning from 2001 to 2022. These reports pose significant challenges due to their diverse structures and extensive length. To address these challenges, we propose a new framework for Toc extraction, consisting of three steps: (1) Constructing an initial tree of text blocks based on reading order and font sizes; (2) Modelling each tree node (or text block) independently by considering its contextual information captured in node-centric subtree; (3) Modifying the original tree by taking appropriate action on each tree node (Keep, Delete, or Move). This construction-modelling-modification (CMM) process offers several benefits. It eliminates the need for pairwise modelling of section headings as in previous approaches, making document segmentation practically feasible. By incorporating structured information, each section heading can leverage both local and long-distance context relevant to itself. Experimental results show that our approach outperforms the previous state-of-the-art baseline with a fraction of running time. Our framework proves its scalability by effectively handling documents of any length.

esgdoc, information, node, (15 more...)

arXiv.org Artificial Intelligence

2310.18073

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(5 more...)

Genre:

Collection (0.61)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Cognition-inspired homeostasis can balance conflicting needs in robots

Stovold, James, O'Keefe, Simon, Timmis, Jon

arXiv.org Artificial IntelligenceSep-27-2022

Homeostasis keeps animals alive; it is a fundamental process that allows animals to adapt quickly to their environment. Artificial homeostasis can be used to help robots adapt to changing environments. Previous attempts at developing artificial homeostasis for robots were driven by mimicry of the biochemical machinery that drives homeostasis in humans. By considering homeostasis from a cognitive perspective, we develop a comparatively simple robot controller named CogSis (COGnitive HomeostaSIS) and demonstrate that it can provide homeostasis to a robot, even when there are conflicting needs. We present experiments showing that a robot running CogSis is able to learn from previous experiences and use them to influence future behaviour; can maintain its charge level while attending to another task (warming itself in an area separate from the charging station); and is able to maintain its charge level while avoiding a conflicting need (keeping cool, when the charging station is placed in a hot region of the environment). Results are presented in simulation and from a real robot platform.

agent, artificial intelligence, robot, (16 more...)

arXiv.org Artificial Intelligence

1811.10033

Country:

Europe > Germany > Saxony > Leipzig (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (0.97)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Statistical Theory for Imbalanced Binary Classification

Singh, Shashank, Khim, Justin

arXiv.org Machine LearningJul-4-2021

Within the vast body of statistical theory developed for binary classification, few meaningful results exist for imbalanced classification, in which data are dominated by samples from one of the two classes. Existing theory faces at least two main challenges. First, meaningful results must consider more complex performance measures than classification accuracy. To address this, we characterize a novel generalization of the Bayes-optimal classifier to any performance metric computed from the confusion matrix, and we use this to show how relative performance guarantees can be obtained in terms of the error of estimating the class probability function under uniform ($\mathcal{L}_\infty$) loss. Second, as we show, optimal classification performance depends on certain properties of class imbalance that have not previously been formalized. Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance. We analyze how Uniform Class Imbalance influences optimal classifier performance and show that it necessitates different classifier behavior than other types of class imbalance. We further illustrate these two contributions in the case of $k$-nearest neighbor classification, for which we develop novel guarantees. Together, these results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.

class imbalance, classification, classifier, (13 more...)

arXiv.org Machine Learning

2107.01777

Country:

North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Sequential Deconfounding for Causal Inference with Unobserved Confounders

Hatt, Tobias, Feuerriegel, Stefan

arXiv.org Machine LearningApr-16-2021

Using observational data to estimate the effect of a treatment is a powerful tool for decision-making when randomized experiments are infeasible or costly. However, observational data often yields biased estimates of treatment effects, since treatment assignment can be confounded by unobserved variables. A remedy is offered by deconfounding methods that adjust for such unobserved confounders. In this paper, we develop the Sequential Deconfounder, a method that enables estimating individualized treatment effects over time in presence of unobserved confounders. This is the first deconfounding method that can be used in a general sequential setting (i.e., with one or more treatments assigned at each timestep). The Sequential Deconfounder uses a novel Gaussian process latent variable model to infer substitutes for the unobserved confounders, which are then used in conjunction with an outcome model to estimate treatment effects over time. We prove that using our method yields unbiased estimates of individualized treatment responses over time. Using simulated and real medical data, we demonstrate the efficacy of our method in deconfounding the estimation of treatment responses over time.

confounder, treatment response, unobserved confounder, (13 more...)

arXiv.org Machine Learning

2104.09323

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology: