Foundation Models and Transformers for Anomaly Detection: A Survey

Ammar, Mouïn Ben, Mendoza, Arturo, Belkhir, Nacim, Manzanera, Antoine, Franchi, Gianni

arXiv.org Artificial Intelligence

In line with the development of deep learning, this survey examines the transformative role of Transformers and foundation models in advancing visual anomaly detection (VAD). We explore how these architectures, with their global receptive fields and adaptability, address challenges such as long-range dependency modeling, contextual modeling and data scarcity. The survey categorizes VAD methods into reconstruction-based, feature-based and zero/few-shot approaches, highlighting the paradigm shift brought about by foundation models. By integrating attention mechanisms and leveraging large-scale pre-training, Transformers and foundation models enable more robust, interpretable, and scalable anomaly detection solutions. This work provides a comprehensive review of state-of-the-art techniques, their strengths, limitations, and emerging trends in leveraging these architectures for VAD.


Can "consciousness" be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

Li, Jingkai

arXiv.org Artificial Intelligence

Integrated Information Theory (IIT) provides a quantitative framework for explaining the phenomenon of consciousness, positing that conscious systems comprise elements integrated through causal properties. We apply IIT 3.0 and 4.0 -- the latest iterations of this framework -- to sequences of Large Language Model (LLM) representations, analyzing data derived from existing Theory of Mind (ToM) test results. Our study systematically investigates whether the differences of ToM test performances, when presented in the LLM representations, can be revealed by IIT estimates, i.e., $\Phi^{\max}$ (IIT 3.0), $\Phi$ (IIT 4.0), Conceptual Information (IIT 3.0), and $\Phi$-structure (IIT 4.0). Furthermore, we compare these metrics with the Span Representations independent of any estimate for consciousness. This additional effort aims to differentiate between potential "consciousness" phenomena and inherent separations within LLM representational space. We conduct comprehensive experiments examining variations across LLM transformer layers and linguistic spans from stimuli. Our results suggest that sequences of contemporary Transformer-based LLM representations lack statistically significant indicators of observed "consciousness" phenomena but exhibit intriguing patterns under $\textit{spatio}$-permutational analyses. The Appendix and code are available as Supplementary Materials at: https://doi.org/10.1016/j.nlp.2025.100163.


Naming the Pain in Machine Learning-Enabled Systems Engineering

Kalinowski, Marcos, Mendez, Daniel, Giray, Görkem, Alves, Antonio Pedro Santos, Azevedo, Kelly, Escovedo, Tatiana, Villamizar, Hugo, Lopes, Helio, Baldassarre, Teresa, Wagner, Stefan, Biffl, Stefan, Musil, Jürgen, Felderer, Michael, Lavesson, Niklas, Gorschek, Tony

arXiv.org Artificial Intelligence

Context: Machine learning (ML)-enabled systems are being increasingly adopted by companies aiming to enhance their products and operational processes. Objective: This paper aims to deliver a comprehensive overview of the current status quo of engineering ML-enabled systems and lay the foundation to steer practically relevant and problem-driven academic research. Method: We conducted an international survey to collect insights from practitioners on the current practices and problems in engineering ML-enabled systems. We received 188 complete responses from 25 countries. We conducted quantitative statistical analyses on contemporary practices using bootstrapping with confidence intervals and qualitative analyses on the reported problems using open and axial coding procedures. Results: Our survey results reinforce and extend existing empirical evidence on engineering ML-enabled systems, providing additional insights into typical ML-enabled systems project contexts, the perceived relevance and complexity of ML life cycle phases, and current practices related to problem understanding, model deployment, and model monitoring. Furthermore, the qualitative analysis provides a detailed map of the problems practitioners face within each ML life cycle phase and the problems causing overall project failure. Conclusions: The results contribute to a better understanding of the status quo and problems in practical environments. We advocate for the further adaptation and dissemination of software engineering practices to enhance the engineering of ML-enabled systems.
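The "bootstrapping with confidence intervals" mentioned in the method can be sketched as a percentile bootstrap. The sample below is hypothetical (a made-up adoption rate among 188 responses, mirroring the survey's sample size), not the paper's actual data:

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, compute the statistic each time, sort the replicates.
    reps = sorted(stat([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_resamples))
    lo = reps[int((alpha / 2) * n_resamples)]
    hi = reps[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical: 120 of 188 respondents report adopting a given practice.
responses = [1] * 120 + [0] * 68
lo, hi = bootstrap_ci(responses)
print(f"adoption rate: {statistics.mean(responses):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The percentile bootstrap needs no distributional assumptions, which suits ordinal survey responses; the paper itself does not specify which bootstrap variant was used.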


AI Competitions and Benchmarks: towards impactful challenges with post-challenge papers, benchmarks and other dissemination actions

Marot, Antoine, Rousseau, David, Xu, Zhen

arXiv.org Artificial Intelligence

Organising an AI challenge does not end with the final event. The long-lasting impact also needs to be organised. This chapter covers the various activities after the challenge is formally finished. The target audience of different post-challenge activities is identified. The various outputs of the challenge are listed with the means to collect them. The main part of the chapter is a template for a typical post-challenge paper, including possible graphs as well as advice on how to turn the challenge into a long-lasting benchmark.


How many preprints have actually been printed and why: a case study of computer science preprints on arXiv

Lin, Jialiang, Yu, Yao, Zhou, Yu, Zhou, Zhiyang, Shi, Xiaodong

arXiv.org Artificial Intelligence

Preprints play an increasingly critical role in academic communities. There are many reasons driving researchers to post their manuscripts to preprint servers before formal submission to journals or conferences, but the use of preprints has also sparked considerable controversy, especially surrounding the claim of priority. In this paper, a case study of computer science preprints submitted to arXiv from 2008 to 2017 is conducted to quantify how many preprints have eventually been printed in peer-reviewed venues. Among those published manuscripts, some are published under different titles and without an update to their preprints on arXiv. In the case of these manuscripts, the traditional fuzzy matching method is incapable of mapping the preprint to the final published version. In view of this issue, we introduce a semantics-based mapping method with the employment of Bidirectional Encoder Representations from Transformers (BERT). With this new mapping method and a plurality of data sources, we find that 66% of all sampled preprints are published under unchanged titles and 11% are published under different titles and with other modifications. A further analysis was then performed to investigate why these preprints but not others were accepted for publication. Our comparison reveals that in the field of computer science, published preprints feature adequate revisions, multiple authorship, detailed abstract and introduction, extensive and authoritative references and available source code.


Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM)

Park, Hyunwook, Kim, Minsu, Kim, Seongguk, Kim, Keunwoo, Kim, Haeyeon, Shin, Taein, Son, Keeyoung, Sim, Boogyo, Kim, Subin, Jeong, Seungtaek, Hwang, Chulsoon, Kim, Joungho

arXiv.org Artificial Intelligence

In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design to maximize the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize the decap optimization policy. The optimality performance is significantly improved since the attention mechanism has the expressive capacity to explore the massive combinatorial space of decap assignments. Moreover, it can capture sequential relationships between the decap assignments. The computing time for optimization is dramatically reduced because the network is reusable across positions of probing ports and decap assignment candidates. This is because the transformer network has a context embedding process to capture meta-features, including probing-port positions. In addition, the network is trained with randomly generated data sets. Therefore, without additional training, the trained network can solve new decap optimization problems. The computing time for training and the data cost are critically decreased due to the scalability of the network. Thanks to its shared-weight property, the network can adapt to larger-scale problems without additional training. For verification, we compare the results with a conventional genetic algorithm (GA), random search (RS), and all the previous RL-based methods. As a result, the proposed method outperforms them in all the following aspects: optimality performance, computing time, and data efficiency.
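The core idea of scoring candidate decap positions with attention can be illustrated very loosely as scaled dot-product attention followed by greedy top-k decoding. Everything below (the embedding dimension, port and candidate counts, random features, and the one-shot greedy selection) is an illustrative assumption, not the paper's actual trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 probing ports, 20 candidate decap positions,
# 8-dimensional feature vectors standing in for learned context embeddings.
n_ports, n_candidates, d = 3, 20, 8
port_emb = rng.normal(size=(n_ports, d))       # probing-port context embeddings
cand_emb = rng.normal(size=(n_candidates, d))  # candidate-position embeddings

def attention_weights(query, keys):
    """Scaled dot-product attention weights over candidate positions."""
    logits = keys @ query / np.sqrt(keys.shape[1])
    w = np.exp(logits - logits.max())  # numerically stable softmax
    return w / w.sum()

# Pool the port context into a single query, then greedily pick the
# k highest-weight decap positions without replacement.
query = port_emb.mean(axis=0)
probs = attention_weights(query, cand_emb)
k = 5
chosen = np.argsort(probs)[::-1][:k]
print(sorted(chosen.tolist()))
```

In the paper the policy decodes assignments sequentially and is trained with RL against simulated PDN impedance; this sketch only shows how an attention score over a combinatorial candidate set yields a ranking to select from.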


Features of a smart city

#artificialintelligence

A smart city is a city that uses technology to provide services and solve city problems. The main goals of a smart city are to improve policy efficiency, reduce waste and inconvenience, improve social and economic quality, and maximize social inclusion. Due to the breadth of technologies that have been implemented under the smart city label, it is difficult to distill a precise definition of a smart city. As the world's population continues to urbanize – by 2050, 66% of the world's population is expected to be urban – there is a global trend toward the creation of smart cities. This tendency not only causes many physical, social, behavioural, economic, and infrastructure issues, but it also creates many opportunities.


Why Knowledge Representation Matters

AITopics Original Links

There is a big difference between the attention artificial intelligence (AI) is currently receiving and that of the 1990s. Twenty years ago, the focus was on logic-based AI, usually under the heading of knowledge representation, or KR, whereas today's focus is on machine learning and statistical algorithms. This shift has served AI well, since machine learning and stats provide effective algorithmic solutions to certain kinds of problems (such as image recognition), in a way that KR never did. However, I contend the pendulum has swung too far, and something valuable has been lost. Knowledge representation is not a single thing.


Robot-written reviews fool academics

#artificialintelligence

Soulless computer algorithms are already churning out weather bulletins, sports reports, rap lyrics and even passable Chinese poetry. But it seems machines have now taken another step towards replacing human enterprise by generating their own reviews of serious academic journal papers that are able to impress even experienced academics. Using automatic text generation software, computer scientists at Italy's University of Trieste created a series of fake peer reviews of genuine journal papers and asked academics of different levels of seniority to say whether they agreed with their recommendations to accept for publication or not. In a quarter of cases, academics said they agreed with the fake review's conclusions, even though they were entirely made up of computer-generated gobbledegook – or, rather, sentences picked at random from a selection of peer reviews taken from subjects as diverse as brain science, ecology and ornithology. "Sentences like 'it would be good if you can also talk about the importance of establishing some good shared benchmarks' or 'it would be useful to identify key assumptions in the modelling' are probably well suited to almost any review," explained Eric Medvet, assistant professor at Trieste's department of engineering and architecture, who conducted the experiment with colleagues at his university's Machine Learning Lab.


A History of Cluster Analysis Using the Classification Society's Bibliography Over Four Decades

Murtagh, Fionn, Kurtz, Michael J.

arXiv.org Machine Learning

The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2011. The Classification Society's Service, as it was termed, has been produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the Journal of Classification. Among our findings are the following: an enormous increase in scholarly production post approximately 2000; a very major increase in quantity, coupled with work in different disciplines, from approximately 2004; and a major shift also from cluster analysis in earlier times having mathematics and psychology as disciplines of the journals published in, and affiliations of authors, contrasted with, in more recent times, a "centre of gravity" in management and engineering.