Overview
Comparison of different unique hard attention transformer models by the formal languages they can recognize
The goal of this note is to give an overview of the capabilities of different flavors of unique hard attention transformer encoders in terms of the formal languages they are able to recognize. This study is relevant in the context of the rising use of large language models, which typically follow a transformer architecture. While the model we will primarily investigate has features very distinct from real-world transformers (we will comment on the distinction later), it can still give valuable insights into the principles underlying transformer capabilities. Roughly speaking, a transformer can be thought of as a function that, given an input of any length, constructs a sequence of the same length: it transforms one sequence into another.
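The sequence-to-sequence view above can be made concrete with a minimal sketch of a single unique hard attention step. It assumes the usual reading of the term: each position attends to exactly one position, the one with the maximal score, and ties are broken towards the leftmost or rightmost candidate. The names `unique_hard_attention`, `score`, and `value` are hypothetical and chosen for illustration; the note's formal definitions may differ in detail.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
V = TypeVar("V")


def unique_hard_attention(
    xs: Sequence[T],
    score: Callable[[T, T], float],
    value: Callable[[T], V],
    leftmost: bool = True,
) -> List[V]:
    """One attention step: position i copies the value of the single
    position j that maximizes score(xs[i], xs[j]).

    Illustrative sketch only; not the note's formal model.
    """
    n = len(xs)
    out: List[V] = []
    for i in range(n):
        # max() keeps the first maximal element it sees, so the
        # iteration order decides how ties are broken.
        order = range(n) if leftmost else range(n - 1, -1, -1)
        best = max(order, key=lambda j: score(xs[i], xs[j]))
        out.append(value(xs[best]))
    return out


# Every position looks for a 'b' and returns the index it attended to.
tokens = list(enumerate("abab"))  # (position, symbol) pairs
looks_for_b = lambda query, key: 1.0 if key[1] == "b" else 0.0
position_of = lambda key: key[0]

print(unique_hard_attention(tokens, looks_for_b, position_of))         # [1, 1, 1, 1]
print(unique_hard_attention(tokens, looks_for_b, position_of, False))  # [3, 3, 3, 3]
```

The output always has the same length as the input, and the choice of tie-breaking direction is the only thing that distinguishes the two calls.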
The Future of Continual Learning in the Era of Foundation Models: Three Key Directions
Bell, Jack, Quarantiello, Luigi, Coleman, Eric Nuertey, Li, Lanpei, Li, Malio, Madeddu, Mauro, Piccoli, Elia, Lomonaco, Vincenzo
Continual learning--the ability to acquire, retain, and refine knowledge over time--has always been fundamental to intelligence, both human and artificial. Historically, different AI paradigms have acknowledged this need, albeit with varying priorities: early expert and production systems focused on incremental knowledge consolidation, while reinforcement learning emphasised dynamic adaptation. With the rise of deep learning, deep continual learning has primarily focused on learning robust and reusable representations over time to solve sequences of increasingly complex tasks. However, the emergence of Large Language Models (LLMs) and foundation models has raised the question: Do we still need continual learning when centralised, monolithic models can tackle diverse tasks with access to internet-scale knowledge? We argue that continual learning remains essential for three key reasons: (i) continual pre-training is still necessary to ensure foundation models remain up to date, mitigating knowledge staleness and distribution shifts while integrating new information; (ii) continual fine-tuning enables models to specialise and personalise, adapting to domain-specific tasks, user preferences, and real-world constraints without full retraining, avoiding the need for computationally expensive long context-windows; (iii) continual compositionality offers a scalable and modular approach to intelligence, enabling the orchestration of foundation models and agents to be dynamically composed, recombined, and adapted. While continual pre-training and fine-tuning are explored as niche research directions, we argue it is continual compositionality that will mark the rebirth of continual learning. The future of AI will not be defined by a single static model but by an ecosystem of continually evolving and interacting models, making continual learning more relevant than ever.
A Survey of Deep Learning Video Super-Resolution
Baniya, Arbind Agrahari, Lee, Tsz-Kwan, Eklund, Peter, Aryal, Sunil
Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identified trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.
UAE AMBASSADOR YOUSEF AL OTAIBA: US and UAE forge groundbreaking high-tech partnership based on AI
President Donald Trump's recent visit to the UAE marked a pivotal moment for UAE-U.S. bilateral relations, shining a spotlight on a shared vision for the future. As the UAE and the "New Gulf" pivot from oil to cutting-edge technologies, our partnership with the U.S., rooted in decades of trust, has become a beacon of what's possible when nations collaborate. This trust has paved the way for a bold new chapter: a strategic economic alliance poised to create tens of thousands of high-tech, energy and manufacturing jobs, driving prosperity in both of our countries. At the heart of this collaboration lies the new U.S.-UAE AI Acceleration Partnership. This initiative will advance cooperation in artificial intelligence and other transformative technologies while spurring investment flows between our nations.
Natural Language Processing to Enhance Deliberation in Political Online Discussions: A Survey
Behrendt, Maike, Wagner, Stefan Sylvius, Weinmann, Carina, Bormann, Marike, Warne, Mira, Harmeling, Stefan
Political online participation in the form of discussing political issues and exchanging opinions among citizens is gaining importance with more and more formats being held digitally. To come to a decision, a careful discussion and consideration of opinions and a civil exchange of arguments, which is defined as the act of deliberation, is desirable. The quality of discussions and participation processes in terms of their deliberativeness highly depends on the design of platforms and processes. To facilitate online communication for both participants and initiators, machine learning methods offer a lot of potential. In this work we want to showcase which issues occur in political online discussions and how machine learning can be used to counteract these issues and enhance deliberation.
ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms
Surabhi, Praneet Sai Madhu, Mudireddy, Dheeraj Reddy, Tao, Jian
This paper presents ThinkTank, a comprehensive and scalable framework designed to transform specialized AI agent systems into versatile collaborative intelligence platforms capable of supporting complex problem-solving across diverse domains. ThinkTank systematically generalizes agent roles, meeting structures, and knowledge integration mechanisms by adapting proven scientific collaboration methodologies. Through role abstraction, generalization of meeting types for iterative collaboration, and the integration of Retrieval-Augmented Generation with advanced knowledge storage, the framework facilitates expertise creation and robust knowledge sharing. ThinkTank enables organizations to leverage collaborative AI for knowledge-intensive tasks while ensuring data privacy and security through local deployment, utilizing frameworks like Ollama with models such as Llama3.1. The ThinkTank framework is designed to deliver significant advantages in cost-effectiveness, data security, scalability, and competitive positioning compared to cloud-based alternatives, establishing it as a universal platform for AI-driven collaborative problem-solving. The ThinkTank code is available at https://github.com/taugroup/ThinkTank
Constrained Sliced Wasserstein Embedding
NaderiAlizadeh, Navid, Salehi, Darian, Liu, Xinran, Kolouri, Soheil
Sliced Wasserstein (SW) distances offer an efficient method for comparing high-dimensional probability measures by projecting them onto multiple 1-dimensional probability distributions. However, identifying informative slicing directions has proven challenging, often necessitating a large number of slices to achieve desirable performance and thereby increasing computational complexity. We introduce a constrained learning approach to optimize the slicing directions for SW distances. Specifically, we constrain the 1D transport plans to approximate the optimal plan in the original space, ensuring meaningful slicing directions. By leveraging continuous relaxations of these transport plans, we enable a gradient-based primal-dual approach to train the slicer parameters, alongside the remaining model parameters. We demonstrate how this constrained slicing approach can be applied to pool high-dimensional embeddings into fixed-length permutation-invariant representations. Numerical results on foundation models trained on images, point clouds, and protein sequences showcase the efficacy of the proposed constrained learning approach in learning more informative slicing directions. Our implementation code can be found at https://github.com/Stranja572/constrainedswe.
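As background for the abstract above, here is a minimal sketch of the plain Monte Carlo sliced Wasserstein estimate with randomly drawn slicing directions. It is not the constrained, learned slicer the paper proposes. It assumes two point sets of equal size with uniform weights, so each 1D Wasserstein distance reduces to comparing sorted projections. The function names and the parameters `n_slices`, `p`, and `seed` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Draw a direction uniformly from the unit sphere via normalized Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    n_slices: int = 64,
    p: int = 2,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    two equal-size point sets with uniform weights (random slices)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty point sets of equal size")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_slices):
        theta = random_unit_vector(dim, rng)
        # Project both point sets onto the slicing direction.
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in ys)
        # In 1D, optimal transport between equal-size samples matches
        # sorted points, so W_p^p is the mean of |px_i - py_i|^p.
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_slices) ** (1.0 / p)


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 3.0, y] for x, y in points]
print(sliced_wasserstein(points, points))   # 0.0
print(sliced_wasserstein(points, shifted))  # positive; value depends on the sampled slices
```

With random directions, many slices may be uninformative, which is the cost the abstract attributes to standard SW and the motivation for learning the directions instead.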
The Hitchhikers Guide to Production-ready Trustworthy Foundation Model powered Software (FMware)
Vasilevski, Kirill, Rombaut, Benjamin, Rajbahadur, Gopi Krishnan, Oliva, Gustavo A., Gallaba, Keheliya, Cogo, Filipe R., Lin, Jiahuei, Lin, Dayi, Zhang, Haoxiang, Chen, Bouyan, Thangarajah, Kishanthan, Hassan, Ahmed E., Jiang, Zhen Ming
Foundation Models (FMs) such as Large Language Models (LLMs) are reshaping the software industry by enabling FMware, systems that integrate these FMs as core components. In this KDD 2025 tutorial, we present a comprehensive exploration of FMware that combines a curated catalogue of challenges with real-world production concerns. We first discuss the state of research and practice in building FMware. We further examine the difficulties in selecting suitable models, aligning high-quality domain-specific data, engineering robust prompts, and orchestrating autonomous agents. We then address the complex journey from impressive demos to production-ready systems by outlining issues in system testing, optimization, deployment, and integration with legacy software. Drawing on our industrial experience and recent research in the area, we provide actionable insights and a technology roadmap for overcoming these challenges. Attendees will gain practical strategies to enable the creation of trustworthy FMware in the evolving technology landscape.
Early Detection of Patient Deterioration from Real-Time Wearable Monitoring System
Ting, Lo Pang-Yun, Chen, Hong-Pei, Liu, An-Shan, Yeh, Chun-Yin, Chen, Po-Lin, Chuang, Kun-Ta
Early detection of patient deterioration is crucial for reducing mortality rates. Heart rate data has shown promise in assessing patient health, and wearable devices offer a cost-effective solution for real-time monitoring. However, extracting meaningful insights from diverse heart rate data and handling missing values in wearable device data remain key challenges. To address these challenges, we propose TARL, an innovative approach that models the structural relationships of representative subsequences, known as shapelets, in heart rate time series. TARL creates a shapelet-transition knowledge graph to model shapelet dynamics in heart rate time series, indicating illness progression and potential future changes. We further introduce a transition-aware knowledge embedding to reinforce relationships among shapelets and quantify the impact of missing values, enabling the formulation of comprehensive heart rate representations. These representations capture explanatory structures and predict future heart rate trends, aiding early illness detection. We collaborate with physicians and nurses to gather ICU patient heart rate data from wearables and diagnostic metrics assessing illness severity for evaluating deterioration. Experiments on real-world ICU data demonstrate that TARL achieves both high reliability and early detection. A case study further showcases TARL's explainable detection process, highlighting its potential as an AI-driven tool to assist clinicians in recognizing early signs of patient deterioration.
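For readers unfamiliar with shapelets, the following minimal sketch shows the standard way a shapelet is matched against a time series: slide it over the series and keep the smallest Euclidean distance to any window. It illustrates only this generic building block, not TARL's knowledge graph or embedding, and it assumes a series without missing values. The function name `shapelet_distance` and the sample heart rate values are hypothetical.

```python
import math
from typing import Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any
    contiguous window of the same length in the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, d)
    return best


# Hypothetical heart rate readings (beats per minute) and a "sudden rise" shapelet.
heart_rate = [60.0, 62.0, 90.0, 95.0, 61.0]
print(shapelet_distance(heart_rate, [90.0, 95.0]))  # 0.0, the pattern occurs exactly
print(shapelet_distance([1.0, 2.0, 3.0], [2.0, 4.0]))  # 1.0, closest window is [2.0, 3.0]
```

A small distance means the pattern appears somewhere in the series, which is what makes shapelets usable as interpretable features.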
DeCo: Defect-Aware Modeling with Contrasting Matching for Optimizing Task Assignment in Online IC Testing
Ting, Lo Pang-Yun, Chiang, Yu-Hao, Tsai, Yi-Tung, Lai, Hsu-Chao, Chuang, Kun-Ta
In the semiconductor industry, integrated circuit (IC) processes play a vital role, as the rising complexity and market expectations necessitate improvements in yield. Identifying IC defects and assigning IC testing tasks to the right engineers improves efficiency and reduces losses. While current studies emphasize fault localization or defect classification, they overlook the integration of defect characteristics, historical failures, and the insights from engineer expertise, which restrains their effectiveness in improving IC handling. To leverage AI for these challenges, we propose DeCo, an innovative approach for optimizing task assignment in IC testing. DeCo constructs a novel defect-aware graph from IC testing reports, capturing co-failure relationships to enhance defect differentiation, even with scarce defect data. Additionally, it formulates defect-aware representations for engineers and tasks, reinforced by local and global structure modeling on the defect-aware graph. Finally, a contrasting-based assignment mechanism pairs testing tasks with QA engineers by considering their skill level and current workload, thus promoting an equitable and efficient job dispatch. Experiments on a real-world dataset demonstrate that DeCo achieves the highest task-handling success rates in different scenarios, exceeding 80%, while also maintaining balanced workloads on both scarce and expanded defect data. Moreover, case studies reveal that DeCo can assign tasks to potentially capable engineers, even for defects unfamiliar to them, highlighting its potential as an AI-driven solution for real-world IC failure analysis and task handling.