Oceania
On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+($\lambda$,$\lambda$))-GA
Nguyen, Tai, Le, Phong, Biedenkapp, André, Doerr, Carola, Dang, Nguyen
Dynamic Algorithm Configuration (DAC) has garnered significant attention in recent years, particularly in the prevalence of machine learning and deep learning algorithms. Numerous studies have leveraged the robustness of decision-making in Reinforcement Learning (RL) to address the optimization challenges associated with algorithm configuration. However, making an RL agent work properly is a non-trivial task, especially in reward design, which necessitates a substantial amount of handcrafted knowledge based on domain expertise. In this work, we study the importance of reward design in the context of DAC via a case study on controlling the population size of the $(1+(\lambda,\lambda))$-GA optimizing OneMax. We observed that a poorly designed reward can hinder the RL agent's ability to learn an optimal policy because of a lack of exploration, leading to both scalability and learning divergence issues. To address those challenges, we propose the application of a reward shaping mechanism to facilitate enhanced exploration of the environment by the RL agent. Our work not only demonstrates the ability of RL in dynamically configuring the $(1+(\lambda,\lambda))$-GA, but also confirms the advantages of reward shaping in the scalability of RL agents across various sizes of OneMax problems.
Lightweight yet Efficient: An External Attentive Graph Convolutional Network with Positional Prompts for Sequential Recommendation
Zhang, Jinyu, Li, Chao, Zhao, Zhongying
Graph-based Sequential Recommender systems (GSRs) have gained significant research attention due to their ability to simultaneously handle user-item interactions and sequential relationships between items. Current GSRs often utilize composite or in-depth structures for graph encoding (e.g., the Graph Transformer). Nevertheless, they have high computational complexity, hindering the deployment on resource-constrained edge devices. Moreover, the relative position encoding in Graph Transformer has difficulty in considering the complicated positional dependencies within sequence. To this end, we propose an External Attentive Graph convolutional network with Positional prompts for Sequential recommendation, namely EA-GPS. Specifically, we first introduce an external attentive graph convolutional network that linearly measures the global associations among nodes via two external memory units. Then, we present a positional prompt-based decoder that explicitly treats the absolute item positions as external prompts. By introducing length-adaptive sequential masking and a soft attention network, such a decoder facilitates the model to capture the long-term positional dependencies and contextual relationships within sequences. Extensive experimental results on five real-world datasets demonstrate that the proposed EA-GPS outperforms the state-of-the-art methods. Remarkably, it achieves the superior performance while maintaining a smaller parameter size and lower training overhead. The implementation of this work is publicly available at https://github.com/ZZY-GraphMiningLab/EA-GPS.
Haiti police raid gang leader's stronghold in capital
Haiti police raid gang leader's stronghold in capital 3 hours agoShareSaveLeonardo RochaBBC World Service Americas regional editor Jaroslav LukivBBC NewsShareSaveReutersGang control in Port-au-Prince has led to an almost complete breakdown of law and order The government of Haiti says police have launched a large-scale operation in a shantytown controlled by powerful gang leader Jimmy Chérizier, who is widely known as Barbecue. The authorities say several gang members have been killed in the Lower Delmas area of the capital Port-au-Prince. Local reports say military drones carrying explosives are being used in the operation. He said it was the work of a special task force created two days ago to tackle insecurity.Reuters Jimmy'Barbecue' Chérizier has become one of the most powerful gang leaders in Haiti Chérizier, aged 47, is the feared leader of Viv Ansam (Live Together), a coalition of gangs that control much of the city. It is not clear whether Kenyan police officers deployed in Haiti last year to help fight the gangs are involved in the security operation.
Depth-Adaptive Graph Neural Networks via Learnable Bakry-'Emery Curvature
Hevapathige, Asela, Zehmakan, Ahad N., Wang, Qing
Graph Neural Networks (GNNs) have demonstrated strong representation learning capabilities for graph-based tasks. Recent advances on GNNs leverage geometric properties, such as curvature, to enhance its representation capabilities by modeling complex connectivity patterns and information flow within graphs. However, most existing approaches focus solely on discrete graph topology, overlooking diffusion dynamics and task-specific dependencies essential for effective learning. To address this, we propose integrating Bakry-\'Emery curvature, which captures both structural and task-driven aspects of information propagation. We develop an efficient, learnable approximation strategy, making curvature computation scalable for large graphs. Furthermore, we introduce an adaptive depth mechanism that dynamically adjusts message-passing layers per vertex based on its curvature, ensuring efficient propagation. Our theoretical analysis establishes a link between curvature and feature distinctiveness, showing that high-curvature vertices require fewer layers, while low-curvature ones benefit from deeper propagation. Extensive experiments on benchmark datasets validate the effectiveness of our approach, showing consistent performance improvements across diverse graph learning tasks.
Can AI Model the Complexities of Human Moral Decision-Making? A Qualitative Study of Kidney Allocation Decisions
Keswani, Vijay, Conitzer, Vincent, Sinnott-Armstrong, Walter, Nguyen, Breanna K., Heidari, Hoda, Borg, Jana Schaich
A growing body of work in Ethical AI attempts to capture human moral judgments through simple computational models. The key question we address in this work is whether such simple AI models capture {the critical} nuances of moral decision-making by focusing on the use case of kidney allocation. We conducted twenty interviews where participants explained their rationale for their judgments about who should receive a kidney. We observe participants: (a) value patients' morally-relevant attributes to different degrees; (b) use diverse decision-making processes, citing heuristics to reduce decision complexity; (c) can change their opinions; (d) sometimes lack confidence in their decisions (e.g., due to incomplete information); and (e) express enthusiasm and concern regarding AI assisting humans in kidney allocation decisions. Based on these findings, we discuss challenges of computationally modeling moral judgments {as a stand-in for human input}, highlight drawbacks of current approaches, and suggest future directions to address these issues.
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Arora, Siddhant, Lu, Zhiyun, Chiu, Chung-Cheng, Pang, Ruoming, Watanabe, Shinji
The recent wave of audio foundation models (FMs) could provide new capabilities for conversational modeling. However, there have been limited efforts to evaluate these audio FMs comprehensively on their ability to have natural and interactive conversations. To engage in meaningful conversation with the end user, we would want the FMs to additionally perform a fluent succession of turns without too much overlapping speech or long stretches of silence. Inspired by this, we ask whether the recently proposed audio FMs can understand, predict, and perform turn-taking events? To answer this, we propose a novel evaluation protocol that can assess spoken dialog system's turn-taking capabilities using a supervised model as a judge that has been trained to predict turn-taking events in human-human conversations. Using this protocol, we present the first comprehensive user study that evaluates existing spoken dialogue systems on their ability to perform turn-taking events and reveal many interesting insights, such as they sometimes do not understand when to speak up, can interrupt too aggressively and rarely backchannel. We further evaluate multiple open-source and proprietary audio FMs accessible through APIs on carefully curated test benchmarks from Switchboard to measure their ability to understand and predict turn-taking events and identify significant room for improvement. We will open source our evaluation platform to promote the development of advanced conversational AI systems.
One-shot In-context Part Segmentation
Dai, Zhenqi, Liu, Ting, Zhang, Xingxing, Wei, Yunchao, Zhang, Yanning
In this paper, we present the One-shot In-context Part Segmentation (OIParts) framework, designed to tackle the challenges of part segmentation by leveraging visual foundation models (VFMs). Existing training-based one-shot part segmentation methods that utilize VFMs encounter difficulties when faced with scenarios where the one-shot image and test image exhibit significant variance in appearance and perspective, or when the object in the test image is partially visible. We argue that training on the one-shot example often leads to overfitting, thereby compromising the model's generalization capability. Our framework offers a novel approach to part segmentation that is training-free, flexible, and data-efficient, requiring only a single in-context example for precise segmentation with superior generalization ability. By thoroughly exploring the complementary strengths of VFMs, specifically DINOv2 and Stable Diffusion, we introduce an adaptive channel selection approach by minimizing the intra-class distance for better exploiting these two features, thereby enhancing the discriminatory power of the extracted features for the fine-grained parts. We have achieved remarkable segmentation performance across diverse object categories. The OIParts framework not only eliminates the need for extensive labeled data but also demonstrates superior generalization ability. Through comprehensive experimentation on three benchmark datasets, we have demonstrated the superiority of our proposed method over existing part segmentation approaches in one-shot settings.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Donnelly, Jon, Guo, Zhicheng, Barnett, Alina Jade, McTavish, Hayden, Chen, Chaofan, Rudin, Cynthia
Interpretability is critical for machine learning models in high-stakes settings because it allows users to verify the model's reasoning. In computer vision, prototypical part models (ProtoPNets) have become the dominant model type to meet this need. Users can easily identify flaws in ProtoPNets, but fixing problems in a ProtoPNet requires slow, difficult retraining that is not guaranteed to resolve the issue. This problem is called the "interaction bottleneck." We solve the interaction bottleneck for ProtoPNets by simultaneously finding many equally good ProtoPNets (i.e., a draw from a "Rashomon set"). We show that our framework - called Proto-RSet - quickly produces many accurate, diverse ProtoPNets, allowing users to correct problems in real time while maintaining performance guarantees with respect to the training set. We demonstrate the utility of this method in two settings: 1) removing synthetic bias introduced to a bird identification model and 2) debugging a skin cancer identification model. This tool empowers non-machine-learning experts, such as clinicians or domain experts, to quickly refine and correct machine learning models without repeated retraining by machine learning experts.
Identity documents recognition and detection using semantic segmentation with convolutional neural network
Kozlenko, Mykola, Sendetskyi, Volodymyr, Simkiv, Oleksiy, Savchenko, Nazar, Bosyi, Andy
Object recognition and detection are well-studied problems with a developed set of almost standard solutions. Identity documents recognition, classification, detection, and localization are the tasks required in a number of applications, particularly, in physical access control security systems at critical infrastructure premises. In this paper, we propose the new original architecture of a model based on an artificial convolutional neural network and semantic segmentation approach for the recognition and detection of identity documents in images. The challenge with the processing of such images is the limited computational performance and the limited amount of memory when such an application is running on industrial oneboard microcomputer hardware. The aim of this research is to prove the feasibility of the proposed technique and to obtain quality metrics. The methodology of the research is to evaluate the deep learning detection model trained on the mobile identity document video dataset. The dataset contains five hundred video clips for fifty different identity document types. The numerical results from simulations are used to evaluate the quality metrics. We present the results as accuracy versus threshold of the intersection over union value. The paper reports an accuracy above 0.75 for the intersection over union (IoU) threshold value of 0.8. Besides, we assessed the size of the model and proved the feasibility of running the model on an industrial one-board microcomputer or smartphone hardware.
Riemannian Integrated Gradients: A Geometric View of Explainable AI
Costanza, Federico, Simpson, Lachlan
We introduce Riemannian Integrated Gradients (RIG); an extension of Integrated Gradients (IG) to Riemannian manif olds. We demonstrate that RIG restricts to IG when the Riemannian man ifold is Euclidean space. We show that feature attribution can be p hrased as an eigenvalue problem where attributions correspond to eig envalues of a symmetric endomorphism.