

Enhancing Jailbreak Attacks on LLMs via Persona Prompts

Zhang, Zheng, Zhao, Peilin, Ye, Deheng, Wang, Hao

arXiv.org Artificial Intelligence

Jailbreak attacks aim to exploit large language models (LLMs) by inducing them to generate harmful content, thereby revealing their vulnerabilities. Understanding and addressing these attacks is crucial for advancing the field of LLM safety. Previous jailbreak approaches have mainly focused on direct manipulations of harmful intent, with limited attention to the impact of persona prompts. In this study, we systematically explore the efficacy of persona prompts in compromising LLM defenses. We propose a genetic algorithm-based method that automatically crafts persona prompts to bypass LLMs' safety mechanisms. Our experiments reveal that: (1) our evolved persona prompts reduce refusal rates by 50-70% across multiple LLMs, and (2) these prompts demonstrate synergistic effects when combined with existing attack methods, increasing success rates by 10-20%. Our code and data are available at https://github.com/CjangCjengh/Generic_Persona.
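The genetic-algorithm idea above can be sketched generically: keep the fittest prompts, recombine and mutate the rest. This is a minimal illustrative loop, not the paper's actual method; the fitness function (e.g. one minus the refusal rate against a target model) and all parameter names are assumptions.

```python
import random

def evolve_persona_prompts(population, fitness_fn, generations=10,
                           mutation_rate=0.3, elite_frac=0.2):
    """Generic GA loop over text prompts. fitness_fn scores a prompt;
    in the jailbreak setting it might be 1 - refusal rate (hypothetical)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        n_elite = max(1, int(elite_frac * len(scored)))
        elite = scored[:n_elite]          # elitism: best prompts survive unchanged
        children = []
        while len(children) < len(population) - n_elite:
            a, b = random.sample(elite, 2) if n_elite > 1 else (elite[0], elite[0])
            ta, tb = a.split(), b.split()
            # single-point crossover on whitespace tokens
            cut = random.randint(1, min(len(ta), len(tb)) - 1) if min(len(ta), len(tb)) > 1 else 1
            child = " ".join(ta[:cut] + tb[cut:])
            if random.random() < mutation_rate:
                toks = child.split()
                toks[random.randrange(len(toks))] = random.choice(toks)
                child = " ".join(toks)
            children.append(child)
        population = elite + children
    return max(population, key=fitness_fn)
```

Because the elite prompts are carried over unchanged, the best fitness in the population never decreases across generations.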


Stacked Universal Successor Feature Approximators for Safety in Reinforcement Learning

Cannon, Ian, Garcia, Washington, Gresavage, Thomas, Saurine, Joseph, Leong, Ian, Culbertson, Jared

arXiv.org Artificial Intelligence

Real-world problems often involve complex objective structures that resist distillation into reinforcement learning environments with a single objective. Operation costs must be balanced with multi-dimensional task performance and end-states' effects on future availability, all while ensuring safety for other agents in the environment and the reinforcement learning agent itself. System redundancy through secondary backup controllers has proven to be an effective method to ensure safety in real-world applications where the risk of violating constraints is extremely high. In this work, we investigate the utility of a stacked, continuous-control variation of universal successor feature approximation (USFA) adapted for soft actor-critic (SAC) and coupled with a suite of secondary safety controllers, which we call stacked USFA for safety (SUSFAS). Our method improves performance on secondary objectives compared to SAC baselines using an intervening secondary controller such as a runtime assurance (RTA) controller.


Collision Avoidance and Geofencing for Fixed-wing Aircraft with Control Barrier Functions

Molnar, Tamas G., Kannan, Suresh K., Cunningham, James, Dunlap, Kyle, Hobbs, Kerianne L., Ames, Aaron D.

arXiv.org Artificial Intelligence

Safety-critical failures often have fatal consequences in aerospace control. Control systems on aircraft, therefore, must ensure the strict satisfaction of safety constraints, preferably with formal guarantees of safe behavior. This paper establishes the safety-critical control of fixed-wing aircraft in collision avoidance and geofencing tasks. A control framework is developed wherein a run-time assurance (RTA) system modulates the nominal flight controller of the aircraft whenever necessary to prevent it from colliding with other aircraft or crossing a boundary (geofence) in space. The RTA is formulated as a safety filter using control barrier functions (CBFs) with formal guarantees of safe behavior. CBFs are constructed and compared for a nonlinear kinematic fixed-wing aircraft model. The proposed CBF-based controllers showcase the capability of safely executing simultaneous collision avoidance and geofencing, as demonstrated by simulations on the kinematic model and a high-fidelity dynamical model.
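The core of a CBF-based safety filter is the condition L_f h(x) + L_g h(x) u + alpha * h(x) >= 0 on a barrier function h whose zero superlevel set is the safe set. As an illustrative sketch (not the paper's aircraft controllers), for a scalar control input the filter reduces to projecting the nominal input onto this half-space in closed form:

```python
def cbf_safety_filter(u_nom, h, lf_h, lg_h, alpha=1.0):
    """Minimal run-time assurance filter for a scalar control input.
    Enforces the CBF condition  lf_h + lg_h * u + alpha * h >= 0
    by minimally correcting the nominal input when it would violate it.
    (Sketch only; real RTA solves a QP over vector-valued inputs.)"""
    residual = lf_h + lg_h * u_nom + alpha * h
    if residual >= 0.0:
        return u_nom                     # nominal input already satisfies the CBF condition
    return u_nom - residual / lg_h       # minimal correction onto the constraint boundary
```

When the nominal controller is safe the filter is inactive, which is why such RTA systems only "modulate the nominal flight controller whenever necessary."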


Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

Miller, Kristina, Zeitler, Christopher K., Shen, William, Hobbs, Kerianne, Mitra, Sayan, Schierman, John, Viswanathan, Mahesh

arXiv.org Artificial Intelligence

A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup (or safety) controller. The relevant computational design problem is to create a logic that assures safety by switching to the safety controller as needed, while maximizing some performance criteria, such as the utilization of the untrusted controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning. It can guarantee safety and leverage machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.
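The switching logic described above can be sketched in simplex style: apply the untrusted controller only if, after one step under it, the backup controller can still keep the state safe over a short horizon. This is an illustrative simulation-based check under assumed interfaces (`step`, `is_safe`), not the optimal design the paper searches for:

```python
def rta_switch(state, experimental_ctrl, backup_ctrl, step, is_safe, horizon=20):
    """Simulation-based RTA switching sketch. Returns the control input to
    apply at `state`: the untrusted controller's if it is recoverable by
    the backup controller, otherwise the backup controller's."""
    probe = step(state, experimental_ctrl(state))   # one step under the untrusted controller
    for _ in range(horizon):
        if not is_safe(probe):
            return backup_ctrl(state)    # not recoverable: switch to the safety controller now
        probe = step(probe, backup_ctrl(probe))     # roll out the backup controller
    return experimental_ctrl(state)      # recoverably safe: keep using the untrusted controller
```

A conservative horizon or safety check is exactly what makes such designs "overly conservative": the untrusted controller gets rejected even in states the system could recover from.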


Membership Inference Attacks on DNNs using Adversarial Perturbations

Ali, Hassan, Qayyum, Adnan, Al-Fuqaha, Ala, Qadir, Junaid

arXiv.org Artificial Intelligence

Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on post-training MI attacks emphasizing high-confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and ImageNet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves the TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose a simple yet novel evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low-FPR region. We also show that AMIA and E-AMIA are more transferable to unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.
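One plausible reading of the TPR-at-FPR evaluation and the running TPR average can be sketched as follows; the paper's exact definition may differ, and the variable names here are assumptions (higher score means "predicted member"):

```python
import numpy as np

def tpr_at_fpr(scores_members, scores_nonmembers, fpr):
    """TPR when the decision threshold admits at most `fpr` of the
    non-members as false positives."""
    thresh = np.quantile(scores_nonmembers, 1.0 - fpr)
    return float(np.mean(np.asarray(scores_members) > thresh))

def running_tpr_average(scores_members, scores_nonmembers, fpr_max=0.01, n=100):
    """Running TPR average (RTA): mean TPR over FPRs in (0, fpr_max],
    summarizing the low-FPR region of the ROC curve in one number."""
    fprs = np.linspace(fpr_max / n, fpr_max, n)
    return float(np.mean([tpr_at_fpr(scores_members, scores_nonmembers, f)
                          for f in fprs]))
```

Averaging over the whole low-FPR region, rather than reading off a single point, is what lets the metric separate attacks whose ROC curves cross below 1% FPR.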


Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages

Aralikatte, Rahul, Cheng, Ziling, Doddapaneni, Sumanth, Cheung, Jackie Chi Kit

arXiv.org Artificial Intelligence

We present Vārta, a large-scale multilingual dataset for headline generation in Indic languages. This dataset includes 41.8 million news articles in 14 different Indic languages (and English), which come from a variety of high-quality sources. To the best of our knowledge, this is the largest collection of curated articles for Indic languages currently available. We use the data collected in a series of experiments to answer important questions related to Indic NLP and multilinguality research in general. We show that the dataset is challenging even for state-of-the-art abstractive models and that they perform only slightly better than extractive baselines. Owing to its size, we also show that the dataset can be used to pretrain strong language models that outperform competitive baselines in both NLU and NLG benchmarks.


MisRoBÆRTa: Transformers versus Misinformation

Truică, Ciprian-Octavian, Apostol, Elena-Simona

arXiv.org Artificial Intelligence

Misinformation is considered a threat to our democratic values and principles. The spread of such content on social media polarizes society and undermines public discourse by distorting public perceptions and generating social unrest while lacking the rigor of traditional journalism. Transformers and transfer learning proved to be state-of-the-art methods for multiple well-known natural language processing tasks. In this paper, we propose MisRoBÆRTa, a novel transformer-based deep neural ensemble architecture for misinformation detection. MisRoBÆRTa takes advantage of two transformers (BART & RoBERTa) to improve the classification performance. We also benchmarked and evaluated the performance of multiple transformers on the task of misinformation detection. For training and testing, we used a large real-world news articles dataset labeled with 10 classes, addressing two shortcomings in the current research: increasing the size of the dataset from small to large, and moving the focus of fake news detection from binary classification to multi-class classification. For this dataset, we manually verified the content of the news articles to ensure that they were correctly labeled. The experimental results show that the accuracy of transformers on the misinformation detection problem was significantly influenced by the method employed to learn the context, dataset size, and vocabulary dimension. We observe empirically that the best accuracy performance among the classification models that use only one transformer is obtained by BART, while DistilRoBERTa obtains the best accuracy in the least amount of time required for fine-tuning and training. The proposed MisRoBÆRTa outperforms the other transformer models in the task of misinformation detection. To arrive at this conclusion, we performed ample ablation and sensitivity testing with MisRoBÆRTa on two datasets.


Dubai Tram drivers monitored by artificial intelligence in safety drive

#artificialintelligence

Dubai's transport authority is trialling the use of artificial intelligence to monitor tram drivers. The Roads and Transport Authority on Sunday said the data collected could be used to cut accidents, prevent unsafe driving, show incident hot spots and enhance passenger safety. The system includes a smart device and an armband that tracks drivers' heart rates, speech patterns and reaction times to assess driving style, unsafe patterns and gestures based on profiles. The RTA said the data collected is then "processed from both incidents and routine operations to provide a comprehensive understanding of the individuals". "Transportation networks and their assets are widely known as critical infrastructure that require attention to detail and special protection," said Hassan Al Mutawa, director of rail operations at the RTA.


Ablation Study of How Run Time Assurance Impacts the Training and Performance of Reinforcement Learning Agents

Hamilton, Nathaniel, Dunlap, Kyle, Johnson, Taylor T, Hobbs, Kerianne L

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has become an increasingly important research area as the success of machine learning algorithms and methods grows. To combat the safety concerns surrounding the freedom given to RL agents while training, there has been an increase in work concerning Safe Reinforcement Learning (SRL). However, these new and safe methods have been held to less scrutiny than their unsafe counterparts. For instance, comparisons among safe methods often lack fair evaluation across similar initial condition bounds and hyperparameter settings, use poor evaluation metrics, and cherry-pick the best training runs rather than averaging over multiple random seeds. In this work, we conduct an ablation study using evaluation best practices to investigate the impact of run time assurance (RTA), which monitors the system state and intervenes to assure safety, on effective learning. By studying multiple RTA approaches in both on-policy and off-policy RL algorithms, we seek to understand which RTA methods are most effective, whether the agents become dependent on the RTA, and the importance of reward shaping versus safe exploration in RL agent training. Our conclusions shed light on the most promising directions of SRL, and our evaluation methodology lays the groundwork for creating better comparisons in future SRL work.
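The training setup such an ablation study compares can be sketched as an environment wrapper in which an RTA monitor filters each agent action before it reaches the environment and counts interventions; this is an illustrative gym-style sketch under assumed interfaces (`env.state`, `rta_filter`), not the paper's experimental code:

```python
class RTAWrapper:
    """Wraps an environment so a run time assurance filter screens every
    action. The intervention count makes it possible to measure whether
    the agent is learning to act safely or becoming dependent on the RTA."""

    def __init__(self, env, rta_filter):
        self.env = env
        self.rta_filter = rta_filter     # maps (state, action) -> safe action
        self.interventions = 0

    def step(self, action):
        safe_action = self.rta_filter(self.env.state, action)
        if safe_action != action:
            self.interventions += 1      # RTA overrode the agent this step
        return self.env.step(safe_action)
```

Tracking interventions per episode across seeds is one way to implement the "dependence on the RTA" comparison described above: an agent that internalizes safety should trigger fewer overrides as training progresses.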


RTA uses artificial intelligence, high-tech to improve bus services

#artificialintelligence

His Excellency Mattar Mohammed Al Tayer, Director-General, Chairman of the Board of Executive Directors of Roads and Transport Authority (RTA), revealed that RTA's precautionary measures and initiatives applied to the scheduling and the operation of public buses, marine transit means and taxis had accelerated the recovery from the Covid-19 pandemic. He stated that such measures contributed to restoring public transport ridership to 70% of pre-Covid-19 levels. They also contributed to reducing the number of kilometres travelled by 18%, improving bus on-time arrival by 6%, and cutting carbon emissions by 34 metric tons. "In cooperation with Alibaba Cloud, RTA has recently started trialling the 'City Brain' system to manage traffic in urban areas using artificial intelligence and advanced algorithms. The system analyses a massive volume of data received from nol cards, operating buses and taxis as well as the Enterprise Command and Control Centre. Then it converts the data into useful information that could be used in sending instant notifications and improving bus schedules and routes. The system is expected to improve bus ridership by 17%, the average waiting time by 10%, and the journey time and average bus usage by 5%," stated Al Tayer.