Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks
Yi, Xin, Li, Yue, Wang, Linlin, Wang, Xiaoling, He, Liang
Ensuring safety alignment has become a critical requirement for large language models (LLMs), particularly given their widespread deployment in real-world applications. However, LLMs remain susceptible to jailbreak attacks, which exploit system vulnerabilities to bypass safety measures and generate harmful outputs. Although numerous defense mechanisms based on adversarial training have been proposed, a persistent challenge lies in the exacerbation of over-refusal behaviors, which compromise the overall utility of the model. To address these challenges, we propose a Latent-space Adversarial Training with Post-aware Calibration (LATPC) framework. During the adversarial training phase, LATPC compares harmful and harmless instructions in the latent space and extracts safety-critical dimensions to construct refusal features attack, precisely simulating agnostic jailbreak attack types requiring adversarial mitigation. At the inference stage, an embedding-level calibration mechanism is employed to alleviate over-refusal behaviors with minimal computational overhead. Experimental results demonstrate that, compared to various defense methods across five types of jailbreak attacks, LATPC framework achieves a superior balance between safety and utility. Moreover, our analysis underscores the effectiveness of extracting safety-critical dimensions from the latent space for constructing robust refusal feature attacks.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.93)
Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude
Yan, Yile, Zhu, Yuqi, Xu, Wentao
Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biases. This study investigates protected attributes in LLMs through systematic evaluation of their responses to ethical dilemmas. Using two prominent models - GPT-3.5 Turbo and Claude 3.5 Sonnet - we analyzed their decision-making patterns across multiple protected attributes including age, gender, race, appearance, and disability status. Through 11,200 experimental trials involving both single-factor and two-factor protected attribute combinations, we evaluated the models' ethical preferences, sensitivity, stability, and clustering of preferences. Our findings reveal significant protected attributeses in both models, with consistent preferences for certain features (e.g., "good-looking") and systematic neglect of others. Notably, while GPT-3.5 Turbo showed stronger preferences aligned with traditional power structures, Claude 3.5 Sonnet demonstrated more diverse protected attribute choices. We also found that ethical sensitivity significantly decreases in more complex scenarios involving multiple protected attributes. Additionally, linguistic referents heavily influence the models' ethical evaluations, as demonstrated by differing responses to racial descriptors (e.g., "Yellow" versus "Asian"). These findings highlight critical concerns about the potential impact of LLM biases in autonomous decision-making systems and emphasize the need for careful consideration of protected attributes in AI development. Our study contributes to the growing body of research on AI ethics by providing a systematic framework for evaluating protected attributes in LLMs' ethical decision-making capabilities.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- Health & Medicine (1.00)
- Government (0.68)
- Law > Civil Rights & Constitutional Law (0.67)
Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation
Wang, Dongjie, Huang, Yanyong, Ying, Wangyang, Bai, Haoyue, Gong, Nanxu, Wang, Xinyuan, Dong, Sixun, Zhe, Tao, Liu, Kunpeng, Xiao, Meng, Wang, Pengfei, Wang, Pengyang, Xiong, Hui, Fu, Yanjie
Tabular data is one of the most widely used formats across industries, driving critical applications in areas such as finance, healthcare, and marketing. In the era of data-centric AI, improving data quality and representation has become essential for enhancing model performance, particularly in applications centered around tabular data. This survey examines the key aspects of tabular data-centric AI, emphasizing feature selection and feature generation as essential techniques for data space refinement. We provide a systematic review of feature selection methods, which identify and retain the most relevant data attributes, and feature generation approaches, which create new features to simplify the capture of complex data patterns. This survey offers a comprehensive overview of current methodologies through an analysis of recent advancements, practical applications, and the strengths and limitations of these techniques. Finally, we outline open challenges and suggest future perspectives to inspire continued innovation in this field.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Asia > Macao (0.04)
- (12 more...)
- Research Report > Promising Solution (1.00)
- Overview (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.92)
A Tensor Low-Rank Approximation for Value Functions in Multi-Task Reinforcement Learning
Rozada, Sergio, Paternain, Santiago, Bazerque, Juan Andres, Marques, Antonio G.
In pursuit of reinforcement learning systems that could train in physical environments, we investigate multi-task approaches as a means to alleviate the need for massive data acquisition. In a tabular scenario where the Q-functions are collected across tasks, we model our learning problem as optimizing a higher order tensor structure. Recognizing that close-related tasks may require similar actions, our proposed method imposes a low-rank condition on this aggregated Q-tensor. The rationale behind this approach to multi-task learning is that the low-rank structure enforces the notion of similarity, without the need to explicitly prescribe which tasks are similar, but inferring this information from a reduced amount of data simultaneously with the stochastic optimization of the Q-tensor. The efficiency of our low-rank tensor approach to multi-task learning is demonstrated in two numerical experiments, first in a benchmark environment formed by a collection of inverted pendulums, and then into a practical scenario involving multiple wireless communication devices.
- Europe > Spain > Galicia > Madrid (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > Rensselaer County > Troy (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
An Integrated Platform for Studying Learning with Intelligent Tutoring Systems: CTAT+TutorShop
Aleven, Vincent, Borchers, Conrad, Huang, Yun, Nagashima, Tomohiro, McLaren, Bruce, Carvalho, Paulo, Popescu, Octav, Sewall, Jonathan, Koedinger, Kenneth
Intelligent tutoring systems (ITSs) are effective in helping students learn; further research could make them even more effective. Particularly desirable is research into how students learn with these systems, how these systems best support student learning, and what learning sciences principles are key in ITSs. CTAT+Tutorshop provides a full stack integrated platform that facilitates a complete research lifecycle with ITSs, which includes using ITS data to discover learner challenges, to identify opportunities for system improvements, and to conduct experimental studies. The platform includes authoring tools to support and accelerate development of ITS, which provide automatic data logging in a format compatible with DataShop, an independent site that supports the analysis of ed tech log data to study student learnings. Among the many technology platforms that exist to support learning sciences research, CTAT+Tutorshop may be the only one that offers researchers the possibility to author elements of ITSs, or whole ITSs, as part of designing studies. This platform has been used to develop and conduct an estimated 147 research studies which have run in a wide variety of laboratory and real-world educational settings, including K-12 and higher education, and have addressed a wide range of research questions. This paper presents five case studies of research conducted on the CTAT+Tutorshop platform, and summarizes what has been accomplished and what is possible for future researchers. We reflect on the distinctive elements of this platform that have made it so effective in facilitating a wide range of ITS research.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (10 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.87)
- Education > Educational Technology > Educational Software > Computer Based Training (1.00)
- Education > Educational Setting (1.00)
Region-wise stacking ensembles for estimating brain-age using MRI
Antonopoulos, Georgios, More, Shammi, Eickhoff, Simon B., Raimondo, Federico, Patil, Kaustubh R.
Predictive modeling using structural magnetic resonance imaging (MRI) data is a prominent approach to study brain-aging. Machine learning algorithms and feature extraction methods have been employed to improve predictions and explore healthy and accelerated aging e.g. neurodegenerative and psychiatric disorders. The high-dimensional MRI data pose challenges to building generalizable and interpretable models as well as for data privacy. Common practices are resampling or averaging voxels within predefined parcels, which reduces anatomical specificity and biological interpretability as voxels within a region may differently relate to aging. Effectively, naive fusion by averaging can result in information loss and reduced accuracy. We present a conceptually novel two-level stacking ensemble (SE) approach. The first level comprises regional models for predicting individuals' age based on voxel-wise information, fused by a second-level model yielding final predictions. Eight data fusion scenarios were explored using as input Gray matter volume (GMV) estimates from four datasets covering the adult lifespan. Performance, measured using mean absolute error (MAE), R2, correlation and prediction bias, showed that SE outperformed the region-wise averages. The best performance was obtained when first-level regional predictions were obtained as out-of-sample predictions on the application site with second-level models trained on independent and site-specific data (MAE=4.75 vs baseline regional mean GMV MAE=5.68). Performance improved as more datasets were used for training. First-level predictions showed improved and more robust aging signal providing new biological insights and enhanced data privacy. Overall, the SE improves accuracy compared to the baseline while preserving or enhancing data privacy.
- Europe > Germany > North Rhine-Westphalia > Düsseldorf Region > Düsseldorf (0.04)
- Europe > Monaco (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.69)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Dynamic Neural Representational Decoders for High-Resolution Semantic Segmentation
Semantic segmentation requires per-pixel prediction for a given image. Typically, the output resolution of a segmentation network is severely reduced due to the downsampling operations in the CNN backbone. Most previous methods employ upsampling decoders to recover the spatial resolution.Various decoders were designed in the literature. Here, we propose a novel decoder, termed dynamic neural representational decoder (NRD), which is simple yet significantly more efficient. As each location on the encoder's output corresponds to a local patch of the semantic labels, in this work, we represent these local patches of labels with compact neural networks.
Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best network within a functional neighborhood of the original network that includes a diverse set of candidate network structures. By using Taylor approximation, the optimal network structure in the neighborhood can be found with a greedy selection procedure. We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures that avoid catastrophic forgetting in continual learning. Empirically, firefly descent achieves promising results on both neural architecture search and continual learning.
Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions
With the advent and increasing consolidation of e-commerce, digital advertising has very recently replaced traditional advertising as the main marketing force in the economy. In the past four years, a particularly important development in the digital advertising industry is the shift from second-price auctions to first-price auctions for online display ads. This shift immediately motivated the intellectually challenging question of how to bid in first-price auctions, because unlike in second-price auctions, bidding one's private value truthfully is no longer optimal. Following a series of recent works in this area, we consider a differentiated setup: we do not make any assumption about other bidders' maximum bid (i.e. it can be adversarial over time), and instead assume that we have access to a hint that serves as a prediction of other bidders' maximum bid, where the prediction is learned through some blackbox machine learning model. We consider two types of hints: one where a single point-prediction is available, and the other where a hint interval (representing a type of confidence region into which others' maximum bid falls) is available.
Learning Large Graph Property Prediction via Graph Segment Training
Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first divides a large graph into segments and then backpropagates through only a few segments sampled per training iteration. We refine the GST paradigm by introducing a historical embedding table to efficiently obtain embeddings for segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques.