AITopics

2410.17445

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceOct-4-2024

Large Language Models can be Strong Self-Detoxifiers

Ko, Ching-Yun, Chen, Pin-Yu, Das, Payel, Mroueh, Youssef, Dan, Soham, Kollias, Georgios, Chaudhury, Subhajit, Pedapati, Tejaswini, Daniel, Luca

This paper contains examples that may be considered offensive and inappropriate. Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning the LLM using self-generated data to influence the outcome. In this paper, we show that LLMs have the capability of self-detoxification without the use of an additional reward model or re-training. We propose Self-disciplined Autoregressive Sampling (SASA), a lightweight controlled decoding algorithm for toxicity reduction of LLMs. SASA leverages the contextual representations from an LLM to learn linear subspaces characterizing toxic v.s. When auto-completing a response token-by-token, SASA dynamically tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy. Recent advancements in large language models (LLMs) have dramatically enhanced their capabilities in textual understanding and reasoning (Brown et al., 2020; Kojima et al., 2022). Their capabilities in performing diverse linguistic tasks and producing coherent texts have catalyzed their adoption across a variety of applications (Rae et al., 2021; Hoffmann et al., 2022; Le Scao et al., 2023; Touvron et al., 2023a;b; Achiam et al., 2023). However, with the escalating size of models (Raffel et al., 2020; Brown et al., 2020; Achiam et al., 2023), there is a corresponding increase in the scale of the training datasets required to avert overfitting and to encapsulate extensive world knowledge. These extensive datasets, predominantly derived from internet crawls and merely subjected to basic filtering protocols (Raffel et al., 2020), often harbor biases that are problematic or directly detrimental for many applications and may not inherently align with these desirable attributes (Wallace et al., 2019; Gehman et al., 2020). In fact, it is known that language models trained on such data may not only mimic but also amplify these biases (Bolukbasi et al., 2016; Caliskan et al., 2017; Zhao et al., 2018; Sheng et al., 2019; Gehman et al., 2020; Hartvigsen et al., For example, an "aligned" LLM may be inadvertently or maliciously tricked into generating harmful or toxic output that causes usage violations and safety concerns (Sun et al., 2024).

large language model, machine learning, natural language, (18 more...)

2410.03818

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-20-2023

One step closer to unbiased aleatoric uncertainty estimation

Zhang, Wang, Ma, Ziwen, Das, Subhro, Weng, Tsui-Wei, Megretski, Alexandre, Daniel, Luca, Nguyen, Lam M.

Neural networks are powerful tools in various applications, and quantifying their uncertainty is crucial for reliable decision-making. In the deep learning field, the uncertainties are usually categorized into aleatoric (data) and epistemic (model) uncertainty. In this paper, we point out that the existing popular variance attenuation method highly overestimates aleatoric uncertainty. To address this issue, we propose a new estimation method by actively de-noising the observed data. By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.

artificial intelligence, machine learning, noise, (17 more...)

2312.10469

Country:

North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceDec-20-2023

PIFON-EPT: MR-Based Electrical Property Tomography Using Physics-Informed Fourier Networks

Yu, Xinling, Serrallés, José E. C., Giannakopoulos, Ilias I., Liu, Ziyue, Daniel, Luca, Lattanzi, Riccardo, Zhang, Zheng

We propose Physics-Informed Fourier Networks for Electrical Properties (EP) Tomography (PIFON-EPT), a novel deep learning-based method for EP reconstruction using noisy and/or incomplete magnetic resonance (MR) measurements. Our approach leverages the Helmholtz equation to constrain two networks, responsible for the denoising and completion of the transmit fields, and the estimation of the object's EP, respectively. We embed a random Fourier features mapping into our networks to enable efficient learning of high-frequency details encoded in the transmit fields. We demonstrated the efficacy of PIFON-EPT through several simulated experiments at 3 and 7 tesla (T) MR imaging, and showed that our method can reconstruct physically consistent EP and transmit fields. Specifically, when only $20\%$ of the noisy measured fields were used as inputs, PIFON-EPT reconstructed the EP of a phantom with $\leq 5\%$ error, and denoised and completed the measurements with $\leq 1\%$ error. Additionally, we adapted PIFON-EPT to solve the generalized Helmholtz equation that accounts for gradients of EP between inhomogeneities. This yielded improved results at interfaces between different materials without explicit knowledge of boundary conditions. PIFON-EPT is the first method that can simultaneously reconstruct EP and transmit fields from incomplete noisy MR measurements, providing new opportunities for EPT research.

artificial intelligence, machine learning, pifon-ept, (17 more...)

2302.11883

Country: North America > United States > California > Santa Barbara County > Santa Barbara (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

arXiv.org Machine LearningOct-29-2023

Rare Event Probability Learning by Normalizing Flows

Gao, Zhenggqi, Zhang, Dinghuai, Daniel, Luca, Boning, Duane S.

A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow assisted importance sampling, termed NOFIS. NOFIS first learns a sequence of proposal distributions associated with predefined nested subset events by minimizing KL divergence losses. Next, it estimates the rare event probability by utilizing importance sampling in conjunction with the last proposal. The efficacy of our NOFIS method is substantiated through comprehensive qualitative visualizations, affirming the optimality of the learned proposal distribution, as well as a series of quantitative experiments encompassing $10$ distinct test cases, which highlight NOFIS's superiority over baseline approaches.

artificial intelligence, machine learning, probability, (16 more...)

2310.19167

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Artificial IntelligenceJul-19-2023

ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction

Zhang, Wang, Weng, Tsui-Wei, Das, Subhro, Megretski, Alexandre, Daniel, Luca, Nguyen, Lam M.

Deep neural networks (DNN) have shown great capacity of modeling a dynamical system; nevertheless, they usually do not obey physics constraints such as conservation laws. This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of the DNN based dynamics modeling to endow the invariant properties. ConCerNet consists of two steps: (i) a contrastive learning method to automatically capture the system invariants (i.e. conservation properties) along the trajectory observations; (ii) a neural projection layer to guarantee that the learned dynamics models preserve the learned invariants. We theoretically prove the functional relationship between the learned latent representation and the unknown system invariant function. Experiments show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics by a large margin. With neural network based parameterization and no dependence on prior knowledge, our method can be extended to complex and large-scale dynamics by leveraging an autoencoder.

artificial intelligence, concernet, machine learning, (17 more...)

2302.05783

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial IntelligenceJan-26-2023

Certified Interpretability Robustness for Class Activation Mapping

Gu, Alex, Weng, Tsui-Wei, Chen, Pin-Yu, Liu, Sijia, Daniel, Luca

Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems. Due to the prevalence of deep learning based perception models in autonomous vehicles, accurately interpreting their predictions is crucial. While a variety of such methods have been proposed, most are shown to lack robustness. Yet, little has been done to provide certificates for interpretability robustness. Taking a step in this direction, we present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of the top k pixels of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation not far from (4-5x) state-of-the-art attack methods.

artificial intelligence, machine learning, neural network, (17 more...)

2301.11324

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Machine LearningFeb-1-2021

Fast Training of Provably Robust Neural Networks by SingleProp

Boopathy, Akhilan, Weng, Tsui-Wei, Liu, Sijia, Chen, Pin-Yu, Zhang, Gaoyuan, Daniel, Luca

Recent works have developed several methods of defending neural networks against adversarial attacks with certified guarantees. However, these techniques can be computationally costly due to the use of certification during training. We develop a new regularizer that is both more efficient than existing certified defenses, requiring only one additional forward propagation through a network, and can be used to train networks with similar certified accuracy. Through experiments on MNIST and CIFAR-10 we demonstrate improvements in training speed and comparable certified accuracy compared to state-of-the-art certified defenses.

deep learning, ibp, neural network, (17 more...)

2102.01208

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningOct-21-2020

Proper Network Interpretability Helps Adversarial Robustness in Classification

Boopathy, Akhilan, Liu, Sijia, Zhang, Gaoyuan, Liu, Cynthia, Chen, Pin-Yu, Chang, Shiyu, Daniel, Luca

Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability (namely, making network interpretation maps visually similar), or interpretability is itself susceptible to adversarial attacks. In this paper, we theoretically show that with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. Spurred by that, we develop an interpretability-aware defensive scheme built only on promoting robust interpretation (without the need for resorting to adversarial loss minimization). We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation in particular.

artificial intelligence, interpretation discrepancy, neural network, (12 more...)

2006.14748

Country:

North America > United States > Massachusetts (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

arXiv.org Machine LearningOct-13-2020

Higher-Order Certification for Randomized Smoothing

Mohapatra, Jeet, Ko, Ching-Yun, Weng, Tsui-Wei, Chen, Pin-Yu, Liu, Sijia, Daniel, Luca

Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved SOTA provable robustness against $\ell_2$ perturbations. A number of publications have extended the guarantees to other metrics, such as $\ell_1$ or $\ell_\infty$, by using different smoothing measures. Although the current framework has been shown to yield near-optimal $\ell_p$ radii, the total safety region certified by the current framework can be arbitrarily small compared to the optimal. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. The theoretical contributions are as follows: 1) We generalize the certification for randomized smoothing by reformulating certified radius calculation as a nested optimization problem over a class of functions. 2) We provide a method to calculate the certified safety region using $0^{th}$-order and $1^{st}$-order information for Gaussian-smoothed classifiers. We also provide a framework that generalizes the calculation for certification using higher-order information. 3) We design efficient, high-confidence estimators for the relevant statistics of the first-order information. Combining the theoretical contribution 2) and 3) allows us to certify safety region that are significantly larger than the ones provided by the current methods. On CIFAR10 and Imagenet datasets, the new regions certified by our approach achieve significant improvements on general $\ell_1$ certified radii and on the $\ell_2$ certified radii for color-space attacks ($\ell_2$ restricted to 1 channel) while also achieving smaller improvements on the general $\ell_2$ certified radii. Our framework can also provide a way to circumvent the current impossibility results on achieving higher magnitude of certified radii without requiring the use of data-dependent smoothing techniques.

deep learning, neural network, null 2, (19 more...)

2010.06651

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Transportation (0.46)
Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)