


T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks

Neural Information Processing Systems

In recent years, fueled by the rapid advancement of diffusion models, text-to-video (T2V) generation models have achieved remarkable progress, with notable examples including Pika, Luma, Kling, and Open-Sora. Although these models exhibit impressive generative capabilities, they also expose significant security risks due to their vulnerability to jailbreak attacks, where the models are manipulated to produce unsafe content such as pornography, violence, or discrimination. Existing works such as T2VSafetyBench provide preliminary benchmarks for safety evaluation, but lack systematic methods for thoroughly exploring model vulnerabilities. To address this gap, we are the first to formalize the T2V jailbreak attack as a discrete optimization problem and propose a joint objective-based optimization framework, called T2V-OptJail. This framework has two key optimization goals: (1) bypassing the built-in safety filtering mechanisms to increase the attack success rate, and (2) preserving semantic consistency between the adversarial prompt and the unsafe input prompt, as well as between the generated video and the unsafe input prompt, to enhance content controllability. In addition, we introduce an iterative optimization strategy guided by prompt variants, where multiple semantically equivalent candidates are generated in each round, and their scores are aggregated to robustly guide the search toward optimal adversarial prompts. We conduct large-scale experiments on several T2V models, covering both open-source models (e.g., Open-Sora) and real commercial closed-source models (e.g., Pika, Luma, Kling). The experimental results show that the proposed method improves by 11.4% and 10.0% over the existing state-of-the-art (SoTA) method in terms of attack


Race on to establish globally recognised 'AI-free' logo

BBC News

Organisations worldwide are racing to develop a universally recognised label for human-made products and services as part of the growing backlash against AI use. Declarations like Proudly Human, Human-made, No A.I and AI-free are appearing across films, marketing, books and websites. It is in response to fears that jobs or entire professions are being swept away in a wave of AI-powered automation. BBC News has counted at least eight different initiatives trying to come up with a label that could get the kind of global recognition that the Fair Trade logo has for ethically made products. But with so many competing labels - as well as confusion over the definition of AI-free - experts say consumers are in danger of being left confused unless a single standard can be agreed on.



H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion

Neural Information Processing Systems

We present neural radiance fields for rendering and temporal (4D) reconstruction of humans in motion (H-NeRF), as captured by a sparse set of cameras or even from a monocular video. Our approach combines ideas from neural scene representation, novel-view synthesis, and implicit statistical geometric human representations, coupled using novel loss functions. Instead of learning a radiance field with a uniform occupancy prior, we constrain it by a structured implicit human body model, represented using signed distance functions. This allows us to robustly fuse information from sparse views and generalize well beyond the poses or views observed in training. Moreover, we apply geometric constraints to co-learn the structure of the observed subject -- including both body and clothing -- and to regularize the radiance field to geometrically plausible solutions. Extensive experiments on multiple datasets demonstrate the robustness and the accuracy of our approach, its generalization capabilities significantly outside a small training set of poses and views, and statistical extrapolation beyond the observed shape.


Accenture CEO Julie Sweet on Trust in AI, Building New Workbenches, and Why Humans Are Here to Stay

TIME - Tech

Javed is a senior editor at TIME, based in the London bureau. How do you see your clients adopting AI and grappling with the rapid changes it is bringing? CEOs have identified that AI is simple to try and hard to scale, and that's why they come to Accenture. And you can see that in the explosive growth of our advanced AI practice over the past couple of years.


To Err Like Human: Affective Bias-Inspired Measures for Visual Emotion Recognition Evaluation

Neural Information Processing Systems

Accuracy is a commonly adopted performance metric in various classification tasks, which measures the proportion of correctly classified samples among all samples. It assumes equal importance for all classes, hence equal severity for misclassifications. However, in the task of emotional classification, due to the psychological similarities between emotions, misclassifying a certain emotion into one class may be more severe than another, e.g., misclassifying 'excitement' as 'anger' apparently is more severe than as 'awe'. Albeit highly meaningful for many applications, metrics capable of measuring these cases of misclassifications in visual emotion recognition tasks have yet to be explored. In this paper, based on Mikel's emotion wheel from psychology, we propose a novel approach for evaluating the performance in visual emotion recognition, which takes into account the distance on the emotion wheel between different emotions to mimic the psychological nuances of emotions. Experimental results in semi-supervised learning on emotion recognition and a user study have shown that our proposed metrics are more effective than accuracy for assessing performance and conform to the cognitive laws of human emotions.
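As a rough illustration of the idea of weighting mistakes by distance on an emotion wheel, here is a minimal Python sketch. It is not the metric proposed in the paper: the ordering of the eight emotions in `WHEEL`, the function names, and the linear `1 - d / d_max` scoring rule are all assumptions made for this example, chosen only so that 'excitement' sits next to 'awe' and far from 'anger', as in the abstract's example.

```python
# Hypothetical ordering of eight emotions around a wheel (an assumption for
# illustration; the abstract does not give the actual layout).
WHEEL = [
    "amusement", "contentment", "awe", "excitement",
    "fear", "sadness", "disgust", "anger",
]


def wheel_distance(a: str, b: str) -> int:
    """Number of steps between two emotions along the shorter arc of the wheel."""
    i, j = WHEEL.index(a), WHEEL.index(b)
    d = abs(i - j)
    return min(d, len(WHEEL) - d)


def distance_weighted_score(y_true: list[str], y_pred: list[str]) -> float:
    """Mean per-sample credit of 1 - d / d_max (assumed scoring rule).

    A correct prediction earns 1.0, a prediction on the opposite side of the
    wheel earns 0.0, and near misses earn partial credit.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    d_max = len(WHEEL) // 2  # largest possible distance on the wheel
    credits = [1 - wheel_distance(t, p) / d_max for t, p in zip(y_true, y_pred)]
    return sum(credits) / len(credits)
```

Under this assumed layout, plain accuracy scores both mistakes in the abstract's example as 0, while the sketch gives partial credit to predicting 'awe' for 'excitement' and none to predicting 'anger'.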


Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation

Neural Information Processing Systems

Monocular depth estimation (MDE) is fundamental for deriving 3D scene structures from 2D images. While state-of-the-art monocular relative depth estimation (MRDE) excels in estimating relative depths for in-the-wild images, current monocular metric depth estimation (MMDE) approaches still face challenges in handling unseen scenes. Since MMDE can be viewed as the composition of MRDE and metric scale recovery, we attribute this difficulty to scene dependency, where MMDE models rely on scenes observed during supervised training for predicting scene scales during inference. To address this issue, we propose to use humans as landmarks for distilling scene-independent metric scale priors from generative painting models. Specifically, Metric from Human (MfH) generates humans on the input image with generative painting and estimates human dimensions with an off-the-shelf human mesh recovery (HMR) model.
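The view of MMDE as relative depth plus metric scale recovery can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MfH procedure: it assumes a single multiplicative scale (no shift term), takes the metric depths at landmark pixels as given, and uses a median of ratios for robustness. The function names `recover_scale` and `to_metric` are hypothetical.

```python
from statistics import median


def recover_scale(relative_depths: list[float], metric_depths: list[float]) -> float:
    """Estimate one scale s such that s * relative is close to metric.

    Both lists hold depths at the same landmark pixels (for example, pixels on
    a person whose metric size is known). The median of per-pixel ratios is
    used so that a few bad landmarks do not dominate the estimate.
    """
    ratios = [m / r for r, m in zip(relative_depths, metric_depths) if r > 0]
    if not ratios:
        raise ValueError("need at least one landmark with positive relative depth")
    return median(ratios)


def to_metric(relative_depth_map: list[list[float]], scale: float) -> list[list[float]]:
    """Apply the recovered scale to every pixel of a relative depth map."""
    return [[scale * d for d in row] for row in relative_depth_map]
```

For instance, `recover_scale([1.0, 2.0, 4.0], [2.0, 4.0, 80.0])` ignores the outlying third landmark and returns a scale of 2.0, which `to_metric` then applies to the whole map.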


Technical Report: Competition Solution For BetterMixture

arXiv.org Artificial Intelligence

In the era of flourishing large-scale models, the challenge of selecting and optimizing datasets from the vast and complex sea of data, to enhance the performance of large language models within the constraints of limited computational resources, has become paramount. This paper details our solution for the BetterMixture challenge, which focuses on the fine-tuning data mixing for large language models. Our approach, which secured third place, incorporates data deduplication, low-level and high-level quality filtering, and diversity selection. The foundation of our solution is Ke-Data-Juicer, an extension of Data-Juicer, demonstrating its robust capabilities in handling and optimizing data for large language models.
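The three stages named in the abstract (deduplication, quality filtering, diversity selection) can be illustrated with a small pure-Python sketch. It does not use or reproduce the Data-Juicer or Ke-Data-Juicer APIs; every function name, threshold, and heuristic below is an assumption chosen for brevity.

```python
import hashlib


def dedup(samples: list[str]) -> list[str]:
    """Drop exact duplicates after lowercasing and collapsing whitespace."""
    seen, kept = set(), []
    for text in samples:
        normalized = " ".join(text.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


def quality_ok(text: str, min_words: int = 3, max_repeat_ratio: float = 0.5) -> bool:
    """Toy low-level filter: reject very short or highly repetitive text."""
    words = text.split()
    if len(words) < min_words:
        return False
    most_common = max(words.count(w) for w in set(words))
    return most_common / len(words) <= max_repeat_ratio


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def select_diverse(samples: list[str], k: int) -> list[str]:
    """Greedily pick up to k samples, each least similar to those already chosen."""
    chosen, remaining = [], list(samples)
    while remaining and len(chosen) < k:
        best = min(
            remaining,
            key=lambda s: max((jaccard(s, c) for c in chosen), default=0.0),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

A pipeline in this style would call `dedup`, keep only samples where `quality_ok` returns true, and then pass the survivors to `select_diverse` with the desired budget `k`.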


It's Impossible for Machines To Think Like Humans

#artificialintelligence

There's a lot of hysteria around Generative AI (GAI) tools like ChatGPT, beyond the usual hype cycle of many technologies that have come to be in the world. There was even the case last year of the now former Google engineer who was convinced that an AI was, well, sentient. In human terms, this is absolutely impossible. This doesn't mean AI is terrible or that it can't do amazing things to help us. In fact, AI may be just the right technology humanity needs to survive our next phase of evolution. But there is no way, whatsoever, that AI can be in any way, shape or form, human.