Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
Robustness has been extensively studied in reinforcement learning (RL) to handle various forms of uncertainty such as random perturbations, rare events, and malicious attacks. In this work, we consider one critical type of robustness against spurious correlation, where different portions of the state are not causally related but are correlated through unobserved confounders. These spurious correlations are ubiquitous in real-world tasks; for instance, a self-driving car usually observes heavy traffic in the daytime and light traffic at night due to unobservable human activity. A model that learns such useless or even harmful correlations could fail catastrophically when the confounder in the test case deviates from the training one. Although well motivated, enabling robustness against spurious correlation poses significant challenges since the uncertainty set, shaped by the unobserved confounder and causal structure, is difficult to characterize and identify. Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge. To solve this issue, we propose Robust State-Confounded Markov Decision Processes (RSC-MDPs) and theoretically demonstrate their superiority in avoiding learning spurious correlations compared with other robust RL counterparts. We also design an empirical algorithm to learn the robust optimal policy for RSC-MDPs, which outperforms all baselines in eight realistic self-driving and manipulation tasks.
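The daytime/traffic example in this abstract can be illustrated with a toy simulation. The sketch below is not the RSC-MDP algorithm from the paper; it only shows, under assumed numbers, how a rule learned from a confounded correlation degrades when the confounder's influence changes. The names sample, fit_rule, accuracy, and the coupling parameter are hypothetical and chosen for illustration.

```python
import random

def sample(n, coupling, rng):
    """Draw (daytime, heavy_traffic) pairs.

    A hidden confounder `activity` drives each observed variable with
    probability `coupling`; otherwise that variable is a fair coin flip.
    The confounder itself is never returned, so it stays unobserved.
    """
    data = []
    for _ in range(n):
        activity = rng.random() < 0.5
        daytime = activity if rng.random() < coupling else rng.random() < 0.5
        heavy = activity if rng.random() < coupling else rng.random() < 0.5
        data.append((daytime, heavy))
    return data

def fit_rule(data):
    """Learn the majority traffic label for each value of `daytime`."""
    rule = {}
    for day in (True, False):
        outcomes = [heavy for d, heavy in data if d == day]
        rule[day] = sum(outcomes) * 2 >= len(outcomes)
    return rule

def accuracy(rule, data):
    """Fraction of samples where the rule predicts the traffic label."""
    return sum(rule[d] == h for d, h in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = sample(5000, coupling=0.9, rng=rng)    # confounder links both variables
shifted = sample(5000, coupling=0.0, rng=rng)  # link removed at test time

rule = fit_rule(train)
train_acc = accuracy(rule, train)
shifted_acc = accuracy(rule, shifted)
print(rule, round(train_acc, 3), round(shifted_acc, 3))
```

In this toy setting the learned rule is "daytime implies heavy traffic". It scores well on the training distribution and falls to roughly chance once daytime and traffic are no longer tied together by the hidden variable.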
Seeing is Believing: Rich-Context Hallucination Detection for MLLMs via Backward Visual Grounding
Guo, Pinxue, Wu, Chongruo, Zhou, Xinyu, Hong, Lingyi, Chen, Zhaoyu, Li, Jinglun, Jiang, Kaixun, Cheung, Sen-ching Samson, Zhang, Wei, Zhang, Wenqiang
Multimodal Large Language Models (MLLMs) have unlocked powerful cross-modal capabilities, but still significantly suffer from hallucinations. As such, accurate detection of hallucinations in MLLMs is imperative for ensuring their reliability in practical applications. To this end, guided by the principle of "Seeing is Believing", we introduce VBackChecker, a novel reference-free hallucination detection framework that verifies the consistency of MLLM-generated responses with visual inputs, by leveraging a pixel-level Grounding LLM equipped with reasoning and referring segmentation capabilities. This reference-free framework not only effectively handles rich-context scenarios, but also offers interpretability. To facilitate this, an innovative pipeline is accordingly designed for generating instruction-tuning data (R-Instruct), featuring rich-context descriptions, grounding masks, and hard negative samples. We further establish R^2-HalBench, a new hallucination benchmark for MLLMs, which, unlike previous benchmarks, encompasses real-world, rich-context descriptions from 18 MLLMs with high-quality annotations, spanning diverse object-, attribute-, and relationship-level details. VBackChecker outperforms prior complex frameworks and achieves state-of-the-art performance on R^2-HalBench, even rivaling GPT-4o's capabilities in hallucination detection. It also surpasses prior methods in the pixel-level grounding task, achieving over a 10% improvement. All code, data, and models are available at https://github.com/PinxueGuo/VBackChecker.
Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation
Ding, Wenhao, Shi, Laixi, Chi, Yuejie, Zhao, Ding
Robustness has been extensively studied in reinforcement learning (RL) to handle various forms of uncertainty such as random perturbations, rare events, and malicious attacks. In this work, we consider one critical type of robustness against spurious correlation, where different portions of the state are not causally related but are correlated through unobserved confounders. These spurious correlations are ubiquitous in real-world tasks; for instance, a self-driving car usually observes heavy traffic in the daytime and light traffic at night due to unobservable human activity. A model that learns such useless or even harmful correlations could fail catastrophically when the confounder in the test case deviates from the training one. Although well motivated, enabling robustness against spurious correlation poses significant challenges since the uncertainty set, shaped by the unobserved confounder and causal structure, is difficult to characterize and identify. Existing robust algorithms that assume simple and unstructured uncertainty sets are therefore inadequate to address this challenge. To solve this issue, we propose Robust State-Confounded Markov Decision Processes (RSC-MDPs) and theoretically demonstrate their superiority in avoiding learning spurious correlations compared with other robust RL counterparts. We also design an empirical algorithm to learn the robust optimal policy for RSC-MDPs, which outperforms all baselines in eight realistic self-driving and manipulation tasks.
Seeing is Believing: Mastering Image Classification with Python
Image classification is a widely used technique in computer vision that involves categorizing images into one or more classes based on their visual features. It has many practical applications in fields such as object recognition, face detection, and autonomous vehicles. In this article, we will explore how to perform image classification using Python. Image classification is the process of assigning a label to an image based on its visual content. It involves analyzing the image's features, such as color, texture, and shape, and using these features to predict the image's class.
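As a minimal sketch of the idea described above (extract features from an image, then predict a class from them), the following uses only the Python standard library. It assumes images are nested lists of (r, g, b) tuples, uses a coarse color histogram as the only feature, and classifies by nearest class centroid. The function names and the tiny synthetic dataset are illustrative assumptions, not taken from the article; a practical system would typically use a dedicated library and a much larger dataset.

```python
from collections import Counter

def color_histogram(image, bins=4):
    """Return a normalized per-channel histogram for an image given as
    rows of (r, g, b) pixels with values in 0..255."""
    hist = [0.0] * (3 * bins)
    n_pixels = 0
    for row in image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                bucket = min(value * bins // 256, bins - 1)
                hist[channel * bins + bucket] += 1
            n_pixels += 1
    return [count / n_pixels for count in hist]

def train_centroids(images, labels):
    """Average the feature vectors of each class into one centroid."""
    sums, counts = {}, Counter()
    for image, label in zip(images, labels):
        feats = color_histogram(image)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(image, centroids):
    """Assign the label whose centroid is closest in squared distance."""
    feats = color_histogram(image)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def solid_image(color, size=4):
    """Build a synthetic size x size image filled with one color."""
    return [[color] * size for _ in range(size)]

# Tiny synthetic training set: two reddish and two bluish images.
train_images = [solid_image((200, 30, 30)), solid_image((230, 60, 40)),
                solid_image((20, 40, 210)), solid_image((50, 30, 240))]
train_labels = ["red", "red", "blue", "blue"]

centroids = train_centroids(train_images, train_labels)
prediction = classify(solid_image((210, 50, 50)), centroids)
print(prediction)  # prints "red"
```

Color is only one of the features mentioned above; texture and shape would need additional feature extractors, which this sketch leaves out.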
The Mistake Every Data Scientist Has Made at Least Once - KDnuggets
If you use a tool where it hasn't been verified safe, any mess you make is your fault… AI is a tool like any other, so the same rule applies. Instead, force machine learning and AI systems to earn your trust. If you want to teach with examples, the examples have to be good. If you want to trust your student's ability, the test has to be good. Always keep in mind that you don't know anything about the safety of your system outside the conditions you checked it in, so check it carefully!
Is Seeing Still Believing? Leveraging Deepfake Technology for Livestock Farming
Deepfake technologies are known for the creation of forged celebrity pornography, face and voice swaps, and other fake media content. Despite the negative connotations the technology bears, the underlying machine learning algorithms have a huge potential that could be applied to not just digital media, but also to medicine, biology, affective science, and agriculture, just to name a few. Due to the ability to generate big datasets based on real data distributions, deepfake could also be used to positively impact non-human animals such as livestock. Data generated using Generative Adversarial Networks, one of the algorithms that deepfake is based on, could be used to train models to accurately identify and monitor animal health and emotions. Through data augmentation, using digital twins, and maybe even displaying digital conspecifics (digital avatars or metaverse) where social interactions are enhanced, deepfake technologies have the potential to increase animal health, emotionality, sociality, animal-human and animal-computer interactions and thereby productivity, and sustainability of the farming industry. The interactive 3D avatars and the digital twins of farm animals enabled by deepfake technology offer a timely and essential way in the digital transformation toward exploring the subtle nuances of animal behavior and cognition in enhancing farm animal welfare. Without offering conclusive remarks, the presented mini review is exploratory in nature due to the nascent s...
Believing Your Self-Driving Car Or Your Lyin' Eyes
Will the AI driving your self-driving car see things the same way you do, or will you have lyin' eyes? You might be somewhat familiar with the expression "lyin' eyes", which can be used in a variety of interesting and useful ways. Especially popularized by a famous song that the Eagles brought to the world in the mid-1970s, the most straightforward meaning of lying eyes is that your eyes have the potential of giving away your true intent, in spite of your actions that might suggest some other purpose or goal. Lore has it that Don Henley and Glenn Frey of the Eagles were inspired to incorporate the expression as the title of their song in order to describe how some beautiful women were cheating on their husbands and that apparently those cheating women's lyin' eyes gave away their unfaithful efforts. There are, though, other variations to the meaning and use of lying eyes.
Seeing Isn't Believing: New AI May Tackle 'Manipulation of Reality' Amid Rising Threat of Deepfakes
Last year saw the rise of the threat of deepfakes – a technique used to combine and superimpose images and videos onto others using a machine learning algorithm, creating hyper-realistic but fake content. AI buffs have split into two major groups – one working to make such images and video more realistic, and another developing tools that would tell users whether a video has been manipulated or not. A team of researchers from the University of California at Riverside and the R&D firm Mayachitra have developed a novel deep-learning architecture that can detect content-changing manipulation. This is not the first study on the problem, but this neural network appears to have gone further in recognising deepfakes than its predecessors. Different manipulation techniques may create a convincing video for human eyes, but the algorithm is able to see minor distortions, such as shearing and compression. It exploits resampling features, a long short-term memory (LSTM) based network, and encoder-decoder architectures in order to analyse videos pixel by pixel, and is said to be capable of spotting whole patches of the footage that have been doctored.