Vision


Amnesia as a Catalyst for Enhancing Black Box Pixel Attacks in Image Classification and Object Detection

Neural Information Processing Systems

It is well known that query-based attacks tend to have relatively higher success rates in adversarial black-box attacks. While research on black-box attacks is actively being conducted, relatively few studies have focused on pixel attacks that target only a limited number of pixels. In image classification, query-based pixel attacks often rely on patches, which heavily depend on randomness and neglect the fact that scattered pixels are more suitable for adversarial attacks. Moreover, to the best of our knowledge, query-based pixel attacks have not been explored in the field of object detection. To address these issues, we propose a novel pixel-based black-box attack called Remember and Forget Pixel Attack using Reinforcement Learning (RFPAR), consisting of two main components: the Remember and Forget processes. RFPAR mitigates randomness and avoids patch dependency by leveraging rewards generated through a one-step RL algorithm to perturb pixels. RFPAR effectively creates perturbed images that minimize the confidence scores while adhering to limited pixel constraints.
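The abstract describes the core loop at a high level: query the black-box model, perturb a small set of scattered pixels, and keep or discard each change based on whether it lowers the model's confidence. The sketch below illustrates that general remember/forget idea under a limited-pixel budget; it is not the authors' RFPAR algorithm (which derives rewards from a one-step RL agent rather than random search), and `query_confidence` is a hypothetical black-box scoring function.

```python
# Minimal sketch of a query-based, limited-pixel black-box attack loop.
# Not the RFPAR algorithm: it only illustrates perturbing scattered pixels
# to lower a model's confidence, keeping helpful changes ("remember") and
# reverting unhelpful ones ("forget"). `query_confidence` is a hypothetical
# black-box that returns the classifier's confidence for the true label.
import numpy as np

def pixel_attack(image, query_confidence, max_pixels=10, queries=500, rng=None):
    rng = rng or np.random.default_rng(0)
    h, w, c = image.shape
    adv = image.copy()
    best = query_confidence(adv)
    perturbed = set()                              # pixels changed so far (budget)
    for _ in range(queries):
        y, x = rng.integers(h), rng.integers(w)
        if (y, x) not in perturbed and len(perturbed) >= max_pixels:
            continue                               # respect the limited-pixel constraint
        old = adv[y, x].copy()
        adv[y, x] = rng.integers(0, 256, size=c)   # candidate perturbation
        score = query_confidence(adv)
        if score < best:                           # "remember": keep a helpful change
            best = score
            perturbed.add((y, x))
        else:                                      # "forget": revert an unhelpful one
            adv[y, x] = old
    return adv, best
```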


Weakly Supervised 3D Open-vocabulary Segmentation

Neural Information Processing Systems

Open-vocabulary segmentation of 3D scenes is a fundamental function of human perception and thus a crucial objective in computer vision research. However, this task is heavily impeded by the lack of large-scale and diverse 3D open-vocabulary segmentation datasets for training robust and generalizable models. Distilling knowledge from pre-trained 2D open-vocabulary segmentation models helps but it compromises the open-vocabulary feature as the 2D models are mostly finetuned with close-vocabulary datasets. We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner. Specifically, given only the open-vocabulary text descriptions of the objects in a scene, we distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF), which effectively lifts 2D features into view-consistent 3D segmentation.
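As a rough illustration of the open-vocabulary step, the sketch below assigns each distilled 3D feature to the most similar text embedding by cosine similarity, assuming the distillation has already placed per-point features in the CLIP embedding space. The arrays `point_features` and `text_features` are hypothetical placeholders; this is not the paper's full CLIP/DINO-to-NeRF pipeline.

```python
# Minimal sketch of open-vocabulary label assignment, assuming the distilled
# per-point features live in the same embedding space as the CLIP text
# encoder. Inputs are hypothetical precomputed arrays, not the paper's pipeline.
import numpy as np

def assign_open_vocab_labels(point_features, text_features):
    """point_features: (N, D) distilled 3D features; text_features: (K, D)
    CLIP text embeddings of the K open-vocabulary class descriptions."""
    p = point_features / np.linalg.norm(point_features, axis=1, keepdims=True)
    t = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    sim = p @ t.T                       # cosine similarity, shape (N, K)
    return sim.argmax(axis=1)           # most similar class index per 3D point

# Example with random placeholders (shapes only).
rng = np.random.default_rng(0)
labels = assign_open_vocab_labels(rng.normal(size=(1000, 512)),
                                  rng.normal(size=(5, 512)))
```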


When it comes to crime, you can't algorithm your way to safety

New Scientist

The UK government's proposed AI-powered crime prediction tool, designed to flag individuals deemed "high risk" for future violence based on personal data like mental health history and addiction, marks a provocative new frontier. Elsewhere, Argentina's new Artificial Intelligence Unit for Security intends to use machine learning for crime prediction and real-time surveillance. And in some US cities, AI facial recognition is paired with street surveillance to track suspects. The promise of anticipating violence Minority Report-style is compelling.


A Unified Sequence Interface for Vision Tasks

Neural Information Processing Systems

While language tasks are naturally expressed in a single, unified, modeling framework, i.e., generating sequences of tokens, this has not been the case in computer vision. As a result, there is a proliferation of distinct architectures and loss functions for different vision tasks. In this work we show that a diverse set of "core" computer vision tasks can also be unified if formulated in terms of a shared pixel-to-sequence interface. We focus on four tasks, namely, object detection, instance segmentation, keypoint detection, and image captioning, all with diverse types of outputs, e.g., bounding boxes or dense masks. Despite that, by formulating the output of each task as a sequence of discrete tokens with a unified interface, we show that one can train a neural network with a single model architecture and loss function on all these tasks, with no task-specific customization.
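As a rough illustration of such a pixel-to-sequence interface, the sketch below tokenizes one detection output (a bounding box plus a class label) into a short sequence of discrete tokens by quantizing coordinates into bins. The bin count and vocabulary layout are illustrative assumptions, not the paper's exact tokenization.

```python
# Minimal sketch of casting a detection output as a discrete token sequence,
# in the spirit of a shared pixel-to-sequence interface. Bin count and
# vocabulary layout are illustrative assumptions.
def box_to_tokens(box, class_id, image_size, num_bins=1000):
    """box: (ymin, xmin, ymax, xmax) in pixels; returns 5 integer tokens:
    four quantized coordinates followed by a class token."""
    h, w = image_size
    scale = (h, w, h, w)
    coord_tokens = [
        min(num_bins - 1, int(v / s * num_bins)) for v, s in zip(box, scale)
    ]
    class_token = num_bins + class_id    # class ids live after the coordinate bins
    return coord_tokens + [class_token]

# Example: a 640x480 image with one box of class 3.
print(box_to_tokens((120, 200, 300, 440), class_id=3, image_size=(480, 640)))
```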


Police tech can sidestep facial recognition bans now

MIT Technology Review

Companies like Flock and Axon sell suites of sensors--cameras, license plate readers, gunshot detectors, drones--and then offer AI tools to make sense of that ocean of data (at last year's conference I saw schmoozing between countless AI-for-police startups and the chiefs they sell to on the expo floor). Departments say these technologies save time, ease officer shortages, and help cut down on response times. Those sound like fine goals, but this pace of adoption raises an obvious question: Who makes the rules here? When does the use of AI cross over from efficiency into surveillance, and what type of transparency is owed to the public? In some cases, AI-powered police tech is already driving a wedge between departments and the communities they serve.


The Download: a new form of AI surveillance, and the US and China's tariff deal

MIT Technology Review

Police and federal agencies have found a controversial new way to skirt the growing patchwork of laws that curb how they use facial recognition: an AI model that can track people based on attributes like body size, gender, hair color and style, clothing, and accessories. The tool, called Track and built by the video analytics company Veritone, is used by 400 customers, including state and local police departments and universities all over the US. It is also expanding federally. The product has drawn criticism from the American Civil Liberties Union, which--after learning of the tool through MIT Technology Review--said it was the first instance they'd seen of a nonbiometric tracking system used at scale in the US. How the largest gathering of US police chiefs is talking about AI.


How a new type of AI is helping police skirt facial recognition bans

MIT Technology Review

"The whole vision behind Track in the first place," says Veritone CEO Ryan Steelberg, was "if we're not allowed to track people's faces, how do we assist in trying to potentially identify criminals or malicious behavior or activity?" In addition to tracking individuals where facial recognition isn't legally allowed, Steelberg says, it allows for tracking when faces are obscured or not visible. The product has drawn criticism from the American Civil Liberties Union, which--after learning of the tool through MIT Technology Review--said it was the first instance they'd seen of a nonbiometric tracking system used at scale in the US. They warned that it raises many of the same privacy concerns as facial recognition but also introduces new ones at a time when the Trump administration is pushing federal agencies to ramp up monitoring of protesters, immigrants, and students. Veritone gave us a demonstration of Track in which it analyzed people in footage from different environments, ranging from the January 6 riots to subway stations.


ICE's Deportation Airline Hack Reveals Man 'Disappeared' to El Salvador

WIRED

A United States Customs and Border Protection request for information this week revealed the agency's plans to find vendors that can supply face recognition technology for capturing data on everyone entering the US in a vehicle like a car or van, not just the people sitting in the front seat. And a CBP spokesperson later told WIRED that the agency also has plans to expand its real-time face recognition capabilities at the border to detect people exiting the US as well--a focus that may be tied to the Trump administration's push to get undocumented people to "self-deport" and leave the US. WIRED also shed light this week on a recent CBP memo that rescinded a number of internal policies designed to protect vulnerable people--including pregnant women, infants, the elderly, and people with serious medical conditions--while in the agency's custody. Signed by acting commissioner Pete Flores, the order eliminates four Biden-era policies. Meanwhile, as the ripple effects of "SignalGate" continue, the communication app TeleMessage suspended "all services" pending an investigation after former US national security adviser Mike Waltz inadvertently called attention to the app, which subsequently suffered data breaches in recent days.


US Customs and Border Protection Plans to Photograph Everyone Exiting the US by Car

WIRED

United States Customs and Border Protection plans to log every person leaving the country by vehicle by taking photos at border crossings of every passenger and matching their faces to their passports, visas, or travel documents, WIRED has learned. The escalated documentation of travelers could be used to track how many people are self-deporting, or leaving the US voluntarily, which the Trump administration is fervently encouraging among people in the country illegally. CBP exclusively tells WIRED, in response to an inquiry to the agency, that it plans to mirror the current program it's developing--photographing every person entering the US and matching their faces with their travel documents--in the outbound lanes going to Canada and Mexico. The agency currently does not have a system that monitors people leaving the country by vehicle. "Although we are still working on how we would handle outbound vehicle lanes, we will ultimately expand to this area," CBP spokesperson Jessica Turner tells WIRED.


A murder victim addressed his killer in court thanks to AI resurrection

Mashable

And, as AI gets more advanced, so do the resurrections. Most recently, Stacey Wales used AI to generate a video of her late brother, Christopher Pelkey, to address the courtroom at the sentencing hearing for the man who killed him in a road rage incident in Chandler, Arizona. According to NPR, it's the first time AI has ever been used in this way. "He doesn't get a say. He doesn't get a chance to speak," Wales told NPR, referring to her brother.