Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
Sishuo Chen, Wenkai Yang, Zhiyuan Zhang, Xiaohan Bi, Xu Sun
Natural language processing (NLP) models are known to be vulnerable to backdoor attacks, which pose a newly emerging threat. Prior online backdoor defense methods for NLP models focus only on anomalies at either the input or the output level, and still suffer from fragility to adaptive attacks and high computational cost. In this work, we take the first step toward investigating how poorly textual poisoned samples are concealed at the intermediate-feature level, and propose a feature-based efficient online defense method. Through extensive experiments on existing attack methods, we find that poisoned samples lie far away from clean samples in the intermediate feature space of a poisoned NLP model. Motivated by this observation, we devise a distance-based anomaly score (DAN) to distinguish poisoned samples from clean samples at the feature level. Experiments on sentiment analysis and offense detection tasks demonstrate the superiority of DAN: it substantially surpasses existing online defense methods in defending performance while enjoying lower inference costs. Moreover, we show that DAN is also resistant to adaptive attacks based on feature-level regularization. Our code is available at https://github.com/lancopku/DAN.
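The core idea — poisoned samples sit far from clean samples in feature space, so a distance-based score can flag them — can be sketched as follows. This is an illustrative reconstruction, not the paper's exact method: the function names, the use of per-class means with a shared covariance (a Mahalanobis-style estimator), and the regularization constant are all assumptions; see the linked repository for the actual implementation.

```python
import numpy as np

def fit_class_stats(features, labels):
    """Estimate per-class feature means and a shared precision matrix from
    clean validation features. (Illustrative sketch; the paper's exact
    estimator and any feature normalization may differ.)"""
    classes = np.unique(labels)
    means = {c: features[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([features[labels == c] - means[c] for c in classes])
    # Small diagonal term keeps the covariance invertible (assumed constant).
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    precision = np.linalg.inv(cov)
    return means, precision

def anomaly_score(x, means, precision):
    """Distance-based anomaly score for one intermediate-feature vector:
    the minimum Mahalanobis distance to any class mean. Samples far from
    every clean cluster (e.g. backdoor-triggered inputs) score higher."""
    dists = [float((x - mu) @ precision @ (x - mu)) for mu in means.values()]
    return min(dists)
```

At inference time, an input whose score exceeds a threshold calibrated on clean validation data would be rejected as likely poisoned; this thresholding step is also an assumption about how such a score would be deployed.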
Do explanations for data-based predictions actually increase users' trust in AI?
In recent years, many artificial intelligence (AI) and robotics researchers have been trying to develop systems that can provide explanations for their actions or predictions. The idea behind their work is that as AI systems become more widespread, explaining why they act in particular ways or why they made certain predictions could increase transparency and consequently users' trust in them. Researchers at Bretagne Atlantique Research Center in Rennes and the French National Center for Scientific Research in Toulouse have recently carried out a study that explores and questions this assumption, with the hope of better understanding how AI explainability may actually impact users' trust in AI. Their paper, published in Nature Machine Intelligence, argues that an AI system's explanations might not actually be as truthful or transparent as some users assume them to be. "This paper originates from our desire to explore an intuitive gap," Erwan Le Merrer and Gilles Trédan, two of the researchers who carried out the study, told TechXplore.
Exclusive Coinvision Interview with IPwe's CEO and CTO
Coinvision sat down with Erich Spangenberg and Dan Bork, respectively the CEO and the CTO of IPwe, a new venture creating a blockchain- and AI-enabled global patent market that has been attracting a lot of attention in recent months. Coinvision: Can you give us a short overview of IPwe? Maybe a little on how you came up with the business? Erich: IPwe is creating the patent asset class. We are using AI and blockchain to answer basic questions about patents – Do they exist?
Is Apple's HomePod failing?
A report from Bloomberg earlier this week claimed that Apple's HomePod isn't doing so well, and that the company cut orders for new hardware from suppliers. This might not shock some of you: Apple missed the all-important holiday buying season and is competing with less expensive hardware from Google, Sonos and Amazon. But is the first smart speaker with Siri already a failure, or does the HomePod simply need time to find its place? I'm not in any way surprised that the HomePod has fizzled, simply because it's a weird product with a very weird proposition. I seriously considered buying it in the run-up to its launch, but ultimately couldn't find a strong enough reason to plunk down $350.