caspi
Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-oriented Dialogue Systems
Feng, Yihao, Yang, Shentao, Zhang, Shujian, Zhang, Jianguo, Xiong, Caiming, Zhou, Mingyuan, Wang, Huan
When learning task-oriented dialogue (ToD) agents, reinforcement learning (RL) techniques can naturally be utilized to train dialogue strategies to achieve user-specific goals. Prior works mainly focus on adopting advanced RL techniques to train the ToD agents, while the design of the reward function is not well studied. This paper aims at answering the question of how to efficiently learn and leverage a reward function for training end-to-end (E2E) ToD agents. Specifically, we introduce two generalized objectives for reward-function learning, inspired by the classical learning-to-rank literature. Further, we utilize the learned reward function to guide the training of the E2E ToD agent. With the proposed techniques, we achieve competitive results on the E2E response-generation task on the Multiwoz 2.0 dataset. Source code and checkpoints are publicly released at https://github.com/Shentao-YANG/Fantastic_Reward_ICLR2023.
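As a rough illustration of what a learning-to-rank style objective for reward learning can look like, the sketch below uses a generic ListNet-style listwise loss. This is an assumption for illustration only, not the paper's actual objectives: the function names `softmax` and `listwise_reward_loss`, and the idea of comparing summed predicted rewards per dialogue against an externally given dialogue-level score, are hypothetical choices made here.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listwise_reward_loss(predicted_returns, trajectory_scores):
    """Cross-entropy between two softmax distributions (ListNet-style).

    predicted_returns: summed learned reward for each dialogue (hypothetical).
    trajectory_scores: a dialogue-level quality score for each dialogue
                       (hypothetical, e.g. a task-success metric).
    The loss is smallest when the predicted returns induce the same
    distribution over dialogues as the scores do.
    """
    target = softmax(trajectory_scores)
    predicted = softmax(predicted_returns)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


if __name__ == "__main__":
    scores = [3.0, 2.0, 1.0]
    print(listwise_reward_loss([3.0, 2.0, 1.0], scores))  # ordering agrees
    print(listwise_reward_loss([1.0, 2.0, 3.0], scores))  # ordering reversed
```

In this sketch, predicted returns that rank the dialogues in the same order as the scores yield a lower loss than predicted returns that reverse the order, which is the behaviour a ranking-based reward objective is meant to encourage.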
Applying The Power Of Deep Learning To Cybersecurity
Deep Instinct applies deep learning to cybersecurity--going beyond what machine learning can accomplish with a neural network designed to emulate the human brain and learn as it goes. Cyber attacks are not a new issue by any stretch of the imagination--but they are a rapidly growing threat. As the volume and types of technologies businesses and consumers use continue to expand, the attack surface--the configuration errors, vulnerabilities, human errors, or other weaknesses that increase the potential for a successful cyber attack--increases exponentially. To keep pace with the threat landscape, organizations need to rethink their approach to security. According to AVTest, there are more than 18,000 new malware and/or potentially unwanted applications identified every hour.
Deep Learning Is Our Best Hope for Cybersecurity, Deep Instinct Says
Thanks to the exponential growth of malware, traditional heuristics-based detection regimes have been overwhelmed, leaving computers at risk. Machine learning approaches can help, but the bottleneck presented by the feature engineering step is a potential dealbreaker. The best path forward at this point is deep learning, says the CEO of Deep Instinct, which claims to have taken an early lead in the emerging field. Ten years ago, the cybersecurity industry faced a dilemma. The volume of malware was exploding, with tens of thousands of new types discovered every day.
Causal-aware Safe Policy Improvement for Task-oriented dialogue
Ramachandran, Govardana Sachithanandam, Hashimoto, Kazuma, Xiong, Caiming
The recent success of reinforcement learning (RL) in solving complex tasks is most often attributed to its capacity to explore and exploit the environment in which it is trained. Sample efficiency is usually not an issue, since cheap simulators are available to sample data on-policy. Task-oriented dialogues, on the other hand, are usually learnt from offline data collected from human demonstrations. Collecting diverse demonstrations and annotating them is expensive. Unfortunately, RL methods trained on off-policy data are prone to issues of bias and generalization, which are further exacerbated by the stochasticity of human responses and the non-Markovian belief state of a dialogue management system. To this end, we propose a batch RL framework for task-oriented dialogue policy learning: causal-aware safe policy improvement (CASPI). This method gives guarantees on the dialogue policy's performance and also learns to shape rewards according to the intentions behind human responses, rather than just mimicking demonstration data; this, coupled with batch RL, improves the overall sample efficiency of the framework. We demonstrate the effectiveness of this framework on the dialogue-context-to-text generation and end-to-end dialogue tasks of the Multiwoz2.0 dataset. The proposed method outperforms the current state of the art on these metrics in both cases. In the end-to-end case, our method trained on only 10% of the data was able to outperform the current state of the art in three out of four evaluation metrics.
Deep Instinct Eyes Deep Learning Cybersecurity (PYMNTS.com)
Machine learning is perhaps the hottest buzzword in cybersecurity today. Cybersecurity firms deploy the artificial intelligence technology in an effort to keep pace with the evolution of cyberattacks, as machine learning algorithms are able to improve their predictions the more they are used. But according to Guy Caspi, CEO of cybersecurity company Deep Instinct, machine learning is no longer enough in an age of unprecedented evolution and volume of cybercrime. G DATA researchers recently found that last year a new malware specimen surfaced every 4.6 seconds. In the first quarter of 2017, that interval shrank to 4.2 seconds, meaning millions of new malware specimens surface every year.
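The "millions every year" figure can be checked with simple arithmetic. The sketch below assumes a 365-day year and a constant interval between new specimens, which is a simplification; the helper name `specimens_per_year` is hypothetical.

```python
# Seconds in a 365-day year (leap years ignored for simplicity).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def specimens_per_year(interval_seconds):
    """New specimens per year if one appears every `interval_seconds`."""
    return SECONDS_PER_YEAR / interval_seconds


if __name__ == "__main__":
    for interval in (4.6, 4.2):
        count = specimens_per_year(interval)
        print(f"one every {interval} s -> about {count / 1e6:.1f} million per year")
```

Under these assumptions, an interval of 4.6 seconds corresponds to roughly 6.9 million specimens per year, and 4.2 seconds to roughly 7.5 million.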