Large Language Model
Exploring Euphemism Detection in Few-Shot and Zero-Shot Settings
Euphemisms are figures of speech which aim to soften the blow of certain words which may be too direct or too harsh (Magu and Luo, 2018; Felt and Riloff, 2020). In the EMNLP 2022 FigLang Workshop Euphemism Shared Task, participating teams are given a set of sentences with potentially euphemistic terms (PETs) enclosed in brackets, and the task is to classify whether or not the PET in a given sentence is used euphemistically. Compared to other figures of speech like similes (Chakrabarty et al., 2020) and metaphors (Chakrabarty et al., 2021), work on euphemisms has been limited. Recently, Gavidia et al. (2022); Lee et al. (2022) released a new dataset of diverse euphemisms and conducted analysis on automatically identifying potentially euphemistic terms. In the past, Felt and Riloff (2020) used sentiment analysis techniques to recognize euphemistic and dysphemistic phrases. Other studies also focused on ...
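The task setup described above (a sentence with a bracketed PET, to be labeled as euphemistic or not) can be sketched as a small preprocessing step for a zero-shot prompt. This is only an illustrative sketch: the use of square brackets, the prompt wording, and the helper names `extract_pets` and `build_zero_shot_prompt` are assumptions, not the shared task's official format or any team's method.

```python
import re
from typing import List

# Assumption: PETs are marked with square brackets, e.g. "[passed away]".
PET_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_pets(sentence: str) -> List[str]:
    """Return every bracketed term found in the sentence, in order."""
    return [match.strip() for match in PET_PATTERN.findall(sentence)]


def build_zero_shot_prompt(sentence: str) -> str:
    """Build a hypothetical yes/no prompt about the first bracketed term."""
    pets = extract_pets(sentence)
    if not pets:
        raise ValueError("sentence contains no bracketed term")
    return (
        f"Sentence: {sentence}\n"
        f"Is the term '{pets[0]}' used euphemistically in this sentence? "
        "Answer yes or no."
    )


if __name__ == "__main__":
    print(build_zero_shot_prompt("Her grandfather [passed away] last year."))
```

The prompt string would then be passed to whichever language model is being evaluated; that step is left out because it depends on the model and API in use.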
Diversity-boosted Generalization-Specialization Balancing for Zero-shot Learning
Li, Yun, Liu, Zhe, Chang, Xiaojun, McAuley, Julian, Yao, Lina
Zero-Shot Learning (ZSL) aims to transfer classification capability from seen to unseen classes. Recent methods have proved that generalization and specialization are two essential abilities to achieve good performance in ZSL. However, focusing on only one of the abilities may result in models that are either too general with degraded classification ability or too specialized to generalize to unseen classes. In this paper, we propose an end-to-end network, termed as BGSNet, which equips and balances generalization and specialization abilities at the instance and dataset level. Specifically, BGSNet consists of two branches: the Generalization Network (GNet), which applies episodic meta-learning to learn generalized knowledge, and the Balanced Specialization Network (BSNet), which adopts multiple attentive extractors to extract discriminative features and achieve instance-level balance. A novel self-adjusted diversity loss is designed to optimize BSNet with redundancy reduced and diversity boosted. We further propose a differentiable dataset-level balance and update the weights in a linear annealing schedule to simulate network pruning and thus obtain the optimal structure for BSNet with dataset-level balance achieved. Experiments on four benchmark datasets demonstrate our model's effectiveness. Sufficient component ablations prove the necessity of integrating and balancing generalization and specialization abilities.
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
Vu, Tu, Barua, Aditya, Lester, Brian, Cer, Daniel, Iyyer, Mohit, Constant, Noah
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models
Liu, Frederick, Huang, Terry, Lyu, Shihang, Shakeri, Siamak, Yu, Hongkun, Li, Jing
Pre-trained encoder-decoder transformer architectures have become increasingly popular recently with the advent of T5 models. T5 has also become more favorable over other architectures like BERT due to the amount of data that it is pre-trained on, increased scale of model parameter sizes and easy applicability to a diverse set of tasks due to the generative nature of the model. While being able to generalize to a wide variety of tasks, it is not clear that encoder-decoder architectures are the most efficient for fine-tuning tasks that don't require auto-regressive decoding. In this work, we study fine-tuning pre-trained encoder-decoder models for tasks such as classification, multi-label classification, and structured prediction. We propose EncT5, a framework for these problems, and illustrate instantiations for these tasks. Our experiment results show that EncT5 has advantages over T5 such as efficiency and usability, and outperforms BERT when evaluated on publicly available pre-trained checkpoints.
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
Dziri, Nouha, Kamalloo, Ehsan, Milton, Sivan, Zaiane, Osmar, Yu, Mo, Ponti, Edoardo M., Reddy, Siva
The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as a training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.
AI technology is not dark magic, it's just misunderstood
Most forms of technology applications are well understood. Every computer programme can be deconstructed into the basic building blocks of code, and if it goes wrong, you can debug the software, often by simply stepping through the code line by line in order to find out where the problem lies. Artificial Intelligence, or AI, is different. With the latest AI large language models we can't predict exactly what they will output, but they will do a good job at writing an article or creating poetry. What makes them human-like is the lack of predictable outcomes: humans simply aren't predictable!
Venture FOMO Returns as Investors Chase Artificial Intelligence Deals
Venture capitalists are shaking themselves out of a bear market slumber to chase deals in a pocket of artificial intelligence that's spilled into the mainstream this year: AI that generates art, videos and writing. Jasper AI, which last year started selling an AI-assisted writing tool, raised funding from Insight Partners at a $1.5 billion pre-investment valuation around June this year, according to two sources familiar with the talks. Startup valuations have fallen since then. But earlier this month hedge fund Coatue Management paid a higher price for new shares in the Austin, Tex.-based Jasper, which has rapidly increased its revenues using software developed by startup OpenAI. Meanwhile, Descript, a startup that uses AI for video and audio editing and was founded by Groupon co-founder Andrew Mason, has been in talks with OpenAI CEO Sam Altman and other investors to raise a new round, according to people familiar with the discussions.
Will artificial intelligence ever rival true human thinking?
The narrowness of AI will someday be replaced by artificial general intelligence. But will it have the capability to rival human intelligence and creativity? Some of the world's most advanced artificial intelligence (AI) systems, at least the ones the public hear about, are famous for beating human players at chess or poker. Other algorithms are known for their ability to learn how to recognize cats or their inability to recognize people with darker skin. But are current AI systems anything more than toys?
Digital transformation with Google Cloud
Alphabet's Google Cloud empowers organisations to digitally transform themselves into smarter businesses. Its diverse solutions include cloud computing, data analytics, and the latest artificial intelligence (AI) and machine learning tools. Last week, many of the platform's latest advances were shared at Next '22, Google Cloud's annual developer and tech conference about digital transformation in the cloud. We've partnered with Google Cloud over the last few years to apply our AI research for making a positive impact on core solutions used by their customers. Here, we introduce a few of these projects, including optimising document understanding, enhancing the value of wind energy, and offering easier use of AlphaFold.
Less is More: Summary of Long Instructions is Better for Program Synthesis
Kuznia, Kirby, Mishra, Swaroop, Parmar, Mihir, Baral, Chitta
Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on the larger and more complicated programming-related questions. We show that LMs benefit from the summarized version of complicated questions. Our findings show that superfluous information often present in problem descriptions, such as human characters, background stories, and names (which are included to help humans in understanding a task), does not help models in understanding a task. To this end, we create a meta-dataset from the frequently used APPS dataset and the newly created CodeContests dataset for the program synthesis task. Our meta-dataset consists of human and synthesized summaries of the long and complicated programming questions. Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on the APPS dataset and 11.88% on the CodeContests dataset on average in terms of strict accuracy. Our analysis shows that summaries significantly improve performance for introductory (9.86%) and interview (11.48%) programming questions. However, it shows improvement by only a small margin (~2%) for competitive programming questions, implying scope for future research in this direction.
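The abstract reports results in terms of strict accuracy without defining it. A minimal sketch follows, assuming the definition commonly used with APPS: a problem counts as solved only if the generated program passes every one of its test cases. The function name `strict_accuracy` and the input layout (one list of pass/fail flags per problem) are illustrative choices, not taken from the paper.

```python
from typing import Sequence


def strict_accuracy(results: Sequence[Sequence[bool]]) -> float:
    """Percentage of problems whose program passed all of its test cases.

    `results[i]` holds one boolean per test case of problem i.
    A problem with no recorded test cases is counted as unsolved.
    """
    if not results:
        raise ValueError("no problems to score")
    solved = sum(1 for tests in results if len(tests) > 0 and all(tests))
    return 100.0 * solved / len(results)


if __name__ == "__main__":
    # Four hypothetical problems; only the first and third pass every test.
    outcomes = [[True, True], [True, False], [True], [False]]
    print(strict_accuracy(outcomes))
```

Under this definition a single failing test case zeroes out the whole problem, which is why the metric is stricter than the average fraction of test cases passed.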