Large Language Model
Google's Search Labs lets you test its AI-powered 'products and ideas'
It's fair to say that Google was caught flat-footed by Microsoft's launch of Bing search powered by ChatGPT, as it didn't have anything similar when it unveiled its own conversational AI, Bard. Now, Google has announced Search Labs, a new way for consumers to test "bold new products and ideas we're exploring" in search, the company said at its I/O 2023 keynote. There are three key features available for a limited time. The first is called Search Generative Experience (SGE), bringing generative AI directly into Google Search. "The new Search experience helps you quickly find and make sense of information," Google's Director of Search wrote.
Google Bard transitions to PaLM 2 and expands to 180 countries
For the past two months, anybody wanting to try out Google's new chatbot AI, Bard, had to first register their interest and join a waitlist before being granted access. On Wednesday, the company announced that those days are over. Bard will immediately be dropping the waitlist requirement as it expands to 180 additional countries and territories. What's more, this expanded Bard will be built atop Google's newest Large Language Model, PaLM 2, making it more capable than ever before. Google hurriedly released the first generation Bard back in February after OpenAI's ChatGPT came out of nowhere and began eating the industry's collective lunch like Gulliver in a Lilliputian cafeteria. Matters were made worse when Bard's initial performances proved less than impressive -- especially given Google's generally accepted status at the forefront of AI development -- which hurt both Google's public image and its bottom line.
OpenAI suggests voluntary AI standards, not government mandates, to ensure AI safety
The top lawyer for OpenAI, the company that developed ChatGPT, argued that the best way to regulate artificial intelligence is not to start with government mandated rules and regulations but to allow the companies themselves to set standards that ensure AI is used safely and responsibly. OpenAI General Counsel Jason Kwon made that argument during a Tuesday panel discussion in Washington, D.C., which was hosted by BSA/The Software Alliance, even as he acknowledged that AI is developing so quickly that it can often lead to unexpected results that companies quickly need to rein in. Still, when asked what his message to policymakers was, Kwon recommended voluntary, industry-led standards for AI, calling for a tactic that many companies in most industries tend to favor over government mandates.
Combo of Thinking and Observing for Outside-Knowledge VQA
Si, Qingyi, Mo, Yuchen, Lin, Zheng, Ji, Huishan, Wang, Weiping
Outside-knowledge visual question answering is a challenging task that requires both the acquisition and the use of open-ended real-world knowledge. Some existing solutions draw external knowledge into the cross-modality space, which overlooks the much vaster textual knowledge in natural-language space, while others transform the image into text that is then fused with the textual knowledge in natural-language space, completely abandoning the use of visual features. In this paper, we constrain the cross-modality space to coincide with the natural-language space, so that the visual features are preserved directly and the model still benefits from the vast knowledge in natural-language space. To this end, we propose a novel framework consisting of a multimodal encoder, a textual encoder and an answer decoder. This structure allows us to introduce more types of knowledge, including explicit and implicit multimodal and textual knowledge. Extensive experiments validate the superiority of the proposed method, which outperforms the state of the art by 6.17% accuracy. We also conduct comprehensive ablations of each component and systematically study the roles of varying types of knowledge. Code and knowledge data can be found at https://github.com/PhoebusSi/Thinking-while-Observing.
Large Language Models Need Holistically Thought in Medical Conversational QA
Weng, Yixuan, Li, Bin, Xia, Fei, Zhu, Minjun, Sun, Bin, He, Shizhu, Liu, Kang, Zhao, Jun
The medical conversational question answering (CQA) system aims at providing a series of professional medical services to improve the efficiency of medical care. Despite the success of large language models (LLMs) in complex reasoning tasks in various fields, such as mathematics, logic, and commonsense QA, they still need improvement to handle the greater complexity and specialization of the medical field. This is because medical CQA tasks require not only strong medical reasoning, but also the ability to think broadly and deeply. In this paper, to address these challenges in medical CQA tasks, which need to be considered and understood from many aspects, we propose the Holistically Thought (HoT) method, which is designed to guide LLMs to perform diffused and focused thinking for generating high-quality medical responses. The proposed HoT method has been evaluated through automated and manual assessments on three medical CQA datasets covering English and Chinese. The extensive experimental results show that our method can produce more correct, professional, and considerate answers than several state-of-the-art (SOTA) methods, demonstrating its effectiveness. Our code is available at https://github.com/WENGSYX/HoT.
Talking with Machines: A Comprehensive Survey of Emergent Dialogue Systems
From the earliest experiments in the 20th century to the utilization of large language models and transformers, dialogue systems research has continued to evolve, playing crucial roles in numerous fields. This paper offers a comprehensive review of these systems, tracing their historical development and examining their fundamental operations. We analyze popular and emerging datasets for training and survey key contributions in dialogue systems research, including traditional systems and advanced machine learning methods. Finally, we consider conventional and transformer-based evaluation metrics, followed by a short discussion of prevailing challenges and future prospects in the field.
Large language models in biomedical natural language processing: benchmarks, baselines, and recommendations
Chen, Qingyu, Du, Jingcheng, Hu, Yan, Keloth, Vipina Kuttichi, Peng, Xueqing, Raja, Kalpana, Zhang, Rui, Lu, Zhiyong, Xu, Hua
Biomedical literature is growing rapidly, making it challenging to curate and extract knowledge manually. Biomedical natural language processing (BioNLP) techniques that can automatically extract information from biomedical literature help alleviate this burden. Recently, large language models (LLMs), such as GPT-3 and GPT-4, have gained significant attention for their impressive performance. However, their effectiveness in BioNLP tasks and their impact on method development and downstream users remain understudied. This pilot study (1) establishes the baseline performance of GPT-3 and GPT-4 in both zero-shot and one-shot settings on eight BioNLP datasets across four applications: named entity recognition, relation extraction, multi-label document classification, and semantic similarity and reasoning; (2) examines the errors produced by the LLMs and categorizes them into three types: missingness, inconsistencies, and unwanted artificial content; and (3) provides suggestions for using LLMs in BioNLP applications. We make the datasets, baselines, and results publicly available to the community via https://github.com/qingyu-qc/gpt_bionlp_benchmark.
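To make the difference between the zero-shot and one-shot settings concrete, the following is a minimal sketch of how the two kinds of prompt might be assembled. The `build_prompt` helper, the instruction wording, and the example sentences are all hypothetical illustrations; they are not taken from the benchmark repository, and the study's actual prompts may look quite different.

```python
def build_prompt(task_instruction, query, examples=()):
    """Assemble a prompt: instruction, optional worked examples, then the query.

    Hypothetical template for illustration only. With no examples this is a
    zero-shot prompt; with exactly one example it is a one-shot prompt.
    """
    parts = [task_instruction]
    for text, answer in examples:
        parts.append(f"Input: {text}\nOutput: {answer}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Made-up named entity recognition task and sentences (assumptions).
instruction = "List every disease mentioned in the input sentence."
query = "Metformin is commonly prescribed for type 2 diabetes."
demo = ("Aspirin lowers the risk of stroke.", "stroke")

zero_shot = build_prompt(instruction, query)
one_shot = build_prompt(instruction, query, examples=[demo])

print(zero_shot)
print("-----")
print(one_shot)
```

The only difference between the two settings in this sketch is the single demonstration pair placed before the query.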
Text-To-Concept (and Back) via Cross-Model Alignment
Moayeri, Mazda, Rezaei, Keivan, Sanjabi, Maziar, Feizi, Soheil
We observe that the mapping from an image's representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose $\textit{text-to-concept}$, where features from a fixed pretrained model are aligned linearly to the CLIP space, so that text embeddings from CLIP's text encoder become directly comparable to the aligned features. With text-to-concept, we convert fixed off-the-shelf vision encoders to surprisingly strong zero-shot classifiers for free, with accuracy at times even surpassing that of CLIP, despite being much smaller models and trained on a small fraction of the data compared to CLIP. We show other immediate use-cases of text-to-concept, like building concept bottleneck models with no concept supervision, diagnosing distribution shifts in terms of human concepts, and retrieving images satisfying a set of text-based constraints. Lastly, we demonstrate the feasibility of $\textit{concept-to-text}$, where vectors in a model's feature space are decoded by first aligning them to the CLIP space before feeding them to a GPT-based generative model. Our work suggests that existing deep models, with presumably diverse architectures and training, represent input samples relatively similarly, and that two-way communication across model representation spaces and with humans (through language) is viable.
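The core idea, a linear map from one feature space into another followed by similarity-based zero-shot classification, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the 2-D vectors stand in for real encoder features, the "text embeddings" are hand-picked rather than produced by CLIP, and plain gradient descent on a squared error is just one simple way to fit the map, not necessarily the authors' procedure.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_linear_map(src, dst, lr=0.5, steps=100):
    """Fit W minimising the mean of ||W s - d||^2 by gradient descent."""
    d_out, d_in, n = len(dst[0]), len(src[0]), len(src)
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for s, d in zip(src, dst):
            err = [p - t for p, t in zip(matvec(W, s), d)]
            for i in range(d_out):
                for j in range(d_in):
                    grad[i][j] += 2.0 * err[i] * s[j] / n
        W = [[W[i][j] - lr * grad[i][j] for j in range(d_in)]
             for i in range(d_out)]
    return W


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(feature, W, text_embeddings):
    """Align a source-space feature, then pick the most similar text embedding."""
    aligned = matvec(W, feature)
    return max(text_embeddings,
               key=lambda name: cosine(aligned, text_embeddings[name]))


# Toy paired features: the "target" space is the source rotated by 90 degrees.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
dst = [[-s[1], s[0]] for s in src]
W = fit_linear_map(src, dst)

# Hand-picked stand-ins for text embeddings living in the target space.
text_embeddings = {"cat": [0.0, 1.0], "dog": [-1.0, 0.0]}
label = zero_shot_label([2.0, 0.1], W, text_embeddings)
print(label)
```

In this sketch no classifier is trained on labels: the only learned component is the alignment matrix `W`, and class names enter solely through their embeddings in the target space.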
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
Hua, Wen-Yu, Williams, Brian, Shamsi, Davood
Text embeddings are useful features for several NLP applications, such as sentence similarity, text clustering, and semantic search. In this paper, we present a Low-rank Adaptation with a Contrastive objective on top of 8-bit Siamese-BLOOM, a multilingual large language model optimized to produce semantically meaningful word embeddings. The innovation is threefold. First, we cast BLOOM weights to 8-bit values. Second, we fine-tune BLOOM with a scalable adapter (LoRA) and an 8-bit Adam optimizer for sentence similarity classification. Third, we apply a Siamese architecture to the BLOOM model with a contrastive objective to mitigate the scarcity of multilingual labeled data. The experimental results show that the quality of the embeddings learned by LACoS-BLOOM is proportional to the number of model parameters and the amount of unlabeled training data. With the parameter-efficient fine-tuning design, we are able to run the 7.1-billion-parameter BLOOM model end-to-end on a single GPU machine with 32GB memory. Compared to the previous solution, Sentence-BERT, we achieve significant improvement on both English and multilingual STS tasks.
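As a rough illustration of the low-rank adapter mentioned above, the sketch below applies the standard LoRA update, h = W0 x + (alpha / r) B A x, to a tiny hand-written layer. The dimensions, values, and the `lora_forward` helper are hypothetical and chosen only to keep the arithmetic easy to follow; this is not the LACoS-BLOOM code, and it leaves out 8-bit quantization and the contrastive objective entirely.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(W0, A, B, x, alpha, r):
    """Compute h = W0 x + (alpha / r) * B (A x).

    W0 is the frozen pretrained weight (d_out x d_in); A (r x d_in) and
    B (d_out x r) are the small matrices that would actually be trained.
    """
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy sizes (assumptions): a 4x4 layer with a rank-1 adapter.
d_out, d_in, r, alpha = 4, 4, 1, 2.0
W0 = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
A = [[0.1, 0.2, 0.3, 0.4]]
x = [1.0, 2.0, 3.0, 4.0]

# B starts at zero, so the adapted layer initially matches the frozen one.
B_init = [[0.0] for _ in range(d_out)]
h_init = lora_forward(W0, A, B_init, x, alpha, r)

# A made-up "trained" B that shifts only the first output coordinate.
B_trained = [[1.0], [0.0], [0.0], [0.0]]
h_adapted = lora_forward(W0, A, B_trained, x, alpha, r)

full_params = d_out * d_in            # parameters in the frozen weight
adapter_params = r * (d_in + d_out)   # parameters in A and B
print(h_init, h_adapted, full_params, adapter_params)
```

Even in this toy case the adapter holds half as many parameters as the full weight; the saving grows quickly with layer width because the adapter scales with r * (d_in + d_out) rather than d_in * d_out.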