AITopics | Large Language Model


Large Language Model


AI should be licensed like medicines or nuclear power, Labour suggests

The Guardian, Jun-5-2023, 21:56:15 GMT

The UK should bar technology developers from working on advanced artificial intelligence tools unless they have a licence to do so, Labour has said. Ministers should introduce much stricter rules around companies training their AI products on vast datasets of the kind used by OpenAI to build ChatGPT, Lucy Powell, Labour's digital spokesperson, told the Guardian. Her comments come amid a rethink at the top of government over how to regulate the fast-moving world of AI, with the prime minister, Rishi Sunak, acknowledging it could pose an "existential" threat to humanity. One of the government's advisers on artificial intelligence also said on Monday that humanity could have only two years before AI is able to outwit people, the latest in a series of stark warnings about the threat posed by the fast-developing technology. Powell said: "My real point of concern is the lack of any regulation of the large language models that can then be applied across a range of AI tools, whether that's governing how they are built, how they are managed or how they are controlled."

government, nuclear power, powell

Country:

Europe > United Kingdom (0.86)
North America > United States > District of Columbia > Washington (0.05)
North America > Canada > Ontario > Middlesex County > London (0.05)

Industry:

Government > Regional Government > Europe Government > United Kingdom Government (0.67)
Energy > Power Industry > Utilities > Nuclear (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

Daiwa gives workers OK to "freely use" ChatGPT as part of tech drive

The Japan Times, Jun-5-2023, 11:49:00 GMT

Daiwa Securities Group employees are widely using an artificial intelligence-powered chatbot in Japan as the nation's second-largest brokerage follows global banks in exploring the potential of rapidly evolving technologies. Chief Executive Officer Seiji Nakata said the Tokyo-based firm started an experiment in April that gave around 9,000 workers in Japan the go-ahead to "freely use" ChatGPT. Daiwa has also been strengthening the recruitment of science graduates to develop high-tech experts in house, he said in an interview. The move comes as an AI revolution unfolds on Wall Street in response to widening interest in the technology and its likely business impact. Deutsche Bank is using it to scan wealthy client portfolios, while JPMorgan Chase & Co. is advertising for more AI roles than any of its rivals.

chatgpt, japan, tech drive

Country:

North America > United States > New York > New York County > New York City (0.29)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.29)

Genre: Press Release (0.64)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.65)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Chegg Embraced AI. ChatGPT Ate Its Lunch Anyway

WIRED, Jun-5-2023, 11:00:00 GMT

Investors were surprised when the online education company Chegg revealed last month that ChatGPT was hurting subscriber growth; the company lost half of its market value overnight. But long before Chegg became an index case for the disruptive force of ChatGPT, its top brass had heard plenty of warnings about the threat and opportunity of generative AI. For years, on afternoon walks outside Chegg's Silicon Valley headquarters, former executives say, they had discussed someday slashing costs by tapping AI programs to replace an army of instructors who answer student questions and draft flashcards. Matthew Ramirez, a product leader who left Chegg two years ago, says he even advised CEO Dan Rosensweig in 2020 that generative AI would be the bus that ran down Chegg if it didn't prepare itself. And just weeks after OpenAI launched ChatGPT last November, according to a source familiar with the exchange, one Chegg executive had the bot write an email to Rosensweig urging him to develop a ChatGPT rival.

large language model, machine learning

Country: North America > United States > California (0.26)

Industry:

Education > Educational Setting > Online (0.57)
Banking & Finance > Trading (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.83)

ChatGPT has a problem no one wants to talk about

Washington Post - Technology News, Jun-5-2023, 10:04:33 GMT

That sort of computational power requires GPUs, or graphics processing units, that were first made for video games but were found to be the only chips that could handle such heavy computer tasks as large language models. Currently, just one company, Nvidia, sells the best of those, for which it charges tens of thousands of dollars. Nvidia's valuation recently rocketed to $1 trillion on the anticipated sales. The Taiwan-based company that manufactures many of those chips, TSMC, has likewise soared in value.

chatgpt, nvidia

Country: Asia > Taiwan (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

The Creator of ChatGPT on the Rise of Artificial Intelligence

The New Yorker, Jun-5-2023, 10:00:00 GMT

David Remnick sits down with Sam Altman, the C.E.O. of OpenAI, which created ChatGPT, GPT-4, and other artificial-intelligence programs. A.I. is a tool, Altman emphasizes, that streamlines human work and quickens the pace of scientific advancement. But he claims to empathize with concerns about the emerging technology. "Even if you don't believe in any of the sci-fi stories," he tells Remnick, "you could still be freaked out about the level of change that this is going to bring society and the compressed time frame in which that's going to happen."

artificial intelligence, chatgpt, creator

Country: North America > United States > New York (0.28)

Industry: Government (0.60)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Information Flow Control in Machine Learning through Modular Model Architecture

Tiwari, Trishita; Gururangan, Suchin; Guo, Chuan; Hua, Weizhe; Kariyappa, Sanjay; Gupta, Udit; Xiong, Wenjie; Maeng, Kiwan; Lee, Hsien-Hsin S.; Suh, G. Edward

arXiv.org Artificial Intelligence, Jun-5-2023

In today's machine learning (ML) models, any part of the training data can affect the model's output. This lack of control over information flow from training data to model output is a major obstacle to training models on sensitive data when access control allows individual users to access only a subset of the data. To enable secure machine learning on access-controlled data, we propose the notion of information flow control for machine learning and develop a secure Transformer-based language model based on the Mixture-of-Experts (MoE) architecture. The secure MoE architecture controls information flow by limiting the influence of training data from each security domain to a single expert module and by enabling only a subset of experts at inference time, based on an access control policy. An evaluation on a large corpus of text data shows that the proposed MoE architecture has minimal (1.9%) performance overhead and can significantly improve model accuracy (by up to 37%) by enabling training on access-controlled data.
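To make the gating idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes each security domain owns exactly one expert (a toy NumPy matrix standing in for a Transformer expert module), that a user's access policy is simply a list of permitted domain names, and that the outputs of the enabled experts are averaged uniformly; the class name `SecureMoE` and the averaging rule are illustrative assumptions.

```python
import numpy as np


class SecureMoE:
    """Toy secure MoE: one expert per security domain, gated by an access policy."""

    def __init__(self, domains, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical stand-in: each expert is a random linear map. In the paper's
        # setting, each expert would be trained only on its own domain's data.
        self.experts = {d: rng.standard_normal((dim, dim)) for d in domains}

    def forward(self, x, allowed_domains):
        # Enable only the experts whose domain the caller's policy permits.
        active = [d for d in allowed_domains if d in self.experts]
        if not active:
            raise PermissionError("policy grants access to no known security domain")
        # Assumption: uniform average of the enabled experts' outputs.
        return np.mean([x @ self.experts[d] for d in active], axis=0)


moe = SecureMoE(["public", "hr", "finance"], dim=4)
x = np.ones(4)
y_public = moe.forward(x, ["public"])      # sees only the public expert
y_hr = moe.forward(x, ["public", "hr"])    # HR staff also get the HR expert
```

Because a disabled expert never contributes to `forward`, changing the HR expert cannot change what a public-only user sees, which is the information-flow property the abstract describes.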

artificial intelligence, machine learning, security domain

arXiv:2306.03235

Country:

North America > United States (0.14)
Europe > Croatia (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Oil & Gas > Upstream (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DISCO: Distilling Counterfactuals with Large Language Models

Chen, Zeming; Gao, Qiyue; Bosselut, Antoine; Sabharwal, Ashish; Richardson, Kyle

arXiv.org Artificial Intelligence, Jun-5-2023

Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend to new counterfactual dimensions. In this work, we introduce DISCO (DIStilled COunterfactual Data), a new method for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters these generations to distill high-quality counterfactual data. Although the pipeline is task-agnostic, we apply it to natural language inference (NLI) and find that on challenging evaluations such as the NLI stress test, comparatively smaller student models trained with DISCO-generated counterfactuals are more robust (6% absolute) and generalize better across distributions (2%) than models trained without data augmentation. Furthermore, DISCO-augmented models are 10% more consistent between counterfactual pairs on three evaluation sets, demonstrating that DISCO augmentation enables models to learn causal representations more reliably. Our repository is available at: https://github.com/eric11eca/disco
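The generate-then-filter step can be sketched as below. This is a hypothetical outline, not the code in the DISCO repository: `generate` stands in for prompting a large language model for phrasal perturbations, `teacher` stands in for the task-specific NLI teacher model, and, for brevity, only the premise is perturbed. A candidate is kept only when the teacher assigns it the intended target label.

```python
from typing import Callable, Iterable, List, Tuple


def distill_counterfactuals(
    premise: str,
    hypothesis: str,
    target_label: str,
    generate: Callable[[str, str, str], Iterable[str]],
    teacher: Callable[[str, str], str],
) -> List[Tuple[str, str, str]]:
    """Keep perturbed premises that the teacher assigns to target_label."""
    kept = []
    for new_premise in generate(premise, hypothesis, target_label):
        if new_premise == premise:
            continue  # not actually a perturbation
        if teacher(new_premise, hypothesis) == target_label:
            kept.append((new_premise, hypothesis, target_label))
    return kept


# Toy stand-ins for the LLM generator and the NLI teacher (hypothetical).
def toy_generate(premise, hypothesis, target_label):
    return ["The man is not sleeping.", "The man is sleeping.", "A man sleeps."]


def toy_teacher(premise, hypothesis):
    return "contradiction" if " not " in premise else "entailment"


pairs = distill_counterfactuals(
    "The man is sleeping.", "The man is asleep.", "contradiction",
    toy_generate, toy_teacher,
)
```

Here only the first candidate survives: the second is unchanged, and the teacher labels the third as entailment rather than the requested contradiction.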

artificial intelligence, large language model, natural language

arXiv:2212.10534

Country:

North America > United States (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Communications > Social Media > Crowdsourcing (0.66)

Zero-Shot Prompting for Implicit Intent Prediction and Recommendation with Commonsense Reasoning

Kuo, Hui-Chi; Chen, Yun-Nung

arXiv.org Artificial Intelligence, Jun-5-2023

Intelligent virtual assistants are currently designed to perform tasks or services explicitly mentioned by users, so multiple related domains or tasks must be handled one by one through a long conversation with many explicit intents. Human assistants, by contrast, can infer multiple implicit intents from user utterances using commonsense knowledge, which reduces complex interactions and improves practicality. This paper therefore proposes a framework for multi-domain dialogue systems that automatically infers implicit intents from user utterances and then performs zero-shot prompting with a large pre-trained language model to trigger suitable single-task-oriented bots. The proposed framework is shown to be effective at realizing implicit intents and recommending associated bots in a zero-shot manner.
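A rough sketch of the zero-shot bot-triggering step follows, under loudly stated assumptions: the prompt template `PROMPT`, the function `recommend_bots`, and the `yes_score` callable (a stand-in for a language model's probability of answering "yes") are all hypothetical, and the paper's actual prompts and scoring may differ. Each candidate bot's task description is slotted into a prompt, scored, and bots above a threshold are returned in descending score order.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the paper's actual prompts may differ.
PROMPT = (
    "User: {utterance}\n"
    "Question: Would this user also want help with {task}? Answer yes or no.\n"
    "Answer:"
)


def recommend_bots(
    utterance: str,
    bots: Dict[str, str],
    yes_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Rank bots by a zero-shot 'yes' score and keep those at or above threshold."""
    scored = []
    for name, task in bots.items():
        prompt = PROMPT.format(utterance=utterance, task=task)
        scored.append((yes_score(prompt), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for score, name in scored if score >= threshold]


def toy_yes_score(prompt: str) -> float:
    # Stand-in for an LLM's probability of answering "yes" (hypothetical).
    if "hotel" in prompt:
        return 0.9
    if "taxi" in prompt:
        return 0.6
    return 0.1


bots = {"recipe": "finding a recipe", "taxi": "booking a taxi", "hotel": "booking a hotel"}
picks = recommend_bots("I'm flying to Tokyo next week.", bots, toy_yes_score)
```

No bot needs task-specific training data here; swapping `toy_yes_score` for a real model's scorer is what would make the recommendation zero-shot in the paper's sense.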

large language model, machine learning, natural language

arXiv:2210.05901

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Services (0.97)
Media (0.95)
Leisure & Entertainment (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)

A Universal Discriminator for Zero-Shot Generalization

Xu, Haike; Lin, Zongyu; Zhou, Jing; Zheng, Yanan; Yang, Zhilin

arXiv.org Artificial Intelligence, Jun-5-2023

Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization. In this work, we challenge this convention by showing that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks. Technically, we train a single discriminator to predict whether a text sample comes from the true data distribution, similar to GANs. Since many NLP tasks can be formulated as selecting from a few options, we use this discriminator to predict which concatenation of the input and an option has the highest probability of coming from the true data distribution. This simple formulation achieves state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by 16.0%, 7.8%, and 11.5%, respectively, at different model scales. In the fine-tuning setting, our approach also achieves new state-of-the-art results on a wide range of NLP tasks with only 1/4 of the parameters of previous methods. Meanwhile, our approach requires minimal prompting effort, which greatly improves robustness and is essential for real-world applications. Furthermore, we jointly train a generalized universal discriminator (UD) in combination with generative tasks; it maintains its advantage on discriminative tasks while also working on generative tasks.
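The option-selection formulation described above reduces to an argmax over discriminator scores. The sketch below assumes a `discriminator` callable that returns a "comes from the true data distribution" score for a string and joins input and option with a single space; both, like the toy scorer, are illustrative assumptions rather than details from the paper.

```python
from typing import Callable, Sequence


def ud_predict(
    text: str,
    options: Sequence[str],
    discriminator: Callable[[str], float],
) -> str:
    """Return the option whose concatenation with the input scores as most 'real'."""
    scores = [discriminator(f"{text} {option}") for option in options]
    best = max(range(len(options)), key=lambda i: scores[i])
    return options[best]


def toy_discriminator(sample: str) -> float:
    # Hypothetical scorer: treats "loved ... great" as the most natural continuation.
    return 0.9 if "loved" in sample and sample.endswith("great") else 0.2


answer = ud_predict("Review: I loved this film. It was", ["terrible", "great"], toy_discriminator)
```

Unlike a generative model, the discriminator never has to produce the answer text; it only scores each complete candidate, which is why the abstract notes that little prompt engineering is needed.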

artificial intelligence, large language model, natural language

arXiv:2211.08099

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Xiao, Guangxuan; Lin, Ji; Seznec, Mickael; Wu, Hao; Demouth, Julien; Han, Song

arXiv.org Artificial Intelligence, Jun-5-2023

Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quantization can reduce memory and accelerate inference. However, existing methods cannot maintain accuracy and hardware efficiency at the same time. We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution that enables 8-bit weight, 8-bit activation (W8A8) quantization for LLMs. Because weights are easy to quantize while activations are not, SmoothQuant smooths the activation outliers by migrating the quantization difficulty from activations to weights offline, using a mathematically equivalent transformation. SmoothQuant enables INT8 quantization of both weights and activations for all the matrix multiplications in LLMs, including OPT, BLOOM, GLM, MT-NLG, and the LLaMA family. We demonstrate up to 1.56x speedup and 2x memory reduction for LLMs with negligible loss in accuracy. SmoothQuant enables serving a 530B-parameter LLM within a single node. Our work offers a turn-key solution that reduces hardware costs and democratizes LLMs. Code is available at https://github.com/mit-han-lab/smoothquant.
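The smoothing transformation can be illustrated with NumPy. As we understand the SmoothQuant paper, each input channel j gets a factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), where alpha is the migration strength (0.5 here); activations are divided by s and the matching weight rows are multiplied by s, so X @ W is unchanged. The symmetric per-tensor INT8 quantizer and `int8_matmul` are simplified stand-ins written for this sketch, not the repository's kernels.

```python
import numpy as np


def smooth(X, W, alpha=0.5, eps=1e-8):
    """Per-channel smoothing: returns (X / s, s * W, s) so that X @ W is preserved.

    X: activations, shape (tokens, in_features)
    W: weights, shape (in_features, out_features)
    """
    act_max = np.maximum(np.abs(X).max(axis=0), eps)  # max |X_j| per input channel
    w_max = np.maximum(np.abs(W).max(axis=1), eps)    # max |W_j| per input channel
    s = act_max ** alpha / w_max ** (1.0 - alpha)
    return X / s, W * s[:, None], s


def quantize_int8(t, eps=1e-8):
    """Symmetric per-tensor INT8 quantization; returns (int8 values, float scale)."""
    scale = max(float(np.abs(t).max()), eps) / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale


def int8_matmul(X, W):
    """W8A8 matmul: quantize both operands, multiply in int32, then dequantize."""
    qx, sx = quantize_int8(X)
    qw, sw = quantize_int8(W)
    return (qx.astype(np.int32) @ qw.astype(np.int32)) * (sx * sw)


rng = np.random.default_rng(0)
X = rng.standard_normal((16, 8))
X[:, 3] *= 100.0                 # one outlier activation channel
W = rng.standard_normal((8, 4))
Xs, Ws, s = smooth(X, W)
Y_int8 = int8_matmul(Xs, Ws)     # approximate X @ W using INT8 operands
```

The outlier channel dominates the per-tensor scale of `X`, so most activations would collapse onto a few INT8 levels; after smoothing, the largest activation magnitude shrinks while `Xs @ Ws` still equals `X @ W` in exact arithmetic.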

large language model, machine learning, natural language

arXiv:2211.10438

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
