 Large Language Model


What Should Data Science Education Do with Large Language Models?

arXiv.org Artificial Intelligence

The rapid advances of large language models (LLMs), such as ChatGPT, are revolutionizing data science and statistics. These state-of-the-art tools can streamline complex processes and, as a result, are reshaping the role of data scientists. We argue that LLMs are transforming the responsibilities of data scientists, shifting their focus from hands-on coding, data wrangling and conducting standard analyses to assessing and managing analyses performed by these automated AIs. This evolution of roles is reminiscent of the transition from a software engineer to a product manager. We illustrate this transition with concrete data science case studies using LLMs in this paper. These developments necessitate a meaningful evolution in data science education. Pedagogy must now place greater emphasis on cultivating diverse skillsets among students, such as LLM-informed creativity, critical thinking, and AI-guided programming. LLMs can also play a significant role in the classroom as interactive teaching and learning tools, contributing to personalized education. This paper discusses the opportunities, resources and open challenges for each of these directions. As with any transformative technology, integrating LLMs into education calls for careful consideration. While LLMs can perform repetitive tasks efficiently, it's crucial to remember that their role is to supplement human intelligence and creativity, not to replace it. Therefore, the new era of data science education should balance the benefits of LLMs with the fostering of complementary human expertise and innovation. In conclusion, the rise of LLMs heralds a transformative period for data science and its education. This paper seeks to shed light on the emerging trends, potential opportunities, and challenges accompanying this paradigm shift, hoping to spark further discourse and investigation into this exciting, uncharted territory.


Conformal Prediction with Large Language Models for Multi-Choice Question Answering

arXiv.org Artificial Intelligence

As large language models continue to be widely developed, robust uncertainty quantification techniques will become crucial for their safe deployment in high-stakes scenarios. In this work, we explore how conformal prediction can be used to provide uncertainty quantification in language models for the specific task of multiple-choice question answering. We find that the uncertainty estimates from conformal prediction are tightly correlated with prediction accuracy. This observation can be useful for downstream applications such as selective classification and filtering out low-quality predictions. We also investigate whether the exchangeability assumption required by conformal prediction holds for out-of-subject questions, which may be a more realistic scenario for many practical applications. Our work contributes towards more trustworthy and reliable usage of large language models in safety-critical situations, where robust guarantees of error rate are required.
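
To make the idea in the abstract above concrete, here is a minimal sketch of split conformal prediction for multiple-choice answers. It is an illustration under stated assumptions, not the paper's implementation: the function names `conformal_threshold` and `prediction_set` are hypothetical, the per-option probabilities are assumed to come from a softmax over the model's answer-option scores, and the nonconformity score (one minus the probability of the correct option) is one common choice rather than necessarily the one the authors use.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold q_hat from a held-out calibration set.

    cal_probs  : list of per-question probability lists, one entry per option
    cal_labels : list of correct option indices
    alpha      : target error rate (e.g. 0.25 for 75% coverage)
    """
    # Nonconformity score (an assumed, common choice): 1 - p(correct option).
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every option is kept.
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, q_hat):
    """Return the indices of all options whose score is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= q_hat]
```

The size of the returned set acts as the uncertainty estimate: a single option signals a confident answer, while a larger set signals that the model cannot rule out alternatives. The coverage guarantee (the correct option lies in the set with probability at least 1 - alpha) relies on calibration and test questions being exchangeable, which is the assumption the abstract examines for out-of-subject questions.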


The Innovation Paradox: Concept Space Expansion with Diminishing Originality and the Promise of Creative AI

arXiv.org Artificial Intelligence

Innovation, typically spurred by reusing, recombining, and synthesizing existing concepts, is expected to result in an exponential growth of the concept space over time. However, our statistical analysis of TechNet, which is a comprehensive technology semantic network encompassing over four million concepts derived from patent texts, reveals a linear rather than exponential expansion of the overall technological concept space. Moreover, there is a notable decline in the originality of newly created concepts. These trends can be attributed to the constraints of human cognitive abilities to innovate beyond an ever-growing space of prior art, among other factors. Integrating creative artificial intelligence into the innovation process holds the potential to overcome these limitations and alter the observed trends in the future.


LENS: A Learnable Evaluation Metric for Text Simplification

arXiv.org Artificial Intelligence

Training learnable metrics using modern language models has recently emerged as a promising method for the automatic evaluation of machine translation. However, existing human evaluation datasets for text simplification have limited annotations that are based on unitary or outdated models, making them unsuitable for this approach. To address these issues, we introduce the SimpEval corpus that contains: SimpEval_past, comprising 12K human ratings on 2.4K simplifications of 24 past systems, and SimpEval_2022, a challenging simplification benchmark consisting of over 1K human ratings of 360 simplifications including GPT-3.5 generated text. Training on SimpEval, we present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive empirical results show that LENS correlates much better with human judgment than existing metrics, paving the way for future progress in the evaluation of text simplification. We also introduce Rank and Rate, a human evaluation framework that rates simplifications from several models in a list-wise manner using an interactive interface, which ensures both consistency and accuracy in the evaluation process and is used to create the SimpEval datasets.


AI firms should face prison over creation of fake humans, says Yuval Noah Harari

The Guardian

The creators of AI bots that masquerade as people should face harsh criminal sentences comparable to those who trade in counterfeit currency, the Israeli historian and author Yuval Noah Harari has said. He also called for sanctions, including prison sentences, to apply to tech company executives who fail to guard against fake profiles on their social media platforms. Addressing the UN's AI for Good global summit in Geneva, the author of Sapiens and Homo Deus said the proliferation of fake humans could lead to a collapse in public trust and democracy. "Now it is possible, for the first time in history, to create fake people – billions of fake people," he said. "If this is allowed to happen it will do to society what fake money threatened to do to the financial system. If you can't know who is a real human, trust will collapse. Maybe relationships will be able to manage somehow, but not democracy," Harari added. The advent of ChatGPT and other large language models means AI bots can not only amplify human content, but also artificially generate their own content at scale. "What happens if you have a social media platform where … millions of bots can create content that is in many ways superior to what humans can create – more convincing, more appealing," he said. "If we allow this to happen, then humans have completely lost control of the public conversation.


Big tech companies want AI regulation -- but on their own terms

The Japan Times

OpenAI Chief Executive Officer Sam Altman surprised everyone last month when he warned Congress of the dangers posed by artificial intelligence. Suddenly, it looked like tech companies had learned from the problems of social media and wanted to roll out AI differently. Even more remarkably: They wanted politicians' help. But a week later, Altman told a different story to reporters in London. The head of ChatGPT's creator said that he would try to comply with European Union rules but if that proved too difficult, his company would "cease operating" within the bloc.


Give Every AI a Soul--or Else

WIRED

Mavens in the field of artificial intelligence, including architects of notorious "generative AI" systems like ChatGPT, now publicly express shared dread of terrible outcomes that might be wrought by their own creations. Many now call for a moratorium, or pause in AI development, allowing time for existing nations and institutions to innovate systems of control. Amid the toppling of many clichéd assumptions, we've learned that so-called Turing tests are irrelevant, providing no insight at all into whether generative large language models--GLLMs or "gollems"--are actually sapient beings. Anyway, that distinction now appears less pressing than questions of good or bad--or potentially lethal--behavior. This essay is adapted from David Brin's nonfiction book in progress, Soul on Ai.


Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection

arXiv.org Artificial Intelligence

Neural sequence models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds, and can be learned with polynomially many pretraining sequences. Building on these "base" ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving in-context algorithm selection, akin to what a statistician can do in real life: a single transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory by explicit constructions, and also observe this phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection with concrete examples: pre-ICL testing, and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
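
The post-ICL validation mechanism named in the abstract above has a familiar classical counterpart, sketched below in plain Python. This is not the paper's transformer construction, which performs the selection inside a forward pass; it only writes out the statistical procedure being emulated. The helper names `ridge_1d` and `select_by_validation`, the restriction to one-dimensional inputs without an intercept, and the train/validation split are all simplifying assumptions made for illustration.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept.

    Minimizes sum((w * x - y)^2) + lam * w^2, giving
    w = sum(x * y) / (sum(x * x) + lam).
    Assumes sum(x * x) + lam > 0.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def select_by_validation(xs, ys, lams, n_train):
    """Fit one ridge model per candidate lam on the first n_train examples,
    then keep the one with the lowest squared error on the remaining ones."""
    train_x, train_y = xs[:n_train], ys[:n_train]
    val_x, val_y = xs[n_train:], ys[n_train:]
    best = None
    for lam in lams:
        w = ridge_1d(train_x, train_y, lam)
        val_mse = sum((w * x - y) ** 2 for x, y in zip(val_x, val_y)) / len(val_x)
        if best is None or val_mse < best[0]:
            best = (val_mse, lam, w)
    return best[1], best[2]  # chosen regularization strength and weight
```

On noiseless data generated by y = 2x, for instance, the unregularized candidate fits the validation examples exactly and is selected; on noisier prompts a larger regularization strength can win instead. That per-prompt switch is the kind of adaptive behavior the abstract attributes to a single transformer handling mixed noise levels.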


ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

arXiv.org Artificial Intelligence

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, which generalizes poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision and language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on MP3D, HM3D, and RoboTHOR benchmarks show that our ESC method improves significantly over baselines, and achieves new state-of-the-art results for zero-shot object navigation (e.g., 288% relative Success Rate improvement over CoW on MP3D).


CORE-GPT: Combining Open Access research and large language models for credible, trustworthy question answering

arXiv.org Artificial Intelligence

In this paper, we present CORE-GPT, a novel question-answering platform that combines GPT-based language models and more than 32 million full-text open access scientific articles from CORE. We first demonstrate that GPT-3.5 and GPT-4 cannot be relied upon to provide references or citations for generated text. We then introduce CORE-GPT, which delivers evidence-based answers to questions, along with citations and links to the cited papers, greatly increasing the trustworthiness of the answers and reducing the risk of hallucinations. CORE-GPT's performance was evaluated on a dataset of 100 questions covering the top 20 scientific domains in CORE, resulting in 100 answers and links to 500 relevant articles. The quality of the provided answers and the relevance of the links were assessed by two annotators. Our results demonstrate that CORE-GPT can produce comprehensive and trustworthy answers across the majority of scientific domains, complete with links to genuine, relevant scientific articles.