 Large Language Model


How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks

arXiv.org Artificial Intelligence

The GPT-3.5 models have demonstrated impressive performance in various Natural Language Processing (NLP) tasks, showcasing their strong understanding and reasoning capabilities. However, their robustness and ability to handle the various complexities of the open world have yet to be explored, which is especially crucial in assessing the stability of models and is a key aspect of trustworthy AI. In this study, we perform a comprehensive experimental analysis of GPT-3.5, exploring its robustness using 21 datasets (about 116K test samples) with 66 text transformations from TextFlint that cover 9 popular Natural Language Understanding (NLU) tasks. Our findings indicate that while GPT-3.5 outperforms existing fine-tuned models on some tasks, it still encounters significant robustness degradation, such as its average performance dropping by up to 35.74% and 43.59% in natural language inference and sentiment analysis tasks, respectively. We also show that GPT-3.5 faces some specific robustness challenges, including robustness instability, prompt sensitivity, and number sensitivity. These insights are valuable for understanding its limitations and guiding future research in addressing these challenges to enhance GPT-3.5's overall performance and generalization abilities.


Does Zero-Shot Reinforcement Learning Exist?

arXiv.org Artificial Intelligence

A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase. This marks a shift from the reward-centric RL paradigm towards "controllable" agents that can follow arbitrary instructions in an environment. Current RL agents can solve families of related tasks at best, or require planning anew for each task. Strategies for approximate zero-shot RL have been suggested using successor features (SFs) [BBQ+18] or forward-backward (FB) representations [TO21], but testing has been limited. After clarifying the relationships between these schemes, we introduce improved losses and new SF models, and test the viability of zero-shot RL schemes systematically on tasks from the Unsupervised RL benchmark [LYL+21]. To disentangle universal representation learning from exploration, we work in an offline setting and repeat the tests on several existing replay buffers. SFs appear to suffer from the choice of the elementary state features. SFs with Laplacian eigenfunctions do well, while SFs based on auto-encoders, inverse curiosity, transition models, low-rank transition matrix, contrastive learning, or diversity (APS) perform inconsistently. In contrast, FB representations jointly learn the elementary and successor features from a single, principled criterion. They perform best and consistently across the board, reaching 85% of supervised RL performance with a good replay buffer, in a zero-shot manner.


Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable reasoning capabilities given chain-of-thought prompts (examples with intermediate reasoning steps). Existing benchmarks measure reasoning ability indirectly, by evaluating accuracy on downstream tasks such as mathematical reasoning. However, it is unclear how these models obtain the answers and whether they rely on simple heuristics rather than the generated chain-of-thought. To enable systematic exploration of the reasoning ability of LLMs, we present a new synthetic question-answering dataset called PrOntoQA, where each example is generated from a synthetic world model represented in first-order logic. This allows us to parse the generated chain-of-thought into symbolic proofs for formal analysis. Our analysis on InstructGPT and GPT-3 shows that LLMs are quite capable of making correct individual deduction steps, and so are generally capable of reasoning, even in fictional contexts. However, they have difficulty with proof planning: When multiple valid deduction steps are available, they are not able to systematically explore the different options.


Where is ChatGPT taking us? And do we want to follow?

#artificialintelligence

With its uncanny ability to mimic human language and reasoning, ChatGPT seems to herald a revolution in artificial intelligence. The nimble chatbot can conjure poems and essays, share recipes, translate languages, dispense advice, and tell jokes, among the endless applications users have tested since the Silicon Valley research lab OpenAI released the natural language-processing tool in November. With the excitement comes some trepidation--that the technology could degrade authentic human writing and critical thinking, upend industries, and amplify our own prejudices and biases. Experts from across the university convene at 1 p.m. EST to discuss the latest developments in AI, including language learning programs such as ChatGPT, disinformation campaigns, and ethical concerns. To those working in artificial intelligence, ChatGPT is not merely an overnight sensation, but a mark of achievement after years of experimentation, says Johns Hopkins assistant computer science professor Daniel Khashabi, who specializes in language processing and has worked on similar tools.


La veille de la cybersécurité

#artificialintelligence

The companies touting new chat-based artificial-intelligence systems are running a massive experiment--and we are the test subjects. In this experiment, Microsoft, OpenAI and others are rolling out on the internet an alien intelligence that no one really understands, which has been granted the ability to influence our assessment of what's true in the world.


Financial services' AI appetite grows, study says

#artificialintelligence

Top uses for AI in the financial services sector include 'next-best action' systems, portfolio optimization, and fraud detection. The recent attention showered on ChatGPT, the natural-language processing tool developed by Microsoft-backed OpenAI that can reportedly mimic advanced conversations and writing tasks, was likely not lost on CFOs. Financial executives are increasingly aware that they must strike a delicate balance between supporting C-suite enthusiasm for AI initiatives and conserving scarce resources. AI's long-term potential, however, is leading executives to prioritize it, according to a study released by tech firm Nvidia this month. The percentage of financial services executives surveyed who said their executive leadership teams value and believe in AI has more than doubled to 64% from 36% of those polled last year, the Santa Clara, Calif.-based Nvidia reported.


AI: Disinformation, misinformation and meltdowns

#artificialintelligence

Artificial Intelligence (AI) has taken over our imagination in recent months, with Large Language Models, in particular, being touted as a ground-breaking development. However, as we move beyond the hype, the true capabilities and limitations of the technology are beginning to emerge. It is hard to deny that the recent developments in Artificial Intelligence (AI) are beginning to highlight to the masses the potential of AI in both our personal and professional lives. Over the past several weeks, people have revelled in the capabilities of chatbots, such as ChatGPT, even declaring that they could replace certain roles, such as those associated with customer care, media (advertising, content creation and technical writing) and research. However, over time, as more questions have been thrown at these chatbots and people actively test their limits, several cracks have begun to emerge, a few of which we outline below. As a result, people are beginning to ask more questions about the deficiencies and limitations of AI, especially Large Language Models (LLMs) such as ChatGPT, which, to some degree, were being marketed as "the best thing since sliced bread" and as able to transform life as we know it.


Is Elon Musk joining the AI race? Billionaire rumored to be working on an 'anti-woke' ChatGPT rival

Daily Mail - Science & tech

Elon Musk could soon join the AI race with an 'anti-woke' ChatGPT rival that 'would not censor its replies.' People familiar with the matter told The Information that Musk is assembling a team of AI researchers, including Igor Babuschkin, who recently left Google's DeepMind AI unit. Babuschkin said he has not officially signed onto the Musk initiative. The project is in 'the early stages,' but the goal is to develop 'a trustworthy and reliable' chatbot. The move comes as the billionaire has repeatedly criticized OpenAI for placing safeguards on ChatGPT to prevent it from generating offensive dialogue.


OPED: How ChatGPT is Transforming the Way We Study

#artificialintelligence

As technology constantly advances and focuses on the use of artificial intelligence (AI) and virtual assistance, the future of education also goes through numerous changes. It is not only how we acquire information these days but also how the tools we use transform the way we study. One of the prominent examples is ChatGPT, which has instantly taken things to another level by allowing students and educators to use it as a solution for writing and even analytical purposes. As the system bases itself on the information that is being shared, ChatGPT also provides intelligent feedback that helps users stay inspired and have fun with this new-generation chatbot! If you have never used intelligent chatbots for educational purposes, the best way to start is ChatGPT.


ChatGPT Alternatives: The #1 Options For Features & Uptime

#artificialintelligence

It'll be back, of course, but if you're looking for a more reliable solution then you need to check out these 5 awesome ChatGPT alternatives… ChatGPT has taken the world by storm, giving people from all walks of life their first taste of what AI can really do. But millions of people woke up this week to find that ChatGPT was down and this, of course, caused quite a few headaches for ChatGPT's millions of users. Because ChatGPT is hosted on OpenAI's servers, when things go wrong (OpenAI has an issue with its servers or something else) the entire platform goes down. But there are now plenty of ChatGPT alternatives that run on the exact same source code as ChatGPT. And because these ChatGPT alternatives are not run or hosted by OpenAI, they're far less likely to crash or go down simply because they have much lower volumes of users on their servers.