Goto

Collaborating Authors

 large-language model


Woman says ChatGPT saved her life by helping detect cancer, which doctors missed

FOX News

Fox News senior medical analyst Dr. Marc Siegel joined'Fox & Friends' to discuss the impact of artificial intelligence on medicine and his take on President Trump's decision to withdraw from the World Health Organization. A mother of two credits ChatGPT for saving her life, claiming the artificial intelligence chatbot flagged the condition leading to her cancer when doctors missed it. Lauren Bannon, who divides her time between North Carolina and the U.S. Virgin Islands, first noticed in February 2024 that she was having trouble bending her fingers in the morning and evening, as reported by Kennedy News and Media. After four months, the 40-year-old was told by doctors that she had rheumatoid arthritis, despite testing negative for the condition. WHAT IS ARTIFICIAL INTELLIGENCE (AI)?


Representation Engineering for Large-Language Models: Survey and Research Challenges

Bartoszcze, Lukasz, Munshi, Sarthak, Sukidi, Bryan, Yen, Jennifer, Yang, Zejia, Williams-King, David, Le, Linh, Asuzu, Kosi, Maple, Carsten

arXiv.org Artificial Intelligence

Large-language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach utilizing samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as performance decrease, compute time increases and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.


Revisiting VerilogEval: Newer LLMs, In-Context Learning, and Specification-to-RTL Tasks

Pinckney, Nathaniel, Batten, Christopher, Liu, Mingjie, Ren, Haoxing, Khailany, Brucek

arXiv.org Artificial Intelligence

The application of large-language models (LLMs) to digital hardware code generation is an emerging field. Most LLMs are primarily trained on natural language and software code. Hardware code, such as Verilog, represents only a small portion of the training data and few hardware benchmarks exist. To address this gap, the open-source VerilogEval benchmark was released in 2023, providing a consistent evaluation framework for LLMs on code completion tasks. It was tested on state-of-the-art models at the time including GPT-4. However, VerilogEval and other Verilog generation benchmarks lack failure analysis and, in present form, are not conducive to exploring prompting techniques. Also, since VerilogEval's release, both commercial and open-source models have seen continued development. In this work, we evaluate new commercial and open-source models of varying sizes against an improved VerilogEval benchmark suite. We enhance VerilogEval's infrastructure and dataset by automatically classifying failures, introduce new prompts for supporting in-context learning (ICL) examples, and extend the supported tasks to specification-to-RTL translation. We find a measurable improvement in commercial state-of-the-art models, with GPT-4 Turbo achieving a 59% pass rate on spec-to-RTL tasks. We also study the performance of open-source and domain-specific models that have emerged, and demonstrate that models can benefit substantially from ICL. We find that recently-released Llama 3.1 405B achieves a pass rate of 58%, effectively matching that of GPT-4 Turbo, and that the much smaller domain-specific RTL-Coder 6.7B models achieve an impressive 37% pass rate. However, prompt engineering is key to achieving good pass rates, and varies widely with model and task. A benchmark infrastructure that allows for prompt engineering and failure analysis is key to continued model development and deployment.


North Korea and Iran using AI for hacking, Microsoft says

The Guardian

US adversaries – chiefly Iran and North Korea, and to a lesser extent Russia and China – are beginning to use generative artificial intelligence to mount or organize offensive cyber operations, Microsoft said on Wednesday. Microsoft said it detected and disrupted, in collaboration with business partner OpenAI, many threats that used or attempted to exploit AI technology they had developed. In a blogpost, the company said the techniques were "early-stage" and neither "particularly novel or unique" but that it was important to expose them publicly as US rivals leveraging large-language models to expand their ability to breach networks and conduct influence operations. Cybersecurity firms have long used machine-learning on defense, principally to detect anomalous behavior in networks. But criminals and offensive hackers use it as well, and the introduction of large-language models led by OpenAI's ChatGPT upped that game of cat-and-mouse.


Understanding Large-Language Model (LLM)-powered Human-Robot Interaction

Kim, Callie Y., Lee, Christine P., Mutlu, Bilge

arXiv.org Artificial Intelligence

Large-language models (LLMs) hold significant promise in improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests in various tasks and domains. Despite the potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots.


AMD's new Ryzen 8000 laptop CPUs are built for an AI future

PCWorld

AMD announced the Ryzen 8040 series of laptop processors at the company's AI-themed event, reframing what has been a conversation about CPU speed, power, and battery life into one that prioritizes AI. In January, AMD launched the Ryzen 7000 family, of which the Ryzen 7040 included the first use of what AMD then called its XDNA architecture, powering Ryzen AI. (When rival Intel disclosed its Meteor Lake processor this past summer, Intel began referring to the AI accelerator as an NPU, and the name stuck.) More than 50 laptop models already ship with Ryzen AI, executives said. In AMD's case, the XDNA NPU assists the Zen CPU, with the Radeon RDNA architecture of the GPU powering graphics. But all three logic components work harmoniously, contributing to the greater whole.


Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

Hung, Chia-Yu, Hu, Zhiqiang, Hu, Yujia, Lee, Roy Ka-Wei

arXiv.org Artificial Intelligence

Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task.


Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models

Phelps, Steve, Ranson, Rebecca

arXiv.org Artificial Intelligence

AI Alignment is often presented as an interaction between a single designer and an artificial agent in which the designer attempts to ensure the agent's behavior is consistent with its purpose, and risks arise solely because of conflicts caused by inadvertent misalignment between the utility function intended by the designer and the resulting internal utility function of the agent. With the advent of agents instantiated with large-language models (LLMs), which are typically pre-trained, we argue this does not capture the essential aspects of AI safety because in the real world there is not a one-to-one correspondence between designer and agent, and the many agents, both artificial and human, have heterogeneous values. Therefore, there is an economic aspect to AI safety and the principal-agent problem is likely to arise. In a principal-agent problem conflict arises because of information asymmetry together with inherent misalignment between the utility of the agent and its principal, and this inherent misalignment cannot be overcome by coercing the agent into adopting a desired utility function through training. We argue the assumptions underlying principal-agent problems are crucial to capturing the essence of safety problems involving pre-trained AI models in real-world situations. Taking an empirical approach to AI safety, we investigate how GPT models respond in principal-agent conflicts. We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task, showing clear evidence of principal-agent conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced behaviour in response to changes in information asymmetry, whereas the later GPT-4 model is more rigid in adhering to its prior alignment. Our results highlight the importance of incorporating principles from economics into the alignment process.


My students are using AI to cheat. Here's why it's a teachable moment

The Guardian

In each case, the students confessed to using such systems and agreed to rewrite the assignments themselves. With all the panic about how students might use these systems to get around the burden of actually learning, we often forget that as of 2023, the systems don't work well at all. It was easy to spot these fraudulent essays. They used text that did not respond to the prompt we had issued to students. Or they just sounded unlike what a human would write.


Meta to Launch New AI, Expecting ChatGPT-Level Hype

#artificialintelligence

Meta, the parent company of Facebook, has announced the launch of a new AI-based large-language model focused on research community use. The move puts the company in competition with others racing to develop AI technology. AI has become the most popular buzzword this year, with major tech companies such as Microsoft, Google, Baidu, Alibaba, and now Meta embracing it. The battle to become the forerunner has already kicked off in the technology space, with big companies dropping metaverse projects to pump up the AI hype. Meta's AI, specifically LLaMA-13B seems exciting, but again I don't think any of these tools are ready for broad public adoption.