Generative AI
A Case Study of Web App Coding with OpenAI Reasoning Models
This paper presents a case study of coding tasks by the latest reasoning models of OpenAI, i.e. o1-preview and o1-mini, in comparison with other frontier models. The o1 models deliver SOTA results for WebApp1K, a single-task benchmark. To this end, we introduce WebApp1K-Duo, a harder benchmark doubling the number of tasks and test cases. The new benchmark causes the o1 models' performance to decline significantly, falling behind Claude 3.5. Moreover, they consistently fail when confronted with atypical yet correct test cases, a trap non-reasoning models occasionally avoid. We hypothesize that the performance variability is due to instruction comprehension. Specifically, the reasoning mechanism boosts performance when all expectations are captured, but exacerbates errors when key expectations are missed, potentially impacted by input lengths. As such, we argue that the coding success of reasoning models hinges on a top-notch base model and SFT to ensure meticulous adherence to instructions.
An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs
Baral, Sudipto, Saha, Sajal, Haque, Anwar
The exponential growth of the Internet of Things (IoT) has significantly increased the complexity and volume of cybersecurity threats, necessitating the development of advanced, scalable, and interpretable security frameworks. This paper presents an innovative, comprehensive framework for real-time IoT attack detection and response that leverages Machine Learning (ML), Explainable AI (XAI), and Large Language Models (LLMs). By integrating XAI techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) with a model-independent architecture, we ensure our framework's adaptability across various ML algorithms. Additionally, the incorporation of LLMs enhances the interpretability and accessibility of detection decisions, providing system administrators with actionable, human-understandable explanations of detected threats. Our end-to-end framework not only facilitates a seamless transition from model development to deployment but also represents a real-world application capability that is often lacking in existing research. Based on our experiments with the CIC-IOT-2023 dataset \cite{neto2023ciciot2023}, Gemini and OpenAI LLMs demonstrate unique strengths in attack mitigation: Gemini offers precise, focused strategies, while OpenAI provides extensive, in-depth security measures. Incorporating SHAP and LIME algorithms within XAI provides comprehensive insights into attack detection, emphasizing opportunities for model improvement through detailed feature analysis, fine-tuning, and the adaptation of misclassifications to enhance accuracy.
AutoVerus: Automated Proof Generation for Rust Code
Yang, Chenyuan, Li, Xuheng, Misu, Md Rakib Hossain, Yao, Jianan, Cui, Weidong, Gong, Yeyun, Hawblitzel, Chris, Lahiri, Shuvendu, Lorch, Jacob R., Lu, Shuai, Yang, Fan, Zhou, Ziqiao, Lu, Shan
Generative AI has shown its value for many software engineering tasks. Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we present AutoVerus. AutoVerus uses LLMs to automatically generate correctness proofs for Rust code. AutoVerus is designed to match the unique features of Verus, a verification tool that can prove the correctness of Rust code using proofs and specifications also written in Rust. AutoVerus consists of a network of LLM agents that are crafted and orchestrated to mimic human experts' three phases of proof construction: preliminary proof generation, proof refinement guided by generic tips, and proof debugging guided by verification errors. To thoroughly evaluate AutoVerus and help foster future research in this direction, we have built a benchmark suite of 150 non-trivial proof tasks, based on existing code-generation benchmarks and verification benchmarks. Our evaluation shows that AutoVerus can automatically generate correct proofs for more than 90% of them, with more than half of them tackled in less than 30 seconds or 3 LLM calls.
AI-generated content doesn't seem to have swayed recent European elections
But there's still a risk it could in the future, say researchers. AI-generated falsehoods and deepfakes seem to have had no effect on election results in the UK, France, and the European Parliament this year, according to new research. Since the beginning of the generative-AI boom, there has been widespread fear that AI tools could boost bad actors' ability to spread fake content with the potential to interfere with elections or even sway the results. Such worries were particularly heightened this year, when billions of people were expected to vote in over 70 countries. Those fears seem to have been unwarranted, says Sam Stockwell, the researcher at the Alan Turing Institute who conducted the study. He focused on three elections over a four-month period from May to August 2024, collecting data on public reports and news articles on AI misuse.
An Avalanche of Generative AI Videos Is Coming to YouTube Shorts
Despite the model's slow speed, pricey cost to operate, and sometimes off-kilter outputs, he says it was an eye-opening moment for them to see fresh video clips generated from a random prompt. Now, just a few years later, Google has announced plans for a tool inside of the YouTube app that will allow anyone to generate AI video clips, using the company's Veo model, and directly post them as part of YouTube Shorts. "Looking forward to 2025, we're going to let users create stand-alone video clips and shorts," says Sarah Ali, a senior director of product management at YouTube. "They're going to be able to generate six-second videos from an open text prompt." Ali says the update could help creators hunting for footage to fill out a video or trying to envision something fantastical.
Most US Teens Use Generative AI. Most of Their Parents Don't Know
A fresh wave of anxiety about children and technology is cresting, with parents and pundits increasingly interrogating how kids use smartphones, social media, and screens. It hasn't stopped teenagers from embracing generative AI. New research reveals what AI tools teenagers in the United States are using, and how often--as well as how little their parents know about it. Seven in 10 teenagers in the United States have used generative AI tools, according to a report published today by Common Sense Media. The nonprofit analyzed survey answers from US parents and high schoolers between March and May 2024 to assess the scale and contours of AI adoption among teenagers.
Salesforce's new AI strategy acknowledges that AI will take jobs
Salesforce is unveiling a pivot in its artificial intelligence strategy this week at its annual Dreamforce conference, now saying that its AI tools can handle tasks without human supervision and changing the way it charges for software. The company is famous for ushering in the era of software as a service, which involves renting access to computer applications via a subscription. But as generative AI shakes up the industry, Salesforce is rethinking its business model for the emerging technology. The software giant will charge $2 per conversation held by its new "agents" -- generative AI built to handle tasks like customer service or scheduling sales meetings without the need for human supervision. The new pricing strategy also seeks to protect Salesforce if AI contributes to future job losses and business customers have fewer workers to buy subscriptions to the company's software.
The Phenomenology of Machine: A Comprehensive Analysis of the Sentience of the OpenAI-o1 Model Integrating Functionalism, Consciousness Theories, Active Inference, and AI Architectures
This paper explores the hypothesis that the OpenAI-o1 model--a transformer-based AI trained with reinforcement learning from human feedback (RLHF)--displays characteristics of consciousness during its training and inference phases. Adopting functionalism, which argues that mental states are defined by their functional roles, we assess the possibility of AI consciousness. Drawing on theories from neuroscience, philosophy of mind, and AI research, we justify the use of functionalism and examine the model's architecture using frameworks like Integrated Information Theory (IIT) and active inference. The paper also investigates how RLHF influences the model's internal reasoning processes, potentially giving rise to consciousness-like experiences. We compare AI and human consciousness, addressing counterarguments such as the absence of a biological basis and subjective qualia. Our findings suggest that the OpenAI-o1 model shows aspects of consciousness, while acknowledging the ongoing debates surrounding AI sentience.
Finding the Subjective Truth: Collecting 2 Million Votes for Comprehensive Gen-AI Model Evaluation
Christodoulou, Dimitrios, Kuhlmann-Jørgensen, Mads
Efficiently evaluating the performance of text-to-image models is difficult as it inherently requires subjective judgment and human preference, making it hard to compare different models and quantify the state of the art. Leveraging Rapidata's technology, we present an efficient annotation framework that sources human feedback from a diverse, global pool of annotators. Our study collected over 2 million annotations across 4,512 images, evaluating four prominent models (DALL-E 3, Flux.1, MidJourney, and Stable Diffusion) on style preference, coherence, and text-to-image alignment. We demonstrate that our approach makes it feasible to comprehensively rank image generation models based on a vast pool of annotators and show that the diverse annotator demographics reflect the world population, significantly decreasing the risk of biases.
MedCodER: A Generative AI Assistant for Medical Coding
Baksi, Krishanu Das, Soba, Elijah, Higgins, John J., Saini, Ravi, Wood, Jaden, Cook, Jane, Scott, Jack, Pudota, Nirmala, Weninger, Tim, Bowen, Edward, Bhattacharya, Sanmitra
Medical coding is essential for standardizing clinical data and communication but is often time-consuming and prone to errors. Traditional Natural Language Processing (NLP) methods struggle with automating coding due to the large label space, lengthy text inputs, and the absence of supporting evidence annotations that justify code selection. Recent advancements in Generative Artificial Intelligence (AI) offer promising solutions to these challenges. In this work, we introduce MedCodER, a Generative AI framework for automatic medical coding that leverages extraction, retrieval, and re-ranking techniques as core components. MedCodER achieves a micro-F1 score of 0.60 on International Classification of Diseases (ICD) code prediction, significantly outperforming state-of-the-art methods. Additionally, we present a new dataset containing medical records annotated with disease diagnoses, ICD codes, and supporting evidence texts (https://doi.org/10.5281/zenodo.13308316). Ablation tests confirm that MedCodER's performance depends on the integration of each of its aforementioned components, as performance declines when these components are evaluated in isolation.