Goto

Collaborating Authors

 Generative AI


OpenAI's new GPT-4.5 model is a better, more natural conversationalist

Engadget

In what has already been a busy past few days for new model releases, OpenAI is capping off the week with a research preview of GPT-4.5. The company is touting the new system as its largest and best model for chat yet. In early testing, OpenAI says people found GPT-4.5 to be a more natural conversationalist, with the ability to convey warmth and display a kind of emotional intelligence. In one example shared by OpenAI, a person tells ChatGPT they're going through a hard time after failing a test. Where the company's previous models, including GPT-4o and o3-mini, might commiserate with the individual before offering a long list of unsolicited advice, GPT-4.5 takes a different tact. "Want to talk about what happened, or do you just need a distraction?


OpenAI Launches GPT-4.5 for ChatGPT--It's Huge and Compute-Intensive

WIRED

GPT-4.5 is here, and OpenAI's newest generative AI model is bigger and more compute-intensive than ever--it's supposedly also better at understanding what ChatGPT users mean with their prompts. Users who want to be part of the first wave to try GPT-4.5, labeled as a research preview, will be required to pay for OpenAI's 200-a-month ChatGPT Pro subscription. Prior to this launch, 2025 has already been filled with new AI model releases. Anthropic recently put out a hybrid reasoning model for its Claude chatbot. Before that, Chinese researchers at DeepSeek rocked Silicon Valley with their release of a powerful model trained on a tiny budget, prompting OpenAI to drop a "mini" version of its reasoning model a month ago.


Computer Science Under Trump

Communications of the ACM

In November 2024, voters in the U.S. elected Donald Trump to a second, non-consecutive term as the nation's 47th President. Given U.S. prominence in the world, and the strong executive powers of the President, Trump and his administration will have a massive impact on everything from national security to the economy to the tenor of civic discourse--both inside and outside America. One area of impact being watched especially closely by policy experts: computer science. Technology powered by computer science, perhaps more so than at any other time in history, is expected to play a starring role in U.S. economic, cultural, and military strategies over the next four years. The world is in the midst of an unprecedented generative AI boom.


Are LLMs Ready for Practical Adoption for Assertion Generation?

arXiv.org Artificial Intelligence

Are LLMs Ready for Practical Adoption for Assertion Generation? Abstract --Assertions have been the de facto collateral for simulation-based and formal verification of hardware designs for over a decade. The quality of hardware verification, i.e., detection and diagnosis of corner-case design bugs, is critically dependent on the quality of the assertions. With the onset of generative AI such as Transformers and Large-Language Models (LLMs), there has been a renewed interest in developing novel, effective, and scalable techniques of generating functional and security assertions from design source code. While there have been recent works that use commercial-of-the-shelf (COTS) LLMs for assertion generation, there is no comprehensive study in quantifying the effectiveness of LLMs in generating syntactically and semantically correct assertions. In this paper, we first discuss AssertionBench from our prior work, a comprehensive set of designs and assertions to quantify the goodness of a broad spectrum of COTS LLMs for the task of assertion generations from hardware design source code. Our key insight was that COTS LLMs are not yet ready for prime-time adoption for assertion generation as they generate a considerable fraction of syntactically and semantically incorrect assertions. Motivated by the insight, we propose AssertionLLM, a first of its kind LLM model, specifically fine-tuned for assertion generation. Our initial experimental results show that AssertionLLM considerably improves the semantic and syntactic correctness of the generated assertions over COTS LLMs.


Artificial Intelligence in Sports: Insights from a Quantitative Survey among Sports Students in Germany about their Perceptions, Expectations, and Concerns regarding the Use of AI Tools

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (AI) tools such as ChatGPT, Copilot, or Gemini have a crucial impact on academic research and teaching. Empirical data on how students perceive the increasing influence of AI, which different types of tools they use, what they expect from them in their daily academic tasks, and their concerns regarding the use of AI in their studies are still limited. The manuscript presents findings from a quantitative survey conducted among sports students of all semesters in Germany using an online questionnaire. It explores aspects such as students' usage behavior, motivational factors, and uncertainties regarding the impact of AI tools on academia in the future. Furthermore, the social climate in sports studies is being investigated to provide a general overview of the current situation of the students in Germany. Data collection took place between August and November 2023, addressing all sports departments at German universities, with a total of 262 students participating. Our Findings indicate that students have a strong interest in using AI tools in their studies, expecting them to improve their overall academic performance, understand the complexity of scientific approaches, and save time. They express confidence that the proliferation of AI will not compromise their critical thinking skills. Moreover, students are positive about integrating more AI-related topics into the curriculum and about lecturers adopting more AI-based teaching methods. However, our findings also show that students have concerns about plagiarism, lecturer preparedness and their own skills and future skill development.


Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA

arXiv.org Artificial Intelligence

The MEDVQA-GI challenge addresses the integration of AI-driven text-to-image generative models in medical diagnostics, aiming to enhance diagnostic capabilities through synthetic image generation. Existing methods primarily focus on static image analysis and lack the dynamic generation of medical imagery from textual descriptions. This study intends to partially close this gap by introducing a novel approach based on fine-tuned generative models to generate dynamic, scalable, and precise images from textual descriptions. Particularly, our system integrates fine-tuned Stable Diffusion and DreamBooth models, as well as Low-Rank Adaptation (LORA), to generate high-fidelity medical images. The problem is around two sub-tasks namely: image synthesis (IS) and optimal prompt production (OPG). The former creates medical images via verbal prompts, whereas the latter provides prompts that produce high-quality images in specified categories. The study emphasizes the limitations of traditional medical image generation methods, such as hand sketching, constrained datasets, static procedures, and generic models. Our evaluation measures showed that Stable Diffusion surpasses CLIP and DreamBooth + LORA in terms of producing high-quality, diversified images. Specifically, Stable Diffusion had the lowest Fr\'echet Inception Distance (FID) scores (0.099 for single center, 0.064 for multi-center, and 0.067 for combined), indicating higher image quality. Furthermore, it had the highest average Inception Score (2.327 across all datasets), indicating exceptional diversity and quality. This advances the field of AI-powered medical diagnosis. Future research will concentrate on model refining, dataset augmentation, and ethical considerations for efficiently implementing these advances into clinical practice


Supervised Fine-Tuning LLMs to Behave as Pedagogical Agents in Programming Education

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly being explored in higher education, yet their effectiveness as teaching agents remains underexamined. In this paper, we present the development of GuideLM, a fine-tuned LLM designed for programming education. GuideLM has been integrated into the Debugging C Compiler (DCC), an educational C compiler that leverages LLMs to generate pedagogically sound error explanations. Previously, DCC relied on off-the-shelf OpenAI models, which, while accurate, often over-assisted students by directly providing solutions despite contrary prompting. To address this, we employed supervised fine-tuning (SFT) on a dataset of 528 student-question/teacher-answer pairs, creating two models: GuideLM and GuideLM-mini, fine-tuned on ChatGPT-4o and 4o-mini, respectively. We conducted an expert analysis of 400 responses per model, comparing their pedagogical effectiveness against base OpenAI models. Our evaluation, grounded in constructivism and cognitive load theory, assessed factors such as conceptual scaffolding, clarity, and Socratic guidance. Results indicate that GuideLM and GuideLM-mini improve pedagogical performance, with an 8% increase in Socratic guidance and a 58% improvement in economy of words compared to GPT-4o. However, this refinement comes at the cost of a slight reduction in general accuracy. While further work is needed, our findings suggest that fine-tuning LLMs with targeted datasets is a promising approach for developing models better suited to educational contexts.


Bridging Legal Knowledge and AI: Retrieval-Augmented Generation with Vector Stores, Knowledge Graphs, and Hierarchical Non-negative Matrix Factorization

arXiv.org Artificial Intelligence

Agentic Generative AI, powered by Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG), Knowledge Graphs (KGs), and Vector Stores (VSs), represents a transformative technology applicable to specialized domains such as legal systems, research, recommender systems, cybersecurity, and global security, including proliferation research. This technology excels at inferring relationships within vast unstructured or semi-structured datasets. The legal domain here comprises complex data characterized by extensive, interrelated, and semi-structured knowledge systems with complex relations. It comprises constitutions, statutes, regulations, and case law. Extracting insights and navigating the intricate networks of legal documents and their relations is crucial for effective legal research. Here, we introduce a generative AI system that integrates RAG, VS, and KG, constructed via Non-Negative Matrix Factorization (NMF), to enhance legal information retrieval and AI reasoning and minimize hallucinations. In the legal system, these technologies empower AI agents to identify and analyze complex connections among cases, statutes, and legal precedents, uncovering hidden relationships and predicting legal trends-challenging tasks that are essential for ensuring justice and improving operational efficiency. Our system employs web scraping techniques to systematically collect legal texts, such as statutes, constitutional provisions, and case law, from publicly accessible platforms like Justia. It bridges the gap between traditional keyword-based searches and contextual understanding by leveraging advanced semantic representations, hierarchical relationships, and latent topic discovery. This framework supports legal document clustering, summarization, and cross-referencing, for scalable, interpretable, and accurate retrieval for semi-structured data while advancing computational law and AI.


Deterministic or probabilistic? The psychology of LLMs as random number generators

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have transformed text generation through inherently probabilistic context-aware mechanisms, mimicking human natural language. In this paper, we systematically investigate the performance of various LLMs when generating random numbers, considering diverse configurations such as different model architectures, numerical ranges, temperature, and prompt languages. Our results reveal that, despite their stochastic transformers-based architecture, these models often exhibit deterministic responses when prompted for random numerical outputs. In particular, we find significant differences when changing the model, as well as the prompt language, attributing this phenomenon to biases deeply embedded within the training data. Models such as DeepSeek-R1 can shed some light on the internal reasoning process of LLMs, despite arriving to similar results. These biases induce predictable patterns that undermine genuine randomness, as LLMs are nothing but reproducing our own human cognitive biases.


How Sam Altman Could Break Up Elon Musk and Donald Trump

The Atlantic - Technology

The rivalry between Sam Altman and Elon Musk is entering its Apprentice era. Both men have the ambition to redefine how the modern world works--and both are jockeying for President Donald Trump's blessing to accelerate their plans. Altman's company, OpenAI, as well as Musk's ventures--which include SpaceX, Tesla, and xAI--all depend to some degree on federal dollars, permits, and regulatory support. The president could influence whether OpenAI or xAI produces the next major AI breakthrough, whether Musk can succeed in sending a human to Mars, and whether Altman's big bet on nuclear energy, and fusion reactors in particular, pans out. Understanding the competition between these two men helps illuminate Trump's particular style of governing--one defined by patronage and dealmaking.