 Generative AI


LLMs Outperform Experts on Challenging Biology Benchmarks

arXiv.org Artificial Intelligence

This study systematically evaluates 27 frontier Large Language Models on eight biology benchmarks spanning molecular biology, genetics, cloning, virology, and biosecurity. Models from major AI developers released between November 2022 and April 2025 were assessed through ten independent runs per benchmark. The findings reveal dramatic improvements in biological capabilities. Top model performance increased more than 4-fold on the challenging text-only subset of the Virology Capabilities Test over the study period, with OpenAI's o3 now performing twice as well as expert virologists. Several models now match or exceed expert-level performance on other challenging benchmarks, including the biology subsets of GPQA and WMDP and LAB-Bench CloningScenarios. Contrary to expectations, chain-of-thought did not substantially improve performance over zero-shot evaluation, while extended reasoning features in o3-mini and Claude 3.7 Sonnet typically improved performance as predicted by inference scaling. Benchmarks such as PubMedQA and the MMLU and WMDP biology subsets exhibited performance plateaus well below 100%, suggesting benchmark saturation and errors in the underlying benchmark data. The analysis highlights the need for more sophisticated evaluation methodologies as AI systems continue to advance.


AutoMCQ -- Automatically Generate Code Comprehension Questions using GenAI

arXiv.org Artificial Intelligence

Students often do not fully understand the code they have written. This sometimes does not become evident until later in their education, which can make it harder to correct their mistaken knowledge or misunderstandings. In addition, being able to fully understand code is increasingly important in a world where students have access to generative artificial intelligence (GenAI) tools, such as GitHub Copilot. One effective solution is to use code comprehension questions, where a marker asks questions about a submission to gauge understanding; this can also have the side effect of helping to detect plagiarism. However, this approach is time-consuming and can be difficult and/or expensive to scale. This paper introduces AutoMCQ, which uses GenAI for the automatic generation of multiple-choice code comprehension questions. This is integrated with the CodeRunner automated assessment platform.


Multimodal Generative AI for Story Point Estimation in Software Development

arXiv.org Artificial Intelligence

This research explores the application of Multimodal Generative AI to enhance story point estimation in Agile software development. By integrating text, image, and categorical data using advanced models like BERT, CNN, and XGBoost, our approach surpasses the limitations of traditional single-modal estimation methods. The results demonstrate strong accuracy for simpler story points, while also highlighting challenges in more complex categories due to data imbalance. This study further explores the impact of categorical data, particularly severity, on the estimation process, emphasizing its influence on model performance. Our findings emphasize the transformative potential of multimodal data integration in refining AI-driven project management, paving the way for more precise, adaptable, and domain-specific AI capabilities. Additionally, this work outlines future directions for addressing data variability and enhancing the robustness of AI in Agile methodologies.


iPhone design guru and OpenAI chief promise an AI device revolution

The Guardian

Everything over the last 30 years, according to Sir Jony Ive, has led to this moment: a partnership between the iPhone designer and the developer of ChatGPT. Ive has sold his hardware startup, io, to OpenAI and will take on creative and design leadership across the merged businesses. "I have a growing sense that everything I have learned over the last 30 years has led me to this place, to this moment," he says in a video announcing the $6.4bn (£4.8bn) deal. The main aim will be to move on from Ive's signature achievement designing Apple's most successful product, as well as the iPod, iPad and Apple Watch. The British-born designer has already developed a prototype io device, and one of its users is OpenAI's chief executive, Sam Altman.


Google's New AI Puts Breasts on Minors--And J. D. Vance

The Atlantic - Technology

Sorry to tell you this, but Google's new AI shopping tool appears eager to give J. D. Vance breasts. This week, at its annual software conference, Google released an AI tool called Try It On, which acts as a virtual dressing room: Upload images of yourself while shopping for clothes online, and Google will show you what you might look like in a selected garment. Curious to play around with the tool, we began uploading images of famous men--Vance, Sam Altman, Abraham Lincoln, Michelangelo's David, Pope Leo XIV--and dressed them in linen shirts and three-piece suits. But when we tested a number of articles designed for women on these famous men, the tool quickly adapted: Whether it was a mesh shirt, a low-cut top, or even just a T-shirt, Google's AI rapidly spun up images of the vice president, the CEO of OpenAI, and the vicar of Christ with breasts. It's not just men: When we uploaded images of women, the tool repeatedly enhanced their décolletage or added breasts that were not visible in the original images.


Apple iPhone designer Jony Ive joins OpenAI in $6.5bn deal

BBC News

Sir Jony worked for Apple for 27 years, helping to revive the company with groundbreaking products including the iPhone and iPod. He also designed the iMac in 1998 and the iPad in 2010. When Sir Jony left the company in 2019, Apple's CEO Tim Cook described him as "a singular figure in the design world and his role in Apple's revival cannot be overstated". Shares in Apple fell more than 2% following the news of his partnership with OpenAI. He left to found his own company, LoveFrom, which has worked with companies such as Airbnb and Moncler.


OpenAI's Ambitions Just Became Crystal Clear

The Atlantic - Technology

Sam Altman is done with keyboards and screens. Earlier today, OpenAI announced its intentions to solve this apparent problem. The company is partnering with Jony Ive, the longtime head of design at Apple, who did pioneering work on products such as the iMac G3, the iPod, and, most famously, the iPhone. Together, Altman and Ive say they want to create hardware built specifically for AI software. Everyone, Altman suggested in a highly produced announcement video, could soon have access to a "team of geniuses"--presumably, ChatGPT-style assistants--on a "family of devices."


Causal Predictive Optimization and Generation for Business AI

arXiv.org Machine Learning

The sales process involves sales functions converting leads or opportunities to customers and selling more products to existing customers. The optimization of the sales process is thus key to the success of any B2B business. In this work, we introduce a principled approach to sales optimization and business AI, namely Causal Predictive Optimization and Generation, which includes three layers: 1) a prediction layer with causal ML; 2) an optimization layer with constraint optimization and contextual bandit; and 3) a serving layer with Generative AI and a feedback loop for system enhancement. We detail the implementation and deployment of the system at LinkedIn, showcasing significant wins over legacy systems and sharing learning and insight broadly applicable to this field.


Leveraging Generative AI Models to Explore Human Identity

arXiv.org Artificial Intelligence

This paper attempts to explore human identity by utilizing neural networks in an indirect manner. For this exploration, we adopt diffusion models, state-of-the-art AI generative models trained to create human face images. By relating the generated human face to human identity, we establish a correspondence between the face image generation process of the diffusion model and the process of human identity formation. Through experiments with the diffusion model, we observe that changes in its external input result in significant changes in the generated face image. Based on the correspondence, we indirectly confirm the dependence of human identity on external factors in the process of human identity formation. Furthermore, we introduce Fluidity of Human Identity, a video artwork that expresses the fluid nature of human identity affected by varying external factors. The video is available at https://www.behance.net/gallery/


Kaleidoscope Gallery: Exploring Ethics and Generative AI Through Art

arXiv.org Artificial Intelligence

Ethical theories and Generative AI (GenAI) models are dynamic concepts subject to continuous evolution. This paper investigates the visualization of ethics through a subset of GenAI models. We expand on the emerging field of Visual Ethics, using art as a form of critical inquiry and the metaphor of a kaleidoscope to invoke moral imagination. Through formative interviews with 10 ethics experts, we first establish a foundation of ethical theories. Our analysis reveals five families of ethical theories, which we then transform into images using the text-to-image (T2I) GenAI model. The resulting imagery, curated as Kaleidoscope Gallery and evaluated by the same experts, revealed eight themes that highlight how morality, society, and learned associations are central to ethical theories. We discuss implications for critically examining T2I models and present cautions and considerations. This work contributes to examining ethical theories as foundational knowledge that interrogates GenAI models as socio-technical systems.