Generative AI
A Multimodal, Multitask System for Generating E-Commerce Text Listings from Images
Manually writing catchy product names and descriptions is slow and labor-intensive for retailers. Although generative AI offers an automation route in the form of vision-to-language models (VLMs), current VLMs are prone to factual "hallucinations". Siloed, single-task models are not only inefficient but also fail to capture interdependent relationships between features. To address these challenges, we propose an end-to-end, multi-task system that generates factually grounded textual listings from a single image. This study makes two contributions to the model architecture. First, we apply multi-task learning to fine-tune a vision encoder: a single vision backbone is jointly trained on attribute prediction (such as color, hemline, and neck style) and price regression. Second, we introduce a hierarchical generation process in which the model's own predicted attributes are embedded in a prompt and fed to the text decoder to improve factual consistency. The experiments demonstrate the advantages of this architecture. The multi-task approach outperforms independently trained models on both price regression, with a 3.6% better R² value, and attribute classification, with a 6.6% improvement in F1 score. Critically, the hierarchical generation process proves highly effective, cutting the factual hallucination rate from 12.7% to 7.1%, a 44.5% relative reduction, compared to a non-hierarchical ablation. The hierarchical approach also reduces the latency of the autoregressive text generation process by a factor of 3.5 compared to a direct vision-to-language model of similar size. One minor caveat is that the model performs 3.5% worse than the direct vision-to-language model on ROUGE-L score.
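The hierarchical generation step described above, in which predicted attributes are embedded in a prompt for the text decoder, can be illustrated with a minimal Python sketch. The function name `build_listing_prompt`, the prompt wording, and the example attribute values are all hypothetical; the abstract does not specify the actual template or decoder interface.

```python
from typing import Dict


def build_listing_prompt(attributes: Dict[str, str]) -> str:
    """Embed predicted attributes in a text prompt (hypothetical template)."""
    # Sort by attribute name so the same predictions always give the same prompt.
    facts = ", ".join(
        f"{name}: {value}" for name, value in sorted(attributes.items())
    )
    return (
        "Write a product title and description using only these attributes. "
        f"Attributes: {facts}."
    )


# Example predictions; in the described system these would come from the
# jointly trained vision backbone, not be hard-coded.
predicted = {"color": "navy", "hemline": "midi", "neck style": "v-neck"}
prompt = build_listing_prompt(predicted)
print(prompt)
```

The idea is that the decoder is conditioned on a short list of explicit facts rather than on the image alone, which is the mechanism the abstract credits for the lower hallucination rate.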
Prompt fidelity of ChatGPT4o / Dall-E3 text-to-image visualisations
This study examines the prompt fidelity of ChatGPT4o / DALL-E3 text-to-image visualisations by analysing whether attributes explicitly specified in autogenously generated prompts are correctly rendered in the resulting images. Using two public-domain datasets comprising 200 visualisations of women working in the cultural and creative industries and 230 visualisations of museum curators, the study assessed accuracy across personal attributes (age, hair), appearance (attire, glasses), and paraphernalia (name tags, clipboards). While correctly rendered in most cases, DALL-E3 deviated from prompt specifications in 15.6% of all attributes (n=710). Errors were lowest for paraphernalia, moderate for personal appearance, and highest for depictions of the person themselves, particularly age. These findings demonstrate measurable prompt-to-image fidelity gaps with implications for bias detection and model evaluation.
Operationalising Extended Cognition: Formal Metrics for Corporate Knowledge and Legal Accountability
Corporate responsibility turns on notions of corporate \textit{mens rea}, traditionally imputed from human agents. Yet these assumptions are under challenge as generative AI increasingly mediates enterprise decision-making. Building on the theory of extended cognition, we argue that, in response, corporate knowledge may be redefined as a dynamic capability, measurable by the efficiency of its information-access procedures and the validated reliability of their outputs. We develop a formal model that captures epistemic states of corporations deploying sophisticated AI or information systems, introducing a continuous organisational knowledge metric $S_S(ฯ)$ which integrates a pipeline's computational cost and its statistically validated error rate. We derive a thresholded knowledge predicate $\mathsf{K}_S$ to impute knowledge and a firm-wide epistemic capacity index $\mathcal{K}_{S,t}$ to measure overall capability. We then operationally map these quantitative metrics onto the legal standards of actual knowledge, constructive knowledge, wilful blindness, and recklessness. Our work provides a pathway towards creating measurable and justiciable audit artefacts that render the corporate mind tractable and accountable in the algorithmic age.
BikeBench: A Bicycle Design Benchmark for Generative Models with Objectives and Constraints
Regenwetter, Lyle, Obaideh, Yazan Abu, Chiotti, Fabien, Lykourentzou, Ioanna, Ahmed, Faez
We introduce BikeBench, an engineering design benchmark for evaluating generative models on problems with multiple real-world objectives and constraints. As generative AI's reach continues to grow, evaluating its capability to understand physical laws, human guidelines, and hard constraints grows increasingly important. Engineering product design lies at the intersection of these difficult tasks, providing new challenges for AI capabilities. BikeBench evaluates AI models' capabilities to generate bicycle designs that not only resemble the dataset, but meet specific performance objectives and constraints. To do so, BikeBench quantifies a variety of human-centered and multiphysics performance characteristics, such as aerodynamics, ergonomics, structural mechanics, human-rated usability, and similarity to subjective text or image prompts. Supporting the benchmark are several datasets of simulation results, a dataset of 10,000 human-rated bicycle assessments, and a synthetically generated dataset of 1.6M designs, each with a parametric, CAD/XML, SVG, and PNG representation. BikeBench is uniquely configured to evaluate tabular generative models, large language models (LLMs), design optimization, and hybrid algorithms side-by-side. Our experiments indicate that LLMs and tabular generative models fall short of hybrid GenAI+optimization algorithms in design quality, constraint satisfaction, and similarity scores, suggesting significant room for improvement. We hope that BikeBench, a first-of-its-kind benchmark, will help catalyze progress in generative AI for constrained multi-objective engineering design problems. We provide code, data, an interactive leaderboard, and other resources at https://github.com/Lyleregenwetter/BikeBench.
ChatGPT shares data on how many users exhibit psychosis or suicidal thoughts
OpenAI has released new estimates of the number of ChatGPT users who exhibit possible signs of mental health emergencies, including mania, psychosis or suicidal thoughts. The company said that around 0.07% of ChatGPT users active in a given week exhibited such signs, adding that its artificial intelligence (AI) chatbot recognizes and responds to these sensitive conversations. While OpenAI maintains these cases are extremely rare, critics said even a small percentage may amount to hundreds of thousands of people, as ChatGPT recently reached 800 million weekly active users, per boss Sam Altman. As scrutiny mounts, the company said it built a network of experts around the world to advise it. Those experts include more than 170 psychiatrists, psychologists, and primary care physicians who have practiced in 60 countries, the company said. They have devised a series of responses in ChatGPT to encourage users to seek help in the real world, according to OpenAI.
More than a million people every week show suicidal intent when chatting with ChatGPT, OpenAI estimates
OpenAI claimed that its recent GPT-5 update improved user safety in a model evaluation involving more than 1,000 self-harm and suicide conversations. More than a million ChatGPT users each week send messages that include "explicit indicators of potential suicidal planning or intent", according to a blogpost published by OpenAI on Monday. The finding, part of an update on how the chatbot handles sensitive conversations, is one of the most direct statements from the artificial intelligence giant on the scale of how AI can exacerbate mental health issues. In addition to its estimates on suicidal ideations and related interactions, OpenAI also said that about 0.07% of users active in a given week - about 560,000 of its touted 800m weekly users - show "possible signs of mental health emergencies related to psychosis or mania".
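The 560,000 figure follows directly from the two numbers quoted above. A short Python check, using only those reported values (the variable names are illustrative, and integer arithmetic is used to avoid floating-point rounding):

```python
# Figures quoted in the article: 800 million weekly users, 0.07% of them.
WEEKLY_USERS = 800_000_000
SHARE_PER_10K = 7  # 0.07% expressed as 7 users per 10,000

# Integer arithmetic keeps the result exact.
affected_users = WEEKLY_USERS * SHARE_PER_10K // 10_000
print(f"{affected_users:,}")  # prints 560,000
```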
A Timeline of the Battle for OpenAI: Musk, Altman, and the For-Profit Shift
OpenAI CEO Sam Altman speaks during a summit on June 2, 2025 in San Francisco, California. Founded in 2015 as a nonprofit rather than a for-profit company, OpenAI promised to develop AI "in the way that is most likely to benefit humanity." With billions of dollars in investments from Microsoft, Japanese bank SoftBank, and chipmaker Nvidia, however, OpenAI has proposed changing its corporate structure to give investors more control over its technology. Critics of the change include cofounder-turned-competitor Elon Musk and nonprofits concerned about OpenAI's adherence to its mission.
OpenAI Says Hundreds of Thousands of ChatGPT Users May Show Signs of Manic or Psychotic Crisis Every Week
OpenAI released initial estimates about the share of users who may be experiencing symptoms like delusional thinking, mania, or suicidal ideation, and says it has tweaked GPT-5 to respond more effectively. For the first time ever, OpenAI has released a rough estimate of how many ChatGPT users globally may show signs of having a severe mental health crisis in a typical week. The company said Monday that it worked with experts around the world to make updates to the chatbot so it can more reliably recognize indicators of mental distress and guide users toward real-world support. In recent months, a growing number of people have ended up hospitalized, divorced, or dead after having long, intense conversations with ChatGPT. Some of their loved ones allege the chatbot fueled their delusions and paranoia.
The Download: what to make of OpenAI's Atlas browser, and how to make climate progress
I tried OpenAI's new Atlas browser but I still don't know what it's for
OpenAI rolled out a new web browser last week called Atlas. It comes with ChatGPT built in, along with an agent, so that you can browse, get answers, and have automated tasks performed on your behalf all at the same time. I've spent the past several days tinkering with Atlas. I've used it to do all my normal web browsing, and also tried to take advantage of the ChatGPT functions--plus I threw some weird agentic tasks its way to see how it did with those. My impression is that Atlas is fine? But my big takeaway is that it's pretty pointless for anyone not employed by OpenAI.
Inside the Data Centers That Train A.I. and Drain the Electrical Grid
A data center, which can use as much electricity as Philadelphia, is the new American factory, creating the future and propping up the economy. "I do guess that a lot of the world gets covered in data centers," Sam Altman, the C.E.O. of OpenAI, has said. Drive in almost any direction from almost any American city, and soon enough you'll arrive at a data center--a giant white box rising from graded earth, flanked by generators and fenced like a prison yard. Data centers for artificial intelligence are the new American factory. Packed with computing equipment, they absorb information and emit A.I. Since the launch of ChatGPT, in 2022, they have begun to multiply at an astonishing rate. "I do guess that a lot of the world gets covered in data centers over time," Sam Altman, the C.E.O. of OpenAI, recently said. The leading independent operator of A.I. data centers in the United States is CoreWeave, which was founded eight years ago, as a casual experiment. In 2017, traders at a middling New York hedge fund decided to begin mining cryptocurrency, which they used as the entry fee for their fantasy-football league. To mine the crypto, they bought a graphics-processing unit, a powerful microchip made by the company Nvidia. The G.P.U. was marketed to video gamers, but Nvidia offered software that turned it into a low-budget supercomputer. "It was so successful, from a return-of-capital perspective, that we started scaling it," Brian Venturo, one of CoreWeave's co-founders, told me. "If you make your money back in, like, five days, you want to do that a lot." Within a year, the traders had quit the hedge-fund business and bought several thousand G.P.U.s, which they ran from Venturo's grandfather's garage, in New Jersey.