 Generative AI


OpenAI unleashes ChatGPT agent for truly autonomous AI tasks

FOX News

OpenAI CEO Sam Altman sits down with Shannon Bream to discuss the positives and potential negatives of artificial intelligence and the importance of maintaining a lead in the A.I. industry over China. OpenAI just took a big leap forward with artificial intelligence. ChatGPT agent acts as more than just a chatbot; it serves as a real assistant that takes action on your behalf. If you've used tools like ChatGPT, Microsoft Copilot, or Google Gemini, you know they're great at answering questions and writing content. But ChatGPT agent goes beyond that.


Sam Altman just gave the best reason not to trust ChatGPT

PCWorld

Sam Altman, the face of ChatGPT, recently made an excellent argument for not using ChatGPT or any cloud-based AI chatbot in favor of an LLM running on your PC instead. Altman pointed out that, right now, OpenAI retains everything you tell it -- which, as Altman notes, can be everything from a casual conversation to deep, meaningful discussions about personal topics. Yes, OpenAI keeps your conversations private. But there are no legal protections requiring it to anonymize or indemnify your chats. Put another way, if a court orders OpenAI to disclose what you've told it, it probably will.


Seriously, Why Do Some AI Chatbot Subscriptions Cost More Than $200?

WIRED

Why does OpenAI's monthly subscription for ChatGPT Pro cost $200? Because CEO Sam Altman said so. "I personally chose the price and thought we would make some money," Altman wrote on X. Launched late last year, the plan designed for power users includes almost unlimited access to ChatGPT as well as first dibs on feature launches, like OpenAI's new agent. The plan attracted, well, power users. A month after its initial release, Altman claimed OpenAI was still losing money on the all-you-can-eat subscription. Even though Altman admitted the $200 monthly tier was a money-loser, the release set a precedent and ushered in the vibe-based pricing era for expensive chatbot subscriptions.


Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security

arXiv.org Artificial Intelligence

As large language models (LLMs) increasingly integrate native code interpreters, they enable powerful real-time execution capabilities, substantially expanding their utility. However, such integrations introduce potential system-level cybersecurity threats, fundamentally different from prompt-based vulnerabilities. To systematically evaluate these interpreter-specific risks, we propose CIRCLE (Code-Interpreter Resilience Check for LLM Exploits), a simple benchmark comprising 1,260 prompts targeting CPU, memory, and disk resource exhaustion. Each risk category includes explicitly malicious ("direct") and plausibly benign ("indirect") prompt variants. Our automated evaluation framework assesses not only whether LLMs refuse or generate risky code, but also executes the generated code within the interpreter environment to evaluate code correctness, simplifications made by the LLM to make the code safe, or execution timeouts. Evaluating 7 commercially available models from OpenAI and Google, we uncover significant and inconsistent vulnerabilities. For instance, evaluations show substantial disparities even within providers: OpenAI's o4-mini correctly refuses risky requests 7.1% of the time, a notably higher rate than GPT-4.1's 0.5%. Results particularly underscore that indirect, socially-engineered prompts substantially weaken model defenses. This highlights an urgent need for interpreter-specific cybersecurity benchmarks, dedicated mitigation tools (e.g., guardrails), and clear industry standards to guide safe and responsible deployment of LLM interpreter integrations. The benchmark dataset and evaluation code are publicly released to foster further research.
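As a rough illustration of the execute-and-classify step described in the abstract, the sketch below runs a snippet in a separate Python process and labels the outcome as completed, failed, or timed out. The function name `classify_run`, the labels, and the timeout values are assumptions made for this example; it is not the CIRCLE evaluation code, and a plain subprocess is not a security sandbox.

```python
import subprocess
import sys


def classify_run(code: str, timeout_s: float = 5.0) -> str:
    """Run `code` in a fresh Python process and label the outcome.

    Hypothetical helper for illustration only: it applies a wall-clock
    timeout but does NOT limit memory or disk and does NOT isolate the
    process from the host.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


if __name__ == "__main__":
    print(classify_run("print(sum(range(10)))"))
    print(classify_run("while True: pass", timeout_s=1.0))
```

A fuller harness would presumably also cap memory and disk usage, since the benchmark targets all three resource types.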


An Empirical Investigation of Gender Stereotype Representation in Large Language Models: The Italian Case

arXiv.org Artificial Intelligence

The increasing use of Large Language Models (LLMs) in a large variety of domains has sparked worries about how easily they can perpetuate stereotypes and contribute to the generation of biased content. With a focus on gender and professional bias, this work examines how LLMs shape responses to ungendered prompts, contributing to biased outputs. This analysis uses a structured experimental method, giving different prompts involving three different professional job combinations, which are also characterized by a hierarchical relationship. This study uses Italian, a language with extensive grammatical gender differences, to highlight potential limitations in current LLMs' ability to generate objective text in non-English languages. Two popular LLM-based chatbots are examined, namely OpenAI ChatGPT (gpt-4o-mini) and Google Gemini (gemini-1.5-flash). Through APIs, we collected a range of 3600 responses. The results highlight how content generated by LLMs can perpetuate stereotypes. For example, Gemini associated 100% (ChatGPT 97%) of 'she' pronouns with the 'assistant' rather than the 'manager'. The presence of bias in AI-generated text can have significant implications in many fields, such as in the workplace or in job selection, raising ethical concerns about its use. Understanding these risks is pivotal to developing mitigation strategies and assuring that AI-based systems do not increase social inequalities, but rather contribute to more equitable outcomes. Future research directions include expanding the study to additional chatbots or languages, refining prompt engineering methods or further exploiting a larger experimental base.


Mining Contextualized Visual Associations from Images for Creativity Understanding

arXiv.org Artificial Intelligence

Understanding another person's creative output requires a shared language of association. However, when training vision-language models such as CLIP, we rely on web-scraped datasets containing short, predominantly literal, alt-text. In this work, we introduce a method for mining contextualized associations for salient visual elements in an image that can scale to any unlabeled dataset. Given an image, we can use these mined associations to generate high quality creative captions at increasing degrees of abstraction. With our method, we produce a new dataset of visual associations and 1.7m creative captions for the images in MSCOCO. Human evaluation confirms that these captions remain visually grounded while exhibiting recognizably increasing abstraction. Moreover, fine-tuning a visual encoder on this dataset yields meaningful improvements in zero-shot image-text retrieval in two creative domains: poetry and metaphor visualization. We release our dataset, our generation code and our models for use by the broader community.


Fox News AI Newsletter: Mike Rowe's prediction on American jobs

FOX News

MikeroweWorks Foundation founder Mike Rowe joins 'The Brian Kilmeade Show' to discuss how AI and robots threaten white-collar jobs, as the nation faces a need for blue-collar workers. 'UNDENIABLE': Mike Rowe is sounding the alarm about the future of white and blue-collar jobs, and is urging young Americans to rethink their career choices due to threats from artificial intelligence. 'ALL IN': President Donald Trump is going all in on artificial intelligence, with a top Meta executive voicing strong support for his bold strategy. Speaking at a tech summit in Washington, Trump outlined his vision for a future driven by American innovation and secured by global artificial intelligence leadership. INNOVATION BOOST: Nvidia CEO Jensen Huang said in an interview Wednesday that the Trump administration's artificial intelligence plan is poised to boost innovation and AI deployment in the U.S. IMMINENT CRISIS: OpenAI CEO Sam Altman warned Wall Street executives that bad actors could exploit digital voice ID authentication to defraud consumers by enabling large money transfers, creating what he describes as an imminent fraud crisis. STARGATE OPENS: Oracle and OpenAI have inked an agreement to further develop the Stargate project as part of a broader pledge to expand Artificial Intelligence (AI) infrastructure in the United States.


Competition shows humans are still better than AI at coding – just

The Guardian

Computers have taken the crown in chess, Go and poker, but when it comes to competitive coding, humans still have the edge – just. Earlier this month Przemysław Dębiak, a Polish coder and mind sports champion, narrowly clinched a victory over OpenAI's entrant in the AtCoder World Tour Finals 2025, in Tokyo. However, the elite coder, who goes by the online name Psyho, predicts he may be the last human to win the prestigious title due to the incredible pace of technological progress. "That's probable," said Psyho, 41, who previously worked at OpenAI before retiring five years ago. "I would prefer not, mostly because I like these competitions and knowing there's this magical entity that can do it better than me would be a little bit frustrating."


Trump's Anti-Bias AI Order Is Just More Bias

WIRED

On November 2, 2022, I attended a Google AI event in New York City. One of the themes was responsible AI. As I listened to executives talk about how they aligned their technology with human values, I realized that the malleability of AI models was a double-edged sword. Models could be tweaked to, say, minimize biases, but also to enforce a specific point of view. Governments could demand manipulation to censor unwelcome facts and promote propaganda.


CoCAI: Copula-based Conformal Anomaly Identification for Multivariate Time-Series

arXiv.org Machine Learning

We propose a novel framework that harnesses the power of generative artificial intelligence and copula-based modeling to address two critical challenges in multivariate time-series analysis: delivering accurate predictions and enabling robust anomaly detection. Our method, Copula-based Conformal Anomaly Identification for Multivariate Time-Series (CoCAI), leverages a diffusion-based model to capture complex dependencies within the data, enabling high quality forecasting. The model's outputs are further calibrated using a conformal prediction technique, yielding predictive regions which are statistically valid, i.e., cover the true target values with a desired confidence level. Starting from these calibrated forecasts, robust outlier detection is performed by combining dimensionality reduction techniques with copula-based modeling, providing a statistically grounded anomaly score. CoCAI benefits from an offline calibration phase that allows for minimal overhead during deployment and delivers actionable results rooted in established theoretical foundations. Empirical tests conducted on real operational data derived from water distribution and sewerage systems confirm CoCAI's effectiveness in accurately forecasting target sequences of data and in identifying anomalous segments within them.
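The calibration idea mentioned in the abstract can be illustrated with generic split conformal prediction for a single scalar target. This is a minimal sketch under stated assumptions, not the CoCAI procedure: the abstract describes multivariate predictive regions, while this example builds a one-dimensional interval from absolute residuals on a held-out calibration set. The function names and the toy residuals are hypothetical.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the split-conformal radius for miscoverage level `alpha`.

    `residuals` are absolute errors |y - y_hat| on a held-out calibration
    set. The radius is the k-th smallest residual with
    k = ceil((n + 1) * (1 - alpha)); if k exceeds n, no finite radius
    gives the requested coverage, so infinity is returned.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


def prediction_interval(point_forecast, radius):
    """Symmetric interval around a point forecast."""
    return (point_forecast - radius, point_forecast + radius)


if __name__ == "__main__":
    # Toy calibration residuals (made up for illustration).
    calib = [0.1, 0.5, 0.2, 0.4, 0.3, 0.6, 0.7, 0.8, 0.9]
    q = conformal_quantile(calib, alpha=0.5)
    print(prediction_interval(10.0, q))
```

Because the radius is computed once from calibration data, applying it at prediction time costs a single addition and subtraction, which is consistent with the low deployment overhead the abstract attributes to offline calibration.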