Generative AI
Apple design legend Jony Ive joins OpenAI to work on AI hardware
The legendary designer behind Apple's iPhone, Jony Ive, has joined OpenAI to create devices tailored for using generative artificial intelligence, according to a video posted Wednesday by the ChatGPT maker. Ive and his team will take over design at OpenAI as part of an acquisition of his startup, named "IO", valued at $6.5 billion. Sharing no details, OpenAI chief executive Sam Altman said in the video that a prototype Ive shared with him "is the coolest piece of technology that the world will have ever seen."
OpenAI buys iPhone architect's startup for $6.4bn
OpenAI is buying an untested startup for $6.4bn, the ChatGPT maker's biggest acquisition yet. The hardware startup, called io, was founded by Apple design guru Jony Ive, best known as one of the principal architects of the iPhone. Ive and OpenAI's CEO, Sam Altman, said in a blog post that their partnership has been two years in the making. "A collaboration built upon friendship, curiosity and shared values quickly grew in ambition," they wrote in the blog post, which offered scant details on upcoming devices. "Tentative ideas and explorations evolved into tangible designs."
OpenAI's Big Bet That Jony Ive Can Make AI Hardware Work
OpenAI has fully acquired Io, a joint venture it cocreated last year with Jony Ive, the famed British designer behind the sleek industrial aesthetic that defined the iPhone and more than two decades of Apple products. In a nearly 10-minute video posted to X on Wednesday, Ive and OpenAI CEO Sam Altman said the Apple pioneer's "creative collective" will "merge with OpenAI to work more intimately with the research, engineering, and product teams in San Francisco." OpenAI says it's paying $5 billion in equity to acquire Io. The promotional video included musings on technology from both Ive and Altman, set against the golden-hour backdrop of the streets of San Francisco, but the two never share exactly what it is they're building. "We look forward to sharing our work next year," a text statement at the end of the video reads.
The Time Sam Altman Asked for a Countersurveillance Audit of OpenAI
Dario Amodei's AI safety contingent was growing disquieted with some of Sam Altman's behaviors. Shortly after OpenAI's Microsoft deal was inked in 2019, several of them were stunned to discover the extent of the promises Altman had made to Microsoft about which technologies it would get access to in return for its investment. The terms of the deal didn't align with what they had understood from Altman. If AI safety issues actually arose in OpenAI's models, they worried, those commitments would make it far more difficult, if not impossible, to prevent the models' deployment. Amodei's contingent began to have serious doubts about Altman's honesty.
'Every person that clashed with him has left': the rise, fall and spectacular comeback of Sam Altman
The short-lived firing of Sam Altman, the CEO of possibly the world's most important AI company, was sensational. When he was sacked by OpenAI's board members, some of them believed the stakes could not have been higher – the future of humanity – if the organisation continued under Altman. Imagine Succession, with added apocalypse vibes. In early November 2023, after three weeks of secret calls and varying degrees of paranoia, the OpenAI board agreed: Altman had to go. After his removal, Altman's most loyal staff resigned, and others signed an open letter calling for his reinstatement.
Source framing triggers systematic evaluation bias in Large Language Models
Germani, Federico, Spitale, Giovanni
Large Language Models (LLMs) are increasingly used not only to generate text but also to evaluate it, raising urgent questions about whether their judgments are consistent, unbiased, and robust to framing effects. In this study, we systematically examine inter- and intra-model agreement across four state-of-the-art LLMs (OpenAI o3-mini, Deepseek Reasoner, xAI Grok 2, and Mistral) tasked with evaluating 4,800 narrative statements on 24 different topics of social, political, and public health relevance, for a total of 192,000 assessments. We manipulate the disclosed source of each statement to assess how attribution to either another LLM or a human author of specified nationality affects evaluation outcomes. We find that, in the blind condition, different LLMs display a remarkably high degree of inter- and intra-model agreement across topics. However, this alignment breaks down when source framing is introduced. Here we show that attributing statements to Chinese individuals systematically lowers agreement scores across all models, and in particular for Deepseek Reasoner. Our findings reveal that framing effects can deeply affect text evaluation, with significant implications for the integrity, neutrality, and fairness of LLM-mediated information systems.
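To make the blind-versus-framed comparison concrete, here is a minimal Python sketch, not the authors' code, that measures how much each disclosed source shifts a model's scores relative to the blind condition for the same statement. The record layout, the framing labels (such as "human:Chinese"), and the assumption that each assessment is a numeric agreement score are all hypothetical; the study's actual scale and agreement statistics may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical record: (model, framing, statement_id, score).
# "blind" marks the no-source condition; the numeric score scale is assumed.
Record = tuple[str, str, int, float]


def framing_shift(records: list[Record]) -> dict[tuple[str, str], float]:
    """Mean paired score change per (model, framing), relative to blind."""
    # Index each model's blind score by statement.
    blind: dict[tuple[str, int], float] = {}
    for model, framing, sid, score in records:
        if framing == "blind":
            blind[(model, sid)] = score

    # Compare every framed score with the same model's blind score
    # for the same statement.
    diffs: dict[tuple[str, str], list[float]] = defaultdict(list)
    for model, framing, sid, score in records:
        if framing != "blind" and (model, sid) in blind:
            diffs[(model, framing)].append(score - blind[(model, sid)])

    return {key: mean(values) for key, values in diffs.items()}


# Toy records invented for illustration, not data from the study.
toy = [
    ("model_a", "blind", 0, 4.0), ("model_a", "blind", 1, 3.0),
    ("model_a", "human:Chinese", 0, 3.0), ("model_a", "human:Chinese", 1, 2.0),
    ("model_a", "llm", 0, 4.0), ("model_a", "llm", 1, 3.5),
]
print(framing_shift(toy))
# → {('model_a', 'human:Chinese'): -1.0, ('model_a', 'llm'): 0.25}
```

A negative value means statements scored lower once the framing was disclosed. Pairing each framed score with the same statement's blind score keeps topic and statement difficulty from confounding the comparison.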
Understanding University Students' Use of Generative AI: The Roles of Demographics and Personality Traits
Deng, Newnew, Liu, Edward Jiusi, Zhai, Xiaoming
The use of generative AI (GAI) among university students is rapidly increasing, yet empirical research on students' GAI use and the factors influencing it remains limited. To address this gap, we surveyed 363 undergraduate and graduate students in the United States, examining their GAI usage and how it relates to demographic variables and personality traits based on the Big Five model (i.e., extraversion, agreeableness, conscientiousness, emotional stability, and intellect/imagination). Our findings reveal: (a) Students in higher academic years are more inclined to use GAI and prefer it over traditional resources. (b) Non-native English speakers use and adopt GAI more readily than native speakers. (c) Compared to White students, Asian students report higher GAI usage, perceive greater academic benefits, and express a stronger preference for it. Similarly, Black students report a more positive impact of GAI on their academic performance. Personality traits also play a significant role in shaping perceptions and usage of GAI. After controlling for demographic factors, we found that personality still significantly predicts GAI use and attitudes: (a) Students with higher conscientiousness use GAI less. (b) Students who are higher in agreeableness perceive a less positive impact of GAI on academic performance and express more ethical concerns about using it for academic work. (c) Students with higher emotional stability report a more positive impact of GAI on learning and fewer concerns about its academic use. (d) Students with higher extraversion show a stronger preference for GAI over traditional resources. (e) Students with higher intellect/imagination tend to prefer traditional resources. These insights highlight the need for universities to provide personalized guidance to ensure students use GAI effectively, ethically, and equitably in their academic pursuits.
NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search
Dai, Sunhao, Wang, Wenjie, Pang, Liang, Xu, Jun, Ng, See-Kiong, Wen, Ji-Rong, Chua, Tat-Seng
Generative AI search is reshaping information retrieval by offering end-to-end answers to complex queries, reducing users' reliance on manually browsing and summarizing multiple web pages. However, while this paradigm enhances convenience, it disrupts the feedback-driven improvement loop that has historically powered the evolution of traditional Web search. Web search engines can continuously improve their ranking models by collecting large-scale, fine-grained user feedback (e.g., clicks, dwell time) at the document level. In contrast, generative AI search operates through a much longer search pipeline, spanning query decomposition, document retrieval, and answer generation, yet typically receives only coarse-grained feedback on the final answer. This introduces a feedback loop disconnect: user feedback on the final output cannot be effectively mapped back to specific system components, making it difficult to improve each intermediate stage and sustain the feedback loop. In this paper, we envision NExT-Search, a next-generation paradigm designed to reintroduce fine-grained, process-level feedback into generative AI search. NExT-Search integrates two complementary modes: User Debug Mode, which allows engaged users to intervene at key stages; and Shadow User Mode, in which a personalized user agent simulates user preferences and provides AI-assisted feedback for less interactive users. Furthermore, we envision how these feedback signals can be leveraged through online adaptation, which refines current search outputs in real time, and offline update, which aggregates interaction logs to periodically fine-tune query decomposition, retrieval, and generation models. By restoring human control over key stages of the generative AI search pipeline, we believe NExT-Search offers a promising direction for building feedback-rich AI search systems that can evolve continuously alongside human feedback.
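The central idea, tying each piece of feedback to the pipeline stage it concerns rather than only to the final answer, can be sketched in Python as below. This is an illustrative data model, not NExT-Search's implementation: the `Stage` and `FeedbackEvent` names, the +1/-1 signal, and the choice to down-weight agent-generated (Shadow User Mode) feedback relative to direct user (User Debug Mode) feedback are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # The three pipeline stages named in the abstract.
    QUERY_DECOMPOSITION = "query_decomposition"
    RETRIEVAL = "retrieval"
    GENERATION = "generation"


@dataclass(frozen=True)
class FeedbackEvent:
    stage: Stage
    item_id: str  # a sub-query, document, or answer span (hypothetical IDs)
    signal: int   # +1 approve, -1 reject (assumed binary signal)
    source: str   # "user" (User Debug Mode) or "agent" (Shadow User Mode)


def aggregate(
    events: list[FeedbackEvent], agent_weight: float = 0.5
) -> dict[Stage, dict[str, float]]:
    """Net feedback per stage and item, e.g. for a periodic offline update.

    Down-weighting agent feedback is an illustrative assumption.
    """
    totals: dict[Stage, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for event in events:
        weight = 1.0 if event.source == "user" else agent_weight
        totals[event.stage][event.item_id] += weight * event.signal
    # Convert nested defaultdicts to plain dicts for the caller.
    return {stage: dict(items) for stage, items in totals.items()}


log = [
    FeedbackEvent(Stage.RETRIEVAL, "doc-42", -1, "user"),
    FeedbackEvent(Stage.RETRIEVAL, "doc-42", +1, "agent"),
    FeedbackEvent(Stage.GENERATION, "answer-1", +1, "user"),
]
print(aggregate(log))
```

Grouping by stage is what lets an offline update route retrieval complaints to the retriever and answer complaints to the generator, instead of crediting or blaming the whole pipeline at once.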
Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale
This study explores Large Language Models (LLMs) as autonomous agents for real-world tasks, including freelance software development. This work presents a new benchmark that evaluates LLMs on freelance programming and data analysis tasks derived from economic data. We construct the benchmark using synthetic tasks created from a Kaggle Freelancer dataset of job postings, with all job prices standardized to USD (median fixed-project price of around $250 and a mean of $306). Each task is accompanied by structured input-output test cases and an estimated price tag, enabling automated correctness checking and a monetary performance valuation. This approach is inspired by OpenAI's recent SWE-Lancer benchmark (1,400 real Upwork tasks worth $1M total). Still, our framework simplifies evaluation using programmatically testable tasks and predicted price values, making it highly scalable and repeatable. On this benchmark, we evaluate four modern LLMs: Claude 3.5 Haiku, GPT-4o-mini, Qwen 2.5, and Mistral. We report each model's accuracy (task success rate and test-case pass rate) and the total "freelance earnings" it achieves (the sum of prices of solved tasks). Our results show that Claude 3.5 Haiku performs best, earning approximately $1.52 million, followed closely by GPT-4o-mini at $1.49 million, then Qwen 2.5 ($1.33M) and Mistral ($0.70M). We analyze the distribution of errors per task and observe that the strongest models solve the most tasks and rarely fail completely on any project. We discuss the implications of these results for the feasibility of AI as a freelance developer, the advantages and limitations of our automated benchmark approach, and the gap between performance on structured tasks and the true complexity of real-world freelance jobs.
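The scoring rules in the abstract (task success rate, test-case pass rate, and earnings as the sum of prices of solved tasks) can be sketched in Python as follows. This is a hypothetical reconstruction, not the benchmark's code: the `Task` structure is invented for illustration, and treating a task as solved only when every one of its test cases passes is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Task:
    price_usd: float  # the task's estimated price tag
    # (input, expected_output) test cases; structure assumed for illustration.
    tests: list[tuple[Any, Any]] = field(default_factory=list)


def score(tasks: list[Task], solvers: list[Callable[[Any], Any]]) -> dict[str, float]:
    solved = passed = total = 0
    earnings = 0.0
    for task, solve in zip(tasks, solvers):
        results = []
        for inp, expected in task.tests:
            try:
                results.append(solve(inp) == expected)
            except Exception:
                results.append(False)  # a crash counts as a failed test case
        passed += sum(results)
        total += len(results)
        # Assumption: a task earns its price only if all of its tests pass.
        if results and all(results):
            solved += 1
            earnings += task.price_usd
    return {
        "task_success_rate": solved / len(tasks),
        "test_pass_rate": passed / total,
        "earnings_usd": earnings,
    }


# Toy tasks and "model" solutions, invented for illustration.
tasks = [Task(250.0, [(2, 4), (3, 6)]), Task(306.0, [(1, 1), (2, 4)])]
print(score(tasks, [lambda x: x * 2, lambda x: x * 2]))
# → {'task_success_rate': 0.5, 'test_pass_rate': 0.75, 'earnings_usd': 250.0}
```

Reporting both rates separates near-misses (most tests pass) from outright failures, which is the distinction behind the abstract's observation that the strongest models rarely fail completely on a project.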
With AI Mode, Google Search Is About to Get Even Chattier
Google is rolling out its AI Mode search experience to everyone in the US starting today. The chatbot-style addition to the company's search engine results page is designed to answer longer queries and uses Google's AI model to generate full responses based on, and linking back to, indexed websites on the open web. AI Mode is Google's direct response to the release of search engines from Silicon Valley startups like OpenAI and Perplexity, which provide chatbot-style answers to questions and queries. If all of this feels like déjà vu, that's because at last year's Google I/O developer conference, the company rolled out AI Mode's precursor, AI Overviews. In 2024, Google started to use its machine intelligence model to summarize the contents of the web and plaster a block of text at the top of the results for some queries.