Who's to Blame When AI Agents Screw Up?
Over the past year, veteran software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could, in the near future, order meals and engineer mobile apps almost entirely on their own. His agents, while surprisingly capable, have also exposed new legal questions that await companies trying to capitalize on Silicon Valley's hottest new technology. Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills upon request, Microsoft and other tech giants expect that agents will tackle more complex functions--and most importantly, do it with little human oversight. The tech industry's most ambitious plans involve multi-agent systems, with dozens of agents someday teaming up to replace entire workforces.
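To make "acting mostly independently" concrete, here is a minimal sketch of the loop such agents typically run: the model picks a tool, a harness executes it, and the result feeds the next decision. Everything here is a hypothetical stand-in (a scripted call_llm replaces a real model API); it is not any vendor's actual agent framework.

```python
# Scripted stand-in for a real LLM; a real agent would call a model API here.
SCRIPTED_DECISIONS = iter([
    "order_meal: one veggie pizza",
    "DONE",
])

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned decision."""
    return next(SCRIPTED_DECISIONS)

# Hypothetical tools the agent is allowed to invoke.
TOOLS = {
    "order_meal": lambda args: f"ordered {args}",
    "answer_customer": lambda args: f"replied to customer: {args}",
}

def run_agent(goal: str, max_steps: int = 5) -> list:
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        decision = call_llm("\n".join(history))  # model chooses the next action
        if decision.startswith("DONE"):
            break
        tool, _, args = decision.partition(":")
        handler = TOOLS.get(tool.strip())
        if handler is None:  # guard against the model naming a nonexistent tool
            history.append(f"ERROR: unknown tool {tool.strip()!r}")
            continue
        history.append(f"{decision} -> {handler(args.strip())}")
    return history

print(run_agent("order dinner for the team"))
```

The legal questions in the article start exactly at the guard line above: when no human reviews each step, an unknown-tool error, or a confidently wrong tool call, becomes someone's liability.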
Interview with Gillian Hadfield: Normative infrastructure for AI alignment
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, Gillian Hadfield. We spoke about her interdisciplinary research, career trajectory, path into AI alignment, law, and general thoughts on AI systems. Transcript: Note: the transcript has been lightly edited for clarity. This is an interview with Professor Gillian Hadfield, who was a keynote speaker at IJCAI 2024. She gave a very insightful talk about normative infrastructures and how they can guide our search for AI alignment. Kumar Kshitij Patel (KKP): Could you talk a bit about your background and career trajectory? I want our readers to understand how much interdisciplinary work you've done over the years. Gillian Hadfield (GH): I did a PhD in economics and a law degree, a JD, at Stanford, originally motivated by wanting to think about the big questions about the world. I read John Rawls's A Theory of Justice when I was an undergraduate, and those are the big questions: how do we organize the world and build just institutions? But I was very interested in using more formal methods and social-scientific approaches, and that's why I decided to do that joint degree. So this was in the 1980s, in the early days of economists starting to use a lot of game theory. I studied information economics as a student of Ken Arrow and Paul Milgrom in the economics department at Stanford, and I did work on contract theory and bargaining theory. But I was still very interested in going to law school, not to practice law, but to learn about legal institutions and how those work. Early in my career I was a member of the emerging field of law and economics, which, of course, was interdisciplinary, using economics to think about law and legal institutions.
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
You Chen, Department of Computer Science, Tsinghua University
Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce LexEval, a standardized and comprehensive Chinese legal benchmark. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks.
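The taxonomy matters operationally: results are reported per legal cognitive ability rather than as one pooled number, so weakness in, say, legal reasoning is not masked by strong memorization. A minimal sketch of that bookkeeping, assuming a hypothetical record format (the real LexEval schema may differ):

```python
from collections import defaultdict

# Hypothetical item format for illustration; the real LexEval schema may differ.
results = [
    {"ability": "memorization", "prediction": "A", "answer": "A"},
    {"ability": "reasoning",    "prediction": "B", "answer": "C"},
    {"ability": "reasoning",    "prediction": "C", "answer": "C"},
]

totals = defaultdict(int)
correct = defaultdict(int)
for item in results:
    totals[item["ability"]] += 1
    correct[item["ability"]] += item["prediction"] == item["answer"]

# Report accuracy per ability in the taxonomy, not one aggregate score.
for ability, n in totals.items():
    print(f"{ability}: {correct[ability] / n:.0%}")
# memorization: 100%
# reasoning: 50%
```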
Report: Creating a 5-second AI video is like running a microwave for an hour
You've probably heard the statistic that every search on ChatGPT uses the equivalent of a bottle of water. And while that's technically true, it misses some of the nuance. The MIT Technology Review dropped a massive report that reveals how the artificial intelligence industry uses energy -- and exactly how much energy it costs to use a service like ChatGPT. The report determined that responses from large language models like ChatGPT cost anywhere from 114 joules to 6,706 joules each -- that's the difference between running a microwave for one-tenth of a second and running it for eight seconds. The lower-energy models, according to the report, use less energy because they use fewer parameters, which also means their answers tend to be less accurate.
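The microwave comparison is just a unit conversion, and it is easy to check. A quick sketch, assuming a typical household microwave drawing roughly 800 watts (800 joules per second), a figure that is an assumption here, not taken from the report:

```python
MICROWAVE_WATTS = 800  # assumed typical microwave draw, not from the report

def microwave_seconds(joules: float) -> float:
    """Convert an energy cost in joules to equivalent microwave runtime."""
    return joules / MICROWAVE_WATTS

for label, joules in [("low-end response", 114), ("high-end response", 6706)]:
    print(f"{label}: {joules} J ~= {microwave_seconds(joules):.2f} s of microwave time")
# low-end response: 114 J ~= 0.14 s of microwave time
# high-end response: 6706 J ~= 8.38 s of microwave time
```

Those figures line up with the article's one-tenth-of-a-second and roughly-eight-second comparisons.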
OpenAI taps iPhone designer Jony Ive to develop AI devices
On Wednesday, OpenAI announced that it had acquired the startup of iPhone designer Jony Ive, a big win for the company. Ive's startup is called io, and the purchase price is nearly $6.5 billion, according to Bloomberg, which would make it OpenAI's biggest acquisition to date. The official announcement didn't contain much detail and mostly consisted of Altman and Ive gushing about each other. "Two years ago, Jony Ive and the creative collective LoveFrom, quietly began collaborating with Sam Altman and the team at OpenAI. A collaboration built upon friendship, curiosity and shared values quickly grew in ambition. Tentative ideas and explorations evolved into tangible designs. The ideas seemed important and useful. They were optimistic and hopeful. They reminded us of a time when we celebrated human achievement, grateful for new tools that helped us learn, explore and create...We gathered together the best hardware and software engineers, the best technologists, physicists, scientists, researchers and experts in product development and manufacturing. Many of us have worked closely for decades. The io team, focused on developing products that inspire, empower and enable, will now merge with OpenAI to work more intimately with the research, engineering and product teams in San Francisco."
SugarCrepe++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics are not well understood.
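One common way to quantify that divergence is to embed sentence pairs and compare similarities: a paraphrase pair should score near 1.0, while a lexically similar but semantically different pair should score lower. A minimal sketch using an off-the-shelf sentence encoder (the checkpoint name is just an example, not one prescribed by this paper):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Example encoder only; the paper does not prescribe this checkpoint.
model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a, b, c = model.encode([
    "The cat sat on the mat.",
    "On the mat, a cat was sitting.",  # same meaning, different lexical form
    "The cat sat on the hat.",         # similar wording, different meaning
])
print("paraphrase pair:     ", round(cosine(a, b), 3))  # ideally near 1.0
print("changed-meaning pair:", round(cosine(a, c), 3))  # ideally much lower
```

A model that truly encoded semantics would keep the first similarity high and the second noticeably lower; the dataset probes exactly where this ordering breaks down.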
Cannes Is Rolling Out the Red Carpet for One of This Century's Most Controversial Figures
Although the Cannes Film Festival is the world's most prestigious movie showcase, its spotlight rarely falls on nonfiction film. Years go by without a single documentary competing for its biggest honor, the Palme d'Or, and there is no separate documentary prize. Juliette Binoche, the president of this year's jury, devoted part of her opening-night remarks to Fatma Hassona, the Palestinian photojournalist who was killed in an Israeli airstrike the day after it was announced that her documentary Put Your Soul on Your Hand and Walk would be premiering at Cannes. But the film itself was slotted into a low-profile sidebar devoted to independent productions. The festival did, however, roll out the red carpet for The Six Billion Dollar Man, Eugene Jarecki's portrait of WikiLeaks founder Julian Assange, which premiered out of competition on Wednesday evening.
Dataset                  Scenario   #Feedback types   #Users   #Interactions
/                        Single     1                 138K     20M
Amazon                   Multiple   2                 /        233M
Yelp                     Single     1                 1.9M     8M
YOOCHOOSE                Single     2                 9.2M     34M
Taobao: User-Behavior    /          /                 /        /
K and M are short for thousand and million, respectively. True_neg denotes whether a dataset includes true negative feedback. We only show the statistics of the QK-video (QKV) and QK-article (QKA) scenarios in this table. We show the differences between Tenrec and other popular recommendation datasets in Table 1. First, most datasets contain only a single scenario; without overlapping users and items, it is difficult to develop and evaluate transfer-learning recommendation methods. Second, Tenrec contains very rich positive user feedback, which can be used to evaluate multi-task learning and preference-level transfer-learning tasks. Third, compared with most recommendation datasets, Tenrec has true negative examples, which can be used to evaluate a more realistic CTR (click-through rate) prediction task.
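True negatives change what evaluation measures: the model is scored on items users actually saw and declined, rather than on randomly sampled items they may simply never have seen. A minimal sketch of that CTR evaluation, with made-up labels and scores for illustration:

```python
from sklearn.metrics import roc_auc_score

# Made-up example: 1 = click, 0 = true negative (shown but not clicked),
# the kind of explicit negative feedback Tenrec records.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]  # model's predicted click probabilities

# AUC here ranks clicks against genuine non-clicks, not against random
# unseen items, which is what makes the evaluation more realistic.
print("CTR AUC:", round(roc_auc_score(labels, scores), 3))  # CTR AUC: 0.889
```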
Welcome to Google AI Mode! Everything is fine.
If the AI lovefest of Google I/O 2025 were a TV show, you might be tempted to call it It's Always Sunny in Mountain View. But here's a better sitcom analogy for the event that added AI Mode to all U.S. search results, whether we want it or not. It's The Good Place, in which our late heroes are repeatedly assured that they've gone to a better world. A place where everything is fine, all is as it seems, and search quality just keeps getting better. Don't worry about ever-present and increasing AI hallucinations here in the Good Place, where the word "hallucination" isn't even used.
'Every person that clashed with him has left': the rise, fall and spectacular comeback of Sam Altman
The short-lived firing of Sam Altman, the CEO of possibly the world's most important AI company, was sensational. When OpenAI's board members sacked him, some of them believed the stakes could not have been higher: in their eyes, the future of humanity itself was at risk if the organisation continued under Altman. Imagine Succession, with added apocalypse vibes. In early November 2023, after three weeks of secret calls and varying degrees of paranoia, the OpenAI board agreed: Altman had to go. After his removal, Altman's most loyal staff resigned, and others signed an open letter calling for his reinstatement.