AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms

Yan, Yuwei, Shang, Yu, Zeng, Qingbin, Li, Yu, Zhao, Keyu, Zheng, Zhiheng, Ning, Xuefei, Wu, Tianji, Yan, Shengen, Wang, Yu, Xu, Fengli, Li, Yong

arXiv.org Artificial Intelligence

The AgentSociety Challenge is the first competition at the Web Conference that aims to explore the potential of Large Language Model (LLM) agents in modeling user behavior and enhancing recommender systems on web platforms. The Challenge consists of two tracks: the User Modeling Track and the Recommendation Track. Participants are tasked with developing innovative LLM agents using a combined dataset from Yelp, Amazon, and Goodreads, along with an interactive environment simulator. The Challenge attracted 295 teams across the globe and received over 1,400 submissions in total over the course of 37 official competition days. Participants achieved performance improvements of 21.9% and 20.3% for Track 1 and Track 2 in the Development Phase, and 9.1% and 15.9% in the Final Phase, a significant accomplishment. This paper discusses the detailed design of the Challenge, analyzes the outcomes, and highlights the most successful LLM agent designs. To support further research and development, we have open-sourced the benchmark environment at https://tsinghua-fib-lab.github.io/AgentSocietyChallenge.


Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

Cherepanov, Egor, Kachaev, Nikita, Kovalev, Alexey K., Panov, Aleksandr I.

arXiv.org Artificial Intelligence

Memory is crucial for enabling agents to tackle complex tasks with temporal and spatial dependencies. While many reinforcement learning (RL) algorithms incorporate memory, the field lacks a universal benchmark to assess an agent's memory capabilities across diverse scenarios. This gap is particularly evident in tabletop robotic manipulation, where memory is essential for solving tasks with partial observability and ensuring robust performance, yet no standardized benchmarks exist. To address this, we introduce MIKASA (Memory-Intensive Skills Assessment Suite for Agents), a comprehensive benchmark for memory RL, with three key contributions: (1) we propose a comprehensive classification framework for memory-intensive RL tasks, (2) we collect MIKASA-Base - a unified benchmark that enables systematic evaluation of memory-enhanced agents across diverse scenarios, and (3) we develop MIKASA-Robo - a novel benchmark of 32 carefully designed memory-intensive tasks that assess memory capabilities in tabletop robotic manipulation. Our contributions establish a unified framework for advancing memory RL research, driving the development of more reliable systems for real-world applications. The code is available at https://sites.google.com/view/memorybenchrobots/.


Council Post: Artificial Intelligence Platforms Will Drive The Next Phase Of Trade Finance Growth

#artificialintelligence

Trade finance refers to the products and financial instruments used to facilitate the export and import of goods and services and, thereby, the smooth conduct of business. Some of the most popular instruments in trade finance are letters of credit (LC), bank guarantees (BG), documentary collections, and remittances. Essentially, these instruments have one primary function: enabling the parties to a trade to complete a transaction while mitigating the associated supply and payment risks. Trade finance drives the global economy, and the segment will only grow in the future, notwithstanding temporary setbacks such as the Covid-19 pandemic or geopolitical conflicts.


A Stable AI Optimization Algorithm Implementation Using Rust

#artificialintelligence

Its name is the 'artificial bee colony algorithm', and it follows a pattern observed in nature among bees. Initialization: a fixed number of N 'bees' is positioned randomly (uniformly) in the domain. Local search: in each step, each bee tries to find a better position by selecting another bee at random and moving a random distance towards or away from that counterpart; the move is actually performed only if the new position is better than the current one. Onlooker phase: once all bees have performed the local search, they are relocated according to a categorical distribution over fitness values, where each bee's fitness is the distance of its objective value to that of the current 'worst' bee.
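The steps above can be sketched in Rust as follows. This is a minimal illustration, not the article's implementation: the toy objective f(x) = x^2, the parameter values, and the tiny xorshift PRNG (used here to avoid depending on the external `rand` crate, which would be the idiomatic choice) are all assumptions.

```rust
// Minimal artificial-bee-colony sketch: minimize f(x) = x^2 over [lo, hi].
struct Rng(u64);

impl Rng {
    // xorshift64: returns a pseudo-random f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Toy objective to minimize (an assumption for illustration).
fn objective(x: f64) -> f64 {
    x * x
}

fn abc_minimize(n_bees: usize, iterations: usize, lo: f64, hi: f64, rng: &mut Rng) -> f64 {
    // Initialization: N bees positioned uniformly at random in the domain.
    let mut bees: Vec<f64> = (0..n_bees).map(|_| lo + (hi - lo) * rng.next_f64()).collect();

    for _ in 0..iterations {
        // Local search: each bee moves a random distance towards or away
        // from a randomly chosen counterpart, keeping the move only if
        // the new position is better.
        for i in 0..n_bees {
            let k = (rng.next_f64() * n_bees as f64) as usize % n_bees;
            let phi = 2.0 * rng.next_f64() - 1.0; // step factor in [-1, 1]
            let candidate = (bees[i] + phi * (bees[i] - bees[k])).clamp(lo, hi);
            if objective(candidate) < objective(bees[i]) {
                bees[i] = candidate;
            }
        }

        // Onlooker phase: fitness is each bee's distance to the current
        // worst objective value; relocate all bees by sampling positions
        // from the categorical distribution proportional to fitness.
        let worst = bees.iter().map(|&b| objective(b)).fold(f64::NEG_INFINITY, f64::max);
        let fitness: Vec<f64> = bees.iter().map(|&b| worst - objective(b) + 1e-12).collect();
        let total: f64 = fitness.iter().sum();
        let old = bees.clone();
        for slot in bees.iter_mut() {
            let mut r = rng.next_f64() * total;
            let mut j = 0;
            while j + 1 < n_bees && r > fitness[j] {
                r -= fitness[j];
                j += 1;
            }
            *slot = old[j];
        }
    }

    // Return the best position found.
    bees.into_iter()
        .min_by(|a, b| objective(*a).partial_cmp(&objective(*b)).unwrap())
        .unwrap()
}
```

One caveat the article's summary glosses over: because the onlooker phase resamples positions with replacement, the swarm can lose diversity over time, so production implementations usually add a "scout" step that re-randomizes stagnant bees.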


Final Report on MITRE Evaluations for the DARPA Big Mechanism Program

Peterson, Matthew, Korves, Tonia, Garay, Christopher, Kozierok, Robyn, Hirschman, Lynette

arXiv.org Artificial Intelligence

This report presents the evaluation approach developed for the DARPA Big Mechanism program, which aimed to develop computer systems that read research papers, integrate the information into a computer model of cancer mechanisms, and frame new hypotheses. We employed an iterative, incremental approach to the evaluation of the three phases of the program. In Phase I, we evaluated the ability of systems and human teams to read with a model, capturing mechanistic information from the biomedical literature and integrating it with information from expert-curated biological databases. In Phase II, we evaluated the ability of systems to assemble fragments of information into a mechanistic model. The Phase III evaluation focused on the ability of systems to explain experimental observations based on models assembled (largely automatically) by the Big Mechanism process. The evaluation for each phase built on earlier evaluations and guided developers toward creating capabilities for the new phase. The report describes our approach, including innovations such as a reference set (a curated data set limited to the major findings of each paper) for assessing the accuracy of systems in extracting mechanistic findings in the absence of a gold standard, and a method for evaluating model-based explanations of experimental data. Results of the evaluation and supporting materials are included in the appendices.


ANACONDA: An Improved Dynamic Regret Algorithm for Adaptive Non-Stationary Dueling Bandits

Buening, Thomas Kleine, Saha, Aadirupa

arXiv.org Artificial Intelligence

We study the problem of non-stationary dueling bandits and provide the first adaptive dynamic regret algorithm for this problem. The only two existing attempts in this line of work fall short across multiple dimensions, including pessimistic measures of non-stationary complexity and non-adaptive parameter tuning that requires knowledge of the number of preference changes. We develop an elimination-based rescheduling algorithm to overcome these shortcomings and show a near-optimal $\tilde{O}(\sqrt{S^{\texttt{CW}} T})$ dynamic regret bound, where $S^{\texttt{CW}}$ is the number of times the Condorcet winner changes in $T$ rounds. This yields the first near-optimal dynamic regret algorithm for unknown $S^{\texttt{CW}}$. We further study other related notions of non-stationarity for which we also prove near-optimal dynamic regret guarantees under additional assumptions on the underlying preference model.


A Universal Error Measure for Input Predictions Applied to Online Graph Problems

Bernardini, Giulia, Lindermayr, Alexander, Marchetti-Spaccamela, Alberto, Megow, Nicole, Stougie, Leen, Sweering, Michelle

arXiv.org Artificial Intelligence

We introduce a novel measure for quantifying the error in input predictions. The error is based on a minimum-cost hyperedge cover in a suitably defined hypergraph and provides a general template which we apply to online graph problems. The measure captures errors due to absent predicted requests as well as unpredicted actual requests; hence, predicted and actual inputs can be of arbitrary size. We achieve refined performance guarantees for previously studied network design problems in the online-list model, such as Steiner tree and facility location. Further, we initiate the study of learning-augmented algorithms for online routing problems, such as the online traveling salesperson problem and the online dial-a-ride problem, where (transportation) requests arrive over time (online-time model). We provide a general algorithmic framework and we give error-dependent performance bounds that improve upon known worst-case barriers, when given accurate predictions, at the cost of slightly increased worst-case bounds when given predictions of arbitrary quality.


The Next Phase of The Web Would Be Driven by AI

#artificialintelligence

Reading an article, watching a video on TikTok or YouTube, or listening to a podcast while out running, you feel you have a reasonable expectation that the content you're consuming was created by a human being. In fact, there is good reason to assume at least part of it was created or assisted by an AI, some form of NLP (natural language processing), or another machine learning algorithm. Whether it's a TikTok video about a viral trend, an article in a renowned newspaper, or an image accompanying a news story on television, chances are some form of AI generation has taken place between the idea of the story and the story reaching you. The image may have been generated using DALL·E 2 or another image-generating AI; the title, lede, or social media text may have been written by an NLP model; it's quite likely that part or all of the text was written by an AI from the prompts and prior writings of a human creator; and if you leave your young kids watching YouTube videos, there's a very high chance they'll encounter videos entirely conceived of and generated by an AI. Whereas Web 1.0 was defined by people publishing content using HTML, CSS (and eventually JavaScript), and Web 2.0 was defined by people publishing content through user-friendly applications that generated the HTML, CSS, and JavaScript for them, the next stage of the web is being defined right now.


Why Businesses Are Still in The 'AI Adolescence' Phase

#artificialintelligence

AI maturity comes down to mastering critical capabilities in the right combinations, not only in data and AI but also in organizational strategy, talent, and culture. The AI transformation is occurring much faster than the digital transformation did, because early successes have increased faith in AI as a value driver, so there is a significant incentive to move rapidly. According to new research from Accenture, 'The Art of AI Maturity', 63% of 1,200 companies surveyed were identified as "Experimenters," companies stuck in the experimentation phase of their AI journeys. They risk leaving money on the table, since they haven't fully tapped the technology's potential to innovate and revolutionize their industry. The companies with the most advanced AI, by contrast, are already capturing that value.