Personal Assistant Systems
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Zhao, Henry Hengyuan, Pei, Wenqi, Tao, Yifei, Mei, Haiyang, Shou, Mike Zheng
Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework, which can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench that evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results indicate that even the state-of-the-art LMM, OpenAI-o1, struggles to refine its responses based on human feedback, achieving an average score of less than 50%. Our findings point to the need for methods that can enhance LMMs' capabilities to interpret and benefit from feedback. In this paper, we ask: "Can Large Multimodal Models evolve through Interactive Human Feedback?" This question is central to developing general-purpose AI assistants with Large Multimodal Models (LMMs). While these models show exceptional performance when tackling multimodal tasks directly, their ability to interact with humans remains largely unknown. We argue that an LMM functioning as a general assistant should possess two capabilities: 1) exceptional problem-solving ability and 2) the ability to improve itself through feedback (e.g., human feedback, execution results).
A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models
Da, Longchao, Turnau, Justin, Kutralingam, Thirulogasankar Pranav, Velasquez, Alvaro, Shakarian, Paulo, Wei, Hua
Deep Reinforcement Learning (RL) has been explored and verified to be effective in solving decision-making tasks in various domains, such as robotics, transportation, and recommender systems. It learns from interaction with environments and updates the policy using the collected experience. However, because real-world data is limited and detrimental actions can have unacceptable consequences, RL policies are mostly trained in simulators. This practice guarantees safety in learning but introduces an inevitable sim-to-real gap at deployment, causing degraded performance and risks in execution. There have been attempts to solve sim-to-real problems in different domains with various techniques, especially now that emerging techniques such as large foundation models and language models have shed new light on the sim-to-real problem. This survey paper, to the best of our knowledge, is the first taxonomy that formally frames the sim-to-real techniques from key elements of the Markov Decision Process (State, Action, Transition, and Reward). Based on the framework, we cover comprehensive literature from the classic to the most advanced methods including the sim-to-real techniques empowered by foundation models, and we also discuss the specialties that are worth attention in different domains of sim-to-real problems. Then we summarize the formal evaluation process of sim-to-real performance with accessible code or benchmarks. The challenges and opportunities are also presented to encourage future exploration of this direction. We are actively maintaining a repository to include the most up-to-date sim-to-real research work to help domain researchers.
Apple is delaying its smarter, more personal Siri
Apple is delaying its updated version of Siri that understands personal context and can take action inside of apps, according to a statement the company shared with Daring Fireball. The company didn't offer a date as to when the upgrades to Siri will actually launch beyond that they're "rolling them out in the coming year." Here's the full statement reproduced below: Siri helps our users find what they need and get things done quickly, and in just the past six months, we've made Siri more conversational, introduced new features like type to Siri and product knowledge, and added an integration with ChatGPT. We've also been working on a more personalized Siri, giving it more awareness of your personal context, as well as the ability to take action for you within and across your apps. It's going to take us longer than we thought to deliver on these features and we anticipate rolling them out in the coming year.
Best Sonos Speakers (2025): Soundbars, Turntables, and More
After flooding our homes with every Sonos model you can buy (and filling all remaining space with the boxes of said speakers), then using them for a couple of years, we've come to value their audio fidelity and ability to network seamlessly together. There isn't another speaker system that lets you string together multiple speakers as easily or connect them to stream in different rooms of your home while keeping the audio perfectly in sync. The closest thing may be Google Assistant speakers, and Sonos connects to that system as well. Easy streaming: The Sonos app supports almost every streaming service in existence, and many apps, like Spotify, let you stream to Sonos speakers within them. The Sonos ecosystem can also handle home-theater applications and can support a full surround-sound setup.
Can Matchmaking Platforms Save Us From Dating App Fatigue?
One might assume, with good reason, that a romantic recession is underway. That's the story the numbers tell, at least. Forty-seven percent of US adults say dating is more difficult today than it was a decade ago, according to a Pew Research Center analysis. Even as singledom is on a downward slope--in 2023, 42 percent of adults were unpartnered compared to 44 percent in 2019, a different Pew survey found--it doesn't feel that way. The dating landscape is in the throes of another tectonic shift.
Alarming number of Americans scammed out of life savings have one thing in common, prompting lawmaker response
EXCLUSIVE: As romance scams are on the rise, a bipartisan group of lawmakers is introducing new legislation aimed at holding accountable those who seek to defraud retirees and steal their hard-earned savings. U.S. Sens. Marsha Blackburn, R-Tenn., and John Hickenlooper, D-Colo., and Rep. David Valadao, R-Calif., introduced the Romance Scam Prevention Act, which would require dating apps and services to issue fraud ban notifications to users who have interacted with a person removed from the app. The move comes as Americans are more connected than ever thanks to social media and dating apps that allow us to stay in touch with old friends all over the world and to develop new relationships online. As Americans increasingly go online in search of relationships, scammers are following suit. According to the Federal Trade Commission (FTC), in 2022 almost 70,000 people reported being victims of a romance scam.
Benchmarking LLMs in Recommendation Tasks: A Comparative Evaluation with Conventional Recommenders
Liu, Qijiong, Zhu, Jieming, Fan, Lu, Wang, Kun, Hu, Hengchang, Guo, Wei, Liu, Yong, Wu, Xiao-Ming
In recent years, integrating large language models (LLMs) into recommender systems has created new opportunities for improving recommendation quality. However, a comprehensive benchmark is needed to thoroughly evaluate and compare the recommendation capabilities of LLMs with traditional recommender systems. In this paper, we introduce RecBench, which systematically investigates various item representation forms (including unique identifier, text, semantic embedding, and semantic identifier) and evaluates two primary recommendation tasks, i.e., click-through rate prediction (CTR) and sequential recommendation (SeqRec). Our extensive experiments cover up to 17 large models and are conducted across five diverse datasets from fashion, news, video, books, and music domains. Our findings indicate that LLM-based recommenders outperform conventional recommenders, achieving up to a 5% AUC improvement in the CTR scenario and up to a 170% NDCG@10 improvement in the SeqRec scenario. However, these substantial performance gains come at the expense of significantly reduced inference efficiency, rendering the LLM-as-RS paradigm impractical for real-time recommendation environments. We aim for our findings to inspire future research, including recommendation-specific model acceleration methods. We will release our code, data, configurations, and platform to enable other researchers to reproduce and build upon our experimental results.
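To make the SeqRec metric above concrete, here is a minimal sketch of NDCG@10 with binary relevance, plus the relative-improvement arithmetic behind a figure such as "170%". The function names and the toy numbers are illustrative assumptions; this is not RecBench's evaluation code, and the paper's exact protocol may differ.

```python
import math

def ndcg_at_k(ranked_items, relevant, k=10):
    """NDCG@k with binary relevance (an item is either relevant or not)."""
    # DCG: each relevant item at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

# Hypothetical next-item prediction: the true next item is "a".
print(ndcg_at_k(["b", "a", "c"], {"a"}))   # target ranked second
# Hypothetical scores, chosen only to illustrate a 170% relative gain.
print(relative_improvement(0.10, 0.27))
```

With a single target item, as is common in sequential recommendation, NDCG@10 reduces to 1/log2(rank + 1) when the target appears in the top 10 and to 0 otherwise, so a large relative gain can correspond to a modest absolute change when the baseline score is small.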
The arrogant ex-soldier who turned into a triple killer
Former soldier Kyle Clifford raped and murdered Louise Hunt, and killed her sister Hannah and mother Carol in attacks described by police as "barbaric". What happened and what has emerged since? Days before the attacks, Louise had ended an 18-month relationship with Clifford. She told Clifford, who she had met through a dating app, it was "sucking the life out of me". They did not like the way Clifford treated Louise, finding him disrespectful, arrogant, rude and "odd". He had hidden relationships with other women from Louise, and went on a dating site moments after receiving the message ending theirs.
Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences
Shahid, Adnan, Kliks, Adrian, Al-Tahmeesschi, Ahmed, Elbakary, Ahmed, Nikou, Alexandros, Maatouk, Ali, Mokh, Ali, Kazemi, Amirreza, De Domenico, Antonio, Karapantelakis, Athanasios, Cheng, Bo, Yang, Bo, Wang, Bohao, Fischione, Carlo, Zhang, Chao, Issaid, Chaouki Ben, Yuen, Chau, Peng, Chenghui, Huang, Chongwen, Chaccour, Christina, Thomas, Christo Kurisummoottil, Sharma, Dheeraj, Kalogiros, Dimitris, Niyato, Dusit, De Poorter, Eli, Mhanna, Elissa, Strinati, Emilio Calvanese, Bader, Faouzi, Abdeldayem, Fathi, Wang, Fei, Zhu, Fenghao, Fontanesi, Gianluca, Geraci, Giovanni, Zhou, Haibo, Purmehdi, Hakimeh, Ahmadi, Hamed, Zou, Hang, Du, Hongyang, Lee, Hoon, Yang, Howard H., Poli, Iacopo, Carron, Igor, Chatzistefanidis, Ilias, Lee, Inkyu, Pitsiorlas, Ioannis, Fontaine, Jaron, Wu, Jiajun, Zeng, Jie, Li, Jinan, Karam, Jinane, Gemayel, Johny, Deng, Juan, Frison, Julien, Huang, Kaibin, Qiu, Kehai, Ball, Keith, Wang, Kezhi, Guo, Kun, Tassiulas, Leandros, Gwenole, Lecorve, Yue, Liexiang, Bariah, Lina, Powell, Louis, Dryjanski, Marcin, Galdon, Maria Amparo Canaveras, Kountouris, Marios, Hafeez, Maryam, Elkael, Maxime, Bennis, Mehdi, Boudjelli, Mehdi, Dai, Meiling, Debbah, Merouane, Polese, Michele, Assaad, Mohamad, Benzaghta, Mohamed, Refai, Mohammad Al, Djerrab, Moussab, Syed, Mubeen, Amir, Muhammad, Yan, Na, Alkaabi, Najla, Li, Nan, Sehad, Nassim, Nikaein, Navid, Hashash, Omar, Sroka, Pawel, Yang, Qianqian, Zhao, Qiyang, Silab, Rasoul Nikbakht, Ying, Rex, Morabito, Roberto, Li, Rongpeng, Madi, Ryad, Ayoubi, Salah Eddine El, D'Oro, Salvatore, Lasaulce, Samson, Shalmashi, Serveh, Liu, Sige, Cherrared, Sihem, Chetty, Swarna Bindu, Dutta, Swastika, Zaidi, Syed A. R., Chen, Tianjiao, Murphy, Timothy, Melodia, Tommaso, Quek, Tony Q. S., Ram, Vishnu, Saad, Walid, Hamidouche, Wassim, Chen, Weilong, Liu, Xiaoou, Yu, Xiaoxue, Wang, Xijun, Shang, Xingyu, Wang, Xinquan, Cao, Xuelin, Su, Yang, Liang, Yanping, Deng, Yansha, Yang, Yifan, Cui, Yingping, Sun, Yu, Chen, Yuxuan, Pointurier, Yvan, Nehme, Zeinab, Nezami, Zeinab, Yang, Zhaohui, Zhang, Zhaoyang, Liu, Zhe, Yang, Zhenyu, Han, Zhu, Zhou, Zhuang, Chen, Zihan, Chen, Zirui, Shuai, Zitao
The rise of generative artificial intelligence (AI) as a novel frontier that uniquely merges advanced levels of intelligence with revolutionary user experiences is redefining the AI landscape for future cellular networks. In particular, the transition towards 6G systems has introduced a myriad of challenges inherent to their AI-native network design, requiring innovative solutions to enable real-time network orchestration, intelligent decision-making, and adaptive dynamic configurations. Meanwhile, the envisioned user experiences for 6G are growing increasingly complex, exceeding the capabilities offered by vintage wireless technologies and conventional AI solutions to satisfy their advanced demands. With its disruptive impact evident across diverse fields, generative AI possesses immense potential to tackle these challenges, leveraging its exceptional capabilities to manage complex tasks, operate autonomously, and adapt seamlessly to scenarios beyond its training domain. Remarkably, generative AI provides a transformative opportunity for telecom and cellular networks to bridge this defined gap in 6G systems, thereby shifting towards a new era with cutting-edge AI innovations across the different system and user levels.
Matrix Factorization for Inferring Associations and Missing Links
Barron, Ryan, Eren, Maksim E., Truong, Duc P., Matuszek, Cynthia, Wendelberger, James, Dorn, Mary F., Alexandrov, Boian
Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. Missing link prediction identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology - a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selection of the matrix rank parameter, that is, of the number of hidden features, k, to avoid over/under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation by evaluating stability and accuracy using a modified bootstrap methodology and uncertainty quantification (UQ), assessing prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization, comparing them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of rank k selection, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showcasing an improved prediction performance.
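As a rough illustration of the general idea behind matrix-factorization link prediction, the sketch below fits a rank-k non-negative factorization to the observed entries of an adjacency matrix only, then reads the reconstruction at a held-out entry as a link score. It uses standard masked multiplicative updates and a fixed, hand-picked k; it is not the authors' WNMFk, BNMFk, or RNMFk, and it does not perform their automatic rank estimation, thresholding, or uncertainty quantification. The function name, the toy graph, and all parameter values are assumptions.

```python
import numpy as np

def masked_nmf(X, mask, k, n_iter=200, seed=0, eps=1e-9):
    """Fit X ~ W @ H with W, H >= 0, using only entries where mask == 1."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    MX = mask * X  # zero out the hidden entries
    for _ in range(n_iter):
        # Multiplicative updates for the masked Frobenius objective;
        # eps keeps the denominators strictly positive.
        W *= (MX @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MX) / (W.T @ (mask * (W @ H)) + eps)
    return W, H

# Toy graph (hypothetical): two communities of three nodes each.
X = np.zeros((6, 6))
X[:3, :3] = 1.0
X[3:, 3:] = 1.0

# Hide the link between nodes 0 and 1 (both directions) from training.
mask = np.ones_like(X)
mask[0, 1] = mask[1, 0] = 0.0

W, H = masked_nmf(X, mask, k=2)
scores = W @ H
print("score for hidden link (0, 1):", scores[0, 1])
print("score for non-link    (0, 4):", scores[0, 4])
```

In this sketch a higher reconstructed score at a hidden position suggests a more plausible missing link. The choice k=2 is made by hand here; the abstract's point is precisely that this rank should be selected automatically rather than guessed.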