Personal
Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation
Liu, Jian, Sun, Wei, Yang, Hui, Deng, Pengchao, Liu, Chongpei, Sebe, Nicu, Rahmani, Hossein, Mian, Ajmal
Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potential for generalization to intra-class unknown objects. However, these methods require manual collection and labeling of large-scale real-world training data. To address this problem, we introduce a diffusion-based paradigm for domain-generalized category-level 9-DoF object pose estimation. Our motivation is to leverage the latent generalization ability of the diffusion model to address the domain generalization challenge in object pose estimation. This entails training the model exclusively on rendered synthetic data to achieve generalization to real-world scenes. We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective. Our model does not require any 3D shape priors during training or inference. By employing the Denoising Diffusion Implicit Model, we demonstrate that the reverse diffusion process can be executed in as few as 3 steps, achieving near real-time performance. Finally, we design a robotic grasping system comprising both hardware and software components. Through comprehensive experiments on two benchmark datasets and the real-world robotic system, we show that our method achieves state-of-the-art domain generalization performance. Our code will be made public at https://github.com/CNJianLiu/Diff9D.
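The claim that the reverse process can run in as few as 3 steps relies on DDIM's deterministic update, which lets the sampler skip most timesteps. Below is a minimal sketch of generic DDIM sampling (eta = 0) in plain Python. It is not Diff9D's implementation. The linear beta schedule (1e-4 to 0.02 over T = 1000), the 9-D vector layout (3 rotation parameters, 3 translation, 3 size), `TARGET`, and `oracle_eps` are all hypothetical. `oracle_eps` stands in for the trained, observation-conditioned noise-prediction network and returns the exact noise, which makes the sampler's arithmetic easy to check.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def linear_alphas_cumprod(T: int = 1000, beta_start: float = 1e-4,
                          beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_sample(eps_model: Callable[[Vector, int], Vector], x_T: Vector,
                alphas_cumprod: List[float], num_steps: int = 3) -> Vector:
    """Deterministic DDIM (eta = 0) over num_steps evenly spaced timesteps."""
    T = len(alphas_cumprod)
    stride = T // num_steps
    timesteps = list(range(T - 1, -1, -stride))[:num_steps]  # e.g. [999, 666, 333]
    x = list(x_T)
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        # alpha_bar of the next, less noisy step; 1.0 means "fully denoised"
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        # Predict the clean sample x0 from x_t and the predicted noise
        x0 = [(xi - math.sqrt(1.0 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        # Re-noise x0 deterministically to level a_prev (no fresh noise when eta = 0)
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0, eps)]
    return x


# Hypothetical 9-D pose target (3 rotation params, 3 translation, 3 size); illustration only
TARGET = [0.1, -0.2, 0.3, 0.05, 0.0, 0.6, 0.12, 0.08, 0.2]
ALPHAS = linear_alphas_cumprod()


def oracle_eps(x_t: Vector, t: int) -> Vector:
    """Hypothetical stand-in for a trained network: returns the exact noise w.r.t. TARGET."""
    a = ALPHAS[t]
    return [(xi - math.sqrt(a) * x0) / math.sqrt(1.0 - a) for xi, x0 in zip(x_t, TARGET)]


rng = random.Random(0)
x_T = [rng.gauss(0.0, 1.0) for _ in range(9)]
pose = ddim_sample(oracle_eps, x_T, ALPHAS, num_steps=3)
print([round(v, 4) for v in pose])
```

Because the oracle predicts the noise exactly, every step yields the same x0 estimate and three network calls recover `TARGET`. With a learned network, how well a 3-step schedule works depends on how accurately the network predicts x0 at large, widely spaced noise levels.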
Source Code by Bill Gates review – growing pains of a computer geek
The enduring mystery about William Henry Gates III is this: how did a precocious and sometimes obnoxious kid evolve into a billionaire tech lord and then into an elder statesman and philanthropist? This book gives us only the first part of the story, tracing Gates's evolution from birth in 1955 to the founding of Microsoft in 1975. For the next part of the story, we will just have to wait for the sequel. In a way, the volume's title describes it well. In the era before machine learning and AI, when computer programs were exclusively written by humans, the term "source code" meant something.
Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
Li, Xiangci, Chen, Zhiyu, Choi, Jason Ingyu, Vedula, Nikhita, Fetahu, Besnik, Rokhlenko, Oleg, Malmasi, Shervin
The goal of conversational product search (CPS) is to develop an intelligent, chat-based shopping assistant that can directly interact with customers to understand shopping intents, ask clarification questions, and find relevant products. However, training such assistants is hindered mainly by the lack of reliable, large-scale datasets. Prior human-annotated CPS datasets are extremely small and lack integration with real-world product search systems. We propose a novel approach, TRACER, which leverages large language models (LLMs) to generate realistic and natural conversations for different shopping domains. TRACER's novelty lies in grounding the generation in dialogue plans: product search trajectories predicted by a decision tree model, which guarantees relevant product discovery with the fewest search conditions. We also release the first target-oriented CPS dataset, Wizard of Shopping (WoS), containing 3.6k highly natural and coherent conversations from three shopping domains. Finally, we demonstrate the quality and effectiveness of WoS via human evaluations and downstream tasks.
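To make "dialogue plans as product search trajectories from a decision tree" concrete, here is a minimal sketch that builds such a trajectory greedily over a toy catalog. It is not TRACER's model. It uses a swapped-in, ID3-style heuristic (ask next about the attribute that leaves the least expected uncertainty about which product is meant). A greedy heuristic of this kind approximates, but does not guarantee, the shortest path. The catalog, attribute names, and function names are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Product = Dict[str, str]


def expected_remaining_entropy(products: Sequence[Product], attr: str) -> float:
    """Expected entropy (bits) over product identity after learning the value of attr.

    Products are assumed distinct and equally likely, so a group of k
    candidates sharing one value contributes log2(k) bits.
    """
    n = len(products)
    groups = Counter(p[attr] for p in products)
    return sum(k / n * math.log2(k) for k in groups.values())


def plan_trajectory(catalog: Sequence[Product], target_id: str,
                    attrs: Sequence[str]) -> List[Tuple[str, str]]:
    """Greedy root-to-leaf path to target_id: the ordered (attribute, value)
    search conditions an assistant would ask the customer about."""
    target = next(p for p in catalog if p["id"] == target_id)
    candidates = list(catalog)
    remaining = list(attrs)
    plan: List[Tuple[str, str]] = []
    while len(candidates) > 1 and remaining:
        # Ask about the attribute that leaves the least uncertainty on average
        attr = min(remaining, key=lambda a: expected_remaining_entropy(candidates, a))
        remaining.remove(attr)
        value = target[attr]
        plan.append((attr, value))
        candidates = [p for p in candidates if p[attr] == value]
    return plan


# Hypothetical toy catalog (not from WoS)
CATALOG = [
    {"id": "p1", "type": "shoe", "color": "red", "size": "9"},
    {"id": "p2", "type": "shoe", "color": "blue", "size": "9"},
    {"id": "p3", "type": "shoe", "color": "red", "size": "10"},
    {"id": "p4", "type": "boot", "color": "black", "size": "9"},
]

for attr, value in plan_trajectory(CATALOG, "p3", ["type", "color", "size"]):
    print(f"Assistant asks about {attr}; customer answers {value}")
```

An LLM would then verbalize each (attribute, value) step as a clarification question and customer reply, so every generated conversation stays grounded in a path that ends at a real product.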
AIhub monthly digest: January 2025 – artists' perspectives on GenAI, biomedical knowledge graphs, and ML for studying greenhouse gas emissions
Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we hear about artists' perspectives on generative AI, learn how to explain neural networks using logic, and find out about using machine learning for studying greenhouse gas emissions. We caught up with Erica Kimei to find out about her research studying gas emissions from agriculture, specifically ruminant livestock. Erica combines machine learning and remote sensing technology to monitor and forecast such emissions. This interview is the latest in our series highlighting members of the AfriClimate AI community.
International AI Safety Report
Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Fox, Philip, Garfinkel, Ben, Goldfarb, Danielle, Heidari, Hoda, Ho, Anson, Kapoor, Sayash, Khalatbari, Leila, Longpre, Shayne, Manning, Sam, Mavroudis, Vasilios, Mazeika, Mantas, Michael, Julian, Newman, Jessica, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Sastry, Girish, Seger, Elizabeth, Skeadas, Theodora, South, Tobin, Strubell, Emma, Tramèr, Florian, Velasco, Lucia, Wheeler, Nicole, Acemoglu, Daron, Adekanmbi, Olubayo, Dalrymple, David, Dietterich, Thomas G., Felten, Edward W., Fung, Pascale, Gourinchas, Pierre-Olivier, Heintz, Fredrik, Hinton, Geoffrey, Jennings, Nick, Krause, Andreas, Leavy, Susan, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John, Munga, Jane, Narayanan, Arvind, Nelson, Alondra, Neppel, Clara, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Schölkopf, Bernhard, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin, Albalawi, Fahad, Alserkal, Marwan, Ajala, Olubunmi, Avrin, Guillaume, Busch, Christian, de Carvalho, André Carlos Ponce de Leon Ferreira, Fox, Bronwyn, Gill, Amandeep Singh, Hatip, Ahmet Halit, Heikkilä, Juha, Jolly, Gill, Katzir, Ziv, Kitano, Hiroaki, Krüger, Antonio, Johnson, Chris, Khan, Saif M., Lee, Kyoung Mu, Ligot, Dominic Vincent, Molchanovskyi, Oleksii, Monti, Andrea, Mwamanzi, Nusu, Nemer, Mona, Oliver, Nuria, Portillo, José Ramón López, Ravindran, Balaraman, Rivera, Raquel Pezoa, Riza, Hammam, Rugege, Crystal, Seoighe, Ciarán, Sheehan, Jerry, Sheikh, Haroon, Wong, Denise, Zeng, Yi
I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance on tests of programming and scientific reasoning.
Professor Yoshua Bengio
2SSP: A Two-Stage Framework for Structured Pruning of LLMs
Sandri, Fabrizio, Cunegatti, Elia, Iacca, Giovanni
We propose a novel Two-Stage framework for Structured Pruning (2SSP) for pruning Large Language Models (LLMs), which combines two different pruning strategies, namely Width and Depth Pruning. The first stage (Width Pruning) removes entire neurons, hence their corresponding rows and columns, aiming to preserve the connectivity among the pruned structures in the intermediate state of the Feed-Forward Networks in each Transformer block. This is done based on an importance score measuring the impact of each neuron on the output magnitude. The second stage (Depth Pruning), instead, removes entire Attention submodules. This is done by applying an iterative process that removes the Attention submodules with the minimum impact on a given metric of interest (in our case, perplexity). We also propose a novel mechanism to balance the sparsity rate of the two stages with respect to the desired global sparsity. We test 2SSP on four LLM families and three sparsity rates (25%, 37.5%, and 50%), measuring the resulting perplexity over three language modeling datasets as well as the performance over six downstream tasks. Our method consistently outperforms five state-of-the-art competitors over three language modeling and six downstream tasks, with up to a two-order-of-magnitude gain in pruning time. The code is available at https://github.com/FabrizioSandri/2SSP.
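The two stages can be sketched in plain Python on toy matrices. This is an illustration under stated assumptions, not the 2SSP code. The neuron importance score below (mean absolute activation on calibration inputs times the norm of the neuron's output column, with a ReLU activation) is an assumed proxy for "impact on the output magnitude"; real LLM feed-forward blocks typically use gated SiLU/GELU activations. The toy metric replaces perplexity, and the sparsity-balancing mechanism is omitted because the abstract does not describe it. All names and numbers are hypothetical.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Tuple

Matrix = List[List[float]]


def neuron_scores(w_in: Matrix, w_out: Matrix,
                  calib: Sequence[Sequence[float]]) -> List[float]:
    """Assumed importance proxy: mean activation (ReLU) on calibration inputs
    times the L2 norm of neuron i's output column."""
    scores = []
    for i, row in enumerate(w_in):
        col_norm = math.sqrt(sum(r[i] ** 2 for r in w_out))
        acts = [max(0.0, sum(w * x for w, x in zip(row, xs))) for xs in calib]
        scores.append(sum(acts) / len(acts) * col_norm)
    return scores


def prune_ffn_width(w_in: Matrix, w_out: Matrix, calib: Sequence[Sequence[float]],
                    keep: int) -> Tuple[Matrix, Matrix, List[int]]:
    """Stage 1: drop whole neurons, i.e. row i of w_in and column i of w_out together."""
    scores = neuron_scores(w_in, w_out, calib)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [list(w_in[i]) for i in kept], [[r[i] for i in kept] for r in w_out], kept


def remove_attention_blocks(num_blocks: int, n_remove: int,
                            metric: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Stage 2: repeatedly remove the attention submodule whose removal, together
    with those already removed, gives the lowest metric (e.g. perplexity)."""
    removed: List[int] = []
    for _ in range(n_remove):
        candidates = [b for b in range(num_blocks) if b not in removed]
        best = min(candidates, key=lambda b: metric(frozenset(removed + [b])))
        removed.append(best)
    return removed


# Toy FFN: 4 neurons, model width 2 (hypothetical numbers)
W_IN = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.0], [-1.0, 0.0]]
W_OUT = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
new_in, new_out, kept = prune_ffn_width(W_IN, W_OUT, [[1.0, 1.0]], keep=2)

# Toy metric: a base "perplexity" plus a hypothetical cost per removed block
COSTS = [5.0, 0.2, 1.0, 0.1]
order = remove_attention_blocks(4, 2, lambda s: 10.0 + sum(COSTS[b] for b in s))
print(kept, order)
```

The depth stage re-evaluates the metric for every remaining block at each iteration, so it needs on the order of n_remove × num_blocks metric evaluations; with perplexity as the metric, each evaluation is a pass over calibration text.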
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
Sirdeshmukh, Ved, Deshpande, Kaustubh, Mols, Johannes, Jin, Lifeng, Cardona, Ed-Yeremai, Lee, Dean, Kritz, Jeremy, Primack, Willow, Yue, Summer, Xing, Chen
We present MultiChallenge, a pioneering benchmark that evaluates large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications. MultiChallenge identifies four categories of challenges in multi-turn conversations that are not only common and realistic in current human-LLM interactions but also challenging for all current frontier LLMs. All four challenges require accurate instruction-following, context allocation, and in-context reasoning at the same time. We also develop an LLM-as-judge evaluation with instance-level rubrics, enabling automatic evaluation that achieves fair agreement with experienced human raters. Despite achieving near-perfect scores on existing multi-turn evaluation benchmarks, all frontier models score below 50% accuracy on MultiChallenge, with the top performer, Claude 3.5 Sonnet (June 2024), achieving only 41.4% average accuracy.
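Rubric-based judging can be sketched as follows. Several details are assumptions, not MultiChallenge's documented protocol: each instance carries its own yes/no rubric questions, an instance counts as correct only if the judge answers every question with "yes", and "average accuracy" is the mean of per-category accuracies. `keyword_judge`, the category names, and the data are hypothetical; the keyword matcher merely stands in for an LLM judge.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A judge answers one yes/no rubric question about one model response.
Judge = Callable[[str, str], bool]


@dataclass
class Instance:
    category: str
    response: str      # the model's final-turn reply
    rubric: List[str]  # yes/no questions written for this specific conversation


def instance_passes(inst: Instance, judge: Judge) -> bool:
    """Assumed rule: an instance is correct only if every rubric question passes."""
    return all(judge(inst.response, q) for q in inst.rubric)


def average_accuracy(instances: List[Instance], judge: Judge) -> float:
    """Mean of per-category accuracies (assumed aggregation)."""
    by_cat: Dict[str, List[bool]] = {}
    for inst in instances:
        by_cat.setdefault(inst.category, []).append(instance_passes(inst, judge))
    per_cat = [sum(v) / len(v) for v in by_cat.values()]
    return sum(per_cat) / len(per_cat)


def keyword_judge(response: str, question: str) -> bool:
    """Hypothetical stub in place of an LLM judge: 'mentions: X' -> is X in the response?"""
    keyword = question.split(":", 1)[1].strip()
    return keyword.lower() in response.lower()


DATA = [
    Instance("cat_a", "Here is a vegetarian lasagna recipe.", ["mentions: vegetarian"]),
    Instance("cat_a", "Here is a beef lasagna recipe.", ["mentions: vegetarian"]),
    Instance("cat_b", "In French: bonjour!", ["mentions: bonjour", "mentions: merci"]),
]
print(average_accuracy(DATA, keyword_judge))
```

Writing the rubric per instance turns an open-ended judgment into a few narrow yes/no checks, which is generally easier for an automated judge to answer consistently.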
Review for NeurIPS paper: Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences
Four knowledgeable referees reviewed this paper. After conducting initial reviews, reading the authors' rebuttal (which resolved some concerns, but not the core concerns of two of the reviewers), and discussing the paper, the reviewers did not agree on an outcome. Two of the reviewers concluded that this is a ground-breaking paper (simple and elegant). The other two reviewers were perhaps somewhat intrigued but did not feel the paper was ready for publication yet. For example, during the discussion phase, R4 (a very accomplished and well-respected researcher in the field) made very valid points about the paper's weaknesses: "So all this leads me to suggest that there needs to be a better context, more related work and a better way to situate the paper in related arenas, e.g., provide some sort of a framework to back up the findings. I understand the issue of limited space, but given the amount of literature in this area, I feel that the paper doesn't do a good enough job explaining its findings in context."
DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models
Bafna, Niyati, Chang, Emily, Robinson, Nathaniel R., Mortensen, David R., Murray, Kenton, Yarowsky, David, Sirin, Hale
Most of the world's languages and dialects are low-resource and lack support in mainstream machine translation (MT) models. However, many of them have a closely related high-resource language (HRL) neighbor and differ from it in linguistically regular ways. This underscores the importance of model robustness to dialectal variation and of cross-lingual generalization to the HRL dialect continuum. We present DialUp, consisting of a training-time technique for adapting a pretrained model to dialectal data (M->D) and an inference-time intervention adapting dialectal data to the model's expertise (D->M). M->D induces model robustness to potentially unseen and unknown dialects through exposure to synthetic data exemplifying linguistic mechanisms of dialectal variation, whereas D->M treats dialectal divergence for known target dialects. These methods show considerable performance gains for several dialects from four language families and modest gains for two other language families. We also conduct feature and error analyses, which show that language varieties with low baseline MT performance are more likely to benefit from these approaches.
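The two directions can be illustrated with a toy sketch. It is not DialUp's implementation. It assumes that "synthetic data exemplifying linguistic mechanisms of dialectal variation" can be approximated by randomly applying lexical swaps and regular character (sound or spelling) changes to HRL text (M->D), and that a known dialect can be mapped back toward the HRL with a word lexicon at inference time (D->M). The rules, lexicons, and Spanish-like example are invented for illustration.

```python
import random
from typing import Dict


def synth_dialect(sentence: str, lexicon: Dict[str, str], char_rules: Dict[str, str],
                  p: float, rng: random.Random) -> str:
    """M->D style augmentation (sketch): perturb an HRL sentence with lexical
    swaps and regular, rule-governed character changes, each with probability p."""
    out = []
    for word in sentence.split():
        if word in lexicon and rng.random() < p:
            word = lexicon[word]  # lexical variation
        elif rng.random() < p:
            for src, tgt in char_rules.items():  # rules applied in insertion order
                word = word.replace(src, tgt)  # regular sound/spelling change
        out.append(word)
    return " ".join(out)


def dialect_to_hrl(sentence: str, dialect_lexicon: Dict[str, str]) -> str:
    """D->M style intervention (sketch): map known dialect words back to the HRL."""
    return " ".join(dialect_lexicon.get(w, w) for w in sentence.split())


# Hypothetical rules for a toy Spanish-like example (invented for illustration)
LEXICON = {"perro": "can"}
CHAR_RULES = {"e": "i"}
noisy = synth_dialect("el perro come pan", LEXICON, CHAR_RULES, p=1.0, rng=random.Random(0))
print(noisy)
print(dialect_to_hrl(noisy, {"il": "el", "can": "perro", "comi": "come"}))
```

In training, a lower p and many random seeds would produce a spread of synthetic varieties around the HRL, which is what lets a model become robust to dialects it has never seen.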
Review for NeurIPS paper: Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Weaknesses: I have no major concerns, only remarks and suggestions for improvement. Although this is unambiguous in the experimental section, the abstract and introduction should clarify that the method is self-supervised from stereo pairs. There is a lot of confusion in the literature, because all monocular methods predict depth from a single image (by definition) but can be trained in different ways: from lidar supervision (full or partial), from stereo pairs (as is the case here), or from videos (a.k.a. SfM self-supervision). Some of the authors' critiques of related works (e.g., regarding dynamic objects) apply only to the SfM self-supervised scenario, since in stereo-based self-supervised learning both images of a pair are captured at the same time. Furthermore, the SfM case requires estimating the camera's ego-motion, which vastly complicates the self-supervised learning task (which is why, in my opinion, the comparison is not entirely fair).
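The reviewer's point that stereo self-supervision avoids ego-motion estimation can be made concrete with a generic photometric-reconstruction loss. This is a minimal sketch, not the paper's MED probability-volume method. It assumes a rectified stereo pair, uses 1-D image rows instead of full images, and adopts the convention that left pixel x matches right pixel x - disparity. The only geometry it needs is the fixed calibration (focal length and baseline); an SfM setup would additionally need a per-frame camera pose. All names and values are hypothetical.

```python
from typing import List


def depth_to_disparity(depth: List[float], focal_px: float, baseline_m: float) -> List[float]:
    """Rectified stereo: disparity (px) = focal length (px) * baseline (m) / depth (m)."""
    return [focal_px * baseline_m / d for d in depth]


def sample_row(row: List[float], x: float) -> float:
    """Linearly interpolate a 1-D image row at fractional x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def stereo_photometric_loss(left: List[float], right: List[float], depth: List[float],
                            focal_px: float, baseline_m: float) -> float:
    """Mean L1 error between the left row and the right row warped by predicted depth.
    Only the fixed calibration is used: no ego-motion (pose) estimate is required."""
    disp = depth_to_disparity(depth, focal_px, baseline_m)
    # In a rectified pair, left pixel x matches right pixel x - disparity
    recon = [sample_row(right, x - d) for x, d in enumerate(disp)]
    return sum(abs(a - b) for a, b in zip(left, recon)) / len(left)


RIGHT = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
LEFT = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # RIGHT shifted by 2 px (border clamped)
good = stereo_photometric_loss(LEFT, RIGHT, [25.0] * 8, focal_px=100.0, baseline_m=0.5)
bad = stereo_photometric_loss(LEFT, RIGHT, [50.0] * 8, focal_px=100.0, baseline_m=0.5)
print(good, bad)
```

The correct depth (25 m, disparity 2 px) reconstructs the left row exactly, while a wrong depth is penalized. Because both images are captured at the same instant, moving objects do not break this reconstruction, unlike in the video (SfM) case.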