AITopics | Personal


Personal

Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation

Liu, Jian, Sun, Wei, Yang, Hui, Deng, Pengchao, Liu, Chongpei, Sebe, Nicu, Rahmani, Hossein, Mian, Ajmal

arXiv.org Artificial Intelligence Feb-4-2025

Nine-degrees-of-freedom (9-DoF) object pose and size estimation is crucial for enabling augmented reality and robotic manipulation. Category-level methods have received extensive research attention due to their potential for generalization to intra-class unknown objects. However, these methods require manual collection and labeling of large-scale real-world training data. To address this problem, we introduce a diffusion-based paradigm for domain-generalized category-level 9-DoF object pose estimation. Our motivation is to leverage the latent generalization ability of the diffusion model to address the domain generalization challenge in object pose estimation. This entails training the model exclusively on rendered synthetic data to achieve generalization to real-world scenes. We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective. Our model does not require any 3D shape priors during training or inference. By employing the Denoising Diffusion Implicit Model, we demonstrate that the reverse diffusion process can be executed in as few as 3 steps, achieving near real-time performance. Finally, we design a robotic grasping system comprising both hardware and software components. Through comprehensive experiments on two benchmark datasets and the real-world robotic system, we show that our method achieves state-of-the-art domain generalization performance. Our code will be made public at https://github.com/CNJianLiu/Diff9D.
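The claim that the reverse process can run in as few as 3 steps rests on DDIM's deterministic update: predict the clean sample, then jump straight to an earlier timestep without adding fresh noise. The sketch below illustrates only that update on a toy problem and is not Diff9D's network or conditioning. The linear beta schedule, the evenly strided timesteps, the 9-D `TARGET` vector and the `oracle_eps` noise predictor are hypothetical stand-ins; a trained, observation-conditioned network would replace the oracle.

```python
import math
import random


def alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (assumed, a common DDPM default)."""
    out, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_sample(eps_model, dim, steps=3, T=1000, seed=0):
    """Deterministic DDIM sampling (eta = 0) that visits only `steps` timesteps."""
    abar = alpha_bars(T)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise x_T
    stride = T // steps
    timesteps = [T - 1 - i * stride for i in range(steps)]  # [999, 666, 333] for the defaults
    for i, t in enumerate(timesteps):
        a_t = abar[t]
        # alpha_bar of the next (earlier) timestep; 1.0 after the last step means "fully denoised".
        a_prev = abar[timesteps[i + 1]] if i + 1 < steps else 1.0
        eps = eps_model(x, t)
        # Predict the clean sample from the current noisy one...
        x0_hat = [(xi - math.sqrt(1 - a_t) * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        # ...then re-noise it to the earlier timestep using the same predicted noise.
        x = [math.sqrt(a_prev) * x0 + math.sqrt(1 - a_prev) * ei for x0, ei in zip(x0_hat, eps)]
    return x


# Hypothetical 9-D target standing in for rotation, translation and size parameters.
TARGET = [0.1, -0.2, 0.3, 0.05, 0.0, 0.7, 0.12, 0.08, 0.2]
ABAR = alpha_bars()


def oracle_eps(x, t):
    """Toy noise predictor that knows the clean sample; a trained network would replace it."""
    a = ABAR[t]
    return [(xi - math.sqrt(a) * x0) / math.sqrt(1 - a) for xi, x0 in zip(x, TARGET)]


pose = ddim_sample(oracle_eps, dim=9, steps=3)
```

Because the oracle predicts the noise exactly, three deterministic steps land on `TARGET`; with a learned predictor, using fewer steps typically trades some accuracy for speed.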

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.02525

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)
Asia > China > Shaanxi Province > Xi'an (0.04)
(4 more...)

Genre:

Research Report (1.00)
Personal (0.67)

Industry:

Government > Regional Government (0.92)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)


Source Code by Bill Gates review – growing pains of a computer geek

The Guardian Feb-2-2025, 07:00:05 GMT

The enduring mystery about William Henry Gates III is this: how did a precocious and sometimes obnoxious kid evolve into a billionaire tech lord and then into an elder statesman and philanthropist? This book gives us only the first part of the story, tracing Gates's evolution from birth in 1955 to the founding of Microsoft in 1975. For the next part of the story, we will just have to wait for the sequel. In a way, the volume's title describes it well. In the era before machine learning and AI, when computer programs were exclusively written by humans, the term "source code" meant something.

computer geek, software, source code, (7 more...)

The Guardian

Country: North America > United States > New Mexico > Bernalillo County > Albuquerque (0.05)

Genre: Personal (0.30)

Industry: Education (0.30)

Technology:

Information Technology > Software Engineering (0.62)
Information Technology > Artificial Intelligence (0.56)


Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching

Li, Xiangci, Chen, Zhiyu, Choi, Jason Ingyu, Vedula, Nikhita, Fetahu, Besnik, Rokhlenko, Oleg, Malmasi, Shervin

arXiv.org Artificial Intelligence Feb-2-2025

The goal of conversational product search (CPS) is to develop an intelligent, chat-based shopping assistant that can directly interact with customers to understand shopping intents, ask clarification questions, and find relevant products. However, training such assistants is hindered mainly by the lack of reliable, large-scale datasets. Prior human-annotated CPS datasets are extremely small and lack integration with real-world product search systems. We propose a novel approach, TRACER, which leverages large language models (LLMs) to generate realistic and natural conversations for different shopping domains. TRACER's novelty lies in grounding the generation in dialogue plans: product search trajectories predicted from a decision tree model, which guarantee relevant product discovery with the fewest search conditions. We also release Wizard of Shopping (WoS), the first target-oriented CPS dataset, containing 3.6k highly natural and coherent conversations from three shopping domains. Finally, we demonstrate the quality and effectiveness of WoS via human evaluations and downstream tasks.
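A dialogue plan of this kind can be pictured as the path a decision tree takes from the full catalog down to one target product, each node being one clarification question. The sketch below is a hypothetical illustration, not TRACER's decision tree model: it greedily picks the attribute that leaves the least expected entropy (treating products as equally likely), and the `CATALOG`, its attributes and `plan_dialogue` are invented for the example.

```python
import math


def split_cost(products, attr):
    """Expected remaining entropy (bits) after asking about `attr`, with products equally likely."""
    groups = {}
    for p in products:
        groups.setdefault(p[attr], []).append(p)
    n = len(products)
    return sum(len(g) / n * math.log2(len(g)) for g in groups.values())


def plan_dialogue(products, target, attrs):
    """Greedy decision-tree path to `target`, returned as a list of (attribute, value) conditions."""
    candidates, remaining, plan = list(products), list(attrs), []
    while len(candidates) > 1 and remaining:
        best = min(remaining, key=lambda a: split_cost(candidates, a))
        if split_cost(candidates, best) >= math.log2(len(candidates)):
            break  # no remaining attribute narrows the candidate set
        remaining.remove(best)
        value = target[best]
        plan.append((best, value))
        candidates = [p for p in candidates if p[best] == value]
    return plan, candidates


# Hypothetical four-item catalog.
CATALOG = [
    {"id": "A", "type": "laptop", "brand": "X", "ram": 8},
    {"id": "B", "type": "laptop", "brand": "X", "ram": 16},
    {"id": "C", "type": "laptop", "brand": "Y", "ram": 16},
    {"id": "D", "type": "tablet", "brand": "Y", "ram": 8},
]
plan, found = plan_dialogue(CATALOG, CATALOG[1], ["type", "brand", "ram"])
```

Here the plan is two conditions, brand then RAM, which isolate product B; each (attribute, value) pair could then be verbalized by an LLM as one assistant question and one customer answer.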

customer, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2502.00969

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > New York (0.04)
North America > United States > Texas (0.04)
(8 more...)

Genre:

Personal > Interview (1.00)
Research Report (0.70)

Industry: Information Technology > Services > e-Commerce Services (0.50)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)


AIhub monthly digest: January 2025 – artists' perspectives on GenAI, biomedical knowledge graphs, and ML for studying greenhouse gas emissions

AIHub Jan-29-2025, 16:07:23 GMT

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we hear about artists' perspectives on generative AI, learn how to explain neural networks using logic, and find out about using machine learning for studying greenhouse gas emissions. We caught up with Erica Kimei to find out about her research studying gas emissions from agriculture, specifically ruminant livestock. Erica combines machine learning and remote sensing technology to monitor and forecast such emissions. This interview is the latest in our series highlighting members of the AfriClimate AI community.

biomedical knowledge graph, greenhouse gas emission, monthly digest, (8 more...)

AIHub

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.06)

Genre: Personal > Interview (0.92)

Industry: Energy > Energy Policy (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.65)


International AI Safety Report

Bengio, Yoshua, Mindermann, Sören, Privitera, Daniel, Besiroglu, Tamay, Bommasani, Rishi, Casper, Stephen, Choi, Yejin, Fox, Philip, Garfinkel, Ben, Goldfarb, Danielle, Heidari, Hoda, Ho, Anson, Kapoor, Sayash, Khalatbari, Leila, Longpre, Shayne, Manning, Sam, Mavroudis, Vasilios, Mazeika, Mantas, Michael, Julian, Newman, Jessica, Ng, Kwan Yee, Okolo, Chinasa T., Raji, Deborah, Sastry, Girish, Seger, Elizabeth, Skeadas, Theodora, South, Tobin, Strubell, Emma, Tramèr, Florian, Velasco, Lucia, Wheeler, Nicole, Acemoglu, Daron, Adekanmbi, Olubayo, Dalrymple, David, Dietterich, Thomas G., Felten, Edward W., Fung, Pascale, Gourinchas, Pierre-Olivier, Heintz, Fredrik, Hinton, Geoffrey, Jennings, Nick, Krause, Andreas, Leavy, Susan, Liang, Percy, Ludermir, Teresa, Marda, Vidushi, Margetts, Helen, McDermid, John, Munga, Jane, Narayanan, Arvind, Nelson, Alondra, Neppel, Clara, Oh, Alice, Ramchurn, Gopal, Russell, Stuart, Schaake, Marietje, Schölkopf, Bernhard, Song, Dawn, Soto, Alvaro, Tiedrich, Lee, Varoquaux, Gaël, Yao, Andrew, Zhang, Ya-Qin, Albalawi, Fahad, Alserkal, Marwan, Ajala, Olubunmi, Avrin, Guillaume, Busch, Christian, de Carvalho, André Carlos Ponce de Leon Ferreira, Fox, Bronwyn, Gill, Amandeep Singh, Hatip, Ahmet Halit, Heikkilä, Juha, Jolly, Gill, Katzir, Ziv, Kitano, Hiroaki, Krüger, Antonio, Johnson, Chris, Khan, Saif M., Lee, Kyoung Mu, Ligot, Dominic Vincent, Molchanovskyi, Oleksii, Monti, Andrea, Mwamanzi, Nusu, Nemer, Mona, Oliver, Nuria, Portillo, José Ramón López, Ravindran, Balaraman, Rivera, Raquel Pezoa, Riza, Hammam, Rugege, Crystal, Seoighe, Ciarán, Sheehan, Jerry, Sheikh, Haroon, Wong, Denise, Zeng, Yi

arXiv.org Artificial Intelligence Jan-29-2025

I am honoured to present the International AI Safety Report. It is the work of 96 international AI experts who collaborated in an unprecedented effort to establish an internationally shared scientific understanding of risks from advanced AI and methods for managing them. We embarked on this journey just over a year ago, shortly after the countries present at the Bletchley Park AI Safety Summit agreed to support the creation of this report. Since then, we published an Interim Report in May 2024, which was presented at the AI Seoul Summit. We are now pleased to publish the present, full report ahead of the AI Action Summit in Paris in February 2025. Since the Bletchley Summit, the capabilities of general-purpose AI, the type of AI this report focuses on, have increased further. For example, new models have shown markedly better performance at tests of programming and scientific reasoning.

data mining, large language model, machine learning, (27 more...)

arXiv.org Artificial Intelligence

2501.17805

Country:

South America (1.00)
North America > Canada (1.00)
Asia > Middle East (1.00)
(7 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(5 more...)

Industry:

Transportation > Air (1.00)
Social Sector (1.00)
Media > News (1.00)
(30 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Quality (1.00)
(21 more...)


2SSP: A Two-Stage Framework for Structured Pruning of LLMs

Sandri, Fabrizio, Cunegatti, Elia, Iacca, Giovanni

arXiv.org Artificial Intelligence Jan-29-2025

We propose a novel Two-Stage framework for Structured Pruning (2SSP) of Large Language Models (LLMs), which combines two different pruning strategies, namely Width and Depth Pruning. The first stage (Width Pruning) removes entire neurons, and hence their corresponding rows and columns, aiming to preserve the connectivity among the pruned structures in the intermediate state of the Feed-Forward Networks in each Transformer block. This is done based on an importance score measuring the impact of each neuron on the output magnitude. The second stage (Depth Pruning), instead, removes entire Attention submodules. This is done by applying an iterative process that removes the Attention submodules with the minimum impact on a given metric of interest (in our case, perplexity). We also propose a novel mechanism to balance the sparsity rates of the two stages w.r.t. the desired global sparsity. We test 2SSP on four LLM families and three sparsity rates (25%, 37.5%, and 50%), measuring the resulting perplexity over three language modeling datasets as well as the performance over six downstream tasks. Our method consistently outperforms five state-of-the-art competitors over three language modeling and six downstream tasks, with up to a two-order-of-magnitude gain in pruning time. The code is available at https://github.com/FabrizioSandri/2SSP.
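The two stages can be illustrated on toy weights. The following is a minimal pure-Python sketch under stated assumptions, not the 2SSP code: the importance score (mean |h_j| times the L2 norm of neuron j's output column), the ReLU activation, the toy matrices, and the additive `toy_perplexity` metric are all hypothetical, and the sparsity-balancing mechanism is not modeled.

```python
import math


def relu(v):
    return max(0.0, v)


def neuron_importance(W1, W2, calib):
    """Assumed score: mean over calibration inputs of |h_j| * ||W2[:, j]||_2."""
    m = len(W1)  # number of intermediate neurons (rows of W1, columns of W2)
    col_norms = [math.sqrt(sum(row[j] ** 2 for row in W2)) for j in range(m)]
    scores = [0.0] * m
    for x in calib:
        for j, w_row in enumerate(W1):
            h = relu(sum(w * xi for w, xi in zip(w_row, x)))
            scores[j] += abs(h) * col_norms[j] / len(calib)
    return scores


def width_prune(W1, W2, calib, keep):
    """Keep the `keep` highest-scoring neurons: their rows of W1 and matching columns of W2."""
    scores = neuron_importance(W1, W2, calib)
    kept = sorted(sorted(range(len(W1)), key=lambda j: scores[j], reverse=True)[:keep])
    return [W1[j] for j in kept], [[row[j] for j in kept] for row in W2], kept


def depth_prune(blocks, n_remove, metric):
    """Greedily drop, one at a time, the attention submodule whose removal raises `metric` least."""
    removed = []
    for _ in range(n_remove):
        candidates = [b for b in blocks if b not in removed]
        best = min(candidates, key=lambda b: metric(removed + [b]))
        removed.append(best)
    return removed


# Toy FFN: 2-D input, 3 intermediate neurons, 2-D output (all values hypothetical).
W1 = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
W2 = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
calib = [[1.0, 1.0], [2.0, 0.5]]
W1p, W2p, kept = width_prune(W1, W2, calib, keep=2)

# Hypothetical perplexity penalty for removing each block's attention submodule.
PENALTY = {0: 5.0, 1: 0.3, 2: 1.2, 3: 0.1}


def toy_perplexity(removed):
    return 10.0 + sum(PENALTY[b] for b in removed)


dropped = depth_prune([0, 1, 2, 3], n_remove=2, metric=toy_perplexity)
```

Here neuron 2 (score about 0.32) is pruned and attention submodules 3 and 1 are dropped. A real depth-pruning loop would measure perplexity on calibration text for each candidate removal, which is why the greedy search re-queries `metric` at every step.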

large language model, machine learning, sparsity rate, (20 more...)

arXiv.org Artificial Intelligence

2501.17771

Country:

North America > United States (0.46)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
(3 more...)

Genre:

Research Report (1.00)
Personal > Honors > Award (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs

Sirdeshmukh, Ved, Deshpande, Kaustubh, Mols, Johannes, Jin, Lifeng, Cardona, Ed-Yeremai, Lee, Dean, Kritz, Jeremy, Primack, Willow, Yue, Summer, Xing, Chen

arXiv.org Artificial Intelligence Jan-28-2025

We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications. MultiChallenge identifies four categories of challenges in multi-turn conversations that are not only common and realistic in current human-LLM interactions but also challenging to all current frontier LLMs. All four challenges require accurate instruction-following, context allocation, and in-context reasoning at the same time. We also develop an LLM-as-judge approach with instance-level rubrics, enabling automatic evaluation with fair agreement with experienced human raters. Despite achieving near-perfect scores on existing multi-turn evaluation benchmarks, all frontier models score below 50% accuracy on MultiChallenge, with the top performer, Claude 3.5 Sonnet (June 2024), achieving an average accuracy of just 41.4%.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.17399

Country:

Africa > Middle East > Egypt (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(13 more...)

Genre:

Workflow (1.00)
Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Personal (0.68)

Industry:

Media > Film (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Consumer Health (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Review for NeurIPS paper: Emergent Reciprocity and Team Formation from Randomized Uncertain Social Preferences

Neural Information Processing Systems Jan-27-2025, 19:39:25 GMT

Four knowledgeable referees reviewed this paper. After conducting initial reviews, reading the authors' rebuttal (which resolved some concerns, but not the core concerns of two of the reviewers), and discussing the paper, the reviewers did not agree on an outcome. Two of the reviewers concluded that this is a ground-breaking paper (simple and elegant). The other two reviewers were perhaps somewhat intrigued, but did not feel the paper was yet ready for publication. For example, during the discussion phase, R4 (a very accomplished and well-respected researcher in the field) made very valid points about the paper's weaknesses: "So all this leads me to suggest that there needs to be a better context, more related work and a better way to situate the paper in related arenas, e.g., provide some sort of a framework to back up the findings. I understand the issue of limited space, but given the amount of literature in this area, I feel that the paper doesn't do a good enough job explaining its findings in context."

emergent reciprocity and team formation, randomized uncertain social preference, reviewer, (10 more...)

Neural Information Processing Systems

Genre: Personal (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)


DialUp! Modeling the Language Continuum by Adapting Models to Dialects and Dialects to Models

Bafna, Niyati, Chang, Emily, Robinson, Nathaniel R., Mortensen, David R., Murray, Kenton, Yarowsky, David, Sirin, Hale

arXiv.org Artificial Intelligence Jan-27-2025

Most of the world's languages and dialects are low-resource and lack support in mainstream machine translation (MT) models. However, many of them have a closely related high-resource language (HRL) neighbor and differ from it in linguistically regular ways. This underscores the importance of model robustness to dialectal variation and cross-lingual generalization to the HRL dialect continuum. We present DialUp, consisting of a training-time technique for adapting a pretrained model to dialectal data (M->D) and an inference-time intervention adapting dialectal data to the model's expertise (D->M). M->D induces model robustness to potentially unseen and unknown dialects through exposure to synthetic data exemplifying linguistic mechanisms of dialectal variation, whereas D->M treats dialectal divergence for known target dialects. These methods show considerable performance gains for several dialects from four language families, and modest gains for two other language families. We also conduct feature and error analyses, which show that language varieties with low baseline MT performance are more likely to benefit from these approaches.
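One way to picture the two directions is with simple, regular rewrite rules. In the sketch below, `SOUND_CHANGES`, `LEXICAL_VARIANTS` and the dialect lexicon are invented for illustration, and the rule types (uniform character rewrites plus probabilistic function-word substitution) are assumptions, not DialUp's actual procedures.

```python
import random

# Hypothetical "sound change" rules: context-free character rewrites applied everywhere.
SOUND_CHANGES = [("ct", "tt"), ("ph", "f")]
# Hypothetical lexical variants for a few function words.
LEXICAL_VARIANTS = {"the": ["da"], "with": ["wi"]}


def synth_dialect(sentence, rng, p_lex=0.5):
    """M->D-style augmentation sketch: perturb a high-resource sentence with regular rewrites."""
    words = []
    for w in sentence.split():
        for src, tgt in SOUND_CHANGES:
            w = w.replace(src, tgt)  # applied uniformly, like a regular sound correspondence
        if w in LEXICAL_VARIANTS and rng.random() < p_lex:
            w = rng.choice(LEXICAL_VARIANTS[w])  # swap in a dialectal function word
        words.append(w)
    return " ".join(words)


def normalize_to_hrl(sentence, lexicon):
    """D->M-style sketch: map known dialect words back to HRL forms before translation."""
    return " ".join(lexicon.get(w, w) for w in sentence.split())


rng = random.Random(0)
noisy = synth_dialect("the doctor with a phone", rng, p_lex=1.0)
restored = normalize_to_hrl(noisy, {"da": "the", "dottor": "doctor", "wi": "with", "fone": "phone"})
```

Pairing such synthetic source sentences with the original translations would give training data in the M->D spirit, while the lexicon lookup only helps for dialects whose divergences are already known, mirroring the D->M setting.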

large language model, latn, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2501.16581

Country:

Europe > Sweden (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Luzon > Ilocos Region > Province of Pangasinan (0.05)
(21 more...)

Genre:

Personal > Honors (0.46)
Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Review for NeurIPS paper: Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes

Neural Information Processing Systems Jan-26-2025, 16:30:58 GMT

Weaknesses: I have no major concerns, only remarks and suggestions for improvement. Although this is unambiguous in the experimental section, the abstract and introduction should clarify that the method is self-supervised from stereo pairs. There is a lot of confusion in the literature, because all monocular methods predict depth from a single image (by definition) but can be trained in different ways: from lidar supervision (full or partial), from stereo pairs (as is the case here), or from videos (a.k.a. structure-from-motion, SfM). Some of the authors' critique of related works (e.g., regarding dynamic objects) applies only to the SfM self-supervised scenario, since in stereo-based self-supervised learning both images of a pair are captured at the same time. Furthermore, the SfM case requires estimating the camera's ego-motion, which vastly complicates the self-supervised learning task (which is why, in my opinion, the comparison is not entirely fair).
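The reviewer's stereo-versus-SfM distinction rests on geometry: a rectified stereo rig has a fixed, known baseline, so a predicted disparity converts directly to metric depth via Z = f * B / d, whereas with video the relative camera motion between frames must itself be estimated. The helper below illustrates only that conversion; the focal length, baseline and disparity values are assumed, KITTI-like numbers.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair; zero disparity means a point at infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Assumed values: focal length 720 px, baseline 0.54 m, disparity 38.88 px -> about 10 m.
z = depth_from_disparity(38.88, 720.0, 0.54)
```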

abstract and introduction, neurips paper, self-supervised depth estimator, (6 more...)

Neural Information Processing Systems

Genre: Personal (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.97)
