AITopics

Neural Information Processing Systems, Mar-24-2025, 06:50:16 GMT

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

Shape assembly is a ubiquitous task in daily life, integral to constructing complex 3D structures like IKEA furniture. While significant progress has been made in developing autonomous agents for shape assembly, existing datasets have not yet tackled the 4D grounding of assembly instructions in videos, essential for a holistic understanding of assembly in 3D space over time. We introduce IKEA Video Manuals, a dataset that features 3D models of furniture parts, instructional manuals, assembly videos from the Internet, and most importantly, annotations of dense spatio-temporal alignments between these data modalities. To demonstrate the utility of IKEA Video Manuals, we present five applications essential for shape assembly: assembly plan generation, part-conditioned segmentation, part-conditioned pose estimation, video object segmentation, and furniture assembly based on instructional video manuals. For each application, we provide evaluation metrics and baseline methods. Through experiments on our annotated data, we highlight many challenges in grounding assembly instructions in videos to improve shape assembly, including handling occlusions, varying viewpoints, and extended assembly sequences.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Genre:

Instructional Material > Training Manual (0.48)
Research Report > New Finding (0.46)

Industry:

Retail (1.00)
Banking & Finance (0.67)
Education > Educational Technology > Audio & Video (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Neural Information Processing Systems, Mar-24-2025, 00:23:03 GMT

Efficient Forward Architecture Search

Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric J. Horvitz, Debadeepta Dey

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed algorithm is motivated by the feature selection algorithm forward stage-wise linear regression, since we consider NAS as a generalization of feature selection for regression, where NAS selects shortcuts among layers instead of selecting features. In order to reduce the number of trials of possible connection combinations, we jointly train all possible connections at each stage of growth while leveraging feature selection techniques to choose a subset of them. We experimentally show this process to be an efficient forward architecture search algorithm that can find competitive models using few GPU days in both the search space of repeatable network modules (cell-search) and the space of general networks (macro-search). Petridish is particularly well-suited for warm-starting from existing models, which is crucial for lifelong-learning scenarios.
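
The feature selection algorithm named above as the motivation, forward stage-wise linear regression, can be sketched as follows. This is a generic textbook version on toy data, not Petridish itself: the function name `forward_stagewise`, the step size `eps`, the step budget, and the data are all assumptions made for illustration.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=2000):
    """Incremental forward stage-wise linear regression (textbook sketch).

    X is a list of rows, y a list of targets. At each step the coefficient
    of the feature most correlated with the current residual is nudged by
    a small amount eps, and the residual is updated accordingly.
    """
    n, d = len(X), len(X[0])
    coef = [0.0] * d
    residual = list(y)
    for _ in range(n_steps):
        # Inner product of every feature column with the current residual.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(d)]
        j = max(range(d), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-12:
            break  # residual is (numerically) orthogonal to all features
        step = eps if corr[j] > 0 else -eps
        coef[j] += step
        for i in range(n):
            residual[i] -= step * X[i][j]
    return coef


# Toy data (hypothetical): three orthogonal features, y = 2*x0 - 1*x2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [1, 3, -1, -3]
coef = forward_stagewise(X, y)
print([round(c, 2) for c in coef])
```

The unused middle feature keeps a coefficient near zero, which is the selection behavior the abstract generalizes from features to shortcut connections.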

architecture search, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Instructional Material (0.34)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Parental Guidance: Efficient Lifelong Learning through Evolutionary Distillation

Zhang, Octi, Peng, Quanquan, Scalise, Rosario, Boots, Bryon

Developing robotic agents that can generalize across diverse environments while continually evolving their behaviors is a core challenge in AI and robotics. The difficulties lie in solving increasingly complex tasks and ensuring agents can continue learning without converging on narrow, specialized solutions. Quality Diversity (QD) [1, 2] methods effectively foster diversity but often rely on trial and error, where the path to a final solution can be convoluted, leading to inefficiencies and uncertainty. Our approach draws inspiration from nature's inheritance process, where offspring not only receive but also build upon the knowledge of their predecessors. Similarly, our agents inherit distilled behaviors from previous generations, allowing them to adapt and continue learning efficiently, eventually surpassing their predecessors. This natural knowledge transfer reduces randomness, guiding exploration toward more meaningful learning without manual intervention like reward shaping or task descriptors. What sets our method apart is that it offers a straightforward, evolution-inspired way to consolidate and progress, avoiding the need for manually defined styles or gradient editing [3, 4] to prevent forgetting. The agent's ability to retain and refine skills is driven by a blend of imitation learning (IL) and reinforcement learning (RL), naturally passing down essential behaviors while implicitly discarding inferior ones. We introduce Parental Guidance (PG-1), which makes the following contributions: 1. Distributed Evolution Framework: We propose a framework that distributes the evolution process across multiple compute instances, efficiently scheduling and analyzing evolution.

evolutionary algorithm, machine learning, reinforcement learning, (14 more...)

2503.18531

Country: Europe > Germany (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.40)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages

McKenna, Nick, Xu, Xinnuo, Williams, Jack, Wilson, Nick, Van Durme, Benjamin, Poelitz, Christian

A key consideration when training an LLM is whether the target language is more or less resourced, whether this is English compared to Welsh, or Python compared to Excel. Typical training data for programming languages consist of real program demonstrations coupled with human-written comments. Here we present novel approaches to the creation of such data for low-resource programming languages. We generate fully synthetic, textbook-quality demonstrations of common library functions in an example domain of Excel formulas, using a teacher model. We then finetune an underperforming student model, and show improvement on two question-answering datasets recast into the Excel domain. We show advantages of finetuning over standard, off-the-shelf RAG approaches, which can offer only modest improvement due to the unfamiliar target domain.

large language model, machine learning, programming language, (22 more...)

2503.1876

Country:

South America (1.00)
North America > United States > Mississippi (0.14)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education (0.49)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Braun, Sacha, Aolaritei, Liviu, Jordan, Michael I., Bach, Francis

Minimum Volume Conformal Sets for Multivariate Regression

arXiv.org Machine Learning, Mar-24-2025

Conformal prediction provides a principled framework for constructing predictive sets with finite-sample validity. While much of the focus has been on univariate response variables, existing multivariate methods either impose rigid geometric assumptions or rely on flexible but computationally expensive approaches that do not explicitly optimize prediction set volume. We propose an optimization-driven framework based on a novel loss function that directly learns minimum-volume covering sets while ensuring valid coverage. This formulation naturally induces a new nonconformity score for conformal prediction, which adapts to the residual distribution and covariates. Our approach optimizes over prediction sets defined by arbitrary norm balls, including single and multi-norm formulations. Additionally, by jointly optimizing both the predictive model and predictive uncertainty, we obtain prediction sets that are tight, informative, and computationally efficient, as demonstrated in our experiments on real-world datasets.
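
For context, the simplest norm-ball construction that this line of work builds on is split conformal prediction with a fixed Euclidean-norm score. The sketch below shows that standard baseline only; it is not the learned minimum-volume score the abstract proposes. The function names and the toy calibration residuals are assumptions.

```python
import math


def conformal_radius(residuals, alpha=0.1):
    """Split-conformal radius for an L2 ball around a point prediction.

    residuals: residual vectors (y - prediction) from a held-out
    calibration set. Returns the ceil((n + 1) * (1 - alpha))-th smallest
    residual norm, the usual finite-sample-valid quantile.
    """
    scores = sorted(math.sqrt(sum(r * r for r in res)) for res in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def in_prediction_set(y, y_pred, radius):
    """True if y lies in the ball of the given radius centered at y_pred."""
    return math.dist(y, y_pred) <= radius


# Hypothetical calibration residuals for a two-dimensional response.
calibration = [(3, 4), (0, 1), (1, 0), (0, 2), (6, 8), (0, 3), (4, 0)]
radius = conformal_radius(calibration, alpha=0.25)
print(radius, in_prediction_set((1.0, 1.0), (0.0, 0.0), radius))
```

A fixed ball like this ignores the shape of the residual distribution, which is the rigidity the abstract's optimization-driven score is meant to address.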

artificial intelligence, machine learning, prediction, (17 more...)

arXiv.org Machine Learning

2503.19068

Country:

Europe > France (0.28)
North America > Canada (0.28)

Genre:

Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

The Case for "Thick Evaluations" of Cultural Representation in AI

Qadri, Rida, Diaz, Mark, Wang, Ding, Madaio, Michael

To address these gaps, prior work has sought to evaluate the cultural representations within AI-generated output, but with few exceptions [30, 67], mostly through quantified, metricized approaches to representation such as statistical similarities and benchmark-style scoring [49, 84]. However, the use of these methods presumes that representation is an objective construct with an empirical, definitive ground truth that outputs can be compared against [e.g., 42, 84] [for a critique of ground truth, see 59]. Given limitations of these computational methods, evaluation of representation is reduced to basic recognition or factual generation of artifacts. Even when human feedback on representation is sought, it is solicited through narrow, constrained, quantitative scales from anonymized crowdworkers who often do not have the lived experiences to evaluate nuances of cultural representation of other cultures. However, this approach to measuring representation is in contravention of decades of scholarship in the social sciences that emphasizes the subjective nature of representation, where judgments about representation in visual media are constructed in conversation with the viewer's lived experiences and the broader context within which an image is ...

artificial intelligence, machine learning, natural language, (15 more...)

2503.19075

Country:

North America > United States (1.00)
Asia > India (1.00)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

LogicLearner: A Tool for the Guided Practice of Propositional Logic Proofs

Inamdar, Amogh, Macar, Uzay, Vazirani, Michel, Tarnow, Michael, Mustapha, Zarina, Dittren, Natalia, Sadeh, Sam, Verma, Nakul, Salleb-Aouissi, Ansaf

The study of propositional logic -- fundamental to the theory of computing -- is a cornerstone of the undergraduate computer science curriculum. Learning to solve logical proofs requires repeated guided practice, but undergraduate students often lack access to on-demand tutoring in a judgment-free environment. In this work, we highlight the need for guided practice tools in undergraduate mathematics education and outline the desiderata of an effective practice tool. We accordingly develop LogicLearner, a web application for guided logic proof practice. LogicLearner consists of an interface to attempt logic proofs step-by-step and an automated proof solver to generate solutions on the fly, allowing users to request guidance as needed. We pilot LogicLearner as a practice tool in two semesters of an undergraduate discrete mathematics course and receive strongly positive feedback for usability and pedagogical value in student surveys. To the best of our knowledge, LogicLearner is the only learning tool that provides an end-to-end practice environment for logic proofs with immediate, judgment-free feedback.
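
One simple way a practice tool could validate a single proof step is a brute-force truth-table comparison of the formulas before and after the step. The sketch below illustrates that idea only; it is not LogicLearner's proof solver, and the function name `equivalent` and the lambda encoding of formulas are assumptions.

```python
from itertools import product


def equivalent(f, g, n_vars):
    """True if boolean functions f and g agree on every truth assignment."""
    return all(
        f(*vals) == g(*vals)
        for vals in product([False, True], repeat=n_vars)
    )


# De Morgan's law: not (p and q) is equivalent to (not p) or (not q).
lhs = lambda p, q: not (p and q)
rhs = lambda p, q: (not p) or (not q)
print(equivalent(lhs, rhs, 2))
```

The check enumerates all 2^n assignments, so it suits the small formulas typical of classroom exercises rather than large ones.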

large language model, logic & formal reasoning, machine learning, (20 more...)

2503.1928

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Computational Thinking with Computer Vision: Developing AI Competency in an Introductory Computer Science Course

Chowdhury, Tahiya

Developing competency in artificial intelligence is becoming increasingly crucial for computer science (CS) students at all levels of the CS curriculum. However, most previous research focuses on advanced CS courses, as traditional introductory courses provide limited opportunities to develop AI skills and knowledge. This paper introduces an introductory CS course where students learn computational thinking through computer vision, a sub-field of AI, as an application context. The course aims to achieve computational thinking outcomes alongside critical thinking outcomes that expose students to AI approaches and their societal implications. Through experiential activities such as individual projects and reading discussions, our course seeks to balance technical learning and critical thinking goals. Our evaluation, based on pre- and post-course surveys, shows an improved sense of belonging, self-efficacy, and AI ethics awareness among students. The results suggest that an AI-focused context can enhance participation and employability, student-selected projects support self-efficacy, and ethically grounded AI instruction can be effective for interdisciplinary audiences. Students' discussions on reading assignments demonstrated deep engagement with the complex challenges in today's AI landscape. Finally, we share insights on scaling such courses for larger cohorts and improving the learning experience for introductory CS students.

machine learning, natural language, programming language, (15 more...)

2503.19006

Country: North America > United States (0.30)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Neural Information Processing Systems, Mar-23-2025, 22:59:23 GMT

MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

Yubin Kim, Chanwoo Park

Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named Medical Decision-making Agents (MDAgents), that helps to address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, a simple emulation inspired by the way real-world medical decision-making processes are adapted to tasks of different complexities. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and medical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against human physicians.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (0.45)
North America > United States > Massachusetts (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material (0.68)
Research Report > Experimental Study > Negative Result (0.45)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)