Large Language Models as Commonsense Knowledge for Large-Scale Task Planning

Anonymous Author(s)

Neural Information Processing Systems

Appendix A, Experimental environments: We use the VirtualHome simulator. A.1 List of objects, containers, surfaces, and rooms in the apartment: we list all the objects included in our experimental environment. We use object rearrangement tasks for evaluation; the tasks are randomly sampled from different distributions. Simple: this task is to move one object in the house to the desired location. Novel Simple: this task is to move one object in the house to the desired location.






Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

Xu, Xiaoyu, Yue, Xiang, Liu, Yang, Ye, Qingqing, Zheng, Huadi, Hu, Peizhao, Du, Minxin, Hu, Haibo

arXiv.org Artificial Intelligence

Unlearning in large language models (LLMs) aims to remove specified data, but its efficacy is typically assessed with task-level metrics like accuracy and perplexity. We demonstrate that these metrics are often misleading, as models can appear to forget while their original behavior is easily restored through minimal fine-tuning. This phenomenon of \emph{reversibility} suggests that information is merely suppressed, not genuinely erased. To address this critical evaluation gap, we introduce a \emph{representation-level analysis framework}. Our toolkit comprises PCA-based similarity and shift, centered kernel alignment (CKA), and Fisher information, complemented by a summary metric, the mean PCA distance, to measure representational drift. Applying this framework across six unlearning methods, three data domains, and two LLMs, we identify four distinct forgetting regimes based on their \emph{reversibility} and \emph{catastrophicity}. Our analysis reveals that achieving the ideal state--irreversible, non-catastrophic forgetting--is exceptionally challenging. By probing the limits of unlearning, we identify a case of seemingly irreversible, targeted forgetting, offering new insights for designing more robust erasure algorithms. Our findings expose a fundamental gap in current evaluation practices and establish a representation-level foundation for trustworthy unlearning.
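As one concrete instance of the representation-level toolkit described above, centered kernel alignment (CKA) compares activation matrices from two models on the same inputs. The sketch below implements the standard linear-CKA formula; the function name and the choice of the linear variant are assumptions for illustration, not the authors' released code:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two activation matrices.

    X, Y: (n_samples, n_features) representations of the same inputs,
    e.g. from a model before and after an unlearning procedure.
    Returns a similarity in [0, 1]; 1 means identical up to rotation/scale.
    """
    # Center each feature column, as CKA requires.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # CKA(X, Y) = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(X.T @ Y, ord="fro") ** 2
    norm = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / norm
```

A low CKA between pre- and post-unlearning representations indicates genuine representational drift, whereas a high CKA despite degraded task metrics suggests the suppressed-not-erased regime the abstract warns about.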




Why advanced robots still struggle with simple tasks

Popular Science

Robots in 2024 are far more complex than their single-armed factory-working predecessors. Modern robots can run, jump, do the splits, and even hold down a basic conversation. At the same time, despite decades of technical advancements and billions of dollars of investment, even the most advanced robot systems still struggle with many everyday tasks humans take for granted, like folding laundry or stacking blocks. Ironically, robots are quite bad at doing things we find easy. New advances in robot training, which take some inspiration from massively popular large language models like ChatGPT, may change that… eventually.


Comment on Is Complexity an Illusion?

Simmons, Gabriel

arXiv.org Artificial Intelligence

The paper "Is Complexity an Illusion?" (Bennett, 2024) provides a formalism for complexity, learning, inference, and generalization, and introduces a formal definition for a "policy". This reply shows that correct policies do not exist for a simple task of supervised multi-class classification, via mathematical proof and exhaustive search. Implications of this result are discussed, as well as possible responses and amendments to the theory.


Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

Xu, Zhuoyan, Shi, Zhenmei, Liang, Yingyu

arXiv.org Artificial Intelligence

Large language models (LLMs) have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, solving unseen complex tasks that combine two or more simple tasks, is an essential reasoning ability for Artificial General Intelligence. Despite the tremendous success of LLMs, how they approach composite tasks, especially those not encountered during the pretraining phase, remains an open and largely underexplored question. In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples. We develop a test suite of composite tasks including linguistic and logical challenges and perform empirical studies across different LLM families. We observe that models exhibit divergent behaviors: (1) for simpler composite tasks that apply distinct mapping mechanisms to different input segments, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks involving reasoning over multiple steps, where each step represents one task, models typically underperform, and scaling up generally provides no improvement. We offer theoretical analysis in a simplified setting, explaining that models exhibit compositional capability when the task handles different input parts separately. We believe our work sheds new light on the capabilities of LLMs in solving composite tasks with respect to the nature of the tasks and model scale. Our dataset and code are available at {\url{https://github.com/OliverXUZY/LLM_Compose}}.
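To make the "distinct mapping per segment" regime concrete, a composite task can be built from two simple sub-tasks applied to different parts of the input, with only single-task demonstrations in the prompt. The sub-tasks, function names, and prompt format below are hypothetical illustrations, not the paper's actual test suite:

```python
# Two hypothetical simple tasks: uppercase a word, increment a number.
def simple_upper(word: str) -> str:
    return word.upper()

def simple_incr(num: str) -> str:
    return str(int(num) + 1)

def composite_label(word: str, num: str) -> str:
    # The composite task applies a distinct mapping to each input
    # segment -- the regime where models show decent compositional ability.
    return f"{simple_upper(word)} {simple_incr(num)}"

def build_prompt(query_word: str, query_num: str) -> str:
    # In-context examples demonstrate only the simple tasks, never the
    # composite; the model must compose them for the final query.
    demos = [
        f"Input: cat -> Output: {simple_upper('cat')}",
        f"Input: 7 -> Output: {simple_incr('7')}",
    ]
    demos.append(f"Input: {query_word} {query_num} -> Output:")
    return "\n".join(demos)
```

Evaluation then checks whether the model's completion for the composite query matches `composite_label`, while the multi-step regime (each step feeding the next) is where the abstract reports scaling fails to help.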