Learning Communication Skills in Multi-task Multi-agent Deep Reinforcement Learning

Zhu, Changxi, Dastani, Mehdi, Wang, Shihan

arXiv.org Artificial Intelligence

In multi-agent deep reinforcement learning (MADRL), agents can communicate with one another to perform a task in a coordinated manner. When multiple tasks are involved, agents can also leverage knowledge from one task to improve learning in other tasks. In this paper, we propose Multi-task Communication Skills (MCS), a communication-based MADRL method that learns and performs multiple tasks simultaneously, with agents interacting through learnable communication protocols. MCS employs a Transformer encoder to encode task-specific observations into a shared message space, capturing shared communication skills among agents. To enhance coordination among agents, we introduce a prediction network that correlates messages with the actions of sender agents in each task. We adapt three multi-agent benchmark environments to multi-task settings, where the number of agents as well as the observation and action spaces vary across tasks. Experimental results demonstrate that MCS achieves better performance than multi-task MADRL baselines without communication, as well as single-task MADRL baselines with and without communication.
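
As a concrete illustration of the architecture described above, here is a minimal PyTorch sketch (not the authors' implementation) of a shared Transformer message encoder paired with a prediction head that ties each message to its sender's action; all module names, dimensions, and the auxiliary loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedMessageEncoder(nn.Module):
    """Encodes task-specific observations into a shared message space."""
    def __init__(self, obs_dim: int, msg_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(obs_dim, msg_dim)   # per-task input projection
        layer = nn.TransformerEncoderLayer(d_model=msg_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, obs):                  # obs: (batch, agents, obs_dim)
        return self.encoder(self.proj(obs))  # msgs: (batch, agents, msg_dim)

class ActionPredictor(nn.Module):
    """Predicts the sender's action from its message, so messages are
    trained to carry action-relevant information."""
    def __init__(self, msg_dim: int, n_actions: int):
        super().__init__()
        self.head = nn.Linear(msg_dim, n_actions)

    def forward(self, msg):
        return self.head(msg)                # logits over sender actions

# Auxiliary objective: predict each sender's action from its message.
encoder = SharedMessageEncoder(obs_dim=32)
predictor = ActionPredictor(msg_dim=64, n_actions=5)
obs = torch.randn(8, 3, 32)                  # 8 samples, 3 agents
actions = torch.randint(0, 5, (8, 3))        # actual sender actions
logits = predictor(encoder(obs))
loss = nn.functional.cross_entropy(logits.flatten(0, 1), actions.flatten())
```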



Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model

Aggarwal, Divyanshu, Damle, Sankarshan, Goyal, Navin, Lokam, Satya, Sitaram, Sunayana

arXiv.org Artificial Intelligence

A common challenge for the adaptability of Large Language Models (LLMs) is their ability to learn new languages over time without hampering the model's performance on languages in which the model is already proficient (usually English). Continual fine-tuning (CFT) is the process of sequentially fine-tuning an LLM to enable the model to adapt to downstream tasks with varying data distributions and time shifts. This paper focuses on the language adaptability of LLMs through CFT. We study a two-phase CFT process in which an English-only end-to-end fine-tuned LLM from Phase 1 (predominantly Task Ability) is sequentially fine-tuned on a multilingual dataset -- comprising task data in new languages -- in Phase 2 (predominantly Language Ability). We observe that the "similarity" of Phase 2 tasks with Phase 1 determines the LLM's adaptability. For similar phase-wise datasets, the LLM after Phase 2 does not show deterioration in task ability. In contrast, when the phase-wise datasets are not similar, the LLM's task ability deteriorates. We test our hypothesis on the open-source Mistral and Llama models with multiple phase-wise dataset pairs. To address the deterioration, we analyze tailored variants of two CFT methods: layer freezing and generative replay. Our findings demonstrate their effectiveness in enhancing the language ability of LLMs while preserving task performance, in comparison to relevant baselines.
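
The layer-freezing variant can be pictured with a short, hedged sketch: freeze the lower decoder blocks (assumed to hold Phase-1 task ability) and fine-tune only the upper blocks on the Phase-2 multilingual data. The model choice, the Mistral module layout, and the half-way split point are illustrative assumptions, not the paper's exact recipe.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
layers = model.model.layers              # decoder blocks (Mistral layout)
cutoff = len(layers) // 2                # freeze the bottom half (assumption)

for i, block in enumerate(layers):
    trainable = i >= cutoff              # train only the upper blocks
    for p in block.parameters():
        p.requires_grad = trainable

# Keep the embeddings frozen as well, preserving Phase-1 representations.
for p in model.model.embed_tokens.parameters():
    p.requires_grad = False
```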


Teaching LLMs to Abstain across Languages via Multilingual Feedback

Feng, Shangbin, Shi, Weijia, Wang, Yike, Ding, Wenxuan, Ahia, Orevaoghene, Li, Shuyue Stella, Balachandran, Vidhisha, Sitaram, Sunayana, Tsvetkov, Yulia

arXiv.org Artificial Intelligence

Multilingual LLMs often have knowledge disparities across languages, with larger gaps in under-resourced languages. Teaching LLMs to abstain in the face of knowledge gaps is thus a promising strategy to mitigate hallucinations in multilingual settings. However, previous studies on LLM abstention primarily focus on English; we find that directly applying existing solutions beyond English results in up to 20.5% performance gaps between high- and low-resource languages, potentially due to LLMs' drop in calibration and reasoning beyond a few resource-rich languages. To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identify knowledge gaps across diverse languages, cultures, and communities. Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines, achieving up to 9.2% improvement for low-resource languages across three black-box and open models on three datasets, featuring open-book, closed-book, and commonsense QA. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstention strategy to serve diverse language speakers, and that cultural factors have a great impact on language selection and LLM abstention behavior, highlighting future directions for multilingual and multi-cultural reliable language modeling.
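
A minimal sketch of the feedback loop might look as follows, with `ask` standing in for any chat-model API; the prompts, the choice of related languages, and the majority-vote threshold are assumptions for illustration, not the paper's exact procedure.

```python
from typing import Callable, Sequence

def should_abstain(ask: Callable[[str], str], question: str, answer: str,
                   feedback_langs: Sequence[str] = ("Hindi", "Bengali"),
                   threshold: float = 0.5) -> bool:
    """Abstain when feedback written in related languages doubts the answer."""
    votes = []
    for lang in feedback_langs:
        feedback = ask(f"In {lang}, critique this answer for factual errors "
                       f"or knowledge gaps.\nQ: {question}\nA: {answer}")
        verdict = ask("Does the following feedback suggest the answer is "
                      f"unreliable? Reply YES or NO.\n{feedback}")
        votes.append(verdict.strip().upper().startswith("YES"))
    return sum(votes) / len(votes) >= threshold  # abstain on majority doubt
```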


Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Son, Guijin, Baek, Sangwon, Nam, Sangdae, Jeong, Ilgyun, Kim, Seungone

arXiv.org Artificial Intelligence

Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench (Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by 1.46x on average, since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance with Multi-Task Inference compared to Single-Task Inference on the MTI Bench. We release the MTI Bench dataset and our code at https://github.com/guijinSON/MTI-Bench.
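
The contrast between the two inference modes is easy to see in code. Below is a hedged sketch in which `ask` stands in for any LLM call and the prompt template is an illustrative assumption: Single-Task Inference issues one call per sub-task, while Multi-Task Inference bundles all sub-tasks into a single prompt.

```python
from typing import Callable, List

def single_task_inference(ask: Callable[[str], str],
                          sub_tasks: List[str]) -> List[str]:
    return [ask(t) for t in sub_tasks]           # N separate inference calls

def multi_task_inference(ask: Callable[[str], str],
                         sub_tasks: List[str]) -> str:
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(sub_tasks))
    prompt = ("Complete all of the following tasks in order, labeling "
              f"each answer with its task number:\n{numbered}")
    return ask(prompt)                           # one inference call in total
```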


Multi: Multimodal Understanding Leaderboard with Text and Images

Zhu, Zichen, Xu, Yang, Chen, Lu, Yang, Jingkai, Ma, Yichuan, Sun, Yiming, Wen, Hailin, Liu, Jiaqi, Cai, Jinyu, Ma, Yingzi, Zhang, Situo, Zhao, Zihan, Sun, Liangtai, Yu, Kai

arXiv.org Artificial Intelligence

Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community. Existing benchmarks primarily focus on simple natural image understanding, but Multi emerges as a cutting-edge benchmark for MLLMs, offering a comprehensive dataset for evaluating MLLMs on understanding complex figures and tables as well as scientific questions. This benchmark, reflecting current realistic examination styles, provides multimodal inputs and requires responses that are either precise or open-ended, similar to real-life school tests. It challenges MLLMs with a variety of tasks, ranging from formula derivation to image detail analysis and cross-modality reasoning. Multi includes over 18,000 questions, with a focus on science-based QA in diverse formats. We also introduce Multi-Elite, a 500-question subset for testing the extremities of MLLMs, and Multi-Extend, which enhances In-Context Learning research with more than 4,500 knowledge pieces. Our evaluation indicates significant potential for MLLM advancement, with GPT-4V achieving a 63.7% accuracy rate on Multi, in contrast to other MLLMs scoring between 31.3% and 53.7%. Multi serves not only as a robust evaluation platform but also paves the way for the development of expert-level AI.


Modeling multi-legged robot locomotion with slipping and its experimental validation

Wu, Ziyou, Zhao, Dan, Revzen, Shai

arXiv.org Artificial Intelligence

Multi-legged robots with six or more legs are not in common use, despite designs with superior stability, maneuverability, and a low number of actuators being available for over 20 years. This may be in part due to the difficulty in modeling multi-legged motion with slipping and producing reliable predictions of body velocity. Here we present a detailed measurement of the foot contact forces in a hexapedal robot with multiple sliding contacts, and provide an algorithm for predicting these contact forces and the body velocity. The algorithm relies on the recently published observation that even while slipping, multi-legged robots are principally kinematic, and employs a friction law ansatz that allows us to compute the shape-change to body-velocity connection and the foot contact forces. This results in the ability to simulate motion plans for a large number of contacts, each potentially with slipping. Furthermore, in homogeneous environments, this kind of simulation can run in (parallel) logarithmic time in the planning horizon.
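
The "principally kinematic" observation admits a compact numerical sketch: choose the body velocity that minimizes friction-weighted foot slip given the commanded shape (leg) velocity. The Jacobian shapes and the weighting scheme below are synthetic stand-ins for illustration, not the paper's friction law.

```python
import numpy as np

def body_velocity(J_body, J_shape, dshape, w):
    """Minimize sum_i w[i] * ||J_body[i] @ v + J_shape[i] @ dshape||^2 over v.

    J_body : (n_feet, 2, 3) maps body velocity (vx, vy, wz) to foot velocity
    J_shape: (n_feet, 2, m) maps shape (leg joint) velocity to foot velocity
    dshape : (m,)           commanded shape velocity
    w      : (n_feet,)      friction weights (higher weight = less slip)
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for Jb, Js, wi in zip(J_body, J_shape, w):
        A += wi * Jb.T @ Jb                 # weighted normal equations
        b -= wi * Jb.T @ (Js @ dshape)
    return np.linalg.solve(A, b)            # assumes >= 2 non-collinear feet
```

Each contact contributes an independent term to the normal equations, so the accumulation parallelizes naturally across a large number of potentially slipping contacts.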


MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Mo, Shentong, Liang, Paul Pu, Salakhutdinov, Russ, Morency, Louis-Philippe

arXiv.org Artificial Intelligence

The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, and 3D structure of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for impact in understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, spanning modality- and task-specific methods to multisensory and multitask models, to encourage future research in multisensory representation learning for IoT.


PolyLM: An Open Source Polyglot Large Language Model

Wei, Xiangpeng, Wei, Haoran, Lin, Huan, Li, Tianhao, Zhang, Pei, Ren, Xingzhang, Li, Mei, Wan, Yu, Cao, Zhiwei, Xie, Binbin, Hu, Tianxiang, Li, Shangjie, Hui, Binyuan, Yu, Bowen, Liu, Dayiheng, Yang, Baosong, Huang, Fei, Xie, Jun

arXiv.org Artificial Intelligence

Large language models (LLMs) demonstrate a remarkable ability to comprehend, reason, and generate text following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, along with the instruction data and multilingual benchmark, are available at: \url{https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation}.
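
The curriculum step of the pre-training recipe can be sketched in a few lines: linearly raise the non-English share of each batch from 30% to 60% over training. The linear schedule and the sampling mechanics are illustrative assumptions; only the 30%-to-60% endpoints come from the abstract.

```python
import random

def non_english_fraction(step: int, total_steps: int,
                         start: float = 0.30, end: float = 0.60) -> float:
    """Anneal the non-English data share (linear schedule is an assumption)."""
    t = min(step / max(total_steps, 1), 1.0)
    return start + t * (end - start)

def sample_batch(en_pool, multi_pool, batch_size, step, total_steps):
    """Mix English and non-English examples according to the schedule."""
    k = round(batch_size * non_english_fraction(step, total_steps))
    return (random.sample(multi_pool, k)
            + random.sample(en_pool, batch_size - k))
```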