South America
A Minimalistic 3D Self-Organized UAV Flocking Approach for Desert Exploration
Amorim, Thulio, Nascimento, Tiago, Chaudhary, Akash, Ferrante, Eliseo, Saska, Martin
In this work, we propose a minimalistic swarm flocking approach for multirotor unmanned aerial vehicles (UAVs). Our approach allows the swarm to achieve cohesively and aligned flocking (collective motion), in a random direction, without externally provided directional information exchange (alignment control). The method relies on minimalistic sensory requirements as it uses only the relative range and bearing of swarm agents in local proximity obtained through onboard sensors on the UAV. Thus, our method is able to stabilize and control the flock of a general shape above a steep terrain without any explicit communication between swarm members. To implement proximal control in a three-dimensional manner, the Lennard-Jones potential function is used to maintain cohesiveness and avoid collisions between robots. The performance of the proposed approach was tested in real-world conditions by experiments with a team of nine UAVs. Experiments also present the usage of our approach on UAVs that are independent of external positioning systems such as the Global Navigation Satellite System (GNSS). Relying only on a relative visual localization through the ultraviolet direction and ranging (UVDAR) system, previously proposed by our group, the experiments verify that our system can be applied in GNSS-denied environments. The degree achieved of alignment and cohesiveness was evaluated using the metrics of order and steady-state value.
Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Model
Sycophancy refers to the tendency of a large language model to align its outputs with the user's perceived preferences, beliefs, or opinions, in order to look favorable, regardless of whether those statements are factually correct. This behavior can lead to undesirable consequences, such as reinforcing discriminatory biases or amplifying misinformation. Given that sycophancy is often linked to human feedback training mechanisms, this study explores whether sycophantic tendencies negatively impact user trust in large language models or, conversely, whether users consider such behavior as favorable. To investigate this, we instructed one group of participants to answer ground-truth questions with the assistance of a GPT specifically designed to provide sycophantic responses, while another group used the standard version of ChatGPT. Initially, participants were required to use the language model, after which they were given the option to continue using it if they found it trustworthy and useful. Trust was measured through both demonstrated actions and self-reported perceptions. The findings consistently show that participants exposed to sycophantic behavior reported and exhibited lower levels of trust compared to those who interacted with the standard version of the model, despite the opportunity to verify the accuracy of the model's output.
A Generalized Thrust Estimation and Control Approach for Multirotors Micro Aerial Vehicles
Santos, Davi, Saska, Martin, Nascimento, Tiago
This paper addresses the problem of thrust estimation and control for the rotors of small-sized multirotors Uncrewed Aerial Vehicles (UAVs). Accurate control of the thrust generated by each rotor during flight is one of the main challenges for robust control of quadrotors. The most common approach is to approximate the mapping of rotor speed to thrust with a simple quadratic model. This model is known to fail under non-hovering flight conditions, introducing errors into the control pipeline. One of the approaches to modeling the aerodynamics around the propellers is the Blade Element Momentum Theory (BEMT). Here, we propose a novel BEMT-based closed-loop thrust estimator and control to eliminate the laborious calibration step of finding several aerodynamic coefficients. We aim to reuse known values as a baseline and fit the thrust estimate to values closest to the real ones with a simple test bench experiment, resulting in a single scaling value. A feedforward PID thrust control was implemented for each rotor, and the methods were validated by outdoor experiments with two multirotor UAV platforms: 250mm and 500mm. A statistical analysis of the results showed that the thrust estimation and control provided better robustness under aerodynamically varying flight conditions compared to the quadratic model.
GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot
Zeng, Aohan, Du, Zhengxiao, Liu, Mingdao, Wang, Kedong, Jiang, Shengmin, Zhao, Lei, Dong, Yuxiao, Tang, Jie
We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It supports both Chinese and English, engages in real-time voice conversations, and varies vocal nuances such as emotion, intonation, speech rate, and dialect according to user instructions. GLM-4-Voice uses an ultra-low bitrate (175bps), single-codebook speech tokenizer with 12.5Hz frame rate derived from an automatic speech recognition (ASR) model by incorporating a vector-quantized bottleneck into the encoder. To efficiently transfer knowledge from text to speech modalities, we synthesize speech-text interleaved data from existing text pre-training corpora using a text-to-token model. We continue pre-training from the pre-trained text language model GLM-4-9B with a combination of unsupervised speech data, interleaved speech-text data, and supervised speech-text data, scaling up to 1 trillion tokens, achieving state-of-the-art performance in both speech language modeling and spoken question answering. We then fine-tune the pre-trained model with high-quality conversational speech data, achieving superior performance compared to existing baselines in both conversational ability and speech quality.
Defending Against Diverse Attacks in Federated Learning Through Consensus-Based Bi-Level Optimization
Trillos, Nicolรกs Garcรญa, Akash, Aditya Kumar, Li, Sixu, Riedl, Konstantin, Zhu, Yuhua
Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of jeopardizing and compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB$^2$O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB$^2$O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB$^2$O against a diverse range of attacks. Thereby, we offer insights into how specific hyperparameter choices enable to mitigate adversarial effects. On the practical side, we extend CB$^2$O to the clustered federated learning setting by proposing FedCB$^2$O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB$^2$O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts.
Bio-inspired visual relative localization for large swarms of UAVs
Kลรญลพek, Martin, Vrba, Matouลก, Kulaลก, Antonella Bariลกiฤ, Bogdan, Stjepan, Saska, Martin
We propose a new approach to visual perception for relative localization of agents within large-scale swarms of UAVs. Inspired by biological perception utilized by schools of sardines, swarms of bees, and other large groups of animals capable of moving in a decentralized yet coherent manner, our method does not rely on detecting individual neighbors by each agent and estimating their relative position, but rather we propose to regress a neighbor density over distance. This allows for a more accurate distance estimation as well as better scalability with respect to the number of neighbors. Additionally, a novel swarm control algorithm is proposed to make it compatible with the new relative localization method. We provide a thorough evaluation of the presented methods and demonstrate that the regressing approach to distance estimation is more robust to varying relative pose of the targets and that it is suitable to be used as the main source of relative localization for swarm stabilization.
A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis
Zhou, Changzhi, Song, Dandan, Tian, Yuhang, Wu, Zhijing, Wang, Hao, Zhang, Xinyu, Yang, Jun, Yang, Ziyi, Zhang, Shuhao
Recently, Large Language Models (LLMs) have garnered increasing attention in the field of natural language processing, revolutionizing numerous downstream tasks with powerful reasoning and generation abilities. For example, In-Context Learning (ICL) introduces a fine-tuning-free paradigm, allowing out-of-the-box LLMs to execute downstream tasks by analogy learning without any fine-tuning. Besides, in a fine-tuning-dependent paradigm where substantial training data exists, Parameter-Efficient Fine-Tuning (PEFT), as the cost-effective methods, enable LLMs to achieve excellent performance comparable to full fine-tuning. However, these fascinating techniques employed by LLMs have not been fully exploited in the ABSA field. Previous works probe LLMs in ABSA by merely using randomly selected input-output pairs as demonstrations in ICL, resulting in an incomplete and superficial evaluation. In this paper, we shed light on a comprehensive evaluation of LLMs in the ABSA field, involving 13 datasets, 8 ABSA subtasks, and 6 LLMs. Specifically, we design a unified task formulation to unify ``multiple LLMs for multiple ABSA subtasks in multiple paradigms.'' For the fine-tuning-dependent paradigm, we efficiently fine-tune LLMs using instruction-based multi-task learning. For the fine-tuning-free paradigm, we propose 3 demonstration selection strategies to stimulate the few-shot abilities of LLMs. Our extensive experiments demonstrate that LLMs achieve a new state-of-the-art performance compared to fine-tuned Small Language Models (SLMs) in the fine-tuning-dependent paradigm. More importantly, in the fine-tuning-free paradigm where SLMs are ineffective, LLMs with ICL still showcase impressive potential and even compete with fine-tuned SLMs on some ABSA subtasks.
Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
Park, Jungwon, Ko, Jungmin, Byun, Dongnam, Suh, Jangwon, Rhee, Wonjong
Recent text-to-image diffusion models leverage cross-attention layers, which have been effectively utilized to enhance a range of visual generative tasks. However, our understanding of cross-attention layers remains somewhat limited. In this study, we present a method for constructing Head Relevance Vectors (HRVs) that align with useful visual concepts. An HRV for a given visual concept is a vector with a length equal to the total number of cross-attention heads, where each element represents the importance of the corresponding head for the given visual concept. We develop and employ an ordered weakening analysis to demonstrate the effectiveness of HRVs as interpretable features. To demonstrate the utility of HRVs, we propose concept strengthening and concept adjusting methods and apply them to enhance three visual generative tasks. We show that misinterpretations of polysemous words in image generation can be corrected in most cases, five challenging attributes in image editing can be successfully modified, and catastrophic neglect in multi-concept generation can be mitigated. Overall, our work provides an advancement in understanding cross-attention layers and introduces new approaches for fine-controlling these layers at the head level.
Multi-objective Deep Learning: Taxonomy and Survey of the State of the Art
Peitz, Sebastian, Hotegni, Sedjro Salomon
Simultaneously considering multiple objectives in machine learning has been a popular approach for several decades, with various benefits for multi-task learning, the consideration of secondary goals such as sparsity, or multicriteria hyperparameter tuning. However - as multi-objective optimization is significantly more costly than single-objective optimization - the recent focus on deep learning architectures poses considerable additional challenges due to the very large number of parameters, strong nonlinearities and stochasticity. This survey covers recent advancements in the area of multi-objective deep learning. We introduce a taxonomy of existing methods - based on the type of training algorithm as well as the decision maker's needs - before listing recent advancements, and also successful applications. All three main learning paradigms supervised learning, unsupervised learning and reinforcement learning are covered, and we also address the recently very popular area of generative modeling.
AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning
Xin, Amy, Liu, Jinxin, Yao, Zijun, Lee, Zhicheng, Cao, Shulin, Hou, Lei, Li, Juanzi
Recent advancements in large language models (LLMs) have led to significant improvements in various natural language processing tasks, but it is still challenging for LLMs to perform knowledge-intensive complex question answering due to LLMs' inefficacy in reasoning planning and the hallucination problem. A typical solution is to employ retrieval-augmented generation (RAG) coupled with chain-of-thought (CoT) reasoning, which decomposes complex questions into chain-like sub-questions and applies iterative RAG at each sub-question. However, prior works exhibit sub-optimal reasoning planning and overlook dynamic knowledge retrieval from heterogeneous sources. In this paper, we propose AtomR, a novel heterogeneous knowledge reasoning framework that conducts multi-source reasoning at the atomic level. Drawing inspiration from the graph modeling of knowledge, AtomR leverages large language models (LLMs) to decompose complex questions into combinations of three atomic knowledge operators, significantly enhancing the reasoning process at both the planning and execution stages. We also introduce BlendQA, a novel evaluation benchmark tailored to assess complex heterogeneous knowledge reasoning. Experiments show that AtomR significantly outperforms state-of-the-art baselines across three single-source and two multi-source reasoning benchmarks, with notable performance gains of 9.4% on 2WikiMultihop and 9.5% on BlendQA.