Clustering
End-to-End Framework for Robot Lawnmower Coverage Path Planning using Cellular Decomposition
Shah, Nikunj, Dey, Utsav, Nishimiya, Kenji
Efficient Coverage Path Planning (CPP) is necessary for autonomous robotic lawnmowers to effectively navigate and maintain lawns with diverse and irregular shapes. This paper introduces a comprehensive end-to-end pipeline for CPP, designed to convert user-defined boundaries on an aerial map into optimized coverage paths seamlessly. The pipeline includes user input extraction, coordinate transformation, area decomposition and path generation using our novel AdaptiveDecompositionCPP algorithm, preview and customization through an interactive coverage path visualizer, and conversion to actionable GPS waypoints. The AdaptiveDecompositionCPP algorithm combines cellular decomposition with an adaptive merging strategy to reduce non-mowing travel thereby enhancing operational efficiency. Experimental evaluations, encompassing both simulations and real-world lawnmower tests, demonstrate the effectiveness of the framework in coverage completeness and mowing efficiency.
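The path-generation step inside a single decomposed cell can be illustrated with a back-and-forth (boustrophedon) sweep. The sketch below is not the AdaptiveDecompositionCPP algorithm from the abstract; it assumes an axis-aligned rectangular cell and a fixed mowing width, and the function name `boustrophedon_path` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def boustrophedon_path(x_min: float, y_min: float,
                       x_max: float, y_max: float,
                       swath: float) -> List[Point]:
    """Return waypoints for a back-and-forth sweep of a rectangular cell.

    Assumption: the cell is an axis-aligned rectangle and `swath` is the
    cutting width of the mower in the same units as the coordinates.
    """
    width = x_max - x_min
    if swath <= 0 or width <= 0 or y_max <= y_min:
        raise ValueError("cell must have positive size and swath must be > 0")

    # Use enough passes that neighbouring passes are at most one swath apart.
    passes = math.ceil(width / swath)
    spacing = width / passes

    path: List[Point] = []
    for i in range(passes):
        x = x_min + (i + 0.5) * spacing  # centre line of pass i
        if i % 2 == 0:
            path.extend([(x, y_min), (x, y_max)])  # sweep upwards
        else:
            path.extend([(x, y_max), (x, y_min)])  # sweep downwards
    return path


waypoints = boustrophedon_path(0.0, 0.0, 4.0, 10.0, 1.0)
```

Alternating the sweep direction keeps the connecting moves between passes short, which is the kind of non-mowing travel the abstract aims to reduce.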
Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP
Beneyto-Rodriguez, Alicia, Sainz-Palmero, Gregorio I., Galende-Hernández, Marta, Fuente, María J., Cuenca, José M.
Water reuse is a key point when fresh water is a commodity in ever greater demand, but which is also becoming ever less available. Furthermore, the return of clean water to its natural environment is also mandatory. Therefore, wastewater treatment plants (WWTPs) are essential in any policy focused on these serious challenges. WWTPs are complex facilities which need to operate at their best to achieve their goals. Nowadays, they are extensively monitored, generating large databases of historical data concerning their functioning over time. All this implies a large amount of embedded information which is not usually easy for plant managers to assimilate, correlate and understand; in other words, it is hard for them to know the global operation of the plant at any given time. At this point, intelligent and Machine Learning (ML) approaches can support that need, managing all the data and translating them into manageable, interpretable and explainable knowledge about how the WWTP is operating at a glance. Here, an eXplainable Artificial Intelligence (XAI) based methodology is proposed and tested for a real WWTP, in order to extract explainable service knowledge concerning the operation modes of the WWTP managed by AQUAVALL, which is the public service in charge of the integral water cycle in the City Council of Valladolid (Castilla y León, Spain). By applying well-known approaches of XAI and ML focused on the challenge of WWTP, it has been possible to summarize a large number of historical databases through a few explained operation modes of the plant in a low-dimensional data space, showing the variables and facility units involved in each case.
Interpretable Clustering Ensemble
Lv, Hang, Hu, Lianyu, Jiang, Mudi, Liu, Xinying, He, Zengyou
Clustering ensemble has emerged as an important research topic in the field of machine learning. Although numerous methods have been proposed to improve clustering quality, most existing approaches overlook the need for interpretability in high-stakes applications. In domains such as medical diagnosis and financial risk assessment, algorithms must not only be accurate but also interpretable to ensure transparent and trustworthy decision-making. Therefore, to address the lack of interpretable algorithms in the field of clustering ensemble, we propose the first interpretable clustering ensemble algorithm in the literature. By treating base partitions as categorical variables, our method constructs a decision tree in the original feature space and uses a statistical association test to guide the tree-building process. Experimental results demonstrate that our algorithm achieves comparable performance to state-of-the-art (SOTA) clustering ensemble methods while maintaining the additional feature of interpretability. To the best of our knowledge, this is the first interpretable algorithm specifically designed for clustering ensemble, offering a new perspective for future research in interpretable clustering. Clustering analysis [1] is an unsupervised learning problem in the field of data mining, which aims to partition data into different clusters by exploring its intrinsic structure.
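The idea of letting an association test guide a split can be sketched as follows. The abstract does not say which test is used, so the Pearson chi-square statistic here is an assumption, and the helper names `chi_square` and `best_threshold` are hypothetical. The sketch scores each candidate threshold on one feature by how strongly the resulting binary split is associated with every base partition.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def chi_square(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of the contingency table of a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    rows, cols = Counter(a), Counter(b)
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n  # count expected under independence
            observed = joint.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def best_threshold(values: Sequence[float],
                   base_partitions: List[Sequence[Hashable]]
                   ) -> Tuple[Optional[float], float]:
    """Pick the threshold whose split is most associated with the partitions.

    Assumption: the score of a split is the sum of chi-square statistics
    over all base partitions; the original method may combine them differently.
    """
    distinct = sorted(set(values))
    best: Tuple[Optional[float], float] = (None, -1.0)
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2.0
        side = [v <= threshold for v in values]
        score = sum(chi_square(side, labels) for labels in base_partitions)
        if score > best[1]:
            best = (threshold, score)
    return best


feature = [1.0, 2.0, 3.0, 4.0]
partitions = [[0, 0, 1, 1], [0, 0, 1, 1]]
threshold, score = best_threshold(feature, partitions)
```

Because every split is a single comparison on an original feature, the resulting tree can be read as a set of rules, which is where the interpretability comes from.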
Advancement and Field Evaluation of a Dual-arm Apple Harvesting Robot
Zhu, Keyi, Lammers, Kyle, Zhang, Kaixiang, Arunachalam, Chaaran, Bhattacharya, Siddhartha, Li, Jiajia, Lu, Renfu, Li, Zhaojian
Apples are among the most widely consumed fruits worldwide. Currently, apple harvesting relies entirely on manual labor, which is costly, tedious, and hazardous to workers. Hence, robotic harvesting has attracted increasing attention in recent years. However, existing systems still fall short in terms of performance, effectiveness, and reliability for complex orchard environments. In this work, we present the development and evaluation of a dual-arm harvesting robot. The system integrates a ToF camera, two 4-DOF robotic arms, a centralized vacuum system, and a post-harvest handling module. During harvesting, suction force is dynamically assigned to either arm via the vacuum system, enabling efficient apple detachment while reducing power consumption and noise. Compared to our previous design, we incorporated a platform movement mechanism that enables both in-out and up-down adjustments, enhancing the robot's dexterity and adaptability to varying canopy structures. On the algorithmic side, we developed a robust apple localization pipeline that combines a foundation-model-based detector, segmentation, and clustering-based depth estimation, which improves performance in orchards. Additionally, pressure sensors were integrated into the system, and a novel dual-arm coordination strategy was introduced to respond to harvest failures based on sensor feedback, further improving picking efficiency. Field demonstrations were conducted in two commercial orchards in Michigan, USA, with different canopy structures. The system achieved success rates of 0.807 and 0.797, with an average picking cycle time of 5.97 s. The proposed strategy reduced harvest time by 28% compared to a single-arm baseline. The dual-arm harvesting robot enhances the reliability and efficiency of apple picking. With further advancements, the system holds strong potential for autonomous operation and commercialization in the apple industry.
Learning-Augmented Hierarchical Clustering
Braverman, Vladimir, Ergun, Jon C., Wang, Chen, Zhou, Samson
Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure while grouping together similar data points at each level of granularity. Unfortunately, for many of the proposed HC objectives, strong hardness-of-approximation results stand in the way of good approximation algorithms. Thus, we consider the problem of hierarchical clustering given auxiliary information from natural oracles. Specifically, we focus on a *splitting oracle* which, when provided with a triplet of vertices $(u,v,w)$, answers (possibly erroneously) the pairs of vertices whose lowest common ancestor includes all three vertices in an optimal tree, i.e., identifying which vertex "splits away" from the others. Using such an oracle, we obtain the following results: (i) a polynomial-time algorithm that outputs a hierarchical clustering tree with $O(1)$-approximation to the Dasgupta objective (Dasgupta [STOC'16]); (ii) a near-linear time algorithm that outputs a hierarchical clustering tree with $(1-o(1))$-approximation to the Moseley-Wang objective (Moseley and Wang [NeurIPS'17]). Under the plausible Small Set Expansion Hypothesis, no polynomial-time algorithm can achieve any constant approximation for Dasgupta's objective or $(1-C)$-approximation for the Moseley-Wang objective for some constant $C>0$. As such, our results demonstrate that the splitting oracle enables algorithms to outperform standard HC approaches and overcome hardness constraints. Furthermore, our approaches extend to sublinear settings, in which we show new streaming and PRAM algorithms for HC with improved guarantees.
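For readers unfamiliar with the Dasgupta objective, the sketch below evaluates it for a given tree: each weighted edge $(i,j)$ contributes its weight times the number of leaves under the lowest common ancestor of $i$ and $j$, and lower cost is better. The representation (nested tuples for internal nodes, non-tuple labels for leaves, a dictionary keyed by `frozenset` pairs for weights) and the name `dasgupta_cost` are assumptions made for illustration; the sketch does not implement the oracle-based algorithms from the abstract.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple


def dasgupta_cost(tree, weights: Dict[FrozenSet[Hashable], float]) -> float:
    """Sum over edges of w(i, j) * (number of leaves under LCA(i, j)).

    Assumption: internal nodes are tuples of children, and leaves are
    hashable labels that are not themselves tuples.
    """

    def visit(node) -> Tuple[List[Hashable], float]:
        if not isinstance(node, tuple):
            return [node], 0.0  # a leaf separates no pairs

        child_leaves: List[List[Hashable]] = []
        cost = 0.0
        for child in node:
            leaves, child_cost = visit(child)
            child_leaves.append(leaves)
            cost += child_cost

        size = sum(len(leaves) for leaves in child_leaves)

        # Pairs in different children have this node as their LCA.
        crossing = 0.0
        for a in range(len(child_leaves)):
            for b in range(a + 1, len(child_leaves)):
                for u in child_leaves[a]:
                    for v in child_leaves[b]:
                        crossing += weights.get(frozenset((u, v)), 0.0)

        cost += size * crossing
        all_leaves = [leaf for leaves in child_leaves for leaf in leaves]
        return all_leaves, cost

    return visit(tree)[1]


w = {
    frozenset(("a", "b")): 1.0,
    frozenset(("c", "d")): 1.0,
    frozenset(("b", "c")): 0.5,
}
good = dasgupta_cost((("a", "b"), ("c", "d")), w)  # similar pairs merged low
bad = dasgupta_cost((("a", "c"), ("b", "d")), w)   # similar pairs split at root
```

In this toy graph the tree that merges the heavy edges first costs less, which matches the intuition that similar points should be separated as late as possible.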
HGOT: Self-supervised Heterogeneous Graph Neural Network with Optimal Transport
Liu, Yanbei, Wang, Chongxu, Xiao, Zhitao, Geng, Lei, Pang, Yanwei, Wang, Xiao
Heterogeneous Graph Neural Networks (HGNNs) have demonstrated excellent capabilities in processing heterogeneous information networks. Self-supervised learning on heterogeneous graphs, especially contrastive self-supervised strategy, shows great potential when there are no labels. However, this approach requires the use of carefully designed graph augmentation strategies and the selection of positive and negative samples. Determining the exact level of similarity between sample pairs is non-trivial. To solve this problem, we propose a novel self-supervised Heterogeneous graph neural network with Optimal Transport (HGOT) method which is designed to facilitate self-supervised learning for heterogeneous graphs without graph augmentation strategies. Different from traditional contrastive self-supervised learning, HGOT employs the optimal transport mechanism to relieve the laborious sampling process of positive and negative samples. Specifically, we design an aggregating view (central view) to integrate the semantic information contained in the views represented by different meta-paths (branch views). Then, we introduce an optimal transport plan to identify the transport relationship between the semantics contained in the branch view and the central view. This allows the optimal transport plan between graphs to align with the representations, forcing the encoder to learn node representations that are more similar to the graph space and of higher quality. Extensive experiments on four real-world datasets demonstrate that our proposed HGOT model can achieve state-of-the-art performance on various downstream tasks. In particular, in the node classification task, HGOT achieves an average of more than 6% improvement in accuracy compared with state-of-the-art methods.
Communication Efficient Adaptive Model-Driven Quantum Federated Learning
Gurung, Dev, Pokhrel, Shiva Raj
Training with huge datasets and a large number of participating devices leads to bottlenecks in federated learning (FL). Furthermore, the challenges of heterogeneity between multiple FL clients affect the overall performance of the system. In a quantum federated learning (QFL) context, we address these three main challenges: i) training bottlenecks from massive datasets, ii) the involvement of a substantial number of devices, and iii) non-IID data distributions. We introduce a model-driven quantum federated learning algorithm (mdQFL) to tackle these challenges. Our proposed approach is efficient and adaptable to various factors, including different numbers of devices. To the best of our knowledge, it is the first to explore training and update personalization, as well as test generalization within a QFL setting, which can be applied to other FL scenarios. We evaluated the efficiency of the proposed mdQFL framework through extensive experiments under diverse non-IID data heterogeneity conditions using various datasets within the Qiskit environment. Our results demonstrate a nearly 50% decrease in total communication costs while maintaining or, in some cases, exceeding the accuracy of the final model and consistently improving local model training compared to the standard QFL baseline. Moreover, our experimental evaluation thoroughly explores the QFL and mdQFL algorithms, along with several influencing factors. In addition, we present a theoretical analysis to clarify the complexities of the proposed algorithm. Federated Learning (FL) has emerged as a pivotal technique to address the challenges of privacy and security in distributed machine learning [1], [2].
Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices
Chang, Andersen, Tang, Tiffany M., Zikry, Tarek M., Allen, Genevera I.
Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for unsupervised learning can advance scientific discovery.
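One common way to evaluate the stability of a clustering result is to rerun the analysis (for example with a different seed or a resampled dataset) and compare the resulting partitions with the adjusted Rand index (ARI). The sketch below is a standard-library implementation of the ARI offered as an illustration; the abstract does not state which stability measure the authors recommend, so this choice is an assumption.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable],
                        labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same items.

    1.0 means identical partitions (up to relabeling); values near 0
    mean agreement no better than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")

    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Pairs placed together in both, in A only, and in B only.
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:
        return 1.0  # both partitions are trivial in the same way
    return (sum_joint - expected) / (max_index - expected)


run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]   # same grouping, different label names
run_3 = [0, 1, 0, 1]   # a conflicting grouping
same = adjusted_rand_index(run_1, run_2)
different = adjusted_rand_index(run_1, run_3)
```

Averaging this score over many pairs of reruns gives a single stability number: a value that stays close to 1.0 suggests the clusters are not an artifact of one particular run.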
Clustering and Median Aggregation Improve Differentially Private Inference
Amin, Kareem, Avestimehr, Salman, Babakniya, Sara, Bie, Alex, Kong, Weiwei, Ponomareva, Natalia, Syed, Umar
Differentially private (DP) language model inference is an approach for generating private synthetic text. A sensitive input example is used to prompt an off-the-shelf large language model (LLM) to produce a similar example. Multiple examples can be aggregated together to formally satisfy the DP guarantee. Prior work creates inference batches by sampling sensitive inputs uniformly at random. We show that uniform sampling degrades the quality of privately generated text, especially when the sensitive examples concern heterogeneous topics. We remedy this problem by clustering the input data before selecting inference batches. Next, we observe that clustering also leads to more similar next-token predictions across inferences. We use this insight to introduce a new algorithm that aggregates next token statistics by privately computing medians instead of averages. This approach leverages the fact that the median has decreased local sensitivity when next token predictions are similar, allowing us to state a data-dependent and ex-post DP guarantee about the privacy properties of this algorithm. Finally, we demonstrate improvements in terms of representativeness metrics (e.g., MAUVE) as well as downstream task performance. We show that our method produces high-quality synthetic data at significantly lower privacy cost than a previous state-of-the-art method.
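The intuition for aggregating with medians can be shown with a small, non-private sketch: when most next-token predictions in a batch agree, the coordinate-wise median barely moves if one prediction is very different, whereas the mean is pulled toward the outlier. The code below deliberately omits the differential-privacy mechanism and is not the authors' algorithm; the function names and the renormalization step are assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence


def median_aggregate(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise median of next-token distributions, renormalized.

    Assumption: every prediction is a probability vector over the same
    vocabulary. No noise is added, so this is NOT differentially private.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    vocab = len(predictions[0])
    if any(len(p) != vocab for p in predictions):
        raise ValueError("all predictions must share one vocabulary size")

    med = [median(p[t] for p in predictions) for t in range(vocab)]
    total = sum(med)
    if total == 0:
        raise ValueError("median distribution has zero mass")
    return [m / total for m in med]


def mean_aggregate(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean, shown for comparison."""
    n = len(predictions)
    return [sum(column) / n for column in zip(*predictions)]


# Two inferences agree on token 0; the third is an outlier favouring token 2.
batch = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
by_median = median_aggregate(batch)
by_mean = mean_aggregate(batch)
```

Here the median keeps most of its mass on token 0 while the mean shifts a third of the mass to the outlier's token, which is consistent with the abstract's point that clustering similar inputs makes median aggregation attractive.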
Short-Term Power Demand Forecasting for Diverse Consumer Types to Enhance Grid Planning and Synchronisation
Diaz-Iglesias, Asier, Belaunzaran, Xabier, Florez-Tapia, Ane M.
Ensuring grid stability in the transition to renewable energy sources requires accurate power demand forecasting. This study addresses the need for precise forecasting by differentiating among industrial, commercial, and residential consumers through customer clusterisation, tailoring the forecasting models to capture the unique consumption patterns of each group. A feature selection process is performed for each consumer type, including temporal, socio-economic, and weather-related data obtained from the Copernicus Earth Observation (EO) program. A variety of AI and machine learning algorithms for Short-Term Load Forecasting (STLF) and Very Short-Term Load Forecasting (VSTLF) are explored and compared to determine the most effective approaches. The main contributions of this work are the new forecasting approaches proposed, which have demonstrated superior performance compared to simpler models, both for STLF and VSTLF, highlighting the importance of customized forecasting strategies for different consumer groups and demonstrating the impact of incorporating detailed weather data on forecasting accuracy. These advancements contribute to more reliable power demand predictions, thereby supporting grid stability.