Clustering
Driving Intelligent IoT Monitoring and Control through Cloud Computing and Machine Learning
Li, Hanzhe, Wang, Xiangxiang, Feng, Yuan, Qi, Yaqian, Tian, Jingxiao
This article explores how to drive intelligent iot monitoring and control through cloud computing and machine learning. As iot and the cloud continue to generate large and diverse amounts of data as sensor devices in the network, the collected data is sent to the cloud for statistical analysis, prediction, and data analysis to achieve business objectives. However, because the cloud computing model is limited by distance, it can be problematic in environments where the quality of the Internet connection is not ideal for critical operations. Therefore, edge computing, as a distributed computing architecture, moves the location of processing applications, data and services from the central node of the network to the logical edge node of the network to reduce the dependence on cloud processing and analysis of data, and achieve near-end data processing and analysis. The combination of iot and edge computing can reduce latency, improve efficiency, and enhance security, thereby driving the development of intelligent systems. The paper also introduces the development of iot monitoring and control technology, the application of edge computing in iot monitoring and control, and the role of machine learning in data analysis and fault detection. Finally, the application and effect of intelligent Internet of Things monitoring and control system in industry, agriculture, medical and other fields are demonstrated through practical cases and experimental studies.
Path Integral Control with Rollout Clustering and Dynamic Obstacles
Patrick, Steven, Bakolas, Efstathios
Model Predictive Path Integral (MPPI) control has proven to be a powerful tool for the control of uncertain systems (such as systems subject to disturbances and systems with unmodeled dynamics). One important limitation of the baseline MPPI algorithm is that it does not utilize simulated trajectories to their fullest extent. For one, it assumes that the average of all trajectories weighted by their performance index will be a safe trajectory. In this paper, multiple examples are shown where the previous assumption does not hold, and a trajectory clustering technique is presented that reduces the chances of the weighted average crossing in an unsafe region. Secondly, MPPI does not account for dynamic obstacles, so the authors put forward a novel cost function that accounts for dynamic obstacles without adding significant computation time to the overall algorithm. The novel contributions proposed in this paper were evaluated with extensive simulations to demonstrate improvements upon the state-of-the-art MPPI techniques.
Compression of the Koopman matrix for nonlinear physical models via hierarchical clustering
Nishikata, Tomoya, Ohkubo, Jun
Machine learning methods allow the prediction of nonlinear dynamical systems from data alone. The Koopman operator is one of them, which enables us to employ linear analysis for nonlinear dynamical systems. The linear characteristics of the Koopman operator are hopeful to understand the nonlinear dynamics and perform rapid predictions. The extended dynamic mode decomposition (EDMD) is one of the methods to approximate the Koopman operator as a finite-dimensional matrix. In this work, we propose a method to compress the Koopman matrix using hierarchical clustering. Numerical demonstrations for the cart-pole model and comparisons with the conventional singular value decomposition (SVD) are shown; the results indicate that the hierarchical clustering performs better than the naive SVD compressions.
Graph Language Model (GLM): A new graph-based approach to detect social instabilities
de Oliveira, Wallyson Lemes, Shamsaddini, Vahid, Ghofrani, Ali, Inda, Rahul Singh, Veeramaneni, Jithendra Sai, Voutaz, รtienne
This scientific report presents a novel methodology for the early prediction of important political events using News datasets. The methodology leverages natural language processing, graph theory, clique analysis, and semantic relationships to uncover hidden predictive signals within the data. Initially, we designed a preliminary version of the method and tested it on a few events. This analysis revealed limitations in the initial research phase. We then enhanced the model in two key ways: first, we added a filtration step to only consider politically relevant news before further processing; second, we adjusted the input features to make the alert system more sensitive to significant spikes in the data. After finalizing the improved methodology, we tested it on eleven events including US protests, the Ukraine war, and French protests. Results demonstrate the superiority of our approach compared to baseline methods. Through targeted refinements, our model can now provide earlier and more accurate predictions of major political events based on subtle patterns in news data.
Using Domain Knowledge to Guide Dialog Structure Induction via Neural Probabilistic Soft Logic
Pryor, Connor, Yuan, Quan, Liu, Jeremiah, Kazemi, Mehran, Ramachandran, Deepak, Bedrax-Weiss, Tania, Getoor, Lise
Dialog Structure Induction (DSI) is the task of inferring the latent dialog structure (i.e., a set of dialog states and their temporal transitions) of a given goal-oriented dialog. It is a critical component for modern dialog system design and discourse analysis. Existing DSI approaches are often purely data-driven, deploy models that infer latent states without access to domain knowledge, underperform when the training corpus is limited/noisy, or have difficulty when test dialogs exhibit distributional shifts from the training domain. This work explores a neural-symbolic approach as a potential solution to these problems. We introduce Neural Probabilistic Soft Logic Dialogue Structure Induction (NEUPSL DSI), a principled approach that injects symbolic knowledge into the latent space of a generative neural model. We conduct a thorough empirical investigation on the effect of NEUPSL DSI learning on hidden representation quality, few-shot learning, and out-of-domain generalization performance. Over three dialog structure induction datasets and across unsupervised and semi-supervised settings for standard and cross-domain generalization, the injection of symbolic knowledge using NEUPSL DSI provides a consistent boost in performance over the canonical baselines.
New Intent Discovery with Attracting and Dispersing Prototype
Zhang, Shun, Yang, Jian, Bai, Jiaqi, Yan, Chaoran, Li, Tongliang, Yan, Zhao, Li, Zhoujun
New Intent Discovery (NID) aims to recognize known and infer new intent categories with the help of limited labeled and large-scale unlabeled data. The task is addressed as a feature-clustering problem and recent studies augment instance representation. However, existing methods fail to capture cluster-friendly representations, since they show less capability to effectively control and coordinate within-cluster and between-cluster distances. Tailored to the NID problem, we propose a Robust and Adaptive Prototypical learning (RAP) framework for globally distinct decision boundaries for both known and new intent categories. Specifically, a robust prototypical attracting learning (RPAL) method is designed to compel instances to gravitate toward their corresponding prototype, achieving greater within-cluster compactness. To attain larger between-cluster separation, another adaptive prototypical dispersing learning (APDL) method is devised to maximize the between-cluster distance from the prototype-to-prototype perspective. Experimental results evaluated on three challenging benchmarks (CLINC, BANKING, and StackOverflow) of our method with better cluster-friendly representation demonstrate that RAP brings in substantial improvements over the current state-of-the-art methods (even large language model) by a large margin (average +5.5% improvement).
From Discrete to Continuous: Deep Fair Clustering With Transferable Representations
We consider the problem of deep fair clustering, which partitions data into clusters via the representations extracted by deep neural networks while hiding sensitive data attributes. To achieve fairness, existing methods present a variety of fairness-related objective functions based on the group fairness criterion. However, these works typically assume that the sensitive attributes are discrete and do not work for continuous sensitive variables, such as the proportion of the female population in an area. Besides, the potential of the representations learned from clustering tasks to improve performance on other tasks is ignored by existing works. In light of these limitations, we propose a flexible deep fair clustering method that can handle discrete and continuous sensitive attributes simultaneously. Specifically, we design an information bottleneck style objective function to learn fair and clustering-friendly representations. Furthermore, we explore for the first time the transferability of the extracted representations to other downstream tasks. Unlike existing works, we impose fairness at the representation level, which could guarantee fairness for the transferred task regardless of clustering results. To verify the effectiveness of the proposed method, we perform extensive experiments on datasets with discrete and continuous sensitive attributes, demonstrating the advantage of our method in comparison with state-of-the-art methods.
What Happens to a Dataset Transformed by a Projection-based Concept Removal Method?
We investigate the behavior of methods that use linear projections to remove information about a concept from a language representation, and we consider the question of what happens to a dataset transformed by such a method. A theoretical analysis and experiments on real-world and synthetic data show that these methods inject strong statistical dependencies into the transformed datasets. After applying such a method, the representation space is highly structured: in the transformed space, an instance tends to be located near instances of the opposite label. As a consequence, the original labeling can in some cases be reconstructed by applying an anti-clustering method.
Live and Learn: Continual Action Clustering with Incremental Views
Yan, Xiaoqiang, Gan, Yingtao, Mao, Yiqiao, Ye, Yangdong, Yu, Hui
Multi-view action clustering leverages the complementary information from different camera views to enhance the clustering performance. Although existing approaches have achieved significant progress, they assume all camera views are available in advance, which is impractical when the camera view is incremental over time. Besides, learning the invariant information among multiple camera views is still a challenging issue, especially in continual learning scenario. Aiming at these problems, we propose a novel continual action clustering (CAC) method, which is capable of learning action categories in a continual learning manner. To be specific, we first devise a category memory library, which captures and stores the learned categories from historical views. Then, as a new camera view arrives, we only need to maintain a consensus partition matrix, which can be updated by leveraging the incoming new camera view rather than keeping all of them. Finally, a three-step alternate optimization is proposed, in which the category memory library and consensus partition matrix are optimized. The empirical experimental results on 6 realistic multi-view action collections demonstrate the excellent clustering performance and time/space efficiency of the CAC compared with 15 state-of-the-art baselines.
Varroa destructor detection on honey bees using hyperspectral imagery
Duma, Zina-Sabrina, Zemcik, Tomas, Bilik, Simon, Sihvonen, Tuomas, Honec, Peter, Reinikainen, Satu-Pia, Horak, Karel
Hyperspectral (HS) imagery in agriculture is becoming increasingly common. These images have the advantage of higher spectral resolution. Advanced spectral processing techniques are required to unlock the information potential in these HS images. The present paper introduces a method rooted in multivariate statistics designed to detect parasitic Varroa destructor mites on the body of western honey bee Apis mellifera, enabling easier and continuous monitoring of the bee hives. The methodology explores unsupervised (K-means++) and recently developed supervised (Kernel Flows - Partial Least-Squares, KF-PLS) methods for parasitic identification. Additionally, in light of the emergence of custom-band multispectral cameras, the present research outlines a strategy for identifying the specific wavelengths necessary for effective bee-mite separation, suitable for implementation in a custom-band camera. Illustrated with a real-case dataset, our findings demonstrate that as few as four spectral bands are sufficient for accurate parasite identification.