Using Platt's scaling for calibration after undersampling -- limitations and how to address them
Phelps, Nathan, Lizotte, Daniel J., Woolford, Douglas G.
When modelling data where the response is dichotomous and highly imbalanced, response-based sampling where a subset of the majority class is retained (i.e., undersampling) is often used to create more balanced training datasets prior to modelling. However, the models fit to this undersampled data, which we refer to as base models, generate predictions that are severely biased. There are several calibration methods that can be used to combat this bias, one of which is Platt's scaling. Here, a logistic regression model is used to model the relationship between the base model's original predictions and the response. Despite its popularity for calibrating models after undersampling, Platt's scaling was not designed for this purpose. Our work presents what we believe is the first detailed study focused on the validity of using Platt's scaling to calibrate models after undersampling. We show analytically, as well as via a simulation study and a case study, that Platt's scaling should not be used for calibration after undersampling without critical thought. If Platt's scaling would have been able to successfully calibrate the base model had it been trained on the entire dataset (i.e., without undersampling), then Platt's scaling might be appropriate for calibration after undersampling. If this is not the case, we recommend a modified version of Platt's scaling that fits a logistic generalized additive model to the logit of the base model's predictions, as it is both theoretically motivated and performed well across the settings considered in our study.
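The calibration step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it fits Platt's scaling, i.e. a logistic regression of the response on the base model's predictions, using a small Newton solver written for this example. The data are simulated, and the retention rate `beta` and the form of the biased base predictions (true log-odds shifted by `-log(beta)`, the usual effect of keeping a fraction `beta` of the majority class) are assumptions made for illustration. The logistic GAM variant recommended in the abstract is not shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_log_lik(a, b, xs, ys):
    # Negative Bernoulli log-likelihood of the model sigmoid(a * x + b).
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(a * x + b), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_platt(xs, ys, iters=100):
    """Fit P(y = 1 | x) = sigmoid(a * x + b) by Newton's method with backtracking."""
    a, b = 0.0, 0.0
    current = neg_log_lik(a, b, xs, ys)
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            g_a += (y - p) * x
            g_b += y - p
            h_aa += w * x * x
            h_ab += w * x
            h_bb += w
        # Solve the 2x2 Newton system H * step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        # Halve the step until the negative log-likelihood does not increase.
        step = 1.0
        while step > 1e-8:
            trial = neg_log_lik(a + step * da, b + step * db, xs, ys)
            if trial <= current:
                break
            step *= 0.5
        a, b, current = a + step * da, b + step * db, trial
        if abs(step * da) + abs(step * db) < 1e-10:
            break
    return a, b


# Simulated rare-event data (all values below are illustrative assumptions).
rng = random.Random(0)
beta = 0.1  # assumed fraction of the majority class kept when undersampling
covariate = [rng.gauss(0.0, 1.0) for _ in range(5000)]
labels = [1 if rng.random() < sigmoid(-4.0 + 1.5 * x) else 0 for x in covariate]

# Assumed base-model predictions: correct log-odds shifted upward by -log(beta).
base_pred = [sigmoid(-4.0 + 1.5 * x - math.log(beta)) for x in covariate]

# Platt's scaling: regress the response on the base model's predictions.
a, b = fit_platt(base_pred, labels)
calibrated = [sigmoid(a * s + b) for s in base_pred]

mean_label = sum(labels) / len(labels)
mean_base = sum(base_pred) / len(base_pred)
mean_calibrated = sum(calibrated) / len(calibrated)
```

In this sketch `mean_base` overstates the event rate, while `mean_calibrated` matches `mean_label` because a logistic regression with an intercept reproduces the observed event rate on its fitting data. Matching the overall rate does not by itself mean the predictions are well calibrated across the whole range, which is the concern the abstract raises.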
Bounds in Wasserstein distance for locally stationary processes
Tinio, Jan Nino G., Alaya, Mokhtar Z., Bouzebda, Salim
Locally stationary processes (LSPs) provide a robust framework for modeling time-varying phenomena, allowing for smooth variations in statistical properties such as mean and variance over time. In this paper, we address the estimation of the conditional probability distribution of LSPs using Nadaraya-Watson (NW) type estimators. The NW estimator approximates the conditional distribution of a target variable given covariates through kernel smoothing techniques. We establish the convergence rate of the NW conditional probability estimator for LSPs in the univariate setting under the Wasserstein distance and extend this analysis to the multivariate case using the sliced Wasserstein distance. Theoretical results are supported by numerical experiments on both synthetic and real-world datasets, demonstrating the practical usefulness of the proposed estimators.
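A rough sketch of a Nadaraya-Watson type conditional distribution estimator is given below. It is an illustration under stated assumptions rather than the paper's estimator: the Gaussian kernel, the single shared bandwidth `h`, and the extra kernel in rescaled time `t/n` (a common device for locally stationary data) are choices made for this example. The `wasserstein1` helper uses the standard fact that, on the real line, the Wasserstein-1 distance between two equal-size empirical distributions is the mean absolute difference of their sorted samples.

```python
import math


def gaussian_kernel(z):
    # Unnormalized Gaussian kernel; the constant cancels in the NW ratio.
    return math.exp(-0.5 * z * z)


def nw_weights(x0, u0, xs, h):
    """Normalized NW weights at covariate value x0 and rescaled time u0 in [0, 1]."""
    n = len(xs)
    raw = [
        gaussian_kernel((u0 - (t + 1) / n) / h) * gaussian_kernel((x0 - x) / h)
        for t, x in enumerate(xs)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def nw_conditional_cdf(y, x0, u0, xs, ys, h):
    """Estimate P(Y <= y | X = x0) near rescaled time u0 as a weighted indicator sum."""
    weights = nw_weights(x0, u0, xs, h)
    return sum(w for w, yi in zip(weights, ys) if yi <= y)


def wasserstein1(a, b):
    """Wasserstein-1 distance between two equal-size samples on the real line."""
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)


# Toy series whose scale drifts slowly over time (illustrative data only).
n = 200
xs = [math.sin(0.1 * t) for t in range(n)]
ys = [(1.0 + t / n) * x for t, x in enumerate(xs)]

weights = nw_weights(0.5, 0.5, xs, 0.1)
cdf_low = nw_conditional_cdf(min(ys) - 1.0, 0.5, 0.5, xs, ys, 0.1)
cdf_mid = nw_conditional_cdf(0.5, 0.5, 0.5, xs, ys, 0.1)
cdf_high = nw_conditional_cdf(max(ys), 0.5, 0.5, xs, ys, 0.1)
shift_distance = wasserstein1([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

The estimated conditional distribution is a discrete distribution on the observed responses with the NW weights as probabilities, so it can be compared with a reference distribution through a Wasserstein distance.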
Generalization Bounds and Model Complexity for Kolmogorov-Arnold Networks
Zhang, Xianyang, Zhou, Huijuan
Kolmogorov-Arnold Network (KAN) is a network structure recently proposed by Liu et al. (2024) that offers improved interpretability and a more parsimonious design in many science-oriented tasks compared to multi-layer perceptrons. This work provides a rigorous theoretical analysis of KAN by establishing generalization bounds for KAN equipped with activation functions that are either represented by linear combinations of basis functions or lying in a low-rank Reproducing Kernel Hilbert Space (RKHS). In the first case, the generalization bound accommodates various choices of basis functions in forming the activation functions in each layer of KAN and is adapted to different operator norms at each layer. For a particular choice of operator norms, the bound scales with the $l_1$ norm of the coefficient matrices and the Lipschitz constants for the activation functions, and it has no dependence on combinatorial parameters (e.g., number of nodes) outside of logarithmic factors. Moreover, our result does not require the boundedness assumption on the loss function and, hence, is applicable to a general class of regression-type loss functions. In the low-rank case, the generalization bound scales polynomially with the underlying ranks as well as the Lipschitz constants of the activation functions in each layer. These bounds are empirically investigated for KANs trained with stochastic gradient descent on simulated and real data sets. The numerical results demonstrate the practical relevance of these bounds.
Meta says AI had only 'modest' impact on global elections in 2024
Despite fears that artificial intelligence (AI) could influence the outcome of elections around the world, the United States technology giant Meta said it detected little impact across its platforms this year. That was in part due to defensive measures designed to prevent coordinated networks of accounts, or bots, from grabbing attention on Facebook, Instagram and Threads, Meta president of global affairs Nick Clegg told reporters on Tuesday. "I don't think the use of generative AI was a particularly effective tool for them to evade our trip wires," Clegg said of actors behind coordinated disinformation campaigns. In 2024, Meta says it ran several election operations centres around the world to monitor content issues, including during elections in the US, Bangladesh, Brazil, France, India, Indonesia, Mexico, Pakistan, South Africa, the United Kingdom and the European Union. Most of the covert influence operations it has disrupted in recent years were carried out by actors from Russia, Iran and China, Clegg said, adding that Meta took down about 20 "covert influence operations" on its platform this year.
Meta says AI-generated content was less than 1 percent of election misinformation
AI-generated content played a much smaller role in global election misinformation than what many officials and researchers had feared, according to a new analysis from Meta. In an update on its efforts to safeguard dozens of elections in 2024, the company said that AI content made up only a fraction of election-related misinformation that was caught and labeled by its fact checkers. "During the election period in the major elections listed above, ratings on AI content related to elections, politics and social topics represented less than 1% of all fact-checked misinformation," the company shared in a blog post, referring to elections in the US, UK, Bangladesh, Indonesia, India, Pakistan, France, South Africa, Mexico and Brazil, as well as the EU's Parliamentary elections. The update comes after numerous government officials and researchers for months raised the alarm about the role generative AI could play in supercharging election misinformation in a year when more than 2 billion people were expected to go to the polls. But those fears largely did not play out -- at least on Meta's platforms -- according to the company's President of Global Affairs, Nick Clegg.
Adaptive Two-Phase Finetuning LLMs for Japanese Legal Text Retrieval
Trung, Quang Hoang, Phuc, Nguyen Van Hoang, Hoang, Le Trung, Hieu, Quang Huu, Duy, Vo Nguyen Le
Text Retrieval (TR) involves finding and retrieving text-based content relevant to a user's query from a large repository, with applications in real-world scenarios such as legal document retrieval. While most existing studies focus on English, limited work addresses Japanese contexts. In this paper, we introduce a new dataset specifically designed for Japanese legal contexts and propose a novel two-phase pipeline tailored to this domain. In the first phase, the model learns a broad understanding of global contexts, enhancing its generalization and adaptability to diverse queries. In the second phase, the model is fine-tuned to address complex queries specific to legal scenarios. Extensive experiments are conducted to demonstrate the superior performance of our method, which outperforms existing baselines. Furthermore, our pipeline proves effective in English contexts, surpassing comparable baselines on the MS MARCO dataset. We have made our code publicly available on GitHub, and the model checkpoints are accessible via HuggingFace.
Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains
Nobrega, Lucas Nogueira, de Oliveira, Ewerton, Saska, Martin, Nascimento, Tiago
Human-robot interaction (HRI) is a growing area of research. In HRI, complex command (action) classification is still an open problem that usually prevents the real applicability of such a technique. The literature presents some works that use neural networks to detect these actions. However, occlusion is still a major issue in HRI, especially when using uncrewed aerial vehicles (UAVs), since, during the robot's movement, the human operator is often out of the robot's field of view. Furthermore, in multi-robot scenarios, distributed training is also an open problem. To address these issues, this work proposes an action recognition and control approach based on a two-layer Long Short-Term Memory (LSTM) deep neural network combined with three densely connected layers and Federated Learning (FL) embedded in multiple drones. FL enabled our approach to be trained in a distributed fashion, i.e., with access to data without the need for cloud or other repositories, which facilitates the multi-robot system's learning. Furthermore, our multi-robot approach also prevented occlusion situations, and experiments with real robots achieved an accuracy greater than 96%.
A Multi-Agent Framework for Extensible Structured Text Generation in PLCs
Yang, Donghao, Wu, Aolang, Zhang, Tianyi, Zhang, Li, Liu, Fang, Lian, Xiaoli, Ren, Yuming, Tian, Jiaji
Programmable Logic Controllers (PLCs) are microcomputers essential for automating factory operations. Structured Text (ST), a high-level language adhering to the IEC 61131-3 standard, is pivotal for PLCs due to its ability to express logic succinctly and to seamlessly integrate with other languages within the same standard. However, vendors develop their own customized versions of ST, and the lack of comprehensive and standardized documentation for the full semantics of ST has contributed to inconsistencies in how the language is implemented. Consequently, the steep learning curve associated with ST, combined with ever-evolving industrial requirements, presents significant challenges for developers. In response to these issues, we present AutoPLC, an LLM-based approach designed to automate the generation of vendor-specific ST code. To facilitate effective code generation, we first built a comprehensive knowledge base, including the Rq2ST Case Library (requirements and corresponding implementations) and Instruction libraries. Then we developed a retrieval module to incorporate the domain-specific knowledge by identifying pertinent cases and instructions, guiding the LLM to generate code that meets the requirements. In order to verify and improve the quality of the generated code, we designed an adaptable code checker. If errors are detected, we initiate an iterative self-improvement process to instruct the LLM to revise the generated code. We evaluate AutoPLC's performance against seven state-of-the-art baselines using three benchmarks, one for open-source basic ST and two for commercial Structured Control Language (SCL) from Siemens. The results show that our approach consistently achieves superior performance across all benchmarks. An ablation study emphasizes the significance of our modules. Further manual analysis confirms the practical utility of the ST code generated by AutoPLC.
3D Interaction Geometric Pre-training for Molecular Relational Learning
Lee, Namkyeong, Oh, Yunhak, Noh, Heewoong, Na, Gyoung S., Xu, Minkai, Wang, Hanchen, Fu, Tianfan, Park, Chanyoung
Molecular relational learning (MRL) focuses on understanding the interaction dynamics between molecules and has gained significant attention from researchers thanks to its diverse applications [20]. For instance, understanding how a medication dissolves in different solvents (medication-solvent interaction) is vital in pharmacy [30, 26, 3], while predicting the optical and photophysical properties of chromophores in various solvents (chromophore-solvent interaction) is essential for material discovery [16]. Because of the expensive time and financial costs associated with conducting wet lab experiments to test the interaction behavior of all possible molecular pairs [31], machine learning methods have been quickly embraced for MRL. Despite recent advancements in MRL, previous works tend to ignore molecules' 3D geometric information and instead focus solely on their 2D topological structures. However, in molecular science, the 3D geometric information of molecules (Figure 1 (a)) is crucial for understanding and predicting molecular behavior across various contexts, ranging from physical properties [1] to biological functions [10, 46]. This is particularly important in MRL, as geometric information plays a key role in molecular interactions by determining how molecules recognize, interact, and bind with one another in their interaction environment [34]. In traditional molecular dynamics simulations, explicit solvent models, which directly consider the detailed environment of molecular interaction, have demonstrated superior performance compared to implicit solvent models, which simplify the solvent as a continuous medium, highlighting the significance of explicitly modeling the complex geometries of interaction environments [47]. However, acquiring stereochemical structures of molecules is often very costly, resulting in limited availability of such 3D geometric information for downstream tasks [23].
Probing the statistical properties of enriched co-occurrence networks
Amancio, Diego R., Machicao, Jeaneth, Quispe, Laura V. C.
Recent studies have explored the addition of virtual edges to word co-occurrence networks using word embeddings to enhance graph representations, particularly for short texts. While these enriched networks have demonstrated some success, the impact of incorporating semantic edges into traditional co-occurrence networks remains uncertain. In this study, we investigate two key statistical properties of text-based network models. First, we assess whether network metrics can effectively distinguish between meaningless and meaningful texts. Second, we analyze whether these metrics are more sensitive to syntactic or semantic aspects of the text. Our results show that incorporating virtual edges can have both positive and negative effects, depending on the specific network metric. For instance, the informativeness of the average shortest path and closeness centrality improves in short texts, while the clustering coefficient's informativeness decreases as more virtual edges are added. Additionally, we found that including stopwords affects the statistical properties of enriched networks. Our results can serve as a guideline for determining which network metrics are most appropriate for specific applications, depending on the typical text size and the nature of the problem.
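The enrichment idea in this abstract can be illustrated with a toy example. The sketch below is not the authors' pipeline: it links adjacent words to form a co-occurrence network, adds "virtual" edges between unlinked words whose embedding cosine similarity exceeds a threshold, and compares the average shortest path and average clustering coefficient before and after. The sentence, the two-dimensional embeddings, the adjacency window of one word, and the threshold of 0.9 are all hypothetical values chosen for illustration.

```python
import math
from collections import deque
from itertools import combinations


def cooccurrence_graph(tokens):
    """Undirected graph linking each pair of adjacent, distinct tokens."""
    adj = {t: set() for t in tokens}
    for u, v in zip(tokens, tokens[1:]):
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def cosine(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q)))


def add_virtual_edges(adj, embeddings, threshold):
    """Return a copy of the graph with edges added between similar, unlinked words."""
    enriched = {u: set(nbrs) for u, nbrs in adj.items()}
    for u, v in combinations(sorted(adj), 2):
        if v not in enriched[u] and cosine(embeddings[u], embeddings[v]) >= threshold:
            enriched[u].add(v)
            enriched[v].add(u)
    return enriched


def average_shortest_path(adj):
    """Mean breadth-first-search distance over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than two neighbours count as 0."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Hypothetical sentence and toy embeddings, chosen only for illustration.
tokens = "the cat sat on the mat".split()
embeddings = {
    "the": (0.0, 1.0),
    "cat": (1.0, 0.0),
    "sat": (-1.0, 0.0),
    "on": (0.0, -1.0),
    "mat": (0.9, 0.1),
}

graph = cooccurrence_graph(tokens)
enriched = add_virtual_edges(graph, embeddings, threshold=0.9)

path_before = average_shortest_path(graph)
path_after = average_shortest_path(enriched)
clustering_before = average_clustering(graph)
clustering_after = average_clustering(enriched)
```

In this toy case the only virtual edge joins "cat" and "mat", which shortens the average path and creates a triangle that raises the clustering coefficient. The abstract's findings concern how informative such metrics are across texts, which a single example like this cannot show.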