Biobío Region
The Geometry of Meaning: Perfect Spacetime Representations of Hierarchical Structures
Anabalon, Andres, Garces, Hugo, Oliva, Julio, Cifuentes, Jose
We show that there is a fast algorithm that embeds hierarchical structures in three-dimensional Minkowski spacetime. The correlation of data ends up purely encoded in the causal structure. Our model relies solely on oriented token pairs -- local hierarchical signals -- with no access to global symbolic structure. We apply our method to the corpus of \textit{WordNet}. We provide a perfect embedding of the mammal sub-tree including ambiguities (more than one hierarchy per node) in such a way that the hierarchical structures get completely codified in the geometry and exactly reproduce the ground-truth. We extend this to a perfect embedding of the maximal unambiguous subset of the \textit{WordNet} with 82{,}115 noun tokens and a single hierarchy per token. We introduce a novel retrieval mechanism in which causality, not distance, governs hierarchical access. Our results seem to indicate that all discrete data has a perfect geometrical representation that is three-dimensional. The resulting embeddings are nearly conformally invariant, indicating deep connections with general relativity and field theory. These results suggest that concepts, categories, and their interrelations, namely hierarchical meaning itself, is geometric.
Astromer 2
Donoso-Oliva, Cristobal, Becker, Ignacio, Protopapas, Pavlos, Cabrera-Vives, Guillermo, Cádiz-Leyton, Martina, Moreno-Cartagena, Daniel
Foundational models have emerged as a powerful paradigm in deep learning field, leveraging their capacity to learn robust representations from large-scale datasets and effectively to diverse downstream applications such as classification. In this paper, we present Astromer 2 a foundational model specifically designed for extracting light curve embeddings. We introduce Astromer 2 as an enhanced iteration of our self-supervised model for light curve analysis. This paper highlights the advantages of its pre-trained embeddings, compares its performance with that of its predecessor, Astromer 1, and provides a detailed empirical analysis of its capabilities, offering deeper insights into the model's representations. Astromer 2 is pretrained on 1.5 million single-band light curves from the MACHO survey using a self-supervised learning task that predicts randomly masked observations within sequences. Fine-tuning on a smaller labeled dataset allows us to assess its performance in classification tasks. The quality of the embeddings is measured by the F1 score of an MLP classifier trained on Astromer-generated embeddings. Our results demonstrate that Astromer 2 significantly outperforms Astromer 1 across all evaluated scenarios, including limited datasets of 20, 100, and 500 samples per class. The use of weighted per-sample embeddings, which integrate intermediate representations from Astromer's attention blocks, is particularly impactful. Notably, Astromer 2 achieves a 15% improvement in F1 score on the ATLAS dataset compared to prior models, showcasing robust generalization to new datasets. This enhanced performance, especially with minimal labeled data, underscores the potential of Astromer 2 for more efficient and scalable light curve analysis.
Enhancing Predictive Maintenance in Mining Mobile Machinery through a TinyML-enabled Hierarchical Inference Network
de la Fuente, Raúl, Radrigan, Luciano, Morales, Anibal S
Mining machinery operating in variable environments faces high wear and unpredictable stress, challenging Predictive Maintenance (PdM). This paper introduces the Edge Sensor Network for Predictive Maintenance (ESN-PdM), a hierarchical inference framework across edge devices, gateways, and cloud services for real-time condition monitoring. The system dynamically adjusts inference locations--on-device, on-gateway, or on-cloud--based on trade-offs among accuracy, latency, and battery life, leveraging Tiny Machine Learning (TinyML) techniques for model optimization on resource-constrained devices. Performance evaluations showed that on-sensor and on-gateway inference modes achieved over 90\% classification accuracy, while cloud-based inference reached 99\%. On-sensor inference reduced power consumption by approximately 44\%, enabling up to 104 hours of operation. Latency was lowest for on-device inference (3.33 ms), increasing when offloading to the gateway (146.67 ms) or cloud (641.71 ms). The ESN-PdM framework provides a scalable, adaptive solution for reliable anomaly detection and PdM, crucial for maintaining machinery uptime in remote environments. By balancing accuracy, latency, and energy consumption, this approach advances PdM frameworks for industrial applications.
Analyzing Diversity in Healthcare LLM Research: A Scientometric Perspective
Restrepo, David, Wu, Chenwei, Vásquez-Venegas, Constanza, Matos, João, Gallifant, Jack, Filipe, Luis
The deployment of large language models (LLMs) in healthcare has demonstrated substantial potential for enhancing clinical decision-making, administrative efficiency, and patient outcomes. However, the underrepresentation of diverse groups in the development and application of these models can perpetuate biases, leading to inequitable healthcare delivery. This paper presents a comprehensive scientometric analysis of LLM research for healthcare, including data from January 1, 2021, to June 16, 2024. By analyzing metadata from PubMed and Dimensions, including author affiliations, countries, and funding sources, we assess the diversity of contributors to LLM research. Our findings highlight significant gender and geographic disparities, with a predominance of male authors and contributions primarily from high-income countries (HICs). We introduce a novel journal diversity index based on Gini impurity to measure the inclusiveness of scientific publications. Our results underscore the necessity for greater representation in order to ensure the equitable application of LLMs in healthcare. We propose actionable strategies to enhance diversity and inclusivity in artificial intelligence research, with the ultimate goal of fostering a more inclusive and equitable future in healthcare innovation.
An economically-consistent discrete choice model with flexible utility specification based on artificial neural networks
Hernandez, Jose Ignacio, Mouter, Niek, van Cranenburgh, Sander
Random utility maximisation (RUM) models are one of the cornerstones of discrete choice modelling. However, specifying the utility function of RUM models is not straightforward and has a considerable impact on the resulting interpretable outcomes and welfare measures. In this paper, we propose a new discrete choice model based on artificial neural networks (ANNs) named "Alternative-Specific and Shared weights Neural Network (ASS-NN)", which provides a further balance between flexible utility approximation from the data and consistency with two assumptions: RUM theory and fungibility of money (i.e., "one euro is one euro"). Therefore, the ASS-NN can derive economically-consistent outcomes, such as marginal utilities or willingness to pay, without explicitly specifying the utility functional form. Using a Monte Carlo experiment and empirical data from the Swissmetro dataset, we show that ASS-NN outperforms (in terms of goodness of fit) conventional multinomial logit (MNL) models under different utility specifications. Furthermore, we show how the ASS-NN is used to derive marginal utilities and willingness to pay measures.
Domain Adaptation via Minimax Entropy for Real/Bogus Classification of Astronomical Alerts
Cabrera-Vives, Guillermo, Bolivar, César, Förster, Francisco, Arancibia, Alejandra M. Muñoz, Pérez-Carrasco, Manuel, Reyes, Esteban
An important number of these alerts analysis of multiple massive datasets in real time, are bogus artifacts created by the image reduction pipelines, prompting the development of multi-stream machine hence, the importance of creating real/bogus classification learning models. In this work, we study algorithms which have proven to be extremely useful for Domain Adaptation (DA) for real/bogus classification detecting real astrophysical phenomena. During the last of astronomical alerts using four different decade, most of these algorithms have been based on Convolutional datasets: HiTS, DES, ATLAS, and ZTF. We Neural Networks (Cabrera-Vives et al., 2016; 2017; study the domain shift between these datasets, Reyes et al., 2018; Duev et al., 2019; Turpin et al., 2020; Yin and improve a naive deep learning classification et al., 2021; Rabeendran & Denneau, 2021) which need a model by using a fine tuning approach and significant amount of data to be trained.
Multi-Class Deep SVDD: Anomaly Detection Approach in Astronomy with Distinct Inlier Categories
Pérez-Carrasco, Manuel, Cabrera-Vives, Guillermo, Hernández-García, Lorena, Forster, Francisco, Sánchez-Sáez, Paula, Arancibia, Alejandra Muñoz, Astorga, Nicolás, Bauer, Franz, Bayo, Amelia, Cádiz-Leyton, Martina, Catelan, Marcio
With the increasing volume of astronomical data generated by modern survey telescopes, automated pipelines and machine learning techniques have become crucial for analyzing and extracting knowledge from these datasets. Anomaly detection, i.e. the task of identifying irregular or unexpected patterns in the data, is a complex challenge in astronomy. In this paper, we propose Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of the state-of-the-art anomaly detection algorithm One-Class Deep SVDD, specifically designed to handle different inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category. The distance of each sample from the centers of these hyperspheres determines the anomaly score. We evaluate the effectiveness of MCDSVDD by comparing its performance with several anomaly detection algorithms on a large dataset of astronomical light-curves obtained from the Zwicky Transient Facility. Our results demonstrate the efficacy of MCDSVDD in detecting anomalous sources while leveraging the presence of different inlier categories. The code and the data needed to reproduce our results are publicly available at https://github.com/mperezcarrasco/AnomalyALeRCE.
Multi-Agent Path Finding: A New Boolean Encoding
Asín Achá, Roberto (Universidad de Concepción) | López, Rodrigo (Universidad de Chile & Pontificia Universidad Católica de Chile) | Hagedorn, Sebastian (Pontificia Universidad Católica de Chile) | Baier, Jorge A. (Pontificia Universidad Católica de Chile)
Multi-agent pathfinding (MAPF) is an NP-hard problem. As such, dense maps may be very hard to solve optimally. In such scenarios, compilation-based approaches, via Boolean satisfiability (SAT) and answer set programming (ASP), have been shown to outperform heuristic-search-based approaches, such as conflict-based search (CBS). In this paper, we propose a new Boolean encoding for MAPF, and show how to implement it in ASP and MaxSAT. A feature that distinguishes our encoding from existing ones is that swap and follow conflicts are encoded using binary clauses, which can be exploited by current conflict-driven clause learning (CDCL) solvers. In addition, the number of clauses used to encode swap and follow conflicts do not depend on the number of agents, allowing us to scale better. For MaxSAT, we study different ways in which we may combine the MSU3 and LSU algorithms for maximum performance. In our experimental evaluation, we used square grids, ranging from 20 x 20 to 50 x 50 cells, and warehouse maps, with a varying number of agents and obstacles. We compared against representative solvers of the state-of-the-art, including the search-based algorithm CBS, the ASP-based solver ASP-MAPF, and the branch-and-cut-and-price hybrid solver, BCP. We observe that the ASP implementation of our encoding, ASP-MAPF2 outperforms other solvers in most of our experiments. The MaxSAT implementation of our encoding, MtMS shows best performance in relatively small warehouse maps when the number of agents is large, which are the instances with closer resemblance to hard puzzle-like problems.
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Pati, Sarthak, Baid, Ujjwal, Edwards, Brandon, Sheller, Micah, Wang, Shih-Han, Reina, G Anthony, Foley, Patrick, Gruzdev, Alexey, Karkada, Deepthi, Davatzikos, Christos, Sako, Chiharu, Ghodasara, Satyam, Bilello, Michel, Mohan, Suyash, Vollmuth, Philipp, Brugnara, Gianluca, Preetha, Chandrakanth J, Sahm, Felix, Maier-Hein, Klaus, Zenk, Maximilian, Bendszus, Martin, Wick, Wolfgang, Calabrese, Evan, Rudie, Jeffrey, Villanueva-Meyer, Javier, Cha, Soonmee, Ingalhalikar, Madhura, Jadhav, Manali, Pandey, Umang, Saini, Jitender, Garrett, John, Larson, Matthew, Jeraj, Robert, Currie, Stuart, Frood, Russell, Fatania, Kavi, Huang, Raymond Y, Chang, Ken, Balana, Carmen, Capellades, Jaume, Puig, Josep, Trenkler, Johannes, Pichler, Josef, Necker, Georg, Haunschmidt, Andreas, Meckel, Stephan, Shukla, Gaurav, Liem, Spencer, Alexander, Gregory S, Lombardo, Joseph, Palmer, Joshua D, Flanders, Adam E, Dicker, Adam P, Sair, Haris I, Jones, Craig K, Venkataraman, Archana, Jiang, Meirui, So, Tiffany Y, Chen, Cheng, Heng, Pheng Ann, Dou, Qi, Kozubek, Michal, Lux, Filip, Michálek, Jan, Matula, Petr, Keřkovský, Miloš, Kopřivová, Tereza, Dostál, Marek, Vybíhal, Václav, Vogelbaum, Michael A, Mitchell, J Ross, Farinhas, Joaquim, Maldjian, Joseph A, Yogananda, Chandan Ganesh Bangalore, Pinho, Marco C, Reddy, Divya, Holcomb, James, Wagner, Benjamin C, Ellingson, Benjamin M, Cloughesy, Timothy F, Raymond, Catalina, Oughourlian, Talia, Hagiwara, Akifumi, Wang, Chencai, To, Minh-Son, Bhardwaj, Sargam, Chong, Chee, Agzarian, Marc, Falcão, Alexandre Xavier, Martins, Samuel B, Teixeira, Bernardo C A, Sprenger, Flávia, Menotti, David, Lucio, Diego R, LaMontagne, Pamela, Marcus, Daniel, Wiestler, Benedikt, Kofler, Florian, Ezhov, Ivan, Metz, Marie, Jain, Rajan, Lee, Matthew, Lui, Yvonne W, McKinley, Richard, Slotboom, Johannes, Radojewski, Piotr, Meier, Raphael, Wiest, Roland, Murcia, Derrick, Fu, Eric, Haas, Rourke, Thompson, John, Ormond, David Ryan, Badve, Chaitra, Sloan, Andrew E, Vadmal, Vachan, Waite, Kristin, Colen, Rivka R, Pei, Linmin, Ak, Murat, Srinivasan, Ashok, Bapuraj, J Rajiv, Rao, Arvind, Wang, Nicholas, Yoshiaki, Ota, Moritani, Toshio, Turk, Sevcan, Lee, Joonsang, Prabhudesai, Snehal, Morón, Fanny, Mandel, Jacob, Kamnitsas, Konstantinos, Glocker, Ben, Dixon, Luke V M, Williams, Matthew, Zampakis, Peter, Panagiotopoulos, Vasileios, Tsiganos, Panagiotis, Alexiou, Sotiris, Haliassos, Ilias, Zacharaki, Evangelia I, Moustakas, Konstantinos, Kalogeropoulou, Christina, Kardamakis, Dimitrios M, Choi, Yoon Seong, Lee, Seung-Koo, Chang, Jong Hee, Ahn, Sung Soo, Luo, Bing, Poisson, Laila, Wen, Ning, Tiwari, Pallavi, Verma, Ruchika, Bareja, Rohan, Yadav, Ipsa, Chen, Jonathan, Kumar, Neeraj, Smits, Marion, van der Voort, Sebastian R, Alafandi, Ahmed, Incekara, Fatih, Wijnenga, Maarten MJ, Kapsas, Georgios, Gahrmann, Renske, Schouten, Joost W, Dubbink, Hendrikus J, Vincent, Arnaud JPE, Bent, Martin J van den, French, Pim J, Klein, Stefan, Yuan, Yading, Sharma, Sonam, Tseng, Tzu-Chi, Adabi, Saba, Niclou, Simone P, Keunen, Olivier, Hau, Ann-Christin, Vallières, Martin, Fortin, David, Lepage, Martin, Landman, Bennett, Ramadass, Karthik, Xu, Kaiwen, Chotai, Silky, Chambless, Lola B, Mistry, Akshitkumar, Thompson, Reid C, Gusev, Yuriy, Bhuvaneshwar, Krithika, Sayah, Anousheh, Bencheqroun, Camelia, Belouali, Anas, Madhavan, Subha, Booth, Thomas C, Chelliah, Alysha, Modat, Marc, Shuaib, Haris, Dragos, Carmen, Abayazeed, Aly, Kolodziej, Kenneth, Hill, Michael, Abbassy, Ahmed, Gamal, Shady, Mekhaimar, Mahmoud, Qayati, Mohamed, Reyes, Mauricio, Park, Ji Eun, Yun, Jihye, Kim, Ho Sung, Mahajan, Abhishek, Muzi, Mark, Benson, Sean, Beets-Tan, Regina G H, Teuwen, Jonas, Herrera-Trujillo, Alejandro, Trujillo, Maria, Escobar, William, Abello, Ana, Bernal, Jose, Gómez, Jhon, Choi, Joseph, Baek, Stephen, Kim, Yusung, Ismael, Heba, Allen, Bryan, Buatti, John M, Kotrotsou, Aikaterini, Li, Hongwei, Weiss, Tobias, Weller, Michael, Bink, Andrea, Pouymayou, Bertrand, Shaykh, Hassan F, Saltz, Joel, Prasanna, Prateek, Shrestha, Sampurna, Mani, Kartik M, Payne, David, Kurc, Tahsin, Pelaez, Enrique, Franco-Maldonado, Heydy, Loayza, Francis, Quevedo, Sebastian, Guevara, Pamela, Torche, Esteban, Mendoza, Cristobal, Vera, Franco, Ríos, Elvis, López, Eduardo, Velastin, Sergio A, Ogbole, Godwin, Oyekunle, Dotun, Odafe-Oyibotha, Olubunmi, Osobu, Babatunde, Shu'aibu, Mustapha, Dorcas, Adeleye, Soneye, Mayowa, Dako, Farouk, Simpson, Amber L, Hamghalam, Mohammad, Peoples, Jacob J, Hu, Ricky, Tran, Anh, Cutler, Danielle, Moraes, Fabio Y, Boss, Michael A, Gimpel, James, Veettil, Deepak Kattil, Schmidt, Kendall, Bialecki, Brian, Marella, Sailaja, Price, Cynthia, Cimino, Lisa, Apgar, Charles, Shah, Prashant, Menze, Bjoern, Barnholtz-Sloan, Jill S, Martin, Jason, Bakas, Spyridon
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25, 256 MRI scans from 6, 314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Three Success Stories About Compact Data Structures
Technology evolution is no longer keeping pace with the growth of data. We are facing problems storing and processing the huge amounts of data produced every day. People rely on data-intensive applications and new paradigms (for example, edge computing) to try to keep computation closer to where data is produced and needed. Thus, the need to store and query data in devices where capacity is surpassed by data volume is routine today, ranging from astronomy data to be processed by supercomputers, to personal data to be processed by wearable sensors. The scale is different, yet the underlying problem is the same.