MatExpert: Decomposing Materials Discovery by Mimicking Human Experts

arXiv.org Artificial Intelligence

Material discovery is a critical research area with profound implications for various industries. In this work, we introduce MatExpert, a novel framework that leverages Large Language Models (LLMs) and contrastive learning to accelerate the discovery and design of new solid-state materials. Inspired by the workflow of human materials design experts, our approach integrates three key stages: retrieval, transition, and generation. First, in the retrieval stage, MatExpert identifies an existing material that closely matches the desired criteria. Second, in the transition stage, MatExpert outlines the necessary modifications to transform this material formulation to meet specific requirements outlined by the initial user query. Third, in the generation stage, MatExpert performs detailed computations and structural generation to create new materials based on the provided information. Our experimental results demonstrate that MatExpert outperforms state-of-the-art methods in material generation tasks, achieving superior performance across various metrics including validity, distribution, and stability. As such, MatExpert represents a meaningful advancement in computational material discovery using language-based generative models.
The discovery and design of new materials are central challenges in modern materials science, driven by the need for materials with tailored properties for applications in energy, electronics, and catalysis. Traditional methods for material discovery, such as high-throughput experiments and density functional theory (DFT) simulations, are computationally expensive and often require significant domain expertise to achieve accurate predictions (Miret et al., 2024).
Recent advancements in artificial intelligence (AI), particularly large language models (LLMs), have opened new possibilities for automating and accelerating the materials design process (Miret & Krishnan, 2024; Jablonka et al., 2024; Song et al., 2023a;b; Zhang et al., 2024; Ramos et al., 2024). LLMs such as GPT-4 (OpenAI, 2023) have demonstrated remarkable success in natural language processing tasks and have shown potential for application in scientific problems beyond language, including chemistry and materials science (Flam-Shepherd & Aspuru-Guzik, 2023; Gruver et al., 2024; Schilling-Wilhelmi et al., 2024; Mirza et al., 2024; Delétang et al., 2023). For example, LLMs have been used to generate molecular structures (Gruver et al., 2024) and predict material properties from textual descriptions (Alampara et al., 2024).


Implementation of Navigation on a Mobile Robotic Platform Based on ROS and Gazebo (Implementación de Navegación en Plataforma Robótica Móvil Basada en ROS y Gazebo)

arXiv.org Artificial Intelligence

This research focused on utilizing ROS2 and Gazebo for simulating the TurtleBot3 robot, with the aim of exploring autonomous navigation capabilities. While the study did not achieve full autonomous navigation, it successfully established the connection between ROS2 and Gazebo and enabled manual simulation of the robot's movements. The primary objective was to understand how these tools can be integrated to support autonomous functions, providing valuable insights into the development process. The results of this work lay the groundwork for future research into autonomous robotics. The topic is particularly engaging for both teenagers and adults interested in discovering how robots function independently and the underlying technology involved. This research highlights the potential for further advancements in autonomous systems and serves as a stepping stone for more in-depth studies in the field.


Assessing the societal influence of academic research with ChatGPT: Impact case study evaluations

arXiv.org Artificial Intelligence

Academics and departments are sometimes judged by how their research has benefitted society. For example, the UK Research Excellence Framework (REF) assesses Impact Case Studies (ICS), which are five-page evidence-based claims of societal impacts. This study investigates whether ChatGPT can evaluate societal impact claims and therefore potentially support expert human assessors. For this, various parts of 6,220 public ICS from REF2021 were fed to ChatGPT 4o-mini along with the REF2021 evaluation guidelines, comparing the results with published departmental average ICS scores. The results suggest that the optimal strategy for high correlations with expert scores is to input the title and summary of an ICS but not the remaining text, and to modify the original REF guidelines to encourage a stricter evaluation. The scores generated by this approach correlated positively with departmental average scores in all 34 Units of Assessment (UoAs), with values between 0.18 (Economics and Econometrics) and 0.56 (Psychology, Psychiatry and Neuroscience). At the departmental level, the corresponding correlations were higher, reaching 0.71 for Sport and Exercise Sciences, Leisure and Tourism. Thus, ChatGPT-based ICS evaluations are simple and viable to support or cross-check expert judgments, although their value varies substantially between fields.
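The two levels of comparison mentioned in this abstract (individual case studies against departmental averages, and department-level averages against each other) can be illustrated with a small sketch. The abstract does not say which correlation coefficient was used, so Pearson correlation is an assumption here, and all scores and department labels below are made-up illustrative values, not data from the study.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (assumed coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def department_means(departments, scores):
    """Average the per-case scores within each department."""
    grouped = defaultdict(list)
    for dept, score in zip(departments, scores):
        grouped[dept].append(score)
    return {dept: sum(vals) / len(vals) for dept, vals in grouped.items()}


# Hypothetical data: one model score per case study, one expert average per department.
departments = ["A", "A", "B", "B", "C", "C"]
model_scores = [2.5, 3.0, 3.0, 3.5, 3.5, 4.0]
expert_dept_avg = {"A": 2.8, "B": 3.1, "C": 3.6}

# Case-study level: each model score is paired with its department's expert average.
case_level = pearson(model_scores, [expert_dept_avg[d] for d in departments])

# Department level: model scores are averaged per department before correlating.
model_means = department_means(departments, model_scores)
order = sorted(expert_dept_avg)
dept_level = pearson([model_means[d] for d in order],
                     [expert_dept_avg[d] for d in order])

print(f"case-level r = {case_level:.2f}, department-level r = {dept_level:.2f}")
```

Averaging within departments removes per-case noise in the model scores, which is one plausible reason department-level correlations can come out higher than case-level ones.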


The Representation of Meaningful Precision, and Accuracy

arXiv.org Artificial Intelligence

The concepts of precision and accuracy are domain- and problem-dependent. The simplified numeric hard and soft measures used in the fields of statistical learning, many types of machine learning, and binary or multiclass classification problems are known to be of limited use for understanding the meaningfulness of models or their relevance. Arguably, they are neither of patterns nor proofs. Further, there are no good measures or representations for analogous concepts in the cognition domain. In this research, the key issues are reflected upon, and a compositional knowledge representation approach in a minimalist general rough framework is proposed for the problem contexts. The latter is general enough to cover most application contexts, and may be applicable in the light of improved computational tools available.


The Sound of Silence in Social Networks

arXiv.org Artificial Intelligence

We generalize the classic multi-agent DeGroot model for opinion dynamics to incorporate the Spiral of Silence theory from political science. This theory states that individuals may withhold their opinions when they perceive them to be in the minority. As in the DeGroot model, a community of agents is represented as a weighted directed graph whose edges indicate how much agents influence one another. However, agents whose current opinions are in the minority become silent (i.e., they do not express their opinion). Two models for opinion update are then introduced. In the memoryless opinion model ($\mbox{SOM}^-$), agents update their opinion by taking the weighted average of their non-silent neighbors' opinions. In the memory-based opinion model ($\mbox{SOM}^+$), agents update their opinions by taking the weighted average of the opinions of all their neighbors, but for silent neighbors, their most recent opinion is considered. We show that for $\mbox{SOM}^-$ convergence to consensus is guaranteed for clique graphs but, unlike for the classic DeGroot model, not guaranteed for strongly-connected aperiodic graphs. In contrast, we show that for $\mbox{SOM}^+$ convergence to consensus is not guaranteed even for clique graphs. We showcase our models through simulations offering experimental insights that align with key aspects of the Spiral of Silence theory. These findings reveal the impact of silence dynamics on opinion formation and highlight the limitations of consensus in more nuanced social models.
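The two update rules described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's formal definition, and it rests on several assumptions the abstract does not state: opinions lie in [0, 1]; an agent counts as "in the minority" when fewer agents share its side of a 0.5 threshold (a hypothetical rule); an agent always uses its own current opinion; an agent with no audible neighbors keeps its opinion; and "most recent opinion" is read as the opinion a neighbor last expressed.

```python
def silent_agents(opinions, threshold=0.5):
    """Hypothetical minority rule: silent if fewer agents share your side of `threshold`."""
    above = [x >= threshold for x in opinions]
    n_above = sum(above)
    n_below = len(opinions) - n_above
    if n_above == n_below:
        return [False] * len(opinions)  # no minority, nobody is silenced
    minority_is_above = n_above < n_below
    return [a == minority_is_above for a in above]


def som_minus_step(opinions, weights):
    """Memoryless update: weighted average over non-silent neighbors (and oneself)."""
    silent = silent_agents(opinions)
    updated = []
    for i, row in enumerate(weights):
        num = den = 0.0
        for j, w in enumerate(row):
            if w > 0 and (j == i or not silent[j]):
                num += w * opinions[j]
                den += w
        updated.append(num / den if den > 0 else opinions[i])
    return updated


def som_plus_step(opinions, weights, last_expressed):
    """Memory-based update: silent neighbors contribute their last expressed opinion."""
    silent = silent_agents(opinions)
    heard = [last_expressed[j] if silent[j] else opinions[j]
             for j in range(len(opinions))]
    updated = []
    for i, row in enumerate(weights):
        num = den = 0.0
        for j, w in enumerate(row):
            if w > 0:
                num += w * (opinions[j] if j == i else heard[j])
                den += w
        updated.append(num / den if den > 0 else opinions[i])
    return updated, heard  # `heard` becomes the memory for the next step


# Three-agent clique with uniform influence; weights[i][j] is j's influence on i.
weights = [[1 / 3] * 3 for _ in range(3)]
opinions = [0.9, 0.8, 0.1]
print(som_minus_step(opinions, weights))
print(som_plus_step(opinions, weights, last_expressed=[0.9, 0.8, 0.4])[0])
```

In this example the third agent is silent under the assumed rule, so the other two ignore it in the memoryless step but still hear its remembered opinion in the memory-based step.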


LoLCATs: On Low-Rank Linearizing of Large Language Models

arXiv.org Machine Learning

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitude less memory and compute. We base these steps on two findings. First, we can replace an LLM's softmax attentions with closely-approximating linear attentions, simply by training the linear attentions to match their softmax counterparts with an output MSE loss ("attention transfer"). Then, this enables adjusting for approximation errors and recovering LLM quality simply with low-rank adaptation (LoRA). LoLCATs significantly improves linearizing quality, training efficiency, and scalability. We significantly reduce the linearizing quality gap and produce state-of-the-art subquadratic LLMs from Llama 3 8B and Mistral 7B v0.1, leading to 20+ points of improvement on 5-shot MMLU. Furthermore, LoLCATs does so with only 0.2% of past methods' model parameters and 0.4% of their training tokens. Finally, we apply LoLCATs to create the first linearized 70B and 405B LLMs (50x larger than prior work). When compared with prior approaches under the same compute budgets, LoLCATs significantly improves linearizing quality, closing the gap between linearized and original Llama 3.1 70B and 405B LLMs by 77.8% and 78.1% on 5-shot MMLU.
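The "attention transfer" objective described in this abstract compares the output of a linear attention layer with the output of the softmax attention it replaces. The sketch below computes that output MSE loss for a single head in plain Python. It is an illustration under assumptions, not the LoLCATs implementation: the feature map `feature_map` (an elementwise exponential with one scalar parameter `scale`) is hypothetical, causal masking is assumed, and no training loop is shown.

```python
import math


def softmax_attention(Q, K, V):
    """Causal softmax attention; cost grows quadratically with sequence length."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[j] * V[j][b] for j in range(i + 1)) / total
                    for b in range(len(V[0]))])
    return out


def feature_map(x, scale):
    """Hypothetical positive feature map with one trainable scalar."""
    return [math.exp(scale * xi) for xi in x]


def linear_attention(Q, K, V, scale):
    """Causal linear attention via running sums; cost grows linearly with length."""
    d_f, d_v = len(Q[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_f)]  # running sum of phi(k) v^T
    z = [0.0] * d_f                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = feature_map(k, scale)
        for a in range(d_f):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        fq = feature_map(q, scale)
        norm = sum(fq[a] * z[a] for a in range(d_f))
        out.append([sum(fq[a] * S[a][b] for a in range(d_f)) / norm
                    for b in range(d_v)])
    return out


def attention_transfer_loss(Q, K, V, scale):
    """Mean squared error between linear and softmax attention outputs."""
    target = softmax_attention(Q, K, V)
    pred = linear_attention(Q, K, V, scale)
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(sq) / len(sq)


Q = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4]]
K = [[0.2, 0.1], [-0.3, 0.6], [0.4, 0.4]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
for scale in (0.25, 0.5, 1.0):
    print(scale, attention_transfer_loss(Q, K, V, scale))
```

In a real setting the feature map would have many parameters and be fitted by gradient descent on this loss, with the original model's weights held fixed; the loop over `scale` only shows that the loss depends on the feature-map parameter.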


Unified Causality Analysis Based on the Degrees of Freedom

arXiv.org Artificial Intelligence

Temporally evolving systems are typically modeled by dynamic equations. A key challenge in accurate modeling is understanding the causal relationships between subsystems, as well as identifying the presence and influence of unobserved hidden drivers on the observed dynamics. This paper presents a unified method capable of identifying fundamental causal relationships between pairs of systems, whether deterministic or stochastic. Notably, the method also uncovers hidden common causes beyond the observed variables. By analyzing the degrees of freedom in the system, our approach provides a more comprehensive understanding of both causal influence and hidden confounders. This unified framework is validated through theoretical models and simulations, demonstrating its robustness and potential for broader application.


An Auditing Test To Detect Behavioral Shift in Language Models

arXiv.org Artificial Intelligence

As language models (LMs) approach human-level performance, a comprehensive understanding of their behavior becomes crucial. This includes evaluating capabilities, biases, task performance, and alignment with societal values. Extensive initial evaluations, including red teaming and diverse benchmarking, can establish a model's behavioral profile. However, subsequent fine-tuning or deployment modifications may alter these behaviors in unintended ways. We present a method for continual Behavioral Shift Auditing (BSA) in LMs. Our test compares model generations from a baseline model to those of the model under scrutiny and provides theoretical guarantees for change detection while controlling false positives. The test features a configurable tolerance parameter that adjusts sensitivity to behavioral changes for different use cases. We evaluate our approach using two case studies: monitoring changes in (a) toxicity and (b) translation performance. We find that the test is able to detect meaningful changes in behavior distributions using just hundreds of examples.
Language models (LMs) can now achieve human-level performance in a wide range of tasks, including text summarization, machine translation, coding and even acting as AI scientists: generating hypotheses and designing experiments (Achiam et al., 2023; Katz et al., 2024; Lu et al., 2024; Zhang et al., 2024). Because of this, many sectors are looking for ways to use them to improve existing systems (Kasneci et al., 2023; Felten et al., 2023).
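To make the idea of a tolerance parameter with false-positive control concrete, here is a deliberately simple stand-in, not the test proposed in the paper. It is a fixed-sample check based on Hoeffding's inequality: it assumes each generation is scored in [0, 1] (for example by a toxicity classifier), and it flags a shift only when the gap in mean scores exceeds the tolerance by more than the statistical margin. The function names and the example scores are hypothetical.

```python
import math


def hoeffding_margin(n, alpha):
    """Deviation t with P(|sample mean - true mean| > t) <= alpha / 2 for scores in [0, 1]."""
    return math.sqrt(math.log(4.0 / alpha) / (2.0 * n))


def shift_detected(baseline_scores, candidate_scores, tolerance, alpha=0.05):
    """Flag a shift if the mean gap exceeds `tolerance` beyond the Hoeffding margins.

    If the true means differ by at most `tolerance`, a union bound over the two
    samples keeps the probability of a false alarm at or below `alpha`.
    """
    mean_base = sum(baseline_scores) / len(baseline_scores)
    mean_cand = sum(candidate_scores) / len(candidate_scores)
    margin = (hoeffding_margin(len(baseline_scores), alpha)
              + hoeffding_margin(len(candidate_scores), alpha))
    return abs(mean_cand - mean_base) > tolerance + margin


# Made-up behavior scores for 500 generations per model.
baseline = [0.1] * 500
unchanged = [0.1] * 500
shifted = [0.5] * 500

print(shift_detected(baseline, unchanged, tolerance=0.05))
print(shift_detected(baseline, shifted, tolerance=0.05))
```

A larger `tolerance` makes the check ignore small behavioral differences, while a smaller `alpha` widens the margin and therefore needs more examples to flag the same gap.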


OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery

arXiv.org Artificial Intelligence

While the pretraining of Foundation Models (FMs) for remote sensing (RS) imagery is on the rise, models remain restricted to a few hundred million parameters. Scaling models to billions of parameters has been shown to yield unprecedented benefits including emergent abilities, but requires data scaling and computing resources typically not available outside industry R&D labs. In this work, we pair high-performance computing resources, including the Frontier supercomputer, America's first exascale system, with high-resolution optical RS data to pretrain billion-scale FMs. Our study assesses the performance of different pretrained variants of vision Transformers across image classification, semantic segmentation and object detection benchmarks, which highlight the importance of data scaling for effective model scaling. Moreover, we discuss the construction of a novel TIU pretraining dataset and model initialization, with data and pretrained models intended for public release. By discussing technical challenges and details often lacking in the related literature, this work is intended to offer best practices to the geospatial community toward efficient training and benchmarking of larger FMs.


MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding

arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) have achieved remarkable performance in many vision-language tasks, yet their capabilities in fine-grained visual understanding remain insufficiently evaluated. Existing benchmarks either contain limited fine-grained evaluation samples that are mixed with other data, or are confined to object-level assessments in natural images. To holistically assess LVLMs' fine-grained visual understanding capabilities, we propose using document images with multi-granularity and multi-modal information to supplement natural images. In this light, we construct MMDocBench, a benchmark with various OCR-free document understanding tasks for the evaluation of fine-grained visual perception and reasoning abilities. MMDocBench defines 15 main tasks with 4,338 QA pairs and 11,353 supporting regions, covering various document images such as research papers, receipts, financial reports, Wikipedia tables, charts, and infographics. Based on MMDocBench, we conduct extensive experiments using 13 open-source and 3 proprietary advanced LVLMs, assessing their strengths and weaknesses across different tasks and document image types. The benchmark, task instructions, and evaluation code will be made publicly available.