AITopics | Rodriguez, Pedro

Collaborating Authors

Rodriguez, Pedro

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SafePowerGraph-HIL: Real-Time HIL Validation of Heterogeneous GNNs for Bridging Sim-to-Real Gap in Power Grids

Ma, Aoxiang, Ghamizi, Salah, Cao, Jun, Rodriguez, Pedro

arXiv.org Artificial IntelligenceJan-21-2025

As machine learning (ML) techniques gain prominence in power system research, validating these methods' effectiveness under real-world conditions requires real-time hardware-in-the-loop (HIL) simulations. HIL simulation platforms enable the integration of computational models with physical devices, allowing rigorous testing across diverse scenarios critical to system resilience and reliability. In this study, we develop a SafePowerGraph-HIL framework that utilizes HIL simulations on the IEEE 9-bus system, modeled in Hypersim, to generate high-fidelity data, which is then transmitted in real-time via SCADA to an AWS cloud database before being input into a Heterogeneous Graph Neural Network (HGNN) model designed for power system state estimation and dynamic analysis. By leveraging Hypersim's capabilities, we simulate complex grid interactions, providing a robust dataset that captures critical parameters for HGNN training. The trained HGNN is subsequently validated using newly generated data under varied system conditions, demonstrating accuracy and robustness in predicting power system states. The results underscore the potential of integrating HIL with advanced neural network architectures to enhance the real-time operational capabilities of power systems. This approach represents a significant advancement toward the development of intelligent, adaptive control strategies that support the robustness and resilience of evolving power grids.

machine learning, real time system, simulation, (16 more...)

arXiv.org Artificial Intelligence

2501.12427

Country: Europe (0.15)

Genre: Research Report > New Finding (0.88)

Industry:

Energy > Power Industry (1.00)
Transportation > Ground > Road (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Add feedback

Byte Latent Transformer: Patches Scale Better Than Tokens

Pagnoni, Artidoro, Pasunuru, Ram, Rodriguez, Pedro, Nguyen, John, Muller, Benjamin, Li, Margaret, Zhou, Chunting, Yu, Lili, Weston, Jason, Zettlemoyer, Luke, Ghosh, Gargi, Lewis, Mike, Holtzman, Ari, Iyer, Srinivasan

arXiv.org Artificial IntelligenceDec-13-2024

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented based on the entropy of the next byte, allocating more compute and model capacity where increased data complexity demands it. We present the first FLOP controlled scaling study of byte-level models up to 8B parameters and 4T training bytes. Our results demonstrate the feasibility of scaling models trained on raw bytes without a fixed vocabulary. Both training and inference efficiency improve due to dynamically selecting long patches when data is predictable, along with qualitative improvements on reasoning and long tail generalization. Overall, for fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.

byte, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.09871

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Characterizing and Efficiently Accelerating Multimodal Generation Model Inference

Lee, Yejin, Sun, Anna, Hosmer, Basil, Acun, Bilge, Balioglu, Can, Wang, Changhan, Hernandez, Charles David, Puhrsch, Christian, Haziza, Daniel, Guessous, Driss, Massa, Francisco, Kahn, Jacob, Wan, Jeffrey, Reizenstein, Jeremy, Zhai, Jiaqi, Isaacson, Joe, Schlosser, Joel, Pino, Juan, Sadagopan, Kaushik Ram, Shamis, Leonid, Ma, Linjian, Hwang, Min-Jae, Chen, Mingda, Elhoushi, Mostafa, Rodriguez, Pedro, Pasunuru, Ram, Yih, Scott, Popuri, Sravya, Liu, Xing, Wu, Carole-Jean

arXiv.org Artificial IntelligenceSep-30-2024

Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only its applications have broadened to various sectors but also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.00215

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Instruction-tuned Language Models are Better Knowledge Learners

Jiang, Zhengbao, Sun, Zhiqing, Shi, Weijia, Rodriguez, Pedro, Zhou, Chunting, Neubig, Graham, Lin, Xi Victoria, Yih, Wen-tau, Iyer, Srinivasan

arXiv.org Artificial IntelligenceMay-25-2024

In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that pre-instruction-tuning significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.12847

Country:

North America > United States (0.68)
Asia > Middle East > UAE (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PowerFlowMultiNet: Multigraph Neural Networks for Unbalanced Three-Phase Distribution Systems

Ghamizi, Salah, Cao, Jun, Ma, Aoxiang, Rodriguez, Pedro

arXiv.org Artificial IntelligenceMar-12-2024

Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature primarily focuses on balanced networks, leaving a critical gap in supporting unbalanced three-phase power grids. This letter introduces PowerFlowMultiNet, a novel multigraph GNN framework explicitly designed for unbalanced three-phase power grids. The proposed approach models each phase separately in a multigraph representation, effectively capturing the inherent asymmetry in unbalanced grids. A graph embedding mechanism utilizing message passing is introduced to capture spatial dependencies within the power system network. PowerFlowMultiNet outperforms traditional methods and other deep learning approaches in terms of accuracy and computational speed. Rigorous testing reveals significantly lower error rates and a notable hundredfold increase in computational speed for large power networks compared to model-based methods.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2403.00892

Genre: Research Report (0.40)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MultiContrievers: Analysis of Dense Retrieval Representations

Goldfarb-Tarrant, Seraphina, Rodriguez, Pedro, Dwivedi-Yu, Jane, Lewis, Patrick

arXiv.org Artificial IntelligenceFeb-24-2024

Dense retrievers compress source documents into (possibly lossy) vector representations, yet there is little analysis of what information is lost versus preserved, and how it affects downstream tasks. We conduct the first analysis of the information captured by dense retrievers compared to the language models they are based on (e.g., BERT versus Contriever). We use 25 MultiBert checkpoints as randomized initialisations to train MultiContrievers, a set of 25 contriever models. We test whether specific pieces of information -- such as gender and occupation -- can be extracted from contriever vectors of wikipedia-like documents. We measure this extractability via information theoretic probing. We then examine the relationship of extractability to performance and gender bias, as well as the sensitivity of these results to many random initialisations and data shuffles. We find that (1) contriever models have significantly increased extractability, but extractability usually correlates poorly with benchmark performance 2) gender bias is present, but is not caused by the contriever representations 3) there is high sensitivity to both random initialisation and to data shuffle, suggesting that future retrieval research should test across a wider spread of both.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.15925

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

Lin, Xi Victoria, Chen, Xilun, Chen, Mingda, Shi, Weijia, Lomeli, Maria, James, Rich, Rodriguez, Pedro, Kahn, Jacob, Szilvasy, Gergely, Lewis, Mike, Zettlemoyer, Luke, Yih, Scott

arXiv.org Artificial IntelligenceNov-5-2023

Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2310.01352

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Reimagining Retrieval Augmented Language Models for Answering Queries

Tan, Wang-Chiew, Li, Yuliang, Rodriguez, Pedro, James, Richard, Lin, Xi Victoria, Halevy, Alon, Yih, Scott

arXiv.org Artificial IntelligenceJun-1-2023

We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-parametric architectures can be enhanced with views, a query analyzer/planner, and provenance to make a significantly more powerful system for question answering in terms of accuracy and efficiency, and potentially for other NLP tasks

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2306.01061

Country:

Europe (1.00)
North America > United States > New York (0.14)
Africa > Middle East > Egypt (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.64)

Add feedback

Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks

Rodriguez, Pedro, Azab, Mahmoud, Silvert, Becka, Sanchez, Renato, Labson, Linzy, Shah, Hardik, Moon, Seungwhan

arXiv.org Artificial IntelligenceApr-18-2023

Searching troves of videos with textual descriptions is a core multimodal retrieval task. Owing to the lack of a purpose-built dataset for text-to-video retrieval, video captioning datasets have been re-purposed to evaluate models by (1) treating captions as positive matches to their respective videos and (2) assuming all other videos to be negatives. However, this methodology leads to a fundamental flaw during evaluation: since captions are marked as relevant only to their original video, many alternate videos also match the caption, which introduces false-negative caption-video pairs. We show that when these false negatives are corrected, a recent state-of-the-art model gains 25\% recall points -- a difference that threatens the validity of the benchmark itself. To diagnose and mitigate this issue, we annotate and release 683K additional caption-video pairs. Using these, we recompute effectiveness scores for three models on two standard benchmarks (MSR-VTT and MSVD). We find that (1) the recomputed metrics are up to 25\% recall points higher for the best models, (2) these benchmarks are nearing saturation for Recall@10, (3) caption length (generality) is related to the number of positives, and (4) annotation costs can be mitigated through sampling. We recommend retiring these benchmarks in their current form, and we make recommendations for future text-to-video retrieval benchmarks.

caption, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.05038

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.56)

Add feedback

py-irt: A Scalable Item Response Theory Library for Python

Lalor, John P., Rodriguez, Pedro

arXiv.org Artificial IntelligenceMar-13-2022

py-irt is a Python library for fitting Bayesian Item Response Theory (IRT) models. py-irt estimates latent traits of subjects and items, making it appropriate for use in IRT tasks as well as ideal-point models. py-irt is built on top of the Pyro and PyTorch frameworks and uses GPU-accelerated training to scale to large data sets. Code, documentation, and examples can be found at https://github.com/nd-ball/py-irt. py-irt can be installed from the GitHub page or the Python Package Index (PyPI).

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1287/ijoc.2022.1250

2203.01282

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
(2 more...)

Add feedback