C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset
Rudra, Manjari, Magleby, Daniel, Sikdar, Sujoy
Questions in political interviews and hearings serve strategic purposes beyond information gathering, including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th--117th Congress. Our analysis reveals systematic differences in questioning strategies across parties: the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics, but also provide a general framework for analyzing question-answering across interview-like settings.
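As an illustration of what extracting question-answer pairs from an unstructured transcript can involve, here is a minimal sketch. It is not the authors' pipeline. It assumes a hypothetical transcript layout in which each turn starts with a title and surname followed by a period (for example "Mr. Adams."), assumes the set of committee members is known in advance, and treats a member turn containing a question mark, followed directly by a non-member turn, as one pair. The speaker names and the sample text are invented for the example.

```python
import re

# Assumed speaker-label format: "<Title> <Surname>. <speech>" (hypothetical).
SPEAKER = re.compile(r"^(Mr\.|Ms\.|Mrs\.|Dr\.|Senator|Chairman) ([A-Z][A-Za-z]+)\. (.*)$")


def extract_qa_pairs(transcript, members):
    """Split a transcript into turns, then pair member questions with witness replies."""
    turns = []
    for raw_line in transcript.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = SPEAKER.match(line)
        if match:
            turns.append((match.group(2), match.group(3)))
        elif turns:
            # A line without a speaker label continues the previous turn.
            name, text = turns[-1]
            turns[-1] = (name, text + " " + line)

    pairs = []
    for (q_name, q_text), (a_name, a_text) in zip(turns, turns[1:]):
        # Keep only member turns that contain a question and are answered by a non-member.
        if q_name in members and a_name not in members and "?" in q_text:
            pairs.append({
                "questioner": q_name,
                "question": q_text,
                "witness": a_name,
                "answer": a_text,
            })
    return pairs


# Invented sample text, for illustration only.
SAMPLE = """
Mr. Adams. Thank you. How many inspections were completed last year?
Dr. Baker. We completed roughly two hundred.
Ms. Clark. I appreciate your service.
Dr. Baker. Thank you.
Mr. Adams. Will the report be public,
and when will it be released?
Dr. Baker. Yes, by March.
"""

pairs = extract_qa_pairs(SAMPLE, members={"Adams", "Clark"})
for pair in pairs:
    print(pair["questioner"], "->", pair["witness"], ":", pair["question"])
```

Real transcripts would likely need more robust speaker detection and handling of interruptions, so the rules above should be read as placeholders.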
Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs
Sepulveda, Edison Jair Bejarano, Hector, Nicolai Potes, Montoya, Santiago Pineda, Rodriguez, Felipe Ivan, Orduy, Jaime Enrique, Cabezas, Alec Rosales, Navarrete, Danny Traslaviña, Farfan, Sergio Madrid
This paper explores the potential of large language models (LLMs) to make the Aeronautical Regulations of Colombia (RAC) more accessible. Given the complexity and extensive technicality of the RAC, this study introduces a novel approach to simplifying these regulations for broader understanding. By developing the first-ever RAC database, which contains 24,478 expertly labeled question-and-answer pairs, and fine-tuning LLMs specifically for RAC applications, the paper outlines the methodology for dataset assembly, expert-led annotation, and model training. Utilizing the Gemma1.1 2b model along with advanced techniques like Unsloth for efficient VRAM usage and flash attention mechanisms, the research aims to expedite training processes. This initiative establishes a foundation to enhance the comprehensibility and accessibility of RAC, potentially benefiting novices and reducing dependence on expert consultations for navigating the aviation industry's regulatory landscape. You can visit the dataset (https://huggingface.co/datasets/somosnlp/ColombiaRAC_FullyCurated) and the model (https://huggingface.co/somosnlp/gemma-1.1-2b-it_ColombiaRAC_FullyCurated_format_chatML_V1) here.
M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on PRogrammable Multi-Operand Multimode Interference
Gu, Jiaqi, Zhu, Hanqing, Feng, Chenghao, Jiang, Zixuan, Chen, Ray T., Pan, David Z.
Photonic computing shows promise for transformative advancements in machine learning (ML) acceleration, offering ultra-fast speed, massive parallelism, and high energy efficiency. However, current photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. To address this, we propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO. The programmable MOMMI leverages the intrinsic light propagation principle, providing a single-device programmable matrix unit beyond the conventional computing paradigm of one multiply-accumulate (MAC) operation per device. To overcome the optimization difficulty of customized devices that often requires time-consuming simulation, we apply ML for optics to predict the device behavior and enable a differentiable optimization flow. We thoroughly investigate the reconfigurability and matrix expressivity of our customized PTC, and introduce a novel block unfolding method to fully exploit the computing capabilities of a complex-valued PTC for near-universal real-valued linear transformations. Extensive evaluations demonstrate that M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness compared to state-of-the-art coherent PTC designs, while maintaining close-to-digital task accuracy across various ML benchmarks. Our code is open-sourced at https://github.com/JeremieMelo/M3ICRO-MOMMI.
Combining multi-spectral data with statistical and deep-learning models for improved exoplanet detection in direct imaging at high contrast
Flasseur, Olivier, Bodrito, Théo, Mairal, Julien, Ponce, Jean, Langlois, Maud, Lagrange, Anne-Marie
Exoplanet detection by direct imaging is a difficult task: the faint signals from the objects of interest are buried under a spatially structured nuisance component induced by the host star. The exoplanet signals can only be identified when combining several observations with dedicated detection algorithms. In contrast to most existing methods, we propose to learn a model of the spatial, temporal and spectral characteristics of the nuisance, directly from the observations. In a pre-processing step, a statistical model of their correlations is built locally, and the data are centered and whitened to improve both their stationarity and signal-to-noise ratio (SNR). A convolutional neural network (CNN) is then trained in a supervised fashion to detect the residual signature of synthetic sources in the pre-processed images. Our method leads to a better trade-off between precision and recall than standard approaches in the field. It also outperforms a state-of-the-art algorithm based solely on a statistical framework. Besides, the exploitation of the spectral diversity improves the performance compared to a similar model built solely from spatio-temporal data.
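The centering and whitening step can be illustrated with a generic sketch. This is not the authors' local statistical model: it assumes the samples are flattened patches stacked in a single array, estimates one empirical mean and covariance from them, and applies the inverse square root of that covariance. The function name, the regularization constant `eps`, and the synthetic data are assumptions made for the example.

```python
import numpy as np


def center_and_whiten(samples, eps=1e-8):
    """Center samples of shape (n_samples, n_features) and decorrelate their features."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # Empirical covariance, lightly regularized so the inverse square root stays stable.
    cov = centered.T @ centered / (samples.shape[0] - 1)
    cov += eps * np.eye(cov.shape[0])
    # Symmetric inverse square root of the covariance via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    whitener = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return centered @ whitener, mean, whitener


# Synthetic correlated data with a non-zero mean, for illustration only.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
samples = rng.normal(size=(500, 4)) @ mixing + 3.0

whitened, mean, whitener = center_and_whiten(samples)
print(np.round(np.cov(whitened, rowvar=False), 3))
```

After this transform the empirical covariance of the samples is close to the identity, so a faint signal is no longer dominated by strongly correlated background structure.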
Is it Safe to Drive? An Overview of Factors, Challenges, and Datasets for Driveability Assessment in Autonomous Driving
Guo, Junyao, Kurup, Unmesh, Shah, Mohak
With recent advances in learning algorithms and hardware development, autonomous cars have shown promise when operating in structured environments under good driving conditions. However, for complex, cluttered and unseen environments with high uncertainty, autonomous driving systems still frequently demonstrate erroneous or unexpected behaviors that could lead to catastrophic outcomes. Autonomous vehicles should ideally adapt to driving conditions; while this can be achieved through multiple routes, it would be beneficial as a first step to be able to characterize Driveability in some quantified form. To this end, this paper aims to create a framework for investigating different factors that can impact driveability. Also, one of the main mechanisms to adapt autonomous driving systems to any driving condition is to be able to learn and generalize from representative scenarios. The machine learning algorithms that currently do so learn predominantly in a supervised manner and consequently need sufficient data for robust and efficient learning. Specifically, we categorize the datasets according to use cases, and highlight the datasets that capture complicated and hazardous driving conditions which can be better used for training robust driving models. Furthermore, by discussions of what driving scenarios are not covered by existing public datasets and what driveability factors need more investigation and data acquisition, this paper aims to encourage both targeted dataset collection and the proposal of novel driveability metrics that enhance the robustness of autonomous cars in adverse environments.
I. INTRODUCTION
Despite testing autonomous cars in highly controlled settings, these cars still occasionally fail in making correct decisions, often with catastrophic results. According to the accident records, the failures are most likely to happen in complex or unseen driving environments.
The fact remains that while autonomous cars can operate well in controlled or structured environments such as highways, they are still far from reliable when operating in cluttered, unstructured or unseen environments [2]. These apply to autonomous vehicles in general. These two different application fields also suggest that driveability could be quantified in different forms, either as a single metric or a composition of metrics. For example, with ADAS and current Level 2 or 3 autonomy, a scene can be simply defined as driveable if the car can operate safely in autonomous mode. When a non-driveable scene is detected, the autonomous car can hand over control to the human driver in a timely manner [4].
AIChE Journal Highlight: Using Machine Learning for Catalyst Design
Machine learning is beginning to make a large impact in catalysis research, according to Bryan Goldsmith, Jacques Esterhuizen, and Jin-Xun Liu of the Univ. of Michigan, Christopher Bartel of the Univ. of Colorado Boulder, and Christopher Sutton of the Fritz Haber Institute of the Max Planck Society in their July AIChE Journal Perspective article, "Machine Learning for Heterogeneous Catalyst Design and Discovery." Novel catalysts are crucial for several applications, such as energy generation and storage, sustainable chemical production, and pollution mitigation. The current trial-and-error approaches to new catalyst discovery and synthesis are expensive and time-consuming. As an alternative, machine learning can be used to identify the top catalyst candidates before experimental testing, thereby accelerating catalyst discovery and design. Goldsmith and colleagues highlight several examples where machine learning is making an impact on heterogeneous catalysis research, such as: accelerating the determination of catalyst active sites and catalyst screening; finding descriptors and patterns in catalysis data; determining interatomic potentials for catalyst simulation; and discovering and analyzing catalytic mechanisms.
Home - Flowers Laboratory
The Flowers project-team, at Inria and Ensta ParisTech, studies mechanisms that can allow robots and humans to acquire autonomously and cumulatively repertoires of novel skills over extended periods of time. This includes mechanisms for learning by self-exploration, as well as learning through interaction with peers, for the acquisition of both sensorimotor and social skills. Sensorimotor skills include locomotion, affordance learning, active manipulation. Interactive skills include grounded language use and understanding, adaptive interaction protocols, and human-robot collaboration. Our project-team, headed by Pierre-Yves Oudeyer (Inria) and co-started with David Filliat (Ensta ParisTech Cognitive Robotics Group), focuses in particular on the study of developmental mechanisms that guide efficient open-ended learning of novel skills in large real world environments.
Mars Target Encyclopedia: Rock and Soil Composition Extracted From the Literature
Wagstaff, Kiri L. (California Institute of Technology) | Francis, Raymond (California Institute of Technology) | Gowda, Thamme (California Institute of Technology) | Lu, You (Information Sciences Institute, University of Southern California) | Riloff, Ellen (California Institute of Technology) | Singh, Karanjeet (University of Utah) | Lanza, Nina L. (California Institute of Technology)
We have constructed an information extraction system called the Mars Target Encyclopedia that takes in planetary science publications and extracts scientific knowledge about target compositions. The extracted knowledge is stored in a searchable database that can greatly accelerate the ability of scientists to compare new discoveries with what is already known. To date, we have applied this system to ~6000 documents and achieved 41-56% precision in the extracted information.
Has AI Gone Too Far? - Automated Inference of Criminality Using Face Images
Has AI gone too far? This might seem like a nonsensical question to data scientists who strive every day to expand the capabilities of AI, until you read the headlines created by this just-released, peer-reviewed scientific paper: Automated Inference on Criminality Using Face Images (Xiaolin Wu, McMaster Univ. That's right, shades of The Minority Report (the movie in which criminals are arrested before the crime occurs) and the 19th-century studies of phrenology. These researchers claim 89.51% accuracy in making this classification on several sets of unlabeled validation images, each of about 1,500 facial images. I hope this has really taken your breath away.
Introduction to the 28th International Conference on Logic Programming Special Issue
Dovier, Agostino, Costa, Vítor Santos
We are proud to introduce this special issue of the Journal of Theory and Practice of Logic Programming (TPLP), dedicated to the full papers accepted for the 28th International Conference on Logic Programming (ICLP). The ICLP meetings started in Marseille in 1982 and since then constitute the main venue for presenting and discussing work in the area of logic programming. We contributed to ICLP for the first time in 1991. The first guest-editor had a paper on logic programming with sets, and the second had two papers on the parallel implementation of the Andorra model. Since then, we continued pursuing research in this exciting area and ICLP has always been the major venue for our work.