imperfection
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for sequential decision-making tasks. But, their effectiveness is hampered when faced with limited expert data. To tackle this challenge, a novel framework called (offline) IL with supplementary data has been proposed, which enhances learning by incorporating an additional yet imperfect dataset obtained inexpensively from sub-optimal policies. Nonetheless, learning becomes challenging due to the potential inclusion of out-of-expert-distribution samples. In this work, we propose a mathematical formalization of this framework, uncovering its limitations.
Leveraging Imperfection with MEDLEY A Multi-Model Approach Harnessing Bias in Medical AI
Abtahi, Farhad, Astaraki, Mehdi, Seoane, Fernando
Bias in medical artificial intelligence is conventionally viewed as a defect requiring elimination. However, human reasoning inherently incorporates biases shaped by education, culture, and experience, suggesting their presence may be inevitable and potent ially valuable. We propose MEDLEY (Medical Ensemble Diagnostic system with Leveraged diversit Y), a conceptual framework that orchestrates multiple AI models while preserving their diverse outputs rather than collapsing them into a consensus. Unlike traditional approaches that suppress disagreement, MEDLEY documents model - specific biases as potential strengths and treats hallucinations as provisional hypotheses for clinician verification. A proof - of - concept demonstrator was developed using over 30 large language models, creating a minimum viable product that preserved both consensus and minority views in synthetic cases, making diagnostic uncertainty and latent biases transparent for clinical oversight. While not yet a validated clini cal tool, the demonstration illustrates how structured diversity can enhance medical reasoning under clinician supervision. By reframing AI imperfection as a resource, MEDLEY offers a paradigm shift that opens new regulatory, ethical, and innovation pathwa ys for developing trustworthy medical AI systems.
- North America > United States (0.46)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Asia > China (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
Noise-Aware Generative Microscopic Traffic Simulation
Jayawardana, Vindula, Tang, Catherine, Ji, Junyi, Philion, Jonah, Peng, Xue Bin, Wu, Cathy
Accurately modeling individual vehicle behavior in microscopic traffic simulation remains a key challenge in intelligent transportation systems, as it requires vehicles to realistically generate and respond to complex traffic phenomena such as phantom traffic jams. While traditional human driver simulation models offer computational tractability, they do so by abstracting away the very complexity that defines human driving. On the other hand, recent advances in infrastructure-mounted camera-based roadway sensing have enabled the extraction of vehicle trajectory data, presenting an opportunity to shift toward generative, agent-based models. Yet, a major bottleneck remains: most existing datasets are either overly sanitized or lack standardization, failing to reflect the noisy, imperfect nature of real-world sensing. Unlike data from vehicle-mounted sensors-which can mitigate sensing artifacts like occlusion through overlapping fields of view and sensor fusion-infrastructure-based sensors surface a messier, more practical view of challenges that traffic engineers encounter. To this end, we present the I-24 MOTION Scenario Dataset (I24-MSD)-a standardized, curated dataset designed to preserve a realistic level of sensor imperfection, embracing these errors as part of the learning problem rather than an obstacle to overcome purely from preprocessing. Drawing from noise-aware learning strategies in computer vision, we further adapt existing generative models in the autonomous driving community for I24-MSD with noise-aware loss functions. Our results show that such models not only outperform traditional baselines in realism but also benefit from explicitly engaging with, rather than suppressing, data imperfection. We view I24-MSD as a stepping stone toward a new generation of microscopic traffic simulation that embraces the real-world challenges and is better aligned with practical needs.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > Tennessee > Davidson County > Nashville (0.14)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Asia > Japan (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
PuckTrick: A Library for Making Synthetic Data More Realistic
Agostini, Alessandra, Maurino, Andrea, Spahiu, Blerina
The increasing reliance on machine learning (ML) models for decision-making requires high-quality training data. However, access to real-world datasets is often restricted due to privacy concerns, proprietary restrictions, and incomplete data availability. As a result, synthetic data generation (SDG) has emerged as a viable alternative, enabling the creation of artificial datasets that preserve the statistical properties of real data while ensuring privacy compliance. Despite its advantages, synthetic data is often overly clean and lacks real-world imperfections, such as missing values, noise, outliers, and misclassified labels, which can significantly impact model generalization and robustness. To address this limitation, we introduce Pucktrick, a Python library designed to systematically contaminate synthetic datasets by introducing controlled errors. The library supports multiple error types, including missing data, noisy values, outliers, label misclassification, duplication, and class imbalance, offering a structured approach to evaluating ML model resilience under real-world data imperfections. Pucktrick provides two contamination modes: one for injecting errors into clean datasets and another for further corrupting already contaminated datasets. Through extensive experiments on real-world financial datasets, we evaluate the impact of systematic data contamination on model performance. Our findings demonstrate that ML models trained on contaminated synthetic data outperform those trained on purely synthetic, error-free data, particularly for tree-based and linear models such as SVMs and Extra Trees.
- Banking & Finance > Trading (1.00)
- Information Technology > Security & Privacy (0.66)
Optimal Modified Feedback Strategies in LQ Games under Control Imperfections
Rabbani, Mahdis, Mojahed, Navid, Nazari, Shima
Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work explores the impact of such implementation imperfections on game trajectories and players' costs within the context of a two-player linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player affect the state and cost function of the other player. To address these deviations, we propose an adjusted control policy that not only mitigates adverse effects optimally but can also exploit the deviations to enhance performance. Rigorous mathematical analysis and proofs are presented, demonstrating through a representative example that the proposed policy modification achieves up to $61\%$ improvement compared to the unadjusted feedback policy and up to $0.59\%$ compared to the feedback Nash strategy.
- North America > United States > California > Yolo County > Davis (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
How I Fell Back in Love with iPhone Photography
There's a Japanese word, komorebi, that describes beams of light and dappled shadows that result when the sun shines through trees. When I take my dog on walks around my leafy neighborhood in Washington, D.C., komorebi is what most often catches my eye, especially in this autumnal moment when dense, green summer foliage is starting to thin and turn golden. As the sun sets and the shadows grow long on the edge of a precipitous valley near my apartment, the foliage creates fluttering patterns of warm and cool colors. I try to photograph these apparitions with my iPhone camera, but I'm always disappointed in the results: the device's automated image processing treats contrast as a problem to be solved, aggressively darkening the highlights and lightening up the shadows to achieve a bland flatness. Little of the lambent atmosphere I see in real life survives in the image.
- North America > United States > District of Columbia > Washington (0.25)
- North America > United States > California > San Francisco County > San Francisco (0.05)
Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for sequential decision-making tasks. But, their effectiveness is hampered when faced with limited expert data. To tackle this challenge, a novel framework called (offline) IL with supplementary data has been proposed, which enhances learning by incorporating an additional yet imperfect dataset obtained inexpensively from sub-optimal policies. Nonetheless, learning becomes challenging due to the potential inclusion of out-of-expert-distribution samples. In this work, we propose a mathematical formalization of this framework, uncovering its limitations.
Subspace Representation Learning for Sparse Linear Arrays to Localize More Sources than Sensors: A Deep Learning Methodology
Chen, Kuan-Lin, Rao, Bhaskar D.
Localizing more sources than sensors with a sparse linear array (SLA) has long relied on minimizing a distance between two covariance matrices and recent algorithms often utilize semidefinite programming (SDP). Although deep neural network (DNN)-based methods offer new alternatives, they still depend on covariance matrix fitting. In this paper, we develop a novel methodology that estimates the co-array subspaces from a sample covariance for SLAs. Our methodology trains a DNN to learn signal and noise subspace representations that are invariant to the selection of bases. To learn such representations, we propose loss functions that gauge the separation between the desired and the estimated subspace. In particular, we propose losses that measure the length of the shortest path between subspaces viewed on a union of Grassmannians, and prove that it is possible for a DNN to approximate signal subspaces. The computation of learning subspaces of different dimensions is accelerated by a new batch sampling strategy called consistent rank sampling. The methodology is robust to array imperfections due to its geometry-agnostic and data-driven nature. In addition, we propose a fully end-to-end gridless approach that directly learns angles to study the possibility of bypassing subspace methods. Numerical results show that learning such subspace representations is more beneficial than learning covariances or angles. It outperforms conventional SDP-based methods such as the sparse and parametric approach (SPA) and existing DNN-based covariance reconstruction methods for a wide range of signal-to-noise ratios (SNRs), snapshots, and source numbers for both perfect and imperfect arrays.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis
Srinivas, Sakhinana Sagar, Ravuru, Chidaksh, Sannidhi, Geethan, Runkana, Venkataramana
Semiconductors, crucial to modern electronics, are generally under-researched in foundational models. It highlights the need for research to enhance the semiconductor device technology portfolio and aid in high-end device fabrication. In this paper, we introduce sLAVA, a small-scale vision-language assistant tailored for semiconductor manufacturing, with a focus on electron microscopy image analysis. It addresses challenges of data scarcity and acquiring high-quality, expert-annotated data. We employ a teacher-student paradigm, using a foundational vision language model like GPT-4 as a teacher to create instruction-following multimodal data for customizing the student model, sLAVA, for electron microscopic image analysis tasks on consumer hardware with limited budgets. Our approach allows enterprises to further fine-tune the proposed framework with their proprietary data securely within their own infrastructure, protecting intellectual property. Rigorous experiments validate that our framework surpasses traditional methods, handles data shifts, and enables high-throughput screening.
- Semiconductors & Electronics (1.00)
- Information Technology > Security & Privacy (0.67)
- Information Technology > Hardware (0.48)