roi
- North America > United States > Texas (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Human Computer Interaction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Provable FDR Control for Deep Feature Selection: Deep MLPs and Beyond
We develop a flexible feature selection framework based on deep neural networks that approximately controls the false discovery rate (FDR), a measure of Type-I error. The method applies to architectures whose first layer is fully connected. From the second layer onward, it accommodates multilayer perceptrons (MLPs) of arbitrary width and depth, convolutional and recurrent networks, attention mechanisms, residual connections, and dropout. The procedure also accommodates stochastic gradient descent with data-independent initializations and learning rates. To the best of our knowledge, this is the first work to provide a theoretical guarantee of FDR control for feature selection within such a general deep learning setting. Our analysis is built upon a multi-index data-generating model and an asymptotic regime in which the feature dimension $n$ diverges faster than the latent dimension $q^{*}$, while the sample size, the number of training iterations, the network depth, and hidden layer widths are left unrestricted. Under this setting, we show that each coordinate of the gradient-based feature-importance vector admits a marginal normal approximation, thereby supporting the validity of asymptotic FDR control. As a theoretical limitation, we assume $\mathbf{B}$-right orthogonal invariance of the design matrix, and we discuss broader generalizations. We also present numerical experiments that underscore the theoretical findings.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (1.00)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
OnSight Pathology: A real-time platform-agnostic computational pathology companion for histopathology
Hu, Jinzhen, Faust, Kevin, Zadeh, Parsa Babaei, Bourkas, Adrienn, Eaton, Shane, Young, Andrew, Alvi, Anzar, Oreopoulos, Dimitrios George, Paliwal, Ameesha, Alrumeh, Assem Saleh, Kamski-Hennekam, Evelyn Rose, Diamandis, Phedias
The microscopic examination of surgical tissue remains a cornerstone of disease classification but relies on subjective interpretations and access to highly specialized experts, which can compromise accuracy and clinical care. While emerging breakthroughs in artificial intelligence (AI) offer promise for automated histological analysis, the growing number of proprietary digital pathology solutions has created barriers to real-world deployment. To address these challenges, we introduce OnSight Pathology, a platform-agnostic computer vision software that uses continuous custom screen captures to provide real-time AI inferences to users as they review digital slide images. Accessible as a single, self-contained executable file (https://onsightpathology.github.io/ ), OnSight Pathology operates locally on consumer-grade personal computers without complex software integration, enabling cost-effective and secure deployment in research and clinical workflows. Here we demonstrate the utility of OnSight Pathology using over 2,500 publicly available whole slide images across different slide viewers, as well as cases from our clinical digital pathology setup. The software's robustness is highlighted across routine histopathological tasks, including the classification of common brain tumor types, mitosis detection, and the quantification of immunohistochemical stains. A built-in multi-modal chat assistant provides verifiable descriptions of images, free of rigid class labels, for added quality control. Lastly, we show compatibility with live microscope camera feeds, including from personal smartphones, offering potential for deployment in more analog, inter-operative, and telepathology settings. Together, we highlight how OnSight Pathology can deliver real-time AI inferences across a broad range of pathology pipelines, removing key barriers to the adoption of AI tools in histopathology.
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States (0.14)
- Europe > Czechia (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- North America > United States > Alaska (0.04)
- North America > United States > Texas > Schleicher County (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Government (0.68)
- Information Technology > Services (0.46)
MEGA-GUI: Multi-stage Enhanced Grounding Agents for GUI Elements
Kwak, SeokJoo, Kim, Jihoon, Kim, Boyoun, Yoon, Jung Jae, Jang, Wooseok, Hong, Jeonghoon, Yang, Jaeho, Kwon, Yeong-Dae
Graphical User Interface (GUI) grounding - the task of mapping natural language instructions to screen coordinates - is essential for autonomous agents and accessibility technologies. Existing systems rely on monolithic models or one-shot pipelines that lack modularity and fail under visual clutter and ambiguous instructions. We introduce MEGA-GUI, a multi-stage framework that separates grounding into coarse Region-of-Interest (ROI) selection and fine-grained element grounding, orchestrated by specialized vision-language agents. MEGA-GUI features a bidirectional ROI zoom algorithm that mitigates spatial dilution and a context-aware rewriting agent that reduces semantic ambiguity. Our analysis reveals complementary strengths and weaknesses across vision-language models at different visual scales, and we show that leveraging this modular structure achieves consistently higher accuracy than monolithic approaches. On the visually dense ScreenSpot-Pro benchmark, MEGA-GUI attains 73.18% accuracy, and on the semantically complex OSWorld-G benchmark it reaches 68.63%, surpassing previously reported results. Code and the Grounding Benchmark Toolkit (GBT) are available at https://github.com/samsungsds-research-papers/mega-gui.
ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
Li, Zhaoyang, Ling, Zhan, Zhou, Yuchen, Gong, Litian, Bıyık, Erdem, Su, Hao
Large Vision-Language Models (LVLMs) excel at captioning, visual question answering, and robotics by combining vision and language, yet they often miss obvious objects or hallucinate nonexistent ones in atypical scenes. W e examine these failures through the lens of uncertainty, focusing on contextual incongruity, where objects appear unexpectedly or fail to appear in expected contexts, and show that such cases increase recognition difficulty for state-of-the-art LVLMs. T o study this regime, we introduce the Object Recognition in Incongruous Context (ORIC) framework, which constructs incongruous object-context pairs through two complementary strategies: (1) LLM-guided sampling to identify hard-to-recognize objects present in the image and (2) CLIP-guided sampling to mine plausible but absent ones. Applied to MSCOCO, ORIC produces ORIC-Bench and ORIC-style training data. Evaluating 18 LVLMs and 2 open-vocabulary detectors reveals substantial performance drops and bias patterns under incongruous contexts. Fine-tuning Qwen3-VL-8B-Instruct with Visual Reinforcement Fine-Tuning on 600 ORIC-style samples improves results on ORIC-Bench, AMBER, and HallusionBench. Overall, we show that contextual incongruity is a key source of uncertainty and provide tools for more reliable LVLMs.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > Riverside County > Riverside (0.04)
- Africa > Guinea > Kankan Region > Kankan Prefecture > Kankan (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.47)
- North America > United States > Alaska (0.04)
- North America > United States > Texas > Schleicher County (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- Government (0.68)
- Information Technology > Services (0.46)