abstractor
- Information Technology (0.67)
- Food & Agriculture (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning
We propose the new task open-world video instance segmentation and captioning. It requires to detect, segment, track and describe with rich captions never before seen objects. This challenging task can be addressed by developing abstractors which connect a vision model and a language foundation model. Concretely, we connect a multi-scale visual feature extractor and a large language model (LLM) by developing an object abstractor and an object-to-text abstractor. The object abstractor, consisting of a prompt encoder and transformer blocks, introduces spatially-diverse open-world object queries to discover never before seen objects in videos. An inter-query contrastive loss further encourages the diversity of object queries. The object-to-text abstractor is augmented with masked cross-attention and acts as a bridge between the object queries and a frozen LLM to generate rich and descriptive object-centric captions for each detected object. Our generalized approach surpasses the baseline that jointly addresses the tasks of open-world video instance segmentation and dense video object captioning by 13% on never before seen objects, and by 10% on object-centric captions.
- Information Technology (0.67)
- Food & Agriculture (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
Estevez, Melissa, Singh, Nisha, Dyson, Lauren, Adamson, Blythe, Yuan, Qianyu, Hildner, Megan W., Fidyk, Erin, Mbah, Olive, Khan, Farhad, Seidl-Rathkopf, Kathi, Cohen, Aaron B.
Large language models (LLMs) are increasingly used to extract clinical data from electronic health records (EHRs), offering significant improvements in scalability and efficiency for real-world data (RWD) curation in oncology. However, the adoption of LLMs introduces new challenges in ensuring the reliability, accuracy, and fairness of extracted data, which are essential for research, regulatory, and clinical applications. Existing quality assurance frameworks for RWD and artificial intelligence do not fully address the unique error modes and complexities associated with LLM-extracted data. In this paper, we propose a comprehensive framework for evaluating the quality of clinical data extracted by LLMs. The framework integrates variable-level performance benchmarking against expert human abstraction, automated verification checks for internal consistency and plausibility, and replication analyses comparing LLM-extracted data to human-abstracted datasets or external standards. This multidimensional approach enables the identification of variables most in need of improvement, systematic detection of latent errors, and confirmation of dataset fitness-for-purpose in real-world research. Additionally, the framework supports bias assessment by stratifying metrics across demographic subgroups. By providing a rigorous and transparent method for assessing LLM-extracted RWD, this framework advances industry standards and supports the trustworthy use of AI-powered evidence generation in oncology research and practice.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning
We propose the new task open-world video instance segmentation and captioning. It requires to detect, segment, track and describe with rich captions never before seen objects. This challenging task can be addressed by developing "abstractors" which connect a vision model and a language foundation model. Concretely, we connect a multi-scale visual feature extractor and a large language model (LLM) by developing an object abstractor and an object-to-text abstractor. The object abstractor, consisting of a prompt encoder and transformer blocks, introduces spatially-diverse open-world object queries to discover never before seen objects in videos.
RESOLVE: Relational Reasoning with Symbolic and Object-Level Features Using Vector Symbolic Processing
Mejri, Mohamed, Amarnath, Chandramouli, Chatterjee, Abhijit
Modern transformer-based encoder-decoder architectures struggle with reasoning tasks due to their inability to effectively extract relational information between input objects (data/tokens). Recent work introduced the Abstractor module, embedded between transformer layers, to address this gap. However, the Abstractor layer while excelling at capturing relational information (pure relational reasoning), faces challenges in tasks that require both object and relational-level reasoning (partial relational reasoning). To address this, we propose RESOLVE, a neuro-vector symbolic architecture that combines object-level features with relational representations in high-dimensional spaces, using fast and efficient operations such as bundling (summation) and binding (Hadamard product) allowing both object-level features and relational representations to coexist within the same structure without interfering with one another. RESOLVE is driven by a novel attention mechanism that operates in a bipolar high dimensional space, allowing fast attention score computation compared to the state-of-the-art. By leveraging this design, the model achieves both low compute latency and memory efficiency. RESOLVE also offers better generalizability while achieving higher accuracy in purely relational reasoning tasks such as sorting as well as partial relational reasoning tasks such as math problem-solving compared to state-of-the-art methods.
LLMD: A Large Language Model for Interpreting Longitudinal Medical Records
Porter, Robert, Diehl, Adam, Pastel, Benjamin, Hinnefeld, J. Henry, Nerenberg, Lawson, Maung, Pye, Kerbrat, Sebastien, Hanson, Gillian, Astorino, Troy, Tarsa, Stephen J.
We introduce LLMD, a large language model designed to analyze a patient's medical history based on their medical records. Along with domain knowledge, LLMD is trained on a large corpus of records collected over time and across facilities, as well as tasks and labels that make nuanced connections among them. This approach is critical to an accurate picture of patient health, and has distinctive advantages over models trained on knowledge alone, unlabeled records, structured EHR data, or records from a single health system. The recipe for LLMD continues pretraining a foundational model on both domain knowledge and the contents of millions of records. These span an average of 10 years of care and as many as 140 care sites per patient. LLMD is then instruction fine-tuned on structuring and abstraction tasks. The former jointly identify and normalize document metadata, provenance information, clinical named-entities, and ontology mappings, while the latter roll these into higher-level representations, such a continuous era of time a patient was on a medication. LLMD is deployed within a layered validation system that includes continual random audits and review by experts, e.g. based on uncertainty, disease-specific rules, or use-case. LLMD exhibits large gains over both more-powerful generalized models and domain-specific models. On medical knowledge benchmarks, LLMD-8B achieves state of the art accuracy on PubMedQA text responses, besting orders-of-magnitude larger models. On production tasks, we show that LLMD significantly outperforms all other models evaluated, and among alternatives, large general purpose LLMs like GPT-4o are more accurate than models emphasizing medical knowledge. We find strong evidence that accuracy on today's medical benchmarks is not the most significant factor when analyzing real-world patient data, an insight with implications for future medical LLMs.'
Extract-and-Abstract: Unifying Extractive and Abstractive Summarization within Single Encoder-Decoder Framework
Wu, Yuping, Li, Hao, Zhu, Hongbo, Nenadic, Goran, Zeng, Xiao-Jun
Extract-then-Abstract is a naturally coherent paradigm to conduct abstractive summarization with the help of salient information identified by the extractive model. Previous works that adopt this paradigm train the extractor and abstractor separately and introduce extra parameters to highlight the extracted salients to the abstractor, which results in error accumulation and additional training costs. In this paper, we first introduce a parameter-free highlight method into the encoder-decoder framework: replacing the encoder attention mask with a saliency mask in the cross-attention module to force the decoder to focus only on salient parts of the input. A preliminary analysis compares different highlight methods, demonstrating the effectiveness of our saliency mask. We further propose the novel extract-and-abstract paradigm, ExtAbs, which jointly and seamlessly performs Extractive and Abstractive summarization tasks within single encoder-decoder model to reduce error accumulation. In ExtAbs, the vanilla encoder is augmented to extract salients, and the vanilla decoder is modified with the proposed saliency mask to generate summaries. Built upon BART and PEGASUS, experiments on three datasets show that ExtAbs can achieve superior performance than baselines on the extractive task and performs comparable, or even better than the vanilla models on the abstractive task.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (13 more...)
- Law (0.46)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
Mondal, Shanka Subhra, Cohen, Jonathan D., Webb, Taylor W.
Abstract visual reasoning is a characteristically human ability, allowing the identification of relational patterns that are abstracted away from object features, and the systematic generalization of those patterns to unseen problems. Recent work has demonstrated strong systematic generalization in visual reasoning tasks involving multi-object inputs, through the integration of slot-based methods used for extracting object-centric representations coupled with strong inductive biases for relational abstraction. However, this approach was limited to problems containing a single rule, and was not scalable to visual reasoning problems containing a large number of objects. Other recent work proposed Abstractors, an extension of Transformers that incorporates strong relational inductive biases, thereby inheriting the Transformer's scalability and multi-head architecture, but it has yet to be demonstrated how this approach might be applied to multi-object visual inputs. Here we combine the strengths of the above approaches and propose Slot Abstractors, an approach to abstract visual reasoning that can be scaled to problems involving a large number of objects and multiple relations among them. The approach displays state-of-the-art performance across four abstract visual reasoning tasks, as well as an abstract reasoning task involving real-world images.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Austria > Vienna (0.14)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Vision (0.88)
LARS-VSA: A Vector Symbolic Architecture For Learning with Abstract Rules
Mejri, Mohamed, Amarnath, Chandramouli, Chatterjee, Abhijit
Human cognition excels at symbolic reasoning, deducing abstract rules from limited samples. This has been explained using symbolic and connectionist approaches, inspiring the development of a neuro-symbolic architecture that combines both paradigms. In parallel, recent studies have proposed the use of a "relational bottleneck" that separates object-level features from abstract rules, allowing learning from limited amounts of data . While powerful, it is vulnerable to the curse of compositionality meaning that object representations with similar features tend to interfere with each other. In this paper, we leverage hyperdimensional computing, which is inherently robust to such interference to build a compositional architecture. We adapt the "relational bottleneck" strategy to a high-dimensional space, incorporating explicit vector binding operations between symbols and relational representations. Additionally, we design a novel high-dimensional attention mechanism that leverages this relational representation. Our system benefits from the low overhead of operations in hyperdimensional space, making it significantly more efficient than the state of the art when evaluated on a variety of test datasets, while maintaining higher or equal accuracy.