AITopics

2412.11245

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Oceania > Australia (0.04)
North America > United States > Indiana (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Agarwal, Shubham, Sahu, Gaurav, Puri, Abhay, Laradji, Issam H., Dvijotham, Krishnamurthy DJ, Stanley, Jason, Charlin, Laurent, Pal, Christopher

LLMs for Literature Review: Are we there yet?

Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: 1. Retrieving related works given a query abstract, and 2. Writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this regard. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. Further, we demonstrate that our planning-based approach achieves higher-quality reviews by minimizing hallucinated references in the generated review by 18-26% compared to existing simpler LLM-based generation methods.

large language model, machine learning, natural language, (17 more...)

2412.15249

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(11 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Grasp What You Want: Embodied Dexterous Grasping System Driven by Your Voice

Li, Junliang, Ye, Kai, Kang, Haolan, Liang, Mingxuan, Wu, Yuhang, Liu, Zhenhua, Zhuang, Huiping, Huang, Rui, Chen, Yongquan

In recent years, as robotics has advanced, human-robot collaboration has gained increasing importance. However, current robots struggle to fully and accurately interpret human intentions from voice commands alone. Traditional gripper and suction systems often fail to interact naturally with humans, lack advanced manipulation capabilities, and are not adaptable to diverse tasks, especially in unstructured environments. This paper introduces the Embodied Dexterous Grasping System (EDGS), designed to tackle object grasping in cluttered environments for human-robot interaction. We propose a novel approach to semantic-object alignment using a Vision-Language Model (VLM) that fuses voice commands and visual information, significantly enhancing the alignment of multi-dimensional attributes of target objects in complex scenarios. Inspired by human hand-object interactions, we develop a robust, precise, and efficient grasping strategy, incorporating principles like the thumb-object axis, multi-finger wrapping, and fingertip interaction with an object's contact mechanics. We also design experiments to assess Referring Expression Representation Enrichment (RERE) in referring expression segmentation, demonstrating that our system accurately detects and matches referring expressions. Extensive experiments confirm that EDGS can effectively handle complex grasping tasks, achieving stability and high success rates, highlighting its potential for further development in the field of Embodied AI.

information, large language model, natural language, (20 more...)

2412.10694

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.66)

Torka, Simon, Albayrak, Sahin

ALPACA -- Adaptive Learning Pipeline for Comprehensive AI

In the rapidly evolving landscape of artificial intelligence (AI), the ability to comprehensively analyze and understand complex data is paramount. However, the ever-increasing complexity of real data requires sophisticated AI pipelines that seamlessly integrate different stages such as data collection, preparation, model generation, and evaluation. Such a pipeline can be illustrated as a chain of distinct yet interdependent stages, each contributing to the overarching goal of turning data into actionable intelligence. Like a well-coordinated symphony, these stages require harmonious collaboration to achieve optimal results. The concept of AI pipelines therefore represents more than just a linear progression; it signifies the orchestration of diverse processes to accomplish a larger purpose. At the core of this revolution lie AI pipelines, intricate networks of interconnected data processing and analysis steps designed to transform raw data into meaningful insights or outcomes using AI techniques. The evolution of simple AI models to adaptive, systematic AI pipelines has ushered in a new era of data-driven decision-making by solving complex tasks in an ever-changing environment. Making AI understandable, accessible and usable by everyone in every domain requires a domain-independent, easy-to-use pipeline architecture that can be integrated into a complex ecosystem of experts and non-experts. However, the design and implementation of such pipelines often prove challenging due to the intricate interplay of technical components and the diverse requirements of different application domains.

artificial intelligence, data mining, machine learning, (16 more...)

2412.1095

Country:

Europe > Germany (0.28)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Information Technology > Services (0.93)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Unveiling Topological Structures in Text: A Comprehensive Survey of Topological Data Analysis Applications in NLP

Uchendu, Adaku, Le, Thai

The surge of data available on the internet has led to the adoption of various computational methods to analyze and extract valuable insights from this wealth of information. Among these, the field of Machine Learning (ML) has thrived by leveraging data to extract meaningful insights. However, ML techniques face notable challenges when dealing with real-world data, often due to issues of imbalance, noise, insufficient labeling, and high dimensionality. To address these limitations, some researchers advocate for the adoption of Topological Data Analysis (TDA), a statistical approach that discerningly captures the intrinsic shape of data despite noise. Despite its potential, TDA has not gained as much traction within the Natural Language Processing (NLP) domain compared to structurally distinct areas like computer vision. Nevertheless, a dedicated community of researchers has been exploring the application of TDA in NLP, yielding 87 papers we comprehensively survey in this paper. Our findings categorize these efforts into theoretical and non-theoretical approaches. Theoretical approaches aim to explain linguistic phenomena from a topological viewpoint, while non-theoretical approaches merge TDA with ML features, utilizing diverse numerical representation techniques. We conclude by exploring the challenges and unresolved questions that persist in this niche field. Resources and a list of papers on this topic can be found at: https://github.com/AdaUchendu/AwesomeTDA4NLP.

large language model, machine learning, natural language, (19 more...)

2411.10298

Country:

Oceania > Australia (0.04)
North America > United States > Texas (0.04)
North America > United States > Missouri > Greene County > Springfield (0.04)
(13 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Government (0.93)
Information Technology > Security & Privacy (0.69)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)

$\textit{SKIntern}$: Internalizing Symbolic Knowledge for Distilling Better CoT Capabilities into Small Language Models

Liao, Huanxuan, He, Shizhu, Hao, Yupu, Li, Xiang, Zhang, Yuanzhe, Zhao, Jun, Liu, Kang

Small Language Models (SLMs) are attracting attention due to the high computational demands and privacy concerns of Large Language Models (LLMs). Some studies fine-tune SLMs using Chains of Thought (CoT) data distilled from LLMs, aiming to enhance their reasoning ability. Furthermore, Some CoT distillation methods introduce external symbolic knowledge into the generation process to improve the limited knowledge memory, reasoning ability and out-of-domain (OOD) generalization of SLMs. However, the introduction of symbolic knowledge increases computational overhead and introduces potential noise. In this paper, we introduce $\textit{SKIntern}$, an innovative approach that empowers SLMs to internalize symbolic knowledge and few-shot examples gradually through a progressive fine-tuning process, guided by a predefined linear decay schedule under curriculum learning. By efficiently internalizing knowledge, $\textit{SKIntern}$ reduces computational overhead and speeds up the reasoning process by focusing solely on the question during inference. It outperforms state-of-the-art baselines by over 5\%, while reducing inference costs (measured in FLOPs) by up to $4\times$ across a wide range of SLMs in both in-domain (ID) and out-of-domain (OOD) tasks. Our code will be available at \url{https://github.com/Xnhyacinth/SKIntern}.

knowledge management, large language model, machine learning, (23 more...)

2409.13183

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry:

Education (0.68)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Wollman, Alex, Hastings, John

CEKER: A Generalizable LLM Framework for Literature Analysis with a Case Study in Unikernel Security

Literature reviews are a critical component of formulating and justifying new research, but are a manual and often time-consuming process. This research introduces a novel, generalizable approach to literature analysis called CEKER which uses a three-step process to streamline the collection of literature, the extraction of key insights, and the summarized analysis of key trends and gaps. Leveraging Large Language Models (LLMs), this methodology represents a significant shift from traditional manual literature reviews, offering a scalable, flexible, and repeatable approach that can be applied across diverse research domains. A case study on unikernel security illustrates CEKER's ability to generate novel insights validated against previous manual methods. CEKER's analysis highlighted reduced attack surface as the most prominent theme. Key security gaps included the absence of Address Space Layout Randomization, missing debugging tools, and limited entropy generation, all of which represent important challenges to unikernel security. The study also revealed a reliance on hypervisors as a potential attack vector and emphasized the need for dynamic security adjustments to address real-time threats.

large language model, machine learning, natural language, (21 more...)

2412.10904

Country:

North America > United States (0.28)
Europe (0.28)

Genre:

Workflow (0.90)
Overview (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

arXiv.org Artificial IntelligenceDec-13-2024

Visual Object Tracking across Diverse Data Modalities: A Review

Wang, Mengmeng, Ma, Teli, Xin, Shuo, Hou, Xiaojun, Xing, Jiazheng, Dai, Guang, Wang, Jingdong, Liu, Yong

Visual Object Tracking (VOT) is an attractive and significant research area in computer vision, which aims to recognize and track specific targets in video sequences where the target objects are arbitrary and class-agnostic. The VOT technology could be applied in various scenarios, processing data of diverse modalities such as RGB, thermal infrared and point cloud. Besides, since no one sensor could handle all the dynamic and varying environments, multi-modal VOT is also investigated. This paper presents a comprehensive survey of the recent progress of both single-modal and multi-modal VOT, especially the deep learning methods. Specifically, we first review three types of mainstream single-modal VOT, including RGB, thermal infrared and point cloud tracking. In particular, we conclude four widely-used single-modal frameworks, abstracting their schemas and categorizing the existing inheritors. Then we summarize four kinds of multi-modal VOT, including RGB-Depth, RGB-Thermal, RGB-LiDAR and RGB-Language. Moreover, the comparison results in plenty of VOT benchmarks of the discussed modalities are presented. Finally, we provide recommendations and insightful observations, inspiring the future development of this fast-growing literature.

artificial intelligence, machine learning, natural language, (16 more...)

2412.09991

Country: Asia > China (0.27)

Genre: Overview (1.00)

Industry:

Transportation (0.67)
Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

arXiv.org Artificial IntelligenceDec-13-2024

A Decade of Deep Learning: A Survey on The Magnificent Seven

Azizov, Dilshod, Manzoor, Muhammad Arslan, Bojkovic, Velibor, Wang, Yingxu, Wang, Zixiao, Iklassov, Zangir, Zhao, Kailong, Li, Liang, Liu, Siwei, Zhong, Yu, Liu, Wei, Liang, Shangsong

At the core of this transformation is the development of multi-layered neural network architectures that facilitate automatic feature extraction from raw data, significantly improving the efficiency on machine learning tasks. Given the rapid pace of these advancements, an accessible manual is necessary to distill the key advances of the past decade. With this in mind, we introduce a study which highlights the evolution of deep learning, largely attributed to powerful algorithms. Among the multitude of breakthroughs, certain algorithms, including Residual Networks (ResNets), Transformers, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Graph Neural Networks (GNNs), Contrastive Language-Image Pretraining (CLIP) and Diffusion models, have emerged as the cornerstones and driving forces behind the discipline. We select these algorithms via a survey targeting a broad spectrum of academics and professionals with the aim of encapsulating the essence of the most influential algorithms over the past decade. In this work, we provide details on the selection methodology, exploring the mentioned architectures in a broader context of the history of deep learning. We present an overview of selected core architectures, their mathematical underpinnings, and the algorithmic procedures that define the subsequent extensions and variants of these models, their applications, and their challenges and potential future research directions. In addition, we explore the practical aspects related to these algorithms, such as training and optimization methods, normalization techniques, and rate scheduling strategies that are essential for their effective implementation. Therefore, our manuscript serves as a practical survey for understanding and applying these crucial algorithms and aims to provide a manual for experienced researchers transitioning into deep learning from other domains, as well as for beginners seeking to grasp the trending algorithms.

artificial intelligence, arxiv preprint arxiv, machine learning, (15 more...)

2412.16188

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Asia > China (0.04)
North America > United States > Hawaii (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.92)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Leisure & Entertainment > Games (0.67)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Villani, Francesco, Maljkovic, Igor, Lazzaro, Dario, Sotgiu, Angelo, Cinà, Antonio Emanuele, Roli, Fabio

Robust image classification with multi-modal large language models

arXiv.org Artificial IntelligenceDec-13-2024

Deep Neural Networks are vulnerable to adversarial examples, i.e., carefully crafted input samples that can cause models to make incorrect predictions with high confidence. To mitigate these vulnerabilities, adversarial training and detection-based defenses have been proposed to strengthen models in advance. However, most of these approaches focus on a single data modality, overlooking the relationships between visual patterns and textual descriptions of the input. In this paper, we propose a novel defense, Multi-Shield, designed to combine and complement these defenses with multi-modal information to further enhance their robustness. Multi-Shield leverages multi-modal large language models to detect adversarial examples and abstain from uncertain classifications when there is no alignment between textual and visual representations of the input. Extensive evaluations on CIFAR-10 and ImageNet datasets, using robust and non-robust image classification models, demonstrate that Multi-Shield can be easily integrated to detect and reject adversarial examples, outperforming the original defenses.

large language model, machine learning, natural language, (21 more...)

2412.10353

Country: Europe > Italy > Sardinia > Cagliari (0.04)

Genre:

Overview (1.00)
Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.85)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)