AITopics

2405.17779

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
(5 more...)

Genre:

Research Report (1.00)
Instructional Material > Online (0.55)

Industry:

Education > Educational Setting > Online (1.00)
Transportation > Ground > Road (0.92)
Information Technology > Robotics & Automation (0.83)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Peng, ShengYun, Chakravarthy, Aishwarya, Lee, Seongmin, Wang, Xiaojing, Balasubramaniyan, Rajarajeswari, Chau, Duen Horng

UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining

arXiv.org Artificial IntelligenceMay-27-2024

Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse. Prior work on table recognition (TR) has mainly centered around complex task-specific combinations of available inputs and tools. We present UniTable, a training framework that unifies both the training paradigm and training objective of TR. Its training paradigm combines the simplicity of purely pixel-level inputs with the effectiveness and scalability empowered by self-supervised pretraining from diverse unannotated tabular images. Our framework unifies the training objectives of all three TR tasks - extracting table structure, cell content, and cell bounding box - into a unified task-agnostic training objective: language modeling. Extensive quantitative and qualitative analyses highlight UniTable's state-of-the-art (SOTA) performance on four of the largest TR datasets. UniTable's table parsing capability has surpassed both existing TR methods and general large vision-language models, e.g., GPT-4o, GPT-4-turbo with vision, and LLaVA. Our code is publicly available at https://github.com/poloclub/unitable, featuring a Jupyter Notebook that includes the complete inference pipeline, fine-tuned across multiple TR datasets, supporting all three TR tasks.

annotation, tabular image, unitable, (14 more...)

2403.04822

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Instructional Material (0.76)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)

arXiv.org Machine LearningMay-26-2024

High-dimensional multidisciplinary design optimization for aircraft eco-design / Optimisation multi-disciplinaire en grande dimension pour l'\'eco-conception avion en avant-projet

Saves, Paul

The objective of this Philosophiae Doctor (Ph.D) thesis is to propose an efficient approach for optimizing a multidisciplinary black-box model when the optimization problem is constrained and involves a large number of mixed integer design variables (typically 100 variables). The targeted optimization approach, called EGO, is based on a sequential enrichment of an adaptive surrogate model and, in this context, GP surrogate models are one of the most widely used in engineering problems to approximate time-consuming high fidelity models. EGO is a heuristic BO method that performs well in terms of solution quality. However, like any other global optimization method, EGO suffers from the curse of dimensionality, meaning that its performance is satisfactory on lower dimensional problems, but deteriorates as the dimensionality of the optimization search space increases. For realistic aircraft design problems, the typical size of the design variables can even exceed 100 and, thus, trying to solve directly the problems using EGO is ruled out. The latter is especially true when the problems involve both continuous and categorical variables increasing even more the size of the search space. In this Ph.D thesis, effective parameterization tools are investigated, including techniques like partial least squares regression, to significantly reduce the number of design variables. Additionally, Bayesian optimization is adapted to handle discrete variables and high-dimensional spaces in order to reduce the number of evaluations when optimizing innovative aircraft concepts such as the "DRAGON" hybrid airplane to reduce their climate impact.

artificial intelligence, high-dimensional multidisciplinary design optimization, machine learning, (21 more...)

arXiv.org Machine Learning

2402.04711

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
Research Report > Promising Solution (0.67)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Energy > Renewable (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

López-Fernández, Daniel, Vergaz, Ricardo

Adoption and Impact of ChatGPT in Computer Science Education: A Case Study on a Database Administration Course

Contribution: The combination of ChatGPT with traditional learning resources is very effective in computer science education. High-performing students are the ones who are using ChatGPT the most. So, a new digital trench could be rising between these students and those with lower degree of fundamentals and worse prompting skills, who may not take advantage of all the ChatGPT possibilities. Background: The irruption of GenAI such as ChatGPT has changed the educational landscape. Therefore, methodological guidelines and more empirical experiences in computer science education are needed to better understand these tools and know how to use them to their fullest potential. Research Questions: This article addresses three questions. The first two explore the degree of use and perceived usefulness of ChatGPT among computer science students to learn database administration, where as the third one explore how the utilization of ChatGPT can impact academic performance. Methodology: This contribution presents an exploratory and correlational study conducted with 37 students who used ChatGPT as a support tool to learn database administration. The student grades and a comprehensive questionnaire were employed as research instruments. Findings: The obtained results indicate that traditional learning resources, such as teacher explanations and student reports, were widely used and correlated positively with student grade. The usage and perceived utility of ChatGPT were moderate, but positive correlations between student grade and ChatGPT usage were found. Indeed, a significantly higher use of this tool was identified among the group of outstanding students.

chatgpt, correlation, student, (17 more...)

2407.12145

Country:

Europe > Spain > Galicia > Madrid (0.05)
North America > United States > Virginia (0.04)
Europe > Middle East > Cyprus (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report > Experimental Study > Negative Result (0.34)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning

Pan, Youqi, Zhou, Wugen, Cao, Yingdian, Zha, Hongbin

Visual-inertial odometry (VIO) has demonstrated remarkable success due to its low-cost and complementary sensors. However, existing VIO methods lack the generalization ability to adjust to different environments and sensor attributes. In this paper, we propose Adaptive VIO, a new monocular visual-inertial odometry that combines online continual learning with traditional nonlinear optimization. Adaptive VIO comprises two networks to predict visual correspondence and IMU bias. Unlike end-to-end approaches that use networks to fuse the features from two modalities (camera and IMU) and predict poses directly, we combine neural networks with visual-inertial bundle adjustment in our VIO system. The optimized estimates will be fed back to the visual and IMU bias networks, refining the networks in a self-supervised manner. Such a learning-optimization-combined framework and feedback mechanism enable the system to perform online continual learning. Experiments demonstrate that our Adaptive VIO manifests adaptive capability on EuRoC and TUM-VI datasets. The overall performance exceeds the currently known learning-based VIO methods and is comparable to the state-of-the-art optimization-based methods.

continual learning, odometry, online continual learning, (15 more...)

2405.16754

Country:

Europe > Switzerland (0.04)
Asia > China (0.04)

Genre: Instructional Material > Online (0.84)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Dual-State Personalized Knowledge Tracing with Emotional Incorporation

Wang, Shanshan, Yuan, Fangzheng, Wang, Keyang, Yang, Xun, Zhang, Xingyi, Wang, Meng

Knowledge tracing has been widely used in online learning systems to guide the students' future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets and explore the relationships between them, but ignore the personalized student behavioral information in the learning process. This will limit the model's ability to accurately capture the personalized knowledge states of students and reasonably predict their performances. To alleviate this limitation, we explicitly models the personalized learning process by incorporating the emotions, a representative personalized behavior in the learning process, into KT framework. Specifically, we present a novel Dual-State Personalized Knowledge Tracing with Emotional Incorporation model to achieve this goal: Firstly, we incorporate emotional information into the modeling process of knowledge state, resulting in the Knowledge State Boosting Module. Secondly, we design an Emotional State Tracing Module to monitor students' personalized emotional states, and propose an emotion prediction method based on personalized emotional states. Finally, we apply the predicted emotions to enhance students' response prediction. Furthermore, to extend the generalization capability of our model across different datasets, we design a transferred version of DEKT, named Transfer Learning-based Self-loop model (T-DEKT). Extensive experiments show our method achieves the state-of-the-art performance.

emotion, information, student, (15 more...)

2405.16799

Country:

Asia > China > Anhui Province > Hefei (0.05)
Asia > China > Chongqing Province > Chongqing (0.04)
Oceania > Australia (0.04)
(6 more...)

Genre:

Instructional Material (0.92)
Research Report > New Finding (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.87)

EchoSpike Predictive Plasticity: An Online Local Learning Rule for Spiking Neural Networks

Graf, Lars, Su, Zhe, Indiveri, Giacomo

The drive to develop artificial neural networks that efficiently utilize resources has generated significant interest in bio-inspired Spiking Neural Networks (SNNs). These networks are particularly attractive due to their potential in applications requiring low power and memory. This potential is further enhanced by the ability to perform online local learning, enabling them to adapt to dynamic environments. This requires the model to be adaptive in a self-supervised manner. While self-supervised learning has seen great success in many deep learning domains, its application for online local learning in multi-layer SNNs remains underexplored. In this paper, we introduce the "EchoSpike Predictive Plasticity" (ESPP) learning rule, a pioneering online local learning rule designed to leverage hierarchical temporal dynamics in SNNs through predictive and contrastive coding. We validate the effectiveness of this approach using benchmark datasets, demonstrating that it performs on par with current state-of-the-art supervised learning rules. The temporal and spatial locality of ESPP makes it particularly well-suited for low-cost neuromorphic processors, representing a significant advancement in developing biologically plausible self-supervised learning models for neuromorphic computing at the edge.

espp, neural network, prediction, (16 more...)

2405.13976

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report (1.00)
Instructional Material > Online (1.00)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMay-25-2024

Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions

Chen, Jiayu, Ganguly, Bhargav, Xu, Yang, Mei, Yongsheng, Lan, Tian, Aggarwal, Vaneet

Deep generative models (DGMs) have demonstrated great success across various domains, particularly in generating texts, images, and videos using models trained from offline data. Similarly, data-driven decision-making and robotic control also necessitate learning a generator function from the offline data to serve as the strategy or policy. In this case, applying deep generative models in offline policy learning exhibits great potential, and numerous studies have explored in this direction. However, this field still lacks a comprehensive review and so developments of different branches are relatively independent. In this paper, we provide the first systematic review on the applications of deep generative models for offline policy learning. In particular, we cover five mainstream deep generative models, including Variational Auto-Encoders, Generative Adversarial Networks, Normalizing Flows, Transformers, and Diffusion Models, and their applications in both offline reinforcement learning (offline RL) and imitation learning (IL). Offline RL and IL are two main branches of offline policy learning and are widely-adopted techniques for sequential decision-making. Notably, for each type of DGM-based offline policy learning, we distill its fundamental scheme, categorize related works based on the usage of the DGM, and sort out the development process of algorithms in that field. Subsequent to the main content, we provide in-depth discussions on deep generative models and offline policy learning as a summary, based on which we present our perspectives on future research directions. This work offers a hands-on reference for the research progress in deep generative models for offline policy learning, and aims to inspire improved DGM-based offline RL or IL algorithms. For convenience, we maintain a paper list on https://github.com/LucasCJYSDL/DGMs-for-Offline-Policy-Learning.

neural information processing system 29, neural information processing system 36, neural information processing system track, (12 more...)

2402.13777

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
North America > United States > Montana (0.04)
(10 more...)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Transportation > Ground > Road (0.67)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

arXiv.org Artificial IntelligenceMay-25-2024

PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification

Wang, Zhenwei, Sun, Qiule, Zhang, Bingbing, Wang, Pengfei, Zhang, Jianxin, Zhang, Qiang

Few-shot learning has been successfully applied to medical image classification as only very few medical examples are available for training. Due to the challenging problem of limited number of annotated medical images, image representations should not be solely derived from a single image modality which is insufficient for characterizing concept classes. In this paper, we propose a new prompting multi-modal model paradigm on medical image classification based on multi-modal foundation models, called PM2. Besides image modality,PM2 introduces another supplementary text input, known as prompt, to further describe corresponding image or concept classes and facilitate few-shot learning across diverse modalities. To better explore the potential of prompt engineering, we empirically investigate five distinct prompt schemes under the new paradigm. Furthermore, linear probing in multi-modal models acts as a linear classification head taking as input only class token, which ignores completely merits of rich statistics inherent in high-level visual tokens. Thus, we alternatively perform a linear classification on feature distribution of visual tokens and class token simultaneously. To effectively mine such rich statistics, a global covariance pooling with efficient matrix power normalization is used to aggregate visual tokens. Then we study and combine two classification heads. One is shared for class token of image from vision encoder and prompt representation encoded by text encoder. The other is to classification on feature distribution of visual tokens from vision encoder. Extensive experiments on three medical datasets show that our PM2 significantly outperforms counterparts regardless of prompt schemes and achieves state-of-the-art performance.

classification, dataset, image classification, (17 more...)

2404.08915

Country: Asia > China > Liaoning Province > Dalian (0.05)

Genre:

Instructional Material > Course Syllabus & Notes (0.91)
Instructional Material > Online (0.82)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.92)

arXiv.org Artificial IntelligenceMay-25-2024

Instruction-tuned Language Models are Better Knowledge Learners

Jiang, Zhengbao, Sun, Zhiqing, Shi, Weijia, Rodriguez, Pedro, Zhou, Chunting, Neubig, Graham, Lin, Xi Victoria, Yih, Wen-tau, Iyer, Srinivasan

In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe struggle to answer questions, even though the perplexity of documents is minimized. We found that QA pairs are generally straightforward, while documents are more complex, weaving many factual statements together in an intricate manner. Therefore, we hypothesize that it is beneficial to expose LLMs to QA pairs before continued pre-training on documents so that the process of encoding knowledge from complex documents takes into account how this knowledge is accessed through questions. Based on this, we propose pre-instruction-tuning (PIT), a method that instruction-tunes on questions prior to training on documents. This contrasts with standard instruction-tuning, which learns how to extract knowledge after training on documents. Extensive experiments and ablation studies demonstrate that pre-instruction-tuning significantly enhances the ability of LLMs to absorb knowledge from new documents, outperforming standard instruction-tuning by 17.8%.

knowledge, language model, qa pair, (15 more...)

2402.12847

Country:

Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(8 more...)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)