AITopics | Instructional Material

Instructional Material

Amortized Variational Inference: A Systematic Review

Ganguly, Ankush | Jain, Sanjana | Watchareeruetai, Ukrit (Sertis)

Journal of Artificial Intelligence Research, Oct-15-2023

The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, like stochastic-, black box-, and amortized-VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. In this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.
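As a brief illustration of how VI turns inference into optimization, the standard evidence decomposition can be written as follows. The notation is an assumption and does not come from the abstract: x is an observed data point, z the latent variables, p_θ the generative model, and q_φ(z | x) the approximate posterior whose parameters are produced by a shared function with weights φ (the amortized case).

```latex
\log p_\theta(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x, z) - \log q_\phi(z \mid x)\right]}_{\mathrm{ELBO}(\theta, \phi;\, x)}
  + \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)
```

Because the KL term is non-negative, the ELBO is a lower bound on log p_θ(x), and maximizing it over φ pulls q_φ toward the true posterior. Roughly speaking, the amortization gap mentioned above is the extra ELBO lost by using one shared φ instead of optimizing variational parameters separately for each x.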

inference, international conference, proceedings, (11 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.14258

AI Access Foundation

14258

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Ontario > Toronto (0.14)
(17 more...)

Genre:

Overview (0.88)
Instructional Material (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Neuronal Auditory Machine Intelligence (NEURO-AMI) In Perspective

Osegi, Emmanuel Ndidi

arXiv.org Artificial Intelligence, Oct-14-2023

Recent developments in soft computing cannot be fully described without noting the contributions of artificial neural machine learning systems that draw inspiration from real cortical tissue or from processes that occur in the human brain. The universal approximability of such neural systems has led to their widespread use, and novel developments in this evolving technology have shown that there is a bright future for such Artificial Intelligence (AI) techniques in the soft computing field. Indeed, the proliferation of large and very deep networks of artificial neural systems, and the corresponding enhancement and development of neural machine learning algorithms, have contributed immensely to the development of the modern field of Deep Learning, as may be found in the well-documented research works of LeCun, Bengio and Hinton. However, the key requirements of end-user affordability, reduced complexity, and reduced training-data size mean there still remains a need for the synthesis of more cost-efficient and less data-hungry artificial neural systems. In this report, we present an overview of a new competing bio-inspired continual learning neural tool, Neuronal Auditory Machine Intelligence (Neuro-AMI), as a predictor, detailing its functional and structural details, important aspects of its proper applicability, some recent application use cases, and future research directions for current and prospective machine learning experts and data scientists.

application, operation, representation, (13 more...)

arXiv.org Artificial Intelligence

2401.02421

Country:

Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Energy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Instruction Tuning with Human Curriculum

Lee, Bruce W., Cho, Hyunsoo, Yoo, Kang Min

arXiv.org Artificial Intelligence, Oct-14-2023

The dominant paradigm for instruction tuning is the random-shuffled training of maximally diverse instruction-response pairs. This paper explores the potential benefits of applying a structured cognitive learning approach to instruction tuning in contemporary large language models like ChatGPT and GPT-4. Unlike conventional randomized instruction datasets, we propose a highly structured synthetic dataset that mimics the progressive and organized nature of human education. We curate our dataset by aligning it with educational frameworks, incorporating meta information for each sample, including its topic and cognitive rigor level. Our dataset covers comprehensive fine-grained topics spanning diverse educational stages (from middle school to graduate school), with various questions for each topic to enhance conceptual depth using Bloom's taxonomy, a classification framework distinguishing various levels of human cognition for each concept. The results demonstrate that this cognitively rigorous training approach yields significant performance enhancements (+3.06 on the MMLU benchmark and an additional +1.28 on the AI2 Reasoning Challenge hard set) compared to conventional randomized training, all while avoiding additional computational costs. This research highlights the potential of leveraging human learning principles to enhance the capabilities of language models in comprehending and responding to complex instructions and tasks.
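The difference between random shuffling and a curriculum-style ordering can be sketched in a few lines. This is only an illustration under stated assumptions: the `Sample` fields, the stage labels, and the sort key are hypothetical and are not taken from the paper. The six level names follow the revised Bloom's taxonomy, and the paper may use a different subset or ordering scheme.

```python
from dataclasses import dataclass

# Hypothetical labels: the abstract gives the range of stages (middle school to
# graduate school) and names Bloom's taxonomy, but not these exact strings.
STAGES = ["middle school", "high school", "undergraduate", "graduate"]
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


@dataclass
class Sample:
    instruction: str
    topic: str
    stage: str   # one of STAGES
    bloom: str   # one of BLOOM_LEVELS


def curriculum_order(samples):
    """Order samples by educational stage first, then by cognitive level."""
    return sorted(
        samples,
        key=lambda s: (STAGES.index(s.stage), BLOOM_LEVELS.index(s.bloom)),
    )
```

A training loop would then iterate over `curriculum_order(samples)` instead of a shuffled list. Whether stage or cognitive level should be the primary key is a design choice this sketch does not settle.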

higher education, instruction, language model, (14 more...)

arXiv.org Artificial Intelligence

2310.09518

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre:

Instructional Material > Course Syllabus & Notes (1.00)
Research Report > New Finding (0.87)

Industry:

Education > Curriculum > Subject-Specific Education (0.93)
Education > Educational Setting > K-12 Education > Secondary School (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multimodal Analysis Of Google Bard And GPT-Vision: Experiments In Visual Reasoning

Noever, David, Noever, Samantha Elizabeth Miller

arXiv.org Artificial Intelligence, Oct-14-2023

Addressing the gap in understanding visual comprehension in Large Language Models (LLMs), we designed a challenge-response study, subjecting Google Bard and GPT-Vision to 64 visual tasks, spanning categories like "Visual Situational Reasoning" and "Next Scene Prediction." Previous models, such as GPT4, leaned heavily on optical character recognition tools like Tesseract, whereas Bard and GPT-Vision, akin to Google Lens and Visual API, employ deep learning techniques for visual text recognition. However, our findings spotlight the limitations of both vision-language models: while proficient in solving visual CAPTCHAs that stump ChatGPT alone, they falter in recreating visual elements like ASCII art or analyzing Tic Tac Toe grids, suggesting an over-reliance on educated visual guesses. The prediction problem based on visual inputs appears particularly challenging, with no common-sense guesses for next-scene forecasting from current "next-token" multimodal models. This study provides experimental insights into the current capacities and areas for improvement in multimodal LLMs.

bard, engine, text prompt response note, (12 more...)

arXiv.org Artificial Intelligence

2309.16705

Country:

North America > United States > Indiana (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(5 more...)

Genre:

Research Report (1.00)
Instructional Material (0.67)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Transportation > Air (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

Liu, Bo, Zhu, Yifeng, Gao, Chongkai, Feng, Yihao, Liu, Qiang, Zhu, Yuke, Stone, Peter

arXiv.org Artificial Intelligence, Oct-14-2023

Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and concepts, lifelong learning in decision-making (LLDM) also necessitates the transfer of procedural knowledge, such as actions and behaviors. To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. Specifically, LIBERO highlights five key research topics in LLDM: 1) how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both; 2) how to design effective policy architectures and 3) effective algorithms for LLDM; 4) the robustness of a lifelong learner with respect to task ordering; and 5) the effect of model pretraining for LLDM. We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks. For benchmarking purposes, we create four task suites (130 tasks in total) that we use to investigate the above-mentioned research topics. To support sample-efficient learning, we provide high-quality human-teleoperated demonstration data for all tasks. Our extensive experiments present several insightful or even unexpected discoveries: sequential finetuning outperforms existing lifelong learning methods in forward transfer, no single visual encoder architecture excels at all types of knowledge transfer, and naive supervised pretraining can hinder agents' performance in the subsequent LLDM. Check the website at https://libero-project.github.io for the code and the datasets.

algorithm, architecture, lifelong learning algorithm, (12 more...)

arXiv.org Artificial Intelligence

2306.0331

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)
Research Report > Experimental Study (0.68)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Jointly-Learned Exit and Inference for a Dynamic Neural Network : JEI-DNN

Regol, Florence, Chataoui, Joud, Coates, Mark

arXiv.org Artificial Intelligence, Oct-13-2023

Large pretrained models, coupled with fine-tuning, are slowly becoming established as the dominant architecture in machine learning. Even though these models offer impressive performance, their practical application is often limited by the prohibitive amount of resources required for every inference. Early-exiting dynamic neural networks (EDNN) circumvent this issue by allowing a model to make some of its predictions from intermediate layers (i.e., early-exit). Training an EDNN architecture is challenging as it consists of two intertwined components: the gating mechanism (GM) that controls early-exiting decisions and the intermediate inference modules (IMs) that perform inference from intermediate representations. As a result, most existing approaches rely on thresholding confidence metrics for the gating mechanism and strive to improve the underlying backbone network and the inference modules. Although successful, this approach has two fundamental shortcomings: 1) the GMs and the IMs are decoupled during training, leading to a train-test mismatch; and 2) the thresholding gating mechanism introduces a positive bias into the predictive probabilities, making it difficult to readily extract uncertainty information. We propose a novel architecture that connects these two modules. This leads to significant performance improvements on classification datasets and enables better uncertainty characterization capabilities.

The dominant approach to improve machine learning models is to develop larger networks that can handle every potential sample. As a result, despite very impressive performance, the resource overhead is huge (Scao et al., 2023). The push for larger model size is often driven by the need to handle a small percentage of samples that are particularly challenging to infer (Bolukbasi et al., 2017); most inferences do not need the full power of a large network to be successfully executed. Nonetheless, most traditional neural network (NN) models have a fixed processing pipeline. This means that every sample, simple or complex, is processed the same way. To tackle this inefficiency, dynamic networks have been introduced (see (Han et al., 2022a) for a review).
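The confidence-thresholding gate that the abstract describes as the common baseline can be sketched as below. This illustrates that baseline only, not the jointly learned gating that JEI-DNN proposes. The function names, the max-softmax confidence score, and the default threshold are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted_class, exit_index) using max-softmax thresholding.

    exit_logits holds one logit vector per intermediate inference module,
    ordered from the shallowest exit to the final layer. The sample leaves
    at the first exit whose top probability reaches the threshold; the last
    exit always answers.
    """
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
```

In a real network the logits would be computed lazily, one exit at a time, so that deeper layers are skipped. Here they are precomputed to keep the sketch short. Because a prediction is emitted only when its confidence is already high, the reported probabilities are selected for being large, which is one way to see the positive bias the abstract mentions.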

architecture, ims, mul-add, (15 more...)

arXiv.org Artificial Intelligence

2310.09163

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.04)

Genre:

Instructional Material (0.67)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model

Ye, Qichen, Liu, Junling, Chong, Dading, Zhou, Peilin, Hua, Yining, Liu, Andrew

arXiv.org Artificial Intelligence, Oct-13-2023

Integrating large language models (LLMs) into healthcare presents potential but faces challenges. Directly pre-training LLMs for domains like medicine is resource-heavy and sometimes unfeasible. Sole reliance on Supervised Fine-tuning (SFT) can result in overconfident predictions and may not tap into domain-specific insights. Addressing these challenges, we present a multi-stage training method combining Domain-specific Continued Pre-training (DCPT), SFT, and Direct Preference Optimization (DPO). A notable contribution of our study is the introduction of a 3Gb Chinese Medicine (ChiMed) dataset, encompassing medical question answering, plain texts, knowledge graphs, and dialogues, segmented into three training stages. The medical LLM trained with our pipeline, Qilin-Med, exhibits significant performance boosts. In the CPT and SFT phases, it achieves 38.4% and 40.0% accuracy on the CMExam, surpassing Baichuan-7B's 33.5%. In the DPO phase, on the Huatuo-26M test set, it scores 16.66 in BLEU-1 and 27.44 in ROUGE-1, outperforming the SFT's 12.69 and 24.21. This highlights the strength of our training approach in refining LLMs for medical applications.
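For readers unfamiliar with the third stage, the per-pair DPO objective can be sketched as follows. This is the generic published form of the loss, not Qilin-Med's training code. The function name, argument names, and default `beta` are assumptions, and each log-probability is taken to be the summed token log-likelihood of a whole response.

```python
import math


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are log-probabilities of the chosen and rejected responses under
    the policy being trained; ref_logp_* are the same quantities under the
    frozen reference model (typically the SFT checkpoint).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), evaluated stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy matches the reference, the margin is zero and the loss equals log 2. The loss falls below that value as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.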

dataset, language model, qilin-med, (17 more...)

arXiv.org Artificial Intelligence

2310.09089

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.47)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining

Lin, Yuanguo, Chen, Hong, Xia, Wei, Lin, Fan, Wu, Pengcheng, Wang, Zongyue, Liu, Yong

arXiv.org Artificial Intelligence, Oct-13-2023

Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios: knowledge tracing, undesirable student detection, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. Finally, we point out emerging trends and future directions in this research area.

knowledge, recommendation, student, (14 more...)

arXiv.org Artificial Intelligence

2309.04761

Country:

North America > United States > Pennsylvania (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Fujian Province > Xiamen (0.04)
(5 more...)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.93)
Instructional Material > Online (0.68)
(2 more...)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)
Education > Assessment & Standards (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation

Lin, Jianghao, Shan, Rong, Zhu, Chenxu, Du, Kounianhua, Chen, Bo, Quan, Shigang, Tang, Ruiming, Yu, Yong, Zhang, Weinan

arXiv.org Artificial Intelligence, Oct-13-2023

With large language models (LLMs) achieving remarkable breakthroughs in natural language processing (NLP) domains, LLM-enhanced recommender systems have received much attention and are being actively explored. In this paper, we focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks. First and foremost, we identify and formulate the lifelong sequential behavior incomprehension problem for LLMs in recommendation domains, i.e., LLMs fail to extract useful information from a textual context of a long user behavior sequence, even if the length of the context is far from reaching the context limitation of LLMs. To address this issue and improve the recommendation performance of LLMs, we propose a novel framework, namely Retrieval-enhanced Large Language models (ReLLa), for recommendation tasks in both zero-shot and few-shot settings. For zero-shot recommendation, we perform semantic user behavior retrieval (SUBR) to improve the data quality of testing samples, which greatly reduces the difficulty for LLMs to extract the essential knowledge from user behavior sequences. As for few-shot recommendation, we further design retrieval-enhanced instruction tuning (ReiT) by adopting SUBR as a data augmentation technique for training samples. Specifically, we develop a mixed training dataset consisting of both the original data samples and their retrieval-enhanced counterparts. We conduct extensive experiments on three real-world public datasets to demonstrate the superiority of ReLLa compared with existing baseline models, as well as its capability for lifelong sequential behavior comprehension. Notably, with less than 10% of the training samples, few-shot ReLLa can outperform traditional CTR models trained on the entire training set (e.g., DCNv2, DIN, SIM).
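The retrieval step can be pictured as a top-k similarity search over a user's history. The sketch below is a minimal illustration, assuming each past behavior and the target item already have embedding vectors and that cosine similarity is the relevance score. The function names and the choice of cosine similarity are assumptions, not details taken from the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(history_emb, target_emb, k=3):
    """Indices of the k history items most similar to the target item.

    Results are ordered from most to least similar; ties keep their
    original (chronological) order because Python's sort is stable.
    """
    ranked = sorted(
        range(len(history_emb)),
        key=lambda i: cosine(history_emb[i], target_emb),
        reverse=True,
    )
    return ranked[:k]
```

The prompt for the LLM would then be built from the retrieved behaviors rather than from the most recent ones, which shortens the context while keeping the items most relevant to the target.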

behavior sequence, dataset, rella, (11 more...)

arXiv.org Artificial Intelligence

2308.11131

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Instructional Material (1.00)

Industry:

Media > Film (0.69)
Leisure & Entertainment (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Trustworthy Machine Learning

Mucsányi, Bálint, Kirchhof, Michael, Nguyen, Elisa, Rubinstein, Alexander, Oh, Seong Joon

arXiv.org Artificial Intelligence, Oct-12-2023

As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of Tübingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.

large language model, machine learning, pattern recognition, (30 more...)

arXiv.org Artificial Intelligence

2310.08215

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.13)
North America > United States > New Mexico > Lea County (0.13)
North America > United States > Louisiana (0.13)
(2 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(3 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(12 more...)
