AITopics

2407.12818

Country:

North America > United States > Texas (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.66)

Industry:

Education > Educational Setting (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Torresan, Filippo, Baltieri, Manuel

Disentangled Representations for Causal Cognition

arXiv.org Artificial Intelligence Jun-30-2024

Complex adaptive agents consistently achieve their goals by solving problems that seem to require an understanding of causal information, that is, information pertaining to the causal relationships that exist among elements of combined agent-environment systems. Causal cognition studies and describes the main characteristics of causal learning and reasoning in human and non-human animals, offering a conceptual framework to discuss cognitive performances based on the level of apparent causal understanding of a task. Despite the use of formal intervention-based models of causality, including causal Bayesian networks, psychological and behavioural research on causal cognition does not yet offer a computational account that operationalises how agents acquire a causal understanding of the world. Machine and reinforcement learning research on causality, especially involving disentanglement as a candidate process to build causal representations, represents on the other hand a concrete attempt at designing causal artificial agents that can shed light on the inner workings of natural causal cognition. In this work, we connect these two areas of research to build a unifying framework for causal cognition that will offer a computational perspective on studies of animal cognition, and provide insights into the development of new algorithms for causal reinforcement learning in AI.

causal information, cit, learning

2407.00744

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.27)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report > Experimental Study (0.45)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

arXiv.org Artificial Intelligence Jun-29-2024

Generative AI Agents with Large Language Model for Satellite Networks via a Mixture of Experts Transmission

Zhang, Ruichen, Du, Hongyang, Liu, Yinqiu, Niyato, Dusit, Kang, Jiawen, Xiong, Zehui, Jamalipour, Abbas, Kim, Dong In

In response to the needs of 6G global communications, satellite communication networks have emerged as a key solution. However, the large-scale development of satellite communication networks is constrained by complex system models, which are difficult to formulate for massive numbers of users. Moreover, transmission interference between satellites and users seriously affects communication performance. To solve these problems, this paper develops generative artificial intelligence (AI) agents for model formulation and then applies a mixture of experts (MoE) approach to design transmission strategies. Specifically, we leverage large language models (LLMs) to build an interactive modeling paradigm and utilize retrieval-augmented generation (RAG) to extract satellite expert knowledge that supports mathematical modeling. Afterward, by integrating the expertise of multiple specialized components, we propose an MoE-proximal policy optimization (PPO) approach to solve the formulated problem. Each expert optimizes the variables at which it excels through specialized training of its own network, and the gating network then aggregates the experts' outputs to perform joint optimization. The simulation results validate the accuracy and effectiveness of employing a generative agent for problem formulation. Furthermore, the superiority of the proposed MoE-PPO approach over other benchmarks is confirmed in solving the formulated problem. The adaptability of MoE-PPO to various customized modeling problems has also been demonstrated.

generative ai agent, modeling, satellite communication

2404.09134

Country:

Asia > Singapore (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)
Asia > China (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry: Telecommunications (0.72)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.87)

Termehchi, Atefeh, Hossain, Ekram, Woungang, Isaac

Science-Informed Deep Learning (ScIDL) With Applications to Wireless Communications

Given the extensive and growing capabilities offered by deep learning (DL), more researchers are turning to DL to address complex challenges in next-generation (xG) communications. However, despite its progress, DL also reveals several limitations that are becoming increasingly evident. One significant issue is its lack of interpretability, which is especially critical for safety-sensitive applications. Another significant consideration is that DL may not comply with the constraints set by physical laws or given security standards, which are essential for reliable DL. Additionally, DL models often struggle outside their training data distributions, a problem known as poor generalization. Moreover, there is a scarcity of theoretical guidance on designing DL algorithms. These challenges have prompted the emergence of a burgeoning field known as science-informed DL (ScIDL). ScIDL aims to integrate existing scientific knowledge with DL techniques to develop more powerful algorithms. The core objective of this article is to provide a brief tutorial on ScIDL that illustrates its building blocks and distinguishes it from conventional DL. Furthermore, we discuss both recent applications of ScIDL and potential future research directions in the field of wireless communications.

communication, knowledge, scientific knowledge

2407.07742

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Manitoba > Winnipeg Metropolitan Region > Winnipeg (0.04)

Genre:

Research Report (0.64)
Overview (0.46)
Instructional Material > Course Syllabus & Notes (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

YuLan: An Open-source Large Language Model

Zhu, Yutao, Zhou, Kun, Mao, Kelong, Chen, Wentong, Sun, Yiding, Chen, Zhipeng, Cao, Qian, Wu, Yihan, Chen, Yushuo, Wang, Feng, Zhang, Lei, Li, Junyi, Wang, Xiaolei, Wang, Lei, Zhang, Beichen, Dong, Zican, Cheng, Xiaoxue, Chen, Yuhan, Tang, Xinyu, Hou, Yupeng, Ren, Qiangqiang, Pang, Xincheng, Xie, Shufang, Zhao, Wayne Xin, Dou, Zhicheng, Mao, Jiaxin, Lin, Yankai, Song, Ruihua, Xu, Jun, Chen, Xu, Yan, Rui, Wei, Zhewei, Hu, Di, Huang, Wenbing, Gao, Ze-Feng, Chen, Yueguo, Lu, Weizheng, Wen, Ji-Rong

Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billion parameters. The base model of YuLan is pre-trained on approximately 1.7T tokens derived from a diverse corpus, including massive English, Chinese, and multilingual texts. We design a three-stage pre-training method to enhance YuLan's overall capabilities. Subsequent phases of training incorporate instruction-tuning and human alignment, employing a substantial volume of high-quality synthesized data. To facilitate the learning of complex and long-tail knowledge, we devise a curriculum-learning framework across these stages, which helps LLMs learn knowledge in an easy-to-hard manner. YuLan's training was completed in January 2024, and the model has achieved performance on par with state-of-the-art LLMs across various English and Chinese benchmarks. This paper outlines a comprehensive technical roadmap for developing LLMs from scratch. Our model and code are available at https://github.com/RUC-GSAI/YuLan-Chat.

dataset, instruction, knowledge

2406.19853

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre:

Research Report (0.64)
Instructional Material (0.45)

Industry:

Education > Educational Setting > K-12 Education (0.46)
Education > Curriculum (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The impact of model size on catastrophic forgetting in Online Continual Learning

Lee, Eunhae

This study investigates the impact of model size on Online Continual Learning performance, with a focus on catastrophic forgetting. Employing ResNet architectures of varying sizes, the research examines how network depth and width affect model performance in class-incremental learning using the SplitCIFAR-10 dataset. Key findings reveal that larger models do not guarantee better Continual Learning performance; in fact, they often struggle more in adapting to new tasks, particularly in online settings. These results challenge the notion that larger models inherently mitigate catastrophic forgetting, highlighting the nuanced relationship between model size and Continual Learning efficacy. This study contributes to a deeper understanding of model scalability and its practical implications in Continual Learning scenarios.

continual learning, learning, online continual learning

2407.00176

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.64)

Industry: Education (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Bringing Generative AI to Adaptive Learning in Education

Li, Hang, Xu, Tianlong, Zhang, Chaoli, Chen, Eason, Liang, Jing, Fan, Xing, Li, Haoyang, Tang, Jiliang, Wen, Qingsong

The recent surge in generative AI technologies, such as large language models and diffusion models, has boosted the development of AI applications in various domains, including science, finance, and education. Concurrently, adaptive learning, a concept that has gained substantial interest in the educational sphere, has proven its efficacy in enhancing students' learning efficiency. In this position paper, we aim to shed light on studies at the intersection of these two areas, which combine generative AI with adaptive learning concepts. By discussing the benefits, challenges, and potential of this field, we argue that this union will contribute significantly to the development of the next-stage learning format in education.

adaptive learning, genai, learning

2402.14601

Country:

North America > United States > Michigan (0.04)
Europe > Poland (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Africa > Middle East > Egypt (0.04)

Genre:

Research Report (1.00)
Instructional Material (1.00)
Overview (0.93)

Industry:

Education > Educational Setting (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

arXiv.org Artificial Intelligence Jun-27-2024

FernUni LLM Experimental Infrastructure (FLEXI) -- Enabling Experimentation and Innovation in Higher Education Through Access to Open Large Language Models

Zesch, Torsten, Hanses, Michael, Seidel, Niels, Aggarwal, Piush, Veiel, Dirk, de Witt, Claudia

Using the full potential of LLMs in higher education is hindered by challenges with access to LLMs. The two main access modes currently discussed are paying for a cloud-based LLM or providing a locally maintained open LLM. In this paper, we describe the current state of establishing an open LLM infrastructure at FernUniversität in Hagen under the project name FLEXI (FernUni LLM Experimental Infrastructure). FLEXI enables experimentation within teaching and research with the goal of generating much-needed evidence for (or against) the use of locally maintained open LLMs in higher education. The paper provides some practical guidance for anyone trying to decide whether to run their own LLM server.

llm, server, university

2407.13013

Country:

Europe > Germany (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)

Genre: Instructional Material > Training Manual (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Higher Education (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Carenini, Giuseppe, Johnson, Jordon, Salamatian, Ali

Captioning Visualizations with Large Language Models (CVLLM): A Tutorial

arXiv.org Artificial Intelligence Jun-27-2024

It is well-established that visualizations have advantages over text-based representations for a number of analysis tasks, since they more fully leverage our innate visual processing capabilities. However, it has also been found that visualizations can be well-supported by textual augmentations such as captions [1]. Further, recent advances in large language models (LLMs) have resulted in their incorporation into an unprecedented number of applications and domains. That being the case, this tutorial aims to provide: (1) an overview of captioning visualizations and key concepts in Information Visualization (InfoVis), (2) an introduction to neural networks and transformers, (3) an exploration of the limitations of LLMs and recent developments in the field, and (4) the latest research on InfoVis captioning using LLMs and Large Vision-Language Models (LVLMs). We will begin with an overview of key concepts in InfoVis and captioning visualizations, including marks, channels, and content characterization.

computational linguistic, language model, visualization

2406.19512

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
Asia > Singapore (0.05)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.05)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jun-27-2024

FedMLP: Federated Multi-Label Medical Image Classification under Task Heterogeneity

Sun, Zhaobin, Wu, Nannan, Shi, Junjie, Yu, Li, Yang, Xin, Cheng, Kwang-Ting, Yan, Zengqiang

Cross-silo federated learning (FL) enables decentralized organizations to collaboratively train models while preserving data privacy and has made significant progress in medical image classification. One common assumption is task homogeneity, where each client has access to all classes during training. However, in clinical practice, given a multi-label classification task, each institution may diagnose only a subset of categories, constrained by its level of medical knowledge and the prevalence of diseases, resulting in task heterogeneity. How to pursue effective multi-label medical image classification under task heterogeneity is under-explored. In this paper, we first formulate such a realistic label-missing setting in the multi-label FL domain and propose a two-stage method, FedMLP, to combat missing classes from two aspects: pseudo label tagging and global knowledge learning. The former utilizes a warmed-up model to generate class prototypes and select samples with high confidence to supplement missing labels, while the latter uses a global model as a teacher for consistency regularization to prevent forgetting missing class knowledge. Experiments on two publicly available medical datasets validate the superiority of FedMLP over state-of-the-art federated semi-supervised and noisy-label learning approaches under task heterogeneity. Code is available at https://github.com/szbonaldo/FedMLP.

federated learning, heterogeneity, learning

2406.18995

Country:

Europe > Belgium > Flanders (0.04)
Asia > China > Hubei Province (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.81)
Instructional Material > Course Syllabus & Notes (0.81)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.81)