AITopics | mega

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MEGA: Second-Order Gradient Alignment for Catastrophic Forgetting Mitigation in GFSCIL

Pang, Jinhui, Lin, Changqing, Lin, Hao, Zhang, Zhihui, Ding, Weiping, Liu, Yu, Hao, Xiaoshuai

arXiv.org Artificial Intelligence, Aug-21-2025

Graph Few-Shot Class-Incremental Learning (GFSCIL) enables models to continually learn from limited samples of novel tasks after initial training on a large base dataset. Existing GFSCIL approaches typically utilize Prototypical Networks (PNs) for metric-based class representations and fine-tune the model during the incremental learning stage. However, these PN-based methods oversimplify learning via novel query set fine-tuning and fail to integrate Graph Continual Learning (GCL) techniques due to architectural constraints. To address these challenges, we propose a more rigorous and practical setting for GFSCIL that excludes query sets during the incremental training phase. Building on this foundation, we introduce Model-Agnostic Meta Graph Continual Learning (MEGA), aimed at effectively alleviating catastrophic forgetting for GFSCIL. Specifically, by calculating the incremental second-order gradient during the meta-training stage, we enable the model to learn high-quality priors that enhance incremental learning by aligning its behaviors across both the meta-training and incremental learning stages. Extensive experiments on four mainstream graph datasets demonstrate that MEGA achieves state-of-the-art results and enhances the effectiveness of various GCL methods in GFSCIL. We believe that our proposed MEGA serves as a model-agnostic GFSCIL paradigm, paving the way for future research.
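
The abstract does not spell out how the second-order gradient is computed, so the following is only a toy sketch of the generic MAML-style idea it alludes to, not the MEGA implementation. It assumes a single scalar parameter, hypothetical quadratic support and query losses, and one inner adaptation step. The names `support_loss_grad`, `query_loss_grad`, and `meta_gradients` are invented for illustration.

```python
# Toy 1-D illustration of a second-order meta-gradient (generic MAML-style).
# Assumption: quadratic losses and a single inner step; not the paper's code.

def support_loss_grad(theta, a):
    """Gradient of the hypothetical support loss L_s(theta) = (theta - a)**2."""
    return 2.0 * (theta - a)


def query_loss_grad(theta, b):
    """Gradient of the hypothetical query loss L_q(theta) = (theta - b)**2."""
    return 2.0 * (theta - b)


def meta_gradients(theta, a, b, alpha):
    """Return (first_order, second_order) gradients of L_q(theta') w.r.t. theta,
    where theta' = theta - alpha * dL_s/dtheta is one inner adaptation step."""
    theta_adapted = theta - alpha * support_loss_grad(theta, a)
    outer = query_loss_grad(theta_adapted, b)
    # Chain rule: d(theta')/d(theta) = 1 - alpha * d2L_s/dtheta2.
    # For the quadratic support loss the second derivative is the constant 2.
    jacobian = 1.0 - alpha * 2.0
    # The first-order approximation drops the Jacobian factor entirely.
    return outer, outer * jacobian


first_order, second_order = meta_gradients(theta=0.0, a=1.0, b=2.0, alpha=0.1)
```

In this toy case the second-order gradient differs from the first-order one by the factor `1 - 2 * alpha`, which is the term that ties the meta-training update to how the parameter will actually move during later adaptation.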

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2504.13691

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Macao (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Promising Solution (0.46)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning

Adewumi, Tosin, Liwicki, Foteini Simistira, Liwicki, Marcus, Gardelli, Viktor, Alkhaled, Lama, Mokayed, Hamam

arXiv.org Artificial Intelligence, Jul-17-2025

This paper presents an intervention study on the effects of the combined methods of (1) the Socratic method, (2) Chain of Thought (CoT) reasoning, (3) simplified gamification and (4) formative feedback on university students' Maths learning driven by large language models (LLMs). We call our approach Mathematics Explanations through Games by AI LLMs (MEGA). Some students struggle with Maths and as a result avoid Maths-related disciplines or subjects despite the importance of Maths across many fields, including signal processing. Oftentimes, students' Maths difficulties stem from suboptimal pedagogy. We compared the MEGA method to the traditional step-by-step (CoT) method to ascertain which is better by using a within-group design after randomly assigning questions for the participants, who are university students. Samples (n=60) were randomly drawn from each of the two test sets of the Grade School Math 8K (GSM8K) and Mathematics Aptitude Test of Heuristics (MATH) datasets, based on the error margin of 11%, the confidence level of 90%, and a manageable number of samples for the student evaluators. These samples were used to evaluate two capable LLMs at length (Generative Pretrained Transformer 4o (GPT4o) and Claude 3.5 Sonnet) out of the initial six that were tested for capability. The results showed that students agree in more instances that the MEGA method is experienced as better for learning for both datasets. It is even much better than the CoT (47.5% compared to 26.67%) in the more difficult MATH dataset, indicating that MEGA is better at explaining difficult Maths problems.
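
The abstract states the sample size, error margin, and confidence level but not the formula that relates them. As a rough sanity check, the sketch below assumes the standard normal approximation for a proportion, with worst-case p = 0.5 and z ≈ 1.645 for 90% confidence. Both function names and these defaults are assumptions, not details from the paper.

```python
import math


def margin_of_error(n, z=1.645, p=0.5):
    """Margin of error for a proportion under the normal approximation.
    Assumption: z = 1.645 (about 90% confidence) and worst-case p = 0.5."""
    return z * math.sqrt(p * (1.0 - p) / n)


def required_sample_size(margin, z=1.645, p=0.5):
    """Smallest n whose margin of error does not exceed `margin`."""
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


moe = margin_of_error(60)              # margin implied by n = 60
n_needed = required_sample_size(0.11)  # n implied by an 11% margin
```

Under these assumptions, 60 samples give a margin slightly below 11%, which is consistent with the figures quoted in the abstract.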

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2507.12079

Country:

Europe > Sweden > Norrbotten County > Luleå (0.05)
North America > United States > Maryland (0.04)
Europe > Switzerland > Fribourg > Fribourg (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education > Educational Setting > Higher Education (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Memorization and Knowledge Injection in Gated LLMs

Pan, Xu, Hahami, Ely, Zhang, Zechen, Sompolinsky, Haim

arXiv.org Artificial Intelligence, May-1-2025

Large Language Models (LLMs) currently struggle to sequentially add new memories and integrate new knowledge. These limitations contrast with the human ability to continuously learn from new experiences and acquire knowledge throughout life. Most existing approaches add memories either through large context windows or external memory buffers (e.g., Retrieval-Augmented Generation), and studies on knowledge injection rarely test scenarios resembling everyday life events. In this work, we introduce a continual learning framework, Memory Embedded in Gated LLMs (MEGa), which injects event memories directly into the weights of LLMs. Each memory is stored in a dedicated set of gated low-rank weights. During inference, a gating mechanism activates relevant memory weights by matching query embeddings to stored memory embeddings. This enables the model to both recall entire memories and answer related questions. On two datasets - fictional characters and Wikipedia events - MEGa outperforms baseline approaches in mitigating catastrophic forgetting. Our model draws inspiration from the complementary memory system of the human brain.
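
The gating idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MEGa implementation: the abstract only says that gates come from matching query embeddings to stored memory embeddings, so the cosine similarity, the softmax with a temperature, the dictionary layout (`key`, `A`, `B`), and all function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature=0.1):
    """Numerically stable softmax; the temperature value is an assumption."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def gated_forward(W, memories, query_embedding, x):
    """Compute y = W x + sum_i g_i * B_i (A_i x), where the gates g_i come
    from the similarity between the query and each stored memory key."""
    gates = softmax([cosine(query_embedding, m["key"]) for m in memories])
    y = matvec(W, x)
    for g, m in zip(gates, memories):
        delta = matvec(m["B"], matvec(m["A"], x))  # low-rank update B (A x)
        y = [yi + g * di for yi, di in zip(y, delta)]
    return y, gates


# Two hypothetical memories, each with its own rank-1 weight update.
W = [[1.0, 0.0], [0.0, 1.0]]
memories = [
    {"key": [1.0, 0.0], "A": [[1.0, 0.0]], "B": [[0.5], [0.0]]},
    {"key": [0.0, 1.0], "A": [[0.0, 1.0]], "B": [[0.0], [0.5]]},
]
y, gates = gated_forward(W, memories, query_embedding=[0.9, 0.1], x=[1.0, 1.0])
```

Because the query embedding is close to the first key, almost all of the gate mass goes to the first memory, so only its low-rank update noticeably changes the output.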

arxiv preprint arxiv, large language model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2504.21239

Country:

Europe > United Kingdom > England > South Yorkshire (0.05)
South America > Peru (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(12 more...)

Genre: Research Report (0.82)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)
Education > Educational Setting > K-12 Education (0.68)
Leisure & Entertainment > Sports > Basketball (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OMEGA: A Low-Latency GNN Serving System for Large Graphs

Kim, Geon-Woo, Kim, Donghyun, Moon, Jeongyoon, Liu, Henry, Khan, Tarannum, Iyer, Anand, Kim, Daehyeok, Akella, Aditya

arXiv.org Artificial Intelligence, Jan-14-2025

Graph Neural Networks (GNNs) have been widely adopted for their ability to compute expressive node representations in graph datasets. However, serving GNNs on large graphs is challenging due to the high communication, computation, and memory overheads of constructing and executing computation graphs, which represent information flow across large neighborhoods. Existing approximation techniques in training can mitigate the overheads but, in serving, still lead to high latency and/or accuracy loss. To this end, we propose OMEGA, a system that enables low-latency GNN serving for large graphs with minimal accuracy loss through two key ideas. First, OMEGA employs selective recomputation of precomputed embeddings, which allows for reusing precomputed computation subgraphs while selectively recomputing a small fraction to minimize accuracy loss. Second, we develop computation graph parallelism, which reduces communication overhead by parallelizing the creation and execution of computation graphs across machines. Our evaluation with large graph datasets and GNN models shows that OMEGA significantly outperforms state-of-the-art techniques.

data mining, machine learning, node, (22 more...)

arXiv.org Artificial Intelligence

2501.08547

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Washington > King County > Renton (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Improved Models for Media Bias Detection and Subcategorization

Menzner, Tim, Leidner, Jochen L.

arXiv.org Artificial Intelligence, Dec-16-2024

We present improved models for the granular detection and sub-classification of news media bias in English news articles. We compare the performance of zero-shot versus fine-tuned large pre-trained neural transformer language models, explore how the level of detail of the classes affects performance on a novel taxonomy of 27 news bias-types, and demonstrate how synthetically generated example data can be used to improve quality.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.11835

Country:

North America > United States (0.28)
Europe > United Kingdom (0.14)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Media > News (1.00)
Government (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

MeGA: Merging Multiple Independently Trained Neural Networks Based on Genetic Algorithm

Yun, Daniel

arXiv.org Artificial Intelligence, Jun-27-2024

In this paper, we introduce MeGA, a novel method that uses a genetic algorithm to merge the weights of multiple pre-trained neural networks. Traditional techniques, such as weight averaging and ensemble methods, often fail to fully harness the capabilities of pre-trained networks. Our approach leverages a genetic algorithm with tournament selection, crossover, and mutation to optimize weight combinations, creating a more effective fusion. This technique allows the merged model to inherit advantageous features from both parent models, resulting in enhanced accuracy and robustness. Through experiments on the CIFAR-10 dataset, we demonstrate that our genetic algorithm-based weight merging method improves test accuracy compared to individual models and conventional methods. This approach provides a scalable solution for integrating multiple pre-trained networks across various deep learning applications. The code is available on GitHub at: https://github.com/YUNBLAK/MeGA-Merging-Multiple-Independently-Trained-Neural-Networks-Based-on-Genetic-Algorithm
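
The three operators named in the abstract (tournament selection, crossover, mutation) can be sketched on flat weight vectors as follows. This is a toy sketch, not the code from the linked repository. The `fitness` function is a hypothetical stand-in for validation accuracy, and the uniform crossover, Gaussian mutation, elitism step, and all parameter values are assumptions made for illustration.

```python
import random


def fitness(weights, target):
    """Hypothetical stand-in for validation accuracy: higher is better."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def tournament(population, scores, rng, k=3):
    """Pick the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each weight is taken from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(weights, rng, rate=0.1, scale=0.05):
    """Add small Gaussian noise to a random subset of the weights."""
    return [w + rng.gauss(0.0, scale) if rng.random() < rate else w
            for w in weights]


def merge(parent_a, parent_b, target, generations=30, pop_size=20, seed=0):
    """Evolve merged weight vectors, starting from mixes of the two parents."""
    rng = random.Random(seed)
    population = [crossover(parent_a, parent_b, rng)
                  for _ in range(pop_size - 2)]
    population += [list(parent_a), list(parent_b)]
    for _ in range(generations):
        scores = [fitness(ind, target) for ind in population]
        best = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [list(best)]  # elitism (an assumption): keep the best as-is
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=lambda ind: fitness(ind, target))


parent_a = [1.0, 0.0, 1.0, 0.0]
parent_b = [0.0, 1.0, 0.0, 1.0]
target = [1.0, 1.0, 0.0, 0.0]  # hypothetical optimum, used only by fitness()
merged = merge(parent_a, parent_b, target)
```

With elitism and both parents in the initial population, the merged vector can never score worse than either parent under this toy fitness. In a real setting, the fitness call would evaluate the merged network on held-out data instead.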

algorithm, genetic algorithm, neural network, (14 more...)

arXiv.org Artificial Intelligence

2406.04607

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report > Promising Solution (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Encouraging Responsible Use of Generative AI in Education: A Reward-Based Learning Approach

Singh, Aditi, Ehtesham, Abul, Kumar, Saket, Gupta, Gaurav Kumar, Khoei, Tala Talaei

arXiv.org Artificial Intelligence, Jun-26-2024

This research introduces an innovative mathematical learning approach that integrates generative AI to cultivate structured learning rather than quick solutions. Our method combines chatbot capabilities and generative AI to offer interactive problem-solving exercises, enhancing learning through a step-by-step approach for varied problems, advocating for the responsible use of AI in education. Our approach emphasizes that immediate answers from ChatGPT can impede real learning. We introduce a reward-based system that requires students to solve mathematical problems effectively to receive the final answer. This encourages a progressive learning path from basic to complex problems, rewarding mastery with final solutions. The goal is to transition students from seeking quick fixes to engaging actively in a comprehensive learning experience.

chatgpt, mega, student, (16 more...)

arXiv.org Artificial Intelligence

2407.15022

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Virginia > Loudoun County > Sterling (0.04)
North America > United States > Mississippi > Harrison County > Biloxi (0.04)
(4 more...)

Genre:

Instructional Material (0.69)
Research Report (0.50)

Industry:

Education > Curriculum > Subject-Specific Education (0.69)
Education > Educational Setting > K-12 Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.93)

On Analyzing the Role of Image for Visual-enhanced Relation Extraction

Li, Lei, Chen, Xiang, Qiao, Shuofei, Xiong, Feiyu, Chen, Huajun, Zhang, Ningyu

arXiv.org Artificial Intelligence, Nov-14-2022

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we conduct an in-depth empirical analysis indicating that inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observations, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Code is available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

artificial intelligence, information, natural language, (14 more...)

arXiv.org Artificial Intelligence

2211.07504

Country:

Asia > China > Zhejiang Province > Ningbo (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.05)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

MEGA: Model Stealing via Collaborative Generator-Substitute Networks

Hong, Chi, Huang, Jiyue, Chen, Lydia Y.

arXiv.org Artificial Intelligence, Jan-31-2022

Deep machine learning models are increasingly deployed in the wild for providing services to users. Adversaries may steal the knowledge of these valuable models by training substitute models according to the inference results of the targeted deployed models. Recent data-free model stealing methods are shown effective to extract the knowledge of the target model without using real query examples, but they assume rich inference information, e.g., class probabilities and logits. However, they are all based on competing generator-substitute networks and hence encounter training instability. In this paper we propose a data-free model stealing framework, MEGA, which is based on collaborative generator-substitute networks and only requires the target model to provide label prediction for synthetic query examples. The core of our method is a model stealing optimization consisting of two collaborative models: (i) the substitute model which imitates the target model through the synthetic query examples and their inferred labels and (ii) the generator which synthesizes images such that the confidence of the substitute model over each query example is maximized. We propose a novel coordinate descent training procedure and analyze its convergence. We also empirically evaluate the trained substitute model on three datasets and its application on black-box adversarial attacks. Our results show that the accuracy of our trained substitute model and the adversarial attack success rate over it can be up to 33% and 40% higher than state-of-the-art data-free black-box attacks.

scenario, substitute model, target model, (16 more...)

arXiv.org Artificial Intelligence

2202.00008

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Austria (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report > New Finding (0.54)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Bootstrapping Informative Graph Augmentation via A Meta Learning Approach

Gao, Hang, Li, Jiangmeng, Qiang, Wenwen, Si, Lingyu, Zheng, Changwen, Sun, Fuchun

arXiv.org Artificial Intelligence, Jan-11-2022

Recent works explore learning graph representations in a self-supervised manner. In graph contrastive learning, benchmark methods apply various graph augmentation approaches. However, most of the augmentation methods are non-learnable, which can produce unbeneficial augmented graphs. Such augmentation may degrade the representation ability of graph contrastive learning methods. Therefore, we motivate our method to generate augmented graphs with a learnable graph augmenter, called MEta Graph Augmentation (MEGA). We then clarify that a "good" graph augmentation must have uniformity at the instance-level and informativeness at the feature-level. To this end, we propose a novel approach to learning a graph augmenter that can generate an augmentation with uniformity and informativeness. The objective of the graph augmenter is to promote our feature extraction network to learn a more discriminative feature representation, which motivates us to propose a meta-learning paradigm. Empirically, the experiments across multiple benchmark datasets demonstrate that MEGA outperforms the state-of-the-art methods in graph self-supervised learning tasks. Further experimental studies prove the effectiveness of different terms of MEGA.

augmented graph, graph, representation, (14 more...)

arXiv.org Artificial Intelligence

2201.03812

Country:

Oceania > Australia (0.04)
Asia (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
