AITopics

This paper presents the Korean National Educational Test Benchmark (KoNET), a new benchmark designed to evaluate Multimodal Generative AI Systems using Korean national educational tests. KoNET comprises four exams: the Korean Elementary General Educational Development Test (KoEGED), its Middle (KoMGED) and High (KoHGED) counterparts, and the College Scholastic Ability Test (KoCSAT). These exams are renowned for their rigorous standards and diverse questions, facilitating a comprehensive analysis of AI performance across different educational levels. By focusing on Korean, KoNET provides insights into model performance in less-explored languages. We assess a range of models (open-source, open-access, and closed APIs) by examining difficulties, subject diversity, and human error rates. The code and dataset builder will be fully open-sourced at https://github.com/naver-ai/KoNET.

benchmark, error rate, zhang

2502.15422

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > South Korea (0.04)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.46)
Education > Assessment & Standards > Educational Standards (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Chung, Yi-Ling, Cobo, Aurora, Serna, Pablo

Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking

Robust automatic fact-checking systems have the potential to combat online misinformation at scale. However, most existing research primarily focuses on English. In this paper, we introduce MultiSynFact, the first large-scale multilingual fact-checking dataset containing 2.2M claim-source pairs designed to support Spanish, German, English, and other low-resource languages. Our dataset generation pipeline leverages Large Language Models (LLMs), integrating external knowledge from Wikipedia and incorporating rigorous claim validation steps to ensure data quality. We evaluate the effectiveness of MultiSynFact across multiple models and experimental settings. Additionally, we open-source a user-friendly framework to facilitate further research in multilingual fact-checking and dataset generation.

computational linguistic, dataset, synthetic data

2502.15419

Country:

Europe > United Kingdom (0.04)
South America > Colombia (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report (1.00)

Industry:

Media > News (0.48)
Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Lyu, Lixing, Jiang, Jiashuo, Cheung, Wang Chi

Efficiently Solving Discounted MDPs with Predictions on Transition Matrices

We study infinite-horizon Discounted Markov Decision Processes (DMDPs) under a generative model. Motivated by the Algorithm with Advice framework (Mitzenmacher and Vassilvitskii, 2022), we propose a novel framework to investigate how a prediction on the transition matrix can enhance the sample efficiency in solving DMDPs and improve sample complexity bounds. We focus on DMDPs with $N$ state-action pairs and discount factor $\gamma$. First, we provide an impossibility result: without prior knowledge of the prediction accuracy, no sampling policy can compute an $\epsilon$-optimal policy with a sample complexity bound better than $\tilde{O}((1-\gamma)^{-3} N\epsilon^{-2})$, which matches the state-of-the-art minimax sample complexity bound with no prediction. Complementing this, we propose an algorithm based on minimax optimization techniques that leverages the prediction on the transition matrix. Our algorithm achieves a sample complexity bound depending on the prediction error, and the bound is uniformly better than $\tilde{O}((1-\gamma)^{-4} N \epsilon^{-2})$, the previous best result derived from convex optimization methods. These theoretical findings are further supported by our numerical experiments.
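To make the setting concrete, here is a minimal sketch of solving a discounted MDP through a generative model. It is not the paper's minimax algorithm and uses no prediction: it simply draws samples for each state-action pair, builds an empirical transition model, and runs value iteration on it. The two-state MDP, the function names, and the sample count are all assumptions chosen for illustration.

```python
import random

def sample_next_state(P, s, a, rng):
    """Generative model: draw one next state s' ~ P[s][a]."""
    r, cum = rng.random(), 0.0
    for s_next, p in enumerate(P[s][a]):
        cum += p
        if r < cum:
            return s_next
    return len(P[s][a]) - 1  # guard against floating-point round-off

def estimate_transitions(P, n_samples, rng):
    """Empirical transition model from n_samples draws per state-action pair."""
    n_states = len(P)
    P_hat = []
    for s in range(n_states):
        rows = []
        for a in range(len(P[s])):
            counts = [0] * n_states
            for _ in range(n_samples):
                counts[sample_next_state(P, s, a, rng)] += 1
            rows.append([c / n_samples for c in counts])
        P_hat.append(rows)
    return P_hat

def value_iteration(P, R, gamma, tol=1e-8):
    """Return (values, greedy policy) for transitions P[s][a][s'] and rewards R[s][a]."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        Q = [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
              for a in range(len(P[s]))] for s in range(n_states)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
            return V_new, policy
        V = V_new

# Hypothetical toy MDP: 2 states x 2 actions, so N = 4 state-action pairs.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 whenever the agent is in state 1
gamma = 0.9

rng = random.Random(0)
P_hat = estimate_transitions(P, n_samples=2000, rng=rng)  # 4 * 2000 samples in total
V_true, pi_true = value_iteration(P, R, gamma)
V_hat, pi_hat = value_iteration(P_hat, R, gamma)
print("policy from true model:     ", pi_true)
print("policy from empirical model:", pi_hat)
```

The total number of generative-model calls here is $N$ times the per-pair sample count, which is the quantity the sample complexity bounds above control.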

algorithm, prediction, sample complexity

2502.15345

Country:

Asia > Singapore (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Wegmann, Anna, Nguyen, Dong, Jurgens, David

Tokenization is Sensitive to Language Variation

Variation in language is ubiquitous and often systematically linked to regional, social, and contextual factors. Tokenizers split texts into smaller units and might behave differently for less common linguistic forms. This might affect downstream LLM performance differently on two types of tasks: tasks where the model should be robust to language variation (e.g., for semantic tasks like NLI, labels do not depend on whether a text uses British or American spelling) and tasks where the model should be sensitive to language variation (e.g., for form-based tasks like authorship verification, labels depend on whether a text uses British or American spelling). We pre-train BERT base models for the popular Byte-Pair Encoding algorithm to investigate how key algorithmic design choices impact downstream models' performance: fitting corpus, pre-tokenizer, and vocabulary size. We find that the best tokenizer varies between the two task types -- with the pre-tokenizer having the biggest impact on performance. Further, we introduce a new approach to estimate tokenizer impact on downstream LLM performance, showing significant improvement over techniques like Rényi efficiency. We encourage more work on language variation and its relation to tokenizers and thus LLM performance.
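The effect of the fitting corpus on less common spellings can be illustrated with a toy Byte-Pair Encoding trainer. This is a simplified sketch, not the tokenizers studied in the paper: the corpus, the whitespace pre-tokenizer, the merge budget, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def train_bpe(words, num_merges):
    """Learn an ordered list of merges from pre-tokenized words."""
    vocab = Counter(tuple(w) for w in words)  # word (as symbols) -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = Counter({tuple(merge_pair(s, best)): f for s, f in vocab.items()})
    return merges

def encode(word, merges):
    """Tokenize one word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

# Hypothetical fitting corpus dominated by American spellings.
corpus = "color " * 10 + "flavor " * 10 + "colour"
words = corpus.split()  # whitespace pre-tokenizer (an assumption)
merges = train_bpe(words, num_merges=8)  # the merge budget stands in for vocabulary size

print(encode("color", merges))
print(encode("colour", merges))
```

With this corpus and merge budget, the frequent spelling ends up as a single token while the rare spelling stays split into several pieces. Changing the corpus mix, the pre-tokenizer, or the number of merges changes that outcome, which is the kind of design-choice sensitivity the abstract describes.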

computational linguistic, language variation, tokenizer

2502.15343

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report > Experimental Study (0.47)

Industry:

Media > News (0.46)
Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zaree, Pedram, Mamun, Md Abdullah Al, Alam, Quazi Mishkatul, Dong, Yue, Alouani, Ihsen, Abu-Ghazaleh, Nael

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment

Recent research has shown that carefully crafted jailbreak inputs can induce large language models to produce harmful outputs, despite safety measures such as alignment. It is important to anticipate the range of potential jailbreak attacks to guide effective defenses and accurate assessment of model safety. In this paper, we present a new approach for generating highly effective jailbreak attacks that manipulate the attention of the model to selectively strengthen or weaken attention among different parts of the prompt. By harnessing attention loss, we develop more effective jailbreak attacks that are also transferable. The attacks amplify the success rate of existing jailbreak algorithms including GCG, AutoDAN, and ReNeLLM, while lowering their generation cost (for example, the amplified GCG attack achieves 91.2% ASR, vs. 67.9% for the original attack on Llama2-7B/AdvBench, using less than a third of the generation time).

arxiv preprint arxiv, jailbreak attack, jailbreak prompt

2502.15334

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Riverside County > Riverside (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation (0.69)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Road Traffic Sign Recognition method using Siamese network Combining Efficient-CNN based Encoder

Xi, Zhenghao, Shao, Yuchao, Zheng, Yang, Liu, Xiang, Liu, Yaqi, Cai, Yitong

IEEE Transactions on Intelligent Transportation Systems. Road Traffic Sign Recognition Method Using Siamese Network Combining Efficient-CNN-Based Encoder. Zhenghao Xi, Member, IEEE, Yuchao Shao, Yang Zheng, Member, IEEE, Xiang Liu, Member, IEEE, Yaqi Liu, and Yitong Cai. Abstract: Traffic sign recognition (TSR) plays an essential role in assisted driving and intelligent transportation systems. However, noise in complex environments may lead to motion blur or occlusion, which makes real-time recognition with high accuracy and robustness challenging. In this article, we propose IECES-network, which combines improved encoders with a Siamese net. Our three-stage approach comprises Efficient-CNN-based encoders, a Siamese backbone, and fully connected layers. We first use convolutional encoders to extract and encode the traffic sign features of augmented training samples and standard images. Then, we design the Siamese neural network with an Efficient-CNN-based encoder and a contrastive loss function, which can be trained to improve the robustness of TSR on motion-blurred and occluded samples by computing the distance between inputs and templates. Additionally, the template branch of the proposed network can be disabled when executing recognition tasks after training, which raises the processing speed of our real-time model and reduces its computational cost and parameter scale. Finally, we recombine the feature code with a fully connected layer and a SoftMax function to classify the codes of samples and recognize the category of traffic signs. Experiments on the Tsinghua-Tencent 100K dataset and the German Traffic Sign Recognition Benchmark dataset demonstrate the performance of the proposed IECES-network. Compared with other state-of-the-art methods under motion blur and occlusion, the proposed method achieves competitive performance: its average precision, recall, and accuracy are 88.1%, 86.43%, and 86.1%, respectively, at a lightweight scale of 2.9M. Moreover, the processing time of our model is 0.1 s per frame, a speed increase of 1.5 times over existing methods. Index Terms: traffic sign recognition, Siamese network, Efficient-CNN-based encoder. Received 11 September 2024; revised 25 November 2024; accepted 9 January 2025.
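The contrastive loss and the distance-to-template comparison mentioned in the abstract can be sketched as follows. This uses the common margin-based formulation of contrastive loss, which is an assumption: the abstract does not state the exact variant, and the embeddings, margin, and template labels below are made up for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same, margin=1.0):
    """Margin-based contrastive loss for one pair of embeddings.

    Pairs of the same class are pulled together (loss grows with distance);
    pairs of different classes are pushed apart until they are at least
    `margin` away from each other.
    """
    d = euclidean(u, v)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def nearest_template(embedding, templates):
    """Label of the template embedding closest to the input embedding."""
    return min(templates, key=lambda label: euclidean(embedding, templates[label]))

# Hypothetical 2-D embeddings of standard (template) sign images.
templates = {"stop": [0.0, 0.0], "yield": [3.0, 4.0]}
blurred_input = [0.2, 0.1]  # stand-in for the encoded feature of a blurred sample

print(contrastive_loss(blurred_input, templates["stop"], same=True))
print(contrastive_loss(blurred_input, templates["yield"], same=False))
print(nearest_template(blurred_input, templates))
```

Note that the paper classifies with a fully connected layer and SoftMax rather than by nearest template; the lookup here only illustrates what "distance between inputs and templates" means.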

category, recognition, traffic sign

2502.15307

Country:

Asia > China > Shanghai > Shanghai (0.06)
Asia > China > Beijing > Beijing (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.46)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

Kumon, Ryoma, Yanaka, Hitomi

Analyzing the Inner Workings of Transformers in Compositional Generalization

The compositional generalization abilities of neural models have been sought after for human-like linguistic competence. The popular method to evaluate such abilities is to assess the models' input-output behavior. However, that does not reveal the internal mechanisms, and the underlying competence of such models in compositional generalization remains unclear. To address this problem, we explore the inner workings of a Transformer model by finding an existing subnetwork that contributes to the generalization performance and by performing causal analyses on how the model utilizes syntactic features. We find that the model depends on syntactic features to output the correct answer, but that the subnetwork with much better generalization performance than the whole model relies on a non-compositional algorithm in addition to the syntactic features. We also show that the subnetwork improves its generalization performance relatively slowly during the training compared to the in-distribution one, and the non-compositional solution is acquired in the early stages of the training.

computational linguistic, generalization, subnetwork

2502.15277

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Yuan, Tingting, Chung, Hwei-Ming, Fu, Xiaoming

PP-MARL: Efficient Privacy-Preserving Multi-Agent Reinforcement Learning for Cooperative Intelligence in Communications

Cooperative intelligence (CI) is expected to become an integral element in next-generation networks because it can aggregate the capabilities and intelligence of multiple devices. Multi-agent reinforcement learning (MARL) is a popular approach for achieving CI in communication problems by enabling effective collaboration among agents to address sequential problems. However, ensuring privacy protection for MARL is a challenging task because of the presence of heterogeneous agents that learn interdependently via sharing information. Applying privacy protection techniques such as data encryption and federated learning to MARL introduces notable overheads (e.g., computation and bandwidth). To overcome these challenges, we propose PP-MARL, an efficient privacy-preserving learning scheme for MARL. PP-MARL leverages homomorphic encryption (HE) and differential privacy (DP) to protect privacy, while introducing split learning to decrease overheads by reducing the volume of shared messages, thereby improving efficiency. We apply and evaluate PP-MARL in two communication-related use cases. Simulation results reveal that PP-MARL can achieve efficient and reliable collaboration with 1.1-6 times better privacy protection and lower overheads (e.g., 84-91% reduction in bandwidth) than state-of-the-art approaches. Cooperative intelligence (CI) [1], [2] is expected to facilitate next-generation networks by establishing collaboration among various communication-related intelligent equipment. Multi-agent reinforcement learning (MARL) is a popular approach for achieving CI in addressing sequential problems in communication [3], such as adaptive routing and resource allocation [4].
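As a rough illustration of the differential privacy ingredient only, the sketch below clips a message vector and adds Gaussian noise before it is shared between agents. This is the generic Gaussian-mechanism pattern, not PP-MARL's actual scheme; the homomorphic encryption and split learning parts are not shown, and the function names, clipping bound, and noise scale are assumptions. Calibrating the noise scale to a specific privacy budget is also left out.

```python
import math
import random

def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]

def privatize(vec, max_norm, noise_std, rng):
    """Clip a message, then add independent Gaussian noise to each coordinate.

    Clipping bounds how much any single message can influence what other
    agents see; the noise then masks the remaining contribution.
    """
    clipped = clip_l2(vec, max_norm)
    return [x + rng.gauss(0.0, noise_std) for x in clipped]

rng = random.Random(0)
message = [3.0, 4.0]  # hypothetical intermediate output an agent wants to share
shared = privatize(message, max_norm=1.0, noise_std=0.1, rng=rng)
print(shared)
```

Larger `noise_std` gives stronger protection at the cost of noisier messages, which is the privacy-utility trade-off a scheme of this kind has to balance.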

agent, pp-marl, privacy protection

2204.12064

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Feb-21-2025

Mantis: Lightweight Calibrated Foundation Model for User-Friendly Time Series Classification

Feofanov, Vasilii, Wen, Songkang, Alonso, Marius, Ilbert, Romain, Guo, Hongbo, Tiomoko, Malik, Pan, Lujia, Zhang, Jianfeng, Redko, Ievgen

In recent years, there has been increasing interest in developing foundation models for time series data that can generalize across diverse downstream tasks. While numerous forecasting-oriented foundation models have been introduced, there is a notable scarcity of models tailored for time series classification. To address this gap, we present Mantis, a new open-source foundation model for time series classification based on the Vision Transformer (ViT) architecture that has been pre-trained using a contrastive learning approach. Our experimental results show that Mantis outperforms existing foundation models both when the backbone is frozen and when fine-tuned, while achieving the lowest calibration error. In addition, we propose several adapters to handle the multivariate setting, reducing memory requirements and modeling channel interdependence.
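Calibration error is often measured with the expected calibration error (ECE), which compares a classifier's confidence with its actual accuracy inside confidence bins. Whether Mantis is evaluated with exactly this metric and binning is an assumption; the sketch below only shows the standard equal-width-bin computation on made-up predictions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        # Weight each bin's confidence-accuracy gap by its share of the samples.
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Hypothetical predictions: 90% confident on all four, but only three are right.
confidences = [0.9, 0.9, 0.9, 0.9]
correct = [1, 1, 1, 0]
print(expected_calibration_error(confidences, correct))
```

A perfectly calibrated model has an ECE of zero: among predictions made with confidence p, a fraction p are correct.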

dataset, foundation model, mantis


2502.15637

Country: South America > Ecuador (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neto, Fernando M de Paula

Data Complexity Measures for Quantum Circuits Architecture Recommendation

arXiv.org Artificial Intelligence, Feb-20-2025

Quantum Parametric Circuits are constructed as an alternative to reduce the size of quantum circuits, meaning to decrease the number of quantum gates and, consequently, the depth of these circuits. However, determining the optimal circuit for a given problem remains an open question. Testing various combinations is challenging due to the infinite possibilities. In this work, a quantum circuit recommendation architecture for classification problems is proposed using database complexity measures. A quantum circuit is defined based on a circuit layer and the number of times this layer is iterated. Fourteen databases of varying dimensions and different numbers of classes were used to evaluate six quantum circuits, each with 1, 2, 3, 4, 8, and 16-layer repetitions. Using data complexity measures from the databases, it was possible to identify the optimal circuit capable of solving all problems with up to 100% accuracy. Furthermore, with a mean absolute error of 0.80 ± 2.17, the appropriate number of layer repetitions was determined, allowing for an error margin of up to three additional layers. Sixteen distinct machine learning models were employed for the selection of quantum circuits, alongside twelve classical regressor models to dynamically define the number of layers. Introduction: Quantum computing (QC) leverages principles from quantum mechanics to perform information processing. In addition to exploring intrinsically quantum phenomena such as superposition and entanglement [1], QC becomes even more relevant as the miniaturization of electronic components reaches the atomic level, and the laws of quantum mechanics come into play to operate them.

artificial intelligence, machine learning, quantum circuit

2502.15129

Country:

South America > Brazil > Pernambuco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)