AITopics

2507.19205

Country:

Europe (0.93)
Asia (0.93)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (1.00)

Industry: Construction & Engineering (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture (1.00)

arXiv.org Artificial Intelligence May-23-2025

MPL: Multiple Programming Languages with Large Language Models for Information Extraction

Li, Bo, Fang, Gexiang, Ye, Wei, Xu, Zhenghua, Zhang, Jinglei, Cheng, Hao, Zhang, Shikun

Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition behind this is that programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs). This structural advantage makes PLs particularly suited for IE tasks. Nevertheless, existing research primarily focuses on Python for code-style simulation, overlooking the potential of other widely used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose Multiple Programming Languages with large language models for information extraction (abbreviated as MPL), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce function-prompt with virtual running to simulate code-style inputs more effectively and efficiently. Experimental results on a wide range of datasets demonstrate the effectiveness of MPL. Furthermore, we conduct extensive experiments to provide a comprehensive analysis. We have released our code for future research.

arxiv preprint arxiv, large language model, natural language, (16 more...)

2505.16107

Country: Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)

Singh, Kamal, Marouani, Sami, Sheikh, Ahmad Al, Quang, Pham Tran Anh, Habrard, Amaury

Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks

arXiv.org Artificial Intelligence May-21-2025

As load and delta load increase, the policy puts more flows on the Internet link. Increasing Internet delay moves flows to MPLS. The contribution of Internet loss seems counterintuitive, as it appears to put more load on the Internet link. However, even though its coefficient is close to 1.0, the overall contribution of the term is negligible compared to load, because loss in our scenario varies from 0 to around 0.15. This applies to delay too. For minimising loss, we extract the following: a 1. 9 1 .1( 2 λ 3 + 1) 2 2λ i 5 + 10 d i 3 + u i 10 (4) This policy can be interpreted as follows, and we may refer to Figure 1 as well. The ratio starts near 0.8, and increasing load, together with increasing delta, puts more traffic on the Internet link. Increasing Internet delay and Internet link utilisation slightly shift the balance towards putting more traffic on the MPLS link. Distillation of symbolic equations of PPO policy: In this method, we train a policy using PPO, generate trajectory data, and then generate the symbolic equations using auto-regressive models [22].

machine learning, reinforcement learning, traffic, (18 more...)

2505.14459

Genre: Research Report (0.40)

Industry: Energy > Power Industry (0.43)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning Mar-17-2025

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

Luo, Kairong, Wen, Haodong, Hu, Shengding, Sun, Zhenbo, Liu, Zhiyuan, Sun, Maosong, Lyu, Kaifeng, Chen, Wenguang

Training large models is both resource-intensive and time-consuming, making it crucial to understand the quantitative relationship between model performance and hyperparameters. In this paper, we present an empirical law that describes how the pretraining loss of large language models evolves under different learning rate schedules, such as constant, cosine, and step decay schedules. Our proposed law takes a multi-power form, combining a power law based on the sum of learning rates and additional power laws to account for a loss reduction effect induced by learning rate decay. We extensively validate this law on various model sizes and architectures, and demonstrate that after fitting on a few learning rate schedules, the law accurately predicts the loss curves for unseen schedules of different shapes and horizons. Moreover, by minimizing the predicted final pretraining loss across learning rate schedules, we are able to find a schedule that outperforms the widely used cosine learning rate schedule. Interestingly, this automatically discovered schedule bears some resemblance to the recently proposed Warmup-Stable-Decay (WSD) schedule (Hu et al., 2024) but achieves a slightly lower final loss. We believe these results could offer valuable insights for understanding the dynamics of pretraining and designing learning rate schedules to improve efficiency.
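The schedules named in the abstract, and the "sum of learning rates" that the law's first power-law term is based on, can be sketched as follows. This is only an illustration, not the authors' code: the function names, the peak and floor values, the linear decay shape, and the omission of warmup are all assumptions, and the multi-power law itself is not reproduced because the abstract does not state its form.

```python
import math

def constant_lr(step, total_steps, peak=3e-4):
    # Constant schedule: the same learning rate at every step.
    return peak

def cosine_lr(step, total_steps, peak=3e-4, floor=3e-5):
    # Cosine schedule: decays smoothly from peak to floor over training.
    progress = step / max(total_steps - 1, 1)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

def stable_decay_lr(step, total_steps, peak=3e-4, floor=3e-5, decay_frac=0.2):
    # WSD-like schedule with warmup omitted (assumption): hold the peak,
    # then decay linearly to the floor over the last decay_frac of training.
    decay_start = int(total_steps * (1.0 - decay_frac))
    if step < decay_start:
        return peak
    progress = (step - decay_start) / max(total_steps - 1 - decay_start, 1)
    return peak + (floor - peak) * progress

def cumulative_lr(schedule, total_steps):
    # Running sum of learning rates, one entry per training step.
    total, sums = 0.0, []
    for step in range(total_steps):
        total += schedule(step, total_steps)
        sums.append(total)
    return sums
```

Comparing `cumulative_lr(cosine_lr, n)[-1]` with `cumulative_lr(stable_decay_lr, n)[-1]` shows how two schedules with the same peak and floor can accumulate different total learning rate over the same horizon.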

large language model, machine learning, natural language, (19 more...)

2503.12811

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Government (0.67)
Law > Statutes (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Storm, J., Rocha, I. B. C. M., van der Meer, F. P.

A Microstructure-based Graph Neural Network for Accelerating Multiscale Simulations

arXiv.org Artificial Intelligence Feb-20-2024

Simulating the mechanical response of advanced materials can be done more accurately using concurrent multiscale models than with single-scale simulations. However, the computational costs stand in the way of the practical application of this approach. The costs originate from microscale Finite Element (FE) models that must be solved at every macroscopic integration point. A plethora of surrogate modeling strategies attempt to alleviate this cost by learning to predict macroscopic stresses from macroscopic strains, completely replacing the microscale models. In this work, we introduce an alternative surrogate modeling strategy that preserves the multiscale nature of the problem, so that it can be used interchangeably with an FE solver for any time step. Our surrogate provides all microscopic quantities, which are then homogenized to obtain macroscopic quantities of interest. We achieve this for an elasto-plastic material by predicting full-field microscopic strains using a graph neural network (GNN) while retaining the microscopic constitutive material model to obtain the stresses. This hybrid data-physics graph-based approach avoids the high dimensionality originating from predicting full-field responses while allowing non-locality to arise. By training the GNN on a variety of meshes, it learns to generalize to unseen meshes, allowing a single model to be used for a range of microstructures. The embedded microscopic constitutive model in the GNN implicitly tracks history-dependent variables and leads to improved accuracy. We demonstrate for several challenging scenarios that the surrogate can predict complex macroscopic stress-strain paths. As the computation time of our method scales favorably with the number of elements in the microstructure compared to the FE method, our method can significantly accelerate FE² simulations.
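The homogenization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common choice of volume-weighted averaging of per-element microscopic stresses; the abstract does not say which homogenization scheme the authors use, and the function name and data layout below are hypothetical.

```python
def homogenize(stresses, volumes):
    # stresses: one stress vector per microscale element (e.g. Voigt notation).
    # volumes: the matching element volumes (or areas in 2D).
    # Returns the volume-weighted average, i.e. the macroscopic stress
    # under the volume-averaging assumption stated above.
    total_volume = sum(volumes)
    n_components = len(stresses[0])
    return [
        sum(s[k] * v for s, v in zip(stresses, volumes)) / total_volume
        for k in range(n_components)
    ]
```

In a surrogate of the kind described, the per-element stresses would come from applying the retained constitutive model to the predicted microscopic strains; here they are simply passed in as plain lists.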

material model, microstructure, time step, (15 more...)

2402.13101

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.64)

Industry: Materials (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Zezario, Ryandhimas E., Bai, Bo-Ren Brian, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

arXiv.org Artificial Intelligence Sep-11-2023

This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS), are the assessment targets. The pretrained MOSA-Net model is utilized to estimate three pseudo labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning is then employed to train MTQ-Net by combining a supervised loss (derived from the difference between the estimated score and the ground-truth label) and a semi-supervised loss (derived from the difference between the estimated score and the pseudo label), where the Huber loss is employed as the loss function. Experimental results first demonstrate the advantages of MPL compared to training a model from scratch and using a direct knowledge transfer mechanism. Second, the benefit of the Huber loss for improving the predictive ability of MTQ-Net is verified. Finally, the MTQ-Net with the MPL approach exhibits higher overall predictive power compared to other SSL-based speech assessment models.
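The combined objective described in the abstract, a supervised term against ground-truth labels plus a semi-supervised term against pseudo labels, both measured with the Huber loss, can be sketched as below. The function names, the `delta` threshold, and the `weight` that balances the two terms are assumptions for illustration; the abstract does not give these details.

```python
def huber(pred, target, delta=1.0):
    # Mean Huber loss: quadratic for small errors, linear for large ones.
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(pred)

def multitask_loss(pred_mos, true_mos, pred_aux, pseudo_aux, weight=1.0):
    # Supervised term: estimated scores vs. ground-truth labels.
    supervised = huber(pred_mos, true_mos)
    # Semi-supervised term: estimated scores vs. pseudo labels
    # produced by a pretrained model.
    semi_supervised = huber(pred_aux, pseudo_aux)
    return supervised + weight * semi_supervised
```

Because the Huber loss grows only linearly for large errors, a badly wrong pseudo label pulls on the model less than it would under a squared-error loss.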

assessment model, mtq-net, prediction, (17 more...)

2308.09262

Country:

Asia > Taiwan (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Liu, Yubin, Ye, Qiming, Escribano-Macias, Jose, Feng, Yuxiang, Candela, Eduardo, Angeloudis, Panagiotis

Route Planning for Last-Mile Deliveries Using Mobile Parcel Lockers: A Hybrid Q-Learning Network Approach

arXiv.org Artificial Intelligence Feb-9-2023

Mobile parcel lockers (MPLs) have recently been proposed by logistics operators as a technology that could help reduce traffic congestion and operational costs in urban freight distribution. Given their ability to relocate throughout their area of deployment, they hold the potential to improve customer accessibility and convenience. In this study, we formulate the Mobile Parcel Locker Problem (MPLP), a special case of the Location-Routing Problem (LRP) which determines the optimal stopover location for MPLs throughout the day and plans corresponding delivery routes. A Hybrid Q Learning Network based Method (HQM) is developed to resolve the computational complexity of the resulting large problem instances while escaping local optima. In addition, the HQM is integrated with global and local search mechanisms to resolve the dilemma of exploration and exploitation faced by classic reinforcement learning methods. We examine the performance of HQM under different problem sizes (up to 200 nodes) and benchmark it against the exact approach and a Genetic Algorithm (GA). Our results indicate that HQM achieves better optimisation performance with shorter computation time than the exact approach solved by the Gurobi solver in large problem instances. Additionally, the average reward obtained by HQM is 1.96 times greater than that of GA, which demonstrates that HQM has a better optimisation ability. Further, we identify critical factors that contribute to fleet size requirements, travel distances, and service delays. Our findings indicate that the efficiency of MPLs is mainly contingent on the length of time windows and the deployment of MPL stopovers. Finally, we highlight managerial implications based on parametric analysis to provide guidance for logistics operators in the context of efficient last-mile distribution operations.

machine learning, parking space, reinforcement learning, (20 more...)

doi: 10.1016/j.tre.2023.103234

2209.04265

Country:

Europe (0.67)
North America > United States (0.27)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Freight & Logistics Services (1.00)
Information Technology (1.00)
(6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Pensar, Johan, Nyman, Henrik, Corander, Jukka

Structure Learning of Contextual Markov Networks using Marginal Pseudo-likelihood

arXiv.org Machine Learning Mar-29-2021

Markov networks are popular models for discrete multivariate systems where the dependence structure of the variables is specified by an undirected graph. To allow for more expressive dependence structures, several generalizations of Markov networks have been proposed. Here we consider the class of contextual Markov networks which takes into account possible context-specific independences among pairs of variables. Structure learning of contextual Markov networks is very challenging due to the extremely large number of possible structures. One of the main challenges has been to design a score, by which a structure can be assessed in terms of model fit related to complexity, without assuming chordality. Here we introduce the marginal pseudo-likelihood as an analytically tractable criterion for general contextual Markov networks. Our criterion is shown to yield a consistent structure estimator. Experiments demonstrate the favorable properties of our method in terms of predictive accuracy of the inferred models.

edge context, graph, markov network, (17 more...)

doi: 10.1111/sjos.12260

2103.15540

Country:

North America > United States > Wisconsin (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Shi, Yunpeng, Lerman, Gilad

Message Passing Least Squares Framework and its Application to Rotation Synchronization

arXiv.org Machine Learning Aug-14-2020

We propose an efficient algorithm for solving group synchronization under high levels of corruption and noise, with a focus on rotation synchronization. We first describe our recent theoretically guaranteed message passing algorithm that estimates the corruption levels of the measured group ratios. We then propose a novel reweighted least squares method to estimate the group elements, where the weights are initialized and iteratively updated using the estimated corruption levels. We demonstrate the superior performance of our algorithm over state-of-the-art methods for rotation synchronization using both synthetic and real data.

artificial intelligence, data mining, machine learning, (16 more...)

2007.13638

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Architecture > Distributed Systems (0.63)

arXiv.org Machine Learning Mar-23-2020

Meta Pseudo Labels

Pham, Hieu, Xie, Qizhe, Dai, Zihang, Le, Quoc V.

Many training algorithms of a deep neural network can be interpreted as minimizing the cross entropy loss between the prediction made by the network and a target distribution. In supervised learning, this target distribution is typically the ground-truth one-hot vector. In semi-supervised learning, this target distribution is typically generated by a pre-trained teacher model to train the main network. In this work, instead of using such predefined target distributions, we show that learning to adjust the target distribution based on the learning state of the main network can lead to better performance. In particular, we propose an efficient meta-learning algorithm, which encourages the teacher to adjust the target distributions of training examples in a manner that improves the learning of the main network. The teacher is updated by policy gradients computed by evaluating the main network on a held-out validation set. Our experiments demonstrate substantial improvements over strong baselines and establish state-of-the-art performance on CIFAR-10, SVHN, and ImageNet. For instance, with ResNets on small datasets, we achieve 96.1% on CIFAR-10 with 4,000 labeled examples and 73.9% top-1 on ImageNet with 10% examples. Meanwhile, with EfficientNet on full datasets plus extra unlabeled data, we attain 98.6% accuracy on CIFAR-10 and 86.9% top-1 accuracy on ImageNet.
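The framing in the abstract, where supervised and semi-supervised training both minimize cross entropy against some target distribution, can be illustrated with a small sketch. The function names are hypothetical, and the teacher update by policy gradients is not shown, since the abstract gives no detail on it.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target, logits):
    # Cross entropy between a target distribution and the network prediction.
    # target may be a one-hot vector (supervised case) or a soft distribution
    # produced by a teacher model (semi-supervised case).
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)
```

With a one-hot `target` this reduces to the usual negative log-likelihood of the true class; with a teacher-generated `target` the same function scores the student against soft pseudo labels.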

mpl, student, target distribution, (15 more...)

2003.10580

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)