AITopics

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(3 more...)

Neural Information Processing Systems Dec-25-2025, 19:01:47 GMT

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding. We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On models of varying parameter scale, by monitoring for type-consistent object dereferences, MGD consistently improves compilation rates and agreement with ground truth. Further, LMs with fewer parameters, when augmented with MGD, can outperform larger LMs.

lms, monitor-guided decoding, static analysis, (7 more...)

Technology: Information Technology > Artificial Intelligence (0.56)

Neural Information Processing Systems Oct-8-2025, 19:58:42 GMT

662b1774ba8845fc1fa3d1fc0177ceeb-Paper-Conference.pdf

large language model, machine learning, programming language, (24 more...)

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(3 more...)

Neural Information Processing Systems Aug-15-2025, 08:57:04 GMT

7bb16972da003e87724f048d76b7e0e1-Paper.pdf

algorithm, optimization, pareto front, (11 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > Singapore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Oripov, Bakhrom G., Dienstfrey, Andrew, McCaughan, Adam N., Buckley, Sonia M.

Scaling of hardware-compatible perturbative training algorithms

arXiv.org Artificial Intelligence Jan-26-2025

In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic gradient descent. We extend the framework to include both weight and node perturbation, and discuss the advantages and disadvantages of each approach. We investigate the time to train networks using MGD as a function of network size and task complexity. Previous research has suggested that perturbative training methods do not scale well to large problems, since in these methods the time to estimate the gradient scales linearly with the number of network parameters. However, in this work we show that the time to reach a target accuracy--that is, actually solve the problem of interest--does not follow this undesirable linear scaling, and in fact often decreases with network size. Furthermore, we demonstrate that MGD can be used to calculate a drop-in replacement for the gradient in stochastic gradient descent, and therefore optimization accelerators such as momentum can be used alongside MGD, ensuring compatibility with existing machine learning practices. Our results indicate that MGD can efficiently train large networks on hardware, achieving accuracy comparable to backpropagation, thus presenting a practical solution for future neuromorphic computing systems.
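
The idea of estimating a gradient from perturbations and then handing it to an ordinary optimizer can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes simultaneous random ±1 weight perturbations on a small linear least-squares problem, and the names `loss`, `perturbative_gradient`, and `train`, as well as all hyperparameter values, are hypothetical choices made for the example.

```python
import numpy as np


def loss(w, X, y):
    """Mean squared error of a linear model (stand-in for a measured hardware cost)."""
    return float(np.mean((X @ w - y) ** 2))


def perturbative_gradient(w, X, y, rng, amplitude=1e-3):
    """Zeroth-order gradient estimate from one simultaneous +/-1 weight perturbation.

    Only two cost evaluations are used; no analytic derivative is computed.
    """
    delta = amplitude * rng.choice([-1.0, 1.0], size=w.shape)
    cost_change = loss(w + delta, X, y) - loss(w, X, y)
    # Correlating the cost change with the perturbation gives an estimate
    # whose expectation matches the true gradient to first order.
    return cost_change * delta / amplitude**2


def train(steps=3000, lr=0.002, momentum=0.9, seed=0):
    """SGD with momentum, with the estimate used in place of the gradient."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 5))
    y = X @ rng.normal(size=5)  # noiseless targets from a hidden weight vector
    w = np.zeros(5)
    velocity = np.zeros(5)
    initial = loss(w, X, y)
    for _ in range(steps):
        g = perturbative_gradient(w, X, y, rng)
        velocity = momentum * velocity - lr * g
        w = w + velocity
    return initial, loss(w, X, y)


if __name__ == "__main__":
    start, end = train()
    print(f"loss before: {start:.4f}, after: {end:.6f}")
```

Because the estimate simply takes the place of the gradient in the update, the momentum term needs no modification, which is the compatibility property the abstract points to.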

artificial intelligence, machine learning, perturbation, (19 more...)

2501.15403

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Asia > Japan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Jan-18-2025, 22:48:05 GMT

Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding.

lms, monitor-guided decoding, repository context, (5 more...)

Technology: Information Technology > Artificial Intelligence (0.59)

Papazov, Hristo, Pesme, Scott, Flammarion, Nicolas

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

arXiv.org Machine Learning Mar-8-2024

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows us to identify an intrinsic quantity $\lambda = \frac{ \gamma }{ (1 - \beta)^2 }$ which uniquely defines the optimisation path and provides a simple acceleration rule. When training a $2$-layer diagonal linear network in an overparametrised regression setting, we characterise the recovered solution through an implicit regularisation problem. We then prove that small values of $\lambda$ help to recover sparse solutions. Finally, we give similar but weaker results for stochastic momentum gradient descent. We provide numerical experiments which support our claims.
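
As a rough illustration of the quantities named in this abstract, the sketch below computes $\lambda = \gamma / (1 - \beta)^2$ and runs heavy-ball momentum gradient descent on a 2-layer diagonal linear network. The parametrisation $\theta = u \odot u - v \odot v$, the initialisation scale `alpha`, the problem sizes, and the function names are all assumptions made for the example; they are not taken from the paper.

```python
import numpy as np


def intrinsic_lambda(gamma, beta):
    """lambda = gamma / (1 - beta)^2 for step size gamma and momentum beta."""
    return gamma / (1.0 - beta) ** 2


def train_diagonal_network(gamma=0.002, beta=0.9, steps=5000, alpha=0.1, seed=0):
    """Heavy-ball momentum on a diagonal linear network (assumed form u*u - v*v)."""
    rng = np.random.default_rng(seed)
    n, d = 10, 20  # fewer samples than features: overparametrised regression
    X = rng.normal(size=(n, d))
    theta_star = np.zeros(d)
    theta_star[:2] = 1.0  # sparse ground truth (an assumption of this sketch)
    y = X @ theta_star

    def loss(u, v):
        residual = X @ (u * u - v * v) - y
        return float(residual @ residual) / (2 * n)

    u = np.full(d, alpha)
    v = np.full(d, alpha)
    u_prev, v_prev = u.copy(), v.copy()
    initial = loss(u, v)
    for _ in range(steps):
        grad_theta = X.T @ (X @ (u * u - v * v) - y) / n
        grad_u = 2.0 * u * grad_theta   # chain rule through u*u
        grad_v = -2.0 * v * grad_theta  # chain rule through -v*v
        u_next = u - gamma * grad_u + beta * (u - u_prev)
        v_next = v - gamma * grad_v + beta * (v - v_prev)
        u_prev, v_prev, u, v = u, v, u_next, v_next
    return initial, loss(u, v), u * u - v * v


if __name__ == "__main__":
    print("lambda:", intrinsic_lambda(0.002, 0.9))
    start, end, theta = train_diagonal_network()
    print(f"loss before: {start:.4f}, after: {end:.2e}")
```

Different pairs can share the same value of $\lambda$; for instance $(\gamma, \beta) = (0.01, 0.9)$ and $(0.04, 0.8)$ both give $\lambda = 1$. The sketch only makes it easy to run such pairs side by side and does not by itself verify the paper's claims.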

balancedness, leveraging continuous time, mgf, (14 more...)

2403.05293

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Agrawal, Lakshya A, Kanade, Aditya, Goyal, Navin, Lahiri, Shuvendu K., Rajamani, Sriram K.

Guiding Language Models of Code with Global Context using Monitors

arXiv.org Artificial Intelligence Nov-3-2023

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding. We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On models of varying parameter scale, by monitoring for type-consistent object dereferences, MGD consistently improves compilation rates and agreement with ground truth. Further, LMs with fewer parameters, when augmented with MGD, can outperform larger LMs. With MGD, SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003 model. We also conduct a generalizability study to evaluate the ability of MGD to generalize to multiple programming languages (Java, C# and Rust), coding scenarios (e.g., correct number of arguments to method calls), and to enforce richer semantic constraints (e.g., stateful API protocols). Our data and implementation are available at https://github.com/microsoft/monitors4codegen.
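
A toy sketch may help picture how a monitor can steer decoding at an object dereference. It is not the API of the monitors4codegen repository: the symbol table `TYPE_MEMBERS`, the type `ServerNode`, its member names, and the function names are hypothetical, and a hard-coded dictionary of scores stands in for a real language model and a real static analyzer.

```python
import math

# Hypothetical symbol table, standing in for what static analysis of the
# repository would report as the members of each type.
TYPE_MEMBERS = {
    "ServerNode": {"getHost", "getPort", "isAlive"},
}


def monitor_allowed(receiver_type):
    """Return the identifiers that are type-consistent after `receiver.`."""
    return TYPE_MEMBERS.get(receiver_type, set())


def guided_next_token(logits, receiver_type=None):
    """Pick the highest-scoring token after masking those the monitor rejects.

    `logits` maps candidate tokens to model scores. With no receiver type the
    monitor is inactive and decoding is unconstrained.
    """
    allowed = set(logits) if receiver_type is None else monitor_allowed(receiver_type)
    if not allowed & set(logits):
        # Design choice of this sketch: fall back to unconstrained decoding
        # rather than fail when the monitor admits none of the candidates.
        allowed = set(logits)
    masked = {
        token: (score if token in allowed else -math.inf)
        for token, score in logits.items()
    }
    return max(masked, key=masked.get)


if __name__ == "__main__":
    # The model prefers a plausible but nonexistent method name.
    scores = {"getAddress": 2.0, "getHost": 1.2, "getPort": 0.3}
    print(guided_next_token(scores))                # unconstrained choice
    print(guided_next_token(scores, "ServerNode"))  # monitor-guided choice
```

The sketch treats each identifier as a single token; a real tokenizer splits identifiers into subwords, so a practical monitor would need to track partially generated names.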

mgd, repository, static analysis, (16 more...)

2306.10763

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(2 more...)

arXiv.org Artificial Intelligence Oct-28-2023

Effective Multi-Graph Neural Networks for Illicit Account Detection on Cryptocurrency Transaction Networks

Ding, Zhihao, Shi, Jieming, Li, Qing, Cao, Jiannong

We study illicit account detection on transaction networks of cryptocurrencies that are increasingly important in online financial markets. The surge of illicit activities on cryptocurrencies has resulted in billions of losses from normal users. Existing solutions either rely on tedious feature engineering to get handcrafted features, or are inadequate to fully utilize the rich semantics of cryptocurrency transaction data, and consequently, yield sub-optimal performance. In this paper, we formulate the illicit account detection problem as a classification task over directed multigraphs with edge attributes, and present DIAM, a novel multi-graph neural network model to effectively detect illicit accounts on large transaction networks. First, DIAM includes an Edge2Seq module that automatically learns effective node representations preserving intrinsic transaction patterns of parallel edges, by considering both edge attributes and directed edge sequence dependencies. Then utilizing the multigraph topology, DIAM employs a new Multigraph Discrepancy (MGD) module with a well-designed message passing mechanism to capture the discrepant features between normal and illicit nodes, supported by an attention mechanism. Assembling all techniques, DIAM is trained in an end-to-end manner. Extensive experiments, comparing against 14 existing solutions on 4 large cryptocurrency datasets of Bitcoin and Ethereum, demonstrate that DIAM consistently achieves the best performance to accurately detect illicit accounts, while being efficient. For instance, on a Bitcoin dataset with 20 million nodes and 203 million edges, DIAM achieves F1 score 96.55%, significantly higher than the F1 score 83.92% of the best competitor. The code is available at https://github.com/TommyDzh/DIAM.

detection, diam, representation, (14 more...)

2309.0246

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Banking & Finance > Trading (1.00)
Information Technology > Services > e-Commerce Services (0.46)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Baldrati, Alberto, Morelli, Davide, Cartella, Giuseppe, Cornia, Marcella, Bertini, Marco, Cucchiara, Rita

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

arXiv.org Artificial Intelligence Aug-23-2023

Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at: https://github.com/aimagelab/multimodal-garment-designer.

artificial intelligence, dataset, machine learning, (17 more...)