AITopics

Casnici, Davide, Frenkel, Charlotte, Dauwels, Justin

Bio-Inspired Artificial Neural Networks based on Predictive Coding

arXiv.org Machine LearningAug-13-2025

Backpropagation (BP) of errors is the backbone training algorithm for artificial neural networks (ANNs). It updates network weights through gradient descent to minimize a loss function representing the mismatch between predictions and desired outputs. BP uses the chain rule to propagate the loss gradient backward through the network hierarchy, allowing efficient weight updates. However, this process requires weight updates at every layer to rely on a global error signal generated at the network's output. In contrast, the Hebbian model of synaptic plasticity states that weight updates are local, depending only on the activity of pre- and post-synaptic neurons. This suggests biological brains likely do not implement BP directly. Recently, Predictive Coding (PC) has gained interest as a biologically plausible alternative that updates weights using only local information. Originating from 1950s work on signal compression, PC was later proposed as a model of the visual cortex and formalized under the free energy principle, linking it to Bayesian inference and dynamical systems. PC weight updates rely solely on local information and provide theoretical advantages such as automatic scaling of gradients based on uncertainty. This lecture notes column offers a novel, tutorial-style introduction to PC, focusing on its formulation, derivation, and connections to well-known optimization and signal processing algorithms such as BP and the Kalman Filter (KF). It aims to support existing literature by guiding readers from the mathematical foundations of PC to practical implementation, including Python examples using PyTorch.

artificial intelligence, machine learning, posterior, (20 more...)

arXiv.org Machine Learning

2508.08762

Country: Europe (0.68)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Energy > Oil & Gas (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Flow Battery Manifold Design with Heterogeneous Inputs Through Generative Adversarial Neural Networks

Seng, Eric, O'Connor, Hugh, Boyce, Adam, Bailey, Josh J., van Beek, Anton

Generative machine learning has emerged as a powerful tool for design representation and exploration. However, its application is often constrained by the need for large datasets of existing designs and the lack of interpretability about what features drive optimality. To address these challenges, we introduce a systematic framework for constructing training datasets tailored to generative models and demonstrate how these models can be leveraged for interpretable design. The novelty of this work is twofold: (i) we present a systematic framework for generating archetypes with internally homogeneous but mutually heterogeneous inputs that can be used to generate a training dataset, and (ii) we show how integrating generative models with Bayesian optimization can enhance the interpretability of the latent space of admissible designs. These findings are validated by using the framework to design a flow battery manifold, demonstrating that it effectively captures the space of feasible designs, including novel configurations while enabling efficient exploration. This work broadens the applicability of generative machine-learning models in system designs by enhancing quality and reliability.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2508.08863

Country:

North America > United States (0.46)
Europe > United Kingdom > Northern Ireland (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Energy Storage (1.00)
Electrical Industrial Apparatus (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Phan, Peter, Agarwal, Dhruv, Srinivas, Kavitha, Samulowitz, Horst, Kapanipathi, Pavan, McCallum, Andrew

MiGrATe: Mixed-Policy GRPO for Adaptation at Test-Time

Large language models (LLMs) are increasingly being applied to black-box optimization tasks, from program synthesis to molecule design. Prior work typically leverages in-context learning to iteratively guide the model towards better solutions. Such methods, however, often struggle to balance exploration of new solution spaces with exploitation of high-reward ones. Recently, test-time training (TTT) with synthetic data has shown promise in improving solution quality. However, the need for hand-crafted training data tailored to each task limits feasibility and scalability across domains. To address this problem, we introduce MiGrATe-a method for online TTT that uses GRPO as a search algorithm to adapt LLMs at inference without requiring external training data. MiGrATe operates via a mixed-policy group construction procedure that combines on-policy sampling with two off-policy data selection techniques: greedy sampling, which selects top-performing past completions, and neighborhood sampling (NS), which generates completions structurally similar to high-reward ones. Together, these components bias the policy gradient towards exploitation of promising regions in solution space, while preserving exploration through on-policy sampling. We evaluate MiGrATe on three challenging domains-word search, molecule optimization, and hypothesis+program induction on the Abstraction and Reasoning Corpus (ARC)-and find that it consistently outperforms both inference-only and TTT baselines, demonstrating the potential of online TTT as a solution for complex search tasks without external supervision.

large language model, machine learning, natural language, (19 more...)

2508.08641

Country:

Asia (0.67)
North America > Mexico (0.28)
North America > United States > Massachusetts (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Energy (0.46)
Materials > Chemicals (0.46)
Education (0.46)
Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Zaboli, Aydin, Hong, Junho

Generative AI for Critical Infrastructure in Smart Grids: A Unified Framework for Synthetic Data Generation and Anomaly Detection

In digital substations, security events pose significant challenges to the sustained operation of power systems. To mitigate these challenges, the implementation of robust defense strategies is critically important. A thorough process of anomaly identification and detection in information and communication technology (ICT) frameworks is crucial to ensure secure and reliable communication and coordination between interconnected devices within digital substations. Hence, this paper addresses the critical cybersecurity challenges confronting IEC61850-based digital substations within modern smart grids, where the integration of advanced communication protocols, e.g., generic object-oriented substation event (GOOSE), has enhanced energy management and introduced significant vulnerabilities to cyberattacks. Focusing on the limitations of traditional anomaly detection systems (ADSs) in detecting threats, this research proposes a transformative approach by leveraging generative AI (GenAI) to develop robust ADSs. The primary contributions include the suggested advanced adversarial traffic mutation (AATM) technique to generate synthesized and balanced datasets for GOOSE messages, ensuring protocol compliance and enabling realistic zero-day attack pattern creation to address data scarcity. Then, the implementation of GenAI-based ADSs incorporating the task-oriented dialogue (ToD) processes has been explored for improved detection of attack patterns. Finally, a comparison of the GenAI-based ADS with machine learning (ML)-based ADSs has been implemented to showcase the outperformance of the GenAI-based frameworks considering the AATM-generated GOOSE datasets and standard/advanced performance evaluation metrics.

data mining, machine learning, natural language, (19 more...)

2508.08593

Country: North America > United States > Michigan (0.28)

Genre:

Overview > Innovation (0.48)
Research Report > Promising Solution (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Energy > Power Industry (1.00)
Government > Military > Cyberwarfare (0.87)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.72)

Sparse Partial Optimal Transport via Quadratic Regularization

Tran, Khang, Nguyen, Khoa, Nguyen, Anh, Huynh, Thong, Pham, Son, Nguyen-Dang, Sy-Hoang, Pham, Manh, Vo, Bang, Tran, Mai Ngoc, Tran, Mai Ngoc, Luong, Dung

Partial Optimal Transport (POT) has recently emerged as a central tool in various Machine Learning (ML) applications. It lifts the stringent assumption of the conventional Optimal Transport (OT) that input measures are of equal masses, which is often not guaranteed in real-world datasets, and thus offers greater flexibility by permitting transport between unbalanced input measures. Nevertheless, existing major solvers for POT commonly rely on entropic regularization for acceleration and thus return dense transport plans, hindering the adoption of POT in various applications that favor sparsity. In this paper, as an alternative approach to the entropic POT formulation in the literature, we propose a novel formulation of POT with quadratic regularization, hence termed quadratic regularized POT (QPOT), which induces sparsity to the transport plan and consequently facilitates the adoption of POT in many applications with sparsity requirements. Extensive experiments on synthetic and CIFAR-10 datasets, as well as real-world applications such as color transfer and domain adaptations, consistently demonstrate the improved sparsity and favorable performance of our proposed QPOT formulation.

artificial intelligence, experiment, machine learning, (18 more...)

doi: 10.3844/jcssp.2025.1677.1687

2508.08476

Country:

Asia > Vietnam (0.69)
North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Neural Tangent Knowledge Distillation for Optical Convolutional Networks

Xiang, Jinlin, Choi, Minho, Zhang, Yubo, Zhou, Zhihao, Majumdar, Arka, Shlizerman, Eli

Hybrid Optical Neural Networks (ONNs, typically consisting of an optical frontend and a digital backend) offer an energy-efficient alternative to fully digital deep networks for real-time, power-constrained systems. However, their adoption is limited by two main challenges: the accuracy gap compared to large-scale networks during training, and discrepancies between simulated and fabricated systems that further degrade accuracy. While previous work has proposed end-to-end optimizations for specific datasets (e.g., MNIST) and optical systems, these approaches typically lack generalization across tasks and hardware designs. To address these limitations, we propose a task-agnostic and hardware-agnostic pipeline that supports image classification and segmentation across diverse optical systems. To assist optical system design before training, we estimate achievable model accuracy based on user-specified constraints such as physical size and the dataset. For training, we introduce Neural Tangent Knowledge Distillation (NTKD), which aligns optical models with electronic teacher networks, thereby narrowing the accuracy gap. After fabrication, NTKD also guides fine-tuning of the digital backend to compensate for implementation errors. Experiments on multiple datasets (e.g., MNIST, CIFAR, Carvana Masking) and hardware configurations show that our pipeline consistently improves ONN performance and enables practical deployment in both pre-fabrication simulations and physical implementations.

artificial intelligence, machine learning, neural network, (17 more...)

2508.08421

Country: North America (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology (0.93)
Energy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Ashraf, Humza, Danish, Syed Muhammad, Leivadeas, Aris, Otoum, Yazan, Sattar, Zeeshan

Energy-Aware Code Generation with LLMs: Benchmarking Small vs. Large Language Models for Sustainable AI Programming

Large Language Models (LLMs) are widely used for code generation. However, commercial models like ChatGPT require significant computing power, which leads to high energy use and carbon emissions. This has raised concerns about their environmental impact. In this study, we evaluate open-source Small Language Models (SLMs) trained explicitly for code generation and compare their performance and energy efficiency against large LLMs and efficient human-written Python code. The goal is to investigate whether SLMs can match the performance of LLMs on certain types of programming problems while producing more energy-efficient code. We evaluate 150 coding problems from LeetCode, evenly distributed across three difficulty levels: easy, medium, and hard. Our comparison includes three small open-source models, StableCode-3B, StarCoderBase-3B, and Qwen2.5-Coder-3B-Instruct, and two large commercial models, GPT-4.0 and DeepSeek-Reasoner. The generated code is evaluated using four key metrics: run-time, memory usage, energy consumption, and correctness. We use human-written solutions as a baseline to assess the quality and efficiency of the model-generated code. Results indicate that LLMs achieve the highest correctness across all difficulty levels, but SLMs are often more energy-efficient when their outputs are correct. In over 52% of the evaluated problems, SLMs consumed the same or less energy than LLMs.

large language model, machine learning, natural language, (18 more...)

2508.08332

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Whole-Body Coordination for Dynamic Object Grasping with Legged Manipulators

Liang, Qiwei, Cai, Boyang, He, Rongyi, Li, Hui, Teng, Tao, Duan, Haihan, Huang, Changxin, Zeng, Runhao

Quadrupedal robots with manipulators offer strong mobility and adaptability for grasping in unstructured, dynamic environments through coordinated whole-body control. However, existing research has predominantly focused on static-object grasping, neglecting the challenges posed by dynamic targets and thus limiting applicability in dynamic scenarios such as logistics sorting and human-robot collaboration. To address this, we introduce DQ-Bench, a new benchmark that systematically evaluates dynamic grasping across varying object motions, velocities, heights, object types, and terrain complexities, along with comprehensive evaluation metrics. Building upon this benchmark, we propose DQ-Net, a compact teacher-student framework designed to infer grasp configurations from limited perceptual cues. During training, the teacher network leverages privileged information to holistically model both the static geometric properties and dynamic motion characteristics of the target, and integrates a grasp fusion module to deliver robust guidance for motion planning. Concurrently, we design a lightweight student network that performs dual-viewpoint temporal modeling using only the target mask, depth map, and proprioceptive state, enabling closed-loop action outputs without reliance on privileged data. Extensive experiments on DQ-Bench demonstrate that DQ-Net achieves robust dynamic objects grasping across multiple task settings, substantially outperforming baseline methods in both success rate and responsiveness.

artificial intelligence, international conference, machine learning, (17 more...)

2508.08328

Country:

Asia > China (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.64)

Industry: Energy (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Banerjee, Saptarshi, Mallick, Tausif, Chakroborty, Amlan, Saha, Himadri Nath, Takur, Nityananda T.

Evaluation of State-of-the-Art Deep Learning Techniques for Plant Disease and Pest Detection

Addressing plant diseases and pests is critical for enhancing crop production and preventing economic losses. Recent advances in artificial intelligence (AI), machine learning (ML), and deep learning (DL) have significantly improved the precision and efficiency of detection methods, surpassing the limitations of manual identification. This study reviews modern computer-based techniques for detecting plant diseases and pests from images, including recent AI developments. The methodologies are organized into five categories: hyperspectral imaging, non-visualization techniques, visualization approaches, modified deep learning architectures, and transformer models. This structured taxonomy provides researchers with detailed, actionable insights for selecting advanced state-of-the-art detection methods. A comprehensive survey of recent work and comparative studies demonstrates the consistent superiority of modern AI-based approaches, which often outperform older image analysis methods in speed and accuracy. In particular, vision transformers such as the Hierarchical Vision Transformer (HvT) have shown accuracy exceeding 99.3% in plant disease detection, outperforming architectures like MobileNetV3. The study concludes by discussing system design challenges, proposing solutions, and outlining promising directions for future research.

artificial intelligence, comput mater contin, machine learning, (12 more...)

doi: 10.32604/cmc.2025.065250

2508.08317

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Research Report > Promising Solution (1.00)
Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Energy (1.00)
Food & Agriculture > Agriculture > Pest Control (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)