TinyLUT: Tiny Look-Up Table for Efficient Image Restoration at the Edge

Li, Huanan

Neural Information Processing Systems

Look-up table (LUT)-based methods have recently shown enormous potential in image restoration tasks and can significantly accelerate inference. However, LUT size grows exponentially with the convolution kernel size, creating a storage bottleneck that limits broader application on edge devices.
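
A rough back-of-the-envelope sketch (not taken from the paper) of the storage growth the abstract refers to: a LUT that caches every possible output of a convolution over 8-bit inputs needs one entry per input combination, i.e. 256^(k*k) entries for a k x k kernel.

```python
# Back-of-the-envelope LUT sizing: one entry per possible 8-bit input pattern
# in a k x k receptive field. Numbers are illustrative, not from the paper.

BITS_PER_PIXEL = 8

def full_lut_entries(kernel_size: int) -> int:
    """Entries needed to tabulate every input combination of a k x k window."""
    num_inputs = kernel_size * kernel_size
    return (2 ** BITS_PER_PIXEL) ** num_inputs

for k in (1, 2, 3):
    print(f"{k}x{k} kernel -> {full_lut_entries(k):.3e} entries")
# 1x1 -> 2.560e+02, 2x2 -> 4.295e+09, 3x3 -> 4.722e+21: an exponential blow-up,
# which is why practical LUT methods restrict the receptive field or split tables.
```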



Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control

Kresse, Fabian, Lampert, Christoph H.

arXiv.org Artificial Intelligence

We investigate whether continuous-control policies can be represented and learned as discrete logic circuits instead of continuous neural networks. We introduce Differentiable Weightless Controllers (DWCs), a symbolic-differentiable architecture that maps real-valued observations to actions using thermometer-encoded inputs, sparsely connected boolean lookup-table layers, and lightweight action heads. DWCs can be trained end-to-end by gradient-based techniques, yet compile directly into FPGA-compatible circuits with few- or even single-clock-cycle latency and nanojoule-level energy cost per action. Across five MuJoCo benchmarks, including high-dimensional Humanoid, DWCs achieve returns competitive with weight-based policies (full precision or quantized neural networks), matching performance on four tasks and isolating network capacity as the key limiting factor on HalfCheetah. Furthermore, DWCs exhibit structurally sparse and interpretable connectivity patterns, enabling a direct inspection of which input thresholds influence control decisions.
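
For readers unfamiliar with the input representation named in the abstract, here is a minimal generic sketch of thermometer encoding; the value range, bit count, and evenly spaced thresholds are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def thermometer_encode(x: np.ndarray, low: float, high: float, n_bits: int) -> np.ndarray:
    """Encode each scalar in x as n_bits monotone bits: bit i is 1 iff x >= threshold_i.

    Thresholds are evenly spaced in [low, high]; values outside the range saturate.
    This is a generic sketch of thermometer encoding, not the DWC paper's exact scheme.
    """
    thresholds = np.linspace(low, high, n_bits)            # (n_bits,)
    return (x[..., None] >= thresholds).astype(np.uint8)   # (..., n_bits)

obs = np.array([-1.2, 0.0, 0.7])                 # a toy 3-dimensional observation
bits = thermometer_encode(obs, -2.0, 2.0, n_bits=8)
print(bits)
# Each row is monotone, a run of 1s followed by 0s, e.g. 0.7 -> [1 1 1 1 1 0 0 0];
# these bit vectors are what the boolean lookup-table layers consume.
```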


T-SAR: A Full-Stack Co-design for CPU-Only Ternary LLM Inference via In-Place SIMD ALU Reorganization

Oh, Hyunwoo, Nam, KyungIn, Bhattacharjya, Rajat, Chen, Hanning, Das, Tamoghno, Yun, Sanggeon, Jang, Suyeon, Ding, Andrew, Dutt, Nikil, Imani, Mohsen

arXiv.org Artificial Intelligence

Recent advances in LLMs have outpaced the computational and memory capacities of edge platforms that primarily employ CPUs, thereby challenging efficient and scalable deployment. While ternary quantization enables significant resource savings, existing CPU solutions rely heavily on memory-based lookup tables (LUTs) which limit scalability, and FPGA or GPU accelerators remain impractical for edge use. This paper presents T-SAR, the first framework to achieve scalable ternary LLM inference on CPUs by repurposing the SIMD register file for dynamic, in-register LUT generation with minimal hardware modifications. T-SAR eliminates memory bottlenecks and maximizes data-level parallelism, delivering 5.6-24.5x and 1.1-86.2x improvements in GEMM latency and GEMV throughput, respectively, with only 3.2% power and 1.4% area overheads in SIMD units. T-SAR achieves up to 2.5-4.9x the energy efficiency of an NVIDIA Jetson AGX Orin, establishing a practical approach for efficient LLM inference on edge platforms.
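
As a software-only illustration of the LUT idea behind ternary GEMM/GEMV (the in-register SIMD mechanism that is T-SAR's actual contribution is not modeled here), the sketch below groups ternary weights and replaces per-element multiply-adds with one table lookup per group; the group size and layout are assumptions for illustration.

```python
import itertools
import numpy as np

def ternary_gemv_lut(W: np.ndarray, x: np.ndarray, group: int = 4) -> np.ndarray:
    """Sketch of LUT-based GEMV with ternary weights in {-1, 0, +1}.

    For each group of `group` activations, precompute the dot product with every
    possible ternary weight pattern (3**group entries); each weight row then needs
    only one table lookup per group instead of multiply-adds.
    """
    rows, cols = W.shape
    assert cols % group == 0
    patterns = np.array(list(itertools.product((-1, 0, 1), repeat=group)))  # (3^g, g)

    y = np.zeros(rows)
    for g0 in range(0, cols, group):
        x_grp = x[g0:g0 + group]
        lut = patterns @ x_grp                    # (3^g,) partial sums, built once per group
        # map each row's ternary group to its pattern index (base-3 digits of w+1)
        idx = ((W[:, g0:g0 + group] + 1) * (3 ** np.arange(group - 1, -1, -1))).sum(axis=1)
        y += lut[idx]
    return y

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(8, 16))   # ternary weight matrix
x = rng.standard_normal(16)
assert np.allclose(ternary_gemv_lut(W, x), W @ x)
```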


LL-ViT: Edge Deployable Vision Transformers with Look Up Table Neurons

Nag, Shashank, Bacellar, Alan T. L., Susskind, Zachary, Jha, Anshul, Liberty, Logan, Sivakumar, Aishwarya, John, Eugene B., Kailas, Krishnan, Lima, Priscila M. V., Yadwadkar, Neeraja J., Franca, Felipe M. G., John, Lizy K.

arXiv.org Artificial Intelligence

Vision Transformers have been tremendously successful in computer vision tasks. However, their large computational, memory, and energy demands are a challenge for edge inference on FPGAs -- a field that has seen a recent surge in demand. We recognize the benefits of recent works on logic- and Look Up Table (LUT)-based networks, such as LogicNets, NeuraLUT, and DWN, in offering models that simultaneously reduce both the memory and compute footprints. However, these models natively do not perform well on common vision tasks such as CIFAR-10/100. In this work, we propose LL-ViT, a novel edge-optimized vision transformer design that integrates layers of LUT neurons within the transformer architecture. Based on our characterization, which reveals that a majority of model weights and computations are in the channel mixer (MLP layer), we design an alternate LUT-based channel mixer and simultaneously develop an FPGA-based accelerator for LL-ViT. Contrary to some attempts to replace each multiplication with a table lookup, our architecture uses a neural learning approach that natively learns the LUT functions. This approach allows for reduced model sizes and a computationally and energy-efficient inference solution for vision transformer models. Evaluating on edge-suitable workloads, we achieve accuracies of 95.5% on CIFAR-10, 78.8% on CIFAR-100, and 60.9% on Tiny-ImageNet, comparable to the baseline transformer. LL-ViT eliminates over 60% of the model weights and 50% of the multiplications in the model, and achieves 1.9x higher energy efficiency and 1.3x lower latency than an integer-quantized ViT accelerator, while also offering superior throughput against prior works at a 10.9W power budget.
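
To make the notion of a LUT neuron concrete, here is a minimal inference-time sketch: each neuron reads a small, sparse subset of binarized channels and maps the bit pattern through its own truth table, with no multiplications. The fan-in, wiring, and table contents below are placeholder assumptions, not LL-ViT's learned values.

```python
import numpy as np

class LUTNeuronLayer:
    """Inference-time sketch of a layer of LUT neurons, in the spirit of the
    LUT-based channel mixer the abstract describes (sizes, binarization, and
    table contents are illustrative, not LL-ViT's design)."""

    def __init__(self, in_channels: int, out_channels: int, fan_in: int = 6, seed: int = 0):
        rng = np.random.default_rng(seed)
        # which input bits each neuron reads (learned/structured in the real model)
        self.wiring = rng.integers(0, in_channels, size=(out_channels, fan_in))
        # one truth table of 2**fan_in entries per neuron (learned during training)
        self.tables = rng.integers(0, 2, size=(out_channels, 2 ** fan_in)).astype(np.float32)
        self.weights = 2 ** np.arange(fan_in)[::-1]        # bit pattern -> table index

    def __call__(self, x_bits: np.ndarray) -> np.ndarray:
        # x_bits: (batch, in_channels) array of 0/1 activations
        selected = x_bits[:, self.wiring]                   # (batch, out_channels, fan_in)
        idx = (selected * self.weights).sum(axis=-1).astype(int)
        neuron = np.arange(self.tables.shape[0])
        return self.tables[neuron, idx]                     # pure table lookups

layer = LUTNeuronLayer(in_channels=64, out_channels=32)
x_bits = (np.random.default_rng(1).random((4, 64)) > 0.5).astype(np.float32)
print(layer(x_bits).shape)   # (4, 32)
```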


Gradient Estimation Methods of Approximate Multipliers for High-Accuracy Retraining of Deep Learning Models

Meng, Chang, Burleson, Wayne, De Micheli, Giovanni

arXiv.org Artificial Intelligence

Approximate multipliers (AppMults) are widely used in deep learning accelerators to reduce their area, delay, and power consumption. However, AppMults introduce arithmetic errors into deep learning models, necessitating a retraining process to recover accuracy. A key step in retraining is computing the gradient of the AppMult, i.e., the partial derivative of the approximate product with respect to each input operand. Existing approaches typically estimate this gradient using that of the accurate multiplier (AccMult), which can lead to suboptimal retraining results. To address this, we propose two methods to obtain more precise gradients of AppMults. The first, called LUT-2D, characterizes the AppMult gradient with 2-dimensional lookup tables (LUTs), providing fine-grained estimation and achieving the highest retraining accuracy. The second, called LUT-1D, is a compact and more efficient variant that stores gradient values in 1-dimensional LUTs, achieving comparable retraining accuracy with shorter runtime. Experimental results show that on CIFAR-10 with convolutional neural networks, our LUT-2D and LUT-1D methods improve retraining accuracy by 3.83% and 3.72% on average, respectively. On ImageNet with vision transformer models, our LUT-1D method improves retraining accuracy by 23.69% on average, compared to a state-of-the-art retraining framework. Modern artificial intelligence (AI) technologies excel in a wide range of areas such as natural language processing and computer vision. However, this rapid growth raises serious concerns about power consumption [1]. To achieve energy-efficient deep learning accelerators, researchers have adopted an emerging design paradigm called approximate computing, which reduces power consumption at the cost of errors [2], [3]. Approximate computing is particularly suitable for deep learning accelerators, since they are inherently resilient to errors and noise.
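
A hypothetical sketch of how such gradient tables might be assembled; the placeholder approximate multiplier, the finite-difference construction of the 2-D table, and the averaging used for the 1-D variant are our own assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def approx_mult(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Placeholder approximate multiplier: truncate the 2 LSBs of each operand.
    Stands in for a real AppMult design; it is NOT one from the paper."""
    return (a & ~0x3) * (b & ~0x3)

# Tabulate the approximate product over all 8-bit operand pairs.
a = np.arange(256)
A, B = np.meshgrid(a, a, indexing="ij")
P = approx_mult(A, B)

# "LUT-2D"-style table: a finite-difference gradient d(appmult)/da for every (a, b) pair.
grad2d_a = np.zeros_like(P, dtype=np.float64)
grad2d_a[:-1, :] = P[1:, :] - P[:-1, :]          # forward difference along a
grad2d_a[-1, :] = grad2d_a[-2, :]

# "LUT-1D"-style table: compress by averaging over the other operand, so the
# gradient w.r.t. a is looked up from b alone.
grad1d_a = grad2d_a.mean(axis=0)                  # indexed by b

# During retraining, the backward pass would read these tables instead of using
# the exact-multiplier gradient d(a*b)/da = b.
print(grad2d_a.shape, grad1d_a.shape)             # (256, 256) and (256,)
```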



Mixture of Lookup Experts

Jie, Shibo, Tang, Yehui, Han, Kai, Li, Yitong, Tang, Duyu, Deng, Zhi-Hong, Wang, Yunhe

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) activates only a subset of experts during inference, allowing the model to maintain low inference FLOPs and latency even as the parameter count scales up. However, since MoE selects experts dynamically, all of them need to be loaded into VRAM. Their large parameter size still limits deployment, and offloading, which loads experts into VRAM only when needed, significantly increases inference latency. To address this, we propose Mixture of Lookup Experts (MoLE), a new MoE architecture that is efficient in both communication and VRAM usage. In MoLE, the experts are Feed-Forward Networks (FFNs) during training, taking the output of the embedding layer as input. Before inference, these experts can be re-parameterized as lookup tables (LUTs) that retrieve expert outputs based on input ids, and offloaded to storage devices. Therefore, we do not need to perform expert computations during inference. Instead, we directly retrieve the experts' precomputed outputs based on input ids and load them into VRAM, so the resulting communication overhead is negligible. Experiments show that, with the same FLOPs and VRAM usage, MoLE achieves inference speeds comparable to dense models and significantly faster than MoE with expert offloading, while maintaining performance on par with MoE.
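
A minimal numpy sketch of the re-parameterization the abstract describes: because each expert only ever sees the embedding of a token id, its output can be tabulated per id offline, and inference reduces to row lookups. The sizes, FFN form, and routing below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, d_ff, n_experts = 1000, 64, 256, 4

# Toy stand-ins for the trained embedding table and expert FFNs (random here).
embedding = rng.standard_normal((vocab_size, d_model))
experts = [
    {"w1": rng.standard_normal((d_model, d_ff)), "w2": rng.standard_normal((d_ff, d_model))}
    for _ in range(n_experts)
]

def expert_ffn(e, h):
    return np.maximum(h @ e["w1"], 0.0) @ e["w2"]     # simple ReLU FFN

# Re-parameterization: tabulate each expert's output for every token id, offline.
expert_luts = np.stack([expert_ffn(e, embedding) for e in experts])  # (n_experts, vocab, d_model)

# Inference: no expert computation, just row lookups for whichever expert was routed.
token_ids = np.array([3, 17, 256])
routed_expert = np.array([0, 2, 1])                   # whatever the router picked
outputs = expert_luts[routed_expert, token_ids]       # (3, d_model)
assert np.allclose(outputs[0], expert_ffn(experts[0], embedding[3]))
```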


nanoML for Human Activity Recognition

Bacellar, Alan T. L., Jadhao, Mugdha P., Nag, Shashank, Lima, Priscila M. V., Franca, Felipe M. G., John, Lizy K.

arXiv.org Artificial Intelligence

Human Activity Recognition (HAR) is critical for applications in healthcare, fitness, and IoT, but deploying accurate models on resource-constrained devices remains challenging due to high energy and memory demands. This paper demonstrates the application of Differentiable Weightless Neural Networks (DWNs) to HAR, achieving competitive accuracies of 96.34% and 96.67% while consuming only 56nJ and 104nJ per sample, with an inference time of just 5ns per sample. The DWNs were implemented and evaluated on an FPGA, showcasing their practical feasibility for energy-efficient hardware deployment. DWNs achieve up to 926,000x energy savings and 260x memory reduction compared to state-of-the-art deep learning methods. These results position DWNs as a nano-machine-learning (nanoML) model for HAR, setting a new benchmark in energy efficiency and compactness for edge and wearable devices and paving the way for ultra-efficient edge AI.