AITopics | subarray

DNN-based Methods of Jointly Sensing Number and Directions of Targets via a Green Massive H2AD MIMO Receiver

Deng, Bin, Bai, Jiatong, Zhao, Feilong, Xie, Zuming, Li, Maolin, Wang, Yan, Shu, Feng

arXiv.org Artificial Intelligence, Aug-1-2025

As a green MIMO structure, the heterogeneous hybrid analog-digital (H2AD) MIMO architecture has shown great potential to replace massive or extremely large-scale fully-digital MIMO in future wireless networks, addressing three challenging problems faced by the latter: high energy consumption, high circuit cost, and high complexity. However, how to intelligently sense the number and directions of multiple emitters via such a structure is still an open and hard problem. To address this, we propose a two-stage sensing framework that jointly estimates the number and direction values of multiple targets. Specifically, three target number sensing methods are designed: an improved eigen-domain clustering (EDC) framework, an enhanced deep neural network (DNN) based on five key statistical features, and an improved one-dimensional convolutional neural network (1D-CNN) utilizing full eigenvalues. Subsequently, low-complexity and high-accuracy DOA estimation is achieved via the introduced online micro-clustering (OMC-DOA) method. Furthermore, we derive the Cramér-Rao lower bound (CRLB) for the H2AD under multiple-source conditions as a theoretical performance benchmark. Simulation results show that the three developed methods achieve 100% accuracy in sensing the number of targets at moderate-to-high SNRs, while the improved 1D-CNN exhibits superior performance under extremely low SNR conditions. The introduced OMC-DOA outperforms existing clustering and fusion-based DOA methods in multi-source environments.

artificial intelligence, estimation, machine learning, (19 more...)

2507.22906

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Choong, Benjamin Chen Ming, Luo, Tao, Liu, Cheng, He, Bingsheng, Zhang, Wei, Zhou, Joey Tianyi

arXiv.org Artificial Intelligence, Jul-3-2025

Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly researched memory technologies, racetrack memory is a non-volatile technology that allows high data density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both the memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack memory based systems and CNN model architectures, employing co-design to improve the efficiency and performance of CNN inference in racetrack memory while maintaining model accuracy. Our designed circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack memory based embedded systems.

artificial intelligence, machine learning, racetrack memory, (20 more...)

doi: 10.1016/j.sysarc.2022.102507

2507.01429

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.92)

Industry:

Semiconductors & Electronics (0.67)
Information Technology (0.67)
Energy (0.46)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Multi-Branch DNN and CRLB-Ratio-Weight Fusion for Enhanced DOA Sensing via a Massive H$^2$AD MIMO Receiver

Shu, Feng, Bai, Jiatong, Wu, Di, Zhu, Wei, Deng, Bin, Zhou, Fuhui, Wang, Jiangzhou

arXiv.org Artificial Intelligence, Jul-1-2025

As a green MIMO structure, massive H$^2$AD is viewed as a potential technology for the future 6G wireless network. For such a structure, it is challenging to design a low-complexity and high-performance fusion of the target direction values sensed by different subarray groups with less reliance on prior knowledge. To address this issue, a lightweight Cramér-Rao lower bound (CRLB)-ratio-weight fusion (WF) method is proposed, which approximates the inverse CRLB of each subarray using antenna number reciprocals to eliminate real-time CRLB computation. This reduces complexity and prior knowledge dependence while preserving fusion performance. Moreover, a multi-branch deep neural network (MBDNN) is constructed to further enhance direction-of-arrival (DOA) sensing by leveraging candidate angles from multiple subarrays. The subarray-specific branch networks are integrated with a shared regression module to effectively eliminate pseudo-solutions and fuse true angles. Simulation results show that the proposed CRLB-ratio-WF method achieves DOA sensing performance comparable to CRLB-based methods, while significantly reducing the reliance on prior knowledge. More notably, the proposed MBDNN has superior performance in low-SNR ranges. At SNR $= -15$ dB, it achieves an order-of-magnitude improvement in estimation accuracy compared to the CRLB-ratio-WF method.
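
The weight-fusion idea in this abstract can be illustrated with a minimal sketch. It assumes the general inverse-variance form of fusion, in which each subarray's angle estimate is weighted by the reciprocal of a per-subarray error bound such as its CRLB. The function name `fuse_estimates` and all numbers are hypothetical. The paper's own lightweight weights, which the abstract says are built from antenna number reciprocals, are not reproduced here.

```python
def fuse_estimates(estimates, bounds):
    """Fuse scalar estimates with weights proportional to 1 / bound.

    estimates: one angle estimate (degrees) per subarray.
    bounds:    one positive error bound per subarray (assumed to be a
               CRLB or a proxy for it; how it is obtained is out of scope).
    Returns (fused_estimate, normalized_weights).
    """
    if not estimates or len(estimates) != len(bounds):
        raise ValueError("need one positive bound per estimate")
    if any(b <= 0 for b in bounds):
        raise ValueError("bounds must be positive")
    inverse = [1.0 / b for b in bounds]          # smaller bound, larger weight
    total = sum(inverse)
    weights = [w / total for w in inverse]       # weights sum to 1
    fused = sum(w * e for w, e in zip(weights, estimates))
    return fused, weights


# Hypothetical example: two subarrays, the first three times more reliable.
fused, weights = fuse_estimates([10.0, 20.0], [1.0, 3.0])
```

With these made-up inputs the more reliable subarray receives three quarters of the weight, so the fused angle sits closer to its estimate.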

artificial intelligence, deep learning, machine learning, (18 more...)

2506.23203

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Campania > Naples (0.04)
Asia > China > Hainan Province (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Think Only When You Need with Large Hybrid-Reasoning Models

Jiang, Lingjie, Wu, Xun, Huang, Shaohan, Dong, Qingxiu, Chi, Zewen, Dong, Li, Zhang, Xingxing, Lv, Tengchao, Cui, Lei, Wei, Furu

arXiv.org Artificial Intelligence, May-22-2025

Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is particularly unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform thinking based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate thinking mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid thinking. Extensive experimental results show that LHRMs can adaptively perform hybrid thinking on queries of varying difficulty and type. They outperform existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended thinking processes and provides a solid starting point for building hybrid thinking systems.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2505.14631

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Iraq > Basra Governorate > Basra (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator

Wang, Cong, Chen, Zeming, Huang, Shanshi

arXiv.org Artificial Intelligence, Sep-23-2024

This work introduces MICSim, an open-source, pre-circuit simulator designed for early-stage evaluation of chip-level software performance and hardware overhead of mixed-signal compute-in-memory (CIM) accelerators. MICSim features a modular design, allowing easy multi-level co-design and design space exploration. Modularized from the state-of-the-art CIM simulator NeuroSim, MICSim provides a highly configurable simulation framework supporting multiple quantization algorithms, diverse circuit/architecture designs, and different memory devices. This modular approach also allows MICSim to be effectively extended to accommodate new designs. MICSim natively supports evaluating accelerators' software and hardware performance for CNNs and Transformers in Python, leveraging the popular PyTorch and HuggingFace Transformers frameworks. These capabilities make MICSim user-friendly and highly adaptive when simulating different networks. This work demonstrates that MICSim can easily be combined with optimization strategies to perform design space exploration and can be used for chip-level evaluation of Transformer CIM accelerators. Also, MICSim can achieve a 9x-32x speedup over NeuroSim through a statistic-based average mode proposed by this work.

machine learning, micsim, natural language, (18 more...)

doi: 10.1145/3658617.3697630

2409.14838

Country:

Asia > China (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Software (0.88)

Direction of Arrival Estimation with Sparse Subarrays

Leite, W., de Lamare, R. C., Zakharov, Y., Liu, W., Haardt, M.

arXiv.org Artificial Intelligence, Aug-17-2024

This paper proposes design techniques for partially-calibrated sparse linear subarrays and algorithms to perform direction-of-arrival (DOA) estimation. First, we introduce array architectures that incorporate two distinct array categories, namely type-I and type-II arrays. The former breaks down a known sparse linear geometry into as many pieces as we need, and the latter employs each subarray such that it fits a preplanned sparse linear geometry. Moreover, we devise two DOA estimation algorithms that are suitable for partially-calibrated array scenarios within the coarray domain. The algorithms are capable of estimating a greater number of sources than the number of available physical sensors, while maintaining the hardware and computational complexity within practical limits for real-time implementation. To this end, we exploit the intersection of projections onto affine spaces by devising the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) in conjunction with the estimation of a refined projection matrix related to the noise subspace, as proposed in the GCA root-MUSIC algorithm. An analysis is performed for the devised subarray configurations in terms of degrees of freedom, as well as the computation of the Cramér-Rao Lower Bound for the utilized data model, in order to demonstrate the good performance of the proposed methods. Simulations assess the performance of the proposed design methods and algorithms against existing approaches.

algorithm, matrix, subarray, (16 more...)

2409.00033

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > North Yorkshire > York (0.04)
(3 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Information Management (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Analysis of Partially-Calibrated Sparse Subarrays for Direction Finding with Extended Degrees of Freedom

Leite, W. S., de Lamare, R. C.

arXiv.org Artificial Intelligence, Aug-6-2024

This paper investigates the problem of direction-of-arrival (DOA) estimation using multiple partially-calibrated sparse subarrays. In particular, we present the Generalized Coarray Multiple Signal Classification (GCA-MUSIC) DOA estimation algorithm for scenarios with partially-calibrated sparse subarrays. The proposed GCA-MUSIC algorithm exploits the difference coarray for each subarray, followed by a specific pseudo-spectrum merging rule that is based on the intersection of the signal subspaces associated with each subarray. This rule assumes that there is no a priori knowledge about the cross-covariance between subarrays. In that way, only the second-order statistics of each subarray are used to estimate the directions with increased degrees of freedom, i.e., the estimation procedure preserves the coarray Multiple Signal Classification and sparse-array properties to estimate more sources than the number of physical sensors in each subarray. Numerical simulations show that the proposed GCA-MUSIC has better performance than other similar strategies.
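
The difference coarray mentioned in this abstract is what lets a sparse array expose more virtual lags than it has physical sensors. The sketch below computes it for one subarray. The two-level nested geometry, the helper names, and the unit of half a wavelength are assumptions chosen for illustration, not the subarray designs studied in the paper.

```python
def difference_coarray(positions):
    """Return the sorted set of pairwise differences p - q (virtual lags)."""
    return sorted({p - q for p in positions for q in positions})


def contiguous_half_width(lags):
    """Largest L such that every integer lag in -L..L is present."""
    lag_set = set(lags)
    half_width = 0
    while (half_width + 1) in lag_set and -(half_width + 1) in lag_set:
        half_width += 1
    return half_width


# Hypothetical two-level nested array: a dense part at 1, 2, 3 and a
# sparse part at 4, 8, 12, all in units of half a wavelength.
positions = [1, 2, 3, 4, 8, 12]
lags = difference_coarray(positions)
half_width = contiguous_half_width(lags)
```

For this made-up geometry, six physical sensors produce a hole-free set of lags from -11 to 11, which is the property coarray-domain methods rely on when estimating more sources than sensors.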

ieee transaction, signal processing, subarray, (12 more...)

2408.03236

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.68)

PENDRAM: Enabling High-Performance and Energy-Efficient Processing of Deep Neural Networks through a Generalized DRAM Data Mapping Policy

Putra, Rachmad Vidya Wicaksana, Hanif, Muhammad Abdullah, Shafique, Muhammad

arXiv.org Artificial Intelligence, Aug-5-2024

Convolutional Neural Networks (CNNs), a prominent type of Deep Neural Networks (DNNs), have emerged as a state-of-the-art solution for solving machine learning tasks. To improve the performance and energy efficiency of CNN inference, the employment of specialized hardware accelerators is prevalent. However, CNN accelerators still face performance- and energy-efficiency challenges due to high off-chip memory (DRAM) access latency and energy, which are especially crucial for latency- and energy-constrained embedded applications. Moreover, different DRAM architectures have different profiles of access latency and energy, thus making it challenging to optimize them for high-performance and energy-efficient CNN accelerators. To address this, we present PENDRAM, a novel design space exploration methodology that enables high-performance and energy-efficient CNN acceleration through a generalized DRAM data mapping policy. Specifically, it explores the impact of different DRAM data mapping policies and DRAM architectures across different CNN partitioning and scheduling schemes on the DRAM access latency and energy, then identifies the Pareto-optimal design choices. The experimental results show that our DRAM data mapping policy improves the energy-delay-product of DRAM accesses in the CNN accelerator over other mapping policies by up to 96%. In this manner, our PENDRAM methodology offers high-performance and energy-efficient CNN acceleration under any given DRAM architecture for diverse embedded AI applications.
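
Two notions in this abstract, the energy-delay product and Pareto-optimal design choices, can be made concrete with a small sketch. The design names and the latency and energy figures below are invented for illustration and do not come from the paper. The helper names are hypothetical as well.

```python
def energy_delay_product(latency, energy):
    """EDP is the product of energy and delay; lower is better."""
    return latency * energy


def pareto_front(designs):
    """Return names of designs not dominated in (latency, energy).

    A design is dominated if another design is no worse in both
    metrics and strictly better in at least one.
    """
    front = []
    for name, lat, en in designs:
        dominated = any(
            (l2 <= lat and e2 <= en) and (l2 < lat or e2 < en)
            for _, l2, e2 in designs
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical design points: (name, latency, energy) in arbitrary units.
designs = [
    ("A", 10.0, 5.0),
    ("B", 8.0, 7.0),
    ("C", 12.0, 6.0),   # worse than A in both metrics
    ("D", 6.0, 9.0),
]
front = pareto_front(designs)
best_by_edp = min(designs, key=lambda d: energy_delay_product(d[1], d[2]))[0]
```

The Pareto front keeps every trade-off that is not strictly beaten, while the EDP collapses the two metrics into one number when a single choice is needed.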

architecture, dram architecture, mapping policy, (15 more...)

2408.02412

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.05)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks

Afifi, Salma, Thakkar, Ishan, Pasricha, Sudeep

arXiv.org Artificial Intelligence, Jul-17-2024

Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains when compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing in-memory (PIM) and near-memory computing (NMC) are promising solutions to accelerating transformers as they offer high compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and massive amounts of data that need to be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer models. Through employing minimal changes to the conventional DRAM arrays, ARTEMIS efficiently alleviates the costs associated with transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulations using a novel in-DRAM metal-on-metal capacitor. Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
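
Stochastic computing, which this abstract names as the basis for multiplication, can be sketched in software. The version below assumes the common unipolar encoding, where a value in [0, 1] is the probability of a 1 in a random bitstream and a bitwise AND of two independent streams approximates their product. Whether ARTEMIS uses this exact encoding is not stated in the abstract, and the function names, stream length, and seed are hypothetical.

```python
import random


def to_bitstream(value, length, rng):
    """Encode value in [0, 1] as a list of random bits with P(1) = value."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def stochastic_multiply(a, b, length=10000, seed=0):
    """Approximate a * b by ANDing two independent unipolar bitstreams."""
    rng = random.Random(seed)          # seeded for reproducibility
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length               # fraction of 1s estimates the product


approx = stochastic_multiply(0.5, 0.4)   # true product is 0.2
```

The result is an estimate rather than an exact product, and its error shrinks as the stream gets longer, which is the usual trade-off between precision and latency in this style of arithmetic.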

accelerator, architecture, operation, (17 more...)

2407.12638

Country: Asia > Middle East > Oman > Al Wusta Governorate > Haima (0.05)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration

Sunny, Febin, Shafiee, Amin, Balasubramaniam, Abhishek, Nikdast, Mahdi, Pasricha, Sudeep

arXiv.org Artificial Intelligence, Jul-11-2024

Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.

architecture, opima, operation, (17 more...)

2407.08205

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Colorado > Larimer County > Fort Collins (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.70)

Industry:

Energy (1.00)
Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
