DNN architecture




Using Random Effects to Account for High-Cardinality Categorical Features and Repeated Measures in Deep Neural Networks

Neural Information Processing Systems

Figure 2: Real-data predicted vs. true results and category-size distribution.

Software: Python 3.8 with the NumPy and Pandas suite, plus Keras and TensorFlow. Code is fully available in the lmmnn package on GitHub; see the package README for running details.

Simulations: n = 100,000. At each run, 80% of the simulated data (80,000 samples) is used as the training set, of which 10% (8,000 samples) is used as a validation set which the network only uses to check for early stopping. An embedding layer maps the q category levels to a vector of dimension d = 0.1q, so the input dimension is p + d.

Physical activity (PA) definition: subjects wore an accelerometer on their wrist for 7 days, and ENMO in milli-g was summarised across valid wear-time. ETL: we follow the instructions of Pearce et al. (2020), implemented in R. At a high level, "once a week" is converted to 1 and "every day" is converted to 7. Finally, the PA dependent variable is standardized.

Baseline DNN architecture: Pearce et al. did not use DNNs, but two separate linear regressions, for men and women. The baseline DNN uses two hidden layers of 10 and 5 neurons with ReLU activation, followed by a single output neuron with no activation.
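The data-split and embedding-size conventions described above can be sketched as follows. This is an illustrative reconstruction, not the authors' lmmnn code; the function names are my own:

```python
import numpy as np

# Sketch of the experiment setup described above: an 80% training split,
# of which 10% is held out for early-stopping validation, and an embedding
# output dimension of d = 0.1 * q for a q-level categorical feature.

def split_sizes(n):
    """Return (train, validation, test) sizes under the 80/10 scheme."""
    n_train_full = int(0.8 * n)      # 80% of the data goes to training
    n_val = int(0.1 * n_train_full)  # 10% of that is the validation set
    return n_train_full - n_val, n_val, n - n_train_full

def embedding_dim(q):
    """Embedding dimension d = 0.1 * q (at least 1) for q category levels."""
    return max(1, int(0.1 * q))

train, val, test = split_sizes(100_000)  # -> (72000, 8000, 20000)
```

With n = 100,000 this reproduces the counts quoted above: 80,000 training samples, of which 8,000 serve as the validation set.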



A special scenario of interest is that of repeated measures, where the categorical feature is the identity of the individual or object, and each object is measured several times, possibly under different conditions (values of the other features).
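The repeated-measures setting can be made concrete with a small simulation: each subject contributes several measurements, and a subject-specific random effect shifts all of that subject's responses. The linear fixed effect and all parameter values here are illustrative assumptions, not the paper's model:

```python
import numpy as np

# Minimal repeated-measures simulation: a high-cardinality subject ID,
# several measurements per subject, and a per-subject random intercept b_i
# shared by all of that subject's observations.

rng = np.random.default_rng(0)
n_subjects, per_subject = 50, 10
subject = np.repeat(np.arange(n_subjects), per_subject)  # subject identity
x = rng.normal(size=subject.size)                        # other features
b = rng.normal(scale=1.0, size=n_subjects)               # random intercepts
y = 2.0 * x + b[subject] + rng.normal(scale=0.1, size=subject.size)

# Responses within a subject share b_i, so residuals vary far less within
# a subject than across the whole sample:
resid = y - 2.0 * x
within_var = resid.reshape(n_subjects, per_subject).var(axis=1).mean()
total_var = resid.var()
```

The gap between `within_var` and `total_var` is exactly the between-subject variance that a random-effects treatment of the subject ID is meant to capture.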



Adaptive Fine-Tuning via Pattern Specialization for Deep Time Series Forecasting

Saadallah, Amal, Al-Ademi, Abdulaziz

arXiv.org Artificial Intelligence

Time series forecasting poses significant challenges in non-stationary environments where underlying patterns evolve over time. In this work, we propose a novel framework that enhances deep neural network (DNN) performance by leveraging specialized model adaptation and selection. Initially, a base DNN is trained offline on historical time series data. A reserved validation subset is then segmented to extract and cluster the most dominant patterns within the series, thereby identifying distinct regimes. For each identified cluster, the base DNN is fine-tuned to produce a specialized version that captures unique pattern characteristics. At inference, the most recent input is matched against the cluster centroids, and the corresponding fine-tuned version is deployed based on the closest similarity measure. Additionally, our approach integrates a concept drift detection mechanism to identify and adapt to emerging patterns caused by non-stationary behavior. The proposed framework is generalizable across various DNN architectures and has demonstrated significant performance gains on both traditional DNNs and recent advanced architectures implemented in the GluonTS library.
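The deployment step described above (match the most recent input window to the nearest cluster centroid, then dispatch to the corresponding specialized model) can be sketched as follows. The clustering, the toy "specialists", and all names are stand-ins, not the paper's implementation, which fine-tunes DNNs and uses GluonTS architectures:

```python
import numpy as np

# Nearest-centroid model selection: the latest window is compared to the
# cluster centroids extracted from the validation subset, and the fine-tuned
# model associated with the closest centroid is deployed.

def nearest_centroid(window, centroids):
    """Index of the centroid closest to the window in Euclidean distance."""
    dists = np.linalg.norm(centroids - window, axis=1)
    return int(np.argmin(dists))

# Two toy "regimes" with a stand-in specialist model for each.
centroids = np.array([[0.0, 0.0, 0.0],
                      [5.0, 5.0, 5.0]])
specialists = {0: lambda w: w.mean(),        # specialist for regime 0
               1: lambda w: w.mean() + 1.0}  # specialist for regime 1

window = np.array([4.8, 5.2, 5.1])           # most recent input window
k = nearest_centroid(window, centroids)
forecast = specialists[k](window)
```

A drift detector would extend this by flagging windows far from every centroid, triggering re-clustering and fine-tuning of a new specialist.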


Industry Insights from Comparing Deep Learning and GBDT Models for E-Commerce Learning-to-Rank

Lutz, Yunus, Wilm, Timo, Duwe, Philipp

arXiv.org Artificial Intelligence

In e-commerce recommender and search systems, tree-based models, such as LambdaMART, have set a strong baseline for Learning-to-Rank (LTR) tasks. Despite their effectiveness and widespread adoption in industry, the debate continues whether deep neural networks (DNNs) can outperform traditional tree-based models in this domain. To contribute to this discussion, we systematically benchmark DNNs against our production-grade LambdaMART model. We evaluate multiple DNN architectures and loss functions on a proprietary dataset from OTTO and validate our findings through an 8-week online A/B test. The results show that a simple DNN architecture outperforms a strong tree-based baseline in terms of total clicks and revenue, while achieving parity in total units sold.
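For readers unfamiliar with the loss functions involved, a pairwise logistic (RankNet-style) loss is one of the standard choices when training DNNs for Learning-to-Rank. This is a generic textbook formulation, not OTTO's production loss:

```python
import numpy as np

# Pairwise logistic ranking loss: for every item pair (i, j) where i is
# more relevant than j, penalise log(1 + exp(-(s_i - s_j))), so the model
# is pushed to score relevant items above less relevant ones.

def pairwise_logistic_loss(scores, relevance):
    """Mean log-loss over all ordered pairs with a relevance gap."""
    s_diff = scores[:, None] - scores[None, :]          # s_i - s_j
    more_rel = relevance[:, None] > relevance[None, :]  # i preferred over j
    return np.log1p(np.exp(-s_diff[more_rel])).mean()

scores = np.array([2.0, 0.5, -1.0])  # model scores for 3 items
relevance = np.array([2, 1, 0])      # graded relevance labels
loss = pairwise_logistic_loss(scores, relevance)
```

Scoring items in the correct relevance order yields a lower loss than the reversed order, which is the property the ranker is trained to exploit.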


Graph-CNNs for RF Imaging: Learning the Electric Field Integral Equations

Stylianopoulos, Kyriakos, Gavriilidis, Panagiotis, Gradoni, Gabriele, Alexandropoulos, George C.

arXiv.org Artificial Intelligence

Radio-Frequency (RF) imaging concerns the digital recreation of the surfaces of scene objects based on the scattered field at distributed receivers. To solve these difficult inverse scattering problems, data-driven methods are often employed that extract patterns from similar training examples, while offering minimal latency. In this paper, we first provide an approximate yet fast electromagnetic model, which is based on the electric field integral equations, for data generation, and subsequently propose a Deep Neural Network (DNN) architecture to learn the corresponding inverse model. A graph-attention backbone allows for the system geometry to be passed to the DNN, where residual convolutional layers extract features about the objects, while a UNet head performs the final image reconstruction. Our quantitative and qualitative evaluations on two synthetic data sets of different characteristics showcase the performance gains of the proposed advanced architecture and its relative resilience to signal noise levels and various reception configurations.
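The flavour of an EFIE-based data generator can be sketched with a toy discretised forward model: the scattered field at each receiver is a Green's-function-weighted sum of contributions from the scene pixels. The geometry, wavenumber, and the linearised (Born-type) approximation are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

# Toy forward scattering model: propagate the incident field through the
# per-pixel contrast and a scalar free-space kernel to the receivers.

def greens_2d(src, obs, k=2 * np.pi):
    """Scalar kernel exp(ikr)/r between all source/observer point pairs."""
    r = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=-1)
    return np.exp(1j * k * r) / r

rng = np.random.default_rng(1)
pixels = rng.uniform(0, 1, size=(64, 2))         # scene discretisation
receivers = np.array([[3.0, y] for y in np.linspace(0, 1, 8)])
contrast = rng.uniform(0, 1, size=64)            # object contrast per pixel
e_inc = np.ones(64, dtype=complex)               # unit-amplitude incident field

G = greens_2d(pixels, receivers)                 # (8, 64) propagation matrix
e_scat = G @ (contrast * e_inc)                  # scattered field at receivers
```

The inverse problem the DNN learns is recovering `contrast` from `e_scat`, and the graph-attention backbone is what lets the receiver geometry (`receivers` here) be supplied to the network rather than baked into it.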


Towards a robust R2D2 paradigm for radio-interferometric imaging: revisiting DNN training and architecture

Aghabiglou, Amir, Chu, Chung San, Tang, Chao, Dabbech, Arwa, Wiaux, Yves

arXiv.org Artificial Intelligence

The R2D2 Deep Neural Network (DNN) series was recently introduced for image formation in radio interferometry. It can be understood as a learned version of CLEAN, whose minor cycles are substituted with DNNs. We revisit R2D2 on the grounds of series convergence, training methodology, and DNN architecture, improving its robustness in terms of generalisability beyond training conditions, capability to deliver high data fidelity, and epistemic uncertainty. Firstly, while still focusing on telescope-specific training, we enhance the learning process by randomising Fourier sampling integration times, incorporating multi-scan multi-noise configurations, and varying imaging settings, including pixel resolution and visibility-weighting scheme. Secondly, we introduce a convergence criterion whereby the reconstruction process stops when the data residual is compatible with noise, rather than simply using all available DNNs. This not only increases the reconstruction efficiency by reducing its computational cost, but also refines training by pruning out the data/image pairs for which optimal data fidelity is reached before training the next DNN. Thirdly, we substitute R2D2's early U-Net DNN with a novel architecture (U-WDSR) combining U-Net and WDSR, which leverages wide activation, dense connections, weight normalisation, and low-rank convolution to improve feature reuse and reconstruction precision. As before, R2D2 was trained for monochromatic intensity imaging with the Very Large Array (VLA) at a fixed $512 \times 512$ image size. Simulations on a wide range of inverse problems and a case study on real data reveal that the new R2D2 model consistently outperforms its earlier version in image reconstruction quality, data fidelity, and epistemic uncertainty.
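The learned-CLEAN iteration and the noise-based stopping rule can be sketched schematically: each step adds a "network" correction computed from the current back-projected residual, and the loop stops once the data residual is compatible with the noise level. The toy measurement operator and the stand-in "DNN" are illustrative assumptions, not the R2D2 models:

```python
import numpy as np

# Schematic R2D2-style reconstruction loop with a noise-based convergence
# criterion: stop as soon as the data residual norm drops to the expected
# noise norm, instead of running every network in the series.

rng = np.random.default_rng(2)
n = 32
Phi, _ = np.linalg.qr(rng.normal(size=(n, n)))  # toy measurement operator
x_true = rng.normal(size=n)
noise_level = 0.01
y = Phi @ x_true + noise_level * rng.normal(size=n)

def dnn_stub(residual_image):
    # Stand-in for a trained DNN: a damped step along the residual image.
    return 0.5 * residual_image

x = np.zeros(n)
for _ in range(100):                            # at most 100 "networks"
    data_residual = y - Phi @ x
    if np.linalg.norm(data_residual) <= noise_level * np.sqrt(n):
        break                                   # residual compatible with noise
    residual_image = Phi.T @ data_residual      # back-projected residual
    x = x + dnn_stub(residual_image)            # learned "minor cycle" update
```

Because the loop exits as soon as the residual reaches the noise floor, later networks in the series are skipped for easy inverse problems, which is precisely the efficiency and training-pruning argument made above.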


Integrating Optimization Theory with Deep Learning for Wireless Network Design

Coleri, Sinem, Onalan, Aysun Gurur, di Renzo, Marco

arXiv.org Artificial Intelligence

Traditional wireless network design relies on optimization algorithms derived from domain-specific mathematical models, which are often inefficient and unsuitable for dynamic, real-time applications due to high complexity. Deep learning has emerged as a promising alternative to overcome complexity and adaptability concerns, but it faces challenges such as accuracy issues, delays, and limited interpretability due to its inherent black-box nature. This paper introduces a novel approach that integrates optimization theory with deep learning methodologies to address these issues. The methodology starts by constructing the block diagram of the optimization theory-based solution, identifying key building blocks corresponding to optimality conditions and iterative solutions. Selected building blocks are then replaced with deep neural networks, enhancing the adaptability and interpretability of the system. Extensive simulations show that this hybrid approach not only reduces runtime compared to optimization theory based approaches but also significantly improves accuracy and convergence rates, outperforming pure deep learning models.
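The hybrid design described above can be sketched on a toy problem: keep an iterative optimisation solver as the outer structure, and replace one building block with a stand-in "learned" module. The sum-rate-style power-allocation problem, the projection block, and the stub network are illustrative assumptions, not the paper's specific formulation:

```python
import numpy as np

# Hybrid optimisation/DL sketch: a projected-iteration solver where the
# gradient/step block (derived from the optimality conditions) is the part
# one would replace with a DNN, while the feasibility projection is kept
# as a classical optimisation building block.

gains = np.array([1.0, 0.5, 0.25])  # channel gains
P_total = 3.0                       # total power budget

def project_simplex(p, budget):
    """Project onto {p >= 0, sum(p) = budget} (classical block, kept)."""
    u = np.sort(p)[::-1]
    css = np.cumsum(u) - budget
    rho = np.nonzero(u * np.arange(1, p.size + 1) > css)[0][-1]
    return np.maximum(p - css[rho] / (rho + 1.0), 0.0)

def learned_step(p):
    # Stand-in for the DNN-replaced block; here it is just the ascent step
    # for the sum-rate objective sum_i log(1 + g_i * p_i).
    return p + 0.5 * gains / (1.0 + gains * p)

p = np.full(3, 1.0)
for _ in range(200):
    p = project_simplex(learned_step(p), P_total)

rate = np.sum(np.log1p(gains * p))  # achieved sum rate
```

The iteration converges to the water-filling-style allocation (most power to the strongest channel), illustrating how the classical solver structure constrains and interprets the learned component.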