Supplementary Material: "Fast Bayesian Estimation of Point Process Intensity as Function of Covariates"

Neural Information Processing Systems

Current affiliation is Yokohama City University.

We detail the derivation of the predictive covariance shown in (19-20), and the derivation of the marginal likelihood, p(D), shown in (23). Finally, we obtain the marginal likelihood in a tractable form, log p(D) = log |Z| - (1/2) log |I ...|. We also detail the derivation of the functional determinant of the equivalent kernel, |H|, when the naive and degenerate approaches are applied.

S4.1 Naive Approach. The equivalent kernel is constructed under the naive approach as h(y, y'). Mercer's theorem [5] states that a kernel function of finite rank M has a diagonal representation such that k(y, y') ...

S5.1 Model Configuration. Augmented Permanental Process (APP): let the number of samples for the quasi-Monte Carlo method be denoted by J, and the ranks of the approximate kernel function for the random feature map [6] and the Nyström approximation [8, 9] be denoted by M. We employed a popular gradient descent algorithm, Adam [4], to perform the minimization problem (see Section 2.2). B was set to 10 in the experiments.
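The supplement refers to rank-M approximate kernels built from random feature maps and the Nyström method. A minimal sketch of a Nyström feature map for an RBF kernel is below; it is illustrative only, and the `gamma` bandwidth and the choice of landmark points are assumptions, not the paper's exact construction:

```python
import numpy as np

def nystrom_features(X, landmarks, gamma=1.0):
    """Rank-M Nystrom feature map Phi such that Phi @ Phi.T approximates
    the RBF kernel matrix K(X, X).  M = number of landmark points."""
    def rbf(A, B):
        # squared Euclidean distances between all pairs of rows
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    Kmm = rbf(landmarks, landmarks)            # M x M kernel on landmarks
    U, s, _ = np.linalg.svd(Kmm)               # Kmm is symmetric PSD
    whiten = U / np.sqrt(np.maximum(s, 1e-12)) # Kmm^{-1/2} up to clamping
    return rbf(X, landmarks) @ whiten          # N x M feature matrix
```

When the landmarks are the data points themselves, the approximation is exact; with M << N landmarks it trades accuracy for a much cheaper rank-M representation.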


Edge-Based Speech Transcription and Synthesis for Kinyarwanda and Swahili Languages

Mbonimpa, Pacome Simon, Tuyizere, Diane, Biyabani, Azizuddin Ahmed, Tonguz, Ozan K.

arXiv.org Artificial Intelligence

Abstract--This paper presents a novel framework for speech transcription and synthesis, leveraging edge-cloud parallelism to enhance processing speed and accessibility for Kinyarwanda and Swahili speakers. It addresses the scarcity of powerful language processing tools for these widely spoken languages in East African countries with limited technological infrastructure. The framework utilizes the Whisper and SpeechT5 pre-trained models to enable speech-to-text (STT) and text-to-speech (TTS) translation. The architecture uses a cascading mechanism that distributes the model inference workload between the edge device and the cloud, thereby reducing latency and resource usage, benefiting both ends. On the edge device, our approach achieves a memory usage compression of 9.5% for the SpeechT5 model and 14% for the Whisper model, with a maximum memory usage of 149 MB. Experimental results indicate that on a 1.7 GHz CPU edge device with 1 MB/s network bandwidth, the system can process a 270-character text in less than a minute for both speech-to-text and text-to-speech transcription. Using real-world survey data from Kenya, we show that the proposed cascaded edge-cloud architecture could serve as an excellent platform for STT and TTS transcription with good accuracy and response time.

I. INTRODUCTION
In today's digital age, the need for accurate and efficient speech transcription and synthesis models has been increasing rapidly. These models play an important role in a variety of applications, such as learning new languages, accessibility tools for people with reading or hearing difficulties, and automated voice assistants [1]. Kinyarwanda and Swahili are two of the local languages spoken in East Africa. Swahili is the most widely spoken language in Eastern Africa, with estimates of its speakers ranging from 60 million to over 150 million [2].
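The cascading idea (choose between on-device and cloud inference based on input size and network conditions) can be sketched with a toy latency model. All rates below (`edge_cps`, `cloud_cps`, `payload_bytes_per_char`) are made-up illustrative numbers, not measurements or policy from the paper:

```python
def choose_backend(n_chars, bandwidth_mbs, edge_cps=5.0, cloud_cps=50.0,
                   payload_bytes_per_char=200):
    """Pick the backend with the lower estimated end-to-end latency.

    Hypothetical cost model: the edge model processes edge_cps chars/s
    with no network cost; the cloud model is 10x faster but the audio
    payload must first be uploaded at bandwidth_mbs MB/s.
    """
    edge_latency = n_chars / edge_cps                                  # seconds on-device
    upload = n_chars * payload_bytes_per_char / (bandwidth_mbs * 1_000_000)
    cloud_latency = upload + n_chars / cloud_cps                       # upload + inference
    return "edge" if edge_latency <= cloud_latency else "cloud"
```

Under these toy numbers, a 270-character request over a healthy 1 MB/s link goes to the cloud, while the same request over a badly degraded link stays on the edge device.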


Large Scale Canonical Correlation Analysis with Iterative Least Squares

Yichao Lu, Dean P. Foster

Neural Information Processing Systems

Canonical Correlation Analysis (CCA) is a widely used statistical tool with both well-established theory and favorable performance for a wide range of machine learning problems. However, computing CCA for huge datasets can be very slow since it involves computing a QR decomposition or singular value decomposition of huge matrices. In this paper we introduce L-CCA, an iterative algorithm which can compute CCA fast on huge sparse datasets. Theory on both the asymptotic convergence and finite-time accuracy of L-CCA is established. The experiments also show that L-CCA outperforms other fast CCA approximation schemes on two real datasets.
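The iterative least-squares idea can be sketched as an alternating regression for the top canonical pair; this is power iteration in disguise, and it is illustrative only, not the paper's L-CCA (which adds randomization and an explicit error analysis):

```python
import numpy as np

def cca_als(X, Y, iters=300, seed=0):
    """Top canonical pair via alternating least squares.

    Alternately regress the current Y-projection on X and the current
    X-projection on Y, renormalizing each time; the fixed point is the
    leading canonical direction pair.  Assumes X, Y are column-centered.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(Y.shape[1])    # random start for Y's direction
    u = None
    for _ in range(iters):
        u, *_ = np.linalg.lstsq(X, Y @ v, rcond=None)  # regress Yv on X
        u /= np.linalg.norm(X @ u)                     # unit-norm projection
        v, *_ = np.linalg.lstsq(Y, X @ u, rcond=None)  # regress Xu on Y
        v /= np.linalg.norm(Y @ v)
    return u, v, float((X @ u) @ (Y @ v))  # last value = canonical correlation
```

Each iteration costs only two least-squares solves, which is what makes this style of method attractive on huge sparse matrices where a full QR or SVD is infeasible.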


A Unified Optimization Framework for Multiclass Classification with Structured Hyperplane Arrangements

Blanco, Víctor, Kothari, Harshit, Luedtke, James

arXiv.org Artificial Intelligence

In this paper, we propose a new mathematical optimization model for multiclass classification based on arrangements of hyperplanes. Our approach preserves the core support vector machine (SVM) paradigm of maximizing class separation while minimizing misclassification errors, and it is computationally more efficient than a previous formulation. We present a kernel-based extension that allows it to construct nonlinear decision boundaries. Furthermore, we show how the framework can naturally incorporate alternative geometric structures, including classification trees, $\ell_p$-SVMs, and models with discrete feature selection. To address large-scale instances, we develop a dynamic clustering matheuristic that leverages the proposed mixed-integer programming (MIP) formulation. Extensive computational experiments demonstrate the efficiency of the proposed model and dynamic clustering heuristic, and we report competitive classification performance on both synthetic datasets and real-world benchmarks from the UCI Machine Learning Repository, comparing our method with state-of-the-art implementations available in scikit-learn.
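The arrangement idea can be illustrated with a toy: a fixed set of hyperplanes (W, b) partitions the space into cells identified by sign patterns, and each cell is assigned the majority class of the training points it contains. The paper's contribution is to *optimize* (W, b) and the cell-to-class assignment jointly via MIP; here the hyperplanes are simply given:

```python
import numpy as np
from collections import Counter, defaultdict

def sign_patterns(X, W, b):
    # each row of W and entry of b defines a hyperplane w.x + b = 0;
    # a point's cell is the tuple of signs across all hyperplanes
    return [tuple(np.sign(W @ x + b).astype(int)) for x in X]

def fit_cells(X, y, W, b):
    # assign each nonempty cell the majority class of its training points
    cells = defaultdict(list)
    for pat, label in zip(sign_patterns(X, W, b), y):
        cells[pat].append(label)
    return {pat: Counter(labels).most_common(1)[0][0]
            for pat, labels in cells.items()}

def predict(X, cell_labels, W, b, default=None):
    return [cell_labels.get(pat, default) for pat in sign_patterns(X, W, b)]
```

With p hyperplanes the arrangement has at most O(n^p) cells, far more class regions than a single one-vs-rest SVM layer provides, which is what makes the structure expressive for multiclass problems.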



R2: Explain the identity inf_{c ∈ [0,1]} ½(c + a/c) ≤ √a + a/2

Neural Information Processing Systems

We thank the reviewers for the feedback. Please find our responses to the other comments/queries below. "Please be specific about what part of [31] is being referenced in line 166." It is Theorem 6.7 on page 336 of [31]; we will be more precise in the final version. We mentioned the gist of the algorithms in lines 66-69.



Accelerating Particle-based Energetic Variational Inference

Bao, Xuelian, Kang, Lulu, Liu, Chun, Wang, Yiwei

arXiv.org Machine Learning

In this work, we propose a novel particle-based variational inference (ParVI) method that accelerates the EVI-Im, proposed in Ref. [41]. Inspired by energy quadratization (EQ) and operator splitting techniques for gradient flows, our approach efficiently drives particles towards the target distribution. Unlike EVI-Im, which employs the implicit Euler method to solve variational-preserving particle dynamics for minimizing the KL divergence, derived using a "discretize-then-variational" approach, the proposed algorithm avoids repeated evaluation of inter-particle interaction terms, significantly reducing computational cost. The framework is also extensible to other gradient-based sampling techniques. Through several numerical experiments, we demonstrate that our method outperforms existing ParVI approaches in efficiency, robustness, and accuracy.
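For context, the kind of particle flow being accelerated can be illustrated with an explicit-Euler SVGD-style update, a standard ParVI baseline; this is not the proposed EQ/operator-splitting scheme, and the step size and bandwidth below are arbitrary illustrative choices:

```python
import numpy as np

def svgd_step(X, grad_logp, h=0.5, eps=0.05):
    """One explicit-Euler step of a kernelized particle flow (SVGD-style).

    Particles are attracted toward high-density regions by the
    kernel-smoothed score, and repelled from each other by the kernel
    gradient, which keeps them spread over the target distribution.
    """
    diff = X[:, None, :] - X[None, :, :]              # (n, n, d) pairwise x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))   # RBF kernel matrix
    G = grad_logp(X)                                  # score grad log p at each particle
    repulse = (diff * K[:, :, None]).sum(axis=1) / h ** 2
    phi = (K @ G + repulse) / X.shape[0]              # Stein variational direction
    return X + eps * phi
```

Each step re-evaluates all O(n^2) inter-particle kernel terms; avoiding that repeated evaluation is exactly the cost the abstract says the proposed algorithm reduces.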


Solving the Best Subset Selection Problem via Suboptimal Algorithms

Singh, Vikram, Sun, Min

arXiv.org Machine Learning

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the global optimal solution via an exact optimization method for a problem with dimensions in the thousands may take an impractical amount of CPU time. This suggests the importance of finding suboptimal procedures that can provide good approximate solutions using much less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms to solve the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insights into the performance of these methods in different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.
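A classic example of such a suboptimal procedure is iterative hard thresholding: a gradient step on the least-squares loss followed by projection onto k-sparse vectors. The sketch below is illustrative of the problem class only, not the procedure proposed in the paper:

```python
import numpy as np

def iht_subset(X, y, k, iters=500):
    """Iterative hard thresholding for k-sparse least squares.

    Alternates a gradient step on ||y - X beta||^2 / 2 with projection
    onto the (nonconvex) set of k-sparse vectors by keeping only the k
    largest-magnitude coefficients.
    """
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)    # 1/L, L = largest eig of X^T X
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        beta = beta + step * (X.T @ (y - X @ beta))   # gradient step
        small = np.argsort(np.abs(beta))[:-k]         # all but the k largest
        beta[small] = 0.0                             # hard-threshold to k-sparse
    return beta
```

Each iteration costs two matrix-vector products, so thousands of dimensions remain cheap; the price is that the method may stop at a good but not globally optimal subset, which is precisely the exact-versus-suboptimal trade-off the abstract describes.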