AITopics | scale data

scale data

Reviews: Practical Deep Learning with Bayesian Principles

Neural Information Processing Systems, Jan-26-2025, 15:05:14 GMT

The paper demonstrates that the Variational Online Gauss-Newton (VOGN) method of Khan et al. (2018) can be successfully scaled to deep learning architectures. The authors demonstrate the scalability of Bayesian methods to large scale data such as ImageNet and provide extensive experiments on large scale data and models. The main result is an adaptation of an existing method (VOGN) that makes it practical for deep learning.

bayesian principle, practical deep learning, scale data, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Accelerated sparse Kernel Spectral Clustering for large scale data clustering problems

Novak, Mihaly, Langone, Rocco, Alzate, Carlos, Suykens, Johan

arXiv.org Artificial Intelligence, Oct-20-2023

An improved version of the sparse multiway kernel spectral clustering (KSC) algorithm is presented in this brief. The original algorithm is derived from weighted kernel principal component analysis (KPCA) formulated within the primal-dual least-squares support vector machine (LS-SVM) framework. Sparsity is then achieved by combining the incomplete Cholesky decomposition (ICD) based low rank approximation of the kernel matrix with the so-called reduced set method. The original ICD based sparse KSC algorithm was reported to be computationally far too demanding, especially when applied to the large scale data clustering problems that it was actually designed for, which has so far prevented it from gaining more than theoretical relevance. This is altered by the modifications reported in this brief, which drastically improve the computational characteristics. Solving the alternative, symmetrized version of the computationally most demanding core eigenvalue problem eliminates the need to form large matrices and compute their SVD during model construction. As a result, clustering problems that were reported to require hours are now solved within seconds, without altering the results. Furthermore, sparsity is also improved significantly, leading to a more compact model representation, which further increases not only the computational efficiency but also the descriptive power. These changes make the original, only theoretically relevant ICD based sparse KSC algorithm applicable to large scale practical clustering problems. Theoretical results and improvements are demonstrated by computational experiments on carefully selected synthetic data as well as on real life problems such as image segmentation.

algorithm, alzate and suyken, matrix, (10 more...)

2310.13381

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.05)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

GPU accelerated matrix factorization of large scale data using block based approach

Bhavana, Prasad, Padmanabhan, Vineet

arXiv.org Artificial Intelligence, Jan-2-2023

Matrix Factorization (MF) on large scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs requires alternative techniques that not only allow parallelism but also address memory limitations. Synchronization between computational units, isolation of data related to a computational unit, sharing of data between computational units and identification of independent tasks among computational units are some of the challenges in leveraging GPUs for MF. We propose a block based approach to matrix factorization using Stochastic Gradient Descent (SGD) that is aimed at accelerating MF on GPUs. The primary motivation for the approach is to make it viable to factorize extremely large data sets on limited hardware without having to compromise on results. The approach addresses factorization of large scale data by identifying independent blocks, each of which is factorized in parallel using multiple computational units. The approach can be extended to one or more GPUs and even to distributed systems. The RMSE results of the block based approach are within an acceptable delta of the results of the CPU based variant and the multi-threaded CPU variant of a similar SGD kernel implementation. The speed advantage of the block based variant over the other variants is significant.
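The idea of factorizing independent blocks can be illustrated with a small CPU-only sketch. The abstract does not say how the blocks are identified, so the scheme below is an assumption: the rating matrix is cut into an `n_parts` x `n_parts` grid, and blocks on a shifted diagonal (a "stratum") share no user rows and no item columns. All function names, the toy ratings and the hyperparameters are hypothetical, and the blocks are processed sequentially here rather than on GPU computational units.

```python
import random

def partition_blocks(ratings, n_users, n_items, n_parts):
    """Group (user, item, rating) triples into an n_parts x n_parts grid."""
    blocks = {}
    for u, i, r in ratings:
        p = u * n_parts // n_users   # row group of this user
        q = i * n_parts // n_items   # column group of this item
        blocks.setdefault((p, q), []).append((u, i, r))
    return blocks

def independent_strata(n_parts):
    """Each stratum lists blocks that share no row group and no column group."""
    return [[(p, (p + s) % n_parts) for p in range(n_parts)]
            for s in range(n_parts)]

def sgd_block(block, P, Q, lr, reg):
    """Run one SGD pass over the ratings of a single block, updating P and Q in place."""
    for u, i, r in block:
        pu, qi = P[u], Q[i]
        err = r - sum(a * b for a, b in zip(pu, qi))
        for k in range(len(pu)):
            pu_k = pu[k]  # keep the old value for the item update
            pu[k] += lr * (err * qi[k] - reg * pu_k)
            qi[k] += lr * (err * pu_k - reg * qi[k])

def rmse(ratings, P, Q):
    sq = [(r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
          for u, i, r in ratings]
    return (sum(sq) / len(sq)) ** 0.5

def factorize(ratings, n_users, n_items, rank=2, n_parts=2,
              epochs=300, lr=0.02, reg=0.01, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    blocks = partition_blocks(ratings, n_users, n_items, n_parts)
    for _ in range(epochs):
        for stratum in independent_strata(n_parts):
            # Blocks in one stratum touch disjoint rows of P and Q, so a
            # parallel implementation could update them at the same time.
            for key in stratum:
                sgd_block(blocks.get(key, []), P, Q, lr, reg)
    return P, Q

# Toy data (hypothetical): 4 users, 4 items, ratings on a 1 to 5 scale.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0), (1, 0, 4.0),
           (1, 3, 1.0), (2, 1, 1.0), (2, 2, 5.0), (2, 3, 4.0),
           (3, 0, 1.0), (3, 2, 4.0), (3, 3, 5.0)]
P0, Q0 = factorize(ratings, 4, 4, epochs=0)   # untrained factors
P1, Q1 = factorize(ratings, 4, 4)             # trained factors
rmse_before = rmse(ratings, P0, Q0)
rmse_after = rmse(ratings, P1, Q1)
print(rmse_before, rmse_after)
```

Because the blocks of a stratum never write to the same factor rows, no synchronization is needed inside a stratum; only the switch from one stratum to the next requires a barrier.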

artificial intelligence, factorization, machine learning, (16 more...)

2304.13724

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(12 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

How to Scale Data for Long Short-Term Memory Networks in Python - Machine Learning Mastery

#artificialintelligence, Jul-12-2017, 18:15:22 GMT

The data for your sequence prediction problem probably needs to be scaled when training a neural network, such as a Long Short-Term Memory recurrent neural network. When a network is fit on unscaled data that has a range of values (e.g. ...) ... In this tutorial, you will discover how to normalize and standardize your sequence prediction data and how to decide which to use for your input and output variables. There are two types of scaling of your series that you may want to consider: normalization and standardization.
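The two scaling types named in this excerpt can be sketched in plain Python. This is a minimal illustration, not the tutorial's own code: the helper names and the toy series are hypothetical, normalization is taken to mean rescaling to the range 0 to 1, and standardization is taken to mean rescaling to zero mean and unit standard deviation. The scaling parameters are fitted once and reused, so that new data and predictions can be mapped back to the original scale.

```python
from statistics import fmean, pstdev

def fit_minmax(train):
    """Return the (min, max) observed in the training series."""
    lo, hi = min(train), max(train)
    if lo == hi:
        raise ValueError("a constant series cannot be normalized")
    return lo, hi

def normalize(series, lo, hi):
    """Rescale so that lo maps to 0 and hi maps to 1."""
    return [(x - lo) / (hi - lo) for x in series]

def denormalize(scaled, lo, hi):
    """Invert normalize(), e.g. to report predictions in original units."""
    return [y * (hi - lo) + lo for y in scaled]

def fit_standard(train):
    """Return the (mean, population standard deviation) of the training series."""
    mu, sigma = fmean(train), pstdev(train)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return mu, sigma

def standardize(series, mu, sigma):
    """Rescale to zero mean and unit standard deviation."""
    return [(x - mu) / sigma for x in series]

def destandardize(scaled, mu, sigma):
    """Invert standardize()."""
    return [z * sigma + mu for z in scaled]

# Toy series (hypothetical values).
series = [10.0, 20.0, 30.0, 40.0, 50.0]

lo, hi = fit_minmax(series)
normalized = normalize(series, lo, hi)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]

mu, sigma = fit_standard(series)
standardized = standardize(series, mu, sigma)
print(standardized)
```

Fitting the parameters on the training portion only, and applying the same `lo`, `hi`, `mu` and `sigma` to later data, avoids leaking information from the test period into the model.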

artificial intelligence, deep learning, machine learning, (16 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Unsupervised Large Graph Embedding

Nie, Feiping (Northwestern Polytechnical University) | Zhu, Wei (Northwestern Polytechnical University) | Li, Xuelong (Chinese Academy of Sciences)

AAAI Conferences, Feb-14-2017

There are many successful spectral based unsupervised dimensionality reduction methods, including Laplacian Eigenmap (LE), Locality Preserving Projection (LPP), Spectral Regression (SR), etc. LPP and SR are two different linear spectral based methods; however, we discover that LPP and SR are equivalent if the symmetric similarity matrix is doubly stochastic, Positive Semi-Definite (PSD) and of rank p, where p is the reduced dimension. This discovery prompts us to seek a low-rank and doubly stochastic similarity matrix, and we then propose an unsupervised linear dimensionality reduction method called Unsupervised Large Graph Embedding (ULGE). ULGE starts with a similar idea to LPP: it adopts an efficient approach to construct the similarity matrix and then performs spectral analysis efficiently. The computational complexity can be reduced to O(ndm), which is a significant improvement over conventional spectral based methods, which need at least O(n^2d), where n, d and m are the number of samples, dimensions and anchors, respectively. Extensive experiments on several publicly available data sets demonstrate the efficiency and effectiveness of the proposed method.

artificial intelligence, machine learning, similarity matrix, (15 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.15)
Asia > China (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Robust, scalable and fast bootstrap method for analyzing large scale data

Basiri, Shahab, Ollila, Esa, Koivunen, Visa

arXiv.org Machine Learning, Apr-12-2015

In this paper we address the problem of performing statistical inference for large scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that it cannot be processed or stored in a single computing node. We propose a scalable, statistically robust and computationally efficient bootstrap method, compatible with distributed processing and storage systems. Bootstrap resamples are constructed with a smaller number of distinct data points on multiple disjoint subsets of data, similarly to the bag of little bootstraps (BLB) method [1]. Significant savings in computation are then achieved by avoiding the re-computation of the estimator for each bootstrap sample. Instead, a computationally efficient fixed-point estimation equation is analytically solved via a smart approximation following the Fast and Robust Bootstrap (FRB) method [2]. Our proposed bootstrap method facilitates the use of highly robust statistical methods in analyzing large scale data sets. The favorable statistical properties of the method are established analytically. Numerical examples demonstrate scalability, low complexity and robust statistical performance of the method in analyzing large data sets.
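The resampling step described above, where each resample has full size but only a small number of distinct points, can be sketched as follows. This covers only the BLB-style part for the sample mean; it recomputes the estimator for every resample and so does not include the FRB fixed-point approximation that the paper uses to avoid that cost. The function name, parameters and synthetic data are hypothetical.

```python
import random
from collections import Counter
from statistics import fmean, pstdev

def blb_standard_error(data, n_subsets, n_resamples, seed=0):
    """Estimate the standard error of the mean, bag-of-little-bootstraps style."""
    rng = random.Random(seed)
    n = len(data)
    shuffled = list(data)
    rng.shuffle(shuffled)
    b = n // n_subsets  # number of distinct points per subset
    per_subset = []
    for s in range(n_subsets):
        # Disjoint subset; each one could live on a different computing node.
        subset = shuffled[s * b:(s + 1) * b]
        estimates = []
        for _ in range(n_resamples):
            # A resample of full size n drawn from only b distinct points,
            # stored compactly as a count per point.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(sum(subset[j] * c for j, c in counts.items()) / n)
        per_subset.append(pstdev(estimates))
    # Average the per-subset standard errors into one estimate.
    return fmean(per_subset)

# Synthetic data (hypothetical): 2000 draws from a standard normal distribution.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
se = blb_standard_error(data, n_subsets=5, n_resamples=50)
print(se)
```

For this data the result should be close to the textbook value 1/sqrt(2000), about 0.022, although each subset only ever holds 400 distinct points.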

artificial intelligence, bootstrap sample, machine learning, (16 more...)

doi: 10.1109/TSP.2015.2498121

1504.02382

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
