
Collaborating Authors: Najafi, Amir


Robust Model Evaluation over Large-scale Federated Networks

arXiv.org Machine Learning

In this paper, we address the challenge of certifying the performance of a machine learning model on an unseen target network, using measurements from an available source network. We focus on a scenario where heterogeneous datasets are distributed across a source network of clients, all connected to a central server. Specifically, consider a source network "A" composed of $K$ clients, each holding private data from unique and heterogeneous distributions, which are assumed to be independent samples from a broader meta-distribution $\mu$. Our goal is to provide certified guarantees for the model's performance on a different, unseen target network "B," governed by another meta-distribution $\mu'$, assuming the deviation between $\mu$ and $\mu'$ is bounded by either the Wasserstein distance or an $f$-divergence. We derive theoretical guarantees for the model's empirical average loss and provide uniform bounds on the risk CDF, where the latter correspond to novel and adversarially robust versions of the Glivenko-Cantelli theorem and the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. Our bounds are computable in polynomial time with a polynomial number of queries to the $K$ clients, preserving client privacy by querying only the model's (potentially adversarial) loss on private data. We also establish non-asymptotic generalization bounds that consistently converge to zero as both $K$ and the minimum client sample size grow. Extensive empirical evaluations validate the robustness and practicality of our bounds across real-world tasks.
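To make the CDF bound concrete, here is a minimal Python sketch of the non-robust baseline the paper strengthens: the classical DKW confidence band around the empirical CDF of per-client average losses. Everything here (function names, the exponential toy losses) is illustrative; the paper's robust bounds additionally inflate such a band to account for a bounded Wasserstein or $f$-divergence shift between the meta-distributions $\mu$ and $\mu'$.

```python
import numpy as np

def empirical_risk_cdf_band(client_losses, delta=0.05):
    # One scalar query per client: the model's average loss on its private data.
    avg_losses = np.sort([np.mean(l) for l in client_losses])
    K = len(avg_losses)
    cdf = np.arange(1, K + 1) / K                    # empirical CDF at the sorted losses
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * K))   # DKW uniform deviation bound
    return avg_losses, cdf, np.clip(cdf - eps, 0, 1), np.clip(cdf + eps, 0, 1)

# Toy run: K = 100 clients with exponentially distributed losses (illustrative only).
rng = np.random.default_rng(0)
losses = [rng.exponential(scale=0.5, size=200) for _ in range(100)]
xs, cdf, lo, hi = empirical_risk_cdf_band(losses)
print(f"uniform band half-width: {(hi - cdf).max():.3f}")
```

Note that the band width shrinks at rate $K^{-1/2}$, matching the abstract's claim that the bounds converge as the number of clients grows.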


Gradual Domain Adaptation via Manifold-Constrained Distributionally Robust Optimization

arXiv.org Machine Learning

The aim of this paper is to address the challenge of gradual domain adaptation within a class of manifold-constrained data distributions. In particular, we consider a sequence of $T\ge2$ data distributions $P_1,\ldots,P_T$ undergoing a gradual shift, where each pair of consecutive measures $P_i,P_{i+1}$ is close in Wasserstein distance. We have a supervised dataset of size $n$ sampled from $P_1$, while for the subsequent distributions in the sequence, only unlabeled i.i.d. samples are available. Moreover, we assume that all distributions exhibit a known favorable attribute, such as (but not limited to) having intra-class soft/hard margins. In this context, we propose a methodology rooted in Distributionally Robust Optimization (DRO) with an adaptive Wasserstein radius. We theoretically show that this method guarantees that the classification error across all $P_i$s can be suitably bounded. Our bounds rely on a newly introduced "compatibility" measure, which fully characterizes the error propagation dynamics along the sequence. Specifically, for inadequately constrained distributions, the error can escalate exponentially as we progress through the gradual shifts. Conversely, for appropriately constrained distributions, the error can be shown to grow at most linearly, or even to be eliminated entirely. We substantiate our theoretical findings through several experimental results.
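As a point of reference, the sketch below shows the plain gradual self-training loop that methods of this kind build on: fit on the labeled data from $P_1$, then pseudo-label and refit on each subsequent unlabeled domain. The paper replaces the plain refit with a DRO step whose Wasserstein radius adapts to the shift between consecutive distributions; the `LinearSVC` refit here is only a margin-based stand-in for that step.

```python
from sklearn.svm import LinearSVC

def gradual_self_training(X1, y1, unlabeled_domains, C=1.0):
    # Fit on the labeled source domain P_1.
    clf = LinearSVC(C=C).fit(X1, y1)
    # Walk through P_2, ..., P_T: pseudo-label with the previous model, refit.
    for Xi in unlabeled_domains:
        pseudo = clf.predict(Xi)
        clf = LinearSVC(C=C).fit(Xi, pseudo)   # stand-in for the adaptive DRO refit
    return clf
```

The compatibility measure in the abstract governs exactly how errors made by one refit propagate into the pseudo-labels of the next.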


Unlabeled Out-Of-Domain Data Improves Generalization

arXiv.org Machine Learning

We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems, covering the minimization of both i) adversarially robust and ii) non-robust loss functions. Notably, we allow the unlabeled samples to deviate slightly (in the total variation sense) from the in-domain distribution. The core idea behind our framework is to combine Distributionally Robust Optimization (DRO) with self-supervised training. As a result, the training stage also admits efficient polynomial-time algorithms. From a theoretical standpoint, we apply our framework to the classification of a mixture of two Gaussians in $\mathbb{R}^d$, where in addition to the $m$ independent and labeled samples from the true distribution, a set of $n$ (usually with $n\gg m$) out-of-domain and unlabeled samples is given as well. Using only the labeled data, it is known that the generalization error can be bounded by $\propto\left(d/m\right)^{1/2}$. However, using our method on both isotropic and non-isotropic Gaussian mixture models, one can derive a new set of analytically explicit and non-asymptotic bounds which show substantial improvement in the generalization error compared to ERM. Our results underscore two significant insights: 1) out-of-domain samples, even when unlabeled, can be harnessed to narrow the generalization gap, provided that the true data distribution adheres to a form of the "cluster assumption", and 2) the semi-supervised learning paradigm can be regarded as a special case of our framework when there are no distributional shifts. We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
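The Gaussian-mixture setting can be illustrated with a simple, hypothetical estimator that shows why $n\gg m$ unlabeled and slightly shifted samples help: for a symmetric two-component mixture with means $\pm\mu$, the direction of $\mu$ can be estimated from unlabeled data alone via the top eigenvector of the second-moment matrix, and the $m$ labeled samples are only needed to resolve the sign. This spectral stand-in is not the paper's DRO-based method, but it exhibits the same mechanism.

```python
import numpy as np

def estimate_classifier(X_lab, y_lab, X_unlab):
    # Direction of mu from unlabeled (possibly out-of-domain) samples:
    # top eigenvector of the second-moment matrix.
    M = X_unlab.T @ X_unlab / len(X_unlab)
    eigvals, eigvecs = np.linalg.eigh(M)
    w = eigvecs[:, -1]
    # The few labeled samples only resolve the sign ambiguity.
    if np.sign(X_lab @ w) @ y_lab < 0:
        w = -w
    return w  # predict with sign(w @ x)

# Toy run: d = 50, m = 10 labeled in-domain, n = 5000 unlabeled with a mild shift.
rng = np.random.default_rng(1)
d, m, n = 50, 10, 5000
mu = np.ones(d) / np.sqrt(d)
y = rng.choice([-1, 1], size=m)
X_lab = y[:, None] * mu + rng.normal(size=(m, d))
s = rng.choice([-1, 1], size=n)
X_unlab = s[:, None] * (1.05 * mu) + rng.normal(size=(n, d))  # out-of-domain
w = estimate_classifier(X_lab, y, X_unlab)
print("alignment with true mu:", float(w @ mu))
```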


Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes

arXiv.org Artificial Intelligence

In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where the samples are corrupted by multivariate additive Gaussian noise of arbitrary magnitude. We prove the existence of an algorithm that, with high probability, outputs a simplex within an $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). We also theoretically show that in order to achieve this guarantee, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows that as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order as that of the noiseless case. Our proofs combine the sample compression technique of Ashtiani et al. (2018), mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we propose a general Fourier-based technique for recovering a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of related problems.
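A quick numerical reading of the bound $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ makes the phase transition visible; the $\Omega(\cdot)$ constant is set to a hypothetical $c=1$ below purely for illustration.

```python
import math

def sufficient_samples(K, eps, snr, c=1.0):
    # n >= (K^2 / eps^2) * exp(c * K / SNR^2), with hypothetical constant c = 1.
    return (K**2 / eps**2) * math.exp(c * K / snr**2)

# K = 64, eps = 0.1: the exponential factor collapses to O(1) once SNR >= sqrt(K).
for snr in (1.0, math.sqrt(64), 64.0):
    print(f"SNR = {snr:6.2f}  ->  n >= {sufficient_samples(64, 0.1, snr):.3e}")
```

At $\mathrm{SNR}=1$ the exponential term dominates, while at $\mathrm{SNR}=\sqrt{K}$ it is already a constant factor, matching the claim that the noisy and noiseless sample complexities then share the same order.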


Distributed Sparse Feature Selection in Communication-Restricted Networks

arXiv.org Machine Learning

This paper proposes and theoretically analyzes a new distributed scheme for sparse linear regression and feature selection. The primary goal is to learn the few causal features of a high-dimensional dataset based on noisy observations from an unknown sparse linear model. However, the training set, which includes $n$ data samples in $\mathbb{R}^p$, is distributed over a large network with $N$ clients connected through extremely low-bandwidth links. We consider the asymptotic configuration of $1\ll N\ll n\ll p$. In order to infer the causal dimensions from the whole dataset, we propose a simple, yet effective, method for information sharing in the network. We theoretically show that the true causal features can be reliably recovered with negligible bandwidth usage of $O\left(N\log p\right)$ across the network. This yields a significantly lower communication cost than the trivial case of transmitting all the samples to a single node (centralized scenario), which requires $O\left(np\right)$ transmissions. Even more sophisticated schemes such as ADMM still have a communication complexity of $O\left(Np\right)$. Surprisingly, our sample complexity bound is proved to be the same (up to a constant factor) as that of the optimal centralized approach for a fixed performance measure in each node, while that of a naïve decentralized technique grows linearly with $N$. The theoretical guarantees in this paper are based on the recent analytic framework of the debiased LASSO in Javanmard et al. (2019), and are supported by several computer experiments performed on both synthetic and real-world datasets.
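The communication pattern the abstract describes can be sketched as follows: each client fits a sparse model locally and transmits only the indices of its estimated support ($O(\log p)$ bits per index, hence $O(N\log p)$ overall), and the server aggregates. The sketch below uses plain LASSO with a majority vote as a stand-in; the paper's actual scheme relies on the debiased LASSO with calibrated per-node statistics rather than this simple voting rule.

```python
import numpy as np
from sklearn.linear_model import Lasso

def distributed_support_selection(clients, vote_frac=0.5, alpha=0.1):
    p = clients[0][0].shape[1]
    votes = np.zeros(p)
    for X, y in clients:
        coef = Lasso(alpha=alpha).fit(X, y).coef_
        votes[np.nonzero(coef)[0]] += 1     # client transmits only its support set
    # Server keeps features selected by at least a vote_frac fraction of clients.
    return np.where(votes >= vote_frac * len(clients))[0]

# Toy network: N = 20 clients, p = 500 features, 5 causal ones (illustrative).
rng = np.random.default_rng(2)
support = rng.choice(500, size=5, replace=False)
beta_true = np.zeros(500)
beta_true[support] = 2.0
clients = []
for _ in range(20):
    X = rng.normal(size=(100, 500))
    clients.append((X, X @ beta_true + rng.normal(size=100)))
print(sorted(distributed_support_selection(clients)), sorted(support))
```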


Regularizing Recurrent Neural Networks via Sequence Mixup

arXiv.org Machine Learning

Recurrent neural networks are the basis of state-of-the-art models in natural language processing, including language modeling (Mikolov et al., 2011), machine translation (Cho et al., 2014) and named entity recognition (Lample et al., 2016). Complex learning tasks typically require relatively large networks with millions of parameters. However, large neural networks need more data and/or strong regularization techniques to be trained successfully and avoid overfitting. Without the means to collect more data, which is the case in the majority of real-world problems, data augmentation and regularization methods are standard alternatives for overcoming this barrier. Data augmentation in natural language processing is limited, and often task-specific (Kobayashi, 2018; Kafle et al., 2017). On the other hand, adopting regularization methods originally proposed for feed-forward (non-recurrent) networks must be done with extra care to avoid hurting the network's information flow between consecutive time-steps. To overcome such limitations, we present Sequence Mixup: a set of training methods, regularization techniques, and data augmentation procedures for RNNs. Sequence Mixup can be considered the RNN generalization of input mixup (Zhang et al., 2017) and manifold mixup (Verma et al., 2018), which were originally introduced for feed-forward neural networks.
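A minimal PyTorch sketch of the shared core idea: the inputs and hidden states of two training sequences are convexly combined with a coefficient $\lambda\sim\mathrm{Beta}(\alpha,\alpha)$ before each recurrent update, in the spirit of input mixup and manifold mixup. The paper defines several concrete Sequence Mixup variants; this shows only the basic mixing pattern, not the exact method.

```python
import torch

def mixup_rnn_step(cell, x_a, x_b, h_a, h_b, lam):
    # Convexly combine inputs and hidden states of two sequences, then
    # apply the recurrent update; the mixed state feeds the next time-step.
    x_mix = lam * x_a + (1 - lam) * x_b
    h_mix = lam * h_a + (1 - lam) * h_b
    return cell(x_mix, h_mix)

cell = torch.nn.RNNCell(input_size=16, hidden_size=32)
lam = torch.distributions.Beta(0.4, 0.4).sample().item()
x_a, x_b = torch.randn(8, 16), torch.randn(8, 16)
h_a, h_b = torch.zeros(8, 32), torch.zeros(8, 32)
h_next = mixup_rnn_step(cell, x_a, x_b, h_a, h_b, lam)
print(h_next.shape)  # torch.Size([8, 32])
```

Because the mixed hidden state is reused at the next step, the mixing interacts with the recurrent information flow, which is precisely why feed-forward mixup cannot be applied to RNNs without care.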


Robustness to Adversarial Perturbations in Learning from Incomplete Data

arXiv.org Machine Learning

Robustness to adversarial perturbations has become an essential feature in the design of modern classifiers, in particular of deep neural networks. This requirement originates from several empirical observations, such as [1] and [2], which show that deep networks are vulnerable to adversarial attacks in the input space. Numerous methodologies have been introduced to compensate for this shortcoming; Adversarial Training (AT) [3], Virtual AT [4] and Distillation [5] are examples of promising methods in this area. The majority of these approaches seek an effective defense against a point-wise adversary, who shifts individual input data points toward adversarial directions independently of one another. However, as shown by [6], a distributional adversary, who can shift the data distribution rather than individual data points, is provably more detrimental to learning. This suggests that one can greatly improve the robustness of a classifier by defending against a distributional adversary rather than a point-wise one. This motivation has led to the development of Distributionally Robust Learning (DRL) [7], which has attracted intensive research interest over the last few years [8, 9, 10, 11]. Despite all the advancements in supervised and unsupervised DRL, research tackling this problem from a semi-supervised angle remains scarce [12].
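To make the point-wise vs. distributional distinction concrete, the sketch below evaluates the worst-case expected loss over all distributions within a KL ball of radius $\rho$ around the empirical one, using the standard dual formulation $\inf_{\lambda>0}\,\lambda\rho+\lambda\log\mathbb{E}[e^{\ell/\lambda}]$. This is a generic DRL ingredient, not the paper's semi-supervised method.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kl_dro_loss(losses, rho):
    # Worst-case expected loss over all Q with KL(Q || P_hat) <= rho,
    # via the dual: inf_{lam > 0} lam * rho + lam * log E[exp(loss / lam)].
    losses = np.asarray(losses, dtype=float)
    m = losses.max()
    def dual(log_lam):
        lam = np.exp(log_lam)  # optimize over log-lambda to keep lam > 0
        # log-sum-exp shift by m for numerical stability
        return lam * rho + m + lam * np.log(np.mean(np.exp((losses - m) / lam)))
    res = minimize_scalar(dual, bounds=(-5, 5), method="bounded")
    return dual(res.x)

losses = np.random.default_rng(3).uniform(0, 1, size=1000)
print(f"mean loss: {losses.mean():.3f}  robust (rho=0.1): {kl_dro_loss(losses, 0.1):.3f}")
```

The robust value exceeds the empirical mean because the adversary reweights mass toward high-loss samples, the distributional shift the text contrasts with per-sample perturbations.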


On Statistical Learning of Simplices: Unmixing Problem Revisited

arXiv.org Machine Learning

Learning of high-dimensional simplices from uniformly-sampled observations, generally known as the "unmixing problem", is a long-studied task in computer science. More recently, significant interest in this problem has emerged from other areas, such as computational biology and remote sensing. In this paper, we study the Probably Approximately Correct (PAC)-learnability of simplices with a focus on sample complexity. Our analysis shows that a sufficient sample size for PAC-learning of $K$-simplices is only $O\left(K^2\log K\right)$, a substantial improvement over the existing bound of $O\left(K^{22}\right)$. Moreover, we propose a novel continuously-relaxed optimization scheme that is guaranteed to achieve a PAC-approximation of the simplex, together with a corresponding scalable algorithm whose performance is extensively tested on synthetic and real-world datasets. Experimental results show that our method is not only comparable to existing strategies on noiseless samples, but also superior to the state-of-the-art in noisy cases. The overall framework is backed by solid theoretical guarantees and provides a rigorous foundation for future research in this area.
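For intuition, here is a generic alternating heuristic in the spirit of the continuous relaxation the abstract mentions (the paper's certified scheme differs): alternate between expressing each sample as a convex combination of the current $K+1$ vertices and refitting the vertices by least squares. The sum-to-one constraint is handled by a penalty row of weight `rho` inside the nonnegative least-squares step; all names here are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def fit_simplex(X, K, iters=50, rho=10.0, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), K + 1, replace=False)]   # init vertices from data
    for _ in range(iters):
        # (i) Barycentric weights: nonnegative least squares with a
        # sum-to-one penalty row of weight rho appended to the system.
        A = np.vstack([V.T, rho * np.ones(K + 1)])
        W = np.array([nnls(A, np.append(x, rho))[0] for x in X])
        # (ii) Refit the vertices by least squares given the weights.
        V, *_ = np.linalg.lstsq(W, X, rcond=None)
    return V
```

Like most unmixing heuristics this only finds a locally good simplex; the abstract's point is that its relaxed scheme comes with a PAC-approximation guarantee instead.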


Reliable Learning of Bernoulli Mixture Models

arXiv.org Machine Learning

In this paper, we derive a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMM) when the number of clusters is unknown. A BMM refers to a random binary vector whose components are independent Bernoulli trials with cluster-specific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics, where researchers aim to infer the population structure from multilocus genotype data. Our findings stipulate a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample such that the existence of a sufficiently accurate clustering algorithm is guaranteed. Moreover, the mathematical intuitions and tools behind our work can help researchers design more effective and theoretically plausible heuristic methods for similar problems.
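For context, the classical estimation routine for this model is EM; the sketch below implements it for an $(n, d)$ binary matrix and $C$ clusters. The paper's contribution is the analysis of when such clustering can succeed at all (minimum $n$ and $d$), not this routine itself.

```python
import numpy as np

def bmm_em(X, C, iters=100, seed=0, eps=1e-9):
    rng = np.random.default_rng(seed)
    pi = np.full(C, 1.0 / C)                                 # mixing weights
    theta = rng.uniform(0.25, 0.75, size=(C, X.shape[1]))    # per-cluster frequencies
    for _ in range(iters):
        # E-step: responsibilities from Bernoulli log-likelihoods.
        log_p = (X @ np.log(theta + eps).T
                 + (1 - X) @ np.log(1 - theta + eps).T + np.log(pi + eps))
        log_p -= log_p.max(axis=1, keepdims=True)
        R = np.exp(log_p)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and frequencies.
        Nc = R.sum(axis=0)
        pi = Nc / len(X)
        theta = (R.T @ X) / (Nc[:, None] + eps)
    return pi, theta, R.argmax(axis=1)

# Toy run: two clusters with distinct frequency profiles over 20 loci.
rng = np.random.default_rng(4)
z = rng.integers(0, 2, size=500)
freqs = np.array([[0.8] * 10 + [0.2] * 10, [0.2] * 10 + [0.8] * 10])
X = (rng.random((500, 20)) < freqs[z]).astype(float)
pi, theta, labels = bmm_em(X, C=2)
print("cluster sizes:", np.bincount(labels))
```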