AITopics | Ermis, Beyza

Collaborating Authors

Ermis, Beyza

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

Pozzobon, Luiza, Ermis, Beyza, Lewis, Patrick, Hooker, Sara

arXiv.org Artificial Intelligence, Apr-24-2023

Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity. Our findings suggest that research that relied on inherited automatic toxicity scores to compare models and techniques may have resulted in inaccurate findings. Rescoring all models from HELM, a widely respected living benchmark, for toxicity with the recent version of the API led to a different ranking of widely used foundation models. We suggest caution in applying apples-to-apples comparisons between studies and lay out recommendations for a more structured approach to evaluating toxicity over time. Code and data are available at https://github.com/for-ai/black-box-api-challenges.

artificial intelligence, machine learning, natural language, (17 more...)

2304.12397

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.81)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

PASHA: Efficient HPO and NAS with Progressive Resource Allocation

Bohdal, Ondrej, Balles, Lukas, Wistuba, Martin, Ermis, Beyza, Archambeau, Cédric, Zappella, Giovanni

arXiv.org Artificial Intelligence, Mar-8-2023

Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, extends ASHA and is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than ASHA.

HPO and NAS yield state-of-the-art models, but are often very costly, especially when working with large datasets and models. For example, using the results of (Sharir et al., 2020) we can estimate that evaluating 50 configurations for a 340-million-parameter BERT model (Devlin et al., 2019) on the 15GB Wikipedia and Book corpora would cost around $500,000. To make HPO and NAS more efficient, researchers explored how we can learn from cheaper evaluations (e.g. on a subset of the data) to later allocate more resources only to promising configurations. This created a family of methods often described as multi-fidelity methods. Two well-known algorithms in this family are Successive Halving (SH) (Jamieson & Talwalkar, 2016; Karnin et al., 2013) and Hyperband (HB) (Li et al., 2018). Multi-fidelity methods significantly lower the cost of tuning. Li et al. (2018) reported speedups of up to 30x compared to standard Bayesian Optimization (BO) and up to 70x compared to random search. Unfortunately, the cost of current multi-fidelity methods is still too high for many practitioners, partly because of the large datasets used for training the models.
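
To make the multi-fidelity idea concrete, here is a minimal Python sketch of synchronous Successive Halving: every configuration is evaluated at a small budget, only the best fraction is kept, and the budget grows for the survivors. It is an illustration only, not the ASHA or PASHA implementation from the paper. The function name `successive_halving`, the `evaluate(config, resource)` callback, and the `min_resource` and `eta` parameters are assumptions introduced for this example.

```python
from typing import Callable, List, TypeVar

Config = TypeVar("Config")


def successive_halving(
    configs: List[Config],
    evaluate: Callable[[Config, int], float],
    min_resource: int = 1,
    eta: int = 2,
) -> Config:
    """Return the last surviving configuration.

    `evaluate(config, resource)` is a hypothetical callback that returns a
    score (higher is better) after training `config` with `resource` units
    of budget, e.g. epochs or a data subset size.
    """
    if not configs:
        raise ValueError("configs must not be empty")
    survivors = list(configs)
    resource = min_resource
    while len(survivors) > 1:
        # Score every survivor at the current (cheap) budget.
        scored = [(evaluate(c, resource), c) for c in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep the top 1/eta fraction, but always at least one config.
        keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scored[:keep]]
        # Promising configurations earn a larger budget in the next round.
        resource *= eta
    return survivors[0]


if __name__ == "__main__":
    # Toy objective (an assumption): the score peaks at a learning rate of 0.3.
    def toy_score(lr: float, resource: int) -> float:
        return -abs(lr - 0.3)

    print(successive_halving([0.1, 0.3, 0.5, 0.9], toy_score))
```

In this sketch the largest budget is fixed implicitly by the number of rounds. The abstract describes PASHA as instead adjusting the maximum resources dynamically, which this example does not attempt to reproduce.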

artificial intelligence, configuration, machine learning, (17 more...)

2207.0694

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Memory Efficient Continual Learning with Transformers

Ermis, Beyza, Zappella, Giovanni, Wistuba, Martin, Rawal, Aditya, Archambeau, Cedric

arXiv.org Artificial Intelligence, Jan-13-2023

In many real-world scenarios, data to train machine learning models becomes available over time. Unfortunately, these models struggle to continually learn new concepts without forgetting what has been learnt in the past. This phenomenon is known as catastrophic forgetting and it is difficult to prevent due to practical constraints. For instance, the amount of data that can be stored or the computational resources that can be used might be limited. Moreover, applications increasingly rely on large pre-trained neural networks, such as pre-trained Transformers, since the resources or data might not be available in sufficiently large quantities to practitioners to train the model from scratch. In this paper, we devise a method to incrementally train a model on a sequence of tasks using pre-trained Transformers and extending them with Adapters. Unlike existing approaches, our method is able to scale to a large number of tasks without significant overhead and allows sharing information across tasks. On both image and text classification tasks, we empirically demonstrate that our method maintains good predictive performance without retraining the model or increasing the number of model parameters over time. The resulting model is also significantly faster at inference time compared to Adapter-based state-of-the-art methods.

artificial intelligence, machine learning, memory efficient continual learning, (1 more...)

2203.0464

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Contextual Bandits under Delayed Feedback

Vernade, Claire, Carpentier, Alexandra, Zappella, Giovanni, Ermis, Beyza, Brueckner, Michael

arXiv.org Machine Learning, Jul-5-2018

Delayed feedback is a ubiquitous problem in many industrial systems employing bandit algorithms. Most of those systems seek to optimize binary indicators such as clicks. In that case, when the reward is not sent immediately, the learner cannot distinguish a negative signal from a not-yet-sent positive one: she might be waiting for feedback that will never come. In this paper, we define and address the contextual bandit problem with delayed and censored feedback by providing a new UCB-based algorithm. In order to demonstrate its effectiveness, we provide a finite time regret analysis and an empirical evaluation that compares it against a baseline commonly used in practice.
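
The censoring problem described in this abstract can be illustrated with a short Python sketch. It only models what the learner observes, and it is not the UCB-based algorithm proposed in the paper. The names `observed_reward` and `naive_click_rate`, and the representation of an event as a `(converted, delay)` pair, are assumptions made for this example.

```python
from typing import List, Tuple


def observed_reward(converted: bool, delay: float, elapsed: float) -> int:
    """Reward visible to the learner `elapsed` time units after acting.

    A positive outcome whose delay exceeds `elapsed` looks exactly like a
    negative outcome: both are observed as 0.
    """
    return int(converted and delay <= elapsed)


def naive_click_rate(events: List[Tuple[bool, float]], elapsed: float) -> float:
    """Fraction of events with a visible positive reward after `elapsed`.

    Treating every not-yet-observed reward as a negative biases this
    estimate downward whenever some positives are still pending.
    """
    if not events:
        return 0.0
    visible = sum(observed_reward(c, d, elapsed) for c, d in events)
    return visible / len(events)


if __name__ == "__main__":
    # Hypothetical events: (did the user eventually click?, delay of the click).
    events = [(True, 1.0), (True, 5.0), (False, 0.0), (True, 9.0)]
    print(naive_click_rate(events, elapsed=2.0))
    print(naive_click_rate(events, elapsed=10.0))
```

The gap between the two estimates is the bias that a delay-aware method has to account for; the naive estimate only approaches the true rate once the observation window exceeds the longest delay.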

algorithm, artificial intelligence, big data, (21 more...)

1807.02089

Country: Europe > Germany (0.28)

Genre: Research Report (0.64)

Industry:

Law > Civil Rights & Constitutional Law (0.49)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.88)

Differentially Private Variational Dropout

Ermis, Beyza, Cemgil, Ali Taylan

arXiv.org Machine Learning, Dec-16-2017

Deep neural networks with their large number of parameters are highly flexible learning systems. This flexibility brings with it some serious problems such as overfitting, and regularization is used to address them. A currently popular and effective regularization technique for controlling overfitting is dropout. Often, large data collections required for neural networks contain sensitive information such as the medical histories of patients, and the privacy of the training data should be protected. In this paper, we modify the recently proposed variational dropout technique, which provided an elegant Bayesian interpretation of dropout, and show that the intrinsic noise in variational dropout can be exploited to obtain a degree of differential privacy. The iterative nature of training neural networks presents a challenge for privacy-preserving estimation since multiple iterations increase the amount of noise added. We overcome this by using a relaxed notion of differential privacy, called concentrated differential privacy, which provides tighter estimates on the overall privacy loss. We demonstrate the accuracy of our privacy-preserving variational dropout algorithm on benchmark datasets.

deep learning, neural network, privacy, (17 more...)

1712.02629

Country:

North America > United States > California (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Differentially Private Dropout

Ermis, Beyza, Cemgil, Ali Taylan

arXiv.org Machine Learning, Nov-30-2017

Large data collections required for the training of neural networks often contain sensitive information such as the medical histories of patients, and the privacy of the training data must be preserved. In this paper, we introduce a dropout technique that provides an elegant Bayesian interpretation to dropout, and show that the intrinsic noise added, with the primary goal of regularization, can be exploited to obtain a degree of differential privacy. The iterative nature of training neural networks presents a challenge for privacy-preserving estimation since multiple iterations increase the amount of noise added. We overcome this by using a relaxed notion of differential privacy, called concentrated differential privacy, which provides tighter estimates on the overall privacy loss. We demonstrate the accuracy of our privacy-preserving dropout algorithm on benchmark datasets.

deep learning, neural network, noise, (17 more...)

1712.01665

Country:

Asia > Middle East > Republic of Türkiye (0.14)
North America > United States > New York (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Incremental Variational Inference for Latent Dirichlet Allocation

Archambeau, Cedric, Ermis, Beyza

arXiv.org Machine Learning, Jul-22-2015

We introduce incremental variational inference and apply it to latent Dirichlet allocation (LDA). Incremental variational inference is inspired by incremental EM and provides an alternative to stochastic variational inference. Incremental LDA can process massive document collections, does not require setting a learning rate, converges faster to a local optimum of the variational bound, and enjoys the attractive property of monotonically increasing it. We study the performance of incremental LDA on large benchmark data sets. We further introduce a stochastic approximation of incremental variational inference which extends to the asynchronous distributed setting. The resulting distributed algorithm achieves performance comparable to single-host incremental variational inference, but with a significant speed-up.

artificial intelligence, text processing, variational inference, (15 more...)

1507.05016

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)

A Bayesian Tensor Factorization Model via Variational Inference for Link Prediction

Ermis, Beyza, Cemgil, A. Taylan

arXiv.org Machine Learning, Sep-29-2014

Probabilistic approaches for tensor factorization aim to extract meaningful structure from incomplete data by postulating low rank constraints. Recently, variational Bayesian (VB) inference techniques have successfully been applied to large scale models. This paper presents full Bayesian inference via VB on both single and coupled tensor factorization models. Our method can be run even for very large models and is easily implemented. It exhibits better prediction performance than existing approaches based on maximum likelihood on several real-world datasets for the missing link prediction problem.

artificial intelligence, bayesian inference, prediction performance, (11 more...)

1409.8276

Country:

Asia > Middle East > Republic of Türkiye (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)
