AITopics | DeSalvo, Giulia

Collaborating Authors

DeSalvo, Giulia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

No more hard prompts: SoftSRV prompting for synthetic data generation

DeSalvo, Giulia, Kagy, Jean-Fracois, Karydas, Lazaros, Rostamizadeh, Afshin, Kumar, Sanjiv

arXiv.org Artificial IntelligenceOct-22-2024

We present a novel soft prompt based framework, SoftSRV, that leverages a frozen pre-trained large language model (LLM) to generate targeted synthetic text sequences. Given a sample from the target distribution, our proposed framework uses data-driven loss minimization to train a parameterized "contextual" soft prompt. This soft prompt is then used to steer the frozen LLM to generate synthetic sequences that are similar to the target distribution. We argue that SoftSRV provides a practical improvement over common hard-prompting approaches that rely on human-curated prompt-templates, which can be idiosyncratic, labor-intensive to craft, and may need to be specialized per domain. We empirically evaluate SoftSRV and hard-prompting baselines by generating synthetic data to fine-tune a small Gemma model on three different domains (coding, math, reasoning). To stress the generality of SoftSRV, we perform these evaluations without any particular specialization of the framework to each domain. We find that SoftSRV significantly improves upon hard-prompting baselines, generating data with superior fine-tuning performance and that better matches the target distribution according to the MAUVE similarity metric.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.16534

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

Fahrbach, Matthew, Ramalingam, Srikumar, Zadimoghaddam, Morteza, Ahmadian, Sara, Citovsky, Gui, DeSalvo, Giulia

arXiv.org Artificial IntelligenceMay-29-2024

Subset selection is a challenging optimization problem with a wide variety of applications in machine learning, including feature selection, recommender systems, news aggregation, drug discovery, data summarization, and designing pretraining sets for large language models (Anil et al., 2023). Data sampling in particular is a salient problem due to unprecedented and continuous data collection. For example, LiDAR and imaging devices in one self-driving vehicle can easily capture ~80 terabytes of data per day (Kazhamiaka et al., 2021). In most subset selection tasks, we rely on the weight (or utility) of the objects to rank one over the other, and also to avoid selecting duplicate or near-duplicate objects. If we select a small subset, then we also want to ensure that the selected subset is a good representation of the original set. These utility, diversity, and coverage criteria can be expressed through objective functions, and the interesting research lies in developing efficient algorithms with strong approximation guarantees. The underlying machinery used in constrained subset selection algorithms shares many similarities with techniques from other areas of combinatorial optimization such as submodular maximization, -center clustering, and convex hull approximations. In this work, we study the problem of selecting a set of points in a metric space that maximizes an objective that combines their utility and a minimum pairwise-distance diversity measure.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2405.18754

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Ye, Ke, Jiang, Heinrich, Rostamizadeh, Afshin, Chakrabarti, Ayan, DeSalvo, Giulia, Kagy, Jean-François, Karydas, Lazaros, Citovsky, Gui, Kumar, Sanjiv

arXiv.org Artificial IntelligenceJan-23-2024

Pre-training large language models is known to be extremely resource intensive and often times inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and token replacement detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis on why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training, while enabling a 50% reduction in pre-training iterations and 40% reduction in total FLOPs. Alternatively, given the same amount of computing budget, we find that SpacTor results in significantly improved downstream benchmark performance.

accuracy, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2401.1316

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling

Yu, Ke, Albro, Stephen, DeSalvo, Giulia, Kothawade, Suraj, Rashwan, Abdullah, Tavakkol, Sasan, Batmanghelich, Kayhan, Yin, Xiaoqi

arXiv.org Artificial IntelligenceSep-27-2023

Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold.

artificial intelligence, machine learning, uncertainty and diversity sampling, (2 more...)

arXiv.org Artificial Intelligence

2309.16139

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Agile Modeling: From Concept to Classifier in Minutes

Stretcu, Otilia, Vendrow, Edward, Hata, Kenji, Viswanathan, Krishnamurthy, Ferrari, Vittorio, Tavakkol, Sasan, Zhou, Wenlei, Avinash, Aditya, Luo, Enming, Alldrin, Neil Gordon, Bateni, MohammadHossein, Berger, Gabriel, Bunner, Andrew, Lu, Chun-Ta, Rey, Javier A, DeSalvo, Giulia, Krishna, Ranjay, Fuxman, Ariel

arXiv.org Artificial IntelligenceMay-12-2023

The application of computer vision to nuanced subjective use cases is growing. While crowdsourcing has served the vision community well for most objective tasks (such as labeling a "zebra"), it now falters on tasks where there is substantial subjectivity in the concept (such as identifying "gourmet tuna"). However, empowering any user to develop a classifier for their concept is technically difficult: users are neither machine learning experts, nor have the patience to label thousands of examples. In reaction, we introduce the problem of Agile Modeling: the process of turning any subjective visual concept into a computer vision model through a real-time user-in-the-loop interactions. We instantiate an Agile Modeling prototype for image classification and show through a user study (N=14) that users can create classifiers with minimal effort under 30 minutes. We compare this user driven process with the traditional crowdsourcing paradigm and find that the crowd's notion often differs from that of the user's, especially as the concepts become more subjective. Finally, we scale our experiments with simulations of users training classifiers for ImageNet21k categories to further demonstrate the efficacy.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.12948

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Industry:

Education (0.93)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Leveraging Importance Weights in Subset Selection

Citovsky, Gui, DeSalvo, Giulia, Kumar, Sanjiv, Ramalingam, Srikumar, Rostamizadeh, Afshin, Wang, Yunjuan

arXiv.org Artificial IntelligenceJan-27-2023

We present a subset selection algorithm designed to work with arbitrary model families in a practical batch setting. In such a setting, an algorithm can sample examples one at a time but, in order to limit overhead costs, is only able to update its state (i.e. Our algorithm, IWeS, selects examples by importance sampling where the sampling probability assigned to each example is based on the entropy of models trained on previously selected batches. IWeS admits significant performance improvement compared to other subset selection algorithms for seven publicly available datasets. Additionally, it is competitive in an active learning setting, where the label information is not available at selection time. We also provide an initial theoretical analysis to support our importance weighting approach, proving generalization and sampling rate bounds. Deep neural networks have shown remarkable success in several domains such as computer vision and natural language processing. In many tasks, this is achieved by heavily relying on extremely large labeled datasets. In addition to the storage costs and potential security/privacy concerns that come along with large datasets, training modern deep neural networks on such datasets also incur high computational costs.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.12052

Country: North America > Canada > Ontario (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Batch Active Learning at Scale

Citovsky, Gui, DeSalvo, Giulia, Gentile, Claudio, Karydas, Lazaros, Rajagopalan, Anand, Rostamizadeh, Afshin, Kumar, Sanjiv

arXiv.org Artificial IntelligenceJul-29-2021

The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. The practical benefits of batch sampling come with the downside of less adaptivity and the risk of sampling redundant examples within a batch -- a risk that grows with the batch size. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. In particular, we show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies and provides significant improvements in model training efficiency compared to recent baselines. Finally, we provide an initial theoretical analysis, proving label complexity guarantees for a related sampling method, which we show is approximately equivalent to our sampling method in specific settings.

algorithm, deep learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2107.14263

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Boosting with Abstention

Cortes, Corinna, DeSalvo, Giulia, Mohri, Mehryar

Neural Information Processing SystemsDec-31-2016

We present a new boosting algorithm for the key scenario of binary classification with abstention where the algorithm can abstain from predicting the label of a point, at the price of a fixed cost. At each round, our algorithm selects a pair of functions, a base predictor and a base abstention function. We define convex upper bounds for the natural loss function associated to this problem, which we prove to be calibrated with respect to the Bayes solution. Our algorithm benefits from general margin-based learning guarantees which we derive for ensembles of pairs of base predictor and abstention functions, in terms of the Rademacher complexities of the corresponding function classes. We give convergence guarantees for our algorithm along with a linear-time weak-learning algorithm for abstention stumps. We also report the results of several experiments suggesting that our algorithm provides a significant improvement in practice over two confidence-based algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.14)
Europe > Spain (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization

Li, Lisha, Jamieson, Kevin, DeSalvo, Giulia, Rostamizadeh, Afshin, Talwalkar, Ameet

arXiv.org Machine LearningNov-23-2016

Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While current methods offer efficiencies by adaptively choosing new configurations to train, an alternative strategy is to adaptively allocate resources across the selected configurations. We formulate hyperparameter optimization as a pure-exploration non-stochastic infinitely many armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce Hyperband for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with state-of-the-art methods on a suite of hyperparameter optimization problems. We observe that Hyperband provides five times to thirty times speedup over state-of-the-art Bayesian optimization algorithms on a variety of deep-learning and kernel-based learning problems.

configuration, deep learning, neural network, (21 more...)

arXiv.org Machine Learning

1603.0656

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Add feedback

Random Composite Forests

DeSalvo, Giulia (Courant Institute of Mathematical Sciences) | Mohri, Mehryar (Courant Institute of Mathematical Sciences and Google Research)

AAAI ConferencesApr-19-2016

We introduce a broad family of decision trees, Composite Trees, whose leaf classifiers are selected out of a hypothesis set composed of p subfamilies with different complexities. We prove new data-dependent learning guarantees for this family in the multi-class setting. These learning bounds provide a quantitative guidance for the choice of the hypotheses at each leaf. Remarkably, they depend on the Rademacher complexities of the sub-families of the predictors and the fraction of sample points correctly classified at each leaf. We further introduce random composite trees and derive learning guarantees for random composite trees which also apply to Random Forests. Using our theoretical analysis, we devise a new algorithm, RANDOMCOMPOSITEFORESTS (RCF), that is based on forming an ensemble of random composite trees. We report the results of experiments demonstrating that RCF yields significant performance improvements over both Random Forests and a variant of RCF in several tasks.

artificial intelligence, composite tree, decision tree learning, (19 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: North America > United States > New York (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback