Yu, Felix X.
Automatic Engineering of Long Prompts
Hsieh, Cho-Jui, Si, Si, Yu, Felix X., Dhillon, Inderjit S.
Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic prompt engineering for short prompts, typically consisting of one or a few sentences. However, the automatic design of long prompts remains a challenging problem due to its immense search space. In this paper, we investigate the performance of greedy algorithms and genetic algorithms for automatic long prompt engineering. We demonstrate that a simple greedy approach with beam search outperforms other methods in terms of search efficiency. Moreover, we introduce two novel techniques that utilize search history to enhance the effectiveness of LLM-based mutation in our search algorithm. Our results show that the proposed automatic long prompt engineering algorithm achieves an average of 9.2% accuracy gain on eight tasks in Big Bench Hard, highlighting the significance of automating prompt designs to fully harness the capabilities of LLMs.
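For illustration, here is a minimal sketch of the kind of greedy, beam-search prompt-mutation loop described above. The helpers `llm_rewrite` and `evaluate` are hypothetical placeholders (not from the paper's code) standing in for an LLM paraphrasing call and a dev-set scoring function, and the history feedback is only hinted at.

```python
import random

# Hypothetical stubs: `llm_rewrite` would ask an LLM to paraphrase one sentence
# (optionally conditioned on the search history), `evaluate` would score a full
# prompt on a held-out dev set. Neither is from the paper's released code.
def llm_rewrite(sentence, history):
    raise NotImplementedError

def evaluate(sentences, dev_set):
    raise NotImplementedError

def greedy_beam_prompt_search(sentences, dev_set, budget=50, beam_size=3):
    """Greedily mutate one sentence at a time, keeping a small beam of candidates."""
    beam = [(evaluate(sentences, dev_set), list(sentences))]
    history = []  # (rewritten sentence, score change) pairs, reusable as guidance
    for _ in range(budget):
        score, current = random.choice(beam)            # expand a beam member
        i = random.randrange(len(current))              # pick a sentence to mutate
        candidate = list(current)
        candidate[i] = llm_rewrite(current[i], history)
        cand_score = evaluate(candidate, dev_set)
        history.append((candidate[i], cand_score - score))
        beam = sorted(beam + [(cand_score, candidate)],
                      key=lambda t: t[0], reverse=True)[:beam_size]
    return beam[0]                                      # (best score, best prompt)
```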
Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces
Rawat, Ankit Singh, Menon, Aditya Krishna, Jitkrittum, Wittawat, Jayasumana, Sadeep, Yu, Felix X., Reddi, Sashank, Kumar, Sanjiv
Classification problems with a large number of labels arise in language modelling [Mikolov et al., 2013, Levy and Goldberg, 2014], recommender systems [Covington et al., 2016, Xu et al., 2016], and information retrieval [Agrawal et al., 2013, Prabhu and Varma, 2014]. Such large-output problems pose a core challenge: losses such as the softmax cross-entropy can be prohibitive to optimise, as they depend on the entire set of labels. Several works have thus devised negative sampling schemes for efficiently and effectively approximating such losses [Bengio and Senecal, 2008, Blanc and Rendle, 2018, Ruiz et al., 2018, Bamler and Mandt, 2020]. Broadly, negative sampling techniques sample a subset of "negative" labels, which are used to contrast against the observed "positive" labels. One further applies a suitable weighting on these "negatives", which ostensibly corrects the sampling bias introduced by the dependence on a random subset of labels. Intuitively, such bias assesses how closely a scheme approximates the unsampled loss on the full label set. This bias is well understood for sampled softmax schemes (see, e.g., Bengio and Senecal [2008]); surprisingly, however, far less is understood about other popular schemes, e.g., within-batch and uniform sampling.
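For context, a minimal numpy sketch of the standard sampled-softmax estimator with the logQ correction, i.e. the weighting that "ostensibly corrects the sampling bias"; this is the generic construction (cf. Bengio and Senecal, 2008), not the paper's new analysis.

```python
import numpy as np

def sampled_softmax_loss(logit_pos, logits_neg, log_q_neg, log_q_pos=0.0):
    """Cross-entropy over {positive} + sampled negatives, with logQ correction.

    Subtracting the log proposal probability log q(label) from each sampled
    logit corrects (in expectation) for scoring only a random subset of labels
    instead of the full label set.
    """
    adjusted = np.concatenate([[logit_pos - log_q_pos], logits_neg - log_q_neg])
    m = adjusted.max()
    return m + np.log(np.exp(adjusted - m).sum()) - adjusted[0]

# Example: one positive plus 5 negatives drawn uniformly from n labels.
n, num_neg = 10_000, 5
rng = np.random.default_rng(0)
loss = sampled_softmax_loss(logit_pos=2.0,
                            logits_neg=rng.normal(size=num_neg),
                            log_q_neg=np.full(num_neg, -np.log(n)))
```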
AdaCliP: Adaptive Clipping for Private SGD
Pichapati, Venkatadheeraj, Suresh, Ananda Theertha, Yu, Felix X., Reddi, Sashank J., Kumar, Sanjiv
Privacy-preserving machine learning algorithms are crucial for learning models over user data while protecting sensitive information. Motivated by this, differentially private stochastic gradient descent (SGD) algorithms for training machine learning models have been proposed. At each step, these algorithms modify the gradients and add noise proportional to the sensitivity of the modified gradients. Under this framework, we propose AdaCliP, a theoretically motivated differentially private SGD algorithm that provably adds less noise than previous methods by using coordinate-wise adaptive clipping of the gradient. We empirically demonstrate that AdaCliP reduces the amount of added noise and produces models with better accuracy.
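The sketch below shows one DP-SGD step with coordinate-wise adaptive clipping in the spirit of AdaCliP; it is illustrative only, not the paper's exact update rules, and the running `scale` estimate is a stand-in that a real implementation would have to maintain privately.

```python
import numpy as np

def dp_sgd_step_adaptive_clip(params, per_example_grads, scale, lr=0.1,
                              clip_norm=1.0, noise_mult=1.0, eps=1e-8):
    """One DP-SGD step with coordinate-wise adaptive clipping (illustrative only).

    Gradients are rescaled coordinate-wise by a running magnitude estimate
    `scale` before clipping, so no single large coordinate forces the rest to
    be clipped away. This follows the spirit of AdaCliP but is NOT the paper's
    exact update; in particular, `scale` must itself be updated privately in a
    real system, which is omitted here.
    """
    g = per_example_grads / (scale + eps)                       # (n, d)
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    g = g * np.minimum(1.0, clip_norm / (norms + eps))          # per-example clip
    noisy_mean = g.mean(axis=0) + np.random.normal(
        scale=noise_mult * clip_norm / len(g), size=g.shape[1])
    update = noisy_mean * (scale + eps)                         # undo the scaling
    new_scale = 0.9 * scale + 0.1 * np.abs(per_example_grads).mean(axis=0)
    return params - lr * update, new_scale
```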
The Sparse Recovery Autoencoder
Wu, Shanshan, Dimakis, Alexandros G., Sanghavi, Sujay, Yu, Felix X., Holtmann-Rice, Daniel, Storcheus, Dmitry, Rostamizadeh, Afshin, Kumar, Sanjiv
Linear encoding of sparse vectors is widely used, but is most commonly data-independent, missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used $\ell_1$ decoder. The convex $\ell_1$ decoder prevents gradient propagation as needed in standard autoencoder training. Our method is based on the insight that unfolding the convex decoder into $T$ projected gradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing matrix. Our experiments show that there is indeed additional structure beyond sparsity in several real datasets. Our autoencoder is able to discover it and exploit it to create excellent reconstructions with fewer measurements than previous state-of-the-art methods.
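A small numpy sketch of such an unrolled decoder: $T$ projected subgradient steps for min ||x||_1 subject to Ax = y. Because each step is differentiable almost everywhere, a learned encoder A could be trained end to end through it; the training loop is omitted, and the details below are not tied to the paper's exact architecture.

```python
import numpy as np

def l1_decode_unrolled(A, y, T=200, step=0.1):
    """Approximate l1 decoding unrolled into T projected (sub)gradient steps.

    Each step moves against sign(x), a subgradient of ||x||_1, and projects
    back onto the affine set {x : Ax = y}.
    """
    pinv = A.T @ np.linalg.inv(A @ A.T)     # projection helper (m << d assumed)
    x = pinv @ y                            # feasible starting point (min-norm)
    for t in range(T):
        z = x - step / np.sqrt(t + 1) * np.sign(x)
        x = z - pinv @ (A @ z - y)          # project back onto {x : Ax = y}
    return x

# Toy usage: recover a 5-sparse vector in R^100 from 30 random measurements.
rng = np.random.default_rng(0)
d, m, k = 100, 30, 5
x_true = np.zeros(d)
x_true[rng.choice(d, k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, d)) / np.sqrt(m)
x_hat = l1_decode_unrolled(A, A @ x_true)
```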
Orthogonal Random Features
Yu, Felix X., Suresh, Ananda Theertha, Choromanski, Krzysztof, Holtmann-Rice, Daniel, Kumar, Sanjiv
We present an intriguing discovery related to Random Fourier Features: in Gaussian kernel approximation, replacing the random Gaussian matrix by a properly scaled random orthogonal matrix significantly decreases kernel approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for this behavior. Motivated by this discovery, we further propose Structured Orthogonal Random Features (SORF), which uses a class of structured discrete orthogonal matrices to speed up the computation. The method reduces the time cost from $\mathcal{O}(d^2)$ to $\mathcal{O}(d \log d)$, where $d$ is the data dimensionality, with almost no compromise in kernel approximation quality compared to ORF. Experiments on several datasets verify the effectiveness of ORF and SORF over existing methods. We also discuss using the same type of discrete orthogonal structure in a broader range of applications.
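A short numpy sketch of the ORF construction: take the Q factor of a random Gaussian matrix and rescale its rows by chi-distributed norms, so each row is marginally distributed like an unstructured Gaussian row while the rows stay orthogonal. The toy check at the end is just for illustration.

```python
import numpy as np

def orthogonal_random_features(X, n_features, sigma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel with an orthogonal W (ORF)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    blocks = []
    for _ in range(int(np.ceil(n_features / d))):
        G = rng.normal(size=(d, d))
        Q, _ = np.linalg.qr(G)                          # random orthogonal matrix
        # Row norms of an independent Gaussian matrix follow a chi(d) distribution.
        S = np.linalg.norm(rng.normal(size=(d, d)), axis=1)
        blocks.append(S[:, None] * Q)
    W = np.vstack(blocks)[:n_features] / sigma
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(n_features)

# Toy check: phi(x) . phi(y) should approximate exp(-||x - y||^2 / (2 sigma^2)).
rng = np.random.default_rng(1)
x, y = rng.normal(size=(2, 16))
phi = orthogonal_random_features(np.stack([x, y]), n_features=512, sigma=2.0)
approx = phi[0] @ phi[1]
exact = np.exp(-np.sum((x - y) ** 2) / (2 * 2.0 ** 2))
```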
Compact Nonlinear Maps and Circulant Extensions
Yu, Felix X., Kumar, Sanjiv, Rowley, Henry, Chang, Shih-Fu
Kernel approximation via nonlinear random feature maps is widely used to speed up kernel machines. Conventional kernel approximation methods face two main challenges. First, before performing kernel approximation, a good kernel has to be chosen, and picking a good kernel is a very challenging problem in itself. Second, high-dimensional maps are often required to achieve good performance, which leads to high computational cost both in generating the nonlinear maps and in the subsequent learning and prediction process. In this work, we propose to optimize the nonlinear maps directly with respect to the classification objective in a data-dependent fashion. The proposed approach achieves kernel approximation and kernel learning in a joint framework, leading to much more compact maps without hurting performance. As a by-product, the same framework can also be used to obtain more compact maps that approximate a known kernel. We also introduce Circulant Nonlinear Maps, which use a circulant-structured projection matrix to speed up the computation of the nonlinear maps for high-dimensional data.
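A toy numpy sketch of the "optimize the maps for the classification objective" idea: random-Fourier-style features z = cos(Wx + b) feed a logistic classifier, and gradient steps are taken jointly on W, b, and the classifier weights. This illustrates the general recipe only; it is not the paper's training procedure, and the circulant extension (which would constrain W) is not shown.

```python
import numpy as np

def train_compact_nonlinear_map(X, y, n_features=64, lr=0.05, epochs=200, seed=0):
    """Jointly optimise a nonlinear feature map and a linear classifier (sketch).

    Features are z = cos(W x + b); instead of fixing W at random, W, b, and the
    classifier weights v are all updated by gradient descent on a logistic loss.
    Labels y are assumed to be in {0, 1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(n_features, d))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    v = np.zeros(n_features)
    for _ in range(epochs):
        U = X @ W.T + b                       # (n, n_features)
        Z = np.cos(U)
        p = 1.0 / (1.0 + np.exp(-(Z @ v)))    # predicted P(y = 1)
        g_s = (p - y) / n                     # d loss / d score
        g_v = Z.T @ g_s
        g_Z = np.outer(g_s, v)
        g_U = -np.sin(U) * g_Z                # chain rule through cos
        g_W = g_U.T @ X
        g_b = g_U.sum(axis=0)
        v -= lr * g_v
        W -= lr * g_W
        b -= lr * g_b
    return W, b, v
```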
On Learning from Label Proportions
Yu, Felix X., Choromanski, Krzysztof, Kumar, Sanjiv, Jebara, Tony, Chang, Shih-Fu
Learning from Label Proportions (LLP) is a learning setting in which the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known. The task is to learn a model to predict the class labels of the individual instances. LLP has broad applications in political science, marketing, healthcare, and computer vision. This work answers the fundamental question of when and why LLP is possible by introducing a general framework, Empirical Proportion Risk Minimization (EPRM). EPRM learns an instance-label classifier to match the given label proportions on the training data. Our result is based on a two-step analysis. First, we provide a VC bound on the generalization error of the bag proportions, and show that the bag sample complexity is only mildly sensitive to the bag size. Second, we show that under some mild assumptions, good bag proportion prediction guarantees good instance label prediction. Together, these results provide a formal guarantee that the individual labels can indeed be learned in the LLP setting. We discuss applications of the analysis, including justification of LLP algorithms, learning with population proportions, and a paradigm for learning algorithms with privacy guarantees. We also demonstrate the feasibility of LLP with a case study in a real-world setting: predicting income from census data.
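A minimal illustration of the EPRM principle with a logistic model: fit the parameters so that the average predicted probability in each bag matches the bag's given label proportion, using only bag-level supervision. The squared proportion loss and plain gradient descent below are illustrative choices, not the paper's.

```python
import numpy as np

def eprm_logistic(bags, proportions, lr=0.5, epochs=500, seed=0):
    """Empirical Proportion Risk Minimization with a logistic model (sketch).

    `bags` is a list of (n_b, d) arrays; `proportions` gives the fraction of
    positives in each bag. No instance labels are used.
    """
    rng = np.random.default_rng(seed)
    d = bags[0].shape[1]
    w = rng.normal(scale=0.01, size=d)
    for _ in range(epochs):
        grad = np.zeros(d)
        for Xb, pb in zip(bags, proportions):
            p = 1.0 / (1.0 + np.exp(-(Xb @ w)))        # instance probabilities
            err = p.mean() - pb                        # bag proportion error
            # d/dw [mean(p)] = mean(p * (1 - p) * x) over the bag
            grad += 2 * err * (Xb * (p * (1 - p))[:, None]).mean(axis=0)
        w -= lr * grad
    return w
```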
Circulant Binary Embedding
Yu, Felix X., Kumar, Sanjiv, Gong, Yunchao, Chang, Shih-Fu
Binary embedding of high-dimensional data requires long codes to preserve the discriminative power of the input space. Traditional binary coding methods often suffer from very high computation and storage costs in such a scenario. To address this problem, we propose Circulant Binary Embedding (CBE), which generates binary codes by projecting the data with a circulant matrix. The circulant structure enables the use of the Fast Fourier Transform to speed up the computation. Compared to methods that use unstructured matrices, the proposed method improves the time complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d\log{d})$, and the space complexity from $\mathcal{O}(d^2)$ to $\mathcal{O}(d)$, where $d$ is the input dimensionality. We also propose a novel time-frequency alternating optimization to learn data-dependent circulant projections, which alternately minimizes the objective in the original and Fourier domains. We show by extensive experiments that the proposed approach gives much better performance than state-of-the-art approaches for a fixed time budget, and provides much faster computation with no performance degradation for a fixed number of bits.
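A short numpy sketch of the core computational trick: multiplying by a circulant matrix circ(r) is a circular convolution, so sign(circ(r) D x) can be computed with FFTs in O(d log d). The learned, data-dependent projection from the paper is not shown; r is simply random here, and D is the usual random sign flipping.

```python
import numpy as np

def circulant_binary_embedding(x, r, flip_signs):
    """Binary code sign(circ(r) D x), computed in O(d log d) via the FFT.

    `r` is the first column of the circulant projection matrix; `flip_signs`
    is a random +-1 vector playing the role of D.
    """
    xd = x * flip_signs
    proj = np.fft.ifft(np.fft.fft(r) * np.fft.fft(xd)).real   # circ(r) @ xd
    return np.sign(proj)

d = 1024
rng = np.random.default_rng(0)
r = rng.normal(size=d)                  # defines the circulant projection
flip = rng.choice([-1.0, 1.0], size=d)
code = circulant_binary_embedding(rng.normal(size=d), r, flip)
```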
$\propto$SVM for learning with label proportions
Yu, Felix X., Liu, Dong, Kumar, Sanjiv, Jebara, Tony, Chang, Shih-Fu
We study the problem of learning with label proportions, in which the training data is provided in groups and only the proportion of each class in each group is known. We propose a new method called proportion-SVM, or $\propto$SVM, which explicitly models the latent unknown instance labels together with the known group label proportions in a large-margin framework. Unlike existing works, our approach avoids making restrictive assumptions about the data. The $\propto$SVM model leads to a non-convex integer programming problem. To solve it efficiently, we propose two algorithms: one based on simple alternating optimization and the other based on a convex relaxation. Extensive experiments on standard datasets show that $\propto$SVM outperforms the state-of-the-art, especially for larger group sizes.
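A rough sketch of the simple alternating-optimization idea: alternate between fitting a linear SVM on the current label guesses and re-assigning labels within each bag to match its known proportion. The Pegasos-style SVM solver and the greedy top-k re-assignment below are illustrative simplifications, not the paper's exact algorithm.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * X[i].dot(w) < 1:
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

def alternating_prop_svm(bags, proportions, n_outer=10):
    """Alternate between fitting the classifier and re-assigning bag labels."""
    X = np.vstack(bags)
    sizes = [len(b) for b in bags]
    # Initial guess: mark the first round(p * |bag|) instances of each bag positive.
    y = np.concatenate([np.where(np.arange(n) < round(p * n), 1.0, -1.0)
                        for n, p in zip(sizes, proportions)])
    for _ in range(n_outer):
        w = train_linear_svm(X, y)                     # step 1: fit classifier
        labels = []
        for b, p in zip(bags, proportions):            # step 2: re-assign labels
            k = int(round(p * len(b)))                 # positives this bag needs
            lab = -np.ones(len(b))
            lab[np.argsort(-(b @ w))[:k]] = 1.0        # highest scores -> positive
            labels.append(lab)
        y = np.concatenate(labels)
    return w
```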