AITopics | Salim, Adil

Collaborating Authors

Salim, Adil

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Phi-4 Technical Report

Abdin, Marah, Aneja, Jyoti, Behl, Harkirat, Bubeck, Sébastien, Eldan, Ronen, Gunasekar, Suriya, Harrison, Michael, Hewett, Russell J., Javaheripi, Mojan, Kauffmann, Piero, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Liu, Weishung, Mendes, Caio C. T., Nguyen, Anh, Price, Eric, de Rosa, Gustavo, Saarikivi, Olli, Salim, Adil, Shah, Shital, Wang, Xin, Ward, Rachel, Wu, Yue, Yu, Dingli, Zhang, Cyril, Zhang, Yi

arXiv.org Artificial IntelligenceDec-11-2024

We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2412.08905

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Sports (0.92)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Long-time asymptotics of noisy SVGD outside the population limit

Priser, Victor, Bianchi, Pascal, Salim, Adil

arXiv.org Artificial IntelligenceJun-21-2024

Stein Variational Gradient Descent (SVGD) is a widely used sampling algorithm that has been successfully applied in several areas of Machine Learning. SVGD operates by iteratively moving a set of interacting particles (which represent the samples) to approximate the target distribution. Despite recent studies on the complexity of SVGD and its variants, their long-time asymptotic behavior (i.e., after numerous iterations ) is still not understood in the finite number of particles regime. We study the long-time asymptotic behavior of a noisy variant of SVGD. First, we establish that the limit set of noisy SVGD for large is well-defined. We then characterize this limit set, showing that it approaches the target distribution as increases. In particular, noisy SVGD provably avoids the variance collapse observed for SVGD. Our approach involves demonstrating that the trajectories of noisy SVGD closely resemble those described by a McKean-Vlasov process.

artificial intelligence, machine learning, svgd, (15 more...)

arXiv.org Artificial Intelligence

2406.11929

Country: Europe > Switzerland (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Add feedback

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Abdin, Marah, Jacobs, Sam Ade, Awan, Ammar Ahmad, Aneja, Jyoti, Awadallah, Ahmed, Awadalla, Hany, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Qin, Cai, Martin, Mendes, Caio César Teodoro, Chen, Weizhu, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Yen-Chun, Chen, Yi-Ling, Chopra, Parul, Dai, Xiyang, Del Giorno, Allie, de Rosa, Gustavo, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Iter, Dan, Gao, Mei, Gao, Min, Gao, Jianfeng, Garg, Amit, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Huynh, Jamie, Javaheripi, Mojan, Jin, Xin, Kauffmann, Piero, Karampatziakis, Nikos, Kim, Dongwoo, Khademi, Mahoud, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Liu, Ce, Liu, Mengchen, Liu, Weishung, Lin, Eric, Lin, Zeqi, Luo, Chong, Madan, Piyush, Mazzola, Matt, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Wang, Xin, Wang, Lijuan, Wang, Chunyu, Wang, Yu, Ward, Rachel, Wang, Guanhua, Witte, Philipp, Wu, Haiping, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Ziyi, Yang, Yifan, Yu, Donghan, Yuan, Lu, Zhang, Chengruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren

arXiv.org Artificial IntelligenceMay-23-2024

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2404.14219

Country:

Europe (0.28)
Asia > China (0.14)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Sports (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Textbooks Are All You Need

Gunasekar, Suriya, Zhang, Yi, Aneja, Jyoti, Mendes, Caio César Teodoro, Del Giorno, Allie, Gopi, Sivakanth, Javaheripi, Mojan, Kauffmann, Piero, de Rosa, Gustavo, Saarikivi, Olli, Salim, Adil, Shah, Shital, Behl, Harkirat Singh, Wang, Xin, Bubeck, Sébastien, Eldan, Ronen, Kalai, Adam Tauman, Lee, Yin Tat, Li, Yuanzhi

arXiv.org Artificial IntelligenceOct-2-2023

We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.11644

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Gaussian random field approximation via Stein's method with applications to wide random neural networks

Balasubramanian, Krishnakumar, Goldstein, Larry, Ross, Nathan, Salim, Adil

arXiv.org Artificial IntelligenceJun-28-2023

We derive upper bounds on the Wasserstein distance ($W_1$), with respect to $\sup$-norm, between any continuous $\mathbb{R}^d$ valued random field indexed by the $n$-sphere and the Gaussian, based on Stein's method. We develop a novel Gaussian smoothing technique that allows us to transfer a bound in a smoother metric to the $W_1$ distance. The smoothing is based on covariance functions constructed using powers of Laplacian operators, designed so that the associated Gaussian process has a tractable Cameron-Martin or Reproducing Kernel Hilbert Space. This feature enables us to move beyond one dimensional interval-based index sets that were previously considered in the literature. Specializing our general result, we obtain the first bounds on the Gaussian random field approximation of wide random neural networks of any depth and Lipschitz activation functions at the random field level. Our bounds are explicitly expressed in terms of the widths of the network and moments of the random weights. We also obtain tighter bounds when the activation function has three bounded derivatives.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2306.16308

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The probability flow ODE is provably fast

Chen, Sitan, Chewi, Sinho, Lee, Holden, Li, Yuanzhi, Lu, Jianfeng, Salim, Adil

arXiv.org Artificial IntelligenceMay-19-2023

We provide the first polynomial-time convergence guarantees for the probability flow ODE implementation (together with a corrector step) of score-based generative modeling. Our analysis is carried out in the wake of recent results obtaining such guarantees for the SDE-based implementation (i.e., denoising diffusion probabilistic modeling or DDPM), but requires the development of novel techniques for studying deterministic dynamics without contractivity. Through the use of a specially chosen corrector step based on the underdamped Langevin diffusion, we obtain better dimension dependence than prior works on DDPM ($O(\sqrt{d})$ vs. $O(d)$, assuming smoothness of the data distribution), highlighting potential advantages of the ODE framework.

artificial intelligence, machine learning, null 2, (13 more...)

arXiv.org Artificial Intelligence

2305.11798

Country: North America > United States (0.92)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

Chen, Sitan, Chewi, Sinho, Li, Jerry, Li, Yuanzhi, Salim, Adil, Zhang, Anru R.

arXiv.org Artificial IntelligenceApr-15-2023

Score-based generative models (SGMs) are a family of generative models which achieve state-of-the-art performance for generating audio and image data [Soh+15; HJA20; DN21; Kin+21; Son+21a; Son+21b; VKK21]; see, e.g., the recent surveys [Cao+22; Cro+22; Yan+22]. One notable example of an SGM are denoising diffusion probabilistic models (DDPMs) [Soh+15; HJA20], which are a key component in largescale generative models such as DALL E 2 [Ram+22]. As the importance of SGMs continues to grow due to newfound applications in commercial domains, it is a pressing question of both practical and theoretical concern to understand the mathematical underpinnings which explain their startling empirical successes. As we explain in more detail in Section 2, at their mathematical core, SGMs consist of two stochastic processes, which we call the forward process and the reverse process. The forward process transforms samples from a data distribution q (e.g., natural images) into pure noise, whereas the reverse process transforms pure noise into samples from q, hence performing generative modeling. Implementation of the reverse process requires estimation of the score function of the law of the forward process, which is typically accomplished by training neural networks on a score matching objective [Hyv05; Vin11; SE19]. Providing precise guarantees for estimation of the score function is difficult, as it requires an understanding of the non-convex training dynamics of neural network optimization that is currently out of reach. However, given the empirical success of neural networks on the score estimation task, a natural and important question is whether or not accurate score estimation implies that SGMs provably converge to the true data distribution in realistic settings.

artificial intelligence, exp, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2209.11215

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Add feedback

Forward-backward Gaussian variational inference via JKO in the Bures-Wasserstein Space

Diao, Michael, Balasubramanian, Krishnakumar, Chewi, Sinho, Salim, Adil

arXiv.org Artificial IntelligenceApr-10-2023

Variational inference (VI) seeks to approximate a target distribution $\pi$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $\pi$ by minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when $\pi$ is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when $\pi$ is only log-smooth.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2304.05398

Country: North America > United States (0.92)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Towards a Theory of Non-Log-Concave Sampling: First-Order Stationarity Guarantees for Langevin Monte Carlo

Balasubramanian, Krishnakumar, Chewi, Sinho, Erdogdu, Murat A., Salim, Adil, Zhang, Matthew

arXiv.org Machine LearningFeb-10-2022

For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte Carlo outputs a sample with $\varepsilon$-relative Fisher information after $O( L^2 d^2/\varepsilon^2)$ iterations. This is the sampling analogue of complexity bounds for finding an $\varepsilon$-approximate first-order stationary points in non-convex optimization and therefore constitutes a first step towards the general theory of non-log-concave sampling. We discuss numerous extensions and applications of our result; in particular, it yields a new state-of-the-art guarantee for sampling from distributions which satisfy a Poincar\'e inequality.

artificial intelligence, inequality, machine learning, (16 more...)

arXiv.org Machine Learning

2202.05214

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

A Non-Asymptotic Analysis for Stein Variational Gradient Descent

Korba, Anna, Salim, Adil, Arbel, Michael, Luise, Giulia, Gretton, Arthur

arXiv.org Machine LearningOct-27-2020

We study the Stein Variational Gradient Descent (SVGD) algorithm, which optimises a set of particles to approximate a target probability distribution $\pi\propto e^{-V}$ on $\mathbb{R}^d$. In the population limit, SVGD performs gradient descent in the space of probability distributions on the KL divergence with respect to $\pi$, where the gradient is smoothed through a kernel integral operator. In this paper, we provide a novel finite time analysis for the SVGD algorithm. We provide a descent lemma establishing that the algorithm decreases the objective at each iteration, and rates of convergence. We also provide a convergence result of the finite particle system corresponding to the practical implementation of SVGD to its population version.

artificial intelligence, optimization problem, stein, (14 more...)

arXiv.org Machine Learning

2006.09797

Country: North America > Canada (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback