AITopics | Chang, Sung-En

Fully Open Source Moxin-7B Technical Report

Zhao, Pu, Shen, Xuan, Kong, Zhenglun, Shen, Yixin, Chang, Sung-En, Rupprecht, Timothy, Lu, Lei, Nan, Enfu, Yang, Changdi, He, Yumei, Xu, Xingchen, Huang, Yu, Wang, Wei, Chen, Yue, He, Yong, Wang, Yanzhi

arXiv.org Artificial IntelligenceDec-11-2024

Recently, Large Language Models (LLMs) have undergone a significant transformation, marked by a rapid rise in both their popularity and capabilities. Leading this evolution are proprietary LLMs like GPT-4 and GPT-o1, which have captured widespread attention in the AI community due to their remarkable performance and versatility. Simultaneously, open-source LLMs, such as LLaMA and Mistral, have made great contributions to the ever-increasing popularity of LLMs due to the ease to customize and deploy the models across diverse applications. Although open-source LLMs present unprecedented opportunities for innovation and research, the commercialization of LLMs has raised concerns about transparency, reproducibility, and safety. Many open-source LLMs fail to meet fundamental transparency requirements by withholding essential components like training code and data, and some use restrictive licenses whilst claiming to be "open-source," which may hinder further innovations on LLMs. To mitigate this issue, we introduce Moxin 7B, a fully open-source LLM developed in accordance with the Model Openness Framework (MOF), a ranked classification system that evaluates AI models based on model completeness and openness, adhering to principles of open science, open source, open data, and open access. Our model achieves the highest MOF classification level of "open science" through the comprehensive release of pre-training code and configurations, training and fine-tuning datasets, and intermediate and final checkpoints. Experiments show that our model achieves superior performance in zero-shot evaluation compared with popular 7B models and performs competitively in few-shot evaluation.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.06845

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MixLasso: Generalized Mixed Regression via Convex Atomic-Norm Regularization

Yen, Ian En-Hsu, Lee, Wei-Cheng, Zhong, Kai, Chang, Sung-En, Ravikumar, Pradeep K., Lin, Shou-De

Neural Information Processing SystemsFeb-14-2020, 21:14:29 GMT

We consider a generalization of mixed regression where the response is an additive combination of several mixture components. Standard mixed regression is a special case where each response is generated from exactly one component. Typical approaches to the mixture regression problem employ local search methods such as Expectation Maximization (EM) that are prone to spurious local optima. On the other hand, a number of recent theoretically-motivated \emph{Tensor-based methods} either have high sample complexity, or require the knowledge of the input distribution, which is not available in most of practical situations. In this work, we study a novel convex estimator \emph{MixLasso} for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components.

artificial intelligence, machine learning, mixed regression, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.63)

Add feedback

MixLasso: Generalized Mixed Regression via Convex Atomic-Norm Regularization

Yen, Ian En-Hsu, Lee, Wei-Cheng, Zhong, Kai, Chang, Sung-En, Ravikumar, Pradeep K., Lin, Shou-De

Neural Information Processing SystemsDec-31-2018

We consider a generalization of mixed regression where the response is an additive combination of several mixture components. Standard mixed regression is a special case where each response is generated from exactly one component. Typical approaches to the mixture regression problem employ local search methods such as Expectation Maximization (EM) that are prone to spurious local optima. On the other hand, a number of recent theoretically-motivated \emph{Tensor-based methods} either have high sample complexity, or require the knowledge of the input distribution, which is not available in most of practical situations. In this work, we study a novel convex estimator \emph{MixLasso} for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components. Our algorithm gives a risk bound that trades off between prediction accuracy and model sparsity without imposing stringent assumptions on the input/output distribution, and can be easily adapted to the case of non-linear functions. In our numerical experiments on mixtures of linear as well as nonlinear regressions, the proposed method yields high-quality solutions in a wider range of settings than existing approaches.

artificial intelligence, machine learning, mixlasso, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.14)
Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

MixLasso: Generalized Mixed Regression via Convex Atomic-Norm Regularization

Yen, Ian En-Hsu, Lee, Wei-Cheng, Zhong, Kai, Chang, Sung-En, Ravikumar, Pradeep K., Lin, Shou-De

Neural Information Processing SystemsDec-31-2018

We consider a generalization of mixed regression where the response is an additive combination of several mixture components. Standard mixed regression is a special case where each response is generated from exactly one component. Typical approaches to the mixture regression problem employ local search methods such as Expectation Maximization (EM) that are prone to spurious local optima. On the other hand, a number of recent theoretically-motivated \emph{Tensor-based methods} either have high sample complexity, or require the knowledge of the input distribution, which is not available in most of practical situations. In this work, we study a novel convex estimator \emph{MixLasso} for the estimation of generalized mixed regression, based on an atomic norm specifically constructed to regularize the number of mixture components. Our algorithm gives a risk bound that trades off between prediction accuracy and model sparsity without imposing stringent assumptions on the input/output distribution, and can be easily adapted to the case of non-linear functions. In our numerical experiments on mixtures of linear as well as nonlinear regressions, the proposed method yields high-quality solutions in a wider range of settings than existing approaches.

artificial intelligence, machine learning, regression, (18 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Learning Tensor Latent Features

Chang, Sung-En, Zheng, Xun, Yen, Ian E. H., Ravikumar, Pradeep, Yu, Rose

arXiv.org Machine LearningOct-10-2018

Compared to the classic latent factor models [14], latent feature models have two main benefits: (1) interpretablity: the binary codes directly reveal whether certain features exist in the data, thus provide more interpretable latent profiles [25]; (2) scalability: compared with real-valued codes, binary codes require fewer bits to store, thereby cutting down the memory footprint, making it easier to deploy into memory constrained environments such as mobile devices. Tensor latent feature models generalize traditional matrix latent feature models to represent high-order correlation structures in the data. For example, in spatiotemporal recommender systems, the observations are user activities over different locations and time. We want to learn the latent features and codes that correspond to user, space and time simultaneously without assuming conditional independence of these three dimensions. In this case, we can first represent such data as a high-order tensor and assign binary codes encoding presence or absence of rows for individual modes of the tensor. These codes can then help us answer the "when" and "where" questions regarding the learned user preferences. Besides the non-convex formulation of the maximum likelihood estimation (MLE) objective, learning latent feature models is further complicated by the combinatorial nature of the codes.

artificial intelligence, latent feature model, machine learning, (16 more...)

arXiv.org Machine Learning

1810.04754

Country: North America > United States > New York (0.14)

Genre: Research Report (0.50)

Technology: