feature scaling
Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder
Xu, Zhen, Tan, Zhen, Wang, Song, Xu, Kaidi, Chen, Tianlong
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models (LLMs) by decomposing token activations into combinations of human-understandable features. While SAEs provide crucial insights into LLM behavior, their practical adoption faces a fundamental challenge: better interpretability demands that SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by partitioning SAEs into narrower expert networks with gated activation, thereby reducing computation. In a well-designed MoE, each expert should focus on learning a distinct set of features. However, we identify a critical limitation in MoE-SAEs: experts often fail to specialize, frequently learning overlapping or identical features. To address this, we propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling. Experiments demonstrate a 24% lower reconstruction error and a 99% reduction in feature redundancy compared to existing MoE-SAE methods. This work bridges the interpretability-efficiency gap in LLM analysis, allowing transparent model inspection without compromising computational feasibility.
- North America > United States > North Carolina (0.04)
- North America > United States > Arizona (0.04)
- North America > Canada > Newfoundland and Labrador > Labrador (0.04)
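A minimal NumPy sketch of the gated multi-expert SAE idea the abstract describes: route each activation to a small, gated subset of narrow experts and sum their weighted reconstructions. All dimensions, weight names, and the routing scheme here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: activation dim d, E experts, m latent features each,
# and top-k routing. None of these values come from the paper.
d, E, m, k = 16, 4, 32, 2

W_enc = rng.normal(0, 0.1, (E, d, m))    # per-expert encoder weights
W_dec = rng.normal(0, 0.1, (E, m, d))    # per-expert decoder weights
W_route = rng.normal(0, 0.1, (d, E))     # router producing expert scores

def moe_sae_forward(x):
    """Encode x with the k highest-scoring experts; sum gated reconstructions."""
    scores = x @ W_route                  # (E,) router logits
    top = np.argsort(scores)[-k:]         # indices of the k selected experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                  # softmax over the selected experts
    recon = np.zeros_like(x)
    for g, e in zip(gates, top):
        z = np.maximum(x @ W_enc[e], 0.0) # sparse (ReLU) latents of expert e
        recon += g * (z @ W_dec[e])       # gated partial reconstruction
    return recon

x = rng.normal(size=d)
print(moe_sae_forward(x).shape)           # (16,)
```

Only k of the E experts run per token, which is the source of the computational savings the abstract mentions; the paper's contribution is making those experts learn distinct rather than redundant features.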
LCDB 1.1: A Database Illustrating Learning Curves Are More Ill-Behaved Than Previously Thought
Yan, Cheng, Mohr, Felix, Viering, Tom
Sample-wise learning curves plot performance versus training set size. They are useful for studying scaling laws and for speeding up hyperparameter tuning and model selection. Learning curves are often assumed to be well-behaved: monotone (i.e., improving with more data) and convex. By constructing the Learning Curves Database 1.1 (LCDB 1.1), a large-scale database of high-resolution learning curves that includes more modern learners (CatBoost, TabNet, RealMLP, and TabPFN), we show that learning curves are less often well-behaved than previously thought. Using statistically rigorous methods, we observe significant ill-behavior in approximately 15% of the learning curves, almost twice as many as previous estimates. We also identify which learners are to blame, showing that some learners are markedly more ill-behaved than others. Additionally, we demonstrate that different feature scalings rarely resolve ill-behavior. We evaluate the impact of ill-behavior on downstream tasks, such as learning curve fitting and model selection, and find that it poses significant challenges, underscoring the relevance of LCDB 1.1 as a challenging benchmark for future research.
- Europe > Netherlands > South Holland > Delft (0.04)
- North America > Mexico > Yucatán > Mérida (0.04)
- Europe > Switzerland (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (0.92)
- Information Technology (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
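The well-behavedness properties the abstract tests can be checked mechanically. A small sketch on a synthetic curve, using simple finite-difference checks rather than the paper's statistically rigorous methods:

```python
import numpy as np

def ill_behavior(sizes, errors):
    """Flag violations of the usual well-behavedness assumptions:
    monotone (error never increases with more data) and convex
    (improvements shrink as the training set grows)."""
    e = np.asarray(errors, dtype=float)
    diffs = np.diff(e)                          # successive changes in error
    monotone = bool(np.all(diffs <= 0))         # error should never go up
    convex = bool(np.all(np.diff(diffs) >= 0))  # improvements should diminish
    return {"monotone": monotone, "convex": convex}

# A "peaking" curve: error briefly rises before falling again.
sizes = [100, 200, 400, 800, 1600]
errors = [0.30, 0.22, 0.25, 0.15, 0.12]
print(ill_behavior(sizes, errors))  # {'monotone': False, 'convex': False}
```

Here the bump from 0.22 to 0.25 breaks monotonicity, and the uneven improvements break convexity; on noisy real curves one would need the statistical tests the paper applies to distinguish true ill-behavior from sampling noise.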
The Impact of Feature Scaling In Machine Learning: Effects on Regression and Classification Tasks
Pinheiro, João Manoel Herrera, de Oliveira, Suzana Vilas Boas, Silva, Thiago Henrique Segreto, Saraiva, Pedro Antonio Rabelo, de Souza, Enzo Ferreira, Godoy, Ricardo V., Ambrosio, Leonardo André, Becker, Marcelo
This research addresses the critical lack of comprehensive studies on feature scaling by systematically evaluating 12 scaling techniques - including several less common transformations - across 14 different Machine Learning algorithms and 16 datasets for classification and regression tasks. We meticulously analyzed impacts on predictive performance (using metrics such as accuracy, MAE, MSE, and $R^2$) and computational costs (training time, inference time, and memory usage). Key findings reveal that while ensemble methods (such as Random Forest and gradient boosting models like XGBoost, CatBoost, and LightGBM) demonstrate robust performance largely independent of scaling, other widely used models such as Logistic Regression, SVMs, TabNet, and MLPs show significant performance variations highly dependent on the chosen scaler. This extensive empirical analysis, with all source code, experimental results, and model parameters made publicly available to ensure complete transparency and reproducibility, offers crucial, model-specific guidance to practitioners on selecting optimal feature scaling techniques.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
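As a rough illustration of why the choice of scaler matters, here is a minimal NumPy sketch of three common scalers. Note how a single outlier compresses the min-max-scaled inliers into a tiny range, while the robust (median/IQR) scaler preserves their spread:

```python
import numpy as np

def standard(x):  # zero mean, unit variance per feature
    return (x - x.mean(0)) / x.std(0)

def minmax(x):    # map each feature to [0, 1]
    return (x - x.min(0)) / (x.max(0) - x.min(0))

def robust(x):    # center on the median, scale by the IQR
    q1, q2, q3 = np.percentile(x, [25, 50, 75], axis=0)
    return (x - q2) / (q3 - q1)

# One feature with an outlier: min-max squashes the inliers near zero,
# the robust scaler keeps their relative spread intact.
x = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
print(minmax(x).ravel()[:4])   # inliers crammed into [0, 0.04]
print(robust(x).ravel()[:4])   # inliers span [-1, 0.5]
```

This is exactly the sensitivity the study measures: scale-sensitive models (SVMs, MLPs, Logistic Regression) see very different inputs under each transform, while tree ensembles split on thresholds and are largely unaffected.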
DTization: A New Method for Supervised Feature Scaling
Artificial intelligence is currently a dominant force in shaping various aspects of the world. Machine learning is a sub-field of artificial intelligence. Feature scaling is a data pre-processing technique that improves the performance of machine learning algorithms. Traditional feature scaling techniques are unsupervised: the dependent variable has no influence on the scaling process. In this paper, we present a novel feature scaling technique named DTization that employs a decision tree and the robust scaler for supervised feature scaling. The proposed method uses a decision tree to measure feature importance and, based on that importance, scales different features differently with the robust scaler algorithm. The method has been extensively evaluated on ten classification and regression datasets using various evaluation metrics, and the results show a noteworthy performance improvement compared to traditional feature scaling methods.
- Health & Medicine > Therapeutic Area (1.00)
- Information Technology (0.70)
- Energy > Oil & Gas > Upstream (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
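A hedged sketch of the DTization idea as the abstract describes it. The per-feature importances below are a placeholder vector, whereas the paper derives them from a fitted decision tree, and the exact way importance modulates the robust scaling here is an assumption, not the paper's formula:

```python
import numpy as np

def dtization(x, importances):
    """Sketch: robust-scale each feature, then weight it by its importance.
    In the paper the importances come from a decision tree fitted on the
    (supervised) target; here they are a hand-picked placeholder."""
    q1, med, q3 = np.percentile(x, [25, 50, 75], axis=0)
    robust = (x - med) / (q3 - q1)   # robust scaler: center on median, scale by IQR
    return robust * importances      # more important features keep a larger scale

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
imp = np.array([0.8, 0.2])           # placeholder importances (would sum to 1)
scaled = dtization(x, imp)
print(scaled.shape)                  # (4, 2)
```

The key departure from unsupervised scalers is that the weighting depends on the target via the tree, so two features with identical distributions can end up on different scales.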
Top 5 Machine Learning Practices Recommended by Experts - KDnuggets
Machine learning has been the subject of intense media hype, with more organizations adopting the technology to handle their everyday tasks. Machine learning practitioners may be able to present a solution, but enhancing model performance can be very challenging; it is something that comes with practice and experience. Even after trying every strategy, we often fail to improve the accuracy of the model. This article is therefore intended to help beginners improve their models by listing the best practices recommended by machine learning experts.
This AI newsletter is all you need #13
Originally published on Towards AI, the world's leading AI and technology news and media company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. This week, my attention was on Emad and the amazing work he and his team (Stability.ai)
#003 Machine Learning - Improving The Performance Of A Learning Algorithm - Master Data Science 18.07.2022
Highlights: Welcome back to our Machine Learning series. In the previous post, we covered Linear Regression, Cost Functions, and Gradient Descent, and built a simple Linear Regression model in Python. In this tutorial post, we will learn how to make our Linear Regression model faster and more powerful. We will start by building a Linear Regression model that uses multiple features and then enhance its performance using various techniques. Finally, we'll implement what we learn about Multiple Linear Regression in simple Python code. In our previous post, we studied an example of predicting the price of a house given its size. In that example, we worked with the original version of Linear Regression, which used only a single feature \(x \), the size of the house, to predict \(y \), the price of the house.
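The multiple-feature Linear Regression with gradient descent that the post builds up to can be sketched as follows. The house data and hyperparameters here are made up for illustration, and the features are standardized first, which is the scaling step such tutorials typically pair with gradient descent:

```python
import numpy as np

# Made-up house data: size (sq ft), bedrooms, age (years) -> price.
X = np.array([[2100.0, 3.0, 10.0],
              [1600.0, 2.0, 25.0],
              [2400.0, 4.0, 5.0],
              [1400.0, 2.0, 30.0]])
y = np.array([400.0, 300.0, 450.0, 250.0])   # prices (illustrative units)

Xs = (X - X.mean(0)) / X.std(0)              # standardize: balanced gradients
w, b, lr = np.zeros(3), 0.0, 0.1

for _ in range(2000):
    err = Xs @ w + b - y                     # residuals of current model
    w -= lr * (Xs.T @ err) / len(y)          # gradient step on the weights
    b -= lr * err.mean()                     # gradient step on the intercept

print(np.round(Xs @ w + b, 1))               # predictions approach y
```

With standardized features the same learning rate works for every weight; on the raw columns (thousands of square feet versus single-digit bedroom counts) the size dimension would dominate the gradient.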
Kaggle Master with Heart Attack Prediction Kaggle Project
Kaggle is a Machine Learning & Data Science community. Become a Kaggle master with a real machine learning Kaggle project. Preview this course. Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners, and a platform where data scientists can compete in machine learning challenges, ranging from predicting housing prices to many other prediction tasks. Machine learning describes systems that make predictions using a model trained on real-world data, and it is constantly being applied to new industries. Data science includes preparing, analyzing, and processing data.
Optimizing Neural Networks
The goal of training an artificial neural network is to achieve the lowest generalization error in the least amount of time. In this article I'll outline some common methods of optimizing training. Feature scaling is the process of scaling the input features such that all features occupy the same range of values. This ensures that the gradient of the cost function is not exaggerated in any particular dimension, which reduces oscillation during gradient descent. Oscillation during gradient descent means the training is not maximally efficient, as it is not taking the shortest path to the minimum of the cost function.