Feature Hashing for Scalable Machine Learning – Inside Machine Learning

#artificialintelligence

Feature hashing is a powerful technique for handling sparse, high-dimensional features in machine learning. It is fast, simple, memory-efficient, and well suited to online learning scenarios. Although it is an approximation, the accuracy tradeoff is surprisingly small in many machine learning problems. In this post, I will cover the basics of feature hashing and how to use it for flexible, scalable feature encoding and engineering. I'll also mention feature hashing in the context of Apache Spark's MLlib machine learning library.
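
As a quick illustration (a minimal sketch, not code from the post), the hashing trick maps arbitrary string features into a fixed-width vector with a hash function, so no vocabulary needs to be stored and unseen feature values are handled for free; the bucket count and signed hashing below are illustrative choices:

```python
# Minimal hashing-trick sketch: hash string features into a fixed-size vector.
import hashlib

def hash_features(tokens, n_buckets=2 ** 10):
    """Map each token to one of n_buckets slots; a sign bit reduces the
    bias that hash collisions would otherwise introduce."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        idx = h % n_buckets                          # bucket index
        sign = 1.0 if (h >> 1) % 2 == 0 else -1.0    # signed hashing
        vec[idx] += sign
    return vec

# Encode categorical and text features without building a vocabulary first.
x = hash_features(["color=red", "word=hello", "word=world"])
```

Ready-made implementations exist as well, such as scikit-learn's FeatureHasher and Spark MLlib's HashingTF transformer.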


Understanding a Version of Multivariate Symmetric Uncertainty to assist in Feature Selection

arXiv.org Machine Learning

In this paper, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure through the use of statistical simulation techniques under various mixes of informative and non-informative randomly generated features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. We discovered a condition that preserves good quality in the MSU under different combinations of these three factors, providing a new useful criterion to help drive the process of dimension reduction.
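
For context (a sketch with illustrative names, not the paper's code), the bivariate symmetric uncertainty that the MSU generalizes is SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), which can be estimated directly from counts for discrete features:

```python
# Pairwise symmetric uncertainty for discrete features:
# SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), bounded in [0, 1].
from collections import Counter
from math import log2

def entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mi = hx + hy - hxy               # mutual information I(X; Y)
    return 2.0 * mi / (hx + hy) if hx + hy > 0 else 0.0

# A fully informative feature scores 1, an unrelated one scores near 0.
print(symmetric_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))   # 1.0
```

The paper studies how a multivariate extension of this quantity behaves as the number of attributes, their cardinalities, and the sample size vary.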



Feature Engineering: Data Scientist's Secret Sauce!

@machinelearnbot

It is very tempting for data science practitioners to opt for the best-known algorithms for a given problem. However, it is not the algorithm alone that provides the best solution; a model built on carefully engineered and selected features can deliver far better results. "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." Complex models are harder to interpret and tougher to tune. A simpler algorithm with better features or more data can perform far better than a complex model built on weak assumptions.

