Comparison of Classification Methods for Very High-Dimensional Data in Sparse Random Projection Representation
Machine learning is a mature scientific field with many theoretical results and established algorithms and processes that address various supervised and unsupervised problems using the provided data. In theoretical research, such data is either generated in a convenient way, or methods are compared on standard benchmark problems, where data samples are represented as dense real-valued vectors of a fixed and relatively short length. Practical applications represented by such standard datasets can be solved successfully by any of the myriad existing machine learning methods and their implementations. However, machine learning currently has its greatest impact in the big data field, on problems that are easy to state in natural language ("Find malicious files", "Is that website safe to browse?") but hard to encode numerically. Data samples in these problems have distinct features drawn from a huge unordered set of possible features. This representation, which lists only the features a sample actually has, also covers the frequent case of missing feature values [10, 28].
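To illustrate how a sample described only by the set of features it has can become a fixed-length dense vector, here is a minimal sketch of a sparse random projection. It assumes the "very sparse" scaling also used by scikit-learn's `SparseRandomProjection` (each entry is nonzero with probability `density`, with magnitude `1/sqrt(density * k)`), and it regenerates each feature's column from a hash of the feature name so the huge projection matrix is never stored. This is an illustrative choice, not necessarily the construction evaluated in the paper; the function names `feature_column` and `project`, the output size `k = 64`, and `density = 0.1` are all hypothetical.

```python
import hashlib
import math
import random


def feature_column(feature: str, k: int, density: float) -> dict[int, float]:
    """Return the sparse projection column for one feature (hypothetical helper).

    The column is derived deterministically from a SHA-256 hash of the feature
    name, so it never has to be stored. Each of the k entries is nonzero with
    probability `density`, equal to +scale or -scale with equal odds.
    """
    seed = int.from_bytes(hashlib.sha256(feature.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(density * k)
    column = {}
    for i in range(k):
        if rng.random() < density:
            column[i] = scale if rng.random() < 0.5 else -scale
    return column


def project(features: set[str], k: int = 64, density: float = 0.1) -> list[float]:
    """Map a sample given as a set of present features to a dense k-vector.

    The sample is treated as an implicit binary vector over all possible
    features; absent or missing features simply contribute nothing.
    """
    vector = [0.0] * k
    for feature in features:
        for i, value in feature_column(feature, k, density).items():
            vector[i] += value
    return vector


# Example: two samples that share features get overlapping projections.
a = project({"imports:kernel32.dll", "section:.text", "size>1MB"})
b = project({"imports:kernel32.dll", "section:.text"})
```

Because each column depends only on the feature name, the mapping is linear in the implicit binary feature vector, and with this scaling the squared norm of a sample is preserved in expectation.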
Dec-18-2019