AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

Elkano, Mikel, Uriz, Mikel, Bustince, Humberto, Galar, Mikel

On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

arXiv.org Machine Learning Feb-28-2019

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
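The two steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' distributed implementation: it assumes a small in-memory sample, approximates the cumulative distribution function with the plain empirical CDF rather than q-quantiles, and the function names `empirical_cdf` and `triangular_memberships` are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of training values <= x (stands in for the unknown CDF)."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def triangular_memberships(u, num_sets):
    """Membership degrees of u in [0, 1] for equally spaced triangular sets.

    Peaks sit at 0, 1/(num_sets - 1), ..., 1, so for any u in [0, 1] the
    degrees sum to 1, which is the Ruspini strong partition condition.
    """
    step = 1.0 / (num_sets - 1)
    return [max(0.0, 1.0 - abs(u - i * step) / step) for i in range(num_sets)]


# Skewed toy attribute (assumed data): most values are small, one is large.
train = sorted([1, 2, 2, 3, 3, 3, 4, 5, 8, 40])

# Step 1: probability integral transform of the raw value x = 3.
u = empirical_cdf(train, 3)

# Step 2: fuzzy membership in the transformed (approximately uniform) space.
degrees = triangular_memberships(u, num_sets=3)
```

Here six of the ten training values are at most 3, so `u` is 0.6, and the value belongs mostly to the middle fuzzy set (degree about 0.8) and partly to the upper one (about 0.2). Because the partition is built in the transformed space, the fuzzy sets follow the data density even though the raw attribute is heavily skewed.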

artificial intelligence, data mining, machine learning, (19 more...)

doi: 10.1109/BigDataCongress.2018.00011

1903.00345

Country:

Europe > Spain > Navarre > Pamplona (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.84)

Mohammed, Rafiq Ahmed, Wong, Kok-Wai, Shiratuddin, Mohd Fairuz, Wang, Xuequn

Improving fraud prediction with incremental data balancing technique for massive data streams

arXiv.org Machine Learning Feb-28-2019

The performance of classification algorithms on a massive and highly imbalanced data stream depends on an efficient balancing strategy. Some balancing techniques have been applied in the past to batch data to resolve the class imbalance problem. This paper proposes a new incremental data balancing framework that can work with massive imbalanced data streams. In this paper, we choose the Racing Algorithm as an automated data balancing technique that optimizes the balancing techniques. We applied the Random Forest classification algorithm, which can deal with the massive data stream. We investigated the suitability of the Racing Algorithm and Random Forest in the proposed framework. Applying the new technique in the proposed framework to the European Credit Card dataset provided better results than the batch mode. The proposed framework is more scalable for handling online massive data streams.

artificial intelligence, experiment, machine learning, (17 more...)

1903.0041

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Fraud (0.50)
Banking & Finance > Credit (0.37)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)

arXiv.org Machine Learning Feb-27-2019

Robust Decision Trees Against Adversarial Examples

Chen, Hongge, Zhang, Huan, Boning, Duane, Hsieh, Cho-Jui

Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models and how to make tree-based models robust against adversarial examples is still limited. In this paper, we show that tree-based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees. At its core, our method aims to optimize the performance under the worst-case perturbation of input features, which leads to a max-min saddle point problem. Incorporating this saddle point objective into the decision tree building procedure is non-trivial due to the discrete nature of trees: a naive approach to finding the best split according to this saddle point objective will take exponential time. To make our approach practical and scalable, we propose efficient tree building algorithms by approximating the inner minimizer in this saddle point problem, and present efficient implementations for classical information-gain-based trees as well as state-of-the-art tree boosting models such as XGBoost. Experimental results on real-world datasets demonstrate that the proposed algorithms can substantially improve the robustness of tree-based models against adversarial examples.
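To make the idea of performance under worst-case perturbation concrete, the sketch below scores a single threshold split on one feature when an adversary may shift each input by at most `eps`. It is only an illustration of the inner worst-case evaluation for one fixed split, not the approximation algorithm proposed in the paper; the function name `worst_case_accuracy` and the toy data are assumptions.

```python
def worst_case_accuracy(xs, ys, threshold, left_label, right_label, eps):
    """Accuracy of a one-feature split when each x may move by up to eps.

    The split predicts left_label if x < threshold, else right_label.
    A point counts as correct only if every side it can be pushed to
    still predicts its true label.
    """
    correct = 0
    for x, y in zip(xs, ys):
        reachable = set()
        if x - eps < threshold:    # the adversary can place the point on the left
            reachable.add(left_label)
        if x + eps >= threshold:   # the adversary can place the point on the right
            reachable.add(right_label)
        if reachable == {y}:
            correct += 1
    return correct / len(xs)


# Toy data (assumed): class 0 below 2.0, class 1 above, two points near the split.
xs = [0.0, 1.0, 1.9, 2.1, 3.0, 4.0]
ys = [0, 0, 0, 1, 1, 1]

clean = worst_case_accuracy(xs, ys, 2.0, 0, 1, eps=0.0)
robust = worst_case_accuracy(xs, ys, 2.0, 0, 1, eps=0.2)
```

Without perturbation the split classifies every point correctly, but with `eps=0.2` the two points at 1.9 and 2.1 can be pushed across the threshold, so the worst-case accuracy drops to four out of six. A robust tree learner would prefer splits that keep this worst-case score high.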

adversarial example, artificial intelligence, machine learning, (18 more...)

1902.1066

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence Feb-26-2019

Neural Packet Classification

Liang, Eric, Zhu, Hang, Jin, Xin, Stoica, Ion

Packet classification is a fundamental problem in computer networking. This problem exposes a hard tradeoff between the computation and state complexity, which makes it particularly challenging. To navigate this tradeoff, existing solutions rely on complex hand-tuned heuristics, which are brittle and hard to optimize. In this paper, we propose a deep reinforcement learning (RL) approach to solve the packet classification problem. There are several characteristics that make this problem a good fit for Deep RL. First, many of the existing solutions iteratively build a decision tree by splitting nodes in the tree. Second, the effects of these actions (e.g., splitting nodes) can only be evaluated once we are done with building the tree. These two characteristics are naturally captured by the ability of RL to take actions that have sparse and delayed rewards. Third, it is computationally efficient to generate data traces and evaluate decision trees, which alleviates the notoriously high sample complexity problem of Deep RL algorithms. Our solution, NeuroCuts, uses succinct representations to encode state and action space, and efficiently explores candidate decision trees to optimize for a global objective. It produces compact decision trees optimized for a specific set of rules and a given performance metric, such as classification time, memory footprint, or a combination of the two. Evaluation on ClassBench shows that NeuroCuts outperforms existing hand-crafted algorithms in classification time by 18% at the median, and reduces both time and memory footprint by up to 3x.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1902.10319

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.93)
Education (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence Feb-24-2019

Entity Personalized Talent Search Models with Tree Interaction Features

Ozcaglar, Cagri, Geyik, Sahin, Schmitz, Brian, Sharma, Prakhar, Shelkovnykov, Alex, Ma, Yiming, Buchanan, Erik

Talent Search systems aim to recommend potential candidates who are a good match to the hiring needs of a recruiter expressed in terms of the recruiter's search query or job posting. Past work in this domain has focused on linear and nonlinear models which lack preference personalization at the user level because they are trained only with globally collected recruiter activity data. In this paper, we propose an entity-personalized Talent Search model which utilizes a combination of generalized linear mixed (GLMix) models and gradient boosted decision tree (GBDT) models, and provides personalized talent recommendations using nonlinear tree interaction features generated by the GBDT. We also present the offline and online system architecture for the productionization of this hybrid model approach in our Talent Search systems. Finally, we provide offline and online experiment results benchmarking our entity-personalized model with tree interaction features, which demonstrate significant improvements in our precision metrics compared to globally trained non-personalized models.

artificial intelligence, glmix model, machine learning, (16 more...)

doi: 10.1145/3308558.3313672

1902.09041

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
(2 more...)

#artificialintelligence Feb-23-2019, 10:52:25 GMT

Python for Data Science : Learn in 3 Days

In the syntax below, we are asking Python to import the numpy and pandas packages. The 'as' keyword is used to alias the package name.
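The syntax itself is not reproduced in this excerpt. A minimal sketch of what such an import typically looks like follows; the aliases `np` and `pd` are the common community convention and are assumed here, as are the small example values.

```python
import numpy as np   # "np" is now a short alias for the numpy package
import pandas as pd  # "pd" is now a short alias for the pandas package

# The aliases are used in place of the full package names.
values = np.array([1, 2, 3])
frame = pd.DataFrame({"x": values})
```

After the import, `np.array` refers to the same function as `numpy.array`; the alias only saves typing.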

data mining, machine learning, python, (20 more...)

Country:

Asia > India > Maharashtra (0.04)
Asia > India > Karnataka (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry: Education > Curriculum > Subject-Specific Education (0.64)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
(2 more...)

#artificialintelligence Feb-22-2019, 17:05:15 GMT

A Quick look at ML in algorithmic trading strategies | Packt Hub

Algorithmic trading relies on computer programs that execute algorithms to automate some, or all, elements of a trading strategy. Algorithms are a sequence of steps or rules to achieve a goal and can take many forms. In the case of machine learning (ML), algorithms pursue the objective of learning other algorithms, namely rules, to achieve a target based on data, such as minimizing a prediction error. In this article, we have a look at use cases of ML and how it is used in algorithmic trading strategies. These algorithms encode various activities of a portfolio manager who observes market transactions and analyzes relevant data to decide on placing buy or sell orders.

algorithm, algorithmic trading strategy packt hub, trading strategy, (11 more...)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.31)

arXiv.org Machine Learning Feb-22-2019

Diversity of Ensembles for Data Stream Classification

Abassi, Mohamed Souhayel

When constructing a classifier ensemble, diversity among the base classifiers is one of the important characteristics. Several studies have been made in the context of standard static data, in particular, when analyzing the relationship between a high ensemble predictive performance and the diversity of its components. In addition, ensembles of learning machines have been used to learn in the presence of concept drift and adapt to it. However, diversity measures have not received much research interest in evolving data streams. Only a few researchers directly consider promoting diversity while constructing an ensemble or rebuilding it at the moment drift is detected. In this paper, we present a theoretical analysis of different diversity measures and relate them to the success of ensemble learning algorithms for streaming data. The analysis provides a deeper understanding of the concept of diversity and its impact on online ensemble learning in the presence of concept drift. More precisely, we are interested in answering the following research question: which commonly used diversity measures are used in the context of static-data ensembles, and how far are they applicable in the context of streaming data ensembles?
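The abstract does not list the measures it analyzes. As one illustration of what a diversity measure computes, the sketch below implements a simple label-based variant of the pairwise disagreement measure: the fraction of instances on which two base classifiers output different labels, averaged over all pairs. The function names and the toy predictions are assumptions.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def mean_pairwise_disagreement(all_preds):
    """Average disagreement over every pair of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Predictions of three base classifiers on the same four instances (assumed data).
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
]
score = mean_pairwise_disagreement(preds)
```

A score of 0 means all base classifiers always agree, while higher values indicate a more diverse ensemble. In a streaming setting the same quantity could be computed over a sliding window of recent instances, which is one way such a static-data measure might be carried over.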

artificial intelligence, classifier, machine learning, (17 more...)

1902.08466

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

#artificialintelligence Feb-21-2019, 10:58:09 GMT

Random Forest Algorithm in Machine Learning

Random forest is one of the most popular and powerful supervised machine learning algorithms, capable of performing both regression and classification tasks. As the name suggests, this algorithm creates a forest from a number of decision trees. Machine learning is a scientific discipline that explores the construction and study of algorithms that can learn from data. Such algorithms operate by building a model from example inputs and using that to make predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to and often overlaps with computational statistics, a discipline that also specializes in prediction-making.
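The idea of building a forest from a number of decision trees can be sketched in a few lines for classification. This is a simplified illustration, not a production implementation: each tree is a depth-one stump, each stump sees a bootstrap sample and one randomly chosen feature, and the forest predicts by majority vote. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the single-threshold split on one feature with the fewest errors."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, None, majority, majority)  # errors, threshold, left, right
    values = sorted({row[feature] for row in rows})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] < threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] >= threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None or row[feature] < threshold:
        return left_label
    return right_label


def fit_forest(rows, labels, n_trees=15, seed=0):
    """Train each stump on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data (assumed): two well-separated classes with two features each.
rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3),
        (10, 11), (11, 10), (12, 13), (13, 12), (14, 14)]
labels = [0] * 5 + [1] * 5

forest = fit_forest(rows, labels)
low = predict_forest(forest, (1, 2))
high = predict_forest(forest, (13, 12))
```

Real random forests grow much deeper trees and, for regression, average the tree outputs instead of voting, but the two sources of randomness shown here (bootstrap samples and random feature choice) are what make the individual trees differ from one another.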

algorithm, forest algorithm, random forest algorithm, (11 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.56)

Industry: Education > Educational Setting > Online (0.77)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)