AITopics

2009.06231

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.90)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(6 more...)

arXiv.org Machine LearningSep-14-2020

A Vertical Federated Learning Method for Interpretable Scorecard and Its Application in Credit Scoring

Zheng, Fanglan, Erihe, null, Li, Kun, Tian, Jiang, Xiang, Xiaojia

With the success of big data and artificial intelligence in many fields, the applications of big data driven models are expected in financial risk management especially credit scoring and rating. Under the premise of data privacy protection, we propose a projected gradient-based method in the vertical federated learning framework for the traditional scorecard, which is based on logistic regression with bounded constraints, namely FL-LRBC. The latter enables multiple agencies to jointly train an optimized scorecard model in a single training session. It leads to the formation of the model with positive coefficients, while the time-consuming parameter-tuning process can be avoided. Moreover, the performance in terms of both AUC and the Kolmogorov-Smirnov (KS) statistics is significantly improved due to data enrichment using FL-LRBC. At present, FL-LRBC has already been applied to credit business in a China nation-wide financial holdings group.

artificial intelligence, machine learning, scorecard model, (13 more...)

2009.06218

Country:

Asia > China (0.25)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report > New Finding (0.50)
Research Report > Experimental Study (0.35)

Industry:

Information Technology > Security & Privacy (1.00)
Banking & Finance > Credit (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

arXiv.org Machine LearningSep-14-2020

Fairness Constraints in Semi-supervised Learning

Zhang, Tao, Zhu, Tianqing, Han, Mengde, Li, Jing, Zhou, Wanlei, Yu, Philip S.

Fairness in machine learning has received considerable attention. However, most studies on fair learning focus on either supervised learning or unsupervised learning. Very few consider semi-supervised settings. Yet, in reality, most machine learning tasks rely on large datasets that contain both labeled and unlabeled data. One of key issues with fair learning is the balance between fairness and accuracy. Previous studies arguing that increasing the size of the training set can have a better trade-off. We believe that increasing the training set with unlabeled data may achieve the similar result. Hence, we develop a framework for fair semi-supervised learning, which is formulated as an optimization problem. This includes classifier loss to optimize accuracy, label propagation loss to optimize unlabled data prediction, and fairness constraints over labeled and unlabeled data to optimize the fairness level. The framework is conducted in logistic regression and support vector machines under the fairness metrics of disparate impact and disparate mistreatment. We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition. Extensive experiments show that our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.

artificial intelligence, inductive learning, machine learning, (14 more...)

2009.0619

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine (0.93)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

#artificialintelligenceSep-13-2020, 22:46:08 GMT

Why Working With AUC is More Powerful Than One Might Think

I am finishing up a work-note that has some really neat implications as to why working with AUC is more powerful than one might think. I think I am far enough along to share the consequences here. This started as some, now reappraised, thoughts on the fallacy of thinking knowing the AUC (area under the curve) means you know the shape of the ROC plot (receiver operating characteristic plot]. I now think for many practical applications the AUC number carries a lot more information about the ROC shape than one might expect. For the experienced practitioner the following ROC shape is very familiar.

artificial intelligence, auc, machine learning, (3 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)

Wang, Weiwei, Eberhardt, Wiebke, Bromuri, Stefano

That looks interesting! Personalizing Communication and Segmentation with Random Forest Node Embeddings

arXiv.org Artificial IntelligenceSep-13-2020

Communicating effectively with customers is a challenge for many marketers, but especially in a context that is both pivotal to individual long-term financial well-being and difficult to understand: pensions. Around the world, participants are reluctant to consider their pension in advance, it leads to a lack of preparation of their pension retirement [1], [2]. In order to engage participants to obtain information on their expected pension benefits, personalizing the pension providers' email communication is a first and crucial step. We describe a machine learning approach to model email newsletters to fit participants' interests. The data for the modeling and analysis is collected from newsletters sent by a large Dutch pension provider of the Netherlands and is divided into two parts. The first part comprises 2,228,000 customers whereas the second part comprises the data of a pilot study, which took place in July 2018 with 465,711 participants. In both cases, our algorithm extracts features from continuous and categorical data using random forests, and then calculates node embeddings of the decision boundaries of the random forest. We illustrate the algorithm's effectiveness for the classification task, and how it can be used to perform data mining tasks. In order to confirm that the result is valid for more than one data set, we also illustrate the properties of our algorithm in benchmark data sets concerning churning. In the data sets considered, the proposed modeling demonstrates competitive performance with respect to other state of the art approaches based on random forests, achieving the best Area Under the Curve (AUC) in the pension data set (0.948). For the descriptive part, the algorithm can identify customer segmentations that can be used by marketing departments to better target their communication towards their customers.

artificial intelligence, machine learning, participant, (18 more...)

2009.05931

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Netherlands > Limburg > Maastricht (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Consumer Products & Services > Retirement (0.94)
Information Technology (0.93)
Telecommunications (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Krabel, Tobias Markus, Tran, Thi Ngoc Tien, Groll, Andreas, Horn, Daniel, Jentsch, Carsten

Random boosting and random^2 forests -- A random tree depth injection approach

arXiv.org Machine LearningSep-13-2020

The induction of additional randomness in parallel and sequential ensemble methods has proven to be worthwhile in many aspects. In this manuscript, we propose and examine a novel random tree depth injection approach suitable for sequential and parallel tree-based approaches including Boosting and Random Forests. The resulting methods are called \emph{Random Boost} and \emph{Random$^2$ Forest}. Both approaches serve as valuable extensions to the existing literature on the gradient boosting framework and random forests. A Monte Carlo simulation, in which tree-shaped data sets with different numbers of final partitions are built, suggests that there are several scenarios where \emph{Random Boost} and \emph{Random$^2$ Forest} can improve the prediction performance of conventional hierarchical boosting and random forest approaches. The new algorithms appear to be especially successful in cases where there are merely a few high-order interactions in the generated data. In addition, our simulations suggest that our random tree depth injection approach can improve computation time by up to 40%, while at the same time the performance losses in terms of prediction accuracy turn out to be minor or even negligible in most cases.

algorithm, artificial intelligence, machine learning, (19 more...)

2009.06078

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
North America > United States > New York (0.04)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Artificial IntelligenceSep-13-2020

Is there a role for statistics in artificial intelligence?

Friedrich, Sarah, Antes, Gerd, Behr, Sigrid, Binder, Harald, Brannath, Werner, Dumpert, Florian, Ickstadt, Katja, Kestler, Hans, Lederer, Johannes, Leitgöb, Heinz, Pauly, Markus, Steland, Ansgar, Wilhelm, Adalbert, Friede, Tim

The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.

data mining, machine learning, natural language, (19 more...)

2009.0907

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
North America > United States > New York (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(13 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Government Relations & Public Policy (0.93)
Leisure & Entertainment > Games (0.93)
(6 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(6 more...)

Ibrahim, Omar Adel, Sciancalepore, Savio, Oligeri, Gabriele, Di Pietro, Roberto

MAGNETO: Fingerprinting USB Flash Drives via Unintentional Magnetic Emissions

arXiv.org Artificial IntelligenceSep-12-2020

Universal Serial Bus (USB) Flash Drives are nowadays one of the most convenient and diffused means to transfer files, especially when no Internet connection is available. However, USB flash drives are also one of the most common attack vectors used to gain unauthorized access to host devices. For instance, it is possible to replace a USB drive so that when the USB key is connected, it would install passwords stealing tools, root-kit software, and other disrupting malware. In such a way, an attacker can steal sensitive information via the USB-connected devices, as well as inject any kind of malicious software into the host. To thwart the above-cited raising threats, we propose MAGNETO, an efficient, non-interactive, and privacy-preserving framework to verify the authenticity of a USB flash drive, rooted in the analysis of its unintentional magnetic emissions. We show that the magnetic emissions radiated during boot operations on a specific host are unique for each device, and sufficient to uniquely fingerprint both the brand and the model of the USB flash drive, or the specific USB device, depending on the used equipment. Our investigation on 59 different USB flash drives---belonging to 17 brands, including the top brands purchased on Amazon in mid-2019---, reveals a minimum classification accuracy of 98.2% in the identification of both brand and model, accompanied by a negligible time and computational overhead. MAGNETO can also identify the specific USB Flash drive, with a minimum classification accuracy of 91.2%. Overall, MAGNETO proves that unintentional magnetic emissions can be considered as a viable and reliable means to fingerprint read-only USB flash drives. Finally, future research directions in this domain are also discussed.

artificial intelligence, machine learning, magneto, (14 more...)

doi: 10.1145/3422308

2002.05905

Country:

Asia > Middle East > Qatar (0.04)
North America > United States > Texas (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Networks (0.93)
(2 more...)

Mehdiyev, Nijat, Fettke, Peter

Explainable Artificial Intelligence for Process Mining: A General Overview and Application of a Novel Local Explanation Approach for Predictive Process Monitoring

arXiv.org Artificial IntelligenceSep-12-2020

The contemporary process-aware information systems possess the capabilities to record the activities generated during the process execution. To leverage these process specific fine-granular data, process mining has recently emerged as a promising research discipline. As an important branch of process mining, predictive business process management, pursues the objective to generate forward-looking, predictive insights to shape business processes. In this study, we propose a conceptual framework sought to establish and promote understanding of decision-making environment, underlying business processes and nature of the user characteristics for developing explainable business process prediction solutions. Consequently, with regard to the theoretical and practical implications of the framework, this study proposes a novel local post-hoc explanation approach for a deep learning classifier that is expected to facilitate the domain experts in justifying the model decisions. In contrary to alternative popular perturbation-based local explanation approaches, this study defines the local regions from the validation dataset by using the intermediate latent space representations learned by the deep neural networks. To validate the applicability of the proposed explanation method, the real-life process log data delivered by the Volvo IT Belgium's incident management system are used. The adopted deep learning classifier achieves a good performance with the Area Under the ROC Curve of 0.94. The generated local explanations are also visualized and presented with relevant evaluation measures that are expected to increase the users' trust in the black-box-model.

data mining, explanation, machine learning, (18 more...)

2009.02098

Country:

Europe > Belgium (0.24)
North America > United States > New York (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningSep-12-2020

Machine Learning Against Cancer: Accurate Diagnosis of Cancer by Machine Learning Classification of the Whole Genome Sequencing Data

Hooshmand, Arash

Machine learning can precisely identify different cancer tumors at any stage by classifying cancerous and healthy samples based on their genomic profile. We have developed novel methods of MLAC (Machine Learning Against Cancer) achieving perfect results with perfect precision, sensitivity, and specificity. We have used the whole genome sequencing data acquired by next-generation RNA sequencing techniques in The Cancer Genome Atlas and Genotype-Tissue Expression projects for cancerous and healthy tissues respectively. Moreover, we have shown that unsupervised machine learning clustering has great potential to be used for cancer diagnosis. Indeed, a creative way to work with data and general algorithms has resulted in perfect classification i.e. all precision, sensitivity, and specificity are equal to 1 for most of the different tumor types even with a modest amount of data, and the same method works well on a series of cancers and results in great clustering of cancerous and healthy samples too. Our system can be used in practice because once the classifier is trained, it can be used to classify any new sample of new potential patients. One advantage of our work is that the aforementioned perfect precision and recall are obtained on samples of all stages including very early stages of cancer; therefore, it is a promising tool for diagnosis of cancers in early stages. Another advantage of our novel model is that it works with normalized values of RNA sequencing data, hence people's private sensitive medical data will remain hidden, protected, and safe. This type of analysis will be widespread and economical in the future and people can even learn to receive their RNA sequencing data and do their own preliminary cancer studies themselves which have the potential to help the healthcare systems. It is a great step forward toward good health that is the main base of sustainable societies.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2009.05847

Country:

North America > United States (0.28)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)

Genre:

Research Report > Promising Solution (0.86)
Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)