AITopics | Performance Analysis

Performance Analysis

Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches

#artificialintelligence | Dec-23-2019, 11:46:40 GMT

Note: MF = Matrix factorization; BMF = Bayesian matrix factorization; KBMF = Kernel Bayesian matrix factorization; KRR = Kernel ridge regression; NBR = Network based regression; NBC = Network based classification; CV = Cross validation; LOOCV = Leave-one-out cross validation; PCC = Pearson correlation coefficient; RMSE = Root mean square error; MSE = Mean square error; SCC = Spearman correlation coefficient; NDCG = Normalized discounted cumulative gain; R2 = Coefficient of determination; NRMSE = Normalized root mean squared error; AUC = Area under curve; PPI = Protein–protein interaction.

cell line, information, mathbf, (13 more...)

#artificialintelligence

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > Finland (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

EnsemFDet: An Ensemble Approach to Fraud Detection based on Bipartite Graph

Ren, Yuxiang, Zhu, Hao, Zhang, Jiawei, Dai, Peng, Bo, Liefeng

arXiv.org Machine Learning | Dec-23-2019

Fraud detection is extremely critical for e-commerce business. It is the intent of the companies to detect and prevent fraud as early as possible. Existing fraud detection methods try to identify unexpected dense subgraphs and treat related nodes as suspicious. Spectral relaxation-based methods solve the problem efficiently but hurt the performance due to the relaxed constraints. Besides, many methods cannot be accelerated with parallel computation or control the number of returned suspicious nodes because they provide a set of subgraphs with diverse node sizes. These drawbacks affect the real-world applications of existing methods. In this paper, we propose an Ensemble-based Fraud Detection (EnsemFDet) method to scale up fraud detection in bipartite graphs by decomposing the original problem into subproblems on small-sized subgraphs. By oversampling the graph and solving the subproblems, the ensemble approach further votes suspicious nodes without sacrificing the prediction accuracy. Extensive experiments have been done on real transaction data from JD.com, which is one of the world's largest e-commerce platforms. Experimental results demonstrate the effectiveness, practicability, and scalability of EnsemFDet. More specifically, EnsemFDet is up to 100x faster than the state-of-the-art methods due to its parallelism with all aspects of data.
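The sample, solve, and vote idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the random edge sampling, the greedy peeling detector (`densest_users`), and every parameter name and default are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter, defaultdict


def densest_users(edges):
    """Greedy peeling (an assumed stand-in detector): repeatedly drop the
    lowest-degree node and return the user side of the densest
    intermediate subgraph, where density = edges / nodes."""
    adj = defaultdict(set)
    for user, item in edges:
        adj[("u", user)].add(("v", item))
        adj[("v", item)].add(("u", user))
    n_edges = len(set(edges))
    best_density, best_users = -1.0, set()
    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density = density
            best_users = {name for side, name in adj if side == "u"}
        node = min(adj, key=lambda n: len(adj[n]))
        for neighbor in adj.pop(node):
            adj[neighbor].discard(node)
            n_edges -= 1
    return best_users


def ensemble_vote(edges, n_samples=30, sample_rate=0.7, min_votes=10, seed=0):
    """Run the detector on random edge samples and keep users that are
    flagged in at least `min_votes` of the samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        sample = [e for e in edges if rng.random() < sample_rate]
        if sample:
            votes.update(densest_users(sample))
    return {user for user, count in votes.items() if count >= min_votes}
```

Each sample is processed independently, so the loop in `ensemble_vote` could be distributed across workers, and `min_votes` gives direct control over how many suspicious nodes are returned.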

ensemfdet, graph, subgraph, (13 more...)

arXiv.org Machine Learning

1912.11113

Country: Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Information Technology > Services > e-Commerce Services (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Privacy Attacks on Network Embeddings

Ellers, Michael, Cochez, Michael, Schumacher, Tobias, Strohmaier, Markus, Lemmerich, Florian

arXiv.org Machine Learning | Dec-23-2019

Data ownership and data protection are increasingly important topics with ethical and legal implications, e.g., with the right to erasure established in the European General Data Protection Regulation (GDPR). In this light, we investigate network embeddings, i.e., the representation of network nodes as low-dimensional vectors. We consider a typical social network scenario with nodes representing users and edges relationships between them. We assume that a network embedding of the nodes has been trained. After that, a user demands the removal of his data, requiring the full deletion of the corresponding network information, in particular the corresponding node and incident edges. In that setting, we analyze whether after the removal of the node from the network and the deletion of the vector representation of the respective node in the embedding significant information about the link structure of the removed node is still encoded in the embedding vectors of the remaining nodes. This would require a (potentially computationally expensive) retraining of the embedding. For that purpose, we deploy an attack that leverages information from the remaining network and embedding to recover information about the neighbors of the removed node. The attack is based on (i) measuring distance changes in network embeddings and (ii) a machine learning classifier that is trained on networks that are constructed by removing additional nodes. Our experiments demonstrate that substantial information about the edges of a removed node/user can be retrieved across many different datasets. This implies that to fully protect the privacy of users, node deletion requires complete retraining - or at least a significant modification - of original network embeddings. Our results suggest that deleting the corresponding vector representation from network embeddings alone is not sufficient from a privacy perspective.
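Part (i) of the attack, measuring distance changes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `before` and `after` are hypothetical dictionaries mapping node names to embedding vectors trained with and without the removed node, and the mean absolute change in pairwise distance is an assumed scoring rule, not the one used in the paper.

```python
import math


def pairwise_distance_change(before, after):
    """For each node present in both embeddings, average the absolute
    change in its Euclidean distance to every other shared node."""
    nodes = sorted(set(before) & set(after))
    scores = {}
    for a in nodes:
        total = 0.0
        for b in nodes:
            if a != b:
                total += abs(math.dist(after[a], after[b])
                             - math.dist(before[a], before[b]))
        scores[a] = total / max(len(nodes) - 1, 1)
    return scores


def rank_neighbor_candidates(before, after):
    """Nodes whose distances shifted most are ranked first, on the
    assumption that former neighbors of the removed node move the most."""
    scores = pairwise_distance_change(before, after)
    return sorted(scores, key=scores.get, reverse=True)
```

Comparing distances rather than raw coordinates avoids having to align the two embeddings, since independently trained embeddings may differ by a rotation.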

algorithm, information, node, (15 more...)

arXiv.org Machine Learning

1912.10979

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks

Filos, Angelos, Farquhar, Sebastian, Gomez, Aidan N., Rudner, Tim G. J., Kenton, Zachary, Smith, Lewis, Alizadeh, Milad, de Kroon, Arnoud, Gal, Yarin

arXiv.org Machine Learning | Dec-22-2019

Evaluation of Bayesian deep learning (BDL) methods is challenging. We often seek to evaluate the methods' robustness and scalability, assessing whether new tools give 'better' uncertainty estimates than old ones. These evaluations are paramount for practitioners when choosing BDL tools on top of which they build their applications. Current popular evaluations of BDL methods, such as the UCI experiments, are lacking: methods that excel in these experiments often fail when used in applications such as medical or automotive, suggesting a pertinent need for new benchmarks in the field. We propose a new BDL benchmark with a diverse set of tasks, inspired by a real-world medical imaging application on diabetic retinopathy diagnosis. Visual inputs (512x512 RGB images of retinas) are considered, where model uncertainty is used for medical pre-screening, i.e., to refer patients to an expert when model diagnosis is uncertain. Methods are then ranked according to metrics derived from the expert domain to reflect real-world use of model uncertainty in automated diagnosis. We develop multiple tasks that fall under this application, including out-of-distribution detection and robustness to distribution shift. We then perform a systematic comparison of well-tuned BDL techniques on the various tasks. From our comparison we conclude that some current techniques which solve benchmarks such as UCI 'overfit' their uncertainty to the dataset: when evaluated on our benchmark, these underperform in comparison to simpler baselines. The code for the benchmark, its baselines, and a simple API for evaluating new BDL tools are made available at https://github.com/oatml/bdl-benchmarks.
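The pre-screening use of uncertainty can be made concrete with a short Python sketch. It does not use the benchmark's API; the function names, the binary setting, and the choice of predictive entropy over several stochastic forward passes as the uncertainty measure are assumptions for illustration.

```python
import math


def predictive_entropy(prob_samples):
    """Entropy of the mean predicted probability of the positive class,
    averaged over several stochastic forward passes (binary case)."""
    p = sum(prob_samples) / len(prob_samples)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def accuracy_after_referral(prob_samples_per_case, labels, retained_fraction):
    """Refer the most uncertain cases to an expert and report accuracy
    on the retained (least uncertain) fraction."""
    cases = []
    for samples, label in zip(prob_samples_per_case, labels):
        mean_p = sum(samples) / len(samples)
        correct = int(mean_p >= 0.5) == label
        cases.append((predictive_entropy(samples), correct))
    cases.sort(key=lambda case: case[0])  # most certain first
    n_keep = max(1, round(retained_fraction * len(cases)))
    kept = cases[:n_keep]
    return sum(correct for _, correct in kept) / n_keep
```

If the uncertainty estimates are useful, accuracy on the retained cases should rise as `retained_fraction` shrinks, which is the kind of behavior a referral-based ranking rewards.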

benchmark, learning, neural network, (14 more...)

arXiv.org Machine Learning

1912.10481

Country:

North America > United States > Virginia (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.62)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.62)
Health & Medicine > Diagnostic Medicine > Imaging (0.48)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

AEGR: A simple approach to gradient reversal in autoencoders for network anomaly detection

Babaei, Kasra, Chen, Zhi Yuan, Maul, Tomas

arXiv.org Machine Learning | Dec-21-2019

Anomaly detection is a process that aims to detect data points that follow a different pattern from the majority of data points. Anomaly detection methods suffer from several well-known challenges that hinder their performance, such as high dimensionality. Autoencoders are unsupervised neural networks that have been used for reducing dimensionality and for detecting network anomalies in large datasets. The performance of autoencoders degrades when the training set contains noise and anomalies. In this paper, a new gradient-reversal method is proposed to overcome the influence of anomalies on the training phase for the purpose of detecting network anomalies. The method differs from other approaches in that it does not require an anomaly-free training set and is based on reconstruction error. Once latent variables are extracted from the network, Local Outlier Factor is used to separate normal data points from anomalies. A simple pruning approach and data augmentation are also added to further improve performance. The experimental results show that the proposed model can outperform other well-known approaches. In many real-world problems, such as detecting fraudulent activities or detecting failure in aircraft engines, there is a pressing need to identify observations that have a striking dissimilarity compared to the majority. In medicine, for instance, this discovery can lead to early detection of lung cancer or breast cancer.

anomaly, dataset, reconstruction error, (17 more...)

arXiv.org Machine Learning

1912.13387

Country: Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.54)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data

Chung, Matthew Wai Heng, Tissot, Hegler

arXiv.org Machine Learning | Dec-21-2019

Learning knowledge representation is an increasingly important technology that supports a variety of machine learning related applications. However, the choice of hyperparameters is seldom justified and usually relies on exhaustive search. Understanding the effect of hyperparameter combinations on embedding quality is crucial to avoid this inefficient process and to enhance the practicality of vector representation methods. We evaluate the effects of distinct values for the margin parameter, focusing on translational embedding representation models for multi-relational categorized data. We assess the influence of the margin on the quality of embedding models by contrasting traditional link prediction task accuracy against a classification task. The findings provide evidence that lower values of the margin are not rigorous enough to help with the learning process, whereas larger values produce considerable noise, pushing the entities beyond the surface of the hyperspace and thus requiring constant regularization. Finally, the correlation between link prediction and classification accuracy shows that the traditional validation protocol for embedding models is a weak metric for the quality of embedding representation.
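For readers unfamiliar with the margin parameter, the following Python sketch shows where it enters a translational model's training objective. It assumes a TransE-style score with Euclidean distance and a standard margin ranking loss; the abstract does not specify the exact models or distance used, so treat both as illustrative assumptions.

```python
import math


def translation_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def margin_ranking_loss(positive, negative, margin):
    """Hinge loss: a true triple should score at least `margin` closer
    than a corrupted one; otherwise the shortfall is penalized."""
    return max(0.0, margin
               + translation_distance(*positive)
               - translation_distance(*negative))
```

With a small margin, most corrupted triples already satisfy the constraint and contribute zero loss, so little is learned; with a large margin, nearly every pair is penalized, which is consistent with the trade-off the abstract describes.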

accuracy, relation, representation, (15 more...)

arXiv.org Machine Learning

1912.10264

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
South America > Brazil (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Massive errors found in facial recognition tech, especially in case of nonwhites: U.S. study

The Japan Times | Dec-20-2019, 03:06:09 GMT

WASHINGTON – Facial recognition systems can produce wildly inaccurate results, especially for nonwhites, according to a U.S. government study released Thursday that is likely to raise fresh doubts on deployment of the artificial intelligence technology. The study of dozens of facial recognition algorithms showed "false positives" rates for Asians and African-Americans as much as 100 times higher than for whites. The researchers from the National Institute of Standards and Technology (NIST), a government research center, also found two algorithms assigned the wrong gender to black females almost 35 percent of the time. The study comes amid widespread deployment of facial recognition for law enforcement, airports, border security, banking, retailing, schools and for personal technology such as unlocking smartphones. Some activists and researchers have claimed the potential for errors is too great and that mistakes could result in the jailing of innocent people, and that the technology could be used to create databases that may be hacked or inappropriately used.

deployment, facial recognition tech, nonwhite, (8 more...)

The Japan Times

Country: North America > United States (0.58)

Genre: Research Report > New Finding (0.79)

Industry:

Government > Regional Government > North America Government > United States Government (0.58)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.38)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Data science for cybersecurity: A probabilistic time series model for detecting RDP inbound brute force attacks - Microsoft Security

#artificialintelligence | Dec-20-2019, 02:12:52 GMT

Our approach to time series anomaly detection is computationally efficient, automatically learns how to update probabilities and adapt to changes in data. As we describe in the next section, this approach has yielded successful attack detection at high precision. The proposed time series anomaly detection model was deployed and utilized by Microsoft Threat Experts to detect RDP brute force attacks during threat hunting activities. A list that ranks machines across enterprises with the lowest anomaly scores (indicating the likelihood of observing a value at least as large under expected conditions in all signals considered) is updated and reviewed every day. See Table 1 for an example.
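The anomaly score described here, the likelihood of observing a value at least as large under expected conditions, can be illustrated with a simple Python sketch. This is a stand-in, not the model from the article: it replaces the learned probabilistic time series model with a smoothed empirical tail probability over a single signal, and the names `histories` and `todays_counts` are hypothetical.

```python
def tail_probability(history, observed):
    """Smoothed empirical P(X >= observed) based on past daily counts.
    Add-one smoothing keeps the score strictly above zero."""
    at_least = sum(1 for value in history if value >= observed)
    return (at_least + 1) / (len(history) + 1)


def rank_machines(histories, todays_counts):
    """Return (machine, score) pairs, lowest (most anomalous) score first."""
    scores = {machine: tail_probability(histories[machine], count)
              for machine, count in todays_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

A machine that normally sees a handful of failed sign-ins and suddenly sees dozens gets a score near the smoothing floor and rises to the top of the daily review list.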

brute force attack, force attack, rdp brute force attack, (12 more...)

#artificialintelligence

Country:

Europe > United Kingdom (0.04)
Europe > Russia (0.04)
Europe > Netherlands (0.04)
Asia > Russia (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.41)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)

Destruction of Image Steganography using Generative Adversarial Networks

Corley, Isaac, Lwowski, Jonathan, Hoffman, Justin

arXiv.org Machine Learning | Dec-20-2019

Digital image steganalysis, or the detection of image steganography, has been studied in depth for years and is driven by the use of steganographic techniques by Advanced Persistent Threat (APT) groups, such as APT37 Reaper, to transmit additional malware to perform further post-exploitation activity on a compromised host. However, many steganalysis algorithms are constrained to work with only a subset of all possible images in the wild or are known to produce a high false positive rate. This makes blocking every suspected image an unreasonable policy. A more feasible policy is to filter suspicious images prior to reception by the host machine. However, how does one optimally filter specifically to obfuscate or remove image steganography while avoiding degradation of visual image quality in the case that detection of the image was a false positive? We propose the Deep Digital Steganography Purifier (DDSP), a Generative Adversarial Network (GAN) which is optimized to destroy steganographic content without compromising the perceptual quality of the original image. As verified by experimental results, our model is capable of providing a high rate of destruction of steganographic image content while maintaining a high visual quality in comparison to other state-of-the-art filtering methods. Additionally, we test the transfer learning capability of generalizing to obfuscate real malware payloads embedded into different image file formats and types using an unseen steganographic algorithm and prove that our model can in fact be deployed to provide adequate results.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Machine Learning

1912.1007

Country:

North America > United States > New York > Broome County > Binghamton (0.04)
North America > United States > Texas > Bexar County > San Antonio (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Shareable Representations for Search Query Understanding

Kumar, Mukul, Hu, Youna, Headden, Will, Goutam, Rahul, Lin, Heran, Yin, Bing

arXiv.org Machine Learning | Dec-20-2019

Understanding search queries is critical for shopping search engines to deliver a satisfying customer experience. Popular shopping search engines receive billions of unique queries yearly, each of which can depict any of hundreds of user preferences or intents. In order to get the right results to customers, it must be known that queries like "inexpensive prom dresses" are intended not only to surface results of a certain product type but also products with a low price. Such preferences are referred to as query intents; other examples include preferences for author, brand, age group, or simply a need for customer service. Recent works such as BERT have demonstrated the success of a large transformer encoder architecture with language model pre-training on a variety of NLP tasks. We adapt such an architecture to learn intents for search queries and describe methods to account for the noisiness and sparseness of search query data. We also describe cost-effective ways of hosting transformer encoder models in contexts with low latency requirements. With the right domain-specific training we can build a shareable deep learning model whose internal representation can be reused for a variety of query understanding tasks, including query intent identification. Model sharing allows fewer large models to be served at inference time and provides a platform to quickly build and roll out new search query classifiers.

query, shareable representation, transformer encoder, (11 more...)

arXiv.org Machine Learning

2001.04345

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.41)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
