AITopics

2002.02879

Country:

North America > United States > Minnesota (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Ireland (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Shvetsov, Nikolay, Buzun, Nazar, Dylov, Dmitry V.

Unsupervised non-parametric change point detection in quasi-periodic signals

arXiv.org Machine Learning, Feb-7-2020

We propose a new unsupervised and non-parametric method to detect change points in intricate quasi-periodic signals. The detection relies on optimal transport theory combined with topological analysis and the bootstrap procedure. The algorithm is designed to detect changes in virtually any harmonic or partially harmonic signal and is verified on three different sources of physiological data streams. We successfully find abnormal or irregular cardiac cycles in the waveforms for six of the most frequent types of clinical arrhythmias using a single algorithm. The validation and the efficiency of the method are shown on both synthetic and real time series. Our unsupervised approach reaches the level of performance of the supervised state-of-the-art techniques. We provide conceptual justification for the efficiency of the method and prove the convergence of the bootstrap procedure theoretically.
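To make the idea of optimal-transport-based change detection concrete, here is a minimal sketch, not the authors' algorithm: it scores each time index by the 1-D Wasserstein distance between the windows before and after it, and sets a threshold by bootstrapping windows from a segment assumed to be change-free. The topological analysis mentioned in the abstract is omitted, and the function names, the toy signal, the window size and the i.i.d. resampling scheme are all assumptions made for illustration.

```python
import math
import random

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D samples (sorted-sample formula)."""
    if len(a) != len(b):
        raise ValueError("samples must have equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

def change_scores(signal, window):
    """Score each index t by the distance between the windows before and after t."""
    scores = []
    for t in range(window, len(signal) - window + 1):
        left = signal[t - window:t]
        right = signal[t:t + window]
        scores.append((t, wasserstein_1d(left, right)))
    return scores

def bootstrap_threshold(reference, window, n_boot=200, quantile=0.99, seed=0):
    """Empirical quantile of distances between windows resampled (i.i.d., a
    simplification) from a reference segment assumed to contain no change."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_boot):
        a = [rng.choice(reference) for _ in range(window)]
        b = [rng.choice(reference) for _ in range(window)]
        dists.append(wasserstein_1d(a, b))
    dists.sort()
    return dists[min(len(dists) - 1, int(quantile * len(dists)))]

# Toy quasi-periodic signal (hypothetical): a sine whose amplitude doubles at index 205.
period = 20
signal = [
    (1.0 if i < 205 else 2.0) * math.sin(2 * math.pi * i / period)
    for i in range(400)
]
window = 2 * period                      # two full cycles per window
threshold = bootstrap_threshold(signal[:200], window)
scores = change_scores(signal, window)
best_t, best_score = max(scores, key=lambda s: s[1])
flagged = [t for t, s in scores if s > threshold]
```

Choosing the window as a whole number of periods keeps the score near zero on unchanged stretches, because any two such windows contain the same set of values.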

detection, point cloud, wasserstein distance, (14 more...)

2002.02717

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Alvarez, Ezequiel, Lamagna, Federico, Miquel, Cesar, Szewc, Manuel

Intelligent Arxiv: Sort daily papers by learning users topics preference

arXiv.org Machine Learning, Feb-6-2020

Current daily paper releases are becoming increasingly large and areas of research are growing in diversity. This makes it harder for scientists to keep up to date with the current state of the art and identify relevant work within their lines of interest. The goal of this article is to address this problem using Machine Learning techniques. We model a scientific paper as a combination of scientific knowledge from diverse topics into a new problem. In light of this, we implement the unsupervised Machine Learning technique of Latent Dirichlet Allocation (LDA) on the corpus of papers in a given field to: i) define and extract underlying topics in the corpus; ii) get the topic weight vector for each paper in the corpus; and iii) get the topic weight vector for new papers. By registering papers preferred by a user, we build a user vector of weights using the information in the vectors of the selected papers. Hence, by performing an inner product between the user vector and the vector of each paper in the daily Arxiv release, we can sort the papers according to the user's preference for the underlying topics. We have created the website IArxiv.org where users can read sorted daily Arxiv releases (and more) while the algorithm learns each user's preference, yielding a more accurate sorting every day. The current IArxiv.org version runs on Arxiv categories astro-ph, gr-qc, hep-ph and hep-th and we plan to extend to others. We propose several new useful and relevant implementations to be additionally developed as well as new Machine Learning techniques beyond LDA to further improve the accuracy of this new tool.
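The ranking step described above can be sketched as follows. This is an illustration under stated assumptions, not the IArxiv.org implementation: the topic weight vectors are taken as given (in practice they would come from a fitted LDA model), the user vector is assumed to be a plain average of the liked papers' vectors, and the function names and toy numbers are hypothetical.

```python
def build_user_vector(liked_topic_vectors):
    """Average the topic weight vectors of the papers a user liked (assumed rule)."""
    n = len(liked_topic_vectors)
    dim = len(liked_topic_vectors[0])
    return [sum(v[k] for v in liked_topic_vectors) / n for k in range(dim)]

def rank_papers(user_vector, daily_papers):
    """Sort (title, topic_vector) pairs by inner product with the user vector,
    highest score first."""
    def score(item):
        return sum(u * w for u, w in zip(user_vector, item[1]))
    return sorted(daily_papers, key=score, reverse=True)

# Toy example with three topics (all numbers are made up).
liked = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
user = build_user_vector(liked)
daily = [
    ("paper A", [0.1, 0.1, 0.8]),
    ("paper B", [0.7, 0.2, 0.1]),
    ("paper C", [0.2, 0.7, 0.1]),
]
ranked = [title for title, _ in rank_papers(user, daily)]
```

With these toy numbers the user vector leans toward the first topic, so the paper dominated by that topic is ranked first.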

category, corpus, vector, (12 more...)

2002.0246

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

arXiv.org Machine Learning, Feb-6-2020

Neural Network Representation Control: Gaussian Isolation Machines and CVC Regularization

Amit, Guy, Rosenberg, Ishai, Levy, Moshe, Bitton, Ron, Shabtai, Asaf, Elovici, Yuval

In many cases, neural network classifiers are likely to be exposed to input data that lies outside their training distribution. Samples from outside the distribution may be classified as an existing class with high probability by softmax-based classifiers; such incorrect classifications affect the performance of the classifiers and the applications/systems that depend on them. Previous research aimed at distinguishing training distribution data from out-of-distribution (OOD) data has proposed detectors that are external to the classification method. We present the Gaussian isolation machine (GIM), a novel hybrid (generative-discriminative) classifier aimed at solving the problem arising when OOD data is encountered. The GIM is based on a neural network and utilizes a new loss function that imposes a distribution on each of the trained classes in the neural network's output space, which can be approximated by a Gaussian. The proposed GIM's novelty lies in its discriminative performance and generative capabilities, a combination of characteristics not usually seen in a single classifier. The GIM achieves state-of-the-art classification results on image recognition and sentiment analysis benchmarking datasets and can also deal with OOD inputs. We also demonstrate the benefits of incorporating part of the GIM's loss function into standard neural networks as a regularization method.

classifier, dataset, neural network, (15 more...)

2002.02176

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

#artificialintelligence, Feb-5-2020, 18:07:22 GMT

Boosting Techniques in Python: Predicting Hotel Cancellations

For this reason, boosting is referred to as an ensemble method. In this example, boosting techniques are used to determine whether a customer will cancel their hotel booking or not. Hotel cancellations represent the response (or dependent) variable, where 1 = cancel and 0 = follow through with the booking. The relevant features to be included as the x variable in the boosting models are identified by the ExtraTreesClassifier. The three features identified by the ExtraTreesClassifier (excluding variables deemed to be theoretically irrelevant) are lead time, country and deposit type. The following boosting techniques are used in predicting hotel cancellations.

accuracy, cancellation, customer, (12 more...)

Industry: Consumer Products & Services (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.71)

Ärje, Johanna, Melvad, Claus, Jeppesen, Mads Rosenhøj, Madsen, Sigurd Agerskov, Raitoharju, Jenni, Rasmussen, Maria Strandgård, Iosifidis, Alexandros, Tirronen, Ville, Meissner, Kristian, Gabbouj, Moncef, Høye, Toke Thomas

Automatic image-based identification and biomass estimation of invertebrates

Understanding how biological communities respond to environmental changes is a key challenge in ecology and ecosystem management. The apparent decline of insect populations necessitates more biomonitoring, but the time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed. In turn, this affects the scale of efforts to map invertebrate diversity altogether. Given recent advances in computer vision, we propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology. We describe a robot-enabled image-based identification machine, which can automate the process of invertebrate identification, biomass estimation and sample sorting. We use the imaging device to generate a comprehensive image database of terrestrial arthropod species. We use this database to test the classification accuracy, i.e., how well the species identity of a specimen can be predicted from images taken by the machine. We also test the sensitivity of the classification accuracy to the camera settings (aperture and exposure time) in order to move forward with the best possible image quality. We use state-of-the-art Resnet-50 and InceptionV3 CNNs for the classification task. The results for the initial dataset are very promising ($\overline{ACC}=0.980$). The system is general and can easily be used for other groups of invertebrates as well. As such, our results pave the way for generating more data on spatial and temporal variation in invertebrate abundance, diversity and biomass.

classification accuracy, identification, specimen, (13 more...)

2002.03807

Country:

Europe > Denmark (0.04)
North America > Greenland (0.04)
Europe > Finland > Central Finland > Jyväskylä (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.58)

Wang, Sen, Chang, J. Morris

Privacy-Preserving Boosting in the Local Setting

In machine learning, boosting is one of the most popular methods designed to combine multiple base learners into a superior one. The well-known Boosted Decision Tree classifier has been widely adopted in many areas. In the big data era, the data held by individuals and entities, such as personal images, browsing history and census information, is more likely to contain sensitive information. Privacy concerns arise when such data leaves the hands of its owners and is further explored or mined. Such privacy issues demand that machine learning algorithms be privacy-aware. Recently, Local Differential Privacy has been proposed as an effective privacy protection approach, which offers a strong guarantee to data owners, as the data is perturbed before any further usage and the true values never leave the hands of the owners. Thus, machine learning algorithms that operate on private data instances are of great value and importance. In this paper, we are interested in developing a privacy-preserving boosting algorithm with which a data user is allowed to build a classifier without knowing or deriving the exact value of each data sample. Our experiments demonstrate the effectiveness of the proposed boosting algorithm and the high utility of the learned classifiers.
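The core idea of Local Differential Privacy, perturbing each value on the owner's side before it is shared, can be illustrated with the classic randomized-response mechanism for a single binary attribute. This is a generic textbook sketch, not the boosting algorithm proposed in the paper; the function names and the toy population are assumptions.

```python
import math
import random

def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else the flipped bit.
    The ratio of report probabilities is e^eps, which gives eps-local differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit

def estimate_mean(reports, epsilon):
    """Unbiased estimate of the true fraction of ones from the perturbed reports."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = mu * (2p - 1) + (1 - p), solved for mu.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)

# Toy population (hypothetical): 30% of owners hold the value 1.
rng = random.Random(42)
true_bits = [1] * 3000 + [0] * 7000
reports = [randomized_response(b, 1.0, rng) for b in true_bits]
estimate = estimate_mean(reports, 1.0)   # close to 0.3 for large populations
```

The data user sees only `reports`, yet can still recover aggregate statistics; a smaller `epsilon` gives stronger privacy at the cost of a noisier estimate.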

algorithm, classifier, data owner, (16 more...)

2002.02096

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.14)
South America > Brazil (0.05)
North America > United States > North Carolina (0.04)
(8 more...)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
(2 more...)

Shi, Benwei, Phillips, Jeff M.

A Deterministic Streaming Sketch for Ridge Regression

We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space algorithm for this classic problem. The algorithm sketches the covariance matrix by variants of Frequent Directions, which implies it can operate in insertion-only streams and a variety of distributed data settings. In comparisons to randomized sketching algorithms on synthetic and real-world datasets, our algorithm has less empirical error using less space and similar time.
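For orientation, the objects involved can be written out explicitly. The symbols $A$, $b$, $\gamma$, $B$ and $\ell$ below are notation assumed here, not taken from the abstract. Ridge regression for a data matrix $A \in \mathbb{R}^{n \times d}$, targets $b \in \mathbb{R}^{n}$ and regularization parameter $\gamma > 0$ has the closed form

$$
x_\gamma \;=\; \arg\min_{x} \; \|Ax - b\|_2^2 + \gamma \|x\|_2^2 \;=\; (A^\top A + \gamma I)^{-1} A^\top b .
$$

A natural plug-in estimate, assuming a sketch $B \in \mathbb{R}^{\ell \times d}$ with $\ell \ll d$ rows whose Gram matrix $B^\top B$ approximates $A^\top A$, would be

$$
\hat{x}_\gamma \;=\; (B^\top B + \gamma I)^{-1} A^\top b ,
$$

where $A^\top b$ is a single $d$-dimensional vector that can be accumulated one row at a time. Storing $B$ takes $\ell d$ numbers instead of the $d^2$ needed for $A^\top A$, which would match the stated $O(d/\varepsilon)$ space if $\ell$ grows like $1/\varepsilon$. The paper's exact estimator and error analysis may differ from this sketch.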

algorithm, regression, sketch, (15 more...)

2002.02013

Country: North America > United States > Utah (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.63)

Online Passive-Aggressive Total-Error-Rate Minimization

Jang, Se-In

We provide a new online learning algorithm which utilizes online passive-aggressive learning (PA) and total-error-rate minimization (TER) for binary classification. The PA learning establishes not only large-margin training but also the capacity to handle non-separable data. The TER learning, on the other hand, minimizes an approximated classification-error-based objective function. We propose an online PATER algorithm which combines those useful properties. In addition, we also present a weighted PATER algorithm to improve the ability to cope with data imbalance problems. Experimental results demonstrate that the proposed PATER algorithms achieve better performance in terms of efficiency and effectiveness than the existing state-of-the-art online learning algorithms on real-world data sets.
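As background for the PA component only, the standard passive-aggressive update for binary classification with hinge loss can be sketched as below. This is the classic PA rule (with the optional PA-I cap `C`), not the PATER algorithm proposed in the paper; the function name and the toy stream are assumptions.

```python
def pa_update(w, x, y, C=None):
    """One passive-aggressive step for a label y in {-1, +1}.

    Passive: if the hinge loss is zero, the weights are returned unchanged.
    Aggressive: otherwise move w just far enough to reach margin 1
    (step size capped at C when C is given, the PA-I variant).
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)
    tau = loss / norm_sq
    if C is not None:
        tau = min(C, tau)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

# Toy stream (made-up points): process examples one at a time.
stream = [
    ([1.0, 1.0], 1),
    ([-1.0, -0.5], -1),
    ([0.5, 2.0], 1),
    ([-2.0, -1.0], -1),
]
w = [0.0, 0.0]
for x, y in stream:
    w = pa_update(w, x, y)
```

Each example is seen once and then discarded, which is what makes the method suitable for the online setting described in the abstract.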

algorithm, classification, hinge, (12 more...)

2002.01771

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Journal of Artificial Intelligence Research, Feb-4-2020

Blind Spot Detection for Safe Sim-to-Real Transfer

Ramakrishnan, Ramya (Massachusetts Institute of Technology) | Kamar, Ece | Dey, Debadeepta | Horvitz, Eric | Shah, Julie

Agents trained in simulation may make errors when performing actions in the real world due to mismatches between training and execution environments. These mistakes can be dangerous and difficult for the agent to discover because the agent is unable to predict them a priori. In this work, we propose the use of oracle feedback to learn a predictive model of these blind spots in order to reduce costly errors in real-world applications. We focus on blind spots in reinforcement learning (RL) that occur due to incomplete state representation: when the agent lacks necessary features to represent the true state of the world, and thus cannot distinguish between numerous states. We formalize the problem of discovering blind spots in RL as a noisy supervised learning problem with class imbalance. Our system learns models for predicting blind spots within unseen regions of the state space by combining techniques for label aggregation, calibration, and supervised learning. These models take into consideration noise emerging from different forms of oracle feedback, including demonstrations and corrections. We evaluate our approach across two domains and demonstrate that it achieves higher predictive performance than baseline methods, and also that the learned model can be used to selectively query an oracle at execution time to prevent errors. We also empirically analyze the biases of various feedback types and how these biases influence the discovery of blind spots. Further, we include analyses of our approach that incorporate relaxed initial optimality assumptions. (Interestingly, relaxing the assumptions of an optimal oracle and an optimal simulator policy helped our models to perform better.) We also propose extensions to our method that are intended to improve performance when using corrections and demonstrations data.

agent, blind spot, oracle, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.11436

AI Access Foundation

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Transportation (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
(3 more...)