Case-Based Reasoning
Robust nonparametric nearest neighbor random process clustering
Tschannen, Michael, Bölcskei, Helmut
We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the $L^1$-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first one, termed nearest neighbor process clustering (NNPC), relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as $k$-means (KM), consists of a single $k$-means iteration with farthest point initialization and was considered before in the literature, albeit with a different dissimilarity measure. We prove that both algorithms succeed with high probability in the presence of noise and missing entries, and even when the generative process PSDs overlap significantly, all provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the observation length, the fraction of missing entries, and the noise variance. Finally, we provide extensive numerical results for synthetic and real data and find that NNPC outperforms state-of-the-art algorithms in human motion sequence clustering.
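The KM procedure described in this abstract (farthest point initialization followed by a single assignment step, with the $L^1$-distance between estimated PSDs as dissimilarity) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the plain periodogram as PSD estimator, the choice of the first observation as the initial center, and all function names are assumptions made for this example.

```python
import numpy as np

def periodogram(x):
    # Assumed PSD estimator for this sketch: a plain periodogram.
    x = np.asarray(x, dtype=float)
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

def l1_distance(p, q):
    # Discretized L1-distance between two PSD estimates on the same grid.
    return float(np.mean(np.abs(np.asarray(p) - np.asarray(q))))

def farthest_point_clustering(psds, k):
    # Farthest point initialization: start from observation 0 (an assumption),
    # then repeatedly pick the observation farthest from all chosen centers.
    centers = [0]
    dist = np.array([l1_distance(p, psds[0]) for p in psds])
    while len(centers) < k:
        nxt = int(np.argmax(dist))
        centers.append(nxt)
        dist = np.minimum(dist, [l1_distance(p, psds[nxt]) for p in psds])
    # Single assignment step: each observation joins its closest center.
    labels = [
        int(np.argmin([l1_distance(p, psds[c]) for c in centers]))
        for p in psds
    ]
    return centers, labels
```

In use, each observation would first be mapped to a PSD estimate with `periodogram`, and the resulting list passed to `farthest_point_clustering` together with the number of clusters `k`.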
Making the positive case for artificial intelligence - CBR
In part, the critics of AI are driven by the knowledge that 'white collar jobs' are the ones that are now under threat. Business leaders are frequently confronted by notions of job-killing automation and headlines on the variation of the theme that "Robots Will Steal Our Jobs." Elon Musk, CEO of Tesla, Silicon Valley figurehead, and champion of technology-driven innovation, even goes a step further by suggesting AI is a fundamental threat to human civilisation. The robot on the assembly line is now a familiar image.
Implementing kd-tree for fast range-search, nearest-neighbor search and k-nearest-neighbor search algorithms in 2D in Java and Python
The following problem appeared as an assignment in the Coursera course Algorithm-I by Prof. Robert Sedgewick from Princeton University a few years back (and also in the course cos226 offered at Princeton). The problem definition and the description are taken from the course website and lectures. The original assignment was to be done in Java; in this article, both the Java and a corresponding Python implementation are described. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates, as shown in the next figure. The following figures and animations show how the 2-d-tree is grown with recursive space-partitioning for a few sample datasets.
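The alternating-key construction can be illustrated with a short Python sketch. It is a simplified illustration rather than the article's or the course's code: the names `Node`, `insert`, and `nearest` are chosen for this example, and duplicate points are not filtered out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node, point, depth=0):
    # Even depths compare x-coordinates, odd depths compare y-coordinates.
    if node is None:
        return Node(point)
    axis = depth % 2
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node, query, depth=0, best=None):
    # Returns the stored point closest to query (None for an empty tree).
    if node is None:
        return best
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    axis = depth % 2
    diff = query[axis] - node.point[axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    # Search the side containing the query first.
    best = nearest(near, query, depth + 1, best)
    # Visit the other side only if the splitting line is closer than the
    # best point found so far; otherwise that subtree can be pruned.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, depth + 1, best)
    return best
```

The pruning test in `nearest` is what makes the search fast on well-spread data: whole subtrees are skipped whenever they lie on the far side of a splitting line that is already farther away than the current best candidate.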
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear and high-dimensional dependencies. Here a fully non-parametric test for continuous data based on conditional mutual information combined with a local permutation scheme is presented. Through a nearest neighbor approach, the test also adapts efficiently to non-smooth distributions due to strongly nonlinear dependencies. Numerical experiments demonstrate that the test reliably simulates the null distribution even for small sample sizes and with high-dimensional conditioning sets. The test is better calibrated than kernel-based tests utilizing an analytical approximation of the null distribution, especially for non-smooth densities, and reaches the same or higher power levels. Combining the local permutation scheme with the kernel tests leads to better calibration, but suffers in power. For smaller sample sizes and lower dimensions, the test is faster than random Fourier feature-based kernel tests if the permutation scheme is (embarrassingly) parallelized, but the runtime increases more sharply with sample size and dimensionality. Thus, more theoretical research to analytically approximate the null distribution and speed up the estimation for larger sample sizes is desirable.
MLDM 2018: 14th International Conference on Machine Learning and Data Mining
The aim of the conference is to bring together researchers from all over the world who deal with machine learning and data mining in order to discuss the recent status of the research and to direct further developments. Basic research papers as well as application papers are welcome. Paper submissions should be related but not limited to any of the following topics: association rules; case-based reasoning and learning; classification and interpretation of images, text, video; conceptional learning and clustering; goodness measures and evaluation (e.g. Long Paper: The paper must be formatted in the Springer LNCS format and should have at most 15 pages.
K - Nearest Neighbors - KNN Fun and Easy Machine Learning
In pattern recognition, the KNN algorithm is a method for classifying objects based on the closest training examples in the feature space. KNN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is delayed until classification. KNN is the most fundamental and simplest classification technique when there is little or no prior knowledge about the distribution of the data. The K in KNN refers to the number of nearest neighbors that the classifier will use to make its prediction. In this video we use a Game of Thrones example to explain kNN.
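As a rough illustration of the idea, a minimal kNN classifier might look like the sketch below. It assumes Euclidean distance and a simple majority vote among the K closest training examples; the function name `knn_predict` and the data layout (a list of `(features, label)` pairs) are choices made for this example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_tuple, label) pairs; query: feature tuple.
    # "Lazy" learning: nothing is fitted up front, all work happens here.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # Majority vote among the k nearest neighbors.
    return votes.most_common(1)[0][0]
```

Sorting the whole training set costs O(n log n) per query, which is fine for small data; larger datasets typically use a spatial index instead.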
Using Artificial Intelligence to Run your Best Marathon
I've been writing marathon-related blog posts for about 2 years now, describing a range of studies on different aspects of marathon running, such as the influence of age, gender, and experience on performance and pacing, and focusing on race-records from a wide range of big-city marathons around the world. To date these studies have focused on analysing marathon data with a view to gaining insights into what has happened in the past; something that is often referred to as descriptive analytics in the world of data science. Recently I have turned my attention to the future, to use this marathon data to gain insights into what might happen in the future -- predictive analytics -- and, in particular, to make predictions about the potential of runners to achieve new personal best (PB) finish-times. In fact, what began as a bit of data-fun in my spare time has now started to leak into my day job, and this week I will present a scientific paper based on this prediction work. This is not so unusual. As a Professor in the area of artificial intelligence, machine learning, and recommender systems, a major part of my job involves publishing and presenting research ideas.
The Pitfalls of Hunting Cyber Threats with AI - CBR
Although it's not a 'one size fits all' solution, artificial intelligence can be used to successfully hunt cyberthreats. Giovanni Vigna, CTO and co-founder of Lastline, identifies several of the key areas to address when thinking proactively about AI as a tool in detecting cyberthreats. Artificial intelligence (AI) will not automatically detect and resolve every potential malware or cyberthreat incident, but when it combines both bad and good behavior modeling it becomes a successful and powerful weapon against even the most advanced malware. By their very nature, malware detection tools must constantly evolve to stay up to date with ever-changing crimeware. One of the biggest evolutions in malware detection is the migration from trapping to hunting.
How Artificial Intelligence Could Help Transform The Oil Industry
While the oil and gas industry has had its share of ups and downs over the past decade, many financial institutions are banking on very slow growth of oil prices in 2017. Though some believe that the efficiency gains that the oil industry can capture are quickly coming to an end, this sentiment only captures hard technology specifically related to oil and gas. To help bring the O&G industry into the 21st century, technology from other industries needs to be incorporated, using many hard-earned years of expertise and different lines of thinking. Oilprice previously mentioned incorporating food industry technology to increase safety standards when fracking, but incorporating technology from the IT industry is something that the O&G industry as a whole can benefit from. Whether it's neural networks, machine learning, fuzzy logic, case-based reasoning or expert systems, AI has the potential to transform the industry.
Cognitive Adaptive Learning, Classification, and Response for Communications Threats (CALCR): A Case-Based Reasoning Approach
Whitaker, Elizabeth Taylor (Georgia Tech Research Institute (GTRI)) | Trewhitt, Ethan Brantley (Georgia Tech Research Institute (GTRI)) | Rosenbluth, David (Lockheed Martin Advanced Technology Laboratories)
The Cognitive Adaptive Learning, Classification, and Response for Communications Threats (CALCR) system uses a case-based reasoning (CBR) and case-based learning (CBL) approach to address issues encountered in a contested RF communications environment. CALCR was the result of a research project that explored new approaches to understanding communications threats and responding with appropriate countermeasures. Modern communications threats may be modified from existing systems, or may be completely new systems, and CALCR enables a response to these unknown or unanticipated threats. CALCR integrates existing properties of CBR, along with several innovations, making it ideal for this problem: the ability for a case library to be extended through CBL as new conditions are encountered; the robustness of CBR in situations where there is missing data, which CALCR addresses with an advanced intelligent similarity measure; the ability to detect classes unknown to the case library through the use of a confidence measure; and the ability to provide a best-attempt solution, when multiple threat classes are matched, through the use of a new approach called the taxonomy reasoner.
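Two of the generic CBR ideas mentioned here, tolerating missing data in the similarity measure and flagging unknown classes through a confidence threshold, can be sketched in a few lines. This is a toy illustration only and does not reflect CALCR's actual similarity measure or confidence measure, which the abstract does not specify. The function names, the feature names, the assumption that features are normalized to [0, 1], and the threshold value are all hypothetical.

```python
def similarity(case, query):
    # Compare only features that are present (not None) in both the stored
    # case and the query, so missing data does not penalize a match.
    shared = [
        f for f in query
        if query[f] is not None and case.get(f) is not None
    ]
    if not shared:
        return 0.0
    # Features are assumed to be normalized to [0, 1] (hypothetical).
    return sum(1.0 - abs(case[f] - query[f]) for f in shared) / len(shared)

def retrieve(library, query, min_confidence=0.8):
    # library: list of (feature_dict, label) pairs.
    best_label, best_score = None, 0.0
    for features, label in library:
        score = similarity(features, query)
        if score > best_score:
            best_label, best_score = label, score
    # Below the (hypothetical) confidence threshold, report an unknown
    # class rather than forcing a match to a known one.
    if best_score < min_confidence:
        return "unknown", best_score
    return best_label, best_score
```

A case-based learning step would then add the query, once labeled, back into `library` so that similar conditions can be recognized later.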