Plotting

Asia


Video Blog: Machine Learning for dummies

#artificialintelligence

Ecommerce companies have seen a slower quarter. According to Business Today, investors in India have turned cautious about placing big bets on ecommerce start-ups, and the party is clearly over. According to the Economic Times, both the total VC money invested and the number of deals have dropped, from a peak of 43 deals worth $831 million in March 2015 to just 24 deals worth $112 million (Rs 730 crore) in less than a year. Similarly, brands in leading ecommerce markets such as Southeast Asia (SEA) and the Middle East are working with tighter marketing budgets and expect greater value from their mobile ad spend, with the focus on metrics such as in-app user engagement and conversions.


School headmaster receives praise for using mahjong to teach students English

Mashable

Learning a language that's not your mother tongue can be really daunting. A school in Chengdu, China, has made national headlines for its creative way of teaching students English with mahjong tiles. Mahjong is a traditional Chinese tile game similar to the Western card game rummy. It is usually played by four players with a set of 144 tiles based on Chinese characters and symbols. Many Chinese children learn to play mahjong at a young age simply by watching their parents play.


Machine Learning Applied Scientist at Microsoft India (R&D) - Machine Learning

#artificialintelligence

Bing is all about data and using it to deliver the highest-quality search experience; our mission is to empower people with knowledge. Thanks to the unprecedented availability of web data and recent advances in deep learning, we are now at a pivotal moment in which huge amounts of both structured and unstructured data can be mined from the web, allowing search engines to connect users directly with the information they need. Our ultimate goal is to fulfill any information need on the spot, whether on a desktop or a mobile device. We see the web as the most comprehensive database for powering a multitude of services across all Microsoft products, from Bing to Excel to Skype, not to mention Cortana; all of them need data. Join our team and help us build the next generation of applications.


Lee: Chinese Tech Firms Need Experts With Cross Disciplines

WSJ.com: WSJD - Technology

HONG KONG--The biggest challenge for Chinese companies making the next generation of wearables, self-driving cars and drones is having experts who span multiple disciplines, GGV Capital Managing Partner Jenny Lee said Friday. Speaking at the Converge technology conference hosted by The Wall Street Journal and f.ounders in Hong Kong, Ms. Lee said Chinese companies benefit from government support and funding, and a huge market of...


Baidu Plans to Mass Produce Autonomous Cars in Five Years

WSJ.com: WSJD - Technology

HONG KONG--Chinese search-engine giant Baidu Inc. Senior Vice President Wang Jing said Friday that the company plans to mass-produce a driverless car in five years, so that babies born today won't need a driver's license. Baidu, China's largest search company with 80% market share, is already testing its model on public roads in Beijing and in Wuhu, in China's southeastern Anhui province, as well as in a closed testing area in Shanghai. Speaking at The Wall Street Journal's Converge technology conference, Mr. Wang said...


IBM pushes blockchain, cognitive gameplay in Singapore

ZDNet

IBM has expanded its ambition to become a cognitive services provider into this neck of the woods, where it will soon open new hubs in Singapore and Tokyo to provide a startup developer environment for enterprises. Executives at the IBM Solutions Connect 2016 conference, held Thursday in Singapore, outlined the company's plans to transform into a cognitive and cloud company. Raymond Wong, IBM Singapore's country manager for software, revealed that a new IBM Design Studio would be launched in the city-state next week, making it the company's 30th such facility worldwide. The local site would also serve as the regional hub. In addition, a new Bluemix Garage would soon open in Singapore, giving developers access to design and support resources for building applications, specifically in emerging technologies such as blockchain.


Statistical Pattern Recognition for Driving Styles Based on Bayesian Probability and Kernel Density Estimation

arXiv.org Machine Learning

Driving styles have a great influence on vehicle fuel economy, active safety, and drivability. To recognize the path-tracking driving styles of different drivers, a statistical pattern-recognition method based on probability density estimation is developed to deal with the uncertainty of driving styles or characteristics. First, to describe driver path-tracking styles, vehicle speed and throttle opening are selected as the discriminative parameters, and conditional kernel density functions of vehicle speed and throttle opening are built separately to describe the uncertainty and probability of two representative driving styles, i.e., aggressive and normal. Meanwhile, the posterior probability of each element in the feature vector is obtained using full Bayesian theory. Second, a Euclidean distance method is used to decide which class a driver belongs to, instead of calculating the complex covariance between every two elements of the feature vectors. By comparing the Euclidean distances between elements of the feature vector, driving styles are classified into seven levels ranging from low normal to high aggressive. Subsequently, to show the benefits of the proposed pattern-recognition method, it is compared with a fuzzy logic-based pattern-recognition method using cross-validation. The experimental results show that the proposed statistical pattern-recognition method based on kernel density estimation is more efficient and stable than the fuzzy logic-based method.
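To make the pipeline in this abstract more concrete, here is a minimal Python sketch under stated assumptions. It builds a Gaussian kernel density estimate of speed and of throttle opening for each style, turns each feature into a Bayesian posterior P(aggressive | feature), and assigns the driver to whichever prototype is nearer in Euclidean distance. The training values, the 5.0 bandwidths, the equal priors, the two prototypes (0, 0) and (1, 1), and the seven-level mapping are all hypothetical choices for illustration, not the paper's settings.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Pdf = Callable[[float], float]

def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Pdf:
    """1-D Gaussian kernel density estimate built from one style's samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def pdf(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf

def posterior(x: float, pdfs: Dict[str, Pdf], priors: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(style | x) is proportional to p(x | style) * P(style)."""
    joint = {style: pdfs[style](x) * priors[style] for style in pdfs}
    total = sum(joint.values())
    if total == 0.0:  # x is far from every sample: fall back to the priors
        return dict(priors)
    return {style: value / total for style, value in joint.items()}

# Hypothetical training data: speed in km/h, throttle opening in percent.
SPEED = {"normal": [40, 45, 50, 48, 42], "aggressive": [70, 75, 80, 72, 78]}
THROTTLE = {"normal": [20, 25, 22, 18, 24], "aggressive": [60, 65, 70, 58, 62]}
PRIORS = {"normal": 0.5, "aggressive": 0.5}

speed_pdfs = {style: gaussian_kde(v, bandwidth=5.0) for style, v in SPEED.items()}
throttle_pdfs = {style: gaussian_kde(v, bandwidth=5.0) for style, v in THROTTLE.items()}

def classify(speed: float, throttle: float) -> Tuple[str, int]:
    """Return (style, level); level runs from 0 (most normal) to 6 (most aggressive)."""
    # Feature vector: posterior probability of "aggressive" for each element.
    feature = (posterior(speed, speed_pdfs, PRIORS)["aggressive"],
               posterior(throttle, throttle_pdfs, PRIORS)["aggressive"])
    d_normal = math.dist(feature, (0.0, 0.0))      # hypothetical all-normal prototype
    d_aggressive = math.dist(feature, (1.0, 1.0))  # hypothetical all-aggressive prototype
    style = "aggressive" if d_aggressive < d_normal else "normal"
    # Hypothetical seven-level scale based on relative distance to the two prototypes.
    level = round(6 * d_normal / (d_normal + d_aggressive))
    return style, level
```

With these made-up samples, a fast, heavy-throttle driver such as `classify(76, 66)` lands at the aggressive end of the scale, while `classify(44, 21)` lands at the normal end. Comparing distances to fixed points only needs the per-element posteriors, which echoes the abstract's point about avoiding a covariance estimate between feature elements.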


Robust Ensemble Clustering Using Probability Trajectories

arXiv.org Machine Learning

Although many successful ensemble clustering approaches have been developed in recent years, most existing approaches still have two limitations. First, they mostly overlook the issue of uncertain links, which may mislead the overall consensus process. Second, they generally lack the ability to incorporate global information to refine the local links. To address these two limitations, in this paper, we propose a novel ensemble clustering approach based on sparse graph representation and probability trajectory analysis. In particular, we present an elite neighbor selection strategy that identifies uncertain links by locally adaptive thresholds and builds a sparse graph with a small number of probably reliable links. We argue that a small number of probably reliable links can lead to significantly better consensus results than using all graph links regardless of their reliability. A random walk process driven by a new transition probability matrix is then used to explore the global information in the graph. By analyzing the probability trajectories of the random walkers, we derive a novel, dense similarity measure from the sparse graph, on the basis of which two consensus functions are further proposed. Experimental results on multiple real-world datasets demonstrate the effectiveness and efficiency of our approach.
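A rough Python sketch of the sparse-graph-plus-random-walk idea may help. It assumes the graph starts from a co-association matrix (the fraction of base clusterings that put two objects together), replaces the paper's locally adaptive thresholds with a hypothetical fixed top-`k` rule for elite neighbor selection, and compares truncated probability trajectories with cosine similarity. The paper's exact similarity measure and its two consensus functions are not reproduced here; `k`, `steps`, and the toy ensemble are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

def co_association(clusterings: List[List[int]]) -> Matrix:
    """Fraction of base clusterings that put objects i and j in the same cluster."""
    n, m = len(clusterings[0]), len(clusterings)
    return [[0.0 if i == j else sum(c[i] == c[j] for c in clusterings) / m
             for j in range(n)] for i in range(n)]

def elite_neighbors(w: Matrix, k: int) -> Matrix:
    """Keep each node's k strongest links (symmetrised); treat the rest as uncertain."""
    n = len(w)
    keep = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda c: w[i][c], reverse=True)[:k]:
            if w[i][j] > 0:
                keep[i][j] = keep[j][i] = True
    return [[w[i][j] if keep[i][j] else 0.0 for j in range(n)] for i in range(n)]

def transition(w: Matrix) -> Matrix:
    """Row-normalise the sparse graph into a random-walk transition matrix."""
    return [[x / sum(row) for x in row] if sum(row) > 0 else [0.0] * len(row) for row in w]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def cosine(u: List[float], v: List[float]) -> float:
    nu, nv = math.sqrt(sum(x * x for x in u)), math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def trajectory_similarity(clusterings: List[List[int]], k: int = 2, steps: int = 3) -> Matrix:
    """Dense similarity from the sparse graph via t-step walk probabilities, t = 1..steps."""
    p = transition(elite_neighbors(co_association(clusterings), k))
    n = len(p)
    trajectories: List[List[float]] = [[] for _ in range(n)]
    p_t = p  # p_t holds P^t, the t-step transition probabilities
    for _ in range(steps):
        for i in range(n):
            trajectories[i].extend(p_t[i])
        p_t = matmul(p_t, p)
    return [[cosine(trajectories[i], trajectories[j]) for j in range(n)] for i in range(n)]

# Toy ensemble of three base clusterings over six objects.
base = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
sim = trajectory_similarity(base)
```

In the toy ensemble, the weak links between object 2 and objects 3 to 5 (supported by only one of the three clusterings) are dropped as uncertain, so the walkers never cross between the two groups and their trajectory similarity is zero, while objects inside each group stay strongly similar.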


Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis

arXiv.org Machine Learning

The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering, and it has received increasing attention in recent years. Existing clustering ensemble approaches have two main limitations. First, many approaches cannot weight the base clusterings without access to the original data, and they can be significantly affected by low-quality or even ill-formed clusterings. Second, they generally focus on either the instance level or the cluster level of the ensemble and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner, and thus to weight the base clusterings according to their clustering validity. To explore the relationships between clusters, the source aware connected triple (SACT) similarity is introduced, which takes into account their common neighbors and source reliability. Based on NCAI and the multi-granularity information collected among base clusterings, clusters, and data instances, we further propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA). Experiments on eight real-world datasets demonstrate the effectiveness and robustness of the proposed methods.
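The weighting idea behind WEAC can be sketched in Python, with loud caveats. The agreement score below is each base clustering's mean Rand index against the others, scaled so the best clustering gets weight 1; this is a hypothetical stand-in for the paper's NCAI definition. SACT similarity and GP-MGLA are left out, and the final step simply merges pairs whose weighted co-association exceeds a hypothetical 0.5 threshold rather than following the paper's consensus procedure.

```python
from itertools import combinations
from typing import Dict, List

def rand_index(a: List[int], b: List[int]) -> float:
    """Fraction of object pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def agreement_weights(clusterings: List[List[int]]) -> List[float]:
    """Stand-in for NCAI: mean agreement with the other clusterings, scaled to max 1."""
    m = len(clusterings)
    scores = [sum(rand_index(clusterings[i], clusterings[j]) for j in range(m) if j != i)
              / (m - 1) for i in range(m)]
    top = max(scores)
    return [s / top for s in scores]

def weighted_coassociation(clusterings: List[List[int]], weights: List[float]):
    """Evidence accumulation where each base clustering votes with its weight."""
    n, total = len(clusterings[0]), sum(weights)
    return [[sum(w for c, w in zip(clusterings, weights) if c[i] == c[j]) / total
             for j in range(n)] for i in range(n)]

def consensus(clusterings: List[List[int]], threshold: float = 0.5) -> List[int]:
    """Union-find merge of pairs whose weighted co-association exceeds the threshold."""
    ca = weighted_coassociation(clusterings, agreement_weights(clusterings))
    n = len(ca)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)
    relabel: Dict[int, int] = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]

# Three consistent base clusterings plus one noisy one (all hypothetical).
base = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 0, 1]]
```

Here the fourth, noisy clustering receives a lower weight than the three consistent ones, and `consensus(base)` recovers the two groups {0, 1, 2} and {3, 4, 5}.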


A Graph-Based Semi-Supervised k Nearest-Neighbor Method for Nonlinear Manifold Distributed Data Classification

arXiv.org Machine Learning

$k$ Nearest Neighbors ($k$NN) is one of the most widely used supervised learning algorithms for classifying Gaussian distributed data, but it does not achieve good results on nonlinear manifold distributed data, especially when very few labeled samples are available. In this paper, we propose a new graph-based $k$NN algorithm that can effectively handle both Gaussian distributed data and nonlinear manifold distributed data. To achieve this goal, we first propose a constrained Tired Random Walk (TRW) by constructing an $R$-level nearest-neighbor strengthened tree over the graph, and then compute a TRW matrix for similarity measurement. The nearest neighbors of a query point are then identified according to the TRW matrix, and its class label is determined by the sum of the TRW weights of its nearest neighbors. To handle online situations, we also propose a new algorithm for sequential samples based on local neighborhood reconstruction. Comparison experiments on both synthetic and real-world data sets demonstrate the validity of the proposed $k$NN algorithm and its improvement over other versions of $k$NN. Given the widespread appearance of manifold structures in real-world problems and the popularity of the traditional $k$NN algorithm, the proposed manifold version of $k$NN shows promising potential for classifying manifold distributed data.
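The core idea, ranking labeled points by random-walk affinity instead of raw distance, can be sketched in Python. This version assumes a symmetric $k$NN graph with Gaussian edge weights and computes the TRW matrix as the truncated sum of $(\alpha P)^t$ for $t = 1, \dots, T$, where $P$ is the row-normalized graph, so the walker "tires" by a factor $\alpha$ at every step. The decay factor, the truncation length, and the example data are hypothetical, and the paper's $R$-level nearest-neighbor strengthened tree and online update are not implemented.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]

def knn_graph(points: List[Point], k: int, sigma: float = 1.0) -> Matrix:
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            weight = math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            w[i][j] = w[j][i] = weight
    return w

def tired_random_walk(w: Matrix, alpha: float = 0.9, steps: int = 30) -> Matrix:
    """Truncated sum of (alpha * P)^t for t = 1..steps, P being the row-normalised graph."""
    n = len(w)
    p = [[alpha * x / sum(row) for x in row] for row in w]  # every node has >= k edges
    power, total = p, [[0.0] * n for _ in range(n)]
    for _ in range(steps):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = [[sum(power[i][m] * p[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
    return total

def trw_knn_predict(query: int, labels: Dict[int, str], trw: Matrix, k: int = 1) -> str:
    """Take the k labelled points with the largest TRW weight and vote by summed weight."""
    nearest = sorted(labels, key=lambda j: trw[query][j], reverse=True)[:k]
    votes: Dict[str, float] = {}
    for j in nearest:
        votes[labels[j]] = votes.get(labels[j], 0.0) + trw[query][j]
    return max(votes, key=votes.get)

# Two parallel chains of points act as two 1-D manifolds (hypothetical data).
chain_a = [(float(x), 0.0) for x in range(10)]
chain_b = [(float(x), 3.0) for x in range(10)]
points = chain_a + chain_b
labels = {0: "A", 19: "B"}  # one labelled point at the far end of each chain
trw = tired_random_walk(knn_graph(points, k=2))
```

For the query at index 9, the point (9, 0), a plain Euclidean 1-NN picks the labeled point of chain B, which is only 3 units away, whereas the TRW weights follow chain A and `trw_knn_predict(9, labels, trw)` returns "A".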