Clustering Items through Bandit Feedback: Finding the Right Feature out of Many

Graf, Maximilian, Thuot, Victor, Verzelen, Nicolas

arXiv.org Machine Learning

We study the problem of clustering a set of items based on bandit feedback. Each of the $n$ items is characterized by a feature vector, with a possibly large dimension $d$. The items are partitioned into two unknown groups such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item's feature. The learner's objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least $1-\delta$, we obtain an accurate recovery of the partition and derive an upper bound on the budget required. Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.


5 common mistakes while working with machine learning algorithms - GreatLearning

#artificialintelligence

Perfection is achieved only by making mistakes. The same holds true when you work with machine learning algorithms to build models. Most of the time, it is not obvious how to proceed at the beginning, and professionals are bound to make mistakes, especially novices in the domain. Here is a list of the most common mistakes made while working with machine learning algorithms. Hopefully, you will draw valuable insights from this article that you can apply in your work.


Cybersecurity and machine learning: How selecting the right features can lead to success

#artificialintelligence

Big data is all around us. Even so, it is common to hear data scientists and researchers doing analytics say that they need more data. How is that possible, and where does this eagerness for more data come from? Very often, data scientists need large amounts of data to train sophisticated machine-learning models. The same applies when using machine-learning algorithms for cybersecurity.


Xconomy: Five Questions With a16z's Vijay Pande on AI and Making New Drugs

#artificialintelligence

In the startup world these days, the word "biotech" is increasingly accompanied by "computational" and two two-letter initialisms: AI and ML. Those tools--artificial intelligence and machine learning, respectively--have been around for decades, but in recent years they have become faster and cheaper, accelerating their use by those in the business of discovering and developing new drugs. Another startup looking to take advantage of those improvements, South San Francisco-based Genesis Therapeutics, has scored $4.1 million in seed funding and publicly joined the growing fray of biotechs with grand ambitions of disrupting the slow, costly process of discovering and developing new medicines. Andreessen Horowitz, also known as a16z, led its seed round, one of a handful of seed-stage investments the firm has made in biotech. Felicis Ventures, another VC firm based in Silicon Valley, also invested.


Useful things to know about Machine Learning

#artificialintelligence

Learning algorithms are the seeds, data is the soil, and the learned programs are the grown plants. The machine learning expert is like a farmer, sowing the seeds, irrigating and fertilizing the soil, and keeping an eye on the health of the crop, but otherwise staying out of the way. Machine learning algorithms differ from ordinary algorithms: they take data as input and output other algorithms, which is why they are called learners. The algorithms produced by learners come in several types, but the most common are called classifiers. A classifier assigns a class, or label, to an object described by certain numeric or categorical features.
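The "learners take data as input and output other algorithms" idea can be made concrete with a toy learner whose output is itself a function (a classifier). The single-feature threshold rule below is an illustrative assumption for the sketch, not any particular library's method.

```python
def train_threshold_classifier(examples):
    """A toy learner: input labeled data, output a classifier function.

    examples -- list of (feature_value, label) pairs, labels 0 or 1
    """
    xs = sorted(x for x, _ in examples)
    # Candidate thresholds: midpoints between adjacent feature values,
    # plus one value below the smallest ("predict 1 for everything").
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[0] - 1]
    best_t, best_err = xs[0] - 1, len(examples) + 1
    for t in candidates:
        # Training error of the rule "predict 1 iff feature > t".
        err = sum((x > t) != bool(y) for x, y in examples)
        if err < best_err:
            best_t, best_err = t, err
    threshold = best_t

    def classifier(x):
        return 1 if x > threshold else 0

    return classifier  # the learner's output is itself an algorithm
```

The learned program, `classifier`, can then be applied to new objects it has never seen, which is exactly the classifier role described above.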


Self-configuration from a Machine-Learning Perspective

Konen, Wolfgang

arXiv.org Machine Learning

The goal of machine learning is to provide solutions which are trained by data or by experience coming from the environment. Many training algorithms exist and some brilliant successes have been achieved. But even in structured environments for machine learning (e.g. data mining or board games), most applications beyond the level of toy problems need careful hand-tuning or human ingenuity (i.e. detection of interesting patterns) or both. We discuss several aspects of how self-configuration can help to alleviate these problems. One aspect is self-configuration by tuning of algorithms, where recent advances have been made in the area of SPO (Sequential Parameter Optimization). Another aspect is self-configuration by pattern detection or feature construction. Forming multiple features (e.g. random Boolean functions) and using algorithms (e.g. random forests) which easily digest many features can largely increase learning speed. However, a full-fledged theory of feature construction is not yet available and forms a current barrier in machine learning. We discuss several ideas for systematic inclusion of feature construction. This may lead to partly self-configuring machine learning solutions which show robustness, flexibility, and fast learning in potentially changing environments.
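The "forming multiple features (e.g. random Boolean functions)" step can be sketched as follows. The parity-of-a-random-subset construction and the function names are assumptions chosen for illustration; the abstract does not specify which random Boolean functions are formed.

```python
import random

def make_random_boolean_features(n_inputs, n_features, rng):
    """Construct random Boolean features over raw binary inputs.

    Each feature is the parity (XOR) of a randomly chosen subset of the
    input bits. Returns a transform that maps a bit vector to the
    expanded feature vector, ready to be fed to a downstream learner
    such as a random forest.
    """
    subsets = [
        rng.sample(range(n_inputs), k=rng.randint(1, n_inputs))
        for _ in range(n_features)
    ]

    def transform(bits):
        return [sum(bits[i] for i in subset) % 2 for subset in subsets]

    return transform
```

A learner that digests many features, such as a random forest, would then be trained on `transform(x)` instead of the raw bits, which is the feature-construction pipeline the abstract describes.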