Search
A Credit Assignment Compiler for Joint Prediction
Chang, Kai-Wei, He, He, Ross, Stephane, Daumé III, Hal, Langford, John
Many machine learning applications involve jointly predicting multiple mutually dependent output variables. Learning to search is a family of methods where the complex decision problem is cast into a sequence of decisions via a search space. Although these methods have shown promise both in theory and in practice, implementing them has been burdensome and awkward. In this paper, we show the search space can be defined by an arbitrary imperative program, turning learning to search into a credit assignment compiler. Together with algorithmic improvements to the compiler, we radically reduce the complexity of programming and the running time. We demonstrate the feasibility of our approach on multiple joint prediction tasks. In all cases, we obtain accuracies as high as alternative approaches, at drastically reduced execution and programming time.
Optimal Tagging with Markov Chain Optimization
Rosenfeld, Nir, Globerson, Amir
Many information systems use tags and keywords to describe and annotate content. These allow for efficient organization and categorization of items, as well as facilitate relevant search queries. As such, the selected set of tags for an item can have a considerable effect on the volume of traffic that eventually reaches an item. In tagging systems where tags are exclusively chosen by an item's owner, who in turn is interested in maximizing traffic, a principled approach for assigning tags can prove valuable. In this paper we introduce the problem of optimal tagging, where the task is to choose a subset of tags for a new item such that the probability of browsing users reaching that item is maximized. We formulate the problem by modeling traffic using a Markov chain, and asking how transitions in this chain should be modified to maximize traffic into a certain state of interest. The resulting optimization problem involves maximizing a certain function over subsets, under a cardinality constraint. We show that the optimization problem is NP-hard, but has a (1-1/e)-approximation via a simple greedy algorithm due to monotonicity and submodularity. Furthermore, the structure of the problem allows for an efficient computation of the greedy step. To demonstrate the effectiveness of our method, we perform experiments on three tagging datasets, and show that the greedy algorithm outperforms other baselines.
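The greedy step mentioned in the abstract above can be illustrated with a minimal sketch of greedy maximization of a monotone submodular set function under a cardinality constraint. This is not the authors' Markov chain objective: the function `reach`, the tag names, and the user sets below are hypothetical stand-ins (a simple coverage function), chosen only because coverage is monotone and submodular, so the (1-1/e) guarantee applies to it as well.

```python
from typing import Callable, FrozenSet, List, Set


def greedy_max(candidates: List[str],
               f: Callable[[FrozenSet[str]], float],
               k: int) -> Set[str]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    chosen: Set[str] = set()
    for _ in range(k):
        base = f(frozenset(chosen))
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen:
                continue
            gain = f(frozenset(chosen | {c})) - base
            # Strict comparison: ties go to the earlier candidate in the list.
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate adds value; stop early
            break
        chosen.add(best)
    return chosen


# Hypothetical toy data: each tag "reaches" a set of user ids.
USERS_BY_TAG = {
    "ml": {1, 2, 3},
    "search": {3, 4},
    "ai": {1, 2},
    "graphs": {5},
}


def reach(tags: FrozenSet[str]) -> float:
    """Coverage objective: number of distinct users reached by the chosen tags."""
    covered: Set[int] = set()
    for t in tags:
        covered |= USERS_BY_TAG[t]
    return float(len(covered))


selected = greedy_max(["ml", "search", "ai", "graphs"], reach, k=2)
print(sorted(selected), reach(frozenset(selected)))
```

In the paper's setting, the objective would instead be the probability that a browsing user reaches the new item, and the abstract notes that the problem structure allows the marginal gains to be computed efficiently; the sketch above simply re-evaluates the objective for every candidate.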
Tree-Structured Reinforcement Learning for Sequential Object Localization
Jie, Zequn, Liang, Xiaodan, Feng, Jiashi, Jin, Xiaojie, Lu, Wen, Yan, Shuicheng
Existing object proposal algorithms usually search for possible object regions over multiple locations and scales separately, which ignores the interdependency among different objects and deviates from the human perception procedure. To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach to sequentially search for objects by fully exploiting both the current observation and historical search paths. The Tree-RL approach learns multiple searching policies by maximizing the long-term reward that reflects localization accuracies over all the objects. Starting with the entire image as a proposal, the Tree-RL approach allows the agent to sequentially discover multiple objects via a tree-structured traversing scheme. By allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects with a single feed-forward pass. Therefore, Tree-RL can better cover different objects at various scales, which is quite appealing in the context of object proposal. Experiments on PASCAL VOC 2007 and 2012 validate the effectiveness of Tree-RL, which achieves recall comparable to current object proposal algorithms with far fewer candidate windows.
P-SyncBB: A Privacy Preserving Branch and Bound DCOP Algorithm
Distributed constraint optimization problems enable the representation of many combinatorial problems that are distributed by nature. An important motivation for such problems is to preserve the privacy of the participating agents during the solving process. The present paper introduces a novel privacy-preserving branch and bound algorithm for this purpose. The proposed algorithm, P-SyncBB, preserves constraint, topology and decision privacy. The algorithm requires secure solutions to several multi-party computation problems. Consequently, appropriate novel secure protocols are devised and analyzed. An extensive experimental evaluation on different benchmarks, problem sizes, and constraint densities shows that P-SyncBB exhibits superior performance to other privacy-preserving complete DCOP algorithms.
Variations on Memetic Algorithms for Graph Coloring Problems
Moalic, Laurent, Gondran, Alexandre
Given an undirected graph G = (V, E) with V a set of vertices and E a set of edges, graph vertex coloring involves assigning each vertex a color so that two adjacent vertices (linked by an edge) have different colors. The Graph Vertex Coloring Problem (GVCP) consists in finding the minimum number of colors, called the chromatic number χ(G), required to color the graph G while respecting these binary constraints. The GVCP is a well-documented and much-studied problem because this simple formalization can be applied to various issues such as frequency assignment problems [1, 2], scheduling problems [3, 4, 5] and flight level allocation problems [6, 7]. Most problems that involve sharing a rare resource (colors) between different operators (vertices) can be modeled as a GVCP.
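To make the definition above concrete, here is a small sketch that checks whether a coloring is proper and builds one with a simple first-fit greedy heuristic. This is not the memetic algorithm of the paper; the function names and the example graph (a 5-cycle) are illustrative assumptions, and the number of colors greedy uses is only an upper bound on χ(G) in general.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def is_proper_coloring(edges: Iterable[Edge], coloring: Dict[int, int]) -> bool:
    """A coloring is proper if the endpoints of every edge have different colors."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def greedy_coloring(vertices: List[int], edges: List[Edge]) -> Dict[int, int]:
    """First-fit heuristic: visit vertices in order, give each the smallest free color."""
    adjacent: Dict[int, Set[int]] = {v: set() for v in vertices}
    for u, v in edges:
        adjacent[u].add(v)
        adjacent[v].add(u)

    coloring: Dict[int, int] = {}
    for v in vertices:
        used = {coloring[n] for n in adjacent[v] if n in coloring}
        color = 0
        while color in used:
            color += 1
        coloring[v] = color
    return coloring


# Illustrative example: a cycle on 5 vertices, whose chromatic number is 3.
cycle_vertices = [0, 1, 2, 3, 4]
cycle_edges = [(i, (i + 1) % 5) for i in range(5)]

cycle_coloring = greedy_coloring(cycle_vertices, cycle_edges)
num_colors = len(set(cycle_coloring.values()))
print(cycle_coloring, num_colors, is_proper_coloring(cycle_edges, cycle_coloring))
```

On the odd cycle the heuristic happens to match the chromatic number, but on other graphs or vertex orders it can use more colors than necessary, which is why stronger search methods are studied for the GVCP.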
How Google's search algorithm spreads false information with a rightwing bias
Google's search algorithm appears to be systematically promoting information that is either false or slanted with an extreme rightwing bias on subjects as varied as climate change and homosexuality. Following a recent investigation by the Observer, which uncovered that Google's search engine prominently suggests neo-Nazi websites and antisemitic writing, the Guardian has uncovered a dozen additional examples of biased search results. Google's search algorithm and its autocomplete function prioritize websites that, for example, declare that climate change is a hoax, being gay is a sin, and the Sandy Hook mass shooting never happened. The increased scrutiny on the algorithms of Google, which removed antisemitic and sexist autocomplete phrases after the recent Observer investigation, comes at a time of tense debate surrounding the role of fake news in building support for conservative political leaders, particularly US President-elect Donald Trump. Facebook has faced significant backlash for its role in enabling widespread dissemination of misinformation, and data scientists and communication experts have argued that rightwing groups have found creative ways to manipulate social media trends and search algorithms.
Small Representations of Big Kidney Exchange Graphs
Dickerson, John P., Kazachkov, Aleksandr M., Procaccia, Ariel D., Sandholm, Tuomas
Kidney exchanges are organized markets where patients swap willing but incompatible donors. In the last decade, kidney exchanges grew from small and regional to large and national, and soon, international. This growth results in more lives saved, but exacerbates the empirical hardness of the NP-complete problem of optimally matching patients to donors. State-of-the-art matching engines use integer programming techniques to clear fielded kidney exchanges, but these methods must be tailored to specific models and objective functions, and may fail to scale to larger exchanges. In this paper, we observe that if the kidney exchange compatibility graph can be encoded by a constant number of patient and donor attributes, the clearing problem is solvable in polynomial time. We give necessary and sufficient conditions for losslessly shrinking the representation of an arbitrary compatibility graph. Then, using real compatibility graphs from the UNOS nationwide kidney exchange, we show how many attributes are needed to encode real compatibility graphs. The experiments show that, indeed, small numbers of attributes suffice.
Data Driven Resource Allocation for Distributed Learning
Dick, Travis, Li, Mu, Pillutla, Venkata Krishna, White, Colin, Balcan, Maria Florina, Smola, Alex
In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.
What we've learned about SEO in 2016
Since the inception of the search engine, SEO has been an important, yet often misunderstood industry. For some, these three little letters bring massive pain and frustration. For others, SEO has saved their business. One thing is for sure: having a clear and strategic search strategy is what often separates those who succeed from those who don't. As we wrap up 2016, let's take a look at how the industry has grown and shifted over the past year, and then look ahead to 2017.
Feliks Zemdegs sets new Rubik's Cube world record
The incredible moment a man solves a Rubik's cube in less than FIVE SECONDS to set a new world record (as the previous champion sits next to him and grins through gritted teeth). Feliks Zemdegs, 20, solved the famous 1980s toy in just 4.73 seconds. The previous world record was set by Mats Valk, 20, who is sitting next to Mr Zemdegs as he breaks the record. Mr Zemdegs got ten seconds to inspect the Rubik's cube before he had to solve it. His hands move so fast the camera struggles to pick up his finger movements.