The original goal of this post was to explore the relationship between the softmax and sigmoid functions. In truth, this relationship had always seemed just out of reach: "One has an exponent in the numerator! One has a 1 in the denominator!" And of course, the two have different names. Once I had derived it, I quickly realized how this relationship backed out into a more general modeling framework motivated by the conditional probability axiom itself.
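One standard way to see the connection numerically: a two-class softmax with the second logit pinned at 0 gives the same value as the sigmoid. The sketch below is only an illustration of that identity, not the derivation the post goes on to give; the function names are hypothetical.

```python
import math

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_two_class(z):
    # Assumption for illustration: two classes, second logit fixed at 0.
    # Then e^z / (e^z + e^0) = 1 / (1 + e^{-z}), i.e. the sigmoid.
    return softmax([z, 0.0])[0]

if __name__ == "__main__":
    for z in (-2.0, 0.0, 1.3):
        print(z, sigmoid(z), softmax_two_class(z))
```

More generally, for two logits `a` and `b`, the first softmax output equals `sigmoid(a - b)`, since only the difference between the logits matters.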

Balseiro, Santiago, Golrezaei, Negin, Mahdian, Mohammad, Mirrokni, Vahab, Schneider, Jon

We consider the variant of this problem where in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $\tilde{O}(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).
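To make the cross-learning feedback concrete, here is a minimal sketch of the auction example. It assumes a first-price auction (one instance of a non-truthful auction) in which the winner pays their own bid; that payment rule, the function name, and the candidate valuations are illustrative assumptions, not details taken from the abstract. Once the learner sees whether a bid won, it can compute the reward that same bid would have earned under every other valuation.

```python
def cross_learned_rewards(bid, won, valuations):
    """Rewards r_a(c') for one action (the bid) under every context c' (valuation).

    Assumes a first-price rule: utility is valuation - bid on a win, 0 on a loss.
    """
    return {v: (v - bid) if won else 0.0 for v in valuations}

if __name__ == "__main__":
    # Hypothetical discretized valuations standing in for the context set.
    contexts = [0.5, 0.8, 1.0]
    print(cross_learned_rewards(bid=0.4, won=True, valuations=contexts))
    print(cross_learned_rewards(bid=0.4, won=False, valuations=contexts))
```

A single round therefore yields one reward observation per context rather than one in total, which is the extra information that lets the regret bounds drop their dependence on $C$.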

In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in game play. A leading example is recent computer Go programs,[1] but it has also been used in other board games, as well as real-time video games and non-deterministic games such as poker (see history section). The focus of Monte Carlo tree search is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games is based on many playouts. In each playout, the game is played out to the very end by selecting moves at random.
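The playout step can be sketched on a toy game. The example below uses a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins), chosen purely for illustration. It implements only random playouts and a flat win-rate estimate per first move, not the full tree search with selection, expansion, and backpropagation; all names are hypothetical.

```python
import random

def legal_moves(stones):
    # A player may take 1, 2, or 3 stones, but no more than remain.
    return [k for k in (1, 2, 3) if k <= stones]

def random_playout(stones, player, rng):
    """Play uniformly random moves to the end of the game; return the winner (0 or 1).

    Requires stones > 0. The player who takes the last stone wins.
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return player
        player = 1 - player

def playout_win_rates(stones, n_playouts=2000, seed=0):
    """Estimate player 0's win rate for each first move using random playouts."""
    rng = random.Random(seed)
    rates = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            rates[move] = 1.0  # taking the last stone wins immediately
            continue
        wins = sum(random_playout(remaining, 1, rng) == 0 for _ in range(n_playouts))
        rates[move] = wins / n_playouts
    return rates

if __name__ == "__main__":
    print(playout_win_rates(5))
```

In a full MCTS implementation, playout results like these would be propagated back up the tree so that later iterations concentrate on the most promising moves.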

Deep generative models for graphs have shown great promise in the area of drug design, but have so far found little application beyond generating graph-structured molecules. In this work, we demonstrate a proof of concept for the challenging task of road network extraction from image data, introducing the Generative Graph Transformer (GGT): a deep autoregressive model based on state-of-the-art attention mechanisms. In road network extraction, the goal is to learn to reconstruct graphs representing the road networks pictured in satellite images. A PyTorch implementation of GGT is available here. The proposed GGT model is designed for the recurrent generation of graphs, conditioned on other data such as an image, by means of the encoder-decoder architecture outlined in Figure 1.

The implied semantics of direct manipulation is that when a user drags a UI element (in this case, an axis handle), they are signaling to the system that they wish the corresponding data point had been projected to the location where the UI element was dropped, rather than where it was dragged from. In our case the overall projection is a rotation (originally determined by the Grand Tour), and an arbitrary user manipulation might not necessarily generate a new projection that is also a rotation. Our goal, then, is to find a new rotation which satisfies the user request and is close to the previous state of the Grand Tour projection. In a nutshell, when the user drags the $i^{th}$ axis handle by $(dx, dy)$, we add these offsets to the first two entries of the $i^{th}$ row of the Grand Tour matrix, and then perform Gram-Schmidt orthonormalization on the rows of the new matrix. Rows have to be reordered such that the $i^{th}$ row is considered first in the Gram-Schmidt procedure.
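The update described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the article's actual implementation: the function names are hypothetical, the matrix is a list of rows, the rows are assumed to stay linearly independent after the drag, and restoring the original row order afterwards is an assumption made here for convenience.

```python
import math

def gram_schmidt_rows(rows):
    """Orthonormalize rows in the order given (modified Gram-Schmidt)."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            # Remove the component of v along each previously fixed direction.
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))  # assumed non-zero
        basis.append([x / norm for x in v])
    return basis

def drag_axis_handle(matrix, i, dx, dy):
    """Apply a drag of (dx, dy) to row i, then re-orthonormalize with row i first."""
    m = [list(r) for r in matrix]
    m[i][0] += dx
    m[i][1] += dy
    # Process row i first so its direction is preserved exactly (up to scale).
    order = [i] + [k for k in range(len(m)) if k != i]
    ortho = gram_schmidt_rows([m[k] for k in order])
    # Assumption: put the rows back in their original positions.
    result = [None] * len(m)
    for pos, k in enumerate(order):
        result[k] = ortho[pos]
    return result

if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for row in drag_axis_handle(identity, 1, 0.5, 0.2):
        print(row)
```

Because the dragged row is processed first, it is only normalized, never projected against the other rows, so its direction reflects the user's request while the remaining rows adjust to keep the matrix orthonormal.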