Harriet Tubman didn't give many interviews in her lifetime, and when she did, they were generally conducted by one of her friends, Sarah Hopkins Bradford, a White children's book author in Upstate New York, where Tubman spent the last decades of her life. The result of those interviews was two biographies, published in 1869 and 1886. Though Bradford obviously admired Tubman, the books suffer from her sometimes patronizing attitude toward her subject, her use of racial slurs and her awkward attempts to re-create the speech patterns of a Black woman raised enslaved in Maryland. Some of the long "quotes" from Tubman were completely made up, and it shows. So I was curious to see what would happen recently when I had my own "interview" with Tubman -- using the online educator Khan Academy's new artificial intelligence learning tool Khanmigo, which enables users to have live chats with dozens of simulated historical figures like Abigail Adams, Genghis Khan, Montezuma and Winston Churchill. Would the AI Tubman try to reproduce her speech patterns? And if so, would it come off horribly, a 21st-century minstrelsy?
From anime to childhood classics, animations have brought stories to life by combining still images. Now, with just a text prompt, you can generate your own animations using AI. On Thursday, Stability AI, the AI company that created Stable Diffusion, unveiled a text-to-animation tool that allows developers and artists to use Stable Diffusion models to generate animations. The tool, known as the Stable Animation SDK, can generate video from three kinds of input: text alone, text plus an initial image, or text plus an input video. Some users have taken to Twitter to share their animations.
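For concreteness, here is a purely hypothetical sketch of the three input modes described above (text only, text plus an initial image, text plus an input video). None of the names below come from the actual Stable Animation SDK; they are invented solely to show how the three request shapes differ.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnimationRequest:
    """Hypothetical request object; the real SDK's interface differs."""
    prompt: str
    init_image: Optional[str] = None   # path to a starting image
    init_video: Optional[str] = None   # path to a video to restyle

    def mode(self) -> str:
        if self.init_video:
            return "text + input video"
        if self.init_image:
            return "text + initial image"
        return "text only"

for req in (
    AnimationRequest("a paper boat drifting down a rainy street"),
    AnimationRequest("same boat, watercolor style", init_image="boat.png"),
    AnimationRequest("anime restyle of this clip", init_video="clip.mp4"),
):
    print(req.mode(), "->", req.prompt)
```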
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem in which the input to the greedy algorithm is stochastic, with unknown parameters that have to be learned over time. We first propose the greedy regret and ɛ-quasi greedy regret as learning metrics, defined with respect to the performance of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedback, one for each regret metric, which use multi-armed bandit and pure-exploration bandit policies at each level of greedy learning. Both algorithms achieve an O(log T) problem-dependent regret bound (T being the time horizon) for a general class of combinatorial structures and reward functions that admit greedy solutions. We further show that the bound is tight in T and in the other problem instance parameters.
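A minimal sketch (not the paper's algorithms) of the general idea: build the greedy solution level by level, but at each level let a UCB-style bandit pick the next element, so that the unknown item qualities are learned from semi-bandit feedback over time. The toy reward model, horizon and constants below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, k_levels, T = 8, 3, 2000
true_means = rng.uniform(0.2, 0.9, size=n_items)   # unknown item qualities

# One UCB bandit per greedy level: per-item counts and mean estimates.
counts = np.zeros((k_levels, n_items))
means = np.zeros((k_levels, n_items))

def ucb_pick(level, t, chosen):
    idx = np.full(n_items, -np.inf)
    for i in range(n_items):
        if i in chosen:
            continue                      # greedy: no repeats across levels
        if counts[level, i] == 0:
            return i                      # force initial exploration
        bonus = np.sqrt(2 * np.log(t + 1) / counts[level, i])
        idx[i] = means[level, i] + bonus
    return int(np.argmax(idx))

for t in range(T):
    chosen = []
    for level in range(k_levels):
        i = ucb_pick(level, t, chosen)
        chosen.append(i)
        # Semi-bandit feedback: observe a noisy reward for the picked item.
        reward = rng.binomial(1, true_means[i])
        counts[level, i] += 1
        means[level, i] += (reward - means[level, i]) / counts[level, i]

print("final greedy pick:", sorted(chosen))
print("true top items:   ", sorted(np.argsort(-true_means)[:k_levels].tolist()))
```

The paper's algorithms handle far more general combinatorial structures and the greedy/quasi-greedy regret metrics; the sketch only shows the level-wise bandit structure.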
We consider a crowdsourcing model in which n workers are asked to rate the quality of n items previously generated by other workers. An unknown set of αn workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate a set that contains almost all of the high-quality items while admitting at most an ɛ fraction of low-quality items.
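A small simulation of the setup, with made-up parameter values, just to make the quantities concrete: n workers rate n items, an unknown αn of them rate reliably, the rest may lie, and the manager can probe only a handful of items directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 200, 0.3                    # illustrative values, not from the paper

quality = rng.random(n) > 0.5          # hidden true item quality: high / low
reliable = rng.random(n) < alpha       # unknown set of reliable workers

# Ratings matrix: reliable workers report a noisy truth, others rate adversarially.
ratings = np.empty((n, n), dtype=int)
for w in range(n):
    if reliable[w]:
        noise = rng.random(n) < 0.1
        ratings[w] = np.where(noise, ~quality, quality)
    else:
        ratings[w] = ~quality          # worst case: always report the opposite

# Manager's budget: manually inspect a small number of items.
probe = rng.choice(n, size=10, replace=False)
ground_truth = {int(i): bool(quality[i]) for i in probe}

print(f"{reliable.sum()} reliable workers, {quality.sum()} high-quality items,")
print(f"manager knows the truth for items {sorted(probe.tolist())}")
```

Recovering almost all high-quality items with at most an ɛ fraction of low-quality ones, using only those few probes, is the algorithmic problem the paper addresses; the sketch only sets up the instance.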
The paper considers the problem of online learning in two-player zero-sum stochastic games. The main result is the construction of a strategy for player 1 that guarantees the cumulative reward will never fall below the maximin value of the game by more than a certain bound, no matter what strategy the other player follows. The bound is shown to grow sublinearly in the number of rounds T of the game, and polynomially in other problem parameters such as the diameter and the sizes of the state and action spaces. The results imply that the proposed algorithm can be used in self-play to compute near-maximin strategies for both players. The algorithm and the analysis are largely based on the UCRL algorithm of Auer and Ortner (2007) and the analysis thereof.
We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the UCSG algorithm, which achieves sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves on previous ones in the same setting. The regret bound depends on the diameter, an intrinsic quantity related to the mixing properties of SGs. If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of Õ(poly(1/ε)), where ε is the gap to the best policy.
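The game value that regret is measured against can be made concrete with a small, self-contained sketch: for a toy zero-sum stochastic game with known payoffs and transitions, solve the matrix game at each state with an LP and iterate (with discounting here for simplicity; the paper works in the average-reward setting). Everything below is illustrative and is not the UCSG algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Maximin value of the zero-sum matrix game M (row player maximizes)."""
    m, k = M.shape
    # Variables: x_1..x_m (row mixed strategy), v. Maximize v.
    c = np.zeros(m + 1); c[-1] = -1.0
    # Constraint per column j: v - x^T M[:, j] <= 0
    A_ub = np.hstack([-M.T, np.ones((k, 1))])
    b_ub = np.zeros(k)
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0   # probabilities sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

# Toy SG: S states, A learner actions, B adversary actions.
rng = np.random.default_rng(2)
S, A, B, gamma = 4, 2, 3, 0.9
r = rng.random((S, A, B))                       # one-step payoffs
P = rng.dirichlet(np.ones(S), size=(S, A, B))   # transition kernel

V = np.zeros(S)
for _ in range(200):                            # Shapley-style value iteration
    Q = r + gamma * P @ V                       # shape (S, A, B)
    V = np.array([matrix_game_value(Q[s]) for s in range(S)])

print("approximate discounted game values per state:", np.round(V, 3))
```

UCSG additionally has to learn r and P from interaction (via optimistic confidence sets) and handles the average-reward criterion, which is where the diameter dependence enters.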
The paper proposes an efficient second-order online kernel learning algorithm, obtained mainly by combining KONS with the Nyström method.

NOVELTY: The novelty is limited in both the methodological and the theoretical contributions. The achieved results do not have profound implications for the advancement of theory and practice.

WRITING QUALITY: The English writing and organization of this paper are relatively good. The reviewer strongly suggests that the authors place Table 2 in the main paper rather than in the appendix, because the experimental results in Table 2 are core material.
Online kernel learning (OKL) is a flexible framework for prediction problems, since the large approximation space provided by reproducing kernel Hilbert spaces often contains an accurate function for the problem. Nonetheless, optimizing over this space is computationally expensive: not only do first-order methods accumulate O(√T) more loss than the optimal function, but the curse of kernelization also results in O(t) per-step complexity.
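A minimal numpy sketch of where the O(t) per-step cost comes from, and of the kind of fixed-budget Nyström representation that approaches like the one reviewed above build on. The learning rule below is plain functional gradient descent on squared loss with an RBF kernel; the budget size, step size and toy target are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

T, d, eta = 500, 5, 0.2
support, alphas = [], []        # exact OKL: one coefficient per past point

for t in range(T):
    x = rng.normal(size=d)
    y = np.sin(x[0])                            # toy target
    # Prediction costs O(t): the kernel expansion grows every round.
    if support:
        k = rbf(np.array(support), x[None, :]).ravel()
        pred = float(np.dot(alphas, k))
    else:
        pred = 0.0
    # Functional gradient step on squared loss adds a new support point.
    support.append(x)
    alphas.append(eta * (y - pred))

print("support set size after", T, "rounds:", len(support))

# Nystrom-style fix: project onto a fixed budget of m inducing points,
# so per-step cost depends on m rather than on t.
m = 20
Z = np.array(support)[rng.choice(len(support), m, replace=False)]
K_mm = rbf(Z, Z) + 1e-6 * np.eye(m)
L = np.linalg.cholesky(K_mm)
x_test = rng.normal(size=d)
phi = np.linalg.solve(L, rbf(x_test[None, :], Z).ravel())  # explicit m-dim feature map
print("Nystrom feature dimension:", phi.shape[0])
```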
Implicit feedback, such as user clicks, although abundant in online information service systems, does not provide substantial evidence of users' evaluation of the system's output. Without proper modeling, such incomplete supervision inevitably misleads model estimation, especially in a bandit learning setting where the feedback is acquired on the fly. In this work, we perform contextual bandit learning with implicit feedback by modeling the feedback as a composition of user result examination and relevance judgment. Since users' examination behavior is unobserved, we introduce latent variables to model it. We perform Thompson sampling on top of variational Bayesian inference for arm selection and model update. Our upper regret bound analysis of the proposed algorithm proves the feasibility of learning from implicit feedback in a bandit setting, and extensive empirical evaluations on click logs collected from a major MOOC platform further demonstrate its effectiveness in practice.
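A toy sketch of the click model behind this approach: a click happens only when the user both examines the result and finds it relevant, so a non-click is ambiguous. The sketch uses a fixed examination probability and ordinary Thompson sampling with Beta posteriors on the raw click rate; the paper instead keeps examination latent and uses variational Bayesian inference, which this does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(4)

K, T = 5, 5000
relevance = rng.uniform(0.1, 0.8, size=K)   # hidden per-arm relevance
p_exam = 0.6                                # users only sometimes examine the result

# Thompson sampling with Beta posteriors over the observed *click* rates.
a = np.ones(K); b = np.ones(K)
for t in range(T):
    arm = int(np.argmax(rng.beta(a, b)))
    examined = rng.random() < p_exam        # latent: never observed by the learner
    click = examined and (rng.random() < relevance[arm])
    a[arm] += click
    b[arm] += 1 - click

print("true relevance:       ", np.round(relevance, 2))
print("estimated click rates:", np.round(a / (a + b), 2))
print("estimates approach p_exam * relevance, not relevance itself")
```

With a constant examination probability the arm ranking happens to survive the attenuation; when examination depends on position or context it does not, which is why the paper models examination explicitly with latent variables.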
The paper proposes a simple technique for improved feature learning in convolutional neural networks. The technique consists of adding a "negative" virtual class to CNN training on classification tasks with the softmax loss function. The authors evaluate their approach on a range of computer vision datasets (CIFAR10/100, LFW, SLLFW, CUB200, ImageNet32) and find that it outperforms simple baselines on all of them, and outperforms more complicated state-of-the-art techniques on most of them. The authors also present an analysis from a few different standpoints as to why their method is effective.

Strengths:
- The technique proposed by the authors is extremely simple to implement (just a one-line change to existing code would suffice, as far as I can tell).
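One way to picture the "one-line change": append an extra virtual logit to the classifier output before the softmax, so every real class must also beat the virtual one. The specific construction of that logit below (the norm product ||W_y||·||x||, which upper-bounds the true-class logit) is only one possible instantiation and may not match the paper's; treat it as a hedged illustration.

```python
import torch
import torch.nn.functional as F

def virtual_class_loss(features, weight, labels):
    """Softmax cross-entropy with one extra 'virtual' negative class appended.

    features: (N, D) penultimate-layer embeddings
    weight:   (C, D) classifier weights (no bias, for simplicity)
    labels:   (N,) ground-truth class indices
    The virtual logit used here, ||W_y|| * ||x||, is an assumption for
    illustration; beating it forces the feature to align more tightly
    with its class weight vector.
    """
    logits = features @ weight.t()                          # (N, C)
    w_norm = weight.norm(dim=1)[labels]                     # ||W_y|| per sample
    x_norm = features.norm(dim=1)                           # ||x|| per sample
    virtual = (w_norm * x_norm).unsqueeze(1)                # (N, 1) extra logit
    logits_aug = torch.cat([logits, virtual], dim=1)        # (N, C + 1)
    return F.cross_entropy(logits_aug, labels)

# Toy usage with random tensors standing in for a real CNN.
N, D, C = 8, 16, 10
feats = torch.randn(N, D, requires_grad=True)
W = torch.randn(C, D, requires_grad=True)
y = torch.randint(0, C, (N,))
loss = virtual_class_loss(feats, W, y)
loss.backward()
print(float(loss))
```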