- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
UCB-based Algorithms for Multinomial Logistic Regression Bandits
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner aims to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click', `show me later', `never show again', `no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1\geq 2$ possible outcomes (+1 stands for the `not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1\geq 2$ outcomes, say outcome $i$, with a MNL probabilistic model with corresponding unknown parameter $\bar{\boldsymbol{\theta}}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $\rho_i$ and the goal is to maximize the expected revenue. For this problem, we present MNL-UCB, an upper confidence bound (UCB)-based algorithm, that achieves regret $\tilde{\mathcal{O}}(dK\sqrt{T})$ with small dependency on problem-dependent constants that can otherwise be arbitrarily large and lead to loose regret bounds. We present numerical simulations that corroborate our theoretical results.
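The MNL choice probabilities described in the abstract can be sketched in a few lines. The following Python rendering is illustrative only: the function names, the convention that the `no click' outcome earns zero revenue, and the parameter shapes are assumptions, and the UCB exploration bonus of MNL-UCB itself is omitted.

```python
import numpy as np

def mnl_probs(x, thetas):
    # thetas: (K, d) parameters for the K "click-type" outcomes;
    # the (K+1)-th "no click" outcome gets an implicit logit of 0.
    logits = np.append(thetas @ x, 0.0)
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def expected_revenue(x, thetas, rho):
    # rho: (K,) revenues; "no click" (last entry of p) earns nothing.
    p = mnl_probs(x, thetas)
    return float(rho @ p[:-1])
```

A UCB-style learner would then pick the action maximizing this expected revenue plus an optimistic confidence bonus over the unknown parameters.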
Large language models replicate and predict human cooperation across experiments in game theory
Palatsi, Andrea Cera, Martin-Gutierrez, Samuel, Cardenal, Ana S., Pellert, Max
Large language models (LLMs) are increasingly used both to make decisions in domains such as health, education and law, and to simulate human behavior. Yet how closely LLMs mirror actual human decision-making remains poorly understood. This gap is critical: misalignment could produce harmful outcomes in practical applications, while failure to replicate human behavior renders LLMs ineffective for social simulations. Here, we address this gap by developing a digital twin of game-theoretic experiments and introducing a systematic prompting and probing framework for machine-behavioral evaluation. Testing three open-source models (Llama, Mistral and Qwen), we find that Llama reproduces human cooperation patterns with high fidelity, capturing human deviations from rational choice theory, while Qwen aligns closely with Nash equilibrium predictions. Notably, we achieved population-level behavioral replication without persona-based prompting, simplifying the simulation process. Extending beyond the original human-tested games, we generate and preregister testable hypotheses for novel game configurations outside the original parameter grid. Our findings demonstrate that appropriately calibrated LLMs can replicate aggregate human behavioral patterns and enable systematic exploration of unexplored experimental spaces, offering a complementary approach to traditional research in the social and behavioral sciences that generates new empirical predictions about human social decision-making.
- North America > United States > Michigan (0.04)
- North America > Canada (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
UCB-based Algorithms for Multinomial Logistic Regression Bandits
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are logistic bandits that are used in problems with binary rewards: for instance, when the learner aims to maximize the profit over a user that can select one of two possible outcomes (e.g., `click' vs `no-click'). Despite remarkable recent progress and improved algorithms for logistic bandits, existing works do not address practical situations where the number of outcomes that can be selected by the user is larger than two (e.g., `click', `show me later', `never show again', `no click'). In this paper, we study such an extension. We use multinomial logit (MNL) to model the probability of each one of $K+1\geq 2$ possible outcomes (+1 stands for the `not click' outcome): we assume that for a learner's action $\mathbf{x}_t$, the user selects one of $K+1\geq 2$ outcomes, say outcome $i$, with a MNL probabilistic model with corresponding unknown parameter $\bar{\boldsymbol{\theta}}_{\ast i}$. Each outcome $i$ is also associated with a revenue parameter $\rho_i$ and the goal is to maximize the expected revenue.
- Research Report > New Finding (0.40)
- Research Report > Experimental Study (0.40)
Learning thresholds lead to stable language coexistence
Tamm, Mikhail V., Heinsalu, Els, Scialla, Stefano, Patriarca, Marco
We introduce a language competition model that incorporates the effects of memory and learning on the language shift dynamics, using the Abrams-Strogatz model as a starting point. On a coarse-grained time scale, the effects of memory and learning can be expressed as thresholds on the speakers' fractions. In its simplest form, the resulting model is exactly solvable. Besides the consensus on one of the two languages, the model describes additional equilibrium states that are not present in the Abrams-Strogatz model: a stable coexistence of the two languages, if both thresholds are low enough, so that the language shift processes in the two opposite directions compensate each other, and a frozen state coinciding with the initial state, when both thresholds are too high for any language shift to take place. We show numerically that these results are preserved for threshold functions of a more general shape.
- Europe > Estonia > Harju County > Tallinn (0.05)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
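The thresholded language-shift dynamics described in the abstract above can be sketched numerically. The sharp step-function thresholds and all parameter values below are illustrative assumptions layered on a standard Abrams-Strogatz update, not the paper's exact model.

```python
def step(x, dt=0.01, a=1.31, s=0.5, tA=0.2, tB=0.2):
    # x: fraction speaking language A; 1 - x speaks language B.
    # Speakers can shift toward A only if A's fraction exceeds the
    # threshold tA (symmetrically for B) -- a crude stand-in for the
    # memory/learning thresholds on the speakers' fractions.
    to_A = s * x**a if x > tA else 0.0
    to_B = (1 - s) * (1 - x)**a if (1 - x) > tB else 0.0
    return x + dt * ((1 - x) * to_A - x * to_B)
```

With both thresholds too high, neither shift rate is ever active and the system stays frozen at its initial state, matching the abstract's description of the frozen equilibrium.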
DeepMind's Latest AI Trounces Human Players at the Game 'Stratego'
Yet to navigate our unpredictable world, AI needs to learn to make choices with imperfect information--as we do every single day. DeepMind just took a stab at solving this conundrum. The trick was to interweave game theory into deep reinforcement learning, an algorithmic strategy loosely based on the human brain. The result, DeepNash, toppled human experts in a highly strategic board game called Stratego. A notoriously difficult game for AI, Stratego requires multiple strengths of human wit: long-term thinking, bluffing, and strategizing, all without knowing your opponent's pieces on the board.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Chapter 2 Error control
If you perform a study and plan to make a claim based on the statistical test you plan to perform, the long-run probability of making a correct claim or an erroneous claim is determined by three factors, namely the Type 1 error rate, the Type 2 error rate, and the probability that the null hypothesis is true. There are four possible outcomes of a statistical test, depending on whether the result is statistically significant or not, and whether the null hypothesis is true, or not. False Positive (FP): Concluding there is a true effect, when there is no true effect (\(H_0\) is true). This is also referred to as a Type 1 error, and indicated by \(\alpha\). False Negative (FN): Concluding there is no true effect, when there is a true effect (\(H_1\) is true). This is also referred to as a Type 2 error, and indicated by \(\beta\).
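The three factors above combine into long-run frequencies of the four outcomes by weighting the error rates by how often \(H_0\) is true. A minimal sketch (the function name and the equal-priors example are assumptions):

```python
def long_run_rates(alpha, beta, p_h0):
    # alpha: Type 1 error rate, beta: Type 2 error rate,
    # p_h0: long-run probability that the null hypothesis is true.
    fp = p_h0 * alpha               # false positive (Type 1 error)
    tn = p_h0 * (1 - alpha)         # true negative
    fn = (1 - p_h0) * beta          # false negative (Type 2 error)
    tp = (1 - p_h0) * (1 - beta)    # true positive (power)
    return {"FP": fp, "TN": tn, "FN": fn, "TP": tp}
```

For example, with \(\alpha = 0.05\), \(\beta = 0.2\), and \(H_0\) true half the time, false positives occur in 2.5% of all studies in the long run.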
La veille de la cybersécurité
The most valuable part of AI is its ability to take in huge amounts of data, calculate every possible outcome, and make recommendations based on a variety of parameters. With the rise of digitization, we're gathering more and more data that, if used to its full potential, will help businesses counter uncertainty and make business outcomes more predictable. Nowadays, companies face countless challenges -- inflation, supply chain delays, natural disasters, and global pandemics. AI can also offer solutions to lessen these problems without the need for human interference.
A game that stymies AI
Artificial intelligence (AI) has succeeded spectacularly in certain kinds of tasks. These include playing specific games, such as Chess or Go, and finding patterns in images, such as identifying when a human organ is diseased or otherwise abnormal. It has done much less well in situations requiring more generalised learning, such as in understanding a text, or translating it from one language into another. Worse, acting like a human by expressing--and, especially, feeling--appropriate emotions seems well beyond AI's capacity. AI is good at some specialised tasks, but not as good at more general ones.
Non-Determinism and the Lawlessness of Machine Learning Code
Cooper, A. Feder, Frankle, Jonathan, De Sa, Christopher
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes. This distributional viewpoint accounts for randomness by emphasizing the possible outcomes of ML. Importantly, this type of reasoning is not mutually exclusive with current legal reasoning; it complements (and in fact can strengthen) analyses concerning individual, concrete outcomes for specific automated decisions. By illuminating the important role of non-determinism, we demonstrate that ML code falls outside of the cyberlaw frame of treating "code as law," as this frame assumes that code is deterministic. We conclude with a brief discussion of what work ML can do to constrain the potentially harm-inducing effects of non-determinism, and we indicate where the law must do work to bridge the gap between its current individual-outcome focus and the distributional approach that we recommend.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- (8 more...)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Education > Curriculum > Subject-Specific Education (0.41)
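The distributional viewpoint in the abstract above can be illustrated with a toy stochastic pipeline: retraining the same model on the same data under different random seeds yields a distribution of decisions for the same individual, not a single outcome. Everything below is a hypothetical stand-in, not the paper's code.

```python
import random

def train_and_decide(applicant_score, seed):
    # Toy stand-in for a stochastic training pipeline: the learned
    # decision threshold depends on the seed (initialization,
    # mini-batch order, ...), so identical inputs need not yield
    # identical outcomes across training runs.
    rng = random.Random(seed)
    threshold = 0.5 + rng.uniform(-0.1, 0.1)
    return applicant_score >= threshold

# Same applicant, 1000 independently seeded training runs:
decisions = [train_and_decide(0.55, seed) for seed in range(1000)]
approval_rate = sum(decisions) / len(decisions)
```

Reasoning about `approval_rate` (the distribution over runs) rather than any single run's verdict is the kind of distributional analysis the abstract advocates.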