baker
MultiZebraLogic: A Multilingual Logical Reasoning Benchmark
Bruun, Sofie Helene, Smart, Dan Saattrup
Measuring the full abilities of large language models (LLMs) requires benchmarks representing multiple tasks. We aim to create large, high-quality datasets for comparison of logical reasoning skills across several languages and of suitable difficulty for LLMs of various reasoning ability. We explore multiple ways of increasing difficulty. We generate zebra puzzles in multiple languages, themes, and sizes, including 14 different clue types and 8 red herring types (uninformative clues). We find puzzle sizes 2x3 and 4x5 are sufficiently challenging for GPT-4o mini (a non-reasoning model) and o3-mini (a reasoning model), respectively. Including 5 red herrings decreases o3-mini puzzle-level accuracy on 4x5 puzzles by 15$\pm$7 %. Scores of o3-mini on 4x5 puzzles are not significantly affected by use of English vs. Danish or the common houses theme vs. the country-specific smoerrebroed theme. We find no correlation between difficulty and the selected clue types. Datasets of 128+1024 puzzles are published as MultiZebraLogic in each of nine Germanic languages for sizes 2x3 and 4x5. We publish code for puzzle generation, designed for adaptability into more languages and themes.
- North America > Canada (0.04)
- Europe > Faroe Islands > Streymoy > Tórshavn (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Jeopardy!'s Most Infamous Moment Haunted the Show's Fans, Its Stars, and Even Alex Trebek. It's Clear Why Now.
Jeopardy!'s most controversial moment was years in the making. It took many more for the fallout to come into full view. One morning in 2010, Alex Trebek walked onto the IBM campus not far outside New York City and prepared to inspect what would become the most unusual player in Jeopardy!'s history. The trip, clear across the country from the show's Culver City set, had been carefully planned. David Ferrucci, a computer scientist at IBM, had spent years leading a team to develop what would become the first and, so far, last nonhuman ever to compete on Jeopardy! Longtime host Trebek would watch three practice games played with "Watson," as the system was named, and two human contestants. Then the team would be taken to lunch nearby, and Trebek would ultimately take the stage and host two more Watson practice games himself. By then the preparations for a future televised contest with IBM's creation were well underway, but this was the first time Trebek would encounter the technology in person, and his approval was crucial. Ferrucci was eager to show off one element in particular: the display, which had been rigged to show Watson's top three guesses whenever it answered, along with the numerical confidence rate it had in each one. For Ferrucci, this feature was central to demonstrating the computer's language-processing capabilities, because it showed that Watson wasn't just spitting out answers--it was reasoning. If Watson were ever going to be deployed to industries like health care, its human users wouldn't just want to know its best guess. It would be infinitely more valuable to know if Watson was 95 percent confident or just 30 percent, and whether those confidence levels were in line with its actual accuracy rate. It also made for better viewing. Ferrucci had brought his young daughter to the lab earlier in the process and showed her Watson as it played against human opponents. When Watson declined to ring in, Ferrucci's daughter turned to him and asked if the computer had crashed.
He struggled to explain that it hadn't--it just wasn't confident enough to hazard a guess.
- North America > United States > California > Los Angeles County > Culver City (0.24)
- North America > United States > New York > Westchester County (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
A Compression Based Classification Framework Using Symbolic Dynamics of Chaotic Maps
Naik, Parth, B, Harikrishnan N
We propose a novel classification framework grounded in symbolic dynamics and data compression using chaotic maps. The core idea is to model each class by generating symbolic sequences from thresholded real-valued training data, which are then evolved through a one-dimensional chaotic map. For each class, we compute the transition probabilities of symbolic patterns (e.g., `00', `01', `10', and `11' for the second return map) and aggregate these statistics to form a class-specific probabilistic model. During the testing phase, the test data are thresholded and symbolized, and then encoded using the class-wise symbolic statistics via back iteration, a dynamical reconstruction technique. The predicted label corresponds to the class yielding the shortest compressed representation, signifying the most efficient symbolic encoding under its respective chaotic model. This approach fuses concepts from dynamical systems, symbolic representations, and compression-based learning. We evaluate the proposed method, \emph{ChaosComp}, on both synthetic and real-world datasets, demonstrating competitive performance compared to traditional machine learning algorithms (e.g., macro F1-scores for the proposed method on Breast Cancer Wisconsin = 0.9531, Seeds = 0.9475, Iris = 0.8469 etc.). Rather than aiming for state-of-the-art performance, the goal of this research is to reinterpret the classification problem through the lens of dynamical systems and compression, which are foundational perspectives in learning theory and information processing.
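The "shortest compressed representation wins" decision rule described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not ChaosComp: it replaces back iteration on a chaotic map with the ideal code length, -log2 p, under a per-class first-order model of symbol transitions. The function names, the binary alphabet, the add-one smoothing, and the toy training sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def symbolize(values, threshold):
    """Threshold real values into a binary symbol sequence (assumed scheme)."""
    return [1 if v >= threshold else 0 for v in values]


def fit_transitions(sequences):
    """Estimate P(next symbol | current symbol) with add-one smoothing."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    probs = {}
    for a in (0, 1):
        total = counts[(a, 0)] + counts[(a, 1)] + 2  # +2 from smoothing
        for b in (0, 1):
            probs[(a, b)] = (counts[(a, b)] + 1) / total
    return probs


def code_length(seq, probs):
    """Ideal code length in bits of seq under the given transition model."""
    return sum(-math.log2(probs[(a, b)]) for a, b in zip(seq, seq[1:]))


def predict(seq, models):
    """Return the label whose model encodes seq in the fewest bits."""
    return min(models, key=lambda label: code_length(seq, models[label]))


# Hypothetical toy classes: one alternating, one constant.
models = {
    "alternating": fit_transitions([[0, 1, 0, 1, 0, 1]]),
    "constant": fit_transitions([[1, 1, 1, 1, 1, 1]]),
}
test_seq = symbolize([0.1, 0.9, 0.2, 0.8], threshold=0.5)
label = predict(test_seq, models)
```

Here `label` is the class under which the symbolized test sequence is cheapest to encode; swapping in a different coder only changes `code_length`.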
- North America > United States > Wisconsin (0.24)
- Asia > India > Karnataka > Bengaluru (0.04)
- Asia > India > Goa (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Elon Musk makes bold play for an unlikely marriage with $3 trillion icon
Elon Musk has been openly hinting at a historic merger in the business world, suggesting that his company xAI should partner with tech giant Apple. Musk's company is the corporate face of his popular AI chatbot Grok, which functions similarly to competitors like ChatGPT, Claude, Gemini, and Copilot. Meanwhile, Apple has struggled to bring its own AI programs to consumers, notably delaying improvements to the Siri voice assistant until 2026. Venture capitalists started openly speculating this month that Musk and Apple make the perfect power couple in the AI world, with xAI bringing Grok to even more people using iPhones through this proposed partnership. On the All-In Podcast, investor Gavin Baker called xAI's Grok4 'the best product' in terms of AI chatbots right now, but added that 'the best product doesn't always win in technology.' 'I think there is solid industrial logic for a partnership.
Wimbledon chiefs defend AI use as Jack Draper says line calls not '100% accurate'
Wimbledon bosses have defended the use of AI line judges after Jack Draper said the technology was not "100% accurate". The British No 1 said it was "a shame" human line judges were ousted after crashing out in the second round to the 36-year-old former finalist Marin Cilic. Draper, 23, grew frustrated with the AI-enhanced Hawk-Eye technology during Thursday's match, holding his arms out in disbelief after one of his opponent's serves was not called out in the fourth set. "I don't think it's 100% accurate in all honesty," he said in his post-match press conference. "A couple of the ones today, it showed a mark on the court. There's no way the chalk would have showed that. I guess it cannot be 100% accurate – it's millimetres."
The Bakers and Millers Game with Restricted Locations
Krogmann, Simon, Lenzner, Pascal, Skopalik, Alexander
We study strategic location choice by customers and sellers, termed the Bakers and Millers Game in the literature. In our generalized setting, each miller can freely choose any location for setting up a mill, while each baker is restricted in the choice of location for setting up a bakery. For optimal bargaining power, a baker would like to select a location with many millers to buy flour from and with little competition from other bakers. Likewise, a miller aims for a location with many bakers and few competing millers. Thus, both types of agents choose locations to optimize the ratio of agents of opposite type divided by agents of the same type at their chosen location. Originally raised in the context of Fractional Hedonic Games, the Bakers and Millers Game has applications that range from commerce to product design. We study the impact of location restrictions on the properties of the game. While pure Nash equilibria trivially exist in the setting without location restrictions, we show via a sophisticated, efficient algorithm that even the more challenging restricted setting admits equilibria. Moreover, the computed equilibrium approximates the optimal social welfare by a factor of at most $2\left(\frac{e}{e-1}\right)$. Furthermore, we give tight bounds on the price of anarchy/stability. On the conceptual side, the location choice feature adds a new layer to the standard setting of Hedonic Games, in the sense that agents that select the same location form a coalition. This allows us to naturally restrict the possible coalitions that can be formed. With this, our model generalizes simple symmetric Fractional Hedonic Games on complete bipartite valuation graphs and also Hedonic Diversity Games with utilities single-peaked at 0. We believe that this generalization is also a very interesting direction for other types of Hedonic Games.
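The utility described in the abstract, the number of agents of the opposite type divided by the number of agents of the same type at the chosen location, can be computed directly. The sketch below is only an illustration of that ratio: the function name and the example location profile are hypothetical, and it assumes the same-type count includes the agent itself, so the denominator is never zero.

```python
from collections import Counter


def utilities(baker_locs, miller_locs):
    """Utility of each agent: opposite-type count / same-type count at its location.

    Assumption: the same-type count includes the agent itself.
    """
    bakers = Counter(baker_locs)
    millers = Counter(miller_locs)
    baker_u = [millers[loc] / bakers[loc] for loc in baker_locs]
    miller_u = [bakers[loc] / millers[loc] for loc in miller_locs]
    return baker_u, miller_u


# Hypothetical profile: three bakers and three millers on locations "A" and "B".
baker_u, miller_u = utilities(["A", "A", "B"], ["A", "B", "B"])
# baker_u  -> [0.5, 0.5, 2.0]
# miller_u -> [2.0, 0.5, 0.5]
```

In this profile the lone baker at "B" faces two millers and no competing bakers, so it has the highest baker utility, matching the bargaining-power intuition in the abstract.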
- Europe > Monaco (0.05)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- Europe > Netherlands (0.04)
The 10 Best Books of 2024
Slate has relationships with various online retailers. If you buy something through our links, Slate may earn an affiliate commission. We update links when possible, but note that deals can expire and all prices are subject to change. All prices were up to date at the time of publication. It shouldn't be surprising that my list of my favorite books of 2024 includes a number of works by older writers working out what it means to make art as a career--what their creative future might look like.
- North America > United States > New York (0.06)
- Europe > United Kingdom > England (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- (2 more...)
Evaluating the Impact of Data Augmentation on Predictive Model Performance
Švábenský, Valdemar, Borchers, Conrad, Cloude, Elizabeth B., Shimada, Atsushi
In supervised machine learning (SML) research, large training datasets are essential for valid results. However, obtaining primary data in learning analytics (LA) is challenging. Data augmentation can address this by expanding and diversifying data, though its use in LA remains underexplored. This paper systematically compares data augmentation techniques and their impact on prediction performance in a typical LA task: prediction of academic outcomes. Augmentation is demonstrated on four SML models, which we successfully replicated from a previous LAK study based on AUC values. Among 21 augmentation techniques, SMOTE-ENN sampling performed the best, improving the average AUC by 0.01 and approximately halving the training time compared to the baseline models. In addition, we compared 99 combinations of chaining 21 techniques, and found minor, although statistically significant, improvements across models when adding noise to SMOTE-ENN (+0.014). Notably, some augmentation techniques significantly lowered predictive performance or increased performance fluctuation related to random chance. This paper's contribution is twofold. First, our empirical findings show that sampling techniques provide the most statistically reliable performance improvements for LA applications of SML, and are computationally more efficient than deep generation methods with complex hyperparameter settings. Second, the LA community may benefit from validating a recent study through independent replication.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (0.68)
- Education > Educational Technology > Educational Software (0.46)
- Education > Educational Setting > Higher Education (0.46)
Set up your smart speaker for emergencies
Sarah Ferman Baker of Texas recently participated in Amazon Prime Day without her husband's knowledge. Watch the pure panic on husband Jamie Baker's face when he counts 17 delivery boxes on his front porch! Smart speakers tell you the weather, play music, answer trivia questions, help you prank your spouse (more on that at the end), and they just might save your life one day. Make sure you know these commands to get help in an emergency by heart. Let's start with the most popular They won't reliably report your location and don't offer a callback number, so they don't meet the standard requirements.
- North America > United States > Texas (0.26)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.06)
A data bottleneck is holding AI science back, says new Nobel winner
AI has been a gamechanger for biochemists like Baker. Seeing what DeepMind was able to do with AlphaFold made it clear that deep learning was going to be a powerful tool for their work. "There's just all these problems that were really hard before that we are now having much more success with thanks to generative AI methods. We can do much more complicated things," Baker says. Baker is already busy at work.