

Why do cats lick you? An expert explains.

Popular Science

Why do cats lick you? Grooming is only one way cats say, "I love you." Some cats shower their favorite humans with sandpaper kisses. If you've ever been around a cat, you know they can get the sudden urge to groom themselves at just about any moment. One minute everything seems lovely and content; the next, they lose all interest in you and busy themselves licking their butt. Other cats can't be bothered and won't ever groom or lick their human friends, or other kitty friends for that matter. So, why do some cats lick their owners? Are they trying to clean you, too? We asked an animal behaviorist and cat expert to help us sort out exactly what is going on when your cat licks you.

For a mother cat, grooming is an important part of child rearing. When a mama cat licks her kittens, it serves two important purposes: keeping her kittens clean and promoting social bonds, Kristyn Vitale, an animal behaviorist at Maueyes Cat Science and Education, tells Popular Science. On the one hand, "mother cats are going to groom their kittens to help keep them clean and healthy," says Vitale. Kittens can be especially susceptible to disease, and "anybody who's raised young kittens knows how dirty they can get, and a mother cat is not going to obviously bathe their kitten in a tub."


Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models

Ghosal, Soumya Suvra, Chakraborty, Souradip, Reddy, Avinash, Lu, Yifu, Wang, Mengdi, Manocha, Dinesh, Huang, Furong, Ghavamzadeh, Mohammad, Bedi, Amrit Singh

arXiv.org Artificial Intelligence

Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek R1) have led to a popular belief that extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? To answer this question, we perform a detailed empirical study across models and benchmarks, which reveals a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking". To understand this non-monotonic trend, we consider a simple probabilistic model, which reveals that additional thinking increases output variance, creating an illusion of improved reasoning while ultimately undermining precision. Thus, observed gains from "more thinking" are not true indicators of improved reasoning, but artifacts stemming from the connection between model uncertainty and the evaluation metric. This suggests that test-time scaling through extended thinking is not an effective way to utilize the inference thinking budget. Recognizing these limitations, we introduce an alternative test-time scaling approach, parallel thinking, inspired by Best-of-N sampling. Our method generates multiple independent reasoning paths within the same inference budget and selects the most consistent response via majority vote, achieving up to 20% higher accuracy compared to extended thinking. This provides a simple yet effective mechanism for test-time scaling of reasoning models.
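The parallel-thinking procedure the abstract describes (sample several independent reasoning paths within one inference budget, then majority-vote the final answers) can be sketched as below. This is a minimal illustration, not the authors' implementation; `generate` is a hypothetical stand-in for one sampled model response.

```python
from collections import Counter

def parallel_thinking(generate, prompt, n=8):
    """Sample n independent reasoning paths and return the most
    common final answer (majority vote over extracted answers).
    `generate` is a hypothetical callable that runs one sampled
    reasoning path and returns its final answer as a string."""
    answers = [generate(prompt) for _ in range(n)]
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer
```

Because the n paths are independent samples rather than one ever-longer trace, the vote averages out the extra output variance that the paper identifies as the cause of "overthinking".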


AI has created a new breed of cat video: addictive, disturbing and nauseatingly quick soap operas

The Guardian

At the (tail) end of 2024, Billie Eilish sat cross-legged on stage and began to miaow. Her fans erupted in harmony, each belting out an off-key miaow of their own. This is because Eilish's Oscar-winning track What Was I Made For? – a lachrymose Barbie cut lamenting adulthood's entailing ennui – has become the default soundtrack for a new breed of cat video. You may recognise it: the song often plays over the top of these AI-generated fantasias featuring a cartoonishly fat cat or an equally buff feline with a suspiciously veiny human body. The cat cheats on her lover, falls pregnant or seeks revenge in a weirdly condensed soap opera.


MATHWELL: Generating Age-Appropriate Educational Math Word Problems

Christ, Bryan R, Kropko, Jonathan, Hartvigsen, Thomas

arXiv.org Artificial Intelligence

Math word problems are critical K-8 educational tools, but writing them is time-consuming and requires domain expertise. We suggest that language models can support K-8 math education by automatically generating problems. To be educational, generated problems must be 1) solvable, 2) accurate, and 3) appropriate. Existing datasets are unlabeled for these criteria, making them ill-suited for training problem generators. To address this gap, we use domain expert annotation to curate a high-quality synthetic training dataset for this task. We show the value of this data by using it to iteratively finetune Llama-2 (70B) to create MATHWELL, a K-8 word problem generator. Domain experts find MATHWELL has a 40% higher share of problems that have executable solutions and meet all criteria than existing open-source models, with 74% of its problems with executable solutions being solvable, accurate, and appropriate. MATHWELL achieves 94.9% of GPT-4 Turbo's performance on this task while outputting problems written at a more appropriate reading level for K-8 students. That MATHWELL reaches this performance despite being trained only by finetuning highlights the quality of our synthetic data for training age-appropriate word problem generators. We release our model, data, and annotations.
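The abstract's "executable solutions" criterion could be checked mechanically along these lines. This is an illustrative sketch, not MATHWELL's actual pipeline: it assumes each generated problem ships with a Python solution defining a `solution()` function, and counts it executable if that function runs and returns a number.

```python
def solution_is_executable(code: str) -> bool:
    """Return True if `code` defines a solution() function that
    runs without raising and returns a numeric answer. This is a
    hypothetical stand-in for a generated-problem quality filter."""
    namespace = {}
    try:
        exec(code, namespace)          # define solution() in a scratch namespace
        result = namespace["solution"]()
    except Exception:
        return False                   # syntax error, runtime error, or missing solution()
    return isinstance(result, (int, float))
```

A filter like this catches only execution failures; the solvability, accuracy, and appropriateness criteria still require the expert annotation the paper describes.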


Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Dahary, Omer, Patashnik, Or, Aberman, Kfir, Cohen-Or, Daniel

arXiv.org Artificial Intelligence

Text-to-image diffusion models have an unprecedented ability to generate diverse and high-quality images. However, they often struggle to faithfully capture the intended semantics of complex input prompts that include multiple subjects. Recently, numerous layout-to-image extensions have been introduced to improve user control, aiming to localize subjects represented by specific tokens. Yet, these methods often produce semantically inaccurate images, especially when dealing with multiple semantically or visually similar subjects. In this work, we study and analyze the causes of these limitations. Our exploration reveals that the primary issue stems from inadvertent semantic leakage between subjects in the denoising process. This leakage is attributed to the diffusion model's attention layers, which tend to blend the visual features of different subjects. To address these issues, we introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process. Bounded Attention prevents detrimental leakage among subjects and enables guiding the generation to promote each subject's individuality, even with complex multi-subject conditioning. Through extensive experimentation, we demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
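The core idea (bounding attention so one subject's tokens cannot blend with another's) can be illustrated with a simple attention mask. This sketch is in the spirit of the abstract, not the paper's implementation: queries inside one subject's token span are blocked from attending to keys inside any other subject's span, while background tokens stay unrestricted.

```python
import numpy as np

def bounded_attention_mask(n_tokens, subject_spans):
    """Build a boolean attention mask (True = attention allowed).
    `subject_spans` is a list of (start, end) half-open token ranges,
    one per subject. Cross-subject query/key pairs are masked out,
    which blocks the feature leakage between subjects that the paper
    identifies in the attention layers."""
    mask = np.ones((n_tokens, n_tokens), dtype=bool)
    for i, (qs, qe) in enumerate(subject_spans):
        for j, (ks, ke) in enumerate(subject_spans):
            if i != j:
                mask[qs:qe, ks:ke] = False  # block subject i -> subject j flow
    return mask
```

In a real diffusion model the mask would be applied to cross- and self-attention logits at each denoising step; here it only shows which information flow a bounded scheme forbids.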


A Comparative Investigation of Compositional Syntax and Semantics in DALL-E 2

Murphy, Elliot, de Villiers, Jill, Morales, Sofia Lucero

arXiv.org Artificial Intelligence

In this study we compared how well DALL-E 2 visually represented the meaning of linguistic prompts also given to young children in comprehension tests. Sentences representing fundamental components of grammatical knowledge were selected from assessment tests used with several hundred English-speaking children aged 2-7 years for whom we had collected original item-level data. DALL-E 2 was given these prompts five times to generate 20 cartoons per item, for 9 adult judges to score. Results revealed no conditions in which DALL-E 2 generated images that matched the semantic accuracy of children, even at the youngest age (2 years). DALL-E 2 failed to assign the appropriate roles in reversible forms; it failed on negation despite an easier contrastive prompt than the children received; it often assigned the adjective to the wrong noun; it ignored implicit agents in passives. This work points to a clear absence of compositional sentence representations for DALL-E 2.


Silico-centric Theory of Mind

Mukherjee, Anirban, Chang, Hannah Hanwen

arXiv.org Artificial Intelligence

Theory of Mind (ToM) refers to the ability to attribute mental states, such as beliefs, desires, intentions, and knowledge, to oneself and others, and to understand that these mental states can differ from one's own and from reality. We investigate ToM in environments with multiple, distinct, independent AI agents, each possessing unique internal states, information, and objectives. Inspired by human false-belief experiments, we present an AI ('focal AI') with a scenario where its clone undergoes a human-centric ToM assessment. We prompt the focal AI to assess whether its clone would benefit from additional instructions. Concurrently, we give its clones the ToM assessment, both with and without the instructions, thereby engaging the focal AI in higher-order counterfactual reasoning akin to human mentalizing--with respect to humans in one test and to other AI in another. We uncover a discrepancy: Contemporary AI demonstrates near-perfect accuracy on human-centric ToM assessments. Since information embedded in one AI is identically embedded in its clone, additional instructions are redundant. Yet, we observe AI crafting elaborate instructions for their clones, erroneously anticipating a need for assistance. An independent referee AI agrees with these unsupported expectations. Neither the focal AI nor the referee demonstrates ToM in our 'silico-centric' test.


Corgis and cats with crossbows: Party Animals wants to be your new Saturday night video game

The Guardian

I've said it before and I'll say it again: there's nothing quite like a kitten wielding a crossbow. Party Animals gives you an array of adorable pets to dress up and throw into an arena, where they slapstick-spar against each other with bats, shovels, nunchucks and more. Developer Recreate Games looked to the much-loved jelly-baby beat-'em-up Gang Beasts for inspiration for this upcoming party brawler. Why not have your otter grab a crossbow to take your family out? "Picture this: you transform into an adorable corgi, your best buddy turns into a goofy dinosaur, and your girlfriend becomes a cute kitten. You're all brawling on a submarine, in a bar, in the snow; in all sorts of extraordinary places," explains the head of Recreate Games, known only as PM.


CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?

Wei, Tianwen, Luan, Jian, Liu, Wei, Dong, Shuang, Wang, Bin

arXiv.org Artificial Intelligence

We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\geq$ 60\%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 is able to maintain robustness, while the other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.


Zero-shot Learning, Explained - KDnuggets

#artificialintelligence

One reason machine learning models in general are becoming smarter is their reliance on labeled data to help them discern between two similar objects. Without these labeled datasets, however, you will encounter major obstacles when creating an effective and trustworthy machine learning model. Deep learning has been widely used to solve tasks such as computer vision using supervised learning. However, as with many things in life, it comes with restrictions. Supervised classification requires a large quantity of high-quality labeled training data in order to produce a robust model.
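One common zero-shot setup, which sidesteps the labeled-data requirement described above, is CLIP-style classification: compare an image embedding against text embeddings of candidate class names and pick the most similar. This is a minimal sketch under that assumption (the article may cover other variants); `embed_text` is a hypothetical text encoder.

```python
import numpy as np

def zero_shot_classify(image_emb, class_names, embed_text):
    """Zero-shot classification sketch: no labeled examples of any
    class are needed. Each candidate class name is turned into a text
    embedding, and the class whose embedding has the highest cosine
    similarity to the image embedding wins."""
    sims = []
    for name in class_names:
        t = embed_text(f"a photo of a {name}")
        sims.append(np.dot(image_emb, t) /
                    (np.linalg.norm(image_emb) * np.linalg.norm(t)))
    return class_names[int(np.argmax(sims))]
```

The point of the sketch: the "labels" live entirely in natural-language class names, so new classes can be added at inference time without collecting a single labeled training example.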