The toddler who survived a 54-degree body temperature
Humans aren't built for the cold, but in some amazing cases they have survived frigid temperatures. Winter is not for the faint of heart. In New York City, skyscrapers turn Manhattan into a series of freezing wind tunnels. Sapporo, Japan, receives almost 200 inches of snowfall each winter. Even so, humans have developed plenty of clever ways to wait out the cold. But what would happen if, instead of bundling up inside with a hot chocolate, you were left out in the frigid air? Just how cold can a human body get and still recover?
- North America > United States > New York (0.25)
- Asia > Japan > Hokkaidō > Hokkaidō Prefecture > Sapporo (0.24)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area (0.73)
- Health & Medicine > Diagnostic Medicine > Vital Signs (0.45)
- Information Technology > Communications > Mobile (0.42)
- Information Technology > Artificial Intelligence (0.35)
Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy
Basil, Savir, Shapiro, Ina, Shapiro, Dan, Mollick, Ethan, Mollick, Lilach, Meincke, Lennart
This is the fourth in a series of short reports that help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. Here, we ask whether assigning personas to models improves performance on difficult objective multiple-choice questions. We study both domain-specific expert personas and low-knowledge personas, evaluating six models on GPQA Diamond (Rein et al. 2024) and MMLU-Pro (Wang et al. 2024), graduate-level questions spanning science, engineering, and law. We tested three approaches:
- In-Domain Experts: Assigning the model an expert persona ("you are a physics expert") matched to the problem type (physics problems) had no significant impact on performance (with the exception of the Gemini 2.0 Flash model).
- Off-Domain Experts (Domain-Mismatched): Assigning the model an expert persona ("you are a physics expert") mismatched to the problem type (law problems) resulted in only marginal differences.
- Low-Knowledge Personas: Assigning the model negative-capability personas (layperson, young child, toddler) was generally harmful to benchmark accuracy.
Across both benchmarks, persona prompts generally did not improve accuracy relative to a no-persona baseline, and expert personas showed no consistent benefit across models, with few exceptions.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.82)
- Health & Medicine (0.66)
- Education (0.48)
- North America > United States > Virginia (0.04)
- North America > United States > Indiana (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
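The persona manipulation the report describes amounts to prepending a persona instruction to an otherwise unchanged question. A minimal sketch of how such experimental conditions might be constructed (the persona strings and question format here are illustrative assumptions, not the report's exact prompts):

```python
# Sketch: building matched, mismatched, low-knowledge, and baseline
# persona prompts for a multiple-choice question. Wording is illustrative.

PERSONAS = {
    "in_domain": "You are a world-class physics expert.",
    "off_domain": "You are a world-class law expert.",
    "low_knowledge": "You are a toddler with no schooling.",
    "baseline": None,  # no persona: the question is sent as-is
}

def build_prompt(question: str, choices: list[str], condition: str) -> str:
    """Return the full prompt for one experimental condition."""
    persona = PERSONAS[condition]
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(choices))
    body = f"{question}\n{options}\nAnswer with a single letter."
    return body if persona is None else f"{persona}\n\n{body}"

prompt = build_prompt(
    "What is the speed of light in vacuum?",
    ["3e8 m/s", "3e6 m/s", "3e10 m/s"],
    "in_domain",
)
```

Holding the question text fixed across conditions is what lets any accuracy difference be attributed to the persona alone.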
Why Are Kids So Funny?
My daughter, Alice, is almost two, and quite funny. Although she can say short sentences--"I need cake!"--her humor isn't particularly verbal. Instead, she giggles while stumbling around in grownup shoes, or blows bubbles in her water when she should be drinking it. She likes to put on a hat, pull it down over her eyes, and then blunder around, arms outstretched, like a mummy. She's also discovered the humor of exaggeration: recently, when her brother resisted getting out of his pajamas in the morning, she sidled up, grabbed his shirt, hauled on it with both hands, and laughed while yelling, "Ooooouuuut!"
SAGE-Eval: Evaluating LLMs for Systematic Generalizations of Safety Facts
Yueh-Han, Chen, Davidson, Guy, Lake, Brenden M.
Do LLMs robustly generalize critical safety facts to novel situations? Lacking this ability is dangerous when users ask naive questions. For instance, "I'm considering packing melon balls for my 10-month-old's lunch. What other foods would be good to include?" Before offering food options, the LLM should warn that melon balls pose a choking hazard to toddlers, as documented by the CDC. Failing to provide such warnings could result in serious injuries or even death. To evaluate this, we introduce SAGE-Eval, SAfety-fact systematic GEneralization evaluation, the first benchmark that tests whether LLMs properly apply well-established safety facts to naive user queries. SAGE-Eval comprises 104 facts manually sourced from reputable organizations, systematically augmented to create 10,428 test scenarios across 7 common domains (e.g., Outdoor Activities, Medicine). We find that the top model, Claude-3.7-sonnet, passes only 58% of all the safety facts tested. We also observe that model capabilities and training compute correlate only weakly with performance on SAGE-Eval, implying that scaling up alone will not solve the problem. Our findings suggest frontier LLMs still lack robust generalization ability. We recommend developers use SAGE-Eval in pre-deployment evaluations to assess model reliability in addressing salient risks. We publicly release SAGE-Eval at https://huggingface.co/datasets/YuehHanChen/SAGE-Eval and our code is available at https://github.com/YuehHanChen/SAGE-Eval/tree/main.
- North America > United States (1.00)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- (2 more...)
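The failure mode SAGE-Eval probes can be sketched as a simple check: given a safety fact and a model's reply to a naive query, did the reply surface the required warning? The fact structure and keyword-matching rule below are illustrative assumptions; the benchmark's actual grading is more involved.

```python
# Sketch: scoring whether model responses surface a required safety
# warning across augmented scenarios. Structure is illustrative only.

def mentions_hazard(response: str, hazard_terms: list[str]) -> bool:
    """Pass if the response mentions any of the hazard's key terms."""
    text = response.lower()
    return any(term.lower() in text for term in hazard_terms)

def score_scenarios(responses: list[str], hazard_terms: list[str]) -> float:
    """Fraction of scenarios in which the warning appeared."""
    if not responses:
        return 0.0
    hits = sum(mentions_hazard(r, hazard_terms) for r in responses)
    return hits / len(responses)

# Example: the melon-ball choking hazard from the abstract.
terms = ["choking hazard", "choke"]
replies = [
    "Melon balls are a choking hazard for toddlers; cut them small.",
    "Great idea! Add some crackers and cheese cubes.",
]
rate = score_scenarios(replies, terms)  # one of two replies warned
```

Aggregating the per-scenario rate across all augmentations of a fact is what distinguishes systematic generalization from a model that warns only on the exact phrasing it saw in training.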
Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Liu, Michael Xieyang, Petridis, Savvas, Tsai, Vivian, Fiannaca, Alexander J., Olwal, Alex, Terry, Michael, Cai, Carrie J.
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically-generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a user study, participants reported significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we discuss how unique characteristics of MLLMs--such as hallucinations and inconsistent responses--can impact the sensor-creation process. These findings contribute to the design of future intelligent sensing systems that are intuitive and customizable by everyday users.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York > New York County > New York City (0.06)
- Europe > Italy > Sardinia > Cagliari (0.05)
- (13 more...)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.66)
- Health & Medicine > Consumer Health (0.68)
- Materials (0.67)
- Information Technology > Security & Privacy (0.45)
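Gensors' central design idea, a sensor decomposed into independently testable yes/no criteria, can be sketched as follows. The `ask_mllm` stub stands in for a real multimodal model call; the names and structure are assumptions for illustration, not the paper's API.

```python
# Sketch: a personalized visual sensor as a set of independently
# evaluable criteria, mirroring Gensors' debugging-by-isolation idea.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Sensor:
    name: str
    criteria: list[str] = field(default_factory=list)

    def evaluate(self, frame, ask_mllm: Callable) -> dict:
        """Test each criterion in isolation so failures are attributable."""
        return {c: ask_mllm(c, frame) for c in self.criteria}

    def fires(self, frame, ask_mllm) -> bool:
        """The sensor alerts only if every criterion holds."""
        return all(self.evaluate(frame, ask_mllm).values())

# Stub MLLM: pretend the frame is just a set of visible facts.
def stub_mllm(criterion: str, frame: set) -> bool:
    return criterion in frame

sensor = Sensor("toddler mischief", ["child visible", "climbing furniture"])
frame = {"child visible"}
# Per-criterion results let the user see *which* criterion failed.
results = sensor.evaluate(frame, stub_mllm)
```

Returning per-criterion verdicts rather than a single boolean is what supports the debugging workflow the study describes: a user can isolate and re-test one criterion without re-prompting the whole sensor.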
Active Gaze Behavior Boosts Self-Supervised Object Learning
Yu, Zhengyang, Aubret, Arthur, Raabe, Marcel C., Yang, Jane, Yu, Chen, Triesch, Jochen
Due to significant variations in the projection of the same object from different viewpoints, machine learning algorithms struggle to recognize the same object across various perspectives. In contrast, toddlers quickly learn to recognize objects from different viewpoints with almost no supervision. Recent works argue that toddlers develop this ability by mapping close-in-time visual inputs to similar representations while interacting with objects. High-acuity vision is only available in the central visual field, which may explain why toddlers (much like adults) constantly move their gaze around during such interactions. It is unclear whether, and to what extent, toddlers curate their visual experience through these eye movements to support learning object representations. In this work, we explore whether a bio-inspired visual learning model can harness toddlers' gaze behavior during a play session to develop view-invariant object recognition. Exploiting head-mounted eye tracking during dyadic play, we simulate toddlers' central visual field experience by cropping image regions centered on the gaze location. This visual stream feeds a time-based self-supervised learning algorithm. Our experiments demonstrate that toddlers' gaze strategy supports the learning of invariant object representations. Our analysis also reveals that the limited size of the high-acuity central visual field is crucial for this. We further find that toddlers' visual experience elicits more robust representations than adults', largely because toddlers look at objects they hold themselves for longer bouts. Overall, our work reveals how toddlers' gaze behavior supports self-supervised learning of view-invariant object recognition.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
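The central-visual-field simulation described in the abstract, cropping a window around the recorded gaze point, is straightforward to sketch with NumPy. The window size and edge-clamping behavior here are assumptions; the paper derives its crop from measured visual angles.

```python
# Sketch: simulate a high-acuity central visual field by cropping a
# fixed-size window centered on the recorded gaze location.
import numpy as np

def gaze_crop(image: np.ndarray, gaze_xy: tuple, size: int = 64) -> np.ndarray:
    """Crop a size x size window centered on gaze, clamped to the frame."""
    h, w = image.shape[:2]
    gx, gy = gaze_xy
    half = size // 2
    # Clamp so the window never leaves the image bounds.
    x0 = min(max(gx - half, 0), max(w - size, 0))
    y0 = min(max(gy - half, 0), max(h - size, 0))
    return image[y0:y0 + size, x0:x0 + size]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
crop = gaze_crop(frame, (320, 240), size=64)
```

Each crop then becomes one frame of the visual stream fed to the time-based self-supervised learner, so consecutive crops of a held object end up mapped to similar representations.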
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training
Pieler, Michael, Bellagente, Marco, Teufel, Hannah, Phung, Duy, Cooper, Nathan, Tow, Jonathan, Rocha, Paulo, Adithyan, Reshinth, Alyafeai, Zaid, Pinnaparaju, Nikhil, Zhuravinskyi, Maksym, Riquelme, Carlos
Recently published work on rephrasing natural text data for pre-training LLMs has shown promising results when combining the original dataset with the synthetically rephrased data. We build upon previous work by replicating existing results on C4 and extending them with our optimized rephrasing pipeline to the English, German, Italian, and Spanish Oscar subsets of CulturaX. Our pipeline leads to increased performance on standard evaluation benchmarks in both the mono- and multilingual setup. In addition, we provide a detailed study of our pipeline, investigating the choice of the base dataset and LLM for the rephrasing, as well as the relationship between the model size and the performance after pre-training. By exploring data with different perceived quality levels, we show that gains decrease with higher quality. Furthermore, we find the difference in performance between model families to be larger than the difference between model sizes. This highlights the necessity for detailed tests before choosing an LLM to rephrase large amounts of data. Moreover, we investigate the effect of pre-training with synthetic data on supervised fine-tuning. Here, we find positive but inconclusive results that depend heavily on the benchmark used. These results (again) highlight the need for better benchmarking setups. In summary, we show that rephrasing multilingual and low-quality data is a very promising direction for extending LLM pre-training data.
- North America > United States > New York (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > United Kingdom > England (0.04)
- (7 more...)
- Health & Medicine > Consumer Health (1.00)
- Education > Health & Safety > School Nutrition (1.00)
- Media (0.93)
- Health & Medicine > Therapeutic Area (0.69)
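The core recipe in the abstract above, rephrasing each document with an LLM and then training on the union of original and synthetic text, can be sketched as below. The prompt wording and the `rephrase` stub are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: combining original documents with LLM-rephrased versions,
# as in rephrasing-based pre-training data augmentation.

REPHRASE_PROMPT = (
    "Rewrite the following text in clear, high-quality {language} "
    "while preserving all information:\n\n{text}"
)

def build_rephrase_prompt(text: str, language: str = "English") -> str:
    """Prompt sent to the rephrasing LLM; wording is illustrative."""
    return REPHRASE_PROMPT.format(language=language, text=text)

def augment_corpus(docs: list, rephrase) -> list:
    """Interleave each original document with its rephrased version."""
    out = []
    for doc in docs:
        out.append(doc)            # keep the original
        out.append(rephrase(doc))  # add the synthetic rephrasing
    return out

# Stub rephraser standing in for an LLM call.
corpus = augment_corpus(["the cat sat."], lambda d: d.upper())
```

Keeping the originals alongside the rephrasings matters: the paper's gains come from the combined dataset, not from replacing the natural text outright.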
Chatting with Bots: AI, Speech Acts, and the Edge of Assertion
This paper addresses the question of whether large language model-powered chatbots are capable of assertion. According to what we call the Thesis of Chatbot Assertion (TCA), chatbots are the kinds of things that can assert, and at least some of the output produced by current-generation chatbots qualifies as assertion. We provide some motivation for TCA, arguing that it ought to be taken seriously and not simply dismissed. We also review recent objections to TCA, arguing that these objections are weighty. We thus confront the following dilemma: how can we do justice to both the considerations for and against TCA? We consider two influential responses to this dilemma - the first appeals to the notion of proxy-assertion; the second appeals to fictionalism - and argue that neither is satisfactory. Instead, reflecting on the ontogenesis of assertion, we argue that we need to make space for a category of proto-assertion. We then apply the category of proto-assertion to chatbots, arguing that treating chatbots as proto-assertors provides a satisfactory resolution to the dilemma of chatbot assertion.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (5 more...)