dopamine
The Formalism-Implementation Gap in Reinforcement Learning Research
The last decade has seen an upswing in interest in and adoption of reinforcement learning (RL) techniques, in large part due to their demonstrated capability to perform certain tasks at "super-human levels". This has incentivized the community to prioritize research that demonstrates RL agent performance, often at the expense of research aimed at understanding their learning dynamics. Performance-focused research runs the risk of overfitting to academic benchmarks -- thereby rendering them less useful -- and can make it difficult to transfer proposed techniques to novel problems. Further, it implicitly diminishes work that does not push the performance frontier but aims to improve our understanding of these techniques. This paper argues two points: (i) RL research should stop focusing solely on demonstrating agent capabilities and focus more on advancing the science and understanding of reinforcement learning; and (ii) we need to be more precise about how our benchmarks map to the underlying mathematical formalisms. We use the popular Arcade Learning Environment (ALE; Bellemare et al., 2013) as an example of a benchmark that, despite being increasingly considered "saturated", can be effectively used for developing this understanding and for facilitating the deployment of RL techniques in impactful real-world problems.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Zhang, Yueqi, Yuan, Peiwen, Feng, Shaoxiong, Li, Yiwei, Wang, Xinglin, Shi, Jiayi, Tan, Chuyi, Pan, Boyuan, Hu, Yao, Li, Kan
Human-AI conversation frequently relies on quoting earlier text ("check it with the formula I just highlighted"), yet today's large language models (LLMs) lack an explicit mechanism for locating and exploiting such spans. We formalise the challenge as span-conditioned generation, decomposing each turn into the dialogue history, a set of token-offset quotation spans, and an intent utterance. Building on this abstraction, we introduce a quotation-centric data pipeline that automatically synthesises task-specific dialogues, verifies answer correctness through multi-stage consistency checks, and yields both a heterogeneous training corpus and the first benchmark covering five representative scenarios. To meet the benchmark's zero-overhead and parameter-efficiency requirements, we propose QuAda, a lightweight training-based method that attaches two bottleneck projections to every attention head, dynamically amplifying or suppressing attention to quoted spans at inference time while leaving the prompt unchanged and updating < 2.8% of backbone weights. Experiments across models show that QuAda is suitable for all scenarios and generalises to unseen topics, offering an effective, plug-and-play solution for quotation-aware dialogue.
- Africa > Middle East > Egypt (0.46)
- Asia > Japan (0.14)
- North America > United States (0.14)
- (11 more...)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Government > Regional Government (1.00)
- (6 more...)
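QuAda's mechanism is described above only at a high level, so the following is a speculative PyTorch sketch of one way a per-head bottleneck gate over quoted spans could work; the class name, bottleneck width, tanh nonlinearity, and zero initialization are our assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class QuotationGate(nn.Module):
    """Hypothetical sketch of a QuAda-style adapter: a low-rank bottleneck
    maps each head's query state to a scalar bias that is added to the
    attention logits at quoted key positions, amplifying or suppressing
    attention to the quoted span."""

    def __init__(self, head_dim: int, bottleneck: int = 16):
        super().__init__()
        # Two bottleneck projections, per the abstract (shapes assumed).
        self.down = nn.Linear(head_dim, bottleneck, bias=False)
        self.up = nn.Linear(bottleneck, 1, bias=False)
        # Zero-init so the gate starts as a no-op and the backbone is unchanged.
        nn.init.zeros_(self.up.weight)

    def forward(self, q, attn_logits, quote_mask):
        # q:           (batch, heads, q_len, head_dim) query states
        # attn_logits: (batch, heads, q_len, k_len) pre-softmax scores
        # quote_mask:  (batch, k_len), 1.0 where the key token is inside a quote
        gate = self.up(torch.tanh(self.down(q)))    # (batch, heads, q_len, 1)
        bias = gate * quote_mask[:, None, None, :]  # bias only quoted keys
        return attn_logits + bias
```

Because only `down` and `up` are trained, the added parameter count stays a small fraction of the backbone, consistent with the < 2.8% figure the abstract reports.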
Pay attention! 12 ways to improve your focus and concentration span
Forty-seven seconds. That was the average length of time an adult could focus on a screen in 2021, according to research by Gloria Mark, a professor of informatics at the University of California, Irvine. Twenty years ago, in 2004, that number stood at two-and-a-half minutes. Our attention spans – how long we're able to concentrate without being distracted – are shrinking. Our focus – how intensely we can think about things – is suffering too. The causes: technology that's designed to demand our attention; endless tools for procrastination at our fingertips; rising stress and anxiety disorders; and poor sleep quality.
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Consumer Health (1.00)
DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining
Arannil, Vinayak, Narwal, Neha, Bhabesh, Sourav Sanjukta, Thirandas, Sai Nikhil, Wang, Darren Yow-Bang, Horwood, Graham, Chirayath, Alex Anto, Pandeshwar, Gouri
Large Language Models (LLMs) have shown a remarkable ability to generalize effectively across numerous industry domains while executing a range of tasks. Many of these competencies are obtained from the data used during the pre-training phase of the Language Models (LMs). However, these models exhibit limitations when tasked with performing in specialized or low-resource industry domains. More recent approaches use LLMs to generate domain-specific synthetic data, but the generated data most often lacks truthfulness and complexity. Alternatively, in domains where data is available, such as healthcare and finance, most domain-adapted LMs are proprietary, creating the need for a scalable method to curate real-world, industry-specific pre-training data. In this work, we propose an automated and scalable framework, DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining, to mine domain-specific training data from a large data corpus for domain adaptation of an LM. The framework leverages the parametric knowledge of an LLM to generate diverse and representative seed data tailored to a specific domain, which is then used to mine real-world data from a large corpus like Common Crawl. We evaluated our framework's performance in the continual pre-training (CPT) setting by training two domain-specific 7B-parameter LMs in healthcare and finance with data mined via DoPAMine. Our experiments show that DoPAMine boosts the performance of pre-trained LLMs on average by 4.9% and 5.1% in zero-shot and 5-shot settings respectively on healthcare tasks from the MMLU, MedQA, MedMCQA and PubMedQA datasets, and by 2.9% and 6.7% in zero-shot and 5-shot settings respectively on finance tasks from the FiQA-SA, FPB and Headlines datasets, when compared to the baseline.
- Asia > Middle East (0.14)
- North America > Canada (0.14)
- Europe > Spain (0.14)
- Leisure & Entertainment > Sports (1.00)
- Banking & Finance (1.00)
- Energy > Oil & Gas (0.93)
- (4 more...)
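The seed-guided mining step lends itself to a short illustration. Below is a minimal sketch, assuming sentence-transformers embeddings and a simple centroid-similarity filter; the actual DoPAMine retrieval machinery may differ, and the model name and function are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def mine_domain_data(seed_docs, corpus_docs, top_k=1000):
    """Keep the corpus documents most similar to LLM-generated seed
    documents, as a stand-in for mining domain data from Common Crawl."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # embedding model assumed
    seeds = model.encode(seed_docs, normalize_embeddings=True)
    corpus = model.encode(corpus_docs, normalize_embeddings=True)
    centroid = seeds.mean(axis=0)
    centroid /= np.linalg.norm(centroid)        # unit-length seed centroid
    scores = corpus @ centroid                  # cosine similarity per document
    keep = np.argsort(-scores)[:top_k]          # highest-similarity documents
    return [corpus_docs[i] for i in keep]
```

In practice the corpus side would be embedded offline and searched with an approximate nearest-neighbour index rather than a dense matrix product.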
I Caved and Bought My Kids a Coveted Gaming Console. I've Made a Horrible Mistake.
Care and Feeding is Slate's parenting advice column. Have a question for Care and Feeding? We made a huge mistake. This summer, we purchased a Switch for our boys (almost 5, and newly 7). The 5-year-old isn't that into it, but my 7-year-old is thrilled to finally be able to talk games and be in the loop with all his peers. We researched and picked games that were age-appropriate.
- Leisure & Entertainment > Sports (0.70)
- Leisure & Entertainment > Games > Computer Games (0.55)
Computer made out of human BRAINS could solve the world's energy crisis - here's the scientist making science fiction reality
There is a lot of fear about robots replacing humans. But maybe it should be the machines worrying about us. Swedish scientists have created the world's first 'living computer', made out of human brain tissue. It is composed of 16 organoids, or clumps of lab-grown brain cells, which send information between one another. They work much like a traditional computer chip, sending and receiving signals through their neurons, which act like circuits.
- Energy (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.56)
Navigating 2024 with strategies tailored for those suffering from anxiety, depression, ADHD
Setting thoughtful, achievable goals can be a powerful strategy on the journey toward improved mental health. Whether grappling with anxiety, depression, ADHD or other conditions, establishing personalized goals fosters a sense of direction, accomplishment and empowerment. Explore specific goals tailored to each condition, promoting overall mental well-being.
Dopamine Bonuses
Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.
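The bonus idea is easy to state concretely. Here is a minimal sketch of a TD error augmented with Ng et al.'s potential-based shaping term, which biases exploration without distorting the optimal policy (variable names ours):

```python
def td_error_with_bonus(r, v_s, v_next, phi_s, phi_next, gamma=0.99):
    """One-step TD error with a potential-based shaping bonus
    F(s, s') = gamma * phi(s') - phi(s); because F telescopes along any
    trajectory, it leaves the optimal policy unchanged."""
    shaping_bonus = gamma * phi_next - phi_s
    return r + shaping_bonus + gamma * v_next - v_s
```

Sutton-style exploration bonuses would instead add a novelty-dependent term to `r` directly, which does alter the learned values; that distinction is what the "non-distorting" qualifier is about.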
How fast to work: Response vigor, motivation and tonic dopamine
Reinforcement learning models have long promised to unify computational, psychological and neural accounts of appetitively conditioned behavior. However, the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existing reinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address the simple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity even when working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding. Finally, we suggest that tonic levels of dopamine may be involved in the computation linking motivational state to optimal responding, thereby explaining the complex vigor-related effects of pharmacological manipulation of dopamine.
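In the published average-reward treatment of vigor by these authors (Niv, Daw and Dayan), the cost-benefit trade-off has a simple closed form: responding with latency tau incurs a vigor cost C_v / tau, while delaying forfeits R_bar * tau of expected reward, so the optimal latency is tau* = sqrt(C_v / R_bar). A toy sketch, with illustrative numbers:

```python
import numpy as np

def optimal_latency(vigor_cost, avg_reward_rate):
    """Minimize vigor_cost / tau + avg_reward_rate * tau over tau > 0.
    Setting the derivative to zero gives tau* = sqrt(C_v / R_bar): the
    higher the average reward rate (putatively tonic dopamine), the
    faster the optimal response."""
    return np.sqrt(vigor_cost / avg_reward_rate)

# A hungrier animal has a higher effective reward rate, hence shorter latency:
print(optimal_latency(vigor_cost=1.0, avg_reward_rate=0.5))  # sated:  ~1.41
print(optimal_latency(vigor_cost=1.0, avg_reward_rate=2.0))  # hungry: ~0.71
```

This is also why the model can accommodate faster responding even for irrelevant outcomes: the opportunity cost R_bar is a property of the whole environment, not of the specific reward being pursued.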
Top Reinforcement Learning Tools/Platforms in 2022
Reinforcement learning is a subfield of machine learning. It involves learning to act so as to maximize reward in a particular circumstance, and it is used by various programs and machines to determine the best course of action in a given situation. Unlike supervised learning, where the training data includes the answer key and the model is trained to reproduce those answers, reinforcement learning provides no correct solution up front; the agent must discover through trial and error which actions complete the task.
- North America > Canada > Ontario > Middlesex County > London (0.05)
- Europe > France (0.05)
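To make the contrast with supervised learning concrete, here is a minimal tabular Q-learning loop on a toy five-state chain; the environment and hyperparameters are illustrative, not taken from any of the tools surveyed.

```python
import random

# Toy 5-state chain: the agent starts at state 0 and is rewarded only on
# reaching state 4. Actions: 0 = step left, 1 = step right.
n_states, n_actions, goal = 5, 2, 4
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != goal:
        if random.random() < eps:
            a = random.randrange(n_actions)  # explore
        else:
            best = max(Q[s])                 # exploit, breaking ties randomly
            a = random.choice([i for i, v in enumerate(Q[s]) if v == best])
        s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == goal else 0.0
        # No answer key: the update is driven only by the observed reward.
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next

print([round(max(q), 2) for q in Q])  # learned state values rise toward the goal
```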