gold
Gold rebounds above 5,000 after US downs Iran drone
Wild fluctuations in the price of gold continued on Wednesday as geopolitical tensions reignited after the US downed an Iranian drone . The precious metal, which is seen as a so-called safe haven for investors in times of uncertainty, shot back above $5,000 (£3,650) an ounce following days of sharp falls. Gold prices had been propelled to record highs by rapid changes in US trade policy, ongoing geopolitical uncertainty and conflict and central banks increasing their purchases of bullion. Wednesday's jump, to $5,061 per ounce, left the price of gold around 80% higher than the same time a year ago. A US military spokesman confirmed the Iranian drone had been shot down after it aggressively approached an American aircraft carrier in the Arabian Sea. Tehran has not commented on Tuesday's incident.
- North America > United States (1.00)
- Indian Ocean > Arabian Sea (0.25)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.25)
- (15 more...)
- Materials > Metals & Mining (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
Mining GOLD Samples for Conditional GANs
Conditional generative adversarial networks (cGANs) have gained a considerable attention in recent years due to its class-wise controllability and superior quality for complex generation tasks. We introduce a simple yet effective approach to improving cGANs by measuring the discrepancy between the data distribution and the model distribution on given samples. The proposed measure, coined the gap of log-densities (GOLD), provides an effective self-diagnosis for cGANs while being efficiently, computed from the discriminator. We propose three applications of the GOLD: example re-weighting, rejection sampling, and active learning, which improve the training, inference, and data selection of cGANs, respectively. Our experimental results demonstrate that the proposed methods outperform corresponding baselines for all three applications on different image datasets.
An Agentic AI System for Multi-Framework Communication Coding
Yang, Bohao, Yang, Rui, Biro, Joshua M., Wang, Haoyuan, Handley, Jessica L., Richardson, Brianna, Bessias, Sophia, Economou-Zavlanos, Nicoleta, Bedoya, Armando D., Agrawal, Monica, Zavlanos, Michael M., Chowdhury, Anand, Ratwani, Raj M., Sun, Kai, Pollak, Kathryn I., Pencina, Michael J., Hong, Chuan
Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and reliability, especially when applied across various communication frameworks and clinical domains. In this study, we developed a Multi-framework Structured Agentic AI system for Clinical Communication (MOSAIC), built on a LangGraph-based architecture that orchestrates four core agents, including a Plan Agent for codebook selection and workflow planning, an Update Agent for maintaining up-to-date retrieval databases, a set of Annotation Agents that applies codebook-guided retrieval-augmented generation (RAG) with dynamic few-shot prompting, and a Verification Agent that provides consistency checks and feedback. To evaluate performance, we compared MOSAIC outputs against gold-standard annotations created by trained human coders. We developed and evaluated MOSAIC using 26 gold standard annotated transcripts for training and 50 transcripts for testing, spanning rheumatology and OB/GYN domains. On the test set, MOSAIC achieved an overall F1 score of 0.928. Performance was highest in the Rheumatology subset (F1 = 0.962) and strongest for Patient Behavior (e.g., patients asking questions, expressing preferences, or showing assertiveness). Ablations revealed that MOSAIC outperforms baseline benchmarking.
- North America > United States > North Carolina > Durham County > Durham (0.05)
- North America > United States > Pennsylvania (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system
Li, Huanyu, Li, Zongyuan, Huang, Wei, Guo, Xian
Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited to one-step interactions. Some of the existing sequence decision-making environments, such as TextStarCraftII and LLM-PySC2, are too complicated and require hours of interaction to complete a game. In this paper, we introduce LLM-Cave, a benchmark and light environment for LLM reasoning and decision-making systems. This environment is a classic instance in the era of Symbolism. Artificial intelligence enables the agent to explore the environment and avoid potential losses by reasoning about nearby dangers using partial observable state information. In the experiment, we evaluated the sequential reasoning ability, decision-making performance and computational efficiency of mainstream large language models (LLMs) such as GPT-4o-mini, o1-mini, and DeepSeek-R1. Experiments show that while Deepseek-R1 achieved the highest success rate on complex reasoning tasks, smaller models like 4o-mini significantly narrowed the performance gap on challenges by employing Chain of Speculation and Planner-Critic strategies, at the expense of reduced computational efficiency. This indicates that structured, multi-step reasoning combined with an LLM-based feedback mechanism can substantially enhance an LLM's decision-making capabilities, providing a promising direction for improving reasoning in weaker models and suggesting a new reasoning-centered benchmark for LLM assessment. Our code is open-sourced in https://github.com/puleya1277/CaveEnv.
Walmart's Black Friday Dyson deals are here: Save up to 300 on vacuums and air purifiers
Gear Home Walmart's Black Friday Dyson deals are here: Save up to $300 on vacuums and air purifiers Dyson gear is never cheap, but Walmart has fans, air purifiers, and vacuums for their lowest prices of the year for Black Friday. We may earn revenue from the products available on this page and participate in affiliate programs. Dyson makes impressive home appliances, but they're not cheap. Walmart just dropped its full-on Black Friday deals and that includes year-low prices on Dyson vacuums and air purifiers . These prices likely won't get any lower if you wait, so you might as well just grab what you want now and make your home more comfortable with the power of engineering.
- Retail > Online (1.00)
- Health & Medicine > Health Care Equipment & Supplies (1.00)
Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis
Hua, Yan Cathy, Denny, Paul, Wicker, Jörg, Taškova, Katerina
Aspect-based Sentiment Analysis (ABSA) is a fine-grained opinion mining approach that identifies and classifies opinions associated with specific entities (aspects) or their categories within a sentence. Despite its rapid growth and broad potential, ABSA research and resources remain concentrated in commercial domains, leaving analytical needs unmet in high-demand yet low-resource areas such as education and healthcare. Domain adaptation challenges and most existing methods' reliance on resource-intensive in-training knowledge injection further hinder progress in these areas. Moreover, traditional evaluation methods based on exact matches are overly rigid for ABSA tasks, penalising any boundary variations which may misrepresent the performance of generative models. This work addresses these gaps through three contributions: 1) We propose a novel evaluation method, Flexible Text Similarity Matching and Optimal Bipartite Pairing (FTS-OBP), which accommodates realistic extraction boundary variations while maintaining strong correlation with traditional metrics and offering fine-grained diagnostics. 2) We present the first ABSA study of small decoder-only generative language models (SLMs; <7B parameters), examining resource lower bounds via a case study in education review ABSA. We systematically explore data-free (in-context learning and weight merging) and data-light fine-tuning methods, and propose a multitask fine-tuning strategy that significantly enhances SLM performance, enabling 1.5-3.8 B models to surpass proprietary large models and approach benchmark results with only 200-1,000 examples on a single GPU. 3) We release the first public set of education review ABSA resources to support future research in low-resource domains.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (11 more...)
- Research Report > New Finding (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
- (2 more...)
Patient-Centered Summarization Framework for AI Clinical Summarization: A Mixed-Methods Design
Jimenez, Maria Lizarazo, Claros, Ana Gabriela, Green, Kieran, Toro-Tobon, David, Larios, Felipe, Asthana, Sheena, Wenczenovicz, Camila, Maldonado, Kerly Guevara, Vilatuna-Andrango, Luis, Proano-Velez, Cristina, Bandi, Satya Sai Sri, Bagewadi, Shubhangi, Branda, Megan E., Zahidy, Misk Al, Luz, Saturnino, Lapata, Mirella, Brito, Juan P., Ponce-Ponte, Oscar J.
Large Language Models (LLMs) are increasingly demonstrating the potential to reach human-level performance in generating clinical summaries from patient-clinician conversations. However, these summaries often focus on patients' biology rather than their preferences, values, wishes, and concerns. To achieve patient-centered care, we propose a new standard for Artificial Intelligence (AI) clinical summarization tasks: Patient-Centered Summaries (PCS). Our objective was to develop a framework to generate PCS that capture patient values and ensure clinical utility and to assess whether current open-source LLMs can achieve human-level performance in this task. We used a mixed-methods process. Two Patient and Public Involvement groups (10 patients and 8 clinicians) in the United Kingdom participated in semi-structured interviews exploring what personal and contextual information should be included in clinical summaries and how it should be structured for clinical use. Findings informed annotation guidelines used by eight clinicians to create gold-standard PCS from 88 atrial fibrillation consultations. Sixteen consultations were used to refine a prompt aligned with the guidelines. Five open-source LLMs (Llama-3.2-3B, Llama-3.1-8B, Mistral-8B, Gemma-3-4B, and Qwen3-8B) generated summaries for 72 consultations using zero-shot and few-shot prompting, evaluated with ROUGE-L, BERTScore, and qualitative metrics. Patients emphasized lifestyle routines, social support, recent stressors, and care values. Clinicians sought concise functional, psychosocial, and emotional context. The best zero-shot performance was achieved by Mistral-8B (ROUGE-L 0.189) and Llama-3.1-8B (BERTScore 0.673); the best few-shot by Llama-3.1-8B (ROUGE-L 0.206, BERTScore 0.683). Completeness and fluency were similar between experts and models, while correctness and patient-centeredness favored human PCS.
- North America > United States > Minnesota > Olmsted County > Rochester (0.14)
- Europe > United Kingdom > England > Devon > Plymouth (0.04)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
Aligning Large Language Models with Procedural Rules: An Autoregressive State-Tracking Prompting for In-Game Trading
Kim, Minkyung, Kim, Junsik, Yang, Woongcheol, Park, Sangdon, Bae, Sohee
Large Language Models (LLMs) enable dynamic game interactions but fail to follow essential procedural flows in rule-governed trading systems, eroding player trust. This work resolves the core tension between the creative flexibility of LLMs and the procedural demands of in-game trading (browse-offer-review-confirm). To this end, Autoregressive State-Tracking Prompting (ASTP) is introduced, a methodology centered on a strategically orchestrated prompt that compels an LLM to make its state-tracking process explicit and verifiable. Instead of relying on implicit contextual understanding, ASTP tasks the LLM with identifying and reporting a predefined state label from the previous turn. To ensure transactional integrity, this is complemented by a state-specific placeholder post-processing method for accurate price calculations. Evaluation across 300 trading dialogues demonstrates >99% state compliance and 99.3% calculation precision. Notably, ASTP with placeholder post-processing on smaller models (Gemini-2.5-Flash) matches larger models' (Gemini-2.5-Pro) performance while reducing response time from 21.2s to 2.4s, establishing a practical foundation that satisfies both real-time requirements and resource constraints of commercial games.
Large Language Bayes
Many domain experts do not have the time or expertise to write formal Bayesian models. This paper takes an informal problem description as input, and combines a large language model and a probabilistic programming language to define a joint distribution over formal models, latent variables, and data. A posterior over latent variables follows by conditioning on observed data and integrating over formal models. This presents a challenging inference problem. We suggest an inference recipe that amounts to generating many formal models from the large language model, performing approximate inference on each, and then doing a weighted average. This is justified and analyzed as a combination of self-normalized importance sampling, MCMC, and importance-weighted variational inference. Experimentally, this produces sensible predictions from only data and an informal problem description, without the need to specify a formal model.
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- (2 more...)
CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
Kaiser, Daniel, Frigessi, Arnoldo, Ramezani-Kebrya, Ali, Ricaud, Benjamin
Current benchmarks for long-context reasoning in Large Language Models (LLMs) often blur critical factors like intrinsic task complexity, distractor interference, and task length. To enable more precise failure analysis, we introduce CogniLoad, a novel synthetic benchmark grounded in Cognitive Load Theory (CLT). CogniLoad generates natural-language logic puzzles with independently tunable parameters that reflect CLT's core dimensions: intrinsic difficulty ($d$) controls intrinsic load; distractor-to-signal ratio ($ρ$) regulates extraneous load; and task length ($N$) serves as an operational proxy for conditions demanding germane load. Evaluating 22 SotA reasoning LLMs, CogniLoad reveals distinct performance sensitivities, identifying task length as a dominant constraint and uncovering varied tolerances to intrinsic complexity and U-shaped responses to distractor ratios. By offering systematic, factorial control over these cognitive load dimensions, CogniLoad provides a reproducible, scalable, and diagnostically rich tool for dissecting LLM reasoning limitations and guiding future model development.