Southeast Europe
Achieving Skilled and Reliable Daily Probabilistic Forecasts of Wind Power at Subseasonal-to-Seasonal Timescales over France
Lindas, Eloi, Goude, Yannig, Ciais, Philippe
Accurate and reliable wind power forecasts are crucial for grid stability, balancing supply and demand, and market risk management. Even though short-term weather forecasts have been thoroughly used to provide short-term renewable power predictions, forecasts involving longer prediction horizons still need investigations. Despite the recent progress in subseasonal-to-seasonal weather probabilistic forecasting, their use for wind power prediction usually involves both temporal and spatial aggregation achieve reasonable skill. In this study, we present a forecasting pipeline enabling to transform ECMWF subseasonal-to-seasonal weather forecasts into wind power forecasts for lead times ranging from 1 day to 46 days at daily resolution. This framework also include post-processing of the resulting power ensembles to account for the biases and lack of dispersion of the weather forecasts. We show that our method is able to outperform a climatological baseline by 50 % in terms of both Continuous Ranked Probability Skill Score and Ensemble Mean Squared Error while also providing near perfect calibration of the forecasts for lead times ranging from 15 to 46 days.
- Europe > France (0.05)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > United States > Virginia (0.04)
- (7 more...)
- North America > United States (0.14)
- Europe > Spain (0.04)
- Europe > Southeast Europe (0.04)
- (5 more...)
- Aerospace & Defense (0.68)
- Energy (0.47)
- Information Technology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States (0.14)
- Europe > Spain (0.04)
- Europe > Southeast Europe (0.04)
- (5 more...)
- Aerospace & Defense (0.68)
- Energy (0.47)
- Information Technology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Follow the Path: Reasoning over Knowledge Graph Paths to Improve LLM Factuality
Zhang, Mike, Bjerva, Johannes, Biswas, Russa
We introduce fs1, a simple yet effective method that improves the factuality of reasoning traces by sourcing them from large reasoning models (e.g., DeepSeek-R1) and grounding them by conditioning on knowledge graph (KG) paths. We fine-tune eight instruction-tuned Large Language Models (LLMs) on 3.9K factually grounded reasoning traces and rigorously evaluate them on six complex open-domain question-answering (QA) benchmarks encompassing 23.9K questions. Our results demonstrate that our fs1-tuned model (32B parameters) consistently outperforms instruction-tuned counterparts with parallel sampling by 6-14 absolute points (pass@$16$). Our detailed analysis shows that fs1 considerably improves model performance over more complex questions (requiring 3 or more hops on KG paths) and numerical answer types compared to the baselines. Furthermore, in single-pass inference, we notice that smaller LLMs show the most improvements. While prior works demonstrate the effectiveness of reasoning traces primarily in the STEM domains, our work shows strong evidence that anchoring reasoning to factual KG paths is a critical step in transforming LLMs for reliable knowledge-intensive tasks.
- Europe > Austria > Vienna (0.14)
- Africa > Seychelles (0.06)
- Europe > Eastern Europe (0.05)
- (28 more...)
Russia's Putin hails war advances; Ukraine retakes parts of Donetsk
How is Russia replenishing its military? What is a'coalition of the willing'? How China forgot promises and'debts' to Ukraine How are Europe, the US pulling apart on Ukraine? Russia's Putin hails war advances; Ukraine retakes parts of Donetsk John Psaropoulos is an independent journalist based in Athens and has been Al Jazeera's correspondent in Southeast Europe since 2012. Ukraine reclaimed 62sq km (24sq miles) of territory last month, its commander in chief revealed on Monday, contradicting Russian President Vladimir Putin's recent claim to be advancing "in all directions".
- Asia > Russia (1.00)
- Europe > Ukraine > Donetsk Oblast > Donetsk (0.62)
- Europe > Southeast Europe (0.25)
- (16 more...)
- Government > Regional Government > Europe Government > Russia Government (1.00)
- Government > Regional Government > Asia Government > Russia Government (1.00)
- Information Technology > Communications (0.49)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.31)
LLM Analysis of 150+ years of German Parliamentary Debates on Migration Reveals Shift from Post-War Solidarity to Anti-Solidarity in the Last Decade
Kostikova, Aida, Pütz, Ole, Eger, Steffen, Sabelfeld, Olga, Paassen, Benjamin
Migration has been a core topic in German political debate, from millions of expellees post World War II over labor migration to refugee movements in the recent past. Studying political speech regarding such wide-ranging phenomena in depth traditionally required extensive manual annotations, limiting the scope of analysis to small subsets of the data. Large language models (LLMs) have the potential to partially automate even complex annotation tasks. We provide an extensive evaluation of a multiple LLMs in annotating (anti-)solidarity subtypes in German parliamentary debates compared to a large set of thousands of human reference annotations (gathered over a year). We evaluate the influence of model size, prompting differences, fine-tuning, historical versus contemporary data; and we investigate systematic errors. Beyond methodological evaluation, we also interpret the resulting annotations from a social science lense, gaining deeper insight into (anti-)solidarity trends towards migrants in the German post-World War II period and recent past. Our data reveals a high degree of migrant-directed solidarity in the postwar period, as well as a strong trend towards anti-solidarity in the German parliament since 2015, motivating further research. These findings highlight the promise of LLMs for political text analysis and the importance of migration debates in Germany, where demographic decline and labor shortages coexist with rising polarization.
- Asia > Russia (0.28)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)
- Europe > Czechia (0.14)
- (23 more...)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Government > Military (1.00)
- Government > Immigration & Customs (1.00)
- Government > Regional Government > Europe Government (0.92)
Can VLMs Recall Factual Associations From Visual References?
Ashok, Dhananjay, Chaubey, Ashutosh, Arai, Hirona J., May, Jonathan, Thomason, Jesse
Through a controlled study, we identify a systematic deficiency in the multimodal grounding of Vision Language Models (VLMs). While VLMs can recall factual associations when provided a textual reference to an entity; their ability to do so is significantly diminished when the reference is visual instead. Forcing VLMs to rely on image representations of an entity halves their ability to recall factual knowledge, suggesting that VLMs struggle to link their internal knowledge of an entity with its image representation. We show that such linking failures are correlated with the expression of distinct patterns in model internal states, and that probes on these internal states achieve over 92% accuracy at flagging cases where the VLM response is unreliable. These probes can be applied, without retraining, to identify when a VLM will fail to correctly answer a question that requires an understanding of multimodal input. When used to facilitate selective prediction on a visual question answering task, the probes increase coverage by 7.87% (absolute) while also reducing the risk of error by 0.9% (absolute). Addressing the systematic, detectable deficiency is an important avenue in language grounding, and we provide informed recommendations for future directions.
- North America > United States > California (0.14)
- Europe > Austria > Vienna (0.14)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- (14 more...)
- Research Report > Strength High (0.34)
- Research Report > Experimental Study (0.34)
AetherCode: Evaluating LLMs' Ability to Win In Premier Programming Competitions
Wang, Zihan, Chen, Jiaze, Liu, Zhicheng, Mak, Markus, Du, Yidi, Moon, Geonsik, Xu, Luoqi, Tua, Aaron, Peng, Kunshuo, Lu, Jiayi, Xia, Mingfei, Zou, Boqian, Ran, Chenyang, Tian, Guang, Zhu, Shoutai, Duan, Yeheng, Kang, Zhenghui, Lin, Zhenxing, Li, Shangshu, Luo, Qiang, Long, Qingshen, Chen, Zhiyong, Xiao, Yihan, Wu, Yurong, Zan, Daoguang, Fu, Yuyi, Wang, Mingxuan, Ding, Ming
Competitive programming has emerged as a critical benchmark for evaluating the reasoning and coding capabilities of Large Language Models (LLMs). Despite impressive progress on existing benchmarks, we argue that current evaluations overstate model proficiency, masking a substantial gap between LLMs and elite human programmers. This gap arises from two key limitations: insufficient difficulty and scope of benchmark problems, and evaluation bias from low-quality test cases. To address these shortcomings, we present AetherCode, a new benchmark that draws problems from premier programming competitions such as IOI and ICPC, offering broader coverage and higher difficulty. AetherCode further incorporates comprehensive, expert-validated test suites built through a hybrid of automated generation and human curation, ensuring rigorous and reliable assessment. By combining challenging problem design with robust evaluation, AetherCode provides a more faithful measure of LLM capabilities and sets a new standard for future research in code reasoning.
- North America > United States > California (0.14)
- South America (0.04)
- North America > Central America (0.04)
- (22 more...)
- Leisure & Entertainment (0.68)
- Education (0.46)
From Time-series Generation, Model Selection to Transfer Learning: A Comparative Review of Pixel-wise Approaches for Large-scale Crop Mapping
Long, Judy, Liu, Tao, Woznicki, Sean Alexander, Marković, Miljana, Marko, Oskar, Sears, Molly
Crop mapping involves identifying and classifying crop types using spatial data, primarily derived from remote sensing imagery. This study presents the first comprehensive review of large-scale, pixel-wise crop mapping workflows, encompassing both conventional supervised methods and emerging transfer learning approaches. To identify the optimal time-series generation approaches and supervised crop mapping models, we conducted systematic experiments, comparing six widely adopted satellite image-based preprocessing methods, alongside eleven supervised pixel-wise classification models. Additionally, we assessed the synergistic impact of varied training sample sizes and variable combinations. Moreover, we identified optimal transfer learning techniques for different magnitudes of domain shift. The evaluation of optimal methods was conducted across five diverse agricultural sites. Landsat 8 served as the primary satellite data source. Labels come from CDL trusted pixels and field surveys. Our findings reveal three key insights. First, fine-scale interval preprocessing paired with Transformer models consistently delivered optimal performance for both supervised and transferable workflows. RF offered rapid training and competitive performance in conventional supervised learning and direct transfer to similar domains. Second, transfer learning techniques enhanced workflow adaptability, with UDA being effective for homogeneous crop classes while fine-tuning remains robust across diverse scenarios. Finally, workflow choice depends heavily on the availability of labeled samples. With a sufficient sample size, supervised training typically delivers more accurate and generalizable results. Below a certain threshold, transfer learning that matches the level of domain shift is a viable alternative to achieve crop mapping. All code is publicly available to encourage reproducibility practice.
- Europe > Southeast Europe (0.04)
- North America > United States > South Dakota (0.04)
- North America > Canada (0.04)
- (19 more...)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
Past, Present and Future: Exploring Adaptive AI in Software Development Bots
--Conversational agents, such as chatbots and virtual assistants, have become essential in software development, boosting productivity, collaboration, and automating various tasks. This paper examines the role of adaptive AI-powered conversational agents in software development, highlighting their ability to offer dynamic, context-aware assistance to developers. Unlike traditional rule-based systems, adaptive AI agents use machine learning and natural language processing to learn from interactions and improve over time, providing more personalized and responsive help. We look at how these tools have evolved from simple query-based systems to advanced AI-driven solutions like GitHub Copilot and Microsoft T eams bots. We also explore the challenges of integrating adaptive AI into software development processes. The study aims to assess the benefits and limitations of these systems, address concerns like data privacy and ethical issues, and offer insights into their future use in the field. Ultimately, adaptive AI chatbots have great potential to revolutionize software development by delivering real-time, customized support and enhancing the efficiency of development cycles. Conversational agents (CAs), including chatbots, dialogue systems, and virtual assistants, are software-based systems designed to process natural language and simulate intelligent dialogue with users [1].
- North America > United States > Hawaii (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Southeast Europe (0.04)
- (2 more...)
- Research Report (0.70)
- Overview (0.67)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)