Abkhazia
APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training
Qin, Jiarui, Xi, Yunjia, Huang, Junjie, Rui, Renting, Yin, Di, Liu, Weiwen, Yu, Yong, Zhang, Weinan, Sun, Xing
With the rapid development of LLM-based agents, there is a growing trend to incorporate agent-specific data into the pre-training stage of LLMs, aiming to better align LLMs with real-world autonomous task execution. However, current pre-training benchmarks primarily focus on isolated and static skills, e.g., common knowledge or mathematical/code reasoning, and fail to reflect a model's agentic capabilities. On the other hand, agent benchmarks are typically designed for post-trained models, requiring multi-turn task execution abilities that base models struggle to support. Thus, there is a compelling need for a benchmark that can evaluate agentic potential during pre-training and guide model training more effectively. To address this gap, we propose APTBench, a framework that converts real-world agent tasks and successful trajectories into multiple-choice or text completion questions tailored for base models. It focuses on core agentic abilities, e.g., planning and action, and covers two key agent scenarios: software engineering and deep research. Compared to existing general-purpose benchmarks, APTBench offers a more predictive signal of a model's downstream performance as an agent, while remaining significantly more lightweight and cost-effective than full-scale, end-to-end agent evaluations after post-training.
- Workflow (1.00)
- Research Report (1.00)
- Banking & Finance > Economy (1.00)
- Education (0.88)
- Government > Voting & Elections (0.68)
Language Models Struggle to Achieve a Consistent Temporal Representation of Facts
Khodja, Hichem Ammar, Béchet, Frédéric, Brabant, Quentin, Nasr, Alexis, Lecorvé, Gwénolé
Language Models (LMs) have shown substantial improvements in handling factual knowledge, yet their capability to consistently represent temporal facts, which are valid only within specific timeframes, remains underexplored. To investigate this, we introduce TimeStress, a novel dataset comprising 521K statements on 2,003 of the most popular temporal facts in Wikidata. Each statement contextualizes a fact with correct and incorrect dates across three precisions (Day, Month, Year). This setup allows us to evaluate LMs' ability to discern between correct and incorrect temporal statements based on their probability of being generated. We assess 18 LMs across various architectures using two metrics: the win rate, indicating how often correct dates outperform incorrect ones, and robustness, reflecting consistent performance across all dates. Our findings reveal that while some LMs achieve a win rate exceeding 80%, robustness remains low, with the best model achieving only 6%. Furthermore, robust knowledge at one date precision does not reliably transfer to others, highlighting a significant generalization gap. These results underscore the struggle of LMs to maintain a consistent temporal representation, supporting their limitations as reliable sources of temporal knowledge. We provide all data and code for further research.
- Europe > Austria > Vienna (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Russia (0.04)
- Government (1.00)
- Leisure & Entertainment > Sports (0.68)
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Poli, Maxime, Chemla, Emmanuel, Dupoux, Emmanuel
Recent progress in Spoken Language Modeling has shown that learning language directly from speech is feasible. Generating speech through a pipeline that operates at the text level typically loses nuances, intonations, and non-verbal vocalizations. Modeling directly from speech opens up the path to more natural and expressive systems. On the other hand, speech-only systems require up to three orders of magnitude more data to catch up to their text-based counterparts in terms of their semantic abilities. We show that fine-tuning speech representation models on phoneme classification leads to more context-invariant representations, and that language models trained on these units achieve lexical comprehension comparable to that of models trained on a hundred times more data.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Singapore (0.04)
- Asia > Georgia > Abkhazia (0.04)
Narratives at Conflict: Computational Analysis of News Framing in Multilingual Disinformation Campaigns
Sinelnik, Antonina, Hovy, Dirk
Any report frames issues to favor a particular interpretation by highlighting or excluding certain aspects of a story. Despite the widespread use of framing in disinformation, framing properties and detection methods remain underexplored outside the English-speaking world. We explore how multilingual framing of the same issue differs systematically. We use eight years of Russia-backed disinformation campaigns, spanning 8k news articles in 4 languages targeting 15 countries. We find that disinformation campaigns consistently and intentionally favor specific framing, depending on the target language of the audience. We further discover how Russian-language articles consistently highlight selected frames depending on the region of the media coverage. We find that the two most prominent models for automatic frame analysis underperform and show high disagreement, highlighting the need for further research.
- Media > News (1.00)
- Government (1.00)
Ukraine says Russia's Black Sea Fleet suffered debilitating losses since collapse of grain deal
Russia's Black Sea Fleet suffered significant losses over the five months following the collapse of the U.N.-brokered grain deal as Ukraine staked a strong claim over major routes through the Black Sea. Russia's Black Sea Fleet has suffered severe setbacks as Ukrainian forces continue to cripple a major piece of Moscow's war effort. Last week, Ukrainian media touted a major victory over the Russian fleet with the publication of a video that allegedly showed the destruction of a nearly $70 million missile ship, the Ivanovets. Multiple drones hit the vessel and sank it, with the crew's fate unknown. "As a result of a number of direct hits to the hull, the Russian ship received damage that was incompatible with further movement – the Ivanovets tilted to the stern and sank," said the Military Informant Telegram channel.
- Atlantic Ocean > Black Sea (1.00)
- Asia > Russia (1.00)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.27)
- Government > Military > Navy (0.95)
- Government > Regional Government > Europe Government > Russia Government (0.85)
- Government > Regional Government > Asia Government > Russia Government (0.85)
- Information Technology > Communications > Social Media (0.36)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.32)
Full text: NATO Vilnius summit communique
NATO leaders are holding their annual summit as Ukraine looks to the security alliance for support in its attempt to push back invading Russian forces. The Vilnius communique, however, while emphasising NATO's support for Ukraine, gave no clear timetable on when the country might be able to join the alliance, in a major disappointment for Ukrainian President Volodymyr Zelenskyy, who had travelled to the Lithuanian capital. "Ukraine's future is in NATO," the leaders said in the joint statement on Tuesday. "We will be in a position to extend an invitation to Ukraine to join the alliance when allies agree and conditions are met," the declaration said, without specifying the conditions. The communique also touched on the Asia Pacific, with the leaders of Australia, Japan, New Zealand and South Korea all attending as NATO allies. It said China was a challenge to NATO's interests, security and values with its "ambitions and coercive policies" triggering a furious response from Beijing. And it accused Beijing and Moscow of "mutually reinforcing attempts to undercut the rules-based international order". China has said it wants peace in Ukraine, but has not condemned Russia's full scale invasion since it began in February 2022. NATO is a defensive Alliance. It is the unique, essential and indispensable transatlantic forum to consult, coordinate and act on all matters related to our individual and collective security. We reaffirm our iron-clad commitment to defend each other and every inch of Allied territory at all times, protect our one billion citizens, and safeguard our freedom and democracy, in accordance with Article 5 of the Washington Treaty. We will continue to ensure our collective defence from all threats, no matter where they stem from, based on a 360-degree approach, to fulfil NATO's three core tasks of deterrence and defence, crisis prevention and management, and cooperative security. 
We adhere to international law and to the purposes and principles of the Charter of the United Nations and are committed to upholding the rules-based international order. This Summit marks a milestone in strengthening our Alliance. We look forward to our valuable exchanges with the Heads of State and Government of Australia, Japan, New Zealand, and the Republic of Korea, as well as the President of the European Council and the President of the European Commission at this Summit. We also welcome the engagements with the Foreign Ministers of Georgia and the Republic of Moldova, and with the Deputy Foreign Minister of Bosnia and Herzegovina, as we continue to consult closely on the implementation of NATO's tailored support measures. This is an historic step for Finland and for NATO. For many years, we worked closely as partners; we now stand together as Allies. NATO membership makes Finland safer, and NATO stronger. Every nation has the right to choose its own security arrangements.
- Oceania > Australia (1.00)
- Asia > Russia (0.92)
- Europe > Lithuania > Vilnius County > Vilnius (0.60)
- Government > Regional Government > Europe Government (1.00)
- Government > Military (1.00)
- Government > Regional Government > Asia Government > Middle East Government (0.93)
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
McGuffie, Kris, Newhouse, Alex
In this paper, we expand on our previous research of the potential for abuse of generative language models by assessing GPT-3. Experimenting with prompts representative of different types of extremist narrative, structures of social interaction, and radical ideologies, we find that GPT-3 demonstrates significant improvement over its predecessor, GPT-2, in generating extremist texts. We also show GPT-3's strength in generating text that accurately emulates interactive, informational, and influential content that could be utilized for radicalizing individuals into violent far-right extremist ideologies and behaviors. While OpenAI's preventative measures are strong, the possibility of unregulated copycat technology represents significant risk for large-scale online radicalization and recruitment; thus, in the absence of safeguards, successful and efficient weaponization that requires little experimentation is likely. AI stakeholders, the policymaking community, and governments should begin investing as soon as possible in building social norms, public policy, and educational initiatives to preempt an influx of machine-generated disinformation and propaganda. Mitigation will require effective policy and partnerships across industry, government, and civil society.
- Asia > Middle East > Syria (0.29)
- Asia > Russia (0.29)
- Europe > Russia > North Caucasian Federal District > Chechen Republic (0.04)
- Research Report (0.50)
- Personal (0.46)
- Law Enforcement & Public Safety > Terrorism (1.00)
- Law (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
Georgians keep protesting Russian's speech in parliament despite speaker's resignation
TBILISI - The speaker of Georgia's parliament stepped down Friday in the wake of violent clashes that left at least 240 people injured, but the move failed to assuage protesters, who returned to the streets demanding that the interior minister also step down over a brutal police response. A night of clashes Thursday was sparked by a Russian lawmaker who took the speaker's seat as a group of international lawmakers met at the Georgian parliament in Tbilisi. It angered the opposition, which sees the current Georgian government as overly friendly to Russian interests. The protests mark the largest outpouring of anger against the ruling Georgian Dream since it took power in 2012. Officials said at least 240 people were injured when riot police fired rubber bullets and tear gas and unleashed water cannon on protesters outside Georgia's parliament building during the clashes that lasted into early Friday.
- Asia > Russia (1.00)
- Asia > Georgia > Tbilisi > Tbilisi (0.50)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.09)
- Government > Regional Government > Europe Government > Russia Government (1.00)
- Government > Regional Government > Asia Government > Russia Government (1.00)
Foreign Policy: A Predictable Future For Technology
Some predict that technology will become more advanced than the human brain. Ayesha and Parag Khanna are co-directors of the Hybrid Reality Institute. Ayesha is author of Straight Through Processing for Financial Services. Parag is senior research fellow at the New America Foundation and author of How to Run the World: Charting a Course to the Next Renaissance.
- Asia > India (0.06)
- Asia > China (0.06)
- North America > United States > New York (0.04)
- Government > Foreign Policy (0.50)
- Health & Medicine > Therapeutic Area (0.47)
- Banking & Finance > Financial Services (0.34)
Why does FIFA still recognise Israeli settlement teams?
This week FIFA's senior representative, Tokyo Sexwale, will throw his hat into the ring as he attempts to resolve disagreements between Israeli and Palestinian football associations. The disputes are over Israeli restrictions placed on the movement of Palestinian players and the participation of at least five Israeli football clubs in Israeli leagues - two issues which Palestinians claim contravene FIFA's own rules. While progress has been achieved on movement for Palestinian players, the issue of settlement teams remains intractable. Their inclusion within Israeli leagues is the manifestation of a political process that seeks to normalise Israel's claim to the Palestinian territory it occupied in 1967. In this context, football has become a tool to legitimise the expanding settlements as an integral part of Israel.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.26)
- Europe > Ukraine > Crimea (0.06)
- Europe > Ukraine > Luhansk Oblast > Luhansk (0.05)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Government > Regional Government > Asia Government > Middle East Government > Israel Government (0.52)