 Atlantic Ocean


Welfare Diplomacy: Benchmarking Language Model Cooperation

arXiv.org Artificial Intelligence

The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted language models; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.


OceanNet: A principled neural operator-based digital twin for regional oceans

arXiv.org Artificial Intelligence

While data-driven approaches demonstrate great potential in atmospheric modeling and weather forecasting, ocean modeling poses distinct challenges due to complex bathymetry, land, vertical structure, and flow non-linearity. This study introduces OceanNet, a principled neural operator-based digital twin for ocean circulation. OceanNet uses a Fourier neural operator and predictor-evaluate-corrector integration scheme to mitigate autoregressive error growth and enhance stability over extended time scales. A spectral regularizer counteracts spectral bias at smaller scales. OceanNet is applied to the northwest Atlantic Ocean western boundary current (the Gulf Stream), focusing on the task of seasonal prediction for Loop Current eddies and the Gulf Stream meander. Trained using historical sea surface height (SSH) data, OceanNet demonstrates competitive forecast skill by outperforming SSH predictions by an uncoupled, state-of-the-art dynamical ocean model forecast, reducing computation by 500,000 times. These accomplishments demonstrate the potential of physics-inspired deep neural operators as cost-effective alternatives to high-resolution numerical ocean models.
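The abstract names a predictor-evaluate-corrector integration scheme but does not spell it out. As a rough illustration only, the sketch below shows a generic Heun-style predict-evaluate-correct step for a scalar ODE. It is not OceanNet's implementation: the function names `pec_step` and `integrate` are hypothetical, and the tendency function `f` stands in for whatever the learned neural operator would supply.

```python
from typing import Callable


def pec_step(f: Callable[[float, float], float], t: float, y: float, h: float) -> float:
    """One generic predict-evaluate-correct step (Heun's method).

    Assumption: f(t, y) returns the time tendency dy/dt. In a learned
    emulator this role would be played by a trained model, not an
    analytic function.
    """
    k1 = f(t, y)              # tendency at the current state
    y_pred = y + h * k1       # predict: explicit Euler step
    k2 = f(t + h, y_pred)     # evaluate: tendency at the predicted state
    return y + 0.5 * h * (k1 + k2)  # correct: average the two tendencies


def integrate(f: Callable[[float, float], float], y0: float, t0: float,
              h: float, n_steps: int) -> float:
    """Roll the step forward autoregressively for n_steps."""
    t, y = t0, y0
    for _ in range(n_steps):
        y = pec_step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    # Toy problem dy/dt = -y with y(0) = 1; the exact solution is exp(-t).
    print(integrate(lambda t, y: -y, 1.0, 0.0, 0.1, 10))
```

The corrector reuses the predictor's output, so each step costs two tendency evaluations in exchange for a smaller per-step error than a plain Euler rollout, which is one common way to slow error growth in long autoregressive runs.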


Russia-Ukraine war: List of key events, day 581

Al Jazeera

Russia released a video reportedly showing Viktor Sokolov, commander of Russia's Black Sea Fleet in Crimea, at a meeting with Defence Minister Sergei Shoigu and other military top brass a day after Ukrainian special forces claimed he was among dozens of officers killed in an attack on the fleet's Sevastopol naval base. Ukraine said it was clarifying information regarding Sokolov. The United Kingdom's defence ministry said "a dynamic, deep strike battle" was under way in the Black Sea after the Russian Black Sea Fleet suffered a series of major attacks. Kyiv said its air defences destroyed 26 of 38 Russian drones fired overnight but that some of the drones hit the Danube River port of Izmail, damaging more than 30 vehicles and injuring two drivers during a two-hour attack. The drone barrage also prompted the temporary suspension of ferry services to Romania.


Russia says 19 Ukrainian drones downed over Crimea, Black Sea, and regions

Al Jazeera

Russian aerial defence systems destroyed a wave of 19 Ukrainian drones that were launched overnight in attacks against targets in the Russia-annexed Crimean peninsula, the surrounding Black Sea and other regions of Russia. The Russian defence ministry said early on Thursday that it had "thwarted" the attacks by Ukraine's aircraft-type unmanned aerial vehicles (UAVs). "In the night from 20th to 21st September, an attempt by the Kyiv regime to commit a terrorist attack with lethal drones on sites in the Russian Federation was intercepted," the defence ministry said on the Telegram messaging app. "Air defence systems destroyed 19 Ukrainian UAVs over the Black Sea and the territory of the Republic of Crimea, and one each over the territories of Kursk, Belgorod and Oryol regions," the ministry said. The Belgorod and Kursk regions of Russia border eastern Ukraine, while Oryol is closer to the capital, Moscow.


Ukraine claims to retake Black Sea drilling rigs from Russian control

BBC News

Now it's Russia that appears to have most to worry about, as Ukrainian drones and commandos launch raids on the northwest corner of Crimea, damaging a radar base on the Tarkhankut Peninsula and even planting a Ukrainian flag during an operation to mark Independence Day, on 24 August.


Ukraine launches strikes on Russian territory in 'clever' move against Putin forces: expert

FOX News

Debris rained from the Kyiv night sky as Russia launched air attacks early on Wednesday, killing at least two people in the Ukrainian capital, Mayor Vitali Klitschko wrote on the Telegram messaging app. Ukraine and Russia launched their boldest drone and missile strikes on each other in months: a strike in Kyiv killed two people, while strikes on ships in the Black Sea and on an airport near the border lasted for hours, according to local reports. "While the Russians have been retaliating brutally against Ukraine, Kyiv's incremental escalation has prevented a massive conventional (or nuclear attack) that would have obliterated Ukraine," Rebekah Koffler, president of Doctrine & Strategy Consulting and a former Defense Intelligence Agency officer, told Fox News Digital. "It's quite witty," she said. "Will this win the war for Ukraine? But it might gradually wear down the Russian people's morale."


Russia-Ukraine war: List of key events, day 553

Al Jazeera

Ukraine bade farewell to legendary fighter pilot Andriy Pilshchykov, known by his call sign "Juice", who was killed with two other pilots during a training flight last week. A Ukrainian flag was draped over 29-year-old Pilshchykov's coffin and his cap placed on top. Russian forces shot down two Ukrainian drones over the Black Sea, the Russian state RIA news agency reported, citing the Ministry of Defence. The mission of Ukraine's president in Russian-occupied Crimea said that Moscow was preparing to start a new round of mobilisation for the Russian army in the territory. The United Kingdom's defence ministry said Russia had boosted salaries and benefits for its soldiers, making military service "increasingly lucrative".


How Ukraine's stealthy sea drones strike Russian targets

BBC News

President Zelensky has described seaborne drones as Ukraine's "eyes and protection on the frontline", with claims of a series of successful strikes against Russian ships in the Black Sea and on a key bridge to Crimea. These remote-controlled devices are playing an increasingly prominent role, with both sides ramping up their use for attacks and reconnaissance. The BBC's Security Correspondent Frank Gardner and BBC Verify examine their influence on the conflict.


Russian drones threaten Ukraine's key Danube River ports

Al Jazeera

Ukraine's air force said a wave of Russian military drones had entered the mouth of the Danube River and were headed towards the country's Izmail river port near the border with Romania. Social media groups monitoring the war reported hearing air defence systems firing in the area near Ukraine's Danube ports of Izmail and Reni early on Wednesday morning. The governor of southern Odesa region, Oleh Kiper, asked residents of Izmail district to take shelter at around 1:30 a.m. Ukraine's Danube River ports accounted for around a quarter of all grain exports from Ukraine before Russia recently pulled out of a deal allowing safe passage for the export of Ukrainian grain via the country's Black Sea ports. Danube River ports have now become the main export route, with grain shipments sent on barges from Ukraine across the Danube to Romania and its Black Sea port of Constanta for onward shipment.


MMBench: Is Your Multi-modal Model an All-around Player?

arXiv.org Artificial Intelligence

Large vision-language models have recently achieved remarkable progress, exhibiting great perception and reasoning abilities concerning visual information. However, how to effectively evaluate these large vision-language models remains a major obstacle, hindering future model development. Traditional benchmarks like VQAv2 or COCO Caption provide quantitative performance measurements but suffer from a lack of fine-grained ability assessment and non-robust evaluation metrics. Recent subjective benchmarks, such as OwlEval, offer comprehensive evaluations of a model's abilities by incorporating human labor, but they are not scalable and display significant bias. In response to these challenges, we propose MMBench, a novel multi-modality benchmark. MMBench methodically develops a comprehensive evaluation pipeline, primarily comprised of two elements. The first element is a meticulously curated dataset that surpasses existing similar benchmarks in terms of the number and variety of evaluation questions and abilities. The second element introduces a novel CircularEval strategy and incorporates the use of ChatGPT. This implementation is designed to convert free-form predictions into pre-defined choices, thereby facilitating a more robust evaluation of the model's predictions. MMBench is a systematically-designed objective benchmark for robustly evaluating the various abilities of vision-language models. We hope MMBench will assist the research community in better evaluating their models and encourage future advancements in this domain. Project page: https://opencompass.org.cn/mmbench.
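The abstract mentions CircularEval without describing it. As commonly described for MMBench, each multiple-choice question is posed several times with the options circularly shifted, and it counts as solved only if the model picks the right option on every pass. The sketch below illustrates that idea under stated assumptions: `circular_eval` and the `ask` callback are hypothetical names, and `ask` is assumed to return the index of the chosen option (in the benchmark, mapping a free-form answer to an option is the step the abstract assigns to ChatGPT).

```python
from typing import Callable, Sequence


def circular_eval(ask: Callable[[str, Sequence[str]], int],
                  question: str,
                  choices: Sequence[str],
                  answer_idx: int) -> bool:
    """Return True only if the model is right under every circular shift.

    Assumption: ask(question, options) returns the index of the option the
    model selects. This is an illustrative sketch, not the benchmark's code.
    """
    n = len(choices)
    for shift in range(n):
        # Rotate the options left by `shift` positions.
        rotated = [choices[(i + shift) % n] for i in range(n)]
        # The correct option moves from answer_idx to this position.
        correct_pos = (answer_idx - shift) % n
        if ask(question, rotated) != correct_pos:
            return False  # a single miss fails the whole question
    return True


if __name__ == "__main__":
    opts = ["Berlin", "Paris", "Rome", "Madrid"]
    # A model that reads the content passes; one that always picks the
    # first option fails, which is the positional bias this guards against.
    print(circular_eval(lambda q, o: list(o).index("Paris"), "Capital of France?", opts, 1))
    print(circular_eval(lambda q, o: 0, "Capital of France?", opts, 1))
```

Requiring agreement across all shifts makes the score stricter than single-pass accuracy, since a model that guesses or favors a fixed position is unlikely to be right on every rotation.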