Indian Ocean
Using Deep Learning to Identify Initial Error Sensitivity for Interpretable ENSO Forecasts
Toride, Kinya, Newman, Matthew, Hoell, Andrew, Capotondi, Antonietta, Schlör, Jakob, Amaya, Dillon
We introduce an interpretable-by-design method, optimized model-analog, that integrates deep learning with model-analog forecasting, a straightforward yet effective approach that generates forecasts from similar initial climate states in a repository of model simulations. This hybrid framework employs a convolutional neural network to estimate state-dependent weights to identify initial analog states that lead to shadowing target trajectories. The advantage of our method lies in its inherent interpretability, offering insights into initial-error-sensitive regions through estimated weights and the ability to trace the physically-based evolution of the system through analog forecasting. We evaluate our approach using the Community Earth System Model Version 2 Large Ensemble to forecast the El Niño-Southern Oscillation (ENSO) on a seasonal-to-annual time scale. Results show a 10% improvement in forecasting equatorial Pacific sea surface temperature anomalies at 9-12 month leads compared to the original (unweighted) model-analog technique. Furthermore, our model demonstrates improvements in boreal winter and spring initialization when evaluated against a reanalysis dataset. Our approach reveals state-dependent regional sensitivity linked to various seasonally varying physical processes, including the Pacific Meridional Modes, equatorial recharge oscillator, and stochastic wind forcing. Additionally, disparities emerge in the sensitivity associated with El Niño versus La Niña events. El Niño forecasts are more sensitive to initial uncertainty in tropical Pacific sea surface temperatures, while La Niña forecasts are more sensitive to initial uncertainty in tropical Pacific zonal wind stress. This approach has broad implications for forecasting diverse climate phenomena, including regional temperature and precipitation, which are challenging for the original model-analog approach.
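As a rough illustration of the weighted analog selection this abstract describes, the sketch below picks the k library states nearest to a target state under a weighted squared distance and averages their later states. The function name, the fixed hand-set weights, and the toy arrays are assumptions for illustration only; in the paper the weights are state-dependent and estimated by a convolutional neural network, which is not reproduced here.

```python
import numpy as np


def model_analog_forecast(target, library, futures, weights=None, k=2):
    """Average the later states of the k library states closest to `target`.

    target:  (n_features,) initial state to forecast from
    library: (n_states, n_features) candidate initial states from simulations
    futures: (n_states, n_out) state of each library member at the lead time
    weights: (n_features,) non-negative weights; None means unweighted
    """
    if weights is None:
        weights = np.ones_like(target, dtype=float)
    # Weighted squared distance between the target and every library state.
    dist = np.sum(weights * (library - target) ** 2, axis=1)
    # Indices of the k closest library states (the analogs).
    nearest = np.argsort(dist)[:k]
    return futures[nearest].mean(axis=0), nearest


# Toy data (hypothetical): 4 library states with 2 features each.
library = np.array([[0.0, 0.0], [1.0, 10.0], [0.2, 5.0], [4.0, 0.1]])
futures = np.array([[1.0], [2.0], [3.0], [4.0]])
target = np.array([0.0, 0.0])

unweighted, idx_u = model_analog_forecast(target, library, futures)
weighted, idx_w = model_analog_forecast(
    target, library, futures, weights=np.array([1.0, 0.0])
)
```

With the second feature down-weighted to zero, a different pair of analogs is selected, which is the mechanism by which weights can point to the regions where initial errors matter most.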
Welcome to the Laser Wars
The age of the laser weapon is finally upon us. The United States Army has officially sent a pair of high-energy laser weapons overseas to defend American troops and US allies against enemy drones, the service recently revealed, marking the first publicly known deployment of a directed-energy system for air defense in military history. And, according to a top official, those weapons are actively blasting threats out of the sky. The weapon, known as the Palletized High Energy Laser (P-HEL) and developed by the American defense contractor BlueHalo based on the company's 20-kilowatt Locust Laser Weapon System, first arrived in an unspecified location overseas and "commenced operational employment" in November 2022, according to an April press release from the company. A second system arrived overseas "earlier this year."
Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages
Robinson, Nathaniel R., Dabre, Raj, Shurtz, Ammon, Dent, Rasul, Onesi, Onenamiyi, Monroc, Claire Bizon, Grobol, Loïc, Muhammad, Hasan, Garg, Ashi, Etori, Naome A., Tiyyala, Vijay Murari, Samuel, Olanrewaju, Stutzman, Matthew Dean, Odoom, Bismarck Bamfo, Khudanpur, Sanjeev, Richardson, Stephen D., Murray, Kenton
A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We present the largest cumulative dataset to date for Creole language MT, including 14.5M unique Creole sentences with parallel translations -- 11.6M of which we release publicly, and the largest bitexts gathered to date for 41 languages -- the first ever for 21. In addition, we provide MT models supporting all 41 Creole languages in 172 translation directions. Given our diverse dataset, we produce a model for Creole language MT exposed to more genre diversity than ever before, which outperforms a genre-specific Creole MT model on its own benchmark for 26 of 34 translation directions.
ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source
Le, Hung Tuan, To, Long Truong, Nguyen, Manh Trong, Van Nguyen, Kiet
Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research on the problem has concentrated on large language communities such as English and Chinese. Low-resource languages like Vietnamese still need corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, with more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through several linguistic aspects, including the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in the two tasks: in evidence retrieval, BM25 achieved an accuracy of 88.30% for SUPPORTS, 86.93% for REFUTES, and only 56.67% for the NEI label, while in verdict prediction InfoXLM (Large) achieved an F1 score of 86.51%. Furthermore, we also evaluated a pipeline approach, which achieved a strict accuracy of only 67.00% when using InfoXLM (Large) and BM25. These results demonstrate that our dataset is challenging for Vietnamese language models in fact-checking tasks.
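The abstract does not define strict accuracy. A minimal sketch, assuming the common convention that a claim counts as correct only when both the retrieved evidence and the predicted verdict match the gold annotation, could look like this. The function name, the dictionary keys, and the toy examples are hypothetical and not taken from the paper.

```python
def strict_accuracy(examples):
    """Fraction of claims with both correct evidence and correct verdict.

    Each example is a dict with the (hypothetical) keys
    'gold_evidence', 'pred_evidence', 'gold_label', 'pred_label'.
    """
    if not examples:
        raise ValueError("examples must not be empty")
    hits = sum(
        1
        for ex in examples
        if ex["pred_evidence"] == ex["gold_evidence"]
        and ex["pred_label"] == ex["gold_label"]
    )
    return hits / len(examples)


# Toy predictions (made up): only the first claim is fully correct.
toy = [
    {"gold_evidence": "s1", "pred_evidence": "s1",
     "gold_label": "SUPPORTS", "pred_label": "SUPPORTS"},
    {"gold_evidence": "s2", "pred_evidence": "s9",
     "gold_label": "REFUTES", "pred_label": "REFUTES"},
    {"gold_evidence": "s3", "pred_evidence": "s3",
     "gold_label": "NEI", "pred_label": "SUPPORTS"},
    {"gold_evidence": "s4", "pred_evidence": "s7",
     "gold_label": "SUPPORTS", "pred_label": "NEI"},
]
score = strict_accuracy(toy)
```

Under this convention a pipeline score is bounded above by both the retrieval accuracy and the verdict accuracy, which would be consistent with the pipeline figure being lower than the per-task figures.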
OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning
Lu, Bin, Zhao, Ze, Han, Luyu, Gan, Xiaoying, Zhou, Yuntao, Zhou, Lei, Fu, Luoyi, Wang, Xinbing, Zhou, Chenghu, Zhang, Jing
Accurately reconstructing global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystems. Existing expert-dominated numerical simulations fail to keep up with the dynamic variation caused by global warming and human activities. Moreover, because data collection is costly, historical observations are severely sparse, which poses a major challenge for precise reconstruction. In this work, we propose OxyGenerator, the first deep-learning-based model to reconstruct global ocean deoxygenation from 1920 to 2023. Specifically, to address the heterogeneity across large temporal and spatial scales, we propose zoning-varying graph message-passing to capture the complex oceanographic correlations between missing values and sparse observations. Additionally, to further calibrate the uncertainty, we incorporate inductive bias from dissolved oxygen (DO) variations and chemical effects. Compared with in-situ DO observations, OxyGenerator significantly outperforms CMIP6 numerical simulations, reducing MAPE by 38.77% and demonstrating promising potential for understanding the "breathless ocean" in a data-driven manner.
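For readers unfamiliar with the metric, the sketch below computes the mean absolute percentage error (MAPE) against observations and a relative reduction between two error values. The abstract does not say whether the reported 38.77% is a relative or an absolute reduction, so the relative form here is an assumption, and all numbers are made up for illustration.

```python
import numpy as np


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed must be nonzero."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((observed - predicted) / observed))


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Made-up dissolved-oxygen values, not data from the paper.
observed = [200.0, 100.0]
baseline = [160.0, 120.0]   # hypothetical numerical-simulation output
model = [180.0, 110.0]      # hypothetical learned reconstruction

baseline_mape = mape(observed, baseline)
model_mape = mape(observed, model)
reduction = relative_reduction(baseline_mape, model_mape)
```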
SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora
In this paper, we introduce SaudiBERT, a monodialect Arabic language model pretrained exclusively on Saudi dialectal text. To demonstrate the model's effectiveness, we compared SaudiBERT with six different multidialect Arabic language models across 11 evaluation datasets, which are divided into two groups: sentiment analysis and text classification. SaudiBERT achieved average F1-scores of 86.15% and 87.86% in these groups respectively, significantly outperforming all other comparative models. Additionally, we present two novel Saudi dialectal corpora: the Saudi Tweets Mega Corpus (STMC), which contains over 141 million tweets in Saudi dialect, and the Saudi Forums Corpus (SFC), which includes 15.2 GB of text collected from five Saudi online forums. Both corpora are used in pretraining the proposed model, and they are the largest Saudi dialectal corpora ever reported in the literature. The results confirm the effectiveness of SaudiBERT in understanding and analyzing Arabic text expressed in Saudi dialect, achieving state-of-the-art results in most tasks and surpassing other language models included in the study. The SaudiBERT model is publicly available at https://huggingface.co/faisalq/SaudiBERT.
China launches lunar probe to take samples from far side of the moon
China on Friday launched a lunar probe to land on the far side of the moon and return with samples that could provide insights into differences between the less-explored region and the better-known near side. It is the latest advance in China's increasingly sophisticated space exploration program, which is now competing with the U.S., still the leader in space. China also has a three-member crew on its own orbiting space station and aims to put astronauts on the moon by 2030. Three Chinese lunar probe missions are planned over the next four years.
Portuguese-flagged ship targeted in Arabian Sea drone assault; Houthi rebels claim responsibility
A Portuguese-flagged container ship came under attack by a drone in the far reaches of the Arabian Sea, corresponding with a claim by Yemen's Houthi rebels that they assaulted the ship there, authorities said Tuesday. The attack on the MSC Orion, occurring some 375 miles off the coast of Yemen, appeared to be the first confirmed deep-sea assault claimed by the Houthis since they began targeting ships in November. It suggests the Houthis -- or potentially their main benefactor Iran -- may have the ability to strike into the distances of the Indian Ocean as the rebels previously threatened in their ongoing campaign over Israel's war on Hamas in the Gaza Strip.
Likely missile attack by Yemen's Houthi rebels damages a ship in the Red Sea
A suspected missile attack by Yemen's Houthi rebels damaged a ship in the Red Sea on Monday, authorities said, the latest assault in their campaign against international shipping in the crucial maritime route. The attack happened off the coast of Mokha, Yemen, the British military's United Kingdom Maritime Trade Operations center said. The ship sustained damage in the attack, the UKMTO said, though its crew was safe and heading to its next port of call.
Automated Construction of Theme-specific Knowledge Graphs
Ding, Linyi, Zhou, Sizhe, Xiao, Jinfeng, Han, Jiawei
Despite widespread applications of knowledge graphs (KGs) in various tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: information granularity and deficiency in timeliness. These considerably hinder the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g., specialized scientific research) and rapidly evolving contexts (e.g., breaking news or disaster tracking). To tackle such challenges, we propose a theme-specific knowledge graph (i.e., ThemeKG), a KG constructed from a theme-specific corpus, and design an unsupervised framework for ThemeKG construction (named TKGCon). The framework takes a raw theme-specific corpus and generates a high-quality KG that includes salient entities and relations under the theme. Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we then generate candidate relations using Large Language Models (LLMs) to construct a relation ontology. To parse the documents from the theme corpus, we first map the extracted entity pairs to the ontology and retrieve the candidate relations. Finally, we incorporate the context and ontology to consolidate the relations for entity pairs. We observe that directly prompting GPT-4 for a theme-specific KG leads to inaccurate entities (such as "two main types" as one entity in the query result) and unclear (such as "is", "has") or wrong relations (such as "have due to", "to start"). In contrast, by constructing the theme-specific KG step by step, our model outperforms GPT-4 and could consistently identify accurate entities and relations. Experimental results also show that our framework excels in evaluations compared with various KG construction baselines.
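The step of mapping extracted entity pairs to the ontology and retrieving candidate relations can be pictured with a small lookup sketch. Everything here is hypothetical: the toy entity types, the relation names, and the function `candidate_relations` are invented for illustration and are not TKGCon's actual data structures. The later step that uses document context to consolidate relations is not shown.

```python
# Hypothetical entity ontology: surface form -> entity type.
ENTITY_TYPES = {
    "lithium-ion battery": "battery",
    "anode": "component",
    "graphite": "material",
}

# Hypothetical relation ontology: (head type, tail type) -> candidate relations.
RELATION_ONTOLOGY = {
    ("battery", "component"): ["has component"],
    ("component", "material"): ["made of", "coated with"],
}


def candidate_relations(head, tail):
    """Return candidate relations for an entity pair, or [] if unmapped."""
    head_type = ENTITY_TYPES.get(head)
    tail_type = ENTITY_TYPES.get(tail)
    if head_type is None or tail_type is None:
        # At least one entity is missing from the entity ontology.
        return []
    return RELATION_ONTOLOGY.get((head_type, tail_type), [])


anode_relations = candidate_relations("anode", "graphite")
unknown_relations = candidate_relations("anode", "two main types")
```

Restricting each pair to relations licensed by its entity types is one plausible way to avoid vague relations such as "is" or "has", though the abstract does not spell out the mechanism at this level of detail.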