AITopics | Indian Ocean

Indian Ocean

Reconstructing the Tropical Pacific Upper Ocean using Online Data Assimilation with a Deep Learning model

arXiv.org Artificial Intelligence, Jun-11-2024

A deep learning (DL) model, based on a transformer architecture, is trained on a climate-model dataset and compared with a standard linear inverse model (LIM) in the tropical Pacific. We show that the DL model produces more accurate forecasts than the LIM when tested on a reanalysis dataset. We then assess the ability of an ensemble Kalman filter to reconstruct the monthly-averaged upper ocean from a noisy set of 24 sea-surface temperature observations designed to mimic existing coral proxy measurements, and compare results for the DL model and the LIM. Because the DL model damps the signal, we implement a novel inflation technique that adds noise from hindcast experiments. Results show that assimilating observations with the DL model yields better reconstructions than the LIM for observation averaging times ranging from one month to one year. The improved reconstruction is due to the enhanced predictive capabilities of the DL model, which map the memory of past observations to future assimilation times.
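
A minimal sketch of the kind of analysis step described above, for illustration only: a perturbed-observation ensemble Kalman filter that processes observations one at a time (which assumes uncorrelated observation errors), preceded by additive inflation that adds a randomly drawn hindcast error vector to each ensemble member. The function names, the library of hindcast errors, and the serial update are assumptions, not the authors' implementation; the DL or LIM forecast that would produce the prior ensemble is left out.

```python
import random
import statistics

def add_hindcast_noise(ensemble, hindcast_errors, rng):
    """Additive inflation: add one randomly drawn hindcast error vector to each member.

    `hindcast_errors` is a hypothetical library of (forecast - truth) vectors
    collected from hindcast experiments with the forecast model.
    """
    return [[x + e for x, e in zip(member, rng.choice(hindcast_errors))]
            for member in ensemble]

def enkf_update_one_obs(ensemble, obs_index, obs_value, obs_var, rng):
    """Perturbed-observation EnKF update for a direct observation of one state element."""
    n, dim = len(ensemble), len(ensemble[0])
    hx = [member[obs_index] for member in ensemble]  # ensemble in observation space
    hx_mean = statistics.fmean(hx)
    hx_var = statistics.variance(hx)  # sample variance, n - 1 denominator
    means = [statistics.fmean(m[j] for m in ensemble) for j in range(dim)]
    # Sample covariance between every state element and the observed element.
    cov = [sum((m[j] - means[j]) * (h - hx_mean) for m, h in zip(ensemble, hx)) / (n - 1)
           for j in range(dim)]
    gain = [c / (hx_var + obs_var) for c in cov]  # Kalman gain for a scalar observation
    updated = []
    for member, h in zip(ensemble, hx):
        # Each member is nudged toward its own noisy copy of the observation.
        perturbed_obs = obs_value + rng.gauss(0.0, obs_var ** 0.5)
        updated.append([x + k * (perturbed_obs - h) for x, k in zip(member, gain)])
    return updated

def assimilate(forecast_ensemble, observations, obs_var, hindcast_errors, rng):
    """Inflate the forecast ensemble, then assimilate (index, value) observations serially."""
    ensemble = add_hindcast_noise(forecast_ensemble, hindcast_errors, rng)
    for obs_index, obs_value in observations:
        ensemble = enkf_update_one_obs(ensemble, obs_index, obs_value, obs_var, rng)
    return ensemble

rng = random.Random(0)
prior = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
errors = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(10)]
posterior = assimilate(prior, [(0, 2.0)], 0.05, errors, rng)
print(statistics.fmean(m[0] for m in posterior))  # pulled toward the observed value 2.0
```

Drawing the inflation noise from hindcast errors, rather than from an assumed Gaussian, lets the added spread carry the spatial structure of the model's actual forecast errors; whether the paper samples them this way is not stated in the abstract.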

assimilation, experiment, modeling earth system, (13 more...)

arXiv.org Artificial Intelligence

2406.07063

Country:

North America > United States > Washington > King County > Seattle (0.14)
Pacific Ocean (0.04)
Indian Ocean (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

Bean, Andrew M., Hellsten, Simi, Mayne, Harry, Magomere, Jabez, Chi, Ethan A., Chi, Ryan, Hale, Scott A., Kirk, Hannah Rose

arXiv.org Artificial Intelligence, Jun-11-2024

In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark covers more than 90 mostly low-resource languages, minimising issues of data contamination, and contains 1,133 problems across 6 formats and 5 levels of human difficulty. We assess performance with both direct accuracy and comparison to a no-context baseline to penalise memorisation. Scores from 11 state-of-the-art LLMs show the benchmark to be challenging, and models perform poorly on the higher-difficulty problems. On harder problems, even the top model achieved only 38.7% accuracy, a 24.7% improvement over the no-context baseline. Large closed models typically outperform open models, and in general, the higher-resource the language, the better the scores. These results indicate that, in the absence of memorisation, true multi-step out-of-domain reasoning remains a challenge for current language models.
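
The two scores mentioned above can be illustrated with a small sketch: accuracy with the puzzle context, and the gain over the same model answering with the context withheld. Exact-match scoring after case and whitespace normalization is a simplifying assumption here, and the answer strings are made up; the benchmark's real scoring rules may differ.

```python
def normalize(answer):
    """Case- and whitespace-insensitive form of an answer (a simplifying assumption)."""
    return " ".join(answer.lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of questions whose normalized prediction equals the reference."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need one prediction per reference")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

def gain_over_no_context(with_context, without_context, references):
    """Accuracy with the puzzle context minus accuracy with the context withheld.

    Answers a model already produces without the context (e.g., from memorisation)
    do not count toward the gain.
    """
    return (exact_match_accuracy(with_context, references)
            - exact_match_accuracy(without_context, references))

refs = ["kuna", "sipi ta", "mo", "lek"]  # hypothetical answers
print(gain_over_no_context(["Kuna", "sipi  ta", "ma", "lek"], ["kuna", "x", "y", "z"], refs))  # 0.5
```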

arxiv, benchmark, dataset, (15 more...)

arXiv.org Artificial Intelligence

2406.06196

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Sudan (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(11 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Education (1.00)
Leisure & Entertainment (0.67)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TAXI: Evaluating Categorical Knowledge Editing for Language Models

Powell, Derek, Gerych, Walter, Hartvigsen, Thomas

arXiv.org Artificial Intelligence, Jun-6-2024

Humans rarely learn one fact in isolation. Instead, learning a new fact induces knowledge of other facts about the world. For example, in learning that a korat is a type of cat, you also infer that it is a mammal and has claws, keeping your model of the world consistent. Knowledge editing aims to inject new facts into language models to improve their factuality, but current benchmarks fail to evaluate consistency, which is critical to ensure efficient, accurate, and generalizable edits. We manually create TAXI, a new benchmark dataset designed specifically to evaluate consistency in categorical knowledge edits. TAXI contains 11,120 multiple-choice queries for 976 edits spanning 41 categories (e.g., Dogs), 164 subjects (e.g., Labrador), and 183 properties (e.g., is a mammal). We then use TAXI to evaluate popular editors' categorical consistency, measuring how often editing a subject's category appropriately edits its properties. We find that 1) the editors achieve marginal, yet non-random consistency, 2) their consistency far underperforms human baselines, and 3) consistency is more achievable when editing atypical subjects. Our code and data are available at https://github.com/derekpowell/taxi.
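
A minimal sketch of the consistency measure described above, assuming each edit is followed by multiple-choice property probes whose expected answers follow from the subject's new category. The `PropertyProbe` record, its fields, and the example edit are hypothetical; comparing against uniform guessing is one simple way to check that consistency is "non-random".

```python
from dataclasses import dataclass

@dataclass
class PropertyProbe:
    """One multiple-choice query asked after an edit (field names are hypothetical)."""
    subject: str
    prop: str
    expected: str      # answer implied by the subject's new category
    model_answer: str  # option chosen by the edited model
    n_options: int     # number of answer options offered

def categorical_consistency(probes):
    """Fraction of property probes answered in line with the edited category."""
    if not probes:
        raise ValueError("need at least one probe")
    return sum(p.model_answer == p.expected for p in probes) / len(probes)

def chance_consistency(probes):
    """Consistency expected from uniform guessing among the options."""
    return sum(1 / p.n_options for p in probes) / len(probes)

# Hypothetical counterfactual edit: "a korat is a type of dog".
probes = [
    PropertyProbe("korat", "sound", "barks", "barks", 2),
    PropertyProbe("korat", "is a mammal", "yes", "yes", 2),
    PropertyProbe("korat", "young are called", "puppies", "kittens", 2),
]
print(round(categorical_consistency(probes), 3))  # 0.667
print(chance_consistency(probes))                 # 0.5
```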

category, consistency, editing, (14 more...)

arXiv.org Artificial Intelligence

2404.15004

Country:

North America > Canada > Newfoundland and Labrador > Labrador (0.25)
North America > United States > Arizona (0.05)
North America > United States > Virginia (0.04)
(9 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.49)
Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.32)

OCDB: Revisiting Causal Discovery with a Comprehensive Benchmark and Evaluation Framework

Zhou, Wei, Huang, Hong, Zhang, Guowen, Shi, Ruize, Yin, Kehan, Lin, Yuanyuan, Liu, Bang

arXiv.org Artificial Intelligence, Jun-6-2024

Large language models (LLMs) have excelled in various natural language processing tasks, but challenges in interpretability and trustworthiness persist, limiting their use in high-stakes fields. Causal discovery offers a promising approach to improve transparency and reliability. However, current evaluations are often one-sided and lack assessments focused on interpretability performance. Additionally, these evaluations rely on synthetic data and lack comprehensive assessments on real-world datasets. As a result, promising methods may be overlooked. To address these issues, we propose a flexible evaluation framework with metrics for evaluating differences in causal structures and causal effects, which are crucial attributes that help improve the interpretability of LLMs. We introduce the Open Causal Discovery Benchmark (OCDB), based on real data, to promote fair comparisons and drive the optimization of algorithms. Additionally, our new metrics account for undirected edges, enabling fair comparisons between Directed Acyclic Graphs (DAGs) and Completed Partially Directed Acyclic Graphs (CPDAGs). Experimental results show significant shortcomings in existing algorithms' ability to generalize to real data, highlighting the potential for performance improvement and the importance of our framework in advancing causal discovery techniques.
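
To make "metrics that account for undirected edges" concrete, here is a sketch of a plain structural Hamming distance (SHD) over partially directed graphs, where an undirected edge is stored in both directions. This is a standard baseline count, not the OCDB metric itself, whose exact rule is not given in the abstract.

```python
from itertools import combinations

def edge_mark(edges, a, b):
    """Relation between a and b: None, '->', '<-', or '--' (undirected)."""
    forward, backward = (a, b) in edges, (b, a) in edges
    if forward and backward:
        return "--"  # undirected edges are stored in both directions
    if forward:
        return "->"
    if backward:
        return "<-"
    return None

def structural_hamming_distance(true_edges, est_edges, nodes):
    """Count node pairs whose edge status (absent, either direction, undirected) differs."""
    return sum(
        edge_mark(true_edges, a, b) != edge_mark(est_edges, a, b)
        for a, b in combinations(sorted(nodes), 2)
    )

# Hypothetical true CPDAG: X - Y (undirected), Y -> Z. Estimated DAG: X -> Y, Y -> Z, X -> Z.
cpdag = {("X", "Y"), ("Y", "X"), ("Y", "Z")}
dag = {("X", "Y"), ("Y", "Z"), ("X", "Z")}
print(structural_hamming_distance(cpdag, dag, {"X", "Y", "Z"}))  # 2
```

Under this plain count, a DAG that orients an undirected CPDAG edge either way is charged one error (the X, Y pair above), which is exactly the unfairness the abstract says the new metrics are designed to avoid.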

causal effect, causal graph, dataset, (11 more...)

arXiv.org Artificial Intelligence

2406.04598

Country:

Oceania > Australia > Tasmania (0.04)
North America > Canada > Quebec > Montreal (0.04)
Indian Ocean > Bass Strait (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.54)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Ahmed, Shafiuddin Rehan, Wang, Zhiyong Eric, Baker, George Arthur, Stowe, Kevin, Martin, James H.

arXiv.org Artificial Intelligence, Jun-5-2024

The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid expensive re-annotation of coreference links. Our results show that existing methods that work well on ECB+ struggle with ECB+META, paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref
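
The claim that coreferring triggers lack lexical diversity can be made measurable with a simple ratio: the share of coreferent mention pairs whose triggers are the same word. This is a hypothetical illustration, not a statistic reported by the authors; lowercasing stands in for proper lemmatization, and the clusters are made up.

```python
from itertools import combinations

def same_trigger_ratio(clusters):
    """Fraction of coreferent mention pairs whose triggers match after lowercasing.

    `clusters` is a list of coreference clusters, each a list of trigger strings.
    Lowercasing is a crude stand-in for lemmatization.
    """
    same = total = 0
    for triggers in clusters:
        for a, b in combinations(triggers, 2):
            total += 1
            same += a.lower() == b.lower()
    return same / total if total else 0.0

literal = [["attack", "attack", "Attack"], ["resigned", "resigned"]]
metaphoric = [["attack", "storm", "Attack"], ["resigned", "bowed out"]]
print(same_trigger_ratio(literal))     # 1.0
print(same_trigger_ratio(metaphoric))  # 0.25
```

A high ratio means a simple same-word heuristic already links most coreferent mentions; metaphoric paraphrasing lowers the ratio, which is the intuition behind the harder dataset.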

computational linguistic, metaphor, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2407.11988

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Indian Ocean > Arabian Sea > Gulf of Aden (0.05)
Asia > Middle East > Yemen (0.05)
(16 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Graph Neural Network Enhanced Retrieval for Question Answering of LLMs

Li, Zijian, Guo, Qingyan, Shao, Jiawei, Song, Lei, Bian, Jiang, Zhang, Jun, Wang, Rui

arXiv.org Artificial Intelligence, Jun-3-2024

Retrieval-augmented generation has revolutionized large language model (LLM) outputs by providing factual support. Nevertheless, it struggles to capture all the knowledge needed for complex reasoning questions. Existing retrieval methods typically divide reference documents into passages and treat them in isolation. These passages, however, are often interrelated, for example because they are contiguous or share the same keywords. Recognizing this relatedness is therefore crucial for enhancing retrieval. In this paper, we propose a novel retrieval method, called GNN-Ret, which leverages graph neural networks (GNNs) to enhance retrieval by considering the relatedness between passages. Specifically, we first construct a graph of passages by connecting passages that are structure-related and keyword-related. A GNN is then leveraged to exploit the relationships between passages and improve the retrieval of supporting passages. Furthermore, we extend our method to handle multi-hop reasoning questions using a recurrent graph neural network (RGNN), named RGNN-Ret. At each step, RGNN-Ret integrates the graphs of passages from previous steps, thereby enhancing the retrieval of supporting passages. Extensive experiments on benchmark datasets demonstrate that GNN-Ret achieves higher question-answering accuracy with a single LLM query than strong baselines that require multiple queries, and RGNN-Ret further improves accuracy and achieves state-of-the-art performance, with up to 10.4% accuracy improvement on the 2WikiMQA dataset.
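
A minimal sketch of the passage graph described above: passages are linked when they are contiguous in the same document (structure-related) or share a keyword (keyword-related), and query-similarity scores are then smoothed over the graph. The fixed mean aggregation below is a simplified, non-learned stand-in for the trained GNN; the `Passage` record, keyword sets, and `alpha` are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Passage:
    doc_id: str
    position: int  # index of the passage within its document
    keywords: set = field(default_factory=set)  # assumed precomputed

def build_passage_graph(passages):
    """Link passages that are adjacent in one document or share at least one keyword."""
    neighbors = {i: set() for i in range(len(passages))}
    for i, p in enumerate(passages):
        for j in range(i + 1, len(passages)):
            q = passages[j]
            contiguous = p.doc_id == q.doc_id and abs(p.position - q.position) == 1
            if contiguous or (p.keywords & q.keywords):
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

def propagate_scores(scores, neighbors, alpha=0.5, rounds=1):
    """Blend each passage's query-similarity score with the mean of its neighbors' scores."""
    for _ in range(rounds):
        new_scores = []
        for i, s in enumerate(scores):
            nbrs = neighbors[i]
            nbr_mean = sum(scores[j] for j in nbrs) / len(nbrs) if nbrs else s
            new_scores.append(alpha * s + (1 - alpha) * nbr_mean)
        scores = new_scores
    return scores

passages = [
    Passage("A", 0, {"Einstein"}),
    Passage("A", 1, {"Ulm"}),
    Passage("B", 0, {"Ulm", "Germany"}),
    Passage("C", 0, {"Paris"}),
]
graph = build_passage_graph(passages)
print(graph)
print(propagate_scores([0.9, 0.2, 0.1, 0.4], graph))
```

Here passage 1 scores poorly on its own but gains from its strongly matching neighbor, which is the effect the graph is meant to capture for supporting passages in multi-hop questions.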

dataset, retrieval, semantic distance, (15 more...)

arXiv.org Artificial Intelligence

2406.06572

Country:

Europe > Germany > Berlin (0.14)
Oceania > Australia (0.04)
North America > United States > Virginia > Richmond (0.04)
(14 more...)

Genre:

Workflow (1.00)
Research Report (1.00)
Personal > Obituary (0.68)

Industry:

Media > Film (0.93)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

Schimanski, Tobias, Ni, Jingwei, Kraus, Mathias, Ash, Elliott, Leippold, Markus

arXiv.org Artificial Intelligence, Jun-3-2024

Advances towards more faithful and traceable answers from Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue to this goal is basing the answers on reliable sources. However, this Evidence-Based QA has so far worked poorly with LLMs, both in citing the correct sources (source quality) and in truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance both in and out of distribution. Furthermore, we show that data quality, which can be drastically improved by the proposed quality filters, matters more than quantity in improving Evidence-Based QA.
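
As a rough illustration of what an automated quality filter in such a pipeline might check, here is a hypothetical citation filter: a synthetic answer is kept only if every sentence cites at least one source as `[n]` and every cited id refers to a source that was actually provided. The citation format and the naive sentence splitter are assumptions; this checks only citation validity (a proxy for source quality), and judging answer attributability would additionally need a check that each cited source supports its sentence.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")

def passes_citation_filter(answer, source_ids):
    """Keep a synthetic sample only if every sentence cites at least one source
    and every cited id refers to a source that was actually provided."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return False
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited or any(c not in source_ids for c in cited):
            return False
    return True

print(passes_citation_filter("Sea levels are rising [1]. Ice sheets are melting [2].", {1, 2}))  # True
print(passes_citation_filter("Sea levels are rising [1]. Ice sheets are melting.", {1, 2}))      # False
print(passes_citation_filter("Sea levels are rising [3].", {1, 2}))                              # False
```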

dataset, evidence-based qa, llm, (15 more...)

arXiv.org Artificial Intelligence

2402.08277

Country:

North America > United States (0.14)
Asia > Singapore (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Improving Open-Ended Text Generation via Adaptive Decoding

Zhu, Wenhong, Hao, Hongkun, He, Zhiwei, Ai, Yiming, Wang, Rui

arXiv.org Artificial Intelligence, Jun-2-2024

Current language models decode text token by token according to a probability distribution, and determining the appropriate candidates for the next token is crucial to ensure generation quality. This study introduces adaptive decoding, a mechanism that lets language models dynamically determine a sensible candidate set during generation. Specifically, we introduce an entropy-based metric called confidence and frame the search for the optimal candidate set as a confidence-increasing process. Whether a token should be included in the candidate set is judged by the increase in confidence it brings. Experimental results show that our method balances diversity and coherence well. Human evaluation shows that our method generates human-preferred text. Additionally, our method can potentially improve the reasoning ability of language models.
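
The "confidence-increasing process" can be sketched as a greedy loop: tokens are added in order of probability while the gain in an entropy-based score exceeds a threshold `epsilon`. The score used here is an illustrative stand-in, not necessarily the paper's exact definition: it is the entropy of the distribution in which the current candidates keep their probabilities and every other token is merged into a single "other" outcome. The most probable token is always kept.

```python
import math
import random

def _plogp(p):
    """Entropy contribution -p * ln(p); (near-)zero mass contributes nothing."""
    return -p * math.log(p) if p > 1e-12 else 0.0

def lumped_entropy(sorted_probs, k):
    """Entropy when the top-k tokens are kept and the rest is merged into one outcome."""
    head = sorted_probs[:k]
    return sum(_plogp(p) for p in head) + _plogp(1.0 - sum(head))

def adaptive_candidate_set(probs, epsilon):
    """Indices of the candidate set: grow it while the score gain exceeds epsilon."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    sorted_probs = [probs[i] for i in order]
    k = 1  # the most probable token is always a candidate
    while k < len(sorted_probs):
        gain = lumped_entropy(sorted_probs, k + 1) - lumped_entropy(sorted_probs, k)
        if gain <= epsilon:
            break
        k += 1
    return order[:k]

def sample_next_token(probs, epsilon, rng):
    """Sample from the candidate set, renormalizing its probabilities."""
    candidates = adaptive_candidate_set(probs, epsilon)
    return rng.choices(candidates, weights=[probs[i] for i in candidates], k=1)[0]

print(adaptive_candidate_set([0.9, 0.05, 0.03, 0.02], 0.1))  # [0]
print(adaptive_candidate_set([0.25, 0.25, 0.25, 0.25], 0.1))
print(sample_next_token([0.9, 0.05, 0.03, 0.02], 0.1, random.Random(0)))
```

With this stand-in score, a sharply peaked distribution keeps a single candidate while a flat one keeps several, which is the adaptive behavior the abstract describes.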

algorithm, computational linguistic, open-ended text generation, (13 more...)

arXiv.org Artificial Intelligence

2402.18223

Country:

Europe > Austria > Vienna (0.14)
Africa > Democratic Republic of the Congo (0.14)
Africa > Gabon (0.04)
(18 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Military (0.92)
Government > Voting & Elections (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Exploring the Potential of Hybrid Machine-Learning/Physics-Based Modeling for Atmospheric/Oceanic Prediction Beyond the Medium Range

Patel, Dhruvit, Arcomano, Troy, Hunt, Brian, Szunyogh, Istvan, Ott, Edward

arXiv.org Artificial Intelligence, May-29-2024

This paper explores the potential of a hybrid modeling approach that combines machine learning (ML) with conventional physics-based modeling for weather prediction beyond the medium range. It extends the work of Arcomano et al. (2022), which tested the approach for short- and medium-range weather prediction, and the work of Arcomano et al. (2023), which investigated its potential for climate modeling. The hybrid model used for the forecast experiments of the paper is based on SPEEDY, a low-resolution atmospheric general circulation model (AGCM) with simplified parameterizations. In addition to the hybridized prognostic variables of SPEEDY, the current version of the model has three purely ML-based prognostic variables: 6 h cumulative precipitation, sea surface temperature, and the heat content of the top 300 m of the ocean. The model has skill in predicting the El Niño cycle and its global teleconnections with precipitation for 3-7 months, depending on the season. The model captures equatorial variability of precipitation associated with Kelvin and Rossby waves and the Madden-Julian Oscillation (MJO). Predictions of equatorial precipitation have skill for 15 days in the East Pacific and 11.5 days in the West Pacific. Although the model has low spatial resolution, for these tasks its prediction skill is comparable to what has been published for high-resolution, purely physics-based, conventional operational forecast models.
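
The abstract reports skill as a number of months or days but does not say how skill was scored. A common convention, assumed here, is the anomaly correlation coefficient (ACC) between forecast and observed anomalies at each lead time, with the skillful range taken as the longest lead before the ACC first drops below a threshold. The threshold and the ACC values in the example are hypothetical.

```python
import math

def anomaly_correlation(forecast_anoms, observed_anoms):
    """Centered (Pearson) correlation between forecast and observed anomalies.

    Climatology is assumed to have been removed from both series beforehand.
    """
    n = len(forecast_anoms)
    mf = sum(forecast_anoms) / n
    mo = sum(observed_anoms) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast_anoms, observed_anoms))
    var_f = sum((f - mf) ** 2 for f in forecast_anoms)
    var_o = sum((o - mo) ** 2 for o in observed_anoms)
    return cov / math.sqrt(var_f * var_o)

def skillful_lead(acc_by_lead, threshold):
    """Longest lead time reached before the correlation first falls below threshold."""
    lead = 0
    for lead_time, acc in sorted(acc_by_lead.items()):
        if acc < threshold:
            break
        lead = lead_time
    return lead

print(anomaly_correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))            # 1.0
print(skillful_lead({1: 0.9, 2: 0.8, 3: 0.55, 4: 0.7}, threshold=0.6))  # 2
```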

forecast, model component, prediction, (13 more...)

arXiv.org Artificial Intelligence

2405.19518

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Pacific Ocean (0.04)
South America (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.91)

JADS: A Framework for Self-supervised Joint Aspect Discovery and Summarization

Guo, Xiaobo, Desai, Jay, Sengamedu, Srinivasan H.

arXiv.org Artificial Intelligence, May-28-2024

To generate summaries that include multiple aspects or topics for text documents, most approaches use clustering or topic modeling to group relevant sentences and then generate a summary for each group. These approaches struggle to optimize the summarization and clustering algorithms jointly. On the other hand, aspect-based summarization requires known aspects. Our solution integrates topic discovery and summarization into a single step. Given text data, our Joint Aspect Discovery and Summarization algorithm (JADS) discovers aspects from the input and generates a summary of the topics in one step. We propose a self-supervised framework that creates a labeled dataset by first mixing sentences from multiple documents (e.g., CNN/DailyMail articles) as the input and then using the article summaries from the mixture as the labels. The JADS model outperforms the two-step baselines. With pretraining, the model achieves better performance and stability. Furthermore, embeddings derived from JADS exhibit superior clustering capabilities. Our proposed method achieves higher semantic alignment with the ground truth and produces factual summaries.
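
The self-supervised labeling step described above can be sketched as follows: pick several articles, shuffle their sentences together to form the input, and use their reference summaries as the label. The article fields, full shuffling of sentences, and newline-separated labels are assumptions for illustration; the paper's exact mixing scheme may differ.

```python
import random

def make_mixed_example(articles, k, rng):
    """Build one self-supervised (input, label) pair from k randomly chosen articles.

    Each article is a dict with hypothetical keys "sentences" (list of str) and
    "summary" (str). The input mixes all their sentences in random order; the
    label lists their summaries, one per line, so the number of summaries equals
    the number of mixed documents.
    """
    chosen = rng.sample(articles, k)
    sentences = [s for article in chosen for s in article["sentences"]]
    rng.shuffle(sentences)
    source = " ".join(sentences)
    target = "\n".join(article["summary"] for article in chosen)
    return source, target

articles = [
    {"sentences": ["Storm hits coast.", "Thousands evacuated."], "summary": "A storm forced evacuations."},
    {"sentences": ["Team wins final.", "Fans celebrate."], "summary": "The team won the final."},
    {"sentences": ["Rates rise again."], "summary": "Interest rates went up."},
]
source, target = make_mixed_example(articles, 2, random.Random(1))
print(source)
print(target)
```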

dataset, summarization, summary number, (14 more...)

arXiv.org Artificial Intelligence

2405.18642

Country:

Asia > Middle East > Iraq (0.14)
Europe > United Kingdom > England > Tyne and Wear > Sunderland (0.04)
North America > United States > New York (0.04)
(23 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Government > Military (1.00)
Leisure & Entertainment > Sports > Football (0.93)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
