AITopics | Antarctica

Antarctica

Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs

Feucht, Sheridan, Atkinson, David, Wallace, Byron, Bau, David

arXiv.org Artificial Intelligence, Jun-28-2024

LLMs process text as sequences of tokens that roughly correspond to words, where less common words are represented by multiple tokens. However, individual tokens are often semantically unrelated to the meanings of the words/concepts they comprise. For example, Llama-2-7b's tokenizer splits the word "northeastern" into the tokens ['_n', 'ort', 'he', 'astern'], none of which correspond to semantically meaningful units like "north" or "east." Similarly, the overall meanings of named entities like "Neil Young" and multi-word expressions like "break a leg" cannot be directly inferred from their constituent tokens. Mechanistically, how do LLMs convert such arbitrary groups of tokens into useful higher-level representations? In this work, we find that last token representations of named entities and multi-token words exhibit a pronounced "erasure" effect, where information about previous and current tokens is rapidly forgotten in early layers. Using this observation, we propose a method to "read out" the implicit vocabulary of an autoregressive LLM by examining differences in token representations across layers, and present results of this method for Llama-2-7b and Llama-3-8B. To our knowledge, this is the first attempt to probe the implicit vocabulary of an LLM.

llama-3-8b, probe, sequence, (16 more...)

arXiv.org Artificial Intelligence

2406.20086

Country:

South America > Brazil > São Paulo (0.04)
Oceania > Australia > New South Wales (0.04)
North America > United States > North Dakota (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.94)
Media > Film (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.81)

Building Understandable Messaging for Policy and Evidence Review (BUMPER) with AI

Rosenfeld, Katherine A., Sonnewald, Maike, Jindal, Sonia J., McCarthy, Kevin A., Proctor, Joshua L.

arXiv.org Artificial Intelligence, Jun-27-2024

We introduce a framework for the use of large language models (LLMs) in Building Understandable Messaging for Policy and Evidence Review (BUMPER). LLMs are proving capable of providing interfaces for understanding and synthesizing large databases of diverse media. This presents an exciting opportunity to supercharge the translation of scientific evidence into policy and action, thereby improving livelihoods around the world. However, these models also pose challenges related to access, trustworthiness, and accountability. The BUMPER framework is built atop a scientific knowledge base (e.g., documentation, code, survey data) by the same scientists (e.g., individual contributor, lab, consortium). We focus on a solution that builds trustworthiness through transparency, scope limiting, explicit checks, and uncertainty measures. LLMs are rapidly being adopted, and the consequences are poorly understood. The framework addresses open questions regarding the reliability of LLMs and their use in high-stakes applications. We provide a worked example in health policy for a model designed to inform measles control programs. We argue that this framework can facilitate accessibility of and confidence in scientific evidence for policymakers, drive a focus on policy-relevance and translatability for researchers, and ultimately increase and accelerate the impact of scientific knowledge used for policy decisions.

cameroon, sia, supplementary immunization activity, (15 more...)

arXiv.org Artificial Intelligence

2407.12812

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Cameroon (0.07)
Asia > Pakistan (0.04)
(15 more...)

Genre:

Research Report (0.53)
Workflow (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area > Vaccines (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Graph Neural Network as Computationally Efficient Emulator of Ice-sheet and Sea-level System Model (ISSM)

Koo, Younghyun, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence, Jun-26-2024

The Ice-sheet and Sea-level System Model (ISSM) provides solutions for Stokes equations relevant to ice sheet dynamics by employing finite element and fine mesh adaption. However, since its finite element method is compatible only with Central Processing Units (CPUs), the ISSM has limits on further economizing computational time. Thus, by taking advantage of Graphics Processing Units (GPUs), we design a graph convolutional network (GCN) as a fast emulator for ISSM. The GCN is trained and tested using the 20-year transient ISSM simulations in the Pine Island Glacier (PIG). The GCN reproduces ice thickness and velocity with a correlation coefficient greater than 0.998, outperforming the traditional convolutional neural network (CNN). Additionally, the GCN runs 34 times faster than the CPU-based ISSM. The GPU-based GCN emulator allows us to predict how the PIG will change in the future under different melting rate scenarios with high fidelity and much faster computational time.
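
To make the idea of a graph convolution over mesh nodes concrete, here is a minimal sketch of one layer using the widely used normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W). The abstract does not say which GCN variant, features, or layer sizes the authors use, so the function name `gcn_layer` and the toy inputs are assumptions for illustration only.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adjacency: n x n list of lists (1.0 where two mesh nodes share an edge)
    features:  n x f node features (e.g. hypothetical thickness/velocity inputs)
    weights:   f x o linear map (learned in a real model, fixed here)
    """
    n = len(adjacency)
    # Add self-loops so every node keeps its own features when aggregating.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization keeps feature scales comparable across nodes.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n))
                   for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Toy graph: two connected nodes with one feature each.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))
```

Because the layer works on an arbitrary adjacency structure, it can follow an irregular finite-element mesh directly, which is the property the abstract contrasts with grid-based CNNs.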

emulator, ice velocity, thickness, (15 more...)

arXiv.org Artificial Intelligence

2407.01464

Country:

Antarctica > West Antarctica (0.05)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > Greenland (0.04)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Graph Neural Networks for Emulation of Finite-Element Ice Dynamics in Greenland and Antarctic Ice Sheets

Koo, Younghyun, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence, Jun-26-2024

Although numerical models provide accurate solutions for ice sheet dynamics based on physical laws, solving the underlying partial differential equations is computationally demanding. In recent years, convolutional neural networks (CNNs) have been widely used as statistical emulators for those numerical models. However, since CNNs operate on regular grids, they cannot represent the refined meshes and computational efficiency of finite-element numerical models. Therefore, instead of CNNs, this study adopts an equivariant graph convolutional network (EGCN) as an emulator for ice sheet dynamics modeling. EGCN reproduces ice thickness and velocity changes in the Helheim Glacier, Greenland, and Pine Island Glacier, Antarctica, with 260 times and 44 times faster computation time, respectively. Compared to the traditional CNN and graph convolutional network, EGCN shows outstanding accuracy in thickness prediction near fast ice streams by preserving the equivariance to the translation and rotation of graphs.

emulator, helheim glacier, ice sheet, (15 more...)

arXiv.org Artificial Intelligence

2406.18423

Country:

North America > Greenland (0.63)
Antarctica (0.26)
Southern Ocean > Ross Sea > Amundsen Sea (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation

Qian, Kun, Wan, Shunji, Tang, Claudia, Wang, Youzhi, Zhang, Xuanming, Chen, Maximillian, Yu, Zhou

arXiv.org Artificial Intelligence, Jun-26-2024

As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing to evaluate their language model to submit the model's predictions for centralized processing and then publish the model's result on their leaderboard. However, this submission process is inefficient and prevents effective error analysis. To address this issue, we propose to variabilize benchmarks and evaluate language models dynamically. Specifically, we extract variables from each test case and define a value range for each variable. For each evaluation, we sample new values from these value ranges to create unique test cases, thus ensuring a fresh evaluation each time. We applied this variable perturbation method to four datasets: GSM8K, ARC, CommonsenseQA, and TruthfulQA, which cover mathematical generation and multiple-choice tasks. Our experimental results demonstrate that this approach provides a more accurate assessment of the true capabilities of language models, effectively mitigating the contamination problem.
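
The variable perturbation idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the template, the variable names `n` and `k`, their value ranges, and the helper names `gold_answer` and `sample_case` are hypothetical and are not taken from the paper or its datasets.

```python
import random

# Hypothetical variabilized test case: placeholders in braces, one value
# range per variable, and a function that recomputes the reference answer.
TEMPLATE = "A shop sells {n} boxes with {k} pens each. How many pens are there in total?"
RANGES = {"n": range(2, 20), "k": range(3, 12)}


def gold_answer(values):
    """Recompute the reference answer from the sampled variable values."""
    return values["n"] * values["k"]


def sample_case(rng):
    """Draw one value per variable and instantiate a fresh question."""
    values = {name: rng.choice(choices) for name, choices in RANGES.items()}
    return TEMPLATE.format(**values), gold_answer(values)


# Each evaluation run can use a new seed, so the concrete numbers differ
# from anything a model may have memorized from a static test set.
rng = random.Random(2024)
question, answer = sample_case(rng)
print(question)
print(answer)
```

Keeping the answer as a function of the variables, rather than a stored label, is what lets every sampled instance be scored locally without a closed test set.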

contamination, customer, evaluation, (15 more...)

arXiv.org Artificial Intelligence

2406.17681

Country:

North America > United States > Iowa (0.04)
Europe > United Kingdom > England (0.04)
North America > United States > New York (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Education > Health & Safety > School Nutrition (1.00)
Health & Medicine > Therapeutic Area (0.93)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Stars take over Paris for sporty Vogue fashion show

BBC News, Jun-23-2024, 22:45:41 GMT

Singers, supermodels and sports stars descended on Paris as Vogue World took over a city square and turned it into a runway. The fashion magazine turned the historic Place Vendôme into a catwalk to celebrate 100 years of French fashion. A different sport was used as a backdrop for each decade of fashion from the 1920s to the present day - a month before the capital city hosts the Olympic Games.

paris, sporty vogue fashion show, star take, (3 more...)

BBC News

Country:

South America (0.17)
North America > Central America (0.17)
Oceania > Australia (0.07)
(16 more...)

Industry:

Textiles, Apparel & Luxury Goods (1.00)
Leisure & Entertainment > Sports > Olympic Games (0.57)

Technology: Information Technology > Artificial Intelligence (0.75)

Learning Spatio-Temporal Patterns of Polar Ice Layers With Physics-Informed Graph Neural Network

Liu, Zesheng, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence, Jun-21-2024

Learning spatio-temporal patterns of polar ice layers is crucial for monitoring the change in ice sheet balance and evaluating ice dynamic processes. While a few researchers focus on learning ice layer patterns from echogram images captured by airborne snow radar sensors via different convolutional neural networks, the noise in the echogram images proves to be a major obstacle. Instead, we focus on geometric deep learning based on graph neural networks to learn the spatio-temporal patterns from thickness information of shallow ice layers and make predictions for deep layers. In this paper, we propose a physics-informed hybrid graph neural network that combines the GraphSAGE framework for graph feature learning with the long short-term memory (LSTM) structure for learning temporal changes, and introduce measurements of physical ice properties from Model Atmospheric Regional (MAR) weather model as physical node features. We found that our proposed network can consistently outperform the current non-inductive or non-physical model in predicting deep ice layer thickness.

echogram image, ice layer, neural network, (14 more...)

arXiv.org Artificial Intelligence

2406.15299

Country:

North America > Greenland (0.07)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
Antarctica (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens

Fan, Yongqi, Sun, Hongli, Xue, Kui, Zhang, Xiaofan, Zhang, Shaoting, Ruan, Tong

arXiv.org Artificial Intelligence, Jun-21-2024

Numerous advanced Large Language Models (LLMs) now support context lengths up to 128K, and some extend to 200K. Some benchmarks in the generic domain have also followed up on evaluating long-context capabilities. In the medical domain, tasks are distinctive due to the unique contexts and need for domain expertise, necessitating further evaluation. However, despite the frequent presence of long texts in medical scenarios, evaluation benchmarks of long-context capabilities for LLMs in this field are still rare. In this paper, we propose MedOdyssey, the first medical long-context benchmark with seven length levels ranging from 4K to 200K tokens. MedOdyssey consists of two primary components: the medical-context "needles in a haystack" task and a series of tasks specific to medical applications, together comprising 10 datasets. The first component includes challenges such as counter-intuitive reasoning and novel (unknown) facts injection to mitigate knowledge leakage and data contamination of LLMs. The second component confronts the challenge of requiring professional medical expertise. In particular, we design the "Maximum Identical Context" principle to improve fairness by guaranteeing that different LLMs observe as many identical contexts as possible. Our experiment evaluates advanced proprietary and open-source LLMs tailored for processing long contexts and presents detailed performance analyses. This highlights that LLMs still face challenges and that further research is needed in this area. Our code and data are released in the repository: https://github.com/JOHNNY-fans/MedOdyssey.
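
For readers unfamiliar with the "needles in a haystack" setup mentioned above, the following is a rough sketch of how such a test item is commonly built: a short fact is placed at a chosen relative depth inside a long distractor context. The function name `insert_needle`, the sentence-level granularity, and the example sentences are assumptions for illustration and are not drawn from the MedOdyssey repository.

```python
def insert_needle(haystack_sentences, needle, depth):
    """Place `needle` at a relative depth in [0, 1] of the sentence list.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    position = int(depth * len(haystack_sentences))
    return haystack_sentences[:position] + [needle] + haystack_sentences[position:]


# Hypothetical filler text and needle; a real benchmark would use long
# domain documents and facts the model cannot already know.
haystack = [f"Filler sentence number {i}." for i in range(4)]
needle = "The invented marker XQ-17 equals 42."
context = " ".join(insert_needle(haystack, needle, 0.5))
print(context)
```

Sweeping `depth` and the haystack length gives a grid of retrieval cases, which is the usual way such tasks probe whether accuracy depends on where the fact sits in the context.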

claude 3, llm, niah, (15 more...)

arXiv.org Artificial Intelligence

2406.15019

Country:

Asia > China > Shanghai > Shanghai (0.05)
Antarctica (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (0.87)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Li, Wenshuo, Chen, Xinghao, Shu, Han, Tang, Yehui, Wang, Yunhe

arXiv.org Artificial Intelligence, Jun-17-2024

Large language models (LLMs) have recently attracted significant attention in the field of artificial intelligence. However, the training process of these models poses significant challenges in terms of computational and storage capacities, so compressing checkpoints has become an urgent problem. In this paper, we propose a novel Extreme Checkpoint Compression (ExCP) framework, which significantly reduces the required storage of training checkpoints while achieving nearly lossless performance. We first calculate the residuals of adjacent checkpoints to obtain the essential but sparse information for a higher compression ratio. To further excavate the redundant parameters in checkpoints, we then propose a weight-momentum joint shrinking method that utilizes another important source of information during model optimization, i.e., momentum. In particular, we exploit the information of both model and optimizer to discard as many parameters as possible while preserving critical information to ensure optimal performance. Furthermore, we utilize non-uniform quantization to further compress the storage of checkpoints. We extensively evaluate our proposed ExCP framework on several models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately 70× compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks. Code will be available at https://github.com/Gaffey/ExCP.
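
The first step described above, storing the residual between adjacent checkpoints and keeping only its significant entries, can be illustrated with a small sketch. Note the assumptions: a plain magnitude threshold stands in for the paper's weight-momentum joint shrinking, quantization is omitted, and the names `residual`, `prune`, and `restore` are hypothetical rather than taken from the ExCP code.

```python
def residual(prev, curr):
    """Element-wise difference between two adjacent checkpoints."""
    return [c - p for p, c in zip(prev, curr)]


def prune(values, threshold):
    """Keep only entries whose magnitude exceeds `threshold`, as {index: value}.

    Simplified stand-in: the paper's criterion also uses optimizer momentum.
    """
    return {i: v for i, v in enumerate(values) if abs(v) > threshold}


def restore(prev, sparse):
    """Rebuild an approximate checkpoint from the previous one plus the sparse residual."""
    out = list(prev)
    for i, v in sparse.items():
        out[i] += v
    return out


# Toy weights: most parameters barely move between adjacent checkpoints.
prev = [0.5, -1.0, 2.0, 0.25]
curr = [0.5004, -1.2, 2.0, 0.75]
sparse = prune(residual(prev, curr), threshold=0.01)
approx = restore(prev, sparse)
print(sparse)
print(approx)
```

Only the entries that changed noticeably are stored, so the saved object shrinks as training steps get smaller; the reconstruction differs from the true checkpoint by at most the threshold per parameter.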

checkpoint, extreme llm checkpoint compression, weight-momentum joint, (9 more...)

arXiv.org Artificial Intelligence

2406.11257

Country:

Europe > Austria > Vienna (0.14)
South America (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations

Yu, Lei, Cao, Meng, Cheung, Jackie Chi Kit, Dong, Yue

arXiv.org Artificial Intelligence, Jun-17-2024

State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. To explore the mechanistic causes of these hallucinations, we create diagnostic datasets with subject-relation queries and adapt interpretability methods to trace hallucinations through internal model representations. We discover two general and distinct mechanistic causes of hallucinations shared across LMs (Llama-2, Pythia, GPT-J): 1) knowledge enrichment hallucinations: insufficient subject attribute knowledge in lower layer MLPs, and 2) answer extraction hallucinations: failure to select the correct object attribute in upper layer attention heads. We also found these two internal mechanistic causes of hallucinations are reflected in external manifestations. Based on insights from our mechanistic analysis, we propose a novel hallucination mitigation method through targeted restoration of the LM's internal fact recall pipeline, demonstrating superior performance compared to baselines.

hallucination, knowledge, language model, (15 more...)

arXiv.org Artificial Intelligence

2403.18167

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)
