AITopics | Antarctica

Antarctica

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Fu, Chaoyou, Zhang, Renrui, Wang, Zihan, Huang, Yubo, Zhang, Zhengye, Qiu, Longtian, Ye, Gaoxiang, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Zhao, Sirui, Lin, Shaohui, Jiang, Deqiang, Yin, Di, Gao, Peng, Li, Ke, Li, Hongsheng, Sun, Xing

arXiv.org Artificial Intelligence, Dec-20-2023

The surge of interest towards Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks. Very recently, Google released Gemini, its newest and most capable MLLM built from the ground up for multi-modality. In light of the superior reasoning capabilities, can Gemini challenge GPT-4V's leading position in multi-modal learning? In this paper, we present a preliminary exploration of Gemini Pro's visual understanding proficiency, which comprehensively covers four domains: fundamental perception, advanced cognition, challenging vision tasks, and various expert capacities. We compare Gemini Pro with the state-of-the-art GPT-4V to evaluate its upper limits, along with the latest open-sourced MLLM, Sphinx, which reveals the gap between manual efforts and black-box systems. The qualitative samples indicate that, while GPT-4V and Gemini showcase different answering styles and preferences, they can exhibit comparable visual reasoning capabilities, and Sphinx still trails behind them concerning domain generalizability. Specifically, GPT-4V tends to elaborate detailed explanations and intermediate steps, and Gemini prefers to output a direct and concise answer. The quantitative evaluation on the popular MME benchmark also demonstrates the potential of Gemini to be a strong challenger to GPT-4V. Our early investigation of Gemini also observes some common issues of MLLMs, indicating that there still remains a considerable distance towards artificial general intelligence. Our project for tracking the progress of MLLM is released at https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models.

artwork recognition and description, movie recognition and description, visual reasoning table, (15 more...)

arXiv.org Artificial Intelligence

2312.12436

Country:

North America > United States (0.92)
Europe > France > Île-de-France > Paris > Paris (0.14)
Asia > Middle East > Jordan (0.04)
(11 more...)

Genre:

Research Report (1.00)
Workflow (0.67)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
(21 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Southern Ocean Dynamics Under Climate Change: New Knowledge Through Physics-Guided Machine Learning

Yik, William, Sonnewald, Maike, Clare, Mariana C. A., Lguensat, Redouane

arXiv.org Artificial Intelligence, Dec-17-2023

Complex ocean systems such as the Antarctic Circumpolar Current play key roles in the climate, and current models predict shifts in their strength and area under climate change. However, the physical processes underlying these changes are not well understood, in part due to the difficulty of characterizing and tracking changes in ocean physics in complex models. Using the Antarctic Circumpolar Current as a case study, we extend the method Tracking global Heating with Ocean Regimes (THOR) to a mesoscale eddy permitting climate model and identify regions of the ocean characterized by similar physics, called dynamical regimes, using readily accessible fields from climate models. To this end, we cluster grid cells into dynamical regimes and train an ensemble of neural networks, allowing uncertainty quantification, to predict these regimes and track them under climate change. Finally, we leverage this new knowledge to elucidate the dynamical drivers of the identified regime shifts as noted by the neural network using the 'explainability' methods SHAP and Layer-wise Relevance Propagation. A region undergoing a profound shift is where the Antarctic Circumpolar Current intersects the Pacific-Antarctic Ridge, an area important for carbon draw-down and fisheries. In this region, THOR specifically reveals a shift in dynamical regime under climate change driven by changes in wind stress and interactions with bathymetry. Using this knowledge to guide further exploration, we find that as the Antarctic Circumpolar Current shifts north under intensifying wind stress, the dominant dynamical role of bathymetry weakens and the flow intensifies.
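The ensemble idea in the abstract above (several networks predicting a dynamical regime per grid cell, with their disagreement serving as an uncertainty estimate) can be illustrated with a minimal sketch. This is not the THOR implementation: the function name `ensemble_predict`, the use of the mean class probability, and the entropy-based uncertainty score are all assumptions chosen for illustration.

```python
import math

def ensemble_predict(member_probs):
    """Combine class probabilities from several ensemble members for one grid cell.

    member_probs: list with one entry per ensemble member; each entry is a
    list of class (regime) probabilities. Hypothetical input format.
    Returns (predicted_regime_index, mean_probabilities, entropy).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the members' probabilities class by class.
    mean = [sum(m[k] for m in member_probs) / n_members for k in range(n_classes)]
    # Entropy of the averaged distribution: 0 when the mean is fully confident,
    # larger when probability mass is spread across regimes.
    entropy = -sum(p * math.log(p) for p in mean if p > 0)
    label = max(range(n_classes), key=lambda k: mean[k])
    return label, mean, entropy
```

Under these assumptions, a cell where every member assigns all probability to the same regime gets an entropy of zero, while a cell where members split between regimes gets a higher score and could be flagged as uncertain.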

dynamical regime, ensemble, prediction, (13 more...)

arXiv.org Artificial Intelligence

2310.13916

Country:

Southern Ocean > Weddell Sea (0.04)
Pacific Ocean (0.04)
North America > United States > California > Los Angeles County > Claremont (0.04)
(5 more...)

Genre:

Research Report (0.82)
Workflow (0.68)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

VideoLCM: Video Latent Consistency Model

Wang, Xiang, Zhang, Shiwei, Zhang, Han, Liu, Yu, Zhang, Yingya, Gao, Changxin, Sang, Nong

arXiv.org Artificial Intelligence, Dec-14-2023

Consistency models have demonstrated powerful capability in efficient image generation and allowed synthesis within a few sampling steps, alleviating the high computational cost in diffusion models. However, the consistency model in the more challenging and resource-consuming video generation is still less explored. In this report, we present the VideoLCM framework to fill this gap, which leverages the concept of consistency models from image generation to efficiently synthesize videos with minimal steps while maintaining high quality. VideoLCM builds upon existing latent video diffusion models and incorporates consistency distillation techniques for training the latent consistency model. Experimental results reveal the effectiveness of our VideoLCM in terms of computational efficiency, fidelity and temporal consistency. Notably, VideoLCM achieves high-fidelity and smooth video synthesis with only four sampling steps, showcasing the potential for real-time synthesis. We hope that VideoLCM can serve as a simple yet effective baseline for subsequent research. The source code and models will be publicly available.

consistency model, diffusion model, synthesis, (15 more...)

arXiv.org Artificial Intelligence

2312.09109

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > United Kingdom > England (0.04)
Europe > Spain > Balearic Islands (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization

Amar, Shmuel, Schiff, Liat, Ernst, Ori, Shefer, Asi, Shapira, Ori, Dagan, Ido

arXiv.org Artificial Intelligence, Dec-7-2023

The performance of automatic summarization models has improved dramatically in recent years. Yet, there is still a gap in meeting specific information needs of users in real-world scenarios, particularly when a targeted summary is sought, such as in the useful aspect-based summarization setting targeted in this paper. Previous datasets and studies for this setting have predominantly concentrated on a limited set of pre-defined aspects, focused solely on single document inputs, or relied on synthetic data. To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document open aspect-based summarization. This benchmark is created using a novel and cost-effective annotation protocol, by which an open aspect dataset is derived from existing generic multi-document summarization datasets. We analyze the properties of OpenAsp showcasing its high-quality content. Further, we show that the realistic open-aspect setting realized in OpenAsp poses a challenge for current state-of-the-art summarization models, as well as for large language models.

aspect-based summary, dataset, summarization, (16 more...)

arXiv.org Artificial Intelligence

2312.0444

Country:

Asia > Middle East > Israel (0.14)
Europe > France (0.05)
Antarctica (0.04)
(21 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Low-power, Continuous Remote Behavioral Localization with Event Cameras

Hamann, Friedhelm, Ghosh, Suman, Martinez, Ignacio Juarez, Hart, Tom, Kacelnik, Alex, Gallego, Guillermo

arXiv.org Artificial Intelligence, Dec-6-2023

Researchers in natural science need reliable methods for quantifying animal behavior. Recently, numerous computer vision methods have emerged to automate the process. However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage. Event cameras offer unique advantages for battery-dependent remote monitoring due to their low power consumption and high dynamic range capabilities. We use this novel sensor to quantify a behavior in Chinstrap penguins called ecstatic display. We formulate the problem as a temporal action detection task, determining the start and end times of the behavior. For this purpose, we recorded a colony of breeding penguins in Antarctica for several weeks and labeled event data on 16 nests. The developed method consists of a generator of candidate time intervals (proposals) and a classifier of the actions within them. The experiments show that the event cameras' natural response to motion is effective for continuous behavior monitoring and detection, reaching a mean average precision (mAP) of 58% (which increases to 63% in good weather conditions). The results also demonstrate the robustness against various lighting conditions contained in the challenging dataset. The low-power capabilities of the event camera allow recording three times longer than with a conventional camera. This work pioneers the use of event cameras for remote wildlife observation, opening new interdisciplinary opportunities. https://tub-rip.github.io/eventpenguins/
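Temporal action detection, as described in the abstract above, produces candidate (start, end) intervals that are compared against labeled intervals. A common way to make that comparison is temporal intersection-over-union (IoU). The sketch below is illustrative only and is not the authors' evaluation code: the function names, the greedy one-to-one matching, and the 0.5 threshold are assumptions.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_proposals(proposals, ground_truth, threshold=0.5):
    """Greedily match proposals to labeled intervals, one-to-one.

    Returns (true_positives, false_positives, missed_labels).
    The threshold value is an assumption, not taken from the paper.
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    for p in proposals:
        # Pick the still-unmatched label that overlaps this proposal most.
        best = max(unmatched, key=lambda g: temporal_iou(p, g), default=None)
        if best is not None and temporal_iou(p, best) >= threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)
```

Counts like these, gathered over ranked proposals, are the usual ingredients of a precision-recall curve, from which an average precision figure is typically computed.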

artificial intelligence, machine learning, proposal, (18 more...)

arXiv.org Artificial Intelligence

2312.03799

Country:

Antarctica (0.25)
Europe (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.68)
Materials > Chemicals > Industrial Gases > Liquified Gas (0.68)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.68)
Energy > Oil & Gas > Midstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models

Schott, Tim, Furman, Daniel, Bhat, Shreshta

arXiv.org Artificial Intelligence, Dec-5-2023

In this work, we assess the ability of foundation models to recall encyclopedic knowledge across a wide range of linguistic contexts. To support this, we: 1) produce a 20-language dataset that contains 303k factual associations paired with counterfactuals, 2) evaluate 5 models in a multilingual test, and 3) benchmark a diverse set of 24 models in an English-only test. Meta's LLaMA achieves the highest scores in both multilingual and English-only evaluations. Yet, an analysis of LLaMA's errors reveals significant limitations in its ability to recall facts in languages other than English, plus difficulties related to the location and gender of fact subjects. Overall, our findings suggest that today's foundation models are far from polyglots.

computational linguistic, dataset, stem fact pair, (14 more...)

arXiv.org Artificial Intelligence

2305.13675

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Antarctica (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Simulation-Based Inference of Surface Accumulation and Basal Melt Rates of an Antarctic Ice Shelf from Isochronal Layers

Moss, Guy, Višnjević, Vjeran, Eisen, Olaf, Oraschewski, Falk M., Schröder, Cornelius, Macke, Jakob H., Drews, Reinhard

arXiv.org Artificial Intelligence, Dec-3-2023

The ice shelves buttressing the Antarctic ice sheet determine the rate of ice-discharge into the surrounding oceans. The geometry of ice shelves, and hence their buttressing strength, is determined by ice flow as well as by the local surface accumulation and basal melt rates, governed by atmospheric and oceanic conditions. Contemporary methods resolve one of these rates, but typically not both. Moreover, there is little information on how they have changed over time. We present a new method to simultaneously infer the surface accumulation and basal melt rates averaged over decadal and centennial timescales. We infer the spatial dependence of these rates along flow line transects using internal stratigraphy observed by radars, using a kinematic forward model of internal stratigraphy. We solve the inverse problem using simulation-based inference (SBI). SBI performs Bayesian inference by training neural networks on simulations of the forward model to approximate the posterior distribution, allowing us to also quantify uncertainties over the inferred parameters. We demonstrate the validity of our method on a synthetic example, and apply it to Ekström Ice Shelf, Antarctica, for which newly acquired radar measurements are available. We obtain posterior distributions of surface accumulation and basal melt averaging over 42, 84, 146, and 188 years before 2022. Our results suggest stable atmospheric and oceanographic conditions over this period in this catchment of Antarctica. Use of observed internal stratigraphy can separate the effects of surface accumulation and basal melt, allowing them to be interpreted in a historical context of the last centuries and beyond.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2312.02997

Country:

Antarctica (0.56)
North America > United States (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Bremen (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

ENS-t-SNE: Embedding Neighborhoods Simultaneously t-SNE

Miller, Jacob, Huroyan, Vahan, Navarrete, Raymundo, Hossain, Md Iqbal, Kobourov, Stephen

arXiv.org Artificial Intelligence, Dec-2-2023

When visualizing a high-dimensional dataset, dimension reduction techniques that provide a single two-dimensional view of the data are commonly employed. We describe ENS-t-SNE: an algorithm for Embedding Neighborhoods Simultaneously that generalizes the t-Stochastic Neighborhood Embedding approach. By using different viewpoints in ENS-t-SNE's 3D embedding, one can visualize different types of clusters within the same high-dimensional dataset. This enables the viewer to see and keep track of the different types of clusters, which is harder to do when providing multiple 2D embeddings, where corresponding points cannot be easily identified. We illustrate the utility of ENS-t-SNE with real-world applications and provide an extensive quantitative evaluation with datasets of different types and sizes.

dataset, en-t-sne, subspace, (15 more...)

arXiv.org Artificial Intelligence

2205.1172

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Arizona (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Consumer Health (0.93)
Education > Health & Safety > School Nutrition (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Reconstructing Historical Climate Fields With Deep Learning

Bochow, Nils, Poltronieri, Anna, Rypdal, Martin, Boers, Niklas

arXiv.org Artificial Intelligence, Nov-30-2023

Historical records of climate fields are often sparse due to missing measurements, especially before the introduction of large-scale satellite missions. Several statistical and model-based methods have been introduced to fill gaps and reconstruct historical records. Here, we employ a recently introduced deep-learning approach based on Fourier convolutions, trained on numerical climate model output, to reconstruct historical climate fields. Using this approach we are able to realistically reconstruct large and irregular areas of missing data, as well as reconstruct known historical events such as strong El Niño and La Niña with very little given information. Our method outperforms the widely used statistical kriging method as well as other recent machine learning approaches. The model generalizes to higher resolutions than the ones it was trained on and can be used on a variety of climate fields. Moreover, it allows inpainting of masks never seen before during the model training.

artificial intelligence, machine learning, rmse, (17 more...)

arXiv.org Artificial Intelligence

2311.18348

Country:

Europe > Germany > Brandenburg > Potsdam (0.04)
South America > Venezuela > Zulia State > Lake Maracaibo (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(10 more...)

Genre: Research Report (0.83)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Scalable Extraction of Training Data from (Production) Language Models

Nasr, Milad, Carlini, Nicholas, Hayase, Jonathan, Jagielski, Matthew, Cooper, A. Feder, Ippolito, Daphne, Choquette-Choo, Christopher A., Wallace, Eric, Tramèr, Florian, Lee, Katherine

arXiv.org Artificial Intelligence, Nov-28-2023

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2311.17035

Country:

South America (1.00)
North America > United States > California (1.00)
Asia > Middle East (1.00)
(39 more...)

Genre:

Personal (1.00)
Research Report > New Finding (0.92)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Media > Television (1.00)
(26 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
