Generative Diffusion Model-based Downscaling of Observed Sea Surface Height over Kuroshio Extension since 2000

Han, Qiuchang, Jiang, Xingliang, Zhao, Yang, Wang, Xudong, Li, Zhijin, Zhang, Renhe

arXiv.org Artificial Intelligence

Satellite altimetry has been widely utilized to monitor global sea surface dynamics, enabling investigation of upper ocean variability from basin-scale to localized eddy ranges. However, the sparse spatial resolution of observational altimetry limits our understanding of oceanic submesoscale variability, which is prevalent at horizontal scales below 0.25° resolution. Here, we introduce a state-of-the-art generative diffusion model trained on high-resolution sea surface height (SSH) reanalysis data and demonstrate its advantage in observational SSH downscaling over the eddy-rich Kuroshio Extension region. The diffusion-based model effectively downscales raw satellite-interpolated data from 0.25° resolution to 1/16°, corresponding to approximately 12-km wavelength. It outperforms other high-resolution reanalysis datasets and neural-network-based methods, and it successfully reproduces the spatial patterns and power spectra of satellite along-track observations. Our diffusion-based results indicate that eddy kinetic energy at horizontal scales below 250 km has intensified significantly since 2004 in the Kuroshio Extension region. These findings underscore the great potential of deep learning for reconstructing satellite altimetry and enhancing our understanding of ocean dynamics at eddy scales.
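As an illustration only (not the paper's architecture), the conditional reverse-diffusion idea behind such downscaling can be sketched as follows; `eps_model`, the upsampling factor, and the step constants are all hypothetical stand-ins for a trained network and a real noise schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def upsample(lowres, factor=4):
    """Nearest-neighbour upsampling of a 2-D SSH field
    (stand-in for the 0.25-degree -> 1/16-degree grid change)."""
    return np.kron(lowres, np.ones((factor, factor)))

def eps_model(x, cond, t):
    """Hypothetical noise predictor; a real model would be a trained
    neural network conditioned on the coarse field and timestep t."""
    return x - cond

def sample(lowres, steps=20, factor=4):
    """Toy conditional reverse-diffusion loop: start from Gaussian noise
    on the fine grid and iteratively denoise toward the conditioned field."""
    cond = upsample(lowres, factor)
    x = rng.standard_normal(cond.shape)
    for t in reversed(range(1, steps + 1)):
        x = x - 0.2 * eps_model(x, cond, t)           # move toward the data
        if t > 1:                                     # re-inject a little noise
            x = x + 0.05 * rng.standard_normal(x.shape)
    return x

coarse = rng.standard_normal((4, 4))   # toy 0.25-degree tile
fine = sample(coarse)                  # 16 x 16 downscaled field
```

In a real system the predictor supplies the submesoscale detail the coarse field lacks; here the toy predictor only pulls the sample toward the upsampled condition.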


Masked Generative Story Transformer with Character Guidance and Caption Augmentation

Papadimitriou, Christos, Filandrianos, Giorgos, Lymperaiou, Maria, Stamou, Giorgos

arXiv.org Artificial Intelligence

Story Visualization (SV) is a challenging generative vision task that requires both visual quality and consistency across the frames of a generated image sequence. Previous approaches either employ some kind of memory mechanism to maintain context throughout an auto-regressive generation of the image sequence, or model the generation of the characters and their background separately to improve the rendering of characters. In contrast, we embrace a completely parallel transformer-based approach, relying exclusively on cross-attention with past and future captions to achieve consistency. Additionally, we propose a Character Guidance technique that focuses generation on the characters in an implicit manner, by forming a combination of text-conditional and character-conditional logits in logit space. We also employ a caption-augmentation technique, carried out by a Large Language Model (LLM), to enhance the robustness of our approach. The combination of these methods culminates in state-of-the-art (SOTA) results across various metrics on the most prominent SV benchmark (Pororo-SV), attained with constrained resources and at lower computational cost than previous art. The validity of our quantitative results is supported by a human survey.
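The logit-space combination can be illustrated with a minimal sketch analogous to classifier-free guidance; the function name and guidance weight `w` are illustrative, not the paper's exact formulation:

```python
import numpy as np

def character_guided_logits(text_logits, char_logits, w=2.0):
    """Blend text-conditional and character-conditional logits: with w > 1
    the prediction is pushed toward the character condition, in the spirit
    of classifier-free guidance applied in logit space."""
    return text_logits + w * (char_logits - text_logits)

text = np.array([1.0, 0.0, -1.0])   # toy text-conditional logits
char = np.array([0.0, 2.0, -1.0])   # toy character-conditional logits
guided = character_guided_logits(text, char)
```

With `w = 2.0` this evaluates to `2 * char - text`, so tokens the character condition favors are amplified relative to the plain text-conditional prediction.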


MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

Tian, Changyao, Zhu, Xizhou, Xiong, Yuwen, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Chen, Yuntao, Lu, Lewei, Lu, Tong, Zhou, Jie, Li, Hongsheng, Qiao, Yu, Dai, Jifeng

arXiv.org Artificial Intelligence

Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the issue that a fixed number of visual tokens cannot efficiently capture image details, which is particularly problematic in multi-image scenarios. To address this issue, this paper presents MM-Interleaved, an end-to-end generative model for interleaved image-text data. It introduces a multi-scale and multi-image feature synchronizer module, allowing direct access to fine-grained image features in the previous context during the generation process. MM-Interleaved is end-to-end pre-trained on both paired and interleaved image-text corpora. It is further enhanced through a supervised fine-tuning phase, wherein the model improves its ability to follow complex multi-modal instructions. Experiments demonstrate the versatility of MM-Interleaved in recognizing visual details following multi-modal instructions and generating consistent images following both textual and visual conditions. Code and models are available at \url{https://github.com/OpenGVLab/MM-Interleaved}.
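A toy sketch of the synchronizer idea (names and shapes are ours, not the released code): generation-time query tokens cross-attend to a concatenation of per-image, multi-scale features, so fine-grained detail stays accessible without a fixed visual-token budget:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def synchronize(queries, multiscale_feats):
    """Cross-attention from query tokens to all available image features
    (any number of images, any number of scales per image)."""
    kv = np.concatenate(multiscale_feats, axis=0)           # (N_total, d)
    attn = softmax(queries @ kv.T / np.sqrt(kv.shape[1]))   # (Q, N_total)
    return attn @ kv                                        # (Q, d)

rng = np.random.default_rng(0)
d = 8
feats = [rng.standard_normal((16, d)),   # image 1, coarse scale
         rng.standard_normal((64, d)),   # image 1, fine scale
         rng.standard_normal((16, d))]   # image 2, coarse scale
queries = rng.standard_normal((4, d))
out = synchronize(queries, feats)
```

Because the key/value set is just a concatenation, adding another image or scale grows the attended context rather than competing for a fixed token budget.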


Story Visualization by Online Text Augmentation with Context Memory

Ahn, Daechul, Kim, Daneul, Song, Gwangmo, Kim, Seung Hwan, Lee, Honglak, Kang, Dongyeop, Choi, Jonghyun

arXiv.org Artificial Intelligence

Story visualization (SV) is a challenging text-to-image generation task due to the difficulty of not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with a correct character or with a proper background of the scene) remains a challenge. To this end, we propose a novel memory architecture for the Bi-directional Transformer framework with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training, for better generalization to language variation at inference. In extensive experiments on two popular SV benchmarks, i.e., Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the art on various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or lower computational complexity.
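The online augmentation idea, generating pseudo-descriptions as extra supervision, can be caricatured with a simple word-dropout sketch; this stand-in only illustrates the kind of training signal involved, not the paper's actual augmentation mechanism:

```python
import random

def pseudo_descriptions(caption, n=3, drop_p=0.2, seed=0):
    """Produce n noisy variants of a caption by randomly dropping words,
    so the generator sees varied phrasings of the same scene during
    training (toy stand-in for learned online text augmentation)."""
    rng = random.Random(seed)
    words = caption.split()
    variants = []
    for _ in range(n):
        kept = [w for w in words if rng.random() > drop_p]
        variants.append(" ".join(kept) if kept else caption)
    return variants

augmented = pseudo_descriptions("Pororo and Crong play outside in the snow")
```

Training against several such variants per ground-truth caption pushes the model to tolerate phrasing variation at inference time.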


Researchers populated a tiny virtual town with AI (and it was very wholesome)

#artificialintelligence

What would happen if you filled a virtual town with AIs and set them loose? As it turns out, they brush their teeth and are very nice to one another! But this unexciting outcome is good news for the researchers who did it, since they wanted to produce "believable simulacra of human behavior" and got just that. The paper describing the experiment, by Stanford and Google researchers, has not been peer reviewed or accepted for publication anywhere, but it makes for interesting reading nonetheless. The idea was to see if they could apply the latest advances in machine learning models to produce "generative agents" that take in their circumstances and output a realistic action in response. And that's very much what they got.


Robo-penguin: how artificial birds are relaying the secrets of ocean currents

The Guardian

If it looks like a penguin and swims like a penguin – but it's actually a robot – then it must be the latest advance in marine sensory equipment. The Quadroin is an autonomous underwater vehicle (AUV): a 3D-printed self-propelled machine designed to mimic a penguin in order to measure the properties of oceanic eddies. It was developed by Burkard Baschek while head of Germany's Institute of Coastal Ocean Dynamics at the Helmholtz Centre Hereon in Geesthacht after he watched more than $20,000 of his equipment sink to the bottom of the Pacific Ocean. Eddies are small ocean currents that other research methods have struggled to capture. They influence all the animals and plants in the seas as well as the Earth's climate, driving roughly 50% of all phytoplankton production.


Cyberpunk 2077 Is a Mess On Every Level

TIME - Tech

Cyberpunk 2077 and the constellation of controversy orbiting it--at nearly every level of its making--is almost laughable. The open-world shooter game, developed by Polish studio CD Projekt Red, was billed as the next big thing in video games, an experience that would impress both visually and narratively. From a huge city full of opportunities to an arsenal of upgradable elements for your customizable character, how could one not be enticed by the previews ahead of the Dec. 10 release? Hell, it's even got Keanu Reeves in it, and a lot of him! In hindsight, Cyberpunk 2077's seven-year lead-up didn't do it any favors. After all, you can only rely on hype for so long.


Physics-informed Tensor-train ConvLSTM for Volumetric Velocity Forecasting

Huang, Yu, Tang, Yufei, Zhuang, Hanqi, VanZwieten, James, Cherubin, Laurent

arXiv.org Machine Learning

According to the National Academies, a weekly forecast of the velocity, vertical structure, and duration of the Loop Current (LC) and its eddies is critical for understanding the oceanography and ecosystem, and for mitigating outcomes of anthropogenic and natural disasters in the Gulf of Mexico (GoM). However, this forecast is a challenging problem since the LC behaviour is dominated by long-range spatial connections across multiple timescales. In this paper, we extend spatiotemporal predictive learning, showing its effectiveness beyond video prediction, to a 4D model, i.e., a novel Physics-informed Tensor-train ConvLSTM (PITT-ConvLSTM) for forecasting temporal sequences of 3D geospatial data. Specifically, we propose 1) a novel 4D higher-order recurrent neural network with empirical orthogonal function analysis to capture the hidden uncorrelated patterns of each hierarchy, 2) a convolutional tensor-train decomposition to capture higher-order space-time correlations, and 3) the incorporation of prior physical knowledge provided by domain experts, which informs learning in the latent space. The advantage of our proposed method is clear: constrained by physical laws, it simultaneously learns good representations for frame dependencies (both short-term and long-term high-level dependencies) and inter-hierarchical relations within each time frame. Experiments on geospatial data collected from the GoM demonstrate that PITT-ConvLSTM outperforms state-of-the-art methods in forecasting the volumetric velocity of the LC and its eddies for a period of over one week.
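The convolutional tensor-train component rests on the standard tensor-train (TT) factorization, which this minimal sketch evaluates entrywise (a real PITT-ConvLSTM applies TT cores to convolutional weight tensors; the variable names here are ours):

```python
import numpy as np

def tt_entry(cores, idx):
    """Entry of a tensor stored in tensor-train form:
    T[i1, ..., id] = G1[:, i1, :] @ G2[:, i2, :] @ ... @ Gd[:, id, :],
    where core Gk has shape (r_{k-1}, n_k, r_k) and r_0 = r_d = 1."""
    v = cores[0][:, idx[0], :]
    for core, i in zip(cores[1:], idx[1:]):
        v = v @ core[:, i, :]
    return float(v[0, 0])

# Rank-1 example: T[i, j] = a[i] * b[j]
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
cores = [a.reshape(1, 2, 1), b.reshape(1, 2, 1)]
```

The appeal for higher-order correlations is the parameter count: TT storage grows as the sum of core sizes (roughly `r * n * r` per mode) instead of the product of all mode sizes for the full tensor.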


How close is the 'Dark Fate' predicted for humanity by the 'Terminator' saga?

#artificialintelligence

"Terminator: Dark Fate," the sixth installment in the long-running science-fiction franchise, opens Friday and posits a world in which a self-aware computer builds an army of killer robots it then uses in an attempt to wipe humanity off the face of the Earth. It's the same vision that filmmaker James Cameron dreamed up for the first "Terminator" movie in 1984, well before the advent of autonomous drones and advanced machine learning made the premise seem a little less science fiction. In that 35-year span, a variety of technological advancements in AI and robotics have brought elements of "Terminator" closer to reality. Artificial intelligence experts are confident, however, that the kind of independent AI and humanoid robots of the movie franchise are still far off. But they also offer a warning: the developments that people have made in AI and military technology could create their own kind of "Judgement Day." "AI is a powerful technology, but it's a tool, not unlike a pencil," Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, told NBC News.


Cultural Bias in Artificial Intelligence - The New Stack

#artificialintelligence

Advertising and white papers may make artificial intelligence seem like a pie-in-the-sky proposition, with easy analysis, deep insights, and fair algorithms available everywhere. The reality, however, is that AI can expose an even darker side of our own humanity, acting as more of a mirror than as sky-pie. We saw this when Microsoft put an AI-driven bot up on Twitter, only to have it spout racist statements shortly thereafter. Camille Eddy, currently a student pursuing a mechanical engineering bachelor's degree at Boise State, already has a long career as a high-tech robotics intern at places like Alphabet and HP. At OSCON, she spoke on the topic of recognizing cultural bias in AI. "Some of the things we've seen are misclassification or misidentification. For example, Microsoft's Tay AI, a bot that was released on Twitter, was famously easily influenced by people talking to it in racist and sexist ways, and it reflected that. People would say 'This is an idea, you should hold this idea,' and it did. Talking about ways it can reflect our own biases as a society, and how that might not be something that we want," said Eddy.