magma
The Greek island of Santorini saw thousands of earthquakes last year - now scientists know why
Scientists reveal what triggered Santorini 'earthquake swarm'. The swarm of tens of thousands of earthquakes near the Greek island of Santorini earlier this year was triggered by molten rock pumping through an underground channel over three months, scientists have discovered. They combined physics and artificial intelligence to work out exactly what caused the more than 25,000 earthquakes, which travelled about 20km (12 miles) horizontally through the Earth's crust. Treating each tremor as a virtual sensor, they used artificial intelligence to analyse the patterns associated with them. One of the lead researchers, Dr Stephen Hicks from UCL, said combining physics and machine learning in this way could help forecast volcanic eruptions. The seismic activity began to stir beneath the Greek islands of Santorini, Amorgos and Anafi in January 2025.
- South America (0.15)
- North America > Central America (0.15)
- Oceania > Australia (0.06)
- (14 more...)
- Research Report (0.70)
- Personal > Honors (0.30)
- Leisure & Entertainment (0.74)
- Media > Film (0.30)
Magma: A Foundation Model for Multimodal AI Agents
Yang, Jianwei, Tan, Reuben, Wu, Qianhui, Zheng, Ruijie, Peng, Baolin, Liang, Yongyuan, Gu, Yu, Cai, Mu, Ye, Seonghyeon, Jang, Joel, Deng, Yuquan, Liden, Lars, Gao, Jianfeng
We present Magma, a foundation model that serves multimodal AI agentic tasks in both the digital and physical worlds. Magma is a significant extension of vision-language (VL) models in that it not only retains the VL understanding ability (verbal intelligence) of the latter, but is also equipped with the ability to plan and act in the visual-spatial world (spatial-temporal intelligence) and complete agentic tasks ranging from UI navigation to robot manipulation. To endow it with these agentic capabilities, Magma is pretrained on large amounts of heterogeneous data spanning images, videos, and robotics data, where the actionable visual objects in images (e.g., clickable buttons in a GUI) are labeled with Set-of-Mark (SoM) for action grounding, and the object movements in videos (e.g., the trace of human hands or robotic arms) are labeled with Trace-of-Mark (ToM) for action planning. Extensive experiments show that SoM and ToM achieve strong synergy and facilitate the acquisition of spatial-temporal intelligence in our Magma model, which is fundamental to a wide range of tasks as shown in Fig. 1. In particular, Magma sets new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are specifically tailored to these tasks. On image- and video-related multimodal tasks, Magma also compares favorably to popular large multimodal models that are trained on much larger datasets. We make our model and code public for reproducibility at https://microsoft.github.io/Magma.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- (8 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.45)
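The Set-of-Mark idea in the abstract above can be sketched in miniature: each actionable UI element is assigned a numeric mark that the agent can refer to when emitting an action. This is a hypothetical text-only rendering for illustration (the paper overlays the marks on the screenshot itself); the element names, tuple format, and `set_of_mark` helper are invented here, not from the paper.

```python
def set_of_mark(elements):
    """Assign numeric marks to actionable UI elements so an agent can
    ground an action like "click [2]" to a concrete screen region.
    `elements` is a list of (name, bounding_box) pairs."""
    marks = {i + 1: box for i, (_, box) in enumerate(elements)}
    legend = ", ".join(f"[{i + 1}] {name}" for i, (name, _) in enumerate(elements))
    return marks, legend

marks, legend = set_of_mark([
    ("Search button", (10, 20, 90, 40)),   # (x0, y0, x1, y1), invented coordinates
    ("Text field", (10, 50, 200, 70)),
])
print(legend)  # [1] Search button, [2] Text field
```

The point of the marks is that the model's action output ("click [1]") stays symbolic, while the mapping back to pixel coordinates lives in `marks`.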
LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing
Zhang, Hongxiang, Rong, Yuyang, He, Yifeng, Chen, Hao
Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies limit the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional grammar-engineering effort and suffer from low throughput. In this paper, we explore the potential of utilizing Large Language Models (LLMs) to enhance greybox fuzzing for structured data. We utilize the pre-trained knowledge of LLMs about data conversion and formats to generate new valid inputs. We further fine-tune the model with paired mutation seeds to learn structured formats and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the power of LLMs to understand and mutate structured data into fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance in both triggering and reaching bugs. Compared to AFL++, LLAMAFUZZ achieved 27.19% more branches on real-world program sets on average. We also present a case study explaining how LLMs enhance the fuzzing process in terms of code coverage.
- North America > United States > California > Yolo County > Davis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
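The core idea in the LLAMAFUZZ abstract, mutating structured inputs while keeping them valid, can be illustrated with a toy stand-in. The sketch below uses a deterministic, format-aware mutation on a JSON seed instead of an actual fine-tuned LLM; the `structure_aware_mutate` function and the seed contents are invented for illustration, and only show why format-preserving mutation beats blind byte flipping for structured data.

```python
import json
import random

def structure_aware_mutate(seed_bytes, rng):
    """Toy stand-in for an LLM mutator: parse the structured input and
    mutate one field's value while keeping the overall format valid.
    (LLAMAFUZZ instead asks a fine-tuned LLM for the mutated seed.)"""
    obj = json.loads(seed_bytes)
    key = rng.choice(sorted(obj))          # pick a field to perturb
    if isinstance(obj[key], int):
        obj[key] = rng.randint(0, 255)     # new value, type preserved
    else:
        obj[key] = obj[key][::-1]          # scramble the string content
    return json.dumps(obj).encode()

rng = random.Random(0)
seed = b'{"width": 16, "height": 16, "magic": "PNG"}'
mutant = structure_aware_mutate(seed, rng)
json.loads(mutant)  # still parses: the mutation preserved the structure
```

A raw byte-flipping mutator would usually break the JSON framing and be rejected by the target's parser before reaching deeper code; keeping the structure valid is what lets mutants exercise the logic behind the parser.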
What do more quakes at one of California's riskiest volcanoes mean? Scientists think they know
One of California's riskiest volcanoes has for decades been undergoing geological changes and seismic activity, which are sometimes a precursor to an eruption, but -- thankfully -- no supervolcanic eruptions are expected. That's according to Caltech researchers who have been studying the Long Valley Caldera, which includes the Mammoth Lakes area in Mono County. The caldera was classified in 2018 by the U.S. Geological Survey as one of three volcanoes in the state -- along with 15 elsewhere in the U.S. -- considered a "very high threat," the highest-risk category defined by the agency. The two other volcanoes in California with that classification are Mt. Shasta in Siskiyou County and the Lassen Volcanic Center, which includes Lassen Peak in Shasta County.
- North America > United States > California > Siskiyou County (0.25)
- North America > United States > California > Shasta County (0.25)
- North America > United States > California > Mono County (0.25)
- (5 more...)
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Brack, Manuel, Schramowski, Patrick, Deiseroth, Björn, Kersting, Kristian
Bootstrapping from pre-trained language models has proven to be an efficient approach for building vision-language models (VLMs) for tasks such as image captioning or visual question answering. However, the outputs of these models rarely align with users' rationales for specific answers. In order to improve this alignment and reinforce commonsense reasoning, we propose a tuning paradigm based on human interactions with machine-generated data. Our ILLUME executes the following loop: given an image-question-answer prompt, the VLM samples multiple candidate rationales, and a human critic provides feedback via preference selection, which is used for fine-tuning. This loop increases the training data and gradually carves out rationalization capabilities in the VLM that are aligned with human intent. Our exhaustive experiments demonstrate that ILLUME is competitive with standard supervised finetuning while using significantly less training data and requiring only minimal feedback.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Linearly Mapping from Image to Text Space
Merullo, Jack, Castricato, Louis, Eickhoff, Carsten, Pavlick, Ellie
The extent to which text-only language models (LMs) learn to represent features of the non-linguistic world is an open question. Prior work has shown that pretrained LMs can be taught to caption images when a vision model's parameters are optimized to encode images in the language space. We test a stronger hypothesis: that the conceptual representations learned by frozen text-only models and vision-only models are similar enough that this can be achieved with a linear map. We show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection. Using these to prompt the LM achieves competitive performance on captioning and visual question answering tasks compared to models that tune both the image encoder and text decoder (such as the MAGMA model). We compare three image encoders with increasing amounts of linguistic supervision seen during pretraining: BEIT (no linguistic information), NF-ResNET (lexical category information), and CLIP (full natural language descriptions). We find that all three encoders perform equally well at transferring visual property information to the language model (e.g., whether an animal is large or small), but that image encoders pretrained with linguistic supervision more saliently encode category information (e.g., distinguishing hippo vs. elephant) and thus perform significantly better on benchmark language-and-vision tasks. Our results indicate that LMs encode conceptual information structurally similarly to vision-based models, even those that are solely trained on images. Code is available here: https://github.com/jmerullo/limber
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > Dominican Republic (0.04)
- Leisure & Entertainment > Sports > Tennis (0.68)
- Health & Medicine (0.67)
- Consumer Products & Services (0.67)
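The mechanism in the linear-mapping abstract above, a single learned projection turning a frozen vision encoder's embedding into continuous prompt "tokens" for a frozen LM, is compact enough to sketch directly. The dimensions below are invented placeholders, and the projection is random rather than trained; the sketch only shows the shapes involved and that the projection is the sole trainable component.

```python
import numpy as np

def linear_map_to_prompts(image_emb, W, k, d_lm):
    """Project a frozen vision encoder's embedding into k soft-prompt
    vectors that are fed to a frozen language model in place of token
    embeddings. image_emb: (d_img,); W: (d_img, k * d_lm), learned."""
    prompts = image_emb @ W           # (k * d_lm,)
    return prompts.reshape(k, d_lm)   # k continuous "tokens" for the LM

rng = np.random.default_rng(0)
d_img, k, d_lm = 512, 4, 768                     # hypothetical dimensions
W = rng.normal(size=(d_img, k * d_lm)) * 0.02    # the ONLY trained parameters
emb = rng.normal(size=d_img)                     # stand-in for a vision embedding
soft_tokens = linear_map_to_prompts(emb, W, k, d_lm)
print(soft_tokens.shape)  # (4, 768)
```

Because both the encoder and the LM stay frozen, how well this works is a direct probe of how structurally similar their learned representations already are, which is the paper's hypothesis.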
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Eichenberg, Constantin, Black, Sidney, Weinbach, Samuel, Parcalabescu, Letitia, Frank, Anette
Large-scale pretraining is fast becoming the norm in Vision-Language (VL) modeling. However, prevailing VL approaches are limited by the requirement for labeled data and the use of complex multi-step pretraining objectives. We present MAGMA, a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Building on Frozen, we train a series of VL models that autoregressively generate text from arbitrary combinations of visual and textual input. The pretraining is entirely end-to-end using a single language modeling objective, simplifying optimization compared to previous approaches. Importantly, the language model weights remain unchanged during training, allowing for transfer of encyclopedic knowledge and in-context learning abilities from language pretraining. MAGMA outperforms Frozen on open-ended generative tasks, achieving state-of-the-art results on the OKVQA benchmark and competitive results on a range of other popular VL benchmarks, while pretraining on 0.2% of the number of samples used to train SimVLM.
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.64)
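The adapter-based finetuning that the MAGMA abstract describes, inserting small trainable modules while the language model's own weights stay frozen, can be sketched as a standard bottleneck adapter. The shapes, the ReLU nonlinearity, and the zero initialization below are common adapter conventions assumed for illustration, not details taken from the paper.

```python
import numpy as np

def adapter(hidden, W_down, W_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a
    residual connection. Only W_down and W_up are trained; the surrounding
    language model stays frozen."""
    z = np.maximum(hidden @ W_down, 0.0)   # ReLU bottleneck (assumed choice)
    return hidden + z @ W_up               # residual keeps the frozen LM's signal

rng = np.random.default_rng(1)
d_model, d_bottleneck = 768, 64                       # hypothetical sizes
W_down = rng.normal(size=(d_model, d_bottleneck)) * 0.02
W_up = np.zeros((d_bottleneck, d_model))              # zero-init: starts as identity
h = rng.normal(size=(10, d_model))                    # a batch of hidden states
out = adapter(h, W_down, W_up)
print(np.allclose(out, h))  # True: zero-initialised up-projection is a no-op
```

The zero-initialised up-projection means training starts from the unmodified language model and gradually learns how much visual information to inject, which is what preserves the pretrained knowledge the abstract emphasises.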
Aligning MAGMA by Few-Shot Learning and Finetuning
Layoun, Jean-Charles, Roger, Alexis, Rish, Irina
The goal of vision-language modeling is to allow models to tie language understanding to visual inputs. The aim of this paper is to evaluate and align the Visual Language Model (VLM) called Multimodal Augmentation of Generative Models through Adapter-based finetuning (MAGMA) with human values. MAGMA is a VLM capable of image captioning and visual question answering. We evaluate its alignment in three different scenarios. To begin, we assess MAGMA's out-of-the-box alignment using the checkpoint provided by Hugging Face. Then, we measure whether few-shot learning manages to improve the results. Finally, we finetune the model on aligned examples and evaluate its behavior.
Sota – MLearning.ai
The person's age in the above photo is difficult to pinpoint, but Magma can recognize them regardless ;) Magma is a simple method for augmenting generative language models with additional modalities using adapter-based finetuning. Check below and use the demo to find out about the superpowers of this method.
MAGMA: Inference and Prediction with Multi-Task Gaussian Processes
Leroy, Arthur, Latouche, Pierre, Guedj, Benjamin, Gey, Servane
We investigate the problem of multiple time series forecasting, with the objective of improving multiple-step-ahead predictions. We propose a multi-task Gaussian process framework to simultaneously model batches of individuals with a common mean function and a specific covariance structure. This common mean is defined as a Gaussian process for which the hyper-posterior distribution is tractable. Therefore an EM algorithm can be derived for simultaneous hyper-parameter optimisation and hyper-posterior computation. Unlike previous approaches in the literature, we account for uncertainty and handle uncommon grids of observations while maintaining explicit formulations, by modelling the mean process in a non-parametric probabilistic framework. We also provide predictive formulas integrating this common mean process. This approach greatly improves predictive performance far from observations, where information shared across individuals provides a relevant prior mean. Our overall algorithm is called Magma (standing for Multi tAsk Gaussian processes with common MeAn), and it is publicly available as an R package. The quality of the mean-process estimation, predictive performance, and comparisons to alternatives are assessed in various simulated scenarios and on real datasets.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
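The key effect the Magma abstract claims, that a mean shared across individuals improves predictions far from an individual's own observations, can be seen in a stripped-down GP posterior with an explicit prior mean. This is a simplified stand-in: in Magma the common mean is itself a Gaussian process with a hyper-posterior estimated by EM, whereas here it is just a fixed callable; kernel, hyper-parameters, and data are all invented for illustration.

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, mean_fn,
                 lengthscale=1.0, var=1.0, noise=1e-2):
    """GP posterior mean with an explicit prior mean function.
    Far from the training points the kernel vanishes and the
    prediction reverts to mean_fn instead of zero."""
    def k(a, b):  # squared-exponential kernel on 1-D inputs
        d = a[:, None] - b[None, :]
        return var * np.exp(-0.5 * (d / lengthscale) ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = k(x_test, x_train)
    resid = y_train - mean_fn(x_train)           # deviations from the prior mean
    return mean_fn(x_test) + Ks @ np.linalg.solve(K, resid)

# A "common mean" learned from other individuals (here: a constant group
# average of 2.0) anchors the extrapolation instead of a zero prior mean.
x = np.array([0.0, 1.0])
y = np.sin(x) + 2.0
common_mean = lambda t: np.full_like(t, 2.0)
pred = gp_posterior(x, y, np.array([10.0]), common_mean)
print(abs(pred[0] - 2.0) < 0.01)  # True: far from data, reverts to the shared mean
```

With a zero prior mean the same extrapolation would collapse toward 0, which is exactly the failure mode that sharing information across individuals is meant to fix.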