Goto

Collaborating Authors

 astronomy


Starstruck

MIT Technology Review

Aomawa Shields '97 was equally enticed by the prospect of studying stars and the dream of becoming one herself. Today, she draws from her exploration of acting and astronomy to search for life on other planets. Few people, if any, contemplate stars--celestial or cinematic--the way Aomawa Shields does. An astronomer and astrobiologist, Shields explores the potential habitability of planets beyond our solar system. But she is also a classically trained actor--and that's helped shape her professional trajectory in unexpected ways. Today, Shields is an associate professor in the Department of Physics and Astronomy at the University of California, Irvine, where she oversees a research team that uses computer models to explore conditions on exoplanets, or planets that revolve around stars other than the sun.


AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy

Joseph, Sebastian Antony, Husain, Syed Murtaza, Offner, Stella S. R., Juneau, Stéphanie, Torrey, Paul, Bolton, Adam S., Farias, Juan P., Gaffney, Niall, Durrett, Greg, Li, Junyi Jessy

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are being explored for applications in scientific research, including their capabilities to synthesize literature, answer research questions, generate research ideas, and even conduct computational experiments. Ultimately, our goal is for these to help scientists derive novel scientific insights. In many areas of science, such insights often arise from processing and visualizing data to understand its patterns. However, evaluating whether an LLM-mediated scientific workflow produces outputs conveying the correct scientific insights is challenging to evaluate and has not been addressed in past work. We introduce AstroVisBench, the first benchmark for both scientific computing and visualization in the astronomy domain. AstroVisBench judges a language model's ability to both (1) create astronomy-specific workflows to process and analyze data and (2) visualize the results of these workflows through complex plots. Our evaluation of visualizations uses a novel LLM-as-a-judge workflow, which is validated against annotation by five professional astronomers. Using AstroVisBench we present an evaluation of state-of-the-art language models, showing a significant gap in their ability to engage in astronomy research as useful assistants. This evaluation provides a strong end-to-end evaluation for AI scientists that offers a path forward for the development of visualization-based workflows, which are central to a broad range of domains from physics to biology.


Astronomers Have Discovered Earth's Latest Quasi-Lunar Moon

WIRED

Astronomers Have Discovered Earth's Latest Quasi-Lunar Moon As mankind was planning the first moon landing in the 1960s, an asteroid approached Earth--and still hasn't left. The Earth has just added its seventh confirmed quasi-lunar moon. It is 2025 PN7, a small Apollo-type asteroid detected in August solely by its brightness, thanks to the Hawaiian Pan-STARRS 1 telescope. After analyzing its trajectory, scientists concluded that the object maintains a 1:1 resonance with the Earth. From a distant perspective, this synchronicity makes it look as if the Earth is accompanied by a tiny asteroid--as if it had an additional moon.


Deep Learning in Astrophysics

Ting, Yuan-Sen

arXiv.org Artificial Intelligence

Deep learning has generated diverse perspectives in astronomy, with ongoing discussions between proponents and skeptics motivating this review. We examine how neural networks complement classical statistics, extending our data analytical toolkit for modern surveys. Astronomy offers unique opportunities through encoding physical symmetries, conservation laws, and differential equations directly into architectures, creating models that generalize beyond training data. Yet challenges persist as unlabeled observations number in billions while confirmed examples with known properties remain scarce and expensive. This review demonstrates how deep learning incorporates domain knowledge through architectural design, with built-in assumptions guiding models toward physically meaningful solutions. We evaluate where these methods offer genuine advances versus claims requiring careful scrutiny. - Neural architectures overcome trade-offs between scalability, expressivity, and data efficiency by encoding physical symmetries and conservation laws into network structure, enabling learning from limited labeled data. - Simulation-based inference and anomaly detection extract information from complex, non-Gaussian distributions where analytical likelihoods fail, enabling field-level cosmological analysis and systematic discovery of rare phenomena. - Multi-scale neural modeling bridges resolution gaps in astronomical simulations, learning effective subgrid physics from expensive high-fidelity runs to enhance large-volume calculations where direct computation remains prohibitive. - Emerging paradigms-reinforcement learning for telescope operations, foundation models learning from minimal examples, and large language model agents for research automation-show promise though are still developing in astronomical applications.


The Platonic Universe: Do Foundation Models See the Same Sky?

UniverseTBD, null, :, null, Duraphe, Kshitij, Smith, Michael J., Sourav, Shashwat, Wu, John F.

arXiv.org Artificial Intelligence

We test the Platonic Representation Hypothesis (PRH) in astronomy by measuring representational convergence across a range of foundation models trained on different data types. Using spectroscopic and imaging observations from JWST, HSC, Legacy Survey, and DESI, we compare representations from vision transformers, self-supervised models, and astronomy-specific architectures via mutual $k$-nearest neighbour analysis. We observe consistent scaling: representational alignment generally increases with model capacity across our tested architectures, supporting convergence toward a shared representation of galaxy astrophysics. Our results suggest that astronomical foundation models can use pre-trained general-purpose architectures, allowing us to capitalise on the broader machine learning community's already-spent computational investment.


From Queries to Criteria: Understanding How Astronomers Evaluate LLMs

Hyk, Alina, McCormick, Kiera, Zhong, Mian, Ciucă, Ioana, Sharma, Sanjib, Wu, John F, Peek, J. E. G., Iyer, Kartheik G., Xiao, Ziang, Field, Anjalie

arXiv.org Artificial Intelligence

There is growing interest in leveraging LLMs to aid in astronomy and other scientific research, but benchmarks for LLM evaluation in general have not kept pace with the increasingly diverse ways that real people evaluate and use these models. In this study, we seek to improve evaluation procedures by building an understanding of how users evaluate LLMs. We focus on a particular use case: an LLM-powered retrieval-augmented generation bot for engaging with astronomical literature, which we deployed via Slack. Our inductive coding of 368 queries to the bot over four weeks and our follow-up interviews with 11 astronomers reveal how humans evaluated this system, including the types of questions asked and the criteria for judging responses. We synthesize our findings into concrete recommendations for building better benchmarks, which we then employ in constructing a sample benchmark for evaluating LLMs for astronomy. Overall, our work offers ways to improve LLM evaluation and ultimately usability, particularly for use in scientific research.


Recent Advances in Simulation-based Inference for Gravitational Wave Data Analysis

Liang, Bo, Wang, He

arXiv.org Machine Learning

The detection of gravitational waves by the LIGO-Virgo-KAGRA collaboration has ushered in a new era of observational astronomy, emphasizing the need for rapid and detailed parameter estimation and population-level analyses. Traditional Bayesian inference methods, particularly Markov chain Monte Carlo, face significant computational challenges when dealing with the high-dimensional parameter spaces and complex noise characteristics inherent in gravitational wave data. This review examines the emerging role of simulation-based inference methods in gravitational wave astronomy, with a focus on approaches that leverage machine-learning techniques such as normalizing flows and neural posterior estimation. We provide a comprehensive overview of the theoretical foundations underlying various simulation-based inference methods, including neural posterior estimation, neural ratio estimation, neural likelihood estimation, flow matching, and consistency models. We explore the applications of these methods across diverse gravitational wave data processing scenarios, from single-source parameter estimation and overlapping signal analysis to testing general relativity and conducting population studies. Although these techniques demonstrate speed improvements over traditional methods in controlled studies, their model-dependent nature and sensitivity to prior assumptions are barriers to their widespread adoption. Their accuracy, which is similar to that of conventional methods, requires further validation across broader parameter spaces and noise conditions.


Integrating Universal Generative AI Platforms in Educational Labs to Foster Critical Thinking and Digital Literacy

Znamenskiy, Vasiliy, Niyazov, Rafael, Hernandez, Joel

arXiv.org Artificial Intelligence

This paper presents a new educational framework for integrating generative artificial intelligence (GenAI) platforms such as ChatGPT, Claude, and Gemini into laboratory activities aimed at developing critical thinking and digital literacy among undergraduate students. Recognizing the limitations and risks of uncritical reliance on large language models (LLMs), the proposed pedagogical model reframes GenAI as a research subject and cognitive tool. Students formulate discipline-specific prompts and evaluate GenAI-generated responses in text, image, and video modalities. A pilot implementation in a general astronomy course for non-science majors demonstrated high levels of engagement and critical reflection, with many students continuing the activity after class and presenting results at a research symposium. The results highlight the importance of structured AI interactions in education and suggest that GenAI can improve learning outcomes when combined with reflective assessment methods. The study proposes a replicable model for interdisciplinary AI-integrated lab work, adaptable to scientific disciplines. See the guide to learning activities based on Generative-Ai platforms: https://doi.org/10.5281/zenodo.15555802


AstroMLab 4: Benchmark-Topping Performance in Astronomy Q&A with a 70B-Parameter Domain-Specialized Reasoning Model

de Haan, Tijmen, Ting, Yuan-Sen, Ghosal, Tirthankar, Nguyen, Tuan Dung, Accomazzi, Alberto, Herron, Emily, Lama, Vanessa, Pan, Rui, Wells, Azton, Ramachandra, Nesar

arXiv.org Artificial Intelligence

General-purpose large language models, despite their broad capabilities, often struggle with specialized domain knowledge, a limitation particularly pronounced in more accessible, lower-parameter versions. This gap hinders their deployment as effective agents in demanding fields such as astronomy. Building on our prior work with AstroSage-8B, this study introduces AstroSage-70B, a significantly larger and more advanced domain-specialized natural-language AI assistant. It is designed for research and education across astronomy, astrophysics, space science, astroparticle physics, cosmology, and astronomical instrumentation. Developed from the Llama-3.1-70B foundation, AstroSage-70B underwent extensive continued pre-training on a vast corpus of astronomical literature, followed by supervised fine-tuning and model merging. Beyond its 70-billion parameter scale, this model incorporates refined datasets, judiciously chosen learning hyperparameters, and improved training procedures, achieving state-of-the-art performance on complex astronomical tasks. Notably, we integrated reasoning chains into the SFT dataset, enabling AstroSage-70B to either answer the user query immediately, or first emit a human-readable thought process. Evaluated on the AstroMLab-1 benchmark -- comprising 4,425 questions from literature withheld during training -- AstroSage-70B achieves state-of-the-art performance. It surpasses all other tested open-weight and proprietary models, including leading systems like o3, Gemini-2.5-Pro, Claude-3.7-Sonnet, Deepseek-R1, and Qwen-3-235B, even those with API costs two orders of magnitude higher. This work demonstrates that domain specialization, when applied to large-scale models, can enable them to outperform generalist counterparts in specialized knowledge areas like astronomy, thereby advancing the frontier of AI capabilities in the field.


Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification

Li, Yu-Yang, Bai, Yu, Wang, Cunshi, Qu, Mengwei, Lu, Ziteng, Soria, Roberto, Liu, Jifeng

arXiv.org Artificial Intelligence

Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, it can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of deep-learning and large language model (LLM) based models for the automatic classification of variable star light curves, based on large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing AutoDL optimization, we achieve striking performance with the 1D-Convolution+BiLSTM architecture and the Swin Transformer, hitting accuracies of 94\% and 99\% correspondingly, with the latter demonstrating a notable 83\% accuracy in discerning the elusive Type II Cepheids-comprising merely 0.02\% of the total dataset.We unveil StarWhisper LightCurve (LC), an innovative Series comprising three LLM-based models: LLM, multimodal large language model (MLLM), and Large Audio Language Model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, StarWhisper LC Series exhibit high accuracies around 90\%, significantly reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. The study furnishes two detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14\% in observation duration and 21\% in sampling points can be realized without compromising accuracy by more than 10\%.