
Collaborating Authors

Chen, Victor


Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

arXiv.org Artificial Intelligence

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations and resulted in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is listed in a summary table with links to its code and is presented as a brief paper in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapidly prototyping custom applications in scientific research.


LiDAR-Forest Dataset: LiDAR Point Cloud Simulation Dataset for Forestry Application

arXiv.org Artificial Intelligence

… LiDAR simulation and the relevant LiDAR-based applications in forestry in Sec. 2, the design and creation of our dataset and metrics in Sec. 3, the extensibility and potential applications in Sec. 4, a discussion of future work in Sec. 5, and a conclusion summarizing the work in Sec. 6. Given its effectiveness, we hope the simulation system and data can catalyze a transformation in simulation systems and inspire new insights for the digital forestry community.

Ghallabi et al. [6] used multi-layer LiDAR data to detect lane markings, which were matched to a prior map using particle filtering to achieve improvements over standard GPS solutions. Jacobsen and Teizer [12] proposed a novel worker safety monitoring system using LiDAR for precise real-time presence detection near hazards, demonstrably improving over GPS solutions when tested in a virtual …


Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors in High Dimensions

arXiv.org Machine Learning

A key problem in inference for high dimensional unknowns is the design of sampling algorithms whose performance scales favourably with the dimension of the unknown. A typical setting in which these problems arise is the area of Bayesian inverse problems. In such problems, which include graph-based learning, nonparametric regression and PDE-based inversion, the unknown can be viewed as an infinite-dimensional parameter (such as a function) that has been discretised. This results in a high-dimensional space for inference. Here we study robustness of an MCMC algorithm for posterior inference; this refers to MCMC convergence rates that do not deteriorate as the discretisation becomes finer. When a Gaussian prior is employed there is a known methodology for the design of robust MCMC samplers. However, one often requires more flexibility than a Gaussian prior can provide: hierarchical models are used to enable inference of parameters underlying a Gaussian prior; or non-Gaussian priors, such as Besov, are employed to induce sparse MAP estimators; or deep Gaussian priors are used to represent other non-Gaussian phenomena; and piecewise constant functions, which are necessarily non-Gaussian, are required for classification problems. The purpose of this article is to show that the simulation technology available for Gaussian priors can be exported to such non-Gaussian priors. The underlying methodology is based on a white noise representation of the unknown. This is exploited both for robust posterior sampling and for joint inference of the function and parameters involved in the specification of its prior, in which case our framework borrows strength from the well-developed non-centred methodology for Bayesian hierarchical models. The desired robustness of the proposed sampling algorithms is supported by some theory and by extensive numerical evidence from several challenging problems.
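The white-noise idea can be made concrete with a small sketch. This is not the authors' implementation; it is a minimal example built on several assumptions. The unknown is a 1-D function on [0, 1], written as u = T(ξ; θ) with ξ ~ N(0, I). Here T is an assumed sine Karhunen-Loève expansion scaled by an amplitude hyperparameter θ = exp(log_amp), and log_amp has an assumed N(0, 1) hyperprior. The data are assumed to be noisy pointwise observations. The sampler alternates two moves:

- a preconditioned Crank-Nicolson (pCN) move on ξ, the standard robust proposal for Gaussian priors;
- a random-walk Metropolis move on log_amp with ξ held fixed, which is the non-centred parameterisation.

All function names (`kl_basis`, `transform`, `misfit`, `pcn_within_gibbs`) and the synthetic test problem are hypothetical.

```python
import numpy as np


def kl_basis(n_grid, n_modes, alpha=2.0):
    """Sine Karhunen-Loeve basis on [0, 1], columns pre-scaled by k**(-alpha) (assumed prior)."""
    x = np.linspace(0.0, 1.0, n_grid)
    k = np.arange(1, n_modes + 1)
    phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(x, k))  # shape (n_grid, n_modes)
    return x, phi * k.astype(float) ** (-alpha)


def transform(xi, log_amp, basis):
    """Non-centred map T(xi; theta) from white noise xi to the discretised function u."""
    return np.exp(log_amp) * (basis @ xi)


def misfit(u, obs_idx, y, sigma):
    """Phi(u): negative log-likelihood for pointwise observations with Gaussian noise."""
    r = u[obs_idx] - y
    return 0.5 * np.dot(r, r) / sigma**2


def pcn_within_gibbs(y, obs_idx, basis, sigma, n_iter=4000, beta=0.05, step=0.3, seed=0):
    """Alternate a pCN move on xi with a random-walk Metropolis move on log_amp."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(basis.shape[1])  # start from a prior draw of the white noise
    log_amp = 0.0
    phi_cur = misfit(transform(xi, log_amp, basis), obs_idx, y, sigma)
    us, amps, acc_xi, acc_amp = [], [], 0, 0
    for _ in range(n_iter):
        # pCN proposal is reversible w.r.t. N(0, I), so only the misfit enters the ratio.
        xi_prop = np.sqrt(1.0 - beta**2) * xi + beta * rng.standard_normal(xi.size)
        phi_prop = misfit(transform(xi_prop, log_amp, basis), obs_idx, y, sigma)
        if np.log(rng.uniform()) < phi_cur - phi_prop:
            xi, phi_cur = xi_prop, phi_prop
            acc_xi += 1
        # Symmetric random walk on log_amp under an assumed N(0, 1) hyperprior.
        la_prop = log_amp + step * rng.standard_normal()
        phi_prop = misfit(transform(xi, la_prop, basis), obs_idx, y, sigma)
        log_ratio = (phi_cur - phi_prop) - 0.5 * (la_prop**2 - log_amp**2)
        if np.log(rng.uniform()) < log_ratio:
            log_amp, phi_cur = la_prop, phi_prop
            acc_amp += 1
        us.append(transform(xi, log_amp, basis))
        amps.append(np.exp(log_amp))
    return np.array(us), np.array(amps), acc_xi / n_iter, acc_amp / n_iter


# Usage on synthetic data (hypothetical test problem, not from the paper).
x, basis = kl_basis(n_grid=200, n_modes=100)
rng = np.random.default_rng(1)
obs_idx = np.arange(5, 200, 10)  # 20 observation points
sigma = 0.1
y = np.sin(2 * np.pi * x[obs_idx]) + sigma * rng.standard_normal(obs_idx.size)
samples, amps, acc_xi, acc_amp = pcn_within_gibbs(y, obs_idx, basis, sigma)
post_mean = samples[1000:].mean(axis=0)  # discard burn-in
```

The sampler only sees ξ through `transform`. You could therefore swap in a different map, for example thresholding the Gaussian field with `np.sign` to obtain a piecewise-constant, level-set style unknown, and the pCN step would stay unchanged. That is the sense in which the Gaussian-prior machinery carries over to non-Gaussian priors. The step sizes `beta` and `step` are untuned placeholders.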