Leveraging Large Language Models to Address Data Scarcity in Machine Learning: Applications in Graphene Synthesis
Biswajeet, Devi Dutta, Kadkhodaei, Sara
Machine learning in materials science faces challenges due to limited experimental data, as generating synthesis data is costly and time-consuming, especially with in-house experiments. Mining data from existing literature introduces issues like mixed data quality, inconsistent formats, and variations in reporting experimental parameters, complicating the creation of consistent features for the learning algorithm. Additionally, combining continuous and discrete features can hinder the learning process with limited data. Here, we propose strategies that utilize large language models (LLMs) to enhance machine learning performance on a limited, heterogeneous dataset of graphene chemical vapor deposition synthesis compiled from existing literature. These strategies include prompting modalities for imputing missing data points and leveraging large language model embeddings to encode the complex nomenclature of substrates reported in chemical vapor deposition experiments. The proposed strategies enhance graphene layer classification using a support vector machine (SVM) model, increasing binary classification accuracy from 39% to 65% and ternary accuracy from 52% to 72%. We compare the performance of the SVM and a GPT-4 model, both trained and fine-tuned on the same data. Our results demonstrate that the numerical classifier, when combined with LLM-driven data enhancements, outperforms the standalone LLM predictor, highlighting that in data-scarce scenarios, improving predictive learning with LLM strategies requires more than simple fine-tuning on datasets. Instead, it necessitates sophisticated approaches for data imputation and feature space homogenization to achieve optimal performance. The proposed strategies emphasize data enhancement techniques, offering a broadly applicable framework for improving machine learning performance on scarce, inhomogeneous datasets.
A multiple k-means cluster ensemble framework for clustering citation trajectories
Chakraborty, Joyita, Pradhan, Dinesh K., Nandi, Subrata
Citation maturity time varies across articles. However, the impact of all articles is measured in a fixed window. Clustering their citation trajectories helps understand the knowledge diffusion process and reveals that not all articles gain immediate success after publication. Moreover, clustering trajectories is necessary for paper impact recommendation algorithms. It is a challenging problem because citation time series exhibit significant variability due to nonlinear and non-stationary characteristics. Prior works propose a set of arbitrary thresholds and a fixed rule-based approach. All methods are primarily parameter-dependent. Consequently, this leads to inconsistencies in defining similar trajectories and ambiguities regarding their specific number. Most studies only capture extreme trajectories. Thus, a generalised clustering framework is required. This paper proposes a feature-based multiple k-means cluster ensemble framework. 195,783 and 41,732 well-cited articles from the Microsoft Academic Graph data are considered for clustering short-term (10-year) and long-term (30-year) trajectories, respectively. The framework has linear run time. Four distinct trajectories are obtained: Early Rise Rapid Decline (2.2%), Early Rise Slow Decline (45%), Delayed Rise No Decline (53%), and Delayed Rise Slow Decline (0.8%). Individual trajectory differences for the two spans are studied. Most papers exhibit Early Rise Slow Decline and Delayed Rise No Decline patterns. The growth and decay times, cumulative citation distribution, and peak characteristics of individual trajectories are redefined empirically. A detailed comparative study reveals that the proposed methodology can detect all distinct trajectory classes.
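The general idea of a feature-based k-means cluster ensemble can be sketched as follows. This is a minimal illustration and not the authors' implementation. The two features (relative peak position and final-to-peak ratio), the co-association consensus step, the 0.5 threshold, the toy citation counts, and all function names are assumptions made for the example, since the abstract does not specify them.

```python
import random


def trajectory_features(citations):
    """Map a yearly citation series to two illustrative features (assumed)."""
    peak = max(citations)
    peak_position = citations.index(peak) / (len(citations) - 1)  # 0 = early, 1 = late
    final_to_peak = citations[-1] / peak  # near 0 = declined, near 1 = no decline
    return (peak_position, final_to_peak)


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the old center for an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def co_association(points, k, runs, seed=0):
    """Fraction of k-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(coassoc, threshold=0.5):
    """Link points that co-cluster in more than `threshold` of the runs."""
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy 10-year citation counts (made up for illustration only).
trajectories = [
    [5, 20, 8, 3, 1, 0, 0, 0, 0, 0],
    [12, 9, 5, 3, 2, 1, 1, 0, 0, 1],
    [4, 10, 18, 9, 4, 2, 1, 1, 0, 0],
    [0, 1, 1, 2, 4, 6, 9, 12, 15, 18],
    [0, 0, 1, 3, 5, 8, 12, 14, 16, 15],
    [1, 1, 2, 2, 5, 9, 13, 17, 20, 19],
]
points = [trajectory_features(t) for t in trajectories]
labels = consensus_clusters(co_association(points, k=2, runs=20))
print(labels)
```

Combining several runs through a co-association matrix makes the final grouping less sensitive to any single random initialisation, which is one common motivation for cluster ensembles. The pairwise matrix here is quadratic in the number of articles, so this sketch does not reproduce the linear run time reported in the abstract.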
Configuration and Fabrication of Preformed Vine Robots
Agharese, Nathaniel, Okamura, Allison M.
Vine robots are a class of soft continuum robots that grow via tip eversion, allowing them to move their tip without relying on reaction forces from the environment. Constructed from compliant materials such as fabric and thin, flexible plastic, these robots are able to grow many times their original length with the use of fluidic pressure. They can be mechanically programmed/preformed to follow a desired path during growth by changing the structure of their body prior to deployment. We present a model for fabricating preformed vine robots with discrete bends. We apply this model across combinations of three fabrication methods and two materials. One fabrication method, taping folds into the robot body, is from the literature. The other two methods, welding folds and connecting fasteners embedded in the robot body, are novel. Measurements show the ability of the resulting vine robots to follow a desired path and show that the fabrication method has a significant impact. Results include bend angles with as little as 0.12 degrees of error, and segment lengths with as little as 0.36 mm of error. The required growth pressure and average growth speed of these preformed vine robots ranged from 11.5 to 23.7 kPa and 3.75 to 10 cm/s, respectively. These results validate the use of preformed vine robots for deployment along known paths, and serve as a guide for choosing a fabrication method and material combination based on the specific needs of the task.
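A path with discrete bends can be described by a list of straight segment lengths and the bend angle between consecutive segments. The sketch below computes the vertices of such a path in the plane. It is generic piecewise-straight geometry under a planar assumption, not the fabrication model from the paper, and the function name, arguments, and example values are hypothetical.

```python
import math


def preformed_path(segment_lengths_mm, bend_angles_deg):
    """Return the (x, y) vertices of a planar path made of straight segments.

    bend_angles_deg[i] is the heading change (counter-clockwise positive)
    applied after segment i, so there is one fewer angle than segments.
    """
    if len(bend_angles_deg) != len(segment_lengths_mm) - 1:
        raise ValueError("expected one bend angle between each pair of segments")
    x, y, heading = 0.0, 0.0, 0.0  # start at the origin, pointing along +x
    vertices = [(x, y)]
    for i, length in enumerate(segment_lengths_mm):
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        vertices.append((x, y))
        if i < len(bend_angles_deg):
            heading += math.radians(bend_angles_deg[i])
    return vertices


# Hypothetical target path: two 100 mm segments joined by a 90 degree bend.
path = preformed_path([100.0, 100.0], [90.0])
print(path)
```

Comparing vertices computed from the designed values with those computed from measured segment lengths and bend angles gives one way to see how small per-bend errors accumulate along the path.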