biochemistry
Advancing Scientific Text Classification: Fine-Tuned Models with Dataset Expansion and Hard-Voting
Rostam, Zhyar Rzgar K, Kertész, Gábor
Efficient text classification is essential for handling the increasing volume of academic publications. This study explores the use of pre-trained language models (PLMs), including BERT, SciBERT, BioBERT, and BlueBERT, fine-tuned on the Web of Science (WoS-46985) dataset for scientific text classification. To enhance performance, we augment the dataset by executing seven targeted queries in the WoS database, retrieving 1,000 articles per category aligned with WoS-46985's main classes. PLMs predict labels for this unlabeled data, and a hard-voting strategy combines predictions for improved accuracy and confidence. Fine-tuning on the expanded dataset with dynamic learning rates and early stopping significantly boosts classification accuracy, especially in specialized domains. Domain-specific models like SciBERT and BioBERT consistently outperform general-purpose models such as BERT. These findings underscore the efficacy of dataset augmentation, inference-driven label prediction, hard-voting, and fine-tuning techniques in creating robust and scalable solutions for automated academic text classification.
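The hard-voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hard_vote`, the example labels, and the tie-breaking rule (the label seen first, i.e. from the earliest model in the list, wins a tie) are all assumptions made for illustration. The agreement fraction stands in as a rough proxy for the confidence the abstract mentions.

```python
from collections import Counter


def hard_vote(per_model_labels):
    """Combine label predictions from several models by majority vote.

    per_model_labels: list of equal-length lists, one list of labels per model.
    Returns one (winning_label, agreement_fraction) tuple per example.
    Ties go to the label encountered first (assumed rule, not from the paper).
    """
    if not per_model_labels:
        return []
    n_models = len(per_model_labels)
    n_examples = len(per_model_labels[0])
    if any(len(labels) != n_examples for labels in per_model_labels):
        raise ValueError("every model must label the same number of examples")

    results = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one example.
    for votes in zip(*per_model_labels):
        label, count = Counter(votes).most_common(1)[0]
        results.append((label, count / n_models))
    return results


# Hypothetical predictions for three unlabeled abstracts (labels are illustrative).
predictions = {
    "BERT": ["Medical", "CS", "Biochemistry"],
    "SciBERT": ["Medical", "CS", "Medical"],
    "BioBERT": ["Medical", "ECE", "Biochemistry"],
    "BlueBERT": ["Medical", "CS", "Biochemistry"],
}
voted = hard_vote(list(predictions.values()))
for label, agreement in voted:
    print(f"{label}: {agreement:.2f}")
```

In a pipeline like the one described, examples with low agreement could be dropped before the voted labels are used for further fine-tuning. Whether the authors apply such a threshold is not stated in the abstract.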
Differentiable Folding for Nearest Neighbor Model Optimization
Krueger, Ryan K., Aviran, Sharon, Mathews, David H., Zuber, Jeffrey, Ward, Max
The Nearest Neighbor model is the $\textit{de facto}$ thermodynamic model of RNA secondary structure formation and is a cornerstone of RNA structure prediction and sequence design. The current functional form (Turner 2004) contains $\approx13,000$ underlying thermodynamic parameters, and fitting these to both experimental and structural data is computationally challenging. Here, we leverage recent advances in $\textit{differentiable folding}$, a method for directly computing gradients of the RNA folding algorithms, to devise an efficient, scalable, and flexible means of parameter optimization that uses known RNA structures and thermodynamic experiments. Our method yields a significantly improved parameter set that outperforms existing baselines on all metrics, including an increase in the average predicted probability of ground-truth sequence-structure pairs for a single RNA family by over 23 orders of magnitude. Our framework provides a path towards drastically improved RNA models, enabling the flexible incorporation of new experimental data, definition of novel loss terms, large training sets, and even treatment as a module in larger deep learning pipelines. We make available a new database, RNAometer, with experimentally-determined stabilities for small RNA model systems.
- Europe (0.29)
- Oceania > Australia > Western Australia (0.14)
- North America > United States > California > Yolo County > Davis (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Predicting the Future of AI with AI
The amount of scientific research in AI has been growing exponentially over the last few years, making it challenging for scientists and practitioners to keep track of the progress. Reports suggest that the number of ML papers doubles every 23 months. One of the reasons behind it is that AI is being leveraged in diverse disciplines like mathematics, statistics, physics, medicine, and biochemistry. This poses a unique challenge of organising different ideas and understanding new scientific connections. To this end, a group of researchers led by Mario Krenn and others from the Max Planck Institute for the Science of Light (MPL), Erlangen, Germany, the University of California, the University of Toronto, etc., jointly released a study on high-quality link prediction in an exponentially growing knowledge network.
- North America > Canada > Ontario > Toronto (0.56)
- North America > United States > California (0.25)
- Europe > Germany (0.25)
Why is current deep learning technology a dead end for Artificial General Intelligence?
To not question things is to agree to stay in the same place. Often during that process, your mind can go in the wrong direction. Still, you can learn a lot while exploring the uncharted territories of human potential. Forgive me for straying from the main topic for a moment, but first I want to share something. More than 10 years ago, I started to learn the hard way what the power of continuous effort is.
Filter to survive
Nowadays, the world accosts us with a veritable tsunami of stimuli, a wall of information that constantly demands our attention. It's too much to keep up with, let alone remember. Too many stimuli make our amygdala overheated and stressed. The amygdala is a small part of the brain, very close to the brain stem. It used to be our survival mechanism, saving us from the tiger in the bushes by making us run first and ask questions later.
Douglas Adams was right – knowledge without understanding is meaningless
John Naughton
Fans of Douglas Adams's Hitchhiker's Guide to the Galaxy treasure the bit where a group of hyper-dimensional beings demand that a supercomputer tells them the secret to life, the universe and everything. The machine, which has been constructed specifically for this purpose, takes 7.5m years to compute the answer, which famously comes out as 42. The computer helpfully points out that the answer seems meaningless because the beings who instructed it never knew what the question was. Machine-learning may soon enable us to accurately predict how a protein will fold. But it won't be scientific knowledge. It's years since I read Adams's wonderful novel, but an article published in Nature last month brought it vividly to mind.
Johnson & Johnson Post-doc federated and privacy-preserving machine learning Beerse, Belgium Informatics
Janssen Research & Development seeks to drive innovation and deliver transformational medicines for the treatment of diseases in six therapeutic areas: neuroscience, cardiovascular diseases and metabolism, infectious diseases, immunology, oncology and pulmonary hypertension. In these areas, Janssen aims to address and solve unmet medical needs through the development of small and large molecules, as well as vaccines. The Janssen campus in Beerse (Belgium) has a unique ecosystem covering the complete drug development life cycle, with all capabilities from basic science to market access on one campus. The integrated environment of our campus gives our people the chance to experience many different aspects of drug development throughout their career. It has a successful track record of over sixty years of drug discovery and development and is one of the most important innovation engines of the Janssen group worldwide.
- Information Technology > Artificial Intelligence > Machine Learning (0.79)
- Information Technology > Data Science > Data Mining > Big Data (0.45)
Molecular Biology for Computer Scientists
He also taught the biochemistry course that I finally took, two years after finishing my Ph.D. David J. States deserves much of the credit as well. In the three years we have been working together, he greatly extended my understanding of not only what biologists know, but how they think. He has read several drafts of this chapter and made helpful suggestions. David Landsman, Mark Boguski, Kalí Tal and Jill Shirmer have also read the chapter and made suggestions. Angel Lee graciously supplied the gel used in Figure 4. Of course, all remaining mistakes are my responsibility.
Letters to the Editor
I very much appreciated the Spring 1990 issue of the AI Magazine on Robotic Assembly and Task Planning. It seems to me, however, that some good work carried out on this subject in Europe in recent years has received little coverage. Also: comments on the low participation levels of women in the computer industry, suggestions for the inclusion of dissertation abstracts, comments on the Feldman article in the Fall 1990 issue, and a note about the discontinuance of plastic coverings on AI Magazine.
- North America > United States > California > San Mateo County > Menlo Park (0.15)
- North America > United States > New York (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)