Microclustering: When the Cluster Sizes Grow Sublinearly with the Size of the Data Set

arXiv.org Machine Learning

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman-Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some tasks, this assumption is undesirable. For example, when performing entity resolution, the size of each cluster is often unrelated to the size of the data set. Consequently, each cluster contains a negligible fraction of the total number of data points. Such tasks therefore require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new model that exhibits this property. We compare this model to several commonly used clustering models by checking model fit using real and simulated data sets.
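
The linear-growth behavior described above can be illustrated with a small simulation. The sketch below is not the model introduced in the paper; it samples partitions from a Chinese restaurant process, the partition distribution underlying a Dirichlet process mixture, and reports the share of the data held by the largest cluster. The function names, the concentration value alpha = 1.0, and the sample sizes are assumptions chosen for illustration. If cluster sizes grow linearly, that share should stay roughly constant as n increases, rather than shrinking toward zero as a microclustering model would require.

```python
import random


def crp_cluster_sizes(n, alpha, rng):
    """Sample cluster sizes for n points from a Chinese restaurant process.

    alpha is the concentration parameter (an illustrative choice here).
    """
    sizes = []
    for i in range(n):
        # Point i opens a new cluster with probability alpha / (i + alpha),
        # otherwise joins an existing cluster with probability
        # proportional to that cluster's current size.
        u = rng.random() * (i + alpha)
        if u < alpha:
            sizes.append(1)
            continue
        u -= alpha
        for k, size in enumerate(sizes):
            if u < size:
                sizes[k] += 1
                break
            u -= size
        else:
            # Guard against floating-point rounding at the upper boundary.
            sizes[-1] += 1
    return sizes


def largest_cluster_fraction(sizes):
    """Fraction of all points that fall in the largest cluster."""
    return max(sizes) / sum(sizes)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (100, 1000, 10000):
        # Average over a few draws, since a single partition is noisy.
        fractions = [
            largest_cluster_fraction(crp_cluster_sizes(n, 1.0, rng))
            for _ in range(20)
        ]
        print(n, round(sum(fractions) / len(fractions), 3))
```

Running the script prints the average largest-cluster share for each n. Comparing the values across rows gives a rough check of whether the share is stable as the data set grows.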


Specialized chaperones required

Science

Although some proteins can reach a properly folded state without assistance, many require help to adopt the correct topology and avoid kinetic trapping in nonnative states. Chaperones encapsulate guest proteins and use adenosine triphosphate (ATP)–driven conformational changes to help them fold, but not all chaperones work for all substrates. Balchin et al. compared the folding pathway of the cytoskeleton protein actin with its proper chaperone, TRiC, to the incorrect folding that occurs with the bacterial chaperone GroEL. TRiC functions by stabilizing an extended form of actin with the proper secondary structure and topology. ATP binding and hydrolysis drive release of this partially folded intermediate into the chaperone, where it can successfully fold.


Mason cautions young models

FOX News

NEW YORK – Model Claudia Mason didn't have a guide to the glamorous and sometimes difficult life of modeling. So, she decided to write one: "Finding the Supermodel in You." For one, Mason says young models should never go to castings or shoots without a chaperone. "My mother always insisted that there was a chaperone present if she couldn't - she was a single mother, working raising me – to go off and accompany me to Europe or certain jobs, or wherever that there was a chaperone," Mason told FOX411. "So, it is so important to have some adult figure."


Report: Chaperones Didn't See Student Struggling in Water

U.S. News

The Sun Journal reports that preliminary findings by a law firm hired by the superintendent indicate that adult chaperones felt the lifeguard was slow to respond. The body of 13-year-old Rayan Lewis was ultimately found a half-hour after he went missing.


Putting the RuBisCO pieces together

Science

Among the thousands of different enzymes that have evolved in nature, ribulose-1,5-bisphosphate carboxylase-oxygenase (known as RuBisCO) holds a special place. It is the enzyme in plants, algae, and many photosynthetic bacteria that ultimately takes energy derived from the Sun and uses it to convert or "fix" atmospheric CO2 into organic forms of carbon that constitute the basis for life (1). To carry out such a massive chemical conversion requires a huge amount of the enzyme, especially because RuBisCO performs reactions quite slowly. Accordingly, RuBisCO is believed to be the most abundant enzyme on the planet (1, 2). RuBisCO is unusual in other ways as well.