Dog/cat classification is a fundamentally different problem from what is often done in stats: highly complex with low noise, versus moderately complex with tons of noise. (Though there's always the exception — in computer experiments there is often little to no noise and the problem is immensely complex, e.g., modeling combustion-instability simulators for jet engines using emulators.) As a statistician, 3000 observations can get you a long way for certain classes of problems, but for many others it's extremely small. People around me touch on problems ranging from hundreds of millions to billions of observations with hundreds of variables, to just a few hundred observations with hundreds of thousands of variables, to the weird fat cases where you have hundreds of thousands of both observations and variables. My point being that your sample of n = 1 stat grad student is a very poor representation of the community as a whole.
Are you working with image data? This got me thinking -- what can we do when there are multiple object categories in an image? Building an image classification model was a good start, but I wanted to expand my horizons and take on a more challenging task -- building a multi-label image classification model! I didn't want to use toy datasets to build my model -- that felt too generic. And then it struck me -- movie/TV series posters often span a variety of genres at once. Could I build my own multi-label image classification model to predict the different genres just by looking at the poster?
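The key difference from ordinary classification is in the prediction step: instead of a softmax picking exactly one class, a multi-label model scores every label independently and keeps all labels that clear a threshold. Here is a minimal sketch of that step in NumPy, assuming a model that outputs one raw score (logit) per genre; the genre list and scores are hypothetical placeholders.

```python
import numpy as np

# Hypothetical label set -- any set of genres would work the same way.
GENRES = ["Action", "Comedy", "Drama", "Horror"]

def sigmoid(x):
    # Each label gets its own independent probability in [0, 1].
    return 1.0 / (1.0 + np.exp(-x))

def predict_genres(logits, threshold=0.5):
    """Return every genre whose sigmoid probability clears the threshold.

    Unlike softmax (single-label), several genres -- or none -- can be
    predicted for the same poster.
    """
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [g for g, p in zip(GENRES, probs) if p >= threshold]

# One poster can trigger several genres at once:
print(predict_genres([2.0, -1.0, 1.5, -3.0]))  # → ['Action', 'Drama']
```

The only change needed on the training side is the matching loss: binary cross-entropy summed over labels, rather than categorical cross-entropy over a single label.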
Automatically classifying the tissue types of a Region of Interest (ROI) in medical imaging is an important application in Computer-Aided Diagnosis (CAD), e.g., classifying breast parenchymal tissue in mammograms and classifying lung disease patterns in High-Resolution Computed Tomography (HRCT). Recently, the bag-of-features method has shown its power in this field by treating each ROI as a set of local features. In this paper, we investigate using the bag-of-features strategy to classify tissue types in medical imaging applications. Two important issues are considered here: visual vocabulary learning and weighting. Although there are already plenty of algorithms for each, all of them treat the two independently: the vocabulary is learned first, and the histogram is weighted afterwards. Inspired by Auto-Context, which learns the features and the classifier jointly, we develop a novel algorithm that learns the vocabulary and the weights jointly. The new algorithm, called Joint-ViVo, works in an iterative way. In each iteration, we first learn a weight for each visual word by maximizing the margin of ROI triplets, and then select the most discriminative visual words, based on the learned weights, for the next iteration. We test our algorithm on three tissue classification tasks: identifying brain tissue type in magnetic resonance imaging (MRI), classifying lung tissue in HRCT images, and classifying breast tissue density in mammograms. The results show that Joint-ViVo performs effectively for classifying tissues.
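The iterative structure described above — weight each visual word from triplet margins, then prune to the most discriminative words — can be sketched as follows. This is a simplified illustration, not the paper's actual optimization: the per-word margin score and the pruning schedule are assumptions, and the real Joint-ViVo solves a margin-maximization problem rather than this closed-form heuristic.

```python
import numpy as np

def joint_vivo_sketch(histograms, triplets, n_iters=3, keep_frac=0.8):
    """Illustrative sketch of the Joint-ViVo loop (simplified assumption).

    histograms : (n_rois, n_words) bag-of-features histograms, one row per ROI.
    triplets   : list of (anchor, positive, negative) ROI indices, where
                 anchor and positive share a tissue type and negative does not.

    Each iteration (1) scores every active visual word by how much it
    separates negatives from positives across the triplets, then
    (2) keeps only the top keep_frac fraction of words for the next round.
    """
    active = np.arange(histograms.shape[1])   # indices of currently kept words
    weights = np.ones(len(active))
    for _ in range(n_iters):
        H = histograms[:, active]
        # Per-word margin contribution: a word is discriminative if the
        # anchor-negative gap exceeds the anchor-positive gap on that word.
        margin = np.zeros(len(active))
        for a, p, n in triplets:
            margin += np.abs(H[a] - H[n]) - np.abs(H[a] - H[p])
        weights = np.maximum(margin, 0.0)           # learned word weights
        k = max(1, int(len(active) * keep_frac))    # number of words to keep
        order = np.argsort(weights)[::-1][:k]       # most discriminative first
        active, weights = active[order], weights[order]
    return active, weights
```

On a toy example where only two of five words separate the two tissue classes, the loop prunes away the uninformative words and returns exactly those two, mirroring the vocabulary-selection behavior the abstract describes.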