One thing you can do, for example, is output an intermediate layer's activation for every data row, then train your classifier/regressor/whatever on those activations rather than on the original features. Which layer to use depends on the problem you're trying to solve, but the closer you get to the final dense layers, the more the activations reflect what the pre-trained model was originally trained on (at least for networks that only inject the error signal at the end).

As an example in Python/Lasagne: say you've defined an architecture to load VGG16, and you want the output of one of the intermediate layers. You can further process those activations to get your features. Generally speaking, though, it's probably better to clip off the final dense layers, replace them with something that outputs what your problem needs, and train for a few epochs.
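A minimal sketch of the idea, without assuming Lasagne is installed: the network, its weights, and the dataset below are all made up for illustration. In real Lasagne code you would build VGG16 from the published recipe and tap a layer with `lasagne.layers.get_output(net['fc6'], X)`; here the forward pass is simulated in numpy, and a trivial nearest-centroid classifier is fit on the tapped activations instead of the raw inputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for pre-trained weights (hypothetical; in practice these come
# from the loaded VGG16 parameters, not random initialization).
W1 = rng.normal(size=(4, 8))   # "layer 1" weights
W2 = rng.normal(size=(8, 8))   # "layer 2" weights -- the layer we tap

def intermediate_activations(X):
    """Run the forward pass only up to the layer whose output we want."""
    h1 = np.maximum(X @ W1, 0)   # ReLU layer 1
    h2 = np.maximum(h1 @ W2, 0)  # ReLU layer 2: our feature representation
    return h2

# Fake two-class dataset, classes separated along the first input dimension.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

# Extract activations for every data row, then train a simple classifier
# (nearest class centroid) on those activations rather than on X itself.
feats = intermediate_activations(X)
centroids = np.stack([feats[y == c].mean(axis=0) for c in (0, 1)])

def predict(Xnew):
    f = intermediate_activations(Xnew)
    dists = np.linalg.norm(f[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

preds = predict(X)
```

Any downstream model (SVM, gradient boosting, etc.) can replace the centroid step; the point is only that the features fed to it are the tapped activations.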