Review for NeurIPS paper: Compositional Explanations of Neurons
–Neural Information Processing Systems
Summary and Contributions: This paper presents a quite thoughtful and informative attempt to understand what information is encoded in individual neurons and sets of neurons. (By "neurons" the authors mean units in neural networks, not real brain neurons. This is a somewhat unfortunate use of the word, and it might be helpful in the future to speak of neural network units or some other term instead.) It is by now clear to anyone working with these networks that each individual neuron, especially at the lower and intermediate levels, encodes not some simple human-understandable feature (like color red, size large, word class noun, type Person), but rather a complex combination of what one could call sub-facets, each of which by itself is often not easily described to a human. Combinations of these sub-facets, taken from different neurons acting in tandem, JOINTLY encode the facets that are more accessible to humans. But of course the other sub-facets also encoded by the neurons present in a human-accessible feature cluster might encode [parts of] a variety of totally unrelated other features, with the result that simple hotspot analysis and similar highlighting techniques are never fully determinate or clear, but always rather ambiguous and 'smeared'.
Feb-5-2025, 19:25:41 GMT