Goto

Collaborating Authors

 deepsith



DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Neural Information Processing Systems

Extracting temporal relationships over a range of scales is a hallmark ofhuman perception and cognition---and thus it is a critical feature of machinelearning applied to real-world problems. Neural networks are either plaguedby the exploding/vanishing gradient problem in recurrent neural networks(RNNs) or must adjust their parameters to learn the relevant time scales(e.g., in LSTMs).


DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Neural Information Processing Systems

After enough time has elapsed, the events that were presented close in time will gradually blend together, as illustrated in the bottom panel of Figure 1. We used the parameters presented in that work for the experiment with the adding problem used here. Table 1: Parameter values used for LSTM networks. Table 2: Parameter values used for LMU networks. "Coupled Oscillatory Recurrent Neural Network (coRNN): An Accurate and (Gradient) Stable Architecture for Learning Long Time Dependencies."


DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Neural Information Processing Systems

After enough time has elapsed, the events that were presented close in time will gradually blend together, as illustrated in the bottom panel of Figure 1. We used the parameters presented in that work for the experiment with the adding problem used here. Table 1: Parameter values used for LSTM networks. Table 2: Parameter values used for LMU networks. "Coupled Oscillatory Recurrent Neural Network (coRNN): An Accurate and (Gradient) Stable Architecture for Learning Long Time Dependencies."



DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Neural Information Processing Systems

Extracting temporal relationships over a range of scales is a hallmark ofhuman perception and cognition---and thus it is a critical feature of machinelearning applied to real-world problems. Neural networks are either plaguedby the exploding/vanishing gradient problem in recurrent neural networks(RNNs) or must adjust their parameters to learn the relevant time scales(e.g., in LSTMs). Each SITH module is simply aset of time cells coding what happened when with a geometrically-spaced set oftime lags. The dense connections between layers change the definition of whatfrom one layer to the next. The geometric series of time lags implies thatthe network codes time on a logarithmic scale, enabling DeepSITH network tolearn problems requiring memory over a wide range of time scales.