Train vast neural networks together
Hivemind uses a Kademlia-based distributed hash table (DHT) that can scale to tens of thousands of peers with logarithmic search complexity. On each forward pass, a peer first determines which "speciality" of experts is needed to process the current inputs using a small "gating function" module. It then finds the k (e.g., 4) most suitable experts from other peers in the network using the DHT protocol. Finally, it sends forward-pass requests to the selected experts, collects their outputs, and averages them into the final prediction. Compared to traditional architectures, the Mixture-of-Experts needs much less bandwidth because every input is sent to only a small fraction of all experts. More importantly, decentralized Mixture-of-Experts layers are inherently fault-tolerant: if some of the chosen experts fail to respond, the model simply averages the outputs of those that did, treating the failures as a form of dropout.
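Here is a minimal, self-contained PyTorch sketch of that forward pass. Everything in it is illustrative rather than the real hivemind API: a plain dictionary stands in for the Kademlia DHT lookup, and a `FlakyRemoteExpert` that randomly raises a timeout simulates an unresponsive peer.

```python
import random
import torch
import torch.nn as nn

class FlakyRemoteExpert(nn.Module):
    """Hypothetical stand-in for an expert hosted on another peer.
    Randomly 'fails' to mimic a peer that does not respond in time."""

    def __init__(self, dim: int, failure_rate: float = 0.3):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.failure_rate = failure_rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if random.random() < self.failure_rate:
            raise TimeoutError("peer did not respond")  # simulated network failure
        return self.ffn(x)


class DecentralizedMoE(nn.Module):
    """Sketch of the layer described above:
    gate -> find k experts -> query them -> average whatever arrives."""

    def __init__(self, dim: int, num_specialities: int, k: int = 4):
        super().__init__()
        self.gating = nn.Linear(dim, num_specialities)  # small gating function
        # In hivemind the experts live on other peers and are found through
        # the DHT; here a local dict plays the role of that lookup.
        self.dht = {i: FlakyRemoteExpert(dim) for i in range(num_specialities)}
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # 1. Score expert specialities for the current inputs.
        speciality_scores = self.gating(x).mean(dim=0)
        # 2. "Find" the k most suitable experts (a DHT search in the real system).
        top_k = speciality_scores.topk(self.k).indices.tolist()
        # 3. Send forward-pass requests; a silent peer is simply skipped.
        outputs = []
        for expert_id in top_k:
            try:
                outputs.append(self.dht[expert_id](x))
            except TimeoutError:
                continue  # a failed expert is just "dropped out"
        if not outputs:
            return x  # sketch-only fallback if every chosen expert failed
        # 4. Average the outputs that did arrive.
        return torch.stack(outputs).mean(dim=0)


x = torch.randn(8, 64)
moe = DecentralizedMoE(dim=64, num_specialities=32, k=4)
print(moe(x).shape)  # torch.Size([8, 64])
```

In the real system, step 2 is a logarithmic-time DHT search across peers and step 3 travels over the network, but the select-query-average structure and the skip-on-failure behavior are the same idea the paragraph above describes.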
Aug-27-2020, 14:00:17 GMT