Alleviate Anchor-Shift: Explore Blind Spots with Cross-View Reconstruction for Incomplete Multi-View Clustering

Neural Information Processing Systems

Despite efficiency improvements, existing methods overlook the misguidance in anchor learning induced by partially missing samples, i.e., the absence of samples shifts the learned anchors, which in turn leads to sub-optimal clustering performance.
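To make the anchor-shift effect concrete, here is a minimal, illustrative sketch (not the method of this paper): k-means anchors are fitted on a complete toy view and again after a biased subset of samples is dropped, and the displacement of the matched anchors is measured. All data and parameters below are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# a toy "complete view": two well-separated Gaussian clusters
full = np.vstack([rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2)),
                  rng.normal(loc=[4.0, 4.0], scale=0.5, size=(200, 2))])

# biased missingness: samples on the right side of the second cluster are absent
keep = (np.arange(len(full)) < 200) | (full[:, 0] < 4.0)
partial = full[keep]

anchors_full = KMeans(n_clusters=2, n_init=10, random_state=0).fit(full).cluster_centers_
anchors_part = KMeans(n_clusters=2, n_init=10, random_state=0).fit(partial).cluster_centers_

# match anchors by nearest neighbour and measure their displacement (the "shift")
d = np.linalg.norm(anchors_full[:, None, :] - anchors_part[None, :, :], axis=-1)
shift = d[np.arange(2), d.argmin(axis=1)]
print("per-anchor shift:", shift)  # the anchor of the damaged cluster moves most
```

Because the missing samples are not missing at random, the anchor of the affected cluster is pulled toward the surviving samples, which is exactly the shift that then degrades anchor-based clustering.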


FedDR - Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization (A. The Analysis of Algorithm 1: Randomized Coordinate Variant - FedDR)

Neural Information Processing Systems

FedAvg: FedAvg [29] has become a de facto standard federated learning algorithm in practice. However, it has several limitations, as discussed in many papers, including [23]. It is also difficult to analyze the convergence of FedAvg, especially in the nonconvex case and under heterogeneity (both statistical and system heterogeneity). Moreover, FedAvg originally specifies SGD with a fixed number of epochs and a fixed learning rate as its local solver, making it less flexible in practice.
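For reference, here is a minimal FedAvg sketch on a toy least-squares problem, using exactly the local solver described above: plain SGD with a fixed number of epochs and a fixed learning rate. The problem setup, client data, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.01, epochs=5, batch=32, seed=0):
    """Local solver: SGD with a fixed epoch count and a fixed learning rate."""
    rng = np.random.default_rng(seed)
    w = w.copy()
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for s in range(0, n, batch):
            b = idx[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)  # least-squares gradient
            w -= lr * grad
    return w

def fedavg(clients, d, rounds=20):
    """Server loop: broadcast the model, run local SGD, average the results."""
    w = np.zeros(d)
    for _ in range(rounds):
        sizes = np.array([len(X) for X, _ in clients], dtype=float)
        local_models = [local_sgd(w, X, y) for X, y in clients]
        w = np.average(local_models, axis=0, weights=sizes)  # size-weighted mean
    return w

# toy statistically heterogeneous clients: shifted inputs, shared linear model
rng = np.random.default_rng(0)
true_w = rng.normal(size=5)
clients = []
for k in range(4):
    X = rng.normal(loc=k, size=(100, 5))  # each client sees a shifted distribution
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=100)))
w = fedavg(clients, d=5)
```

Weighting the average by client data size matches the original FedAvg aggregation; the per-client input shift is one simple way to mimic statistical heterogeneity.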






Supplementary Material

Neural Information Processing Systems

R(h). (23) Here, for simplicity, we abuse the symbol D in (22) by maximizing out h′ in the original D. In the top-left area of P, suppose only one example (marked by ×, with vertical coordinate 1) is confidently labeled as positive, while the remaining examples are labeled with so little confidence that they do not contribute to the risk R. Similarly, there is only one confidently labeled example in the bottom-right area of P, and it is negative with vertical coordinate 1. Whenever λ > 2, the optimal h_λ lies in (0, 1) and can be found by solving a quadratic equation. In contrast, di-MDD is immune to this problem because R is used only to determine h, while the di-MDD value itself is contributed solely by D. As in the large-λ scenario, we do not change the feature distributions of the source and target domains, hence keeping D(h) = 1 - |h|.



f6a8dd1c954c8506aadc764cc32b895e-Paper.pdf

Neural Information Processing Systems

Clustered attention exploits similarities between queries, grouping them to reduce the computational cost. In particular, we perform fast clustering using locality-sensitive hashing and K-Means, and compute the attention only once per cluster.
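A minimal NumPy sketch of the idea described in this abstract may help: queries are grouped with a few steps of plain k-means (the locality-sensitive-hashing speed-up mentioned above is omitted for brevity), attention is computed once per cluster centroid, and every query inherits its cluster's output. Function names and shapes are illustrative assumptions.

```python
import numpy as np

def clustered_attention(Q, K, V, n_clusters=8, n_iters=10, seed=0):
    """Approximate softmax attention by computing it once per query cluster."""
    rng = np.random.default_rng(seed)
    N, d = Q.shape
    # plain k-means over the queries (the paper uses LSH to make this step fast)
    centroids = Q[rng.choice(N, size=n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        dists = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (N, C)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = Q[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    labels = ((Q[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
    # softmax attention, evaluated once per centroid instead of once per query
    scores = centroids @ K.T / np.sqrt(d)        # (C, M)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    cluster_out = weights @ V                    # (C, d_v)
    return cluster_out[labels]                   # each query gets its cluster's output

# toy usage: 128 queries attend over 64 keys/values via 8 cluster centroids
rng = np.random.default_rng(1)
Q = rng.normal(size=(128, 16))
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
out = clustered_attention(Q, K, V)  # (128, 16)
```

The cost of the attention itself drops from O(N·M) to O(C·M) score evaluations, at the price of all queries in a cluster sharing one attention distribution.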