Resolving the data ambiguity for periodic crystals
The fundamental model of all solid crystalline materials is a periodic set of atomic centers considered up to rigid motion in Euclidean space. The major obstacle to materials discovery was highly ambiguous representations of periodic crystals, which prevented fast and reliable comparisons and led to numerous (near-)duplicates in many databases of experimental and simulated crystals. This paper resolves the ambiguity by invariants, which are descriptors without false negatives. The new Pointwise Distance Distributions (PDD) is a numerical matrix with a near-linear time complexity and an exactly computable metric. The strongest theoretical result is generic completeness (absence of false positives) for all finite and periodic sets of points in any dimension. The strength of PDD is shown by 200B+ pairwise comparisons of all periodic structures in the world's largest collection of existing materials (the Cambridge Structural Database), completed in two days on a modest desktop.
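As a rough illustration of what a distance-based invariant looks like, here is a minimal sketch for a finite point set only. It assumes the commonly described construction in which each row holds the ascending distances from one point to its k nearest neighbours, and identical rows are merged into a single weighted row. The abstract itself does not spell this out, so the function name `pdd`, the rounding tolerance, and the output layout are illustrative assumptions, and the periodic case is not handled.

```python
import math
from collections import Counter


def pdd(points, k):
    """Sketch of a PDD-style matrix for a finite point set (assumed construction).

    Returns a list of (weight, distances) rows, where distances are the k
    smallest distances from one point to the others, in ascending order.
    Identical rows are merged and the weights sum to 1.
    Requires k <= len(points) - 1.
    """
    m = len(points)
    rows = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Round so that rows equal up to floating-point noise are merged.
        rows.append(tuple(round(d, 10) for d in dists[:k]))
    counts = Counter(rows)
    # Order rows lexicographically so the output does not depend on point order.
    return [(c / m, row) for row, c in sorted(counts.items())]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(x + 3, y + 4) for x, y in square]
print(pdd(square, 2))
print(pdd(square, 2) == pdd(shifted, 2))
```

Because the rows depend only on inter-point distances, translating or rotating the input leaves the result unchanged, which is the "no false negatives" property the abstract attributes to invariants.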
Resolving the Tug-of-War: A Separation of Communication and Learning in Federated Learning
Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed data. In this paradigm, each client trains the parameters of a model locally and the server periodically aggregates the parameters from the clients. Learning and communication are therefore performed over the same set of parameters. However, we find that learning and communication have fundamentally divergent requirements for parameter selection, akin to two opposing teams in a tug-of-war game. To mitigate this discrepancy, we introduce FedSep, a novel two-layer federated learning framework.
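The baseline loop described above (local training, then periodic server-side aggregation) can be sketched as follows. The abstract does not say how the server aggregates, so this assumes a sample-size-weighted average in the style of FedAvg; it is not FedSep itself, and the name `aggregate` and the flat-list parameter layout are hypothetical.

```python
def aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style assumption).

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  number of local training samples per client, used as weights.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients; the second holds three times as much data as the first.
global_params = aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

In this baseline the vector that is trained and the vector that is transmitted are the same object, which is the coupling the abstract argues against.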
Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning
Chen, Hongyao, Xu, Tianyang, Wu, Xiaojun, Kittler, Josef
Batch Normalisation (BN) is widely used in conventional deep neural network training to harmonise the input-output distributions for each batch of data. However, federated learning, a distributed learning paradigm, faces the challenge of dealing with non-independent and identically distributed data among the client nodes. Due to the lack of a coherent methodology for updating BN statistical parameters, standard BN degrades the federated learning performance. To this end, it is urgent to explore an alternative normalisation solution for federated learning. In this work, we resolve the dilemma of the BN layer in federated learning by developing a customised normalisation approach, Hybrid Batch Normalisation (HBN). HBN separates the update of statistical parameters (i.e., means and variances used for evaluation) from that of learnable parameters (i.e., parameters that require gradient updates), obtaining unbiased estimates of global statistical parameters in distributed scenarios. In contrast with the existing solutions, we emphasise the supportive power of global statistics for federated learning. The HBN layer introduces a learnable hybrid distribution factor, allowing each computing node to adaptively mix the statistical parameters of the current batch with the global statistics. Our HBN can serve as a powerful plugin to advance federated learning performance. It reflects promising merits across a wide range of federated learning settings, especially for small batch sizes and heterogeneous data.
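The idea of mixing batch statistics with global statistics can be illustrated with a small sketch. The abstract does not give the exact mixing rule, so this assumes a simple convex combination controlled by a factor `lam` in [0, 1]; the function name `hybrid_normalise`, the scalar-feature setup, and treating `lam` as a plain argument rather than a learned parameter are all illustrative assumptions, not the authors' implementation.

```python
import math


def hybrid_normalise(batch, global_mean, global_var, lam, eps=1e-5):
    """Normalise one feature using a mix of batch and global statistics.

    lam = 1.0 uses only the current batch statistics (standard BN in training);
    lam = 0.0 uses only the global statistics.
    The convex-combination form is an assumption made for this sketch.
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
    # Blend the two sets of statistics with the hybrid factor.
    mean = lam * batch_mean + (1.0 - lam) * global_mean
    var = lam * batch_var + (1.0 - lam) * global_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


batch = [1.0, 3.0]
print(hybrid_normalise(batch, global_mean=0.0, global_var=1.0, lam=1.0, eps=0.0))
print(hybrid_normalise(batch, global_mean=0.0, global_var=1.0, lam=0.0, eps=0.0))
```

With a small or skewed local batch, a factor closer to 0 leans on the global statistics, which matches the abstract's claim that the approach helps most for small batch sizes and heterogeneous data.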
Resolving the Human-Subjects Status of ML's Crowdworkers
As the focus of machine learning (ML) has shifted toward settings characterized by massive datasets, researchers have become reliant on crowdsourcing platforms [13,25]. Just for the natural language processing (NLP) task of passage-based question answering (QA), more than 15 new datasets containing at least 50k annotations have been introduced since 2016. Prior to that, available QA datasets contained orders of magnitude fewer examples. The ability to construct such enormous resources derives mostly from the liquid market for temporary labor on crowdsourcing platforms such as Amazon Mechanical Turk. These practices, however, have raised ethical concerns, including low wages [5,26]; disparate access, benefits, and harms of developed applications [1,20]; reproducibility of proposed methods [4,21]; and potential for unfairness and discrimination in the resulting technologies [9,14].
24 The Design Philosophy of POP-2 R.J. Popplestone
INTRODUCTION
Pop-2 (Burstall & Popplestone, 1968) represents a fairly far-reaching revision, extension and systematization of the author's Pop-1 (Popplestone, 1968). The thoughts expressed here consequently represent a point of view elaborated jointly with my co-designer of Pop-2, Dr R. M. Burstall.
AIMS
POP-2 is a language to be implemented on real machines, using modest resources of manpower. An implementation of the language must be possible which permits large problems to be tackled. This implementation must not be too inefficient in its use of machine time, or too profligate in its use of store. The language must also take into account such properties of real machines as overwritable store--that is to say it must not be a purely constructive language: it must allow assignment. Pop-2 handles a large range of structures such as list cells (cf. CPO and records (called beads in AED).