Supervised Learning
College admissions scam case set for Sept. 8 trial in Boston
USC's Pat Haden and now two "Varsity Blues" defendants want to file briefs in the college admissions scam case under seal. What they want to share, they argue, is "sensitive, confidential, and personally identifiable information." Haden, the former athletic director at the University of Southern California, has filed a motion in federal court in Boston to "quash a trial subpoena for testimony issued by counsel for defendants," as the Herald has reported. He was just granted permission to state his case in private. Defendants Gamal Abdelaziz and John Wilson are seeking that same protection to keep their arguments out of the public eye -- for now.
MatSat: a matrix-based differentiable SAT solver
Sato, Taisuke, Kojima, Ryosuke
We propose a new approach to SAT solving which solves SAT problems in vector spaces as a cost minimization problem of a non-negative differentiable cost function J^sat. In our approach, a solution, i.e., satisfying assignment, for a SAT problem in n variables is represented by a binary vector u in {0,1}^n that makes J^sat(u) zero. We search for such u in a vector space R^n by cost minimization, i.e., starting from an initial u_0 and minimizing J^sat to zero while iteratively updating u by Newton's method. We implemented our approach as a matrix-based differentiable SAT solver, MatSat. Although existing mainstream SAT solvers decide each bit of a solution assignment one by one, be they of conflict driven clause learning (CDCL) type or of stochastic local search (SLS) type, MatSat fundamentally differs from them in that it continuously approaches a solution in a vector space. We conducted an experiment to measure the scalability of MatSat with random 3-SAT problems in which MatSat could find a solution up to n=10^5 variables. We also compared MatSat with four state-of-the-art SAT solvers including winners of SAT competition 2018 and SAT Race 2019 in terms of time for finding a solution, using a random benchmark set from SAT 2018 competition and an artificial random 3-SAT instance set. The result shows that MatSat comes in second in both test sets and outperforms all the CDCL type solvers.
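The idea of treating SAT as minimization of a differentiable cost can be illustrated with a small sketch. The abstract does not define J^sat, so the cost below is an assumed stand-in (a sum over clauses of the product of each literal's "falseness"), and the update is plain projected gradient descent rather than the Newton's method the authors describe. The names `CLAUSES`, `sat_cost`, `sat_cost_grad`, and `minimize` are hypothetical.

```python
import numpy as np

# Hypothetical CNF encoding: signed 1-based variable indices per clause.
# (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
CLAUSES = [[1, -2], [2, 3], [-1, -3]]


def literal_falseness(u, lit):
    """How false a literal is under a relaxed assignment u in [0, 1]^n."""
    value = u[abs(lit) - 1]
    return 1.0 - value if lit > 0 else value


def sat_cost(u, clauses):
    """Assumed stand-in cost (not the paper's J^sat).

    Each clause contributes the product of its literals' falseness, so the
    cost is non-negative on [0, 1]^n and zero at a satisfying binary u.
    """
    return sum(
        np.prod([literal_falseness(u, lit) for lit in clause])
        for clause in clauses
    )


def sat_cost_grad(u, clauses):
    """Gradient of sat_cost, assuming a variable occurs once per clause."""
    grad = np.zeros_like(u)
    for clause in clauses:
        for i, lit in enumerate(clause):
            others = [l for j, l in enumerate(clause) if j != i]
            rest = np.prod([literal_falseness(u, l) for l in others])
            grad[abs(lit) - 1] += -rest if lit > 0 else rest
    return grad


def minimize(u0, clauses, step=0.1, iters=500):
    """Projected gradient descent (a swapped-in optimizer, not Newton's)."""
    u = np.array(u0, dtype=float)
    for _ in range(iters):
        u = np.clip(u - step * sat_cost_grad(u, clauses), 0.0, 1.0)
    return u
```

A relaxed result from `minimize` would still need to be rounded to {0,1}^n and checked with `sat_cost`; this sketch makes no claim about convergence, and a symmetric start such as all 0.5 is a stationary point for this particular instance.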
'Jane' Starring Madelaine Petsch Delays Filming Due To COVID-19 Cases On Set
Startup studio and streaming service Creator Plus delayed its filming schedule for "Jane" after two COVID-19 cases were confirmed on set in New Mexico. In a statement obtained by Variety, Creator Plus said the cases were detected "while adhering to strict safety daily testing protocols." "As a result, we immediately implemented a six-day shutdown, which started yesterday (as a half day) from the initial case we received. All lead actors are continuing to test negative despite exposure. We're working closely with our SAG representatives, the CDC and the All Together New Mexico 'COVID Safe Practices for Individuals and Employers' while upholding SAG's Return to Work agreement," the company said in a statement Wednesday.
CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding
Wang, Dong, Ding, Ning, Li, Piji, Zheng, Hai-Tao
Although pre-trained language models have proven useful for learning high-quality semantic representations, these models are still vulnerable to simple perturbations. Recent works that aim to improve the robustness of pre-trained models mainly focus on adversarial training from perturbed examples with similar semantics, neglecting the utilization of different or even opposite semantics. Unlike in the image processing field, text is discrete, and a few word substitutions can cause significant semantic changes. To study the impact of semantics caused by small perturbations, we conduct a series of pilot experiments and surprisingly find that adversarial training is useless or even harmful for the model to detect these semantic changes. To address this problem, we propose Contrastive Learning with semantIc Negative Examples (CLINE), which constructs semantic negative examples in an unsupervised manner to improve robustness under semantically adversarial attacks. By comparing with similar and opposite semantic examples, the model can effectively perceive the semantic changes caused by small perturbations. Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks. CLINE also ensures compactness within the same semantics and separability across different semantics at the sentence level.
See, Hear, Explore: curiosity via audio-visual association
To compute audio features, we take an audio clip spanning 4 time steps (1/15th of a second for these 60 frame per second environments) and apply a Fast Fourier Transform (FFT). The FFT output is downsampled using max pooling to a 512-dimensional feature vector, which is used as input to the discriminator along with a 512-dimensional visual feature vector.
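The FFT and max-pooling step can be sketched roughly as follows. The excerpt does not give the audio sample rate, whether magnitudes or another spectrum representation are pooled, or how the pooling windows are laid out, so the 44.1 kHz rate, the use of magnitudes, and the near-equal bin groups below are all assumptions; `audio_features` is a hypothetical name.

```python
import numpy as np

SAMPLE_RATE = 44_100        # assumed; not stated in the excerpt
FRAMES_PER_SECOND = 60      # from the text
CLIP_FRAMES = 4             # "4 time steps", read here as 4 frames


def audio_features(clip, out_dim=512):
    """FFT magnitudes of a mono clip, max-pooled down to out_dim values."""
    spectrum = np.abs(np.fft.rfft(clip))
    if spectrum.size < out_dim:
        raise ValueError("clip too short to pool into out_dim bins")
    # Start index of each pooling window; windows are near-equal in width.
    edges = np.linspace(0, spectrum.size, out_dim + 1).astype(int)
    # Max over spectrum[edges[i]:edges[i + 1]] for each window.
    return np.maximum.reduceat(spectrum, edges[:-1])


# Usage: a synthetic 440 Hz tone spanning 4 frames of a 60 fps environment.
n_samples = SAMPLE_RATE * CLIP_FRAMES // FRAMES_PER_SECOND
t = np.arange(n_samples) / SAMPLE_RATE
features = audio_features(np.sin(2 * np.pi * 440.0 * t))
```

The resulting vector has the same dimensionality as the 512-dimensional visual feature vector mentioned in the text, so the two could be passed to a discriminator side by side.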
Probing Pre-Trained Language Models for Disease Knowledge
Alghanmi, Israa, Espinosa-Anke, Luis, Schockaert, Steven
Pre-trained language models such as ClinicalBERT have achieved impressive results on tasks such as medical Natural Language Inference. At first glance, this may suggest that these models are able to perform medical reasoning tasks, such as mapping symptoms to diseases. However, we find that standard benchmarks such as MedNLI contain relatively few examples that require such forms of reasoning. To better understand the medical reasoning capabilities of existing language models, in this paper we introduce DisKnE, a new benchmark for Disease Knowledge Evaluation. To construct this benchmark, we annotated each positive MedNLI example with the types of medical reasoning that are needed. We then created negative examples by corrupting these positive examples in an adversarial way. Furthermore, we define training-test splits per disease, ensuring that no knowledge about test diseases can be learned from the training data, and we canonicalize the formulation of the hypotheses to avoid the presence of artefacts. This leads to a number of binary classification problems, one for each type of reasoning and each disease. When analysing pre-trained models for the clinical/biomedical domain on the proposed benchmark, we find that their performance drops considerably.
Gradual Domain Adaptation in the Wild: When Intermediate Distributions are Absent
Abnar, Samira, Berg, Rianne van den, Ghiasi, Golnaz, Dehghani, Mostafa, Kalchbrenner, Nal, Sedghi, Hanie
We focus on the problem of domain adaptation when the goal is shifting the model towards the target distribution, rather than learning domain invariant representations. It has been shown that under the following two assumptions: (a) access to samples from intermediate distributions, and (b) samples being annotated with the amount of change from the source distribution, self-training can be successfully applied on gradually shifted samples to adapt the model toward the target distribution. We hypothesize having (a) is enough to enable iterative self-training to slowly adapt the model to the target distribution, by making use of an implicit curriculum. In the case where (a) does not hold, we observe that iterative self-training falls short. We propose GIFT, a method that creates virtual samples from intermediate distributions by interpolating representations of examples from source and target domains. We evaluate an iterative-self-training method on datasets with natural distribution shifts, and show that when applied on top of other domain adaptation methods, it improves the performance of the model on the target dataset. We run an analysis on a synthetic dataset to show that in the presence of (a) iterative-self-training naturally forms a curriculum of samples. Furthermore, we show that when (a) does not hold, GIFT performs better than iterative self-training.
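The interpolation step behind the virtual samples can be sketched as a convex combination of source and target representations. The abstract does not say how source and target examples are paired or how the interpolation coefficient is scheduled, so the row-by-row pairing and the linear schedule below are assumptions, and `interpolate_representations` is a hypothetical name rather than the authors' API.

```python
import numpy as np


def interpolate_representations(z_source, z_target, lam):
    """Convex combination of paired representations.

    lam = 0 returns the source representations and lam = 1 the target ones;
    values in between stand in for samples from an intermediate distribution.
    """
    return (1.0 - lam) * z_source + lam * z_target


# Usage with random stand-in features (8 examples, 16 dimensions each),
# assuming row i of the source batch is paired with row i of the target batch.
rng = np.random.default_rng(0)
z_src = rng.normal(size=(8, 16))
z_tgt = rng.normal(loc=2.0, size=(8, 16))

# Assumed linear schedule moving gradually from the source toward the target.
virtual_batches = [
    interpolate_representations(z_src, z_tgt, lam)
    for lam in np.linspace(0.0, 1.0, 5)
]
```

In an iterative self-training loop, each batch in such a schedule would be pseudo-labelled by the current model before the next, more target-like batch is used.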
Multi-output Gaussian Processes for Uncertainty-aware Recommender Systems
Yang, Yinchong, Buettner, Florian
Recommender systems are often designed based on a collaborative filtering approach, where user preferences are predicted by modelling interactions between users and items. Many common approaches to solve the collaborative filtering task are based on learning representations of users and items, including simple matrix factorization, Gaussian process latent variable models, and neural-network based embeddings. While matrix factorization approaches fail to model nonlinear relations, neural networks can potentially capture such complex relations with unprecedented predictive power and are highly scalable. However, neither of them is able to model predictive uncertainties. In contrast, Gaussian Process based models can generate a predictive distribution, but cannot scale to large amounts of data.
A database describing such user-item interactions often takes the form of a matrix, where each entry describes the interaction between one user and one item. The overall rating or purchasing pattern of a user can therefore be described by the corresponding row in such a matrix. However, since there are typically large numbers of users and items in the database, and each user is usually only interested in a small subset of items, this user-item matrix is often large and sparse. It is therefore inefficient to define the similarity between users in the high dimensional feature space defined by all items. Instead, it is more advantageous to derive abstract feature vectors that represent users and items, which inspired a large variety of low-rank matrix decomposition models such as non-negative matrix decomposition [Zhang et al., 2006], biased matrix decomposition [Koren et al., 2009] and non-parametric decomposition [Yu et al., 2009]. These methods aim at learning low dimensional representations for all users and items, allowing for the prediction of the unobserved interaction between a new pair of user and
DAMSL: Domain Agnostic Meta Score-based Learning
Cai, John, Cai, Bill, Shen, Shengmei
In this paper, we propose Domain Agnostic Meta Score-based Learning (DAMSL), a novel, versatile and highly effective solution that delivers significant out-performance over state-of-the-art methods for cross-domain few-shot learning. We identify two key problems: previous meta-learning methods over-fit to the source domain, and previous transfer-learning methods under-utilize the structure of the support set. The core idea behind our method is that instead of directly using the scores from a fine-tuned feature encoder, we use these scores to create input coordinates for a domain agnostic metric space. A graph neural network is applied to learn an embedding and relation function over these coordinates to process all information contained in the score distribution of the support set. We test our model on both established CD-FSL benchmarks and new domains and show that our method overcomes the limitations of previous meta-learning and transfer-learning methods to deliver substantial improvements in accuracy across both smaller and larger domain shifts.
Statistical embedding: Beyond principal components
Tjøstheim, Dag, Jullum, Martin, Løland, Anders
There has been intense recent activity in the embedding of very high dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph based methods and kernel based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data set with tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as cluster and classification techniques. The final part of the survey deals with embedding in $\mathbb{R}^2$, which is visualization. Three methods are presented: $t$-SNE, UMAP and LargeVis based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity and with two types of nodes.