to

### #008 Shallow Neural Network - Master Data Science

In this post we will see how to vectorize across multiple training examples. The outcome will be similar to what we saw in Logistic Regression. These equations tell us how, when given an input feature vector $$x$$, we can generate predictions. If we have $$m$$ training examples we need to repeat this proces $$m$$ times. The notation $$a {[2](i)}$$ means that we are talking about activation in the second layer that comes from $$i {th}$$ training example.

### A Gentle Introduction to Vector Space Models

Vector space models are to consider the relationship between data that are represented by vectors. It is popular in information retrieval systems but also useful for other purposes. Generally, this allows us to compare the similarity of two vectors from a geometric perspective. In this tutorial, we will see what is a vector space model and what it can do. A Gentle Introduction to Vector Space Models Photo by liamfletch, some rights reserved.

### A Survey of Self-Supervised and Few-Shot Object Detection

Labeling data is often expensive and time-consuming, especially for tasks such as object detection and instance segmentation, which require dense labeling of the image. While few-shot object detection is about training a model on novel (unseen) object classes with little data, it still requires prior training on many labeled examples of base (seen) classes. On the other hand, self-supervised methods aim at learning representations from unlabeled data which transfer well to downstream tasks such as object detection. Combining few-shot and self-supervised object detection is a promising research direction. In this survey, we review and characterize the most recent approaches on few-shot and self-supervised object detection. Then, we give our main takeaways and discuss future research directions.

### Permute Me Softly: Learning Soft Permutations for Graph Representations

Graph neural networks (GNNs) have recently emerged as a dominant paradigm for machine learning with graphs. Research on GNNs has mainly focused on the family of message passing neural networks (MPNNs). Similar to the Weisfeiler-Leman (WL) test of isomorphism, these models follow an iterative neighborhood aggregation procedure to update vertex representations, and they next compute graph representations by aggregating the representations of the vertices. Although very successful, MPNNs have been studied intensively in the past few years. Thus, there is a need for novel architectures which will allow research in the field to break away from MPNNs. In this paper, we propose a new graph neural network model, so-called $\pi$-GNN which learns a "soft" permutation (i.e., doubly stochastic) matrix for each graph, and thus projects all graphs into a common vector space. The learned matrices impose a "soft" ordering on the vertices of the input graphs, and based on this ordering, the adjacency matrices are mapped into vectors. These vectors can be fed into fully-connected or convolutional layers to deal with supervised learning tasks. In case of large graphs, to make the model more efficient in terms of running time and memory, we further relax the doubly stochastic matrices to row stochastic matrices. We empirically evaluate the model on graph classification and graph regression datasets and show that it achieves performance competitive with state-of-the-art models.

### Turing approximations, toric isometric embeddings & manifold convolutions

Convolutions are fundamental elements in deep learning architectures. Here, we present a theoretical framework for combining extrinsic and intrinsic approaches to manifold convolution through isometric embeddings into tori. In this way, we define a convolution operator for a manifold of arbitrary topology and dimension. We also explain geometric and topological conditions that make some local definitions of convolutions which rely on translating filters along geodesic paths on a manifold, computationally intractable. A result of Alan Turing from 1938 underscores the need for such a toric isometric embedding approach to achieve a global definition of convolution on computable, finite metric space approximations to a smooth manifold.

### Dimension Reduction and Data Visualization for Fr\'echet Regression

With the rapid development of data collection techniques, complex data objects that are not in the Euclidean space are frequently encountered in new statistical applications. Fr\'echet regression model (Peterson & M\"uller 2019) provides a promising framework for regression analysis with metric space-valued responses. In this paper, we introduce a flexible sufficient dimension reduction (SDR) method for Fr\'echet regression to achieve two purposes: to mitigate the curse of dimensionality caused by high-dimensional predictors, and to provide a tool for data visualization for Fr\'echet regression. Our approach is flexible enough to turn any existing SDR method for Euclidean (X,Y) into one for Euclidean X and metric space-valued Y. The basic idea is to first map the metric-space valued random object $Y$ to a real-valued random variable $f(Y)$ using a class of functions, and then perform classical SDR to the transformed data. If the class of functions is sufficiently rich, then we are guaranteed to uncover the Fr\'echet SDR space. We showed that such a class, which we call an ensemble, can be generated by a universal kernel. We established the consistency and asymptotic convergence rate of the proposed methods. The finite-sample performance of the proposed methods is illustrated through simulation studies for several commonly encountered metric spaces that include Wasserstein space, the space of symmetric positive definite matrices, and the sphere. We illustrated the data visualization aspect of our method by exploring the human mortality distribution data across countries and by studying the distribution of hematoma density.

### Focused Contrastive Training for Test-based Constituency Analysis

We propose a scheme for self-training of grammaticality models for constituency analysis based on linguistic tests. A pre-trained language model is fine-tuned by contrastive estimation of grammatical sentences from a corpus, and ungrammatical sentences that were perturbed by a syntactic test, a transformation that is motivated by constituency theory. We show that consistent gains can be achieved if only certain positive instances are chosen for training, depending on whether they could be the result of a test transformation. This way, the positives, and negatives exhibit similar characteristics, which makes the objective more challenging for the language model, and also allows for additional markup that indicates the position of the test application within the sentence.

### How to Use Arabic Word2Vec Word Embedding with LSTM

Word embedding is the approach of learning word and their relative meanings from a corpus of text and representing the word as a dense vector. The word vector is the projection of the word into a continuous feature vector space, see Figure 1 (A) for clarity. Words that have similar meaning should be close together in the vector space as illustrated in see Figure 1 (B). Word2vec is one of the most popular words embedding in NLP. Word2vec has two types, Continuous Bag-of-Words Model (CBOW) and Continuous Skip-gram Model [3], the model architectures are shown in Figure 2. CBOW predicts the word according to the given context, where Skip-gram predicts the context according to the given word, which increases the computational complexity [3].

### Structured Prediction in NLP -- A survey

Over the last several years, the field of Structured prediction in NLP has had seen huge advancements with sophisticated probabilistic graphical models, energy-based networks, and its combination with deep learning-based approaches. This survey provides a brief of major techniques in structured prediction and its applications in the NLP domains like parsing, sequence labeling, text generation, and sequence to sequence tasks. We also deep-dived into energy-based and attention-based techniques in structured prediction, identified some relevant open issues and gaps in the current state-of-the-art research, and have come up with some detailed ideas for future research in these fields.

### College admissions scam case set for Sept. 8 trial in Boston

USC's Pat Haden and now two "Varsity Blues" defendants want to file briefs in the college admissions scam case under seal. What they want to share, they argue, is "sensitive, confidential, and personally identifiable information." Haden, the former athletic director at the University of Southern California, has filed a motion in federal court in Boston to "quash a trial subpoena for testimony issued by counsel for defendants," as the Herald has reported. He was just granted permission to state his case in private. Defendants Gamal Abdelaziz and John Wilson are seeking that same protection to keep their arguments out of the public eye -- for now.