Goto

Collaborating Authors

 word feature




Appendix A Data

Neural Information Processing Systems

Hz to remove contributions from electrical line noise and other very high frequency noise. We further refer to this as the 2v2 accuracy . Under this metric, chance performance is 50% . The next step is to train and evaluate the proposed models. The distance used in our experiments is cosine distance. This model has 12 layers, 12 attention heads, and 768 hidden units.



'Finally, a dating app feature I can get behind!' Singletons love Hinge's huge update which lets them automatically filter out time-wasters and creeps

Daily Mail - Science & tech

With thousands of potential matches, opening up any dating app can feel like wading through a sea of spam and unwanted texts. But now, Hinge has made it easier than ever to avoid time wasters and toxicity by letting users filter out unwanted terms. The new Hidden Words tool automatically blocks'Likes with Comments' containing words, phrases or even emojis, as chosen by the users. And from'Sunday Roast' to'F1', Hinge users have taken to social media to share the phrases and dating clichรฉs that they're sick of hearing about. One X, formerly Twitter user, wrote: 'Finally, a dating app feature I can get behind.'


Word Features for Latent Dirichlet Allocation

Neural Information Processing Systems

We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model.


MLANet: Multi-Level Attention Network with Sub-instruction for Continuous Vision-and-Language Navigation

arXiv.org Artificial Intelligence

Vision-and-Language Navigation (VLN) aims to develop intelligent agents to navigate in unseen environments only through language and vision supervision. In the recently proposed continuous settings (continuous VLN), the agent must act in a free 3D space and faces tougher challenges like real-time execution, complex instruction understanding, and long action sequence prediction. For a better performance in continuous VLN, we design a multi-level instruction understanding procedure and propose a novel model, Multi-Level Attention Network (MLANet). The first step of MLANet is to generate sub-instructions efficiently. We design a Fast Sub-instruction Algorithm (FSA) to segment the raw instruction into sub-instructions and generate a new sub-instruction dataset named ``FSASub". FSA is annotation-free and faster than the current method by 70 times, thus fitting the real-time requirement in continuous VLN. To solve the complex instruction understanding problem, MLANet needs a global perception of the instruction and observations. We propose a Multi-Level Attention (MLA) module to fuse vision, low-level semantics, and high-level semantics, which produce features containing a dynamic and global comprehension of the task. MLA also mitigates the adverse effects of noise words, thus ensuring a robust understanding of the instruction. To correctly predict actions in long trajectories, MLANet needs to focus on what sub-instruction is being executed every step. We propose a Peak Attention Loss (PAL) to improve the flexible and adaptive selection of the current sub-instruction. PAL benefits the navigation agent by concentrating its attention on the local information, thus helping the agent predict the most appropriate actions. We train and test MLANet in the standard benchmark. Experiment results show MLANet outperforms baselines by a significant margin.


Hierarchical Local-Global Transformer for Temporal Sentence Grounding

arXiv.org Artificial Intelligence

This paper studies the multimedia problem of temporal sentence grounding (TSG), which aims to accurately determine the specific video segment in an untrimmed video according to a given sentence query. Traditional TSG methods mainly follow the top-down or bottom-up framework and are not end-to-end. They severely rely on time-consuming post-processing to refine the grounding results. Recently, some transformer-based approaches are proposed to efficiently and effectively model the fine-grained semantic alignment between video and query. Although these methods achieve significant performance to some extent, they equally take frames of the video and words of the query as transformer input for correlating, failing to capture their different levels of granularity with distinct semantics. To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations. Specifically, we first split the video and query into individual clips and phrases to learn their local context (adjacent dependency) and global correlation (long-range dependency) via a temporal transformer. Then, a global-local transformer is introduced to learn the interactions between the local-level and global-level semantics for better multi-modal reasoning. Besides, we develop a new cross-modal cycle-consistency loss to enforce interaction between two modalities and encourage the semantic alignment between them. Finally, we design a brand-new cross-modal parallel transformer decoder to integrate the encoded visual and textual features for final grounding. Extensive experiments on three challenging datasets show that our proposed HLGT achieves a new state-of-the-art performance.


Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction

arXiv.org Artificial Intelligence

How meaning is represented in the brain is still one of the big open questions in neuroscience. Does a word (e.g., bird) always have the same representation, or does the task under which the word is processed alter its representation (answering "can you eat it?" versus "can it fly?")? The brain activity of subjects who read the same word while performing different semantic tasks has been shown to differ across tasks. However, it is still not understood how the task itself contributes to this difference. In the current work, we study Magnetoencephalography (MEG) brain recordings of participants tasked with answering questions about concrete nouns. We investigate the effect of the task (i.e. the question being asked) on the processing of the concrete noun by predicting the millisecond-resolution MEG recordings as a function of both the semantics of the noun and the task. Using this approach, we test several hypotheses about the task-stimulus interactions by comparing the zero-shot predictions made by these hypotheses for novel tasks and nouns not seen during training. We find that incorporating the task semantics significantly improves the prediction of MEG recordings, across participants. The improvement occurs 475-550ms after the participants first see the word, which corresponds to what is considered to be the ending time of semantic processing for a word. These results suggest that only the end of semantic processing of a word is task-dependent, and pose a challenge for future research to formulate new hypotheses for earlier task effects as a function of the task and stimuli.


Regularised Text Logistic Regression: Key Word Detection and Sentiment Classification for Online Reviews

arXiv.org Machine Learning

Online customer reviews have become important for managers and executives in the hospitality and catering industry who wish to obtain a comprehensive understanding of their customers' demands and expectations. We propose a Regularized Text Logistic (RTL) regression model to perform text analytics and sentiment classification on unstructured text data, which automatically identifies a set of statistically significant and operationally insightful word features, and achieves satisfactory predictive classification accuracy. We apply the RTL model to two online review datasets, Restaurant and Hotel, from TripAdvisor. Our results demonstrate satisfactory classification performance compared with alternative classifiers with a highest true positive rate of 94.9%. Moreover, RTL identifies a small set of word features, corresponding to 3% for Restaurant and 20% for Hotel, which boosts working efficiency by allowing managers to drill down into a much smaller set of important customer reviews. We also develop the consistency, sparsity and oracle property of the estimator.