South America
Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities
Goertzel, Ben, Madrigal, Andres Suarez, Yu, Gino
A novel approach to automated learning of syntactic rules governing natural languages is proposed, based on using probabilities assigned to sentences (and potentially longer word sequences) by transformer neural network language models to guide symbolic learning processes like clustering and rule induction. This method exploits the learned linguistic knowledge in transformers, without any reference to their inner representations; hence, the technique is readily adaptable to the continuous appearance of more powerful language models. We show a proof-of-concept example of our proposed technique, using it to guide unsupervised symbolic link-grammar induction methods drawn from our prior research.
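As a minimal sketch of the idea of using only sequence probabilities (never a model's internal representations) to guide symbolic learning, the example below ranks candidate word sequences by log-probability. A tiny add-one-smoothed bigram model stands in for the transformer; this is an assumption made to keep the example self-contained. The names `train_bigram`, `sequence_log_prob` and `rank_candidates` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences.

    Stand-in for a transformer language model (assumption for illustration).
    """
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                 # contexts only
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams, len(vocab)


def sequence_log_prob(sentence, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def rank_candidates(candidates, model):
    """Order candidate sentences from most to least probable."""
    return sorted(candidates, key=lambda s: sequence_log_prob(s, model), reverse=True)


model = train_bigram(["the cat sat", "the dog sat", "a cat ran"])
ranked = rank_candidates(["sat the cat", "the cat sat"], model)
```

Because only the scoring function is consulted, a stronger language model could in principle be swapped in by replacing `sequence_log_prob`, which is the adaptability the abstract points to.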
Do All Good Actors Look The Same? Exploring News Veracity Detection Across The U.S. and The U.K
Horne, Benjamin D., Gruppi, Maurício, Adalı, Sibel
A major concern with text-based news veracity detection methods is that they may not generalize across countries and cultures. In this short paper, we explicitly test news veracity models across news data from the United States and the United Kingdom, demonstrating that there is reason for concern about generalizability. Through a series of testing scenarios, we show that text-based classifiers perform poorly when trained on one country's news data and tested on another. Furthermore, these same models have trouble classifying unseen, unreliable news sources. In conclusion, we discuss implications of these results and avenues for future work.
Attention: to Better Stand on the Shoulders of Giants
Yuan, Sha, Shao, Zhou, Zhang, Yu, Wei, Xingxing, Xiao, Tong, Wang, Yifan, Tang, Jie
Science of science (SciSci) is an emerging discipline wherein science is used to study the structure and evolution of science itself using large data sets. The increasing availability of digital data on scholarly outcomes offers unprecedented opportunities to explore SciSci. In the progress of science, previously discovered knowledge principally inspires new scientific ideas, and citation is a reasonably good reflection of this cumulative nature of scientific research. Research that chooses potentially influential references will have a lead over emerging publications. Although the peer review process is the main reliable way of predicting a paper's future impact, the ability to foresee lasting impact based on citation records is increasingly essential in scientific impact analysis in the era of big data. This paper develops an attention mechanism for long-term scientific impact prediction and validates the method on a real large-scale citation data set. The results break conventional thinking. Instead of accurately simulating the original power-law distribution, emphasizing limited attention can better stand on the shoulders of giants.
Road Segmentation on low resolution Lidar point clouds for autonomous vehicles
Gigli, Leonardo, Kiran, B Ravi, Paul, Thomas, Serna, Andres, Vemuri, Nagarjuna, Marcotegui, Beatriz, Velasco-Forero, Santiago
Point cloud datasets for perception tasks in the context of autonomous driving often rely on high-resolution 64-layer Light Detection and Ranging (LIDAR) scanners. These are expensive to deploy on real-world autonomous driving sensor architectures, which usually employ 16/32-layer LIDARs. We evaluate the effect of subsampling image-based representations of dense point clouds on the accuracy of the road segmentation task. In our experiments, the low-resolution 16/32-layer LIDAR point clouds are simulated by subsampling the original 64-layer data, for subsequent transformation into a feature map in the Bird-Eye-View (BEV) and Spherical-View (SV) representations of the point cloud. We introduce the usage of the local normal vector with the LIDAR's spherical coordinates as an input channel to existing LoDNN architectures. We demonstrate that this local normal feature, in conjunction with classical features, not only improves performance for binary road segmentation on full-resolution point clouds, but also reduces the negative impact on accuracy when subsampling dense point clouds, compared to the usage of classical features alone. We assess our method with several experiments on two datasets: the KITTI Road-segmentation benchmark and the recently released Semantic KITTI dataset.
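The simulation of 16/32-layer scans by subsampling 64-layer data could be sketched as follows. This assumes each point carries the index of the laser ring that produced it and that layers are dropped uniformly; the abstract does not state the exact scheme, so both are assumptions. The function name `subsample_layers` and the `(x, y, z, ring)` tuple layout are hypothetical.

```python
def subsample_layers(points, source_layers=64, target_layers=16):
    """Keep only points from every k-th laser ring.

    points: iterable of (x, y, z, ring) tuples, with ring in [0, source_layers).
    Uniform ring selection is an assumption, not the paper's stated method.
    """
    if target_layers <= 0 or source_layers % target_layers != 0:
        raise ValueError("source_layers must be a positive multiple of target_layers")
    step = source_layers // target_layers
    return [p for p in points if p[3] % step == 0]


# One synthetic point per ring, just to exercise the function.
cloud = [(float(r), 0.0, 0.0, r) for r in range(64)]
cloud_32 = subsample_layers(cloud, 64, 32)
cloud_16 = subsample_layers(cloud, 64, 16)
```

The reduced clouds would then be projected to the BEV or SV feature maps in the same way as the full-resolution data.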
Comparing BERT against traditional machine learning text classification
González-Carvajal, Santiago, Garrido-Merchán, Eduardo C.
The BERT model has arisen in recent years as a popular state-of-the-art machine learning model that is able to cope with multiple NLP tasks, such as supervised text classification, without human supervision. Its flexibility to cope with any type of corpus while delivering great results has made this approach very popular not only in academia but also in industry. However, many different approaches have been used successfully over the years. In this work, we first present BERT and include a short review of classical NLP approaches. Then, through a suite of experiments covering different scenarios, we empirically test the behaviour of BERT against the traditional TF-IDF vocabulary fed to machine learning algorithms. The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default on NLP tasks. Experiments show the superiority of BERT and its independence from features of the NLP problem, such as the language of the text, adding empirical evidence for using BERT as a default technique in NLP problems.
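For readers unfamiliar with the TF-IDF baseline mentioned above, here is a minimal sketch of one common weighting variant: term frequency normalized by document length, multiplied by the unsmoothed logarithm of inverse document frequency. The abstract does not say which variant or library the authors used, so this formula and the function name `tfidf` are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = (count / document length) * log(N / document frequency).
    This is one of several TF-IDF variants (assumption for illustration).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Number of documents in which each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


weights = tfidf(["the cat sat", "the dog barked"])
```

Terms that occur in every document receive weight zero under this variant, which is why such vectors emphasize distinctive vocabulary before being passed to a classifier.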
How to reverse-engineer a rainforest
But 2019 was the year the earth burned. In Australia, the world watched in horror as bushfires destroyed 10.3 million hectares, marking the continent's most intense and destructive fire season in over 40 years. Earlier that fall, California saw more than 101,000 hectares destroyed, with damages upward of $80 billion. Alaska saw nearly a million. Record-breaking fires also hit Indonesia, Russia, Lebanon -- but nowhere saw the sheer mass of media coverage as the fires that tore through the Amazon nearly all last summer. By year's end, thousands of global media outlets had reported that Brazil's largest rainforest played host to more than 80,000 individual forest fires in 2019, resulting in an estimated 906,000 hectares of environmental destruction. At the time, Brazil's National Institute for Space Research reported it was the fastest rate of burning since record keeping began in 2013. But amid the charred ruins of one of the largest oxygen-producing environments on the planet, a secret lies buried beneath the soil.
Contributing a New Large Dataset for SARS-CoV-2 Identification via CT Scan
You can find the dataset on Kaggle. For the detailed paper, please visit medRxiv. In December 2019, an outbreak of coronavirus (SARS-CoV-2) infection began in Wuhan, the capital of central China's Hubei province. On Jan 30, 2020, the World Health Organization (WHO) declared a global health emergency. By May 9, 2020, over 4 million officially confirmed cases had been reported in practically every corner of the Earth, with 275,976 officially reported deaths documented.
Degree-Aware Alignment for Entities in Tail
Zeng, Weixin, Zhao, Xiang, Wang, Wei, Tang, Jiuyang, Tan, Zhen
Entity alignment (EA) aims to discover equivalent entities in knowledge graphs (KGs), which bridges heterogeneous sources of information and facilitates the integration of knowledge. Existing EA solutions mainly rely on structural information to align entities, typically through KG embedding. Nonetheless, in real-life KGs, only a few entities are densely connected to others, and the remaining majority possess a rather sparse neighborhood structure. We refer to the latter as long-tail entities, and observe that this phenomenon arguably limits the use of structural information for EA. To mitigate the issue, we revisit and investigate the conventional EA pipeline in pursuit of elegant performance. For pre-alignment, we propose to amplify long-tail entities, which have relatively weak structural information, with entity name information that is generally available (but overlooked), in the form of concatenated power mean word embeddings. For alignment, under a novel complementary framework of consolidating structural and name signals, we identify an entity's degree as important guidance for effectively fusing the two different sources of information. To this end, a degree-aware co-attention network is conceived, which dynamically adjusts the significance of features in a degree-aware manner. For post-alignment, we propose to complement the original KGs with facts from their counterparts by using confident EA results as anchors via iterative training. Comprehensive experimental evaluations validate the superiority of our proposed techniques.
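The concatenated power mean word embedding used for entity names can be illustrated with a short sketch. It combines the element-wise mean (p = 1), minimum (p = -inf) and maximum (p = +inf) of the word vectors in a name into one longer vector. The choice of these three powers is an assumption, since the abstract does not list them, and the function names are hypothetical.

```python
def power_mean(vectors, p):
    """Element-wise power mean of equal-length word vectors.

    Only p = 1 (arithmetic mean) and p = +/-inf (max / min) are handled;
    restricting to these values is an assumption for illustration.
    """
    dims = list(zip(*vectors))  # one tuple of values per embedding dimension
    if p == float("inf"):
        return [max(d) for d in dims]
    if p == float("-inf"):
        return [min(d) for d in dims]
    if p == 1:
        return [sum(d) / len(d) for d in dims]
    raise ValueError("unsupported power: %r" % (p,))


def concatenated_power_mean(vectors, powers=(1, float("-inf"), float("inf"))):
    """Concatenate the power means for each requested p into one name embedding."""
    embedding = []
    for p in powers:
        embedding.extend(power_mean(vectors, p))
    return embedding


# Toy 2-dimensional vectors for the two words of an entity name.
name_embedding = concatenated_power_mean([[1.0, 4.0], [3.0, 2.0]])
```

The result has length `len(powers)` times the word-vector dimension, so names of any length map to a fixed-size representation that can be fused with structural embeddings.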
DJEnsemble: On the Selection of a Disjoint Ensemble of Deep Learning Black-Box Spatio-temporal Models
Souto, Yania Molina, Pereira, Rafael, Zorrilla, Rocío, Chaves, Anderson, Tsan, Brian, Rusu, Florin, Ogasawara, Eduardo, Ziviani, Artur, Porto, Fabio
In this paper, we present a cost-based approach for the automatic selection and allocation of a disjoint ensemble of black-box predictors to answer predictive spatio-temporal queries. Our approach is divided into two parts -- offline and online. During the offline part, we preprocess the predictive domain data -- transforming it into a regular grid -- and the black-box models -- computing their spatio-temporal learning function. In the online part, we compute a DJEnsemble plan which minimizes a multivariate cost function based on estimates for the prediction error and the execution cost -- producing a model spatial allocation matrix -- and run the optimal ensemble plan. We conduct a set of extensive experiments that evaluate the DJEnsemble approach and highlight its efficiency. We show that our cost model produces plans with performance close to the actual best plan. When compared against the traditional ensemble approach, DJEnsemble achieves up to $4X$ improvement in execution time and almost $9X$ improvement in prediction accuracy. To the best of our knowledge, this is the first work to solve the problem of optimizing the allocation of black-box models to answer predictive spatio-temporal queries.
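To make the notion of a cost-based model allocation concrete, the sketch below assigns one model to each grid tile by minimizing a weighted sum of estimated prediction error and execution cost. The weighted-sum form, the greedy per-tile choice, the `error_weight` parameter and the function name `allocate_models` are all assumptions for illustration; the abstract does not specify the actual cost function or planning algorithm.

```python
def allocate_models(error_estimates, execution_costs, error_weight=0.5):
    """Pick, for each tile, the model with the lowest combined cost.

    error_estimates: {model_name: [estimated error per tile]}
    execution_costs: {model_name: cost of running that model}
    Returns a list with the chosen model name for each tile.
    The cost formula below is hypothetical, not the paper's.
    """
    n_tiles = len(next(iter(error_estimates.values())))
    allocation = []
    for tile in range(n_tiles):
        def combined_cost(model):
            return (error_weight * error_estimates[model][tile]
                    + (1.0 - error_weight) * execution_costs[model])
        allocation.append(min(error_estimates, key=combined_cost))
    return allocation


plan = allocate_models(
    {"model_a": [0.1, 0.9], "model_b": [0.8, 0.2]},
    {"model_a": 1.0, "model_b": 1.0},
)
```

Since each tile receives exactly one model, the resulting list plays the role of a (flattened) allocation matrix for a disjoint ensemble in this toy setting.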
The Ten Most Dangerous Roads In The World, And How Self-Driving Cars Would Fare
Will self-driving cars be able to cope with highly dangerous roads? Let's talk about dangerous roads. In a moment, I'll provide you with a recently published list of the presumed Top Ten most dangerous roads in the world. For some of you, the odds are that you'll be happy that you've never had a cause to try and traverse these bad-to-the-bone roads, while others of you are probably going to put these alarming roads on your bucket list of places you have to go and give a whirl someday. Do you prefer roads that are calm, easy to navigate, and present little or no qualms?