While pretrained language models ("LM") have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse modelling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset, containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting, indicating limited generalisation capacity. Further results over a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.
Multidimensional scaling in networks allows for the discovery of latent information about their structure by embedding nodes in some feature space. Ideological scaling for users in social networks such as Twitter is an example, but similar settings can include diverse applications in other networks and even media platforms or e-commerce. A growing literature of ideology scaling methods in social networks restricts the scaling procedure to nodes that provide interpretability of the feature space: on Twitter, it is common to consider the sub-network of parliamentarians and their followers. This makes it possible to interpret the inferred latent features as indices for ideology-related concepts by inspecting the positions of members of parliament. While effective in inferring meaningful features, this approach is generally restricted to these sub-networks, limiting interesting applications such as country-wide measurement of polarization and its evolution. We propose two methods to propagate ideological features beyond these sub-networks: one based on homophily (linked users have similar ideology), and the other on structural similarity (nodes with similar neighborhoods have similar ideologies). In our methods, we leverage the concept of neighborhood ideological coherence as a parameter for propagation. Using Twitter data, we produce an ideological scaling for 370K users, and analyze the two families of propagation methods on a population of 6.5M users. We find that, when coherence is considered, the ideology of a user is better estimated from those with similar neighborhoods than from their immediate neighbors.
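The two propagation ideas named in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the toy graph, the one-dimensional scores, the use of Jaccard overlap as the structural similarity, and the use of the standard deviation of neighbor scores as a stand-in for "neighborhood ideological coherence" are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def known_neighbor_scores(node, neighbors, ideology):
    """Scores of the node's neighbors that already have an ideology value."""
    return [ideology[n] for n in neighbors[node] if n in ideology]


def homophily_estimate(node, neighbors, ideology):
    """Homophily: estimate a node's score as the mean of its scored neighbors."""
    known = known_neighbor_scores(node, neighbors, ideology)
    return mean(known) if known else None


def neighborhood_coherence(node, neighbors, ideology):
    """Assumed coherence proxy: spread of neighbor scores (lower = more coherent)."""
    known = known_neighbor_scores(node, neighbors, ideology)
    return pstdev(known) if known else None


def jaccard(a, b):
    """Jaccard overlap of two neighbor sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def structural_estimate(node, neighbors, ideology):
    """Structural similarity: similarity-weighted mean over all scored nodes."""
    num = den = 0.0
    for other, score in ideology.items():
        if other == node:
            continue
        w = jaccard(neighbors[node], neighbors.get(other, set()))
        num += w * score
        den += w
    return num / den if den > 0 else None


# Hypothetical toy network: "u" is unscored and follows two opposed accounts.
neighbors = {
    "u": {"a", "b"},
    "v": {"a", "b"},
    "w": {"c"},
    "a": {"u", "v"},
    "b": {"u", "v"},
    "c": {"w"},
}
ideology = {"a": -1.0, "b": 1.0, "v": 0.8, "w": -0.5}

h = homophily_estimate("u", neighbors, ideology)
s = structural_estimate("u", neighbors, ideology)
c = neighborhood_coherence("u", neighbors, ideology)
```

In this toy case the neighbors of "u" disagree strongly, so the homophily estimate averages out to the centre, while the structural estimate borrows the score of "v", the only scored node with the same neighborhood.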
Zhang, Daniel, Mishra, Saurabh, Brynjolfsson, Erik, Etchemendy, John, Ganguli, Deep, Grosz, Barbara, Lyons, Terah, Manyika, James, Niebles, Juan Carlos, Sellitto, Michael, Shoham, Yoav, Clark, Jack, Perrault, Raymond
Welcome to the fourth edition of the AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI. The report aims to be the most credible and authoritative source for data and insights about AI in the world.
To address the long-standing data sparsity problem in recommender systems (RSs), cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain. Although CDR has been extensively studied in recent years, a systematic review of the existing CDR approaches is lacking. To fill this gap, in this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and future directions. Specifically, we first summarize existing CDR approaches into four types: single-target CDR, multi-domain recommendation, dual-target CDR, and multi-target CDR. We then present the definitions and challenges of these CDR approaches. Next, we propose a full-view categorization and new taxonomies on these approaches and report their research progress in detail. Finally, we share several promising research directions in CDR.
Marketing is evolving day by day. The need to upgrade your marketing is greater now than ever. AI is now reshaping every industry out there, and marketing is no exception. Although it is a new trend and most organizations and marketers are not yet fully aware of it, a fair number of organizations have already started implementing it. So I thought I would give an overview of AI-powered marketing.
The combination of human and machine learning, wherever they complement one another, has a lot of potential applications in citizen science. Several projects have already integrated both forms of learning to perform data-centred tasks (Willi et al. 2019; Sullivan et al. 2018). While the term artificial intelligence (AI) is generally used to refer to any kind of machine or algorithm able to observe the environment, learn, and make decisions, the term machine learning (ML) has been defined 'as a subfield of artificial intelligence that includes software able to recognize patterns, make predictions, and apply newly discovered patterns to situations that were not included or covered by their initial design' (Popenici and Kerr 2017, p. 2). ML algorithms are currently the most widely used and applied, for example, in image and speech recognition, fraud detection, and reproducing human abilities in playing Go or driving cars. In scientific research, they find many applications in different fields such as biology, astronomy, and social sciences, just to mention a few (Jordan and Mitchell 2015).
With the recent advances of the Internet of Things (IoT), the increasing accessibility of ubiquitous computing resources and mobile devices, the prevalence of rich media content, and the ensuing social, economic, and cultural changes, computing technology and applications have evolved quickly over the past decade. They now go beyond personal computing, facilitating collaboration and social interactions in general, causing a quick proliferation of social relationships among IoT entities. The increasing number of these relationships and their heterogeneous social features have led to computing and communication bottlenecks, known as relationship explosion, that prevent the IoT network from taking advantage of these relationships to improve the offered services and customize the delivered content. On the other hand, the quick advances in artificial intelligence applications in social computing have led to the emergence of a promising research field known as Artificial Social Intelligence (ASI) that has the potential to tackle the social relationship explosion problem. This paper discusses the role of IoT in social relationship detection and management and the problem of social relationship explosion in IoT, and reviews the proposed solutions using ASI, including social-oriented machine-learning and deep-learning techniques.
Multi-label classification (MLC) is a generalization of standard classification where multiple labels may be assigned to a given sample. In the real world, it is more common to deal with noisy datasets than clean datasets, given how modern datasets are labeled by a large group of annotators on crowdsourcing platforms, but little attention has been given to evaluating multi-label classifiers with noisy labels. Exploiting label correlations has now become a standard component of a multi-label classifier to achieve competitive performance. However, this component makes the classifier more prone to poor generalization - it overfits labels as well as label dependencies. We identify three common real-world label noise scenarios and show how previous approaches perform poorly with noisy labels. To address this issue, we present a Context-Based Multi-Label Classifier (CbMLC) that effectively handles noisy labels when learning label dependencies, without requiring additional supervision. We compare CbMLC against other domain-specific state-of-the-art models on a variety of datasets, under both the clean and the noisy settings. We show CbMLC yields substantial improvements over the previous methods in most cases.
Sentiment analysis is a research topic focused on analysing data to extract information related to the sentiment that it causes. Applications of sentiment analysis are wide, ranging from recommendation systems and marketing to customer satisfaction. Recent approaches evaluate textual content using Machine Learning techniques that are trained over large corpora. However, as social media has grown, other data types have emerged in large quantities, such as images. Sentiment analysis in images has been shown to be a valuable complement to textual data since it enables the inference of the underlying message polarity by creating context and connections. Multimodal sentiment analysis approaches intend to leverage information from both textual and image content to perform an evaluation. Despite recent advances, current solutions still flounder in combining both image and textual information to classify social media data, mainly due to subjectivity, inter-class homogeneity and fusion data differences. In this paper, we propose a method that combines individual textual and image sentiment analyses into a final fused classification based on AutoML, which performs a random search to find the best model. Our method achieved state-of-the-art performance on the B-T4SA dataset, with 95.19% accuracy.
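As a rough illustration of fusing per-modality predictions with a random search, the sketch below searches over a single fusion weight for a weighted average of text and image class probabilities. This is a deliberately simplified stand-in, not the AutoML pipeline the abstract describes: the weighted-average fusion rule, the function names, and the toy probabilities are all assumptions.

```python
import random


def fuse(p_text, p_image, w):
    """Weighted average of two class-probability vectors (assumed fusion rule)."""
    return [w * t + (1.0 - w) * i for t, i in zip(p_text, p_image)]


def accuracy(w, samples):
    """Fraction of samples whose fused arg-max class matches the label."""
    correct = 0
    for p_text, p_image, label in samples:
        fused = fuse(p_text, p_image, w)
        pred = max(range(len(fused)), key=fused.__getitem__)
        correct += int(pred == label)
    return correct / len(samples)


def random_search(samples, trials=50, seed=0):
    """Sample fusion weights uniformly in [0, 1) and keep the best one seen."""
    rng = random.Random(seed)
    best_w, best_acc = 0.5, accuracy(0.5, samples)  # equal weighting as baseline
    for _ in range(trials):
        w = rng.random()
        acc = accuracy(w, samples)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical validation samples: (text probs, image probs, true class),
# with classes ordered as negative, neutral, positive.
samples = [
    ([0.7, 0.2, 0.1], [0.2, 0.3, 0.5], 0),
    ([0.1, 0.3, 0.6], [0.3, 0.4, 0.3], 2),
    ([0.4, 0.4, 0.2], [0.1, 0.8, 0.1], 1),
    ([0.3, 0.3, 0.4], [0.6, 0.3, 0.1], 0),
]

best_w, best_acc = random_search(samples)
```

In practice the weight would be chosen on a held-out validation split and the search space would cover model choices and hyperparameters rather than one scalar.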
Computer games represent an ideal research domain for the next generation of personalized digital applications. This paper presents a player-centered framework of AI for game personalization, complementary to the commonly used system-centered approaches. A significant amount of research has been devoted to automatic personalization in digital applications, especially in Internet applications. As the content of the Internet services grows, personalized applications such as recommendation systems help to mitigate information overload and decision fatigue. This body of work