Discourse-Aware Prompt Design for Text Generation

arXiv.org Machine Learning

Current efficient fine-tuning methods (e.g., adapters, prefix-tuning) have optimized conditional text generation via training a small set of extra parameters of the neural language model, while freezing the rest for efficiency. While showing strong performance on some generation tasks, they don't generalize across all generation tasks. In this work, we show that prompt-based conditional text generation can be improved with simple and efficient methods that simulate modeling the discourse structure of human-written text. We introduce two key design choices. First, we show that a higher-level discourse structure of human-written text can be modelled with hierarchical blocking on prefix parameters, which enables spanning different parts of the input and output text and yields more coherent output generations. Second, we propose sparse prefix tuning by introducing attention sparsity on the prefix parameters at different layers of the network and learning sparse transformations on the softmax function. We find that sparse attention enables prefix-tuning to better control the input contents (salient facts), yielding more efficient tuning of the prefix parameters. Experiments on a wide variety of text generation tasks show that structured design of prefix parameters can achieve comparable results to fine-tuning all parameters while outperforming standard prefix-tuning on all generation tasks, even in low-resource settings.
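To make the idea of a sparse alternative to softmax concrete, here is a minimal sketch of sparsemax, one well-known sparse transformation. The abstract does not say which sparse transformation the authors use, so the choice of sparsemax, the function name, and the example scores are all assumptions made purely for illustration.

```python
def sparsemax(scores):
    """Project raw attention scores onto the probability simplex.

    Unlike softmax, the result can contain exact zeros, so low-scoring
    positions receive no attention at all. Illustrative only: the abstract
    does not confirm that this is the transformation the paper uses.
    """
    sorted_scores = sorted(scores, reverse=True)
    cumulative = 0.0
    threshold = 0.0
    for k, value in enumerate(sorted_scores, start=1):
        cumulative += value
        # Keep extending the support while the k-th largest score stays
        # above the threshold implied by the first k scores.
        if 1.0 + k * value > cumulative:
            threshold = (cumulative - 1.0) / k
    return [max(s - threshold, 0.0) for s in scores]


# Hypothetical attention scores over three prefix positions.
weights = sparsemax([1.0, 0.8, 0.1])
print(weights)
```

In this sketch the lowest-scoring position receives a weight of exactly zero, whereas softmax would assign it a small positive weight.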


DEBACER: a method for slicing moderated debates

arXiv.org Artificial Intelligence

Subjects frequently change in moderated debates with several participants, such as in parliamentary sessions, electoral debates, and trials. Partitioning a debate into blocks with the same subject is essential for understanding. Often a moderator is responsible for defining when a new block begins, so the task of automatically partitioning a moderated debate can focus solely on the moderator's behavior. In this paper, we (i) propose a new algorithm, DEBACER, which partitions moderated debates; (ii) carry out a comparative study between conventional and BERTimbau pipelines; and (iii) validate DEBACER by applying it to the minutes of the Assembly of the Republic of Portugal. Our results show the effectiveness of DEBACER.


Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets

arXiv.org Artificial Intelligence

Visual search is an essential part of almost any everyday human goal-directed interaction with the environment. Nowadays, several algorithms are able to predict gaze positions during simple observation, but few models attempt to simulate human behavior during visual search in natural scenes. Furthermore, these models vary widely in their design and exhibit differences in the datasets and metrics with which they were evaluated. Thus, there is a need for a reference point, on which each model can be tested and from where potential improvements can be derived. In the present work, we select publicly available state-of-the-art visual search models in natural scenes and evaluate them on different datasets, employing the same metrics to estimate their efficiency and similarity with human subjects. In particular, we propose an improvement to the Ideal Bayesian Searcher through a combination with a neural network-based visual search model, enabling it to generalize to other datasets. The present work sheds light on the limitations of current models and how potential improvements can be accomplished by combining approaches. Moreover, it moves forward on providing a solution for the urgent need for benchmarking data and metrics to support the development of more general human visual search computational models.


PACMAN: PAC-style bounds accounting for the Mismatch between Accuracy and Negative log-loss

arXiv.org Machine Learning

The ultimate performance of machine learning algorithms for classification tasks is usually measured in terms of the empirical error probability (or accuracy) based on a testing dataset. These algorithms, however, are optimized through the minimization of a typically different, more convenient loss function based on a training set. For classification tasks, this loss function is often the negative log-loss, which leads to the well-known cross-entropy risk and is typically better behaved (from a numerical perspective) than the error probability. Conventional studies on the generalization error do not usually take into account the underlying mismatch between losses at the training and testing phases. In this work, we introduce an analysis based on a point-wise PAC approach over the generalization gap, considering the mismatch between testing based on the accuracy metric and training on the negative log-loss. We label this analysis PACMAN. Building on the fact that the mentioned mismatch can be written as a likelihood ratio, concentration inequalities can be used to provide some insights into the generalization problem in terms of point-wise PAC bounds that depend on meaningful information-theoretic quantities. An analysis of the obtained bounds and a comparison with available results in the literature are also provided.
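The mismatch between accuracy and negative log-loss can be seen in a small toy example. The sketch below is not taken from the paper: the two sets of predicted probabilities, the 0.5 decision threshold, and the function names are hypothetical, chosen only to show that the two metrics can rank the same pair of models in opposite orders.

```python
import math


def accuracy(probs, labels):
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)


def neg_log_loss(probs, labels):
    """Mean negative log-likelihood (binary cross-entropy)."""
    total = sum(math.log(p if y else 1.0 - p) for p, y in zip(probs, labels))
    return -total / len(labels)


# Hypothetical predicted probabilities of the positive class.
labels = [1, 1, 1, 1]
model_a = [0.51, 0.51, 0.51, 0.51]  # always right, but barely confident
model_b = [0.99, 0.99, 0.99, 0.45]  # one mistake, otherwise very confident

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, accuracy(probs, labels), round(neg_log_loss(probs, labels), 3))
```

Here model A has the higher accuracy while model B has the lower negative log-loss, so selecting a model by training loss and evaluating it by accuracy can give conflicting answers.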


Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures

arXiv.org Machine Learning

We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most $k^{-C}$ for a large enough constant $C\ge 1$. Previous statistical-query lower bounds [DKS17] give formal evidence that even distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in $k$). We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small, and that for polynomially lower bounded mixing weights non-trivial algorithmic guarantees are possible in quasi-polynomial time. Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of $k\ge 2$ well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates a pair of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component. For the special case of collinear means, our algorithm outputs a $k$-clustering of the input sample that is approximately consistent with the components of the mixture. A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers, unlike most previous results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights. A key technical ingredient is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.


Neural Attention Models in Deep Learning: Survey and Taxonomy

arXiv.org Artificial Intelligence

Attention is a state of arousal capable of dealing with limited processing bottlenecks in human beings by focusing selectively on one piece of information while ignoring other perceptible information. For decades, concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. Currently, this property has been widely explored in deep neural networks. Many different neural attention models are now available, and the topic has been a very active research area over the past six years. From the theoretical standpoint of attention, this survey provides a critical analysis of major neural attention models. Here we propose a taxonomy that is consistent with theoretical aspects that predate Deep Learning. Our taxonomy provides an organizational structure that raises new questions and structures the understanding of existing attentional mechanisms. In particular, 17 criteria derived from classic studies in psychology and neuroscience are formulated for qualitative comparison and critical analysis of the 51 main models found in a set of more than 650 papers analyzed. We also highlight several theoretical issues that have not yet been explored, including discussions about biological plausibility, highlight current research trends, and provide insights for the future.


Researchers explain why they believe Facebook mishandles political ads

NPR Technology

Facebook has worked for years to revamp its handling of political ads -- but researchers who conducted a comprehensive audit of millions of ads say the social media company's efforts have had uneven results. The problems, they say, include overcounting political ads in the U.S. -- and undercounting them in other countries. And despite Facebook's ban on political ads around the time of last year's U.S. elections, the platform allowed more than 70,000 political ads to run anyway, according to the research team that is based at the NYU Cybersecurity for Democracy and at the Belgian university KU Leuven. Their research study was released early Thursday. They also plan to present their findings at a security conference next August.


Artificial intelligence -- our next HR?

#artificialintelligence

A brief list of our favorite sourcing and recruiting tools, including those that are based on self-learning algorithms. Along with other parts of the business, human resources have also been digitized since the pandemic started. It became obvious that you could not organize in-person interviews with potential candidates, and that many of them had left crowded, expensive cities and gone home to their native towns or countries. Travel is now open again, but that does not guarantee that one can hire new developers or marketers in the traditional way, as was possible before COVID-19.


Edge Artificial Intelligence Market Research Report by Processor, by Component, by Source, by End-Use, by Application, by Region - Global Forecast to 2026 - Cumulative Impact of COVID-19

#artificialintelligence

The Global Edge Artificial Intelligence Market size was estimated at USD 572.00 million in 2020 and is expected to reach USD 701.73 million in 2021, growing at a CAGR of 23.35% to reach USD 2,014.99 million by 2026. Market Statistics: The report provides market sizing and forecast across five major currencies: USD, EUR, GBP, JPY, and AUD. It helps organization leaders make better decisions when currency exchange data is readily available. In this report, the years 2018 and 2019 are considered historical years, 2020 the base year, 2021 the estimated year, and the years from 2022 to 2026 the forecast period. Market Segmentation & Coverage: This research report categorizes the Edge Artificial Intelligence market to forecast the revenues and analyze the trends in each of the following sub-markets: Based on Processor, the market was studied across ASIC, CPU, and GPU.
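As a quick check on the figures quoted above, the compound annual growth rate can be recomputed from the start and end values. This is a minimal sketch using the standard CAGR formula; the function name is hypothetical, and the choice of 2020 as the starting year (six compounding periods to 2026) is an assumption based on the report calling 2020 its base year.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures in USD million, as quoted in the report summary.
base_2020 = 572.00
forecast_2026 = 2014.99

# Assumption: growth is measured from the 2020 base year, i.e. six periods.
rate = cagr(base_2020, forecast_2026, years=6)
print(f"Implied CAGR 2020-2026: {rate:.2%}")
```

Under that assumption the implied rate comes out close to the quoted 23.35%, which suggests the CAGR is measured from the 2020 base year rather than from the 2021 estimate.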


SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

arXiv.org Artificial Intelligence

Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding not only of the computer vision and natural language sides individually, but more importantly of the interplay between the two. Adversarial vulnerability has been well recognized as a critical security issue of deep neural network models, and it requires prudent investigation. Although adversarial robustness has been studied extensively, and separately, in video and language tasks, the current understanding of it in vision-language joint tasks like NLVL is less developed. This paper therefore aims to comprehensively investigate the adversarial robustness of NLVL models by examining three facets of vulnerabilities from both attack and defense aspects. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides.