AITopics

Can I Solve It? Identifying APIs Required to Complete OSS Task

Santos, Fabio, Wiese, Igor, Trinkenreich, Bianca, Steinmacher, Igor, Sarma, Anita, Gerosa, Marco

Open Source Software projects add labels to open issues to help contributors choose tasks. However, manually labeling issues is time-consuming and error-prone. Current automatic approaches for creating labels are mostly limited to classifying issues as a bug/non-bug. In this paper, we investigate the feasibility and relevance of labeling issues with the domain of the APIs required to complete the tasks. We leverage the issues' description and the project history to build prediction models, which resulted in precision up to 82% and recall up to 97.8%. We also ran a user study (n=74) to assess these labels' relevancy to potential contributors. The results show that the labels were useful to participants in choosing tasks, and the API-domain labels were selected more often than the existing architecture-based labels. Our results can inspire the creation of tools to automatically label issues, helping developers to find tasks that better match their skills.
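Because an issue can carry several API-domain labels at once, the precision and recall figures above are multi-label metrics. The following is a minimal sketch of how such scores can be computed. The micro-averaging scheme, the function name, and the example labels are all assumptions for illustration; the abstract does not say which averaging the authors used.

```python
def micro_precision_recall(true_labels, predicted_labels):
    """Micro-averaged precision and recall over parallel lists of label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)   # labels predicted and actually required
        fp += len(pred - truth)   # labels predicted but not required
        fn += len(truth - pred)   # labels required but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical API-domain labels for three issues (illustrative only).
truth = [{"UI", "IO"}, {"Network"}, {"DB", "IO"}]
pred = [{"UI"}, {"Network", "IO"}, {"DB", "IO"}]
precision, recall = micro_precision_recall(truth, pred)
```

In this toy example four of the five predicted labels are correct and four of the five required labels are found, so both scores come out at 0.8.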

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.1109/MSR52588.2021.00047

2103.12653

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Brazil > Paraná (0.04)
North America > United States > Oregon (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Actionable Cognitive Twins for Decision Making in Manufacturing

Rožanec, Jože M., Lu, Jinzhi, Rupnik, Jan, Škrjanc, Maja, Mladenić, Dunja, Fortuna, Blaž, Zheng, Xiaochen, Kiritsis, Dimitris

Actionable Cognitive Twins are the next-generation Digital Twins enhanced with cognitive capabilities through a knowledge graph and artificial intelligence models that provide insights and decision-making options to the users. The knowledge graph describes the domain-specific knowledge regarding entities and interrelationships related to a manufacturing setting. It also contains information on possible decision-making options that can assist decision-makers, such as planners or logisticians. In this paper, we propose a knowledge graph modeling approach to construct actionable cognitive twins for capturing specific knowledge related to demand forecasting and production planning in a manufacturing plant. The knowledge graph provides semantic descriptions and contextualization of the production lines and processes, including data identification and simulation or artificial intelligence algorithms and forecasts used to support them. Such semantics provide grounds for inference, relating different knowledge types: creative, deductive, definitional, and inductive. To develop knowledge graph models that describe the use case completely, a systems thinking approach is proposed to design and verify the ontology, develop a knowledge graph, and build an actionable cognitive twin. Finally, we evaluate our approach in two use cases developed for a European original equipment manufacturer related to the automotive industry as part of the European Horizon 2020 project FACTLOG.

digital twin, knowledge graph, ontology, (10 more...)

2103.12854

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.88)

What is the Vocabulary of Flaky Tests? An Extended Replication

Camara, B. H. P., Silva, M. A. G., Endo, A. T., Vergilio, S. R.

Software systems have been continuously evolved and delivered with high quality thanks to the widespread adoption of automated tests. A recurring issue hurting this scenario is the presence of flaky tests: test cases that may pass or fail non-deterministically. A promising approach, though one still lacking empirical evidence, is to collect static data from automated tests and use it to predict their flakiness. In this paper, we conducted an empirical study to assess the use of code identifiers to predict test flakiness. To do so, we first replicate most parts of the previous study of Pinto et al. (MSR 2020). This replication was extended by using a different ML Python platform (Scikit-learn) and adding different learning algorithms in the analyses. Then, we validated the performance of trained models using datasets with other flaky tests and from different projects. We successfully replicated the results of Pinto et al. (2020), with minor differences using Scikit-learn; different algorithms had performance similar to the ones used previously. Concerning the validation, we noticed that the recall of the trained models was smaller, and classifiers presented a varying range of decreases. This was observed in both intra-project and inter-project test flakiness prediction.
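To make the idea of predicting flakiness from code identifiers more concrete, the sketch below turns the source of a test into a bag of identifier tokens that a classifier could consume as features. It is only an illustration: the function name, the splitting rules, and the sample test are hypothetical, and the abstract does not describe the tokenization used in the original study or in the replication.

```python
import re
from collections import Counter


def identifier_tokens(source):
    """Split identifiers found in test source code into lowercase word tokens."""
    tokens = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        # Break snake_case first, then camelCase, acronyms, and digit runs.
        for part in identifier.split("_"):
            words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
            tokens.extend(word.lower() for word in words)
    return tokens


# Hypothetical test method used only to exercise the tokenizer.
sample = "public void testAsyncJobTimeout() { Thread.sleep(100); job_queue.await(); }"
tokens = identifier_tokens(sample)
bag_of_words = Counter(tokens)  # token counts, usable as classifier features
```

Counts such as these could then be vectorized and passed to any standard learning algorithm; which tokens actually signal flakiness is an empirical question that the study addresses.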

classifier, flaky test, original study, (16 more...)

2103.1267

Country:

North America > United States > New York > New York County > New York City (0.04)
South America > Brazil > Paraná > Curitiba (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Multilingual Autoregressive Entity Linking

De Cao, Nicola, Wu, Ledell, Popat, Kashyap, Artetxe, Mikel, Goyal, Naman, Plekhanov, Mikhail, Zettlemoyer, Luke, Cancedda, Nicola, Riedel, Sebastian, Petroni, Fabio

We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode mention string and entity names to capture more interactions than the standard dot product between mention and entity vectors. It also enables fast search within a large KB even for mentions that do not appear in mention tables and with no need for large-scale vector indices. While prior MEL works use a single representation for each entity, we match against entity names of as many languages as possible, which allows exploiting language connections between source input and target name. Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time. This leads to over 50% improvements in average accuracy. We show the efficacy of our approach through extensive evaluation including experiments on three popular MEL benchmarks where mGENRE establishes new state-of-the-art results. Code and pre-trained models at https://github.com/facebookresearch/GENRE.
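The marginalization over the target language can be illustrated with a small sketch. Assuming the model assigns a log-probability to each (entity, language) name for a given mention, the score of an entity is the sum of those probabilities across languages. The function name, the entity identifiers, and the scores below are hypothetical stand-ins, not actual mGENRE output.

```python
import math
from collections import defaultdict


def marginalize_over_languages(name_log_probs):
    """Sum the probability of each entity's names over all languages."""
    totals = defaultdict(float)
    for (entity_id, _language), log_prob in name_log_probs.items():
        totals[entity_id] += math.exp(log_prob)
    return dict(totals)


# Hypothetical scores for one mention: (entity, language) -> log-probability.
scores = {
    ("E1", "de"): math.log(0.30),
    ("E1", "en"): math.log(0.25),
    ("E2", "de"): math.log(0.40),
}
entity_probs = marginalize_over_languages(scores)
best = max(entity_probs, key=entity_probs.get)
```

Here E2 has the single most likely name, but E1 wins once its probability mass is pooled across languages, which is the effect marginalization is meant to capture.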

computational linguistic, mgenre, proceedings, (14 more...)

2103.12528

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > The Bahamas (0.14)
North America > Canada > Quebec > Montreal (0.04)
(17 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

The Hammer and the Nut: Is Bilevel Optimization Really Needed to Poison Linear Classifiers?

Cinà, Antonio Emanuele, Vascon, Sebastiano, Demontis, Ambra, Biggio, Battista, Roli, Fabio, Pelillo, Marcello

One of the most concerning threats for modern AI systems is data poisoning, where the attacker injects maliciously crafted training data to corrupt the system's behavior at test time. Availability poisoning is a particularly worrisome subset of poisoning attacks where the attacker aims to cause a Denial-of-Service (DoS) attack. However, the state-of-the-art algorithms are computationally expensive because they try to solve a complex bi-level optimization problem (the "hammer"). We observed that in particular conditions, namely, where the target model is linear (the "nut"), the use of computationally costly procedures can be avoided. We propose a counter-intuitive but efficient heuristic that allows contaminating the training set such that the target system's performance is highly compromised. We further suggest a re-parameterization trick to decrease the number of variables to be optimized. Finally, we demonstrate that, under the considered settings, our framework achieves comparable, or even better, performance in terms of the attacker's objective while being significantly more computationally efficient.

algorithm, attacker, poisoning, (17 more...)

2103.12399

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Italy > Sardinia > Cagliari (0.05)
(16 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

#artificialintelligence, Mar-22-2021, 13:43:20 GMT

Spotting UFOs: Do-it-yourself sky surveillance comes online

If you are perplexed, befuddled and bewildered about reports of unidentified aerial phenomena (UAP) and possible visitations of alien spacecraft, you can take action with do-it-yourself sky-monitoring gear. Given the low cost and high capability of today's consumer-grade technology, you too can be at the ready to document out-of-the-ordinary events. Enter the world of Sky Hub, a global network of smart sensors designed to snag digital signatures of anomalous events. These advancements can be harnessed to probe the ongoing, baffling behavior of UAP and unidentified flying objects (UFOs) reportedly crisscrossing the skies. The mission of Sky Hub is clearcut: Connect a network of civilian-owned sensor arrays, use machine learning to catalogue anomalous events, and share this data with researchers.

cogswell, sky hub, ufo, (14 more...)

Country:

South America > Brazil (0.05)
North America > United States > Kansas (0.05)
Europe > United Kingdom (0.05)

Industry:

Government > Military (0.73)
Government > Regional Government > North America Government > United States Government (0.50)
Transportation > Air (0.49)

Technology: Information Technology > Artificial Intelligence (0.70)

arXiv.org Artificial Intelligence, Mar-22-2021

Cooperative Learning of Zero-Shot Machine Reading Comprehension

Luo, Hongyin, Li, Shang-Wen, Yu, Seunghak, Glass, James

Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, learning question answering models still requires large-scale data annotation in specific domains. In this work, we propose a cooperative, self-play learning framework, REGEX, for question generation and answering. REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training of question generation and answering models on any text corpora without annotation. We further leverage a reinforcement learning technique to reward generating high-quality questions and to improve the answer extraction model's performance. Experimental results show that REGEX outperforms the state-of-the-art (SOTA) pretrained language models and zero-shot approaches on standard question-answering benchmarks, and yields the new SOTA performance under the zero-shot setting.

answer entity, artificial intelligence, natural language, (19 more...)

2103.07449

Country:

Europe (0.46)
South America > Uruguay (0.14)
North America > United States > New York (0.14)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Assessment & Standards > Student Performance (0.41)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Machine Learning, Mar-22-2021

Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A Survey

Xie, Yiqun, Shekhar, Shashi, Li, Yan

Mapping of spatial hotspots, i.e., regions with significantly higher rates or probability density of generating certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, and environmental science. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed explicitly to control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering have been extensively studied by the data mining and statistics communities. In this survey we present an up-to-date and detailed review of the models and algorithms developed by this field. We first present a general taxonomy of the clustering process with statistical rigor, covering key steps of data and statistical modeling, region enumeration and maximization, significance testing, and data update. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone in generating new ideas and thoughts in this growing field and beyond.

detection, hotspot, statistics, (15 more...)

2103.12019

Country:

North America > United States > New York > New York County > New York City (0.28)
North America > United States > California (0.14)
North America > United States > Minnesota (0.04)
(18 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Machine Learning, Mar-22-2021

Spatio-Temporal Neural Network for Fitting and Forecasting COVID-19

Niu, Yi-Shuai, Ding, Wentao, Hu, Junpeng, Xu, Wenxu, Canu, Stephane

We established a Spatio-Temporal Neural Network, namely STNN, to forecast the spread of the coronavirus COVID-19 outbreak worldwide in 2020. The basic structure of STNN is similar to the Recurrent Neural Network (RNN), incorporating not only temporal data but also spatial features. Two improved STNN architectures, namely the STNN with Augmented Spatial States (STNN-A) and the STNN with Input Gate (STNN-I), are proposed, which ensure more predictability and flexibility. STNN and its variants can be trained using the Stochastic Gradient Descent (SGD) algorithm and its improved variants (e.g., Adam, AdaGrad and RMSProp). Our STNN models are compared with several classical epidemic prediction models, including the fully-connected neural network (BPNN), the recurrent neural network (RNN), the classical curve fitting models, and the SEIR dynamical system model. Numerical simulations demonstrate that STNN models outperform many others by providing more accurate fitting and prediction, and by handling both spatial and temporal data.

prediction, spatio-temporal neural network, stnn, (13 more...)

2103.1186

Country:

North America > United States (0.29)
Europe > Italy (0.05)
Asia > China > Shanghai > Shanghai (0.05)
(9 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)