AITopics | Bucharest

Rethinking the Authorship Verification Experimental Setups

Brad, Florin, Manolache, Andrei, Burceanu, Elena, Barbalau, Antonio, Ionescu, Radu, Popescu, Marius

arXiv.org Artificial Intelligence | Nov-1-2022

One of the main drivers of the recent advances in authorship verification is the PAN large-scale authorship dataset. Despite generating significant progress in the field, inconsistent performance differences between the closed and open test sets have been reported. To this end, we improve the experimental setup by proposing five new public splits over the PAN dataset, specifically designed to isolate and identify biases related to the text topic and to the author's writing style. We evaluate several BERT-like baselines on these splits, showing that such models are competitive with authorship verification state-of-the-art methods. Furthermore, using explainable AI, we find that these baselines are biased towards named entities. We show that models trained without the named entities obtain better results and generalize better when tested on DarkReddit, our new dataset for authorship verification.
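As a rough illustration of what "training without the named entities" could look like as a preprocessing step, the sketch below replaces already-detected entity spans with a placeholder token. It assumes the character spans come from some external named entity tagger; the function name `mask_entities`, the `[ENT]` placeholder, and the sample sentence are hypothetical and are not taken from the paper.

```python
def mask_entities(text, spans, placeholder="[ENT]"):
    """Replace each (start, end) character span in `text` with `placeholder`.

    `spans` is assumed to come from an external NER tagger (not shown here).
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if start < cursor:
            raise ValueError("entity spans must not overlap")
        pieces.append(text[cursor:start])  # keep the text before the entity
        pieces.append(placeholder)         # drop the entity itself
        cursor = end
    pieces.append(text[cursor:])           # keep the tail after the last entity
    return "".join(pieces)


# Hypothetical example: spans cover "Alice", "Bob" and "Paris".
sample = "Alice met Bob in Paris last week."
sample_spans = [(0, 5), (10, 13), (17, 22)]
masked = mask_entities(sample, sample_spans)
print(masked)
```

Masking rather than deleting keeps sentence structure intact, so a verification model would still see where an entity occurred without seeing which one.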

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2112.05125

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

VeriDark: A Large-Scale Benchmark for Authorship Verification on the Dark Web

Manolache, Andrei, Brad, Florin, Barbalau, Antonio, Ionescu, Radu Tudor, Popescu, Marius

arXiv.org Artificial Intelligence | Nov-1-2022

The Dark Web represents a hotbed for illicit activity, where users communicate on different market forums in order to exchange goods and services. Law enforcement agencies benefit from forensic tools that perform authorship analysis, in order to identify and profile users based on their textual content. However, authorship analysis has been traditionally studied using corpora featuring literary texts such as fragments from novels or fan fiction, which may not be suitable in a cybercrime context. Moreover, the few works that employ authorship analysis tools for cybercrime prevention usually rely on ad hoc experimental setups and datasets. To address these issues, we release VeriDark: a benchmark composed of three large-scale authorship verification datasets and one authorship identification dataset obtained from user activity from either Dark Web related Reddit communities or popular illicit Dark Web market forums. We evaluate competitive NLP baselines on the three datasets and perform an analysis of the predictions to better understand the limitations of such approaches. We make the datasets and baselines publicly available at https://github.com/bit-ml/VeriDark.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2207.03477

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
(11 more...)

Genre: Research Report (0.64)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Communications > Social Media (1.00)
(2 more...)

Large scale traffic forecasting with gradient boosting, Traffic4cast 2022 challenge

Lumiste, Martin, Ilie, Andrei

arXiv.org Artificial Intelligence | Oct-31-2022

Accurate traffic forecasting is of the utmost importance for optimal travel planning and for efficient city mobility. IARAI (The Institute of Advanced Research in Artificial Intelligence) organizes Traffic4cast, a yearly traffic prediction competition based on real-life data [https://www.iarai.ac.at/traffic4cast/], aiming to leverage artificial intelligence advances for producing accurate traffic estimates. We present our solution to the IARAI Traffic4cast 2022 competition, in which the goal is to develop algorithms for predicting road graph edge congestion classes and supersegment-level travel times. In contrast to previous years, this year's competition focuses on modelling graph edge level behaviour, rather than coarser aggregated grid-based traffic movies. For this reason, we leverage a method familiar from tabular data modelling: gradient-boosted decision tree ensembles. We reduce the dimensionality of the input data representing traffic counters with the help of the classic PCA method and feed it as input to a LightGBM model. This simple, fast, and scalable technique allowed us to win second place in the core competition. The source code and references to trained model files and submissions are available at https://github.com/skandium/t4c22.

artificial intelligence, competition, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2211.00157

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.05)
Asia > Middle East > Jordan (0.05)
Europe > Estonia > Harju County > Tallinn (0.04)

Genre: Research Report (0.65)

Industry:

Transportation (0.47)
Consumer Products & Services > Travel (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Deep Learning-Based Anomaly Detection in Synthetic Aperture Radar Imaging

Muzeau, Max, Ren, Chengfang, Angelliaume, Sébastien, Datcu, Mihai, Ovarlez, Jean-Philippe

arXiv.org Machine Learning | Oct-28-2022

In this paper, we investigate unsupervised anomaly detection in Synthetic Aperture Radar (SAR) images. Our approach considers anomalies as abnormal patterns that deviate from their surroundings, without any prior knowledge of their characteristics. In the literature, most model-based algorithms face three main issues. First, the speckle noise corrupts the image and potentially leads to numerous false detections. Second, statistical approaches may exhibit deficiencies in modeling spatial correlation in SAR images. Finally, neural networks based on supervised learning approaches are not recommended due to the lack of annotated SAR data, notably for the class of abnormal patterns. Our proposed method aims to address these issues through a self-supervised algorithm. The speckle is first removed through the deep learning SAR2SAR algorithm. Then, an adversarial autoencoder is trained to reconstruct an anomaly-free SAR image. Finally, a change detection processing step is applied between the input and the output to detect anomalies. Experiments are performed to show the advantages of our method compared to the conventional Reed-Xiaoli algorithm, highlighting the importance of an efficient despeckling pre-processing step.
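The final step of the pipeline described above, comparing the input with its anomaly-free reconstruction, can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the abstract does not specify the change detection operator, so a per-pixel absolute difference with a mean-plus-k-standard-deviations threshold is swapped in here, and the function name `anomaly_mask`, the parameter `k`, and the toy image are hypothetical.

```python
from statistics import mean, pstdev


def anomaly_mask(original, reconstructed, k=3.0):
    """Flag pixels whose reconstruction error exceeds mean + k * std.

    `original` is assumed to be an already despeckled image and
    `reconstructed` the autoencoder output, both as lists of rows.
    The absolute difference is an illustrative stand-in for the
    paper's change detection step.
    """
    errors = [
        [abs(o - r) for o, r in zip(row_o, row_r)]
        for row_o, row_r in zip(original, reconstructed)
    ]
    flat = [e for row in errors for e in row]
    threshold = mean(flat) + k * pstdev(flat)
    return [[e > threshold for e in row] for row in errors]


# Toy 4x4 image: uniform background with one bright pixel at row 1, column 2.
image = [[1.0] * 4 for _ in range(4)]
image[1][2] = 9.0
# A reconstruction that reproduces only the background.
recon = [[1.0] * 4 for _ in range(4)]

mask = anomaly_mask(image, recon)
flagged = [(r, c) for r, row in enumerate(mask) for c, hit in enumerate(row) if hit]
print(flagged)
```

Because the autoencoder is trained to output anomaly-free images, large reconstruction errors point at regions it could not explain, which is what the thresholding step picks out.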

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

2210.16038

Country:

Europe > France (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
(6 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep Crowd Anomaly Detection: State-of-the-Art, Challenges, and Future Research Directions

Sharif, Md. Haidar, Jiao, Lei, Omlin, Christian W.

arXiv.org Artificial Intelligence | Oct-25-2022

Crowd anomaly detection is one of the most popular topics in computer vision in the context of smart cities. A plethora of deep learning methods have been proposed that generally outperform other machine learning solutions. Our review primarily discusses algorithms that were published in mainstream conferences and journals between 2020 and 2022. We present datasets that are typically used for benchmarking, produce a taxonomy of the developed algorithms, and discuss and compare their performances. Our main finding is that the heterogeneity of pre-trained convolutional models has a negligible impact on crowd video anomaly detection performance. We conclude our discussion with fruitful directions for future research.

data mining, machine learning, pattern recognition, (18 more...)

arXiv.org Artificial Intelligence

2210.13927

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.27)
Oceania > Australia > New South Wales > Sydney (0.13)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
(60 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multilingual Multimodal Learning with Machine Translated Text

Qiu, Chen, Oneata, Dan, Bugliarello, Emanuele, Frank, Stella, Elliott, Desmond

arXiv.org Artificial Intelligence | Oct-24-2022

Most vision-and-language pretraining research focuses on English tasks. However, the creation of multilingual multimodal evaluation datasets (e.g. Multi30K, xGQA, XVNLI, and MaRVL) poses a new challenge in finding high-quality training data that is both multilingual and multimodal. In this paper, we investigate whether machine translating English multimodal data can be an effective proxy for the lack of readily available multilingual data. We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model. We apply it to both pretraining and fine-tuning data with a state-of-the-art model. In order to prevent models from learning from low-quality translated text, we propose two metrics for automatically removing such translations from the resulting datasets. In experiments on five tasks across 20 languages in the IGLUE benchmark, we show that translated data can provide a useful signal for multilingual multimodal learning, both at pretraining and fine-tuning.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.13134

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(15 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Humans.ai wows thousands of people with its synthetic AI Guide in Bucharest

#artificialintelligence | Oct-17-2022, 11:20:22 GMT

Humans.ai attended the sixth edition of Spotlight, one of the most anticipated outdoor visual art festivals in Bucharest, where it stole the show with its "BRING IT TO LIFE" installation, set up on the historical Revolution Square. To make its first appearance at the Spotlight festival a memorable one, Humans.ai [...] More precisely, a video mapping installation on a large-scale sculpture depicting the iconic Humans Head, the protagonist in the company's NFT collection that symbolizes the symbiosis between humans and artificial intelligence. Tens of thousands of tourists and Bucharest residents were in awe of the light show and the vibrant energy emanated by the video projection made by the Humans.ai [...] The real pièce de résistance, though, was DIANA, our synthetic avatar capable of speaking every language of the European Union.

bucharest, festival, synthetic ai guide, (4 more...)

#artificialintelligence

Country: Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.90)

Technology: Information Technology > Artificial Intelligence (1.00)

Step out of KG: Knowledge Graph Completion via Knowledgeable Retrieval and Reading Comprehension

Lv, Xin, Lin, Yankai, Yao, Zijun, Zeng, Kaisheng, Zhang, Jiajie, Hou, Lei, Li, Juanzi

arXiv.org Artificial Intelligence | Oct-12-2022

Knowledge graphs, as the cornerstone of many AI applications, usually face serious incompleteness problems. In recent years, there have been many efforts to study automatic knowledge graph completion (KGC), most of which use existing knowledge to infer new knowledge. However, in our experiments, we find that not all relations can be obtained by inference, which constrains the performance of existing models. To alleviate this problem, we propose a new model based on information retrieval and reading comprehension, namely IR4KGC. Specifically, we pre-train a knowledge-based information retrieval module that can retrieve documents related to the triples to be completed. Then, the retrieved documents are handed over to the reading comprehension module to generate the predicted answers. In experiments, we find that our model can well solve relations that cannot be inferred from existing knowledge, and achieve good results on KGC datasets.

information retrieval, natural language, relation, (15 more...)

arXiv.org Artificial Intelligence

2210.05921

Country:

North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Beijing > Beijing (0.04)
(5 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Sports (1.00)
Education > Assessment & Standards > Student Performance (0.81)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.83)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)

Multimodal Multi-Head Convolutional Attention with Various Kernel Sizes for Medical Image Super-Resolution

Georgescu, Mariana-Iuliana, Ionescu, Radu Tudor, Miron, Andreea-Iuliana, Savencu, Olivian, Ristea, Nicolae-Catalin, Verga, Nicolae, Khan, Fahad Shahbaz

arXiv.org Artificial Intelligence | Oct-12-2022

Super-resolving medical images can help physicians in providing more accurate diagnostics. In many situations, computed tomography (CT) or magnetic resonance imaging (MRI) techniques capture several scans (modes) during a single investigation, which can jointly be used (in a multimodal fashion) to further boost the quality of super-resolution results. To this end, we propose a novel multimodal multi-head convolutional attention module to super-resolve CT and MRI scans. Our attention module uses the convolution operation to perform joint spatial-channel attention on multiple concatenated input tensors, where the kernel (receptive field) size controls the reduction rate of the spatial attention, and the number of convolutional filters controls the reduction rate of the channel attention, respectively. We introduce multiple attention heads, each head having a distinct receptive field size corresponding to a particular reduction rate for the spatial attention. We integrate our multimodal multi-head convolutional attention (MMHCA) into two deep neural architectures for super-resolution and conduct experiments on three data sets. Our empirical results show the superiority of our attention module over the state-of-the-art attention mechanisms used in super-resolution. Moreover, we conduct an ablation study to assess the impact of the components involved in our attention module, e.g. the number of inputs or the number of heads. Our code is freely available at https://github.com/lilygeorgescu/MHCA.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2204.04218

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Asia > Middle East > UAE (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

Olaleye, Kayode, Oneata, Dan, Kamper, Herman

arXiv.org Artificial Intelligence | Oct-12-2022

Visually grounded speech (VGS) models are trained on images paired with unlabelled spoken captions. Such models could be used to build speech systems in settings where it is impossible to get labelled data, e.g. for documenting unwritten languages. However, most VGS studies are in English or other high-resource languages. This paper attempts to address this shortcoming. We collect and release a new single-speaker dataset of audio captions for 6k Flickr images in Yorùbá -- a real low-resource language spoken in Nigeria. We train an attention-based VGS model where images are automatically tagged with English visual labels and paired with Yorùbá utterances. This enables cross-lingual keyword localisation: a written English query is detected and located in Yorùbá speech. To quantify the effect of the smaller dataset, we compare to English systems trained on similar and more data. We hope that this new dataset will stimulate research in the use of VGS models for real low-resource languages.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.046

Country:

Africa > Nigeria (0.24)
Africa > South Africa (0.04)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
