AITopics | Szymański, Piotr

Collaborating Authors

Szymański, Piotr

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Evaluation of Code LLMs on Geospatial Code Generation

Gramacki, Piotr, Martins, Bruno, Szymański, Piotr

arXiv.org Artificial IntelligenceDec-13-2024

Software development support tools have been studied for a long time, with recent approaches using Large Language Models (LLMs) for code generation. These models can generate Python code for data science and machine learning applications. LLMs are helpful for software engineers because they increase productivity in daily work. An LLM can also serve as a "mentor" for inexperienced software developers, and be a viable learning support. High-quality code generation with LLMs can also be beneficial in geospatial data science. However, this domain poses different challenges, and code generation LLMs are typically not evaluated on geospatial tasks. Here, we show how we constructed an evaluation benchmark for code generation models, based on a selection of geospatial tasks. We categorised geospatial tasks based on their complexity and required tools. Then, we created a dataset with tasks that test model capabilities in spatial reasoning, spatial data processing, and geospatial tools usage. The dataset consists of specific coding problems that were manually created for high quality. For every problem, we proposed a set of test scenarios that make it possible to automatically check the generated code for correctness. In addition, we tested a selection of existing code generation LLMs for code generation in the geospatial domain. We share our dataset and reproducible evaluation code on a public GitHub repository, arguing that this can serve as an evaluation benchmark for new LLMs in the future. Our dataset will hopefully contribute to the development new models capable of solving geospatial coding tasks with high accuracy. These models will enable the creation of coding assistants tailored for geospatial applications.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3687123.3698286

2410.04617

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Software (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SRAI: Towards Standardization of Geospatial AI

Gramacki, Piotr, Leśniara, Kacper, Raczycki, Kamil, Woźniak, Szymon, Przymus, Marcin, Szymański, Piotr

arXiv.org Artificial IntelligenceOct-23-2023

Spatial Representations for Artificial Intelligence (srai) is a Python library for working with geospatial data. The library can download geospatial data, split a given area into micro-regions using multiple algorithms and train an embedding model using various architectures. It includes baseline models as well as more complex methods from published works. Those capabilities make it possible to use srai in a complete pipeline for geospatial task solving. The proposed library is the first step to standardize the geospatial AI domain toolset. It is fully open-source and published under Apache 2.0 licence.

library, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3615886.3627740

2310.13098

Country:

Europe (0.71)
North America > United States > New York (0.14)
North America > United States > Louisiana (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Information Technology (0.94)
Transportation > Infrastructure & Services (0.47)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Improved DeepFake Detection Using Whisper Features

Kawa, Piotr, Plata, Marcin, Czuba, Michał, Szymański, Piotr, Syga, Piotr

arXiv.org Artificial IntelligenceJun-2-2023

With a recent influx of voice generation methods, the threat introduced by audio DeepFake (DF) is ever-increasing. Several different detection methods have been presented as a countermeasure. Many methods are based on so-called front-ends, which, by transforming the raw audio, emphasize features crucial for assessing the genuineness of the audio sample. Our contribution contains investigating the influence of the state-of-the-art Whisper automatic speech recognition model as a DF detection front-end. We compare various combinations of Whisper and well-established front-ends by training 3 detection models (LCNN, SpecRNet, and MesoNet) on a widely used ASVspoof 2021 DF dataset and later evaluating them on the DF In-The-Wild dataset. We show that using Whisper-based features improves the detection for each model and outperforms recent results on the In-The-Wild dataset by reducing Equal Error Rate by 21%.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.01428

Country:

Europe > Poland (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics

Leśniara, Kacper, Szymański, Piotr

arXiv.org Artificial IntelligenceApr-26-2023

Recent years brought advancements in using neural networks for representation learning of various language or visual phenomena. New methods freed data scientists from hand-crafting features for common tasks. Similarly, problems that require considering the spatial variable can benefit from pretrained map region representations instead of manually creating feature tables that one needs to prepare to solve a task. However, very few methods for map area representation exist, especially with respect to road network characteristics. In this paper, we propose a method for generating microregions' embeddings with respect to their road infrastructure characteristics. We base our representations on OpenStreetMap road networks in a selection of cities and use the H3 spatial index to allow reproducible and scalable representation learning. We obtained vector representations that detect how similar map hexagons are in the road networks they contain. Additionally, we observe that embeddings yield a latent space with meaningful arithmetic operations. Finally, clustering methods allowed us to draft a high-level typology of obtained representations. We are confident that this contribution will aid data scientists working on infrastructure-related prediction tasks with spatial variables.

artificial intelligence, machine learning, microregion, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3557918.3565865

2304.13865

Country:

Europe > Poland (0.71)
North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish

Augustyniak, Łukasz, Tagowski, Kamil, Sawczyn, Albert, Janiak, Denis, Bartusiak, Roman, Szymczak, Adrian, Wątroba, Marcin, Janz, Arkadiusz, Szymański, Piotr, Morzy, Mikołaj, Kajdanowicz, Tomasz, Piasecki, Maciej

arXiv.org Artificial IntelligenceNov-23-2022

The availability of compute and data to train larger and larger language models increases the demand for robust methods of benchmarking the true progress of LM training. Recent years witnessed significant progress in standardized benchmarking for English. Benchmarks such as GLUE, SuperGLUE, or KILT have become de facto standard tools to compare large language models. Following the trend to replicate GLUE for other languages, the KLEJ benchmark has been released for Polish. In this paper, we evaluate the progress in benchmarking for low-resourced languages. We note that only a handful of languages have such comprehensive benchmarks. We also note the gap in the number of tasks being evaluated by benchmarks for resource-rich English/Chinese and the rest of the world. In this paper, we introduce LEPISZCZE (the Polish word for glew, the Middle English predecessor of glue), a new, comprehensive benchmark for Polish NLP with a large variety of tasks and high-quality operationalization of the benchmark. We design LEPISZCZE with flexibility in mind. Including new models, datasets, and tasks is as simple as possible while still offering data versioning and model tracking. In the first run of the benchmark, we test 13 experiments (task and dataset pairs) based on the five most recent LMs for Polish. We use five datasets from the Polish benchmark and add eight novel datasets. As the paper's main contribution, apart from LEPISZCZE, we provide insights and experiences learned while creating the benchmark for Polish as the blueprint to design similar benchmarks for other low-resourced languages.

artificial intelligence, benchmark, natural language, (15 more...)

arXiv.org Artificial Intelligence

2211.13112

Country:

North America > United States (0.46)
Europe > Poland (0.28)
Asia > Middle East (0.28)

Genre: Research Report (0.40)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

gtfs2vec -- Learning GTFS Embeddings for comparing Public Transport Offer in Microregions

Gramacki, Piotr, Woźniak, Szymon, Szymański, Piotr

arXiv.org Artificial IntelligenceNov-2-2021

We selected 48 European cities and gathered their public transport timetables in the GTFS format. We utilized Uber's H3 spatial index to divide each city into hexagonal micro-regions. Based on the timetables data we created certain features describing the quantity and variety of public transport availability in each region. Next, we trained an auto-associative deep neural network to embed each of the regions. Having such prepared representations, we then used a hierarchical clustering approach to identify similar regions. To do so, we utilized an agglomerative clustering algorithm with a euclidean distance between regions and Ward's method to minimize in-cluster variance. Finally, we analyzed the obtained clusters at different levels to identify some number of clusters that qualitatively describe public transport availability. We showed that our typology matches the characteristics of analyzed cities and allows succesful searching for areas with similar public transport schedule characteristics.

artificial intelligence, machine learning, neural network, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3486640.3491392

2111.0096

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Poland > Lesser Poland Province > Kraków (0.14)

Genre: Research Report (0.40)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Hex2vec -- Context-Aware Embedding H3 Hexagons with OpenStreetMap Tags

Woźniak, Szymon, Szymański, Piotr

arXiv.org Artificial IntelligenceNov-1-2021

Representation learning of spatial and geographic data is a rapidly developing field which allows for similarity detection between areas and high-quality inference using deep neural networks. Past approaches however concentrated on embedding raster imagery (maps, street or satellite photos), mobility data or road networks. In this paper we propose the first approach to learning vector representations of OpenStreetMap regions with respect to urban functions and land-use in a micro-region grid. We identify a subset of OSM tags related to major characteristics of land-use, building and urban region functions, types of water, green or other natural areas. Through manual verification of tagging quality, we selected 36 cities were for training region representations. Uber's H3 index was used to divide the cities into hexagons, and OSM tags were aggregated for each hexagon. We propose the hex2vec method based on the Skip-gram model with negative sampling. The resulting vector representations showcase semantic structures of the map characteristics, similar to ones found in vector-based language models. We also present insights from region similarity detection in six Polish cities and propose a region typology obtained through agglomerative clustering.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3486635.3491076

2111.0097

Country:

Asia (0.96)
Europe > Poland (0.68)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Workflow (0.68)
Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Transfer Learning Approach to Bicycle-sharing Systems' Station Location Planning using OpenStreetMap Data

Raczycki, Kamil, Szymański, Piotr

arXiv.org Artificial IntelligenceNov-1-2021

Bicycle-sharing systems (BSS) have become a daily reality for many citizens of larger, wealthier cities in developed regions. However, planning the layout of bicycle-sharing stations usually requires expensive data gathering, surveying travel behavior and trip modelling followed by station layout optimization. Many smaller cities and towns, especially in developing areas, may have difficulty financing such projects. Planning a BSS also takes a considerable amount of time. Yet as the pandemic has shown us, municipalities will face the need to adapt rapidly to mobility shifts, which include citizens leaving public transport for bicycles. Laying out a bike sharing system quickly will become critical in addressing the increase in bike demand. This paper addresses the problem of cost and time in BSS layout design and proposes a new solution to streamline and facilitate the process of such planning by using spatial embedding methods. Based only on publicly available data from OpenStreetMap, and station layouts from 34 cities in Europe, a method has been developed to divide cities into micro-regions using the Uber H3 discrete global grid system and to indicate regions where it is worth placing a station based on existing systems in different cities using transfer learning. The result of the work is a mechanism to support planners in their decision making when planning a station layout with a choice of reference cities.

artificial intelligence, machine learning, neighbourhood, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3486626.3493434

2111.0099

Country:

Asia > China (0.48)
Europe > Poland (0.48)
North America > United States > New York (0.14)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.67)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations

Mizgajski, Jan, Szymczak, Adrian, Głowski, Robert, Szymański, Piotr, Żelasko, Piotr, Augustyniak, Łukasz, Morzy, Mikołaj, Carmiel, Yishay, Hodson, Jeff, Wójciak, Łukasz, Smoczyk, Daniel, Wróbel, Adam, Borowik, Bartosz, Artajew, Adam, Baran, Marcin, Kwiatkowski, Cezary, Żyła-Hoppe, Marzena

arXiv.org Machine LearningSep-2-2019

Avaya Conversational Intelligence (ACI) is an end-to-end, cloud-based solution for real-time Spoken Language Understanding for call centers. It combines large vocabulary, real-time speech recognition, transcript refinement, and entity and intent recognition in order to convert live audio into a rich, actionable stream of structured events. These events can be further leveraged with a business rules engine, thus serving as a foundation for real-time supervision and assistance applications. After the ingestion, calls are enriched with unsupervised keyword extraction, abstractive summarization, and business-defined attributes, enabling offline use cases, such as business intelligence, topic mining, full-text search, quality assurance, and agent training. ACI comes with a pretrained, configurable library of hundreds of intents and a robust intent training environment that allows for efficient, cost-effective creation and customization of customer-specific intents.

computer based training, educational technology, speech recognition, (21 more...)

arXiv.org Machine Learning

1909.02851

Country:

Europe > Poland (0.32)
North America > United States > California (0.15)

Genre: Research Report (0.40)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.98)

Add feedback

LNEMLC: Label Network Embeddings for Multi-Label Classifiation

Szymański, Piotr, Kajdanowicz, Tomasz, Chawla, Nitesh

arXiv.org Machine LearningDec-7-2018

Multi-label classification aims to classify instances with discrete non-exclusive labels. Most approaches on multi-label classification focus on effective adaptation or transformation of existing binary and multi-class learning approaches but fail in modelling the joint probability of labels or do not preserve generalization abilities for unseen label combinations. To address these issues we propose a new multi-label classification scheme, LNEMLC - Label Network Embedding for Multi-Label Classification, that embeds the label network and uses it to extend input space in learning and inference of any base multi-label classifier. The approach allows capturing of labels' joint probability at low computational complexity providing results comparable to the best methods reported in the literature. We demonstrate how the method reveals statistically significant improvements over the simple kNN baseline classifier. We also provide hints for selecting the robust configuration that works satisfactorily across data domains.

artificial intelligence, label network, machine learning, (16 more...)

arXiv.org Machine Learning

1812.02956

Country: Europe > Poland (0.29)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Add feedback