Wójcik, Bartosz
Maybe I Should Not Answer That, but... Do LLMs Understand The Safety of Their Inputs?
Chrabąszcz, Maciej, Szatkowski, Filip, Wójcik, Bartosz, Dubiński, Jan, Trzciński, Tomasz
Ensuring the safety of Large Language Models (LLMs) is critical, but currently used methods in most cases sacrifice model performance to obtain increased safety, or perform poorly on data outside of their adaptation distribution. We investigate existing methods for such generalization and find them insufficient. Surprisingly, while even plain LLMs recognize unsafe prompts, they may still generate unsafe responses. To avoid this performance degradation, we advocate a two-step framework: we first identify unsafe prompts with a lightweight classifier, and apply a "safe" model only to those prompts. In particular, we explore the design of the safety detector in more detail, investigating the use of different classifier architectures and prompting techniques. Interestingly, we find that the final hidden state of the last token is enough to provide robust performance, minimizing false positives on benign data while performing well on malicious prompt detection. Additionally, we show that classifiers trained on representations from earlier model layers perform comparably to those trained on the final layers, indicating that a safety representation is present in the LLMs' hidden states at most stages of the model. Our work is a step towards efficient, representation-based safety mechanisms for LLMs.
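A minimal sketch of the two-step framework, under illustrative assumptions: a lightweight linear probe scores the final hidden state of the last token as unsafe or not, and only flagged prompts are routed to a "safe" model. The names (LastTokenSafetyProbe, route), the probe architecture, and the fixed-threshold routing rule are my assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class LastTokenSafetyProbe(nn.Module):
    """Lightweight classifier over the final hidden state of the last token."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, hidden_states: torch.Tensor, lengths: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_dim) from a chosen LLM layer
        last = hidden_states[torch.arange(hidden_states.size(0)), lengths - 1]
        return torch.sigmoid(self.head(last)).squeeze(-1)  # P(unsafe) per prompt

def route(prompt_hidden, lengths, probe, base_model, safe_model, threshold=0.5):
    # Two-step framework: only prompts flagged as unsafe go to the "safe" model,
    # so benign inputs keep the base model's unmodified performance.
    p_unsafe = probe(prompt_hidden, lengths)
    return [safe_model if p > threshold else base_model for p in p_unsafe]
```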
Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
Wójcik, Bartosz, Devoto, Alessio, Pustelnik, Karol, Minervini, Pasquale, Scardapane, Simone
The computational cost of transformer models makes them inefficient in low-latency or low-power applications. While techniques such as quantization or linear attention can reduce the computational load, they may incur a reduction in accuracy. In addition, globally reducing the cost for all inputs may be sub-optimal. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch, and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also describe a distillation technique to replace any pre-trained model with an "ACMized" variant. The distillation phase is designed to be highly parallelizable across layers while remaining simple to plug into existing networks. Our evaluation of transformer models in computer vision and speech recognition demonstrates that substituting layers with ACMs significantly reduces inference costs without degrading downstream accuracy for a wide interval of user-defined budgets.
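A hedged sketch of the ACM idea, under several assumptions: each learner is a small two-layer MLP, the gate picks the number of learners per token via an argmax over its logits, and for clarity all learners are computed densely here (a real implementation would batch tokens by their learner count to actually save compute). None of these choices are claimed to match the paper's exact design.

```python
import torch
import torch.nn as nn

class ACM(nn.Module):
    def __init__(self, dim: int, num_learners: int = 4, hidden: int = 128):
        super().__init__()
        self.learners = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_learners)
        ])
        self.gate = nn.Linear(dim, num_learners)  # logits over "how many learners"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim); the gate picks k in {1..num_learners} per token
        k = self.gate(x).argmax(dim=-1) + 1
        out = torch.zeros_like(x)
        for i, learner in enumerate(self.learners):
            active = (k > i).float().unsqueeze(-1)  # tokens that still refine
            out = out + active * learner(x + out)   # progressive refinement
        return out
```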
Exploiting Transformer Activation Sparsity with Dynamic Inference
Piórczyński, Mikołaj, Szatkowski, Filip, Bałazy, Klaudia, Wójcik, Bartosz
Transformer models, despite their impressive performance, often face practical limitations due to their high computational requirements. At the same time, previous studies have revealed significant activation sparsity in these models, indicating the presence of redundant computations. In this paper, we propose Dynamic Sparsified Transformer Inference (DSTI), a method that radically reduces the inference cost of Transformer models by enforcing activation sparsity and subsequently transforming a dense model into its sparse Mixture of Experts (MoE) version. We demonstrate that it is possible to train small gating networks that successfully predict the relative contribution of each expert during inference. Furthermore, we introduce a mechanism that dynamically determines the number of executed experts individually for each token. DSTI can be applied to any Transformer-based architecture and has negligible impact on accuracy. For the BERT-base classification model, we reduce inference cost by almost 60%.
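To make the per-token mechanism concrete, here is a hedged sketch of a sparse-MoE feed-forward layer: a small gate predicts each expert's relative contribution, and each token executes only the smallest set of experts covering a fixed fraction tau of the predicted contribution mass. The cumulative-mass rule, the sizes, and the name DynamicMoEFFN are assumptions for illustration; the paper's exact gating and selection criteria may differ.

```python
import torch
import torch.nn as nn

class DynamicMoEFFN(nn.Module):
    def __init__(self, dim, num_experts=8, expert_hidden=256, tau=0.9):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, expert_hidden), nn.ReLU(),
                          nn.Linear(expert_hidden, dim))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(dim, num_experts)  # predicts expert contributions
        self.tau = tau

    def forward(self, x):  # x: (num_tokens, dim)
        scores = torch.softmax(self.gate(x), dim=-1)
        order = scores.argsort(dim=-1, descending=True)
        csum = scores.gather(-1, order).cumsum(dim=-1)
        # keep the smallest prefix of experts covering tau of the predicted mass
        keep = (csum - scores.gather(-1, order)) < self.tau
        mask = torch.zeros_like(scores).scatter(-1, order, keep.float())
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = mask[:, i].bool()
            if sel.any():  # run each expert only on the tokens that selected it
                out[sel] += scores[sel, i:i + 1] * expert(x[sel])
        return out
```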
Face Identity-Aware Disentanglement in StyleGAN
Suwała, Adrian, Wójcik, Bartosz, Proszewska, Magdalena, Tabor, Jacek, Spurek, Przemysław, Śmieja, Marek
Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of the latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive to the remaining characteristics of the image than those of existing state-of-the-art models.
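The contrastive idea, loosely sketched: latent codes of the same person (across frames, poses, and attributes) are pulled together, while codes of different identities are pushed at least a margin apart. The hinge form below and the margin value are common-default assumptions, not necessarily the exact loss used by PluGeN4Faces.

```python
import torch
import torch.nn.functional as F

def identity_contrastive_loss(latents, person_ids, margin=1.0):
    # latents: (batch, latent_dim); person_ids: (batch,) integer identity labels.
    # Assumes each batch contains at least one same-person pair.
    dist = torch.cdist(latents, latents)                  # pairwise distances
    same = person_ids.unsqueeze(0) == person_ids.unsqueeze(1)
    eye = torch.eye(len(latents), dtype=torch.bool, device=latents.device)
    pos = dist[same & ~eye].pow(2).mean()                 # same identity: close
    neg = F.relu(margin - dist[~same]).pow(2).mean()      # different: far apart
    return pos + neg
```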
Computer Vision based inspection on post-earthquake with UAV synthetic dataset
Żarski, Mateusz, Wójcik, Bartosz, Miszczak, Jarosław A., Blachowski, Bartłomiej, Ostrowski, Mariusz
Earthquakes are sudden and violent disasters that affect huge areas of land in a very short period of time. They have been known to mankind since ancient times and invariably pose one of the most serious threats to the lives of people concentrated in large cities. The scale of their destructive power can be seen in the nearly two million earthquake victims in the 20th century alone [1], or in the most devastating single events, which claimed nearly a million lives [2]. At the same time, the map of seismically active areas largely overlaps with densely populated areas, particularly in North America, Europe, and Asia [3], which focuses researchers' attention on this type of hazard and on methods of its mitigation. Studies conducted to date have assessed the effects of earthquakes both in terms of the impact on housing and infrastructure and in terms of the performance of public services in repairing damage or improving traffic flow in the affected area [4, 5]. These works have led to concepts of cities in which such events would no longer have a critical impact on the lives of residents, but at the cost of monitoring the condition of structures even after seemingly harmless, small earthquakes, so that corrective action can be taken immediately after damage occurs [6]. This, however, requires modern methods of structural monitoring that reduce the labor intensity of the entire process, without which the end goal is impossible to achieve. In this paper, we present our step towards building autonomous systems that can bring this goal closer.
Hard hat wearing detection based on head keypoint localization
Wójcik, Bartosz, Żarski, Mateusz, Książek, Kamil, Miszczak, Jarosław Adam, Skibniewski, Mirosław Jan
In recent years, much attention has been paid to deep learning methods in the context of vision-based construction site safety systems, especially regarding personal protective equipment. Despite this attention, however, there is still no reliable way to establish the relationship between workers and their hard hats. To address this problem, this article proposes a combination of deep learning methods, object detection and head keypoint localization, with simple rule-based reasoning. In tests, this solution surpassed previous methods based on the relative bounding box positions of different instances, as well as on the direct detection of hard hat wearers and non-wearers. The results show that combining novel deep learning methods with human-interpretable rule-based systems can yield a solution that is both reliable and able to successfully mimic manual, on-site supervision. This work is the next step in the development of fully autonomous construction site safety systems and shows that there is still room for improvement in this area.
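The rule-based part can be as simple as checking whether a worker's detected head keypoint falls inside (or just below) some detected hard-hat bounding box. The sketch below is an illustrative reconstruction of that kind of rule; the threshold and the exact geometry are assumptions, not the rules from the paper.

```python
def wears_hard_hat(head_keypoint, hat_boxes, max_offset=0.5):
    """Return True if the head keypoint lies inside (or slightly below) any
    detected hard-hat box. Illustrative rule-based check, not the paper's
    exact rule set; max_offset is a hypothetical tolerance."""
    x, y = head_keypoint
    for (x1, y1, x2, y2) in hat_boxes:
        # allow the keypoint to sit a bit below the box, since the head
        # keypoint may be localized under the hat itself
        margin = max_offset * (y2 - y1)
        if x1 <= x <= x2 and y1 <= y <= y2 + margin:
            return True
    return False
```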
Adversarial Examples Detection and Analysis with Layer-wise Autoencoders
Wójcik, Bartosz, Morawiecki, Paweł, Śmieja, Marek, Krzyżek, Tomasz, Spurek, Przemysław, Tabor, Jacek
We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Experimental results show that our method outperforms the state of the art in supervised and unsupervised settings.
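A minimal sketch of the mechanism, with illustrative sizes: one small autoencoder is trained per monitored layer on clean activations, and at test time an input is flagged as adversarial if any layer's reconstruction error exceeds a threshold calibrated on clean data. The any-layer OR aggregation shown here is one plausible choice; the paper's detector may combine layers differently.

```python
import torch
import torch.nn as nn

class LayerAE(nn.Module):
    """One autoencoder per monitored hidden layer (sizes are illustrative)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def reconstruction_error(self, h):  # h: (batch, dim) layer activations
        return (self.dec(self.enc(h)) - h).pow(2).mean(dim=-1)

def is_adversarial(layer_activations, autoencoders, thresholds):
    # Flag an input if any layer's activations fall off the manifold of true
    # data, i.e. the per-layer reconstruction error is too high. Thresholds
    # would be calibrated on clean validation activations.
    flags = [ae.reconstruction_error(h) > t
             for h, ae, t in zip(layer_activations, autoencoders, thresholds)]
    return torch.stack(flags, dim=0).any(dim=0)  # (batch,) boolean verdicts
```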
One-element Batch Training by Moving Window
Spurek, Przemysław, Knop, Szymon, Tabor, Jacek, Podolak, Igor, Wójcik, Bartosz
Several deep models, especially generative ones, compare samples from two distributions in their cost functions (e.g., WAE-like autoencoder models, set-processing deep networks, etc.). With all these methods, one cannot train the model directly on small (in the extreme, one-element) batches, because batches of samples have to be compared against each other. We propose a generic approach to training such models using one-element mini-batches. The idea is based on splitting the batch in the latent space into two parts: previous, i.e. historical, elements, used for matching the latent space distribution, and the current ones, used both for the latent distribution computation and for the minimization process. Due to the smaller memory requirements, this allows training networks on higher-resolution images than the classical approach.
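A rough sketch of the moving-window mechanics, under the assumption that the window stores detached latent codes from previous steps: the distribution-matching term sees the concatenation of history and the current element (so it behaves like a large batch), while gradients flow only through the current element. The buffer size and the name LatentWindow are illustrative.

```python
import torch

class LatentWindow:
    """Moving window over latent codes for one-element-batch training."""
    def __init__(self, size=256):
        self.size, self.buffer = size, []

    def extend(self, z_current):
        # current codes keep gradients; history is stored without them
        zs = self.buffer + [z_current]
        self.buffer = (self.buffer + [z_current.detach()])[-self.size:]
        return torch.cat(zs, dim=0)  # used for the distribution-matching term
```

In a WAE-style training loop, the latent regularizer would be evaluated on window.extend(z), while the reconstruction term uses only the current element.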
LOSSGRAD: automatic learning rate in gradient descent
Wójcik, Bartosz, Maziarka, Łukasz, Tabor, Jacek
In this paper, we propose a simple, fast, and easy-to-implement algorithm, LOSSGRAD (locally optimal step-size in gradient descent), which automatically adapts the step size in gradient descent during neural network training. Given a function $f$, a point $x$, and the gradient $\nabla_x f$ of $f$, we aim to find the step size $h$ which is (locally) optimal, i.e. satisfies $$ h = \arg\min_{t \geq 0} f(x - t \nabla_x f). $$ Making use of a quadratic approximation, we show that the algorithm satisfies the above condition. We experimentally show that our method is insensitive to the choice of the initial learning rate while achieving results comparable to other methods.
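The condition above suggests a direct reconstruction (this is my reading of the abstract, not the authors' published pseudocode): fit a parabola to $\phi(t) = f(x - t\nabla_x f)$ from $f(x)$, a trial value $f(x - h\nabla_x f)$, and the directional derivative $\phi'(0) = -\|\nabla_x f\|^2$, then take the parabola's minimizer as the next step size. A NumPy sketch:

```python
import numpy as np

def lossgrad_step(f, grad, x, h):
    """One LOSSGRAD-style update (illustrative reconstruction, with a guard
    for non-positive curvature that is my assumption, not the paper's)."""
    g = grad(x)
    gnorm2 = np.dot(g, g)              # equals -phi'(0)
    f0, fh = f(x), f(x - h * g)
    a = (fh - f0 + h * gnorm2) / h**2  # curvature of the fitted parabola
    if a > 0:
        h = gnorm2 / (2 * a)           # arg min of the quadratic model
    else:
        h = 2 * h                      # model non-convex along the ray: grow step
    return x - h * g, h

# usage: minimize f(x) = ||x||^2; the quadratic fit recovers the exact
# line-search minimizer in a single step here
x, h = np.array([3.0, -4.0]), 0.01
x, h = lossgrad_step(lambda v: v @ v, lambda v: 2 * v, x, h)
```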