
Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models

Flowers, Anna R., Franck, Christopher T., Gramacy, Robert B., Krometis, Justin A.

arXiv.org Machine Learning

Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, object detectors trained on images of rare objects may not be good at identification in poorly represented conditions. We offer a way of informing subsequent data acquisition to maximize model performance by leveraging the toolkit of computer experiments and metadata describing the circumstances under which the training data was collected (e.g., season, time of day, location). We do this by evaluating the learner as the training data is varied according to its metadata. A Gaussian process (GP) surrogate fit to that response surface can inform new data acquisitions. This meta-learning approach offers improvements in learner performance compared to data with randomly selected metadata, which we illustrate both on classic learning examples and on a motivating application involving the collection of aerial images in search of airplanes.
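A minimal sketch of the surrogate-modeling idea using scikit-learn: fit a GP to learner performance as a function of metadata, then score candidate acquisitions. The two-dimensional metadata encoding, kernel choice, and uncertainty-driven acquisition rule below are illustrative assumptions, not the authors' actual design.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Metadata settings already evaluated (e.g., encoded season, time of day)
# and the learner's measured performance at each setting (e.g., mAP).
X_observed = np.array([[0.1, 0.2], [0.5, 0.9], [0.9, 0.4], [0.3, 0.7]])
y_observed = np.array([0.61, 0.74, 0.58, 0.70])

# GP surrogate over the metadata -> performance response surface.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_observed, y_observed)

# Score a grid of candidate metadata settings; acquire new data where the
# surrogate is most uncertain (other acquisition criteria are possible).
candidates = np.random.default_rng(0).uniform(size=(100, 2))
mean, std = gp.predict(candidates, return_std=True)
next_setting = candidates[np.argmax(std)]
```

In a real loop, one would collect data under `next_setting`, retrain the learner, measure performance, and refit the surrogate.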


Adaptive Dual-Layer Web Application Firewall (ADL-WAF) Leveraging Machine Learning for Enhanced Anomaly and Threat Detection

Sameh, Ahmed, Selim, Sahar

arXiv.org Artificial Intelligence

Web Application Firewalls (WAFs) are crucial for protecting web applications against a wide range of cyber threats. Traditional WAFs often struggle to effectively distinguish between malicious and legitimate traffic, leading to limited efficacy in threat detection. To overcome these limitations, this paper proposes an Adaptive Dual-Layer WAF (ADL-WAF) employing a two-layered machine learning model designed to enhance the accuracy of anomaly and threat detection. The first layer employs a Decision Tree (DT) algorithm to detect anomalies by identifying traffic deviations from established normal patterns. The second layer employs a Support Vector Machine (SVM) to classify these anomalies as either threat anomalies or benign anomalies. ADL-WAF incorporates comprehensive data pre-processing and feature engineering techniques and has been thoroughly evaluated using five large benchmark datasets. Evaluation on these datasets shows that ADL-WAF achieves a detection accuracy of 99.88% and a precision of 100%, significantly enhancing anomaly detection and reducing false positives. These findings suggest that integrating machine learning techniques into WAFs can substantially improve web application security by providing more accurate and efficient threat detection.
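The two-layer cascade can be sketched with off-the-shelf scikit-learn classifiers. The synthetic features, labels, and thresholds below are stand-in assumptions for illustration only, not the paper's pre-processing or feature engineering.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                        # stand-in request features
is_anomaly = (np.abs(X[:, 0]) > 1).astype(int)       # synthetic anomaly label
is_threat = ((is_anomaly == 1) & (X[:, 1] > 0)).astype(int)

# Layer 1: Decision Tree separates normal traffic from deviations.
layer1 = DecisionTreeClassifier(max_depth=4).fit(X, is_anomaly)

# Layer 2: SVM, trained only on anomalous traffic, separates threat
# anomalies from benign anomalies.
mask = is_anomaly == 1
layer2 = SVC().fit(X[mask], is_threat[mask])

def classify(x):
    """Cascade: only requests flagged by layer 1 reach layer 2."""
    x = np.asarray(x).reshape(1, -1)
    if layer1.predict(x)[0] == 0:
        return "normal"
    return "threat" if layer2.predict(x)[0] == 1 else "benign anomaly"
```

The cascade design means the more expensive second-stage classifier only sees traffic the first stage has already flagged.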


Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests

Mazelan, Muhammed El Mustaqeem, Abdul, Noor Hazlina, AlDahoul, Nouar

arXiv.org Artificial Intelligence

Password security plays a crucial role in cybersecurity, yet traditional password strength meters, which rely on static rules like character-type requirements, often fail. Such methods are easily bypassed by common password patterns (e.g., 'P@ssw0rd1!'), giving users a false sense of security. To address this, we implement and evaluate a password strength scoring system by comparing four machine learning models: Random Forest (RF), Support Vector Machine (SVM), a Convolutional Neural Network (CNN), and Logistic Regression, on a dataset of over 660,000 real-world passwords. Our primary contribution is a novel hybrid feature engineering approach that captures nuanced vulnerabilities missed by standard metrics. We introduce features like leetspeak-normalized Shannon entropy to assess true randomness, pattern detection for keyboard walks and sequences, and character-level TF-IDF n-grams to identify frequently reused substrings from breached password datasets. Crucially, the interpretability of the Random Forest model allows for feature importance analysis, providing a clear pathway to developing security tools that offer specific, actionable feedback to users. This study bridges the gap between predictive accuracy and practical usability, resulting in a high-performance scoring system that not only reduces password-based vulnerabilities but also empowers users to make more informed security decisions.

Keywords: Password Security, Machine Learning, Rule-Based Attack, Brute-Force Attack, Dictionary Attack, Cybersecurity.

1. Passwords remain a cornerstone of online security, serving as the primary means of authentication for countless systems and applications. However, this reliance is a critical vulnerability; according to a report by Google Cloud, a staggering 86% of breaches involve stolen credentials [1]. Many users choose weak, easily guessable passwords, which pose a serious threat to both user data and system security. Attackers frequently exploit this vulnerability in large-scale attacks, compromising user privacy and enabling financial fraud. Most traditional password strength scoring tools rely on static rules, such as requiring a mix of lowercase, uppercase, digits, and special characters (LUDS), which fail to adapt to evolving attack patterns.
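As a rough illustration of one feature proposed above, a leetspeak-normalized Shannon entropy might look like the following. The substitution table is an illustrative assumption, not the authors' exact mapping.

```python
import math

# Hypothetical leetspeak substitution table; real tools use larger maps.
LEET = str.maketrans({"@": "a", "0": "o", "1": "l", "3": "e", "$": "s", "5": "s"})

def leet_normalized_entropy(password: str) -> float:
    """Shannon entropy (bits/char) after undoing common leet substitutions,
    so 'P@ssw0rd' scores like 'password' rather than like random text."""
    normalized = password.lower().translate(LEET)
    n = len(normalized)
    counts = {c: normalized.count(c) for c in set(normalized)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())
```

On this measure, a leet-substituted dictionary word no longer gains the apparent randomness that fools LUDS-style meters.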


Applying the Chinese Wall Reverse Engineering Technique to Large Language Model Code Editing

Hanmongkolchai, Manatsawin

arXiv.org Artificial Intelligence

This work does not provide legal advice and does not claim that any legal opinions provided are correct [1]. … and block the model's output accordingly. This technique might not completely block partial matches and does not work for open-source development, as reproduction of the original code is expected. Some models address this issue by using curated datasets with appropriately licensed contents. For example, the Stack v2 dataset [2] and the Starcoder2 model limit their data to permissively licensed sources and contents with unknown licenses. The Common Pile dataset [3] and the accompanying Comma model improve on this by limiting the dataset to permissively licensed contents only. Most permissive licenses have attribution as their sole licensing condition and may be easier to comply with than the GPLv2 license. Ideally, models trained on public-domain contents may be the best in terms of legal compliance, as they have no restrictions or requirements, but to our knowledge no text generation models of reasonable quality trained this way exist today.


Welp, Nvidia's RTX 5090 can crack an 8-digit password in 3 hours

PCWorld

I have bad news for everyone with weak passwords. A hacker can guess your laziest random passwords in the same amount of time it takes to watch a movie. It turns out when you put the most brutally fast consumer graphics card on the task of, uh, brute-forcing 8-character passwords, it can crack a numbers-only string in 3 hours. Such is the finding of Hive Systems, a cybersecurity firm based in Virginia, as part of the research that went into its 2025 password table. The chart shows how fast a "consumer budget" hacker could brute-force passwords of varying lengths (4 to 18 characters) and compositions (e.g., numbers only, lowercase letters, uppercase and lowercase letters, etc.).
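The arithmetic behind such a chart is simple keyspace counting: the number of candidates is the charset size raised to the password length. A quick sketch (the guess rate is an illustrative assumption, not Hive Systems' measured figure):

```python
def keyspace(charset_size: int, length: int) -> int:
    """Number of candidate passwords for a given charset and length."""
    return charset_size ** length

def hours_to_exhaust(n_candidates: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return n_candidates / guesses_per_second / 3600

digits_8 = keyspace(10, 8)   # numbers only: 100,000,000 candidates
mixed_8 = keyspace(62, 8)    # upper + lower + digits: ~2.2e14 candidates
```

The takeaway matches the chart's shape: adding character classes or length grows the search space exponentially, far faster than guessing hardware improves.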


M-IFEval: Multilingual Instruction-Following Evaluation

Dussolle, Antoine, Díaz, Andrea Cardeña, Sato, Shota, Devine, Peter

arXiv.org Artificial Intelligence

Instruction following is a core capability of modern large language models (LLMs), making the evaluation of this capability essential to understanding these models. The Instruction Following Evaluation (IFEval) benchmark from the literature does this using objective criteria, offering a measure of LLM performance without subjective AI or human judgement. However, it only includes English instructions, limiting its ability to assess LLMs in other languages. We propose the Multilingual Instruction Following Evaluation (M-IFEval) benchmark, expanding the evaluation to French, Japanese, and Spanish, with both general and language-specific instructions. Applying this benchmark to 8 state-of-the-art LLMs, we find that benchmark performance across languages and instruction types can vary widely, underscoring the importance of a multilingual benchmark for evaluating LLMs across diverse cultural contexts.


MathReader: Text-to-Speech for Mathematical Documents

Hyeon, Sieun, Jung, Kyudan, Kim, Nam-Joon, Ryu, Hyun Gon, Do, Jaeyoung

arXiv.org Artificial Intelligence

TTS (text-to-speech) document readers from Microsoft, Adobe, Apple, and OpenAI are available worldwide. They provide relatively good TTS results for general plain text, but sometimes skip content or produce unsatisfactory results for mathematical expressions. This is because most modern academic papers are written in LaTeX, and when LaTeX formulas are compiled, they are rendered as distinctive text forms within the document. However, traditional TTS document readers output only the text as it is recognized, without considering the mathematical meaning of the formulas. To address this issue, we propose MathReader, which effectively integrates OCR, a fine-tuned T5 model, and TTS. MathReader demonstrated a lower Word Error Rate (WER) than existing TTS document readers, such as Microsoft Edge and Adobe Acrobat, when processing documents containing mathematical formulas. MathReader reduced the WER from 0.510 to 0.281 compared to Microsoft Edge, and from 0.617 to 0.281 compared to Adobe Acrobat. This will significantly contribute to alleviating the inconvenience faced by users who want to listen to documents, especially those who are visually impaired. The code is available at https://github.com/hyeonsieun/MathReader.
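For reference, the WER metric used in such comparisons is word-level edit distance divided by the reference length; a minimal sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / ref words,
    computed with standard dynamic-programming edit distance over tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.281 thus means roughly 28 word-level errors per 100 reference words in the spoken transcript.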


The MSR-Video to Text Dataset with Clean Annotations

Chen, Haoran, Li, Jianmin, Frintrop, Simone, Hu, Xiaolin

arXiv.org Artificial Intelligence

Video captioning automatically generates short descriptions of video content, usually in the form of a single sentence. Many methods have been proposed for solving this task. A large dataset called MSR Video to Text (MSR-VTT) is often used as the benchmark dataset for testing the performance of these methods. However, we found that the human annotations, i.e., the descriptions of video contents in the dataset, are quite noisy: for example, there are many duplicate captions, and many captions contain grammatical problems. These problems may pose difficulties for video captioning models in learning underlying patterns. We cleaned the MSR-VTT annotations by removing these problems, then tested several typical video captioning models on the cleaned dataset. Experimental results showed that data cleaning boosted the performance of the models as measured by popular quantitative metrics. We also recruited subjects to evaluate the outputs of a model trained on the original and cleaned datasets. This human behavior experiment demonstrated that, when trained on the cleaned dataset, the model generated captions that were more coherent and more relevant to the contents of the video clips.


Stylometry Analysis of Multi-authored Documents for Authorship and Author Style Change Detection

Zamir, Muhammad Tayyab, Ayub, Muhammad Asif, Gul, Asma, Ahmad, Nasir, Ahmad, Kashif

arXiv.org Artificial Intelligence

In recent years, the increasing use of Artificial Intelligence-based text generation tools has posed new challenges in document provenance, authentication, and authorship detection. However, advancements in stylometry have provided opportunities for automatic authorship and author change detection in multi-authored documents using style analysis techniques. Style analysis can serve as a primary step toward document provenance and authentication through authorship detection. This paper investigates three key tasks of style analysis: (i) classification of single- and multi-authored documents, (ii) single change detection, which involves identifying the point where the author switches, and (iii) multiple author-switching detection in multi-authored documents. We formulate all three tasks as classification problems and propose a merit-based fusion framework that integrates several state-of-the-art natural language processing (NLP) algorithms and weight optimization techniques. We also explore the effect of special characters, which are typically removed during pre-processing in NLP applications, on the performance of the proposed methods for these tasks by conducting extensive experiments on both cleaned and raw datasets. Experimental results demonstrate significant improvements over existing solutions for all three tasks on a benchmark dataset.


Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English

Vasantharajan, Charangan, Tharmalingam, Laksika, Thayasivam, Uthayasanker

arXiv.org Artificial Intelligence

Most low-resource languages do not have the necessary resources to create even a substantial monolingual corpus. These languages may often be found in government proceedings, but mainly in Portable Document Format (PDF) files that contain legacy fonts. Extracting text from these documents to create a monolingual corpus is challenging due to legacy font usage and printer-friendly encoding, which are not optimized for text extraction. Therefore, we propose a simple, automatic, and novel approach that can scale to Tamil, Sinhala, and English, and to many documents, along with parallel corpora. Since Tamil and Sinhala are low-resource languages, we improved the performance of Tesseract by employing LSTM-based training on more than 20 legacy fonts to recognize printed characters in these languages. In particular, our model detects code-mixed text, numbers, and special characters in printed documents. This approach reduces the character-level error rate of Tesseract from 6.03 to 2.61 for Tamil (a 3.42-point absolute reduction) and from 7.61 to 4.74 for Sinhala (a 2.87-point absolute reduction), as well as the word-level error rate from 39.68 to 20.61 for Tamil (a 19.07-point absolute reduction) and from 35.04 to 26.58 for Sinhala (an 8.46-point absolute reduction) on the test set. In addition, our newly created parallel corpus consists of 185.4k, 168.9k, and 181.04k sentences and 2.11M, 2.22M, and 2.33M words in Tamil, Sinhala, and English, respectively. This study shows that fine-tuning Tesseract models on multiple new fonts helps them recognize the text and enhances OCR performance. We have made the newly trained models and the source code for fine-tuning Tesseract freely available.