AITopics

2504.00042

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Financial News (1.00)
Research Report > Experimental Study (0.93)

Industry:

Banking & Finance > Trading (1.00)
Law (0.94)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lourenço, Vítor N., Silva, Gabriela G., Fernandes, Leandro A. F.

Hierarchy-of-Visual-Words: a Learning-based Approach for Trademark Image Retrieval

arXiv.org Artificial Intelligence, Jul-30-2025

From the background, the procedure extracts the holes' shapes and associates them with the list of component shapes (lines 7 and 8). The foreground shapes are used in the next iterations (lines 5 and 9) until all component shapes have been extracted from the initial binary trademark image. Shape feature extraction consists of building a feature vector for each component shape of a given trademark image (Figs. 1 (d) and (k)). These 29-dimensional feature vectors combine region-based and contour-based descriptors. The shape's region is described by the 25 Zernike moments (ZM) of order $p$ from 0 to 8:

$$Z_{p,q} = \frac{p+1}{\pi} \sum_{\rho} \sum_{\theta} V_{p,q}(\rho,\theta)\, I(\rho,\theta), \qquad (1)$$

where $\rho = \sqrt{x^2 + y^2}$ is the length of the vector from the origin to pixel $(x, y)$, $\theta$ is the angle between that vector and the $x$-axis in the counterclockwise direction, and $V_{p,q}(\rho,\theta)$ is a Zernike polynomial of order $p$ with repetition $q$ that forms a complete set over the interior of the unit disk inscribing the component shape:

$$V_{p,q}(\rho,\theta) = R_{p,q}(\rho) \exp(i q \theta).$$
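The Zernike moments of Eq. (1) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the excerpt does not state: the closed form used for the radial polynomial $R_{p,q}$ is the common textbook one, the image is a square list of lists mapped onto $[-1, 1]^2$, and the complex conjugate of $V_{p,q}$ is used as in the usual definition (for a real-valued image this does not change the magnitude of the moment). The function names are hypothetical.

```python
import cmath
import math


def radial_poly(p, q, rho):
    """Textbook radial polynomial R_{p,q}(rho); requires p - |q| even and |q| <= p."""
    q = abs(q)
    total = 0.0
    for s in range((p - q) // 2 + 1):
        num = (-1) ** s * math.factorial(p - s)
        den = (math.factorial(s)
               * math.factorial((p + q) // 2 - s)
               * math.factorial((p - q) // 2 - s))
        total += num / den * rho ** (p - 2 * s)
    return total


def zernike_moment(image, p, q):
    """Discrete Zernike moment of a square image (list of rows, at least 2x2)."""
    n = len(image)
    acc = 0j
    for row in range(n):
        for col in range(n):
            if not image[row][col]:
                continue  # background pixels contribute nothing
            # Map pixel centres onto [-1, 1] x [-1, 1] (assumed convention).
            x = (2 * col - n + 1) / (n - 1)
            y = (n - 1 - 2 * row) / (n - 1)
            rho = math.hypot(x, y)
            if rho > 1.0:
                continue  # outside the unit disk
            theta = math.atan2(y, x)
            v = radial_poly(p, q, rho) * cmath.exp(1j * q * theta)
            acc += v.conjugate() * image[row][col]
    return (p + 1) / math.pi * acc


def zernike_indices(max_order=8):
    """All (p, q) with 0 <= q <= p <= max_order and p - q even."""
    return [(p, q) for p in range(max_order + 1)
            for q in range(p + 1) if (p - q) % 2 == 0]


# Region descriptor: magnitudes of all moments up to order 8.
shape = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
descriptor = [abs(zernike_moment(shape, p, q)) for p, q in zernike_indices(8)]
```

Counting the index pairs with $q \ge 0$ gives 25 entries, which matches the number of moments quoted above. Taking magnitudes is a common way to obtain rotation-insensitive features; whether the paper does exactly this is not stated in the excerpt.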

machine learning, pattern recognition, trademark image, (21 more...)

doi: 10.5753/sibgrapi.2019.9803

1908.02786

Country: South America > Brazil > Rio de Janeiro (0.28)

Genre: Research Report (1.00)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.93)

Mohammadi, Hadi, Meijer, Yasmeen F. S. S., Papadopoulou, Efthymia, Bagheri, Ayoub

Do Large Language Models Understand Morality Across Cultures?

arXiv.org Artificial Intelligence, Jul-30-2025

Recent advancements in large language models (LLMs) have established them as powerful tools across numerous domains. However, persistent concerns about embedded biases, such as gender, racial, and cultural biases arising from their training data, raise significant questions about the ethical use and societal consequences of these technologies. This study investigates the extent to which LLMs capture cross-cultural differences and similarities in moral perspectives. Specifically, we examine whether LLM outputs align with patterns observed in international survey data on moral attitudes. To this end, we employ three complementary methods: (1) comparing variances in moral scores produced by models versus those reported in surveys, (2) conducting cluster alignment analyses to assess correspondence between country groupings derived from LLM outputs and survey data, and (3) directly probing models with comparative prompts using systematically chosen token pairs. Our results reveal that current LLMs often fail to reproduce the full spectrum of cross-cultural moral variation, tending to compress differences and exhibit low alignment with empirical survey patterns. These findings highlight a pressing need for more robust approaches to mitigate biases and improve cultural representativeness in LLMs. We conclude by discussing the implications for the responsible development and global deployment of LLMs, emphasizing fairness and ethical alignment.

large language model, machine learning, natural language, (18 more...)

2507.21319

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (0.69)
Law > Civil Rights & Constitutional Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

AIHub, Jul-29-2025, 10:04:45 GMT

Open-source Swiss language model to be released this summer

This summer, EPFL and ETH Zurich will release a large language model (LLM) developed on public infrastructure. Trained on the "Alps" supercomputer at the Swiss National Supercomputing Centre (CSCS), the new LLM marks a milestone in open-source AI and multilingual excellence. Earlier this month in Geneva, around 50 leading global initiatives and organisations dedicated to open-source LLMs and trustworthy AI convened at the International Open-Source LLM Builders Summit. Hosted by the AI centres of EPFL and ETH Zurich, the event marked a significant step in building a vibrant and collaborative international ecosystem for open foundation models. Open LLMs are increasingly viewed as credible alternatives to commercial systems, most of which are developed behind closed doors in the United States or China.

collaboration, large language model, natural language, (14 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.49)
North America > United States (0.25)
Asia > China (0.25)
Europe > Finland > Kainuu > Kajaani (0.05)

Industry:

Law (0.50)
Information Technology (0.31)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Ramezani, Niloofar, Slawski, Martin

Lasso Penalization for High-Dimensional Beta Regression Models: Computation, Analysis, and Inference

arXiv.org Machine Learning, Jul-29-2025

Beta regression is commonly employed when the outcome variable is a proportion. Since its conception, the approach has been widely used in applications spanning various scientific fields. A series of extensions have been proposed over time, several of which address variable selection and penalized estimation, e.g., with an $\ell_1$-penalty (LASSO). However, a theoretical analysis of this popular approach in the context of Beta regression with high-dimensional predictors is lacking. In this paper, we aim to close this gap. A particular challenge arises from the non-convexity of the associated negative log-likelihood, which we address by resorting to a framework for analyzing stationary points in a neighborhood of the target parameter. Leveraging this framework, we derive a non-asymptotic bound on the $\ell_1$-error of such stationary points. In addition, we propose a debiasing approach to construct confidence intervals for the regression parameters. A proximal gradient algorithm is devised for optimizing the resulting penalized negative log-likelihood function. Our theoretical analysis is corroborated via simulation studies, and a real data example concerning the prediction of county-level proportions of incarceration is presented to showcase the practical utility of our methodology.
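The proximal gradient scheme mentioned in the abstract can be sketched generically: take a gradient step on the smooth part of the objective, then apply the soft-thresholding operator, which is the proximal map of the $\ell_1$-penalty. The sketch below is an assumption-laden illustration, not the authors' algorithm. It uses a fixed step size, and the smooth loss is a toy quadratic stand-in rather than the Beta negative log-likelihood. All function names are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal map of threshold * |.|: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def proximal_gradient(grad, beta0, step, lam, n_iter=100):
    """Minimise smooth_loss(beta) + lam * ||beta||_1 with a fixed step size.

    grad: function returning the gradient of the smooth loss as a list.
    """
    beta = list(beta0)
    for _ in range(n_iter):
        g = grad(beta)
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return beta


# Toy stand-in loss (NOT the Beta likelihood): 0.5 * ||beta - z||^2.
z = [3.0, -0.5, 1.5]


def toy_grad(beta):
    return [b - zj for b, zj in zip(beta, z)]


beta_hat = proximal_gradient(toy_grad, [0.0, 0.0, 0.0], step=1.0, lam=1.0, n_iter=5)
# For this toy loss the minimiser is soft_threshold(z_j, lam) per coordinate.
```

To apply the same loop to Beta regression, `toy_grad` would be replaced by the gradient of the negative log-likelihood. Because that function is non-convex, as the abstract notes, the iterates can only be expected to reach a stationary point.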

artificial intelligence, machine learning, regression, (18 more...)

2507.20079

Country:

North America > United States > Virginia > Albemarle County > Charlottesville (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > Virginia > Richmond (0.04)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Law (0.93)
Health & Medicine > Health Care Providers & Services (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.83)

Nguyen, Tan-Minh, Nguyen, Hoang-Trung, Dao, Trong-Khoi, Phan, Xuan-Hieu, Nguyen, Ha-Thanh, Vuong, Thi-Hai-Yen

VLQA: The First Comprehensive, Large, and High-Quality Vietnamese Dataset for Legal Question Answering

The advent of large language models (LLMs) has led to significant achievements in various domains, including legal text processing. Leveraging LLMs for legal tasks is a natural evolution and an increasingly compelling choice. However, their capabilities are often portrayed as greater than they truly are. Despite the progress, we are still far from the ultimate goal of fully automating legal tasks using artificial intelligence (AI) and natural language processing (NLP). Moreover, legal systems are deeply domain-specific and exhibit substantial variation across different countries and languages. The need for building legal text processing applications for different natural languages is, therefore, large and urgent. However, there is a big challenge for legal NLP in low-resource languages such as Vietnamese due to the scarcity of resources and annotated data. The need for labeled legal corpora for supervised training, validation, and supervised fine-tuning is critical. In this paper, we introduce the VLQA dataset, a comprehensive and high-quality resource tailored for the Vietnamese legal domain. We also conduct a comprehensive statistical analysis of the dataset and evaluate its effectiveness through experiments with state-of-the-art models on legal information retrieval and question-answering tasks.

2507.19995

Genre: Research Report (0.69)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Korshunov, Pavel, Kotwal, Ketan, Ecabert, Christophe, Vidit, Vidit, Mohammadi, Amir, Marcel, Sebastien

Investigation of Accuracy and Bias in Face Recognition Trained with Synthetic Data

The use of synthetic data to train face recognition (FR) models has gained increasing attention in recent years, primarily due to its potential to avoid the ethical, legal, and licensing challenges associated with using real facial images, especially in the context of privacy regulations such as the GDPR [1]. Synthetic data offers the potential to generate large-scale datasets to train models for commercial use without infringing on individual privacy, hence facilitating the development of safer FR systems. Moreover, it allows more fine-grained control over data generation, which can potentially help mitigate biases in FR systems. Recent work has mainly focused on generating synthetic face datasets that could be used to train FR models with performance approaching that of models trained on real data [1, 3-8]. However, several issues are still missing from the current research discourse on training FR models with synthetic data:

Dual-generator framework: Synthetic face data generation often employs a two-stage process: a seed generator for creating distinct identities and an augmentation generator for producing intra-class variations such as different poses, lighting conditions, and expressions. Although considerable effort is directed toward enhancing the diversity of seed identities [3, 4], the role of augmentation generators in influencing FR performance remains underexplored.

Unfair dataset comparisons: Comparative studies between synthetic and real datasets frequently suffer from inconsistencies in dataset sizes and compositions. A synthetic dataset with 10K identities and 64 images per identity gets compared with another dataset of 50K identities and 20 images per identity, or with WebFace-12M's 1.5M identities and 12M images [2].

artificial intelligence, dataset, machine learning, (15 more...)

2507.20782

Country: Europe > Switzerland (0.14)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Lafargue, Valentin, Monteiro, Adriana Laurindo, Claeys, Emmanuelle, Risser, Laurent, Loubes, Jean-Michel

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

Proving the compliance of AI algorithms has become an important challenge with the growing deployment of such algorithms in real-life applications. Inspecting possible biased behaviors is mandatory to satisfy the regulatory constraints of the EU Artificial Intelligence Act. Regulation-driven audits increasingly rely on global fairness metrics, with Disparate Impact being the most widely used. Yet such global measures depend highly on the distribution of the sample on which they are computed. We first investigate how to manipulate data samples to artificially satisfy fairness criteria, creating minimally perturbed datasets that remain statistically indistinguishable from the original distribution while satisfying prescribed fairness constraints. Then we study how to detect such manipulation. Our analysis (i) introduces mathematically sound methods for modifying empirical distributions under fairness constraints using entropic or optimal transport projections, (ii) examines how an auditee could potentially circumvent fairness inspections, and (iii) offers recommendations to help auditors detect such data manipulations. These results are validated through experiments on classical tabular datasets in bias detection.
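To make the sample dependence of Disparate Impact concrete, here is a small sketch. It assumes the common definition of the metric as the ratio of positive-prediction rates between an unprivileged and a privileged group. The abstract does not spell the definition out, so the formula, the group encoding (0 and 1), and the function names are assumptions. The naive record dropping shown here is only an illustration of the dependence and is not the entropic or optimal transport projection studied in the paper.

```python
def positive_rate(preds, groups, group):
    """Share of positive predictions (1s) within one group."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    if not selected:
        raise ValueError(f"no samples for group {group!r}")
    return sum(selected) / len(selected)


def disparate_impact(preds, groups, unprivileged=0, privileged=1):
    """Ratio of positive rates: unprivileged group over privileged group."""
    priv_rate = positive_rate(preds, groups, privileged)
    if priv_rate == 0:
        raise ValueError("privileged group has no positive predictions")
    return positive_rate(preds, groups, unprivileged) / priv_rate


# Binary predictions and group membership for eight individuals.
preds = [1, 1, 0, 0, 1, 1, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
full_di = disparate_impact(preds, groups)

# Same model, but the audited sample omits two negatively predicted
# members of the unprivileged group (indices 2 and 3).
keep = [i for i in range(len(preds)) if i not in (2, 3)]
sub_di = disparate_impact([preds[i] for i in keep],
                          [groups[i] for i in keep])
```

In this toy sample the metric moves from 0.5 on the full sample to 1.0 on the subsample without any change to the model. Such a crude deletion would be easy to spot, which is why the abstract's emphasis on perturbations that stay statistically indistinguishable from the original distribution matters.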

artificial intelligence, data mining, machine learning, (20 more...)

2507.20708

Country: North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.92)
Information Technology (0.67)
Law Enforcement & Public Safety > Fraud (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Talies, Jesco, Breitbarth, Eric, Melching, David

Towards trustworthy AI in materials mechanics through domain-guided attention

Ensuring the trustworthiness and robustness of deep learning models remains a fundamental challenge, particularly in high-stakes scientific applications. In this study, we present a framework called attention-guided training that combines explainable artificial intelligence techniques with quantitative evaluation and domain-specific priors to guide model attention. We demonstrate that domain-specific feedback on model explanations during training can enhance the model's generalization capabilities. We validate our approach on the task of semantic crack tip segmentation in digital image correlation data, which is a key application in the fracture mechanical characterization of materials. By aligning model attention with physically meaningful stress fields, such as those described by Williams' analytical solution, attention-guided training ensures that the model focuses on physically relevant regions. This ultimately leads to improved generalization and more faithful explanations.

explanation, machine learning, natural language, (19 more...)

2507.20658

Country:

North America > United States (0.93)
Europe (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government (0.93)
Law (0.68)
Health & Medicine (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Before the Outrage: Challenges and Advances in Predicting Online Antisocial Behavior

Ollagnier, Anaïs

Antisocial behavior (ASB) on social media, including hate speech, harassment, and trolling, poses growing challenges for platform safety and societal wellbeing. While prior work has primarily focused on detecting harmful content after it appears, predictive approaches aim to forecast future harmful behaviors (such as hate speech propagation, conversation derailment, or user recidivism) before they fully unfold. Despite increasing interest, the field remains fragmented, lacking a unified taxonomy or clear synthesis of existing methods. This paper presents a systematic review of over 49 studies on ASB prediction, offering a structured taxonomy of five core task types: early harm detection, harm emergence prediction, harm propagation prediction, behavioral risk prediction, and proactive moderation support. We analyze how these tasks differ by temporal framing, prediction granularity, and operational goals. In addition, we examine trends in modeling techniques, from classical machine learning to pre-trained language models, and assess the influence of dataset characteristics on task feasibility and generalization. Our review highlights methodological challenges, such as dataset scarcity, temporal drift, and limited benchmarks, while outlining emerging research directions including multilingual modeling, cross-platform generalization, and human-in-the-loop systems. By organizing the field around a coherent framework, this survey aims to guide future work toward more robust and socially responsible ASB prediction.

data mining, machine learning, natural language, (21 more...)

2507.20614

Country:

Asia (1.00)
North America > United States > California (0.46)
North America > United States > New York (0.28)
Europe > United Kingdom > England (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)