AITopics

2411.01608

Country:

Europe (0.68)
Asia > China (0.28)
North America > United States (0.28)
South America > Brazil (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Transportation > Infrastructure & Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Online and Offline Evaluations of Collaborative Filtering and Content Based Recommender Systems

Elahi, Ali, Zirak, Armin

Recommender systems are widely used AI applications designed to help users efficiently discover relevant items. The effectiveness of such systems is tied to the satisfaction of both users and providers. However, user satisfaction is complex and cannot be easily framed mathematically using information retrieval and accuracy metrics. While many studies evaluate accuracy through offline tests, a growing number of researchers argue that online evaluation methods such as A/B testing are better suited for this purpose. We have employed a variety of algorithms on datasets that differ in size and subject, producing recommendations on various platforms, including media streaming services, digital publishing websites, e-commerce systems, and news broadcasting networks. Notably, our target websites and datasets are in the Persian (Farsi) language. This study provides a comparative analysis of a large-scale recommender system that has been operating for the past year across about 70 websites in Iran, processing roughly 300 requests per second collectively. The system employs user-based and item-based recommendations using content-based, collaborative filtering, trend-based, and hybrid approaches. Through both offline and online evaluations, we aim to identify where these algorithms perform most efficiently and determine the best method for our specific needs, considering the dataset and system scale. Our methods of evaluation include manual evaluation, offline tests covering accuracy and ranking metrics such as hit-rate@k and nDCG, and online tests based on click-through rate (CTR). Additionally, we analyzed and proposed methods to address cold-start and popularity bias.
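The offline ranking metrics named in this abstract, hit-rate@k and nDCG, can be illustrated with a minimal sketch. It assumes binary relevance (an item is either relevant or not), and the function names and toy data are hypothetical, not taken from the paper.

```python
import math

def hit_rate_at_k(recommended, relevant, k):
    """Return 1.0 if any of the top-k recommended items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # A relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: a ranked list and the set of items the user engaged with.
recs = ["a", "b", "c", "d"]
clicked = {"c", "x"}
print(hit_rate_at_k(recs, clicked, 2), hit_rate_at_k(recs, clicked, 3))
print(round(ndcg_at_k(recs, clicked, 3), 4))
```

In practice these per-user values would be averaged over all test users; the averaging step is omitted here.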

algorithm, artificial intelligence, social media, (11 more...)

2411.01354

Country:

Asia > Middle East > Iran (0.24)
North America > United States > New York > New York County > New York City (0.17)
North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
(8 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Media > Music (0.48)
Information Technology > Services (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Freitag, Raquel Meister Ko

Diversidade linguística e inclusão digital: desafios para uma IA brasileira

Linguistic diversity is a human attribute which, with the advance of generative AIs, is coming under threat. This paper, based on the contributions of sociolinguistics, examines the consequences of the variety selection bias imposed by technological applications and the vicious circle of preserving a variety that becomes dominant and standardized because it has linguistic documentation to feed the large language models for machine learning.

artificial intelligence, large language model, natural language, (14 more...)

2411.01259

Country:

North America > United States (0.06)
South America > Brazil > São Paulo > Campinas (0.04)
Europe > Portugal (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping

Liu, Chanjuan, Cong, Jinmiao, Chen, Bingcai, Jin, Yaochu, Zhu, Enqiang

Multi-agent hierarchical reinforcement learning (MAHRL) has been studied as an effective means to solve intelligent decision problems in complex and large-scale environments. However, most current MAHRL algorithms follow the traditional way of using reward functions in reinforcement learning, which limits their use to a single task. This study aims to design a multi-agent cooperative algorithm with logic reward shaping (LRS), which uses a more flexible way of setting the rewards, allowing for the effective completion of multi-tasks. LRS uses Linear Temporal Logic (LTL) to express the internal logic relation of subtasks within a complex task. Then, it evaluates whether the subformulae of the LTL expressions are satisfied based on a designed reward structure. This helps agents to learn to effectively complete tasks by adhering to the LTL expressions, thus enhancing the interpretability and credibility of their decisions. To enhance coordination and cooperation among multiple agents, a value iteration technique is designed to evaluate the actions taken by each agent. Based on this evaluation, a reward function is shaped for coordination, which enables each agent to evaluate its status and complete the remaining subtasks through experiential learning. Experiments have been conducted on various types of tasks in the Minecraft-like environment. The results demonstrate that the proposed algorithm can improve the performance of multi-agents when learning to complete multi-tasks.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2411.01184

Country:

Asia > China > Liaoning Province > Dalian (0.05)
Asia > China > Guangdong Province > Guangzhou (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(4 more...)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Games > Computer Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

de Gois, Túlio Sousa, Freitas, Flávia Oliveira, Tejada, Julian, Freitag, Raquel Meister Ko.

NLP and Education: using semantic similarity to evaluate filled gaps in a large-scale Cloze test in the classroom

Since the middle of the last century, the Cloze test has been used for educational purposes to assess proficiency in understanding texts in different languages Taylor [1953], Brown [1980, 2002]. The task consists of systematically filling in gaps in a text, specifically a prose selection Bickley et al. [1970], previously adapted to the participant's realities, and the scores of correct answers are associated with the participant's degree of comprehension of the text. Different measures, such as exact answer, acceptable answer Brown [1980], multiple choice, and Clozentropy Darnell [1968], Lowry and Marr [1975], have been used to assess gap-filling since Taylor's initial proposal Taylor [1953]. These measures will be further examined in Section 2. The exact answer may seem easier to calculate, especially for a Cloze test applied to large and heterogeneous groups of students with insufficient time for teachers to analyze each answer individually. In Brazil, for instance, teachers usually have to manage numerous classes, and this correction method helps to provide rapid feedback on students' reading proficiency, allowing answers to be checked objectively Cunha and Santos [2010], without alternative options.

artificial intelligence, machine learning, natural language, (21 more...)

2411.01280

Country:

South America > Brazil > Sergipe (0.05)
North America > United States (0.04)

Genre: Research Report (0.83)

Industry: Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)

TODO: Enhancing LLM Alignment with Ternary Preferences

Guo, Yuxiang, Yin, Lu, Jiang, Bo, Zhang, Jiaqi

Aligning large language models (LLMs) with human intent is critical for enhancing their performance across a variety of tasks. Standard alignment techniques, such as Direct Preference Optimization (DPO), often rely on the binary Bradley-Terry (BT) model, which can struggle to capture the complexities of human preferences -- particularly in the presence of noisy or inconsistent labels and frequent ties. To address these limitations, we introduce the Tie-rank Oriented Bradley-Terry model (TOBT), an extension of the BT model that explicitly incorporates ties, enabling more nuanced preference representation. Building on this, we propose Tie-rank Oriented Direct Preference Optimization (TODO), a novel alignment algorithm that leverages TOBT's ternary ranking system to improve preference alignment. In evaluations on Mistral-7B and Llama 3-8B models, TODO consistently outperforms DPO in modeling preferences across both in-distribution and out-of-distribution datasets. Additional assessments using MT Bench and benchmarks such as Piqa, ARC-c, and MMLU further demonstrate TODO's superior alignment performance. Notably, TODO also shows strong results in binary preference alignment, highlighting its versatility and potential for broader integration into LLM alignment. The implementation details can be found in https://github.com/XXares/TODO.
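To make the idea of a tie-aware preference model concrete, here is a minimal sketch of one classical way to extend Bradley-Terry with ties, the Rao-Kupper model. This is an assumption chosen for illustration: it is not claimed to be the exact TOBT formulation, and the function name, the `theta` default, and the use of scalar rewards are hypothetical.

```python
import math

def tie_aware_probs(reward_a, reward_b, theta=1.5):
    """Return (P(a preferred), P(b preferred), P(tie)) under the Rao-Kupper model.

    theta >= 1 controls how likely ties are; theta == 1 recovers the
    binary Bradley-Terry model with zero tie probability.
    """
    if theta < 1.0:
        raise ValueError("theta must be >= 1")
    # Subtract the max before exponentiating for numerical stability;
    # the probabilities depend only on the reward difference.
    m = max(reward_a, reward_b)
    pa, pb = math.exp(reward_a - m), math.exp(reward_b - m)
    denom_a = pa + theta * pb
    denom_b = pb + theta * pa
    p_a = pa / denom_a
    p_b = pb / denom_b
    p_tie = (theta ** 2 - 1.0) * pa * pb / (denom_a * denom_b)
    return p_a, p_b, p_tie

# With equal rewards and theta = 2, each of the three outcomes has probability 1/3.
print(tie_aware_probs(0.0, 0.0, theta=2.0))
```

A ternary preference loss could then take the negative log of whichever of the three probabilities matches the observed label (win, loss, or tie).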

large language model, machine learning, natural language, (20 more...)

2411.02442

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Wu, Benlong, Chen, Guoqiang, Chen, Kejiang, Shang, Xiuwei, Han, Jiapeng, He, Yanru, Zhang, Weiming, Yu, Nenghai

Penetration testing is essential to ensuring Web security: it can detect and fix vulnerabilities in advance and prevent data leakage and other serious consequences. Large language models (LLMs), with their powerful inference capabilities, have driven significant progress in various fields, and LLM-based agents have the potential to revolutionize the cybersecurity penetration testing industry. In this work, we establish a comprehensive end-to-end penetration testing benchmark using a real-world penetration testing environment to explore the capabilities of LLM-based agents in this domain. Our results reveal that the agents are familiar with the framework of penetration testing tasks, but they still face limitations in generating accurate commands and executing complete processes. Accordingly, we summarize the current challenges, including the difficulty of maintaining the entire message history and the tendency for the agent to become stuck. Based on the above insights, we propose a Penetration testing State Machine (PSM) that utilizes the Finite State Machine (FSM) methodology to address these limitations. Then, we introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs, which utilizes the inherent inference ability of LLMs and the constraint framework of state machines. Our evaluation results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model and improves the task completion rate from 22% to 41% on the benchmark target. Compared with the baseline framework and manual work, AutoPT also reduces time and economic costs further. Hence, our AutoPT has facilitated the development of automated penetration testing and significantly impacted both academia and industry.

large language model, machine learning, natural language, (18 more...)

2411.01236

Country:

Asia > China > Anhui Province > Hefei (0.04)
South America (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
(7 more...)

Genre: Research Report > New Finding (0.86)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kirch, Nathalie Maria, Field, Severin, Casper, Stephen

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

While 'jailbreaks' have been central to research on the safety and reliability of LLMs (large language models), the underlying mechanisms behind these attacks are not well understood. Some prior works have used linear methods to analyze jailbreak prompts or model refusal. Here, however, we compare linear and nonlinear methods to study the features in prompts that contribute to successful jailbreaks. We do this by probing for jailbreak success based only on the portions of the latent representations corresponding to prompt tokens. First, we introduce a dataset of 10,800 jailbreak attempts from 35 attack methods. We then show that different jailbreaking methods work via different nonlinear features in prompts. Specifically, we find that while probes can distinguish between successful and unsuccessful jailbreaking prompts with a high degree of accuracy, they often transfer poorly to held-out attack methods. We also show that nonlinear probes can be used to mechanistically jailbreak the LLM by guiding the design of adversarial latent perturbations. These mechanistic jailbreaks are able to jailbreak Gemma-7B-IT more reliably than 34 of the 35 techniques that it was trained on. Ultimately, our results suggest that jailbreaks cannot be thoroughly understood in terms of universal or linear prompt features alone.

information, shake and bake method, synthesize methamphetamine, (12 more...)

2411.03343

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > New York > Richmond County > New York City (0.04)
North America > United States > New York > Queens County > New York City (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground (1.00)
Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(15 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Gonçalves, Francisco M. F. R., Bena, Ryan M., Pérez-Arancibia, Néstor O.

Closed-Loop Stability of a Lyapunov-Based Switching Attitude Controller for Energy-Efficient Torque-Input-Selection During Flight

arXiv.org Artificial Intelligence, Nov-1-2024

We present a new Lyapunov-based switching attitude controller for energy-efficient real-time selection of the torque inputted to an uncrewed aerial vehicle (UAV) during flight. The proposed method, using quaternions to describe the attitude of the controlled UAV, interchanges the stability properties of the two fixed points (one locally asymptotically stable and another unstable) of the resulting closed-loop (CL) switching dynamics of the system. In this approach, the switching events are triggered by the value of a compound energy-based function. To analyze and ensure the stability of the CL switching dynamics, we use classical nonlinear Lyapunov techniques, in combination with switching-systems theory. For this purpose, we introduce a new compound Lyapunov function (LF) that not only enables us to derive the conditions for CL asymptotic and exponential stability, but also provides us with an estimate of the CL system's region of attraction. This new estimate is considerably larger than those previously reported for systems of the type considered in this paper. To test and demonstrate the functionality, suitability, and performance of the proposed method, we present and discuss experimental data obtained using a 31-g quadrotor during the execution of high-speed yaw-tracking maneuvers. Also, we provide empirical evidence indicating that all the initial conditions chosen for these maneuvers, as estimated, lie inside the system's region of attraction. Last, experimental data obtained through these flight tests show that the proposed switching controller reduces the control effort by about 53%, on average, with respect to that corresponding to a commonly used benchmark control scheme, when executing a particular type of high-speed yaw-tracking maneuvers.

artificial intelligence, controller, initial condition, (17 more...)

2411.00417

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Galicia > Madrid (0.04)
South America > Brazil > Minas Gerais > Belo Horizonte (0.04)
(8 more...)

Genre: Research Report (0.40)

Industry:

Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.61)
Transportation > Air (0.50)
Transportation > Infrastructure & Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

arXiv.org Machine Learning, Nov-1-2024

Zipfian Whitening

Yokoi, Sho, Bao, Han, Kurita, Hiroto, Shimodaira, Hidetoshi

The word embedding space in neural models is skewed, and correcting this can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequency that follows Zipf's law significantly improves task performance, surpassing established baselines. From a theoretical perspective, both our approach and existing methods can be clearly categorized: word representations are distributed according to an exponential family with either uniform or Zipfian base measures. By adopting the latter approach, we can naturally emphasize informative low-frequency words in terms of their vector norm, which becomes evident from the information-geometric perspective, and in terms of the loss functions for imbalanced classification. Additionally, our theory corroborates that popular natural language processing methods, such as skip-gram negative sampling, WhiteningBERT, and headless language models, work well just because their word embeddings encode the empirical word frequency into the underlying probabilistic model.
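The core operation described in this abstract, PCA whitening weighted by empirical word frequency, can be sketched as follows. This is a minimal illustration assuming NumPy is available, an embedding matrix with one row per word, and a full-rank weighted covariance; the function name `zipfian_whiten`, the `eps` guard, and the synthetic data are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def zipfian_whiten(embeddings, freqs, eps=1e-12):
    """Whiten word embeddings with mean and covariance weighted by word frequency.

    embeddings: array of shape (n_words, dim)
    freqs: non-negative word frequencies or counts, shape (n_words,)
    """
    p = np.asarray(freqs, dtype=float)
    p = p / p.sum()                      # empirical word probabilities
    mu = p @ embeddings                  # frequency-weighted mean vector
    centered = embeddings - mu
    # Frequency-weighted covariance: sum_w p(w) * x_w x_w^T
    cov = (centered * p[:, None]).T @ centered
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal direction (column) to unit weighted variance.
    transform = eigvecs / np.sqrt(eigvals + eps)
    return centered @ transform

# Synthetic example: 50 random 4-d "embeddings" with Zipf-like frequencies 1/rank.
rng = np.random.default_rng(0)
E = rng.normal(size=(50, 4))
zipf = 1.0 / np.arange(1, 51)
Z = zipfian_whiten(E, zipf)
```

Replacing `freqs` with a constant vector gives ordinary (uniform) PCA whitening, which is the implicit assumption the abstract argues against.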

frequency, machine learning, natural language, (21 more...)

2411.00680

Country:

Asia > Japan > Honshū > Tōhoku (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > Dominican Republic (0.04)
(14 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)