AITopics

doi: 10.1007/978-3-032-06096-9_13

2506.01444

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
(2 more...)

arXiv.org Artificial IntelligenceMay-23-2024

Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation

Wang, Jiawei, Jiang, Renhe, Yang, Chuang, Wu, Zengqing, Onizuka, Makoto, Shibasaki, Ryosuke, Koshizuka, Noboru, Xiao, Chuan

This paper introduces a novel approach using Large Language Models (LLMs) integrated into an agent framework for flexible and effective personal mobility generation. LLMs overcome the limitations of previous models by effectively processing semantic data and offering versatility in modeling various tasks. Our approach addresses three research questions: aligning LLMs with real-world urban mobility data, developing reliable activity generation strategies, and exploring LLM applications in urban mobility. The key technical contribution is a novel LLM agent framework that accounts for individual activity patterns and motivations, including a self-consistency approach to align LLMs with real-world activity data and a retrieval-augmented strategy for interpretable activity generation. We evaluate our LLM agent framework and compare it with state-of-the-art personal mobility generation approaches, demonstrating the effectiveness of our approach and its potential applications in urban mobility. Overall, this study marks the pioneering work of designing an LLM agent framework for activity generation based on real-world human activity data, offering a promising tool for urban mobility analysis.

llm agent, motivation, trajectory, (13 more...)

2402.14744

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Retail (1.00)
Consumer Products & Services > Restaurants (1.00)
Transportation (0.69)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceDec-29-2022

HUSP-SP: Faster Utility Mining on Sequence Data

Zhang, Chunkai, Yang, Yuting, Du, Zilin, Gan, Wensheng, Yu, Philip S.

High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM problem. Several algorithms have been proposed for addressing this problem, but they still cost a lot in terms of running time and memory usage. In this paper, to further solve this problem efficiently, we design a compact structure called sequence projection (seqPro) and propose an efficient algorithm, namely discovering high-utility sequential patterns with the seqPro structure (HUSP-SP). HUSP-SP utilizes the compact seq-array to store the necessary information in a sequence database. The seqPro structure is designed to efficiently calculate candidate patterns' utilities and upper bound values. Furthermore, a new upper bound on utility, namely tighter reduced sequence utility (TRSU) and two pruning strategies in search space, are utilized to improve the mining performance of HUSP-SP. Experimental results on both synthetic and real-life datasets show that HUSP-SP can significantly outperform the state-of-the-art algorithms in terms of running time, memory usage, search space pruning efficiency, and scalability.

data mining, machine learning, pattern recognition, (20 more...)

2212.14255

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Heilongjiang Province > Harbin (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.46)

Huang, Gengsen, Gan, Wensheng, Yu, Philip S.

Towards Sequence Utility Maximization under Utility Occupancy Measure

arXiv.org Artificial IntelligenceDec-20-2022

The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In the database, although utility is a flexible criterion for each pattern, it is a more absolute criterion due to the neglect of utility sharing. This leads to the derived patterns only exploring partial and local knowledge from a database. Utility occupancy is a recently proposed model that considers the problem of mining with high utility but low occupancy. However, existing studies are concentrated on itemsets that do not reveal the temporal relationship of object occurrences. Therefore, this paper towards sequence utility maximization. We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining (HUOSPM). Three dimensions, including frequency, utility, and occupancy, are comprehensively evaluated in HUOSPM. An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed. Furthermore, two data structures for storing related information about a pattern, Utility-Occupancy-List-Chain (UOL-Chain) and Utility-Occupancy-Table (UO-Table) with six associated upper bounds, are designed to improve efficiency. Empirical experiments are carried out to evaluate the novel algorithm's efficiency and effectiveness. The influence of different upper bounds and pruning strategies is analyzed and discussed. The comprehensive results suggest that the work of our algorithm is intelligent and effective.

data mining, machine learning, pattern recognition, (17 more...)

2212.10452

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > Vietnam > Hải Dương Province > Hải Dương (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

arXiv.org Artificial IntelligenceFeb-8-2022

OPP-Miner: Order-preserving sequential pattern mining

Wu, Youxi, Hu, Qian, Li, Yan, Guo, Lei, Zhu, Xingquan, Wu, Xindong

A time series is a collection of measurements in chronological order. Discovering patterns from time series is useful in many domains, such as stock analysis, disease detection, and weather forecast. To discover patterns, existing methods often convert time series data into another form, such as nominal/symbolic format, to reduce dimensionality, which inevitably deviates the data values. Moreover, existing methods mainly neglect the order relationships between time series values. To tackle these issues, inspired by order-preserving matching, this paper proposes an Order-Preserving sequential Pattern (OPP) mining method, which represents patterns based on the order relationships of the time series data. An inherent advantage of such representation is that the trend of a time series can be represented by the relative order of the values underneath the time series data. To obtain frequent trends in time series, we propose the OPP-Miner algorithm to mine patterns with the same trend (sub-sequences with the same relative order). OPP-Miner employs the filtration and verification strategies to calculate the support and uses pattern fusion strategy to generate candidate patterns. To compress the result set, we also study finding the maximal OPPs. Experiments validate that OPP-Miner is not only efficient and scalable but can also discover similar sub-sequences in time series. In addition, case studies show that our algorithms have high utility in analyzing the COVID-19 epidemic by identifying critical trends and improve the clustering performance.

algorithm, candidate pattern, time sery, (15 more...)

2202.0314

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Beijing > Beijing (0.04)
South America > Brazil (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.89)
Health & Medicine > Therapeutic Area > Immunology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Information Management (0.93)

Raza, Atif, Kramer, Stefan

Pattern Sampling for Shapelet-based Time Series Classification

arXiv.org Machine LearningFeb-16-2021

Subsequence-based time series classification algorithms provide accurate and interpretable models, but training these models is extremely computation intensive. The asymptotic time complexity of subsequence-based algorithms remains a higher-order polynomial, because these algorithms are based on exhaustive search for highly discriminative subsequences. Pattern sampling has been proposed as an effective alternative to mitigate the pattern explosion phenomenon. Therefore, we employ pattern sampling to extract discriminative features from discretized time series data. A weighted trie is created based on the discretized time series data to sample highly discriminative patterns. These sampled patterns are used to identify the shapelets which are used to transform the time series classification problem into a feature-based classification problem. Finally, a classification model can be trained using any off-the-shelf algorithm. Creating a pattern sampler requires a small number of patterns to be evaluated compared to an exhaustive search as employed by previous approaches. Compared to previously proposed algorithms, our approach requires considerably less computational and memory resources. Experiments demonstrate how the proposed approach fares in terms of classification accuracy and runtime performance.

algorithm, artificial intelligence, machine learning, (14 more...)

2102.08498

Country: Europe > Germany (0.14)

Genre: Research Report (0.64)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.46)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.46)
Energy > Oil & Gas > Midstream (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Pathak, Shivam, He, Mingyi, Malinchik, Sergey, Sobolevsky, Stanislav

Pattern Ensembling for Spatial Trajectory Reconstruction

arXiv.org Machine LearningJan-24-2021

Digital sensing provides an unprecedented opportunity to assess and understand mobility. However, incompleteness, missing information, possible inaccuracies, and temporal heterogeneity in the geolocation data can undermine its applicability. As mobility patterns are often repeated, we propose a method to use similar trajectory patterns from the local vicinity and probabilistically ensemble them to robustly reconstruct missing or unreliable observations. We evaluate the proposed approach in comparison with traditional functional trajectory interpolation using a case of sea vessel trajectory data provided by The Automatic Identification System (AIS). By effectively leveraging the similarities in real-world trajectories, our pattern ensembling method helps to reconstruct missing trajectory segments of extended length and complex geometry. It can be used for locating mobile objects when temporary unobserved as well as for creating an evenly sampled trajectory interpolation useful for further trajectory mining.

candidate pattern, stanislav sobolevsky, trajectory, (16 more...)

2101.09844

Country:

North America > United States > Iowa (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Camden County > Cherry Hill (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Transportation (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.94)

Asadi, Behzad, Varadharajan, Vijay

Towards a Robust Classifier: An MDL-Based Method for Generating Adversarial Examples

arXiv.org Machine LearningDec-10-2019

We address the problem of adversarial examples in machine learning where an adversary tries to misguide a classifier by making functionality-preserving modifications to original samples. We assume a black-box scenario where the adversary has access to only the feature set, and the final hard-decision output of the classifier. We propose a method to generate adversarial examples using the minimum description length (MDL) principle. Our final aim is to improve the robustness of the classifier by considering generated examples in rebuilding the classifier. We evaluate our method for the application of static malware detection in portable executable (PE) files. We consider API calls of PE files as their distinguishing features where the feature vector is a binary vector representing the presence-absence of API calls. In our method, we first create a dataset of benign samples by querying the target classifier. We next construct a code table of frequent patterns for the compression of this dataset using the MDL principle. We finally generate an adversarial example corresponding to a malware sample by selecting and adding a pattern from the benign code table to the malware sample. The selected pattern is the one that minimizes the length of the compressed adversarial example given the code table. This modification preserves the functionalities of the original malware sample as all original API calls are kept, and only some new API calls are added. Considering a neural network, we show that the evasion rate is 78.24 percent for adversarial examples compared to 8.16 percent for original malware samples. This shows the effectiveness of our method in generating examples that need to be considered in rebuilding the classifier.

adversarial example, classifier, dataset, (16 more...)

1912.05945

Country:

Europe > France (0.04)
Oceania > Australia (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(5 more...)

Genre: Research Report (0.42)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.91)

Asadi, Behzad, Varadharajan, Vijay

An MDL-Based Classifier for Transactional Datasets with Application in Malware Detection

arXiv.org Machine LearningOct-8-2019

We design a classifier for transactional datasets with application in malware detection. We build the classifier based on the minimum description length (MDL) principle. This involves selecting a model that best compresses the training dataset for each class considering the MDL criterion. To select a model for a dataset, we first use clustering followed by closed frequent pattern mining to extract a subset of closed frequent patterns (CFPs). We show that this method acts as a pattern summarization method to avoid pattern explosion; this is done by giving priority to longer CFPs, and without requiring to extract all CFPs. We then use the MDL criterion to further summarize extracted patterns, and construct a code table of patterns. This code table is considered as the selected model for the compression of the dataset. We evaluate our classifier for the problem of static malware detection in portable executable (PE) files. We consider API calls of PE files as their distinguishing features. The presence-absence of API calls forms a transactional dataset. Using our proposed method, we construct two code tables, one for the benign training dataset, and one for the malware training dataset. Our dataset consists of 19696 benign, and 19696 malware samples, each a binary sequence of size 22761. We compare our classifier with deep neural networks providing us with the state-of-the-art performance. The comparison shows that our classifier performs very close to deep neural networks. We also discuss that our classifier is an interpretable classifier. This provides the motivation to use this type of classifiers where some degree of explanation is required as to why a sample is classified under one class rather than the other class.

algorithm, code table, dataset, (15 more...)

1910.03751

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia (0.04)
North America > United States > Missouri > Jackson County > Kansas City (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

AAAI ConferencesFeb-14-2017

Deterministic versus Probabilistic Methods for Searching for an Evasive Target

Bernardini, Sara (Royal Holloway University of London) | Fox, Maria (King's College London) | Long, Derek (King's College London) | Piacentini, Chiara (University of Toronto)

Several advanced applications of autonomous aerial vehicles in civilian and military contexts involve a searching agent with imperfect sensors that seeks to locate a mobile target in a given region. Effectively managing uncertainty is key to solving the related search problem, which is why all methods devised so far hinge on a probabilistic formulation of the problem and solve it through branch-and-bound algorithms, Bayesian filtering or POMDP solvers. In this paper, we consider a class of hard search tasks involving a target that exhibits an intentional evasive behaviour and moves over a large geographical area, i.e., a target that is particularly difficult to track down and uncertain to locate. We show that, even for such a complex problem, it is advantageous to compile its probabilistic structure into a deterministic model and use standard deterministic solvers to find solutions. In particular, we formulate the search problem for our uncooperative target both as a deterministic automated planning task and as a constraint programming task and show that in both cases our solution outperforms POMDPs methods.