Sarawak
AugAbEx : Way Forward for Extractive Case Summarization
Bindal, Purnima, Kumar, Vikas, Rathore, Sagar, Bhatnagar, Vasudha
Summarization of legal judgments poses a heavy cognitive burden on law practitioners due to the complexity of the language, context-sensitive legal jargon, and the length of the document. Therefore, the automatic summarization of legal documents has attracted serious attention from natural language processing researchers. Since the abstractive summaries of legal documents generated by deep neural methods remain prone to the risk of misrepresenting nuanced legal jargon or overlooking key contextual details, we envisage a rising trend toward the use of extractive case summarizers. Given the high cost of human annotation for gold standard extractive summaries, we engineer a light and transparent pipeline that leverages existing abstractive gold standard summaries to create the corresponding extractive gold standard versions. The approach ensures that the experts' opinions ensconced in the original gold standard abstractive summaries are carried over to the transformed extractive summaries. We aim to augment seven existing case summarization datasets, which include abstractive summaries, by incorporating corresponding extractive summaries, and thereby create an enriched data resource for the case summarization research community. To ensure the quality of the augmented extractive summaries, we perform an extensive comparative evaluation with the original abstractive gold standard summaries covering structural, lexical, and semantic dimensions. We also compare the domain-level information of the two summaries. We commit to releasing the augmented datasets in the public domain for use by the research community and believe that the resource will offer opportunities to advance the field of automatic summarization of legal documents.
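The abstract does not spell out how sentences are selected, so the following is only a minimal sketch of one common way to derive an extractive summary from an abstractive reference: greedily pick the document sentences that most increase unigram overlap with the reference. The function names, the recall-based score, and the `max_sentences` cap are assumptions made for illustration and are not the authors' pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_recall(selected_tokens, reference_tokens):
    """Fraction of reference tokens (with multiplicity) covered by the selection."""
    ref = Counter(reference_tokens)
    sel = Counter(selected_tokens)
    covered = sum(min(count, sel[tok]) for tok, count in ref.items())
    return covered / max(1, sum(ref.values()))


def greedy_extract(doc_sentences, abstractive_summary, max_sentences=3):
    """Greedily add the sentence that most improves overlap with the reference.

    Stops when no remaining sentence improves the score (assumed stopping rule).
    """
    reference = tokenize(abstractive_summary)
    chosen, chosen_tokens, best = [], [], 0.0
    while len(chosen) < max_sentences:
        candidates = []
        for i, sentence in enumerate(doc_sentences):
            if i in chosen:
                continue
            score = overlap_recall(chosen_tokens + tokenize(sentence), reference)
            candidates.append((score, -i, i))  # ties go to the earlier sentence
        if not candidates:
            break
        score, _, idx = max(candidates)
        if score <= best:
            break
        best = score
        chosen.append(idx)
        chosen_tokens += tokenize(doc_sentences[idx])
    # Return the selected sentences in document order.
    return [doc_sentences[i] for i in sorted(chosen)]


doc = [
    "The court dismissed the appeal.",
    "The weather was pleasant that day.",
    "Costs were awarded to the respondent.",
]
summary = "Appeal dismissed; costs awarded to respondent."
extractive = greedy_extract(doc, summary)
```

Because every selected sentence is copied verbatim from the judgment, the result cannot misstate the source text, which is the property the abstract values in extractive summaries.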
- Oceania > Australia (0.14)
- Europe > Ukraine > Sumy Oblast > Sumy (0.04)
- North America > United States > California (0.04)
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hong, Hanbin, Feng, Shuya, Naderloui, Nima, Yan, Shenao, Zhang, Jingyu, Liu, Biying, Arastehfard, Ali, Huang, Heqing, Hong, Yuan
Large Language Models (LLMs) have rapidly transitioned from academic research to core components of real-world applications, especially since the emergence of high-profile foundation models such as OpenAI's GPT series [17, 140], Google Gemini [9], Meta Llama [175, 176], Anthropic Claude [12], Alibaba Qwen [11, 210, 209], and Doubao [172]. Today, LLMs are deployed across an unprecedented range of sectors--from web search and code assistants to legal, educational, and healthcare domains--reaching hundreds of millions of end users globally. The rapid adoption of LLMs has ushered in a new era of AI-powered services, but it also brings serious safety and security risks. These risks manifest in multiple forms, ranging from misinformation and privacy leaks to adversarial attacks that exploit model vulnerabilities. In particular, a growing body of work shows that carefully crafted jailbreak prompts can bypass alignment constraints, inducing models to produce sensitive, illegal, or harmful content. Alarmingly, recent studies report that such attacks achieve success rates exceeding 90% even on flagship models such as GPT-4, Claude 3, and DeepSeek-R1 [124, 42, 154, 118]. The outputs generated through these attacks could be used for malicious purposes, underscoring the urgent need for close attention and mitigation.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Information Technology > Security & Privacy (1.00)
- Education (0.92)
- Government > Military (0.88)
Conditional Generative Adversarial Networks Based Inertial Signal Translation
The paper presents an approach in which inertial signals measured with a wrist-worn sensor (e.g., a smartwatch) are translated into those that would be recorded using a shoe-mounted sensor, enabling the use of state-of-the-art gait analysis methods. In the study, the signals are translated using Conditional Generative Adversarial Networks (GANs). Two different GAN versions are used for experimental verification: traditional ones trained using binary cross-entropy loss and Wasserstein GANs (WGANs). For the generator, two architectures are tested: a convolutional autoencoder and a convolutional U-Net. The experimental results show that the proposed approach allows for an accurate translation, enabling the use of wrist sensor inertial signals for efficient, everyday gait analysis.
Latent Factorization of Tensors with Threshold Distance Weighted Loss for Traffic Data Estimation
Intelligent transportation systems (ITS) rely heavily on complete and high-quality spatiotemporal traffic data to achieve optimal performance. Nevertheless, in real-world traffic data collection processes, issues such as communication failures and sensor malfunctions often lead to incomplete or corrupted datasets, thereby posing significant challenges to the advancement of ITS. Among various methods for imputing missing spatiotemporal traffic data, the latent factorization of tensors (LFT) model has emerged as a widely adopted and effective solution. However, conventional LFT models typically employ the standard L2-norm in their learning objective, which makes them vulnerable to the influence of outliers. To overcome this limitation, this paper proposes a threshold distance weighted (TDW) loss-incorporated Latent Factorization of Tensors (TDW LFT) model. The proposed loss function effectively reduces the model's sensitivity to outliers by assigning differentiated weights to individual samples. Extensive experiments conducted on two traffic speed datasets sourced from diverse urban environments confirm that the proposed TDW LFT model consistently outperforms state-of-the-art approaches in terms of both prediction accuracy and computational efficiency.
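The abstract does not give the form of the TDW loss, so the sketch below substitutes a Huber-style weighting as a stand-in: a residual within a threshold `tau` keeps full weight, and a larger residual is scaled by `tau / |residual|`. It is applied to a rank-`rank` latent factorization of a three-way tensor trained by SGD on the observed entries only. All names, hyperparameters, and the weighting rule itself are assumptions for illustration, not the paper's definition.

```python
import random


def threshold_weight(residual, tau):
    """Assumed weighting: 1 inside the threshold, tau/|residual| beyond it."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= tau else tau / magnitude


def predict(A, B, C, i, j, k):
    """Estimate entry (i, j, k) as the sum over latent dimensions of a*b*c."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def fit_lft(entries, shape, rank=2, tau=1.0, lr=0.05, reg=0.01, epochs=100, seed=0):
    """SGD over observed (i, j, k, value) entries with per-sample weights."""
    rng = random.Random(seed)
    A, B, C = (
        [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n)]
        for n in shape
    )
    for _ in range(epochs):
        for i, j, k, y in entries:
            r = y - predict(A, B, C, i, j, k)
            w = threshold_weight(r, tau)  # outliers contribute a bounded gradient
            for d in range(rank):
                a, b, c = A[i][d], B[j][d], C[k][d]
                A[i][d] += lr * (w * r * b * c - reg * a)
                B[j][d] += lr * (w * r * a * c - reg * b)
                C[k][d] += lr * (w * r * a * b - reg * c)
    return A, B, C


# Hypothetical observed entries: (road segment, day, time slot, speed).
observed = [(0, 0, 0, 1.0), (1, 1, 1, 2.0), (0, 1, 0, 1.5)]
A, B, C = fit_lft(observed, shape=(2, 2, 2), epochs=5)
estimate = predict(A, B, C, 1, 0, 1)  # an unobserved entry
```

With this weighting the gradient magnitude from any single sample is capped, which is one way to obtain the reduced sensitivity to outliers that the abstract attributes to differentiated sample weights.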
- North America > United States > New York (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.05)
- Asia > Singapore (0.04)
- Research Report > Promising Solution (0.66)
- Overview > Innovation (0.48)
- Information Technology > Data Science > Data Mining (0.95)
- Information Technology > Communications > Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
Durghotona GPT: A Web Scraping and Large Language Model Based Framework to Generate Road Accident Dataset Automatically in Bangladesh
Chowdhury, MD Thamed Bin Zaman, Hossain, Moazzem, Islam, Md. Ridwanul
Road accidents pose significant concerns globally. They lead to large financial losses, injuries, disabilities, and societal challenges. Accurate and timely accident data is essential for predicting and mitigating these events. This paper presents a novel framework named 'Durghotona GPT' that integrates web scraping and Large Language Models (LLMs) to automate the generation of comprehensive accident datasets from prominent national dailies in Bangladesh. The authors collected accident reports from three major newspapers: Prothom Alo, Dhaka Tribune, and The Daily Star. The collected news was then processed using the newest available LLMs: GPT-4, GPT-3.5, and Llama-3. The framework efficiently extracts relevant information, categorizes reports, and compiles detailed datasets. Thus, this framework overcomes limitations of manual data collection methods such as delays, errors, and communication gaps. The authors' evaluation demonstrates that Llama-3, an open-source model, performs comparably to GPT-4. It achieved 89% accuracy in the authors' evaluation. Therefore, it can be considered a cost-effective alternative for similar tasks. The results suggest that the framework developed by the authors can drastically enhance the quality and availability of accident data. As a result, it can support critical applications in traffic safety analysis, urban planning, and public health. The authors also developed an interface for 'Durghotona GPT' for ease of use as part of this paper. Future work will focus on expanding data collection methods and refining LLMs to further increase dataset accuracy and applicability.
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.26)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Asia > China (0.04)
- Health & Medicine (0.67)
- Media > News (0.37)
Variational Autoencoder-Based Approach to Latent Feature Analysis on Efficient Representation of Power Load Monitoring Data
With the development of smart grids, High-Dimensional and Incomplete (HDI) Power Load Monitoring (PLM) data challenges the performance of Power Load Forecasting (PLF) models. In this paper, we propose VAE-LF, a latent feature model based on the Variational Autoencoder (VAE), for efficiently representing PLM data and completing its missing values. VAE-LF splits the HDI PLM data into vectors, feeds them sequentially into an Encoder-Decoder structure to learn a low-dimensional latent representation of the data, and generates the missing values. Experiments on the UK-DALE dataset show that VAE-LF outperforms other benchmark models in both the 5% and 10% sparsity test cases, with significantly lower RMSE and MAE, and performs especially well on data with a low sparsity ratio. The method provides an efficient data-completion solution for electric load management in smart grids.
- Asia > Singapore (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
Extracting Abstraction Dimensions by Identifying Syntax Pattern from Texts
Zhou, Jian, Li, Jiazheng, Zhuge, Sirui, Zhuge, Hai
This paper proposes an approach to automatically discovering the subject dimension, action dimension, object dimension and adverbial dimension from texts in order to operate on texts efficiently and support querying in natural language. The high quality of the trees guarantees that all subjects, actions, objects and adverbials and their subclass relations within texts can be represented. The independence of the trees ensures that there is no redundant representation between trees. The expressiveness of the trees ensures that the majority of sentences can be accessed from each tree and the remaining sentences can be accessed from at least one tree, so that the tree-based search mechanism can support querying in natural language. Experiments show that the average precision, recall and F1-score of the abstraction trees constructed from the subclass relations of subject, action, object and adverbial are all greater than 80%. The application of the proposed approach to supporting querying in natural language demonstrates that different types of question patterns for querying the subject or object have high coverage of texts, and that searching multiple trees on subject, action, object and adverbial according to the question pattern can quickly reduce the search space to locate target sentences, which can support precise operations on texts.
- Oceania > Australia (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Research Report (1.00)
- Personal > Honors (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
MorphoNavi: Aerial-Ground Robot Navigation with Object Oriented Mapping in Digital Twin
Karaf, Sausar, Martynov, Mikhail, Sautenkov, Oleg, Darush, Zhanibek, Tsetserukou, Dzmitry
This paper presents a novel mapping approach for a universal aerial-ground robotic system utilizing a single monocular camera. The proposed system is capable of detecting a diverse range of objects and estimating their positions without requiring fine-tuning for specific environments. The system's performance was evaluated through a simulated search-and-rescue scenario, where the MorphoGear robot successfully located a robotic dog while an operator monitored the process. This work contributes to the development of intelligent, multimodal robotic systems capable of operating in unstructured environments. Robotics has experienced rapid advancements in recent years, with Vision-Language Models (VLMs) emerging as a powerful tool for mission execution based on RGB images. Since VLMs require only an image and a text prompt as input, they eliminate the need for expensive and specialized sensors such as LiDARs and depth cameras. This simplicity and cost-effectiveness suggest that vision-language-based control will play a crucial role in the future of robotics, with cameras becoming the primary sensor for most robotic systems. In this paper, we introduce a novel mapping approach designed for a universal air-ground robotic system using a single monocular camera.
- North America > United States (0.05)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Latent Tensor Factorization with Nonlinear PID Control for Missing Data Recovery in Non-Intrusive Load Monitoring
Wang, Yiran, Xie, Tangtang, Wu, Hao
Non-Intrusive Load Monitoring (NILM) has emerged as a key smart grid technology, identifying electrical devices and providing detailed energy consumption data for precise demand response management. Nevertheless, NILM data suffers from missing values due to unavoidable factors such as sensor failure, leading to inaccuracies in non-intrusive load monitoring. A stochastic gradient descent (SGD)-based latent factorization of tensors (LFT) model has proven effective in estimating missing data. However, it updates a latent factor based solely on the current stochastic gradient, without considering past information, which leads to slow convergence of an LFT model. To address this issue, this paper proposes a Nonlinear Proportional-integral-derivative (PID)-Incorporated Latent factorization of tensors (NPIL) model built on two ideas: a) rebuilding the instant learning error according to the principle of a nonlinear PID controller, so that past update information is efficiently incorporated into the learning scheme, and b) adapting the gain parameters with the particle swarm optimization (PSO) algorithm, so that the model's computational efficiency is effectively improved. Experimental results on real-world NILM datasets demonstrate that the proposed NPIL model surpasses state-of-the-art models in convergence rate and accuracy when predicting the missing NILM data.
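To make the idea of rebuilding the instant learning error concrete, here is a minimal sketch using a plain linear PID rule, since the abstract does not state the nonlinear form. For each observed entry it keeps the running sum of past errors and the previous error, and returns a refined error that an SGD update would use in place of the raw residual. The class name, the per-entry state, and the gain values are assumptions for illustration. In the paper the gains are adapted by PSO, which is not shown here.

```python
class PIDError:
    """Refine an instant error with proportional, integral and derivative terms.

    Linear PID stand-in (assumption); the NPIL model uses a nonlinear controller.
    """

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self._integral = {}  # running sum of past errors per entry
        self._previous = {}  # last error seen per entry

    def refine(self, key, error):
        integral = self._integral.get(key, 0.0) + error
        # On the first visit there is no previous error, so the derivative term is 0.
        previous = self._previous.get(key, error)
        self._integral[key] = integral
        self._previous[key] = error
        return self.kp * error + self.ki * integral + self.kd * (error - previous)


pid = PIDError(kp=1.0, ki=0.5, kd=0.25)
entry = (0, 3, 7)                 # hypothetical (device, day, time slot) index
first = pid.refine(entry, 2.0)    # 1.0*2.0 + 0.5*2.0 + 0.25*0.0
second = pid.refine(entry, 1.0)   # 1.0*1.0 + 0.5*3.0 + 0.25*(1.0 - 2.0)
```

The integral term carries information from earlier updates into the current step, which is the mechanism the abstract credits for faster convergence than an update based only on the current stochastic gradient.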
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.76)
Academic Network Representation via Prediction-Sampling Incorporated Tensor Factorization
Zhang, Chunyang, Liao, Xin, Wu, Hao
Accurate representation of an academic network is of great significance to academic relationship mining tasks such as predicting scientific impact. A Latent Factorization of Tensors (LFT) model is one of the most effective models for learning the representation of a target network. However, an academic network is often High-Dimensional and Incomplete (HDI) because the relationships among its numerous entities cannot be fully explored, making it difficult for an LFT model to learn an accurate representation of the academic network. To address this issue, this paper proposes a Prediction-sampling-based Latent Factorization of Tensors (PLFT) model with two ideas: 1) constructing a cascade LFT architecture that enhances the model's representation learning ability by learning hierarchical features of the academic network, and 2) introducing a nonlinear activation-incorporated predicting-sampling strategy that learns the network representation more accurately by generating new academic network data layer by layer. Experimental results on three real-world academic network datasets show that the PLFT model outperforms existing models when predicting the unexplored relationships among network entities.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Asia > Singapore (0.04)