Goto

Collaborating Authors

 Thiruvananthapuram


Synthesis of Safety Specifications for Probabilistic Systems

Ohlmann, Gaspard, Court, Edwin Hamel-De le, Belardinelli, Francesco

arXiv.org Artificial Intelligence

Ensuring that agents satisfy safety specifications can be crucial in safety-critical environments. While methods exist for controller synthesis with safe temporal specifications, most existing methods restrict safe temporal specifications to probabilistic-avoidance constraints. Formal methods typically offer more expressive ways to express safety in probabilistic systems, such as Probabilistic Computation Tree Logic (PCTL) formulas. Thus, in this paper, we develop a new approach that supports more general temporal properties expressed in PCTL. Our contribution is twofold. First, we develop a theoretical framework for the Synthesis of safe-PCTL specifications. We show how the reducing global specification satisfaction to local constraints, and define CPCTL, a fragment of safe-PCTL. We demonstrate how the expressiveness of CPCTL makes it a relevant fragment for the Synthesis Problem. Second, we leverage these results and propose a new Value Iteration-based algorithm to solve the synthesis problem for these more general temporal properties, and we prove the soundness and completeness of our method.


HybridGuard: Enhancing Minority-Class Intrusion Detection in Dew-Enabled Edge-of-Things Networks

Kara, Binayak, Sahua, Ujjwal, Thomas, Ciza, Sahoo, Jyoti Prakash

arXiv.org Artificial Intelligence

Securing Dew-Enabled Edge-of-Things (EoT) networks against sophisticated intrusions is a critical challenge. This paper presents HybridGuard, a framework that integrates machine learning and deep learning to improve intrusion detection. HybridGuard addresses data imbalance through mutual information based feature selection, ensuring that the most relevant features are used to improve detection performance, especially for minority attack classes. The framework leverages Wasserstein Conditional Generative Adversarial Networks with Gradient Penalty (WCGAN-GP) to further reduce class imbalance and enhance detection precision. It adopts a two-phase architecture called DualNetShield to support advanced traffic analysis and anomaly detection, improving the granular identification of threats in complex EoT environments. HybridGuard is evaluated on the UNSW-NB15, CIC-IDS-2017, and IOTID20 datasets, where it demonstrates strong performance across diverse attack scenarios and outperforms existing solutions in adapting to evolving cybersecurity threats. This approach establishes HybridGuard as an effective tool for protecting EoT networks against modern intrusions.


JobHop: A Large-Scale Dataset of Career Trajectories

Johary, Iman, Romero, Raphael, Mara, Alexandru C., De Bie, Tijl

arXiv.org Artificial Intelligence

Understanding labor market dynamics is essential for policymakers, employers, and job seekers. However, comprehensive datasets that capture real-world career trajectories are scarce. In this paper, we introduce JobHop, a large-scale public dataset derived from anonymized resumes provided by VDAB, the public employment service in Flanders, Belgium. Utilizing Large Language Models (LLMs), we process unstructured resume data to extract structured career information, which is then normalized to standardized ESCO occupation codes using a multi-label classification model. This results in a rich dataset of over 1.67 million work experiences, extracted from and grouped into more than 361,000 user resumes and mapped to standardized ESCO occupation codes, offering valuable insights into real-world occupational transitions. This dataset enables diverse applications, such as analyzing labor market mobility, job stability, and the effects of career breaks on occupational transitions. It also supports career path prediction and other data-driven decision-making processes. To illustrate its potential, we explore key dataset characteristics, including job distributions, career breaks, and job transitions, demonstrating its value for advancing labor market research.


Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features

Talpur, Unzela, Syed, Zafi Sherhan, Syed, Muhammad Shehram Shah, Syed, Abbas Shah

arXiv.org Artificial Intelligence

Speech Emotion Recognition (SER) is a key affective computing technology that enables emotionally intelligent artificial intelligence. While SER is challenging in general, it is particularly difficult for low-resource languages such as Urdu. This study investigates Urdu SER in a cross-corpus setting, an area that has remained largely unexplored. We employ a cross-corpus evaluation framework across three different Urdu emotional speech datasets to test model generalization. Two standard domain-knowledge based acoustic feature sets, eGeMAPS and ComParE, are used to represent speech signals as feature vectors which are then passed to Logistic Regression and Multilayer Perceptron classifiers. Classification performance is assessed using unweighted average recall (UAR) whilst considering class-label imbalance. Results show that Self-corpus validation often overestimates performance, with UAR exceeding cross-corpus evaluation by up to 13%, underscoring that cross-corpus evaluation offers a more realistic measure of model robustness. Overall, this work emphasizes the importance of cross-corpus validation for Urdu SER and its implications contribute to advancing affective computing research for underrepresented language communities.


FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization

Chakraborty, Sujan, Bordoloi, Rahul, Sengupta, Anindya, Wolkenhauer, Olaf, Bej, Saptarshi

arXiv.org Artificial Intelligence

Graph-based learning is a cornerstone for analyzing structured data, with node classification as a central task. However, in many real-world graphs, nodes lack informative feature vectors, leaving only neighborhood connectivity and class labels as available signals. In such cases, effective classification hinges on learning node embeddings that capture structural roles and topological context. We introduce a fast semi-supervised embedding framework that jointly optimizes three complementary objectives: (i) unsupervised structure preservation via scalable modularity approximation, (ii) supervised regularization to minimize intra-class variance among labeled nodes, and (iii) semi-supervised propagation that refines unlabeled nodes through random-walk-based label spreading with attention-weighted similarity. These components are unified into a single iterative optimization scheme, yielding high-quality node embeddings. On standard benchmarks, our method consistently achieves classification accuracy at par with or superior to state-of-the-art approaches, while requiring significantly less computational cost.


Multi-Modal Oral Cancer Detection Using Weighted Ensemble Convolutional Neural Networks

George, Ajo Babu, George, Sreehari J R Ajo Babu, George, Sreehari J R Ajo Babu, R, Sreehari J

arXiv.org Artificial Intelligence

Aims Late diagnosis of Oral Squamous Cell Carcinoma (OSCC) contributes significantly to its high global mortality rate, with over 50\% of cases detected at advanced stages and a 5-year survival rate below 50\% according to WHO statistics. This study aims to improve early detection of OSCC by developing a multimodal deep learning framework that integrates clinical, radiological, and histopathological images using a weighted ensemble of DenseNet-121 convolutional neural networks (CNNs). Material and Methods A retrospective study was conducted using publicly available datasets representing three distinct medical imaging modalities. Each modality-specific dataset was used to train a DenseNet-121 CNN via transfer learning. Augmentation and modality-specific preprocessing were applied to increase robustness. Predictions were fused using a validation-weighted ensemble strategy. Evaluation was performed using accuracy, precision, recall, F1-score. Results High validation accuracy was achieved for radiological (100\%) and histopathological (95.12\%) modalities, with clinical images performing lower (63.10\%) due to visual heterogeneity. The ensemble model demonstrated improved diagnostic robustness with an overall accuracy of 84.58\% on a multimodal validation dataset of 55 samples. Conclusion The multimodal ensemble framework bridges gaps in the current diagnostic workflow by offering a non-invasive, AI-assisted triage tool that enhances early identification of high-risk lesions. It supports clinicians in decision-making, aligning with global oncology guidelines to reduce diagnostic delays and improve patient outcomes.

  Country:
  Genre: Research Report (1.00)

Spike Agreement Dependent Plasticity: A scalable Bio-Inspired learning paradigm for Spiking Neural Networks

Bej, Saptarshi, E, Muhammed Sahad, Lakshmi, Gouri, Kumar, Harshit, Kar, Pritam, Das, Bikas C

arXiv.org Artificial Intelligence

We introduce Spike Agreement Dependent Plasticity (SADP), a biologically inspired synaptic learning rule for Spiking Neural Networks (SNNs) that relies on the agreement between pre- and post-synaptic spike trains rather than precise spike-pair timing. SADP generalizes classical Spike-Timing-Dependent Plasticity (STDP) by replacing pairwise temporal updates with population-level correlation metrics such as Cohen's kappa. The SADP update rule admits linear-time complexity and supports efficient hardware implementation via bitwise logic. Empirical results on MNIST and Fashion-MNIST show that SADP, especially when equipped with spline-based kernels derived from our experimental iontronic organic memtransistor device data, outperforms classical STDP in both accuracy and runtime. Our framework bridges the gap between biological plausibility and computational scalability, offering a viable learning mechanism for neuromorphic systems.


DRIVE: Disfluency-Rich Synthetic Dialog Data Generation Framework for Intelligent Vehicle Environments

Chavda, Anshul, Jagadeesh, M, Kullayappa, Chintalapalli Raja, Jayaprakash, B, Sruthi, Medchalimi, Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence

In-car conversational AI is becoming increasingly critical as autonomous vehicles and smart assistants gain widespread adoption. Yet, existing datasets fail to capture the spontaneous disfluencies such as hesitations, false starts, repetitions, and self-corrections that characterize real driver-AI dialogs. To address this, we introduce DiscoDrive, a synthetic corpus of 3500 multi-turn dialogs across seven automotive domains, generated using a two-stage, prompt-driven pipeline that dynamically integrates disfluencies during synthesis. We show that DiscoDrive is effective both as a training resource, enabling DialoGPT-Medium and T5-Base to match or exceed KVRET-trained models on the MultiWOZ 2.2 and Schema-Guided Dialogue (SGD) relevant test sets (BLEU-4 improvements of 0.26 to 0.61; METEOR +2.10; ROUGE-L +3.48; BERTScore F1 improvements of 1.35 to 3.48), and as a data augmentation resource in low-resource scenarios, delivering additional gains of up to BLEU-4 +0.38, METEOR +1.95, ROUGE-L +2.87, and BERTScore F1 +4.00 when combined with 10 percent of KVRET. Human evaluations further confirm that dialogs sampled from DiscoDrive are rated higher than KVRET's human-collected dialogs in naturalness (3.8 vs 3.6) and coherence (4.1 vs 4.0), and are perceived as more context-appropriate than leading post-hoc methods (such as LARD), without compromising clarity. DiscoDrive fills a critical gap in existing resources and serves as a versatile corpus for both training and augmenting conversational AI, enabling robust handling of real-world, disfluent in-car interactions.


Dependency-aware synthetic tabular data generation

Umesh, Chaithra, Schultz, Kristian, Mahendra, Manjunath, Bej, Saptarshi, Wolkenhauer, Olaf

arXiv.org Artificial Intelligence

Synthetic tabular data is increasingly used in privacy-sensitive domains such as health care, but existing generative models often fail to preserve inter-attribute relationships. In particular, functional dependencies (FDs) and logical dependencies (LDs), which capture deterministic and rule-based associations between features, are rarely or often poorly retained in synthetic datasets. To address this research gap, we propose the Hierarchical Feature Generation Framework (HFGF) for synthetic tabular data generation. We created benchmark datasets with known dependencies to evaluate our proposed HFGF. The framework first generates independent features using any standard generative model, and then reconstructs dependent features based on predefined FD and LD rules. Our experiments on four benchmark datasets with varying sizes, feature imbalance, and dependency complexity demonstrate that HFGF improves the preservation of FDs and LDs across six generative models, including CTGAN, TVAE, and GReaT. Our findings demonstrate that HFGF can significantly enhance the structural fidelity and downstream utility of synthetic tabular data.


Tagging fully hadronic exotic decays of the vectorlike $\mathbf{B}$ quark using a graph neural network

Bardhan, Jai, Mandal, Tanumoy, Mitra, Subhadip, Neeraj, Cyrin, Rawat, Mihir

arXiv.org Artificial Intelligence

Following up on our earlier study in [J. Bardhan et al., Machine learning-enhanced search for a vectorlike singlet B quark decaying to a singlet scalar or pseudoscalar, Phys. Rev. D 107 (2023) 115001; arXiv:2212.02442], we investigate the LHC prospects of pair-produced vectorlike $B$ quarks decaying exotically to a new gauge-singlet (pseudo)scalar field $Φ$ and a $b$ quark. After the electroweak symmetry breaking, the $Φ$ decays predominantly to $gg/bb$ final states, leading to a fully hadronic $2b+4j$ or $6b$ signature. Because of the large Standard Model background and the lack of leptonic handles, it is a difficult channel to probe. To overcome the challenge, we employ a hybrid deep learning model containing a graph neural network followed by a deep neural network. We estimate that such a state-of-the-art deep learning analysis pipeline can lead to a performance comparable to that in the semi-leptonic mode, taking the discovery (exclusion) reach up to about $M_B=1.8\:(2.4)$ TeV at HL-LHC when $B$ decays fully exotically, i.e., BR$(B \to bΦ) = 100\%$.