PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics

Neural Information Processing Systems

PROSPECT provides value to proteomics and machine learning researchers by including several high-quality annotations and by being accessible in terms of format and structure for applying machine learning.



FUSE: Fast Semi-Supervised Node Embedding Learning via Structural and Label-Aware Optimization

Chakraborty, Sujan, Bordoloi, Rahul, Sengupta, Anindya, Wolkenhauer, Olaf, Bej, Saptarshi

arXiv.org Artificial Intelligence

Graph-based learning is a cornerstone for analyzing structured data, with node classification as a central task. However, in many real-world graphs, nodes lack informative feature vectors, leaving only neighborhood connectivity and class labels as available signals. In such cases, effective classification hinges on learning node embeddings that capture structural roles and topological context. We introduce a fast semi-supervised embedding framework that jointly optimizes three complementary objectives: (i) unsupervised structure preservation via scalable modularity approximation, (ii) supervised regularization to minimize intra-class variance among labeled nodes, and (iii) semi-supervised propagation that refines unlabeled nodes through random-walk-based label spreading with attention-weighted similarity. These components are unified into a single iterative optimization scheme, yielding high-quality node embeddings. On standard benchmarks, our method consistently achieves classification accuracy on par with or superior to state-of-the-art approaches, at significantly lower computational cost.
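To make objective (i) more concrete, the following is a minimal sketch of the standard Newman modularity score for a hard community assignment on an undirected, unweighted graph. It is not the scalable approximation the abstract refers to, whose details are not given here; the function name `modularity`, the toy graph `EDGES`, and the assignment `TWO_BLOCKS` are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Standard modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges:     list of undirected edges (u, v), each listed once
    community: dict mapping every node to a community label
    L_c is the number of edges inside community c, d_c the total degree
    of its nodes, and m the number of edges in the graph.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")

    intra_edges = defaultdict(int)   # L_c
    total_degree = defaultdict(int)  # d_c
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1

    return sum(
        intra_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TWO_BLOCKS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

print(round(modularity(EDGES, TWO_BLOCKS), 4))  # 0.3571
```

Putting every node in one community gives a score of 0 on this graph, so the two-triangle split is the assignment a structure-preserving objective would favor.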



9 Appendix: Supplementary material for the paper "Causal analysis of Covid-19 spread in Germany"

Neural Information Processing Systems

For every $W \in V$, $W$ is independent of $V \setminus (\mathrm{Descendants}(W) \cup \mathrm{Parents}(W))$ given $\mathrm{Parents}(W)$. As expected, the number of causes detected by Granger is several times larger than the number detected by SyPI; in most cases Granger flags all the candidate states as causes. SyPI, on the other hand, does not suffer from such problems even when there are latent confounders. Finally, in the third column, we report the detected distant causes. Strict thresholds (the default of the SyPI method) are used for the analysis.
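The Markov condition at the start of this excerpt can be illustrated on a small directed acyclic graph. The sketch below assumes the excluded set is the union of a node's descendants and parents, and additionally leaves out the node itself; the helper names and the four-node example graph are hypothetical and are not taken from the paper.

```python
def descendants(children, node):
    """Return all nodes reachable from `node` along directed edges."""
    seen = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def markov_independencies(parents):
    """For each node W, list the nodes it is independent of given Parents(W).

    parents: dict mapping every node of the DAG to the set of its parents.
    Returns a dict W -> (independent_of, conditioning_set).
    """
    children = {}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, set()).add(node)

    nodes = set(parents)
    statements = {}
    for w in nodes:
        # V \ (Descendants(W) ∪ Parents(W)), with W itself also removed.
        excluded = descendants(children, w) | set(parents[w]) | {w}
        statements[w] = (nodes - excluded, set(parents[w]))
    return statements


# Hypothetical DAG: A -> B, A -> C, B -> D, C -> D.
DAG = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
result = markov_independencies(DAG)
print(result["B"])  # ({'C'}, {'A'})
```

In this example the condition states that B is independent of C given A, and that D is independent of A given B and C.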



Scaling behavior of large language models in emotional safety classification across sizes and tasks

Pinzuti, Edoardo, Tüscher, Oliver, Castro, André Ferreira

arXiv.org Artificial Intelligence

Understanding how large language models (LLMs) process emotionally sensitive content is critical for building safe and reliable systems, particularly in mental health contexts. We investigate the scaling behavior of LLMs on two key tasks: trinary classification of emotional safety (safe vs. unsafe vs. borderline) and multi-label classification using a six-category safety risk taxonomy. To support this, we construct a novel dataset by merging several human-authored mental health datasets (> 15K samples) and augmenting them with emotion re-interpretation prompts generated via ChatGPT. We evaluate four LLaMA models (1B, 3B, 8B, 70B) across zero-shot, few-shot, and fine-tuning settings. Our results show that larger LLMs achieve stronger average performance, particularly in nuanced multi-label classification and in zero-shot settings. However, lightweight fine-tuning allowed the 1B model to achieve performance comparable to larger models and BERT in several high-data categories, while requiring <2GB VRAM at inference. These findings suggest that smaller, on-device models can serve as viable, privacy-preserving alternatives for sensitive applications, offering the ability to interpret emotional context and maintain safe conversational boundaries. This work highlights key implications for therapeutic LLM applications and the scalable alignment of safety-critical systems.



PROSPECT: Labeled Tandem Mass Spectrometry Dataset for Machine Learning in Proteomics

Neural Information Processing Systems

Proteomics is the interdisciplinary field focusing on the large-scale study of proteins. Proteins essentially organize and execute all functions within organisms. Today, the bottom-up analysis approach is the most commonly used workflow, where proteins are digested into peptides and subsequently analyzed using Tandem Mass Spectrometry (MS/MS).
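The digestion step of the bottom-up workflow can be sketched in silico. The example below assumes trypsin as the protease, with the commonly used rule of cleaving after lysine (K) or arginine (R) unless the next residue is proline (P); the text above does not name an enzyme, so both the rule and the function name `tryptic_digest` are assumptions made for illustration.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides with a simple trypsin rule.

    Assumed rule: cleave C-terminal to K or R, except when the following
    residue is P. Missed cleavages and modifications are not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep any trailing residues that follow the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Made-up sequence, chosen only to exercise the proline exception.
print(tryptic_digest("AKRPGKDR"))  # ['AK', 'RPGK', 'DR']
```

Each resulting peptide is what would subsequently be measured by MS/MS in such a workflow.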