malware
A Proof of Proposition 2.2: additive expansion proposition
We first define the restricted Cheeger constant in the link prediction task. Then, according to Proposition 2.1, we have: Then, we can draw the same conclusion with Eq.12, and the Thus, Eq.16 can be simplified to: "sites" Based on the Eq.15 and Eq.17, we can rewrite L The inequality holds due to the assumption. Knowledge discovery: In the 5 random experiments, we add 500 pseudo links in each iteration. The metadata information of the nodes are all strongly relevant to "Linux" Both papers focus on the "malware"/"phishing" under the topic "Computer security". The detailed result of the case study is shown in Table 6.
- Information Technology > Security & Privacy (1.00)
- Information Technology > Software (0.96)
- Information Technology > Artificial Intelligence > Machine Learning (0.90)
- Information Technology > Data Science > Data Mining (0.57)
AI is already making online swindles easier. It could get much worse.
Some cybersecurity researchers say it's too early to worry about AI-orchestrated cyberattacks. Others say it could already be happening. Anton Cherepanov is always on the lookout for something interesting. And in late August last year, he spotted just that.
- North America > United States > New York (0.05)
- Asia > China (0.05)
- North America > United States > Virginia (0.04)
- (3 more...)
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.71)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
149 Million Usernames and Passwords Exposed by Unsecured Database
This "dream wish list for criminals" includes millions of Gmail, Facebook, banking logins, and more. The researcher who discovered it suspects they were collected using infostealing malware. A database containing 149 million account usernames and passwords, including 48 million for Gmail, 17 million for Facebook, and 420,000 for the cryptocurrency platform Binance, has been removed after a researcher reported the exposure to the hosting provider. The longtime security analyst who discovered the database, Jeremiah Fowler, could not find indications of who owned or operated it, so he worked to notify the host, which took down the trove because it violated a terms of service agreement. In addition to email and social media logins for a number of platforms, Fowler also observed credentials for government systems from multiple countries as well as consumer banking and credit card logins and media streaming platforms.
- North America > United States > California (0.05)
- North America > United States > Arizona (0.05)
- North America > Canada (0.05)
- (3 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Services (0.91)
- Government > Military > Cyberwarfare (0.30)
Comparative Analysis of Hash-based Malware Clustering via K-Means
Thein, Aink Acrie Soe, Pitropakis, Nikolaos, Papadopoulos, Pavlos, Grierson, Sam, Jan, Sana Ullah
With the adoption of multiple digital devices in everyday life, the cyber-attack surface has increased. Adversaries are continuously exploring new avenues to exploit them and deploy malware. On the other hand, detection approaches typically employ hashing-based algorithms such as SSDeep, TLSH, and IMPHash to capture structural and behavioural similarities among binaries. This work focuses on the analysis and evaluation of these techniques for clustering malware samples using the K-means algorithm. More specifically, we experimented with established malware families and traits and found that TLSH and IMPHash produce more distinct, semantically meaningful clusters, whereas SSDeep is more efficient for broader classification tasks. The findings of this work can guide the development of more robust threat-detection mechanisms and adaptive security mechanisms.
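As a rough illustration of clustering samples by their hash digests with K-means, the sketch below encodes each hex digest as a vector of nibble values and runs a small pure-Python Lloyd's iteration. The digests are made-up toy strings, and the nibble encoding, the `hex_to_vector` and `kmeans` helpers, and the farthest-point initialisation are all assumptions for illustration; the paper's actual feature extraction for SSDeep, TLSH, and IMPHash is not described in this snippet and may well differ.

```python
def hex_to_vector(digest):
    """Map a hex digest string to a list of nibble values (0-15)."""
    return [int(ch, 16) for ch in digest]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an illustrative choice):
    # start from the first point, then add the point farthest from the
    # centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy digests: two groups that differ in most positions.
digests = ["a1b2c3d4", "a1b2c3d5", "a1b2c2d4", "0f0e0d0c", "0f0e0d0b", "0e0e0d0c"]
labels = kmeans([hex_to_vector(d) for d in digests], k=2)
print(labels)
```

Real fuzzy hashes define their own similarity measures, so a production pipeline would more likely feed a precomputed distance or an embedding of the digests into the clustering step rather than raw nibbles.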
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
Hackers tricked ChatGPT, Grok and Google into helping them install malware
Using popular AI chatbots, attackers created search-friendly links that instructed a user to hack their own device. Ever since reporting earlier this year on how easy it is to trick an agentic browser, I've been following the intersections between modern AI and old-school scams. Now, there's a new convergence on the horizon: hackers are apparently using AI prompts to seed Google search results with dangerous commands. When executed by unknowing users, these commands prompt computers to give the hackers the access they need to install malware. The warning comes by way of a recent report from detection-and-response firm Huntress.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.43)
Clustering Malware at Scale: A First Full-Benchmark Study
Mocko, Martin, Ševcech, Jakub, Chudá, Daniela
Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove their maliciousness. One of the ways in which groups of malware samples can be identified is through malware clustering. Despite the efforts of the community, malware clustering which incorporates benign samples has been under-explored. Moreover, despite the availability of larger public benchmark malware datasets, malware clustering studies have avoided fully utilizing these datasets in their experiments, often resorting to small datasets with only a few families. Additionally, the current state-of-the-art solutions for malware clustering remain unclear. In our study, we evaluate malware clustering quality and establish the state-of-the-art on Bodmas and Ember - two large public benchmark malware datasets. Ours is the first study of malware clustering performed on whole malware benchmark datasets. Additionally, we extend the malware clustering task by incorporating benign samples. Our results indicate that incorporating benign samples does not significantly degrade clustering quality. We find that there are differences in the quality of the created clusters between Ember and Bodmas, as well as a private industry dataset. Contrary to popular opinion, our top clustering performers are K-Means and BIRCH, with DBSCAN and HAC falling behind.
- Europe > Slovakia > Bratislava > Bratislava (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- North America > United States > California (0.04)
MASCOT: Analyzing Malware Evolution Through A Well-Curated Source Code Dataset
Li, Bojing, Zhong, Duo, Nadendla, Dharani, Terceros, Gabriel, Bhandar, Prajna, S, Raguvir, Nicholas, Charles
In recent years, the explosion of malware and extensive code reuse have formed complex evolutionary connections among malware specimens. The rapid pace of development makes it challenging for existing studies to characterize recent evolutionary trends. In addition, intuitive tools to untangle these intricate connections between malware specimens or categories are urgently needed. This paper introduces a manually-reviewed malware source code dataset containing 6032 specimens. Building on and extending current research from a software engineering perspective, we systematically evaluate the scale, development costs, code quality, as well as security and dependencies of modern malware. We further introduce a multi-view genealogy analysis to clarify malware connections: at an overall view, this analysis quantifies the strength and direction of connections among specimens and categories; at a detailed view, it traces the evolutionary histories of individual specimens. Experimental results indicate that, despite persistent shortcomings in code quality, malware specimens exhibit an increasing complexity and standardization, in step with the development of mainstream software engineering practices. Meanwhile, our genealogy analysis intuitively reveals lineage expansion and evolution driven by code reuse, providing new evidence and tools for understanding the formation and evolution of the malware ecosystem. With the rapid development of information technology and large language models, malware has experienced a surge in recent years, exhibiting strong connections among categories and specimens, as well as high code reuse rates [1]. In the past 12 months, more than 107 million new malicious or potentially unwanted applications were detected [2], [3]. Many of these malware specimens are variants of previously known malware, which indicates the prevalence of code reuse and family-oriented evolution.
However, the difficulty of collecting, reviewing, and labeling has resulted in a scarcity of source code datasets [4]. Existing datasets lack human curation, reliable labels, and timestamps.
- Research Report (0.64)
- Overview (0.46)
Binary-30K: A Heterogeneous Dataset for Deep Learning in Binary Analysis and Malware Detection
Deep learning research for binary analysis faces a critical infrastructure gap. Today, existing datasets target single platforms, require specialized tooling, or provide only hand-engineered features incompatible with modern neural architectures; no single dataset supports accessible research and pedagogy on realistic use cases. To solve this, we introduce Binary-30K, the first heterogeneous binary dataset designed for sequence-based models like transformers. Critically, Binary-30K covers Windows, Linux, macOS, and Android across 15+ CPU architectures. With 29,793 binaries and approximately 26.93% malware representation, Binary-30K enables research on platform-invariant detection, cross-target transfer learning, and long-context binary understanding. The dataset provides pre-computed byte-level BPE tokenization alongside comprehensive structural metadata, supporting both sequence modeling and structure-aware approaches. Platform-first stratified sampling ensures representative coverage across operating systems and architectures, while distribution via Hugging Face with official train/validation/test splits enables reproducible benchmarking. The dataset is publicly available at https://huggingface.co/datasets/mjbommar/binary-30k, providing an accessible resource for researchers, practitioners, and students alike.
- North America > United States > Hawaii (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Hungary > Budapest > Budapest (0.04)
- Research Report (1.00)
- Instructional Material (1.00)
- Asia > China (0.05)
- North America > United States > Texas (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.72)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Communications > Networks (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora
Raff, Edward, Curtin, Ryan R., Everett, Derek, Joyce, Robert J., Holt, James
A classifier using byte n-grams as features is the only approach we have found fast enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency (sub 10 ms) for deployment in numerous malware detection scenarios. However, we've consistently found that 6-8 grams achieve the best accuracy on our production deployments but have been unable to deploy regularly updated models due to the high cost of finding the top-k most frequent n-grams over terabytes of executable programs. Because the Zipfian distribution well models the distribution of n-grams, we exploit its properties to develop a new top-k n-gram extractor that is up to 35× faster than the previous best alternative. Using our new Zipf-Gramming algorithm, we are able to scale up our production training set and obtain up to 30% improvement in AUC at detecting new malware. We show theoretically and empirically that our approach will select the top-k items with little error, and we describe the interplay between theory and engineering required to achieve these results.
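To make the underlying task concrete, here is a minimal sketch of exact top-k byte n-gram counting over a handful of in-memory byte strings. This is the naive baseline, not the Zipf-Gramming algorithm, whose details are not given in this snippet; the function name `top_k_ngrams` and the sample byte strings are hypothetical.

```python
from collections import Counter


def top_k_ngrams(blobs, n, k):
    """Return the k most frequent byte n-grams across all blobs."""
    counts = Counter()
    for blob in blobs:
        # Slide a window of n bytes over the blob; blobs shorter than n
        # contribute nothing because the range is empty.
        for i in range(len(blob) - n + 1):
            counts[blob[i:i + n]] += 1
    return counts.most_common(k)


# Toy inputs standing in for executable contents.
samples = [b"MZ\x90\x00MZ\x90\x00", b"\x7fELFMZ\x90\x00"]
top = top_k_ngrams(samples, n=4, k=1)
print(top)  # [(b'MZ\x90\x00', 3)]
```

Holding every distinct n-gram in one counter is exactly what becomes too expensive at the terabyte scale the abstract describes, which is the cost a faster top-k extractor would need to avoid.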
- North America > United States > New York > New York County > New York City (0.14)
- Asia > South Korea > Seoul > Seoul (0.05)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
- Europe > Denmark (0.04)