MH-1M: A 1.34 Million-Sample Comprehensive Multi-Feature Android Malware Dataset for Machine Learning, Deep Learning, Large Language Models, and Threat Intelligence Research
Braganca, Hendrio, Kreutz, Diego, Rocha, Vanderson, Assolin, Joner, Feitosa, Eduardo
Abstract--We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware. The pervasive spread of Android malware poses a significant challenge for cybersecurity research. This challenge stems mainly from the open-source nature and affordability of Android platforms, which grant users access to a large market of free applications. At the same time, malware continually evolves, adapting its tactics to execute more sophisticated and frequent attacks. Such attacks often result in data destruction, information theft, and several other cybercrimes [1], [2], [3]. Machine learning (ML) algorithms have been widely used to uncover malware and have demonstrated remarkable effectiveness in detection systems, leveraging their discriminative capabilities to identify new variants of malicious applications [4], [5], [6]. To mitigate these risks, researchers have developed a variety of methods for detecting Android malware, establishing machine learning as a central focus of contemporary mobile security research [7], [8], [9]. However, the effectiveness of ML models is highly dependent on the quality of the datasets used for training.
Many existing datasets suffer from limitations such as outdated data, inadequate representation, and a limited number of samples and features, making them unsuitable for modern malware detection [10], [2], [11], [12]. These issues raise concerns about the reliability of reported performance metrics and can potentially lead to misleading conclusions [2]. A growing body of research in Android malware detection strongly supports the notion that increasing the number of discriminative features can significantly improve classification performance [13], [14], [15]. We present in Table I an overview of widely used Android malware datasets from recent years.
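The multi-engine labeling step mentioned in the abstract can be sketched as a simple detection-count threshold. This is a minimal illustration, not the authors' pipeline: the function name `label_from_engines`, the input shape (a mapping from engine name to a boolean verdict), the engine names, and the default threshold of 4 detections are all assumptions made for the example.

```python
from typing import Mapping


def label_from_engines(verdicts: Mapping[str, bool], min_detections: int = 4) -> str:
    """Return 'malware' if at least `min_detections` engines flagged the sample.

    `verdicts` maps a hypothetical engine name to True when that engine
    flagged the sample. The threshold is an illustrative assumption, not a
    value taken from the MH-1M paper.
    """
    detections = sum(1 for flagged in verdicts.values() if flagged)
    return "malware" if detections >= min_detections else "benign"


# Toy verdicts, invented for illustration only (4 of 5 engines flag the sample).
sample = {
    "engine_a": True,
    "engine_b": True,
    "engine_c": False,
    "engine_d": True,
    "engine_e": True,
}

print(label_from_engines(sample))
print(label_from_engines(sample, min_detections=5))
```

Raising the threshold trades recall for fewer false positives, which is why the choice of threshold would need to be reported alongside any labels derived this way.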
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories
Majgaonkar, Oorja, Fei, Zhiwei, Li, Xiang, Sarro, Federica, Ye, He
The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive capabilities in automated issue resolution, their decision-making processes remain largely opaque. This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts. Our investigation reveals several key insights into agent behaviour. First, we identify how distinct problem-solving strategies, such as defensive programming and context gathering, enable success in different scenarios. Second, we find that failed trajectories are consistently longer and exhibit higher variance than successful ones, with failure patterns differing significantly between agents. Third, our fault localisation analysis shows that while most trajectories correctly identify problematic files (72-81% even in failures), success depends more on achieving approximate rather than exact code modifications. These and other findings unveiled by our study provide a foundation for understanding agent behaviour through trajectory analysis, contributing to the development of more robust and interpretable autonomous software engineering systems.
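The comparison of trajectory lengths described above (mean length and variance for successful versus failed attempts) can be sketched as follows. This is only an illustration under stated assumptions: the helper `summarize_lengths` is hypothetical, a trajectory is reduced to its step count, and the numbers are toy values invented for the example, not data from the study.

```python
from statistics import mean, pvariance


def summarize_lengths(lengths):
    """Return the mean and population variance of a list of step counts."""
    return {"mean": mean(lengths), "variance": pvariance(lengths)}


# Toy step counts, invented for illustration; not results from the paper.
successful = [12, 15, 14, 13]
failed = [20, 35, 18, 47]

success_stats = summarize_lengths(successful)
failure_stats = summarize_lengths(failed)

print("successful:", success_stats)
print("failed:", failure_stats)
```

A real analysis would group these statistics per agent, since the abstract notes that failure patterns differ significantly between agents.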
MalDataGen: A Modular Framework for Synthetic Tabular Data Generation in Malware Detection
Paim, Kayua Oleques, Nogueira, Angelo Gaspar Diniz, Kreutz, Diego, Cordeiro, Weverton, Mansilha, Rodrigo Brandao
High-quality data scarcity hinders malware detection, limiting ML performance. We introduce MalDataGen, an open-source modular framework for generating high-fidelity synthetic tabular data using modular deep learning models (e.g., WGAN-GP, VQ-VAE). Evaluated via dual validation (TR-TS/TS-TR), seven classifiers, and utility metrics, MalDataGen outperforms benchmarks like SDV while preserving data utility. Its flexible design enables seamless integration into detection pipelines, offering a practical solution for cybersecurity applications. I. Introduction Modern machine learning algorithms, particularly deep learning architectures, depend on large-scale datasets with reliable annotations to achieve optimal performance.
Exploiting Latent Space Discontinuities for Building Universal LLM Jailbreaks and Data Extraction Attacks
Paim, Kayua Oleques, Mansilha, Rodrigo Brandao, Kreutz, Diego, Franco, Muriel Figueredo, Cordeiro, Weverton
The rapid proliferation of Large Language Models (LLMs) has raised significant concerns about their security against adversarial attacks. In this work, we propose a novel approach to crafting universal jailbreaks and data extraction attacks by exploiting latent space discontinuities, an architectural vulnerability related to the sparsity of training data. Initial results indicate that when these discontinuities are exploited, they can consistently and profoundly compromise model behavior, even in the presence of layered defenses. The findings suggest that this strategy has substantial potential as a systemic attack vector. Disclaimer: This paper contains examples of harmful and offensive language. Additional supporting materials may be provided upon formal request and are subject to the signing of a liability and ethical use agreement. Large Language Models (LLMs) are enabling novel applications of Artificial Intelligence (AI) and transforming human activities through conversational models (e.g., ChatGPT, DeepSeek, Gemini, Llama, and Claude). LLMs allow for natural human-AI interaction and specialized applications across multiple domains, including image generation (e.g., Adobe Firefly and Pixlr), code automation (e.g., GitHub Copilot and Amazon CodeWhisperer), and retrieval-augmented generation systems (e.g., Perplexity AI and IBM watsonx). The interactions may happen using different interfaces, such as via direct interaction with the user using a Web interface or indirectly via APIs.
Mysterious drones spotted over military base storing US nuclear weapons
Mysterious drones were spotted near Belgium's Kleine Brogel air base, where US nuclear weapons are stored, prompting fears of a potential espionage operation. Belgium's Defense Minister Theo Francken confirmed that drones entered the base's airspace in two waves on Saturday and Sunday night.
OpenAI, Amazon sign $38bn AI deal
OpenAI has signed a new deal valued at $38bn with Amazon that will allow the artificial intelligence giant to run AI workloads across Amazon Web Services (AWS) cloud infrastructure. The seven-year deal announced on Monday is the first big AI push for the e-commerce giant after a restructuring last week. Experts say this does not mean that it will allow OpenAI to train its model on websites hosted by AWS - which includes the websites of The New York Times, Reddit and United Airlines. "Running OpenAI training inside AWS doesn't change their ability to scrape content from AWS-hosted websites [which they could already do for anything publicly readable]. This is strictly speaking about the economics of rent vs buy for GPU [graphics processing unit] capacity," Joshua McKenty, CEO of the AI detection company PolyguardAI, told Al Jazeera. The deal is also a major vote of confidence for the e-commerce giant's cloud unit, AWS, which some investors feared had fallen behind rivals Microsoft and Google in the artificial intelligence (AI) race.
ChatGPT owner OpenAI signs $38bn cloud computing deal with Amazon
OpenAI has signed a $38bn (£29bn) contract with Amazon to access its cloud computing infrastructure, as the start-up continues its run of major partnerships to secure computing power. In 2025, the ChatGPT maker has signed deals worth more than $1tn with Oracle, Broadcom, AMD and chip-making giant Nvidia. Its latest deal reduces its reliance on Microsoft. As part of the seven-year agreement, OpenAI will gain access to Nvidia graphics processors to train its artificial intelligence models. The deal follows a sweeping restructure of OpenAI last week, which saw it convert away from being a non-profit and change its relationship with Microsoft to give OpenAI more operational and financial freedom.
Portuguese Man O'War species honors 'One-Eyed Dragon' samurai
The newly discovered P. mikazuki is a tribute to the famous warrior Date Masamune. A team of university students in Japan identified an entirely new species of the mighty Portuguese Man O'War. Described in a recently published study, the creature's distinct features and fearsome venom have earned it a name that honors a famous 16th century samurai warrior. It's easy to mistake the Portuguese Man O'War for a jellyfish.
More than 700 officers to police Villa-Maccabi match
Warnings of disruption and protests have come from police as more than 700 officers prepare to mount an operation in Birmingham for Aston Villa's Uefa Europa League match against Maccabi Tel Aviv. Officers will be keeping the public safe and tackling any crime and disorder on Thursday, West Midlands Police said, with police horses, dogs, the force's drone unit, and road policing officers out in the city. Planned protests include one by supporters of Palestine, who want the match to be called off. Last month, a decision to ban Tel Aviv fans from the event became the focus of parliamentary-level debate. The Israeli club later said supporters would not travel to Birmingham for safety reasons.
Chefs, your jobs are safe for now! Humanoid robot attempts to cook a stir-fry - but ends up flinging the food on the floor and slipping over in the mess
Robots might be poised to replace humans in factories and warehouses, but chefs don't need to worry about losing their jobs anytime soon. In a viral video, which has amassed over 6.3 million views, a humanoid robot attempts to make a stir-fry for its owner - with disastrous results.