

VelLMes: A high-interaction AI-based deception framework

Sladić, Muris, Valeros, Veronica, Catania, Carlos, Garcia, Sebastian

arXiv.org Artificial Intelligence

There are very few state-of-the-art deception systems based on Large Language Models, and the existing ones are limited to simulating a single type of service, mainly SSH shells. These systems, like deception technologies not based on LLMs, also lack extensive evaluations that include human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services, such as an SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots, so VelLMes offers a variety of choices for deception design based on users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key to its performance. We evaluate both its generative capabilities and its deception capabilities. Generative capabilities were evaluated using unit tests for LLMs; the results show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs achieving a 100% passing rate. For the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. About 30% of the attackers assigned an LLM-based honeypot thought they were interacting with a real system. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed that LLM honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.
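The unit-test evaluation of generative capabilities can be pictured as format checks on the honeypot's replies: for each common shell command, the LLM's response must match the shape of real output. A minimal Python sketch; the command set, regexes, and `passes_unit_test` helper are illustrative assumptions, not VelLMes' actual test suite:

```python
import re

# Format checks a simulated SSH shell response should satisfy for each
# command. The patterns are illustrative, not the paper's real tests.
CHECKS = {
    "pwd": re.compile(r"/[\w/.-]*"),             # absolute path
    "whoami": re.compile(r"[a-z_][a-z0-9_-]*"),  # single username token
    "uname -s": re.compile(r"Linux"),            # kernel name only
}

def passes_unit_test(command: str, response: str) -> bool:
    """Return True if the LLM's response is plausible for the command."""
    pattern = CHECKS.get(command)
    return bool(pattern and pattern.fullmatch(response.strip()))

# Example: responses an LLM-based honeypot might generate.
simulated = {"pwd": "/home/ubuntu", "whoami": "ubuntu", "uname -s": "Linux"}
results = {cmd: passes_unit_test(cmd, out) for cmd, out in simulated.items()}
```

A real harness would send each command to the deployed honeypot and aggregate pass rates per LLM, which is how a "100% passing rate" claim would be computed.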


The Robustness of Structural Features in Species Interaction Networks

Fard, Sanaz Hasanzadeh, Dolson, Emily

arXiv.org Artificial Intelligence

Species interaction networks are a powerful tool for describing ecological communities; they typically contain nodes representing species, and edges representing interactions between those species. For the purposes of drawing abstract inferences about groups of similar networks, ecologists often use graph topology metrics to summarize structural features. However, gathering the data that underlies these networks is challenging, which can lead to some interactions being missed. Thus, it is important to understand how much different structural metrics are affected by missing data. To address this question, we analyzed a database of 148 real-world bipartite networks representing four different types of species interactions (pollination, host-parasite, plant-ant, and seed-dispersal). For each network, we measured six different topological properties: number of connected components, variance in node betweenness, variance in node PageRank, largest eigenvalue, number of non-zero eigenvalues, and community detection as determined by four different algorithms. We then tested how these properties change as additional edges -- representing data that may have been missed -- are added to the networks. We found substantial variation in how robust different properties were to the missing data. For example, the Clauset-Newman-Moore and Louvain community detection algorithms showed much more gradual change as edges were added than the label propagation and Girvan-Newman algorithms did, suggesting that the former are more robust. Robustness also varied for some metrics based on interaction type. These results provide a foundation for selecting network properties to use when analyzing messy ecological network data.
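The robustness experiment described above, adding edges that stand in for missed interactions and re-measuring a topology metric after each addition, can be sketched in a few lines. The toy pollination network and the choice of connected-component count as the metric are illustrative assumptions:

```python
import random

def components(nodes, edges):
    """Count connected components via union-find."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})

def robustness_curve(nodes, edges, candidate_edges, steps, seed=0):
    """Track a metric as edges (simulating missed interactions) are added."""
    rng = random.Random(seed)
    observed = set(edges)
    pool = [e for e in candidate_edges if e not in observed]
    rng.shuffle(pool)
    curve = [components(nodes, edges)]
    current = list(edges)
    for e in pool[:steps]:
        current.append(e)
        curve.append(components(nodes, current))
    return curve

# Toy bipartite network: plants {p1, p2} and pollinators {a1, a2}.
nodes = ["p1", "p2", "a1", "a2"]
edges = [("p1", "a1")]                      # observed interactions
candidates = [("p2", "a2"), ("p1", "a2")]   # possibly-missed interactions
curve = robustness_curve(nodes, edges, candidates, steps=2)
```

A metric is robust in the paper's sense when the curve changes gradually rather than jumping as edges arrive; here the component count drops from 3 to 1 as the two missed interactions are restored.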


LLMs for Domain Generation Algorithm Detection

La O, Reynier Leyva, Catania, Carlos A., Parlanti, Tatiana

arXiv.org Artificial Intelligence

We perform a detailed evaluation of two important techniques, In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), showing how they can improve detection. SFT increases performance by using domain-specific data, whereas ICL helps the detection model adapt quickly to new threats without requiring much retraining. We use Meta's Llama 3 8B model on a custom dataset with 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. The results show that LLM-based methods can achieve competitive results in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models using attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
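The ICL side of the comparison amounts to assembling a prompt from labeled example domains so the model can adapt to new families without retraining. A hedged sketch of such prompt construction; the example domains and label names are invented for illustration, not taken from the paper's dataset:

```python
# Few-shot examples covering both random-looking and word-based DGA
# schemes; labels and domains here are illustrative placeholders.
FEW_SHOT = [
    ("google.com", "legit"),
    ("kqwlmznhgtrv.net", "dga"),        # random-character scheme
    ("strongcoldwinter.info", "dga"),   # word-based scheme
    ("wikipedia.org", "legit"),
]

def build_icl_prompt(query_domain: str) -> str:
    """Assemble an in-context-learning prompt for domain classification."""
    lines = ["Classify each domain as 'legit' or 'dga'.", ""]
    for domain, label in FEW_SHOT:
        lines.append(f"Domain: {domain}\nLabel: {label}")
    lines.append(f"Domain: {query_domain}\nLabel:")
    return "\n".join(lines)

prompt = build_icl_prompt("xkvjqzpmwt.biz")
```

Swapping in examples of a newly observed family is the "quick adaptation" step; the model weights never change, only the prompt.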


WineGraph: A Graph Representation For Food-Wine Pairing

Gawrysiak, Zuzanna, Żywot, Agata, Ławrynowicz, Agnieszka

arXiv.org Artificial Intelligence

We present WineGraph, an extended version of FlavorGraph, a heterogeneous graph incorporating wine data into its structure. This integration enables food-wine pairing based on taste and sommelier-defined rules. Leveraging a food dataset comprising 500,000 reviews and a wine reviews dataset with over 130,000 entries, we computed taste descriptors for both food and wine. This information was then utilised to pair food items with wine and augment FlavorGraph with additional data. The results demonstrate the potential of heterogeneous graphs to acquire supplementary information, proving beneficial for wine pairing.
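The augmentation step can be pictured as adding typed wine nodes to a food graph and creating pairing edges wherever the computed taste descriptors of a food and a wine overlap. A minimal sketch; the node names, descriptors, and overlap rule are illustrative assumptions rather than WineGraph's actual construction:

```python
# A tiny heterogeneous graph: typed nodes plus (source, target, relation)
# edges, in the spirit of FlavorGraph. All data here is made up.
graph = {
    "nodes": {"steak": "food", "gouda": "food"},
    "edges": [],
}

# Taste descriptors as computed from reviews (illustrative values).
taste = {
    "steak": {"savory", "fatty"},
    "gouda": {"salty", "nutty"},
    "cabernet": {"tannic", "savory"},
    "riesling": {"sweet", "acidic", "nutty"},
}

def add_wine(graph, wine, foods, min_overlap=1):
    """Add a wine node and pairing edges to foods sharing descriptors."""
    graph["nodes"][wine] = "wine"
    for food in foods:
        if len(taste[wine] & taste[food]) >= min_overlap:
            graph["edges"].append((food, wine, "pairs_with"))

for wine in ("cabernet", "riesling"):
    add_wine(graph, wine, ["steak", "gouda"])
```

In the paper, sommelier-defined rules would further constrain which descriptor overlaps count as a valid pairing; here a single shared descriptor suffices.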


Deadly plane crash after DC airspace breached, Capitol Police halt youth choir and more top headlines

FOX News

SEARCH SUSPENDED - No survivors found after plane violates DC airspace, scrambles military before crashing in Virginia. LAND OF THE FREE? - Capitol Police spark outrage as youth choir's national anthem performance halted. 'BEST SOLUTION' - AI could help solve NJ missing child mystery, become model for cold case probes. RECORD SCRATCH - 'American Pie' icon Don McLean weighs in on AI's effect on the music industry. WHAT'S IN STORE - Target backs organization pushing US demilitarization, Mt. 'IT HAS TO BE JOE BIDEN' - Ex-FBI director James Comey speaks out on 2024 race.


Zoltan Istvan on AI, Transhumanism, Politics and Ethics

#artificialintelligence

Zoltan Istvan is a former journalist, political candidate, entrepreneur, bestselling author, and founder of the US Transhumanist Party. He has been on this podcast twice before when we discussed Istvan's presidential campaign and his bestselling novel The Transhumanist Wager. During this 1-hour conversation with Zoltan Istvan, we cover a variety of interesting topics such as the challenge of doing graduate school at Oxford, Quantum Archaeology; Trump, transhumanism, politics, and conflict; the Immortality or Bust documentary; microchipping refugees and selling off public lands; the ethics of doing damage now in the hope of fixing it later; technosolutionism and why Technology is Not Enough; longevity, entrepreneurship, and healthcare; the distinction between a body with a brain vs a brain with a body; the timeline to AGI, mind-uploading and indefinite life extension. As always you can listen to or download the audio file above or scroll down and watch the video interview in full. To show your support you can write a review on iTunes, make a direct donation, or become a patron on Patreon.


Beyond Random Split for Assessing Statistical Model Performance

Catania, Carlos, Guerra, Jorge, Romero, Juan Manuel, Caffaratti, Gabriel, Marchetta, Martin

arXiv.org Artificial Intelligence

Although a randomly performed train/test split of the dataset is common practice, it is not always the best approach for estimating generalization performance under some scenarios. The usual machine learning methodology can sometimes overestimate the generalization error when a dataset is not representative, or when rare and elusive examples are a fundamental aspect of the detection problem. In the present work, we analyze strategies based on the predictors' variability for splitting data into training and testing sets. Such strategies aim to guarantee the inclusion of rare or unusual examples with a minimal loss of the population's representativeness, and to provide a more accurate estimation of the generalization error when the dataset is not representative. Two baseline classifiers based on decision trees were used to test the four splitting strategies considered. Both classifiers were applied to CTU19, a low-representativity dataset for a network security detection problem. Preliminary results showed the importance of applying the three strategies alternative to the Monte Carlo splitting strategy in order to get a more accurate error estimation on different but feasible scenarios.
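A variability-based split of the kind described can be sketched by ranking examples by how far their predictor deviates from the population mean, then keeping the most atypical (rare) ones in the training set so the model sees them, while the test set stays representative of the bulk. This is an illustrative strategy, not necessarily one of the four evaluated in the paper:

```python
import statistics

def variability_split(rows, test_fraction=0.25):
    """Split so the most atypical examples stay in training.

    Rows are dicts with a single predictor 'x'; atypicality is the
    absolute deviation from the mean (an illustrative choice).
    """
    values = [r["x"] for r in rows]
    mu = statistics.fmean(values)
    # Most typical examples first, rare/elusive examples last.
    ranked = sorted(rows, key=lambda r: abs(r["x"] - mu))
    n_test = int(len(rows) * test_fraction)
    test, train = ranked[:n_test], ranked[n_test:]
    return train, test

rows = [{"x": v} for v in (10, 11, 9, 10, 12, 50)]  # 50 is rare/elusive
train, test = variability_split(rows)
```

Unlike a random (Monte Carlo) split, this construction can never place the sole rare example in the test set, which is the failure mode that inflates the estimated generalization error.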


Bioavailable Strontium, Human Paleogeography, and Migrations in the Southern Andes: A Machine Learning and GIS Approach

#artificialintelligence

The Andes are a unique geological and biogeographic feature of South America. From the perspective of human geography, this mountain range provides ready access to highly diverse altitudinally arranged ecosystems. The combination of a geologically and ecologically diverse landscape provides an exceptional context to explore the potential of strontium isotopes to track the movements of people and the conveyance of material culture. Here we develop an isotopic landscape of bioavailable strontium (87Sr/86Sr) that is applied to reconstruct human paleogeography across time in the southern Andes of Argentina and Chile (31°–34°S). These results come from a macro-regional sampling of rodents (N = 65) and plants (N = 26) from modern and archeological contexts. This “Southern Andean Strontium Transect” extends over 350 km across the Andes, encompassing the main geological provinces between the Pacific coast (Chile) and the eastern lowlands (Argentina). We follow a recently developed approach to isoscape construction based on Random Forest regression and GIS analysis. Our results suggest that bioavailable strontium is tightly linked with bedrock geology and offers a highly resolved proxy to track human paleogeography involving the levels of territories or daily mobility and anomalous events that disrupt home ranges, such as migration. The southern Andes provide an ideal geological setting to develop this approach, since the geological variation in rock age and composition produces di...


Reinforcement Learning-based Autoscaling of Workflows in the Cloud: A Survey

Garí, Yisel, Monge, David A., Pacini, Elina, Mateos, Cristian, Garino, Carlos García

arXiv.org Machine Learning

Reinforcement Learning (RL) has demonstrated great potential for automatically solving decision-making problems in complex, uncertain environments. Basically, RL proposes a computational approach that allows learning through interaction in an environment of stochastic behavior, with agents taking actions to maximize some cumulative short-term and long-term rewards. Some of the most impressive results have been shown in Game Theory, where agents exhibited super-human performance in games like Go or StarCraft II, which led to RL's adoption in many other domains, including Cloud Computing. In particular, workflow autoscaling exploits Cloud elasticity to optimize the execution of workflows according to a given optimization criterion. This is a decision-making problem in which it is necessary to establish when and how to scale computational resources up or down, and how to assign them to the upcoming processing workload. Such actions have to be taken considering some optimization criterion in the Cloud, a dynamic and uncertain environment. Motivated by this, many works apply RL to the autoscaling problem in the Cloud. In this work, we exhaustively survey such proposals from major venues and uniformly compare them based on a set of proposed taxonomies. We also discuss open problems and provide a perspective on future research in the area.
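The autoscaling decision problem can be illustrated with a toy tabular Q-learning agent whose state is a coarse load level plus the current VM count, and whose actions resize the VM pool. The dynamics, reward, and hyperparameters below are invented for illustration and do not come from any surveyed proposal:

```python
import random

ACTIONS = ("scale_down", "keep", "scale_up")
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2  # learning rate, discount, exploration

def reward(load, vms):
    # Penalize both SLA risk (load above capacity) and idle cost
    # (capacity above load); a deliberately simple stand-in.
    return -abs(load - vms)

def train(episodes=300, seed=1):
    rng = random.Random(seed)
    q = {}  # (load, vms) -> {action: value}
    for _ in range(episodes):
        vms = 2
        load = rng.randint(1, 4)
        for _ in range(20):
            state = (load, vms)
            qs = q.setdefault(state, {a: 0.0 for a in ACTIONS})
            if rng.random() < EPS:
                action = rng.choice(ACTIONS)   # explore
            else:
                action = max(qs, key=qs.get)   # exploit
            vms = max(1, min(4, vms + ACTIONS.index(action) - 1))
            load = rng.randint(1, 4)           # next workload arrives
            r = reward(load, vms)
            nxt = q.setdefault((load, vms), {a: 0.0 for a in ACTIONS})
            qs[action] += ALPHA * (r + GAMMA * max(nxt.values()) - qs[action])
    return q

q = train()
# Under heavy load with few VMs, the learned policy should scale up.
best = max(q[(4, 1)], key=q[(4, 1)].get)
```

Real proposals in the survey differ in state encoding, reward design, and whether tabular or deep RL is used, but they share this when-and-how-to-scale decision loop.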


30 Top Artificial Intelligence And Machine Learning Companies

#artificialintelligence

Artificial intelligence has become an essential part of our everyday lives. It is used in financial processes, medical examinations, logistics, publishing, and a wide range of other fast-rising industries. According to The AI Index 2018 Annual Report by Stanford University, the number of active AI startups in the US increased 2.1x from 2015 to 2018, while venture capital funding for US AI startups increased 4.5x from 2013 to 2017. Today, there are so many AI development companies on the market that it is becoming more and more difficult to choose the right one. Based on my experience in IT market research, I've compiled a list of the best AI providers.