
Detecting Localized Density Anomalies in Multivariate Data via Coin-Flip Statistics

arXiv.org Machine Learning

Detecting localized density differences in multivariate data is a crucial task in computational science. Such anomalies can indicate a critical system failure, lead to a groundbreaking scientific discovery, or reveal unexpected changes in data distribution. We introduce EagleEye, an anomaly detection method to compare two multivariate datasets with the aim of identifying local density anomalies, namely over- or under-densities affecting only localised regions of the feature space. Anomalies are detected by modelling, for each point, the ordered sequence of its neighbours' membership label as a coin-flipping process and monitoring deviations from the expected behaviour of such process. A unique advantage of our method is its ability to provide an accurate, entirely unsupervised estimate of the local signal purity. We demonstrate its effectiveness through experiments on both synthetic and real-world datasets. In synthetic data, EagleEye accurately detects anomalies in multiple dimensions even when they affect a tiny fraction of the data. When applied to a challenging resonant anomaly detection benchmark task in simulated Large Hadron Collider data, EagleEye successfully identifies particle decay events present in just 0.3% of the dataset. In global temperature data, EagleEye uncovers previously unidentified, geographically localised changes in temperature fields that occurred in the most recent years. Thanks to its key advantages of conceptual simplicity, computational efficiency, trivial parallelisation, and scalability, EagleEye is widely applicable across many fields.
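The coin-flip idea in the abstract can be illustrated with a much simplified sketch. This is not the EagleEye algorithm: it uses a fixed neighbourhood size `k` and a single binomial tail test, whereas the abstract describes monitoring the ordered sequence of neighbour labels. The function names, `k=10`, and the synthetic data are all assumptions made for illustration.

```python
import math
import numpy as np

def binomial_upper_tail(heads, flips, p):
    """P(X >= heads) for X ~ Binomial(flips, p)."""
    return sum(math.comb(flips, i) * p**i * (1 - p)**(flips - i)
               for i in range(heads, flips + 1))

def neighbour_label_pvalues(reference, test, k=10):
    """For each test point, treat the labels of its k nearest neighbours in the
    pooled data as coin flips and return the probability of seeing at least
    that many test-set neighbours by chance."""
    pooled = np.vstack([reference, test])
    from_test = np.concatenate([np.zeros(len(reference), dtype=bool),
                                np.ones(len(test), dtype=bool)])
    # Chance that a random other point belongs to the test set.
    p_heads = (len(test) - 1) / (len(pooled) - 1)
    pvals = np.empty(len(test))
    for i, x in enumerate(test):
        dist = np.linalg.norm(pooled - x, axis=1)
        dist[len(reference) + i] = np.inf      # exclude the point itself
        nearest = np.argsort(dist)[:k]
        heads = int(from_test[nearest].sum())
        pvals[i] = binomial_upper_tail(heads, k, p_heads)
    return pvals

# Hypothetical data: both sets share a Gaussian background; the test set
# additionally contains a small, tight over-density near (3, 3).
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 2))
background = rng.normal(size=(480, 2))
cluster = rng.normal(loc=3.0, scale=0.05, size=(20, 2))
test = np.vstack([background, cluster])

pvals = neighbour_label_pvalues(reference, test)
```

In this toy setting the last 20 entries of `pvals` (the injected cluster) should be far smaller than the typical background value, because almost all of their neighbours carry the test-set label.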


Barrier Certificates for Unknown Systems with Latent States and Polynomial Dynamics using Bayesian Inference

arXiv.org Machine Learning

Certifying safety in dynamical systems is crucial, but barrier certificates -- widely used to verify that system trajectories remain within a safe region -- typically require explicit system models. When dynamics are unknown, data-driven methods can be used instead, yet obtaining a valid certificate requires rigorous uncertainty quantification. For this purpose, existing methods usually rely on full-state measurements, limiting their applicability. This paper proposes a novel approach for synthesizing barrier certificates for unknown systems with latent states and polynomial dynamics. A Bayesian framework is employed, where a prior in state-space representation is updated using input-output data via a targeted marginal Metropolis-Hastings sampler. The resulting samples are used to construct a candidate barrier certificate through a sum-of-squares program. It is shown that if the candidate satisfies the required conditions on a test set of additional samples, it is also valid for the true, unknown system with high probability. The approach and its probabilistic guarantees are illustrated through a numerical simulation. Ensuring the safety of dynamical systems is a critical concern in applications such as human-robot interaction, autonomous driving, and medical devices, where failures can lead to severe consequences. In such scenarios, safety constraints typically mandate that the system state remains within a predefined allowable region. Barrier certificates [1] provide a systematic framework for verifying safety by establishing mathematical conditions that guarantee that system trajectories remain within these regions.


Identifying Obfuscated Code through Graph-Based Semantic Analysis of Binary Code

arXiv.org Machine Learning

Protecting sensitive program content is a critical issue in various situations, ranging from legitimate use cases to unethical contexts. Obfuscation is one of the most used techniques to ensure such protection. Consequently, attackers must first detect and characterize obfuscation before launching any attack against it. This paper investigates the problem of function-level obfuscation detection using graph-based approaches, comparing algorithms, from elementary baselines to promising techniques like GNN (Graph Neural Networks), on different feature choices. We consider various obfuscation types and obfuscators, resulting in two complex datasets. Our findings demonstrate that GNNs need meaningful features that capture aspects of function semantics to outperform baselines. Our approach shows satisfactory results, especially in a challenging 11-class classification task and in a practical malware analysis example.


On Model Protection in Federated Learning against Eavesdropping Attacks

arXiv.org Machine Learning

In this study, we investigate the protection offered by Federated Learning algorithms against eavesdropping adversaries. In our model, the adversary is capable of intercepting model updates transmitted from clients to the server, enabling it to create its own estimate of the model. Unlike previous research, which predominantly focuses on safeguarding client data, our work shifts attention to protecting the client model itself. Through a theoretical analysis, we examine how various factors--such as the probability of client selection, the structure of local objective functions, global aggregation at the server, and the eavesdropper's capabilities--impact the overall level of protection. We further validate our findings through numerical experiments, assessing the protection by evaluating the model accuracy achieved by the adversary. Finally, we compare our results with methods based on differential privacy, underscoring their limitations in this specific context. Traditionally, deep learning techniques require centralized data collection and processing that may be infeasible in collaborative scenarios, such as healthcare, credit scoring, vehicle fleet learning, internet-of-things, e-commerce, and natural language processing, due to the high scalability of modern networks, growing sensitive data privacy concerns, and legal regulations such as GDPR [1]-[3]. In these domains, data is often distributed among multiple parties of interest, with no single trusted authority. Federated Learning (FL) has emerged as a distributed collaborative learning paradigm, which allows coordination among multiple clients to perform training without sharing raw data. Instead, they participate in the learning process by training models locally and sharing only the model parameters with a central server. This server aggregates the updates and redistributes the improved model to all participants [4], [5].
Based on the distribution/partition of data among the clients, FL can be classified into horizontal (HFL), vertical (VFL), and transfer (TFL) federated learning [1], [6].
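The train-locally, aggregate-at-the-server loop described above can be sketched as follows. This is a generic federated-averaging illustration on a toy linear-regression problem, not the setup analysed in the paper: the size-weighted average, the learning rate, the number of rounds, and all function names are assumptions. The per-client arrays in `updates` are what an eavesdropper in the paper's threat model would be intercepting.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=20):
    """A client refines the global weights on its own data (least squares)."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """The server collects client models and averages them by dataset size."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

# Hypothetical clients holding different amounts of data from the same model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(10):
    w = federated_round(w, clients)   # raw data never leaves the clients
```

Only model parameters cross the network in this sketch, which is why the question of what an interceptor can reconstruct from them is separate from the question of data privacy.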


On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves

arXiv.org Machine Learning

We study the geometry of Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves in binary classification problems. The key finding is that many of the most commonly used binary classification metrics are merely functions of the composition function $G := F_p \circ F_n^{-1}$, where $F_p(\cdot)$ and $F_n(\cdot)$ are the class-conditional cumulative distribution functions of the classifier scores in the positive and negative classes, respectively. This geometric perspective facilitates the selection of operating points, understanding the effect of decision thresholds, and comparison between classifiers. It also helps explain how the shapes and geometry of ROC/PR curves reflect classifier behavior, providing objective tools for building classifiers optimized for specific applications with context-specific constraints. We further explore the conditions for classifier dominance, present analytical and numerical examples demonstrating the effects of class separability and variance on ROC and PR geometries, and derive a link between the positive-to-negative class leakage function $G(\cdot)$ and the Kullback--Leibler divergence. The framework highlights practical considerations, such as model calibration, cost-sensitive optimization, and operating point selection under real-world capacity constraints, enabling more informed approaches to classifier deployment and decision-making.
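As a concrete instance of the composition function, the sketch below assumes Gaussian scores, with negatives distributed as N(0, 1) and positives as N(mu, 1), and a classifier that predicts positive when the score exceeds the threshold. Under those assumptions (which are ours, not necessarily the paper's conventions), the false positive rate at threshold t is 1 - F_n(t) and the true positive rate is 1 - F_p(t), so the ROC curve is TPR = 1 - G(1 - FPR). The numerically integrated area is compared with the standard equal-variance binormal result AUC = Phi(mu / sqrt(2)).

```python
from statistics import NormalDist
import numpy as np

std_normal = NormalDist()  # score distribution of the negative class (assumed)

def G(u, mu):
    """G = F_p ∘ F_n^{-1} for negatives ~ N(0, 1) and positives ~ N(mu, 1)."""
    return NormalDist(mu, 1.0).cdf(std_normal.inv_cdf(u))

def roc_tpr(fpr, mu):
    """True positive rate as a function of false positive rate, via G."""
    return 1.0 - G(1.0 - fpr, mu)

mu = 1.5  # hypothetical class separation
fpr_grid = np.linspace(1e-6, 1 - 1e-6, 20001)
tpr = np.array([roc_tpr(f, mu) for f in fpr_grid])

# Trapezoidal area under the ROC curve, and the closed form for comparison.
auc_numeric = float(np.sum((tpr[1:] + tpr[:-1]) / 2 * np.diff(fpr_grid)))
auc_closed = std_normal.cdf(mu / 2 ** 0.5)
```

Setting `mu = 0` makes the two class distributions identical, so `G` is the identity and the ROC curve collapses to the chance diagonal.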


UK needs to relax AI laws or risk transatlantic ties, thinktank warns

The Guardian

To enforce a strict licensing model, the UK would also need to restrict access to models that have been trained on such content, which could include US-owned AI systems. With the Trump administration signalling it will not pursue strict AI regulations and China pursuing AI growth at "breakneck speed", the UK could weaken its economic and national security interests by lagging in the AI race, said TBI. "If the UK imposes laws that are too strict, it risks falling behind in the AI-driven economy and weakening its capacity to protect national security interests," said TBI. The report said arguing that commercial AI models cannot be trained on content from the open web was close to saying knowledge workers -- a broad category of professionals ranging from lawyers to researchers -- cannot profit from insights they get when reading the same content. Rather than fighting to uphold outdated regulations, said TBI, rights holders and policymakers should help build a future where creativity is valued alongside AI innovation. Fernando Garibay, a record producer who has worked with artists including Lady Gaga and U2, said history has been dotted with "end-of-time claims" related to technological breakthroughs, from the printing press to music streaming.


Ghibli effect: ChatGPT usage hits record after rollout of viral feature

The Japan Times

The frenzy to create Ghibli-style AI art using ChatGPT's image-generation tool led to a record surge in users for OpenAI's chatbot last week, straining its servers and temporarily limiting the feature's usage. The viral trend saw users from across the globe flood social media with images based on the hand-drawn style of the famed Japanese animation outfit, Studio Ghibli, founded by renowned director Hayao Miyazaki and known for movies such as "Spirited Away" and "My Neighbor Totoro." Average weekly active users breached the 150 million mark for the first time this year, according to data from market research firm Similarweb.


Intel's new CEO vows to run chipmaker 'as a startup, on day one'

ZDNet

On Monday, chip giant Intel's new CEO, Lip-Bu Tan -- who took over from outgoing CEO Pat Gelsinger only 15 days earlier -- laid out in broad terms his strategy to return the company to greatness. Speaking at Intel Vision, the company's annual event for customers and partners in Las Vegas, Tan emphasized changing Intel's culture, promising to run the company "as a startup, on day one." Tan said the culture needs changing because Intel has lost much of its engineering focus over the years. "Intel has lost some of this talent over the years," he said. "I want to re-group the talent and attract some of the new talent." Recalling his affection for basketball and California's Golden State Warriors, Tan remarked, "I love the game, how they pass the ball to the teammate to receive it -- this is the kind of team I would like to build." All of the culture remake, he said, is necessary to "pull together strong teams to correct the past mistakes and start to earn your trust." Tan put Intel's problems front and center. Although he did not enumerate the mistakes in detail, it is well known to investors and to the industry at large that Intel has lost an enormous amount of market share to AMD over the years and has ceded the artificial intelligence battle to Nvidia. "It has been a tough period for quite a long time for Intel," observed Tan. "It was very hard for me to watch its struggle; I simply cannot stay on the sideline knowing that I could help turn things around." Addressing the customers in the room, Tan remarked, "You deserve better, and we need to improve -- and we will." He asked the audience to "please be brutally honest with us."


Intel: 'Panther Lake' will be our hybrid hero for the PC

PCWorld

Intel executives pledged Tuesday that its upcoming Panther Lake chip will combine the best aspects of its earlier processors, Lunar Lake and Arrow Lake. Intel executives spoke in Las Vegas on the second day of its Intel Vision conference, which engages Intel's partners and customers. Intel's new chief executive Lip-Bu Tan outlined his plans for Intel's new direction on Monday, asking for brutal honesty while pledging to return Intel to greatness. We already knew that Panther Lake would be a critical product for Intel this year. Not only is the chip the next iteration of Intel's PC client roadmap, but it's the first chip on Intel's next-generation 18A manufacturing process.


The Arlo Video Doorbell is still 54% off after the Amazon Big Spring Sale

Mashable

SAVE $70: The Arlo Video Doorbell is on sale at Amazon for just $59.99, down from the list price of $129.99. That's a 54% discount that matches the lowest price we've ever seen at Amazon. The Amazon Big Spring Sale is officially over, but Amazon has an olive branch for those of us who didn't get around to shopping the sale. If you have plans to ramp up home security, check out this still-live Amazon Spring Sale deal. As of April 1, the Arlo Video Doorbell is still just $59.99, marked down from the normal price of $129.99.