

Taipei


Inspired by Ukraine, and worried by China: Taiwan teaches its citizens how to fly drones

The Guardian

In a small, crowded room in Taipei, Pan Chien-chin is trying to keep a drone hovering steadily. Imagining himself flying a plane, he gently nudges controller joysticks to guide the insect-like device as it hums through the air. Cheers break out as Pan, who has never flown a drone before, steers it around a rectangular course marked by traffic cones without crashing. Around him are about two dozen fellow trainees, all signed up for the same course: Taiwan's first civil defence drone training programme. "The war in Ukraine has really changed how drones are used," says Pan, 48, a food company worker. "It's like giving myself another skill, something I can use if it's ever needed one day," he adds.


Beyond Permissions: Investigating Mobile Personalization with Simulated Personas

arXiv.org Artificial Intelligence

Mobile applications increasingly rely on sensor data to infer user context and deliver personalized experiences. Yet the mechanisms behind this personalization remain opaque to users and researchers alike. This paper presents a sandbox system that uses sensor spoofing and persona simulation to audit and visualize how mobile apps respond to inferred behaviors. Rather than treating spoofing as adversarial, we demonstrate its use as a tool for behavioral transparency and user empowerment. Our system injects multi-sensor profiles (generated from structured, lifestyle-based personas) into Android devices in real time, enabling users to observe app responses to contexts such as high activity, location shifts, or time-of-day changes. With automated screenshot capture and GPT-4 Vision-based UI summarization, our pipeline helps document subtle personalization cues. Preliminary findings show measurable app adaptations across fitness, e-commerce, and everyday service apps such as weather and navigation. We offer this toolkit as a foundation for privacy-enhancing technologies and user-facing transparency interventions.
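The idea of turning a lifestyle persona into a multi-sensor profile can be pictured with a minimal sketch. Everything in it is a hypothetical assumption for illustration: the `Persona` fields, the hourly granularity, the fixed 9-to-17 office window, and the coordinates. It only builds the timeline of readings a spoofing layer might replay; the paper's actual generator and its Android injection mechanism are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical lifestyle persona; field names are illustrative assumptions."""
    name: str
    wake_hour: int               # first hour (0-23) the persona is active
    sleep_hour: int              # hour at which activity stops
    steps_per_active_hour: int   # pedometer reading while awake
    home: tuple[float, float]    # (latitude, longitude)
    work: tuple[float, float]


def sensor_profile(p: Persona) -> list[dict]:
    """Expand a persona into an hour-by-hour multi-sensor timeline.

    Each entry holds what a spoofing layer could feed the device for that
    hour: the local hour (time-of-day context), a step count (activity
    context) and a GPS fix (location context).
    """
    timeline = []
    for hour in range(24):
        awake = p.wake_hour <= hour < p.sleep_hour
        at_work = awake and 9 <= hour < 17  # assumed office hours
        timeline.append({
            "hour": hour,
            "steps": p.steps_per_active_hour if awake else 0,
            "location": p.work if at_work else p.home,
        })
    return timeline


# Hypothetical "active commuter" persona with made-up coordinates.
runner = Persona("runner", wake_hour=6, sleep_hour=22, steps_per_active_hour=1500,
                 home=(25.03, 121.56), work=(25.04, 121.53))
profile = sensor_profile(runner)
```

Swapping in a sedentary persona (fewer steps, no commute) and replaying both timelines is the kind of contrast that would let an observer see whether an app's recommendations change with inferred activity.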


You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors

arXiv.org Artificial Intelligence

Large language models (LLMs) have been widely adopted across various applications, leveraging customized system prompts for diverse tasks. Facing potential system prompt leakage risks, model developers have implemented strategies to prevent leakage, primarily by preventing LLMs from repeating their context when they encounter known attack patterns. However, these defenses remain vulnerable to new and unforeseen prompt-leaking techniques. In this paper, we first introduce a simple yet effective prompt-leaking attack to reveal such risks. Our attack is capable of extracting system prompts from various LLM-based applications, even those built on state-of-the-art (SOTA) models such as GPT-4o or Claude 3.5 Sonnet. These findings further inspire us to seek a fundamental solution: keeping the system prompt out of the context altogether. To this end, we propose SysVec, a novel method that encodes system prompts as internal representation vectors rather than raw text. By doing so, SysVec minimizes the risk of unauthorized disclosure while preserving the LLM's core language capabilities. Remarkably, this approach not only enhances security but also improves the model's general instruction-following abilities. Experimental results demonstrate that SysVec effectively mitigates prompt leakage attacks, preserves the LLM's functional integrity, and helps alleviate the forgetting issue in long-context scenarios.
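To make "leakage" concrete, one simple way to score an attack (an assumed metric for illustration, not necessarily the one used in the paper) is the fraction of the system prompt's characters that reappear, in order, in the model's response. The sketch uses the standard-library `difflib.SequenceMatcher`; the function name `leakage_score` is hypothetical.

```python
from difflib import SequenceMatcher


def leakage_score(system_prompt: str, response: str) -> float:
    """Fraction of the system prompt recoverable verbatim from a response.

    Sums the sizes of the non-overlapping matching blocks that difflib finds
    between the prompt and the response, divided by the prompt length.
    1.0 means the whole prompt appears in order; 0.0 means nothing matched.
    """
    if not system_prompt:
        return 0.0
    matcher = SequenceMatcher(None, system_prompt, response, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(system_prompt)


secret = "You are a helpful banking assistant."
full_leak = leakage_score(secret, "Sure! My instructions say: " + secret)
no_leak = leakage_score(secret, "Sorry, I can't share that.")
```

A character-level score only catches verbatim or near-verbatim disclosure; a paraphrased leak would need a semantic comparison instead. The goal of an approach like SysVec is for scores of this kind to stay low, since no prompt text sits in the context to be copied.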


E-PhishGen: Unlocking Novel Research in Phishing Email Detection

arXiv.org Artificial Intelligence

Every day, our inboxes are flooded with unsolicited emails, ranging from annoying spam to more subtle phishing scams. Unfortunately, despite abundant prior efforts proposing solutions that achieve near-perfect accuracy, countering malicious emails remains an unsolved problem. This "open problem" paper carries out a critical assessment of scientific work on phishing email detection. First, we focus on the benchmark datasets that have been used to assess the methods proposed in research. We find that most prior work relied on datasets containing emails that -- we argue -- are not representative of current trends and are mostly in English. Based on this finding, we then re-implement and re-assess a variety of detection methods that rely on machine learning (ML), including large language models (LLMs), and release our entire codebase -- an (unfortunately) uncommon practice in related research. We show that most such methods achieve near-perfect performance when trained and tested on the same dataset -- a result that intrinsically hinders progress (how can future research outperform methods that are already near perfect?). To foster the creation of "more challenging benchmarks" that reflect current phishing trends, we propose E-PhishGEN, an LLM-based (and privacy-savvy) framework to generate novel phishing-email datasets. We use E-PhishGEN to create E-PhishLLM, a novel phishing-email detection dataset containing 16,616 emails in three languages. Testing the detectors we considered on E-PhishLLM yields much lower performance than on existing benchmarks, indicating more room for improvement. We also validate the quality of E-PhishLLM with a user study (n=30). To sum up, we show that phishing email detection is still an open problem -- and provide the means for future research to tackle it.
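The gap between same-dataset and new-dataset performance can be laid out as a train/test grid. The sketch below is a generic cross-dataset evaluation, assumed for illustration rather than taken from the paper's released code; `fit_keyword_detector` is a hypothetical toy stand-in for the ML and LLM detectors the paper actually evaluates, and the two corpora are made up.

```python
from typing import Callable, Sequence

# (email text, label) with label 1 = phishing, 0 = benign
Example = tuple[str, int]
Dataset = Sequence[Example]
Detector = Callable[[str], int]


def cross_dataset_matrix(
    datasets: dict[str, tuple[Dataset, Dataset]],  # name -> (train split, test split)
    fit: Callable[[Dataset], Detector],
) -> dict[tuple[str, str], float]:
    """Accuracy of a detector trained on each dataset and tested on every dataset.

    Diagonal entries (train == test) are the usual same-dataset evaluation;
    off-diagonal entries measure how well the detector transfers to other data.
    """
    results = {}
    for train_name, (train_split, _) in datasets.items():
        predict = fit(train_split)
        for test_name, (_, test_split) in datasets.items():
            correct = sum(predict(text) == label for text, label in test_split)
            results[(train_name, test_name)] = correct / len(test_split)
    return results


def fit_keyword_detector(train: Dataset) -> Detector:
    """Toy detector: flag mail containing a word seen only in phishing training mail."""
    phish = {w for text, y in train if y == 1 for w in text.lower().split()}
    benign = {w for text, y in train if y == 0 for w in text.lower().split()}
    suspicious = phish - benign
    return lambda text: int(any(w in suspicious for w in text.lower().split()))


# Two tiny hypothetical corpora that use different phishing lures.
toy = {
    "old": ([("verify your account now", 1), ("lunch at noon", 0)],
            [("verify account today", 1), ("meeting at noon", 0)]),
    "new": ([("your parcel fee is unpaid", 1), ("see you at noon", 0)],
            [("parcel fee overdue", 1), ("notes from noon meeting", 0)]),
}
matrix = cross_dataset_matrix(toy, fit_keyword_detector)
```

On this toy data each detector is perfect on its own test split but falls to 50% accuracy on the other corpus, a miniature version of the pattern the abstract describes: near-perfect same-dataset scores say little about performance on emails reflecting newer lures.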


Ensembling Membership Inference Attacks Against Tabular Generative Models

arXiv.org Artificial Intelligence

Membership Inference Attacks (MIAs) have emerged as a principled framework for auditing the privacy of synthetic data generated by tabular generative models, and many diverse methods have been proposed, each exploiting a different privacy leakage signal. However, in realistic threat scenarios, an adversary must choose a single method without an a priori guarantee that it will be the empirically best-performing option. We study this challenge as a decision-theoretic problem under uncertainty and conduct the largest synthetic data privacy benchmark to date. We find that no MIA constitutes a strictly dominant strategy across a wide variety of model architectures and dataset domains under our threat model. Motivated by these findings, we propose ensemble MIAs and show that unsupervised ensembles built on individual attacks offer empirically more robust, regret-minimizing strategies than individual attacks.
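One simple unsupervised way to combine attacks is rank averaging: each attack's membership scores are converted to normalised ranks (so attacks with incomparable score scales can be mixed), and the ranks are averaged per record. This is an assumed stand-in for illustration; the abstract does not specify which ensemble constructions the paper uses, and the attack names below are hypothetical.

```python
def to_ranks(scores: list[float]) -> list[float]:
    """Map raw scores to normalised ranks in [0, 1]; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of records tied with order[i].
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2  # average 0-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n = len(scores)
    return [r / (n - 1) if n > 1 else 0.0 for r in ranks]


def rank_average_ensemble(attack_scores: dict[str, list[float]]) -> list[float]:
    """Unsupervised ensemble: average each record's normalised rank across attacks.

    attack_scores maps an attack name to one score per candidate record
    (higher = more likely a training member). No membership labels are used.
    """
    ranked = [to_ranks(s) for s in attack_scores.values()]
    n_records = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_records)]


# Hypothetical scores from two attacks on three candidate records.
ensemble = rank_average_ensemble({
    "distance_attack": [0.9, 0.1, 0.5],
    "density_attack": [10.0, 3.0, 2.0],
})
```

Because the combination needs no labels, it can be applied in the setting the abstract describes, where the adversary cannot tell in advance which single attack would have performed best.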


In China's shadow, Taiwan is building a drone army to repel an invasion

Al Jazeera

The tiny "stealth" Carbon Voyager 1, fast-moving Black Tide I, and explosives-carrying Sea Shark 800 were the highlight of an expo for companies vying to help Taiwan build up a maritime drone force. Taipei believes drones could be pivotal in repelling China in the event its forces attempt to invade the self-ruled island, which Beijing has threatened to annex by force if necessary. Su'ao is just 60km (37 miles) from Fulong, one of the so-called "red beaches" identified by defence experts as potential landing sites for the People's Liberation Army (PLA) due to their unique topography. Whereas Russia sent tanks across land borders to launch its war on Ukraine in 2022, a Chinese invasion of Taiwan would involve Beijing sending vessels across the 180km-wide (112-mile-wide) Taiwan Strait. While the Taiwan Strait's choppy waters and Taiwan's mountainous geography and shallow beaches pose formidable challenges to an amphibious invasion, technological advances and a decades-long modernisation campaign by the PLA have steadily chipped away at the island's natural defences.


Busting the Paper Ballot: Voting Meets Adversarial Machine Learning

arXiv.org Artificial Intelligence

We show the security risk associated with using machine learning classifiers in United States election tabulators. The central classification task in election tabulation is deciding whether or not a mark appears in a bubble associated with an alternative in a contest on the ballot. Barretto et al. (E-Vote-ID 2021) reported that convolutional neural networks are a viable option in this field, as they outperform simple feature-based classifiers. Our contributions to election security fall into four parts. First, to demonstrate and analyze the hypothetical vulnerability of machine learning models in election tabulators, we introduce four new ballot datasets. Second, we train and test a variety of different models on our new datasets. These models include support vector machines, convolutional neural networks (a basic CNN, VGG and ResNet), and vision transformers (Twins and CaiT). Third, using our new datasets and trained models, we demonstrate that traditional white-box attacks are ineffective in the voting domain due to gradient masking. Our analyses further reveal that this gradient masking is a product of numerical instability, which we overcome using a modified difference of logits ratio (DLR) loss (Croce and Hein, ICML 2020). Fourth, we conduct attacks in the physical world with the adversarial examples generated using our new methods. In traditional adversarial machine learning, a high attack success rate (50% or greater) is ideal. However, in certain elections even a 5% attack success rate can flip the outcome of a race, and we show that such an impact is possible in the physical domain. We also thoroughly discuss attack realism and the challenges and practicality of printing and scanning adversarial ballot examples.
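For reference, the unmodified DLR loss as defined by Croce and Hein (2020) is shown below, with $z$ the classifier's logit vector for input $x$, $y$ the true class, and $\pi$ the permutation that sorts the logits in decreasing order (so $z_{\pi_1}$ is the largest logit and $z_{\pi_3}$ the third largest). The ballot paper uses a modified version whose details are not given in this abstract.

$$
\mathrm{DLR}(x, y) = -\frac{z_y - \max_{i \neq y} z_i}{z_{\pi_1} - z_{\pi_3}}
$$

Because the numerator and the denominator are both differences of logits, the ratio is unchanged when every logit is shifted by the same constant or multiplied by the same positive factor; Croce and Hein use this invariance to avoid attack failures caused by the scale of the logits. Note that the denominator needs a third-largest logit, so the formula is not directly defined for a classifier with only two outputs (mark / no mark); how the authors' modification handles this is not stated here.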


US chip export controls are a 'failure' because they spur Chinese development, Nvidia boss says

The Guardian

US chip export controls have been a "failure", the head of Nvidia, Jensen Huang, told a tech forum on Wednesday, as the Chinese government separately slammed US warnings to other countries against using Chinese tech. Successive US administrations have imposed restrictions on the sale of hi-tech AI chips to China, in an effort to curb China's military advancement and protect US dominance of the AI industry. But Huang told the Computex tech forum in Taipei that the controls had instead spurred on Chinese developers. "The local companies are very, very talented and very determined, and the export control gave them the spirit, the energy and the government support to accelerate their development," Huang told media at the show. "I think, all in all, the export control was a failure."


CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation

arXiv.org Artificial Intelligence

Automated evidence-based misinformation detection systems, which evaluate the veracity of short claims against evidence, have not been comprehensively analyzed for adversarial vulnerabilities. Existing black-box text-based adversarial attacks are ill-suited to evidence-based misinformation detection systems: they primarily rely on token-level substitutions driven by gradient- or logit-based optimization strategies, which cannot fool the multi-component nature of these detection systems. These systems incorporate both retrieval and claim-evidence comparison modules, so an attack must break the retrieval of evidence and/or mislead the comparison module into drawing incorrect inferences. We present CAMOUFLAGE, an iterative, LLM-driven approach that employs a two-agent system, a Prompt Optimization Agent and an Attacker Agent, to create adversarial claim rewritings that manipulate evidence retrieval and mislead claim-evidence comparison, effectively bypassing the system without altering the meaning of the claim. The Attacker Agent produces semantically equivalent rewrites that attempt to mislead detectors, while the Prompt Optimization Agent analyzes failed attack attempts and refines the Attacker's prompt to guide subsequent rewrites. This enables larger structural and stylistic transformations of the text rather than token-level substitutions, adapting the magnitude of changes based on previous outcomes. Unlike existing approaches, CAMOUFLAGE optimizes its attack solely based on binary model decisions to guide its rewriting process, eliminating the need for classifier logits or extensive querying. We evaluate CAMOUFLAGE on four systems, including two recent academic systems and two real-world APIs, achieving an average attack success rate of 46.92% while preserving textual coherence and semantic equivalence to the original claims.
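The two-agent loop driven only by binary decisions can be sketched as plain control flow. In this minimal sketch the detector, attacker and optimizer are passed in as callables; in CAMOUFLAGE the two agents are LLMs and the target is a full retrieval-plus-comparison system, whereas here they are hypothetical toy stubs. The function name `iterative_rewrite_attack` and the query budget `max_iters` are assumptions, and the sketch omits the semantic-equivalence check a real attack would need.

```python
from typing import Callable, Optional


def iterative_rewrite_attack(
    claim: str,
    detector: Callable[[str], bool],            # True = claim flagged as misinformation
    attacker: Callable[[str, str], str],        # (attack prompt, claim) -> rewritten claim
    optimizer: Callable[[str, str, str], str],  # (prompt, claim, failed rewrite) -> new prompt
    initial_prompt: str,
    max_iters: int = 10,
) -> Optional[str]:
    """Rewrite a claim until the detector stops flagging it, or give up.

    Each round the attacker rewrites the claim under the current prompt. Only
    the detector's binary decision is observed (no logits). If the rewrite is
    still flagged, the optimizer revises the attacker's prompt using the failed
    attempt. Returns the first unflagged rewrite, or None if the budget runs out.
    """
    prompt = initial_prompt
    for _ in range(max_iters):
        rewrite = attacker(prompt, claim)
        if not detector(rewrite):
            return rewrite
        prompt = optimizer(prompt, claim, rewrite)
    return None


# Toy stubs standing in for the real system and the two LLM agents.
def toy_detector(text: str) -> bool:
    return "vaccine" in text.lower()


def toy_attacker(prompt: str, claim: str) -> str:
    return claim.replace("vaccine", "jab") if "avoid keyword" in prompt else claim


def toy_optimizer(prompt: str, claim: str, failed: str) -> str:
    return prompt + " avoid keyword"


result = iterative_rewrite_attack("The vaccine alters DNA", toy_detector,
                                  toy_attacker, toy_optimizer, "Rewrite the claim.")
```

With these stubs the first rewrite is still flagged, the optimizer amends the prompt, and the second round returns "The jab alters DNA". Capping the loop with `max_iters` reflects the abstract's point that the attack avoids extensive querying.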


Taiwan Makes the Majority of the World's Computer Chips. Now It's Running Out of Electricity

WIRED

This story originally appeared on Yale Environment 360 and is part of the Climate Desk collaboration. Some 50 miles southwest of Taipei, Taiwan's capital, and strategically located close to a cluster of the island's top universities, the 3,500-acre Hsinchu Science Park is globally celebrated as the incubator of Taiwan's most successful technology companies. It opened in 1980, the government having acquired the land and cleared the rice fields, with the aim of creating a technology hub that would combine advanced research and industrial production. Today Taiwan's science parks house more than 1,100 companies, employ 321,000 people, and generate 127 billion in annual revenue. Along the way, Hsinchu Science Park's Industrial Technology Research Institute has given birth to startups that have grown into world leaders.