Goto

Collaborating Authors

 device type


Rule: Federated Rule Dataset for Rule Recommendation Benchmarking

Neural Information Processing Systems

In the rapidly evolving landscape of smart home automation, the potential of IoT devices is vast. In this realm, rules are the main tool utilized for this automation, which are predefined conditions or triggers that establish connections between devices, enabling seamless automation of specific processes. However, one significant challenge researchers face is the lack of comprehensive datasets to explore and advance the field of smart home rule recommendations. These datasets are essential for developing and evaluating intelligent algorithms that can effectively recommend rules for automating processes while preserving the privacy of the users, as it involves personal information about users' daily lives. To bridge this gap, we present the Wyze Rule Dataset, a large-scale dataset designed specifically for smart home rule recommendation research. Wyze Rule encompasses over 1 million rules gathered from a diverse user base of 300,000 individuals from Wyze Labs, offering an extensive and varied collection of real-world data. With a focus on federated learning, our dataset is tailored to address the unique challenges of a cross-device federated learning setting in the recommendation domain, featuring a large-scale number of clients with widely heterogeneous data. To establish a benchmark for comparison and evaluation, we have meticulously implemented multiple baselines in both centralized and federated settings. Researchers can leverage these baselines to gauge the performance and effectiveness of their rule recommendation systems, driving advancements in the domain.



How to use the Matter smart home standard: The best way to get all your devices talking to one another

Popular Science

There are a lot of fantastic smart home devices on the market now, but it's not always easy getting them to talk to one another and work in unison. That's where Matter comes in: It's a standardized set of rules for getting smart home kit gadgets to speak the same language. Every smart home device comes with its own apps--but what happens when you want to turn off your lights and heating with one voice command? Or want to use smart plugs from two different manufacturers? Matter aims to standardize all that.


HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning

arXiv.org Artificial Intelligence

Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. In FL, participating user-end devices are highly fragmented in terms of hardware and software configurations. Such fragmentation introduces a new type of data heterogeneity in FL, namely \textit{system-induced data heterogeneity}, as each device generates distinct data depending on its hardware and software configurations. In this paper, we first characterize the impact of system-induced data heterogeneity on FL model performance. We collect a dataset using heterogeneous devices with variations across vendors and performance tiers. By using this dataset, we demonstrate that \textit{system-induced data heterogeneity} negatively impacts accuracy, and deteriorates fairness and domain generalization problems in FL. To address these challenges, we propose HeteroSwitch, which adaptively adopts generalization techniques (i.e., ISP transformation and SWAD) depending on the level of bias caused by varying HW and SW configurations. In our evaluation with a realistic FL dataset (FLAIR), HeteroSwitch reduces the variance of averaged precision by 6.3\% across device types.


IoTFlowGenerator: Crafting Synthetic IoT Device Traffic Flows for Cyber Deception

arXiv.org Artificial Intelligence

Over the years, honeypots emerged as an important security tool to understand attacker intent and deceive attackers to spend time and resources. Recently, honeypots are being deployed for Internet of things (IoT) devices to lure attackers, and learn their behavior. However, most of the existing IoT honeypots, even the high interaction ones, are easily detected by an attacker who can observe honeypot traffic due to lack of real network traffic originating from the honeypot. This implies that, to build better honeypots and enhance cyber deception capabilities, IoT honeypots need to generate realistic network traffic flows. To achieve this goal, we propose a novel deep learning based approach for generating traffic flows that mimic real network traffic due to user and IoT device interactions. A key technical challenge that our approach overcomes is scarcity of device-specific IoT traffic data to effectively train a generator. We address this challenge by leveraging a core generative adversarial learning algorithm for sequences along with domain specific knowledge common to IoT devices. Through an extensive experimental evaluation with 18 IoT devices, we demonstrate that the proposed synthetic IoT traffic generation tool significantly outperforms state of the art sequence and packet generators in remaining indistinguishable from real traffic even to an adaptive attacker.


Quantifying and Managing Impacts of Concept Drifts on IoT Traffic Inference in Residential ISP Networks

arXiv.org Artificial Intelligence

Millions of vulnerable consumer IoT devices in home networks are the enabler for cyber crimes putting user privacy and Internet security at risk. Internet service providers (ISPs) are best poised to play key roles in mitigating risks by automatically inferring active IoT devices per household and notifying users of vulnerable ones. Developing a scalable inference method that can perform robustly across thousands of home networks is a non-trivial task. This paper focuses on the challenges of developing and applying data-driven inference models when labeled data of device behaviors is limited and the distribution of data changes (concept drift) across time and space domains. Our contributions are three-fold: (1) We collect and analyze network traffic of 24 types of consumer IoT devices from 12 real homes over six weeks to highlight the challenge of temporal and spatial concept drifts in network behavior of IoT devices; (2) We analyze the performance of two inference strategies, namely "global inference" (a model trained on a combined set of all labeled data from training homes) and "contextualized inference" (several models each trained on the labeled data from a training home) in the presence of concept drifts; and (3) To manage concept drifts, we develop a method that dynamically applies the ``closest'' model (from a set) to network traffic of unseen homes during the testing phase, yielding better performance in 20% of scenarios.


Device identification using optimized digital footprints

arXiv.org Artificial Intelligence

The rapidly increasing number of internet of things (IoT) and non-IoT devices has imposed new security challenges to network administrators. Accurate device identification in the increasingly complex network structures is necessary. In this paper, a device fingerprinting (DFP) method has been proposed for device identification, based on digital footprints, which devices use for communication over a network. A subset of nine features have been selected from the network and transport layers of a single transmission control protocol/internet protocol packet based on attribute evaluators in Weka, to generate device-specific signatures. The method has been evaluated on two online datasets, and an experimental dataset, using different supervised machine learning (ML) algorithms. Results have shown that the method is able to distinguish device type with up to 100% precision using the random forest (RF) classifier, and classify individual devices with up to 95.7% precision. These results demonstrate the applicability of the proposed DFP method for device identification, in order to provide a more secure and robust network.


Using Large Pre-Trained Language Model to Assist FDA in Premarket Medical Device

arXiv.org Artificial Intelligence

This paper proposes a possible method using natural language processing that might assist in the FDA medical device marketing process. Actual device descriptions are taken and matched with the device description in FDA Title 21 of CFR to determine their corresponding device type. Both pre-trained word embeddings such as FastText and large pre-trained sentence embedding models such as sentence transformers are evaluated on their accuracy in characterizing a piece of device description. An experiment is also done to test whether these models can identify the devices wrongly classified in the FDA database. The result shows that sentence transformer with T5 and MPNet and GPT-3 semantic search embedding show high accuracy in identifying the correct classification by narrowing down the correct label to be contained in the first 15 most likely results, as compared to 2585 types of device descriptions that must be manually searched through. On the other hand, all methods demonstrate high accuracy in identifying completely incorrectly labeled devices, but all fail to identify false device classifications that are wrong but closely related to the true label.


Studying the Robustness of Anti-adversarial Federated Learning Models Detecting Cyberattacks in IoT Spectrum Sensors

arXiv.org Artificial Intelligence

Device fingerprinting combined with Machine and Deep Learning (ML/DL) report promising performance when detecting cyberattacks targeting data managed by resource-constrained spectrum sensors. However, the amount of data needed to train models and the privacy concerns of such scenarios limit the applicability of centralized ML/DL-based approaches. Federated learning (FL) addresses these limitations by creating federated and privacy-preserving models. However, FL is vulnerable to malicious participants, and the impact of adversarial attacks on federated models detecting spectrum sensing data falsification (SSDF) attacks on spectrum sensors has not been studied. To address this challenge, the first contribution of this work is the creation of a novel dataset suitable for FL and modeling the behavior (usage of CPU, memory, or file system, among others) of resource-constrained spectrum sensors affected by different SSDF attacks. The second contribution is a pool of experiments analyzing and comparing the robustness of federated models according to i) three families of spectrum sensors, ii) eight SSDF attacks, iii) four scenarios dealing with unsupervised (anomaly detection) and supervised (binary classification) federated models, iv) up to 33% of malicious participants implementing data and model poisoning attacks, and v) four aggregation functions acting as anti-adversarial mechanisms to increase the models robustness.


Doing good by fighting fraud: Ethical anti-fraud systems for mobile payments

arXiv.org Artificial Intelligence

App builders commonly use security challenges, a form of step-up authentication, to add security to their apps. However, the ethical implications of this type of architecture has not been studied previously. In this paper, we present a large-scale measurement study of running an existing anti-fraud security challenge, Boxer, in real apps running on mobile devices. We find that although Boxer does work well overall, it is unable to scan effectively on devices that run its machine learning models at less than one frame per second (FPS), blocking users who use inexpensive devices. With the insights from our study, we design Daredevil, anew anti-fraud system for scanning payment cards that work swell across the broad range of performance characteristics and hardware configurations found on modern mobile devices. Daredevil reduces the number of devices that run at less than one FPS by an order of magnitude compared to Boxer, providing a more equitable system for fighting fraud. In total, we collect data from 5,085,444 real devices spread across 496 real apps running production software and interacting with real users.