magneto
Magneto: Combining Small and Large Language Models for Schema Matching
Liu, Yurong, Pena, Eduardo, Santos, Aecio, Wu, Eden, Freire, Juliana
Recent advances in language models have opened new opportunities to address complex schema matching tasks. Schema matching approaches have been proposed that demonstrate the usefulness of language models, but they have also uncovered important limitations: small language models (SLMs) require training data (which can be both expensive and challenging to obtain), and large language models (LLMs) often incur high computational costs and must deal with constraints imposed by context windows. We present Magneto, a cost-effective and accurate solution for schema matching that combines the advantages of SLMs and LLMs to address their limitations. By structuring the schema matching pipeline in two phases, retrieval and reranking, Magneto can use computationally efficient SLM-based strategies to derive candidate matches, which can then be reranked by LLMs, thus making it possible to reduce runtime without compromising matching accuracy. We propose a self-supervised approach to fine-tune SLMs, which uses LLMs to generate syntactically diverse training data, and prompting strategies that are effective for reranking. We also introduce a new benchmark, developed in collaboration with domain experts, which includes real biomedical datasets and presents new challenges to schema matching methods. Through a detailed experimental evaluation, using both our new and existing benchmarks, we show that Magneto is scalable and attains high accuracy for datasets from different domains.
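The two-phase structure described in the abstract above (cheap retrieval of candidates, then reranking by a stronger model) can be illustrated with a minimal sketch. This is not Magneto's implementation: character n-gram Jaccard similarity stands in for the SLM-based retriever, a caller-supplied `scorer` function stands in for the LLM reranker, and every function name and column name below is hypothetical.

```python
from __future__ import annotations

from typing import Callable


def char_ngrams(name: str, n: int = 3) -> set[str]:
    """Lowercased character n-grams, padded so short names still yield grams."""
    padded = f"#{name.lower()}#"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(source_col: str, target_cols: list[str], k: int = 3) -> list[tuple[str, float]]:
    """Phase 1 (assumed stand-in for the SLM): cheap top-k candidate retrieval."""
    src = char_ngrams(source_col)
    scored = [(t, jaccard(src, char_ngrams(t))) for t in target_cols]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(
    source_col: str,
    candidates: list[tuple[str, float]],
    scorer: Callable[[str, str], float],
) -> list[tuple[str, float]]:
    """Phase 2 (assumed stand-in for the LLM): rescore only the retrieved candidates."""
    rescored = [(t, scorer(source_col, t)) for t, _ in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored


def match(
    source_col: str,
    target_cols: list[str],
    scorer: Callable[[str, str], float],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Run retrieval, then rerank the k survivors with the expensive scorer."""
    return rerank(source_col, retrieve(source_col, target_cols, k), scorer)


# Toy data: hypothetical column names and a hand-written scorer that plays
# the role of the expensive model's judgement.
targets = ["age_of_patient", "patient_id", "tumor_stage", "gender"]
known_pairs = {("patient_age", "age_of_patient")}


def toy_scorer(source: str, target: str) -> float:
    return 1.0 if (source, target) in known_pairs else 0.0


ranked = match("patient_age", targets, toy_scorer, k=3)
```

In this toy run, string overlap alone ranks `patient_id` first, and the reranker promotes `age_of_patient`. The expensive scorer is called only `k` times per source column rather than once per target column, which is the cost argument the abstract makes.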
MAGNETO: Edge AI for Human Activity Recognition -- Privacy and Personalization
Zuo, Jingwei, Arvanitakis, George, Ndhlovu, Mthandazo, Hacid, Hakim
Human activity recognition (HAR) is a well-established field, significantly advanced by modern machine learning (ML) techniques. While companies have successfully integrated HAR into consumer products, they typically rely on a predefined activity set, which limits personalization at the user level (edge devices). Despite advancements in Incremental Learning for updating models with new data, this often occurs in the Cloud, necessitating regular data transfers between cloud and edge devices, thus leading to data privacy issues. In this paper, we propose MAGNETO, an Edge AI platform that pushes HAR tasks from the Cloud to the Edge. MAGNETO allows incremental human activity learning directly on the Edge devices, without any data exchange with the Cloud. This enables strong privacy guarantees, low processing latency, and a high degree of personalization for users. In particular, we demonstrate MAGNETO on an Android device, validating the whole pipeline from data collection to result visualization.
This Fallout TV Show Is a Terrible Idea--Unless It's a Comedy
Ever since Cats of Zero Wing delivered the oddly worded threat "all your base are belong to us" some 30 years ago, the writing in video games has been received with varying levels of enthusiasm. Often, it's denounced as stilted, hackneyed, and just plain nonsensical. At the same time, it has become a much loved, instantly recognizable genre unto itself. While the earliest iconically bad dialog mostly derived from poor translations--like Magneto in the 1992 X-Men arcade game introducing himself as "Magneto, master of magnet!" and shouting "Welcome … to die!"--a lot of it has been terrible all on its own: Peter Dinklage, for example, tried to take a subtle approach to the lines he was fed in Destiny and sounded unmistakably like he'd been drugged. Infamously, Hollywood has spent billions of dollars trying to adapt game franchises into movies and TV shows, yet decades since a goggling Dennis Hopper horrified children across the world with his turn as Nintendo's Bowser, it still hasn't succeeded.
MAGNETO: Fingerprinting USB Flash Drives via Unintentional Magnetic Emissions
Ibrahim, Omar Adel, Sciancalepore, Savio, Oligeri, Gabriele, Di Pietro, Roberto
Universal Serial Bus (USB) flash drives are nowadays one of the most convenient and widespread means to transfer files, especially when no Internet connection is available. However, USB flash drives are also one of the most common attack vectors used to gain unauthorized access to host devices. For instance, it is possible to replace a USB drive so that, when the USB key is connected, it installs password-stealing tools, rootkit software, and other disruptive malware. In this way, an attacker can steal sensitive information via the USB-connected devices, as well as inject any kind of malicious software into the host. To thwart these rising threats, we propose MAGNETO, an efficient, non-interactive, and privacy-preserving framework to verify the authenticity of a USB flash drive, rooted in the analysis of its unintentional magnetic emissions. We show that the magnetic emissions radiated during boot operations on a specific host are unique for each device, and sufficient to uniquely fingerprint both the brand and the model of the USB flash drive, or the specific USB device, depending on the equipment used. Our investigation of 59 different USB flash drives (belonging to 17 brands, including the top brands purchased on Amazon in mid-2019) reveals a minimum classification accuracy of 98.2% in the identification of both brand and model, accompanied by a negligible time and computational overhead. MAGNETO can also identify the specific USB flash drive, with a minimum classification accuracy of 91.2%. Overall, MAGNETO proves that unintentional magnetic emissions can be considered a viable and reliable means to fingerprint read-only USB flash drives. Finally, future research directions in this domain are also discussed.