Greenstadt, Rachel
Can deepfakes be created by novice users?
Mehta, Pulak, Jagatap, Gauri, Gallagher, Kevin, Timmerman, Brian, Deb, Progga, Garg, Siddharth, Greenstadt, Rachel, Dolan-Gavitt, Brendan
Recent advancements in machine learning and computer vision have led to the proliferation of Deepfakes. As technology democratizes over time, there is an increasing fear that novice users can create Deepfakes to discredit others and undermine public discourse. In this paper, we conduct user studies to understand whether participants with advanced computer skills and varying levels of computer science expertise can create Deepfakes of a person saying a target statement using limited media files. We conduct two studies. In the first study (n = 39), participants try creating a target Deepfake in a constrained time frame using any tool they desire. In the second study (n = 29), participants use pre-specified deep learning-based tools to create the same Deepfake. We find that in the first study, 23.1% of the participants successfully created complete Deepfakes with audio and video, whereas in the second study, 58.6% of the participants were successful in stitching target speech to the target video. We further use Deepfake detection software tools as well as human examiner-based analysis to classify the successfully generated Deepfake outputs as fake, suspicious, or real. The software detector classified 80% of the Deepfakes as fake, whereas the human examiners classified 100% of the videos as fake. We conclude that creating Deepfakes is a simple enough task for a novice user given adequate tools and time; however, the resulting Deepfakes are not sufficiently real-looking and are unable to completely fool detection software or human examiners.
Active Authentication on Mobile Devices via Stylometry, Application Usage, Web Browsing, and GPS Location
Fridman, Lex, Weber, Steven, Greenstadt, Rachel, Kam, Moshe
Active authentication is the problem of continuously verifying the identity of a person based on behavioral aspects of their interaction with a computing device. In this study, we collect and analyze behavioral biometrics data from 200 subjects, each using their personal Android mobile device for a period of at least 30 days. This dataset is novel in the context of active authentication due to its size, duration, number of modalities, and absence of restrictions on tracked activity. The geographical colocation of the subjects in the study is representative of a large closed-world environment such as an organization, where the unauthorized user of a device is likely to be an insider threat, coming from within the organization. We consider four biometric modalities: (1) text entered via soft keyboard, (2) applications used, (3) websites visited, and (4) physical location of the device as determined from GPS (when outdoors) or WiFi (when indoors). We implement and test a classifier for each modality and organize the classifiers as a parallel binary decision fusion architecture. We are able to characterize the performance of the system with respect to intruder detection time and to quantify the contribution of each modality to the overall performance.
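As an illustration of what a parallel binary decision fusion architecture can look like, the sketch below combines one yes/no decision per modality using a log-likelihood-ratio weighted sum (the Chair-Varshney rule, a standard choice for this architecture). The abstract does not state which fusion rule the authors use, so the rule itself, the function name `fuse_decisions`, and all error rates shown are assumptions made for illustration only.

```python
import math

def fuse_decisions(decisions, p_false_alarm, p_miss, prior_intruder=0.5):
    """Fuse binary per-modality decisions into one global decision.

    decisions[i]      -- True if classifier i flags an intruder (hypothetical input)
    p_false_alarm[i]  -- assumed P(classifier i flags | legitimate user)
    p_miss[i]         -- assumed P(classifier i does not flag | intruder)
    Returns (is_intruder, log_likelihood_ratio).
    """
    # Start from the prior log-odds of an intruder being present.
    llr = math.log(prior_intruder / (1.0 - prior_intruder))
    for flagged, fa, miss in zip(decisions, p_false_alarm, p_miss):
        if flagged:
            # A flag counts for more when the classifier rarely false-alarms.
            llr += math.log((1.0 - miss) / fa)
        else:
            # A non-flag counts for more when the classifier rarely misses.
            llr += math.log(miss / (1.0 - fa))
    return llr > 0.0, llr

# Illustrative (made-up) error rates for four modalities:
# text, applications, websites, location.
rates_fa = [0.01, 0.40, 0.40, 0.40]
rates_miss = [0.01, 0.40, 0.40, 0.40]

# Only the most reliable classifier flags an intruder; the weighted
# sum still sides with it, unlike a simple majority vote.
is_intruder, score = fuse_decisions([True, False, False, False],
                                    rates_fa, rates_miss)
```

Weighting each vote by its classifier's reliability is also one way to express how much a single modality contributes to the overall decision.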
Learning to Extract Quality Discourse in Online Communities
Brennan, Michael Robert (Drexel University) | Wrazien, Stacy (Drexel University) | Greenstadt, Rachel (Drexel University)
Collaborative filtering systems have been developed to manage information overload and improve discourse in online communities. In such systems, users rank content provided by other users on the validity or usefulness within their particular context. The goal is that "good" content will rise to prominence and "bad" content will fade into obscurity. These filtering mechanisms are not well understood and have known weaknesses. For example, they depend on the presence of a large crowd to rate content, but such a crowd may not be present. Additionally, the community's decisions determine which voices will reach a large audience and which will be silenced, but it is not known if these decisions represent "the wisdom of crowds" or a "censoring mob." Our approach uses statistical machine learning to predict community ratings. By extracting features that replicate the community's verdict, we can better understand collaborative filtering, improve the way the community uses the ratings of its members, and design agents that augment community decision-making. Slashdot is an example of such a community, where peers rate each other's comments based on their relevance to the post. This work extracts a wide variety of features from the Slashdot metadata and posts' linguistic contents to identify features that can predict the community rating. We find that author reputation, use of pronouns, and author sentiment are salient. We achieve 76% accuracy predicting community ratings as good, neutral, or bad.
A Travel-Time Optimizing Edge Weighting Scheme for Dynamic Re-Planning
Feit, Andrew (Drexel University) | Toval, Lenrik (Drexel University) | Hovagimian, Raffi (Drexel University) | Greenstadt, Rachel (Drexel University)
The success of autonomous vehicles has made path planning in real, physically grounded environments an increasingly important problem. In environments where speed matters and vehicles must maneuver around obstructions, such as autonomous car navigation in hostile environments, the speed with which real vehicles can traverse a path often depends on the sharpness of the corners on the path as well as the length of path edges. We present an algorithm that uses the turn angle through path nodes as a limiting factor for vehicle speed. Vehicle speed is then used in a time-weighting calculation for each edge. This allows the path planning algorithm to choose potentially longer paths with fewer turns in order to minimize path traversal time. Results simulated in the Breve environment show that travel time can be reduced by approximately 10%, relative to the solution obtained using the Anytime D* Algorithm, for a vehicle that is speed-limited based on turn rate.
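The edge weighting idea can be sketched as follows: the cost of an edge is its length divided by a speed that drops as the turn taken to enter the edge gets sharper. The abstract does not give the actual mapping from turn angle to speed, so the linear model in `speed_limit`, the parameters `v_max` and `v_min`, and all function names below are hypothetical and chosen only to make the idea concrete.

```python
import math

def turn_angle(a, b, c):
    """Angle in radians between heading a->b and heading b->c (0 = straight on)."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def speed_limit(angle, v_max=10.0, v_min=1.0):
    """Assumed linear model: v_max when going straight, v_min at a full reversal."""
    return v_max - (v_max - v_min) * (angle / math.pi)

def edge_time(prev, u, v, v_max=10.0, v_min=1.0):
    """Time to traverse edge u->v, given the node `prev` the vehicle came from.

    With no predecessor (start of the path) the edge is driven at v_max.
    """
    angle = 0.0 if prev is None else turn_angle(prev, u, v)
    return math.dist(u, v) / speed_limit(angle, v_max, v_min)

def path_time(path, v_max=10.0, v_min=1.0):
    """Total traversal time of a path given as a list of (x, y) nodes."""
    total, prev = 0.0, None
    for u, v in zip(path, path[1:]):
        total += edge_time(prev, u, v, v_max, v_min)
        prev = u
    return total

# A staircase with seven right-angle turns versus a longer path with one turn.
staircase = [(0, 0), (0.5, 0), (0.5, 0.5), (1, 0.5), (1, 1),
             (1.5, 1), (1.5, 1.5), (2, 1.5), (2, 2)]
one_turn = [(0, 0), (2.4, 0), (2, 2)]
longer_path_is_faster = path_time(one_turn) < path_time(staircase)
```

Because the weight of an edge depends on the edge before it, a planner using this cost has to track the predecessor node (or search over node pairs) rather than treat edge weights as fixed.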
Practical Attacks Against Authorship Recognition Techniques
Brennan, Michael Robert (Drexel University) | Greenstadt, Rachel (Drexel University)
The use of statistical AI techniques in authorship recognition (or stylometry) has contributed to literary and historical breakthroughs. These successes have led to the use of these techniques in criminal investigations and prosecutions. However, few have studied adversarial attacks and their devastating effect on the robustness of existing classification methods. This paper presents a framework for adversarial attacks including obfuscation attacks, where a subject attempts to hide their identity, and imitation attacks, where a subject attempts to frame another subject by imitating their writing style. The major contribution of this research is that it demonstrates that both attacks work very well. The obfuscation attack reduces the effectiveness of the techniques to the level of random guessing, and the imitation attack succeeds with 68-91% probability depending on the stylometric technique used. These results are made more significant by the fact that the experimental subjects were unfamiliar with stylometric techniques, had no specialized knowledge of linguistics, and spent little time on the attacks. This paper also provides another significant contribution to the field in using human subjects to empirically validate the claim of high accuracy for current techniques (without attacks) by reproducing results for three representative stylometric methods.