performer
The shock of seeing your body used in deepfake porn
Adult content creators are having their performances used without consent. This is just one way that AI now threatens their rights and livelihoods.
When Jennifer got a job doing research for a nonprofit in 2023, she ran her new professional headshot through a facial recognition program. She wanted to see if the tech would pull up the porn videos she'd made more than 10 years before, when she was in her early 20s. It did in fact return some of that content, and also something alarming that she'd never seen before: one of her old videos, but with someone else's face on her body. "At first, I thought it was just a different person," says Jennifer, who is being identified by a pseudonym to protect her privacy. But then she recognized a distinctly garish background from a video she'd shot around 2013, and she realized: "Somebody used me in a deepfake."
Sub-Linear Memory: How to Make Performers SLiM
Transformer architectures have become very popular, yet the original implementation requires $O(L^2)$ serial time and memory as functions of input length $L$. Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ for serial computation. We conduct a thorough complexity analysis of Performers, a class which includes most recent linear Transformer mechanisms. We note a remarkable computational flexibility: the gradient computation can be performed with no approximations using sublinear memory as a function of $L$ (in addition to negligible storage for the input sequence), at the cost of greater time complexity in the parallel setting. In the extreme case, a Performer consumes only $O(1)$ memory and still requires $O(L)$ time. Due to complete backward-compatibility, this discovered time-memory tradeoff can be used for fine-tuning on low-memory devices in a decentralized fashion without any server computations.
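The state-carrying idea behind this tradeoff can be illustrated with a minimal NumPy sketch of causal (unidirectional) linear attention. It assumes a hypothetical elementwise feature map (elu(x) + 1) in place of the Performer's random features, and it covers only the forward pass; the paper's memory-efficient gradient computation is not reproduced. The chunked function carries only the running prefix state between chunks, so its working memory grows with the chunk size rather than with $L$, and it returns the same output as the version that materializes every prefix state.

```python
import numpy as np

def feature_map(x):
    # Hypothetical positive feature map phi(x) = elu(x) + 1 (an assumption);
    # Performers use random-feature maps instead.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention_full(q, k, v):
    """Reference version: materializes all L prefix states, O(L*m*d) memory.
    q, k: (L, m); v: (L, d)."""
    qf, kf = feature_map(q), feature_map(k)
    # S[i] = sum_{j<=i} phi(k_j) v_j^T and z[i] = sum_{j<=i} phi(k_j)
    S = np.cumsum(kf[:, :, None] * v[:, None, :], axis=0)  # (L, m, d)
    z = np.cumsum(kf, axis=0)                               # (L, m)
    num = np.einsum("lm,lmd->ld", qf, S)
    den = np.einsum("lm,lm->l", qf, z)[:, None]
    return num / den

def causal_linear_attention_chunked(q, k, v, chunk=4):
    """Same computation, chunk by chunk. Only the running state (S, z), whose
    size is independent of L, is carried between chunks; intermediate memory
    is O(chunk*m*d) plus the output array."""
    L, d = v.shape
    m = q.shape[1]
    S = np.zeros((m, d))  # carried prefix sum of phi(k_j) v_j^T
    z = np.zeros(m)       # carried prefix sum of phi(k_j)
    out = np.empty_like(v)
    for start in range(0, L, chunk):
        sl = slice(start, start + chunk)
        qf, kf = feature_map(q[sl]), feature_map(k[sl])
        # Prefix sums inside the chunk, offset by the state from earlier chunks.
        S_chunk = S + np.cumsum(kf[:, :, None] * v[sl][:, None, :], axis=0)
        z_chunk = z + np.cumsum(kf, axis=0)
        num = np.einsum("lm,lmd->ld", qf, S_chunk)
        den = np.einsum("lm,lm->l", qf, z_chunk)[:, None]
        out[sl] = num / den
        S, z = S_chunk[-1], z_chunk[-1]  # keep only the last state
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((10, n)) for n in (3, 3, 2))
assert np.allclose(causal_linear_attention_full(q, k, v),
                   causal_linear_attention_chunked(q, k, v, chunk=3))
```

Setting `chunk=1` corresponds to the extreme case of constant extra memory in $L$ (the output array aside), at the price of a fully sequential loop; larger chunks trade memory back for parallelism within each chunk.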
Figure 8: The average values, during training, of the two components used in the criterion for neuron importance in the input layer: the absolute gradient of the loss with respect to the reconstructed samples, and the sum of the absolute weights connected to a neuron.
A.1 Implementation Details
For all datasets, we used standard normalization, which scales each feature to zero mean and unit standard deviation. The autoencoder has a single hidden layer with sigmoid activation, followed by a linear output layer. The hidden layer has 200 neurons for all datasets.
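The setup above can be sketched in NumPy as follows. The weight initialization, the mean-squared reconstruction loss, and the averaging of absolute gradients over samples are assumptions not stated in the text. The sketch only computes the two components named in the Figure 8 caption; how they are combined into a single importance score is not specified here, so it is left out.

```python
import numpy as np

def standardize(X):
    # Per-feature zero mean and unit standard deviation; constant features
    # are left unscaled to avoid division by zero (a choice made here).
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma > 0, sigma, 1.0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """One sigmoid hidden layer (200 units) and a linear output layer.
    The Gaussian initialization with std 0.1 is an assumption."""

    def __init__(self, n_features, n_hidden=200, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b2 = np.zeros(n_features)

    def forward(self, X):
        H = sigmoid(X @ self.W1 + self.b1)
        return H @ self.W2 + self.b2  # linear output activation

def input_neuron_components(model, X):
    """Per input neuron: (1) mean absolute gradient of the loss w.r.t. the
    reconstructed feature, assuming an MSE loss averaged over all entries;
    (2) sum of absolute weights from that input neuron to the hidden layer."""
    X_hat = model.forward(X)
    grad = 2.0 * (X_hat - X) / X.size             # dLoss/dX_hat for the MSE
    grad_component = np.abs(grad).mean(axis=0)    # shape (n_features,)
    weight_component = np.abs(model.W1).sum(axis=1)
    return grad_component, weight_component
```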
Retrieval
Late interaction methods compute representations for the query and corpus graphs separately and compare these representations using simple similarity functions only at the last stage, which makes them highly scalable. Early interaction methods combine information from both graphs right from the input stage; they are usually considerably more accurate, but slower.
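The contrast can be illustrated with a toy NumPy sketch. The graph encoder (one round of neighbor aggregation, mean pooling, and L2 normalization), the cosine scoring, and the cross-graph attention used for early interaction are hypothetical stand-ins chosen for brevity, not the method of any particular system.

```python
import numpy as np

def embed_graph(adj, feats):
    """Hypothetical encoder: each node adds its neighbors' features, then
    mean pooling and L2 normalization give one vector per graph."""
    h = feats + adj @ feats
    g = h.mean(axis=0)
    return g / np.linalg.norm(g)

def late_interaction_scores(query, corpus_embs):
    """Late interaction: the query is encoded on its own and compared with
    precomputed corpus embeddings by cosine similarity (a dot product here,
    since the embeddings are unit-norm)."""
    q = embed_graph(*query)
    return corpus_embs @ q

def early_interaction_score(query, target):
    """Early interaction (hypothetical): query nodes attend to target nodes
    on the raw input features, so the representation depends on the pair and
    must be recomputed for every query-target combination."""
    (qa, qx), (_, tx) = query, target
    logits = qx @ tx.T                                  # node-to-node affinities
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)             # softmax over target nodes
    diff = qx - attn @ tx                               # per-node mismatch
    h = diff + qa @ diff                                # propagate over query edges
    return -np.abs(h).sum()                             # closer to 0 = better match
```

In the late-interaction path, `corpus_embs` can be computed once offline and indexed, so scoring a new query costs one encoder pass plus a matrix-vector product; in the early-interaction path, the cross-graph attention must be recomputed for every query-corpus pair.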