We propose a novel Neural Identity Carrier (NICe), which learns identity transformation from an arbitrary face-swapping proxy via a U-Net.
To better model the inconsistency of the face-swapping proxy, we borrow the theory of aleatoric uncertainty. Moreover, we introduce an aleatoric uncertainty loss to tolerate the uncertainty in the proxy data while forcing NICe to learn the primary identity information.
With the predicted temporally stable appearance, we further introduce static detail supervision to help NICe generate results with more fine-grained details.
We also verify that the refined forgery data can help to improve temporal-aware deepfake detection performance.
2. Related Work
2.1. Face-Swapping Approaches
2.2. Uncertainty Modeling
$$\mathcal{L}(\theta,\sigma)=\frac{1}{N}\sum_{i=1}^{N}\frac{1}{2\sigma^{2}}\left\|y_{i}-f^{\theta}(x_{i})\right\|^{2}+\frac{1}{2}\log\sigma^{2},$$
where $N$ is the number of data points, $\sigma$ is the model's observation noise parameter, which captures how much noise we have in the outputs, and $\theta$ denotes the distribution's parameters to be optimized; $y_{i}$ is the ground truth of the output data, $f^{\theta}$ is the model's function, and $x_{i}$ is the input data point.
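As a concrete illustration, a minimal PyTorch sketch of this objective follows, using the common trick of predicting $\log\sigma^{2}$ for numerical stability; the function and argument names are ours, not part of any released implementation:

```python
import torch

def aleatoric_uncertainty_loss(pred, target, log_sigma2):
    """Gaussian negative log-likelihood with a learned observation-noise
    parameter. Predicting log(sigma^2) rather than sigma itself keeps the
    loss numerically stable and sigma implicitly positive."""
    # Residual term: errors are down-weighted where the predicted noise is large.
    residual = 0.5 * torch.exp(-log_sigma2) * (pred - target) ** 2
    # Regularizer: penalizes large sigma so the model cannot explain
    # every error away as noise.
    return (residual + 0.5 * log_sigma2).mean()
```

Here `log_sigma2` may be a single learned scalar (homoscedastic noise) or a per-output map predicted by the network (heteroscedastic noise).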
2.3. 3D Face Reconstruction
3.1. Initial Face Swapping
3.2. Consistency Transfer
3.2.1. Coherence Consistency Transfer
3.2.2. Detail Consistency Transfer
3.3. Static Detail Extractor
We formulate this process as
$$D = F_{d}(\delta, \psi, \theta_{jaw}),$$
where $\delta$ controls the static detail, and $\psi$ and $\theta_{jaw}$ both control the dynamic detail. By converting the original geometry $M$ and its surface normal $N$ to UV space, denoted as $M_{uv}$ and $N_{uv}$, we can calculate the detail geometry from them:
$$M'_{uv} = M_{uv} + D \odot N_{uv}.$$
Then, we convert $M'_{uv}$ to a normal map and use it to shade the texture, rendering the detailed face as
$$I_{r} = \mathcal{R}(M', B),$$
where $\mathcal{R}$ is a differentiable mesh renderer and $B$ is the shaded texture, represented in UV coordinates. The obtained detail parameters are then used to constrain the network for more realistic results.
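To make the displacement step concrete, here is a minimal sketch assuming the UV-space geometry and normals are stored as (B, 3, H, W) tensors and the displacement map as (B, 1, H, W); the rendering step is omitted because it depends on the chosen differentiable renderer:

```python
import torch

def detail_geometry_uv(M_uv: torch.Tensor, N_uv: torch.Tensor,
                       D: torch.Tensor) -> torch.Tensor:
    """Displace the coarse geometry M_uv along its surface normals N_uv
    by the scalar displacement map D, i.e. M'_uv = M_uv + D * N_uv.
    M_uv, N_uv: (B, 3, H, W); D: (B, 1, H, W), broadcast over xyz."""
    return M_uv + D * N_uv
```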
3.4. Training Losses
3.4.1. Static Detail Extractor Training
We train the static detail extractor with
$$\mathcal{L}_{detail}=\mathcal{L}_{pho}+\mathcal{L}_{mrf}+\mathcal{L}_{sym}+\mathcal{L}_{coh}+\mathcal{L}_{reg},$$
with photometric loss $\mathcal{L}_{pho}$, ID-MRF loss $\mathcal{L}_{mrf}$, soft symmetry loss $\mathcal{L}_{sym}$, coherence loss $\mathcal{L}_{coh}$, and regularization loss $\mathcal{L}_{reg}$.
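A minimal sketch of how the total objective can be assembled; the relative weights are placeholders, since the actual weighting is not reproduced here:

```python
def detail_extractor_loss(l_pho, l_mrf, l_sym, l_coh, l_reg,
                          weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five detail-extractor losses; the default
    weights are placeholders, not the values used in the paper."""
    terms = (l_pho, l_mrf, l_sym, l_coh, l_reg)
    return sum(w * l for w, l in zip(weights, terms))
```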
$$\mathcal{L}_{mrf}=\sum_{l}L_{M}(l),$$
where $L_{M}(l)$ denotes the VGG19's feature-level distance between $I_{r}$ and $I$ on layer $l$ of VGG19.
$$\mathcal{L}_{sym}=\left\|V_{uv}\odot\left(D-\mathrm{flip}(D)\right)\right\|,$$
where $V_{uv}$ denotes the facial mask in UV space, and $\mathrm{flip}(\cdot)$ denotes the horizontal flip operation.
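A sketch of this term, assuming an L1 penalty over the masked difference (the exact norm is an assumption):

```python
import torch

def soft_symmetry_loss(D: torch.Tensor, V_uv: torch.Tensor) -> torch.Tensor:
    """Encourage the detail displacement map D (B, 1, H, W) to be
    left/right symmetric inside the UV facial mask V_uv (B, 1, H, W)."""
    flipped = torch.flip(D, dims=[-1])  # horizontal flip in UV space
    return (V_uv * (D - flipped)).abs().mean()
```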
where $\beta$, $\psi$, $\theta$, $\alpha$, $l$, and $c$ are the parameters of $E_{c}(I)$, while $\delta$ is the detail code of $E_{d}(I)$.
3.4.2. Neural Identity Carrier Training
$$\mathcal{L}_{u}=\frac{1}{2\sigma^{2}}\left\|\Phi(\hat{I})-\Phi(I_{p})\right\|^{2}+\frac{1}{2}\log\sigma^{2},$$
where $\Phi$ denotes the VGG features, which consist of features from four layers of VGG19; $\sigma$ denotes the model's noise parameter, predicting how much noise we have in the outputs. It is noteworthy that we learn the noise parameter implicitly from the loss function. $\left\|\Phi(\hat{I})-\Phi(I_{p})\right\|^{2}$ is a basic perceptual error between the NICe output $\hat{I}$ and the proxy frame $I_{p}$. This loss can basically guarantee that NICe learns identity transformation from an arbitrary face-swapping proxy.
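A sketch of this objective in PyTorch, assuming a single learned scalar noise parameter and ImageNet-normalized inputs; the particular VGG19 layer indices below are an illustrative choice, not the ones used by the paper:

```python
import torch
import torchvision

class UncertaintyPerceptualLoss(torch.nn.Module):
    """VGG19 perceptual error attenuated by a learned noise parameter,
    following the aleatoric-uncertainty formulation above."""

    def __init__(self, layer_ids=(3, 8, 17, 26)):  # ReLU layers (an assumption)
        super().__init__()
        vgg = torchvision.models.vgg19(weights="DEFAULT").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layer_ids = vgg, set(layer_ids)
        # log(sigma^2), learned implicitly through the loss itself.
        self.log_sigma2 = torch.nn.Parameter(torch.zeros(1))

    def _features(self, x):
        feats, last = [], max(self.layer_ids)
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
            if i == last:
                break
        return feats

    def forward(self, output, proxy):
        # Basic perceptual error between the NICe output and the proxy frame.
        err = sum((fo - fp).pow(2).mean()
                  for fo, fp in zip(self._features(output),
                                    self._features(proxy)))
        # Attenuated loss: a large sigma tolerates inconsistent proxy frames,
        # while the log term keeps sigma from growing without bound.
        return 0.5 * torch.exp(-self.log_sigma2) * err + 0.5 * self.log_sigma2
```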
$$\mathcal{L}_{\alpha}=\left\|\alpha_{\hat{I}}-\alpha_{I_{p}}\right\|_{1},$$
where $\alpha_{\hat{I}}$ and $\alpha_{I_{p}}$ are the albedo coefficients of $\hat{I}$ and $I_{p}$, encoded by $E_{c}$, respectively.
$$\mathcal{L}_{\beta}=\left\|\beta_{\hat{I}}-\beta_{I_{p}}\right\|_{1},$$
where $\beta_{\hat{I}}$ and $\beta_{I_{p}}$ are the shape parameters of $\hat{I}$ and $I_{p}$, encoded by $E_{c}$, respectively.
$$\mathcal{L}_{\delta}=\left\|\delta_{\hat{I}}-\delta_{I_{p}}\right\|_{1},$$
where $\delta_{\hat{I}}$ and $\delta_{I_{p}}$ are the latent detail codes of $\hat{I}$ and $I_{p}$, encoded by $E_{d}$, respectively.
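All three consistency terms reduce to distances between encoder outputs for $\hat{I}$ and $I_{p}$. A combined sketch, where `encode_coarse` and `encode_detail` are hypothetical stand-ins for the frozen face-reconstruction encoders $E_{c}$ and $E_{d}$, and the dict keys are illustrative:

```python
import torch

def consistency_losses(encode_coarse, encode_detail, output, proxy):
    """L1 consistency between the reconstruction parameters of the NICe
    output and those of the proxy frame."""
    p_out, p_prx = encode_coarse(output), encode_coarse(proxy)
    loss_albedo = (p_out["albedo"] - p_prx["albedo"]).abs().mean()
    loss_shape = (p_out["shape"] - p_prx["shape"]).abs().mean()
    loss_detail = (encode_detail(output) - encode_detail(proxy)).abs().mean()
    return loss_albedo, loss_shape, loss_detail
```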
4.1. Quantitative Evaluation
$$E_{stable}(t)=\left\|M_{t}\odot\left(O_{t}-W(O_{t+1})\right)\right\|_{1},$$
where $E_{stable}(t)$ measures the coherence between two adjacent outputs $O_{t}$ and $O_{t+1}$, $M_{t}$ is the facial area mask, $W$ is the function to warp $O_{t+1}$ to time step $t$ using the ground truth backward flow, and $O_{t}$ and $O_{t+1}$ are the results of frame $t$ and frame $t+1$. Here, we only evaluate stability in facial regions. Lower stability error indicates more stable results. For the entire video, we report the average error over all frame pairs. As shown in Table 1, our method outperforms all mentioned methods, which means that our method produces more stable results.
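This metric can be computed as in the sketch below, assuming a dense backward flow in pixel units that maps frame $t{+}1$ onto frame $t$; all names are illustrative:

```python
import torch
import torch.nn.functional as F

def warp_to_prev(frame, flow_back):
    """Warp `frame` (B, C, H, W) towards the previous time step using a
    dense backward flow (B, 2, H, W) given in pixel units."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=frame.device),
                            torch.arange(W, device=frame.device),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float() + flow_back  # (B, 2, H, W)
    # Normalize pixel coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / (W - 1) - 1.0
    gy = 2.0 * grid[:, 1] / (H - 1) - 1.0
    return F.grid_sample(frame, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def stability_error(out_t, out_t1, flow_back, face_mask):
    """Masked warping error between adjacent outputs O_t and O_{t+1};
    lower means more temporally stable results."""
    warped = warp_to_prev(out_t1, flow_back)
    return (face_mask * (out_t - warped)).abs().mean()
```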