ABSTRACT
Magnetic flux leakage (MFL) and ultrasonic testing (UT) are widely used in-line inspection technologies for detecting corrosion defects along pipelines. The integration of MFL and UT data has the potential to provide complementary insights that facilitate a comprehensive assessment of pipeline integrity. However, due to the inherent dissimilarity in their underlying physical principles, these techniques yield notable disparities in signal characteristics, posing challenges in integrating such multimodal data. This study aims to establish a translation mapping between MFL and UT signals to achieve consistent physical interpretations across the two modalities. To this end, this study explored the feasibility of generative adversarial network (GAN) based models encompassing both supervised and unsupervised translation approaches, contingent on the availability of aligned data. Furthermore, two translation modes, MFL-UT and UT-MFL, were analyzed separately to understand the effect of the translation direction. The experimental results demonstrate satisfactory performance for both aligned and unaligned data translation, with the UT-MFL translation direction yielding superior results. Overall, the translation approaches pave the way for future applications, especially subsequent data analysis tasks such as registration, comparison, and fusion of multimodal data.
1. Introduction
Steel pipes are commonly used for oil and gas transmission due to their durability and efficiency. However, corrosion developing over time poses a significant risk of pipeline failure. To prevent such failures, nondestructive inspections (NDI) are performed periodically to detect defects and ensure safe pipeline operation. In field practice, magnetic flux leakage (MFL) and ultrasonic testing (UT) are mature NDI technologies in the pipeline industry [1,2]. By analyzing the acquired NDI signals, suspected defects that endanger the integrity of the pipeline structure can be identified and quantified [3,4]. Based on statistical comparisons of in-line inspection (ILI) and field data, axial MFL systems have been observed to be more sensitive to pits and circumferential grooves, while UT systems excel at accurately measuring wall thinning and scanning large corroded areas [5,6]. The integration of multiple measurement systems allows for more reliable and comprehensive fused results. However, the diversity of feature expression between MFL and UT limits downstream tasks such as difference detection and data fusion [7–9]. To facilitate low-level data comparison and smooth operational processes, heterogeneous data originating from divergent sensor types must be harmonized and brought into a common domain [10]. This unification of NDI data of different modalities enables easier comparison between disparate NDI systems, ultimately strengthening the overall integrity assessment capabilities.
Transformation is one popular way to connect heterogeneous data in the NDI domain. The transformation between NDI data types can be implemented based on physical principles or through data-driven approaches [11,12]. For instance, a Q-transform was proposed for the registration of ultrasonic signals and eddy current signals in different physical formats, so that the transformed ultrasonic signals can be superimposed on the eddy current field [13]. The Q-transform is specifically designed to handle the unique properties of eddy current and ultrasonic signals by converting ultrasonic signals from the wave domain into the diffusion domain. However, due to the distinct nature of MFL signals, using the Q-transform directly for MFL data transformation is not feasible. Additionally, normalization has been used to make sensor data with different quantities comparable [14]. However, a limitation of traditional normalization is the need to devise transform algorithms that unify the different response patterns at defects. Given the complexity of manually constructing transformations between different sensors, data-driven approaches were introduced to address this limitation. In [15], the deep features of thermal images and ultrasound images are automatically extracted by an encoder. Such feature representation makes multi-modality fusion more effective. Furthermore, to learn the underlying relationships between heterogeneous NDI data, the shared latent features of the two signal modalities were identified by an autoencoder neural network [16]. By leveraging data-driven approaches, researchers can analyze the correlations between different types of NDI data at a high level, reducing the reliance on complex knowledge of physical transformations.
NDI techniques such as MFL and UT are routinely employed every two to six years in pipelines to detect damage. However, existing research lacks a systematic mapping between MFL and UT data to relate multimodal inspection data, which is essential to fully leverage the value of inspection data and enhance the understanding of pipeline integrity. As an indirect measurement system, the MFL technique captures magnetic field disturbances to identify defects [17]. In contrast, the UT technique is considered a direct measurement that provides the wall thickness [18]. The acquired UT and MFL signals can be effectively represented as two-dimensional matrices, which are then visualized as images in Fig. 1. Due to the heterogeneous physical principles underlying the MFL and UT technologies, the images exhibit significant differences. To bring the two pipeline NDI modalities, MFL and UT, into the same domain, the problem was approached as an image translation task. This involves developing a mapping function that effectively transforms images from the source domain to the target domain while preserving crucial content properties of the source image. It should be noted that despite the limited exploration of image translation techniques in NDI applications, such approaches have been extensively studied and implemented in various fields, including remote sensing, natural images, and medical images [19–21]. In these fields, image translation has proven to be a valuable tool for bridging the gap between disparate domains and facilitating meaningful data interpretation.
Deep learning (DL) methods have facilitated automated defect detection, classification, and quantification in non-destructive evaluation (NDE) [22,23]. In the context of complex data variations due to environmental changes, manufacturing uncertainties, and material complexity, DL techniques have shown a great advantage in their ability to learn complex patterns and features from inspection data with high efficiency, eliminating the need for manual feature engineering [24]. Deep learning methods such as generative adversarial networks (GANs) can effectively translate images across varied styles [25]. However, the application of deep learning-based translation methods to multimodal NDE data has not been explored yet. The GAN's unique adversarial constraints enable it to generate images that closely resemble real ones, making the two challenging to distinguish. In the context of image translation, such models have been explored in both supervised and unsupervised settings. In the supervised approach, conditional GANs using convolutional neural networks (CNNs) are applied for aligned image-to-image translation [26]. To overcome the lack of available aligned samples, unaligned image-to-image translation, which allows models to learn mappings without explicit training pairs, was proposed [27]. The cross-domain mappings can be learned through specific constraints and assumptions. More detailed GAN-based translation methods are illustrated in Section 2.
In the context of NDE 4.0, information from multiple NDE sources is automatically processed and integrated with artificial intelligence technologies to achieve high-level reliability for asset integrity assessment [28,29]. However, aligning data from different sources with varied physical formats remains a challenge. As discussed, GAN-based translation methods provide a viable solution for aligning MFL and UT data. The translation mapping established with this approach aims to improve the efficiency of NDE practices by ensuring consistent physical interpretations across different inspection modalities. The pipeline NDI image translation faces several challenges: (1) Translation between MFL and UT images requires alignment with strict constraints on spatial locations. Objects in one dataset must be translated into a correct form based on the physical properties in the other dataset, ensuring exact positional correspondence. (2) Aligned MFL and UT observations are not always available. Mismatches between two inspection runs are often observed due to different coordinate systems. This requires the translation model to robustly handle unaligned sample pairs during the learning process. For this purpose, GAN-based translation models are used to translate between multimodal NDI data. These models provide pixel-wise translation and can be trained on a dataset containing aligned or unaligned data pairs. The contributions are summarized as follows:
- The automatic translation algorithm for multimodal inspection data is employed in the NDT domain. This potentially facilitates subsequent tasks such as direct comparison and fusion of MFL and UT data.
- To address different data availability scenarios, translation models trained on either aligned or unaligned data have been implemented.
- A comparative evaluation was conducted by analyzing two translation directions, MFL-UT and UT-MFL, to determine the optimal translation direction.
The rest of the paper is organized as follows. In Section 2, the methods for cross-modality translation are introduced. The experimental validation and results are emphasized in Section 3. The discussion and related recommendations are presented in Section 4. Finally, conclusions are drawn in Section 5.
2. Translation of cross-modality data
Image translation based on GANs has attracted considerable attention as it enables the generation of images with customized requirements. GANs are generative models whose goal is to jointly train a generator and a discriminator to map source images to target images with minimized differences in pixel values and shape features [30]. Basically, the generator aims to generate samples that resemble real images, while the discriminator's task is to distinguish between real and generated images as reliably as possible. During training, both the generator and the discriminator compete in an adversarial scheme to solve a min–max optimization problem.
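For reference, the canonical GAN objective from [30] can be written as the following min–max problem, where z denotes the generator's input noise and p_data the distribution of real images (a standard formulation, not specific to this paper's networks):

$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{y\sim p_{data}(y)}\big[\log D(y)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]$$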
For multimodal NDI data such as MFL and UT, a deep translation network enables the conversion of MFL images to UT images or vice versa, ensuring that both images are kept in the same domain. Many GAN variants have been proposed to tackle the cross-domain problem. These translation models can be classified into two categories: supervised and unsupervised methods, depending on the availability of aligned training samples. Supervised image translation requires a one-to-one correspondence between images in the source and target domains, while the unsupervised translation models aim to learn a mapping between images from the source and target domains without the need for explicit pixel pairing. This section discusses models applicable to aligned and unaligned training data respectively, accommodating diverse data availability scenarios and addressing the challenges of multimodal NDI data translation.
Fig. 1. Examples of signals. (a) Ultrasonic testing. (b) Magnetic flux leakage.
Fig. 2. The schematic diagram of Res-Pix2Pix. (a) Res-Pix2Pix architecture. (b) Discriminator structure. (c) Generator structure.
2.1. Supervised translation with GAN
In this section, we introduce translation methods designed for supervised settings specifically tailored for aligned data, including Pix2Pix and BicycleGAN. These models operate on the foundation of conditional generative adversarial networks (cGANs), utilizing source and corresponding target images as conditional input for model training. Framework and implementation details are presented for each method.
(1) Pix2Pix: Pix2Pix is a representative supervised image translation framework [26]. The architecture of the residual-based Pix2Pix (Res-Pix2Pix) is shown in Fig. 2(a). The framework consists of generator and discriminator sections. Specifically, the source image is fed into the generator 𝐺 as a condition to generate a pseudo image in the target domain. Then, the discriminator 𝐷 distinguishes the generated image from the ground truth. By continuously optimizing the generator and the discriminator, the translated image gets closer to the ground truth. The diagram in Fig. 2 shows one translation direction, from 𝑋 to 𝑌 . The opposite translation direction, from 𝑌 to 𝑋, adopts the same network structure; only the source and target domains are exchanged. Depending on the translation direction, 𝑋 can represent either the UT domain or the MFL domain. In the displayed image, 𝑋 represents the UT domain and 𝑌 represents the MFL domain.
The generator 𝐺𝑌 translates images from the 𝑋 domain to the 𝑌 domain. Unlike the original Pix2Pix network, which employs a U-Net as the generator, the generator in this study is an encoder–decoder network with six residual bottleneck blocks. The encoder–decoder architecture helps the output retain the same size as the input. The encoder takes the input image and performs feature extraction with convolution operations, while the decoder utilizes transposed convolutions for pixel-level predictions. To retain the details of the source image, several residual blocks are used to deepen the network [31]. Moreover, instance normalization is applied after each layer to stabilize the learning process [32].
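A minimal PyTorch sketch of such a residual encoder–decoder generator is given below. The layer counts and channel widths are illustrative assumptions rather than the authors' exact configuration, and the Tanh output additionally assumes images scaled to [−1, 1].

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut helps retain source detail

class ResGenerator(nn.Module):
    """Encoder -> 6 residual bottleneck blocks -> decoder; output size = input size."""
    def __init__(self, in_ch=1, out_ch=1, base=64, n_res=6):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base, 7, padding=3), nn.InstanceNorm2d(base), nn.ReLU(True)]
        for mult in (1, 2):  # encoder: stride-2 convolutions extract features
            layers += [nn.Conv2d(base * mult, base * mult * 2, 3, stride=2, padding=1),
                       nn.InstanceNorm2d(base * mult * 2), nn.ReLU(True)]
        layers += [ResidualBlock(base * 4) for _ in range(n_res)]
        for mult in (4, 2):  # decoder: transposed convolutions restore resolution
            layers += [nn.ConvTranspose2d(base * mult, base * mult // 2, 3, stride=2,
                                          padding=1, output_padding=1),
                       nn.InstanceNorm2d(base * mult // 2), nn.ReLU(True)]
        layers += [nn.Conv2d(base, out_ch, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```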
The discriminator 𝐷𝑌 adopts a PatchGAN with five layers, enabling the discriminator to distinguish images at a smaller patch level. It is implemented as a straightforward feed-forward convolutional network. For the Pix2Pix network trained with aligned images, the input of the discriminator is the concatenated true image pair (𝑋, 𝑌 ) or fake image pair (𝑋, 𝑌̂ ). The output of the discriminator is a probability map, from which a binary output can be obtained to denote whether each patch is fake or real. It should be noted that the discriminator is only used in the training stage.
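A corresponding sketch of a five-layer PatchGAN discriminator follows, again assuming PyTorch and single-channel images; it returns a map of per-patch real/fake logits rather than a single scalar score.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=2, base=64):  # in_ch=2: concatenated (source, target) images
        super().__init__()
        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, True))
            return layers
        self.net = nn.Sequential(
            *block(in_ch, base, 2, norm=False),
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, padding=1),  # per-patch logits; a sigmoid yields the probability map
        )

    def forward(self, src, tgt):
        return self.net(torch.cat([src, tgt], dim=1))
```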
In the training phase, the total objective function governing the weight updates for both the discriminator and generator is formulated as shown in Eq. (1). Both networks compete in an adversarial scheme to solve a min–max optimization problem over the objective loss function shown as Eq. (2). To effectively guide the learning of the generator, a reconstruction penalty is introduced. As detailed in [26], the L1-norm is employed as a measure of the distance between the generated image and the target, ensuring that the translation results closely approximate the ground truth. In this manner, the reconstruction loss aims to produce a realistic image that closely aligns with the target domain, which is defined as Eq. (3).
$$G^{*} = \arg\min_{G}\max_{D}\ \mathcal{L}_{GAN}(G, D) + \lambda\,\mathcal{L}_{recon}(G) \tag{1}$$

$$\mathcal{L}_{GAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big] \tag{2}$$

$$\mathcal{L}_{recon}(G) = \mathbb{E}_{x,y}\big[\,\|y - G(x)\|_{1}\big] \tag{3}$$

where L_GAN is the adversarial loss, L_recon is the reconstruction loss, and λ is the regularization weight that controls the contribution of the reconstruction loss. E denotes the expected value and ‖·‖₁ denotes the L1 distance. In each iteration of adversarial training, the optimization objective is pursued through parameter updates for the discriminator (𝐷) and generator (𝐺). Specifically, the parameters of 𝐷 are updated by maximizing the objective value using training pairs, while the parameters of 𝐺 are concurrently updated by minimizing the same objective value. This iterative process employs minibatch stochastic gradient descent (SGD) with the Adam solver in each training step [33].
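A minimal sketch of one adversarial training step implementing Eqs. (1)–(3) is given below, reusing the ResGenerator and PatchDiscriminator sketches above. The learning rate, Adam betas, and λ = 100 follow the original Pix2Pix defaults and are assumptions, not the authors' reported settings.

```python
import torch
import torch.nn.functional as F

G, D = ResGenerator(), PatchDiscriminator()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
lam = 100.0  # weight of the L1 reconstruction term

def train_step(x, y):
    # 1) Discriminator update: maximize log D(x, y) + log(1 - D(x, G(x)))
    y_fake = G(x).detach()  # detach so only D's parameters receive gradients
    pred_real, pred_fake = D(x, y), D(x, y_fake)
    loss_D = F.binary_cross_entropy_with_logits(pred_real, torch.ones_like(pred_real)) \
           + F.binary_cross_entropy_with_logits(pred_fake, torch.zeros_like(pred_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Generator update: fool D while staying close to the target in L1 distance
    y_fake = G(x)
    pred = D(x, y_fake)
    loss_G = F.binary_cross_entropy_with_logits(pred, torch.ones_like(pred)) \
           + lam * F.l1_loss(y_fake, y)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```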
(2) BicycleGAN: Another translation model for aligned data is BicycleGAN, which is similar to Pix2Pix but additionally incorporates target-domain characteristics through higher-dimensional latent variables [34]. Consistent mapping between the output and latent spaces is achieved through joint training of the conditional variational autoencoder GAN (cVAE-GAN) and the conditional latent regressor GAN (cLR-GAN). Moreover, random noise following a Gaussian distribution is introduced to broaden the diversity of image generation.
For the training implementation, the data flow in cVAE-GAN follows 𝑌 → 𝑧 → 𝑌̂ , which can be understood as the reconstruction of 𝑌 in Fig. 3(a). The real image 𝑌 is mapped to a latent code from a multivariate Gaussian distribution using an encoder 𝐸, which is concatenated with another input image 𝑋 to generate a fake image 𝑌̂ . Compared with the GAN structure in Pix2Pix, cVAE-GAN combines the GAN with a variational autoencoder, in which the latent code 𝑧 generated by the encoder 𝐸 under a Gaussian assumption is used as an additional input to the GAN. Three loss items in Eq. (4) are designed to guide the training process.
$$\mathcal{L}_{cVAE\text{-}GAN} = \mathcal{L}_{GAN}^{VAE}(G, D, E) + \lambda\,\mathcal{L}_{1}^{VAE}(G, E) + \lambda_{KL}\,\mathcal{L}_{KL}(E) \tag{4}$$

where L_1^VAE encourages the output to match the input and stabilizes the training. The loss L_KL(E) is designed to encourage the latent distribution encoded by 𝐸(𝑌 ) to closely resemble a random Gaussian.
In contrast, cLR-GAN, shown in Fig. 3(b), follows 𝑧 → 𝑌̂ → 𝑧̂: a randomly drawn latent code 𝑧 and an input image 𝑋 are used to generate a fake image 𝑌̂ , and an encoder 𝐸 is then used to produce the recovered code 𝑧̂ = 𝐸(𝑌̂ ). The training of the network is guided by reducing the difference between the latent encoding of the fake image 𝑌̂ and the random noise input. Therefore, an adversarial loss L_GAN(𝐺, 𝐷) and a latent vector loss L_latent(𝐺, 𝐸) are designed for network training.
$$\mathcal{L}_{cLR\text{-}GAN} = \mathcal{L}_{GAN}(G, D) + \lambda_{latent}\,\mathcal{L}_{latent}(G, E)$$

where L_latent encourages 𝑧̂ to be close to the randomly drawn 𝑧 to enable bijective mapping. The total loss function of BicycleGAN consists of L_cVAE-GAN and L_cLR-GAN.
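The latent-recovery term can be sketched as follows, assuming a latent-conditioned generator G(x, z) and an encoder E that maps an image to a latent vector; both signatures are illustrative, since the network sketches above are not latent-conditioned.

```python
import torch
import torch.nn.functional as F

def latent_recovery_loss(G, E, x, z_dim=8):
    z = torch.randn(x.size(0), z_dim)  # randomly drawn Gaussian latent code
    y_fake = G(x, z)                   # generate conditioned on (image, code)
    z_hat = E(y_fake)                  # encoder recovers the code from the fake image
    return F.l1_loss(z_hat, z)         # small distance -> near-bijective mapping
```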
Fig. 3. The schematic diagram of BicycleGAN [34]. (a) cVAE-GAN. (b) cLR-GAN.
2.2. Unsupervised translation with GAN
It should be noted that supervised translation requires aligned MFL and UT data. However, misalignment between MFL and UT data can arise from the discrepancy in coordinates used by different measurement systems. This discrepancy is often attributed to positioning errors caused by the inspection tool's motion instability, such as slippage and wobbling, as it traverses the pipeline. Consequently, MFL and UT data referenced by distance become misaligned. Unlike image translation models trained with aligned image pairs, which capture correspondences within the same feature space, using unaligned training image pairs in this context hampers the translation model's ability to learn and generate accurate correspondences. To effectively utilize unaligned data, the translation model needs to robustly handle unaligned sample pairs during the learning process.
(1) CycleGAN: To facilitate the utilization of unaligned data, unsupervised image translation models typically incorporate a variety of techniques within their network architectures. These techniques assist the model in learning the underlying patterns and relationships between images in two different domains. In CycleGAN [27], the framework employs two symmetrical GANs to establish a closed-loop network structure. In this configuration, one GAN undertakes the task of translating images from the source domain to the target domain, while the other GAN inversely translates images from the target domain back to the source domain. The incorporation of cycle-consistency serves as a key loss component, compelling the image to be identical to the original after the two translations. The integration of cycle-consistency and adversarial loss in CycleGAN addresses the lack of aligned data, allowing the model to effectively learn patterns between unaligned images. As illustrated in Fig. 4, the framework employs two generators (𝐺𝑋2𝑌 and 𝐺𝑌 2𝑋 ) and two discriminators (𝐷𝑋 and 𝐷𝑌 ), which are structured similarly to the generator and discriminator in the Pix2Pix model.
Specifically, two cycles are designed in CycleGAN to achieve the transformation between two unaligned data sets. As shown in Fig. 4, the cycle marked with the solid green line realizes the conversion from domain 𝑋 to domain 𝑌 , while the solid orange line achieves the conversion from domain 𝑌 to domain 𝑋. Taking the green cycle as an example, the input image 𝑋1 is translated by 𝐺𝑋2𝑌 into the 𝑌 domain, and then the discriminator 𝐷𝑌 determines whether the generated image 𝑌̂1 is real or fake. The structure so far is similar to Pix2Pix. However, as there is no aligned target image for the input image 𝑋1 , the reconstruction loss used in Pix2Pix is unavailable for this unsupervised translation. To overcome this limitation, another generator 𝐺𝑌 2𝑋 is introduced to recover the image 𝑋1 from the generated image 𝑌̂1 . In this way, the cycle-consistency loss is determined by calculating the difference between the recovered image 𝑋̂1 and the real image 𝑋1 .
Fig. 4. The schematic diagram of CycleGAN [27].
For the generated images, two discriminators 𝐷𝑋 and 𝐷𝑌 are introduced to judge whether an image in the corresponding domain is real or not. In CycleGAN, the input of the discriminator differs slightly from Pix2Pix because of the absence of an aligned counterpart. Taking 𝐷𝑋 as an example, the input is not an aligned true/fake pair across domains, but the real and generated images in the same domain 𝑋. Again, the discriminator is only used in the training stage.
The unsupervised translation model for MFL and UT incorporates both the adversarial loss and the cycle-consistency loss, as expressed in Eq. (12). The adversarial loss consists of two parts, i.e., L_GAN^(X→Y) and L_GAN^(Y→X), and ensures that the generated samples follow the same distribution as the real samples. Meanwhile, the cycle-consistency loss encourages a sample to remain unchanged after passing through both generators.
$$\mathcal{L} = \mathcal{L}_{GAN}^{X\rightarrow Y}(G_{X2Y}, D_Y) + \mathcal{L}_{GAN}^{Y\rightarrow X}(G_{Y2X}, D_X) + \lambda\,\mathcal{L}_{cyc}(G_{X2Y}, G_{Y2X}) \tag{12}$$

where L_GAN is the adversarial loss and L_cyc is the cycle-consistency loss; λ controls the importance of L_cyc. By jointly training the two generators and the two discriminators, the ultimate goal of obtaining a well-trained G that fools the well-trained D can be achieved.
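A sketch of the generator-side objective of Eq. (12) is given below. It assumes unconditional discriminators that score a single in-domain image, and uses a binary cross-entropy adversarial term for brevity (the original CycleGAN uses a least-squares variant); λ = 10 is the common default rather than the authors' reported value.

```python
import torch
import torch.nn.functional as F

def cyclegan_generator_loss(G_X2Y, G_Y2X, D_X, D_Y, x, y, lam=10.0):
    y_fake, x_fake = G_X2Y(x), G_Y2X(y)
    pred_y, pred_x = D_Y(y_fake), D_X(x_fake)
    # Adversarial terms: each generator tries to fool its own discriminator
    adv = F.binary_cross_entropy_with_logits(pred_y, torch.ones_like(pred_y)) \
        + F.binary_cross_entropy_with_logits(pred_x, torch.ones_like(pred_x))
    # Cycle consistency: X -> Y -> X and Y -> X -> Y must recover the inputs
    cyc = F.l1_loss(G_Y2X(y_fake), x) + F.l1_loss(G_X2Y(x_fake), y)
    return adv + lam * cyc
```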
(2) DiscoGAN: The objective of DiscoGAN is very similar to that of CycleGAN. DiscoGAN also aims to learn the relationships between different domains with unaligned data [35]. These unsupervised approaches overcome the unaligned data problem with cyclic losses, which encourage the translated image to be faithfully reconstructed when mapped back to the original domain. Although their structures are similar, the implementation details and loss functions differ.
The generator structures used in DiscoGAN and CycleGAN differ somewhat. DiscoGAN uses a standard autoencoder architecture, whose narrow bottleneck layer may prevent the output image from retaining the visual details of the input image. In comparison, CycleGAN introduces residual blocks to increase its feature extraction capacity. Another difference between DiscoGAN and CycleGAN is the penalty for cycle-consistency: DiscoGAN penalizes the L2 distance, while CycleGAN uses the L1 loss.
$$\mathcal{L}_{cyc}^{Disco} = \mathbb{E}_{x}\big[\|G_{Y2X}(G_{X2Y}(x)) - x\|_{2}^{2}\big], \qquad \mathcal{L}_{cyc}^{Cycle} = \lambda\,\mathbb{E}_{x}\big[\|G_{Y2X}(G_{X2Y}(x)) - x\|_{1}\big]$$

where L_cyc measures how well the original image is reconstructed after the two translations from the source domain to the target domain and back to the source domain. Compared to DiscoGAN, CycleGAN has an additional hyperparameter to adjust the contribution of the cycle-consistency loss in the overall loss function.
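The two cycle penalties can be contrasted in a few lines (a sketch; x_rec denotes the image after the round trip through both generators):

```python
import torch.nn.functional as F

def disco_cycle_loss(x, x_rec):
    return F.mse_loss(x_rec, x)       # DiscoGAN: squared (L2) penalty, fixed weight

def cyclegan_cycle_loss(x, x_rec, lam=10.0):
    return lam * F.l1_loss(x_rec, x)  # CycleGAN: absolute (L1) penalty, tunable weight
```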
(3) UNIT: Unsupervised image-to-image translation (UNIT) leverages the latent space assumption between source and target images [36]. This shared latent space hypothesis posits the existence of a variable 𝑧, enabling the recovery of an image in both domains. The latent space 𝑧 can be inferred from source images using respective encoders denoted as 𝐸𝑋 and 𝐸𝑌 . UNIT replaces the domain-specific latent space used in CycleGAN with a shared latent space that can be used to combine VAEs trained on data from both domains. The representation of each domain employs a VAE-GAN, with a weight-sharing constraint enforcing the creation of a shared latent space. As shown in Fig. 5(a), the VAE component learns by minimizing the loss between original and compressed data, while the GAN component employs an adversarial learning approach. This cohesive combination of VAEs and GANs, coupled with the shared latent space, defines the core principles of UNIT for effective unsupervised image-to-image translation. Additionally, the inclusion of cycle consistency loss serves to further regularize the inherently ambiguous nature of unsupervised image-to-image translation problems, which is depicted in Fig. 5(b).
$$\mathcal{L}_{VAE_X}(E_X, G_X) = \lambda_1\,\mathrm{KL}\big(q_X(z|x)\,\|\,\mathcal{N}(0, I)\big) - \lambda_2\,\mathbb{E}_{z\sim q_X(z|x)}\big[\log p_{G_X}(x|z)\big]$$

where the KL divergence terms penalize deviation of the distribution of the latent code from the zero-mean Gaussian 𝒩(0, 𝐼). The decoding distributions p_GX and p_GY follow Laplacian distributions; therefore, minimizing the negative log-likelihood term is equivalent to minimizing the absolute distance between the image and the reconstructed image.
The adversarial losses are traditional conditional GAN objective functions, used to ensure that the translated images resemble images in the target domains.
$$\mathcal{L}_{CC_X}(E_X, G_X, E_Y, G_Y) = \lambda_3\,\mathrm{KL}\big(q_X(z|x)\,\|\,\mathcal{N}(0, I)\big) + \lambda_3\,\mathrm{KL}\big(q_Y(z|x^{X\rightarrow Y})\,\|\,\mathcal{N}(0, I)\big) - \lambda_4\,\mathbb{E}_{z\sim q_Y(z|x^{X\rightarrow Y})}\big[\log p_{G_X}(x|z)\big]$$

where the negative log-likelihood objective term ensures that a twice-translated image resembles the input one.
Fig. 5. The schematic diagram of UNIT [36]. (a) VAE-GAN framework. (b) Cycle-consistency framework.
2.3. Evaluation metrics
For a quantitative evaluation of the translation results, we used several metrics to assess the performance of our models. These metrics encompass widely recognized indicators, such as the Fréchet Inception Distance (FID) [37], structural similarity index (SSIM) [38], and normalized cross-correlation (NCC), as well as a predefined precision measure for the evaluation of the area of the defect. The FID metric provides a quantitative measure of the feature distance between real images and generated images. Lower FID values indicate better translation results. On the other hand, SSIM calculates the similarity between real images and the generated counterparts. A higher SSIM score suggests that the translated image closely resembles the original real image. Furthermore, the NCC index serves to measure the correlation between two images. With values ranging from −1 to 1, an NCC score of 1 denotes perfect correlation, while −1 indicates perfect anti-correlation.
$$\mathrm{FID} = \|\mu_r - \mu_g\|_{2}^{2} + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r\Sigma_g)^{1/2}\big)$$

$$\mathrm{SSIM} = \frac{(2\mu_r\mu_g + C_1)(2\sigma_{rg} + C_2)}{(\mu_r^2 + \mu_g^2 + C_1)(\sigma_r^2 + \sigma_g^2 + C_2)}$$

$$\mathrm{NCC} = \frac{\sum_{i,j}(r_{i,j} - \mu_r)(g_{i,j} - \mu_g)}{\sqrt{\sum_{i,j}(r_{i,j} - \mu_r)^2}\,\sqrt{\sum_{i,j}(g_{i,j} - \mu_g)^2}}$$

where 𝜇𝑟 and 𝜇𝑔 represent the mean values of the real and generated images, respectively. 𝛴𝑟 and 𝛴𝑔 represent the covariance matrices of the real and generated images, and 𝑇 𝑟 denotes the trace of a matrix. 𝜎𝑟² and 𝜎𝑔² are the variances of the real and generated images, respectively, and 𝜎𝑟𝑔 is their covariance. 𝐶1 and 𝐶2 are two constants used to avoid zeros in the denominator. 𝑟𝑖,𝑗 and 𝑔𝑖,𝑗 are the pixel values of the images at location (𝑖, 𝑗).
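NumPy sketches of the FID and NCC formulas above are shown below. Note that a full FID evaluation extracts Inception features first; the helper here only evaluates the closed-form distance between two given sets of Gaussian statistics.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu_r, sigma_r, mu_g, sigma_g):
    """Frechet distance between two Gaussians (feature statistics assumed given)."""
    covmean = sqrtm(sigma_r @ sigma_g).real  # matrix square root of the covariance product
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(sigma_r + sigma_g - 2 * covmean))

def ncc(real, gen):
    """Normalized cross-correlation in [-1, 1]; 1 means perfect correlation."""
    r = (real - real.mean()) / real.std()
    g = (gen - gen.mean()) / gen.std()
    return float((r * g).mean())
```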
The defect area constitutes a crucial region of interest (ROI) within the images. To assess the image translations in a pixel-based manner, a novel image classification metric focusing on the ROI was devised. The classification process involves categorizing pixels in all images as either defect or non-defect, with specific classification principles outlined for both UT and MFL images, as presented in Table 1. Considering that the MFL inspection tool can typically detect defects with depths greater than 5% of the wall thickness [39], a threshold of 5%wt was set in the UT domain. The analysis of the normalized MFL images indicates that values representing defects are typically greater than 0.06, therefore a threshold of 0.06 was set in the MFL domain for defect classification. As a result, binary images are obtained for evaluation, as exemplified in Fig. 6. For the evaluation of the translation results, we define the classification precision, which serves as a performance indicator.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where TP and FP are the true-positive and false-positive pixel counts extracted from the confusion matrix. This metric allows us to gain insight into the quality and fidelity of the translated images, especially in the context of the critical ROI.
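A sketch of the defect-pixel binarization and Precision computation follows, using the thresholds stated above (5%wt in the UT domain, 0.06 in the normalized MFL domain) and the 7.1 mm wall thickness from Section 3; array and function names are illustrative.

```python
import numpy as np

def binarize_ut(depth_map, wall_thickness=7.1):
    return depth_map > 0.05 * wall_thickness  # defect if metal loss exceeds 5%wt

def binarize_mfl(mfl_norm):
    return mfl_norm > 0.06                    # defect if normalized amplitude exceeds 0.06

def precision(pred_mask, true_mask):
    tp = np.logical_and(pred_mask, true_mask).sum()   # correctly flagged defect pixels
    fp = np.logical_and(pred_mask, ~true_mask).sum()  # falsely flagged defect pixels
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0
```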
3. Experiments and results
In this section, experiments were carried out to illustrate the promising results of GAN-based methods. The in-line inspection data used were sourced from a Pipeline Inspection Gauge (PIG) traversing the pipeline, with MFL and UT inspections conducted on the same pipeline during the same year. The pipeline under inspection was 30 inches in diameter with a wall thickness of 7.1 mm. Recognizing that lift-off variations between sensors and the complexity of the pipeline environment can cause signal variations, a comprehensive background magnetization correction was implemented for the MFL data as defined by Eq. (27). To ensure consistent measurement spacing across NDI techniques, the recorded MFL and UT data were interpolated to a spatial interval of 1 mm using the cubic spline interpolation method.
$$\tilde{X}_{i,j} = X_{i,j} - \frac{1}{K}\sum_{k=1}^{K} X_{i,k} + M \tag{27}$$

where 𝑖 is the sensor channel index and 𝑗 is the axial sampling index. 𝑋̃𝑖,𝑗 is the preprocessed MFL signal of channel 𝑖 at axial sample 𝑗, 𝑋𝑖,𝑗 is the measured MFL signal of channel 𝑖 at axial sample 𝑗, 𝐾 is the total number of axial samples, and 𝑀 is the median value of the measured MFL signal. After these procedures, the MFL data are spatially correlated with the UT data.
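The preprocessing can be sketched as below: a per-channel baseline subtraction consistent with the quantities defined for Eq. (27), followed by cubic-spline resampling onto a 1 mm axial grid; function and argument names are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def correct_background(X):
    """X: (channels, K) raw MFL; remove each channel's baseline, restore median level M."""
    M = np.median(X)
    return X - X.mean(axis=1, keepdims=True) + M

def resample_axial(X, positions_mm):
    """Resample every channel onto a uniform 1 mm axial grid via cubic splines."""
    grid = np.arange(positions_mm[0], positions_mm[-1], 1.0)
    return np.stack([CubicSpline(positions_mm, ch)(grid) for ch in X])
```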
For the model training phase, we prepared both aligned and unaligned datasets. Aligned datasets were constructed by pairing the MFL and UT measurements based on mileage. In cases where misalignment occurred due to unstable running conditions, manual adjustments were made to ensure alignment. Unaligned datasets were then generated by randomly introducing axial or circumferential shifts to one of the images within each aligned pair. This process was designed to simulate real-world scenarios in which misalignments occur between the two inspection methods. The resulting image pairs subjected to these intentional shifts are referred to as unaligned data, as they do not exhibit strict pixel-to-pixel correspondence. In total, we collected 400 image pairs for training and 50 image pairs for testing, all formatted into patches of size 256 × 256. Our networks were trained for 100 epochs with a batch size of 8. To facilitate a comprehensive performance evaluation, we conducted qualitative and quantitative comparative studies with GAN-based models. All models were trained with the same batch size and the same number of training iterations.
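A sketch of how such unaligned pairs can be simulated from an aligned pair is shown below; the maximum shift of 32 pixels is an illustrative assumption, not the paper's reported value.

```python
import numpy as np

def make_unaligned(mfl_patch, ut_patch, max_shift=32, seed=None):
    """Randomly shift one image of an aligned pair to break pixel correspondence."""
    rng = np.random.default_rng(seed)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted_ut = np.roll(ut_patch, shift=(dy, dx), axis=(0, 1))  # axial/circumferential shift
    return mfl_patch, shifted_ut
```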
Fig. 6. Binary processing. (a) Normalized MFL image. (b) Binary MFL image. (c) Original UT image. (d) Binary UT image.
In addition to the models presented in Section 2, the U-Net architecture is considered for comparison. Recognized for its ability to capture multi-scale contextual features through hierarchical feature maps, U-Net is commonly employed in GANs for generator construction. In this study, a modified version of the original U-Net structure was used, with input and output sizes adjusted to 256 × 256. Furthermore, diffusion modeling has shown great potential for high-quality image synthesis and has gained increasing attention in image-to-image translation tasks. Diffusion models convert samples from a standard Gaussian distribution into samples from an empirical data distribution through an iterative denoising process. For supervised translation, a novel image-to-image translation method based on the Brownian Bridge diffusion model (Diffusion) was proposed in [40]. This approach models image-to-image translation as a stochastic Brownian Bridge process and learns the translation between two domains directly through the bidirectional diffusion process, rather than relying on conditional generation. For unsupervised translation, a model for translating unpaired natural images using denoising diffusion models (UNIT-Diffusion) was developed in [41], which eliminates the need for adversarial training. These diffusion-based methods were included in our experiments for comparison. Two translation modes were realized in this study: (1) from MFL images to UT images and (2) from UT images to MFL images.
3.1. Translation of MFL to UT
In practice, MFL images possess the capability to provide high-resolution observations of the magnetic flux leakage intensity. Predicting UT images from MFL observations aims to preserve the information detected by MFL while presenting it in the UT format. The qualitative results of selected sample translations are presented in Fig. 7. Every two rows represent the detailed translation outcomes for a single sample. The first column shows the measured MFL and UT data, respectively, while the other columns showcase the translation results. Overall, the converted UT images effectively capture the majority of significant defect areas. However, some subtle differences can be observed in the finer details among the various translation methods. To quantitatively assess the similarities between the translated images and the reference images, we present the quantitative results in Table 2.
For the models trained on aligned data, Res-Pix2Pix remains the best-performing model in terms of FID, SSIM, NCC, and Precision. It consistently produces images that are more correlated with the measured UT than BicycleGAN, U-Net, and Diffusion. While BicycleGAN performs better than U-Net in the FID and SSIM metrics, U-Net shows relatively better performance in NCC and Precision. For the models trained on unaligned data, CycleGAN performs the best in terms of all metrics, and DiscoGAN performs better than UNIT across all metrics. Regular diffusion models have been reported to outperform GAN models in unconstrained image generation tasks [42]. However, in our study, we observe that the diffusion-based models are less competitive than the best-performing GAN-based models. It is important to note that diffusion models for unconstrained image generation are typically trained on large, highly heterogeneous datasets. In contrast, the translation models used for MFL and UT in this study are trained on relatively small and less diverse datasets. In addition, field pipeline inspection images have higher intrinsic noise than natural images. Diffusion models rely on pixel-level loss functions during training, which are less sensitive to noise and other fine-grained features in images. In contrast, the adversarial loss function used in GAN models better captures and processes these fine-grained features. Overall, Res-Pix2Pix performs better for aligned data, while CycleGAN outperforms for unaligned data. To compare the performance of models trained on aligned and unaligned data, we chose the best-performing model in each scenario. As can be seen from Table 2, Res-Pix2Pix trained on aligned images performs better than CycleGAN trained on unaligned data.
In contrast to other image generation or translation tasks, such as style transfer and colorization, where the input and output data types remain consistent, the translation between MFL and UT images involves representing an image in a different data type with significantly dissimilar image structures. As a consequence of these unique challenges, the NCC and SSIM scores are relatively lower than in other image applications. The lower scores indicate that achieving a high similarity correlation between the generated and ground truth images is more complex in the case of MFL-UT image translation. Nevertheless, the translation results still exhibit satisfactory image generation, as shown in Fig. 7.
Fig. 7. Visual comparison of translation results using different approaches on MFL to UT. (Images in the same domain in each row share the colorbar). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
3.2. Translation of UT to MFL
To evaluate the performance of the opposite translation direction, the conversion from UT to MFL was also conducted. Unlike MFL, which detects the magnetic flux leakage intensity, UT inspection gives the wall thickness measurement value, but with limited detection capability for small and subtle defects in the pipeline. The qualitative results are presented in Fig. 8. Intuitively, similar to the conversion of MFL to UT, obvious defects can be successfully generated. Local regions of the images generated by U-Net, DiscoGAN, and the diffusion-based models contain considerable noise, making them look blurry.
The quantitative results for the translation from UT to MFL are displayed in Table 3. For the models trained on aligned data, Res-Pix2Pix achieved the best results in the indicators of FID, SSIM, NCC, and Precision. Higher Precision values indicate that the defective regions in the translated images can be effectively classified in a pixel-wise manner. While the Precision for Res-Pix2Pix and BicycleGAN is the same, Res-Pix2Pix demonstrates superior performance on the other metrics. In the context of models trained on unaligned data, CycleGAN is the top-performing model. As for DiscoGAN and UNIT, the two slightly inferior models, their respective metrics exhibit minimal disparity. Overall, the precision across all GAN-based models exhibits slight variance, falling within the range of 0.6 to 0.7.
The diffusion-based model shows inferior performance compared to the GAN-based model in both supervised and unsupervised translation tasks. As shown in example 3 of Fig. 8, there is a significant difference between the UNIT-Diffusion results and the measured MFL. In addition to the previously mentioned factors of dataset size, heterogeneity, and noise level, this discrepancy is partly attributed to the randomness in the generated results. In our experiment, GAN models only receive source images to produce deterministic outputs, while diffusion models generate stochastic images by initiating the sampling process from a random noise image. In the task of translating from UT to MFL, where the translation is performed on a one-to-one basis, the variability exhibited by diffusion models can lead to inaccuracy in the translation results. Consequently, the general advantages of diffusion models with respect to sample diversity may be less pronounced in UT-to-MFL translation.
Fig. 8. Visual comparison of translation results using different approaches on UT to MFL. (Images in the same domain in each row share the colorbar). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
To compare the effect of the translation directions, we select the SSIM, NCC, and Precision metrics for evaluation. Notably, the FID metric is significantly influenced by pixel values, and given the disparate value scales inherent in MFL and UT data, a direct comparison of FID values between MFL images and UT images is infeasible. Our findings reveal a consistent trend across all metrics, wherein the UT to MFL translation direction consistently yields higher values than its counterpart, MFL to UT. This observation supports the notion that the images generated from UT to MFL are more similar to the reference images.
4. Discussion
To address the challenge of aligning data from different NDE sources with varying physical interpretations or formats, data translation mappings created with GAN-based models enable MFL and UT data to be represented in the same domain. The subsequent discussion involves an in-depth assessment of the performance of GAN-based models in NDE translation applications, the exploration of the optimal translation direction, and a comparative analysis between supervised and unsupervised translation methods based on the experimental results.
The application of GAN-based models to translate a sample from the source domain to the target domain is demonstrated with both quantitative scores and qualitative comparisons. The U-Net model is compared with the GAN-based models to evaluate the effectiveness of the adversarial strategy. The U-Net employs a single loss function that guides the training process by emphasizing the discrepancy between the predicted and target values for image translation. Limitations are introduced because the singular loss function may not comprehensively capture the complex relationships within the data, leading to noticeable distortion in local details. In contrast, GANs adopt an alternative training paradigm that incorporates adversarial dynamics between the generator and the discriminator. This alternative approach enhances GANs’ performance in image translation, allowing the model to capture intricate details more effectively than U-Net.
This study examines two translation modes, from MFL to UT images and vice versa. The performance of each translation direction was evaluated on both aligned and unaligned data, consistently revealing that the UT to MFL translation achieves superior results. The generated MFL images exhibit a closer resemblance to the reference images. This observation can be attributed to the inherent complexities in generating UT images. Analogizing the conversion between MFL and UT to the MFL direct inversion problem, i.e., deriving the metal loss profile from the MFL measurement signal, this process is recognized as an ill-posed inverse problem [43]. Similarly, generating UT images can be seen as a complex task due to the one-to-many mapping relationship between MFL and UT data. In contrast, the translation from UT to MFL resembles a simpler forward problem akin to the conversion from metal loss profiles to MFL data. This favorable characteristic contributes to the improved performance observed in the UT to MFL direction.
The application of GAN-based NDI image translation has been investigated in both supervised and unsupervised approaches. In supervised domain mapping, the methods are trained using aligned image pairs. On the other hand, unsupervised translation tasks have been developed to address the challenge of expensive aligned data preparation, adopting cycle-consistency for model training. In contrast to supervised models, unsupervised approaches learn the mappings between two image domains instead of relying on aligned images. To assess the performance of multimodal data translation under different training conditions, a comparison was conducted between models trained on paired and unpaired data. The results revealed that aligned data-based translation outperforms unaligned data translation. Specifically, Res-Pix2Pix excelled on aligned datasets, while CycleGAN outperformed other methods on unaligned datasets. Nevertheless, the practicality of unaligned approaches in dealing with real-world data remains acceptable, despite their slightly lower performance compared to supervised models. Ultimately, the selection of translation methods should consider the availability of aligned or unaligned data.
As a pre-processing step for subsequent analyses, the translation operation can be applied to multimodal measurements taken close to each other on the same structure. The translation of multimodal NDT data can facilitate defect characterization by bridging the gap between different measurement techniques. For instance, relying solely on axial MFL measurements may overlook axial defects, whereas integrating UT data incorporates complementary information for a more comprehensive defect assessment. Additionally, while UT offers accurate thickness measurements, its resolution is typically lower than that of MFL inspections, potentially leading to missed defects within sensor gaps. With data translation, multimodal data integration becomes feasible, leading to improved performance in defect detection and analysis. Some works have already offered data fusion techniques for NDI, primarily relying on probabilistic approaches such as Bayesian inference [12], fuzzy set theory [44], and Dempster-Shafer evidence theory [45]. This underscores the significance of translation models in facilitating multimodal data integration, thereby playing a key role in subsequent inspection tasks and analysis for pipeline condition assessment and maintenance.
5. Conclusion
This paper presents a feasibility study on magnetic flux leakage and ultrasonic testing cross-modal data translation, aiming to convert measurement signals from different physical formats into the same domain. The translation is achieved using GAN-based models trained with either aligned or unaligned data. By leveraging the adversarial strategy, the generator and discriminator play pivotal roles in the translation process. Supervised training is guided by adversarial loss and reconstruction loss, while an additional consistency constraint enhances the unsupervised training model. Two translation modes were examined in this study, specifically from MFL to UT and from UT to MFL, to evaluate the performance of the translation directions. The comparative analysis leads to key findings: (1) GAN-based models with adversarial strategies effectively generate target domain data; (2) The UT to MFL translation outperforms the MFL to UT translation in terms of selected metrics, e.g., NCC, SSIM, and Precision; and (3) Aligned data-based translation demonstrates superior performance over unaligned data translation.
The translation results obtained in this study hold great potential as valuable inputs for further data fusion operations. By leveraging these results, we may achieve a more comprehensive understanding of defects, bringing us closer to accurate ground truth assessments. Moreover, the concept of data translation can be extended to various non-destructive inspection methods, including MFL to eddy current (EC) and UT to EC conversions. This broadens the applicability of the approach across different inspection techniques and makes the integration of multiple NDE techniques possible.
CRediT authorship contribution statement
Jiatong Ling: Writing – original draft, Visualization, Methodology, Investigation, Data curation, Conceptualization. Xiang Peng: Writing – review & editing, Data curation, Conceptualization. Matthias Peussner: Writing – review & editing, Data curation, Conceptualization. Kevin Siggers: Writing – review & editing, Funding acquisition, Conceptualization. Zheng Liu: Writing – review & editing, Supervision, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This work was supported by NSERC and ROSEN Technology Canada under Grant ALLRP 576744 - 22.