Published on 17-Mar-2025

Flaw sizing with plane wave imaging (PWI) – total focusing method (TFM) and deep learning for reactor pressure vessel

ABSTRACT

Developments in machine learning and deep convolutional neural networks (CNNs) have enabled automated assessment of nondestructive evaluation (NDE) data. Ultrasonic data is especially challenging for automated evaluation due to its complexity, multi-channel nature, and volume. Typical flaw signals have a low signal-to-noise ratio, particularly the diffraction signals critical for sizing. This study presents a proof-of-concept application of two deep CNNs, a U-net and a transformer-based Swin-U-net, to flaw sizing in ultrasonic data from a nuclear test block with realistic flaw simulations. The segmentation CNNs extract flaw signals, enabling identification of the deepest crack tip echo, which mimics the process used by human inspectors. A novel data reconstruction method is proposed that combines plane wave imaging (PWI), the synthetic aperture focusing technique (SAFT), and the total focusing method (TFM) to provide a unified, volume-reconstructed view. Both networks provide good segmentation performance, allowing accurate sizing despite noisy data and complex flaw signals.

1. Introduction

Nondestructive evaluation (NDE) is used extensively in the nuclear industry to maintain and confirm the continued reliable and safe operation of mechanical structures [1,2]. The effectiveness and reliability of NDE is paramount, and there has been significant research and investment into confirming NDE reliability. Ultrasonic testing (UT) is particularly relevant and challenging. While the richness of UT data can be harnessed to enhance inspection capabilities and performance, it also makes the interpretation of complex data challenging. Improved procedures, qualification, increased mechanization, and the move to technologies like phased array (PA) ultrasonics [3] have significantly improved NDE reliability. At the same time, longstanding issues have remained relevant and have even been made worse by some of these developments. NDE data analysis is still laborious, difficult, and prone to human error, which is aggravated when the amount of data increases [4,5]. In this context, artificial intelligence and machine learning (AI/ML) can enhance the reliability of inspections through comprehensive interpretation of vast amounts of inspection data, a task that is beyond human capability. The further integration of digital tools and AI/ML is a critical enabler of NDE 4.0, the ongoing step-change or “revolution” in NDE systems and use [6–8].

1.1. Advances in NDE signal processing

Recent developments that combine different methods for insonifying a region of interest (ROI) with digital focusing have improved the performance of UT systems. Some emerging digital signal processing methods, like the total focusing method (TFM) [9–12], phase coherence imaging (PCI) [13–15], and plane wave imaging (PWI) [16,17], offer the opportunity to further enhance NDE capability and provide better data for flaw characterization and sizing [18,19]. Additionally, the use of ultrasonic signals, wavelet analysis, and significance testing has enabled the separation of structural noise and flaw echoes in metallic materials, further enhancing flaw detection capabilities [20]. Innovations such as adaptive time-frequency filtering and ultrasonic array transducers have contributed to the enhancement of flaw detection in various materials, including stainless steel structures and composites [21]. These advancements have led to the development of novel approaches for improving the ultrasonic inspection of materials, emphasizing the continuous evolution of ultrasonic NDE techniques for flaw detection and sizing.

1.2. The challenges of high-volume data in NDE

The new NDE technologies have increased the amount of data to be analyzed. While this improves the data and technical performance of these methods, it also makes data evaluation more challenging and may increase the potential for human errors. Data analysis can be a tedious and time-consuming task. Human inspectors are susceptible to errors due to fatigue and environmental factors [22–24], and performance can vary between inspectors [25]. While recent research has provided tools for quantifying detection [25], the methodology for quantifying sizing performance is still relatively undeveloped [26]. Flaw sizing in NDE presents several major challenges due to the diverse nature of materials and the complexities involved in flaw detection. One of the key challenges is the need for accurate and reliable flaw detection in critical components of systems such as nuclear power plants and pipelines [27]. Furthermore, the effectiveness and accuracy of flaw sizing methods are crucial for ensuring the safety and integrity of structures, where ultrasonic NDE techniques play a vital role in detecting, locating, and sizing potential flaws [21].

1.3. Flaw sizing

Different methods have been used for sizing flaws from UT signals [18]. Amplitude-based sizing methods, such as the 6 dB drop method, have been the industry standard but are limited and unreliable in the case of real cracks with irregular surfaces, often resulting in under- or oversizing [28–30]. Other sizing techniques, such as distance amplitude correction (DAC) and distance gain size (DGS), rely on the comparison of measured signals with reference signals from known reflectors. As such, sizing accuracy depends strongly on the similarity between the real flaws and the known reflectors, which in most cases is unrealistic [31,32]. So far, the most reliable methods for sizing real cracks are those based on correctly identifying and locating diffraction signals from crack tips [15,33–35]. These are sometimes challenging due to the low amplitude of the tip signals and uncertainties related to correct tip identification, particularly in noisy materials.

1.4. Automated flaw detection

Recent work on machine learning (ML) for NDE [36–41] makes it possible to automate the difficult data evaluation, reducing the risk of human errors and increasing the consistency and efficiency of inspections [42]. Although machine learning has shown success in complex classification tasks, its use remains limited due to a lack of representative flawed datasets for training. For ultrasonic testing, the relevant flaw characteristics include: location and orientation; size; opening through the whole path and at the crack tip; fracture surface roughness; and whether the flaw is filled with some substance [31]. It has also been demonstrated that ML generalization is different for different artificial flaws, solidification cracks, EDM notches, and simple simulated flaws [43]. Fortunately, data augmentation techniques have had recent success in providing synthetic or virtual flaws with sufficient quality for model training [44–48].

Different ML architectures have been successfully applied to NDE tasks. Examples in defect detection include support vector machines [49,50], random forest algorithms [51–53], and autoencoders [54]. Moreover, the success of deep neural networks in various cognitive tasks has raised expectations for the application of ML to the interpretation of ultrasonic testing data in the NDE field [55]. Recently, convolutional neural networks (CNNs) have become the method of choice for tasks like image classification, object detection, and segmentation [56,57]. The integration of machine learning with ultrasonic data has shown potential in improving flaw detection and classification. Studies have demonstrated the use of neural networks to classify various flaw types, such as cracks, slag inclusions, porosity, and artificial flaws, from ultrasonic signals [43,58,59]. CNNs have been increasingly utilized in the field of ultrasonic testing [60]. They are particularly preferred for processing two-dimensional ultrasonic data and have been found to be effective in ultrasonic flaw classification, even in noisy conditions [58]. CNN-based ultrasonic image reconstruction has also been proposed as an approach for improving ultrafast acquisition [61], and several studies have demonstrated the effectiveness of CNNs in locating and identifying flaws from UT data [43,59,62].

Fig. 1. Mock-up with coordinate system. Schematic representation (a); and real mockup with probe and probe movement system (b).

Fig. 2. Schematic representation of the probe and wedge set-up. The plane waves were shot at 35°, 40°, 45°, 50°, and 55° angles. The reconstructed region of interest (ROI) is 200 × 200 voxels.

1.6. Data acquisition and digital reconstruction

Conventional PAUT applies beamforming both in transmission and reception. Electronic beamforming is performed directly in the hardware, according to precalculated focal laws, typically using the delay-and-sum (DAS) technique [70]. The beam may be “planar” or focused at specific points in the target volume. Because of the pre-determined focusing points, PAUT can suffer from limited lateral resolution and contrast, and loss of resolving power with increasing distance from the focusing point. To overcome these issues, several alternative beamforming methods have been put forward, mostly originating in medical UT applications. For the inspection of solid materials, factors such as mode conversion or the effect of complex microstructure must be considered, but the working principles of these algorithms still apply. Spatial filtering techniques based on spatial spectrum estimation [71,72] can improve resolution and suppress interference from unwanted directions but are sensitive to covariance estimation and can be computationally complex. Variance-based methods [73–76] excel at rejecting interference while enhancing the desired signal but might be sensitive to estimation errors or noise. Adaptive beamforming methods [77–79] dynamically adjust to changing conditions and are effective at suppressing interference in dynamic scenarios, but can be slow to converge, and continuous adaptation and parameter updates can introduce computational overhead. A version of DAS that multiplies combinatorially coupled signals before the summing step, called delay multiply and sum (DMAS) [80], enables higher contrast resolution, though at the cost of amplitude loss away from the transmission focal depth.

The superposition of elemental signals works for a wide range of different inspections, even when the elemental signals are separate in time and never superimposed in the physical material [81]. A set of reconstruction techniques takes this approach: they excite and acquire separate elemental signals independently and then compute the reconstructed data in post-processing. The primary benefit of these techniques is that the focusing can be easily computed throughout the insonified volume, and various advanced focusing and contrast techniques can be applied to further optimize the inspection. The potential drawbacks include more computationally intensive data processing and a large volume of acquired data.

To achieve focusing at every point in a ROI, the target is insonified with multiple waves. Although implementation details may vary, the digital focusing (DF) algorithms for UT mostly share the same working principle from a conceptual point of view [13,14,16,82,83]: a region of interest is discretized into a grid of pixels/voxels, and the intensity of each pixel/voxel is determined by the coherent summation of the contributions from different signals that insonify the pixel/voxel from different directions. The various embodiments of this working principle differ in how the target is insonified and in exactly how the contributions to the intensity in each pixel/voxel are calculated [83].

One of the first such algorithms was the synthetic aperture focusing technique (SAFT). Originally developed to improve the capabilities of radar systems, SAFT has been applied to ultrasonic testing since the late 1970s [81,84,85]. Each element works in pulse-echo mode, and the data/image reconstruction for a point in a grid in the ROI is obtained by the coherent summation of the data sampled from each element’s pulse-echo signal [86,87]. Other methods, using more than one element on receive, can provide high resolution, though at the cost of processing time [82,88]. Full matrix capture (FMC) [9,89,90] is a data acquisition technique where the data from all possible transmit-receive element combinations on a phased-array probe is captured, providing a comprehensive dataset that enables detailed analysis and imaging of the target. Processing the complete matrix of collected data requires significant computational power [83,90,91]. Methods such as plane wave imaging (PWI) [16,17] and sparse array acquisition [92] mitigate these issues by reducing the number of transmit events without significant reduction of imaging quality, provided the parameters are adequate.

In the present study, we present a proof-of-concept on the application of deep convolutional neural networks (CNNs) to provide sizing on plane wave imaging/total focusing data obtained from a nuclear test block with realistic flaw simulations. We combine PWI with TFM and SAFT to generate high-quality reconstructions of a region near the inner diameter of a reactor pressure vessel (RPV) wall mock-up with embedded artificial flaws. We then use these images to train a state-of-the-art Swin U-net model [93] to detect and size defects in the mock-up.

Fig. 3. Representation of probe orientation and alignment for each of the four scans.

2. Materials and methods

2.1. Mock-up material and geometry

The mock-up (shown in Fig. 1) represents a section of an RPV made from ferritic pressure vessel steel (shear wave sound speed 3250 m/s), clad on the inner diameter side with a 10 mm thick layer of austenitic stainless steel. The wall thickness of the mock-up is 151 mm, including the cladding. The outer radius is 1922 mm, the arc length is 948 mm at the outer diameter, and the axial length is 933 mm. Flaw depths range from 1.8 mm to 18 mm, and flaw lengths range from 5 mm to 40 mm.

Fig. 4. Representation of the proposed combined PWI-TFM and SAFT technique (SATFM).

Fig. 5. 3D volumes of the four full scans after SATFM reconstruction. The Axial+ data set shows intermittent contact issues.

2.2. Phased array data acquisition set-up

The phased array data was acquired in pulse-echo mode using a Verasonics Vantage 64 LE research system and an Imasonic CdC8426-4 linear-array probe with a center frequency of 3.5 MHz. The wavelength is ≈ 0.93 mm. The probe has a total of 96 elements (with size 17 × 0.725 mm and pitch 0.8 mm) but, since the Vantage 64 LE is limited to 64 receive (Rx) channels, only the first 64 elements were used, resulting in a total aperture of 51.2 mm.

Inspections looking for crack-type defects are typically made using shear waves at around 45°. So, to minimize electronic steering, the probe was mounted on a 30.5° wedge made from cross-linked polystyrene (sound speed 2330 m/s). With no electronic steering, the 30.5° wedge results in 45° shear waves in the steel mock-up. Each acquisition was performed using 6 plane waves (PW) transmitted at different angles, resulting in 0°, 35°, 40°, 45°, 50°, and 55° in steel. The recorded data corresponds to the signals received by the 64 elements. A schematic representation of the probe-wedge set-up is presented in Fig. 2.
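The wedge geometry can be checked with Snell's law. The sketch below is a minimal illustration, assuming a wedge sound speed of 2330 m/s and the 3250 m/s shear speed quoted for the steel; the function names are hypothetical and the interface is treated as flat.

```python
import math

C_WEDGE = 2330.0  # m/s, sound speed in the polystyrene wedge (assumed value)
C_STEEL = 3250.0  # m/s, shear wave speed in the ferritic steel

def refracted_angle_deg(incidence_deg, c1=C_WEDGE, c2=C_STEEL):
    """Snell's law: sin(theta2) / c2 = sin(theta1) / c1."""
    s = math.sin(math.radians(incidence_deg)) * c2 / c1
    if abs(s) > 1.0:
        raise ValueError("incidence angle is beyond the critical angle")
    return math.degrees(math.asin(s))

def incidence_angle_deg(refracted_deg, c1=C_WEDGE, c2=C_STEEL):
    """Inverse problem: wedge incidence angle needed for a given angle in steel."""
    s = math.sin(math.radians(refracted_deg)) * c1 / c2
    return math.degrees(math.asin(s))

# Angle in steel produced by the unsteered 30.5 degree wedge.
natural = refracted_angle_deg(30.5)

# Steering (in the wedge, relative to 30.5 degrees) needed for each target angle.
steering = {a: incidence_angle_deg(a) - 30.5 for a in (35, 40, 45, 50, 55)}
```

Under these assumptions the unsteered beam refracts to roughly 45° in steel, and the other four angles need only a few degrees of steering on either side.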

The received signals were sampled at 31.25 MHz (10.2 samples/wave in steel), resulting in an A-scan resolution of 9.6 samples/mm (sound path resolution of 0.032 μs). Each acquisition was performed using six steered PWs and saved a file containing six sets (one per angle) of 64 A-scans (one per element) with 5888 data points per A-scan. The size of each file is 4417 kB. However, the data from the 0° angle acquisitions was not used for the data reconstruction/processing.
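The sampling figures above can be reproduced with a few lines of arithmetic. This is only a sanity-check sketch; the 2 bytes per sample is an assumption (16-bit samples), and any file header is ignored.

```python
FS_HZ = 31.25e6           # sampling frequency
C_STEEL_M_S = 3250.0      # shear wave speed in steel
N_ANGLES = 6              # plane waves per acquisition
N_ELEMENTS = 64           # receive elements
N_SAMPLES = 5888          # data points per A-scan
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples

# Time between consecutive samples, in microseconds.
sample_period_us = 1e6 / FS_HZ

# Samples per millimetre of sound path in steel.
samples_per_mm = FS_HZ / (C_STEEL_M_S * 1e3)

# Payload of one acquisition file, without any header.
payload_bytes = N_ANGLES * N_ELEMENTS * N_SAMPLES * BYTES_PER_SAMPLE
payload_kib = payload_bytes / 1024
```

With these assumptions the payload comes to 4416 KiB, close to the reported 4417 kB; the small remainder would be consistent with a file header, although the source does not say so.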

The probe-wedge assembly (Table 1) was enclosed in a custom-made chassis with features that enable water to be fed to the contact surface for proper coupling. The chassis was attached to a two-axis encoding scanner (Olympus GLIDER X-Y Scanner) that was mounted on the mock-up (Fig. 1). The encoder was interfaced with the Verasonics system via an Arduino UNO, which was used to decode the encoder’s signal and, accordingly, trigger the Verasonics device to perform the acquisitions at fixed step positions. The Arduino was programmed to output one triggering pulse after every 8 steps/pulses from the scanning-axis encoder, which for the encoder’s resolution results in one acquisition per 2.46 mm. The indexing-axis step/resolution was also set to 2.46 mm.
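The triggering logic amounts to a pulse divider. The following is a simplified Python simulation of that idea, not the Arduino firmware used in the study: it assumes a single pulse stream in one direction (no quadrature decoding or reversal), and the per-pulse distance is inferred from the stated 2.46 mm per 8 pulses.

```python
PULSES_PER_TRIGGER = 8
STEP_MM = 2.46
MM_PER_PULSE = STEP_MM / PULSES_PER_TRIGGER  # implied encoder resolution

def trigger_positions(n_pulses):
    """Return the positions (mm) at which an acquisition trigger would fire."""
    positions = []
    count = 0
    for pulse in range(1, n_pulses + 1):
        count += 1
        if count == PULSES_PER_TRIGGER:
            # Every 8th encoder pulse emits one trigger and resets the counter.
            positions.append(pulse * MM_PER_PULSE)
            count = 0
    return positions

positions = trigger_positions(80)
```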

The data acquisition was done one scan line at a time. The procedure consisted of: manually moving the probe-wedge assembly along a scanning line encompassing the full length of the mock-up, with the encoder triggering the acquisitions; moving the probe-wedge to the start of a new scanning line at the next indexing position; and repeating the two previous steps until the full scan of the mock-up is complete.

The mock-up was scanned four times with four different probe orientations. The scanning direction was fixed along the axial dimension of the mock-up, but for each scan the probe was aligned with or against the circumferential or axial dimensions of the mock-up. The four scanning directions, labeled Axial+, Axial-, Circumferential+, and Circumferential- (Table 2), are depicted in Fig. 3. Two versions of the wedge were manufactured with different contact surface geometry to conform to the mock-up curvature and ensure adequate coupling at the different scanning orientations.

Due to the asymmetry of the probe-wedge assembly coupled with size limitations of the scanning encoder, the four full scans did not have the same number of indexing lines and acquisitions per scanning line. The number of acquisitions for each full scan is presented in. The size of the raw data for each full scan of the whole mock-up is 2 × a × e × i × s × p, where a is the number of transmit angles, e is the number of elements, i is the number of scan lines/index, s is the number of acquisitions/files per scan line, and p is the number of data points per A-scan.

2.2.1. PWI/TFM data reconstruction

The raw data was reconstructed using the modified plane wave imaging (PWI) algorithm with direct mode, following Le Jeune et al. [17]. The method allows focusing on every point of a ROI with a reduced number of transmissions compared to full FMC acquisition. Per acquisition, a set of N plane waves is transmitted at N angles, and the backscattered signals recorded by each of the E elements are saved into a matrix. For each angle, the image of the ROI is constructed pixel-by-pixel from the coherent sum of the analytic signals s, given by the Hilbert transform of the angle-element signal responses, evaluated at the pixel location, with the intensity of a pixel at point P given by:

$$I(P) = \left| \sum_{q=1}^{N} \sum_{k=1}^{E} s_{qk}\left(t_{qP} + t_{kP}\right) \right|$$

where $t_{qP}$ is the forward time of flight of the plane wave at angle q to the focusing point P, and $t_{kP}$ is the backward time of flight from the focusing point P to the receiving element k. The reconstructed data is then saved as two datasets containing the magnitude and the phase angle of the analytic signal, respectively.
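As a rough illustration of this delay-and-sum principle, the sketch below sums analytic signals sampled at the forward plus backward time of flight for every pixel. It is not the authors' implementation: it assumes a contact array on a homogeneous medium (no wedge or refraction), nearest-sample lookup instead of interpolation, and hypothetical function and parameter names. It returns the complex sum, so both magnitude and phase remain available.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal along the last axis (FFT-based Hilbert transform)."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h, axis=-1)

def pwi_image(rf, angles_rad, elem_x, grid_x, grid_z, c, fs):
    """Coherent PWI sum over angles q and elements k.

    rf: real A-scans, shape (n_angles, n_elements, n_samples).
    Returns a complex image of shape (len(grid_z), len(grid_x)).
    """
    s = analytic_signal(rf)
    n_samples = rf.shape[-1]
    X, Z = np.meshgrid(grid_x, grid_z)
    image = np.zeros(X.shape, dtype=complex)
    for q, theta in enumerate(angles_rad):
        # Forward time of flight of the plane wave to each pixel.
        t_fwd = (X * np.sin(theta) + Z * np.cos(theta)) / c
        for k, xk in enumerate(elem_x):
            # Backward time of flight from each pixel to element k.
            t_back = np.hypot(X - xk, Z) / c
            idx = np.rint((t_fwd + t_back) * fs).astype(int)
            valid = (idx >= 0) & (idx < n_samples)
            image[valid] += s[q, k, idx[valid]]
    return image
```

Magnitude and phase datasets would then follow from `np.abs(image)` and `np.angle(image)`.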

The size of the defined ROI was 200 × 200 pixels with a pixel size of 0.6 mm, encompassing a 120 × 120 mm area. The position of the ROI relative to the probe is depicted in Fig. 2. Although the data acquisition was performed with six PWs, the 0° wave was not used in the data reconstruction, reducing the number of PWs to five (35°, 40°, 45°, 50°, and 55°). The dimensionality/size of the data is thus reduced as presented in Table 3.

Fig. 6. Example flawed data frames (left) and resolved masks (right) from the training data. Reconstructed signal amplitude is indicated with darkening blue. Flawed pixels in mask are indicated in black. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

2.2.2. Synthetic aperture total focusing method (SATFM)

Having the ROI resolution equal to a submultiple of the scanning resolution enables straightforward summing of the intensity of each pixel across different frames in the same scan line, effectively combining PWI-TFM with SAFT (Fig. 4). The previously created PWI-TFM reconstructions were combined in this way to create the full SATFM reconstruction. This is a straightforward extension of the very similar SAFT and TFM reconstruction methods. The TFM-reconstructed ROIs, with magnitude and phase angle information at 0.6 mm resolution and 2.46 mm frame step, are combined by offsetting successive frames by 4 pixels (2.46 mm), followed by phase-coherent summing along the indexing direction (in the case of the Circumferential scans) or the scanning direction (in the case of the Axial scans). The final dimensions of the four SATFM datasets are presented in Table 4. The SATFM reconstruction significantly reduces data size and enables the reconstruction of the full 3D volume (Fig. 5) of the target, which makes data visualization and evaluation easier. In Fig. 5, the Axial+ data set shows intermittent contact issues. Most flaws are not immediately apparent as high-amplitude signals and require more detailed evaluation of local signal behaviour to be revealed.
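The shift-and-sum step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: frames are taken to be stored as magnitude and phase arrays, the last array axis is assumed to lie along the stepping direction, and the function names are hypothetical.

```python
import numpy as np

def from_mag_phase(mag, phase):
    """Rebuild complex frames from stored magnitude and phase-angle data."""
    return mag * np.exp(1j * phase)

def satfm_line(frames, shift_px=4):
    """Phase-coherent sum of successive frames, each offset by shift_px pixels.

    frames: complex array of shape (n_frames, nz, nx), where the last axis
    is assumed to run along the direction in which the probe is stepped.
    """
    n_frames, nz, nx = frames.shape
    out = np.zeros((nz, nx + shift_px * (n_frames - 1)), dtype=complex)
    for f in range(n_frames):
        start = f * shift_px
        # The same physical point appears shift_px columns earlier in the
        # next frame, so shifted frames add up at a common output column.
        out[:, start:start + nx] += frames[f]
    return out
```

Because the sum is complex, contributions with matching phase reinforce each other while contributions in antiphase cancel, which is what distinguishes this from summing magnitudes.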

Fig. 7. Sizing results over the 20 flaws labeled in the test data. Negative numbers indicate mm below scanning surface.

2.3. Data pre-processing and labeling for ML

While the SATFM reconstruction significantly reduces data redundancy and allows better reconstruction with more elemental signals, it may also suffer from unsharpness induced by errors in the elemental signal configuration. For example, poor surface quality, uneven scanning, and material variations may induce noise and blur the reconstructed image. This is of particular concern when scanning on uneven surfaces and aiming to find small, elusive signals such as crack tip signals. Given the observed poor surface condition of the mock-up, it was decided to use the SATFM reconstruction for data evaluation and labeling, for best data quality, but to train the ML model on the individual PWI/TFM frames to give the best possible chance of picking up crack tip signals. The raw inference results were then recombined into the SATFM voxel space, in the same way as the UT data, for presentation.

Fig. 8. Example SATFM reconstructed data and reconstructed true state labels, U-net inference and Swin-U-net inference results. The results are from the Axial+ data and show an example of the excessive false call rate experienced by the U-net (right side of the labeled flaw) and the corresponding reduced sensitivity of the Swin-U-net (left indicated flaw). The grey horizontal lines show the nominal back-wall and estimated cladding interface.

Labeling was carried out using the open-source Napari software [94], which is particularly well-suited for labeling multi-dimensional data. The manual labeling was performed at pixel level for all data. It was further refined by applying an amplitude threshold to focus more accurately on the higher-amplitude echoes within the labeled area. The label mask established corresponded to the whole flaw “body”, i.e. signals from the corner trap to the highest tip.
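The amplitude-based refinement of a manual label can be illustrated with a short sketch. The source does not state the threshold used, so the relative threshold below (a fraction of the peak amplitude inside the labeled area) and the function name are purely hypothetical.

```python
import numpy as np

def refine_mask(manual_mask, amplitude, rel_threshold=0.5):
    """Keep labeled pixels whose amplitude is at least rel_threshold times
    the peak amplitude found inside the manually labeled area."""
    manual_mask = manual_mask.astype(bool)
    if not manual_mask.any():
        return manual_mask
    limit = rel_threshold * amplitude[manual_mask].max()
    return manual_mask & (amplitude >= limit)

# Tiny example: a 1 x 4 labeled strip with one weak pixel.
amplitude = np.array([[1.0, 0.8, 0.2, 0.9]])
manual = np.array([[1, 1, 1, 0]])
refined = refine_mask(manual, amplitude)
```

In this example the weak pixel is dropped from the label, and the strong pixel outside the manual area stays unlabeled, since the threshold only narrows the human-drawn region.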

It is normally expected that the corner trap gives the highest amplitude and is the primary means for crack detection. The tip diffraction is expected to be some 20 dB lower [18]. In between, several minor echoes are expected unless the flaw surface is exceptionally smooth. These arise from crack path tortuosity and microscopic branching. Thus, the expected indication from a flaw would be a dyad of high corner trap echo and lower tip echo sparsely connected by echoes along the crack face.

In the present case, several things complicate this theoretical ideal. Firstly, the austenitic cladding on the surface exhibits significant microstructural noise that effectively suppresses much of the corner trap echo. Thus, in many cases, the corner trap echo is not clearly distinguishable from the cladding noise. Secondly, the tip echo and face echoes vary with the tip/face condition and depend significantly on the elemental beam direction and local conditions. Thirdly, some of the artificial flaws in the present mock-up were created by weld-implanting flaws into the pristine material, and this implantation weld shows increased noise and sometimes spurious indications that may be related to additional weld flaws. Thus, the task for the ML model is to extract or segment the flaw face signals, from the assumed corner trap within the cladding noise to the highest tip-like signal still associated with the flaw, and to do this in a stable manner even with variation of the signal strength and contact conditions during the scanning. Accordingly, in the data labeling, the labeled mask corresponded to such crack face signals, as evaluated by the human evaluator. This task is inherently ambiguous, and it is impossible to fully separate flaw signals from the surrounding noise. Fig. 6 shows some example label masks from the training data.

Fig. 9. Example SATFM reconstructed data and reconstructed true state labels, U-net inference and Swin-U-net inference results. The results are from the Circumferential+ data and show an example of good detection and sizing on a low-amplitude flaw indication. Both models show very similar inference results.

As is evident from Fig. 6, the labeling reflects the inherent ambiguity and contains significant uncertainty. The primary interest lies in correctly defining the crack tip, and the secondary interest in correctly identifying the crack path all the way to the cladding noise. However, not all features or characteristics of the defect are necessarily visible in a single frame. Thus, the labeling, as implemented, will necessarily contain a high number of labeled pixels that are impossible to infer from a single shot. Accordingly, it is not expected or required that the trained model will be able to successfully replicate the labeling in each shot. Instead, the aim is to train a model whose outputs, when combined in the SATFM manner, correctly indicate the crack extent.

The flaws in the mock-up are primarily oriented in near-axial or near-circumferential directions. These correspond to the scan directions. Consequently, the scan plan (Table 2) results in all flaws being scanned twice with optimal orientation, once from each direction. While these are the same physical flaws, the scans from different directions see a different face of the crack and are considered substantially independent of each other.

2.4. Models training

For the model training, the negative (Circumferential- and Axial-) scans were used for training and validation, and the positive scans were used for testing only. This provided extensive independent testing data. Also, to change the scan direction, the probe is removed from and re-introduced to the holder, and the scanning sees the surface from a different direction. Hence, if there is any systematic variation in the surface or probe attachment, it will not leak information into the training with this arrangement.

Altogether, this yielded 219230 example frames for training and validation, with 12791 (12.2 %) being flawed. The samples were randomized during training with 6400 reserved for validation.

Two models were trained: a Swin-U-net [93] (26 million parameters) and a more traditional U-net [95] (20 million parameters). The U-net architecture has shown good performance in a multitude of engineering segmentation tasks and is used here as a baseline model. The Swin-U-net is a more recent architecture and is used here to explore the potential benefits afforded by the transformer architecture. The U-net exhibits faster inference (22 vs. 66 ms/frame on Apple M1 Max). Both models are fast, and this difference is expected to have minor practical significance. Both models were trained with the same data until no further improvement was expected.

In the training phase, we employed Gaussian smoothing with a 5 × 5 pixel kernel to smooth the labels. This was the extent of data augmentation; notably, we consciously chose not to use the common practice of flip augmentation. Our rationale was that the directionality of echoes carries significant information in our study, and flipping could potentially distort this. Additionally, while the reconstruction process inherently acts as a smoothing operation, we postulated that introducing random noise would not contribute meaningful variability relevant to our problem domain. However, a more complex method of noise introduction, such as Perlin noise, might be advantageous for introducing problem-specific variation but was omitted here for simplicity.
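Label smoothing of this kind can be sketched with a separable Gaussian kernel. Only the 5 × 5 kernel size comes from the text; the standard deviation, the zero padding at the borders, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel_1d(size=5, sigma=1.0):
    """Normalized 1D Gaussian kernel; sigma is an assumed value."""
    r = np.arange(size) - (size - 1) / 2
    k = np.exp(-0.5 * (r / sigma) ** 2)
    return k / k.sum()

def smooth_labels(mask, size=5, sigma=1.0):
    """Smooth a binary label mask with a size x size Gaussian kernel."""
    k = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(mask.astype(float), pad, mode="constant")
    # A 2D Gaussian is separable: filter along rows, then along columns.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, k, mode="valid"), 0, rows)

# Example: a single labeled pixel becomes a soft 5 x 5 blob.
mask = np.zeros((9, 9))
mask[4, 4] = 1.0
soft = smooth_labels(mask)
```

The output keeps the input shape and turns hard 0/1 edges into gradual transitions, which softens the penalty for predictions that are off by a pixel or two from an ambiguous label boundary.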

The training data is suboptimal and could be improved. Firstly, the data is biased towards the non-flawed samples, as is typical for NDE data. Secondly, the number of flaw samples is fairly limited and the flaw signals contain artifacts due to the flaw implantation process. These are common issues in NDE and there are known ways to alleviate them, most importantly the use of virtual flaws [46]. However, in the present work this data refinement was omitted for simplicity.

Fig. 10. Full 3D reconstructions. Comparison between the "raw" SATFM reconstruction, the true state labels, the U-net inference results, and the Swin-U-net inference results.

3. Results and discussion

As discussed, the labeling in the training and testing data is necessarily ambiguous, and thus the elementary segmentation results by themselves are not expected to be particularly good or informative. Nevertheless, the intersection over union (IoU) values are shown in Table 5 for completeness, together with the more interesting combined SATFM segmentation results. Here, the IoU values are computed for each labeled SATFM frame. Even more important are the detection rate and the accuracy of the deepest flaw tip signal. To evaluate this, each distinct flaw in the data was separated, the maximum flaw depth was computed from both the labels and the inference results, and these values were compared to reveal the sizing accuracy. These data are shown in Fig. 7, and the mean sizing accuracy and confidence bounds are presented in Table 5. All the labeled cracks were found. However, the task and labeling focused on cracks with significant size, so the high detection rate here should not be taken to indicate the applicability of the model for flaw detection in general.
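Both evaluation measures can be sketched in a few lines. This is an illustrative reading of the procedure rather than the study's code: it assumes one flaw per mask, image rows that increase with depth below the scanning surface, and a hypothetical `roi_top_mm` offset for the depth of the first row.

```python
import numpy as np

def iou(pred, label):
    """Intersection over union of two binary masks."""
    pred = pred.astype(bool)
    label = label.astype(bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, label).sum() / union

def tip_depth_mm(mask, pixel_mm=0.6, roi_top_mm=0.0):
    """Depth below the scanning surface of the labeled pixel closest to it,
    i.e. the highest crack tip. Returns None for an empty mask."""
    rows = np.nonzero(mask.astype(bool).any(axis=1))[0]
    if rows.size == 0:
        return None
    return roi_top_mm + rows.min() * pixel_mm

# Example: the prediction misses the topmost labeled pixel.
label = np.zeros((10, 10))
label[4:9, 5] = 1
pred = np.zeros((10, 10))
pred[5:9, 5] = 1
score = iou(pred, label)
sizing_error_mm = tip_depth_mm(pred, 0.6, 100.0) - tip_depth_mm(label, 0.6, 100.0)
```

In this example the flaw is undersized by one pixel (0.6 mm) while the IoU is 0.8, which shows how the two measures capture different aspects of the result.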

The two scan directions reserved for testing purposes showed significant differences in data quality: the Axial+ scan had significantly more contact variation and data quality issues due to the surface quality and other sample-related coupling issues. While this was unintended, it offers a peek into the effects of data quality on the models studied. Thus, in the results, the Axial, Circumferential, and combined total results are presented separately (Table 5).

For a qualitative impression of the data quality and model performance, some example images are presented in Figs. 8 and 9. Both models are seen to accurately segment the labeled defect up to the highest tip. For the lower-quality data (Fig. 8), the U-net shows additional (false) indications near the bottom, whereas the Swin-U-net does not. Conversely, the Swin-U-net is seen to lose some sensitivity on the left indication due to the data quality issues, while the U-net seems to retain full sensitivity. For the good-quality data (Fig. 9), both models show similar results.

Despite considerable ambiguity in the segmentation task, the trained models show impressive aggregate (SATFM) performance for the crack detection and sizing task. While the IoU values for individual frames are very low, the models still manage to correctly identify the deepest crack locations in aggregate. The sizing accuracy is on par with the labeling accuracy, and the slight undersizing tendency of both models is likely mostly associated with labeling inaccuracy that tends to oversize the true indications discernible from the data.

The reconstruction of individual TFM frames into “SATFM” frames provides a significant benefit for sizing, as the accumulated signal diminishes issues stemming from local variation between different TFM shots. Perhaps more unexpected is that this benefit also carries over to combining inference signals. The per-frame inference results display fairly low IoU values, and aggregating the results provides a significant improvement. To a certain extent, this may be caused by deficiencies in the per-shot labeling, but due to the inherent ambiguity of the task, improving per-shot labeling provides limited opportunity.
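
The effect of aggregation can be sketched with a simple stand-in for the combination step. The sketch below assumes that each per-frame inference map covers a window of a shared grid, shifted by a known integer column offset, and that overlapping maps are combined by plain averaging followed by a threshold. The actual SATFM combination used in this study may differ; the function name, the offsets, and the threshold are hypothetical.

```python
import numpy as np

def aggregate_frames(frames, offsets, width):
    """Average overlapping per-frame maps on a shared (depth x width) grid."""
    depth = frames[0].shape[0]
    total = np.zeros((depth, width))
    count = np.zeros((depth, width))
    for frame, off in zip(frames, offsets):
        w = frame.shape[1]
        total[:, off:off + w] += frame
        count[:, off:off + w] += 1
    # Cells not covered by any frame stay at zero
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

# Three toy 2 x 3 inference maps (illustrative values only)
f0 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]])  # flaw at global column 1
f1 = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # same flaw plus a stray pixel
f2 = np.zeros((2, 3))                               # no indication

combined = aggregate_frames([f0, f1, f2], offsets=[0, 1, 2], width=5)
mask = combined > 0.5  # assumed decision threshold
print(combined)
print(mask.sum())
```

The indication seen consistently in two overlapping frames keeps full strength after averaging, while the stray pixel seen in only one of two overlapping frames is halved and falls below the assumed threshold. This mirrors, in a simplified way, why combined results can be more stable than per-frame results.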

The comparison between the baseline U-net and the transformer-based Swin-U-net shows some differences. Both models display an impressive capability to correctly detect the deepest crack location and, overall, the differences are too small to be significant, i.e. the variation between the models is of the same order of magnitude as the variation within each model. However, the Swin-U-net shows far fewer false calls than the U-net.

Looking at the different test data, it is noticeable that on the better-quality data (Circumferential+) the results of the two models are very similar, both in terms of sizing accuracy and false call pixel rate. The overall false pixel rate is very low, and false clusters are associated with unlabeled flaw-like features in the data, such as flaws in unfavorable orientation. However, the models seem to react differently to degrading data quality. The U-net retains sizing accuracy, while exhibiting an increased false call pixel rate (Fig. 8). The additional falsely called pixels tend to be associated with high-amplitude noise near the back wall. In this sense, the reaction can be considered “conservative” in that specificity is sacrificed for sensitivity. The Swin-U-net, in contrast, retains or even reduces the false call rate, at the expense of a slight reduction in sizing accuracy. The differences overall are small, and both models retain impressive performance despite decreasing data quality (Fig. 7).
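
The trade-off between sensitivity and specificity discussed above can be made concrete at the pixel level. The sketch below assumes that the false call pixel rate is the fraction of unlabeled pixels that the model marks as flaw, which is one plausible definition and not necessarily the one used for Table 5. The function name and the toy masks are hypothetical.

```python
import numpy as np

def pixel_rates(label, pred):
    """Pixel-level sensitivity, specificity and false call rate."""
    label = np.asarray(label, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = int(np.sum(label & pred))    # flaw pixels found
    fp = int(np.sum(~label & pred))   # false calls
    fn = int(np.sum(label & ~pred))   # flaw pixels missed
    tn = int(np.sum(~label & ~pred))  # background correctly left empty
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "false_call_rate": fp / (tn + fp) if tn + fp else nan,
    }

# Toy 4 x 4 frame (illustrative values only, not data from the study)
label = np.zeros((4, 4), dtype=bool)
label[0:4, 1] = True   # labeled crack, 4 pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:4, 1] = True    # 3 of the 4 labeled pixels are found
pred[3, 2:4] = True    # 2 false calls near the bottom row

rates = pixel_rates(label, pred)
print(rates)
```

With these definitions, the false call rate is one minus the specificity, so a model that adds false indications near the back wall lowers its specificity without necessarily changing its sensitivity.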

The focus of this study was to train deep learning models to faithfully segment the rather ill-defined flaw face signals to allow automated flaw sizing. The new “SATFM” reconstruction, which combines the existing TFM and SAFT techniques, proved valuable in obtaining reliable sizing data from the noisy signals. In addition, the reconstruction significantly reduces data size and enables reconstruction to a true 3D volume, which promotes easier data evaluation and visualization. Similar benefits were also seen when ML training and inference were done at the single TFM-frame level and the individual results were then combined similarly into a 3D volume (Fig. 10). The results show that both models tested can be successfully trained to segment noisy flaw signals so that the resolved highest tip signals correspond accurately to the labeled values, i.e. what a human inspector would have identified. Ground truth in this context refers to the labels. It should be noted that the identified flaw signals were not compared against true flaw depths obtained by destructive evaluation. In further research, a more detailed comparison and tuning to actual known flaw depths will be addressed. There was no geometric variation in the training data, and therefore the model is not expected to generalize to different target geometries. Additional tuning would be needed to overcome this.

4. Conclusions

In this paper, we implemented a novel “SATFM” data reconstruction method that is a straightforward extension of the existing TFM and SAFT techniques. The combined reconstruction improves data evaluation for complex data sets and, in particular, helps to correctly identify crack tip signals in noisy data. The reconstruction was further used to formulate the AI/ML task in a way that allows training flaw data segmentation for subsequent flaw sizing. Two models were trained and their results compared: a state-of-the-art transformer-based Swin-U-net model and, for comparison, a classical U-net model.

The following conclusions can be drawn from this study:

  • The proposed SATFM reconstruction provides stable reconstruction that supports data evaluation for flaw sizing;
  • The SATFM also enables the full 3D reconstruction of the target volume and a significant data size reduction;
  • Both the classic U-net and the transformer-based Swin-U-net yielded similar, qualitatively good results;
  • On high-quality data (i.e. with adequate contact) both of the models yield very similar results;
  • On low-quality data the U-net model displays lower specificity, while the Swin-U-net displays lower sensitivity;
  • All the labeled cracks were successfully detected with both models, and the sizing accuracy in the test data is on par with the labeling accuracy, with a slight tendency towards undersizing, in this proof-of-concept;
  • The results show the viability of ML for ultrasonic sizing and constitute a successful proof-of-concept.

CRediT authorship contribution statement

Gonçalo Sorger: Writing – review & editing, Writing – original draft, Visualization, Software, Methodology, Investigation. Iikka Virkkunen: Writing – review & editing, Writing – original draft, Visualization, Supervision, Software, Methodology, Conceptualization. Christer Söderholm: Software, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The flawed mock-ups for this study were provided by Fortum. The authors acknowledge the financial support of the Finnish State Nuclear Waste Management Fund (VYR).


