ABSTRACT
Undersea pipelines are susceptible to corrosion, which leads to resource loss and significant harm to the natural ecosystem. Hence, it is necessary to construct corrosion models for detection and maintenance. This review primarily examines the existing literature on data-driven models that use Machine Learning (ML) methods, particularly Artificial Neural Networks (NN’s), and also considers models based on other theories to provide references for corrosion modelling. It first analyses the main causes of corrosion and identifies the key factors contributing to this structural failure. The review then highlights the benefits of ML methods by outlining their composition and current applications. Furthermore, the article analyses corrosion modelling using other methods and examines the avenues for optimisation that these methods may offer to ML. Additionally, it considers the cost aspect and suggests potential methods for reducing costs. This review can serve as a valuable reference for researchers studying the modelling of corroded pipelines.
1. Introduction
Owing to the progress made in industry and technology, there is an ongoing substantial rise in the worldwide need for fossil energy. Oil and gas are expected to remain the predominant sources of global energy in the future, with around 50 % of the overall share [1]. Pipelines have shown their economic efficacy in the transportation of crude oil and natural gas across significant distances in the petroleum industry [2]. Submarine pipelines, which connect countries with abundant energy resources like Malaysia, Brunei, and Gulf countries, are widely recognised as the most practical and effective means of transporting large amounts of oil and gas to both inter-field and onshore areas [3]. Nevertheless, the use of pipelines for submarine transportation still presents many significant issues that need resolution.
Pipelines are often plagued by corrosion, a prevalent problem that may cause damage both outside and inside. This issue is closely tied to environmental concerns. The combination of water velocity, varying dissolved oxygen levels [4], a complex mixture of salts [4,5], and hydrostatic pressure in deep-sea environments [6] may lead to significant corrosion of submarine pipelines. Besides external environmental factors, internal factors such as the oxygen concentration, reactivity, temperature, flow rate, and pressure of the liquids and gases being transported, as well as the use of different metals and their material structures in the piping system, can also contribute to corrosion [7]. Furthermore, the choice of maintenance plan and the regularity of maintenance for maritime pipes affect the occurrence of corrosion. If the corrosion problem is not adequately addressed, it gives rise to several complications, including degradation, leaks, and failures of metallic pipes. Additionally, it may have adverse economic consequences and negatively affect both the environment and human beings [8–10]. In November 2013, a corrosion-induced oil pipeline leak in Qingdao led to an explosion due to incorrect emergency maintenance procedures, causing 62 fatalities [11]. In Mexico, pipeline explosion accidents have caused a minimum of 124 deaths and numerous injuries [12]. Corrosion has also been identified as the primary cause of pipeline deterioration in Canada [13]. Fig. 1 displays the detailed categorization of accidents that have occurred in offshore steel pipelines [14].
To address the corrosion issues, it is crucial to identify and maintain the operational integrity of pipelines. Nevertheless, when it comes to offshore pipelines, detecting both internal and external issues is challenging in the maritime setting. Therefore, in addition to using advanced tools for detection, developing a predictive model, such as an algorithm to forecast the corrosion rate of the pipeline, is a more effective approach [15].
Collecting corrosion data is an essential first step in developing a prediction model. The data are often incomplete, erroneous, heterogeneous, and large in volume (with a low concentration of valuable information) [16], which makes the system highly complex. Applying traditional methods to datasets that lack integrity may produce inaccurate findings [17] and, in turn, unreliable estimates of the corrosion rate.
Hence, it is essential to develop novel approaches to address the intricate issue, which should include a minimum of three distinct characteristics:
- The ability to extract certain characteristics effectively from datasets with defects;
- A high computational capacity;
- Independence from the physical representation of the datasets.
Fig. 1. Offshore steel pipeline accident failure.
Given these three characteristics, this research opts for Machine Learning (ML) as the solution to address these intricate problems. ML allows for the identification and extraction of intricate and often concealed relationships, insights, patterns, rules, and advice from provided information, which are then represented in mathematical form [18]. (The ML techniques discussed in this article specifically pertain to supervised learning.) Extensive research has shown that ML is quite effective in predicting corrosion [19–21]. Qinying Wang et al. [19] developed an ML database for predicting corrosion in oil and gas pipelines. This database helps in making informed decisions when selecting models. Mazzella et al. [20] used ML techniques to forecast the rate of corrosion. Their method estimates the growth rates of corrosion for an underground asset for any given latitude and longitude combination in North America. Ossai, C.I. [21] used a data-driven ML method to assess the growth of corrosion defect depth. Their study focused on X52 grade pipeline.
Despite the numerous benefits and extensive use of ML, it is important to acknowledge its limitations. One limitation is ML’s heavy reliance on data: the quality of the data directly affects the accuracy of ML predictions. Therefore, it is crucial to preprocess the acquired data appropriately to enhance its quality before using it for modelling. Furthermore, because ML models are derived from data rather than from physical theory, the resulting models exhibit limited interpretability. Consequently, although this study primarily focuses on analysing corrosion models built by ML, other corrosion models, including those based on statistics and solid mechanics, are also analysed to expand the scope and interpretability of corrosion modelling for maritime pipelines.
In summary, this manuscript provides an analysis of the importance of predicting corrosion in submarine pipelines through an intensive literature review. It examines the causes and characteristics of corrosion in these pipelines, with a primary focus on the methodology, steps, and outcomes of corrosion prediction modelling using ML. Additionally, it explores corrosion modelling based on other technological theories and presents a summary of the findings. Furthermore, this study enumerates some key aspects that serve as a valuable resource for future research and resolution of issues about submarine pipeline corrosion.
The subsequent sections of this article are as follows: Section 2 provides an overview of the sources and classifications of the references, thereby leading to relevant research questions. Section 3 introduces five types of corrosion in oil and gas pipelines, along with the theoretical foundation of corrosion modelling, including the definition of corrosion damage and corrosion rate, as well as regular maintenance. Sections 4 and 5 provide a brief introduction to ML (Supervised Learning) and NN models, explaining the steps used in analysing the results. Section 6 presents and discusses results from models based on multiple theories pertaining to alternative corrosion modelling. Sections 7 and 8 cover the outcomes derived from the aforementioned models, which are examined and combined in accordance with the proposed review. In addition, Section 8 provides recommendations and insights into the future modelling of maritime-corroded pipes.
2. Screening of literature
Based on the analysis in the introduction, this section screens the literature to determine the approaches most pertinent to the research problem. We first used the search engine ‘Web of Science’, which encompasses a diverse array of papers, to extract bibliometric datasets through targeted keyword searches. These datasets were then combined with ‘VOSviewer’, a software application specifically created for generating and visualising bibliometric networks, to examine the links among the topic terms in the papers. After analysing the results obtained via ‘VOSviewer’, we carried out a further investigation using several search engines, including ‘Google Scholar’, ‘IET Inspec’ and ‘Science Direct’, to identify pertinent articles that meet our predetermined criteria.
2.1. Reviewing for deep sea pipeline corrosion
To investigate the study subject of ‘deep-sea pipeline corrosion’ in recent papers, we first search using the keywords ‘pipeline’, ‘corrosion’, and ‘sea’ combined with the logical operator ‘AND’. This search returns a total of 94 publications spanning from 2020 to 2023. Fig. 2 displays the bibliometric network of publications and their citation linkages. The weight of each word corresponds to the number of occurrences, while the colour label indicates the average number of citations.
The result shows that the primary determinants of underwater pipeline corrosion are the central subject of these articles, which motivates a closer examination of the characteristics of corrosion. Furthermore, the undersea environment also influences pipeline corrosion, as shown by the weight and relevance of words such as ‘seawater’ and ‘hydrostatic pressure’.
2.2. Reviewing for pipeline corrosion and ML
To analyse the relevance of pipeline corrosion and ML, the second search utilises the keywords ’Pipeline corrosion’ and ’machine learning’. There are a total of 112 papers indexed in the Web of Science database during the years 2020 to 2023. Fig. 3 displays the outcomes of VOSviewer.
The outcome supports the analysis presented in the introduction, indicating that the model is the central concept in papers on corrosion analysis. In addition, Artificial Neural Networks (NN’s) feature prominently and are associated with high citation counts. This may suggest that NN’s are regarded as effective in addressing corrosion problems. Furthermore, the figure indicates that researchers were primarily focused on the accuracy of corrosion prediction during this period.
Fig. 2. The bibliometric network of ’pipeline’, ’corrosion’, and ’sea’ publications and citation linkages.
Fig. 3. The bibliometric network of ’Pipeline corrosion’, and ’Machine learning’ publications and citation linkages.
2.3. Reviewing for corrosion models based on other methods or principles
Although this manuscript focuses primarily on the application of ML to marine pipelines, Section 1 also discusses various factors influencing pipeline corrosion. During this review, we identified several studies that model corrosion based on other theories, such as statistics and solid mechanics, and that address the effects of internal, external, or operational factors on pipelines. Hence, this study also reviews this literature, since these approaches can, on the one hand, enrich the architecture of corrosion modelling and, on the other hand, generate data that can serve as a source for data-driven modelling.
2.4. Post reviewing selection criteria
Based on the above analysis and the information presented in Figs. 2 and 3, the scope of the papers to be discussed in this study is defined. As an integral component of this study, the literature should encompass an analysis of the various types of corrosion occurring in submarine pipelines, including an examination of the corrosion features or underlying reasons. Additionally, the influence of the deep-sea environment on the pipeline, the introduction of corrosion models, the collection of corrosion datasets, and the implementation of ML methods (especially NN’s) and other techniques or theories for corrosion research should be included, together with the analysis of prediction results. To find relevant publications, this work primarily uses the search engines ‘Google Scholar’, ‘IET Inspec’ and ‘Science Direct’. We also extract citations from the reference lists of these articles. Table 1 shows the classification of a portion of the references.
2.5. Research questions
Based on these search results, there is a distinct understanding of the study on deep-sea pipeline corrosion, which emphasises the path for future research. Therefore, there are a minimum of five problems that must be resolved based on the former search.
- What is the cause or characteristic of submarine pipeline corrosion?
- Why are NN’s so popular for addressing corrosion issues?
- How can we ensure the precision of corrosion prediction?
- How can we deal with ML models that offer limited interpretability and are sensitive to the quality of the data?
- What are the advantages of corrosion modelling based on other theories, and how can they successfully feed data-driven models?
The subsequent study will aim to address these issues by investigating the above-mentioned research questions.
3. Features of pipeline corrosion
Corrosion is a significant factor leading to pipeline failures in the oil and gas production sector. It is responsible for between 25 % and 66 % of the total downtime experienced in the industry [22–24]. Therefore, in order to address this significant issue, it is essential to examine the causes of corrosion in deep-sea pipes. Jiang, X., et al. [4] demonstrated that the environment has a significant impact on deep-sea pipeline corrosion. This impact may be categorized into external variables, which include chemical effects such as the pH of the water and physical effects such as hydrostatic pressure, and internal variables, such as internal surface corrosion due to CO2 and H2S, which may also have a significant influence. Furthermore, whether maritime pipes are consistently maintained influences whether or not they corrode. To better extract corrosion features and provide theoretical support for subsequent modelling, this section first draws on the literature to outline the common forms of corrosion in oil and gas pipelines, including uniform corrosion, pitting corrosion, cavitation corrosion, stray current corrosion, and microbiologically-influenced corrosion [25]. The methodology for extracting corrosion features, the definitions of corrosion damage and corrosion rate, and the methodology for periodic maintenance of marine pipelines are then given. These concepts serve as the foundation for the subsequent discussion on corrosion modelling.
3.1. Uniform corrosion
Uniform corrosion can be defined as the electrical coupling of two infinitely close reversible electrodes [26]. Two theoretical basic stages of the uniform corrosion mechanism are [27]:
- In the first stage, the main surface of the metal is corroded by chemical solutions.
- In the second stage, corrosion nuclei spread uniformly across the metal surface, acting as a corrosive engine. Significant quantities of gas are generated by the corrosion process in highly concentrated acidic solutions.
Overall, uniform corrosion is often regarded as the predominant kind of corrosion and is primarily accountable for the majority of material degradation [28].
3.2. Pitting corrosion
Pitting corrosion is a highly destructive kind of corrosion that specifically affects structural materials such as stainless steel in chloride-containing environments; it causes localized damage and may have severe consequences [29]. Because it is localized, it is difficult to detect, yet it has a significant impact on structural integrity [30]. This kind of corrosion may arise from several factors, including but not limited to the following [31]:
- Flaws in the substance of the pipe or flaws on its surface;
- Physical harm to the protective layer that prevents corrosion;
- Infiltration by a highly reactive chemical substance, such as chlorides;
- Inappropriate choice of materials.
3.3. Cavitation corrosion
Cavitation-corrosion refers to the simultaneous degradation of a surface caused by the creation and subsequent collapse of bubbles in a liquid, together with corrosion (Cavitation-corrosion is synonymous with cavitation erosion) [32]. This corrosion often occurs in the flow passage component. Hence, cavitation corrosion poses a significant risk to the safety management of flowing systems [33].
3.4. Stray current corrosion
Stray-current corrosion refers to a more rapid kind of corrosion that occurs due to the presence of an electric current that is externally produced. Corrosion may happen in pipes that are not protected and in metal structures that are submerged and positioned close to electric power sources or any place where there are changes in voltage [34].
3.5. Microbiologically-influenced corrosion
Microbiologically Influenced Corrosion (MIC) is the phenomenon where microorganisms affect the rate at which metals and non-metallic materials corrode by attaching themselves to the surfaces [35]. Protecting against microbial corrosion in marine settings is crucial because of the wide range of marine microorganism species and the potential for surface degradation caused by organisms like corals and barnacles.
3.6. Feature extraction
Feature extraction may be categorized into two types: manual feature extraction methods and intelligent detection systems. Manual methods are neither precise nor efficient when dealing with unfamiliar working settings or application situations, owing to their reliance on extensive labour, the associated high expenses, and the uncertainty of the final results [36]. To enhance the accuracy of detection, advanced detection tools, specifically In-Line Inspection (ILI), have been employed in pipeline inspection. ILI offers an efficient method to inspect extensive pipeline lengths within relatively short periods [37]. Additionally, Non-Destructive Testing (NDT) techniques such as Visual Inspection (VI), Onsite Metallography (OM), Liquid Penetrant Testing (PT), Magnetic Particle Inspection (MPI), and Eddy Current (EC) are used to evaluate the locations identified by ILI [38].
3.7. Corrosion damage and rate definition
By incorporating the concept of corrosion into performance analysis, it is possible to establish equations that accurately explain the corrosion process. These equations may then be used as digital inputs in various corrosion models. The equation for corrosion damage may be separated into two components [39].
In the case of uniform corrosion, the extent of corrosion may be quantified as the thickness of the corroded metal layer a, which is influenced by internal independent variables INT, external independent variables EXT, and time t. The internal variables consist of the characteristics of the material structure, whereas the external variables include the ambient elements. Once the thickness exceeds a certain threshold, the structure becomes susceptible to damage. Here is the mathematical expression:
a = a(t, INT, EXT) (1)
For localized corrosion, the damage is determined by whether the deepest corrosion depth a_max surpasses a critical thickness. The equation takes the form:
a_max = a_max(t, INT, EXT) (2)
Corrosion rate is an essential index for building a prediction model in marine areas because it is regarded as an output for the corrosion features. Corrosion rate can be defined as the speed at which metals degrade in a specific environment. The equation is [40]:
In this formula, I is the electrical current, F is Faraday’s constant (96 500 C/mol), ΔW is the weight loss due to corrosion, W is the molecular weight of the metal and Z is its valence.
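The expression referred to above is not reproduced in this text. A form consistent with the symbols defined there, assuming that the intended relation is Faraday's law of electrolysis and introducing t (a symbol not defined in the source) for the exposure time, would be:

```latex
% Assumed form based on Faraday's law; t (exposure time) is an added symbol
\frac{\Delta W}{t} = \frac{I\,W}{Z\,F}
```

Under this assumption, the mass-loss rate is proportional to the corrosion current I, which is why electrochemical current measurements can serve as a proxy for the corrosion rate.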
To summarise, these equations indicate that corrosion, particularly underwater corrosion, is intricate and attributed to many types of sources.
3.8. Periodic maintenance
Periodic maintenance of maritime pipes is essential for successfully minimizing the effects of corrosion. According to DNV-OS-F101 [41], maintenance is an essential component of subsea pipeline integrity management [42]. The maintenance of subsea pipelines can be categorized into two methods based on their operational position: above-water and underwater. The above-water method involves repairing the pipeline by lifting the damaged segment to the support vessel, while the underwater method involves completing the repair on the seafloor [43]. Regardless of whether the above-water or underwater method is used, substantial personnel, material, and time expenses are incurred, and regular inspections and maintenance further contribute to these costs. In addition, it is necessary to consider transportation expenses, since the majority of maritime pipelines are situated in remote locations.
In order to minimize expenses associated with pipeline maintenance, it is crucial to select an appropriate corrosion prediction method. This will enable us to anticipate the frequency of regular maintenance and reduce unnecessary costs related to manpower, materials, time, and transportation. Additionally, by predicting corrosion from various perspectives such as environmental changes or the impact on the material’s internal structure, we can determine the maintenance strategy that should be implemented. This will be further discussed in the following sub-section.
3.9. Corrosion model
As stated in the introduction and Sub-Section 3.8, developing a corrosion model might compensate for the challenges encountered during real-world maritime pipeline maintenance. The extent of compensation is contingent upon the quality of the model design. According to their prediction principles, prediction models include physical models, statistical models, intelligent models based on ML, and so on. References [44–47] illustrate the types of models used to address corrosion-related issues. Yihuan Wang et al. [45] developed a probabilistic physical model to analyse the reliability of corroded pipelines. This model was shown to be an effective and practical representation of naturally corroded surfaces. Mustafa M. Cobanoglu et al. [48] constructed a statistical model to study the development of corrosion failures. This model addresses the failure rate and reliability. Both of these examples demonstrate the efficacy of constructing corrosion models. Table 2 shows some types of models applied to problems related to corrosion.
Traditional mathematical statistics approaches are capable of efficiently establishing the mapping between the corrosion degree and corrosion variables [49]. Nevertheless, these models remain intricate and dependent on the fundamental theories. Furthermore, with the advancement of information technology, big data has emerged as a crucial component within the oil and gas sectors. Data mining methods and big data analytics are being used to extract significant insights from various datasets [50]. Therefore, ML based on a data-driven model may provide several benefits, which will be discussed in Section 4 and Section 5. As previously mentioned in Section 1, it is important to include non-data-driven models alongside data-driven models due to their higher interpretability. In this regard, Section 6 will examine these non-data-driven models in relation to the current topic.
4. Supervised learning
Machine Learning (ML), introduced by A. L. Samuel [51] in 1959, has found extensive applications in several fields such as computer vision, materials, general game playing, data mining, and bioinformatics [52–57]. As Artificial Intelligence and ML mature, significant progress is being made both by mainstream Artificial Intelligence researchers and by professionals from other domains who use these technologies to achieve their goals [57].
There are three primary categories of ML: Supervised learning, Unsupervised learning, and Reinforcement learning. Supervised learning is a valuable method for forecasting corrosion rates and classifying the extent of corrosion. It is particularly helpful in addressing regression and classification challenges. Hence, using supervised learning for further inquiry is an appropriate choice. According to [16], at least 35 articles explicitly employed supervised learning for corrosion prediction from 2017 to 2020.
4.1. Supervised learning methods
Supervised learning is a subset of ML that relies on labelled datasets to train algorithms for accurate data classification or result prediction. Supervised approaches find use in many fields, like marketing, finance, manufacturing, testing, stock market prediction, and others [58]. Supervised learning may be categorized into two main types: classification and regression. For instance, constructing a corrosion degree model involves utilizing classification learning to forecast categorical class labels for future occurrences based on prior information. These labels are discrete (Fig. 4 shows the classification learning model). Constructing a corrosion rate prediction model involves doing a regression analysis, where the result is a continuous response variable.
Currently, there exist several kinds of supervised learning techniques. This work provides an overview of commonly used approaches, namely Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbours (KNN), and Naive Bayes Classifier (NBC), including their characteristics, benefits, and drawbacks, which are shown in Table 3. These approaches have been used to investigate corrosion issues (the KNN method is covered in Table 5). Nurul Rawaida Ain Burhani et al. [60] employed an LR method to identify the key factors influencing deterioration due to corrosion under insulation. This study provides valuable insights for inspection planning and prioritising maintenance schedules. In another study, Ya-jun Lv et al. [61] utilised particle swarm optimisation SVM (PSO-SVM) and grid search SVM (GS-SVM) to accurately calculate the 3D coordinate data of rebar corrosion. These methods effectively predict the sectional corrosion rate of steel. Regarding DT, Bill Gu et al. [62] proposed a DT approach to identify the relationship between the occurrence of Stress Corrosion Cracking (SCC) and environmental and loading conditions, as well as to enhance the understanding of SCC susceptibility from a mechanical perspective. As for the NBC method, Beian Li et al. [63] employed the naive Bayesian model to classify the degree of steel bar corrosion in concrete structures. The evaluation method was found to be highly reliable.
This article introduces two commonly used ML methods: LR and SVM. Both techniques require several equations for their derivation. The Logistic Regression (LR) method has been widely used in several domains such as medicine, business, marketing research, and decision-making processes, covering both complex situations and simple binary choices [71–74]. SVM has been used to tackle several corrosion issues [61,75,76]. The concepts of LR and SVM are readily understandable.
Table 2
Types of Models Applied to Problems Related to Corrosion [46].
4.1.1. LR algorithms
LR, despite its name, is primarily a classification model that is often used for binary classification tasks. LR uses the log odds ratio instead of probabilities and uses an iterative maximum likelihood technique instead of the least squares approach to construct the final model [64]. The log odds may be defined as:
where p represents the probability of the forecast event occurring. Therefore, to forecast an event described by several predictor variables (features) x, we need to consider the weight w, the intercept b and the binary variable (label) y = 1 (which means that the event occurred). The log odds can be described as:
By using Equation (4), one may derive a logistic regression model:
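The three expressions referred to in this subsection are not reproduced in this text. In the standard formulation of logistic regression, which is assumed here to match what the cited source intends, they are usually written as follows (equation numbers are omitted because the original numbering cannot be recovered):

```latex
% Log odds (logit) of a probability p
\operatorname{logit}(p) = \ln\frac{p}{1-p}

% Log odds of the event y = 1 as a linear function of the features x
\ln\frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)} = w^{T}x + b

% Solving the line above for the probability gives the logistic regression model
P(y=1 \mid x) = \frac{1}{1 + e^{-(w^{T}x + b)}}
```

The last line follows from the second by exponentiating both sides and rearranging for P(y = 1 | x).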
4.1.2. SVM algorithms
The SVM technique is a classification approach that identifies a hyperplane or function S(x) = w^T x + b to accurately distinguish two classes (y = 1, y = −1) with a maximum margin [66]. Linear SVM may be classified into two basic categories, namely Hard-margin and Soft-margin, based on whether the data is linearly separable or not. The optimisation problem for Hard-margin may be defined as follows:
where 1 ⩽ i ⩽ n, and n is the number of examples in the dataset.
In the case of Soft-margin, an additional positive slack variable is included to optimise the classification process. The problem may be stated as:
The regularisation parameter, denoted as C > 0, is given. This parameter achieves a compromise between two conflicting criteria: maximising the margin and minimising the error. The slack variables ζi represent the distance of the misclassified points from the ideal hyperplane [77].
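The two optimisation problems referred to above are not reproduced in this text. In the standard textbook formulation, assumed here to correspond to the cited sources and using the symbols w, b, C and ζi already introduced, they read:

```latex
% Hard margin: every example must lie on the correct side with margin at least 1
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2}
\quad \text{subject to} \quad y_i\,(w^{T}x_i + b) \ge 1, \qquad 1 \le i \le n

% Soft margin: slack variables \zeta_i relax the constraint, penalised by C
\min_{w,\,b,\,\zeta}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\zeta_i
\quad \text{subject to} \quad y_i\,(w^{T}x_i + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0
```

A large C penalises violations heavily and approaches the hard-margin case, whereas a small C tolerates more violations in exchange for a wider margin.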
Fig. 4. Supervised (classification) learning model [59].
To address a nonlinear issue, using a kernel may convert the training data into a feature space of larger dimensions, enabling linear separability of the data [64].
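The effect of mapping data into a higher-dimensional feature space can be illustrated with a minimal sketch. It uses an explicit feature map phi(x) = (x, x^2) rather than a kernel function (a kernel would compute the corresponding inner products without forming phi explicitly). The data points, the map, and the hand-picked hyperplane are all assumptions made for illustration and do not come from the reviewed studies.

```python
def feature_map(x):
    """Explicit (hypothetical) feature map phi(x) = (x, x^2), lifting 1-D inputs to 2-D."""
    return (x, x * x)

# Made-up 1-D points: class +1 lies on both sides of class -1,
# so no single threshold on x can separate the two classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

# In the lifted space, the hyperplane w . phi(x) + b = 0 with w = (0, 1), b = -1
# (chosen by hand, not learned) separates the classes, since it tests x^2 > 1.
w, b = (0.0, 1.0), -1.0

def decision(x):
    phi = feature_map(x)
    return w[0] * phi[0] + w[1] * phi[1] + b

predictions = [1 if decision(x) > 0 else -1 for x in xs]
```

In practice the hyperplane would be learned by the SVM optimisation rather than chosen by hand; the sketch only shows why the lifted data become linearly separable.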
Furthermore, there exists a significant extension of SVM known as Support Vector Regression (SVR). The key distinction from SVM is that SVR does not separate sample points into two classes; instead, it treats all sample points as a single group and fits a function to them, which enables it to address a wide range of regression problems.
When considering the supervised learning methodologies, Table 3, basic equation derivations, and their applications, it is clear that they have substantial scientific and practical significance in the domain of deep-sea pipeline corrosion. Hence, possessing an understanding of the implementation of these strategies is crucial for tackling these issues.
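As one illustration of implementation, the LR model of Section 4.1.1 can be fitted with an iterative maximum likelihood procedure. The sketch below uses plain gradient ascent on the log-likelihood, which is one possible iterative scheme and not necessarily the one used in the cited works. The function names, learning rate, and the tiny dataset are hypothetical and invented for illustration; the feature values are not measurements.

```python
import math

def sigmoid(z):
    """Logistic function: converts a log-odds value z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by gradient ascent on the mean log-likelihood."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = y - p  # derivative of the log-likelihood w.r.t. the log odds
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj + lr * g / len(xs) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(xs)
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that the label is 1 for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# SYNTHETIC data (assumption): two scaled features per sample,
# label 1 = "corroded", label 0 = "not corroded".
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
labels = [1 if predict_proba(x, w, b) >= 0.5 else 0 for x in X]
```

Thresholding the predicted probability at 0.5 turns the model into a binary classifier; a different threshold may be preferable when missed corrosion is costlier than a false alarm.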
4.2. Utilisation of supervised learning
In the context of pipeline corrosion, using supervised learning requires a series of five steps. Fig. 5 shows the workflow of supervised learning methods.
- The first stage involves gathering the dataset and then partitioning it into at least a training set and a testing set. The dataset might consist of digital data, photographs, or other valuable information;
- The second stage involves data preprocessing and feature extraction, which will be further discussed in the next section. This stage aims to enhance the data and optimise the distinguishability of the original data in the feature space;
- The third phase entails constructing an appropriate prediction model to get the output that corresponds to the input. The input datasets for marine pipeline research may consist of the material performance index, pH, and temperature. The output variables might include corrosion rate, corrosion degree, and residual strength;
- The fourth phase involves training and testing. This step entails selecting the appropriate supervised technique and inputting the prepared feature data into the model. The model is then validated and assessed repeatedly until the training and testing errors fall below an acceptable threshold;
- Lastly, execute the process of deploying the model to the desired application and ensure that it attains the intended outcomes.
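The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the data are synthetic (an invented linear link between temperature and corrosion rate plus noise), the model is a one-feature least-squares regression chosen for brevity, and all names such as `predict_rate` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Step 1: gather the dataset and split it into training and testing sets.
# SYNTHETIC data (assumption): corrosion rate in arbitrary units rises linearly
# with temperature in deg C, plus Gaussian noise. These are not measured values.
temps = [random.uniform(5.0, 60.0) for _ in range(200)]
rates = [0.02 * t + 0.1 + random.gauss(0.0, 0.05) for t in temps]
data = list(zip(temps, rates))
random.shuffle(data)
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# Step 2: preprocessing; standardise the feature using training-set statistics only.
mean_t = sum(t for t, _ in train) / len(train)
std_t = (sum((t - mean_t) ** 2 for t, _ in train) / len(train)) ** 0.5

def scale(t):
    return (t - mean_t) / std_t

# Step 3: construct the prediction model (one-feature least-squares regression).
def fit(pairs):
    xs = [scale(t) for t, _ in pairs]
    ys = [r for _, r in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def rmse(pairs, slope, intercept):
    """Root-mean-square error of the fitted line on the given (temperature, rate) pairs."""
    errors = [(slope * scale(t) + intercept - r) ** 2 for t, r in pairs]
    return (sum(errors) / len(errors)) ** 0.5

# Step 4: train on the training set, then assess the error on both sets.
slope, intercept = fit(train)
train_rmse = rmse(train, slope, intercept)
test_rmse = rmse(test, slope, intercept)

# Step 5: deploy; wrap the fitted model so that it can be applied to new inputs.
def predict_rate(temperature_c):
    return slope * scale(temperature_c) + intercept
```

Comparing `train_rmse` with `test_rmse` gives a first indication of overfitting: a test error much larger than the training error suggests that the model does not generalise to unseen data.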
While classic supervised ML algorithms offer numerous advantages, they also have drawbacks that need to be addressed. According to Table 3, the primary disadvantage is their heavy reliance on the dataset. When the data are large in volume, contain noise and imperfections, and exhibit correlations among the features, these algorithms can overfit and lose accuracy. In addition, these approaches may show a significant generalisation error when applied to novel data.
To effectively address the problem of corrosion, especially when it is challenging to get a clean dataset, it is imperative to develop a system that can accommodate insufficient input and provide satisfactory output. Therefore, the NN is used to address these problems, and it has the following benefits [78]:
- Storing information across the whole network: Unlike in traditional programming, information is kept across the network rather than in a database, so a few missing data points do not stop the network from functioning.
- Capacity to operate with incomplete information: After training, an NN may still produce output from partial data. The performance loss depends on the significance of the missing information.
- Possessing fault tolerance: One or more NN cells may be corrupted without preventing the network from producing output. This feature makes networks fault-tolerant.
- Possessing a distributed memory: For an NN to learn, examples must be selected and shown to the network together with the desired output. The network’s success depends on the chosen examples, and if the event cannot be shown in all its facets, the network may provide erroneous output.
- Gradual degradation: A network slows and degrades over time rather than failing abruptly.
- Capacity for machine learning: NN’s learn from events and make decisions by drawing on comparable events.
- Ability to do parallel processing: NN’s have the numerical strength to perform more than one task at the same time.
Table 3
A concise overview of many prevalent supervised learning methods.
Given these characteristics, although NN’s still depend on the dataset, they can be expected to exhibit superior performance and to be widely used.
4.3. A summary of NN’s
Artificial Neural Networks, simply called Neural Networks (NN’s), are a kind of ML algorithm. Since their introduction in the 1960s, NN’s have continually shown their effectiveness as a reliable framework for modelling nonlinear systems and have been widely used in several engineering applications [79].
An NN functions in a manner akin to the human brain: it acquires information through a learning process and stores this knowledge by establishing connections between neurons (referred to as perceptrons in the case of a single-layer neural network) with different synaptic weights [80]. Neurons receive a digital input of electrical impulses. If the cumulative signal is above a certain threshold, the perceptron lets the signal proceed; otherwise, it does not. A mathematical model may be used to depict the process [81]:
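A common thresholded form consistent with the symbols defined next, stated here as an assumption rather than as the exact expression given in [81], is:

```latex
N = f\left( \sum_{i} w_i x_i - T \right)
```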
In this formula, x_i refers to the input, w_i refers to the weight, f is the transfer function, N is the output and T is the threshold value.
Neurons may receive either stimulating or suppressing inputs, and the connectivity pattern between them plays a crucial role in the functioning of NN’s [82]. A fundamental NN should include the following components [83]: an input layer x, an arbitrary number of hidden layers, an output layer ŷ, a set of weights w and biases b connecting every layer, and a choice of activation function σ for every hidden layer. Fig. 6 shows a two-hidden-layer NN for corrosion rate prediction, and the output of a two-layer neural network is as follows [83]:
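With the components listed above, the output of a network with one hidden layer and one output layer is commonly written as below. The subscripted names W_1, b_1, W_2 and b_2 are notational assumptions introduced here to distinguish the two layers; they are not taken from the source.

```latex
\hat{y} = \sigma\left( W_2 \, \sigma\left( W_1 x + b_1 \right) + b_2 \right)
```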
It is important to note that, for an NN model, the selection of the number of hidden layers and the choice of activation function in each layer may significantly influence the outcome. Regarding the hidden layers, their number is highly dependent on the size of the dataset. As for the activation functions, the performance of various functions can be seen in Fig. 7, which highlights the differences.
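To make the role of the activation function concrete, the following minimal sketch passes one input through a small network twice, changing only the hidden-layer activation. The weights, biases and input values are hand-picked placeholders, and the helper names (`dense`, `forward`) are hypothetical; a real model would learn its parameters from data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . inputs + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass x through a list of (weights, biases, activation) layers in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical parameters: 3 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0]]
b2 = [0.05]

# Illustrative scaled inputs, e.g. pH, temperature and flow rate.
x = [1.0, 2.0, 3.0]

y_relu = forward(x, [(W1, b1, relu), (W2, b2, identity)])
y_sigmoid = forward(x, [(W1, b1, sigmoid), (W2, b2, identity)])
```

Because `forward` accepts any number of layers, adding a hidden layer only means appending another `(weights, biases, activation)` tuple, which is how the depth of the network could be varied alongside the activation choice.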
In summary, the primary benefit of NN’s is that they do not solve problems in a purely mathematical manner; rather, they exhibit information-processing traits that provide an approximate solution to a particular problem [84]. This saves time that would otherwise be spent on unproductive analysis. Currently, NN’s have found applications in several domains of materials research, including nanomaterial fabrication, quantum computing, and material property analysis [85–87]. In addition, NN’s can be extended based on several principles to adapt them for varied purposes. For example, Convolutional Neural Networks (CNN’s) are often used for tasks related to images and videos, whilst Recurrent Neural Networks (RNN’s) are utilized for problems that involve sequences or time. Hence, NN’s may be regarded as suitable for modelling the corrosion of maritime pipelines owing to their robust applications.
In the next section, the approach to ML implementation for corrosion analysis is discussed on the basis of existing literature, together with a list of steps for building predictive models using the methods discussed in previous sections.
5. Predictive corrosion modelling using ML
Several studies in the literature have shown the advantages of ML, using different ML techniques to construct corrosion prediction models. This section will provide a concise overview of the stages involved in constructing a data-driven model and analyse the strengths and weaknesses of the existing literature on model construction.
5.1. Corrosion data collection and description
Corrosion data are the basis for further discussion. The efficacy of the modelling is contingent upon the quality of the data. Table 4 provides a collection of corrosion datasets together with their accompanying descriptions and methodology for reference purposes.
Based on the aforementioned list, determining the characteristics of a good dataset is not a challenging task. A good dataset should possess a minimum of two properties.
- The dataset should have a large depth and breadth to provide sufficient space for training the model.
- The dataset should include a wide range of attributes related to the objects to facilitate model adaptation when encountering unknown data.
Fig. 5. The workflow of supervised learning methods.
5.2. Data preprocessing
Data preprocessing is crucial for ML since the data’s quality and the existence of substantial information directly impact the model’s learning process. However, very few works have focused on the effects of data preprocessing techniques [65]. The majority of models cannot handle missing data. Furthermore, datasets present many problems, including outliers, high dimensionality, and noisy data, which need treatment to improve completeness and correctness. In addition, some selected features may be strongly correlated and overlap.
Therefore, it is crucial to analyse and preprocess a dataset before feeding it into a learning system.
The process of data preparation involves the following steps:
- Data cleaning: This step locates and fixes errors or discrepancies in the data, including identifying, removing and imputing missing values. There is a quadratic dependency between the prediction accuracy and the percentage of missing values [93].
- Feature selection and reduction: This method refers to the process of creating new features that improve discrimination and removing duplicate or unnecessary features to reduce the dimensionality of the dataset. Eliminating unnecessary or duplicative features may enhance the precision of the model and save runtime while maintaining the integrity of feature properties. Most researchers [21,94,95] favour feature selection when using ML and more sophisticated techniques have been devised [96].
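The two preparation steps above can be illustrated with a small sketch. It assumes mean imputation for missing values and a simple Pearson-correlation filter for duplicative features; both are one choice among many and are not claimed to be the methods of the cited works. The measurements, the 0.95 threshold and the function names are hypothetical.

```python
import math


def impute_mean(column):
    """Data cleaning: replace None entries with the mean of observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(features, threshold=0.95):
    """Feature reduction: keep a feature only if its absolute correlation
    with every feature kept so far is below the threshold."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept


# Hypothetical measurements; None marks a missing reading.
raw = {
    "temperature_C": [10.0, 20.0, None, 40.0, 50.0],
    "temperature_K": [283.15, 293.15, 303.15, 313.15, 323.15],
    "pH": [7.9, 8.1, 7.6, 8.3, 7.8],
}

cleaned = {name: impute_mean(col) for name, col in raw.items()}
selected = drop_correlated(cleaned)
```

Here the Kelvin column duplicates the Celsius column, so the filter discards it while keeping pH, which reduces the dimensionality of the dataset without losing information.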
Fig. 6. A basic two-hidden-layer NN for corrosion rate prediction.
5.3. Analysis and discussion
Once the dataset has been cleansed, we can use ML techniques to analyse the data using the procedures outlined in Section 4.2. Table 5 provides a review of several literature sources that have utilized ML analysis on corrosion datasets, including their outcomes and limitations.
It is evident from the aforementioned list that ML is very proficient in resolving corrosion issues. Consequently, it has the capability to be extensively used for the detection of corrosion in subsea pipes. Corrosion rate [75,97,99] and failure pressure [101,102] are often used as output variables in ML models. For NN’s, the predicted outcomes can be validated by assessing the coefficient of determination (R²), which quantifies how closely the predicted results match the reference data [101,102].
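As a brief illustration, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The values below are hypothetical and only show the arithmetic.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical failure pressures (MPa): reference values and predictions.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]
score = r_squared(observed, predicted)
```

A value close to 1 means the predictions explain most of the variance in the reference values; a model that always predicts the mean scores 0.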
Regarding [75] and [101], it is evident that the integration of ML with various algorithms may effectively address corrosion issues. Specifically, [75] conducted a comparative analysis of single, ensemble, and hybrid models to evaluate their performance. The ensemble models were built using several approaches such as NN’s, SVR/SVMs, CART, and linear regression models. The hybrid metaheuristic regression model (SFA-LSSVR) combines a Smart Firefly Algorithm (SFA) with the Least Squares Support Vector Regression (LSSVR) method. The models were compared using metrics such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Synthesis Index (SI). The findings indicate that the tiering ensemble outperforms both the individual models and alternative ensemble methods, while the SFA-LSSVR model achieved the lowest MAPE values for forecasting corrosion rate and pitting risk, at 1.26 % and 5.60 % respectively. In addition, the SFA-LSSVR model has shown its efficacy in dynamically optimising the hyperparameters of the LSSVR via the SFA, which enhances the precision of estimating the likelihood of pitting and the rate of corrosion.
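Three of the comparison metrics named above have standard definitions and can be sketched directly. The corrosion rates and the two sets of model predictions below are invented for illustration and are not results from [75]; the Synthesis Index is left out because its definition is not given in this review.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.
    Undefined when any true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical corrosion rates (mm/year) and predictions from two models.
observed = [0.20, 0.40, 0.50, 0.80]
predictions = {
    "model_a": [0.22, 0.38, 0.55, 0.76],
    "model_b": [0.30, 0.30, 0.60, 0.70],
}

scores = {
    name: {
        "MAE": mae(observed, pred),
        "RMSE": rmse(observed, pred),
        "MAPE": mape(observed, pred),
    }
    for name, pred in predictions.items()
}
best_by_mape = min(scores, key=lambda name: scores[name]["MAPE"])
```

Reporting several metrics together is useful because MAPE weights errors relative to the true value, whereas MAE and RMSE weight them in absolute terms, so the ranking of models can differ between them.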
Fig. 7. Performance of various activation functions.
Table 5
A review of ML analysis of corrosion datasets in the literature.
Regarding [101], the methodology involves integrating NN with FEM to develop analytical equations for estimating the failure pressure of a corroded pipeline. These equations account for longitudinally interacting corrosion defects and consider the combined effects of internal pressure and longitudinal compressive stress. The NN model is trained using data generated from the FEM. The accuracy of the FEM was confirmed by validation against burst tests, which reproduce the pressure at which a real pipeline fails. When the NN was evaluated using a finite element analysis dataset that had not been seen before, the predictions made by the failure pressure prediction equations were highly accurate: the R² value was 0.9921, the MSE was 4.746 × 10⁻⁴ and the MAE was 1.374 × 10⁻². The percentage error ranged from −9.39 % to 4.63 %, with a standard deviation of 2.83.
Nevertheless, the studies in this list have some limitations, one of the primary concerns being the insufficient size of the dataset [75,97,99,102] or the lack of comprehensiveness in the corrosion components [100]. These limitations might hinder the efficacy of these models when applied to significant corrosion problems. Another problem concerns validation: although the results are satisfactory, these articles did not include comparisons with different ML methods [101] or different equipment [100].
To construct a robust corrosion model, one must undertake a complicated task including extensive data collection and analysis. It is essential to adhere to the processes outlined in this section. Moving forward, the next section discusses the approach of corrosion modelling using alternative methods.
6. Corrosion modelling using alternative methods
Based on the data-driven models established by ML and NN’s, it is evident that these models heavily rely on the dataset. Consequently, if the dataset is insufficient or of poor quality, the predictive results obtained from these models will be limited in their applicability. Additionally, while the evaluation parameter of ML can comprehensively assess the strengths and weaknesses of the model, it cannot explain the theoretical knowledge underlying the dataset. Therefore, this section will present corrosion models based on alternative principles and corrosion characteristics to provide a reference for corrosion modelling. Table 6 presents corrosion models categorized by distinct types of corrosion and the causes of internal and exterior corrosion. These models are developed utilizing techniques such as finite element analysis, solid mechanics, and statistical theory.
Based on the findings from the literature mentioned above, we can identify three elements that might be advantageous for the area of subsea pipeline corrosion. Firstly, the corrosion process of maritime pipes is influenced by both internal and external causes, and some of these factors are correlated. In situations where detection is not feasible, it is possible to control the most influential variables to prevent corrosion. For instance, [109] demonstrates that temperature, pressure, and pipe wall shear stress can all contribute to an increase in the corrosion rate caused by hydrates. Additionally, higher operating pressure leads to higher hydrate formation temperatures, which in turn promote the onset of erosion corrosion and accelerate the rate of velocity loss along the pipeline. Therefore, controlling the operating pressure may somewhat limit the increase in the corrosion rate.
Secondly, it is important to note that varying degrees of corrosion may have distinct impacts on maritime pipelines. For instance, the findings in [105] indicate that as the length of corrosion increases, the influence of defect spacing on collapse pressure becomes more significant. An escalation in defect depth results in intensified defect interactions. According to [107], when a corroded pipeline experiences full-scale corrosion with an uneven reduction in thickness (averaging 10 percent), there is a corresponding decrease in crush strength of 10–13 percent. Similarly, in the case of full-scale corrosion with an average reduction in thickness of 40 percent, the crush strength of the pipeline diminishes by 45 percent. Therefore, it is important to examine the impact of varying degrees of corrosion on the structural characteristics of the pipe.
Furthermore, the proper and rational upkeep of marine pipelines can significantly impact the corrosion process. Maintenance plans should be adjusted based on the level of risk and operating expenses. For example, [42] uses shorter maintenance intervals to mitigate the danger of failure exceeding acceptable criteria, and identifies programmes that strike a balance between pipeline availability and maintenance expenses.
In the previous analysis, it is important to highlight that the modelling process of the mentioned methods is easily interpretable as it depends on specific solid mechanics and statistical models, while the empirical model proposed in [109] also offers a specific theoretical foundation for modelling, thereby making the results highly pertinent. These findings therefore also inform the data-driven models mentioned above: the underlying principles of the data should be considered while creating the models.
7. Discussion and Suggestion
The previous sections presented the methods and stages for constructing data-driven ML models, together with the advantages and limitations of various forms of ML. They also presented the problems addressed by corrosion modelling based on other theories and some of the insights that these approaches may bring to the field of marine pipeline corrosion. This section builds on and integrates the discussion from those two parts.
Based on the outcomes of Section 4 and Section 5, the main features of ML (supervised learning) can be summarised as follows:
- Possess a robust capacity to analyse many issues using clear and systematic steps;
- The principles underlying ML techniques are not intricate;
- The effectiveness of ML models is strongly dependent on the quality of the dataset;
- The aim of supervised learning is to predict the output of a function for any given input, based on pre-labelled training instances.
To strengthen the capacity to analyse corrosion difficulties, future research should include these components based on the primary characteristics of ML methods:
Firstly, ML-based data-driven models rely heavily on the quality of the acquired dataset. To improve the dataset, it is necessary to increase the depth or breadth of the data. When there are enough funds for operational expenses, a substantial quantity of data may be gathered using a range of state-of-the-art devices. For example, the abundance of data from smart devices and sensors connected to the cloud via the Industrial Internet of Things (IIoT) [103] allows for the easy gathering of large amounts of data. When operational funds are insufficient, other approaches such as Virtual Sample Generation (VSG) [104] or Monte Carlo simulation [90] may be used to produce data for model training.
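A Monte Carlo approach to producing training data can be sketched as follows: input variables are drawn at random from assumed distributions and each draw is labelled by a model of the corrosion response. Everything specific here is an assumption made for illustration: the input ranges, the distributions, and above all `toy_corrosion_rate`, which is a placeholder and not a validated corrosion law or the method of [90].

```python
import random


def toy_corrosion_rate(ph, temperature_c):
    """Placeholder response, NOT a validated corrosion law:
    lower pH and higher temperature give a higher rate."""
    return max(0.0, 0.05 * (9.0 - ph) + 0.01 * temperature_c)


def monte_carlo_samples(model, n_samples, seed=0):
    """Draw inputs from assumed distributions and label them with `model`."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        ph = rng.uniform(6.5, 8.5)            # assumed pH range
        temperature_c = rng.gauss(15.0, 5.0)  # assumed mean and spread, deg C
        samples.append(
            {
                "pH": ph,
                "temperature_C": temperature_c,
                "rate": model(ph, temperature_c),
            }
        )
    return samples


synthetic = monte_carlo_samples(toy_corrosion_rate, n_samples=1000)
```

In practice the placeholder would be replaced by a physics-based or empirical model, so that the generated samples carry the theoretical knowledge of that model into the training set. Fixing the seed makes the generated dataset reproducible.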
Secondly, even if an ample quantity of data is acquired through diverse techniques, failure to preprocess it may result in biased predictions due to the presence of noisy or highly correlated data. This, in turn, can escalate operational and maintenance expenses. Additionally, depending on the outcome of data dimensionality reduction, it is possible to decrease the number of data categories to be collected, thereby reducing the necessary equipment.
Furthermore, while the data-driven model has the advantage of obtaining predictions without relying on the theory underlying the data, this weakens its interpretability, as opposed to the models presented in Section 6, which rely on a great deal of knowledge of solid mechanics and statistics but produce clearly more specific results. As a result, it is necessary to collect data while comprehending the theoretical knowledge that underpins it, and then produce the final model based on this understanding. Alternatively, as shown in [101], integrating data-driven and non-data-driven models can be successful, i.e., the dataset is collected using the latter’s knowledge principles and ML is then applied to create predictions.
With respect to Section 6, in addition to expanding the boundaries of modelling in the field of corrosion and providing theory and datasets for the optimisation of data-driven models, these methods also demonstrate the importance of maintenance for offshore pipelines. On the one hand, a suitable maintenance strategy can be effective in reducing corrosion. On the other hand, a good maintenance strategy should consider not only how to prevent risk but also the cost of maintenance. This in turn shows the importance of predictive modelling of corrosion, as reasonable predictions can control the number of corrosion cases and maintenance visits and thus reduce unnecessary expenditure.
In this regard, constructing a corrosion model is an important task that requires inspection techniques, specific modelling methods, data collection methods (for ML methods) and accurate analysis algorithms, all of which must be correct throughout the process.
8. Conclusion
This article presents an intensive literature review on the development of submarine pipeline corrosion models for future research. The review begins by outlining the importance of constructing a corrosion model, then introducing the factors that contribute to corrosion and providing an examination of corrosion characteristics. This research also provides a summary of the methodology used to identify pipeline corrosion.
This article primarily discusses the model-building techniques used by ML, with a particular emphasis on highlighting the benefits of ML approaches. The content encompasses a range of ML methodologies, such as NN’s, and provides a systematic procedure for constructing an ML model. Moreover, it emphasizes the need for gathering and preprocessing data. The analysis findings indicate that ML is a feasible method for addressing the issue of corrosion.
This article also explores corrosion models that utilize other approaches, such as solid mechanics and statistics, in addition to ML methods. The purpose is to address the limited interpretability of ML models and expand the range of corrosion modelling. These alternative methods either analyse the structural properties of the pipeline, evaluate the pipeline’s maintenance strategy from different angles, or investigate the impact of various factors on the corrosion rate. In summary, these approaches not only expand the range and variety of corrosion modelling but they may also be integrated with data-driven models to create more precise and understandable corrosion models for marine pipelines.
Consequently, drawing on the aforementioned analysis and discussion, this study presents a number of recommendations about the elements and objectives that should be taken into account when developing future models for maritime pipeline corrosion.
- Before collecting various indicators, it is necessary to analyze the correlation between the internal and external factors that cause corrosion, so that data dimensionality and the number of data collection sensors can be reduced. This will help in reducing unnecessary expenditure.
- When utilizing a data-driven model, it is imperative to comprehend the underlying theory of the data in order to improve the interpretability of the final model. Alternatively, one can employ solid mechanics, statistics, or other models to create a dataset, and subsequently construct a data-driven model based on that dataset.
- One important aspect to address in the future is how to effectively conduct routine maintenance on submarine pipelines while keeping expenses to a minimum. The geographical positioning of offshore pipelines necessitates significant maintenance expenses. The downtime associated with unnecessary maintenance activities diminishes production efficiency and results in avoidable profit loss. Therefore, it is crucial to devise efficient and cost-effective strategies that mitigate risks.
- Corrosion-resistant materials may be developed by using the understanding of corrosion principles and their impact on material structure. This paper’s research serves as a reference for corrosion modelling, as well as for the selection of materials and structural design for marine pipelines. For instance, it considers the impact of changes in pH value and temperature on corrosion. Therefore, it is necessary for the pipeline materials (or coatings) to possess characteristics such as resistance to acid, alkali, and high temperatures.
CRediT authorship contribution statement
Ziheng Zhao: Writing – original draft. Mohammad Nishat Akhtar: Writing – review & editing. Elmi Abu Bakar: Writing – review & editing. Norizham Bin Abdul Razak: Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
The authors would like to acknowledge Research Creativity and Management Office, Universiti Sains Malaysia for their support in this work.