Review of Machine Learning Approach on Credit Card Fraud Detection

  • Review Article
  • Open access
  • Published: 05 May 2022
  • Volume 2 , pages 55–68, ( 2022 )


  • Rejwan Bin Sulaiman   ORCID: orcid.org/0000-0002-3037-7808 1 ,
  • Vitaly Schetinin 1 &
  • Paul Sant 1  


Massive usage of credit cards has caused an escalation of fraud. Credit cards have driven the growth of online business and the convenience of the e-payment system. Machine learning (ML) methods are being adopted on a large scale to detect and prevent fraud, and ML algorithms play an essential role in analysing customer data. In this research article, we conduct a comparative analysis of the literature on ML techniques for credit card fraud detection (CCFD) and data confidentiality. Finally, we propose a hybrid solution that uses an artificial neural network (ANN) within a federated learning framework, which has been observed to be an effective way of achieving higher accuracy in CCFD while ensuring privacy.



1 Introduction

In the twenty-first century, most financial institutions have increasingly made their services available to the public through internet banking. E-payment methods play an imperative role in today's competitive financial landscape and have made purchasing goods and services very convenient. Financial institutions provide customers with cards that make their lives easier, letting them shop without carrying cash. Besides debit cards, credit cards also benefit consumers by protecting them against purchased goods that are damaged, lost or stolen. Customers are required to verify a transaction with the merchant before carrying it out using their credit card.

According to statistics, Visa [ 50 ] and Mastercard [ 51 ] issued 2287 million credit cards in total worldwide as of the fourth quarter of 2020 (Figs.  1 and 2 ).

Fig. 1: Number of Mastercard credit cards issued worldwide [ 51 ]

Fig. 2: Number of Visa credit cards issued worldwide [ 50 ]

Visa issued 1131 million cards, whereas Mastercard issued 1156 million cards worldwide. These statistics show how easy and popular card-based transactions have become for end users. Because such a massive portion of global transactions falls into this category, fraudsters target this group of people, and humans are often easy to social engineer.

Despite the several benefits that credit cards provide to consumers, they are also associated with problems such as security and fraud. Credit card fraud is a challenge that banks and financial institutions face. It occurs when unapproved individuals use credit cards to gain money or property by fraudulent means. Credit card information can be stolen via unsecured online platforms and web pages, or obtained through identity theft schemes. Fraudsters can thus access users' credit and debit card numbers illegitimately, without their consent or knowledge.

According to "U.K. Finance" [ 27 ], fraudulent activities associated with credit and debit cards are one of the major causes of financial losses in the finance industry. With the advancement of technology, this threat leads to massive financial losses globally. It is therefore imperative to carry out credit card fraud detection to reduce these losses.

Machine learning is effective in determining which transactions are fraudulent and which are legitimate. One of the main challenges for detection techniques is the barrier to exchanging ideas related to fraud detection. According to a study by "U.K. Finance", reported credit and debit card fraud in the U.K. was worth £574.2 million in 2020 [ 27 ].

In recent years, research on fraud detection in credit cards has increased tremendously, drawing the attention of many scholars and researchers [ 22 ]. This paper reviews and evaluates various aspects of credit and debit card fraud detection. It examines the techniques used to detect fraudulent credit card transactions and finally proposes a better technique for credit card fraud detection. Researchers are also trying to overcome methodological barriers that limit the real-time application of ML. Such research spans many domains, including abnormal pattern detection [ 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 ], biometric identification [ 36 , 37 ], diabetes prediction [ 38 , 39 ], happiness prediction [ 40 ], water quality prediction [ 41 ], accident prevention at Heathrow [ 42 ], timely diagnosis of bone diseases [ 43 ], and predicting informational efficiency using deep neural networks [ 49 ]. Despite these limitations, researchers are working to harness the power of ML to detect fraud.

1.1 Motivation

CCFD involves quite complex procedures and techniques for developing an effective detection system. The following problems in CCFD, identified from the literature review, motivated us to propose an effective solution.

Credit card transactions are substantial in number and heterogeneous. Users employ credit cards for various purposes across geographical locations and currencies, which means fraudulent transactions are widely diverse [ 10 ]. This problem motivated us to devise a solution that can detect fraudulent transactions irrespective of geographical location. Fraud detection is also a multi-objective task: banks and financial institutions need to give their users a good experience and service at all times, so it is challenging to use customer datasets for experimental purposes while ensuring service availability and privacy. To address this challenge, we introduce a federated learning framework for data privacy assurance.

Fraudulent transaction diversity and imbalanced datasets are also big challenges in CCFD [ 22 ]. Obtaining real-time datasets of credit card transactions is difficult, since banks and financial institutions do not expose their customers' data due to GDPR. This creates a challenge for researchers gathering datasets for credit card fraud detection. Our aim is to help research communities and data scientists in the financial sector devise a system that meets the challenge of obtaining big data for an effective machine learning model.

2 Literature Review

It is imperative for any banking or financial institution that issues credit and debit cards to put effective measures in place to detect fraudulent transactions. Notable methods identified to help detect credit card fraud include RF, ANN, SVM, k-nearest neighbours, and other techniques that take a hybrid and privacy-preserving approach to data privacy.

We discuss each of these approaches briefly below.

2.1 Random Forest (R.F.)

Random forest is an ML algorithm constructed from decision tree (DT) algorithms, commonly used to solve various regression and classification problems. It predicts output with high accuracy on large datasets. The random forest technique combines several classifiers to solve intricate problems, predicting the average of the outputs of its trees; increasing the number of trees tends to increase the precision of the outcome. The method eradicates various limitations of a single decision tree [ 8 ] and reduces overfitting, thus increasing precision. A forest contains several decision trees, each of which acts as a weak learner; together they form a strong learner. The RF technique is high-speed and effective in handling large and unbalanced datasets. However, random forest has limitations in training across the range of the data, especially in regression problems.
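As a sketch only (not the implementation used in the reviewed studies), a random forest on a synthetic, heavily imbalanced dataset can be set up with scikit-learn; the dataset shape and parameters here are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in for a credit card dataset: ~1% "fraud" class
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# An ensemble of decision trees; class_weight compensates for the imbalance
rf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                            random_state=0).fit(X_tr, y_tr)
recall = recall_score(y_te, rf.predict(X_te))
print(f"fraud recall: {recall:.2f}")
```

Recall on the minority (fraud) class, rather than overall accuracy, is the metric that matters most on data this skewed.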

Various traditional algorithms have been used, such as logistic regression (LR), C4.5, and RF. Logistic regression describes and explains the association between a dependent binary variable and independent variables. C4.5 is commonly used in data mining as a DT classifier, generating decisions from the provided sets of data. Traditionally, Threshold optimisation (T) and Bayes Minimum Risk classifiers (MR) were combined to group fraudulent transactions by altering the probability threshold. T and MR improve prediction accuracy and reduce the overall cost involved [ 11 ]. Logistic regression performs well in regression problems, as it tolerates model overfitting better than a decision tree; however, real-world problems are rarely linear, and real-time CCFD datasets are nonlinear. The use of logistic regression is therefore not suitable here.

Olena et al. have proposed a hybrid approach for credit card fraud detection using random forest and isolation forest, which is used to identify anomaly-based transactions [ 15 ]. The proposed model is based on two primary sub-systems: the first performs anomaly detection using unsupervised learning; the second interprets the anomaly type and is based on supervised learning. The work's primary concern is data speed, which the hybrid model handles effectively on real-time data [ 15 , 16 ]. The system was evaluated on identifying users' geolocation while performing transactions, for detection purposes. The model is driven not by the anomaly level but by the anomaly type, and it detects fraud based on geolocation. However, preserving privacy and confidentiality is a gap in this work: real-time data is involved in detecting the fraudulent transactions, and the researchers did not mention any hashing or encryption used to keep users' data from being exposed. There is therefore a need to ensure data confidentiality for credit card users in research. The researchers also did not mention how to tackle geolocation-spoofing techniques. Our contribution focuses on the geolocation and time features for detecting fraud, combining an ANN with a federated learning approach to ensure data confidentiality.

Although random forest algorithms are quite effective in predicting classes and handling regression problems, they have various limitations for real-time CCFD. They can perform well on lab-based datasets where limited data is available, but they are slower in real-time scenarios: training is slow, and predictions take longer. Effective CCFD on real-life datasets requires a large volume of data, and random forest algorithms lack the capability to train on such datasets and make predictions efficiently.

2.2 Artificial Neural Network (ANN) Method

ANN is an ML algorithm that functions similarly to the human brain. Typically, ANNs are trained with one of two types of method: supervised or unsupervised. Unsupervised neural networks are widely used in detecting fraud cases, with a reported accuracy rate of 95% [ 4 ]. An unsupervised neural network attempts to find similar patterns between the cardholder's present transactions and earlier ones; if the details of the current transaction correlate with previous transactions, a fraud case is likely to be detected [ 4 ]. ANN methods are highly fault tolerant: output generation is sustained even when one or more cells are corrupted. Due to its high speed and effective processing capabilities, ANN can be considered an effective solution for CCFD.
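As a hedged illustration of the supervised variant (the exact architectures in the cited works are not specified), a small feed-forward network trained with backpropagation might look like the following; the layer sizes and synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for transaction data: 5% "fraud" class
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

# Two hidden layers; weights are fitted by backpropagation
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                    random_state=1).fit(X_tr, y_tr)
print(f"test accuracy: {ann.score(X_te, y_te):.2f}")
```

The choice of hidden-layer structure is exactly the trial-and-error concern raised later in this section: there is no closed-form rule for sizing the layers.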

The author used three stages in detecting fraud: verifying the user, a fuzzy clustering algorithm, and an ANN classification phase to differentiate between legitimate and suspicious transactions. This technique achieved an accuracy rate of 93.90%, classifying 6.10% of transactions incorrectly [ 7 ]. Although ANN along with clustering performs well in detecting fraudulent transactions, the author failed to consider the appropriate structure of the ANN, which requires progressive trial and error.

An artificial neural network trained with a simulated annealing algorithm is effective in identifying various fraudulent credit card transactions. The simulated annealing algorithm optimises performance by finding the most suitable configuration of weights in the neural network [ 10 ].

Saurabh et al. have proposed a model based on an artificial neural network (ANN) [ 17 ] with backpropagation for credit card fraud detection [ 17 , 18 ]. The procedure takes the customers' dataset, i.e., name, transaction ID and time. The authors experimented with 80% of the data for training, leaving 20% for testing and validation. The proposed model gave a significant outcome for the detection of fraudulent transactions in real-time data. For evaluation, the authors used a confusion matrix, recall, accuracy, and precision. The achieved accuracy of 99.96% is enhanced compared with the previous model on real-time data. Although it produced good results, this work lacks a solution to the threat of data exposure by a researcher or even an individual bank employee, and the authors did not address the confidentiality of training attributes such as name, age and gender. A solution is therefore required that fulfils all the criteria for confidentiality and integrity of bank credit card transactions. Our proposed work will use a federated learning model to ensure data privacy while training for credit card fraud detection.

Data mining techniques such as DT, MLP, and CFLANN are widely used to determine patterns from previous transactions. These models often use two datasets to compare performance. The multilayer perceptron (MLP) model has an accuracy of 88.95% on the Australian credit card dataset [class distribution: class 2: 307 (44.5%); class 1: 383 (55.5%)] and 78.50% on the German credit card dataset [ 24 ], which indicates that MLP performs differently on different datasets. MLP may not be very effective in CCFD because its large number of parameters produces a highly dense structure, resulting in redundancy and performance inefficiency. The author did not highlight this concern, which is essential when considering the MLP process in real time.

ANN is an effective algorithm for CCFD [ 4 , 7 , 10 ]. The literature shows that it has produced good performance when used in conjunction with various functions and algorithms, each of which has its own shortcomings. The use of ANN in CCFD is nevertheless promising due to its capacity to accommodate larger volumes of data and its distributed memory structure.

2.3 Support Vector Machine (SVM)

SVM is used for classification and for carrying out regression analysis on various problems. In this approach, researchers often analyse the patterns in which customers use credit cards; paying patterns are collected from the datasets, and the support vector machine classifies consumer patterns as either fraudulent or non-fraudulent. The SVM method is effective and provides accurate results when fewer features are used from the dataset [ 5 ]. However, problems arise when a larger volume of data (at least over 100,000 records) is used. SVM is therefore ineffective for real-time CCFD, where datasets are large.
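A minimal SVM sketch, under the assumption of a small synthetic dataset (consistent with the point that SVM suits smaller feature sets), could look like this with scikit-learn; standardising the features first is a standard precaution, not something the cited study specifies:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Small synthetic dataset; SVMs scale poorly beyond ~100k rows
X, y = make_classification(n_samples=1000, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# RBF-kernel SVM; scaling matters because SVMs are distance-based
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_tr, y_tr)
print(f"test accuracy: {svm.score(X_te, y_te):.2f}")
```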

Rtayli et al. have proposed a method for credit card fraud risk (CCR) in high-dimensional data using the classification techniques of a random forest classifier (RFC) [ 27 ] and SVM [ 26 , 27 ] in a hybrid approach. The idea was inspired by feature selection of fraudulent transactions in big, imbalanced datasets, where fraudulent transactions are minimal in number and difficult to detect. To evaluate the model, the authors used metrics comprising accuracy, recall, and area under the curve.

Using SVM with RFC produced an accuracy of 95%; false-positive transactions decreased as sensitivity improved to 87%, yielding better fraud detection on the massive, imbalanced dataset [ 26 , 27 ]. The model also improved classification performance. Although the method produced efficient fraud detection using classification features, it limits the transaction's privacy when computing the accuracy and recall metrics. To address this privacy concern, we use a federated learning model that trains data locally, combined with an artificial neural network. RFC also performs slowly when dealing with large datasets.

2.4 K-Nearest Neighbour (KNN)

KNN is a type of supervised ML method for classifying and performing regression analysis on problems. It is effective in supervised learning, improving detection and decreasing the false-alarm rate, and it uses a supervised technique to establish the presence of fraudulent activity in credit card transactions [ 14 ]. The KNN fraud detection technique requires two estimates: the correlation of transactions and the distance between occurrences of transactions in the data. KNN is suitable for detecting fraudulent activity at transaction time. By performing over-sampling and separating the data, it can be used to determine anomalies in the targets. It can therefore be considered for CCFD under memory limitations, as it can assist in CCFD while utilising low memory and less computational power, and it is a fast approach for any number of datasets. Compared with other anomaly-based techniques, KNN achieves higher accuracy and efficiency [ 12 ].

KNN is widely used to identify patterns similar to previous transactions carried out by the cardholder. Commonly used machine learning algorithms include LR, Naïve Bayes and KNN. KNN has an accuracy rate of 97.69% in detecting fraudulent credit card transactions [ 13 ]; it produced optimum performance with respect to all the metrics used, recording no false positives during classification. Another study using KNN achieved 72% accuracy for CCFD [ 12 ].

Although the authors conducted progressive tests while utilizing KNN, it is critical to note the algorithm's limitations. KNN is a memory-intensive algorithm that scales up non-essential data characteristics. It likewise falls short in the experiments cited above. When the algorithm is fed a large amount of data, the performance of the KNN algorithm degrades. As a result, these constraints have an effect on the accuracy and recall matrix in the CCFD process.
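The basic mechanic is easy to sketch: each test transaction is classified by majority vote among its k nearest training transactions. The parameters and synthetic data below are illustrative assumptions, not those of the cited studies:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Majority vote over the 5 nearest neighbours; note that "fitting" just
# stores the training set, which is why KNN is memory-intensive at scale
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(f"test accuracy: {knn.score(X_te, y_te):.2f}")
```

Because prediction requires distance computations against the stored training set, cost grows with dataset size, which is the degradation described above.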

2.5 Hybrid Approach

The procedures for CCFD have now been replaced by ML techniques, resulting in higher efficiency. One research team has proposed a method for loan fraud detection using ML on credit card transactions [ 44 ]. The process was experimented with using the Extreme Gradient Boosting (XGBoost) algorithm together with other data mining procedures, which produced optimal results in CCFD. The work was carried out while keeping the valuable information private, without requiring knowledge of it.

To achieve the research targets, the authors used a hybrid of supervised and unsupervised ML algorithms, employing PK-XGBoost and XGBoost. PK-XGBoost performed better than plain XGBoost [ 45 , 46 ], maintaining high efficiency in detecting fraud while ensuring user privacy. Due to the high number of credit card transactions, however, this approach has limitations in terms of privacy assurance. XGBoost can also overfit the dataset in some cases; to avoid this, various parameters need to be tuned together to attain adequate accuracy.
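To illustrate the gradient-boosting idea (using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, which the cited work actually used), the overfitting-control parameters mentioned above map onto tree depth and learning rate; everything here is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced stand-in for transaction data
X, y = make_classification(n_samples=2000, n_features=15,
                           weights=[0.97, 0.03], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

# Boosting fits trees sequentially, each correcting the previous ensemble;
# shallow trees and a modest learning rate guard against overfitting
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3,
                                 learning_rate=0.1,
                                 random_state=4).fit(X_tr, y_tr)
print(f"test accuracy: {gbt.score(X_te, y_te):.2f}")
```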

The researchers have also used a hybrid CCFD method combining random forest and isolation forest to identify anomalous transactions [ 47 ]. This method comprises two categories: the first performs anomaly-based detection using unsupervised learning, and the other interprets the detected anomalies using supervised learning. The proposed method is designed for high-speed data on real-life datasets [ 15 , 16 ]. The system was evaluated on identifying user geolocation. The technique is not centred on the anomaly level; rather, the anomaly type defines it. Although it helped detect fraudulent transactions on the basis of geolocation, data confidentiality and privacy could be compromised. The model should be evaluated while ensuring the confidentiality of the data; a model is therefore required that provides data confidentiality while achieving higher accuracy in CCFD on more extensive datasets.

2.6 Privacy-Preserving Techniques

In the ML approach, dataset training is essential, and for practical training, ML algorithms should be provided with a large volume of data. Various research has used credit card data in a privacy-preserving manner. One experiment used a supervised ML approach with blockchain technology on Ethereum, performed on 300 thousand accounts. The results showed that altering the parameters changes the values of precision and recall. It was also observed that blockchain could be a threat on the basis that it is a decentralised technology [ 53 ], even though that decentralised nature is one of its effective means of ensuring data privacy. For real-world CCFD, decentralised technology has various limitations: scalability issues, the need to maintain data in wallets, and processor-intensive operation with high energy consumption, which makes it expensive; its standardisation is also not globally adopted. Blockchain may therefore not be the right choice for banks and financial institutions for CCFD.

The use of data for experimental purposes should follow GDPR. Research comparing gossip learning and federated learning observed that gossip learning techniques are ineffective because they lack a central control system, whereas federated learning (F.L.) performed better owing to its semi-decentralised nature [ 52 , 54 ].

Credit card data is imbalanced and skewed, and financial institutions are not allowed to share their credit card data due to privacy concerns and GDPR. Experiments have therefore used federated learning, in which the data is trained locally at the participants, i.e., banks and financial institutes. The results showed that F.L. can address the privacy issue: the data is not shared with the central aggregation server; only the trained model is shared [ 55 ]. This is an ideal situation where the data is secured in terms of privacy and confidentiality. F.L. is a cyclic process in which the model is trained locally on the clients' devices, and the mean average of the individual clients' models is aggregated. In this way, anomaly-based fraudulent transactions are learnt from the respective clients, and an effective ML model is trained.

2.7 Blockchain Technology

Various applications based on blockchain technology have attracted good public attention, based on the fact that it goes beyond the limits of central servers such as banks and other institutions, instead providing a decentralised approach in which user behaviour depends on the nature of the blockchain. Malicious software can nevertheless cause fraud in blockchain transactions. Michal et al. (2019) proposed a supervised machine learning approach on blockchain technology [ 56 ], applied to the Ethereum blockchain. The experiment was performed on 300 thousand accounts, and the results were compared with random forest, SVM and XGBoost [ 57 ]. They concluded that the various transaction parameters alter the values of precision and recall, and noted that blockchain is a self-maintained technology: relying on it could be a potential threat, especially in the finance sector. Our research therefore takes a more practical approach with federated learning, which is semi-decentralised and ensures efficiency and privacy at the same time.

2.7.1 Why Not Blockchain?

Machine learning approaches are life-changing and continuously evolving to make things more comfortable in our daily lives. The main hurdle in ML is diversified and complex training data. Crowdsourcing is one technique for collecting data on a central server, but it has limitations concerning data privacy [ 53 ]. Blockchain is an emerging technology that could provide a decentralised platform with enhanced data security [ 57 ], so it could be considered a medium for exchanging data securely among banks and financial institutions for CCFD. However, several drawbacks and limitations make this technology less efficient for exchanging data, and under GDPR, exchanging data raises privacy concerns. The following are some disadvantages of blockchain technology for CCFD:

  • The process slows down if there are too many users in the network.
  • Due to the consensus method used in blockchain, the data is harder to scale.
  • It requires higher energy usage.
  • Blockchain sometimes shows inefficiency in its operation.
  • Users must maintain their data in wallets.
  • The technology is costly.
  • It is not standardised.

The issues mentioned above discourage researchers and academic institutions from adopting blockchain technology for CCFD. Our proposed research addresses them using the semi-decentralised technique of federated learning, which provides higher efficiency: participants train their models locally (preserving security), resulting in faster processing than blockchain technology and higher data scalability (Table 1 ).

3 Classification Imbalance Problem

In credit card fraud detection, data imbalance is one of the challenging aspects that researchers have tried to study. Training a machine learning algorithm on such data can lead to misclassification because of the ratio of genuine transactions to fraudulent transactions (Fig.  3 ).

Fig. 3: The imbalance ratio of the dataset used in most of the research in our table: 284,807 transactions are genuine, whereas 492 are fraudulent

Pre-processing the data is one technique for handling imbalanced data: fraud transactions are over-sampled and legitimate transactions under-sampled, which increases the fraud class and decreases the legitimate class in the original dataset. The performance of ML algorithms increased after over-sampling with the synthetic minority oversampling technique (SMOTE) [ 10 ] for imbalanced data. Balanced classification rate (BCR) and Matthews correlation coefficient (MCC) are two metrics for handling class imbalance, and it was observed that the fraud miner is better at achieving higher accuracy. SMOTE has drawbacks, including noise and the probability of overlap between classes, which can overfit the model; even so, in the experiment of [ 19 ], SMOTE achieved 2–4% better accuracy than other classification methods. Adaptive synthetic sampling (ADASYN) and Ranked Minority Oversampling in Boosting (RAMO) were proposed afterwards, but they caused classification issues as the number of iterations increased, and the researchers suggested that an ensemble classifier could perform better than a single classifier on imbalanced datasets.
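The core of SMOTE can be sketched in a few lines: new minority samples are synthesised on line segments between a minority point and one of its k nearest minority neighbours. This is a minimal hand-rolled sketch (production code would use the imbalanced-learn library), with an illustrative random minority class:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: each synthetic point lies between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)          # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]      # pick a true neighbour
        gap = rng.random()                      # interpolation factor in [0, 1)
        new.append(X_minority[i] + gap * (X_minority[j] - X_minority[i]))
    return np.array(new)

fraud = np.random.default_rng(0).normal(size=(20, 4))  # tiny minority class
synthetic = smote(fraud, n_new=80)
print(synthetic.shape)  # (80, 4)
```

Because the interpolated points can land near majority-class regions, this construction is also the source of the class-overlap and noise drawbacks noted above.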

4 Model Design

The centralised approach is one of the most commonly adopted methods for credit card fraud detection. A fraud detection system (FDS) becomes inefficient when only limited datasets are available within a limited detection period. Banks and other financial centres cannot share their data on a central server due to GDPR, and users' privacy can still be compromised even if an "anonymised" dataset is held locally on servers, as it could be reverse-engineered. To cope with this challenge, our research model uses FL, which makes it possible to train real-time data locally on edge devices; only the trained model is centrally shared among all the other banks and research centres, which can effectively enhance the accuracy of detecting fraudulent transactions.

Secondly, our research model uses the ANN algorithm on clients' data, in combination with federated learning, to obtain better evaluation metrics and achieve higher accuracy. Furthermore, this hybrid model plays an essential role in preserving the privacy of users' data.

4.1 Proposed Model with Federated Learning and ANN

In our FL model, the following steps are involved in training the model until all participants achieve the full transition:

Client selection

Based on the eligibility criteria, the server selects the participating clients.

Broadcasting

In this stage, the chosen clients download our model, which will be an artificial neural network model.

Computation phase

In this stage, all the participant devices compute the model-update by executing the program provided by the server.

Aggregation

In this stage, the server collects the updates from the participating devices.

Model-update

In this stage, the shared server aggregates the clients' local updates and updates the shared model.
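The aggregation and model-update steps above can be sketched as federated averaging (FedAvg): the server takes a weighted mean of the locally trained parameter vectors, weighting each client by its local dataset size. The bank names and parameter values are hypothetical:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging sketch: combine locally trained parameter
    vectors, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical parameter vectors from three banks after local training
bank_a = np.array([0.2, 0.4])
bank_b = np.array([0.6, 0.0])
bank_c = np.array([0.4, 0.2])
global_model = fed_avg([bank_a, bank_b, bank_c], client_sizes=[100, 100, 200])
print(global_model)  # [0.4 0.2]
```

Only these parameter vectors ever leave a bank; the raw transaction records stay local, which is the privacy property the model relies on.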

Model Outline

The proposed model of federated learning with ANN can be classified into three phases followed one after the other until the last phase is completed, and the cycle continues. We will start from Step one as follows:

This step involves distributing our model (ANN) from the central server to the corresponding banks or financial institutions. It is displayed as the "Black Brain" in Fig.  4 . Once an individual bank receives the model, it starts training it on the datasets available locally. The training process is illustrated below, where the trained models are differentiated by colour for each bank (A purple, B blue, C green and D red). The digit "1" indicates the first phase, in which the model is sent to the banks.

figure 4

Step 1 of the proposed model

On completion of step 1, step 2 begins: the trained models are sent from the banks to the central server of the federated learning model. On the server, all models from the respective banks are combined to form an "upgraded model", as illustrated in Fig.  5 .

figure 5

Step 2 of the proposed model

Step 3 is the last step of our proposed model: the "upgraded model" (formed by the mean average of all corresponding trained models from the different banks) is sent back to each individual bank. On receiving the model, each bank trains it locally as in step 1. Once training is complete, the model is sent back to the server. The process repeats cyclically until the expected outcome is achieved (Fig.  6 ).

figure 6

Step 3 of the proposed model

Cycle Repetition

After completion of step 3, the process continues by sending the trained models to the server, as explained in the first step. Again, the server takes the mean average of all banks' models and sends it back to each bank. According to our hypothesis, repeating this training process can ensure higher accuracy in CCFD. The overall process is represented in Fig.  7 .

figure 7

The model is commonly and collaboratively shared by banks and other research centres, while the data is kept locally in their databases. Only the trained model, not the actual data, is shared among participants. The central model is trained mutually by all participants, resulting in better classification than an individual model trained locally. In simple words, patterns are learnt locally at each client side, and these learnt patterns are aggregated in the central server, which is trained from the mutual inputs of all participants. This central model is shared back to all participants, and fraud detection is performed accordingly. By performing the steps mentioned above, FL can significantly enhance fraud detection accuracy while preserving the privacy of customers' data in accordance with GDPR.

5 Proposed Method

In this review paper, we found that the use of supervised learning is common practice among researchers; SVM, KNN, Naïve Bayes, logistic regression and DT models are widely used. We also see that a hybrid approach performs better than a single algorithm or classifier. As can be observed from the various experiments on CCFD in the previous section, different ML models have proven effective in this process; however, due to data imbalance and heterogeneity, CCFD remains challenging, and models are unable to yield higher accuracy. Data imbalance and heterogeneity can be mitigated with higher volumes of data, and because real-time fraudulent patterns are observed constantly, the model can be kept updated with the potential feature variables. The use of real-time datasets, however, raises privacy issues, as banks and financial institutions are obliged to follow GDPR rules. Our proposed solution therefore adopts a privacy-preserving approach to using these datasets for effective ML model training. The flow chart of the proposed solution below follows each step as shown and, eventually, performs an iterative process.

Figure  8 shows our proposed methodology, following a number of steps from beginning to end. The dataset is split into training, validation and testing sets of 70%, 15% and 15%, respectively. A machine learning algorithm is applied to the training data. In our proposed topology, we use an FL framework for model training. In this architecture, the model is sent from the FL central server to local servers comprising local devices. The model is trained separately at each local device, and the trained models are eventually sent back to the FL server and aggregated together. This process is repeated to keep the model updated with the latest patterns. In this framework, only the trained model from the local devices is shared with the FL server, and the data remains secure locally on the devices. Once the model is trained, its performance can be evaluated on the validation and testing data, and the model trained on real-time transaction data can be used effectively for CCFD.

figure 8

Our proposed methodology. Transaction data is preprocessed before ML models are applied. Data splitting, processing, and training of the ML model within the FL framework provide data privacy and effective model performance analysis
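As an illustrative sketch of the splitting step (the function name is our own, and the 70/15/15 fractions are an assumption chosen so the parts sum to 100%), a three-way partition of the transaction indices can be written as:

```python
import numpy as np

def split_indices(n_transactions, train=0.70, val=0.15, seed=42):
    """Shuffle transaction indices and cut them into training,
    validation and test partitions (the test set takes the remainder)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_transactions)
    n_train = int(n_transactions * train)
    n_val = int(n_transactions * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Splitting indices rather than rows lets the same partition be reused across preprocessing, training and evaluation stages.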

Our proposed solution involves the use of federated learning, which provides a framework for banks and financial institutions to collaborate in training the ML model. In this collaboration, the model is trained locally by each participant, and the trained models are combined centrally without the data. The mean averaging of the trained models is repeated across participants, so the model keeps learning new patterns from a variety of data. Because only the trained model, not the data, is shared, the approach follows the data-privacy concept: the data is secured (not shared), yet the ML model is still trained on the datasets. Experiments show that deep learning algorithms have produced effective outcomes in CCFD. Our proposed solution outlines the use of an artificial neural network with FL, which enables model training on large-scale real-time datasets while privacy is ensured, and the trained model can promise optimal CCFD. Although work has been done on ANN for CCFD, it has been based on lab-based datasets. Our proposed solution is novel in that it uses a hybrid approach based on real-time data in a privacy-preserving manner: the ANN provides effective detection, and federated learning provides the framework for data privacy, which together constitute a novel contribution.
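To make the ANN component concrete, the following is a minimal, illustrative one-hidden-layer network trained with gradient descent; the class name, architecture, layer sizes and learning rate are our assumptions for the sketch, not a specification of a production FDS:

```python
import numpy as np

class TinyFraudANN:
    """Minimal one-hidden-layer binary classifier (fraud vs genuine)."""

    def __init__(self, n_features, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_features, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.5, (hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)        # hidden activations
        z = self.h @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-z))                # fraud probability

    def train_step(self, X, y, lr=0.5):
        p = self.forward(X)
        err = (p - y.reshape(-1, 1)) / len(y)          # dLoss/dlogit (cross-entropy)
        dh = (err @ self.W2.T) * (1.0 - self.h ** 2)   # backprop through tanh
        self.W2 -= lr * self.h.T @ err
        self.b2 -= lr * err.sum(axis=0)
        self.W1 -= lr * X.T @ dh
        self.b1 -= lr * dh.sum(axis=0)
```

In the proposed hybrid, each bank would run such training locally, and only the weight arrays (W1, b1, W2, b2) would be averaged on the FL server, never the transaction data.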

6 Conclusion

This review paper explores the various techniques used for CCFD. ML techniques are a great way to enhance the accuracy of CCFD; however, large datasets are needed to train the model and avoid the issue of data imbalance. Real-time datasets can provide a greater variety of data, but privacy remains an issue. With our proposed method, real-time datasets can be used to train the model in a privacy-preserving manner. A federated learning framework with an ANN can enhance the capability of the ML model to detect fraudulent transactions. The proposed hybrid approach can change the way CCFD is performed while utilising real-life datasets, opening a new horizon in the banking and finance industry. It can help financial institutions and banks utilise real-time datasets through mutual collaboration, giving a collective benefit in developing an effective system for CCFD. Although the proposed method is effective for CCFD using real-time datasets in a privacy-preserving way, it has limitations when it comes to real-life deployment. Banks and financial institutions have their own rules and regulations, and they are quite strict about them; adopting the proposed method will be challenging, as each institution has its own limitations and relies on internal resources rather than a centralised approach. Moreover, even though data is not shared centrally, the trained model learns patterns that could possibly be decoded by attackers. Therefore, with these limitations in mind, work remains to be done to gain the confidence of banks and financial institutions to adopt this technology.

Availability of Data and Material

Not applicable.

Lucas Y, Portier P-E, Laporte L, et al. Multiple perspectives HMM-based feature engineering for credit card fraud detection. In: ACM, 2019. p. 1359–1361.

Duman E, Elikucuk I. Solving credit card fraud detection problem by the new metaheuristics migrating birds optimization. Berlin: Springer; 2013.


Botchey FE, Qin Z, Hughes-Lartey K. Mobile money fraud prediction—a cross-case analysis on the efficiency of support vector machines, gradient boosted decision trees, and Naïve Bayes algorithms. Information. 2020;11:383. https://doi.org/10.3390/info11080383 .


Ogwueleka FN. Data mining application in credit card fraud detection system. J Eng Sci Technol. 2011;6:311–22.


Sriram Sasank JVV, Sahith GR, Abhinav K, Belwal M. Credit card fraud detection using various classification and sampling techniques: a comparative study. In: IEEE, 2019. p. 1713–1718.

Ojugo AA, Nwankwo O. Spectral-cluster solution for credit-card fraud detection using a genetic algorithm trained modular deep learning neural network. JINAV J Inf Vis. 2021;2:15–24. https://doi.org/10.35877/454RI.jinav274 .

Majhi SK, Bhatachharya S, Pradhan R, Biswal S. Fuzzy clustering using SALP swarm algorithm for automobile insurance fraud detection. J Intell Fuzzy Syst. 2019;36:2333–44. https://doi.org/10.3233/JIFS-169944 .

Darwish SM. An intelligent credit card fraud detection approach based on semantic fusion of two classifiers. Soft Comput. 2019;24:1243–53. https://doi.org/10.1007/s00500-019-03958-9 .

Sobanadevi V, Ravi G. Handling data imbalance using a heterogeneous bagging-based stacked ensemble (HBSE) for credit card fraud detection. Singapore: Springer; 2020.

Li C, Ding N, Dong H, Zhai Y. Application of credit card fraud detection based on CS-SVM. Int J Mach Learn Comput 2021;11(1).

Olowookere TA, Adewale OS. A framework for detecting credit card fraud with cost-sensitive meta-learning ensemble approach. Sci Afr. 2020;8:e00464. https://doi.org/10.1016/j.sciaf.2020.e00464 .

Itoo F, Meenakshi SS. Comparison and analysis of logistic regression, Naïve Bayes and KNN machine learning algorithms for credit card fraud detection. Int J Inf Technol. 2020;13:1503–11. https://doi.org/10.1007/s41870-020-00430-y .

Awoyemi JO, Adetunmbi AO, Oluwadare SA. Credit card fraud detection using machine learning techniques: a comparative analysis. IEEE, 2017. p. 1–9

Alam MN, Podder P, Bharati S, Mondal MRH. Effective machine learning approaches for credit card fraud detection. Cham: Springer; 2021.

Vynokurova O, Peleshko D, Bondarenko O, Ilyasov V, Serzhantov V, Peleshko M. Hybrid machine learning system for solving fraud detection tasks. In: 2020 IEEE third international conference on data stream mining & processing (DSMP), IEEE; 2020. p. 1–5.

Rai AK, Dwivedi RK. Fraud detection in credit card data using unsupervised machine learning based scheme. In: IEEE, 2020. p. 421–426.

Dubey SC, Mundhe KS, Kadam AA. Credit card fraud detection using artificial neural network and back propagation. In: 2020 4th international conference on intelligent computing and control systems (ICICCS). IEEE; 2020. p. 268–273.

Patidar R, Sharma L. Credit card fraud detection using neural network. Int J Soft Comput Eng (IJSCE), 2011;1(32–38).

Dhankhad S, Mohammed E, Far B. Supervised machine learning algorithms for credit card fraudulent transaction detection: a comparative study. In: IEEE, 2018. p. 122–125.

Puh M, Brkic L. Detecting credit card fraud using selected machine learning algorithms. In: Croatian Society MIPRO, 2019. p. 1250–1255.

Varmedja D, Karanovic M, Sladojevic S, et al. Credit card fraud detection—machine learning methods. In: IEEE, 2019. p. 1–5.

Zhu H, Liu G, Zhou M, Xie Y, Abusorrah A, Kang Q. Optimizing weighted extreme learning machines for imbalanced classification and application to credit card fraud detection. Neurocomputing. 2020;407:50–62. https://doi.org/10.1016/j.neucom.2020.04.078 .

Jemima Jebaseeli T, Venkatesan R, Ramalakshmi K. Fraud detection for credit card transactions using random forest algorithm. Singapore: Springer; 2020.

Dighe D, Patil S, Kokate S. Detection of credit card fraud transactions using machine learning algorithms and neural networks: a comparative study. In: IEEE, 2018. p. 1–6.

Mishra MK, Dash R. A comparative study of Chebyshev functional link artificial neural network, multi-layer perceptron and decision tree for credit card fraud detection. In: IEEE, 2014. p. 228–233.

Rtayli N, Enneya N. Selection features and support vector machine for credit card risk identification. Procedia Manuf. 2020;46:941–8.

Xuan S, Liu G, Li Z, Zheng L, Wang S, Jiang C. Random forest for credit card fraud detection. In: 2018 IEEE 15th international conference on networking, sensing and control (ICNSC). IEEE; 2018, p. 1–6.

Worobec K. The definitive overview of payment industry fraud. In: Ukfinance.org.uk. 2021. https://www.ukfinance.org.uk/system/files/Fraud%20The%20Facts%202021-%20FINAL.pdf .

Jakaite L, Schetinin V, Maple C. Bayesian assessment of newborn brain maturity from two-channel sleep electroencephalograms. Comput Math Methods Med. 2012;2012:629654–7. https://doi.org/10.1155/2012/629654 .


Jakaite L, Schetinin V, Maple C, Schult J. Bayesian decision trees for EEG assessment of newborn brain maturity. In: The 10th annual workshop on computational intelligence UKCI 2010. 2010. https://doi.org/10.1109/UKCI.2010.5625584

Jakaite L, Schetinin V, Schult J. Feature extraction from electroencephalograms for Bayesian assessment of newborn brain maturity. In: Proceedings of the 24th IEEE international symposium on computer-based medical systems. 2011. https://doi.org/10.1109/CBMS.2011.5999109

Jakaite L, Schetinin V, Schult J. Feature extraction from electroencephalograms for Bayesian assessment of newborn brain maturity. In: 24th International symposium on computer-based medical systems (CBMS), 2011. p. 1–6. https://doi.org/10.1109/CBMS.2011.5999109

Nyah N, Jakaite L, Schetinin V, Sant P, Aggoun A. Evolving polynomial neural networks for detecting abnormal patterns. In: 2016 IEEE 8th international conference on intelligent systems (I.S.), 2016. p. 74–80. https://doi.org/10.1109/IS.2016.7737403 .

Nyah N, Jakaite L, Schetinin V, Sant P, Aggoun A. Learning polynomial neural networks of a near-optimal connectivity for detecting abnormal patterns in biometric data. In: 2016 SAI computing conference (SAI), 2016. p. 409–413. https://doi.org/10.1109/SAI.2016.7556014 .

Schetinin V, Jakaite L. Classification of newborn EEG maturity with Bayesian averaging over decision trees. Expert Syst Appl. 2012;39(10):9340–7. https://doi.org/10.1016/j.eswa.2012.02.184 .

Schetinin V, Jakaite L. Extraction of features from sleep EEG for Bayesian assessment of brain development. PLoS ONE. 2017;12(3):1–13. https://doi.org/10.1371/journal.pone.0174027 .

Schetinin V, Jakaite L, Nyah N, Novakovic D, Krzanowski W. Feature extraction with GMDH-type neural networks for EEG-based person identification. Int J Neural Syst. 2018. https://doi.org/10.1142/S0129065717500642 .

Hassan MM, Billah MAM, Rahman MM, Zaman S, Shakil MMH, Angon JH. Early predictive analytics in healthcare for diabetes prediction using machine learning approach. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT). IEEE; 2021. p. 01–05.

Hassan MM, Peya ZJ, Mollick S, Billah MAM, Shakil MMH, Dulla AU. Diabetes prediction in healthcare at early stage using machine learning approach. In: 2021 12th international conference on computing communication and networking technologies (ICCCNT). IEEE; 2021. p. 01–05.

Kong M, Li L, Wu R, Tao X. An empirical study of learning based happiness prediction approaches. Hum Centric Intell Syst. 2021;1(1–2):18.

Hassan M, Akter L, Rahman M, Zaman S, Hasib K, Jahan N, Smrity R, Farhana J, Raihan M, Mollick S. Efficient prediction of water quality index (WQI) using machine learning algorithms. Hum Centric Intell Syst. 2021;1(3–4):86.

Schetinin V, Jakaite L, Krzanowski WJ. Prediction of survival probabilities with Bayesian decision trees. Expert Syst Appl. 2013;40(14):5466–76. https://doi.org/10.1016/j.eswa.2013.04.009 .

Schetinin V, Jakaite L, Krzanowski W. Bayesian learning of models for estimating uncertainty in alert systems: application to air traffic conflict avoidance. Integr Comput Aided Eng. 2018;26:1–17. https://doi.org/10.3233/ICA-180567 .

Jakaite L, Schetinin V, Hladuvka J, Minaev S, Ambia A, Krzanowski W. Deep learning for early detection of pathological changes in X-ray bone microstructures: case of osteoarthritis. Sci Rep. 2021. https://doi.org/10.1038/s41598-021-81786-4 .

Wen H, Huang F. Personal loan fraud detection based on hybrid supervised and unsupervised learning. In: 2020 5th IEEE international conference on big data analytics (ICBDA). IEEE; 2020. p. 339–343.

Li W, Lin S, Qian X, et al. An evidence theory-based validation method for models with multivariate outputs and uncertainty. SIMULATION. 2021;97:821–34. https://doi.org/10.1177/00375497211022814 .

Zięba M, Tomczak SK, Tomczak JM. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst Appl. 2016;58:93–101. https://doi.org/10.1016/j.eswa.2016.04.001 .


Rejwan BS, Schetinin V. Deep neural-network prediction for study of informational efficiency. In: Arai K, editor. Intelligent systems and applications. IntelliSys 2021. Lecture notes in networks and systems, vol. 295. Cham: Springer; 2022. https://doi.org/10.1007/978-3-030-82196-8_34 .


Visa credit cards in circulation 2020|Statista. In: Statista. 2021. https://www.statista.com/statistics/618115/number-of-visa-credit-cards-worldwide-by-region/ .

Mastercard: credit cards in circulation 2021|Statista. In: Statista. 2021. https://www.statista.com/statistics/618137/number-of-mastercard-credit-cards-worldwide-by-region/ . Accessed 24 Nov 2021.

Hegedűs I, Danner G, Jelasity M. Decentralized learning works: an empirical comparison of gossip learning and federated learning. J Parallel Distrib Comput. 2021;148:109–24. https://doi.org/10.1016/j.jpdc.2020.10.006 .

Ostapowicz M, Żbikowski K. Detecting fraudulent accounts on blockchain: a supervised approach. Cham: Springer; 2019.

Danner G, Berta Á, Hegedűs I, Jelasity M. Robust fully distributed minibatch gradient descent with privacy preservation. Secur Commun Netw. 2018;2018:1–15. https://doi.org/10.1155/2018/6728020 .

Yang W, Zhang Y, Ye K, et al. FFD: a federated learning based method for credit card fraud detection. Cham: Springer; 2019.

Ostapowicz M, Żbikowski K. Detecting fraudulent accounts on blockchain: a supervised approach. In: International conference on web information systems engineering. Springer, Cham; 2020. p. 18–31.

Carneiro N, Figueira G, Costa M. A data mining based system for credit-card fraud detection in e-tail. Dec Support Syst. 2017;95:91–101. https://doi.org/10.1016/j.dss.2017.01.002 .


Author information

Authors and Affiliations

University of Bedfordshire, Luton, UK

Rejwan Bin Sulaiman, Vitaly Schetinin & Paul Sant


Contributions

RBS significantly contributed to the conceptual parts of the paper's contribution to the knowledge. VS and PS assisted with report improvement and review, as well as providing guidance on manuscript drafting.

Corresponding author

Correspondence to Rejwan Bin Sulaiman .

Ethics declarations

Conflict of Interest

The authors declare that they have no competing interests.

Ethics Approval

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Bin Sulaiman, R., Schetinin, V. & Sant, P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum-Cent Intell Syst 2 , 55–68 (2022). https://doi.org/10.1007/s44230-022-00004-0


Received : 25 November 2021

Accepted : 28 March 2022

Published : 05 May 2022

Issue Date : June 2022

DOI : https://doi.org/10.1007/s44230-022-00004-0


Keywords

  • Artificial neural network (ANN)
  • Credit card fraud
  • Federated learning
  • Random forest (RF) method
  • Support vector machine (SVM)
  • Privacy-preserving


A systematic review of literature on credit card cyber fraud detection using machine and deep learning

Affiliations.

  • 1 School of Business, University of Southern Queensland, Toowoomba, QLD, Australia.
  • 2 School of Computing, SRM Institute of Science and Technology, Chennai, India.
  • 3 School of Management, Presidency University, Bangalore, India.
  • PMID: 37346569
  • PMCID: PMC10280638
  • DOI: 10.7717/peerj-cs.1278

The increasing spread of cyberattacks and crimes makes cyber security a top priority in the banking industry. Credit card cyber fraud is a major security risk worldwide. Conventional anomaly detection and rule-based techniques are two of the most commonly utilized approaches for detecting cyber fraud; however, they are time-consuming, resource-intensive, and inaccurate. Machine learning is one of the techniques gaining popularity and playing a significant role in this field. This study examines and synthesizes previous studies on credit card cyber fraud detection, focusing specifically on machine learning and deep learning approaches. In our review, we identified 181 research articles published from 2019 to 2021. For the benefit of researchers, a review of machine learning/deep learning techniques and their relevance to credit card cyber fraud detection is presented, providing direction for choosing the most suitable techniques. This review also discusses the major problems, gaps, and limitations in credit card cyber fraud detection and recommends research directions for the future. This comprehensive review enables researchers and the banking industry to conduct innovation projects for cyber fraud detection.

Keywords: Artificial intelligence; Bank industry; Credit card cyber fraud; Cyber security; Deep learning; Machine learning.

© 2023 Marazqah Btoush et al.


Open Access

Peer-reviewed

Research Article

Credit card fraud detection using a hierarchical behavior-knowledge space model

Affiliations:

  • Department of Electronic and Electrical Engineering, Brunel University London, Uxbridge, UB8 3PH, United Kingdom (also Visiting Professor, School of Electronic and Information Engineering, Tongji University, Shanghai, China)
  • Faculty of Engineering, Computing and Science, Swinburne University of Technology (Sarawak Campus), Malaysia
  • Econometrics and Business Statistics, School of Business, Monash University Malaysia, Selangor, Malaysia
  • Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, Victoria, Australia

  • Asoke K. Nandi, 
  • Kuldeep Kaur Randhawa, 
  • Hong Siang Chua, 
  • Manjeevan Seera, 
  • Chee Peng Lim


  • Published: January 20, 2022
  • https://doi.org/10.1371/journal.pone.0260579


With the advancement in machine learning, researchers continue to devise and implement effective intelligent methods for fraud detection in the financial sector. Indeed, credit card fraud leads to billions of dollars in losses for merchants every year. In this paper, a multi-classifier framework is designed to address the challenges of credit card fraud detection. An ensemble model with multiple machine learning classification algorithms is designed, in which the Behavior-Knowledge Space (BKS) is leveraged to combine the predictions from multiple classifiers. To ascertain the effectiveness of the developed ensemble model, publicly available data sets as well as real financial records are employed for performance evaluations. Through statistical tests, the results positively indicate the effectiveness of the developed model, as compared with the commonly used majority voting method for combining predictions from multiple classifiers, in tackling noisy data classification as well as credit card fraud detection problems.

Citation: Nandi AK, Randhawa KK, Chua HS, Seera M, Lim CP (2022) Credit card fraud detection using a hierarchical behavior-knowledge space model. PLoS ONE 17(1): e0260579. https://doi.org/10.1371/journal.pone.0260579

Editor: Alfredo Vellido, Universitat Politecnica de Catalunya, SPAIN

Received: May 8, 2021; Accepted: November 12, 2021; Published: January 20, 2022

Copyright: © 2022 Nandi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant benchmark data are within the manuscript, given in references [ 24 ], [ 25 ], and [ 26 ]. Relevant real data records are available from a public repository: https://doi.org/10.6084/m9.figshare.17030138 .

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1. Introduction

Classification has been a key application area of machine learning. A classifier learns a mathematical model from training data samples that maps input features to the target classes or labels [ 1 ]. Given a new unseen data sample, the trained classifier is used to provide a prediction of the target class [ 2 ]. It is, however, not easy to use only a single or a few input variables to differentiate multiple classes to their fullest [ 1 ]. In many classifiers such as neural networks, k -nearest neighbors ( k NN), Support Vector Machine (SVM), and Naïve Bayes (NB), the underlying assumption is that training data samples contain a valid representation of the population of interest, which normally requires a balanced sample distribution [ 3 ]. It has been empirically observed that building an accurate classifier based on a single paradigm is often ineffective, if not impossible [ 2 ].

Establishing an accurate classifier is not an easy task, as each classification method has its own advantages and disadvantages. As a result, classifier fusion using multiple classifiers has become one of the most significant methodologies for improving classification performance. All classifiers provide their predictions of the class of an incoming data sample, and these predictions are analyzed and combined using some fusion strategy [ 4 ]. In this regard, selecting appropriate classifiers for constructing an ensemble classification model remains a difficult task [ 2 ].

It is a well-established notion in the literature that a classifier combination offers a viable alternative that can yield better results than those from a single classifier. This, however, depends on how independent and diverse the classifiers are. Diversity among the chosen classifiers is an important factor for building a successful multi-classifier system (MCS). Various MCS methods have been proposed for modelling and handling different types of data [ 5 ]. Research in this area has led to the development of MCS models that combine the strengths of various individual classifiers, built using different training paradigms, to provide improved and robust classification performance [ 2 ].

With the rapid growth in e-commerce, the number of credit card transactions has been on the rise [ 6 ]. Alongside this growth, the issue of credit card fraud has become serious and complicated [ 7 ]. Generally, fraud detection solutions can be divided into supervised and unsupervised classification methods [ 8 ]. In supervised methods, the classification models are based on different samples of genuine and fraudulent transactions, while in unsupervised methods, outliers are detected from the data samples [ 9 ]. Merchants are responsible for paying the bill when a fraud occurs through an online or in-store transaction [ 10 ]. In this paper, we focus on the design and application of an ensemble classification model for credit card fraud detection, which is regarded as a significant problem in the financial sector. Indeed, billions of dollars are lost annually due to credit card fraud, and both merchants and consumers are significantly affected by the consequences of fraud [ 11 ]. With the advancement in fraud detection methods, fraudsters are finding new methods to avoid detection. Capturing irregular transaction patterns is a vital step in fraud detection [ 12 ], and efficient and effective classification methods are required for accurate detection of credit card frauds.

Two main methods are compared in this paper, namely majority voting and Behavior-Knowledge Space (BKS) [ 13 ]. Majority voting is a simple but effective method, in which an odd number of constituent classifiers is used to reach a decision in an ensemble. BKS, on the other hand, considers the predictive accuracy of each classifier and uses this extra information to aggregate predictions from the individual classifiers and derive better results. The main contribution of this paper is the formulation of an ensemble MCS model with BKS for the detection of real-world credit card fraud. The proposed model allows the MCS to accumulate knowledge and yield better results over time.
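The two combination rules can be sketched as follows (illustrative code, not the authors' implementation): majority voting simply picks the most common prediction, while BKS builds a lookup table keyed by the joint behaviour of the classifiers on training data, falling back to another rule for unseen combinations.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from an odd number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def bks_fit(train_pred_tuples, train_labels):
    """Behavior-Knowledge Space: for each observed combination of
    classifier outputs, remember the most frequent true label."""
    cells = {}
    for combo, label in zip(train_pred_tuples, train_labels):
        cells.setdefault(combo, Counter())[label] += 1
    return {combo: c.most_common(1)[0][0] for combo, c in cells.items()}

def bks_predict(table, combo, fallback):
    """Look up the combination; fall back (e.g. to majority voting)
    for combinations never seen during training."""
    return table.get(combo, fallback(combo))
```

This makes the difference concrete: majority voting ignores how classifiers perform jointly, whereas BKS exploits that joint behaviour learned from training data.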

The organization of this paper is as follows. A literature review on different types of MCS is presented in Section 2. Designs of the MCS model with BKS are explained in Section 3. A series of empirical evaluation on credit card fraud using publicly available data as well as real-world data from our collection is presented in Section 4. A summary of the findings is given in Section 5.

2. Literature review

An MCS model commonly includes a decision combination method for combining predictions from an ensemble of classifiers. A number of applications using MCS models have been developed over the years. In this section, we present a literature review on different classifier configurations, starting from two classifiers to four or more classifiers.

2.1 Two classifiers

An ensemble classification model using k NN and SVM was presented in [ 14 ] to classify electrocardiogram (ECG) signals. The proposed model achieved an accuracy score of 0.752, as compared with 0.561 to 0.737 from other classifiers [ 14 ]. In financial market trading, an automated framework was presented in [ 15 ], in which an MCS model comprising a weighted multi-category generalized eigenvalue SVM and Random Forest (RF) was used to generate buy or sell signals. Evaluated with five index returns, including those from NASDAQ and DOW JONES, the MCS model achieved notable improvements over the buy/hold strategy as compared with the outcomes from other algorithms [ 15 ].

Predictions of the severity of abnormal aviation events with risk levels were conducted in [ 16 ] using an MCS framework consisting of SVM and deep learning models. The SVM was used for discovering the relationships between event synopses and consequences, while deep learning was deployed in training. Using cross-validation, the proposed MCS model achieved 81% accuracy, which is 3% and 6% higher than the standalone SVM and deep learning models, respectively [ 16 ].

In [ 17 ], an MCS model based on dynamic weights was developed. The MCS model comprised a backpropagation neural network and the nearest neighbour algorithm, and dynamically assigned a fusion weight to each classifier. Using several public face databases, the proposed method obtained better classification accuracy rates as compared with those from the individual classifiers [ 17 ]. An MCS model was proposed for face image segmentation in [ 18 ]. A total of three Bayes classifiers and one SVM were used in the MCS model. An error rate of 13.9% was achieved, as compared with 50% from standard classifiers, for the hair-across-eyes requirement [ 18 ].

2.2 Three classifiers

In [ 2 ], an MCS was designed using stacked generalization based on DT (Decision Tree), k NN, and NB. A total of 20 different UCI data sets were used in the experiments. Based on a breast cancer data set, an accuracy rate of 74.8% was achieved by the MCS model, as compared with 71.2% from other classifiers [ 2 ]. An adaptive MCS model for gene expression was examined in [ 4 ]. Particle swarm optimization, bat-inspired algorithm, and SVM were used in the ensemble model, which showed significant improvements in classification performance with respect to breast cancer and embryonal tumors, where the training error reduced by up to 50% [ 4 ].

In [ 19 ], an MCS model was devised to maximize the diagnostic accuracy of thyroid detection. The model utilized SVM, NB, k NN, and closest matching rule classifiers to yield the best diagnostic accuracy. The proposed system achieved an accuracy of 99.5%, as compared with 99.1% from the best individual classifier, in automatically discriminating thyroid histopathology images as either normal thyroid or papillary thyroid carcinoma [ 19 ]. An MCS framework to exploit unlabelled data was detailed in [ 20 ]. The MCS model was built using NB, SVM, and k NN. A total of five text classification data sets were used in the experiments. The highest accuracy rate of 83.3% was achieved by the MCS model, as compared with those from other algorithms [ 20 ].

2.3 Four or more classifiers

An adaptive MCS model for oil-bearing reservoir recognition was presented in [ 5 ]. A total of five classifiers were used, namely C4.5, SVM, radial basis function, data gravitation-based, and k NN algorithms. A number of rules were included in the adaptive MCS model as well. The proposed solution achieved perfect accuracy in recognizing the properties of different layers in the oil logging data [ 5 ]. An advanced warning system was designed in [ 21 ] using an MCS approach for outward foreign direct investment. Logistic regression, SVM, NN, and decision trees were used in the MCS model, which was applied to resource-based enterprises in China. The experimental results indicated the MCS model was able to yield an accuracy score of 85.1%, as compared with 82.5% from a standard neural network model [ 21 ].

In [ 22 ], estimations of precipitation from satellite images were carried out with an MCS model, which combined RF, NN, SVM, NB, weighted k NN, and k -means. A total of six classes of precipitation intensities were obtained, from no rain to very high precipitation. A coefficient of correlation of 0.93 was yielded by the proposed method, as compared with only 0.46 from other methods [ 22 ]. In [ 23 ], a one-against-one method was explored using an MCS that consisted of NN, DT, k NN, SVM, linear discriminant analysis, and logistic regression. An error rate of 0.99% was produced by the MCS model, as compared with 14.9% from other methods on the zoo data set [ 23 ]. In [ 24 ], sentiments of tweets were automatically classified as either positive or negative using an ensemble. Public tweet sentiment data sets were used in the experiments. The ensemble was formed using multinomial NB, SVM, RF, and logistic regression. An accuracy rate of 81.06% was achieved on a data set trained with only 0.03% of the obtained data [ 24 ].

2.4 Remarks

Based on the above review, which covers various classifier configurations (from two to four or more classifiers), it is clear that MCS has been used in various applications, including the finance, medical, engineering, and other sectors. The MCS configuration offers the advantage that the output is not constrained by one classifier; a pool of classifiers provides the possibility of improved results. In the event that one classifier produces an incorrect prediction while its counterparts yield a correct one, the combined output can still be correct, e.g. in accordance with the majority voting principle. The combined output is, therefore, able to reduce the number of incorrect predictions from a single classification method. The results from various MCS configurations reported in the literature are promising, with typically higher accuracy rates. However, MCS-based methods tend to run slower, since a higher computational load is required for executing multiple classifiers, although this is not regularly reported in the literature. While better results often outweigh longer computational durations, it is useful to ensure that MCS configurations are feasible in terms of computational requirements for practical applications in real-world environments.

3. Classification methods

In this study, several standard machine learning models from H2O.ai were employed to establish an MCS model. The Python software running on the Google Colab environment was used. In the following sub-sections, majority voting and the BKS model by Huang and Suen [ 25 ] for decision combination are explained.

3.1 Majority voting

Given M target classes, each represented by C i , ∀ i ∈ Λ = { 1 , 2 ,…, M }, the task of a classifier is to categorize an input sample, x , into one of the ( M + 1 ) classes, with the ( M + 1 )th class denoting that the classifier rejects x .
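The majority voting principle described here, i.e., assign the class that wins more than half of the K votes and otherwise reject the sample, can be sketched in Python. This is a minimal illustration, not the paper's implementation; the reject outcome (the ( M + 1 )th class) is represented by `None`:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine K classifier predictions by majority voting.

    predictions: predicted class labels, one per classifier.
    Returns the winning class if it receives more than half of the
    votes, otherwise None (the (M + 1)th 'reject' outcome).
    """
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return None  # no strict majority: reject the sample

print(majority_vote([1, 1, 2]))  # class 1 holds a strict majority
print(majority_vote([1, 2, 3]))  # three-way tie: rejected (None)
```

Using an odd number of constituent classifiers, as noted earlier, avoids ties in two-class problems.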

Let e k ( x ) denote the decision of the k th classifier, k = 1 ,…, K . In majority voting, the combined decision is E ( x ) = C j if the number of classifiers with e k ( x ) = C j is greater than K / 2 ; otherwise, E ( x ) = M + 1 , i.e., the input sample is rejected.

3.2 Behavior-Knowledge Space (BKS)

A BKS is a K -dimensional space, where every dimension indicates the decision (i.e., predicted class) from one classifier. The intersection of the decisions from K different classifiers occupies one unit in the BKS, e.g., BKS ( e 1 ( x ) = j 1 ,…, e K ( x ) = j K ) denotes a unit where each e k produces a prediction j k , k = 1 ,…, K . In each BKS unit, there are M partitions (cells), which accumulate the number of data samples actually belonging to C i .

Consider an example with two classifiers. A two-dimensional (2–D) BKS can be formed, as given in Table 1 .


The final BKS decision is derived from the activated unit. Let u = BKS ( e 1 ( x ) ,…, e K ( x ) ) be the unit activated by the K predictions, and let n i ( u ) denote the number of samples in u that belong to C i . The combined decision is E ( x ) = C j , where n j ( u ) = max i ∈ Λ n i ( u ) , provided that Σ i ∈ Λ n i ( u ) > 0 ; otherwise E ( x ) = M + 1 (reject). (5)

The BKS has similarities with the confusion matrix. With the Bayesian approach, multiplication of evidence from the confusion matrices is required to estimate the joint probability of K events when combining the predictions. This step is eliminated in the BKS method, where a final decision is reached by assigning the input sample directly to the class that has gathered the greatest number of samples. This simplicity makes BKS a fast and efficient method for combining decisions, as shown in [ 25 ] for the classification of unconstrained handwritten numerals.

A hierarchical agent-based framework with the BKS for decision combination is proposed. As shown in Fig 1 , the framework has N agent groups in the base layer, with each group comprising multiple individual agents. The agents can be machine learning models, statistical methods, or other classification algorithms. A manager agent is assigned to combine the predictions from each agent group using a BKS. Each manager agent sends its prediction to a decision combination module comprising another BKS in the top layer, which produces the final combined prediction.


A numerical example is presented to better illustrate the BKS mechanism. In Table 2 , a simple binary classification problem is shown. There are two agents (classifiers) and six input samples, along with their predicted and actual classes. A BKS can be constructed, as shown in Table 3 . As an example, for input samples 1 and 4 ( Table 2 ), both agents 1 and 2 predict class 1, and the actual class is 1. This information is recorded in the highlighted (grey) BKS unit in Table 3 . Given a new test sample, the predictions from all agents are used to activate a BKS unit, and the combined predicted class (final output) is reached based on the highest number of samples from the majority class, as given in Eq ( 5 ). Whenever the highlighted (grey) BKS unit is activated during the test phase, the combined (final) prediction is Class 1.
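The accumulation-and-lookup mechanism of this worked example can be sketched as follows. The six training triples below are illustrative only (not the values from Table 2); each triple is (agent-1 prediction, agent-2 prediction, actual class):

```python
from collections import defaultdict

# Illustrative training records: (agent-1 prediction, agent-2 prediction, actual).
training = [
    (1, 1, 1), (1, 2, 2), (2, 2, 2),
    (1, 1, 1), (2, 1, 1), (2, 2, 2),
]

# Every (p1, p2) pair identifies one BKS unit; each unit accumulates
# per-class sample counts (the M cells of that unit).
bks = defaultdict(lambda: defaultdict(int))
for p1, p2, actual in training:
    bks[(p1, p2)][actual] += 1

def bks_predict(p1, p2):
    """Return the class with the most accumulated samples in the
    activated BKS unit, or None when the unit is empty (reject)."""
    unit = bks.get((p1, p2))
    if not unit:
        return None
    return max(unit, key=unit.get)

print(bks_predict(1, 1))  # unit (1, 1) accumulated two class-1 samples
```

During the test phase, the two agent predictions activate one unit, and the majority class recorded in that unit becomes the final output, mirroring the description above.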


4. Experiments

In this empirical evaluation, publicly available data sets from UCI Machine Learning Repository [ 28 ], KEEL Repository [ 29 ], and Kaggle [ 30 ] are used. A real-world data set is also used for evaluation.

Fig 2 shows the configuration of the hierarchical agent-based framework used in the experiments. It consists of three groups, where each group contains three agents. The three agents are Random Forest (RF), Generalized Linear Model (GLM), and Gradient Boosting Machine (GBM), which were selected based on extensive experiments on individual and group performances. Three agent managers are established, each with a BKS module. The predictions from these three agent managers are sent to the decision combination module, which uses another BKS to produce the final predicted class.


Training is first conducted using randomized orders of the data samples, followed by a validation process. This creates three group-based BKS modules (one for each group). The next step combines the outputs from BKS modules 1 to 3 using training data with another randomized sequence, establishing an overall (final) BKS module that fuses the outputs of the three group-based BKS modules. Given a test sample, the group-based BKS outputs are combined with the overall BKS module to produce a final predicted class for computation of the performance metrics, namely classification accuracy and F1-score.

Classification accuracy and F1-score of each experiment are recorded using Eqs ( 7 ) and ( 8 ), respectively.

Accuracy = ( TP + TN ) / ( TP + TN + FP + FN ) , (7)

F1 = 2 × Precision × Recall / ( Precision + Recall ) , (8)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, with Precision = TP / ( TP + FP ) and Recall = TP / ( TP + FN ).
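Both metrics can be computed directly from the confusion-matrix counts; the counts in the example below are illustrative only:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 90 TP, 880 TN, 10 FP, 20 FN.
print(round(accuracy(90, 880, 10, 20), 3))  # 0.97
print(round(f1_score(90, 10, 20), 3))       # 0.857
```

Note that for highly imbalanced fraud data, the F1-score is the more informative of the two, since accuracy can be dominated by the genuine (majority) class.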

4.2 Benchmark data

A total of 10 data sets are used in the experiments. The details of each data set, i.e., B1 to B10, are shown in Table 4 , including the number of instances and features as well as the imbalance ratio (IR) information.


The accuracy rates and F1 scores are shown in Tables 5 and 6 , respectively. In general, the BKS results are slightly higher than those from majority voting for both performance indicators.


To evaluate the robustness of BKS, the data samples are corrupted with noise at the 10% and 20% levels. A total of 25 runs are conducted for each data set, and the average results are listed in Table 7 . Fig 3 shows the number of wins of BKS against majority voting. The three bars for each data set represent the data with no noise (-0), with 10% noise (-0.1), and with 20% noise (-0.2).


To evaluate whether BKS performs better than majority voting from a statistical perspective, a two-tailed sign test is used, as detailed in Section 4.1. Fig 3 shows the number of wins of BKS over majority voting from the experimental results (plotted at 16 wins and above). BKS achieves at least 18 wins out of 25 experimental runs in all ten noisy data sets (10% and 20% noise levels), indicating its superior performance over majority voting on noisy data samples at α = 0.05 (95% confidence level). When a more stringent statistical significance level of α = 0.01 (i.e., 99% confidence level) is used for evaluation, BKS outperforms majority voting in 9 out of 10 data sets with a noise level of 20%. This outcome positively indicates the usefulness of BKS over majority voting in mitigating the negative effect of noise on performance.

To compare the effectiveness of BKS with other methods in the literature, a comparison of the F1 scores with the published results of GEP [ 26 ] and CUSBoost [ 33 ] is shown in Table 8 . CUSBoost [ 33 ] achieves the worst performance, while GEP [ 26 ] achieves results close to those of BKS and majority voting. Overall, BKS achieves the highest F1 scores in four out of six data sets, while the scores of the remaining two are slightly lower, by 0.01, than those of majority voting.


4.3 Real-world data

This evaluation focuses on real financial transaction records (available in [ 34 ]) from September to November 2017 in a Southeast Asia financial firm. As indicated in [ 35 ], Southeast Asia is one of the fastest growing regions over the years, with a gross domestic product growth rate of over 6%. In this experiment, a total of 60,595 transaction records from 9,685 customers are available for evaluation. The transactions cover activities in 23 countries, with various spending items ranging from online website purchases to grocery shopping. A total of 28 transactions have been identified by the firm and labeled as fraud cases, with the remaining being genuine, or non-fraud cases.

Each transaction record consists of the account number, transaction amount, date, time, device type used, merchant category code (MCC), country, and type of transaction. The account number is anonymized to ensure privacy of customers. In addition to the nine original features, feature aggregation is conducted to generate eight new features. These aggregated features utilise the transaction amount, acquiring country, MCC, and device type over a period of three months. A summary of the features is shown in Table 9 .
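The aggregation step can be illustrated with pandas; the column names and values below are hypothetical, chosen only to mirror the kind of per-account features described (average spend, count of unique acquiring countries, unique MCCs):

```python
import pandas as pd

# Hypothetical transaction records (not actual data from the study).
tx = pd.DataFrame({
    "account": ["A1", "A1", "A1", "A2", "A2"],
    "amount": [120.0, 35.5, 60.0, 900.0, 15.0],
    "country": ["MY", "MY", "SG", "TH", "TH"],
    "mcc": [5411, 5411, 5812, 7995, 5411],
})

# Aggregate per account over the observation window: average spend,
# count of unique acquiring countries, unique MCCs, and transaction count.
agg = tx.groupby("account").agg(
    avg_amount=("amount", "mean"),
    n_countries=("country", "nunique"),
    n_mcc=("mcc", "nunique"),
    n_tx=("amount", "size"),
)
print(agg)
```

In the study itself, such aggregates are computed over a three-month window and appended to the original transaction features.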


Feature importance scores can provide useful information about the data set, as they highlight the relevance of each feature for classification. Based on the 17 features, we carry out a feature importance study using the Decision Tree (DT), Random Forest (RF), and XGBoost classifiers. Fig 4 illustrates the results. It can be observed that the features exhibit different levels of importance, and feature 12 (i.e., the count of unique acquiring countries) appears to be the most important feature for all three classifiers. The remaining aggregated features (features 10 to 17) generally have slightly higher importance scores than the original features.


Similar to the benchmark data experiment, noise is added to this real-world data set in increments of 10%, up to 40%. Table 10 summarizes the results. BKS outperforms majority voting as the level of noise increases, indicating its robustness against noisy data. When the noise level reaches 20% and above, BKS outperforms majority voting 18 times (at 20% and 30% noise) and 19 times (at 40% noise), respectively. This outcome positively signifies the statistically superior performance of BKS over majority voting at the 95% confidence level ( α = 0.05) for noisy data (20% noise and above) in this real-world experiment.


Table 11 lists the F1 scores of the experiments. When no noise is added, the F1 scores for both BKS and majority voting are the same. For noisy data sets, however, BKS consistently achieves higher F1 scores than majority voting.


In addition to the experiments with additive noise, two experiments with under-sampling methods are conducted. Two different ratios of minority (fraud) to majority (genuine) transactions are evaluated, i.e., 1:100 and 1:500, and the overall results are shown in Table 12 . Under-sampling does not improve the majority voting results, while the 1:100 ratio enhances the BKS results slightly, as the data set is much more balanced than at the original ratio.
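A simple random under-sampling scheme at the 1:100 ratio can be sketched as follows; this is illustrative only, with the sample counts chosen to mirror the data set sizes reported above (60,595 transactions, 28 fraud):

```python
import random

def undersample(genuine, fraud, ratio):
    """Keep all minority (fraud) samples and randomly retain up to
    `ratio` genuine samples per fraud sample, e.g. ratio=100 for 1:100."""
    k = min(len(genuine), ratio * len(fraud))
    return random.sample(genuine, k) + fraud

random.seed(42)
genuine = list(range(60567))       # indices of genuine transactions
fraud = list(range(60567, 60595))  # indices of the 28 fraud transactions
train = undersample(genuine, fraud, ratio=100)
print(len(train))  # 2800 genuine + 28 fraud = 2828
```

The 1:500 setting is obtained the same way with `ratio=500`; at very small ratios, discarding too many genuine samples can hurt the voting results, as observed in Table 12.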


5. Conclusions

A multi-classifier system has been designed to address the classification challenge pertaining to credit card fraud. Specifically, the combination of a hierarchical agent-based framework with the BKS as a decision-making method has been constructed for classifying transaction records of credit cards into fraudulent and non-fraudulent cases. This combination allows the accumulation of knowledge and yields better results over time. To evaluate the proposed multi-classifier system, a series of experiments using publicly available data sets and real financial records have been conducted. The results from the ten benchmark data sets indicate the performance of BKS is better than that of the majority voting method for decision combination. In addition to noise-free data, noise up to 20% has been added to the data samples, in order to evaluate the robustness of the proposed method in noisy environments. Based on the statistical sign test, the BKS-based framework offers statistically superior performance over the majority voting method.

For the real transaction records from a financial firm, up to 40% noise has been added to the data samples. When the noise levels reach 20% and above, the BKS-based framework outperforms the majority voting method, with statistical significance at the 95% confidence level, as ascertained by the sign test. Based on the outcomes from both benchmark and real-world data, the proposed BKS-based framework is effective for detecting fraudulent credit card cases.

In future work, we will address several limitations of the current BKS models. Firstly, it is possible for the BKS table to contain empty cells, leading to no prediction for a given data sample. This generally occurs when the number of classifiers increases, i.e., a larger knowledge space is formed. In addition, noisy data sets, particularly those with noise in the class labels, result in inaccurate information being captured in the BKS cells, leading to erroneous predictions. We intend to exploit probabilistic methods, such as Bayesian inference, to interpret the BKS prediction and enhance its robustness in noisy data classification problems.

Additionally, we will investigate imbalanced data issues using a combination of over-sampling and under-sampling techniques. The effect of these different techniques toward classification performance will be analyzed and compared systematically using statistical hypothesis tests. We will also develop an online version of the proposed model. The model will be able to learn data samples on-the-fly and keep improving its prediction accuracy incrementally. This online learning model will be applied to various financial problems as well as other classification tasks.

  • 1. Weiss, S. M., & Kulikowski, C. A. (1991). Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann Publishers Inc.
  • 28. UCI Machine Learning Repository. [Online] Available: https://archive.ics.uci.edu/ml/datasets , 2020.
  • 29. KEEL Data Set Repository. [Online] Available: https://sci2s.ugr.es/keel/datasets.php , 2020.
  • 30. Credit Card Fraud Detection. [Online] Available: https://www.kaggle.com/mlg-ulb/creditcardfraud , 2020.
  • 31. Sheskin, D. J. (2020). Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press.
  • 34. Transaction Records. [Online] Available: https://doi.org/10.6084/m9.figshare.17119091 , 2021.
  • 35. Jiang, C., & Yu, W. (2018). Risk Control Theory of Online Transactions. Science Press, Beijing, China.


An intelligent payment card fraud detection system

Manjeevan Seera

1 Econometrics and Business Statistics, School of Business, Monash University Malaysia, Selangor, Malaysia

Chee Peng Lim

2 Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC Australia

3 AIM Research Center on Artificial Intelligence in Value Creation, EMLYON Business School, Écully, France

Lalitha Dhamotharan

4 University of Exeter Business School, University of Exeter, Exeter, UK

Kim Hua Tan

5 Nottingham University Business School, Nottingham, UK

Payment cards offer a simple and convenient method for making purchases. Owing to the increase in the usage of payment cards, especially in online purchases, fraud cases are on the rise. The rise creates financial risk and uncertainty, as in the commercial sector it incurs billions in losses each year. However, real transaction records that can facilitate the development of effective predictive models for fraud detection are difficult to obtain, mainly because of issues related to the confidentiality of customer information. In this paper, we apply a total of 13 statistical and machine learning models for payment card fraud detection using both publicly available and real transaction records. The results from both original features and aggregated features are analyzed and compared. A statistical hypothesis test is conducted to evaluate whether the aggregated features identified by a genetic algorithm can offer better discriminative power, as compared with the original features, in fraud detection. The outcomes positively ascertain the effectiveness of using aggregated features for undertaking real-world payment card fraud detection problems.

Introduction

Many types of payment cards, which include credit, charge, debit, and prepaid cards, are widely available nowadays, and they constitute one of the most popular payment methods in some countries (Pavía et al., 2012 ). Indeed, advances in digital technologies have transformed the way we handle money. Payment methods have changed from being a physical activity to a digital transaction over electronic means (Pavía et al., 2012 ). This has revolutionized the landscape of monetary policy, including the business strategies and operations of both large and small companies.

As reported in Forbes ( 2011 ) by the American Bankers, it is estimated that 10,000 payment card transactions occur every second globally. Owing to such a high transaction rate, payment cards have become a target for fraud. Fraud has been a key concern in most commercial and business areas (Bernard et al., 2019 ). Indeed, since Diners Club issued the first credit card in 1950, credit card companies have been constantly fighting against fraud (Forbes, 2011 ). Each year, payment card fraud leads to losses in billions of dollars. These losses create risk and uncertainty for financial institutions (Sariannidis et al., 2020 ). Fraud cases occur under different conditions, e.g., transactions at the Point of Sale (POS), online or over-the-telephone transactions, i.e., Card Not Present (CNP) cases, or transactions with lost or stolen cards. The loss from fraudulent incidents in payment cards amounted to $21.84 billion in 2015, with issuers bearing the cost of $15.72 billion (The Nilson Report, 2016 ). According to the European Central Bank, in 2012 the majority (60%) of fraud cases stemmed from CNP transactions, while another 23% occurred at POS terminals.

The potential of substantial monetary gains, combined with the ever-changing nature of financial services, creates a wide range of opportunities for fraud cases to occur (Ferreira & Meidutė-Kavaliauskienė, 2019 ). Funds from payment card fraud are often used in criminal activities, e.g., to support terrorism activities (Everett, 2003 ). Over the years, fraudulent mechanisms have evolved along with the models used by the banks to avoid fraud detection (Bhattacharyya et al., 2011 ). Therefore, it is imperative to develop effective and efficient methods to detect payment card fraud. The developed methods also need to be revised continually in accordance with the advances in technologies. There are many challenges in developing effective fraud detection methods. Among them, researchers face the difficulty in obtaining real data samples from payment card transactions as financial institutions are reluctant to share their data owing to confidentiality issues (Dal Pozzolo et al., 2014 ). As a result, only limited research studies with real data are available in this area. Some machine learning and related studies on fraud detection have been conducted using publicly available data sets (Sahin et al., 2013 ), e.g. SVMs (support vector machines), ANNs (artificial neural networks), decision trees, as well as regression and rule-based methods.

In this paper, a total of thirteen widely used statistical and machine learning methods for fraud detection in payment cards are implemented. The methods include SVMs and ANNs, as well as more recent deep learning methodologies. Two data sources are used for evaluation: publicly available repositories and a real payment card database. In this study, the availability of real payment card data for evaluation is of particular importance, as it ensures that the methods developed are usable and useful in real-world situations. In other words, the developed methods should be deployable in the financial sector for detecting payment card fraud, leading to a reduction in financial losses and the mitigation of risk and uncertainty in the business world.

The main contributions of the paper are two-fold. Firstly, a feature aggregation method is devised for processing payment transaction records. The key benefit of aggregating various features from transaction data is improved robustness to counter the effects of concept drift (Whitrow et al., 2009 ). Besides that, feature selection using optimization methods is applied to the transaction data. Secondly, a real payment card database is used, in addition to benchmark data, for analyzing the performance of a variety of statistical and intelligent data-based algorithms for fraud detection. As it is difficult to obtain real financial records for analysis (due to confidentiality issues), the outcome of our study is important for deriving valuable insights into the robustness of various data-based methods utilizing aggregated features for fraud detection in payment card transactions in real-world environments.

For the remaining part of this paper, we firstly present a literature review on finance applications of statistical and machine learning algorithms. Secondly, the background of various classification methods devised in this paper is given. We then explain the detailed experimental study, which covers the results, analysis, and comparison. Implications of the developed methods for practical application and a summary of the findings are presented at the end of this paper.

Literature review

Fraud detection systems are used in identifying unusual behaviors in electronic payment transactions. In this paper, we conduct a review on a variety of fraud detection systems, which is divided into two broad categories, namely, systems that use the original features and those that aggregate the original features. A summary of the review is presented at the end of this section.

Original features

A data set from a Brazilian online payment service was used in de Sá et al. ( 2018 ) for the detection of credit card fraud. A customized Bayesian Network Classifier was proposed. The underlying method relied on an evolutionary algorithm coupled with a hyper-heuristic mechanism to search for the best component combinations in the data set. The proposed algorithm improved the efficiency by 72.64% (de Sá et al., 2018 ). A data set from Worldline Belgium was used in Van Vlasselaer et al. ( 2015 ) for credit card fraud identification with respect to the online stores sector. A method known as APATE (Anomaly Prevention using Advanced Transaction Exploration), which combined customer spending history and a time-based suspiciousness measure for each transaction, was formulated; it yielded good results in terms of the area under the ROC (Receiver Operating Characteristic) curve, i.e., the AUC (Van Vlasselaer et al., 2015 ).

In Russac et al. ( 2018 ), the same data set as in Van Vlasselaer et al. ( 2015 ) was used to extract sequential information from the transactions. As the computational load was heavy, only three categorical features were used. Sequential information of the three categorical features was extracted using a Word2Vec neural network. This method reduced memory usage by half while improving the performance by 3% (Russac et al., 2018 ). A real data set from Banco Bilbao Vizcaya Argentaria (BBVA) was used in Gómez et al. ( 2018 ) for fraud detection. The MLP (multi-layer perceptron) ANN was used in classifying the data samples. Various backpropagation methods were used together with the MLP network. The results were comparable with those from other costly solutions (Gómez et al., 2018 ). A database containing transaction records of credit cards was used for fraud detection in Jurgovsky et al. ( 2018 ). The LSTM (long short-term memory) model was applied to examine the sequence of transactions. The RF (Random Forest) model was compared with the LSTM in the study. Based on the observations, RF and LSTM detected different fraud cases. A further analysis suggested that a combination of RF and LSTM could result in a better fraud detection system (Jurgovsky et al., 2018 ).

A data set from a card processing company, CardCom, was used in Robinson and Aria ( 2018 ) to ascertain fraud cases in prepaid cards. The Hidden Markov Model (HMM) was exploited. The proposed technique acquired a good F-score and could detect fraudulent cases in real time across thousands of prepaid card transactions (Robinson & Aria, 2018 ). A total of three real-world data sets were used in Rtayli and Enneya ( 2020 ). A hybrid SVM model consisting of recursive feature elimination, grid search and an oversampling technique was proposed. The proposed method gave the best results in terms of efficiency and effectiveness (Rtayli & Enneya, 2020 ). The weighted extreme learning machine (WELM) was evaluated with benchmark credit card data in Zhu et al. ( 2020 ). WELM with the dandelion algorithm yielded a high detection performance (Zhu et al., 2020 ). An ensemble model using sequential modeling of deep learning and a voting mechanism of ANN was proposed in Forough and Momtazi ( 2021 ). Based on real-world credit card data, the time analysis results indicated the high real-time efficiency of the proposed model as compared with other models (Forough & Momtazi, 2021 ).

Aggregated features

In this section, we review feature aggregation for fraud detection. Among various feature aggregation methods, feature averaging summarizes the cardholder activities by comparing the spending habits and patterns (Russac et al., 2018 ). In Bahnsen et al. ( 2016 ), a credit card-related database for fraud detection was examined. By analyzing the periodic behaviors over time, an aggregated feature set was generated. The use of aggregated features yielded an average of 13% saving with different classification models, including RF (random forest), LOR (logistic regression), and DT (decision tree) (Bahnsen et al., 2016 ).

Two data sets pertaining to European card holders were used in Dal Pozzolo et al. ( 2017 ) for detecting credit card fraud. An alert-feedback interaction was applied to train and update the classifier. During augmentation of the features, a set of aggregated features linked with individual transactions was generated to better separate fraud cases from real transactions. The outcome indicated that a lower degree of influence of feedback led to less precise alerts (Dal Pozzolo et al., 2017 ). In Fu et al. ( 2016 ), a credit card database collected from a commercial bank was used for fraud detection analysis. To capture the underlying patterns associated with fraudulent behaviours, the CNN (convolutional neural network) was employed. Feature engineering techniques similar to transaction aggregation were adopted to generate a total of eight features, from average transaction amount to trading entropy. Given a feature matrix, the CNN was able to identify the patterns of fraud and produced better performance when compared with other methods (Fu et al., 2016 ).

In Jiang et al. ( 2018 ), a simulator was used to generate credit card transaction data for the purpose of fraud detection. All generated data samples were divided into multiple groups, with each group containing similar transactional behaviors. A window-sliding method was applied to aggregate the transactions and produce 7 (4 amount-related and 3 time-related) features. Coupled with a feedback mechanism, the RF achieved 80% accuracy in detecting fraudulent transactions (Jiang et al., 2018 ). Another fraud detection study pertaining to credit cards with simulated data was reported in Lim et al. ( 2014 ). A conditional weighted transaction aggregation method was applied to record the transactions. Each transaction was given a weight, which was set using a distance measure between the previous and current transactions. Algorithms such as RF, k -NN, and LOR were used for classification. Compared with the transaction-based technique, the aggregation method was able to produce better outcomes (Lim et al., 2014 ).

In Lucas et al. ( 2020 ), modelling of the sequence of credit card transactions was conducted. Three different scenarios were considered, namely sequences of fraudulent and non-fraudulent records, sequences obtained by fixing cardholders, and sequences of amounts spent between the current and past transaction records. Each sequence was associated with a likelihood, as modelled using the HMM. The resulting information was adopted as additional features for analysis with the RF model. The results indicated that the feature engineering performed well for credit card fraud detection (Lucas et al., 2020 ). A real-world credit card data set from a commercial bank in China was used by Zhang et al. ( 2020 ). A method that consisted of feature engineering and deep learning was devised. For each occurrence of a transaction, a number of feature variables were computed based on the incoming and past transaction records for aggregation. In addition, homogeneous historical transactions were considered. The proposed method could efficiently identify fraudulent transactions (Zhang et al., 2020 ).

Table 1 depicts a summary of the relevant publications discussed earlier. According to the review, two broad categories of features are typically used, i.e., original/standard and aggregated features. The use of aggregated features increases the ability of the classifiers to detect fraudulent transactions, as depicted by the reported results. On the other hand, there is a lack of comprehensive statistical analyses for the results presented in the literature. In general, the accuracy metric is used for performance assessment, which may not aptly reflect the true potential of a classifier, especially in dealing with highly imbalanced data sets that commonly exist in fraud detection studies. Some studies use the AUC metric, which is a more comprehensive performance indicator. One critical issue is the false alarm rate, i.e., genuine transactions flagged by the detection systems as fraudulent cases. Financial institutions spend substantial time and money investigating these legitimate cases. Importantly, flagging a genuine transaction causes customer dissatisfaction and inconvenience. As such, an efficient and effective fraud detection system should minimise the false alarm rate, which is the main focus of our study in this paper. Our main contribution is the deployment of a real-world payment data set to comprehensively assess a total of thirteen classifiers, along with the use of statistical measures, including the AUC, for performance assessment as well as for comparison with other results reported in the literature.

Summary of review

Classification methods

In this section, we present an overview of the thirteen classification methods devised for evaluation. These methods are split into six main sub-groups, with the details as follows.

Bayesian methods

The Bayesian theorem is used to develop the Naïve Bayes (NB) method. To formulate a classification method, an independence assumption is adopted in NB: the features are assumed to be conditionally independent of one another given the target class, which is a strong assumption. Given an input vector X, with Y as the associated target class, the Bayesian theorem yields

P(Y|X) = P(X|Y) P(Y) / P(X)

where P(X|Y) and P(Y|X) are the conditional probability (of X given the occurrence of Y) and the posterior probability (of Y given the occurrence of X), respectively, while P(X) and P(Y) are the evidence and prior probabilities with respect to X and Y. The predicted target class of input X is the one for which P(Y|X) yields the highest value. Supposing that input X consists of n features, we have

P(Y|X) ∝ P(Y) × P(X1|Y) × P(X2|Y) × ⋯ × P(Xn|Y)
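As an illustration of this computation, a toy Python sketch of a categorical NB classifier is given below. The transaction features and labels are hypothetical, and this is not the RapidMiner NB implementation used in our experiments; it simply shows how class priors and per-feature conditionals combine into a posterior score.

```python
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate class priors P(Y) and per-feature conditionals P(X_i | Y) from counts."""
    priors = Counter(labels)
    cond = defaultdict(Counter)  # key (class, feature index) -> value counts
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            cond[(y, i)][v] += 1
    return priors, cond, len(labels)

def predict_nb(x, priors, cond, n):
    """Pick the class maximizing P(Y) * prod_i P(X_i | Y)."""
    scores = {}
    for y, cy in priors.items():
        p = cy / n
        for i, v in enumerate(x):
            p *= cond[(y, i)][v] / cy
        scores[y] = p
    return max(scores, key=scores.get)

# Toy transactions: (country, entry mode) -> fraud / genuine (hypothetical data)
X = [("MY", "chip"), ("MY", "chip"), ("TH", "online"), ("TH", "online")]
y = ["genuine", "genuine", "fraud", "fraud"]
priors, cond, n = train_nb(X, y)
print(predict_nb(("TH", "online"), priors, cond, n))  # fraud
```

A practical implementation would also apply Laplace smoothing so that unseen feature values do not zero out the whole product.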
Tree methods

A total of five methods based on trees are presented, which consist of the Decision Tree (DT), Decision Stump (DS), Random Tree (RT), Random Forest (RF) and Gradient Boosted Trees (GBT).

Decision Trees (DT) can be applied to both classification and regression problems. In classification tasks, a predicted outcome with respect to a target class is required, while in regression tasks the predicted outcome is a real number (e.g. the price of a stock). A DT is used for predicting a dependent variable Y from independent variables X  =  X 1 , …, X n . In a DT, a number of nodes are established, forming a link from the input to output data samples. The Gini impurity measure is used to determine how frequently a randomly chosen input sample would be incorrectly labeled.

The Gini index for a data set with J classes can be computed using

Gini = 1 − Σ_{k=1}^{J} p_k²

where k  ∈ {1, 2, …, J }, and p_k represents the proportion of samples in class k . A rule that splits the input features is encoded in each DT node. To form a tree structure, new nodes can be created, subject to a stopping criterion. When an input sample is provided, the majority of data samples stemming from a leaf node of the tree are identified, leading to a predicted target class.
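The impurity computation itself is straightforward; a brief illustrative sketch:

```python
def gini(proportions):
    """Gini impurity: 1 - sum_k p_k^2; 0 for a pure node, maximal for a uniform mix."""
    return 1.0 - sum(p * p for p in proportions)

print(gini([1.0, 0.0]))  # 0.0  (pure node)
print(gini([0.5, 0.5]))  # 0.5  (maximally mixed binary node)
```

In practice, split selection compares the sample-weighted Gini impurity of the candidate child nodes against that of the parent node.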

On the other hand, Decision Stump (DS) is developed based on DT with only a single split. DS is useful for tackling an uneven distribution of data samples. The procedure of tree averaging uses a set of n weights, { w 1 , w 2 , …, w n }, one for every tree in the set of pruned trees. The weights are normalized, such that Σ_{i=1}^{n} w_i = 1. Object O can be classified by a smoothed tree, in which the probability of each class given by each tree is determined as follows

where k is the class. The probability distribution over all classes with respect to object O is then calculated by summing over the set of pruned trees,

The Random Tree (RT) method is devised based on the same principle as DT. It, however, uses a subset of randomly selected input samples for the split process. A subset ratio is exploited to determine the random subset size. Both nominal and numerical data samples can be used with an RT. Each interior node corresponds to one of the input variables. The number of edges in an interior node is equivalent to the possible values pertaining to the corresponding input variable.

Based on an ensemble of RT models, the Random Forest (RF) method is formed, where the number of trees is a user-defined parameter. Given the predictions from the trees, the final predicted class yielded by RF is determined by using a voting mechanism. When each classifier, denoted as h_k(x), is a DT, the ensemble is an RF. Each DT can be written as h_k(x) = h(x | Θ_k), where Θ_k is the random parameter vector of the k-th tree. For classification, the function f(x) combines all classifier outputs, whereby every DT casts a vote for the most probable class given input x. The class with the highest number of votes wins.
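The voting step can be sketched as follows (a simplified illustration; practical RF implementations may also average class probabilities rather than count hard votes):

```python
from collections import Counter

def rf_predict(tree_predictions):
    """Majority vote over the class labels emitted by the individual trees."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

# Three hypothetical trees vote on one transaction
print(rf_predict(["fraud", "genuine", "fraud"]))  # fraud
```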

Another useful regression/classification ensemble method is Gradient Boosted Trees (GBT). To improve prediction accuracy, GBT exploits a forward-learning ensemble based on the boosting concept. Using a training set { ( x 1 , y 1 ) , ⋯ , ( x n , y n ) } of inputs x and the corresponding outputs y, the goal of GBT is to find an approximation F̂(x) to the function F(x) that minimizes the expected value of the loss function L(y, F(x)), i.e.,

F̂ = argmin_F E[ L(y, F(x)) ]

GBT seeks the approximation F̂(x) in the form of a weighted sum of functions h_i(x) from some class H (known as weak learners),

F̂(x) = Σ_i γ_i h_i(x) + const.

Neural networks

In general, the Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) structure comprises three (input, hidden, and output) layers. The backpropagation (BP) algorithm is used for learning the network weights. Similarly, BP is used for learning in the FF (Feed-Forward) ANN. In both networks, information is propagated from the input to output nodes via the nodes in the hidden layer. The goal in training is to minimize the cost function C over data set D via

C(D) = (1/N) Σ_p L( y^(p), ŷ^(p) )

where y^(p) is the target (output) vector for sample p, ŷ is the actual output of the MLP ( ŷ = MLP(x; w) ), x^(p) is the input vector for sample p, and L is the criterion to optimize, i.e., the mean squared error. Gradient descent, an iterative procedure, is used to modify the weights W_t,

W_{t+1} = W_t − η ∇C(W_t)

where η is the learning rate.
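The weight-update rule can be illustrated with a one-parameter linear model trained under the MSE cost (a toy sketch of gradient descent, not the MLP training procedure itself; the data and learning rate are hypothetical):

```python
def grad_step(w, data, lr):
    """One gradient-descent step for a 1-D linear model y = w*x under MSE cost."""
    # dC/dw for C = (1/N) * sum (w*x - y)^2
    g = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * g

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = 0.0
for _ in range(200):
    w = grad_step(w, data, lr=0.05)
print(round(w, 3))  # converges close to 2.0
```

In an MLP the same update is applied to every weight, with the gradients supplied by backpropagation instead of the closed-form derivative above.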

Deep Learning (DL) has its roots in ANN. It is currently a popular learning model, in which an FF ANN model with possibly thousands of hidden layers is formulated. Different activation functions are used in DL. Based on local data, the weights in each node make contributions toward a global model through an averaging procedure. As DL is based on a multi-layered ANN model, it is trained using SGD (stochastic gradient descent) with backpropagation. Backpropagation works by accumulating prediction errors from the forward phase to update the network parameters so that the model performs better in the next iteration. The descent method requires the gradient of the loss function, ∇J(θ), to be computed. In an ANN, this operation can be conducted by a computational graph, which decomposes the loss function J into a number of intermediate variables. Backpropagation then applies the chain rule recursively to compute the gradients from the outputs back to the inputs.

Regression methods

Two different regression methods are presented, consisting of Linear Regression (LIR) and Logistic Regression (LOR). Given a data set, a linear function is used to model the relation among the scalar variables in LIR. Specifically, linear prediction functions are exploited to model the relation based on parameters estimated using the data samples. When there are two or more predictors, the target output is a linear combination of the predictors. The output (dependent variable) is obtained using

y = b0 + b1 x1 + b2 x2 + ⋯ + bn xn

where each bi is the corresponding coefficient of the explanatory variable xi. In a two-dimensional example, a straight line through the data samples is formed, whereby the predicted output, ŷ, for a scalar input x is given by

ŷ = b0 + b1 x

LOR, another regression method, is useful for processing numerical as well as nominal input samples. In LOR, one or more predictors are exploited to yield an estimate of the probability pertaining to a binary response. Supposing the probability of event occurrence is p, the linear function of predictor x is given by

logit(p) = ln( p / (1 − p) ) = b0 + b1 x

Similarly, in the case involving several independent variables, xi’s,

logit(p) = b0 + b1 x1 + ⋯ + bn xn

The output probability is computed using

p = 1 / ( 1 + e^{−logit(p)} )
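The probability computation can be sketched in a few lines (the coefficients are assumed to be given; in practice they are estimated from the data, e.g., by maximum likelihood):

```python
import math

def logistic_prob(x, coefs, intercept):
    """P(event) = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

print(logistic_prob([0.0], [1.0], 0.0))   # 0.5: zero linear predictor -> even odds
print(logistic_prob([10.0], [1.0], 0.0))  # close to 1: strongly positive predictor
```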
Support vector machines

Support Vector Machines (SVMs) are supervised models that can be used for both classification and regression tasks. An SVM can be formulated to act as a binary classifier through a model that assigns the data samples into two different categories. A margin is established based on the data samples that yields the widest possible separation between the different classes. Specifically, a hyperplane is established as follows

w · x + b = 0

where w, x, and b are the weight vector, input, and bias term, respectively. For the optimal hyperplane, H 0 , the margin, M, is given by

M = 2 / ‖ w 0 ‖

where w 0 is formed from the training samples, known as the support vectors, i.e.,

w 0 = Σ_i α_i y_i x_i , with the coefficients α_i non-zero only for the support vectors.
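For illustration, classification with a given hyperplane and the margin computation can be sketched as follows (the weights w and bias b are hypothetical; training an SVM involves solving a quadratic optimization problem, which is omitted here):

```python
import math

def svm_decision(w, b, x):
    """Sign of w.x + b decides the class of input x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin(w):
    """Geometric margin of the separating hyperplane: M = 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -1.0
print(svm_decision(w, b, [1.0, 1.0]))  # 1
print(margin(w))                       # 0.4, since ||w|| = 5
```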
Rule induction

Rule Induction (RI) is a method in which formal rules are extracted from a set of observations. The rules represent a full model or local patterns pertaining to the data samples. The method begins with the less common classes, and grows as well as prunes the rules until all positive samples are covered or an error rate of over 50% is reached. For each rule, r i , its accuracy is calculated using

During the growing phase, specific conditions are incorporated into the rule, and this process continues until an accuracy rate of 100% is achieved. During the pruning phase, the final sequence of each rule is removed using a pruning metric.

Empirical evaluation

In this section, an empirical evaluation using publicly available databases together with a real-world payment card database is presented. All experiments were performed using commercial data mining software, i.e., RapidMiner Studio 7.6. RapidMiner Studio is a software product that allows the prototyping and validation of machine learning and related classification algorithms. All parameters were set to the default settings in RapidMiner. For evaluation, we adopt the tenfold cross-validation (CV) method. The key advantage of CV is to minimize the bias associated with random sampling during the test phase (Seera et al., 2015 ).
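The fold-splitting step of CV can be sketched as follows (a simplified version; practical implementations, including RapidMiner's, typically shuffle and may stratify the samples before splitting):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous, near-equal folds for cross-validation."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over early folds
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 10)  # tenfold CV on 10 samples: one test sample per fold
print([len(f) for f in folds])  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

Each fold serves once as the test set while the remaining k − 1 folds form the training set, and the k scores are averaged.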

Class imbalance

In an imbalanced data set, there are fewer training samples from the minority class(es), as compared with those from the majority class(es) (Rtayli & Enneya, 2020 ). In fraud detection, the number of fraud cases is normally tiny, as compared with those of normal transactions. Fraudsters always attempt to create a fraudulent transaction as close as possible to a real transaction, in order to avoid being detected (Li et al., 2021 ). This data imbalanced issue affects the performance of machine learning methods. The learning algorithms normally focus on data samples from the majority class, leading to a higher accuracy rate as compared with that of the minority class.

While there are a number of ways of tackling the class imbalance challenge, creating synthetic data samples for learning can increase the false positive rate (Fiore et al., 2019 ). In our evaluation, we adopt the under-sampling technique to tackle this imbalanced data distribution problem. It is desirable to avoid the mistake of classifying legitimate transactions as fraud cases, in order to avoid poor customer service. On the other hand, it is necessary to accurately detect fraudulent transactions, in order to minimize financial losses.
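The under-sampling step can be sketched as follows (a toy example with hypothetical labels; majority-class samples are randomly dropped until the classes are balanced):

```python
import random

def undersample(samples, labels, minority_label, seed=0):
    """Randomly drop majority-class samples until both classes are the same size."""
    minority = [(x, y) for x, y in zip(samples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(samples, labels) if y != minority_label]
    rng = random.Random(seed)
    kept = rng.sample(majority, len(minority))  # keep as many majority as minority
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

X = list(range(100))
y = ["fraud"] * 5 + ["genuine"] * 95
balanced = undersample(X, y, "fraud")
print(len(balanced))  # 10: 5 fraud + 5 genuine
```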

The use of accuracy in imbalanced classification problems is inappropriate (Han et al., 2011 ), because the majority of data samples dominate the results. The ROC curve depicts the classification performance subject to different thresholds. The AUC score is the probability that a classifier ranks a randomly selected positive sample higher than a randomly selected negative one, and it allows statistically meaningful comparisons of performance across different methods (Hanley & McNeil, 1982 ). In our experiments, the AUC score was used.
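This probabilistic interpretation of the AUC can be computed directly on small score lists (an O(n²) illustrative sketch; practical implementations use rank statistics instead of comparing all pairs):

```python
def auc_score(scores_pos, scores_neg):
    """AUC = P(randomly chosen positive is ranked above a randomly chosen negative);
    ties count as half a win."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

print(auc_score([0.9, 0.8], [0.1, 0.2]))  # 1.0: perfect ranking
print(auc_score([0.5], [0.5]))            # 0.5: no discrimination
```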

Benchmark data

We used three data sets from the UCI Machine Learning Repository (Dua & Graff, 2019 ), namely Statlog (German Credit), Statlog (Australian Credit), and Default Credit Card, hereafter denoted as German, Australian, and Card, in the first evaluation. All three were binary classification tasks, and the number of features varied from 14 to 24. The details are given in Table 2.

Distribution of data samples

The accuracy rates for the three problems are listed in Table 3. With respect to the German data set, the lowest (69.9%) and highest (76.7%) accuracy scores were produced by LIR and DS, respectively. For the Australian data set, the lowest accuracy rate was produced by RT (i.e., 70.725%), while GBT produced the highest accuracy rate (i.e., 86.232%). This constituted the most significant variation among all classifiers in this experiment, i.e., a difference of 16% from the lowest to the highest accuracy rate. For the Card data set, NB and GBT yielded the lowest and highest accuracy rates, i.e., 70.7% and 82.06%, respectively.

Accuracy results (best in bold)

The AUC scores are given in Table 4. A score as close to unity as possible is preferred. In the German data set, the lowest AUC score of 0.5 was produced by both DT and DS, while LIR achieved the highest score of 0.796. RT and GBT yielded the lowest (0.682) and highest (0.937) AUC results for the Australian data set, respectively. Similarly, RT and GBT produced the lowest (0.545) and highest (0.778) AUC results for the Card data set, respectively.

AUC results (best in bold)

To further demonstrate the usefulness of various methods, the results were compared with those in recent publications, as reported in Feng et al. ( 2018 ) and Jadhav et al. ( 2018 ). A fivefold CV method was used in Feng et al. ( 2018 ), while Jadhav et al. ( 2018 ) used the tenfold CV method, i.e., the same as in our experiment. The models in both Feng et al. ( 2018 ) and Jadhav et al. ( 2018 ) were built using the MATLAB software. The highest accuracy rate of 76.70% in the German data set, as shown in Table 5, was achieved in our experiment. With respect to AUC, the best score of 0.796 was also from LIR in our study, followed by NB (Jadhav et al., 2018 ) at 0.767.

Comparison of accuracy and AUC using the German data set (best in bold)

In the Australian data set, the best accuracy rate of 87.00%, as shown in Table 6, was yielded by RF (Feng et al., 2018 ). In comparison, the accuracy rate achieved by GBT in our experiment was 86.23%. In addition, the best AUC result (0.937) was produced by GBT in our experiment, followed by 0.913 from NB (Jadhav et al., 2018 ). As can be observed in Table 7, the best reported accuracy rate with respect to the Card data set was 82.06% from GBT in our study. Similarly, GBT achieved the best AUC score of 0.778, which was higher than the best score of 0.699 from NB in Jadhav et al. ( 2018 ).

Comparison of accuracy and AUC using the Australian data set (best in bold)

Comparison of accuracy and AUC using the Card data set (best in bold)

Real payment card data

In this evaluation, we established a database with real payment card transactions obtained from a financial firm in Malaysia. The data set contained transactions from January to March 2017. A total of 61,786 transaction records were available for evaluation. The transactions covered activities in 112 countries, with various spending items ranging from online website purchases to grocery shopping. Among the transactions, 46% occurred locally in Malaysia, followed by transactions made in Thailand and Indonesia. A total of 31 transactions were identified and labeled as fraud cases, with the remaining being genuine, or non-fraud, cases. The list of features used is given in Table 8.

List of features

Feature aggregation

In feature engineering, feature selection and feature aggregation constitute two main considerations. The capability of extracting discriminative features and removing irrelevant ones is important for improving classification performance, especially when dealing with high-dimensional data (Rtayli & Enneya, 2020 ). In general, feature selection can be categorized into two groups: filter and wrapper methods (Zhang et al., 2019 ). An independent evaluation is used in filter-based methods to evaluate and identify important features. A pre-determined classifier is used in computing the evaluation in wrapper-based methods, which is computationally expensive.

When the number of features is small, feature aggregation methods are useful. A number of aggregation methods used in the literature are reviewed here. Specifically, the data set used in Bahnsen et al. ( 2016 ) contained 27 features. Eight groups of duration were established, namely 1, 3, 6, 12, 18, 24, 72, and 168 h. The features were aggregated based on merchant codes, merchant groups, various transaction types, modes of entry, and country groups. The raw features yielded inferior results compared with the aggregated features in all cases (Bahnsen et al., 2016 ). In Jha et al. ( 2012 ), additional features were derived by combining information across 1 day, 1 month, and 3 months. A total of 14 derived features were created, covering transaction amounts over a month, the total transaction records in one country over the last month, and average spending amounts during the last 3 months.

In Whitrow et al. ( 2009 ), aggregation of transaction information over periods of 1, 3, and 7 days was conducted. The data set was grouped into 24 categories, leading to a total of 48 features using the Laplace-smoothed estimation of the fraud rates. The aggregated records included the number of transactions performed at terminals using a PIN as well as the number of transactions performed to date. In Fu et al. ( 2016 ), a total of 8 features were used for aggregation, with no details of the time period. The features used for aggregation included the average, total, bias, number, country, terminal, merchant, and trading entropy of transactions during a period of time. In Jiang et al. ( 2018 ), a total of 5 raw features were used, and the users were divided into 3 similar groups using k -means. To combine information across transactions, the sliding-window method was used. Aggregation over a period of a week, a month, or a user-defined period was conducted. A total of 7 features were aggregated, with 4 amount-related features and another 3 time-related features. Based on the original nine features in our real data set provided by a Malaysian financial institution, we investigated how aggregation of transaction records could affect the classifiers' performance. Information on each cardholder's account was continuously updated as new transactions occurred. Note that the use of new information aimed to give a better discrimination between fraud and non-fraud transactions. In accordance with Whitrow et al. ( 2009 ), a series of three suspicious transactions was likely to be an indicator of a fraud case.

We formed four data sets (denoted as D1, D2, D3, and D4) for evaluation, namely the original and three additional data sets. In line with the method in Bahnsen et al. ( 2016 ), D2 included additional aggregated features:

  • sum of transaction amounts for the same country and transaction type in the last 24 h;
  • sum of transaction amounts in the last 24 h;
  • no. of transactions having the same country and transaction type in the last 24 h;
  • no. of transactions in the last 24 h.

Four new features, in addition to the aggregated ones, were added to D3, as follows:

  • no. of transactions having the same type of device in the last 24 h;
  • no. of transactions having the same MCC (merchant category code) in the last 24 h;
  • sum of transaction amounts having the same type of device in the last 24 h;
  • sum of transaction amounts having the same MCC in the last 24 h.
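The 24-h aggregation above can be sketched as follows (the record field names are hypothetical, and a production system would index transactions by account rather than scanning the full history for every new transaction):

```python
from datetime import datetime, timedelta

def aggregate_24h(history, txn):
    """Count and sum of prior transactions within 24 h, overall and for the same
    country and transaction type (D2-style aggregated features)."""
    cutoff = txn["time"] - timedelta(hours=24)
    window = [t for t in history if cutoff <= t["time"] < txn["time"]]
    same = [t for t in window
            if t["country"] == txn["country"] and t["type"] == txn["type"]]
    return {
        "amt_24h": sum(t["amount"] for t in window),
        "num_24h": len(window),
        "amt_same_24h": sum(t["amount"] for t in same),
        "num_same_24h": len(same),
    }

t0 = datetime(2017, 1, 15, 12, 0)
history = [
    {"time": t0 - timedelta(hours=2), "country": "MY", "type": "online", "amount": 50.0},
    {"time": t0 - timedelta(hours=30), "country": "MY", "type": "online", "amount": 99.0},
]
txn = {"time": t0, "country": "MY", "type": "online", "amount": 20.0}
print(aggregate_24h(history, txn))
# {'amt_24h': 50.0, 'num_24h': 1, 'amt_same_24h': 50.0, 'num_same_24h': 1}
```

Only the transaction 2 h in the past falls inside the window; the 30-h-old record is excluded.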

D4 was produced using a GA (genetic algorithm) to determine the most significant features. Based on the survival-of-the-fittest concept, the GA is useful for undertaking search and optimization tasks. Here, feature selection with the GA was conducted using the default parameters in RapidMiner. During the feature selection process, the ‘mutation’ operator switched the features “on” and “off”, while the ‘crossover’ operator interchanged the selected features. Further details on this process are given in RapidMiner ( 2018 ).
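The GA search over feature subsets can be sketched as follows (a deliberately simplified loop with a synthetic fitness function, not the RapidMiner GA used in our experiments; in practice the fitness would be a classifier's validation performance on the selected features):

```python
import random

def ga_select(num_features, fitness, pop_size=12, generations=30, seed=0):
    """Toy GA over binary feature masks: truncation selection,
    one-point crossover, and single bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(num_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half (elitism)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_features)  # crossover: swap tails
            child = a[:cut] + b[cut:]
            i = rng.randrange(num_features)       # mutation: flip one bit
            child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Synthetic fitness: features 0, 2 and 4 are "informative", the rest add noise.
useful = {0, 2, 4}
def fitness(mask):
    return sum(1 if i in useful else -1 for i, bit in enumerate(mask) if bit)

best = ga_select(9, fitness)
print(best)  # best feature mask found (deterministic for the given seed)
```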

Experimental results

Table 9 lists the accuracy (ACC) rates of all experiments. In D1, the ACC rates were approximately 99%, with 11 out of 13 classifiers achieving more than 99.8%. NB produced the lowest accuracy rate of 32.8%. Improvement in ACC was achieved by using D2, where the accuracy rate of NB increased from 32.8 to 97.6%. The other 12 models achieved ACC rates of more than 99.9%. The best ACC rate was produced by DT and DS, which showed a minor improvement over that of D1. The D3 results from all models remained similar to those from D2. The D4 results from all models showed either a minor increase or remained the same. DT and DS yielded the best ACC rate for all four data sets.

Accuracy (ACC) results (best in bold)

To further assess the performance, we employed the Matthews Correlation Coefficient (MCC) (Powers, 2011 ). MCC provides a performance indication in binary tasks. It yields a balanced metric for evaluating classification problems with different data sample sizes pertaining to the target classes. Both true and false positive and negative predictions are considered, as follows

MCC = (TP × TN − FP × FN) / √( (TP + FP)(TP + FN)(TN + FP)(TN + FN) )

A total disagreement and a perfect prediction are indicated by − 1 and + 1, respectively.
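The computation can be sketched as follows (the confusion-matrix counts are hypothetical; a zero denominator is conventionally mapped to 0):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0 when any marginal count is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(0, 95, 0, 5))  # 0.0: a classifier that never flags fraud
print(mcc(5, 90, 0, 5))  # positive: half the fraud cases found, no false alarms
```

Note that the first classifier scores 95% accuracy on this hypothetical data while its MCC is 0, illustrating why MCC is preferred for imbalanced problems.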

The MCC scores are given in Table 10. In D1, the MCC scores were poor, with the best score of 0.12 achieved by GBT. Most of the remaining classifiers did not produce desirable results, as their fraud detection rates were 0. A similar trend in MCC could be observed across D2, D3, and D4. DT and DS achieved the best MCC score of 0.964, while RF showed competitive MCC scores. Improvements in other classifiers could be observed as well, with differences in scores for D2, D3, and D4. The fraud detection rates are shown in Fig. 1. NB achieved the most stable performance across all four data sets. For other classifiers except NB, GBT, and DL, the fraud detection rate with D1 was zero. In D2, all classifiers, except LIR and RI, managed to detect fraud cases. This trend continued for both D3 and D4. It could be observed that feature aggregation was able to improve the fraud detection rates. The non-fraud detection rates are shown in Fig. 2. Most classifiers, except NB, produced perfect or almost perfect accuracy rates. These results were in contrast to the fraud detection rates shown in Fig. 1.

MCC rates (best in bold)

Fig. 1 Fraud detection rates of data sets 1–4

Fig. 2 Non-fraud detection rates of data sets 1–4

Table 11 summarizes the AUC scores. The results of D1 varied from 0.5 to 0.866, the latter achieved by GBT. In D2, improvements in most of the AUC scores could be observed. RF yielded the greatest improvement, i.e., from 0.5 to 0.958. All other classifiers achieved an increase in their AUC scores, except DT and RI. In D3, minor changes in the AUC scores could be observed. A similar observation could be made from the results of D4. While DS yielded the highest accuracy rates in Table 9, its AUC score was among the lowest, i.e., at 0.5 in all four data sets. One reason could be the simplicity of DS, which contains a single split, compromising its detection capabilities. GBT produced the highest AUC score of 0.967 for D2, D3, and D4. The boosting effect in GBT allowed learning to be focused on misclassified samples, which eventually led to a robust classification model, as shown in the results.

Managerial implications

From the perspective of financial institutions that provide payment card services, the increase in fraud directly hits their business activities and impacts their profitability. With the rise of e-commerce, the number of online transactions increases rapidly. The popularity of payment cards is further fueled by the ability to perform online transactions from anywhere and at any time, with potentially lower purchase prices. However, this comes with a cost, i.e., the number of fraud cases increases sharply as well. As a result, an effective fraud detection system for payment cards is vital for any financial institution to mitigate the risk of fraudulent transactions. It is costly for financial institutions to keep absorbing losses, as it creates financial uncertainty for them. The cost associated with efforts to handle risk and uncertainty will eventually be passed on to retailers and consumers, leading to increases in the prices of goods. As an example, in a normal scenario, the payment card discount rate could be set at the lower end of the range, close to 1%. With an increase in the number of fraudulent transactions, the financial institution issuing the payment terminal may raise this rate, causing the merchant to receive less from every single transaction. Merchants selling goods online may then increase the prices of their items, and consumers end up paying more for their purchases.

More recently, with the outbreak of Covid-19, consumers have had second thoughts about using cash. Cash has been seen as a potential hygiene issue, as it is passed around from one person to another. This has prompted many to move from handling cash to using plastic cards instead. While digital or electronic wallets have gained popularity in recent times, the wide acceptance of payment cards has kept them a preferred choice for consumers. This wide acceptance, combined with consumers moving to cards during the Covid-19 pandemic, means the rise in transactions will in turn bring more fraud.

In this study, the developed fraud detection methods aim to mitigate the uncertainties caused by the issues discussed earlier. Indeed, it is vital to detect fraudulent transactions before they are completed, in order to ensure the overhead of business activities does not keep increasing. While there are many studies on payment card fraud detection, the use of real data is rare, since it is difficult to obtain real transaction records. As such, most existing methods use publicly available data sets for evaluation, and their effectiveness in real-world environments is unknown. In contrast, we have employed a real-world payment card database to demonstrate the usefulness of the developed methods with aggregated features. The resulting models can be readily used by financial institutions in real environments without requiring another round of assessment with real data.

While there are many commercial fraud detection tools, fraudsters always try to outsmart the detection systems. As such, detecting and fighting fraud is a continuous journey, not a one-off attempt. The developed methods offer a viable approach for further enhancement as more data samples become available for learning, improving detection rates over time in real-world environments.

Conclusions

We have presented an investigation of fraud detection pertaining to payment card transactions in this paper. The main contributions are the development of a practical system utilizing aggregated features for payment card fraud detection and the use of real transaction records for evaluation and demonstration of the effectiveness of the developed system. Our study is important for reducing the risks of financial losses as well as the uncertainties faced by institutions in their daily business activities. In our analysis, a total of thirteen statistical and machine learning methods, ranging from ANN to deep learning models, have been used for evaluation. Three benchmark credit card data sets obtained from a public repository have been used for performance assessment. The AUC metric is employed, which indicates the statistical difference in performance among the various detection methods. The best AUC score achieved is 0.937, from GBT on the Australian data set.

Importantly, real-world payment card transactions, in addition to benchmark databases, have been employed in our study. The same statistical and machine learning methods have been used for performance assessment. Besides that, feature selection with the GA has been performed. Feature selection is key to identifying only important, non-redundant features, so that classification performance can be enhanced and, at the same time, computational load can be reduced for real-world implementation.
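The paper does not spell out the GA settings, so the following is only a hypothetical sketch of GA-based wrapper feature selection: a population of binary feature masks is evolved, with cross-validated AUC as the fitness. The synthetic data, population size, and mutation rate are illustrative assumptions.

```python
# Hypothetical GA wrapper feature selection sketch (settings are assumptions).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

def fitness(mask):
    """Cross-validated AUC of a classifier trained on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask.astype(bool)], y,
                           cv=3, scoring="roc_auc").mean()

pop = rng.integers(0, 2, size=(12, X.shape[1]))    # random binary feature masks
for gen in range(10):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:6]]    # selection: keep fittest half
    children = []
    for _ in range(6):
        a, b = parents[rng.integers(0, 6, 2)]
        cut = rng.integers(1, X.shape[1])          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05       # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```

A real wrapper would also penalize mask size, trading a small AUC loss for fewer features and lower computational load.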

As it is onerous to acquire real financial records, the outcome of this study is significant in uncovering valuable insights into the robustness of machine learning algorithms in real-world environments. The features from the original data set are aggregated to form new features, with the aim to counter the effects of concept drift and enhance performance. Our findings pertaining to the benefits of feature aggregation methods are in line with some reported results in the literature, e.g. aggregation of data over a short period led to an increase in probability to detect fraud as indicated in Jha et al. ( 2012 ), while Lim et al. ( 2014 ) revealed that aggregation-based methods yielded better results as compared with those from standard transactions.
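As an illustration of aggregating transactions over a short period, the sketch below derives per-card rolling counts and totals over a trailing 24-hour window; the column names, window length, and data are hypothetical, not those of the real database used in the study.

```python
# Illustrative transaction aggregation features (toy data, assumed columns).
import pandas as pd

tx = pd.DataFrame({
    "card_id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 10:30", "2021-01-02 09:00",
        "2021-01-01 12:00", "2021-01-05 08:00"]),
    "amount": [50.0, 200.0, 20.0, 15.0, 400.0],
}).sort_values(["card_id", "timestamp"])

# Time-based rolling windows per card; each row sees its trailing 24 hours.
g = tx.set_index("timestamp").groupby("card_id")["amount"]
feat = pd.DataFrame({
    "tx_24h": g.rolling("24h").count(),      # transactions in the window
    "amount_24h": g.rolling("24h").sum(),    # amount spent in the window
}).reset_index()
print(feat)
```

Features of this kind capture short-term spending behaviour per card, which is the signal that aggregation-based methods exploit.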

Based on the original features, the best AUC score (from GBT) is 0.866, while the best AUC score (also from GBT) increases to 0.967 with the use of aggregated features. RF has recorded the largest improvement in AUC, i.e., from 0.5 with the original feature set to 0.958 with the aggregated features. In addition to AUC, another useful performance metric, i.e., MCC, has been adopted for evaluation. Again, the MCC scores indicate that aggregated features are able to improve the results. Both DT and DS produce the highest MCC score (0.964), while GBT (which yields the highest AUC score) achieves 0.869. The results indicate the usefulness of the aggregated features in improving the overall performance in both AUC and MCC scores.
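The two metrics reported above can be computed as follows; AUC is threshold-free and rank-based, while MCC is computed from hard predictions at a chosen threshold. The labels and scores in this toy example are made up for illustration.

```python
# Toy illustration of the AUC and MCC evaluation metrics.
from sklearn.metrics import matthews_corrcoef, roc_auc_score

y_true  = [0, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.1, 0.3, 0.2, 0.8, 0.7, 0.4, 0.9, 0.75]
y_pred  = [1 if s >= 0.5 else 0 for s in y_score]

auc = roc_auc_score(y_true, y_score)     # ranking quality over all thresholds
mcc = matthews_corrcoef(y_true, y_pred)  # balanced even under class imbalance
print(round(auc, 3), round(mcc, 3))      # → 0.933 0.775
```

MCC is particularly informative for fraud data, where the legitimate class heavily outweighs the fraudulent one and plain accuracy is misleading.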

This study is significant in view of the rise in e-commerce activities whereby the number of online transactions increases rapidly in this digital era. Indeed, with the outbreak of Covid-19, consumers are now resorting to online purchases. As such, an effective fraud detection system for payment cards is vital for a financial institution to mitigate the risk of fraudulent transactions. The developed system, therefore, offers a viable solution for financial institutions to detect fraudulent transactions pertaining to payment cards services in real environments.

In summary, we have evaluated the usefulness of machine learning and related models with aggregated features for fraud detection on both benchmark and real-world payment card databases. The resulting models demonstrate great potential for use by financial institutions in their daily business activities. Nevertheless, fraudsters always attempt to outsmart the detection systems. As such, the journey of detecting and fighting fraud is a continuous one. A number of limitations of this study, and further research to enhance the developed fraud detection system, are discussed next.

Limitations and further research

The current study can be improved from several angles. Firstly, the real payment card database used is limited to a financial institution in Malaysia, and the transactions mostly occurred in the Asian region. It would be useful to acquire more real-world data from different regions, in order to fully evaluate the effectiveness of the developed method for detecting fraud in other regions around the world. Consumers in different regions may transact with different characteristics and with varying spending patterns. As such, it is necessary to conduct further evaluation of the developed method with real data from different regions, in order to obtain a robust model that can be used for payment card fraud detection globally.

Another limitation is the use of single models for developing the fraud detection framework in this study. To further enhance the developed framework, hybrid models can be formed using a combination of two or more models (Jiang et al., 2020). Hybrid models use more than one model to determine transaction legitimacy, in order to further improve the fraud detection rate. In addition, online implementation of the detection methods will be investigated. This will allow detection and prevention of payment card fraud in real time and in real environments. On the other hand, as financial risks differ between regions, and a variety of risk models and management strategies are available (Jawadi et al., 2019; Ben Ameur & Prigent, 2018), it is important to further improve the adaptability of the developed framework to suit various risk analysis methodologies. It is also useful to investigate the applicability of different measurement errors in other financial domains (e.g., Ben Ameur et al., 2018, 2020), in order to ensure that the developed framework can be generalized to other financial risk analytic tasks.
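As a sketch of the hybrid-model direction suggested above, two base models can be combined by soft voting over their predicted probabilities. The imbalanced synthetic data and model choices below are assumptions for illustration, not the configuration evaluated in the paper.

```python
# Hedged sketch: a hybrid of two classifiers via soft (probability) voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~10% "fraud" class.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

hybrid = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft")  # average the predicted probabilities of both models
hybrid.fit(Xtr, ytr)
auc = roc_auc_score(yte, hybrid.predict_proba(Xte)[:, 1])
print("hybrid AUC:", round(auc, 3))
```

Soft voting lets a well-calibrated linear model and a nonlinear ensemble compensate for each other's blind spots, which is the intuition behind the hybrid direction proposed here.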

Declaration

The authors declare that they have no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Manjeevan Seera, Email: [email protected] .

Chee Peng Lim, Email: [email protected] .

Manjeevan Seera, Email: [email protected] .

Lalitha Dhamotharan, Email: [email protected] .

Kim Hua Tan, Email: [email protected] .

  • Ameur B, Prigent JL. Risk management of time varying floors for dynamic portfolio insurance. European Journal of Operational Research. 2018; 269:363–381. doi: 10.1016/j.ejor.2018.01.041.
  • Bahnsen AC, Aouada D, Stojanovic A, Ottersten B. Feature engineering strategies for credit card fraud detection. Expert Systems with Applications. 2016; 51:134–142. doi: 10.1016/j.eswa.2015.12.030.
  • Ben Ameur H, Ftiti Z, Jawadi F, Louhichi W. Measuring extreme risk dependence between the oil and gas markets. Annals of Operations Research. 2020. doi: 10.1007/s10479-020-03796-1.
  • Ben Ameur H, Jawadi F, Idi Cheffou A, Louhichi W. Measurement errors in stock markets. Annals of Operations Research. 2018; 262:287–306. doi: 10.1007/s10479-016-2138-z.
  • Bernard P, De Freitas NEM, Maillet BB. A financial fraud detection indicator for investors: an IDeA. Annals of Operations Research. 2019. doi: 10.1007/s10479-019-03360-6.
  • Bhattacharyya S, Jha S, Tharakunnel K, Westland JC. Data mining for credit card fraud: A comparative study. Decision Support Systems. 2011; 50(3):602–613. doi: 10.1016/j.dss.2010.08.008.
  • Card Fraud Worldwide—The Nilson Report, vol. 1096 (2016).
  • Dal Pozzolo A, Boracchi G, Caelen O, Alippi C, Bontempi G. Credit card fraud detection: A realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems. 2017; 29(8):3784–3797.
  • Dal Pozzolo A, Caelen O, Le Borgne YA, Waterschoot S, Bontempi G. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications. 2014; 41(10):4915–4928. doi: 10.1016/j.eswa.2014.02.026.
  • de Sá AG, Pereira AC, Pappa GL. A customized classification algorithm for credit card fraud detection. Engineering Applications of Artificial Intelligence. 2018; 72:21–29. doi: 10.1016/j.engappai.2018.03.011.
  • Dua, D., & Graff, C. (2019). UCI machine learning repository [online]. Available http://archive.ics.uci.edu/ml . Irvine, CA: University of California, School of Information and Computer Science.
  • Edge ME, Sampaio PRF. The design of FFML: A rule-based policy modelling language for proactive fraud management in financial data streams. Expert Systems with Applications. 2012; 39(11):9966–9985. doi: 10.1016/j.eswa.2012.01.143.
  • Everett C. Credit card fraud funds terrorism. Computer Fraud and Security. 2003; 2003(5):1. doi: 10.1016/S1361-3723(03)05001-2.
  • Feng X, Xiao Z, Zhong B, Qiu J, Dong Y. Dynamic ensemble classification for credit scoring using soft probability. Applied Soft Computing. 2018; 65:139–151. doi: 10.1016/j.asoc.2018.01.021.
  • Ferreira FA, Meidutė-Kavaliauskienė I. Toward a sustainable supply chain for social credit: Learning by experience using single-valued neutrosophic sets and fuzzy cognitive maps. Annals of Operations Research. 2019. doi: 10.1007/s10479-019-03194-2.
  • Fiore U, De Santis A, Perla F, Zanetti P, Palmieri F. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Information Sciences. 2019; 479:448–455. doi: 10.1016/j.ins.2017.12.030.
  • Forbes. (2011). Bringing trust back to the table—part one: Adyen and Mobile Payments.
  • Forough J, Momtazi S. Ensemble of deep sequential models for credit card fraud detection. Applied Soft Computing. 2021; 99:106883. doi: 10.1016/j.asoc.2020.106883.
  • Fu, K., Cheng, D., Tu, Y., & Zhang, L. (2016). Credit card fraud detection using convolutional neural networks. In International conference on neural information processing (pp. 483–490). Cham: Springer.
  • Gómez JA, Arévalo J, Paredes R, Nin J. End-to-end neural network architecture for fraud scoring in card payments. Pattern Recognition Letters. 2018; 105:175–181. doi: 10.1016/j.patrec.2017.08.024.
  • Han J, Pei J, Kamber M. Data mining: Concepts and techniques. Elsevier; 2011.
  • Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982; 143(1):29–36. doi: 10.1148/radiology.143.1.7063747.
  • Jadhav S, He H, Jenkins K. Information gain directed genetic algorithm wrapper feature selection for credit rating. Applied Soft Computing. 2018; 69:541–553. doi: 10.1016/j.asoc.2018.04.033.
  • Jawadi F, Louhichi W, Cheffou AI, Ben Ameur H. Modeling time-varying beta in a sustainable stock market with a three-regime threshold GARCH model. Annals of Operations Research. 2019; 281:275–295. doi: 10.1007/s10479-018-2793-3.
  • Jha S, Guillen M, Westland JC. Employing transaction aggregation strategy to detect credit card fraud. Expert Systems with Applications. 2012; 39(16):12650–12657. doi: 10.1016/j.eswa.2012.05.018.
  • Jiang C, Song J, Liu G, Zheng L, Luan W. Credit card fraud detection: A novel approach using aggregation strategy and feedback mechanism. IEEE Internet of Things Journal. 2018; 5(5):3637–3647. doi: 10.1109/JIOT.2018.2816007.
  • Jiang M, Jia L, Chen Z, Chen W. The two-stage machine learning ensemble models for stock price prediction by combining mode decomposition, extreme learning machine and improved harmony search algorithm. Annals of Operations Research. 2020; 1:1–33. doi: 10.1007/s10479-020-03690-w.
  • Jurgovsky J, Granitzer M, Ziegler K, Calabretto S, Portier PE, He-Guelton L, Caelen O. Sequence classification for credit-card fraud detection. Expert Systems with Applications. 2018; 100:234–245. doi: 10.1016/j.eswa.2018.01.037.
  • Li Z, Huang M, Liu G, Jiang C. A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Systems with Applications. 2021; 175:114750. doi: 10.1016/j.eswa.2021.114750.
  • Lim, W. Y., Sachan, A., & Thing, V. (2014). Conditional weighted transaction aggregation for credit card fraud detection. In IFIP International conference on digital forensics (pp. 3–16). Berlin: Springer.
  • Lucas Y, Portier PE, Laporte L, He-Guelton L, Caelen O, Granitzer M, Calabretto S. Towards automated feature engineering for credit card fraud detection using multi-perspective HMMs. Future Generation Computer Systems. 2020; 102:393–402. doi: 10.1016/j.future.2019.08.029.
  • Pavía JM, Veres-Ferrer EJ, Foix-Escura G. Credit card incidents and control systems. International Journal of Information Management. 2012; 32(6):501–503. doi: 10.1016/j.ijinfomgt.2012.03.003.
  • Powers D. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. Journal of Machine Learning Technologies. 2011; 2(1):37–63.
  • Randhawa K, Loo CK, Seera M, Lim CP, Nandi AK. Credit card fraud detection using AdaBoost and majority voting. IEEE Access. 2018; 6:14277–14284. doi: 10.1109/ACCESS.2018.2806420.
  • RapidMiner. (2018). Optimize Selection (RapidMiner Studio Core) [Online]. Available https://docs.rapidminer.com/latest/studio/operators/modeling/optimization/feature_selection/optimize_selection.html .
  • Robinson WN, Aria A. Sequential fraud detection for prepaid cards using hidden Markov model divergence. Expert Systems with Applications. 2018; 91:235–251. doi: 10.1016/j.eswa.2017.08.043.
  • Rtayli N, Enneya N. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. Journal of Information Security and Applications. 2020; 55:102596. doi: 10.1016/j.jisa.2020.102596.
  • Russac, Y., Caelen, O., & He-Guelton, L. (2018). Embeddings of categorical variables for sequential data in fraud context. In International conference on advanced machine learning technologies and applications (pp. 542–552). Cham: Springer.
  • Sahin Y, Bulkan S, Duman E. A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications. 2013; 40(15):5916–5923. doi: 10.1016/j.eswa.2013.05.021.
  • Sariannidis N, Papadakis S, Garefalakis A, Lemonakis C, Kyriaki-Argyro T. Default avoidance on credit card portfolios using accounting, demographical and exploratory factors: Decision making based on machine learning (ML) techniques. Annals of Operations Research. 2020; 294(1):715–739. doi: 10.1007/s10479-019-03188-0.
  • Seera M, Lim CP, Tan SC, Loo CK. A hybrid FAM–CART model and its application to medical data classification. Neural Computing and Applications. 2015; 26(8):1799–1811. doi: 10.1007/s00521-015-1852-9.
  • Van Vlasselaer V, Bravo C, Caelen O, Eliassi-Rad T, Akoglu L, Snoeck M, Baesens B. APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems. 2015; 75:38–48. doi: 10.1016/j.dss.2015.04.013.
  • Whitrow C, Hand DJ, Juszczak P, Weston D, Adams NM. Transaction aggregation as a strategy for credit card fraud detection. Data Mining and Knowledge Discovery. 2009; 18(1):30–55. doi: 10.1007/s10618-008-0116-z.
  • Zhang J, Xiong Y, Min S. A new hybrid filter/wrapper algorithm for feature selection in classification. Analytica Chimica Acta. 2019; 1080:43–54. doi: 10.1016/j.aca.2019.06.054.
  • Zhang X, Han Y, Xu W, Wang Q. HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture. Information Sciences. 2020. doi: 10.1016/j.ins.2019.05.023.
  • Zhu H, Liu G, Zhou M, Xie Y, Abusorrah A, Kang Q. Optimizing Weighted Extreme Learning Machines for imbalanced classification and application to credit card fraud detection. Neurocomputing. 2020; 407:50–62. doi: 10.1016/j.neucom.2020.04.078.

Credit Card Fraud Detection Using Machine Learning Techniques


  • Open access
  • Published: 26 April 2024

Healthcare insurance fraud detection using data mining

  • Zain Hamid 1 ,
  • Fatima Khalique 1 ,
  • Saba Mahmood 1 ,
  • Ali Daud 2 ,
  • Amal Bukhari 3 &
  • Bader Alshemaimri 4

BMC Medical Informatics and Decision Making volume 24, Article number: 112 (2024)

Healthcare programs and insurance initiatives play a crucial role in ensuring that people have access to medical care. There are many benefits to healthcare insurance programs, but fraud in healthcare continues to be a significant challenge in the insurance industry. Healthcare insurance fraud detection faces challenges from evolving and sophisticated fraud schemes that adapt to detection methods. Analyzing extensive healthcare data is hindered by complexity, data quality issues, and the need for real-time detection, while privacy concerns and false positives pose additional hurdles. The lack of standardization in coding and limited resources further complicate efforts to address fraudulent activities effectively.

In this study, a fraud detection methodology is presented that utilizes association rule mining augmented with unsupervised learning techniques to detect healthcare insurance fraud. A dataset from the Centers for Medicare and Medicaid Services (CMS) 2008-2010 DE-SynPUF is used for analysis. The proposed methodology works in two stages. First, association rule mining is used to extract frequent rules from the transactions based on patient, service, and service provider features. Second, the extracted rules are passed to unsupervised classifiers, such as IF, CBLOF, ECOD, and OCSVM, to identify fraudulent activity.
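A minimal sketch of the two-stage idea follows: a small Apriori routine, written inline for illustration, mines frequent itemsets from toy claim transactions, and an unsupervised detector then flags claims that match few frequent patterns. The codes, support threshold, and one-dimensional feature encoding are assumptions for the example, not the study's actual configuration.

```python
# Two-stage sketch: frequent-itemset mining, then unsupervised anomaly flags.
import numpy as np
from sklearn.ensemble import IsolationForest

claims = [  # each claim: (diagnosis, procedure, provider) codes, all toy
    {"D1", "P1", "H1"}, {"D1", "P1", "H1"}, {"D1", "P1", "H2"},
    {"D2", "P2", "H1"}, {"D1", "P1", "H1"}, {"D9", "P9", "H9"},
]

def apriori(transactions, min_support=0.3):
    """Return every itemset whose support meets the threshold."""
    n = len(transactions)
    frequent = {}
    k_sets = list({frozenset([i]) for t in transactions for i in t})
    while k_sets:
        counts = {s: sum(s <= t for t in transactions) for s in k_sets}
        survivors = {s: c / n for s, c in counts.items() if c / n >= min_support}
        frequent.update(survivors)
        # Candidate generation: join surviving k-itemsets into (k+1)-itemsets.
        k_sets = list({a | b for a in survivors for b in survivors
                       if len(a | b) == len(a) + 1})
    return frequent

patterns = apriori(claims)
# Stage 2: one feature per claim -- how many frequent patterns it satisfies.
X = np.array([[sum(p <= t for p in patterns)] for t in claims])
flags = IsolationForest(random_state=0).fit_predict(X)  # -1 marks outliers
print(X.ravel(), flags)
```

Claims that conform to common (diagnosis, procedure, provider) combinations satisfy many patterns, while the unusual final claim satisfies none, which is the signal the second-stage detector exploits.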

Descriptive analysis shows patterns and trends in the data, revealing interesting relationships among diagnosis codes, procedure codes, and physicians. The baseline anomaly detection algorithms generated results in 902.24 seconds. Another experiment retrieved frequent rules using association rule mining with the Apriori algorithm combined with unsupervised techniques in 868.18 seconds. The silhouette scoring method was used to assess the efficacy of the four anomaly detection techniques: CBLOF achieved the highest score of 0.114, followed by Isolation Forest with 0.103. The ECOD and OCSVM techniques obtained lower scores of 0.063 and 0.060, respectively.
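The silhouette-based comparison can be reproduced in outline as follows: each detector partitions the data into inliers and outliers, and that partition is scored with the silhouette coefficient. The toy 2-D data and detector settings are assumptions, and only two of the four methods are shown.

```python
# Hedged sketch of silhouette scoring for anomaly detectors (toy data).
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import silhouette_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # dense "normal" cluster
               rng.normal(6, 1, (10, 2))])   # small anomalous cluster

scores = {}
for name, det in [("IF", IsolationForest(contamination=0.1, random_state=0)),
                  ("OCSVM", OneClassSVM(nu=0.1))]:
    labels = det.fit_predict(X)              # +1 inlier, -1 outlier
    scores[name] = silhouette_score(X, labels)
    print(name, round(scores[name], 3))
```

A higher silhouette indicates that the detector's outlier set is more cleanly separated from the inliers, which is how the four techniques are ranked above.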

The proposed methodology enhances healthcare insurance fraud detection by using association rule mining for pattern discovery and unsupervised classifiers for effective anomaly detection.


Introduction

The healthcare system plays a crucial role in maintaining the health and well-being of society, and many countries provide health insurance to their citizens to ensure they have access to medical care when needed. Health insurance can be provided by both public and private entities, and it helps to cover the cost of medical treatments, procedures, and medications. This system also helps to protect people from the financial burden of unexpected medical expenses that can arise due to illness or injury. The Sehat Sahulat Program was a health insurance initiative launched by the government of Pakistan in partnership with provincial governments, aimed at providing health coverage for needy people to minimize or eliminate out-of-pocket expenses and reduce poverty [1]. The program covers emergency and inpatient services requiring secondary and tertiary care but does not include outpatient services. The financial range for overall treatment coverage varies from 720,000 to 1,000,000 PKR and includes transportation for maternal care, referrals to tertiary care, and funeral allowances [2]. Similarly, the United States has its own federal-government-sponsored national healthcare program, Medicare, which provides affordable health insurance to individuals 65 years and older and other select individuals with permanent disabilities [3]. Beyond the United States, countries such as Canada, the UK, and France also provide such facilities to their citizens. Advancements in medical sciences and technology have led to significant improvements in the health and well-being of the general public. However, the cost of quality healthcare can be high, and this is where health insurance plans play their role. Despite the significance of health insurance plans, fraudsters continually develop sophisticated schemes to evade detection. They may employ advanced techniques such as identity theft, billing for services not provided, or collusion among healthcare providers.
Healthcare insurance fraud causes billions of dollars in losses to healthcare funds around the world. In 2010, the cost reached 10% of total healthcare expenditure worldwide [4]. According to some reports, the US healthcare system loses around $505 billion to $850 billion every year, which amounts to 9% to 19% of total healthcare expenditure [5]. It can be easily seen that this additional burden leads to increased taxes and more expensive health insurance plans for individuals.

Fig. 1 Sources of Waste in US Health Care

The Government Accountability Office of the United States (U.S. GAO) estimated that Medicare had over 51 million beneficiaries in 2013, with services costing over $600 billion. On the other side, improper payments cost around $50 billion, including basic technical and human errors, among which there may be fraudulent cases [6]. According to RGA data published in 2017, Australia, Singapore, Malaysia, Thailand, the Philippines, Vietnam, and Korea, as well as Japan and Indonesia, also face healthcare fraud issues [7]. As per limited figures, the European continent loses at least €56 billion annually to fraudulent practices [8]. The Swedish insurance industry pays out SEK 70 billion to its customers across more than 3 million claims; unfortunately, 5% of these payments turn out to be fraudulent [9].

Insurance scams in the healthcare industry are resulting in losses of billions of dollars for public healthcare systems all over the world. Healthcare systems generate vast amounts of data, including patient records, billing information, and medical claims. Analyzing this large volume of data to identify anomalies or patterns indicative of fraud can be challenging and time-consuming. Anomaly detection has been studied in different domains for the identification of abnormal behaviours [10, 11]. Data mining techniques, in combination with analytical approaches such as machine learning, are now recognized as a key practice for identifying fraud [12].

Fig. 1 shows the most common classification of frauds in the healthcare insurance system. Fraud can be identified through service-availing as well as service-providing patterns. Availing patterns, such as repetition of services, age inconsistency, gender inconsistency, and visit frequency, can point towards fraud and waste of healthcare insurance; these patterns originate from patients. On the provider side, patterns such as billing fraud, unnecessary treatments, unnecessary procedures, charging multiple times, and misuse of credentials can point towards fraud and abuse of the system [13]. Recently, researchers [14] used a hierarchical attention mechanism to model the behavioural relationships among different patient visits for fraud detection.

The healthcare insurance system involves three main actors: the insured, medical institutions, and insurance providers, as depicted in Fig. 2. Each actor may have different interests that can lead to fraud, for example, over-diagnosis and over-treatment by hospitals, fake medical treatment by insured individuals, and insufficient review of medical insurance settlement data by health insurance providers. These frauds cause a significant loss to the insurance fund and threaten its normal operation. Measures should be taken to detect and report fraud, waste, and abuse in the system, including errors and abuse by providers, unnecessary costs to the payer, and exploitation of weaknesses in internal control mechanisms. Recently, the authors of [15] proposed a Bayesian belief network based model to identify fraudulent activities involving all stakeholders in a transaction.

Fig. 2 Healthcare Insurance Ecosystem: Patient, Hospital, and Service Providers

“Fraud, Waste, and Abuse” (often abbreviated as “FWA”) is a term used in the healthcare industry, including health insurance, to refer to practices that result in unnecessary or excessive healthcare costs, improper payments, or other fraudulent activities. Waste, abuse, and fraud in healthcare can result in substantial financial losses for insurance companies, which can drive up the cost of healthcare for everyone. To combat FWA in healthcare, insurance companies, regulators, and law enforcement agencies work to detect and prevent these activities, investigate potential cases, and prosecute those responsible for engaging in fraudulent activities. Fraud in healthcare occurs when individuals or organizations intentionally deceive healthcare providers, insurance companies, or patients in order to gain some type of financial benefit. This can take many forms, including billing for services that were never performed, submitting false claims, and using someone else’s insurance information. These actions can result in improper payment or financial gain for the individuals or organizations involved, and can ultimately increase healthcare costs for patients and insurance providers. Healthcare fraud is a serious crime and can result in civil and criminal penalties, including fines, imprisonment, and exclusion from government healthcare programs.

Waste in healthcare is a significant problem that can lead to unnecessary costs without providing any, or very little, benefit to patients. It refers to the overuse or misuse of healthcare resources, which can result in inefficient healthcare practices and poor patient outcomes. Examples of waste in healthcare include ordering unnecessary tests or procedures, prescribing expensive brand-name drugs when generic alternatives are available, and using higher-cost facilities for routine care. Such waste can contribute to rising healthcare costs and reduced access to care for patients. Addressing waste in healthcare is also important for improving the efficiency and effectiveness of healthcare delivery, while also ensuring that patients receive the appropriate care they need. This can involve implementing strategies to reduce unnecessary testing and procedures, promoting the use of cost-effective medications, and encouraging the use of lower-cost healthcare facilities for routine care.

Abuse in healthcare is another challenge that can lead to unnecessary costs and improper payments. It refers to actions that are inconsistent with accepted business or medical practices, and can result in fraudulent or unethical behavior by healthcare providers or organizations. Examples of abuse in healthcare include over-billing for services or billing for services that were not actually provided, resulting in unnecessary costs and improper payments. Addressing abuse in healthcare is important for improving the integrity of healthcare delivery, while also ensuring that patients receive the appropriate care they need. This can involve implementing strategies to detect and prevent abuse, such as conducting regular audits of billing practices, implementing fraud detection software, or establishing clear policies and procedures for billing and reimbursement. Healthcare billing processes are often complex, involving numerous codes, regulations, and billing procedures. Fraud detection systems must navigate this complexity to identify irregularities accurately. In addition, the unavailability of labelled data in the domain poses another challenge. Therefore, there is a need to design and develop effective unsupervised learning-based techniques that can help detect and prevent health insurance fraud and provide actionable insights to relevant stakeholders, such as hospitals and insurance providers.

Following are the key contributions of this research:

Use of unique user behavior patterns at the transaction level

Detection of health insurance fraud by considering the interactions of multiple players, including patients, service providers, and physicians

A novel method that combines unsupervised rule mining and unsupervised classification approaches to identify fraudulent transactions

A cost-based evaluation metric, as opposed to an error-based metric, that aligns well with the proposed methodology

Related work

Significant prior research on general insurance fraud detection focuses on data mining and machine learning techniques [ 16 ]. Researchers have mostly concentrated on one stakeholder of the insurance triangle, most frequently frauds committed by patients or by hospitals. In this research, all stakeholders in the insurance triangle are considered for better identification of frauds committed across the board. A summarised view of methods and techniques used for fraud or anomaly detection in healthcare, as well as in other domains with significant results, is presented here.

Association rule mining

Association rule mining has not yet been widely researched in the area of healthcare insurance or for other fraudulent activities. Although it is a widely used data mining technique, it still carries some drawbacks; Yadav et al. discussed techniques that can help improve the algorithm [ 17 ]. Saba et al. shared the initial stage of their study: by using association rules followed by the SVM classification algorithm, they believe their model can address discrepancies and thus reduce fraud in health insurance [ 18 ]. Sornalakshmi et al. presented a new technique combining MapReduce and Apriori association rule mining; MapReduce makes parallel computing very easy. However, the authors believe the Apriori algorithm is not yet fully mature, as considerable improvement is still needed for parallel and distributed settings [ 19 ]. The authors of [ 20 ], who applied the algorithm to medical billing, also believe that Apriori is well suited for finding frequent item-sets in a billing database.

Unsupervised machine learning

Data mining helps detect and prevent insurance fraud. Anomaly detection, clustering, and classification can detect fraudulent insurance claims [ 21 ]. After finding anomalous claims, further investigation can be required to narrow the focus and identify fraud patterns. A recent survey [ 22 ] highlights the current and future challenges of anomaly detection. Kirlidog and Asuk [ 23 ] used longitudinal data spanning nine years but also suggest one-year analyses, which can be beneficial for detecting “hit and run” frauds that are hard to detect over long periods. Gao [ 24 ] proposed the SSIsomap activity clustering method, the SimLOF outlier detection method, and a Dempster-Shafer Theory-based evidence aggregation method to detect unusual categories and frequencies of behaviours simultaneously. Alwan [ 25 ] shows how combining machine learning techniques with existing methods for detecting fraud can make it easier to find fraud. Specifically, the paper examines the effectiveness of several data mining techniques, including Decision Tree, Support Vector Machine, K-Nearest Neighbor, and Hidden Markov Model, in detecting credit card fraud. The findings highlight the potential of a hybrid approach that integrates these methods to enhance fraud detection.

Shang [ 26 ] suggested the use of the One-Class Support Vector Machine (OCSVM) for intrusion detection. The authors describe that OCSVM has advantages in anomaly detection, such as fast and strong generalization ability, fewer support vectors, a simple model, and great practical value [ 27 ]. Recently, the authors of [ 28 ] proposed utilizing a one-class SVM for defect identification in railway track geometry. Maglaras [ 29 ] combined ensemble methods and social networking metrics to enhance the OCSVM, but the approach needs improvement to decrease false positives and increase detection accuracy. Maglaras [ 30 ] developed a detection module using an OCSVM classifier and recursive k-means clustering. It is trained offline using network traces, and only severe alerts are communicated to the system. The module is part of an IDS system developed under CockpitCI, and its performance is stable and not affected by the selection of parameters \(\nu\) and \(\sigma\) . However, the author believes further evaluation is needed to determine its effectiveness under different anomaly scenarios. Wang [ 31 ] proposes an improved particle swarm optimization algorithm to enhance the accuracy of OCSVM-based power system anomaly detection. The algorithm introduces an adaptive learning factor and a splitting and elimination mechanism to improve the population’s diversity and fine searching ability. Amer [ 32 ] showed that SVM-based algorithms are effective for unsupervised anomaly detection, outperforming clustering-based methods in most cases. The proposed eta one-class SVM produces the most promising results, with a sparse solution and superior AUC performance. The introduced outlier score calculation method allows for ranking of outliers, providing practical value for anomaly detection applications.

In 2008, Liu et al. developed an algorithm called the Isolation Forest [ 33 ] for finding anomalies in data. The algorithm uses binary trees to identify anomalies, and because of its linear time complexity and low memory requirements, it is well suited for processing large amounts of data. The Isolation Forest algorithm’s low accuracy, execution efficiency, and generalization ability are addressed by Xu’s SAiForest data anomaly detection method [ 34 ]. SAiForest optimises the forest by selecting isolation trees with high abnormality detection and difference using simulated annealing and selective integration based on precision and difference value. Cheng [ 35 ] proposes the union of Isolation Forest and Local Outlier Factor to detect outliers in multiple datasets. The algorithm calculates each data point’s anomaly score using binary trees and prunes normal samples to find outlier candidates. The proposed method addresses Isolation Forest’s local outlier issues and reduces Local Outlier Factor’s time complexity. Ding [ 36 ] proposes an iForest-based anomaly detection framework for streaming data under the sliding windows framework, iForestASD. Experiments on four real-world data sets show that the proposed method is efficient, though the authors believe considerable improvement is still required, such as defining the threshold and the size of the sliding window. Lesouple [ 37 ] introduced a generalized isolation forest for anomaly detection; although it achieved lower execution time, its false alarm rate is high. A recent work [ 38 ] utilized autoencoder methods to find fraudulent claims and found that this technique outperformed density-based clustering methods.

Cluster-Based Local Outlier Factor (CBLOF) was proposed by He et al. in 2003 [ 39 ]. It is generally used for outlier detection and considers a combination of local distances to nearby clusters and the sizes of those clusters. It identifies anomalies as data points located in small clusters next to a larger nearby cluster; such outliers may not be single points but small groups of isolated points. John [ 40 ] explained the workings of Local Outlier Factor and Isolation Forest and suggested their use for identification of credit card fraud, with accuracies of 97% and 76%, respectively. Kanyama [ 41 ] used K-Nearest Neighbor (k-NN), CBLOF, and histogram-based outlier score (HBOS) for anomaly detection in smart water metering networks. After experimentation, the authors conclude that CBLOF performs better than k-NN in terms of detection rates, but k-NN achieved an almost zero False Positive Rate. Irfan [ 42 ] evaluated the performance of three unsupervised outlier detection algorithms: K-Means, LOF, and CBLOF. The author states that CBLOF performed better than its competitors and was faster in terms of computational complexity. The author recommended restarting the K-Means algorithm multiple times for stable cluster results, but CBLOF may be preferable for applications where processing speed or updating clustered models on streaming data is important. In another experiment, Irfan [ 43 ] applied the methodology to churn prediction in a banking system and reached the same conclusion in favor of CBLOF. The main goal of the research in this domain is to find the most important features and data sources, such as medical records, billing details, and demographic information, for using unsupervised learning techniques to find health insurance fraud. In the proposed approach, it is intended to identify fraudulent patterns based on the interaction of three stakeholders, that is, patient, physician and service.
In addition, the influence of data preprocessing approaches, like normalization, feature scaling, and missing data imputation, on the accuracy and resilience of fraud detection models is studied. Moreover, the potential of ensemble methods, combining multiple unsupervised learning models to enhance accuracy and generalization is investigated by evaluating the performance of various unsupervised learning algorithms in detecting health insurance fraud.

Dataset in use

In terms of studies utilizing the same dataset, Table 1 details the research conducted on the DE-synPUF dataset described in the “Dataset” section. Bauder et al. [ 44 , 45 ] found that the C4.5 decision tree algorithm outperformed others in terms of Area Under the Curve (AUC) metrics, indicating its efficacy in identifying fraudulent activities. Similarly, Herland et al. [ 46 , 47 ] demonstrated the effectiveness of logistic regression and gradient tree boosting, achieving commendable AUC results. Fan et al. [ 48 ] highlighted the superior performance of decision tree classifiers, especially when integrating social media data, suggesting a novel approach to enhancing fraud detection accuracy.

Ekin et al. [ 49 ] noted a direct correlation between increased class imbalance and decreased AUC scores, yet they identified Random Walk Oversampling (RWO) as a potent method to counteract this issue. Sadiq et al. (2017) [ 50 ] employed a PRIM-based bump-hunting technique, effectively pinpointing potential fraudulent activities, while Sadiq et al. (2019) [ 51 ] used propensity matching and clustering in their CPM Fraud Miner to detect data anomalies indicative of fraud.

Each study, while advancing the understanding of fraud detection mechanisms, encountered limitations. These ranged from the challenges of dealing with highly imbalanced datasets, as in the case of Bauder et al., to the complexities of integrating diverse data sources, such as social media and public records, which could introduce biases into the analysis, as noted by Fan et al. Moreover, the methodologies often relied on assumptions or incomplete data, with the true extent of fraudulent activities remaining partially uncovered, thus highlighting the necessity for more comprehensive and robust data analysis techniques in future research.

For the research studies using supervised learning techniques, a key aspect is distinguishing between fraudulent and legitimate providers by identifying relevant features. While the approach to identifying these features differs among researchers, the predominant focus is on the provider level as opposed to the transaction level. In addition, when defining ground truth for supervised learning, the accuracy of the labeled dataset is crucial, as it directly impacts the categorization of data into specific classes. In most studies on the dataset, fraud labels are obtained by incorporating exclusions from the Office of Inspector General’s (OIG) List of Excluded Individuals/Entities (LEIE) database [ 54 ]. While the LEIE database lists provider-level exclusions, it does not comprehensively capture all instances of provider fraud. Notably, 38% of providers with fraud convictions remain in medical practice, and 21% have not been suspended from practicing medicine despite their convictions, as highlighted in Pande and Maas [ 55 ]. The integrity of provider classification into fraudulent or legitimate (non-fraudulent) is essential, yet there remains an ambiguity for providers not previously scrutinized for fraud. Some studies have attempted to mitigate this uncertainty by estimating a range for the class distribution of unreviewed providers, highlighting the potential misclassification of fraud cases as non-fraudulent. The binary classification system, categorizing providers simply as fraudulent or legitimate, may not fully capture the extent of fraud committed. Therefore, assessing the level of “fraud confidence” could offer a better approach for training models. In our study, we evaluate our rules based on their confidence and support. Furthermore, the fraud dataset typically exhibits a skew, with a disproportionate number of providers classified as legitimate compared to those deemed fraudulent.
This imbalance, known as “class imbalance,” reflects a common challenge in the dataset’s label distribution.

While unsupervised methods are more practical, in approaches like outlier detection the responsibility to establish fraudulent intent falls to investigators or experts. Identifying the specific claim line details that underpin the fraud is also challenging, given that such billing discrepancies often pertain to the overall behavior of the provider. Therefore, in our study we employ unsupervised methods in combination with a rule-based detection approach to mitigate some shortcomings of unsupervised approaches.

Based on the related work, a potential research gap is the limited application of association rule mining for fraud detection across all stakeholders in the insurance triangle (patients, physicians, and services). Additionally, integrating association rule mining with other techniques, such as unsupervised classifiers or ensemble methods, could further enhance the accuracy and effectiveness of fraud detection systems in the healthcare insurance domain.

Materials & methods

In this study, the Centers for Medicare and Medicaid Services (CMS) Linkable 2008-2010 Medicare Data Entrepreneurs’ Synthetic Public Use File (DE-synPUF) is utilized ( https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf ). The dataset contains the claims made by a random sample of five percent of Medicare beneficiaries from 2008 to 2010. The CMS made twenty random sample files available for researchers; the inpatient dataset from subsample 1 is utilized in this study. While nothing restricts the use of this one sample, or of multiple samples at the same time, studies have suggested that inpatient fraud may be more prevalent than outpatient fraud. One possible explanation is that inpatient care tends to be more expensive than outpatient care, so there is greater potential for fraudulent activity to generate large profits. Additionally, inpatient care may involve more complex procedures and treatments, which can be easier to over-bill or manipulate compared to simpler outpatient services. The selection of this particular sample for validating the proposed methodology was arbitrary, and in future more samples can be added to the dataset.

Fig. 3: Inpatient Claim extracted from Carrier Claims, Prescription Drug Events, Beneficiary Summary and Outpatient Claim of the DE-synPUF dataset

The selected sample consists of the following features. The beneficiary code (DESYNPUF_ID) identifies each beneficiary in the dataset, while the claim ID distinguishes claims for the same beneficiary. A record’s claim line section identifies its claim component. The start and end dates indicate the claim period. The provider institution is the medical facility that performed the service, and the claim payment amount is the total amount paid. Attending, operating, and other physician NPI numbers identify service providers. The inpatient admission and discharge dates show when the beneficiary was hospitalised. Diagnosis and procedure codes define illnesses and treatments. Lastly, the revenue centre HCFA common procedure coding system classifies the medical service. The attributes are presented graphically in Fig. 3 .

Baseline methods

The unsupervised baseline learning techniques used in this research include Apriori, Isolation Forest, One-Class SVM (OCSVM), Clustering-based Local Outlier Factor (CBLOF), and Empirical Cumulative Distribution-based Outlier Detection (ECOD). Apriori is a well-known algorithm for mining frequent itemsets and association rules, used to identify patterns and relationships between items in a dataset. Isolation Forest is a tree-based algorithm that partitions the dataset into isolated subspaces to detect anomalies and outliers. OCSVM is a support vector machine-based algorithm that creates a boundary around the normal data points and flags data points falling outside that boundary as anomalous; it is relevant for anomaly detection because it can identify outliers in datasets where only one class (normal instances) is predominantly represented. CBLOF is a clustering-based approach that uses k-means clustering to compute local outlier factors and identify anomalous clusters. ECOD estimates per-variable empirical distributions and flags data points that lie in the distribution tails.

Apriori algorithm

Agrawal and Srikant proposed the Apriori algorithm in 1994, which has become a widely used data mining algorithm for identifying frequent item sets in a transaction database [ 56 ]. In the field of association rule mining, the Apriori algorithm is recognized as one of the most well-known algorithms [ 57 ]. However, it may not be the optimal choice for detecting anomalies or fraudulent transactions in a database. This is because it is commonly assumed that fraudulent transactions are significantly fewer than normal ones. Therefore, when implementing Apriori, it is expected that the algorithm will generate rules based on normal transactions.

The Apriori algorithm performs association rule mining in two steps: the first step finds all frequently occurring item sets in the data, and the second step generates association rules from the set of frequent items [ 58 ].
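The two steps can be sketched in plain Python. This is a minimal illustration on a toy claims list (the item names are hypothetical), not the implementation used in the experiments:

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Step 1: return frequent itemsets (as frozensets) with their support."""
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    freq = {}
    while current:
        # Count support of each candidate itemset
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        freq.update(level)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        current = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    return freq

def rules(freq, min_conf=0.6):
    """Step 2: derive association rules A -> B from the frequent itemsets."""
    out = []
    for itemset, supp in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / freq[antecedent]
                if conf >= min_conf:
                    out.append((antecedent, itemset - antecedent, supp, conf))
    return out

claims = [{"dx250", "proc99"}, {"dx250", "proc99"}, {"dx250"}, {"proc99", "dx401"}]
freq = apriori(claims, min_support=0.5)
assoc = rules(freq, min_conf=0.6)
```

On this toy data the itemset {dx250, proc99} is frequent (support 0.5), and both directional rules between the two items pass the confidence threshold.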

Isolation forest

Isolation Forest was introduced by Liu et al. [ 33 ] in 2008. Generally, it is designed to detect anomalies in structured data. The iTree, or isolation tree, is a binary tree data structure in which each node corresponds to a subset of data objects. The tree is constructed by randomly sub-sampling a subset of n data objects from the entire dataset and using it as the data pool for the root node. The tree grows by recursively partitioning the data objects in a leaf node into two child nodes until a single data object remains in the node or the maximum depth limit is reached. The branching criterion for each data object is determined by comparing a randomly selected feature of the data object to a split value within the range of that feature’s values. The path length of a data object in the iTree serves as an indication of the object’s abnormality. An iForest, or isolation forest, is constructed by creating multiple iTrees, and the anomaly score of a data object is calculated by averaging the path lengths of that object across all iTrees in the forest. The final anomaly score is then normalized using a factor. A visual representation is shown in Fig. 4 .

Isolation Forest consists of two phases, training and testing. In training, the algorithm builds an ensemble of isolation trees, known as iTrees; each tree is built following Algorithm 1. By default, 100 iTrees are built in an iForest, but this number can be varied experimentally to obtain the best results.

Algorithm 1: Building a decision tree

Fig. 4: Isolation Forest

In the testing phase of the IF algorithm, each data point is passed through every built iTree to calculate its corresponding anomaly score a ( x ), ranging from 0 to 1. Labels are assigned based on each data point’s score: those with scores below 0.5 are classified as normal and receive a label of 1, while data points with scores closer to 1 are deemed potential anomalies and labeled -1.

Anomalies are detected through the anomaly score

\(s(x,m) = 2^{-\frac{E(h(x))}{c(m)}}\)

where h ( x ) is the path length of data point x in an iTree and E(h(x)) represents the expected or “average” value of this path length across all the Isolation Trees. The normalization constant c ( m ) is the average path length of an unsuccessful search in a binary search tree built from a sample of size m , given by

\(c(m) = 2H(m-1) - \frac{2(m-1)}{m}\)

where H is the harmonic number, which can be estimated by \(H(i)=\ln (i)+\gamma\) , where \(\gamma =0.5772156649\) is the Euler-Mascheroni constant.
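The training and scoring phases map directly onto scikit-learn's IsolationForest; the sketch below uses synthetic data and illustrative parameters (note that scikit-learn reports a shifted decision score rather than the raw a(x) in [0, 1], and labels points with predict as 1/-1):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # typical claims
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))   # extreme claims
X = np.vstack([normal, outliers])

# n_estimators=100 mirrors the default number of iTrees described above
clf = IsolationForest(n_estimators=100, random_state=42).fit(X)
labels = clf.predict(X)         # 1 = normal, -1 = anomaly
scores = -clf.score_samples(X)  # higher value means more anomalous
```

The five far-away points end up with the largest scores and are labeled -1, matching the description above.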

Cluster based local outlier factor

Cluster-Based Local Outlier Factor (CBLOF) was proposed by He et al. [ 39 ] in 2003. The CBLOF definition of anomalies takes into account both the local distances to neighbouring clusters and the sizes of the clusters to which the data points belong. The algorithm identifies as outliers the data points in small clusters lying next to a nearby large cluster. Local outliers may not be singular points but small groups of isolated points, as shown in Fig. 5 .

In general, the procedure of CBLOF can be described in three steps. Initially, each data point is assigned to one and only one cluster; K-means is commonly used as the clustering algorithm for CBLOF. Next, CBLOF ranks the clusters by size from large to small and computes the cumulative data counts. Clusters that together hold 90% of the data are considered “large” clusters; the rest are considered “small” clusters. The threshold of 0.9 can be fine-tuned as required. Lastly, the outlier detection process involves calculating the distance of a data point to a centroid and its corresponding outlier score. For data points belonging to a large cluster, the distance is the distance from the data point to the centroid of its own cluster, and the outlier score is the product of this distance and the number of data points in the cluster. For the smaller clusters, the distance is the distance from the data point to the centroid of the nearest large cluster, and the outlier score is the product of this distance and the size of the small cluster to which the data point belongs.
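The three steps can be sketched with scikit-learn's KMeans. PyOD ships a full CBLOF implementation; this simplified version (toy data, the 0.9 threshold exposed as `alpha`) only follows the description above:

```python
import numpy as np
from sklearn.cluster import KMeans

def cblof_scores(X, n_clusters=2, alpha=0.9, random_state=0):
    """Simplified CBLOF following the three steps described in the text."""
    # Step 1: assign every point to exactly one cluster (k-means)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit(X)
    labels, centers = km.labels_, km.cluster_centers_
    sizes = np.bincount(labels, minlength=n_clusters)

    # Step 2: rank clusters by size; clusters covering `alpha` of the data are "large"
    order = np.argsort(sizes)[::-1]
    cum = np.cumsum(sizes[order]) / len(X)
    large = set(order[: np.searchsorted(cum, alpha) + 1])

    # Step 3: score = cluster size * distance to own (or nearest large) centroid
    scores = np.empty(len(X))
    for i, (x, c) in enumerate(zip(X, labels)):
        if c in large:
            dist = np.linalg.norm(x - centers[c])
        else:
            dist = min(np.linalg.norm(x - centers[g]) for g in large)
        scores[i] = sizes[c] * dist
    return scores

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(30, 2)),   # one big "normal" cluster
               rng.normal(50, 1, size=(2, 2))])  # tiny isolated group
scores = cblof_scores(X, n_clusters=2)
```

The two isolated points fall into a small cluster, so their score is their distance to the large cluster's centroid times the small cluster's size, which dominates the scores of the inliers here.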

Fig. 5: Cluster-Based Local Outlier Factor

As can be seen in Fig. 5 , clusters A1 and B1 are the smaller clusters and A2 and B2 are the large clusters. A1 and B1 will be considered outliers as they do not belong to either of the large clusters A2 and B2. According to the local neighborhood, data in cluster A1 are local outliers relative to A2, and likewise B1 relative to B2.

One-class support vector machine

One-Class Support Vector Machine (OCSVM) is an unsupervised learning technique used for outlier detection that supports an incremental learning process. It is widely applied in anomaly detection tasks such as outlier detection and novelty detection. OCSVM is a single-class variant of the SVM that tries to find a hyper-sphere enclosing the instances of the normal class. The model classifies new data as normal or abnormal: all observations inside the hyper-sphere are normal, and those outside the hyper-sphere are abnormal, or anomalies.

Let us first examine the conventional two-class support vector machine. Consider a labelled data set \((x_1,y_1),(x_2,y_2),\dots ,(x_n,y_n)\) with points \(x_i \in \mathbb {R}^d\) , where \(x_i\) is the i -th input data point and \(y_i \in \{-1,1\}\) is the i -th output pattern, indicating the class membership.

A significant advantage of support vector machines (SVMs) is their capability of generating a non-linear decision boundary by transforming the data through a non-linear mapping \(\phi\) to a higher-dimensional feature space F . In this feature space, it may be possible to separate the classes with a hyperplane even if a linear boundary is not feasible in the original input space I ; projecting the hyperplane back yields a non-linear curve in the input space. For example, by utilizing a polynomial kernel for the projection, two-dimensional points are lifted into a third dimension where a hyperplane can be employed for separation. When the plane’s intersection with the space is projected back to the two-dimensional space, it results in a circular boundary.

The hyperplane that separates the classes in an SVM is represented by the equation \(w^Tx + b = 0\) , where w is a vector in the feature space F and b is a scalar in \(\mathbb {R}\) . The margin between the classes is determined by this hyperplane, with all data points belonging to class \(-1\) on one side and all data points belonging to class 1 on the other. The hyperplane aims to maximize the distance between the closest data points from each class to itself, thus achieving the maximum margin or “separating power.”

To address the issue of overfitting in the presence of noisy data, slack variables \(\xi _i\) are introduced to permit some data points to lie within the margin. The trade-off between maximizing the margin and accommodating training errors is controlled by the constant \(C > 0\) . The SVM classifier’s objective function is a minimization formulation that balances these factors.

According to Schölkopf et al. [ 59 ], the one-class SVM finds a hyperplane that separates all the data points from the origin in the feature space F while maximizing the distance from the hyperplane to the origin. This results in a binary function which returns +1 in a “smaller” region capturing the training data and -1 elsewhere.

By using Lagrange techniques and a kernel function for the dot product calculations, the decision function becomes

\(f(x) = \text {sgn}\left( \sum _{i} \alpha _i k(x_i, x) - \rho \right)\)

where \(\alpha _i\) are the Lagrange multipliers and \(\rho\) is the offset of the hyperplane.
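In practice the optimization and kernel trick are handled by libraries; a minimal sketch with scikit-learn's OneClassSVM on synthetic data (the RBF kernel and the \(\nu\) value are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 2))  # transactions assumed to be normal

# nu upper-bounds the fraction of training points treated as outliers;
# the RBF kernel plays the role of k(x_i, x) in the decision function
clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

X_new = np.array([[0.1, -0.2],   # near the training mass
                  [8.0, 8.0]])   # far outside the learned region
pred = clf.predict(X_new)        # +1 = normal region, -1 = anomaly
```

The point near the training mass is predicted +1, the distant one -1, matching the binary function described above.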

Empirical cumulative distribution based outlier detection

The Empirical Cumulative Distribution-based Outlier Detection (ECOD) method has several advantageous attributes that distinguish it from alternative algorithms. ECOD is unique in its lack of dependence on hyperparameters, its computational efficiency and swiftness, and its ease of interpretation and comprehension. The ECOD approach leverages information about the distribution of the data to identify points that deviate significantly from the majority, thus indicating their outlier status. The technique calculates the tail probability of each variable using univariate Empirical Cumulative Distribution Functions (ECDFs) and combines these probabilities through multiplication.

Detection of anomalies through ECOD is done through the computation of three values. ECDFs are used to generate the left- and right-tail probability values:

O-left = Sum of the negative log of the left-tail probability of every variable

O-right = Sum of the negative log of the right-tail probability of every variable

O-auto = Sum of the left- or right-tail probability of every variable, depending on whether it is left- or right-skewed

The final outlier score of an observation is obtained by taking the most extreme (maximum) of the aggregated negative log probability scores.

For the mathematically inclined, the following are simplified formulations of the three quantities described above:

\(O_{\text {left}}(x) = -\sum _{j=1}^{d} \log \hat{F}_j^{\text {left}}(x_j)\)

\(O_{\text {right}}(x) = -\sum _{j=1}^{d} \log \hat{F}_j^{\text {right}}(x_j)\)

\(O_{\text {auto}}(x) = -\sum _{j=1}^{d} \left[ \mathbb {1}\{\gamma _j < 0\} \log \hat{F}_j^{\text {left}}(x_j) + \mathbb {1}\{\gamma _j \ge 0\} \log \hat{F}_j^{\text {right}}(x_j) \right]\)

where \(\hat{F}_j^{\text {left}}\) and \(\hat{F}_j^{\text {right}}\) are the left- and right-tail ECDFs of variable j and \(\gamma _j\) is the skewness coefficient
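These computations can be sketched compactly with NumPy/SciPy. PyOD provides a reference ECOD implementation; this toy version only illustrates the three scores and their aggregation:

```python
import numpy as np
from scipy.stats import skew

def ecod_scores(X):
    """Toy ECOD: aggregate per-feature negative log tail probabilities."""
    n, d = X.shape
    # Empirical left/right tail probabilities of each point, per feature
    F_left = np.stack([(X[:, j][:, None] >= X[:, j][None, :]).mean(axis=1)
                       for j in range(d)], axis=1)
    F_right = np.stack([(X[:, j][:, None] <= X[:, j][None, :]).mean(axis=1)
                        for j in range(d)], axis=1)
    O_left = -np.log(F_left).sum(axis=1)
    O_right = -np.log(F_right).sum(axis=1)
    gamma = skew(X, axis=0)  # per-feature skewness coefficient
    O_auto = -np.where(gamma < 0, np.log(F_left), np.log(F_right)).sum(axis=1)
    # Final score: the most extreme of the three aggregated values
    return np.maximum.reduce([O_left, O_right, O_auto])

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])
scores = ecod_scores(X)
```

The appended point at (10, 10) sits in the extreme right tail of both features, so it receives the highest outlier score.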

Proposed methodology

The proposed methodology is designed based on two features of the healthcare ecosystem. First, since multiple entities are involved in a health insurance claim, including the service provider, beneficiary, service and claim, it is important to analyze a transaction in the context of the interactions among these entities. Second, rules provide context by showing how various factors interact in an ecosystem and consequently also show the expected behaviour of the system. Based on these two characteristics of health insurance claims, we use features from three entities, patient, provider and physician, as represented in the claims data, where each instance is a transaction with items corresponding to features of the three players. Apriori is used to mine association rules between features in the claims transactional data. These association rules indicate which features tend to co-occur frequently across instances. Features that are part of strong association rules are considered important or informative. Based on the association rules generated by Apriori, we filter out features that do not meet the support and confidence thresholds; features that are part of strong association rules with high support and confidence values are retained as selected features.

Fig. 6: Proposed Ensemble Methodology

Rules capture patterns and associations in the data that may not be evident when analyzing individual transactions. By identifying fraudulent rules from the set of all extracted rules, a broader understanding of how fraudsters manipulate the system is gained, potentially uncovering more comprehensive fraudulent schemes. Focusing on individual transactions can lead to a high rate of false positives, where legitimate transactions are wrongly flagged as fraudulent due to isolated anomalies. Identifying fraudulent rules allows for a more nuanced approach, reducing false alarms by considering patterns over multiple transactions. Fraudsters continuously evolve their tactics; identifying rules allows the fraud detection system to adapt to new fraud schemes by detecting changes in patterns and associations, even if the specific transactions involved differ.

A methodology is proposed, with reference to Fig. 6 , that begins with the Apriori association rule mining algorithm to derive a set of rules. During association rule mining, as shown in Fig. 7 , it is imperative to filter the mined association rules using statistical metrics such as support, confidence, and lift ratio. Support denotes the frequency with which an association rule manifests in the dataset, while confidence quantifies the reliability of a rule. The lift ratio gauges the strength of the association between the antecedents and consequents of the rule. Hence, association rules with support, confidence, and lift ratios falling below predefined thresholds are considered potential candidates for fraudulent rules. Subsequently, a classifier is employed to categorize these identified rules into fraudulent or non-fraudulent categories, applying unsupervised methodologies to the Apriori-generated rules.

Fig. 7: Association Rule Mining Process Flow Chart

In essence, our methodology integrates rule mining, statistical filtering, and machine learning-based classification to identify and distinguish potentially fraudulent rules within the dataset. This approach allows for a more refined and data-driven assessment of suspicious patterns, enhancing the efficiency and accuracy of fraud detection in complex healthcare insurance transactions.
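The statistical filtering step can be sketched as a simple predicate over mined rules. The threshold values and rule names below are placeholders, not those used in the experiments:

```python
def flag_candidate_rules(rules, min_support=0.01, min_confidence=0.6, min_lift=1.0):
    """Split mined rules into expected-behaviour rules and fraud candidates.

    `rules` is a list of dicts with 'support', 'confidence' and 'lift' keys;
    rules falling below any threshold are treated as potential fraud candidates.
    """
    expected, candidates = [], []
    for r in rules:
        if (r["support"] >= min_support
                and r["confidence"] >= min_confidence
                and r["lift"] >= min_lift):
            expected.append(r)
        else:
            candidates.append(r)
    return expected, candidates

mined = [
    {"rule": "dx250 -> proc99", "support": 0.20, "confidence": 0.90, "lift": 1.8},
    {"rule": "dx401 -> proc42", "support": 0.002, "confidence": 0.30, "lift": 0.7},
]
expected, candidates = flag_candidate_rules(mined)
```

The rare, low-confidence rule is routed to the candidate set, which is then handed to the unsupervised classifier for categorization.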

Evaluation metrics

The traditional evaluation methods used on the dataset are “error-based,” focusing primarily on minimizing the number and severity of mistakes in fraud prediction, such as false positives and false negatives. However, this approach is limited in its ability to capture the financial and operational implications of fraud detection efforts. The main problem with error-based approaches is that they do not account for the varying costs associated with different types of errors. For example, the cost of a false positive (wrongly flagging a legitimate claim as fraudulent) can be vastly different from that of a false negative (failing to detect an actual fraudulent claim). In a healthcare insurance context, the latter might lead to substantial financial losses and undermine the integrity of the insurance system.

Given these challenges, a cost-based evaluation metric becomes more applicable and relevant. This approach incorporates the financial impact of fraud detection decisions, prioritizing actions that save the most money or resources for the insurance provider. Coverage-based metrics align well with the cost-based approach in this context. Coverage reflects the proportion of fraudulent activities that the detection system can identify across the dataset. A high coverage rate means that the system can effectively identify a large portion of fraudulent claims, which is crucial for minimizing financial losses in health insurance fraud. This metric complements the cost-based approach by ensuring that the fraud detection efforts are not just accurate in terms of error minimization but are also comprehensive and financially prudent, addressing the most costly or impactful fraudulent activities first.

To align the coverage-based metric with our association rule mining algorithm, we use support, confidence, lift, and leverage, given in Equations 10-13, which evaluate the quality of the resulting rules separately. For the Apriori algorithm, support refers to the frequency of an itemset in the database, while confidence measures the strength of the association between two itemsets. We then define a cover score for each rule that captures the dependencies between rules, based on the coverage criterion given by Equation 14.

Here, Cover ( Rule ) measures the average distance between a rule and every other rule.
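The text does not specify which distance underlies the cover score, so the sketch below assumes Jaccard distance between the rules' item sets; the rules themselves are hypothetical placeholders.

```python
def jaccard_distance(r1, r2):
    """1 - |intersection| / |union| over the two rules' item sets."""
    return 1.0 - len(r1 & r2) / len(r1 | r2)

def cover(rule, rules):
    """Average distance between `rule` and every other mined rule."""
    others = [r for r in rules if r != rule]
    return sum(jaccard_distance(rule, o) for o in others) / len(others)

# Hypothetical mined rules, each represented by its item set.
rules = [
    {"DIAG_4019", "PAY_LOW"},
    {"DIAG_4019", "DAYS_SHORT"},
    {"DIAG_25000", "PAY_HIGH"},
]
scores = {tuple(sorted(r)): cover(r, rules) for r in rules}
```

Under this assumption, a rule sharing no items with any other rule gets the maximal cover of 1.0, while rules overlapping with their neighbours score lower.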

To evaluate the results of the unsupervised techniques, including Isolation Forest, CBLOF, ECOD, and OCSVM, several validity metrics have been proposed, of which the most popular is the Silhouette Score [ 60 ]. The silhouette coefficient is calculated for each data point by taking into account the mean intra-cluster distance a and the mean nearest-cluster distance b, i.e. \((b - a)/max(a, b)\) [ 61 ]. A silhouette score near +1 indicates a correct cluster assignment, near 0 suggests a possible alternative cluster, and near -1 indicates a wrong cluster assignment.
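A minimal, self-contained version of the per-point silhouette computation described above, assuming Euclidean distance and clusters with at least two points each:

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient (b - a) / max(a, b); every cluster
    must contain at least two points."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster
        same = [q for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(math.dist(p, q) for q in same) / len(same)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated clusters should score close to +1.
score = silhouette([(0, 0), (0, 1), (10, 10), (10, 11)], [0, 0, 1, 1])
```

In practice a library routine such as scikit-learn's `silhouette_score` would be used; this sketch only makes the \((b - a)/max(a, b)\) definition concrete.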

Results and discussion

This section presents the results and discussion of research on healthcare insurance fraud detection using data mining techniques. The study utilized the open-source CMS 2008-2010 DE-SynPUF dataset, which was preprocessed by removing less important features and encoding the data.

Descriptive analysis

The descriptive analysis allows us to identify patterns, trends, and relationships in the data, which assists in drawing important conclusions and making informed decisions. The dataset consists of 66,773 insurance claim records. To streamline the analysis, features related to the Healthcare Common Procedure Coding System (HCPCS) are excluded. These codes represent procedures, supplies, products, and services that may be provided to Medicare beneficiaries and individuals enrolled in private health insurance programs. Removing these features keeps the focus on the most relevant and informative features in the inpatient dataset.

As shown in Fig. 8 , the dataset contains 2675 unique provider institutions, with 50% of them occurring fewer than 10 times in the complete dataset. The provider institution “23006G” occurred in 772 records, the 20 most frequently occurring provider institutions share a count of 7524 transactions, and 209 provider institutions were seen only once in the complete dataset. The dataset thus contains a large number of unique provider institutions, the majority of which occur very few times, while a small number occur frequently, with the top 20 accounting for a significant proportion of the transactions. This information can be used to inform further analysis of the dataset, such as identifying outliers or patterns in the data.
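Occurrence statistics like these come down to a frequency table over the provider column. In the sketch below, only "23006G" is an identifier from the text; the other provider ids and counts are placeholders.

```python
from collections import Counter

# Hypothetical provider-institution column; only "23006G" is a real
# identifier mentioned in the text.
providers = ["23006G"] * 5 + ["P1"] * 3 + ["P2"] * 2 + ["P3", "P4", "P5"]
counts = Counter(providers)

top = counts.most_common(20)                            # most frequent first
singletons = [p for p, c in counts.items() if c == 1]   # seen exactly once
share_rare = sum(c < 10 for c in counts.values()) / len(counts)
```

Against the real claims table, the same three quantities reproduce the figures reported above: top institutions, one-off institutions, and the share occurring fewer than 10 times.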

figure 8

Provider Institutions Occurrence in Entire Dataset

Figure 9 shows 16,670 unique attending physicians in the dataset, while 75% of the physicians appear only once or twice. The attending physician with id ‘9011551271’ appears in 533 transactions, and the 20 most frequently appearing attending physicians share 5675 transactions. This suggests a large degree of variation in the frequency of attending physicians in the DE-SynPUF dataset: while a small number of physicians occur frequently, the majority occur infrequently, which may have implications for analysis of the data.

figure 9

Attending Physicians Occurrences in Entire Dataset

Figure 10 shows the occurrence of the top 20 operating physicians. The term operating physician refers to a physician (e.g., a surgeon) who performs an operative procedure in the medical centre and who has the responsibilities outlined in the medical staff rules and regulations. The dataset contains 12,076 unique operating physicians, while 75% of the physicians appear only once or twice. The operating physician with id ‘9612910514’ appears 324 times, the highest occurrence, and the 30 most frequent operating physicians share 4377 transactions. Again, while a small number of physicians occur frequently, the majority occur infrequently, which may have implications for analysis of the data.

figure 10

Operating Physicians Occurrences in Entire Dataset

Upon comparing the features of attending physicians and operating physicians, it can be seen in Fig. 11 that 26.7% of the physicians were found in both features.

figure 11

Unique & Common Physicians in Attending Physicians and Operating Physicians

In terms of features related to diagnosis codes, the dataset contains 5357 unique diagnosis codes, 50% of which appear in fewer than seven transactions. The diagnosis codes follow the ICD-9 coding scheme. Diagnosis code ‘4019’ appears 23,512 times and refers to hypertension, also known as high blood pressure. The second most frequently occurring diagnosis code is ‘25000’, commonly known as diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled. Figure 12 shows the 20 most frequent diagnoses in the transactions.

figure 12

All Diagnosis Codes With Occurrences in the Dataset, 4019 Occurred the Most

The procedure codes are also compared to the diagnosis codes. In some transactions, the procedure codes in Fig. 13 are the same as the diagnosis codes in Fig. 12 . A detailed breakdown of the results of this analysis can be found in Table 2 . Except for the feature Procedure_Code_1, all of the other features have up to 35% of the same codes as the diagnosis codes.

figure 13

All Procedure Codes with Occurrences in Dataset

Figure 14 provides an overall summary of the common codes between diagnosis and procedure codes. The feature Procedure_Code_1 contains more than 95% procedure codes, while the remaining features contain only around 50% and also include diagnosis codes.

figure 14

Codes Comparison Summary

Preprocessing

Healthcare insurance fraud is a widespread problem and can be perpetrated through various means, including upcoding, misrepresenting procedures to obtain payment for non-covered services, overbilling, waiving patient copays or deductibles, and forging or altering medical bills or receipts. Identity theft is also a common way to commit health insurance fraud [ 62 , 63 ]. Insurance fraud is often performed through a partnership of the service provider, patients, and hospital. Fraudsters exploit the technicalities of billing, unnecessary treatments, and unnecessary procedures in order to obtain unjust benefits.

For this study, nine features are identified based on their relevance to the research question and their potential to provide insight into the relationships or patterns of interest. For example, Provider_Institution is relevant for understanding the quality of care provided to beneficiaries, while Claim_Payment_Amount is important for investigating the financial implications of Medicare claims. Attending_Physician, Operating_Physician, and Other_Physician are useful in identifying patterns of physician involvement in care, while Claim_Admitting_Diagnosis_Code, Claim_Day_Spent, Claim_Diagnosis_Related_Group_Code, and Claim_Procedure_Code_1 provide insight into the types of medical conditions and procedures that are most common among Medicare beneficiaries. Ultimately, the selection of these features is based on their potential to answer the research question.

The inpatient dataset of DE-SynPUF contains 10 features for beneficiary diagnosis, which are excluded since the dataset also contains Claim_Admitting_Diagnosis_Code, which indicates the beneficiary’s initial diagnosis at the time of admission. Mostly the claim is made through this one feature; the remaining diagnosis codes are mainly used for secondary conditions. Similarly, there are six procedure-code features, but only Claim_Procedure_Code_1 is used; the remaining procedure-code features contain the same codes as the diagnosis codes. After selecting the features, the values within every feature were labelled in a way that helps distinguish the codes after rule generation through association rule mining, as shown in Table 3 .
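The labelling idea can be sketched as follows. The feature tags, payment bins, and helper names here are assumptions for illustration; the actual mapping is defined in Table 3 of the paper.

```python
# Hypothetical record; the actual feature labelling scheme is in Table 3.
record = {
    "Claim_Admitting_Diagnosis_Code": "4019",
    "Claim_Procedure_Code_1": "3995",
    "Claim_Payment_Amount": 4200.0,
}

def bucket_payment(amount):
    """Discretise the payment so it can appear as a rule item (assumed bins)."""
    if amount < 5000:
        return "PAY_LT_5000"
    if amount < 10000:
        return "PAY_5000_10000"
    return "PAY_GTE_10000"

def encode(record):
    """Prefix each value with a feature tag so identical codes stay
    distinguishable inside mined rules (a diagnosis '4019' is not the
    same item as a procedure '4019')."""
    items = set()
    for feature, value in record.items():
        if feature == "Claim_Payment_Amount":
            items.add(bucket_payment(value))
        elif "Diagnosis" in feature:
            items.add(f"DIAG_{value}")
        else:
            items.add(f"PROC_{value}")
    return items

encoded = encode(record)
```

Each claim then becomes a set of tagged items, which is the transaction form that Apriori consumes.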

Findings and interpretations

Two experiments are conducted in this study. Initially, all baseline anomaly detection techniques are applied to the preprocessed data. This approach was time-intensive, as presented in Table 4 . In the second experiment, frequent rules are mined using association rule mining, specifically the Apriori algorithm, and the unsupervised techniques are then applied to the extracted rules. The time consumed using this approach is presented in Table 5 . The comparison of the two approaches shows that our approach performs better even when using 100% of the dataset. The difference in time arises because the conventional approach must repeat the experiment each time a new transaction is added to the database, whereas the proposed approach extracts rules from the transactions once, training our model to classify new transactions as fraudulent or non-fraudulent.

The Apriori association rule mining algorithm, when applied to the preprocessed dataset, results in 72 rules that frequently appear together in the CMS 2008-2010 DE-SynPUF dataset, presented in Table 6 . Association rule mining seeks high-confidence rules: confidence measures the strength of the association between two itemsets, while support measures their frequency in the database. To evaluate the rules generated through Apriori association rule mining, the coverage score is calculated for every rule.

The rules only give information about the itemsets appearing together in transactions; they do not identify the nature of a rule, that is, whether it is normal or fraudulent. To identify the nature of the generated rules, the Isolation Forest algorithm is applied to them. Isolation Forest works by creating random decision trees to isolate anomalous points from normal points in the dataset. The algorithm initially identified 14 fraudulent rules in the DE-SynPUF dataset. However, due to the sensitive nature of healthcare and financial transactions, three additional unsupervised algorithms, CBLOF, ECOD, and OCSVM, are applied to obtain more reliable, weighted results.

As a result, the CBLOF, ECOD, and OCSVM algorithms identified 8, 4, and 8 fraudulent rules, respectively, as shown in Fig. 15 . In total, 52 of the 72 rules were marked as normal by all of the algorithms, while 20 rules were marked as fraudulent by one or more algorithms.

figure 15

Fraudulent Classification Through Classifiers

The results of our analysis are presented in Table 7 , which shows the classification of rules according to the number of algorithms that classified them as fraudulent. The table indicates that 52 rules were classified as normal, while 10 were classified as fraudulent by one algorithm, 6 were classified as fraudulent by two algorithms, and 4 were classified as fraudulent by three algorithms. No rules were classified as fraudulent by all four algorithms.
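The k-of-4 tallying above amounts to counting detector votes per rule; a minimal sketch, with flag values that are illustrative rather than the paper's actual outputs:

```python
# Binary flags per rule from the four detectors (1 = fraudulent);
# the values below are illustrative, not the paper's actual outputs.
flags = {
    "rule_01": {"iforest": 0, "cblof": 0, "ecod": 1, "ocsvm": 1},
    "rule_02": {"iforest": 0, "cblof": 0, "ecod": 1, "ocsvm": 0},
    "rule_03": {"iforest": 0, "cblof": 0, "ecod": 0, "ocsvm": 0},
}

# A rule is "normal" only when no detector flags it; otherwise it is
# fraudulent by k-of-4 detectors, as tallied in Table 7.
votes = {rule: sum(f.values()) for rule, f in flags.items()}
normal = [rule for rule, k in votes.items() if k == 0]
flagged_by_two_or_more = [rule for rule, k in votes.items() if k >= 2]
```

Grouping rules by their vote count directly yields the 52 / 10 / 6 / 4 breakdown reported in the table.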

These findings suggest that a combination of association rule mining and unsupervised classifiers achieves more reliable results in detecting healthcare insurance fraud. Detailed results for the transactions can be seen in Table 8 . For example, the first rule states that if the diagnosis code is 7802 and the patient pay is less than $5000 USD, the claim is classified as fraudulent (1) by OCSVM and ECOD, while it is classified as normal (0) by the Isolation Forest and CBLOF detectors. The second rule states that if the procedure code is 3995 and the patient pay is more than $5000 USD and less than $10,000 USD, the claim is classified as normal (0) by all detectors except ECOD. The same table shows each rule’s grammar along with the fraudulent status assigned by the selected anomaly detection techniques.

The silhouette score method is then applied to evaluate the effectiveness of the four anomaly detection techniques: Isolation Forest, CBLOF, ECOD, and OCSVM. The result of this evaluation is presented in Table 9 . The silhouette scores for each technique are listed in the “Scores” column, while the “Classifier” column specifies the name of the anomaly detection technique used. As can be seen, the CBLOF technique has the highest silhouette score of 0.114, followed by Isolation Forest with a score of 0.103. The ECOD and OCSVM techniques have lower scores of 0.063 and 0.060, respectively.

When comparing our study to existing work on the same dataset, such as that listed in Table 1 , it is crucial to understand that the work presented in these studies is not transaction-based and therefore cannot be compared at the transactional level for fraud detection. In addition, the feature engineering applied heavily relies on the available data sources, which limits the range of covariates linked to their single target variable, provider fraud, across different domains. For instance, while provider specialty is a common covariate in analyzing professional claims, it is often overlooked in prescription claims analysis for different types of pharmacies. Our proposed approach adapts to the available features across various players and data sources within a domain. Our methodology is also capable of “graceful degradation,” meaning it continues to function when some data or variables are absent. Furthermore, these studies typically start with aggregated data before applying predictive algorithms, leaving the specifics of how certain claim components are aggregated rather vague; for instance, the method for aggregating variables such as the quantity dispensed and the compounding details in prescription claims is not well-defined. Lastly, the evaluation methods used are “error-based” instead of “cost-based”. Our presented approach extracts patterns and insights from disaggregated claims data, enhancing the current fraud literature and presenting a methodology evaluated on a cost-based metric (coverage), making it suitable for practical investigative use.

There are, however, certain limitations in the presented work regarding the diversity of the dataset used for healthcare insurance fraud detection. While the dataset provided valuable insights into fraudulent patterns among patients, physicians, and services, it may not have fully captured the diversity of fraud scenarios prevalent in real-world settings. This limitation could lead to biased or incomplete fraud detection models, as certain types of fraud or unique patterns may not have been adequately represented in the dataset. To address this limitation, future work will focus on exploring techniques such as data augmentation to enrich the existing dataset. This could involve generating synthetic data points or incorporating external data sources to introduce more variability and complexity into the dataset.

The findings demonstrate the effectiveness of data mining techniques for healthcare insurance fraud detection that can have important implications for fraud prevention efforts in the healthcare industry. Further research can explore the temporal aspect of fraud patterns by conducting a thorough temporal analysis. This involves examining historical data to identify trends and changes in fraudulent behavior over time. By understanding how fraud patterns evolve and adapt, researchers can develop dynamic fraud detection models that can effectively detect emerging fraud schemes.

The complexity and substantial monetary value of the healthcare industry make it a desirable target for fraudulent activity. Due to the growing elderly population, healthcare insurance has been a consistent focus. The Centers for Medicare & Medicaid Services (CMS) and other organizations work ceaselessly to reduce fraudulent operations. Despite the issue’s longevity, the use of publicly accessible healthcare insurance data to identify and prevent potential fraudulent actions is a recent development. Effective machine learning solutions can drastically minimize fraudulent occurrences and the resources necessary to investigate probable fraud cases.

In this study, a methodology combining pattern recognition through association rule mining with unsupervised learning techniques is presented for detecting healthcare insurance fraud. The Apriori association rule mining technique is used, which has not previously been applied to the CMS 2008-2010 DE-SynPUF dataset. The rules obtained are then provided to the anomaly detection algorithms Isolation Forest, OCSVM, ECOD, and CBLOF. After combining all results, 20 rules are classified as fraudulent by one or more algorithms, and 52 are marked as normal.

The presented study shows promising results in detecting healthcare insurance fraud through the identified methodology and provides a strong foundation for future research on detecting healthcare insurance fraud using unsupervised learning techniques. Work will continue on improving performance and developing a more comprehensive and effective framework for detecting fraudulent activities in healthcare insurance datasets.

Availability of data and materials

No datasets were generated or analysed during the current study.

Government of Pakistan. Introduction, Sehat Sahulat Program. 2019. https://sehatinsafcard.com/introduction.php . Accessed January 2023.

Government of Pakistan. Benefits Package. 2019. https://sehatinsafcard.com/benefits.php . Accessed January 2023.

Government of United States. Centers for Medicare and Medicaid Services. 1965. https://www.medicare.gov/ . Accessed January 2023.

Gee J, Button M, Brooks G. The financial cost of healthcare fraud: what data from around the world shows. 2010.

Berwick DM, Hackbarth AD. Eliminating waste in US health care. JAMA. 2012;307(14):1513–6.


M King K. Progress Made, but More Action Needed to Address Medicare Fraud, Waste, and Abuse. 2014. https://www.gao.gov/assets/gao-14-560t.pdf . Accessed January 2023.

Barrett P. Global Claims Fraud Survey. 2017. https://www.rgare.com/docs/default-source/knowledge-center-articles/rga-2017-global-claims-fraud-survey-white-paper---final.pdf?sfvrsn=601a588_0 . Accessed January 2023.

Miller A. Health and hard time. Can Med Assoc; 2013.

Hansson A, Cedervall H. Insurance Fraud Detection using Unsupervised Sequential Anomaly Detection. 2022.

Hayat MK, Daud A, Banjar A, Alharbey R, Bukhari A. A deep co-evolution architecture for anomaly detection in dynamic networks. Multimed Tools Appl. 2023:1–20.

Hayat MK, Daud A. Anomaly detection in heterogeneous bibliographic information networks using co-evolution pattern mining. Scientometrics. 2017;113(1):149–75.


Gomes C, Jin Z, Yang H. Insurance fraud detection with unsupervised deep learning. J Risk Insur. 2021;88(3):591–624.


Matloob I, Khan S, ur Rahman H, Hussain F. Medical health benefit management system for real-time notification of fraud using historical medical records. Appl Sci. 2020;10(15):5144.

Lu J, Lin K, Chen R, Lin M, Chen X, Lu P. Health insurance fraud detection by using an attributed heterogeneous information network with a hierarchical attention mechanism. BMC Med Inform Decis Mak. 2023;23(1):1–17.

Masood I, Wang Y, Daud A, Aljohani NR, Dawood H. Towards smart healthcare: patient data privacy and security in sensor-cloud infrastructure. Wirel Commun Mob Comput. 2018;2018:1–23.

Benedek B, Ciumas C, Nagy BZ. Automobile insurance fraud detection in the age of big data–a systematic and comprehensive literature review. J Financ Regul Compliance. 2022.

Yadav C, Wang S, Kumar M. An approach to improve apriori algorithm based on association rule mining. In: 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT). IEEE; 2013. p. 1–9.

Kareem S, Ahmad RB, Sarlan AB. Framework for the identification of fraudulent health insurance claims using association rule mining. In: 2017 IEEE Conference on Big Data and Analytics (ICBDA). IEEE; 2017. p. 99–104.

Sornalakshmi M, Balamurali S, Venkatesulu M, Krishnan MN, Ramasamy LK, Kadry S, et al. An efficient apriori algorithm for frequent pattern mining using mapreduce in healthcare data. Bull Electr Eng Inform. 2021;10(1):390–403.

Abdullah U, Ahmad J, Ahmed A. Analysis of effectiveness of apriori algorithm in medical billing data mining. In: 2008 4th International Conference on Emerging Technologies. IEEE; 2008. p. 327–331.

Thornton D, van Capelleveen G, Poel M, van Hillegersberg J, Mueller RM. Outlier-based Health Insurance Fraud Detection for US Medicaid Data. In: ICEIS (2). 2014. p. 684–694.

Feroze A, Daud A, Amjad T, Hayat MK. Group anomaly detection: past notions, present insights, and future prospects. SN Comput Sci. 2021;2:1–27.

Kirlidog M, Asuk C. A Fraud Detection Approach with Data Mining in Health Insurance. Procedia Soc Behav Sci. 2012;62:989–94. https://doi.org/10.1016/j.sbspro.2012.09.168 . World Conference on Business, Economics and Management (BEM-2012), May 4–6 2012, Antalya, Turkey.

Gao Y, Sun C, Li R, Li Q, Cui L, Gong B. An Efficient Fraud Identification Method Combining Manifold Learning and Outliers Detection in Mobile Healthcare Services. IEEE Access. 2018;6:60059–68. https://doi.org/10.1109/ACCESS.2018.2875516 .

Alwan RH, Hamad MM, Dawood OA. A comprehensive survey of fraud detection methods in credit card based on data mining techniques. In: AIP Conference Proceedings. vol. 2400. AIP Publishing LLC; 2022. p. 020006.

Shang W, Zeng P, Wan M, Li L, An P. Intrusion detection algorithm based on OCSVM in industrial control system. Secur Commun Netw. 2016;9(10):1040–9.

Maglaras LA, Jiang J, Cruz T. Integrated OCSVM mechanism for intrusion detection in SCADA systems. Electron Lett. 2014;50(25):1935–6.

Ghiasi R, Khan MA, Sorrentino D, Diaine C, Malekjafarian A. An unsupervised anomaly detection framework for onboard monitoring of railway track geometrical defects using one-class support vector machine. Eng Appl Artif Intell. 2024;133:108167.

Maglaras LA, Jiang J, Cruz TJ. Combining ensemble methods and social network metrics for improving accuracy of OCSVM on intrusion detection in SCADA systems. J Inf Secur Appl. 2016;30:15–26. https://doi.org/10.1016/j.jisa.2016.04.002 .

Maglaras LA, Jiang J. Ocsvm model combined with k-means recursive clustering for intrusion detection in scada systems. In: 10th International conference on heterogeneous networking for quality, reliability, security and robustness. IEEE; 2014. p. 133–134.

Wang Z, Fu Y, Song C, Zeng P, Qiao L. Power system anomaly detection based on OCSVM optimized by improved particle swarm optimization. IEEE Access. 2019;7:181580–8.

Amer M, Goldstein M, Abdennadher S. Enhancing one-class support vector machines for unsupervised anomaly detection. In: Proceedings of the ACM SIGKDD workshop on outlier detection and description. 2013. p. 8–15.

Liu FT, Ting KM, Zhou ZH. Isolation Forest. In: 2008 Eighth IEEE International Conference on Data Mining. 2008. p. 413–422. https://doi.org/10.1109/ICDM.2008.17 .

Xu D, Wang Y, Meng Y, Zhang Z. An improved data anomaly detection method based on isolation forest. In: 2017 10th international symposium on computational intelligence and design (ISCID). vol. 2. IEEE; 2017. p. 287–91.

Cheng Z, Zou C, Dong J. Outlier detection using isolation forest and local outlier factor. In: Proceedings of the conference on research in adaptive and convergent systems. 2019. p. 161–168.

Ding Z, Fei M. An anomaly detection approach based on isolation forest algorithm for streaming data using sliding window. IFAC Proc. 2013;46(20):12–7.


Lesouple J, Baudoin C, Spigai M, Tourneret JY. Generalized isolation forest for anomaly detection. Pattern Recogn Lett. 2021;149:109–19.

Suesserman M, Gorny S, Lasaga D, Helms J, Olson D, Bowen E, et al. Procedure code overutilization detection from healthcare claims using unsupervised deep learning methods. BMC Med Inform Decis Mak. 2023;23(1):196.


He Z, Xu X, Deng S. Discovering cluster-based local outliers. Pattern Recogn Lett. 2003;24(9):1641–50. https://doi.org/10.1016/S0167-8655(03)00003-5 .

John H, Naaz S. Credit Card Fraud Detection using Local Outlier Factor and Isolation Forest. Int J Comput Sci Eng. 2019;7:1060–1064. https://doi.org/10.26438/ijcse/v7i4.10601064 .

Kanyama MN, Nyirenda C, Clement-Temaneh N. Anomaly Detection in Smart Water metering Networks. In: The 5th International Workshop on Advanced Computational Intelligence and Intelligent Informatics (IWACIII2017). 2017. p. 1–10.

Ullah I, Hussain H, Rahman S, Rahman A, Shabir M, Ullah N, et al. Using K-Means, LOF, and CBLOF as Prediction Tools.

Ullah I, Hussain H, Ali I, Liaquat A. Churn prediction in banking system using K-means, LOF, and CBLOF. In: 2019 International conference on electrical, communication, and computer engineering (ICECCE). IEEE; 2019. p. 1–6.

Bauder R, Khoshgoftaar T. Medicare fraud detection using random forest with class imbalanced big data. Proceedings-2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018, 80–87. 2018.

Bauder RA, Khoshgoftaar TM. The detection of medicare fraud using machine learning methods with excluded provider labels. In: The Thirty-First International Flairs Conference. 2018.

Herland M, Khoshgoftaar TM, Bauder RA. Big data fraud detection using multiple medicare data sources. J Big Data. 2018;5(1):1–21.

Herland M, Bauder RA, Khoshgoftaar TM. The effects of class rarity on the evaluation of supervised healthcare fraud detection models. J Big Data. 2019;6:1–33.

Fan B, Zhang X, Fan W. In: Identifying Physician Fraud in Healthcare with Open Data. 2019. p. 222–235. https://doi.org/10.1007/978-3-030-34482-5_20 .

Fulton LV, Adepoju OE, Dolezel D, Ekin T, Gibbs D, Hewitt B, et al. Determinants of diabetes disease management, 2011–2019. In: Healthcare. vol. 9. MDPI; 2021. p. 944.

Sadiq S, Tao Y, Yan Y, Shyu ML. Mining anomalies in medicare big data using patient rule induction method. In: 2017 IEEE third international conference on multimedia Big Data (BigMM). IEEE; 2017. p. 185–92.

Sadiq S, Shyu ML. Cascaded propensity matched fraud miner: Detecting anomalies in medicare big data. J Innov Technol. 2019;1(1):51–61.

Zafari B, Ekin T. Topic modelling for medical prescription fraud and abuse detection. J R Stat Soc Ser C Appl Stat. 2019;68(3):751–69.

Ekin T, Lakomski G, Musal RM. An unsupervised Bayesian hierarchical method for medical fraud assessment. Stat Anal Data Min ASA Data Sci J. 2019;12(2):116–24.

US Department of Health and Human Services, Office of Inspector General (OIG). LEIE Downloadable Databases. https://oig.hhs.gov/exclusions/exclusions_list.asp . Accessed January 2023.

Pande V, Maas W. Physician Medicare fraud: Characteristics and consequences. Int J Pharm Healthc Mark. 2013;7. https://doi.org/10.1108/17506121311315391 .

Agrawal R, Srikant R, et al. Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB. vol. 1215. Santiago; 1994. p. 487–499.

Liu X, Zhao Y, Sun M. An improved apriori algorithm based on an evolution-communication tissue-like P system with promoters and inhibitors. Discret Dyn Nat Soc. 2017;2017.

Santoso MH. Application of Association Rule Method Using Apriori Algorithm to Find Sales Patterns Case Study of Indomaret Tanjung Anom. Brilliance Res Artif Intell. 2021;1(2):54–66.

Schölkopf B, Williamson RC, Smola A, Shawe-Taylor J, Platt J. Support vector method for novelty detection. Adv Neural Inf Process Syst. 1999;12.

Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7 .

Shahapure KR, Nicholas C. Cluster quality analysis using silhouette score. In: 2020 IEEE 7th international conference on data science and advanced analytics (DSAA). IEEE; 2020. p. 747–8.

Government of United States. Health Care Fraud. FBI; 2016. https://www.fbi.gov/investigate/white-collar-crime/health-care-fraud . Accessed January 2023.

State of Michigan. What is Health Insurance Fraud? https://www.michigan.gov/difs/consumers/fraud/what-is-health-insurance-fraud . Accessed January 2023.


Acknowledgements

Not applicable.

Institutional review board

Not applicable; the study did not involve humans or animals.

Informed consent

Not applicable; the study did not involve humans.

Funding

This research is funded by Rabdan Academy, Abu Dhabi, United Arab Emirates.

Author information

Zain Hamid, Fatima Khalique, Saba Mahmood, Ali Daud, Amal Bukhari and Bader Alshemaimri contributed equally to this work.

Authors and Affiliations

Department of Computer Science, Bahria University, Islamabad, Pakistan

Zain Hamid, Fatima Khalique & Saba Mahmood

Faculty of Resilience, Rabdan Academy, Abu Dhabi, United Arab Emirates

Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia

Amal Bukhari

Software Engineering Department, College of Computing and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Bader Alshemaimri


Contributions

Zain, Fatima and Saba wrote a major part of the paper under the supervision of Saba and Ali. Ali, Bader and Amal have helped design and improve the methodology and improved the paper initial draft with Zain and Fatima. Bader and Amal have helped in improving the paper sections, such as, review methodology, datasets, performance evaluation and challenges and future directions. Ali and Amal have improved the technical writing of the paper overall. All authors are involved in revising the manuscript critically and have approved the final version of the manuscript.

Corresponding author

Correspondence to Ali Daud .

Ethics declarations

Ethics approval and consent to participate, consent for publication.

All authors agreed to publish this work.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Hamid, Z., Khalique, F., Mahmood, S. et al. Healthcare insurance fraud detection using data mining. BMC Med Inform Decis Mak 24 , 112 (2024). https://doi.org/10.1186/s12911-024-02512-4


Received : 03 December 2023

Accepted : 15 April 2024

Published : 26 April 2024

DOI : https://doi.org/10.1186/s12911-024-02512-4


Keywords

  • Unsupervised learning
  • Healthcare insurance
  • Healthcare insurance frauds
  • Association rules mining techniques

BMC Medical Informatics and Decision Making

ISSN: 1472-6947


    Automobile insurance fraud detection in the age of big data-a systematic and comprehensive literature review. J Financ Regul Compliance. 2022. Yadav C, Wang S, Kumar M. An approach to improve apriori algorithm based on association rule mining. ... John H, Naaz S. Credit Card Fraud Detection using Local Outlier Factor and Isolation Forest. Int ...