• Open access
  • Published: 08 November 2016

A systematic literature review of open source software quality assessment models

  • Adewole Adewumi 1 ,
  • Sanjay Misra   ORCID: orcid.org/0000-0002-3556-9331 1 , 2 ,
  • Nicholas Omoregbe 1 ,
  • Broderick Crawford 3 &
  • Ricardo Soto 3  

SpringerPlus volume 5, Article number: 1936 (2016)


Many open source software (OSS) quality assessment models have been proposed in the literature. However, there is little or no adoption of these models in practice. In order to guide the formulation of newer models so that they are acceptable to practitioners, there is a need to clearly discriminate among the existing models on the basis of their specific properties. The aim of this study is therefore to perform a systematic literature review that investigates the properties of the existing OSS quality assessment models by classifying them with respect to their quality characteristics, the methodology they use for assessment, and their domain of application, so as to guide the formulation and development of newer models. Searches in IEEE Xplore, ACM, Science Direct, Springer and Google Search were performed to retrieve all relevant primary studies. Journal and conference papers published between 2003 and 2015 were considered, since the first known OSS quality model emerged in 2003.

A total of 19 OSS quality assessment model papers were selected. To select these papers, we developed assessment criteria to evaluate the quality of the existing studies. The quality assessment models are classified into five categories based on the quality characteristics they possess, namely: single-attribute, rounded category, community-only attribute, non-community attribute and non-quality in use models. Our study shows that software selection based on hierarchical structures is the most popular selection method in the existing OSS quality assessment models. Furthermore, we found that the majority (47%) of the existing models do not specify any domain of application.

Conclusions

In conclusion, our study is a valuable contribution to the community: it can help quality assessment model developers in formulating newer models and practitioners (software evaluators) in selecting suitable OSS in the midst of alternatives.

Prior to the emergence of open source software (OSS) quality models, the McCall, Dromey and ISO 9126 models were already in existence (Miguel et al. 2014). These models, however, did not consider some quality attributes unique to OSS, such as community, that is, a body of users and developers formed around an OSS project who contribute to the software and popularize it (Haaland et al. 2010). This gap is what led to the evolution of OSS quality models. The majority of the OSS quality models that exist today are derived from the ISO 9126 quality model (Miguel et al. 2014; Adewumi et al. 2013), which defines six internal and external quality characteristics: functionality, reliability, usability, efficiency, maintainability and portability. ISO 25010 replaced ISO 9126 (ISO/IEC 9126 2001) in 2010; it defines the following product quality attributes (ISO/IEC 25010 2010): functional suitability, reliability, performance efficiency, operability, security, compatibility, maintainability and transferability. The ISO 25010 quality in use attributes include effectiveness, efficiency, satisfaction, safety and usability.

It is important to note that ISO 25010 can serve as a standard for OSS only in terms of product quality and quality in use. It does not address unique characteristics of OSS such as the community. A key distinguishing feature of OSS is that it is built and maintained by a community (Haaland et al. 2010). The quality of this community also determines the quality of the OSS (Samoladas et al. 2008). From the literature, community related quality characteristics include (Soto and Ciolkowski 2009): maintenance capacity, sustainability and process maturity. Maintenance capacity refers to the number of contributors to an OSS project and the amount of time they are willing and able to contribute to the development effort, as observed from versioning logs, mailing lists, discussion forums and bug report systems. Furthermore, sustainability refers to the ability of the community to grow in terms of new contributors and to regenerate by attracting and engaging new members to take the place of those leaving the community. In addition, process maturity refers to the adoption and use of standard practices in the development process, such as submission and review of changes, peer review of changes, provision of a test suite, and planned releases.

Since the advent of the first OSS quality model in 2003 (Adewumi et al. 2013), a number of other models have been derived, leading to a growing collection of OSS quality models. Quality models in general can be classified into three broad categories, namely definition, assessment and prediction models (Ouhbi et al. 2014, 2015; Deissenboeck et al. 2009). Generally, OSS quality assessment models outline specific attributes that guide the selection of OSS. The assessment models are very significant because they can help software evaluators to select suitable OSS in the midst of alternatives (Kuwata et al. 2014). However, despite the numerous quality assessment models proposed, there is still little or no adoption of these models in practice (Hauge et al. 2009; Ali Babar 2010). In order to guide the formulation of newer models, there is a need to understand the nature of the existing OSS quality assessment models. The aim of this study is to investigate the nature of the existing OSS quality assessment models by classifying them with respect to their quality characteristics, the methodology they use for assessment, and their domain of application, so as to guide the formulation and development of newer models. Existing studies on OSS quality assessment models (Miguel et al. 2014; Adewumi et al. 2013) are largely descriptive reviews that did not seek to classify OSS quality assessment models along specific dimensions or answer specific research questions. In contrast, this paper employs a methodical, structured and rigorous analysis of the existing literature in order to classify the existing OSS quality assessment models and establish a template guide for model developers when they come up with new models. Thus, this study is a systematic literature review that investigates three research questions, namely: (1) What are the key quality characteristics possessed by the OSS assessment models? (2) What selection methods are employed in these assessment models? (3) What is their domain of application? In order to conduct this systematic review, the original guidelines proposed by Kitchenham (2004) have been followed.

The rest of this paper is structured as follows: “ Methods ” section describes the method of obtaining the existing OSS quality models. “ Results ” section presents the results obtained in the study, while “ Summary and discussion ” section discusses the findings of the study. “ Conclusion and future work ” section concludes the paper with a brief note.

This section outlines the research questions posed in this study and also explains in detail the rationale behind each question. It goes on to discuss the search strategy for retrieving the relevant papers; criteria for including any given paper in the study; quality assessment of the retrieved papers as well as how relevant information was extracted from each selected paper.

Research questions

This study aims at gaining insight into the existing OSS quality models and addresses three research questions. The three research questions, alongside the rationale motivating each question, are presented in Table 1. These form the basis for defining the search strategy.

Search strategy

A search string was defined based on the keywords derived from the research questions, as follows: “(Open Source Software OR libre OR OSS OR FLOSS OR FOSS) AND (model OR quality model OR measurement model OR evaluation model)”.

In order to retrieve the primary studies containing OSS quality models, we made use of the Scopus digital library. It indexes several renowned scientific journals, books and conference proceedings (e.g. IEEE, ACM, Science Direct and Springer). We considered only papers from 2003 to 2015, since the first OSS quality model emerged in 2003 (Haaland et al. 2010; Adewumi et al. 2013). We also focused on journal papers and conference proceedings in the subject area of Computer Science that were written in English. A total of 3198 primary studies were initially retrieved. After checking through their titles and abstracts, the number was reduced to 209. To be sure that no paper had been left out, we also performed a search in IEEE Xplore, ACM and Springer using the same search string. No new papers were retrieved from this search that had not already been found through Scopus. Furthermore, a search was performed using Google Search; two relevant articles were retrieved (Duijnhouwer and Widdows 2003; Atos 2006) and added, for a total of 211 retrieved papers. These papers were read in detail to determine their suitability for inclusion.
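As an illustration of how the boolean search string above maps onto screening logic, the following minimal Python sketch filters a local list of records using the same two keyword groups. The record structure and keyword lists are assumptions for demonstration only; this is not the query syntax of Scopus or any other digital library.

```python
# Minimal sketch: applying the boolean search logic to locally stored records.
# The record format and keyword groups are illustrative assumptions, not the
# actual Scopus/IEEE Xplore query syntax used in the study.
# Note: plain substring matching is a simplification of fielded boolean search.

OSS_TERMS = ["open source software", "libre", "oss", "floss", "foss"]
MODEL_TERMS = ["model", "quality model", "measurement model", "evaluation model"]

def matches_search(record: dict) -> bool:
    """True if the title/abstract satisfies (any OSS term) AND (any model term)."""
    text = (record.get("title", "") + " " + record.get("abstract", "")).lower()
    return (any(term in text for term in OSS_TERMS)
            and any(term in text for term in MODEL_TERMS))

records = [
    {"title": "A quality model for open source software selection", "abstract": ""},
    {"title": "Agile cost estimation in practice", "abstract": ""},
]
candidates = [r for r in records if matches_search(r)]  # keeps only the first record
```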

Inclusion criteria

Papers proposing cost models and conceptual models were removed. Position papers and papers that did not present a model for assessing quality in OSS in order to guide selection in the midst of alternatives were also removed. A crosscheck was conducted through the reference lists of the candidate studies to ensure that no model had been left out. As a result, 19 primary studies were selected, which are further discussed in the next segment of this section.

Quality assessment

Each primary study was evaluated by using the criteria defined in Adewumi et al. ( 2013 ). The criteria are based on four quality assessment (QA) questions:

Are the model’s attributes derived from a known standard (this can be ISO 9126, ISO 25010 or CMMI)?

Is the evaluation procedure of the model adequately described?

Does a tool support the evaluation process?

Is a demonstration of quality assessment using the model provided?

The questions were scored as follows:

Y (yes), the model’s attributes are mostly derived from a known standard; P (partly), only a few of the model’s attributes are derived from a known standard; N (no), the model’s attributes are not derived from a known standard.

Y, the evaluation procedure of the model is adequately described; P, the evaluation procedure is described inadequately; N, the evaluation procedure of the model is not described at all.

Y, the evaluation process is fully supported by a tool; P, the evaluation process is partially supported by a tool; N, no tool support is provided for the evaluation process.

Y, a complete demonstration of quality assessment using the model is provided; P, only a partial demonstration of quality assessment using the model is provided; N, there is no demonstration of quality assessment using the model.

The scoring procedure was Y = 1, P = 0.5, N = 0. The first author coordinated the quality evaluation process and assessed every paper; 5 papers each were assigned to the second, third and fourth authors and 4 papers to the fifth author for independent assessment. When there was a disagreement, we discussed the issues until we reached agreement.
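To make the scoring arithmetic concrete, the short sketch below totals the four question scores for a single study; the example answers are hypothetical and are not values taken from Table 4.

```python
# Minimal sketch of the quality scoring scheme (Y = 1, P = 0.5, N = 0).
# The example answers below are hypothetical, not values from Table 4.

SCORE = {"Y": 1.0, "P": 0.5, "N": 0.0}

def quality_score(answers):
    """Total score over the four quality assessment questions (QA1-QA4)."""
    return sum(SCORE[a] for a in answers)

example_study = ["Y", "P", "N", "Y"]   # hypothetical answers to QA1..QA4
print(quality_score(example_study))    # 2.5 on the 0-4 scale
```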

Data extraction strategy

In this phase, the first author extracted the data while the other four authors checked the extraction. This approach, though inconsistent with the medical standards summarized in Kitchenham’s guidelines (2004), has been found useful in practice (Brereton et al. 2007). The first author coordinated the data extraction and checking tasks, which involved all of the authors of this paper. Allocation was not randomized; rather, it was based on the time availability of the individual researchers. When there was a disagreement, we discussed the issues until we reached agreement.

The selected studies were examined to collect the data that would provide the set of possible answers to the research questions. Table 2 shows the data extraction form that was created as an Excel sheet and filled in by the first author for each of the selected papers.

From Table  2 it can be observed that the information extracted includes: the Study Ref., title, and classification [publication outlet, publication year and research questions (RQ) 1, 2 and 3].

Quality characteristics that the models in the selected studies can possess include the product quality and the quality in use characteristics of the ISO 25010 namely: functional suitability, reliability, performance efficiency, operability, security, compatibility, maintainability, transferability, effectiveness, efficiency, satisfaction, safety and usability. We also include community related quality characteristics as described in the literature namely (Soto and Ciolkowski 2009 ): maintenance capacity, sustainability and process maturity.
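As a rough illustration of the extraction form described above, a record for one study could be represented as follows. The field names are paraphrased assumptions based on this description, not the exact column headings of the Excel sheet, and the example values are hypothetical.

```python
# Illustrative sketch of one data-extraction record (cf. Table 2).
# Field names are paraphrased assumptions, not the exact Excel column headings,
# and the example values are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExtractionRecord:
    study_ref: int                   # reference number of the model paper
    title: str
    publication_outlet: str          # journal or conference
    publication_year: int
    quality_characteristics: List[str] = field(default_factory=list)  # RQ1
    selection_method: str = ""       # RQ2: model, process, framework, ...
    application_domain: str = ""     # RQ3: data-dominant, systems, ...

record = ExtractionRecord(
    study_ref=8,
    title="Hypothetical OSS quality model paper",
    publication_outlet="conference",
    publication_year=2009,
    quality_characteristics=["maintainability", "maintenance capacity"],
    selection_method="model",
    application_domain="no domain specified",
)
```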

The methods used by assessment models for selection can be classified as (Petersen et al. 2008 ; Wen et al. 2012 ):

Data mining techniques, such as Artificial Neural Networks, Case-Based Reasoning, Data Envelopment Analysis (DEA), Fuzzy Logic, etc.

Process: a series of actions or functions leading to a selection result and performing operations on data

Tool-based technique: a technique that relies heavily on software tools to accomplish the selection task

Model: a system representation that allows for selection based on investigation through a hierarchical structure

Framework: a real or conceptual structure intended to serve as a support or guide for the selection process

Other, e.g. guidelines

The domain of application can be classified as follows (Forward and Lethbridge 2008 ):

Data-dominant software—i.e. consumer-oriented software, business-oriented software, design and engineering software as well as information display and transaction entry

Systems software—i.e. operating systems, networking/communications, device/peripheral drivers, support utilities, middleware and system components, software backplanes (e.g. Eclipse), servers and malware

Control-dominant software—i.e. hardware control, embedded software, real time control software, process control software (e.g. air traffic control, industrial process, nuclear plants)

Computation-dominant software—i.e. operations research, information management and manipulation, artistic creativity, scientific software and artificial intelligence

No domain specified

Synthesis method

The synthesis method was based on:

Counting the number of papers per publication outlet and the number of papers found on a year-wise basis,

Counting the primary studies that are classified in response to each research question,

Presenting charts and frequency tables for the classification results which have been used in the analysis,

Presenting in the discussion a narrative summary with which to recount the key findings of this study.
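A minimal sketch of this counting and tabulation step, using hypothetical rows rather than the study's actual extraction data, could look as follows:

```python
# Minimal synthesis sketch: counting studies per year and per classification.
# The rows below are hypothetical examples, not the study's actual extraction data.
import pandas as pd

studies = pd.DataFrame([
    {"year": 2009, "publication_outlet": "conference", "selection_method": "model"},
    {"year": 2009, "publication_outlet": "journal", "selection_method": "process"},
    {"year": 2013, "publication_outlet": "journal", "selection_method": "other"},
])

papers_per_year = studies["year"].value_counts().sort_index()
method_share = studies["selection_method"].value_counts(normalize=True) * 100

print(papers_per_year)        # number of papers per publication year
print(method_share.round(1))  # percentage of studies per selection method
# papers_per_year.plot(kind="bar") would produce the corresponding chart
```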

This section presents the results obtained in response to the research questions posed in this study. Table 3 is a summary of the OSS quality assessment models used in this study, their sources and year of publication. The first column of the table (Study Ref.) gives the reference number of each quality assessment model in ascending order. The table shows that 2009 has the highest number of published papers, with three publications in total. The years 2003, 2004, 2005 and 2012 have the lowest number of publications, with one published paper each. All other years (2007, 2008, 2011, 2013, 2014, 2015) have two published papers each.

The studies were assessed for quality using the criteria described in the previous section (see “Quality assessment” section). The score for each study is shown in Table 4. The results of the quality analysis show that all studies scored above 1 on the proposed quality assessment scale, with only one study scoring less than 2. One study scored 4, five studies scored 3.5, five studies scored 3, five studies scored 2.5 and two studies scored 2.

Table 5 shows the summary of the responses to the research questions from each of the selected articles. From the table, it can be observed that an assessment model can belong to more than one category for RQ1 (an example is the assessment model in Study Ref. 8, which is a single-attribute model, a non-community attribute model and a non-quality in use model).

RQ1. What are the key quality characteristics possessed by the models?

To address RQ1, we performed a comparative study of each identified model against ISO 25010 as well as community related quality characteristics described in “ Background ” section. Based on our comparative study, which is presented in Table  6 , we classify the quality assessment models into five categories, which are discussed as follows:

Single-attribute models: This refers to models that only measure one quality characteristic. Qualification and Selection of Open Source software (QSOS) model (Atos 2006 , Deprez and Alexandre 2008 ), Mathieu and Wray model ( 2007 ), Sudhaman and Thangavel model ( 2015 ) and Open Source Usability Maturity Model (OS-UMM) model (Raza et al. 2012 ) fall into this category. QSOS possesses maintainability as its quality characteristic. Mathieu and Wray as well as Sudhaman and Thangavel models both possess efficiency as their singular quality characteristic. In addition, OS-UMM possesses usability as its singular quality characteristic.

Rounded category models: This refers to models that possess at least one quality characteristic in each of the three categories used for comparison (i.e. product quality, quality in use and community related characteristics). Open Source Maturity Model (OSMM) (Duijnhouwer and Widdows 2003 ), Open Business Readiness Rating (Open BRR) model (Wasserman et al. 2006 ), Source Quality Observatory for Open Source Software (SQO-OSS) model (Samoladas et al. 2008 ; Spinellis et al. 2009 ), Evaluation Framework for Free/Open souRce projecTs (EFFORT) model (Aversano and Tortorella 2013 ), Muller ( 2011 ) and Sohn et al. model ( 2015 ) fall into this category of models. OSMM possesses all the quality characteristics in the product quality category as well as in the community-related quality characteristics but only possesses usability in the quality in use category. Open BRR and EFFORT models both possess all the community-related quality characteristics, some of the product quality characteristics and usability from the quality in use category. SQO-OSS possesses all the community-related quality characteristics, three of the product quality characteristics and effectiveness from the quality in use category. Muller model possesses one characteristic each from the product quality and community-related categories. It also possesses efficiency and usability from the quality in use category. As for Sohn et al. model, it possesses two quality characteristics from the product quality category and one quality characteristic each from the quality in use and community-related quality categories.

Community-only attribute model: This refers to a model that only measures community-related quality characteristics. The only model that fits this description is the Kuwata et al. model ( 2014 ) as seen in Table  6 . The model does not possess any quality characteristic from the product quality or quality in use categories.

Non-community attribute model: This refers to models that do not measure any community-related quality characteristics. QSOS (Atos 2006 ), Sung et al. ( 2007 ), Raffoul et al. ( 2008 ), Alfonzo et al. ( 2008 ), Mathieu and Wray, Chirila et al. (Del Bianco et al. 2010a ), OS-UMM (Raza et al. 2012 ), Sudhaman and Thangavel, and Sarrab and Rehman (Sarrab and Rehman 2014 ) models fall into this category.

Non-quality in use models: This refers to models that do not include any quality in use characteristics in their structure. QSOS (Atos 2006; Deprez and Alexandre 2008), QualOSS (Soto and Ciolkowski 2009), OMM (Petrinja et al. 2009; Del Bianco et al. 2010b; Del Bianco et al. 2011), Chirila et al. (2011), Adewumi et al. (2013) and Kuwata et al. (2014) models are the models in this category.

From our classification, it is possible for a particular model to belong to more than one category. QSOS, for instance, belongs to three of the categories (i.e. it is a single-attribute model, a non-community attribute model and a non-quality in use model). The Mathieu and Wray model (2007), Chirila et al. model (2011), OS-UMM (Raza et al. 2012), Sudhaman and Thangavel model (2015) and Kuwata et al. model (2014) each belong to two categories. Specifically, the Mathieu and Wray model is a single-attribute model and a non-community attribute model. The Chirila et al. model is a non-community attribute model as well as a non-quality in use model. OS-UMM is a single-attribute model and a non-community attribute model. The Sudhaman and Thangavel model is both a single-attribute model and a non-community attribute model. The Kuwata et al. model is both a community-only attribute model and a non-quality in use model. All the other models belong to a single category: OSMM (Duijnhouwer and Widdows 2003), Open BRR (Wasserman et al. 2006), Sung et al. (2007), QualOSS (Soto and Ciolkowski 2009), OMM (Petrinja et al. 2009), SQO-OSS (Samoladas et al. 2008), EFFORT (Aversano and Tortorella 2013), Raffoul et al. (2008), Alfonzo et al. (2008), Müller (2011), Adewumi et al. (2013), Sohn et al. (2015) as well as the Sarrab and Rehman model (2014).

Table 6 presents a comparative analysis between the OSS quality models listed in Table 3 and the ISO 25010 model. It also shows the community related characteristics and how they compare with the OSS quality models. Cells marked with ‘x’ indicate that the OSS quality model possesses the corresponding characteristic as defined in ISO 25010. An empty cell simply means that the OSS quality model does not possess that characteristic.

Figure 1 shows the frequency distribution of the ISO 25010 product quality characteristics in the OSS quality models we considered. It shows that maintainability is measured by 55% of the existing OSS quality models, making it the most common product quality characteristic measured by existing OSS quality models. This is followed by functional suitability, which is measured in 50% of the existing quality models. The least measured are operability, compatibility and transferability, which are each measured by 30% of the existing quality models. From Fig. 1, it can be inferred that the maintainability of a given OSS is of more importance than the functionality it possesses. This is because, the software being open source, the code is accessible, making it possible to incorporate missing features. However, such missing features can be difficult to implement if the code is not well documented, readable and understandable, which are all attributes of maintainable code. Similar inferences can be made as regards the other quality characteristics. For instance, the reliability and security of an OSS can be improved upon if the code is maintainable. In addition, the performance efficiency, operability, compatibility and transferability can all be improved upon with maintainable code.

Frequency distribution of ISO 25010 product quality characteristics in OSS quality models

Figure  2 shows the frequency distribution of the ISO 25010 Quality in Use characteristics in the OSS quality models we considered. It shows that usability is measured by 50% of the existing OSS quality models making it the most commonly measured characteristic in this category. It is followed by effectiveness and efficiency, which are both considered by 15% of the existing OSS quality models. Satisfaction and safety on the other hand are not considered in any of the existing OSS quality models. From Fig.  2 , it can be easily inferred that usability is the most significant attribute under the quality in use category and hence all other attributes in this category add up to define it. In other words, usable OSS is one that is effective in accomplishing specific tasks, efficient in managing system resources, safe for the environment and provides satisfaction to an end-user.

Frequency distribution of ISO 25010 quality in use characteristics in OSS quality models

Figure 3 shows the frequency distribution of community related quality characteristics in the OSS quality models we considered. It shows that maintenance capacity is measured in 45% of the existing OSS quality models, making it the most commonly measured attribute in this category. It is closely followed by sustainability, which is measured by 40% of the existing OSS quality models. Process maturity is the least measured attribute in this category and is considered in 35% of the existing OSS quality models. It can be inferred from Fig. 3 that evaluators who assess OSS quality via its community are more interested in the maintenance capacity of that community than in its sustainability. They are also more concerned about the sustainability of the community than the maturity of the community’s processes.

Frequency distribution of community related quality characteristics in OSS quality models

RQ2. What are the methods applied for reaching selection decisions?

Figure 4 depicts the various selection methods adopted in the existing OSS quality models for reaching a decision in the midst of alternatives. The model approach, which entails making a system representation that allows for selection based on investigation through a hierarchical structure, is the most common selection method in the existing literature and is used by six (32%) of the existing models. This is followed by the process approach, which is used by four (21%) of the existing models. For the “other” category, three (16%) of the models use a form of guideline in the selection process. The framework approach accounts for 11%, while the data mining and tool-based approaches account for 10% each of the existing OSS quality models. In general, it can be observed that more emphasis is placed on non-automated approaches in the existing quality models, and so applying these models in real-life selection scenarios is usually time-consuming and requires expertise to conduct (Hauge et al. 2009; Ali Babar 2010).

Selection methods used in OSS quality models

RQ3. What is the domain of application?

Figure 5 depicts the domain of application of the existing OSS quality assessment models. In general, the majority of the models do not specify a domain of application. However, for those with a specific domain of application, we observed that most focus on measuring quality in data-dominant software, which includes business-oriented software such as Enterprise Resource Planning and Customer Relationship Management solutions, design and engineering software, as well as information display and transaction systems such as issue tracking systems. Systems software evaluation accounts for 16%, while computation-dominant software accounts for 11%.

Domains in which OSS quality models have been applied

Summary and discussion

Principal findings

From the existing OSS quality models considered in this study, 20% of the models measure only a single quality attribute. Models in this category include QSOS (which measures maintainability) (Atos 2006), the Mathieu and Wray model (which measures efficiency) (Mathieu and Wray 2007), OS-UMM (which measures usability) (Raza et al. 2012) and the Sudhaman and Thangavel model (which measures efficiency) (Sudhaman and Thangavel 2015). Furthermore, 50% of the existing models do not measure community related quality characteristics, even though community is what distinguishes OSS from its proprietary counterpart. Models in this category include QSOS (Atos 2006), the Sung et al. model (2007), the Raffoul et al. model (2008), the Alfonzo et al. model (2008), the Mathieu and Wray model (2007), the Chirila et al. model (2011), OS-UMM (Raza et al. 2012), the Sudhaman and Thangavel model (2015) and the Sarrab and Rehman model (2014). In addition, 35% of the models touch on all categories. They include OSMM (Duijnhouwer and Widdows 2003), Open BRR (Wasserman et al. 2006), SQO-OSS (Spinellis et al. 2009), EFFORT (Aversano and Tortorella 2013), the Müller model (2011) and the Sohn et al. model (2015). A number of these models have been applied to selection scenarios reported in the literature. A notable example is the EFFORT model, which has been applied to evaluate OSS in the customer relationship management (CRM) domain (Aversano and Tortorella 2011) as well as in the enterprise resource planning (ERP) domain (Aversano and Tortorella 2013).

From the existing OSS quality models, it is observed that, in the aspect of product quality as defined by ISO 25010, maintainability is the most significant quality characteristic; usability is the most significant quality in use characteristic in the existing OSS quality models; and maintenance capacity is the most significant community related characteristic in the OSS quality assessment models. Also worthy of note is that the satisfaction and safety attributes of quality in use are never considered in the OSS quality models.

The model approach is the most adopted selection method in the existing OSS quality models. The least considered are the tool-based and data mining selection approaches. However, as newer publications emerge, we expect to see other approaches, and data mining in particular, gaining more ground.

The majority (47%) of the existing models do not specify any domain of application. As for those with a specific domain of application, a greater percentage focus on data-dominant software, especially enterprise resource planning software. Computation-dominant software is the least considered in this regard. Software in this category includes operations research, information management and manipulation, artistic creativity, scientific software and artificial intelligence software.

From this study, we also observed that none of the existing models evaluates all the criteria that we laid out, in terms of every quality characteristic under product quality, quality in use and community related quality characteristics.

Implications of the results

Based on the comparison of the existing quality assessment models, there is clearly no single fully suitable model; each model has its own limitations. As a result, the findings of this analysis have implications, especially for practitioners who work towards coming up with new assessment models. They should note the following points in line with the research questions posed in this study:

Emphasis should shift from trying to build comprehensive models (containing all the possible software characteristics) to building models that include only essential quality characteristics. This study has shown that these essential quality characteristics include maintainability, usability and the maintenance capacity of the software community. By narrowing down to these three essential quality characteristics, model developers would help to reduce the burden of OSS evaluation via existing quality assessment models, which has largely been described as laborious and time-consuming to conduct (Hauge et al. 2009; Ali Babar 2010).

Newer models should incorporate selection methods that are amenable to automation, as this is not the case in most of the existing OSS quality assessment models reviewed in this study. The selection methods mostly adopted are the model (32%), process (21%) and other (16%) approaches, such as guidelines, which are not easily amenable to automation (Fahmy et al. 2012). Model developers should thus turn their focus to data mining techniques (Leopairote et al. 2013), framework or tool-based selection methods, which are currently among the least considered options. The advantage this offers is that it will help quicken the evaluation process, resulting in faster decision-making. Following this advice could also bring about increased adoption of the models in practice (Wang et al. 2013). In addition, model developers can also consider modeling quality assessment as a multi-criteria decision-making (MCDM) problem so as to facilitate automation, as seen in recent studies (Fakir and Canbolat 2008; Cavus 2010, 2011). A MCDM problem in this context can be regarded as a process of choosing among available alternatives (i.e. different OSS alternatives) based on a number of attributes (quality criteria). Considering this option opens the model developer to several well-known MCDM methods that are amenable to automation, such as DEA, the Analytic Hierarchy Process (AHP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), to mention a few (Zavadskas et al. 2014). A minimal illustrative TOPSIS sketch is given after these points.

From Fig. 5, it can be observed that 47% of the quality assessment models considered do not mention a domain of application. This implies that most of the models were designed to be domain-independent. As such, domain independence should be the focus of model developers (Wagner et al. 2015). A domain-independent model is one that is able to assess quality in various categories of OSS, including those that are data-dominant, system software, control-dominant and computation-dominant. It should also be able to do this with little or no customization. By following this particular consideration, the proposed model is more likely to be widely adopted and possibly standardized.
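As a rough illustration of the MCDM suggestion in the second point above, the sketch below applies a basic TOPSIS ranking to a small, made-up decision matrix. The alternatives, criteria, scores and weights are all hypothetical; the three criteria simply echo the essential characteristics identified earlier.

```python
# Minimal TOPSIS sketch for ranking OSS alternatives on quality criteria.
# The decision matrix, criteria and weights below are hypothetical examples.
import numpy as np

# rows: OSS alternatives, columns: benefit criteria scored 1-10
# criteria: maintainability, usability, maintenance capacity
X = np.array([
    [7.0, 8.0, 6.0],   # OSS A
    [9.0, 6.0, 7.0],   # OSS B
    [6.0, 7.0, 9.0],   # OSS C
])
weights = np.array([0.4, 0.3, 0.3])

# 1. vector-normalize each criterion column and apply the weights
V = weights * X / np.linalg.norm(X, axis=0)

# 2. ideal and anti-ideal solutions (for benefit criteria, max is ideal)
ideal, anti_ideal = V.max(axis=0), V.min(axis=0)

# 3. closeness coefficient: distance to anti-ideal relative to total distance
d_pos = np.linalg.norm(V - ideal, axis=1)
d_neg = np.linalg.norm(V - anti_ideal, axis=1)
closeness = d_neg / (d_pos + d_neg)

ranking = np.argsort(-closeness)     # indices of alternatives, best first
print(closeness.round(3), ranking)
```

In this framing, automation amounts to recomputing the ranking whenever scores or weights change, which is the kind of tool support the existing models largely lack.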

Threats to validity

Threats to construct validity in this type of study are related to the identification of primary studies. In order to ensure that as many relevant primary studies as possible were included, different synonyms for ‘open source software’ and ‘quality model’ were included in the search string. The first and second authors conducted the automatic search for relevant literature independently; the results obtained were harmonized using a spreadsheet application and duplicates were removed. The reference sections of the selected papers were also scanned to ensure that all relevant references had been included. The final decision to include a study for further consideration depended on the agreement of all the authors. If a disagreement arose, a discussion took place until consensus was reached.

Internal validity has to do with the data extraction and analysis. As previously mentioned, the first author carried out the data extraction of the primary studies and assigned them to the other authors to assess. The first author also participated in assessing all the primary studies and compared his results with those of the other authors; discrepancies were discussed until an agreement was reached. The assignment of the primary studies to the other authors was not randomized because the sample size (the number of primary studies) was relatively small and the time availability of each researcher needed to be considered. In order to properly classify the primary studies based on the quality characteristics they possessed, the authors adopted the ISO 25010 model (2010) as a benchmark. All the authors were fully involved in the process of classifying the primary studies and all disagreements were discussed until a consensus was reached.

To mitigate the effects of incorrect data extraction, which can affect conclusion validity, the steps in the selection and data extraction activities were clearly described, as discussed in the previous paragraphs. The traceability between the data extracted and the conclusions was strengthened through the direct generation of charts and frequency tables from the data using a statistical package. In our opinion, slight differences arising from publication selection bias and misclassification would not alter the main conclusions drawn from the papers identified in this study.

As regards the external validity of this study, the results obtained apply specifically to quality assessment models within the OSS domain. Quality assessment models that evaluate quality in proprietary software are not covered. In addition, the inferences made in this paper concern only OSS quality assessment models, so this threat is not present in this context. The results of this study may serve as a starting point for OSS quality researchers to further identify and classify newer models in this domain.

Conclusion and future work

The overall goal of this study is to analyze and classify the existing knowledge as regards OSS quality assessment models. Papers dealing with these models and published between 2003 and 2015 were identified, and 19 papers were selected. The main publication outlets of the identified papers were journals and conference proceedings. The results of this study show that maintainability is the most significant and ubiquitous product quality characteristic considered in the literature, while usability is the most significant attribute in the quality in use category. The maintenance capacity of an OSS community is also a crucial characteristic among the community related quality characteristics. The most commonly used selection method is the model approach, and the least considered are the tool-based and data mining approaches. Another interesting result is that nearly half (47%) of the selected papers do not mention an application domain for their models. More attention should be paid to building models that incorporate only essential quality characteristics. Also, framework, tool-based and data mining selection methods should be given more attention in future model proposals.

This study could help researchers to identify essential quality attributes with which to develop more robust quality models that are applicable in the various software domains. Also, researchers can compare the existing selection methods in order to determine the most effective. As future work, we intend to model OSS quality assessment as a MCDM problem. This will afford us the opportunity to choose from a range of MCDM methods one (or more) that can be used to evaluate quality in OSS across multiple domains.

Abbreviations

CRM: customer relationship management
DEA: Data Envelopment Analysis
EFFORT: Evaluation Framework for Free/Open souRce projecTs
ERP: enterprise resource planning
MCDM: multi-criteria decision making
Open BRR: Open Business Readiness Rating
OSMM: Open Source Maturity Model
OS-UMM: Open Source Usability Maturity Model
OSS: open source software
QA: quality assessment
QSOS: Qualification and Selection of Open Source software
RQ: research question
SQO-OSS: Source Quality Observatory for Open Source Software
TOPSIS: Technique for Order of Preference by Similarity to Ideal Solution

Adewumi A, Misra S, Omoregbe N (2013a) A review of models for evaluating quality in open source software. IERI Proc 4(1):88–92

Adewumi A, Omoregbe N, Misra S (2013) Quantitative quality model for evaluating open source web applications: case study of repository software. In: 16th International conference on computational science and engineering (CSE), Dec 3 2013

Alfonzo O, Domínguez K, Rivas L, Perez M, Mendoza L, Ortega M (2008) Quality measurement model for analysis and design tools based on FLOSS. In: 19th Australian conference on software engineering, Perth, Australia, 26–28 March 2008

Atos (2006), Method for qualification and selection of open source software (QSOS) version 2.0. http://backend.qsos.org/download/qsos-2.0_en.pdf . Accessed 5 Jan 2015

Aversano L, Tortorella M (2011) Applying EFFORT for evaluating CRM open source systems. In: International conference on product-focused software process improvement, Springer, Heidelberg, pp 202–216

Aversano L, Tortorella M (2013) Quality evaluation of FLOSS projects: application to ERP systems. Inf Softw Technol 55(7):1260–1276

Brereton OP, Kitchenham BA, Budgen DT, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 80:571–583

Cavus N (2010) The evaluation of learning management systems using an artificial intelligence fuzzy logic algorithm. Adv Eng Softw 41:248–254

Cavus N (2011) The application of a multi-attribute decision-making algorithm to learning management systems evaluation. Br J Edu Technol 42:19–30

Chirila C, Juratoni D, Tudor D, Cretu V (2011) Towards a software quality assessment model based on open-source statical code analyzers. In: 6th IEEE international conference on computational intelligence and informatics (SACI), May 19 2011

Deissenboeck F, Juergens E, Lochman K, Wagner S (2009) Software quality models: purposes, usage scenarios and requirements. In: ICSE workshop on software quality, May 16 2009

Del Bianco V, Lavazza L, Morasca S, Taibi D, Tosi D (2010a) The QualiPSo approach to OSS product quality evaluation. In: 3rd International workshop on emerging trends in free/libre/open source software research and development, New York

Del Bianco V, Lavazza L, Morasca S, Taibi D, Tosi D (2010b) An investigation of the users’ perception of OSS quality. In: 6th International conference on open source systems, Springer Verlag, pp 15–28

Del Bianco V, Lavazza L, Morasca S, Taibi D (2011) A survey on open source software trustworthiness. IEEE Softw 28(5):67–75

Deprez JC, Alexandre S (2008) Comparing assessment methodologies for free/open source software: OpenBRR and QSOS. In: 9th international conference on product-focused software process improvement (PROFES‘08), Springer, Heidelberg, pp 189–203

Duijnhouwer F, Widdows C (2003) Open source maturity model. http://jose-manuel.me/thesis/references/GB_Expert_Letter_Open_Source_Maturity_Model_1.5.3.pdf Accessed: 5 Jan 2015

Fahmy S, Haslinda N, Roslina W, Fariha Z (2012) Evaluating the quality of software in e-book using the ISO 9126 model. Int J Control Autom 5:115–122

Fakir O, Canbolat MS (2008) A web-based decision support system for multi-criteria inventory classification using fuzzy AHP methodology. Expert Syst Appl 35:1367–1378

Forward A, Lethbridge TC (2008) A taxonomy of software types to facilitate search and evidence-based software engineering. In: Proceedings of the 2008 conference of the centre for advanced studies on collaborative research, Oct 27 2008

Haaland K, Groven AK, Regnesentral N, Glott R, Tannenberg A, FreeCode AS (2010) Free/libre open source quality models—a comparison between two approaches. In: 4th FLOS international workshop on Free/Libre/Open Source Software, July 2010

Hauge Ø, Østerlie T, Sørensen CF, Gerea M (2009) An empirical study on selection of open source software—preliminary results. In: ICSE workshop on emerging trends in free/libre/open source software research and development, May 18 2009

ISO/IEC 9126 (2001) Software engineering—product quality—part 1: quality model. http://www.iso.org/iso/catalogue_detail.htm?csnumber=22749 Accessed 14 Nov 2015

ISO/IEC 25010 (2010) Systems and software engineering—systems and software product quality requirements and evaluation (SQuaRE)—system and software quality models. http://www.iso.org/iso/catalogue_detail.htm?csnumber=35733 Accessed 14 Oct 2016

Kitchenham BA (2004) Procedures for undertaking systematic reviews. http://csnotes.upm.edu.my/kelasmaya/pgkm20910.nsf/0/715071a8011d4c2f482577a700386d3a/$FILE/10.1.1.122.3308[1].pdf . Accessed 14 Oct 2016

Kuwata Y, Takeda K, Miura H (2014) A study on maturity model of open source software community to estimate the quality of products. Proc Comput Sci 35:1711–1717

Leopairote W, Surarerks A, Prompoon N (2013) Evaluating software quality in use using user reviews mining. In: 10th International joint conference on computer science and software engineering, May 29 2013

Mathieu R, Wray B (2007) The application of DEA to measure the efficiency of open source security tool production. In: AMCIS 2007 proceedings, Dec 31 2007

Miguel JP, Mauricio D, Rodríguez G (2014) A review of software quality models for the evaluation of software products. Int J Soft Eng Appl 5(6):31–53

Müller T (2011) How to choose a free and open source integrated library system. Int Digi Lib Perspect 27(1):57–78

Ouhbi S, Idri A, Fernández-Alemán JL, Toval A (2014) Evaluating software product quality: a systematic mapping study. In: International conference on software process and product measurement, Oct 6 2014

Ouhbi S, Idri A, Fernández-Alemán JL, Toval A (2015) Predicting software product quality: a systematic mapping study. Computación y Sistemas 19(3):547–562

Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: 12th International conference on evaluation and assessment in software engineering, Blekinge Institute of Technology, Italy, Jun 26 2008

Petrinja E, Nambakam R, Sillitti A (2009) Introducing the open source maturity model. In: Proceedings of the 2009 ICSE workshop on emerging trends in free/libre/open source software research and development, May 18 2009

Raffoul E, Domínguez K, Perez M, Mendoza LE, Griman AC (2008) Quality model for the selection of FLOSS-based Issue tracking system. In: Proceedings of the IASTED international conference on software engineering, Innsbruck, Austria, 12 Feb 2008

Raza A, Capretz LF, Ahmed F (2012) An open source usability maturity model (OS-UMM). Comput Hum Behav 28(4):1109–1121

Samoladas I, Gousios G, Spinellis D, Stamelos I (2008) The SQO-OSS quality model: measurement based open source software evaluation. In: IFIP International Conference on Open Source Systems. Springer, Milano, pp 237–248

Sarrab M, Rehman OMH (2014) Empirical study of open source software selection for adoption, based on software quality characteristics. Adv Eng Softw 69:1–11

Sohn H, Lee M, Seong B, Kim J (2015) Quality evaluation criteria based on open source mobile HTML5 UI framework for development of cross-platform. Int J Soft Eng Appl 9(6):1–12

Soto M, Ciolkowski M (2009) The QualOSS open source assessment model measuring the performance of open source communities. In: Proceedings of the 3rd international symposium on empirical software engineering and measurement, 15 Oct 2009

Spinellis D, Gousios G, Karakoidas V, Louridas P, Adams PJ, Samoladas I, Stamelos I (2009) Evaluating the quality of open source software. Elect Notes Theor Comp Sci 233:5–28

Stol KJ, Ali Babar M (2010) Challenges in using open source software in product development: a review of the literature. In: Proceedings of the 3rd international workshop on emerging trends in free/libre/open source software research and development, May 8 2010

Sudhaman P, Thangavel C (2015) Efficiency analysis of ERP projects—software quality perspective. Int J of Proj Manag 33:961–970

Sung WJ, Kim JH, Rhew SY (2007) A quality model for open source software selection. In: Sixth international conference on advanced language processing and web information technology, 22 Aug 2007

Wagner S, Goeb A, Heinemann L, Kläs M, Lampasona C, Lochmann K, Mayr A, Plösch R, Seidl A, Streit J, Trendowicz A (2015) Operationalised product quality models and assessment: the Quamoco approach. Inf and Soft Tech 62:101–123

Wang D, Zhu S, Li T (2013) SumView: a web-based engine for summarizing product reviews and customer opinions. Expert Syst Appl 40:27–33

Wasserman AI, Pal M, Chan C (2006) Business readiness rating for open source. In: Proceedings of the EFOSS Workshop, Como, Italy, 8 Jun 2006

Wen J, Li S, Lin Z, Hu Y, Huang C (2012) Systematic literature review of machine learning based software development effort estimation models. Inf Softw Technol 54(1):41–59

Zavadskas EK, Turskis Z, Kildienė S (2014) State of art surveys of overviews on MCDM/MADM methods. Technol Econ Dev Econ 20:165–179


Authors’ contributions

AA is a Ph.D. student and has done a significant part of the work under the supervision of SM. SM—is main supervisor of AA and working with him since last 4 years for completion of the work. NO is co-supervisor of AA and provided his continuous guidance in completion of the work. BC and RS—are co-researchers with our software engineering cluster in CU. They both contributed a lot for improving the manuscript (reviewed and added valuable contributions) since the beginning of the work. All authors read and approved the final manuscript.

Acknowledgements

We are thankful to Dr. Olawande Daramola of Computer and Information Science Department for his valuable suggestions and comments for improvement of the work/paper.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The dataset(s) supporting the conclusions of this article is included within the article in Tables  3 , 5 and 6 .

Author information

Authors and affiliations

Covenant University, Ota, Nigeria

Adewole Adewumi, Sanjay Misra & Nicholas Omoregbe

Atilim University, Ankara, Turkey

Sanjay Misra

Pontificia Universidad Católica de Valparaíso, Valparaiso, Chile

Broderick Crawford & Ricardo Soto


Corresponding author

Correspondence to Sanjay Misra .

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Adewumi, A., Misra, S., Omoregbe, N. et al. A systematic literature review of open source software quality assessment models. SpringerPlus 5 , 1936 (2016). https://doi.org/10.1186/s40064-016-3612-4


Received : 17 May 2016

Accepted : 27 October 2016

Published : 08 November 2016

DOI : https://doi.org/10.1186/s40064-016-3612-4


  • Open source software
  • Quality assessment models


  • Open access
  • Published: 01 February 2021

An open source machine learning framework for efficient and transparent systematic reviews

  • Rens van de Schoot   ORCID: orcid.org/0000-0001-7736-2091 1 ,
  • Jonathan de Bruin   ORCID: orcid.org/0000-0002-4297-0502 2 ,
  • Raoul Schram 2 ,
  • Parisa Zahedi   ORCID: orcid.org/0000-0002-1610-3149 2 ,
  • Jan de Boer   ORCID: orcid.org/0000-0002-0531-3888 3 ,
  • Felix Weijdema   ORCID: orcid.org/0000-0001-5150-1102 3 ,
  • Bianca Kramer   ORCID: orcid.org/0000-0002-5965-6560 3 ,
  • Martijn Huijts   ORCID: orcid.org/0000-0002-8353-0853 4 ,
  • Maarten Hoogerwerf   ORCID: orcid.org/0000-0003-1498-2052 2 ,
  • Gerbrich Ferdinands   ORCID: orcid.org/0000-0002-4998-3293 1 ,
  • Albert Harkema   ORCID: orcid.org/0000-0002-7091-1147 1 ,
  • Joukje Willemsen   ORCID: orcid.org/0000-0002-7260-0828 1 ,
  • Yongchao Ma   ORCID: orcid.org/0000-0003-4100-5468 1 ,
  • Qixiang Fang   ORCID: orcid.org/0000-0003-2689-6653 1 ,
  • Sybren Hindriks 1 ,
  • Lars Tummers   ORCID: orcid.org/0000-0001-9940-9874 5 &
  • Daniel L. Oberski   ORCID: orcid.org/0000-0001-7467-2297 1 , 6  

Nature Machine Intelligence volume 3, pages 125–133 (2021)


  • Computational biology and bioinformatics
  • Computer science
  • Medical research

A preprint version of the article is available at arXiv.

To help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible, we designed a tool to accelerate the step of screening titles and abstracts. For many tasks—including but not limited to systematic reviews and meta-analyses—the scientific literature needs to be checked systematically. Scholars and practitioners currently screen thousands of studies by hand to determine which studies to include in their review or meta-analysis. This is error prone and inefficient because of extremely imbalanced data: only a fraction of the screened studies is relevant. The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We therefore developed an open source machine learning-aided pipeline applying active learning: ASReview. We demonstrate by means of simulation studies that active learning can yield far more efficient reviewing than manual reviewing while providing high quality. Furthermore, we describe the options of the free and open source research software and present the results from user experience tests. We invite the community to contribute to open source projects such as our own that provide measurable and reproducible improvements over current practice.


With the emergence of online publishing, the number of scientific manuscripts on many topics is skyrocketing 1 . All of these textual data present opportunities to scholars and practitioners while simultaneously confronting them with new challenges. Scholars often develop systematic reviews and meta-analyses to develop comprehensive overviews of the relevant topics 2 . The process entails several explicit and, ideally, reproducible steps, including identifying all likely relevant publications in a standardized way, extracting data from eligible studies and synthesizing the results. Systematic reviews differ from traditional literature reviews in that they are more replicable and transparent 3 , 4 . Such systematic overviews of literature on a specific topic are pivotal not only for scholars, but also for clinicians, policy-makers, journalists and, ultimately, the general public 5 , 6 , 7 .

Given that screening the entire research literature on a given topic is too labour intensive, scholars often develop quite narrow searches. Developing a search strategy for a systematic review is an iterative process aimed at balancing recall and precision 8 , 9 ; that is, including as many potentially relevant studies as possible while simultaneously limiting the total number of studies retrieved. The vast number of publications in the field of study often leads to a relatively precise search, with the risk of missing relevant studies. The process of systematic reviewing is error prone and extremely time intensive 10 . In fact, if the literature of a field is growing faster than the amount of time available for systematic reviews, adequate manual review of this field then becomes impossible 11 .
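For reference, the recall and precision being balanced here can be written in their standard forms (these are the generic definitions, not quantities reported in this article):

$$
\text{recall} = \frac{TP}{TP + FN}, \qquad \text{precision} = \frac{TP}{TP + FP}
$$

where TP is the number of relevant records retrieved by the search, FN the number of relevant records the search misses, and FP the number of irrelevant records retrieved. Broadening a search raises recall at the cost of precision, which is exactly the trade-off described above.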

The rapidly evolving field of machine learning has aided researchers by allowing the development of software tools that assist in developing systematic reviews 11 , 12 , 13 , 14 . Machine learning offers approaches to overcome the manual and time-consuming screening of large numbers of studies by prioritizing relevant studies via active learning 15 . Active learning is a type of machine learning in which a model can choose the data points (for example, records obtained from a systematic search) it would like to learn from and thereby drastically reduce the total number of records that require manual screening 16 , 17 , 18 . In most so-called human-in-the-loop 19 machine-learning applications, the interaction between the machine-learning algorithm and the human is used to train a model with a minimum number of labelling tasks. Unique to systematic reviewing is that not only do all relevant records (that is, titles and abstracts) need to be seen by a researcher, but an extremely diverse range of concepts also needs to be learned, thereby requiring flexibility in the modelling approach as well as careful error evaluation 11 . In the case of systematic reviewing, the algorithm(s) are interactively optimized for finding the most relevant records, instead of finding the most accurate model. The term researcher-in-the-loop was introduced 20 as a special case of human-in-the-loop with three unique components: (1) the primary output of the process is a selection of the records, not a trained machine learning model; (2) all records in the relevant selection are seen by a human at the end of the process 21 ; (3) the use-case requires a reproducible workflow and complete transparency is required 22 .
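To make the active learning loop concrete, below is a minimal, generic sketch of certainty-based screening built on scikit-learn. It only illustrates the researcher-in-the-loop idea; it is not ASReview's actual implementation, and the feature extractor, classifier, query rule, prior records and stopping budget are all interchangeable assumptions.

```python
# Generic active-learning screening loop (illustrative only, not ASReview's code).
# `texts` holds title+abstract strings; `label_by_hand(i)` stands for the human
# screening step and returns 1 (relevant) or 0 (irrelevant).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

def screen(texts, label_by_hand, prior_relevant, prior_irrelevant, budget=500):
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    labeled = {prior_relevant: 1, prior_irrelevant: 0}   # prior knowledge
    model = MultinomialNB()
    while len(labeled) < min(budget, len(texts)):
        idx = list(labeled)
        model.fit(X[idx], [labeled[i] for i in idx])
        # certainty-based query: show the most likely relevant unlabeled record
        scores = model.predict_proba(X)[:, 1]
        scores[idx] = -1.0                               # skip already-labeled records
        next_record = int(np.argmax(scores))
        labeled[next_record] = label_by_hand(next_record)  # researcher in the loop
    return labeled                                       # screening decisions so far
```

The fixed budget here is only a placeholder for a proper stopping rule (for example, a long run of consecutive irrelevant records), and in this kind of workflow the trained model is discarded once the selection of records is complete.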

Existing tools that implement such an active learning cycle for systematic reviewing are described in Table 1 ; see the Supplementary Information for an overview of all of the software that we considered (note that this list was based on a review of software tools 12 ). However, existing tools have two main drawbacks. First, many are closed source applications with black box algorithms, which is problematic as transparency and data ownership are essential in the era of open science 22 . Second, to our knowledge, existing tools lack the necessary flexibility to deal with the large range of possible concepts to be learned by a screening machine. For example, in systematic reviews, the optimal type of classifier will depend on variable parameters, such as the proportion of relevant publications in the initial search and the complexity of the inclusion criteria used by the researcher 23 . For this reason, any successful system must allow for a wide range of classifier types. Benchmark testing is crucial to understand the real-world performance of any machine learning-aided system, but such benchmark options are currently mostly lacking.

In this paper we present an open source machine learning-aided pipeline with active learning for systematic reviews called ASReview. The goal of ASReview is to help scholars and practitioners to get an overview of the most relevant records for their work as efficiently as possible while being transparent in the process. The open, free and ready-to-use software ASReview addresses all concerns mentioned above: it is open source, uses active learning and allows multiple machine learning models. It also has a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. Although we focus this paper on systematic reviews, ASReview can handle any text source.

In what follows, we first present the pipeline for manual versus machine learning-aided systematic reviews. We then show how ASReview has been set up and how ASReview can be used in different workflows by presenting several real-world use cases. We subsequently demonstrate the results of simulations that benchmark performance and present the results of a series of user-experience tests. Finally, we discuss future directions.

Pipeline for manual and machine learning-aided systematic reviews

The pipeline of a systematic review without active learning traditionally starts with researchers doing a comprehensive search in multiple databases 24 , using free text words as well as controlled vocabulary to retrieve potentially relevant references. The researcher then typically verifies that the key papers they expect to find are indeed included in the search results. The researcher downloads a file of records containing the text to be screened into a reference manager; in the case of systematic reviewing, this contains the titles and abstracts (and potentially other metadata such as the authors' names, journal name and DOI) of potentially relevant references. Ideally, two or more researchers then screen the records' titles and abstracts on the basis of the eligibility criteria established beforehand 4 . After all records have been screened, the full texts of the potentially relevant records are read to determine which of them will be ultimately included in the review. Most records are excluded in the title and abstract phase. Typically, only a small fraction of the records belong to the relevant class, making title and abstract screening an important bottleneck in the systematic reviewing process 25 . For instance, a recent study analysed 10,115 records and excluded 9,847 after title and abstract screening, a drop of more than 95% 26 . ASReview therefore focuses on this labour-intensive step.

The research pipeline of ASReview is depicted in Fig. 1 . The researcher starts with a search exactly as described above and subsequently uploads a file containing the records (that is, metadata containing the text of the titles and abstracts) into the software. Prior knowledge is then selected, which is used to train the first model and to present the first record to the researcher. As screening is a binary classification problem, the reviewer must select at least one key record to include and at least one to exclude on the basis of background knowledge. More prior knowledge may result in improved efficiency of the active learning process.

Figure 1 | The ASReview research pipeline. The symbols indicate whether the action is taken by a human, a computer, or whether both options are available.

A machine learning classifier is trained to predict study relevance (labels) from a representation of the record-containing text (feature space) on the basis of prior knowledge. We have purposefully chosen not to include an author name or citation network representation in the feature space to prevent authority bias in the inclusions. In the active learning cycle, the software presents one new record to be screened and labelled by the user. The user's binary label (1 for relevant versus 0 for irrelevant) is subsequently used to train a new model, after which a new record is presented to the user. This cycle continues until a user-specified stopping criterion has been reached. The user now has a file with (1) records labelled as either relevant or irrelevant and (2) unlabelled records ordered from most to least probable to be relevant as predicted by the current model. This set-up helps to move through a large database much more quickly than in the manual process, while the decision process simultaneously remains transparent.
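
The following is a minimal, self-contained sketch of this screening cycle written with scikit-learn rather than ASReview's own API; it combines the same default ingredients described later in the paper (TF-IDF features, a naive Bayes classifier, certainty-based sampling). The toy records, the prior knowledge and the stopping rule are invented purely for illustration.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB

    records = [
        "active learning for screening titles and abstracts",
        "deep learning for image segmentation in radiology",
        "systematic review workload reduction with machine learning",
        "a survey of convolutional networks for object detection",
        "screening prioritisation in evidence synthesis",
        "reinforcement learning for robotic control",
    ]
    truth = np.array([1, 0, 1, 0, 1, 0])      # the oracle's (unknown) decisions, for the demo only

    X = TfidfVectorizer(lowercase=True).fit_transform(records)  # feature matrix built once, reused every iteration
    labelled = {0: 1, 1: 0}                    # prior knowledge: one relevant and one irrelevant record

    while len(labelled) < len(records):        # in practice: stop at a user-specified criterion
        rows = list(labelled)
        clf = MultinomialNB().fit(X[rows], [labelled[r] for r in rows])
        relevance = clf.predict_proba(X)[:, list(clf.classes_).index(1)]
        relevance[rows] = -1.0                 # never re-present already labelled records
        nxt = int(np.argmax(relevance))        # certainty-based sampling: most likely relevant first
        labelled[nxt] = int(truth[nxt])        # the human (here: 'truth') screens the presented record
        print(f"presented record {nxt}, label {labelled[nxt]}")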

Software implementation for ASReview

The source code 27 of ASReview is openly available under an Apache 2.0 license, as is its documentation 28 . Compiled and packaged versions of the software are available on the Python Package Index 29 or Docker Hub 30 . The free and ready-to-use software ASReview implements oracle, simulation and exploration modes. The oracle mode is used to perform a systematic review with interaction by the user, the simulation mode is used for simulation of the ASReview performance on existing datasets, and the exploration mode can be used for teaching purposes and includes several preloaded labelled datasets.

The oracle mode presents records to the researcher, who classifies them. Multiple file formats are supported: (1) RIS files, as used by digital libraries such as IEEE Xplore, Scopus and ScienceDirect; the citation managers Mendeley, RefWorks, Zotero and EndNote support the RIS format too. (2) Tabular datasets with the .csv, .xlsx and .xls file extensions. CSV files should be comma separated and UTF-8 encoded; for CSV files, the software accepts a set of predetermined label values in line with the ones used in RIS files. Each record in the dataset should hold the metadata on, for example, a scientific publication. Mandatory metadata is text and can, for example, be titles or abstracts from scientific papers. If available, both are used to train the model, but at least one is needed. An advanced option is available that splits the titles and abstracts in the feature-extraction step and weights the two feature matrices independently (for TF–IDF only). Other metadata such as author, date, DOI and keywords are optional but not used for training the models. When using ASReview in the simulation or exploration mode, an additional binary variable is required to indicate historical labelling decisions. This column, which is automatically detected, can also be used in the oracle mode as background knowledge for previous selection of relevant papers before entering the active learning cycle. If unavailable, the user has to select at least one relevant record that can be identified by searching the pool of records. At least one irrelevant record should also be identified; the software allows the user to search for specific records or presents random records, which are most likely to be irrelevant given the extremely imbalanced data.
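
As an illustration of such a tabular input file, the snippet below builds a minimal UTF-8 encoded, comma-separated CSV with pandas. The column names used here ("title", "abstract", "doi", "included") are assumptions based on the conventions described in the ASReview documentation; check the documentation of your version for the exact accepted names.

    import pandas as pd

    df = pd.DataFrame(
        {
            "title": ["Active learning for abstract screening", "CNNs for object detection"],
            "abstract": ["We study screening prioritisation ...", "We benchmark detectors ..."],
            "doi": ["10.1000/example.1", "10.1000/example.2"],  # optional metadata, not used for training
            "included": [1, 0],                                  # optional: prior or historical labelling decisions
        }
    )
    df.to_csv("records.csv", index=False, encoding="utf-8")      # comma separated, UTF-8 encoded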

The software has a simple yet extensible default model: a naive Bayes classifier, TF–IDF feature extraction, a dynamic resampling balance strategy 31 and certainty-based sampling 17 , 32 for the query strategy. These defaults were chosen on the basis of their consistently high performance in benchmark experiments across several datasets 31 . Moreover, the low computation time of these default settings makes them attractive in applications, given that the software should be able to run locally. Users can change the settings, shown in Table 2 , and technical details are described in our documentation 28 . Users can also add their own classifiers, feature extraction techniques, query strategies and balance strategies.

ASReview has a number of implemented features (see Table 2 ). First, there are several classifiers available: (1) naive Bayes; (2) support vector machines; (3) logistic regression; (4) neural networks; (5) random forests; (6) LSTM-base, which consists of an embedding layer, an LSTM layer with one output, a dense layer and a single sigmoid output node; and (7) LSTM-pool, which consists of an embedding layer, an LSTM layer with many outputs, a max pooling layer and a single sigmoid output node. The feature extraction techniques available are Doc2Vec 33 , embedding LSTM, embedding with IDF or TF–IDF 34 (the default is unigram, with the option to run n -grams while other parameters are set to the defaults of Scikit-learn 35 ) and sBERT 36 . The available query strategies for the active learning part are (1) random selection, ignoring model-assigned probabilities; (2) uncertainty-based sampling, which chooses the most uncertain record according to the model (that is, closest to 0.5 probability); (3) certainty-based sampling (max in ASReview), which chooses the record most likely to be included according to the model; and (4) mixed sampling, which uses a combination of random and certainty-based sampling.
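
To make the query strategies concrete, the sketch below picks the next record under each strategy from a vector of model-assigned relevance probabilities. The probability values, and the 95/5 split used for mixed sampling, are invented for illustration and do not reflect ASReview's internal implementation.

    import numpy as np

    rng = np.random.default_rng(42)
    proba = np.array([0.91, 0.08, 0.55, 0.30, 0.76])          # P(relevant) for each unlabelled record (made up)

    random_pick = int(rng.integers(len(proba)))                # random: ignore the probabilities
    uncertainty_pick = int(np.argmin(np.abs(proba - 0.5)))     # uncertainty: closest to 0.5
    certainty_pick = int(np.argmax(proba))                     # certainty ('max'): most likely to be relevant
    mixed_pick = random_pick if rng.random() < 0.05 else certainty_pick  # mixed: mostly 'max', occasionally random

    print(random_pick, uncertainty_pick, certainty_pick, mixed_pick)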

There are several balance strategies that rebalance and reorder the training data. This is necessary because the data are typically extremely imbalanced. We have therefore implemented the following balance strategies: (1) full sampling, which uses all of the labelled records; (2) undersampling the irrelevant records so that the included and excluded records are in some particular ratio (closer to one); and (3) dynamic resampling, a novel method similar to undersampling in that it decreases the imbalance of the training data 31 . However, in dynamic resampling, the number of irrelevant records is decreased, whereas the number of relevant records is increased by duplication such that the total number of records in the training data remains the same. The ratio between relevant and irrelevant records is not fixed over iterations, but dynamically updated depending on the number of labelled records, the total number of records and the ratio between relevant and irrelevant records. Details on all of the described algorithms can be found in the code and documentation referred to above.
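
The sketch below contrasts plain undersampling with a simplified version of the dynamic resampling idea. In ASReview the target ratio is updated dynamically from the number of labelled records, the total number of records and the class ratio (see ref. 31); here a fixed target ratio is used purely for illustration, so the code is an assumption-laden approximation rather than the actual algorithm.

    import numpy as np

    def undersample(relevant_idx, irrelevant_idx, ratio=1.0, rng=None):
        # Keep all relevant records; drop irrelevant ones until the classes are roughly 1:ratio.
        rng = rng or np.random.default_rng(0)
        n_keep = min(len(irrelevant_idx), int(len(relevant_idx) * ratio))
        keep = rng.choice(irrelevant_idx, size=n_keep, replace=False)
        return np.concatenate([relevant_idx, keep])

    def dynamic_resample(relevant_idx, irrelevant_idx, target_ratio=1.0, rng=None):
        # Keep the training-set size constant: duplicate relevant records and
        # subsample irrelevant ones so the class counts move towards the target ratio.
        rng = rng or np.random.default_rng(0)
        total = len(relevant_idx) + len(irrelevant_idx)
        n_rel = int(round(total * target_ratio / (1 + target_ratio)))
        n_irr = total - n_rel
        rel = rng.choice(relevant_idx, size=n_rel, replace=True)                      # duplication allowed
        irr = rng.choice(irrelevant_idx, size=min(n_irr, len(irrelevant_idx)), replace=False)
        return np.concatenate([rel, irr])

    rel = np.arange(5)          # 5 labelled relevant records (indices)
    irr = np.arange(5, 100)     # 95 labelled irrelevant records
    print(len(undersample(rel, irr)), len(dynamic_resample(rel, irr)))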

By default, ASReview converts the records' texts into a document-term matrix; terms are converted to lowercase and no stop words are removed (but this can be changed). As the document-term matrix is identical in each iteration of the active learning cycle, it is generated in advance of model training and stored in the (active learning) state file. Each row of the document-term matrix can easily be requested from the state file. Records are internally identified by their row number in the input dataset. In oracle mode, the record that is selected to be classified is retrieved from the state file and the record text and other metadata (such as title and abstract) are retrieved from the original dataset (from the file or the computer's memory). ASReview can run on a local computer, or on a (self-hosted) local or remote server. Data (all records and their labels) remain on the user's computer. Data ownership and confidentiality are crucial and no data are processed or used in any way by third parties. This is unique by comparison with some of the existing systems, as shown in the last column of Table 1 .

Real-world use cases and high-level function descriptions

Below we highlight a number of real-world use cases and high-level function descriptions for using the pipeline of ASReview.

ASReview can be integrated into classic systematic reviews or meta-analyses. Such reviews or meta-analyses entail several explicit and reproducible steps, as outlined in the PRISMA guidelines 4 . Scholars identify all likely relevant publications in a standardized way, screen retrieved publications to select eligible studies on the basis of defined eligibility criteria, extract data from eligible studies and synthesize the results. ASReview fits into this process, particularly in the abstract screening phase. ASReview does not replace the initial step of collecting all potentially relevant studies. As such, results from ASReview depend on the quality of the initial search process, including selection of databases 24 and construction of comprehensive searches using keywords and controlled vocabulary. However, ASReview can be used to broaden the scope of the search (by keyword expansion or by omitting limitations in the search query), resulting in a higher number of initial papers and limiting the risk of missing relevant papers during the search step (that is, placing more focus on recall than on precision).

Furthermore, many reviewers nowadays move towards meta-reviews when analysing very large literature streams, that is, systematic reviews of systematic reviews 37 . This can be problematic as the various reviews included could use different eligibility criteria and are therefore not always directly comparable. Due to the efficiency of ASReview, scholars using the tool could conduct the study by analysing the papers directly instead of using the systematic reviews. Furthermore, ASReview supports the rapid update of a systematic review. The included papers from the initial review are used to train the machine learning model before screening of the updated set of papers starts. This allows the researcher to quickly screen the updated set of papers on the basis of decisions made in the initial run.

As an example case, let us look at the current literature on COVID-19 and the coronavirus. An enormous number of papers are being published on COVID-19. It is very time consuming to manually find relevant papers (for example, to develop treatment guidelines). This is especially problematic as urgent overviews are required. Medical guidelines rely on comprehensive systematic reviews, but the medical literature is growing at a breakneck pace and the quality of the research is not universally adequate for summarization into policy 38 . Such reviews must entail adequate protocols with explicit and reproducible steps, including identifying all potentially relevant papers, extracting data from eligible studies, assessing potential for bias and synthesizing the results into medical guidelines. Researchers need to screen (tens of) thousands of COVID-19-related studies by hand to find relevant papers to include in their overview. Using ASReview, this can be done far more efficiently by selecting key papers that match their (COVID-19) research question in the first step; this starts the active learning cycle and leads to the most relevant COVID-19 papers for their research question being presented next. A plug-in was therefore developed for ASReview 39 , which contains three databases that are updated automatically whenever a new version is released by the owners of the data: (1) the CORD-19 database, developed by the Allen Institute for AI, containing publications on COVID-19 and other coronavirus research (for example, SARS and MERS) from PubMed Central, the WHO COVID-19 database of publications, the preprint servers bioRxiv and medRxiv, and papers contributed by specific publishers 40 . The CORD-19 dataset is updated daily by the Allen Institute for AI and is also updated daily in the plug-in. (2) In addition to the full dataset, we automatically construct a daily subset of the database with studies published after December 1st, 2019 to search for relevant papers published during the COVID-19 crisis. (3) A separate dataset of COVID-19-related preprints, containing metadata of preprints from over 15 preprint servers across disciplines, published since January 1st, 2020 41 . The preprint dataset is updated weekly by the maintainers and then automatically updated in ASReview as well. As this dataset is not readily available to researchers through regular search engines (for example, PubMed), its inclusion in ASReview provides added value to researchers interested in COVID-19 research, especially if they want a quick way to screen preprints specifically.

Simulation study

To evaluate the performance of ASReview on a labelled dataset, users can employ the simulation mode. As an example, we ran simulations based on four labelled datasets with version 0.7.2 of ASReview. All scripts to reproduce the results in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 , whereas the results are available at OSF ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 .

First, we analysed the performance for a study systematically describing studies that performed viral metagenomic next-generation sequencing in common livestock such as cattle, small ruminants, poultry and pigs 44 . Studies were retrieved from Embase ( n  = 1,806), Medline ( n  = 1,384), Cochrane Central ( n  = 1), Web of Science ( n  = 977) and Google Scholar ( n  = 200, the top relevant references). After deduplication this led to 2,481 studies obtained in the initial search, of which 120 were inclusions (4.84%).

A second simulation study was performed on the results for a systematic review of studies on fault prediction in software engineering 45 . Studies were obtained from the ACM Digital Library, IEEE Xplore and the ISI Web of Science. Furthermore, a snowballing strategy and a manual search were conducted, yielding a total of 8,911 publications, of which 104 were included in the systematic review (1.2%).

A third simulation study was performed on a review of longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure 46 , 47 ; 5,782 studies were obtained by searching PubMed, Embase, PsycINFO and Scopus and through a snowballing strategy in which both the references and the citations of the included papers were screened. Thirty-eight studies were included in the review (0.66%).

A fourth simulation study was performed on the results for a systematic review on the efficacy of angiotensin-converting enzyme inhibitors, from a study collecting various systematic review datasets from the medical sciences 15 . The collection is a subset of 2,544 publications from the TREC 2004 Genomics Track document corpus 48 . This is a static subset from all MEDLINE records from 1994 through 2003, which allows for replicability of results. Forty-one publications were included in the review (1.6%).

Performance metrics

We evaluated the four datasets using three performance metrics. We first assess the work saved over sampling (WSS), which is the percentage reduction in the number of records that need to be screened, achieved by using active learning instead of screening records at random; WSS is measured at a given level of recall of relevant records, for example 95%, indicating the reduction in screening effort at the cost of failing to detect 5% of the relevant records. For some researchers it is essential that all relevant literature on the topic is retrieved; this entails that the recall should be 100% (that is, WSS@100%). We also report the proportion of relevant references found after having screened the first 10% of the records (RRF10%). This is a useful metric for getting a quick overview of the relevant literature.
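
Both metrics can be computed directly from the order in which a screening procedure presents the records. The sketch below follows the WSS definition commonly used in the literature (for example, ref. 15) and a straightforward reading of RRF; the example screening order is invented, and the helper names are ours, not ASReview's.

    import numpy as np

    def wss(labels_in_screening_order, recall=0.95):
        # WSS@recall = recall - (records screened to reach that recall) / (total records)
        y = np.asarray(labels_in_screening_order)
        needed = int(np.ceil(recall * y.sum()))            # relevant records that must be recovered
        n_screened = int(np.argmax(np.cumsum(y) >= needed)) + 1
        return recall - n_screened / len(y)

    def rrf(labels_in_screening_order, fraction=0.10):
        # Share of all relevant records found within the first `fraction` of screened records.
        y = np.asarray(labels_in_screening_order)
        n_screened = int(np.ceil(fraction * len(y)))
        return y[:n_screened].sum() / y.sum()

    order = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(round(wss(order), 2), round(rrf(order), 2))      # 0.6 and 0.5 for this toy ordering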

For every dataset, 15 runs were performed with one random inclusion and one random exclusion (see Fig. 2 ). The classical review performance with randomly found inclusions is shown by the dashed line. The average work saved over sampling at 95% recall for ASReview is 83% and ranges from 67% to 92%. Hence, 95% of the eligible studies will be found after screening only 8% to 33% of the studies. Furthermore, the number of relevant abstracts found after reading 10% of the abstracts ranges from 70% to 100%. In short, our software would have saved many hours of work.

Figure 2 | a–d, Results of the simulation study for a study that systematically reviewed studies performing viral metagenomic next-generation sequencing in common livestock ( a ), a systematic review of studies on fault prediction in software engineering ( b ), longitudinal studies that applied unsupervised machine learning techniques to longitudinal data of self-reported symptoms of post-traumatic stress assessed after trauma exposure ( c ), and a systematic review on the efficacy of angiotensin-converting enzyme inhibitors ( d ). Fifteen runs (shown with separate lines) were performed for every dataset, with only one random inclusion and one random exclusion. The classical review performances with randomly found inclusions are shown by the dashed lines.

Usability testing (user experience testing)

We conducted a series of user experience tests to learn from end users how they experience the software and implement it in their workflow. The study was approved by the Ethics Committee of the Faculty of Social and Behavioral Sciences of Utrecht University (ID 20-104).

Unstructured interviews

The first user experience (UX) test—carried out in December 2019—was conducted with an academic research team in a substantive research field (public administration and organizational science) that has conducted various systematic reviews and meta-analyses. It was composed of three university professors (ranging from assistant to full) and three PhD candidates. In one 3.5 h session, the participants used the software and provided feedback via unstructured interviews and group discussions. The goal was to provide feedback on installing the software and testing the performance on their own data. After these sessions we prioritized the feedback in a meeting with the ASReview team, which resulted in the release of v.0.4 and v.0.6. An overview of all releases can be found on GitHub 27 .

A second UX test was conducted with four experienced researchers developing medical guidelines based on classical systematic reviews, and two experienced reviewers working at a pharmaceutical non-profit organization who work on updating reviews with new data. In four sessions, held in February to March 2020, these users tested the software following our testing protocol. After each session we implemented the feedback provided by the experts and asked them to review the software again. The main feedback was about how to upload datasets and select prior papers. Their feedback resulted in the release of v.0.7 and v.0.9.

Systematic UX test

In May 2020 we conducted a systematic UX test. Two groups of users were distinguished: an inexperienced group and an experienced group who had already used ASReview. Due to the COVID-19 lockdown, the usability tests were conducted via video calling, in which one person gave instructions to the participant and one person observed, a set-up known as human-moderated remote testing 49 . During the tests, one person (SH) asked the questions and helped the participant with the tasks, while the other person, a user experience professional at the IT department of Utrecht University (MH), observed and took notes.

To analyse the notes, thematic analysis was used, a method that analyses data by dividing the information into themes that each carry a distinct meaning 50 , using the NVivo 12 software 51 . When something went wrong the text was coded as a showstopper, when something did not go smoothly the text was coded as doubtful, and when something went well the text was coded as superb. The features the participants requested for future versions of the ASReview tool were discussed with the lead engineer of the ASReview team and were submitted to GitHub as issues or feature requests.

The answers to the quantitative questions can be found at the Open Science Framework 52 . The participants ( N  = 11) rated the tool with a grade of 7.9 (s.d. = 0.9) on a scale from one to ten (Table 2 ). The inexperienced users on average rated the tool with an 8.0 (s.d. = 1.1, N  = 6). The experienced users on average rated the tool with a 7.8 (s.d. = 0.9, N  = 5). The participants described the usability test with words such as helpful, accessible, fun, clear and obvious.

The UX tests resulted in the new releases v0.10 and v0.10.1, and in the major release v0.11, a substantial revision of the graphical user interface. The documentation has been upgraded to make installing and launching ASReview more straightforward. We made setting up a project, selecting a dataset and finding past knowledge more intuitive and flexible. We also added a project dashboard with information on screening progress, as well as advanced settings.

Continuous input via the open source community

Finally, the ASReview development team receives continuous feedback from the open science community about, among other things, the user experience. In every new release we implement features listed by our users. Recurring UX tests are performed to keep up with the needs of users and improve the value of the tool.

We designed a system to accelerate the step of screening titles and abstracts to help researchers conduct a systematic review or meta-analysis as efficiently and transparently as possible. Our system uses active learning to train a machine learning model that predicts relevance from texts using a limited number of labelled examples. The classifier, feature extraction technique, balance strategy and active learning query strategy are flexible. We provide an open source software implementation, ASReview, and compared it with state-of-the-art systems across a wide range of real-world systematic reviewing applications. Based on our experiments, ASReview provides defaults for its parameters, which exhibited good performance on average across the applications we examined. However, we stress that in practical applications these defaults should be carefully examined; for this purpose, the software provides a simulation mode to users. We encourage users and developers to perform further evaluation of the proposed approach in their application, and to take advantage of the open source nature of the project by contributing further developments.

Drawbacks of machine learning-based screening systems, including our own, remain. First, although the active learning step greatly reduces the number of manuscripts that must be screened, it also prevents a straightforward evaluation of the system’s error rates without further onerous labelling. Providing users with an accurate estimate of the system’s error rate in the application at hand is therefore a pressing open problem. Second, although, as argued above, the use of such systems is not limited in principle to reviewing, no empirical benchmarks of actual performance in these other situations yet exist to our knowledge. Third, machine learning-based screening systems automate the screening step only; although the screening step is time-consuming and a good target for automation, it is just one part of a much larger process, including the initial search, data extraction, coding for risk of bias, summarizing results and so on. Although some other works, similar to our own, have looked at (semi-)automating some of these steps in isolation 53 , 54 , to our knowledge the field is still far removed from an integrated system that would truly automate the review process while guaranteeing the quality of the produced evidence synthesis. Integrating the various tools that are currently under development to aid the systematic reviewing pipeline is therefore a worthwhile topic for future development.

Possible future research could also focus on the performance of identifying full text articles with different document length and domain-specific terminologies or even other types of text, such as newspaper articles and court cases. When the selection of past knowledge is not possible based on expert knowledge, alternative methods could be explored. For example, unsupervised learning or pseudolabelling algorithms could be used to improve training 55 , 56 . In addition, as the NLP community pushes forward the state of the art in feature extraction methods, these are easily added to our system as well. In all cases, performance benefits should be carefully evaluated using benchmarks for the task at hand. To this end, common benchmark challenges should be constructed that allow for an even comparison of the various tools now available. To facilitate such a benchmark, we have constructed a repository of publicly available systematic reviewing datasets 57 .

The future of systematic reviewing will be an interaction with machine learning algorithms to deal with the enormous increase of available text. We invite the community to contribute to open source projects such as our own, as well as to common benchmark challenges, so that we can provide measurable and reproducible improvement over current practice.

Data availability

The results described in this paper are available at the Open Science Framework ( https://doi.org/10.17605/OSF.IO/2JKD6 ) 43 . The answers to the quantitative questions of the UX test can be found at the Open Science Framework (OSF.IO/7PQNM) 52 .

Code availability

All code to reproduce the results described in this paper can be found on Zenodo ( https://doi.org/10.5281/zenodo.4024122 ) 42 . All code for the software ASReview is available under an Apache 2.0 license ( https://doi.org/10.5281/zenodo.3345592 ) 27 , is maintained on GitHub 63 and includes documentation ( https://doi.org/10.5281/zenodo.4287120 ) 28 .

Bornmann, L. & Mutz, R. Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66 , 2215–2222 (2015).


Gough, D., Oliver, S. & Thomas, J. An Introduction to Systematic Reviews (Sage, 2017).

Cooper, H. Research Synthesis and Meta-analysis: A Step-by-Step Approach (SAGE Publications, 2015).

Liberati, A. et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J. Clin. Epidemiol. 62 , e1–e34 (2009).

Boaz, A. et al. Systematic Reviews: What have They Got to Offer Evidence Based Policy and Practice? (ESRC UK Centre for Evidence Based Policy and Practice London, 2002).

Oliver, S., Dickson, K. & Bangpan, M. Systematic Reviews: Making Them Policy Relevant. A Briefing for Policy Makers and Systematic Reviewers (UCL Institute of Education, 2015).

Petticrew, M. Systematic reviews from astronomy to zoology: myths and misconceptions. Brit. Med. J. 322 , 98–101 (2001).

Lefebvre, C., Manheimer, E. & Glanville, J. in Cochrane Handbook for Systematic Reviews of Interventions (eds. Higgins, J. P. & Green, S.) 95–150 (John Wiley & Sons, 2008); https://doi.org/10.1002/9780470712184.ch6 .

Sampson, M., Tetzlaff, J. & Urquhart, C. Precision of healthcare systematic review searches in a cross-sectional sample. Res. Synth. Methods 2 , 119–125 (2011).

Wang, Z., Nayfeh, T., Tetzlaff, J., O’Blenis, P. & Murad, M. H. Error rates of human reviewers during abstract screening in systematic reviews. PLoS ONE 15 , e0227742 (2020).

Marshall, I. J. & Wallace, B. C. Toward systematic review automation: a practical guide to using machine learning tools in research synthesis. Syst. Rev. 8 , 163 (2019).

Harrison, H., Griffin, S. J., Kuhn, I. & Usher-Smith, J. A. Software tools to support title and abstract screening for systematic reviews in healthcare: an evaluation. BMC Med. Res. Methodol. 20 , 7 (2020).

O’Mara-Eves, A., Thomas, J., McNaught, J., Miwa, M. & Ananiadou, S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst. Rev. 4 , 5 (2015).

Wallace, B. C., Trikalinos, T. A., Lau, J., Brodley, C. & Schmid, C. H. Semi-automated screening of biomedical citations for systematic reviews. BMC Bioinf. 11 , 55 (2010).

Cohen, A. M., Hersh, W. R., Peterson, K. & Yen, P.-Y. Reducing workload in systematic review preparation using automated citation classification. J. Am. Med. Inform. Assoc. 13 , 206–219 (2006).

Kremer, J., Steenstrup Pedersen, K. & Igel, C. Active learning with support vector machines. WIREs Data Min. Knowl. Discov. 4 , 313–326 (2014).

Miwa, M., Thomas, J., O’Mara-Eves, A. & Ananiadou, S. Reducing systematic review workload through certainty-based screening. J. Biomed. Inform. 51 , 242–253 (2014).

Settles, B. Active Learning Literature Survey (Minds@UW, 2009); https://minds.wisconsin.edu/handle/1793/60660

Holzinger, A. Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Inform. 3 , 119–131 (2016).

Van de Schoot, R. & De Bruin, J. Researcher-in-the-loop for Systematic Reviewing of Text Databases (Zenodo, 2020); https://doi.org/10.5281/zenodo.4013207

Kim, D., Seo, D., Cho, S. & Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 477 , 15–29 (2019).

Nosek, B. A. et al. Promoting an open research culture. Science 348 , 1422–1425 (2015).

Kilicoglu, H., Demner-Fushman, D., Rindflesch, T. C., Wilczynski, N. L. & Haynes, R. B. Towards automatic recognition of scientifically rigorous clinical research evidence. J. Am. Med. Inform. Assoc. 16 , 25–31 (2009).

Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11 , 181–217 (2020).

Borah, R., Brown, A. W., Capers, P. L. & Kaiser, K. A. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open 7 , e012545 (2017).

de Vries, H., Bekkers, V. & Tummers, L. Innovation in the Public Sector: a systematic review and future research agenda. Public Adm. 94 , 146–166 (2016).

Van de Schoot, R. et al. ASReview: Active Learning for Systematic Reviews (Zenodo, 2020); https://doi.org/10.5281/zenodo.3345592

De Bruin, J. et al. ASReview Software Documentation 0.14 (Zenodo, 2020); https://doi.org/10.5281/zenodo.4287120

ASReview PyPI Package (ASReview Core Development Team, 2020); https://pypi.org/project/asreview/

Docker container for ASReview (ASReview Core Development Team, 2020); https://hub.docker.com/r/asreview/asreview

Ferdinands, G. et al. Active Learning for Screening Prioritization in Systematic Reviews—A Simulation Study (OSF Preprints, 2020); https://doi.org/10.31219/osf.io/w6qbg

Fu, J. H. & Lee, S. L. Certainty-enhanced active learning for improving imbalanced data classification. In 2011 IEEE 11th International Conference on Data Mining Workshops 405–412 (IEEE, 2011).

Le, Q. V. & Mikolov, T. Distributed representations of sentences and documents. Preprint at https://arxiv.org/abs/1405.4053 (2014).

Ramos, J. Using TF–IDF to determine word relevance in document queries. In Proc. 1st Instructional Conference on Machine Learning Vol. 242, 133–142 (ICML, 2003).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).


Reimers, N. & Gurevych, I. Sentence-BERT: sentence embeddings using siamese BERT-networks. Preprint at https://arxiv.org/abs/1908.10084 (2019).

Smith, V., Devane, D., Begley, C. M. & Clarke, M. Methodology in conducting a systematic review of systematic reviews of healthcare interventions. BMC Med. Res. Methodol. 11 , 15 (2011).

Wynants, L. et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. Brit. Med. J . 369 , 1328 (2020).

Van de Schoot, R. et al. Extension for COVID-19 Related Datasets in ASReview (Zenodo, 2020). https://doi.org/10.5281/zenodo.3891420 .

Lu Wang, L. et al. CORD-19: The COVID-19 open research dataset. Preprint at https://arxiv.org/abs/2004.10706 (2020).

Fraser, N. & Kramer, B. Covid19_preprints (FigShare, 2020); https://doi.org/10.6084/m9.figshare.12033672.v18

Ferdinands, G., Schram, R., Van de Schoot, R. & De Bruin, J. Scripts for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (Zenodo, 2020); https://doi.org/10.5281/zenodo.4024122

Ferdinands, G., Schram, R., van de Schoot, R. & de Bruin, J. Results for ‘ASReview: Open Source Software for Efficient and Transparent Active Learning for Systematic Reviews’ (OSF, 2020); https://doi.org/10.17605/OSF.IO/2JKD6

Kwok, K. T. T., Nieuwenhuijse, D. F., Phan, M. V. T. & Koopmans, M. P. G. Virus metagenomics in farm animals: a systematic review. Viruses 12 , 107 (2020).

Hall, T., Beecham, S., Bowes, D., Gray, D. & Counsell, S. A systematic literature review on fault prediction performance in software engineering. IEEE Trans. Softw. Eng. 38 , 1276–1304 (2012).

van de Schoot, R., Sijbrandij, M., Winter, S. D., Depaoli, S. & Vermunt, J. K. The GRoLTS-Checklist: guidelines for reporting on latent trajectory studies. Struct. Equ. Model. Multidiscip. J. 24 , 451–467 (2017).


van de Schoot, R. et al. Bayesian PTSD-trajectory analysis with informed priors based on a systematic literature search and expert elicitation. Multivar. Behav. Res. 53 , 267–291 (2018).

Cohen, A. M., Bhupatiraju, R. T. & Hersh, W. R. Feature generation, feature selection, classifiers, and conceptual drift for biomedical document triage. In Proc. 13th Text Retrieval Conference (TREC, 2004).

Vasalou, A., Ng, B. D., Wiemer-Hastings, P. & Oshlyansky, L. Human-moderated remote user testing: protocols and applications. In 8th ERCIM Workshop, User Interfaces for All Vol. 19 (ERCIM, 2004).

Joffe, H. in Qualitative Research Methods in Mental Health and Psychotherapy: A Guide for Students and Practitioners (eds Harper, D. & Thompson, A. R.) Ch. 15 (Wiley, 2012).

NVivo v. 12 (QSR International Pty, 2019).

Hindriks, S., Huijts, M. & van de Schoot, R. Data for UX-test ASReview - June 2020. OSF https://doi.org/10.17605/OSF.IO/7PQNM (2020).

Marshall, I. J., Kuiper, J. & Wallace, B. C. RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. J. Am. Med. Inform. Assoc. 23 , 193–201 (2016).

Nallapati, R., Zhou, B., dos Santos, C. N., Gulcehre, Ç. & Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proc. 20th SIGNLL Conference on Computational Natural Language Learning 280–290 (Association for Computational Linguistics, 2016).

Xie, Q., Dai, Z., Hovy, E., Luong, M.-T. & Le, Q. V. Unsupervised data augmentation for consistency training. Preprint at https://arxiv.org/abs/1904.12848 (2019).

Ratner, A. et al. Snorkel: rapid training data creation with weak supervision. VLDB J. 29 , 709–730 (2020).

Systematic Review Datasets (ASReview Core Development Team, 2020); https://github.com/asreview/systematic-review-datasets

Wallace, B. C., Small, K., Brodley, C. E., Lau, J. & Trikalinos, T. A. Deploying an interactive machine learning system in an evidence-based practice center: Abstrackr. In Proc. 2nd ACM SIGHIT International Health Informatics Symposium 819–824 (Association for Computing Machinery, 2012).

Cheng, S. H. et al. Using machine learning to advance synthesis and use of conservation and environmental evidence. Conserv. Biol. 32 , 762–764 (2018).

Yu, Z., Kraft, N. & Menzies, T. Finding better active learners for faster literature reviews. Empir. Softw. Eng . 23 , 3161–3186 (2018).

Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan—a web and mobile app for systematic reviews. Syst. Rev. 5 , 210 (2016).

Przybyła, P. et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res. Synth. Methods 9 , 470–488 (2018).

ASReview: Active learning for Systematic Reviews (ASReview Core Development Team, 2020); https://github.com/asreview/asreview


Acknowledgements

We would like to thank the Utrecht University Library, focus area Applied Data Science, and departments of Information and Technology Services, Test and Quality Services, and Methodology and Statistics, for their support. We also want to thank all researchers who shared data, participated in our user experience tests or who gave us feedback on ASReview in other ways. Furthermore, we would like to thank the editors and reviewers for providing constructive feedback. This project was funded by the Innovation Fund for IT in Research Projects, Utrecht University, the Netherlands.

Author information

Authors and affiliations.

Department of Methodology and Statistics, Faculty of Social and Behavioral Sciences, Utrecht University, Utrecht, the Netherlands

Rens van de Schoot, Gerbrich Ferdinands, Albert Harkema, Joukje Willemsen, Yongchao Ma, Qixiang Fang, Sybren Hindriks & Daniel L. Oberski

Department of Research and Data Management Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Jonathan de Bruin, Raoul Schram, Parisa Zahedi & Maarten Hoogerwerf

Utrecht University Library, Utrecht University, Utrecht, the Netherlands

Jan de Boer, Felix Weijdema & Bianca Kramer

Department of Test and Quality Services, Information Technology Services, Utrecht University, Utrecht, the Netherlands

Martijn Huijts

School of Governance, Faculty of Law, Economics and Governance, Utrecht University, Utrecht, the Netherlands

Lars Tummers

Department of Biostatistics, Data management and Data Science, Julius Center, University Medical Center Utrecht, Utrecht, the Netherlands

Daniel L. Oberski


Contributions

R.v.d.S. and D.O. originally designed the project, with later input from L.T. J.d.Br. is the lead engineer, software architect and supervises the code base on GitHub. R.S. coded the algorithms and simulation studies. P.Z. coded the very first version of the software. J.d.Bo., F.W. and B.K. developed the systematic review pipeline. M.Huijts is leading the UX tests and was supported by S.H. M.Hoogerwerf developed the architecture of the produced (meta)data. G.F. conducted the simulation study together with R.S. A.H. performed the literature search comparing the different tools together with G.F. J.W. designed all the artwork and helped with formatting the manuscript. Y.M. and Q.F. are responsible for the preprocessing of the metadata under the supervision of J.d.Br. R.v.d.S, D.O. and L.T. wrote the paper with input from all authors. Each co-author has written parts of the manuscript.

Corresponding author

Correspondence to Rens van de Schoot .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Jian Wu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Overview of software tools supporting systematic reviews.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

van de Schoot, R., de Bruin, J., Schram, R. et al. An open source machine learning framework for efficient and transparent systematic reviews. Nat Mach Intell 3 , 125–133 (2021). https://doi.org/10.1038/s42256-020-00287-7


Received : 04 June 2020

Accepted : 17 December 2020

Published : 01 February 2021

Issue Date : February 2021

DOI : https://doi.org/10.1038/s42256-020-00287-7






5 software tools to support your systematic review processes

By Dr. Mina Kalantar on 19-Jan-2021 13:01:01


Systematic reviews are a methodical reassessment of scholarly literature to facilitate decision making. This approach of re-evaluating evidence was initially applied in healthcare to set policies, create guidelines and answer medical questions.

Systematic reviews are large, complex projects and, depending on the purpose, they can be quite expensive to conduct. A team of researchers, data analysts and experts from various fields may collaborate to review and examine incredibly large numbers of research articles for evidence synthesis. Depending on their scope, systematic reviews often take at least 6 months, and sometimes upwards of 18 months, to complete.

The main principles of transparency and reproducibility require a pragmatic approach in the organisation of the required research activities and detailed documentation of the outcomes. As a result, many software tools have been developed to help researchers with some of the tedious tasks required as part of the systematic review process.


The first generation of these software tools were produced to accommodate and manage collaborations, but gradually developed to help with screening literature and reporting outcomes. Some of these software packages were initially designed for medical and healthcare studies and have specific protocols and customised steps integrated for various types of systematic reviews. However, some are designed for general processing, and by extending the application of the systematic review approach to other fields, they are being increasingly adopted and used in software engineering, health-related nutrition, agriculture, environmental science, social sciences and education.

Software tools

There are various free and subscription-based tools to help with conducting a systematic review. Many of these tools are designed to assist with the key stages of the process, including title and abstract screening, data synthesis, and critical appraisal. Some are designed to facilitate the entire process of review, including protocol development, reporting of the outcomes and help with fast project completion.

As time goes on, more functions are being integrated into such software tools. Technological advancement has allowed for more sophisticated and user-friendly features, including visual graphics for pattern recognition and linking multiple concepts. The idea is to digitalise the cumbersome parts of the process to increase efficiency, thus allowing researchers to focus their time and efforts on assessing the rigorousness and robustness of the research articles.

This article introduces commonly used systematic review tools that are relevant to food research and related disciplines, which can be used in a similar context to the process in healthcare disciplines.

These reviews are based on IFIS' internal research, thus are unbiased and not affiliated with the companies.


Covidence

This online platform is a core component of the Cochrane toolkit, supporting parts of the systematic review process, including title/abstract and full-text screening, documentation, and reporting.

The Covidence platform enables collaboration of the entire systematic reviews team and is suitable for researchers and students at all levels of experience.

From a user perspective, the interface is intuitive, and the citation screening is directed step-by-step through a well-defined workflow. Imports and exports are straightforward, with easy export options to Excel and CSV.

Access is free for Cochrane authors (a single reviewer), and Cochrane provides a free trial to other researchers in healthcare. Universities can also subscribe on an institutional basis.

Rayyan is a free and open access web-based platform funded by the Qatar Foundation, a non-profit organisation supporting education and community development initiatives. Rayyan is used to screen and code literature as part of a systematic review process.

Unlike Covidence, Rayyan does not follow a standard SR workflow and simply helps with citation screening. It is accessible through a mobile application with compatibility for offline screening. The web-based platform is known for its accessible user interface, with easy and clear export options.

Function comparison of 5 software tools to support the systematic review process

EPPI-Reviewer

EPPI-Reviewer is a web-based software programme developed by the Evidence for Policy and Practice Information and Co-ordinating Centre  (EPPI) at the UCL Institute for Education, London .

It provides comprehensive functionalities for coding and screening. Users can create different levels of coding in a code set tool for clustering, screening, and administration of documents. EPPI-Reviewer allows direct search and import from PubMed. The import of search results from other databases is feasible in different formats. It stores references, and identifies and removes duplicates automatically. EPPI-Reviewer allows full-text screening, text mining, meta-analysis and the export of data into different types of reports.

There is no limit on the number of concurrent users of the software or on the number of articles being reviewed. Cochrane reviewers can access EPPI-Reviewer using their Cochrane subscription details.

EPPI-Centre has other tools for facilitating the systematic review process, including coding guidelines and data management tools.

CADIMA is a free, online, open access review management tool, developed to facilitate research synthesis and structure documentation of the outcomes.

The Julius Kühn Institute and the Collaboration for Environmental Evidence established the software programme to support and guide users through the entire systematic review process, including protocol development, literature searching, study selection, critical appraisal, and documentation of the outcomes. The flexibility in choosing the steps also makes CADIMA suitable for conducting systematic mapping and rapid reviews.

CADIMA was initially developed for research questions in agriculture and environment but it is not limited to these, and as such, can be used for managing review processes in other disciplines. It enables users to export files and work offline.

The software allows for statistical analysis of the collated data using the R statistical software. Unlike EPPI-Reviewer, CADIMA does not have a built-in search engine to allow for searching in literature databases like PubMed.

DistillerSR

DistillerSR is an online software platform maintained by the Canadian company Evidence Partners, which specialises in literature review automation. DistillerSR provides a collaborative platform for every stage of literature review management. The framework is flexible and can accommodate literature reviews of different sizes. It is configurable to different data curation procedures, workflows and reporting standards. The platform integrates the necessary features for screening, quality assessment, data extraction and reporting. The software uses artificial intelligence (AI)-enabled technologies in priority screening, with the aim of shortening the screening process by reranking the most relevant references nearer to the top. It can also use AI, as a second reviewer, in quality control checks of studies screened by human reviewers. DistillerSR is used to manage systematic reviews in various medical disciplines, surveillance, pharmacovigilance and public health reviews, including food and nutrition topics. The software does not support statistical analyses. It provides configurable forms in standard formats for data extraction.

DistillerSR allows direct search and import of references from PubMed. It provides an add-on feature called LitConnect, which can be set to automatically import newly published references from data providers to keep reviews up to date during their progress.

The Systematic Review Toolbox is a web-based catalogue of various tools, including software packages which can assist with single or multiple tasks within the evidence synthesis process. Researchers can run a quick search or tailor a more sophisticated search by choosing their approach, budget, discipline, and preferred support features, to find the right tools for their research.

If you enjoyed this blog post, you may also be interested in our recently published blog post addressing the difference between a systematic review and a systematic literature review.




Join the movement towards fast, open, and transparent systematic reviews


ASReview uses state-of-the-art active learning techniques to solve one of the most interesting challenges in systematically screening large amounts of text: there's not enough time to read everything!
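The loop below is a minimal, self-contained sketch of the active learning idea behind such screening tools: retrain on every new human decision and always surface the record the model currently judges most likely to be relevant. The abstracts and labels are toy stand-ins, and the sketch does not reproduce ASReview's own models or code.

```python
# Minimal active learning screening loop: retrain after each human label and
# show the reviewer the record predicted most likely to be relevant next.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

abstracts = [
    "active learning for screening systematic review records",
    "open source software community quality assessment",
    "unrelated study on crop irrigation schedules",
    "machine learning prioritisation of citation screening",
]
labels = {0: 1, 2: 0}  # records already labelled: 1 = relevant, 0 = irrelevant

X = TfidfVectorizer().fit_transform(abstracts)

while len(labels) < len(abstracts):
    train_idx = list(labels)
    model = MultinomialNB().fit(X[train_idx], [labels[i] for i in train_idx])

    unlabelled = [i for i in range(len(abstracts)) if i not in labels]
    scores = model.predict_proba(X[unlabelled])[:, 1]
    next_idx = unlabelled[int(np.argmax(scores))]

    print("Show the reviewer:", abstracts[next_idx])
    labels[next_idx] = 1  # stand-in for the reviewer's real decision
```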

The project has grown into a vivid worldwide community of researchers, users, and developers. ASReview is coordinated at Utrecht University and is part of the official AI-labs at the university.


Free, Open and Transparent

The software is installed on your device locally. This ensures that nobody else has access to your data, except when you share it with others. Nice, isn’t it?

  • Free and open source
  • Local or server installation
  • Full control over your data
  • Follows the Reproducibility and Data Storage Checklist for AI-Aided Systematic Reviews

Up and running in 2 minutes

With the smart project setup features, you can start a new project in minutes. Ready, set, start screening!

  • Create as many projects as you want
  • Choose your own or an existing dataset
  • Select prior knowledge
  • Select your favorite active learning algorithm


Three modes to choose from

ASReview LAB can be used for:

  • Screening with the Oracle Mode, including advanced options
  • Teaching using the Exploration Mode
  • Validating algorithms using the Simulation Mode

We also offer an open-source research infrastructure to run large-scale simulation studies for validating newly developed AI algorithms.
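A simulation study of this kind can be boiled down to a simple replay: given a dataset whose relevance labels are already known, rank the records with the strategy under test and count how many must be "screened" before every relevant record has been found. The sketch below uses randomly generated toy data and a deliberately imperfect ranker purely to illustrate the measurement; it does not use ASReview's simulation infrastructure itself.

```python
# Hedged sketch of an algorithm-validation simulation: replay screening on a
# fully labelled toy dataset and measure how much screening the ranking saves.
import numpy as np

rng = np.random.default_rng(42)
true_labels = rng.integers(0, 2, size=200)          # known relevance labels
model_scores = true_labels * 0.6 + rng.random(200)  # an imperfect ranker

order = np.argsort(-model_scores)      # screen highest-scored records first
found = np.cumsum(true_labels[order])  # relevant records found so far

records_needed = int(np.argmax(found == true_labels.sum())) + 1
print(f"All relevant records found after screening {records_needed} of 200")
```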

Follow the development

Open-source means:

  • All annotated source code is available
  • You can see the developers at work in open pull requests
  • Open pull requests show the direction in which the project is developing
  • Anyone can contribute!

Give a GitHub repo a star if you like our work.


Join the community

A community-driven project means:

  • The project is a joint endeavour
  • Your contribution matters!

Join the movement towards transparent AI-aided reviewing

Contributor path: Beginner → User → Developer → Maintainer


Join the ASReview Development Fund

Many users donate their time to continue the development of the different software tools that are part of the ASReview universe. Also, donations and research grants make innovations possible!



Software for systematic reviews



A range of software is available for systematic reviews, especially to support screening and data extraction but also for other stages of the process. Specialist systematic review software may also contain functions for machine learning, data-analysis, visualisation and reporting tools. It is also possible to use reference management software or Excel for some of the stages of reviewing.

  • EPPI-Reviewer Developed within the EPPI-Centre at the UCL Institute of Education. There is a cost to use it, with a reduced fee for single-user reviews (for students or researchers on small budgets) compared with the fee for reviews that can have more than one user. The fees go towards the cost of infrastructure, development and support of the software. A free trial is also available.
  • EPPI-Mapper Tool for visualising 'maps' of research evidence.
  • Rayyan Freely available software that can be used for the screening process and has other features such as text mining tools.
  • Covidence Offers a free trial with a limited number of records and reviewers, and then requires an annual subscription. It can be used for screening and to support data extraction.
  • Abstrackr Free open source screening software created by Brown University.
  • Systematic Review Toolbox A comprehensive list of software to support systematic reviews at a variety of stages. You can also search by stage of review (e.g. protocol) to find appropriate software.
  • RevMan Software from the Cochrane Collaboration. RevMan Web facilitates the creation of meta-analyses, forest plots, risk-of-bias tables and other systematic review elements. Free to authors working on Cochrane Reviews; otherwise a subscription is required.
  • Reference management software

Although reference management tools such as EndNote, Mendeley and Zotero are not bespoke review management software, they can be used for systematic reviews. The Journal of the Medical Library Association, for example, has documented the use of EndNote. In addition, below are some links that discuss the use of Mendeley and Zotero for systematic reviews.

  • Reference management software UCL Library Services guide to reference management software.
  • Using EndNote for systematic reviews Guidance from UCL Library Services.

Mendeley is only suitable for systematic reviews if you already have the desktop version downloaded, as its replacement, Mendeley Reference Manager, does not currently have a deduplication option (a minimal sketch of what such deduplication involves follows the list below). Mendeley Desktop is no longer available to download for new users and will only be available on UCL computers, or on your own device using Desktop@UCL Anywhere, for a limited time.

  • Using Mendeley Desktop for systematic reviews UCL Library Services guide to using Mendeley Desktop for systematic reviews
  • Zotero and systematic reviews Zotero discussion forum on using Zotero for systematic reviews.
  • Using Zotero for systematic reviews Video (20 mins) from Washington State University Libraries.
  • Zotero guide UCL Library Services guide to using Zotero.
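As noted above, here is a minimal sketch of reference deduplication, assuming a small hypothetical set of imported records: entries that share a DOI, or, where no DOI is present, a normalised title, are collapsed into one. Real reference managers apply fuzzier matching than this.

```python
# Minimal deduplication sketch: records sharing a DOI (or, failing that, a
# normalised title) are treated as the same reference. Hypothetical data.
import re

references = [
    {"title": "A Review of Screening Tools", "doi": "10.1000/xyz123"},
    {"title": "A review of screening tools.", "doi": "10.1000/XYZ123"},
    {"title": "An unrelated reference", "doi": ""},
]

def dedupe_key(record):
    doi = record["doi"].strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title)

unique = {}
for record in references:
    unique.setdefault(dedupe_key(record), record)  # keep the first occurrence

print(f"{len(references)} records -> {len(unique)} after deduplication")
```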

Related guides

  • << Previous: Synthesis and systematic maps
  • Next: Support for systematic reviews >>
  • Last Updated: Apr 4, 2024 10:09 AM
  • URL: https://library-guides.ucl.ac.uk/systematic-reviews


Title: Migrating Software Systems towards Post-Quantum-Cryptography -- A Systematic Literature Review

Abstract: Networks such as the Internet are essential for our connected world. Quantum computing poses a threat to this heterogeneous infrastructure since it threatens fundamental security mechanisms. Therefore, a migration to post-quantum-cryptography (PQC) is necessary for networks and their components. At the moment, there is little knowledge on how such migrations should be structured and implemented in practice. Our systematic literature review addresses migration approaches for IP networks towards PQC. It surveys papers about the migration process and exemplary real-world software system migrations. On the process side, we found that terminology, migration steps, and roles are not defined precisely or consistently across the literature. Still, we identified four major phases and appropriate substeps which we matched with also emerging archetypes of roles. In terms of real-world migrations, we see that reports used several different PQC implementations and hybrid solutions for migrations of systems belonging to a wide range of system types. Across all papers we noticed three major challenges for adopters: missing experience of PQC and a high realization effort, concerns about the security of the upcoming system, and finally, high complexity. Our findings indicate that recent standardization efforts already push quantum-safe networking forward. However, the literature is still not in consensus about definitions and best practices. Implementations are mostly experimental and not necessarily practical, leading to an overall chaotic situation. To better grasp this fast moving field of (applied) research, our systematic literature review provides a comprehensive overview of its current state and serves as a starting point for delving into the matter of PQC migration.


