Heart Views, v.18(3); Jul-Sep 2017

Guidelines To Writing A Clinical Case Report

What is a clinical case report?

A case report is a detailed report of the symptoms, signs, diagnosis, treatment, and follow-up of an individual patient. Case reports usually describe an unusual or novel occurrence and as such, remain one of the cornerstones of medical progress and provide many new ideas in medicine. Some reports contain an extensive review of the relevant literature on the topic. The case report is a rapid short communication between busy clinicians who may not have time or resources to conduct large scale research.

WHAT ARE THE REASONS FOR PUBLISHING A CASE REPORT?

The most common reasons for publishing a case are the following: 1) an unexpected association between diseases or symptoms; 2) an unexpected event in the course of observing or treating a patient; 3) findings that shed new light on the possible pathogenesis of a disease or an adverse effect; 4) unique or rare features of a disease; 5) unique therapeutic approaches; and 6) variation of anatomical structures.

Most journals publish case reports that deal with one or more of the following:

  • Unusual observations
  • Adverse response to therapies
  • Unusual combination of conditions leading to confusion
  • Illustration of a new theory
  • Question regarding a current theory
  • Personal impact.

STRUCTURE OF A CASE REPORT[ 1 , 2 ]

Different journals have slightly different formats for case reports. It is always a good idea to read some of the target journal's case reports to get a general idea of the sequence and format.

In general, all case reports include the following components: an abstract, an introduction, a case, and a discussion. Some journals might also require a literature review.

The abstract should summarize the case, the problem it addresses, and the message it conveys. Abstracts of case studies are usually very short, preferably not more than 150 words.

Introduction

The introduction gives a brief overview of the problem that the case addresses, citing relevant literature where necessary. The introduction generally ends with a single sentence describing the patient and the basic condition that he or she is suffering from.

Case

This section provides the details of the case in the following order:

  • Patient description
  • Case history
  • Physical examination results
  • Results of pathological tests and other investigations
  • Treatment plan
  • Expected outcome of the treatment plan
  • Actual outcome.

The author should ensure that all the relevant details are included and unnecessary ones excluded.

Discussion

This is the most important part of the case report: the part that will convince the journal that the case is publication-worthy. This section should start by expanding on what has been said in the introduction, focusing on why the case is noteworthy and the problem that it addresses.

This is followed by a summary of the existing literature on the topic. (If the journal specifies a separate section on literature review, it should be added before the Discussion.) This part describes the existing theories and research findings on the key issue in the patient's condition. The review should narrow down to the source of confusion or the main challenge in the case.

Finally, the case report should be connected to the existing literature, mentioning the message that the case conveys. The author should explain whether the case corroborates or detracts from current beliefs about the problem and how this evidence can add value to future clinical practice.

A case report ends with a conclusion or with summary points, depending on the journal's specified format. This section should briefly give readers the key points covered in the case report. Here, the author can give suggestions and recommendations to clinicians, teachers, or researchers. Some journals do not want a separate section for the conclusion: it can then be the concluding paragraph of the Discussion section.

Notes on patient consent

Informed consent is an ethical requirement for most studies involving humans, so before you start writing your case report, obtain written consent from the patient, as all journals require that you provide it at the time of manuscript submission. If the patient is a minor, parental consent is required. For adults who are unable to consent to investigation or treatment, consent of the closest family members is required.

Patient anonymity is also an important requirement. Remember not to disclose any information that might reveal the identity of the patient. You need to be particularly careful with pictures, and ensure that pictures of the affected area do not reveal the identity of the patient.

Writing a case report in 10 steps

  • Victoria Stokes, foundation year 2 doctor, trauma and orthopaedics, Basildon Hospital
  • Caroline Fertleman, paediatrics consultant, The Whittington Hospital NHS Trust
  • victoria.stokes1{at}nhs.net

Victoria Stokes and Caroline Fertleman explain how to turn an interesting case or unusual presentation into an educational report

It is common practice in medicine that when we come across an interesting case with an unusual presentation or a surprise twist, we must tell the rest of the medical world. This is how we continue our lifelong learning and aid faster diagnosis and treatment for patients.

It usually falls to the junior to write up the case, so here are a few simple tips to get you started.

First steps

Begin by sitting down with your medical team to discuss the interesting aspects of the case and the learning points to highlight. Ideally, a registrar or middle grade will mentor you and give you guidance. Another junior doctor or medical student may also be keen to be involved. Allocate jobs to split the workload, set a deadline and work timeframe, and discuss the order in which the authors will be listed. All listed authors should contribute substantially, with the person doing most of the work put first and the guarantor (usually the most senior team member) at the end.

Getting consent

Gain permission and written consent to write up the case from the patient or parents, if your patient is a child, and keep a copy because you will need it later for submission to journals.

Information gathering

Gather all the information from the medical notes and the hospital’s electronic systems, including copies of blood results and imaging, as medical notes often disappear when the patient is discharged and are notoriously difficult to find again. Remember to anonymise the data according to your local hospital policy.

Write up the case emphasising the interesting points of the presentation, investigations leading to diagnosis, and management of the disease/pathology. Get input on the case from all members of the team, highlighting their involvement. Also include the prognosis of the patient, if known, as the reader will want to know the outcome.

Coming up with a title

Discuss a title with your supervisor and other members of the team, as this provides the focus for your article. The title should be concise and interesting but should also enable people to find it in medical literature search engines. Also think about how you will present your case study—for example, a poster presentation or scientific paper—and consider potential journals or conferences, as you may need to write in a particular style or format.

Background research

Research the disease/pathology that is the focus of your article and write a background paragraph or two, highlighting the relevance of your case report in relation to this. If you are struggling, seek the opinion of a specialist who may know of relevant articles or texts. Another good resource is your hospital library, where staff are often more than happy to help with literature searches.

How your case is different

Move on to explore how the case presented differently to the admitting team. Alternatively, if your report is focused on management, explore the difficulties the team came across and alternative options for treatment.

Finish by explaining why your case report adds to the medical literature and highlight any learning points.

Writing an abstract

The abstract should be no longer than 100-200 words and should highlight all your key points concisely. This can be harder than writing the full article and needs special care as it will be used to judge whether your case is accepted for presentation or publication.

Discuss with your supervisor or team about options for presenting or publishing your case report. At the very least, you should present your article locally within a departmental or team meeting or at a hospital grand round. Well done!

Competing interests: We have read and understood BMJ’s policy on declaration of interests and declare that we have no competing interests.

Case Study Research Method in Psychology

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.

Case studies are in-depth investigations of a person, group, event, or community. Typically, data is gathered from various sources using several methods (e.g., observations & interviews).

The case study research method originated in clinical medicine (the case history, i.e., the patient’s personal history). In psychology, case studies are often confined to the study of a particular individual.

The information is mainly biographical and relates to events in the individual’s past (i.e., retrospective), as well as to significant events that are currently occurring in his or her everyday life.

The case study is not itself a research method; rather, researchers select methods of data collection and analysis that will generate material suitable for case studies.

Freud (1909a, 1909b) conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.

This makes it clear that the case study is a method that should only be used by a psychologist, therapist, or psychiatrist, i.e., someone with a professional qualification.

There is an ethical issue of competence. Only someone qualified to diagnose and treat a person can conduct a formal case study relating to atypical (i.e., abnormal) behavior or atypical development.

 Famous Case Studies

  • Anna O – One of the most famous case studies, documenting psychoanalyst Josef Breuer’s treatment of “Anna O” (real name Bertha Pappenheim) for hysteria in the late 1800s using early psychoanalytic theory.
  • Little Hans – A child psychoanalysis case study published by Sigmund Freud in 1909 analyzing his five-year-old patient Herbert Graf’s horse phobia as related to the Oedipus complex.
  • Bruce/Brenda – Gender identity case of a boy (Bruce) whose botched circumcision led psychologist John Money to advise gender reassignment and that he be raised as a girl (Brenda) in the 1960s.
  • Genie Wiley – Linguistics/psychological development case of the victim of extreme isolation abuse who was studied in 1970s California for effects of early language deprivation on acquiring speech later in life.
  • Phineas Gage – One of the most famous neuropsychology case studies analyzes personality changes in railroad worker Phineas Gage after an 1848 brain injury involving a tamping iron piercing his skull.

Clinical Case Studies

  • Studying the effectiveness of psychotherapy approaches with an individual patient
  • Assessing and treating mental illnesses like depression, anxiety disorders, PTSD
  • Neuropsychological cases investigating brain injuries or disorders

Child Psychology Case Studies

  • Studying psychological development from birth through adolescence
  • Cases of learning disabilities, autism spectrum disorders, ADHD
  • Effects of trauma, abuse, deprivation on development

Types of Case Studies

  • Explanatory case studies : Used to explore causation in order to find underlying principles. Helpful for doing qualitative analysis to explain presumed causal links.
  • Exploratory case studies : Used to explore situations where an intervention being evaluated has no clear set of outcomes. It helps define questions and hypotheses for future research.
  • Descriptive case studies : Describe an intervention or phenomenon and the real-life context in which it occurred. It is helpful for illustrating certain topics within an evaluation.
  • Multiple-case studies : Used to explore differences between cases and replicate findings across cases. Helpful for comparing and contrasting specific cases.
  • Intrinsic : Used to gain a better understanding of a particular case. Helpful for capturing the complexity of a single case.
  • Collective : Used to explore a general phenomenon using multiple case studies. Helpful for jointly studying a group of cases in order to inquire into the phenomenon.

Where Do You Find Data for a Case Study?

There are several places to find data for a case study. The key is to gather data from multiple sources to get a complete picture of the case and corroborate facts or findings through triangulation of evidence. Most of this information is likely qualitative (i.e., verbal description rather than measurement), but the psychologist might also collect numerical data.

1. Primary sources

  • Interviews – Interviewing key people related to the case to get their perspectives and insights. The interview is an extremely effective procedure for obtaining information about an individual, and it may be used to collect comments from the person’s friends, parents, employer, workmates, and others who have a good knowledge of the person, as well as to obtain facts from the person him or herself.
  • Observations – Observing behaviors, interactions, processes, etc., related to the case as they unfold in real-time.
  • Documents & Records – Reviewing private documents, diaries, public records, correspondence, meeting minutes, etc., relevant to the case.

2. Secondary sources

  • News/Media – News coverage of events related to the case study.
  • Academic articles – Journal articles, dissertations etc. that discuss the case.
  • Government reports – Official data and records related to the case context.
  • Books/films – Books, documentaries or films discussing the case.

3. Archival records

Searching historical archives, museum collections and databases to find relevant documents, visual/audio records related to the case history and context.

Public archives like newspapers, organizational records, photographic collections could all include potentially relevant pieces of information to shed light on attitudes, cultural perspectives, common practices and historical contexts related to psychology.

4. Organizational records

Organizational records offer the advantage of often having large datasets collected over time that can reveal or confirm psychological insights.

Of course, privacy and ethical concerns regarding confidential data must be navigated carefully.

However, with proper protocols, organizational records can provide invaluable context and empirical depth to qualitative case studies exploring the intersection of psychology and organizations.

  • Organizational/industrial psychology research : Organizational records like employee surveys, turnover/retention data, policies, incident reports etc. may provide insight into topics like job satisfaction, workplace culture and dynamics, leadership issues, employee behaviors etc.
  • Clinical psychology : Therapists/hospitals may grant access to anonymized medical records to study aspects like assessments, diagnoses, treatment plans etc. This could shed light on clinical practices.
  • School psychology : Studies could utilize anonymized student records like test scores, grades, disciplinary issues, and counseling referrals to study child development, learning barriers, effectiveness of support programs, and more.

How do I Write a Case Study in Psychology?

Follow specified case study guidelines provided by a journal or your psychology tutor. General components of clinical case studies include: background, symptoms, assessments, diagnosis, treatment, and outcomes. Interpreting the information means the researcher decides what to include or leave out. A good case study should always clarify which information is the factual description and which is an inference or the researcher’s opinion.

1. Introduction

  • Provide background on the case context and why it is of interest, presenting background information like demographics, relevant history, and presenting problem.
  • Compare briefly to similar published cases if applicable. Clearly state the focus/importance of the case.

2. Case Presentation

  • Describe the presenting problem in detail, including symptoms, duration, and impact on daily life.
  • Include client demographics like age and gender, information about social relationships, and mental health history.
  • Describe all physical, emotional, and/or sensory symptoms reported by the client.
  • Use patient quotes to describe the initial complaint verbatim. Follow with full-sentence summaries of relevant history details gathered, including key components that led to a working diagnosis.
  • Summarize clinical exam results, namely orthopedic/neurological tests, imaging, lab tests, etc. Note actual results rather than subjective conclusions. Provide images if clearly reproducible/anonymized.
  • Clearly state the working diagnosis or clinical impression before transitioning to management.

3. Management and Outcome

  • Indicate the total duration of care and number of treatments given over what timeframe. Use specific names/descriptions for any therapies/interventions applied.
  • Present the results of the intervention, including any quantitative or qualitative data collected.
  • For outcomes, utilize visual analog scales for pain, medication usage logs, etc., if possible. Include patient self-reports of improvement/worsening of symptoms. Note the reason for discharge/end of care.

4. Discussion

  • Analyze the case, exploring contributing factors, limitations of the study, and connections to existing research.
  • Analyze the effectiveness of the intervention, considering factors like participant adherence, limitations of the study, and potential alternative explanations for the results.
  • Identify any questions raised in the case analysis and relate insights to established theories and current research if applicable. Avoid definitive claims about physiological explanations.
  • Offer clinical implications, and suggest future research directions.

5. Additional Items

  • Thank specific assistants for writing support only. No patient acknowledgments.
  • References should directly support any key claims or quotes included.
  • Use tables/figures/images only if substantially informative. Include permissions and legends/explanatory notes.

Strengths

  • Provides detailed (rich qualitative) information.
  • Provides insight for further research.
  • Permits investigation of otherwise impractical (or unethical) situations.

Case studies allow a researcher to investigate a topic in far more detail than might be possible if they were trying to deal with a large number of research participants (nomothetic approach) with the aim of ‘averaging’.

Because of their in-depth, multi-sided approach, case studies often shed light on aspects of human thinking and behavior that would be unethical or impractical to study in other ways.

Research that only looks into the measurable aspects of human behavior is not likely to give us insights into the subjective dimension of experience, which is important to psychoanalytic and humanistic psychologists.

Case studies are often used in exploratory research. They can help us generate new ideas (that might be tested by other methods). They are an important way of illustrating theories and can help show how different aspects of a person’s life are related to each other.

The method is, therefore, important for psychologists who adopt a holistic point of view (i.e., humanistic psychologists).

Limitations

  • Lacking scientific rigor and providing little basis for generalization of results to the wider population.
  • Researchers’ own subjective feelings may influence the case study (researcher bias).
  • Difficult to replicate.
  • Time-consuming and expensive.
  • The volume of data, together with time restrictions, can limit the depth of analysis that is possible within the available resources.

Because a case study deals with only one person/event/group, we can never be sure if the case study investigated is representative of the wider body of “similar” instances. This means the conclusions drawn from a particular case may not be transferable to other settings.

Because case studies are based on the analysis of qualitative (i.e., descriptive) data, a lot depends on the psychologist’s interpretation of the information she has acquired.

This means that there is a lot of scope for observer bias, and it could be that the subjective opinions of the psychologist intrude in the assessment of what the data means.

For example, Freud has been criticized for producing case studies in which the information was sometimes distorted to fit particular behavioral theories (e.g., Little Hans).

This is also true of Money’s interpretation of the Bruce/Brenda case study (Diamond & Sigmundson, 1997), in which he ignored evidence that went against his theory.

References

Breuer, J., & Freud, S. (1895). Studies on hysteria. Standard Edition 2: London.

Curtiss, S. (1981). Genie: The case of a modern wild child.

Diamond, M., & Sigmundson, K. (1997). Sex Reassignment at Birth: Long-term Review and Clinical Implications. Archives of Pediatrics & Adolescent Medicine, 151(3), 298-304.

Freud, S. (1909a). Analysis of a phobia of a five year old boy. In The Pelican Freud Library (1977), Vol 8, Case Histories 1, pages 169-306.

Freud, S. (1909b). Bemerkungen über einen Fall von Zwangsneurose (Der “Rattenmann”). Jb. psychoanal. psychopathol. Forsch., I, p. 357-421; GW, VII, p. 379-463; Notes upon a case of obsessional neurosis, SE, 10: 151-318.

Harlow, J. M. (1848). Passage of an iron rod through the head. Boston Medical and Surgical Journal, 39, 389–393.

Harlow, J. M. (1868). Recovery from the Passage of an Iron Bar through the Head. Publications of the Massachusetts Medical Society, 2(3), 327-347.

Money, J., & Ehrhardt, A. A. (1972). Man & Woman, Boy & Girl: The Differentiation and Dimorphism of Gender Identity from Conception to Maturity. Baltimore, Maryland: Johns Hopkins University Press.

Money, J., & Tucker, P. (1975). Sexual signatures: On being a man or a woman.

Further Information

  • Case Study Approach
  • Case Study Method
  • Enhancing the Quality of Case Studies in Health Services Research
  • “We do things together” A case study of “couplehood” in dementia
  • Using mixed methods for evaluating an integrative approach to cancer care: a case study

What is a case study?

Volume 21, Issue 1

  • Roberta Heale (1)
  • Alison Twycross (2)
  • (1) School of Nursing, Laurentian University, Sudbury, Ontario, Canada
  • (2) School of Health and Social Care, London South Bank University, London, UK
  • Correspondence to Dr Roberta Heale, School of Nursing, Laurentian University, Sudbury, ON P3E2C6, Canada; rheale{at}laurentian.ca

https://doi.org/10.1136/eb-2017-102845

What is it?

Case study is a research methodology, typically seen in social and life sciences. There is no one definition of case study research. 1 However, very simply… ‘a case study can be defined as an intensive study about a person, a group of people or a unit, which is aimed to generalize over several units’. 1 A case study has also been described as an intensive, systematic investigation of a single individual, group, community or some other unit in which the researcher examines in-depth data relating to several variables. 2

Often there are several similar cases to consider, such as educational or social service programmes that are delivered from a number of locations. Although similar, they are complex and have unique features. In these circumstances, the evaluation of several similar cases will provide a better answer to a research question than if only one case is examined, hence the multiple-case study. Stake asserts that the cases are grouped and viewed as one entity, called the quintain. 6 ‘We study what is similar and different about the cases to understand the quintain better’. 6

The steps when using case study methodology are the same as for other types of research. 6 The first step is defining the single case or identifying a group of similar cases that can then be incorporated into a multiple-case study. A search to determine what is known about the case(s) is typically conducted. This may include a review of the literature, grey literature, media, reports and more, which serves to establish a basic understanding of the cases and informs the development of research questions. Data in case studies are often, but not exclusively, qualitative in nature. In multiple-case studies, analysis within cases and across cases is conducted. Themes arise from the analyses and assertions about the cases as a whole, or the quintain, emerge. 6

Benefits and limitations of case studies

If a researcher wants to study a specific phenomenon arising from a particular entity, then a single-case study is warranted and will allow for an in-depth understanding of the single phenomenon and, as discussed above, would involve collecting several different types of data. This is illustrated in example 1 below.

Using a multiple-case research study allows for a more in-depth understanding of the cases as a unit, through comparison of similarities and differences of the individual cases embedded within the quintain. Evidence arising from multiple-case studies is often stronger and more reliable than from single-case research. Multiple-case studies allow for more comprehensive exploration of research questions and theory development. 6

Despite the advantages of case studies, there are limitations. The sheer volume of data is difficult to organise and data analysis and integration strategies need to be carefully thought through. There is also sometimes a temptation to veer away from the research focus. 2 Reporting of findings from multiple-case research studies is also challenging at times, 1 particularly in relation to the word limits for some journal papers.

Examples of case studies

Example 1: nurses’ paediatric pain management practices.

One of the authors of this paper (AT) has used a case study approach to explore nurses’ paediatric pain management practices. This involved collecting several datasets:

Observational data to gain a picture about actual pain management practices.

Questionnaire data about nurses’ knowledge about paediatric pain management practices and how well they felt they managed pain in children.

Questionnaire data about how critical nurses perceived pain management tasks to be.

These datasets were analysed separately and then compared 7–9 and demonstrated that nurses’ level of theoretical knowledge did not impact on the quality of their pain management practices. 7 Nor did individual nurses’ perceptions of how critical a task was affect the likelihood of them carrying out this task in practice. 8 There was also a difference between self-reported and observed practices 9 ; actual (observed) practices did not conform to best practice guidelines, whereas self-reported practices tended to.

Example 2: quality of care for complex patients at Nurse Practitioner-Led Clinics (NPLCs)

The other author of this paper (RH) has conducted a multiple-case study to determine the quality of care for patients with complex clinical presentations in NPLCs in Ontario, Canada. 10 Five NPLCs served as individual cases that, together, represented the quintain. Three types of data were collected, including:

Review of documentation related to the NPLC model (media, annual reports, research articles, grey literature and regulatory legislation).

Interviews with nurse practitioners (NPs) practising at the five NPLCs to determine their perceptions of the impact of the NPLC model on the quality of care provided to patients with multimorbidity.

Chart audits conducted at the five NPLCs to determine the extent to which evidence-based guidelines were followed for patients with diabetes and at least one other chronic condition.

The three sources of data collected from the five NPLCs were analysed, and themes arose related to the quality of care for complex patients at NPLCs. The multiple-case study confirmed that nurse practitioners are the primary care providers at the NPLCs, and this positively impacts the quality of care for patients with multimorbidity. Healthcare policy, such as the lack of an increase in salary for NPs for 10 years, has resulted in issues in recruitment and retention of NPs at NPLCs. This, along with insufficient resources in the communities where NPLCs are located and high patient vulnerability at NPLCs, has a negative impact on the quality of care. 10

These examples illustrate how collecting data about a single case or multiple cases helps us to better understand the phenomenon in question. Case study methodology serves to provide a framework for evaluation and analysis of complex issues. It shines a light on the holistic nature of nursing practice and offers a perspective that informs improved patient care.

  • Gustafsson J
  • Calanzaro M
  • Sandelowski M

Competing interests None declared.

Provenance and peer review Commissioned; internally peer reviewed.

What Is a Case Study?

Weighing the pros and cons of this method of research

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Cara Lustik is a fact-checker and copywriter.



A case study is an in-depth study of one person, group, or event. In a case study, nearly every aspect of the subject's life and history is analyzed to seek patterns and causes of behavior. Case studies can be used in many different fields, including psychology, medicine, education, anthropology, political science, and social work.

The point of a case study is to learn as much as possible about an individual or group so that the information can be generalized to many others. Unfortunately, case studies tend to be highly subjective, and it is sometimes difficult to generalize results to a larger population.

While case studies focus on a single individual or group, they follow a format similar to other types of psychology writing. If you are writing a case study, be sure to follow the rules of APA format.

At a Glance

A case study, or an in-depth study of a person, group, or event, can be a useful research tool when used wisely. In many cases, case studies are best used in situations where it would be difficult or impossible for you to conduct an experiment. They are helpful for looking at unique situations and allow researchers to gather a lot of information about a specific individual or group of people. However, it's important to be cautious about the conclusions we draw from them, as case studies are highly subjective and prone to bias.

What Are the Benefits and Limitations of Case Studies?

A case study can have its strengths and weaknesses. Researchers must consider these pros and cons before deciding if this type of study is appropriate for their needs.

One of the greatest advantages of a case study is that it allows researchers to investigate things that are often difficult or impossible to replicate in a lab. Some other benefits of a case study:

  • Allows researchers to capture information on the 'how,' 'what,' and 'why,' of something that's implemented
  • Gives researchers the chance to collect information on why one strategy might be chosen over another
  • Permits researchers to develop hypotheses that can be explored in experimental research

On the other hand, a case study can have some drawbacks:

  • It cannot necessarily be generalized to the larger population
  • It cannot demonstrate cause and effect
  • It may not be scientifically rigorous
  • It can lead to bias

Researchers may choose to perform a case study if they want to explore a unique or recently discovered phenomenon. Through their insights, researchers develop additional ideas and study questions that might be explored in future studies.

It's important to remember that the insights from case studies cannot be used to determine cause-and-effect relationships between variables. However, case studies may be used to develop hypotheses that can then be addressed in experimental research.

Case Study Examples

There have been a number of notable case studies in the history of psychology. Much of Freud's work and theories were developed through individual case studies. Some great examples of case studies in psychology include:

  • Anna O : Anna O. was a pseudonym of a woman named Bertha Pappenheim, a patient of a physician named Josef Breuer. While she was never a patient of Freud's, Freud and Breuer discussed her case extensively. The woman was experiencing symptoms of a condition that was then known as hysteria and found that talking about her problems helped relieve her symptoms. Her case played an important part in the development of talk therapy as an approach to mental health treatment.
  • Phineas Gage : Phineas Gage was a railroad employee who experienced a terrible accident in which an explosion sent a metal rod through his skull, damaging important portions of his brain. Gage recovered from his accident but was left with serious changes in both personality and behavior.
  • Genie : Genie was a young girl subjected to horrific abuse and isolation. The case study of Genie allowed researchers to study whether language learning was possible, even after missing critical periods for language development. Her case also served as an example of how scientific research may interfere with treatment and lead to further abuse of vulnerable individuals.

Such cases demonstrate how case research can be used to study things that researchers could not replicate in experimental settings. In Genie's case, her horrific abuse denied her the opportunity to learn a language at critical points in her development.

This is clearly not something researchers could ethically replicate, but conducting a case study on Genie allowed researchers to study phenomena that are otherwise impossible to reproduce.

What Types of Case Studies Are Out There?

There are a few different types of case studies that psychologists and other researchers might use:

  • Collective case studies : These involve studying a group of individuals. Researchers might study a group of people in a certain setting or look at an entire community. For example, psychologists might explore how access to resources in a community has affected the collective mental well-being of those who live there.
  • Descriptive case studies : These involve starting with a descriptive theory. The subjects are then observed, and the information gathered is compared to the pre-existing theory.
  • Explanatory case studies : These are often used to do causal investigations. In other words, researchers are interested in looking at factors that may have caused certain things to occur.
  • Exploratory case studies : These are sometimes used as a prelude to further, more in-depth research. This allows researchers to gather more information before developing their research questions and hypotheses.
  • Instrumental case studies : These occur when the individual or group allows researchers to understand more than what is initially obvious to observers.
  • Intrinsic case studies : This type of case study is when the researcher has a personal interest in the case. Jean Piaget's observations of his own children are good examples of how an intrinsic case study can contribute to the development of a psychological theory.

The three main case study types often used are intrinsic, instrumental, and collective. Intrinsic case studies are useful for learning about unique cases. Instrumental case studies help look at an individual to learn more about a broader issue. A collective case study can be useful for looking at several cases simultaneously.

The type of case study that psychology researchers use depends on the unique characteristics of the situation and the case itself.

Where Do You Find Data for a Case Study?

There are a number of different sources and methods that researchers can use to gather information about an individual or group. Six major sources that have been identified by researchers are:

  • Archival records : Census records, survey records, and name lists are examples of archival records.
  • Direct observation : This strategy involves observing the subject, often in a natural setting. While an individual observer is sometimes used, it is more common to utilize a group of observers.
  • Documents : Letters, newspaper articles, administrative records, etc., are the types of documents often used as sources.
  • Interviews : Interviews are one of the most important methods for gathering information in case studies. An interview can involve structured survey questions or more open-ended questions.
  • Participant observation : When the researcher serves as a participant in events and observes the actions and outcomes, it is called participant observation.
  • Physical artifacts : Tools, objects, instruments, and other artifacts are often observed during a direct observation of the subject.

How Do I Write a Psychology Case Study?

If you have been directed to write a case study for a psychology course, be sure to check with your instructor for any specific guidelines you need to follow. If you are writing your case study for a professional publication, check with the publisher for their specific guidelines for submitting a case study.

Here is a general outline of what should be included in a case study.

Section 1: A Case History

This section will have the following structure and content:

Background information : The first section of your paper will present your client's background. Include factors such as age, gender, work, health status, family mental health history, family and social relationships, drug and alcohol history, life difficulties, goals, and coping skills and weaknesses.

Description of the presenting problem : In the next section of your case study, you will describe the problem or symptoms that the client presented with.

Describe any physical, emotional, or sensory symptoms reported by the client. Thoughts, feelings, and perceptions related to the symptoms should also be noted. Any screening or diagnostic assessments that are used should also be described in detail and all scores reported.

Your diagnosis : Provide your diagnosis and give the appropriate Diagnostic and Statistical Manual code. Explain how you reached your diagnosis, how the client's symptoms fit the diagnostic criteria for the disorder(s), or any possible difficulties in reaching a diagnosis.

Section 2: Treatment Plan

This portion of the paper will address the chosen treatment for the condition. This might also include the theoretical basis for the chosen treatment or any other evidence that might exist to support why this approach was chosen.

  • Cognitive behavioral approach : Explain how a cognitive behavioral therapist would approach treatment. Offer background information on cognitive behavioral therapy and describe the treatment sessions, client response, and outcome of this type of treatment. Make note of any difficulties or successes encountered by your client during treatment.
  • Humanistic approach : Describe a humanistic approach that could be used to treat your client, such as client-centered therapy. Provide information on the type of treatment you chose, the client's reaction to the treatment, and the end result of this approach. Explain why the treatment was successful or unsuccessful.
  • Psychoanalytic approach : Describe how a psychoanalytic therapist would view the client's problem. Provide some background on the psychoanalytic approach and cite relevant references. Explain how psychoanalytic therapy would be used to treat the client, how the client would respond to therapy, and the effectiveness of this treatment approach.
  • Pharmacological approach : If treatment primarily involves the use of medications, explain which medications were used and why. Provide background on the effectiveness of these medications and how monotherapy may compare with an approach that combines medications with therapy or other treatments.

This section of a case study should also include information about the treatment goals, process, and outcomes.

When you are writing a case study, you should also include a section where you discuss the case study itself, including the strengths and limitations of the study. You should note how the findings of your case study might support previous research.

In your discussion section, you should also describe some of the implications of your case study. What ideas or findings might require further exploration? How might researchers go about exploring some of these questions in additional studies?

Need More Tips?

Here are a few additional pointers to keep in mind when formatting your case study:

  • Never refer to the subject of your case study as "the client." Instead, use their name or a pseudonym.
  • Read examples of case studies to gain an idea about the style and format.
  • Remember to use APA format when citing references.




Case Study | Definition, Examples & Methods

Published on 5 May 2022 by Shona McCombes. Revised on 30 January 2023.

A case study is a detailed study of a specific subject, such as a person, group, place, event, organisation, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research.

A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used. Case studies are good for describing, comparing, evaluating, and understanding different aspects of a research problem.


When to do a case study

A case study is an appropriate research design when you want to gain concrete, contextual, in-depth knowledge about a specific real-world subject. It allows you to explore the key characteristics, meanings, and implications of the case.

Case studies are often a good choice in a thesis or dissertation. They keep your project focused and manageable when you don’t have the time or resources to do large-scale research.

You might use just one complex case study where you explore a single subject in depth, or conduct multiple case studies to compare and illuminate different aspects of your research problem.


Step 1: Select a case

Once you have developed your problem statement and research questions, you should be ready to choose the specific case that you want to focus on. A good case study should have the potential to:

  • Provide new or unexpected insights into the subject
  • Challenge or complicate existing assumptions and theories
  • Propose practical courses of action to resolve a problem
  • Open up new directions for future research

Unlike quantitative or experimental research, a strong case study does not require a random or representative sample. In fact, case studies often deliberately focus on unusual, neglected, or outlying cases which may shed new light on the research problem.

If you find yourself aiming to simultaneously investigate and solve an issue, consider conducting action research. As its name suggests, action research conducts research and takes action at the same time, and is highly iterative and flexible.

However, you can also choose a more common or representative case to exemplify a particular category, experience, or phenomenon.

Step 2: Build a theoretical framework

While case studies focus more on concrete details than general theories, they should usually have some connection with theory in the field. This way the case study is not just an isolated description, but is integrated into existing knowledge about the topic. It might aim to:

  • Exemplify a theory by showing how it explains the case under investigation
  • Expand on a theory by uncovering new concepts and ideas that need to be incorporated
  • Challenge a theory by exploring an outlier case that doesn’t fit with established assumptions

To ensure that your analysis of the case has a solid academic grounding, you should conduct a literature review of sources related to the topic and develop a theoretical framework. This means identifying key concepts and theories to guide your analysis and interpretation.

Step 3: Collect your data

There are many different research methods you can use to collect data on your subject. Case studies tend to focus on qualitative data using methods such as interviews, observations, and analysis of primary and secondary sources (e.g., newspaper articles, photographs, official records). Sometimes a case study will also collect quantitative data.

The aim is to gain as thorough an understanding as possible of the case and its context.

Step 4: Describe and analyse the case

In writing up the case study, you need to bring together all the relevant aspects to give as complete a picture as possible of the subject.

How you report your findings depends on the type of research you are doing. Some case studies are structured like a standard scientific paper or thesis, with separate sections or chapters for the methods, results, and discussion.

Others are written in a more narrative style, aiming to explore the case from various angles and analyse its meanings and implications (for example, by using textual analysis or discourse analysis).

In all cases, though, make sure to give contextual details about the case, connect it back to the literature and theory, and discuss how it fits into wider patterns or debates.

Cite this Scribbr article


McCombes, S. (2023, January 30). Case Study | Definition, Examples & Methods. Scribbr. Retrieved 22 April 2024, from https://www.scribbr.co.uk/research-methods/case-studies/


Aquila Corporation Wheelchair Cushion Systems

Case Reports Vs Clinical Studies


This post discusses how the validity of a case report comes into question when it is authored by an employee of the reporting company, Roho. It answers these questions regarding clinical studies and clinical evidence:

  • What is the difference between a clinical study and a case report?
  • Who can observe and document the results of a clinical study?
  • What circumstances would call the validity of a case report into question?

(The information below was taken from ClinicalTrials.gov, a service of the National Institutes of Health.)

Definition of case report and clinical study

In medicine, a case report is a detailed report of the symptoms, signs, diagnosis, treatment, and follow-up of an individual patient. Case reports may contain a demographic profile of the patient, but usually describe an unusual or novel occurrence.

The case report is written on one individual patient.

Clinical Study

A research study using human subjects to evaluate biomedical or health-related outcomes. Two types of clinical studies are Interventional Studies (or clinical trials) and Observational Studies. A clinical study involves multiple patients.

Observational Clinical Studies have a qualified investigator.

In an observational study, investigators assess health outcomes in groups of participants according to a research plan or protocol. Participants may receive interventions (which can include medical products such as drugs or devices) or procedures as part of their routine medical care, but participants are not assigned to specific interventions by the investigator (as in a clinical trial).

The Key Responsibilities of a Clinical Study Investigator:

  • Be qualified to practice medicine or psychiatry and meet the qualifications specified by applicable national regulatory requirement(s).
  • Be qualified by education, training, and experience to assume responsibility for the proper conduct of the study.
  • Be familiar with and compliant with the Good Clinical Practice (GCP) ICH E6 Guideline and applicable ethical and regulatory requirements prior to commencement of work on the study.
  • Provide evidence of his/her qualification using the Abbreviated TransCelerate Curriculum Vitae (CV) form.

The internal validity of a medical device case report is questioned if bias is present. One must consider bias in a case report authored by an employee of the company that makes the device described in the report.

These are the facts about the clinical studies published on the Roho website.

  • There are 15 of what Roho calls clinical studies on its website. Based on the above definitions, these are not clinical studies but rather case reports.
  • Of these 15 case reports, only one pertains to a seat cushion improving a pressure ulcer.

This single case report was written by Cynthia Fleck, an employee of Crown Therapeutics, which is a division of Roho.

After selling 1 million cushions over 45 years in business, Roho has exactly one such case report, and it was written by a Roho employee, which calls the validity of the report into question.


  • Open access
  • Published: 27 June 2011

The case study approach

  • Sarah Crowe 1 ,
  • Kathrin Cresswell 2 ,
  • Ann Robertson 2 ,
  • Guro Huby 3 ,
  • Anthony Avery 1 &
  • Aziz Sheikh 2  

BMC Medical Research Methodology volume 11, Article number: 100 (2011)


The case study approach allows in-depth, multi-faceted explorations of complex issues in their real-life settings. The value of the case study approach is well recognised in the fields of business, law and policy, but somewhat less so in health services research. Based on our experiences of conducting several health-related case studies, we reflect on the different types of case study design, the specific research questions this approach can help answer, the data sources that tend to be used, and the particular advantages and disadvantages of employing this methodological approach. The paper concludes with key pointers to aid those designing and appraising proposals for conducting case study research, and a checklist to help readers assess the quality of case study reports.


Introduction

The case study approach is particularly useful to employ when there is a need to obtain an in-depth appreciation of an issue, event or phenomenon of interest, in its natural real-life context. Our aim in writing this piece is to provide insights into when to consider employing this approach and an overview of key methodological considerations in relation to the design, planning, analysis, interpretation and reporting of case studies.

The illustrative 'grand round', 'case report' and 'case series' have a long tradition in clinical practice and research. Presenting detailed critiques, typically of one or more patients, aims to provide insights into aspects of the clinical case and, in doing so, illustrate broader lessons that may be learnt. In research, the conceptually-related case study approach can be used, for example, to describe in detail a patient's episode of care, explore professional attitudes to and experiences of a new policy initiative or service development or more generally to 'investigate contemporary phenomena within its real-life context' [ 1 ]. Based on our experiences of conducting a range of case studies, we reflect on when to consider using this approach, discuss the key steps involved and illustrate, with examples, some of the practical challenges of attaining an in-depth understanding of a 'case' as an integrated whole. In keeping with previously published work, we acknowledge the importance of theory to underpin the design, selection, conduct and interpretation of case studies[ 2 ]. In so doing, we make passing reference to the different epistemological approaches used in case study research by key theoreticians and methodologists in this field of enquiry.

This paper is structured around the following main questions: What is a case study? What are case studies used for? How are case studies conducted? What are the potential pitfalls and how can these be avoided? We draw in particular on four of our own recently published examples of case studies (see Tables 1 , 2 , 3 and 4 ) and those of others to illustrate our discussion[ 3 – 7 ].

What is a case study?

A case study is a research approach that is used to generate an in-depth, multi-faceted understanding of a complex issue in its real-life context. It is an established research design that is used extensively in a wide variety of disciplines, particularly in the social sciences. A case study can be defined in a variety of ways (Table 5 ), the central tenet being the need to explore an event or phenomenon in depth and in its natural context. It is for this reason sometimes referred to as a "naturalistic" design; this is in contrast to an "experimental" design (such as a randomised controlled trial) in which the investigator seeks to exert control over and manipulate the variable(s) of interest.

Stake's work has been particularly influential in defining the case study approach to scientific enquiry. He has helpfully characterised three main types of case study: intrinsic , instrumental and collective [ 8 ]. An intrinsic case study is typically undertaken to learn about a unique phenomenon. The researcher should define the uniqueness of the phenomenon, which distinguishes it from all others. In contrast, the instrumental case study uses a particular case (some of which may be better than others) to gain a broader appreciation of an issue or phenomenon. The collective case study involves studying multiple cases simultaneously or sequentially in an attempt to generate a still broader appreciation of a particular issue.

These are, however, not necessarily mutually exclusive categories. In the first of our examples (Table 1 ), we undertook an intrinsic case study to investigate the issue of recruitment of minority ethnic people into the specific context of asthma research studies, but it developed into an instrumental case study through seeking to understand the issue of recruitment of these marginalised populations more generally, generating a number of findings that are potentially transferable to other disease contexts[ 3 ]. In contrast, the other three examples (see Tables 2 , 3 and 4 ) employed collective case study designs to study the introduction of workforce reconfiguration in primary care, the implementation of electronic health records into hospitals, and the ways in which healthcare students learn about patient safety considerations[ 4 – 6 ]. Although our study focusing on the introduction of General Practitioners with Specialist Interests (Table 2 ) was explicitly collective in design (four contrasting primary care organisations were studied), it was also instrumental in that this particular professional group was studied as an exemplar of the more general phenomenon of workforce redesign[ 4 ].

What are case studies used for?

According to Yin, case studies can be used to explain, describe or explore events or phenomena in the everyday contexts in which they occur[ 1 ]. These can, for example, help to understand and explain causal links and pathways resulting from a new policy initiative or service development (see Tables 2 and 3 , for example)[ 1 ]. In contrast to experimental designs, which seek to test a specific hypothesis through deliberately manipulating the environment (as, for example, in a randomised controlled trial giving a new drug to randomly selected individuals and then comparing outcomes with controls),[ 9 ] the case study approach lends itself well to capturing information on more explanatory 'how', 'what' and 'why' questions, such as 'how is the intervention being implemented and received on the ground?'. The case study approach can offer additional insights into what gaps exist in its delivery or why one implementation strategy might be chosen over another. This in turn can help develop or refine theory, as shown in our study of the teaching of patient safety in undergraduate curricula (Table 4 )[ 6 , 10 ]. Key questions to consider when selecting the most appropriate study design are whether it is desirable or indeed possible to undertake a formal experimental investigation in which individuals and/or organisations are allocated to an intervention or control arm, or whether the wish is to obtain a more naturalistic understanding of an issue. The former is ideally studied using a controlled experimental design, whereas the latter is more appropriately studied using a case study design.

Case studies may be approached in different ways depending on the epistemological standpoint of the researcher, that is, whether they take a critical (questioning one's own and others' assumptions), interpretivist (trying to understand individual and shared social meanings) or positivist approach (orientating towards the criteria of natural sciences, such as focusing on generalisability considerations) (Table 6 ). Whilst such a schema can be conceptually helpful, it may be appropriate to draw on more than one approach in any case study, particularly in the context of conducting health services research. Doolin has, for example, noted that in the context of undertaking interpretative case studies, researchers can usefully draw on a critical, reflective perspective which seeks to take into account the wider social and political environment that has shaped the case[ 11 ].

How are case studies conducted?

Here, we focus on the main stages of research activity when planning and undertaking a case study; the crucial stages are: defining the case; selecting the case(s); collecting and analysing the data; interpreting data; and reporting the findings.

Defining the case

Carefully formulated research question(s), informed by the existing literature and a prior appreciation of the theoretical issues and setting(s), are all important in appropriately and succinctly defining the case[ 8 , 12 ]. Crucially, each case should have a pre-defined boundary which clarifies the nature and time period covered by the case study (i.e. its scope, beginning and end), the relevant social group, organisation or geographical area of interest to the investigator, the types of evidence to be collected, and the priorities for data collection and analysis (see Table 7 )[ 1 ]. A theory driven approach to defining the case may help generate knowledge that is potentially transferable to a range of clinical contexts and behaviours; using theory is also likely to result in a more informed appreciation of, for example, how and why interventions have succeeded or failed[ 13 ].

For example, in our evaluation of the introduction of electronic health records in English hospitals (Table 3 ), we defined our cases as the NHS Trusts that were receiving the new technology[ 5 ]. Our focus was on how the technology was being implemented. However, if the primary research interest had been on the social and organisational dimensions of implementation, we might have defined our case differently as a grouping of healthcare professionals (e.g. doctors and/or nurses). The precise beginning and end of the case may however prove difficult to define. Pursuing this same example, when does the process of implementation and adoption of an electronic health record system really begin or end? Such judgements will inevitably be influenced by a range of factors, including the research question, theory of interest, the scope and richness of the gathered data and the resources available to the research team.

Selecting the case(s)

The decision on how to select the case(s) to study is a very important one that merits some reflection. In an intrinsic case study, the case is selected on its own merits[ 8 ]. The case is selected not because it is representative of other cases, but because of its uniqueness, which is of genuine interest to the researchers. This was, for example, the case in our study of the recruitment of minority ethnic participants into asthma research (Table 1 ) as our earlier work had demonstrated the marginalisation of minority ethnic people with asthma, despite evidence of disproportionate asthma morbidity[ 14 , 15 ]. In another example of an intrinsic case study, Hellstrom et al.[ 16 ] studied an elderly married couple living with dementia to explore how dementia had impacted on their understanding of home, their everyday life and their relationships.

For an instrumental case study, selecting a "typical" case can work well[ 8 ]. In contrast to the intrinsic case study, the particular case which is chosen is of less importance than selecting a case that allows the researcher to investigate an issue or phenomenon. For example, in order to gain an understanding of doctors' responses to health policy initiatives, Som undertook an instrumental case study interviewing clinicians who had a range of responsibilities for clinical governance in one NHS acute hospital trust[ 17 ]. Sampling a "deviant" or "atypical" case may however prove even more informative, potentially enabling the researcher to identify causal processes, generate hypotheses and develop theory.

In collective or multiple case studies, a number of cases are carefully selected. This offers the advantage of allowing comparisons to be made across several cases and/or replication. Choosing a "typical" case may enable the findings to be generalised to theory (i.e. analytical generalisation) or to test theory by replicating the findings in a second or even a third case (i.e. replication logic)[ 1 ]. Yin suggests two or three literal replications (i.e. predicting similar results) if the theory is straightforward and five or more if the theory is more subtle. However, critics might argue that selecting 'cases' in this way is insufficiently reflexive and ill-suited to the complexities of contemporary healthcare organisations.

The selected case study site(s) should allow the research team access to the group of individuals, the organisation, the processes or whatever else constitutes the chosen unit of analysis for the study. Access is therefore a central consideration; the researcher needs to come to know the case study site(s) well and to work cooperatively with them. Selected cases need to be not only interesting but also hospitable to the inquiry [ 8 ] if they are to be informative and answer the research question(s). Case study sites may also be pre-selected for the researcher, with decisions being influenced by key stakeholders. For example, our selection of case study sites in the evaluation of the implementation and adoption of electronic health record systems (see Table 3 ) was heavily influenced by NHS Connecting for Health, the government agency that was responsible for overseeing the National Programme for Information Technology (NPfIT)[ 5 ]. This prominent stakeholder had already selected the NHS sites (through a competitive bidding process) to be early adopters of the electronic health record systems and had negotiated contracts that detailed the deployment timelines.

It is also important to consider in advance the likely burden and risks associated with participation for those who (or the site(s) which) comprise the case study. Of particular importance is the obligation for the researcher to think through the ethical implications of the study (e.g. the risk of inadvertently breaching anonymity or confidentiality) and to ensure that potential participants/participating sites are provided with sufficient information to make an informed choice about joining the study. The outcome of providing this information might be that the emotive burden associated with participation, or the organisational disruption associated with supporting the fieldwork, is considered so high that the individuals or sites decide against participation.

In our example of evaluating implementations of electronic health record systems, given the restricted number of early adopter sites available to us, we sought purposively to select a diverse range of implementation cases among those that were available[ 5 ]. We chose a mixture of teaching, non-teaching and Foundation Trust hospitals, and examples of each of the three electronic health record systems procured centrally by the NPfIT. At one recruited site, it quickly became apparent that access was problematic because of competing demands on that organisation. Recognising the importance of full access and co-operative working for generating rich data, the research team decided not to pursue work at that site and instead to focus on other recruited sites.

Collecting the data

In order to develop a thorough understanding of the case, the case study approach usually involves the collection of multiple sources of evidence, using a range of quantitative (e.g. questionnaires, audits and analysis of routinely collected healthcare data) and more commonly qualitative techniques (e.g. interviews, focus groups and observations). The use of multiple sources of data (data triangulation) has been advocated as a way of increasing the internal validity of a study (i.e. the extent to which the method is appropriate to answer the research question)[ 8 , 18 – 21 ]. An underlying assumption is that data collected in different ways should lead to similar conclusions, and approaching the same issue from different angles can help develop a holistic picture of the phenomenon (Table 2 )[ 4 ].

Brazier and colleagues used a mixed-methods case study approach to investigate the impact of a cancer care programme[ 22 ]. Here, quantitative measures were collected with questionnaires before, and five months after, the start of the intervention; these did not yield any statistically significant results. Qualitative interviews with patients, however, helped provide an insight into potentially beneficial process-related aspects of the programme, such as greater perceived patient involvement in care. The authors reported how this case study approach identified a number of contextual factors likely to influence the effectiveness of the intervention, which were unlikely to have been obtained from quantitative methods alone.

In collective or multiple case studies, data collection needs to be flexible enough to allow a detailed description of each individual case to be developed (e.g. the nature of different cancer care programmes), before considering the emerging similarities and differences in cross-case comparisons (e.g. to explore why one programme is more effective than another). It is important that data sources from different cases are, where possible, broadly comparable for this purpose even though they may vary in nature and depth.

Analysing, interpreting and reporting case studies

Making sense and offering a coherent interpretation of the typically disparate sources of data (whether qualitative alone or together with quantitative) is far from straightforward. Repeated reviewing and sorting of the voluminous and detail-rich data are integral to the process of analysis. In collective case studies, it is helpful to analyse data relating to the individual component cases first, before making comparisons across cases. Attention needs to be paid to variations within each case and, where relevant, the relationship between different causes, effects and outcomes[ 23 ]. Data will need to be organised and coded to allow the key issues, both derived from the literature and emerging from the dataset, to be easily retrieved at a later stage. An initial coding frame can help capture these issues and can be applied systematically to the whole dataset with the aid of a qualitative data analysis software package.

The Framework approach is a practical approach to managing and analysing large datasets, comprising five stages (familiarisation; identifying a thematic framework; indexing; charting; mapping and interpretation). It is particularly useful if time is limited, as was the case in our study of recruitment of South Asians into asthma research (Table 1 )[ 3 , 24 ]. Theoretical frameworks may also play an important role in integrating different sources of data and examining emerging themes. For example, we drew on a socio-technical framework to help explain the connections between different elements - technology; people; and the organisational settings within which they worked - in our study of the introduction of electronic health record systems (Table 3 )[ 5 ]. Our study of patient safety in undergraduate curricula drew on an evaluation-based approach to design and analysis, which emphasised the importance of the academic, organisational and practice contexts through which students learn (Table 4 )[ 6 ].

Case study findings can have implications both for theory development and theory testing. They may establish, strengthen or weaken historical explanations of a case and, in certain circumstances, allow theoretical (as opposed to statistical) generalisation beyond the particular cases studied[ 12 ]. These theoretical lenses should not, however, constitute a strait-jacket and the cases should not be "forced to fit" the particular theoretical framework that is being employed.

When reporting findings, it is important to provide the reader with enough contextual information to understand the processes that were followed and how the conclusions were reached. In a collective case study, researchers may choose to present the findings from individual cases separately before amalgamating across cases. Care must be taken to ensure the anonymity of both case sites and individual participants (if agreed in advance) by allocating appropriate codes or withholding descriptors. In the example given in Table 3 , we decided against providing detailed information on the NHS sites and individual participants in order to avoid the risk of inadvertent disclosure of identities[ 5 , 25 ].

What are the potential pitfalls and how can these be avoided?

The case study approach is, as with all research, not without its limitations. When investigating the formal and informal ways undergraduate students learn about patient safety (Table 4 ), for example, we rapidly accumulated a large quantity of data. The volume of data, together with the time restrictions in place, impacted on the depth of analysis that was possible within the available resources. This highlights the more general importance of avoiding the temptation to collect as much data as possible; adequate time also needs to be set aside for data analysis and interpretation of what are often highly complex datasets.

Case study research has sometimes been criticised for lacking scientific rigour and providing little basis for generalisation (i.e. producing findings that may be transferable to other settings)[ 1 ]. There are several ways to address these concerns, including: the use of theoretical sampling (i.e. drawing on a particular conceptual framework); respondent validation (i.e. participants checking emerging findings and the researcher's interpretation, and providing an opinion as to whether they feel these are accurate); and transparency throughout the research process (see Table 8 )[ 8 , 18 – 21 , 23 , 26 ]. Transparency can be achieved by describing in detail the steps involved in case selection, data collection, the reasons for the particular methods chosen, and the researcher's background and level of involvement (i.e. being explicit about how the researcher has influenced data collection and interpretation). Seeking potential, alternative explanations, and being explicit about how interpretations and conclusions were reached, help readers to judge the trustworthiness of the case study report. Stake provides a critique checklist for a case study report (Table 9 )[ 8 ].

Conclusions

The case study approach allows, amongst other things, critical events, interventions, policy developments and programme-based service reforms to be studied in detail in a real-life context. It should therefore be considered when an experimental design is either inappropriate to answer the research questions posed or impossible to undertake. Considering the frequency with which implementations of innovations are now taking place in healthcare settings and how well the case study approach lends itself to in-depth, complex health service research, we believe this approach should be more widely considered by researchers. Though inherently challenging, the research case study can, if carefully conceptualised and thoughtfully undertaken and reported, yield powerful insights into many important aspects of health and healthcare delivery.

Yin RK: Case study research, design and method. 2009, London: Sage Publications Ltd., 4th edition

Keen J, Packwood T: Qualitative research; case study evaluation. BMJ. 1995, 311: 444-446.

Sheikh A, Halani L, Bhopal R, Netuveli G, Partridge M, Car J, et al: Facilitating the Recruitment of Minority Ethnic People into Research: Qualitative Case Study of South Asians and Asthma. PLoS Med. 2009, 6 (10): 1-11.

Pinnock H, Huby G, Powell A, Kielmann T, Price D, Williams S, et al: The process of planning, development and implementation of a General Practitioner with a Special Interest service in Primary Care Organisations in England and Wales: a comparative prospective case study. Report for the National Co-ordinating Centre for NHS Service Delivery and Organisation R&D (NCCSDO). 2008, [ http://www.sdo.nihr.ac.uk/files/project/99-final-report.pdf ]

Robertson A, Cresswell K, Takian A, Petrakaki D, Crowe S, Cornford T, et al: Prospective evaluation of the implementation and adoption of NHS Connecting for Health's national electronic health record in secondary care in England: interim findings. BMJ. 2010, 41: c4564-

Pearson P, Steven A, Howe A, Sheikh A, Ashcroft D, Smith P, the Patient Safety Education Study Group: Learning about patient safety: organisational context and culture in the education of healthcare professionals. J Health Serv Res Policy. 2010, 15: 4-10. 10.1258/jhsrp.2009.009052.

van Harten WH, Casparie TF, Fisscher OA: The evaluation of the introduction of a quality management system: a process-oriented case study in a large rehabilitation hospital. Health Policy. 2002, 60 (1): 17-37. 10.1016/S0168-8510(01)00187-7.

Stake RE: The art of case study research. 1995, London: Sage Publications Ltd.

Sheikh A, Smeeth L, Ashcroft R: Randomised controlled trials in primary care: scope and application. Br J Gen Pract. 2002, 52 (482): 746-51.

King G, Keohane R, Verba S: Designing Social Inquiry. 1996, Princeton: Princeton University Press

Doolin B: Information technology as disciplinary technology: being critical in interpretative research on information systems. Journal of Information Technology. 1998, 13: 301-311. 10.1057/jit.1998.8.

George AL, Bennett A: Case studies and theory development in the social sciences. 2005, Cambridge, MA: MIT Press

Eccles M, the Improved Clinical Effectiveness through Behavioural Research Group (ICEBeRG): Designing theoretically-informed implementation interventions. Implementation Science. 2006, 1: 1-8. 10.1186/1748-5908-1-1.

Netuveli G, Hurwitz B, Levy M, Fletcher M, Barnes G, Durham SR, Sheikh A: Ethnic variations in UK asthma frequency, morbidity, and health-service use: a systematic review and meta-analysis. Lancet. 2005, 365 (9456): 312-7.

Sheikh A, Panesar SS, Lasserson T, Netuveli G: Recruitment of ethnic minorities to asthma studies. Thorax. 2004, 59 (7): 634-

Hellström I, Nolan M, Lundh U: 'We do things together': A case study of 'couplehood' in dementia. Dementia. 2005, 4: 7-22. 10.1177/1471301205049188.

Som CV: Nothing seems to have changed, nothing seems to be changing and perhaps nothing will change in the NHS: doctors' response to clinical governance. International Journal of Public Sector Management. 2005, 18: 463-477. 10.1108/09513550510608903.

Lincoln Y, Guba E: Naturalistic inquiry. 1985, Newbury Park: Sage Publications

Barbour RS: Checklists for improving rigour in qualitative research: a case of the tail wagging the dog?. BMJ. 2001, 322: 1115-1117. 10.1136/bmj.322.7294.1115.

Mays N, Pope C: Qualitative research in health care: Assessing quality in qualitative research. BMJ. 2000, 320: 50-52. 10.1136/bmj.320.7226.50.

Mason J: Qualitative researching. 2002, London: Sage

Brazier A, Cooke K, Moravan V: Using Mixed Methods for Evaluating an Integrative Approach to Cancer Care: A Case Study. Integr Cancer Ther. 2008, 7: 5-17. 10.1177/1534735407313395.

Miles MB, Huberman M: Qualitative data analysis: an expanded sourcebook. 1994, CA: Sage Publications Inc., 2nd edition

Pope C, Ziebland S, Mays N: Analysing qualitative data. Qualitative research in health care. BMJ. 2000, 320: 114-116. 10.1136/bmj.320.7227.114.

Cresswell KM, Worth A, Sheikh A: Actor-Network Theory and its role in understanding the implementation of information technology developments in healthcare. BMC Med Inform Decis Mak. 2010, 10 (1): 67. 10.1186/1472-6947-10-67.

Malterud K: Qualitative research: standards, challenges, and guidelines. Lancet. 2001, 358: 483-488. 10.1016/S0140-6736(01)05627-6.

Yin R: Case study research: design and methods. 1994, Thousand Oaks, CA: Sage Publishing, 2nd edition

Yin R: Enhancing the quality of case studies in health services research. Health Serv Res. 1999, 34: 1209-1224.

Green J, Thorogood N: Qualitative methods for health research. 2009, Los Angeles: Sage, 2nd edition

Howcroft D, Trauth E: Handbook of Critical Information Systems Research, Theory and Application. 2005, Cheltenham, UK: Northampton, MA, USA: Edward Elgar

Blakie N: Approaches to Social Enquiry. 1993, Cambridge: Polity Press

Doolin B: Power and resistance in the implementation of a medical management information system. Info Systems J. 2004, 14: 343-362. 10.1111/j.1365-2575.2004.00176.x.

Bloomfield BP, Best A: Management consultants: systems development, power and the translation of problems. Sociological Review. 1992, 40: 533-560.

Shanks G, Parr A: Positivist, single case study research in information systems: A critical analysis. Proceedings of the European Conference on Information Systems. 2003, Naples

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/11/100/prepub

Acknowledgements

We are grateful to the participants and colleagues who contributed to the individual case studies that we have drawn on. This work received no direct funding, but it has been informed by projects funded by Asthma UK, the NHS Service Delivery Organisation, NHS Connecting for Health Evaluation Programme, and Patient Safety Research Portfolio. We would also like to thank the expert reviewers for their insightful and constructive feedback. Our thanks are also due to Dr. Allison Worth who commented on an earlier draft of this manuscript.

Author information

Authors and affiliations.

Division of Primary Care, The University of Nottingham, Nottingham, UK

Sarah Crowe & Anthony Avery

Centre for Population Health Sciences, The University of Edinburgh, Edinburgh, UK

Kathrin Cresswell, Ann Robertson & Aziz Sheikh

School of Health in Social Science, The University of Edinburgh, Edinburgh, UK

Corresponding author

Correspondence to Sarah Crowe.

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors' contributions

AS conceived this article. SC, KC and AR wrote this paper with GH, AA and AS all commenting on various drafts. SC and AS are guarantors.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article.

Crowe, S., Cresswell, K., Robertson, A. et al. The case study approach. BMC Med Res Methodol 11, 100 (2011). https://doi.org/10.1186/1471-2288-11-100

Received: 29 November 2010

Accepted: 27 June 2011

Published: 27 June 2011

DOI: https://doi.org/10.1186/1471-2288-11-100

  • Case Study Approach
  • Electronic Health Record System
  • Case Study Design
  • Case Study Site
  • Case Study Report

BMC Medical Research Methodology

ISSN: 1471-2288

what is a clinical case study

  • Open access
  • Published: 23 April 2024

Comparing generative and extractive approaches to information extraction from abstracts describing randomized clinical trials

  • Christian Witte,
  • David M. Schmidt &
  • Philipp Cimiano

Journal of Biomedical Semantics volume 15, Article number: 3 (2024)

Systematic reviews of Randomized Controlled Trials (RCTs) are an important part of the evidence-based medicine paradigm. However, the creation of such systematic reviews by clinical experts is costly as well as time-consuming, and results can get quickly outdated after publication. Most RCTs are structured based on the Patient, Intervention, Comparison, Outcomes (PICO) framework and there exist many approaches which aim to extract PICO elements automatically. The automatic extraction of PICO information from RCTs has the potential to significantly speed up the creation process of systematic reviews and this way also benefit the field of evidence-based medicine.

Previous work has addressed the extraction of PICO elements as the task of identifying relevant text spans or sentences, but without populating a structured representation of a trial. In contrast, in this work, we treat PICO elements as structured templates with slots to do justice to the complex nature of the information they represent. We present two different approaches to extract this structured information from the abstracts of RCTs. The first approach is an extractive approach based on our previous work that is extended to capture full document representations as well as by a clustering step to infer the number of instances of each template type. The second approach is a generative approach based on a seq2seq model that encodes the abstract describing the RCT and uses a decoder to infer a structured representation of a trial including its arms, treatments, endpoints and outcomes. Both approaches are evaluated with different base models on an existing, manually annotated dataset comprising 211 clinical trial abstracts for type 2 diabetes and glaucoma. For both diseases, the extractive approach (with flan-t5-base ) reached the best \(F_1\) score, i.e. 0.547 ( \(\pm 0.006\) ) for type 2 diabetes and 0.636 ( \(\pm 0.006\) ) for glaucoma. Generally, the \(F_1\) scores were higher for glaucoma than for type 2 diabetes and the standard deviation was higher for the generative approach.

In our experiments, both approaches show promising performance extracting structured PICO information from RCTs, especially considering that most related work focuses on the far easier task of predicting less structured objects. In our experimental results, the extractive approach performs best in both cases, although the lead is greater for glaucoma than for type 2 diabetes. For future work, it remains to be investigated how the base model size affects the performance of both approaches in comparison. Although the extractive approach currently leaves more room for direct improvements, the generative approach might benefit from larger models.

Introduction

The number of publications describing Randomized Controlled Trials has been increasing at an exponential pace for decades [ 1 ], thus making it more and more challenging to appropriately summarize the existing clinical evidence by way of systematic reviews. Yet, the ability to summarize the current clinical evidence is a core process to support evidence-based medical decision making [ 2 ]. Indeed, the creation of systematic reviews is costly and time consuming as it is done manually by clinical experts, with the result that systematic reviews and guidelines quickly become outdated after publication or are even outdated at the time of publication [ 3 , 4 , 5 , 6 ]. Due to the effort associated with the creation of systematic reviews, there has been significant interest in the question of how to automate their creation [ 7 , 8 , 9 ]. Recently, approaches to automatically summarize clinical evidence by way of argumentative structures have been proposed [ 10 ]. The bottleneck for such approaches is the lack of a database of semantically described clinical trials that comprises structured representations of the key outcomes of each study. As argued by Sánchez-Graillet et al. [ 10 ], information extraction approaches have the potential to support the extraction of key information about the design and results of clinical trials from publications. These structured representations of the results of a trial in turn could support the process of systematic review creation or at least considerably reduce the effort to do so.

Most RCTs follow the PICO ( P atient, I ntervention, C omparison, O utcomes) framework for structuring the presentation of clinical research findings. As a result, early IE approaches in the clinical domain classify full sentences of RCTs [ 11 , 12 ] or smaller text spans [ 13 ] into the elements of the PICO framework. However, treating the PICO elements as flat objects represented as a collection of text spans does not reflect the complex information presented in RCTs for the following reasons: 1) the description of a single PICO element could be spread across several sentences and 2) the relationship between different PICO elements is not modelled (e.g. which outcomes belong to the intervention group and which ones belong to the comparison group).

Witte and Cimiano [ 14 ] have proposed an extractive information extraction approach that captures the design and key results of an RCT by way of 10 different templates that capture the PICO elements in a structured way, modelling dependencies and relations between them. These templates are based on the C-TrO Ontology that has been designed to support use cases related to the aggregation of evidence from multiple clinical trials [ 15 ]. Those templates are instantiated with information from a given abstract describing the trial. For instance, a template Medication with slots DrugName , DoseValue and DoseUnit could be used to describe medications of intervention arms mentioned in an RCT. However, Witte and Cimiano [ 14 ] assume that the number of template instances (e.g. number of outcomes) is provided a priori, which hinders the application of their approach in real world settings. Further, the approach of Witte and Cimiano [ 14 ] chunks the text into smaller segments and then combines the templates instantiated for each segment. This makes it difficult to capture relations that are mentioned across chunks.
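To make the notion of a template with slots concrete, the following minimal Python sketch models the Medication template named above. Only the template name and its three slots come from the text; the class layout, the helper method and the example values are assumptions made for illustration and are not taken from the C-TrO Ontology or from the authors' code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Medication:
    """One instance of the Medication template (illustrative layout only)."""
    drug_name: Optional[str] = None      # DrugName slot
    dose_value: Optional[float] = None   # DoseValue slot
    dose_unit: Optional[str] = None      # DoseUnit slot

    def filled_slots(self) -> dict:
        """Return only the slots for which a filler was extracted."""
        return {k: v for k, v in vars(self).items() if v is not None}


# Hypothetical abstract with two arms, hence two Medication instances.
arm_a = Medication(drug_name="metformin", dose_value=500.0, dose_unit="mg")
arm_b = Medication(drug_name="placebo")

print(arm_a.filled_slots())
print(arm_b.filled_slots())
```

Because one abstract can yield several instances of the same template, an extraction system has to decide both which fillers exist and which instance each filler belongs to.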

In this paper, we build on the approach of Witte and Cimiano [ 14 ] and extend it in two directions. First, we rely on Longformers [ 16 ] and Flan-T5 [ 17 ] in order to encode the complete abstract, inferring template instances and slots jointly for the complete text. Second, overcoming the key assumption that the number of template instances is known a priori, we extend the approach by a clustering step that induces the number of template instances in an unsupervised manner.
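The idea of inducing the number of template instances by clustering can be illustrated with a small sketch. The text does not specify the clustering algorithm at this point, so the greedy, threshold-based procedure below, the cosine similarity measure, the threshold value and the toy vectors are all assumptions chosen for brevity; they are not the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_fillers(vectors, threshold=0.8):
    """Greedy single-pass clustering (assumed for illustration).

    Each vector joins the first cluster whose first member is similar
    enough; otherwise it opens a new cluster.
    """
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Toy 2-d representations of four extracted slot fillers (hypothetical).
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
clusters = cluster_fillers(vectors)

# The number of clusters serves as the inferred number of template instances.
print(len(clusters), clusters)
```

In this sketch, the slot fillers grouped into one cluster would populate one template instance, so the instance count no longer has to be supplied in advance.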

Beyond the extractive approach, we also present a generative approach that is inspired by recent seq2seq architectures such as REBEL [ 18 ] or GenIE [ 19 ]. These approaches rely on an encoder-decoder architecture by which the text is encoded and certain output structures are generated. We apply this idea to directly decode a complex nested template structure representing the design and key results of a study. As main novelties, we first propose a decoding approach that relies on a grammar to guide decoding, ensuring that only valid structures are generated. Second, we present an approach to linearize the structure to be predicted such that it can be encoded as a sequence to be predicted by the generative approach. Our grammar-constrained decoding approach is inspired by Lu et al. [ 20 ], who also prune/mask the vocabulary to consist only of elements which comply with the desired output format. The decoding mechanism presented in this work generalizes the output format specification to arbitrary right-linear context-free grammars.
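The following sketch illustrates the general principle of grammar-constrained decoding with a right-linear grammar: at each step, only the tokens that the grammar permits in the current state may be chosen, regardless of how the model scores the rest of the vocabulary. The toy grammar, the token names and the scoring function are hypothetical stand-ins; the grammar actually used in this work is given in Appendix 1, and a real system would take its scores from the decoder of the seq2seq model.

```python
# Right-linear grammar: every rule has the form A -> token B, or A -> token
# (written here with B = None). This toy grammar is an assumption.
GRAMMAR = {
    "S": [("[Medication]", "SLOT")],
    "SLOT": [("DrugName", "VALUE"), ("[/Medication]", None)],
    "VALUE": [("metformin", "SLOT"), ("placebo", "SLOT")],
}


def allowed_tokens(state):
    """Map each token permitted in `state` to the state that follows it."""
    return {token: nxt for token, nxt in GRAMMAR[state]}


def constrained_greedy_decode(score_fn, start="S", max_steps=20):
    """Greedy decoding in which grammar-violating tokens are masked out."""
    state, output = start, []
    for _ in range(max_steps):
        if state is None:  # a terminating rule was applied
            break
        options = allowed_tokens(state)
        scores = score_fn(output)
        # Only grammar-approved tokens compete; all others are ignored.
        token = max(options, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        state = options[token]
    return output


def toy_scores(prefix):
    """Hypothetical stand-in for model scores given the decoded prefix."""
    scores = {"hello": 9.0, "[Medication]": 1.0, "metformin": 2.0, "placebo": 1.5}
    if "DrugName" in prefix:
        scores.update({"DrugName": 0.1, "[/Medication]": 3.0})
    else:
        scores.update({"DrugName": 3.0, "[/Medication]": 0.1})
    return scores


decoded = constrained_greedy_decode(toy_scores)
print(decoded)
```

Although the invalid token "hello" always has the highest raw score in this toy setting, it is never emitted, because it is not reachable from any grammar state.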

We evaluate and compare both approaches on the dataset provided by Sánchez-Graillet et al. [ 21 ] and used in previous work [ 14 ], which consists of predicting 10 templates. The dataset comprises a total of 211 documents for two diseases: type 2 diabetes (104) and glaucoma (107). Our results show that the improved extractive approach using Flan-T5 as a base model performs best for both diseases in the dataset, achieving a mean \(F_1\) score of 0.547 ( \(\pm 0.006\) ) for type 2 diabetes and 0.636 ( \(\pm 0.006\) ) for glaucoma. However, both approaches have different strengths and weaknesses and are not yet suitable to fully automate the process of systematic review creation, but still have the potential to substantially reduce the necessary effort.
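As a reminder of how a mean \(F_1\) score with a spread is typically obtained, the sketch below computes \(F_1\) from counts of correct, spurious and missed predictions and then aggregates over several runs. The counts are invented for illustration, and the use of the sample standard deviation is an assumption; the exact matching criteria and aggregation used in the evaluation are not described in this section.

```python
from statistics import mean, stdev


def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (true positive, false positive, false negative) counts
# of slot fillers for three training runs.
runs = [(6, 2, 4), (7, 3, 3), (5, 2, 5)]
scores = [f1_score(*counts) for counts in runs]

print(f"F1 = {mean(scores):.3f} (+/- {stdev(scores):.3f})")
```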

Additional data and evaluations (Appendix 2 , 4 and 5 ) as well as the used grammar (Appendix 1 ) and a case study (Appendix 3 ) can be found in the appendix.

In summary, our contributions are the following:

We present an extension of the approach proposed by Witte and Cimiano [ 14 ] in two directions: i) relying on Longformers [ 16 ] and Flan-T5 [ 17 ] to encode the complete abstract and infer templates and slots for the complete document jointly, and ii) using a clustering step to cluster the extracted template instances to infer the number of instances for each template type.

We present a novel generative information extraction approach that relies on a grammar to guide decoding, and propose a novel serialization of the nested template structure such that the problem can be cast as a seq2seq inference problem.

We evaluate both approaches on the dataset by Sánchez-Graillet et al. [ 21 ] and show that our improved extractive approach using Flan-T5 [ 17 ] as a base model performs best for both diseases.
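The serialization of a nested template structure mentioned in the second contribution can be pictured with a short sketch. The bracketed token format, the Arm template and the hasIntervention slot name are hypothetical and chosen only to show how nesting can be flattened into a sequence that a seq2seq model could be trained to produce; the actual serialization format is defined by the grammar in Appendix 1.

```python
def linearize(name, slots):
    """Flatten a (possibly nested) template into a list of tokens.

    A nested template is given as a tuple (template_name, slots_dict);
    any other value is treated as a plain slot filler.
    """
    tokens = [f"[{name}]"]
    for slot, value in slots.items():
        tokens.append(slot)
        if isinstance(value, tuple):
            tokens.extend(linearize(*value))  # recurse into the sub-template
        else:
            tokens.append(str(value))
    tokens.append(f"[/{name}]")
    return tokens


# Hypothetical arm whose intervention is a Medication template instance.
arm = ("Arm", {
    "hasIntervention": ("Medication", {
        "DrugName": "metformin",
        "DoseValue": 500,
        "DoseUnit": "mg",
    }),
})

sequence = " ".join(linearize(*arm))
print(sequence)
```

Explicit opening and closing markers keep the nesting recoverable from the flat sequence, which is what allows a grammar to check the output during decoding.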

Related work

In recent years, a number of information extraction approaches have been developed, targeting tasks such as event extraction (e.g., Lu et al. [ 22 ], Hsu et al. [ 23 ], Yang et al. [ 20 ]), relation extraction (e.g., Giorgi et al. [ 24 ]) or role/slot/template filling (e.g. Du et al. [ 25 , 26 ]). With respect to biomedical information extraction, there are also several approaches which aim to solve different tasks specifically for the domain of biomedical texts, e.g. scientific articles or clinical trials. Application domains range from event extraction (e.g., Wang et al. [ 27 ], Ramponi et al. [ 28 ], Zhu and Zheng [ 29 ], Huang et al. [ 30 ], Trieu et al. [ 31 ]) over relation extraction (e.g., Jiang and Kavuluru [ 32 , 33 ]) and question answering (e.g., Wang et al. [ 27 ]) through to named entity recognition (e.g., Stylianou et al. [ 34 ]).

The set of methods and tools used to solve these problems is quite diverse, comprising joint end-to-end transformer models (e.g., Ramponi et al. [ 28 ], Trieu et al. [ 31 ], Jiang and Kavuluru [ 32 ], Stylianou et al. [ 34 ]) as well as support vector machines (e.g., Kim and Meystre [ 33 ]), conditional random fields (e.g., Stylianou et al. [ 35 ], Farnsworth et al. [ 34 ], Tseo et al. [ 36 ]), hybrid deep neural networks (e.g., Zhu and Zheng [ 29 ]) and Long Short-Term Memory networks (LSTMs, e.g., Jiang and Kavuluru [ 32 ], Kim and Meystre [ 33 ], Farnsworth et al. [ 35 ]).

Some related work also deals with detecting clinical trial outcomes, outcome spans (e.g., Abaho et al. [ 37 , 38 , 39 ], Ganguly et al. [ 40 ]) or slot fillers (e.g., Papanikolaou et al. [ 41 ]) in (randomized) clinical trial abstracts. However, they lack the specific structure and dependencies of PICO templates and slots, which are used in this paper. These approaches mostly use transformer architectures, sometimes in combination with, e.g., LSTMs to detect the outcomes/slot fillers.

The PICO framework is frequently used to describe the results of RCTs in a structured way. This structure comprises a number of templates and corresponding slots (each of which is uniquely assigned to a single template type). However, an RCT can contain multiple instances of a template, which raises the problem of matching recognized slot fillers with their corresponding template instance.

Some efforts in this area focus on the problem that larger amounts of training data are missing or at least expensive to create due to the need for clinical experts as annotators. These approaches therefore utilize distant or weak supervision for training on noisy label data (e.g., Dhrangadhariya and Müller [ 42 ], Nye et al. [ 43 ], Wallace et al. [ 44 ], Liu et al. [ 45 ]). In contrast, the approach presented in this paper relies on the availability of sufficient classical supervised training data.

Other methods work with Conditional Random Fields (CRFs) in combination with (Bi-)LSTMs (e.g., Jin and Szolovits [ 46 ], Kang et al. [ 47 ]) or rule-based methods (e.g., Chabou and Iglewski [ 48 ]).

While most recent work relies on transformer architectures, there are also diverse other approaches which utilize different machine learning techniques like support vector machines (e.g., Yuan et al. [ 49 ]), convolutional neural networks (e.g., Stylianou et al. [ 50 ]), LSTMs (e.g., Jin and Szolovits [ 51 ]) or other deep learning-based approaches (e.g., Afzal et al. [ 52 ]).

Several recent approaches use transformer models like BERT (Bidirectional Encoder Representations from Transformers, Devlin et al. [ 53 ]) for PICO recognition, but focus on different architectural and task-related details.

However, some approaches refer to PICO elements as flat classes, i.e. parts of sentences are just labeled, e.g., P or I, whereas our approach considers PICO elements to be nested structures, i.e. templates with slots that have to be filled with some portion of text. Examples for this simplified view on PICO elements are listed in the following:

Schmidt et al. [ 54 ] treat the PICO recognition task as a sentence classification/question answering task and thus, in contrast to the approach presented in this paper, do not work on the level of whole documents/abstracts or PICO elements which span multiple sentences. Therefore, Schmidt et al. [ 54 ] do not benefit from contextualized representations utilizing the whole abstract as a context. Moreover, the problem of mapping found PICO elements to unique template instances is not dealt with.

Zhang et al. [ 55 ] propose a multi-step approach that first identifies P, I/C and O elements in the text using either Convolutional Neural Networks (CNNs) or Bi-LSTMs. After that, a Diseases Named Entity Recognition model is used to extract disease-related entities in the PICO-labeled sentences. Various different models, like, e.g., BERT-based or LSTM-based models, are compared in this category. Finally, a mapping model resolves some ambiguities, like intersections of recognition results for P and O. Again, different models (including both BERT and Bi-LSTMs) are evaluated for this task. Although this approach makes some efforts to create more structured results than flat sentence classification, it still ignores some aspects of the more complex structure of PICO elements.

Whitton and Hunter [ 56 ] propose a more structured view on PICO elements, e.g., by differentiating between two arms of an RCT. This is achieved in two steps by first applying a named-entity recognition model, recognizing three general types of entities (interventions, outcomes and measures). In a second step, these entities are related to each other using a relation extraction model which also differentiates between the (up to) two arms of the considered RCTs. However, they focus on evidence tables, which are different from the nested template structure we work with in this paper. Moreover, their approach does not work in a sequence-to-sequence manner with constrained decoding like the generative approach described in this paper.

Dhrangadhariya et al. [ 57 ] implement PICO recognition for more fine-grained entities, which - similarly to our approach - also consider more detailed information about participants, interventions and outcomes, like sample size, age, mortality, drugs or surgical interventions. Nevertheless, it is still less detailed than the template structure used in this paper, which consists of 10 templates comprising overall 85 slots (see Witte and Cimiano [ 14 ]). Moreover, by using BERT as an encoder and Bi-LSTM, self-attention as well as CRF and linear layers for classification, it does not work in a sequence-to-sequence manner like the generative approach we present in this work.

In this work, we address the problem of extracting a set of template instances from unstructured text. We tackle this problem from two different perspectives and present two approaches solving the same problem: 1) an extractive approach and 2) a generative approach. An illustration of both approaches can be found in Fig. 1 .

Fig. 1 Illustration of both described approaches, starting with the tokenized input and ending with the generated template instances

The data model used captures the design and key results of an RCT by way of 10 different templates consisting of a total of 85 different slots that capture various aspects of the PICO elements in a structured way. These templates are based on the C-TrO Ontology that has been designed to support use cases related to the aggregation of evidence from multiple clinical trials [ 15 ]. The mean number of slot fillers per template is shown in Table 1 . A template \(t_i\) is defined by a type \(i \in \mathcal L\) and a set of slots \(\mathcal S_i = \bigcup _j \{s_{ij}\}\) , where \(s_{ij}\) denotes slot j of template \(t_i\) , the union \(\bigcup _j\) runs over all slots j of the template, and \(\mathcal L\) denotes the set of all template types. A template is instantiated by assigning slot-fillers to its slots, where a slot-filler can be either a text span from the input document or a template instance, depending on the slot. Figure 2 visualizes this data model. In the following subsections, we describe the extractive and the generative approach in more detail.

Fig. 2 Schema of the PICO data model used in the experiments

Extractive approach

Our extractive approach is based on the Intra-Template Compatibility (ITC) approach [ 14 ], which adopts a two-step architecture: In a first step, all textual slot-fillers are extracted from the input document, followed by a second step, which assigns the extracted slot-fillers to template instances. The extraction of slot-fillers and their clustering and assignment are described in the “ Extraction of textual slot-fillers ” and “ Assignment of textual slot-fillers to template instances ” Sections, respectively.

Encoding of the input document

The ITC approach uses BERT (Bidirectional Encoder Representations from Transformers) [ 53 ] to compute a contextualized representation of each token \(w_i\) of the input document \(d=(w_1,\ldots ,w_n)\) . As the length of RCT abstracts typically exceeds the maximum number of tokens of most BERT implementations, the authors of ITC split the document into consecutive chunks and process each chunk separately. However, this approach treats each chunk as an isolated unit and hence the model is not able to learn token representations which incorporate the context of the full input document. Therefore, we adopt the Longformer [ 16 ] approach as well as the Flan-T5 model [ 17 ] to learn full-document contextualized representations \(\textbf{h}_i \in \mathbb R^d\) (with \(d = 768\) for both T5 and Longformer models) for each token \(w_i\) of the input document, where d is the output dimension of the encoder of the respective model.

Extraction of textual slot-fillers

The ITC approach extracts slot-fillers from the input document by predicting start and end tokens of slot-fillers, followed by a step which joins the predicted start and end tokens. This is realized by training two linear layers which take the contextualized representation \(\textbf{h}_i\) of a token \(w_i\) as input and predict whether or not this token is a slot-filler start or end token, respectively:

where \(\mathcal S = \bigcup _i \mathcal S_i \cup \{\mathbb O\}\) is the set of all slots including the special no-slot label \(\mathbb O\) which indicates that a token is not classified as a start/end token of a slot-filler. The vectors \(\textbf{p}_{s,i}\) , \(\textbf{p}_{e,i}\) denote the predicted probability distribution over the slots that a token \(w_i\) is the start/end of the respective slots. The final prediction is determined by the \(\arg \max\) operation.
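
Written out, the two classification heads described above take the following form. This is a reconstruction from the prose rather than a quotation, assuming softmax-normalised linear layers; the parameter names \(\textbf{W}_s, \textbf{b}_s, \textbf{W}_e, \textbf{b}_e\) are introduced here for illustration only.

```latex
\begin{align*}
\textbf{p}_{s,i} &= \operatorname{softmax}(\textbf{W}_s \textbf{h}_i + \textbf{b}_s), &
\textbf{W}_s &\in \mathbb R^{|\mathcal S| \times d},\ \textbf{b}_s \in \mathbb R^{|\mathcal S|} \\
\textbf{p}_{e,i} &= \operatorname{softmax}(\textbf{W}_e \textbf{h}_i + \textbf{b}_e), &
\textbf{W}_e &\in \mathbb R^{|\mathcal S| \times d},\ \textbf{b}_e \in \mathbb R^{|\mathcal S|}
\end{align*}
```

Under this reading, the predicted start label of token \(w_i\) is \(\arg \max _{s \in \mathcal S} (\textbf{p}_{s,i})_s\) , and analogously for the end label.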

The predicted start/end tokens are joined sentence-wise by minimizing the distance between start and end tokens in terms of tokens in between. More precisely, for a given sentence, we first collect all predicted start and end tokens. For each predicted start token \(w_s\) at position i, we seek an end token \(w_e\) at position \(j \ge i\) with matching label and minimal distance to \(w_s\) and assign it to \(w_s\) as its end token. Finally, we discard predicted start/end tokens which have no matching end/start token. This slightly differs from the IOB format [ 58 ], as only the start and end tokens of a sequence are tagged and all tokens in between are classified in the same way as tokens which are not part of any sequence. A comparison of both tagging schemes can be found in Table 2 .
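
The joining step can be illustrated with a minimal Python sketch. The function name `join_spans` and the `(position, label)` input format are assumptions made for this illustration; the sketch also assumes that an end token may be matched by more than one start token, a detail the description above leaves open.

```python
from typing import List, Tuple


def join_spans(
    starts: List[Tuple[int, str]],
    ends: List[Tuple[int, str]],
) -> List[Tuple[int, int, str]]:
    """Pair each predicted start token with the nearest matching end token.

    `starts` and `ends` hold (token_position, slot_label) pairs for ONE sentence.
    Returns (start_position, end_position, slot_label) triples.
    """
    spans = []
    for i, label in sorted(starts):
        # Candidate end tokens: same label, located at or after the start token.
        candidates = [j for j, end_label in ends if end_label == label and j >= i]
        if candidates:
            # The closest end token minimises the number of tokens in between.
            spans.append((i, min(candidates), label))
        # Start tokens without a matching end token are discarded.
    # End tokens never chosen by any start token are discarded implicitly.
    return spans


starts = [(2, "Drug"), (7, "DoseValue"), (12, "Drug")]
ends = [(3, "Drug"), (7, "DoseValue"), (9, "Frequency")]
print(join_spans(starts, ends))
```

In this toy input, the second `Drug` start token at position 12 has no end token at or after it and is dropped, as is the unmatched `Frequency` end token.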

For each extracted slot-filler i with start/end tokens \(w_s\) resp. \(w_e\) with corresponding token representations \(\textbf{h}_s\) , resp. \(\textbf{h}_e\) , ITC computes a representation \(\textbf{e}_i\) by summing the representations of the start and end tokens followed by a dense layer with ReLU [ 59 ] activation function:

The learned representations \(\textbf{e}_i\) of the extracted slot-fillers (SFs) are then used as input to subsequent modules. In the remainder of this paper, we denote the set of all extracted slot fillers as \(\mathcal E\) , where each slot filler in \(\mathcal E\) is represented by its vector representation computed by Eq. ( 3 ).
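
The representation referred to as Eq. ( 3 ) can be written out as follows. This is a reconstruction from the description (sum of start and end representations, then a dense layer with ReLU); the parameter names \(\textbf{W}_f\) and \(\textbf{b}_f\) are assumptions introduced for illustration.

```latex
\[
\textbf{e}_i = \operatorname{ReLU}\bigl(\textbf{W}_f (\textbf{h}_s + \textbf{h}_e) + \textbf{b}_f\bigr),
\qquad \textbf{W}_f \in \mathbb R^{d \times d},\ \textbf{b}_f \in \mathbb R^{d}
\]
```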

Assignment of textual slot-fillers to template instances

Typically, for some slot types like the textual slot fillers of the Outcome template, there are several slot fillers of the same type extracted from an original document. Therefore, we need a way to group these slot fillers such that actual template instances, e.g. multiple Outcome instances, can be created from these slot fillers. Deciding which slot fillers belong together is however not a trivial task.

The assignment of extracted SFs to template instances is therefore done in ITC by a clustering approach per template based on a pairwise similarity or compatibility function \(q: \mathbb R^d \times \mathbb R^d \rightarrow [0, 1]\) . The function q scores the similarity between two SFs in the sense that they belong to the same template instance, where \(q(\textbf{e}_i, \textbf{e}_j)=1\) indicates maximum similarity such that \(\textbf{e}_i\) and \(\textbf{e}_j\) should be assigned to the same template instance. Note that \(\textbf{e}_i\) and \(\textbf{e}_j\) are entity representations calculated based on the contextualized embeddings generated by the used models. Thus, we can use results from the established field of (density-based) clustering to determine the SF grouping. The similarity function q is implemented in a slightly more complex way compared to the original paper, using two linear layers with a ReLU activation function in between and followed by a sigmoid activation function:

Note that due to the symmetry of \(+\) , also q is a symmetric function, i.e. \(q(\textbf{e}_i, \textbf{e}_j) = q(\textbf{e}_j, \textbf{e}_i)\) for all pairs of \(\textbf{e}_i, \textbf{e}_j\) . Then the mean pairwise similarity between SFs of a cluster \(C_i \subseteq \mathcal E\) is given by

The score of a clustering \(\mathbb C_i = \{ C_1, \ldots , C_{m_i} \}\) of SFs \(\mathcal E_i \subseteq \mathcal E\) for template \(t_i\) is the mean score of its cluster scores:

The ITC approach seeks a clustering \(\mathbb C_i^*\) of \(m_i\) clusters which maximizes the score given by Eq. ( 7 ):

where \(\mathcal U_{i,m_i}\) denotes the set of all clusterings of the set \(\mathcal E_i\) with \(m_i\) clusters. Note that the optimization objective defined by Eq. ( 8 ) is parameterized by the number of clusters \(m_i\) . In order to alleviate the assumption that the number of instances of templates needs to be known a priori, we propose a clustering step to induce the number of template instances per template type using Hierarchical Agglomerative Clustering (HAC) with a threshold based on the average of values computed for the training data, namely:

the average similarity values of pairs belonging to the same template instance

the average similarity values of pairs belonging to different instances
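
For reference, the quantities introduced in this subsection can be written out as follows. This is a reconstruction from the prose, not a quotation: the parameter names \(\textbf{W}_1, \textbf{b}_1, \textbf{w}_2, b_2\) , the name \(\operatorname{score}\) and the normalisation over unordered pairs \(P(C)\) are assumptions, and the cluster score is left undefined for single-element clusters.

```latex
\begin{align*}
q(\textbf{e}_i, \textbf{e}_j) &= \sigma\bigl(\textbf{w}_2^\top \operatorname{ReLU}(\textbf{W}_1 (\textbf{e}_i + \textbf{e}_j) + \textbf{b}_1) + b_2\bigr) \\
\operatorname{score}(C) &= \frac{1}{|P(C)|} \sum_{\{\textbf{e}_k, \textbf{e}_l\} \in P(C)} q(\textbf{e}_k, \textbf{e}_l),
\qquad P(C) = \bigl\{ \{\textbf{e}_k, \textbf{e}_l\} \subseteq C : k \ne l \bigr\} \\
\operatorname{score}(\mathbb C_i) &= \frac{1}{m_i} \sum_{j=1}^{m_i} \operatorname{score}(C_j) \\
\mathbb C_i^* &= \mathop{\arg\max}_{\mathbb C \in \mathcal U_{i,m_i}} \operatorname{score}(\mathbb C)
\end{align*}
```

The first line makes the symmetry of q explicit, since \(\textbf{e}_i\) and \(\textbf{e}_j\) enter only through their sum.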

After the clustering \(\mathbb C_i^*(m_i)\) has been estimated, the template instances \(t_{ij}\) are derived from those clusters \(C_j^* \in \mathbb C_i^*(m_i)\) . The slot to which a SF \(\textbf{e}_k \in C_j^*\) is assigned is given by the label assigned by the SF extraction module by Eqs. ( 1 ) and ( 2 ). In summary, the assignment of SFs to template instances is done as follows:

For each template \(t_i\) , the set \(\mathcal E_i \subseteq \mathcal E\) of SFs which can be assigned to instances of template type \(t_i\) is estimated.

Either Eq. ( 8 ) (when the number of clusters \(m_i\) is known) or Hierarchical Agglomerative Clustering is used to find a clustering of the SFs in \(\mathcal E_i\) .

The template instances are derived from the clusters in the clustering.

As an example, we consider the following four extracted slot fillers:

PercentageAffected: 16

PercentageAffected: 8

TimePoint: week 24

TimePoint: week 12

Additionally, we assume our trained similarity function gives us the similarities presented in Table 3 .

Given these similarities and a clustering threshold of, e.g., 0.5, this results in two clusters which can be then directly used to create the corresponding Outcome template instances. These two clusters are:

PercentageAffected: 16 and TimePoint: week 24

PercentageAffected: 8 and TimePoint: week 12

The clustering thus provides a robust and flexible way to both determine the number of template instances to generate as well as the groups of slot fillers those instances comprise.
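
The threshold-based grouping can be sketched in plain Python as below. This is a minimal illustration under several assumptions: average linkage is used as the merge criterion, the threshold is taken as the midpoint of the two training averages, and the similarity values in the usage example are hypothetical stand-ins rather than the values of Table 3. The function names are likewise chosen for this sketch.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, List, Sequence


def threshold_from_training(same_instance_sims: Sequence[float],
                            different_instance_sims: Sequence[float]) -> float:
    """Assumed reading of the threshold: midpoint of the two training averages."""
    return (mean(same_instance_sims) + mean(different_instance_sims)) / 2


def cluster_slot_fillers(items: Sequence[str],
                         sim: Callable[[str, str], float],
                         threshold: float) -> List[List[str]]:
    """Agglomerative clustering: merge the most similar clusters until none reach the threshold."""
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        def linkage(pair):
            a, b = pair
            # Average linkage: mean similarity over all cross-cluster pairs.
            return mean(sim(x, y) for x in clusters[a] for y in clusters[b])

        best = max(combinations(range(len(clusters)), 2), key=linkage)
        if linkage(best) < threshold:
            break  # no remaining pair of clusters is similar enough
        a, b = best  # combinations guarantees a < b
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical similarities for the four slot fillers of the example.
fillers = ["PercentageAffected: 16", "PercentageAffected: 8",
           "TimePoint: week 24", "TimePoint: week 12"]
table = {
    frozenset({fillers[0], fillers[2]}): 0.9,
    frozenset({fillers[1], fillers[3]}): 0.8,
    frozenset({fillers[0], fillers[1]}): 0.1,
    frozenset({fillers[2], fillers[3]}): 0.1,
    frozenset({fillers[0], fillers[3]}): 0.2,
    frozenset({fillers[1], fillers[2]}): 0.2,
}
clusters = cluster_slot_fillers(fillers, lambda x, y: table[frozenset({x, y})], 0.5)
print(clusters)
```

Each resulting cluster corresponds to one Outcome template instance, so the number of clusters is the inferred template cardinality.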

Generative approach

In this section we propose a simple generative approach for extracting template instances from unstructured text based on the Transformer [ 60 ] encoder-decoder model. As encoder-decoder models require the output to be a linear token sequence, the set of TIs needs to be converted into a sequence of tokens. In Section “ Linearization of sets of template instances ”, we present a simple recursive method for linearizing sets of TIs along a context free grammar (CFG) for describing the linearized structures. In Section “ Decoding ” we adopt the presented CFG for generating valid token sequences representing sets of TIs.

Transformer-based encoder-decoder models

Transformer-based [ 60 ] encoder-decoder models are seq2seq models which have been used on a variety of natural language processing tasks like machine translation [ 61 ] and text summarization [ 62 ]. The encoder part of the Transformer learns a contextualized representation of the input tokens \(w_1, \ldots , w_n\) via multi-headed self-attention [ 60 ], converting the input sequence into a sequence of vectors \(\textbf{h}_1, \ldots , \textbf{h}_n \in \mathbb R^d\) , where d is the dimension of the Transformer model. Then the decoder part takes the vector sequence from the encoder as input and produces an output vector sequence \(\textbf{d}= (\textbf{d}_1, \ldots , \textbf{d}_n)\) with \(\textbf{d}_t \in \mathbb R^d\) via multi-headed cross-attention. The computational complexity of self-attention grows quadratically with the number of tokens. Beltagy et al. [ 16 ] proposed the Longformer encoder-decoder, which combines local and global multi-headed self-attention in the encoder, reducing computational complexity from \(\mathcal O(n^2)\) to \(\mathcal O(n)\) .

The output vector sequence \(\textbf{d}\) is used to compute a probability distribution over the vocabulary of the underlying model via the following equation:

where \(\textbf{v}_i \in \mathbb R^d\) is the embedding of token \(y_i\) , \(b_i\) is a bias for token \(y_i\) , \(\textbf{d}_{t-1}\) is the output vector of the decoder at position \((t-1)\) and d is the model dimension. The probability of token \(y_t\) at position t is conditioned on the input token sequence x and the past decoded tokens \(y_1, \ldots , y_{t-1}\) . This dependence is encoded through the vector \(\textbf{d}_{t-1}\) via multi-headed self- and cross-attention.

Token prediction in the decoder is done by maximum a posteriori probability (MAP) inference. Hence the predicted token at position t is given by the token with maximal posterior probability:

The generative model is trained via teacher forcing by minimizing the cross entropy loss between the predicted token distribution described by Eq. ( 9 ) and the ground truth label.
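
The token distribution referred to as Eq. ( 9 ) and the MAP choice of the predicted token can be written out as follows. This is a reconstruction from the symbol definitions above, assuming a plain softmax over the vocabulary; any additional scaling by the model dimension d is not reconstructed here, and the notation \(\hat{y}_t\) and \(y_{<t}\) is introduced for illustration.

```latex
\begin{align*}
P(y_t = y_i \mid x, y_{<t}) &= \frac{\exp(\textbf{v}_i^\top \textbf{d}_{t-1} + b_i)}{\sum_{j} \exp(\textbf{v}_j^\top \textbf{d}_{t-1} + b_j)} \\
\hat{y}_t &= \mathop{\arg\max}_{y_i} P(y_t = y_i \mid x, y_{<t})
\end{align*}
```

Here \(y_{<t}\) abbreviates the past decoded tokens \(y_1, \ldots , y_{t-1}\) and the sum in the denominator runs over the whole vocabulary.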

Linearization of sets of template instances

As encoder-decoder models expect the output space to be token sequences, we present a simple recursive linearization procedure of template instances (TIs). First, note that TIs are described by the content of their slots (i.e., their slot-fillers), and that slot-fillers can be either text spans from the input document or other TIs. Hence the recursion base is given by the linearization of textual slot-fillers. Let \(f = w_{k_1}, \ldots , w_{k_m}\) be a token sequence which represents a textual slot-filler f for a slot of name SLOT . Then the linearization of this slot-filler is the token sequence itself enclosed by the special tokens [start:SLOT] and [end:SLOT] , i.e. [start:SLOT] \(\odot ~w_{k_1} \odot \ldots \odot w_{k_m} \odot\) [end:SLOT] , where \(\odot\) denotes the concatenation of tokens. If the slot-filler is a TI, then it is recursively linearized and the resulting token sequence is enclosed by the special tokens [start:SLOT] and [end:SLOT] . The linearization of TIs is described below.

In general, more than one slot-filler can be assigned to a slot of a TI. Therefore, we denote the complete content of a slot as a set \(\mathcal F\) of slot-fillers. As sets, in contrast to sequences, are unordered constructs by definition, the linearization of sets of slot-fillers is inherently ambiguous. To get an unambiguous order, we introduce a slot ordering operator \(\omega\) which converts sets of slot-fillers into sequences of slot-fillers according to predefined criteria (e.g. position within input document in case of textual slot-fillers). Then sets \(\mathcal F\) of slot-fillers are linearized as follows: First, we sort the elements of \(\mathcal F\) according to the sorting operator \(\omega\) and obtain a sequence F of slot-fillers. Then we linearize each slot-filler in F as described above and concatenate the resulting token sequences, respecting the ordering of slot-fillers in F .

Next, we describe the linearization of TIs. As TIs are represented by the content of their slots, the linearization of a TI has to include the linearization of its slots. However, a template does not impose any ordering of its slots, and hence the linearization order of the slots of a TI is undefined. Therefore, we introduce another ordering operator \(\Omega\) which orders the slots of a template. Then the linearization of a TI is the concatenation of the linearizations of its slots according to the ordering of its slots given by the ordering operator \(\Omega\) .

Any set of TIs induces a graph with TIs as nodes and links between TIs as edges. Recall that there is a link from TI \(t_{ij}\) to TI \(t_{kl}\) iff \(t_{kl}\) is a slot-filler of \(t_{ij}\) . In order to guarantee that the linearization algorithm described above is well defined, we require the induced graph to be 1) acyclic and 2) connected. The first requirement ensures that the linearization algorithm terminates, while the second ensures the absence of isolated TIs, which cannot be linearized.
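
The recursive procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the class `TemplateInstance`, the slot name `ContainsMedication`, the use of alphabetical slot order for \(\Omega\) , the use of the given list order for \(\omega\) , and the opening token `[start:TEMPLATE]` for a template instance are all choices made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TemplateInstance:
    name: str
    # Each slot holds a list of fillers: text spans or nested template instances.
    slots: Dict[str, List[Union[str, "TemplateInstance"]]] = field(default_factory=dict)


def linearize(ti: TemplateInstance) -> List[str]:
    """Recursively turn a template instance into a flat token sequence."""
    tokens = [f"[start:{ti.name}]"]  # assumed opening token of a template instance
    for slot in sorted(ti.slots):  # Omega: alphabetical slot order (assumption)
        for filler in ti.slots[slot]:  # omega: fillers kept in the given order (assumption)
            tokens.append(f"[start:{slot}]")
            if isinstance(filler, TemplateInstance):
                tokens.extend(linearize(filler))  # recursive case: nested TI
            else:
                tokens.extend(filler.split())  # recursion base: textual slot-filler
            tokens.append(f"[end:{slot}]")
    tokens.append(f"[end:{ti.name}]")
    return tokens


medication = TemplateInstance("Medication", {"Drug": ["timolol"]})
intervention = TemplateInstance(
    "Intervention",
    {"Frequency": ["once daily"], "ContainsMedication": [medication]},
)
print(" ".join(linearize(intervention)))
```

Because the induced graph is required to be acyclic, the recursion in `linearize` always reaches textual slot-fillers and terminates.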

However, choosing \(\omega\) and \(\Omega\) is only necessary for training but not for inference, as decoding allows template slots to be filled in any order. Therefore, we choose arbitrary but fixed \(\omega\) and \(\Omega\) for the experiments described in the “ Experimental results ” Section.

A full example for a whole linearized publication template instance can be found in Listing 2 in Appendix 6 . A shorter example for an intervention template instance with both textual and template slot fillers can be found in Fig. 3 .

Fig. 3 Illustration of linearization of an intervention template instance

A context-free grammar for describing linearization of sets of template instances

In the following, we describe the linearization of sets of TIs (described in Section “ Linearization of sets of template instances ”) by a context-free grammar (CFG) which is used in the decoding process (“ Decoding ” Section) to constrain the generation of tokens. A CFG is defined by a 4-tuple \(\mathcal G = (N, T, R, S)\) , where N is a set of non-terminal symbols, T is a set of terminal symbols, R is a set of production rules and \(S \in N\) is the start symbol of the grammar. The set of terminal symbols is defined by the vocabulary of the underlying encoder-decoder model together with some special tokens for defining the production rules R . The recursion base of the linearizations of sets of TIs is given by the linearization of textual slots which we describe by the following equation:

where TEMPLATE and SLOT are placeholders for names of template and slots, respectively, TEXT is a placeholder for any token sequence from the input document and [start:SLOT] , [end:SLOT] are special tokens enclosing the textual slot-filler. Eq. ( 11 ) schematically defines production rules for textual slot-fillers, and TEMPLATE is the non-terminal symbol which is used to identify the respective production rules. Note that the non-terminal symbol TEMPLATE on the right-hand side of Eq. ( 11 ) allows recursion and hence the application of more than one production rule associated with the non-terminal symbol TEMPLATE . The recursion base of production rules is given by

where TEMPLATE is again a placeholder for the template name and [end:TEMPLATE] is a special token indicating the end of the linearization of the template TEMPLATE . Production rules for TIs are described by

Analogously to the production rules defined by Eq. ( 11 ) for textual slot-fillers, the production rules for slots containing TIs as slot-fillers are defined by

where TEMPLATE_HEAD is a placeholder for any non-terminal symbol whose associated production rules are derived from Eq. ( 13 ). Listing 1 shows the production rules for the data model used in our experiments.
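
The four rule schemata discussed in this section can be summarised as follows. This is a reconstruction from the descriptions of Eqs. ( 11 ) to ( 14 ), not a quotation; in particular, the opening token [start:TEMPLATE] in the third rule is an assumption, since the text only names the closing token [end:TEMPLATE] explicitly.

```latex
\begin{align*}
\text{TEMPLATE} &\rightarrow \texttt{[start:SLOT]}\ \ \text{TEXT}\ \ \texttt{[end:SLOT]}\ \ \text{TEMPLATE} && \text{(textual slot-filler, cf. Eq. 11)} \\
\text{TEMPLATE} &\rightarrow \texttt{[end:TEMPLATE]} && \text{(recursion base, cf. Eq. 12)} \\
\text{TEMPLATE\_HEAD} &\rightarrow \texttt{[start:TEMPLATE]}\ \ \text{TEMPLATE} && \text{(template instance, cf. Eq. 13)} \\
\text{TEMPLATE} &\rightarrow \texttt{[start:SLOT]}\ \ \text{TEMPLATE\_HEAD}\ \ \texttt{[end:SLOT]}\ \ \text{TEMPLATE} && \text{(TI as slot-filler, cf. Eq. 14)}
\end{align*}
```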

Decoding

In Section “ A context-free grammar for describing linearization of sets of template instances ”, we presented a CFG which describes valid token sequences representing a set of TIs. In this section, we describe a simple method to constrain token prediction such that only token sequences which are valid according to the CFG are generated. For example, consider a slot Drug which can have textual slot-fillers for describing drug names for a medication. After the special token [start:Drug] has been predicted, we know that the set of next possible tokens would consist of all tokens from the input document plus the special token [end:Drug] . This information is encoded by the CFG, and the decoding method described in this section uses this information to constrain token prediction.

In this paper, we slightly generalize the constrained decoding approach of Lu et al. [ 20 ] to arbitrary right-linear CFGs by applying a strategy similar to recursive descent parsing.

Beginning with a start symbol, in our case PUBLICATION_HEAD , the set of possible next tokens is calculated in each decoding step. This set is then used to generate a mask for the model vocabulary to discard all tokens which would not comply with the production rules of the CFG. From the remaining tokens, we select the token with the maximum value in a greedy fashion. The implementation of a beam search to optimize the decoding output even more remains for future work.

To keep track of the decisions and possible next tokens, a stack data structure is used to guide the decoding. Whenever a start token of a slot like [start:NumberAffected] is chosen as the decoded token, this decision is saved by adding this to the decoding stack. This is then used to constrain the tokens in the next step to be only those which can follow a [start:NumberAffected] token. Similarly, when an end token like [end:NumberAffected] is chosen, the top stack element is removed from the stack.

In this way, decoding is guided to comply with the requirements imposed by the CFG, which ensures that the output can be parsed into actual TIs.
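
The stack-guided masking can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the grammar is given as a hypothetical dictionary mapping each template to its slots (with `"TEXT"` marking textual slots), the stack entry kinds `head`, `template`, `text` and `close` are bookkeeping choices of this sketch, and `score_fn` stands in for the decoder's scores over the vocabulary.

```python
from typing import Callable, Dict, List, Set, Tuple

Grammar = Dict[str, Dict[str, str]]  # template -> {slot: "TEXT" or nested template name}


def allowed_next(stack: List[Tuple[str, str]], grammar: Grammar,
                 input_tokens: List[str]) -> Set[str]:
    """Tokens that may follow, given the top of the decoding stack."""
    kind, name = stack[-1]
    if kind == "head":  # a template instance must be opened
        return {f"[start:{name}]"}
    if kind == "template":  # open one of its slots or close the template
        return {f"[start:{s}]" for s in grammar[name]} | {f"[end:{name}]"}
    if kind == "text":  # copy input tokens or close the textual slot
        return set(input_tokens) | {f"[end:{name}]"}
    return {f"[end:{name}]"}  # kind == "close": nested TI done, close its slot


def advance(stack: List[Tuple[str, str]], token: str, grammar: Grammar) -> None:
    """Update the stack after `token` has been emitted."""
    kind, name = stack[-1]
    if kind == "head":
        stack[-1] = ("template", name)
    elif kind == "template":
        if token == f"[end:{name}]":
            stack.pop()
        else:
            slot = token[len("[start:"):-1]
            filler = grammar[name][slot]
            if filler == "TEXT":
                stack.append(("text", slot))
            else:
                stack.append(("close", slot))
                stack.append(("head", filler))
    elif kind == "close" or token == f"[end:{name}]":
        stack.pop()


def constrained_greedy_decode(root: str, grammar: Grammar, input_tokens: List[str],
                              score_fn: Callable[[List[str]], Dict[str, float]],
                              max_steps: int = 50) -> List[str]:
    stack, output = [("head", root)], []
    while stack and len(output) < max_steps:
        allowed = allowed_next(stack, grammar, input_tokens)
        scores = score_fn(output)
        # Masking: tokens outside `allowed` are never considered; pick the best of the rest.
        token = max(sorted(allowed), key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        advance(stack, token, grammar)
    return output


toy_grammar = {"Medication": {"Drug": "TEXT"}}
script = [  # toy scores per step; several top-scoring tokens violate the grammar
    {"[end:Medication]": 9.0, "[start:Medication]": 1.0},
    {"timolol": 9.0, "[start:Drug]": 2.0, "[end:Medication]": 1.0},
    {"timolol": 3.0},
    {"[end:Medication]": 9.0, "[end:Drug]": 3.0},
    {"[end:Medication]": 2.0, "[start:Drug]": 1.0},
]
decoded = constrained_greedy_decode(
    "Medication", toy_grammar, ["timolol", "was", "given"], lambda prefix: script[len(prefix)]
)
print(decoded)
```

Although the toy scores prefer grammar-violating tokens at several steps, the mask restricts the greedy choice to valid continuations, so the output can be parsed back into a template instance.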

Experimental results

In this section, we discuss the setting of our experiments as well as the results of those experiments.

Experimental setting

In our experiments, we use the same dataset as Witte and Cimiano [ 14 ] for type 2 diabetes and glaucoma. The dataset comprises a total of 211 documents for two diseases: type 2 diabetes (104) and glaucoma (107). The 104 type 2 diabetes documents are split into training, validation and test sets of size 68, 16 and 20, respectively. Analogously, the 107 glaucoma documents are split into training, validation and test sets of size 69, 17 and 21, respectively. We use the same fixed train-validation-test split and run separate experiments for those two diseases. Both the extractive and the generative approach were then evaluated using multiple base models, namely allenai/longformer-base-4096 [ 16 ] for the extractive approach and allenai/led-base-16384 [ 16 ] as well as google/flan-t5-base [ 17 ] for both approaches. As the extractive approach requires just an encoder whereas the generative approach needs a decoder due to its seq2seq nature, we compare two encoder-decoder models from which only the encoder is used in the extractive approach. Additionally, we also evaluate an encoder-only model for the extractive approach to ensure the partial usage of the encoder-decoder models does not harm the performance.

For these models and diseases, we then run hyperparameter optimizations using Optuna [ 63 ] with 30 trials each, measuring performance using validation \(F_1\) scores. In each trial, an initial learning rate (between \(10^{-5}\) and \(10^{-3}\) , sampled on a logarithmic scale) and a \(\lambda\) for the lambda learning rate scheduler (between 0.9 and 1.0, sampled on a logarithmic scale, learning rate calculated with \(lr ( epoch ) = \lambda ^ {epoch}\) ) are sampled by Optuna. The batch size is 1 and the number of epochs is 50 in all experiments. Each experiment is run on a single NVIDIA A40 GPU. The best hyperparameters for each disease-approach-model combination are then used to train 10 additional models. Unless stated otherwise, mean and standard deviation in tables refer to the results of these 10 training runs. The means and standard deviations of the test \(F_1\) scores of these 10 trained models are listed in Table 4 for each combination.
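
The effect of the sampled hyperparameters on the learning rate can be illustrated with a short Python sketch. It assumes that \(\lambda ^{epoch}\) acts as a multiplicative factor on the initial learning rate, as is usual for lambda-style schedulers; the function names are chosen for this illustration, and the log-uniform sampler merely mimics what the hyperparameter search does without using Optuna itself.

```python
import math
import random


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniformly distributed between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def learning_rate(initial_lr: float, lam: float, epoch: int) -> float:
    """Learning rate at a given epoch, assuming the factor lambda**epoch scales the initial rate."""
    return initial_lr * lam ** epoch


rng = random.Random(0)
initial_lr = sample_log_uniform(1e-5, 1e-3, rng)  # search range stated in the text
lam = sample_log_uniform(0.9, 1.0, rng)
schedule = [learning_rate(initial_lr, lam, epoch) for epoch in range(50)]  # 50 epochs
print(schedule[0], schedule[-1])
```

With \(\lambda = 0.9\) , the lower end of the search range, the learning rate decays to roughly 0.6% of its initial value by the last of the 50 epochs, whereas \(\lambda = 1.0\) keeps it constant.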

Slot-filler extraction results

In all categories, the extractive approach paired with the flan-t5-base model performs best. In summary, for glaucoma, the extractive approach performs best with model flan-t5-base and a mean test \(F_1\) score of 0.636 ( \(\pm 0.006\) standard deviation across the 10 training runs with the best found hyperparameters of the category). This way, it outperforms the other tested models of the extractive approach as well as all models of the generative approach by 0.02 or more. For type 2 diabetes, the extractive approach performs best as well with model flan-t5-base and a mean \(F_1\) score of 0.547 ( \(\pm 0.006\) standard deviation). This indicates that the extractive approach is superior to the generative approach, although the lead is much smaller for type 2 diabetes than for glaucoma.

Table 5 shows the mean \(F_1\) scores per template on the type 2 diabetes and glaucoma test set. The table shows the values of the best models of each category (w.r.t. validation \(F_1\) score), i.e. the flan-t5-base models in all four cases. The mean \(F_1\) values are calculated for each of the 10 models trained using the best hyperparameters of their respective category. The values in the table correspond to the mean and standard deviation of those mean \(F_1\) scores per template. The generative approach performs better than the extractive one on the Medication templates (0.48 vs. 0.34 and 0.62 vs. 0.53 \(F_1\) score for type 2 diabetes and glaucoma, respectively). On the Population and Outcome template, the results are mixed with one approach performing better for one disease dataset but not for the other. On all six remaining templates, the extractive approach performs better, although with different margins.

Mean \(F_1\) scores per slot are shown in Table 7 in Appendix 2 , again with mean and standard deviation (of the mean \(F_1\) scores) calculated for the 10 models trained using the best hyperparameters of their respective category. The \(F_1\) scores of the different slots range from over 0.9, e.g. PMID or PublicationYear , to below 0.1, e.g. FinalNumPatientsArm or ObservedResult . There are also some noticeable differences between the diseases, with Journal achieving scores of 0.96 and 0.92 for type 2 diabetes in contrast to 0.67 and 0.74 for glaucoma. There are also slots where one approach performs better than the other across both datasets, e.g. DoseUnit (0.77/0.8 generative vs. 0.24/0.6 extractive) and NumberPatientsCT (0.65/0.65 generative vs. 0.93/0.86 extractive).

Joint training on both datasets

In addition to the main experiment described above, we ran another small experiment, training the best-performing generative and extractive model ( flan-t5-base in both cases) with the best-performing respective parameters in 10 trials on the union of the type 2 diabetes and glaucoma training, validation and test datasets, respectively. The resulting models are then again evaluated on the separate datasets for comparability. The resulting mean \(F_1\) scores ( \(\pm \sigma\) ) for the generative approach are 0.556 (± 0.026) for type 2 diabetes and 0.626 (± 0.015) for glaucoma. For the extractive approach, the mean \(F_1\) scores ( \(\pm \sigma\) ) are 0.560 (± 0.007) for type 2 diabetes and 0.644 (± 0.008) for glaucoma. Therefore, the performance increases for both datasets and both approaches compared to the original results trained on the separate datasets. Moreover, the generative approach achieves performance comparable to that of the extractive approach trained on the separate datasets. At the same time, the extractive approach improves further when trained on both datasets jointly.

Considering the relatively small datasets, this might indicate that performance on each disease benefits from similar data in the other dataset. We are therefore optimistic that a single general model (in contrast to the specialized models for each disease described in the main experiment) can be trained with comparable or even better performance on diseases the model has been trained on (i.e., in-distribution data) and acceptable performance on different but similar diseases (i.e., out-of-distribution data). However, another dataset would be necessary to test this hypothesis, so this remains to be investigated in future work.

Inferred template cardinality results

In this section, we evaluate the ability of our models to infer the correct number of instances for each template type. For this, we compare the number of inferred templates to the number of instances in the gold standard by computing the mean absolute deviation. Table 6 shows the mean absolute deviation between the ground truth and predicted template cardinality of the best extractive and generative model on the type 2 diabetes and glaucoma test sets. The mean absolute deviation values are calculated separately for each of the 10 models trained using the best hyperparameters of their respective category. The values in the table are then the mean and standard deviation of those mean absolute deviations across the respective 10 trained models. Additionally, Appendix 5 lists the corresponding mean ground truth (GT) and predicted template cardinalities to allow a judgement of whether a given deviation is high. Note that the templates Publication, ClinicalTrial and Population are not mentioned in these tables as their cardinality is always one.
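As a minimal sketch of the cardinality measure described above, the following computes the mean absolute deviation between gold and predicted instance counts for a single template type. It assumes one count pair per test document; the function name `cardinality_mad` and the example counts are hypothetical and not taken from the study.

```python
def cardinality_mad(gold_counts, predicted_counts):
    """Mean absolute deviation between gold and predicted instance counts.

    Each position corresponds to one document (an assumption of this sketch);
    the values are the numbers of instances of a single template type.
    """
    if len(gold_counts) != len(predicted_counts):
        raise ValueError("gold and predicted counts must align per document")
    if not gold_counts:
        raise ValueError("at least one document is required")
    deviations = [abs(g - p) for g, p in zip(gold_counts, predicted_counts)]
    return sum(deviations) / len(deviations)


# Hypothetical Endpoint counts for four documents (not from the paper).
gold = [2, 3, 1, 4]
predicted = [2, 5, 1, 3]
mad = cardinality_mad(gold, predicted)
print(f"MAD = {mad:.2f}")
```

Because over- and under-generation both contribute positively, a low value requires the predicted count to be close to the gold count in every document, not merely on average.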

On the type 2 diabetes dataset, the extractive approach yields better results than the generative approach in terms of template cardinality prediction for the DiffBetweenGroups , Endpoint and Medication templates, whereas the generative approach yields better results for the Arm , Intervention and Outcome templates. On the glaucoma dataset, the generative approach performs better than the extractive one in terms of cardinality inference on all templates except DiffBetweenGroups (0.39 vs. 0.17) and Endpoint (2.91 vs. 0.35).

The overall slot-filler extraction results of both models in terms of micro \(F_1\) measure indicate that the extractive approach is slightly superior to the generative approach, although the margin is especially small for the type 2 diabetes dataset (cf. Table 4 ). Moreover, the mean \(F_1\) scores per template (Table 5 ) suggest that the extractive approach performs better than the generative one on most templates on both datasets.

However, the full picture is more complex: each approach has areas in which it performs better or worse than the other, for a variety of reasons.

First, it is noticeable that the \(F_1\) scores for glaucoma are, on average, higher than those for type 2 diabetes. Although this trend holds for both approaches, the size of the gap differs. For the generative approach, the performance of the best-performing flan-t5-base model decreases by just 0.045 (around \(7.7\%\) relatively) from glaucoma to type 2 diabetes, and the led-base-16384 version even increases its mean performance.

In contrast, the best-performing extractive version, again flan-t5-base, loses 0.089 (around \(14\%\) relatively) in terms of \(F_1\) performance, which in relative terms is almost twice as much as the generative approach. This may indicate that the extractive approach is better able to exploit certain characteristics which are specific to the glaucoma dataset and which are not present in the type 2 diabetes dataset, whereas the generative approach is more robust against those differences, both in a positive and in a negative way, and may thereby generalize slightly better due to the more complex nature of the seq2seq task. However, it is not clear which properties of the data cause this deviation.
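The relative decreases quoted above can be reproduced from the average \(F_1\) scores of the best models reported in the conclusion (generative: 0.584 on glaucoma and 0.539 on type 2 diabetes; extractive: 0.636 and 0.547). The helper name `relative_drop` is an assumption for illustration, and the glaucoma score is taken as the reference value.

```python
def relative_drop(reference: float, other: float) -> float:
    """Decrease from `reference` to `other`, as a fraction of `reference`."""
    return (reference - other) / reference


# Average F1 of the best models: glaucoma first, then type 2 diabetes.
generative_drop = relative_drop(0.584, 0.539)
extractive_drop = relative_drop(0.636, 0.547)

print(f"generative: {generative_drop:.1%}")
print(f"extractive: {extractive_drop:.1%}")
print(f"ratio: {extractive_drop / generative_drop:.2f}")
```

The ratio of roughly 1.8 between the two relative drops is what the text summarizes as "almost twice as much".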

The differences in robustness and task complexity between the extractive and generative approaches are to some degree also mirrored by their standard deviations. While the standard deviation for the extractive approach is not greater than 0.01, the standard deviation of the generative models is not smaller than 0.025 and reaches up to 0.106 for led-base-16384. It is therefore at least more than double that of the extractive approach.

Moreover, the standard deviation appears to be correlated with the chosen model, with flan-t5-base giving the lowest deviation, followed by longformer-base-4096 (for the extractive approach) and finally led-base-16384, consistently across both datasets.

The different strengths and weaknesses of both approaches become even more apparent when examining the performance broken down by template (Table 5) and, ultimately, by single slot (Table 7 in Appendix 2).

At the level of whole templates, Table 5 shows a partly mixed picture of which approach performs best. In many cases in which the extractive approach performs best, both approaches perform similarly well (e.g., Publication). However, there are also cases such as ClinicalTrial where the margin is larger, and Medication, where the generative approach outperforms the extractive approach by around 0.1, although the standard deviation is also quite high for the generative glaucoma case. In other cases there are large differences between the two datasets, which is also true for the evaluation per slot.

As an example of an unexpected single-slot difference, consider the Journal slot. One would expect the recognition of the Journal slot to be a comparably simple task across both datasets. However, the performance differs greatly between the datasets, although both approaches achieve good scores on this slot. For the type 2 diabetes dataset, the performance is nearly perfect with scores above 0.9. In contrast, the scores for the glaucoma dataset are still good but considerably lower, at around 0.7. The different possible slot fillers are shown in Table 9 in Appendix 4. Looking at the slot fillers, it is not immediately clear why the diabetes case is so much easier for both approaches than the glaucoma case. Both tables have approximately the same number of different entries, and in both cases many journal names are trivial to recognize (containing either Diabetes or Ophthalmol).

However, the distribution of occurrences might partially explain the performance differences here. Although both datasets have a similar number of Journal slot fillers with up to three occurrences, only the type 2 diabetes dataset has Journal slot fillers (even several of them) with a high number of occurrences (roughly more than 8). Therefore, the reason why the Journal slot appears to be so much easier to recognize in the type 2 diabetes dataset might not be the textual form of the slot fillers but rather that fewer slot fillers account for a larger share of all slot occurrences than in the glaucoma dataset. The absolute numbers and differences are still quite small, but this might make it possible to achieve much better scores just by recognizing two or three Journal slot fillers. There may be many more such examples which are not discussed here.
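The effect of a skewed filler distribution can be made concrete with a small sketch that measures which share of all slot occurrences is covered by the \(k\) most frequent fillers. The function name `top_k_coverage`, the placeholder journal names and all counts are hypothetical and only illustrate the argument; they are not the distributions from Table 9.

```python
from collections import Counter


def top_k_coverage(fillers, k):
    """Share of all occurrences accounted for by the k most frequent fillers."""
    counts = Counter(fillers)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Hypothetical distributions with placeholder journal names.
skewed = ["journal_A"] * 10 + ["journal_B"] * 8 + ["journal_C", "journal_D"]
flat = ["journal_A"] * 3 + ["journal_B"] * 3 + ["journal_C"] * 2 + ["journal_D"] * 2

skewed_share = top_k_coverage(skewed, k=2)
flat_share = top_k_coverage(flat, k=2)
print(f"skewed: {skewed_share:.0%}, flat: {flat_share:.0%}")
```

In the skewed case, a model that recognizes only the two dominant fillers already covers most occurrences, whereas the same ability yields far less in the flat case.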

All in all, it is not always clear which properties of the data cause those partial differences in performance. On the one hand, this underlines how much data variance can influence information extraction approaches like the two presented here. On the other hand, it also emphasizes that both approaches can have different strengths and weaknesses and that a flat evaluation considering only a single final performance score does not do justice to the complex nature of the task.

Similarly to the work by Witte and Cimiano [ 14 ], we conduct a case study on a single RCT abstract in which we compare the predicted and ground truth results for one exemplary document out of the type 2 diabetes test dataset. For this case study, we use the same publication as considered by Witte and Cimiano [ 14 ] which is the one by Shankar et al. [ 64 ]. The results of this case study can be found in Table 8 in the Appendix 3 .

Both the extractive and the generative approach succeed in extracting the basic characteristics of the trial which are part of the Publication template, e.g. authors, title and publication year. This is consistent with the results of Table 5 , which indicate that Publication is an especially easy template to extract. Similarly, the ClinicalTrial and Medication instances are, except some small errors, extracted almost perfectly. The template instance for the used Intervention is also extracted without errors by both approaches, which is a little more surprising taking into account the slightly lower score of around 0.6. Moreover, both approaches correctly predict that there are no textual slot fillers of the Arm template in the text.

For the Population template instance, we first encounter moderate differences from the gold standard. Although both approaches manage to extract USA as a slot filler for the Country slot, both fail to extract the second slot filler Australia as well as Ethnicity. The latter is at least in line with the fact that the first gold standard precondition (mentioning the ethnicity of the patients) is not recognized by either approach. For the second Precondition slot filler, both approaches get a part of it but not the full slot filler, with the generative approach recognizing a slightly larger part of the actual slot filler. This is to some degree unexpected, as the mean performance of the extractive approach on the Population templates of the type 2 diabetes dataset is more than twice as high as the score of the generative approach.

For the DiffBetweenGroups template, the extractive approach returns a perfect result in this case, whereas the generative approach misses the \(P <0.001\) slot filler but delivers a duplicate of the \(P = 0.013\) slot filler. The mean results of Table 5 suggest similar performance, which is not the case here.

For the Endpoint template instances, both approaches manage to extract most slot fillers at least partially but show issues grouping them together correctly. The extractive approach puts all of the extracted slot fillers in just two instances, missing most instances of the gold standard. For the generative approach, however, it is the other way around and too many instances (containing some duplicates) are generated. Nevertheless, some of the generated instances are correct and in some cases there is just a part missing. Generally, the performance is rather unsatisfying here but is consistent with the comparably poor mean performance of around 0.4 on the Endpoint template, indicating this is an especially hard template to extract.

However, the situation is even worse for the Outcome template instances, which was to be expected considering the mean performance on the type 2 diabetes dataset of just 0.2 and 0.11 for the generative and extractive approach, respectively. Again, both approaches at least partially recognize most slot fillers, but fail to group them together correctly. Similarly to the Endpoint template instances, the extractive approach generates too few instances whereas the generative approach generates more instances. Nevertheless, those instances are not entirely correct in most cases. This suggests future work has to improve this grouping beyond simple similarity calculations or fully relying on the language model and constrained decoding.

Taken together, the current results, while promising, are not accurate enough to support the fully automatic creation of a systematic review as proposed by Sanchez-Graillet et al. [ 10 ]. However, the proposed approach could considerably reduce the workload for teams extracting key information from a set of publications in the sense proposed by Thomas et al. [ 65 ]. The results, however, would need to be checked manually. While the approach is not yet suited to support the full creation of a high-quality systematic review, it could be used to summarize the existing literature in a cost-effective fashion, allowing researchers to get a first overview of existing clinical evidence, or as a basis for forming hypotheses to be validated further on.

We have presented an extended extractive and a generative approach for extracting structured information from Randomized Controlled Trial abstracts, which can both support clinicians in finding best therapies on the basis of clinical evidence and in creating systematic reviews of the whole body of available clinical evidence. The extractive approach is realized by a two-step architecture which first extracts slot-fillers from the input document, followed by a clustering step which assigns the extracted slot-fillers to template instances. The best models of this approach yield an average \(F_1\) score of 0.547 on type 2 diabetes and 0.636 on glaucoma test sets, respectively. In the generative approach, the structured information given by the template instances is encoded as a linear token sequence which is decoded at inference time by utilizing a context-free grammar for guidance. The best models of the generative approach yield an average \(F_1\) score of 0.539 on type 2 diabetes and 0.584 on glaucoma test sets, respectively.

Future work should investigate whether the lead of the extractive approach persists when the base models of both approaches are scaled up, e.g. by using flan-t5-large, flan-t5-xl or even flan-t5-xxl or other large language models. The benefits of the extractive and generative approaches could also be combined by adding a pointer network to the generative model, and we will investigate whether this improves results. It would also be interesting to test the results in an actual evidence generation and comparison case study to assess whether the approach can indeed support the process of summarizing results from the clinical literature for a particular research question.

Availability of data and materials

The code and datasets generated and/or analysed during the current study are available in the Zenodo repository, https://doi.org/10.5281/zenodo.10419786 [ 66 ].

Abbreviations

BERT: Bidirectional encoder representations from transformers

C-TrO: Clinical trial ontology

CFG: Context-free grammar

CNN: Convolutional neural network

CRF: Conditional random field

EBM: Evidence-based medicine

FLAN: Finetuning language models

GenIE: Generative information extraction

GT: Ground truth

HAC: Hierarchical agglomerative clustering

IE: Information extraction

iff: if and only if

ITC: Intra-template compatibility

LED: Longformer-encoder-decoder

LSTM: Long short-term memory network

MAD: Mean absolute deviation

MAP: Maximum a posteriori probability

PICO: Patient, intervention, comparison, outcomes

RCT: Randomized controlled trial

REBEL: Relation extraction by end-to-end language generation

ReLU: Rectified linear unit

SF: Slot-filler

T5: Text-to-text transfer transformer

TI: Template instance

References

Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Med. 2010;7(9):e1000326. https://doi.org/10.1371/journal.pmed.1000326 .

Sackett DL, Rosenberg WM, Gray JM, Haynes RB, Richardson WS. Evidence based medicine. BMJ. 1996;313(7050):170.

Shekelle PG, Ortiz E, Rhodes S, Morton SC, Eccles MP, Grimshaw JM, et al. Validity of the Agency for Healthcare Research and Quality clinical practice guidelines: how quickly do guidelines become outdated? JAMA. 2001;286(12):1461–7.

Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med. 2007;147(4):224–33.

Beller EM, Chen JKH, Wang ULH, Glasziou PP. Are systematic reviews up-to-date at the time of publication? Syst Rev. 2013;2:36. https://doi.org/10.1186/2046-4053-2-36 .

Koch G. No improvement–still less than half of the Cochrane reviews are up to date. In: XIV Cochrane Colloquium. Dublin; 2006.

Tsafnat G, Glasziou P, Choong MK, et al. Systematic review automation technologies. Syst Rev. 2014;3:74. https://doi.org/10.1186/2046-4053-3-74 .

Beller E, Clark J, Tsafnat G, et al. Making progress with the automation of systematic reviews: principles of the International Collaboration for the Automation of Systematic Reviews (ICASR). Syst Rev. 2018;7:77. https://doi.org/10.1186/s13643-018-0740-7 .

O’Connor AM, Tsafnat G, Gilbert SB, Thayer KA, Shemilt I, Thomas J, et al. Still moving toward automation of the systematic review process: a summary of discussions at the third meeting of the International Collaboration for Automation of Systematic Reviews (ICASR). Syst Rev. 2019;8:57. https://doi.org/10.1186/s13643-019-0975-y .

Sanchez-Graillet O, Witte C, Grimm F, Grautoff S, Ell B, Cimiano P. Synthesizing evidence from clinical trials with dynamic interactive argument trees. J Biomed Semant. 2022;13(1):16. https://doi.org/10.1186/s13326-022-00270-8 .

Boudin F, Nie JY, Bartlett JC, Grad R, Pluye P, Dawes M. Combining classifiers for robust PICO element detection. BMC Med Inform Decis Mak. 2010;10(1):1–6.

Jin D, Szolovits P. PICO element detection in medical text via long short-term memory neural networks. In: Proceedings of the BioNLP 2018 workshop. Melbourne: Association for Computational Linguistics; 2018. p. 67–75. https://aclanthology.org/W18-2308 . https://doi.org/10.18653/v1/W18-2308 .

Trenta A, Hunter A, Riedel S. Extraction of evidence tables from abstracts of randomized clinical trials using a maximum entropy classifier and global constraints. 2015. arXiv preprint arXiv:1509.05209 .

Witte C, Cimiano P. Intra-Template Entity Compatibility based Slot-Filling for Clinical Trial Information Extraction. In: Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin: Association for Computational Linguistics; 2022. p. 178–192.  https://aclanthology.org/2022.bionlp-1.18 . https://doi.org/10.18653/v1/2022.bionlp-1.18 .

Sanchez-Graillet O, Cimiano P, Witte C, Ell B. C-TrO: An Ontology for Summarization and Aggregation of the Level of Evidence in Clinical Trials. In: Proc. of the 5th Joint Ontology Workshops (JOWO): Ontologies and Data in the Life Sciences. 2019.  https://ceur-ws.org/Vol-2518/paper-ODLS7.pdf .

Beltagy I, Peters ME, Cohan A. Longformer: The long-document transformer. 2020. arXiv preprint arXiv:2004.05150 .

Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, et al. Scaling Instruction-Finetuned Language Models. CoRR. 2022. https://doi.org/10.48550/ARXIV.2210.11416 . arXiv:2210.11416

Cabot PLH, Navigli R. REBEL: Relation extraction by end-to-end language generation. In: Findings of the Association for Computational Linguistics: EMNLP 2021. Punta Cana: Association for Computational Linguistics; 2021. p. 2370–2381. https://aclanthology.org/2021.findings-emnlp.204 . https://doi.org/10.18653/v1/2021.findings-emnlp.204 .

Josifoski M, De Cao N, Peyrard M, West R. GenIE: generative information extraction. 2021. arXiv preprint arXiv:2112.08340 .

Lu Y, Lin H, Xu J, Han X, Tang J, Li A, et al. Text2Event: Controllable Sequence-to-Structure Generation for End-to-end Event Extraction. CoRR. 2021. arXiv:2106.09232 .

Sanchez-Graillet O, Witte C, Grimm F, Cimiano P. An annotated corpus of clinical trial publications supporting schema-based relational information extraction. J Biomed Semant. 2021. Under Review.

Hsu I, Huang K, Boschee E, Miller S, Natarajan P, Chang K, et al. Event Extraction as Natural Language Generation. CoRR. 2021. arXiv:2108.12724 .

Yang H, Sui D, Chen Y, Liu K, Zhao J, Wang T. Document-Level Event Extraction via Parallel Prediction Networks. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics; 2021. p. 6298–6308. https://doi.org/10.18653/v1/2021.acl-long.492 .

Giorgi J, Bader GD, Wang B. A sequence-to-sequence approach for document-level relation extraction. BioNLP 2022@ ACL 2022. Dublin: Association for Computational Linguistics; 2022. p. 10–25.  https://aclanthology.org/2022.bionlp-1.2 . https://doi.org/10.18653/v1/2022.bionlp-1.2 .

Du X, Rush A, Cardie C. GRIT: Generative Role-filler Transformers for Document-level Event Entity Extraction. In: Merlo P, Tiedemann J, Tsarfaty R, editors. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Online: Association for Computational Linguistics; 2021. p. 634–644. https://doi.org/10.18653/v1/2021.eacl-main.52 . https://aclanthology.org/2021.eacl-main.52 .

Du X, Rush A, Cardie C. Template Filling with Generative Transformers. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Online: Association for Computational Linguistics; 2021. p. 909–914. https://doi.org/10.18653/v1/2021.naacl-main.70 .

Wang XD, Weber L, Leser U. Biomedical Event Extraction as Multi-turn Question Answering. In: Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis. Online: Association for Computational Linguistics; 2020. p. 88–96. https://doi.org/10.18653/v1/2020.louhi-1.10 .

Ramponi A, Van Der Goot R, Lombardo R, Plank B. Biomedical Event Extraction as Sequence Labeling. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics; 2020. p. 5357–5367. https://doi.org/10.18653/v1/2020.emnlp-main.431 .

Zhu L, Zheng H. Biomedical Event Extraction with a Novel Combination Strategy Based on Hybrid Deep Neural Networks. BMC Bioinformatics. 2020;21(1):47. https://doi.org/10.1186/s12859-020-3376-2 .

Huang KH, Yang M, Peng N. Biomedical Event Extraction with Hierarchical Knowledge Graphs. In: Cohn T, He Y, Liu Y, editors. Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics; 2020. p. 1277–1285. https://doi.org/10.18653/v1/2020.findings-emnlp.114 . https://aclanthology.org/2020.findings-emnlp.114 .

Trieu HL, Tran TT, Duong KNA, Nguyen A, Miwa M, Ananiadou S. DeepEventMine: End-to-End Neural Nested Event Extraction from Biomedical Texts. Bioinformatics. 2020;36(19):4910–7. https://doi.org/10.1093/bioinformatics/btaa540 .

Jiang Y, Kavuluru R. End-to-End \(n\) -ary Relation Extraction for Combination Drug Therapies. 2023. https://doi.org/10.48550/arXiv.2303.16886 . arXiv:2303.16886 .

Kim Y, Meystre SM. Ensemble Method-Based Extraction of Medication and Related Information from Clinical Texts. J Am Med Inform Assoc. 2020;27(1):31–8. https://doi.org/10.1093/jamia/ocz100 .

Stylianou N, Kosmoliaptsis P, Vlahavas I. Improved Biomedical Entity Recognition via Longer Context Modeling. In: Maglogiannis I, Macintyre J, Iliadis L, editors. Artificial Intelligence Applications and Innovations. vol. 627. Cham: Springer International Publishing; 2021. p. 45–56. https://doi.org/10.1007/978-3-030-79150-6_4 .

Farnsworth S, Gurdin G, Vargas J, Mulyar A, Lewinski N, McInnes BT. Extracting Experimental Parameter Entities from Scientific Articles. J Biomed Inform. 2022;126:103970. https://doi.org/10.1016/j.jbi.2021.103970 .

Tseo Y, Salkola MI, Mohamed A, Kumar A, Abnousi F. Information Extraction of Clinical Trial Eligibility Criteria. 2020. https://doi.org/10.48550/arXiv.2006.07296 . arXiv:2006.07296 .

Abaho M, Bollegala D, Williamson PR, Dodd S. Assessment of contextualised representations in detecting outcome phrases in clinical trials. CoRR. 2022. https://doi.org/10.48550/ARXIV.2203.03547 . arXiv:2203.03547 .

Abaho M, Bollegala D, Williamson P, Dodd S. Position-based Prompting for Health Outcome Generation. In: Demner-Fushman D, Cohen KB, Ananiadou S, Tsujii J, editors. Proceedings of the 21st Workshop on Biomedical Language Processing. Dublin: Association for Computational Linguistics; 2022. p. 26–36. https://doi.org/10.18653/v1/2022.bionlp-1.3 . https://aclanthology.org/2022.bionlp-1.3 .

Abaho M, Bollegala D, Williamson P, Dodd S. Detect and Classify – Joint Span Detection and Classification for Health Outcomes. In: Moens MF, Huang X, Specia L, Yih SWt, editors. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana: Association for Computational Linguistics; 2021. p. 8709–8721. https://doi.org/10.18653/v1/2021.emnlp-main.686 . https://aclanthology.org/2021.emnlp-main.686 .

Ganguly D, Gleize M, Hou Y, Jochim C, Bonin F, Pascale A, et al. Outcome Prediction from Behaviour Change Intervention Evaluations using a Combination of Node and Word Embedding. AMIA Ann Symp Proc. 2021;2021:486–95. Published online 2022 Feb 21.

Papanikolaou Y, Staib M, Grace JJ, Bennett F. Slot Filling for Biomedical Information Extraction. In: Demner-Fushman D, Cohen KB, Ananiadou S, Tsujii J, editors. Proceedings of the 21st Workshop on Biomedical Language Processing, BioNLP@ACL 2022, Dublin, Ireland, May 26, 2022. Association for Computational Linguistics; 2022. p. 82–90. https://doi.org/10.18653/v1/2022.bionlp-1.7 .

Dhrangadhariya A, Müller H. Not so Weak PICO: Leveraging Weak Supervision for Participants, Interventions, and Outcomes Recognition for Systematic Review Automation. JAMIA Open. 2023;6(1):ooac107. https://doi.org/10.1093/jamiaopen/ooac107 .

Nye BE, DeYoung J, Lehman E, Nenkova A, Marshall IJ, Wallace BC. Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations. CoRR. 2020. arXiv:2010.03550 .

Wallace BC, Kuiper J, Sharma A, Zhu MB, Marshall IJ. Extracting PICO Sentences from Clinical Trial Reports using Supervised Distant Supervision. J Mach Learn Res. 2016;17:132:1–25. http://jmlr.org/papers/v17/15-404.html .

Liu S, Sun Y, Li B, Wang W, Bourgeois FT, Dunn AG. Sent2Span: Span Detection for PICO Extraction in the Biomedical Text without Span Annotations. In: Moens MF, Huang X, Specia L, Yih SWt, editors. Findings of the Association for Computational Linguistics: EMNLP 2021. Punta Cana: Association for Computational Linguistics; 2021. p. 1705–1715. https://doi.org/10.18653/v1/2021.findings-emnlp.147 . https://aclanthology.org/2021.findings-emnlp.147 .

Jin D, Szolovits P. Advancing PICO element detection in biomedical text via deep neural networks. Bioinform. 2020;36(12):3856–62. https://doi.org/10.1093/bioinformatics/btaa256 .

Kang T, Zou S, Weng C. Pretraining to Recognize PICO Elements from Randomized Controlled Trial Literature. In: Ohno-Machado L, Séroussi B, editors. MEDINFO 2019: Health and Wellbeing e-Networks for All - Proceedings of the 17th World Congress on Medical and Health Informatics, Lyon, France, 25-30 August 2019. vol. 264 of Studies in Health Technology and Informatics. IOS Press; 2019. p. 188–192. https://doi.org/10.3233/SHTI190209 .

Chabou S, Iglewski M. Combination of Conditional Random Field with a Rule Based Method in the Extraction of PICO Elements. BMC Med Inform Decis Mak. 2018;18(1):128. https://doi.org/10.1186/s12911-018-0699-2 .

Yuan X, Xiaoli L, Shilei L, Qinwen S, Ke L. Extracting PICO Elements From RCT Abstracts Using 1-2gram Analysis And Multitask Classification. In: Proceedings of the Third International Conference on Medical and Health Informatics 2019 - ICMHI 2019. Xiamen: ACM Press; 2019. p. 194–199. https://doi.org/10.1145/3340037.3340043 .

Stylianou N, Razis G, Goulis DG, Vlahavas I. EBM+: Advancing Evidence-Based Medicine via Two Level Automatic Identification of Populations, Interventions, Outcomes in Medical Literature. Artif Intell Med. 2020;108: 101949. https://doi.org/10.1016/j.artmed.2020.101949 .

Jin D, Szolovits P. PICO Element Detection in Medical Text via Long Short-Term Memory Neural Networks. In: Proceedings of the BioNLP 2018 Workshop. Melbourne: Association for Computational Linguistics; 2018. p. 67–75. https://doi.org/10.18653/v1/W18-2308 .

Afzal M, Alam F, Malik KM, Malik GM. Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation. J Med Internet Res. 2020;22(10): e19810. https://doi.org/10.2196/19810 .

Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of NAACL-HLT. Minneapolis: Association for Computational Linguistics; 2019. p. 4171–4186.  https://aclanthology.org/N19-1423 . https://doi.org/10.18653/v1/N19-1423 .

Schmidt L, Weeds J, Higgins JPT. Data Mining in Clinical Trial Text: Transformers for Classification and Question Answering Tasks. In: Cabitza F, Fred ALN, Gamboa H, editors. Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2020) - Volume 5: HEALTHINF, Valletta, Malta, February 24-26, 2020. SCITEPRESS. 2020. p. 83–94. https://doi.org/10.5220/0008945700830094 .

Zhang T, Yu Y, Mei J, Tang Z, Zhang X, Li S. Unlocking the Power of Deep PICO Extraction: Step-wise Medical NER Identification. CoRR. 2020. arXiv:2005.06601 .

Whitton J, Hunter A. Automated tabulation of clinical trial results: A joint entity and relation extraction approach with transformer-based language representations. Artif Intell Med. 2023;144:102661. https://doi.org/10.1016/j.artmed.2023.102661 .

Dhrangadhariya A, Aguilar G, Solorio T, Hilfiker R, Müller H. End-to-End Fine-Grained Neural Entity Recognition of Patients, Interventions, Outcomes. In: Candan KS, Ionescu B, Goeuriot L, Larsen B, Müller H, Joly A, et al., editors. Experimental IR Meets Multilinguality, Multimodality, and Interaction. vol. 12880. Cham: Springer International Publishing; 2021. p. 65–77. https://doi.org/10.1007/978-3-030-85251-1_6 .

Text chunking using transformation-based learning. In: Natural language processing using very large corpora. Springer. p. 157–176.

Agarap AF. Deep learning using rectified linear units (relu). 2018. arXiv preprint arXiv:1803.08375 .

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30:5998-6008.  https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html .

Chen MX, Firat O, Bapna A, Johnson M, Macherey W, Foster G, et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne: Association for Computational Linguistics; 2018. p. 76–86.  https://aclanthology.org/P18-1008 . https://doi.org/10.18653/v1/P18-1008 .

Shi T, Keneshloo Y, Ramakrishnan N, Reddy CK. Neural abstractive text summarization with sequence-to-sequence models. ACM Trans Data Sci. 2021;2(1):1–37.

Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery; 2019.  https://doi.org/10.1145/3292500.3330701 .

Shankar RR, Bao Y, Han P, Hu J, Ma J, Peng Y, et al. Sitagliptin added to stable insulin therapy with or without metformin in Chinese patients with type 2 diabetes. J Diabetes Investig. 2017;8(3):321–9.

Thomas J, Noel-Storr A, Marshall I, et al. Living systematic reviews: 2. Combining human and machine effort. J Clin Epidemiol. 2017;91:31–7. https://doi.org/10.1016/j.jclinepi.2017.08.011 .

Schmidt DM, Witte C, Cimiano P. ag-sc/Clinical-Trial-Information-Extraction: Initial release. Zenodo; 2023. https://doi.org/10.5281/zenodo.10419786 .

Acknowledgements

Not applicable.

Open Access funding enabled and organized by Projekt DEAL. The research of David M. Schmidt is funded by the Ministry of Culture and Science of the State of North Rhine-Westphalia under the grant no NW21-059A (SAIL). Christian Witte has been funded by a grant from the Federal Ministry of Health (BMG) as part of the KINBIOTICS project. Philipp Cimiano acknowledges funding from the Transregio 318 “Constructing Explainability” (Projects B1 and C5). We acknowledge the financial support of the German Research Foundation (DFG) and the Open Access Publication Fund of Bielefeld University for the article processing charge.

Author information

Christian Witte and David M. Schmidt contributed equally to this work.

Authors and Affiliations

Semantic Computing Group, Center for Cognitive Interaction Technology, Bielefeld University, Inspiration 1, Bielefeld, 33619, NRW, Germany

Christian Witte, David M. Schmidt & Philipp Cimiano


Contributions

CW contributed to the introduction and model section as well as worked on both approaches and their implementation. DS finalized many parts of the implementation of both the extractive and the generative approach and ran the experiments as well as adapted the original draft to the current state of the code. Moreover, DS was the main author of the experimental results, discussion and related work section and decoding subsection. CW and DS equally contributed to this paper. PC supervised all of the above steps. All authors read and approved the final manuscript.

Corresponding author

Correspondence to David M. Schmidt .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1 Grammar definition


Listing 1 Grammar of the data model used for decoding in our experiments

Appendix 2 Slot evaluation

Appendix 3 case study, appendix 4 journal slot fillers, appendix 5 template cardinalities, appendix 6 linearized publication.


Listing 2 Linearization of Glaucoma Publication Template Instance

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Witte, C., Schmidt, D.M. & Cimiano, P. Comparing generative and extractive approaches to information extraction from abstracts describing randomized clinical trials. J Biomed Semant 15 , 3 (2024). https://doi.org/10.1186/s13326-024-00305-2


Received : 05 January 2024

Accepted : 05 April 2024

Published : 23 April 2024

DOI : https://doi.org/10.1186/s13326-024-00305-2


  • Information Extraction
  • Deep Learning
  • Clinical Trials
  • Evidence-Based Medicine

Journal of Biomedical Semantics

ISSN: 2041-1480

what is a clinical case study


Identifying the clinical and histopathological characteristics of amelanotic melanoma: a case series


Aroon Sohail, Svetlana Kavaklieva, Identifying the clinical and histopathological characteristics of amelanotic melanoma: a case series, Oxford Medical Case Reports , Volume 2024, Issue 4, April 2024, omae029, https://doi.org/10.1093/omcr/omae029


Amelanotic melanoma (AM) is a subtype of melanoma in which the lesion demonstrates no pigmentation. This can lead to delays in referral, and studies show a higher mortality rate. To determine the characteristics of AM lesions, we conducted a retrospective analysis of patients with confirmed AM. Of the 16 patients, 68.75% were male and the mean age at diagnosis was 78 years. The most common location for AM was the head (37.5%), which also demonstrated a higher mitotic rate (10.67/mm²) compared to the average (7.31/mm²). More than half of the lesions (56%) had been present for more than 1 year. With a misdiagnosis rate of 87.5%, the likelihood of delays was evident. There was no unifying feature on clinical assessment; however, conspicuous vessel findings were noted in 62.5% of lesions. We have demonstrated that AM continues to be a missed diagnosis, with the potential for a more lethal cancer to form.

Amelanotic melanoma (AM) is a subtype of melanoma in which the lesion phenotype demonstrates little to no pigmentation. It accounts for less than 2% of melanoma diagnoses, but is associated with a higher mortality rate [1, 2]. Amelanotic melanoma remains a diagnostic challenge as it mimics various benign and malignant conditions. A misdiagnosis rate of up to 89% has been reported [3]. It is often detected at later stages of disease progression, potentially secondary to delays in referral due to misdiagnosis, with a consequently more lethal disease developing. Thomas et al. conducted an international population-based study which highlighted several key characteristics of AM [4]. The study identified that 8% of melanomas were histopathologically amelanotic (275 out of 3467). Amelanotic melanoma generally had a higher tumour stage at diagnosis than pigmented melanoma. Furthermore, AM had a greater hazard of death than pigmented melanoma, with a hazard ratio of 2.0. The study concluded that survival after diagnosis of AM was poorer than for pigmented melanoma because of a more advanced stage at diagnosis, likely reflecting difficulties and delays in diagnosis. There are few reports in the current literature regarding AM, which limits our understanding of its associated characteristics. Herein we present a case series of 16 patients with AM, reporting the clinical and histological features of the disease. The objective of this report is to increase awareness of AM by improving understanding of its clinical and histopathological characteristics.

In a 10-year period at a single centre in the Northwest of England, patients with a histological diagnosis of AM were enrolled and data collated retrospectively. Consent for research was obtained from each patient at biopsy.

Table 1 summarises demographics, clinical presentation and histological characteristics of the lesions. A total of 16 patients were diagnosed with AM. There were 11 men and 5 women with a mean age at diagnosis of 78 years (male mean age 77 years, female mean age 79 years). Amelanotic melanoma most commonly manifested on the head (37.5%), followed by the legs (25%), back (18.8%), arms (12.5%) and neck (6.25%). The average size of the lesion at diagnosis was 13.88 mm (minimum size 5 mm, maximum size 35 mm). The average Breslow thickness was 3.16 mm (minimum thickness 0.2 mm, maximum thickness 5.7 mm) and the average mitotic rate was 7.31/mm² (minimum rate 0/mm², maximum rate 23/mm²). Interestingly, the average Breslow thickness of lesions on the back (4.07 mm) was notably higher than the overall average. In contrast, the average mitotic rate was highest in lesions of the head (10.67/mm²) and leg (9/mm²). The most common histological subtype was nodular (62.5%), followed by superficial spreading (18.75%), lentigo maligna melanoma (6.25%), desmoplastic (6.25%) and balloon (6.25%). Desmoplastic and lentigo maligna melanoma subtypes both had a notably higher Breslow thickness than the average, at 5.5 mm.

Table 1. Patient demographics and summary of AM features (SSM: superficial spreading melanoma; LMM: lentigo maligna melanoma; NMSC: non-melanoma skin cancer)
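The percentages reported for lesion site and histological subtype can be cross-checked against the cohort size of 16. The following is a minimal sketch rather than part of the original analysis: the per-category counts are inferred by rounding each reported percentage to the nearest whole patient, and the names `implied_count`, `site_percent` and `subtype_percent` are illustrative.

```python
N_PATIENTS = 16  # cohort size reported in the series

# Percentages exactly as reported in the text.
site_percent = {"head": 37.5, "legs": 25.0, "back": 18.8, "arms": 12.5, "neck": 6.25}
subtype_percent = {
    "nodular": 62.5,
    "superficial spreading": 18.75,
    "lentigo maligna melanoma": 6.25,
    "desmoplastic": 6.25,
    "balloon": 6.25,
}


def implied_count(percent, n=N_PATIENTS):
    """Return the whole-number patient count closest to `percent` of `n`.

    This is an inference from the published percentage, not a reported value.
    """
    return round(percent * n / 100)


site_counts = {name: implied_count(p) for name, p in site_percent.items()}
subtype_counts = {name: implied_count(p) for name, p in subtype_percent.items()}

# Each breakdown should account for every patient exactly once.
print(site_counts, sum(site_counts.values()))
print(subtype_counts, sum(subtype_counts.values()))
```

Under this rounding assumption both breakdowns sum to 16, which is consistent with each patient contributing a single lesion site and a single subtype.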

Table 2 summarises the clinical findings and dermoscopic descriptions during clinical examination. In our series, the clinical misdiagnosis rate was 87.5%, with squamous cell carcinoma (SCC) and basal cell carcinoma (BCC) the most common misdiagnoses, accounting for 9 of 16 cases. Lesion onset was almost evenly split, with 56% of lesions reported as longstanding (>12 months) and 44% as recent (<12 months). Lesions were noted as non-ulcerated (75%), erythematous (50%) or yellow plaque (12.5%); however, there was variation amongst the lesions with no unifying common feature. Dermoscopic findings were variable, but conspicuous vessels were noted in 62.5% of lesions. The most common vessel patterns were linear and irregular vessels, each identified in 18.75% of cases, with 12.5% of cases showing arborising vessels.

Table 2. Summary of AM clinical and dermoscopic findings (BCC: basal cell carcinoma; SCC: squamous cell carcinoma; AM: amelanotic melanoma)

Our case series demonstrates that almost all patients with AM were elderly (>70 years old). This is in keeping with the general statistics of melanoma incidence, which increases with age, with the highest rates in the 85–89 age group for both males and females [5]. Furthermore, our cohort was predominantly male, which also follows the general trend in melanoma incidence, where rates are higher in males in the older age groups [5].

The most common site in our series was the head, an exposed area. This is contrary to the current statistics for melanoma, where the most common location is the trunk [5]. According to the literature, melanomas which develop on the trunk occur more often in the fifth to sixth decades of life, whereas melanomas that develop in body regions with high ultraviolet (UV) exposure, like the head and neck, occur more commonly in the eighth decade [3]. Given the older age and mainly exposed sites of involvement, it is most likely that chronic rather than intermittent UV exposure is leading to the development of AM.

The overall Breslow thickness was found to be high, which is in accordance with previous reports and likely to be related to a delay in diagnosis. Furthermore, a higher mitotic rate was noted in AM on the head, which may suggest a more aggressive nature of AM in exposed areas. The most common clinical misdiagnoses of AM were BCC and SCC, in agreement with previous reports [6].

Dermoscopic characteristics predominantly consisted of vascular abnormalities. Dermoscopy is a vital component of detecting abnormal features which can suggest AM. Vessel analysis is recommended for lesions which lack pigmentation to identify suspicious lesions, with the 5 + 2 list used to guide vessel analysis [7]. Vascular patterns which raise the suspicion of melanoma include irregular dot, linear irregular, arborising and polymorphic vessels. These patterns have been highlighted in a larger cohort study by Paolino et al., in which there was a high prevalence of linear looped vessels (58.8%), linear irregular vessels (50.0%) and arborising vessels (47.2%) [8]. Considering that 62.5% of the AM lesions in our series had vessel features in accordance with the aforementioned report, we would recommend that any lesion suspicious for AM should undergo dermoscopic assessment with specific focus on identifying these vessel features [9].

We recognise the limitations of our study, principally the small sample size and the lack of follow-up. Furthermore, dermoscopic data were often incomplete and lacking in detail, making the analysis of clinical assessment more difficult. Regardless, we have demonstrated that AM can have a significant misdiagnosis rate with the possibility for a more aggressive cancer to form, potentially leading to unfavourable patient outcomes. We have also demonstrated key features of suspicious lesions which require assessment, notably abnormal vessels. Practitioners in primary and secondary care should be aware of and vigilant in identifying AM and arranging further appropriate investigation.

Aroon Sohail (Investigation, writing—original draft, corresponding author) and Svetlana Kavaklieva (Supervision, writing—review and editing).

Authors declare that they have no competing interests.

The authors received no funding for this work.

Ethical approval was provided/waived by the authors' institution.

Informed consent was obtained from each patient at the time of biopsy.

Dr Aroon Sohail.

Not commissioned, externally peer reviewed.

1. Moreau JF, Weissfeld JL, Ferris LK. Characteristics and survival of patients with invasive amelanotic melanoma in the USA. Melanoma Res 2013 [cited 2023 Jan 16];23:408–13. Available from: https://journals.lww.com/melanomaresearch/Fulltext/2013/10000/Characteristics_and_survival_of_patients_with.10.aspx

2. Detrixhe A, Libon F, Mansuy M, Nikkels-Tassoudji N, Rorive A, Arrese JE, et al. Melanoma masquerading as nonmelanocytic lesions. Melanoma Res 2016 [cited 2023 Jan 16];26:631–4. Available from: https://journals.lww.com/melanomaresearch/Fulltext/2016/12000/Melanoma_masquerading_as_nonmelanocytic_lesions.12.aspx

3. Gong HZ, Zheng HY, Li J. Amelanotic melanoma. Melanoma Res 2019 [cited 2023 Jan 16];29:221–30. Available from: https://journals.lww.com/melanomaresearch/Fulltext/2019/06000/Amelanotic_melanoma.1.aspx

4. Thomas NE, Kricker A, Waxweiler WT, Dillon PM, Busam KJ, From L, et al. Comparison of clinicopathologic features and survival of histopathologically amelanotic and pigmented melanomas: a population-based study. JAMA Dermatol 2014;150:1306–14.

5. Cancer Research UK. Melanoma skin cancer incidence statistics [Internet]. [cited 2023 Jan 16]. Available from: https://www.cancerresearchuk.org/health-professional/cancer-statistics/statistics-by-cancer-type/melanoma-skin-cancer/incidence#heading-One

6. McClain SE, Mayo KB, Shada AL, Smolkin ME, Patterson JW, Slingluff CL. Amelanotic melanomas presenting as red skin lesions: a diagnostic challenge with potentially lethal consequences. Int J Dermatol 2012 [cited 2023 Jan 16];51:420–6. Available from: /pmc/articles/PMC4465919/

7. Kreusch J, Koch F. Incident light microscopic characterization of vascular patterns in skin tumors. Hautarzt 1996 [cited 2023 Jan 16];47:264–72. Available from: https://pubmed.ncbi.nlm.nih.gov/8655309/

8. Paolino G, Bearzi P, Pampena R, Longo C, Frascione P, Rizzo N, et al. Clinicopathological and dermoscopic features of amelanotic and hypomelanotic melanoma: a retrospective multicentric study. Int J Dermatol 2020 [cited 2024 Jan 20];59:1371–80. Available from: https://onlinelibrary.wiley.com/doi/full/10.1111/ijd.15064

9. Dawood S, Altayeb A, Atwan A, Mills C. Dermoscopic features of amelanotic and hypomelanotic melanomas: a review of 49 cases. Dermatol Pract Concept 2022 [cited 2023 Jan 28];12:e2022060. Available from: /pmc/articles/PMC9116512/


  • Online ISSN 2053-8855
  • Copyright © 2024 Oxford University Press

  • Case report
  • Open access
  • Published: 23 April 2024

Genetic exploration of Dravet syndrome: two case report

  • Agung Triono 1 ,
  • Elisabeth Siti Herini   ORCID: orcid.org/0000-0003-2571-8310 1 &

Journal of Medical Case Reports volume 18, Article number: 215 (2024)


Dravet syndrome is an infantile-onset developmental and epileptic encephalopathy (DEE) characterized by drug resistance, intractable seizures, and developmental comorbidities. This article focuses on manifestations in two Indonesian children of Javanese ethnicity with Dravet syndrome and an SCN1A gene mutation, and presents genetic analysis findings obtained using next-generation sequencing.

Case presentation

We present a case series involving two Indonesian children of Javanese ethnicity who had their first febrile seizure at the age of 3 months, triggered after immunization. Both patients had global developmental delay and intractable seizures. We observed distinct genetic findings in the two cases. The first patient had a heterozygous deletion spanning three genes (TTC21B, SCN1A, and SCN9A). In our second patient, a previously unreported mutation was discovered at the canonical splice site upstream of exon 24 of the SCN1A gene. Our patients' outcomes improved after therapeutic evaluation based on the mutation findings. When comparing clinical manifestations in our first and second patients, we found that the more severe the genetic mutation discovered, the more severe the patient's clinical manifestations.

These findings emphasize the importance of comprehensive genetic testing beyond SCN1A , providing valuable insights for personalized management and tailored therapeutic interventions in patients with Dravet syndrome. Our study underscores the potential of next-generation sequencing in advancing genotype–phenotype correlations and enhancing diagnostic precision for effective disease management.


Dravet syndrome (DS), previously known as severe myoclonic epilepsy of infancy (SMEI), is an infantile-onset developmental and epileptic encephalopathy (DEE) characterized by drug resistance, intractable seizures, and comorbidities including intellectual disability, behavioral problems, sleep disturbances, gait disturbances, and an increased risk of sudden unexpected death in epilepsy [1, 2]. The incidence of DS is approximately 1 in every 15,700 births [3]. The first symptom of DS is seizures in the first year of life, followed by developmental delay [1]. This first seizure is either generalized tonic–clonic or focal (occasionally hemiclonic) clonic, and in more than half of the cases, it is a febrile seizure, making it difficult to distinguish from a self-limiting febrile seizure. Infection, hot environment, exhaust, sunlight, or exercise can initiate an attack of DS [4, 5]. Approximately 80% of patients with DS carry a pathogenic variant of the sodium channel alpha 1 subunit (SCN1A) gene resulting in haploinsufficiency of Nav1.1, the alpha-1 subunit of the sodium channel. PCDH19, SCN2A, SCN8A, SCN1B, GABRA1, GABRB3, GABRG2, KCNA2, CHD2, CPLX1, HCN1A, and STXBP1 variants may also be involved in DS or DS-like phenotypes. Accordingly, genetic testing is required to identify other genes that play a role in the DS phenotype and to expand genotype–phenotype correlations in DS to enhance the future management of this disease [6]. In the last decade, next-generation sequencing (NGS) technology has made it possible to analyze a set of genes (targeted panel sequencing), an exome [whole exome sequencing (WES)], or a genome [whole genome sequencing (WGS)] in a single sequencing process, making it possible to diagnose rare diseases such as early childhood epilepsy [7]. Identification of the genetic basis of DS can provide additional information regarding pathophysiology, prognosis, and individual drug therapy options according to the patient's condition.

We present a case series involving two children, one aged 11 years and 2 months, and the other aged 1 year and 4 months. Both children were diagnosed with DS, exhibiting intractable seizures, global developmental delay, and seizures triggered by postimmunization fever. Despite displaying similar symptoms, the two individuals carry different genetic variants affecting the SCN1A gene, including a possibly novel mutation in DS. We also discuss the main clinical characteristics, treatment course, and management of DS at tertiary referral hospitals in Indonesia.

A boy of Javanese ethnicity aged 11 years and 2 months with uncontrollable seizures regularly visits our hospital. The patient had his first seizure at the age of 3 months with a duration of 15 min, and it was triggered after receiving diphtheria–pertussis–tetanus (DPT) immunization, which was accompanied by fever. The patient has about six to seven seizures per day, each lasting 1 min, in the form of generalized tonic–clonic and absence seizures. He was the first child of nonconsanguineous healthy parents with normal prenatal and birth history. He has a younger sister with normal development. There is no history of family members with febrile seizure. The patient was born at 40 weeks of gestation, with a birth weight of 4000 g, length of 52 cm, and head circumference of 33 cm. The patient is currently experiencing global developmental delay and is still in kindergarten. He had learning difficulties and was unable to speak words at an age-appropriate level. He had delayed motor development and was unable to perform age-appropriate motor activities. Head circumference was 46.5 cm (microcephaly). There were no signs of meningeal irritation and no Babinski response. The motor examination revealed no increased tone in the upper and lower limbs. Other systemic examinations revealed no abnormalities. Interictal electroencephalography (EEG) showed diffuse epileptiform irritative abnormality on a normal background (Fig. 1). Magnetic resonance imaging (MRI) of the brain showed cerebral atrophy, bilateral frontal subarachnoid enlargement, bilateral occipital lobe polymicrogyria, and a neuroglial cyst in the right temporal lobe (Fig. 2). Because DS was suspected, genetic testing was recommended.

Fig. 1 Electroencephalography (EEG) shows diffuse epileptiform irritative abnormality on a normal background

Fig. 2 Axial brain magnetic resonance imaging shows cerebral atrophy, bilateral frontal subarachnoid enlargement, bilateral occipital lobe polymicrogyria, and a neuroglial cyst in the right temporal lobe

Whole genome sequencing (WGS), whole exome sequencing (WES), and Sanger sequencing were performed at 3Billion (Seoul, Korea). The WGS and WES procedures were conducted according to the protocols of Richards et al. [8] and Seo et al. [9], respectively. Both WES and WGS comprise four main parts: (1) high-quality sequencing; (2) sequencing data analysis, including alignment to the Genome Reference Consortium Human Build 37 (GRCh37)/hg19 for WES, and alignment to the Genome Reference Consortium Human Build 38 (GRCh38) and the revised Cambridge Reference Sequence (rCRS) of the mitochondrial genome for WGS; (3) variant annotation and prioritization by EVIDENCE [software developed in house to prioritize variants based on the American College of Medical Genetics and Genomics (ACMG) guidelines [10]]; and (4) variant interpretation in the context of the patient's symptoms and reporting of disease-causing variants. Once EVIDENCE prioritizes the top candidate variants/genes, 3Billion's clinical/medical geneticists manually curate each variant to identify the disease-causing variant for reporting.

In our initial examination, we performed WES on patient 1 and subsequently identified a copy number variant (CNV), prompting us to proceed with WGS. The WGS analysis revealed a heterozygous pathogenic 552.9 kb deletion variant in 2q24.3. The heterozygous deletion NC_000002.12:g.165811316_166364199delinsTGTACACTA at 2q24.3 spans three genes (TTC21B, SCN1A, and SCN9A). The variant is not observed in the gnomAD SVs v2.1.1 dataset. SCN1A is subject to haploinsufficiency. Other pathogenic variants have been reported in this region. Multiple similarly affected individuals have been reported with likely pathogenic copy-number losses overlapping this region [11, 12]. Therefore, this variant was classified as pathogenic. Because this region-spanning deletion involves SCN1A and is consistent with the clinical manifestations, the patient was diagnosed with DS (OMIM 607208). Since we were unable to perform Sanger sequencing on both parents, the pattern of inheritance remains uncertain.
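The reported deletion size can be checked directly from the coordinates in the variant description. The sketch below is illustrative only and is not part of the sequencing provider's pipeline: the function name `delins_span` and its regular expression are assumptions written for this example, and it handles only genomic `delins` descriptions of the form shown in the text.

```python
import re

# Variant description exactly as given in the case report.
HGVS = "NC_000002.12:g.165811316_166364199delinsTGTACACTA"

# Matches "<accession>:g.<start>_<end>delins<inserted bases>".
DELINS_PATTERN = re.compile(
    r"(?P<accession>[^:]+):g\.(?P<start>\d+)_(?P<end>\d+)delins(?P<inserted>[ACGT]+)"
)


def delins_span(hgvs):
    """Return (deleted_bp, inserted_bp, net_loss_bp) for a genomic delins variant."""
    match = DELINS_PATTERN.fullmatch(hgvs)
    if match is None:
        raise ValueError(f"not a genomic delins description: {hgvs!r}")
    start, end = int(match["start"]), int(match["end"])
    deleted = end - start + 1  # both coordinates are inclusive
    inserted = len(match["inserted"])
    return deleted, inserted, deleted - inserted


deleted_bp, inserted_bp, net_loss_bp = delins_span(HGVS)
print(f"deleted {deleted_bp} bp (~{deleted_bp / 1000:.1f} kb), inserted {inserted_bp} bp")
```

The inclusive interval covers 552,884 bp, which rounds to the 552.9 kb stated in the report; the 9 inserted bases reduce the net loss only marginally.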

The parents were counseled about their child's condition and agreed to undergo multipronged therapy. Before the patient was diagnosed with DS, he had received valproic acid (30 mg/kg per day), phenobarbital (2.5 mg/kg per day), and oxcarbazepine (5 mg/kg per day), as well as physiotherapy, occupational therapy, and speech therapy, but had not shown significant improvement. He was seizure-free for 3 months after oxcarbazepine was changed to levetiracetam (27 mg/kg per day). However, the patient then had another fever-induced generalized tonic–clonic seizure (GTCS) lasting less than 5 minutes. Interictal EEG was performed to evaluate his condition, and we found that the diffuse epileptiform irritative abnormality persisted.

A girl of Javanese ethnicity aged 1 year and 4 months was referred to our hospital after experiencing a myoclonic seizure followed by a 20-minute GTCS at 3 months of age, after fever following DPT immunization. She then continued to experience generalized tonic–clonic seizures one to two times per day, each lasting 10–15 seconds. At 9 months of age, the patient received a second DPT immunization, and on the same day, she had another generalized tonic–clonic seizure that lasted > 30 minutes, resulting in her admission to the pediatric intensive care unit. Before the first seizure, the patient could lift her head, grasp a toy and make eye contact, but after that, she could neither lift her head nor grasp an object. The patient has no previous history of trauma.

She had a normal head circumference and increased physiological reflexes in all extremities. Other systemic examinations revealed no abnormalities. Computed tomography (CT) of the head showed a subdural hygroma in the right and left frontoparietal regions, without any other abnormalities (Fig. 3). Electroencephalography (EEG) at the beginning of the seizure did not show any abnormalities, but the follow-up EEG 7 months after the onset of the seizure showed abnormal epileptiform activity (spike wave) on a normal background (Fig. 4). Thus, she was suspected of having DS and was recommended to undergo genetic examination.

Fig. 3 Axial brain computed tomography scan shows a subdural hygroma in the right and left frontoparietal region, without any other abnormalities

Fig. 4 Electroencephalography shows abnormal irritative epileptiform with a normal background

Whole exome sequencing (WES) showed a likely pathogenic variant identified as a heterozygous mutation of the SCN1A gene with genomic position 2-166859265-T-C (GRCh37), NM_001165963.4:c.4003-2A>G (NP_001159435.1:p.?). The variant is located in the canonical splice site upstream of exon 24 of the SCN1A gene (NM_001165963.4 transcript). Since this variant is an essential splicing variant, the protein consequence is uncertain and is therefore represented as (p.?). The variant affects the canonical splice junction and is expected to alter splicing, resulting in loss or disruption of normal protein function. Using an in silico predictor, SpliceAI ( https://spliceailookup.broadinstitute.org/ ), the variant is predicted to result in a loss of 22 base pairs at end of exon 24. This loss is expected to create a frameshift at the Gly1342 position. Sanger sequencing confirmed the patient's genotype (Fig. 5A), but the mother's Sanger analysis was negative (Fig. 5B). Due to familial issues, Sanger sequencing was not performed on the father, leaving the inheritance pattern unresolved.

Fig. 5 A Sanger sequencing result of patient 2 showed a heterozygous mutation of the SCN1A gene with the genomic position 2-166859265-T-C (GRCh37), NM_001165963.4:c.4003-2A>G (NP_001159435.1:p.?) (red arrow); and B Sanger sequencing result of patient 2's mother showed normal sequence
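Two details of the variant description can be made concrete with a short sketch. The genomic change is written as T to C while the coding change is written as A to G; this is what one would expect if the transcript is read from the reverse (minus) strand, which is an assumption here since the report does not state the strand. Likewise, a loss of 22 bases shifts the reading frame because 22 is not a multiple of 3. The helper names below are hypothetical and written only for this illustration.

```python
# Watson-Crick base pairing, used to express an allele on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parse_variant(variant):
    """Split a 'chrom-pos-ref-alt' string into its four fields."""
    chrom, pos, ref, alt = variant.split("-")
    return chrom, int(pos), ref, alt


def to_opposite_strand(ref, alt):
    """Complement single-base alleles (assumes the gene lies on the minus strand)."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def causes_frameshift(bases_lost):
    """A deletion preserves the reading frame only if its length is divisible by 3."""
    return bases_lost % 3 != 0


chrom, pos, ref, alt = parse_variant("2-166859265-T-C")  # GRCh37 coordinates from the report
coding_ref, coding_alt = to_opposite_strand(ref, alt)

print(f"genomic {ref}>{alt} corresponds to coding {coding_ref}>{coding_alt}")
print(f"22 bp loss causes frameshift: {causes_frameshift(22)}")
```

Under the minus-strand assumption, the genomic T>C maps to the coding A>G given in c.4003-2A>G, and the predicted 22 bp loss leaves a remainder of 1 when divided by 3, consistent with the frameshift described.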

The parents were counseled about their child's condition and agreed to undergo multipronged therapy. Before the patient was diagnosed with DS, she received clonazepam (0.01 mg/kg per day), valproic acid (29 mg/kg per day), and phenytoin (5 mg/kg per day), but seizures persisted. When phenytoin was stopped and valproic acid (30 mg/kg per day) and clonazepam (0.04 mg/kg per day) were adjusted, seizures decreased greatly. Later, the patient experienced only one seizure per year. The patient routinely received physiotherapy, speech therapy, and occupational therapy.

When comparing the clinical features and outcomes of the two patients (Table 1), we found that our first patient, who was receiving three medications, still had a fever-induced generalized seizure lasting less than 5 minutes after he had been seizure-free for 3 months (at the age of 11 years and 8 months). Our second patient, however, experienced only one seizure annually after receiving two medications (at the age of 1 year and 10 months). This difference implies that the clinical state of the first patient was worse than that of the second.

Research on the identification of DS genetic mutations using NGS has not previously been done in Indonesia. In 2010, we conducted a study to identify pathogenic variants of the SCN1A gene using the Sanger sequencing method and reported cases of novel SCN1A mutations in Indonesia in patients with severe myoclonic epilepsy in infancy (SMEI) and borderline SMEI (SMEB). The first boy, identified with SMEI, experienced a variety of seizures, including his first febrile seizure and generalized tonic–clonic seizure at 7 months of age, and later suffered from myoclonic seizures, left-sided hemiconvulsions, and focal convulsions without fever, along with delayed speech development. The second patient, with SMEB, had his first febrile seizures with GTCS after immunization at 3 months old, then later experienced status epilepticus, GTCS, and atonic convulsions without fever [13]. We also conducted another study on the spectrum of generalized epilepsy with febrile seizure plus (GEFS+), focusing on clinical manifestations and SCN1A gene mutations. That study analyzed a total of 34 patients who suffered from SMEI (7 patients), SMEB (7 patients), febrile seizure plus (FS+) and absence/myoclonic/atonic/partial seizures (11 patients), and FS+ (9 patients) [14].

However, our previous research used Sanger sequencing, which is expensive and takes considerable time. Additionally, as applied there, it could not detect variants in any gene other than SCN1A in patients with DS. A study by Djémié et al. in Belgium reported the discovery, using the NGS method, of 28 pathogenic variants of the SCN1A gene that had previously been missed or undiagnosed using Sanger sequencing [ 7 ]. To link DS cases more effectively, we are attempting to conduct NGS genetic tests, specifically WES and WGS.

Dravet syndrome (DS) has been infrequently reported in Indonesia because of difficulty in diagnosis, misdiagnosis as febrile seizures or other epilepsy syndromes, and a lack of follow-up and genetic testing in our country. According to the International League Against Epilepsy (ILAE) [ 15 ], the diagnostic criteria for this condition should consist of a number of the following symptoms: (1) a family history of epilepsy or febrile seizures; (2) normal development before seizure onset; (3) seizure before 1 year of age; (4) EEG with generalized spike and polyspike waves; (5) pleomorphic epilepsy (myoclonic, focal, clonic, absence, and generalized seizures); (6) focal abnormalities or early photosensitivity; (7) psychomotor retardation after 24 months; (8) exacerbation of seizures with increased body temperature; and (9) the appearance of subsequent ataxia, pyramidal signs, or interictal myoclonus after the beginning of psychomotor slowing. Both of our patients had seizures that began with increased body temperature and regression of development after seizure onset, and their seizures were resistant to the majority of anticonvulsant medications. The seizures began as generalized tonic–clonic seizures, followed by absence seizures. Both of our patients also experienced subsequent ataxia and pyramidal signs. Thus, they were suspected of having DS and were advised to undergo genetic testing.

Infants with DS have normal physical and psychomotor development at the time of their first seizure, which typically occurs between the ages of 5 and 8 months [ 16 , 17 ]. In our case series, both of our patients experienced their first seizure at the age of 3 months. In the first year of life, the most common form of seizure is febrile tonic–clonic. Some patients may infrequently experience myoclonic and dyscognitive seizures. Protracted seizures frequently result in status epilepticus. In the first year of life, seizures are precipitated by fever/illness, immunization, and cleansing [ 16 ]. As the infant develops, he or she will experience a variety of seizure types, with fever, emotional stress, flashes of light, and overexertion acting as seizure triggers. The child with DS will develop hypotonia, ataxia, incoordination, pyramidal signs, dysautonomia events, cognitive impairment, and behavioral disturbances such as attention deficit, hyperactivity, or autistic characteristics [ 15 ]. Several of these features are consistent with what we observed in our patients.

The EEG performed during the early phases of the disease is normal. However, as the child grows, generalized spike waves with isolated or brief discharges of fast polyspike waves may be present [ 15 , 18 ]. In the first case, the EEG showed a diffuse epileptiform irritative abnormality with a normal background, whereas in the second case, the EEG was initially normal and, a few months later, showed an irritative epileptiform abnormality with a normal background.

Genetic testing is developing rapidly and playing a significant role in the specific diagnosis and management of epilepsy [ 19 , 20 ]. Several genes with pathogenic mutations produce DS or DS-like phenotypes, which inevitably require different drug therapy approaches. Genes that cause DS can be grouped based on how they work: specifically, three sodium channel-related genes (SCN2A, SCN8A, and SCN1B), one potassium channel-related gene (KCNA2), three gamma-aminobutyric acid receptor (GABAR) genes (GABRA2, GABRB3, and GABRG2), a cyclic nucleotide gated cation channel gene (HCN1), and other functional genes including CHD2, CPLX1, and STXBP1. Approximately 80% of patients with DS have a pathogenic variant of the SCN1A gene; the majority of these variants are de novo, but 10% of people inherit the SCN1A mutation from one or both parents [ 6 ]. Both of our patients had a mutation in the SCN1A gene, which is the most common mutation seen in DS.

Furthermore, TTC21B and SCN9A mutations were also found in our first patient. A study conducted by Suls et al. reported a four-generation Bulgarian family with epilepsy, revealing a heterozygous 400 kb deletion on chromosome 2q24 that included the SCN1A and TTC21B genes [ 21 ]. The patients exhibited variable phenotypes, but all experienced generalized tonic–clonic seizures around the first year of life, with some presenting myoclonic or absence seizures. Febrile seizures occurred in three of the four patients during infancy. Notably, one patient had mild mental retardation, another had psychomotor slowing, and a third had mental retardation from early infancy; all showed reduced seizures on medication. The findings in that study parallel the situation observed in our first patient. Meanwhile, a study by Singh et al. identified a heterozygous mutation in the SCN9A gene in two patients diagnosed with DS [ 22 ]. One of these patients also exhibited a mutation in the SCN1A gene. The study provided evidence suggesting that the SCN9A gene on chromosome 2q24 could potentially serve as a modifier for DS. Among 109 patients with DS, 8% were found to have an SCN9A mutation. This included six patients with double heterozygosity for SCN9A and SCN1A mutations and three patients with only heterozygous SCN9A mutations, supporting the notion of a multifactorial inheritance pattern [ 22 ]. These earlier findings are consistent with the severity of clinical symptoms in our first patient, in whom we identified mutations in the SCN1A, SCN9A, and TTC21B genes.

In the last decade, neurogenetic science and diagnostic technology have developed very rapidly. NGS is the latest method of genetic examination that allows for the discovery of causal mutations, including de novo, novel, and familial mutations related to epilepsy syndromes that have variable phenotypic features [ 23 ]. The first generation of DNA sequencing, using the Sanger method, could only examine one gene at a time and had limitations, especially when examining large genomic regions, so the NGS method is more widely used today [ 7 , 23 ]. A study conducted by Kim et al. in Seoul reported that WES performed after targeted panel sequencing with negative results increased the diagnostic yield in infantile-onset epilepsy by 8%. This result suggests that WES assays increase the opportunity to search for new epilepsy genes and uncover less well-known epileptic phenotypes from known neurological diseases [ 24 ]. The WES examination also allows for the discovery of de novo or inherited mutations if the patient and both parents are examined [ 25 ].

According to the recommendations of the North American consensus panel, clobazam and valproic acid are the first-line antiepileptic drugs, followed by stiripentol, topiramate, and levetiracetam. Patients with a suboptimal response to clobazam and valproic acid have been advised to consider the ketogenic diet as a second-line treatment [ 17 ]. SCN1A is a gene that codes for a sodium channel, so drugs that work as sodium channel blockers, such as lamotrigine, phenytoin, carbamazepine, oxcarbazepine, lacosamide, and rufinamide, are contraindicated in patients with DS because they can increase the frequency of seizures [ 4 ]. After the failure of first- and second-line therapy, surgical therapies, such as vagus nerve stimulation (VNS), received moderate agreement and should be considered [ 17 ]. Besides medication, controlling infections and body temperature variations has also been shown to decrease the frequency of seizures and the severity of the disease [ 18 ]. Initially, the first patient received oxcarbazepine and the second patient received phenytoin, both of which are contraindicated in patients with DS. Furthermore, after the contraindicated medications were eliminated, both patients’ outcomes improved.

In this study, we discovered unique mutations that have never been documented before, particularly in Indonesia, where NGS analysis of DS genetic variants has never been done. However, the limitation of this study is that the information comes from only two cases. Further research is needed to explore more cases from the Indonesian population.

In summary, our case series utilizing next-generation sequencing (NGS) unveils the intricate genetic landscape of Dravet syndrome (DS) in two Indonesian pediatric cases. By using WGS and WES, we identified distinct mutations in the SCN1A gene, as well as contributions from other genes, such as TTC21B and SCN9A. The power of WGS lies in its ability to uncover rare pathogenic variants, including a 552.9 kb deletion in the 2q24.3 region. These findings emphasize the importance of comprehensive genetic testing beyond SCN1A, providing valuable insights for personalized management and tailored therapeutic interventions in patients with DS. Our study underscores the potential of NGS in advancing genotype–phenotype correlations and enhancing diagnostic precision for effective disease management. Furthermore, we found that the clinical condition of the first patient was worse than that of the second patient. This difference suggests that the more severe the genetic mutation detected, the more severe the clinical manifestations of the patient.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ACMG: American College of Medical Genetics

CNV: Copy number variant

CT: Computed tomography

DS: Dravet syndrome

DEE: Developmental and epileptic encephalopathy

EEG: Electroencephalography

FS+: Febrile seizure plus

GEFS+: Generalized epilepsy with febrile seizure plus

GRCh37: Genome Reference Consortium Human 37

GRCh38: Genome Reference Consortium Human 38

GTCS: Generalized tonic–clonic seizure

ILAE: International League Against Epilepsy

MRI: Magnetic resonance imaging

NGS: Next-generation sequencing

rCRS: Revised Cambridge Reference Sequence

SCN1A: Sodium channel alpha 1 subunit

SMEB: Severe myoclonic epilepsy of infancy-borderline

SMEI: Severe myoclonic epilepsy in infancy

VNS: Vagus nerve stimulation

WES: Whole-exome sequencing

WGS: Whole-genome sequencing

References

1. Zuberi SM, Wirrell E, Yozawitz E, Wilmshurst JM, Specchio N, Riney K, et al. ILAE classification and definition of epilepsy syndromes with onset in neonates and infants: position statement by the ILAE Task Force on Nosology and Definitions. Epilepsia. 2022;63(6):1349–97.

2. Wirrell EC, Hood V, Knupp KG, Meskis MA, Nabbout R, Scheffer IE, et al. International consensus on diagnosis and management of Dravet syndrome. Epilepsia. 2022;63(7):1761–77.

3. Isom LL, Knupp KG. Dravet syndrome: novel approaches for the most common genetic epilepsy. Neurotherapeutics. 2021;18(3):1524–34.

4. Cardenal-Muñoz E, Auvin S, Villanueva V, Cross JH, Zuberi SM, Lagae L, et al. Guidance on Dravet syndrome from infant to adult care: road map for treatment planning in Europe. Epilepsia Open. 2022;7(1):11–26.

5. Chen C, Fang F, Wang X, Lv J, Wang X, Jin H. Phenotypic and genotypic characteristics of SCN1A associated seizure diseases. Front Mol Neurosci. 2022;28(15):821012.

6. Ding J, Wang L, Jin Z, Qiang Y, Li W, Wang Y, et al. Do all roads lead to Rome? Genes causing Dravet syndrome and Dravet syndrome-like phenotypes. Front Neurol. 2022;11(13):832380.

7. Djémié T, Weckhuysen S, Von Spiczak S, Carvill GL, Jaehn J, Anttonen A, et al. Pitfalls in genetic testing: the story of missed SCN1A mutations. Mol Genet Genomic Med. 2016;4(4):457–64.

8. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24.

9. Seo GH, Kim T, Choi IH, Park J, Lee J, Kim S, et al. Diagnostic yield and clinical utility of whole exome sequencing using an automated variant prioritization system, EVIDENCE. Clin Genet. 2020;98(6):562–70.

10. Riggs ER, Andersen EF, Cherry AM, Kantarci S, Kearney H, Patel A, et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet Med. 2020;22(2):245–57.

11. Lim BC, Hwang H, Kim H, Chae JH, Choi J, Kim KJ, et al. Epilepsy phenotype associated with a chromosome 2q24.3 deletion involving SCN1A: migrating partial seizures of infancy or atypical Dravet syndrome? Epilepsy Res. 2015;109:34–9.

12. Fry AE, Rees E, Thompson R, Mantripragada K, Blake P, Jones G, et al. Pathogenic copy number variants and SCN1A mutations in patients with intellectual disability and childhood-onset epilepsy. BMC Med Genet. 2016;17(1):34.

13. Herini ES, Gunadi, Van Kempen MJA, Yusoff S, Sutaryo, Sunartini, et al. Novel SCN1A mutations in Indonesian patients with severe myoclonic epilepsy in infancy. Pediatr Int. 2010;52(2):234–9.

14. Herini ES, Gunadi, Harahap ISK, Yusoff S, Morikawa S, Patria SY, et al. Generalized epilepsy with febrile seizures plus (GEFS+) spectrum: clinical manifestations and SCN1A mutations in Indonesian patients. Epilepsy Res. 2010;90(1–2):132–9.

15. Anwar A, Saleem S, Patel UK, Arumaithurai K, Malik P. Dravet syndrome: an overview. Cureus. 2019. https://www.cureus.com/articles/20900-dravet-syndrome-an-overview . Accessed 20 Nov 2023.

16. Brunklaus A, Dorris L, Ellis R, Reavey E, Lee E, Forbes G, et al. The clinical utility of an SCN1A genetic diagnosis in infantile-onset epilepsy. Dev Med Child Neurol. 2013;55(2):154–61.

17. Wirrell EC, Laux L, Donner E, Jette N, Knupp K, Meskis MA, et al. Optimizing the diagnosis and management of Dravet syndrome: recommendations from a North American Consensus Panel. Pediatr Neurol. 2017;68:18-34.e3.

18. Yadav R, Shah S, Bhandari B, Marasini K, Mandal P, Murarka H, et al. Patient with Dravet syndrome: a case report. Clin Case Rep. 2022;10(5):e05840.

19. Møller RS, Dahl HA, Helbig I. The contribution of next generation sequencing to epilepsy genetics. Expert Rev Mol Diagn. 2015;15(12):1531–8.

20. Yozawitz E, Moshé SL. The influence of genetics on epilepsy syndromes in infancy and childhood. Acta Epileptol. 2022;4(1):41.

21. Suls A, Velizarova R, Yordanova I, Deprez L, Van Dyck T, Wauters J, et al. Four generations of epilepsy caused by an inherited microdeletion of the SCN1A gene. Am Acad Neurol. 2010;75(72):72–6.

22. Singh NA, Pappas C, Dahle EJ, Claes LRF, Pruess TH, De Jonghe P, et al. A role of SCN9A in human epilepsies, as a cause of febrile seizures and as a potential modifier of Dravet syndrome. PLoS Genet. 2009;5(9):e1000649.

23. Dunn P, Albury CL, Maksemous N, Benton MC, Sutherland HG, Smith RA, et al. Next generation sequencing methods for diagnosis of epilepsy syndromes. Front Genet. 2018;7(9):20.

24. Kim SY, Jang SS, Kim H, Hwang H, Choi JE, Chae J, et al. Genetic diagnosis of infantile-onset epilepsy in the clinic: application of whole-exome sequencing following epilepsy gene panel testing. Clin Genet. 2021;99(3):418–24.

25. Poduri A, Sheidley BR, Shostak S, Ottman R. Genetic testing in the epilepsies—developments and dilemmas. Nat Rev Neurol. 2014;10(5):293–9.

Acknowledgements

The authors express their gratitude to the patients and their families for their cooperation, as well as to all the staff and nurses who provided care for the patients. We are also thankful to the Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada for funding this research and for providing English editing services to assist with editing and proofreading. Additionally, we appreciate the assistance of Kristy Iskandar, Marissa Leviani Hadiyanto, and Khansadhia Hasmaradana Mooiindie during the data collection and editing phases.

This study was supported by the Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada (Dana Masyarakat to ESH). The funding body did not influence the study design, data analysis, data interpretation, or manuscript writing.

Author information

Authors and affiliations.

Department of Child Health, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Dr. Sardjito Hospital, Jl. Kesehatan No. 1, Yogyakarta, 55281, Indonesia

Agung Triono & Elisabeth Siti Herini

Pediatric Surgery Division, Department of Surgery/Genetics Working Group/Translational Research Unit, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Dr. Sardjito Hospital, Yogyakarta, 55281, Indonesia

Contributions

ESH, AG, and G made substantial contributions to the conception and design of the work. AG contributed to data acquisition. ESH, AG, and G performed the data analyses and the interpretation of the data. ESH and AG drafted the text and prepared the figures. ESH, AG, and G revised, read, and approved the final manuscript. All authors approve the present version for publication, and are accountable for all aspects related to the study.

Corresponding author

Correspondence to Elisabeth Siti Herini.

Ethics declarations

Ethics approval and consent to participate.

The Medical and Health Research Ethics Committee of the Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia approved all recruitment and research protocols. Eligible patients signed an informed consent form (ethical clearance number KE-FK-0455-EC-2023; dated March 2023). Patients older than 12 years old and/or their parents or guardian (for patients < 12 years old) signed a written informed consent form to be included in this study.

Consent for publication

Written informed consent was obtained from the patients’ legal guardians for publication of this case report and any accompanying images.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Triono, A., Herini, E.S. & Gunadi. Genetic exploration of Dravet syndrome: two case report. J Med Case Reports 18, 215 (2024). https://doi.org/10.1186/s13256-024-04514-2

Received : 31 January 2024

Accepted : 18 March 2024

Published : 23 April 2024

DOI : https://doi.org/10.1186/s13256-024-04514-2

  • Case series

Journal of Medical Case Reports

ISSN: 1752-1947


Moderna and OpenAI partner to accelerate the development of life-saving treatments.

Moderna partners with OpenAI to deploy ChatGPT Enterprise to thousands of employees across the company. Now every function is empowered with AI, creating novel use cases and GPTs that accelerate and expand the impact of every team.

Moderna has been at the intersection of science, technology, and health for more than 10 years. Moderna’s mission is to deliver the greatest possible impact to people through mRNA medicines—with the COVID-19 vaccine being their most well-known breakthrough. 

The company has partnered with OpenAI since early 2023. Now, ChatGPT Enterprise is evolving how Moderna operates across each function.

Moderna is using its platform for developing mRNA medicines to bring up to 15 new products to market in the next 5 years—from a vaccine against RSV to individualized cancer treatments. In order to achieve its ambitions, Moderna has adopted a people-centric, technology-forward approach, constantly testing new technology and innovation that can increase human capacity and clinical performance.

We believe very profoundly at Moderna that ChatGPT and what OpenAI is doing is going to change the world. We’re looking at every business process—from legal, to research, to manufacturing, to commercial—and thinking about how to redesign them with AI.

Moderna brings AI to everyone

Moderna adopted generative AI the same way Moderna adopts other technology: with the mindset of using the power of digital to maximize its positive impact on patients. To allow AI to flourish, they knew they needed to start with the user and invest in laying a strong foundation for change.

Moderna’s objective was to achieve 100% adoption and proficiency of generative AI by all its people with access to digital solutions in six months. “We believe in collective intelligence when it comes to paradigm changes,” said Miller, “it’s everyone together, everyone with a voice and nobody left behind.” For this, Moderna assigned a team of dedicated experts to drive a bespoke transformation program. Their approach combined individual, collective and structural change management initiatives.   

Individual change management initiatives included in-depth research and listening programs, as well as training sessions hosted in person, online, and with dedicated AI learning companions. “Using AI to teach AI was key to our success,” Miller said. Collective change management initiatives included an AI prompt contest to identify the top 100 AI power users, who were then organized as a cohort of internal Generative AI Champions. Moderna’s culture of learning led to local office hours in every business line and geography, and scaled through an internal forum on AI, which now has 2,000 active weekly participants. Lastly, structural change management initiatives included engaging Moderna’s CEO and executive committee members to foster AI culture through leadership meetings and town halls, as well as incentive programs and sponsored events with internal and external experts.

 This work led to an early win with the launch of an internal AI chatbot tool, mChat, at the beginning of 2023. Built on OpenAI’s API, mChat was a success, adopted by more than 80% of employees across the company, building a solid foundation for the adoption of ChatGPT Enterprise.  

90% of companies want to do GenAI, but only 10% of them are successful, and the reason they fail is because they haven’t built the mechanisms of actually transforming the workforce to adopt new technology and new capabilities.

Building momentum with ChatGPT Enterprise

With the launch of ChatGPT Enterprise, Moderna had a decision to make: continue developing mChat as an all-purpose AI tool, or give employees access to ChatGPT Enterprise?

“As a science-based company, we research everything,” said Brice Challamel, Head of AI Products and Platforms at Moderna. Challamel’s team did extensive user testing comparing mChat, Copilot, and ChatGPT Enterprise. “We found out that the net promoter score of ChatGPT Enterprise was through the roof. This was by far the company-favorite solution, and the one we decided to double down on,” Challamel said.  

Once employees had a way to create their own GPTs easily, the only limit was their imaginations. “We were never here to fill a bucket, but to light a fire,” Challamel said. “We saw the fire spread, with hundreds of use cases creating positive value across teams. We knew we were on to something revolutionary for the company.”

The company’s results are beyond expectations. Within two months of the ChatGPT Enterprise adoption: 

  • Moderna had 750 GPTs across the company
  • 40% of weekly active users created GPTs 
  • Each user had 120 ChatGPT Enterprise conversations per week on average

Augmenting clinical trial development with GPTs

One of the many solutions Moderna has built and is continuing to develop and validate with ChatGPT Enterprise is a GPT pilot called Dose ID. Dose ID has the potential to review and analyze clinical data and is able to integrate and visualize large datasets. Dose ID is intended for use as a data-analysis assistant to the clinical study team, helping to augment the team’s clinical judgment and decision-making.

 “Dose ID has provided supportive rationale for why we have picked a specific dose over other doses. It has allowed us to create customized data visualizations and it has also helped the study team members converse with the GPT to further analyze the data from multiple different angles,” said Meklit Workneh, Director of Clinical Development at Moderna. 

Dose ID uses ChatGPT Enterprise’s advanced data analysis feature to automate the analysis and verify the optimal vaccine dose selected by the clinical study team, by applying standard dose selection criteria and principles. Dose ID provides a rationale, references its sources, and generates informative charts illustrating the key findings. This allows for a detailed review, led by humans and with AI input, prioritizing safety and optimizing the vaccine profile prior to further development in late-stage clinical trials. 

“The Dose ID GPT has the potential to boost the amount of work we’re able to do as a team. We can comprehensively evaluate these extremely large amounts of data, and do it in a very efficient, safe, and accurate way, while helping to ensure security and privacy,” added Workneh.

Improving compliance and telling the company’s story

Moderna’s legal team boasts 100% adoption of ChatGPT Enterprise. “It lets us focus our time and attention on those matters that are truly driving an impact for patients,” said Shannon Klinger, Moderna’s Chief Legal Officer. 

Now, with the Contract Companion GPT, any function can get a clear, readable summary of a contract. The Policy Bot GPT helps employees get quick answers about internal policies without needing to search through hundreds of documents. 

Moderna’s corporate brand team has also found many ways to take advantage of ChatGPT Enterprise. They have a GPT that helps prepare slides for quarterly earnings calls, and another GPT that helps convert biotech terminology into approachable language for investor communications. 

“Sometimes we’re so in our own world, and AI helps the brand think beyond that,” explained Kate Cronin, Chief Brand Officer of Moderna. “What would my mother want to know about Moderna, versus a regulator, versus a doctor? How do we tell our story in an effective way across different audiences? That’s where I think there’s a huge opportunity.”

A team of a few thousand can perform like a team of 100,000

With an ambitious plan to launch multiple products in the next few years, Moderna sees AI as a key component to their success—and their ability to stay lean as a business while setting new benchmarks in innovation. 

“If we had to do it the old biopharmaceutical ways, we might need a hundred thousand people today,” said Bancel. “We really believe we can maximize our impact on patients with a few thousand people, using technology and AI to scale the company.” 

Moderna has been well positioned to leverage generative AI having spent the last decade building a robust tech stack and data platform. The company fosters a culture of learning and curiosity, attracting employees that excel in adopting new technologies and building AI-first solutions.

By making business processes at Moderna more efficient and accurate, the use of AI ultimately translates to better outcomes for patients. “I’m really thankful for the entire OpenAI team, and the time and engagement they have with our team, so that together we can save more lives,” Bancel said. 

  • Open access
  • Published: 22 April 2024

Artificial intelligence and medical education: application in classroom instruction and student assessment using a pharmacology & therapeutics case study

  • Kannan Sridharan 1 &
  • Reginald P. Sequeira 1  

BMC Medical Education volume 24, Article number: 431 (2024)

Background

Artificial intelligence (AI) tools are designed to create or generate content from their trained parameters using an online conversational interface. AI has opened new avenues in redefining the role boundaries of teachers and learners and has the potential to impact the teaching-learning process.

Methods

In this descriptive proof-of-concept cross-sectional study, we explored the application of three generative AI tools to the theme of drug treatment of hypertension to generate: (1) specific learning outcomes (SLOs); (2) test items (MCQs, both A-type and case cluster; SAQs; OSPE); and (3) test standard-setting parameters for medical students.

Results

Analysis of AI-generated output showed profound homology but divergence in quality and responsiveness to refining search queries. The SLOs identified key domains of antihypertensive pharmacology and therapeutics relevant to stages of the medical program, stated with appropriate action verbs as per Bloom’s taxonomy. Test items often had clinical vignettes aligned with the key domain stated in search queries. Some A-type MCQ test items had construction defects, multiple correct answers, and dubious appropriateness to the learner’s stage. ChatGPT generated explanations for test items, thereby enhancing their usefulness in supporting self-study by learners. Integrated case-cluster items had focused clinical case description vignettes, integrated across disciplines, and targeted higher levels of competencies. The response of AI tools on standard-setting varied. Individual questions for each SAQ clinical scenario were mostly open-ended. The AI-generated OSPE test items were appropriate for the learner’s stage and identified relevant pharmacotherapeutic issues. The model answers supplied for both SAQs and OSPEs can aid course instructors in planning classroom lessons, identifying suitable instructional methods, and establishing rubrics for grading, and can serve learners as a study guide. Key lessons learnt for improving AI-generated test item quality are outlined.

Conclusions

AI tools are useful adjuncts to plan instructional methods, identify themes for test blueprinting, generate test items, and guide test standard-setting appropriate to learners’ stage in the medical program. However, experts need to review the content validity of AI-generated output. We expect AIs to influence the medical education landscape to empower learners, and to align competencies with curriculum implementation. AI literacy is an essential competency for health professionals.

Artificial intelligence (AI) has great potential to revolutionize the field of medical education from curricular conception to assessment [ 1 ]. AIs used in medical education are mostly generative AI large language models that were developed and validated based on billions to trillions of parameters [ 2 ]. AIs hold promise in the incorporation of history-taking, assessment, diagnosis, and management of various disorders [ 3 ]. While applications of AIs in undergraduate medical training are being explored, huge ethical challenges remain in terms of data collection, maintaining anonymity, consent, and ownership of the provided data [ 4 ]. AIs hold a promising role amongst learners because they can deliver a personalized learning experience by tracking their progress and providing real-time feedback, thereby enhancing their understanding in the areas they are finding difficult [ 5 ]. Consequently, a recent survey has shown that medical students have expressed their interest in acquiring competencies related to the use of AIs in healthcare during their undergraduate medical training [ 6 ].

Pharmacology and Therapeutics (P&T) is a core discipline embedded in the undergraduate medical curriculum, mostly in the pre-clerkship phase. However, the application of therapeutic principles is one of the key learning objectives of the clerkship phase. In the undergraduate medical curriculum, students are assessed in P&T with test items such as multiple-choice questions (MCQs), integrated case cluster questions, short answer questions (SAQs), and objective structured practical examinations (OSPEs). It has been argued that AIs can communicate an idea more creatively than humans [ 7 ]. With access to very large training datasets, AI platforms hold promise for playing a crucial role in conceiving test items for any discipline in the undergraduate medical curriculum. Additionally, AIs have been reported to provide an optimized curriculum for a program, course, or topic addressing multidimensional problems [ 8 ], although robust evidence for this claim is lacking.

The existing literature has evaluated the knowledge, attitude, and perceptions of adopting AI in medical education. Integrating AIs into medical education is an urgent need across all health professional education. However, the academic medical fraternity faces challenges in incorporating AIs into the medical curriculum due to factors such as inadequate grounding in data analytics, lack of high-quality evidence favoring the utility of AIs in medical education, and lack of funding [ 9 ]. Open-access AI platforms are available free to users without any restrictions. Hence, as a proof-of-concept, we chose to explore the utility of three AI platforms to identify specific learning objectives (SLOs) related to the pharmacology discipline in the management of hypertension for medical students at different stages of their medical training.

Study design and ethics

The present study is observational and cross-sectional in design, conducted in the Department of Pharmacology & Therapeutics, College of Medicine and Medical Sciences, Arabian Gulf University, Kingdom of Bahrain, between April and August 2023. Ethical Committee approval was not sought because the study involved neither interaction with humans nor collection of any personal data.

Study procedure

We conducted the present study in May-June 2023 with the Poe© chatbot interface created by Quora© that provides access to the following three AI platforms:

Sage Poe [ 10 ]: A generative AI search engine developed by Anthropic © that conceives a response based on the written input provided. Quora has renamed Sage Poe as Assistant © from July 2023 onwards.

Claude-Instant [ 11 ]: A retrieval-based AI search engine developed by Anthropic © that collates a response based on pre-written responses amongst the existing databases.

ChatGPT version 3.5 [ 12 ]: A generative architecture-based AI search engine developed by OpenAI © trained on large and diverse datasets.

We queried the chatbots to generate SLOs, A-type MCQs, integrated case cluster MCQs, integrated SAQs, and OSPE test items in the domain of systemic hypertension related to the P&T discipline. Separate prompts were used to generate outputs for students in the pre-clerkship (preclinical) phase and for students at the time of graduation (before starting residency programs). We also evaluated the ability of these AI platforms to estimate the proportion of students correctly answering these test items. We used the following queries for each of these objectives:

Specific learning objectives

Can you generate specific learning objectives in the pharmacology discipline relevant to undergraduate medical students during their pre-clerkship phase related to anti-hypertensive drugs?

Can you generate specific learning objectives in the pharmacology discipline relevant to undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

A-type MCQs

In the initial query for A-type items, we specified the domains (such as the mechanism of action, pharmacokinetics, adverse reactions, and indications) so that the sample of test items would be generated without any theme-related clutter, as shown below:

Write 20 single best answer MCQs with 5 choices related to anti-hypertensive drugs for undergraduate medical students during the pre-clerkship phase of which 5 MCQs should be related to mechanism of action, 5 MCQs related to pharmacokinetics, 5 MCQs related to adverse reactions, and 5 MCQs should be related to indications.

The MCQs generated with the above search query were not based on clinical vignettes. We queried again to generate MCQs using clinical vignettes specifically because most medical schools have adopted problem-based learning (PBL) in their medical curriculum.

Write 20 single best answer MCQs with 5 choices related to anti-hypertensive drugs for undergraduate medical students during the pre-clerkship phase using a clinical vignette for each MCQ of which 5 MCQs should be related to the mechanism of action, 5 MCQs related to pharmacokinetics, 5 MCQs related to adverse reactions, and 5 MCQs should be related to indications.
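
The two A-type queries above share a fixed structure: a total item count, a number of choices, a learner stage, an optional clinical-vignette requirement, and an equal split across domains. A minimal Python sketch of that structure follows. The function name `build_mcq_query` and its parameters are hypothetical and were not part of the study, and the wording it emits is a regularized paraphrase of the queries rather than the exact text submitted to the chatbots.

```python
def build_mcq_query(domains, per_domain=5, choices=5,
                    stage="during the pre-clerkship phase", vignette=False):
    """Assemble an A-type MCQ query (hypothetical helper, not the authors' tool)."""
    if not domains:
        raise ValueError("at least one domain is required")
    total = per_domain * len(domains)
    # Optional clause, mirroring the second query that asks for clinical vignettes.
    vignette_clause = " using a clinical vignette for each MCQ" if vignette else ""
    # One "N MCQs related to <domain>" clause per domain.
    parts = [f"{per_domain} MCQs related to {d}" for d in domains]
    if len(parts) > 1:
        split = ", ".join(parts[:-1]) + ", and " + parts[-1]
    else:
        split = parts[0]
    return (
        f"Write {total} single best answer MCQs with {choices} choices "
        f"related to anti-hypertensive drugs for undergraduate medical students "
        f"{stage}{vignette_clause} of which {split}."
    )


# The four domains named in the study's initial query.
DOMAINS = ["mechanism of action", "pharmacokinetics",
           "adverse reactions", "indications"]

print(build_mcq_query(DOMAINS))
print(build_mcq_query(DOMAINS, vignette=True))
```

Keeping the learner stage and the vignette requirement as parameters makes it easier to see exactly which part of a query changed between two runs when comparing outputs.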

We attempted to explore whether AI platforms can provide useful guidance on standard-setting. Hence, we used the following search query.

Can you do a simulation with 100 undergraduate medical students to take the above questions and let me know what percentage of students got each MCQ correct?

Integrated case cluster MCQs

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students during the pre-clerkship phase integrating pharmacology and physiology related to systemic hypertension with a case vignette.

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students during the pre-clerkship phase integrating pharmacology and physiology related to systemic hypertension with a case vignette. Please do not include ‘none of the above’ as the choice. (This modified search query was used because test items with ‘None of the above’ option were generated with the previous search query).

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students at the time of graduation integrating pharmacology and physiology related to systemic hypertension with a case vignette.

Integrated short answer questions

Write a short answer question scenario with difficult questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

Write a short answer question scenario with moderately difficult questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

Write a short answer question scenario with questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students at the time of graduation with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

OSPE

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises for the assessment of undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises containing appropriate instructions for the patients for the assessment of undergraduate medical students during their pre-clerkship phase related to anti-hypertensive drugs?

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises containing appropriate instructions for the patients for the assessment of undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

Both authors independently evaluated the AI-generated outputs, and a consensus was reached. We cross-checked the veracity of answers suggested by AIs against the Eighth Joint National Committee (JNC-8) guidelines and Goodman and Gilman’s The Pharmacological Basis of Therapeutics (2023), a reference textbook [ 13 , 14 ]. Errors in the A-type MCQs were categorized as item construction defects, multiple correct answers, and uncertain appropriateness to the learner’s level. Test items in the integrated case cluster MCQs, SAQs, and OSPEs were evaluated with the Preliminary Conceptual Framework for Establishing Content Validity of AI-Generated Test Items based on the following domains: technical accuracy, comprehensiveness, education level, and lack of construction defects (Table 1). The responses were categorized as either complete or deficient for each domain.
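
The review procedure described above (two independent reviewers, four domains, a complete/deficient rating per domain, then consensus) can be sketched as a small data structure. This is an illustrative sketch only: the class `Review`, the function `disagreements`, and the sample ratings are hypothetical, and the placeholder platform name is deliberate so that no rating here is mistaken for a study result.

```python
from dataclasses import dataclass

# Domains and rating labels taken from the framework described in the text.
DOMAINS = ("technical accuracy", "comprehensiveness",
           "education level", "lack of construction defects")
RATINGS = ("complete", "deficient")


@dataclass
class Review:
    """One reviewer's ratings of one AI-generated output (hypothetical structure)."""
    platform: str
    item_type: str
    ratings: dict  # maps each domain to "complete" or "deficient"

    def __post_init__(self):
        # Every domain must carry exactly one of the two allowed ratings.
        for domain in DOMAINS:
            if self.ratings.get(domain) not in RATINGS:
                raise ValueError(f"missing or invalid rating for {domain!r}")


def disagreements(first, second):
    """Return the domains on which two independent reviews differ."""
    return [d for d in DOMAINS if first.ratings[d] != second.ratings[d]]


# Made-up ratings for illustration; not data from the study.
reviewer_1 = Review("Platform A", "SAQ", {d: "complete" for d in DOMAINS})
reviewer_2 = Review("Platform A", "SAQ",
                    {**reviewer_1.ratings, "comprehensiveness": "deficient"})
print(disagreements(reviewer_1, reviewer_2))
```

Domains returned by `disagreements` are the ones the reviewers would need to discuss before recording a consensus rating.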

The pre-clerkship phase SLOs identified by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 1 – 3, respectively. In general, a broad homology in SLOs generated by the three AI platforms was observed. All AI platforms used appropriate action verbs as per Bloom’s taxonomy to state the SLOs, such as describe, explain, recognize, discuss, identify, recommend, and interpret. The specific, measurable, achievable, relevant, time-bound (SMART) SLOs generated by each AI platform varied slightly. All key domains of antihypertensive pharmacology to be achieved during the pre-clerkship (pre-clinical) years were relevant for graduating doctors. The SLOs addressed the classes of antihypertensive drugs recommended by current JNC treatment guidelines, along with their mechanisms of action, pharmacokinetics, adverse effects, indications/contraindications, dosage adjustments, monitoring of therapy, and principles of monotherapy and combination therapy.

The SLOs to be achieved by undergraduate medical students at the time of graduation identified by Sage Poe, Claude-Instant, and ChatGPT are listed in electronic supplementary materials 4 – 6, respectively. The identified SLOs emphasize the application of pharmacology knowledge within a clinical context, focusing on competencies needed to function independently in early residency stages. These SLOs go beyond knowledge recall and mechanisms of action to encompass competencies related to clinical problem-solving, rational prescribing, and holistic patient management. The SLOs generated require higher cognitive ability of the learner: action verbs such as demonstrate, apply, evaluate, analyze, develop, justify, recommend, interpret, manage, adjust, educate, refer, design, initiate, and titrate were frequently used.

The MCQs for the pre-clerkship phase generated by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 7 – 9, respectively, and those generated with the search query based on the clinical vignette are listed in electronic supplementary materials 10 – 12.

All MCQs generated by the AIs in each of the four domains specified [mechanism of action (MOA), pharmacokinetics, adverse drug reactions (ADRs), and indications for antihypertensive drugs] are quality test items with potential content validity. The test items on MOA generated by Sage Poe included themes such as the renin-angiotensin-aldosterone system (RAAS), beta-adrenergic blockers (BB), calcium channel blockers (CCB), potassium channel openers, and centrally acting antihypertensives. The test items on pharmacokinetics included high oral bioavailability and metabolism in the liver [angiotensin receptor blocker (ARB)-losartan], long half-life and renal elimination [angiotensin converting enzyme inhibitor (ACEI)-lisinopril], metabolism by both liver and kidney (BB-metoprolol), rapid onset and short duration of action (direct vasodilator-hydralazine), and long-acting transdermal drug delivery (centrally acting-clonidine). Under the ADR theme, dry cough, angioedema, and hyperkalemia due to ACEIs in susceptible patients, reflex tachycardia due to CCB/amlodipine, and orthostatic hypotension due to CCB/verapamil were addressed. Clinical indications included the drug of choice for hypertensive patients with concomitant comorbidities such as diabetes (ACEI-lisinopril), heart failure with low ejection fraction (BB-carvedilol), hypertensive urgency/emergency (combined alpha and beta receptor blocker-labetalol), stroke in patients with a history of recurrent stroke or transient ischemic attack (ARB-losartan), and preeclampsia (methyldopa).

Nearly identical themes under each domain were identified by the Claude-Instant AI platform, with a few notable exceptions: hydrochlorothiazide (instead of clonidine) in the MOA and pharmacokinetics domains; under the ADR domain, ankle edema due to amlodipine, and sexual dysfunction and fatigue in males due to alpha-1 receptor blockers; and under clinical indications, the best initial monotherapy for clinical scenarios such as a 55-year-old man with stage 2 hypertension, a 75-year-old man with stage 1 hypertension, a 35-year-old man with stage 1 hypertension working night shifts, and a 40-year-old man with stage 1 hypertension and hyperlipidemia.

As with Claude-Instant AI, ChatGPT-generated test items on MOA were mostly similar. However, under the pharmacokinetic domain, immediate- and extended-release metoprolol, the effect of food to enhance the oral bioavailability of ramipril, and the highest oral bioavailability of amlodipine compared to other commonly used antihypertensives were the themes identified. Whereas the other ADR themes remained similar, constipation due to verapamil was a new theme addressed. Notably, in this test item, amlodipine was an option that increased the difficulty of this test item because amlodipine therapy is also associated with constipation, albeit to a lesser extent, compared to verapamil. In the clinical indication domain, the case description asking “most commonly used in the treatment of hypertension and heart failure” is controversial because the options listed included losartan, ramipril, and hydrochlorothiazide but the suggested correct answer was ramipril. This is a good example to stress the importance of vetting the AI-generated MCQ by experts for content validity and to assure robust psychometrics. The MCQ on the most used drug in the treatment of “hypertension and diabetic nephropathy” is more explicit as opposed to “hypertension and diabetes” by Claude-Instant because the therapeutic concept of reducing or delaying nephropathy must be distinguished from prevention of nephropathy, although either an ACEI or ARB is the drug of choice for both indications.

It is important to align student assessment to the curriculum; in the PBL curriculum, MCQs with a clinical vignette are preferred. Modifying the query to request MCQs with a clinical vignette on the domains specified previously gave appropriate output from all three AI platforms evaluated (Sage Poe, Claude-Instant, and ChatGPT). The scenarios generated had good clinical fidelity and educational fit from the pre-clerkship student perspective.

The errors observed with AI outputs on the A-type MCQs are summarized in Table  2 . No significant pattern was observed except that Claude-Instant© generated test items in a stereotyped format such as the same choices for all test items related to pharmacokinetics and indications, and all the test items in the ADR domain are linked to the mechanisms of action of drugs. This illustrates the importance of reviewing AI-generated test items by content experts for content validity to ensure alignment with evidence-based medicine and up-to-date treatment guidelines.

The test items generated by ChatGPT had the advantage of supplied explanations, which make them more useful for supporting learners’ self-study. The following examples illustrate this assertion. For the item “A patient with hypertension is started on a medication that works by blocking beta-1 receptors in the heart (metoprolol)”, the explanation states that metoprolol is a beta blocker that works by blocking beta-1 receptors in the heart, which reduces heart rate and cardiac output, resulting in a decrease in blood pressure. However, this explanation is incomplete because it does not mention other, less important mechanisms, such as the effect of beta receptor blockers on renin release. Also, these MCQs were mostly of the recall type, for example: “Which of the following medications is known to have a significant first-pass effect?” The explanation reads: propranolol is known to have a significant first-pass effect, meaning that a large portion of the drug is metabolized by the liver before it reaches systemic circulation. Losartan, amlodipine, ramipril, and hydrochlorothiazide do not have a significant first-pass effect. However, it is also important to extend the explanation further by stating that the first-pass effect of propranolol does not lead to total loss of pharmacological activity because the metabolite hydroxypropranolol also has potent beta-blocking activity. Another MCQ test item had a construction defect: “A patient with hypertension is started on a medication that can cause photosensitivity. Which of the following medications is most likely responsible?” Options included losartan, amlodipine, ramipril, hydrochlorothiazide, and hydrochlorothiazide/triamterene. The explanation states that hydrochlorothiazide (HCTZ) can cause photosensitivity, which is increased sensitivity to sunlight that can cause skin rashes and sunburns, and that losartan, amlodipine, ramipril, and HCTZ/triamterene are less likely to cause photosensitivity. However, it is irrational to claim that HCTZ/triamterene, one of the wrong options, is less likely to cause photosensitivity, because the combination itself contains HCTZ. The difficulty index of such test items is likely to be unacceptable for making equitable psychometric decisions in high-stakes tests.

The responses from Sage Poe, Claude-Instant, and ChatGPT on guidance for standard-setting methods are listed in the electronic supplementary materials 13 – 15 . Sage Poe offered this information with the caveat that “the results of this simulation were purely hypothetical and do not reflect the actual performance of any real students”. The simulation estimates of examinees correctly answering test items ranged from 60 to 90% for the mechanism of action, 65–85% for pharmacokinetics, 60–80% for adverse reactions, and 75–90% for clinical indications of antihypertensives.

In contrast, Claude-Instant admitted that “Performing a full simulation with 100 medical students is beyond the scope of what I can do as an AI assistant. I can generate sample MCQ questions and answers, but I do not have access to a population of medical students to administer the questions. A true simulation that administers questions to real students and analyzes the results would require access to a database of medical students, a platform to deliver the questions, and code to analyze the responses. As an AI assistant, I lack these capabilities. While I cannot perform a full simulation, here are some considerations for administering MCQ questions to medical students: provide details on the purpose and instructions for the test to ensure students understand what is expected out of them; consider including some easier questions in addition to more challenging ones to boost students’ confidence and motivation; avoid ambiguous wording and ensure each question has only one unambiguously correct answer option; when analyzing results, look for questions factors that may have impacted performance, like unclear wording or answer options that are too similar. Focusing on weak questions can help improve future tests; compare performance across different student groups to identify areas where more teaching may be needed; provide detailed answer explanations and review sessions after the test to help students learn from any errors and reinforce correct concepts”. However, ChatGPT offered the simulation result estimates for the 20 MCQ on antihypertensive drugs: mechanism of action ranged from 67 to 92%, pharmacokinetics 63–86%, adverse effects 65–82%, and clinical indications 64–89%. Furthermore, it also stated that “Overall, the performance of the students was quite good, with most questions having a response rate of over 70%. 
However, there were some questions where the projected correct response rate was rather low, such as Question #5 (Mechanism of Action of Hydrochlorothiazide; see Electronic Supplementary Material 12 ) and Question 18 (Indications for Verapamil; see Electronic Supplementary Material 10 ). This may suggest areas where students need more focused education or review.”
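
The percentages reported above correspond to what classical item analysis calls the difficulty index: the proportion of examinees who answer an item correctly. With real examinee data the index is computed directly from scored responses, as in the sketch below. The response data, the function names, and the 0.70 review threshold are illustrative assumptions (the threshold simply echoes the 70% figure in ChatGPT’s reply); nothing here reproduces the chatbots’ hypothetical simulations.

```python
def difficulty_index(item_scores):
    """Proportion of examinees answering one item correctly (1 = correct, 0 = incorrect)."""
    if not item_scores:
        raise ValueError("no responses recorded for this item")
    return sum(item_scores) / len(item_scores)


def flag_items(responses, threshold=0.70):
    """Return {item_id: p} for items whose difficulty index falls below the threshold."""
    flagged = {}
    for item_id, scores in responses.items():
        p = difficulty_index(scores)
        if p < threshold:
            flagged[item_id] = p
    return flagged


# Made-up scored responses from five examinees; not data from the study.
responses = {
    "Q1": [1, 1, 1, 0, 1],  # 4 of 5 correct
    "Q2": [1, 0, 0, 1, 0],  # 2 of 5 correct
}
print(flag_items(responses))
```

Items flagged this way are candidates for expert review, since a low proportion correct can reflect either genuinely difficult content or a flawed item.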

We asked AI assistants to generate 20 integrated case cluster MCQs with 2 test items in each cluster and five options for undergraduate medical students in the pre-clerkship phase, integrating pharmacology and physiology related to systemic hypertension with a case vignette. The responses by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 16 – 18. In all instances, the test items generated had focused case descriptions in the form of a clinical vignette, and horizontal integration across the pathophysiology of hypertension and pharmacology of antihypertensive drugs. These test items mostly targeted the ‘knows (knowledge)’ or ‘knows how (competence)’ level on Miller’s pyramid and are suitable for assessing the clinical competence of pre-clerkship medical students, especially in an integrated PBL curriculum. Both the AI assistants generated excellent clinical vignettes and themes; however, most of the cluster MCQs by ChatGPT had “None of the above” as an option, which is often considered a test item construction flaw. Notwithstanding these limitations, case cluster integrated test items are valuable for learners to integrate their knowledge of different basic medical sciences and their application to clinical sciences. This integrated approach can be used for both instructional and student assessment purposes to make the course more meaningful. Indeed, one of the basic tenets of PBL is curriculum integration.
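
Some construction flaws of this kind can be screened mechanically before expert review. The sketch below flags an option list that contains “None of the above” or repeats an option; the function name `construction_flags` and the duplicate-option check are assumptions added for illustration, and such a screen does not replace vetting by content experts.

```python
import re

# Matches an option that consists only of "None of the above" (any letter case).
NONE_OF_ABOVE = re.compile(r"^\s*none of the above\.?\s*$", re.IGNORECASE)


def construction_flags(options):
    """Return a list of simple construction flaws found in one item's options."""
    flags = []
    if any(NONE_OF_ABOVE.match(option) for option in options):
        flags.append("'none of the above' option")
    # Compare options after trimming whitespace and ignoring letter case.
    normalized = [option.strip().lower() for option in options]
    if len(set(normalized)) < len(normalized):
        flags.append("duplicate options")
    return flags


print(construction_flags(["Losartan", "Ramipril", "None of the above"]))
print(construction_flags(["Losartan", "Ramipril", "Amlodipine"]))
```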

In the next query, we asked AI assistants to write integrated case cluster MCQs with 2 test items in each cluster and 5 options for undergraduate medical students at the time of graduation, integrating pharmacology and physiology of hypertension with a case vignette. The responses are listed in the electronic supplementary materials 19 – 21. Claude-Instant provided only partial responses to both queries. Overall, the information provided in case vignettes requires the learner to interpret the clinical information and make therapeutic decisions. Such test items are appropriate for assessing the minimum competency of medical school graduates to start internship or residency programs; these MCQs are generally comparable to test items used in USMLE Step 2. The evaluation of the integrated case cluster test items provided by all the AI platforms is summarized in Table 3. In general, the AI outputs had at least one construction defect and did not include all the core sub-topics to a sufficient degree.

When asked to simulate administering the above-generated test items to 100 undergraduate medical students at the time of graduation and to provide the percentage of students likely to answer each item correctly, the AI assistant responded:

“ As an AI language model, I cannot simulate the responses of 100 undergraduate medical students likely to obtain the correct answer for each question. However, I can provide some general information about the difficulty level of each question based on the complexity of the concepts being tested and the level of integration between pharmacology and physiology ”.

The AI assistant went on to clarify that questions that require knowledge of basic pharmacology principles, such as the mechanism of action of specific drugs, are likely to be easier for students to answer correctly. Test items that require an understanding of the physiological mechanisms underlying hypertension and their correlation with symptoms are likely to be more challenging for students. The AI assistant sorted these test items into two categories accordingly. Overall, the difficulty level of a test item depends on the level of integration between pharmacology and pathophysiology. Test items that require an understanding of both pharmacological and physiological mechanisms are likely to be more challenging because students need a strong foundation in both pharmacology and physiology concepts to answer integrated case-cluster MCQs correctly.

Short answer questions

The SAQs appropriate to the pre-clerkship phase generated by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 22 – 24 for difficult questions and 25 – 27 for moderately difficult questions.

It is apparent from these case vignette descriptions that the short answer question format varied. Accordingly, the scope for asking individual questions for each scenario is open-ended. In all instances, model answers are supplied which are helpful for the course instructor to plan classroom lessons, identify appropriate instructional methods, and establish rubrics for grading the answer scripts, and as a study guide for students.

We then wanted to see to what extent AI can differentiate the difficulty of the SAQ by replacing the search term “difficult” with “moderately difficult” in the above search prompt: the changes in the revised case scenarios are substantial. Perhaps the context of learning and practice (and the level of the student in the MD/medical program) may determine the difficulty level of SAQ generated. It is worth noting that on changing the search from cardiology to internal medicine rotation in Sage Poe the case description also changed. Thus, it is essential to select an appropriate AI assistant, perhaps by trial and error, to generate quality SAQs. Most of the individual questions tested stand-alone knowledge and did not require students to demonstrate integration.

The responses of Sage Poe, Claude-Instant, and ChatGPT for the search query to generate SAQs at the time of graduation are listed in the electronic supplementary materials 28 – 30 . It is interesting to note how AI assistants considered the stage of the learner while generating the SAQ. The response by Sage Poe is illustrative for comparison. “You are a newly graduated medical student who is working in a hospital” versus “You are a medical student in your pre-clerkship.”

Some questions were retained, deleted, or modified to align with competency appropriate to the context (Electronic Supplementary Materials 28 – 30). Overall, the test items at both levels from all AI platforms were technically accurate and thoroughly addressed the topics related to different disciplines (Table 3). The differences in learning objective transition are summarized in Table 4. A comparison of learning objectives revealed that almost all objectives remained the same except for a few (Table 5).

A similar trend was apparent with test items generated by other AI assistants, such as ChatGPT. The contrasting differences in questions are illustrated by the vertical integration of basic sciences and clinical sciences (Table  6 ).

Taken together, these in-depth qualitative comparisons suggest that AI assistants such as Sage Poe and ChatGPT consider the learner’s stage of training in designing test items, learning outcomes, and answers expected from the examinee. It is critical to state the search query explicitly to generate quality output by AI assistants.

The OSPE test items generated by Claude-Instant and ChatGPT appropriate to the pre-clerkship phase (without mentioning “appropriate instructions for the patients”) are listed in the electronic supplementary materials 31 and 32, and those with patient instructions in the electronic supplementary materials 33 and 34. For reasons unknown, Sage Poe did not provide any response to this search query.

The five OSPE items generated were suitable to assess the prescription writing competency of pre-clerkship medical students. The clinical scenarios identified by the two AI platforms that responded were comparable. The scenarios from Claude-Instant included hypertension and impaired glucose tolerance in a 65-year-old man, hypertension with chronic kidney disease (CKD) in a 55-year-old woman, resistant hypertension with obstructive sleep apnea in a 45-year-old man, and gestational hypertension at 32 weeks in a 35-year-old woman. Incorporating appropriate instructions facilitates the learner’s ability to educate patients and maximize safe and effective therapy. The OSPE items required students to write a prescription with guidance to start conservatively and to choose an appropriate antihypertensive drug class (drug) based on the patient’s profile, specifying the drug name, dose, dosing frequency, drug quantity to be dispensed, patient name, date, refills, and cautions as appropriate, in addition to the prescriber’s name, signature, and license number. In contrast, the clinical scenarios identified by ChatGPT included patients with hypertension and CKD, hypertension and bronchial asthma, gestational diabetes, hypertension and heart failure, and hypertension and gout. Guidance was also expected on dosage titration, warnings to be aware of, safety monitoring, and the frequency of follow-up and dose adjustment. These test items are designed to assess learners’ knowledge of the P & T of antihypertensives, as well as their ability to provide appropriate instructions to patients. These clinical scenarios for writing prescriptions assess students’ ability to choose an appropriate drug class, write prescriptions with proper labeling and dosing, reflect drug safety profiles and risk factors, and make modifications to meet the requirements of special populations. The prescription is required to state the drug name, dose, dosing frequency, patient name, date, refills, and cautions or instructions as needed, together with a conservative starting dose, a once- or twice-daily dosing frequency based on the drug, and instructions to titrate the dose slowly if required.

The responses from Claude-Instant and ChatGPT to the query on generating OSPE test items at the time of graduation are listed in electronic supplementary materials 35 and 36. In contrast to the pre-clerkship phase, the OSPEs generated to assess graduating doctors' competence required more advanced comprehension of drug therapy. Examples included writing a prescription for:

(1) A 65-year-old male with resistant hypertension and CKD stage 3, to optimize the antihypertensive regimen. The answer was required to include starting an ACEI and a diuretic, titrating the dosage over two weeks, considering adding spironolactone or substituting the ACEI with an ARB, and closely monitoring serum electrolytes and kidney function.

(2) A 55-year-old woman with hypertension and paroxysmal arrhythmia. The answer was required to include switching the ACEI to an ARB due to cough, adding a CCB or beta blocker for rate control, adjusting the dosage slowly, and monitoring for side effects.

(3) A 45-year-old man with masked hypertension and obstructive sleep apnea. The answer required adding a centrally acting antihypertensive at bedtime, increasing the dosage as needed based on home blood pressure monitoring, and referring the patient for CPAP if not already using it.

(4) A 75-year-old woman with isolated systolic hypertension and autonomic dysfunction. The answer required stopping the diuretic and switching to an alpha blocker, with upward dosage adjustment and combination with other antihypertensives as needed based on postural blood pressure changes and symptoms.

(5) A 35-year-old pregnant woman with preeclampsia at 29 weeks. The answer required doubling the methyldopa dose, considering adding labetalol or nifedipine based on severity, and educating the patient on signs of worsening and on the need to follow up immediately for any concerning symptoms.

These case scenarios are designed to assess the ability of the learner to comprehend the complexity of antihypertensive regimens, make evidence-based regimen adjustments, prescribe multidrug combinations based on therapeutic response and tolerability, monitor complex patients for complications, and educate patients about warning signs and follow-up.

ChatGPT provided a similar output, with clinical scenarios such as prescribing for patients with hypertension and myocardial infarction; hypertension and chronic obstructive pulmonary disease (COPD); hypertension and a history of angina; hypertension and a history of stroke; and hypertension and advanced renal failure. In these cases, wherever appropriate, the output stressed pharmacotherapeutic issues such as taking ramipril after food to reduce side effects such as giddiness; selecting the most appropriate beta-blocker, such as nebivolol, in patients with comorbid COPD; the importance of taking amlodipine at the same time every day, with or without food; a preference for telmisartan over other ARBs in stroke; and choosing furosemide in patients with hypertension and edema, taken with food to reduce the risk of gastrointestinal adverse effects.

The AI outputs on OSPE test items were observed to be technically accurate, thorough in addressing core sub-topics suitable for the learner's level, and free of construction defects (Table 3). Both AIs provided model answers with explanatory notes. This facilitates the use of such OSPEs by learners for self-assessment and formative assessment purposes. The detailed instructions are helpful in creating optimized, evidence-based therapy regimens and in providing appropriate instructions to patients with complex medical histories. One can rely on multiple AI sources to identify and shortlist the required case scenarios and OSPE items, and to seek guidance on expected model answers with explanations. From a teaching/learning perspective, model answer guidance on antihypertensive drug classes is more appropriate than guidance on a specific drug of a given class. We believe that these scenarios can be refined further by providing a focused case history along with relevant clinical and laboratory data to enhance clinical fidelity and bring a closer fit to the competency framework.

In the present study, AI tools generated SLOs that comply with the current principles of medical education [15]. AI tools are valuable in constructing SLOs and are therefore especially useful for medical faculty whose training in medical education is perceived as inadequate, particularly in the early stages of their academic careers. Data suggest that only a third of academics in medical schools have formal training in medical education [16], which is a limitation. We therefore evaluated the credibility of alternatives, such as AIs, for generating appropriate course learning outcomes.

We observed that the AI platforms in the present study generated quality test items suitable for different types of assessment purposes. The AI-generated outputs were similar, with minor variation. The generative AIs used in the present study can generate new content from their training dataset [17]. Problem-based and interactive learning approaches are referred to as "bottom-up": learners first obtain first-hand experience in solving the cases and then engage in discussion with the educators to refine their understanding and critical thinking skills [18]. We suggest that AI tools can be useful in this approach for imparting the core knowledge and skills related to Pharmacology and Therapeutics to undergraduate medical students. A recent scoping review of 13 studies evaluating the barriers to writing quality test items concluded that motivation, time constraints, and scheduling were the most common [19]. AI tools can be valuable here, given how quickly they generate quality test items and the time they save. However, as observed in the present study, the AI-generated test items still require scrutiny by faculty members for content validity. Moreover, it is important to train faculty in AI technology-assisted teaching and learning. The General Medical Council recommends taking every opportunity to raise the profile of teaching in medical schools [20]. Hence, both the academic faculty and the institution must consider investing resources in AI training to ensure appropriate use of the technology [21].

The AI outputs assessed in the present study had errors, particularly with A-type MCQs. One notable observation was that the AI tools were often unable to differentiate between ACEIs and ARBs. AI platforms access a wide range of structured and unstructured data, in addition to images, audio, and videos. Hence, the AI platforms can commit errors by extracting details from unauthenticated sources. Chanda and Banerjee [22] created a framework identifying 28 factors for reconstructing the path of AI failures and for determining corrective actions. This is an area of interest for AI technical experts to explore. It also reiterates the need for human examination of test items before they are used for assessment purposes.

There are concerns that AIs can memorize and provide answers from their training dataset, which they are not supposed to do [23]. Hence, the use of AI-generated test items for summative examinations is debatable. It is essential to ensure and enhance the security features of AI tools to reduce or eliminate cross-contamination of test items. Researchers have emphasized that AI tools will only reach their potential if developers and users can access full-text non-PDF formats that help machines comprehend research papers and generate the output [24].

AI platforms may not always have access to all standard treatment guidelines. However, in the present study, it was observed that all three AI platforms generally provided appropriate test items regarding the choice of medications, aligning with recommendations from contemporary guidelines and standard textbooks in pharmacology and therapeutics. The prompts used in the study were specifically focused on the pre-clerkship phase of the undergraduate medical curriculum (and at the time of their graduation) and assessed fundamental core concepts, which were also reflected in the AI outputs. Additionally, the recommended first-line antihypertensive drug classes have been established for several decades, and information regarding their pharmacokinetics, ADRs, and indications is well-documented in the literature.

Different paradigms and learning theories have been proposed to support AI in education. These paradigms include AI-directed (learner as recipient), AI-supported (learner as collaborator), and AI-empowered (learner as leader), which are based on Behaviorism, Cognitive-Social constructivism, and Connectivism-Complex adaptive systems, respectively [25]. AI techniques have the potential to stimulate and advance instructional and learning sciences. More recently, a three-level model that synthesizes and unifies existing learning theories to model the roles of AIs in promoting the learning process has been proposed [26]. The different components of our study rely upon these paradigms and learning theories as the theoretical underpinning.

Strengths and limitations

To the best of our knowledge, this is the first study evaluating the utility of AI platforms in generating test items related to a discipline in the undergraduate medical curriculum. We have evaluated the AI's ability to generate outputs related to most types of assessment in the undergraduate medical curriculum. The key lessons learnt from the present study for improving the quality of AI-generated test items are outlined in Table 7. We used a structured framework for assessing the content validity of the test items. However, we demonstrated this using a single case study (hypertension) as a pilot experiment. We chose to evaluate antihypertensive drugs because hypertension is one of the most common disorders and its pharmacotherapy is a core learning objective relevant to undergraduate medical curricula worldwide. It would be interesting to explore the output from AI platforms for other common (and uncommon/region-specific) disorders, non-/semi-core objectives, and disciplines other than Pharmacology and Therapeutics. An area of interest would be to look at the content validity of the test items generated for different curricula (such as problem-based, integrated, case-based, and competency-based) during different stages of the learning process. Also, we did not attempt to evaluate the generation of flowcharts, algorithms, or figures for generating test items. Another potential area for exploring the utility of AIs in medical education would be repeated procedural practices such as the administration of drugs through different routes by trainee residents [27]. Several AI tools have been identified for potential application in enhancing classroom instruction and assessment, pending validation in prospective studies [28]. Lastly, we did not administer the AI-generated test items to students and assess their performance, and so could not comment on the validity of test item discrimination and difficulty indices. Additionally, there is a need to confirm the generalizability of the findings to other complex areas in the same discipline as well as to other disciplines, which paves the way for future studies. The conceptual framework used in the present study for evaluating the AI-generated test items needs to be validated in a larger population. Future studies may also try to evaluate the variations in the AI outputs with repetition of the same queries.
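The difficulty and discrimination indices mentioned above can only be estimated once items have been administered to students. As a rough illustration, the sketch below computes both for a single item using the common classical definitions: difficulty as the proportion of examinees answering correctly, and discrimination as the difference in correct responses between the top and bottom scoring groups divided by the group size. The function name, the 27% group fraction, and the response data are all hypothetical and are not taken from the present study.

```python
from typing import List, Tuple


def item_indices(total_scores: List[float], item_correct: List[bool],
                 group_fraction: float = 0.27) -> Tuple[float, float]:
    """Return (difficulty, discrimination) for one test item.

    difficulty     = proportion of all examinees answering the item correctly
    discrimination = (correct in upper group - correct in lower group) / group size
    """
    if len(total_scores) != len(item_correct) or not total_scores:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(total_scores)
    difficulty = sum(item_correct) / n

    # Rank examinees by total test score, highest first.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    # Size of the upper and lower groups (assumed convention: 27% each).
    k = max(1, round(n * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    discrimination = (upper_correct - lower_correct) / k
    return difficulty, discrimination


# Hypothetical data: ten examinees' total scores and their result on one item.
scores = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
correct = [True, True, True, True, False, True, False, False, True, False]
difficulty, discrimination = item_indices(scores, correct)
print(f"difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

A positive discrimination value indicates that higher-scoring examinees answered the item correctly more often than lower-scoring ones. With small cohorts the upper and lower groups contain only a few students, so the estimate should be read with caution.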

Notwithstanding ongoing discussions and controversies, AI tools are potentially useful adjuncts to optimize instructional methods, test blueprinting, test item generation, and guidance for test standard-setting appropriate to learners' stage in the medical program. However, experts need to critically review the content validity of AI-generated output. These challenges and caveats must be addressed before the widespread use of AIs in medical education can be advocated.

Data availability

All the data included in this study are provided as Electronic Supplementary Materials.

References

Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide 156. Med Teach. 2023;45(6):565–73.

Sriwastwa A, Ravi P, Emmert A, Chokshi S, Kondor S, Dhal K, Patel P, Chepelev LL, Rybicki FJ, Gupta R. Generative AI for medical 3D printing: a comparison of ChatGPT outputs to reference standard education. 3D Print Med. 2023;9(1):21.

Azer SA, Guerrero APS. The challenges imposed by artificial intelligence: are we ready in medical education? BMC Med Educ. 2023;23(1):680.

Masters K. Ethical use of Artificial Intelligence in Health Professions Education: AMEE Guide 158. Med Teach. 2023;45(6):574–84.

Nagi F, Salih R, Alzubaidi M, Shah H, Alam T, Shah Z, Househ M. Applications of Artificial Intelligence (AI) in Medical Education: a scoping review. Stud Health Technol Inf. 2023;305:648–51.

Mehta N, Harish V, Bilimoria K, et al. Knowledge and attitudes on artificial intelligence in healthcare: a provincial survey study of medical students. MedEdPublish. 2021;10(1):75.

Mir MM, Mir GM, Raina NT, Mir SM, Mir SM, Miskeen E, Alharthi MH, Alamri MMS. Application of Artificial Intelligence in Medical Education: current scenario and future perspectives. J Adv Med Educ Prof. 2023;11(3):133–40.

Garg T. Artificial Intelligence in Medical Education. Am J Med. 2020;133(2):e68.

Matheny ME, Whicher D, Thadaney IS. Artificial intelligence in health care: a report from the National Academy of Medicine. JAMA. 2020;323(6):509–10.

Sage Poe. Available at: https://poe.com/Assistant (Accessed on 3rd June 2023).

Claude-Instant. Available at: https://poe.com/Claude-instant (Accessed on 3rd June 2023).

ChatGPT. Available at: https://poe.com/ChatGPT (Accessed on 3rd June 2023).

James PA, Oparil S, Carter BL, Cushman WC, Dennison-Himmelfarb C, Handler J, Lackland DT, LeFevre ML, MacKenzie TD, Ogedegbe O, Smith SC Jr, Svetkey LP, Taler SJ, Townsend RR, Wright JT Jr, Narva AS, Ortiz E. 2014 evidence-based guideline for the management of high blood pressure in adults: report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA. 2014;311(5):507–20.

Eschenhagen T. Treatment of hypertension. In: Brunton LL, Knollmann BC, editors. Goodman & Gilman’s the pharmacological basis of therapeutics. 14th ed. New York: McGraw Hill; 2023.

Shabatura J. Using Bloom’s taxonomy to write effective learning outcomes. https://tips.uark.edu/using-blooms-taxonomy/ (Accessed on 19th September 2023).

Trainor A, Richards JB. Training medical educators to teach: bridging the gap between perception and reality. Isr J Health Policy Res. 2021;10(1):75.

Boscardin C, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: potential and opportunity. Acad Med. 2023. https://doi.org/10.1097/ACM.0000000000005439 . (Published ahead of print).

Duong MT, Rauschecker AM, Rudie JD, Chen PH, Cook TS, Bryan RN, Mohan S. Artificial intelligence for precision education in radiology. Br J Radiol. 2019;92(1103):20190389.

Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments - a scoping review. BMC Med Educ. 2019;19(1):123.

Developing teachers and trainers in undergraduate medical education. Advice supplementary to Tomorrow’s Doctors. (2009). https://www.gmc-uk.org/-/media/documents/Developing_teachers_and_trainers_in_undergraduate_medical_education___guidance_0815.pdf_56440721.pdf (Accessed on 19th September 2023).

Cooper A, Rodman A. AI and Medical Education - A 21st-Century Pandora’s Box. N Engl J Med. 2023;389(5):385–7.

Chanda SS, Banerjee DN. Omission and commission errors underlying AI failures. AI Soc. 2022;17:1–24.

Narayanan A, Kapoor S. ‘GPT-4 and Professional Benchmarks: The Wrong Answer to the Wrong Question’. Substack newsletter. AI Snake Oil (blog). https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks (Accessed on 19th September 2023).

Brainard J. As scientists face a flood of papers, AI developers aim to help. Science, 21 November 2023. doi:10.1126/science.adn0669.

Ouyang F, Jiao P. Artificial intelligence in education: the three paradigms. Computers Education: Artif Intell. 2021;2:100020.

Gibson D, Kovanovic V, Ifenthaler D, Dexter S, Feng S. Learning theories for artificial intelligence promoting learning processes. Br J Edu Technol. 2023;54(5):1125–46.

Guerrero DT, Asaad M, Rajesh A, Hassan A, Butler CE. Advancing Surgical Education: the Use of Artificial Intelligence in Surgical Training. Am Surg. 2023;89(1):49–54.

Lee S. AI tools for educators. EIT InnoEnergy Master School Teachers Conference. 2023. https://www.slideshare.net/ignatia/ai-toolkit-for-educators?from_action=save (Accessed on 24th September 2023).

Author information

Authors and affiliations.

Department of Pharmacology & Therapeutics, College of Medicine & Medical Sciences, Arabian Gulf University, Manama, Kingdom of Bahrain

Kannan Sridharan & Reginald P. Sequeira

Contributions

RPS: conceived the idea; KS: data collection and curation; RPS and KS: data analysis; RPS and KS wrote the first draft and were involved in all the revisions.

Corresponding author

Correspondence to Kannan Sridharan.

Ethics declarations

Ethics approval and consent to participate.

Not applicable, as this research study involved neither any interaction with humans nor the collection of any personal data.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Sridharan, K., Sequeira, R.P. Artificial intelligence and medical education: application in classroom instruction and student assessment using a pharmacology & therapeutics case study. BMC Med Educ 24, 431 (2024). https://doi.org/10.1186/s12909-024-05365-7

Received: 26 September 2023

Accepted: 28 March 2024

Published: 22 April 2024

DOI: https://doi.org/10.1186/s12909-024-05365-7

  • Medical education
  • Pharmacology
  • Therapeutics

BMC Medical Education

ISSN: 1472-6920

what is a clinical case study

IMAGES

  1. How To Write A Clinical Case Study Nursing

  2. FREE 11+ Clinical Case Study Templates in PDF

  3. Clinical Case Study

  4. 8+ Clinical Case Study Templates and Templates

  5. FREE 11+ Clinical Case Study Templates in PDF

  6. Clinical case study presentation

VIDEO

  1. Understanding Clinical Trials: What you need to know to be a part of the latest research

  2. V3/SDR Clinical Case Study by Dr Clarence Tam

  3. Clinical Case Study of Class II Restoration Tooth #21 ADA (#34 FDI) By Dr John C Comisi

  4. CLINICAL CASE STUDY QUESTIONS FOR PHARMACISTS || PART : 14 || #PROMETRIC #MOH #DHA #HAAD #SLE

  5. CLINICAL CASE STUDY QUESTIONS FOR PHARMACISTS || PART : 15 || #PROMETRIC #MOH #DHA #HAAD #SLE

  6. LET'S LEARN : DRUGS CLASSIFICATIONS & INDICATIONS || #VITAMINS || (PART : 4 )

COMMENTS

  1. Guidelines To Writing A Clinical Case Report

    A case report is a detailed report of the symptoms, signs, diagnosis, treatment, and follow-up of an individual patient. Case reports usually describe an unusual or novel occurrence and as such, remain one of the cornerstones of medical progress and provide many new ideas in medicine. Some reports contain an extensive review of the relevant ...

  2. Writing a case report in 10 steps

    Writing up. Write up the case emphasising the interesting points of the presentation, investigations leading to diagnosis, and management of the disease/pathology. Get input on the case from all members of the team, highlighting their involvement. Also include the prognosis of the patient, if known, as the reader will want to know the outcome.

  3. A young researcher's guide to writing a clinical case report

    A clinical case report or case study is a means of disseminating new knowledge gained from clinical practice. Medical practitioners often come across patient cases that are different or unusual such as a previously unknown condition, a complication of a known disease, an unusual side effect or adverse response to a mode of treatment, or a new ...

  4. Clinical Case Studies

    Study design. There are four basic clinical studies: case series, case-control studies, cohort studies, and randomized clinical trials. 2 The case report or case series is probably the weakest method of deriving clinical data. Case series are usually retrospective reviews that list the clinical findings of patients with a specific disease.

  5. What Is a Case Study?

    A case study is a detailed study of a specific subject, such as a person, group, place, event, organization, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used.

  6. Case Study Research Method in Psychology

    General components of clinical case studies include: background, symptoms, assessments, diagnosis, treatment, and outcomes. Interpreting the information means the researcher decides what to include or leave out. A good case study should always clarify which information is the factual description and which is an inference or the researcher's ...

  7. What is a case study?

    Case study is a research methodology, typically seen in social and life sciences. There is no one definition of case study research.1 However, very simply… 'a case study can be defined as an intensive study about a person, a group of people or a unit, which is aimed to generalize over several units'.1 A case study has also been described as an intensive, systematic investigation of a ...

  8. Case Study: Definition, Examples, Types, and How to Write

    A case study is an in-depth study of one person, group, or event. In a case study, nearly every aspect of the subject's life and history is analyzed to seek patterns and causes of behavior. Case studies can be used in many different fields, including psychology, medicine, education, anthropology, political science, and social work.

  9. Case Study

    A case study is a detailed study of a specific subject, such as a person, group, place, event, organisation, or phenomenon. Case studies are commonly used in social, educational, clinical, and business research. A case study research design usually involves qualitative methods, but quantitative methods are sometimes also used.

  10. Clinical Case Reports

    Clinical Case Reports is very fortunate to be supported by many other journals published by Wiley, including a number of society-owned journals. These journals participate in the Manuscript Transfer Program by referring case reports and offering authors the option to have their paper, with any peer review reports, automatically transferred to Clinical Case Reports.

  11. Difference Between Case Reports & Clinical Studies

    Definition of case report and clinical study. In medicine, a case report is a detailed report of the symptoms, signs, diagnosis, treatment, and follow-up of an individual patient. Case reports may contain a demographic profile of the patient, but usually describe an unusual or novel occurrence. The case report is written on one individual patient.

  12. Clinical Case Studies: Sage Journals

    Clinical Case Studies (CCS), peer-reviewed & published bi-monthly electronic only, is the only journal devoted entirely to innovative psychotherapy case studies & presents cases involving individual, couples, & family therapy. The easy-to-follow case presentation format allows you to learn how interesting & challenging cases were assessed ...

  13. Clinical Case Studies

    Randall Cox. Ateka A. Contractor. Preview abstract. Free access Research article First published August 11, 2021 pp. 61-81. xml PDF / EPUB. Table of contents for Clinical Case Studies, 21, 1, Feb 01, 2022.

  14. Clinical case definition

    In epidemiology, a clinical case definition, a clinical definition, or simply a case definition lists the clinical criteria by which public health professionals determine whether a person's illness is included as a case in an outbreak investigation—that is, whether a person is considered directly affected by an outbreak. Absent an outbreak, case definitions are used in the surveillance of ...

  15. Submission Guidelines: Clinical Case Studies: Sage Journals

    CLINICAL CASE STUDIES seeks manuscripts of innovative and novel psychotherapy treatment cases that articulate various theoretical frameworks (behavioral, cognitive-behavioral, gestalt, humanistic, psychodynamic, rational-emotive therapy, existential, systems, and others).All manuscripts will require an abstract and must adhere to the following format:

  16. Sequential CD7 CAR T-Cell Therapy and Allogeneic HSCT without GVHD

    Nine patients were enrolled in a clinical study of donor-derived CD7 CAR T cells (ClinicalTrials.gov number, NCT04599556) (Fig. S1 in the Supplementary Appendix, available with the full text of ...

  17. The case study approach

    A case study is a research approach that is used to generate an in-depth, multi-faceted understanding of a complex issue in its real-life context. It is an established research design that is used extensively in a wide variety of disciplines, particularly in the social sciences. A case study can be defined in a variety of ways (Table 5 ), the ...

  18. Poorly differentiated neuroendocrine carcinoma originating in the

    Clinical Case Reports aims to improve global health outcomes by sharing clinical knowledge through the use of medical case reports, ... In this study, we presented a rare case of poorly differentiated NEC (small cell) and described its clinicopathological presentation. The presentation of this tumor with respiratory distress in a nonsmoker male ...

  19. Comparing generative and extractive approaches to information

    The number of publications describing Randomized Controlled Trials has been increasing at an exponential pace for decades [], thus making it more and more challenging to appropriately summarize the existing clinical evidence by way of systematic reviews.Yet, the ability to summarize the current clinical evidence is a core process to support evidence-based medical decision making [].

  20. Identifying the clinical and histopathological characteristics of

    Table 2 summarises the clinical findings and dermoscopic descriptions during clinical examination. In our series, the clinical misdiagnosis rate was 87.5% with squamous cell carcinoma (SCC) and basal cell carcinoma (BCC) the most common misdiagnoses, accounting for 9 of 16 cases.

  21. Bioinformatic Analysis and Clinical Case Studies Identify CD276 as a

    Renal cell carcinoma (RCC) originates from a malignant epithelial tumor of the renal tubules, accounting for 2-3% of all adult malignancies, and has the highest mortality rate among urological cancers. 1 Clear cell renal cell carcinoma (ccRCC) is the most prevalent form of renal cell carcinoma, accounting for approximately 75% of cases. 2 Statistics indicate an annual increase of nearly 2% in ...

  22. Genetic exploration of Dravet syndrome: two case report

    Background Dravet syndrome is an infantile-onset developmental and epileptic encephalopathy (DEE) characterized by drug resistance, intractable seizures, and developmental comorbidities. This article focuses on manifestations in two Indonesian children with Javanese ethnicity who experienced Dravet syndrome with an SCN1A gene mutation, presenting genetic analysis findings using next-generation ...

  23. OpenAI customer story: Moderna

    Dose ID is intended for use as a data-analysis assistant to the clinical study team, helping to augment the team's clinical judgment and decision-making. "Dose ID has provided supportive rationale for why we have picked a specific dose over other doses. It has allowed us to create customized data visualizations and it has also helped the ...

  24. Artificial intelligence and medical education: application in classroom

    Integrated case-cluster items had focused clinical case description vignettes, integration across disciplines, and targeted higher levels of competencies. ... ChatGPT generated explanations for test items, this enhancing usefulness to support self-study by learners. Integrated case-cluster items had focused clinical case description vignettes ...