Advances in Social Work ISSN: 1527-8565 eISSN: 2331-4125
Lisa Bunting, Nicole Gleghorne, Aideen Maguire, Sarah McKenna, Dermot O’Reilly, Changing Trends in Child Welfare Inequalities in Northern Ireland, The British Journal of Social Work , Volume 54, Issue 5, July 2024, Pages 1809–1829, https://doi.org/10.1093/bjsw/bcad259
Longitudinal research in England and Wales has identified increasing inequality in child welfare interventions, particularly with respect to children in the poorest areas coming into care. Although previous cross-sectional research has shown associations between area level deprivation and child welfare interventions to be weakest in Northern Ireland (NI), it remains unknown if this reflects wider trends over time. This study uses longitudinal administrative data to investigate the relationship between area level deprivation and the (1) referral, (2) investigation, (3) registration and (4) looked after stages of children’s contact with child and family social work from 2010 to 2017 (stages 1–3) and 2020 (stage 4). Both relative and absolute measures of inequality (Ratio of Inequality, Slope Index of Inequality and Relative Index of Inequality) were calculated to examine trends. The results highlight a clear and increasing social gradient in child welfare interventions in NI over time, particularly at the higher levels of intervention and those involving children aged 0–4 years. Routine analysis of children’s social care caseloads by deprivation is highlighted as a means of focusing attention on poverty and material inequality, prompting practitioners, managers and policy makers to consider the drivers of such inequality and how this might be addressed.
BMC Digital Health volume 2 , Article number: 60 ( 2024 ) Cite this article
Stigma surrounding substance use can result in severe consequences for physical and mental health. Identifying situations in which stigma occurs and characterizing its impact could be a critical step toward improving outcomes for individuals experiencing stigma. As part of a larger research project with the goal of informing the development of interventions for substance use disorder, this study leverages natural language processing methods and a theory-informed approach to identify and characterize manifestations of substance use stigma in social media data.
We harvested social media data, creating an annotated corpus of 2,214 Reddit posts from subreddits relating to substance use. We trained a set of binary classifiers; each classifier detected one of three stigma types: Internalized Stigma, Anticipated Stigma, and Enacted Stigma, from the Stigma Framework. We evaluated hybrid models that combine contextual embeddings with features derived from extant lexicons and handcrafted lexicons based on stigma theory, and assessed the performance of these models. Then, using the trained and evaluated classifiers, we performed a mixed-methods analysis to quantify the presence and type of stigma in a corpus of 161,448 unprocessed posts derived from subreddits relating to substance use.
For all stigma types, we identified hybrid models (RoBERTa combined with handcrafted stigma features) that significantly outperformed RoBERTa-only baselines. In the model’s predictions on our unseen data, we observed that Internalized Stigma was the most prevalent stigma type for alcohol and cannabis, but in the case of opioids, Anticipated Stigma was the most frequent. Feature analysis indicated that language conveying Internalized Stigma was predominantly characterized by emotional content, with a focus on shame, self-blame, and despair. In contrast, Enacted Stigma and Anticipated Stigma involved a complex interplay of emotional, social, and behavioral features.
Our main contributions are demonstrating a theory-based approach to extracting and comparing different types of stigma in a social media dataset, and employing patterns in word usage to explore and characterize its manifestations. The insights from this study highlight the need to consider the impacts of stigma differently by mechanism (internalized, anticipated, and enacted), and enhance our current understandings of how each stigma mechanism manifests within language in particular cognitive, emotional, social, and behavioral aspects.
Persons with substance use disorders (SUDs) can experience stigma in various forms, including stereotypes, prejudice, and discrimination, and this stigma can have far-ranging consequences for their health, employment, housing, and relationships [ 1 ]. Individuals experiencing stigma may internalize these negative beliefs and feelings, have diminished self-esteem and recovery capital [ 2 , 3 ], and be reluctant to seek treatment [ 4 ].
Interventions focused on stigma reduction in the context of substance use have been limited, and these have tended to focus on structural stigma (e.g., education of professionals that work with persons with SUDs) as opposed to social or self-stigma [ 5 ]. There is also awareness of the bias in words used to describe SUDs, and the need to consider word choice [ 6 , 7 ]. However, despite the potential harms of substance use stigma, our knowledge of how different types of stigma affect persons within the context of SUDs remains limited [ 5 , 8 , 9 , 10 , 11 , 12 ].
In this article, we demonstrate a stigma theory-informed deep learning approach to the task of identifying examples of substance use stigma in a large dataset. To ensure that we capture stigma in the diverse forms in which it occurs, we employ the Stigma Framework [ 13 ], which defines three stigma mechanisms for those who experience stigma: Internalized Stigma , Anticipated Stigma , and Enacted Stigma . The Stigma Framework has been used to characterize stigma processes in various health-related contexts, including problematic substance use [ 11 ] and HIV [ 13 ], and extant literature has sought to develop instruments to assess the experience of these three types of stigma [ 11 ]. To our knowledge, however, prior work has not explored how the three stigma mechanisms are conveyed by the language used in social media. We examine stigma as expressed in social media for two main reasons: 1) previous literature has shown that stigma relating to mental health is endemic in social media [ 14 , 15 ]; and 2) social media can serve an important role in understanding and promoting public health [ 16 , 17 ].
The current study aims to answer the research question: How do the three stigma mechanisms in the Stigma Framework manifest differently in terms of distribution and nature in social media? We take the following approach:
We develop classifiers to identify three stigma mechanisms in an annotated social media dataset and evaluate the performance of these classifiers.
To gain a deeper understanding of the prevalence of the three stigma mechanisms in social media at large, we analyze how each stigma mechanism is distributed in the predictions made by the classifiers on the unseen portion of our data.
To better understand the linguistic expression of the different stigma mechanisms in social media, we identify the highest-ranking features associated with each mechanism and offer illustrative examples.
Conceptualizations of stigma.
Goffman [ 18 ] influentially defined stigma as “an attribute that is deeply discrediting”, and which reduces the stigmatized “from a whole and usual person to a tainted, discounted one” (p. 3). Goffman described stigma as a product of interactions, and stated that “a language of relations, not attributes, is really needed to describe stigma” [ 18 ] (p.3). The relational nature of stigma was emphasized by subsequent stigma theory [ 19 , 20 ] that characterized stigma as a social process situated in a social context, with Link and Phelan [ 19 ] conceptualizing stigma as a convergence of labeling, stereotyping, separation, status loss, and discrimination, all within a power structure.
To complement existing societal-level conceptualizations of stigma with individual-level ones and create a more comprehensive theory of stigma and its impact, Earnshaw and Chaudoir [ 13 ] proposed the Stigma Framework. In this framework, which draws on stigma theory from a variety of domains [ 19 , 20 , 21 , 22 , 23 ], attention is given to both the mechanisms of stigma employed by those with power, and also the ways that stigma is experienced or adopted by stigmatized individuals. Earnshaw and Chaudoir distinguish three mechanisms employed by those who distance themselves from the “mark” of stigma: prejudice, stereotyping, and discrimination; and three mechanisms (hereafter primarily called “types”) for those who experience stigma: Internalized Stigma , Anticipated Stigma , and Enacted Stigma . Table 1 provides definitions and examples of each of the three types of experienced stigma, in the context of substance use, as defined in Smith et al. [ 11 ]. The stigma mechanisms identified by the Stigma Framework have been assessed in various health-related contexts and have been associated with physical, mental, and behavioral outcomes for those that experience stigma [ 11 , 24 , 25 ].
Despite the existence of different conceptualizations of stigma, there is much that we do not yet understand about stigma processes. In particular, there is a recognized need to more clearly define and characterize the nature of stigma [ 9 , 26 ]; to identify societal and individual-level factors affecting stereotyping, prejudice, and discrimination [ 12 ]; and to develop a more nuanced understanding of how different stigma mechanisms may affect substance use recovery [ 11 ]. In this study, we develop models to identify stigma in a large social media dataset for subsequent qualitative analysis intended to enhance our understanding of the complex interplay of the effects of stigma on the individual within their embedded contexts.
Although a multitude of computational models for the detection of abusive language and hate speech in social media texts has been proposed [ 27 , 28 ], the computational detection of social stigma has been less extensively explored. Whereas hate speech is commonly defined as a communicative act of disparagement of a person or group [ 29 ], the arguably broader concept of stigma can include, in addition to direct antagonism, more subtle and systematic forms of discrimination and distancing, of both others and the self [ 1 , 18 , 19 , 30 ]. Research on stigma detection in a variety of specific domains has been conducted, with works on the detection of depression stigma [ 14 ], mental health stigma [ 31 , 32 ], stigmatizing language in healthcare discussions [ 33 ], Alzheimer’s Disease stigma [ 34 ], schizophrenia stigma [ 35 ], and obesity stigma [ 36 ].
Li et al. [ 14 ] produce models for the detection of depression stigma in Mandarin Chinese Weibo posts. In their data, they find only 6% of the posts contain stigmatizing content; however, when training their model, the authors create a balanced corpus of texts (stigmatizing vs. non-stigmatizing). The researchers test logistic regression, multi-layer perceptron (MLP), support vector machine, and random forest classifiers trained in conjunction with a simplified Chinese version of Linguistic Inquiry and Word Count (LIWC) features [ 37 ]. The trained models detect stigmatizing posts and also classify each stigma-positive instance as an instance of one of three depression stigma sub-narratives (‘unpredictability’, ‘weakness’, or ‘false illness’), with the researchers finding best results when using random forest models.
Straton et al. [ 33 ] build a model for the detection of stigmatizing language in Facebook healthcare discussions around the topic of vaccination. In their annotated corpus of postings from anti-vaccination message walls, they find language stigmatizing government organizations and institutions, and in pro-vaccination message walls, they find language stigmatizing the anti-vaccination movement. Using a balanced dataset, the researchers use term frequency-inverse document frequency (TF-IDF) weighted n-grams and LIWC psychological features to train a variety of classifiers, with a convolutional neural network model resulting in the best performance.
Gottipati et al. [ 32 ] perform mental disorder stigma detection on a corpus of mental health-related news articles published by Singapore’s largest media organizations. The authors create an (approximately) balanced dataset of stigmatizing and non-stigmatizing news article titles paired with a sentence from the same article. The researchers create features from TF-IDF weighted n-grams and compare a variety of machine learning classifiers, finding best performance with XGBoost [ 38 ].
To develop a model for detecting stigmatizing language related to mental health, Lee and Kyung [ 31 ] create a corpus of 240 sentence pairs (stigmatizing and non-stigmatizing), entitled the Mental Health Stigma Corpus. The authors fine-tune a BERT-base model [ 39 ] to classify sentences as stigma-positive or stigma-negative and achieve promising results, though the synthetic nature of their dataset may raise questions with regard to its ability to generalize to real-world data. We summarize the results of the four stigma detection studies described here in Table 2 .
Although research on health-related stigma detection has been performed in a variety of domains, to our knowledge, all have treated stigma as a single monolithic concept. In this work, we incorporate the three stigma mechanisms (Internalized, Anticipated, and Enacted Stigma) of the Stigma Framework [ 13 ] to better differentiate between different types of stigma experiences, including identifying linguistic features which are most characteristic of each stigma type. For instance, the social media examples that we observed included stigmatizing language (“my sister is a hopeless alcoholic”), reports of stigmatization (“my husband took away the kids and said I’d never get clean”), and the experience of stigma (“I feel so much shame that I can’t tell anyone”).
Based on the effectiveness of BERT contextual embeddings, TF-IDF-weighted n-grams, and LIWC features for the purpose of stigmatizing language detection [ 14 , 31 , 33 ], we experiment with combinations of these resources. Given the prevalence of affect types such as sadness, anxiety, and fear in social media posts discussing experiences of substance use [ 40 ] and prior literature arguing that emotion regulation can be a factor in stigma coping [ 41 , 42 ], we also experiment with count-based features derived from extant affect lexicons and our own handcrafted stigma lexicons. These handcrafted lexicons incorporate affective, social, and behavioral concepts based on stigma theory, including anxiety, depression, and secretive behavior [ 5 , 9 ].
In this study, we employ classifiers to identify three different types of stigma in a social media dataset. We train and evaluate a set of models for each stigma type and then perform a mixed-methods analysis of the data identified by these models. A flowchart overview of our project is depicted in Fig. 1 .
Fig. 1 Project overview flowchart
Harvesting data.
To create our dataset, approximately 160 thousand English-language Reddit posts authored between January 1, 2013 and December 31, 2019 were collected using Pushshift.io [ 43 ]. To capture diverse manifestations of substance use stigma and stigma-related behaviors (including navigation of legality for users), we focused on three substances for this analysis: alcohol, cannabis, and opioids. We selected subreddits related to the three substances of interest (e.g., ‘r/stopdrinking’, ‘r/marijuana’, and ‘r/opiates’) and sampled only thread-initiating posts, as these posts often contain richer descriptions of Redditors’ experiences [ 44 ]. In our previous research [ 40 , 45 ], we found these subreddits contained detailed accounts of both substance use and SUD recovery. Table 3 provides a breakdown of post counts for each subreddit in the harvested Reddit data. Subreddits that allude to or mention recovery or support in subreddit titles, descriptions or rules are labeled with checkmarks.
We observed that posts containing explicit references to stigma were relatively uncommon. To increase the volume of relevant data for annotation and to support subsequent natural language processing, we employed the keyword sampling method used in Chen et al. [ 40 ] to build our annotated corpus. Only the posts that matched a regular expression containing a keyword list were sampled to increase the probability of sampling stigma-related content. The theory-informed keyword list, derived from stigma literature [ 10 , 11 , 24 , 25 ], includes terms with stigma-related connotations (such as ‘shame’, ‘disappoint’, and ‘untrustworthy’) and terms referring to the actors who may be involved in stigma-related experiences (‘family’, ‘co-worker’, ‘husband’). Over the course of the annotation process, this list of keywords was iteratively refined to increase the prevalence of stigma in samples. The final set of sampling keywords is listed in Table 4 . Additionally, subreddits that produced low yields for stigma content (e.g., r/alcohol, r/Petioles, r/trees) were removed from the candidates for annotation sampling. Table 5 shows the breakdown of post counts for each of the subreddits and the distribution of the three stigma types in the annotated dataset.
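The keyword-based sampling step can be illustrated with a minimal sketch. The six keywords below are the examples quoted above, not the full list in Table 4, and the regular expression itself (case-insensitive, anchored at the start of a word so that stems such as ‘disappoint’ also match ‘disappointed’) is an assumption rather than the pattern used in the study.

```python
import re

# Illustrative subset only: the study's full keyword list is in its Table 4.
KEYWORDS = ["shame", "disappoint", "untrustworthy", "family", "co-worker", "husband"]

# Assumption: match at a word start only, so stems also match inflected forms.
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    flags=re.IGNORECASE,
)

def sample_candidates(posts):
    """Return only the posts that contain at least one sampling keyword."""
    return [post for post in posts if KEYWORD_RE.search(post)]

# Synthetic posts for illustration.
posts = [
    "I was so disappointed in myself after last night.",
    "Day 30 and feeling great!",
    "My husband does not know I relapsed.",
]
candidates = sample_candidates(posts)
```

Here the first and third synthetic posts are retained as annotation candidates and the second is filtered out.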
Three annotators with expertise in informatics, natural language processing, nursing, and public health annotated a total of 2,214 Reddit posts at the span-level for three stigma types based on the Stigma Framework [ 13 ]: Internalized Stigma, Anticipated Stigma, and Enacted Stigma. We developed an annotation guide including definitions, synthetic examples, and instructions for identifying and distinguishing these three stigma types based on extant literature [ 11 , 46 ]. A detailed description of our annotation guidelines is provided as Additional file 1 .
Annotators independently identified passages containing stigma in the posts before discussing and reconciling the annotations. In addition to labeling stigma spans, annotators also labeled posts for substance type and the author’s recovery outlook (positive, neutral, or negative), and identified spans containing mentions of social isolation and labels (e.g., ‘addict’). Table 6 lists pairwise inter-annotator agreement for the three annotators at post level, prior to reconciliation, measured using Cohen’s Kappa [ 47 ]. Overall, pair-wise agreement on the stigma mechanisms reflected moderate agreement [ 48 ], with the highest agreement being for Internalized Stigma. Pair-wise agreement scores on all annotation types varied between 0.66 and 0.71, indicating substantial agreement.
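For readers unfamiliar with the agreement measure, the following sketch computes Cohen’s Kappa for one pair of annotators from post-level labels. The label lists are invented for illustration and are not drawn from the annotated corpus.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical post-level labels (1 = stigma type present, 0 = absent).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 8 of 10 posts, chance agreement is 0.52, and kappa is therefore (0.80 - 0.52) / (1 - 0.52).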
In the annotated corpus, we observed that Reddit posts ranged in length from 28 characters to 25,743 characters, with a mean length of 1,816 characters (Fig. 2 ). As many posts exceed the 512-token input length limit of the RoBERTa encoder [ 49 ] that we use in our detection model, we opt to chunk posts into text segments. We use the term ‘segment’ to refer to the chunks of text used as inputs to our classifiers, and we use ‘span’ to refer to passages of text within posts labeled by annotators. We map the annotated span labels onto the segments, and then use the labeled segments to train our models. When the trained models make predictions, they first make predictions on individual segments before we map these predictions back to the post level, where, if any segment within a post is predicted as stigma-positive, the entire post is then predicted to be stigma-positive.
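The mapping from segment-level predictions back to the post level amounts to a logical OR over the segments of each post. A minimal sketch, with hypothetical post identifiers and predictions:

```python
def post_level_predictions(segment_predictions):
    """Map (post_id, is_positive) segment predictions to one label per post.

    A post is stigma-positive if any of its segments is predicted positive.
    """
    post_labels = {}
    for post_id, is_positive in segment_predictions:
        post_labels[post_id] = post_labels.get(post_id, False) or bool(is_positive)
    return post_labels

# Hypothetical segment predictions for two posts.
segment_preds = [("post_a", 0), ("post_a", 1), ("post_b", 0), ("post_b", 0)]
post_preds = post_level_predictions(segment_preds)
```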
Fig. 3 Architecture of the hybrid model
Although segmenting posts solves the input limitation issue, this also increases class imbalance in our dataset. In our annotated corpus, we find that within individual posts, the stigma-positive spans can be infrequent, with multi-paragraph posts sometimes only containing a few stigma-positive words. As a result, when we split the Reddit posts into smaller units (such as sentences), we produce far more negative examples than positive ones, and the portion of stigma-positive texts in our corpus decreases (Table 7 ). When splitting posts down to the level of sentences, we see severe class imbalance, with only 1.69% of the data containing Enacted Stigma.
Class imbalance can result in classifiers which perform well for the majority class, but poorly for the minority class [ 50 , 51 ]. To mitigate class imbalance, we experimented with a variety of segmentation lengths, and found the best performing length to be approximately 600 characters. At this length, text segments seem to be short enough to mitigate the amount of irrelevant information (features unrelated to stigma), but they also remain lengthy enough to keep the imbalance of classes from becoming severe.
To build segments from our post data, we begin by splitting all posts into sentences using Natural Language Toolkit (NLTK) 3.5 [ 52 ]. We then join the resulting sentences in the order they appear in the post until the threshold value of 600 characters in length is reached, after which, a new segment is started. We do not split sentences, and thus segments vary in length. After segmenting texts, labels are assigned to segments by checking for overlap between segment spans and annotation spans. The texts are then pre-processed by removing URLs, hyperlinks, and other HTML-related text residue.
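The segmentation and label-mapping procedure can be sketched as follows. This is an illustration under stated assumptions rather than the study code: a simple regular expression stands in for the NLTK sentence tokenizer, a segment is closed as soon as its length reaches the threshold, and all function names are hypothetical.

```python
import re

def split_sentences(text):
    """Return (start, end) character offsets of sentences.

    Regex stand-in for a real sentence tokenizer such as NLTK's.
    """
    spans, start = [], 0
    for match in re.finditer(r"[.!?]+\s+", text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans

def build_segments(text, threshold=600):
    """Join whole sentences in order; close a segment once it reaches threshold."""
    segments, seg_start = [], None
    for sent_start, sent_end in split_sentences(text):
        if seg_start is None:
            seg_start = sent_start
        if sent_end - seg_start >= threshold:
            segments.append((seg_start, sent_end))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(text)))  # trailing, shorter segment
    return segments

def label_segments(segments, annotation_spans):
    """A segment is positive if it overlaps any annotated (start, end) span."""
    return [
        any(seg_start < ann_end and ann_start < seg_end
            for ann_start, ann_end in annotation_spans)
        for seg_start, seg_end in segments
    ]

# Synthetic post: 40 sentences of 22 characters each (880 characters).
post = "I feel so much shame. " * 40
segments = build_segments(post)
labels = label_segments(segments, annotation_spans=[(620, 640)])
```

Because sentences are never split, the first segment here ends at character 616 rather than exactly 600, and only the second segment overlaps the hypothetical annotation span.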
To identify Reddit posts in the harvested data that have a high probability of containing reports and instances of substance use stigma, we create binary classifiers for each stigma type: Internalized Stigma, Anticipated Stigma, and Enacted Stigma. Because each segment of input text may be stigma-positive for multiple stigma types, we treat this classification task as a set of independent binary classification tasks rather than a single multi-class classification task.
We utilize a RoBERTa encoder [ 49 ] as the main component of the classifier, and also make use of n-gram features, features derived from affective and psychological lexicons, and handcrafted features to enrich the model with external knowledge relevant to the task. To integrate RoBERTa embeddings with the additional features, we use a hybrid model (Fig. 3 ) based on Prakash et al. [ 53 ], where the first stage is MLP pre-training. The MLP is pre-trained on a concatenated vector of TF-IDF weighted n-grams, features derived from the NRC Emotional Intensity Lexicon [ 54 ], features derived from Wordnet-Affect [ 55 ], features generated from the LIWC 2015 lexicon [ 37 ], and handcrafted substance use stigma features.
Fig. 2 Histogram of post character length
After pre-training is complete, the trained MLP weights are used along with a pre-trained RoBERTa encoder in the fine-tuning process. The <s> token output of the RoBERTa encoder and the MLP output are normalized and then concatenated before being passed to an MLP classifier head, which outputs the probability that a given sequence of text contains the current type of substance use stigma.
When building input to the MLP component of the classifier, we create the following feature sets:
To create TF-IDF features, we remove English stop words from the text using the NLTK 3.5 package, and then use Scikit-learn 1.8 [ 56 ] to create TF-IDF weighted n-grams in the range (2, 6) with a dimensionality of 10,000.
We include NRC features [ 54 ] to take advantage of the scaled emotional intensity scores that the NRC lexicon provides. We use the NRC Emotional Intensity Lexicon to generate 10-dimensional intensity-scaled affect features (with each dimension corresponding to one of the concepts listed in Table 8 ). To produce feature vectors, we follow the method of Babanejad et al. [ 57 ], who create ‘EAISe’ representations (Emotion Affective Intensity with Sentiment Features) for their sarcasm detection model.
Wordnet-Affect [ 55 ], developed based on Wordnet 1.6 [ 58 ], enabled us to incorporate finer-grained affect types. Based on literature relating to substance use, stigma, and emotion and an examination of our Reddit corpus, we identified 13 Wordnet-Affect concepts that were relevant to substance use stigma (Table 8 ) and constructed lexical sets around each of the 13 Wordnet-Affect concepts using Wordnet. Using these sets, we generate 13-dimensional feature vectors using the same method that we use to build our NRC vectors.
Linguistic, grammatical, and psychological features are generated using LIWC 2015 software [ 37 ]. We remove the ‘word count’ feature and retain all others, resulting in a 92-dimensional vector.
We create handcrafted lexicons (identified as ‘INT’, ‘ANT’, and ‘ENA’) to capture affective, behavioral, and social concepts related to each stigma type. These lexicons were developed through examination of TF-IDF weighted n-gram chi-square rankings for the training data, identification of recurring concepts in the stigma-positive examples of the training data that corresponded to concepts from stigma literature and survey instruments [ 10 , 11 , 24 , 25 , 46 , 59 ], and iterative building and evaluation of lexical sets for each concept using a validation set. For Anticipated Stigma, an associated behavior such as concealment [ 25 ] is included in the ‘secrecy’ concept through keywords such as ‘sneak’, ‘hid’, or ‘throwaway’ (used in mentions of ‘throwaway’ Reddit accounts created to preserve anonymity). The six concepts included in each feature set are listed in Table 8 , and the complete list of keywords included in each concept is listed in Additional file 2 . To create 6-dimensional feature vectors, we start with a vector of zeros. We then search text segments for each of the words in our lexical sets. If a lexicon word is present, we add ‘1’ to the concept dimension associated with the word.
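The count-based construction of a handcrafted feature vector can be sketched as below. The lexicon shown is hypothetical and shortened to three concepts for brevity: only the ‘secrecy’ keywords ‘sneak’, ‘hid’, and ‘throwaway’ come from the text, the other concepts and keywords are placeholders, and whole-token matching is an assumption about how lexicon words are searched for.

```python
import re

# Hypothetical, shortened lexicon. The real lexicons have six concepts each
# (Table 8) and full keyword lists (Additional file 2).
ANT_LEXICON = {
    "secrecy": {"sneak", "hid", "throwaway"},  # keywords quoted in the text
    "fear": {"afraid", "scared"},              # placeholder concept
    "judgment": {"judge", "judged"},           # placeholder concept
}

def lexicon_features(segment, lexicon):
    """Start from zeros; add 1 to a concept for each of its words present."""
    tokens = set(re.findall(r"[a-z']+", segment.lower()))
    return [len(tokens & words) for words in lexicon.values()]

# Synthetic segment for illustration.
features = lexicon_features(
    "I hid the bottles and made a throwaway so nobody would judge me.",
    ANT_LEXICON,
)
```

For the synthetic segment, two ‘secrecy’ words and one ‘judgment’ word are present, so the resulting vector is [2, 0, 1].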
After building all feature vectors, we separately normalize each set of features, then concatenate them to form a 10,121-dimensional input vector.
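The input dimensionality follows from the feature sets described above: 10,000 (TF-IDF) + 10 (NRC) + 13 (Wordnet-Affect) + 92 (LIWC) + 6 (handcrafted) = 10,121. The sketch below checks this arithmetic and shows per-set normalization followed by concatenation. L2 normalization is an assumption, since the normalization scheme is not specified here, and the function names are hypothetical.

```python
import math

# Dimensions of each feature set as stated in the text.
FEATURE_DIMS = {
    "tfidf": 10_000,
    "nrc": 10,
    "wordnet_affect": 13,
    "liwc": 92,
    "handcrafted": 6,
}

def l2_normalize(vector):
    """Scale a vector to unit length (assumed scheme); leave zero vectors as-is."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

def build_input_vector(feature_sets):
    """Normalize each feature set separately, then concatenate in a fixed order."""
    combined = []
    for name, dim in FEATURE_DIMS.items():
        vector = feature_sets[name]
        if len(vector) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(vector)}")
        combined.extend(l2_normalize(vector))
    return combined

# Dummy feature sets of the right sizes.
dummy = {name: [1.0] * dim for name, dim in FEATURE_DIMS.items()}
input_vector = build_input_vector(dummy)
```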
Training sets are sampled from our segment-level data and contain a mixture of stigma-positive and stigma-negative texts. In development, the best results for MLP and hybrid models were found when using a training set with a negative-to-positive ratio of 3:1, and we use this ratio to train our final hybrid models. Our validation and test sets are randomly sampled from 10% of the post-level data. After a set of Reddit posts is sampled, the constituent segments are retrieved and used as the evaluation set.
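One way to obtain a 3:1 training set is to keep every stigma-positive segment and randomly undersample the negatives. The sketch below illustrates this under that assumption; the exact sampling procedure is not detailed here, and the function name and seed are hypothetical.

```python
import random

def sample_training_set(segments, neg_per_pos=3, seed=0):
    """Keep all positives and undersample negatives to neg_per_pos : 1.

    `segments` is a list of (text, label) pairs with label 1 or 0.
    """
    positives = [s for s in segments if s[1] == 1]
    negatives = [s for s in segments if s[1] == 0]
    rng = random.Random(seed)
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    sample = positives + rng.sample(negatives, n_neg)
    rng.shuffle(sample)
    return sample

# Synthetic segment-level data: 10 positives and 100 negatives.
segments = [(f"pos-{i}", 1) for i in range(10)] + [(f"neg-{i}", 0) for i in range(100)]
train = sample_training_set(segments)
```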
We train all models on a single Tesla A100 GPU on the Google Colab platform. Training is implemented using Pytorch 1.12 [ 60 ] and the Huggingface library [ 61 ]. We pre-train our MLP for 30 epochs using the AdamW optimizer with a learning rate of 5.e-5 (controlled by a learning rate scheduler) and a batch size of 32. We determine the optimal threshold for positive class F1 after each training epoch using a precision-recall curve on the validation set. The best model is checkpointed based on positive class F1 performance.
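Threshold selection on the validation set can be sketched as a search over candidate thresholds for the one that maximizes positive-class F1. The version below tries each distinct predicted score as a threshold, which covers the same candidates as a precision-recall curve. The scores and labels are invented for illustration.

```python
def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximizing positive-class F1.

    A segment is predicted positive when its score is >= threshold.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold, best_f1

# Hypothetical validation labels and predicted probabilities.
y_val = [0, 0, 1, 0, 1, 1]
p_val = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
threshold, f1 = best_f1_threshold(y_val, p_val)
```

With these toy values the selected threshold is 0.35, which recovers all three positives at the cost of one false positive (F1 = 6/7).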
During fine-tuning, we fine-tune cased RoBERTa-base (123 million parameters) for 10 epochs with a learning rate of 5.e-5 and batch size of 32. We also experiment with the cased RoBERTa-large encoder (354 million parameters), and when fine-tuning RoBERTa-large, we train for 10 epochs with a learning rate of 7.e-6 and a batch size of 32. Less than 15 min of GPU time were required to train a single hybrid model.
As we sought to identify the stigma-positive Reddit posts within the unseen harvested Reddit data, we evaluate each model’s predictions at the post-level by mapping segment predictions to each post. We compare the performance of models by reporting the mean macro F1 score of five runs on the same data, using different random seeds. We list results from variations of hybrid models utilizing different sets of features. As a baseline for comparison to the hybrid models, we list results using RoBERTa-base and RoBERTa-large with a simple classifier head, trained on a balanced training set (via undersampling), and using the same threshold moving method as used in our hybrid model.
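As a reminder of how the reported metric is formed, macro F1 is the unweighted mean of the per-class F1 scores, and the reported value is the mean of this score over runs. A minimal sketch with invented post-level predictions:

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that occur."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

def mean_macro_f1(y_true, runs):
    """Mean macro F1 over several runs (e.g., different random seeds)."""
    return sum(macro_f1(y_true, y_pred) for y_pred in runs) / len(runs)

# Hypothetical post-level gold labels and predictions from two runs.
gold = [1, 1, 0, 0]
run_predictions = [[1, 0, 0, 0], [1, 1, 0, 0]]
score = mean_macro_f1(gold, run_predictions)
```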
Improvements over the RoBERTa-only baselines are considered significant at a significance level (α) of 0.05 according to McNemar’s test [ 62 ] with false discovery rate (FDR) correction [ 63 ]. McNemar’s significance test has been considered appropriate for binary classification tasks [ 64 ]; thus, we employ it on the predictions of the paired models. Because we make multiple hypothesis tests in our comparisons, FDR correction is applied to p -values.
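The significance testing procedure can be illustrated with a short sketch. Two assumptions are made that the text does not confirm: McNemar’s test is computed in its chi-square form with continuity correction, and the FDR correction is the Benjamini-Hochberg procedure. The predictions and p-values below are invented.

```python
import math

def mcnemar_p_value(y_true, pred_a, pred_b):
    """McNemar chi-square test (1 degree of freedom) with continuity correction."""
    # b: model A correct and model B wrong; c: model A wrong and model B correct.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    if b + c == 0:
        return 1.0  # the two models never disagree in correctness
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2.0))

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical paired predictions: the hybrid is right on 10 posts the baseline misses.
gold = [1] * 10
hybrid, baseline = [1] * 10, [0] * 10
p_single = mcnemar_p_value(gold, hybrid, baseline)

# Hypothetical raw p-values from four model comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.05 for p in adjusted]
```

In this toy example only the first comparison remains significant at α = 0.05 after correction, although three of the four raw p-values fall below 0.05.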
To explore each feature set’s potential for use in stigma detection, we also considered the results of MLP evaluation on single feature sets and set combinations. We use an MLP for this comparison rather than a hybrid model since in the hybrid models, redundancies in the information encoded by feature set combinations and the information encoded by RoBERTa can make the relative performance contribution of each feature set difficult to disentangle. We also perform exploratory feature ranking of all features using the chi-square measure to explore the strength of association between each feature and its relevant stigma type. The feature selection tools of the Scikit-learn package were used to implement this experiment [ 56 ].
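The chi-square feature ranking can be illustrated with a small pure-Python sketch. It follows the common formulation for non-negative count features, in which the observed per-class feature totals are compared with the totals expected from the class frequencies. The authors used Scikit-learn's feature selection tools; this re-implementation, its function name, and the toy data are illustrative assumptions only.

```python
def chi2_scores(X, y):
    """Chi-square score per feature for non-negative count features.

    X: list of rows, one list of feature counts per text.
    y: list of class labels, one per text.
    """
    n = len(y)
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for cls in classes:
            class_prob = sum(1 for label in y if label == cls) / n
            observed = sum(row[j] for row, label in zip(X, y) if label == cls)
            expected = class_prob * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


# Hypothetical features: one affective term and one function word.
feature_names = ["shame", "the"]
X = [[2, 1], [1, 1], [0, 1], [0, 1]]
y = [1, 1, 0, 0]  # 1 = stigma-positive, 0 = stigma-negative

scores = chi2_scores(X, y)
ranking = sorted(zip(feature_names, scores), key=lambda kv: kv[1], reverse=True)
```

A feature that occurs equally often in both classes receives a score of zero, whereas a feature concentrated in the stigma-positive class ranks higher.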
Last, we perform an error analysis of the hybrid model’s predictions. This evaluation not only informs future improvements on our approach, but also provides insights into difficulties that arise in the perception and experience of stigma.
Mixed-methods research can address questions that cannot be answered using a single method. Though there is controversy concerning what constitutes mixed-methods research, integrating quantitative and qualitative approaches is considered increasingly important, and extant literature has observed and demonstrated that the definition of mixed-methods research will continue to grow [ 65 , 66 ]. In this study, we leverage both quantitative and qualitative methods for various affordances identified by Doyle et al. [ 66 ], including triangulation, completeness, and illustration of data.
We performed a mixed-methods analysis to: 1) estimate the amount of stigma in the larger social media data store; and 2) characterize the nature of the different stigma mechanisms. First, we characterized the presence of stigma in the unseen portion of the harvested Reddit data by examining patterns in the distribution of stigma predictions with respect to substance and subreddit, and the correlations between stigma type predictions. We employ chi-square tests to compare the presence of the stigma mechanisms in the three substances. As a chi-square test of independence on its own merely shows that there is an association between two nominal variables and does not show which cells are contributing to the lack of fit [ 67 , 68 ], we calculated standardized Pearson residuals. A standardized Pearson residual exceeding two in absolute value in a given cell indicates a lack of fit [ 67 , 68 ]. Second, we considered the feature rankings and the instances of predicted stigma in the test data in concert to illustrate how the three types of stigma concretely manifest in cognitive and emotional processes, social interactions, and behaviors in everyday life. To protect the identities of the posters, we employ synthetic quotes in our illustration [ 69 ].
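The residual computation can be sketched as follows, using the usual definition of the standardized Pearson residual, (O - E) / sqrt(E (1 - row total / n)(1 - column total / n)). The function name and the counts in the table are hypothetical and are not taken from the study.

```python
from math import sqrt


def standardized_residuals(table):
    """Standardized Pearson residuals for an r x c contingency table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            variance = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            out_row.append((observed - expected) / sqrt(variance))
        residuals.append(out_row)
    return residuals


# Hypothetical counts. Rows: two substances.
# Columns: posts with and without a given stigma type.
table = [[30, 70], [10, 90]]
residuals = standardized_residuals(table)

# Cells whose residual exceeds two in absolute value indicate a lack of fit.
flagged = [(i, j) for i, row in enumerate(residuals)
           for j, r in enumerate(row) if abs(r) > 2]
```

The same function applies unchanged to the larger substance-by-stigma-mechanism tables described above.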
Model performance and evaluation
Overall model performance
Table 9 lists the results of post-level stigma detection for the three stigma types. For all three stigma types, we found hybrid models that significantly outperformed their respective RoBERTa-only baselines, with the largest gain observed for the Anticipated Stigma RoBERTa-large hybrid model using only the handcrafted stigma features (+ 7.08 F1). These results provide evidence that n-gram, affective, behavioral, and social features can be combined with contextual embeddings to improve substance use stigma detection.
In the results of MLP evaluation (Table 10 ), the handcrafted lexicons (STIG) appeared to be relatively effective resources for the task of stigma detection, and the other feature sets (NRC, WNA, and LIWC) also appeared to be viable resources (to varying degrees). For individual feature sets, the handcrafted stigma lexicons appeared to provide the best results for Internalized Stigma and Anticipated Stigma, whereas LIWC provided the best results for Enacted Stigma. For feature set combinations, adding feature sets usually led to improvement for MLP models (with some exceptions), although the combination of all features only outperformed the handcrafted stigma lexicons for the case of Enacted Stigma.
The results in Tables 9 and 10 show that, overall, scores for Internalized Stigma are higher than for the other stigma types; Internalized Stigma was the most frequent of the three stigma types in the annotated corpus (making it the stigma type with the greatest number of examples). When performing exploratory feature ranking of all features (Table 11 ), count-based features had stronger associations (higher chi-square scores) with Internalized Stigma than they did with the other stigma types. Affective concepts such as ‘shame’ and ‘guilt’ had strong relationships with Internalized Stigma, which likely benefitted performance.
Overall performance for Anticipated and Enacted Stigma was weaker than for Internalized Stigma. There may be a number of reasons for this. First, Anticipated and Enacted Stigma had fewer examples and relatively weaker associations with count-based features in comparison with Internalized Stigma. For Enacted Stigma, the highest-ranking features were labels such as ‘alcoholic’ and ‘junkie’, which were fairly common in the entire corpus. Labeling terms such as ‘alcoholic’ may be used to enact stigma, but they may also be used to express membership in recovery groups and are a part of ‘recovery dialects’ used within such groups [ 2 ]. Moreover, labeling terms may also be appropriated by members of stigmatized groups to increase perceptions of power for the stigmatized individual or group [ 70 ]. The variety of motivations behind the uses of labeling terms such as ‘junkie’ may be a limiting factor to their viability as features for stigma detection.
Another potential factor for the weaker performance for Anticipated and Enacted Stigma may be their social nature. Whereas Internalized Stigma frequently focused on a single entity (the post author), with feature rankings showing strong relationships with inward features (n-grams such as ‘i ashamed’), both Anticipated and Enacted Stigma involved other actors. With Anticipated Stigma, the highest-ranking features involved concealment of use (ANT_secrecy) and other actors (ANT_social), as post authors were concerned about concealing their use from others. With Enacted Stigma, there was a wide variety of actors involved in the relationships between the stigmatizer and the person(s) being stigmatized (e.g., ‘family to partner’, ‘partner to society’, ‘co-workers to society’). Further, while Internalized Stigma frequently focused on the act of shaming oneself, Enacted Stigma involved a more diverse set of verbs/actions through which stigma was performed (e.g., disapproving looks, expressions of distrust, arrests, searches, evictions, insults, generalizations, coerced drug tests, denial of healthcare services, termination of employment, termination of personal relationships). Many of the verbs related to these stigmatizing actions were included in the ENA_stigmatizing_actions and ENA_trust features, which ranked second and third, respectively, in the feature ranking.
Model performance by stigma type followed a similar pattern to that of inter-annotator agreement across stigma types (Table 6 ), in which annotators reached the highest agreement on Internalized Stigma and lower agreement on Anticipated and Enacted Stigma. The complexities involved in identifying these two stigma types seemed to be a challenge for both the human annotators and the models.
We provide an error analysis of the Anticipated and Enacted Stigma models to gain insights into the challenges involved in detecting these stigma types. We give synthetic quotes based on our data to demonstrate error types, with features typical of Anticipated or Enacted Stigma texts bolded.
We observed that both the Anticipated and Enacted Stigma hybrid models produced false positives for texts which do not match the temporal requirements of their respective stigma type (future for Anticipated Stigma, present or past for Enacted Stigma). The following example, a false positive for Enacted Stigma due to temporal mismatch, is representative of this error type:
If I come clean, my family will disown me – that isn’t even an option.
For the RoBERTa-only baseline models, this error type was noticeably less frequent. This may be a limitation of the use of count-based features in the hybrid models, as the model may be weighting keywords such as disown more heavily than the tense-related syntactic information that has been shown to be encoded by BERT [ 71 ].
During annotation, we observed that individuals abstaining from substance use were pressured by persons who engaged in substance use, often in the context of alcohol use when it is normalized in home or work-related settings. Though this behavior was not annotated as stigma, when it appeared in texts, it led to false positive predictions by both the baseline and hybrid models, and is exemplified by the following excerpt:
I told my mother I quit drinking and she laughed at me. I quit in May and have avoided telling my family because they drink a lot and I didn't want to put up with the questions or judgement .
In examples like this, the model seems to leverage features relevant to stigma ( she laughed at me , judgement ) while failing to learn cues that indicate the mother is an alcohol user critical of another user’s abstinence.
Both the baseline and hybrid models for Anticipated and Enacted Stigma were prone to produce false positives for texts where typical features of stigma are present, but the motivation behind an action potentially construed as stigmatizing is unrelated to stereotyping, prejudice, or discrimination. In the following example, a partner appears to terminate a relationship due to apathetic behavior rather than stigma, and thus should be labeled as stigma-negative:
I struggled for a long time with the sadness that comes with addiction, so the feelings of apathy that followed it seemed like a relief. Eventually, they resulted in my partner breaking up with me.
Although BERT models have been demonstrated to encode information that can be leveraged to make predictions about causality [ 72 ], interpreting the motivations behind the actions described in texts can be a difficult task even for human judgement. We further discuss this issue in our limitations section.
To better understand how the three stigma mechanisms outlined in the Stigma Framework manifest within our social media dataset, we employed the classifiers to identify instances of the stigma types in the previously unexplored portion of our collected Reddit data ( n = 161,448). The distribution of stigma predictions across subreddits is presented in Table 12 . Overall, the proportion of stigma-positive predictions for each type was noticeably lower than the proportion seen in the annotated data (Table 5 ). This outcome aligns with expectations, given that: 1) keyword sampling was used to increase the proportion of stigma in the annotated data; and 2) in the unexplored data, a larger portion of posts originated from subreddits focused on general substance use, rather than on support or recovery. In both the predictions and annotations, we observed that, for all three substance types, the estimated stigma proportion was highest for support-focused subreddits, where posters often described challenging experiences relating to their attempts at recovery.
With respect to alcohol and cannabis, Internalized Stigma appeared to be the most common of the three stigma types. The focus on the self makes intuitive sense given the first-person viewpoint of social media narratives, and the prominent features of Internalized Stigma (Table 11 ) suggest that these data could serve as a rich source for future research on how individuals may seek to internally reconcile the cognitive and emotional aspects of shame and guilt that accompany Internalized Stigma.
However, in the case of opioids, we observed a higher frequency of Anticipated Stigma compared to Internalized Stigma. Chi-square tests examining the presence of the three stigma mechanisms in the three substances, with the standardized Pearson residual for Anticipated Stigma x Opioids, also confirm that the observed presence of Anticipated Stigma exceeds the expected in that case (see Additional file 3 ).
We also explored the extent to which the stigma mechanisms co-occurred in the data. Figure 4 shows a Pearson correlation matrix between stigma labels for text segments in the annotated data (left) and also for the predictions on the unseen data (right). The largest correlation score is a value of 0.11 between Internalized and Anticipated Stigma (in the annotated data), indicating that text segments with multiple stigma labels are relatively infrequent in the annotated data. Although we observed some concepts were shared across stigma types in the feature rankings, such as labeling terms (e.g., ‘addict’), the relatively low correlation between paired stigma types illustrates the utility of developing separate models for each stigma type. Furthermore, this underscores the potential utility of the three stigma types distinguished by the Stigma Framework [ 13 ] for future research in clarifying the mechanisms by which stigma can affect persons with SUDs.
Fig. 4 Pearson correlation between stigma types for text segments in the annotated dataset (left) and the predictions on the unseen data (right)
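The co-occurrence analysis amounts to computing Pearson correlations between binary label vectors, one vector per stigma type with one entry per text segment. A minimal sketch with hypothetical labels follows; the function name and the label values are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical binary labels for eight text segments (1 = type present).
labels = {
    "internalized": [1, 1, 0, 0, 1, 0, 0, 0],
    "anticipated": [1, 0, 0, 0, 0, 1, 0, 0],
    "enacted": [0, 0, 1, 0, 0, 0, 0, 1],
}

names = list(labels)
matrix = {a: {b: pearson(labels[a], labels[b]) for b in names} for a in names}
```

Values near zero off the diagonal indicate that two stigma types rarely co-occur in the same segment beyond what chance would produce.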
To characterize the nature of stigma as manifested in social media, we consider the feature rankings associated with each stigma type, along with the instances of stigma in the test data. Figure 5 depicts the concepts from the handcrafted stigma lexicons that were among the highest-ranking features for each stigma type, along with synthetic examples. Among the posts associated with Internalized Stigma, we observed an abundance of affective content (shame, self-blame, and despair). Our examination of the test data further shows that posts containing shame and self-blame also often involved the poster using self-deprecating language (in the form of pejoratives) and labels to describe themselves, and expressing feelings of weakness and perceptions of failure.
Fig. 5 Conceptual differentiation of stigma types. All examples are synthetic quotes that resemble the phenomena and sentiment observed in the data
For Anticipated and Enacted Stigma, emotion was still important, but social and behavioral features were also prominent (e.g., ANT_social, ENA_stigmatizing_actions). The ‘ANT_social’ lexicon includes possible members of a user’s social circle (e.g., ‘parents’, ‘partner’, ‘friend’). Since, by definition, Internalized Stigma is focused on the self, Anticipated Stigma on one’s expectation of how one is perceived by others, and Enacted Stigma on stigmatizing behavior, these associations make intuitive sense. The social media data highlight additional features tied to Anticipated Stigma, such as secretive behavior, concern over how one is perceived, and a fear of disappointing others. Notably, the theme of concealment, especially from close relations like family members, partners, or employers, is prominent in the Anticipated Stigma texts (as exemplified in examples 3–5 in Fig. 5 ).
Enacted Stigma often involved the use of labels to describe another person, and as seen in the final two examples of Fig. 5 , the usage of these terms can be descriptive (‘He is always drunk’) or judgmental (‘down-and-out junkies’). Stigmatizing actions related to judging, disparaging, or confronting others figured prominently in this type of stigma, and could involve many different pairs of stigmatizer and stigmatized persons (e.g., parent–child, child–parent, friends, partners, co-workers, and the poster feeling stigmatized by the public, people, or society at large). Features related to trust also ranked highly for Enacted Stigma, corresponding to previous stigma research which identified ‘untrustworthiness’ as a common stereotype espoused by users’ family members [ 24 ].
Other phenomena to consider were instances in which multiple stigma types were present. The third text in Fig. 5 exemplifies a common scenario for the pairing of Internalized Stigma and Anticipated Stigma, with posters expressing reticence to interact with others due to their own shame. Text segments containing all three stigma types were relatively rare in the annotated corpus (0.78% of all stigma-positive segments), though the fifth example in Fig. 5 illustrates an instance where an author appears to negatively judge persons experiencing SUDs, describe concealment of their own use, and express internal guilt for their use, all within a relatively brief sequence of text.
Similar to Straton et al. [ 33 ], we observed that the LIWC categories for emotional tone and clout showed fairly strong relationships with stigma; however, we observed a limited relation to stigma for the remaining 90 LIWC categories. The clout feature, derived from ratios of personal pronoun frequencies, is based on Kacewicz et al. [ 73 ], who found that high-status authors consistently used more 1st person plural (e.g., ‘we’, ‘our’) and 2nd person singular (‘you’) pronouns, whereas low-status authors were more frequently self-focused and used more 1st person singular pronouns (‘I’, ‘me’). This may explain the effectiveness of the clout feature for predicting Internalized Stigma (low clout scores appeared to be indicative of Internalized Stigma), which is heavily focused on inner experiences, with heavy use of 1st person pronouns. The LIWC emotional tone feature [ 74 ] calculates the difference between positive emotion word count and negative emotion word count, with higher scores indicating greater overall positivity. The generally negative emotional content of stigma-positive texts is a likely factor for the high ranking of the tone feature for all three stigma types.
In this study, our objective was to investigate how the three different stigma mechanisms in the Stigma Framework manifest differently in terms of distribution and nature in a social media dataset. Through an analysis of feature rankings, the distribution of predictions, and specific instances of stigma in our data, we discerned distinct patterns across Internalized, Anticipated, and Enacted Stigma. Furthermore, we characterized the language used to convey and describe each of these three mechanisms.
In terms of the distributions of the three stigma mechanisms, we observed that Internalized Stigma was the most prevalent stigma type with respect to alcohol and cannabis. However, in the case of opioids, Anticipated Stigma was more frequent than Internalized Stigma. Though these patterns were only observed in a single dataset and further exploration of the presence of different stigma mechanisms in other data is needed, it is worthwhile to consider these findings in the context of the larger societal concern about opioid use. Extant literature emphasizes that great care must be used in crafting public health messaging concerning opioid addiction due to the potential for increased stigmatization of those who use opioids [ 75 ]. The social environment surrounding opioid use appears to lead to greater anticipation of stigma and a tendency to conceal behavior, compared to the environments surrounding cannabis and alcohol. Thus, it may be important to focus on the portrayal of opioid use, anonymous forms of support, and an emphasis on support for interpersonal interactions in the context of opioid use.
Additionally, our study considered the nature of language used to express stigma as it manifests in social media. This exploration not only confirms that language is a powerful vehicle for expressing stigma, as established in prior literature [ 2 ], but also illuminates the nuanced relationship between word usage and specific stigma types, and the pivotal roles of affect, social perceptions, personal interactions, and behavior in the expression of stigma in social media. In the social media data, we found that Internalized Stigma is predominantly characterized by emotional content, with a focus on shame, self-blame, and despair. In contrast, Enacted Stigma and Anticipated Stigma involve a complex interplay of emotional, social, and behavioral features. The former encompasses stigmatizing behaviors and issues of trust, while the latter centers on expectations of external perceptions and the fear of disappointing others. For Anticipated Stigma, the feature analysis demonstrated that issues of concealment were prominent, along with the presence of close interpersonal relationships.
Insights from this study can serve as priorities in the design of stigma reduction interventions. For example, the high-ranking features from the Enacted Stigma lexicon include both stigmatizing actions such as confronting and blaming, as well as indicators of trust (e.g., expressed as disappointment, suspicion, or a lack of respect for privacy). In future intervention development, the integration of components addressing these core issues is critical.
Overall, our findings improve our understanding of stigma mechanisms in social media discourse and could also inform the development of targeted interventions that address the challenges of those affected by stigma. Furthermore, the adaptability of our lexicons to stigma research in other contexts, such as HIV/AIDS or disordered eating, where similar emotions, behaviors (e.g., hiding, concealment), and attitudinal constructs such as trust [ 24 , 76 ] are at play, holds promise for broader applications beyond substance use.
Although the purposive sampling used in this study allowed us to develop a sufficient corpus of stigma-positive texts within a reasonable amount of time, our sampling method may also be viewed as one of the study’s limitations. Because we sampled from a limited set of subreddits focused on substance use, our detection model may not generalize to other types of texts. Additionally, since keyword matching enrichment was used during the sampling process, the distribution of texts in our corpus differs from that of the substance recovery subreddits from which they were sampled. When making predictions on random samples, our models may have faced performance issues due to the increased imbalance between stigma-positive and stigma-negative texts.
To facilitate the aims of this research, we sought to identify stigma and accounts of stigma within social media narratives. In many of the possible instances of stigma that appear, the motivations behind the potentially stigmatizing actions are unclear or unstated. For posts containing sequences such as ‘my parents kicked me out of the house’, it may be difficult to determine whether the parents’ actions are motivated by stigma or by other factors. Causal ambiguity can lead our models to produce errors, and can also lead to disagreement among our annotators. Triangulation with data collected through other means, such as interviews, surveys, or diaries, could perhaps complement insights from social media.
In this study, we performed an examination of stigma surrounding substance use within the realm of social media. Our approach encompassed data collection, corpus annotation, and the development of binary classifiers tailored to detect three different stigma mechanisms. By synergizing contextual embeddings with count-based features, we achieved models that exhibited superior performance across all three stigma categories compared to RoBERTa-only baselines. Through a mixed-methods analysis of the model's predictions, we unraveled critical insights into the relations of word usage to the expression of different types of stigma. Affective, social, and behavioral features emerged as pivotal components in the expression of substance use stigma.
Our main contributions include: demonstrating a theory-based approach to extracting and comparing different types of stigma in a large social media dataset, and employing patterns in word usage to explore and characterize its manifestations. The insights from this study highlight the need to consider the impacts of stigma differently by mechanism (internalized, anticipated, and enacted), and enhance our current understandings of how the stigma mechanisms manifest within language in particular cognitive, emotional, social, and behavioral aspects. Moving forward, we envisage further analysis of stigma instances in our dataset to glean insights into how individuals navigate the challenges they encounter, informing the development of more effective stigma reduction strategies. Furthermore, the concepts encapsulated in our handcrafted lexicons hold promise for future stigma research in diverse contexts, extending the applicability of our findings beyond substance use disorders.
Stigma datasets and models trained to detect stigma could potentially be used by bad actors to target vulnerable individuals. In order to reduce the risk of any potential harms to the authors of the sensitive posts examined in our research, we do not share our models or annotated dataset publicly.
National Research Council Canada.
SUD: Substance use disorder
MLP: Multi-layer perceptron
LIWC: Linguistic inquiry and word count
TF-IDF: Term frequency-inverse document frequency
BERT: Bidirectional encoder representations from transformers
RoBERTa: Robustly optimized bidirectional encoder representations from transformers
NRC: National Research Council Canada
NLTK: Natural language toolkit
WNA: WordNet-Affect
INT: Internalized stigma feature lexicon
ANT: Anticipated stigma feature lexicon
ENA: Enacted stigma feature lexicon
Kulesza M, Ramsey S, Brown R, Larimer M. Stigma among Individuals with Substance Use Disorders: Does it Predict Substance Use, and Does it Diminish with Treatment? J Addict Behav Ther Rehabil. 2014;3(1):1000115. https://doi.org/10.4172/2324-9005.1000115 .
Ashford RD, Brown AM, Ashford A, Curtis B. Recovery dialects: A pilot study of stigmatizing and nonstigmatizing label use by individuals in recovery from substance use disorders. Exp Clin Psychopharmacol. 2019;27(6):530–5. https://doi.org/10.1037/pha0000286 .
Bozdağ N, Çuhadar D. Internalized stigma, self-efficacy and treatment motivation in patients with substance use disorders. J Subst Use. 2022;27(2):174–80. https://doi.org/10.1080/14659891.2021.1916846 .
Hammarlund R, Crapanzano KA, Luce L, Mulligan L, Ward KM. Review of the effects of self-stigma and perceived social stigma on the treatment-seeking decisions of individuals with drug- and alcohol-use disorders. Subst Abuse Rehabil. 2018;9:115–36. https://doi.org/10.2147/SAR.S183256 .
Livingston JD, Milne T, Fang ML, Amari E. The effectiveness of interventions for reducing stigma related to substance use disorders: a systematic review. Addiction. 2012;107(1):39–50. https://doi.org/10.1111/j.1360-0443.2011.03601.x .
Ashford RD, Brown AM, Curtis B. Substance use, recovery, and linguistics: The impact of word choice on explicit and implicit bias. Drug Alcohol Depend. 2018;189:131–8. https://doi.org/10.1080/07347324.2019.1585216 .
Volkow ND, Gordon JA, Koob GF. Choosing appropriate language to reduce the stigma around mental illness and substance use disorders. Neuropsychopharmacol. 2021;46(13):2230–2. https://doi.org/10.1038/s41386-021-01069-4 .
Brown SA. Standardized measures for substance use stigma. Drug Alcohol Depend. 2011;116(1):137–41. https://doi.org/10.1016/j.drugalcdep.2010.12.005 .
Kulesza M, Larimer ME, Rao D. Substance Use Related Stigma: What we Know and the Way Forward. J Addict Behav Ther Rehabil. 2013;2(2). https://doi.org/10.4172/2324-9005.1000106 .
Kulesza M, Watkins KE, Ober AJ, Osilla KC, Ewing B. Internalized stigma as an independent risk factor for substance use problems among primary care patients: Rationale and preliminary support. Drug Alcohol Depend. 2017;180:52–5. https://doi.org/10.1016/j.drugalcdep.2017.08.002 .
Smith LR, Earnshaw VA, Copenhaver MM, Cunningham CO. Substance use stigma: Reliability and validity of a theory-based scale for substance-using populations. Drug Alcohol Depend. 2016;162:34–43. https://doi.org/10.1016/j.drugalcdep.2016.02.019 .
Corrigan P, Schomerus G, Shuman V, Kraus D, Perlick D, Harnish A, et al. Developing a research agenda for understanding the stigma of addictions Part I: Lessons from the Mental Health Stigma Literature. Am J Addict. 2017;26(1):59–66. https://doi.org/10.1111/ajad.12458 .
Earnshaw VA, Chaudoir SR. From Conceptualizing to Measuring HIV Stigma: A Review of HIV Stigma Mechanism Measures. AIDS Behav. 2009;13(6):1160–77. https://doi.org/10.1007/s10461-009-9593-3 .
Li A, Jiao D, Zhu T. Detecting depression stigma on social media: A linguistic analysis. J Affect Disord. 2018;232:358–62. https://doi.org/10.1016/j.jad.2018.02.087 .
Li A, Jiao D, Liu X, Zhu T. A Comparison of the Psycholinguistic Styles of Schizophrenia-Related Stigma and Depression-Related Stigma on Social Media: Content Analysis. J Med Internet Res. 2020;22(4): e16470. https://doi.org/10.2196/16470 .
Clark O, Lee MM, Jingree ML, O’Dwyer E, Yue Y, Marrero A, et al. Weight Stigma and Social Media: Evidence and Public Health Solutions. Front Nutr. 2021;8:739056.
Dredze M. How Social Media Will Change Public Health. IEEE Intell Syst. 2012;27(4):81–4. https://doi.org/10.1109/MIS.2012.76 .
Goffman E. Stigma: Notes on the management of spoiled identity. New York: Simon and Schuster; 2009. p. 52, 65.
Link BG, Phelan JC. Conceptualizing Stigma. Annu Rev Sociol. 2001;27(1):363–85. https://doi.org/10.1146/annurev.soc.27.1.363 .
Parker R, Aggleton P. HIV and AIDS-related stigma and discrimination: a conceptual framework and implications for action. Soc Sci Med. 2003;57(1):13–24. https://doi.org/10.1016/S0277-9536(02)00304-0 .
Brewer MB, Brown RJ. Intergroup relations. In: Gilbert DT, Fiske ST, Lindzey G, editors. The handbook of social psychology. 4. New York: Oxford University Press; 1998.
Meyer IH. Prejudice, Social Stress, and Mental Health in Lesbian, Gay, and Bisexual Populations: Conceptual Issues and Research Evidence. Psychol Bull. 2003;129(5):674–97. https://doi.org/10.1037/0033-2909.129.5.674 .
Phelan J, Link BG, Dovidio JF. Stigma and Prejudice: One Animal or Two? Soc Sci Med. 2008;67(3):358–67. https://doi.org/10.1016/j.socscimed.2008.03.022 .
Earnshaw V, Smith L, Copenhaver M. Drug Addiction Stigma in the Context of Methadone Maintenance Therapy: An Investigation into Understudied Sources of Stigma. Int J Ment Health Addict. 2013;11(1):110–22. https://doi.org/10.1007/s11469-012-9402-5 .
Quinn DM, Earnshaw VA. Concealable Stigmatized Identities and Psychological Well-Being. Soc Personal Psychol Compass. 2013;7(1):40–51 ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3664915/ ).
Crapanzano KA, Hammarlund R, Ahmad B, Hunsinger N, Kullar R. The association between perceived stigma and substance use disorder treatment outcomes: a review. Subst Abuse Rehabil. 2018;10:1–12. https://doi.org/10.2147/SAR.S183252 .
Schmidt A, Wiegand M. A Survey on Hate Speech Detection using Natural Language Processing. In: Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media. Valencia, Spain: Association for Computational Linguistics; 2017. p. 1–10. https://doi.org/10.18653/v1/W17-1101 .
Yin W, Zubiaga A. Towards generalisable hate speech detection: a review on obstacles and solutions. PeerJ Comput Sci. 2021;7:e598. https://doi.org/10.7717/peerj-cs.598 .
Nockleby JT, Levy LW, Karst KL, Mahoney DJ. Encyclopedia of the American constitution. Detroit, MI: Macmillan Reference; 2000. p. 1277–9.
Allport GW, Clark K, Pettigrew T. The Nature of Prejudice. 1954.
Lee MH, Kyung R. Mental Health Stigma and Natural Language Processing: Two Enigmas Through the Lens of a Limited Corpus. In: 2022 IEEE World AI IoT Congress (AIIoT). 2022. p. 688–91. https://doi.org/10.1109/AIIoT54504.2022.9817362 .
Gottipati S, Chong M, Kiat A, Kawidiredjo B. Exploring Media Portrayals of People with Mental Disorders using NLP: In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies. p. 708–15. https://doi.org/10.5220/0010380007080715 .
Straton N, Jang H, Ng R. Stigma Annotation Scheme and Stigmatized Language Detection in Health-Care Discussions on Social Media. In: Proceedings of the 12th Language Resources and Evaluation Conference. Marseille, France: European Language Resources Association; 2020. p. 1178–90 https://aclanthology.org/2020.lrec-1.148 .
Oscar N, Fox PA, Croucher R, Wernick R, Keune J, Hooker K. Machine Learning, Sentiment Analysis, and Tweets: An Examination of Alzheimer’s Disease Stigma on Twitter. J Gerontol B. 2017;72(5):742–51. https://doi.org/10.1093/geronb/gbx014 .
Jilka S, Odoi CM, van Bilsen J, Morris D, Erturk S, Cummins N, et al. Identifying schizophrenia stigma on Twitter: a proof of principle model using service user supervised machine learning. Schizophr. 2022;8(1):1–8. https://doi.org/10.1038/s41537-021-00197-6 .
Pollack CC, Emond JA, O’Malley AJ, Byrd A, Green P, Miller KE, et al. Characterizing the Prevalence of Obesity Misinformation, Factual Content, Stigma, and Positivity on the Social Media Platform Reddit Between 2011 and 2019: Infodemiology Study. Journal of Medical Internet Research. 2022;24(12):e36729. Available from: https://www.jmir.org/2022/12/e36729 .
Pennebaker JW, Boyd RL, Jordan K, Blackburn K. The Development and Psychometric Properties of LIWC2015. 2015. https://repositories.lib.utexas.edu/handle/2152/31333 .
Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. p. 785–94. Available from: http://arxiv.org/abs/1603.02754 . Accessed 14 Jan 2022.
Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:181004805. 2019. http://arxiv.org/abs/1810.04805 . Accessed 11 May 2020.
Chen AT, Johnny S, Conway M. Examining stigma relating to substance use and contextual factors in social media discussions. Drug and Alcohol Dependence Reports. 2022;3:100061. https://doi.org/10.1016/j.dadr.2022.100061 .
Hatzenbuehler ML, Nolen-Hoeksema S, Dovidio J. How Does Stigma “Get Under the Skin”?: The Mediating Role of Emotion Regulation. Psychol Sci. 2009;20(10):1282–9. https://doi.org/10.1111/j.1467-9280.2009.02441.x .
Wang K, Burton CL, Pachankis JE. Depression and Substance Use: Towards the Development of an Emotion Regulation Model of Stigma Coping. Subst Use Misuse. 2018;53(5):859–66. https://doi.org/10.1080/10826084.2017.1391011 .
Baumgartner J, Zannettou S, Keegan B, Squire M, Blackburn J. The Pushshift Reddit Dataset. Proc Int AAAI Conf Web Soc Media. 2020;14:830–9.
MacLean D, Gupta S, Lembke A, Manning C, Heer J. Forum77: An Analysis of an Online Health Forum Dedicated to Addiction Recovery. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing. New York, NY, USA: Association for Computing Machinery; 2015. p. 1511–26. (CSCW ’15). https://doi.org/10.1007/BF02295996 .
Benson R, Hu M, Chen AT, Zhu SH, Conway M. Examining Cannabis, Tobacco, and Vaping Discourse on Reddit: An Exploratory Approach Using Natural Language Processing. Front Public Health. 2022. Available from: https://www.frontiersin.org/article/10.3389/fpubh.2021.738513 .
Palamar JJ, Kiang MV, Halkitis PN. Development and Psychometric Evaluation of Scales that Assess Stigma Associated With Illicit Drug Users. Subst Use Misuse. 2011;46(12):1457–67. https://doi.org/10.3109/10826084.2011.596606 .
Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Measur. 1960;20(1):37–46. https://doi.org/10.1177/001316446002000104 .
Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37(5):360–3.
PubMed Google Scholar
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv. 2019. https://doi.org/10.48550/arXiv.1907.11692 .
He H, Garcia EA. Learning from Imbalanced Data. IEEE Trans Knowl Data Eng. 2009;21(9):1263–84. https://doi.org/10.1109/TKDE.2008.239 .
Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst Appl. 2017;73:220–39. https://doi.org/10.1016/j.eswa.2016.12.035 .
Bird S, Klein E, Loper E. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc. 2009.
Prakash A, Tayyar Madabushi H. Incorporating Count-Based Features into Pre-Trained Models for Improved Stance Detection. In: Proceedings of the 3rd NLP4IF Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda. Barcelona, Spain: International Committee on Computational Linguistics (ICCL); 2020. p. 22–32 https://aclanthology.org/2020.nlp4if-1.3 .
Mohammad S. Word Affect Intensities. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan: European Language Resources Association (ELRA); 2018. https://aclanthology.org/L18-1027 .
Strapparava C, Valitutti A. WordNet-Affect: An Affective Extension of WordNet. In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004). Lisbon: European Language Resources Association (ELRA); 2004. p. 1083–6.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30.
Babanejad N, Davoudi H, An A, Papagelis M. Affective and Contextual Embedding for Sarcasm Detection. In: Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain: International Committee on Computational Linguistics; 2020. p. 225–43. https://doi.org/10.18653/v1/2020.coling-main.20 .
Miller GA. WordNet: a lexical database for English. Commun ACM. 1995;38(11):39–41. https://doi.org/10.1145/219717.219748 .
Brown-Johnson CG, Cataldo PhD JK, Orozco N, Lisha NE, Hickman N, Prochaska JJ. Validity and Reliability of the Internalized Stigma of Smoking Inventory: An Exploration of Shame, Isolation, and Discrimination in Smokers with Mental Health Diagnoses. Am J Addict. 2015;24(5):410–8. https://doi.org/10.1111/ajad.12215 .
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc F d’, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems 32. Curran Associates, Inc.; 2019. p. 8024–35. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf .
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. HuggingFace’s Transformers: State-of-the-art Natural Language Processing. 2019. https://doi.org/10.48550/arXiv.1910.03771 .
Book Google Scholar
McNemar Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika. 1947;12(2):153–7. https://doi.org/10.1007/BF02295996 .
Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc: Ser B (Methodol). 1995;57(1):289–300.
Dror R, Baumer G, Shlomov S, Reichart R. The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics; 2018. p. 1383–92. https://doi.org/10.18653/v1/P18-1128 .
Creswell JW, Clark VLP. Designing and Conducting Mixed Methods Research. SAGE Publications; 2017.
Doyle L, Brady AM, Byrne G. An overview of mixed methods research. J Res Nurs. 2009;14(2):175–85. https://doi.org/10.1177/1744987108093962 .
Agresti A. Categorical data analysis, Second Edition. New York: Wiley; 2002.
Sharpe D. Chi-square test is statistically significant: Now what? Pract Assess Res Eval. 2015;20(1):8.
Moreno MA, Goniu N, Moreno PS, Diekema D. Ethics of Social Media Research: Common Concerns and Practical Considerations. Cyberpsychol Behav Soc Netw. 2013;16(9):708–13. https://doi.org/10.1089/cyber.2012.0334 .
Galinsky AD, Wang CS, Whitson JA, Anicich EM, Hugenberg K, Bodenhausen GV. The Reappropriation of Stigmatizing Labels: The Reciprocal Relationship Between Power and Self-Labeling. Psychol Sci. 2013;24(10):2020–9. https://doi.org/10.1177/0956797613482943 .
Jawahar G, Sagot B, Seddah D. What Does BERT Learn about the Structure of Language? In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics; 2019. p. 3651–7. https://doi.org/10.18653/v1/P19-1356 .
Khetan V, Ramnani R, Anand M, Sengupta S, Fano AE. Causal BERT: Language Models for Causality Detection Between Events Expressed in Text. In: Arai K, editor. Intelligent Computing. Cham: Springer International Publishing; 2022. p. 965–80. (Lecture Notes in Networks and Systems). https://doi.org/10.1007/978-3-030-80119-9_64 .
Kacewicz E, Pennebaker JW, Davis M, Jeon M, Graesser AC. Pronoun Use Reflects Standings in Social Hierarchies. J Lang Soc Psychol. 2014;33(2):125–43. https://doi.org/10.1177/0261927X13502654 .
Cohn MA, Mehl MR, Pennebaker JW. Linguistic Markers of Psychological Change Surrounding September 11, 2001. Psychol Sci. 2004;15(10):687–93. https://doi.org/10.1111/j.0956-7976.2004.00741.x .
Moore MD, Ali S, Burnich-Line D, Gonzales W, Stanton MV. Stigma, Opioids, and Public Health Messaging: The Need to Disentangle Behavior From Identity. Am J Public Health. 2020;110(6):807–10. https://doi.org/10.2105/AJPH.2020.305628 .
O’Connor C, McNamara N, O’Hara L, McNicholas M, McNicholas F. How do people with eating disorders experience the stigma associated with their condition? A mixed-methods systematic review. J Ment Health. 2021;30(4):454–69. https://doi.org/10.1080/09638237.2019.1685081 .
Download references
Research reported in this publication was supported by the National Institute on Drug Abuse of the National Institutes of Health under Award Number R21DA056684. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Authors and affiliations.
University of Washington School of Medicine, Biomedical Informatics and Medical Education, 850 Republican St., Box 358047, Seattle, WA, 98109, USA
David Roesler & Annie T. Chen
School of Nursing, University of Washington, Box 357260, Seattle, WA, 98195, USA
Shana Johnny
School of Computing and Information Systems, University of Melbourne, Parkville, VIC, 3052, Australia
Mike Conway
ATC and DR conceptualized the study. All authors performed data curation, and DR and ATC performed data analysis. DR drafted the initial manuscript and iteratively revised with ATC. All authors reviewed and approved the final manuscript.
Correspondence to Mike Conway or Annie T. Chen.
Ethics approval and consent to participate.
The work performed in this study was determined as non-human subjects research by the Human Subjects Division at the University of Washington (STUDY00015737), and approved by the Office of Research Ethics and Integrity of the University of Melbourne (reference no. 2022–25512-34338–4). All methods were carried out in accordance with relevant guidelines and regulations.
Competing interests.
The authors declare no competing interests.
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1.
A detailed description of our annotation guidelines.
A complete list of keywords included in each of the handcrafted stigma lexicons.
Results of chi-square tests examining the distribution of stigma labels for each substance.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Roesler, D., Johnny, S., Conway, M. et al. A theory-informed deep learning approach to extracting and characterizing substance use-related stigma in social media. BMC Digit Health 2, 60 (2024). https://doi.org/10.1186/s44247-024-00065-0
Received: 19 May 2023
Accepted: 26 January 2024
Published: 16 August 2024
DOI: https://doi.org/10.1186/s44247-024-00065-0
ISSN: 2731-684X
Background Triage and clinical consultations increasingly occur remotely. We aimed to learn why safety incidents occur in remote encounters and how to prevent them.
Setting and sample UK primary care. 95 safety incidents (complaints, settled indemnity claims and reports) involving remote interactions. Separately, 12 general practices followed 2021–2023.
Methods Multimethod qualitative study. We explored causes of real safety incidents retrospectively (‘Safety I’ analysis). In a prospective longitudinal study, we used interviews and ethnographic observation to produce individual, organisational and system-level explanations for why safety and near-miss incidents (rarely) occurred and why they did not occur more often (‘Safety II’ analysis). Data were analysed thematically. An interpretive synthesis of why safety incidents occur, and why they do not occur more often, was refined following member checking with safety experts and lived experience experts.
Results Safety incidents were characterised by inappropriate modality, poor rapport building, inadequate information gathering, limited clinical assessment, inappropriate pathway (eg, wrong algorithm) and inadequate attention to social circumstances. These resulted in missed, inaccurate or delayed diagnoses, underestimation of severity or urgency, delayed referral, incorrect or delayed treatment, poor safety netting and inadequate follow-up. Patients with complex pre-existing conditions, cardiac or abdominal emergencies, vague or generalised symptoms, safeguarding issues, failure to respond to previous treatment or difficulty communicating seemed especially vulnerable. General practices were facing resource constraints, understaffing and high demand. Triage and care pathways were complex, hard to navigate and involved multiple staff. In this context, patient safety often depended on individual staff taking initiative, speaking up or personalising solutions.
Conclusion While safety incidents are extremely rare in remote primary care, deaths and serious harms have resulted. We offer suggestions for patient, staff and system-level mitigations.
Data are available upon reasonable request. Details of real safety incidents are not available for patient confidentiality reasons. Requests for data on other aspects of the study from other researchers will be considered.
This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/ .
https://doi.org/10.1136/bmjqs-2023-016674
Safety incidents are extremely rare in primary care but they do happen. Concerns have been raised about the safety of remote triage and remote consultations.
Rare safety incidents (involving death or serious harm) in remote encounters can be traced back to various clinical, communicative, technical and logistical causes. Telephone and video encounters in general practice are occurring in a high-risk (extremely busy and sometimes understaffed) context in which remote workflows may not be optimised. Front-line staff use creativity and judgement to help make care safer.
As remote modalities become mainstreamed in primary care, staff should be trained in the upstream causes of safety incidents and how they can be mitigated. The subtle and creative ways in which front-line staff already contribute to safety culture should be recognised and supported.
In early 2020, remote triage and remote consultations (together, ‘remote encounters’), in which the patient is in a different physical location from the clinician or support staff member, were rapidly expanded as a safety measure in many countries because they eliminated the risk of transmitting COVID-19. 1–4 But by mid-2021, remote encounters had begun to be depicted as potentially unsafe because they had come to be associated with stories of patient harm, including avoidable deaths and missed cancers. 5–8
Providing triage and clinical care remotely is sometimes depicted as a partial solution to the system pressures facing primary healthcare in many countries, 9–11 including rising levels of need or demand, the ongoing impact of the COVID-19 pandemic and workforce challenges (especially short-term or longer-term understaffing). In this context, remote encounters may be an important component of a mixed-modality health service when used appropriately alongside in-person contacts. 12 13 But this raises the question of what counts as ‘appropriate’ and ‘safe’ use of remote modalities in a primary care context. Safety incidents (defined as ‘any unintended or unexpected incident which could have, or did, lead to harm for one or more patients receiving healthcare’ 14 ) are extremely rare in primary healthcare consultations generally, 15 16 in-hours general practice telephone triage 17 and out-of-hours primary care. 18 But the recent widespread expansion of remote triage and remote consulting in primary care means that a wider range of patients and conditions are managed remotely, making it imperative to re-examine where the risks lie.
Theoretical approaches to safety in healthcare fall broadly into two traditions. 19 ‘Safety I’ studies focus on what went wrong. Incident reports are analysed to identify ‘root causes’ and ‘safety gaps’, and recommendations are made to reduce the chance that further similar incidents will happen in the future. 20 Such studies, undertaken in isolation, tend to lead to a tightening of rules, procedures and protocols. ‘Safety II’ studies focus on why, most of the time, things do not go wrong. Ethnography and other qualitative methods are employed to study how humans respond creatively to unique and unforeseen situations, thereby preventing safety incidents most of the time. 19 Such studies tend to show that actions which achieve safety are highly context specific, may entail judiciously breaking the rules and require human qualities such as courage, initiative and adaptability. 21 Few previous studies have combined both approaches.
In this study, we aimed to use Safety I methods to learn why safety incidents occur (although rarely) in remote primary care encounters and also apply Safety II methods to examine the kinds of creative actions taken by front-line staff that contribute to a safety culture and thereby prevent such incidents.
Multimethod qualitative study across the UK, including incident analysis, longitudinal ethnography and national stakeholder interviews.
The idea for this safety study began during a longitudinal ethnographic study of 12 general practices across England, Scotland and Wales as they introduced (and, in some cases, subsequently withdrew) various remote and digital modalities. Practices were selected for maximum diversity in geographical location, population served and digital maturity and followed from mid-2021 to end 2023 using staff and patient interviews and in-person ethnographic visits. The study protocol, 22 baseline findings 23 and a training needs analysis 24 have been published. To provide context for our ethnography, we interviewed a sample of national stakeholders in remote and digital primary care, including out-of-hours providers running telephone-led services, and held four online multistakeholder workshops, one of which was on the theme of safety, for policymakers, clinicians, patients and other parties. Early data from this detailed qualitative work revealed staff and patient concerns about the safety of remote encounters but no actual examples of harm.
To explore the safety theme further, we decided to take a dual approach. First, following Safety I methodology for the study of rare harms, 20 we set out to identify and analyse a sample of safety incidents involving remote encounters. These were sourced from arm’s-length bodies (NHS England, NHS Resolution, Healthcare Safety Investigation Branch) and providers of healthcare at scale (health boards, integrated care systems and telephone advice services), since our own small sample had not identified any of these rare occurrences. Second, we extended our longitudinal ethnographic design to more explicitly incorporate Safety II methodology, 19 allowing us to examine safety culture and safety practices in our 12 participating general practices, especially the adaptive work done by staff to avert potential safety incidents.
Table 1 summarises the data sources.
Table 1. Summary of data sources
The Safety I dataset (rows 2-5) consisted of 95 specific incident reports, including complaints submitted to the main arm’s-length NHS body in England, NHS England, between 2020 and 2023 (n=69), closed indemnity claims that had been submitted to a national indemnity body, NHS Resolution, between 2015 and 2023 (n=16), reports from an urgent care telephone service in Wales (NHS 111 Wales) between 2020 and 2023 (n=6) and a report on an investigation of telephone advice during the COVID-19 crisis between 2020 and 2022 7 (n=4). These 95 incidents were organised using Microsoft Excel spreadsheets.
The Safety II dataset (rows 6-10) consisted of extracts from fieldnotes, workshop transcripts and interviews collected over 2 years, stored and coded on NVivo qualitative software. These were identified by searching for text words and codes (e.g. ‘risk’, ‘safety’, ‘incident’) and by asking researchers-in-residence, who were closely familiar with practices, to highlight safety incidents involving harm and examples of safety-conscious work practices. This dataset included over 100 formal interviews and numerous on-the-job interviews with practice staff, plus interviews with a sample of 10 GP (general practitioner) trainers and 10 GP trainees (penultimate row of table 1 ) and with six clinical safety experts identified through purposive sampling from government, arm’s-length bodies and health boards (bottom row of table 1 ).
We analysed incident reports, interview data and ethnographic fieldnotes using thematic analysis as described by Braun and Clarke. 25 These authors define a theme as an important, broad pattern in a set of qualitative data, which can (where necessary) be further refined using coding.
Themes in the incident dataset were identified by five steps. First, two researchers (both medically qualified) read each source repeatedly to gain familiarity. Second, those researchers worked independently using Braun and Clarke’s criterion (‘whether it captures something important in relation to the overall research question’—p 82 25 ) to identify themes. Third, they discussed their initial interpretations with each other and resolved differences through discussion. Fourth, they extracted evidence from the data sources to illustrate and refine each theme. Finally, they presented their list of themes along with illustrative examples to the wider team. Cases used to illustrate themes were systematically fictionalised by changing age, randomly allocating gender and altering clinical details. 26 For example, an acute appendicitis could be changed to acute diverticulitis if the issue was a missed acute abdomen.
These safety themes were then used to sensitise us to seek relevant (confirming and disconfirming) material from our ethnographic and interview datasets. For example, the theme ‘poor communication’ (and subthemes such as ‘failure to seek further clarification’ within this) prompted us to look for examples in our stakeholder interviews of poor communication offered as a cause of safety incidents and examples in our ethnographic notes of good communication (including someone seeking clarification). We used these wider data to add nuance to the initial list of themes.
As a final sense-checking step, the draft findings from this study were shown to each of the six safety experts in our sample and refined in the light of their comments (in some cases, for example, they considered the case to have been overfictionalised, thereby losing key clinical messages; they also gave additional examples to illustrate some of the themes we had identified, which underlined the importance of those themes).
The dataset ( table 1 ) consisted of 95 incident reports (see fictionalised examples in box 1 ), plus approximately 400 pages of extracts from interviews, ethnographic fieldnotes and workshop discussions, including situated safety practices (see examples in box 2 ), plus strategic insights relating to policy, organisation and planning of services. Notably, almost all incidents related to telephone calls.
All these cases have been systematically fictionalised as explained in the text.
Case 1 (death)
A woman in her 70s experiencing sudden breathlessness called her GP (general practitioner) surgery. The receptionist answered the phone and informed her that she would place her on the doctor’s list for an emergency call-back. The receptionist was distracted by a patient in the waiting room and did not do so. The patient deteriorated and died at home that afternoon.—NHS Resolution case, pre-2020
Case 2 (death)
An elderly woman contacted her GP after a telephone contact with the out-of-hours service, where constipation had been diagnosed. The GP prescribed laxatives without seeing the patient. The patient self-presented to the emergency department (ED) the following day in obstruction secondary to an incarcerated hernia and died in the operating theatre.—NHS Resolution case, pre-2020
Case 3 (risk to vulnerable patients)
A daughter complained that her elderly father was unable to access his GP surgery as he could not navigate the online triage system. When he phoned the surgery directly, he was directed back to the online system and told to get a relative to complete the form for him.—Complaint to NHS England, 2021
Case 4 (harm)
A woman in her first pregnancy at 28 weeks’ gestation experiencing urinary incontinence called NHS 111. She was taken down a ‘urinary problems’ algorithm. Both the call handler and the subsequent clinician failed to recognise that she had experienced premature rupture of membranes. She later presented to the maternity department in active labour, and the opportunity to give early steroids to the premature infant was missed.—NHS Resolution case, pre-2020
Case 5 (death)
A doctor called about a 16-year-old girl with lethargy, shaking, fever and poor oral intake who had been unwell for 5 days. The doctor spoke to her older sister and advised that the child likely had glandular fever and should rest. When the parents arrived home, they called an ambulance but the child died of sepsis in the ED.—NHS Resolution case, pre-2020
Case 6 (death)
A 40-year-old woman, 6 weeks after caesarean section, contacted her GP due to shortness of breath, increased heart rate and dry cough. She was advised to get a COVID test and to dial 111 if she developed a productive cough, fever or pain. The following day she collapsed and died at home. The postmortem revealed a large pulmonary embolus. On reviewing the case, her GP surgery felt that had she been seen face to face, her oxygen saturations would have been measured and may have led to suspicion of the diagnosis.—NHS Resolution case, 2020
Case 7 (death)
A son complained that his father with diabetes and chronic kidney disease did not receive any in-person appointments over a period of 1 year. His father went on to die following a leg amputation arising from a complication of his diabetes.—Complaint to NHS England, 2021
Case 8 (death)
A 73-year-old diabetic woman with throat pain and fatigue called the surgery. She was diagnosed with a viral illness and given self-care advice. Over the next few days, she developed worsening breathlessness and was advised to do a COVID test and was given a pulse oximeter. She was found dead at home 4 days later. Postmortem found a blocked coronary artery and a large amount of pulmonary oedema. The cause of death was myocardial infarction and heart failure.—NHS Resolution case, pre-2020
Case 9 (harm)
A patient with a history of successfully treated cervical cancer developed vaginal bleeding. A diagnosis of fibroids was made and the patient received routine care by telephone over the next few months until a scan revealed a local recurrence of the original cancer.—Complaint to NHS England, 2020
Case 10 (death)
A 65-year-old female smoker with chronic cough and breathlessness presented to her GP. She was diagnosed with chronic obstructive pulmonary disease (COPD) and monitored via telephone. She did not respond to inhalers or antibiotics but continued to receive telephone monitoring without further investigation. Her symptoms continued to worsen and she called an ambulance. In the ED, she was diagnosed with heart failure and died soon after.—Complaint to NHS England, 2021
Case 11 (harm)
A 30-year-old woman presented with intermittent episodes of severe dysuria over a period of 2 years. She was given repeated courses of antibiotics but no urine was sent for culture and she was not examined. After 4 months of symptoms, she saw a private GP and was diagnosed with genital herpes.—Complaint to NHS England, 2021
Case 12 (harm)
There were repeated telephone consultations about a baby whose parents were concerned that the child was having a funny colour when feeding or crying. The 6-week check was done by telephone and at no stage was the child seen in person. Photos were sent in, but the child’s dark skin colour meant that cyanosis was not easily apparent to the reviewing clinician. The child was subsequently admitted by emergency ambulance where a significant congenital cardiac abnormality was found.—Complaint to NHS England, 2020 1
Case 13 (harm)
A 35-year-old woman in her third trimester of pregnancy had a telephone appointment with her GP about a breast lump. She was informed that this was likely due to antenatal breast changes and was not offered an in-person appointment. She attended after delivery and was referred to a breast clinic where a cancer was diagnosed.—Complaint to NHS England, 2020
Case 14 (harm)
A 63-year-old woman with a variety of physical symptoms including diarrhoea, hip girdle pain, palpitations, light-headedness and insomnia called her surgery on multiple occasions. She was told her symptoms were likely due to anxiety, but was diagnosed with stage 4 ovarian cancer and died soon after.—Complaint to NHS England, 2021
Case 15 (death)
A man with COPD with worsening shortness of breath called his GP surgery. The staff asked him if it was an emergency, and when the patient said no, scheduled him for 2 weeks later. The patient died before the appointment.—Complaint to NHS England, 2021
Case 16 (safety incident averted by switching to video call for a sick child)
‘I’ve remembered one father that called up. Really didn’t seem to be too concerned. And was very much under-playing it and then when I did a video call, you know this child… had intercostal recession… looked really, really poorly. And it was quite scary actually that, you know, you’d had the conversation and if you’d just listened to what Dad was saying, actually, you probably wouldn’t be concerned.’—GP (general practitioner) interview 2022
Case 17 (‘red flag’ spotted by support staff member)
A receptionist was processing routine ‘administrative’ encounters sent in by patients using AccuRx (text messaging software). She became concerned about a sick note renewal request from a patient with a mental health condition. The free text included a reference to feeling suicidal, so the receptionist moved the request to the ‘red’ (urgent call-back) list. In interviews with staff, it became apparent that there had recently been heated discussion in the practice about whether support staff were adding ‘too many’ patients to the red list. After discussing cases, the doctors concluded that it should be them, not the support staff, who should absorb the risk in uncertain cases. The receptionist said that they had been told: ‘if in doubt, put it down as urgent and then the duty doctor can make a decision.’—Ethnographic fieldnotes from general practice 2023
Case 18 (‘check-in’ phone call added on busy day)
A duty doctor was working through a very busy Monday morning ‘urgent’ list. One patient had acute abdominal pain, which would normally have triggered an in-person appointment, but there were no slots and hard decisions were being made. This patient had had the pain already for a week, so the doctor judged that the general rule of in-person examination could probably be over-ridden. But instead of simply allocating to a call-back, the doctor asked a support staff member to phone the patient, ask ‘are you OK to wait until tomorrow?’ and offer basic safety-netting advice.—Ethnographic fieldnotes from general practice 2023
Case 19 (receptionist advocating on behalf of ‘angry’ walk-in patient)
A young Afghan man with limited English walked into a GP surgery on a very busy day, ignoring the prevailing policy of ‘total triage’ (make contact by phone or online in the first instance). He indicated that he wanted a same-day in-person appointment for a problem he perceived as urgent. A heated exchange occurred with the first receptionist, and the patient accused her of ‘racism’. A second receptionist of non-white ethnicity herself noted the man’s distress and suspected that there may indeed be an urgent problem. She asked the first receptionist to leave the scene, saying she wanted to ‘have a chat’ with the patient (‘the colour of my skin probably calmed him down more than anything’). Through talking to the patient and looking through his record, she ascertained that he had an acute infection that likely needed prompt attention. She tried to ‘bend the rules’ and persuade the duty doctor to see the patient, conveying the clinical information but deliberately omitting the altercation. But the first receptionist complained to the doctor (‘he called us racists’) and the doctor decided that the patient would not therefore be offered a same-day appointment. The second receptionist challenged the doctor (‘that’s not a reason to block him from getting care’). At this point, the patient cried and the second receptionist also became upset (‘this must be serious, you know’). On this occasion, despite her advocacy the patient was not given an immediate appointment.—Ethnographic fieldnotes from general practice 2022
Case 20 (long-term condition nurse visits ‘unengaged’ patients at home)
An advanced nurse practitioner talks of two older patients, each with a long-term condition, who are ‘unengaged’ and lacking a telephone. In this practice, all long-term condition reviews are routinely done by phone. She reflects that some people ‘choose not to have avenues of communication’ (ie, are deliberately not contactable), and that there may be reasons for this (‘maybe health anxiety or just old’). She has, on occasion, ‘turned up’ unannounced at the patient’s home and asked to come in and do the review, including bloods and other tests. She reflects that while most patients engage well with the service, ‘half my job is these patients who don’t engage very well.’—Ethnographic fieldnotes from digitally advanced general practice 2022
Case 21 (doctor over-riding patient’s request for telephone prescribing)
A GP trainee described a case of a 53-year-old first-generation immigrant from Pakistan, a known smoker with hypertension and diabetes. He had booked a telephone call for vomiting and sinus pain. There was no interpreter available but the man spoke some English. He said he had awoken in the night with pain in his sinuses and vomiting. All he wanted was painkillers for his sinuses. The story did not quite make sense, and the man ‘sounded unwell’. The GP told him he needed to come in and be examined. The patient initially resisted but was persuaded to come in. When the GP went to call him in, the man was visibly unwell and lying down in the waiting room. When seen in person, he admitted to shoulder pain. The GP sent him to accident and emergency (A&E) where a myocardial infarction was diagnosed.—Trainee interview 2023
Below, we describe the main themes that were evident in the safety incidents: a challenging organisational and system context, poor communication compounded by remote modalities, limited clinical information, patient and carer burden and inadequate training. Many safety incidents illustrated several themes at once: for example, poor communication alongside failures of clinical assessment or judgement, patient complexity and system pressures. In the detailed findings below, we illustrate why safety incidents occasionally occur and why they are usually avoided.
Introduction of remote triage and expansion of remote consultations in UK primary care occurred at a time of unprecedented system stress (an understaffed and chronically under-resourced primary care sector, attempting to cope with a pandemic). 23 Many organisations had insufficient telephone lines or call handlers, so patients struggled to access services (eg, half of all calls to the emergency COVID-19 telephone service in March 2020 were never answered 7 ). Most remote consultations were by telephone. 27
Our safety incident dataset included examples of technically complex access routes which patients found difficult or impossible to navigate (case 3 in box 1 ) and which required non-clinical staff to make clinical or clinically related judgements (cases 4 and 15). Our ethnographic dataset contained examples of inflexible application of triage rules (eg, no face-to-face consultation unless the patient had already had a telephone call), though in other practices these rules could be over-ridden by staff using their judgement or asking colleagues. Some practices had a high rate of failed telephone call-backs (patient unobtainable).
High demand, staff shortages and high turnover of clinical and support staff made the context for remote encounters inherently risky. Several incidents were linked to a busy staff member becoming distracted (case 1). Telephone consultations, which tend to be shorter, were sometimes used in the hope of improving efficiency. Some safety incidents suggested perfunctory and transactional telephone consultations, with flawed decisions made on the basis of incomplete information (eg, case 2).
Many practices had shifted—at least to some extent—from a demand-driven system (in which every request for an appointment was met) to a capacity-driven one (in which, if a set capacity was exceeded, patients were advised to seek care elsewhere), though the latter was often used flexibly rather than rigidly with an expectation that some patients would be ‘squeezed in’. In some practices, capacity limits had been introduced to respond to escalation of demand linked to overuse of triage templates (eg, to inquire about minor symptoms).
As a result of task redistribution and new staff roles, a single episode of care for one problem often involved multiple encounters or tasks distributed among clinical and non-clinical staff (often in different locations and sometimes also across in-hours and out-of-hours providers). Capacity constraints in onward services placed pressure on primary care to manage risk in the community, leading in some cases to failure to escalate care appropriately (case 6).
Some safety incidents were linked to organisational routines that had not adapted sufficiently to remote working. For example, a prescription might be issued but (for various reasons) could not be transmitted electronically to the pharmacy. Certain urgent referrals were delayed if the consultation occurred remotely (a referral for suspected colon cancer, for example, would not be accepted without a faecal immunochemical test).
Training, supervising and inducting staff was more difficult when many were working remotely. If teams saw each other less frequently, relationship-building encounters and ‘corridor’ conversations were reduced, with knock-on impacts for individual and team learning and patient care. Those supervising trainees or allied professionals reported loss of non-verbal cues (eg, more difficult to assess how confident or distressed the trainee was).
Clinical and support staff regularly used initiative and situated judgement to compensate for an overall lack of system resilience ( box 1 ). Many practices had introduced additional safety measures such as lists of patients who, while not obviously urgent, needed timely review by a clinician. Case 17 illustrates how a rule of thumb ‘if in doubt, put it down as urgent’ was introduced and then applied to avert a potentially serious mental health outcome. Case 18 illustrates how, in the context of insufficient in-person slots to accommodate all high-risk cases, a unique safety-netting measure was customised for a patient.
Because sense data (eg, sight, touch, smell) are missing, 28 remote consultations rely heavily on the history. Many safety incidents were characterised by insufficient or inaccurate information for various reasons. Sometimes (cases 2, 5, 6, 8, 9, 10 and 11), the telephone consultation was too short to do justice to the problem; the clinician asked few or no questions to build rapport, obtain a full history, probe the patient’s answers for additional detail, confirm or exclude associated symptoms and inquire about comorbidities and medication. Video provided some visual cues but these were often limited to head and shoulders, and photographs were sometimes of poor quality.
Cases 2, 4, 5 and 9 illustrate the dangers of relying on information provided by a third party (another staff member or a relative). A key omission (eg, in case 5) was failing to ask why the patient was unable to come to the phone or answer questions directly.
Some remote triage conversations were conducted using an inappropriate algorithm. In case 4, for example, the call handler accepted a pregnant patient’s assumption that leaking fluid was urine when the problem was actually ruptured membranes. The wrong pathway was selected; vital questions remained unasked; and a skewed history was passed to (and accepted by) the clinician. In case 8, the patient’s complaint of ‘throat’ pain was taken literally and led to ‘viral illness’ advice, overlooking a myocardial infarction.
The cases in box 2 illustrate how staff compensated for communication challenges. In case 16, a GP plays a hunch that a father’s account of his child’s asthma may be inaccurate and converts a phone encounter to video, revealing the child’s respiratory distress. In case 19 (an in-person encounter but relevant because the altercation occurs partly because remote triage is the default modality), one receptionist correctly surmises that the patient’s angry demeanour may indicate urgency and uses her initiative and interpersonal skills to obtain additional clinical information. In case 20, a long-term condition nurse develops a labour-intensive workaround to overcome her elderly patients’ ‘lack of engagement’. More generally, we observed numerous examples of staff using both formal tools (eg, see ‘red list’ in case 17) and informal measures (eg, corridor chats) to pass on what they believed to be crucial information.
Cases 2 and 4–14 all describe serious conditions including congenital cyanotic heart disease, pulmonary oedema, sepsis, cancer and diabetic foot which would likely have been readily diagnosed with an in-person examination. While patients often uploaded still images of skin lesions, these were not always of sufficient quality to make a confident diagnosis.
Several safety incidents involved clinicians assuming that a diagnosis made on a remote consultation was definitive rather than provisional. Especially when subsequent consultations were remote, such errors could become ingrained, leading to diagnostic overshadowing and missed or delayed diagnosis (cases 2, 8, 9, 10, 11 and 13). Patients with pre-existing conditions (especially if multiple or progressive), the very young and the elderly were particularly difficult to assess by telephone (cases 1, 2, 8, 10, 12 and 16). Clinical conditions difficult to assess remotely included possible cardiac pain (case 8), acute abdomen (case 2), breathing difficulties (cases 1, 6 and 10), vague and generalised symptoms (cases 5 and 14) and symptoms which progressed despite treatment (cases 9, 10 and 11). All these categories came up repeatedly in interviews and workshops as clinically risky.
Subtle aspects of the consultation which may have contributed to safety incidents in a telephone consultation included the inability to fully appraise the patient’s overall health and well-being (including indicators relevant to mental health such as affect, eye contact, personal hygiene and evidence of self-harm), general demeanour, level of agitation and concern, and clues such as walking speed and gait (cases 2, 5, 6, 7, 8, 10, 12 and 14). Our interviews included stories of missed cases of new-onset frailty and dementia in elderly patients assessed by telephone.
In most practices we studied, most long-term condition management was undertaken by telephone. This may be appropriate (and indeed welcome) when the patient is well and confident and a physical examination is not needed. But diabetes reviews, for example, require foot examination. Case 7 describes the deterioration and death of a patient with diabetes whose routine check-ups had been entirely by telephone. We also heard stories of delayed diagnosis of new diabetes in children when an initial telephone assessment failed to pick up lethargy, weight loss and smell of ketones, and point-of-care tests of blood or urine were not possible.
Nurses observed that remote consultations limit opportunities for demonstrating or checking the patient’s technique in using a device for monitoring or treating their condition such as an inhaler, oximeter or blood pressure machine.
Safety netting was inadequate in many remote safety incidents, even when provided by a clinician (cases 2, 5, 6, 8, 10, 12 and 13) but especially when conveyed by a non-clinician (case 15). Expert interviewees also identified two further risky practices: making life-changing diagnoses remotely and starting patients on long-term medication without an in-person appointment.
Our ethnographic data showed that various measures were used to compensate for limited clinical information, including converting a phone consultation to video (case 16), asking the patient if they felt they could wait until an in-person slot was available (case 18), visiting the patient at home (case 20) and enacting an ‘if the history doesn’t make sense, bring the patient in for an in-person assessment’ rule of thumb (case 21). Out-of-hours providers added examples of rules of thumb that their services had developed over years of providing remote services, including ‘see a child face-to-face if the parent rings back’, ‘be cautious about third-party histories’, ‘visit a palliative care patient before starting a syringe driver’ and ‘do not assess abdominal pain remotely’.
Given the greater importance of the history in remote consultations, patients who lacked the ability to communicate and respond in line with clinicians’ expectations were at a significant disadvantage. Several safety incidents were linked to patients’ limited fluency in the language and culture of the clinician or to specific vulnerabilities such as learning disability, cognitive impairment, hearing impairment or neurodiversity. Those with complex medical histories and comorbidities, and those with inadequate technical set-up and skills (case 3), faced additional challenges.
In many practices, in-person appointments were strictly limited according to more or less rigid triage criteria. Some patients were unable to answer the question ‘is this an emergency?’ correctly, leading to their condition being deprioritised (case 15). Some had learnt to ‘game’ the triage system (eg, online templates 29 ) by adapting their story to obtain the in-person appointment they felt they needed. This could create distrust and lead to inaccurate information on the patient record.
Our ethnographic dataset contained many examples of clinical and support staff using initiative to compensate for vulnerable patients’ inability or unwillingness to take on the additional burden of remote modalities (cases 19 and 20 in Box 2 30 31 ).
Safety incidents highlighted various training needs for support staff members (eg, customer care skills, risks of making clinical judgements) and clinicians (eg, limitations of different modalities, risks of diagnostic overshadowing). Whereas out-of-hours providers gave thorough training to novice GPs (covering such things as attentiveness, rapport building, history taking, probing, attending to contextual cues and safety netting) in telephone consultations, 32–34 many in-hours clinicians had never been formally taught to consult by telephone. Case 17 illustrates how on-the-job training based on acknowledgement of contextual pressures and judicious use of rules of thumb may be very effective in averting safety incidents.
An important overall finding from this study is that examples of deaths or serious harms associated with remote encounters in primary care were extremely rare, amounting to fewer than 100 despite an extensive search going back several years.
Analysis of these 95 safety incidents, drawn from multiple complementary sources, along with rich qualitative data from ethnography, interviews and workshops has clarified where the key risks lie in remote primary care. Remote triage and consultations expanded rapidly during the COVID-19 crisis, against a background of resource constraints, understaffing and high demand. Triage and care pathways were complex, multilayered and hard to navigate; some involved distributed work among multiple clinical and non-clinical staff. In some cases, multiple remote encounters preceded (and delayed) a needed in-person assessment.
In this high-risk context, safety incidents involving death or serious harm were rare, but those that occurred were characterised by a combination of inappropriate choice of modality, poor rapport building, inadequate information gathering, limited clinical assessment, inappropriate clinical pathway (eg, wrong algorithm) and failure to take account of social circumstances. These led to missed, inaccurate or delayed diagnoses, underestimation of severity or urgency, delayed referral, incorrect or delayed treatment, poor safety netting and inadequate follow-up. Patients with complex or multiple pre-existing conditions, cardiac or abdominal emergencies, vague or generalised symptoms, safeguarding issues and failure to respond to previous treatment, and those who (for any reason) had difficulty communicating, seemed particularly at risk.
The main strength of this study was that it combined the largest Safety I study undertaken to date of safety incidents in remote primary care (using datasets which have not previously been tapped for research), with a large, UK-wide ethnographic Safety II analysis of general practice as well as stakeholder interviews and workshops. Limitations of the safety incident sample (see final column in table 1 ) include that it was skewed towards very rare cases of death and serious harm, with relatively few opportunities for learning that did not result in serious harm. Most sources were retrospective and may have suffered from biases in documentation and recall. We also failed to obtain examples of safeguarding incidents (which would likely turn up in social care audits). While all cases involved a remote modality (or a patient who would not or could not use one), it is impossible to definitively attribute the harm to that modality.
This study has affirmed previous findings that processes, workflows and training in in-hours general practice have not adapted adequately to the booking, delivery and follow-up of remote consultations. 24 35 36 Safety issues can arise, for example, from how the remote consultation interfaces with other key practice routines (eg, for making urgent referrals for possible cancer). The sheer complexity and fragmentation of much remote and digital work underscores the findings from a systematic review of the importance of relational coordination (defined as ‘a mutually reinforcing process of communicating and relating for the purpose of task integration ’ (p 3) 37 ) and psychological safety (defined as ‘people’s perceptions of the consequences of taking interpersonal risks in a particular context such as a workplace ’ (p 23) 38 ) in building organisational resilience and assuring safety.
The additional workload and complexity associated with running remote appointments alongside in-person ones is cognitively demanding for staff and requires additional skills for which not all are adequately trained. 24 39 40 We have written separately about the loss of traditional continuity of care as primary care services become digitised, 41–43 and about the unmet training needs of both clinical and support staff for managing remote and digital encounters. 24
Our findings also resonate with research showing that remote modalities can interfere with communicative tasks such as rapport building, establishing a therapeutic relationship and identifying non-verbal cues such as tearfulness 35 36 44 ; that remote consultations tend to be shorter and feature less discussion, information gathering and safety netting 45–48 ; and that clinical assessment in remote encounters may be challenging, 27 49 50 especially when physical examination is needed. 35 36 51 In rare cases, these factors may contribute to incorrect or delayed diagnoses, underestimation of the seriousness or urgency of a case, and failure to identify a deteriorating trajectory. 35 36 52–54
Even when systems seem adequate, patients may struggle to navigate them. 23 30 31 This finding aligns with an important recent review of cognitive load theory in the context of remote and digital health services: because such services are more cognitively demanding for patients, they may widen inequities of access. 55 Some patients lack navigating and negotiating skills, access to key technologies 13 36 or confidence in using them. 30 35 The remote encounter may require the patient to have a sophisticated understanding of access and cross-referral pathways, interpret their own symptoms (including making judgements about severity and urgency), obtain and use self-monitoring technologies (such as a blood pressure machine or oximeter) and convey these data in medically meaningful ways (eg, by completing algorithmic triage forms or via a telephone conversation). 30 56 Furthermore, the remote environment may afford fewer opportunities for holistically evaluating, supporting or safeguarding the vulnerable patient, leading to widening inequities. 13 35 57 Previous work has also shown that patients with pre-existing illness, complex comorbidities or high-risk states, 58 59 language non-concordance, 13 35 inability to describe their symptoms (eg, due to autism 60 ), extremes of age 61 and those with low health or system literacy 30 are more difficult to assess remotely.
Many of the contributory factors to safety incidents in remote encounters have been suggested previously, 35 36 and align broadly with factors that explain safety incidents more generally. 53 62 63 This new study has systematically traced how upstream factors may, very rarely, combine to contribute to avoidable human tragedies—and also how primary care teams develop local safety practices and cultures to help avoid them. Our study provides some important messages for practices and policymakers.
First, remote encounters in general practice are mostly occurring in a system designed for in-person encounters, so existing processes and workflows may work less well for them.
Second, because the remote encounter depends more on history taking and dialogue, verbal communication is even more mission critical. Working remotely under system pressures and optimising verbal communication should both be priorities for staff training.
Third, the remote environment may increase existing inequities as patients’ various vulnerabilities (eg, extremes of age, poverty, language and literacy barriers, comorbidities) make remote communication and assessment more difficult. Our study has revealed impressive efforts from staff to overcome these inequities on an individual basis; some of these workarounds may become normalised and increase efficiency, but others are labour intensive and not scalable.
A final message from this study is that clinical assessment provides less information when a physical examination (and even a basic visual overview) is not possible. Hence, the remote consultation has a higher degree of inherent uncertainty. Even when processes have been optimised (eg, using high-quality triage to allocate modality), but especially when they have not, diagnoses and assessments of severity or urgency should be treated as more provisional and revisited accordingly. We have given examples in the Results section of how local adaptation and rule breaking bring flexibility into the system and may become normalised over time, leading to the creation of locally understood ‘rules of thumb’ which increase safety.
Overall, these findings underscore the need to share learning and develop guidance about the drivers of risk, how these play out in different kinds of remote encounters and how to develop and strengthen Safety II approaches to mitigate those risks. Table 2 shows proposed mitigations at staff, process and system levels, as well as a preliminary list of suggestions for patients, which could be refined with patient input using codesign methods. 64
Reducing safety incidents in remote primary care
This study has helped explain where the key risks lie in remote primary care encounters, which in our dataset were almost all by telephone. It has revealed examples of how front-line staff create and maintain a safety culture, thereby helping to prevent such incidents. We suggest four key avenues for further research. First, additional ethnographic studies in general practice might extend these findings and focus on specific subquestions (eg, how practices identify, capture and learn from near-miss incidents). Second, ethnographic studies of out-of-hours services, which are mostly telephone by default, may reveal additional elements of safety culture from which in-hours general practice could learn. Third, the rise in asynchronous e-consultations (in which patients complete an online template and receive a response by email) raises questions about the safety of this new modality which could be explored in mixed-methods studies including quantitative analysis of what kinds of conditions these consultations cover and qualitative analysis of the content and dynamics of the interaction. Finally, our findings suggest that the safety of new clinically related ‘assistant’ roles in general practice should be urgently evaluated, especially when such staff are undertaking remote assessment or remote triage.
Patient consent for publication.
Not applicable.
Ethical approval was granted by the East Midlands—Leicester South Research Ethics Committee and UK Health Research Authority (September 2021, 21/EM/0170 and subsequent amendments). Access to the NHS Resolution dataset was obtained by secondment of the RP via honorary employment contract, where she worked with staff to de-identify and fictionalise relevant cases. The Remote by Default 2 study (referenced in main text) was co-designed by patients and lay people; it includes a diverse patient panel. Oversight was provided by an independent external advisory group with a lay chair and patient representation. A person with lived experience of a healthcare safety incident (NS) is a co-author on this paper and provided input to data analysis and writing up, especially the recommendations for patients in table 2 .
We thank the participating organisations for cooperating with this study and giving permission to use fictionalised safety incidents. We thank the participants in the ethnographic study (patients, practice staff, policymakers, other informants) who gave generously of their time and members of the study advisory group.
X @dakinfrancesca, @trishgreenhalgh
Contributors RP led the Safety I analysis with support from AC. The Safety II analysis was part of a wider ethnographic study led by TG and SS, on which all other authors undertook fieldwork and contributed data. TG and RP wrote the paper, with all other authors contributing refinements. All authors checked and approved the final manuscript. RP is guarantor.
Funding Funding was from NIHR HS&DR (grant number 132807) (Remote by Default 2 study) and NIHR School for Primary Care Research (grant number 594) (ModCons study), plus an NIHR In-Practice Fellowship for RP.
Competing interests RP was National Professional Advisor, Care Quality Commission 2017–2022, where her role included investigation of safety issues.
Provenance and peer review Not commissioned; externally peer reviewed.
Diagnostic evaluation of the contribution of complementary training subjects in the self-perception of competencies in ethics, social responsibility, and sustainability in engineering students.
2. theoretical framework, 3. review of related research, 4. materials and methods, 4.1. study population, 4.2. instrument, 4.3. data analysis technique, 5.1. descriptive statistics, 5.2. analysis of competencies in ers vs. courses taken, 5.3. relationship of ers competencies with sociodemographic variables, 6. discussion, 7. conclusions, 8. future work, author contributions, institutional review board statement, informed consent statement, data availability statement, conflicts of interest.
Sociodemographic Variable | Category | First Semester n | First Semester % | Last Semesters n | Last Semesters % | Total n | Total % |
---|---|---|---|---|---|---|---|
Gender | Female | 34 | 13.7 | 18 | 10.7 | 52 | 12.4 |
Gender | Male | 210 | 84.3 | 151 | 89.3 | 361 | 86.4 |
Gender | Other | 5 | 2.0 | 0 | 0 | 5 | 1.2 |
Age | 15–25 years | 209 | 83.9 | 86 | 50.9 | 295 | 70.6 |
Age | 26–35 years | 33 | 13.3 | 64 | 37.9 | 97 | 23.2 |
Age | 36 years and above | 7 | 2.8 | 19 | 11.3 | 26 | 6.2 |
Stratum | 1 | 64 | 25.7 | 32 | 18.9 | 96 | 23.0 |
Stratum | 2 | 110 | 44.2 | 83 | 49.1 | 193 | 46.2 |
Stratum | 3 | 69 | 27.7 | 54 | 32.0 | 123 | 29.4 |
Stratum | 4 | 6 | 2.4 | 0 | 0 | 6 | 1.4 |
Expert Characteristic | Category | n | % |
---|---|---|---|
Higher education level | Master’s degree | 13 | 61.9 |
Higher education level | Doctor’s degree | 8 | 38.1 |
Age | 26–35 years | 1 | 4.8 |
Age | 36–45 years | 6 | 28.6 |
Age | 46–55 years | 8 | 38.1 |
Age | 56 years and above | 6 | 28.6 |
Experience in education | 1–5 years | 1 | 4.8 |
Experience in education | 5–10 years | 3 | 14.3 |
Experience in education | Over 10 years | 17 | 81.0 |
Experience in the productive sector | Yes | 14 | 66.7 |
Experience in the productive sector | No | 7 | 33.3 |
Years in the productive sector | 1–5 years | 1 | 4.8 |
Years in the productive sector | 5–10 years | 1 | 4.8 |
Years in the productive sector | Over 10 years | 12 | 57.1 |
TOTAL | | 21 | 100 |
Reliability Statistics
Cronbach’s Alpha | Cronbach’s Alpha Based on Standardized Items | N of Items |
---|---|---|
0.930 | 0.934 | 30 |
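As a rough reading aid for the reliability table above, the standardized form of Cronbach's alpha can be written as alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r is the mean inter-item correlation. The sketch below inverts that formula to see what mean correlation the reported value of 0.934 for 30 items would imply. The function names are illustrative, and the mean correlation is not reported in the source; it is only what the formula implies.

```python
def standardized_alpha(k: int, mean_r: float) -> float:
    """Standardized Cronbach's alpha from item count k and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_correlation(k: int, alpha_std: float) -> float:
    """Invert the standardized-alpha formula to recover the mean inter-item correlation."""
    return alpha_std / (k - alpha_std * (k - 1))


# Values taken from the reliability table.
K_ITEMS = 30
ALPHA_STD = 0.934

# Implied by the formula only; the source does not report this quantity.
mean_r = implied_mean_correlation(K_ITEMS, ALPHA_STD)
print(f"implied mean inter-item correlation: {mean_r:.3f}")
print(f"round-trip alpha: {standardized_alpha(K_ITEMS, mean_r):.3f}")
```

Under this formula, a high alpha on a 30-item instrument is compatible with a fairly modest average correlation between items, which is worth keeping in mind when interpreting the 0.93 figure.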
Competency | Dimension | Indicator | Item |
---|---|---|---|
Social Responsibility [ ] | Awareness | I am aware that I am in the world to contribute responsibly to its transformation | R1 |
Social Responsibility | Awareness | I understand that being part of this world entails a responsibility towards the members of a group or organization for the benefit of society | R2 |
Social Responsibility | Commitment | I am familiar with and care about local issues and their connection to national and global factors | R3 |
Social Responsibility | Citizenship | As a student, I feel that I have the skills to contribute to social, political, and economic changes in my community | R4 |
Social Responsibility | Citizenship | As a student, I would like to contribute to public policies that improve the quality of life for (ethnic, racial, sexual) minority groups and other vulnerable groups (children, women…) | R5 |
Social Responsibility | Social justice | I believe that my educational process provides me with the necessary tools to follow up on public or private programs and initiatives aimed at social transformation | R6 |
Social Responsibility | Social justice | I believe that, through my profession, I can contribute to reducing poverty and inequality in my country | R7 |
Ethics [ ] | Responsibility | In my daily actions, it is important to fulfill my commitments on time | E1 |
Ethics | Responsibility | In my daily actions, I am willing to take responsibility for any mistakes | E2 |
Ethics | Act with moral principles and professional values | I am willing to spend time updating my knowledge about my career | E3 |
Ethics | Act with moral principles and professional values | There are ethical decisions that are so important in my career that I cannot leave them to the sole discretion of others | E4 |
Ethics | Act with moral principles and professional values | In my daily actions, maintaining confidentiality is crucial | E5 |
Ethics | Act with moral principles and professional values | Doing the right things in my daily life brings me inner peace | E6 |
Ethics | Act with moral principles and professional values | I communicate my values through my daily actions | E7 |
Ethics | Professional and personal ethics | To avoid mistakes in my profession, I must be aware of the limits of my knowledge and skills | E8 |
Ethics | Professional and personal ethics | Working with passion is part of my personal fulfillment | E9 |
Ethics | Professional and personal ethics | Ethical aspects are crucial to my career and future profession | E10 |
Ethics | Professional and personal ethics | I must assess the consequences before making important decisions | E11 |
Ethics | Professional and personal ethics | It is good to aspire but not have excessive ambition | E12 |
Ethics | Professional and personal ethics | To perform well in my career, developing technical skills alone is not enough | E13 |
Ethics | Honesty | To be a good professional, I cannot ignore the problems of the society I live in | E14 |
Ethics | Honesty | I take the risk of making mistakes to improve my career performance | E15 |
Sustainability [ ] (S1, S6, S7, S8) [ ] (S2 to S5) | Systemic | I analyze individually or in groups situations related to sustainability and their impact on society, the environment, and the economy, both locally and globally | S1 |
Sustainability | Discipline and regulations | I am aware of the importance of sustainability in society. I learn and then I impact my community | S6 |
Sustainability | Anticipatory | I use resources sustainably in the prevention of negative impacts on the environment and social and economic systems | S7 |
Sustainability | Anticipatory | I anticipate and understand the impact of environmental changes on social and economic systems | S3 |
Sustainability | Strategic | I am aware of the potential of the human and natural resources in my environment for sustainable development | S8 |
Sustainability | Strategic | I actively participate in groups or communities committed to sustainability | S2 |
Sustainability | Action competence for interventions | I am coherent in my actions, respecting and appreciating (biological, social, cultural) diversity and committing myself to improving sustainability | S4 |
Sustainability | Action competence for interventions | I create and provide critical and creative solutions to technology and engineering issues, always considering sustainability | S5 |
Competencies | Social Responsibility | Ethics | Sustainability |
---|---|---|---|
Social responsibility | 1 | ||
Ethics | 0.566 ** | 1 | |
Sustainability | 0.719 ** | 0.484 ** | 1 |
Statistic | Group | Gender | Age | Stratum |
---|---|---|---|---|
Mode | First semester | 2 | 1 | 2 |
Mode | Last semesters | 2 | 1 | 2 |
Mode | All | 2 | 1 | 2 |
Group | Social Responsibility | Ethics | Sustainability |
---|---|---|---|
First semester | 4.028 (0.656) | 4.496 (0.453) | 3.798 (0.689) |
Last semester | 4.101 (0.589) | 4.577 (0.447) | 3.921 (0.646) |
Competency | Levene F | Levene Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
---|---|---|---|---|---|---|---|---|---|
Social responsibility | 0.919 | 0.338 | −1.167 | 416 | 0.244 | −0.07332 | 0.06281 | −0.19679 | 0.05014 |
Ethics | 1.277 | 0.259 | −1.808 | 416 | 0.071 | −0.08127 | 0.04494 | −0.16961 | 0.00706 |
Sustainability | 0.128 | 0.721 | −1.839 | 416 | 0.067 | −0.12317 | 0.06698 | −0.25483 | 0.00849 |
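The columns of the t-test table above are linked by simple arithmetic: the t statistic is the mean difference divided by its standard error, and the half-width of the 95% confidence interval divided by the standard error gives the critical t value that was used. The sketch below is only a consistency check on the reported figures, not the authors' analysis; the dictionary and function names are illustrative.

```python
# Figures copied from the t-test table (first semester vs. last semesters).
REPORTED = {
    "Social responsibility": {"t": -1.167, "diff": -0.07332, "se": 0.06281,
                              "lower": -0.19679, "upper": 0.05014},
    "Ethics": {"t": -1.808, "diff": -0.08127, "se": 0.04494,
               "lower": -0.16961, "upper": 0.00706},
    "Sustainability": {"t": -1.839, "diff": -0.12317, "se": 0.06698,
                       "lower": -0.25483, "upper": 0.00849},
}


def recompute(row: dict) -> tuple[float, float]:
    """Return (t statistic, implied critical t) from the reported row."""
    t_stat = row["diff"] / row["se"]
    # Half-width of the interval equals critical t times the standard error.
    critical_t = (row["upper"] - row["lower"]) / (2 * row["se"])
    return t_stat, critical_t


checks = {name: recompute(row) for name, row in REPORTED.items()}
for name, (t_stat, critical_t) in checks.items():
    print(f"{name}: t = {t_stat:.3f}, implied critical t = {critical_t:.4f}")
```

Each recomputed t agrees with the reported value to rounding, and the implied critical value sits just above 1.96, as expected for a two-sided 95% interval with 416 degrees of freedom. Every interval includes zero, which matches the reported two-tailed significance values above 0.05.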
Statistical Tests | Social Responsibility | Ethics | Sustainability |
---|---|---|---|
Mann–Whitney U test | 20,073.500 | 18,501.000 | 19,304.500 |
Wilcoxon W test | 51,198.500 | 49,626.000 | 50,429.500 |
Z test | −0.800 | −2.101 | −1.435 |
Bilateral asymptotic sig. | 0.424 | 0.036 | 0.151 |
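The Mann–Whitney U and Wilcoxon W rows above are two views of the same rank sum: for a group of size n, W = U + n(n + 1) / 2. The sketch below uses that identity to recover the group size implied by each column, assuming the reported W is the rank sum of a single group (the usual convention in SPSS-style output, though the source does not say so). The names are illustrative.

```python
import math

# (U, W) pairs copied from the table, one per competency.
REPORTED = {
    "Social responsibility": (20073.5, 51198.5),
    "Ethics": (18501.0, 49626.0),
    "Sustainability": (19304.5, 50429.5),
}


def implied_group_size(u: float, w: float) -> float:
    """Solve n(n + 1) / 2 = W - U for n (positive root of the quadratic)."""
    rank_offset = w - u
    return (-1 + math.sqrt(1 + 8 * rank_offset)) / 2


sizes = {name: implied_group_size(u, w) for name, (u, w) in REPORTED.items()}
for name, n in sizes.items():
    print(f"{name}: implied group size = {n:.0f}")
```

All three columns imply a group of 249, which matches the first-semester sample in the sociodemographic table (34 + 210 + 5 = 249), so the U and W rows are internally consistent.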
ANOVA | Gender F | Gender Sig. | Age F | Age Sig. | Stratum F | Stratum Sig. |
---|---|---|---|---|---|---|
Social responsibility | 0.438 | 0.646 | 11.052 | 0.000 | 1.705 | 0.165 |
Ethics | 0.337 | 0.714 | 7.404 | 0.000 | 0.227 | 0.877 |
Sustainability | 0.805 | 0.448 | 9.237 | 0.000 | 0.742 | 0.527 |
Social Responsibility
Age | N | Subset 1 | Subset 2 |
---|---|---|---|
15–25 years | 295 | 3.9603 | |
26–35 years | 97 | 4.2180 | |
36 years and above | 26 | 4.5357 | 4.5357 |
Sig. | | 0.091 | 0.221 |
Yepes, S.M.; Montes, W.F.; Herrera, A. Diagnostic Evaluation of the Contribution of Complementary Training Subjects in the Self-Perception of Competencies in Ethics, Social Responsibility, and Sustainability in Engineering Students. Sustainability 2024, 16, 7069. https://doi.org/10.3390/su16167069
Social work researchers will send out a survey, receive responses, aggregate the results, analyze the data, and form conclusions based on trends. Surveys are one of the most common research methods social workers use — and for good reason. They tend to be relatively simple and are usually affordable.
Social work research means conducting an investigation in accordance with the scientific method. The aim of social work research is to build the social work knowledge base in order to solve practical problems in social work practice or social policy. Investigating phenomena in accordance with the scientific method requires maximal adherence to ...
Graduate Research Methods in Social Work by DeCarlo, et al., is a comprehensive and well-structured guide that serves as an invaluable resource for graduate students delving into the intricate world of social work research. The book is divided into five distinct parts, each carefully curated to provide a step-by-step approach to mastering ...
This textbook was created to provide an introduction to research methods for BSW and MSW students, with particular emphasis on research and practice relevant to students at the University of Texas at Arlington. It provides an introduction to social work students to help evaluate research for evidence-based practice and design social work research projects. It can be used with its companion, A ...
Practice research in social work is evolving and has been iteratively defined through a series of statements over the last 15 years (Epstein et al., 2015; Fook & Evans, 2011; Joubert et al., 2023; Julkunen et al., 2014; Sim et al., 2019). Most recently, the Melbourne Statement on Practice Research (Joubert et al., 2023) focused on practice meeting research, with an emphasis on 'the ...
In addition, Bruce Thyer is the editor of the journal Research in Social Work Practice and expressed interest in updating the book along with the other two candidates. In the field of social work, qualitative research is starting to gain more prominence as are mixed methods and various issues regarding race, ethnicity and gender.
Abstract. Mixed methods are a useful approach chosen by many social work researchers. This article showcases a quality framework using social work examples as practical guidance for social work researchers. Combining methodological literature with practical social work examples, elements of a high-quality approach to mixed methods are showcased ...
A three-part structure introduces the fundamentals of research methods, the different types of social work research, and the use of data analysis for evaluation of social work practice. Chapter-opening vignettes illustrate the value of chapter content to the practicing social worker.
"`Not so much a handbook, but an excellent source of reference' - British Journal of Social Work This volume is the definitive resource for anyone doing research in social work. It details both quantitative and qualitative methods and data collection, as well as suggesting the methods appropriate to particular types of studies.
Table of Contents. Chapter 1: Introduction to research. Chapter 2: Beginning a research project. Chapter 3: Reading and evaluating literature. Chapter 4: Conducting a literature review. Chapter 5: Ethics in social work research. Chapter 6: Linking methods with theory. Chapter 7: Design and causality. Chapter 8: Creating and refining a research ...
Title: Practice Research Methods in Social Work: Processes, Applications, and Implications for Social Service Organisations. Bowen McBeath (1*), Michael J. Austin (2), Sarah Carnochan (2), and Emmeline Chuang (2). (1) School of Social Work, Portland State University, Portland, OR, U.S.A.
Widely considered the best text for the course, RESEARCH METHODS FOR SOCIAL WORK, Seventh Edition strikes an optimal balance of quantitative and qualitative research techniques--illustrating how the two methods complement one another. Allen Rubin and Earl R. Babbie's classic bestseller is acclaimed for its depth and breadth of coverage as well as the authors' clear and often humorous ...
Research on Social Work Practice (RSWP), peer-reviewed and published eight times per year, is a disciplinary journal devoted to the publication of empirical research concerning the assessment methods and outcomes of social work practice. Intervention programs covered include behavior analysis and therapy; psychotherapy or counseling with individuals; case management; and education.
Research for Social Workers has built a strong reputation as an accessible guide to the key research methods and approaches used in the discipline. Ideal for beginners, the book outlines the importance of social work research, its guiding principles and explains how to choose a topic area, develop research questions together with describing the key steps in the research process.
Welcome to the SAGE Edge site for Research Methods for Social Work, 1e! Research Methods for Social Work: A Problem-Based Approach is a comprehensive introduction to methods instruction that engages students innovatively and interactively. Using a case study and problem-based learning (PBL) approach, authors Antoinette Y. Farmer and G. Lawrence Farmer utilize case examples to achieve a level ...
Another critical factor that distinguishes PR from other participatory research methods is the connection between social work practice and social service managers. Compared to action research and empowerment evaluation methodologies, PR is more explicitly organisational in understanding how managers, front line staff and service users make ...
Handbook of Social Work Research Methods Encyclopedia of Survey Research Methods Paul Lavrakas covers all major facets of survey research methodology, from selecting the sample design and the sampling frame, designing and pretesting the questionnaire, data collection, and data coding, to the thorny issues surrounding diminishing response rates ...
Type of Publication: Empirical research articles are published in scholarly or academic journals. These publications are sometimes referred to as "peer-reviewed," "academic" or "refereed" publications. Examples of such publications include: Social Work Research, Mental Health Practice, and Journal of Substance Abuse.
PDF | On Jan 1, 2009, A. Rubin and others published Research Methods for Social Work | Find, read and cite all the research you need on ResearchGate
Methods of Social Work Research I, 19:910:505, Spring 2022. Catalog Course Description: Introduction to a scientific, analytic approach to building knowledge and skills, including the role of concepts and theory, hypothesis formulation, operationalization, research design ...
Mixed methods research adds three important elements to social work research: voices of participants, comprehensive analyses of phenomena, and enhanced validity of findings. For these reasons, the teaching and use of mixed methods research remain integral to social work. Keywords: Mixed methods research, social work.
Using a true experiment in social work research is often pretty difficult, since as I mentioned earlier, true experiments can be quite resource intensive. True experiments work best with relatively large sample sizes, and random assignment, a key criterion for a true experimental design, is hard (and unethical) to execute in practice when you ...
Within this process, the following will be covered: the scientific method for building knowledge for social work practice, ethical standards for scientific inquiry, qualitative and quantitative research methodology, research designs for developing knowledge and systematically evaluating social work practice and human service programs, and the ...
Based on: Campbell Anne, Taylor Brian, McGlade Anne, Research design in social work: Qualitative and quantitative methods. London: Sage Publications - Learning Matters, 2017; 160 pp. ISBN 9781446271247, £20.99 (pbk) ... Qualitative Methods in Social Work Research, 2nd edn. Thousand Oaks, CA: SAGE, 2008. 281 pp. ISBN 978 1412951920 (hbk ...
The complexity of social problems addressed by the social work profession makes mixed methods research an essential tool. This literature review examined common quantitative and qualitative techniques used by social work researchers and what mixed methods research may add to social work research. Surveys and in-depth interviews were the most common quantitative and qualitative data collection ...
Social support was a significant component of an individual's external resources in COR [11]. Perceived social support refers to the subjective feeling and evaluation of the degree of external support from individuals [17]. Many studies have confirmed the positive correlation between social support and LS in adult cancer patients.
Methods. The present study examines how the relationship between area level indicators of deprivation and child welfare interventions across NI has changed over time. Specifically, it extends the current evidence base to look at four stages of children's contact with child and family social work in NI during each of the years from 2010 to ...
Mixed-methods research can facilitate research that cannot be answered using a single method. Though there is controversy concerning what constitutes mixed-methods research, integrating quantitative and qualitative approaches is considered increasingly important, and extant literature has observed and demonstrated that the definition of mixed ...
Background Triage and clinical consultations increasingly occur remotely. We aimed to learn why safety incidents occur in remote encounters and how to prevent them. Setting and sample UK primary care. 95 safety incidents (complaints, settled indemnity claims and reports) involving remote interactions. Separately, 12 general practices followed 2021-2023. Methods Multimethod qualitative study ...
Higher education institutions, as organizations that transform society, have a responsibility to contribute to the construction of a sustainable and resilient world that is aware of the collateral effects of technological advances. This is the initial phase of a research that aims to determine whether subjects in the complementary training area have a significant effect on ethical, social ...