
SciSpace Resources

The Craft of Writing a Strong Hypothesis

Deeptanshu D


Writing a hypothesis is one of the essential elements of a scientific research paper. It needs to be to the point, clearly communicating what your research is trying to accomplish. A blurry, drawn-out, or complexly structured hypothesis can confuse your readers, or worse, the editor and peer reviewers.

A captivating hypothesis is not too intricate. This blog will take you through the process so that, by the end of it, you have a better idea of how to convey your research paper's intent in just one sentence.

What is a Hypothesis?

The first step in your scientific endeavor, a hypothesis, is a strong, concise statement that forms the basis of your research. It is not the same as a thesis statement, which is a brief summary of your research paper.

The sole purpose of a hypothesis is to predict your paper's findings, data, and conclusion. It comes from a place of curiosity and intuition. When you write a hypothesis, you're essentially making an educated guess based on prior scientific knowledge and evidence, which is then proven or disproven through the scientific method.

The reason for undertaking research is to observe a specific phenomenon. A hypothesis, therefore, lays out what the said phenomenon is, and it does so through two variables: an independent and a dependent variable.

The independent variable is the cause behind the observation, while the dependent variable is the effect of the cause. A good example of this is “mixing red and blue forms purple.” In this hypothesis, mixing red and blue is the independent variable, as you're combining the two colors at will. The formation of purple is the dependent variable, as it is conditional on the independent variable.

Different Types of Hypotheses‌


Types of hypotheses

Some would stand by the notion that there are only two types of hypotheses: a null hypothesis and an alternative hypothesis. While that has some truth to it, it is better to distinguish all the common forms, since these terms come up often, and not knowing them can leave you without context.

Apart from null and alternative, there are complex, simple, directional, non-directional, statistical, and associative and causal hypotheses. They don't necessarily have to be exclusive, as one hypothesis can tick many boxes, but knowing the distinctions between them will make it easier for you to construct your own.

1. Null hypothesis

A null hypothesis proposes no relationship between two variables. Denoted by H0, it is a negative statement like “Attending physiotherapy sessions does not affect athletes' on-field performance.” Here, the author claims physiotherapy sessions have no effect on on-field performance; even if an effect is observed, it is only a coincidence.

2. Alternative hypothesis

Considered to be the opposite of a null hypothesis, an alternative hypothesis is denoted as H1 or Ha. It explicitly states that the independent variable affects the dependent variable. A good alternative hypothesis example is “Attending physiotherapy sessions improves athletes' on-field performance” or “Water boils at 100 °C.” The alternative hypothesis further branches into directional and non-directional.

  • Directional hypothesis: A hypothesis that states whether the effect will be positive or negative is called a directional hypothesis. It accompanies H1 with either the ‘<' or ‘>' sign.
  • Non-directional hypothesis: A non-directional hypothesis only claims an effect on the dependent variable. It does not specify whether the result would be positive or negative. The sign for a non-directional hypothesis is ‘≠.'
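A quick sketch can make this distinction concrete. Below, a one-sided (directional) and a two-sided (non-directional) p-value are computed for the same sample; the data, the null value of 50, and the z-test's normal approximation are all illustrative assumptions, not something from this guide.

```python
# Hypothetical sketch: directional vs. non-directional tests of a sample mean.
# The data and the null value (mu0 = 50) are invented for illustration.
from statistics import NormalDist, mean, stdev
from math import sqrt

sample = [52.1, 49.8, 53.4, 51.0, 52.7, 50.9, 53.8, 51.6]  # made-up measurements
mu0 = 50.0  # value under the null hypothesis H0: mu = 50

# z-statistic (normal approximation; a t-test would be more exact for n = 8)
z = (mean(sample) - mu0) / (stdev(sample) / sqrt(len(sample)))

p_directional = 1 - NormalDist().cdf(z)                # H1: mu > 50 (one-sided)
p_nondirectional = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: mu != 50 (two-sided)
```

Note how, in this case, the non-directional p-value is twice the directional one: the two-sided test spreads its rejection region over both tails.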

3. Simple hypothesis

A simple hypothesis is a statement made to reflect the relation between exactly two variables: one independent and one dependent. Consider the example “Smoking is a prominent cause of lung cancer.” The dependent variable, lung cancer, depends on the independent variable, smoking.

4. Complex hypothesis

In contrast to a simple hypothesis, a complex hypothesis implies a relationship between multiple independent and dependent variables. For instance, “Individuals who eat more fruits tend to have higher immunity, lower cholesterol, and a higher metabolism.” The independent variable is eating more fruits, while the dependent variables are higher immunity, lower cholesterol, and a higher metabolism.

5. Associative and causal hypothesis

Associative and causal hypotheses aren't defined by how many variables there are; they define the nature of the relationship between the variables. In an associative hypothesis, changing any one variable, dependent or independent, affects the others. In a causal hypothesis, the independent variable directly affects the dependent variable.

6. Empirical hypothesis

Also referred to as the working hypothesis, an empirical hypothesis claims that a theory can be validated through experiment and observation. This way, the statement appears justifiable rather than a wild guess.

Say the hypothesis is “Women who take iron tablets face a lesser risk of anemia than those who take vitamin B12.” This is an empirical hypothesis: the researcher tests the statement by assessing a group of women who take iron tablets and charting the findings.

7. Statistical hypothesis

The point of a statistical hypothesis is to test an already existing hypothesis by studying a population sample. A hypothesis like “44% of the Indian population belongs to the age group of 22–27” leverages evidence to prove or disprove a particular statement.
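As a rough sketch, a claim like this could be checked against a survey sample with a one-sample proportion z-test. The sample size, the observed count, and the normal approximation below are all invented for illustration.

```python
# Hypothetical sketch: testing the claim "44% of the population is aged 22-27"
# against an invented survey of 1,000 people.
from statistics import NormalDist
from math import sqrt

p0 = 0.44        # claimed proportion (the statistical hypothesis)
n = 1000         # hypothetical survey size
observed = 410   # hypothetical respondents aged 22-27

p_hat = observed / n
se = sqrt(p0 * (1 - p0) / n)                  # standard error under H0
z = (p_hat - p0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test

# A p-value below the chosen significance level (commonly 0.05) would count
# as evidence against the 44% claim.
```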

Characteristics of a Good Hypothesis

Writing a hypothesis is essential as it can make or break your research for you. That includes your chances of getting published in a journal. So when you're designing one, keep an eye out for these pointers:

  • A research hypothesis has to be simple yet clear to look justifiable.
  • It has to be testable — your research would be pointless if the hypothesis is too far-fetched or beyond the reach of current technology.
  • It has to be precise about the results — what you are trying to do and achieve through it should come out in your hypothesis.
  • A research hypothesis should be self-explanatory, leaving no doubt in the reader's mind.
  • If you are developing a relational hypothesis, you need to include the variables and establish an appropriate relationship among them.
  • A hypothesis must leave room for further investigation and experiment.

Separating a Hypothesis from a Prediction

Outside of academia, hypothesis and prediction are often used interchangeably. In research writing, this is not only confusing but also incorrect. And although a hypothesis and prediction are guesses at their core, there are many differences between them.

A hypothesis is an educated guess or even a testable prediction validated through research. It aims to analyze the gathered evidence and facts to define a relationship between variables and put forth a logical explanation behind the nature of events.

Predictions are assumptions or expected outcomes made without any backing evidence. They lean toward speculation, regardless of where they originate.

For this reason, a hypothesis holds much more weight than a prediction. It sticks to the scientific method rather than pure guesswork. “Planets revolve around the Sun” is an example of a hypothesis, as it is based on previous knowledge and observed trends. Additionally, we can test it through the scientific method.

Whereas "COVID-19 will be eradicated by 2030." is a prediction. Even though it results from past trends, we can't prove or disprove it. So, the only way this gets validated is to wait and watch if COVID-19 cases end by 2030.

Finally, How to Write a Hypothesis


Quick tips on writing a hypothesis

1. Be clear about your research question

A hypothesis should instantly address the research question or the problem statement. To do so, you need to ask a question. Understand the constraints of your undertaken research topic and then formulate a simple and topic-centric problem. Only after that can you develop a hypothesis and further test for evidence.

2. Carry out a recce

Once you have your research's foundation laid out, it would be best to conduct preliminary research. Go through previous theories, academic papers, data, and experiments before you start curating your research hypothesis. It will give you an idea of your hypothesis's viability or originality.

Making use of references from relevant research papers helps draft a good research hypothesis. SciSpace Discover offers a repository of over 270 million research papers to browse through and gain a deeper understanding of related studies on a particular topic. Additionally, you can use SciSpace Copilot, your AI research assistant, to read lengthy research papers and get a summarized context of them. A hypothesis can be formed after evaluating many such summarized papers. Copilot also explains theories and equations, simplifies papers, lets you highlight text or clip math equations and tables, and provides a deeper, clearer understanding of what is being said. This can improve your hypothesis by helping you identify potential research gaps.

3. Create a 3-dimensional hypothesis

Variables are an essential part of any reasonable hypothesis. So, identify your independent and dependent variable(s) and form a correlation between them. The ideal way to do this is to write the hypothetical assumption in the ‘if-then' form. If you use this form, make sure that you state the predefined relationship between the variables.

Alternatively, you can present your hypothesis as a comparison between two variables. Here, you must specify the difference you expect to observe in the results.

4. Write the first draft

Now that everything is in place, it's time to write your hypothesis. For starters, create the first draft. In this version, write what you expect to find from your research.

Clearly separate your independent and dependent variables and the link between them. Don't fixate on syntax at this stage. The goal is to ensure your hypothesis addresses the issue.

5. Proof your hypothesis

After preparing the first draft of your hypothesis, you need to inspect it thoroughly. It should tick all the boxes, like being concise, straightforward, relevant, and accurate. Your final hypothesis has to be well-structured as well.

Research projects are an exciting and crucial part of being a scholar. And once you have your research question, you need a great hypothesis to begin conducting research. Thus, knowing how to write a hypothesis is very important.

Now that you have a firmer grasp of what constitutes a good hypothesis, the different kinds there are, and what process to follow, you will find it much easier to write your hypothesis, which ultimately helps your research.

Now it's easier than ever to streamline your research workflow with SciSpace Discover . Its integrated, comprehensive end-to-end platform for research allows scholars to easily discover, write and publish their research and fosters collaboration.

It includes everything you need, including a repository of over 270 million research papers across disciplines, SEO-optimized summaries and public profiles to show your expertise and experience.

If you found these tips on writing a research hypothesis useful, head over to our blog on Statistical Hypothesis Testing to learn about the top researchers, papers, and institutions in this domain.

Frequently Asked Questions (FAQs)

1. What is the definition of a hypothesis?

According to the Oxford dictionary, a hypothesis is defined as “An idea or explanation of something that is based on a few known facts, but that has not yet been proved to be true or correct”.

2. What is an example of a hypothesis?

A hypothesis is a statement that proposes a relationship between two or more variables. An example: "If we increase the number of new users who join our platform by 25%, then we will see an increase in revenue."

3. What is an example of a null hypothesis?

A null hypothesis is a statement that there is no relationship between two variables; it is written as H0 and states that there is no effect. For example, if you're studying whether a particular type of exercise increases strength, your null hypothesis would be "there is no difference in strength between people who exercise and people who don't."

4. What are the types of research?

  • Fundamental research
  • Applied research
  • Qualitative research
  • Quantitative research
  • Mixed research
  • Exploratory research
  • Longitudinal research
  • Cross-sectional research
  • Field research
  • Laboratory research
  • Fixed research
  • Flexible research
  • Action research
  • Policy research
  • Classification research
  • Comparative research
  • Causal research
  • Inductive research
  • Deductive research

5. How to write a hypothesis?

  • Your hypothesis should be able to predict the relationship and outcome.
  • Avoid wordiness by keeping it simple and brief.
  • Your hypothesis should contain observable and testable outcomes.
  • Your hypothesis should be relevant to the research question.

6. What are the 2 types of hypothesis?

  • Null hypotheses are used to test the claim that "there is no difference between two groups of data".
  • Alternative hypotheses test the claim that "there is a difference between two data groups".

7. Difference between research question and research hypothesis?

A research question is a broad, open-ended question you will try to answer through your research. A hypothesis is a statement, based on prior research or theory, that you expect your study to support. Example - Research question: What are the factors that influence the adoption of the new technology? Research hypothesis: There is a positive relationship between age, education, and income level and the adoption of the new technology.

8. What is plural for hypothesis?

The plural of hypothesis is hypotheses. Here's an example of how it would be used in a statement, "Numerous well-considered hypotheses are presented in this part, and they are supported by tables and figures that are well-illustrated."

9. What is the red queen hypothesis?

The Red Queen hypothesis in evolutionary biology states that species must constantly evolve to avoid extinction, because otherwise they will be outcompeted by other species that are evolving. Leigh Van Valen first proposed it in 1973; since then, it has been tested and substantiated many times.

10. Who is known as the father of null hypothesis?

The father of the null hypothesis is Sir Ronald Fisher. He introduced the concept of null hypothesis testing in his 1925 book Statistical Methods for Research Workers, and he is also credited with coining the term itself.

11. When to reject null hypothesis?

You need to find a significant difference between your two populations to reject the null hypothesis. You can determine that by running a statistical test such as an independent-samples t-test or a dependent-samples (paired) t-test. You should reject the null hypothesis if the p-value is less than your chosen significance level, conventionally 0.05.
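The reject/fail-to-reject decision above can be sketched without a statistics library by substituting a permutation test for the t-test (scipy.stats.ttest_ind would be the usual tool). The two samples and the resample count below are purely illustrative.

```python
# Hypothetical sketch: permutation test for a difference between two group means.
import random
from statistics import mean

random.seed(0)  # deterministic for illustration
group_a = [14.2, 15.1, 13.8, 16.0, 15.5, 14.9, 15.8, 14.4]  # invented data
group_b = [12.9, 13.4, 12.1, 13.8, 12.6, 13.1, 12.4, 13.6]

observed = mean(group_a) - mean(group_b)
pooled = group_a + group_b
n_a = len(group_a)

# Under H0 the group labels are arbitrary, so reshuffle them many times and
# count how often a difference at least as large as the observed one appears.
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
reject_null = p_value < 0.05  # the conventional 0.05 threshold from the text
```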



How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips

Kendra Cherry, MS, is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Amy Morin, LCSW, is a psychotherapist and international bestselling author. Her books, including "13 Things Mentally Strong People Don't Do," have been translated into more than 40 languages. Her TEDx talk,  "The Secret of Becoming Mentally Strong," is one of the most viewed talks of all time.



A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method, whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. At this point, researchers then begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you  expect  to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the journal articles you read. Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that  if  something was false, then it is possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable "test anxiety" as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs, as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent and dependent variables.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states the opposite of the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the dependent variable if you change the independent variable.

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive methods such as case studies, naturalistic observations, and surveys are often used when conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a correlational study can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.




How to Write a Strong Hypothesis | Guide & Examples

Published on 6 May 2022 by Shona McCombes.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Table of contents

  • What is a hypothesis?
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Frequently asked questions about writing hypotheses

What is a hypothesis?

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations, and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more variables . An independent variable is something the researcher changes or controls. A dependent variable is something the researcher observes and measures.

For example, in the hypothesis "Daily exposure to the sun leads to increased levels of happiness", the independent variable is exposure to the sun – the assumed cause. The dependent variable is the level of happiness – the assumed effect.


Step 1: Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2: Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalise more complex constructs.

Step 3: Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4: Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5: Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if … then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6: Write a null hypothesis

If your research involves statistical hypothesis testing , you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H₀, while the alternative hypothesis is H₁ or Hₐ.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.
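To make this concrete, here is a minimal, self-contained sketch of testing a null hypothesis against an alternative. The happiness scores are invented for illustration, and a simple permutation test stands in for the formal statistical tests a real study would use:

```python
import random

random.seed(42)

# Hypothetical data: happiness scores (0-10) for people with high vs. low
# sun exposure. H0: no difference in mean happiness between the groups.
# H1: the group means differ.
high_sun = [7.1, 6.8, 7.9, 8.2, 6.5, 7.4, 7.8, 6.9, 8.0, 7.2]
low_sun = [6.2, 5.9, 6.8, 6.1, 7.0, 5.5, 6.4, 6.6, 5.8, 6.3]

observed = abs(sum(high_sun) / len(high_sun) - sum(low_sun) / len(low_sun))

# Permutation test: if H0 is true, the group labels are arbitrary, so
# shuffling them should often produce differences as large as the observed one.
pooled = high_sun + low_sun
n = len(high_sun)
count = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / n)
    if diff >= observed:
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
# A small p-value (e.g. below 0.05) leads us to reject H0 in favour of H1.
```

The p-value here is exactly the quantity described above: how likely it is that a difference this large could have arisen by chance alone.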


A research hypothesis is your proposed answer to your research question. The research hypothesis usually includes an explanation (‘ x affects y because …’).

A statistical hypothesis, on the other hand, is a mathematical statement about a population parameter. Statistical hypotheses always come in pairs: the null and alternative hypotheses. In a well-designed study , the statistical hypotheses correspond logically to the research hypothesis.

Cite this Scribbr article


McCombes, S. (2022, May 06). How to Write a Strong Hypothesis | Guide & Examples. Scribbr. Retrieved 15 April 2024, from https://www.scribbr.co.uk/research-methods/hypothesis-writing/



What is and How to Write a Good Hypothesis in Research?


One of the most important aspects of conducting research is constructing a strong hypothesis. But what makes a hypothesis in research effective? In this article, we’ll look at the difference between a hypothesis and a research question, as well as the elements of a good hypothesis in research. We’ll also include some examples of effective hypotheses, and what pitfalls to avoid.

What is a Hypothesis in Research?

Simply put, a hypothesis is a research question that also includes the predicted or expected result of the research. Without a hypothesis, there can be no basis for a scientific or research experiment. As such, it is critical that you carefully construct your hypothesis by being deliberate and thorough, even before you set pen to paper. Unless your hypothesis is clearly and carefully constructed, any flaw can have an adverse, and even grave, effect on the quality of your experiment and its subsequent results.

Research Question vs Hypothesis

It’s easy to confuse research questions with hypotheses, and vice versa. While they’re both critical to the Scientific Method, they have very specific differences. Primarily, a research question, just like a hypothesis, is focused and concise. But a hypothesis includes a prediction based on the proposed research, and is designed to forecast the relationship between two (or more) variables. Research questions are open-ended and invite debate and discussion, while hypotheses are closed, e.g. “The relationship between A and B will be C.”

A hypothesis is generally used if your research topic is fairly well established and you are relatively certain about the relationship between the variables that will be presented in your research. Since a hypothesis is ideally suited to experimental studies, it will, by its very existence, affect the design of your experiment. The research question is typically used for new topics that have not yet been researched extensively, where the relationship between different variables is less well known. No prediction is made, though variables may be explored. The research question can be causal in nature, simply trying to understand whether a relationship even exists, or it can be descriptive or comparative.

How to Write Hypothesis in Research

Writing an effective hypothesis starts before you even begin to type. Like any task, preparation is key, so you start first by conducting research yourself, and reading all you can about the topic that you plan to research. From there, you’ll gain the knowledge you need to understand where your focus within the topic will lie.

Remember that a hypothesis is a prediction of the relationship that exists between two or more variables. Your job is to write a hypothesis, and design the research, to “prove” whether or not your prediction is correct. A common pitfall is to use judgments that are subjective and inappropriate for the construction of a hypothesis. It’s important to keep the focus and language of your hypothesis objective.

An effective hypothesis in research is clearly and concisely written, with any key terms clarified and defined. Specific language must also be used to avoid generalities or assumptions.

Use the following points as a checklist to evaluate the effectiveness of your research hypothesis:

  • Predicts the relationship and outcome
  • Simple and concise – avoid wordiness
  • Clear with no ambiguity or assumptions about the readers’ knowledge
  • Observable and testable results
  • Relevant and specific to the research question or problem

Research Hypothesis Example

Perhaps the best way to evaluate whether or not your hypothesis is effective is to compare it to those of your colleagues in the field. There is no need to reinvent the wheel when it comes to writing a powerful research hypothesis. As you’re reading and preparing your hypothesis, you’ll also read other hypotheses. These can help guide you on what works, and what doesn’t, when it comes to writing a strong research hypothesis.

Here are a few generic examples to get you started.

Eating an apple each day, after the age of 60, will result in a reduced frequency of physician visits.

Budget airlines are more likely than full-service airlines to receive customer complaints. A budget airline is defined as an airline that offers lower fares and fewer amenities than a traditional full-service airline. (Note that the term “budget airline” is defined within the hypothesis.)

Workplaces that offer flexible working hours report higher levels of employee job satisfaction than workplaces with fixed hours.

Each of the above examples is specific, observable, and measurable, and the statement of prediction can be verified or shown to be false using standard experimental practices. It should be noted, however, that your hypothesis will often change as your research progresses.



What is a scientific hypothesis?

It's the initial building block in the scientific method.


  • Hypothesis basics
  • What makes a hypothesis testable
  • Types of hypotheses
  • Hypothesis versus theory
  • Additional resources
  • Bibliography

A scientific hypothesis is a tentative, testable explanation for a phenomenon in the natural world. It's the initial building block in the scientific method . Many describe it as an "educated guess" based on prior knowledge and observation. While this is true, a hypothesis is more informed than a guess. While an "educated guess" suggests a random prediction based on a person's expertise, developing a hypothesis requires active observation and background research. 

The basic idea of a hypothesis is that there is no predetermined outcome. For a solution to be termed a scientific hypothesis, it has to be an idea that can be supported or refuted through carefully crafted experimentation or observation. This concept, called falsifiability and testability, was advanced in the mid-20th century by Austrian-British philosopher Karl Popper in his famous book "The Logic of Scientific Discovery" (Routledge, 1959).

A key function of a hypothesis is to derive predictions about the results of future experiments and then perform those experiments to see whether they support the predictions.

A hypothesis is usually written in the form of an if-then statement, which gives a possibility (if) and explains what may happen because of the possibility (then). The statement could also include "may," according to California State University, Bakersfield .

Here are some examples of hypothesis statements:

  • If garlic repels fleas, then a dog that is given garlic every day will not get fleas.
  • If sugar causes cavities, then people who eat a lot of candy may be more prone to cavities.
  • If ultraviolet light can damage the eyes, then maybe this light can cause blindness.

A useful hypothesis should be testable and falsifiable. That means that it should be possible to prove it wrong. A theory that can't be proved wrong is nonscientific, according to Karl Popper's 1963 book " Conjectures and Refutations ."

An example of an untestable statement is, "Dogs are better than cats." That's because the definition of "better" is vague and subjective. However, an untestable statement can be reworded to make it testable. For example, the previous statement could be changed to this: "Owning a dog is associated with higher levels of physical fitness than owning a cat." With this statement, the researcher can take measures of physical fitness from dog and cat owners and compare the two.

Types of scientific hypotheses


In an experiment, researchers generally state their hypotheses in two ways. The null hypothesis predicts that there will be no relationship between the variables tested, or no difference between the experimental groups. The alternative hypothesis predicts the opposite: that there will be a difference between the experimental groups. This is usually the hypothesis scientists are most interested in, according to the University of Miami .

For example, a null hypothesis might state, "There will be no difference in the rate of muscle growth between people who take a protein supplement and people who don't." The alternative hypothesis would state, "There will be a difference in the rate of muscle growth between people who take a protein supplement and people who don't."

If the results of the experiment show a relationship between the variables, then the null hypothesis has been rejected in favor of the alternative hypothesis, according to the book " Research Methods in Psychology " (​​BCcampus, 2015). 

There are other ways to describe an alternative hypothesis. The alternative hypothesis above does not specify a direction of the effect, only that there will be a difference between the two groups. That type of prediction is called a two-tailed hypothesis. If a hypothesis specifies a certain direction — for example, that people who take a protein supplement will gain more muscle than people who don't — it is called a one-tailed hypothesis, according to William M. K. Trochim , a professor of Policy Analysis and Management at Cornell University.
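The one-tailed versus two-tailed distinction can be sketched in code. The muscle-gain numbers below are invented, and a simple permutation test is used for illustration (not the exact procedure any of the cited sources describe). The only difference between the two p-values is whether differences in just the predicted direction, or in either direction, count as evidence:

```python
import random

random.seed(0)

# Hypothetical muscle gain (kg over 8 weeks) for each participant.
supplement = [2.1, 2.6, 1.9, 2.8, 2.4, 2.2, 2.7, 2.3]
control = [1.8, 2.0, 1.6, 2.1, 1.9, 1.7, 2.2, 1.8]

obs = sum(supplement) / len(supplement) - sum(control) / len(control)

pooled = supplement + control
n = len(supplement)
one_tailed = two_tailed = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:n]) / n - sum(pooled[n:]) / n
    # One-tailed: only differences in the predicted direction
    # (supplement > control) count.
    if diff >= obs:
        one_tailed += 1
    # Two-tailed: differences of this size in either direction count.
    if abs(diff) >= abs(obs):
        two_tailed += 1

print(f"one-tailed p = {one_tailed / trials:.4f}, "
      f"two-tailed p = {two_tailed / trials:.4f}")
```

Because the two-tailed test also counts extreme differences in the unpredicted direction, its p-value is always at least as large as the one-tailed p-value, which is why a directional hypothesis should be stated before the data are seen.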

Sometimes, errors take place during an experiment. These errors can happen in one of two ways. A type I error is when the null hypothesis is rejected when it is true. This is also known as a false positive. A type II error occurs when the null hypothesis is not rejected when it is false. This is also known as a false negative, according to the University of California, Berkeley . 
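The Type I error rate can be made concrete with a simulation: if the null hypothesis really is true, a test run at the 5% significance level should still produce a false positive about 5% of the time. This is a sketch with invented, normally distributed data and a simple z-test, not a procedure from any of the cited sources:

```python
import math
import random

random.seed(1)

def one_experiment(n=30, alpha_z=1.96):
    # The null hypothesis is TRUE by construction: both groups are drawn
    # from the same distribution, so any "significant" result is a
    # Type I error (false positive).
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # z statistic for the difference in means (population sd known to be 1).
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    return abs(z) > alpha_z  # reject H0 at the two-sided 5% level

experiments = 2000
false_positives = sum(one_experiment() for _ in range(experiments))
rate = false_positives / experiments
print(f"Type I error rate: {rate:.3f} (should be near alpha = 0.05)")
```

The observed rejection rate hovers around 0.05, matching the chosen significance level: the significance level is precisely the Type I error rate the researcher agrees to tolerate.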

A hypothesis can be rejected or modified, but it can never be proved correct 100% of the time. For example, a scientist can form a hypothesis stating that if a certain type of tomato has a gene for red pigment, that type of tomato will be red. During research, the scientist then finds that each tomato of this type is red. Though the findings confirm the hypothesis, there may be a tomato of that type somewhere in the world that isn't red. Thus, the hypothesis is supported by the evidence, but it is not proved true in every possible case.

Scientific theory vs. scientific hypothesis

The best hypotheses are simple. They deal with a relatively narrow set of phenomena. But theories are broader; they generally combine multiple hypotheses into a general explanation for a wide range of phenomena, according to the University of California, Berkeley . For example, a hypothesis might state, "If animals adapt to suit their environments, then birds that live on islands with lots of seeds to eat will have differently shaped beaks than birds that live on islands with lots of insects to eat." After testing many hypotheses like these, Charles Darwin formulated an overarching theory: the theory of evolution by natural selection.

"Theories are the ways that we make sense of what we observe in the natural world," Tanner said. "Theories are structures of ideas that explain and interpret facts." 

  • Read more about writing a hypothesis, from the American Medical Writers Association.
  • Find out why a hypothesis isn't always necessary in science, from The American Biology Teacher.
  • Learn about null and alternative hypotheses, from Prof. Essa on YouTube .

Encyclopedia Britannica. Scientific Hypothesis. Jan. 13, 2022. https://www.britannica.com/science/scientific-hypothesis

Karl Popper, "The Logic of Scientific Discovery," Routledge, 1959.

California State University, Bakersfield, "Formatting a testable hypothesis." https://www.csub.edu/~ddodenhoff/Bio100/Bio100sp04/formattingahypothesis.htm  

Karl Popper, "Conjectures and Refutations," Routledge, 1963.

Price, P., Jhangiani, R., & Chiang, I., "Research Methods of Psychology — 2nd Canadian Edition," BCcampus, 2015.‌

University of Miami, "The Scientific Method" http://www.bio.miami.edu/dana/161/evolution/161app1_scimethod.pdf  

William M.K. Trochim, "Research Methods Knowledge Base," https://conjointly.com/kb/hypotheses-explained/  

University of California, Berkeley, "Multiple Hypothesis Testing and False Discovery Rate" https://www.stat.berkeley.edu/~hhuang/STAT141/Lecture-FDR.pdf  

University of California, Berkeley, "Science at multiple levels" https://undsci.berkeley.edu/article/0_0_0/howscienceworks_19


Alina Bradford



J Korean Med Sci. 2022 Apr 25; 37(16)


A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles

Edward Barroga

1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.

Glafera Janet Matanguihan

2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.

The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.

INTRODUCTION

Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6

It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked; if not overlooked, they are framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4

There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought of, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.

DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES

A research question is what a study aims to answer after data analysis and interpretation. The answer is written in length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5

On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4

Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8

Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12

CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES

Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13

There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) have evidence-based logical reasoning 10 ; and 6) can be predicted. 11 Good hypotheses can infer ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory to base the hypotheses on, inductive reasoning based on specific observations or findings forms more general hypotheses. 10

TYPES OF RESEARCH QUESTIONS AND HYPOTHESES

Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .

Research questions in quantitative research

In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .

Hypotheses in quantitative research

In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ); 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ); 4 3) state that there is no relationship between two variables ( null hypothesis ); 4 , 11 , 15 4) replace the null hypothesis if it is rejected ( alternative hypothesis ); 15 5) explain the relationship of phenomena to possibly generate a theory ( working hypothesis ); 11 6) involve quantifiable variables that can be tested statistically ( statistical hypothesis ); 11 or 7) express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research, in Table 3 .

Research questions in qualitative research

Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more often than hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15

There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research questions ); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .

Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15

Hypotheses in qualitative research

Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1

FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES

Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14

The PICOT and PEO frameworks are also used when developing research questions. 1 These frameworks address the following elements. PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study. PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if they meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14

As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ), 17 and show how to transform these ambiguous research questions and hypotheses into clear and good statements.

a These statements were composed for comparison and illustrative purposes only.

b These statements are direct quotes from Higashihara and Horiuchi. 16

a This statement is a direct quote from Shimoda et al. 17

The other statements were composed for comparison and illustrative purposes only.

CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES

To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify the variables to be assessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims. This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .

[Fig. 1. General flow for constructing effective research questions and hypotheses (jkms-37-e121-g001.jpg)]

Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, by contrast, research questions are used more frequently in survey projects, while hypotheses are used more frequently in experiments, to compare variables and their relationships.

Hypotheses are constructed based on the variables identified, as an if-then statement following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves a testable proposition deduced from theory, with independent and dependent variables separated and measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12

In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.

[Fig. 2. Algorithm for building research questions and hypotheses in quantitative research (jkms-37-e121-g002.jpg)]

EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES

  • EXAMPLE 1. Descriptive research question (quantitative research)
  • - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
  • “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
  • RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes? ” 19
  • EXAMPLE 2. Relationship research question (quantitative research)
  • - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
  • “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
  • Research question: What are the effects of peripheral visual field loss on static postural control ?” 20
  • EXAMPLE 3. Comparative research question (quantitative research)
  • - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
  • “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
  • RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
  • STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
  • EXAMPLE 4. Exploratory research question (qualitative research)
  • - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
  • “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
  • EXAMPLE 5. Relationship research question (quantitative research)
  • - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
  • “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23

EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES

  • EXAMPLE 1. Working hypothesis (quantitative research)
  • - A hypothesis that is initially accepted for further research to produce a feasible theory
  • “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
  • “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
  • EXAMPLE 2. Exploratory hypothesis (qualitative research)
  • - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
  • “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
  • “Conclusion
  • Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
  • EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
  • “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
  • “Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above . If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
  • EXAMPLE 4. Statistical hypothesis (quantitative research)
  • - An assumption is made about the relationship among several population characteristics (gender differences in sociodemographic and clinical characteristics of adults with ADHD). Validity is tested by statistical experiment or analysis (chi-squared test, Student's t-test, and logistic regression analysis)
  • “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
  • “Statistical Analysis
  • ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27
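The between-gender comparisons quoted above can be illustrated in code. The sketch below runs a chi-squared test of independence on an invented 2×2 table of gender versus employment status; the counts are made up for illustration and are not data from the cited study.

```python
# Hypothetical counts (NOT the study's data): rows are women and men,
# columns are full-time employed vs. not full-time employed.
from scipy.stats import chi2_contingency

observed = [[30, 70],
            [55, 45]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# The statistical hypothesis is tested against a significance level:
# a small p-value is evidence against H0 (no association).
if p < 0.05:
    print("Reject H0: gender and employment status appear associated")
else:
    print("Fail to reject H0: no evidence of an association")
```

Note that `chi2_contingency` applies Yates' continuity correction by default for 2×2 tables; the logic of testing a statistical hypothesis against a significance level is the same either way.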

EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS

  • EXAMPLE 1. Background, hypotheses, and aims are provided
  • “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
  • “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
  • “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
  • EXAMPLE 2. Background, hypotheses, and aims are provided
  • “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
  • “ We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group .” 29
  • EXAMPLE 3. Background, aim, and hypothesis are provided
  • “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities [1]. BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times [4]. Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
  • “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
  • “ The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education .” 30

Research questions and hypotheses are crucial components of any type of research, whether quantitative or qualitative, and should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. This development is an iterative process based on extensive knowledge of the literature and an insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses, which serve as formal predictions about the research outcomes. Carefully constructed research questions and hypotheses define well-founded objectives that determine the design, course, and outcome of the study, and help avoid unethical studies and poor outcomes.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Conceptualization: Barroga E, Matanguihan GJ.
  • Methodology: Barroga E, Matanguihan GJ.
  • Writing - original draft: Barroga E, Matanguihan GJ.
  • Writing - review & editing: Barroga E, Matanguihan GJ.

Grad Coach

What Is A Research (Scientific) Hypothesis? A plain-language explainer + examples

By:  Derek Jansen (MBA)  | Reviewed By: Dr Eunice Rautenbach | June 2020

If you’re new to the world of research, or it’s your first time writing a dissertation or thesis, you’re probably noticing that the words “research hypothesis” and “scientific hypothesis” are used quite a bit, and you’re wondering what they mean in a research context .

“Hypothesis” is one of those words that people use loosely, thinking they understand what it means. However, it has a very specific meaning within academic research. So, it’s important to understand the exact meaning before you start hypothesizing. 

Research Hypothesis 101

  • What is a hypothesis?
  • What is a research hypothesis (scientific hypothesis)?
  • Requirements for a research hypothesis
  • Definition of a research hypothesis
  • The null hypothesis

What is a hypothesis?

Let’s start with the general definition of a hypothesis (not a research hypothesis or scientific hypothesis), according to the Cambridge Dictionary:

Hypothesis: an idea or explanation for something that is based on known facts but has not yet been proved.

In other words, it’s a statement that provides an explanation for why or how something works, based on facts (or some reasonable assumptions), but that has not yet been specifically tested . For example, a hypothesis might look something like this:

Hypothesis: sleep impacts academic performance.

This statement predicts that academic performance will be influenced by the amount and/or quality of sleep a student engages in – sounds reasonable, right? It’s based on reasonable assumptions , underpinned by what we currently know about sleep and health (from the existing literature). So, loosely speaking, we could call it a hypothesis, at least by the dictionary definition.

But that’s not good enough…

Unfortunately, that’s not quite sophisticated enough to describe a research hypothesis (also sometimes called a scientific hypothesis), and it wouldn’t be acceptable in a dissertation, thesis or research paper. In the world of academic research, a statement needs a few more criteria to constitute a true research hypothesis.

What is a research hypothesis?

A research hypothesis (also called a scientific hypothesis) is a statement about the expected outcome of a study (for example, a dissertation or thesis). To constitute a quality hypothesis, the statement needs to have three attributes – specificity, clarity and testability.

Let’s take a look at these more closely.


Hypothesis Essential #1: Specificity & Clarity

A good research hypothesis needs to be extremely clear and articulate about both what’s being assessed (who or what variables are involved) and the expected outcome (for example, a difference between groups, a relationship between variables, etc.).

Let’s stick with our sleepy students example and look at how this statement could be more specific and clear.

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.

As you can see, the statement is very specific as it identifies the variables involved (sleep hours and test grades), the parties involved (two groups of students), as well as the predicted relationship type (a positive relationship). There’s no ambiguity or uncertainty about who or what is involved in the statement, and the expected outcome is clear.

Contrast that to the original hypothesis we looked at – “Sleep impacts academic performance” – and you can see the difference. “Sleep” and “academic performance” are both comparatively vague , and there’s no indication of what the expected relationship direction is (more sleep or less sleep). As you can see, specificity and clarity are key.

A good research hypothesis needs to be very clear about what’s being assessed and very specific about the expected outcome.

Hypothesis Essential #2: Testability (Provability)

A statement must be testable to qualify as a research hypothesis. In other words, there needs to be a way to prove (or disprove) the statement. If it’s not testable, it’s not a hypothesis – simple as that.

For example, consider the hypothesis we mentioned earlier:

Hypothesis: Students who sleep at least 8 hours per night will, on average, achieve higher grades in standardised tests than students who sleep less than 8 hours a night.  

We could test this statement by undertaking a quantitative study involving two groups of students, one that gets 8 or more hours of sleep per night for a fixed period, and one that gets less. We could then compare the standardised test results for both groups to see if there’s a statistically significant difference. 

Again, if you compare this to the original hypothesis we looked at – “Sleep impacts academic performance” – you can see that it would be quite difficult to test that statement, primarily because it isn’t specific enough. How much sleep? By who? What type of academic performance?

So, remember the mantra – if you can’t test it, it’s not a hypothesis 🙂

A good research hypothesis must be testable. In other words, you must be able to collect observable data in a scientifically rigorous fashion to test it.
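The two-group comparison described above can be sketched as an independent-samples t-test. The grade numbers below are invented purely for illustration:

```python
# Made-up standardised test grades for two hypothetical groups of students.
from scipy.stats import ttest_ind

grades_8h_plus = [78, 85, 82, 90, 76, 88, 84, 79]   # slept >= 8 hours/night
grades_under_8h = [70, 74, 68, 81, 65, 72, 77, 69]  # slept < 8 hours/night

t_stat, p_value = ttest_ind(grades_8h_plus, grades_under_8h)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# A statistically significant difference (conventionally p < 0.05)
# would support the hypothesis; otherwise we fail to reject the null.
if p_value < 0.05:
    print("Statistically significant difference between the groups")
```

In a real study, of course, the hard part is collecting those observations rigorously; the test itself is the easy step.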

Defining A Research Hypothesis

You’re still with us? Great! Let’s recap and pin down a clear definition of a hypothesis.

A research hypothesis (or scientific hypothesis) is a statement about an expected relationship between variables, or explanation of an occurrence, that is clear, specific and testable.

So, when you write up hypotheses for your dissertation or thesis, make sure that they meet all these criteria. If you do, you’ll not only have rock-solid hypotheses but you’ll also ensure a clear focus for your entire research project.

What about the null hypothesis?

You may have also heard the terms null hypothesis, alternative hypothesis, or H-zero thrown around. At a simple level, the null hypothesis is the counter-proposal to the original hypothesis.

For example, if the hypothesis predicts that there is a relationship between two variables (for example, sleep and academic performance), the null hypothesis would predict that there is no relationship between those variables.

At a more technical level, the null hypothesis proposes that no statistical significance exists in a set of given observations and that any differences are due to chance alone.
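That technical definition can be made concrete with a small simulation. Under the null hypothesis, group labels are interchangeable, so shuffling them many times shows how often a difference as large as the observed one arises "by chance alone". The numbers below are invented for illustration; this is a generic permutation test, not tied to any particular study.

```python
# Sketch of what "due to chance alone" means, using only the stdlib.
import random
from statistics import mean

group_a = [78, 85, 82, 90, 76, 88]  # hypothetical scores, condition A
group_b = [70, 74, 68, 81, 65, 72]  # hypothetical scores, condition B
observed_diff = mean(group_a) - mean(group_b)

pooled = group_a + group_b
rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_extreme = 0
n_perms = 10_000
for _ in range(n_perms):
    rng.shuffle(pooled)  # under H0, the labels carry no information
    diff = mean(pooled[:6]) - mean(pooled[6:])
    if abs(diff) >= abs(observed_diff):
        n_extreme += 1

p_value = n_extreme / n_perms
print(f"observed difference = {observed_diff:.2f}, permutation p = {p_value:.4f}")
# A tiny p-value means chance alone rarely produces a gap this large,
# which is evidence against the null hypothesis.
```

If the observed difference were small relative to the within-group spread, the shuffled differences would exceed it often, and we would have no grounds to reject H0.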

And there you have it – hypotheses in a nutshell. 

If you have any questions, be sure to leave a comment below and we’ll do our best to help you. If you need hands-on help developing and testing your hypotheses, consider our private coaching service , where we hold your hand through the research journey.




16 Comments

Lynnet Chikwaikwai

Very useful information. I benefit more from getting more information in this regard.

Dr. WuodArek

Very great insight,educative and informative. Please give meet deep critics on many research data of public international Law like human rights, environment, natural resources, law of the sea etc

Afshin

In a book I read a distinction is made between null, research, and alternative hypothesis. As far as I understand, alternative and research hypotheses are the same. Can you please elaborate? Best Afshin

GANDI Benjamin

This is a self explanatory, easy going site. I will recommend this to my friends and colleagues.

Lucile Dossou-Yovo

Very good definition. How can I cite your definition in my thesis? Thank you. Is nul hypothesis compulsory in a research?

Pereria

It’s a counter-proposal to be proven as a rejection

Egya Salihu

Please what is the difference between alternate hypothesis and research hypothesis?

Mulugeta Tefera

It is a very good explanation. However, it limits hypotheses to statistically tasteable ideas. What about for qualitative researches or other researches that involve quantitative data that don’t need statistical tests?

Derek Jansen

In qualitative research, one typically uses propositions, not hypotheses.

Samia

could you please elaborate it more

Patricia Nyawir

I’ve benefited greatly from these notes, thank you.

Hopeson Khondiwa

This is very helpful

Dr. Andarge

well articulated ideas are presented here, thank you for being reliable sources of information

TAUNO

Excellent. Thanks for being clear and sound about the research methodology and hypothesis (quantitative research)

I have only a simple question regarding the null hypothesis. – Is the null hypothesis (Ho) known as the reversible hypothesis of the alternative hypothesis (H1? – How to test it in academic research?

Tesfaye Negesa Urge

this is very important note help me much more



Wordvice


How to Write a Research Hypothesis: Good & Bad Examples


What is a research hypothesis?

A research hypothesis is an attempt at explaining a phenomenon or the relationships between phenomena/variables in the real world. Hypotheses are sometimes called “educated guesses”, but they are in fact (or let’s say they should be) based on previous observations, existing theories, scientific evidence, and logic. A research hypothesis is also not a prediction—rather, predictions are (or should be) based on clearly formulated hypotheses. For example, “We tested the hypothesis that KLF2 knockout mice would show deficiencies in heart development” is an assumption or prediction, not a hypothesis.

The research hypothesis at the basis of this prediction is “the product of the KLF2 gene is involved in the development of the cardiovascular system in mice”—and this hypothesis is probably (hopefully) based on a clear observation, such as that mice with low levels of Kruppel-like factor 2 (which KLF2 codes for) seem to have heart problems. From this hypothesis, you can derive the idea that a mouse in which this particular gene does not function cannot develop a normal cardiovascular system, and then make the prediction that we started with. 

What is the difference between a hypothesis and a prediction?

You might think that these are very subtle differences, and you will certainly come across many publications that do not contain an actual hypothesis or do not make these distinctions correctly. But considering that the formulation and testing of hypotheses is an integral part of the scientific method, it is good to be aware of the concepts underlying this approach. The two hallmarks of a scientific hypothesis are falsifiability (an evaluation standard that was introduced by the philosopher of science Karl Popper in 1934) and testability —if you cannot use experiments or data to decide whether an idea is true or false, then it is not a hypothesis (or at least a very bad one).

So, in a nutshell, you (1) look at existing evidence/theories, (2) come up with a hypothesis, (3) make a prediction that allows you to (4) design an experiment or data analysis to test it, and (5) come to a conclusion. Of course, not all studies have hypotheses (there is also exploratory or hypothesis-generating research), and you do not necessarily have to state your hypothesis as such in your paper. 

But for the sake of understanding the principles of the scientific method, let’s first take a closer look at the different types of hypotheses that research articles refer to and then give you a step-by-step guide for how to formulate a strong hypothesis for your own paper.

Types of Research Hypotheses

Hypotheses can be simple, which means they describe the relationship between one single independent variable (the one you observe variations in or plan to manipulate) and one single dependent variable (the one you expect to be affected by the variations/manipulation). If there are more variables on either side, you are dealing with a complex hypothesis. You can also distinguish hypotheses according to the kind of relationship between the variables you are interested in (e.g., causal or associative). But apart from these variations, we are usually interested in what is called the “alternative hypothesis” and, in contrast to that, the “null hypothesis”. If you think these two should be listed the other way round, then you are right, logically speaking—the alternative should surely come second. However, since this is the hypothesis we (as researchers) are usually interested in, let’s start from there.

Alternative Hypothesis

If you predict a relationship between two variables in your study, then the research hypothesis that you formulate to describe that relationship is your alternative hypothesis (usually H1 in statistical terms). The goal of your hypothesis testing is thus to demonstrate that there is sufficient evidence that supports the alternative hypothesis, rather than evidence for the possibility that there is no such relationship. The alternative hypothesis is usually the research hypothesis of a study and is based on the literature, previous observations, and widely known theories. 

Null Hypothesis

The hypothesis that describes the other possible outcome, that is, that your variables are not related, is the null hypothesis (H0). Based on your findings, you choose between the two hypotheses—usually that means that if your prediction was correct, you reject the null hypothesis and accept the alternative. Make sure, however, that you are not getting lost at this step of the thinking process: If your prediction is that there will be no difference or change, then you are trying to find support for the null hypothesis and reject H1.

Directional Hypothesis

While the null hypothesis is obviously “static”, the alternative hypothesis can specify a direction for the observed relationship between variables—for example, that mice with higher expression levels of a certain protein are more active than those with lower levels. This is then called a one-tailed hypothesis. 

Another example of a directional, one-tailed alternative hypothesis would be that

H1: Attending private classes before important exams has a positive effect on performance. 

Your null hypothesis would then be that

H0: Attending private classes before important exams has no/a negative effect on performance.

Nondirectional Hypothesis

A nondirectional hypothesis does not specify the direction of the potentially observed effect, only that there is a relationship between the studied variables—this is called a two-tailed hypothesis. For instance, if you are studying a new drug that has shown some effects on pathways involved in a certain condition (e.g., anxiety) in vitro in the lab, but you can’t say for sure whether it will have the same effects in an animal model or maybe induce other/side effects that you can’t predict and potentially increase anxiety levels instead, you could state the two hypotheses like this:

H1: The lab-tested drug (somehow) affects anxiety levels in an anxiety mouse model.

You then test this nondirectional alternative hypothesis against the null hypothesis:

H0: The lab-tested drug has no effect on anxiety levels in an anxiety mouse model.
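In statistical practice, directional and nondirectional hypotheses map onto one-tailed and two-tailed tests. As a minimal sketch of that mapping, using the private-classes example from above with invented exam scores and SciPy's `ttest_ind` (the data and the choice of an independent-samples t-test are illustrative assumptions, not a prescription):

```python
from scipy import stats

# Invented exam scores, for illustration only
with_classes = [78, 85, 82, 88, 90, 84, 87, 91]     # attended private classes
without_classes = [72, 75, 70, 78, 74, 76, 73, 77]  # did not attend

# Directional H1 ("positive effect on performance") -> one-tailed test
t_one, p_one = stats.ttest_ind(with_classes, without_classes, alternative="greater")

# Nondirectional H1 ("some effect, direction unknown") -> two-tailed test
t_two, p_two = stats.ttest_ind(with_classes, without_classes, alternative="two-sided")

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

Note that when the observed effect lies in the predicted direction, the two-tailed p-value is twice the one-tailed one, which is one reason to decide on the direction of your hypothesis before looking at your data.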


How to Write a Hypothesis for a Research Paper

Now that we understand the important distinctions between different kinds of research hypotheses, let’s look at a simple process of how to write a hypothesis.

Writing a Hypothesis Step 1:

Ask a question, based on earlier research. Research always starts with a question, but one that takes into account what is already known about a topic or phenomenon. For example, if you are interested in whether people who have pets are happier than those who don’t, do a literature search and find out what has already been demonstrated. You will probably realize that yes, there is quite a bit of research that shows a relationship between happiness and owning a pet—and even studies that show that owning a dog is more beneficial than owning a cat! Let’s say you are so intrigued by this finding that you wonder:

What is it that makes dog owners even happier than cat owners? 

Let’s move on to Step 2 and find an answer to that question.

Writing a Hypothesis Step 2:

Formulate a strong hypothesis by answering your own question. Again, you don’t want to make things up, take unicorns into account, or repeat/ignore what has already been done. Looking at the dog-vs-cat papers your literature search returned, you see that most studies are based on self-report questionnaires on personality traits, mental health, and life satisfaction. What you don’t find is any data on actual (mental or physical) health measures, and no experiments. You therefore decide to come up with a carefully thought-through hypothesis: that it’s maybe the lifestyle of the dog owners, which includes walking their dog several times per day, engaging in fun and healthy activities such as agility competitions, and taking them on trips, that gives them that extra boost in happiness. You could therefore answer your question in the following way:

Dog owners are happier than cat owners because of the dog-related activities they engage in.

Now you have to verify that your hypothesis fulfills the two requirements we introduced at the beginning of this resource article: falsifiability and testability. If it can’t be wrong and can’t be tested, it’s not a hypothesis. We are lucky, however, because yes, we can test whether owning a dog but not engaging in any of those activities leads to lower levels of happiness or well-being than owning a dog and playing and running around with them or taking them on trips.

Writing a Hypothesis Step 3:

Make your predictions and define your variables. We have verified that we can test our hypothesis, but now we have to define all the relevant variables, design our experiment or data analysis, and make precise predictions. You could, for example, decide to study dog owners (not surprising at this point), let them fill in questionnaires about their lifestyle as well as their life satisfaction (as other studies did), and then compare two groups of active and inactive dog owners. Alternatively, if you want to go beyond the data that earlier studies produced and analyzed and directly manipulate the activity level of your dog owners to study the effect of that manipulation, you could invite them to your lab, select groups of participants with similar lifestyles, make them change their lifestyle (e.g., couch potato dog owners start agility classes, very active ones have to refrain from any fun activities for a certain period of time) and assess their happiness levels before and after the intervention. In both cases, your independent variable would be “level of engagement in fun activities with their dog” and your dependent variable would be happiness or well-being.

Examples of a Good and Bad Hypothesis

Let’s look at a few examples of good and bad hypotheses to get you started.

Good Hypothesis Examples

Bad Hypothesis Examples

Tips for Writing a Research Hypothesis

If you understood the distinction between a hypothesis and a prediction we made at the beginning of this article, then you will have no problem formulating your hypotheses and predictions correctly. To refresh your memory: We have to (1) look at existing evidence, (2) come up with a hypothesis, (3) make a prediction, and (4) design an experiment. For example, you could summarize your dog/happiness study like this:

(1) While research suggests that dog owners are happier than cat owners, there are no reports on what factors drive this difference. (2) We hypothesized that it is the fun activities that many dog owners (but very few cat owners) engage in with their pets that increase their happiness levels. (3) We thus predicted that preventing very active dog owners from engaging in such activities for some time and making very inactive dog owners take up such activities would lead to a decrease and an increase in their overall self-ratings of happiness, respectively. (4) To test this, we invited dog owners into our lab, assessed their mental and emotional well-being through questionnaires, and then assigned them to an “active” and an “inactive” group, depending on…

Note that you use “we hypothesize” only for your hypothesis, not for your experimental prediction, and “would” or “if … then” only for your prediction, not your hypothesis. A hypothesis that states that something “would” affect something else sounds as if you don’t have enough confidence to make a clear statement—in which case you can’t expect your readers to believe in your research either. Write in the present tense, don’t use modal verbs that express varying degrees of certainty (such as may, might, or could), and remember that you are not drawing a conclusion but, without exaggerating, making a clear statement that you then, in a way, try to disprove. And if that happens, that is not something to fear but an important part of the scientific process.

Similarly, don’t use “we hypothesize” when you explain the implications of your research or make predictions in the conclusion section of your manuscript, since these are clearly not hypotheses in the true sense of the word. As we said earlier, you will find that many authors of academic articles do not seem to care too much about these rather subtle distinctions, but thinking very clearly about your own research will not only help you write better but also ensure that even that infamous Reviewer 2 will find fewer reasons to nitpick about your manuscript. 

Perfect Your Manuscript With Professional Editing

Now that you know how to write a strong research hypothesis for your research paper, you might be interested in our free AI proofreader , Wordvice AI, which finds and fixes errors in grammar, punctuation, and word choice in academic texts. Or if you are interested in human proofreading , check out our English editing services , including research paper editing and manuscript editing .

On the Wordvice academic resources website , you can also find many more articles and other resources that can help you with writing the other parts of your research paper , with making a research paper outline before you put everything together, or with writing an effective cover letter once you are ready to submit.


What is a Hypothesis – Types, Examples and Writing Guide


What is a Hypothesis

Definition:

A hypothesis is an educated guess or proposed explanation for a phenomenon, based on some initial observations or data. It is a tentative statement that can be tested and potentially proven or disproven through further investigation and experimentation.

A hypothesis is often used in scientific research to guide the design of experiments and the collection and analysis of data. It is an essential element of the scientific method, as it allows researchers to make predictions about the outcome of their experiments and to test those predictions to determine their accuracy.

Types of Hypothesis

Types of Hypothesis are as follows:

Research Hypothesis

A research hypothesis is a statement that predicts a relationship between variables. It is usually formulated as a specific statement that can be tested through research, and it is often used in scientific research to guide the design of experiments.

Null Hypothesis

The null hypothesis is a statement that assumes there is no significant difference or relationship between variables. It is often used as a starting point for testing the research hypothesis, and if the results of the study reject the null hypothesis, it suggests that there is a significant difference or relationship between variables.

Alternative Hypothesis

An alternative hypothesis is a statement that assumes there is a significant difference or relationship between variables. It is often used as an alternative to the null hypothesis and is tested against the null hypothesis to determine which statement is more accurate.

Directional Hypothesis

A directional hypothesis is a statement that predicts the direction of the relationship between variables. For example, a researcher might predict that increasing the amount of exercise will result in a decrease in body weight.

Non-directional Hypothesis

A non-directional hypothesis is a statement that predicts the relationship between variables but does not specify the direction. For example, a researcher might predict that there is a relationship between the amount of exercise and body weight, but they do not specify whether increasing or decreasing exercise will affect body weight.

Statistical Hypothesis

A statistical hypothesis is a statement that assumes a particular statistical model or distribution for the data. It is often used in statistical analysis to test the significance of a particular result.

Composite Hypothesis

A composite hypothesis is a statement that assumes more than one condition or outcome. It can be divided into several sub-hypotheses, each of which represents a different possible outcome.

Empirical Hypothesis

An empirical hypothesis is a statement that is based on observed phenomena or data. It is often used in scientific research to develop theories or models that explain the observed phenomena.

Simple Hypothesis

A simple hypothesis is a statement that assumes only one outcome or condition. It is often used in scientific research to test a single variable or factor.

Complex Hypothesis

A complex hypothesis is a statement that assumes multiple outcomes or conditions. It is often used in scientific research to test the effects of multiple variables or factors on a particular outcome.

Applications of Hypothesis

Hypotheses are used in various fields to guide research and make predictions about the outcomes of experiments or observations. Here are some examples of how hypotheses are applied in different fields:

  • Science : In scientific research, hypotheses are used to test the validity of theories and models that explain natural phenomena. For example, a hypothesis might be formulated to test the effects of a particular variable on a natural system, such as the effects of climate change on an ecosystem.
  • Medicine : In medical research, hypotheses are used to test the effectiveness of treatments and therapies for specific conditions. For example, a hypothesis might be formulated to test the effects of a new drug on a particular disease.
  • Psychology : In psychology, hypotheses are used to test theories and models of human behavior and cognition. For example, a hypothesis might be formulated to test the effects of a particular stimulus on the brain or behavior.
  • Sociology : In sociology, hypotheses are used to test theories and models of social phenomena, such as the effects of social structures or institutions on human behavior. For example, a hypothesis might be formulated to test the effects of income inequality on crime rates.
  • Business : In business research, hypotheses are used to test the validity of theories and models that explain business phenomena, such as consumer behavior or market trends. For example, a hypothesis might be formulated to test the effects of a new marketing campaign on consumer buying behavior.
  • Engineering : In engineering, hypotheses are used to test the effectiveness of new technologies or designs. For example, a hypothesis might be formulated to test the efficiency of a new solar panel design.

How to write a Hypothesis

Here are the steps to follow when writing a hypothesis:

Identify the Research Question

The first step is to identify the research question that you want to answer through your study. This question should be clear, specific, and focused. It should be something that can be investigated empirically and that has some relevance or significance in the field.

Conduct a Literature Review

Before writing your hypothesis, it’s essential to conduct a thorough literature review to understand what is already known about the topic. This will help you to identify the research gap and formulate a hypothesis that builds on existing knowledge.

Determine the Variables

The next step is to identify the variables involved in the research question. A variable is any characteristic or factor that can vary or change. There are two types of variables: independent and dependent. The independent variable is the one that is manipulated or changed by the researcher, while the dependent variable is the one that is measured or observed as a result of the independent variable.

Formulate the Hypothesis

Based on the research question and the variables involved, you can now formulate your hypothesis. A hypothesis should be a clear and concise statement that predicts the relationship between the variables. It should be testable through empirical research and based on existing theory or evidence.

Write the Null Hypothesis

The null hypothesis is the opposite of the alternative hypothesis, which is the hypothesis that you are testing. The null hypothesis states that there is no significant difference or relationship between the variables. It is important to write the null hypothesis because it allows you to compare your results with what would be expected by chance.

Refine the Hypothesis

After formulating the hypothesis, it’s important to refine it and make it more precise. This may involve clarifying the variables, specifying the direction of the relationship, or making the hypothesis more testable.

Examples of Hypothesis

Here are a few examples of hypotheses in different fields:

  • Psychology : “Increased exposure to violent video games leads to increased aggressive behavior in adolescents.”
  • Biology : “Higher levels of carbon dioxide in the atmosphere will lead to increased plant growth.”
  • Sociology : “Individuals who grow up in households with higher socioeconomic status will have higher levels of education and income as adults.”
  • Education : “Implementing a new teaching method will result in higher student achievement scores.”
  • Marketing : “Customers who receive a personalized email will be more likely to make a purchase than those who receive a generic email.”
  • Physics : “An increase in temperature will cause an increase in the volume of a gas, assuming all other variables remain constant.”
  • Medicine : “Consuming a diet high in saturated fats will increase the risk of developing heart disease.”

Purpose of Hypothesis

The purpose of a hypothesis is to provide a testable explanation for an observed phenomenon or a prediction of a future outcome based on existing knowledge or theories. A hypothesis is an essential part of the scientific method and helps to guide the research process by providing a clear focus for investigation. It enables scientists to design experiments or studies to gather evidence and data that can support or refute the proposed explanation or prediction.

The formulation of a hypothesis is based on existing knowledge, observations, and theories, and it should be specific, testable, and falsifiable. A specific hypothesis helps to define the research question, which is important in the research process as it guides the selection of an appropriate research design and methodology. Testability of the hypothesis means that it can be proven or disproven through empirical data collection and analysis. Falsifiability means that the hypothesis should be formulated in such a way that it can be proven wrong if it is incorrect.

In addition to guiding the research process, the testing of hypotheses can lead to new discoveries and advancements in scientific knowledge. When a hypothesis is supported by the data, it can be used to develop new theories or models to explain the observed phenomenon. When a hypothesis is not supported by the data, it can help to refine existing theories or prompt the development of new hypotheses to explain the phenomenon.

When to use Hypothesis

Here are some common situations in which hypotheses are used:

  • In scientific research , hypotheses are used to guide the design of experiments and to help researchers make predictions about the outcomes of those experiments.
  • In social science research , hypotheses are used to test theories about human behavior, social relationships, and other phenomena.
  • In business , hypotheses can be used to guide decisions about marketing, product development, and other areas. For example, a hypothesis might be that a new product will sell well in a particular market, and this hypothesis can be tested through market research.

Characteristics of Hypothesis

Here are some common characteristics of a hypothesis:

  • Testable : A hypothesis must be able to be tested through observation or experimentation. This means that it must be possible to collect data that will either support or refute the hypothesis.
  • Falsifiable : A hypothesis must be able to be proven false if it is not supported by the data. If a hypothesis cannot be falsified, then it is not a scientific hypothesis.
  • Clear and concise : A hypothesis should be stated in a clear and concise manner so that it can be easily understood and tested.
  • Based on existing knowledge : A hypothesis should be based on existing knowledge and research in the field. It should not be based on personal beliefs or opinions.
  • Specific : A hypothesis should be specific in terms of the variables being tested and the predicted outcome. This will help to ensure that the research is focused and well-designed.
  • Tentative: A hypothesis is a tentative statement or assumption that requires further testing and evidence to be confirmed or refuted. It is not a final conclusion or assertion.
  • Relevant : A hypothesis should be relevant to the research question or problem being studied. It should address a gap in knowledge or provide a new perspective on the issue.

Advantages of Hypothesis

Hypotheses have several advantages in scientific research and experimentation:

  • Guides research: A hypothesis provides a clear and specific direction for research. It helps to focus the research question, select appropriate methods and variables, and interpret the results.
  • Predictive power: A hypothesis makes predictions about the outcome of research, which can be tested through experimentation. This allows researchers to evaluate the validity of the hypothesis and make new discoveries.
  • Facilitates communication: A hypothesis provides a common language and framework for scientists to communicate with one another about their research. This helps to facilitate the exchange of ideas and promotes collaboration.
  • Efficient use of resources: A hypothesis helps researchers to use their time, resources, and funding efficiently by directing them towards specific research questions and methods that are most likely to yield results.
  • Provides a basis for further research: A hypothesis that is supported by data provides a basis for further research and exploration. It can lead to new hypotheses, theories, and discoveries.
  • Increases objectivity: A hypothesis can help to increase objectivity in research by providing a clear and specific framework for testing and interpreting results. This can reduce bias and increase the reliability of research findings.

Limitations of Hypothesis

Some Limitations of the Hypothesis are as follows:

  • Limited to observable phenomena: Hypotheses are limited to observable phenomena and cannot account for unobservable or intangible factors. This means that some research questions may not be amenable to hypothesis testing.
  • May be inaccurate or incomplete: Hypotheses are based on existing knowledge and research, which may be incomplete or inaccurate. This can lead to flawed hypotheses and erroneous conclusions.
  • May be biased: Hypotheses may be biased by the researcher’s own beliefs, values, or assumptions. This can lead to selective interpretation of data and a lack of objectivity in research.
  • Cannot prove causation: A hypothesis can only show a correlation between variables, but it cannot prove causation. This requires further experimentation and analysis.
  • Limited to specific contexts: Hypotheses are limited to specific contexts and may not be generalizable to other situations or populations. This means that results may not be applicable in other contexts or may require further testing.
  • May be affected by chance : Hypotheses may be affected by chance or random variation, which can obscure or distort the true relationship between variables.

About the author


Muhammad Hassan

Researcher, Academic Writer, Web developer


Research Hypothesis In Psychology: Types, & Examples

Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.


Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A research hypothesis, in its plural form “hypotheses,” is a specific, testable prediction about the anticipated results of a study, established at its outset. It is a key component of the scientific method .

Hypotheses connect theory to data and guide the research process towards expanding scientific understanding.

Some key points about hypotheses:

  • A hypothesis expresses an expected pattern or relationship. It connects the variables under investigation.
  • It is stated in clear, precise terms before any data collection or analysis occurs. This makes the hypothesis testable.
  • A hypothesis must be falsifiable. It should be possible, even if unlikely in practice, to collect data that disconfirms rather than supports the hypothesis.
  • Hypotheses guide research. Scientists design studies to explicitly evaluate hypotheses about how nature works.
  • For a hypothesis to be valid, it must be testable against empirical evidence. The evidence can then confirm or disprove the testable predictions.
  • Hypotheses are informed by background knowledge and observation, but go beyond what is already known to propose an explanation of how or why something occurs.

Predictions typically arise from a thorough knowledge of the research literature, curiosity about real-world problems or implications, and integrating this to advance theory. They build on existing literature while providing new insight.

Types of Research Hypotheses

Alternative Hypothesis

The research hypothesis is often called the alternative or experimental hypothesis in experimental research.

It typically suggests a potential relationship between two key variables: the independent variable, which the researcher manipulates, and the dependent variable, which is measured based on those changes.

The alternative hypothesis states a relationship exists between the two variables being studied (one variable affects the other).


An experimental hypothesis predicts what change(s) will occur in the dependent variable when the independent variable is manipulated.

It states that the results are not due to chance and are significant in supporting the theory being investigated.

The alternative hypothesis can be directional, indicating a specific direction of the effect, or non-directional, suggesting a difference without specifying its nature. It’s what researchers aim to support or demonstrate through their study.

Null Hypothesis

The null hypothesis states no relationship exists between the two variables being studied (one variable does not affect the other). There will be no changes in the dependent variable due to manipulating the independent variable.

It states results are due to chance and are not significant in supporting the idea being investigated.

The null hypothesis, positing no effect or relationship, is a foundational contrast to the research hypothesis in scientific inquiry. It establishes a baseline for statistical testing, promoting objectivity by initiating research from a neutral stance.

Many statistical methods are tailored to test the null hypothesis, determining the likelihood of observed results if no true effect exists.

This dual-hypothesis approach provides clarity, ensuring that research intentions are explicit, and fosters consistency across scientific studies, enhancing the standardization and interpretability of research outcomes.

Nondirectional Hypothesis

A non-directional hypothesis, also known as a two-tailed hypothesis, predicts that there is a difference or relationship between two variables but does not specify the direction of this relationship.

It merely indicates that a change or effect will occur without predicting which group will have higher or lower values.

For example, “There is a difference in performance between Group A and Group B” is a non-directional hypothesis.

Directional Hypothesis

A directional (one-tailed) hypothesis predicts the nature of the effect of the independent variable on the dependent variable. It predicts in which direction the change will take place (i.e., greater, smaller, less, more).

It specifies whether one variable is greater, lesser, or different from another, rather than just indicating that there’s a difference without specifying its nature.

For example, “Exercise increases weight loss” is a directional hypothesis.


Falsifiability

The Falsification Principle, proposed by Karl Popper, is a way of demarcating science from non-science. It suggests that for a theory or hypothesis to be considered scientific, it must be testable and refutable.

Falsifiability emphasizes that scientific claims shouldn’t just be confirmable but should also have the potential to be proven wrong.

It means that there should exist some potential evidence or experiment that could prove the proposition false.

However many confirming instances exist for a theory, it only takes one counter observation to falsify it. For example, the hypothesis that “all swans are white,” can be falsified by observing a black swan.

For Popper, science should attempt to disprove a theory rather than attempt to continually provide evidence to support a research hypothesis.

Can a Hypothesis be Proven?

Hypotheses make probabilistic predictions. They state the expected outcome if a particular relationship exists. However, a study result supporting a hypothesis does not definitively prove it is true.

All studies have limitations. There may be unknown confounding factors or issues that limit the certainty of conclusions. Additional studies may yield different results.

In science, hypotheses can realistically only be supported with some degree of confidence, not proven. The process of science is to incrementally accumulate evidence for and against hypothesized relationships in an ongoing pursuit of better models and explanations that best fit the empirical data. But hypotheses remain open to revision and rejection if that is where the evidence leads.
  • Disproving a hypothesis is definitive. Solid disconfirmatory evidence will falsify a hypothesis and require altering or discarding it based on the evidence.
  • However, confirming evidence is always open to revision. Other explanations may account for the same results, and additional or contradictory evidence may emerge over time.

We can never prove the alternative hypothesis with 100% certainty. Instead, we see whether we can disprove, or reject, the null hypothesis.

If we reject the null hypothesis, this does not mean our alternative hypothesis is correct, but it does lend support to the alternative (experimental) hypothesis.

Upon analysis of the results, an alternative hypothesis can be rejected or supported, but it can never be proven to be correct. We must avoid any reference to results proving a theory as this implies 100% certainty, and there is always a chance that evidence may exist which could refute a theory.

How to Write a Hypothesis

  • Identify variables. The researcher manipulates the independent variable, and the dependent variable is the measured outcome.
  • Operationalize the variables being investigated. Operationalization refers to the process of making the variables physically measurable or testable, e.g. if you are about to study aggression, you might count the number of punches given by participants.
  • Decide on a direction for your prediction. If there is evidence in the literature to support a specific effect of the independent variable on the dependent variable, write a directional (one-tailed) hypothesis. If there are limited or ambiguous findings in the literature regarding the effect of the independent variable on the dependent variable, write a non-directional (two-tailed) hypothesis.
  • Make it testable. Ensure your hypothesis can be tested through experimentation or observation. It should be possible to prove it false (principle of falsifiability).
  • Use clear and concise language. A strong hypothesis is concise (typically one to two sentences long) and formulated using clear and straightforward language, ensuring it's easily understood and testable.

Consider a hypothesis many teachers might subscribe to: students work better on Monday morning than on Friday afternoon (IV = day of the week, DV = standard of work).

Now, if we decide to study this by giving the same group of students a lesson on a Monday morning and a Friday afternoon and then measuring their immediate recall of the material covered in each session, we would end up with the following:

  • The alternative hypothesis states that students will recall significantly more information on a Monday morning than on a Friday afternoon.
  • The null hypothesis states that there will be no significant difference in the amount recalled on a Monday morning compared to a Friday afternoon. Any difference will be due to chance or confounding factors.
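As a rough sketch of how these two hypotheses would be compared (the recall scores below are invented for illustration), the directional prediction can be tested with a one-tailed two-sample t-test, computed here by hand with the Python standard library:

```python
import math
from statistics import mean, variance

# Hypothetical recall scores (items remembered out of 10) for the same
# lesson delivered on a Monday morning and a Friday afternoon.
monday = [8, 9, 7, 8, 9, 8]
friday = [6, 7, 5, 6, 6, 7]

# Pooled two-sample t statistic (equal group sizes).
n = len(monday)
pooled_var = (variance(monday) + variance(friday)) / 2
t = (mean(monday) - mean(friday)) / math.sqrt(pooled_var * (2 / n))

# One-tailed critical value for alpha = 0.05, df = 2n - 2 = 10.
T_CRITICAL = 1.812

reject_null = t > T_CRITICAL
print(f"t = {t:.2f}, reject null hypothesis: {reject_null}")
# A t statistic beyond the critical value lets us reject the null
# hypothesis and *support* (not prove) the directional alternative.
```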

More Examples

  • Memory : Participants exposed to classical music during study sessions will recall more items from a list than those who studied in silence.
  • Social Psychology : Individuals who frequently engage in social media use will report higher levels of perceived social isolation compared to those who use it infrequently.
  • Developmental Psychology : Children who engage in regular imaginative play have better problem-solving skills than those who don’t.
  • Clinical Psychology : Cognitive-behavioral therapy will be more effective in reducing symptoms of anxiety over a 6-month period compared to traditional talk therapy.
  • Cognitive Psychology : Individuals who multitask between various electronic devices will have shorter attention spans on focused tasks than those who single-task.
  • Health Psychology : Patients who practice mindfulness meditation will experience lower levels of chronic pain compared to those who don’t meditate.
  • Organizational Psychology : Employees in open-plan offices will report higher levels of stress than those in private offices.
  • Behavioral Psychology : Rats rewarded with food after pressing a lever will press it more frequently than rats who receive no reward.


Enago Academy

How to Develop a Good Research Hypothesis


The story of a research study begins by asking a question. Researchers all around the globe are asking curious questions and formulating research hypotheses. However, whether the research study provides an effective conclusion depends on how well one develops a good research hypothesis. Research hypothesis examples could help researchers get an idea as to how to write a good research hypothesis.

This blog will help you understand what a research hypothesis is, its characteristics, and how to formulate a research hypothesis.


What is a Hypothesis?

A hypothesis is an assumption or an idea proposed for the sake of argument so that it can be tested. It is a precise, testable statement of what the researchers predict will be the outcome of the study. A hypothesis usually involves proposing a relationship between two variables: the independent variable (what the researchers change) and the dependent variable (what the researchers measure).

What is a Research Hypothesis?

A research hypothesis is a statement that introduces a research question and proposes an expected result. It is an integral part of the scientific method that forms the basis of scientific experiments. Therefore, you need to be careful and thorough when building your research hypothesis. A minor flaw in the construction of your hypothesis could have an adverse effect on your experiment. In research, there is a convention that the hypothesis is written in two forms: the null hypothesis and the alternative hypothesis (called the experimental hypothesis when the method of investigation is an experiment).

Characteristics of a Good Research Hypothesis

As the hypothesis is specific, it makes a testable prediction about what you expect to happen in a study. You may consider drawing your hypothesis from previously published research based on theory.

A good research hypothesis involves more effort than just a guess. In particular, your hypothesis may begin with a question that could be further explored through background research.

To help you formulate a promising research hypothesis, you should ask yourself the following questions:

  • Is the language clear and focused?
  • What is the relationship between your hypothesis and your research topic?
  • Is your hypothesis testable? If yes, then how?
  • What are the possible explanations that you might want to explore?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate your variables without breaching ethical standards?
  • Does your research predict the relationship and outcome?
  • Is your research simple and concise (avoids wordiness)?
  • Is it clear, with no ambiguity or assumptions about the readers' knowledge?
  • Does your research produce observable and testable results?
  • Is it relevant and specific to the research question or problem?


The questions listed above can be used as a checklist to make sure your hypothesis is based on a solid foundation. Furthermore, it can help you identify weaknesses in your hypothesis and revise it if necessary.


How to Formulate a Research Hypothesis

A testable hypothesis is not a simple statement. It is rather an intricate statement that needs to offer a clear introduction to a scientific experiment, its intentions, and the possible outcomes. However, there are some important things to consider when building a compelling hypothesis.

1. State the problem that you are trying to solve.

Make sure that the hypothesis clearly defines the topic and the focus of the experiment.

2. Try to write the hypothesis as an if-then statement.

Follow this template: If a specific action is taken, then a certain outcome is expected.

3. Define the variables

Independent variables are the ones that are manipulated, controlled, or changed. Independent variables are isolated from other factors of the study.

Dependent variables, as the name suggests, are dependent on other factors of the study. They are influenced by the change in the independent variable.

4. Scrutinize the hypothesis

Evaluate assumptions, predictions, and evidence rigorously to refine your understanding.

Types of Research Hypothesis

The types of research hypothesis are stated below:

1. Simple Hypothesis

It predicts the relationship between a single dependent variable and a single independent variable.

2. Complex Hypothesis

It predicts the relationship between two or more independent and dependent variables.

3. Directional Hypothesis

It specifies the expected direction to be followed to determine the relationship between variables and is derived from theory. Furthermore, it implies the researcher’s intellectual commitment to a particular outcome.

4. Non-directional Hypothesis

It does not predict the exact direction or nature of the relationship between the two variables. The non-directional hypothesis is used when there is no theory involved or when findings contradict previous research.

5. Associative and Causal Hypothesis

The associative hypothesis defines interdependency between variables: a change in one variable results in a change in the other variable. On the other hand, the causal hypothesis proposes an effect on the dependent variable due to manipulation of the independent variable.

6. Null Hypothesis

The null hypothesis is a negative statement asserting that there is no relationship between the two variables. There will be no changes in the dependent variable due to the manipulation of the independent variable. Furthermore, it states that the results are due to chance and are not significant in terms of supporting the idea being investigated.

7. Alternative Hypothesis

It states that there is a relationship between the two variables of the study and that the results are significant to the research topic. An experimental hypothesis predicts what changes will take place in the dependent variable when the independent variable is manipulated. Also, it states that the results are not due to chance and that they are significant in terms of supporting the theory being investigated.

Research Hypothesis Examples of Independent and Dependent Variables

Research Hypothesis Example 1: A greater number of coal plants in a region (independent variable) increases water pollution (dependent variable). If you change the independent variable (building more coal factories), it will change the dependent variable (amount of water pollution).
Research Hypothesis Example 2: What is the effect of diet or regular soda (independent variable) on blood sugar levels (dependent variable)? If you change the independent variable (the type of soda you consume), it will change the dependent variable (blood sugar levels).

You should not ignore the importance of the above steps. The validity of your experiment and its results rely on a robust testable hypothesis. Developing a strong testable hypothesis has a few advantages: it compels us to think intensely and specifically about the outcomes of a study. Consequently, it enables us to understand the implication of the question and the different variables involved in the study. Furthermore, it helps us to make precise predictions based on prior research. Hence, forming a hypothesis would be of great value to the research. Here are some good examples of testable hypotheses.

More importantly, you need to build a robust testable research hypothesis for your scientific experiments. A testable hypothesis is a hypothesis that can be proved or disproved as a result of experimentation.

Importance of a Testable Hypothesis

To devise and perform an experiment using the scientific method, you need to make sure that your hypothesis is testable. To be considered testable, some essential criteria must be met:

  • There must be a possibility to prove that the hypothesis is true.
  • There must be a possibility to prove that the hypothesis is false.
  • The results of the hypothesis must be reproducible.

Without these criteria, the hypothesis and the results will be vague. As a result, the experiment will not prove or disprove anything significant.

What are your experiences with building hypotheses for scientific experiments? What challenges did you face? How did you overcome these challenges? Please share your thoughts with us in the comments section.

Frequently Asked Questions

The steps to write a research hypothesis are:
1. Stating the problem: Ensure that the hypothesis defines the research problem.
2. Writing the hypothesis as an 'if-then' statement: Include the action and the expected outcome of your study by following an 'if-then' structure.
3. Defining the variables: Define the variables as dependent or independent based on their dependency on other factors.
4. Scrutinizing the hypothesis: Identify the type of your hypothesis.

Hypothesis testing is a statistical tool used to make inferences about population data and draw conclusions about a particular hypothesis.

Hypothesis in statistics is a formal statement about the nature of a population within a structured framework of a statistical model. It is used to test an existing hypothesis by studying a population.

A research hypothesis is a statement that introduces a research question and proposes an expected result. It forms the basis of scientific experiments.

The different types of hypothesis in research are:
  • Null hypothesis: A negative statement asserting that there is no relationship between the two variables.
  • Alternative hypothesis: Predicts the relationship between the two variables of the study.
  • Directional hypothesis: Specifies the expected direction of the relationship between variables.
  • Non-directional hypothesis: Does not predict the exact direction or nature of the relationship between the two variables.
  • Simple hypothesis: Predicts the relationship between a single dependent variable and a single independent variable.
  • Complex hypothesis: Predicts the relationship between two or more independent and dependent variables.
  • Associative and causal hypothesis: An associative hypothesis defines interdependency between variables, while a causal hypothesis proposes a cause-and-effect relationship between them.
  • Empirical hypothesis: Can be tested via experiments and observation.
  • Statistical hypothesis: Utilizes statistical models to draw conclusions about broader populations.






How to Write a Scientific Report | Step-by-Step Guide

Got to document an experiment but don't know how? In this post, we'll guide you step-by-step through how to write a scientific report and provide you with an example.




Is your teacher expecting you to write an experimental report for every class experiment? Are you still unsure about how to write a scientific report properly? Don’t fear! We will guide you through all the parts of a scientific report, step-by-step.

How to write a scientific report:

  • What is a scientific report
  • General rules to write Scientific reports
  • Syllabus dot point 
  • Introduction/Background information
  • Risk assessment

What is a scientific report?

A scientific report documents all aspects of an experimental investigation. This includes:

  • The aim of the experiment
  • The hypothesis
  • An introduction to the relevant background theory
  • The methods used
  • The results
  • A discussion of the results
  • The conclusion

Scientific reports allow their readers to understand the experiment without doing it themselves. In addition, scientific reports give others the opportunity to check the methodology of the experiment to ensure the validity of the results.

A scientific report is written in several stages. We write the introduction, aim, and hypothesis before performing the experiment, record the results during the experiment, and complete the discussion and conclusions after the experiment.

But, before we delve deeper into how to write a scientific report, we need to have a science experiment to write about! Read our 7 Simple Experiments You Can Do At Home article and see which one you want to do.


General rules about writing scientific reports

Learning how to write a scientific report is different from writing English essays or speeches!

You have to use:

  • Passive voice (which you should avoid when writing for other subjects like English!)
  • Past-tense language
  • Headings and subheadings
  • A pencil to draw scientific diagrams and graphs
  • Simple and clear lines for scientific diagrams
  • Tables and graphs where necessary

Structure of scientific reports:

Now that you know the general rules on how to write scientific reports, let’s look at the conventions for their structure!

1. Title

The title should simply introduce what your experiment is about.

The Role of Light in Photosynthesis

2. Introduction/Background information

Write a paragraph that gives your readers background information to understand your experiment.

This includes explaining scientific theories, processes and other related knowledge.

Photosynthesis is a vital process for life. It occurs when plants take in carbon dioxide, water, and light, and results in the production of glucose and oxygen. The light required for photosynthesis is absorbed by chlorophyll, the green pigment of plants, which is contained in the chloroplasts.

The glucose produced through photosynthesis is stored as starch, which is used as an energy source for the plant and its consumers.

The presence of starch in the leaves of a plant indicates that photosynthesis has occurred.


3. Aim

The aim identifies what is going to be tested in the experiment. This should be short, concise and clear.

The aim of the experiment is to test whether light is required for photosynthesis to occur.

4. Hypothesis

The hypothesis is a prediction of the outcome of the experiment. You have to use background information to make an educated prediction.

It is predicted that photosynthesis will occur only in leaves that are exposed to light and not in leaves that are not exposed to light. This will be indicated by the presence or absence of starch in the leaves.

5. Risk assessment

Identify the hazards associated with the experiment and provide a method to prevent or minimise the risks. A hazard is something that can cause harm, and the risk is the likelihood that harm will occur from the hazard.

A table is an excellent way to present your risk assessment.

Remember, you have to specify the type of harm that can occur because of the hazard. It is not enough to simply identify the hazard.

  • Do not write:  “Scissors are sharp”
  • Instead, you have to write:  “Scissors are sharp and can cause injury”


6. Method

The method has 3 parts:

  • A list of every material used
  • Steps of what you did in the experiment
  • A scientific diagram of the experimental apparatus

Let’s break down what you need to do for each section.

6a. Materials

This must list every piece of equipment and material you used in the experiment.

Remember, you need to also specify the amount of each material you used.

  • 1 geranium plant
  • Aluminium foil
  • 2 test tubes
  • 1 test tube rack
  • 1 pair of scissors
  • 1 250 mL beaker
  • 1 pair of forceps
  • 1 10 mL measuring cylinder
  • Iodine solution (5 mL)
  • Methylated spirit (50 mL)
  • Boiling water
  • 2 Petri dishes


6b. Steps

The rule of thumb is that you should write the method in a clear way so that readers are able to repeat the experiment and get similar results.

Using a numbered list for the steps of your experimental procedure is much clearer than writing a whole paragraph of text.  The steps should:

  • Be written in a sequential order, based on when they were performed.
  • Specify any equipment that was used.
  • Specify the quantity of any materials that were used.

You also need to use past tense and passive voice when you are writing your method. Scientific reports are supposed to show the readers what you did in the experiment, not what you will do.

  • Aluminium foil was used to fully cover a leaf of the geranium plant. The plant was left in the sun for three days.
  • On the third day, the covered leaf and 1 non-covered leaf were collected from the plant. The foil was removed from the covered leaf, and a 1 cm square was cut from each leaf using a pair of scissors.
  • 150 mL of water was boiled in a kettle and poured into a 250 mL beaker.
  • Using forceps, the 1 cm square of covered leaf was placed into the beaker of boiling water for 2 minutes. It was then placed in a test tube labelled “dark”.
  • The water in the beaker was discarded and replaced with 150 mL of freshly boiled water.
  • Using forceps, the 1 cm square non-covered leaf was placed into the beaker of boiling water for 2 minutes. It was then placed in a test tube labelled "light".
  • 5 mL of methylated spirit was measured with a measuring cylinder and poured into each test tube so that the leaves were fully covered.
  • The water in the beaker was replaced with 150 mL of freshly boiled water and both the “light” and “dark” test tubes were immersed in the beaker of boiling water for 5 minutes.
  • The leaves were collected from each test tube with forceps, rinsed under cold running water, and placed onto separate labelled Petri dishes.
  • 3 drops of iodine solution were added to each leaf.
  • Both Petri dishes were placed side by side and observations were recorded.
  • The experiment was repeated 5 times, and results were compared between different groups.

6c. Diagram

After you finish your steps, it is time to draw your scientific diagrams! Here are some rules for drawing scientific diagrams:

  • Always use a pencil to draw your scientific diagrams.
  • Use simple, sharp, 2D lines and shapes to draw your diagram. Don’t draw 3D shapes or use shading.
  • Label everything in your diagram.
  • Use thin, straight lines to label your diagram. Do not use arrows.
  • Ensure that the label lines touch the outline of the equipment you are labelling, and do not cross over it or stop short of it.
  • The label lines should never cross over each other.
  • Use a ruler for any straight lines in your diagram.
  • Draw a sufficiently large diagram so all components can be seen clearly.


7. Results

This is where you document the results of your experiment. The data that you record for your experiment will generally be qualitative and/or quantitative.

Qualitative data is data that relates to qualities and is based on observations (qualitative – quality). This type of data is descriptive and is recorded in words. For example, the colour changed from green to orange, or the liquid became hot.

Quantitative data refers to numerical data (quantitative – quantity). This type of data is recorded using numbers and is either measured or counted. For example, the plant grew 5.2 cm, or there were 5 frogs.

You also need to record your results in an appropriate way. Most of the time, a table is the best way to do this.

Here are some rules for using tables:

  • Use a pencil and a ruler to draw your table
  • Draw neat and straight lines
  • Ensure that the table is closed (connect all your lines)
  • Don’t cross your lines (erase any lines that stick out of the table)
  • Use appropriate columns and rows
  • Properly name each column and row (including the units of measurement in brackets)
  • Do not write your units in the body of your table (units belong in the header)
  • Always include a title

Note : If your results require calculations, clearly write each step.
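The table rules above (a title, properly named columns, no units in the body) can be followed even in plain text. As a quick sketch, with hypothetical observations modelled on the photosynthesis experiment described in this guide:

```python
# Build a plain-text results table following the rules above:
# a title, named columns, and units/descriptions kept out of the body.
title = "Observations of the effects of light on the amount of starch in plant leaves"
headers = ("Leaf condition", "Colour after iodine", "Starch present")
rows = [
    ("Exposed to light", "Dark purple", "Yes"),    # hypothetical observation
    ("Covered with foil", "Yellow-brown", "No"),   # hypothetical observation
]

# Pad each column to the width of its longest cell, then join rows.
widths = [max(len(str(cell)) for cell in col) for col in zip(headers, *rows)]
lines = [title]
for row in (headers, *rows):
    lines.append(" | ".join(str(c).ljust(w) for c, w in zip(row, widths)))
table = "\n".join(lines)
print(table)
```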

Observations of the effects of light on the amount of starch in plant leaves.


If quantitative data was recorded, the data is often also plotted on a graph.

8. Discussion

The discussion is where you analyse and interpret your results, and identify any experimental errors or possible areas of improvement.

You should divide your discussion as follows.

1. Trend in the results

Describe the ‘trend’ in your results. That is, the relationship you observed between your independent and dependent variables.

The independent variable is the variable that you are changing in the experiment. In this experiment, it is the amount of light that the leaves are exposed to.

The dependent variable is the variable that you are measuring in the experiment. In this experiment, it is the presence of starch in the leaves.

The presence of starch is indicated when the addition of iodine causes the leaf to turn dark purple. The results show that starch was present in the leaves that were exposed to light, while the leaves that were not exposed to light did not contain starch.

2. Scientific explanation:

Provide an explanation of the results using scientific knowledge, theories and any other scientific resources you find.

As starch is produced during photosynthesis, these results show that light plays a key role in photosynthesis.

3. Validity 

Validity refers to whether or not your results are valid. This can be done by examining your variables.

VAlidity = VAriables

Identify the independent, dependent, controlled variables and the control experiment (if you have one).

The controlled variables are the variables that you keep the same across all tests e.g. the size of the leaf sample.

The control experiment is where you don’t apply an independent variable. It is untouched for the whole experiment.

Ensure that you never change more than one variable at a time!

The independent variable of the experiment was the amount of light that the leaves were exposed to (the covered and uncovered geranium leaf), while the dependent variable was the presence of starch. The controlled variables were the size of the leaf sample, the duration of the experiment, the amount of time the solutions were heated, and the amount of iodine solution used.

4. Reliability 

Identify how you ensured the reliability of the results.

REliability = REpetition

Show that you repeated your experiments, cross-checked your results with other groups or collated your results with the class.

The reliability of the results was ensured by repeating the experiment 5 times and comparing results with other groups. Since other groups obtained comparable results, the results are reliable.
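As a rough numerical sketch of this idea, agreement between repeats can be checked by comparing their spread to their mean. The five values below are invented for illustration, not taken from the experiment:

```python
from statistics import mean, stdev

# Hypothetical results from five repeats of the same measurement
trials = [9.79, 9.81, 9.78, 9.82, 9.80]

# A small standard deviation relative to the mean suggests the repeats agree,
# which supports (but does not by itself prove) the reliability of the result.
print(f"mean = {mean(trials):.2f}, stdev = {stdev(trials):.3f}")
```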

5. Accuracy

Accuracy should be discussed if your results are in the form of quantitative data, and there is an accepted value for the result.

Accuracy would not be discussed for our example photosynthesis experiment, as qualitative data was collected; however, it would be if we were measuring gravity using a pendulum:

The measured value of gravity was 9.8 m/s², which is in agreement with the accepted value of 9.8 m/s².
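As a sketch of how such a comparison could be quantified, here is a hypothetical pendulum calculation; the length, period, and the `percent_error` helper are illustrative assumptions, not part of the report above:

```python
import math

def pendulum_g(length_m: float, period_s: float) -> float:
    """Estimate g from a simple pendulum using g = 4*pi^2 * L / T^2."""
    return 4 * math.pi ** 2 * length_m / period_s ** 2

def percent_error(measured: float, accepted: float) -> float:
    """Percent difference between a measured value and the accepted value."""
    return abs(measured - accepted) / abs(accepted) * 100

# Hypothetical readings: a 1.00 m pendulum with a measured period of 2.007 s
g_measured = pendulum_g(1.00, 2.007)
print(round(g_measured, 2), round(percent_error(g_measured, 9.8), 2))
```

A percent error close to zero indicates an accurate result; large errors point to systematic problems worth raising in the improvements section.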

6. Possible improvements 

Identify any errors or risks found in the experiment and provide a method to improve it.

If there are none, then suggest new ways to improve the experimental design, and/or minimise error and risks.


Possible improvements could be made by including control experiments. For example, testing whether the iodine solution turns dark purple when added to water or methylated spirits. This would help to ensure that the purple colour observed in the experiments is due to the presence of starch in the leaves rather than impurities.

9. Conclusion

State whether the aim was achieved, and if your hypothesis was supported.

The aim of the investigation was achieved, and it was found that light is required for photosynthesis to occur. This was evidenced by the presence of starch in leaves that had been exposed to light, and the absence of starch in leaves that had been unexposed. These results support the proposed hypothesis.

Written by Matrix Science Team


© Matrix Education and www.matrix.edu.au, 2023. Unauthorised use and/or duplication of this material without express and written permission from this site’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Matrix Education and www.matrix.edu.au with appropriate and specific direction to the original content.



Scientific Hypothesis Examples

  • Ph.D., Biomedical Sciences, University of Tennessee at Knoxville
  • B.A., Physics and Mathematics, Hastings College

A hypothesis is an educated guess about what you think will happen in a scientific experiment, based on your observations. Before conducting the experiment, you propose a hypothesis so that you can determine if your prediction is supported.

There are several ways you can state a hypothesis, but the best hypotheses are ones you can test and easily refute. Why would you want to disprove or discard your own hypothesis? Well, it is the easiest way to demonstrate that two factors are related. Here are some good scientific hypothesis examples:

  • Hypothesis: All forks have three tines. This would be disproven if you find any fork with a different number of tines.
  • Hypothesis: There is no relationship between smoking and lung cancer. While it is difficult to establish cause and effect in health issues, you can apply statistics to data to discredit or support this hypothesis.
  • Hypothesis: Plants require liquid water to survive. This would be disproven if you find a plant that doesn't need it.
  • Hypothesis: Cats do not show a paw preference (equivalent to being right- or left-handed). You could gather data around the number of times cats bat at a toy with either paw and analyze the data to determine whether cats, on the whole, favor one paw over the other. Be careful here, because individual cats, like people, might (or might not) express a preference. A large sample size would be helpful.
  • Hypothesis: If plants are watered with a 10% detergent solution, their growth will be negatively affected. Some people prefer to state a hypothesis in an "If, then" format. An alternate hypothesis might be: Plant growth will be unaffected by water with a 10% detergent solution.


How To Write A Lab Report | Step-by-Step Guide & Examples

Published on May 20, 2021 by Pritha Bhandari . Revised on July 23, 2023.

A lab report conveys the aim, methods, results, and conclusions of a scientific experiment. The main purpose of a lab report is to demonstrate your understanding of the scientific method by performing and evaluating a hands-on lab experiment. This type of assignment is usually shorter than a research paper .

Lab reports are commonly used in science, technology, engineering, and mathematics (STEM) fields. This article focuses on how to structure and write a lab report.



The sections of a lab report can vary between scientific fields and course requirements, but they usually contain the purpose, methods, and findings of a lab experiment .

Each section of a lab report has its own purpose.

  • Title: expresses the topic of your study
  • Abstract : summarizes your research aims, methods, results, and conclusions
  • Introduction: establishes the context needed to understand the topic
  • Method: describes the materials and procedures used in the experiment
  • Results: reports all descriptive and inferential statistical analyses
  • Discussion: interprets and evaluates results and identifies limitations
  • Conclusion: sums up the main findings of your experiment
  • References: list of all sources cited using a specific style (e.g. APA )
  • Appendices : contains lengthy materials, procedures, tables or figures

Although most lab reports contain these sections, some sections can be omitted or combined with others. For example, some lab reports contain a brief section on research aims instead of an introduction, and a separate conclusion is not always required.

If you’re not sure, it’s best to check your lab report requirements with your instructor.


Your title provides the first impression of your lab report – effective titles communicate the topic and/or the findings of your study in specific terms.

Create a title that directly conveys the main focus or purpose of your study. It doesn’t need to be creative or thought-provoking, but it should be informative.

  • The effects of varying nitrogen levels on tomato plant height.
  • Testing the universality of the McGurk effect.
  • Comparing the viscosity of common liquids found in kitchens.

An abstract condenses a lab report into a brief overview of about 150–300 words. It should provide readers with a compact version of the research aims, the methods and materials used, the main results, and the final conclusion.

Think of it as a way of giving readers a preview of your full lab report. Write the abstract last, in the past tense, after you’ve drafted all the other sections of your report, so you’ll be able to succinctly summarize each section.

To write a lab report abstract, use these guiding questions:

  • What is the wider context of your study?
  • What research question were you trying to answer?
  • How did you perform the experiment?
  • What did your results show?
  • How did you interpret your results?
  • What is the importance of your findings?

Nitrogen is a necessary nutrient for high quality plants. Tomatoes, one of the most consumed fruits worldwide, rely on nitrogen for healthy leaves and stems to grow fruit. This experiment tested whether nitrogen levels affected tomato plant height in a controlled setting. It was expected that higher levels of nitrogen fertilizer would yield taller tomato plants.

Levels of nitrogen fertilizer were varied between three groups of tomato plants. The control group did not receive any nitrogen fertilizer, while one experimental group received low levels of nitrogen fertilizer, and a second experimental group received high levels of nitrogen fertilizer. All plants were grown from seeds, and heights were measured 50 days into the experiment.

The effects of nitrogen levels on plant height were tested between groups using an ANOVA. The plants with the highest level of nitrogen fertilizer were the tallest, while the plants with low levels of nitrogen exceeded the control group plants in height. In line with expectations and previous findings, the effects of nitrogen levels on plant height were statistically significant. This study strengthens the importance of nitrogen for tomato plants.

Your lab report introduction should set the scene for your experiment. One way to write your introduction is with a funnel (an inverted triangle) structure:

  • Start with the broad, general research topic
  • Narrow your topic down to your specific study focus
  • End with a clear research question

Begin by providing background information on your research topic and explaining why it’s important in a broad real-world or theoretical context. Describe relevant previous research on your topic and note how your study may confirm it or expand it, or fill a gap in the research field.

This lab experiment builds on previous research from Haque, Paul, and Sarker (2011), who demonstrated that tomato plant yield increased at higher levels of nitrogen. However, the present research focuses on plant height as a growth indicator and uses a lab-controlled setting instead.

Next, go into detail on the theoretical basis for your study and describe any directly relevant laws or equations that you’ll be using. State your main research aims and expectations by outlining your hypotheses .

Based on the importance of nitrogen for tomato plants, the primary hypothesis was that the plants with the high levels of nitrogen would grow the tallest. The secondary hypothesis was that plants with low levels of nitrogen would grow taller than plants with no nitrogen.

Your introduction doesn’t need to be long, but you may need to organize it into a few paragraphs or with subheadings such as “Research Context” or “Research Aims.”

A lab report Method section details the steps you took to gather and analyze data. Give enough detail so that others can follow or evaluate your procedures. Write this section in the past tense. If you need to include any long lists of procedural steps or materials, place them in the Appendices section but refer to them in the text here.

You should describe your experimental design, your subjects, materials, and specific procedures used for data collection and analysis.

Experimental design

Briefly note whether your experiment is a within-subjects  or between-subjects design, and describe how your sample units were assigned to conditions if relevant.

A between-subjects design with three groups of tomato plants was used. The control group did not receive any nitrogen fertilizer. The first experimental group received a low level of nitrogen fertilizer, while the second experimental group received a high level of nitrogen fertilizer.

Describe human subjects in terms of demographic characteristics, and animal or plant subjects in terms of genetic background. Note the total number of subjects as well as the number of subjects per condition or per group. You should also state how you recruited subjects for your study.

List the equipment or materials you used to gather data and state the model names for any specialized equipment.

List of materials

35 Tomato seeds

15 plant pots (15 cm tall)

Light lamps (50,000 lux)

Nitrogen fertilizer

Measuring tape

Describe your experimental settings and conditions in detail. You can provide labelled diagrams or images of the exact set-up necessary for experimental equipment. State how extraneous variables were controlled through restriction or by fixing them at a certain level (e.g., keeping the lab at room temperature).

Light levels were fixed throughout the experiment, and the plants were exposed to 12 hours of light a day. Temperature was restricted to between 23 and 25℃. The pH and carbon levels of the soil were also held constant throughout the experiment as these variables could influence plant height. The plants were grown in rooms free of insects or other pests, and they were spaced out adequately.

Your experimental procedure should describe the exact steps you took to gather data in chronological order. You’ll need to provide enough information so that someone else can replicate your procedure, but you should also be concise. Place detailed information in the appendices where appropriate.

In a lab experiment, you’ll often closely follow a lab manual to gather data. Some instructors will allow you to simply reference the manual and state whether you changed any steps based on practical considerations. Other instructors may want you to rewrite the lab manual procedures as complete sentences in coherent paragraphs, while noting any changes to the steps that you applied in practice.

If you’re performing extensive data analysis, be sure to state your planned analysis methods as well. This includes the types of tests you’ll perform and any programs or software you’ll use for calculations (if relevant).

First, tomato seeds were sown in wooden flats containing soil about 2 cm below the surface. Each seed was kept 3-5 cm apart. The flats were covered to keep the soil moist until germination. The seedlings were removed and transplanted to pots 8 days later, with a maximum of 2 plants to a pot. Each pot was watered once a day to keep the soil moist.

The nitrogen fertilizer treatment was applied to the plant pots 12 days after transplantation. The control group received no treatment, while the first experimental group received a low concentration, and the second experimental group received a high concentration. There were 5 pots in each group, and each plant pot was labelled to indicate the group the plants belonged to.

50 days after the start of the experiment, plant height was measured for all plants. A measuring tape was used to record the length of the plant from ground level to the top of the tallest leaf.

In your results section, you should report the results of any statistical analysis procedures that you undertook. You should clearly state how the results of statistical tests support or refute your initial hypotheses.

The main results to report include:

  • any descriptive statistics
  • statistical test results
  • the significance of the test results
  • estimates of standard error or confidence intervals

The mean heights of the plants in the control group, low nitrogen group, and high nitrogen group were 20.3, 25.1, and 29.6 cm respectively. A one-way ANOVA was applied to calculate the effect of nitrogen fertilizer level on plant height. The results demonstrated statistically significant (p = .03) height differences between groups.
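A one-way ANOVA F statistic like the one reported above can be computed by hand. This sketch uses invented plant heights (chosen so the group means match the reported 20.3, 25.1, and 29.6 cm) and computes only the F statistic, not the p-value:

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across several groups of measurements."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variability (df = number of groups - 1)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variability (df = total observations - number of groups)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up heights (cm); the study's raw data is not given in the article
control = [19.1, 20.5, 21.3]
low_n = [24.2, 25.0, 26.1]
high_n = [28.7, 29.6, 30.5]
print(round(one_way_anova_f(control, low_n, high_n), 1))
```

A large F relative to the critical value of the F-distribution for (2, 6) degrees of freedom corresponds to a small p-value, as reported above.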

Next, post-hoc tests were performed to assess the primary and secondary hypotheses. In support of the primary hypothesis, the high nitrogen group plants were significantly taller than the low nitrogen group and the control group plants. Similarly, the results supported the secondary hypothesis: the low nitrogen plants were taller than the control group plants.

These results can be reported in the text or in tables and figures. Use text for highlighting a few key results, but present large sets of numbers in tables, or show relationships between variables with graphs.

You should also include sample calculations in the Results section for complex experiments. For each sample calculation, provide a brief description of what it does and use clear symbols. Present your raw data in the Appendices section and refer to it to highlight any outliers or trends.

The Discussion section will help demonstrate your understanding of the experimental process and your critical thinking skills.

In this section, you can:

  • Interpret your results
  • Compare your findings with your expectations
  • Identify any sources of experimental error
  • Explain any unexpected results
  • Suggest possible improvements for further studies

Interpreting your results involves clarifying how your results help you answer your main research question. Report whether your results support your hypotheses.

  • Did you measure what you set out to measure?
  • Were your analysis procedures appropriate for this type of data?

Compare your findings with other research and explain any key differences in findings.

  • Are your results in line with those from previous studies or your classmates’ results? Why or why not?

An effective Discussion section will also highlight the strengths and limitations of a study.

  • Did you have high internal validity or reliability?
  • How did you establish these aspects of your study?

When describing limitations, use specific examples. For example, if random error contributed substantially to the measurements in your study, state the particular sources of error (e.g., imprecise apparatus) and explain ways to improve them.

The results support the hypothesis that nitrogen levels affect plant height, with increasing levels producing taller plants. These statistically significant results are taken together with previous research to support the importance of nitrogen as a nutrient for tomato plant growth.

However, unlike previous studies, the present experiment focused on plant height as an indicator of plant growth. Importantly, plant height may not always reflect plant health or fruit yield, so measuring other indicators would have strengthened the study findings.

Another limitation of the study is the plant height measurement technique, as the measuring tape was not suitable for plants with extreme curvature. Future studies may focus on measuring plant height in different ways.

The main strengths of this study were the controls for extraneous variables, such as pH and carbon levels of the soil. All other factors that could affect plant height were tightly controlled to isolate the effects of nitrogen levels, resulting in high internal validity for this study.

Your conclusion should be the final section of your lab report. Here, you’ll summarize the findings of your experiment, with a brief overview of the strengths and limitations, and implications of your study for further research.

Some lab reports may omit a Conclusion section because it overlaps with the Discussion section, but you should check with your instructor before doing so.


A lab report conveys the aim, methods, results, and conclusions of a scientific experiment . Lab reports are commonly assigned in science, technology, engineering, and mathematics (STEM) fields.

The purpose of a lab report is to demonstrate your understanding of the scientific method with a hands-on lab experiment. Course instructors will often provide you with an experimental design and procedure. Your task is to write up how you actually performed the experiment and evaluate the outcome.

In contrast, a research paper requires you to independently develop an original argument. It involves more in-depth research and interpretation of sources and data.

A lab report is usually shorter than a research paper.

The sections of a lab report can vary between scientific fields and course requirements, but it usually contains the following:

  • Title: expresses the topic of your study
  • Abstract: summarizes your research aims, methods, results, and conclusions
  • Introduction: establishes the context needed to understand the topic
  • Method: describes the materials and procedures used in the experiment
  • Results: reports all descriptive and inferential statistical analyses
  • Discussion: interprets and evaluates results and identifies limitations
  • Conclusion: sums up the main findings of your experiment
  • References: list of all sources cited using a specific style (e.g. APA)
  • Appendices: contains lengthy materials, procedures, tables or figures

The results chapter or section simply and objectively reports what you found, without speculating on why you found these results. The discussion interprets the meaning of the results, puts them in context, and explains why they matter.

In qualitative research , results and discussion are sometimes combined. But in quantitative research , it’s considered important to separate the objective results from your interpretation of them.

Cite this Scribbr article


Bhandari, P. (2023, July 23). How To Write A Lab Report | Step-by-Step Guide & Examples. Scribbr. Retrieved April 15, 2024, from https://www.scribbr.com/academic-writing/lab-report/


  • G22 - Insurance; Insurance Companies; Actuarial Studies
  • G23 - Non-bank Financial Institutions; Financial Instruments; Institutional Investors
  • G24 - Investment Banking; Venture Capital; Brokerage; Ratings and Ratings Agencies
  • G28 - Government Policy and Regulation
  • Browse content in G3 - Corporate Finance and Governance
  • G30 - General
  • G31 - Capital Budgeting; Fixed Investment and Inventory Studies; Capacity
  • G32 - Financing Policy; Financial Risk and Risk Management; Capital and Ownership Structure; Value of Firms; Goodwill
  • G33 - Bankruptcy; Liquidation
  • G34 - Mergers; Acquisitions; Restructuring; Corporate Governance
  • G38 - Government Policy and Regulation
  • Browse content in G4 - Behavioral Finance
  • G40 - General
  • G41 - Role and Effects of Psychological, Emotional, Social, and Cognitive Factors on Decision Making in Financial Markets
  • Browse content in G5 - Household Finance
  • G50 - General
  • G51 - Household Saving, Borrowing, Debt, and Wealth
  • Browse content in H - Public Economics
  • Browse content in H0 - General
  • H00 - General
  • Browse content in H1 - Structure and Scope of Government
  • H10 - General
  • H11 - Structure, Scope, and Performance of Government
  • Browse content in H2 - Taxation, Subsidies, and Revenue
  • H20 - General
  • H21 - Efficiency; Optimal Taxation
  • H22 - Incidence
  • H23 - Externalities; Redistributive Effects; Environmental Taxes and Subsidies
  • H24 - Personal Income and Other Nonbusiness Taxes and Subsidies; includes inheritance and gift taxes
  • H25 - Business Taxes and Subsidies
  • H26 - Tax Evasion and Avoidance
  • Browse content in H3 - Fiscal Policies and Behavior of Economic Agents
  • H31 - Household
  • Browse content in H4 - Publicly Provided Goods
  • H40 - General
  • H41 - Public Goods
  • H42 - Publicly Provided Private Goods
  • H44 - Publicly Provided Goods: Mixed Markets
  • Browse content in H5 - National Government Expenditures and Related Policies
  • H50 - General
  • H51 - Government Expenditures and Health
  • H52 - Government Expenditures and Education
  • H53 - Government Expenditures and Welfare Programs
  • H54 - Infrastructures; Other Public Investment and Capital Stock
  • H55 - Social Security and Public Pensions
  • H56 - National Security and War
  • H57 - Procurement
  • Browse content in H6 - National Budget, Deficit, and Debt
  • H63 - Debt; Debt Management; Sovereign Debt
  • Browse content in H7 - State and Local Government; Intergovernmental Relations
  • H70 - General
  • H71 - State and Local Taxation, Subsidies, and Revenue
  • H73 - Interjurisdictional Differentials and Their Effects
  • H75 - State and Local Government: Health; Education; Welfare; Public Pensions
  • H76 - State and Local Government: Other Expenditure Categories
  • H77 - Intergovernmental Relations; Federalism; Secession
  • Browse content in H8 - Miscellaneous Issues
  • H81 - Governmental Loans; Loan Guarantees; Credits; Grants; Bailouts
  • H83 - Public Administration; Public Sector Accounting and Audits
  • H87 - International Fiscal Issues; International Public Goods
  • Browse content in I - Health, Education, and Welfare
  • Browse content in I0 - General
  • I00 - General
  • Browse content in I1 - Health
  • I10 - General
  • I11 - Analysis of Health Care Markets
  • I12 - Health Behavior
  • I13 - Health Insurance, Public and Private
  • I14 - Health and Inequality
  • I15 - Health and Economic Development
  • I18 - Government Policy; Regulation; Public Health
  • Browse content in I2 - Education and Research Institutions
  • I20 - General
  • I21 - Analysis of Education
  • I22 - Educational Finance; Financial Aid
  • I23 - Higher Education; Research Institutions
  • I24 - Education and Inequality
  • I25 - Education and Economic Development
  • I26 - Returns to Education
  • I28 - Government Policy
  • Browse content in I3 - Welfare, Well-Being, and Poverty
  • I30 - General
  • I31 - General Welfare
  • I32 - Measurement and Analysis of Poverty
  • I38 - Government Policy; Provision and Effects of Welfare Programs
  • Browse content in J - Labor and Demographic Economics
  • Browse content in J0 - General
  • J00 - General
  • J01 - Labor Economics: General
  • J08 - Labor Economics Policies
  • Browse content in J1 - Demographic Economics
  • J10 - General
  • J12 - Marriage; Marital Dissolution; Family Structure; Domestic Abuse
  • J13 - Fertility; Family Planning; Child Care; Children; Youth
  • J14 - Economics of the Elderly; Economics of the Handicapped; Non-Labor Market Discrimination
  • J15 - Economics of Minorities, Races, Indigenous Peoples, and Immigrants; Non-labor Discrimination
  • J16 - Economics of Gender; Non-labor Discrimination
  • J18 - Public Policy
  • Browse content in J2 - Demand and Supply of Labor
  • J20 - General
  • J21 - Labor Force and Employment, Size, and Structure
  • J22 - Time Allocation and Labor Supply
  • J23 - Labor Demand
  • J24 - Human Capital; Skills; Occupational Choice; Labor Productivity
  • Browse content in J3 - Wages, Compensation, and Labor Costs
  • J30 - General
  • J31 - Wage Level and Structure; Wage Differentials
  • J33 - Compensation Packages; Payment Methods
  • J38 - Public Policy
  • Browse content in J4 - Particular Labor Markets
  • J40 - General
  • J42 - Monopsony; Segmented Labor Markets
  • J44 - Professional Labor Markets; Occupational Licensing
  • J45 - Public Sector Labor Markets
  • J48 - Public Policy
  • J49 - Other
  • Browse content in J5 - Labor-Management Relations, Trade Unions, and Collective Bargaining
  • J50 - General
  • J51 - Trade Unions: Objectives, Structure, and Effects
  • J53 - Labor-Management Relations; Industrial Jurisprudence
  • Browse content in J6 - Mobility, Unemployment, Vacancies, and Immigrant Workers
  • J60 - General
  • J61 - Geographic Labor Mobility; Immigrant Workers
  • J62 - Job, Occupational, and Intergenerational Mobility
  • J63 - Turnover; Vacancies; Layoffs
  • J64 - Unemployment: Models, Duration, Incidence, and Job Search
  • J65 - Unemployment Insurance; Severance Pay; Plant Closings
  • J68 - Public Policy
  • Browse content in J7 - Labor Discrimination
  • J71 - Discrimination
  • J78 - Public Policy
  • Browse content in J8 - Labor Standards: National and International
  • J81 - Working Conditions
  • J88 - Public Policy
  • Browse content in K - Law and Economics
  • Browse content in K0 - General
  • K00 - General
  • Browse content in K1 - Basic Areas of Law
  • K14 - Criminal Law
  • K2 - Regulation and Business Law
  • Browse content in K3 - Other Substantive Areas of Law
  • K31 - Labor Law
  • Browse content in K4 - Legal Procedure, the Legal System, and Illegal Behavior
  • K40 - General
  • K41 - Litigation Process
  • K42 - Illegal Behavior and the Enforcement of Law
  • Browse content in L - Industrial Organization
  • Browse content in L0 - General
  • L00 - General
  • Browse content in L1 - Market Structure, Firm Strategy, and Market Performance
  • L10 - General
  • L11 - Production, Pricing, and Market Structure; Size Distribution of Firms
  • L13 - Oligopoly and Other Imperfect Markets
  • L14 - Transactional Relationships; Contracts and Reputation; Networks
  • L15 - Information and Product Quality; Standardization and Compatibility
  • L16 - Industrial Organization and Macroeconomics: Industrial Structure and Structural Change; Industrial Price Indices
  • L19 - Other
  • Browse content in L2 - Firm Objectives, Organization, and Behavior
  • L21 - Business Objectives of the Firm
  • L22 - Firm Organization and Market Structure
  • L23 - Organization of Production
  • L24 - Contracting Out; Joint Ventures; Technology Licensing
  • L25 - Firm Performance: Size, Diversification, and Scope
  • L26 - Entrepreneurship
  • Browse content in L3 - Nonprofit Organizations and Public Enterprise
  • L33 - Comparison of Public and Private Enterprises and Nonprofit Institutions; Privatization; Contracting Out
  • Browse content in L4 - Antitrust Issues and Policies
  • L40 - General
  • L41 - Monopolization; Horizontal Anticompetitive Practices
  • L42 - Vertical Restraints; Resale Price Maintenance; Quantity Discounts
  • Browse content in L5 - Regulation and Industrial Policy
  • L50 - General
  • L51 - Economics of Regulation
  • Browse content in L6 - Industry Studies: Manufacturing
  • L60 - General
  • L62 - Automobiles; Other Transportation Equipment; Related Parts and Equipment
  • L63 - Microelectronics; Computers; Communications Equipment
  • L66 - Food; Beverages; Cosmetics; Tobacco; Wine and Spirits
  • Browse content in L7 - Industry Studies: Primary Products and Construction
  • L71 - Mining, Extraction, and Refining: Hydrocarbon Fuels
  • L73 - Forest Products
  • Browse content in L8 - Industry Studies: Services
  • L81 - Retail and Wholesale Trade; e-Commerce
  • L83 - Sports; Gambling; Recreation; Tourism
  • L84 - Personal, Professional, and Business Services
  • L86 - Information and Internet Services; Computer Software
  • Browse content in L9 - Industry Studies: Transportation and Utilities
  • L91 - Transportation: General
  • L93 - Air Transportation
  • L94 - Electric Utilities
  • Browse content in M - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics
  • Browse content in M1 - Business Administration
  • M11 - Production Management
  • M12 - Personnel Management; Executives; Executive Compensation
  • M14 - Corporate Culture; Social Responsibility
  • Browse content in M2 - Business Economics
  • M21 - Business Economics
  • Browse content in M3 - Marketing and Advertising
  • M31 - Marketing
  • M37 - Advertising
  • Browse content in M4 - Accounting and Auditing
  • M42 - Auditing
  • M48 - Government Policy and Regulation
  • Browse content in M5 - Personnel Economics
  • M50 - General
  • M51 - Firm Employment Decisions; Promotions
  • M52 - Compensation and Compensation Methods and Their Effects
  • M53 - Training
  • M54 - Labor Management
  • Browse content in N - Economic History
  • Browse content in N0 - General
  • N00 - General
  • N01 - Development of the Discipline: Historiographical; Sources and Methods
  • Browse content in N1 - Macroeconomics and Monetary Economics; Industrial Structure; Growth; Fluctuations
  • N10 - General, International, or Comparative
  • N11 - U.S.; Canada: Pre-1913
  • N12 - U.S.; Canada: 1913-
  • N13 - Europe: Pre-1913
  • N17 - Africa; Oceania
  • Browse content in N2 - Financial Markets and Institutions
  • N20 - General, International, or Comparative
  • N22 - U.S.; Canada: 1913-
  • N23 - Europe: Pre-1913
  • Browse content in N3 - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy
  • N30 - General, International, or Comparative
  • N31 - U.S.; Canada: Pre-1913
  • N32 - U.S.; Canada: 1913-
  • N33 - Europe: Pre-1913
  • N34 - Europe: 1913-
  • N36 - Latin America; Caribbean
  • N37 - Africa; Oceania
  • Browse content in N4 - Government, War, Law, International Relations, and Regulation
  • N40 - General, International, or Comparative
  • N41 - U.S.; Canada: Pre-1913
  • N42 - U.S.; Canada: 1913-
  • N43 - Europe: Pre-1913
  • N44 - Europe: 1913-
  • N45 - Asia including Middle East
  • N47 - Africa; Oceania
  • Browse content in N5 - Agriculture, Natural Resources, Environment, and Extractive Industries
  • N50 - General, International, or Comparative
  • N51 - U.S.; Canada: Pre-1913
  • Browse content in N6 - Manufacturing and Construction
  • N63 - Europe: Pre-1913
  • Browse content in N7 - Transport, Trade, Energy, Technology, and Other Services
  • N71 - U.S.; Canada: Pre-1913
  • Browse content in N8 - Micro-Business History
  • N82 - U.S.; Canada: 1913-
  • Browse content in N9 - Regional and Urban History
  • N91 - U.S.; Canada: Pre-1913
  • N92 - U.S.; Canada: 1913-
  • N93 - Europe: Pre-1913
  • N94 - Europe: 1913-
  • Browse content in O - Economic Development, Innovation, Technological Change, and Growth
  • Browse content in O1 - Economic Development
  • O10 - General
  • O11 - Macroeconomic Analyses of Economic Development
  • O12 - Microeconomic Analyses of Economic Development
  • O13 - Agriculture; Natural Resources; Energy; Environment; Other Primary Products
  • O14 - Industrialization; Manufacturing and Service Industries; Choice of Technology
  • O15 - Human Resources; Human Development; Income Distribution; Migration
  • O16 - Financial Markets; Saving and Capital Investment; Corporate Finance and Governance
  • O17 - Formal and Informal Sectors; Shadow Economy; Institutional Arrangements
  • O18 - Urban, Rural, Regional, and Transportation Analysis; Housing; Infrastructure
  • O19 - International Linkages to Development; Role of International Organizations
  • Browse content in O2 - Development Planning and Policy
  • O23 - Fiscal and Monetary Policy in Development
  • O25 - Industrial Policy
  • Browse content in O3 - Innovation; Research and Development; Technological Change; Intellectual Property Rights
  • O30 - General
  • O31 - Innovation and Invention: Processes and Incentives
  • O32 - Management of Technological Innovation and R&D
  • O33 - Technological Change: Choices and Consequences; Diffusion Processes
  • O34 - Intellectual Property and Intellectual Capital
  • O38 - Government Policy
  • Browse content in O4 - Economic Growth and Aggregate Productivity
  • O40 - General
  • O41 - One, Two, and Multisector Growth Models
  • O43 - Institutions and Growth
  • O44 - Environment and Growth
  • O47 - Empirical Studies of Economic Growth; Aggregate Productivity; Cross-Country Output Convergence
  • Browse content in O5 - Economywide Country Studies
  • O52 - Europe
  • O53 - Asia including Middle East
  • O55 - Africa
  • Browse content in P - Economic Systems
  • Browse content in P0 - General
  • P00 - General
  • Browse content in P1 - Capitalist Systems
  • P10 - General
  • P16 - Political Economy
  • P17 - Performance and Prospects
  • P18 - Energy: Environment
  • Browse content in P2 - Socialist Systems and Transitional Economies
  • P26 - Political Economy; Property Rights
  • Browse content in P3 - Socialist Institutions and Their Transitions
  • P37 - Legal Institutions; Illegal Behavior
  • Browse content in P4 - Other Economic Systems
  • P48 - Political Economy; Legal Institutions; Property Rights; Natural Resources; Energy; Environment; Regional Studies
  • Browse content in P5 - Comparative Economic Systems
  • P51 - Comparative Analysis of Economic Systems
  • Browse content in Q - Agricultural and Natural Resource Economics; Environmental and Ecological Economics
  • Browse content in Q1 - Agriculture
  • Q10 - General
  • Q12 - Micro Analysis of Farm Firms, Farm Households, and Farm Input Markets
  • Q13 - Agricultural Markets and Marketing; Cooperatives; Agribusiness
  • Q14 - Agricultural Finance
  • Q15 - Land Ownership and Tenure; Land Reform; Land Use; Irrigation; Agriculture and Environment
  • Q16 - R&D; Agricultural Technology; Biofuels; Agricultural Extension Services
  • Browse content in Q2 - Renewable Resources and Conservation
  • Q25 - Water
  • Browse content in Q3 - Nonrenewable Resources and Conservation
  • Q32 - Exhaustible Resources and Economic Development
  • Q34 - Natural Resources and Domestic and International Conflicts
  • Browse content in Q4 - Energy
  • Q41 - Demand and Supply; Prices
  • Q48 - Government Policy
  • Browse content in Q5 - Environmental Economics
  • Q50 - General
  • Q51 - Valuation of Environmental Effects
  • Q53 - Air Pollution; Water Pollution; Noise; Hazardous Waste; Solid Waste; Recycling
  • Q54 - Climate; Natural Disasters; Global Warming
  • Q56 - Environment and Development; Environment and Trade; Sustainability; Environmental Accounts and Accounting; Environmental Equity; Population Growth
  • Q58 - Government Policy
  • Browse content in R - Urban, Rural, Regional, Real Estate, and Transportation Economics
  • Browse content in R0 - General
  • R00 - General
  • Browse content in R1 - General Regional Economics
  • R11 - Regional Economic Activity: Growth, Development, Environmental Issues, and Changes
  • R12 - Size and Spatial Distributions of Regional Economic Activity
  • R13 - General Equilibrium and Welfare Economic Analysis of Regional Economies
  • Browse content in R2 - Household Analysis
  • R20 - General
  • R23 - Regional Migration; Regional Labor Markets; Population; Neighborhood Characteristics
  • R28 - Government Policy
  • Browse content in R3 - Real Estate Markets, Spatial Production Analysis, and Firm Location
  • R30 - General
  • R31 - Housing Supply and Markets
  • R38 - Government Policy
  • Browse content in R4 - Transportation Economics
  • R40 - General
  • R41 - Transportation: Demand, Supply, and Congestion; Travel Time; Safety and Accidents; Transportation Noise
  • R48 - Government Pricing and Policy
  • Browse content in Z - Other Special Topics
  • Browse content in Z1 - Cultural Economics; Economic Sociology; Economic Anthropology
  • Z10 - General
  • Z12 - Religion
  • Z13 - Economic Sociology; Economic Anthropology; Social and Economic Stratification
  • Advance Articles
  • Editor's Choice
  • Author Guidelines
  • Submission Site
  • Open Access Options
  • Self-Archiving Policy
  • Why Submit?
  • About The Quarterly Journal of Economics
  • Editorial Board
  • Advertising and Corporate Services
  • Journals Career Network
  • Dispatch Dates
  • Journals on Oxford Academic
  • Books on Oxford Academic

Issue Cover

Article Contents

  • I. Introduction
  • II. A Simple Framework for Discovery
  • III. Application and Data
  • IV. The Surprising Importance of the Face
  • V. Algorithm-Human Communication
  • VI. Evaluating These New Hypotheses
  • VII. Conclusion
  • Data Availability

Machine Learning as a Tool for Hypothesis Generation *

Jens Ludwig, Sendhil Mullainathan, Machine Learning as a Tool for Hypothesis Generation, The Quarterly Journal of Economics, Volume 139, Issue 2, May 2024, Pages 751–827, https://doi.org/10.1093/qje/qjad055

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about whom to jail. We begin with a striking fact: the defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mug shot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: they are not explained by demographics (e.g., race) or existing psychology research, nor are they already known (even if tacitly) to people or experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional data set (e.g., cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our article is that hypothesis generation is a valuable activity, and we hope this encourages future work in this largely “prescientific” stage of science.

Science is curiously asymmetric. New ideas are meticulously tested using data, statistics, and formal models. Yet those ideas originate in a notably less meticulous process involving intuition, inspiration, and creativity. The asymmetry between how ideas are generated versus tested is noteworthy because idea generation is also, at its core, an empirical activity. Creativity begins with “data” (albeit data stored in the mind), which are then “analyzed” (through a purely psychological process of pattern recognition). What feels like inspiration is actually the output of a data analysis run by the human brain. Despite this, idea generation largely happens off stage, something that typically happens before “actual science” begins. 1 Things are likely this way because there is no obvious alternative. The creative process is so human and idiosyncratic that it would seem to resist formalism.

That may be about to change because of two developments. First, human cognition is no longer the only way to notice patterns in the world. Machine learning algorithms can also find patterns, including patterns people might not notice themselves. These algorithms can work not just with structured, tabular data but also with the kinds of inputs that traditionally could only be processed by the mind, like images or text. Second, data on human behavior is exploding: second-by-second price and volume data in asset markets, high-frequency cellphone data on location and usage, CCTV camera and police bodycam footage, news stories, children’s books, the entire text of corporate filings, and so on. The kind of information researchers once relied on for inspiration is now machine readable: what was once solely mental data is increasingly becoming actual data. 2

We suggest that these changes can be leveraged to expand how hypotheses are generated. Currently, researchers do of course look at data to generate hypotheses, as in exploratory data analysis, but this depends on the idiosyncratic creativity of investigators who must decide what statistics to calculate. In contrast, we suggest capitalizing on the capacity of machine learning algorithms to automatically detect patterns, especially ones people might never have considered. A key challenge is that we require hypotheses that are interpretable to people. One important goal of science is to generalize knowledge to new contexts. Predictive patterns in a single data set alone are rarely useful; they become insightful when they can be generalized. Currently, that generalization is done by people, and people can only generalize things they understand. The predictors produced by machine learning algorithms are, however, notoriously opaque—hard-to-decipher “black boxes.” We propose a procedure that integrates these algorithms into a pipeline that results in human-interpretable hypotheses that are both novel and testable.

While our procedure is broadly applicable, we illustrate it in a concrete application: judicial decision making. Specifically, we study pretrial decisions about which defendants are jailed versus set free awaiting trial, a decision that by law is supposed to hinge on a prediction of the defendant’s risk (Dobbie and Yang 2021). 3 This is also a substantively interesting application in its own right because of the high stakes involved and mounting evidence that judges make these decisions less than perfectly (Kleinberg et al. 2018; Rambachan et al. 2021; Angelova, Dobbie, and Yang 2023).

We begin with a striking fact. When we build a deep learning model of the judge—one that predicts whether the judge will detain a given defendant—a single factor emerges as having large explanatory power: the defendant’s face. A predictor that uses only the pixels in the defendant’s mug shot explains from one-quarter to nearly one-half of the predictable variation in detention. 4 Defendants whose mug shots fall in the bottom quartile of predicted detention are 20.4 percentage points more likely to be jailed than those in the top quartile. By comparison, the difference in detention rates between those arrested for violent versus nonviolent crimes is 4.8 percentage points. Notice what this finding is and is not. We are not claiming the mug shot predicts defendant behavior; that would be the long-discredited field of phrenology (Schlag 1997). We instead claim the mug shot predicts judge behavior: how the defendant looks correlates strongly with whether the judge chooses to jail them. 5
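The pixel-only predictor can be caricatured in a few lines. The sketch below is not the paper’s model: it swaps the deep network for a plain logistic regression, the "mug shots" are synthetic random vectors, and the dimensions and pseudo-R² measure are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: n "mug shots" of d pixels each, and a binary
# judge decision driven by a hidden linear pixel pattern.
n, d = 500, 64
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d) / np.sqrt(d)
p = 1 / (1 + np.exp(-(X @ true_w)))
y = rng.binomial(1, p)

# Logistic regression fit by gradient descent: a drastically simplified
# stand-in for a deep network trained on raw pixels.
w = np.zeros(d)
for _ in range(2000):
    pred = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (pred - y)) / n

pred = 1 / (1 + np.exp(-(X @ w)))
# Share of variation in decisions captured by the pixel-only model
# (a pseudo-R²: 1 minus residual variance over outcome variance).
r2 = 1 - np.mean((y - pred) ** 2) / np.var(y)
print(round(r2, 2))
```

The point of the sketch is only the shape of the exercise: image in, predicted judge decision out, and a variance-explained summary of how much the pixels alone capture.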

Has the algorithm found something new in the pixels of the mug shot or simply rediscovered something long known or intuitively understood? After all, psychologists have been studying people’s reactions to faces for at least 100 years (Todorov et al. 2015; Todorov and Oh 2021), while economists have shown that judges are influenced by factors (like race) that can be seen from someone’s face (Arnold, Dobbie, and Yang 2018; Arnold, Dobbie, and Hull 2020). When we control for age, gender, race, skin color, and even the facial features suggested by previous psychology research (dominance, trustworthiness, attractiveness, and competence), none of these factors (individually or jointly) meaningfully diminishes the algorithm’s predictive power (see Figure I, Panel A). It is perhaps worth noting that the algorithm on its own does rediscover some of the signal from these features: in fact, collectively these known features explain 22.3% of the variation in predicted detention (see Figure I, Panel B). The key point is that the algorithm has discovered a great deal more as well.
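The logic of this controls exercise—checking whether known features absorb the algorithm’s predictive power—can be sketched with ordinary least squares on synthetic data. Every variable and coefficient below is an illustrative assumption, not a quantity from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic stand-in: known features (think demographics) and an
# algorithmic prediction that also carries signal the controls lack.
known = rng.normal(size=(n, 3))
extra = rng.normal(size=n)                    # signal only the algorithm sees
algo_pred = known @ [0.3, 0.2, 0.1] + extra   # rediscovers some known signal
decision = algo_pred + rng.normal(scale=0.5, size=n)

def r2(X, y):
    """R² of an OLS regression of y on X (with intercept)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_known = r2(known, decision)
r2_both = r2(np.column_stack([known, algo_pred]), decision)
# If the controls absorbed everything, adding the algorithm's prediction
# would leave R² unchanged; here it rises, mirroring Figure I, Panel A.
print(round(r2_known, 2), round(r2_both, 2))
```

The same regression run the other way—known features explaining the prediction itself—corresponds to Panel B of the figure.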

Figure I: Correlates of Judge Detention Decision and Algorithmic Prediction of Judge Decision

Panel A summarizes the explanatory power of a regression model in explaining judge detention decisions, controlling for the different explanatory variables indicated at left (shaded tiles), either on their own (dark circles) or together with the algorithmic prediction of the judge decisions (triangles). Each row represents a different regression specification. By “other facial features,” we mean variables that previous psychology research suggests matter for how faces influence people’s reactions to others (dominance, trustworthiness, competence, and attractiveness). Ninety-five percent confidence intervals around our R² estimates come from drawing 10,000 bootstrap samples from the validation data set. Panel B shows the relationship between the different explanatory variables as indicated at left by the shaded tiles with the algorithmic prediction itself as the outcome variable in the regressions. Panel C examines the correlation with judge decisions of the two novel hypotheses generated by our procedure about what facial features affect judge detention decisions: well-groomed and heavy-faced.
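The percentile-bootstrap confidence intervals mentioned in the figure note can be reproduced in miniature. A hedged sketch on synthetic data, with 2,000 resamples instead of the paper’s 10,000:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Synthetic stand-in for the validation set: one predictor, one outcome.
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)

def r2(x, y):
    # For a univariate OLS fit, R² equals the squared correlation.
    return np.corrcoef(x, y)[0, 1] ** 2

# Percentile bootstrap: resample rows with replacement, recompute R²,
# and take the 2.5th and 97.5th percentiles of the draws.
draws = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, size=n)
    draws[b] = r2(x[idx], y[idx])
lo, hi = np.percentile(draws, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))
```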

Perhaps we should control for something else? Figuring out that “something else” is itself a form of hypothesis generation. To avoid a possibly endless—and misleading—process of generating other controls, we take a different approach. We show mug shots to subjects and ask them to guess whom the judge will detain and incentivize them for accuracy. These guesses summarize the facial features people readily (if implicitly) believe influence jailing. Although subjects are modestly good at this task, the algorithm is much better. It remains highly predictive even after controlling for these guesses. The algorithm seems to have found something novel beyond what scientists have previously hypothesized and beyond whatever patterns people can even recognize in data (whether or not they can articulate them).

What, then, are the novel facial features the algorithm has discovered? If we are unable to answer that question, we will have simply replaced one black box (the judge’s mind) with another (an algorithmic model of the judge’s mind). We propose a solution whereby the algorithm can communicate what it “sees.” Specifically, our procedure begins with a mug shot and “morphs” it to create a mug shot that maximally increases (or decreases) the algorithm’s predicted detention probability. The result is pairs of synthetic mug shots that can be examined to understand and articulate what differs within the pairs. The algorithm discovers, and people name that discovery. In principle we could have just shown subjects actual mug shots with higher versus lower predicted detention odds. But faces are so rich that between any pair of actual mug shots, many things will happen to be different and most will be unrelated to detention (akin to the curse of dimensionality). Simply looking at pairs of actual faces can, as a result, lead to many spurious observations. Morphing creates counterfactual synthetic images that are as similar as possible except with respect to detention odds, to minimize extraneous differences and help focus on what truly matters for judge detention decisions.
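At its core, the morphing step is gradient ascent on the model’s predicted detention probability with respect to the image. The paper morphs within a generative model so the results stay face-like; the sketch below strips that away and ascends a stand-in logistic “judge model” directly on raw pixel vectors. All names, weights, and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 64

# Stand-in judge model: a logistic score over d pixel values.
w = rng.normal(size=d) / np.sqrt(d)

def detain_prob(x):
    return 1 / (1 + np.exp(-(x @ w)))

def morph(x, direction=+1, step=0.5, n_steps=20):
    """Nudge an image along the gradient of predicted detention
    probability, producing a counterfactual twin of x."""
    x = x.copy()
    for _ in range(n_steps):
        p = detain_prob(x)
        grad = p * (1 - p) * w      # dp/dx for the logistic score
        x += direction * step * grad
    return x

x0 = rng.normal(size=d)
x_up = morph(x0, +1)     # morph toward higher detention odds
x_down = morph(x0, -1)   # morph toward lower detention odds
print(detain_prob(x_down) < detain_prob(x0) < detain_prob(x_up))
```

Because each pair differs only along the gradient direction, the two morphs are as similar as possible except in detention odds, which is what lets subjects name the operative feature.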

Importantly, we do not generate hypotheses by looking at the morphs ourselves; instead, they are shown to independent study subjects (MTurk or Prolific workers) in an experimental design. Specifically, we showed pairs of morphed images and asked participants to guess which image the algorithm predicts to have higher detention risk. Subjects were given both incentives and feedback, so they had motivation and opportunity to learn the underlying patterns. While subjects initially guess the judge’s decision correctly from these morphed mug shots at about the same rate as they do when looking at “raw data,” that is, actual mug shots (modestly above the |$50\%$| random guessing mark), they quickly learn from these morphed images what the algorithm is seeing and reach an accuracy of nearly |$70\%$|⁠ . At the end, participants are asked to put words to the differences they see across images in each pair, that is, to name what they think are the key facial features the algorithm is relying on to predict judge decisions. Comfortingly, there is substantial agreement on what subjects see: a sizable share of subjects all name the same feature. To verify whether the feature they identify is used by the algorithm, a separate sample of subjects independently coded mug shots for this new feature. We show that the new feature is indeed correlated with the algorithm’s predictions. What subjects think they’re seeing is indeed what the algorithm is also “seeing.”

Having discovered a single feature, we can iterate the procedure—the first feature explains only a fraction of what the algorithm has captured, suggesting there are many other factors to be discovered. We again produce morphs, but this time hold the first feature constant: that is, we orthogonalize so that the pairs of morphs do not differ on the first feature. When these new morphs are shown to subjects, they consistently name a second feature, which again correlates with the algorithm’s prediction. Both features are quite important. They explain a far larger share of what the algorithm sees than all the other variables (including race and skin color) besides gender. These results establish our main goals: show that the procedure produces meaningful communication, and that it can be iterated.
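The iteration step, morphing while holding the first discovered feature constant, can be sketched as a projection: remove from the morphing gradient its component along a (hypothetical) direction that encodes the first feature, so the morph changes predicted risk without moving the feature. This is a toy linear stand-in for the paper's orthogonalization, not the actual generative-model procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8
w = rng.normal(size=dim)            # stand-in risk-model direction
f = rng.normal(size=dim)            # hypothetical direction of feature #1
f /= np.linalg.norm(f)              # unit vector

def morph_orthogonal(z, steps=50, lr=0.1):
    """Gradient steps on predicted risk, projected so that movement
    along the first feature's direction is removed at every step."""
    z = z.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-w @ z))
        grad = p * (1 - p) * w
        grad = grad - (grad @ f) * f    # project out the feature component
        z += lr * grad
    return z

z0 = rng.normal(size=dim)
z1 = morph_orthogonal(z0)
# z1 has higher risk score than z0, yet the same value of feature #1 (f @ z).
```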

What are the two discovered features? The first can be called “well-groomed” (e.g., tidy, clean, groomed, versus unkempt, disheveled, sloppy look), and the second can be called “heavy-faced” (e.g., wide facial shape, puffier face, wider face, rounder face, heavier). These features are not just predictive of what the algorithm sees, but also of what judges actually do ( Figure I , Panel C). We find that both well-groomed and heavy-faced defendants are more likely to be released, even controlling for demographic features and known facial features from psychology. Detention rates of defendants in the top and bottom quartile of well-groomedness differ by 5.5 percentage points (24% of the base rate), while the top versus bottom quartile difference in heavy-facedness is 7 percentage points (about 30% of the base rate). Both differences are larger than the 4.8 percentage point difference in detention rates between those arrested for violent versus nonviolent crimes. Not only are these magnitudes substantial, these hypotheses are novel even to practitioners who work in the criminal justice system (in a public defender’s office and a legal aid society).

Establishing whether these hypotheses are truly causally related to judge decisions is obviously beyond the scope of the present article. But we nonetheless present a few additional findings that are at least suggestive. These novel features do not appear to be simply proxies for factors like substance abuse, mental health, or socioeconomic status. Moreover, we carried out a lab experiment in which subjects are asked to make hypothetical pretrial release decisions as if they were a judge. They are shown information about criminal records (current charge, prior arrests) along with mug shots that are randomly morphed in the direction of higher or lower values of well-groomed (or heavy-faced). Subjects tend to detain those with higher-risk structured variables (criminal records), all else equal, suggesting they are taking the task seriously. These same subjects, though, are also more likely to detain defendants who are less heavy-faced or well-groomed, even though these were randomly assigned.

Ultimately, though, this is not a study about well-groomed or heavy-faced defendants, nor are its implications limited to faces or judges. It develops a general procedure that can be applied wherever behavior can be predicted using rich (especially high-dimensional) data. Development of such a procedure has required overcoming two key challenges.

First, to generate interpretable hypotheses, we must overcome the notorious black box nature of most machine learning algorithms. Unlike with a regression, one cannot simply inspect the coefficients. A modern deep-learning algorithm, for example, can have tens of millions of parameters. Noninspectability is especially problematic when the data are rich and high dimensional since the parameters are associated with primitives such as pixels. This problem of interpretation is fundamental and remains an active area of research. 6 Part of our procedure here draws on the recent literature in computer science that uses generative models to create counterfactual explanations. Most of those methods are designed for AI applications that seek to automate tasks humans do nearly perfectly, like image classification, where predictability of the outcome (is this image of a dog or a cat?) is typically quite high. 7 Interpretability techniques are used to ensure the algorithm is not picking up on spurious signal. 8 We developed our method, which has similar conceptual underpinnings to this existing literature, for social science applications where the outcome (human behavior) is typically more challenging to predict. 9 To what degree existing methods (as they currently stand or with some modification) could perform as well or better in social science applications like ours is a question we leave to future work.

Second, we must overcome what we might call the Rorschach test problem. Suppose we, the authors, were to look at these morphs and generate a hypothesis. We would not know if the procedure played any meaningful role. Perhaps the morphs, like ink blots, are merely canvases onto which we project our creativity. 10 Put differently, a single research team’s idiosyncratic judgments lack the kind of replicability we desire of a scientific procedure. To overcome this problem, it is key that we use independent (nonresearcher) subjects to inspect the morphs. The fact that a sizable share of subjects all name the same discovery suggests that human-algorithm communication has occurred and the procedure is replicable, rather than reflecting some unique spark of creativity.

At the same time, the fact that our procedure is not fully automatic implies that it will be shaped and constrained by people. Human participants are needed to name the discoveries. So whole new concepts that humans do not yet understand cannot be produced. Such breakthroughs clearly happen (e.g., gravity or probability) but are beyond the scope of procedures like ours. People also play a crucial role in curating the data the algorithm sees. Here, for example, we chose to include mug shots. The creative acquisition of rich data is an important human input into this hypothesis generation procedure. 11

Our procedure can be applied to a broad range of settings and will be particularly useful for data that are not already intrinsically interpretable. Many data sets contain a few variables that already have clear, fixed meanings and are unlikely to lead to novel discoveries. In contrast, images, text, and time series are rich high-dimensional data with many possible interpretations. Just as there is an ocean of plausible facial features, these sorts of data contain a large set of potential hypotheses that an algorithm can search through. Such data are increasingly available and used by economists, including news headlines, legislative deliberations, annual corporate reports, Federal Open Market Committee statements, Google searches, student essays, résumés, court transcripts, doctors’ notes, satellite images, housing photos, and medical images. Our procedure could, for example, raise hypotheses about what kinds of news lead to over- or underreaction of stock prices, which features of a job interview increase racial disparities, or what features of an X-ray drive misdiagnosis.

Central to this work is the belief that hypothesis generation is a valuable activity in and of itself. Beyond whatever the value might be of our specific procedure and empirical application, we hope these results also inspire greater attention to this traditionally “prescientific” stage of science.

We develop a simple framework to clarify the goals of hypothesis generation and how it differs from testing, how algorithms might help, and how our specific approach to algorithmic hypothesis generation differs from existing methods. 12

II.A. The Goals of Hypothesis Generation

What criteria should we use for assessing hypothesis generation procedures? Two common goals for hypothesis generation are ones that we ensure ex post. First is novelty. In our application, we aim to orthogonalize against known factors, recognizing that it may be hard to orthogonalize against all known hypotheses. Second, we require that hypotheses be testable ( Popper 2002 ). But what can be tested is hard to define ex ante, in part because it depends on the specific hypothesis and the potential experimental setups. Creative empiricists over time often find ways to test hypotheses that previously seemed untestable. 13 To these, we add two more: interpretability and empirical plausibility.

What do we mean by empirically plausible? Let y be some outcome of interest, which for simplicity we assume is binary, and let h(x) be some hypothesis that maps the features of each instance, x, to [0, 1]. By empirical plausibility we mean some correlation between y and h(x). Our ultimate aim is to uncover causal relationships. But causality can only be known after causal testing. That raises the question of how to come up with ideas worth causally testing, and how we would recognize them when we see them. Many true hypotheses need not be visible in raw correlations. Those can only be identified with background knowledge (e.g., theory). Other procedures would be required to surface those. Our focus here is on searching for true hypotheses that are visible in raw correlations. Of course not every correlation will turn out to be a true hypothesis, but even in those cases, generating such hypotheses and then invalidating them can be a valuable activity. Debunking spurious correlations has long been one of the most useful roles of empirical work. Understanding what confounders produce those correlations can also be useful.
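The definition above, a hypothesis h(x) is empirically plausible if it correlates with y, can be illustrated with a minimal simulation (illustrative data, not from the paper): one hypothesis tracks the true signal and one tracks an irrelevant feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))

# True data-generating process: only the first feature matters.
y = (x[:, 0] + rng.normal(size=n) > 0).astype(float)

# Two candidate hypotheses h(x), each mapping features to [0, 1].
h_plausible = 1 / (1 + np.exp(-x[:, 0]))   # aligned with the real signal
h_spurious = 1 / (1 + np.exp(-x[:, 2]))    # built on an irrelevant feature

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

r_plausible = corr(y, h_plausible)   # clearly positive
r_spurious = corr(y, h_spurious)     # near zero
```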

We care about our final goal for hypothesis generation, interpretability, because science is largely about helping people make forecasts into new contexts, and people can only do that with hypotheses they meaningfully understand. Consider an uninterpretable hypothesis like “this set of defendants is more likely to be jailed than that set,” where we cannot articulate a reason why. From that hypothesis, nothing could be said about a new set of courtroom defendants. In contrast, an interpretable hypothesis like “skin color affects detention” has implications for other samples of defendants and for entirely different settings. We could ask whether skin color also affects, say, police enforcement choices or whether these effects differ by time of day. By virtue of being interpretable, these hypotheses let us use a wider set of knowledge (police may share racial biases; skin color is not as easily detected at night). 14 Interpretable descriptions let us generalize to novel situations, in addition to being easier to communicate to key stakeholders and lending themselves to interpretable solutions.

II.B. Human versus Algorithmic Hypothesis Generation

Human hypothesis generation has the advantage of generating hypotheses that are interpretable. By construction, the ideas that humans come up with are understandable by humans. But as a procedure for generating new ideas, human creativity has the drawback of often being idiosyncratic and not necessarily replicable. A novel hypothesis is novel exactly because one person noticed it when many others did not. A large body of evidence shows that human judgments have a great deal of “noise.” It is not just that different people draw different conclusions from the same observations, but that the same person may notice different things at different times ( Kahneman, Sibony, and Sunstein 2022 ). A large body of psychology research also shows that people typically are unable to introspect and understand why they notice specific things at the times they do notice them. 15

There is also no guarantee that human-generated hypotheses need be empirically plausible. The intuition is related to “overfitting.” Suppose that people look at a subset of all data and look for something that differentiates positive (y = 1) from negative (y = 0) cases. Even with no noise in y, there is randomness in which observations are in the data. That can lead to idiosyncratic differences between y = 0 and y = 1 cases. As the number of comprehensible hypotheses gets large, there is a “curse of dimensionality”: many plausible hypotheses for these idiosyncratic differences. That is, many different hypotheses can look good in sample but need not work out of sample. 16
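This overfitting intuition can be reproduced in a few lines of simulation: when many candidate hypotheses are screened against a small sample, the best-looking one appears impressive in sample even when the outcome is pure noise, and the apparent fit vanishes out of sample. The setup below is a toy illustration, not an analysis from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 1000                      # 100 observations, 1,000 candidate hypotheses

y_in = rng.integers(0, 2, size=n)     # outcome with NO real structure
y_out = rng.integers(0, 2, size=n)    # fresh out-of-sample outcomes

# Each candidate hypothesis is just a random binary rule evaluated on the data.
H = rng.integers(0, 2, size=(k, n))

acc_in = (H == y_in).mean(axis=1)     # in-sample accuracy of each hypothesis
best = int(np.argmax(acc_in))

best_in = float(acc_in[best])                 # looks well above chance...
best_out = float((H[best] == y_out).mean())   # ...but is ~50% out of sample
```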

In contrast, supervised learning tools in machine learning are designed to generate predictions in new (out-of-sample) data. 17 That is, algorithms generate hypotheses that are empirically plausible by construction. 18 Moreover, machine learning can detect patterns in data that humans cannot. Algorithms can notice, for example, that livestock all tend to be oriented north ( Begall et al. 2008 ), whether someone is about to have a heart attack based on subtle indications in an electrocardiogram ( Mullainathan and Obermeyer 2022 ), or that a piece of machinery is about to break ( Mobley 2002 ). We call these machine learning prediction functions m(x), which for a binary outcome y map to [0, 1].

The challenge is that most m(x) are not interpretable. For this type of statistical model to yield an interpretable hypothesis, its parameters must be interpretable. That can happen in some simple cases. For example, if we had a data set where each dimension of x was interpretable (such as individual structured variables in a tabular data set) and we used a predictor such as OLS (or LASSO), we could just read the hypotheses from the nonzero coefficients: which variables are significant? Even in that case, interpretation is challenging because machine learning tools, built to generate accurate predictions rather than apportion explanatory power across explanatory variables, yield coefficients that can be unstable across realizations of the data ( Mullainathan and Spiess 2017 ). 19 Often interpretation is much less straightforward than that. If x is an image, text, or time series, the estimated models (such as convolutional neural networks) can have literally millions of parameters. The models are defined on granular inputs with no particular meaning: if we knew m(x) weighted a particular pixel, what have we learned? In these cases, the estimated model m(x) is not interpretable. Our focus is on these contexts where algorithms, as black-box models, are not readily interpreted.
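The coefficient-instability point can be seen in a small simulation (illustrative, not from the paper): with nearly collinear predictors, bootstrap resamples produce wildly different individual coefficients even though the combined fit is stable, so reading hypotheses off coefficients is fragile.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two highly correlated predictors; only their sum matters for y.
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)    # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

coefs = []
for _ in range(200):                   # bootstrap resamples
    idx = rng.integers(0, n, size=n)
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    coefs.append(b)
coefs = np.array(coefs)

spread = coefs.std(axis=0)             # individual coefficients: unstable
sum_spread = float(coefs.sum(axis=1).std())   # their sum: stable
```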

Ideally one might marry people’s unique knowledge of what is comprehensible with an algorithm’s superior capacity to find meaningful correlations in data: to have the algorithm discover new signal and then have humans name that discovery. How to do so is not straightforward. We might imagine formalizing the set of interpretable prediction functions, and then focus on creating machine learning techniques that search over functions in that set. But mathematically characterizing those functions is typically not possible. Or we might consider seeking insight from a low-dimensional representation of face space, or “eigenfaces,” which are a common teaching tool for principal components analysis ( Sirovich and Kirby 1987 ). But those turn out not to provide much useful insight for our purposes. 20 In some sense it is obvious why: the subset of actual faces is unlikely to be a linear subspace of the space of pixels. If we took two faces and linearly interpolated them the resulting image would not look like a face. Some other approach is needed. We build on methods in computer science that use generative models to generate counterfactual explanations.

II.C. Related Methods

Our hypothesis generation procedure is part of a growing literature that aims to integrate machine learning into the way science is conducted. A common use (outside of economics) is in what could be called “closed world problems”: situations where the fundamental laws are known, but drawing out predictions is computationally hard. For example, the biochemical rules of how proteins fold are known, but it is hard to predict the final shape of a protein. Machine learning has provided fundamental breakthroughs, in effect by making very hard-to-compute outcomes computable in a feasible timeframe. 21

Progress has been far more limited with applications where the relationship between x and y is unknown (“open world” problems), like human behavior. First, machine learning here has been useful at generating unexpected findings, although these are not hypotheses themselves. Pierson et al. (2021) show that a deep-learning algorithm is better able to predict patient pain from an X-ray than clinicians can: there are physical knee defects that medicine currently does not understand. But that study is not able to isolate what those defects are. 22 Second, machine learning has also been used to explore investigator-generated hypotheses, such as Mullainathan and Obermeyer (2022) , who examine whether physicians suffer from limited attention when diagnosing patients. 23

Finally, a few papers take on the same problem that we do. Fudenberg and Liang (2019) and Peterson et al. (2021) have used algorithms to predict play in games and choices between lotteries. They inspected those algorithms to produce their insights. Similarly, Kleinberg et al. (2018) and Sunstein (2021) use algorithmic models of judges and inspect those models to generate hypotheses. 24 Our proposal builds on these papers. Rather than focusing on generating an insight for a specific application, we suggest a procedure that can be broadly used for many applications. Importantly, our procedure does not rely on researcher inspection of algorithmic output. When an expert researcher with a track record of generating scientific ideas uses some procedure to generate an idea, how do we know whether the result is due to the procedure or the researcher? By relying on a fixed algorithmic procedure that human subjects can interface with, hypothesis generation goes from being an idiosyncratic act of individuals to a replicable process.

III.A. Judicial Decision Making

Although our procedure is broadly applicable, we illustrate it through a specific application to the U.S. criminal justice system. We choose this application partly because of its social relevance. It is also an exemplar of the type of application where our hypothesis generation procedure can be helpful. Its key ingredients—a clear decision maker, a large number of choices (over 10 million people are arrested each year in the United States) that are recorded in data, and, increasingly, high-dimensional data that can also be used to model those choices, such as mug shot images, police body cameras, and text from arrest reports or court transcripts—are shared with a variety of other applications.

Our specific focus is on pretrial hearings. Within 24–48 hours after arrest, a judge must decide where the defendant will await trial, in jail or at home. This is a consequential decision. Cases typically take 2–4 months to resolve, sometimes up to 9–12 months. Jail affects people’s families, their livelihoods, and the chances of a guilty plea ( Dobbie, Goldin, and Yang 2018 ). On the other hand, someone who is released could potentially reoffend. 25

While pretrial decisions are by law supposed to hinge on the defendant’s risk of flight or rearrest if released ( Dobbie and Yang 2021 ), studies show that judges’ decisions deviate from those guidelines in a number of ways. For starters, judges seem to systematically mispredict defendant risk ( Jung et al. 2017 ; Kleinberg et al. 2018 ; Rambachan 2021 ; Angelova, Dobbie, and Yang 2023 ), partly because judges overweight the charge for which people are arrested ( Sunstein 2021 ). Judge decisions can also depend on extralegal factors like race ( Arnold, Dobbie, and Yang 2018 ; Arnold, Dobbie, and Hull 2020 ), whether the judge’s favorite football team lost ( Eren and Mocan 2018 ), weather ( Heyes and Saberian 2019 ), the cases the judge just heard ( Chen, Moskowitz, and Shue 2016 ), and if the hearing is on the defendant’s birthday ( Chen and Philippe 2023 ). These studies test hypotheses that some human being was clever enough to think up. But there remains a great deal of unexplained variation in judges’ decisions. The challenge of expanding the set of hypotheses for understanding this variation without losing the benefit of interpretability is the motivation for our own analysis here.

III.B. Administrative Data

We obtained data from Mecklenburg County, North Carolina, the second most populated county in the state (over 1 million residents), which includes North Carolina’s largest city (Charlotte). The county is similar to the rest of the United States in terms of economic conditions (2021 poverty rates were 11.0% versus 11.4%, respectively), although the share of Mecklenburg County’s population that is non-Hispanic white is lower than for the United States as a whole (56.6% versus 75.8%). 26 We rely on three sources of administrative data: 27

The Mecklenburg County Sheriff’s Office (MCSO) publicly posts arrest data for the past three years, which provides information on defendant demographics like age, gender, and race, as well as the charge for which someone was arrested.

The North Carolina Administrative Office of the Courts (NCAOC) maintains records on the judge’s pretrial decisions (detain, release, etc.).

Data from the North Carolina Department of Public Safety includes information about the defendant’s prior convictions and incarceration spells, if any.

We also downloaded photos of the defendants from the MCSO public website (so-called mug shots), 28 which capture a frontal view of each person from the shoulders up in front of a gray background. These images are 400 pixels wide by 480 pixels high, but we pad them with a black boundary to be square 512 × 512 images to conform with the requirements of some of the machine learning tools. In Figure II , we give readers a sense of what these mug shots look like, with two important caveats. First, given concerns about how the overrepresentation of disadvantaged groups in discussions of crime can contribute to stereotyping ( Bjornstrom et al. 2010 ), we illustrate the key ideas of the paper using images for non-Hispanic white males. Second, out of sensitivity to actual arrestees, we do not wish to display actual mug shots (which are available at the MCSO website). 29 Instead, the article only shows mug shots that are synthetic, generated using generative adversarial networks as described in Section V.B .

Illustrative Facial Images

This figure shows facial images that illustrate the format of the mug shots posted publicly on the Mecklenburg County, North Carolina, sheriff’s office website. These are not real mug shots of actual people who have been arrested, but are synthetic. Moreover, given concerns about how the overrepresentation of disadvantaged groups in discussions of crime can exacerbate stereotyping, we illustrate our key ideas using images for non-Hispanic white men. However, in our human intelligence tasks that ask participants to provide labels (ratings for different image features), we show images that are representative of the Mecklenburg County defendant population as a whole.

These data capture much of the information the judge has available at the time of the pretrial hearing, but not all of it. Both the judge and the algorithm see structured variables about each defendant like defendant demographics, current charge, and prior record. Because the mug shot (which the algorithm uses) is taken not long before the pretrial hearing, it should be a reasonable proxy for what the judge sees in court. The additional information the judge has but the algorithm does not includes the narrative arrest report from the police and what happens in court. While pretrial hearings can be quite brief in many jurisdictions (often not more than just a few minutes), the judge may nonetheless hear statements from police, prosecutors, defense lawyers, and sometimes family members. Defendants usually have their lawyers speak for them and do not say much at these hearings.

We downloaded 81,166 arrests made between January 18, 2017, and January 17, 2020, involving 42,353 unique defendants. We apply several data filters, like dropping cases without mug shots ( Online Appendix Table A.I ), leaving 51,751 observations. Because our goal is inference about new out-of-sample (OOS) observations, we partition our data as follows:

A train set of N = 22,696 cases, constructed by taking arrests through July 17, 2019, grouping arrests by arrestee, 30 randomly selecting 70% of arrestees into the training-plus-validation data set, and then randomly selecting 70% of those arrestees for the training data specifically.

A validation set of N = 9,604 cases used to report OOS performance in the article’s main exhibits, consisting of the remaining 30% of arrestees in the combined training-plus-validation data set.

A lock-box hold-out set of N = 19,009 cases that we did not touch until the article was accepted for final publication, to avoid what one might call researcher overfitting: we run many models over the course of writing an article, and the results on the validation data set may overstate our findings. This data set consists of the N = 4,759 valid cases from the last six months of our data period (July 17, 2019, to January 17, 2020) plus a random sample of 30% of those arrested before July 17, 2019, so that we can present results that are OOS with respect to both individuals and time. Once this article was officially accepted, we replicated the findings presented in our main exhibits (see Online Appendix D and Online Appendix Tables A.XVIII–A.XXXII ). Our core findings are qualitatively similar. 31
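The partitioning logic described above can be sketched as follows. This is a hypothetical reimplementation from the description in the text, grouping by arrestee, reserving the last six months for the lock-box, and applying nested 70% splits; it is not the authors' actual code, and the tuple layout for cases is an assumption.

```python
import random
from collections import defaultdict

def partition(cases, cutoff, seed=0):
    """Split cases into (train, validation, lockbox).

    cases: list of (case_id, arrestee_id, date) with comparable dates.
    Cases on/after `cutoff` go to the lock-box; earlier arrestees are
    split 70/30 into train-plus-validation vs. lock-box, then that 70%
    is split 70/30 again into train vs. validation, always by person so
    no arrestee straddles train and validation.
    """
    rng = random.Random(seed)
    late = [c for c in cases if c[2] >= cutoff]       # last-period cases
    early = [c for c in cases if c[2] < cutoff]

    by_person = defaultdict(list)
    for c in early:
        by_person[c[1]].append(c)

    people = sorted(by_person)
    rng.shuffle(people)
    n_tv = int(0.7 * len(people))                     # 70% to train+validation
    tv_people, lockbox_people = people[:n_tv], people[n_tv:]

    n_train = int(0.7 * len(tv_people))               # 70% of those to train
    train_people = set(tv_people[:n_train])
    val_people = set(tv_people[n_train:])

    train = [c for p in sorted(train_people) for c in by_person[p]]
    val = [c for p in sorted(val_people) for c in by_person[p]]
    lockbox = late + [c for p in lockbox_people for c in by_person[p]]
    return train, val, lockbox
```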

Descriptive statistics are shown in Table I . Relative to the county as a whole, the arrested population substantially overrepresents men (78.7%) and Black residents (69.4%). The average age of arrestees is 31.8 years. Judges detain 23.3% of cases, and in 25.1% of arrests the person is rearrested before their case is resolved (about one-third of those released). Randomization of arrestees to the training versus validation data sets seems to have been successful, as shown in Table I . None of the pairwise comparisons has a p-value below .05 (see Online Appendix Table A.II ). A permutation multivariate analysis of variance test of the joint null hypothesis that the training-validation differences for all variables are all zero yields p = .963. 32 A test of the same joint null hypothesis for the differences between the training sample and the lock-box hold-out data set (out of sample by individual) yields p = .537.
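A permutation test of this kind can be sketched as follows: compute a statistic summarizing train-validation covariate differences, then compare it with its distribution under random reassignment of the split labels. The statistic and data below are toy stand-ins, not the paper's exact permutation MANOVA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for case covariates: rows are cases, columns are
# variables; `is_train` marks a (truly random) train/validation assignment.
X = rng.normal(size=(400, 5))
is_train = rng.random(400) < 0.7

def mean_gap(X, mask):
    """Summary statistic: sum of squared train-validation mean differences."""
    return float(((X[mask].mean(axis=0) - X[~mask].mean(axis=0)) ** 2).sum())

observed = mean_gap(X, is_train)

# Permutation distribution of the statistic under the joint null.
draws = []
for _ in range(999):
    perm = rng.permutation(is_train)
    draws.append(mean_gap(X, perm))

p_value = (1 + sum(d >= observed for d in draws)) / (1 + len(draws))
```

Because the toy assignment really is random, the resulting p-value should be unremarkable, exactly as the paper reports for its own split.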

Summary Statistics for Mecklenburg County NC Data, 2017–2020

Notes. This table reports descriptive statistics for our full data set and analysis subsets, which cover the period January 18, 2017, through January 17, 2020, from Mecklenburg County, NC. The lock-box hold-out data set consists of data from the last six months of our study period (July 17, 2019–January 17, 2020) plus a subset of cases through July 16, 2019, selected at random by arrestee. The remainder of the data set is then randomly assigned by arrestee to our training data set (used to build our algorithms) or to our validation set (which we use to report results in the article’s main exhibits). For additional details of our data filters and partitioning procedures, see Online Appendix Table A.I . We define pretrial release as being released on the defendant’s own recognizance or having been assigned and then posting cash bail requirements within three days of arrest. We define rearrest as experiencing a new arrest before adjudication of the focal arrest, with detained defendants assigned zero values for the purposes of this table. Arrest charge categories reflect the most serious criminal charge for which a person was arrested, using the FBI Uniform Crime Reporting hierarchy rule in cases where someone is arrested and charged with multiple offenses. For analysis of variance tests of the joint null hypothesis that the difference in means across each variable is zero, see Online Appendix Table A.II .

III.C. Human Labels

The administrative data capture many key features of each case but omit some other important ones. We solve these data insufficiency problems through a series of human intelligence tasks (HITs), which involve having study subjects on one of two possible platforms (Amazon’s Mechanical Turk or Prolific) assign labels to each case from looking at the mug shots. More details are in Online Appendix Table A.III . We use data from these HITs mostly to understand how the algorithm’s predictions relate to already-known determinants of human decision making, and hence the degree to which the algorithm is discovering something novel.

One set of HITs filled in demographic-related data: ethnicity; skin tone (since people are often stereotyped on skin color, or “colorism”; Hunter 2007 ), reported on an 18-point scale; the degree to which defendants appear more stereotypically Black on a 9-point scale ( Eberhardt et al. 2006 show this affects criminal justice decisions); and age, to compare to administrative data for label quality checks. 33 Because demographics tend to be easy for people to see in images, we collect just one label per image for each of these variables. To confirm one label is enough, we repeated the labeling task for 100 images but collected 10 labels for each image; we see that additional labels add little information. 34 Another data quality check comes from the fact that the distributions of skin color ratings do systematically differ by defendant race ( Online Appendix Figure A.III ).

A second type of HIT measured facial features that previous psychology research has shown affect human judgments. The specific set of facial features we focus on comes from the influential study by Oosterhof and Todorov (2008) of people’s perceptions of the facial features of others. When subjects are asked to provide descriptions of different faces, principal components analysis suggests just two dimensions account for about 80% of the variation: (i) trustworthiness and (ii) dominance. We also collected data on two other facial features shown to be associated with real-world decisions like hiring or whom to vote for: (iii) attractiveness and (iv) competence ( Frieze, Olson, and Russell 1991 ; Little, Jones, and DeBruine 2011 ; Todorov and Oh 2021 ). 35

We asked subjects to rate images for each of these psychological features on a nine-point scale. Because psychological features may be less obvious than demographic features, we collected three labels per training–data set image and five per validation–data set image. 36 There is substantial variation in the ratings that subjects assign to different images for each feature (see Online Appendix Figure A.VI ). The ratings from different subjects for the same feature and image are highly correlated: interrater reliability measures (Cronbach’s α) range from 0.87 to 0.98 ( Online Appendix Figure A.VII ), similar to those reported in studies like Oosterhof and Todorov (2008) . 37 The information gain from collecting more than a few labels per image is modest. 38 For summary statistics, see Online Appendix Table A.IV .
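Cronbach's α, the interrater reliability measure reported above, can be computed directly from an items-by-raters matrix of ratings. The sketch below uses simulated ratings that share a true score per image; all values are illustrative, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings: 200 images, 3 raters per image, each rating equal to a
# shared "true" score for the image plus independent rater noise.
true = rng.normal(size=200)
ratings = true[:, None] + 0.4 * rng.normal(size=(200, 3))  # raters = columns

def cronbach_alpha(R):
    """Cronbach's alpha for an (items x raters) ratings matrix."""
    k = R.shape[1]
    item_vars = R.var(axis=0, ddof=1).sum()   # sum of per-rater variances
    total_var = R.sum(axis=1).var(ddof=1)     # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars / total_var)

alpha = float(cronbach_alpha(ratings))        # high when raters agree
```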

Finally, we also tried to capture people’s implicit or tacit understanding of the determinants of judges’ decisions by asking subjects to predict which mug shot out of a pair would be detained, with images in each pair matched on gender, race, and five-year age brackets. 39 We incentivized study subjects for correct predictions and gave them feedback over the course of the 50 image pairs to facilitate learning. We treat the first 10 responses per subject as a “learning set” that we exclude from our analysis.

The first step of our hypothesis generation procedure is to build an algorithmic model of some behavior, which in our case is the judge’s detention decision. A sizable share of the predictable variation in judge decisions comes from a surprising source: the defendant’s face. Facial features implicated by past research explain just a modest share of this predictable variation. The algorithm seems to have found a novel discovery.

IV.A. What Drives Judge Decisions?

We begin by predicting judge pretrial detention decisions (y = 1 if detain, y = 0 if release) using all the inputs available (x). We use the training data set to construct two separate models for the two types of data available. We apply gradient-boosted decision trees to predict judge decisions using the structured administrative data (current charge, prior record, age, gender), m_s(x); for the unstructured data (raw pixel values from the mug shots), we train a convolutional neural network, m_u(x). Each model returns an estimate of y (a predicted detention probability) for a given x. Because these initial steps of our procedure use standard machine learning methods, we relegate their discussion to the Online Appendix .

We pool the signal from both models to form a single weighted-average model $m_p(x) = \hat{\beta}_s m_s(x) + \hat{\beta}_u m_u(x)$ using a so-called stacking procedure, where the data are used to estimate the relevant weights. 40 Combining structured and unstructured data is an active area of deep-learning research, often called fusion modeling ( Yuhas, Goldstein, and Sejnowski 1989 ; Lahat, Adali, and Jutten 2015 ; Ramachandram and Taylor 2017 ; Baltrušaitis, Ahuja, and Morency 2019 ). We have tried several of the latest fusion architectures; none improve on our ensemble approach.
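The stacking step can be sketched in a few lines. The held-out predictions `m_s_hat` and `m_u_hat` below are simulated stand-ins for the outputs of the two component models (gradient-boosted trees and the CNN in the paper), and the weights are estimated by simple least squares; the paper’s exact estimator may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.binomial(1, 0.25, size=n)  # detain = 1, release = 0

# Hypothetical held-out predictions from the two component models.
m_s_hat = np.clip(0.25 + 0.60 * (y - 0.25) + rng.normal(0, 0.1, n), 0, 1)
m_u_hat = np.clip(0.25 + 0.30 * (y - 0.25) + rng.normal(0, 0.1, n), 0, 1)

# Estimate stacking weights by regressing y on the two predictions.
X = np.column_stack([np.ones(n), m_s_hat, m_u_hat])
beta0, beta_s, beta_u = np.linalg.lstsq(X, y, rcond=None)[0]

def m_p(ms, mu):
    """Pooled prediction m_p(x) combining the two component models."""
    return beta0 + beta_s * ms + beta_u * mu

pooled = m_p(m_s_hat, m_u_hat)
```

The design choice is that the weights are fit on data the component models did not train on, so the stack rewards genuine out-of-sample signal rather than overfit.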

Judge decisions do have some predictable structure. We report predictive performance as the area under the receiver operating characteristic curve, or AUC, a measure of how well the algorithm rank-orders cases, with values ranging from 0.5 (random guessing) to 1.0 (perfect prediction). Intuitively, AUC can be thought of as the chance that a uniformly randomly selected detained defendant has a higher predicted detention likelihood than a uniformly randomly selected released defendant. The algorithm built using all candidate features, m_p(x), has an AUC of 0.780 (see Online Appendix Figure A.X ).
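The pairwise interpretation of AUC described above can be computed directly, without reference to the ROC curve. A small sketch on simulated scores (our illustration, not the paper’s data):

```python
import numpy as np

def auc_pairwise(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]   # all positive-negative pairs
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.3, 500)
scores = y * 0.5 + rng.normal(0, 0.5, 500)  # informative but noisy predictor
auc = auc_pairwise(y, scores)               # well above the 0.5 chance level
```

A perfectly separating score yields AUC = 1.0; an uninformative one hovers near 0.5.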

What is the algorithm using to make its predictions? A single type of input captures a sizable share of the total signal: the defendant’s face. The algorithm built using only the mug shot image, m_u(x), has an AUC of 0.625 (see Online Appendix Figure A.X ). Since an AUC of 0.5 represents random prediction, in AUC terms the mug shot accounts for $\frac{0.625-0.5}{0.780-0.5}=44.6\%$ of the predictive signal about judicial decisions.

Another common way to think about predictive accuracy is in R² terms. While our data are high dimensional (because the facial image is a high-dimensional object), the algorithm’s prediction of the judge’s decision based on the facial image, m_u(x), is a scalar and can easily be included in a familiar regression framework. Unlike AUC, which captures only how well a model rank-orders observations by predicted probabilities, measures like R² and mean squared error also capture how close predictions are to observed outcomes (calibration). 41 The R² from regressing y against m_s(x) and m_u(x) in the validation data is 0.11. Regressing y against m_u(x) alone yields an R² of 0.03. So depending on how we measure predictive accuracy, around a quarter ($\frac{0.03}{0.11}=27.3\%$) to a half ($44.6\%$) of the predictable signal about judges’ decisions is captured by the face.
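The two signal-share calculations above are simple arithmetic on the reported statistics:

```python
# Share of predictive signal attributable to the face, two ways,
# using the AUC and R-squared values reported in the text.
auc_full, auc_face = 0.780, 0.625          # chance level is 0.5
share_auc = (auc_face - 0.5) / (auc_full - 0.5)   # ≈ 0.446

r2_full, r2_face = 0.11, 0.03              # validation-set R-squared values
share_r2 = r2_face / r2_full               # ≈ 0.273
```

The AUC-based share nets out the 0.5 chance baseline; the R²-based share needs no such adjustment because an uninformative predictor has R² of 0.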

Average differences are another way to see what drives judges’ decisions. For any given feature x_k, we can calculate the average detention rate for different values of the feature. For example, for the variable measuring whether the defendant is male (x_k = 1) versus female (x_k = 0), we can calculate and plot E[y | x_k = 1] versus E[y | x_k = 0]. As shown in Online Appendix Figure A.XI , the difference in detention rates equals 4.8 percentage points for those arrested for violent versus nonviolent crimes, 10.2 percentage points for men versus women, and 4.3 percentage points for the bottom versus top quartile of skin tone, which are all sizable relative to the baseline detention rate of 23.3% in our validation data set. By way of comparison, average detention rates for the bottom versus top quartile of the mug shot algorithm’s predictions, m_u(x), differ by 20.4 percentage points.
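The conditional means E[y | x_k = 1] and E[y | x_k = 0] are just subgroup averages. A sketch on simulated data (the feature, rates, and gap below are assumptions for illustration, loosely mimicking the gender gap reported in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
male = rng.binomial(1, 0.8, 5000)                       # x_k: 1 = male
# Assume (for illustration) a ~10 pp detention-rate gap by gender.
y = rng.binomial(1, np.where(male == 1, 0.25, 0.15))    # y: 1 = detained

rate_if_1 = y[male == 1].mean()   # E[y | x_k = 1]
rate_if_0 = y[male == 0].mean()   # E[y | x_k = 0]
gap_pp = 100 * (rate_if_1 - rate_if_0)                  # gap in percentage points
```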

In what follows, we seek to understand more about the mug shot–based prediction of the judge’s decision, which we refer to simply as m(x) in the remainder of the article.

IV.B. Judicial Error?

So far we have shown that the face predicts judges’ behavior. Are judges right to use face information? To be precise, by “right” we do not mean a broader ethical judgment; for many reasons, one could argue it is never ethical to use the face. But suppose we take a rather narrow (exceedingly narrow) formulation of “right.” Recall the judge is meant to make jailing decisions based on the defendant’s risk. Is the use of these facial characteristics consistent with that objective? Put differently, if we account for defendant risk differences, do these facial characteristics still predict judge decisions? The fact that judges rely on the face in making detention decisions is in itself a striking insight regardless of whether the judges use appearance as a proxy for risk or are committing a cognitive error.

At first glance, the most straightforward way to answer this question would be to regress rearrest against the algorithm’s mug shot–based detention prediction. That yields a statistically significant relationship: The coefficient (and standard error) for the mug shot equals 0.6127 (0.0460) with no other explanatory variables in the regression versus 0.5735 (0.0521) with all the explanatory variables (as in the final column, Table III ). But the interpretation here is not so straightforward.

The challenge of interpretation comes from the fact that we have only measured crime rates for the released defendants, and this creates two problems. First, we observe measured crime, not actual crime: whether someone is charged with a crime is itself a human choice, made by police. If the choices police make about when to make an arrest are affected by the same biases that might afflict judges, then measured rearrest rates may correlate with facial characteristics simply due to measurement bias. Second, because we observe rearrest only for released defendants, if judges have access to private information (defendant characteristics not captured by our data set) and use that information to inform detention decisions, then the released and detained defendants may differ in unobservable ways that are relevant for rearrest risk ( Kleinberg et al. 2018 ).

With these caveats in mind, we can at least perform a bounding exercise. We created a predictor of rearrest risk (see Online Appendix B ) and then regress judges’ decisions on predicted rearrest risk. We find that a one-unit change in predicted rearrest risk changes judge detention rates by 0.6103 (standard error 0.0213). By comparison, we found that a one-unit change in the mug shot (by which we mean the algorithm’s mug shot–based prediction of the judge detention decision) changes judge detention rates by 0.6963 (standard error 0.0383; see Table III , column (1)). That means if the judges were reacting to the defendant’s face only because the face is a proxy for rearrest risk, the difference in rearrest risk for those with a one-unit difference in the mug shot would need to be $\frac{0.6963}{0.6103} = 1.141$. But when we directly regress rearrest against the algorithm’s mug shot–based detention prediction, we get a coefficient of 0.6127 (standard error 0.0460). Clearly 0.6127 < 1.141; that is, the mug shot does not seem to be strongly enough related to rearrest risk to explain the judge’s use of it in making detention decisions. 42

Of course this leaves us with the second problem with our data: we only have crime data on the released. It is possible the relationship between the mug shot and risk could be very different among the 23.3% of defendants who are detained (which we cannot observe). Put differently, the mug shot–risk relationship among the 76.7% of the defendants who are released is 0.6127; let A be the (unknown) mug shot–risk relationship among the jailed. What we really want to know is the mug shot–risk relationship among all defendants, which equals (0.767 · 0.6127) + (0.233 · A). For this mug shot–risk relationship among all defendants to equal 1.141, A would need to be 2.880, nearly five times as great among the detained defendants as among the released. This would imply an implausibly large effect of the mug shot on rearrest risk relative to the size of the effects on rearrest risk of other defendant characteristics. 43
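The bounding exercise in the two paragraphs above reduces to a few lines of arithmetic on the reported coefficients:

```python
# Coefficients from the text.
b_mugshot  = 0.6963  # effect of the mug shot prediction on detention
b_risk     = 0.6103  # effect of predicted rearrest risk on detention
b_observed = 0.6127  # mug shot -> rearrest, released defendants only

# What the mug shot-risk relationship would need to be if the face were
# purely a proxy for risk:
required = b_mugshot / b_risk    # ≈ 1.141

# Among all defendants: 0.767 * b_observed + 0.233 * A = required.
# Solve for A, the unobserved mug shot-risk relationship among the detained:
share_released, share_detained = 0.767, 0.233
A = (required - share_released * b_observed) / share_detained   # ≈ 2.880
```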

In addition, the results from Section VI.B call into question whether these characteristics are well-understood proxies for risk. As we show there, experts who understand pretrial (public defenders and legal aid society staff) do not recognize the signal about judge decision making that the algorithm has discovered in the mug shot. These considerations as a whole—that measured rearrest is itself biased, the bounding exercise, and the failure of experts to recreate this signal—lead us to tentatively conclude that what the algorithm is finding in the face is unlikely to be merely a well-understood proxy for risk, and more likely reflects errors in the judicial decision-making process. Of course, that presumption is not essential for the rest of the article, which asks: what exactly has the algorithm discovered in the face?

IV.C. Is the Algorithm Discovering Something New?

Previous studies already tell us a number of things about what shapes the decisions of judges and other people. For example, we know people stereotype by gender ( Avitzour et al. 2020 ), age ( Neumark, Burn, and Button 2016 ; Dahl and Knepper 2020 ), and race or ethnicity ( Bertrand and Mullainathan 2004 ; Arnold, Dobbie, and Yang 2018 ; Arnold, Dobbie, and Hull 2020 ; Fryer 2020 ; Hoekstra and Sloan 2022 ; Goncalves and Mello 2021 ). Is the algorithm just rediscovering known determinants of people’s decisions, or discovering something new? We address this in two ways. We first ask how much of the algorithm’s predictions can be explained by already-known features ( Table II ). We then ask how much of the algorithm’s predictive power in explaining actual judges’ decisions is diminished when we control for known factors ( Table III ). We carry out both analyses for three sets of known facial features: (i) demographic characteristics, (ii) psychological features, and (iii) incentivized human guesses. 44

Is the Algorithm Rediscovering Known Facial Features?

Notes. The table presents the results of regressing an algorithmic prediction of judge detention decisions against each of the different explanatory variables as listed in the rows, where each column represents a different regression specification (the specific explanatory variables in each regression are indicated by the filled-in coefficients and standard errors in the table). The algorithm was trained using mug shots from the training data set; the regressions reported here are carried out using data from the validation data set. Data on skin tone, attractiveness, competence, dominance, and trustworthiness comes from asking subjects to assign feature ratings to mug shot images from the Mecklenburg County, NC, Sheriff’s Office public website (see the text). The human guess about the judges’ decision comes from showing workers on the Prolific platform pairs of mug shot images and asking them to report which defendant they believe the judge would be more likely to detain. Regressions follow a linear probability model and also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

Does the Algorithm Predict Judge Behavior after Controlling for Known Factors?

Notes. This table reports the results of estimating a linear probability specification of judges’ detain decisions against different explanatory variables in the validation set described in Table I . Each row represents a different explanatory variable for the regression, while each column reports the results of a separate regression with different combinations of explanatory variables (as indicated by the filled-in coefficients and standard errors in the table). The algorithmic predictions of the judges’ detain decision come from our convolutional neural network algorithm built using the defendants’ face image as the only feature, using data from the training data set. Measures of defendant demographics and current arrest charge come from government administrative data obtained from a combination of Mecklenburg County, NC, and state agencies. Measures of skin tone, attractiveness, competence, dominance, and trustworthiness come from subject ratings of mug shot images (see the text). Human guess variable comes from showing subjects pairs of mug shot images and asking subjects to identify the defendant they think the judge would be more likely to detain. Regression specifications also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

Table II , columns (1)–(3) show the relationship of the algorithm’s predictions to demographics. The predictions vary enormously by gender (men have predicted detention likelihoods 11.9 percentage points higher than women), less so by age, 45 and also by different indicators of race or ethnicity. With skin tone scored on a 0–1 continuum, defendants whom independent raters judge to be at the lightest end of the continuum have predicted detention likelihoods 4.4 percentage points lower than those rated to have the darkest skin tone (column (3)). Conditional on skin tone, Black defendants have a 1.9 percentage point lower predicted likelihood of detention compared with whites. 46

Table II , column (4) shows how the algorithm’s predictions relate to facial features implicated by past psychological studies as shaping people’s judgments of one another. These features also help explain the algorithm’s predictions of judges’ detention decisions: people judged by independent raters to be one standard deviation more attractive, competent, or trustworthy have lower predicted likelihoods of detention of 0.55, 0.91, and 0.48 percentage points, respectively, or 2.2%, 3.6%, and 1.8% of the base rate. 47 Those whom subjects judge to be one standard deviation more dominant-looking have a higher predicted likelihood of detention of 0.37 percentage points (or 1.5%).

How do we know we have controlled for everything relevant from past research? The literature on what shapes human judgments in general is vast; perhaps there are things that are relevant for judges’ decisions specifically that we have inadvertently excluded? One way to solve this problem would be to do a comprehensive scan of past studies of human judgment and decision making, and then decide which results from different non–criminal justice contexts might be relevant for criminal justice. But that itself is a form of human-driven hypothesis generation, bringing us right back to where we started.

To get out of this box, we take a different approach. Instead of enumerating individual characteristics, we ask people to embody their beliefs in a guess, which ought to be the compound of all these characteristics. Then we can ask whether the algorithm has rediscovered this human guess (and later whether it has discovered more). We ask independent subjects to look at pairs of mug shots matched by gender, race, and five-year age bins and forecast which defendant is more likely to be detained by a judge. We provide a financial incentive for accurate guesses to increase the chances that subjects take the exercise seriously. 48 We also provide subjects with an opportunity to learn by showing subjects 50 image pairs with feedback after each pair about which defendant the judge detained. We treat the first 10 image pairs from each subject as learning trials and only use data from the last 40 image pairs. This approach is intended to capture anything that influences judges’ decisions that subjects could recognize, from subtle signs of things like socioeconomic status or drug use or mood, to things people can recognize but not articulate.

It turns out subjects are modestly good at this task ( Table II ). Participants guess which mug shot is more likely to be detained at a rate of 51.4%, which is statistically significantly different from the 50% random-guessing benchmark. When we regress the algorithm’s predicted detention rate against these subject guesses, the coefficient is 3.99 percentage points, equal to 17.1% of the base rate.
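A small accuracy gap like 51.4% versus 50% can still be highly significant with enough guesses. A normal-approximation test illustrates the logic; the guess count `n` below is an assumption for illustration only (the text reports the rate, not the underlying count):

```python
import math

n = 40_000               # assumed total number of incentivized guesses
p_hat, p0 = 0.514, 0.5   # observed accuracy vs. the chance benchmark

# z-statistic for a one-sample proportion test against p0.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-sided p-value via the standard normal CDF, Phi(z) = (1 + erf(z/sqrt(2)))/2.
p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
```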

The findings in Table II are somewhat remarkable. The only input the algorithm had access to was the raw pixel values of each mug shot, yet it has rediscovered findings from decades of previous research and human intuition.

Interestingly, these features collectively explain only a fraction of the variation in the algorithm’s predictions: the R² is only 0.2228. That by itself does not necessarily mean the algorithm has discovered additional useful signal. It is possible that the remaining variation is prediction error—components of the prediction that do not explain actual judges’ decisions.

In Table III , we test whether the algorithm uncovers any additional signal for actual judge decisions, above and beyond the influence of these known factors. The algorithm by itself produces an R² of 0.0331 (column (1)), substantially higher than all previously known features taken together, which produce an R² of 0.0162 (column (5)), or the human guesses alone, which produce an R² of 0.0025 (so we can see the algorithm is much better at predicting detention from faces than people are). Another way to see that the algorithm has detected signal above and beyond these known features is that the coefficient on the algorithm prediction when included alone in the regression, 0.6963 (column (1)), changes only modestly when we condition on everything else, now equal to 0.6171 (column (7)). The algorithm seems to have discovered some novel source of signal that better predicts judge detention decisions. 49

The algorithm has made a discovery: something about the defendant’s face explains judge decisions, above and beyond the facial features implicated by existing research. But what is it about the face that matters? Without an answer, we are left with a discovery of an unsatisfying sort. We have simply replaced one black box hypothesis generation procedure (human creativity) with another (the algorithm). In what follows we demonstrate how existing methods like saliency maps cannot solve this challenge in our application and then discuss our solution to that problem.

V.A. The Challenge of Explanation

The problem of algorithm-human communication stems from the fact that we cannot simply look inside the algorithm’s “black box” and see what it is doing, because m(x), the algorithmic predictor, is so complicated. A common solution in computer science is to forget about looking inside the algorithmic black box and focus instead on drawing inferences from curated outputs of that box. Many of these methods involve gradients: given a prediction function m(x), we can calculate the gradient $\nabla m(x) = \frac{\mathrm{d}m}{\mathrm{d}x}(x)$. This lets us determine, at any input value, what change in the input vector maximally changes the prediction. 50 The idea of gradients is useful for image classification tasks because it allows us to tell which pixel values are most important for changing the predicted outcome.
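The gradient idea can be made concrete with a finite-difference approximation, a simple stand-in for the backpropagated gradients a neural network would supply. The three-input predictor below is a toy, not the paper’s model:

```python
import numpy as np

def numerical_gradient(m, x, eps=1e-6):
    """Central finite-difference approximation of the gradient of a
    scalar predictor m at input vector x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (m(x + step) - m(x - step)) / (2 * eps)
    return grad

# Toy predictor over a 3-"pixel" input: only the first two inputs matter.
m = lambda x: 0.8 * x[0] - 0.5 * x[1]
g = numerical_gradient(m, np.array([0.2, 0.7, 0.1]))
# g ≈ [0.8, -0.5, 0.0]: the gradient flags which inputs move the prediction
```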

For example, a widely used method known as saliency maps uses gradient information to highlight the specific pixels that are most important for predicting the outcome of interest ( Baehrens et al. 2010 ; Simonyan, Vedaldi, and Zisserman 2014 ). This approach works well for many applications like determining whether a given picture contains a given type of animal, a common task in ecology ( Norouzzadeh et al. 2018 ). What distinguishes a cat from a dog? A saliency map for a cat detector might highlight pixels around, say, the cat’s head: what is most cat-like is not the tail, paws, or torso, but the eyes, ears, and whiskers. But more complicated outcomes of the sort social scientists study may depend on complicated functions of the entire image.

Even if saliency maps were more selective in highlighting pixels in applications like ours, for hypothesis generation they also suffer from a second limitation: they do not convey enough information to enable people to articulate interpretable hypotheses. In the cat detector example, a saliency map can tell us that something about the cat’s (say) whiskers is key for distinguishing cats from dogs. But what about that feature matters? Would a cat look more like a dog if its whiskers were longer? Or shorter? More (or less?) even in length? People need to know not just which features matter but how they must change to change the prediction. For hypothesis generation, the saliency map undercommunicates with humans.

To test the ability of saliency maps to help with our application, we focused on a facial feature that people already understand and can easily recognize from a photo: age. We first build an algorithm that predicts each defendant’s age from their mug shot. For a representative image, as in the top left of Figure III , we can highlight which pixels are most important for predicting age, shown in the top right. 51 A key limitation of saliency maps is easy to see: because age (like many human facial features) is a function of almost every part of a person’s face, the saliency map highlights almost everything.

Candidate Algorithm-Human Communication Vehicles for a Known Facial Feature: Age


Panel A shows a randomly selected point in the GAN latent space for a non-Hispanic white male defendant. Panel B shows a saliency map that highlights the pixels that are most important for an algorithmic model that predicts the defendant’s age from the mug shot image. Panel C shows an image changed or “morphed” in the direction of older age, based on the gradient of the image-based age prediction, using the “naive” morphing procedure that does not constrain the new image to lie on the face manifold (see the text). Panel D shows the image morphed to the maximum age using our actual preferred morphing procedure.

An alternative to simply highlighting high-leverage pixels is to change them in the direction of the gradient of the predicted outcome, to—ideally—create a new face that now has a different predicted outcome, what we call “morphing.” This new image answers the counterfactual question: “How would this person’s face change to increase their predicted outcome?” Our approach builds on the ability of people to comprehend ideas through comparisons, so we can show morphed image pairs to subjects to have them name the differences that they see. Figure IV summarizes our semiautomated hypothesis generation pipeline. (For more details see Online Appendix B .) The benefit of morphed images over actual mug shot images is to isolate the differences across faces that matter for the outcome of interest. By reducing noise, morphing also reduces the risk of spurious discoveries.
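The core of the morphing idea is gradient ascent on a latent code: move z so that the predicted outcome of the generated image rises, while the generator keeps the image realistic. The sketch below uses toy stand-ins for the generator G and predictor m (our own illustration, not the paper’s GAN or CNN):

```python
import numpy as np

# Toy stand-ins: G maps a 2-D latent vector to a 4-"pixel" image;
# m predicts a scalar outcome from that image.
W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-0.5, 0.5]])
G = lambda z: np.tanh(W @ z)
m = lambda img: float(np.array([0.9, 0.1, -0.3, 0.2]) @ img)

def morph(z, steps=100, lr=0.05, eps=1e-5):
    """Gradient ascent on the latent code z so that the generated image's
    predicted outcome m(G(z)) increases, staying on G's manifold."""
    z = z.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(z.size):  # finite-difference gradient w.r.t. z
            step = np.zeros_like(z)
            step[i] = eps
            grad[i] = (m(G(z + step)) - m(G(z - step))) / (2 * eps)
        z += lr * grad
    return z

z0 = np.array([0.1, -0.2])
z1 = morph(z0)
# m(G(z1)) > m(G(z0)): the morphed image has a higher predicted outcome
```

Because every step stays in the latent space, every intermediate image is something the generator can produce, which is exactly what the naive pixel-space procedure fails to guarantee.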

Hypothesis Generation Pipeline


The diagram illustrates all the algorithmic components in our procedure by presenting a full pipeline for algorithmic interpretation.

Figure V illustrates how this morphing procedure works in practice and highlights some of the technical challenges that arise. Let the box in the top panel represent the space of all possible images—all possible combinations of pixel values for, say, a 512 × 512 image. Within this space, we can apply our mug shot–based predictor of the known facial feature, age, to identify all images with the same predicted age, as shown by the contour map of the prediction function. Imagine picking some random initial mug shot image. We could follow the gradient to find an image with a higher predicted value of the outcome y .

Morphing Images for Detention Risk On and Off the Face Manifold


The figure shows the difference between an unconstrained (naive) morphing procedure and our preferred new morphing approach. In both panels, the background represents the image space (set of all possible pixel values) and the blue line (color version available online) represents the set of all pixel values that correspond to any face image (the face manifold). The orange lines show all images that have the same predicted outcome (isoquants in predicted outcome). The initial face (point on the outermost contour line) is a randomly selected face in GAN face space. From there we can naively follow the gradients of an algorithm that predicts some outcome of interest from face images. As shown in Panel A, this takes us off the face manifold and yields a nonface image. Alternatively, with a model of the face manifold, we can follow the gradient for the predicted outcome while ensuring that the new image is again a realistic instance as shown in Panel B.

The challenge is that most points in this image space are not actually face images. Simply following the gradient will usually take us off the data distribution of face images, as illustrated abstractly in the top panel of Figure V . What this means in practice is shown in the bottom left panel of Figure III : the result is an image that has a different predicted outcome (in the figure, illustrated for age) but no longer looks like a real instance—that is, no longer looks like a realistic face image. This “naive” morphing procedure will not work without some way to ensure the new point we wind up on in image space corresponds to a realistic face image.

V.B. Building a Model of the Data Distribution

To ensure morphing leads to realistic face images, we need a model of the data distribution p ( x )—in our specific application, the set of images that are faces. We rely on an unsupervised learning approach to this problem. 52 Specifically, we use generative adversarial networks (GANs), originally introduced to generate realistic new images for a variety of tasks (see Goodfellow et al. 2014 ). 53

A GAN is built by training two algorithms that “compete” with one another, the generator G and the classifier C : the generator creates synthetic images, and the classifier (or “discriminator”), presented with synthetic or real images, tries to distinguish which is which. A good discriminator pressures the generator to produce images that are harder to distinguish from real ones; in turn, a good generator pressures the classifier to get better at discriminating real from synthetic images. Data on actual faces are used to train the discriminator, and the generator is trained in the process as it seeks to fool the discriminator. The performance of both C and G improves with successive iterations of training. A perfect G would output images on which the classifier C does no better than random guessing. Such a generator would by definition limit itself to the same input space that defines real images, that is, the data distribution of faces. (Additional discussion of GANs in general and of how we construct our GAN specifically is in Online Appendix B .)
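The adversarial loop just described can be sketched end to end on one-dimensional data. This is a deliberately tiny illustration of the alternating G/C updates, not the paper’s image-scale architecture: real “images” are scalars from N(4, 1), the generator is linear, and the discriminator is logistic, with gradients written out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda s: 1 / (1 + np.exp(-s))

a, b = 1.0, 0.0   # generator G(z) = a*z + b
w, c = 0.0, 0.0   # discriminator C(x) = sigmoid(w*x + c)
lr = 0.02

for _ in range(5000):
    z = rng.normal(0, 1, 64)
    real = rng.normal(4, 1, 64)   # "real" data distribution
    fake = a * z + b

    # Discriminator step: push C(real) toward 1 and C(fake) toward 0.
    p_real, p_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    dw = (-(1 - p_real) * real + p_fake * fake).mean()
    dc = (-(1 - p_real) + p_fake).mean()
    w -= lr * dw
    c -= lr * dc

    # Generator step: push C(fake) toward 1 (fool the discriminator).
    p_fake = sigmoid(w * fake + c)
    da = (-(1 - p_fake) * w * z).mean()
    db = (-(1 - p_fake) * w).mean()
    a -= lr * da
    b -= lr * db

fake_mean = (a * rng.normal(0, 1, 10_000) + b).mean()  # should drift toward 4
```

At equilibrium the generated distribution matches the real one and the discriminator is reduced to guessing, which is the sense in which a perfect G confines itself to the data distribution.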

To build our GAN and evaluate its expressiveness we use standard training metrics, which turn out to compare favorably to what we see with other widely used GAN models on other data sets (see Online Appendix B.C for details). A more qualitative way to judge our GAN comes from visual inspection; some examples of synthetic face images are in Figure II . Most importantly, the GAN we build (as is true of GANs in general) is not generic. GANs are specific. They do not generate “faces” but instead seek to match the distribution of pixel combinations in the training data. For example, our GAN trained using mug shots would never generate generic Facebook profile photos or celebrity headshots.

Figure V illustrates how having a model such as the GAN lets morphing stay on the data distribution of faces and produce realistic images. We pick a random point in the space of faces (mug shots) and then use the algorithmic predictor of the outcome of interest m ( x ) to identify nearby faces that are similar in all respects except those relevant for the outcome. Notice this procedure requires that faces closer to one another in GAN latent space should look relatively more similar to one another to a human in pixel space. Otherwise we might make a small movement along the gradient and wind up with a face that looks different in all sorts of other ways that are irrelevant to the outcome. That is, we need the GAN not just to model the support of the data but also to provide a meaningful distance metric.

When we produce these morphs, what can possibly change as we morph? In principle there is no limit. The changes need not be local: features such as skin color, which involves many pixels, could change. So could features such as attractiveness, where the pixels that need to change to make a face more attractive vary from face to face: the “same” change may make one face more attractive and another less so. Anything represented in the face could change, as could anything else in the image beyond the face that matters for the outcome (if, for example, localities varied in both detention rates and the type of background they have someone stand in front of for mug shots).

In practice, though, there is a limit. What can change depends on how rich and expressive the estimated GAN is. If the GAN fails to capture a certain kind of face or a dimension of the face, then we are unlikely to be able to morph on that dimension. The morphing procedure is only as complete as the GAN is expressive: if the GAN expresses a feature and m(x) truly depends on that feature, morphing will likely display it. But there is no guarantee that in any given application the classifier m(x) will find novel signal for the outcome y, that the GAN successfully learns the data distribution ( Nalisnick et al. 2018 ), or that subjects can detect and articulate whatever signal the classifier algorithm has discovered. Determining the general conditions under which our procedure will work is something we leave to future research. Whether our procedure can work for the specific application of judge decisions is the question to which we turn next. 54

V.C. Validating the Morphing Procedure

We return to our algorithmic prediction of a known facial feature—age—and see what morphing by age produces as a way to validate or test our procedure. When we follow the gradient of the predicted outcome (age), by constraining ourselves to stay on the GAN’s latent space of faces we wind up with a new age-morphed face that does indeed look like a realistic face image, as shown in the bottom right of Figure III . We seem to have successfully developed a model of the data distribution and a way to move around on that surface to create realistic new instances.

To figure out if algorithm-human communication occurs, we run these age-morphed image pairs through our experimental pipeline ( Figure IV ). Our procedure is only useful if it is replicable—that is, if it does not depend on the idiosyncratic insights of any particular person. For that reason, the people looking at these images and articulating what they see should not be us (the investigators) but a sample of external, independent study subjects. In our application, we use Prolific workers (see Online Appendix Table A.III ). Reliability or replicability is indicated by the agreement in the subject responses: lots of subjects see and articulate the same thing in the morphed images.

We asked subjects to look at 50 age-morphed image pairs selected at random from a population of 100 pairs, and told them the images in each pair differ on some hidden dimension but did not tell them what that was. 55 We asked subjects to guess which image expresses that hidden feature more, gave them feedback about the right answer, treated the first 10 image pairs as learning examples, and calculated accuracy on the remaining 40 images. Subjects correctly selected the older image 97.8% of the time.

The final step was to ask subjects to name what differs in image pairs. Making sense of these responses requires some way to group them into semantic categories. Each subject comment could include several concepts (e.g., “wrinkles, gray hair, tired”). We standardized these verbal descriptions by removing punctuation, using only lowercase characters, and removing stop words. We gave three research assistants not otherwise involved in the project these responses and asked them to create their own categories that would capture all the responses (see Online Appendix Figure A.XIII ). We also gave them an illustrative subject comment and highlighted the different “types” of categories (a descriptive physical feature, e.g., “thick eyebrows”; a descriptive impression category, e.g., “energetic”; but also an illustration of a category of comment that is too vague to lend itself to useful measurement, e.g., “ears”). In our validation exercise 81.5% of subject reports fall into the semantic categories of either age or the closely related feature of hair color. 56
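The standardization step described above can be sketched as follows. The stop-word list is an illustrative subset, not the one actually used:

```python
import string

STOP_WORDS = {"a", "an", "and", "the", "or", "is"}  # illustrative subset

def standardize(comment):
    """Lowercase, strip punctuation, and drop stop words, mirroring the
    cleaning applied to subjects' free-text reports."""
    text = comment.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

print(standardize("Wrinkles, GRAY hair, and a tired look."))
# -> ['wrinkles', 'gray', 'hair', 'tired', 'look']
```

The resulting tokens are what the research assistants then grouped into semantic categories.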

V.D. Understanding the Judge Detention Predictor

Having validated our algorithm-human communication procedure for the known facial feature of age, we are ready to apply it to generate a new hypothesis about what drives judge detention decisions. To do this we combine the mug shot algorithm predictor of judges’ detention decisions, m ( x ), with our GAN of the data distribution of mug shot images, then create new synthetic image pairs morphed with respect to the likelihood the judge would detain the defendant (see Figure IV ).

The top panel of Figure VI shows a pair of such images. Underneath we show an “image strip” of intermediate steps, along with each image’s predicted detention rate. With an overall detention rate of 23.3% in our validation data set, morphing takes us from about one-half the base rate (13%) up to nearly twice the base rate (41%). Additional examples of morphed image pairs are shown in Figure VII .

Illustration of Morphed Faces along the Detention Gradient

Panel A shows the result of selecting a random point on the GAN latent face space for a white non-Hispanic male defendant, then using our new morphing procedure to increase the predicted detention risk of the image to 0.41 (left) or reduce the predicted detention risk down to 0.13 (right). The overall average detention rate in the validation data set of actual mug shot images is 0.23 by comparison. Panel B shows the different intermediate images between these two end points, while Panel C shows the predicted detention risk for each of the images in the middle panel.

Examples of Morphing along the Gradients of the Face-Based Detention Predictor

We showed 54 subjects 50 detention-risk-morphed image pairs each, asked them to predict which defendant would be detained, offered them financial incentives for correct answers, 57 and gave them feedback on the right answer. Online Appendix Figure A.XV shows how accurate subjects are as they get more practice across successive morphed image pairs. With the initial image-pair trials, subjects are not much better than random guessing, in the range of what we see when subjects look at pairs of actual mug shots (where accuracy is 51.4% across the final 40 mug shot pairs people see). But unlike what happens when subjects look at actual images, when looking at morphed image pairs subjects seem to quickly learn what the algorithm is trying to communicate to them. Accuracy increased by over 10 percentage points after 20 morphed image pairs and reached 67% after 30 image pairs. Compared to looking at actual mug shots, the morphing procedure accomplished its goal of making it easier for subjects to see what in the face matters most for detention risk.

We asked subjects to articulate the key differences they saw across morphed image pairs. The result seems to be a reliable hypothesis—a facial feature that a sizable share of subjects name. In the top panel of Figure VIII , we present a histogram of individual tokens (cleaned words from worker comments) in “word cloud” form, where word size is approximately proportional to frequency. 58 Some of the most common words are “shaved,” “cleaner,” “length,” “shorter,” “moustache,” and “scruffy.” To form semantic categories, we use a procedure similar to what we describe for our validation exercise for the known feature of age. 59 Grouping tokens into semantic categories, we see that nearly 40% of the subjects see and name a similar feature that they think helps explain judge detention decisions: how well-groomed the defendant is (see the bottom panel of Figure VIII ). 60

Subject Reports of What They See between Detention-Risk-Morphed Image Pairs

Panel A shows a word cloud of subject reports about what they see as the key difference between image pairs where one is a randomly selected point in the GAN latent space and the other is morphed in the direction of a higher predicted detention risk. Words are approximately proportionately sized to the frequency of subject mentions. Panel B shows the frequency of semantic groupings of those open-ended subject reports (see the text for additional details).

Can we confirm that what the subjects think the algorithm is seeing is what the algorithm actually sees? We asked a separate set of 343 independent subjects (MTurk workers) to label the 32,881 mug shots in our combined training and validation data sets for how well-groomed each image was perceived to be on a nine-point scale. 61 For data sets of our size, these labeling costs are fairly modest, but in principle those costs could be much more substantial (or even prohibitive) in some applications.

Table IV suggests algorithm-human communication has successfully occurred: our new hypothesis, call it h 1 ( x ), is correlated with the algorithm’s prediction of the judge, m ( x ). If subjects were mistaken in thinking they saw well-groomed differences across images, there would be no relationship between well-groomed and the detention predictions. Yet what we actually see is that the R² from regressing the algorithm’s predictions against well-groomed equals 0.0247, or 11% of the R² we get from a model with all the explanatory variables (0.2361). In a bivariate regression the coefficient (−0.0172) implies that a one standard deviation increase in well-groomed (1.0118 points on our 9-point scale) is associated with a decline in predicted detention risk of 1.74 percentage points, or 7.5% of the base rate. Another way to see the explanatory power of this hypothesis is to note that this coefficient hardly changes when we add all the other explanatory variables to the regression (equal to −0.0153 in the final column) despite the substantial increase in the model’s R².
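The effect-size arithmetic in this paragraph can be reproduced directly from the reported regression quantities:

```python
# Quantities reported in the text (Table IV, bivariate specification).
coef = -0.0172       # coefficient on well-groomed
sd = 1.0118          # SD of well-groomed on the 9-point scale
base_rate = 0.233    # overall detention rate in the validation set

effect_pp = coef * sd * 100                 # change per 1 SD, in pct. points
share_of_base = abs(coef * sd) / base_rate  # as a share of the base rate
print(round(effect_pp, 2), round(share_of_base * 100, 1))  # -> -1.74 7.5
```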

Correlation between Well-Groomed and the Algorithm’s Prediction

Notes. This table shows the results of estimating a linear probability specification regressing algorithmic predictions of judges’ detain decision against different explanatory variables, using data from the validation set of cases from Mecklenburg County, NC. Each row of the table represents a different explanatory variable for the regression, while each column reports the results of a separate regression with different combinations of explanatory variables (as indicated by the filled-in coefficients and standard errors in the table). Algorithmic predictions of judges’ decisions come from applying an algorithm built with face images in the training data set to validation set observations. Data on well-groomed, skin tone, attractiveness, competence, dominance, and trustworthiness come from subject ratings of mug shot images (see the text). Human guess variable comes from showing subjects pairs of mug shot images and asking subjects to identify the defendant they think the judge would be more likely to detain. Regression specifications also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

V.E. Iteration

Our procedure is iterable. The first novel feature we discovered, well-groomed, explains some—but only some—of the variation in the algorithm’s predictions of the judge. We can iterate our procedure to generate hypotheses about the remaining residual variation as well. Note that the order in which features are discovered will depend on how important each feature is in explaining the judge’s detention decision and on how salient each feature is to the subjects who are viewing the morphed image pairs. So explanatory power for the judge’s decisions need not monotonically decline as we iterate and discover new features.

To isolate the algorithm’s signal above and beyond what is explained by well-groomed, we wish to generate a new set of morphed image pairs that differ in predicted detention but hold well-groomed constant. That would help subjects see other novel features that might differ across the detention-risk-morphed images, without subjects getting distracted by differences in well-groomed. 62 But iterating the procedure raises several technical challenges. To see these challenges, consider what would in principle seem to be the most straightforward way to orthogonalize, in the GAN’s latent face space:

use training data to build predictors of detention risk, m ( x ), and the facial features to orthogonalize against, h 1 ( x );

pick a point on the GAN latent space of faces;

collect the gradients with respect to m ( x ) and h 1 ( x );

use the Gram-Schmidt process to move within the latent space toward higher predicted detention risk m ( x ), but orthogonal to h 1 ( x ); and

show new morphed image pairs to subjects, have them name a new feature.
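The fourth step of this playbook, the Gram-Schmidt projection, can be sketched as follows. The gradients here are toy vectors, not actual model output:

```python
import numpy as np

def orthogonalized_step(grad_m, grad_h):
    """One Gram-Schmidt step: remove from the detention-risk gradient its
    component along the feature gradient, so moving along the result
    changes m(x) while leaving h1(x) locally unchanged."""
    unit_h = grad_h / np.linalg.norm(grad_h)
    return grad_m - (grad_m @ unit_h) * unit_h

grad_m = np.array([1.0, 2.0, 0.0])  # toy gradients in latent space
grad_h = np.array([0.0, 1.0, 0.0])
step = orthogonalized_step(grad_m, grad_h)
print(step, step @ grad_h)  # the step has zero component along grad_h
```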

The challenge with implementing this playbook in practice is that we do not have labels for well-groomed for the GAN-generated synthetic faces. Moreover, it would be infeasible to collect this feature for use in this type of orthogonalization procedure. 63 That means we cannot orthogonalize against well-groomed, only against predictions of well-groomed. And orthogonalizing with respect to a prediction is an error-prone process whenever the predictor is imperfect (as it is here). 64 The errors in the process accumulate as we take many morphing steps. Worse, that accumulated error is not expected to be zero on average. Because we are morphing in the direction of predicted detention and we know predicted detention is correlated with well-groomed, the prediction error will itself be correlated with well-groomed.

Instead we use a different approach. We build a new detention-risk predictor with a curated training data set, limited to pairs of images matched on the features to be orthogonalized against. For each detained observation i (such that y i  = 1), we find a released observation j (such that y j  = 0) where h 1 ( x i ) =  h 1 ( x j ). In that training data set y is now orthogonal to h 1 ( x ), so we can use the gradient of the orthogonalized detention risk predictor to move in GAN latent space and create new morphed images that differ in detention odds but are similar with respect to well-groomed. 65 We call these “orthogonalized morphs,” which we then feed into the experimental pipeline shown in Figure IV . 66 An open question for future work is how many iterations are possible before the dimensionality of the matching problem required for this procedure would create problems.
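The matched-pairs construction can be sketched with toy records; the exact-match-on-rating rule below is an illustrative simplification of the matching procedure:

```python
# Toy records (id, y, h1): y = 1 if detained, h1 = a well-groomed rating.
detained = [(1, 1, 4), (2, 1, 7), (3, 1, 4)]
released = [(4, 0, 7), (5, 0, 4), (6, 0, 2), (7, 0, 4)]

def match_pairs(detained, released):
    """For each detained observation, find a released observation with the
    same h1 rating (matching without replacement), so that in the matched
    training set y is unrelated to h1."""
    pool = list(released)
    pairs = []
    for rec in detained:
        for cand in pool:
            if cand[2] == rec[2]:
                pairs.append((rec, cand))
                pool.remove(cand)
                break
    return pairs

pairs = match_pairs(detained, released)
print(pairs)  # every pair agrees on h1 but differs on y
```

A predictor trained on only the matched observations then has a gradient that is (approximately) uninformative about h1, which is what the orthogonalized morphs exploit.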

Examples from this orthogonalized image-morphing procedure are in Figure IX . Changes in facial features across morphed images are notably different from those in the first iteration of morphs in Figure VI . From these examples, it appears the orthogonalization may be slightly imperfect; some morphed pairs show subtle differences in “well-groomed” and perhaps age. As with the first iteration of the morphing procedure, the second (orthogonalized) iteration again generates images that vary substantially in their predicted risk, from 0.07 up to 0.27 (see Online Appendix Figure A.XVIII ).

Examples of Morphing along the Orthogonal Gradients of the Face-Based Detention Predictor

Still, there is a salient new signal: when presented to subjects they name a second facial feature, as shown in Figure X . We showed 52 subjects (Prolific workers) 50 orthogonalized morphed image pairs and asked them to name the differences they see. The word cloud shown in the top panel of Figure X shows that some of the most common terms reported by subjects include “big,” “wider,” “presence,” “rounded,” “body,” “jaw,” and “head.” When we ask independent research assistants to group the subject tokens into semantic groups, we can see as in the bottom of the figure that a sizable share of subject comments (around 22%) refer to a similar facial feature, h 2 ( x ): how “heavy-faced” or “full-faced” the defendant is.

Subject Reports of What They See between Detention-Risk-Morphed Image Pairs, Orthogonalized to the First Novel Feature Discovered (Well-Groomed)

Panel A shows a word cloud of subject reports about what they see as the key difference between image pairs, where one is a randomly selected point in the GAN latent space and the other is morphed in the direction of a higher predicted detention risk, where we are moving along the detention gradient orthogonal to well-groomed and skin tone (see the text). Panel B shows the frequency of semantic groupings of these open-ended subject reports (see the text for additional details).

This second facial feature (like the first) is again related to the algorithm’s prediction of the judge. When we ask a separate sample of subjects (343 MTurk workers, see Online Appendix Table A.III ) to independently label our validation images for heavy-facedness, regressing the algorithm’s predictions against heavy-faced yields an R² of 0.0384 ( Table V , column (1)). With a coefficient of −0.0182 (0.0009), the results imply that a one standard deviation change in heavy-facedness (1.1946 points on our 9-point scale) is associated with a reduced predicted detention risk of 2.17 percentage points, or 9.3% of the base rate. Adding in other facial features implicated by past research substantially boosts the adjusted R² of the regression but barely changes the coefficient on heavy-facedness.

Correlation between Heavy-Faced and the Algorithm’s Prediction

Notes. This table shows the results of estimating a linear probability specification regressing algorithmic predictions of judges’ detain decision against different explanatory variables, using data from the validation set of cases from Mecklenburg County, NC. Each row of the table represents a different explanatory variable for the regression, while each column reports the results of a separate regression with different combinations of explanatory variables (as indicated by the filled-in coefficients and standard errors in the table). Algorithmic predictions of judges’ decisions come from applying the algorithm built with face images in the training data set to validation set observations. Data on heavy-faced, well-groomed, skin tone, attractiveness, competence, dominance, and trustworthiness come from subject ratings of mug shot images (see the text). Human guess variable comes from showing subjects pairs of mug shot images and asking subjects to identify the defendant they think the judge would be more likely to detain. Regression specifications also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

In principle, the procedure could be iterated further. After all, well-groomed, heavy-faced plus previously known facial features all together still only explain 27% of the variation in the algorithm’s predictions of the judges’ decisions. As long as there is residual variation, the hypothesis generation crank could be turned again and again. Because our goal is not to fully explain judges’ decisions but to illustrate that the procedure works and is iterable, we leave this for future work (ideally done on data from other jurisdictions as well).

Here we consider whether the new hypotheses our procedure has generated meet our final criterion: empirical plausibility. We show that these facial features are new not just to the scientific literature but also apparently to criminal justice practitioners, before turning to whether these correlations might reflect some underlying causal relationship.

VI.A. Do These Hypotheses Predict What Judges Actually Do?

Empirical plausibility need not be implied by the fact that our new facial features are correlated with the algorithm’s predictions of judges’ decisions. The algorithm, after all, is not a perfect predictor. In principle, well-groomed and heavy-faced might be correlated with the part of the algorithm’s prediction that is unrelated to judge behavior, or m ( x ) − y .

In Table VI , we show that our two new hypotheses are indeed empirically plausible. The adjusted R² from regressing judges’ decisions against heavy-faced equals 0.0042 (column (1)), while for well-groomed the figure is 0.0021 (column (2)) and for both together the figure equals 0.0061 (column (3)). As a benchmark, the adjusted R² from all variables (other than the algorithm’s overall mug shot–based prediction) in explaining judges’ decisions equals 0.0218 (column (6)). So the explanatory power of our two novel hypotheses alone equals about 28% of what we get from all the variables together.

Do Well-Groomed and Heavy-Faced Correlate with Judge Decisions?

Notes. This table reports the results of estimating a linear probability specification of judges’ detain decisions against different explanatory variables in the validation set described in Table I . The algorithmic predictions of the judges’ detain decision come from our convolutional neural network algorithm built using the defendants’ face image as the only feature, using data from the training data set. Measures of defendant demographics and current arrest charge come from Mecklenburg County, NC, administrative data. Data on heavy-faced, well-groomed, skin tone, attractiveness, competence, dominance, and trustworthiness come from subject ratings of mug shot images (see the text). Human guess variable comes from showing subjects pairs of mug shot images and asking subjects to identify the defendant they think the judge would be more likely to detain. Regression specifications also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

For a sense of the magnitude of these correlations, the coefficient on heavy-faced of −0.0234 (0.0036) in column (1) and on well-groomed of −0.0198 (0.0043) in column (2) imply that one standard deviation changes in each variable are associated with reduced detention rates equal to 2.8 and 2.0 percentage points, respectively, or 12.0% and 8.9% of the base rate. Interestingly, column (7) shows that heavy-faced remains statistically significant even when we control for the algorithm’s prediction. The discovery procedure led us to a facial feature that, when measured independently, captures signal above and beyond what the algorithm found. 67

VI.B. Do Practitioners Already Know This?

Our procedure has identified two hypotheses that are new to the existing research literature and to our study subjects. Yet the study subjects we have collected data from so far likely have relatively little experience with the criminal justice system. A reader might wonder: do experienced criminal justice practitioners already know that these “new” hypotheses affect judge decisions? The practitioners might have learned the influence of these facial features from day-to-day experience.

To answer this question, we carried out two smaller-scale data collections with a sample of N  = 15 staff at a public defender’s office and a legal aid society. We first asked an open-ended question: on what basis do judges decide to detain versus release defendants pretrial? Practitioners talked about judge misunderstandings of the law, people’s prior criminal records, and judge underappreciation for the social contexts in which criminal records arise. Aside from the defendant’s race, nothing about the appearance of defendants was mentioned.

We showed practitioners pairs of actual mug shots and asked them to guess which person is more likely to be detained by a judge (as we had done with MTurk and Prolific workers). This yields a sample of 360 detention forecasts. After seeing these mug shots practitioners were asked an open-ended question about what they think matters about the defendant’s appearance for judge detention decisions. There were a few mentions of well-groomed and one mention of something related to heavy-faced, but these were far from the most frequently mentioned features, as seen in Online Appendix Figure A.XX .

The practitioner forecasts do indeed seem to be more accurate than those of “regular” study subjects. Table VII , column (5) shows that defendants whom the practitioners predict will be detained are 29.2 percentage points more likely to actually be detained, even after controlling for the other known determinants of detention from past research. This is nearly four times the effect of forecasts made by Prolific workers, as shown in the last column of Table VI . The practitioner guesses (unlike the regular study subjects) are even about as accurate as the algorithm; the R² from the practitioner guess (0.0165 in column (1)) is similar to the R² from the algorithm’s predictions (0.0166 in column (6)).

Results from the Criminal Justice Practitioner Sample

Notes. This table shows the results of estimating judges’ detain decisions using a linear probability specification of different explanatory variables on a subset of the validation set. The criminal justice practitioner’s guess about the judge’s decision comes from showing 15 different public defenders and legal aid society members actual mug shot images of defendants and asking them to report which defendant they believe the judge would be more likely to detain. The pairs are selected to be congruent in gender and race but discordant in detention outcome. The algorithmic predictions of judges’ detain decisions come from applying the algorithm, which is built with face images in the training data set, to validation set observations. Measures of defendant demographics and current arrest charge come from Mecklenburg County, NC, administrative data. Data on heavy-faced, well-groomed, skin tone, attractiveness, competence, dominance, and trustworthiness come from subject ratings of mug shot images (see the text). Regression specifications also include indicators for unknown race and unknown gender. * p < .1; ** p < .05; *** p < .01.

Yet practitioners do not seem to already know what the algorithm has discovered. We can see this in several ways in Table VII . First, the sum of the adjusted R² values from the bivariate regressions of judge decisions against practitioner guesses and judge decisions against the algorithm mug shot–based prediction is not so different from the adjusted R² from including both variables in the same regression (0.0165 + 0.0166 = 0.0331 from columns (1) plus (6), versus 0.0338 in column (7)). We see something similar for the novel features of well-groomed and heavy-faced specifically as well. 68 The practitioners and the algorithm seem to be tapping into largely unrelated signal.

VI.C. Exploring Causality

Are these novel features actually causally related to judge decisions? Fully answering that question is clearly beyond the scope of the present article. But we can present some additional evidence that is at least suggestive.

For starters we can rule out some obvious potential confounders. With the specific hypotheses in hand, identifying the most important concerns with confounding becomes much easier. In our application, well-groomed and heavy-faced could in principle be related to things like (say) the degree to which the defendant has a substance-abuse problem, is struggling with mental health, or their socioeconomic status. But as shown in a series of Online Appendix tables, we find that when we have study subjects independently label the mug shots in our validation data set for these features and then control for them, our novel hypotheses remain correlated with the algorithmic predictions of the judge and actual judge decisions. 69 We might wonder whether heavy-faced is simply a proxy for something that previous mock-trial-type studies suggest might matter for criminal justice decisions, “baby-faced” ( Berry and Zebrowitz-McArthur 1988 ). 70 But when we have subjects rate mug shots for baby-facedness, our heavy-faced measure remains strongly predictive of the algorithm’s predictions and actual judge decisions; see Online Appendix Tables A.XII and A.XVI .

In addition, we carried out a laboratory-style experiment with Prolific workers. We randomly morphed synthetic mug shot images in the direction of either higher or lower well-groomed (or heavy-faced), randomly assigned structured variables (current charge and prior record) to each image, explained to subjects the detention decision judges are asked to make, and then asked them which defendant from each pair they would be more likely to detain if they were the judge. The framework from Mobius and Rosenblat (2006) helps clarify what this lab experiment gets us: appearance might affect how others treat us because others are reacting to something about our own appearance directly, because our appearance affects our own confidence, or because our appearance affects our effectiveness in oral communication. The experiment’s results shut down these latter two mechanisms and isolate the effects of something about appearance per se, recognizing it remains possible that well-groomed and heavy-faced are correlated with some other aspect of appearance. 71

The study subjects recommend for detention those subjects with higher-risk structured variables (like current charge and prior record), which at the very least suggests they are taking the task seriously. Holding these other case characteristics constant, we find that the subjects are more likely to recommend for detention those defendants who are less well-groomed or less heavy-faced (see Online Appendix Table A.XVII ). Qualitatively, these results support the idea that well-groomed and heavy-faced could have a causal effect. It is not clear that the magnitudes in these experiments necessarily have much meaning: the subjects are not actual judges, and the context and structure of choice is very different from real detention decisions. Still, it is worth noting that the magnitudes implied by our results are nontrivial. Changing well-groomed or heavy-faced has the same effect on subject decisions as a movement within the predicted rearrest risk distribution of 4 and 6 percentile points, respectively (see Online Appendix C for details). Of course only an actual field experiment could conclusively determine causality here, but carrying out that type of field experiment might seem more worthwhile to an investigator in light of the lab experiment’s results.

Is this enough empirical support for these hypotheses to justify incurring the costs of causal testing? The empirical basis for these hypotheses would seem to be at least as strong as (or perhaps stronger than) the informal standard currently used to decide whether an idea is promising enough to test, which in our experience comes from some combination of observing the world, brainstorming, and perhaps some exploratory investigator-driven correlational analysis.

What might such causal testing look like? One possibility would follow in the spirit of Goldin and Rouse (2000) and compare detention decisions in settings where the defendant is more versus less visible to the judge to alter the salience of appearance. For example, many jurisdictions have continued to use some version of virtual hearings even after the pandemic. 72 In Chicago the court system has the defendant appear virtually but everyone else is in person, and the court system of its own volition has changed the size of the monitors used to display the defendant to court participants. One could imagine adding some planned variation to screen size or distance or angle to the judge. These video feeds could in principle be randomly selected for AI adjustment to the defendant’s level of well-groomedness or heavy-facedness (this would probably fall into a legal gray area). In the case of well-groomed, one could imagine a field experiment that changed this aspect of the defendant’s actual appearance prior to the court hearing. We are not claiming these are the right designs but intend only to illustrate that with new hypotheses in hand, economists are positioned to deploy the sort of creativity and rigorous testing that have become the hallmark of the field’s efforts at causal inference.

We have presented a new semi-automated procedure for hypothesis generation. We applied this new procedure to a concrete, socially important application: why judges jail some defendants and not others. Our procedure suggests two novel hypotheses: some defendants appear more well-groomed or more heavy-faced than others.

Beyond the specific findings from our illustrative application, our empirical analysis also illustrates a playbook for other applications. Start with a high-dimensional predictor m ( x ) of some behavior of interest. Build an unsupervised model of the data distribution, p ( x ). Then combine the models for m ( x ) and p ( x ) in a morphing procedure to generate new instances that answer the counterfactual question: what would a given instance look like with higher or lower likelihood of the outcome? Show morphed pairs of instances to participants and get them to name what they see as the differences between morphed instances. Get others to independently rate instances for whatever the new hypothesis is; do these labels correlate with both m ( x ) and the behavior of interest, y ? If so, we have a new hypothesis worth causal testing. This playbook is broadly applicable whenever three conditions are met.
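The playbook above can be sketched as a skeleton. Every function here is an illustrative stub standing in for a real model or survey step, not a working implementation:

```python
# Skeleton of the hypothesis-generation playbook; each function is a stub.

def predict_behavior(x):
    """m(x): supervised predictor of the behavior of interest (stub)."""
    return sum(x) / len(x)

def morph(x, direction, step=0.1):
    """Move x toward higher or lower m(x); a real implementation follows
    the gradient while staying on the data distribution p(x) (stub)."""
    return [v + direction * step for v in x]

def name_difference(pair):
    """Human subjects articulate what differs across the pair (stub)."""
    return "well-groomed"

def pipeline(instances):
    hypotheses = set()
    for x in instances:
        lo, hi = morph(x, -1), morph(x, +1)
        assert predict_behavior(hi) > predict_behavior(lo)  # morphs differ in m(x)
        hypotheses.add(name_difference((lo, hi)))
    return hypotheses  # candidates to label independently and test causally

print(pipeline([[0.2, 0.5], [0.7, 0.1]]))
```

The final correlation check (do independent labels of the named feature predict both m ( x ) and y ?) would then run outside this skeleton on real data.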

The first condition is that we have a behavior we can statistically predict. The application we examine here fits because the behavior is clearly defined and measured for many cases. A study of, say, human creativity would be more challenging because it is not clear that it can be measured ( Said-Metwaly, Van den Noortgate, and Kyndt 2017 ). A study of why U.S. presidents use nuclear weapons during wartime would be challenging because there have been so few cases.

The second condition relates to what input data are available to predict behavior. Our procedure is likely to add only modest value in applications where we only have traditional structured variables, because those structured variables already make sense to people. Moreover, the structured variables are usually already hypothesized to affect different behaviors, which is why economists ask about them on surveys. Our procedure will be more helpful with unstructured, high-dimensional data like images, language, and time series. The deeper point is that the collection of such high-dimensional data is often incidental to the scientific enterprise. We have images because the justice system photographs defendants during booking. Schools collect text from students as part of required assignments. Cellphones create location data as part of cell tower “pings.” These high-dimensional data implicitly contain an endless number of “features.”

Such high-dimensional data have already been found to predict outcomes in many economically relevant applications. Student essays predict graduation. Newspaper text predicts political slant of writers and editors. Federal Open Market Committee notes predict asset returns or volatility. X-ray images or EKG results predict doctor diagnoses (or misdiagnoses). Satellite images predict the income or health of a place. Many more relationships like these remain to be explored. From such prediction models, one could readily imagine human inspection of morphs leading to novel features. For example, suppose high-frequency data on volume and stock prices are used to predict future excess returns, say, to understand when the market over- or undervalues a stock. Morphs of these time series might lead us to discover the kinds of price paths that produce overreaction. After all, some investors have even named such patterns (e.g., “head and shoulders,” “double bottom”) and trade on them.

The final condition is the ability to morph the input data to create new cases that differ in the predicted outcome. This requires some unsupervised learning technique to model the data distribution. The good news is that a number of such techniques are now available that work well with different types of high-dimensional data. We happen to use GANs here because they work well with images, but our procedure can accommodate a variety of unsupervised models. For text, for example, we could use Bidirectional Encoder Representations from Transformers (Devlin et al. 2018), and for time series, variational auto-encoders (Kingma and Welling 2013).
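Whatever unsupervised model is used, the morphing step works the same way: search the model's latent space for a point whose decoded instance scores higher (or lower) under m(x), so every intermediate instance stays on the generative model's manifold. A minimal sketch, with a toy decoder standing in for a GAN generator or VAE decoder (all names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: decode() plays the role of a GAN generator or VAE
# decoder mapping latent z to data space; m() is the trained predictor.
d_lat, d_x = 4, 20
W = rng.normal(size=(d_x, d_lat))
decode = lambda z: np.tanh(W @ z)
w_m = rng.normal(size=d_x)
m = lambda x: float(w_m @ x)

def morph_latent(z0, sign, steps=300, lr=0.02, eps=1e-4):
    """Hill-climb m(decode(z)) in latent space via finite-difference
    gradients, treating the generative model as a black box."""
    z = z0.copy()
    for _ in range(steps):
        g = np.zeros_like(z)
        for i in range(len(z)):
            dz = np.zeros_like(z)
            dz[i] = eps
            g[i] = (m(decode(z + dz)) - m(decode(z - dz))) / (2 * eps)
        z += sign * lr * g          # sign=+1 raises m, sign=-1 lowers it
    return z

z0 = rng.normal(size=d_lat)
z_hi, z_lo = morph_latent(z0, +1), morph_latent(z0, -1)
print(m(decode(z_lo)), m(decode(z0)), m(decode(z_hi)))
```

Because nothing here depends on the decoder being a GAN, swapping in a BERT-style text model or a time-series VAE changes only `decode()`, not the morphing logic.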

An open question is the degree to which our experimental pipeline could be changed by new technologies, in particular by recent innovations in generative modeling. For example, several recent models allow people to create new synthetic images from text descriptions, and so could perhaps (eventually) provide alternative approaches to creating counterfactual instances. 73 Similarly, recent generative language models such as GPT-4 appear able to process images, although they have only recently become publicly available. There is inevitably some uncertainty in forecasting what these tools will be able to do, but they seem unlikely to help with the first stage of our procedure’s pipeline, building a predictive model of some behavior of interest: methods like GPT-4 are unlikely to have access to data on judge decisions linked to mug shots. Where GPT-4 could potentially help is in substituting for humans in “naming” the contrasts between the morphed pairs of counterfactual instances. Though speculative, such innovations could allow more of the hypothesis generation procedure to be automated. We leave the exploration of these possibilities to future work.

Finally, it is worth emphasizing that hypothesis generation is not hypothesis testing. Each follows its own logic and requires different methods and approaches; one procedure should not be expected to do both. What is needed to creatively produce new hypotheses differs from what is needed to carefully test a given hypothesis. Testing is about the curation of data, an effort to compare comparable subsets from the universe of all observations. But the carefully controlled experiment’s focus on isolating the role of a single prespecified factor limits the ability to generate new hypotheses. Generation is instead about bringing as much data to bear as possible, since the algorithm can only consider signal within the data available to it. The more diverse the data sources, the more scope for discovery. An algorithm could have discovered that judge decisions are influenced by football losses, as in Eren and Mocan (2018), but only if we had thought to merge court records with massive archives of news stories, such as those assembled by Leskovec, Backstrom, and Kleinberg (2009). For generating ideas, the creativity in experimental design that is useful for testing is replaced with creativity in data assembly and merging.

More generally, we hope to raise interest in the curious asymmetry we began with. Idea generation need not remain such an idiosyncratic or nebulous process. Our framework hopefully illustrates that this process can also be modeled. Our results illustrate that such activity could bear actual empirical fruit. At a minimum, these results will hopefully spur more theoretical and empirical work on hypothesis generation rather than leave this as a largely “prescientific” activity.

This is a revised version of Chicago Booth working paper 22-15 “Algorithmic Behavioral Science: Machine Learning as a Tool for Scientific Discovery.” We gratefully acknowledge support from the Alfred P. Sloan Foundation, Emmanuel Roman, and the Center for Applied Artificial Intelligence at the University of Chicago, and we thank Stephen Billings for generously sharing data. For valuable comments we thank Andrei Shleifer, Larry Katz, and five anonymous referees, as well as Marianne Bertrand, Jesse Bruhn, Steven Durlauf, Joel Ferguson, Emma Harrington, Supreet Kaur, Matteo Magnaricotte, Dev Patel, Betsy Levy Paluck, Roberto Rocha, Evan Rose, Suproteem Sarkar, Josh Schwartzstein, Nick Swanson, Nadav Tadelis, Richard Thaler, Alex Todorov, Jenny Wang, and Heather Yang, plus seminar participants at Bocconi, Brown, Columbia, ETH Zurich, Harvard, the London School of Economics, MIT, Stanford, the University of California Berkeley, the University of Chicago, the University of Pennsylvania, the University of Toronto, the 2022 Behavioral Economics Annual Meetings, and the 2022 NBER Summer Institute. For invaluable assistance with the data and analysis we thank Celia Cook, Logan Crowl, Arshia Elyaderani, and especially Jonas Knecht and James Ross. This research was reviewed by the University of Chicago Social and Behavioral Sciences Institutional Review Board (IRB20-0917) and deemed exempt because the project relies on secondary analysis of public data sources. All opinions and any errors are our own.

The question of hypothesis generation has been a vexing one in philosophy, as it appears to follow a process distinct from deduction and has been sometimes called “abduction” (see Schickore 2018 for an overview). A fascinating economic exploration of this topic can be found in Heckman and Singer (2017) , which outlines a strategy for how economists should proceed in the face of surprising empirical results. Finally, there is a small but growing literature that uses machine learning in science. In the next section we discuss how our approach is similar in some ways and different in others.

See Einav and Levin (2014) , Varian (2014) , Athey (2017) , Mullainathan and Spiess (2017) , Gentzkow, Kelly, and Taddy (2019) , and Adukia et al. (2023) on how these changes can affect economics.

In practice, there are a number of additional nuances, as discussed in Section III.A and Online Appendix A.A .

This is calculated for some of the most commonly used measures of predictive accuracy, area under the curve (AUC) and R 2 , recognizing that different measures could yield somewhat different shares of variation explained. We emphasize the word predictable here: past work has shown that judges are “noisy” and decisions are hard to predict ( Kahneman, Sibony, and Sunstein 2022 ). As a consequence, a predictive model of the judge can do better than the judge themselves ( Kleinberg et al. 2018 ).

In Section IV.B , we examine whether the mug shot’s predictive power can be explained by underlying risk differences. There, we tentatively conclude that the predictive power of the face likely reflects judicial error, but that working assumption is not essential to either our results or the ultimate goal of the article: uncovering hypotheses for later careful testing.

For reviews of the interpretability literature, see Doshi-Velez and Kim (2017) and Marcinkevičs and Vogt (2020) .

See Liu et al. (2019) , Narayanaswamy et al. (2020) , Lang et al. (2021) , and Ghandeharioun et al. (2022) .

For example, if every dog photo in a given training data set had been taken outdoors and every cat photo was taken indoors, the algorithm might learn what animal is in the image based in part on features of the background, which would lead the algorithm to perform poorly in a new data set of more representative images.

For example, for canonical computer science applications like image classification (does this photo contain an image of a dog or of a cat?), predictive accuracy (AUC) can be on the order of 0.99. In contrast, our model of judge decisions using the face only achieves an AUC of 0.625.

Of course even if the hypotheses that are generated are the result of idiosyncratic creativity, this can still be useful. For example, Swanson (1986 , 1988) generated two novel medical hypotheses: the possibility that magnesium affects migraines and that fish oil may alleviate Raynaud’s syndrome.

Conversely, given a data set, our procedure has a built-in advantage: one could imagine a huge number of hypotheses that, while possible, are not especially useful because they are not measurable. Our procedure is by construction guaranteed to generate hypotheses that are measurable in a data set.

For additional discussion, see Ludwig and Mullainathan (2023a) .

For example, isolating the causal effects of gender on labor market outcomes is a daunting task, but the clever test in Goldin and Rouse (2000) overcomes the identification challenges by using variation in screening of orchestra applicants.

See the clever paper by Grogger and Ridgeway (2006) that uses this source of variation to examine this question.

This is related to what Autor (2014) called “Polanyi’s paradox,” the idea that people’s understanding of how the world works is beyond our capacity to explicitly describe it. For discussions in psychology about the difficulty for people to access their own cognition, see Wilson (2004) and Pronin (2009) .

Consider a simple example. Suppose $x = (x_1, \ldots, x_k)$ is a $k$-dimensional binary vector, all possible values of $x$ are equally likely, and the true function in nature relating $x$ to $y$ depends only on the first dimension of $x$, so the function $h_1$ is the only true hypothesis and the only empirically plausible hypothesis. Even with such a simple true hypothesis, people can generate nonplausible hypotheses. Imagine a pair of data points $(x^0, 0)$ and $(x^1, 1)$. Since the data distribution is uniform, $x^0$ and $x^1$ will differ on $k/2$ dimensions in expectation. A person looking at only one pair of observations would have a high chance of generating an empirically implausible hypothesis. Looking at more data, the probability of discovering an implausible hypothesis declines. But the problem remains.
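A quick simulation confirms the footnote's arithmetic: two uniform binary k-vectors differ on k/2 dimensions in expectation, so a randomly chosen differing dimension is rarely the truly relevant one. (The setup below is illustrative, not the paper's code.)

```python
import numpy as np

rng = np.random.default_rng(0)
k, n_pairs = 20, 10000

# Pairs of uniform binary k-vectors; only dimension 0 is truly relevant.
x0 = rng.integers(0, 2, size=(n_pairs, k))
x1 = rng.integers(0, 2, size=(n_pairs, k))
diffs = x0 != x1

mean_hamming = diffs.sum(axis=1).mean()   # expectation is k/2 = 10

# If a person picks one differing dimension at random as their "hypothesis,"
# how often is it the truly relevant dimension 0?
n_diff = diffs.sum(axis=1)
valid = n_diff > 0                        # ignore (rare) identical pairs
p_pick_true = (diffs[valid, 0] / n_diff[valid]).mean()
print(round(float(mean_hamming), 2), round(float(p_pick_true), 3))
```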

Some canonical references include Breiman et al. (1984) , Breiman (2001) , Hastie et al. (2009) , and Jordan and Mitchell (2015) . For discussions about how machine learning connects to economics, see Belloni, Chernozhukov, and Hansen (2014) , Varian (2014) , Mullainathan and Spiess (2017) , Athey (2018) , and Athey and Imbens (2019) .

Of course there is not always a predictive signal in any given data application. But that is equally an issue for human hypothesis generation. At least with machine learning, we have formal procedures for determining whether there is any signal that holds out of sample.

The intuition here is quite straightforward. If two predictor variables are highly correlated, the weight that the algorithm puts on one versus the other can change from one draw of the data to the next depending on the idiosyncratic noise in the training data set, but since the variables are highly correlated, the predicted outcome values themselves (hence predictive accuracy) can be quite stable.
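This instability-of-weights, stability-of-predictions point is easy to demonstrate. In the illustrative simulation below (not from the paper), two nearly collinear predictors share credit differently on every training draw, while the fitted values barely move.

```python
import numpy as np

rng = np.random.default_rng(0)

def draw(n=500):
    """One data draw: x1 and x2 share a common component z that drives y."""
    z = rng.normal(size=n)
    x1 = z + 0.05 * rng.normal(size=n)
    x2 = z + 0.05 * rng.normal(size=n)
    y = z + 0.5 * rng.normal(size=n)
    return np.column_stack([x1, x2]), y

X_test, _ = draw()                   # fixed evaluation points
coefs, preds = [], []
for _ in range(50):                  # refit on fresh training draws
    X, y = draw()
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    coefs.append(b)
    preds.append(X_test @ b)
coefs, preds = np.array(coefs), np.array(preds)

# The weight on x1 swings from draw to draw; the predictions barely move.
print("sd of weight on x1:", coefs[:, 0].std().round(2))
print("sd of predictions:", preds.std(axis=0).mean().round(3))
```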

See Online Appendix Figure A.I , which shows the top nine eigenfaces for the data set we describe below, which together explain 62% of the variation.

Examples of applications of this type include Carleo et al. (2019) , He et al. (2019) , Davies et al. (2021) , Jumper et al. (2021) , and Pion-Tonachini et al. (2021) .

As other examples, researchers have found that retinal images alone can unexpectedly predict gender of patient or macular edema ( Narayanaswamy et al. 2020 ; Korot et al. 2021 ).

Sheetal, Feng, and Savani (2020) use machine learning to determine which of the long list of other survey variables collected as part of the World Values Survey best predict people’s support for unethical behavior. This application sits somewhat in between an investigator-generated hypothesis and the development of an entirely new hypothesis, in the sense that the procedure can only choose candidate hypotheses for unethical behavior from the set of variables the World Values Survey investigators thought to include on their questionnaire.

Closest is Miller et al. (2019) , which morphs EKG output but stops at the point of generating realistic morphs and does not carry this through to generating interpretable hypotheses.

Additional details about how the system works are found in Online Appendix A .

For Black non-Hispanics, the figures for Mecklenburg County versus the United States were 33.3% versus 13.6%. See https://www.census.gov/programs-surveys/sis/resources/data-tools/quickfacts.html .

Details on how we operationalize these variables are found in Online Appendix A .

The mug shot seems to have originated in Paris in the 1800s ( https://law.marquette.edu/facultyblog/2013/10/a-history-of-the-mug-shot/ ). The etymology of the term is unclear, possibly based on “mug” as slang for either the face or an “incompetent person” or “sucker” since only those who get caught are photographed by police ( https://www.etymonline.com/word/mug-shot ).

See https://mecksheriffweb.mecklenburgcountync.gov/ .

We partition the data by arrestee, not arrest, so that each person appears in only one partition, which avoids inadvertent information “leakage” across partitions.

As the Online Appendix  tables show, while there are some changes to a few of the coefficients that relate the algorithm’s predictions to factors known from past research to shape human decisions, the core findings and conclusions about the importance of the defendant’s appearance and the two specific novel facial features we identify are similar.

Using the data on arrests up to July 17, 2019, we randomly reassign arrestees to three groups of similar size to our training, validation, and lock-box hold-out data sets, convert the data to long format (with one row for each arrest-and-variable) and calculate an F -test statistic for the joint null hypothesis that the difference in baseline characteristics are all zero, clustering standard errors by arrestee. We store that F -test statistic, rerun this procedure 1,000 times, and then report the share of splits with an F -statistic larger than the one observed for the original data partition.
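The permutation logic of that balance check can be sketched as follows, simplified to a single baseline characteristic and an unclustered one-way F-statistic (the paper's version stacks many variables and clusters standard errors by arrestee).

```python
import numpy as np

rng = np.random.default_rng(0)

def f_stat(x, groups, k=3):
    """One-way ANOVA F-statistic for differences in group means."""
    grand = x.mean()
    means = np.array([x[groups == g].mean() for g in range(k)])
    ns = np.array([(groups == g).sum() for g in range(k)])
    between = (ns * (means - grand) ** 2).sum() / (k - 1)
    within = sum(((x[groups == g] - means[g]) ** 2).sum()
                 for g in range(k)) / (len(x) - k)
    return between / within

# One baseline characteristic, randomly partitioned into three groups
# (standing in for the training / validation / lock-box split).
n = 3000
x = rng.normal(size=n)
groups = rng.integers(0, 3, size=n)
f_obs = f_stat(x, groups)

# Share of 1,000 random re-partitions with a larger F than the observed one.
f_perm = np.array([f_stat(x, rng.permutation(groups)) for _ in range(1000)])
p_value = (f_perm > f_obs).mean()
print(round(float(p_value), 3))
```

A small reported share would indicate the original partition is unusually imbalanced; a moderate share is what one expects from a genuinely random split.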

For an example HIT task, see Online Appendix Figure A.II .

For age and skin tone, we calculated the average pairwise correlation between two labels sampled (without replacement) from the 10 possibilities, repeated across different random pairs. The Pearson correlation was 0.765 for skin tone, 0.741 for age, and between age assigned labels versus administrative data, 0.789. The maximum correlation between the average of the first k labels collected and the k + 1 label is not all that much higher for k  = 1 than k  = 9 (0.733 versus 0.837).

For an example of the consent form and instructions given to labelers, see Online Appendix Figures A.IV and A.V .

We actually collected at least three and at least five, but the averages turned out to be very close to the minimums, equal to 3.17 and 5.07, respectively.

For example, in Oosterhof and Todorov (2008) , Supplemental Materials Table S2, they report Cronbach’s α values of 0.95 for attractiveness, and 0.93 for both trustworthy and dominant.

See Online Appendix Figure A.VIII , which shows that the change in the correlation between the ( k + 1)th label with the mean of the first k labels declines after three labels.

For an example, see Online Appendix Figure A.IX .

We use the validation data set to estimate $\hat{\beta}$ and then evaluate the accuracy of $m_p(x)$. Although this could lead to overfitting in principle, since we are only estimating a single parameter, this does not matter much in practice; we get very similar results if we randomly partition the validation data set by arrestee, use a random 30% of the validation data set to estimate the weights, then measure predictive performance in the other random 70% of the validation data set.

The mean squared error for a linear probability model’s predictions is related to the Brier score ( Brier 1950 ). For a discussion of how this relates to AUC and calibration, see Murphy (1973) .
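For concreteness, the Brier score is just the mean squared error of probability forecasts against realized 0/1 outcomes:

```python
def brier_score(probs, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# An uninformative forecaster who always says 0.5 scores 0.25;
# perfect forecasts score 0.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
print(brier_score([1.0, 0.0], [1, 0]))                  # 0.0
```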

Note how this comparison helps mitigate the problem that police arrest decisions could depend on a person’s face. When we regress rearrest against the mug shot, that estimated coefficient may be heavily influenced by how police arrest decisions respond to the defendant’s appearance. In contrast when we regress judge detention decisions against predicted rearrest risk, some of the variation across defendants in rearrest risk might come from the effect of the defendant’s appearance on the probability a police officer makes an arrest, but a great deal of the variation in predicted risk presumably comes from people’s behavior.

The average mug shot–predicted detention risk for the bottom and top quartiles equal 0.127 and 0.332; that difference times 2.880 implies a rearrest risk difference of 59.0 percentage points. By way of comparison, the difference in rearrest risk between those who are arrested for a felony crime rather than a less serious misdemeanor crime is equal to just 7.8 percentage points.

In our main exhibits, we impose a simple linear relationship between the algorithm’s predicted detention risk and known facial features like age or psychological variables, for ease of presentation. We show our results are qualitatively similar with less parametric specifications in Online Appendix Tables A.VI, A.VII, and A.VIII .

With a coefficient value of 0.0006 on age (measured in years), the algorithm tells us that even a full decade’s difference in age has only 5% of the impact on detention likelihood that gender has (10 × 0.0006 = 0.6 percentage point higher likelihood of detention, versus 11.9 percentage points).

Online Appendix Table A.V shows that Hispanic ethnicity, which we measure from subject ratings from looking at mug shots, is not statistically significantly related to the algorithm’s predictions. Table II , column (2) showed that conditional on gender, Black defendants have slightly higher predicted detention odds than white defendants (0.3 percentage points), but this is not quite significant ( t  = 1.3). Online Appendix Table A.V , column (1) shows that conditioning on Hispanic ethnicity and having stereotypically Black facial features—as measured in Eberhardt et al. (2006) —increases the size of the Black-white difference in predicted detention odds (now equal to 0.8 percentage points) as well as the difference’s statistical significance ( t  = 2.2).

This comes from multiplying the effect associated with each one-unit change on our 9-point scale, equal to 0.55, 0.91, and 0.48 percentage points, respectively, by the standard deviation of the average label for each psychological feature for each image, equal to 0.923, 0.911, and 0.844, respectively.

As discussed in Online Appendix Table A.III , we offer subjects a $3.00 base rate for participation plus an incentive of 5 cents per correct guess. With 50 image pairs shown to each participant, they could increase their earnings by another $2.50, or up to 83% above the base compensation.

Table III gives us another way to see how much of previously known features are rediscovered by the algorithm. That the algorithm’s prediction plus all previously known features yields an R 2 of just 0.0380 (column (7)), not much larger than with the algorithm alone, suggests the algorithm has discovered most of the signal in these known features. But not necessarily all: these other known features often do remain statistically significant predictors of judges’ decisions even after controlling for the algorithm’s predictions (last column). One possible reason is that, given finite samples, the algorithm has only imperfectly reconstructed factors such as “age” or “human guess.” Controlling for these factors directly adds additional signal.

Imagine a linear prediction function like $m(x_1, x_2) = \widehat{\beta}_1 x_1 + \widehat{\beta}_2 x_2$. If our best estimates suggested $\widehat{\beta}_2 = 0$, the maximum change to the prediction comes from incrementally changing $x_1$.
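The footnote's point in code: with the estimated weight on the second input equal to zero, a unit perturbation of that input leaves the prediction unchanged, so the largest achievable change comes from moving the first input. (The coefficient values below are illustrative, not estimates from the paper.)

```python
import numpy as np

# Hypothetical coefficients for the footnote's m(x1, x2) = b1*x1 + b2*x2.
b = np.array([0.8, 0.0])            # (beta_1_hat, beta_2_hat)
m = lambda x: float(b @ x)

x = np.array([1.0, 1.0])
change_x1 = abs(m(x + np.array([1.0, 0.0])) - m(x))   # 0.8
change_x2 = abs(m(x + np.array([0.0, 1.0])) - m(x))   # 0.0
print(change_x1, change_x2)
```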

As noted already, to avoid contributing to the stereotyping of minorities in discussions of crime, in our exhibits we show images for non-Hispanic white men, although in our HITs we use images representative of the larger defendant population.

Modeling p ( x ) through a supervised learning task would involve assembling a large set of images, having subjects label each image for whether they contain a realistic face, and then predicting those labels using the image pixels as inputs. But this supervised learning approach is costly because it requires extensive annotation of a large training data set.

Kaji, Manresa, and Pouliot (2020) and Athey et al. (2021 , 2022) are recent uses of GANs in economics.

Some ethical issues are worth considering. One is bias. With human hypothesis generation, there is the risk that people “see” an association that impugns some group yet has no basis in fact. In contrast, our procedure by construction only produces empirically plausible hypotheses. A different concern is the vulnerability of deep learning to adversarial examples: tiny, almost imperceptible changes in an image can change its classification for the outcome y , so that mug shots that look almost identical (that is, are very “similar” in some visual image metric) have dramatically different m ( x ). This is a problem because tiny changes to an image do not change the nature of the object; see Szegedy et al. (2013) and Goodfellow, Shlens, and Szegedy (2014) . In practice such instances are quite rare in nature, so rare that they usually occur only when intentionally (maliciously) generated.

Online Appendix Figure A.XII gives an example of this task and the instructions given to participating subjects to complete it. Each subject was tested on 50 image pairs selected at random from a population of 100 images. Subjects were told that for every pair, one image was higher in some unknown feature, but not given details as to what the feature might be. As in the exercise for predicting detention, feedback was given immediately after selecting an image, and a 5 cent bonus was paid for every correct answer.

In principle this semantic grouping could be carried out in other ways, for example, with automated procedures involving natural-language processing.

See Online Appendix Table A.III for a high-level description of this human intelligence task, and Online Appendix Figure A.XIV for a sample of the task and the subject instructions.

We drop every token of just one or two characters in length, connector words without real meaning for this purpose, like “had,” “the,” and “and,” and words that are relevant to our exercise but generic, like “jailed,” “judge,” and “image.”

We enlisted three research assistants blinded to the findings of this study and asked them to come up with semantic categories that captured all subject comments. Since each assistant mapped each subject comment to 5% of semantic categories on average, if the assistant mappings were totally uncorrelated, we would expect to see agreement of at least two assistant categorizations about 5% of the time. What we actually see is that if one research assistant made an association, 60% of the time another assistant would make the same association. We assign a comment to a semantic category when at least two of the assistants agree on the categorization.

Moreover, what subjects see does not seem to be particularly sensitive to which images they see. (As a reminder, each subject sees 50 morphed image pairs randomly selected from a larger bank of 100 morphed image pairs.) Consider a subject who reports seeing “well-groomed” in the morphed image pairs they saw: among other subjects who saw 21 or fewer images in common with them (that is, who saw mostly different images), 31% also report seeing well-groomed, versus 35% in the population. We select the threshold of 21 images because this is the smallest threshold at which at least 50 pairs of raters are considered.

See Online Appendix Table A.III and Online Appendix Figure A.XVI . This comes to a total of 192,280 individual labels, an average of 3.2 labels per image in the training set and an average of 10.8 labels per image in the validation set. Sampling labels from different workers on the same image, these ratings have a correlation of 0.14.

It turns out that skin tone is another feature correlated with well-groomed, so we orthogonalize against it as well. To simplify the discussion, we use “well-groomed” as a stand-in for both features we orthogonalize against, well-groomed plus skin tone.

To see why, consider the mechanics of the procedure. Since we orthogonalize as we create morphs, we would need labels at each morphing step. This would entail producing candidate steps (new morphs), collecting data on each of the candidates, picking one that has the same well-groomed value, and then repeating. Moreover, until the labels are collected at a given step, the next step cannot be taken. Since producing a final morph requires hundreds of such intermediate morphing steps, the whole process would be so time- and resource-consuming as to be infeasible.

While we can predict demographic features like race and age (above/below median age) nearly perfectly, with AUC values close to 1, for predicting well-groomed the mean absolute error of our out-of-sample prediction is 0.63, more than half a step on this 9-point scale. One reason well-groomed is harder to predict is that the labels, which come from human subjects looking at and labeling mug shots, are themselves noisy, which introduces irreducible error.

For additional details see Online Appendix Figure A.XVII and Online Appendix B .

There are a few additional technical steps required, discussed in Online Appendix B . For details on the HIT we use to get subjects to name the new hypothesis from looking at orthogonalized morphs, and the follow-up HIT to generate independent labels for that new hypothesis or facial feature, see Online Appendix Table A.III .

See Online Appendix Figure A.XIX .

The adjusted R 2 of including the practitioner forecasts plus well-groomed and heavy-facedness together (column (3), equal to 0.0246) is not that different from the sum of the R 2 values from including just the practitioner forecasts (0.0165 in column (1)) plus that from including just well-groomed and heavy-faced (equal to 0.0131 in Table VII , column (2)).

In Online Appendix Table A.IX we show that controlling for one obvious indicator of a substance abuse issue—arrest for drugs—does not seem to substantially change the relationship between full-faced or well-groomed and the predicted detention decision. Online Appendix Tables A.X and A.XI show a qualitatively similar pattern of results for the defendant’s mental health and socioeconomic status, which we measure by getting a separate sample of subjects to independently rate validation–data set mug shots. We see qualitatively similar results when the dependent variable is the actual rather than predicted judge decision; see Online Appendix Tables A.XIII, A.XIV, and A.XV .

Characteristics of having a baby face included large eyes, narrow chin, small nose, and high, raised eyebrows. For a discussion of some of the larger literature on how that feature shapes the reactions of other people generally, see Zebrowitz et al. (2009) .

For additional details, see Online Appendix C .

See https://www.nolo.com/covid-19/virtual-criminal-court-appearances-in-the-time-of-the-covid-19.html .

See https://stablediffusionweb.com/ and https://openai.com/product/dall-e-2 .

The data underlying this article are available in the Harvard Dataverse, https://doi.org/10.7910/DVN/ILO46V ( Ludwig and Mullainathan 2023b ).

Adukia, Anjali, Alex Eble, Emileigh Harrison, Hakizumwami Birali Runesha, and Teodora Szasz, “What We Teach about Race and Gender: Representation in Images and Text of Children’s Books,” Quarterly Journal of Economics, 138 (2023), 2225–2285. https://doi.org/10.1093/qje/qjad028

Angelova, Victoria, Will S. Dobbie, and Crystal S. Yang, “Algorithmic Recommendations and Human Discretion,” NBER Working Paper no. 31747, 2023. https://doi.org/10.3386/w31747

Arnold, David, Will S. Dobbie, and Peter Hull, “Measuring Racial Discrimination in Bail Decisions,” NBER Working Paper no. 26999, 2020. https://doi.org/10.3386/w26999

Arnold, David, Will Dobbie, and Crystal S. Yang, “Racial Bias in Bail Decisions,” Quarterly Journal of Economics, 133 (2018), 1885–1932. https://doi.org/10.1093/qje/qjy012

Athey, Susan, “Beyond Prediction: Using Big Data for Policy Problems,” Science, 355 (2017), 483–485. https://doi.org/10.1126/science.aal4321

Athey, Susan, “The Impact of Machine Learning on Economics,” in The Economics of Artificial Intelligence: An Agenda, Ajay Agrawal, Joshua Gans, and Avi Goldfarb, eds. (Chicago: University of Chicago Press, 2018), 507–547.

Athey, Susan, and Guido W. Imbens, “Machine Learning Methods That Economists Should Know About,” Annual Review of Economics, 11 (2019), 685–725. https://doi.org/10.1146/annurev-economics-080217-053433

Athey, Susan, Guido W. Imbens, Jonas Metzger, and Evan Munro, “Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations,” Journal of Econometrics (2021), 105076. https://doi.org/10.1016/j.jeconom.2020.09.013

Athey, Susan, Dean Karlan, Emil Palikot, and Yuan Yuan, “Smiles in Profiles: Improving Fairness and Efficiency Using Estimates of User Preferences in Online Marketplaces,” NBER Working Paper no. 30633, 2022. https://doi.org/10.3386/w30633

Autor, David, “Polanyi’s Paradox and the Shape of Employment Growth,” NBER Working Paper no. 20485, 2014. https://doi.org/10.3386/w20485

Avitzour, Eliana, Adi Choen, Daphna Joel, and Victor Lavy, “On the Origins of Gender-Biased Behavior: The Role of Explicit and Implicit Stereotypes,” NBER Working Paper no. 27818, 2020. https://doi.org/10.3386/w27818

Baehrens, David, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller, “How to Explain Individual Classification Decisions,” Journal of Machine Learning Research, 11 (2010), 1803–1831.

Baltrušaitis, Tadas, Chaitanya Ahuja, and Louis-Philippe Morency, “Multimodal Machine Learning: A Survey and Taxonomy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 41 (2019), 423–443. https://doi.org/10.1109/TPAMI.2018.2798607

Begall, Sabine, Jaroslav Červený, Julia Neef, Oldřich Vojtěch, and Hynek Burda, “Magnetic Alignment in Grazing and Resting Cattle and Deer,” Proceedings of the National Academy of Sciences, 105 (2008), 13451–13455. https://doi.org/10.1073/pnas.0803650105

Belloni, Alexandre, Victor Chernozhukov, and Christian Hansen, “High-Dimensional Methods and Inference on Structural and Treatment Effects,” Journal of Economic Perspectives, 28 (2014), 29–50. https://doi.org/10.1257/jep.28.2.29

Berry, Diane S., and Leslie Zebrowitz-McArthur, “What’s in a Face? Facial Maturity and the Attribution of Legal Responsibility,” Personality and Social Psychology Bulletin, 14 (1988), 23–33. https://doi.org/10.1177/0146167288141003

Bertrand, Marianne, and Sendhil Mullainathan, “Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination,” American Economic Review, 94 (2004), 991–1013. https://doi.org/10.1257/0002828042002561

Bjornstrom, Eileen E. S., Robert L. Kaufman, Ruth D. Peterson, and Michael D. Slater, “Race and Ethnic Representations of Lawbreakers and Victims in Crime News: A National Study of Television Coverage,” Social Problems, 57 (2010), 269–293. https://doi.org/10.1525/sp.2010.57.2.269

Breiman, Leo, “Random Forests,” Machine Learning, 45 (2001), 5–32. https://doi.org/10.1023/A:1010933404324

Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone, Classification and Regression Trees (London: Routledge, 1984). https://doi.org/10.1201/9781315139470

Brier, Glenn W., “Verification of Forecasts Expressed in Terms of Probability,” Monthly Weather Review, 78 (1950), 1–3. https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2

Carleo, Giuseppe, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová, “Machine Learning and the Physical Sciences,” Reviews of Modern Physics, 91 (2019), 045002. https://doi.org/10.1103/RevModPhys.91.045002

Chen, Daniel L., Tobias J. Moskowitz, and Kelly Shue, “Decision Making under the Gambler’s Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires,” Quarterly Journal of Economics, 131 (2016), 1181–1242. https://doi.org/10.1093/qje/qjw017

Chen   Daniel L. , Philippe   Arnaud , “ Clash of Norms: Judicial Leniency on Defendant Birthdays ,” Journal of Economic Behavior & Organization , 211 ( 2023 ), 324 – 344 . https://doi.org/10.1016/j.jebo.2023.05.002

Dahl   Gordon B. , Knepper   Matthew M. , “ Age Discrimination across the Business Cycle ,” NBER Working Paper no. 27581 , 2020 . https://doi.org/10.3386/w27581

Davies   Alex , Veličković   Petar , Buesing   Lars , Blackwell   Sam , Zheng   Daniel , Tomašev   Nenad , Tanburn   Richard , Battaglia   Peter , Blundell   Charles , Juhász   András , Lackenby   Marc , Williamson   Geordie , Hassabis   Demis , Kohli   Pushmeet , “ Advancing Mathematics by Guiding Human Intuition with AI ,” Nature , 600 ( 2021 ), 70 – 74 . https://doi.org/10.1038/s41586-021-04086-x

Devlin   Jacob , Chang   Ming-Wei , Lee   Kenton , Toutanova   Kristina , “ BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding ,” arXiv preprint arXiv:1810.04805 , 2018 . https://doi.org/10.48550/arXiv.1810.04805

Dobbie   Will , Goldin   Jacob , Yang   Crystal S. , “ The Effects of Pretrial Detention on Conviction, Future Crime, and Employment: Evidence from Randomly Assigned Judges ,” American Economic Review , 108 ( 2018 ), 201 – 240 . https://doi.org/10.1257/aer.20161503

Dobbie   Will , Yang   Crystal S. , “ The US Pretrial System: Balancing Individual Rights and Public Interests ,” Journal of Economic Perspectives , 35 ( 2021 ), 49 – 70 . https://doi.org/10.1257/jep.35.4.49

Doshi-Velez   Finale , Kim   Been , “ Towards a Rigorous Science of Interpretable Machine Learning ,” arXiv preprint arXiv:1702.08608 , 2017 . https://doi.org/10.48550/arXiv.1702.08608

Eberhardt   Jennifer L. , Davies   Paul G. , Purdie-Vaughns   Valerie J. , Lynn Johnson   Sheri , “ Looking Deathworthy: Perceived Stereotypicality of Black Defendants Predicts Capital-Sentencing Outcomes ,” Psychological Science , 17 ( 2006 ), 383 – 386 . https://doi.org/10.1111/j.1467-9280.2006.01716.x

Einav   Liran , Levin   Jonathan , “ The Data Revolution and Economic Analysis ,” Innovation Policy and the Economy , 14 ( 2014 ), 1 – 24 . https://doi.org/10.1086/674019

Eren   Ozkan , Mocan   Naci , “ Emotional Judges and Unlucky Juveniles ,” American Economic Journal: Applied Economics , 10 ( 2018 ), 171 – 205 . https://doi.org/10.1257/app.20160390

Frieze   Irene Hanson , Olson   Josephine E. , Russell   June , “ Attractiveness and Income for Men and Women in Management ,” Journal of Applied Social Psychology , 21 ( 1991 ), 1039 – 1057 . https://doi.org/10.1111/j.1559-1816.1991.tb00458.x

Fryer   Roland G., Jr , “ An Empirical Analysis of Racial Differences in Police Use of Force: A Response ,” Journal of Political Economy , 128 ( 2020 ), 4003 – 4008 . https://doi.org/10.1086/710977

Fudenberg   Drew , Liang   Annie , “ Predicting and Understanding Initial Play ,” American Economic Review , 109 ( 2019 ), 4112 – 4141 . https://doi.org/10.1257/aer.20180654

Gentzkow   Matthew , Kelly   Bryan , Taddy   Matt , “ Text as Data ,” Journal of Economic Literature , 57 ( 2019 ), 535 – 574 . https://doi.org/10.1257/jel.20181020

Ghandeharioun   Asma , Kim   Been , Li   Chun-Liang , Jou   Brendan , Eoff   Brian , Picard   Rosalind W. , “ DISSECT: Disentangled Simultaneous Explanations via Concept Traversals ,” arXiv preprint arXiv:2105.15164   2022 . https://doi.org/10.48550/arXiv.2105.15164

Goldin   Claudia , Rouse   Cecilia , “ Orchestrating Impartiality: The Impact of ‘Blind’ Auditions on Female Musicians ,” American Economic Review , 90 ( 2000 ), 715 – 741 . https://doi.org/10.1257/aer.90.4.715

Goncalves   Felipe , Mello   Steven , “ A Few Bad Apples? Racial Bias in Policing ,” American Economic Review , 111 ( 2021 ), 1406 – 1441 . https://doi.org/10.1257/aer.20181607

Goodfellow   Ian , Pouget-Abadie   Jean , Mirza   Mehdi , Xu   Bing , Warde-Farley   David , Ozair   Sherjil , Courville   Aaron , Bengio   Yoshua , “ Generative Adversarial Nets ,” Advances in Neural Information Processing Systems , 27 ( 2014 ), 2672 – 2680 .

Goodfellow   Ian J. , Shlens   Jonathon , Szegedy   Christian , “ Explaining and Harnessing Adversarial Examples ,” arXiv preprint arXiv:1412.6572 , 2014 . https://doi.org/10.48550/arXiv.1412.6572

Grogger   Jeffrey , Ridgeway   Greg , “ Testing for Racial Profiling in Traffic Stops from Behind a Veil of Darkness ,” Journal of the American Statistical Association , 101 ( 2006 ), 878 – 887 . https://doi.org/10.1198/016214506000000168

Hastie   Trevor , Tibshirani   Robert , Friedman   Jerome H. , Friedman   Jerome H. , The Elements of Statistical Learning: Data Mining, Inference, and Prediction , vol. 2 (Berlin: Springer , 2009 ).

He   Siyu , Li   Yin , Feng   Yu , Ho   Shirley , Ravanbakhsh   Siamak , Chen   Wei , Póczos   Barnabás , “ Learning to Predict the Cosmological Structure Formation ,” Proceedings of the National Academy of Sciences , 116 ( 2019 ), 13825 – 13832 . https://doi.org/10.1073/pnas.1821458116

Heckman   James J. , Singer   Burton , “ Abducting Economics ,” American Economic Review , 107 ( 2017 ), 298 – 302 . https://doi.org/10.1257/aer.p20171118

Heyes   Anthony , Saberian   Soodeh , “ Temperature and Decisions: Evidence from 207,000 Court Cases ,” American Economic Journal: Applied Economics , 11 ( 2019 ), 238 – 265 . https://doi.org/10.1257/app.20170223

Hoekstra   Mark , Sloan   CarlyWill , “ Does Race Matter for Police Use of Force? Evidence from 911 Calls ,” American Economic Review , 112 ( 2022 ), 827 – 860 . https://doi.org/10.1257/aer.20201292

Hunter   Margaret , “ The Persistent Problem of Colorism: Skin Tone, Status, and Inequality ,” Sociology Compass , 1 ( 2007 ), 237 – 254 . https://doi.org/10.1111/j.1751-9020.2007.00006.x

Jordan   Michael I. , Mitchell   Tom M. , “ Machine Learning: Trends, Perspectives, and Prospects ,” Science , 349 ( 2015 ), 255 – 260 . https://doi.org/10.1126/science.aaa8415

Jumper   John , Evans   Richard , Pritzel   Alexander , Green   Tim , Figurnov   Michael , Ronneberger   Olaf , Tunyasuvunakool   Kathryn , Bates   Russ , Žídek   Augustin , Potapenko   Anna  et al.  , “ Highly Accurate Protein Structure Prediction with AlphaFold ,” Nature , 596 ( 2021 ), 583 – 589 . https://doi.org/10.1038/s41586-021-03819-2

Jung   Jongbin , Concannon   Connor , Shroff   Ravi , Goel   Sharad , Goldstein   Daniel G. , “ Simple Rules for Complex Decisions ,” SSRN working paper , 2017 . https://doi.org/10.2139/ssrn.2919024

Kahneman   Daniel , Sibony   Olivier , Sunstein   C. R , Noise (London: HarperCollins , 2022 ).

Kaji   Tetsuya , Manresa   Elena , Pouliot   Guillaume , “ An Adversarial Approach to Structural Estimation ,” University of Chicago, Becker Friedman Institute for Economics Working Paper No. 2020-144 , 2020 . https://doi.org/10.2139/ssrn.3706365

Kingma   Diederik P. , Welling   Max , “ Auto-Encoding Variational Bayes ,” arXiv preprint arXiv:1312.6114 , 2013 . https://doi.org/10.48550/arXiv.1312.6114

Kleinberg   Jon , Lakkaraju   Himabindu , Leskovec   Jure , Ludwig   Jens , Mullainathan   Sendhil , “ Human Decisions and Machine Predictions ,” Quarterly Journal of Economics , 133 ( 2018 ), 237 – 293 . https://doi.org/10.1093/qje/qjx032

Korot   Edward , Pontikos   Nikolas , Liu   Xiaoxuan , Wagner   Siegfried K. , Faes   Livia , Huemer   Josef , Balaskas   Konstantinos , Denniston   Alastair K. , Khawaja   Anthony , Keane   Pearse A. , “ Predicting Sex from Retinal Fundus Photographs Using Automated Deep Learning ,” Scientific Reports , 11 ( 2021 ), 10286 . https://doi.org/10.1038/s41598-021-89743-x

Lahat   Dana , Adali   Tülay , Jutten   Christian , “ Multimodal Data Fusion: An Overview of Methods, Challenges, and Prospects ,” Proceedings of the IEEE , 103 ( 2015 ), 1449 – 1477 . https://doi.org/10.1109/JPROC.2015.2460697

Lang   Oran , Gandelsman   Yossi , Yarom   Michal , Wald   Yoav , Elidan   Gal , Hassidim   Avinatan , Freeman   William T , Isola   Phillip , Globerson   Amir , Irani   Michal , et al.  , “ Explaining in Style: Training a GAN to Explain a Classifier in StyleSpace ,” paper presented at the IEEE/CVF International Conference on Computer Vision , 2021. https://doi.org/10.1109/ICCV48922.2021.00073

Leskovec   Jure , Backstrom   Lars , Kleinberg   Jon , “ Meme-Tracking and the Dynamics of the News Cycle ,” paper presented at the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009. https://doi.org/10.1145/1557019.1557077

Little   Anthony C. , Jones   Benedict C. , DeBruine   Lisa M. , “ Facial Attractiveness: Evolutionary Based Research ,” Philosophical Transactions of the Royal Society B: Biological Sciences , 366 ( 2011 ), 1638 – 1659 . https://doi.org/10.1098/rstb.2010.0404

Liu   Shusen , Kailkhura   Bhavya , Loveland   Donald , Han   Yong , “ Generative Counterfactual Introspection for Explainable Deep Learning ,” paper presented at the IEEE Global Conference on Signal and Information Processing (GlobalSIP) , 2019. https://doi.org/10.1109/GlobalSIP45357.2019.8969491

Ludwig   Jens , Mullainathan   Sendhil , “ Machine Learning as a Tool for Hypothesis Generation ,” NBER Working Paper no. 31017 , 2023a . https://doi.org/10.3386/w31017

Ludwig   Jens , Mullainathan   Sendhil , “ Replication Data for: ‘Machine Learning as a Tool for Hypothesis Generation’ ,” ( 2023b ), Harvard Dataverse. https://doi.org/10.7910/DVN/ILO46V .

Marcinkevičs   Ričards , Vogt   Julia E. , “ Interpretability and Explainability: A Machine Learning Zoo Mini-Tour ,” arXiv preprint arXiv:2012.01805 , 2020 . https://doi.org/10.48550/arXiv.2012.01805

Miller   Andrew , Obermeyer   Ziad , Cunningham   John , Mullainathan   Sendhil , “ Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography ,” paper presented at the International Conference on Machine Learning , 2019.

Mobius   Markus M. , Rosenblat   Tanya S. , “ Why Beauty Matters ,” American Economic Review , 96 ( 2006 ), 222 – 235 . https://doi.org/10.1257/000282806776157515

Mobley   R. Keith , An Introduction to Predictive Maintenance (Amsterdam: Elsevier , 2002 ).

Mullainathan   Sendhil , Obermeyer   Ziad , “ Diagnosing Physician Error: A Machine Learning Approach to Low-Value Health Care ,” Quarterly Journal of Economics , 137 ( 2022 ), 679 – 727 . https://doi.org/10.1093/qje/qjab046

Mullainathan   Sendhil , Spiess   Jann , “ Machine Learning: an Applied Econometric Approach ,” Journal of Economic Perspectives , 31 ( 2017 ), 87 – 106 . https://doi.org/10.1257/jep.31.2.87

Murphy   Allan H. , “ A New Vector Partition of the Probability Score ,” Journal of Applied Meteorology and Climatology , 12 ( 1973 ), 595 – 600 . https://doi.org/10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2

Nalisnick   Eric , Matsukawa   Akihiro , Whye Teh   Yee , Gorur   Dilan , Lakshminarayanan   Balaji , “ Do Deep Generative Models Know What They Don’t Know? ,” arXiv preprint arXiv:1810.09136 , 2018 . https://doi.org/10.48550/arXiv.1810.09136

Narayanaswamy   Arunachalam , Venugopalan   Subhashini , Webster   Dale R. , Peng   Lily , Corrado   Greg S. , Ruamviboonsuk   Paisan , Bavishi   Pinal , Brenner   Michael , Nelson   Philip C. , Varadarajan   Avinash V. , “ Scientific Discovery by Generating Counterfactuals Using Image Translation ,” in International Conference on Medical Image Computing and Computer-Assisted Intervention , (Berlin: Springer , 2020), 273 – 283 . https://doi.org/10.1007/978-3-030-59710-8_27

Neumark   David , Burn   Ian , Button   Patrick , “ Experimental Age Discrimination Evidence and the Heckman Critique ,” American Economic Review , 106 ( 2016 ), 303 – 308 . https://doi.org/10.1257/aer.p20161008

Norouzzadeh   Mohammad Sadegh , Nguyen   Anh , Kosmala   Margaret , Swanson   Alexandra , S. Palmer   Meredith , Packer   Craig , Clune   Jeff , “ Automatically Identifying, Counting, and Describing Wild Animals in Camera-Trap Images with Deep Learning ,” Proceedings of the National Academy of Sciences , 115 ( 2018 ), E5716 – E5725 . https://doi.org/10.1073/pnas.1719367115

Oosterhof   Nikolaas N. , Todorov   Alexander , “ The Functional Basis of Face Evaluation ,” Proceedings of the National Academy of Sciences , 105 ( 2008 ), 11087 – 11092 . https://doi.org/10.1073/pnas.0805664105

Peterson   Joshua C. , Bourgin   David D. , Agrawal   Mayank , Reichman   Daniel , Griffiths   Thomas L. , “ Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making ,” Science , 372 ( 2021 ), 1209 – 1214 . https://doi.org/10.1126/science.abe2629

Pierson   Emma , Cutler   David M. , Leskovec   Jure , Mullainathan   Sendhil , Obermeyer   Ziad , “ An Algorithmic Approach to Reducing Unexplained Pain Disparities in Underserved Populations ,” Nature Medicine , 27 ( 2021 ), 136 – 140 . https://doi.org/10.1038/s41591-020-01192-7

Pion-Tonachini   Luca , Bouchard   Kristofer , Garcia Martin   Hector , Peisert   Sean , Bradley Holtz   W. , Aswani   Anil , Dwivedi   Dipankar , Wainwright   Haruko , Pilania   Ghanshyam , Nachman   Benjamin  et al.  “ Learning from Learning Machines: A New Generation of AI Technology to Meet the Needs of Science ,” arXiv preprint arXiv:2111.13786 , 2021 . https://doi.org/10.48550/arXiv.2111.13786

Popper   Karl , The Logic of Scientific Discovery (London: Routledge , 2nd ed. 2002 ). https://doi.org/10.4324/9780203994627

Pronin   Emily , “ The Introspection Illusion ,” Advances in Experimental Social Psychology , 41 ( 2009 ), 1 – 67 . https://doi.org/10.1016/S0065-2601(08)00401-2

Ramachandram   Dhanesh , Taylor   Graham W. , “ Deep Multimodal Learning: A Survey on Recent Advances and Trends ,” IEEE Signal Processing Magazine , 34 ( 2017 ), 96 – 108 . https://doi.org/10.1109/MSP.2017.2738401

Rambachan   Ashesh , “ Identifying Prediction Mistakes in Observational Data ,” Harvard University Working Paper, 2021 . www.nber.org/system/files/chapters/c14777/c14777.pdf

Said-Metwaly   Sameh , Van den Noortgate   Wim , Kyndt   Eva , “ Approaches to Measuring Creativity: A Systematic Literature Review ,” Creativity: Theories–Research-Applications , 4 ( 2017 ), 238 – 275 . https://doi.org/10.1515/ctra-2017-0013

Schickore   Jutta , “ Scientific Discovery ,” in The Stanford Encyclopedia of Philosophy , Edward N. Zalta, ed. (Stanford, CA: Stanford University , 2018).

Schlag   Pierre , “ Law and Phrenology ,” Harvard Law Review , 110 ( 1997 ), 877 – 921 . https://doi.org/10.2307/1342231

Sheetal   Abhishek , Feng   Zhiyu , Savani   Krishna , “ Using Machine Learning to Generate Novel Hypotheses: Increasing Optimism about COVID-19 Makes People Less Willing to Justify Unethical Behaviors ,” Psychological Science , 31 ( 2020 ), 1222 – 1235 . https://doi.org/10.1177/0956797620959594

Simonyan   Karen , Vedaldi   Andrea , Zisserman   Andrew , “ Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps ,” paper presented at the Workshop at International Conference on Learning Representations , 2014.

Sirovich   Lawrence , Kirby   Michael , “ Low-Dimensional Procedure for the Characterization of Human Faces ,” Journal of the Optical Society of America A , 4 ( 1987 ), 519 – 524 . https://doi.org/10.1364/JOSAA.4.000519

Sunstein   Cass R. , “ Governing by Algorithm? No Noise and (Potentially) Less Bias ,” Duke Law Journal , 71 ( 2021 ), 1175 – 1205 . https://doi.org/10.2139/ssrn.3925240

Swanson   Don R. , “ Fish Oil, Raynaud’s Syndrome, and Undiscovered Public Knowledge ,” Perspectives in Biology and Medicine , 30 ( 1986 ), 7 – 18 . https://doi.org/10.1353/pbm.1986.0087

Swanson   Don R. , “ Migraine and Magnesium: Eleven Neglected Connections ,” Perspectives in Biology and Medicine , 31 ( 1988 ), 526 – 557 . https://doi.org/10.1353/pbm.1988.0009

Szegedy   Christian , Zaremba   Wojciech , Sutskever   Ilya , Bruna   Joan , Erhan   Dumitru , Goodfellow   Ian , Fergus   Rob , “ Intriguing Properties of Neural Networks ,” arXiv preprint arXiv:1312.6199 , 2013 . https://doi.org/10.48550/arXiv.1312.6199

Todorov   Alexander , Oh   DongWon , “ The Structure and Perceptual Basis of Social Judgments from Faces. in Advances in Experimental Social Psychology , B. Gawronski, ed. (Amsterdam: Elsevier , 2021 ), 189–245.

Todorov   Alexander , Olivola   Christopher Y. , Dotsch   Ron , Mende-Siedlecki   Peter , “ Social Attributions from Faces: Determinants, Consequences, Accuracy, and Functional Significance ,” Annual Review of Psychology , 66 ( 2015 ), 519 – 545 . https://doi.org/10.1146/annurev-psych-113011-143831

Varian   Hal R. , “ Big Data: New Tricks for Econometrics ,” Journal of Economic Perspectives , 28 ( 2014 ), 3 – 28 . https://doi.org/10.1257/jep.28.2.3

Wilson   Timothy D. , Strangers to Ourselves (Cambridge, MA: Harvard University Press , 2004 ).

Yuhas   Ben P. , Goldstein   Moise H. , Sejnowski   Terrence J. , “ Integration of Acoustic and Visual Speech Signals Using Neural Networks ,” IEEE Communications Magazine , 27 ( 1989 ), 65 – 71 . https://doi.org/10.1109/35.41402

Zebrowitz   Leslie A. , Luevano   Victor X. , Bronstad   Philip M. , Aharon   Itzhak , “ Neural Activation to Babyfaced Men Matches Activation to Babies ,” Social Neuroscience , 4 ( 2009 ), 1 – 10 . https://doi.org/10.1080/17470910701676236


The Writing Center • University of North Carolina at Chapel Hill

Scientific Reports

What this handout is about

This handout provides a general guide to writing reports about scientific research you’ve performed. In addition to describing the conventional rules about the format and content of a lab report, we’ll also attempt to convey why these rules exist, so you’ll get a clearer, more dependable idea of how to approach this writing situation. Readers of this handout may also find our handout on writing in the sciences useful.

Background and pre-writing

Why do we write research reports?

You did an experiment or study for your science class, and now you have to write it up for your teacher to review. You feel that you understood the background sufficiently, designed and completed the study effectively, obtained useful data, and can use those data to draw conclusions about a scientific process or principle. But how exactly do you write all that? What is your teacher expecting to see?

To take some of the guesswork out of answering these questions, try to think beyond the classroom setting. In fact, you and your teacher are both part of a scientific community, and the people who participate in this community tend to share the same values. As long as you understand and respect these values, your writing will likely meet the expectations of your audience—including your teacher.

So why are you writing this research report? The practical answer is “Because the teacher assigned it,” but that’s classroom thinking. Generally speaking, people investigating some scientific hypothesis have a responsibility to the rest of the scientific world to report their findings, particularly if these findings add to or contradict previous ideas. The people reading such reports have two primary goals:

  • They want to gather the information presented.
  • They want to know that the findings are legitimate.

Your job as a writer, then, is to fulfill these two goals.

How do I do that?

Good question. Here is the basic format scientists have designed for research reports:

  • Introduction
  • Methods and Materials
  • Results
  • Discussion

This format, sometimes called “IMRAD,” may take slightly different shapes depending on the discipline or audience; some ask you to include an abstract or separate section for the hypothesis, or call the Discussion section “Conclusions,” or change the order of the sections (some professional and academic journals require the Methods section to appear last). Overall, however, the IMRAD format was devised to represent a textual version of the scientific method.

The scientific method, you’ll probably recall, involves developing a hypothesis, testing it, and deciding whether your findings support the hypothesis. In essence, the format for a research report in the sciences mirrors the scientific method but fleshes out the process a little. Below, you’ll find a table that shows how each written section fits into the scientific method and what additional information it offers the reader.

Thinking of your research report as based on the scientific method, but elaborated in the ways described above, may help you to meet your audience’s expectations successfully. We’re going to proceed by explicitly connecting each section of the lab report to the scientific method, then explaining why and how you need to elaborate that section.

Although this handout takes each section in the order in which it should be presented in the final report, you may for practical reasons decide to compose sections in another order. For example, many writers find that composing their Methods and Results before the other sections helps to clarify their idea of the experiment or study as a whole. You might consider using each assignment to practice different approaches to drafting the report, to find the order that works best for you.

What should I do before drafting the lab report?

The best way to prepare to write the lab report is to make sure that you fully understand everything you need to about the experiment. Obviously, if you don’t quite know what went on during the lab, you’re going to find it difficult to explain the lab satisfactorily to someone else. To make sure you know enough to write the report, complete the following steps:

  • Ask yourself the following questions before the lab: What are we going to do in this lab (that is, what’s the procedure)? Why are we going to do it that way? What are we hoping to learn from this experiment? Why would we benefit from this knowledge?
  • Consult your lab supervisor as you perform the lab. If you don’t know how to answer one of the questions above, for example, your lab supervisor will probably be able to explain it to you (or, at least, help you figure it out).
  • Plan the steps of the experiment carefully with your lab partners. The less you rush, the more likely it is that you’ll perform the experiment correctly and record your findings accurately. Also, take some time to think about the best way to organize the data before you have to start putting numbers down. If you can design a table to account for the data, that will tend to work much better than jotting results down hurriedly on a scrap piece of paper.
  • Record the data carefully so you get them right. You won’t be able to trust your conclusions if you have the wrong data, and your readers will know you messed up if the other three people in your group have “97 degrees” and you have “87.”
  • Consult with your lab partners about everything you do. Lab groups often make one of two mistakes: two people do all the work while two have a nice chat, or everybody works together until the group finishes gathering the raw data, then scrams outta there. Collaborate with your partners, even when the experiment is “over.” What trends did you observe? Was the hypothesis supported? Did you all get the same results? What kind of figure should you use to represent your findings? The whole group can work together to answer these questions.
  • Consider your audience. You may believe that audience is a non-issue: it’s your lab TA, right? Well, yes—but again, think beyond the classroom. If you write with only your lab instructor in mind, you may omit material that is crucial to a complete understanding of your experiment, because you assume the instructor knows all that stuff already. As a result, you may receive a lower grade, since your TA won’t be sure that you understand all the principles at work. Try to write towards a student in the same course but a different lab section. That student will have a fair degree of scientific expertise but won’t know much about your experiment particularly. Alternatively, you could envision yourself five years from now, after the reading and lectures for this course have faded a bit. What would you remember, and what would you need explained more clearly (as a refresher)?
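The advice above about designing a table before you start recording can be sketched in code. This is only an illustration: the column names and trial values below are hypothetical, invented to show the idea of deciding your fields first and then recording each trial as a row rather than on scrap paper.

```python
# Sketch: plan the data table (the columns) before the lab starts,
# then record each trial as one row. All names and values here are
# invented for illustration.
import csv
import io

fields = ["trial", "partner", "temperature_c", "time_to_dissolve_s"]

# Each completed trial becomes one row with every field filled in.
trials = [
    {"trial": 1, "partner": "A", "temperature_c": 25, "time_to_dissolve_s": 94},
    {"trial": 2, "partner": "B", "temperature_c": 25, "time_to_dissolve_s": 97},
]

# Writing to an in-memory buffer here; in the lab this would be a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(trials)
print(buffer.getvalue())
```

Agreeing on the fields in advance also makes the cross-checking described above easy: if your partner's row says 97 and yours says 87, the discrepancy is visible in one glance down a column.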

Once you’ve completed these steps as you perform the experiment, you’ll be in a good position to draft an effective lab report.

Introductions

How do I write a strong introduction?

For the purposes of this handout, we’ll consider the Introduction to contain four basic elements: the purpose, the scientific literature relevant to the subject, the hypothesis, and the reasons you believed your hypothesis viable. Let’s start by going through each element of the Introduction to clarify what it covers and why it’s important. Then we can formulate a logical organizational strategy for the section.

The inclusion of the purpose (sometimes called the objective) of the experiment often confuses writers. The biggest misconception is that the purpose is the same as the hypothesis. Not quite. We’ll get to hypotheses in a minute, but basically they provide some indication of what you expect the experiment to show. The purpose is broader, and deals more with what you expect to gain through the experiment. In a professional setting, the hypothesis might have something to do with how cells react to a certain kind of genetic manipulation, but the purpose of the experiment is to learn more about potential cancer treatments. Undergraduate reports don’t often have this wide-ranging a goal, but you should still try to maintain the distinction between your hypothesis and your purpose. In a solubility experiment, for example, your hypothesis might talk about the relationship between temperature and the rate of solubility, but the purpose is probably to learn more about some specific scientific principle underlying the process of solubility.

For starters, most people say that you should write out your working hypothesis before you perform the experiment or study. Many beginning science students neglect to do so and find themselves struggling to remember precisely which variables were involved in the process or in what way the researchers felt that they were related. Write your hypothesis down as you develop it—you’ll be glad you did.

As for the form a hypothesis should take, it’s best not to be too fancy or complicated; an inventive style isn’t nearly so important as clarity here. There’s nothing wrong with beginning your hypothesis with the phrase, “It was hypothesized that . . .” Be as specific as you can about the relationship between the different objects of your study. In other words, explain that when term A changes, term B changes in this particular way. Readers of scientific writing are rarely content with the idea that a relationship between two terms exists—they want to know what that relationship entails.

Not a hypothesis:

“It was hypothesized that there is a significant relationship between the temperature of a solvent and the rate at which a solute dissolves.”

Hypothesis:

“It was hypothesized that as the temperature of a solvent increases, the rate at which a solute will dissolve in that solvent increases.”

Put more technically, most hypotheses contain both an independent and a dependent variable. The independent variable is what you manipulate to test the reaction; the dependent variable is what changes as a result of your manipulation. In the example above, the independent variable is the temperature of the solvent, and the dependent variable is the rate of solubility. Be sure that your hypothesis includes both variables.
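As a rough illustration of that distinction, here is a minimal Python sketch using invented measurements: the temperatures are values the experimenter sets (the independent variable), the dissolution rates are what gets measured (the dependent variable), and the final check asks whether the data run in the direction the hypothesis predicts.

```python
# Illustrative sketch with invented numbers -- not real measurements.

# Independent variable: solvent temperature (deg C), set by the experimenter.
temperature_c = [10, 25, 40, 60, 80]

# Dependent variable: observed dissolution rate (g/min), measured per trial.
dissolution_rate = [0.8, 1.4, 2.1, 3.0, 4.2]

# The hypothesis predicts a specific direction: as temperature increases,
# the rate increases. With trials ordered by temperature, that means each
# measured rate should exceed the one before it.
consistent = all(
    earlier < later
    for earlier, later in zip(dissolution_rate, dissolution_rate[1:])
)
print(consistent)
```

Note how the code forces you to be specific in exactly the way the handout recommends: a vague claim that the variables are "related" cannot be checked, but "B increases as A increases" can.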

Justify your hypothesis

You need to do more than tell your readers what your hypothesis is; you also need to assure them that this hypothesis was reasonable, given the circumstances. In other words, use the Introduction to explain that you didn’t just pluck your hypothesis out of thin air. (If you did pluck it out of thin air, your problems with your report will probably extend beyond using the appropriate format.) If you posit that a particular relationship exists between the independent and the dependent variable, what led you to believe your “guess” might be supported by evidence?

Scientists often refer to this type of justification as “motivating” the hypothesis, in the sense that something propelled them to make that prediction. Often, motivation includes what we already know—or rather, what scientists generally accept as true (see “Background/previous research” below). But you can also motivate your hypothesis by relying on logic or on your own observations. If you’re trying to decide which solutes will dissolve more rapidly in a solvent at increased temperatures, you might remember that some solids are meant to dissolve in hot water (e.g., bouillon cubes) and some are used for a function precisely because they withstand higher temperatures (they make saucepans out of something). Or you can think about whether you’ve noticed sugar dissolving more rapidly in your glass of iced tea or in your cup of coffee. Even such basic, outside-the-lab observations can help you justify your hypothesis as reasonable.

Background/previous research

This part of the Introduction demonstrates to the reader your awareness of how you’re building on other scientists’ work. If you think of the scientific community as engaging in a series of conversations about various topics, then you’ll recognize that the relevant background material will alert the reader to which conversation you want to enter.

Generally speaking, authors writing journal articles use the background for slightly different purposes than do students completing assignments. Because readers of academic journals tend to be professionals in the field, authors explain the background in order to permit readers to evaluate the study’s pertinence for their own work. You, on the other hand, write toward a much narrower audience—your peers in the course or your lab instructor—and so you must demonstrate that you understand the context for the (presumably assigned) experiment or study you’ve completed. For example, if your professor has been talking about polarity during lectures, and you’re doing a solubility experiment, you might try to connect the polarity of a solid to its relative solubility in certain solvents. In any event, both professional researchers and undergraduates need to connect the background material overtly to their own work.

Organization of this section

Most of the time, writers begin by stating the purpose or objectives of their own work, which establishes for the reader’s benefit the “nature and scope of the problem investigated” (Day 1994). Once you have expressed your purpose, you should then find it easier to move from the general purpose, to relevant material on the subject, to your hypothesis. In abbreviated form, an Introduction section might look like this:

“The purpose of the experiment was to test conventional ideas about solubility in the laboratory [purpose] . . . According to Whitecoat and Labrat (1999), at higher temperatures the molecules of solvents move more quickly . . . We know from the class lecture that molecules moving at higher rates of speed collide with one another more often and thus break down more easily [background material/motivation] . . . Thus, it was hypothesized that as the temperature of a solvent increases, the rate at which a solute will dissolve in that solvent increases [hypothesis].”

Again—these are guidelines, not commandments. Some writers and readers prefer different structures for the Introduction. The one above merely illustrates a common approach to organizing material.

How do I write a strong Materials and Methods section?

As with any piece of writing, your Methods section will succeed only if it fulfills its readers’ expectations, so you need to be clear in your own mind about the purpose of this section. Let’s review the purpose as we described it above: in this section, you want to describe in detail how you tested the hypothesis you developed and also to clarify the rationale for your procedure. In science, it’s not sufficient merely to design and carry out an experiment. Ultimately, others must be able to verify your findings, so your experiment must be reproducible, to the extent that other researchers can follow the same procedure and obtain the same (or similar) results.

Here’s a real-world example of the importance of reproducibility. In 1989, electrochemists Stanley Pons and Martin Fleischmann announced that they had discovered “cold fusion,” a way of producing excess heat and power without the nuclear radiation that accompanies “hot fusion.” Such a discovery could have great ramifications for the industrial production of energy, so these findings created a great deal of interest. When other scientists tried to duplicate the experiment, however, they didn’t achieve the same results, and as a result many wrote off the conclusions as unjustified (or worse, a hoax). To this day, the viability of cold fusion is debated within the scientific community. So when you write your Methods section, keep in mind that you need to describe your experiment well enough to allow others to replicate it exactly.

With these goals in mind, let’s consider how to write an effective Methods section in terms of content, structure, and style.

Sometimes the hardest thing about writing this section isn’t what you should talk about, but what you shouldn’t talk about. Writers often want to include the results of their experiment, because they measured and recorded the results during the course of the experiment. But such data should be reserved for the Results section. In the Methods section, you can write that you recorded the results, or how you recorded the results (e.g., in a table), but you shouldn’t write what the results were—not yet. Here, you’re merely stating exactly how you went about testing your hypothesis. As you draft your Methods section, ask yourself the following questions:

  • How much detail? Be precise in providing details, but stay relevant. Ask yourself, “Would it make any difference if this piece were a different size or made from a different material?” If not, you probably don’t need to get too specific. If so, you should give as many details as necessary to prevent this experiment from going awry if someone else tries to carry it out. Probably the most crucial detail is measurement; you should always quantify anything you can, such as time elapsed, temperature, mass, volume, etc.
  • Rationale: Be sure that as you’re relating your actions during the experiment, you explain your rationale for the protocol you developed. If you capped a test tube immediately after adding a solute to a solvent, why did you do that? (That’s really two questions: why did you cap it, and why did you cap it immediately?) In a professional setting, writers provide their rationale as a way to explain their thinking to potential critics. On one hand, of course, that’s your motivation for talking about protocol, too. On the other hand, since in practical terms you’re also writing to your teacher (who’s seeking to evaluate how well you comprehend the principles of the experiment), explaining the rationale indicates that you understand the reasons for conducting the experiment in that way, and that you’re not just following orders. Critical thinking is crucial—robots don’t make good scientists.
  • Control: Most experiments will include a control, which is a means of comparing experimental results. (Sometimes you’ll need to have more than one control, depending on the number of hypotheses you want to test.) The control is exactly the same as the other items you’re testing, except that you don’t manipulate the independent variable, the condition you’re altering to check the effect on the dependent variable. For example, if you’re testing solubility rates at increased temperatures, your control would be a solution that you didn’t heat at all; that way, you’ll see how quickly the solute dissolves “naturally” (i.e., without manipulation), and you’ll have a point of reference against which to compare the solutions you did heat.

Describe the control in the Methods section. Two things are especially important in writing about the control: identify the control as a control, and explain what you’re controlling for. Here is an example:

“As a control for the temperature change, we placed the same amount of solute in the same amount of solvent, and let the solution stand for five minutes without heating it.”

Structure and style

Organization is especially important in the Methods section of a lab report because readers must understand your experimental procedure completely. Many writers are surprised by the difficulty of conveying what they did during the experiment, since after all they’re only reporting an event, but it’s often tricky to present this information in a coherent way. There’s a fairly standard structure you can use to guide you, and following the conventions for style can help clarify your points.

  • Subsections: Occasionally, researchers use subsections to report their procedure when the following circumstances apply: 1) if they’ve used a great many materials; 2) if the procedure is unusually complicated; 3) if they’ve developed a procedure that won’t be familiar to many of their readers. Because these conditions rarely apply to the experiments you’ll perform in class, most undergraduate lab reports won’t require you to use subsections. In fact, many guides to writing lab reports suggest that you try to limit your Methods section to a single paragraph.
  • Narrative structure: Think of this section as telling a story about a group of people and the experiment they performed. Describe what you did in the order in which you did it. You may have heard the old joke centered on the line, “Disconnect the red wire, but only after disconnecting the green wire,” where the person reading the directions blows everything to kingdom come because the directions weren’t in order. We’re used to reading about events chronologically, and so your readers will generally understand what you did if you present that information in the same way. Also, since the Methods section does generally appear as a narrative (story), you want to avoid the “recipe” approach: “First, take a clean, dry 100 ml test tube from the rack. Next, add 50 ml of distilled water.” You should be reporting what did happen, not telling the reader how to perform the experiment: “50 ml of distilled water was poured into a clean, dry 100 ml test tube.” Hint: most of the time, the recipe approach comes from copying down the steps of the procedure from your lab manual, so you may want to draft the Methods section initially without consulting your manual. Later, of course, you can go back and fill in any part of the procedure you inadvertently overlooked.
  • Past tense: Remember that you’re describing what happened, so you should use past tense to refer to everything you did during the experiment. Writers are often tempted to use the imperative (“Add 5 g of the solid to the solution”) because that’s how their lab manuals are worded; less frequently, they use present tense (“5 g of the solid are added to the solution”). Instead, remember that you’re talking about an event which happened at a particular time in the past, and which has already ended by the time you start writing, so simple past tense will be appropriate in this section (“5 g of the solid were added to the solution” or “We added 5 g of the solid to the solution”).
  • Active: We heated the solution to 80°C. (The subject, “we,” performs the action, heating.)
  • Passive: The solution was heated to 80°C. (The subject, “solution,” doesn’t do the heating; it is acted upon, not acting.)

Increasingly, especially in the social sciences, using first person and active voice is acceptable in scientific reports. Most readers find that this style of writing conveys information more clearly and concisely. This rhetorical choice thus brings two scientific values into conflict: objectivity versus clarity. Since the scientific community hasn’t reached a consensus about which style it prefers, you may want to ask your lab instructor.

How do I write a strong Results section?

Here’s a paradox for you. The Results section is often both the shortest (yay!) and most important (uh-oh!) part of your report. Your Materials and Methods section shows how you obtained the results, and your Discussion section explores the significance of the results, so clearly the Results section forms the backbone of the lab report. This section provides the most critical information about your experiment: the data that allow you to discuss how your hypothesis was or wasn’t supported. But it doesn’t provide anything else, which explains why this section is generally shorter than the others.

Before you write this section, look at all the data you collected to figure out what relates significantly to your hypothesis. You’ll want to highlight this material in your Results section. Resist the urge to include every bit of data you collected, since perhaps not all are relevant. Also, don’t try to draw conclusions about the results—save them for the Discussion section. In this section, you’re reporting facts. Nothing your readers can dispute should appear in the Results section.

Most Results sections feature three distinct parts: text, tables, and figures. Let’s consider each part one at a time.

This should be a short paragraph, generally just a few lines, that describes the results you obtained from your experiment. In a relatively simple experiment, one that doesn’t produce a lot of data for you to report, the text can represent the entire Results section. Don’t feel that you need to include lots of extraneous detail to compensate for a short (but effective) text; your readers appreciate discrimination more than your ability to recite facts. In a more complex experiment, you may want to use tables and/or figures to help guide your readers toward the most important information you gathered. In that event, you’ll need to refer to each table or figure directly, where appropriate:

“Table 1 lists the rates of solubility for each substance.”

“Solubility increased as the temperature of the solution increased (see Figure 1).”

If you do use tables or figures, make sure that you don’t present the same material in both the text and the tables/figures, since in essence you’ll just repeat yourself, probably annoying your readers with the redundancy of your statements.

Feel free to describe trends that emerge as you examine the data. Although identifying trends requires some judgment on your part and so may not feel like factual reporting, no one can deny that these trends do exist, and so they properly belong in the Results section. Example:

“Heating the solution increased the rate of solubility of polar solids by 45% but had no effect on the rate of solubility in solutions containing non-polar solids.”

This point isn’t debatable—you’re just pointing out what the data show.

As in the Materials and Methods section, you want to refer to your data in the past tense, because the events you recorded have already occurred and have finished occurring. In the example above, note the use of “increased” and “had,” rather than “increases” and “has.” (You don’t know from your experiment that heating always increases the solubility of polar solids, but it did that time.)

You shouldn’t put information in the table that also appears in the text. You also shouldn’t use a table to present irrelevant data, just to show you did collect these data during the experiment. Tables are good for some purposes and situations, but not others, so whether and how you’ll use tables depends upon what you need them to accomplish.

Tables are useful ways to show variation in data, but not to present a great deal of unchanging measurements. If you’re dealing with a scientific phenomenon that occurs only within a certain range of temperatures, for example, you don’t need to use a table to show that the phenomenon didn’t occur at any of the other temperatures. How useful is this table?

[Table: “Effect of Temperature on Rate of Solubility,” listing solvent temperatures in 10°C increments from −20°C to 80°C, with no corresponding rate of solubility recorded until 50°C.]

As you can probably see, no solubility was observed until the trial temperature reached 50°C, a fact that the text part of the Results section could easily convey. The table could then be limited to what happened at 50°C and higher, thus better illustrating the differences in solubility rates when solubility did occur.

As a rule, try not to use a table to describe any experimental event you can cover in one sentence of text. Here’s an example of an unnecessary table from How to Write and Publish a Scientific Paper, by Robert A. Day:

[Table: “Oxygen requirements of various species of Streptomyces,” listing organisms with a plus or minus symbol in each of two columns indicating growth under aerobic and anaerobic conditions.]

As Day notes, all the information in this table can be summarized in one sentence: “S. griseus, S. coelicolor, S. everycolor, and S. rainbowenski grew under aerobic conditions, whereas S. nocolor and S. greenicus required anaerobic conditions.” Most readers won’t find the table clearer than that one sentence.

When you do have reason to tabulate material, pay attention to the clarity and readability of the format you use. Here are a few tips:

  • Number your table. Then, when you refer to the table in the text, use that number to tell your readers which table they can review to clarify the material.
  • Give your table a title. This title should be descriptive enough to communicate the contents of the table, but not so long that it becomes difficult to follow. The titles in the sample tables above are acceptable.
  • Arrange your table so that readers read vertically, not horizontally. For the most part, this rule means that you should construct your table so that like elements read down, not across. Think about what you want your readers to compare, and put that information in the column (up and down) rather than in the row (across). Usually, the point of comparison will be the numerical data you collect, so especially make sure you have columns of numbers, not rows. Here’s an example of how drastically this decision affects the readability of your table (from A Short Guide to Writing about Chemistry, by Herbert Beall and John Trimbur). Look at this table, which presents the relevant data in horizontal rows:

[Table: “Boyle’s Law Experiment: Measuring Volume as a Function of Pressure,” with trial number, length of air sample (mm), and height difference (in. Hg) each presented in horizontal rows.]

It’s a little tough to see the trends that the author presumably wants to present in this table. Compare this table, in which the data appear vertically:

[Table: the same “Boyle’s Law Experiment” data, with trial number, length of air sample (mm), and height difference (in. Hg) each presented in vertical columns.]

The second table shows how putting like elements in a vertical column makes for easier reading. In this case, the like elements are the measurements of length and height over five trials, not, as in the first table, the length and height measurements for each trial.

  • Make sure to include units of measurement in the tables. Readers might be able to guess that you measured something in millimeters, but don’t make them try.
  • Don’t use vertical lines as part of the format for your table. This convention exists because vertical lines make tables more expensive for journals to reproduce. Even though it’s fairly unlikely that you’ll be sending your Biology 11 lab report to Science for publication, your readers still have this expectation. Consequently, if you use the table-drawing option in your word-processing software, choose the option that doesn’t rely on a “grid” format (which includes vertical lines).
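The tips above can be sketched in a few lines of Python. The numbers here are invented for illustration (the real Boyle’s Law measurements are in the source table); the point is the layout: a numbered, descriptive title, units in the column headers, like measurements reading down columns, and no vertical grid lines.

```python
# Minimal sketch of the table tips above, using invented data.
# Like elements read DOWN the columns; headers carry the units.

headers = ["Trial", "Length of air sample (mm)", "Height difference (in. Hg)"]
rows = [
    (1, 48.2, 0.0),   # invented example values
    (2, 44.0, 3.8),
    (3, 40.5, 7.6),
]

title = "Table 1. Boyle's Law Experiment: Volume as a Function of Pressure"
print(title)
print("  ".join(f"{h:<28}" for h in headers).rstrip())
for trial, length, height in rows:
    print(f"{trial:<28}  {length:<28}  {height:<28}".rstrip())
```

Whatever tool you use to build the table, the same checklist applies: number, title, units, vertical comparison, no grid.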

How do I include figures in my report?

Although tables can be useful ways of showing trends in the results you obtained, figures (i.e., illustrations) can do an even better job of emphasizing such trends. Lab report writers often use graphic representations of the data they collected to provide their readers with a literal picture of how the experiment went.

When should you use a figure?

Remember the circumstances under which you don’t need a table: when you don’t have a great deal of data or when the data you have don’t vary a lot. Under the same conditions, you would probably forgo the figure as well, since the figure would be unlikely to provide your readers with an additional perspective. Scientists really don’t like their time wasted, so they tend not to respond favorably to redundancy.

If you’re trying to decide between using a table and creating a figure to present your material, consider the following a rule of thumb. The strength of a table lies in its ability to supply large amounts of exact data, whereas the strength of a figure is its dramatic illustration of important trends within the experiment. If you feel that your readers won’t get the full impact of the results you obtained just by looking at the numbers, then a figure might be appropriate.

Of course, an undergraduate class may expect you to create a figure for your lab experiment, if only to make sure that you can do so effectively. If this is the case, then don’t worry about whether to use figures or not—concentrate instead on how best to accomplish your task.

Figures can include maps, photographs, pen-and-ink drawings, flow charts, bar graphs, and section graphs (“pie charts”). But the most common figure by far, especially for undergraduates, is the line graph, so we’ll focus on that type in this handout.

At the undergraduate level, you can often draw and label your graphs by hand, provided that the result is clear, legible, and drawn to scale. Computer technology has, however, made creating line graphs a lot easier. Most word-processing software has a number of functions for transferring data into graph form; many scientists have found Microsoft Excel, for example, a helpful tool in graphing results. If you plan on pursuing a career in the sciences, it may be well worth your while to learn to use a similar program.

Computers can’t, however, decide for you how your graph really works; you have to know how to design your graph to meet your readers’ expectations. Here are some of these expectations:

  • Keep it as simple as possible. You may be tempted to signal the complexity of the information you gathered by trying to design a graph that accounts for that complexity. But remember the purpose of your graph: to dramatize your results in a manner that’s easy to see and grasp. Try not to make the reader stare at the graph for a half hour to find the important line among the mass of other lines. For maximum effectiveness, limit yourself to three to five lines per graph; if you have more data to demonstrate, use a set of graphs to account for it, rather than trying to cram it all into a single figure.
  • Plot the independent variable on the horizontal (x) axis and the dependent variable on the vertical (y) axis. Remember that the independent variable is the condition that you manipulated during the experiment and the dependent variable is the condition that you measured to see if it changed along with the independent variable. Placing the variables along their respective axes is mostly just a convention, but since your readers are accustomed to viewing graphs in this way, you’re better off not challenging the convention in your report.
  • Label each axis carefully, and be especially careful to include units of measure. You need to make sure that your readers understand perfectly well what your graph indicates.
  • Number and title your graphs. As with tables, the title of the graph should be informative but concise, and you should refer to your graph by number in the text (e.g., “Figure 1 shows the increase in the solubility rate as a function of temperature”).
  • Many editors of professional scientific journals prefer that writers distinguish the lines in their graphs by attaching a symbol to them, usually a geometric shape (triangle, square, etc.), and using that symbol throughout the curve of the line. Generally, readers have a hard time distinguishing dotted lines from dot-dash lines from straight lines, so you should consider avoiding line styles and relying on symbols instead. Editors don’t usually like different-colored lines within a graph because colors are difficult and expensive to reproduce; colors may, however, be great for your purposes, as long as you’re not planning to submit your paper to Nature. Use your discretion—try to employ whichever technique dramatizes the results most effectively.
  • Try to gather data at regular intervals, so the plot points on your graph aren’t too far apart. You can’t be sure of the arc you should draw between the plot points if the points are located at the far corners of the graph; over a fifteen-minute interval, perhaps the change occurred in the first or last thirty seconds of that period (in which case your straight-line connection between the points is misleading).
  • If you’re worried that you didn’t collect data at sufficiently regular intervals during your experiment, go ahead and connect the points with a straight line, but you may want to examine this problem as part of your Discussion section.
  • Make your graph large enough so that everything is legible and clearly demarcated, but not so large that it either overwhelms the rest of the Results section or provides a far greater range than you need to illustrate your point. If, for example, the seedlings of your plant grew only 15 mm during the trial, you don’t need to construct a graph that accounts for 100 mm of growth. The lines in your graph should more or less fill the space created by the axes; if you see that your data are confined to the lower left portion of the graph, you should probably re-adjust your scale.
  • If you create a set of graphs, make them the same size and format, including all the verbal and visual codes (captions, symbols, scale, etc.). You want to be as consistent as possible in your illustrations, so that your readers can easily make the comparisons you’re trying to get them to see.
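The conventions above translate directly into code. Here is a minimal matplotlib sketch (a common Python analogue to the Excel graphing mentioned earlier); the solubility numbers are invented for illustration. Note the independent variable on x, the dependent variable on y, units on both axis labels, symbols rather than line styles to distinguish the two curves, and a numbered title.

```python
# A sketch of the graphing conventions above, using matplotlib.
# The data values are invented for illustration only.
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Independent variable (manipulated): solvent temperature.
temperature_c = [20, 30, 40, 50, 60, 70, 80]
# Dependent variable (measured): rate of solubility (invented values).
polar_rate = [0.8, 1.1, 1.5, 2.0, 2.6, 3.3, 4.1]
nonpolar_rate = [0.4, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5]

fig, ax = plt.subplots()
# Distinguish lines with symbols rather than dotted/dashed styles.
ax.plot(temperature_c, polar_rate, marker="o", label="Polar solute")
ax.plot(temperature_c, nonpolar_rate, marker="s", label="Non-polar solute")

# Label each axis carefully, including units of measure.
ax.set_xlabel("Temperature of solvent (°C)")   # independent on x
ax.set_ylabel("Rate of solubility (g/min)")    # dependent on y
# Number and title the graph.
ax.set_title("Figure 1. Rate of solubility as a function of temperature")
ax.legend()

fig.savefig("figure1.png")
```

In the text of the Results section you would then refer to this figure by number (“Figure 1 shows the increase in the solubility rate as a function of temperature”).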

How do I write a strong Discussion section?

The discussion section is probably the least formalized part of the report, in that you can’t really apply the same structure to every type of experiment. In simple terms, here you tell your readers what to make of the Results you obtained. If you have done the Results part well, your readers should already recognize the trends in the data and have a fairly clear idea of whether your hypothesis was supported. Because the Results can seem so self-explanatory, many students find it difficult to know what material to add in this last section.

Basically, the Discussion contains several parts, in no particular order, but roughly moving from specific (i.e., related to your experiment only) to general (how your findings fit in the larger scientific community). In this section, you will, as a rule, need to:

  • Explain whether the data support your hypothesis
  • Acknowledge any anomalous data or deviations from what you expected
  • Derive conclusions, based on your findings, about the process you’re studying
  • Relate your findings to earlier work in the same area (if you can)
  • Explore the theoretical and/or practical implications of your findings

Let’s look at some dos and don’ts for each of these objectives.

Explain whether the data support your hypothesis

This statement is usually a good way to begin the Discussion, since you can’t effectively speak about the larger scientific value of your study until you’ve figured out the particulars of this experiment. You might begin this part of the Discussion by explicitly stating the relationships or correlations your data indicate between the independent and dependent variables. Then you can show more clearly why you believe your hypothesis was or was not supported. For example, if you tested solubility at various temperatures, you could start this section by noting that the rates of solubility increased as the temperature increased. If your initial hypothesis surmised that temperature change would not affect solubility, you would then say something like,

“The hypothesis that temperature change would not affect solubility was not supported by the data.”

Note: Students tend to view labs as practical tests of undeniable scientific truths. As a result, you may want to say that the hypothesis was “proved” or “disproved” or that it was “correct” or “incorrect.” These terms, however, reflect a degree of certainty that you as a scientist aren’t supposed to have. Remember, you’re testing a theory with a procedure that lasts only a few hours and relies on only a few trials, which severely compromises your ability to be sure about the “truth” you see. Words like “supported,” “indicated,” and “suggested” are more acceptable ways to evaluate your hypothesis.

Also, recognize that saying whether the data supported your hypothesis or not involves making a claim to be defended. As such, you need to show the readers that this claim is warranted by the evidence. Make sure that you’re very explicit about the relationship between the evidence and the conclusions you draw from it. This process is difficult for many writers because we don’t often justify conclusions in our regular lives. For example, you might nudge your friend at a party and whisper, “That guy’s drunk,” and once your friend lays eyes on the person in question, she might readily agree. In a scientific paper, by contrast, you would need to defend your claim more thoroughly by pointing to data such as slurred words, unsteady gait, and the lampshade-as-hat. In addition to pointing out these details, you would also need to show how (according to previous studies) these signs are consistent with inebriation, especially if they occur in conjunction with one another. To put it another way, tell your readers exactly how you got from point A (was the hypothesis supported?) to point B (yes/no).

Acknowledge any anomalous data, or deviations from what you expected

You need to take these exceptions and divergences into account, so that you qualify your conclusions sufficiently. For obvious reasons, your readers will doubt your authority if you (deliberately or inadvertently) overlook a key piece of data that doesn’t square with your perspective on what occurred. In a more philosophical sense, once you’ve ignored evidence that contradicts your claims, you’ve departed from the scientific method. The urge to “tidy up” the experiment is often strong, but if you give in to it you’re no longer performing good science.

Sometimes after you’ve performed a study or experiment, you realize that some part of the methods you used to test your hypothesis was flawed. In that case, it’s OK to suggest that if you had the chance to conduct your test again, you might change the design in this or that specific way in order to avoid such and such a problem. The key to making this approach work, though, is to be very precise about the weakness in your experiment, why and how you think that weakness might have affected your data, and how you would alter your protocol to eliminate—or limit the effects of—that weakness. Often, inexperienced researchers and writers feel the need to account for “wrong” data (remember, there’s no such animal), and so they speculate wildly about what might have screwed things up. These speculations include such factors as the unusually hot temperature in the room, or the possibility that their lab partners read the meters wrong, or the potentially defective equipment. These explanations are what scientists call “cop-outs,” or “lame”; don’t indicate that the experiment had a weakness unless you’re fairly certain that a) it really occurred and b) you can explain reasonably well how that weakness affected your results.

Derive conclusions, based on your findings, about the process you’re studying

If, for example, your hypothesis dealt with the changes in solubility at different temperatures, then try to figure out what you can rationally say about the process of solubility more generally. If you’re doing an undergraduate lab, chances are that the lab will connect in some way to the material you’ve been covering either in lecture or in your reading, so you might choose to return to these resources as a way to help you think clearly about the process as a whole.

This part of the Discussion section is another place where you need to make sure that you’re not overreaching. Again, nothing you’ve found in one study would remotely allow you to claim that you now “know” something, or that something isn’t “true,” or that your experiment “confirmed” some principle or other. Hesitate before you go out on a limb—it’s dangerous! Use less absolutely conclusive language, including such words as “suggest,” “indicate,” “correspond,” “possibly,” “challenge,” etc.

Relate your findings to previous work in the field (if possible)

We’ve been talking about how to show that you belong in a particular community (such as biologists or anthropologists) by writing within conventions that they recognize and accept. Another way to do so is to identify a conversation going on among members of that community, and use your work to contribute to that conversation. In a larger philosophical sense, scientists can’t fully understand the value of their research unless they have some sense of the context that provoked and nourished it. That is, you have to recognize what’s new about your project (potentially, anyway) and how it benefits the wider body of scientific knowledge. On a more pragmatic level, especially for undergraduates, connecting your lab work to previous research will demonstrate to the TA that you see the big picture. You have an opportunity, in the Discussion section, to distinguish yourself from the students in your class who aren’t thinking beyond the barest facts of the study. Capitalize on this opportunity by putting your own work in context.

If you’re just beginning to work in the natural sciences (as a first-year biology or chemistry student, say), most likely the work you’ll be doing has already been performed and re-performed to a satisfactory degree. Hence, you could probably point to a similar experiment or study and compare/contrast your results and conclusions. More advanced work may deal with an issue that is somewhat less “resolved,” and so previous research may take the form of an ongoing debate, and you can use your own work to weigh in on that debate. If, for example, researchers are hotly disputing the value of herbal remedies for the common cold, and the results of your study suggest that Echinacea diminishes the symptoms but not the actual presence of the cold, then you might want to take some time in the Discussion section to recapitulate the specifics of the dispute as it relates to Echinacea as an herbal remedy. (Consider that you have probably already written in the Introduction about this debate as background research.)

This information is often the best way to end your Discussion (and, for all intents and purposes, the report). In argumentative writing generally, you want to use your closing words to convey the main point of your writing. This main point can be primarily theoretical (“Now that you understand this information, you’re in a better position to understand this larger issue”) or primarily practical (“You can use this information to take such and such an action”). In either case, the concluding statements help the reader to comprehend the significance of your project and your decision to write about it.

Since a lab report is argumentative—after all, you’re investigating a claim, and judging the legitimacy of that claim by generating and collecting evidence—it’s often a good idea to end your report with the same technique for establishing your main point. If you want to go the theoretical route, you might talk about the consequences your study has for the field or phenomenon you’re investigating. To return to the examples regarding solubility, you could end by reflecting on what your work on solubility as a function of temperature tells us (potentially) about solubility in general. (Some folks consider this type of exploration “pure” as opposed to “applied” science, although these labels can be problematic.) If you want to go the practical route, you could end by speculating about the medical, institutional, or commercial implications of your findings—in other words, answer the question, “What can this study help people to do?” In either case, you’re going to make your readers’ experience more satisfying, by helping them see why they spent their time learning what you had to teach them.

Works consulted

We consulted these works while writing this handout. This is not a comprehensive list of resources on the handout’s topic, and we encourage you to do your own research to find additional publications. Please do not use this list as a model for the format of your own reference list, as it may not match the citation style you are using. For guidance on formatting citations, please see the UNC Libraries citation tutorial. We revise these tips periodically and welcome feedback.

American Psychological Association. 2010. Publication Manual of the American Psychological Association. 6th ed. Washington, DC: American Psychological Association.

Beall, Herbert, and John Trimbur. 2001. A Short Guide to Writing About Chemistry, 2nd ed. New York: Longman.

Blum, Deborah, and Mary Knudson. 1997. A Field Guide for Science Writers: The Official Guide of the National Association of Science Writers. New York: Oxford University Press.

Booth, Wayne C., Gregory G. Colomb, Joseph M. Williams, Joseph Bizup, and William T. FitzGerald. 2016. The Craft of Research, 4th ed. Chicago: University of Chicago Press.

Briscoe, Mary Helen. 1996. Preparing Scientific Illustrations: A Guide to Better Posters, Presentations, and Publications, 2nd ed. New York: Springer-Verlag.

Council of Science Editors. 2014. Scientific Style and Format: The CSE Manual for Authors, Editors, and Publishers, 8th ed. Chicago & London: University of Chicago Press.

Davis, Martha. 2012. Scientific Papers and Presentations, 3rd ed. London: Academic Press.

Day, Robert A. 1994. How to Write and Publish a Scientific Paper, 4th ed. Phoenix: Oryx Press.

Porush, David. 1995. A Short Guide to Writing About Science. New York: Longman.

Williams, Joseph, and Joseph Bizup. 2017. Style: Lessons in Clarity and Grace, 12th ed. Boston: Pearson.

You may reproduce it for non-commercial use if you use the entire handout and attribute the source: The Writing Center, University of North Carolina at Chapel Hill


On World Parkinson’s Day, a New Theory Emerges on the Disease’s Origins and Spread

The nose or the gut? For the past two decades, the scientific community has debated the wellspring of the toxic proteins at the source of Parkinson’s disease. In 2003, a German pathologist, Heiko Braak, MD, first proposed that the disease begins outside the brain. More recently, Per Borghammer, MD, with Aarhus University Hospital in Denmark, and his colleagues argue that the disease is the result of processes that start in either the brain’s smell center (brain-first) or the body’s intestinal tract (body-first).    

A new hypothesis paper appearing in the Journal of Parkinson’s Disease on World Parkinson’s Day unites the brain- and body-first models with some of the likely causes of the disease: environmental toxicants that are either inhaled or ingested. The authors of the new study, who include Borghammer, argue that inhalation of certain pesticides, common dry cleaning chemicals, and air pollution predisposes people to the brain-first model of the disease. Ingested toxicants, such as tainted food and contaminated drinking water, lead to the body-first model of the disease.

“In both the brain-first and body-first scenarios the pathology arises in structures in the body closely connected to the outside world,” said Ray Dorsey, MD , a professor of Neurology at the University of Rochester Medical Center and co-author of the piece. “Here we propose that Parkinson’s is a systemic disease and that its initial roots likely begin in the nose and in the gut and are tied to environmental factors increasingly recognized as major contributors, if not causes, of the disease. This further reinforces the idea that Parkinson’s, the world’s fastest growing brain disease, may be fueled by toxicants and is therefore largely preventable.”  

Different pathways to the brain, different forms of disease

A misfolded protein called alpha-synuclein has been in scientists’ sights for the last 25 years as one of the driving forces behind Parkinson’s. Over time, the protein accumulates in the brain in clumps, called Lewy bodies, and causes progressive dysfunction and death of many types of nerve cells, including those in the dopamine-producing regions of the brain that control motor function. When he first proposed the idea, Braak thought that an unidentified pathogen, such as a virus, might be responsible for the disease.

The new piece argues that toxins encountered in the environment, specifically the dry cleaning and degreasing chemicals trichloroethylene (TCE) and perchloroethylene (PCE), the weed killer paraquat, and air pollution, could be common causes for the formation of toxic alpha-synuclein. TCE and PCE contaminate thousands of former industrial, commercial, and military sites, most notably the Marine Corps base Camp Lejeune, and paraquat is one of the most widely used herbicides in the US, despite being banned for safety concerns in more than 30 countries, including the European Union and China. Air pollution was at toxic levels in nineteenth-century London when James Parkinson, whose 269th birthday is celebrated today, first described the condition.


The nose and the gut are lined with a soft permeable tissue, and both have well-established connections to the brain. In the brain-first model, the chemicals are inhaled and may enter the brain via the nerve responsible for smell. From the brain's smell center, alpha-synuclein spreads to other parts of the brain principally on one side, including regions with concentrations of dopamine-producing neurons. The death of these cells is a hallmark of Parkinson's disease. This form of the disease may cause asymmetric tremor and slowness in movement, a slower rate of progression after diagnosis, and, only much later, significant cognitive impairment or dementia.

When ingested, the chemicals pass through the lining of the gastrointestinal tract. Initial alpha-synuclein pathology may begin in the gut's own nervous system from where it can spread to both sides of the brain and spinal cord. This body-first pathway is often associated with Lewy body dementia, a disease in the same family as Parkinson's, which is characterized by early constipation and sleep disturbance, followed by more symmetric slowing in movements and earlier dementia, as the disease spreads through both brain hemispheres.

New models to understand and study brain diseases

"These environmental toxicants are widespread and not everyone has Parkinson's disease," said Dorsey. "The timing, dose, and duration of exposure and interactions with genetic and other environmental factors are probably key to determining who ultimately develops Parkinson's. In most instances, these exposures likely occurred years or decades before symptoms develop."

Pointing to a growing body of research linking environmental exposure to Parkinson's disease, the authors believe the new models may enable the scientific community to connect specific exposures to specific forms of the disease. This effort will be aided by increasing public awareness of the adverse health effects of many chemicals in our environment. The authors conclude that their hypothesis "may explain many of the mysteries of Parkinson's disease and open the door toward the ultimate goal: prevention."

In addition to Parkinson's, these models of environmental exposure may advance understanding of how toxicants contribute to other brain disorders, including autism in children, ALS in adults, and Alzheimer's in seniors. Dorsey and his colleagues at the University of Rochester have organized a symposium on the Brain and the Environment in Washington, DC, on May 20 that will examine the role toxicants in our food, water, and air are playing in all these brain diseases.

Additional authors of the hypothesis paper include Briana De Miranda, PhD, with the University of Alabama at Birmingham, and Jacob Horsager, MD, PhD, with Aarhus University Hospital in Denmark.


Story Source:

Materials provided by University of Rochester Medical Center. Original written by Mark Michaud. Note: Content may be edited for style and length.

Journal Reference:

  • E. Ray Dorsey, Briana R. De Miranda, Jacob Horsager, Per Borghammer. The Body, the Brain, the Environment, and Parkinson's Disease. Journal of Parkinson's Disease, 2024; 1. DOI: 10.3233/JPD-240019

